cortex: changeset 513:4c4d45f6f30b
accept/reject changes

author | Robert McIntyre <rlm@mit.edu> |
---|---|
date | Sun, 30 Mar 2014 10:41:18 -0400 |
parents | 8b962ab418c8 |
children | 447c3c8405a2 |
files | thesis/dxh-cortex-diff.diff thesis/dylan-accept.diff thesis/dylan-cortex-diff.diff thesis/dylan-reject.diff |
diffstat | 4 files changed, 428 insertions(+), 428 deletions(-) |
--- a/thesis/dxh-cortex-diff.diff	Sun Mar 30 10:39:19 2014 -0400
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,428 +0,0 @@
-diff -r f639e2139ce2 thesis/cortex.org
---- a/thesis/cortex.org	Sun Mar 30 01:34:43 2014 -0400
-+++ b/thesis/cortex.org	Sun Mar 30 10:07:17 2014 -0400
-@@ -41,49 +41,46 @@
- [[./images/aurellem-gray.png]]
-
-
--* Empathy and Embodiment as problem solving strategies
-+* Empathy \& Embodiment: problem solving strategies
-
-- By the end of this thesis, you will have seen a novel approach to
-- interpreting video using embodiment and empathy. You will have also
-- seen one way to efficiently implement empathy for embodied
-- creatures. Finally, you will become familiar with =CORTEX=, a system
-- for designing and simulating creatures with rich senses, which you
-- may choose to use in your own research.
--
-- This is the core vision of my thesis: That one of the important ways
-- in which we understand others is by imagining ourselves in their
-- position and emphatically feeling experiences relative to our own
-- bodies. By understanding events in terms of our own previous
-- corporeal experience, we greatly constrain the possibilities of what
-- would otherwise be an unwieldy exponential search. This extra
-- constraint can be the difference between easily understanding what
-- is happening in a video and being completely lost in a sea of
-- incomprehensible color and movement.
--
--** Recognizing actions in video is extremely difficult
--
-- Consider for example the problem of determining what is happening
-- in a video of which this is one frame:
--
-+** The problem: recognizing actions in video is extremely difficult
-+# developing / requires useful representations
-+
-+ Examine the following collection of images. As you, and indeed very
-+ young children, can easily determine, each one is a picture of
-+ someone drinking.
-+
-+ # dxh: cat, cup, drinking fountain, rain, straw, coconut
- #+caption: A cat drinking some water. Identifying this action is
-- #+caption: beyond the state of the art for computers.
-+ #+caption: beyond the capabilities of existing computer vision systems.
- #+ATTR_LaTeX: :width 7cm
- [[./images/cat-drinking.jpg]]
-+
-+ Nevertheless, it is beyond the state of the art for a computer
-+ vision program to describe what's happening in each of these
-+ images, or what's common to them. Part of the problem is that many
-+ computer vision systems focus on pixel-level details or probability
-+ distributions of pixels, with little focus on [...]
-+
-+
-+ In fact, the contents of scene may have much less to do with pixel
-+ probabilities than with recognizing various affordances: things you
-+ can move, objects you can grasp, spaces that can be filled
-+ (Gibson). For example, what processes might enable you to see the
-+ chair in figure \ref{hidden-chair}?
-+ # Or suppose that you are building a program that recognizes chairs.
-+ # How could you ``see'' the chair ?
-
-- It is currently impossible for any computer program to reliably
-- label such a video as ``drinking''. And rightly so -- it is a very
-- hard problem! What features can you describe in terms of low level
-- functions of pixels that can even begin to describe at a high level
-- what is happening here?
--
-- Or suppose that you are building a program that recognizes chairs.
-- How could you ``see'' the chair in figure \ref{hidden-chair}?
--
-+ # dxh: blur chair
- #+caption: The chair in this image is quite obvious to humans, but I
- #+caption: doubt that any modern computer vision program can find it.
- #+name: hidden-chair
- #+ATTR_LaTeX: :width 10cm
- [[./images/fat-person-sitting-at-desk.jpg]]
-+
-+
-+
-+
-
- Finally, how is it that you can easily tell the difference between
- how the girls /muscles/ are working in figure \ref{girl}?
-@@ -95,10 +92,13 @@
- #+ATTR_LaTeX: :width 7cm
- [[./images/wall-push.png]]
-+
-+
-+
-
- Each of these examples tells us something about what might be going
- on in our minds as we easily solve these recognition problems.
-
-- The hidden chairs show us that we are strongly triggered by cues
-+ The hidden chair shows us that we are strongly triggered by cues
- relating to the position of human bodies, and that we can determine
- the overall physical configuration of a human body even if much of
- that body is occluded.
-@@ -109,10 +109,107 @@
- most positions, and we can easily project this self-knowledge to
- imagined positions triggered by images of the human body.
-
--** =EMPATH= neatly solves recognition problems
-+** A step forward: the sensorimotor-centered approach
-+# ** =EMPATH= recognizes what creatures are doing
-+# neatly solves recognition problems
-+ In this thesis, I explore the idea that our knowledge of our own
-+ bodies enables us to recognize the actions of others.
-+
-+ First, I built a system for constructing virtual creatures with
-+ physiologically plausible sensorimotor systems and detailed
-+ environments. The result is =CORTEX=, which is described in section
-+ \ref{sec-2}. (=CORTEX= was built to be flexible and useful to other
-+ AI researchers; it is provided in full with detailed instructions
-+ on the web [here].)
-+
-+ Next, I wrote routines which enabled a simple worm-like creature to
-+ infer the actions of a second worm-like creature, using only its
-+ own prior sensorimotor experiences and knowledge of the second
-+ worm's joint positions. This program, =EMPATH=, is described in
-+ section \ref{sec-3}, and the key results of this experiment are
-+ summarized below.
-+
-+ #+caption: From only \emph{proprioceptive} data, =EMPATH= was able to infer
-+ #+caption: the complete sensory experience and classify these four poses.
-+ #+caption: The last image is a composite, depicting the intermediate stages of \emph{wriggling}.
-+ #+name: worm-recognition-intro-2
-+ #+ATTR_LaTeX: :width 15cm
-+ [[./images/empathy-1.png]]
-+
-+ # =CORTEX= provides a language for describing the sensorimotor
-+ # experiences of various creatures.
-+
-+ # Next, I developed an experiment to test the power of =CORTEX='s
-+ # sensorimotor-centered language for solving recognition problems. As
-+ # a proof of concept, I wrote routines which enabled a simple
-+ # worm-like creature to infer the actions of a second worm-like
-+ # creature, using only its own previous sensorimotor experiences and
-+ # knowledge of the second worm's joints (figure
-+ # \ref{worm-recognition-intro-2}). The result of this proof of
-+ # concept was the program =EMPATH=, described in section
-+ # \ref{sec-3}. The key results of this
-+
-+ # Using only first-person sensorimotor experiences and third-person
-+ # proprioceptive data,
-+
-+*** Key results
-+ - After one-shot supervised training, =EMPATH= was able recognize a
-+ wide variety of static poses and dynamic actions---ranging from
-+ curling in a circle to wriggling with a particular frequency ---
-+ with 95\% accuracy.
-+ - These results were completely independent of viewing angle
-+ because the underlying body-centered language fundamentally is;
-+ once an action is learned, it can be recognized equally well from
-+ any viewing angle.
-+ - =EMPATH= is surprisingly short; the sensorimotor-centered
-+ language provided by =CORTEX= resulted in extremely economical
-+ recognition routines --- about 0000 lines in all --- suggesting
-+ that such representations are very powerful, and often
-+ indispensible for the types of recognition tasks considered here.
-+ - Although for expediency's sake, I relied on direct knowledge of
-+ joint positions in this proof of concept, it would be
-+ straightforward to extend =EMPATH= so that it (more
-+ realistically) infers joint positions from its visual data.
-+
-+# because the underlying language is fundamentally orientation-independent
-+
-+# recognize the actions of a worm with 95\% accuracy. The
-+# recognition tasks
-
-- I propose a system that can express the types of recognition
-- problems above in a form amenable to computation. It is split into
-+
-+
-+
-+ [Talk about these results and what you find promising about them]
-+
-+** Roadmap
-+ [I'm going to explain how =CORTEX= works, then break down how
-+ =EMPATH= does its thing. Because the details reveal such-and-such
-+ about the approach.]
-+
-+ # The success of this simple proof-of-concept offers a tantalizing
-+
-+
-+ # explore the idea
-+ # The key contribution of this thesis is the idea that body-centered
-+ # representations (which express
-+
-+
-+ # the
-+ # body-centered approach --- in which I try to determine what's
-+ # happening in a scene by bringing it into registration with my own
-+ # bodily experiences --- are indispensible for recognizing what
-+ # creatures are doing in a scene.
-+
-+* COMMENT
-+# body-centered language
-+
-+ In this thesis, I'll describe =EMPATH=, which solves a certain
-+ class of recognition problems
-+
-+ The key idea is to use self-centered (or first-person) language.
-+
-+ I have built a system that can express the types of recognition
-+ problems in a form amenable to computation. It is split into
- four parts:
-
- - Free/Guided Play :: The creature moves around and experiences the
-@@ -286,14 +383,14 @@
- code to create a creature, and can use a wide library of
- pre-existing blender models as a base for your own creatures.
-
-- - =CORTEX= implements a wide variety of senses, including touch,
-+ - =CORTEX= implements a wide variety of senses: touch,
- proprioception, vision, hearing, and muscle tension. Complicated
- senses like touch, and vision involve multiple sensory elements
- embedded in a 2D surface. You have complete control over the
- distribution of these sensor elements through the use of simple
- png image files. In particular, =CORTEX= implements more
- comprehensive hearing than any other creature simulation system
-- available.
-+ available.
-
- - =CORTEX= supports any number of creatures and any number of
- senses. Time in =CORTEX= dialates so that the simulated creatures
-@@ -353,7 +450,24 @@
- \end{sidewaysfigure}
- #+END_LaTeX
-
--** Contributions
-+** Road map
-+
-+ By the end of this thesis, you will have seen a novel approach to
-+ interpreting video using embodiment and empathy. You will have also
-+ seen one way to efficiently implement empathy for embodied
-+ creatures. Finally, you will become familiar with =CORTEX=, a system
-+ for designing and simulating creatures with rich senses, which you
-+ may choose to use in your own research.
-+
-+ This is the core vision of my thesis: That one of the important ways
-+ in which we understand others is by imagining ourselves in their
-+ position and emphatically feeling experiences relative to our own
-+ bodies. By understanding events in terms of our own previous
-+ corporeal experience, we greatly constrain the possibilities of what
-+ would otherwise be an unwieldy exponential search. This extra
-+ constraint can be the difference between easily understanding what
-+ is happening in a video and being completely lost in a sea of
-+ incomprehensible color and movement.
-
- - I built =CORTEX=, a comprehensive platform for embodied AI
- experiments. =CORTEX= supports many features lacking in other
-@@ -363,18 +477,22 @@
- - I built =EMPATH=, which uses =CORTEX= to identify the actions of
- a worm-like creature using a computational model of empathy.
-
--* Building =CORTEX=
--
-- I intend for =CORTEX= to be used as a general-purpose library for
-- building creatures and outfitting them with senses, so that it will
-- be useful for other researchers who want to test out ideas of their
-- own. To this end, wherver I have had to make archetictural choices
-- about =CORTEX=, I have chosen to give as much freedom to the user as
-- possible, so that =CORTEX= may be used for things I have not
-- forseen.
--
--** Simulation or Reality?
--
-+
-+* Designing =CORTEX=
-+ In this section, I outline the design decisions that went into
-+ making =CORTEX=, along with some details about its
-+ implementation. (A practical guide to getting started with =CORTEX=,
-+ which skips over the history and implementation details presented
-+ here, is provided in an appendix \ref{} at the end of this paper.)
-+
-+ Throughout this project, I intended for =CORTEX= to be flexible and
-+ extensible enough to be useful for other researchers who want to
-+ test out ideas of their own. To this end, wherver I have had to make
-+ archetictural choices about =CORTEX=, I have chosen to give as much
-+ freedom to the user as possible, so that =CORTEX= may be used for
-+ things I have not forseen.
-+
-+** Building in simulation versus reality
- The most important archetictural decision of all is the choice to
- use a computer-simulated environemnt in the first place! The world
- is a vast and rich place, and for now simulations are a very poor
-@@ -436,7 +554,7 @@
- doing everything in software is far cheaper than building custom
- real-time hardware. All you need is a laptop and some patience.
-
--** Because of Time, simulation is perferable to reality
-+** Simulated time enables rapid prototyping and complex scenes
-
- I envision =CORTEX= being used to support rapid prototyping and
- iteration of ideas. Even if I could put together a well constructed
-@@ -459,8 +577,8 @@
- simulations of very simple creatures in =CORTEX= generally run at
- 40x on my machine!
-
--** What is a sense?
--
-+** All sense organs are two-dimensional surfaces
-+# What is a sense?
- If =CORTEX= is to support a wide variety of senses, it would help
- to have a better understanding of what a ``sense'' actually is!
- While vision, touch, and hearing all seem like they are quite
-@@ -956,7 +1074,7 @@
- #+ATTR_LaTeX: :width 15cm
- [[./images/physical-hand.png]]
-
--** Eyes reuse standard video game components
-+** Sight reuses standard video game components...
-
- Vision is one of the most important senses for humans, so I need to
- build a simulated sense of vision for my AI. I will do this with
-@@ -1257,8 +1375,8 @@
- community and is now (in modified form) part of a system for
- capturing in-game video to a file.
-
--** Hearing is hard; =CORTEX= does it right
--
-+** ...but hearing must be built from scratch
-+# is hard; =CORTEX= does it right
- At the end of this section I will have simulated ears that work the
- same way as the simulated eyes in the last section. I will be able to
- place any number of ear-nodes in a blender file, and they will bind to
-@@ -1565,7 +1683,7 @@
- jMonkeyEngine3 community and is used to record audio for demo
- videos.
-
--** Touch uses hundreds of hair-like elements
-+** Hundreds of hair-like elements provide a sense of touch
-
- Touch is critical to navigation and spatial reasoning and as such I
- need a simulated version of it to give to my AI creatures.
-@@ -2059,7 +2177,7 @@
- #+ATTR_LaTeX: :width 15cm
- [[./images/touch-cube.png]]
-
--** Proprioception is the sense that makes everything ``real''
-+** Proprioception provides knowledge of your own body's position
-
- Close your eyes, and touch your nose with your right index finger.
- How did you do it? You could not see your hand, and neither your
-@@ -2193,7 +2311,7 @@
- #+ATTR_LaTeX: :width 11cm
- [[./images/proprio.png]]
-
--** Muscles are both effectors and sensors
-+** Muscles contain both sensors and effectors
-
- Surprisingly enough, terrestrial creatures only move by using
- torque applied about their joints. There's not a single straight
-@@ -2440,7 +2558,8 @@
- hard control problems without worrying about physics or
- senses.
-
--* Empathy in a simulated worm
-+* =EMPATH=: the simulated worm experiment
-+# Empathy in a simulated worm
-
- Here I develop a computational model of empathy, using =CORTEX= as a
- base. Empathy in this context is the ability to observe another
-@@ -2732,7 +2851,7 @@
- provided by an experience vector and reliably infering the rest of
- the senses.
-
--** Empathy is the process of tracing though \Phi-space
-+** ``Empathy'' requires retracing steps though \Phi-space
-
- Here is the core of a basic empathy algorithm, starting with an
- experience vector:
-@@ -2888,7 +3007,7 @@
- #+end_src
- #+end_listing
-
--** Efficient action recognition with =EMPATH=
-+** =EMPATH= recognizes actions efficiently
-
- To use =EMPATH= with the worm, I first need to gather a set of
- experiences from the worm that includes the actions I want to
-@@ -3044,9 +3163,9 @@
- to interpretation, and dissaggrement between empathy and experience
- is more excusable.
-
--** Digression: bootstrapping touch using free exploration
--
-- In the previous section I showed how to compute actions in terms of
-+** Digression: Learn touch sensor layout through haptic experimentation, instead
-+# Boostraping touch using free exploration
-+In the previous section I showed how to compute actions in terms of
- body-centered predicates which relied averate touch activation of
- pre-defined regions of the worm's skin. What if, instead of recieving
- touch pre-grouped into the six faces of each worm segment, the true
-@@ -3210,13 +3329,14 @@
-
- In this thesis you have seen the =CORTEX= system, a complete
- environment for creating simulated creatures. You have seen how to
-- implement five senses including touch, proprioception, hearing,
-- vision, and muscle tension. You have seen how to create new creatues
-- using blender, a 3D modeling tool. I hope that =CORTEX= will be
-- useful in further research projects. To this end I have included the
-- full source to =CORTEX= along with a large suite of tests and
-- examples. I have also created a user guide for =CORTEX= which is
-- inculded in an appendix to this thesis.
-+ implement five senses: touch, proprioception, hearing, vision, and
-+ muscle tension. You have seen how to create new creatues using
-+ blender, a 3D modeling tool. I hope that =CORTEX= will be useful in
-+ further research projects. To this end I have included the full
-+ source to =CORTEX= along with a large suite of tests and examples. I
-+ have also created a user guide for =CORTEX= which is inculded in an
-+ appendix to this thesis \ref{}.
-+# dxh: todo reference appendix
-
- You have also seen how I used =CORTEX= as a platform to attach the
- /action recognition/ problem, which is the problem of recognizing
-@@ -3234,8 +3354,8 @@
-
- - =CORTEX=, a system for creating simulated creatures with rich
- senses.
-- - =EMPATH=, a program for recognizing actions by imagining sensory
-- experience.
-+ - =EMPATH=, a program for recognizing actions by aligning them with
-+ personal sensory experiences.
-
- # An anatomical joke:
- # - Training
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/thesis/dylan-accept.diff	Sun Mar 30 10:41:18 2014 -0400
@@ -0,0 +1,22 @@
+@@ -3210,13 +3329,14 @@
+
+ In this thesis you have seen the =CORTEX= system, a complete
+ environment for creating simulated creatures. You have seen how to
+- implement five senses including touch, proprioception, hearing,
+- vision, and muscle tension. You have seen how to create new creatues
+- using blender, a 3D modeling tool. I hope that =CORTEX= will be
+- useful in further research projects. To this end I have included the
+- full source to =CORTEX= along with a large suite of tests and
+- examples. I have also created a user guide for =CORTEX= which is
+- inculded in an appendix to this thesis.
++ implement five senses: touch, proprioception, hearing, vision, and
++ muscle tension. You have seen how to create new creatues using
++ blender, a 3D modeling tool. I hope that =CORTEX= will be useful in
++ further research projects. To this end I have included the full
++ source to =CORTEX= along with a large suite of tests and examples. I
++ have also created a user guide for =CORTEX= which is inculded in an
++ appendix to this thesis \ref{}.
++# dxh: todo reference appendix
+
+ You have also seen how I used =CORTEX= as a platform to attach the
+ /action recognition/ problem, which is the problem of recognizing
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/thesis/dylan-cortex-diff.diff	Sun Mar 30 10:41:18 2014 -0400
@@ -0,0 +1,395 @@
+diff -r f639e2139ce2 thesis/cortex.org
+--- a/thesis/cortex.org	Sun Mar 30 01:34:43 2014 -0400
++++ b/thesis/cortex.org	Sun Mar 30 10:07:17 2014 -0400
+@@ -41,49 +41,46 @@
+ [[./images/aurellem-gray.png]]
+
+
+-* Empathy and Embodiment as problem solving strategies
++* Empathy \& Embodiment: problem solving strategies
+
+- By the end of this thesis, you will have seen a novel approach to
+- interpreting video using embodiment and empathy. You will have also
+- seen one way to efficiently implement empathy for embodied
+- creatures. Finally, you will become familiar with =CORTEX=, a system
+- for designing and simulating creatures with rich senses, which you
+- may choose to use in your own research.
+-
+- This is the core vision of my thesis: That one of the important ways
+- in which we understand others is by imagining ourselves in their
+- position and emphatically feeling experiences relative to our own
+- bodies. By understanding events in terms of our own previous
+- corporeal experience, we greatly constrain the possibilities of what
+- would otherwise be an unwieldy exponential search. This extra
+- constraint can be the difference between easily understanding what
+- is happening in a video and being completely lost in a sea of
+- incomprehensible color and movement.
+-
+-** Recognizing actions in video is extremely difficult
+-
+- Consider for example the problem of determining what is happening
+- in a video of which this is one frame:
+-
++** The problem: recognizing actions in video is extremely difficult
++# developing / requires useful representations
++
++ Examine the following collection of images. As you, and indeed very
++ young children, can easily determine, each one is a picture of
++ someone drinking.
++
++ # dxh: cat, cup, drinking fountain, rain, straw, coconut
+ #+caption: A cat drinking some water. Identifying this action is
+- #+caption: beyond the state of the art for computers.
++ #+caption: beyond the capabilities of existing computer vision systems.
+ #+ATTR_LaTeX: :width 7cm
+ [[./images/cat-drinking.jpg]]
++
++ Nevertheless, it is beyond the state of the art for a computer
++ vision program to describe what's happening in each of these
++ images, or what's common to them. Part of the problem is that many
++ computer vision systems focus on pixel-level details or probability
++ distributions of pixels, with little focus on [...]
++
++
++ In fact, the contents of scene may have much less to do with pixel
++ probabilities than with recognizing various affordances: things you
++ can move, objects you can grasp, spaces that can be filled
++ (Gibson). For example, what processes might enable you to see the
++ chair in figure \ref{hidden-chair}?
++ # Or suppose that you are building a program that recognizes chairs.
++ # How could you ``see'' the chair ?
+
+- It is currently impossible for any computer program to reliably
+- label such a video as ``drinking''. And rightly so -- it is a very
+- hard problem! What features can you describe in terms of low level
+- functions of pixels that can even begin to describe at a high level
+- what is happening here?
+-
+- Or suppose that you are building a program that recognizes chairs.
+- How could you ``see'' the chair in figure \ref{hidden-chair}?
+-
++ # dxh: blur chair
+ #+caption: The chair in this image is quite obvious to humans, but I
+ #+caption: doubt that any modern computer vision program can find it.
+ #+name: hidden-chair
+ #+ATTR_LaTeX: :width 10cm
+ [[./images/fat-person-sitting-at-desk.jpg]]
++
++
++
++
+
+ Finally, how is it that you can easily tell the difference between
+ how the girls /muscles/ are working in figure \ref{girl}?
+@@ -95,10 +92,13 @@
+ #+ATTR_LaTeX: :width 7cm
+ [[./images/wall-push.png]]
++
++
++
+
+ Each of these examples tells us something about what might be going
+ on in our minds as we easily solve these recognition problems.
+
+- The hidden chairs show us that we are strongly triggered by cues
++ The hidden chair shows us that we are strongly triggered by cues
+ relating to the position of human bodies, and that we can determine
+ the overall physical configuration of a human body even if much of
+ that body is occluded.
+@@ -109,10 +109,107 @@
+ most positions, and we can easily project this self-knowledge to
+ imagined positions triggered by images of the human body.
+
+-** =EMPATH= neatly solves recognition problems
++** A step forward: the sensorimotor-centered approach
++# ** =EMPATH= recognizes what creatures are doing
++# neatly solves recognition problems
++ In this thesis, I explore the idea that our knowledge of our own
++ bodies enables us to recognize the actions of others.
++
++ First, I built a system for constructing virtual creatures with
++ physiologically plausible sensorimotor systems and detailed
++ environments. The result is =CORTEX=, which is described in section
++ \ref{sec-2}. (=CORTEX= was built to be flexible and useful to other
++ AI researchers; it is provided in full with detailed instructions
++ on the web [here].)
++
++ Next, I wrote routines which enabled a simple worm-like creature to
++ infer the actions of a second worm-like creature, using only its
++ own prior sensorimotor experiences and knowledge of the second
++ worm's joint positions. This program, =EMPATH=, is described in
++ section \ref{sec-3}, and the key results of this experiment are
++ summarized below.
++
++ #+caption: From only \emph{proprioceptive} data, =EMPATH= was able to infer
++ #+caption: the complete sensory experience and classify these four poses.
++ #+caption: The last image is a composite, depicting the intermediate stages of \emph{wriggling}.
++ #+name: worm-recognition-intro-2
++ #+ATTR_LaTeX: :width 15cm
++ [[./images/empathy-1.png]]
++
++ # =CORTEX= provides a language for describing the sensorimotor
++ # experiences of various creatures.
++
++ # Next, I developed an experiment to test the power of =CORTEX='s
++ # sensorimotor-centered language for solving recognition problems. As
++ # a proof of concept, I wrote routines which enabled a simple
++ # worm-like creature to infer the actions of a second worm-like
++ # creature, using only its own previous sensorimotor experiences and
++ # knowledge of the second worm's joints (figure
++ # \ref{worm-recognition-intro-2}). The result of this proof of
++ # concept was the program =EMPATH=, described in section
++ # \ref{sec-3}. The key results of this
++
++ # Using only first-person sensorimotor experiences and third-person
++ # proprioceptive data,
++
++*** Key results
++ - After one-shot supervised training, =EMPATH= was able recognize a
++ wide variety of static poses and dynamic actions---ranging from
++ curling in a circle to wriggling with a particular frequency ---
++ with 95\% accuracy.
++ - These results were completely independent of viewing angle
++ because the underlying body-centered language fundamentally is;
++ once an action is learned, it can be recognized equally well from
++ any viewing angle.
++ - =EMPATH= is surprisingly short; the sensorimotor-centered
++ language provided by =CORTEX= resulted in extremely economical
++ recognition routines --- about 0000 lines in all --- suggesting
++ that such representations are very powerful, and often
++ indispensible for the types of recognition tasks considered here.
++ - Although for expediency's sake, I relied on direct knowledge of
++ joint positions in this proof of concept, it would be
++ straightforward to extend =EMPATH= so that it (more
++ realistically) infers joint positions from its visual data.
++
++# because the underlying language is fundamentally orientation-independent
++
++# recognize the actions of a worm with 95\% accuracy. The
++# recognition tasks
+
+- I propose a system that can express the types of recognition
+- problems above in a form amenable to computation. It is split into
++
++
++
++ [Talk about these results and what you find promising about them]
++
++** Roadmap
++ [I'm going to explain how =CORTEX= works, then break down how
++ =EMPATH= does its thing. Because the details reveal such-and-such
++ about the approach.]
++
++ # The success of this simple proof-of-concept offers a tantalizing
++
++
++ # explore the idea
++ # The key contribution of this thesis is the idea that body-centered
++ # representations (which express
++
++
++ # the
++ # body-centered approach --- in which I try to determine what's
++ # happening in a scene by bringing it into registration with my own
++ # bodily experiences --- are indispensible for recognizing what
++ # creatures are doing in a scene.
++
++* COMMENT
++# body-centered language
++
++ In this thesis, I'll describe =EMPATH=, which solves a certain
++ class of recognition problems
++
++ The key idea is to use self-centered (or first-person) language.
++
++ I have built a system that can express the types of recognition
++ problems in a form amenable to computation. It is split into
+ four parts:
+
+ - Free/Guided Play :: The creature moves around and experiences the
+@@ -286,14 +383,14 @@
+ code to create a creature, and can use a wide library of
+ pre-existing blender models as a base for your own creatures.
+
+- - =CORTEX= implements a wide variety of senses, including touch,
++ - =CORTEX= implements a wide variety of senses: touch,
+ proprioception, vision, hearing, and muscle tension. Complicated
+ senses like touch, and vision involve multiple sensory elements
+ embedded in a 2D surface. You have complete control over the
+ distribution of these sensor elements through the use of simple
+ png image files. In particular, =CORTEX= implements more
+ comprehensive hearing than any other creature simulation system
+- available.
++ available.
+
+ - =CORTEX= supports any number of creatures and any number of
+ senses. Time in =CORTEX= dialates so that the simulated creatures
+@@ -353,7 +450,24 @@
+ \end{sidewaysfigure}
+ #+END_LaTeX
+
+-** Contributions
++** Road map
++
++ By the end of this thesis, you will have seen a novel approach to
++ interpreting video using embodiment and empathy. You will have also
++ seen one way to efficiently implement empathy for embodied
++ creatures. Finally, you will become familiar with =CORTEX=, a system
++ for designing and simulating creatures with rich senses, which you
++ may choose to use in your own research.
++
++ This is the core vision of my thesis: That one of the important ways
++ in which we understand others is by imagining ourselves in their
++ position and emphatically feeling experiences relative to our own
++ bodies. By understanding events in terms of our own previous
++ corporeal experience, we greatly constrain the possibilities of what
++ would otherwise be an unwieldy exponential search. This extra
++ constraint can be the difference between easily understanding what
++ is happening in a video and being completely lost in a sea of
++ incomprehensible color and movement.
+
+ - I built =CORTEX=, a comprehensive platform for embodied AI
+ experiments. =CORTEX= supports many features lacking in other
+@@ -363,18 +477,22 @@
+ - I built =EMPATH=, which uses =CORTEX= to identify the actions of
+ a worm-like creature using a computational model of empathy.
+
+-* Building =CORTEX=
+-
+- I intend for =CORTEX= to be used as a general-purpose library for
+- building creatures and outfitting them with senses, so that it will
+- be useful for other researchers who want to test out ideas of their
+- own. To this end, wherver I have had to make archetictural choices
+- about =CORTEX=, I have chosen to give as much freedom to the user as
+- possible, so that =CORTEX= may be used for things I have not
+- forseen.
+-
+-** Simulation or Reality?
+-
++
++* Designing =CORTEX=
++ In this section, I outline the design decisions that went into
++ making =CORTEX=, along with some details about its
++ implementation. (A practical guide to getting started with =CORTEX=,
++ which skips over the history and implementation details presented
++ here, is provided in an appendix \ref{} at the end of this paper.)
++
++ Throughout this project, I intended for =CORTEX= to be flexible and
++ extensible enough to be useful for other researchers who want to
++ test out ideas of their own. To this end, wherver I have had to make
++ archetictural choices about =CORTEX=, I have chosen to give as much
++ freedom to the user as possible, so that =CORTEX= may be used for
++ things I have not forseen.
++
++** Building in simulation versus reality
+ The most important archetictural decision of all is the choice to
+ use a computer-simulated environemnt in the first place! The world
+ is a vast and rich place, and for now simulations are a very poor
+@@ -436,7 +554,7 @@
+ doing everything in software is far cheaper than building custom
+ real-time hardware. All you need is a laptop and some patience.
+
+-** Because of Time, simulation is perferable to reality
++** Simulated time enables rapid prototyping and complex scenes
+
+ I envision =CORTEX= being used to support rapid prototyping and
+ iteration of ideas. Even if I could put together a well constructed
+@@ -459,8 +577,8 @@
+ simulations of very simple creatures in =CORTEX= generally run at
+ 40x on my machine!
+
+-** What is a sense?
+-
++** All sense organs are two-dimensional surfaces
++# What is a sense?
+ If =CORTEX= is to support a wide variety of senses, it would help
+ to have a better understanding of what a ``sense'' actually is!
+ While vision, touch, and hearing all seem like they are quite
+@@ -956,7 +1074,7 @@
+ #+ATTR_LaTeX: :width 15cm
+ [[./images/physical-hand.png]]
+
+-** Eyes reuse standard video game components
++** Sight reuses standard video game components...
+
+ Vision is one of the most important senses for humans, so I need to
+ build a simulated sense of vision for my AI. I will do this with
+@@ -1257,8 +1375,8 @@
+ community and is now (in modified form) part of a system for
+ capturing in-game video to a file.
+
+-** Hearing is hard; =CORTEX= does it right
+-
++** ...but hearing must be built from scratch
++# is hard; =CORTEX= does it right
+ At the end of this section I will have simulated ears that work the
+ same way as the simulated eyes in the last section. I will be able to
+ place any number of ear-nodes in a blender file, and they will bind to
+@@ -1565,7 +1683,7 @@
+ jMonkeyEngine3 community and is used to record audio for demo
+ videos.
+
+-** Touch uses hundreds of hair-like elements
++** Hundreds of hair-like elements provide a sense of touch
+
+ Touch is critical to navigation and spatial reasoning and as such I
+ need a simulated version of it to give to my AI creatures.
+@@ -2059,7 +2177,7 @@
+ #+ATTR_LaTeX: :width 15cm
+ [[./images/touch-cube.png]]
+
+-** Proprioception is the sense that makes everything ``real''
++** Proprioception provides knowledge of your own body's position
+
+ Close your eyes, and touch your nose with your right index finger.
+ How did you do it? You could not see your hand, and neither your
+@@ -2193,7 +2311,7 @@
+ #+ATTR_LaTeX: :width 11cm
+ [[./images/proprio.png]]
+
+-** Muscles are both effectors and sensors
++** Muscles contain both sensors and effectors
+
+ Surprisingly enough, terrestrial creatures only move by using
+ torque applied about their joints. There's not a single straight
+@@ -2440,7 +2558,8 @@
+ hard control problems without worrying about physics or
+ senses.
+
+-* Empathy in a simulated worm
++* =EMPATH=: the simulated worm experiment
++# Empathy in a simulated worm
+
+ Here I develop a computational model of empathy, using =CORTEX= as a
+ base. Empathy in this context is the ability to observe another
+@@ -2732,7 +2851,7 @@
+ provided by an experience vector and reliably infering the rest of
+ the senses.
+
+-** Empathy is the process of tracing though \Phi-space
++** ``Empathy'' requires retracing steps though \Phi-space
+
+ Here is the core of a basic empathy algorithm, starting with an
+ experience vector:
+@@ -2888,7 +3007,7 @@
+ #+end_src
+ #+end_listing
+
+-** Efficient action recognition with =EMPATH=
++** =EMPATH= recognizes actions efficiently
+
+ To use =EMPATH= with the worm, I first need to gather a set of
+ experiences from the worm that includes the actions I want to
+@@ -3044,9 +3163,9 @@
+ to interpretation, and dissaggrement between empathy and experience
+ is more excusable.
+
+-** Digression: bootstrapping touch using free exploration
+-
+- In the previous section I showed how to compute actions in terms of
++** Digression: Learn touch sensor layout through haptic experimentation, instead
++# Boostraping touch using free exploration
++In the previous section I showed how to compute actions in terms of
+ body-centered predicates which relied averate touch activation of
+ pre-defined regions of the worm's skin. What if, instead of recieving
+ touch pre-grouped into the six faces of each worm segment, the true
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/thesis/dylan-reject.diff	Sun Mar 30 10:41:18 2014 -0400
@@ -0,0 +1,11 @@
+@@ -3234,8 +3354,8 @@
+
+ - =CORTEX=, a system for creating simulated creatures with rich
+ senses.
+- - =EMPATH=, a program for recognizing actions by imagining sensory
+- experience.
++ - =EMPATH=, a program for recognizing actions by aligning them with
++ personal sensory experiences.
+
+ # An anatomical joke:
+ # - Training