changeset 511:07c3feb32df3

go over changes by Dylan.
author Robert McIntyre <rlm@mit.edu>
date Sun, 30 Mar 2014 10:17:43 -0400 (2014-03-30)
parents f639e2139ce2
children 8b962ab418c8
files thesis/cortex.org thesis/dxh-cortex-diff.diff
diffstat 2 files changed, 611 insertions(+), 74 deletions(-)
     1.1 --- a/thesis/cortex.org	Sun Mar 30 01:34:43 2014 -0400
     1.2 +++ b/thesis/cortex.org	Sun Mar 30 10:17:43 2014 -0400
     1.3 @@ -41,49 +41,46 @@
     1.4      [[./images/aurellem-gray.png]]
     1.5  
     1.6  
     1.7 -* Empathy and Embodiment as problem solving strategies
     1.8 +* Empathy \& Embodiment: problem solving strategies
     1.9    
    1.10 -  By the end of this thesis, you will have seen a novel approach to
    1.11 -  interpreting video using embodiment and empathy. You will have also
    1.12 -  seen one way to efficiently implement empathy for embodied
    1.13 -  creatures. Finally, you will become familiar with =CORTEX=, a system
    1.14 -  for designing and simulating creatures with rich senses, which you
    1.15 -  may choose to use in your own research.
    1.16 -  
    1.17 -  This is the core vision of my thesis: That one of the important ways
    1.18 -  in which we understand others is by imagining ourselves in their
    1.19 -  position and emphatically feeling experiences relative to our own
    1.20 -  bodies. By understanding events in terms of our own previous
    1.21 -  corporeal experience, we greatly constrain the possibilities of what
    1.22 -  would otherwise be an unwieldy exponential search. This extra
    1.23 -  constraint can be the difference between easily understanding what
    1.24 -  is happening in a video and being completely lost in a sea of
    1.25 -  incomprehensible color and movement.
    1.26 -  
    1.27 -** Recognizing actions in video is extremely difficult
    1.28 -
    1.29 -   Consider for example the problem of determining what is happening
    1.30 -   in a video of which this is one frame:
    1.31 -
    1.32 +** The problem: recognizing actions in video is extremely difficult
    1.33 +# developing / requires useful representations
    1.34 +   
    1.35 +   Examine the following collection of images. As you, and indeed very
    1.36 +   young children, can easily determine, each one is a picture of
    1.37 +   someone drinking. 
    1.38 +
    1.39 +   # dxh: cat, cup, drinking fountain, rain, straw, coconut
    1.40     #+caption: A cat drinking some water. Identifying this action is 
    1.41 -   #+caption: beyond the state of the art for computers.
    1.42 +   #+caption: beyond the capabilities of existing computer vision systems.
    1.43     #+ATTR_LaTeX: :width 7cm
    1.44     [[./images/cat-drinking.jpg]]
    1.45 +     
    1.46 +   Nevertheless, it is beyond the state of the art for a computer
    1.47 +   vision program to describe what's happening in each of these
    1.48 +   images, or what's common to them. Part of the problem is that many
    1.49 +   computer vision systems focus on pixel-level details or probability
    1.50 +   distributions of pixels, with little focus on [...]
    1.51 +
    1.52 +
    1.53 +   In fact, the contents of a scene may have much less to do with pixel
    1.54 +   probabilities than with recognizing various affordances: things you
    1.55 +   can move, objects you can grasp, spaces that can be filled
    1.56 +   (Gibson). For example, what processes might enable you to see the
    1.57 +   chair in figure \ref{hidden-chair}? 
    1.58 +   # Or suppose that you are building a program that recognizes chairs.
    1.59 +   # How could you ``see'' the chair ?
    1.60     
    1.61 -   It is currently impossible for any computer program to reliably
    1.62 -   label such a video as ``drinking''. And rightly so -- it is a very
    1.63 -   hard problem! What features can you describe in terms of low level
    1.64 -   functions of pixels that can even begin to describe at a high level
    1.65 -   what is happening here?
    1.66 -  
    1.67 -   Or suppose that you are building a program that recognizes chairs.
    1.68 -   How could you ``see'' the chair in figure \ref{hidden-chair}?
    1.69 -   
    1.70 +   # dxh: blur chair
    1.71     #+caption: The chair in this image is quite obvious to humans, but I 
    1.72     #+caption: doubt that any modern computer vision program can find it.
    1.73     #+name: hidden-chair
    1.74     #+ATTR_LaTeX: :width 10cm
    1.75     [[./images/fat-person-sitting-at-desk.jpg]]
    1.76 +
    1.77 +
    1.78 +   
    1.79 +
    1.80     
    1.81     Finally, how is it that you can easily tell the difference between
    1.82     how the girls /muscles/ are working in figure \ref{girl}?
    1.83 @@ -95,10 +92,13 @@
    1.84     #+ATTR_LaTeX: :width 7cm
    1.85     [[./images/wall-push.png]]
    1.86    
    1.87 +
    1.88 +
    1.89 +
    1.90     Each of these examples tells us something about what might be going
    1.91     on in our minds as we easily solve these recognition problems.
    1.92     
    1.93 -   The hidden chairs show us that we are strongly triggered by cues
    1.94 +   The hidden chair shows us that we are strongly triggered by cues
    1.95     relating to the position of human bodies, and that we can determine
    1.96     the overall physical configuration of a human body even if much of
    1.97     that body is occluded.
    1.98 @@ -109,10 +109,107 @@
    1.99     most positions, and we can easily project this self-knowledge to
   1.100     imagined positions triggered by images of the human body.
   1.101  
   1.102 -** =EMPATH= neatly solves recognition problems  
   1.103 +** A step forward: the sensorimotor-centered approach
   1.104 +# ** =EMPATH= recognizes what creatures are doing
   1.105 +# neatly solves recognition problems  
   1.106 +   In this thesis, I explore the idea that our knowledge of our own
   1.107 +   bodies enables us to recognize the actions of others. 
   1.108 +
   1.109 +   First, I built a system for constructing virtual creatures with
   1.110 +   physiologically plausible sensorimotor systems and detailed
   1.111 +   environments. The result is =CORTEX=, which is described in section
   1.112 +   \ref{sec-2}. (=CORTEX= was built to be flexible and useful to other
   1.113 +   AI researchers; it is provided in full with detailed instructions
   1.114 +   on the web [here].)
   1.115 +
   1.116 +   Next, I wrote routines which enabled a simple worm-like creature to
   1.117 +   infer the actions of a second worm-like creature, using only its
   1.118 +   own prior sensorimotor experiences and knowledge of the second
   1.119 +   worm's joint positions. This program, =EMPATH=, is described in
   1.120 +   section \ref{sec-3}, and the key results of this experiment are
   1.121 +   summarized below.
   1.122 +
   1.123 +  #+caption: From only \emph{proprioceptive} data, =EMPATH= was able to infer 
   1.124 +  #+caption: the complete sensory experience and classify these four poses.
   1.125 +  #+caption: The last image is a composite, depicting the intermediate stages of \emph{wriggling}.
   1.126 +  #+name: worm-recognition-intro-2
   1.127 +  #+ATTR_LaTeX: :width 15cm
   1.128 +   [[./images/empathy-1.png]]
   1.129 +
   1.130 +   # =CORTEX= provides a language for describing the sensorimotor
   1.131 +   # experiences of various creatures. 
   1.132 +
   1.133 +   # Next, I developed an experiment to test the power of =CORTEX='s
   1.134 +   # sensorimotor-centered language for solving recognition problems. As
   1.135 +   # a proof of concept, I wrote routines which enabled a simple
   1.136 +   # worm-like creature to infer the actions of a second worm-like
   1.137 +   # creature, using only its own previous sensorimotor experiences and
   1.138 +   # knowledge of the second worm's joints (figure
   1.139 +   # \ref{worm-recognition-intro-2}). The result of this proof of
   1.140 +   # concept was the program =EMPATH=, described in section
   1.141 +   # \ref{sec-3}. The key results of this
   1.142 +
   1.143 +   # Using only first-person sensorimotor experiences and third-person
   1.144 +   # proprioceptive data, 
   1.145 +
   1.146 +*** Key results
   1.147 +   - After one-shot supervised training, =EMPATH= was able to recognize a
   1.148 +     wide variety of static poses and dynamic actions---ranging from
   1.149 +     curling in a circle to wriggling with a particular frequency ---
   1.150 +     with 95\% accuracy.
   1.151 +   - These results were completely independent of viewing angle
   1.152 +     because the underlying body-centered language is fundamentally
   1.153 +     independent; once an action is learned, it can be recognized
   1.154 +     equally well from any viewing angle.
   1.155 +   - =EMPATH= is surprisingly short; the sensorimotor-centered
   1.156 +     language provided by =CORTEX= resulted in extremely economical
   1.157 +     recognition routines --- about 0000 lines in all --- suggesting
   1.158 +     that such representations are very powerful, and often
   1.159 +     indispensable for the types of recognition tasks considered here.
   1.160 +   - Although for expediency's sake, I relied on direct knowledge of
   1.161 +     joint positions in this proof of concept, it would be
   1.162 +     straightforward to extend =EMPATH= so that it (more
   1.163 +     realistically) infers joint positions from its visual data.
   1.164 +
   1.165 +# because the underlying language is fundamentally orientation-independent
   1.166 +
   1.167 +# recognize the actions of a worm with 95\% accuracy. The
   1.168 +#      recognition tasks 
   1.169     
   1.170 -   I propose a system that can express the types of recognition
   1.171 -   problems above in a form amenable to computation. It is split into
   1.172 +
   1.173 +
   1.174 +
   1.175 +   [Talk about these results and what you find promising about them]
   1.176 +
   1.177 +** Roadmap
   1.178 +   [I'm going to explain how =CORTEX= works, then break down how
   1.179 +   =EMPATH= does its thing. Because the details reveal such-and-such
   1.180 +   about the approach.]
   1.181 +
   1.182 +   # The success of this simple proof-of-concept offers a tantalizing
   1.183 +
   1.184 +
   1.185 +   # explore the idea 
   1.186 +   # The key contribution of this thesis is the idea that body-centered
   1.187 +   # representations (which express 
   1.188 +
   1.189 +
   1.190 +   # the
   1.191 +   # body-centered approach --- in which I try to determine what's
   1.192 +   # happening in a scene by bringing it into registration with my own
   1.193 +   # bodily experiences --- are indispensable for recognizing what
   1.194 +   # creatures are doing in a scene.
   1.195 +
   1.196 +* COMMENT
   1.197 +# body-centered language
   1.198 +   
   1.199 +   In this thesis, I'll describe =EMPATH=, which solves a certain
   1.200 +   class of recognition problems 
   1.201 +
   1.202 +   The key idea is to use self-centered (or first-person) language.
   1.203 +
   1.204 +   I have built a system that can express the types of recognition
   1.205 +   problems in a form amenable to computation. It is split into
   1.206     four parts:
   1.207  
   1.208     - Free/Guided Play :: The creature moves around and experiences the
   1.209 @@ -286,14 +383,14 @@
   1.210       code to create a creature, and can use a wide library of
   1.211       pre-existing blender models as a base for your own creatures.
   1.212  
   1.213 -   - =CORTEX= implements a wide variety of senses, including touch,
   1.214 +   - =CORTEX= implements a wide variety of senses: touch,
   1.215       proprioception, vision, hearing, and muscle tension. Complicated
   1.216       senses like touch, and vision involve multiple sensory elements
   1.217       embedded in a 2D surface. You have complete control over the
   1.218       distribution of these sensor elements through the use of simple
   1.219       png image files. In particular, =CORTEX= implements more
   1.220       comprehensive hearing than any other creature simulation system
   1.221 -     available. 
   1.222 +     available.
   1.223  
   1.224     - =CORTEX= supports any number of creatures and any number of
   1.225       senses. Time in =CORTEX= dialates so that the simulated creatures
   1.226 @@ -353,7 +450,24 @@
   1.227     \end{sidewaysfigure}
   1.228  #+END_LaTeX
   1.229  
   1.230 -** Contributions
   1.231 +** Road map
   1.232 +
   1.233 +   By the end of this thesis, you will have seen a novel approach to
   1.234 +  interpreting video using embodiment and empathy. You will have also
   1.235 +  seen one way to efficiently implement empathy for embodied
   1.236 +  creatures. Finally, you will become familiar with =CORTEX=, a system
   1.237 +  for designing and simulating creatures with rich senses, which you
   1.238 +  may choose to use in your own research.
   1.239 +  
   1.240 +  This is the core vision of my thesis: That one of the important ways
   1.241 +  in which we understand others is by imagining ourselves in their
   1.242 +  position and empathically feeling experiences relative to our own
   1.243 +  bodies. By understanding events in terms of our own previous
   1.244 +  corporeal experience, we greatly constrain the possibilities of what
   1.245 +  would otherwise be an unwieldy exponential search. This extra
   1.246 +  constraint can be the difference between easily understanding what
   1.247 +  is happening in a video and being completely lost in a sea of
   1.248 +  incomprehensible color and movement.
   1.249  
   1.250     - I built =CORTEX=, a comprehensive platform for embodied AI
   1.251       experiments. =CORTEX= supports many features lacking in other
   1.252 @@ -363,18 +477,22 @@
   1.253     - I built =EMPATH=, which uses =CORTEX= to identify the actions of
   1.254       a worm-like creature using a computational model of empathy.
   1.255     
   1.256 -* Building =CORTEX=
   1.257 -
   1.258 -  I intend for =CORTEX= to be used as a general-purpose library for
   1.259 -  building creatures and outfitting them with senses, so that it will
   1.260 -  be useful for other researchers who want to test out ideas of their
   1.261 -  own. To this end, wherver I have had to make archetictural choices
   1.262 -  about =CORTEX=, I have chosen to give as much freedom to the user as
   1.263 -  possible, so that =CORTEX= may be used for things I have not
   1.264 -  forseen.
   1.265 -
   1.266 -** Simulation or Reality?
   1.267 -   
   1.268 +
   1.269 +* Designing =CORTEX=
   1.270 +  In this section, I outline the design decisions that went into
   1.271 +  making =CORTEX=, along with some details about its
   1.272 +  implementation. (A practical guide to getting started with =CORTEX=,
   1.273 +  which skips over the history and implementation details presented
   1.274 +  here, is provided in an appendix \ref{} at the end of this paper.)
   1.275 +
   1.276 +  Throughout this project, I intended for =CORTEX= to be flexible and
   1.277 +  extensible enough to be useful for other researchers who want to
   1.278 +  test out ideas of their own. To this end, wherever I have had to make
   1.279 +  architectural choices about =CORTEX=, I have chosen to give as much
   1.280 +  freedom to the user as possible, so that =CORTEX= may be used for
   1.281 +  things I have not foreseen.
   1.282 +
   1.283 +** Building in simulation versus reality
   1.284     The most important archetictural decision of all is the choice to
   1.285     use a computer-simulated environemnt in the first place! The world
   1.286     is a vast and rich place, and for now simulations are a very poor
   1.287 @@ -436,7 +554,7 @@
   1.288      doing everything in software is far cheaper than building custom
   1.289      real-time hardware. All you need is a laptop and some patience.
   1.290  
   1.291 -** Because of Time, simulation is perferable to reality
   1.292 +** Simulated time enables rapid prototyping and complex scenes 
   1.293  
   1.294     I envision =CORTEX= being used to support rapid prototyping and
   1.295     iteration of ideas. Even if I could put together a well constructed
   1.296 @@ -459,8 +577,8 @@
   1.297     simulations of very simple creatures in =CORTEX= generally run at
   1.298     40x on my machine!
   1.299  
   1.300 -** What is a sense?
   1.301 -   
   1.302 +** All sense organs are two-dimensional surfaces
   1.303 +# What is a sense?   
   1.304     If =CORTEX= is to support a wide variety of senses, it would help
   1.305     to have a better understanding of what a ``sense'' actually is!
   1.306     While vision, touch, and hearing all seem like they are quite
   1.307 @@ -956,7 +1074,7 @@
   1.308      #+ATTR_LaTeX: :width 15cm
   1.309      [[./images/physical-hand.png]]
   1.310  
   1.311 -** Eyes reuse standard video game components
   1.312 +** Sight reuses standard video game components...
   1.313  
   1.314     Vision is one of the most important senses for humans, so I need to
   1.315     build a simulated sense of vision for my AI. I will do this with
   1.316 @@ -1257,8 +1375,8 @@
   1.317      community and is now (in modified form) part of a system for
   1.318      capturing in-game video to a file.
   1.319  
   1.320 -** Hearing is hard; =CORTEX= does it right
   1.321 -   
   1.322 +** ...but hearing must be built from scratch
   1.323 +# is hard; =CORTEX= does it right
   1.324     At the end of this section I will have simulated ears that work the
   1.325     same way as the simulated eyes in the last section. I will be able to
   1.326     place any number of ear-nodes in a blender file, and they will bind to
   1.327 @@ -1565,7 +1683,7 @@
   1.328      jMonkeyEngine3 community and is used to record audio for demo
   1.329      videos.
   1.330  
   1.331 -** Touch uses hundreds of hair-like elements
   1.332 +** Hundreds of hair-like elements provide a sense of touch
   1.333  
   1.334     Touch is critical to navigation and spatial reasoning and as such I
   1.335     need a simulated version of it to give to my AI creatures.
   1.336 @@ -2059,7 +2177,7 @@
   1.337      #+ATTR_LaTeX: :width 15cm
   1.338      [[./images/touch-cube.png]]
   1.339  
   1.340 -** Proprioception is the sense that makes everything ``real''
   1.341 +** Proprioception provides knowledge of your own body's position
   1.342  
   1.343     Close your eyes, and touch your nose with your right index finger.
   1.344     How did you do it? You could not see your hand, and neither your
   1.345 @@ -2193,7 +2311,7 @@
   1.346      #+ATTR_LaTeX: :width 11cm
   1.347      [[./images/proprio.png]]
   1.348  
   1.349 -** Muscles are both effectors and sensors
   1.350 +** Muscles contain both sensors and effectors
   1.351  
   1.352     Surprisingly enough, terrestrial creatures only move by using
   1.353     torque applied about their joints. There's not a single straight
   1.354 @@ -2440,7 +2558,8 @@
   1.355          hard control problems without worrying about physics or
   1.356          senses.
   1.357  
   1.358 -* Empathy in a simulated worm
   1.359 +* =EMPATH=: the simulated worm experiment
   1.360 +# Empathy in a simulated worm
   1.361  
   1.362    Here I develop a computational model of empathy, using =CORTEX= as a
   1.363    base. Empathy in this context is the ability to observe another
   1.364 @@ -2732,7 +2851,7 @@
   1.365     provided by an experience vector and reliably infering the rest of
   1.366     the senses.
   1.367  
   1.368 -** Empathy is the process of tracing though \Phi-space 
   1.369 +** ``Empathy'' requires retracing steps through \Phi-space
   1.370  
   1.371     Here is the core of a basic empathy algorithm, starting with an
   1.372     experience vector:
   1.373 @@ -2888,7 +3007,7 @@
   1.374     #+end_src
   1.375     #+end_listing
   1.376    
   1.377 -** Efficient action recognition with =EMPATH=
   1.378 +** =EMPATH= recognizes actions efficiently
   1.379     
   1.380     To use =EMPATH= with the worm, I first need to gather a set of
   1.381     experiences from the worm that includes the actions I want to
   1.382 @@ -3044,9 +3163,9 @@
   1.383    to interpretation, and dissaggrement between empathy and experience
   1.384    is more excusable.
   1.385  
   1.386 -** Digression: bootstrapping touch using free exploration
   1.387 -
   1.388 -   In the previous section I showed how to compute actions in terms of
   1.389 +** Digression: Learn touch sensor layout through haptic experimentation, instead 
   1.390 +# Bootstrapping touch using free exploration
   1.391 +In the previous section I showed how to compute actions in terms of
   1.392     body-centered predicates which relied averate touch activation of
   1.393     pre-defined regions of the worm's skin. What if, instead of recieving
   1.394     touch pre-grouped into the six faces of each worm segment, the true
   1.395 @@ -3210,13 +3329,14 @@
   1.396    
   1.397    In this thesis you have seen the =CORTEX= system, a complete
   1.398    environment for creating simulated creatures. You have seen how to
   1.399 -  implement five senses including touch, proprioception, hearing,
   1.400 -  vision, and muscle tension. You have seen how to create new creatues
   1.401 -  using blender, a 3D modeling tool. I hope that =CORTEX= will be
   1.402 -  useful in further research projects. To this end I have included the
   1.403 -  full source to =CORTEX= along with a large suite of tests and
   1.404 -  examples. I have also created a user guide for =CORTEX= which is
   1.405 -  inculded in an appendix to this thesis.
   1.406 +  implement five senses: touch, proprioception, hearing, vision, and
   1.407 +  muscle tension. You have seen how to create new creatures using
   1.408 +  blender, a 3D modeling tool. I hope that =CORTEX= will be useful in
   1.409 +  further research projects. To this end I have included the full
   1.410 +  source to =CORTEX= along with a large suite of tests and examples. I
   1.411 +  have also created a user guide for =CORTEX= which is included in an
   1.412 +  appendix to this thesis \ref{}.
   1.413 +# dxh: todo reference appendix
   1.414  
   1.415    You have also seen how I used =CORTEX= as a platform to attach the
   1.416    /action recognition/ problem, which is the problem of recognizing
     2.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     2.2 +++ b/thesis/dxh-cortex-diff.diff	Sun Mar 30 10:17:43 2014 -0400
     2.3 @@ -0,0 +1,417 @@
     2.4 +diff -r f639e2139ce2 thesis/cortex.org
     2.5 +--- a/thesis/cortex.org	Sun Mar 30 01:34:43 2014 -0400
     2.6 ++++ b/thesis/cortex.org	Sun Mar 30 10:07:17 2014 -0400
     2.7 +@@ -41,49 +41,46 @@
     2.8 +     [[./images/aurellem-gray.png]]
     2.9 + 
    2.10 + 
    2.11 +-* Empathy and Embodiment as problem solving strategies
    2.12 ++* Empathy \& Embodiment: problem solving strategies
    2.13 +   
    2.14 +-  By the end of this thesis, you will have seen a novel approach to
    2.15 +-  interpreting video using embodiment and empathy. You will have also
    2.16 +-  seen one way to efficiently implement empathy for embodied
    2.17 +-  creatures. Finally, you will become familiar with =CORTEX=, a system
    2.18 +-  for designing and simulating creatures with rich senses, which you
    2.19 +-  may choose to use in your own research.
    2.20 +-  
    2.21 +-  This is the core vision of my thesis: That one of the important ways
    2.22 +-  in which we understand others is by imagining ourselves in their
    2.23 +-  position and emphatically feeling experiences relative to our own
    2.24 +-  bodies. By understanding events in terms of our own previous
    2.25 +-  corporeal experience, we greatly constrain the possibilities of what
    2.26 +-  would otherwise be an unwieldy exponential search. This extra
    2.27 +-  constraint can be the difference between easily understanding what
    2.28 +-  is happening in a video and being completely lost in a sea of
    2.29 +-  incomprehensible color and movement.
    2.30 +-  
    2.31 +-** Recognizing actions in video is extremely difficult
    2.32 +-
    2.33 +-   Consider for example the problem of determining what is happening
    2.34 +-   in a video of which this is one frame:
    2.35 +-
    2.36 ++** The problem: recognizing actions in video is extremely difficult
    2.37 ++# developing / requires useful representations
    2.38 ++   
    2.39 ++   Examine the following collection of images. As you, and indeed very
    2.40 ++   young children, can easily determine, each one is a picture of
    2.41 ++   someone drinking. 
    2.42 ++
    2.43 ++   # dxh: cat, cup, drinking fountain, rain, straw, coconut
    2.44 +    #+caption: A cat drinking some water. Identifying this action is 
    2.45 +-   #+caption: beyond the state of the art for computers.
    2.46 ++   #+caption: beyond the capabilities of existing computer vision systems.
    2.47 +    #+ATTR_LaTeX: :width 7cm
    2.48 +    [[./images/cat-drinking.jpg]]
    2.49 ++     
    2.50 ++   Nevertheless, it is beyond the state of the art for a computer
    2.51 ++   vision program to describe what's happening in each of these
    2.52 ++   images, or what's common to them. Part of the problem is that many
    2.53 ++   computer vision systems focus on pixel-level details or probability
    2.54 ++   distributions of pixels, with little focus on [...]
    2.55 ++
    2.56 ++
    2.57 ++   In fact, the contents of a scene may have much less to do with pixel
    2.58 ++   probabilities than with recognizing various affordances: things you
    2.59 ++   can move, objects you can grasp, spaces that can be filled
    2.60 ++   (Gibson). For example, what processes might enable you to see the
    2.61 ++   chair in figure \ref{hidden-chair}? 
    2.62 ++   # Or suppose that you are building a program that recognizes chairs.
    2.63 ++   # How could you ``see'' the chair ?
    2.64 +    
    2.65 +-   It is currently impossible for any computer program to reliably
    2.66 +-   label such a video as ``drinking''. And rightly so -- it is a very
    2.67 +-   hard problem! What features can you describe in terms of low level
    2.68 +-   functions of pixels that can even begin to describe at a high level
    2.69 +-   what is happening here?
    2.70 +-  
    2.71 +-   Or suppose that you are building a program that recognizes chairs.
    2.72 +-   How could you ``see'' the chair in figure \ref{hidden-chair}?
    2.73 +-   
    2.74 ++   # dxh: blur chair
    2.75 +    #+caption: The chair in this image is quite obvious to humans, but I 
    2.76 +    #+caption: doubt that any modern computer vision program can find it.
    2.77 +    #+name: hidden-chair
    2.78 +    #+ATTR_LaTeX: :width 10cm
    2.79 +    [[./images/fat-person-sitting-at-desk.jpg]]
    2.80 ++
    2.81 ++
    2.82 ++   
    2.83 ++
    2.84 +    
    2.85 +    Finally, how is it that you can easily tell the difference between
    2.86 +    how the girls /muscles/ are working in figure \ref{girl}?
    2.87 +@@ -95,10 +92,13 @@
    2.88 +    #+ATTR_LaTeX: :width 7cm
    2.89 +    [[./images/wall-push.png]]
    2.90 +   
    2.91 ++
    2.92 ++
    2.93 ++
    2.94 +    Each of these examples tells us something about what might be going
    2.95 +    on in our minds as we easily solve these recognition problems.
    2.96 +    
    2.97 +-   The hidden chairs show us that we are strongly triggered by cues
    2.98 ++   The hidden chair shows us that we are strongly triggered by cues
    2.99 +    relating to the position of human bodies, and that we can determine
   2.100 +    the overall physical configuration of a human body even if much of
   2.101 +    that body is occluded.
   2.102 +@@ -109,10 +109,107 @@
   2.103 +    most positions, and we can easily project this self-knowledge to
   2.104 +    imagined positions triggered by images of the human body.
   2.105 + 
   2.106 +-** =EMPATH= neatly solves recognition problems  
   2.107 ++** A step forward: the sensorimotor-centered approach
   2.108 ++# ** =EMPATH= recognizes what creatures are doing
   2.109 ++# neatly solves recognition problems  
   2.110 ++   In this thesis, I explore the idea that our knowledge of our own
   2.111 ++   bodies enables us to recognize the actions of others. 
   2.112 ++
   2.113 ++   First, I built a system for constructing virtual creatures with
   2.114 ++   physiologically plausible sensorimotor systems and detailed
   2.115 ++   environments. The result is =CORTEX=, which is described in section
   2.116 ++   \ref{sec-2}. (=CORTEX= was built to be flexible and useful to other
   2.117 ++   AI researchers; it is provided in full with detailed instructions
   2.118 ++   on the web [here].)
   2.119 ++
   2.120 ++   Next, I wrote routines which enabled a simple worm-like creature to
   2.121 ++   infer the actions of a second worm-like creature, using only its
   2.122 ++   own prior sensorimotor experiences and knowledge of the second
   2.123 ++   worm's joint positions. This program, =EMPATH=, is described in
   2.124 ++   section \ref{sec-3}, and the key results of this experiment are
   2.125 ++   summarized below.
   2.126 ++
   2.127 ++  #+caption: From only \emph{proprioceptive} data, =EMPATH= was able to infer 
   2.128 ++  #+caption: the complete sensory experience and classify these four poses.
   2.129 ++  #+caption: The last image is a composite, depicting the intermediate stages of \emph{wriggling}.
   2.130 ++  #+name: worm-recognition-intro-2
   2.131 ++  #+ATTR_LaTeX: :width 15cm
   2.132 ++   [[./images/empathy-1.png]]
   2.133 ++
   2.134 ++   # =CORTEX= provides a language for describing the sensorimotor
   2.135 ++   # experiences of various creatures. 
   2.136 ++
   2.137 ++   # Next, I developed an experiment to test the power of =CORTEX='s
   2.138 ++   # sensorimotor-centered language for solving recognition problems. As
   2.139 ++   # a proof of concept, I wrote routines which enabled a simple
   2.140 ++   # worm-like creature to infer the actions of a second worm-like
   2.141 ++   # creature, using only its own previous sensorimotor experiences and
   2.142 ++   # knowledge of the second worm's joints (figure
   2.143 ++   # \ref{worm-recognition-intro-2}). The result of this proof of
   2.144 ++   # concept was the program =EMPATH=, described in section
   2.145 ++   # \ref{sec-3}. The key results of this
   2.146 ++
   2.147 ++   # Using only first-person sensorimotor experiences and third-person
   2.148 ++   # proprioceptive data, 
   2.149 ++
   2.150 ++*** Key results
   2.151 ++   - After one-shot supervised training, =EMPATH= was able to recognize a
   2.152 ++     wide variety of static poses and dynamic actions---ranging from
   2.153 ++     curling in a circle to wriggling with a particular frequency ---
   2.154 ++     with 95\% accuracy.
   2.155 ++   - These results were completely independent of viewing angle
   2.156 ++     because the underlying body-centered language fundamentally is;
   2.157 ++     once an action is learned, it can be recognized equally well from
   2.158 ++     any viewing angle.
   2.159 ++   - =EMPATH= is surprisingly short; the sensorimotor-centered
   2.160 ++     language provided by =CORTEX= resulted in extremely economical
   2.161 ++     recognition routines --- about 0000 lines in all --- suggesting
   2.162 ++     that such representations are very powerful, and often
   2.163 ++     indispensable for the types of recognition tasks considered here.
   2.164 ++   - Although for expediency's sake, I relied on direct knowledge of
   2.165 ++     joint positions in this proof of concept, it would be
   2.166 ++     straightforward to extend =EMPATH= so that it (more
   2.167 ++     realistically) infers joint positions from its visual data.
   2.168 ++
   2.169 ++# because the underlying language is fundamentally orientation-independent
   2.170 ++
   2.171 ++# recognize the actions of a worm with 95\% accuracy. The
   2.172 ++#      recognition tasks 
   2.173 +    
   2.174 +-   I propose a system that can express the types of recognition
   2.175 +-   problems above in a form amenable to computation. It is split into
   2.176 ++
   2.177 ++
   2.178 ++
   2.179 ++   [Talk about these results and what you find promising about them]
   2.180 ++
   2.181 ++** Roadmap
   2.182 ++   [I'm going to explain how =CORTEX= works, then break down how
   2.183 ++   =EMPATH= does its thing. Because the details reveal such-and-such
   2.184 ++   about the approach.]
   2.185 ++
   2.186 ++   # The success of this simple proof-of-concept offers a tantalizing
   2.187 ++
   2.188 ++
   2.189 ++   # explore the idea 
   2.190 ++   # The key contribution of this thesis is the idea that body-centered
   2.191 ++   # representations (which express 
   2.192 ++
   2.193 ++
   2.194 ++   # the
   2.195 ++   # body-centered approach --- in which I try to determine what's
   2.196 ++   # happening in a scene by bringing it into registration with my own
   2.197 ++   # bodily experiences --- are indispensable for recognizing what
   2.198 ++   # creatures are doing in a scene.
   2.199 ++
   2.200 ++* COMMENT
   2.201 ++# body-centered language
   2.202 ++   
   2.203 ++   In this thesis, I'll describe =EMPATH=, which solves a certain
   2.204 ++   class of recognition problems 
   2.205 ++
   2.206 ++   The key idea is to use self-centered (or first-person) language.
   2.207 ++
   2.208 ++   I have built a system that can express the types of recognition
   2.209 ++   problems in a form amenable to computation. It is split into
   2.210 +    four parts:
   2.211 + 
   2.212 +    - Free/Guided Play :: The creature moves around and experiences the
   2.213 +@@ -286,14 +383,14 @@
   2.214 +      code to create a creature, and can use a wide library of
   2.215 +      pre-existing blender models as a base for your own creatures.
   2.216 + 
   2.217 +-   - =CORTEX= implements a wide variety of senses, including touch,
   2.218 ++   - =CORTEX= implements a wide variety of senses: touch,
   2.219 +      proprioception, vision, hearing, and muscle tension. Complicated
   2.220 +      senses like touch, and vision involve multiple sensory elements
   2.221 +      embedded in a 2D surface. You have complete control over the
   2.222 +      distribution of these sensor elements through the use of simple
   2.223 +      png image files. In particular, =CORTEX= implements more
   2.224 +      comprehensive hearing than any other creature simulation system
   2.225 +-     available. 
   2.226 ++     available.
   2.227 + 
   2.228 +    - =CORTEX= supports any number of creatures and any number of
   2.229 +      senses. Time in =CORTEX= dialates so that the simulated creatures
   2.230 +@@ -353,7 +450,24 @@
   2.231 +    \end{sidewaysfigure}
   2.232 + #+END_LaTeX
   2.233 + 
   2.234 +-** Contributions
   2.235 ++** Road map
   2.236 ++
   2.237 ++   By the end of this thesis, you will have seen a novel approach to
   2.238 ++  interpreting video using embodiment and empathy. You will have also
   2.239 ++  seen one way to efficiently implement empathy for embodied
   2.240 ++  creatures. Finally, you will become familiar with =CORTEX=, a system
   2.241 ++  for designing and simulating creatures with rich senses, which you
   2.242 ++  may choose to use in your own research.
   2.243 ++  
   2.244 ++  This is the core vision of my thesis: That one of the important ways
   2.245 ++  in which we understand others is by imagining ourselves in their
   2.246 ++  position and empathically feeling experiences relative to our own
   2.247 ++  bodies. By understanding events in terms of our own previous
   2.248 ++  corporeal experience, we greatly constrain the possibilities of what
   2.249 ++  would otherwise be an unwieldy exponential search. This extra
   2.250 ++  constraint can be the difference between easily understanding what
   2.251 ++  is happening in a video and being completely lost in a sea of
   2.252 ++  incomprehensible color and movement.
   2.253 + 
   2.254 +    - I built =CORTEX=, a comprehensive platform for embodied AI
   2.255 +      experiments. =CORTEX= supports many features lacking in other
   2.256 +@@ -363,18 +477,22 @@
   2.257 +    - I built =EMPATH=, which uses =CORTEX= to identify the actions of
   2.258 +      a worm-like creature using a computational model of empathy.
   2.259 +    
   2.260 +-* Building =CORTEX=
   2.261 +-
   2.262 +-  I intend for =CORTEX= to be used as a general-purpose library for
   2.263 +-  building creatures and outfitting them with senses, so that it will
   2.264 +-  be useful for other researchers who want to test out ideas of their
   2.265 +-  own. To this end, wherver I have had to make archetictural choices
   2.266 +-  about =CORTEX=, I have chosen to give as much freedom to the user as
   2.267 +-  possible, so that =CORTEX= may be used for things I have not
   2.268 +-  forseen.
   2.269 +-
   2.270 +-** Simulation or Reality?
   2.271 +-   
   2.272 ++
   2.273 ++* Designing =CORTEX=
   2.274 ++  In this section, I outline the design decisions that went into
   2.275 ++  making =CORTEX=, along with some details about its
   2.276 ++  implementation. (A practical guide to getting started with =CORTEX=,
   2.277 ++  which skips over the history and implementation details presented
   2.278 ++  here, is provided in an appendix \ref{} at the end of this paper.)
   2.279 ++
   2.280 ++  Throughout this project, I intended for =CORTEX= to be flexible and
   2.281 ++  extensible enough to be useful for other researchers who want to
   2.282 ++  test out ideas of their own. To this end, wherever I have had to make
   2.283 ++  architectural choices about =CORTEX=, I have chosen to give as much
   2.284 ++  freedom to the user as possible, so that =CORTEX= may be used for
   2.285 ++  things I have not foreseen.
   2.286 ++
   2.287 ++** Building in simulation versus reality
   2.288 +    The most important archetictural decision of all is the choice to
   2.289 +    use a computer-simulated environemnt in the first place! The world
   2.290 +    is a vast and rich place, and for now simulations are a very poor
   2.291 +@@ -436,7 +554,7 @@
   2.292 +     doing everything in software is far cheaper than building custom
   2.293 +     real-time hardware. All you need is a laptop and some patience.
   2.294 + 
   2.295 +-** Because of Time, simulation is perferable to reality
   2.296 ++** Simulated time enables rapid prototyping and complex scenes 
   2.297 + 
   2.298 +    I envision =CORTEX= being used to support rapid prototyping and
   2.299 +    iteration of ideas. Even if I could put together a well constructed
   2.300 +@@ -459,8 +577,8 @@
   2.301 +    simulations of very simple creatures in =CORTEX= generally run at
   2.302 +    40x on my machine!
   2.303 + 
   2.304 +-** What is a sense?
   2.305 +-   
   2.306 ++** All sense organs are two-dimensional surfaces
   2.307 ++# What is a sense?   
   2.308 +    If =CORTEX= is to support a wide variety of senses, it would help
   2.309 +    to have a better understanding of what a ``sense'' actually is!
   2.310 +    While vision, touch, and hearing all seem like they are quite
   2.311 +@@ -956,7 +1074,7 @@
   2.312 +     #+ATTR_LaTeX: :width 15cm
   2.313 +     [[./images/physical-hand.png]]
   2.314 + 
   2.315 +-** Eyes reuse standard video game components
   2.316 ++** Sight reuses standard video game components...
   2.317 + 
   2.318 +    Vision is one of the most important senses for humans, so I need to
   2.319 +    build a simulated sense of vision for my AI. I will do this with
   2.320 +@@ -1257,8 +1375,8 @@
   2.321 +     community and is now (in modified form) part of a system for
   2.322 +     capturing in-game video to a file.
   2.323 + 
   2.324 +-** Hearing is hard; =CORTEX= does it right
   2.325 +-   
   2.326 ++** ...but hearing must be built from scratch
   2.327 ++# is hard; =CORTEX= does it right
   2.328 +    At the end of this section I will have simulated ears that work the
   2.329 +    same way as the simulated eyes in the last section. I will be able to
   2.330 +    place any number of ear-nodes in a blender file, and they will bind to
   2.331 +@@ -1565,7 +1683,7 @@
   2.332 +     jMonkeyEngine3 community and is used to record audio for demo
   2.333 +     videos.
   2.334 + 
   2.335 +-** Touch uses hundreds of hair-like elements
   2.336 ++** Hundreds of hair-like elements provide a sense of touch
   2.337 + 
   2.338 +    Touch is critical to navigation and spatial reasoning and as such I
   2.339 +    need a simulated version of it to give to my AI creatures.
   2.340 +@@ -2059,7 +2177,7 @@
   2.341 +     #+ATTR_LaTeX: :width 15cm
   2.342 +     [[./images/touch-cube.png]]
   2.343 + 
   2.344 +-** Proprioception is the sense that makes everything ``real''
   2.345 ++** Proprioception provides knowledge of your own body's position
   2.346 + 
   2.347 +    Close your eyes, and touch your nose with your right index finger.
   2.348 +    How did you do it? You could not see your hand, and neither your
   2.349 +@@ -2193,7 +2311,7 @@
   2.350 +     #+ATTR_LaTeX: :width 11cm
   2.351 +     [[./images/proprio.png]]
   2.352 + 
   2.353 +-** Muscles are both effectors and sensors
   2.354 ++** Muscles contain both sensors and effectors
   2.355 + 
   2.356 +    Surprisingly enough, terrestrial creatures only move by using
   2.357 +    torque applied about their joints. There's not a single straight
   2.358 +@@ -2440,7 +2558,8 @@
   2.359 +         hard control problems without worrying about physics or
   2.360 +         senses.
   2.361 + 
   2.362 +-* Empathy in a simulated worm
   2.363 ++* =EMPATH=: the simulated worm experiment
   2.364 ++# Empathy in a simulated worm
   2.365 + 
   2.366 +   Here I develop a computational model of empathy, using =CORTEX= as a
   2.367 +   base. Empathy in this context is the ability to observe another
   2.368 +@@ -2732,7 +2851,7 @@
   2.369 +    provided by an experience vector and reliably infering the rest of
   2.370 +    the senses.
   2.371 + 
   2.372 +-** Empathy is the process of tracing though \Phi-space 
   2.373 ++** ``Empathy'' requires retracing steps through \Phi-space
   2.374 + 
   2.375 +    Here is the core of a basic empathy algorithm, starting with an
   2.376 +    experience vector:
   2.377 +@@ -2888,7 +3007,7 @@
   2.378 +    #+end_src
   2.379 +    #+end_listing
   2.380 +   
   2.381 +-** Efficient action recognition with =EMPATH=
   2.382 ++** =EMPATH= recognizes actions efficiently
   2.383 +    
   2.384 +    To use =EMPATH= with the worm, I first need to gather a set of
   2.385 +    experiences from the worm that includes the actions I want to
   2.386 +@@ -3044,9 +3163,9 @@
   2.387 +   to interpretation, and dissaggrement between empathy and experience
   2.388 +   is more excusable.
   2.389 + 
   2.390 +-** Digression: bootstrapping touch using free exploration
   2.391 +-
   2.392 +-   In the previous section I showed how to compute actions in terms of
   2.393 ++** Digression: Learn touch sensor layout through haptic experimentation, instead 
   2.394 ++# Bootstrapping touch using free exploration
   2.395 ++In the previous section I showed how to compute actions in terms of
   2.396 +    body-centered predicates which relied averate touch activation of
   2.397 +    pre-defined regions of the worm's skin. What if, instead of recieving
   2.398 +    touch pre-grouped into the six faces of each worm segment, the true
   2.399 +@@ -3210,13 +3329,14 @@
   2.400 +   
   2.401 +   In this thesis you have seen the =CORTEX= system, a complete
   2.402 +   environment for creating simulated creatures. You have seen how to
   2.403 +-  implement five senses including touch, proprioception, hearing,
   2.404 +-  vision, and muscle tension. You have seen how to create new creatues
   2.405 +-  using blender, a 3D modeling tool. I hope that =CORTEX= will be
   2.406 +-  useful in further research projects. To this end I have included the
   2.407 +-  full source to =CORTEX= along with a large suite of tests and
   2.408 +-  examples. I have also created a user guide for =CORTEX= which is
   2.409 +-  inculded in an appendix to this thesis.
   2.410 ++  implement five senses: touch, proprioception, hearing, vision, and
   2.411 ++  muscle tension. You have seen how to create new creatures using
   2.412 ++  blender, a 3D modeling tool. I hope that =CORTEX= will be useful in
   2.413 ++  further research projects. To this end I have included the full
   2.414 ++  source to =CORTEX= along with a large suite of tests and examples. I
   2.415 ++  have also created a user guide for =CORTEX= which is included in an
   2.416 ++  appendix to this thesis \ref{}.
   2.417 ++# dxh: todo reference appendix
   2.418 + 
   2.419 +   You have also seen how I used =CORTEX= as a platform to attach the
   2.420 +   /action recognition/ problem, which is the problem of recognizing