cortex: changeset 513:4c4d45f6f30b
accept/reject changes

author | Robert McIntyre <rlm@mit.edu> |
---|---|
date | Sun, 30 Mar 2014 10:41:18 -0400 |
parents | 8b962ab418c8 |
children | 447c3c8405a2 |
files | thesis/dxh-cortex-diff.diff thesis/dylan-accept.diff thesis/dylan-cortex-diff.diff thesis/dylan-reject.diff |
diffstat | 4 files changed, 428 insertions(+), 428 deletions(-) |
--- a/thesis/dxh-cortex-diff.diff	Sun Mar 30 10:39:19 2014 -0400
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,428 +0,0 @@
-diff -r f639e2139ce2 thesis/cortex.org
---- a/thesis/cortex.org	Sun Mar 30 01:34:43 2014 -0400
-+++ b/thesis/cortex.org	Sun Mar 30 10:07:17 2014 -0400
-@@ -41,49 +41,46 @@
- [[./images/aurellem-gray.png]]
-
-
--* Empathy and Embodiment as problem solving strategies
-+* Empathy \& Embodiment: problem solving strategies
-
-- By the end of this thesis, you will have seen a novel approach to
-- interpreting video using embodiment and empathy. You will have also
-- seen one way to efficiently implement empathy for embodied
-- creatures. Finally, you will become familiar with =CORTEX=, a system
-- for designing and simulating creatures with rich senses, which you
-- may choose to use in your own research.
--
-- This is the core vision of my thesis: That one of the important ways
-- in which we understand others is by imagining ourselves in their
-- position and emphatically feeling experiences relative to our own
-- bodies. By understanding events in terms of our own previous
-- corporeal experience, we greatly constrain the possibilities of what
-- would otherwise be an unwieldy exponential search. This extra
-- constraint can be the difference between easily understanding what
-- is happening in a video and being completely lost in a sea of
-- incomprehensible color and movement.
--
--** Recognizing actions in video is extremely difficult
--
-- Consider for example the problem of determining what is happening
-- in a video of which this is one frame:
--
-+** The problem: recognizing actions in video is extremely difficult
-+# developing / requires useful representations
-+
-+ Examine the following collection of images. As you, and indeed very
-+ young children, can easily determine, each one is a picture of
-+ someone drinking.
-+
-+ # dxh: cat, cup, drinking fountain, rain, straw, coconut
- #+caption: A cat drinking some water. Identifying this action is
-- #+caption: beyond the state of the art for computers.
-+ #+caption: beyond the capabilities of existing computer vision systems.
- #+ATTR_LaTeX: :width 7cm
- [[./images/cat-drinking.jpg]]
-+
-+ Nevertheless, it is beyond the state of the art for a computer
-+ vision program to describe what's happening in each of these
-+ images, or what's common to them. Part of the problem is that many
-+ computer vision systems focus on pixel-level details or probability
-+ distributions of pixels, with little focus on [...]
-+
-+
-+ In fact, the contents of scene may have much less to do with pixel
-+ probabilities than with recognizing various affordances: things you
-+ can move, objects you can grasp, spaces that can be filled
-+ (Gibson). For example, what processes might enable you to see the
-+ chair in figure \ref{hidden-chair}?
-+ # Or suppose that you are building a program that recognizes chairs.
-+ # How could you ``see'' the chair ?
-
-- It is currently impossible for any computer program to reliably
-- label such a video as ``drinking''. And rightly so -- it is a very
-- hard problem! What features can you describe in terms of low level
-- functions of pixels that can even begin to describe at a high level
-- what is happening here?
--
-- Or suppose that you are building a program that recognizes chairs.
-- How could you ``see'' the chair in figure \ref{hidden-chair}?
--
-+ # dxh: blur chair
- #+caption: The chair in this image is quite obvious to humans, but I
- #+caption: doubt that any modern computer vision program can find it.
- #+name: hidden-chair
- #+ATTR_LaTeX: :width 10cm
- [[./images/fat-person-sitting-at-desk.jpg]]
-+
-+
-+
-+
-
- Finally, how is it that you can easily tell the difference between
- how the girls /muscles/ are working in figure \ref{girl}?
-@@ -95,10 +92,13 @@
- #+ATTR_LaTeX: :width 7cm
- [[./images/wall-push.png]]
-+
-+
-+
-
- Each of these examples tells us something about what might be going
- on in our minds as we easily solve these recognition problems.
-
-- The hidden chairs show us that we are strongly triggered by cues
-+ The hidden chair shows us that we are strongly triggered by cues
- relating to the position of human bodies, and that we can determine
- the overall physical configuration of a human body even if much of
- that body is occluded.
-@@ -109,10 +109,107 @@
- most positions, and we can easily project this self-knowledge to
- imagined positions triggered by images of the human body.
-
--** =EMPATH= neatly solves recognition problems
-+** A step forward: the sensorimotor-centered approach
-+# ** =EMPATH= recognizes what creatures are doing
-+# neatly solves recognition problems
-+ In this thesis, I explore the idea that our knowledge of our own
-+ bodies enables us to recognize the actions of others.
-+
-+ First, I built a system for constructing virtual creatures with
-+ physiologically plausible sensorimotor systems and detailed
-+ environments. The result is =CORTEX=, which is described in section
-+ \ref{sec-2}. (=CORTEX= was built to be flexible and useful to other
-+ AI researchers; it is provided in full with detailed instructions
-+ on the web [here].)
-+
-+ Next, I wrote routines which enabled a simple worm-like creature to
-+ infer the actions of a second worm-like creature, using only its
-+ own prior sensorimotor experiences and knowledge of the second
-+ worm's joint positions. This program, =EMPATH=, is described in
-+ section \ref{sec-3}, and the key results of this experiment are
-+ summarized below.
-+
-+ #+caption: From only \emph{proprioceptive} data, =EMPATH= was able to infer
-+ #+caption: the complete sensory experience and classify these four poses.
-+ #+caption: The last image is a composite, depicting the intermediate stages of \emph{wriggling}.
-+ #+name: worm-recognition-intro-2
-+ #+ATTR_LaTeX: :width 15cm
-+ [[./images/empathy-1.png]]
-+
-+ # =CORTEX= provides a language for describing the sensorimotor
-+ # experiences of various creatures.
-+
-+ # Next, I developed an experiment to test the power of =CORTEX='s
-+ # sensorimotor-centered language for solving recognition problems. As
-+ # a proof of concept, I wrote routines which enabled a simple
-+ # worm-like creature to infer the actions of a second worm-like
-+ # creature, using only its own previous sensorimotor experiences and
-+ # knowledge of the second worm's joints (figure
-+ # \ref{worm-recognition-intro-2}). The result of this proof of
-+ # concept was the program =EMPATH=, described in section
-+ # \ref{sec-3}. The key results of this
-+
-+ # Using only first-person sensorimotor experiences and third-person
-+ # proprioceptive data,
-+
-+*** Key results
-+ - After one-shot supervised training, =EMPATH= was able recognize a
-+ wide variety of static poses and dynamic actions---ranging from
-+ curling in a circle to wriggling with a particular frequency ---
-+ with 95\% accuracy.
-+ - These results were completely independent of viewing angle
-+ because the underlying body-centered language fundamentally is;
-+ once an action is learned, it can be recognized equally well from
-+ any viewing angle.
-+ - =EMPATH= is surprisingly short; the sensorimotor-centered
-+ language provided by =CORTEX= resulted in extremely economical
-+ recognition routines --- about 0000 lines in all --- suggesting
-+ that such representations are very powerful, and often
-+ indispensible for the types of recognition tasks considered here.
-+ - Although for expediency's sake, I relied on direct knowledge of
-+ joint positions in this proof of concept, it would be
-+ straightforward to extend =EMPATH= so that it (more
-+ realistically) infers joint positions from its visual data.
-+
-+# because the underlying language is fundamentally orientation-independent
-+
-+# recognize the actions of a worm with 95\% accuracy. The
-+# recognition tasks
-
-- I propose a system that can express the types of recognition
-- problems above in a form amenable to computation. It is split into
-+
-+
-+
-+ [Talk about these results and what you find promising about them]
-+
-+** Roadmap
-+ [I'm going to explain how =CORTEX= works, then break down how
-+ =EMPATH= does its thing. Because the details reveal such-and-such
-+ about the approach.]
-+
-+ # The success of this simple proof-of-concept offers a tantalizing
-+
-+
-+ # explore the idea
-+ # The key contribution of this thesis is the idea that body-centered
-+ # representations (which express
-+
-+
-+ # the
-+ # body-centered approach --- in which I try to determine what's
-+ # happening in a scene by bringing it into registration with my own
-+ # bodily experiences --- are indispensible for recognizing what
-+ # creatures are doing in a scene.
-+
-+* COMMENT
-+# body-centered language
-+
-+ In this thesis, I'll describe =EMPATH=, which solves a certain
-+ class of recognition problems
-+
-+ The key idea is to use self-centered (or first-person) language.
-+
-+ I have built a system that can express the types of recognition
-+ problems in a form amenable to computation. It is split into
- four parts:
-
- - Free/Guided Play :: The creature moves around and experiences the
-@@ -286,14 +383,14 @@
- code to create a creature, and can use a wide library of
- pre-existing blender models as a base for your own creatures.
-
-- - =CORTEX= implements a wide variety of senses, including touch,
-+ - =CORTEX= implements a wide variety of senses: touch,
- proprioception, vision, hearing, and muscle tension. Complicated
- senses like touch, and vision involve multiple sensory elements
- embedded in a 2D surface. You have complete control over the
- distribution of these sensor elements through the use of simple
- png image files. In particular, =CORTEX= implements more
- comprehensive hearing than any other creature simulation system
-- available.
-+ available.
-
- - =CORTEX= supports any number of creatures and any number of
- senses. Time in =CORTEX= dialates so that the simulated creatures
-@@ -353,7 +450,24 @@
- \end{sidewaysfigure}
- #+END_LaTeX
-
--** Contributions
-+** Road map
-+
-+ By the end of this thesis, you will have seen a novel approach to
-+ interpreting video using embodiment and empathy. You will have also
-+ seen one way to efficiently implement empathy for embodied
-+ creatures. Finally, you will become familiar with =CORTEX=, a system
-+ for designing and simulating creatures with rich senses, which you
-+ may choose to use in your own research.
-+
-+ This is the core vision of my thesis: That one of the important ways
-+ in which we understand others is by imagining ourselves in their
-+ position and emphatically feeling experiences relative to our own
-+ bodies. By understanding events in terms of our own previous
-+ corporeal experience, we greatly constrain the possibilities of what
-+ would otherwise be an unwieldy exponential search. This extra
-+ constraint can be the difference between easily understanding what
-+ is happening in a video and being completely lost in a sea of
-+ incomprehensible color and movement.
-
- - I built =CORTEX=, a comprehensive platform for embodied AI
- experiments. =CORTEX= supports many features lacking in other
-@@ -363,18 +477,22 @@
- - I built =EMPATH=, which uses =CORTEX= to identify the actions of
- a worm-like creature using a computational model of empathy.
-
--* Building =CORTEX=
--
-- I intend for =CORTEX= to be used as a general-purpose library for
-- building creatures and outfitting them with senses, so that it will
-- be useful for other researchers who want to test out ideas of their
-- own. To this end, wherver I have had to make archetictural choices
-- about =CORTEX=, I have chosen to give as much freedom to the user as
-- possible, so that =CORTEX= may be used for things I have not
-- forseen.
--
--** Simulation or Reality?
--
-+
-+* Designing =CORTEX=
-+ In this section, I outline the design decisions that went into
-+ making =CORTEX=, along with some details about its
-+ implementation. (A practical guide to getting started with =CORTEX=,
-+ which skips over the history and implementation details presented
-+ here, is provided in an appendix \ref{} at the end of this paper.)
-+
-+ Throughout this project, I intended for =CORTEX= to be flexible and
-+ extensible enough to be useful for other researchers who want to
-+ test out ideas of their own. To this end, wherver I have had to make
-+ archetictural choices about =CORTEX=, I have chosen to give as much
-+ freedom to the user as possible, so that =CORTEX= may be used for
-+ things I have not forseen.
-+
-+** Building in simulation versus reality
- The most important archetictural decision of all is the choice to
- use a computer-simulated environemnt in the first place! The world
- is a vast and rich place, and for now simulations are a very poor
-@@ -436,7 +554,7 @@
- doing everything in software is far cheaper than building custom
- real-time hardware. All you need is a laptop and some patience.
-
--** Because of Time, simulation is perferable to reality
-+** Simulated time enables rapid prototyping and complex scenes
-
- I envision =CORTEX= being used to support rapid prototyping and
- iteration of ideas. Even if I could put together a well constructed
-@@ -459,8 +577,8 @@
- simulations of very simple creatures in =CORTEX= generally run at
- 40x on my machine!
-
--** What is a sense?
--
-+** All sense organs are two-dimensional surfaces
-+# What is a sense?
- If =CORTEX= is to support a wide variety of senses, it would help
- to have a better understanding of what a ``sense'' actually is!
- While vision, touch, and hearing all seem like they are quite
-@@ -956,7 +1074,7 @@
- #+ATTR_LaTeX: :width 15cm
- [[./images/physical-hand.png]]
-
--** Eyes reuse standard video game components
-+** Sight reuses standard video game components...
-
- Vision is one of the most important senses for humans, so I need to
- build a simulated sense of vision for my AI. I will do this with
-@@ -1257,8 +1375,8 @@
- community and is now (in modified form) part of a system for
- capturing in-game video to a file.
-
--** Hearing is hard; =CORTEX= does it right
--
-+** ...but hearing must be built from scratch
-+# is hard; =CORTEX= does it right
- At the end of this section I will have simulated ears that work the
- same way as the simulated eyes in the last section. I will be able to
- place any number of ear-nodes in a blender file, and they will bind to
-@@ -1565,7 +1683,7 @@
- jMonkeyEngine3 community and is used to record audio for demo
- videos.
-
--** Touch uses hundreds of hair-like elements
-+** Hundreds of hair-like elements provide a sense of touch
-
- Touch is critical to navigation and spatial reasoning and as such I
- need a simulated version of it to give to my AI creatures.
-@@ -2059,7 +2177,7 @@
- #+ATTR_LaTeX: :width 15cm
- [[./images/touch-cube.png]]
-
--** Proprioception is the sense that makes everything ``real''
-+** Proprioception provides knowledge of your own body's position
-
- Close your eyes, and touch your nose with your right index finger.
- How did you do it? You could not see your hand, and neither your
-@@ -2193,7 +2311,7 @@
- #+ATTR_LaTeX: :width 11cm
- [[./images/proprio.png]]
-
--** Muscles are both effectors and sensors
-+** Muscles contain both sensors and effectors
-
- Surprisingly enough, terrestrial creatures only move by using
- torque applied about their joints. There's not a single straight
-@@ -2440,7 +2558,8 @@
- hard control problems without worrying about physics or
- senses.
-
--* Empathy in a simulated worm
-+* =EMPATH=: the simulated worm experiment
-+# Empathy in a simulated worm
-
- Here I develop a computational model of empathy, using =CORTEX= as a
- base. Empathy in this context is the ability to observe another
-@@ -2732,7 +2851,7 @@
- provided by an experience vector and reliably infering the rest of
- the senses.
-
--** Empathy is the process of tracing though \Phi-space
-+** ``Empathy'' requires retracing steps though \Phi-space
-
- Here is the core of a basic empathy algorithm, starting with an
- experience vector:
-@@ -2888,7 +3007,7 @@
- #+end_src
- #+end_listing
-
--** Efficient action recognition with =EMPATH=
-+** =EMPATH= recognizes actions efficiently
-
- To use =EMPATH= with the worm, I first need to gather a set of
- experiences from the worm that includes the actions I want to
-@@ -3044,9 +3163,9 @@
- to interpretation, and dissaggrement between empathy and experience
- is more excusable.
-
--** Digression: bootstrapping touch using free exploration
--
-- In the previous section I showed how to compute actions in terms of
-+** Digression: Learn touch sensor layout through haptic experimentation, instead
-+# Boostraping touch using free exploration
-+In the previous section I showed how to compute actions in terms of
- body-centered predicates which relied averate touch activation of
- pre-defined regions of the worm's skin. What if, instead of recieving
- touch pre-grouped into the six faces of each worm segment, the true
-@@ -3210,13 +3329,14 @@
-
- In this thesis you have seen the =CORTEX= system, a complete
- environment for creating simulated creatures. You have seen how to
-- implement five senses including touch, proprioception, hearing,
-- vision, and muscle tension. You have seen how to create new creatues
-- using blender, a 3D modeling tool. I hope that =CORTEX= will be
-- useful in further research projects. To this end I have included the
-- full source to =CORTEX= along with a large suite of tests and
-- examples. I have also created a user guide for =CORTEX= which is
-- inculded in an appendix to this thesis.
-+ implement five senses: touch, proprioception, hearing, vision, and
-+ muscle tension. You have seen how to create new creatues using
-+ blender, a 3D modeling tool. I hope that =CORTEX= will be useful in
-+ further research projects. To this end I have included the full
-+ source to =CORTEX= along with a large suite of tests and examples. I
-+ have also created a user guide for =CORTEX= which is inculded in an
-+ appendix to this thesis \ref{}.
-+# dxh: todo reference appendix
-
- You have also seen how I used =CORTEX= as a platform to attach the
- /action recognition/ problem, which is the problem of recognizing
-@@ -3234,8 +3354,8 @@
-
- - =CORTEX=, a system for creating simulated creatures with rich
- senses.
-- - =EMPATH=, a program for recognizing actions by imagining sensory
-- experience.
-+ - =EMPATH=, a program for recognizing actions by aligning them with
-+ personal sensory experiences.
-
- # An anatomical joke:
- # - Training
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/thesis/dylan-accept.diff	Sun Mar 30 10:41:18 2014 -0400
@@ -0,0 +1,22 @@
+@@ -3210,13 +3329,14 @@
+
+ In this thesis you have seen the =CORTEX= system, a complete
+ environment for creating simulated creatures. You have seen how to
+- implement five senses including touch, proprioception, hearing,
+- vision, and muscle tension. You have seen how to create new creatues
+- using blender, a 3D modeling tool. I hope that =CORTEX= will be
+- useful in further research projects. To this end I have included the
+- full source to =CORTEX= along with a large suite of tests and
+- examples. I have also created a user guide for =CORTEX= which is
+- inculded in an appendix to this thesis.
++ implement five senses: touch, proprioception, hearing, vision, and
++ muscle tension. You have seen how to create new creatues using
++ blender, a 3D modeling tool. I hope that =CORTEX= will be useful in
++ further research projects. To this end I have included the full
++ source to =CORTEX= along with a large suite of tests and examples. I
++ have also created a user guide for =CORTEX= which is inculded in an
++ appendix to this thesis \ref{}.
++# dxh: todo reference appendix
+
+ You have also seen how I used =CORTEX= as a platform to attach the
+ /action recognition/ problem, which is the problem of recognizing
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/thesis/dylan-cortex-diff.diff	Sun Mar 30 10:41:18 2014 -0400
@@ -0,0 +1,395 @@
+diff -r f639e2139ce2 thesis/cortex.org
+--- a/thesis/cortex.org	Sun Mar 30 01:34:43 2014 -0400
++++ b/thesis/cortex.org	Sun Mar 30 10:07:17 2014 -0400
+@@ -41,49 +41,46 @@
+ [[./images/aurellem-gray.png]]
+
+
+-* Empathy and Embodiment as problem solving strategies
++* Empathy \& Embodiment: problem solving strategies
+
+- By the end of this thesis, you will have seen a novel approach to
+- interpreting video using embodiment and empathy. You will have also
+- seen one way to efficiently implement empathy for embodied
+- creatures. Finally, you will become familiar with =CORTEX=, a system
+- for designing and simulating creatures with rich senses, which you
+- may choose to use in your own research.
+-
+- This is the core vision of my thesis: That one of the important ways
+- in which we understand others is by imagining ourselves in their
+- position and emphatically feeling experiences relative to our own
+- bodies. By understanding events in terms of our own previous
+- corporeal experience, we greatly constrain the possibilities of what
+- would otherwise be an unwieldy exponential search. This extra
+- constraint can be the difference between easily understanding what
+- is happening in a video and being completely lost in a sea of
+- incomprehensible color and movement.
+-
+-** Recognizing actions in video is extremely difficult
+-
+- Consider for example the problem of determining what is happening
+- in a video of which this is one frame:
+-
++** The problem: recognizing actions in video is extremely difficult
++# developing / requires useful representations
++
++ Examine the following collection of images. As you, and indeed very
++ young children, can easily determine, each one is a picture of
++ someone drinking.
++
++ # dxh: cat, cup, drinking fountain, rain, straw, coconut
+ #+caption: A cat drinking some water. Identifying this action is
+- #+caption: beyond the state of the art for computers.
++ #+caption: beyond the capabilities of existing computer vision systems.
+ #+ATTR_LaTeX: :width 7cm
+ [[./images/cat-drinking.jpg]]
++
++ Nevertheless, it is beyond the state of the art for a computer
++ vision program to describe what's happening in each of these
++ images, or what's common to them. Part of the problem is that many
++ computer vision systems focus on pixel-level details or probability
++ distributions of pixels, with little focus on [...]
++
++
++ In fact, the contents of scene may have much less to do with pixel
++ probabilities than with recognizing various affordances: things you
++ can move, objects you can grasp, spaces that can be filled
++ (Gibson). For example, what processes might enable you to see the
++ chair in figure \ref{hidden-chair}?
++ # Or suppose that you are building a program that recognizes chairs.
++ # How could you ``see'' the chair ?
+
+- It is currently impossible for any computer program to reliably
+- label such a video as ``drinking''. And rightly so -- it is a very
+- hard problem! What features can you describe in terms of low level
+- functions of pixels that can even begin to describe at a high level
+- what is happening here?
+-
+- Or suppose that you are building a program that recognizes chairs.
+- How could you ``see'' the chair in figure \ref{hidden-chair}?
+-
++ # dxh: blur chair
+ #+caption: The chair in this image is quite obvious to humans, but I
+ #+caption: doubt that any modern computer vision program can find it.
+ #+name: hidden-chair
+ #+ATTR_LaTeX: :width 10cm
+ [[./images/fat-person-sitting-at-desk.jpg]]
++
++
++
++
+
+ Finally, how is it that you can easily tell the difference between
+ how the girls /muscles/ are working in figure \ref{girl}?
+@@ -95,10 +92,13 @@
+ #+ATTR_LaTeX: :width 7cm
+ [[./images/wall-push.png]]
++
++
++
+
+ Each of these examples tells us something about what might be going
+ on in our minds as we easily solve these recognition problems.
+
+- The hidden chairs show us that we are strongly triggered by cues
++ The hidden chair shows us that we are strongly triggered by cues
+ relating to the position of human bodies, and that we can determine
+ the overall physical configuration of a human body even if much of
+ that body is occluded.
+@@ -109,10 +109,107 @@
+ most positions, and we can easily project this self-knowledge to
+ imagined positions triggered by images of the human body.
+
+-** =EMPATH= neatly solves recognition problems
++** A step forward: the sensorimotor-centered approach
++# ** =EMPATH= recognizes what creatures are doing
++# neatly solves recognition problems
++ In this thesis, I explore the idea that our knowledge of our own
++ bodies enables us to recognize the actions of others.
++
++ First, I built a system for constructing virtual creatures with
++ physiologically plausible sensorimotor systems and detailed
++ environments. The result is =CORTEX=, which is described in section
++ \ref{sec-2}. (=CORTEX= was built to be flexible and useful to other
++ AI researchers; it is provided in full with detailed instructions
++ on the web [here].)
++
++ Next, I wrote routines which enabled a simple worm-like creature to
++ infer the actions of a second worm-like creature, using only its
++ own prior sensorimotor experiences and knowledge of the second
++ worm's joint positions. This program, =EMPATH=, is described in
++ section \ref{sec-3}, and the key results of this experiment are
++ summarized below.
++
++ #+caption: From only \emph{proprioceptive} data, =EMPATH= was able to infer
++ #+caption: the complete sensory experience and classify these four poses.
++ #+caption: The last image is a composite, depicting the intermediate stages of \emph{wriggling}.
++ #+name: worm-recognition-intro-2
++ #+ATTR_LaTeX: :width 15cm
++ [[./images/empathy-1.png]]
++
++ # =CORTEX= provides a language for describing the sensorimotor
++ # experiences of various creatures.
++
++ # Next, I developed an experiment to test the power of =CORTEX='s
++ # sensorimotor-centered language for solving recognition problems. As
++ # a proof of concept, I wrote routines which enabled a simple
++ # worm-like creature to infer the actions of a second worm-like
++ # creature, using only its own previous sensorimotor experiences and
++ # knowledge of the second worm's joints (figure
++ # \ref{worm-recognition-intro-2}). The result of this proof of
++ # concept was the program =EMPATH=, described in section
++ # \ref{sec-3}. The key results of this
++
++ # Using only first-person sensorimotor experiences and third-person
++ # proprioceptive data,
++
++*** Key results
++ - After one-shot supervised training, =EMPATH= was able recognize a
++ wide variety of static poses and dynamic actions---ranging from
++ curling in a circle to wriggling with a particular frequency ---
++ with 95\% accuracy.
++ - These results were completely independent of viewing angle
++ because the underlying body-centered language fundamentally is;
++ once an action is learned, it can be recognized equally well from
++ any viewing angle.
++ - =EMPATH= is surprisingly short; the sensorimotor-centered
++ language provided by =CORTEX= resulted in extremely economical
++ recognition routines --- about 0000 lines in all --- suggesting
++ that such representations are very powerful, and often
++ indispensible for the types of recognition tasks considered here.
++ - Although for expediency's sake, I relied on direct knowledge of
++ joint positions in this proof of concept, it would be
++ straightforward to extend =EMPATH= so that it (more
++ realistically) infers joint positions from its visual data.
++
++# because the underlying language is fundamentally orientation-independent
++
++# recognize the actions of a worm with 95\% accuracy. The
++# recognition tasks
+
+- I propose a system that can express the types of recognition
+- problems above in a form amenable to computation. It is split into
++
++
++
++ [Talk about these results and what you find promising about them]
++
++** Roadmap
++ [I'm going to explain how =CORTEX= works, then break down how
++ =EMPATH= does its thing. Because the details reveal such-and-such
++ about the approach.]
++
++ # The success of this simple proof-of-concept offers a tantalizing
++
++
++ # explore the idea
++ # The key contribution of this thesis is the idea that body-centered
++ # representations (which express
++
++
++ # the
++ # body-centered approach --- in which I try to determine what's
++ # happening in a scene by bringing it into registration with my own
++ # bodily experiences --- are indispensible for recognizing what
++ # creatures are doing in a scene.
++
++* COMMENT
++# body-centered language
++
++ In this thesis, I'll describe =EMPATH=, which solves a certain
++ class of recognition problems
++
++ The key idea is to use self-centered (or first-person) language.
++
++ I have built a system that can express the types of recognition
++ problems in a form amenable to computation. It is split into
+ four parts:
+
+ - Free/Guided Play :: The creature moves around and experiences the
+@@ -286,14 +383,14 @@
+ code to create a creature, and can use a wide library of
+ pre-existing blender models as a base for your own creatures.
+
+- - =CORTEX= implements a wide variety of senses, including touch,
++ - =CORTEX= implements a wide variety of senses: touch,
+ proprioception, vision, hearing, and muscle tension. Complicated
+ senses like touch, and vision involve multiple sensory elements
+ embedded in a 2D surface. You have complete control over the
+ distribution of these sensor elements through the use of simple
+ png image files. In particular, =CORTEX= implements more
+ comprehensive hearing than any other creature simulation system
+- available.
++ available.
+
+ - =CORTEX= supports any number of creatures and any number of
+ senses. Time in =CORTEX= dialates so that the simulated creatures
+@@ -353,7 +450,24 @@
+ \end{sidewaysfigure}
+ #+END_LaTeX
+
+-** Contributions
++** Road map
++
++ By the end of this thesis, you will have seen a novel approach to
++ interpreting video using embodiment and empathy. You will have also
++ seen one way to efficiently implement empathy for embodied
++ creatures. Finally, you will become familiar with =CORTEX=, a system
++ for designing and simulating creatures with rich senses, which you
++ may choose to use in your own research.
++
++ This is the core vision of my thesis: That one of the important ways
++ in which we understand others is by imagining ourselves in their
++ position and emphatically feeling experiences relative to our own
++ bodies. By understanding events in terms of our own previous
++ corporeal experience, we greatly constrain the possibilities of what
++ would otherwise be an unwieldy exponential search. This extra
++ constraint can be the difference between easily understanding what
++ is happening in a video and being completely lost in a sea of
++ incomprehensible color and movement.
+
+ - I built =CORTEX=, a comprehensive platform for embodied AI
+ experiments. =CORTEX= supports many features lacking in other
+@@ -363,18 +477,22 @@
+ - I built =EMPATH=, which uses =CORTEX= to identify the actions of
+ a worm-like creature using a computational model of empathy.
+
+-* Building =CORTEX=
+-
+- I intend for =CORTEX= to be used as a general-purpose library for
+- building creatures and outfitting them with senses, so that it will
+- be useful for other researchers who want to test out ideas of their
+- own. To this end, wherver I have had to make archetictural choices
+- about =CORTEX=, I have chosen to give as much freedom to the user as
+- possible, so that =CORTEX= may be used for things I have not
+- forseen.
+-
+-** Simulation or Reality?
+-
++
++* Designing =CORTEX=
++ In this section, I outline the design decisions that went into
++ making =CORTEX=, along with some details about its
++ implementation. (A practical guide to getting started with =CORTEX=,
++ which skips over the history and implementation details presented
++ here, is provided in an appendix \ref{} at the end of this paper.)
++
++ Throughout this project, I intended for =CORTEX= to be flexible and
++ extensible enough to be useful for other researchers who want to
++ test out ideas of their own. To this end, wherver I have had to make
++ archetictural choices about =CORTEX=, I have chosen to give as much
++ freedom to the user as possible, so that =CORTEX= may be used for
++ things I have not forseen.
++
++** Building in simulation versus reality
+ The most important archetictural decision of all is the choice to
+ use a computer-simulated environemnt in the first place! The world
+ is a vast and rich place, and for now simulations are a very poor
+@@ -436,7 +554,7 @@
+ doing everything in software is far cheaper than building custom
+ real-time hardware. All you need is a laptop and some patience.
+
+-** Because of Time, simulation is perferable to reality
++** Simulated time enables rapid prototyping and complex scenes
+
+ I envision =CORTEX= being used to support rapid prototyping and
+ iteration of ideas. Even if I could put together a well constructed
+@@ -459,8 +577,8 @@
+ simulations of very simple creatures in =CORTEX= generally run at
+ 40x on my machine!
+
+-** What is a sense?
+-
++** All sense organs are two-dimensional surfaces
++# What is a sense?
+ If =CORTEX= is to support a wide variety of senses, it would help
+ to have a better understanding of what a ``sense'' actually is!
+ While vision, touch, and hearing all seem like they are quite
+@@ -956,7 +1074,7 @@
+ #+ATTR_LaTeX: :width 15cm
+ [[./images/physical-hand.png]]
+
+-** Eyes reuse standard video game components
++** Sight reuses standard video game components...
+
+ Vision is one of the most important senses for humans, so I need to
+ build a simulated sense of vision for my AI. I will do this with
+@@ -1257,8 +1375,8 @@
+ community and is now (in modified form) part of a system for
+ capturing in-game video to a file.
+
+-** Hearing is hard; =CORTEX= does it right
+-
++** ...but hearing must be built from scratch
++# is hard; =CORTEX= does it right
+ At the end of this section I will have simulated ears that work the
+ same way as the simulated eyes in the last section. I will be able to
+ place any number of ear-nodes in a blender file, and they will bind to
+@@ -1565,7 +1683,7 @@
+ jMonkeyEngine3 community and is used to record audio for demo
+ videos.
+
+-** Touch uses hundreds of hair-like elements
++** Hundreds of hair-like elements provide a sense of touch
+
+ Touch is critical to navigation and spatial reasoning and as such I
+ need a simulated version of it to give to my AI creatures.
+@@ -2059,7 +2177,7 @@
+ #+ATTR_LaTeX: :width 15cm
+ [[./images/touch-cube.png]]
+
+-** Proprioception is the sense that makes everything ``real''
++** Proprioception provides knowledge of your own body's position
+
+ Close your eyes, and touch your nose with your right index finger.
+ How did you do it? You could not see your hand, and neither your
+@@ -2193,7 +2311,7 @@
+ #+ATTR_LaTeX: :width 11cm
+ [[./images/proprio.png]]
+
+-** Muscles are both effectors and sensors
++** Muscles contain both sensors and effectors
+
+ Surprisingly enough, terrestrial creatures only move by using
+ torque applied about their joints. There's not a single straight
+@@ -2440,7 +2558,8 @@
+ hard control problems without worrying about physics or
+ senses.
+
+-* Empathy in a simulated worm
++* =EMPATH=: the simulated worm experiment
++# Empathy in a simulated worm
+
+ Here I develop a computational model of empathy, using =CORTEX= as a
+ base. Empathy in this context is the ability to observe another
+@@ -2732,7 +2851,7 @@
+ provided by an experience vector and reliably infering the rest of
+ the senses.
+
+-** Empathy is the process of tracing though \Phi-space
++** ``Empathy'' requires retracing steps though \Phi-space
+
+ Here is the core of a basic empathy algorithm, starting with an
+ experience vector:
+@@ -2888,7 +3007,7 @@
+ #+end_src
+ #+end_listing
+
+-** Efficient action recognition with =EMPATH=
++** =EMPATH= recognizes actions efficiently
+
+ To use =EMPATH= with the worm, I first need to gather a set of
+ experiences from the worm that includes the actions I want to
+@@ -3044,9 +3163,9 @@
+ to interpretation, and dissaggrement between empathy and experience
+ is more excusable.
+
+-** Digression: bootstrapping touch using free exploration
+-
+- In the previous section I showed how to compute actions in terms of
++** Digression: Learn touch sensor layout through haptic experimentation, instead
++# Boostraping touch using free exploration
++In the previous section I showed how to compute actions in terms of
+ body-centered predicates which relied averate touch activation of
+ pre-defined regions of the worm's skin. What if, instead of recieving
+ touch pre-grouped into the six faces of each worm segment, the true
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/thesis/dylan-reject.diff	Sun Mar 30 10:41:18 2014 -0400
@@ -0,0 +1,11 @@
+@@ -3234,8 +3354,8 @@
+
+ - =CORTEX=, a system for creating simulated creatures with rich
+ senses.
+- - =EMPATH=, a program for recognizing actions by imagining sensory
+- experience.
++ - =EMPATH=, a program for recognizing actions by aligning them with
++ personal sensory experiences.
+
+ # An anatomical joke:
+ # - Training