diff -r f639e2139ce2 thesis/cortex.org
--- a/thesis/cortex.org	Sun Mar 30 01:34:43 2014 -0400
+++ b/thesis/cortex.org	Sun Mar 30 10:07:17 2014 -0400
@@ -41,49 +41,46 @@
 [[./images/aurellem-gray.png]]
 
 
-* Empathy and Embodiment as problem solving strategies
+* Empathy \& Embodiment: problem solving strategies
 
-  By the end of this thesis, you will have seen a novel approach to
-  interpreting video using embodiment and empathy. You will have also
-  seen one way to efficiently implement empathy for embodied
-  creatures. Finally, you will become familiar with =CORTEX=, a system
-  for designing and simulating creatures with rich senses, which you
-  may choose to use in your own research.
-
-  This is the core vision of my thesis: That one of the important ways
-  in which we understand others is by imagining ourselves in their
-  position and emphatically feeling experiences relative to our own
-  bodies. By understanding events in terms of our own previous
-  corporeal experience, we greatly constrain the possibilities of what
-  would otherwise be an unwieldy exponential search. This extra
-  constraint can be the difference between easily understanding what
-  is happening in a video and being completely lost in a sea of
-  incomprehensible color and movement.
-
-** Recognizing actions in video is extremely difficult
-
-   Consider for example the problem of determining what is happening
-   in a video of which this is one frame:
-
+** The problem: recognizing actions in video is extremely difficult
+# developing / requires useful representations
+
+   Examine the following collection of images. As you, and indeed very
+   young children, can easily determine, each one is a picture of
+   someone drinking.
+
+   # dxh: cat, cup, drinking fountain, rain, straw, coconut
   #+caption: A cat drinking some water. Identifying this action is
-  #+caption: beyond the state of the art for computers.
+  #+caption: beyond the capabilities of existing computer vision systems.
   #+ATTR_LaTeX: :width 7cm
   [[./images/cat-drinking.jpg]]
+
+   Nevertheless, it is beyond the state of the art for a computer
+   vision program to describe what's happening in each of these
+   images, or what's common to them. Part of the problem is that many
+   computer vision systems focus on pixel-level details or probability
+   distributions of pixels, with little focus on [...]
+
+
+   In fact, the contents of a scene may have much less to do with pixel
+   probabilities than with recognizing various affordances: things you
+   can move, objects you can grasp, spaces that can be filled
+   (Gibson). For example, what processes might enable you to see the
+   chair in figure \ref{hidden-chair}?
+   # Or suppose that you are building a program that recognizes chairs.
+   # How could you ``see'' the chair ?
 
-   It is currently impossible for any computer program to reliably
-   label such a video as ``drinking''. And rightly so -- it is a very
-   hard problem! What features can you describe in terms of low level
-   functions of pixels that can even begin to describe at a high level
-   what is happening here?
-
-   Or suppose that you are building a program that recognizes chairs.
-   How could you ``see'' the chair in figure \ref{hidden-chair}?
-
+   # dxh: blur chair
   #+caption: The chair in this image is quite obvious to humans, but I
   #+caption: doubt that any modern computer vision program can find it.
   #+name: hidden-chair
   #+ATTR_LaTeX: :width 10cm
   [[./images/fat-person-sitting-at-desk.jpg]]
+
+
+
+
 
   Finally, how is it that you can easily tell the difference between
  how the girls /muscles/ are working in figure \ref{girl}?
@@ -95,10 +92,13 @@
  #+ATTR_LaTeX: :width 7cm
  [[./images/wall-push.png]]
 
+
+
+
  Each of these examples tells us something about what might be going
  on in our minds as we easily solve these recognition problems.
 
-  The hidden chairs show us that we are strongly triggered by cues
+  The hidden chair shows us that we are strongly triggered by cues
  relating to the position of human bodies, and that we can determine
  the overall physical configuration of a human body even if much of
  that body is occluded.
@@ -109,10 +109,107 @@
  most positions, and we can easily project this self-knowledge to
  imagined positions triggered by images of the human body.
 
-** =EMPATH= neatly solves recognition problems
+** A step forward: the sensorimotor-centered approach
+# ** =EMPATH= recognizes what creatures are doing
+# neatly solves recognition problems
+   In this thesis, I explore the idea that our knowledge of our own
+   bodies enables us to recognize the actions of others.
+
+   First, I built a system for constructing virtual creatures with
+   physiologically plausible sensorimotor systems and detailed
+   environments. The result is =CORTEX=, which is described in section
+   \ref{sec-2}. (=CORTEX= was built to be flexible and useful to other
+   AI researchers; it is provided in full with detailed instructions
+   on the web [here].)
+
+   Next, I wrote routines which enabled a simple worm-like creature to
+   infer the actions of a second worm-like creature, using only its
+   own prior sensorimotor experiences and knowledge of the second
+   worm's joint positions. This program, =EMPATH=, is described in
+   section \ref{sec-3}, and the key results of this experiment are
+   summarized below.
+
+   #+caption: From only \emph{proprioceptive} data, =EMPATH= was able to infer
+   #+caption: the complete sensory experience and classify these four poses.
+   #+caption: The last image is a composite, depicting the intermediate stages of \emph{wriggling}.
+   #+name: worm-recognition-intro-2
+   #+ATTR_LaTeX: :width 15cm
+   [[./images/empathy-1.png]]
+
+   # =CORTEX= provides a language for describing the sensorimotor
+   # experiences of various creatures.
+
+   # Next, I developed an experiment to test the power of =CORTEX='s
+   # sensorimotor-centered language for solving recognition problems. As
+   # a proof of concept, I wrote routines which enabled a simple
+   # worm-like creature to infer the actions of a second worm-like
+   # creature, using only its own previous sensorimotor experiences and
+   # knowledge of the second worm's joints (figure
+   # \ref{worm-recognition-intro-2}). The result of this proof of
+   # concept was the program =EMPATH=, described in section
+   # \ref{sec-3}. The key results of this
+
+   # Using only first-person sensorimotor experiences and third-person
+   # proprioceptive data,
+
+*** Key results
+    - After one-shot supervised training, =EMPATH= was able to recognize
+      a wide variety of static poses and dynamic actions --- ranging from
+      curling in a circle to wriggling with a particular frequency ---
+      with 95\% accuracy.
+    - These results were completely independent of viewing angle
+      because the underlying body-centered language fundamentally is;
+      once an action is learned, it can be recognized equally well from
+      any viewing angle.
+    - =EMPATH= is surprisingly short; the sensorimotor-centered
+      language provided by =CORTEX= resulted in extremely economical
+      recognition routines --- about 0000 lines in all --- suggesting
+      that such representations are very powerful, and often
+      indispensable for the types of recognition tasks considered here.
+    - Although for expediency's sake I relied on direct knowledge of
+      joint positions in this proof of concept, it would be
+      straightforward to extend =EMPATH= so that it (more
+      realistically) infers joint positions from its visual data.
+
+# because the underlying language is fundamentally orientation-independent
+
+# recognize the actions of a worm with 95\% accuracy. The
+# recognition tasks
 
-   I propose a system that can express the types of recognition
-   problems above in a form amenable to computation. It is split into
+
+
+
+
+   [Talk about these results and what you find promising about them]
+
+** Roadmap
+   [I'm going to explain how =CORTEX= works, then break down how
+   =EMPATH= does its thing. Because the details reveal such-and-such
+   about the approach.]
+
+   # The success of this simple proof-of-concept offers a tantalizing
+
+
+   # explore the idea
+   # The key contribution of this thesis is the idea that body-centered
+   # representations (which express
+
+
+   # the
+   # body-centered approach --- in which I try to determine what's
+   # happening in a scene by bringing it into registration with my own
+   # bodily experiences --- are indispensible for recognizing what
+   # creatures are doing in a scene.
+
+* COMMENT
+# body-centered language
+
+   In this thesis, I'll describe =EMPATH=, which solves a certain
+   class of recognition problems
+
+   The key idea is to use self-centered (or first-person) language.
+
+
+   I have built a system that can express the types of recognition
+   problems in a form amenable to computation. It is split into
  four parts:
 
    - Free/Guided Play :: The creature moves around and experiences the
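To make the body-centered matching idea behind the =EMPATH= results above concrete, here is a minimal sketch in Clojure. It is only an illustration under simplifying assumptions: the function names, the =:proprioception= key, the pose-binning scheme, and the threshold in =curled?= are invented for this sketch and are not =EMPATH='s actual interface.

#+begin_src clojure
;; Sketch: index first-person experience by proprioception, then
;; interpret another creature's observed joint angles by recalling
;; what it felt like to be in that pose.

(defn bin-pose
  "Discretize a vector of joint angles (radians) so that similar
   poses map to the same key."
  [joint-angles resolution]
  (mapv #(Math/round (double (/ % resolution))) joint-angles))

(defn index-experience
  "Build a map from binned pose to the first-person experiences felt
   in that pose. Each experience is a map with at least a
   :proprioception vector; the other senses ride along with it."
  [experiences resolution]
  (group-by #(bin-pose (:proprioception %) resolution) experiences))

(defn recall
  "Given only another creature's observed joint angles, recall the
   experiences recorded in the matching pose."
  [index observed-angles resolution]
  (get index (bin-pose observed-angles resolution) []))

(defn curled?
  "Action predicate written against *recalled* experience: the worm
   counts as curled when every joint is strongly flexed."
  [recalled]
  (boolean (when-let [angles (:proprioception (first recalled))]
             (every? #(> % 1.0) angles))))
#+end_src

Classifying an observed pose then reduces to something like =(curled? (recall index observed-angles 0.35))=, with the naive binning and hand-picked threshold standing in for the more careful inference described in section \ref{sec-3}.
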
@@ -286,14 +383,14 @@
      code to create a creature, and can use a wide library of
      pre-existing blender models as a base for your own creatures.
 
-   - =CORTEX= implements a wide variety of senses, including touch,
+   - =CORTEX= implements a wide variety of senses: touch,
      proprioception, vision, hearing, and muscle tension. Complicated
      senses like touch, and vision involve multiple sensory elements
      embedded in a 2D surface. You have complete control over the
      distribution of these sensor elements through the use of simple
      png image files. In particular, =CORTEX= implements more
      comprehensive hearing than any other creature simulation system
-     available. 
+     available.
 
    - =CORTEX= supports any number of creatures and any number of
      senses. Time in =CORTEX= dialates so that the simulated creatures
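The senses bullet above notes that the distribution of touch and vision sensor elements is controlled by plain png image files. As a rough illustration of that idea, here is a sketch using standard Java imaging; it is not =CORTEX='s actual API, and the convention that white pixels mark sensor locations is an assumption made only for this sketch.

#+begin_src clojure
(import '(javax.imageio ImageIO)
        '(java.io File))

(defn sensor-coordinates
  "Return the [x y] coordinates of every white pixel in a png image.
   For this sketch, each white pixel stands for one sensor element
   on the creature's surface."
  [png-path]
  (let [img (ImageIO/read (File. ^String png-path))]
    (for [x (range (.getWidth img))
          y (range (.getHeight img))
          :when (= 0xFFFFFF (bit-and 0xFFFFFF (.getRGB img x y)))]
      [x y])))
#+end_src

Calling =(sensor-coordinates "touch-profile.png")= (a hypothetical filename) would yield one =[x y]= pair per sensor element, so the density and layout of sensors can be edited with any image editor.
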
@@ -353,7 +450,24 @@
 \end{sidewaysfigure}
 #+END_LaTeX
 
-** Contributions
+** Road map
+
+   By the end of this thesis, you will have seen a novel approach to
+   interpreting video using embodiment and empathy. You will have also
+   seen one way to efficiently implement empathy for embodied
+   creatures. Finally, you will become familiar with =CORTEX=, a system
+   for designing and simulating creatures with rich senses, which you
+   may choose to use in your own research.
+
+   This is the core vision of my thesis: That one of the important ways
+   in which we understand others is by imagining ourselves in their
+   position and empathically feeling experiences relative to our own
+   bodies. By understanding events in terms of our own previous
+   corporeal experience, we greatly constrain the possibilities of what
+   would otherwise be an unwieldy exponential search. This extra
+   constraint can be the difference between easily understanding what
+   is happening in a video and being completely lost in a sea of
+   incomprehensible color and movement.
 
    - I built =CORTEX=, a comprehensive platform for embodied AI
      experiments. =CORTEX= supports many features lacking in other
@@ -363,18 +477,22 @@
    - I built =EMPATH=, which uses =CORTEX= to identify the actions of
      a worm-like creature using a computational model of empathy.
 
-* Building =CORTEX=
-
-  I intend for =CORTEX= to be used as a general-purpose library for
-  building creatures and outfitting them with senses, so that it will
-  be useful for other researchers who want to test out ideas of their
-  own. To this end, wherver I have had to make archetictural choices
-  about =CORTEX=, I have chosen to give as much freedom to the user as
-  possible, so that =CORTEX= may be used for things I have not
-  forseen.
-
-** Simulation or Reality?
-
+
+* Designing =CORTEX=
+  In this section, I outline the design decisions that went into
+  making =CORTEX=, along with some details about its
+  implementation. (A practical guide to getting started with =CORTEX=,
+  which skips over the history and implementation details presented
+  here, is provided in an appendix \ref{} at the end of this paper.)
+
+  Throughout this project, I intended for =CORTEX= to be flexible and
+  extensible enough to be useful for other researchers who want to
+  test out ideas of their own. To this end, wherever I have had to make
+  architectural choices about =CORTEX=, I have chosen to give as much
+  freedom to the user as possible, so that =CORTEX= may be used for
+  things I have not foreseen.
+
+** Building in simulation versus reality
   The most important archetictural decision of all is the choice to
  use a computer-simulated environemnt in the first place! The world
  is a vast and rich place, and for now simulations are a very poor