     1 diff -r f639e2139ce2 thesis/cortex.org

     2 --- a/thesis/cortex.org	Sun Mar 30 01:34:43 2014 -0400

     3 +++ b/thesis/cortex.org	Sun Mar 30 10:07:17 2014 -0400

     4 @@ -41,49 +41,46 @@

     5      [[./images/aurellem-gray.png]]

     6  

     7  

     8 -* Empathy and Embodiment as problem solving strategies

     9 +* Empathy \& Embodiment: problem solving strategies

    10    

    11 -  By the end of this thesis, you will have seen a novel approach to

    12 -  interpreting video using embodiment and empathy. You will have also

    13 -  seen one way to efficiently implement empathy for embodied

    14 -  creatures. Finally, you will become familiar with =CORTEX=, a system

    15 -  for designing and simulating creatures with rich senses, which you

    16 -  may choose to use in your own research.

    17 -  

    18 -  This is the core vision of my thesis: That one of the important ways

    19 -  in which we understand others is by imagining ourselves in their

    20 -  position and emphatically feeling experiences relative to our own

    21 -  bodies. By understanding events in terms of our own previous

    22 -  corporeal experience, we greatly constrain the possibilities of what

    23 -  would otherwise be an unwieldy exponential search. This extra

    24 -  constraint can be the difference between easily understanding what

    25 -  is happening in a video and being completely lost in a sea of

    26 -  incomprehensible color and movement.

    27 -  

    28 -** Recognizing actions in video is extremely difficult

    29 -

    30 -   Consider for example the problem of determining what is happening

    31 -   in a video of which this is one frame:

    32 -

    33 +** The problem: recognizing actions in video is extremely difficult

    34 +# developing / requires useful representations

    35 +   

    36 +   Examine the following collection of images. As you, and indeed very

    37 +   young children, can easily determine, each one is a picture of

    38 +   someone drinking. 

    39 +

    40 +   # dxh: cat, cup, drinking fountain, rain, straw, coconut

    41     #+caption: A cat drinking some water. Identifying this action is 

    42 -   #+caption: beyond the state of the art for computers.

    43 +   #+caption: beyond the capabilities of existing computer vision systems.

    44     #+ATTR_LaTeX: :width 7cm

    45     [[./images/cat-drinking.jpg]]

    46 +     

    47 +   Nevertheless, it is beyond the state of the art for a computer

    48 +   vision program to describe what's happening in each of these

    49 +   images, or what's common to them. Part of the problem is that many

    50 +   computer vision systems focus on pixel-level details or probability

    51 +   distributions of pixels, with little focus on [...]

    52 +

    53 +

    54 +   In fact, the contents of a scene may have much less to do with pixel

    55 +   probabilities than with recognizing various affordances: things you

    56 +   can move, objects you can grasp, spaces that can be filled

    57 +   (Gibson). For example, what processes might enable you to see the

    58 +   chair in figure \ref{hidden-chair}? 

    59 +   # Or suppose that you are building a program that recognizes chairs.

    60 +   # How could you ``see'' the chair ?

    61     

    62 -   It is currently impossible for any computer program to reliably

    63 -   label such a video as ``drinking''. And rightly so -- it is a very

    64 -   hard problem! What features can you describe in terms of low level

    65 -   functions of pixels that can even begin to describe at a high level

    66 -   what is happening here?

    67 -  

    68 -   Or suppose that you are building a program that recognizes chairs.

    69 -   How could you ``see'' the chair in figure \ref{hidden-chair}?

    70 -   

    71 +   # dxh: blur chair

    72     #+caption: The chair in this image is quite obvious to humans, but I 

    73     #+caption: doubt that any modern computer vision program can find it.

    74     #+name: hidden-chair

    75     #+ATTR_LaTeX: :width 10cm

    76     [[./images/fat-person-sitting-at-desk.jpg]]

    77 +

    78 +

    79 +   

    80 +

    81     

    82     Finally, how is it that you can easily tell the difference between

    83     how the girls /muscles/ are working in figure \ref{girl}?

    84 @@ -95,10 +92,13 @@

    85     #+ATTR_LaTeX: :width 7cm

    86     [[./images/wall-push.png]]

    87    

    88 +

    89 +

    90 +

    91     Each of these examples tells us something about what might be going

    92     on in our minds as we easily solve these recognition problems.

    93     

    94 -   The hidden chairs show us that we are strongly triggered by cues

    95 +   The hidden chair shows us that we are strongly triggered by cues

    96     relating to the position of human bodies, and that we can determine

    97     the overall physical configuration of a human body even if much of

    98     that body is occluded.

    99 @@ -109,10 +109,107 @@

   100     most positions, and we can easily project this self-knowledge to

   101     imagined positions triggered by images of the human body.

   102  

   103 -** =EMPATH= neatly solves recognition problems  

   104 +** A step forward: the sensorimotor-centered approach

   105 +# ** =EMPATH= recognizes what creatures are doing

   106 +# neatly solves recognition problems  

   107 +   In this thesis, I explore the idea that our knowledge of our own

   108 +   bodies enables us to recognize the actions of others. 

   109 +

   110 +   First, I built a system for constructing virtual creatures with

   111 +   physiologically plausible sensorimotor systems and detailed

   112 +   environments. The result is =CORTEX=, which is described in section

   113 +   \ref{sec-2}. (=CORTEX= was built to be flexible and useful to other

   114 +   AI researchers; it is provided in full with detailed instructions

   115 +   on the web [here].)

   116 +

   117 +   Next, I wrote routines which enabled a simple worm-like creature to

   118 +   infer the actions of a second worm-like creature, using only its

   119 +   own prior sensorimotor experiences and knowledge of the second

   120 +   worm's joint positions. This program, =EMPATH=, is described in

   121 +   section \ref{sec-3}, and the key results of this experiment are

   122 +   summarized below.

   123 +

   124 +  #+caption: From only \emph{proprioceptive} data, =EMPATH= was able to infer 

   125 +  #+caption: the complete sensory experience and classify these four poses.

   126 +  #+caption: The last image is a composite, depicting the intermediate stages of \emph{wriggling}.

   127 +  #+name: worm-recognition-intro-2

   128 +  #+ATTR_LaTeX: :width 15cm

   129 +   [[./images/empathy-1.png]]

   130 +

   131 +   # =CORTEX= provides a language for describing the sensorimotor

   132 +   # experiences of various creatures. 

   133 +

   134 +   # Next, I developed an experiment to test the power of =CORTEX='s

   135 +   # sensorimotor-centered language for solving recognition problems. As

   136 +   # a proof of concept, I wrote routines which enabled a simple

   137 +   # worm-like creature to infer the actions of a second worm-like

   138 +   # creature, using only its own previous sensorimotor experiences and

   139 +   # knowledge of the second worm's joints (figure

   140 +   # \ref{worm-recognition-intro-2}). The result of this proof of

   141 +   # concept was the program =EMPATH=, described in section

   142 +   # \ref{sec-3}. The key results of this

   143 +

   144 +   # Using only first-person sensorimotor experiences and third-person

   145 +   # proprioceptive data, 

   146 +

   147 +*** Key results

   148 +   - After one-shot supervised training, =EMPATH= was able to recognize a

   149 +     wide variety of static poses and dynamic actions---ranging from

   150 +     curling in a circle to wriggling with a particular frequency ---

   151 +     with 95\% accuracy.

   152 +   - These results were completely independent of viewing angle

   153 +     because the underlying body-centered language fundamentally is;

   154 +     once an action is learned, it can be recognized equally well from

   155 +     any viewing angle.

   156 +   - =EMPATH= is surprisingly short; the sensorimotor-centered

   157 +     language provided by =CORTEX= resulted in extremely economical

   158 +     recognition routines --- about 0000 lines in all --- suggesting

   159 +     that such representations are very powerful, and often

   160 +     indispensable for the types of recognition tasks considered here.

   161 +   - Although for expediency's sake, I relied on direct knowledge of

   162 +     joint positions in this proof of concept, it would be

   163 +     straightforward to extend =EMPATH= so that it (more

   164 +     realistically) infers joint positions from its visual data.

   165 +

   166 +# because the underlying language is fundamentally orientation-independent

   167 +

   168 +# recognize the actions of a worm with 95\% accuracy. The

   169 +#      recognition tasks 

   170     

   171 -   I propose a system that can express the types of recognition

   172 -   problems above in a form amenable to computation. It is split into

   173 +

   174 +

   175 +

   176 +   [Talk about these results and what you find promising about them]

   177 +

   178 +** Roadmap

   179 +   [I'm going to explain how =CORTEX= works, then break down how

   180 +   =EMPATH= does its thing. Because the details reveal such-and-such

   181 +   about the approach.]

   182 +

   183 +   # The success of this simple proof-of-concept offers a tantalizing

   184 +

   185 +

   186 +   # explore the idea 

   187 +   # The key contribution of this thesis is the idea that body-centered

   188 +   # representations (which express 

   189 +

   190 +

   191 +   # the

   192 +   # body-centered approach --- in which I try to determine what's

   193 +   # happening in a scene by bringing it into registration with my own

   194 +   # bodily experiences --- are indispensable for recognizing what

   195 +   # creatures are doing in a scene.

   196 +

   197 +* COMMENT

   198 +# body-centered language

   199 +   

   200 +   In this thesis, I'll describe =EMPATH=, which solves a certain

   201 +   class of recognition problems 

   202 +

   203 +   The key idea is to use self-centered (or first-person) language.

   204 +

   205 +   I have built a system that can express the types of recognition

   206 +   problems in a form amenable to computation. It is split into

   207     four parts:

   208  

   209     - Free/Guided Play :: The creature moves around and experiences the

   210 @@ -286,14 +383,14 @@

   211       code to create a creature, and can use a wide library of

   212       pre-existing blender models as a base for your own creatures.

   213  

   214 -   - =CORTEX= implements a wide variety of senses, including touch,

   215 +   - =CORTEX= implements a wide variety of senses: touch,

   216       proprioception, vision, hearing, and muscle tension. Complicated

   217       senses like touch, and vision involve multiple sensory elements

   218       embedded in a 2D surface. You have complete control over the

   219       distribution of these sensor elements through the use of simple

   220       png image files. In particular, =CORTEX= implements more

   221       comprehensive hearing than any other creature simulation system

   222 -     available. 

   223 +     available.

   224  

   225     - =CORTEX= supports any number of creatures and any number of

   226       senses. Time in =CORTEX= dialates so that the simulated creatures

   227 @@ -353,7 +450,24 @@

   228     \end{sidewaysfigure}

   229  #+END_LaTeX

   230  

   231 -** Contributions

   232 +** Road map

   233 +

   234 +   By the end of this thesis, you will have seen a novel approach to

   235 +  interpreting video using embodiment and empathy. You will have also

   236 +  seen one way to efficiently implement empathy for embodied

   237 +  creatures. Finally, you will become familiar with =CORTEX=, a system

   238 +  for designing and simulating creatures with rich senses, which you

   239 +  may choose to use in your own research.

   240 +  

   241 +  This is the core vision of my thesis: That one of the important ways

   242 +  in which we understand others is by imagining ourselves in their

   243 +  position and empathically feeling experiences relative to our own

   244 +  bodies. By understanding events in terms of our own previous

   245 +  corporeal experience, we greatly constrain the possibilities of what

   246 +  would otherwise be an unwieldy exponential search. This extra

   247 +  constraint can be the difference between easily understanding what

   248 +  is happening in a video and being completely lost in a sea of

   249 +  incomprehensible color and movement.

   250  

   251     - I built =CORTEX=, a comprehensive platform for embodied AI

   252       experiments. =CORTEX= supports many features lacking in other

   253 @@ -363,18 +477,22 @@

   254     - I built =EMPATH=, which uses =CORTEX= to identify the actions of

   255       a worm-like creature using a computational model of empathy.

   256     

   257 -* Building =CORTEX=

   258 -

   259 -  I intend for =CORTEX= to be used as a general-purpose library for

   260 -  building creatures and outfitting them with senses, so that it will

   261 -  be useful for other researchers who want to test out ideas of their

   262 -  own. To this end, wherver I have had to make archetictural choices

   263 -  about =CORTEX=, I have chosen to give as much freedom to the user as

   264 -  possible, so that =CORTEX= may be used for things I have not

   265 -  forseen.

   266 -

   267 -** Simulation or Reality?

   268 -   

   269 +

   270 +* Designing =CORTEX=

   271 +  In this section, I outline the design decisions that went into

   272 +  making =CORTEX=, along with some details about its

   273 +  implementation. (A practical guide to getting started with =CORTEX=,

   274 +  which skips over the history and implementation details presented

   275 +  here, is provided in an appendix \ref{} at the end of this paper.)

   276 +

   277 +  Throughout this project, I intended for =CORTEX= to be flexible and

   278 +  extensible enough to be useful for other researchers who want to

   279 +  test out ideas of their own. To this end, wherever I have had to make

   280 +  architectural choices about =CORTEX=, I have chosen to give as much

   281 +  freedom to the user as possible, so that =CORTEX= may be used for

   282 +  things I have not foreseen.

   283 +

   284 +** Building in simulation versus reality

   285     The most important archetictural decision of all is the choice to

   286     use a computer-simulated environemnt in the first place! The world

   287     is a vast and rich place, and for now simulations are a very poor
author	Robert McIntyre <rlm@mit.edu>
date	Sun, 27 Apr 2014 20:25:22 -0400
parents	90b236381642
children