
     1 diff -r f639e2139ce2 thesis/cortex.org

     2 --- a/thesis/cortex.org	Sun Mar 30 01:34:43 2014 -0400

     3 +++ b/thesis/cortex.org	Sun Mar 30 10:07:17 2014 -0400

     4 @@ -41,49 +41,46 @@

     5      [[./images/aurellem-gray.png]]

     6  

     7  

     8 -* Empathy and Embodiment as problem solving strategies

     9 +* Empathy \& Embodiment: problem solving strategies

    10    

    11 -  By the end of this thesis, you will have seen a novel approach to

    12 -  interpreting video using embodiment and empathy. You will have also

    13 -  seen one way to efficiently implement empathy for embodied

    14 -  creatures. Finally, you will become familiar with =CORTEX=, a system

    15 -  for designing and simulating creatures with rich senses, which you

    16 -  may choose to use in your own research.

    17 -  

    18 -  This is the core vision of my thesis: That one of the important ways

    19 -  in which we understand others is by imagining ourselves in their

    20 -  position and emphatically feeling experiences relative to our own

    21 -  bodies. By understanding events in terms of our own previous

    22 -  corporeal experience, we greatly constrain the possibilities of what

    23 -  would otherwise be an unwieldy exponential search. This extra

    24 -  constraint can be the difference between easily understanding what

    25 -  is happening in a video and being completely lost in a sea of

    26 -  incomprehensible color and movement.

    27 -  

    28 -** Recognizing actions in video is extremely difficult

    29 -

    30 -   Consider for example the problem of determining what is happening

    31 -   in a video of which this is one frame:

    32 -

    33 +** The problem: recognizing actions in video is extremely difficult

    34 +# developing / requires useful representations

    35 +   

    36 +   Examine the following collection of images. As you, and indeed very

    37 +   young children, can easily determine, each one is a picture of

    38 +   someone drinking. 

    39 +

    40 +   # dxh: cat, cup, drinking fountain, rain, straw, coconut

    41     #+caption: A cat drinking some water. Identifying this action is 

    42 -   #+caption: beyond the state of the art for computers.

    43 +   #+caption: beyond the capabilities of existing computer vision systems.

    44     #+ATTR_LaTeX: :width 7cm

    45     [[./images/cat-drinking.jpg]]

    46 +     

    47 +   Nevertheless, it is beyond the state of the art for a computer

    48 +   vision program to describe what's happening in each of these

    49 +   images, or what's common to them. Part of the problem is that many

    50 +   computer vision systems focus on pixel-level details or probability

    51 +   distributions of pixels, with little focus on [...]

    52 +

    53 +

    54 +   In fact, the contents of a scene may have much less to do with pixel

    55 +   probabilities than with recognizing various affordances: things you

    56 +   can move, objects you can grasp, spaces that can be filled

    57 +   (Gibson). For example, what processes might enable you to see the

    58 +   chair in figure \ref{hidden-chair}? 

    59 +   # Or suppose that you are building a program that recognizes chairs.

    60 +   # How could you ``see'' the chair ?

    61     

    62 -   It is currently impossible for any computer program to reliably

    63 -   label such a video as ``drinking''. And rightly so -- it is a very

    64 -   hard problem! What features can you describe in terms of low level

    65 -   functions of pixels that can even begin to describe at a high level

    66 -   what is happening here?

    67 -  

    68 -   Or suppose that you are building a program that recognizes chairs.

    69 -   How could you ``see'' the chair in figure \ref{hidden-chair}?

    70 -   

    71 +   # dxh: blur chair

    72     #+caption: The chair in this image is quite obvious to humans, but I 

    73     #+caption: doubt that any modern computer vision program can find it.

    74     #+name: hidden-chair

    75     #+ATTR_LaTeX: :width 10cm

    76     [[./images/fat-person-sitting-at-desk.jpg]]

    77 +

    78 +

    79 +   

    80 +

    81     

    82     Finally, how is it that you can easily tell the difference between

    83     how the girls /muscles/ are working in figure \ref{girl}?

    84 @@ -95,10 +92,13 @@

    85     #+ATTR_LaTeX: :width 7cm

    86     [[./images/wall-push.png]]

    87    

    88 +

    89 +

    90 +

    91     Each of these examples tells us something about what might be going

    92     on in our minds as we easily solve these recognition problems.

    93     

    94 -   The hidden chairs show us that we are strongly triggered by cues

    95 +   The hidden chair shows us that we are strongly triggered by cues

    96     relating to the position of human bodies, and that we can determine

    97     the overall physical configuration of a human body even if much of

    98     that body is occluded.

    99 @@ -109,10 +109,107 @@

   100     most positions, and we can easily project this self-knowledge to

   101     imagined positions triggered by images of the human body.

   102  

   103 -** =EMPATH= neatly solves recognition problems  

   104 +** A step forward: the sensorimotor-centered approach

   105 +# ** =EMPATH= recognizes what creatures are doing

   106 +# neatly solves recognition problems  

   107 +   In this thesis, I explore the idea that our knowledge of our own

   108 +   bodies enables us to recognize the actions of others. 

   109 +

   110 +   First, I built a system for constructing virtual creatures with

   111 +   physiologically plausible sensorimotor systems and detailed

   112 +   environments. The result is =CORTEX=, which is described in section

   113 +   \ref{sec-2}. (=CORTEX= was built to be flexible and useful to other

   114 +   AI researchers; it is provided in full with detailed instructions

   115 +   on the web [here].)

   116 +

   117 +   Next, I wrote routines which enabled a simple worm-like creature to

   118 +   infer the actions of a second worm-like creature, using only its

   119 +   own prior sensorimotor experiences and knowledge of the second

   120 +   worm's joint positions. This program, =EMPATH=, is described in

   121 +   section \ref{sec-3}, and the key results of this experiment are

   122 +   summarized below.

   123 +

   124 +  #+caption: From only \emph{proprioceptive} data, =EMPATH= was able to infer 

   125 +  #+caption: the complete sensory experience and classify these four poses.

   126 +  #+caption: The last image is a composite, depicting the intermediate stages of \emph{wriggling}.

   127 +  #+name: worm-recognition-intro-2

   128 +  #+ATTR_LaTeX: :width 15cm

   129 +   [[./images/empathy-1.png]]

   130 +

   131 +   # =CORTEX= provides a language for describing the sensorimotor

   132 +   # experiences of various creatures. 

   133 +

   134 +   # Next, I developed an experiment to test the power of =CORTEX='s

   135 +   # sensorimotor-centered language for solving recognition problems. As

   136 +   # a proof of concept, I wrote routines which enabled a simple

   137 +   # worm-like creature to infer the actions of a second worm-like

   138 +   # creature, using only its own previous sensorimotor experiences and

   139 +   # knowledge of the second worm's joints (figure

   140 +   # \ref{worm-recognition-intro-2}). The result of this proof of

   141 +   # concept was the program =EMPATH=, described in section

   142 +   # \ref{sec-3}. The key results of this

   143 +

   144 +   # Using only first-person sensorimotor experiences and third-person

   145 +   # proprioceptive data, 

   146 +

   147 +*** Key results

   148 +   - After one-shot supervised training, =EMPATH= was able to recognize a

   149 +     wide variety of static poses and dynamic actions---ranging from

   150 +     curling in a circle to wriggling with a particular frequency ---

   151 +     with 95\% accuracy.

   152 +   - These results were completely independent of viewing angle

   153 +     because the underlying body-centered language is orientation-independent;

   154 +     once an action is learned, it can be recognized equally well from

   155 +     any viewing angle.

   156 +   - =EMPATH= is surprisingly short; the sensorimotor-centered

   157 +     language provided by =CORTEX= resulted in extremely economical

   158 +     recognition routines --- about 0000 lines in all --- suggesting

   159 +     that such representations are very powerful, and often

   160 +     indispensable for the types of recognition tasks considered here.

   161 +   - Although for expediency's sake, I relied on direct knowledge of

   162 +     joint positions in this proof of concept, it would be

   163 +     straightforward to extend =EMPATH= so that it (more

   164 +     realistically) infers joint positions from its visual data.

   165 +

   166 +# because the underlying language is fundamentally orientation-independent

   167 +

   168 +# recognize the actions of a worm with 95\% accuracy. The

   169 +#      recognition tasks 

   170     

   171 -   I propose a system that can express the types of recognition

   172 -   problems above in a form amenable to computation. It is split into

   173 +

   174 +

   175 +

   176 +   [Talk about these results and what you find promising about them]

   177 +

   178 +** Roadmap

   179 +   [I'm going to explain how =CORTEX= works, then break down how

   180 +   =EMPATH= does its thing. Because the details reveal such-and-such

   181 +   about the approach.]

   182 +

   183 +   # The success of this simple proof-of-concept offers a tantalizing

   184 +

   185 +

   186 +   # explore the idea 

   187 +   # The key contribution of this thesis is the idea that body-centered

   188 +   # representations (which express 

   189 +

   190 +

   191 +   # the

   192 +   # body-centered approach --- in which I try to determine what's

   193 +   # happening in a scene by bringing it into registration with my own

   194 +   # bodily experiences --- are indispensible for recognizing what

   195 +   # creatures are doing in a scene.

   196 +

   197 +* COMMENT

   198 +# body-centered language

   199 +   

   200 +   In this thesis, I'll describe =EMPATH=, which solves a certain

   201 +   class of recognition problems 

   202 +

   203 +   The key idea is to use self-centered (or first-person) language.

   204 +

   205 +   I have built a system that can express the types of recognition

   206 +   problems in a form amenable to computation. It is split into

   207     four parts:

   208  

   209     - Free/Guided Play :: The creature moves around and experiences the

   210 @@ -286,14 +383,14 @@

   211       code to create a creature, and can use a wide library of

   212       pre-existing blender models as a base for your own creatures.

   213  

   214 -   - =CORTEX= implements a wide variety of senses, including touch,

   215 +   - =CORTEX= implements a wide variety of senses: touch,

   216       proprioception, vision, hearing, and muscle tension. Complicated

   217       senses like touch, and vision involve multiple sensory elements

   218       embedded in a 2D surface. You have complete control over the

   219       distribution of these sensor elements through the use of simple

   220       png image files. In particular, =CORTEX= implements more

   221       comprehensive hearing than any other creature simulation system

   222 -     available. 

   223 +     available.

   224  

   225     - =CORTEX= supports any number of creatures and any number of

   226       senses. Time in =CORTEX= dialates so that the simulated creatures

   227 @@ -353,7 +450,24 @@

   228     \end{sidewaysfigure}

   229  #+END_LaTeX

   230  

   231 -** Contributions

   232 +** Road map

   233 +

   234 +   By the end of this thesis, you will have seen a novel approach to

   235 +  interpreting video using embodiment and empathy. You will have also

   236 +  seen one way to efficiently implement empathy for embodied

   237 +  creatures. Finally, you will become familiar with =CORTEX=, a system

   238 +  for designing and simulating creatures with rich senses, which you

   239 +  may choose to use in your own research.

   240 +  

   241 +  This is the core vision of my thesis: That one of the important ways

   242 +  in which we understand others is by imagining ourselves in their

   243 +  position and empathically feeling experiences relative to our own

   244 +  bodies. By understanding events in terms of our own previous

   245 +  corporeal experience, we greatly constrain the possibilities of what

   246 +  would otherwise be an unwieldy exponential search. This extra

   247 +  constraint can be the difference between easily understanding what

   248 +  is happening in a video and being completely lost in a sea of

   249 +  incomprehensible color and movement.

   250  

   251     - I built =CORTEX=, a comprehensive platform for embodied AI

   252       experiments. =CORTEX= supports many features lacking in other

   253 @@ -363,18 +477,22 @@

   254     - I built =EMPATH=, which uses =CORTEX= to identify the actions of

   255       a worm-like creature using a computational model of empathy.

   256     

   257 -* Building =CORTEX=

   258 -

   259 -  I intend for =CORTEX= to be used as a general-purpose library for

   260 -  building creatures and outfitting them with senses, so that it will

   261 -  be useful for other researchers who want to test out ideas of their

   262 -  own. To this end, wherver I have had to make archetictural choices

   263 -  about =CORTEX=, I have chosen to give as much freedom to the user as

   264 -  possible, so that =CORTEX= may be used for things I have not

   265 -  forseen.

   266 -

   267 -** Simulation or Reality?

   268 -   

   269 +

   270 +* Designing =CORTEX=

   271 +  In this section, I outline the design decisions that went into

   272 +  making =CORTEX=, along with some details about its

   273 +  implementation. (A practical guide to getting started with =CORTEX=,

   274 +  which skips over the history and implementation details presented

   275 +  here, is provided in an appendix \ref{} at the end of this paper.)

   276 +

   277 +  Throughout this project, I intended for =CORTEX= to be flexible and

   278 +  extensible enough to be useful for other researchers who want to

   279 +  test out ideas of their own. To this end, wherever I have had to make

   280 +  architectural choices about =CORTEX=, I have chosen to give as much

   281 +  freedom to the user as possible, so that =CORTEX= may be used for

   282 +  things I have not foreseen.

   283 +

   284 +** Building in simulation versus reality

   285     The most important archetictural decision of all is the choice to

   286     use a computer-simulated environemnt in the first place! The world

   287     is a vast and rich place, and for now simulations are a very poor

   288 @@ -436,7 +554,7 @@

   289      doing everything in software is far cheaper than building custom

   290      real-time hardware. All you need is a laptop and some patience.

   291  

   292 -** Because of Time, simulation is perferable to reality

   293 +** Simulated time enables rapid prototyping and complex scenes 

   294  

   295     I envision =CORTEX= being used to support rapid prototyping and

   296     iteration of ideas. Even if I could put together a well constructed

   297 @@ -459,8 +577,8 @@

   298     simulations of very simple creatures in =CORTEX= generally run at

   299     40x on my machine!

   300  

   301 -** What is a sense?

   302 -   

   303 +** All sense organs are two-dimensional surfaces

   304 +# What is a sense?   

   305     If =CORTEX= is to support a wide variety of senses, it would help

   306     to have a better understanding of what a ``sense'' actually is!

   307     While vision, touch, and hearing all seem like they are quite

   308 @@ -956,7 +1074,7 @@

   309      #+ATTR_LaTeX: :width 15cm

   310      [[./images/physical-hand.png]]

   311  

   312 -** Eyes reuse standard video game components

   313 +** Sight reuses standard video game components...

   314  

   315     Vision is one of the most important senses for humans, so I need to

   316     build a simulated sense of vision for my AI. I will do this with

   317 @@ -1257,8 +1375,8 @@

   318      community and is now (in modified form) part of a system for

   319      capturing in-game video to a file.

   320  

   321 -** Hearing is hard; =CORTEX= does it right

   322 -   

   323 +** ...but hearing must be built from scratch

   324 +# is hard; =CORTEX= does it right

   325     At the end of this section I will have simulated ears that work the

   326     same way as the simulated eyes in the last section. I will be able to

   327     place any number of ear-nodes in a blender file, and they will bind to

   328 @@ -1565,7 +1683,7 @@

   329      jMonkeyEngine3 community and is used to record audio for demo

   330      videos.

   331  

   332 -** Touch uses hundreds of hair-like elements

   333 +** Hundreds of hair-like elements provide a sense of touch

   334  

   335     Touch is critical to navigation and spatial reasoning and as such I

   336     need a simulated version of it to give to my AI creatures.

   337 @@ -2059,7 +2177,7 @@

   338      #+ATTR_LaTeX: :width 15cm

   339      [[./images/touch-cube.png]]

   340  

   341 -** Proprioception is the sense that makes everything ``real''

   342 +** Proprioception provides knowledge of your own body's position

   343  

   344     Close your eyes, and touch your nose with your right index finger.

   345     How did you do it? You could not see your hand, and neither your

   346 @@ -2193,7 +2311,7 @@

   347      #+ATTR_LaTeX: :width 11cm

   348      [[./images/proprio.png]]

   349  

   350 -** Muscles are both effectors and sensors

   351 +** Muscles contain both sensors and effectors

   352  

   353     Surprisingly enough, terrestrial creatures only move by using

   354     torque applied about their joints. There's not a single straight

   355 @@ -2440,7 +2558,8 @@

   356          hard control problems without worrying about physics or

   357          senses.

   358  

   359 -* Empathy in a simulated worm

   360 +* =EMPATH=: the simulated worm experiment

   361 +# Empathy in a simulated worm

   362  

   363    Here I develop a computational model of empathy, using =CORTEX= as a

   364    base. Empathy in this context is the ability to observe another

   365 @@ -2732,7 +2851,7 @@

   366     provided by an experience vector and reliably infering the rest of

   367     the senses.

   368  

   369 -** Empathy is the process of tracing though \Phi-space 

   370 +** ``Empathy'' requires retracing steps through \Phi-space 

   371  

   372     Here is the core of a basic empathy algorithm, starting with an

   373     experience vector:

   374 @@ -2888,7 +3007,7 @@

   375     #+end_src

   376     #+end_listing

   377    

   378 -** Efficient action recognition with =EMPATH=

   379 +** =EMPATH= recognizes actions efficiently

   380     

   381     To use =EMPATH= with the worm, I first need to gather a set of

   382     experiences from the worm that includes the actions I want to

   383 @@ -3044,9 +3163,9 @@

   384    to interpretation, and dissaggrement between empathy and experience

   385    is more excusable.

   386  

   387 -** Digression: bootstrapping touch using free exploration

   388 -

   389 -   In the previous section I showed how to compute actions in terms of

   390 +** Digression: Learn touch sensor layout through haptic experimentation, instead 

   391 +# Bootstrapping touch using free exploration   

   392 +In the previous section I showed how to compute actions in terms of

   393     body-centered predicates which relied averate touch activation of

   394     pre-defined regions of the worm's skin. What if, instead of recieving

   395     touch pre-grouped into the six faces of each worm segment, the true
author	Robert McIntyre <rlm@mit.edu>
date	Sun, 30 Mar 2014 10:41:18 -0400
parents
children	447c3c8405a2