#+title: =CORTEX=
#+author: Robert McIntyre
#+email: rlm@mit.edu
#+description: Using embodied AI to facilitate Artificial Imagination.
#+keywords: AI, clojure, embodiment

* Empathy and Embodiment as problem solving strategies

By the end of this thesis, you will have seen a novel approach to
interpreting video using embodiment and empathy. You will also have
seen one way to efficiently implement empathy for embodied
creatures.

The core vision of this thesis is that one of the important ways in
which we understand others is by imagining ourselves in their
position and empathically feeling experiences based on our own past
experiences and imagination.

By understanding events in terms of our own previous corporeal
experience, we greatly constrain the possibilities of what would
otherwise be an unwieldy exponential search. This extra constraint
can be the difference between easily understanding what is happening
in a video and being completely lost in a sea of incomprehensible
color and movement.

** Recognizing actions in video is extremely difficult

Consider for example the problem of determining what is happening in
a video of which this is one frame:

#+caption: A cat drinking some water. Identifying this action is
#+caption: beyond the state of the art for computers.
#+ATTR_LaTeX: :width 7cm
[[./images/cat-drinking.jpg]]

It is currently impossible for any computer program to reliably
label such a video as "drinking". And rightly so -- it is a very
hard problem! What features can you describe in terms of low-level
functions of pixels that can even begin to describe what is
happening here?

Or suppose that you are building a program that recognizes
chairs. How could you ``see'' the chair in the following pictures?

#+caption: When you look at this, do you think ``chair''? I certainly do.
#+ATTR_LaTeX: :width 10cm
[[./images/invisible-chair.png]]

#+caption: The chair in this image is quite obvious to humans, but I
#+caption: doubt that any computer program can find it.
#+ATTR_LaTeX: :width 10cm
[[./images/fat-person-sitting-at-desk.jpg]]

Finally, how is it that you can easily tell the difference between
how the girl's /muscles/ are working in \ref{girl}?

#+caption: The mysterious ``common sense'' appears here as you are able
#+caption: to ``see'' the difference in how the girl's arm muscles
#+caption: are activated differently in the two images.
#+name: girl
#+ATTR_LaTeX: :width 10cm
[[./images/wall-push.png]]

These problems are difficult because the language of pixels is far
removed from what we would consider to be an acceptable description
of the events in these images. In order to process them, we must
raise the images into some higher level of abstraction where their
descriptions become more similar to how we would describe them in
English. The question is, how can we raise them?

I think humans are able to label such video as "drinking" because
they imagine /themselves/ as the cat, and imagine putting their face
up against a stream of water and sticking out their tongue. In that
imagined world, they can feel the cool water hitting their tongue,
and feel the water entering their body, and are able to recognize
that /feeling/ as drinking. So, the label of the action is not
really in the pixels of the image, but is found clearly in a
simulation inspired by those pixels.

An imaginative system, having been trained on drinking and
non-drinking examples and having learned that the most important
component of drinking is the feeling of water sliding down one's
throat, would analyze a video of a cat drinking in the following
manner (a toy sketch of this pipeline follows the list):

- Create a physical model of the video by putting a "fuzzy" model
  of its own body in place of the cat. Also, create a simulation of
  the stream of water.

- Play out this simulated scene and generate imagined sensory
  experience. This will include relevant muscle contractions, a
  close-up view of the stream from the cat's perspective, and most
  importantly, the imagined feeling of water entering the mouth.

- The action is now easily identified as drinking by the sense of
  taste alone. The other senses (such as the tongue moving in and
  out) help to give plausibility to the simulated action. Note that
  the sense of vision, while critical in creating the simulation,
  is not critical for identifying the action from the simulation.
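Purely as an illustration of the shape of this pipeline, here is a
toy Clojure sketch of those three steps. Every name in it is a
hypothetical placeholder, not part of =CORTEX=, and the "simulation"
is stubbed out with canned values:

#+begin_src clojure
;; Toy sketch of the three-step pipeline above. All functions are
;; hypothetical placeholders, not part of CORTEX.

(defn fit-body-model
  "Step 1: replace the cat in VIDEO with a fuzzy model of our own
   body, plus a simulated stream of water. (Stub.)"
  [video]
  {:body :fuzzy-self-model, :water :simulated-stream})

(defn imagine
  "Step 2: play out the simulated SCENE and return imagined sensory
   experience, keyed by sense. (Stub.)"
  [scene]
  {:taste :cool-water, :touch :wet-tongue, :muscles :lapping})

(defn drinking?
  "Step 3: classify the action from the imagined senses alone;
   taste is decisive, and the other senses add plausibility."
  [video]
  (let [senses (imagine (fit-body-model video))]
    (= :cool-water (:taste senses))))
#+end_src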
cat drinking, mimes, leaning, common sense

** =EMPATH= neatly solves recognition problems

factorization, right language, etc

a new possibility for the question ``what is a chair?'' -- it's the
feeling of your butt on something and your knees bent, with your
back muscles and legs relaxed.

** =CORTEX= is a toolkit for building sensate creatures

Hand integration demo

** Contributions

* Building =CORTEX=

** To explore embodiment, we need a world, body, and senses

** Because of Time, simulation is preferable to reality

** Video game engines are a great starting point

** Bodies are composed of segments connected by joints

** Eyes reuse standard video game components

** Hearing is hard; =CORTEX= does it right

** Touch uses hundreds of hair-like elements

** Proprioception is the sense that makes everything ``real''

** Muscles are both effectors and sensors

** =CORTEX= brings complex creatures to life!

** =CORTEX= enables many possibilities for further research

* Empathy in a simulated worm

** Embodiment factors action recognition into manageable parts

** Action recognition is easy with a full gamut of senses

** Digression: bootstrapping touch using free exploration

** \Phi-space describes the worm's experiences

** Empathy is the process of tracing through \Phi-space

** Efficient action recognition with =EMPATH=

* Contributions

- Built =CORTEX=, a comprehensive platform for embodied AI
  experiments. It has many new features lacking in other systems,
  such as sound, and makes it easy to model and create new
  creatures.
- Created a novel concept for action recognition based on
  artificial imagination.

In the second half of the thesis I develop a computational model of
empathy, using =CORTEX= as a base. Empathy in this context is the
ability to observe another creature and infer what sorts of
sensations that creature is feeling. My empathy algorithm involves
multiple phases. First is free-play, where the creature moves around
and gains sensory experience. From this experience I construct a
representation of the creature's sensory state space, which I call
\phi-space. Using \phi-space, I construct an efficient function for
enriching the limited data that comes from observing another
creature with a full complement of imagined sensory data based on
previous experience. I can then use the imagined sensory data to
recognize what the observed creature is doing and feeling, using
straightforward embodied action predicates. This is all demonstrated
using a simple worm-like creature, and by recognizing worm actions
based on limited data.

Embodied representation using multiple senses such as touch,
proprioception, and muscle tension turns out to be exceedingly
efficient at describing body-centered actions. It is the ``right
language for the job''. For example, it takes only around 5 lines of
LISP code to describe the action of ``curling'' using embodied
primitives, as the sketch below suggests. It takes about 8 lines to
describe the seemingly complicated action of wiggling.
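To illustrate what an embodied action predicate of that size could
look like, here is a minimal Clojure sketch (not the thesis code).
It assumes, hypothetically, that each experience frame stores
proprioception as a sequence of =[heading pitch bend]= angle triples,
one per joint, and the bend threshold is arbitrary:

#+begin_src clojure
;; Sketch of an embodied action predicate. Assumes each experience
;; frame is a map whose :proprioception entry holds one
;; [heading pitch bend] triple of joint angles per joint.
(defn curled?
  "True when every joint in the most recent experience frame is
   strongly bent."
  [experiences]
  (every?
   (fn [[_heading _pitch bend]]
     (> (Math/sin bend) 0.64))   ; bend threshold chosen for illustration
   (:proprioception (peek experiences))))

;; e.g. (curled? [{:proprioception [[0 0 1.2] [0 0 1.4]]}]) => true
#+end_src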
* COMMENT names for cortex
- bioland