#+title: =CORTEX=
#+author: Robert McIntyre
#+email: rlm@mit.edu
#+description: Using embodied AI to facilitate Artificial Imagination.
#+keywords: AI, clojure, embodiment

* Empathy and Embodiment as problem solving strategies

  By the end of this thesis, you will have seen a novel approach to
  interpreting video using embodiment and empathy. You will also have
  seen one way to efficiently implement empathy for embodied
  creatures.

  The core vision of this thesis is that one of the important ways in
  which we understand others is by imagining ourselves in their
  position and empathically feeling experiences based on our own past
  experiences and imagination.

  By understanding events in terms of our own previous corporeal
  experience, we greatly constrain the possibilities of what would
  otherwise be an unwieldy exponential search. This extra constraint
  can be the difference between easily understanding what is happening
  in a video and being completely lost in a sea of incomprehensible
  color and movement.

** Recognizing actions in video is extremely difficult

  Consider, for example, the problem of determining what is happening
  in a video of which this is one frame:

  #+caption: A cat drinking some water. Identifying this action is
  #+caption: beyond the state of the art for computers.
  #+ATTR_LaTeX: :width 7cm
  [[./images/cat-drinking.jpg]]

  It is currently impossible for any computer program to reliably
  label such a video as "drinking". And rightly so -- it is a very
  hard problem! What features can you describe in terms of low-level
  functions of pixels that can even begin to describe what is
  happening here?

  Or suppose that you are building a program that recognizes
  chairs. How could you ``see'' the chair in the following pictures?

  #+caption: When you look at this, do you think ``chair''? I certainly do.
  #+ATTR_LaTeX: :width 10cm
  [[./images/invisible-chair.png]]

  #+caption: The chair in this image is quite obvious to humans, but I
  #+caption: doubt that any computer program can find it.
  #+ATTR_LaTeX: :width 10cm
  [[./images/fat-person-sitting-at-desk.jpg]]

  I think humans are able to label the cat video as "drinking"
  because they imagine /themselves/ as the cat, and imagine putting
  their face up against a stream of water and sticking out their
  tongue. In that imagined world, they can feel the cool water
  hitting their tongue, and feel the water entering their body, and
  are able to recognize that /feeling/ as drinking. So, the label of
  the action is not really in the pixels of the image, but is found
  clearly in a simulation inspired by those pixels. An imaginative
  system, having been trained on drinking and non-drinking examples
  and having learned that the most important component of drinking is
  the feeling of water sliding down one's throat, would analyze a
  video of a cat drinking in the following manner:

   - Create a physical model of the video by putting a "fuzzy" model
     of its own body in place of the cat. Also, create a simulation of
     the stream of water.

   - Play out this simulated scene and generate imagined sensory
     experience. This will include relevant muscle contractions, a
     close-up view of the stream from the cat's perspective, and most
     importantly, the imagined feeling of water entering the mouth.

   - The action is now easily identified as drinking by the sense of
     taste alone. The other senses (such as the tongue moving in and
     out) help to give plausibility to the simulated action. Note that
     the sense of vision, while critical in creating the simulation,
     is not critical for identifying the action from the simulation.
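
The final step above can be illustrated with a toy sketch. It is written in Python purely for illustration (the thesis itself targets Clojure) and is not the thesis's implementation: the =ImaginedFrame= record, its field names, and both thresholds are hypothetical, and the first two steps (building and running the simulation) are replaced by hand-written frames. It only shows that, once imagined sensations exist, the label can be read from the taste channel without consulting vision.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImaginedFrame:
    """One time step of imagined sensory experience (hypothetical layout)."""
    water_taste: float       # 0.0 (nothing) .. 1.0 (strong), imagined taste of water
    tongue_extension: float  # 0.0 (retracted) .. 1.0 (fully out), proprioception
    view: object = None      # imagined visual data; never read by the classifier


def label_action(frames: List[ImaginedFrame],
                 threshold: float = 0.5,
                 min_fraction: float = 0.3) -> str:
    """Label the imagined scene using the taste channel alone.

    Both thresholds are illustrative assumptions, not values from the thesis.
    """
    if not frames:
        return "unknown"
    wet = sum(1 for f in frames if f.water_taste >= threshold)
    return "drinking" if wet / len(frames) >= min_fraction else "not drinking"


def tongue_moves(frames: List[ImaginedFrame], min_range: float = 0.5) -> bool:
    """Plausibility check: does the tongue actually move in and out?"""
    extensions = [f.tongue_extension for f in frames]
    return bool(extensions) and (max(extensions) - min(extensions)) >= min_range
```

Here =label_action= never touches =view=, mirroring the point that vision is needed to build the simulation but not to name the action; =tongue_moves= plays the supporting role of the other senses.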

   cat drinking, mimes, leaning, common sense

** =EMPATH= neatly solves recognition problems

   factorization, right language, etc.

   a new possibility for the question ``what is a chair?'' -- it's the
   feeling of your butt on something and your knees bent, with your
   back muscles and legs relaxed.

** =CORTEX= is a toolkit for building sensate creatures

   Hand integration demo

** Contributions

* Building =CORTEX=

** To explore embodiment, we need a world, body, and senses

** Because of time, simulation is preferable to reality

** Video game engines are a great starting point

** Bodies are composed of segments connected by joints

** Eyes reuse standard video game components

** Hearing is hard; =CORTEX= does it right

** Touch uses hundreds of hair-like elements

** Proprioception is the force that makes everything ``real''

** Muscles are both effectors and sensors

** =CORTEX= brings complex creatures to life!

** =CORTEX= enables many possibilities for further research

* Empathy in a simulated worm

** Embodiment factors action recognition into manageable parts

** Action recognition is easy with a full gamut of senses

** Digression: bootstrapping touch using free exploration

** \Phi-space describes the worm's experiences

** Empathy is the process of tracing through \Phi-space

** Efficient action recognition via empathy

* Contributions
  - Built =CORTEX=, a comprehensive platform for embodied AI
    experiments. It has many features lacking in other systems, such
    as sound, and makes it easy to model and create new creatures.
  - Created a novel approach to action recognition using artificial
    imagination.

In the second half of the thesis I develop a computational model of
empathy, using =CORTEX= as a base. Empathy in this context is the
ability to observe another creature and infer what sorts of sensations
that creature is feeling. My empathy algorithm involves multiple
phases. First is free-play, where the creature moves around and gains
sensory experience. From this experience I construct a representation
of the creature's sensory state space, which I call \Phi-space. Using
\Phi-space, I construct an efficient function for enriching the
limited data that comes from observing another creature with a full
complement of imagined sensory data based on previous experience. I
can then use the imagined sensory data to recognize what the observed
creature is doing and feeling, using straightforward embodied action
predicates. All of this is demonstrated using a simple worm-like
creature, recognizing worm-actions based on limited data.
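
The phases described above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the thesis's Clojure implementation: it assumes the state space is simply the list of experiences recorded during free-play, that only proprioception (joint angles) can be observed in another creature, and that enrichment is a nearest-neighbor lookup. The =Experience= record and the functions =imagine= and =enrich= are hypothetical names, and the linear scan used here is not the efficient function the text refers to.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Experience:
    """One moment of free-play with every sense recorded (hypothetical layout)."""
    proprioception: Tuple[float, ...]  # joint angles: the only observable part
    touch: Tuple[float, ...]           # touch sensor activations
    muscle: Tuple[float, ...]          # muscle tensions


def imagine(experiences: Sequence[Experience],
            observed_joints: Sequence[float]) -> Experience:
    """Return the remembered experience whose posture best matches the observation."""
    if not experiences:
        raise ValueError("no recorded experience; run free-play first")
    return min(experiences,
               key=lambda e: math.dist(e.proprioception, observed_joints))


def enrich(experiences: Sequence[Experience],
           observed_postures: Sequence[Sequence[float]]) -> List[Experience]:
    """Replace each observed posture with a full set of imagined sensations."""
    return [imagine(experiences, joints) for joints in observed_postures]
```

Given a sequence of observed postures, =enrich= returns full experiences, so action predicates can then query imagined touch and muscle data that were never observed directly.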

Embodied representation using multiple senses such as touch,
proprioception, and muscle tension turns out to be exceedingly
efficient at describing body-centered actions. It is the ``right
language for the job''. For example, it takes only around 5 lines of
LISP code to describe the action of ``curling'' using embodied
primitives. It takes about 8 lines to describe the seemingly
complicated action of wiggling.
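
To give a feel for how short such a predicate can be, here is a rough Python sketch. It is an assumption-laden illustration rather than the thesis's LISP code, which is not reproduced in this section: it assumes proprioception is available as a list of joint bend angles in radians, and the name =curled= and the threshold value are hypothetical.

```python
import math
from typing import Sequence


def curled(joint_bends: Sequence[float],
           min_bend: float = math.pi / 3) -> bool:
    """True when every joint is flexed past min_bend (an assumed threshold)."""
    return bool(joint_bends) and all(b > min_bend for b in joint_bends)
```

The predicate looks only at the creature's own body state, which is what keeps it short: no pixels or world coordinates are involved.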

* COMMENT names for cortex
 - bioland
author	Robert McIntyre <rlm@mit.edu>
date	Sun, 23 Mar 2014 22:23:54 -0400
parents	4dcb923c9b16
children	b01c070b03d4