diff thesis/cortex.org @ 437:c1e6b7221b2f

progress on intro.
author Robert McIntyre <rlm@mit.edu>
date Sun, 23 Mar 2014 22:20:44 -0400
parents 853377051f1e
children 4dcb923c9b16
     1.1 --- a/thesis/cortex.org	Sun Mar 23 19:09:14 2014 -0400
     1.2 +++ b/thesis/cortex.org	Sun Mar 23 22:20:44 2014 -0400
     1.3 @@ -4,26 +4,102 @@
     1.4  #+description: Using embodied AI to facilitate Artificial Imagination.
     1.5  #+keywords: AI, clojure, embodiment
     1.6  
     1.7 -* Embodiment is a critical component of Intelligence
     1.8 +
     1.9 +* Empathy and Embodiment as a problem solving strategy
    1.10 +  
     1.11 +  By the end of this thesis, you will have seen a novel approach to
     1.12 +  interpreting video using embodiment and empathy. You will also have
     1.13 +  seen one way to efficiently implement empathy for embodied
     1.14 +  creatures.
    1.15 +  
     1.16 +  The core vision of this thesis is that one of the important ways in
     1.17 +  which we understand others is by imagining ourselves in their
     1.18 +  position and empathically feeling experiences based on our own past
     1.19 +  experiences and imagination.
    1.20 +
     1.21 +  By understanding events in terms of our own previous corporeal
     1.22 +  experience, we greatly constrain the possibilities of what would
     1.23 +  otherwise be an unwieldy exponential search. This extra constraint
     1.24 +  can be the difference between easily understanding what is happening
     1.25 +  in a video and being completely lost in a sea of incomprehensible
     1.26 +  color and movement.
    1.27  
    1.28  ** Recognizing actions in video is extremely difficult
    1.29 +
     1.30 +  Consider, for example, the problem of determining what is happening
     1.31 +  in a video of which this is one frame:
    1.32 +
    1.33 +  #+caption: A cat drinking some water. Identifying this action is beyond the state of the art for computers.
    1.34 +  #+ATTR_LaTeX: :width 7cm
    1.35 +  [[./images/cat-drinking.jpg]]
    1.36 +  
     1.37 +  It is currently impossible for any computer program to reliably
     1.38 +  label such a video as "drinking".  And rightly so -- it is a very
     1.39 +  hard problem! What features can you describe in terms of low-level
     1.40 +  functions of pixels that can even begin to describe what is
     1.41 +  happening here?
    1.42 +  
    1.43 +  Or suppose that you are building a program that recognizes
    1.44 +  chairs. How could you ``see'' the chair in the following picture?
    1.45 +
    1.46 +  #+caption: When you look at this, do you think ``chair''? I certainly do.
    1.47 +  #+ATTR_LaTeX: :width 10cm
    1.48 +  [[./images/invisible-chair.png]]
    1.49 +  
    1.50 +  #+caption: The chair in this image is quite obvious to humans, but I doubt any computer program can find it.
    1.51 +  #+ATTR_LaTeX: :width 10cm
    1.52 +  [[./images/fat-person-sitting-at-desk.jpg]]
    1.53 +
    1.54 +
     1.55 +  I think humans are able to label such a video as "drinking"
     1.56 +  because they imagine /themselves/ as the cat, and imagine putting
     1.57 +  their face up against a stream of water and sticking out their
     1.58 +  tongue. In that imagined world, they can feel the cool water
     1.59 +  hitting their tongue, and feel the water entering their body, and
     1.60 +  are able to recognize that /feeling/ as drinking. So, the label of
     1.61 +  the action is not really in the pixels of the image, but is found
     1.62 +  clearly in a simulation inspired by those pixels. An imaginative
     1.63 +  system, trained on drinking and non-drinking examples and having
     1.64 +  learned that the most important component of drinking is the
     1.65 +  feeling of water sliding down one's throat, would analyze a video
     1.66 +  of a cat drinking as follows (sketched in code after the list):
    1.67 +   
    1.68 +   - Create a physical model of the video by putting a "fuzzy" model
    1.69 +     of its own body in place of the cat. Also, create a simulation of
    1.70 +     the stream of water.
    1.71 +
    1.72 +   - Play out this simulated scene and generate imagined sensory
    1.73 +     experience. This will include relevant muscle contractions, a
    1.74 +     close up view of the stream from the cat's perspective, and most
    1.75 +     importantly, the imagined feeling of water entering the mouth.
    1.76 +
    1.77 +   - The action is now easily identified as drinking by the sense of
    1.78 +     taste alone. The other senses (such as the tongue moving in and
    1.79 +     out) help to give plausibility to the simulated action. Note that
    1.80 +     the sense of vision, while critical in creating the simulation,
    1.81 +     is not critical for identifying the action from the simulation.
    1.82 +
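  The following is one possible sketch of that three-step pipeline in
  Clojure, the language =CORTEX= is written in. Every name in it
  (=fit-body-model=, =simulate-senses=, =drinking?=) is a hypothetical
  placeholder for machinery developed later in the thesis, not actual
  =CORTEX= code:

  #+begin_src clojure
;; Hedged sketch of the three-step empathy pipeline above.
;; All functions and sense keywords are illustrative placeholders.

(defn fit-body-model
  "Step 1: put a 'fuzzy' model of our own body in place of the
   creature in the video, plus models of relevant objects."
  [video-frames]
  {:body  :fuzzy-self-model
   :scene :stream-of-water})

(defn simulate-senses
  "Step 2: play out the scene and generate imagined sensory
   experience: muscle activity, close-up vision, and taste."
  [physical-model]
  [{:taste 0.9 :muscle [0.2 0.7] :vision :close-up-of-stream}])

(defn drinking?
  "Step 3: identify the action from imagined senses alone.
   Taste is decisive; the other senses only add plausibility."
  [imagined-experience]
  (some #(> (:taste % 0.0) 0.5) imagined-experience))

(defn label-action
  "Full pipeline: video in, action label out."
  [video-frames]
  (when (drinking? (simulate-senses (fit-body-model video-frames)))
    "drinking"))
  #+end_src
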
    1.89     cat drinking, mimes, leaning, common sense
    1.90  
    1.91 -** Embodiment is the the right language for the job
    1.92 +** =EMPATH= neatly solves recognition problems
    1.93 +
    1.94 +   factorization, right language, etc.
    1.95  
    1.96     a new possibility for the question ``what is a chair?'' -- it's the
    1.97     feeling of your butt on something and your knees bent, with your
    1.98     back muscles and legs relaxed.
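
   Expressed as a hypothetical sense-predicate (a sketch only; the
   function names and sense keywords below are illustrative, not
   =EMPATH= or =CORTEX= API):

   #+begin_src clojure
;; Illustrative only: "chair" defined over body-feeling, not pixels.
(defn sitting? [senses]
  (and (:touch-on-seat senses)   ; feeling of your butt on something
       (:knees-bent senses)      ; knees bent
       (:back-relaxed senses)    ; back muscles relaxed
       (:legs-relaxed senses)))  ; legs relaxed

(defn chair?
  "An object is a chair if using it produces the sitting feeling."
  [senses-while-using]
  (sitting? senses-while-using))
   #+end_src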
    1.99  
   1.100 -** =CORTEX= is a system for exploring embodiment
   1.101 +** =CORTEX= is a toolkit for building sensate creatures
   1.102  
   1.103     Hand integration demo
   1.104  
   1.105 -** =CORTEX= solves recognition problems using empathy
   1.106 -   
   1.107 -   worm empathy demo
   1.108 -
   1.109 -** Overview
   1.110 +** Contributions
   1.111  
   1.112  * Building =CORTEX=
   1.113  
   1.114 @@ -55,7 +131,7 @@
   1.115  
   1.116  ** Action recognition is easy with a full gamut of senses
   1.117  
   1.118 -** Digression: bootstrapping with multiple senses
   1.119 +** Digression: bootstrapping touch using free exploration
   1.120  
   1.121  ** \Phi-space describes the worm's experiences
   1.122  
   1.123 @@ -70,10 +146,6 @@
   1.124    - created a novel concept for action recognition by using artificial
   1.125      imagination. 
   1.126  
   1.127 -* =CORTEX= User Guide
   1.128 -
   1.129 -
   1.130 -
   1.131  In the second half of the thesis I develop a computational model of
   1.132  empathy, using =CORTEX= as a base. Empathy in this context is the
   1.133  ability to observe another creature and infer what sorts of sensations
   1.134 @@ -97,3 +169,7 @@
   1.135  primitives. It takes about 8 lines to describe the seemingly
   1.136  complicated action of wiggling.
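
A sketch of what such a short description might look like in Clojure
(everything here is hypothetical; =dominant-period= is a crude
stand-in for the real primitives, and the actual definition in the
thesis differs):

#+begin_src clojure
;; Hypothetical sketch: "wiggling" as periodic muscle activity,
;; read off imagined experience rather than off the video itself.
;; Assumes each experience map carries a signed :muscle effort.
(defn dominant-period
  "Crude periodicity measure: count sign changes in a signal."
  [signal]
  (count (filter (fn [[a b]] (neg? (* a b)))
                 (partition 2 1 signal))))

(defn wiggling?
  "Wiggling = sustained oscillating muscle effort in recent
   experience, no matter what vision reports."
  [experiences]
  (<= 4 (dominant-period (map :muscle (take-last 32 experiences)))))
#+end_src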
   1.137  
   1.138 +
   1.139 +
   1.140 +* COMMENT names for cortex
   1.141 + - bioland
   1.142 \ No newline at end of file