diff thesis/cortex.org @ 440:b01c070b03d4

save for tonight.
author Robert McIntyre <rlm@mit.edu>
date Sun, 23 Mar 2014 23:43:20 -0400
parents 97dc719fd1ac
children c20de2267d39
line wrap: on
line diff
     1.1 --- a/thesis/cortex.org	Sun Mar 23 22:23:54 2014 -0400
     1.2 +++ b/thesis/cortex.org	Sun Mar 23 23:43:20 2014 -0400
     1.3 @@ -41,7 +41,7 @@
     1.4    happening here? 
     1.5    
     1.6    Or suppose that you are building a program that recognizes
     1.7 -  chairs. How could you ``see'' the chair in the following picture?
     1.8 +  chairs. How could you ``see'' the chair in the following pictures?
     1.9  
    1.10    #+caption: When you look at this, do you think ``chair''? I certainly do.
    1.11    #+ATTR_LaTeX: :width 10cm
    1.12 @@ -52,19 +52,37 @@
    1.13    #+ATTR_LaTeX: :width 10cm
    1.14    [[./images/fat-person-sitting-at-desk.jpg]]
    1.15  
     1.16 +  Finally, how is it that you can easily tell the difference in
     1.17 +  how the girl's /muscles/ are working in the two images of \ref{girl}?
    1.18  
    1.19 -  I think humans are able to label
    1.20 -  such video as "drinking" because they imagine /themselves/ as the
    1.21 -  cat, and imagine putting their face up against a stream of water and
    1.22 -  sticking out their tongue. In that imagined world, they can feel the
    1.23 -  cool water hitting their tongue, and feel the water entering their
    1.24 -  body, and are able to recognize that /feeling/ as drinking. So, the
    1.25 -  label of the action is not really in the pixels of the image, but is
    1.26 -  found clearly in a simulation inspired by those pixels. An
    1.27 -  imaginative system, having been trained on drinking and non-drinking
    1.28 -  examples and learning that the most important component of drinking
    1.29 -  is the feeling of water sliding down one's throat, would analyze a
    1.30 -  video of a cat drinking in the following manner:
    1.31 +  #+caption: The mysterious ``common sense'' appears here as you are able 
     1.32 +  #+caption: to ``see'' how the girl's arm muscles are activated
     1.33 +  #+caption: differently in the two images.
    1.34 +  #+name: girl
    1.35 +  #+ATTR_LaTeX: :width 10cm
    1.36 +  [[./images/wall-push.png]]
    1.37 +  
    1.38 +
    1.39 +  These problems are difficult because the language of pixels is far
    1.40 +  removed from what we would consider to be an acceptable description
    1.41 +  of the events in these images. In order to process them, we must
    1.42 +  raise the images into some higher level of abstraction where their
    1.43 +  descriptions become more similar to how we would describe them in
     1.44 +  English. The question is, how can we raise the images to that level of abstraction?
    1.45 +  
    1.46 +
     1.47 +  I think humans are able to label such video as ``drinking'' because
    1.48 +  they imagine /themselves/ as the cat, and imagine putting their face
    1.49 +  up against a stream of water and sticking out their tongue. In that
    1.50 +  imagined world, they can feel the cool water hitting their tongue,
    1.51 +  and feel the water entering their body, and are able to recognize
    1.52 +  that /feeling/ as drinking. So, the label of the action is not
    1.53 +  really in the pixels of the image, but is found clearly in a
    1.54 +  simulation inspired by those pixels. An imaginative system, having
    1.55 +  been trained on drinking and non-drinking examples and learning that
    1.56 +  the most important component of drinking is the feeling of water
    1.57 +  sliding down one's throat, would analyze a video of a cat drinking
    1.58 +  in the following manner:
    1.59     
    1.60     - Create a physical model of the video by putting a "fuzzy" model
    1.61       of its own body in place of the cat. Also, create a simulation of
    1.62 @@ -81,12 +99,6 @@
    1.63       the sense of vision, while critical in creating the simulation,
    1.64       is not critical for identifying the action from the simulation.
    1.65  
    1.66 -
    1.67 -
    1.68 -
    1.69 -
    1.70 -
    1.71 -
    1.72     cat drinking, mimes, leaning, common sense
    1.73  
    1.74  ** =EMPATH= neatly solves recognition problems
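A minimal sketch of the empathy-based analysis described above, in Clojure, the language =CORTEX= is written in; every function and data shape below is a hypothetical stand-in for illustration, not the actual =EMPATH= implementation. The point of the sketch is that the classification key is the simulated feeling, not the pixels.

  #+begin_src clojure
    ;; Hypothetical stand-ins only; not actual CORTEX/EMPATH code.
    (defn fit-body-model
      "Place a fuzzy model of the system's own body where the cat is."
      [video]
      {:posture :crouched, :head :lowered, :tongue :extended})

    (defn simulate-senses
      "Run the imagined scene and return what the body would feel."
      [body-model]
      {:tongue :wet, :throat :swallowing})

    (defn recognize-action
      "Label a video by the feeling the simulation produces,
       not by the pixels themselves."
      [video feeling->label]
      (-> video fit-body-model simulate-senses feeling->label))

    ;; usage (hypothetical):
    ;; (recognize-action cat-video
    ;;                   {{:tongue :wet, :throat :swallowing} "drinking"})
  #+end_src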
    1.75 @@ -119,7 +131,7 @@
    1.76  
    1.77  ** Touch uses hundreds of hair-like elements
    1.78  
    1.79 -** Proprioception is the force that makes everything ``real''
    1.80 +** Proprioception is the sense that makes everything ``real''
    1.81  
    1.82  ** Muscles are both effectors and sensors
    1.83  
    1.84 @@ -139,7 +151,7 @@
    1.85  
     1.86  ** Empathy is the process of tracing through \Phi-space 
    1.87    
    1.88 -** Efficient action recognition via empathy
     1.89 +** Efficient action recognition with =EMPATH=
    1.90  
    1.91  * Contributions
    1.92    - Built =CORTEX=, a comprehensive platform for embodied AI