diff thesis/cortex.org @ 440:b01c070b03d4
save for tonight.
author | Robert McIntyre <rlm@mit.edu> |
---|---|
date | Sun, 23 Mar 2014 23:43:20 -0400 |
parents | 97dc719fd1ac |
children | c20de2267d39 |
--- a/thesis/cortex.org Sun Mar 23 22:23:54 2014 -0400
+++ b/thesis/cortex.org Sun Mar 23 23:43:20 2014 -0400
@@ -41,7 +41,7 @@
  happening here?
 
  Or suppose that you are building a program that recognizes
- chairs. How could you ``see'' the chair in the following picture?
+ chairs. How could you ``see'' the chair in the following pictures?
 
  #+caption: When you look at this, do you think ``chair''? I certainly do.
  #+ATTR_LaTeX: :width 10cm
@@ -52,19 +52,37 @@
  #+ATTR_LaTeX: :width 10cm
  [[./images/fat-person-sitting-at-desk.jpg]]
 
+ Finally, how is it that you can easily tell the difference between
+ how the girls /muscles/ are working in \ref{girl}?
 
- I think humans are able to label
- such video as "drinking" because they imagine /themselves/ as the
- cat, and imagine putting their face up against a stream of water and
- sticking out their tongue. In that imagined world, they can feel the
- cool water hitting their tongue, and feel the water entering their
- body, and are able to recognize that /feeling/ as drinking. So, the
- label of the action is not really in the pixels of the image, but is
- found clearly in a simulation inspired by those pixels. An
- imaginative system, having been trained on drinking and non-drinking
- examples and learning that the most important component of drinking
- is the feeling of water sliding down one's throat, would analyze a
- video of a cat drinking in the following manner:
+ #+caption: The mysterious ``common sense'' appears here as you are able
+ #+caption: to ``see'' the difference in how the girl's arm muscles
+ #+caption: are activated differently in the two images.
+ #+name: girl
+ #+ATTR_LaTeX: :width 10cm
+ [[./images/wall-push.png]]
+
+
+ These problems are difficult because the language of pixels is far
+ removed from what we would consider to be an acceptable description
+ of the events in these images. In order to process them, we must
+ raise the images into some higher level of abstraction where their
+ descriptions become more similar to how we would describe them in
+ English. The question is, how can we raise
+
+
+ I think humans are able to label such video as "drinking" because
+ they imagine /themselves/ as the cat, and imagine putting their face
+ up against a stream of water and sticking out their tongue. In that
+ imagined world, they can feel the cool water hitting their tongue,
+ and feel the water entering their body, and are able to recognize
+ that /feeling/ as drinking. So, the label of the action is not
+ really in the pixels of the image, but is found clearly in a
+ simulation inspired by those pixels. An imaginative system, having
+ been trained on drinking and non-drinking examples and learning that
+ the most important component of drinking is the feeling of water
+ sliding down one's throat, would analyze a video of a cat drinking
+ in the following manner:
 
  - Create a physical model of the video by putting a "fuzzy" model
  of its own body in place of the cat. Also, create a simulation of
@@ -81,12 +99,6 @@
  the sense of vision, while critical in creating the simulation,
  is not critical for identifying the action from the simulation.
 
-
-
-
-
-
-
  cat drinking, mimes, leaning, common sense
 
 ** =EMPATH= neatly solves recognition problems
@@ -119,7 +131,7 @@
 
 ** Touch uses hundreds of hair-like elements
 
-** Proprioception is the force that makes everything ``real''
+** Proprioception is the sense that makes everything ``real''
 
 ** Muscles are both effectors and sensors
 
@@ -139,7 +151,7 @@
 
 ** Empathy is the process of tracing though \Phi-space
 
-** Efficient action recognition via empathy
+** Efficient action recognition =EMPATH=
 
 * Contributions
  - Built =CORTEX=, a comprehensive platform for embodied AI