# HG changeset patch
# User Robert McIntyre
# Date 1395632600 14400
# Node ID b01c070b03d4a892b9acfcf5b9aa5f01f85ddd90
# Parent  97dc719fd1acf8947478a45c6c432e0592af4d6c
save for tonight.

diff -r 97dc719fd1ac -r b01c070b03d4 thesis/cortex.org
--- a/thesis/cortex.org	Sun Mar 23 22:23:54 2014 -0400
+++ b/thesis/cortex.org	Sun Mar 23 23:43:20 2014 -0400
@@ -41,7 +41,7 @@
    happening here?
 
    Or suppose that you are building a program that recognizes
-   chairs. How could you ``see'' the chair in the following picture?
+   chairs. How could you ``see'' the chair in the following pictures?
 
    #+caption: When you look at this, do you think ``chair''? I certainly do.
    #+ATTR_LaTeX: :width 10cm
@@ -52,19 +52,37 @@
    #+ATTR_LaTeX: :width 10cm
    [[./images/fat-person-sitting-at-desk.jpg]]
 
+   Finally, how is it that you can easily tell the difference in
+   how the girl's /muscles/ are working in the two images of \ref{girl}?
-   I think humans are able to label
-   such video as "drinking" because they imagine /themselves/ as the
-   cat, and imagine putting their face up against a stream of water and
-   sticking out their tongue. In that imagined world, they can feel the
-   cool water hitting their tongue, and feel the water entering their
-   body, and are able to recognize that /feeling/ as drinking. So, the
-   label of the action is not really in the pixels of the image, but is
-   found clearly in a simulation inspired by those pixels. An
-   imaginative system, having been trained on drinking and non-drinking
-   examples and learning that the most important component of drinking
-   is the feeling of water sliding down one's throat, would analyze a
-   video of a cat drinking in the following manner:
+   #+caption: The mysterious ``common sense'' appears here as you are able
+   #+caption: to ``see'' how the girl's arm muscles are activated
+   #+caption: differently in the two images.
+   #+name: girl
+   #+ATTR_LaTeX: :width 10cm
+   [[./images/wall-push.png]]
+
+
+   These problems are difficult because the language of pixels is far
+   removed from what we would consider to be an acceptable description
+   of the events in these images. In order to process them, we must
+   raise the images into some higher level of abstraction where their
+   descriptions become more similar to how we would describe them in
+   English. The question is, how can we raise a video of pixels to
+   such a level of abstraction?
+
+
+   I think humans are able to label such video as "drinking" because
+   they imagine /themselves/ as the cat, and imagine putting their face
+   up against a stream of water and sticking out their tongue. In that
+   imagined world, they can feel the cool water hitting their tongue,
+   and feel the water entering their body, and are able to recognize
+   that /feeling/ as drinking. So, the label of the action is not
+   really in the pixels of the image, but is found clearly in a
+   simulation inspired by those pixels. An imaginative system, having
+   been trained on drinking and non-drinking examples and learning that
+   the most important component of drinking is the feeling of water
+   sliding down one's throat, would analyze a video of a cat drinking
+   in the following manner:
 
    - Create a physical model of the video by putting a "fuzzy" model
      of its own body in place of the cat. Also, create a simulation of
@@ -81,12 +99,6 @@
      the sense of vision, while critical in creating the simulation,
      is not critical for identifying the action from the simulation.
-
-
-
-
-
-
   cat drinking, mimes, leaning, common sense
 
 ** =EMPATH= neatly solves recognition problems
@@ -119,7 +131,7 @@
 
 ** Touch uses hundreds of hair-like elements
 
-** Proprioception is the force that makes everything ``real''
+** Proprioception is the sense that makes everything ``real''
 
 ** Muscles are both effectors and sensors
 
@@ -139,7 +151,7 @@
 
 ** Empathy is the process of tracing through \Phi-space
 
-** Efficient action recognition via empathy
+** Efficient action recognition with =EMPATH=
 
 * Contributions
   - Built =CORTEX=, a comprehensive platform for embodied AI

diff -r 97dc719fd1ac -r b01c070b03d4 thesis/images/wall-push.png
Binary file thesis/images/wall-push.png has changed
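
The analysis procedure the patched section describes (re-enact the
video with one's own body model, then classify by imagined feeling
rather than by pixels) can be made concrete with a small sketch. The
following Clojure is a hypothetical illustration, not code from
=CORTEX=: the posture encoding, the =imagined-feelings= stub, and the
hard-coded signatures are all invented stand-ins for the learned
components the text describes.

#+begin_src clojure
;; Hypothetical sketch -- not CORTEX's actual API. The posture
;; keywords and the signatures below are invented for illustration.
(defn imagined-feelings
  "Re-enact a sequence of postures with the system's own body model,
   yielding the sensory data it would feel in each frame (stubbed to
   proprioception only)."
  [postures]
  (map (fn [posture] {:proprioception posture}) postures))

(defn match-score
  "Count how many imagined frames satisfy a learned feeling-signature."
  [signature feelings]
  (count (filter signature feelings)))

(defn label-action
  "Label a video by the action whose signature best explains the
   feelings imagined while re-enacting it."
  [signatures postures]
  (key (apply max-key
              #(match-score (val %) (imagined-feelings postures))
              signatures)))

;; Signatures would be learned from drinking / non-drinking examples;
;; here they are hard-coded stand-ins.
(def signatures
  {:drinking     #(= :head-lowered (:proprioception %))
   :not-drinking #(= :head-raised  (:proprioception %))})

(label-action signatures [:head-lowered :head-lowered :head-raised])
;; => :drinking
#+end_src

The point of the sketch is structural: vision enters only through the
posture sequence used to drive the re-enactment, while the label is
decided entirely in the space of imagined feelings. This mirrors the
patch's claim that vision, while critical in creating the simulation,
is not critical for identifying the action from the simulation.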