#+title: =CORTEX=
#+author: Robert McIntyre
#+email: rlm@mit.edu
#+description: Using embodied AI to facilitate Artificial Imagination.
#+keywords: AI, clojure, embodiment

* Empathy and Embodiment as problem-solving strategies

By the end of this thesis, you will have seen a novel approach to
interpreting video using embodiment and empathy. You will also have
seen one way to efficiently implement empathy for embodied
creatures.

The core vision of this thesis is that one of the important ways in
which we understand others is by imagining ourselves in their
position and empathically feeling experiences based on our own past
experiences and imagination.

By understanding events in terms of our own previous corporeal
experience, we greatly constrain the possibilities of what would
otherwise be an unwieldy exponential search. This extra constraint
can be the difference between easily understanding what is happening
in a video and being completely lost in a sea of incomprehensible
color and movement.

** Recognizing actions in video is extremely difficult

Consider for example the problem of determining what is happening in
a video of which this is one frame:

#+caption: A cat drinking some water. Identifying this action is
#+caption: beyond the state of the art for computers.
#+ATTR_LaTeX: :width 7cm
[[./images/cat-drinking.jpg]]

It is currently impossible for any computer program to reliably
label such a video as "drinking". And rightly so -- it is a very
hard problem! What features can you describe in terms of low-level
functions of pixels that can even begin to describe what is
happening here?

Or suppose that you are building a program that recognizes
chairs. How could you ``see'' the chair in the following pictures?

#+caption: When you look at this, do you think ``chair''? I certainly do.
#+ATTR_LaTeX: :width 10cm
[[./images/invisible-chair.png]]

#+caption: The chair in this image is quite obvious to humans, but I
#+caption: doubt that any computer program can find it.
#+ATTR_LaTeX: :width 10cm
[[./images/fat-person-sitting-at-desk.jpg]]

Finally, how is it that you can easily tell the difference between
how the girl's /muscles/ are working in the two images of
\ref{girl}?

#+caption: The mysterious ``common sense'' appears here as you are
#+caption: able to ``see'' how the girl's arm muscles are activated
#+caption: differently in the two images.
#+name: girl
#+ATTR_LaTeX: :width 10cm
[[./images/wall-push.png]]

These problems are difficult because the language of pixels is far
removed from what we would consider to be an acceptable description
of the events in these images. In order to process them, we must
raise the images into some higher level of abstraction where their
descriptions become more similar to how we would describe them in
English. The question is, how can we raise the images to that level
of abstraction?

I think humans are able to label such video as "drinking" because
they imagine /themselves/ as the cat, and imagine putting their face
up against a stream of water and sticking out their tongue. In that
imagined world, they can feel the cool water hitting their tongue,
and feel the water entering their body, and are able to recognize
that /feeling/ as drinking. So, the label of the action is not
really in the pixels of the image, but is found clearly in a
simulation inspired by those pixels. An imaginative system, having
been trained on drinking and non-drinking examples and learning that
the most important component of drinking is the feeling of water
sliding down one's throat, would analyze a video of a cat drinking
in the following manner (a sketch of the final step follows the
list):

- Create a physical model of the video by putting a "fuzzy" model
  of its own body in place of the cat. Also, create a simulation of
  the stream of water.

- Play out this simulated scene and generate imagined sensory
  experience. This will include relevant muscle contractions, a
  close up view of the stream from the cat's perspective, and most
  importantly, the imagined feeling of water entering the mouth.

- The action is now easily identified as drinking by the sense of
  taste alone. The other senses (such as the tongue moving in and
  out) help to give plausibility to the simulated action. Note that
  the sense of vision, while critical in creating the simulation,
  is not critical for identifying the action from the simulation.
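
As a minimal sketch of that final step, suppose the first two steps
have already produced a sequence of imagined sense-maps, one per
simulated frame. The map keys (=:taste=, =:touch=) and the
activation threshold are assumptions made for illustration, not
part of any existing system:

#+begin_src clojure
(defn drinking?
  "True if the imagined experience contains the feeling of water
   in the mouth, i.e. taste activation above a threshold."
  [imagined-experience]
  (boolean (some (fn [senses] (> (:taste senses 0.0) 0.5))
                 imagined-experience)))

;; Example: the water reaches the tongue only in the final frame.
(drinking? [{:taste 0.0 :touch 0.1}
            {:taste 0.1 :touch 0.4}
            {:taste 0.9 :touch 0.6}])
;;=> true
#+end_src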

cat drinking, mimes, leaning, common sense

** =EMPATH= neatly solves recognition problems

factorization, right language, etc.

Embodiment provides a new possibility for the question ``what is a
chair?'' -- it's the feeling of your butt on something and your
knees bent, with your back muscles and legs relaxed.
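
As a hedged illustration of that idea, an embodied ``sitting''
predicate might look like the following sketch. The sense-map
layout (=:touch=, =:joints=, =:muscles=) and the thresholds are
invented here for the example; they are not the actual =CORTEX=
representation:

#+begin_src clojure
(defn sitting?
  "True when the body reports pressure under the seat, bent knees,
   and relaxed back and leg muscles."
  [{:keys [touch joints muscles]}]
  (and (> (:seat touch) 0.5)                  ; butt on something
       (every? #(< 60 % 120) (:knees joints)) ; knees bent (degrees)
       (< (:back muscles) 0.2)                ; back relaxed
       (< (:legs muscles) 0.2)))              ; legs relaxed

;; Example sense-map for a seated body:
(sitting? {:touch   {:seat 0.9}
           :joints  {:knees [95 100]}
           :muscles {:back 0.05 :legs 0.1}})
;;=> true
#+end_src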

** =CORTEX= is a toolkit for building sensate creatures

Hand integration demo

** Contributions

* Building =CORTEX=

** To explore embodiment, we need a world, body, and senses

** Because of Time, simulation is preferable to reality

** Video game engines are a great starting point

** Bodies are composed of segments connected by joints

** Eyes reuse standard video game components

** Hearing is hard; =CORTEX= does it right

** Touch uses hundreds of hair-like elements

** Proprioception is the sense that makes everything ``real''

** Muscles are both effectors and sensors

** =CORTEX= brings complex creatures to life!

** =CORTEX= enables many possibilities for further research

* Empathy in a simulated worm

** Embodiment factors action recognition into manageable parts

** Action recognition is easy with a full gamut of senses

** Digression: bootstrapping touch using free exploration

** \Phi-space describes the worm's experiences

** Empathy is the process of tracing through \Phi-space

** Efficient action recognition with =EMPATH=

* Contributions
- Built =CORTEX=, a comprehensive platform for embodied AI
  experiments. It has many new features lacking in other systems,
  such as sound, and makes it easy to model and create new
  creatures.
- Created a novel concept for action recognition using artificial
  imagination.

In the second half of the thesis I develop a computational model of
empathy, using =CORTEX= as a base. Empathy in this context is the
ability to observe another creature and infer what sorts of
sensations that creature is feeling. My empathy algorithm involves
multiple phases. First is free-play, where the creature moves around
and gains sensory experience. From this experience I construct a
representation of the creature's sensory state space, which I call
\Phi-space. Using \Phi-space, I construct an efficient function for
enriching the limited data that comes from observing another
creature with a full complement of imagined sensory data based on
previous experience. I can then use the imagined sensory data to
recognize what the observed creature is doing and feeling, using
straightforward embodied action predicates. This is all demonstrated
using a simple worm-like creature, and recognizing worm actions
based on limited data.
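
As a minimal sketch of the enrichment step, suppose \Phi-space is
stored as a vector of complete sense-maps gathered during
free-play, and that observation yields only proprioception (a
vector of joint angles). The Euclidean nearest-neighbor matching
below is an illustrative assumption; the indexing the real
algorithm uses may differ:

#+begin_src clojure
(defn distance
  "Euclidean distance between two equal-length vectors of numbers."
  [a b]
  (Math/sqrt (reduce + (map (fn [x y] (let [d (- x y)] (* d d)))
                            a b))))

(defn enrich
  "Return the remembered experience from phi-space whose
   proprioceptive signature is closest to the observed one,
   thereby imagining the unobserved touch and muscle data."
  [phi-space observed-proprioception]
  (apply min-key
         #(distance (:proprioception %) observed-proprioception)
         phi-space))

;; Example: the observation matches the second remembered
;; experience, so its touch and muscle data come along for free.
(enrich [{:proprioception [0.1 0.2] :touch :none  :muscles :relaxed}
         {:proprioception [0.9 0.8] :touch :floor :muscles :tense}]
        [0.85 0.8])
;;=> {:proprioception [0.9 0.8], :touch :floor, :muscles :tense}
#+end_src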

Embodied representation using multiple senses such as touch,
proprioception, and muscle tension turns out to be exceedingly
efficient at describing body-centered actions. It is the ``right
language for the job''. For example, it takes only around 5 lines of
Lisp code to describe the action of ``curling'' using embodied
primitives. It takes about 8 lines to describe the seemingly
complicated action of wiggling.
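
To give a feel for what a predicate of roughly that length can look
like, here is a hedged reconstruction. The representation (a
sequence of joint flexion values in radians) is an assumption made
for this illustration; the actual =CORTEX= primitives differ:

#+begin_src clojure
(defn curled?
  "True when every joint along the worm's body is strongly flexed."
  [joint-flexions]
  (every? #(> % 2.0) joint-flexions))

(curled? [2.3 2.5 2.4]) ;=> true
(curled? [0.1 2.5 2.4]) ;=> false
#+end_src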

* COMMENT names for cortex
- bioland