#+title: =CORTEX=
#+author: Robert McIntyre
#+email: rlm@mit.edu
#+description: Using embodied AI to facilitate Artificial Imagination.
#+keywords: AI, clojure, embodiment
* Empathy and Embodiment as problem-solving strategies
By the end of this thesis, you will have seen a novel approach to
interpreting video using embodiment and empathy. You will also have
seen one way to efficiently implement empathy for embodied
creatures.
The core vision of this thesis is that one of the important ways in
which we understand others is by imagining ourselves in their
position and empathically feeling experiences based on our own past
experiences and imagination.
By understanding events in terms of our own previous corporeal
experience, we greatly constrain the possibilities of what would
otherwise be an unwieldy exponential search. This extra constraint
can be the difference between easily understanding what is happening
in a video and being completely lost in a sea of incomprehensible
color and movement.
** Recognizing actions in video is extremely difficult
Consider, for example, the problem of determining what is happening
in a video of which this is one frame:
#+caption: A cat drinking some water. Identifying this action is
#+caption: beyond the state of the art for computers.
#+ATTR_LaTeX: :width 7cm
[[./images/cat-drinking.jpg]]
It is currently impossible for any computer program to reliably
label such a video as "drinking". And rightly so -- it is a very
hard problem! What features can you describe in terms of low-level
functions of pixels that can even begin to describe what is
happening here?
Or suppose that you are building a program that recognizes
chairs. How could you ``see'' the chair in the following picture?
#+caption: When you look at this, do you think ``chair''? I certainly do.
#+ATTR_LaTeX: :width 10cm
[[./images/invisible-chair.png]]

#+caption: The chair in this image is quite obvious to humans, but I
#+caption: doubt that any computer program can find it.
#+ATTR_LaTeX: :width 10cm
[[./images/fat-person-sitting-at-desk.jpg]]
I think humans are able to label such a video as "drinking" because
they imagine /themselves/ as the cat, and imagine putting their face
up against a stream of water and sticking out their tongue. In that
imagined world, they can feel the cool water hitting their tongue,
and feel the water entering their body, and are able to recognize
that /feeling/ as drinking. So, the label of the action is not
really in the pixels of the image, but is found clearly in a
simulation inspired by those pixels. An imaginative system, having
been trained on drinking and non-drinking examples and having
learned that the most important component of drinking is the feeling
of water sliding down one's throat, would analyze a video of a cat
drinking in the following manner:
- Create a physical model of the video by putting a "fuzzy" model
  of its own body in place of the cat. Also, create a simulation of
  the stream of water.

- Play out this simulated scene and generate imagined sensory
  experience. This will include relevant muscle contractions, a
  close-up view of the stream from the cat's perspective, and most
  importantly, the imagined feeling of water entering the mouth.

- The action is now easily identified as drinking by the sense of
  taste alone. The other senses (such as the tongue moving in and
  out) help to give plausibility to the simulated action. Note that
  the sense of vision, while critical in creating the simulation,
  is not critical for identifying the action from the simulation. A
  toy sketch of this final classification step follows this list.
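The following is a minimal, self-contained sketch (in Clojure, not
actual =CORTEX= code) of that final step: once the simulation has
produced imagined sensory data, the label comes from a simple
predicate over that data. The map keys below are invented purely for
illustration.

#+begin_src clojure
;; Hypothetical predicate: drinking is chiefly the imagined feeling
;; of water entering the mouth, with taste as the deciding sense.
(defn drinking?
  [imagined-senses]
  (and (:taste-of-water imagined-senses)
       (:liquid-entering-mouth imagined-senses)))

;; Example imagined sensory summary produced by playing out the scene:
(drinking? {:taste-of-water true
            :liquid-entering-mouth true
            :tongue-motion :in-and-out})
;; => true
#+end_src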
cat drinking, mimes, leaning, common sense
** =EMPATH= neatly solves recognition problems
factorization, right language, etc.
A new possibility for the question ``what is a chair?'' -- it's the
feeling of your butt on something and your knees bent, with your
back muscles and legs relaxed.
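As a playful sketch of ``chair'' as an embodied predicate (the sense
map and its keys below are invented for illustration; they are not
=CORTEX='s actual representation):

#+begin_src clojure
(defn chair-experience?
  "A chair is whatever produces this bundle of bodily feelings."
  [senses]
  (and (:pressure-on-seat senses)
       (:knees-bent senses)
       (:back-relaxed senses)
       (:legs-relaxed senses)))

;; e.g. (chair-experience? {:pressure-on-seat true :knees-bent true
;;                          :back-relaxed true :legs-relaxed true})
;; => true
#+end_src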
** =CORTEX= is a toolkit for building sensate creatures
Hand integration demo
** Contributions

* Building =CORTEX=

** To explore embodiment, we need a world, body, and senses

** Because of Time, simulation is preferable to reality

** Video game engines are a great starting point

** Bodies are composed of segments connected by joints

** Eyes reuse standard video game components

** Hearing is hard; =CORTEX= does it right

** Touch uses hundreds of hair-like elements

** Proprioception is the force that makes everything ``real''

** Muscles are both effectors and sensors

** =CORTEX= brings complex creatures to life!

** =CORTEX= enables many possibilities for further research
* Empathy in a simulated worm

** Embodiment factors action recognition into manageable parts

** Action recognition is easy with a full gamut of senses

** Digression: bootstrapping touch using free exploration

** \Phi-space describes the worm's experiences

** Empathy is the process of tracing through \Phi-space

** Efficient action recognition via empathy
* Contributions

- Built =CORTEX=, a comprehensive platform for embodied AI
  experiments. It has many new features lacking in other systems,
  such as sound, and makes it easy to model and create new
  creatures.

- Created a novel concept for action recognition using artificial
  imagination.
In the second half of the thesis I develop a computational model of
empathy, using =CORTEX= as a base. Empathy in this context is the
ability to observe another creature and infer what sorts of
sensations that creature is feeling. My empathy algorithm involves
multiple phases. First is free-play, where the creature moves around
and gains sensory experience. From this experience I construct a
representation of the creature's sensory state space, which I call
\Phi-space. Using \Phi-space, I construct an efficient function for
enriching the limited data that comes from observing another
creature with a full complement of imagined sensory data based on
previous experience. I can then use the imagined sensory data to
recognize what the observed creature is doing and feeling, using
straightforward embodied action predicates. This is all demonstrated
using a simple worm-like creature, and recognizing worm actions
based on limited data. A toy illustration of the enrichment step
appears below.
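The following is a toy, self-contained illustration (not the
thesis's actual code) of the enrichment idea: \Phi-space is a store
of complete sensory snapshots gathered during free play; observing
another creature yields only partial data (say, proprioception),
which is enriched by retrieving the nearest stored snapshot and
borrowing its remaining senses. All data and names here are
invented.

#+begin_src clojure
(def phi-space
  ;; Complete experiences gathered during free play (invented data).
  [{:proprioception [0.1 0.2] :touch :none  :muscle :low}
   {:proprioception [1.5 1.4] :touch :floor :muscle :high}])

(defn distance
  "Sum of absolute differences between two joint-angle vectors."
  [a b]
  (reduce + (map (fn [x y] (Math/abs (double (- x y)))) a b)))

(defn enrich
  "Fill in unobserved senses by nearest-neighbor lookup in phi-space."
  [observed-proprioception]
  (apply min-key
         #(distance observed-proprioception (:proprioception %))
         phi-space))

;; Observing only joint angles, we imagine the rest:
(enrich [1.4 1.5])
;; => {:proprioception [1.5 1.4], :touch :floor, :muscle :high}
#+end_src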
Embodied representation using multiple senses such as touch,
proprioception, and muscle tension turns out to be exceedingly
efficient at describing body-centered actions. It is the ``right
language for the job''. For example, it takes only around 5 lines of
LISP code to describe the action of ``curling'' using embodied
primitives. It takes about 8 lines to describe the seemingly
complicated action of wiggling. A plausible sketch of such a
predicate follows.
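For concreteness, here is a plausible five-line sketch of such a
``curling'' predicate (assuming, purely for illustration, that each
experience frame carries a :proprioception list of per-joint
[heading pitch bend] angle triples; the thesis's exact
representation may differ):

#+begin_src clojure
(defn curled?
  "The worm is curled when every joint is bent past a right angle."
  [experiences]
  (every? (fn [[_ _ bend]] (> (Math/abs bend) (/ Math/PI 2)))
          (:proprioception (peek experiences))))

;; e.g. (curled? [{:proprioception [[0.0 0.0 2.0] [0.1 0.0 1.9]]}])
;; => true
#+end_src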
* COMMENT names for cortex
- bioland