comparison thesis/org/first-chapter.org @ 430:5205535237fb
fix skew in self-organizing-touch, work on thesis.

author:  Robert McIntyre <rlm@mit.edu>
date:    Sat, 22 Mar 2014 16:10:34 -0400
parents: thesis/aux/org/first-chapter.org@b5d0f0adf19f
#+title: =CORTEX=
#+author: Robert McIntyre
#+email: rlm@mit.edu
#+description: Using embodied AI to facilitate Artificial Imagination.
#+keywords: AI, clojure, embodiment
#+SETUPFILE: ../../aurellem/org/setup.org
#+INCLUDE: ../../aurellem/org/level-0.org
#+babel: :mkdirp yes :noweb yes :exports both
#+OPTIONS: toc:nil num:nil

* Artificial Imagination
Imagine watching a video of someone skateboarding. When you watch
the video, you can imagine yourself skateboarding, and your
knowledge of the human body and its dynamics guides your
interpretation of the scene. For example, even if the skateboarder
is partially occluded, you can infer the positions of his arms and
body from your own knowledge of how your body would be positioned if
you were skateboarding. If the skateboarder suffers an accident, you
wince in sympathy, imagining the pain your own body would experience
if it were in the same situation. This empathy with other people
guides our understanding of whatever they are doing because it is a
powerful constraint on what is probable and possible. In order to
make use of this powerful empathy constraint, I need a system that
can generate and make sense of sensory data from the many different
senses that humans possess. The two key properties of such a system
are /embodiment/ and /imagination/.

** What is imagination?

One kind of imagination is /sympathetic/ imagination: you imagine
yourself in the position of something/someone you are
observing. This type of imagination comes into play when you follow
along visually when watching someone perform actions, or when you
sympathetically grimace when someone hurts themselves. This type of
imagination uses the constraints you have learned about your own
body to highly constrain the possibilities in whatever you are
seeing. It uses all your senses, including your senses of touch,
proprioception, etc. Humans are flexible when it comes to "putting
themselves in another's shoes," and can sympathetically understand
not only other humans, but entities ranging from animals to cartoon
characters to [[http://www.youtube.com/watch?v=0jz4HcwTQmU][single dots]] on a screen!

# and can infer intention from the actions of not only other humans,
# but also animals, cartoon characters, and even abstract moving dots
# on a screen!

Another kind of imagination is /predictive/ imagination: you
construct scenes in your mind that are not entirely related to
whatever you are observing, but instead are predictions of the
future or simply flights of fancy. You use this type of imagination
to plan out multi-step actions, or play out dangerous situations in
your mind so as to avoid messing them up in reality.

Of course, sympathetic and predictive imagination blend into each
other and are not completely separate concepts. One dimension along
which you can distinguish types of imagination is dependence on raw
sense data. Sympathetic imagination is highly constrained by your
senses, while predictive imagination can be more or less dependent
on your senses depending on how far ahead you imagine. Daydreaming
is an extreme form of predictive imagination that wanders through
different possibilities without concern for whether they are
related to whatever is happening in reality.

For this thesis, I will mostly focus on sympathetic imagination and
the constraint it provides for understanding sensory data.

** What problems can imagination solve?

Consider a video of a cat drinking some water.

#+caption: A cat drinking some water. Identifying this action is beyond the state of the art for computers.
#+ATTR_LaTeX: width=5cm
[[../images/cat-drinking.jpg]]

It is currently impossible for any computer program to reliably
label such a video as "drinking". I think humans are able to label
such a video as "drinking" because they imagine /themselves/ as the
cat, and imagine putting their face up against a stream of water
and sticking out their tongue. In that imagined world, they can
feel the cool water hitting their tongue, and feel the water
entering their body, and are able to recognize that /feeling/ as
drinking. So, the label of the action is not really in the pixels
of the image, but is found clearly in a simulation inspired by
those pixels. An imaginative system, having been trained on
drinking and non-drinking examples and learning that the most
important component of drinking is the feeling of water sliding
down one's throat, would analyze a video of a cat drinking in the
following manner:

- Create a physical model of the video by putting a "fuzzy" model
  of its own body in place of the cat. Also, create a simulation of
  the stream of water.

- Play out this simulated scene and generate imagined sensory
  experience. This will include relevant muscle contractions, a
  close up view of the stream from the cat's perspective, and most
  importantly, the imagined feeling of water entering the mouth.

- The action is now easily identified as drinking by the sense of
  taste alone. The other senses (such as the tongue moving in and
  out) help to give plausibility to the simulated action. Note that
  the sense of vision, while critical in creating the simulation,
  is not critical for identifying the action from the simulation.

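As a concrete (if toy) illustration, the three steps above could be
sketched in Clojure roughly as follows. Every function here is a
hypothetical stand-in for the corresponding step, with hard-coded toy
data; none of it is =CORTEX='s actual code.

#+begin_src clojure
;; Hypothetical sketch of the analysis pipeline described above.
;; The helper functions are toy stand-ins, not CORTEX functions.
(defn fit-body-model
  "Step 1: pose a 'fuzzy' self-model where the cat is, plus props."
  [video]
  {:body :self-model :props [:water-stream]})

(defn simulate-scene
  "Step 2: play the scene forward, collecting imagined sense data."
  [scene]
  {:muscle [0.2 0.7 0.4] :taste :cool-water :tongue :moving})

(defn classify-by-feeling
  "Step 3: identify the action from the imagined feeling alone."
  [senses]
  (if (= (:taste senses) :cool-water) :drinking :unknown))

(defn label-action [video]
  (-> video fit-body-model simulate-scene classify-by-feeling))

(label-action "cat-drinking-video") ;=> :drinking
#+end_src
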
More generally, I expect imaginative systems to be particularly
good at identifying embodied actions in videos.

* Cortex

The previous example involves liquids, the sense of taste, and
imagining oneself as a cat. For this thesis I constrain myself to
simpler, more easily digitizable senses and situations.

My system, =CORTEX=, performs imagination in two different simplified
worlds: /worm world/ and /stick-figure world/. In each of these
worlds, entities capable of imagination recognize actions by
simulating the experience from their own perspective, and then
recognizing the action from a database of examples.

In order to serve as a framework for experiments in imagination,
=CORTEX= requires simulated bodies, worlds, and senses like vision,
hearing, touch, proprioception, etc.

** A Video Game Engine takes care of some of the groundwork

When it comes to simulation environments, the engines used to
create the worlds in video games offer top-notch physics and
graphics support. These engines also have limited support for
creating cameras and rendering 3D sound, which can be repurposed
for vision and hearing respectively. Physics collision detection
can be expanded to create a sense of touch.

jMonkeyEngine3 is one such engine for creating video games in
Java. It uses OpenGL to render to the screen and uses scene graphs
to avoid drawing things that do not appear on the screen. It has an
active community and several games in the pipeline. The engine was
not built to serve any particular game but is instead meant to be
used for any 3D game. I chose jMonkeyEngine3 because it had the
most features out of all the open projects I looked at, and because
I could then write my code in Clojure, an implementation of LISP
that runs on the JVM.

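As a minimal sketch of what driving jMonkeyEngine3 from Clojure looks
like, the following defines and starts an empty application by
subclassing =SimpleApplication= with =proxy=. This is only a
bare-bones illustration, not the wrapper code =CORTEX= actually uses.

#+begin_src clojure
(ns cortex.hello
  (:import (com.jme3.app SimpleApplication)))

(defn empty-world
  "A minimal jMonkeyEngine3 application defined from Clojure."
  []
  (proxy [SimpleApplication] []
    (simpleInitApp []
      ;; scene setup (geometry, lights, cameras) would go here
      )))

;; (.start (empty-world))  ; opens an empty game window
#+end_src
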
** =CORTEX= Extends jMonkeyEngine3 to implement rich senses

Using the game-making primitives provided by jMonkeyEngine3, I have
constructed every major human sense except for smell and
taste. =CORTEX= also provides an interface for creating creatures
in Blender, a 3D modeling environment, and then "rigging" the
creatures with senses using 3D annotations in Blender. A creature
can have any number of senses, and there can be any number of
creatures in a simulation.

The senses available in =CORTEX= are:

- [[../../cortex/html/vision.html][Vision]]
- [[../../cortex/html/hearing.html][Hearing]]
- [[../../cortex/html/touch.html][Touch]]
- [[../../cortex/html/proprioception.html][Proprioception]]
- [[../../cortex/html/movement.html][Muscle Tension]]

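Conceptually, each rigged sense behaves like a function that is
called once per simulation step and returns that sense's current
data. The toy example below illustrates that interface shape with a
fake touch sense; it is an assumption for illustration, not
=CORTEX='s actual implementation.

#+begin_src clojure
;; Toy illustration of the per-frame sense interface: a "touch sense"
;; is just a function that, when called, returns one activation value
;; per tactile feeler for the current frame. (Random data here.)
(defn make-toy-touch-sense [n-feelers]
  (fn [] (vec (repeatedly n-feelers rand))))

(def worm-touch (make-toy-touch-sense 64))

(worm-touch) ;=> 64 activation levels in [0,1), one per feeler
#+end_src
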
* A roadmap for =CORTEX= experiments

** Worm World

Worms in =CORTEX= are segmented creatures which vary in length and
number of segments, and have the senses of vision, proprioception,
touch, and muscle tension.

#+attr_html: width=755
#+caption: This is the tactile-sensor-profile for the upper segment of a worm. It defines regions of high touch sensitivity (where there are many white pixels) and regions of low sensitivity (where white pixels are sparse).
[[../images/finger-UV.png]]

#+begin_html
<div class="figure">
<center>
<video controls="controls" width="550">
  <source src="../video/worm-touch.ogg" type="video/ogg"
          preload="none" />
</video>
<br> <a href="http://youtu.be/RHx2wqzNVcU"> YouTube </a>
</center>
<p>The worm responds to touch.</p>
</div>
#+end_html

#+begin_html
<div class="figure">
<center>
<video controls="controls" width="550">
  <source src="../video/test-proprioception.ogg" type="video/ogg"
          preload="none" />
</video>
<br> <a href="http://youtu.be/JjdDmyM8b0w"> YouTube </a>
</center>
<p>Proprioception in a worm. The proprioceptive readout is
in the upper left corner of the screen.</p>
</div>
#+end_html

A worm is trained in various actions such as sinusoidal movement,
curling, flailing, and spinning by directly playing motor
contractions while the worm "feels" the experience. These actions
are recorded both as vectors of muscle tension, touch, and
proprioceptive data, and in higher-level forms such as the
frequencies of the various contractions and a symbolic name for the
action.

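One recorded training example might have a shape roughly like the
following. The keys and toy values are illustrative assumptions about
what such a record contains, not =CORTEX='s actual storage format.

#+begin_src clojure
;; Hypothetical shape of one recorded worm training example.
(def example-curl
  {:action         :curl                ; symbolic name for the action
   :muscle         [[0.0 0.9 0.1]       ; per-frame muscle tensions
                    [0.0 0.8 0.2]]
   :touch          [[0 0 1 1]           ; per-frame feeler activations
                    [0 1 1 1]]
   :proprioception [[0.1 1.4]           ; per-frame joint angles (radians)
                    [0.2 1.5]]
   :frequencies    {:body-wave 0.5}})   ; higher-level summary features
#+end_src
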
Then, the worm watches a video of another worm performing one of
the actions, and must judge which action was performed. Normally
this would be an extremely difficult problem, but the worm is able
to greatly diminish the search space through sympathetic
imagination. First, it creates an imagined copy of its body which
it observes from a third person point of view. Then for each frame
of the video, it maneuvers its simulated body to be in registration
with the worm depicted in the video. The physical constraints
imposed by the physics simulation greatly decrease the number of
poses that have to be tried, making the search feasible. As the
imaginary worm moves, it generates imaginary muscle tension and
proprioceptive sensations. The worm determines the action not by
vision, but by matching the imagined proprioceptive data with
previous examples.

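The final matching step could be sketched as a simple
nearest-neighbour comparison between the imagined proprioceptive
trace and the stored example traces. The Euclidean distance used here
is only a stand-in chosen for illustration, not necessarily the
comparison =CORTEX= performs.

#+begin_src clojure
;; Toy sketch of matching an imagined proprioceptive trace against
;; stored examples (records shaped like `example-curl` above).
(defn trace-distance [trace-a trace-b]
  (Math/sqrt
   (reduce + (map (fn [a b] (let [d (- a b)] (* d d)))
                  (flatten trace-a)
                  (flatten trace-b)))))

(defn identify-action
  "Return the :action of the stored example whose proprioceptive
   trace is closest to the imagined one."
  [imagined-trace examples]
  (:action (apply min-key
                  #(trace-distance imagined-trace (:proprioception %))
                  examples)))

;; (identify-action imagined-trace [example-curl example-wiggle])
#+end_src
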
By using non-visual sensory data such as touch, the worms can also
answer body-related questions such as "did your head touch your
tail?" and "did worm A touch worm B?"

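Such questions reduce to simple predicates over the recorded touch
data. Assuming (purely for illustration) that each frame's touch data
maps a body segment to the set of things it contacted, the
head-touches-tail question might look like this:

#+begin_src clojure
;; Toy predicate over per-frame touch data; the data layout is an
;; assumption for illustration only.
(defn head-touched-tail?
  "Did the :head segment contact the :tail segment in any frame?"
  [touch-frames]
  (boolean (some #(contains? (get % :head #{}) :tail) touch-frames)))

(head-touched-tail?
 [{:head #{} :tail #{}}
  {:head #{:tail} :tail #{:head}}]) ;=> true
#+end_src
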
The proprioceptive information used for action identification is
body-centric, so only the registration step is dependent on point
of view, not the identification step. Registration is not specific
to any particular action. Thus, action identification can be
divided into a point-of-view dependent generic registration step,
and an action-specific step that is body-centered and invariant to
point of view.

** Stick Figure World

This environment is similar to Worm World, except the creatures are
more complicated and the actions and questions more varied. It is
an experiment to see how far imagination can go in interpreting
actions.