view thesis/org/first-chapter.html @ 539:fc116e960f56

more elaboration
author Robert McIntyre <rlm@mit.edu>
date Sun, 27 Apr 2014 21:52:39 -0400
parents 5205535237fb
children
1 <?xml version="1.0" encoding="utf-8"?>
2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
3 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
4 <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
5 <head>
6 <title>CORTEX</title>
7 <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
8 <meta name="title" content="CORTEX"/>
9 <meta name="generator" content="Org-mode"/>
10 <meta name="generated" content="2013-11-07 04:21:29 EST"/>
11 <meta name="author" content="Robert McIntyre"/>
12 <meta name="description" content="Using embodied AI to facilitate Artificial Imagination."/>
13 <meta name="keywords" content="AI, clojure, embodiment"/>
14 <style type="text/css">
15 <!--/*--><![CDATA[/*><!--*/
16 html { font-family: Times, serif; font-size: 12pt; }
17 .title { text-align: center; }
18 .todo { color: red; }
19 .done { color: green; }
20 .tag { background-color: #add8e6; font-weight:normal }
21 .target { }
22 .timestamp { color: #bebebe; }
23 .timestamp-kwd { color: #5f9ea0; }
24 .right {margin-left:auto; margin-right:0px; text-align:right;}
25 .left {margin-left:0px; margin-right:auto; text-align:left;}
26 .center {margin-left:auto; margin-right:auto; text-align:center;}
27 p.verse { margin-left: 3% }
28 pre {
29 border: 1pt solid #AEBDCC;
30 background-color: #F3F5F7;
31 padding: 5pt;
32 font-family: courier, monospace;
33 font-size: 90%;
34 overflow:auto;
35 }
36 table { border-collapse: collapse; }
37 td, th { vertical-align: top; }
38 th.right { text-align:center; }
39 th.left { text-align:center; }
40 th.center { text-align:center; }
41 td.right { text-align:right; }
42 td.left { text-align:left; }
43 td.center { text-align:center; }
44 dt { font-weight: bold; }
45 div.figure { padding: 0.5em; }
46 div.figure p { text-align: center; }
47 div.inlinetask {
48 padding:10px;
49 border:2px solid gray;
50 margin:10px;
51 background: #ffffcc;
52 }
53 textarea { overflow-x: auto; }
54 .linenr { font-size:smaller }
55 .code-highlighted {background-color:#ffff00;}
56 .org-info-js_info-navigation { border-style:none; }
57 #org-info-js_console-label { font-size:10px; font-weight:bold;
58 white-space:nowrap; }
59 .org-info-js_search-highlight {background-color:#ffff00; color:#000000;
60 font-weight:bold; }
61 /*]]>*/-->
62 </style>
63 <script type="text/javascript">var _gaq = _gaq || [];_gaq.push(['_setAccount', 'UA-31261312-1']);_gaq.push(['_trackPageview']);(function() {var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);})();</script><link rel="stylesheet" type="text/css" href="../../aurellem/css/argentum.css" />
64 <script type="text/javascript">
65 <!--/*--><![CDATA[/*><!--*/
66 function CodeHighlightOn(elem, id)
67 {
68 var target = document.getElementById(id);
69 if(null != target) {
70 elem.cacheClassElem = elem.className;
71 elem.cacheClassTarget = target.className;
72 target.className = "code-highlighted";
73 elem.className = "code-highlighted";
74 }
75 }
76 function CodeHighlightOff(elem, id)
77 {
78 var target = document.getElementById(id);
79 if(elem.cacheClassElem)
80 elem.className = elem.cacheClassElem;
81 if(elem.cacheClassTarget)
82 target.className = elem.cacheClassTarget;
83 }
84 /*]]>*///-->
85 </script>
87 </head>
88 <body>
91 <div id="content">
92 <h1 class="title"><code>CORTEX</code></h1>
95 <div class="header">
96 <div class="float-right">
97 <!--
98 <form>
99 <input type="text"/><input type="submit" value="search the blog &raquo;"/>
100 </form>
101 -->
102 </div>
104 <h1>aurellem <em>&#x2609;</em></h1>
105 <ul class="nav">
106 <li><a href="/">read the blog &raquo;</a></li>
107 <!-- li><a href="#">learn about us &raquo;</a></li-->
108 </ul>
109 </div>
111 <div class="author">Written by <author>Robert McIntyre</author></div>
119 <div id="outline-container-1" class="outline-2">
120 <h2 id="sec-1">Artificial Imagination</h2>
121 <div class="outline-text-2" id="text-1">
124 <p>
125 Imagine watching a video of someone skateboarding. When you watch
126 the video, you can imagine yourself skateboarding, and your
127 knowledge of the human body and its dynamics guides your
128 interpretation of the scene. For example, even if the skateboarder
129 is partially occluded, you can infer the positions of his arms and
130 body from your own knowledge of how your body would be positioned if
131 you were skateboarding. If the skateboarder suffers an accident, you
132 wince in sympathy, imagining the pain your own body would experience
133 if it were in the same situation. This empathy with other people
134 guides our understanding of whatever they are doing because it is a
135 powerful constraint on what is probable and possible. In order to
136 make use of this powerful empathy constraint, I need a system that
137 can generate and make sense of sensory data from the many different
138 senses that humans possess. The two key properties of such a system
139 are <i>embodiment</i> and <i>imagination</i>.
140 </p>
142 </div>
144 <div id="outline-container-1-1" class="outline-3">
145 <h3 id="sec-1-1">What is imagination?</h3>
146 <div class="outline-text-3" id="text-1-1">
149 <p>
150 One kind of imagination is <i>sympathetic</i> imagination: you imagine
151 yourself in the position of something/someone you are
152 observing. This type of imagination comes into play when you follow
153 along visually when watching someone perform actions, or when you
154 sympathetically grimace when someone hurts themselves. This type of
155 imagination uses the constraints you have learned about your own
156 body to highly constrain the possibilities in whatever you are
157 seeing. It uses all your senses, including your senses of touch,
158 proprioception, etc. Humans are flexible when it comes to "putting
159 themselves in another's shoes," and can sympathetically understand
160 not only other humans, but entities ranging from animals to cartoon
161 characters to <a href="http://www.youtube.com/watch?v=0jz4HcwTQmU">single dots</a> on a screen!
162 </p>
163 <p>
164 Another kind of imagination is <i>predictive</i> imagination: you
165 construct scenes in your mind that are not entirely related to
166 whatever you are observing, but instead are predictions of the
167 future or simply flights of fancy. You use this type of imagination
168 to plan out multi-step actions, or play out dangerous situations in
169 your mind so as to avoid messing them up in reality.
170 </p>
171 <p>
172 Of course, sympathetic and predictive imagination blend into each
173 other and are not completely separate concepts. One dimension along
174 which you can distinguish types of imagination is dependence on raw
175 sense data. Sympathetic imagination is highly constrained by your
176 senses, while predictive imagination can be more or less dependent
177 on your senses depending on how far ahead you imagine. Daydreaming
178 is an extreme form of predictive imagination that wanders through
179 different possibilities without concern for whether they are
180 related to whatever is happening in reality.
181 </p>
182 <p>
183 For this thesis, I will mostly focus on sympathetic imagination and
184 the constraint it provides for understanding sensory data.
185 </p>
186 </div>
188 </div>
190 <div id="outline-container-1-2" class="outline-3">
191 <h3 id="sec-1-2">What problems can imagination solve?</h3>
192 <div class="outline-text-3" id="text-1-2">
195 <p>
196 Consider a video of a cat drinking some water.
197 </p>
199 <div class="figure">
200 <p><img src="../images/cat-drinking.jpg" alt="A cat drinking some water" /></p>
201 <p>A cat drinking some water. Identifying this action is beyond the state of the art for computers.</p>
202 </div>
204 <p>
205 It is currently impossible for any computer program to reliably
206 label such a video as "drinking". I think humans are able to label
207 such video as "drinking" because they imagine <i>themselves</i> as the
208 cat, and imagine putting their face up against a stream of water
209 and sticking out their tongue. In that imagined world, they can
210 feel the cool water hitting their tongue, and feel the water
211 entering their body, and are able to recognize that <i>feeling</i> as
212 drinking. So, the label of the action is not really in the pixels
213 of the image, but is found clearly in a simulation inspired by
214 those pixels. An imaginative system, having been trained on
215 drinking and non-drinking examples and learning that the most
216 important component of drinking is the feeling of water sliding
217 down one's throat, would analyze a video of a cat drinking in the
218 following manner:
219 </p>
220 <ul>
221 <li>Create a physical model of the video by putting a "fuzzy" model
222 of its own body in place of the cat. Also, create a simulation of
223 the stream of water.
225 </li>
226 <li>Play out this simulated scene and generate imagined sensory
227 experience. This will include relevant muscle contractions, a
228 close up view of the stream from the cat's perspective, and most
229 importantly, the imagined feeling of water entering the mouth.
231 </li>
232 <li>The action is now easily identified as drinking by the sense of
233 taste alone. The other senses (such as the tongue moving in and
234 out) help to give plausibility to the simulated action. Note that
235 the sense of vision, while critical in creating the simulation,
236 is not critical for identifying the action from the simulation.
237 </li>
238 </ul>
241 <p>
242 More generally, I expect imaginative systems to be particularly
243 good at identifying embodied actions in videos.
244 </p>
245 </div>
246 </div>
248 </div>
250 <div id="outline-container-2" class="outline-2">
251 <h2 id="sec-2">Cortex</h2>
252 <div class="outline-text-2" id="text-2">
255 <p>
256 The previous example involves liquids, the sense of taste, and
257 imagining oneself as a cat. For this thesis I constrain myself to
258 simpler, more easily digitizable senses and situations.
259 </p>
260 <p>
261 My system, <code>Cortex</code>, performs imagination in two different simplified
262 worlds: <i>worm world</i> and <i>stick figure world</i>. In each of these
263 worlds, entities capable of imagination recognize actions by
264 simulating the experience from their own perspective, and then
265 recognizing the action from a database of examples.
266 </p>
267 <p>
268 In order to serve as a framework for experiments in imagination,
269 <code>Cortex</code> requires simulated bodies, worlds, and senses like vision,
270 hearing, touch, proprioception, etc.
271 </p>
273 </div>
275 <div id="outline-container-2-1" class="outline-3">
276 <h3 id="sec-2-1">A Video Game Engine takes care of some of the groundwork</h3>
277 <div class="outline-text-3" id="text-2-1">
280 <p>
281 When it comes to simulation environments, the engines used to
282 create the worlds in video games offer top-notch physics and
283 graphics support. These engines also have limited support for
284 creating cameras and rendering 3D sound, which can be repurposed
285 for vision and hearing respectively. Physics collision detection
286 can be expanded to create a sense of touch.
287 </p>
288 <p>
289 jMonkeyEngine3 is one such engine for creating video games in
290 Java. It uses OpenGL to render to the screen and uses scene graphs
291 to avoid drawing things that do not appear on the screen. It has an
292 active community and several games in the pipeline. The engine was
293 not built to serve any particular game but is instead meant to be
294 used for any 3D game. I chose jMonkeyEngine3 because it had the
295 most features out of all the open projects I looked at, and because
296 I could then write my code in Clojure, a dialect of Lisp
297 that runs on the JVM.
298 </p>
299 </div>
301 </div>
303 <div id="outline-container-2-2" class="outline-3">
304 <h3 id="sec-2-2"><code>CORTEX</code> Extends jMonkeyEngine3 to implement rich senses</h3>
305 <div class="outline-text-3" id="text-2-2">
308 <p>
309 Using the game-making primitives provided by jMonkeyEngine3, I have
310 constructed every major human sense except for smell and
311 taste. <code>Cortex</code> also provides an interface for creating creatures
312 in Blender, a 3D modeling environment, and then "rigging" the
313 creatures with senses using 3D annotations in Blender. A creature
314 can have any number of senses, and there can be any number of
315 creatures in a simulation.
316 </p>
317 <p>
318 The senses available in <code>Cortex</code> are:
319 </p>
320 <ul>
321 <li><a href="../../cortex/html/vision.html">Vision</a>
322 </li>
323 <li><a href="../../cortex/html/hearing.html">Hearing</a>
324 </li>
325 <li><a href="../../cortex/html/touch.html">Touch</a>
326 </li>
327 <li><a href="../../cortex/html/proprioception.html">Proprioception</a>
328 </li>
329 <li><a href="../../cortex/html/movement.html">Muscle Tension</a>
330 </li>
331 </ul>
334 </div>
335 </div>
337 </div>
339 <div id="outline-container-3" class="outline-2">
340 <h2 id="sec-3">A roadmap for <code>Cortex</code> experiments</h2>
341 <div class="outline-text-2" id="text-3">
345 </div>
347 <div id="outline-container-3-1" class="outline-3">
348 <h3 id="sec-3-1">Worm World</h3>
349 <div class="outline-text-3" id="text-3-1">
352 <p>
353 Worms in <code>Cortex</code> are segmented creatures which vary in length and
354 number of segments, and have the senses of vision, proprioception,
355 touch, and muscle tension.
356 </p>
358 <div class="figure">
359 <p><img src="../images/finger-UV.png" width="755" alt="../images/finger-UV.png" /></p>
360 <p>This is the tactile-sensor-profile for the upper segment of a worm. It defines regions of high touch sensitivity (where there are many white pixels) and regions of low sensitivity (where white pixels are sparse).</p>
361 </div>
366 <div class="figure">
367 <center>
368 <video controls="controls" width="550">
369 <source src="../video/worm-touch.ogg" type="video/ogg"
370 preload="none" />
371 </video>
372 <br /> <a href="http://youtu.be/RHx2wqzNVcU"> YouTube </a>
373 </center>
374 <p>The worm responds to touch.</p>
375 </div>
377 <div class="figure">
378 <center>
379 <video controls="controls" width="550">
380 <source src="../video/test-proprioception.ogg" type="video/ogg"
381 preload="none" />
382 </video>
383 <br /> <a href="http://youtu.be/JjdDmyM8b0w"> YouTube </a>
384 </center>
385 <p>Proprioception in a worm. The proprioceptive readout is
386 in the upper left corner of the screen.</p>
387 </div>
389 <p>
390 A worm is trained in various actions such as sinusoidal movement,
391 curling, flailing, and spinning by directly playing motor
392 contractions while the worm "feels" the experience. These actions
393 are recorded not only as vectors of muscle tension, touch, and
394 proprioceptive data, but also in higher-level forms such as
395 frequencies of the various contractions and a symbolic name for the
396 action.
397 </p>
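<p>
As a rough illustration of what one such record could look like, here
is a minimal sketch in Python (not the Clojure that <code>Cortex</code> itself is
written in). The names <code>ActionRecord</code> and <code>contraction_frequency</code>, the
0.5 activation threshold, and the frame rates are all assumptions made
for the example and are not taken from <code>Cortex</code>.
</p>

```python
from dataclasses import dataclass, field


def contraction_frequency(activations, frames_per_second, threshold=0.5):
    """Estimate contractions per second for one muscle (hypothetical helper).

    Counts rising edges: frames where the activation reaches the
    threshold after having been below it on the previous frame.
    """
    if not activations:
        return 0.0
    rising_edges = 0
    for previous, current in zip(activations, activations[1:]):
        if threshold > previous and current >= threshold:
            rising_edges += 1
    duration_seconds = len(activations) / frames_per_second
    return rising_edges / duration_seconds


@dataclass
class ActionRecord:
    """One training example: raw sense vectors plus higher-level summaries."""
    name: str                    # symbolic label, e.g. "curling"
    muscle_tension: list         # one list of activations per muscle
    touch: list                  # one touch vector per frame
    proprioception: list         # one joint-angle vector per frame
    frames_per_second: float = 60.0   # assumed frame rate
    frequencies: list = field(init=False, default_factory=list)

    def __post_init__(self):
        # Derive the higher-level form from the raw muscle data.
        self.frequencies = [
            contraction_frequency(muscle, self.frames_per_second)
            for muscle in self.muscle_tension
        ]


# One muscle pulsing on twice over 8 frames at 4 frames per second.
record = ActionRecord(
    name="wiggle",
    muscle_tension=[[0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]],
    touch=[],
    proprioception=[],
    frames_per_second=4.0,
)
print(record.frequencies)  # prints [1.0]
```

<p>
Keeping the raw vectors next to the derived frequencies and the
symbolic name means a later matching step can use whichever level of
description is most convenient.
</p>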
398 <p>
399 Then, the worm watches a video of another worm performing one of
400 the actions, and must judge which action was performed. Normally
401 this would be an extremely difficult problem, but the worm is able
402 to greatly diminish the search space through sympathetic
403 imagination. First, it creates an imagined copy of its body which
404 it observes from a third person point of view. Then for each frame
405 of the video, it maneuvers its simulated body to be in registration
406 with the worm depicted in the video. The physical constraints
407 imposed by the physics simulation greatly decrease the number of
408 poses that have to be tried, making the search feasible. As the
409 imaginary worm moves, it generates imaginary muscle tension and
410 proprioceptive sensations. The worm determines the action not by
411 vision, but by matching the imagined proprioceptive data with
412 previous examples.
413 </p>
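<p>
The final matching step can be sketched as a nearest-example lookup.
This Python sketch assumes that proprioceptive data is a list of
frames, each frame a list of joint angles, and that similarity is the
mean squared difference between sequences. Both the data layout and
the distance measure are stand-ins chosen for illustration; the
function names are hypothetical and the matching method used by
<code>Cortex</code> may differ.
</p>

```python
def sequence_distance(imagined, example):
    """Mean squared difference between two joint-angle sequences.

    Each sequence is a list of frames; each frame is a list of joint
    angles. Only the overlapping frames and joints are compared.
    """
    total = 0.0
    count = 0
    for frame_a, frame_b in zip(imagined, example):
        for angle_a, angle_b in zip(frame_a, frame_b):
            total += (angle_a - angle_b) ** 2
            count += 1
    return total / count if count else float("inf")


def identify_action(imagined, examples):
    """Return the name of the stored example closest to the imagined data.

    `examples` maps an action name to a recorded joint-angle sequence.
    Raises ValueError if `examples` is empty.
    """
    return min(
        examples,
        key=lambda name: sequence_distance(imagined, examples[name]),
    )


# Two stored examples and one imagined sequence (made-up numbers).
examples = {
    "curling": [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]],
    "wiggling": [[0.5, -0.5], [-0.5, 0.5], [0.5, -0.5]],
}
imagined = [[0.1, 0.0], [0.4, 0.6], [0.9, 1.0]]
print(identify_action(imagined, examples))  # prints curling
```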
414 <p>
415 By using non-visual sensory data such as touch, the worms can also
416 answer body related questions such as "did your head touch your
417 tail?" and "did worm A touch worm B?"
418 </p>
419 <p>
420 The proprioceptive information used for action identification is
421 body-centric, so only the registration step is dependent on point
422 of view, not the identification step. Registration is not specific
423 to any particular action. Thus, action identification can be
424 divided into a point-of-view dependent generic registration step,
425 and an action-specific step that is body-centered and invariant to
426 point of view.
427 </p>
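<p>
A small worked example may make the invariance concrete. The Python
sketch below simplifies the worm to a chain of 2D points and treats a
change in point of view as a rotation plus a translation in the plane;
both simplifications, and the function names, are assumptions for
illustration only. The joint angles (a stand-in for proprioceptive
data) come out the same from either viewpoint.
</p>

```python
import math


def joint_angles(points):
    """Signed bend angle at each interior joint of a 2D segment chain."""
    angles = []
    for a, b, c in zip(points, points[1:], points[2:]):
        heading_in = math.atan2(b[1] - a[1], b[0] - a[0])
        heading_out = math.atan2(c[1] - b[1], c[0] - b[0])
        bend = heading_out - heading_in
        # Wrap into [-pi, pi) so equal poses give equal angles.
        bend = (bend + math.pi) % (2 * math.pi) - math.pi
        angles.append(bend)
    return angles


def change_viewpoint(points, rotation, shift):
    """Rotate the chain about the origin, then translate it."""
    cos_r, sin_r = math.cos(rotation), math.sin(rotation)
    return [
        (x * cos_r - y * sin_r + shift[0], x * sin_r + y * cos_r + shift[1])
        for x, y in points
    ]


# The same pose seen from two different viewpoints.
worm = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (3.0, 1.0)]
seen_elsewhere = change_viewpoint(worm, rotation=1.2, shift=(5.0, -3.0))

original = joint_angles(worm)
rotated = joint_angles(seen_elsewhere)
same = all(math.isclose(p, q, abs_tol=1e-9) for p, q in zip(original, rotated))
print(same)  # prints True
```

<p>
The point coordinates differ between the two views, so anything that
works on them directly (registration) depends on point of view, while
anything that works on the joint angles (identification) does not.
</p>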
428 </div>
430 </div>
432 <div id="outline-container-3-2" class="outline-3">
433 <h3 id="sec-3-2">Stick Figure World</h3>
434 <div class="outline-text-3" id="text-3-2">
437 <p>
438 This environment is similar to Worm World, except the creatures are
439 more complicated and the actions and questions more varied. It is
440 an experiment to see how far imagination can go in interpreting
441 actions.
442 </p></div>
443 </div>
444 </div>
445 </div>
447 <div id="postamble">
448 <p class="date">Date: 2013-11-07 04:21:29 EST</p>
449 <p class="author">Author: Robert McIntyre</p>
450 <p class="creator">Org version 7.7 with Emacs version 24</p>
451 <a href="http://validator.w3.org/check?uri=referer">Validate XHTML 1.0</a>
453 </div>
454 </body>
455 </html>