comparison: thesis/org/first-chapter.html @ 401:7ee735a836da

incorporate thesis.

author:    Robert McIntyre <rlm@mit.edu>
date:      Sun, 16 Mar 2014 23:31:16 -0400
parents:
children:
comparing revisions 400:6ba908c1a0a9 and 401:7ee735a836da
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
               "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title><code>CORTEX</code></title>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<meta name="title" content="<code>CORTEX</code>"/>
<meta name="generator" content="Org-mode"/>
<meta name="generated" content="2013-11-07 04:21:29 EST"/>
<meta name="author" content="Robert McIntyre"/>
<meta name="description" content="Using embodied AI to facilitate Artificial Imagination."/>
<meta name="keywords" content="AI, clojure, embodiment"/>
<style type="text/css">
 <!--/*--><![CDATA[/*><!--*/
  html { font-family: Times, serif; font-size: 12pt; }
  .title  { text-align: center; }
  .todo   { color: red; }
  .done   { color: green; }
  .tag    { background-color: #add8e6; font-weight:normal }
  .target { }
  .timestamp { color: #bebebe; }
  .timestamp-kwd { color: #5f9ea0; }
  .right  {margin-left:auto; margin-right:0px;  text-align:right;}
  .left   {margin-left:0px;  margin-right:auto; text-align:left;}
  .center {margin-left:auto; margin-right:auto; text-align:center;}
  p.verse { margin-left: 3% }
  pre {
    border: 1pt solid #AEBDCC;
    background-color: #F3F5F7;
    padding: 5pt;
    font-family: courier, monospace;
    font-size: 90%;
    overflow:auto;
  }
  table { border-collapse: collapse; }
  td, th { vertical-align: top; }
  th.right  { text-align:center; }
  th.left   { text-align:center; }
  th.center { text-align:center; }
  td.right  { text-align:right; }
  td.left   { text-align:left; }
  td.center { text-align:center; }
  dt { font-weight: bold; }
  div.figure { padding: 0.5em; }
  div.figure p { text-align: center; }
  div.inlinetask {
    padding:10px;
    border:2px solid gray;
    margin:10px;
    background: #ffffcc;
  }
  textarea { overflow-x: auto; }
  .linenr { font-size:smaller }
  .code-highlighted {background-color:#ffff00;}
  .org-info-js_info-navigation { border-style:none; }
  #org-info-js_console-label { font-size:10px; font-weight:bold;
                               white-space:nowrap; }
  .org-info-js_search-highlight {background-color:#ffff00; color:#000000;
                                 font-weight:bold; }
  /*]]>*/-->
</style>
<script type="text/javascript">var _gaq = _gaq || [];_gaq.push(['_setAccount', 'UA-31261312-1']);_gaq.push(['_trackPageview']);(function() {var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);})();</script><link rel="stylesheet" type="text/css" href="../../aurellem/css/argentum.css" />
<script type="text/javascript">
<!--/*--><![CDATA[/*><!--*/
 function CodeHighlightOn(elem, id)
 {
   var target = document.getElementById(id);
   if(null != target) {
     elem.cacheClassElem = elem.className;
     elem.cacheClassTarget = target.className;
     target.className = "code-highlighted";
     elem.className = "code-highlighted";
   }
 }
 function CodeHighlightOff(elem, id)
 {
   var target = document.getElementById(id);
   if(elem.cacheClassElem)
     elem.className = elem.cacheClassElem;
   if(elem.cacheClassTarget)
     target.className = elem.cacheClassTarget;
 }
/*]]>*///-->
</script>

</head>
<body>


<div id="content">
<h1 class="title"><code>CORTEX</code></h1>


<div class="header">
  <div class="float-right">
    <!--
    <form>
      <input type="text"/><input type="submit" value="search the blog »"/>
    </form>
    -->
  </div>

  <h1>aurellem <em>☉</em></h1>
  <ul class="nav">
    <li><a href="/">read the blog »</a></li>
    <!-- li><a href="#">learn about us »</a></li-->
  </ul>
</div>

<div class="author">Written by <author>Robert McIntyre</author></div>



<div id="outline-container-1" class="outline-2">
<h2 id="sec-1">Artificial Imagination</h2>
<div class="outline-text-2" id="text-1">

<p>
Imagine watching a video of someone skateboarding. When you watch
the video, you can imagine yourself skateboarding, and your
knowledge of the human body and its dynamics guides your
interpretation of the scene. For example, even if the skateboarder
is partially occluded, you can infer the positions of his arms and
body from your own knowledge of how your body would be positioned if
you were skateboarding. If the skateboarder suffers an accident, you
wince in sympathy, imagining the pain your own body would experience
if it were in the same situation. This empathy with other people
guides our understanding of whatever they are doing because it is a
powerful constraint on what is probable and possible. In order to
make use of this powerful empathy constraint, I need a system that
can generate and make sense of sensory data from the many different
senses that humans possess. The two key properties of such a system
are <i>embodiment</i> and <i>imagination</i>.
</p>

</div>

<div id="outline-container-1-1" class="outline-3">
<h3 id="sec-1-1">What is imagination?</h3>
<div class="outline-text-3" id="text-1-1">

<p>
One kind of imagination is <i>sympathetic</i> imagination: you imagine
yourself in the position of something/someone you are
observing. This type of imagination comes into play when you follow
along visually when watching someone perform actions, or when you
sympathetically grimace when someone hurts themselves. This type of
imagination uses the constraints you have learned about your own
body to highly constrain the possibilities in whatever you are
seeing. It uses all your senses, including your senses of touch,
proprioception, etc. Humans are flexible when it comes to "putting
themselves in another's shoes," and can sympathetically understand
not only other humans, but entities ranging from animals to cartoon
characters to <a href="http://www.youtube.com/watch?v=0jz4HcwTQmU">single dots</a> on a screen!
</p>
<p>
Another kind of imagination is <i>predictive</i> imagination: you
construct scenes in your mind that are not entirely related to
whatever you are observing, but instead are predictions of the
future or simply flights of fancy. You use this type of imagination
to plan out multi-step actions, or play out dangerous situations in
your mind so as to avoid messing them up in reality.
</p>
<p>
Of course, sympathetic and predictive imagination blend into each
other and are not completely separate concepts. One dimension along
which you can distinguish types of imagination is dependence on raw
sense data. Sympathetic imagination is highly constrained by your
senses, while predictive imagination can be more or less dependent
on your senses depending on how far ahead you imagine. Daydreaming
is an extreme form of predictive imagination that wanders through
different possibilities without concern for whether they are
related to whatever is happening in reality.
</p>
<p>
For this thesis, I will mostly focus on sympathetic imagination and
the constraint it provides for understanding sensory data.
</p>
</div>

</div>

<div id="outline-container-1-2" class="outline-3">
<h3 id="sec-1-2">What problems can imagination solve?</h3>
<div class="outline-text-3" id="text-1-2">


<p>
Consider a video of a cat drinking some water.
</p>

<div class="figure">
<p><img src="../images/cat-drinking.jpg" alt="../images/cat-drinking.jpg" /></p>
<p>A cat drinking some water. Identifying this action is beyond the state of the art for computers.</p>
</div>

<p>
It is currently impossible for any computer program to reliably
label such a video as "drinking". I think humans are able to label
such a video as "drinking" because they imagine <i>themselves</i> as the
cat, and imagine putting their face up against a stream of water
and sticking out their tongue. In that imagined world, they can
feel the cool water hitting their tongue, and feel the water
entering their body, and are able to recognize that <i>feeling</i> as
drinking. So, the label of the action is not really in the pixels
of the image, but is found clearly in a simulation inspired by
those pixels. An imaginative system, having been trained on
drinking and non-drinking examples and learning that the most
important component of drinking is the feeling of water sliding
down one's throat, would analyze a video of a cat drinking in the
following manner (a sketch of the final matching step follows the
list):
</p>
<ul>
<li>Create a physical model of the video by putting a "fuzzy" model
  of its own body in place of the cat. Also, create a simulation of
  the stream of water.
</li>
<li>Play out this simulated scene and generate imagined sensory
  experience. This will include relevant muscle contractions, a
  close up view of the stream from the cat's perspective, and most
  importantly, the imagined feeling of water entering the mouth.
</li>
<li>The action is now easily identified as drinking by the sense of
  taste alone. The other senses (such as the tongue moving in and
  out) help to give plausibility to the simulated action. Note that
  the sense of vision, while critical in creating the simulation,
  is not critical for identifying the action from the simulation.
</li>
</ul>
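
<p>
The sketch below illustrates only the last step, in Clojure: given
imagined sense data, pick the label of the nearest stored
example. The sense vectors, the distance measure, and the
<code>drinking</code> and <code>not-drinking</code> examples are all invented for
illustration; they are not data or code from <code>Cortex</code> itself.
</p>

<pre class="src src-clojure">
;; Minimal sketch of "recognize the action by how it feels": the
;; imagined sense data is compared against stored, labeled examples
;; and the closest one wins. All numbers here are made up.
(defn distance
  "Euclidean distance between two equal-length sense vectors."
  [a b]
  (Math/sqrt (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) a b))))

(defn classify-feeling
  "Return the label of the stored example closest to the imagined senses."
  [imagined examples]
  (key (apply min-key (fn [[_ example]] (distance imagined example)) examples)))

;; Toy usage: pretend the vector components are taste and touch readings.
(classify-feeling [0.9 0.8 0.1]
                  {:drinking     [1.0 0.9 0.0]
                   :not-drinking [0.0 0.1 0.9]})
;; returns :drinking
</pre>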


<p>
More generally, I expect imaginative systems to be particularly
good at identifying embodied actions in videos.
</p>
</div>
</div>

</div>

<div id="outline-container-2" class="outline-2">
<h2 id="sec-2">Cortex</h2>
<div class="outline-text-2" id="text-2">


<p>
The previous example involves liquids, the sense of taste, and
imagining oneself as a cat. For this thesis I constrain myself to
simpler, more easily digitizable senses and situations.
</p>
<p>
My system, <code>Cortex</code>, performs imagination in two different simplified
worlds: <i>worm world</i> and <i>stick figure world</i>. In each of these
worlds, entities capable of imagination recognize actions by
simulating the experience from their own perspective, and then
recognizing the action from a database of examples.
</p>
<p>
In order to serve as a framework for experiments in imagination,
<code>Cortex</code> requires simulated bodies, worlds, and senses like vision,
hearing, touch, proprioception, etc.
</p>

</div>

<div id="outline-container-2-1" class="outline-3">
<h3 id="sec-2-1">A Video Game Engine takes care of some of the groundwork</h3>
<div class="outline-text-3" id="text-2-1">


<p>
When it comes to simulation environments, the engines used to
create the worlds in video games offer top-notch physics and
graphics support. These engines also have limited support for
creating cameras and rendering 3D sound, which can be repurposed
for vision and hearing respectively. Physics collision detection
can be expanded to create a sense of touch.
</p>
<p>
jMonkeyEngine3 is one such engine for creating video games in
Java. It uses OpenGL to render to the screen and uses scene graphs
to avoid drawing things that do not appear on the screen. It has an
active community and several games in the pipeline. The engine was
not built to serve any particular game but is instead meant to be
used for any 3D game. I chose jMonkeyEngine3 because it had the
most features out of all the open projects I looked at, and because
I could then write my code in Clojure, a dialect of Lisp that runs
on the JVM.
</p>
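
<p>
As a concrete, if simplified, illustration of what driving
jMonkeyEngine3 from Clojure looks like, the sketch below subclasses
<code>SimpleApplication</code> and places a single box in the scene graph. It
assumes the jMonkeyEngine3 jars are on the classpath; it is a
hello-world style example, not code from <code>Cortex</code>.
</p>

<pre class="src src-clojure">
;; Hello-world sketch: drive jMonkeyEngine3 from Clojure via proxy.
(import '(com.jme3.app SimpleApplication)
        '(com.jme3.material Material)
        '(com.jme3.math ColorRGBA)
        '(com.jme3.scene Geometry)
        '(com.jme3.scene.shape Box))

(defn box-app
  "A SimpleApplication that shows one blue box."
  []
  (proxy [SimpleApplication] []
    (simpleInitApp []
      (let [geom (Geometry. "box" (Box. 1 1 1))
            mat  (Material. (.getAssetManager this)
                            "Common/MatDefs/Misc/Unshaded.j3md")]
        (.setColor mat "Color" ColorRGBA/Blue)
        (.setMaterial geom mat)
        (.attachChild (.getRootNode this) geom)))))

;; (.start (box-app))  ; opens a window and renders the box
</pre>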
</div>

</div>

<div id="outline-container-2-2" class="outline-3">
<h3 id="sec-2-2"><code>CORTEX</code> Extends jMonkeyEngine3 to implement rich senses</h3>
<div class="outline-text-3" id="text-2-2">

<p>
Using the game-making primitives provided by jMonkeyEngine3, I have
constructed every major human sense except for smell and
taste. <code>Cortex</code> also provides an interface for creating creatures
in Blender, a 3D modeling environment, and then "rigging" the
creatures with senses using 3D annotations in Blender. A creature
can have any number of senses, and there can be any number of
creatures in a simulation.
</p>
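
<p>
Schematically (and only schematically; the names below are invented
for illustration and are not the actual <code>Cortex</code> API), a rigged
creature can be thought of as a Blender model plus a map from sense
name to a function that extracts that sense's data from the
simulated world:
</p>

<pre class="src src-clojure">
;; Illustrative data shape only, not the real Cortex interface.
(def worm
  {:model  "Models/worm/worm.blend"          ; Blender file with sense annotations
   :senses {:touch          (fn [world] [])  ; stubs standing in for the
            :proprioception (fn [world] [])  ; real sense functions, which
            :muscle-tension (fn [world] [])  ; would read the simulation
            :vision         (fn [world] [])}})

(defn sense-data
  "Collect one time-step of data from every sense the creature has."
  [creature world]
  (into {} (map (fn [[sense-name sense-fn]] [sense-name (sense-fn world)])
                (:senses creature))))
</pre>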
<p>
The senses available in <code>Cortex</code> are:
</p>
<ul>
<li><a href="../../cortex/html/vision.html">Vision</a>
</li>
<li><a href="../../cortex/html/hearing.html">Hearing</a>
</li>
<li><a href="../../cortex/html/touch.html">Touch</a>
</li>
<li><a href="../../cortex/html/proprioception.html">Proprioception</a>
</li>
<li><a href="../../cortex/html/movement.html">Muscle Tension</a>
</li>
</ul>


</div>
</div>

</div>

<div id="outline-container-3" class="outline-2">
<h2 id="sec-3">A roadmap for <code>Cortex</code> experiments</h2>
<div class="outline-text-2" id="text-3">



</div>

<div id="outline-container-3-1" class="outline-3">
<h3 id="sec-3-1">Worm World</h3>
<div class="outline-text-3" id="text-3-1">

<p>
Worms in <code>Cortex</code> are segmented creatures which vary in length and
number of segments, and have the senses of vision, proprioception,
touch, and muscle tension.
</p>

<div class="figure">
<p><img src="../images/finger-UV.png" width="755" alt="../images/finger-UV.png" /></p>
<p>This is the tactile-sensor-profile for the upper segment of a worm. It defines regions of high touch sensitivity (where there are many white pixels) and regions of low sensitivity (where white pixels are sparse).</p>
</div>
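
<p>
The encoding is simple enough to sketch: each sufficiently white
pixel in the profile image stands for one touch sensor at that UV
coordinate. The snippet below is only an illustration of that idea
using plain Java image I/O, not <code>Cortex</code>'s actual touch
implementation, and the pure-white threshold is an assumption.
</p>

<pre class="src src-clojure">
;; Sketch: turn white pixels of a sensor-profile image into sensor positions.
(import '(javax.imageio ImageIO)
        '(java.io File))

(defn sensor-coordinates
  "Return the [x y] coordinate of every pure-white pixel in the image;
   each coordinate would correspond to a single touch sensor."
  [image-file]
  (let [img (ImageIO/read (File. image-file))]
    (for [x (range (.getWidth img))
          y (range (.getHeight img))
          :when (= 0xffffff (bit-and 0xffffff (.getRGB img x y)))]
      [x y])))

;; (count (sensor-coordinates "../images/finger-UV.png"))
;; would give the number of touch sensors on this segment.
</pre>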




<div class="figure">
<center>
<video controls="controls" width="550">
  <source src="../video/worm-touch.ogg" type="video/ogg"
          preload="none" />
</video>
<br/> <a href="http://youtu.be/RHx2wqzNVcU"> YouTube </a>
</center>
<p>The worm responds to touch.</p>
</div>

<div class="figure">
<center>
<video controls="controls" width="550">
  <source src="../video/test-proprioception.ogg" type="video/ogg"
          preload="none" />
</video>
<br/> <a href="http://youtu.be/JjdDmyM8b0w"> YouTube </a>
</center>
<p>Proprioception in a worm. The proprioceptive readout is
in the upper left corner of the screen.</p>
</div>

<p>
A worm is trained in various actions such as sinusoidal movement,
curling, flailing, and spinning by directly playing motor
contractions while the worm "feels" the experience. These actions
are recorded both as vectors of muscle tension, touch, and
proprioceptive data, and in higher-level forms such as the
frequencies of the various contractions and a symbolic name for the
action.
</p>
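
<p>
The recorded examples can be pictured as structures like the one
below. The field names and numbers are invented for illustration;
they show the shape of the data, not <code>Cortex</code>'s actual
representation.
</p>

<pre class="src src-clojure">
;; Hypothetical shape of one recorded training example.
(def curl-example
  {:action         :curling
   :muscle         [[0.0 0.3 0.9] [0.1 0.4 0.8]]  ; tension per muscle, per frame
   :touch          [[0 0 1] [0 1 1]]              ; activated sensors per frame
   :proprioception [[0.2 1.1] [0.3 1.2]]          ; joint angles per frame
   :frequencies    {:tail-contraction 0.5}})      ; higher-level summary
</pre>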
<p>
Then, the worm watches a video of another worm performing one of
the actions, and must judge which action was performed. Normally
this would be an extremely difficult problem, but the worm is able
to greatly diminish the search space through sympathetic
imagination. First, it creates an imagined copy of its body which
it observes from a third person point of view. Then for each frame
of the video, it maneuvers its simulated body to be in registration
with the worm depicted in the video. The physical constraints
imposed by the physics simulation greatly decrease the number of
poses that have to be tried, making the search feasible. As the
imaginary worm moves, it generates imaginary muscle tension and
proprioceptive sensations. The worm determines the action not by
vision, but by matching the imagined proprioceptive data with
previous examples.
</p>
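
<p>
The final matching step can be sketched as follows. Registration is
assumed to have already produced a sequence of imagined
proprioceptive frames; the action is then whichever stored example
sequence those frames resemble most closely. The functions and the
toy joint-angle data are illustrative, not the actual <code>Cortex</code>
implementation.
</p>

<pre class="src src-clojure">
;; Sketch of matching imagined proprioception against stored examples.
(defn frame-distance
  "Distance between two proprioceptive frames (vectors of joint angles)."
  [a b]
  (reduce + (map (fn [x y] (Math/abs (double (- x y)))) a b)))

(defn recognize-action
  "examples maps an action name to a recorded sequence of frames."
  [imagined-frames examples]
  (key (apply min-key
              (fn [[_ recorded]]
                (reduce + (map frame-distance imagined-frames recorded)))
              examples)))

;; Toy usage with invented joint angles:
(recognize-action [[0.1 0.2] [0.2 0.3]]
                  {:curling  [[0.1 0.2] [0.2 0.4]]
                   :flailing [[0.9 0.1] [0.1 0.9]]})
;; returns :curling
</pre>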
<p>
By using non-visual sensory data such as touch, the worms can also
answer body-related questions such as "did your head touch your
tail?" and "did worm A touch worm B?"
</p>
<p>
The proprioceptive information used for action identification is
body-centric, so only the registration step is dependent on point
of view, not the identification step. Registration is not specific
to any particular action. Thus, action identification can be
divided into a point-of-view-dependent generic registration step,
and an action-specific step that is body-centered and invariant to
point of view.
</p>
</div>

</div>

<div id="outline-container-3-2" class="outline-3">
<h3 id="sec-3-2">Stick Figure World</h3>
<div class="outline-text-3" id="text-3-2">


<p>
This environment is similar to Worm World, except the creatures are
more complicated and the actions and questions more varied. It is
an experiment to see how far imagination can go in interpreting
actions.
</p></div>
</div>
</div>
</div>

<div id="postamble">
<p class="date">Date: 2013-11-07 04:21:29 EST</p>
<p class="author">Author: Robert McIntyre</p>
<p class="creator">Org version 7.7 with Emacs version 24</p>
<a href="http://validator.w3.org/check?uri=referer">Validate XHTML 1.0</a>

</div>
</body>
</html>