annotate thesis/org/first-chapter.html @ 570:9647f0168287

update for interview.
author Robert McIntyre <rlm@mit.edu>
date Tue, 15 Jul 2014 02:46:00 -0400
parents 5205535237fb
children
rev   line source
rlm@401 1 <?xml version="1.0" encoding="utf-8"?>
rlm@401 2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
rlm@401 3 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
rlm@401 4 <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
rlm@401 5 <head>
rlm@401 6 <title>CORTEX</title>
rlm@401 7 <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
rlm@401 8 <meta name="title" content="CORTEX"/>
rlm@401 9 <meta name="generator" content="Org-mode"/>
rlm@401 10 <meta name="generated" content="2013-11-07 04:21:29 EST"/>
rlm@401 11 <meta name="author" content="Robert McIntyre"/>
rlm@401 12 <meta name="description" content="Using embodied AI to facilitate Artificial Imagination."/>
rlm@401 13 <meta name="keywords" content="AI, clojure, embodiment"/>
rlm@401 14 <style type="text/css">
rlm@401 15 <!--/*--><![CDATA[/*><!--*/
rlm@401 16 html { font-family: Times, serif; font-size: 12pt; }
rlm@401 17 .title { text-align: center; }
rlm@401 18 .todo { color: red; }
rlm@401 19 .done { color: green; }
rlm@401 20 .tag { background-color: #add8e6; font-weight:normal }
rlm@401 21 .target { }
rlm@401 22 .timestamp { color: #bebebe; }
rlm@401 23 .timestamp-kwd { color: #5f9ea0; }
rlm@401 24 .right {margin-left:auto; margin-right:0px; text-align:right;}
rlm@401 25 .left {margin-left:0px; margin-right:auto; text-align:left;}
rlm@401 26 .center {margin-left:auto; margin-right:auto; text-align:center;}
rlm@401 27 p.verse { margin-left: 3% }
rlm@401 28 pre {
rlm@401 29 border: 1pt solid #AEBDCC;
rlm@401 30 background-color: #F3F5F7;
rlm@401 31 padding: 5pt;
rlm@401 32 font-family: courier, monospace;
rlm@401 33 font-size: 90%;
rlm@401 34 overflow:auto;
rlm@401 35 }
rlm@401 36 table { border-collapse: collapse; }
rlm@401 37 td, th { vertical-align: top; }
rlm@401 38 th.right { text-align:center; }
rlm@401 39 th.left { text-align:center; }
rlm@401 40 th.center { text-align:center; }
rlm@401 41 td.right { text-align:right; }
rlm@401 42 td.left { text-align:left; }
rlm@401 43 td.center { text-align:center; }
rlm@401 44 dt { font-weight: bold; }
rlm@401 45 div.figure { padding: 0.5em; }
rlm@401 46 div.figure p { text-align: center; }
rlm@401 47 div.inlinetask {
rlm@401 48 padding:10px;
rlm@401 49 border:2px solid gray;
rlm@401 50 margin:10px;
rlm@401 51 background: #ffffcc;
rlm@401 52 }
rlm@401 53 textarea { overflow-x: auto; }
rlm@401 54 .linenr { font-size:smaller }
rlm@401 55 .code-highlighted {background-color:#ffff00;}
rlm@401 56 .org-info-js_info-navigation { border-style:none; }
rlm@401 57 #org-info-js_console-label { font-size:10px; font-weight:bold;
rlm@401 58 white-space:nowrap; }
rlm@401 59 .org-info-js_search-highlight {background-color:#ffff00; color:#000000;
rlm@401 60 font-weight:bold; }
rlm@401 61 /*]]>*/-->
rlm@401 62 </style>
rlm@401 63 <script type="text/javascript">var _gaq = _gaq || [];_gaq.push(['_setAccount', 'UA-31261312-1']);_gaq.push(['_trackPageview']);(function() {var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);})();</script><link rel="stylesheet" type="text/css" href="../../aurellem/css/argentum.css" />
rlm@401 64 <script type="text/javascript">
rlm@401 65 <!--/*--><![CDATA[/*><!--*/
rlm@401 66 function CodeHighlightOn(elem, id)
rlm@401 67 {
rlm@401 68 var target = document.getElementById(id);
rlm@401 69 if(null != target) {
rlm@401 70 elem.cacheClassElem = elem.className;
rlm@401 71 elem.cacheClassTarget = target.className;
rlm@401 72 target.className = "code-highlighted";
rlm@401 73 elem.className = "code-highlighted";
rlm@401 74 }
rlm@401 75 }
rlm@401 76 function CodeHighlightOff(elem, id)
rlm@401 77 {
rlm@401 78 var target = document.getElementById(id);
rlm@401 79 if(elem.cacheClassElem)
rlm@401 80 elem.className = elem.cacheClassElem;
rlm@401 81 if(elem.cacheClassTarget)
rlm@401 82 target.className = elem.cacheClassTarget;
rlm@401 83 }
rlm@401 84 /*]]>*///-->
rlm@401 85 </script>
rlm@401 86
rlm@401 87 </head>
rlm@401 88 <body>
rlm@401 89
rlm@401 90
rlm@401 91 <div id="content">
rlm@401 92 <h1 class="title"><code>CORTEX</code></h1>
rlm@401 93
rlm@401 94
rlm@401 95 <div class="header">
rlm@401 96 <div class="float-right">
rlm@401 97 <!--
rlm@401 98 <form>
rlm@401 99 <input type="text"/><input type="submit" value="search the blog &raquo;"/>
rlm@401 100 </form>
rlm@401 101 -->
rlm@401 102 </div>
rlm@401 103
rlm@401 104 <h1>aurellem <em>&#x2609;</em></h1>
rlm@401 105 <ul class="nav">
rlm@401 106 <li><a href="/">read the blog &raquo;</a></li>
rlm@401 107 <!-- li><a href="#">learn about us &raquo;</a></li-->
rlm@401 108 </ul>
rlm@401 109 </div>
rlm@401 110
rlm@401 111 <div class="author">Written by <author>Robert McIntyre</author></div>
rlm@401 112
rlm@401 113
rlm@401 114
rlm@401 115
rlm@401 116
rlm@401 117
rlm@401 118
rlm@401 119 <div id="outline-container-1" class="outline-2">
rlm@401 120 <h2 id="sec-1">Artificial Imagination</h2>
rlm@401 121 <div class="outline-text-2" id="text-1">
rlm@401 122
rlm@401 123
rlm@401 124 <p>
rlm@401 125 Imagine watching a video of someone skateboarding. When you watch
rlm@401 126 the video, you can imagine yourself skateboarding, and your
rlm@401 127 knowledge of the human body and its dynamics guides your
rlm@401 128 interpretation of the scene. For example, even if the skateboarder
rlm@401 129 is partially occluded, you can infer the positions of his arms and
rlm@401 130 body from your own knowledge of how your body would be positioned if
rlm@401 131 you were skateboarding. If the skateboarder suffers an accident, you
rlm@401 132 wince in sympathy, imagining the pain your own body would experience
rlm@401 133 if it were in the same situation. This empathy with other people
rlm@401 134 guides our understanding of whatever they are doing because it is a
rlm@401 135 powerful constraint on what is probable and possible. In order to
rlm@401 136 make use of this powerful empathy constraint, I need a system that
rlm@401 137 can generate and make sense of sensory data from the many different
rlm@401 138 senses that humans possess. The two key properties of such a system
rlm@401 139 are <i>embodiment</i> and <i>imagination</i>.
rlm@401 140 </p>
rlm@401 141
rlm@401 142 </div>
rlm@401 143
rlm@401 144 <div id="outline-container-1-1" class="outline-3">
rlm@401 145 <h3 id="sec-1-1">What is imagination?</h3>
rlm@401 146 <div class="outline-text-3" id="text-1-1">
rlm@401 147
rlm@401 148
rlm@401 149 <p>
rlm@401 150 One kind of imagination is <i>sympathetic</i> imagination: you imagine
rlm@401 151 yourself in the position of something/someone you are
rlm@401 152 observing. This type of imagination comes into play when you follow
rlm@401 153 along visually when watching someone perform actions, or when you
rlm@401 154 sympathetically grimace when someone hurts themselves. This type of
rlm@401 155 imagination uses the constraints you have learned about your own
rlm@401 156 body to highly constrain the possibilities in whatever you are
rlm@401 157 seeing. It uses all your senses, including your senses of touch,
rlm@401 158 proprioception, etc. Humans are flexible when it comes to "putting
rlm@401 159 themselves in another's shoes," and can sympathetically understand
rlm@401 160 not only other humans, but entities ranging from animals to cartoon
rlm@401 161 characters to <a href="http://www.youtube.com/watch?v=0jz4HcwTQmU">single dots</a> on a screen!
rlm@401 162 </p>
rlm@401 163 <p>
rlm@401 164 Another kind of imagination is <i>predictive</i> imagination: you
rlm@401 165 construct scenes in your mind that are not entirely related to
rlm@401 166 whatever you are observing, but instead are predictions of the
rlm@401 167 future or simply flights of fancy. You use this type of imagination
rlm@401 168 to plan out multi-step actions, or play out dangerous situations in
rlm@401 169 your mind so as to avoid messing them up in reality.
rlm@401 170 </p>
rlm@401 171 <p>
rlm@401 172 Of course, sympathetic and predictive imagination blend into each
rlm@401 173 other and are not completely separate concepts. One dimension along
rlm@401 174 which you can distinguish types of imagination is dependence on raw
rlm@401 175 sense data. Sympathetic imagination is highly constrained by your
rlm@401 176 senses, while predictive imagination can be more or less dependent
rlm@401 177 on your senses depending on how far ahead you imagine. Daydreaming
rlm@401 178 is an extreme form of predictive imagination that wanders through
rlm@401 179 different possibilities without concern for whether they are
rlm@401 180 related to whatever is happening in reality.
rlm@401 181 </p>
rlm@401 182 <p>
rlm@401 183 For this thesis, I will mostly focus on sympathetic imagination and
rlm@401 184 the constraint it provides for understanding sensory data.
rlm@401 185 </p>
rlm@401 186 </div>
rlm@401 187
rlm@401 188 </div>
rlm@401 189
rlm@401 190 <div id="outline-container-1-2" class="outline-3">
rlm@401 191 <h3 id="sec-1-2">What problems can imagination solve?</h3>
rlm@401 192 <div class="outline-text-3" id="text-1-2">
rlm@401 193
rlm@401 194
rlm@401 195 <p>
rlm@401 196 Consider a video of a cat drinking some water.
rlm@401 197 </p>
rlm@401 198
rlm@401 199 <div class="figure">
rlm@401 200 <p><img src="../images/cat-drinking.jpg" alt="../images/cat-drinking.jpg" /></p>
rlm@401 201 <p>A cat drinking some water. Identifying this action is beyond the state of the art for computers.</p>
rlm@401 202 </div>
rlm@401 203
rlm@401 204 <p>
rlm@401 205 It is currently impossible for any computer program to reliably
rlm@401 206 label such a video as "drinking". I think humans are able to label
rlm@401 207 such video as "drinking" because they imagine <i>themselves</i> as the
rlm@401 208 cat, and imagine putting their face up against a stream of water
rlm@401 209 and sticking out their tongue. In that imagined world, they can
rlm@401 210 feel the cool water hitting their tongue, and feel the water
rlm@401 211 entering their body, and are able to recognize that <i>feeling</i> as
rlm@401 212 drinking. So, the label of the action is not really in the pixels
rlm@401 213 of the image, but is found clearly in a simulation inspired by
rlm@401 214 those pixels. An imaginative system, having been trained on
rlm@401 215 drinking and non-drinking examples and having learned that the most
rlm@401 216 important component of drinking is the feeling of water sliding
rlm@401 217 down one's throat, would analyze a video of a cat drinking in the
rlm@401 218 following manner:
rlm@401 219 </p>
rlm@401 220 <ul>
rlm@401 221 <li>Create a physical model of the video by putting a "fuzzy" model
rlm@401 222 of its own body in place of the cat. Also, create a simulation of
rlm@401 223 the stream of water.
rlm@401 224
rlm@401 225 </li>
rlm@401 226 <li>Play out this simulated scene and generate imagined sensory
rlm@401 227 experience. This will include relevant muscle contractions, a
rlm@401 228 close up view of the stream from the cat's perspective, and most
rlm@401 229 importantly, the imagined feeling of water entering the mouth.
rlm@401 230
rlm@401 231 </li>
rlm@401 232 <li>The action is now easily identified as drinking by the sense of
rlm@401 233 taste alone. The other senses (such as the tongue moving in and
rlm@401 234 out) help to give plausibility to the simulated action. Note that
rlm@401 235 the sense of vision, while critical in creating the simulation,
rlm@401 236 is not critical for identifying the action from the simulation.
rlm@401 237 </li>
rlm@401 238 </ul>
rlm@401 239
rlm@401 240
rlm@401 241 <p>
rlm@401 242 More generally, I expect imaginative systems to be particularly
rlm@401 243 good at identifying embodied actions in videos.
rlm@401 244 </p>
rlm@401 245 </div>
rlm@401 246 </div>
rlm@401 247
rlm@401 248 </div>
rlm@401 249
rlm@401 250 <div id="outline-container-2" class="outline-2">
rlm@401 251 <h2 id="sec-2">Cortex</h2>
rlm@401 252 <div class="outline-text-2" id="text-2">
rlm@401 253
rlm@401 254
rlm@401 255 <p>
rlm@401 256 The previous example involves liquids, the sense of taste, and
rlm@401 257 imagining oneself as a cat. For this thesis I constrain myself to
rlm@401 258 simpler, more easily digitizable senses and situations.
rlm@401 259 </p>
rlm@401 260 <p>
rlm@401 261 My system, <code>Cortex</code>, performs imagination in two different simplified
rlm@401 262 worlds: <i>worm world</i> and <i>stick figure world</i>. In each of these
rlm@401 263 worlds, entities capable of imagination recognize actions by
rlm@401 264 simulating the experience from their own perspective, and then
rlm@401 265 recognizing the action from a database of examples.
rlm@401 266 </p>
rlm@401 267 <p>
rlm@401 268 In order to serve as a framework for experiments in imagination,
rlm@401 269 <code>Cortex</code> requires simulated bodies, worlds, and senses like vision,
rlm@401 270 hearing, touch, proprioception, etc.
rlm@401 271 </p>
rlm@401 272
rlm@401 273 </div>
rlm@401 274
rlm@401 275 <div id="outline-container-2-1" class="outline-3">
rlm@401 276 <h3 id="sec-2-1">A Video Game Engine takes care of some of the groundwork</h3>
rlm@401 277 <div class="outline-text-3" id="text-2-1">
rlm@401 278
rlm@401 279
rlm@401 280 <p>
rlm@401 281 When it comes to simulation environments, the engines used to
rlm@401 282 create the worlds in video games offer top-notch physics and
rlm@401 283 graphics support. These engines also have limited support for
rlm@401 284 creating cameras and rendering 3D sound, which can be repurposed
rlm@401 285 for vision and hearing respectively. Physics collision detection
rlm@401 286 can be expanded to create a sense of touch.
rlm@401 287 </p>
rlm@401 288 <p>
rlm@401 289 jMonkeyEngine3 is one such engine for creating video games in
rlm@401 290 Java. It uses OpenGL to render to the screen and uses scene graphs
rlm@401 291 to avoid drawing things that do not appear on the screen. It has an
rlm@401 292 active community and several games in the pipeline. The engine was
rlm@401 293 not built to serve any particular game but is instead meant to be
rlm@401 294 used for any 3D game. I chose jMonkeyEngine3 because it had the
rlm@401 295 most features out of all the open projects I looked at, and because
rlm@401 296 I could then write my code in Clojure, a dialect of Lisp
rlm@401 297 that runs on the JVM.
rlm@401 298 </p>
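<p>
As a rough illustration of what working with jMonkeyEngine3 from
Clojure looks like, the sketch below creates a minimal application
that displays a single box. It is adapted from the standard jME3
"hello world" style of program and is not part of <code>Cortex</code>
itself.
</p>

<pre class="example">
(ns example.jme-hello
  (:import (com.jme3.app SimpleApplication)
           (com.jme3.material Material)
           (com.jme3.math ColorRGBA)
           (com.jme3.scene Geometry)
           (com.jme3.scene.shape Box)))

(defn hello-app
  "A minimal jMonkeyEngine3 application that renders one blue box."
  []
  (proxy [SimpleApplication] []
    (simpleInitApp []
      (let [box  (Box. 1 1 1)
            geom (Geometry. "box" box)
            mat  (Material. (.getAssetManager this)
                            "Common/MatDefs/Misc/Unshaded.j3md")]
        (.setColor mat "Color" ColorRGBA/Blue)
        (.setMaterial geom mat)
        (.attachChild (.getRootNode this) geom)))))

;; (.start (hello-app)) opens a window and runs the render loop.
</pre>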
rlm@401 299 </div>
rlm@401 300
rlm@401 301 </div>
rlm@401 302
rlm@401 303 <div id="outline-container-2-2" class="outline-3">
rlm@401 304 <h3 id="sec-2-2"><code>CORTEX</code> Extends jMonkeyEngine3 to implement rich senses</h3>
rlm@401 305 <div class="outline-text-3" id="text-2-2">
rlm@401 306
rlm@401 307
rlm@401 308 <p>
rlm@401 309 Using the game-making primitives provided by jMonkeyEngine3, I have
rlm@401 310 constructed every major human sense except for smell and
rlm@401 311 taste. <code>Cortex</code> also provides an interface for creating creatures
rlm@401 312 in Blender, a 3D modeling environment, and then "rigging" the
rlm@401 313 creatures with senses using 3D annotations in Blender. A creature
rlm@401 314 can have any number of senses, and there can be any number of
rlm@401 315 creatures in a simulation.
rlm@401 316 </p>
rlm@401 317 <p>
rlm@401 318 The senses available in <code>Cortex</code> are:
rlm@401 319 </p>
rlm@401 320 <ul>
rlm@401 321 <li><a href="../../cortex/html/vision.html">Vision</a>
rlm@401 322 </li>
rlm@401 323 <li><a href="../../cortex/html/hearing.html">Hearing</a>
rlm@401 324 </li>
rlm@401 325 <li><a href="../../cortex/html/touch.html">Touch</a>
rlm@401 326 </li>
rlm@401 327 <li><a href="../../cortex/html/proprioception.html">Proprioception</a>
rlm@401 328 </li>
rlm@401 329 <li><a href="../../cortex/html/movement.html">Muscle Tension</a>
rlm@401 330 </li>
rlm@401 331 </ul>
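<p>
A hypothetical sketch (the names below are placeholders, not
necessarily the real <code>Cortex</code> interface) of how a rigged
creature's senses might be exposed to a controlling program: each
sense becomes a function that returns the current reading when
called.
</p>

<pre class="example">
;; Illustrative only: the sensor functions here return canned data
;; instead of reading from a live simulation.
(defn make-creature
  "Bundle one function per sense into a map describing a creature."
  []
  {:vision         (fn [] [[0 0 0] [255 255 255]])   ; sampled pixels
   :hearing        (fn [] [0.0 0.2 0.1])             ; audio samples
   :touch          (fn [] [0.0 0.0 0.7])             ; feeler pressures
   :proprioception (fn [] [{:joint :segment-1 :angle 0.3}])
   :muscle-tension (fn [] [0.1 0.9])})

(defn sense
  "Read the current value of one sense from a creature."
  [creature sense-name]
  ((get creature sense-name)))

;; (sense (make-creature) :touch) => [0.0 0.0 0.7]
</pre>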
rlm@401 332
rlm@401 333
rlm@401 334 </div>
rlm@401 335 </div>
rlm@401 336
rlm@401 337 </div>
rlm@401 338
rlm@401 339 <div id="outline-container-3" class="outline-2">
rlm@401 340 <h2 id="sec-3">A roadmap for <code>Cortex</code> experiments</h2>
rlm@401 341 <div class="outline-text-2" id="text-3">
rlm@401 342
rlm@401 343
rlm@401 344
rlm@401 345 </div>
rlm@401 346
rlm@401 347 <div id="outline-container-3-1" class="outline-3">
rlm@401 348 <h3 id="sec-3-1">Worm World</h3>
rlm@401 349 <div class="outline-text-3" id="text-3-1">
rlm@401 350
rlm@401 351
rlm@401 352 <p>
rlm@401 353 Worms in <code>Cortex</code> are segmented creatures which vary in length and
rlm@401 354 number of segments, and have the senses of vision, proprioception,
rlm@401 355 touch, and muscle tension.
rlm@401 356 </p>
rlm@401 357
rlm@401 358 <div class="figure">
rlm@401 359 <p><img src="../images/finger-UV.png" width="755" alt="../images/finger-UV.png" /></p>
rlm@401 360 <p>This is the tactile-sensor-profile for the upper segment of a worm. It defines regions of high touch sensitivity (where there are many white pixels) and regions of low sensitivity (where white pixels are sparse).</p>
rlm@401 361 </div>
rlm@401 362
rlm@401 363
rlm@401 364
rlm@401 365
rlm@401 366 <div class="figure">
rlm@401 367 <center>
rlm@401 368 <video controls="controls" width="550">
rlm@401 369 <source src="../video/worm-touch.ogg" type="video/ogg"
rlm@401 370 preload="none" />
rlm@401 371 </video>
rlm@401 372 <br/> <a href="http://youtu.be/RHx2wqzNVcU"> YouTube </a>
rlm@401 373 </center>
rlm@401 374 <p>The worm responds to touch.</p>
rlm@401 375 </div>
rlm@401 376
rlm@401 377 <div class="figure">
rlm@401 378 <center>
rlm@401 379 <video controls="controls" width="550">
rlm@401 380 <source src="../video/test-proprioception.ogg" type="video/ogg"
rlm@401 381 preload="none" />
rlm@401 382 </video>
rlm@401 383 <br/> <a href="http://youtu.be/JjdDmyM8b0w"> YouTube </a>
rlm@401 384 </center>
rlm@401 385 <p>Proprioception in a worm. The proprioceptive readout is
rlm@401 386 in the upper left corner of the screen.</p>
rlm@401 387 </div>
rlm@401 388
rlm@401 389 <p>
rlm@401 390 A worm is trained in various actions such as sinusoidal movement,
rlm@401 391 curling, flailing, and spinning by directly playing motor
rlm@401 392 contractions while the worm "feels" the experience. These actions
rlm@401 393 are recorded both as vectors of muscle tension, touch, and
rlm@401 394 proprioceptive data, and in higher level forms such as the
rlm@401 395 frequencies of the various contractions and a symbolic name for the
rlm@401 396 action.
rlm@401 397 </p>
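<p>
For concreteness, one recorded training example might look something
like the following Clojure data; the field names here are
illustrative only and do not come from the thesis.
</p>

<pre class="example">
;; One hypothetical recorded action: low-level sensory traces plus a
;; higher-level summary and a symbolic name.
(def example-action
  {:name           :curl                    ; symbolic label
   :muscle-tension [[0.0 0.8 0.8 0.0]       ; one vector per frame
                    [0.0 0.9 0.9 0.0]]
   :touch          [[0.0 0.0 0.2 0.0]       ; feeler pressures per frame
                    [0.0 0.1 0.6 0.0]]
   :proprioception [[0.1 1.2 0.1]           ; joint angles per frame
                    [0.2 1.4 0.2]]
   :frequencies    {:contraction 2.0}})     ; contractions per second
</pre>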
rlm@401 398 <p>
rlm@401 399 Then, the worm watches a video of another worm performing one of
rlm@401 400 the actions, and must judge which action was performed. Normally
rlm@401 401 this would be an extremely difficult problem, but the worm is able
rlm@401 402 to greatly diminish the search space through sympathetic
rlm@401 403 imagination. First, it creates an imagined copy of its body which
rlm@401 404 it observes from a third person point of view. Then for each frame
rlm@401 405 of the video, it maneuvers its simulated body to be in registration
rlm@401 406 with the worm depicted in the video. The physical constraints
rlm@401 407 imposed by the physics simulation greatly decrease the number of
rlm@401 408 poses that have to be tried, making the search feasible. As the
rlm@401 409 imaginary worm moves, it generates imaginary muscle tension and
rlm@401 410 proprioceptive sensations. The worm determines the action not by
rlm@401 411 vision, but by matching the imagined proprioceptive data with
rlm@401 412 previous examples.
rlm@401 413 </p>
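<p>
The final matching step can be sketched very simply: compare the
imagined proprioceptive sequence against each stored example and pick
the closest one. The representation below (sequences of joint-angle
vectors compared frame by frame) is a simplification chosen for
illustration, not the exact scheme used by <code>Cortex</code>.
</p>

<pre class="example">
(defn frame-distance
  "Euclidean distance between two proprioceptive frames, each an
   equal-length vector of joint angles."
  [frame-a frame-b]
  (Math/sqrt (reduce + (map (fn [a b] (let [d (- a b)] (* d d)))
                            frame-a frame-b))))

(defn sequence-distance
  "Summed frame distance between two equal-length frame sequences."
  [seq-a seq-b]
  (reduce + (map frame-distance seq-a seq-b)))

(defn identify-action
  "Return the name of the stored example whose proprioceptive
   sequence is closest to the imagined one. `examples` maps action
   names to frame sequences."
  [examples imagined]
  (key (apply min-key #(sequence-distance (val %) imagined) examples)))

;; (identify-action {:curl   [[0.0 0.1] [0.2 0.4]]
;;                   :wiggle [[0.5 -0.5] [-0.5 0.5]]}
;;                  [[0.1 0.1] [0.2 0.3]])
;; => :curl
</pre>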
rlm@401 414 <p>
rlm@401 415 By using non-visual sensory data such as touch, the worms can also
rlm@401 416 answer body-related questions such as "did your head touch your
rlm@401 417 tail?" and "did worm A touch worm B?"
rlm@401 418 </p>
rlm@401 419 <p>
rlm@401 420 The proprioceptive information used for action identification is
rlm@401 421 body-centric, so only the registration step is dependent on point
rlm@401 422 of view, not the identification step. Registration is not specific
rlm@401 423 to any particular action. Thus, action identification can be
rlm@401 424 divided into a point-of-view dependent generic registration step,
rlm@401 425 and an action-specific step that is body-centered and invariant to
rlm@401 426 point of view.
rlm@401 427 </p>
rlm@401 428 </div>
rlm@401 429
rlm@401 430 </div>
rlm@401 431
rlm@401 432 <div id="outline-container-3-2" class="outline-3">
rlm@401 433 <h3 id="sec-3-2">Stick Figure World</h3>
rlm@401 434 <div class="outline-text-3" id="text-3-2">
rlm@401 435
rlm@401 436
rlm@401 437 <p>
rlm@401 438 This environment is similar to Worm World, except the creatures are
rlm@401 439 more complicated and the actions and questions more varied. It is
rlm@401 440 an experiment to see how far imagination can go in interpreting
rlm@401 441 actions.
rlm@401 442 </p></div>
rlm@401 443 </div>
rlm@401 444 </div>
rlm@401 445 </div>
rlm@401 446
rlm@401 447 <div id="postamble">
rlm@401 448 <p class="date">Date: 2013-11-07 04:21:29 EST</p>
rlm@401 449 <p class="author">Author: Robert McIntyre</p>
rlm@401 450 <p class="creator">Org version 7.7 with Emacs version 24</p>
rlm@401 451 <a href="http://validator.w3.org/check?uri=referer">Validate XHTML 1.0</a>
rlm@401 452
rlm@401 453 </div>
rlm@401 454 </body>
rlm@401 455 </html>