
     1 <?xml version="1.0" encoding="utf-8"?>

     2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"

     3                "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

     4 <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">

     5 <head>

     6 <title>CORTEX</title>

     7 <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>

     8 <meta name="title" content="CORTEX"/>

     9 <meta name="generator" content="Org-mode"/>

    10 <meta name="generated" content="2013-11-07 04:21:29 EST"/>

    11 <meta name="author" content="Robert McIntyre"/>

    12 <meta name="description" content="Using embodied AI to facilitate Artificial Imagination."/>

    13 <meta name="keywords" content="AI, clojure, embodiment"/>

    14 <style type="text/css">

    15  <!--/*--><![CDATA[/*><!--*/

    16   html { font-family: Times, serif; font-size: 12pt; }

    17   .title  { text-align: center; }

    18   .todo   { color: red; }

    19   .done   { color: green; }

    20   .tag    { background-color: #add8e6; font-weight:normal }

    21   .target { }

    22   .timestamp { color: #bebebe; }

    23   .timestamp-kwd { color: #5f9ea0; }

    24   .right  {margin-left:auto; margin-right:0px;  text-align:right;}

    25   .left   {margin-left:0px;  margin-right:auto; text-align:left;}

    26   .center {margin-left:auto; margin-right:auto; text-align:center;}

    27   p.verse { margin-left: 3% }

    28   pre {

    29 	border: 1pt solid #AEBDCC;

    30 	background-color: #F3F5F7;

    31 	padding: 5pt;

    32 	font-family: courier, monospace;

    33         font-size: 90%;

    34         overflow:auto;

    35   }

    36   table { border-collapse: collapse; }

    37   td, th { vertical-align: top;  }

    38   th.right  { text-align:center;  }

    39   th.left   { text-align:center;   }

    40   th.center { text-align:center; }

    41   td.right  { text-align:right;  }

    42   td.left   { text-align:left;   }

    43   td.center { text-align:center; }

    44   dt { font-weight: bold; }

    45   div.figure { padding: 0.5em; }

    46   div.figure p { text-align: center; }

    47   div.inlinetask {

    48     padding:10px;

    49     border:2px solid gray;

    50     margin:10px;

    51     background: #ffffcc;

    52   }

    53   textarea { overflow-x: auto; }

    54   .linenr { font-size:smaller }

    55   .code-highlighted {background-color:#ffff00;}

    56   .org-info-js_info-navigation { border-style:none; }

    57   #org-info-js_console-label { font-size:10px; font-weight:bold;

    58                                white-space:nowrap; }

    59   .org-info-js_search-highlight {background-color:#ffff00; color:#000000;

    60                                  font-weight:bold; }

    61   /*]]>*/-->

    62 </style>

    63 <script type="text/javascript">var _gaq = _gaq || [];_gaq.push(['_setAccount', 'UA-31261312-1']);_gaq.push(['_trackPageview']);(function() {var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);})();</script><link rel="stylesheet" type="text/css" href="../../aurellem/css/argentum.css" />

    64 <script type="text/javascript">

    65 <!--/*--><![CDATA[/*><!--*/

    66  function CodeHighlightOn(elem, id)

    67  {

    68    var target = document.getElementById(id);

    69    if(null != target) {

    70      elem.cacheClassElem = elem.className;

    71      elem.cacheClassTarget = target.className;

    72      target.className = "code-highlighted";

    73      elem.className   = "code-highlighted";

    74    }

    75  }

    76  function CodeHighlightOff(elem, id)

    77  {

    78    var target = document.getElementById(id);

    79    if(elem.cacheClassElem)

    80      elem.className = elem.cacheClassElem;

    81    if(elem.cacheClassTarget)

    82      target.className = elem.cacheClassTarget;

    83  }

    84 /*]]>*///-->

    85 </script>

    86 

    87 </head>

    88 <body>

    89 

    90 

    91 <div id="content">

    92 <h1 class="title"><code>CORTEX</code></h1>

    93 

    94 

    95 <div class="header">

    96   <div class="float-right">	

    97     <!-- 

    98     <form>

    99       <input type="text"/><input type="submit" value="search the blog &raquo;"/> 

   100     </form>

   101     -->

   102   </div>

   103 

   104   <h1>aurellem <em>&#x2609;</em></h1>

   105   <ul class="nav">

   106     <li><a href="/">read the blog &raquo;</a></li>

   107     <!-- li><a href="#">learn about us &raquo;</a></li-->

   108   </ul>

   109 </div>

   110 

   111 <div class="author">Written by <author>Robert McIntyre</author></div>

   112 

   113 

   114 

   115 

   116 

   117 

   118 

   119 <div id="outline-container-1" class="outline-2">

   120 <h2 id="sec-1">Artificial Imagination</h2>

   121 <div class="outline-text-2" id="text-1">

   122 

   123 

   124 <p>

   125   Imagine watching a video of someone skateboarding. When you watch

   126   the video, you can imagine yourself skateboarding, and your

   127   knowledge of the human body and its dynamics guides your

   128   interpretation of the scene. For example, even if the skateboarder

   129   is partially occluded, you can infer the positions of his arms and

   130   body from your own knowledge of how your body would be positioned if

   131   you were skateboarding. If the skateboarder suffers an accident, you

   132   wince in sympathy, imagining the pain your own body would experience

   133   if it were in the same situation. This empathy with other people

   134   guides our understanding of whatever they are doing because it is a

   135   powerful constraint on what is probable and possible. In order to

   136   make use of this powerful empathy constraint, I need a system that

   137   can generate and make sense of sensory data from the many different

   138   senses that humans possess. The two key properties of such a system

   139   are <i>embodiment</i> and <i>imagination</i>.

   140 </p>

   141 

   142 </div>

   143 

   144 <div id="outline-container-1-1" class="outline-3">

   145 <h3 id="sec-1-1">What is imagination?</h3>

   146 <div class="outline-text-3" id="text-1-1">

   147 

   148 

   149 <p>

   150    One kind of imagination is <i>sympathetic</i> imagination: you imagine

   151    yourself in the position of something/someone you are

   152    observing. This type of imagination comes into play when you follow

   153    along visually when watching someone perform actions, or when you

   154    sympathetically grimace when someone hurts themselves. This type of

   155    imagination uses the constraints you have learned about your own

   156    body to highly constrain the possibilities in whatever you are

   157    seeing. It uses all your senses, including your senses of touch,

   158    proprioception, etc. Humans are flexible when it comes to "putting

   159    themselves in another's shoes," and can sympathetically understand

   160    not only other humans, but entities ranging from animals to cartoon

   161    characters to <a href="http://www.youtube.com/watch?v=0jz4HcwTQmU">single dots</a> on a screen!

   162 </p>

   163 <p>

   164    Another kind of imagination is <i>predictive</i> imagination: you

   165    construct scenes in your mind that are not entirely related to

   166    whatever you are observing, but instead are predictions of the

   167    future or simply flights of fancy. You use this type of imagination

   168    to plan out multi-step actions, or play out dangerous situations in

   169    your mind so as to avoid messing them up in reality.

   170 </p>

   171 <p>

   172    Of course, sympathetic and predictive imagination blend into each

   173    other and are not completely separate concepts. One dimension along

   174    which you can distinguish types of imagination is dependence on raw

   175    sense data. Sympathetic imagination is highly constrained by your

   176    senses, while predictive imagination can be more or less dependent

   177    on your senses depending on how far ahead you imagine. Daydreaming

   178    is an extreme form of predictive imagination that wanders through

   179    different possibilities without concern for whether they are

   180    related to whatever is happening in reality.

   181 </p>

   182 <p>

   183    For this thesis, I will mostly focus on sympathetic imagination and

   184    the constraint it provides for understanding sensory data.

   185 </p>

   186 </div>

   187 

   188 </div>

   189 

   190 <div id="outline-container-1-2" class="outline-3">

   191 <h3 id="sec-1-2">What problems can imagination solve?</h3>

   192 <div class="outline-text-3" id="text-1-2">

   193 

   194 

   195 <p>

   196    Consider a video of a cat drinking some water.

   197 </p>

   198 

   199 <div class="figure">

   200 <p><img src="../images/cat-drinking.jpg"  alt="../images/cat-drinking.jpg" /></p>

   201 <p>A cat drinking some water. Identifying this action is beyond the state of the art for computers.</p>

   202 </div>

   203 

   204 <p>

   205    It is currently impossible for any computer program to reliably

   206    label such a video as "drinking". I think humans are able to label

   207    such video as "drinking" because they imagine <i>themselves</i> as the

   208    cat, and imagine putting their face up against a stream of water

   209    and sticking out their tongue. In that imagined world, they can

   210    feel the cool water hitting their tongue, and feel the water

   211    entering their body, and are able to recognize that <i>feeling</i> as

   212    drinking. So, the label of the action is not really in the pixels

   213    of the image, but is found clearly in a simulation inspired by

   214    those pixels. An imaginative system, having been trained on

   215    drinking and non-drinking examples and learning that the most

   216    important component of drinking is the feeling of water sliding

   217    down one's throat, would analyze a video of a cat drinking in the

   218    following manner:

   219 </p>

   220 <ul>

   221 <li>Create a physical model of the video by putting a "fuzzy" model

   222      of its own body in place of the cat. Also, create a simulation of

   223      the stream of water.

   224 

   225 </li>

   226 <li>Play out this simulated scene and generate imagined sensory

   227      experience. This will include relevant muscle contractions, a

   228      close up view of the stream from the cat's perspective, and most

   229      importantly, the imagined feeling of water entering the mouth.

   230 

   231 </li>

   232 <li>The action is now easily identified as drinking by the sense of

   233      taste alone. The other senses (such as the tongue moving in and

   234      out) help to give plausibility to the simulated action. Note that

   235      the sense of vision, while critical in creating the simulation,

   236      is not critical for identifying the action from the simulation.

   237 </li>

   238 </ul>

   239 

   240 

   241 <p>

   242    More generally, I expect imaginative systems to be particularly

   243    good at identifying embodied actions in videos.

   244 </p>

   245 </div>

   246 </div>

   247 

   248 </div>

   249 

   250 <div id="outline-container-2" class="outline-2">

   251 <h2 id="sec-2">Cortex</h2>

   252 <div class="outline-text-2" id="text-2">

   253 

   254 

   255 <p>

   256   The previous example involves liquids, the sense of taste, and

   257   imagining oneself as a cat. For this thesis I constrain myself to

   258   simpler, more easily digitizable senses and situations.

   259 </p>

   260 <p>

   261   My system, <code>Cortex</code>, performs imagination in two different simplified

   262   worlds: <i>worm world</i> and <i>stick figure world</i>. In each of these

   263   worlds, entities capable of imagination recognize actions by

   264   simulating the experience from their own perspective, and then

   265   recognizing the action from a database of examples.

   266 </p>
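<p>
  As a rough illustration of "recognizing the action from a database of
  examples", here is a minimal nearest-example sketch in Python (my
  actual code is in Clojure, so this is only an illustration). The
  labels, the three-number feature vectors, and the names
  <code>EXAMPLES</code>, <code>distance</code> and <code>recognize</code> are all
  assumptions made up for this sketch, not data or functions from <code>Cortex</code>.
</p>

```python
import math

# Hypothetical database: action label -> example feature vectors.
# The numbers are invented for illustration only.
EXAMPLES = {
    "curl": [[0.9, 0.8, 0.9], [1.0, 0.7, 0.8]],
    "wiggle": [[0.1, 0.9, 0.1], [0.2, 1.0, 0.0]],
}


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(imagined, examples=EXAMPLES):
    """Return the label whose stored example lies closest to `imagined`."""
    best_label, _ = min(
        ((label, distance(imagined, ex))
         for label, exs in examples.items() for ex in exs),
        key=lambda pair: pair[1],
    )
    return best_label


print(recognize([0.95, 0.75, 0.85]))  # prints "curl"
```

<p>
  Nearest-example lookup is only one possible matching rule; it is used
  here because it needs no training step beyond storing the examples.
</p>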

   267 <p>

   268   In order to serve as a framework for experiments in imagination,

   269   <code>Cortex</code> requires simulated bodies, worlds, and senses like vision,

   270   hearing, touch, proprioception, etc.

   271 </p>

   272 

   273 </div>

   274 

   275 <div id="outline-container-2-1" class="outline-3">

   276 <h3 id="sec-2-1">A Video Game Engine takes care of some of the groundwork</h3>

   277 <div class="outline-text-3" id="text-2-1">

   278 

   279 

   280 <p>

   281    When it comes to simulation environments, the engines used to

   282    create the worlds in video games offer top-notch physics and

   283    graphics support. These engines also have limited support for

   284    creating cameras and rendering 3D sound, which can be repurposed

   285    for vision and hearing respectively. Physics collision detection

   286    can be expanded to create a sense of touch.

   287 </p>
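<p>
   To make the idea of building touch on top of collision detection
   concrete, here is a small Python sketch. It assumes the physics
   engine reports contact points as 3D coordinates and that touch
   sensors sit at known positions on the body surface; the function
   name <code>touch_activations</code> and the <code>radius</code> parameter are
   hypothetical and are not part of any engine's API.
</p>

```python
import math


def touch_activations(sensors, contacts, radius=0.1):
    """Map collision contact points to per-sensor activations.

    sensors  -- list of (x, y, z) sensor positions on the body surface
    contacts -- list of (x, y, z) contact points from collision detection
    Returns one 0/1 activation per sensor: 1 if any contact point lies
    within `radius` of that sensor.
    """
    return [
        1 if any(radius >= math.dist(s, c) for c in contacts) else 0
        for s in sensors
    ]


sensors = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
contacts = [(1.05, 0.0, 0.0)]
print(touch_activations(sensors, contacts))  # prints [0, 1, 0]
```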

   288 <p>   

   289    jMonkeyEngine3 is one such engine for creating video games in

   290    Java. It uses OpenGL to render to the screen and uses scene graphs

   291    to avoid drawing things that do not appear on the screen. It has an

   292    active community and several games in the pipeline. The engine was

   293    not built to serve any particular game but is instead meant to be

   294    used for any 3D game. I chose jMonkeyEngine3 because it had the

   295    most features out of all the open projects I looked at, and because

   296    I could then write my code in Clojure, a dialect of Lisp

   297    that runs on the JVM.

   298 </p>

   299 </div>

   300 

   301 </div>

   302 

   303 <div id="outline-container-2-2" class="outline-3">

   304 <h3 id="sec-2-2"><code>CORTEX</code> extends jMonkeyEngine3 to implement rich senses</h3>

   305 <div class="outline-text-3" id="text-2-2">

   306 

   307 

   308 <p>

   309    Using the game-making primitives provided by jMonkeyEngine3, I have

   310    constructed every major human sense except for smell and

   311    taste. <code>Cortex</code> also provides an interface for creating creatures

   312    in Blender, a 3D modeling environment, and then "rigging" the

   313    creatures with senses using 3D annotations in Blender. A creature

   314    can have any number of senses, and there can be any number of

   315    creatures in a simulation.

   316 </p>

   317 <p>   

   318    The senses available in <code>Cortex</code> are:

   319 </p>

   320 <ul>

   321 <li><a href="../../cortex/html/vision.html">Vision</a>

   322 </li>

   323 <li><a href="../../cortex/html/hearing.html">Hearing</a>

   324 </li>

   325 <li><a href="../../cortex/html/touch.html">Touch</a>

   326 </li>

   327 <li><a href="../../cortex/html/proprioception.html">Proprioception</a>

   328 </li>

   329 <li><a href="../../cortex/html/movement.html">Muscle Tension</a>

   330 </li>

   331 </ul>

   332 

   333 

   334 </div>

   335 </div>

   336 

   337 </div>

   338 

   339 <div id="outline-container-3" class="outline-2">

   340 <h2 id="sec-3">A roadmap for <code>Cortex</code> experiments</h2>

   341 <div class="outline-text-2" id="text-3">

   342 

   343 

   344 

   345 </div>

   346 

   347 <div id="outline-container-3-1" class="outline-3">

   348 <h3 id="sec-3-1">Worm World</h3>

   349 <div class="outline-text-3" id="text-3-1">

   350 

   351 

   352 <p>

   353    Worms in <code>Cortex</code> are segmented creatures which vary in length and

   354    number of segments, and have the senses of vision, proprioception,

   355    touch, and muscle tension.

   356 </p>

   357 

   358 <div class="figure">

   359 <p><img src="../images/finger-UV.png" width="755" alt="../images/finger-UV.png" /></p>

   360 <p>This is the tactile-sensor-profile for the upper segment of a worm. It defines regions of high touch sensitivity (where there are many white pixels) and regions of low sensitivity (where white pixels are sparse).</p>

   361 </div>
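<p>
   One way to read such a profile is to treat every white pixel as the
   location of one touch sensor. The Python sketch below assumes the
   image has already been loaded as a grid of grayscale values from 0
   to 255; the tiny <code>profile</code> grid and the function
   <code>sensor_coordinates</code> are made up for illustration and do not
   reproduce the real image.
</p>

```python
def sensor_coordinates(image, threshold=255):
    """Return (row, col) of every pixel at least as bright as `threshold`."""
    return [
        (r, c)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
        if value >= threshold
    ]


# Dense white pixels on the left (high sensitivity), sparse on the right.
profile = [
    [255, 255, 0, 0],
    [255, 255, 0, 0],
    [0, 0, 0, 255],
]
print(sensor_coordinates(profile))
# prints [(0, 0), (0, 1), (1, 0), (1, 1), (2, 3)]
```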

   362 

   363 

   364 

   365 

   366 <div class="figure">

   367   <center>

   368     <video controls="controls" preload="none" width="550">

   369       <source src="../video/worm-touch.ogg"

   370               type="video/ogg" />

   371     </video>

   372     <br /> <a href="http://youtu.be/RHx2wqzNVcU"> YouTube </a>

   373   </center>

   374   <p>The worm responds to touch.</p>

   375 </div>

   376 

   377 <div class="figure">

   378   <center>

   379     <video controls="controls" preload="none" width="550">

   380       <source src="../video/test-proprioception.ogg"

   381               type="video/ogg" />

   382     </video>

   383     <br /> <a href="http://youtu.be/JjdDmyM8b0w"> YouTube </a>

   384   </center>

   385   <p>Proprioception in a worm. The proprioceptive readout is

   386     in the upper left corner of the screen.</p>

   387 </div>

   388 

   389 <p>

   390    A worm is trained in various actions such as sinusoidal movement,

   391    curling, flailing, and spinning by directly playing motor

   392    contractions while the worm "feels" the experience. These actions

   393    are recorded both as vectors of muscle tension, touch, and

   394    proprioceptive data, and in higher-level forms such as

   395    frequencies of the various contractions and a symbolic name for the

   396    action.

   397 </p>
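<p>
   A recorded action of this kind could be laid out roughly as in the
   Python sketch below: raw per-frame sense vectors, a symbolic name,
   and a contraction frequency derived from the raw data. The
   <code>ActionRecord</code> fields, the 60 frames-per-second rate, and the
   use of a plain discrete Fourier transform to find the frequency are
   all assumptions chosen for this illustration.
</p>

```python
import cmath
import math
from dataclasses import dataclass, field


@dataclass
class ActionRecord:
    name: str                                            # symbolic label
    muscle: list = field(default_factory=list)           # one vector per frame
    touch: list = field(default_factory=list)            # one vector per frame
    proprioception: list = field(default_factory=list)   # one vector per frame


def dominant_frequency(signal, frame_rate):
    """Frequency in Hz of the strongest non-constant DFT component."""
    n = len(signal)

    def magnitude(k):
        return abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                       for t, x in enumerate(signal)))

    best_k = max(range(1, n // 2 + 1), key=magnitude)
    return best_k * frame_rate / n


frame_rate = 60  # assumed frames per second
record = ActionRecord(name="wiggle")
for t in range(60):
    # One muscle contracting sinusoidally at 2 Hz.
    tension = math.sin(2 * math.pi * 2 * t / frame_rate)
    record.muscle.append([tension])

first_muscle = [frame[0] for frame in record.muscle]
print(record.name, dominant_frequency(first_muscle, frame_rate))
```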

   398 <p>

   399    Then, the worm watches a video of another worm performing one of

   400    the actions, and must judge which action was performed. Normally

   401    this would be an extremely difficult problem, but the worm is able

   402    to greatly diminish the search space through sympathetic

   403    imagination. First, it creates an imagined copy of its body which

   404    it observes from a third person point of view. Then for each frame

   405    of the video, it maneuvers its simulated body to be in registration

   406    with the worm depicted in the video. The physical constraints

   407    imposed by the physics simulation greatly decrease the number of

   408    poses that have to be tried, making the search feasible. As the

   409    imaginary worm moves, it generates imaginary muscle tension and

   410    proprioceptive sensations. The worm determines the action not by

   411    vision, but by matching the imagined proprioceptive data with

   412    previous examples.

   413 </p>
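<p>
   The two stages described above, registration followed by matching,
   can be sketched in Python for a toy worm with two segments and a
   single joint in 2D. Everything here is an assumption made for the
   sketch: the joint is limited to plus or minus 90 degrees, the "video"
   is reduced to the observed position of the worm's tip, and the
   example database holds made-up joint-angle sequences. A real
   registration step would have to search a far larger pose space.
</p>

```python
import math

# Candidate poses: only joint angles the (assumed) physical limit allows.
CANDIDATES = [math.radians(d) for d in range(-90, 91, 5)]


def tip_position(angle, segment_length=1.0):
    """Tip of the second segment when the first lies along +x from the origin."""
    return (segment_length + segment_length * math.cos(angle),
            segment_length * math.sin(angle))


def register(observed_tip):
    """Pick the allowed joint angle whose simulated tip best matches the video."""
    return min(CANDIDATES,
               key=lambda a: math.dist(tip_position(a), observed_tip))


def identify(observed_tips, examples):
    """Register every frame, then match the imagined angles to stored examples."""
    imagined = [register(tip) for tip in observed_tips]
    return min(examples, key=lambda name: math.dist(imagined, examples[name]))


# Made-up proprioceptive examples: joint angle per frame.
examples = {
    "curl": [math.radians(d) for d in (0, 45, 90)],
    "stay": [0.0, 0.0, 0.0],
}
# A "video" of another worm curling, seen only as tip positions.
video = [tip_position(math.radians(d)) for d in (0, 40, 85)]
print(identify(video, examples))  # prints "curl"
```

<p>
   Note that vision is used only inside <code>register</code>; the final
   decision in <code>identify</code> compares joint angles, not pixels.
</p>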

   414 <p>

   415    By using non-visual sensory data such as touch, the worms can also

   416    answer body-related questions such as "did your head touch your

   417    tail?" and "did worm A touch worm B?"

   418 </p>

   419 <p>

   420    The proprioceptive information used for action identification is

   421    body-centric, so only the registration step is dependent on point

   422    of view, not the identification step. Registration is not specific

   423    to any particular action. Thus, action identification can be

   424    divided into a point-of-view dependent generic registration step,

   425    and an action-specific step that is body-centered and invariant to

   426    point of view.

   427 </p>
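<p>
   The invariance claim can be checked on a toy case. The Python sketch
   below represents a worm as a chain of 2D points, computes the bend
   angle at each interior joint, and then views the same worm from a
   rotated and shifted viewpoint. The 2D simplification and the helper
   names <code>joint_angles</code> and <code>change_viewpoint</code> are
   assumptions for this illustration; the bend angles come out the same
   from both viewpoints, up to floating-point rounding.
</p>

```python
import math


def joint_angles(points):
    """Signed bend angle at each interior joint of a 2D chain of points."""
    angles = []
    for p, q, r in zip(points, points[1:], points[2:]):
        incoming = math.atan2(q[1] - p[1], q[0] - p[0])
        outgoing = math.atan2(r[1] - q[1], r[0] - q[0])
        # Wrap the difference into the range -pi to pi.
        bend = math.atan2(math.sin(outgoing - incoming),
                          math.cos(outgoing - incoming))
        angles.append(bend)
    return angles


def change_viewpoint(points, rotation, shift):
    """Rotate by `rotation` radians and translate by `shift`."""
    c, s = math.cos(rotation), math.sin(rotation)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1])
            for x, y in points]


worm = [(0, 0), (1, 0), (2, 1), (2, 2)]
seen_elsewhere = change_viewpoint(worm, math.radians(70), (5, -3))

print(joint_angles(worm))
print(joint_angles(seen_elsewhere))
```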

   428 </div>

   429 

   430 </div>

   431 

   432 <div id="outline-container-3-2" class="outline-3">

   433 <h3 id="sec-3-2">Stick Figure World</h3>

   434 <div class="outline-text-3" id="text-3-2">

   435 

   436 

   437 <p>

   438    This environment is similar to Worm World, except the creatures are

   439    more complicated and the actions and questions more varied. It is

   440    an experiment to see how far imagination can go in interpreting

   441    actions.  

   442 </p></div>

   443 </div>

   444 </div>

   445 </div>

   446 

   447 <div id="postamble">

   448 <p class="date">Date: 2013-11-07 04:21:29 EST</p>

   449 <p class="author">Author: Robert McIntyre</p>

   450 <p class="creator">Org version 7.7 with Emacs version 24</p>

   451 <a href="http://validator.w3.org/check?uri=referer">Validate XHTML 1.0</a>

   452 

   453 </div>

   454 </body>

   455 </html>
author	Robert McIntyre <rlm@mit.edu>
date	Fri, 25 Apr 2014 00:05:46 -0400