<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
               "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>CORTEX</title>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<meta name="title" content="CORTEX"/>
<meta name="generator" content="Org-mode"/>
<meta name="generated" content="2013-11-07 04:21:29 EST"/>
<meta name="author" content="Robert McIntyre"/>
<meta name="description" content="Using embodied AI to facilitate Artificial Imagination."/>
<meta name="keywords" content="AI, clojure, embodiment"/>
<style type="text/css">
 <!--/*--><![CDATA[/*><!--*/
  html { font-family: Times, serif; font-size: 12pt; }
  .title  { text-align: center; }
  .todo   { color: red; }
  .done   { color: green; }
  .tag    { background-color: #add8e6; font-weight:normal }
  .target { }
  .timestamp { color: #bebebe; }
  .timestamp-kwd { color: #5f9ea0; }
  .right  {margin-left:auto; margin-right:0px;  text-align:right;}
  .left   {margin-left:0px;  margin-right:auto; text-align:left;}
  .center {margin-left:auto; margin-right:auto; text-align:center;}
  p.verse { margin-left: 3% }
  pre {
	border: 1pt solid #AEBDCC;
	background-color: #F3F5F7;
	padding: 5pt;
	font-family: courier, monospace;
        font-size: 90%;
        overflow:auto;
  }
  table { border-collapse: collapse; }
  td, th { vertical-align: top;  }
  th.right  { text-align:center;  }
  th.left   { text-align:center;   }
  th.center { text-align:center; }
  td.right  { text-align:right;  }
  td.left   { text-align:left;   }
  td.center { text-align:center; }
  dt { font-weight: bold; }
  div.figure { padding: 0.5em; }
  div.figure p { text-align: center; }
  div.inlinetask {
    padding:10px;
    border:2px solid gray;
    margin:10px;
    background: #ffffcc;
  }
  textarea { overflow-x: auto; }
  .linenr { font-size:smaller }
  .code-highlighted {background-color:#ffff00;}
  .org-info-js_info-navigation { border-style:none; }
  #org-info-js_console-label { font-size:10px; font-weight:bold;
                               white-space:nowrap; }
  .org-info-js_search-highlight {background-color:#ffff00; color:#000000;
                                 font-weight:bold; }
  /*]]>*/-->
</style>
<script type="text/javascript">var _gaq = _gaq || [];_gaq.push(['_setAccount', 'UA-31261312-1']);_gaq.push(['_trackPageview']);(function() {var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);})();</script><link rel="stylesheet" type="text/css" href="../../aurellem/css/argentum.css" />
<script type="text/javascript">
<!--/*--><![CDATA[/*><!--*/
 function CodeHighlightOn(elem, id)
 {
   var target = document.getElementById(id);
   if(null != target) {
     elem.cacheClassElem = elem.className;
     elem.cacheClassTarget = target.className;
     target.className = "code-highlighted";
     elem.className   = "code-highlighted";
   }
 }
 function CodeHighlightOff(elem, id)
 {
   var target = document.getElementById(id);
   if(elem.cacheClassElem)
     elem.className = elem.cacheClassElem;
   if(elem.cacheClassTarget)
     target.className = elem.cacheClassTarget;
 }
/*]]>*///-->
</script>

</head>
<body>


<div id="content">
<h1 class="title"><code>CORTEX</code></h1>


<div class="header">
  <div class="float-right">
    <!-- 
    <form>
      <input type="text"/><input type="submit" value="search the blog &raquo;"/> 
    </form>
    -->
  </div>

  <h1>aurellem <em>&#x2609;</em></h1>
  <ul class="nav">
    <li><a href="/">read the blog &raquo;</a></li>
    <!-- li><a href="#">learn about us &raquo;</a></li-->
  </ul>
</div>

<div class="author">Written by <author>Robert McIntyre</author></div>



<div id="outline-container-1" class="outline-2">
<h2 id="sec-1">Artificial Imagination</h2>
<div class="outline-text-2" id="text-1">


<p>
  Imagine watching a video of someone skateboarding. When you watch
  the video, you can imagine yourself skateboarding, and your
  knowledge of the human body and its dynamics guides your
  interpretation of the scene. For example, even if the skateboarder
  is partially occluded, you can infer the positions of his arms and
  body from your own knowledge of how your body would be positioned if
  you were skateboarding. If the skateboarder suffers an accident, you
  wince in sympathy, imagining the pain your own body would experience
  if it were in the same situation. This empathy with other people
  guides our understanding of whatever they are doing because it is a
  powerful constraint on what is probable and possible. In order to
  make use of this powerful empathy constraint, I need a system that
  can generate and make sense of sensory data from the many different
  senses that humans possess. The two key properties of such a system
  are <i>embodiment</i> and <i>imagination</i>.
</p>

</div>

<div id="outline-container-1-1" class="outline-3">
<h3 id="sec-1-1">What is imagination?</h3>
<div class="outline-text-3" id="text-1-1">


<p>
   One kind of imagination is <i>sympathetic</i> imagination: you imagine
   yourself in the position of something/someone you are
   observing. This type of imagination comes into play when you follow
   along visually while watching someone perform actions, or when you
   sympathetically grimace when someone hurts themselves. This type of
   imagination uses the constraints you have learned about your own
   body to highly constrain the possibilities in whatever you are
   seeing. It uses all of your senses, including your senses of touch,
   proprioception, etc. Humans are flexible when it comes to "putting
   themselves in another's shoes," and can sympathetically understand
   not only other humans, but entities ranging from animals to cartoon
   characters to <a href="http://www.youtube.com/watch?v=0jz4HcwTQmU">single dots</a> on a screen!
</p>
<p>
   Another kind of imagination is <i>predictive</i> imagination: you
   construct scenes in your mind that are not entirely related to
   whatever you are observing, but instead are predictions of the
   future or simply flights of fancy. You use this type of imagination
   to plan out multi-step actions, or play out dangerous situations in
   your mind so as to avoid messing them up in reality.
</p>
<p>
   Of course, sympathetic and predictive imagination blend into each
   other and are not completely separate concepts. One dimension along
   which you can distinguish types of imagination is dependence on raw
   sense data. Sympathetic imagination is highly constrained by your
   senses, while predictive imagination can be more or less dependent
   on your senses depending on how far ahead you imagine. Daydreaming
   is an extreme form of predictive imagination that wanders through
   different possibilities without concern for whether they are
   related to whatever is happening in reality.
</p>
<p>
   For this thesis, I will mostly focus on sympathetic imagination and
   the constraint it provides for understanding sensory data.
</p>
</div>

</div>

<div id="outline-container-1-2" class="outline-3">
<h3 id="sec-1-2">What problems can imagination solve?</h3>
<div class="outline-text-3" id="text-1-2">


<p>
   Consider a video of a cat drinking some water.
</p>

<div class="figure">
<p><img src="../images/cat-drinking.jpg"  alt="../images/cat-drinking.jpg" /></p>
<p>A cat drinking some water. Identifying this action is beyond the state of the art for computers.</p>
</div>

<p>
   It is currently impossible for any computer program to reliably
   label such a video as "drinking". I think humans are able to label
   such a video as "drinking" because they imagine <i>themselves</i> as the
   cat, and imagine putting their face up against a stream of water
   and sticking out their tongue. In that imagined world, they can
   feel the cool water hitting their tongue, and feel the water
   entering their body, and are able to recognize that <i>feeling</i> as
   drinking. So, the label of the action is not really in the pixels
   of the image, but is found clearly in a simulation inspired by
   those pixels. An imaginative system, having been trained on
   drinking and non-drinking examples and learning that the most
   important component of drinking is the feeling of water sliding
   down one's throat, would analyze a video of a cat drinking in the
   following manner (a short code sketch of the final step appears
   after this list):
</p>
<ul>
<li>Create a physical model of the video by putting a "fuzzy" model
     of its own body in place of the cat. Also, create a simulation of
     the stream of water.

</li>
<li>Play out this simulated scene and generate imagined sensory
     experience. This will include relevant muscle contractions, a
     close up view of the stream from the cat's perspective, and most
     importantly, the imagined feeling of water entering the mouth.

</li>
<li>The action is now easily identified as drinking by the sense of
     taste alone. The other senses (such as the tongue moving in and
     out) help to give plausibility to the simulated action. Note that
     the sense of vision, while critical in creating the simulation,
     is not critical for identifying the action from the simulation.
</li>
</ul>
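
<p>
   To make the third step concrete, here is a minimal sketch in
   Clojure. It assumes the imagined sensory experience from the second
   step has already been summarized as a plain map of sense readings;
   the field names and the <code>drinking?</code> classifier are
   invented for illustration and are not part of <code>Cortex</code>.
</p>

<pre class="src src-clojure">
;; Hypothetical summary of one imagined experience (all values in [0, 1]).
(def imagined-experience
  {:taste-of-water    0.9   ; water on the imagined tongue
   :touch-on-tongue   0.8   ; the stream hitting the tongue
   :jaw-muscle-rhythm 0.7   ; periodic lapping contractions
   :visual-match      0.4}) ; vision builds the scene, but is not
                            ; needed to recognize the feeling

(defn drinking?
  "Classify an imagined experience as drinking using only the
   non-visual senses: strong imagined taste plus tongue contact."
  [{:keys [taste-of-water touch-on-tongue]}]
  (and (pos? (- taste-of-water 0.5))
       (pos? (- touch-on-tongue 0.5))))

(drinking? imagined-experience) ;; evaluates to true
</pre>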


<p>
   More generally, I expect imaginative systems to be particularly
   good at identifying embodied actions in videos.
</p>
</div>
</div>

</div>

<div id="outline-container-2" class="outline-2">
<h2 id="sec-2">Cortex</h2>
<div class="outline-text-2" id="text-2">


<p>
  The previous example involves liquids, the sense of taste, and
  imagining oneself as a cat. For this thesis I constrain myself to
  simpler, more easily digitizable senses and situations.
</p>
<p>
  My system, <code>Cortex</code>, performs imagination in two different simplified
  worlds: <i>worm world</i> and <i>stick figure world</i>. In each of these
  worlds, entities capable of imagination recognize actions by
  simulating the experience from their own perspective, and then
  recognizing the action from a database of examples.
</p>
<p>
  In order to serve as a framework for experiments in imagination,
  <code>Cortex</code> requires simulated bodies, worlds, and senses like vision,
  hearing, touch, proprioception, etc.
</p>

</div>

<div id="outline-container-2-1" class="outline-3">
<h3 id="sec-2-1">A Video Game Engine takes care of some of the groundwork</h3>
<div class="outline-text-3" id="text-2-1">


<p>
   When it comes to simulation environments, the engines used to
   create the worlds in video games offer top-notch physics and
   graphics support. These engines also have limited support for
   creating cameras and rendering 3D sound, which can be repurposed
   for vision and hearing respectively. Physics collision detection
   can be expanded to create a sense of touch.
</p>
<p>   
   jMonkeyEngine3 is one such engine for creating video games in
   Java. It uses OpenGL to render to the screen and uses scene graphs
   to avoid drawing things that do not appear on the screen. It has an
   active community and several games in the pipeline. The engine was
   not built to serve any particular game but is instead meant to be
   used for any 3D game. I chose jMonkeyEngine3 because it had the
   most features out of all the open projects I looked at, and because
   I could then write my code in Clojure, a dialect of Lisp that runs
   on the JVM.
</p>
</div>

</div>

<div id="outline-container-2-2" class="outline-3">
<h3 id="sec-2-2"><code>CORTEX</code> Extends jMonkeyEngine3 to implement rich senses</h3>
<div class="outline-text-3" id="text-2-2">


<p>
   Using the game-making primitives provided by jMonkeyEngine3, I have
   constructed every major human sense except for smell and
   taste. <code>Cortex</code> also provides an interface for creating creatures
   in Blender, a 3D modeling environment, and then "rigging" the
   creatures with senses using 3D annotations in Blender. A creature
   can have any number of senses, and there can be any number of
   creatures in a simulation. A sketch of this rigging workflow
   appears after the list of senses below.
</p>
<p>   
   The senses available in <code>Cortex</code> are:
</p>
<ul>
<li><a href="../../cortex/html/vision.html">Vision</a>
</li>
<li><a href="../../cortex/html/hearing.html">Hearing</a>
</li>
<li><a href="../../cortex/html/touch.html">Touch</a>
</li>
<li><a href="../../cortex/html/proprioception.html">Proprioception</a>
</li>
<li><a href="../../cortex/html/movement.html">Muscle Tension</a>
</li>
</ul>
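
<p>
   The sketch below illustrates the rigging idea in plain Clojure: a
   creature description carries a set of annotated senses, and each
   sense becomes a channel of sensor functions that can be polled every
   simulation step. All of the names here (<code>rig-senses</code>, the
   keywords, the canned data) are hypothetical stand-ins for
   illustration, not the actual <code>Cortex</code> API.
</p>

<pre class="src src-clojure">
;; Stand-in for a creature exported from Blender with sense annotations.
(def worm
  {:name     "worm"
   :segments 5
   :senses   #{:vision :touch :proprioception :muscle-tension}})

(defn rig-senses
  "Return a map from each annotated sense to a sensor function.
   Here the sensors just return canned data; in a real simulation
   they would read from the physics and rendering engine each frame."
  [creature]
  (into {}
        (for [sense (:senses creature)]
          [sense (fn [] {:sense sense :creature (:name creature) :data []})])))

(def worm-sensors (rig-senses worm))

;; Poll the touch channel for one simulation step.
((get worm-sensors :touch))
;; evaluates to {:sense :touch, :creature "worm", :data []}
</pre>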


</div>
</div>

</div>

<div id="outline-container-3" class="outline-2">
<h2 id="sec-3">A roadmap for <code>Cortex</code> experiments</h2>
<div class="outline-text-2" id="text-3">



</div>

<div id="outline-container-3-1" class="outline-3">
<h3 id="sec-3-1">Worm World</h3>
<div class="outline-text-3" id="text-3-1">


<p>
   Worms in <code>Cortex</code> are segmented creatures which vary in length and
   number of segments, and have the senses of vision, proprioception,
   touch, and muscle tension.
</p>

<div class="figure">
<p><img src="../images/finger-UV.png" width="755" alt="../images/finger-UV.png" /></p>
<p>This is the tactile-sensor-profile for the upper segment of a worm. It defines regions of high touch sensitivity (where there are many white pixels) and regions of low sensitivity (where white pixels are sparse).</p>
</div>
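
<p>
   As an illustration of how such a profile could drive touch
   sensitivity, the sketch below reads an image and collects the
   coordinates of bright pixels, each of which would receive a touch
   sensor. The brightness threshold and the one-sensor-per-pixel
   scheme are assumptions made for the sketch, not necessarily how
   <code>Cortex</code> itself samples the profile.
</p>

<pre class="src src-clojure">
(ns cortex.sketch.touch-profile
  (:import (javax.imageio ImageIO)
           (java.io File)))

(defn white-pixel-coords
  "Return the [x y] coordinates of sufficiently bright pixels in an
   image. Dense white regions yield many touch sensors; sparse
   regions yield few."
  [image-file brightness-threshold]
  (let [img (ImageIO/read (File. image-file))]
    (for [x (range (.getWidth img))
          y (range (.getHeight img))
          :let [rgb        (.getRGB img x y)
                red        (bit-and (bit-shift-right rgb 16) 0xFF)
                green      (bit-and (bit-shift-right rgb 8) 0xFF)
                blue       (bit-and rgb 0xFF)
                brightness (quot (+ red green blue) 3)]
          :when (pos? (- brightness brightness-threshold))]
      [x y])))

;; Example: count the candidate sensor locations in the profile image.
;; (count (white-pixel-coords "finger-UV.png" 200))
</pre>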




<div class="figure">
  <center>
    <video controls="controls" width="550">
      <source src="../video/worm-touch.ogg" type="video/ogg"
              preload="none" />
    </video>
    <br/> <a href="http://youtu.be/RHx2wqzNVcU"> YouTube </a>
  </center>
  <p>The worm responds to touch.</p>
</div>

<div class="figure">
  <center>
    <video controls="controls" width="550">
      <source src="../video/test-proprioception.ogg" type="video/ogg"
              preload="none" />
    </video>
    <br/> <a href="http://youtu.be/JjdDmyM8b0w"> YouTube </a>
  </center>
  <p>Proprioception in a worm. The proprioceptive readout is
    in the upper left corner of the screen.</p>
</div>

<p>
   A worm is trained in various actions such as sinusoidal movement,
   curling, flailing, and spinning by directly playing motor
   contractions while the worm "feels" the experience. These actions
   are recorded both as vectors of muscle tension, touch, and
   proprioceptive data, and in higher-level forms such as
   frequencies of the various contractions and a symbolic name for the
   action.
</p>
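<p>
   One training example might therefore be stored along the lines of
   the sketch below. The field names and numbers are invented for
   illustration; the point is only that the raw per-frame sense
   vectors and the higher-level summary live side by side.
</p>

<pre class="src src-clojure">
;; Hypothetical stored example of the "curl" action.
(def curl-example
  {:action  :curl                           ; symbolic name
   :summary {:contraction-frequency 2.5     ; contractions per second
             :segments-involved     [0 1 2]}
   :raw-frames
   [{:muscle         [0.0 0.3 0.9]          ; per-segment muscle tension
     :touch          [0 0 1]                ; per-segment contact flags
     :proprioception [0.1 0.4 1.2]}         ; joint angles, in radians
    ;; ...one such map per simulation frame...
    ]})
</pre>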
<p>
   Then, the worm watches a video of another worm performing one of
   the actions, and must judge which action was performed. Normally
   this would be an extremely difficult problem, but the worm is able
   to greatly diminish the search space through sympathetic
   imagination. First, it creates an imagined copy of its body which
   it observes from a third person point of view. Then for each frame
   of the video, it maneuvers its simulated body to be in registration
   with the worm depicted in the video. The physical constraints
   imposed by the physics simulation greatly decrease the number of
   poses that have to be tried, making the search feasible. As the
   imaginary worm moves, it generates imaginary muscle tension and
   proprioceptive sensations. The worm determines the action not by
   vision, but by matching the imagined proprioceptive data with
   previous examples.
</p>
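<p>
   The matching step could look something like the sketch below, which
   compares the imagined proprioceptive stream against each stored
   example and returns the label of the closest one. Plain
   nearest-neighbour matching over joint-angle vectors is used here
   purely as an illustration; it is not necessarily the matching
   method <code>Cortex</code> uses.
</p>

<pre class="src src-clojure">
(defn frame-distance
  "Euclidean distance between two proprioceptive frames
   (equal-length vectors of joint angles)."
  [frame-a frame-b]
  (Math/sqrt (reduce + (map (fn [a b] (let [d (- a b)] (* d d)))
                            frame-a frame-b))))

(defn stream-distance
  "Average frame distance between two equally long streams."
  [stream-a stream-b]
  (/ (reduce + (map frame-distance stream-a stream-b))
     (max 1 (count stream-a))))

(defn recognize-action
  "Return the label of the stored example whose proprioceptive
   stream best matches the imagined one."
  [imagined-stream examples]
  (:action (apply min-key
                  (fn [ex] (stream-distance imagined-stream (:proprioception ex)))
                  examples)))

;; Toy usage: the imagined stream matches the stored "curl" example.
(recognize-action [[0.1 0.2] [0.2 0.3]]
                  [{:action :curl   :proprioception [[0.1 0.2] [0.2 0.3]]}
                   {:action :wiggle :proprioception [[0.9 0.1] [0.1 0.9]]}])
;; evaluates to :curl
</pre>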
<p>
   By using non-visual sensory data such as touch, the worms can also
   answer body-related questions such as "did your head touch your
   tail?" and "did worm A touch worm B?"
</p>
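<p>
   Such questions reduce to simple predicates over the recorded touch
   data. The frame layout below (a map from segment id to the set of
   things it contacted during that frame) is an assumption made for
   this sketch, not the actual <code>Cortex</code> touch format.
</p>

<pre class="src src-clojure">
(defn head-touched-tail?
  "Given per-frame touch maps, report whether the head segment ever
   registered contact with the tail segment."
  [touch-frames head-id tail-id]
  (boolean (some (fn [frame] (contains? (get frame head-id #{}) tail-id))
                 touch-frames)))

;; Example: contact occurs in the second frame.
(head-touched-tail? [{0 #{} 4 #{}} {0 #{4} 4 #{0}}] 0 4)
;; evaluates to true
</pre>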
<p>
   The proprioceptive information used for action identification is
   body-centric, so only the registration step is dependent on point
   of view, not the identification step. Registration is not specific
   to any particular action. Thus, action identification can be
   divided into a point-of-view dependent generic registration step,
   and an action-specific step that is body-centered and invariant to
   point of view.
</p>
</div>

</div>

<div id="outline-container-3-2" class="outline-3">
<h3 id="sec-3-2">Stick Figure World</h3>
<div class="outline-text-3" id="text-3-2">


<p>
   This environment is similar to Worm World, except the creatures are
   more complicated and the actions and questions more varied. It is
   an experiment to see how far imagination can go in interpreting
   actions.
</p></div>
</div>
</div>
</div>

<div id="postamble">
<p class="date">Date: 2013-11-07 04:21:29 EST</p>
<p class="author">Author: Robert McIntyre</p>
<p class="creator">Org version 7.7 with Emacs version 24</p>
<a href="http://validator.w3.org/check?uri=referer">Validate XHTML 1.0</a>

</div>
</body>
</html>