diff thesis/cortex.org @ 448:af13fc73e851

completing second part of first chapter.
author Robert McIntyre <rlm@mit.edu>
date Tue, 25 Mar 2014 22:54:41 -0400
parents 284316604be0
children 09b7c8dd4365
line diff
     1.1 --- a/thesis/cortex.org	Tue Mar 25 11:30:15 2014 -0400
     1.2 +++ b/thesis/cortex.org	Tue Mar 25 22:54:41 2014 -0400
     1.3 @@ -41,16 +41,10 @@
     1.4     what is happening here?
     1.5    
     1.6     Or suppose that you are building a program that recognizes chairs.
     1.7 -   How could you ``see'' the chair in figure \ref{invisible-chair} and
     1.8 -   figure \ref{hidden-chair}?
     1.9 -   
    1.10 -   #+caption: When you look at this, do you think ``chair''? I certainly do.
    1.11 -   #+name: invisible-chair
    1.12 -   #+ATTR_LaTeX: :width 10cm
    1.13 -   [[./images/invisible-chair.png]]
    1.14 +   How could you ``see'' the chair in figure \ref{hidden-chair}?
    1.15     
    1.16     #+caption: The chair in this image is quite obvious to humans, but I 
    1.17 -   #+caption: doubt that any computer program can find it.
    1.18 +   #+caption: doubt that any modern computer vision program can find it.
    1.19     #+name: hidden-chair
    1.20     #+ATTR_LaTeX: :width 10cm
    1.21     [[./images/fat-person-sitting-at-desk.jpg]]
    1.22 @@ -62,7 +56,7 @@
    1.23     #+caption: to discern the difference in how the girl's arm muscles
    1.24     #+caption: are activated between the two images.
    1.25     #+name: girl
    1.26 -   #+ATTR_LaTeX: :width 10cm
    1.27 +   #+ATTR_LaTeX: :width 7cm
    1.28     [[./images/wall-push.png]]
    1.29    
    1.30     Each of these examples tells us something about what might be going
    1.31 @@ -85,31 +79,31 @@
    1.32     problems above in a form amenable to computation. It is split into
    1.33     four parts:
    1.34  
    1.35 -   - Free/Guided Play (Training) :: The creature moves around and
    1.36 -        experiences the world through its unique perspective. Many
    1.37 -        otherwise complicated actions are easily described in the
    1.38 -        language of a full suite of body-centered, rich senses. For
    1.39 -        example, drinking is the feeling of water sliding down your
    1.40 -        throat, and cooling your insides. It's often accompanied by
    1.41 -        bringing your hand close to your face, or bringing your face
    1.42 -        close to water. Sitting down is the feeling of bending your
    1.43 -        knees, activating your quadriceps, then feeling a surface with
    1.44 -        your bottom and relaxing your legs. These body-centered action
    1.45 -        descriptions can be either learned or hard coded.
    1.46 -   - Alignment (Posture imitation) :: When trying to interpret a video
    1.47 -        or image, the creature takes a model of itself and aligns it
    1.48 -        with whatever it sees. This alignment can even cross species,
    1.49 -        as when humans try to align themselves with things like
    1.50 -        ponies, dogs, or other humans with a different body type.
    1.51 -   - Empathy (Sensory extrapolation) :: The alignment triggers
    1.52 -        associations with sensory data from prior experiences. For
    1.53 -        example, the alignment itself easily maps to proprioceptive
    1.54 -        data. Any sounds or obvious skin contact in the video can to a
    1.55 -        lesser extent trigger previous experience. Segments of
    1.56 -        previous experiences are stitched together to form a coherent
    1.57 -        and complete sensory portrait of the scene.
    1.58 -   - Recognition (Classification) :: With the scene described in terms
    1.59 -        of first person sensory events, the creature can now run its
    1.60 +   - Free/Guided Play :: The creature moves around and experiences the
    1.61 +        world through its unique perspective. Many otherwise
    1.62 +        complicated actions are easily described in the language of a
    1.63 +        full suite of body-centered, rich senses. For example,
    1.64 +        drinking is the feeling of water sliding down your throat, and
    1.65 +        cooling your insides. It's often accompanied by bringing your
    1.66 +        hand close to your face, or bringing your face close to water.
    1.67 +        Sitting down is the feeling of bending your knees, activating
    1.68 +        your quadriceps, then feeling a surface with your bottom and
    1.69 +        relaxing your legs. These body-centered action descriptions
    1.70 +        can be either learned or hard coded.
    1.71 +   - Posture Imitation :: When trying to interpret a video or image,
    1.72 +        the creature takes a model of itself and aligns it with
    1.73 +        whatever it sees. This alignment can even cross species, as
    1.74 +        when humans try to align themselves with things like ponies,
    1.75 +        dogs, or other humans with a different body type.
    1.76 +   - Empathy         :: The alignment triggers associations with
    1.77 +        sensory data from prior experiences. For example, the
    1.78 +        alignment itself easily maps to proprioceptive data. Any
    1.79 +        sounds or obvious skin contact in the video can to a lesser
    1.80 +        extent trigger previous experience. Segments of previous
    1.81 +        experiences are stitched together to form a coherent and
    1.82 +        complete sensory portrait of the scene.
    1.83 +   - Recognition      :: With the scene described in terms of first
    1.84 +        person sensory events, the creature can now run its
    1.85          action-identification programs on this synthesized sensory
    1.86          data, just as it would if it were actually experiencing the
    1.87          scene first-hand. If previous experience has been accurately
    1.88 @@ -193,16 +187,16 @@
    1.89     model of your body, and aligns the model with the video. Then, you
    1.90     need a /recognizer/, which uses the aligned model to interpret the
    1.91     action. The power in this method lies in the fact that you describe
    1.92 -   all actions form a body-centered, viewpoint You are less tied to
     1.93 +   all actions from a body-centered viewpoint. You are less tied to
    1.94     the particulars of any visual representation of the actions. If you
    1.95     teach the system what ``running'' is, and you have a good enough
    1.96     aligner, the system will from then on be able to recognize running
    1.97     from any point of view, even strange points of view like above or
    1.98     underneath the runner. This is in contrast to action recognition
    1.99 -   schemes that try to identify actions using a non-embodied approach
   1.100 -   such as TODO:REFERENCE. If these systems learn about running as
   1.101 -   viewed from the side, they will not automatically be able to
   1.102 -   recognize running from any other viewpoint.
   1.103 +   schemes that try to identify actions using a non-embodied approach.
   1.104 +   If these systems learn about running as viewed from the side, they
   1.105 +   will not automatically be able to recognize running from any other
   1.106 +   viewpoint.
   1.107  
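   To make this recipe concrete, the sketch below expresses the
   pipeline as plain Clojure functions over ordinary data. The function
   names (=fit-model=, =extrapolate-senses=, =classify-action=,
   =interpret-video=) and the data layout are illustrative placeholders
   only, not the actual =EMPATH= code.

   #+begin_src clojure
     ;; Sketch of the empathy pipeline; every name here is a placeholder.

     (defn fit-model
       "ALIGNMENT: fit the creature's body model to one frame of video.
        For simplicity, assume the frame already carries joint angles."
       [body-model frame]
       (assoc body-model :joint-angles (:joint-angles frame)))

     (defn extrapolate-senses
       "EMPATHY: find the free-play experience whose proprioceptive
        signature is closest to the aligned posture, and borrow its other
        senses (touch, muscle tension, ...) as a guess about the scene."
       [experiences aligned]
       (apply min-key
              (fn [experience]
                (reduce + (map (fn [a b] (Math/abs (double (- a b))))
                               (:joint-angles experience)
                               (:joint-angles aligned))))
              experiences))

     (defn classify-action
       "RECOGNITION: run ordinary first-person action predicates on the
        synthesized sensory data."
       [action-predicates sensory-guess]
       (keep (fn [[action-name pred?]]
               (when (pred? sensory-guess) action-name))
             action-predicates))

     (defn interpret-video
       "Free play supplies experiences; compose the remaining three
        steps over each frame of the video."
       [body-model experiences action-predicates video-frames]
       (for [frame video-frames]
         (->> frame
              (fit-model body-model)
              (extrapolate-senses experiences)
              (classify-action action-predicates))))
   #+end_src
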
   1.108     Another powerful advantage is that using the language of multiple
    1.109     body-centered rich senses to describe body-centered actions offers a
   1.110 @@ -234,8 +228,81 @@
   1.111  
   1.112  ** =CORTEX= is a toolkit for building sensate creatures
   1.113  
   1.114 -   Hand integration demo
   1.115 +   I built =CORTEX= to be a general AI research platform for doing
   1.116 +   experiments involving multiple rich senses and a wide variety and
   1.117 +   number of creatures. I intend it to be useful as a library for many
    1.118 +   more projects than just this one. =CORTEX= meets a real need among
    1.119 +   AI researchers at CSAIL and beyond: people often invent neat ideas
    1.120 +   that are best expressed in the language of creatures and senses,
    1.121 +   but to explore those ideas they must first build a platform in
    1.122 +   which they can create simulated creatures with rich senses! There
    1.123 +   are many ideas that
   1.124 +   would be simple to execute (such as =EMPATH=), but attached to them
   1.125 +   is the multi-month effort to make a good creature simulator. Often,
   1.126 +   that initial investment of time proves to be too much, and the
   1.127 +   project must make do with a lesser environment.
   1.128  
   1.129 +   =CORTEX= is well suited as an environment for embodied AI research
   1.130 +   for three reasons:
   1.131 +
   1.132 +   - You can create new creatures using Blender, a popular 3D modeling
    1.133 +     program. Each sense can be specified using special Blender nodes
    1.134 +     with biologically inspired parameters. You need not write any
    1.135 +     code to create a creature, and can use a wide library of
    1.136 +     pre-existing Blender models as a base for your own creatures.
   1.137 +
   1.138 +   - =CORTEX= implements a wide variety of senses, including touch,
   1.139 +     proprioception, vision, hearing, and muscle tension. Complicated
    1.140 +     senses like touch and vision involve multiple sensory elements
    1.141 +     embedded in a 2D surface. You have complete control over the
    1.142 +     distribution of these sensor elements through simple PNG image
    1.143 +     files (see the short sketch after this list). In particular,
    1.144 +     =CORTEX= implements more comprehensive hearing than any other
    1.145 +     creature simulation system available.
   1.146 +
   1.147 +   - =CORTEX= supports any number of creatures and any number of
    1.148 +     senses. Time in =CORTEX= dilates so that the simulated creatures
    1.149 +     always perceive a perfectly smooth flow of time, regardless of
   1.150 +     the actual computational load.
   1.151 +
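   As an illustration of the PNG-based sensor layout mentioned in the
   second point above, the following sketch reads an image with
   =javax.imageio= and treats every dark pixel as the location of one
   touch element, so that more densely painted regions of the profile
   image yield denser patches of sensors. This is only a sketch of the
   idea, not the actual =CORTEX= touch implementation.

   #+begin_src clojure
     (ns sensor-layout-sketch
       (:import (javax.imageio ImageIO)
                (java.io File)))

     (defn sensor-positions
       "Return the [x y] pixel coordinates of every dark pixel in a
        sensor-profile image. Each dark pixel stands for one sensor
        element; e.g. a fingertip region painted densely black gets
        many more sensors than a sparsely stippled palm region."
       [png-path]
       (let [image (ImageIO/read (File. png-path))
             dark? (fn [x y]
                     (let [rgb (.getRGB image x y)
                           r   (bit-and (bit-shift-right rgb 16) 0xFF)
                           g   (bit-and (bit-shift-right rgb 8)  0xFF)
                           b   (bit-and rgb 0xFF)]
                       ;; "dark" = below half of full brightness
                       (< (+ r g b) 384)))]
         (for [x (range (.getWidth image))
               y (range (.getHeight image))
               :when (dark? x y)]
           [x y])))
   #+end_src
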
   1.152 +   =CORTEX= is built on top of =jMonkeyEngine3=, which is a video game
   1.153 +   engine designed to create cross-platform 3D desktop games. =CORTEX=
    1.154 +   is mainly written in Clojure, a dialect of =LISP= that runs on the
    1.155 +   Java virtual machine (JVM). The API for creating and simulating
    1.156 +   creatures is entirely expressed in Clojure. Hearing is implemented
    1.157 +   as a layer of Clojure code on top of a layer of Java code on top of
   1.158 +   a layer of =C++= code which implements a modified version of
   1.159 +   =OpenAL= to support multiple listeners. =CORTEX= is the only
   1.160 +   simulation environment that I know of that can support multiple
   1.161 +   entities that can each hear the world from their own perspective.
   1.162 +   Other senses also require a small layer of Java code. =CORTEX= also
    1.163 +   uses =bullet=, a physics simulator written in =C++=.
   1.164 +
   1.165 +   #+caption: Here is the worm from above modeled in Blender, a free 
   1.166 +   #+caption: 3D-modeling program. Senses and joints are described
   1.167 +   #+caption: using special nodes in Blender.
   1.168 +   #+name: worm-recognition-intro
   1.169 +   #+ATTR_LaTeX: :width 12cm
   1.170 +   [[./images/blender-worm.png]]
   1.171 +
   1.172 +   During one test with =CORTEX=, I created 3,000 entities each with
   1.173 +   their own independent senses and ran them all at only 1/80 real
   1.174 +   time. In another test, I created a detailed model of my own hand,
    1.175 +   equipped with a realistic distribution of touch sensors (denser at
   1.176 +   the fingertips), as well as eyes and ears, and it ran at around 1/4
   1.177 +   real time.
   1.178 +
    1.179 +   #+caption: A model of my own hand, equipped with a realistic 
    1.180 +   #+caption: distribution of touch sensors (denser at the fingertips),
    1.181 +   #+caption: as well as eyes and ears.
    1.182 +   #+name: full-hand
   1.183 +   #+ATTR_LaTeX: :width 15cm
   1.184 +   [[./images/full-hand.png]]
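
   The time dilation mentioned above amounts to decoupling simulated
   time from wall-clock time: every creature is advanced by the same
   fixed timestep on every tick, however long that tick takes to
   compute. The loop below is a minimal sketch of that idea, with
   hypothetical =step-creature!= and =render!= functions standing in
   for the real engine calls; it is not the actual =CORTEX= main loop.

   #+begin_src clojure
     (defn run-simulation
       "Advance every creature by a fixed dt of simulated time per tick.
        Even if a tick takes seconds of wall-clock time to compute, the
        creatures' senses still sample a perfectly smooth 1/60 s step."
       [creatures step-creature! render! ticks]
       (let [dt (/ 1.0 60.0)]
         (dotimes [_ ticks]
           (doseq [creature creatures]
             (step-creature! creature dt))
           (render! creatures))))
   #+end_src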
    1.185 +
   1.190  ** Contributions
   1.191  
   1.192  * Building =CORTEX=