#+title: =CORTEX=
#+author: Robert McIntyre
#+email: rlm@mit.edu
#+description: Using embodied AI to facilitate Artificial Imagination.
#+keywords: AI, clojure, embodiment
#+SETUPFILE: ../../aurellem/org/setup.org
#+INCLUDE: ../../aurellem/org/level-0.org
#+babel: :mkdirp yes :noweb yes :exports both
#+OPTIONS: toc:nil, num:nil

* Artificial Imagination

Imagine watching a video of someone skateboarding. When you watch
the video, you can imagine yourself skateboarding, and your
knowledge of the human body and its dynamics guides your
interpretation of the scene. For example, even if the skateboarder
is partially occluded, you can infer the positions of his arms and
body from your own knowledge of how your body would be positioned if
you were skateboarding. If the skateboarder suffers an accident, you
wince in sympathy, imagining the pain your own body would experience
if it were in the same situation. This empathy with other people
guides our understanding of whatever they are doing because it is a
powerful constraint on what is probable and possible. In order to
make use of this powerful empathy constraint, I need a system that
can generate and make sense of sensory data from the many different
senses that humans possess. The two key properties of such a system
are /embodiment/ and /imagination/.

** What is imagination?

One kind of imagination is /sympathetic/ imagination: you imagine
yourself in the position of something or someone you are
observing. This type of imagination comes into play when you follow
along visually when watching someone perform actions, or when you
sympathetically grimace when someone hurts themselves. This type of
imagination uses the constraints you have learned about your own
body to highly constrain the possibilities in whatever you are
seeing. It uses all your senses, including your senses of touch,
proprioception, etc. Humans are flexible when it comes to "putting
themselves in another's shoes," and can sympathetically understand
not only other humans, but entities ranging from animals to cartoon
characters to [[http://www.youtube.com/watch?v=0jz4HcwTQmU][single dots]] on a screen!

Another kind of imagination is /predictive/ imagination: you
construct scenes in your mind that are not entirely related to
whatever you are observing, but instead are predictions of the
future or simply flights of fancy. You use this type of imagination
to plan out multi-step actions, or play out dangerous situations in
your mind so as to avoid messing them up in reality.

Of course, sympathetic and predictive imagination blend into each
other and are not completely separate concepts. One dimension along
which you can distinguish types of imagination is dependence on raw
sense data. Sympathetic imagination is highly constrained by your
senses, while predictive imagination can be more or less dependent
on your senses depending on how far ahead you imagine. Daydreaming
is an extreme form of predictive imagination that wanders through
different possibilities without concern for whether they are
related to whatever is happening in reality.

For this thesis, I will mostly focus on sympathetic imagination and
the constraint it provides for understanding sensory data.

** What problems can imagination solve?

Consider a video of a cat drinking some water.

#+caption: A cat drinking some water. Identifying this action is beyond the state of the art for computers.
#+ATTR_LaTeX: width=5cm
[[../images/cat-drinking.jpg]]

It is currently impossible for any computer program to reliably
label such a video as "drinking". I think humans are able to label
such a video as "drinking" because they imagine /themselves/ as the
cat, and imagine putting their face up against a stream of water
and sticking out their tongue. In that imagined world, they can
feel the cool water hitting their tongue, and feel the water
entering their body, and are able to recognize that /feeling/ as
drinking. So, the label of the action is not really in the pixels
of the image, but is found clearly in a simulation inspired by
those pixels. An imaginative system, having been trained on
drinking and non-drinking examples and learning that the most
important component of drinking is the feeling of water sliding
down one's throat, would analyze a video of a cat drinking in the
following manner:

- Create a physical model of the video by putting a "fuzzy" model
  of its own body in place of the cat. Also, create a simulation of
  the stream of water.

- Play out this simulated scene and generate imagined sensory
  experience. This will include relevant muscle contractions, a
  close up view of the stream from the cat's perspective, and most
  importantly, the imagined feeling of water entering the mouth.

- The action is now easily identified as drinking by the sense of
  taste alone. The other senses (such as the tongue moving in and
  out) help to give plausibility to the simulated action. Note that
  the sense of vision, while critical in creating the simulation,
  is not critical for identifying the action from the simulation.

More generally, I expect imaginative systems to be particularly
good at identifying embodied actions in videos.
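
To make this procedure concrete, the sketch below gives the overall
shape it might take in Clojure. The three declared functions are
hypothetical placeholders for the machinery described in the steps
above; they are declared only so that the outline reads as loadable
code.

#+begin_src clojure
;; Hypothetical outline of the recognition procedure above. Each
;; declared function is a placeholder for a large piece of machinery.
(declare fit-body-to-scene    ; step 1: fuzzy body model + water stream
         imagine              ; step 2: simulate and collect sensations
         classify-by-feeling) ; step 3: compare feelings to known examples

(defn recognize-action
  "Label the action in `video` using imagined sensory experience."
  [video action-examples]
  (let [scene      (fit-body-to-scene video)                         ; step 1
        experience (imagine scene)                                   ; step 2
        action     (classify-by-feeling experience action-examples)] ; step 3
    action))
#+end_src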

* Cortex

The previous example involves liquids, the sense of taste, and
imagining oneself as a cat. For this thesis I constrain myself to
simpler, more easily digitizable senses and situations.

My system, =CORTEX=, performs imagination in two different simplified
worlds: /worm world/ and /stick-figure world/. In each of these
worlds, entities capable of imagination recognize actions by
simulating the experience from their own perspective, and then
recognizing the action from a database of examples.

In order to serve as a framework for experiments in imagination,
=CORTEX= requires simulated bodies, worlds, and senses like vision,
hearing, touch, proprioception, etc.

** A Video Game Engine takes care of some of the groundwork

When it comes to simulation environments, the engines used to
create the worlds in video games offer top-notch physics and
graphics support. These engines also have limited support for
creating cameras and rendering 3D sound, which can be repurposed
for vision and hearing respectively. Physics collision detection
can be expanded to create a sense of touch.

jMonkeyEngine3 is one such engine for creating video games in
Java. It uses OpenGL to render to the screen and uses scene graphs
to avoid drawing things that do not appear on the screen. It has an
active community and several games in the pipeline. The engine was
not built to serve any particular game but is instead meant to be
used for any 3D game. I chose jMonkeyEngine3 because it had the
most features out of all the open source projects I looked at, and
because I could then write my code in Clojure, a dialect of Lisp
that runs on the JVM.

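As a taste of what this looks like in practice, here is a minimal,
self-contained example of driving jMonkeyEngine3 from Clojure: it
opens a window and displays a single blue cube. It is adapted from
jMonkeyEngine3's standard "hello world" tutorial and is not itself
part of =CORTEX=.

#+begin_src clojure
(ns hello.jme3
  (:import (com.jme3.app SimpleApplication)
           (com.jme3.material Material)
           (com.jme3.math ColorRGBA)
           (com.jme3.scene Geometry)
           (com.jme3.scene.shape Box)))

(defn blue-cube-app
  "Create a jMonkeyEngine3 application that displays one blue cube."
  []
  (proxy [SimpleApplication] []
    (simpleInitApp []
      (let [box  (Box. 1 1 1)
            geom (Geometry. "blue cube" box)
            mat  (Material. (.getAssetManager this)
                            "Common/MatDefs/Misc/Unshaded.j3md")]
        (.setColor mat "Color" ColorRGBA/Blue)
        (.setMaterial geom mat)
        (.attachChild (.getRootNode this) geom)))))

;; (.start (blue-cube-app)) ; opens a display window showing the cube
#+end_src
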
** =CORTEX= Extends jMonkeyEngine3 to implement rich senses

Using the game-making primitives provided by jMonkeyEngine3, I have
constructed every major human sense except for smell and
taste. =CORTEX= also provides an interface for creating creatures
in Blender, a 3D modeling environment, and then "rigging" the
creatures with senses using 3D annotations in Blender. A creature
can have any number of senses, and there can be any number of
creatures in a simulation.

The senses available in =CORTEX= are:

- [[../../cortex/html/vision.html][Vision]]
- [[../../cortex/html/hearing.html][Hearing]]
- [[../../cortex/html/touch.html][Touch]]
- [[../../cortex/html/proprioception.html][Proprioception]]
- [[../../cortex/html/movement.html][Muscle Tension]]

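In code, rigging a creature might look something like the sketch
below. The names here (=load-blender-model=, =vision!=, =touch!=, and
so on) are illustrative placeholders rather than a definitive
description of =CORTEX='s interface, and the =declare= form exists
only so the sketch is loadable on its own.

#+begin_src clojure
;; Illustrative sketch of attaching senses to a Blender-modeled
;; creature. Every name declared below is a hypothetical placeholder.
(declare load-blender-model         ; read an annotated creature from a .blend file
         vision! hearing! touch!    ; each sense constructor returns functions
         proprioception! movement!) ; that produce sense data each simulation step

(defn rig-creature
  "Load a creature model and attach every sense it is annotated with."
  [blend-file]
  (let [creature (load-blender-model blend-file)]
    {:creature       creature
     :vision         (vision! creature)
     :hearing        (hearing! creature)
     :touch          (touch! creature)
     :proprioception (proprioception! creature)
     :muscles        (movement! creature)}))

;; e.g. (rig-creature "creatures/worm.blend")   ; hypothetical path
#+end_src
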
* A roadmap for =CORTEX= experiments

** Worm World

Worms in =CORTEX= are segmented creatures which vary in length and
number of segments, and have the senses of vision, proprioception,
touch, and muscle tension.

#+attr_html: width=755
#+caption: This is the tactile-sensor-profile for the upper segment of a worm. It defines regions of high touch sensitivity (where there are many white pixels) and regions of low sensitivity (where white pixels are sparse).
[[../images/finger-UV.png]]

#+begin_html
<div class="figure">
  <center>
    <video controls="controls" width="550">
      <source src="../video/worm-touch.ogg" type="video/ogg"
              preload="none" />
    </video>
    <br> <a href="http://youtu.be/RHx2wqzNVcU"> YouTube </a>
  </center>
  <p>The worm responds to touch.</p>
</div>
#+end_html

#+begin_html
<div class="figure">
  <center>
    <video controls="controls" width="550">
      <source src="../video/test-proprioception.ogg" type="video/ogg"
              preload="none" />
    </video>
    <br> <a href="http://youtu.be/JjdDmyM8b0w"> YouTube </a>
  </center>
  <p>Proprioception in a worm. The proprioceptive readout is
  in the upper left corner of the screen.</p>
</div>
#+end_html

A worm is trained in various actions such as sinusoidal movement,
curling, flailing, and spinning by directly playing motor
contractions while the worm "feels" the experience. These actions
are recorded both as vectors of muscle tension, touch, and
proprioceptive data, and in higher-level forms such as the
frequencies of the various contractions and a symbolic name for the
action.
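
One plausible shape for such a recorded example, written as plain
Clojure data, is sketched below; the keys and per-frame
representations are illustrative only and not =CORTEX='s actual
storage format.

#+begin_src clojure
;; Hypothetical recorded training example for the "curl" action.
;; Each raw sense is a sequence of per-frame readings; the
;; higher-level summary sits alongside the raw data.
(def example-curl
  {:action         :curl                         ; symbolic name
   ;; raw data: one vector of readings per simulation frame
   :muscle-tension [[0.0 0.7 0.9] [0.0 0.8 0.9] [0.1 0.8 0.8]]
   :touch          [[0 0 1 1]     [0 1 1 1]     [0 1 1 0]]
   :proprioception [[0.1 0.2 0.3] [0.3 0.5 0.6] [0.6 0.9 1.1]]
   ;; higher-level form: dominant contraction frequency (Hz)
   :frequencies    {:body-bend 0.5}})
#+end_src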

Then, the worm watches a video of another worm performing one of
the actions, and must judge which action was performed. Normally
this would be an extremely difficult problem, but the worm is able
to greatly diminish the search space through sympathetic
imagination. First, it creates an imagined copy of its body which
it observes from a third-person point of view. Then, for each frame
of the video, it maneuvers its simulated body to be in registration
with the worm depicted in the video. The physical constraints
imposed by the physics simulation greatly decrease the number of
poses that have to be tried, making the search feasible. As the
imaginary worm moves, it generates imaginary muscle tension and
proprioceptive sensations. The worm determines the action not by
vision, but by matching the imagined proprioceptive data with
previous examples.
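
The matching itself could be as simple as a nearest-neighbor
comparison over proprioceptive sequences. The sketch below assumes
equal-length sequences of per-frame proprioception vectors (as in
the hypothetical example format above) and ignores the alignment
and normalization that real data would need.

#+begin_src clojure
;; Minimal nearest-neighbor matcher over proprioceptive sequences.
;; Assumes every sequence has the same length and frame size.
(defn frame-distance
  "Squared Euclidean distance between two proprioception frames."
  [a b]
  (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) a b)))

(defn sequence-distance
  "Total distance between two sequences of proprioception frames."
  [xs ys]
  (reduce + (map frame-distance xs ys)))

(defn identify-action
  "Return the :action label of the stored example whose
   proprioceptive data most closely matches the imagined data."
  [imagined-proprioception examples]
  (:action
   (apply min-key
          #(sequence-distance imagined-proprioception (:proprioception %))
          examples)))
#+end_src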

By using non-visual sensory data such as touch, the worms can also
answer body-related questions such as "did your head touch your
tail?" and "did worm A touch worm B?"
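
For example, if each frame of touch data recorded which pairs of
body segments are in contact, such questions would reduce to simple
queries over that record. The representation below (one set of
contact pairs per frame) is hypothetical.

#+begin_src clojure
;; Hypothetical touch representation: each frame is a set of contact
;; pairs, e.g. #{#{:head :tail} #{:seg-3 :floor}}.
(defn touched?
  "Did segment-a ever touch segment-b during the recorded frames?"
  [touch-frames segment-a segment-b]
  (boolean (some #(contains? % #{segment-a segment-b}) touch-frames)))

;; (touched? worm-touch-history :head :tail) ; "did your head touch your tail?"
#+end_src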

The proprioceptive information used for action identification is
body-centric, so only the registration step is dependent on point
of view, not the identification step. Registration is not specific
to any particular action. Thus, action identification can be
divided into a point-of-view-dependent, generic registration step,
and an action-specific step that is body-centered and invariant to
point of view.

** Stick Figure World

This environment is similar to Worm World, except the creatures are
more complicated and the actions and questions more varied. It is
an experiment to see how far imagination can go in interpreting
actions.