Mercurial > cortex
comparison thesis/dylan-cortex-diff.diff @ 513:4c4d45f6f30b
author:   Robert McIntyre <rlm@mit.edu>
date:     Sun, 30 Mar 2014 10:41:18 -0400
parents:
children: 447c3c8405a2
comparison: 512:8b962ab418c8 -> 513:4c4d45f6f30b
diff -r f639e2139ce2 thesis/cortex.org
--- a/thesis/cortex.org	Sun Mar 30 01:34:43 2014 -0400
+++ b/thesis/cortex.org	Sun Mar 30 10:07:17 2014 -0400
@@ -41,49 +41,46 @@
 [[./images/aurellem-gray.png]]


-* Empathy and Embodiment as problem solving strategies
+* Empathy \& Embodiment: problem solving strategies

-  By the end of this thesis, you will have seen a novel approach to
-  interpreting video using embodiment and empathy. You will have also
-  seen one way to efficiently implement empathy for embodied
-  creatures. Finally, you will become familiar with =CORTEX=, a system
-  for designing and simulating creatures with rich senses, which you
-  may choose to use in your own research.
-
-  This is the core vision of my thesis: That one of the important ways
-  in which we understand others is by imagining ourselves in their
-  position and emphatically feeling experiences relative to our own
-  bodies. By understanding events in terms of our own previous
-  corporeal experience, we greatly constrain the possibilities of what
-  would otherwise be an unwieldy exponential search. This extra
-  constraint can be the difference between easily understanding what
-  is happening in a video and being completely lost in a sea of
-  incomprehensible color and movement.
-
-** Recognizing actions in video is extremely difficult
-
-   Consider for example the problem of determining what is happening
-   in a video of which this is one frame:
-
+** The problem: recognizing actions in video is extremely difficult
+# developing / requires useful representations
+
+   Examine the following collection of images. As you, and indeed very
+   young children, can easily determine, each one is a picture of
+   someone drinking.
+
+   # dxh: cat, cup, drinking fountain, rain, straw, coconut
    #+caption: A cat drinking some water. Identifying this action is
-   #+caption: beyond the state of the art for computers.
+   #+caption: beyond the capabilities of existing computer vision systems.
    #+ATTR_LaTeX: :width 7cm
    [[./images/cat-drinking.jpg]]
+
+   Nevertheless, it is beyond the state of the art for a computer
+   vision program to describe what's happening in each of these
+   images, or what's common to them. Part of the problem is that many
+   computer vision systems focus on pixel-level details or probability
+   distributions of pixels, with little focus on [...]
+
+
+   In fact, the contents of a scene may have much less to do with pixel
+   probabilities than with recognizing various affordances: things you
+   can move, objects you can grasp, spaces that can be filled
+   (Gibson). For example, what processes might enable you to see the
+   chair in figure \ref{hidden-chair}?
+   # Or suppose that you are building a program that recognizes chairs.
+   # How could you ``see'' the chair?

-   It is currently impossible for any computer program to reliably
-   label such a video as ``drinking''. And rightly so -- it is a very
-   hard problem! What features can you describe in terms of low level
-   functions of pixels that can even begin to describe at a high level
-   what is happening here?
-
-   Or suppose that you are building a program that recognizes chairs.
-   How could you ``see'' the chair in figure \ref{hidden-chair}?
-
+   # dxh: blur chair
    #+caption: The chair in this image is quite obvious to humans, but I
    #+caption: doubt that any modern computer vision program can find it.
    #+name: hidden-chair
    #+ATTR_LaTeX: :width 10cm
    [[./images/fat-person-sitting-at-desk.jpg]]
+
+
+
+

    Finally, how is it that you can easily tell the difference between
    how the girls /muscles/ are working in figure \ref{girl}?
@@ -95,10 +92,13 @@
    #+ATTR_LaTeX: :width 7cm
    [[./images/wall-push.png]]

+
+
+
    Each of these examples tells us something about what might be going
    on in our minds as we easily solve these recognition problems.

-   The hidden chairs show us that we are strongly triggered by cues
+   The hidden chair shows us that we are strongly triggered by cues
    relating to the position of human bodies, and that we can determine
    the overall physical configuration of a human body even if much of
    that body is occluded.
@@ -109,10 +109,107 @@
    most positions, and we can easily project this self-knowledge to
    imagined positions triggered by images of the human body.

-** =EMPATH= neatly solves recognition problems
+** A step forward: the sensorimotor-centered approach
+# ** =EMPATH= recognizes what creatures are doing
+# neatly solves recognition problems
+   In this thesis, I explore the idea that our knowledge of our own
+   bodies enables us to recognize the actions of others.
+
+   First, I built a system for constructing virtual creatures with
+   physiologically plausible sensorimotor systems and detailed
+   environments. The result is =CORTEX=, which is described in section
+   \ref{sec-2}. (=CORTEX= was built to be flexible and useful to other
+   AI researchers; it is provided in full with detailed instructions
+   on the web [here].)
+
+   Next, I wrote routines which enabled a simple worm-like creature to
+   infer the actions of a second worm-like creature, using only its
+   own prior sensorimotor experiences and knowledge of the second
+   worm's joint positions. This program, =EMPATH=, is described in
+   section \ref{sec-3}, and the key results of this experiment are
+   summarized below.
+
+   #+caption: From only \emph{proprioceptive} data, =EMPATH= was able to infer
+   #+caption: the complete sensory experience and classify these four poses.
+   #+caption: The last image is a composite, depicting the intermediate stages of \emph{wriggling}.
+   #+name: worm-recognition-intro-2
+   #+ATTR_LaTeX: :width 15cm
+   [[./images/empathy-1.png]]
+
+   # =CORTEX= provides a language for describing the sensorimotor
+   # experiences of various creatures.
+
+   # Next, I developed an experiment to test the power of =CORTEX='s
+   # sensorimotor-centered language for solving recognition problems. As
+   # a proof of concept, I wrote routines which enabled a simple
+   # worm-like creature to infer the actions of a second worm-like
+   # creature, using only its own previous sensorimotor experiences and
+   # knowledge of the second worm's joints (figure
+   # \ref{worm-recognition-intro-2}). The result of this proof of
+   # concept was the program =EMPATH=, described in section
+   # \ref{sec-3}. The key results of this
+
+   # Using only first-person sensorimotor experiences and third-person
+   # proprioceptive data,
+
+*** Key results
+   - After one-shot supervised training, =EMPATH= was able to recognize a
+     wide variety of static poses and dynamic actions---ranging from
+     curling in a circle to wriggling with a particular frequency---
+     with 95\% accuracy.
+   - These results were completely independent of viewing angle
+     because the underlying body-centered language is fundamentally
+     viewpoint-independent; once an action is learned, it can be
+     recognized equally well from any viewing angle.
+   - =EMPATH= is surprisingly short; the sensorimotor-centered
+     language provided by =CORTEX= resulted in extremely economical
+     recognition routines --- about 0000 lines in all --- suggesting
+     that such representations are very powerful, and often
+     indispensable for the types of recognition tasks considered here.
+   - Although for expediency's sake I relied on direct knowledge of
+     joint positions in this proof of concept, it would be
+     straightforward to extend =EMPATH= so that it (more
+     realistically) infers joint positions from its visual data.
+
+# because the underlying language is fundamentally orientation-independent
+
+# recognize the actions of a worm with 95\% accuracy. The
+# recognition tasks

-   I propose a system that can express the types of recognition
-   problems above in a form amenable to computation. It is split into
+
+
+
+   [Talk about these results and what you find promising about them]
+
+** Roadmap
+   [I'm going to explain how =CORTEX= works, then break down how
+   =EMPATH= does its thing. Because the details reveal such-and-such
+   about the approach.]
+
+   # The success of this simple proof-of-concept offers a tantalizing
+
+
+   # explore the idea
+   # The key contribution of this thesis is the idea that body-centered
+   # representations (which express
+
+
+   # the
+   # body-centered approach --- in which I try to determine what's
+   # happening in a scene by bringing it into registration with my own
+   # bodily experiences --- are indispensable for recognizing what
+   # creatures are doing in a scene.
+
+* COMMENT
+# body-centered language
+
+   In this thesis, I'll describe =EMPATH=, which solves a certain
+   class of recognition problems
+
+   The key idea is to use self-centered (or first-person) language.
+
+   I have built a system that can express the types of recognition
+   problems in a form amenable to computation. It is split into
    four parts:

    - Free/Guided Play :: The creature moves around and experiences the
@@ -286,14 +383,14 @@
      code to create a creature, and can use a wide library of
      pre-existing blender models as a base for your own creatures.

-   - =CORTEX= implements a wide variety of senses, including touch,
+   - =CORTEX= implements a wide variety of senses: touch,
      proprioception, vision, hearing, and muscle tension. Complicated
      senses like touch, and vision involve multiple sensory elements
      embedded in a 2D surface. You have complete control over the
      distribution of these sensor elements through the use of simple
      png image files. In particular, =CORTEX= implements more
      comprehensive hearing than any other creature simulation system
-     available. 
+     available.

    - =CORTEX= supports any number of creatures and any number of
      senses. Time in =CORTEX= dialates so that the simulated creatures
@@ -353,7 +450,24 @@
      \end{sidewaysfigure}
    #+END_LaTeX

-** Contributions
+** Road map
+
+   By the end of this thesis, you will have seen a novel approach to
+   interpreting video using embodiment and empathy. You will have also
+   seen one way to efficiently implement empathy for embodied
+   creatures. Finally, you will become familiar with =CORTEX=, a system
+   for designing and simulating creatures with rich senses, which you
+   may choose to use in your own research.
+
+   This is the core vision of my thesis: That one of the important ways
+   in which we understand others is by imagining ourselves in their
+   position and empathically feeling experiences relative to our own
+   bodies. By understanding events in terms of our own previous
+   corporeal experience, we greatly constrain the possibilities of what
+   would otherwise be an unwieldy exponential search. This extra
+   constraint can be the difference between easily understanding what
+   is happening in a video and being completely lost in a sea of
+   incomprehensible color and movement.

   - I built =CORTEX=, a comprehensive platform for embodied AI
     experiments. =CORTEX= supports many features lacking in other
@@ -363,18 +477,22 @@
   - I built =EMPATH=, which uses =CORTEX= to identify the actions of
     a worm-like creature using a computational model of empathy.

-* Building =CORTEX=
-
-  I intend for =CORTEX= to be used as a general-purpose library for
-  building creatures and outfitting them with senses, so that it will
-  be useful for other researchers who want to test out ideas of their
-  own. To this end, wherver I have had to make archetictural choices
-  about =CORTEX=, I have chosen to give as much freedom to the user as
-  possible, so that =CORTEX= may be used for things I have not
-  forseen.
-
-** Simulation or Reality?
-
+
+* Designing =CORTEX=
+  In this section, I outline the design decisions that went into
+  making =CORTEX=, along with some details about its
+  implementation. (A practical guide to getting started with =CORTEX=,
+  which skips over the history and implementation details presented
+  here, is provided in an appendix \ref{} at the end of this paper.)
+
+  Throughout this project, I intended for =CORTEX= to be flexible and
+  extensible enough to be useful for other researchers who want to
+  test out ideas of their own. To this end, wherever I have had to make
+  architectural choices about =CORTEX=, I have chosen to give as much
+  freedom to the user as possible, so that =CORTEX= may be used for
+  things I have not foreseen.
+
+** Building in simulation versus reality
   The most important archetictural decision of all is the choice to
   use a computer-simulated environemnt in the first place! The world
   is a vast and rich place, and for now simulations are a very poor
@@ -436,7 +554,7 @@
   doing everything in software is far cheaper than building custom
   real-time hardware. All you need is a laptop and some patience.

-** Because of Time, simulation is perferable to reality
+** Simulated time enables rapid prototyping and complex scenes

   I envision =CORTEX= being used to support rapid prototyping and
   iteration of ideas. Even if I could put together a well constructed
@@ -459,8 +577,8 @@
   simulations of very simple creatures in =CORTEX= generally run at
   40x on my machine!

-** What is a sense?
-
+** All sense organs are two-dimensional surfaces
+# What is a sense?
   If =CORTEX= is to support a wide variety of senses, it would help
   to have a better understanding of what a ``sense'' actually is!
   While vision, touch, and hearing all seem like they are quite
@@ -956,7 +1074,7 @@
   #+ATTR_LaTeX: :width 15cm
   [[./images/physical-hand.png]]

-** Eyes reuse standard video game components
+** Sight reuses standard video game components...

   Vision is one of the most important senses for humans, so I need to
   build a simulated sense of vision for my AI. I will do this with
@@ -1257,8 +1375,8 @@
   community and is now (in modified form) part of a system for
   capturing in-game video to a file.

-** Hearing is hard; =CORTEX= does it right
-
+** ...but hearing must be built from scratch
+# is hard; =CORTEX= does it right
   At the end of this section I will have simulated ears that work the
   same way as the simulated eyes in the last section. I will be able to
   place any number of ear-nodes in a blender file, and they will bind to
@@ -1565,7 +1683,7 @@
   jMonkeyEngine3 community and is used to record audio for demo
   videos.

-** Touch uses hundreds of hair-like elements
+** Hundreds of hair-like elements provide a sense of touch

   Touch is critical to navigation and spatial reasoning and as such I
   need a simulated version of it to give to my AI creatures.
@@ -2059,7 +2177,7 @@
   #+ATTR_LaTeX: :width 15cm
   [[./images/touch-cube.png]]

-** Proprioception is the sense that makes everything ``real''
+** Proprioception provides knowledge of your own body's position

   Close your eyes, and touch your nose with your right index finger.
   How did you do it? You could not see your hand, and neither your
@@ -2193,7 +2311,7 @@
   #+ATTR_LaTeX: :width 11cm
   [[./images/proprio.png]]

-** Muscles are both effectors and sensors
+** Muscles contain both sensors and effectors

   Surprisingly enough, terrestrial creatures only move by using
   torque applied about their joints. There's not a single straight
@@ -2440,7 +2558,8 @@
   hard control problems without worrying about physics or
   senses.

-* Empathy in a simulated worm
+* =EMPATH=: the simulated worm experiment
+# Empathy in a simulated worm

   Here I develop a computational model of empathy, using =CORTEX= as a
   base. Empathy in this context is the ability to observe another
@@ -2732,7 +2851,7 @@
   provided by an experience vector and reliably infering the rest of
   the senses.

-** Empathy is the process of tracing though \Phi-space
+** ``Empathy'' requires retracing steps through \Phi-space

   Here is the core of a basic empathy algorithm, starting with an
   experience vector:
@@ -2888,7 +3007,7 @@
   #+end_src
   #+end_listing

-** Efficient action recognition with =EMPATH=
+** =EMPATH= recognizes actions efficiently

   To use =EMPATH= with the worm, I first need to gather a set of
   experiences from the worm that includes the actions I want to
@@ -3044,9 +3163,9 @@
   to interpretation, and dissaggrement between empathy and experience
   is more excusable.

-** Digression: bootstrapping touch using free exploration
-
-   In the previous section I showed how to compute actions in terms of
+** Digression: Learn touch sensor layout through haptic experimentation, instead
+# Bootstrapping touch using free exploration
+In the previous section I showed how to compute actions in terms of
 body-centered predicates which relied averate touch activation of
 pre-defined regions of the worm's skin. What if, instead of recieving
 touch pre-grouped into the six faces of each worm segment, the true
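
The diff above repeatedly describes =EMPATH='s core mechanism: classifying what another creature is doing by matching its observed proprioceptive data against the observer's own stored first-person experiences (the \Phi-space). As a rough illustration of that idea only --- not the thesis's actual Clojure implementation, and with all names hypothetical --- a nearest-neighbor version of the matching step can be sketched in Python:

```python
import math

# Hypothetical sketch: each stored "experience" pairs a proprioceptive
# snapshot (a tuple of joint angles) with the action label the observer
# learned first-hand. An observed pose is classified by finding the
# closest stored snapshot -- a body-centered, viewpoint-independent
# comparison, since joint angles carry no camera orientation.

def distance(pose_a, pose_b):
    """Euclidean distance between two joint-angle tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pose_a, pose_b)))

def classify(observed_pose, experiences):
    """Return the action label of the nearest stored experience."""
    best = min(experiences, key=lambda exp: distance(observed_pose, exp[0]))
    return best[1]

# Tiny worm-like example: two joints, angles in radians.
experiences = [
    ((0.0, 0.0), "resting"),
    ((1.5, 1.5), "curled"),
    ((1.5, -1.5), "wiggling"),
]

print(classify((1.4, 1.6), experiences))  # nearest stored pose is "curled"
```

Note that =EMPATH= as described works over *sequences* of proprioceptive frames (e.g. the intermediate stages of wriggling), not single poses; this per-frame sketch only conveys the flavor of matching observations against one's own stored bodily experience.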