Mercurial repository: cortex
thesis/cortex.org @ changeset 511:07c3feb32df3 ("go over changes by Dylan.")

author:   Robert McIntyre <rlm@mit.edu>
date:     Sun, 30 Mar 2014 10:17:43 -0400
parents:  510:f639e2139ce2
children: 447c3c8405a2
#+name: name
#+ATTR_LaTeX: :width 10cm
[[./images/aurellem-gray.png]]


* Empathy \& Embodiment: problem solving strategies

** The problem: recognizing actions in video is extremely difficult
# developing / requires useful representations

Examine the following collection of images. As you, and indeed very
young children, can easily determine, each one is a picture of
someone drinking.

# dxh: cat, cup, drinking fountain, rain, straw, coconut
#+caption: A cat drinking some water. Identifying this action is
#+caption: beyond the capabilities of existing computer vision systems.
#+ATTR_LaTeX: :width 7cm
[[./images/cat-drinking.jpg]]

Nevertheless, it is beyond the state of the art for a computer
vision program to describe what's happening in each of these
images, or what's common to them. Part of the problem is that many
computer vision systems focus on pixel-level details or probability
distributions of pixels, with little focus on [...]

In fact, the contents of a scene may have much less to do with pixel
probabilities than with recognizing various affordances: things you
can move, objects you can grasp, spaces that can be filled
(Gibson). For example, what processes might enable you to see the
chair in figure \ref{hidden-chair}?
# Or suppose that you are building a program that recognizes chairs.
# How could you ``see'' the chair?

# dxh: blur chair
#+caption: The chair in this image is quite obvious to humans, but I
#+caption: doubt that any modern computer vision program can find it.
#+name: hidden-chair
#+ATTR_LaTeX: :width 10cm
[[./images/fat-person-sitting-at-desk.jpg]]


Finally, how is it that you can easily tell the difference between
how the girl's /muscles/ are working in figure \ref{girl}?

#+caption: The mysterious ``common sense'' appears here as you are able
#+caption: to discern which of the girl's muscles
#+caption: are activated between the two images.
#+name: girl
#+ATTR_LaTeX: :width 7cm
[[./images/wall-push.png]]

Each of these examples tells us something about what might be going
on in our minds as we easily solve these recognition problems.

The hidden chair shows us that we are strongly triggered by cues
relating to the position of human bodies, and that we can determine
the overall physical configuration of a human body even if much of
that body is occluded.

The picture of the girl pushing against the wall tells us that we
have common sense knowledge about the kinetics of our own bodies.
We know well how our muscles would have to work to maintain us in
most positions, and we can easily project this self-knowledge to
imagined positions triggered by images of the human body.
** A step forward: the sensorimotor-centered approach
# ** =EMPATH= recognizes what creatures are doing
# neatly solves recognition problems

In this thesis, I explore the idea that our knowledge of our own
bodies enables us to recognize the actions of others.

First, I built a system for constructing virtual creatures with
physiologically plausible sensorimotor systems and detailed
environments. The result is =CORTEX=, which is described in section
\ref{sec-2}. (=CORTEX= was built to be flexible and useful to other
AI researchers; it is provided in full with detailed instructions
on the web [here].)

Next, I wrote routines which enabled a simple worm-like creature to
infer the actions of a second worm-like creature, using only its
own prior sensorimotor experiences and knowledge of the second
worm's joint positions. This program, =EMPATH=, is described in
section \ref{sec-3}, and the key results of this experiment are
summarized below.

#+caption: From only \emph{proprioceptive} data, =EMPATH= was able to infer
#+caption: the complete sensory experience and classify these four poses.
#+caption: The last image is a composite, depicting the intermediate stages of \emph{wriggling}.
#+name: worm-recognition-intro-2
#+ATTR_LaTeX: :width 15cm
[[./images/empathy-1.png]]

# =CORTEX= provides a language for describing the sensorimotor
# experiences of various creatures.

# Next, I developed an experiment to test the power of =CORTEX='s
# sensorimotor-centered language for solving recognition problems. As
# a proof of concept, I wrote routines which enabled a simple
# worm-like creature to infer the actions of a second worm-like
# creature, using only its own previous sensorimotor experiences and
# knowledge of the second worm's joints (figure
# \ref{worm-recognition-intro-2}). The result of this proof of
# concept was the program =EMPATH=, described in section
# \ref{sec-3}. The key results of this

# Using only first-person sensorimotor experiences and third-person
# proprioceptive data,

*** Key results
  - After one-shot supervised training, =EMPATH= was able to
    recognize a wide variety of static poses and dynamic actions ---
    ranging from curling in a circle to wriggling with a particular
    frequency --- with 95\% accuracy.
  - These results were completely independent of viewing angle,
    because the underlying body-centered language is itself
    viewpoint-independent; once an action is learned, it can be
    recognized equally well from any viewing angle (a short sketch
    of such a body-centered predicate follows this list).
  - =EMPATH= is surprisingly short; the sensorimotor-centered
    language provided by =CORTEX= resulted in extremely economical
    recognition routines --- about 0000 lines in all --- suggesting
    that such representations are very powerful, and often
    indispensable, for the types of recognition tasks considered here.
  - Although for expediency's sake I relied on direct knowledge of
    joint positions in this proof of concept, it would be
    straightforward to extend =EMPATH= so that it (more
    realistically) infers joint positions from its visual data.
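
To make the phrase ``body-centered language'' concrete, here is a
minimal sketch, in Clojure, of what one such predicate might look
like. The name =curled?=, the =:proprioception= key, and the angle
threshold are illustrative assumptions only; the program itself is
described in section \ref{sec-3}.

#+begin_src clojure
;; Illustrative sketch only -- not the actual EMPATH source.  Assume
;; an experience is a map whose :proprioception entry is a sequence
;; of [heading pitch roll] joint angles, in radians.
(defn curled?
  "True if every joint is bent sharply, i.e. the worm has curled into
   a circle.  Only body-centered joint angles are consulted, so the
   answer is the same from every viewing angle."
  [experience]
  (every? (fn [[_ pitch _]]
            (> (Math/abs pitch) (* 2/3 Math/PI)))
          (:proprioception experience)))
#+end_src

A dynamic action such as wriggling can then be phrased as a pattern
over a short window of predicates like this one.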

# because the underlying language is fundamentally orientation-independent

# recognize the actions of a worm with 95\% accuracy. The
# recognition tasks

[Talk about these results and what you find promising about them]

** Roadmap
[I'm going to explain how =CORTEX= works, then break down how
=EMPATH= does its thing. Because the details reveal such-and-such
about the approach.]

# The success of this simple proof-of-concept offers a tantalizing

# explore the idea
# The key contribution of this thesis is the idea that body-centered
# representations (which express

# the
# body-centered approach --- in which I try to determine what's
# happening in a scene by bringing it into registration with my own
# bodily experiences --- are indispensable for recognizing what
# creatures are doing in a scene.

* COMMENT
# body-centered language

In this thesis, I'll describe =EMPATH=, which solves a certain
class of recognition problems

The key idea is to use self-centered (or first-person) language.

I have built a system that can express the types of recognition
problems in a form amenable to computation. It is split into
four parts:

- Free/Guided Play :: The creature moves around and experiences the
  world through its unique perspective. Many otherwise
  complicated actions are easily described in the language of a
[...]

  program. Each sense can be specified using special blender nodes
  with biologically inspired parameters. You need not write any
  code to create a creature, and can use a wide library of
  pre-existing blender models as a base for your own creatures.

- =CORTEX= implements a wide variety of senses: touch,
  proprioception, vision, hearing, and muscle tension. Complicated
  senses like touch and vision involve multiple sensory elements
  embedded in a 2D surface. You have complete control over the
  distribution of these sensor elements through the use of simple
  png image files (a rough sketch of this idea follows the list). In
  particular, =CORTEX= implements more comprehensive hearing than
  any other creature simulation system available.

- =CORTEX= supports any number of creatures and any number of
  senses. Time in =CORTEX= dilates so that the simulated creatures
  always perceive a perfectly smooth flow of time, regardless of
  the actual computational load.
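
As a rough illustration of how an image file can control where
sensor elements go, the sketch below reads a png and treats every
white pixel as the location of one sensor. The file name
"touch-profile.png" and the white-pixel convention are assumptions
made up for this example; the way =CORTEX= actually attaches sensor
profiles to creature models is covered in the sense-specific
sections below.

#+begin_src clojure
;; Rough sketch: derive sensor positions from an image.  Every white
;; pixel in the (hypothetical) file "touch-profile.png" becomes the
;; [x y] location of one sensor element.
(import '[javax.imageio ImageIO]
        '[java.io File])

(defn sensor-coordinates
  "Return a vector of [x y] pixel coordinates for every pure-white
   pixel in the given image file."
  [png-path]
  (let [image (ImageIO/read (File. png-path))]
    (vec (for [x (range (.getWidth image))
               y (range (.getHeight image))
               :when (= 0xFFFFFF (bit-and 0xFFFFFF (.getRGB image x y)))]
           [x y]))))

;; e.g. (sensor-coordinates "touch-profile.png") ;=> [[3 7] [3 8] ...]
#+end_src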
[...]

its own finger from the eye in its palm, and that it can feel its
own thumb touching its palm.}
\end{sidewaysfigure}
#+END_LaTeX

** Road map

By the end of this thesis, you will have seen a novel approach to
interpreting video using embodiment and empathy. You will have also
seen one way to efficiently implement empathy for embodied
creatures. Finally, you will become familiar with =CORTEX=, a system
for designing and simulating creatures with rich senses, which you
may choose to use in your own research.

This is the core vision of my thesis: that one of the important ways
in which we understand others is by imagining ourselves in their
position and empathically feeling experiences relative to our own
bodies. By understanding events in terms of our own previous
corporeal experience, we greatly constrain the possibilities of what
would otherwise be an unwieldy exponential search. This extra
constraint can be the difference between easily understanding what
is happening in a video and being completely lost in a sea of
incomprehensible color and movement.

- I built =CORTEX=, a comprehensive platform for embodied AI
  experiments. =CORTEX= supports many features lacking in other
  systems, such as proper simulation of hearing. It is easy to create
  new =CORTEX= creatures using Blender, a free 3D modeling program.

- I built =EMPATH=, which uses =CORTEX= to identify the actions of
  a worm-like creature using a computational model of empathy.


* Designing =CORTEX=
In this section, I outline the design decisions that went into
making =CORTEX=, along with some details about its
implementation. (A practical guide to getting started with =CORTEX=,
which skips over the history and implementation details presented
here, is provided in an appendix \ref{} at the end of this paper.)

Throughout this project, I intended for =CORTEX= to be flexible and
extensible enough to be useful for other researchers who want to
test out ideas of their own. To this end, wherever I have had to make
architectural choices about =CORTEX=, I have chosen to give as much
freedom to the user as possible, so that =CORTEX= may be used for
things I have not foreseen.

** Building in simulation versus reality
The most important architectural decision of all is the choice to
use a computer-simulated environment in the first place! The world
is a vast and rich place, and for now simulations are a very poor
reflection of its complexity. It may be that there is a significant
qualitative difference between dealing with senses in the real
[...]

time in the simulated world can be slowed down to accommodate the
limitations of the character's programming. In terms of cost,
doing everything in software is far cheaper than building custom
real-time hardware. All you need is a laptop and some patience.

** Simulated time enables rapid prototyping and complex scenes

I envision =CORTEX= being used to support rapid prototyping and
iteration of ideas. Even if I could put together a well-constructed
kit for creating robots, it would still not be enough because of
the scourge of real-time processing. Anyone who wants to test their
[...]

the simulation. The cost is that =CORTEX= can sometimes run slower
than real time. This can also be an advantage, however ---
simulations of very simple creatures in =CORTEX= generally run at
40x on my machine!

** All sense organs are two-dimensional surfaces
# What is a sense?
If =CORTEX= is to support a wide variety of senses, it would help
to have a better understanding of what a ``sense'' actually is!
While vision, touch, and hearing all seem like they are quite
different things, I was surprised to learn during the course of
this thesis that they (and all physical senses) can be expressed as
[...]

#+caption: simulation environment.
#+name: name
#+ATTR_LaTeX: :width 15cm
[[./images/physical-hand.png]]

** Sight reuses standard video game components...

Vision is one of the most important senses for humans, so I need to
build a simulated sense of vision for my AI. I will do this with
simulated eyes. Each eye can be independently moved and should see
its own version of the world depending on where it is.
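
Since the frame rendered for an eye is, in the end, just an ordinary
image, the retina can be sampled by reading pixels at chosen
offsets. The sketch below shows only that sampling step under that
assumption; it is not the actual =CORTEX= vision code, which is
integrated with jMonkeyEngine.

#+begin_src clojure
;; Simplified sketch: treat the frame rendered for one eye as a plain
;; BufferedImage and sample it at a set of retinal [x y] positions.
(import '[java.awt.image BufferedImage])

(defn sample-retina
  "Given a rendered frame and a collection of [x y] sensor positions,
   return the red, green, and blue values seen by each sensor."
  [^BufferedImage frame sensor-positions]
  (for [[x y] sensor-positions]
    (let [rgb (.getRGB frame (int x) (int y))]
      {:r (bit-and 0xFF (bit-shift-right rgb 16))
       :g (bit-and 0xFF (bit-shift-right rgb 8))
       :b (bit-and 0xFF rgb)})))
#+end_src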
[...]

This vision code has already been absorbed by the jMonkeyEngine
community and is now (in modified form) part of a system for
capturing in-game video to a file.

** ...but hearing must be built from scratch
# is hard; =CORTEX= does it right
At the end of this section I will have simulated ears that work the
same way as the simulated eyes in the last section. I will be able to
place any number of ear-nodes in a blender file, and they will bind to
the closest physical object and follow it as it moves around. Each ear
will provide access to the sound data it picks up between every frame.
[...]

This system of hearing has also been co-opted by the
jMonkeyEngine3 community and is used to record audio for demo
videos.

** Hundreds of hair-like elements provide a sense of touch

Touch is critical to navigation and spatial reasoning, and as such I
need a simulated version of it to give to my AI creatures.

Human skin has a wide array of touch sensors, each of which
[...]

#+caption: part of the ground.
#+name: touch-cube-uv-map
#+ATTR_LaTeX: :width 15cm
[[./images/touch-cube.png]]

** Proprioception provides knowledge of your own body's position

Close your eyes, and touch your nose with your right index finger.
How did you do it? You could not see your hand, and neither your
hand nor your nose could use the sense of touch to guide the path
of your hand. There are no sound cues, and Taste and Smell
[...]

#+caption: pitch, and White is roll.
#+name: proprio
#+ATTR_LaTeX: :width 11cm
[[./images/proprio.png]]

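The figure above color-codes the rotation reported at each joint
(the visible caption mentions pitch and roll). As a simplified
illustration of the kind of quantity proprioception reports, the
sketch below computes just the bend angle between two adjacent
segments from their direction vectors; it is an illustration, not
the =CORTEX= proprioception code.

#+begin_src clojure
;; Simplified sketch: the bend angle at a joint, computed from the
;; direction vectors of the two segments it connects.  A straight
;; joint gives 0; a fully folded joint approaches pi.
(defn dot-product [a b] (reduce + (map * a b)))

(defn magnitude [v] (Math/sqrt (dot-product v v)))

(defn bend-angle
  "Angle in radians between two segment direction vectors."
  [a b]
  (Math/acos (/ (dot-product a b)
                (* (magnitude a) (magnitude b)))))

;; e.g. (bend-angle [1 0 0] [0 1 0]) ;=> ~1.5708, a right-angle bend
#+end_src
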
** Muscles contain both sensors and effectors

Surprisingly enough, terrestrial creatures only move by using
torque applied about their joints. There's not a single straight
line of force in the human body at all! (A straight line of force
would correspond to some sort of jet or rocket propulsion.)
[...]

- Inverse kinematics :: experiments in sense guided motor control
  are easy given =CORTEX='s support -- you can get right to the
  hard control problems without worrying about physics or
  senses.

* =EMPATH=: the simulated worm experiment
# Empathy in a simulated worm

Here I develop a computational model of empathy, using =CORTEX= as a
base. Empathy in this context is the ability to observe another
creature and infer what sorts of sensations that creature is
feeling. My empathy algorithm involves multiple phases. First is
[...]

There is a simple way of taking \Phi-space and the total ordering
provided by an experience vector and reliably inferring the rest of
the senses.

** ``Empathy'' requires retracing steps through \Phi-space

Here is the core of a basic empathy algorithm, starting with an
experience vector:

First, group the experiences into tiered proprioceptive bins. I use
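
One way to realize ``tiered proprioceptive bins'' is to discretize
each joint angle at several resolutions, so that coarse bins
tolerate noise while fine bins stay selective. The sketch below is
an illustration of that idea only, not the listing given in the
thesis, and the resolutions chosen here are arbitrary.

#+begin_src clojure
;; Illustrative sketch of tiered binning.  A proprioceptive snapshot
;; is assumed to be a sequence of [heading pitch roll] triples, one
;; per joint, in radians.
(defn bin-angles
  "Snap every joint angle to a grid of the given resolution, yielding
   the bin this snapshot falls into at that resolution."
  [resolution proprio]
  (mapv (fn [angles]
          (mapv #(Math/round (/ % resolution)) angles))
        proprio))

(defn tiered-bins
  "The same snapshot keyed at three resolutions, coarse to fine."
  [proprio]
  (mapv #(bin-angles % proprio) [0.5 0.25 0.125]))
#+end_src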
[...]

(recur (dec i) (assoc! v (dec i) cur)))
(recur i (assoc! v i 0))))))
#+end_src
#+end_listing

** =EMPATH= recognizes actions efficiently

To use =EMPATH= with the worm, I first need to gather a set of
experiences from the worm that includes the actions I want to
recognize. The =generate-phi-space= program (listing
\ref{generate-phi-space}) runs the worm through a series of
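
The gathering step itself has a simple shape: drive the worm
through a set of scripted motions while recording every sensory
snapshot, in order, into one long experience vector (\Phi-space).
The sketch below is hypothetical, not =generate-phi-space= itself;
=step!= is an assumed helper that applies one frame of muscle
commands and returns the resulting sensory snapshot.

#+begin_src clojure
;; Hypothetical sketch of gathering an experience vector.  `step!` is
;; an assumed function: it advances the simulation by one frame with
;; the given muscle commands and returns the worm's full sensory
;; snapshot for that frame.
(defn gather-phi-space
  "Run each scripted motion in order, collecting every sensory
   snapshot into a single experience vector."
  [step! scripted-motions]
  (vec (for [motion scripted-motions
             muscle-commands motion]
         (step! muscle-commands))))

;; e.g. (gather-phi-space step! [curl-script wiggle-script rest-script])
;; where the three scripts are themselves hypothetical.
#+end_src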
[...]

boundaries of transitioning from one type of action to another.
During these transitions the exact label for the action is more open
to interpretation, and disagreement between empathy and experience
is more excusable.

** Digression: learning touch-sensor layout through haptic experimentation
# Bootstrapping touch using free exploration
In the previous section I showed how to compute actions in terms of
body-centered predicates which relied on the average touch activation of
pre-defined regions of the worm's skin. What if, instead of receiving
touch pre-grouped into the six faces of each worm segment, the true
topology of the worm's skin were unknown? This is more similar to how
a nerve fiber bundle might be arranged. While two fibers that are
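
The question raised here is how the skin's topology could be
recovered when touch arrives as an unordered bundle of fibers. One
natural measure, sketched below, is to call two fibers likely
neighbors when they tend to fire together during free play. The
representation (an activation time series per fiber) and the
threshold are assumptions for illustration; this is not necessarily
the method developed in the remainder of this digression.

#+begin_src clojure
;; Illustration only: group unordered touch fibers by how often they
;; fire together.  `activations` maps a fiber id to its activation
;; time series (a sequence of non-negative samples).
(defn co-activation
  "Fraction of time steps on which both fibers were active."
  [xs ys]
  (/ (count (filter true? (map #(and (pos? %1) (pos? %2)) xs ys)))
     (count xs)))

(defn likely-neighbors
  "Fiber pairs whose co-activation exceeds the threshold."
  [activations threshold]
  (for [[a xs] activations
        [b ys] activations
        :when (and (neg? (compare a b))
                   (> (co-activation xs ys) threshold))]
    [a b]))
#+end_src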
[...]

* Contributions

In this thesis you have seen the =CORTEX= system, a complete
environment for creating simulated creatures. You have seen how to
implement five senses: touch, proprioception, hearing, vision, and
muscle tension. You have seen how to create new creatures using
blender, a 3D modeling tool. I hope that =CORTEX= will be useful in
further research projects. To this end I have included the full
source to =CORTEX= along with a large suite of tests and examples. I
have also created a user guide for =CORTEX= which is included in an
appendix to this thesis \ref{}.
# dxh: todo reference appendix

You have also seen how I used =CORTEX= as a platform to attack the
/action recognition/ problem, which is the problem of recognizing
actions in video. You saw a simple system called =EMPATH= which
identifies actions by first describing actions in a body-centered,