annotate thesis/dxh-cortex-diff.diff @ 511:07c3feb32df3

go over changes by Dylan.
author Robert McIntyre <rlm@mit.edu>
date Sun, 30 Mar 2014 10:17:43 -0400
parents
children 8b962ab418c8
rev   line source
rlm@511 1 diff -r f639e2139ce2 thesis/cortex.org
rlm@511 2 --- a/thesis/cortex.org Sun Mar 30 01:34:43 2014 -0400
rlm@511 3 +++ b/thesis/cortex.org Sun Mar 30 10:07:17 2014 -0400
rlm@511 4 @@ -41,49 +41,46 @@
rlm@511 5 [[./images/aurellem-gray.png]]
rlm@511 6
rlm@511 7
rlm@511 8 -* Empathy and Embodiment as problem solving strategies
rlm@511 9 +* Empathy \& Embodiment: problem-solving strategies
rlm@511 10
rlm@511 11 - By the end of this thesis, you will have seen a novel approach to
rlm@511 12 - interpreting video using embodiment and empathy. You will have also
rlm@511 13 - seen one way to efficiently implement empathy for embodied
rlm@511 14 - creatures. Finally, you will become familiar with =CORTEX=, a system
rlm@511 15 - for designing and simulating creatures with rich senses, which you
rlm@511 16 - may choose to use in your own research.
rlm@511 17 -
rlm@511 18 - This is the core vision of my thesis: That one of the important ways
rlm@511 19 - in which we understand others is by imagining ourselves in their
rlm@511 20 - position and emphatically feeling experiences relative to our own
rlm@511 21 - bodies. By understanding events in terms of our own previous
rlm@511 22 - corporeal experience, we greatly constrain the possibilities of what
rlm@511 23 - would otherwise be an unwieldy exponential search. This extra
rlm@511 24 - constraint can be the difference between easily understanding what
rlm@511 25 - is happening in a video and being completely lost in a sea of
rlm@511 26 - incomprehensible color and movement.
rlm@511 27 -
rlm@511 28 -** Recognizing actions in video is extremely difficult
rlm@511 29 -
rlm@511 30 - Consider for example the problem of determining what is happening
rlm@511 31 - in a video of which this is one frame:
rlm@511 32 -
rlm@511 33 +** The problem: recognizing actions in video is extremely difficult
rlm@511 34 +# developing / requires useful representations
rlm@511 35 +
rlm@511 36 + Examine the following collection of images. As you, and indeed very
rlm@511 37 + young children, can easily determine, each one is a picture of
rlm@511 38 + someone drinking.
rlm@511 39 +
rlm@511 40 + # dxh: cat, cup, drinking fountain, rain, straw, coconut
rlm@511 41 #+caption: A cat drinking some water. Identifying this action is
rlm@511 42 - #+caption: beyond the state of the art for computers.
rlm@511 43 + #+caption: beyond the capabilities of existing computer vision systems.
rlm@511 44 #+ATTR_LaTeX: :width 7cm
rlm@511 45 [[./images/cat-drinking.jpg]]
rlm@511 46 +
rlm@511 47 + Nevertheless, it is beyond the state of the art for a computer
rlm@511 48 + vision program to describe what's happening in each of these
rlm@511 49 + images, or what's common to them. Part of the problem is that many
rlm@511 50 + computer vision systems focus on pixel-level details or probability
rlm@511 51 + distributions of pixels, with little attention to [...]
rlm@511 52 +
rlm@511 53 +
rlm@511 54 + In fact, the contents of a scene may have much less to do with pixel
rlm@511 55 + probabilities than with recognizing various affordances: things you
rlm@511 56 + can move, objects you can grasp, spaces that can be filled
rlm@511 57 + (Gibson). For example, what processes might enable you to see the
rlm@511 58 + chair in figure \ref{hidden-chair}?
rlm@511 59 + # Or suppose that you are building a program that recognizes chairs.
rlm@511 60 + # How could you ``see'' the chair ?
rlm@511 61
rlm@511 62 - It is currently impossible for any computer program to reliably
rlm@511 63 - label such a video as ``drinking''. And rightly so -- it is a very
rlm@511 64 - hard problem! What features can you describe in terms of low level
rlm@511 65 - functions of pixels that can even begin to describe at a high level
rlm@511 66 - what is happening here?
rlm@511 67 -
rlm@511 68 - Or suppose that you are building a program that recognizes chairs.
rlm@511 69 - How could you ``see'' the chair in figure \ref{hidden-chair}?
rlm@511 70 -
rlm@511 71 + # dxh: blur chair
rlm@511 72 #+caption: The chair in this image is quite obvious to humans, but I
rlm@511 73 #+caption: doubt that any modern computer vision program can find it.
rlm@511 74 #+name: hidden-chair
rlm@511 75 #+ATTR_LaTeX: :width 10cm
rlm@511 76 [[./images/fat-person-sitting-at-desk.jpg]]
rlm@511 77 +
rlm@511 78 +
rlm@511 79 +
rlm@511 80 +
rlm@511 81
rlm@511 82 Finally, how is it that you can easily tell the difference between
rlm@511 83 how the girl's /muscles/ are working in figure \ref{girl}?
rlm@511 84 @@ -95,10 +92,13 @@
rlm@511 85 #+ATTR_LaTeX: :width 7cm
rlm@511 86 [[./images/wall-push.png]]
rlm@511 87
rlm@511 88 +
rlm@511 89 +
rlm@511 90 +
rlm@511 91 Each of these examples tells us something about what might be going
rlm@511 92 on in our minds as we easily solve these recognition problems.
rlm@511 93
rlm@511 94 - The hidden chairs show us that we are strongly triggered by cues
rlm@511 95 + The hidden chair shows us that we are strongly triggered by cues
rlm@511 96 relating to the position of human bodies, and that we can determine
rlm@511 97 the overall physical configuration of a human body even if much of
rlm@511 98 that body is occluded.
rlm@511 99 @@ -109,10 +109,107 @@
rlm@511 100 most positions, and we can easily project this self-knowledge to
rlm@511 101 imagined positions triggered by images of the human body.
rlm@511 102
rlm@511 103 -** =EMPATH= neatly solves recognition problems
rlm@511 104 +** A step forward: the sensorimotor-centered approach
rlm@511 105 +# ** =EMPATH= recognizes what creatures are doing
rlm@511 106 +# neatly solves recognition problems
rlm@511 107 + In this thesis, I explore the idea that our knowledge of our own
rlm@511 108 + bodies enables us to recognize the actions of others.
rlm@511 109 +
rlm@511 110 + First, I built a system for constructing virtual creatures with
rlm@511 111 + physiologically plausible sensorimotor systems and detailed
rlm@511 112 + environments. The result is =CORTEX=, which is described in section
rlm@511 113 + \ref{sec-2}. (=CORTEX= was built to be flexible and useful to other
rlm@511 114 + AI researchers; it is provided in full with detailed instructions
rlm@511 115 + on the web [here].)
rlm@511 116 +
rlm@511 117 + Next, I wrote routines which enabled a simple worm-like creature to
rlm@511 118 + infer the actions of a second worm-like creature, using only its
rlm@511 119 + own prior sensorimotor experiences and knowledge of the second
rlm@511 120 + worm's joint positions. This program, =EMPATH=, is described in
rlm@511 121 + section \ref{sec-3}, and the key results of this experiment are
rlm@511 122 + summarized below.
rlm@511 123 +
rlm@511 124 + #+caption: From only \emph{proprioceptive} data, =EMPATH= was able to infer
rlm@511 125 + #+caption: the complete sensory experience and classify these four poses.
rlm@511 126 + #+caption: The last image is a composite, depicting the intermediate stages of \emph{wriggling}.
rlm@511 127 + #+name: worm-recognition-intro-2
rlm@511 128 + #+ATTR_LaTeX: :width 15cm
rlm@511 129 + [[./images/empathy-1.png]]
rlm@511 130 +
rlm@511 131 + # =CORTEX= provides a language for describing the sensorimotor
rlm@511 132 + # experiences of various creatures.
rlm@511 133 +
rlm@511 134 + # Next, I developed an experiment to test the power of =CORTEX='s
rlm@511 135 + # sensorimotor-centered language for solving recognition problems. As
rlm@511 136 + # a proof of concept, I wrote routines which enabled a simple
rlm@511 137 + # worm-like creature to infer the actions of a second worm-like
rlm@511 138 + # creature, using only its own previous sensorimotor experiences and
rlm@511 139 + # knowledge of the second worm's joints (figure
rlm@511 140 + # \ref{worm-recognition-intro-2}). The result of this proof of
rlm@511 141 + # concept was the program =EMPATH=, described in section
rlm@511 142 + # \ref{sec-3}. The key results of this
rlm@511 143 +
rlm@511 144 + # Using only first-person sensorimotor experiences and third-person
rlm@511 145 + # proprioceptive data,
rlm@511 146 +
rlm@511 147 +*** Key results
rlm@511 148 + - After one-shot supervised training, =EMPATH= was able to recognize a
rlm@511 149 + wide variety of static poses and dynamic actions --- ranging from
rlm@511 150 + curling in a circle to wriggling with a particular frequency ---
rlm@511 151 + with 95\% accuracy.
rlm@511 152 + - These results were completely independent of viewing angle,
rlm@511 153 + because the underlying body-centered language is itself
rlm@511 154 + viewing-angle independent; once an action is learned, it can be
rlm@511 155 + recognized equally well from any viewing angle (see the sketch below).
rlm@511 156 + - =EMPATH= is surprisingly short; the sensorimotor-centered
rlm@511 157 + language provided by =CORTEX= resulted in extremely economical
rlm@511 158 + recognition routines --- about 0000 lines in all --- suggesting
rlm@511 159 + that such representations are very powerful, and often
rlm@511 160 + indispensable for the types of recognition tasks considered here.
rlm@511 161 + - Although, for expediency's sake, I relied on direct knowledge of
rlm@511 162 + joint positions in this proof of concept, it would be
rlm@511 163 + straightforward to extend =EMPATH= so that it (more
rlm@511 164 + realistically) infers joint positions from its visual data.
rlm@511 165 +
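A minimal sketch of the viewing-angle independence noted above (illustrative only; the names are hypothetical and the code is far simpler than the actual =EMPATH= routines): when an action is stored as a sequence of proprioceptive frames, that is, vectors of joint angles, no camera pose enters the matching step, and a single labeled example per action suffices for nearest-neighbor classification.

#+begin_src clojure
;; Hypothetical sketch, not the =EMPATH= implementation: one-shot,
;; body-centered action recognition over proprioceptive frames.
(defn frame-distance
  "Euclidean distance between two proprioceptive frames
  (equal-length vectors of joint angles, in radians)."
  [f1 f2]
  (Math/sqrt (reduce + (map (fn [a b] (let [d (- a b)] (* d d))) f1 f2))))

(defn action-distance
  "Mean frame-wise distance between two equal-length frame sequences."
  [frames-a frames-b]
  (/ (reduce + (map frame-distance frames-a frames-b))
     (count frames-a)))

(defn classify-action
  "One-shot nearest neighbor: examples maps each action name to a single
  labeled frame sequence; returns the name of the closest example."
  [examples observed]
  (key (apply min-key #(action-distance (val %) observed) examples)))

;; Example (hypothetical data):
;; (classify-action {:curl curl-frames, :wiggle wiggle-frames} observed-frames)
#+end_src

Because every quantity above is expressed in the creature's own joint coordinates, rotating the camera (or the whole worm) changes nothing in the computation.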
rlm@511 166 +# because the underlying language is fundamentally orientation-independent
rlm@511 167 +
rlm@511 168 +# recognize the actions of a worm with 95\% accuracy. The
rlm@511 169 +# recognition tasks
rlm@511 170
rlm@511 171 - I propose a system that can express the types of recognition
rlm@511 172 - problems above in a form amenable to computation. It is split into
rlm@511 173 +
rlm@511 174 +
rlm@511 175 +
rlm@511 176 + [Talk about these results and what you find promising about them]
rlm@511 177 +
rlm@511 178 +** Roadmap
rlm@511 179 + [I'm going to explain how =CORTEX= works, then break down how
rlm@511 180 + =EMPATH= does its thing. Because the details reveal such-and-such
rlm@511 181 + about the approach.]
rlm@511 182 +
rlm@511 183 + # The success of this simple proof-of-concept offers a tantalizing
rlm@511 184 +
rlm@511 185 +
rlm@511 186 + # explore the idea
rlm@511 187 + # The key contribution of this thesis is the idea that body-centered
rlm@511 188 + # representations (which express
rlm@511 189 +
rlm@511 190 +
rlm@511 191 + # the
rlm@511 192 + # body-centered approach --- in which I try to determine what's
rlm@511 193 + # happening in a scene by bringing it into registration with my own
rlm@511 194 + # bodily experiences --- are indispensible for recognizing what
rlm@511 195 + # creatures are doing in a scene.
rlm@511 196 +
rlm@511 197 +* COMMENT
rlm@511 198 +# body-centered language
rlm@511 199 +
rlm@511 200 + In this thesis, I'll describe =EMPATH=, which solves a certain
rlm@511 201 + class of recognition problems.
rlm@511 202 +
rlm@511 203 + The key idea is to use self-centered (or first-person) language.
rlm@511 204 +
rlm@511 205 + I have built a system that can express the types of recognition
rlm@511 206 + problems in a form amenable to computation. It is split into
rlm@511 207 four parts:
rlm@511 208
rlm@511 209 - Free/Guided Play :: The creature moves around and experiences the
rlm@511 210 @@ -286,14 +383,14 @@
rlm@511 211 code to create a creature, and can use a wide library of
rlm@511 212 pre-existing blender models as a base for your own creatures.
rlm@511 213
rlm@511 214 - - =CORTEX= implements a wide variety of senses, including touch,
rlm@511 215 + - =CORTEX= implements a wide variety of senses: touch,
rlm@511 216 proprioception, vision, hearing, and muscle tension. Complicated
rlm@511 217 senses like touch and vision involve multiple sensory elements
rlm@511 218 embedded in a 2D surface. You have complete control over the
rlm@511 219 distribution of these sensor elements through the use of simple
rlm@511 220 png image files. In particular, =CORTEX= implements more
rlm@511 221 comprehensive hearing than any other creature simulation system
rlm@511 222 - available.
rlm@511 223 + available.
rlm@511 224
rlm@511 225 - =CORTEX= supports any number of creatures and any number of
rlm@511 226 senses. Time in =CORTEX= dilates so that the simulated creatures
rlm@511 227 @@ -353,7 +450,24 @@
rlm@511 228 \end{sidewaysfigure}
rlm@511 229 #+END_LaTeX
rlm@511 230
rlm@511 231 -** Contributions
rlm@511 232 +** Road map
rlm@511 233 +
rlm@511 234 + By the end of this thesis, you will have seen a novel approach to
rlm@511 235 + interpreting video using embodiment and empathy. You will have also
rlm@511 236 + seen one way to efficiently implement empathy for embodied
rlm@511 237 + creatures. Finally, you will become familiar with =CORTEX=, a system
rlm@511 238 + for designing and simulating creatures with rich senses, which you
rlm@511 239 + may choose to use in your own research.
rlm@511 240 +
rlm@511 241 + This is the core vision of my thesis: That one of the important ways
rlm@511 242 + in which we understand others is by imagining ourselves in their
rlm@511 243 + position and empathically feeling experiences relative to our own
rlm@511 244 + bodies. By understanding events in terms of our own previous
rlm@511 245 + corporeal experience, we greatly constrain the possibilities of what
rlm@511 246 + would otherwise be an unwieldy exponential search. This extra
rlm@511 247 + constraint can be the difference between easily understanding what
rlm@511 248 + is happening in a video and being completely lost in a sea of
rlm@511 249 + incomprehensible color and movement.
rlm@511 250
rlm@511 251 - I built =CORTEX=, a comprehensive platform for embodied AI
rlm@511 252 experiments. =CORTEX= supports many features lacking in other
rlm@511 253 @@ -363,18 +477,22 @@
rlm@511 254 - I built =EMPATH=, which uses =CORTEX= to identify the actions of
rlm@511 255 a worm-like creature using a computational model of empathy.
rlm@511 256
rlm@511 257 -* Building =CORTEX=
rlm@511 258 -
rlm@511 259 - I intend for =CORTEX= to be used as a general-purpose library for
rlm@511 260 - building creatures and outfitting them with senses, so that it will
rlm@511 261 - be useful for other researchers who want to test out ideas of their
rlm@511 262 - own. To this end, wherver I have had to make archetictural choices
rlm@511 263 - about =CORTEX=, I have chosen to give as much freedom to the user as
rlm@511 264 - possible, so that =CORTEX= may be used for things I have not
rlm@511 265 - forseen.
rlm@511 266 -
rlm@511 267 -** Simulation or Reality?
rlm@511 268 -
rlm@511 269 +
rlm@511 270 +* Designing =CORTEX=
rlm@511 271 + In this section, I outline the design decisions that went into
rlm@511 272 + making =CORTEX=, along with some details about its
rlm@511 273 + implementation. (A practical guide to getting started with =CORTEX=,
rlm@511 274 + which skips over the history and implementation details presented
rlm@511 275 + here, is provided in an appendix \ref{} at the end of this thesis.)
rlm@511 276 +
rlm@511 277 + Throughout this project, I intended for =CORTEX= to be flexible and
rlm@511 278 + extensible enough to be useful for other researchers who want to
rlm@511 279 + test out ideas of their own. To this end, wherever I have had to make
rlm@511 280 + architectural choices about =CORTEX=, I have chosen to give as much
rlm@511 281 + freedom to the user as possible, so that =CORTEX= may be used for
rlm@511 282 + things I have not foreseen.
rlm@511 283 +
rlm@511 284 +** Building in simulation versus reality
rlm@511 285 The most important architectural decision of all is the choice to
rlm@511 286 use a computer-simulated environment in the first place! The world
rlm@511 287 is a vast and rich place, and for now simulations are a very poor
rlm@511 288 @@ -436,7 +554,7 @@
rlm@511 289 doing everything in software is far cheaper than building custom
rlm@511 290 real-time hardware. All you need is a laptop and some patience.
rlm@511 291
rlm@511 292 -** Because of Time, simulation is perferable to reality
rlm@511 293 +** Simulated time enables rapid prototyping and complex scenes
rlm@511 294
rlm@511 295 I envision =CORTEX= being used to support rapid prototyping and
rlm@511 296 iteration of ideas. Even if I could put together a well constructed
rlm@511 297 @@ -459,8 +577,8 @@
rlm@511 298 simulations of very simple creatures in =CORTEX= generally run at
rlm@511 299 40x on my machine!
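As an illustration of what simulated time buys here (a hypothetical sketch, not =CORTEX='s actual update loop): the world advances by a fixed simulated timestep per update, so a creature's subjective time stays consistent whether one update takes a millisecond or a minute of wall-clock time.

#+begin_src clojure
;; Hypothetical sketch: fixed simulated timestep, decoupled from
;; wall-clock time. step-world is assumed to advance physics and
;; senses by dt seconds of simulated time.
(def simulated-dt (/ 1.0 60.0)) ; simulated seconds per update

(defn advance
  "Run n updates; returns [new-world simulated-seconds-elapsed].
  Elapsed simulated time depends only on n, never on how long the
  computation actually took."
  [step-world world n]
  (loop [w world, i 0]
    (if (= i n)
      [w (* n simulated-dt)]
      (recur (step-world w simulated-dt) (inc i)))))
#+end_src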
rlm@511 300
rlm@511 301 -** What is a sense?
rlm@511 302 -
rlm@511 303 +** All sense organs are two-dimensional surfaces
rlm@511 304 +# What is a sense?
rlm@511 305 If =CORTEX= is to support a wide variety of senses, it would help
rlm@511 306 to have a better understanding of what a ``sense'' actually is!
rlm@511 307 While vision, touch, and hearing all seem like they are quite
rlm@511 308 @@ -956,7 +1074,7 @@
rlm@511 309 #+ATTR_LaTeX: :width 15cm
rlm@511 310 [[./images/physical-hand.png]]
rlm@511 311
rlm@511 312 -** Eyes reuse standard video game components
rlm@511 313 +** Sight reuses standard video game components...
rlm@511 314
rlm@511 315 Vision is one of the most important senses for humans, so I need to
rlm@511 316 build a simulated sense of vision for my AI. I will do this with
rlm@511 317 @@ -1257,8 +1375,8 @@
rlm@511 318 community and is now (in modified form) part of a system for
rlm@511 319 capturing in-game video to a file.
rlm@511 320
rlm@511 321 -** Hearing is hard; =CORTEX= does it right
rlm@511 322 -
rlm@511 323 +** ...but hearing must be built from scratch
rlm@511 324 +# is hard; =CORTEX= does it right
rlm@511 325 At the end of this section I will have simulated ears that work the
rlm@511 326 same way as the simulated eyes in the last section. I will be able to
rlm@511 327 place any number of ear-nodes in a blender file, and they will bind to
rlm@511 328 @@ -1565,7 +1683,7 @@
rlm@511 329 jMonkeyEngine3 community and is used to record audio for demo
rlm@511 330 videos.
rlm@511 331
rlm@511 332 -** Touch uses hundreds of hair-like elements
rlm@511 333 +** Hundreds of hair-like elements provide a sense of touch
rlm@511 334
rlm@511 335 Touch is critical to navigation and spatial reasoning and as such I
rlm@511 336 need a simulated version of it to give to my AI creatures.
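A rough sketch of the hair-like-element idea (hypothetical names; not the =CORTEX= implementation, only the activation bookkeeping): each feeler is a short probe anchored to the body surface, and its activation grows as an object intrudes along its length.

#+begin_src clojure
;; Hypothetical sketch of hair-like touch elements ("feelers").
(defn feeler-activation
  "0.0 when nothing touches the feeler; 1.0 when an object is pressed
  flush against its base."
  [max-length free-distance]
  (/ (- max-length (min free-distance max-length)) max-length))

(defn touch-frame
  "Vector of activations, one entry per feeler. free-distance-fn is
  assumed to report how far feeler i extends before hitting anything,
  capped at max-length."
  [max-length free-distance-fn n-feelers]
  (mapv #(feeler-activation max-length (free-distance-fn %))
        (range n-feelers)))
#+end_src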
rlm@511 337 @@ -2059,7 +2177,7 @@
rlm@511 338 #+ATTR_LaTeX: :width 15cm
rlm@511 339 [[./images/touch-cube.png]]
rlm@511 340
rlm@511 341 -** Proprioception is the sense that makes everything ``real''
rlm@511 342 +** Proprioception provides knowledge of your own body's position
rlm@511 343
rlm@511 344 Close your eyes, and touch your nose with your right index finger.
rlm@511 345 How did you do it? You could not see your hand, and neither your
rlm@511 346 @@ -2193,7 +2311,7 @@
rlm@511 347 #+ATTR_LaTeX: :width 11cm
rlm@511 348 [[./images/proprio.png]]
rlm@511 349
rlm@511 350 -** Muscles are both effectors and sensors
rlm@511 351 +** Muscles contain both sensors and effectors
rlm@511 352
rlm@511 353 Surprisingly enough, terrestrial creatures only move by using
rlm@511 354 torque applied about their joints. There's not a single straight
rlm@511 355 @@ -2440,7 +2558,8 @@
rlm@511 356 hard control problems without worrying about physics or
rlm@511 357 senses.
rlm@511 358
rlm@511 359 -* Empathy in a simulated worm
rlm@511 360 +* =EMPATH=: the simulated worm experiment
rlm@511 361 +# Empathy in a simulated worm
rlm@511 362
rlm@511 363 Here I develop a computational model of empathy, using =CORTEX= as a
rlm@511 364 base. Empathy in this context is the ability to observe another
rlm@511 365 @@ -2732,7 +2851,7 @@
rlm@511 366 provided by an experience vector and reliably inferring the rest of
rlm@511 367 the senses.
rlm@511 368
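A hypothetical sketch of this kind of completion (illustrative only, and much simpler than the empathy listings in the thesis): if \Phi-space is kept as a collection of complete experience maps, a partial experience containing only proprioception can be filled in by retrieving the stored experience whose proprioceptive component is closest.

#+begin_src clojure
;; Hypothetical sketch: completing a partial experience from Phi-space.
(defn proprio-distance
  "Sum of absolute differences between two proprioceptive signatures."
  [p1 p2]
  (reduce + (map (fn [a b] (Math/abs (double (- a b)))) p1 p2)))

(defn complete-experience
  "phi-space is a collection of complete experience maps, e.g.
  {:proprioception [...], :touch [...], :muscle [...]}. Returns the
  stored experience whose proprioceptive component is closest to the
  partial experience, thereby filling in the unobserved senses."
  [phi-space partial-experience]
  (apply min-key
         #(proprio-distance (:proprioception %)
                            (:proprioception partial-experience))
         phi-space))
#+end_src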
rlm@511 369 -** Empathy is the process of tracing though \Phi-space
rlm@511 370 +** ``Empathy'' requires retracing steps through \Phi-space
rlm@511 371
rlm@511 372 Here is the core of a basic empathy algorithm, starting with an
rlm@511 373 experience vector:
rlm@511 374 @@ -2888,7 +3007,7 @@
rlm@511 375 #+end_src
rlm@511 376 #+end_listing
rlm@511 377
rlm@511 378 -** Efficient action recognition with =EMPATH=
rlm@511 379 +** =EMPATH= recognizes actions efficiently
rlm@511 380
rlm@511 381 To use =EMPATH= with the worm, I first need to gather a set of
rlm@511 382 experiences from the worm that includes the actions I want to
rlm@511 383 @@ -3044,9 +3163,9 @@
rlm@511 384 to interpretation, and disagreement between empathy and experience
rlm@511 385 is more excusable.
rlm@511 386
rlm@511 387 -** Digression: bootstrapping touch using free exploration
rlm@511 388 -
rlm@511 389 - In the previous section I showed how to compute actions in terms of
rlm@511 390 +** Digression: Learning the touch sensor layout through haptic experimentation
rlm@511 391 +# Bootstrapping touch using free exploration
rlm@511 392 +   In the previous section I showed how to compute actions in terms of
rlm@511 393 body-centered predicates which relied on average touch activation of
rlm@511 394 pre-defined regions of the worm's skin. What if, instead of receiving
rlm@511 395 touch pre-grouped into the six faces of each worm segment, the true
rlm@511 396 @@ -3210,13 +3329,14 @@
rlm@511 397
rlm@511 398 In this thesis you have seen the =CORTEX= system, a complete
rlm@511 399 environment for creating simulated creatures. You have seen how to
rlm@511 400 - implement five senses including touch, proprioception, hearing,
rlm@511 401 - vision, and muscle tension. You have seen how to create new creatues
rlm@511 402 - using blender, a 3D modeling tool. I hope that =CORTEX= will be
rlm@511 403 - useful in further research projects. To this end I have included the
rlm@511 404 - full source to =CORTEX= along with a large suite of tests and
rlm@511 405 - examples. I have also created a user guide for =CORTEX= which is
rlm@511 406 - inculded in an appendix to this thesis.
rlm@511 407 + implement five senses: touch, proprioception, hearing, vision, and
rlm@511 408 + muscle tension. You have seen how to create new creatures using
rlm@511 409 + blender, a 3D modeling tool. I hope that =CORTEX= will be useful in
rlm@511 410 + further research projects. To this end I have included the full
rlm@511 411 + source to =CORTEX= along with a large suite of tests and examples. I
rlm@511 412 + have also created a user guide for =CORTEX=, which is included in an
rlm@511 413 + appendix to this thesis \ref{}.
rlm@511 414 +# dxh: todo reference appendix
rlm@511 415
rlm@511 416 You have also seen how I used =CORTEX= as a platform to attack the
rlm@511 417 /action recognition/ problem, which is the problem of recognizing