view thesis/dylan-cortex-diff.diff @ 513:4c4d45f6f30b
accept/reject changes
author | Robert McIntyre <rlm@mit.edu> |
---|---|
date | Sun, 30 Mar 2014 10:41:18 -0400 |
parents | |
children | 447c3c8405a2 |
diff -r f639e2139ce2 thesis/cortex.org
--- a/thesis/cortex.org Sun Mar 30 01:34:43 2014 -0400
+++ b/thesis/cortex.org Sun Mar 30 10:07:17 2014 -0400
@@ -41,49 +41,46 @@
 [[./images/aurellem-gray.png]]


-* Empathy and Embodiment as problem solving strategies
+* Empathy \& Embodiment: problem solving strategies

- By the end of this thesis, you will have seen a novel approach to
- interpreting video using embodiment and empathy. You will have also
- seen one way to efficiently implement empathy for embodied
- creatures. Finally, you will become familiar with =CORTEX=, a system
- for designing and simulating creatures with rich senses, which you
- may choose to use in your own research.
-
- This is the core vision of my thesis: That one of the important ways
- in which we understand others is by imagining ourselves in their
- position and emphatically feeling experiences relative to our own
- bodies. By understanding events in terms of our own previous
- corporeal experience, we greatly constrain the possibilities of what
- would otherwise be an unwieldy exponential search. This extra
- constraint can be the difference between easily understanding what
- is happening in a video and being completely lost in a sea of
- incomprehensible color and movement.
-
-** Recognizing actions in video is extremely difficult
-
- Consider for example the problem of determining what is happening
- in a video of which this is one frame:
-
+** The problem: recognizing actions in video is extremely difficult
+# developing / requires useful representations
+
+ Examine the following collection of images. As you, and indeed very
+ young children, can easily determine, each one is a picture of
+ someone drinking.
+
+ # dxh: cat, cup, drinking fountain, rain, straw, coconut
 #+caption: A cat drinking some water. Identifying this action is
- #+caption: beyond the state of the art for computers.
+ #+caption: beyond the capabilities of existing computer vision systems.
 #+ATTR_LaTeX: :width 7cm
 [[./images/cat-drinking.jpg]]
+
+ Nevertheless, it is beyond the state of the art for a computer
+ vision program to describe what's happening in each of these
+ images, or what's common to them. Part of the problem is that many
+ computer vision systems focus on pixel-level details or probability
+ distributions of pixels, with little focus on [...]
+
+
+ In fact, the contents of scene may have much less to do with pixel
+ probabilities than with recognizing various affordances: things you
+ can move, objects you can grasp, spaces that can be filled
+ (Gibson). For example, what processes might enable you to see the
+ chair in figure \ref{hidden-chair}?
+ # Or suppose that you are building a program that recognizes chairs.
+ # How could you ``see'' the chair ?

- It is currently impossible for any computer program to reliably
- label such a video as ``drinking''. And rightly so -- it is a very
- hard problem! What features can you describe in terms of low level
- functions of pixels that can even begin to describe at a high level
- what is happening here?
-
- Or suppose that you are building a program that recognizes chairs.
- How could you ``see'' the chair in figure \ref{hidden-chair}?
-
+ # dxh: blur chair
 #+caption: The chair in this image is quite obvious to humans, but I
 #+caption: doubt that any modern computer vision program can find it.
 #+name: hidden-chair
 #+ATTR_LaTeX: :width 10cm
 [[./images/fat-person-sitting-at-desk.jpg]]
+
+
+
+

 Finally, how is it that you can easily tell the difference between
 how the girls /muscles/ are working in figure \ref{girl}?
@@ -95,10 +92,13 @@
 #+ATTR_LaTeX: :width 7cm
 [[./images/wall-push.png]]

+
+
+
 Each of these examples tells us something about what might be going
 on in our minds as we easily solve these recognition problems.

- The hidden chairs show us that we are strongly triggered by cues
+ The hidden chair shows us that we are strongly triggered by cues
 relating to the position of human bodies, and that we can determine
 the overall physical configuration of a human body even if much of
 that body is occluded.
@@ -109,10 +109,107 @@
 most positions, and we can easily project this self-knowledge to
 imagined positions triggered by images of the human body.

-** =EMPATH= neatly solves recognition problems
+** A step forward: the sensorimotor-centered approach
+# ** =EMPATH= recognizes what creatures are doing
+# neatly solves recognition problems
+ In this thesis, I explore the idea that our knowledge of our own
+ bodies enables us to recognize the actions of others.
+
+ First, I built a system for constructing virtual creatures with
+ physiologically plausible sensorimotor systems and detailed
+ environments. The result is =CORTEX=, which is described in section
+ \ref{sec-2}. (=CORTEX= was built to be flexible and useful to other
+ AI researchers; it is provided in full with detailed instructions
+ on the web [here].)
+
+ Next, I wrote routines which enabled a simple worm-like creature to
+ infer the actions of a second worm-like creature, using only its
+ own prior sensorimotor experiences and knowledge of the second
+ worm's joint positions. This program, =EMPATH=, is described in
+ section \ref{sec-3}, and the key results of this experiment are
+ summarized below.
+
+ #+caption: From only \emph{proprioceptive} data, =EMPATH= was able to infer
+ #+caption: the complete sensory experience and classify these four poses.
+ #+caption: The last image is a composite, depicting the intermediate stages of \emph{wriggling}.
+ #+name: worm-recognition-intro-2
+ #+ATTR_LaTeX: :width 15cm
+ [[./images/empathy-1.png]]
+
+ # =CORTEX= provides a language for describing the sensorimotor
+ # experiences of various creatures.
+
+ # Next, I developed an experiment to test the power of =CORTEX='s
+ # sensorimotor-centered language for solving recognition problems. As
+ # a proof of concept, I wrote routines which enabled a simple
+ # worm-like creature to infer the actions of a second worm-like
+ # creature, using only its own previous sensorimotor experiences and
+ # knowledge of the second worm's joints (figure
+ # \ref{worm-recognition-intro-2}). The result of this proof of
+ # concept was the program =EMPATH=, described in section
+ # \ref{sec-3}. The key results of this
+
+ # Using only first-person sensorimotor experiences and third-person
+ # proprioceptive data,
+
+*** Key results
+ - After one-shot supervised training, =EMPATH= was able recognize a
+ wide variety of static poses and dynamic actions---ranging from
+ curling in a circle to wriggling with a particular frequency ---
+ with 95\% accuracy.
+ - These results were completely independent of viewing angle
+ because the underlying body-centered language fundamentally is;
+ once an action is learned, it can be recognized equally well from
+ any viewing angle.
+ - =EMPATH= is surprisingly short; the sensorimotor-centered
+ language provided by =CORTEX= resulted in extremely economical
+ recognition routines --- about 0000 lines in all --- suggesting
+ that such representations are very powerful, and often
+ indispensible for the types of recognition tasks considered here.
+ - Although for expediency's sake, I relied on direct knowledge of
+ joint positions in this proof of concept, it would be
+ straightforward to extend =EMPATH= so that it (more
+ realistically) infers joint positions from its visual data.
+
+# because the underlying language is fundamentally orientation-independent
+
+# recognize the actions of a worm with 95\% accuracy. The
+# recognition tasks

- I propose a system that can express the types of recognition
- problems above in a form amenable to computation. It is split into
+
+
+
+ [Talk about these results and what you find promising about them]
+
+** Roadmap
+ [I'm going to explain how =CORTEX= works, then break down how
+ =EMPATH= does its thing. Because the details reveal such-and-such
+ about the approach.]
+
+ # The success of this simple proof-of-concept offers a tantalizing
+
+
+ # explore the idea
+ # The key contribution of this thesis is the idea that body-centered
+ # representations (which express
+
+
+ # the
+ # body-centered approach --- in which I try to determine what's
+ # happening in a scene by bringing it into registration with my own
+ # bodily experiences --- are indispensible for recognizing what
+ # creatures are doing in a scene.
+
+* COMMENT
+# body-centered language
+
+ In this thesis, I'll describe =EMPATH=, which solves a certain
+ class of recognition problems
+
+ The key idea is to use self-centered (or first-person) language.
+
+ I have built a system that can express the types of recognition
+ problems in a form amenable to computation. It is split into
 four parts:

 - Free/Guided Play :: The creature moves around and experiences the
@@ -286,14 +383,14 @@
 code to create a creature, and can use a wide library of
 pre-existing blender models as a base for your own creatures.

- - =CORTEX= implements a wide variety of senses, including touch,
+ - =CORTEX= implements a wide variety of senses: touch,
 proprioception, vision, hearing, and muscle tension. Complicated
 senses like touch, and vision involve multiple sensory elements
 embedded in a 2D surface. You have complete control over the
 distribution of these sensor elements through the use of simple
 png image files. In particular, =CORTEX= implements more
 comprehensive hearing than any other creature simulation system
- available.
+ available.

 - =CORTEX= supports any number of creatures and any number of
 senses. Time in =CORTEX= dialates so that the simulated creatures
@@ -353,7 +450,24 @@
 \end{sidewaysfigure}
 #+END_LaTeX

-** Contributions
+** Road map
+
+ By the end of this thesis, you will have seen a novel approach to
+ interpreting video using embodiment and empathy. You will have also
+ seen one way to efficiently implement empathy for embodied
+ creatures. Finally, you will become familiar with =CORTEX=, a system
+ for designing and simulating creatures with rich senses, which you
+ may choose to use in your own research.
+
+ This is the core vision of my thesis: That one of the important ways
+ in which we understand others is by imagining ourselves in their
+ position and emphatically feeling experiences relative to our own
+ bodies. By understanding events in terms of our own previous
+ corporeal experience, we greatly constrain the possibilities of what
+ would otherwise be an unwieldy exponential search. This extra
+ constraint can be the difference between easily understanding what
+ is happening in a video and being completely lost in a sea of
+ incomprehensible color and movement.

 - I built =CORTEX=, a comprehensive platform for embodied AI
 experiments. =CORTEX= supports many features lacking in other
@@ -363,18 +477,22 @@
 - I built =EMPATH=, which uses =CORTEX= to identify the actions of
 a worm-like creature using a computational model of empathy.

-* Building =CORTEX=
-
- I intend for =CORTEX= to be used as a general-purpose library for
- building creatures and outfitting them with senses, so that it will
- be useful for other researchers who want to test out ideas of their
- own. To this end, wherver I have had to make archetictural choices
- about =CORTEX=, I have chosen to give as much freedom to the user as
- possible, so that =CORTEX= may be used for things I have not
- forseen.
-
-** Simulation or Reality?
-
+
+* Designing =CORTEX=
+ In this section, I outline the design decisions that went into
+ making =CORTEX=, along with some details about its
+ implementation. (A practical guide to getting started with =CORTEX=,
+ which skips over the history and implementation details presented
+ here, is provided in an appendix \ref{} at the end of this paper.)
+
+ Throughout this project, I intended for =CORTEX= to be flexible and
+ extensible enough to be useful for other researchers who want to
+ test out ideas of their own. To this end, wherver I have had to make
+ archetictural choices about =CORTEX=, I have chosen to give as much
+ freedom to the user as possible, so that =CORTEX= may be used for
+ things I have not forseen.
+
+** Building in simulation versus reality
 The most important archetictural decision of all is the choice to
 use a computer-simulated environemnt in the first place! The world
 is a vast and rich place, and for now simulations are a very poor
@@ -436,7 +554,7 @@
 doing everything in software is far cheaper than building custom
 real-time hardware. All you need is a laptop and some patience.

-** Because of Time, simulation is perferable to reality
+** Simulated time enables rapid prototyping and complex scenes

 I envision =CORTEX= being used to support rapid prototyping and
 iteration of ideas. Even if I could put together a well constructed
@@ -459,8 +577,8 @@
 simulations of very simple creatures in =CORTEX= generally run at
 40x on my machine!

-** What is a sense?
-
+** All sense organs are two-dimensional surfaces
+# What is a sense?
 If =CORTEX= is to support a wide variety of senses, it would help
 to have a better understanding of what a ``sense'' actually is!
 While vision, touch, and hearing all seem like they are quite
@@ -956,7 +1074,7 @@
 #+ATTR_LaTeX: :width 15cm
 [[./images/physical-hand.png]]

-** Eyes reuse standard video game components
+** Sight reuses standard video game components...

 Vision is one of the most important senses for humans, so I need to
 build a simulated sense of vision for my AI. I will do this with
@@ -1257,8 +1375,8 @@
 community and is now (in modified form) part of a system for
 capturing in-game video to a file.

-** Hearing is hard; =CORTEX= does it right
-
+** ...but hearing must be built from scratch
+# is hard; =CORTEX= does it right
 At the end of this section I will have simulated ears that work the
 same way as the simulated eyes in the last section. I will be able to
 place any number of ear-nodes in a blender file, and they will bind to
@@ -1565,7 +1683,7 @@
 jMonkeyEngine3 community and is used to record audio for demo
 videos.

-** Touch uses hundreds of hair-like elements
+** Hundreds of hair-like elements provide a sense of touch

 Touch is critical to navigation and spatial reasoning and as such I
 need a simulated version of it to give to my AI creatures.
@@ -2059,7 +2177,7 @@
 #+ATTR_LaTeX: :width 15cm
 [[./images/touch-cube.png]]

-** Proprioception is the sense that makes everything ``real''
+** Proprioception provides knowledge of your own body's position

 Close your eyes, and touch your nose with your right index finger.
 How did you do it? You could not see your hand, and neither your
@@ -2193,7 +2311,7 @@
 #+ATTR_LaTeX: :width 11cm
 [[./images/proprio.png]]

-** Muscles are both effectors and sensors
+** Muscles contain both sensors and effectors

 Surprisingly enough, terrestrial creatures only move by using
 torque applied about their joints. There's not a single straight
@@ -2440,7 +2558,8 @@
 hard control problems without worrying about physics or
 senses.

-* Empathy in a simulated worm
+* =EMPATH=: the simulated worm experiment
+# Empathy in a simulated worm

 Here I develop a computational model of empathy, using =CORTEX= as a
 base. Empathy in this context is the ability to observe another
@@ -2732,7 +2851,7 @@
 provided by an experience vector and reliably infering the rest of
 the senses.

-** Empathy is the process of tracing though \Phi-space
+** ``Empathy'' requires retracing steps though \Phi-space

 Here is the core of a basic empathy algorithm, starting with an
 experience vector:
@@ -2888,7 +3007,7 @@
 #+end_src
 #+end_listing

-** Efficient action recognition with =EMPATH=
+** =EMPATH= recognizes actions efficiently

 To use =EMPATH= with the worm, I first need to gather a set of
 experiences from the worm that includes the actions I want to
@@ -3044,9 +3163,9 @@
 to interpretation, and dissaggrement between empathy and experience
 is more excusable.

-** Digression: bootstrapping touch using free exploration
-
- In the previous section I showed how to compute actions in terms of
+** Digression: Learn touch sensor layout through haptic experimentation, instead
+# Boostraping touch using free exploration
+In the previous section I showed how to compute actions in terms of
 body-centered predicates which relied averate touch activation of
 pre-defined regions of the worm's skin. What if, instead of recieving
 touch pre-grouped into the six faces of each worm segment, the true