view thesis/dylan-cortex-diff.diff @ 535:8a5abd51cd4f
add example / discussion per Winston's request.
| author | Robert McIntyre <rlm@mit.edu> |
|---|---|
| date | Sun, 27 Apr 2014 20:25:22 -0400 |
| parents | 90b236381642 |
| children | |
diff -r f639e2139ce2 thesis/cortex.org
--- a/thesis/cortex.org	Sun Mar 30 01:34:43 2014 -0400
+++ b/thesis/cortex.org	Sun Mar 30 10:07:17 2014 -0400
@@ -41,49 +41,46 @@
   [[./images/aurellem-gray.png]]


-* Empathy and Embodiment as problem solving strategies
+* Empathy \& Embodiment: problem solving strategies

-  By the end of this thesis, you will have seen a novel approach to
-  interpreting video using embodiment and empathy. You will have also
-  seen one way to efficiently implement empathy for embodied
-  creatures. Finally, you will become familiar with =CORTEX=, a system
-  for designing and simulating creatures with rich senses, which you
-  may choose to use in your own research.
-
-  This is the core vision of my thesis: that one of the important ways
-  in which we understand others is by imagining ourselves in their
-  position and empathically feeling experiences relative to our own
-  bodies. By understanding events in terms of our own previous
-  corporeal experience, we greatly constrain the possibilities of what
-  would otherwise be an unwieldy exponential search. This extra
-  constraint can be the difference between easily understanding what
-  is happening in a video and being completely lost in a sea of
-  incomprehensible color and movement.
-
-** Recognizing actions in video is extremely difficult
-
-   Consider for example the problem of determining what is happening
-   in a video of which this is one frame:
-
+** The problem: recognizing actions in video is extremely difficult
+# developing / requires useful representations
+
+   Examine the following collection of images. As you, and indeed very
+   young children, can easily determine, each one is a picture of
+   someone drinking.
+
+   # dxh: cat, cup, drinking fountain, rain, straw, coconut
   #+caption: A cat drinking some water. Identifying this action is
-  #+caption: beyond the state of the art for computers.
+  #+caption: beyond the capabilities of existing computer vision systems.
   #+ATTR_LaTeX: :width 7cm
   [[./images/cat-drinking.jpg]]
+
+  Nevertheless, it is beyond the state of the art for a computer
+  vision program to describe what's happening in each of these
+  images, or what's common to them. Part of the problem is that many
+  computer vision systems focus on pixel-level details or probability
+  distributions of pixels, with little focus on [...]
+
+
+  In fact, the contents of a scene may have much less to do with pixel
+  probabilities than with recognizing various affordances: things you
+  can move, objects you can grasp, spaces that can be filled
+  (Gibson). For example, what processes might enable you to see the
+  chair in figure \ref{hidden-chair}?
+  # Or suppose that you are building a program that recognizes chairs.
+  # How could you ``see'' the chair?

-  It is currently impossible for any computer program to reliably
-  label such a video as ``drinking''. And rightly so -- it is a very
-  hard problem! What features can you describe in terms of low-level
-  functions of pixels that can even begin to describe at a high level
-  what is happening here?
-
-  Or suppose that you are building a program that recognizes chairs.
-  How could you ``see'' the chair in figure \ref{hidden-chair}?
-
+  # dxh: blur chair
   #+caption: The chair in this image is quite obvious to humans, but I
   #+caption: doubt that any modern computer vision program can find it.
   #+name: hidden-chair
   #+ATTR_LaTeX: :width 10cm
   [[./images/fat-person-sitting-at-desk.jpg]]
+
+
+
+

   Finally, how is it that you can easily tell the difference between
   how the girl's /muscles/ are working in figure \ref{girl}?
@@ -95,10 +92,13 @@
   #+ATTR_LaTeX: :width 7cm
   [[./images/wall-push.png]]

+
+
+
   Each of these examples tells us something about what might be going
   on in our minds as we easily solve these recognition problems.

-  The hidden chairs show us that we are strongly triggered by cues
+  The hidden chair shows us that we are strongly triggered by cues
   relating to the position of human bodies, and that we can determine
   the overall physical configuration of a human body even if much of
   that body is occluded.
@@ -109,10 +109,107 @@
   most positions, and we can easily project this self-knowledge to
   imagined positions triggered by images of the human body.

-** =EMPATH= neatly solves recognition problems
+** A step forward: the sensorimotor-centered approach
+# ** =EMPATH= recognizes what creatures are doing
+# neatly solves recognition problems
+  In this thesis, I explore the idea that our knowledge of our own
+  bodies enables us to recognize the actions of others.
+
+  First, I built a system for constructing virtual creatures with
+  physiologically plausible sensorimotor systems and detailed
+  environments. The result is =CORTEX=, which is described in section
+  \ref{sec-2}. (=CORTEX= was built to be flexible and useful to other
+  AI researchers; it is provided in full with detailed instructions
+  on the web [here].)
+
+  Next, I wrote routines which enabled a simple worm-like creature to
+  infer the actions of a second worm-like creature, using only its
+  own prior sensorimotor experiences and knowledge of the second
+  worm's joint positions. This program, =EMPATH=, is described in
+  section \ref{sec-3}, and the key results of this experiment are
+  summarized below.
+
+  #+caption: From only \emph{proprioceptive} data, =EMPATH= was able to infer
+  #+caption: the complete sensory experience and classify these four poses.
+  #+caption: The last image is a composite, depicting the intermediate stages of \emph{wriggling}.
+  #+name: worm-recognition-intro-2
+  #+ATTR_LaTeX: :width 15cm
+  [[./images/empathy-1.png]]
+
+  # =CORTEX= provides a language for describing the sensorimotor
+  # experiences of various creatures.
+
+  # Next, I developed an experiment to test the power of =CORTEX='s
+  # sensorimotor-centered language for solving recognition problems. As
+  # a proof of concept, I wrote routines which enabled a simple
+  # worm-like creature to infer the actions of a second worm-like
+  # creature, using only its own previous sensorimotor experiences and
+  # knowledge of the second worm's joints (figure
+  # \ref{worm-recognition-intro-2}). The result of this proof of
+  # concept was the program =EMPATH=, described in section
+  # \ref{sec-3}. The key results of this
+
+  # Using only first-person sensorimotor experiences and third-person
+  # proprioceptive data,
+
+*** Key results
+   - After one-shot supervised training, =EMPATH= was able to
+     recognize a wide variety of static poses and dynamic actions ---
+     ranging from curling in a circle to wriggling with a particular
+     frequency --- with 95\% accuracy.
+   - These results were completely independent of viewing angle
+     because the underlying body-centered language fundamentally is;
+     once an action is learned, it can be recognized equally well from
+     any viewing angle.
+   - =EMPATH= is surprisingly short; the sensorimotor-centered
+     language provided by =CORTEX= resulted in extremely economical
+     recognition routines --- about 0000 lines in all --- suggesting
+     that such representations are very powerful, and often
+     indispensable for the types of recognition tasks considered here.
+   - Although, for expediency's sake, I relied on direct knowledge of
+     joint positions in this proof of concept, it would be
+     straightforward to extend =EMPATH= so that it (more
+     realistically) infers joint positions from its visual data.
+
+# because the underlying language is fundamentally orientation-independent
+
+# recognize the actions of a worm with 95\% accuracy. The
+# recognition tasks

-  I propose a system that can express the types of recognition
-  problems above in a form amenable to computation. It is split into
+
+
+
+  [Talk about these results and what you find promising about them]
+
+** Roadmap
+  [I'm going to explain how =CORTEX= works, then break down how
+  =EMPATH= does its thing. Because the details reveal such-and-such
+  about the approach.]
+
+  # The success of this simple proof-of-concept offers a tantalizing
+
+
+  # explore the idea
+  # The key contribution of this thesis is the idea that body-centered
+  # representations (which express
+
+
+  # the
+  # body-centered approach --- in which I try to determine what's
+  # happening in a scene by bringing it into registration with my own
+  # bodily experiences --- are indispensable for recognizing what
+  # creatures are doing in a scene.
+
+* COMMENT
+# body-centered language
+
   In this thesis, I'll describe =EMPATH=, which solves a certain
   class of recognition problems
+
+  The key idea is to use self-centered (or first-person) language.
+
+  I have built a system that can express the types of recognition
+  problems in a form amenable to computation. It is split into
   four parts:

   - Free/Guided Play :: The creature moves around and experiences the
@@ -286,14 +383,14 @@
     code to create a creature, and can use a wide library of
     pre-existing blender models as a base for your own creatures.

-  - =CORTEX= implements a wide variety of senses, including touch,
+  - =CORTEX= implements a wide variety of senses: touch,
     proprioception, vision, hearing, and muscle tension. Complicated
     senses like touch and vision involve multiple sensory elements
     embedded in a 2D surface. You have complete control over the
     distribution of these sensor elements through the use of simple
     png image files. In particular, =CORTEX= implements more
     comprehensive hearing than any other creature simulation system
-    available.
+    available.

   - =CORTEX= supports any number of creatures and any number of
     senses. Time in =CORTEX= dilates so that the simulated creatures
@@ -353,7 +450,24 @@
 \end{sidewaysfigure}
 #+END_LaTeX

-** Contributions
+** Road map
+
+  By the end of this thesis, you will have seen a novel approach to
+  interpreting video using embodiment and empathy. You will have also
+  seen one way to efficiently implement empathy for embodied
+  creatures. Finally, you will become familiar with =CORTEX=, a system
+  for designing and simulating creatures with rich senses, which you
+  may choose to use in your own research.
+
+  This is the core vision of my thesis: that one of the important ways
+  in which we understand others is by imagining ourselves in their
+  position and empathically feeling experiences relative to our own
+  bodies. By understanding events in terms of our own previous
+  corporeal experience, we greatly constrain the possibilities of what
+  would otherwise be an unwieldy exponential search. This extra
+  constraint can be the difference between easily understanding what
+  is happening in a video and being completely lost in a sea of
+  incomprehensible color and movement.

   - I built =CORTEX=, a comprehensive platform for embodied AI
     experiments. =CORTEX= supports many features lacking in other
@@ -363,18 +477,22 @@
   - I built =EMPATH=, which uses =CORTEX= to identify the actions of
     a worm-like creature using a computational model of empathy.

-* Building =CORTEX=
-
-  I intend for =CORTEX= to be used as a general-purpose library for
-  building creatures and outfitting them with senses, so that it will
-  be useful for other researchers who want to test out ideas of their
-  own. To this end, wherever I have had to make architectural choices
-  about =CORTEX=, I have chosen to give as much freedom to the user as
-  possible, so that =CORTEX= may be used for things I have not
-  foreseen.
-
-** Simulation or Reality?
-
+
+* Designing =CORTEX=
+  In this section, I outline the design decisions that went into
+  making =CORTEX=, along with some details about its
+  implementation. (A practical guide to getting started with =CORTEX=,
+  which skips over the history and implementation details presented
+  here, is provided in an appendix \ref{} at the end of this paper.)
+
+  Throughout this project, I intended for =CORTEX= to be flexible and
+  extensible enough to be useful for other researchers who want to
+  test out ideas of their own. To this end, wherever I have had to make
+  architectural choices about =CORTEX=, I have chosen to give as much
+  freedom to the user as possible, so that =CORTEX= may be used for
+  things I have not foreseen.
+
+** Building in simulation versus reality
   The most important architectural decision of all is the choice to
   use a computer-simulated environment in the first place! The world
   is a vast and rich place, and for now simulations are a very poor
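The diff's "Key results" claim that recognition is "completely independent of viewing angle" follows directly from the representation: joint angles never mention the camera, so any camera rotation leaves them unchanged. The following sketch is illustrative only and is not code from the thesis; every name in it is hypothetical. It classifies a pose by nearest neighbour over joint-angle vectors, the body-centered encoding the diff describes.

```python
import math

# Hypothetical illustration (not code from the thesis): poses are
# vectors of joint angles, so recognition works the same way no
# matter where the camera is.

def angle_distance(a, b):
    """Sum of absolute angular differences between two poses,
    wrapping each per-joint difference into (-pi, pi]."""
    total = 0.0
    for x, y in zip(a, b):
        d = (x - y) % (2 * math.pi)
        if d > math.pi:
            d -= 2 * math.pi
        total += abs(d)
    return total

def classify_pose(pose, labeled_poses):
    """Return the label of the stored pose nearest to `pose`."""
    return min(labeled_poses, key=lambda kv: angle_distance(pose, kv[1]))[0]

# Toy "one-shot training" set: one example per action, as joint angles.
training = [
    ("curled",    [1.2, 1.2, 1.2, 1.2]),
    ("straight",  [0.0, 0.0, 0.0, 0.0]),
    ("wriggling", [0.6, -0.6, 0.6, -0.6]),
]

print(classify_pose([0.1, -0.05, 0.0, 0.02], training))  # → straight
```

Dynamic actions such as wriggling would additionally need to compare sequences of such vectors over time, but the viewpoint-independence argument is the same.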
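The diff also states that the user controls the distribution of sensor elements "through the use of simple png image files". One way such a scheme can work is to treat each non-zero pixel of the image as one sensor element on the creature's surface. The sketch below is a guess at that idea, not the thesis's implementation; the tiny bitmap stands in for a decoded PNG so the example needs no imaging library.

```python
# Hypothetical illustration (not code from the thesis): each non-zero
# pixel of a user-supplied image becomes one sensor element at the
# corresponding (u, v) coordinate on the creature's surface.

bitmap = [
    [0, 255, 0],
    [255, 255, 255],
    [0, 255, 0],
]

def sensor_positions(image):
    """Return (u, v) surface coordinates for every non-zero pixel."""
    return [(u, v)
            for v, row in enumerate(image)
            for u, value in enumerate(row)
            if value > 0]

print(len(sensor_positions(bitmap)))  # 5 sensors, in a plus shape
```

Painting a denser cluster of bright pixels would then yield a denser patch of sensors, which is the kind of control over sensor distribution the diff describes.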