view thesis/dxh-cortex-diff.diff @ 512:8b962ab418c8

accept/reject changes
author Robert McIntyre <rlm@mit.edu>
date Sun, 30 Mar 2014 10:39:19 -0400
parents 07c3feb32df3
children
1 diff -r f639e2139ce2 thesis/cortex.org
2 --- a/thesis/cortex.org Sun Mar 30 01:34:43 2014 -0400
3 +++ b/thesis/cortex.org Sun Mar 30 10:07:17 2014 -0400
4 @@ -41,49 +41,46 @@
5 [[./images/aurellem-gray.png]]
8 -* Empathy and Embodiment as problem solving strategies
9 +* Empathy \& Embodiment: problem solving strategies
11 - By the end of this thesis, you will have seen a novel approach to
12 - interpreting video using embodiment and empathy. You will have also
13 - seen one way to efficiently implement empathy for embodied
14 - creatures. Finally, you will become familiar with =CORTEX=, a system
15 - for designing and simulating creatures with rich senses, which you
16 - may choose to use in your own research.
17 -
18 - This is the core vision of my thesis: That one of the important ways
19 - in which we understand others is by imagining ourselves in their
20 - position and emphatically feeling experiences relative to our own
21 - bodies. By understanding events in terms of our own previous
22 - corporeal experience, we greatly constrain the possibilities of what
23 - would otherwise be an unwieldy exponential search. This extra
24 - constraint can be the difference between easily understanding what
25 - is happening in a video and being completely lost in a sea of
26 - incomprehensible color and movement.
27 -
28 -** Recognizing actions in video is extremely difficult
29 -
30 - Consider for example the problem of determining what is happening
31 - in a video of which this is one frame:
32 -
33 +** The problem: recognizing actions in video is extremely difficult
34 +# developing / requires useful representations
35 +
36 + Examine the following collection of images. As you, and indeed very
37 + young children, can easily determine, each one is a picture of
38 + someone drinking.
39 +
40 + # dxh: cat, cup, drinking fountain, rain, straw, coconut
41 #+caption: A cat drinking some water. Identifying this action is
42 - #+caption: beyond the state of the art for computers.
43 + #+caption: beyond the capabilities of existing computer vision systems.
44 #+ATTR_LaTeX: :width 7cm
45 [[./images/cat-drinking.jpg]]
46 +
47 + Nevertheless, it is beyond the state of the art for a computer
48 + vision program to describe what's happening in each of these
49 + images, or what's common to them. Part of the problem is that many
50 + computer vision systems focus on pixel-level details or probability
51 + distributions of pixels, with little focus on [...]
52 +
53 +
54 + In fact, the contents of a scene may have much less to do with pixel
55 + probabilities than with recognizing various affordances: things you
56 + can move, objects you can grasp, spaces that can be filled
57 + (Gibson). For example, what processes might enable you to see the
58 + chair in figure \ref{hidden-chair}?
59 + # Or suppose that you are building a program that recognizes chairs.
60 + # How could you ``see'' the chair ?
62 - It is currently impossible for any computer program to reliably
63 - label such a video as ``drinking''. And rightly so -- it is a very
64 - hard problem! What features can you describe in terms of low level
65 - functions of pixels that can even begin to describe at a high level
66 - what is happening here?
67 -
68 - Or suppose that you are building a program that recognizes chairs.
69 - How could you ``see'' the chair in figure \ref{hidden-chair}?
70 -
71 + # dxh: blur chair
72 #+caption: The chair in this image is quite obvious to humans, but I
73 #+caption: doubt that any modern computer vision program can find it.
74 #+name: hidden-chair
75 #+ATTR_LaTeX: :width 10cm
76 [[./images/fat-person-sitting-at-desk.jpg]]
77 +
78 +
79 +
80 +
82 Finally, how is it that you can easily tell the difference between
83 how the girls /muscles/ are working in figure \ref{girl}?
84 @@ -95,10 +92,13 @@
85 #+ATTR_LaTeX: :width 7cm
86 [[./images/wall-push.png]]
88 +
89 +
90 +
91 Each of these examples tells us something about what might be going
92 on in our minds as we easily solve these recognition problems.
94 - The hidden chairs show us that we are strongly triggered by cues
95 + The hidden chair shows us that we are strongly triggered by cues
96 relating to the position of human bodies, and that we can determine
97 the overall physical configuration of a human body even if much of
98 that body is occluded.
99 @@ -109,10 +109,107 @@
100 most positions, and we can easily project this self-knowledge to
101 imagined positions triggered by images of the human body.
103 -** =EMPATH= neatly solves recognition problems
104 +** A step forward: the sensorimotor-centered approach
105 +# ** =EMPATH= recognizes what creatures are doing
106 +# neatly solves recognition problems
107 + In this thesis, I explore the idea that our knowledge of our own
108 + bodies enables us to recognize the actions of others.
109 +
110 + First, I built a system for constructing virtual creatures with
111 + physiologically plausible sensorimotor systems and detailed
112 + environments. The result is =CORTEX=, which is described in section
113 + \ref{sec-2}. (=CORTEX= was built to be flexible and useful to other
114 + AI researchers; it is provided in full with detailed instructions
115 + on the web [here].)
116 +
117 + Next, I wrote routines which enabled a simple worm-like creature to
118 + infer the actions of a second worm-like creature, using only its
119 + own prior sensorimotor experiences and knowledge of the second
120 + worm's joint positions. This program, =EMPATH=, is described in
121 + section \ref{sec-3}, and the key results of this experiment are
122 + summarized below.
123 +
124 + #+caption: From only \emph{proprioceptive} data, =EMPATH= was able to infer
125 + #+caption: the complete sensory experience and classify these four poses.
126 + #+caption: The last image is a composite, depicting the intermediate stages of \emph{wriggling}.
127 + #+name: worm-recognition-intro-2
128 + #+ATTR_LaTeX: :width 15cm
129 + [[./images/empathy-1.png]]
130 +
131 + # =CORTEX= provides a language for describing the sensorimotor
132 + # experiences of various creatures.
133 +
134 + # Next, I developed an experiment to test the power of =CORTEX='s
135 + # sensorimotor-centered language for solving recognition problems. As
136 + # a proof of concept, I wrote routines which enabled a simple
137 + # worm-like creature to infer the actions of a second worm-like
138 + # creature, using only its own previous sensorimotor experiences and
139 + # knowledge of the second worm's joints (figure
140 + # \ref{worm-recognition-intro-2}). The result of this proof of
141 + # concept was the program =EMPATH=, described in section
142 + # \ref{sec-3}. The key results of this
143 +
144 + # Using only first-person sensorimotor experiences and third-person
145 + # proprioceptive data,
146 +
147 +*** Key results
148 + - After one-shot supervised training, =EMPATH= was able to recognize a
149 + wide variety of static poses and dynamic actions---ranging from
150 + curling in a circle to wriggling with a particular frequency ---
151 + with 95\% accuracy.
152 + - These results were completely independent of viewing angle
153 + because the underlying body-centered language fundamentally is;
154 + once an action is learned, it can be recognized equally well from
155 + any viewing angle.
156 + - =EMPATH= is surprisingly short; the sensorimotor-centered
157 + language provided by =CORTEX= resulted in extremely economical
158 + recognition routines --- about 0000 lines in all --- suggesting
159 + that such representations are very powerful, and often
160 + indispensable for the types of recognition tasks considered here.
161 + - Although for expediency's sake, I relied on direct knowledge of
162 + joint positions in this proof of concept, it would be
163 + straightforward to extend =EMPATH= so that it (more
164 + realistically) infers joint positions from its visual data.
165 +
166 +# because the underlying language is fundamentally orientation-independent
167 +
168 +# recognize the actions of a worm with 95\% accuracy. The
169 +# recognition tasks
171 - I propose a system that can express the types of recognition
172 - problems above in a form amenable to computation. It is split into
173 +
174 +
175 +
176 + [Talk about these results and what you find promising about them]
177 +
178 +** Roadmap
179 + [I'm going to explain how =CORTEX= works, then break down how
180 + =EMPATH= does its thing. Because the details reveal such-and-such
181 + about the approach.]
182 +
183 + # The success of this simple proof-of-concept offers a tantalizing
184 +
185 +
186 + # explore the idea
187 + # The key contribution of this thesis is the idea that body-centered
188 + # representations (which express
189 +
190 +
191 + # the
192 + # body-centered approach --- in which I try to determine what's
193 + # happening in a scene by bringing it into registration with my own
194 + # bodily experiences --- are indispensible for recognizing what
195 + # creatures are doing in a scene.
196 +
197 +* COMMENT
198 +# body-centered language
199 +
200 + In this thesis, I'll describe =EMPATH=, which solves a certain
201 + class of recognition problems
202 +
203 + The key idea is to use self-centered (or first-person) language.
204 +
205 + I have built a system that can express the types of recognition
206 + problems in a form amenable to computation. It is split into
207 four parts:
209 - Free/Guided Play :: The creature moves around and experiences the
210 @@ -286,14 +383,14 @@
211 code to create a creature, and can use a wide library of
212 pre-existing blender models as a base for your own creatures.
214 - - =CORTEX= implements a wide variety of senses, including touch,
215 + - =CORTEX= implements a wide variety of senses: touch,
216 proprioception, vision, hearing, and muscle tension. Complicated
217 senses like touch, and vision involve multiple sensory elements
218 embedded in a 2D surface. You have complete control over the
219 distribution of these sensor elements through the use of simple
220 png image files. In particular, =CORTEX= implements more
221 comprehensive hearing than any other creature simulation system
222 - available.
223 + available.
225 - =CORTEX= supports any number of creatures and any number of
226 senses. Time in =CORTEX= dialates so that the simulated creatures
227 @@ -353,7 +450,24 @@
228 \end{sidewaysfigure}
229 #+END_LaTeX
231 -** Contributions
232 +** Road map
233 +
234 + By the end of this thesis, you will have seen a novel approach to
235 + interpreting video using embodiment and empathy. You will have also
236 + seen one way to efficiently implement empathy for embodied
237 + creatures. Finally, you will become familiar with =CORTEX=, a system
238 + for designing and simulating creatures with rich senses, which you
239 + may choose to use in your own research.
240 +
241 + This is the core vision of my thesis: That one of the important ways
242 + in which we understand others is by imagining ourselves in their
243 + position and empathically feeling experiences relative to our own
244 + bodies. By understanding events in terms of our own previous
245 + corporeal experience, we greatly constrain the possibilities of what
246 + would otherwise be an unwieldy exponential search. This extra
247 + constraint can be the difference between easily understanding what
248 + is happening in a video and being completely lost in a sea of
249 + incomprehensible color and movement.
251 - I built =CORTEX=, a comprehensive platform for embodied AI
252 experiments. =CORTEX= supports many features lacking in other
253 @@ -363,18 +477,22 @@
254 - I built =EMPATH=, which uses =CORTEX= to identify the actions of
255 a worm-like creature using a computational model of empathy.
257 -* Building =CORTEX=
258 -
259 - I intend for =CORTEX= to be used as a general-purpose library for
260 - building creatures and outfitting them with senses, so that it will
261 - be useful for other researchers who want to test out ideas of their
262 - own. To this end, wherver I have had to make archetictural choices
263 - about =CORTEX=, I have chosen to give as much freedom to the user as
264 - possible, so that =CORTEX= may be used for things I have not
265 - forseen.
266 -
267 -** Simulation or Reality?
268 -
269 +
270 +* Designing =CORTEX=
271 + In this section, I outline the design decisions that went into
272 + making =CORTEX=, along with some details about its
273 + implementation. (A practical guide to getting started with =CORTEX=,
274 + which skips over the history and implementation details presented
275 + here, is provided in an appendix \ref{} at the end of this paper.)
276 +
277 + Throughout this project, I intended for =CORTEX= to be flexible and
278 + extensible enough to be useful for other researchers who want to
279 + test out ideas of their own. To this end, wherever I have had to make
280 + architectural choices about =CORTEX=, I have chosen to give as much
281 + freedom to the user as possible, so that =CORTEX= may be used for
282 + things I have not foreseen.
283 +
284 +** Building in simulation versus reality
285 The most important archetictural decision of all is the choice to
286 use a computer-simulated environemnt in the first place! The world
287 is a vast and rich place, and for now simulations are a very poor
288 @@ -436,7 +554,7 @@
289 doing everything in software is far cheaper than building custom
290 real-time hardware. All you need is a laptop and some patience.
292 -** Because of Time, simulation is perferable to reality
293 +** Simulated time enables rapid prototyping and complex scenes
295 I envision =CORTEX= being used to support rapid prototyping and
296 iteration of ideas. Even if I could put together a well constructed
297 @@ -459,8 +577,8 @@
298 simulations of very simple creatures in =CORTEX= generally run at
299 40x on my machine!
301 -** What is a sense?
302 -
303 +** All sense organs are two-dimensional surfaces
304 +# What is a sense?
305 If =CORTEX= is to support a wide variety of senses, it would help
306 to have a better understanding of what a ``sense'' actually is!
307 While vision, touch, and hearing all seem like they are quite
308 @@ -956,7 +1074,7 @@
309 #+ATTR_LaTeX: :width 15cm
310 [[./images/physical-hand.png]]
312 -** Eyes reuse standard video game components
313 +** Sight reuses standard video game components...
315 Vision is one of the most important senses for humans, so I need to
316 build a simulated sense of vision for my AI. I will do this with
317 @@ -1257,8 +1375,8 @@
318 community and is now (in modified form) part of a system for
319 capturing in-game video to a file.
321 -** Hearing is hard; =CORTEX= does it right
322 -
323 +** ...but hearing must be built from scratch
324 +# is hard; =CORTEX= does it right
325 At the end of this section I will have simulated ears that work the
326 same way as the simulated eyes in the last section. I will be able to
327 place any number of ear-nodes in a blender file, and they will bind to
328 @@ -1565,7 +1683,7 @@
329 jMonkeyEngine3 community and is used to record audio for demo
330 videos.
332 -** Touch uses hundreds of hair-like elements
333 +** Hundreds of hair-like elements provide a sense of touch
335 Touch is critical to navigation and spatial reasoning and as such I
336 need a simulated version of it to give to my AI creatures.
337 @@ -2059,7 +2177,7 @@
338 #+ATTR_LaTeX: :width 15cm
339 [[./images/touch-cube.png]]
341 -** Proprioception is the sense that makes everything ``real''
342 +** Proprioception provides knowledge of your own body's position
344 Close your eyes, and touch your nose with your right index finger.
345 How did you do it? You could not see your hand, and neither your
346 @@ -2193,7 +2311,7 @@
347 #+ATTR_LaTeX: :width 11cm
348 [[./images/proprio.png]]
350 -** Muscles are both effectors and sensors
351 +** Muscles contain both sensors and effectors
353 Surprisingly enough, terrestrial creatures only move by using
354 torque applied about their joints. There's not a single straight
355 @@ -2440,7 +2558,8 @@
356 hard control problems without worrying about physics or
357 senses.
359 -* Empathy in a simulated worm
360 +* =EMPATH=: the simulated worm experiment
361 +# Empathy in a simulated worm
363 Here I develop a computational model of empathy, using =CORTEX= as a
364 base. Empathy in this context is the ability to observe another
365 @@ -2732,7 +2851,7 @@
366 provided by an experience vector and reliably infering the rest of
367 the senses.
369 -** Empathy is the process of tracing though \Phi-space
370 +** ``Empathy'' requires retracing steps through \Phi-space
372 Here is the core of a basic empathy algorithm, starting with an
373 experience vector:
374 @@ -2888,7 +3007,7 @@
375 #+end_src
376 #+end_listing
378 -** Efficient action recognition with =EMPATH=
379 +** =EMPATH= recognizes actions efficiently
381 To use =EMPATH= with the worm, I first need to gather a set of
382 experiences from the worm that includes the actions I want to
383 @@ -3044,9 +3163,9 @@
384 to interpretation, and dissaggrement between empathy and experience
385 is more excusable.
387 -** Digression: bootstrapping touch using free exploration
388 -
389 - In the previous section I showed how to compute actions in terms of
390 +** Digression: Learn touch sensor layout through haptic experimentation, instead
391 +# Bootstrapping touch using free exploration
392 +In the previous section I showed how to compute actions in terms of
393 body-centered predicates which relied averate touch activation of
394 pre-defined regions of the worm's skin. What if, instead of recieving
395 touch pre-grouped into the six faces of each worm segment, the true
396 @@ -3210,13 +3329,14 @@
398 In this thesis you have seen the =CORTEX= system, a complete
399 environment for creating simulated creatures. You have seen how to
400 - implement five senses including touch, proprioception, hearing,
401 - vision, and muscle tension. You have seen how to create new creatues
402 - using blender, a 3D modeling tool. I hope that =CORTEX= will be
403 - useful in further research projects. To this end I have included the
404 - full source to =CORTEX= along with a large suite of tests and
405 - examples. I have also created a user guide for =CORTEX= which is
406 - inculded in an appendix to this thesis.
407 + implement five senses: touch, proprioception, hearing, vision, and
408 + muscle tension. You have seen how to create new creatures using
409 + blender, a 3D modeling tool. I hope that =CORTEX= will be useful in
410 + further research projects. To this end I have included the full
411 + source to =CORTEX= along with a large suite of tests and examples. I
412 + have also created a user guide for =CORTEX= which is included in an
413 + appendix to this thesis \ref{}.
414 +# dxh: todo reference appendix
416 You have also seen how I used =CORTEX= as a platform to attach the
417 /action recognition/ problem, which is the problem of recognizing
418 @@ -3234,8 +3354,8 @@
420 - =CORTEX=, a system for creating simulated creatures with rich
421 senses.
422 - - =EMPATH=, a program for recognizing actions by imagining sensory
423 - experience.
424 + - =EMPATH=, a program for recognizing actions by aligning them with
425 + personal sensory experiences.
427 # An anatomical joke:
428 # - Training