view thesis/dylan-cortex-diff.diff @ 535:8a5abd51cd4f

add example / discussion per Winston's request.
author Robert McIntyre <rlm@mit.edu>
date Sun, 27 Apr 2014 20:25:22 -0400
parents 90b236381642
children
1 diff -r f639e2139ce2 thesis/cortex.org
2 --- a/thesis/cortex.org Sun Mar 30 01:34:43 2014 -0400
3 +++ b/thesis/cortex.org Sun Mar 30 10:07:17 2014 -0400
4 @@ -41,49 +41,46 @@
5 [[./images/aurellem-gray.png]]
8 -* Empathy and Embodiment as problem solving strategies
9 +* Empathy \& Embodiment: problem-solving strategies
11 - By the end of this thesis, you will have seen a novel approach to
12 - interpreting video using embodiment and empathy. You will have also
13 - seen one way to efficiently implement empathy for embodied
14 - creatures. Finally, you will become familiar with =CORTEX=, a system
15 - for designing and simulating creatures with rich senses, which you
16 - may choose to use in your own research.
17 -
18 - This is the core vision of my thesis: That one of the important ways
19 - in which we understand others is by imagining ourselves in their
20 - position and emphatically feeling experiences relative to our own
21 - bodies. By understanding events in terms of our own previous
22 - corporeal experience, we greatly constrain the possibilities of what
23 - would otherwise be an unwieldy exponential search. This extra
24 - constraint can be the difference between easily understanding what
25 - is happening in a video and being completely lost in a sea of
26 - incomprehensible color and movement.
27 -
28 -** Recognizing actions in video is extremely difficult
29 -
30 - Consider for example the problem of determining what is happening
31 - in a video of which this is one frame:
32 -
33 +** The problem: recognizing actions in video is extremely difficult
34 +# developing / requires useful representations
35 +
36 + Examine the following collection of images. As you, and indeed very
37 + young children, can easily determine, each one is a picture of
38 + someone drinking.
39 +
40 + # dxh: cat, cup, drinking fountain, rain, straw, coconut
41 #+caption: A cat drinking some water. Identifying this action is
42 - #+caption: beyond the state of the art for computers.
43 + #+caption: beyond the capabilities of existing computer vision systems.
44 #+ATTR_LaTeX: :width 7cm
45 [[./images/cat-drinking.jpg]]
46 +
47 + Nevertheless, it is beyond the state of the art for a computer
48 + vision program to describe what's happening in each of these
49 + images, or what's common to them. Part of the problem is that many
50 + computer vision systems focus on pixel-level details or probability
51 + distributions of pixels, with little focus on [...]
52 +
53 +
54 + In fact, the contents of a scene may have much less to do with pixel
55 + probabilities than with recognizing various affordances: things you
56 + can move, objects you can grasp, spaces that can be filled
57 + (Gibson). For example, what processes might enable you to see the
58 + chair in figure \ref{hidden-chair}?
59 + # Or suppose that you are building a program that recognizes chairs.
60 + # How could you ``see'' the chair ?
62 - It is currently impossible for any computer program to reliably
63 - label such a video as ``drinking''. And rightly so -- it is a very
64 - hard problem! What features can you describe in terms of low level
65 - functions of pixels that can even begin to describe at a high level
66 - what is happening here?
67 -
68 - Or suppose that you are building a program that recognizes chairs.
69 - How could you ``see'' the chair in figure \ref{hidden-chair}?
70 -
71 + # dxh: blur chair
72 #+caption: The chair in this image is quite obvious to humans, but I
73 #+caption: doubt that any modern computer vision program can find it.
74 #+name: hidden-chair
75 #+ATTR_LaTeX: :width 10cm
76 [[./images/fat-person-sitting-at-desk.jpg]]
77 +
78 +
79 +
80 +
82 Finally, how is it that you can easily tell the difference between
83 how the girls /muscles/ are working in figure \ref{girl}?
84 @@ -95,10 +92,13 @@
85 #+ATTR_LaTeX: :width 7cm
86 [[./images/wall-push.png]]
88 +
89 +
90 +
91 Each of these examples tells us something about what might be going
92 on in our minds as we easily solve these recognition problems.
94 - The hidden chairs show us that we are strongly triggered by cues
95 + The hidden chair shows us that we are strongly triggered by cues
96 relating to the position of human bodies, and that we can determine
97 the overall physical configuration of a human body even if much of
98 that body is occluded.
99 @@ -109,10 +109,107 @@
100 most positions, and we can easily project this self-knowledge to
101 imagined positions triggered by images of the human body.
103 -** =EMPATH= neatly solves recognition problems
104 +** A step forward: the sensorimotor-centered approach
105 +# ** =EMPATH= recognizes what creatures are doing
106 +# neatly solves recognition problems
107 + In this thesis, I explore the idea that our knowledge of our own
108 + bodies enables us to recognize the actions of others.
109 +
110 + First, I built a system for constructing virtual creatures with
111 + physiologically plausible sensorimotor systems and detailed
112 + environments. The result is =CORTEX=, which is described in section
113 + \ref{sec-2}. (=CORTEX= was built to be flexible and useful to other
114 + AI researchers; it is provided in full with detailed instructions
115 + on the web [here].)
116 +
117 + Next, I wrote routines which enabled a simple worm-like creature to
118 + infer the actions of a second worm-like creature, using only its
119 + own prior sensorimotor experiences and knowledge of the second
120 + worm's joint positions. This program, =EMPATH=, is described in
121 + section \ref{sec-3}, and the key results of this experiment are
122 + summarized below.
123 +
124 + #+caption: From only \emph{proprioceptive} data, =EMPATH= was able to infer
125 + #+caption: the complete sensory experience and classify these four poses.
126 + #+caption: The last image is a composite, depicting the intermediate stages of \emph{wriggling}.
127 + #+name: worm-recognition-intro-2
128 + #+ATTR_LaTeX: :width 15cm
129 + [[./images/empathy-1.png]]
130 +
131 + # =CORTEX= provides a language for describing the sensorimotor
132 + # experiences of various creatures.
133 +
134 + # Next, I developed an experiment to test the power of =CORTEX='s
135 + # sensorimotor-centered language for solving recognition problems. As
136 + # a proof of concept, I wrote routines which enabled a simple
137 + # worm-like creature to infer the actions of a second worm-like
138 + # creature, using only its own previous sensorimotor experiences and
139 + # knowledge of the second worm's joints (figure
140 + # \ref{worm-recognition-intro-2}). The result of this proof of
141 + # concept was the program =EMPATH=, described in section
142 + # \ref{sec-3}. The key results of this
143 +
144 + # Using only first-person sensorimotor experiences and third-person
145 + # proprioceptive data,
146 +
147 +*** Key results
148 + - After one-shot supervised training, =EMPATH= was able to recognize a
149 + wide variety of static poses and dynamic actions, ranging from
150 + curling in a circle to wriggling with a particular frequency,
151 + with 95\% accuracy.
152 + - These results were completely independent of viewing angle,
153 + because the underlying body-centered language is fundamentally
154 + orientation-independent; once an action is learned, it can be
155 + recognized equally well from any viewing angle.
156 + - =EMPATH= is surprisingly short; the sensorimotor-centered
157 + language provided by =CORTEX= resulted in extremely economical
158 + recognition routines (about 0000 lines in all), suggesting
159 + that such representations are very powerful, and often
160 + indispensable for the types of recognition tasks considered here.
161 + - Although, for expediency's sake, I relied on direct knowledge of
162 + joint positions in this proof of concept, it would be
163 + straightforward to extend =EMPATH= so that it (more
164 + realistically) infers joint positions from its visual data.
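The orientation-independence claim in the key results can be illustrated with a small sketch. It assumes, purely for illustration, that a worm's pose is given as a chain of 3D joint positions; the names `bend_angles`, `rotate_x`, `rotate_z`, and the sample `worm` are hypothetical and are not part of =CORTEX= or =EMPATH=, whose code is not shown in this excerpt. Rotating the whole body (a stand-in for a change of viewpoint) changes every world coordinate, but the angle at each joint stays the same, because rotations preserve lengths and dot products.

```python
import math


def sub(a, b):
    """Component-wise difference of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def bend_angles(joints):
    """Unsigned bend angle (radians) at each interior joint of a chain.

    `joints` is a hypothetical list of 3D joint positions along a worm-like
    body. Each angle depends only on the body's own configuration.
    """
    angles = []
    for prev, cur, nxt in zip(joints, joints[1:], joints[2:]):
        u, v = sub(cur, prev), sub(nxt, cur)
        c = dot(u, v) / (norm(u) * norm(v))
        # Clamp to [-1, 1] so floating-point rounding cannot break acos.
        angles.append(math.acos(max(-1.0, min(1.0, c))))
    return angles


def rotate_z(p, theta):
    """Rotate point p about the z axis by theta radians."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)


def rotate_x(p, theta):
    """Rotate point p about the x axis by theta radians."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (x, c * y - s * z, s * y + c * z)


# Hypothetical pose, and the same pose seen from a different viewpoint
# (modelled here as a rigid rotation of the whole body).
worm = [(0, 0, 0), (1, 0, 0), (2, 1, 0), (2, 2, 1), (1, 3, 1)]
viewed = [rotate_x(rotate_z(p, 0.7), 1.3) for p in worm]

print(worm[1], viewed[1])   # world coordinates differ
print(bend_angles(worm))    # joint angles ...
print(bend_angles(viewed))  # ... agree up to rounding
```

Unsigned bend angles are a deliberately crude feature: they discard twist and the direction of each bend, so a practical body-centered representation would need more than this. The point is only that features expressed relative to the body itself do not depend on where the observer stands.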
165 +
166 +# because the underlying language is fundamentally orientation-independent
167 +
168 +# recognize the actions of a worm with 95\% accuracy. The
169 +# recognition tasks
171 - I propose a system that can express the types of recognition
172 - problems above in a form amenable to computation. It is split into
173 +
174 +
175 +
176 + [Talk about these results and what you find promising about them]
177 +
178 +** Roadmap
179 + [I'm going to explain how =CORTEX= works, then break down how
180 + =EMPATH= does its thing. Because the details reveal such-and-such
181 + about the approach.]
182 +
183 + # The success of this simple proof-of-concept offers a tantalizing
184 +
185 +
186 + # explore the idea
187 + # The key contribution of this thesis is the idea that body-centered
188 + # representations (which express
189 +
190 +
191 + # the
192 + # body-centered approach --- in which I try to determine what's
193 + # happening in a scene by bringing it into registration with my own
194 + # bodily experiences --- are indispensable for recognizing what
195 + # creatures are doing in a scene.
196 +
197 +* COMMENT
198 +# body-centered language
199 +
200 + In this thesis, I'll describe =EMPATH=, which solves a certain
201 + class of recognition problems
202 +
203 + The key idea is to use self-centered (or first-person) language.
204 +
205 + I have built a system that can express the types of recognition
206 + problems in a form amenable to computation. It is split into
207 four parts:
209 - Free/Guided Play :: The creature moves around and experiences the
210 @@ -286,14 +383,14 @@
211 code to create a creature, and can use a wide library of
212 pre-existing blender models as a base for your own creatures.
214 - - =CORTEX= implements a wide variety of senses, including touch,
215 + - =CORTEX= implements a wide variety of senses: touch,
216 proprioception, vision, hearing, and muscle tension. Complicated
217 senses like touch, and vision involve multiple sensory elements
218 embedded in a 2D surface. You have complete control over the
219 distribution of these sensor elements through the use of simple
220 png image files. In particular, =CORTEX= implements more
221 comprehensive hearing than any other creature simulation system
222 - available.
223 + available.
225 - =CORTEX= supports any number of creatures and any number of
226 senses. Time in =CORTEX= dialates so that the simulated creatures
227 @@ -353,7 +450,24 @@
228 \end{sidewaysfigure}
229 #+END_LaTeX
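The =CORTEX= feature list in the previous hunk says that the layout of touch and vision sensor elements on a 2D surface is controlled by simple png image files. A minimal sketch of that idea follows, assuming, hypothetically, that every white pixel marks one sensor element and that the image has already been decoded into rows of 0-255 grayscale values. The real encoding used by =CORTEX= is not given in this excerpt, and `sensor_positions` is an illustrative name, not a =CORTEX= function.

```python
def sensor_positions(image, white=255):
    """Return the (u, v) coordinates of every sensor pixel in a layout image.

    Hypothetical encoding, for illustration only: `image` is a decoded
    grayscale png given as a list of rows of 0-255 ints, and any pixel at
    least as bright as `white` marks one sensor element. u runs left to
    right and v bottom to top, both in [0, 1], sampled at pixel centers.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    positions = []
    for row, pixels in enumerate(image):
        for col, value in enumerate(pixels):
            if value >= white:
                u = (col + 0.5) / width
                v = 1.0 - (row + 0.5) / height  # image rows count downward
                positions.append((u, v))
    return positions


# A 4x4 layout with more sensors along the left edge than elsewhere.
layout = [
    [255, 255, 0, 0],
    [255, 0, 0, 0],
    [255, 255, 0, 0],
    [255, 0, 0, 255],
]
sensors = sensor_positions(layout)
print(len(sensors), sensors[0])  # 7 (0.125, 0.875)
```

Under this assumption, changing sensor density is a matter of painting more or fewer white pixels in a region of the image, with no code changes, which is the kind of user control the feature list describes.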
231 -** Contributions
232 +** Roadmap
233 +
234 + By the end of this thesis, you will have seen a novel approach to
235 + interpreting video using embodiment and empathy. You will have also
236 + seen one way to efficiently implement empathy for embodied
237 + creatures. Finally, you will become familiar with =CORTEX=, a system
238 + for designing and simulating creatures with rich senses, which you
239 + may choose to use in your own research.
240 +
241 + This is the core vision of my thesis: that one of the important ways
242 + in which we understand others is by imagining ourselves in their
243 + position and empathically feeling experiences relative to our own
244 + bodies. By understanding events in terms of our own previous
245 + corporeal experience, we greatly constrain the possibilities of what
246 + would otherwise be an unwieldy exponential search. This extra
247 + constraint can be the difference between easily understanding what
248 + is happening in a video and being completely lost in a sea of
249 + incomprehensible color and movement.
251 - I built =CORTEX=, a comprehensive platform for embodied AI
252 experiments. =CORTEX= supports many features lacking in other
253 @@ -363,18 +477,22 @@
254 - I built =EMPATH=, which uses =CORTEX= to identify the actions of
255 a worm-like creature using a computational model of empathy.
257 -* Building =CORTEX=
258 -
259 - I intend for =CORTEX= to be used as a general-purpose library for
260 - building creatures and outfitting them with senses, so that it will
261 - be useful for other researchers who want to test out ideas of their
262 - own. To this end, wherver I have had to make archetictural choices
263 - about =CORTEX=, I have chosen to give as much freedom to the user as
264 - possible, so that =CORTEX= may be used for things I have not
265 - forseen.
266 -
267 -** Simulation or Reality?
268 -
269 +
270 +* Designing =CORTEX=
271 + In this section, I outline the design decisions that went into
272 + making =CORTEX=, along with some details about its
273 + implementation. (A practical guide to getting started with =CORTEX=,
274 + which skips over the history and implementation details presented
275 + here, is provided in appendix \ref{} at the end of this thesis.)
276 +
277 + Throughout this project, I intended for =CORTEX= to be flexible and
278 + extensible enough to be useful for other researchers who want to
279 + test out ideas of their own. To this end, wherever I have had to make
280 + architectural choices about =CORTEX=, I have chosen to give as much
281 + freedom to the user as possible, so that =CORTEX= may be used for
282 + things I have not foreseen.
283 +
284 +** Building in simulation versus reality
285 The most important archetictural decision of all is the choice to
286 use a computer-simulated environemnt in the first place! The world
287 is a vast and rich place, and for now simulations are a very poor