rlm@513
|
1 diff -r f639e2139ce2 thesis/cortex.org
|
rlm@513
|
2 --- a/thesis/cortex.org Sun Mar 30 01:34:43 2014 -0400
|
rlm@513
|
3 +++ b/thesis/cortex.org Sun Mar 30 10:07:17 2014 -0400
|
rlm@513
|
4 @@ -41,49 +41,46 @@
|
rlm@513
|
5 [[./images/aurellem-gray.png]]
|
rlm@513
|
6
|
rlm@513
|
7
|
rlm@513
|
8 -* Empathy and Embodiment as problem solving strategies
|
rlm@513
|
9 +* Empathy \& Embodiment: problem solving strategies
|
rlm@513
|
10
|
rlm@513
|
11 - By the end of this thesis, you will have seen a novel approach to
|
rlm@513
|
12 - interpreting video using embodiment and empathy. You will have also
|
rlm@513
|
13 - seen one way to efficiently implement empathy for embodied
|
rlm@513
|
14 - creatures. Finally, you will become familiar with =CORTEX=, a system
|
rlm@513
|
15 - for designing and simulating creatures with rich senses, which you
|
rlm@513
|
16 - may choose to use in your own research.
|
rlm@513
|
17 -
|
rlm@513
|
18 - This is the core vision of my thesis: That one of the important ways
|
rlm@513
|
19 - in which we understand others is by imagining ourselves in their
|
rlm@513
|
20 - position and emphatically feeling experiences relative to our own
|
rlm@513
|
21 - bodies. By understanding events in terms of our own previous
|
rlm@513
|
22 - corporeal experience, we greatly constrain the possibilities of what
|
rlm@513
|
23 - would otherwise be an unwieldy exponential search. This extra
|
rlm@513
|
24 - constraint can be the difference between easily understanding what
|
rlm@513
|
25 - is happening in a video and being completely lost in a sea of
|
rlm@513
|
26 - incomprehensible color and movement.
|
rlm@513
|
27 -
|
rlm@513
|
28 -** Recognizing actions in video is extremely difficult
|
rlm@513
|
29 -
|
rlm@513
|
30 - Consider for example the problem of determining what is happening
|
rlm@513
|
31 - in a video of which this is one frame:
|
rlm@513
|
32 -
|
rlm@513
|
33 +** The problem: recognizing actions in video is extremely difficult
|
rlm@513
|
34 +# developing / requires useful representations
|
rlm@513
|
35 +
|
rlm@513
|
36 + Examine the following collection of images. As you, and indeed very
|
rlm@513
|
37 + young children, can easily determine, each one is a picture of
|
rlm@513
|
38 + someone drinking.
|
rlm@513
|
39 +
|
rlm@513
|
40 + # dxh: cat, cup, drinking fountain, rain, straw, coconut
|
rlm@513
|
41 #+caption: A cat drinking some water. Identifying this action is
|
rlm@513
|
42 - #+caption: beyond the state of the art for computers.
|
rlm@513
|
43 + #+caption: beyond the capabilities of existing computer vision systems.
|
rlm@513
|
44 #+ATTR_LaTeX: :width 7cm
|
rlm@513
|
45 [[./images/cat-drinking.jpg]]
|
rlm@513
|
46 +
|
rlm@513
|
47 + Nevertheless, it is beyond the state of the art for a computer
|
rlm@513
|
48 + vision program to describe what's happening in each of these
|
rlm@513
|
49 + images, or what's common to them. Part of the problem is that many
|
rlm@513
|
50 + computer vision systems focus on pixel-level details or probability
|
rlm@513
|
51 + distributions of pixels, with little focus on [...]
|
rlm@513
|
52 +
|
rlm@513
|
53 +
|
rlm@513
|
54 + In fact, the contents of a scene may have much less to do with pixel
|
rlm@513
|
55 + probabilities than with recognizing various affordances: things you
|
rlm@513
|
56 + can move, objects you can grasp, spaces that can be filled
|
rlm@513
|
57 + (Gibson). For example, what processes might enable you to see the
|
rlm@513
|
58 + chair in figure \ref{hidden-chair}?
|
rlm@513
|
59 + # Or suppose that you are building a program that recognizes chairs.
|
rlm@513
|
60 + # How could you ``see'' the chair ?
|
rlm@513
|
61
|
rlm@513
|
62 - It is currently impossible for any computer program to reliably
|
rlm@513
|
63 - label such a video as ``drinking''. And rightly so -- it is a very
|
rlm@513
|
64 - hard problem! What features can you describe in terms of low level
|
rlm@513
|
65 - functions of pixels that can even begin to describe at a high level
|
rlm@513
|
66 - what is happening here?
|
rlm@513
|
67 -
|
rlm@513
|
68 - Or suppose that you are building a program that recognizes chairs.
|
rlm@513
|
69 - How could you ``see'' the chair in figure \ref{hidden-chair}?
|
rlm@513
|
70 -
|
rlm@513
|
71 + # dxh: blur chair
|
rlm@513
|
72 #+caption: The chair in this image is quite obvious to humans, but I
|
rlm@513
|
73 #+caption: doubt that any modern computer vision program can find it.
|
rlm@513
|
74 #+name: hidden-chair
|
rlm@513
|
75 #+ATTR_LaTeX: :width 10cm
|
rlm@513
|
76 [[./images/fat-person-sitting-at-desk.jpg]]
|
rlm@513
|
77 +
|
rlm@513
|
78 +
|
rlm@513
|
79 +
|
rlm@513
|
80 +
|
rlm@513
|
81
|
rlm@513
|
82 Finally, how is it that you can easily tell the difference between
|
rlm@513
|
83 how the girls /muscles/ are working in figure \ref{girl}?
|
rlm@513
|
84 @@ -95,10 +92,13 @@
|
rlm@513
|
85 #+ATTR_LaTeX: :width 7cm
|
rlm@513
|
86 [[./images/wall-push.png]]
|
rlm@513
|
87
|
rlm@513
|
88 +
|
rlm@513
|
89 +
|
rlm@513
|
90 +
|
rlm@513
|
91 Each of these examples tells us something about what might be going
|
rlm@513
|
92 on in our minds as we easily solve these recognition problems.
|
rlm@513
|
93
|
rlm@513
|
94 - The hidden chairs show us that we are strongly triggered by cues
|
rlm@513
|
95 + The hidden chair shows us that we are strongly triggered by cues
|
rlm@513
|
96 relating to the position of human bodies, and that we can determine
|
rlm@513
|
97 the overall physical configuration of a human body even if much of
|
rlm@513
|
98 that body is occluded.
|
rlm@513
|
99 @@ -109,10 +109,107 @@
|
rlm@513
|
100 most positions, and we can easily project this self-knowledge to
|
rlm@513
|
101 imagined positions triggered by images of the human body.
|
rlm@513
|
102
|
rlm@513
|
103 -** =EMPATH= neatly solves recognition problems
|
rlm@513
|
104 +** A step forward: the sensorimotor-centered approach
|
rlm@513
|
105 +# ** =EMPATH= recognizes what creatures are doing
|
rlm@513
|
106 +# neatly solves recognition problems
|
rlm@513
|
107 + In this thesis, I explore the idea that our knowledge of our own
|
rlm@513
|
108 + bodies enables us to recognize the actions of others.
|
rlm@513
|
109 +
|
rlm@513
|
110 + First, I built a system for constructing virtual creatures with
|
rlm@513
|
111 + physiologically plausible sensorimotor systems and detailed
|
rlm@513
|
112 + environments. The result is =CORTEX=, which is described in section
|
rlm@513
|
113 + \ref{sec-2}. (=CORTEX= was built to be flexible and useful to other
|
rlm@513
|
114 + AI researchers; it is provided in full with detailed instructions
|
rlm@513
|
115 + on the web [here].)
|
rlm@513
|
116 +
|
rlm@513
|
117 + Next, I wrote routines which enabled a simple worm-like creature to
|
rlm@513
|
118 + infer the actions of a second worm-like creature, using only its
|
rlm@513
|
119 + own prior sensorimotor experiences and knowledge of the second
|
rlm@513
|
120 + worm's joint positions. This program, =EMPATH=, is described in
|
rlm@513
|
121 + section \ref{sec-3}, and the key results of this experiment are
|
rlm@513
|
122 + summarized below.
|
rlm@513
|
123 +
|
rlm@513
|
124 + #+caption: From only \emph{proprioceptive} data, =EMPATH= was able to infer
|
rlm@513
|
125 + #+caption: the complete sensory experience and classify these four poses.
|
rlm@513
|
126 + #+caption: The last image is a composite, depicting the intermediate stages of \emph{wriggling}.
|
rlm@513
|
127 + #+name: worm-recognition-intro-2
|
rlm@513
|
128 + #+ATTR_LaTeX: :width 15cm
|
rlm@513
|
129 + [[./images/empathy-1.png]]
|
rlm@513
|
130 +
|
rlm@513
|
131 + # =CORTEX= provides a language for describing the sensorimotor
|
rlm@513
|
132 + # experiences of various creatures.
|
rlm@513
|
133 +
|
rlm@513
|
134 + # Next, I developed an experiment to test the power of =CORTEX='s
|
rlm@513
|
135 + # sensorimotor-centered language for solving recognition problems. As
|
rlm@513
|
136 + # a proof of concept, I wrote routines which enabled a simple
|
rlm@513
|
137 + # worm-like creature to infer the actions of a second worm-like
|
rlm@513
|
138 + # creature, using only its own previous sensorimotor experiences and
|
rlm@513
|
139 + # knowledge of the second worm's joints (figure
|
rlm@513
|
140 + # \ref{worm-recognition-intro-2}). The result of this proof of
|
rlm@513
|
141 + # concept was the program =EMPATH=, described in section
|
rlm@513
|
142 + # \ref{sec-3}. The key results of this
|
rlm@513
|
143 +
|
rlm@513
|
144 + # Using only first-person sensorimotor experiences and third-person
|
rlm@513
|
145 + # proprioceptive data,
|
rlm@513
|
146 +
|
rlm@513
|
147 +*** Key results
|
rlm@513
|
148 + - After one-shot supervised training, =EMPATH= was able to recognize a
|
rlm@513
|
149 + wide variety of static poses and dynamic actions---ranging from
|
rlm@513
|
150 + curling in a circle to wriggling with a particular frequency ---
|
rlm@513
|
151 + with 95\% accuracy.
|
rlm@513
|
152 + - These results were completely independent of viewing angle
|
rlm@513
|
153 + because the underlying body-centered language is itself independent of viewing angle;
|
rlm@513
|
154 + once an action is learned, it can be recognized equally well from
|
rlm@513
|
155 + any viewing angle.
|
rlm@513
|
156 + - =EMPATH= is surprisingly short; the sensorimotor-centered
|
rlm@513
|
157 + language provided by =CORTEX= resulted in extremely economical
|
rlm@513
|
158 + recognition routines --- about 0000 lines in all --- suggesting
|
rlm@513
|
159 + that such representations are very powerful, and often
|
rlm@513
|
160 + indispensable for the types of recognition tasks considered here.
|
rlm@513
|
161 + - Although for expediency's sake, I relied on direct knowledge of
|
rlm@513
|
162 + joint positions in this proof of concept, it would be
|
rlm@513
|
163 + straightforward to extend =EMPATH= so that it (more
|
rlm@513
|
164 + realistically) infers joint positions from its visual data.
|
rlm@513
|
165 +
|
rlm@513
|
166 +# because the underlying language is fundamentally orientation-independent
|
rlm@513
|
167 +
|
rlm@513
|
168 +# recognize the actions of a worm with 95\% accuracy. The
|
rlm@513
|
169 +# recognition tasks
|
rlm@513
|
170
|
rlm@513
|
171 - I propose a system that can express the types of recognition
|
rlm@513
|
172 - problems above in a form amenable to computation. It is split into
|
rlm@513
|
173 +
|
rlm@513
|
174 +
|
rlm@513
|
175 +
|
rlm@513
|
176 + [Talk about these results and what you find promising about them]
|
rlm@513
|
177 +
|
rlm@513
|
178 +** Roadmap
|
rlm@513
|
179 + [I'm going to explain how =CORTEX= works, then break down how
|
rlm@513
|
180 + =EMPATH= does its thing. Because the details reveal such-and-such
|
rlm@513
|
181 + about the approach.]
|
rlm@513
|
182 +
|
rlm@513
|
183 + # The success of this simple proof-of-concept offers a tantalizing
|
rlm@513
|
184 +
|
rlm@513
|
185 +
|
rlm@513
|
186 + # explore the idea
|
rlm@513
|
187 + # The key contribution of this thesis is the idea that body-centered
|
rlm@513
|
188 + # representations (which express
|
rlm@513
|
189 +
|
rlm@513
|
190 +
|
rlm@513
|
191 + # the
|
rlm@513
|
192 + # body-centered approach --- in which I try to determine what's
|
rlm@513
|
193 + # happening in a scene by bringing it into registration with my own
|
rlm@513
|
194 + # bodily experiences --- are indispensable for recognizing what
|
rlm@513
|
195 + # creatures are doing in a scene.
|
rlm@513
|
196 +
|
rlm@513
|
197 +* COMMENT
|
rlm@513
|
198 +# body-centered language
|
rlm@513
|
199 +
|
rlm@513
|
200 + In this thesis, I'll describe =EMPATH=, which solves a certain
|
rlm@513
|
201 + class of recognition problems
|
rlm@513
|
202 +
|
rlm@513
|
203 + The key idea is to use self-centered (or first-person) language.
|
rlm@513
|
204 +
|
rlm@513
|
205 + I have built a system that can express the types of recognition
|
rlm@513
|
206 + problems in a form amenable to computation. It is split into
|
rlm@513
|
207 four parts:
|
rlm@513
|
208
|
rlm@513
|
209 - Free/Guided Play :: The creature moves around and experiences the
|
rlm@513
|
210 @@ -286,14 +383,14 @@
|
rlm@513
|
211 code to create a creature, and can use a wide library of
|
rlm@513
|
212 pre-existing blender models as a base for your own creatures.
|
rlm@513
|
213
|
rlm@513
|
214 - - =CORTEX= implements a wide variety of senses, including touch,
|
rlm@513
|
215 + - =CORTEX= implements a wide variety of senses: touch,
|
rlm@513
|
216 proprioception, vision, hearing, and muscle tension. Complicated
|
rlm@513
|
217 senses like touch, and vision involve multiple sensory elements
|
rlm@513
|
218 embedded in a 2D surface. You have complete control over the
|
rlm@513
|
219 distribution of these sensor elements through the use of simple
|
rlm@513
|
220 png image files. In particular, =CORTEX= implements more
|
rlm@513
|
221 comprehensive hearing than any other creature simulation system
|
rlm@513
|
222 - available.
|
rlm@513
|
223 + available.
|
rlm@513
|
224
|
rlm@513
|
225 - =CORTEX= supports any number of creatures and any number of
|
rlm@513
|
226 senses. Time in =CORTEX= dialates so that the simulated creatures
|
rlm@513
|
227 @@ -353,7 +450,24 @@
|
rlm@513
|
228 \end{sidewaysfigure}
|
rlm@513
|
229 #+END_LaTeX
|
rlm@513
|
230
|
rlm@513
|
231 -** Contributions
|
rlm@513
|
232 +** Road map
|
rlm@513
|
233 +
|
rlm@513
|
234 + By the end of this thesis, you will have seen a novel approach to
|
rlm@513
|
235 + interpreting video using embodiment and empathy. You will have also
|
rlm@513
|
236 + seen one way to efficiently implement empathy for embodied
|
rlm@513
|
237 + creatures. Finally, you will become familiar with =CORTEX=, a system
|
rlm@513
|
238 + for designing and simulating creatures with rich senses, which you
|
rlm@513
|
239 + may choose to use in your own research.
|
rlm@513
|
240 +
|
rlm@513
|
241 + This is the core vision of my thesis: That one of the important ways
|
rlm@513
|
242 + in which we understand others is by imagining ourselves in their
|
rlm@513
|
243 + position and empathically feeling experiences relative to our own
|
rlm@513
|
244 + bodies. By understanding events in terms of our own previous
|
rlm@513
|
245 + corporeal experience, we greatly constrain the possibilities of what
|
rlm@513
|
246 + would otherwise be an unwieldy exponential search. This extra
|
rlm@513
|
247 + constraint can be the difference between easily understanding what
|
rlm@513
|
248 + is happening in a video and being completely lost in a sea of
|
rlm@513
|
249 + incomprehensible color and movement.
|
rlm@513
|
250
|
rlm@513
|
251 - I built =CORTEX=, a comprehensive platform for embodied AI
|
rlm@513
|
252 experiments. =CORTEX= supports many features lacking in other
|
rlm@513
|
253 @@ -363,18 +477,22 @@
|
rlm@513
|
254 - I built =EMPATH=, which uses =CORTEX= to identify the actions of
|
rlm@513
|
255 a worm-like creature using a computational model of empathy.
|
rlm@513
|
256
|
rlm@513
|
257 -* Building =CORTEX=
|
rlm@513
|
258 -
|
rlm@513
|
259 - I intend for =CORTEX= to be used as a general-purpose library for
|
rlm@513
|
260 - building creatures and outfitting them with senses, so that it will
|
rlm@513
|
261 - be useful for other researchers who want to test out ideas of their
|
rlm@513
|
262 - own. To this end, wherver I have had to make archetictural choices
|
rlm@513
|
263 - about =CORTEX=, I have chosen to give as much freedom to the user as
|
rlm@513
|
264 - possible, so that =CORTEX= may be used for things I have not
|
rlm@513
|
265 - forseen.
|
rlm@513
|
266 -
|
rlm@513
|
267 -** Simulation or Reality?
|
rlm@513
|
268 -
|
rlm@513
|
269 +
|
rlm@513
|
270 +* Designing =CORTEX=
|
rlm@513
|
271 + In this section, I outline the design decisions that went into
|
rlm@513
|
272 + making =CORTEX=, along with some details about its
|
rlm@513
|
273 + implementation. (A practical guide to getting started with =CORTEX=,
|
rlm@513
|
274 + which skips over the history and implementation details presented
|
rlm@513
|
275 + here, is provided in an appendix \ref{} at the end of this paper.)
|
rlm@513
|
276 +
|
rlm@513
|
277 + Throughout this project, I intended for =CORTEX= to be flexible and
|
rlm@513
|
278 + extensible enough to be useful for other researchers who want to
|
rlm@513
|
279 + test out ideas of their own. To this end, wherever I have had to make
|
rlm@513
|
280 + architectural choices about =CORTEX=, I have chosen to give as much
|
rlm@513
|
281 + freedom to the user as possible, so that =CORTEX= may be used for
|
rlm@513
|
282 + things I have not foreseen.
|
rlm@513
|
283 +
|
rlm@513
|
284 +** Building in simulation versus reality
|
rlm@513
|
285 The most important archetictural decision of all is the choice to
|
rlm@513
|
286 use a computer-simulated environemnt in the first place! The world
|
rlm@513
|
287 is a vast and rich place, and for now simulations are a very poor
|