comparison thesis/cortex.org @ 516:ced955c3c84f

resurrect old cortex to fix flow issues.
author Robert McIntyre <rlm@mit.edu>
date Sun, 30 Mar 2014 22:48:19 -0400
parents 58fa1ffd481e
children 68665d2c32a7

#+ATTR_LaTeX: :width 10cm
[[./images/aurellem-gray.png]]


* Empathy \& Embodiment: problem solving strategies

By the end of this thesis, you will have seen a novel approach to
interpreting video using embodiment and empathy. You will have also
seen one way to efficiently implement empathy for embodied
creatures. Finally, you will become familiar with =CORTEX=, a system
for designing and simulating creatures with rich senses, which you
may choose to use in your own research.

This is the core vision of my thesis: that one of the important ways
in which we understand others is by imagining ourselves in their
position and empathically feeling experiences relative to our own
bodies. By understanding events in terms of our own previous
corporeal experience, we greatly constrain the possibilities of what
would otherwise be an unwieldy exponential search. This extra
constraint can be the difference between easily understanding what
is happening in a video and being completely lost in a sea of
incomprehensible color and movement.


** The problem: recognizing actions in video is hard!

Examine the following image. What is happening? As you, and indeed
very young children, can easily determine, this is an image of
drinking.

#+caption: A cat drinking some water. Identifying this action is
#+caption: beyond the capabilities of existing computer vision systems.
#+ATTR_LaTeX: :width 7cm
[[./images/cat-drinking.jpg]]

Nevertheless, it is beyond the state of the art for a computer
vision program to describe what's happening in this image. Part of
the problem is that many computer vision systems focus on
pixel-level details or comparisons to example images (such as
\cite{volume-action-recognition}), but the 3D world is so variable
that it is hard to describe it in terms of possible images.

In fact, the contents of a scene may have much less to do with
pixel probabilities than with recognizing various affordances:
things you can move, objects you can grasp, spaces that can be
filled (Gibson). For example, what processes might enable you to
see the chair in figure \ref{hidden-chair}?

#+caption: The chair in this image is quite obvious to humans, but I
#+caption: doubt that any modern computer vision program can find it.
#+name: hidden-chair
#+ATTR_LaTeX: :width 10cm
[[./images/fat-person-sitting-at-desk.jpg]]

Finally, how is it that you can easily tell the difference between
how the girl's /muscles/ are working in figure \ref{girl}?

#+caption: The mysterious ``common sense'' appears here as you are able
#+caption: to discern the difference in how the girl's arm muscles
#+caption: are activated between the two images.
#+name: girl
#+ATTR_LaTeX: :width 7cm
[[./images/wall-push.png]]

Each of these examples tells us something about what might be going
on in our minds as we easily solve these recognition problems.

The hidden chair shows us that we are strongly triggered by cues
relating to the position of human bodies, and that we can determine

We know well how our muscles would have to work to maintain us in
most positions, and we can easily project this self-knowledge to
imagined positions triggered by images of the human body.

** A step forward: the sensorimotor-centered approach

In this thesis, I explore the idea that our knowledge of our own
bodies, combined with our own rich senses, enables us to recognize
the actions of others.

For example, I think humans are able to label the cat video as
``drinking'' because they imagine /themselves/ as the cat, and
imagine putting their face up against a stream of water and
sticking out their tongue. In that imagined world, they can feel
the cool water hitting their tongue, and feel the water entering
their body, and are able to recognize that /feeling/ as drinking.
So, the label of the action is not really in the pixels of the
image, but is found clearly in a simulation inspired by those
pixels. An imaginative system, having been trained on drinking and
non-drinking examples and learning that the most important
component of drinking is the feeling of water sliding down one's
throat, would analyze a video of a cat drinking in the following
manner:

1. Create a physical model of the video by putting a ``fuzzy''
   model of its own body in place of the cat. Possibly also create
   a simulation of the stream of water.

2. Play out this simulated scene and generate imagined sensory
   experience. This will include relevant muscle contractions, a
   close up view of the stream from the cat's perspective, and most
   importantly, the imagined feeling of water entering the
   mouth. The imagined sensory experience can come from a
   simulation of the event, but can also be pattern-matched from
   previous, similar embodied experience.

3. The action is now easily identified as drinking by the sense of
   taste alone. The other senses (such as the tongue moving in and
   out) help to give plausibility to the simulated action. Note that
   the sense of vision, while critical in creating the simulation,
   is not critical for identifying the action from the simulation.

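To make the shape of this process concrete, here is a minimal
sketch of the three steps in Clojure. Every name in it
(=fit-body-model=, =simulate=, =tastes-like-drinking?=) is a
hypothetical placeholder for illustration, not part of =CORTEX= or
=EMPATH=:

#+begin_src clojure
;; A sketch only: all three helper functions are assumed.
(defn drinking?
  "Empathically decide whether a video depicts drinking."
  [video]
  (let [model      (fit-body-model video) ; 1. fuzzy self-model in the scene
        experience (simulate model)]      ; 2. play it out; imagine the senses
    ;; 3. identify the action from the imagined senses alone.
    (tastes-like-drinking? (:taste experience))))
#+end_src
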
For the chair examples, the process is even easier:

1. Align a model of your body to the person in the image.

2. Generate proprioceptive sensory data from this alignment.

3. Use the imagined proprioceptive data as a key to look up related
   sensory experience associated with that particular proprioceptive
   feeling.

4. Retrieve the feeling of your bottom resting on a surface, your
   knees bent, and your leg muscles relaxed.

5. This sensory information is consistent with your =sitting?=
   sensory predicate, so you (and the entity in the image) must be
   sitting.

6. There must be a chair-like object since you are sitting.

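Again as a sketch, the chair recipe is a pipeline from an image to
a sensory predicate. Here =align-to-image= and =proprioception-of=
are assumed helpers, and =experience-index= is a hypothetical map
from proprioceptive feelings to remembered sensory experience:

#+begin_src clojure
;; A sketch only: the helpers and the index are assumed.
(defn sitting-in-image?
  "Empathically decide whether the person in an image is sitting."
  [image body-model experience-index]
  (let [aligned    (align-to-image body-model image) ; 1. align self-model
        proprio    (proprioception-of aligned)       ; 2. imagined joint angles
        experience (get experience-index proprio)]   ; 3-4. recall full senses
    (sitting? experience)))                          ; 5. sensory predicate
#+end_src
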
Empathy offers yet another alternative to the age-old AI
representation question: ``What is a chair?'' --- A chair is the
feeling of sitting!

One powerful advantage of empathic problem solving is that it
factors the action recognition problem into two easier problems. To
use empathy, you need an /aligner/, which takes the video and a
model of your body, and aligns the model with the video. Then, you
need a /recognizer/, which uses the aligned model to interpret the
action. The power in this method lies in the fact that you describe
all actions from a body-centered viewpoint. You are less tied to
the particulars of any visual representation of the actions. If you
teach the system what ``running'' is, and you have a good enough
aligner, the system will from then on be able to recognize running
from any point of view, even strange points of view like above or
underneath the runner. This is in contrast to action recognition
schemes that try to identify actions using a non-embodied approach.
If these systems learn about running as viewed from the side, they
will not automatically be able to recognize running from any other
viewpoint.

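The factoring itself fits in a single expression: recognition is
just the composition of an aligner with a recognizer. Both function
names below are hypothetical placeholders:

#+begin_src clojure
;; A sketch only: `align` and `interpret` are assumed.
(defn recognize-action
  "Factor recognition into alignment followed by interpretation."
  [video body-model]
  (-> (align body-model video) ; aligner: register self-model with video
      (interpret)))            ; recognizer: read the action off the model
#+end_src
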
Another powerful advantage is that using the language of multiple
body-centered rich senses to describe body-centered actions offers
a massive boost in descriptive capability. Consider how difficult
it would be to compose a set of HOG (Histogram of Oriented
Gradients) filters to describe the action of a simple worm-creature
``curling'' so that its head touches its tail, and then behold the
simplicity of describing this action in a language designed for the
task (listing \ref{grand-circle-intro}):

#+caption: Body-centered actions are best expressed in a body-centered
#+caption: language. This code detects when the worm has curled into a
#+caption: full circle. Imagine how you would replicate this functionality
#+caption: using low-level pixel features such as HOG filters!
#+name: grand-circle-intro
#+begin_listing clojure
#+begin_src clojure
(defn grand-circle?
  "Does the worm form a majestic circle (one end touching the other)?"
  [experiences]
  (and (curled? experiences)
       (let [worm-touch (:touch (peek experiences))
             tail-touch (worm-touch 0)
             head-touch (worm-touch 4)]
         (and (< 0.2 (contact worm-segment-bottom-tip tail-touch))
              (< 0.2 (contact worm-segment-top-tip head-touch))))))
#+end_src
#+end_listing

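For comparison, =curled?=, the lower-level predicate that
=grand-circle?= relies on, can be written in the same body-centered
style over proprioceptive data. This version is only a sketch: it
assumes each joint reports a =[heading pitch bend]= triple, which
may not match the definition used later in this thesis:

#+begin_src clojure
;; A sketch only: assumes [heading pitch bend] proprioception triples.
(defn curled?
  "Is the worm strongly flexed at every joint, as in a curl?"
  [experiences]
  (every? (fn [[_ _ bend]]
            (> (Math/sin bend) 0.64)) ; each joint is strongly bent
          (:proprioception (peek experiences))))
#+end_src
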
** =EMPATH= recognizes actions using empathy

First, I built a system for constructing virtual creatures with
physiologically plausible sensorimotor systems and detailed
environments. The result is =CORTEX=, which is described in section
\ref{sec-2}. (=CORTEX= was built to be flexible and useful to other

infer the actions of a second worm-like creature, using only its
own prior sensorimotor experiences and knowledge of the second
worm's joint positions. This program, =EMPATH=, is described in
section \ref{sec-3}, and the key results of this experiment are
summarized below.

I have built a system that can express the types of recognition
problems in a form amenable to computation. It is split into
four parts:

data, just as it would if it were actually experiencing the
scene first-hand. If previous experience has been accurately
retrieved, and if it is analogous enough to the scene, then
the creature will correctly identify the action in the scene.

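The retrieval step at the heart of this process can be sketched as
a simple lookup: bin the proprioceptive signature of the present
moment and use it as a key into a map from past poses to full
sensory experiences. The binning scheme and index structure below
are illustrative assumptions, not the actual =EMPATH=
implementation:

#+begin_src clojure
;; A sketch only: the binning granularity is an assumption.
(defn bin
  "Discretize joint angles so that similar poses collide."
  [angles]
  (mapv #(Math/round (* 10 (double %))) (flatten angles)))

(defn build-experience-index
  "Index prior embodied experiences by binned proprioception."
  [experiences]
  (group-by (comp bin :proprioception) experiences))

(defn recall
  "Retrieve remembered experiences matching a proprioceptive state."
  [index proprioception]
  (get index (bin proprioception) []))
#+end_src
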
My program, =EMPATH=, uses this empathic problem solving technique
to interpret the actions of a simple, worm-like creature.

#+caption: The worm performs many actions during free play such as
#+caption: poses by inferring the complete sensory experience
#+caption: from proprioceptive data.
#+name: worm-recognition-intro
#+ATTR_LaTeX: :width 15cm
[[./images/worm-poses.png]]

#+caption: From only \emph{proprioceptive} data, =EMPATH= was able to infer
#+caption: the complete sensory experience and classify these four poses.
#+caption: The last image is a composite, depicting the intermediate stages
#+caption: of \emph{wriggling}.
#+name: worm-recognition-intro-2
#+ATTR_LaTeX: :width 15cm
[[./images/empathy-1.png]]

Next, I developed an experiment to test the power of =CORTEX='s
sensorimotor-centered language for solving recognition problems. As
a proof of concept, I wrote routines which enabled a simple
worm-like creature to infer the actions of a second worm-like
creature, using only its own previous sensorimotor experiences and
knowledge of the second worm's joints (figure
\ref{worm-recognition-intro-2}). The result of this proof of
concept was the program =EMPATH=, described in section \ref{sec-3}.

** =EMPATH= is built on =CORTEX=, an environment for making creatures

# =CORTEX= provides a language for describing the sensorimotor
# experiences of various creatures.

I built =CORTEX= to be a general AI research platform for doing
experiments involving multiple rich senses and a wide variety and
number of creatures. I intend it to be useful as a library for many
more projects than just this thesis. =CORTEX= was necessary to meet

that I know of that can support multiple entities that can each
hear the world from their own perspective. Other senses also
require a small layer of Java code. =CORTEX= also uses =bullet=, a
physics simulator written in =C=.

#+caption: Here is the worm from figure \ref{worm-intro} modeled
#+caption: in Blender, a free 3D-modeling program. Senses and
#+caption: joints are described using special nodes in Blender.
#+name: worm-recognition-intro
#+ATTR_LaTeX: :width 12cm
[[./images/blender-worm.png]]

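As a taste of what this looks like in practice, here is a
hypothetical fragment for bringing such a Blender model to life.
The names below (=load-blender-model=, =body!=, =touch!=,
=proprioception!=, =movement!=) follow the style of the
creature-building functions described later in this thesis, but
treat this particular snippet as a sketch rather than =CORTEX='s
exact API:

#+begin_src clojure
;; A sketch only: names and signatures may differ from the real API.
(defn worm []
  (let [model (load-blender-model "Models/worm/worm.blend")]
    {:body           (doto model body!)      ; attach physics to the model
     :touch          (touch! model)          ; touch sensors from Blender nodes
     :proprioception (proprioception! model) ; joint-angle senses
     :muscles        (movement! model)}))    ; muscle effectors
#+end_src
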
Here are some things I anticipate that =CORTEX= might be used for:

its own finger from the eye in its palm, and that it can feel its
own thumb touching its palm.}
\end{sidewaysfigure}
#+END_LaTeX

** Contributions

- I built =CORTEX=, a comprehensive platform for embodied AI
  experiments. =CORTEX= supports many features lacking in other
  systems, such as proper simulation of hearing. It is easy to
  create new =CORTEX= creatures using Blender, a free 3D modeling
  program.

- I built =EMPATH=, which uses =CORTEX= to identify the actions of
  a worm-like creature using a computational model of empathy.

- After one-shot supervised training, =EMPATH= was able to
  recognize a wide variety of static poses and dynamic actions ---
  ranging from curling in a circle to wriggling with a particular
  frequency --- with 95\% accuracy.

- These results were completely independent of viewing angle
  because the underlying body-centered language is fundamentally
  viewpoint-independent; once an action is learned, it can be
  recognized equally well from any viewing angle.

- =EMPATH= is surprisingly short; the sensorimotor-centered
  language provided by =CORTEX= resulted in extremely economical
  recognition routines --- about 500 lines in all --- suggesting
  that such representations are very powerful, and often
  indispensable for the types of recognition tasks considered here.

- Although, for expediency's sake, I relied on direct knowledge of
  joint positions in this proof of concept, it would be
  straightforward to extend =EMPATH= so that it (more
  realistically) infers joint positions from its visual data.

* Designing =CORTEX=

In this section, I outline the design decisions that went into
making =CORTEX=, along with some details about its implementation.
(A practical guide to getting started with =CORTEX=, which skips
over the history and implementation details presented here, is
provided in an appendix at the end of this thesis.)

Throughout this project, I intended for =CORTEX= to be flexible and
extensible enough to be useful for other researchers who want to
test out ideas of their own. To this end, wherever I have had to
make architectural choices about =CORTEX=, I have chosen to give as
much

time in the simulated world can be slowed down to accommodate the
limitations of the character's programming. In terms of cost,
doing everything in software is far cheaper than building custom
real-time hardware. All you need is a laptop and some patience.

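This decoupling of simulated time from wall-clock time can be
sketched in a few lines. Neither function named here is part of
=CORTEX='s actual API; this is only an illustration of the idea:

#+begin_src clojure
;; A sketch only: `step-physics!` and `sense-and-act!` are assumed.
(defn run-simulation!
  "Advance the world in fixed steps of simulated time. The creature's
  brain may take as much wall-clock time as it likes per step; the
  simulated world simply waits for it."
  [world creature steps dt]
  (dotimes [_ steps]
    (step-physics! world dt)    ; advance simulated time by dt seconds
    (sense-and-act! creature))) ; may be arbitrarily slow; world waits
#+end_src
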
** Simulated time enables rapid prototyping \& simple programs

I envision =CORTEX= being used to support rapid prototyping and
iteration of ideas. Even if I could put together a well-constructed
kit for creating robots, it would still not be enough because of
the scourge of real-time processing. Anyone who wants to test their