comparison thesis/cortex.org @ 551:d304b2ea7c58

some changes from winston.
author Robert McIntyre <rlm@mit.edu>
date Fri, 02 May 2014 13:40:47 -0400
parents b1d8d9b4b569
children 20f64a70f8c5
[[./images/aurellem-gray.png]]


* Empathy \& Embodiment: problem solving strategies

By the time you have read this thesis, you will understand a novel
approach to representing and recognizing physical actions using
embodiment and empathy. You will also see one way to efficiently
implement physical empathy for embodied creatures. Finally, you will
become familiar with =CORTEX=, a system for designing and simulating
creatures with rich senses, which I have designed as a library that
you can use in your own research. Note that I /do not/ process video
directly --- I start with knowledge of the positions of a creature's
body parts and work from there.

This is the core vision of my thesis: That one of the important ways
in which we understand others is by imagining ourselves in their
position and empathically feeling experiences relative to our own
bodies. By understanding events in terms of our own previous

[...]

is happening in a video and being completely lost in a sea of
incomprehensible color and movement.

** The problem: recognizing actions is hard!

Examine figure \ref{cat-drink}. What is happening? As you, and
indeed very young children, can easily determine, this is an image
of drinking.

#+caption: A cat drinking some water. Identifying this action is
#+caption: beyond the capabilities of existing computer vision systems.
#+name: cat-drink
#+ATTR_LaTeX: :width 7cm
[[./images/cat-drinking.jpg]]

Nevertheless, it is beyond the state of the art for a computer
vision program to describe what's happening in this image. Part of

[...]
#+name: hidden-chair
#+ATTR_LaTeX: :width 10cm
[[./images/fat-person-sitting-at-desk.jpg]]

Finally, how is it that you can easily tell the difference between
how the girl's /muscles/ are working in figure \ref{girl}?

#+caption: The mysterious ``common sense'' appears here as you are able
#+caption: to discern the difference in how the girl's arm muscles
#+caption: are activated between the two images. When you compare
#+caption: these two images, do you feel something in your own arm
#+caption: muscles?
#+name: girl
#+ATTR_LaTeX: :width 7cm
[[./images/wall-push.png]]

Each of these examples tells us something about what might be going

[...]
their body, and are able to recognize that /feeling/ as drinking.
So, the label of the action is not really in the pixels of the
image, but is found clearly in a simulation / recollection inspired
by those pixels. An imaginative system, having been trained on
drinking and non-drinking examples and learning that the most
important component of drinking is the feeling of water flowing
down one's throat, would analyze a video of a cat drinking in the
following manner:

1. Create a physical model of the video by putting a ``fuzzy''
   model of its own body in place of the cat. Possibly also create
   a simulation of the stream of water.

2. Play out this simulated scene and generate imagined sensory
   experience. This will include relevant muscle contractions, a
   close up view of the stream from the cat's perspective, and most
   importantly, the imagined feeling of water entering the mouth.
   The imagined sensory experience can come from a simulation of
   the event, but can also be pattern-matched from previous,

[...]
** =EMPATH= recognizes actions using empathy

Exploring these ideas further demands a concrete implementation, so
first, I built a system for constructing virtual creatures with
physiologically plausible sensorimotor systems and detailed
environments. The result is =CORTEX=, which I describe in chapter
\ref{sec-2}.

Next, I wrote routines which enabled a simple worm-like creature to
infer the actions of a second worm-like creature, using only its
own prior sensorimotor experiences and knowledge of the second
worm's joint positions. This program, =EMPATH=, is described in
chapter \ref{sec-3}. Its main components are:

- Embodied Action Definitions :: Many otherwise complicated actions
  are easily described in the language of a full suite of
  body-centered, rich senses and experiences. For example,
  drinking is the feeling of water flowing down your throat, and
  cooling your insides. It's often accompanied by bringing your
  hand close to your face, or bringing your face close to water.
  Sitting down is the feeling of bending your knees, activating
  your quadriceps, then feeling a surface with your bottom and
  relaxing your legs. These body-centered action descriptions

[...]
  language provided by =CORTEX= resulted in extremely economical
  recognition routines --- about 500 lines in all --- suggesting
  that such representations are very powerful, and often
  indispensable for the types of recognition tasks considered here.
  (A sketch of what such a body-centered predicate might look like
  follows this list.)

- For expediency's sake, I relied on direct knowledge of joint
  positions in this proof of concept. However, I believe that the
  structure of =EMPATH= and =CORTEX= will make future work to
  enable video analysis much easier than it would otherwise be.

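To make the flavor of these embodied action definitions concrete,
here is a minimal sketch of what such a predicate might look like.
The name =curled?= and the shape of the experience map are
illustrative assumptions, not =EMPATH='s actual code or data format.

#+begin_src clojure
;; Illustrative sketch only -- the predicate name `curled?` and the
;; experience-map shape (:proprioception as a seq of joint angles in
;; radians) are assumptions, not EMPATH's actual data format.
(defn curled?
  "True when most joints are bent well past their rest angles."
  [experience]
  (let [angles (:proprioception experience)
        bent   (filter #(> (Math/abs (double %)) 1.0) angles)]
    (> (count bent) (* 0.75 (count angles)))))

;; A fabricated experience snapshot: three of four joints are bent,
;; which is not more than 75% of them, so this returns false.
(curled? {:proprioception [1.4 1.2 1.3 0.2]})  ;=> false
#+end_src
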
** =EMPATH= is built on =CORTEX=, a creature builder.

I built =CORTEX= to be a general AI research platform for doing
experiments involving multiple rich senses and a wide variety and

[...]

=CORTEX= is well suited as an environment for embodied AI research
for three reasons:

- You can design new creatures using Blender (\cite{blender}), a
  popular, free 3D modeling program. Each sense can be specified
  using special blender nodes with biologically inspired
  parameters. You need not write any code to create a creature, and
  can use a wide library of pre-existing blender models as a base
  for your own creatures.

- =CORTEX= implements a wide variety of senses: touch,
  proprioception, vision, hearing, and muscle tension. Complicated
  senses like touch and vision involve multiple sensory elements
  embedded in a 2D surface. You have complete control over the
  distribution of these sensor elements through the use of simple
  image files (see the sketch after this list). =CORTEX= implements
  more comprehensive hearing than any other creature simulation
  system available.

- =CORTEX= supports any number of creatures and any number of
  senses. Time in =CORTEX= dilates so that the simulated creatures
  always perceive a perfectly smooth flow of time, regardless of
  the actual computational load.

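As a rough illustration of the image-file mechanism mentioned in the
second point above, here is a minimal sketch. It assumes the
convention that white pixels mark sensor locations; the actual
=CORTEX= conventions and functions differ.

#+begin_src clojure
(import '(javax.imageio ImageIO)
        '(java.io File))

;; Sketch: read a sensor-distribution image and return the [x y]
;; coordinates of every pure-white pixel, treating each one as a
;; sensor element. The white-means-sensor convention is an
;; assumption made for this illustration.
(defn sensor-coordinates [image-path]
  (let [img (ImageIO/read (File. image-path))]
    (for [x (range (.getWidth img))
          y (range (.getHeight img))
          :when (= (bit-and (.getRGB img x y) 0xFFFFFF) 0xFFFFFF)]
      [x y])))
#+end_src
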
=CORTEX= is built on top of =jMonkeyEngine3=
(\cite{jmonkeyengine}), which is a video game engine designed to
create cross-platform 3D desktop games. =CORTEX= is mainly written
in clojure, a dialect of =LISP= that runs on the Java Virtual
Machine (JVM). The API for creating and simulating creatures and
senses is entirely expressed in clojure, though many senses are
implemented at the layer of jMonkeyEngine or below. For example,
for the sense of hearing I use a layer of clojure code on top of a
layer of java JNI bindings that drive a layer of =C++= code which
implements a modified version of =OpenAL= to support multiple

[...]
- exploration of exotic senses and effectors that are not possible
  in the real world (such as telekinesis or a semantic sense)
- imagination using subworlds

During one test with =CORTEX=, I created 3,000 creatures, each with
its own independent senses, and ran them all at only 1/80 real time.
In another test, I created a detailed model of my own hand,
equipped with a realistic distribution of touch (more sensitive at
the fingertips), as well as eyes and ears, and it ran at around 1/4
real time.

#+BEGIN_LaTeX

[...]

its own finger from the eye in its palm, and that it can feel its
own thumb touching its palm.}
\end{sidewaysfigure}
#+END_LaTeX

* COMMENT Designing =CORTEX=

In this chapter, I outline the design decisions that went into
making =CORTEX=, along with some details about its implementation.
(A practical guide to getting started with =CORTEX=, which skips
over the history and implementation details presented here, is
provided in an appendix at the end of this thesis.)

[...]
community and is now (in modified form) part of a system for
capturing in-game video to a file.

** ...but hearing must be built from scratch

At the end of this chapter I will have simulated ears that work the
same way as the simulated eyes in the last chapter. I will be able to
place any number of ear-nodes in a blender file, and they will bind to
the closest physical object and follow it as it moves around. Each ear
will provide access to the sound data it picks up between every frame.

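The binding step can be pictured with a small sketch: given an ear
node and the creature's physical spatials, choose the nearest one.
This assumes jMonkeyEngine3's =getWorldTranslation= and =distance=
methods; the real =CORTEX= binding code is more involved.

#+begin_src clojure
;; Sketch of "bind to the closest physical object": pick the spatial
;; whose world position is nearest the ear node. Assumes
;; jMonkeyEngine3 Spatial/Vector3f methods; not CORTEX's actual code.
(defn closest-physical-object [ear-node physical-spatials]
  (let [ear-pos (.getWorldTranslation ear-node)]
    (apply min-key
           #(.distance ear-pos (.getWorldTranslation %))
           physical-spatials)))
#+end_src
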
Hearing is one of the more difficult senses to simulate, because there

[...]

access the rendered sound data.

=CORTEX='s hearing is unique because it does not have any
limitations compared to other simulation environments. As far as I
know, there is no other system that supports multiple listeners,
and the sound demo at the end of this chapter is the first time
it's been done in a video game environment.

*** Brief Description of jMonkeyEngine's Sound System

jMonkeyEngine's sound system works as follows:

[...]

My simulated proprioception calculates the relative angles of each
joint from the rest position defined in the blender file. This
simulates the muscle-spindles and joint capsules. I will deal with
Golgi tendon organs, which calculate muscle strain, in the next
chapter.

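The core of this computation is the signed angle between two vectors
measured around an axis. Here is a pure-Clojure sketch of that math,
a reconstruction for illustration rather than =CORTEX='s actual
helper code.

#+begin_src clojure
;; Sketch of the underlying math: the signed angle between vectors a
;; and b, measured around a unit axis vector. A reconstruction for
;; illustration, not CORTEX's actual helper code.
(defn dot [a b] (reduce + (map * a b)))

(defn cross [[ax ay az] [bx by bz]]
  [(- (* ay bz) (* az by))
   (- (* az bx) (* ax bz))
   (- (* ax by) (* ay bx))])

(defn signed-angle [a b axis]
  (Math/atan2 (dot (cross a b) axis) (dot a b)))

(signed-angle [1 0 0] [0 1 0] [0 0 1])  ;=> 1.5707963... (+90 degrees)
#+end_src
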
*** Helper functions

=absolute-angle= calculates the angle between two vectors,
relative to a third axis vector. This angle is the number of

[...]
a rotational force dependent on its orientation to the object in
the blender file. The function returned by =movement-kernel= is
also a sense function: it returns the percent of the total muscle
strength that is currently being employed. This is analogous to
muscle tension in humans and completes the sense of proprioception
begun in the last chapter.

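This dual actuator/sensor character can be sketched as a closure.
The name =make-muscle= and the stubbed effector are assumptions made
for illustration; the real =movement-kernel= drives the physics
engine directly.

#+begin_src clojure
;; Sketch of a muscle that is both effector and sense: it applies a
;; (stubbed) force and reports the fraction of maximum strength in
;; use. `make-muscle` is an invented name for this illustration.
(defn make-muscle [max-force apply-force!]
  (fn [requested-force]
    (let [force (min requested-force max-force)]
      (apply-force! force)        ; act on the simulated world
      (/ force max-force))))      ; sense: current tension fraction

;; Usage with a no-op effector:
(def bicep (make-muscle 50.0 (fn [_] nil)))
(bicep 20.0)  ;=> 0.4
#+end_src
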
** =CORTEX= brings complex creatures to life!

The ultimate test of =CORTEX= is to create a creature with the full
gamut of senses and put it through its paces.

[...]

hard control problems without worrying about physics or
senses.

\newpage

* COMMENT =EMPATH=: action recognition in a simulated worm

Here I develop a computational model of empathy, using =CORTEX= as a
base. Empathy in this context is the ability to observe another
creature and infer what sorts of sensations that creature is
feeling. My empathy algorithm involves multiple phases. First is

[...]
see no errors in action identification compared to my own judgment
of what the worm is doing.

** Digression: Learning touch sensor layout through free play

In the previous chapter I showed how to compute actions in terms of
body-centered predicates, but some of those predicates relied on
the average touch activation of pre-defined regions of the worm's
skin. What if, instead of receiving touch pre-grouped into the six
faces of each worm segment, the true topology of the worm's skin
was unknown? This is more similar to how a nerve fiber bundle might

[...]

together on the skin, the process of taking a complicated surface
and forcing it into essentially a circle requires that some regions
of skin that are close together in the animal end up far apart in
the nerve bundle.

In this chapter I show how to automatically learn the skin-topology of
a worm segment by free exploration. As the worm rolls around on the
floor, large sections of its surface get activated. If the worm has
stopped moving, then whatever region of skin is touching the floor
is probably an important region, and should be recorded.

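The recording step of this free-play learning can be sketched as
follows. The frame format, with a =:moving?= flag and a vector of
touch activations, is an assumption made for illustration.

#+begin_src clojure
;; Sketch of the learning loop: for every frame where the worm is at
;; rest, record the set of active touch-sensor indices as one
;; candidate region. The frame format is assumed for illustration.
(defn learn-regions [frames]
  (->> frames
       (remove :moving?)
       (map (fn [{:keys [touch]}]
              (set (keep-indexed (fn [i v] (when (pos? v) i)) touch))))
       (distinct)))

(learn-regions
 [{:moving? true  :touch [0 1 0]}
  {:moving? false :touch [1 1 0]}
  {:moving? false :touch [1 1 0]}])  ;=> (#{0 1})
#+end_src
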
[...]

#+BEGIN_LaTeX
\clearpage
#+END_LaTeX

* COMMENT Contributions

The big idea behind this thesis is a new way to represent and
recognize physical actions, which I call /empathic representation/.
Actions are represented as predicates which have access to the
totality of a creature's sensory abilities. To recognize the

[...]
#+BEGIN_LaTeX
\clearpage
\appendix
#+END_LaTeX

* COMMENT Appendix: =CORTEX= User Guide

Those who write a thesis should endeavor to make their code not only
accessible, but actually usable, as a way to pay back the community
that made the thesis possible in the first place. This thesis would
not be possible without Free Software such as jMonkeyEngine3,