comparison thesis/cortex.org @ 551:d304b2ea7c58
Some changes from Winston.
author   | Robert McIntyre <rlm@mit.edu>
date     | Fri, 02 May 2014 13:40:47 -0400
parents  | b1d8d9b4b569
children | 20f64a70f8c5
550:b1d8d9b4b569 | 551:d304b2ea7c58 |
---|---|
41 [[./images/aurellem-gray.png]] | 41 [[./images/aurellem-gray.png]] |
42 | 42 |
43 | 43 |
44 * Empathy \& Embodiment: problem solving strategies | 44 * Empathy \& Embodiment: problem solving strategies |
45 | 45 |
46 By the end of this thesis, you will have a novel approach to | 46 By the time you have read this thesis, you will understand a novel |
47 representing an recognizing physical actions using embodiment and | 47 approach to representing and recognizing physical actions using |
48 empathy. You will also see one way to efficiently implement physical | 48 embodiment and empathy. You will also see one way to efficiently |
49 empathy for embodied creatures. Finally, you will become familiar | 49 implement physical empathy for embodied creatures. Finally, you will |
50 with =CORTEX=, a system for designing and simulating creatures with | 50 become familiar with =CORTEX=, a system for designing and simulating |
51 rich senses, which I have designed as a library that you can use in | 51 creatures with rich senses, which I have designed as a library that |
52 your own research. Note that I /do not/ process video directly --- I | 52 you can use in your own research. Note that I /do not/ process video |
53 start with knowledge of the positions of a creature's body parts and | 53 directly --- I start with knowledge of the positions of a creature's |
54 works from there. | 54 body parts and work from there. |
55 | 55 |
56 This is the core vision of my thesis: That one of the important ways | 56 This is the core vision of my thesis: That one of the important ways |
57 in which we understand others is by imagining ourselves in their | 57 in which we understand others is by imagining ourselves in their |
58 position and empathically feeling experiences relative to our own | 58 position and empathically feeling experiences relative to our own |
59 bodies. By understanding events in terms of our own previous | 59 bodies. By understanding events in terms of our own previous |
63 is happening in a video and being completely lost in a sea of | 63 is happening in a video and being completely lost in a sea of |
64 incomprehensible color and movement. | 64 incomprehensible color and movement. |
65 | 65 |
66 ** The problem: recognizing actions is hard! | 66 ** The problem: recognizing actions is hard! |
67 | 67 |
68 Examine the following image. What is happening? As you, and indeed | 68 Examine figure \ref{cat-drink}. What is happening? As you, and |
69 very young children, can easily determine, this is an image of | 69 indeed very young children, can easily determine, this is an image |
70 drinking. | 70 of drinking. |
71 | 71 |
72 #+caption: A cat drinking some water. Identifying this action is | 72 #+caption: A cat drinking some water. Identifying this action is |
73 #+caption: beyond the capabilities of existing computer vision systems. | 73 #+caption: beyond the capabilities of existing computer vision systems. |
| 74 #+name: cat-drink |
74 #+ATTR_LaTeX: :width 7cm | 75 #+ATTR_LaTeX: :width 7cm |
75 [[./images/cat-drinking.jpg]] | 76 [[./images/cat-drinking.jpg]] |
76 | 77 |
77 Nevertheless, it is beyond the state of the art for a computer | 78 Nevertheless, it is beyond the state of the art for a computer |
78 vision program to describe what's happening in this image. Part of | 79 vision program to describe what's happening in this image. Part of |
92 #+name: hidden-chair | 93 #+name: hidden-chair |
93 #+ATTR_LaTeX: :width 10cm | 94 #+ATTR_LaTeX: :width 10cm |
94 [[./images/fat-person-sitting-at-desk.jpg]] | 95 [[./images/fat-person-sitting-at-desk.jpg]] |
95 | 96 |
96 Finally, how is it that you can easily tell the difference between | 97 Finally, how is it that you can easily tell the difference between |
97 how the girls /muscles/ are working in figure \ref{girl}? | 98 how the girl's /muscles/ are working in figure \ref{girl}? |
98 | 99 |
99 #+caption: The mysterious ``common sense'' appears here as you are able | 100 #+caption: The mysterious ``common sense'' appears here as you are able |
100 #+caption: to discern the difference in how the girl's arm muscles | 101 #+caption: to discern the difference in how the girl's arm muscles |
101 #+caption: are activated between the two images. | 102 #+caption: are activated between the two images. When you compare |
| 103 #+caption: these two images, do you feel something in your own arm |
| 104 #+caption: muscles? |
102 #+name: girl | 105 #+name: girl |
103 #+ATTR_LaTeX: :width 7cm | 106 #+ATTR_LaTeX: :width 7cm |
104 [[./images/wall-push.png]] | 107 [[./images/wall-push.png]] |
105 | 108 |
106 Each of these examples tells us something about what might be going | 109 Each of these examples tells us something about what might be going |
136 their body, and are able to recognize that /feeling/ as drinking. | 139 their body, and are able to recognize that /feeling/ as drinking. |
137 So, the label of the action is not really in the pixels of the | 140 So, the label of the action is not really in the pixels of the |
138 image, but is found clearly in a simulation / recollection inspired | 141 image, but is found clearly in a simulation / recollection inspired |
139 by those pixels. An imaginative system, having been trained on | 142 by those pixels. An imaginative system, having been trained on |
140 drinking and non-drinking examples and learning that the most | 143 drinking and non-drinking examples and learning that the most |
141 important component of drinking is the feeling of water sliding | 144 important component of drinking is the feeling of water flowing |
142 down one's throat, would analyze a video of a cat drinking in the | 145 down one's throat, would analyze a video of a cat drinking in the |
143 following manner: | 146 following manner: |
144 | 147 |
145 1. Create a physical model of the video by putting a ``fuzzy'' | 148 1. Create a physical model of the video by putting a ``fuzzy'' |
146 model of its own body in place of the cat. Possibly also create | 149 model of its own body in place of the cat. Possibly also create |
147 a simulation of the stream of water. | 150 a simulation of the stream of water. |
148 | 151 |
149 2. ``Play out'' this simulated scene and generate imagined sensory | 152 2. Play out this simulated scene and generate imagined sensory |
150 experience. This will include relevant muscle contractions, a | 153 experience. This will include relevant muscle contractions, a |
151 close up view of the stream from the cat's perspective, and most | 154 close up view of the stream from the cat's perspective, and most |
152 importantly, the imagined feeling of water entering the mouth. | 155 importantly, the imagined feeling of water entering the mouth. |
153 The imagined sensory experience can come from a simulation of | 156 The imagined sensory experience can come from a simulation of |
154 the event, but can also be pattern-matched from previous, | 157 the event, but can also be pattern-matched from previous, |
231 ** =EMPATH= recognizes actions using empathy | 234 ** =EMPATH= recognizes actions using empathy |
232 | 235 |
233 Exploring these ideas further demands a concrete implementation, so | 236 Exploring these ideas further demands a concrete implementation, so |
234 first, I built a system for constructing virtual creatures with | 237 first, I built a system for constructing virtual creatures with |
235 physiologically plausible sensorimotor systems and detailed | 238 physiologically plausible sensorimotor systems and detailed |
236 environments. The result is =CORTEX=, which is described in section | 239 environments. The result is =CORTEX=, which I describe in chapter |
237 \ref{sec-2}. | 240 \ref{sec-2}. |
238 | 241 |
239 Next, I wrote routines which enabled a simple worm-like creature to | 242 Next, I wrote routines which enabled a simple worm-like creature to |
240 infer the actions of a second worm-like creature, using only its | 243 infer the actions of a second worm-like creature, using only its |
241 own prior sensorimotor experiences and knowledge of the second | 244 own prior sensorimotor experiences and knowledge of the second |
242 worm's joint positions. This program, =EMPATH=, is described in | 245 worm's joint positions. This program, =EMPATH=, is described in |
243 section \ref{sec-3}. Its main components are: | 246 chapter \ref{sec-3}. Its main components are: |
244 | 247 |
245 - Embodied Action Definitions :: Many otherwise complicated actions | 248 - Embodied Action Definitions :: Many otherwise complicated actions |
246 are easily described in the language of a full suite of | 249 are easily described in the language of a full suite of |
247 body-centered, rich senses and experiences. For example, | 250 body-centered, rich senses and experiences. For example, |
248 drinking is the feeling of water sliding down your throat, and | 251 drinking is the feeling of water flowing down your throat, and |
249 cooling your insides. It's often accompanied by bringing your | 252 cooling your insides. It's often accompanied by bringing your |
250 hand close to your face, or bringing your face close to water. | 253 hand close to your face, or bringing your face close to water. |
251 Sitting down is the feeling of bending your knees, activating | 254 Sitting down is the feeling of bending your knees, activating |
252 your quadriceps, then feeling a surface with your bottom and | 255 your quadriceps, then feeling a surface with your bottom and |
253 relaxing your legs. These body-centered action descriptions | 256 relaxing your legs. These body-centered action descriptions |
314 language provided by =CORTEX= resulted in extremely economical | 317 language provided by =CORTEX= resulted in extremely economical |
315 recognition routines --- about 500 lines in all --- suggesting | 318 recognition routines --- about 500 lines in all --- suggesting |
316 that such representations are very powerful, and often | 319 that such representations are very powerful, and often |
317 indispensable for the types of recognition tasks considered here. | 320 indispensable for the types of recognition tasks considered here. |
318 | 321 |
319 - Although for expediency's sake, I relied on direct knowledge of | 322 - For expediency's sake, I relied on direct knowledge of joint |
320 joint positions in this proof of concept, it would be | 323 positions in this proof of concept. However, I believe that the |
321 straightforward to extend =EMPATH= so that it (more | 324 structure of =EMPATH= and =CORTEX= will make future work to |
322 realistically) infers joint positions from its visual data. | 325 enable video analysis much easier than it would otherwise be. |
323 | 326 |
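The list above describes embodied action definitions only in prose, so here is a minimal runnable sketch of what one such body-centered predicate could look like in clojure. The map keys (=knee-angle=, =quadriceps-activation=, =bottom-pressure=) are illustrative names of my own, not =EMPATH='s actual API:

#+begin_src clojure
;; Hypothetical sketch of a body-centered action predicate. One frame
;; of experience is modeled as a plain map of sensory readings; the
;; key names are illustrative, not EMPATH's real interface.
(defn sitting-down?
  "True when the knees bend, the quadriceps activate, and pressure
   appears under the body --- the feeling described in the text."
  [{:keys [knee-angle quadriceps-activation bottom-pressure]}]
  (and (> knee-angle 1.0)             ; radians of knee flexion
       (> quadriceps-activation 0.5)  ; fraction of maximum muscle strength
       (pos? bottom-pressure)))       ; touch sensors under the body report contact

;; Example: a frame in which the creature is settling onto a chair.
(sitting-down? {:knee-angle 1.4
                :quadriceps-activation 0.8
                :bottom-pressure 2.5})
;; => true
#+end_src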
324 ** =EMPATH= is built on =CORTEX=, a creature builder. | 327 ** =EMPATH= is built on =CORTEX=, a creature builder. |
325 | 328 |
326 I built =CORTEX= to be a general AI research platform for doing | 329 I built =CORTEX= to be a general AI research platform for doing |
327 experiments involving multiple rich senses and a wide variety and | 330 experiments involving multiple rich senses and a wide variety and |
341 | 344 |
342 =CORTEX= is well suited as an environment for embodied AI research | 345 =CORTEX= is well suited as an environment for embodied AI research |
343 for three reasons: | 346 for three reasons: |
344 | 347 |
345 - You can design new creatures using Blender (\cite{blender}), a | 348 - You can design new creatures using Blender (\cite{blender}), a |
346 popular 3D modeling program. Each sense can be specified using | 349 popular, free 3D modeling program. Each sense can be specified |
347 special blender nodes with biologically inspired parameters. You | 350 using special blender nodes with biologically inspired |
348 need not write any code to create a creature, and can use a wide | 351 parameters. You need not write any code to create a creature, and |
349 library of pre-existing blender models as a base for your own | 352 can use a wide library of pre-existing blender models as a base |
350 creatures. | 353 for your own creatures. |
351 | 354 |
352 - =CORTEX= implements a wide variety of senses: touch, | 355 - =CORTEX= implements a wide variety of senses: touch, |
353 proprioception, vision, hearing, and muscle tension. Complicated | 356 proprioception, vision, hearing, and muscle tension. Complicated |
354 senses like touch and vision involve multiple sensory elements | 357 senses like touch and vision involve multiple sensory elements |
355 embedded in a 2D surface. You have complete control over the | 358 embedded in a 2D surface. You have complete control over the |
356 distribution of these sensor elements through the use of simple | 359 distribution of these sensor elements through the use of simple |
357 png image files. =CORTEX= implements more comprehensive hearing | 360 image files. =CORTEX= implements more comprehensive hearing than |
358 than any other creature simulation system available. | 361 any other creature simulation system available. |
359 | 362 |
360 - =CORTEX= supports any number of creatures and any number of | 363 - =CORTEX= supports any number of creatures and any number of |
361 senses. Time in =CORTEX= dilates so that the simulated creatures | 364 senses. Time in =CORTEX= dilates so that the simulated creatures |
362 always perceive a perfectly smooth flow of time, regardless of | 365 always perceive a perfectly smooth flow of time, regardless of |
363 the actual computational load. | 366 the actual computational load. |
364 | 367 |
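The time-dilation bullet above deserves a concrete rendering: the world advances by a fixed amount of simulated time per frame, decoupling perceived time from wall-clock time. A minimal sketch of the idea, with names of my own choosing rather than =CORTEX='s actual API:

#+begin_src clojure
;; Illustrative sketch of time dilation (assumed names, not CORTEX's API).
;; Every rendered frame advances the simulation by a fixed timestep, no
;; matter how long the frame takes to compute, so creatures always
;; perceive a perfectly smooth flow of time.
(def timestep (/ 1.0 60.0)) ; simulated seconds per frame

(defn simulate
  "Run `n-frames` of simulation, threading `state` through `frame-fn`
   (physics plus sense collection). Simulated time grows by exactly
   `timestep` per frame regardless of computational load."
  [n-frames frame-fn state]
  (reduce (fn [s frame] (frame-fn s (* frame timestep)))
          state
          (range n-frames)))
#+end_src

At the 1/80 real-time figure mentioned below, each 1/60 s slice of simulated time takes roughly 1.3 wall-clock seconds to compute, but the creatures cannot tell the difference.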
365 =CORTEX= is built on top of =jMonkeyEngine3= | 368 =CORTEX= is built on top of =jMonkeyEngine3= |
366 (\cite{jmonkeyengine}), which is a video game engine designed to | 369 (\cite{jmonkeyengine}), which is a video game engine designed to |
367 create cross-platform 3D desktop games. =CORTEX= is mainly written | 370 create cross-platform 3D desktop games. =CORTEX= is mainly written |
368 in clojure, a dialect of =LISP= that runs on the java virtual | 371 in clojure, a dialect of =LISP= that runs on the Java Virtual |
369 machine (JVM). The API for creating and simulating creatures and | 372 Machine (JVM). The API for creating and simulating creatures and |
370 senses is entirely expressed in clojure, though many senses are | 373 senses is entirely expressed in clojure, though many senses are |
371 implemented at the layer of jMonkeyEngine or below. For example, | 374 implemented at the layer of jMonkeyEngine or below. For example, |
372 for the sense of hearing I use a layer of clojure code on top of a | 375 for the sense of hearing I use a layer of clojure code on top of a |
373 layer of java JNI bindings that drive a layer of =C++= code which | 376 layer of java JNI bindings that drive a layer of =C++= code which |
374 implements a modified version of =OpenAL= to support multiple | 377 implements a modified version of =OpenAL= to support multiple |
394 - exploration of exotic senses and effectors that are not possible | 397 - exploration of exotic senses and effectors that are not possible |
395 in the real world (such as telekinesis or a semantic sense) | 398 in the real world (such as telekinesis or a semantic sense) |
396 - imagination using subworlds | 399 - imagination using subworlds |
397 | 400 |
398 During one test with =CORTEX=, I created 3,000 creatures each with | 401 During one test with =CORTEX=, I created 3,000 creatures each with |
399 their own independent senses and ran them all at only 1/80 real | 402 its own independent senses and ran them all at only 1/80 real time. |
400 time. In another test, I created a detailed model of my own hand, | 403 In another test, I created a detailed model of my own hand, |
401 equipped with a realistic distribution of touch (more sensitive at | 404 equipped with a realistic distribution of touch (more sensitive at |
402 the fingertips), as well as eyes and ears, and it ran at around 1/4 | 405 the fingertips), as well as eyes and ears, and it ran at around 1/4 |
403 real time. | 406 real time. |
404 | 407 |
405 #+BEGIN_LaTeX | 408 #+BEGIN_LaTeX |
414 its own finger from the eye in its palm, and that it can feel its | 417 its own finger from the eye in its palm, and that it can feel its |
415 own thumb touching its palm.} | 418 own thumb touching its palm.} |
416 \end{sidewaysfigure} | 419 \end{sidewaysfigure} |
417 #+END_LaTeX | 420 #+END_LaTeX |
418 | 421 |
419 * Designing =CORTEX= | 422 * COMMENT Designing =CORTEX= |
420 | 423 |
421 In this section, I outline the design decisions that went into | 424 In this chapter, I outline the design decisions that went into |
422 making =CORTEX=, along with some details about its implementation. | 425 making =CORTEX=, along with some details about its implementation. |
423 (A practical guide to getting started with =CORTEX=, which skips | 426 (A practical guide to getting started with =CORTEX=, which skips |
424 over the history and implementation details presented here, is | 427 over the history and implementation details presented here, is |
425 provided in an appendix at the end of this thesis.) | 428 provided in an appendix at the end of this thesis.) |
426 | 429 |
1315 community and is now (in modified form) part of a system for | 1318 community and is now (in modified form) part of a system for |
1316 capturing in-game video to a file. | 1319 capturing in-game video to a file. |
1317 | 1320 |
1318 ** ...but hearing must be built from scratch | 1321 ** ...but hearing must be built from scratch |
1319 | 1322 |
1320 At the end of this section I will have simulated ears that work the | 1323 At the end of this chapter I will have simulated ears that work the |
1321 same way as the simulated eyes in the last section. I will be able to | 1324 same way as the simulated eyes in the last chapter. I will be able to |
1322 place any number of ear-nodes in a blender file, and they will bind to | 1325 place any number of ear-nodes in a blender file, and they will bind to |
1323 the closest physical object and follow it as it moves around. Each ear | 1326 the closest physical object and follow it as it moves around. Each ear |
1324 will provide access to the sound data it picks up between one frame and the next. | 1327 will provide access to the sound data it picks up between one frame and the next. |
1325 | 1328 |
1326 Hearing is one of the more difficult senses to simulate, because there | 1329 Hearing is one of the more difficult senses to simulate, because there |
1330 access the rendered sound data. | 1333 access the rendered sound data. |
1331 | 1334 |
1332 =CORTEX='s hearing is unique because it does not have any | 1335 =CORTEX='s hearing is unique because it does not have any |
1333 limitations compared to other simulation environments. As far as I | 1336 limitations compared to other simulation environments. As far as I |
1334 know, there is no other system that supports multiple listeners, | 1337 know, there is no other system that supports multiple listeners, |
1335 and the sound demo at the end of this section is the first time | 1338 and the sound demo at the end of this chapter is the first time |
1336 it's been done in a video game environment. | 1339 it's been done in a video game environment. |
1337 | 1340 |
1338 *** Brief Description of jMonkeyEngine's Sound System | 1341 *** Brief Description of jMonkeyEngine's Sound System |
1339 | 1342 |
1340 jMonkeyEngine's sound system works as follows: | 1343 jMonkeyEngine's sound system works as follows: |
2144 | 2147 |
2145 My simulated proprioception calculates the relative angles of each | 2148 My simulated proprioception calculates the relative angles of each |
2146 joint from the rest position defined in the blender file. This | 2149 joint from the rest position defined in the blender file. This |
2147 simulates the muscle-spindles and joint capsules. I will deal with | 2150 simulates the muscle-spindles and joint capsules. I will deal with |
2148 Golgi tendon organs, which calculate muscle strain, in the next | 2151 Golgi tendon organs, which calculate muscle strain, in the next |
2149 section. | 2152 chapter. |
2150 | 2153 |
2151 *** Helper functions | 2154 *** Helper functions |
2152 | 2155 |
2153 =absolute-angle= calculates the angle between two vectors, | 2156 =absolute-angle= calculates the angle between two vectors, |
2154 relative to a third axis vector. This angle is the number of | 2157 relative to a third axis vector. This angle is the number of |
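The hunk above breaks off mid-description, but the intent of =absolute-angle= is clear: the angle between two vectors around a reference axis, disambiguated to a full revolution. A sketch under the assumption that =CORTEX= uses jMonkeyEngine's =Vector3f= math (a guess at the implementation, not the thesis's verbatim code):

#+begin_src clojure
;; Sketch of an absolute-angle helper, assuming jMonkeyEngine's Vector3f.
(import 'com.jme3.math.Vector3f)

(defn absolute-angle
  "Angle from `origin` to `vector` around `axis`, extended from the
   unsigned [0, PI] range to [0, 2*PI) using the cross product's
   direction relative to the axis."
  [^Vector3f vector ^Vector3f origin ^Vector3f axis]
  (let [angle (.angleBetween vector origin)] ; unsigned angle in [0, PI]
    (if (pos? (.dot (.cross origin vector) axis))
      angle
      (- (* 2 Math/PI) angle))))
#+end_src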
2390 a rotational force dependent on its orientation to the object in | 2393 a rotational force dependent on its orientation to the object in |
2391 the blender file. The function returned by =movement-kernel= is | 2394 the blender file. The function returned by =movement-kernel= is |
2392 also a sense function: it returns the percent of the total muscle | 2395 also a sense function: it returns the percent of the total muscle |
2393 strength that is currently being employed. This is analogous to | 2396 strength that is currently being employed. This is analogous to |
2394 muscle tension in humans and completes the sense of proprioception | 2397 muscle tension in humans and completes the sense of proprioception |
2395 begun in the last section. | 2398 begun in the last chapter. |
2396 | 2399 |
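Since =movement-kernel='s dual role (actuator and sense) is described above only in prose, here is a minimal sketch of the shape such a kernel might take. The names and the stubbed actuator are my own illustration, not the actual implementation:

#+begin_src clojure
;; Hypothetical sketch of a movement kernel's dual role (assumed names).
;; Called with a desired activation level, it applies force through the
;; supplied actuator and returns the fraction of total muscle strength
;; in use --- the muscle-tension sense.
(defn make-movement-kernel
  [max-strength apply-force!]
  (fn [activation]
    (let [force (* max-strength (min 1.0 (max 0.0 activation)))]
      (apply-force! force)      ; side effect: actuate the joint
      (/ force max-strength)))) ; sense: current tension fraction

;; Example usage with a stubbed actuator:
((make-movement-kernel 50.0 (fn [_force] nil)) 0.4)
;; => 0.4
#+end_src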
2397 ** =CORTEX= brings complex creatures to life! | 2400 ** =CORTEX= brings complex creatures to life! |
2398 | 2401 |
2399 The ultimate test of =CORTEX= is to create a creature with the full | 2402 The ultimate test of =CORTEX= is to create a creature with the full |
2400 gamut of senses and put it through its paces. | 2403 gamut of senses and put it through its paces. |
2497 hard control problems without worrying about physics or | 2500 hard control problems without worrying about physics or |
2498 senses. | 2501 senses. |
2499 | 2502 |
2500 \newpage | 2503 \newpage |
2501 | 2504 |
2502 * =EMPATH=: action recognition in a simulated worm | 2505 * COMMENT =EMPATH=: action recognition in a simulated worm |
2503 | 2506 |
2504 Here I develop a computational model of empathy, using =CORTEX= as a | 2507 Here I develop a computational model of empathy, using =CORTEX= as a |
2505 base. Empathy in this context is the ability to observe another | 2508 base. Empathy in this context is the ability to observe another |
2506 creature and infer what sorts of sensations that creature is | 2509 creature and infer what sorts of sensations that creature is |
2507 feeling. My empathy algorithm involves multiple phases. First is | 2510 feeling. My empathy algorithm involves multiple phases. First is |
3218 see no errors in action identification compared to my own judgment | 3221 see no errors in action identification compared to my own judgment |
3219 of what the worm is doing. | 3222 of what the worm is doing. |
3220 | 3223 |
3221 ** Digression: Learning touch sensor layout through free play | 3224 ** Digression: Learning touch sensor layout through free play |
3222 | 3225 |
3223 In the previous section I showed how to compute actions in terms of | 3226 In the previous chapter I showed how to compute actions in terms of |
3224 body-centered predicates, but some of those predicates relied on | 3227 body-centered predicates, but some of those predicates relied on |
3225 the average touch activation of pre-defined regions of the worm's | 3228 the average touch activation of pre-defined regions of the worm's |
3226 skin. What if, instead of receiving touch pre-grouped into the six | 3229 skin. What if, instead of receiving touch pre-grouped into the six |
3227 faces of each worm segment, the true topology of the worm's skin | 3230 faces of each worm segment, the true topology of the worm's skin |
3228 was unknown? This is more similar to how a nerve fiber bundle might | 3231 was unknown? This is more similar to how a nerve fiber bundle might |
3231 together on the skin, the process of taking a complicated surface | 3234 together on the skin, the process of taking a complicated surface |
3232 and forcing it into essentially a circle requires that some regions | 3235 and forcing it into essentially a circle requires that some regions |
3233 of skin that are close together in the animal end up far apart in | 3236 of skin that are close together in the animal end up far apart in |
3234 the nerve bundle. | 3237 the nerve bundle. |
3235 | 3238 |
3236 In this section I show how to automatically learn the skin-topology of | 3239 In this chapter I show how to automatically learn the skin-topology of |
3237 a worm segment by free exploration. As the worm rolls around on the | 3240 a worm segment by free exploration. As the worm rolls around on the |
3238 floor, large sections of its surface get activated. If the worm has | 3241 floor, large sections of its surface get activated. If the worm has |
3239 stopped moving, then whatever region of skin is touching the | 3242 stopped moving, then whatever region of skin is touching the |
3240 floor is probably an important region, and should be recorded. | 3243 floor is probably an important region, and should be recorded. |
3241 | 3244 |
3482 | 3485 |
3483 #+BEGIN_LaTeX | 3486 #+BEGIN_LaTeX |
3484 \clearpage | 3487 \clearpage |
3485 #+END_LaTeX | 3488 #+END_LaTeX |
3486 | 3489 |
3487 * Contributions | 3490 * COMMENT Contributions |
3488 | 3491 |
3489 The big idea behind this thesis is a new way to represent and | 3492 The big idea behind this thesis is a new way to represent and |
3490 recognize physical actions, which I call /empathic representation/. | 3493 recognize physical actions, which I call /empathic representation/. |
3491 Actions are represented as predicates which have access to the | 3494 Actions are represented as predicates which have access to the |
3492 totality of a creature's sensory abilities. To recognize the | 3495 totality of a creature's sensory abilities. To recognize the |
3542 #+BEGIN_LaTeX | 3545 #+BEGIN_LaTeX |
3543 \clearpage | 3546 \clearpage |
3544 \appendix | 3547 \appendix |
3545 #+END_LaTeX | 3548 #+END_LaTeX |
3546 | 3549 |
3547 * Appendix: =CORTEX= User Guide | 3550 * COMMENT Appendix: =CORTEX= User Guide |
3548 | 3551 |
3549 Those who write a thesis should endeavor to make their code not only | 3552 Those who write a thesis should endeavor to make their code not only |
3550 accessible, but actually usable, as a way to pay back the community | 3553 accessible, but actually usable, as a way to pay back the community |
3551 that made the thesis possible in the first place. This thesis would | 3554 that made the thesis possible in the first place. This thesis would |
3552 not be possible without Free Software such as jMonkeyEngine3, | 3555 not be possible without Free Software such as jMonkeyEngine3, |