comparison thesis/cortex.org @ 548:0b891e0dd809
version 0.2 of thesis complete.
author | Robert McIntyre <rlm@mit.edu> |
date | Thu, 01 May 2014 23:41:41 -0400 |
parents | 5d89879fc894 |
children | c14545acdfba |
547:5d89879fc894 | 548:0b891e0dd809 |
---|---|
511 in proportion to the amount of processing required each frame. From the | 511 in proportion to the amount of processing required each frame. From the |
512 perspective of the creatures inside the simulation, time always | 512 perspective of the creatures inside the simulation, time always |
513 appears to flow at a constant rate, regardless of how complicated | 513 appears to flow at a constant rate, regardless of how complicated |
514 the environment becomes or how many creatures are in the | 514 the environment becomes or how many creatures are in the |
515 simulation. The cost is that =CORTEX= can sometimes run slower than | 515 simulation. The cost is that =CORTEX= can sometimes run slower than |
516 real time. Time dialation works both ways, however --- simulations | 516 real time. Time dilation works both ways, however --- simulations |
517 of very simple creatures in =CORTEX= generally run at 40x real-time | 517 of very simple creatures in =CORTEX= generally run at 40x real-time |
518 on my machine! | 518 on my machine! |
519 | 519 |
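To make this concrete, here is a minimal sketch of the principle
(illustrative only; these names are not part of =CORTEX='s API): the
simulated clock advances by the same fixed increment every frame,
however long the frame takes to compute.

#+begin_src clojure
;; Illustration only, not CORTEX's actual implementation.
(def dt (/ 1 60))   ; one frame of simulated time, as an exact ratio

(defn physics-step
  "Stand-in for one step of the physics engine."
  [state]
  (update state :sim-time + dt))

(defn simulate
  "Run n-frames steps. However much real time a frame takes to
   compute, simulated time still advances by exactly dt per frame."
  [state n-frames]
  (if (zero? n-frames)
    state
    (recur (physics-step state) (dec n-frames))))

;; 120 frames at 1/60 s each: exactly 2 simulated seconds,
;; no matter how heavily loaded the machine is.
(simulate {:sim-time 0} 120)
#+end_src
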
520 ** All sense organs are two-dimensional surfaces | 520 ** All sense organs are two-dimensional surfaces |
521 | 521 |
563 Therefore, =CORTEX= must support the ability to create objects and | 563 Therefore, =CORTEX= must support the ability to create objects and |
564 then be able to ``paint'' points along their surfaces to describe | 564 then be able to ``paint'' points along their surfaces to describe |
565 each sense. | 565 each sense. |
566 | 566 |
567 Fortunately this idea is already a well known computer graphics | 567 Fortunately this idea is already a well known computer graphics |
568 technique called /UV-mapping/. In UV-maping, the three-dimensional | 568 technique called /UV-mapping/. In UV-mapping, the three-dimensional |
569 surface of a model is cut and smooshed until it fits on a | 569 surface of a model is cut and smooshed until it fits on a |
570 two-dimensional image. You paint whatever you want on that image, | 570 two-dimensional image. You paint whatever you want on that image, |
571 and when the three-dimensional shape is rendered in a game the | 571 and when the three-dimensional shape is rendered in a game the |
572 smooshing and cutting is reversed and the image appears on the | 572 smooshing and cutting is reversed and the image appears on the |
573 three-dimensional object. | 573 three-dimensional object. |
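
The same UV image that carries a model's artwork can also carry
sense-organ placement: a pixel painted in a special color marks the
location of one sensor, and its [u v] coordinates are later mapped
back onto the 3D surface. Here is a simplified sketch of that lookup
(illustrative only; this is not =CORTEX='s actual image-handling
code).

#+begin_src clojure
;; Simplified sketch, not CORTEX's implementation. The UV image is a
;; vector of rows of pixels, where 1 means "painted" and 0 means
;; "unpainted".
(defn sensor-uv-coordinates
  "Return the [u v] coordinates of every painted pixel in uv-image."
  [uv-image]
  (for [[v row] (map-indexed vector uv-image)
        [u pix] (map-indexed vector row)
        :when (= pix 1)]
    [u v]))

;; A tiny 4x3 UV image describing three sensor locations.
(sensor-uv-coordinates
 [[0 0 1 0]
  [0 1 0 0]
  [0 0 0 1]])
;; => ([2 0] [1 1] [3 2])
#+end_src
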
2812 of muscle contractions to transform the worm's body along a | 2812 of muscle contractions to transform the worm's body along a |
2813 specific path through \Phi-space. | 2813 specific path through \Phi-space. |
2814 | 2814 |
2815 The worm's total life experience is a long looping path through | 2815 The worm's total life experience is a long looping path through |
2816 \Phi-space. I will now introduce a simple way of taking that | 2816 \Phi-space. I will now introduce a simple way of taking that |
2817 experiece path and building a function that can infer complete | 2817 experience path and building a function that can infer complete |
2818 sensory experience given only a stream of proprioceptive data. This | 2818 sensory experience given only a stream of proprioceptive data. This |
2819 /empathy/ function will provide a bridge to use the body-centered | 2819 /empathy/ function will provide a bridge to use the body-centered |
2820 action predicates on video-like streams of information. | 2820 action predicates on video-like streams of information. |
2821 | 2821 |
2822 ** Empathy is the process of building paths in \Phi-space | 2822 ** Empathy is the process of building paths in \Phi-space |
2970 | 2970 |
2971 =longest-thread= takes time proportional to the average number of | 2971 =longest-thread= takes time proportional to the average number of |
2972 entries in a proprioceptive bin, because for each element in the | 2972 entries in a proprioceptive bin, because for each element in the |
2973 starting bin it performs a series of set lookups in the preceding | 2973 starting bin it performs a series of set lookups in the preceding |
2974 bins. If the total history is limited, then this takes time | 2974 bins. If the total history is limited, then this takes time |
2975 proprotional to a only a constant multiple of the number of entries | 2975 proportional to only a constant multiple of the number of entries |
2976 in the starting bin. This analysis also applies, even if the action | 2976 in the starting bin. This analysis also applies, even if the action |
2977 requires multiple longest chains -- it's still the average number | 2977 requires multiple longest chains -- it's still the average number |
2978 of entries in a proprioceptive bin times the desired chain length. | 2978 of entries in a proprioceptive bin times the desired chain length. |
2979 Because =longest-thread= is so efficient and simple, I can | 2979 Because =longest-thread= is so efficient and simple, I can |
2980 interpret worm-actions in real time. | 2980 interpret worm-actions in real time. |
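
To make the analysis concrete, here is a toy version of this search
(a sketch only; it is not the implementation used by =EMPATH=). The
experiences are numbered consecutively, each proprioceptive bin is a
set of experience indices, and a thread is extended backwards through
the earlier bins at a cost of one set lookup per step.

#+begin_src clojure
;; Sketch only; the real longest-thread differs in details.
;; `bins` is a sequence of sets of experience indices, with the
;; starting (most recent) bin first.
(defn thread-length
  "How far back can the experience numbered `start` be traced through
   the earlier bins by consecutive indices?"
  [start earlier-bins]
  (->> (map contains? earlier-bins (iterate dec (dec start)))
       (take-while true?)
       (count)
       (inc)))

(defn longest-thread-sketch
  "Return the longest run of consecutive experience indices ending in
   the first bin, oldest index first."
  [bins]
  (let [[current & earlier] bins
        best (apply max-key #(thread-length % earlier) current)]
    (reverse (take (thread-length best earlier) (iterate dec best)))))

;; Experiences 41, 42, 43 form a thread of length three.
(longest-thread-sketch [#{43 7} #{42 9} #{41} #{5}])
;; => (41 42 43)
#+end_src
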
3123 worm's \Phi-space is generated from a simple motor script. Then the | 3123 worm's \Phi-space is generated from a simple motor script. Then the |
3124 worm is re-created in an environment almost identical to | 3124 worm is re-created in an environment almost identical to |
3125 the testing environment for the action-predicates, with one major | 3125 the testing environment for the action-predicates, with one major |
3126 difference: the only sensory information available to the system | 3126 difference: the only sensory information available to the system |
3127 is proprioception. From just the proprioception data and | 3127 is proprioception. From just the proprioception data and |
3128 \Phi-space, =longest-thread= synthesises a complete record the last | 3128 \Phi-space, =longest-thread= synthesizes a complete record of the last |
3129 300 sensory experiences of the worm. These synthesized experiences | 3129 300 sensory experiences of the worm. These synthesized experiences |
3130 are fed directly into the action predicates =grand-circle?=, | 3130 are fed directly into the action predicates =grand-circle?=, |
3131 =curled?=, =wiggling?=, and =resting?= from before and their output | 3131 =curled?=, =wiggling?=, and =resting?= from before and their output |
3132 is printed to the screen at each frame. | 3132 is printed to the screen at each frame. |
3133 | 3133 |
3363 #+name: worm-roll | 3363 #+name: worm-roll |
3364 #+ATTR_LaTeX: :width 12cm | 3364 #+ATTR_LaTeX: :width 12cm |
3365 [[./images/worm-roll.png]] | 3365 [[./images/worm-roll.png]] |
3366 | 3366 |
3367 #+caption: After completing its adventures, the worm now knows | 3367 #+caption: After completing its adventures, the worm now knows |
3368 #+caption: how its touch sensors are arranged along its skin. These | 3368 #+caption: how its touch sensors are arranged along its skin. Each of these six rectangles is a touch sensory pattern that was |
3369 #+caption: are the regions that were deemed important by | 3369 #+caption: deemed important by |
3370 #+caption: =learn-touch-regions=. Each white square in the rectangles | 3370 #+caption: =learn-touch-regions=. Each white square in the rectangles |
3371 #+caption: above is a cluster of ``related'' touch nodes as determined | 3371 #+caption: above is a cluster of ``related'' touch nodes as determined |
3372 #+caption: by the system. Since each square in the ``cross" corresponds | 3372 #+caption: by the system. The worm has correctly discovered that it has six faces, and has partitioned its sensory map accordingly. |
3373 #+caption: to a face, the worm has correctly discovered that it has | |
3374 #+caption: six faces. | |
3375 #+name: worm-touch-map | 3373 #+name: worm-touch-map |
3376 #+ATTR_LaTeX: :width 12cm | 3374 #+ATTR_LaTeX: :width 12cm |
3377 [[./images/touch-learn.png]] | 3375 [[./images/touch-learn.png]] |
3378 | 3376 |
3379 While simple, =learn-touch-regions= exploits regularities in both | 3377 While simple, =learn-touch-regions= exploits regularities in both |
3381 deduce that the worm has six sides. Note that =learn-touch-regions= | 3379 deduce that the worm has six sides. Note that =learn-touch-regions= |
3382 would work just as well even if the worm's touch sense data were | 3380 would work just as well even if the worm's touch sense data were |
3383 completely scrambled. The cross shape is just for convenience. This | 3381 completely scrambled. The cross shape is just for convenience. This |
3384 example justifies the use of pre-defined touch regions in =EMPATH=. | 3382 example justifies the use of pre-defined touch regions in =EMPATH=. |
3385 | 3383 |
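The heart of the method can be shown with a toy version of the
clustering step (a sketch only; this is not the code behind
=learn-touch-regions=): sensors whose activation histories are
identical across the worm's experiences must lie on the same face,
and this grouping does not care how the sensors happen to be
numbered.

#+begin_src clojure
;; Toy sketch, not the thesis's implementation. `frames` is a
;; sequence of touch frames; each frame is a vector giving the
;; activation (0 or 1) of every sensor at one moment.
(defn touch-regions
  "Group sensor indices whose activation histories are identical."
  [frames]
  (let [history-of (fn [i] (mapv #(nth % i) frames))]
    (->> (range (count (first frames)))
         (group-by history-of)
         (vals)
         (map set))))

;; Sensors 0 and 2 always fire together, as do 1 and 3, so two
;; regions are discovered; scrambling the sensor order would yield
;; the same grouping.
(touch-regions [[1 0 1 0]
                [0 1 0 1]
                [1 1 1 1]])
;; => (#{0 2} #{1 3})
#+end_src
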
3384 ** Recognizing an object using embodied representation | |
3385 | |
3386 At the beginning of the thesis, I suggested that we might recognize | |
3387 the chair in Figure \ref{hidden-chair} by imagining ourselves in | |
3388 the position of the man and realizing that he must be sitting on | |
3389 something in order to maintain that position. Here, I present a | |
3390 brief elaboration on how this might be done. |
3391 | |
3392 First, I need the feeling of leaning or resting /on/ some other | |
3393 object that is not the floor. This feeling is easy to describe | |
3394 using an embodied representation. | |
3395 | |
3396 #+caption: Program describing the sense of leaning or resting on something. | |
3397 #+caption: This involves a relaxed posture, the feeling of touching something, | |
3398 #+caption: and a period of stability where the worm does not move. | |
3399 #+name: draped | |
3400 #+begin_listing clojure | |
3401 #+begin_src clojure | |
3402 (defn draped? | |
3403 "Is the worm: | |
3404 -- not flat (the floor is not a 'chair') | |
3405 -- supported (not using its muscles to hold its position) | |
3406 -- stable (not changing its position) | |
3407 -- touching something (must register contact)" | |
3408 [experiences] | |
3409 (let [b2-hash (bin 2) | |
3410 touch (:touch (peek experiences)) | |
3411 total-contact | |
3412 (reduce | |
3413 + | |
3414 (map #(contact all-touch-coordinates %) | |
3415 (rest touch)))] | |
3416 (println total-contact) | |
3417 (and (not (resting? experiences)) | |
3418 (every? | |
3419 zero? | |
3420 (-> experiences | |
3421 (vector:last-n 25) | |
3422 (#(map :muscle %)) | |
3423 (flatten))) | |
3424 (-> experiences | |
3425 (vector:last-n 20) | |
3426 (#(map (comp b2-hash flatten :proprioception) %)) | |
3427 (set) | |
3428 (count) (= 1)) | |
3429 (< 0.03 total-contact)))) | |
3430 #+end_src | |
3431 #+end_listing | |
3432 | |
3433 #+caption: The =draped?= predicate detects the presence of the | |
3434 #+caption: cube whenever the worm interacts with it. The details of the | |
3435 #+caption: cube are irrelevant; only the way it influences the worm's | |
3436 #+caption: body matters. | |
3437 #+name: draped-video | |
3438 #+ATTR_LaTeX: :width 13cm | |
3439 [[./images/draped.png]] | |
3440 | |
3441 Though this is a simple example, using the =draped?= predicate to | |
3442 detect the cube has interesting advantages. The =draped?= predicate | |
3443 describes the cube not in terms of properties that the cube has, | |
3444 but instead in terms of how the worm interacts with it physically. | |
3445 This means that the cube can still be detected even if it is not | |
3446 visible, as long as its influence on the worm's body is visible. | |
3447 | |
3448 This system will also see the virtual cube created by a | |
3449 ``mimeworm'', which uses its muscles in a very controlled way to |
3450 mimic the appearance of leaning on a cube. The system will | |
3451 anticipate that there is an actual invisible cube that provides | |
3452 support! | |
3453 | |
3454 #+caption: Can you see the thing that this person is leaning on? | |
3455 #+caption: What properties does it have, other than how it makes the man's | |
3456 #+caption: elbow and shoulder feel? I wonder if people who can actually | |
3457 #+caption: maintain this pose easily still see the support? | |
3458 #+name: mime | |
3459 #+ATTR_LaTeX: :width 6cm | |
3460 [[./images/pablo-the-mime.png]] | |
3461 | |
3462 This makes me wonder about the psychology of actual mimes. Suppose | |
3463 for a moment that people have something analogous to \Phi-space and | |
3464 that one of the ways that they find objects in a scene is by their | |
3465 relation to other people's bodies. Suppose that a person watches |
3466 another person miming an invisible wall. For a person with no experience |
3467 with miming, their \Phi-space will only have entries that describe | |
3468 the scene with the sensation of their hands touching a wall. This | |
3469 sensation of touch will create a strong impression of a wall, even | |
3470 though the wall would have to be invisible. A person with | |
3471 experience in miming, however, will have entries in their \Phi-space |
3472 that describe the wall-miming position without a sense of touch. It | |
3473 will not seem to such a person that an invisible wall is present, |
3474 but merely that the mime is holding out their hands in a special | |
3475 way. Thus, the theory that humans use something like \Phi-space | |
3476 weakly predicts that learning how to mime should break the power of | |
3477 miming illusions. Most optical illusions still work no matter how | |
3478 much you know about them, so this proposal would be quite | |
3479 interesting to test, as it predicts a non-standard result! | |
3480 | |
3481 | |
3482 #+BEGIN_LaTeX | |
3483 \clearpage | |
3484 #+END_LaTeX | |
3485 | |
3386 * Contributions | 3486 * Contributions |
3487 | |
3488 The big idea behind this thesis is a new way to represent and | |
3489 recognize physical actions, which I call /empathic representation/. | |
3490 Actions are represented as predicates which have access to the | |
3491 totality of a creature's sensory abilities. To recognize the | |
3492 physical actions of another creature similar to yourself, you | |
3493 imagine what they would feel by examining the position of their body | |
3494 and relating it to your own previous experience. | |
3387 | 3495 |
3388 The big idea behind this thesis is a new way to represent and | 3496 Empathic representation of physical actions is robust and general. |
3389 recognize physical actions -- empathic representation. Actions are | 3497 Because the representation is body-centered, it avoids baking in a |
3390 represented as predicates which have available the totality of a | 3498 particular viewpoint like you might get from learning from example |
3391 creature's sensory abilities. To recognize the physical actions of | 3499 videos. Because empathic representation relies on all of a |
3392 another creature similar to yourself, you imagine what they would | |
3393 feel by examining the position of their body and relating it to your | |
3394 own previous experience. | |
3395 | |
3396 Empathic description of physical actions is very robust and general. | |
3397 Because the representation is body-centered, it avoids the fragility | |
3398 of learning from example videos. Because it relies on all of a | |
3399 creature's senses, it can describe exactly what an action /feels | 3500 creature's senses, it can describe exactly what an action /feels |
3400 like/ without getting caught up in irrelevant details such as visual | 3501 like/ without getting caught up in irrelevant details such as visual |
3401 appearance. I think it is important that a correct description of | 3502 appearance. I think it is important that a correct description of |
3402 jumping (for example) should not waste even a single bit on the | 3503 jumping (for example) should not include incidental details such as |
3403 color of a person's clothes or skin; empathic representation can | 3504 the color of a person's clothes or skin; empathic representation can |
3404 avoid this waste by describing jumping in terms of touch, muscle | 3505 get right to the heart of what jumping is by describing it in terms |
3405 contractions, and the brief feeling of weightlessness. Empathic | 3506 of touch, muscle contractions, and a brief feeling of |
3406 representation is very low-level in that it describes actions using | 3507 weightlessness. Empathic representation is very low-level in that it |
3407 concrete sensory data with little abstraction, but it has the | 3508 describes actions using concrete sensory data with little |
3408 generality of much more abstract representations! | 3509 abstraction, but it has the generality of much more abstract |
3510 representations! | |
3409 | 3511 |
3410 Another important contribution of this thesis is the development of | 3512 Another important contribution of this thesis is the development of |
3411 the =CORTEX= system, a complete environment for creating simulated | 3513 the =CORTEX= system, a complete environment for creating simulated |
3412 creatures. You have seen how to implement five senses: touch, | 3514 creatures. You have seen how to implement five senses: touch, |
3413 proprioception, hearing, vision, and muscle tension. You have seen | 3515 proprioception, hearing, vision, and muscle tension. You have seen |
3414 how to create new creatures using blender, a 3D modeling tool. | 3516 how to create new creatures using blender, a 3D modeling tool. |
3415 | 3517 |
3416 I hope that =CORTEX= will be useful in further research projects. To | |
3417 this end I have included the full source to =CORTEX= along with a | |
3418 large suite of tests and examples. I have also created a user guide | |
3419 for =CORTEX= which is included in an appendix to this thesis. | |
3420 | |
3421 As a minor digression, you also saw how I used =CORTEX= to enable a | 3518 As a minor digression, you also saw how I used =CORTEX= to enable a |
3422 tiny worm to discover the topology of its skin simply by rolling on | 3519 tiny worm to discover the topology of its skin simply by rolling on |
3423 the ground. | 3520 the ground. You also saw how to detect objects using only embodied |
3424 | 3521 predicates. |
3425 In conclusion, the main contributions of this thesis are: | 3522 |
3426 | 3523 In conclusion, for this thesis I: |
3427 - =CORTEX=, a comprehensive platform for embodied AI experiments. | 3524 |
3428 =CORTEX= supports many features lacking in other systems, such | 3525 - Developed the idea of embodied representation, which describes |
3429 proper simulation of hearing. It is easy to create new =CORTEX= | 3526 actions that a creature can do in terms of first-person sensory |
3430 creatures using Blender, a free 3D modeling program. | 3527 data. |
3431 | 3528 |
3432 - =EMPATH=, which uses =CORTEX= to identify the actions of a | 3529 - Developed a method of empathic action recognition which uses |
3433 worm-like creature using a computational model of empathy. This | 3530 previous embodied experience and embodied representation of |
3434 empathic representation of actions is an important new kind of | 3531 actions to greatly constrain the possible interpretations of an |
3435 representation for physical actions. | 3532 action. |
3533 | |
3534 - Created =EMPATH=, a program which uses empathic action | |
3535 recognition to recognize physical actions in a simple model | |
3536 involving segmented worm-like creatures. | |
3537 | |
3538 - Created =CORTEX=, a comprehensive platform for embodied AI | |
3539 experiments. It is the base on which =EMPATH= is built. | |
3436 | 3540 |
3437 #+BEGIN_LaTeX | 3541 #+BEGIN_LaTeX |
3438 \newpage | 3542 \clearpage |
3439 \appendix | 3543 \appendix |
3440 #+END_LaTeX | 3544 #+END_LaTeX |
3441 | 3545 |
3442 * Appendix: =CORTEX= User Guide | 3546 * Appendix: =CORTEX= User Guide |
3443 | 3547 |