comparison thesis/cortex.org @ 548:0b891e0dd809

version 0.2 of thesis complete.
author Robert McIntyre <rlm@mit.edu>
date Thu, 01 May 2014 23:41:41 -0400
parents 5d89879fc894
children c14545acdfba

[...]

in proportion to the amount of processing each frame. From the
perspective of the creatures inside the simulation, time always
appears to flow at a constant rate, regardless of how complicated
the environment becomes or how many creatures are in the
simulation. The cost is that =CORTEX= can sometimes run slower than
real time. Time dilation works both ways, however --- simulations
of very simple creatures in =CORTEX= generally run at 40x real-time
on my machine!

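To make the fixed-increment idea concrete, here is a minimal sketch
(not =CORTEX='s actual implementation; =physics-update= is a
hypothetical placeholder): every rendered frame advances the world
by the same simulated time step, no matter how much wall-clock time
the frame consumed.

#+begin_src clojure
;; Minimal fixed-timestep sketch --- illustrative only.
(def dt (/ 1.0 60.0)) ; simulated seconds per rendered frame

(defn step-simulation
  "Advance the world by exactly dt of simulated time, regardless of
   how long this frame took to compute."
  [world]
  (-> world
      (update-in [:clock] + dt)
      (update-in [:creatures]
                 (partial mapv #(physics-update % dt)))))
#+end_src
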
** All sense organs are two-dimensional surfaces

[...]

Therefore, =CORTEX= must support the ability to create objects and
then be able to ``paint'' points along their surfaces to describe
each sense.

Fortunately this idea is already a well known computer graphics
technique called /UV-mapping/. In UV-mapping, the three-dimensional
surface of a model is cut and smooshed until it fits on a
two-dimensional image. You paint whatever you want on that image,
and when the three-dimensional shape is rendered in a game the
smooshing and cutting is reversed and the image appears on the
three-dimensional object.
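
The same machinery can be read in the other direction to place
senses. As a hedged sketch (illustrative only; =uv-image-pixels=
and =sensor-color?= are hypothetical helpers, not =CORTEX='s actual
API), one can scan a model's UV image for specially painted pixels
and treat each one as a sense-organ location:

#+begin_src clojure
;; Collect sensor locations from a painted UV image (sketch).
(defn painted-sensor-points
  "Return the [u v] coordinates of every pixel in the model's UV
   image that is painted with a sensor color."
  [uv-image]
  (for [[uv color] (uv-image-pixels uv-image)
        :when (sensor-color? color)]
    uv))
#+end_src
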
[...]

of muscle contractions to transform the worm's body along a
specific path through \Phi-space.

The worm's total life experience is a long looping path through
\Phi-space. I will now introduce a simple way of taking that
experience path and building a function that can infer complete
sensory experience given only a stream of proprioceptive data. This
/empathy/ function will provide a bridge to use the body-centered
action predicates on video-like streams of information.

** Empathy is the process of building paths in \Phi-space

[...]

=longest-thread= takes time proportional to the average number of
entries in a proprioceptive bin, because for each element in the
starting bin it performs a series of set lookups in the preceding
bins. If the total history is limited, then this takes time
proportional to only a constant multiple of the number of entries
in the starting bin. This analysis also applies even if the action
requires multiple longest chains --- it's still the average number
of entries in a proprioceptive bin times the desired chain length.
Because =longest-thread= is so efficient and simple, I can
interpret worm-actions in real time.
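
To make the bookkeeping concrete, here is a hedged, simplified
sketch of the lookup pattern described above (it is not the actual
=longest-thread= source): =bins= maps a coarse proprioceptive hash
to the set of \Phi-space indices that share it, so extending a
thread one step backwards costs a single set lookup.

#+begin_src clojure
;; Simplified sketch of the thread-extension lookups ---
;; illustrative only.  `proprio` holds recent proprioceptive
;; hashes, newest first.
(defn chain-length
  "How many consecutive past readings match consecutive earlier
   \Phi-space indices, when the chain is anchored at index i?"
  [bins proprio i]
  (->> (map vector (iterate dec i) proprio)
       (take-while (fn [[j p]] (contains? (bins p #{}) j)))
       (count)))

(defn best-thread
  "Among the indices matching the newest reading, the one that
   anchors the longest chain into the past (assumes at least one
   index matches)."
  [bins [newest :as proprio]]
  (apply max-key #(chain-length bins proprio %) (bins newest)))
#+end_src
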
[...]

worm's \Phi-space is generated from a simple motor script. Then the
worm is re-created in an environment almost exactly identical to
the testing environment for the action-predicates, with one major
difference: the only sensory information available to the system
is proprioception. From just the proprioception data and
\Phi-space, =longest-thread= synthesizes a complete record of the
last 300 sensory experiences of the worm. These synthesized
experiences are fed directly into the action predicates
=grand-circle?=, =curled?=, =wiggling?=, and =resting?= from before
and their output is printed to the screen at each frame.

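As a sketch, the experiment's inference loop looks something like
the following; =synthesize-experiences=, =phi-space=, and
=proprio-stream= are illustrative names rather than the thesis API:

#+begin_src clojure
;; Feed synthesized experiences to the action predicates (sketch).
(doseq [experiences (synthesize-experiences phi-space proprio-stream)]
  (println {:grand-circle? (grand-circle? experiences)
            :curled?       (curled? experiences)
            :wiggling?     (wiggling? experiences)
            :resting?      (resting? experiences)}))
#+end_src
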
[...]

#+name: worm-roll
#+ATTR_LaTeX: :width 12cm
[[./images/worm-roll.png]]

#+caption: After completing its adventures, the worm now knows how
#+caption: its touch sensors are arranged along its skin. Each of
#+caption: these six rectangles is a touch sensory pattern that was
#+caption: deemed important by =learn-touch-regions=. Each white
#+caption: square in the rectangles above is a cluster of
#+caption: ``related'' touch nodes as determined by the system. The
#+caption: worm has correctly discovered that it has six faces, and
#+caption: has partitioned its sensory map into these six faces.
#+name: worm-touch-map
#+ATTR_LaTeX: :width 12cm
[[./images/touch-learn.png]]

While simple, =learn-touch-regions= exploits regularities in both
the worm's physiology and the worm's environment to correctly
deduce that the worm has six sides. Note that =learn-touch-regions=
would work just as well even if the worm's touch sense data were
completely scrambled. The cross shape is just for convenience. This
example justifies the use of pre-defined touch regions in =EMPATH=.

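As a hedged sketch of the clustering idea (illustrative only, not
the thesis implementation), one can merge sensors that fire
together into a single region, which is exactly why a scrambled
sensor ordering changes nothing:

#+begin_src clojure
(require '[clojure.set :as set])

;; Merge co-active touch sensors into regions (sketch).  Each
;; snapshot is the set of sensor ids active during one frame.
(defn co-active-regions
  [snapshots]
  (reduce (fn [regions active]
            (let [[hit miss]
                  ((juxt filter remove)
                   #(seq (set/intersection % active)) regions)]
              ;; fold the snapshot and every overlapping region
              ;; into a single region
              (conj (vec miss) (apply set/union active hit))))
          []
          snapshots))
#+end_src
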
** Recognizing an object using embodied representation

At the beginning of the thesis, I suggested that we might recognize
the chair in Figure \ref{hidden-chair} by imagining ourselves in
the position of the man and realizing that he must be sitting on
something in order to maintain that position. Here, I present a
brief elaboration on how this might be done.

First, I need the feeling of leaning or resting /on/ some other
object that is not the floor. This feeling is easy to describe
using an embodied representation.

#+caption: Program describing the sense of leaning or resting on something.
#+caption: This involves a relaxed posture, the feeling of touching something,
#+caption: and a period of stability where the worm does not move.
#+name: draped
#+begin_listing clojure
#+begin_src clojure
(defn draped?
  "Is the worm:
    -- not flat (the floor is not a 'chair')
    -- supported (not using its muscles to hold its position)
    -- stable (not changing its position)
    -- touching something (must register contact)"
  [experiences]
  (let [b2-hash (bin 2)
        touch (:touch (peek experiences))
        total-contact
        (reduce
         +
         (map #(contact all-touch-coordinates %)
              (rest touch)))]
    (println total-contact)
    (and (not (resting? experiences))      ; not flat on the floor
         (every?                           ; muscles relaxed over the
          zero?                            ; last 25 frames
          (-> experiences
              (vector:last-n 25)
              (#(map :muscle %))
              (flatten)))
         (-> experiences                   ; posture unchanged over
             (vector:last-n 20)            ; the last 20 frames
             (#(map (comp b2-hash flatten :proprioception) %))
             (set)
             (count) (= 1))
         (< 0.03 total-contact))))         ; touching something
#+end_src
#+end_listing
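
For context, =bin= above returns a coarse hashing function. It
might behave something like the following sketch, which is
illustrative only and not the thesis's actual definition: each
value is quantized so that nearby postures collapse onto the same
key.

#+begin_src clojure
;; Quantize values to `digits` decimal places (sketch).
(defn bin [digits]
  (let [scale (Math/pow 10 digits)]
    (fn [values]
      (mapv #(/ (Math/round (* scale %)) scale) values))))

;; ((bin 2) [0.123 1.987]) ;=> [0.12 1.99]
#+end_src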

#+caption: The =draped?= predicate detects the presence of the
#+caption: cube whenever the worm interacts with it. The details of the
#+caption: cube are irrelevant; only the way it influences the worm's
#+caption: body matters.
#+name: draped-video
#+ATTR_LaTeX: :width 13cm
[[./images/draped.png]]

Though this is a simple example, using the =draped?= predicate to
detect the cube has interesting advantages. The =draped?= predicate
describes the cube not in terms of properties that the cube has,
but instead in terms of how the worm interacts with it physically.
This means that the cube can still be detected even if it is not
visible, as long as its influence on the worm's body is visible.

This system will also see the virtual cube created by a
``mimeworm'', which uses its muscles in a very controlled way to
mimic the appearance of leaning on a cube. The system will
anticipate that there is an actual invisible cube that provides
support!

#+caption: Can you see the thing that this person is leaning on?
#+caption: What properties does it have, other than how it makes the man's
#+caption: elbow and shoulder feel? I wonder if people who can actually
#+caption: maintain this pose easily still see the support?
#+name: mime
#+ATTR_LaTeX: :width 6cm
[[./images/pablo-the-mime.png]]

This makes me wonder about the psychology of actual mimes. Suppose
for a moment that people have something analogous to \Phi-space and
that one of the ways that they find objects in a scene is by their
relation to other people's bodies. Suppose that a person watches a
person miming an invisible wall. For a person with no experience
with miming, their \Phi-space will only have entries that describe
the scene with the sensation of their hands touching a wall. This
sensation of touch will create a strong impression of a wall, even
though the wall would have to be invisible. A person with
experience in miming, however, will have entries in their
\Phi-space that describe the wall-miming position without a sense
of touch. It will not seem to such a person that an invisible wall
is present, but merely that the mime is holding out their hands in
a special way. Thus, the theory that humans use something like
\Phi-space weakly predicts that learning how to mime should break
the power of miming illusions. Most optical illusions still work no
matter how much you know about them, so this proposal would be
quite interesting to test, as it predicts a non-standard result!


#+BEGIN_LaTeX
\clearpage
#+END_LaTeX

* Contributions

The big idea behind this thesis is a new way to represent and
recognize physical actions, which I call /empathic representation/.
Actions are represented as predicates which have access to the
totality of a creature's sensory abilities. To recognize the
physical actions of another creature similar to yourself, you
imagine what they would feel by examining the position of their
body and relating it to your own previous experience.

Empathic representation of physical actions is robust and general.
Because the representation is body-centered, it avoids baking in a
particular viewpoint, as you might get from learning from example
videos. Because empathic representation relies on all of a
creature's senses, it can describe exactly what an action /feels
like/ without getting caught up in irrelevant details such as
visual appearance. I think it is important that a correct
description of jumping (for example) should not include irrelevant
details such as the color of a person's clothes or skin; empathic
representation can get right to the heart of what jumping is by
describing it in terms of touch, muscle contractions, and a brief
feeling of weightlessness. Empathic representation is very
low-level in that it describes actions using concrete sensory data
with little abstraction, but it has the generality of much more
abstract representations!

Another important contribution of this thesis is the development of
the =CORTEX= system, a complete environment for creating simulated
creatures. You have seen how to implement five senses: touch,
proprioception, hearing, vision, and muscle tension. You have seen
how to create new creatures using Blender, a 3D modeling tool.

As a minor digression, you also saw how I used =CORTEX= to enable a
tiny worm to discover the topology of its skin simply by rolling on
the ground. You also saw how to detect objects using only embodied
predicates.

In conclusion, for this thesis I:

- Developed the idea of embodied representation, which describes
  actions that a creature can do in terms of first-person sensory
  data.

- Developed a method of empathic action recognition which uses
  previous embodied experience and embodied representation of
  actions to greatly constrain the possible interpretations of an
  action.

- Created =EMPATH=, a program which uses empathic action
  recognition to recognize physical actions in a simple model
  involving segmented worm-like creatures.

- Created =CORTEX=, a comprehensive platform for embodied AI
  experiments. It is the base on which =EMPATH= is built.

#+BEGIN_LaTeX
\clearpage
\appendix
#+END_LaTeX

* Appendix: =CORTEX= User Guide
