comparison thesis/cortex.org @ 528:fd74479db5cb

merge some winston updates.
author Robert McIntyre <rlm@mit.edu>
date Mon, 21 Apr 2014 02:12:51 -0400
parents 25f23cfd56ce
children 96c189d4d15e
527:ac747fa0a678 528:fd74479db5cb
42 42
43 43
44 * Empathy \& Embodiment: problem solving strategies 44 * Empathy \& Embodiment: problem solving strategies
45 45
46 By the end of this thesis, you will have seen a novel approach to 46 By the end of this thesis, you will have seen a novel approach to
47 interpreting video using embodiment and empathy. You will have also 47 interpreting video using embodiment and empathy. You will also see
48 seen one way to efficiently implement empathy for embodied 48 one way to efficiently implement physical empathy for embodied
49 creatures. Finally, you will become familiar with =CORTEX=, a system 49 creatures. Finally, you will become familiar with =CORTEX=, a system
50 for designing and simulating creatures with rich senses, which you 50 for designing and simulating creatures with rich senses, which I
51 may choose to use in your own research. 51 have designed as a library that you can use in your own research.
52 Note that I /do not/ process video directly --- I start with
53 knowledge of the positions of a creature's body parts and work from
54 there.
52 55
53 This is the core vision of my thesis: That one of the important ways 56 This is the core vision of my thesis: That one of the important ways
54 in which we understand others is by imagining ourselves in their 57 in which we understand others is by imagining ourselves in their
55 position and empathetically feeling experiences relative to our own 58 position and empathetically feeling experiences relative to our own
56 bodies. By understanding events in terms of our own previous 59 bodies. By understanding events in terms of our own previous
58 would otherwise be an unwieldy exponential search. This extra 61 would otherwise be an unwieldy exponential search. This extra
59 constraint can be the difference between easily understanding what 62 constraint can be the difference between easily understanding what
60 is happening in a video and being completely lost in a sea of 63 is happening in a video and being completely lost in a sea of
61 incomprehensible color and movement. 64 incomprehensible color and movement.
62 65
63 ** The problem: recognizing actions in video is hard! 66 ** The problem: recognizing actions is hard!
64 67
65 Examine the following image. What is happening? As you, and indeed 68 Examine the following image. What is happening? As you, and indeed
66 very young children, can easily determine, this is an image of 69 very young children, can easily determine, this is an image of
67 drinking. 70 drinking.
68 71
82 probabilities than with recognizing various affordances: things you 85 probabilities than with recognizing various affordances: things you
83 can move, objects you can grasp, spaces that can be filled. For 86 can move, objects you can grasp, spaces that can be filled. For
84 example, what processes might enable you to see the chair in figure 87 example, what processes might enable you to see the chair in figure
85 \ref{hidden-chair}? 88 \ref{hidden-chair}?
86 89
87 #+caption: The chair in this image is quite obvious to humans, but I 90 #+caption: The chair in this image is quite obvious to humans, but
88 #+caption: doubt that any modern computer vision program can find it. 91 #+caption: it can't be found by any modern computer vision program.
89 #+name: hidden-chair 92 #+name: hidden-chair
90 #+ATTR_LaTeX: :width 10cm 93 #+ATTR_LaTeX: :width 10cm
91 [[./images/fat-person-sitting-at-desk.jpg]] 94 [[./images/fat-person-sitting-at-desk.jpg]]
92 95
93 Finally, how is it that you can easily tell the difference between 96 Finally, how is it that you can easily tell the difference between
478 saved for later use. It is harder to conduct science because it is 481 saved for later use. It is harder to conduct science because it is
479 harder to repeat an experiment. The worst thing about using the 482 harder to repeat an experiment. The worst thing about using the
480 real world instead of a simulation is the matter of time. Instead 483 real world instead of a simulation is the matter of time. Instead
481 of simulated time you get the constant and unstoppable flow of 484 of simulated time you get the constant and unstoppable flow of
482 real time. This severely limits the sorts of software you can use 485 real time. This severely limits the sorts of software you can use
483 to program the AI because all sense inputs must be handled in real 486 to program an AI, because all sense inputs must be handled in real
484 time. Complicated ideas may have to be implemented in hardware or 487 time. Complicated ideas may have to be implemented in hardware or
485 may simply be impossible given the current speed of our 488 may simply be impossible given the current speed of our
486 processors. Contrast this with a simulation, in which the flow of 489 processors. Contrast this with a simulation, in which the flow of
487 time in the simulated world can be slowed down to accommodate the 490 time in the simulated world can be slowed down to accommodate the
488 limitations of the character's programming. In terms of cost, 491 limitations of the character's programming. In terms of cost,
548 cochlea; each one is sensitive to a slightly different frequency of 551 cochlea; each one is sensitive to a slightly different frequency of
549 sound. For eyes, it is rods and cones distributed along the surface 552 sound. For eyes, it is rods and cones distributed along the surface
550 of the retina. In each case, we can describe the sense with a 553 of the retina. In each case, we can describe the sense with a
551 surface and a distribution of sensors along that surface. 554 surface and a distribution of sensors along that surface.
552 555
553 The neat idea is that every human sense can be effectively 556 In fact, almost every human sense can be effectively described in
554 described in terms of a surface containing embedded sensors. If the 557 terms of a surface containing embedded sensors. If the sense had
555 sense had any more dimensions, then there wouldn't be enough room 558 any more dimensions, then there wouldn't be enough room in the
556 in the spinal cord to transmit the information! 559 spinal cord to transmit the information!
557 560
558 Therefore, =CORTEX= must support the ability to create objects and 561 Therefore, =CORTEX= must support the ability to create objects and
559 then be able to ``paint'' points along their surfaces to describe 562 then be able to ``paint'' points along their surfaces to describe
560 each sense. 563 each sense.
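The surface-with-sensors idea can be made concrete with a small sketch. It assumes, purely for illustration, that a sense is "painted" as a 2D grid standing in for an image mapped onto an object's surface, with each nonzero cell marking one sensor. The function `sensor-positions` and the grid format are hypothetical and are not =CORTEX='s API.

```clojure
;; Hypothetical model: a sense "painted" onto a surface is a 2D grid in
;; which every nonzero cell marks the location of one sensor.
(defn sensor-positions
  "Return the [row col] coordinates of every painted (nonzero) cell,
  in row-major order."
  [grid]
  (vec (for [[r row] (map-indexed vector grid)
             [c v]   (map-indexed vector row)
             :when   (pos? v)]
         [r c])))

;; A 3x4 patch of skin with a touch sensor painted at each corner.
(def skin-patch
  [[1 0 0 1]
   [0 0 0 0]
   [1 0 0 1]])

(sensor-positions skin-patch)
;; => [[0 0] [0 3] [2 0] [2 3]]
```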
561 564
2376 (movement-kernel creature muscle))) 2379 (movement-kernel creature muscle)))
2377 #+END_SRC 2380 #+END_SRC
2378 #+end_listing 2381 #+end_listing
2379 2382
2380 2383
2381 =movement-kernel= creates a function that will move the nearest 2384 =movement-kernel= creates a function that controls the movement
2382 physical object to the muscle node. The muscle exerts a rotational 2385 of the nearest physical node to the muscle node. The muscle exerts
2383 force dependent on its orientation to the object in the blender 2386 a rotational force dependent on its orientation to the object in
2384 file. The function returned by =movement-kernel= is also a sense 2387 the blender file. The function returned by =movement-kernel= is
2385 function: it returns the percent of the total muscle strength that 2388 also a sense function: it returns the percent of the total muscle
2386 is currently being employed. This is analogous to muscle tension 2389 strength that is currently being employed. This is analogous to
2387 in humans and completes the sense of proprioception begun in the 2390 muscle tension in humans and completes the sense of proprioception
2388 last section. 2391 begun in the last section.
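A function that is both a motor controller and a sense can be illustrated with a minimal sketch. `muscle-kernel` and `apply-torque!` are hypothetical names (the physics call is passed in as a plain function so the sketch runs without a physics engine), and the sketch assumes a muscle can only pull, so requests are clamped to [0, max-strength]. It is not the thesis's =movement-kernel=.

```clojure
(defn muscle-kernel
  "Hypothetical sketch: return a function that drives a muscle and
  reports muscle tension. The returned function clamps the requested
  force to [0, max-strength] (assuming the muscle can only pull), hands
  the clamped force to apply-torque!, and returns the fraction of
  max-strength in use."
  [max-strength apply-torque!]
  (fn [requested]
    (let [effort (-> requested (max 0.0) (min max-strength))]
      (apply-torque! effort)
      (double (/ effort max-strength)))))

;; Record applied forces instead of talking to a physics engine.
(def applied (atom []))
(def biceps (muscle-kernel 100.0 #(swap! applied conj %)))

(biceps 25.0)   ;; => 0.25
(biceps 250.0)  ;; => 1.0 (clamped to max-strength)
@applied        ;; => [25.0 100.0]
```

Returning the tension from the same closure that applies the force keeps motor output and its sensory reading from drifting apart.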
2389 2392
2390 ** =CORTEX= brings complex creatures to life! 2393 ** =CORTEX= brings complex creatures to life!
2391 2394
2392 The ultimate test of =CORTEX= is to create a creature with the full 2395 The ultimate test of =CORTEX= is to create a creature with the full
2393 gamut of senses and put it through its paces. 2396 gamut of senses and put it through its paces.
2489 - Inverse kinematics :: experiments in sense guided motor control 2492 - Inverse kinematics :: experiments in sense guided motor control
2490 are easy given =CORTEX='s support -- you can get right to the 2493 are easy given =CORTEX='s support -- you can get right to the
2491 hard control problems without worrying about physics or 2494 hard control problems without worrying about physics or
2492 senses. 2495 senses.
2493 2496
2497 \newpage
2498
2494 * =EMPATH=: action recognition in a simulated worm 2499 * =EMPATH=: action recognition in a simulated worm
2495 2500
2496 Here I develop a computational model of empathy, using =CORTEX= as a 2501 Here I develop a computational model of empathy, using =CORTEX= as a
2497 base. Empathy in this context is the ability to observe another 2502 base. Empathy in this context is the ability to observe another
2498 creature and infer what sorts of sensations that creature is 2503 creature and infer what sorts of sensations that creature is
2500 free-play, where the creature moves around and gains sensory 2505 free-play, where the creature moves around and gains sensory
2501 experience. From this experience I construct a representation of the 2506 experience. From this experience I construct a representation of the
2502 creature's sensory state space, which I call \Phi-space. Using 2507 creature's sensory state space, which I call \Phi-space. Using
2503 \Phi-space, I construct an efficient function which takes the 2508 \Phi-space, I construct an efficient function which takes the
2504 limited data that comes from observing another creature and enriches 2509 limited data that comes from observing another creature and enriches
2505 it full complement of imagined sensory data. I can then use the 2510 it with a full complement of imagined sensory data. I can then use
2506 imagined sensory data to recognize what the observed creature is 2511 the imagined sensory data to recognize what the observed creature is
2507 doing and feeling, using straightforward embodied action predicates. 2512 doing and feeling, using straightforward embodied action predicates.
2508 This is all demonstrated using a simple worm-like creature, and 2513 This is all demonstrated using a simple worm-like creature, and
2509 recognizing worm-actions based on limited data. 2514 recognizing worm-actions based on limited data.
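One naive way to realize the enrichment step is a nearest-neighbour lookup: store every full experience from free play, then answer an observed proprioception vector with the stored experience whose proprioception is closest. This sketch assumes experiences are maps with a `:proprioception` vector and uses hypothetical names (`enrich`, `squared-distance`). Its linear scan is not the efficient function the text describes, and it makes no claim about how the thesis achieves that efficiency.

```clojure
(defn squared-distance
  "Sum of squared differences between two equal-length numeric seqs."
  [a b]
  (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) a b)))

(defn enrich
  "Hypothetical naive Phi-space lookup. phi-space is a non-empty vector
  of full experience maps recorded during free play; observed is the
  proprioception vector inferred from watching another worm. Returns
  the recorded experience whose :proprioception is nearest, supplying
  imagined touch and muscle data. Linear scan: O(n) per query."
  [phi-space observed]
  (apply min-key
         #(squared-distance (:proprioception %) observed)
         phi-space))

(def phi-space
  [{:proprioception [0.0 0.0] :touch [0 0] :muscle [0.1 0.1]}
   {:proprioception [1.5 1.5] :touch [1 1] :muscle [0.9 0.8]}])

(enrich phi-space [1.4 1.6])
;; => {:proprioception [1.5 1.5], :touch [1 1], :muscle [0.9 0.8]}
```

A faster structure (for example, a spatial index over proprioception vectors) would be needed at scale; the sketch keeps the linear scan for clarity.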
2510 2515
2511 #+caption: Here is the worm with which we will be working. 2516 #+caption: Here is the worm with which we will be working.
2553 2558
2554 ** Action recognition is easy with a full gamut of senses 2559 ** Action recognition is easy with a full gamut of senses
2555 2560
2556 Embodied representations using multiple senses such as touch, 2561 Embodied representations using multiple senses such as touch,
2557 proprioception, and muscle tension turn out to be exceedingly 2562 proprioception, and muscle tension turn out to be exceedingly
2558 efficient at describing body-centered actions. It is the ``right 2563 efficient at describing body-centered actions. It is the right
2559 language for the job''. For example, it takes only around 5 lines 2564 language for the job. For example, it takes only around 5 lines of
2560 of LISP code to describe the action of ``curling'' using embodied 2565 LISP code to describe the action of curling using embodied
2561 primitives. It takes about 10 lines to describe the seemingly 2566 primitives. It takes about 10 lines to describe the seemingly
2562 complicated action of wiggling. 2567 complicated action of wiggling.
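As a rough illustration of a Fourier-based wiggling check, here is a sketch assuming the muscle signal is a vector of per-frame activation values over a fixed window of at least two frames. It computes a naive O(n^2) DFT and tests whether the strongest non-DC frequency completes the expected number of cycles with enough magnitude. The names `dft-magnitudes`, `dominant-bin`, `cycles`, and `min-magnitude` are hypothetical, and this is not the thesis's =wiggling?=.

```clojure
(defn dft-magnitudes
  "Magnitudes |X_k| of the discrete Fourier transform of xs for
  k = 0 .. n-1, computed naively in O(n^2)."
  [xs]
  (let [n (count xs)]
    (vec
     (for [k (range n)]
       (let [angles (map #(/ (* -2.0 Math/PI k %) n) (range n))
             re     (reduce + (map #(* %1 (Math/cos %2)) xs angles))
             im     (reduce + (map #(* %1 (Math/sin %2)) xs angles))]
         (Math/sqrt (+ (* re re) (* im im))))))))

(defn dominant-bin
  "Index of the largest-magnitude non-DC bin among 1 .. n/2.
  Requires at least two samples."
  [mags]
  (apply max-key mags (range 1 (inc (quot (count mags) 2)))))

(defn wiggling?
  "Hypothetical predicate: true when the strongest oscillation in
  muscle-signal completes exactly `cycles` periods over the window and
  its magnitude exceeds min-magnitude."
  [muscle-signal cycles min-magnitude]
  (let [mags (dft-magnitudes muscle-signal)
        k    (dominant-bin mags)]
    (and (= k cycles) (> (mags k) min-magnitude))))

;; 16 frames of a muscle oscillating through 3 full cycles.
(def oscillating (mapv #(Math/sin (/ (* 2.0 Math/PI 3 %) 16)) (range 16)))
(def steady      (vec (repeat 16 0.5)))

(wiggling? oscillating 3 1.0) ;; => true
(wiggling? steady 3 1.0)      ;; => false
```

The magnitude threshold keeps a motionless muscle, whose non-DC bins are only floating-point noise, from being classified as wiggling. A practical version would call an FFT library rather than this hand-written transform.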
2563 2568
2564 The following action predicates each take a stream of sensory 2569 The following action predicates each take a stream of sensory
2565 experience, observe however much of it they desire, and decide 2570 experience, observe however much of it they desire, and decide
2566 whether the worm is doing the action they describe. =curled?= 2571 whether the worm is doing the action they describe. =curled?=
2567 relies on proprioception, =resting?= relies on touch, =wiggling?= 2572 relies on proprioception, =resting?= relies on touch, =wiggling?=
2568 relies on a Fourier analysis of muscle contraction, and 2573 relies on a Fourier analysis of muscle contraction, and
2569 =grand-circle?= relies on touch and reuses =curled?= as a guard. 2574 =grand-circle?= relies on touch and reuses =curled?= in its
2575 definition, showing how embodied predicates can be composed.
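Composition can be sketched with plain functions over a vector of experience frames. Everything here is hypothetical: the frame layout (`:proprioception` as joint bend angles in radians, `:touch` as a map with `:head` and `:tail`), the 1.0-radian threshold, and the names `simple-curled?`, `touching-self?`, and `head-to-tail?`. It only shows the pattern of reusing one predicate inside another, not the thesis's =curled?= or =grand-circle?=.

```clojure
(defn simple-curled?
  "Hypothetical: the most recent frame counts as curled when every
  joint bend angle (radians) in its :proprioception exceeds 1.0."
  [experiences]
  (every? #(> % 1.0) (:proprioception (peek experiences))))

(defn touching-self?
  "Hypothetical: both the :head and :tail touch sensors are active."
  [frame]
  (let [{:keys [head tail]} (:touch frame)]
    (and (pos? head) (pos? tail))))

(defn head-to-tail?
  "Hypothetical composite predicate: reuses simple-curled? and adds a
  touch check, so the worm must be curled with its ends in contact."
  [experiences]
  (and (simple-curled? experiences)
       (touching-self? (peek experiences))))

(def experiences
  [{:proprioception [0.1 0.2 0.1] :touch {:head 0 :tail 0}}
   {:proprioception [1.2 1.3 1.1] :touch {:head 1 :tail 1}}])

(head-to-tail? experiences)       ;; => true
(head-to-tail? (pop experiences)) ;; => false
```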
2570 2576
2571 #+caption: Program for detecting whether the worm is curled. This is the 2577 #+caption: Program for detecting whether the worm is curled. This is the
2572 #+caption: simplest action predicate, because it only uses the last frame 2578 #+caption: simplest action predicate, because it only uses the last frame
2573 #+caption: of sensory experience, and only uses proprioceptive data. Even 2579 #+caption: of sensory experience, and only uses proprioceptive data. Even
2574 #+caption: this simple predicate, however, is automatically frame 2580 #+caption: this simple predicate, however, is automatically frame
2575 #+caption: independent and ignores vermopomorphic differences such as 2581 #+caption: independent and ignores vermopomorphic \footnote{Like
2576 #+caption: worm textures and colors. 2582 #+caption: \emph{anthropomorphic}, except for worms instead of humans.}
2583 #+caption: differences such as worm textures and colors.
2577 #+name: curled 2584 #+name: curled
2578 #+begin_listing clojure 2585 #+begin_listing clojure
2579 #+begin_src clojure 2586 #+begin_src clojure
2580 (defn curled? 2587 (defn curled?
2581 "Is the worm curled up?" 2588 "Is the worm curled up?"
2733 will work regardless of whether the worm is a different color or 2740 will work regardless of whether the worm is a different color or
2734 heavily textured, or if the environment has strange lighting. 2741 heavily textured, or if the environment has strange lighting.
2735 2742
2736 The trick now is to make the action predicates work even when the 2743 The trick now is to make the action predicates work even when the
2737 sensory data on which they depend is absent. If I can do that, then 2744 sensory data on which they depend is absent. If I can do that, then
2738 I will have gained much, 2745 I will have gained much.
2739 2746
2740 ** \Phi-space describes the worm's experiences 2747 ** \Phi-space describes the worm's experiences
2741 2748
2742 As a first step towards building empathy, I need to gather all of 2749 As a first step towards building empathy, I need to gather all of
2743 the worm's experiences during free play. I use a simple vector to 2750 the worm's experiences during free play. I use a simple vector to