comparison thesis/cortex.org @ 525:25f23cfd56ce
alter pics, enhance text.
author | Robert McIntyre <rlm@mit.edu> |
---|---|
date | Mon, 21 Apr 2014 02:13:23 -0400 |
parents | 1e51263afdc0 |
children | 96c189d4d15e |
524:8e52a2802821 | 525:25f23cfd56ce |
---|---|
42 | 42 |
43 | 43 |
44 * Empathy \& Embodiment: problem solving strategies | 44 * Empathy \& Embodiment: problem solving strategies |
45 | 45 |
46 By the end of this thesis, you will have seen a novel approach to | 46 By the end of this thesis, you will have seen a novel approach to |
47 interpreting video using embodiment and empathy. You will have also | 47 interpreting video using embodiment and empathy. You will also see |
48 seen one way to efficiently implement empathy for embodied | 48 one way to efficiently implement physical empathy for embodied |
49 creatures. Finally, you will become familiar with =CORTEX=, a system | 49 creatures. Finally, you will become familiar with =CORTEX=, a system |
50 for designing and simulating creatures with rich senses, which you | 50 for designing and simulating creatures with rich senses, which I |
51 may choose to use in your own research. | 51 have designed as a library that you can use in your own research. |
52 Note that I /do not/ process video directly --- I start with | |
53 knowledge of the positions of a creature's body parts and work from | |
54 there. | |
52 | 55 |
53 This is the core vision of my thesis: That one of the important ways | 56 This is the core vision of my thesis: That one of the important ways |
54 in which we understand others is by imagining ourselves in their | 57 in which we understand others is by imagining ourselves in their |
55 position and empathically feeling experiences relative to our own | 58 position and empathically feeling experiences relative to our own |
56 bodies. By understanding events in terms of our own previous | 59 bodies. By understanding events in terms of our own previous |
58 would otherwise be an unwieldy exponential search. This extra | 61 would otherwise be an unwieldy exponential search. This extra |
59 constraint can be the difference between easily understanding what | 62 constraint can be the difference between easily understanding what |
60 is happening in a video and being completely lost in a sea of | 63 is happening in a video and being completely lost in a sea of |
61 incomprehensible color and movement. | 64 incomprehensible color and movement. |
62 | 65 |
63 ** The problem: recognizing actions in video is hard! | 66 ** The problem: recognizing actions is hard! |
64 | 67 |
65 Examine the following image. What is happening? As you, and indeed | 68 Examine the following image. What is happening? As you, and indeed |
66 very young children, can easily determine, this is an image of | 69 very young children, can easily determine, this is an image of |
67 drinking. | 70 drinking. |
68 | 71 |
82 probabilities than with recognizing various affordances: things you | 85 probabilities than with recognizing various affordances: things you |
83 can move, objects you can grasp, spaces that can be filled. For | 86 can move, objects you can grasp, spaces that can be filled. For |
84 example, what processes might enable you to see the chair in figure | 87 example, what processes might enable you to see the chair in figure |
85 \ref{hidden-chair}? | 88 \ref{hidden-chair}? |
86 | 89 |
87 #+caption: The chair in this image is quite obvious to humans, but I | 90 #+caption: The chair in this image is quite obvious to humans, but |
88 #+caption: doubt that any modern computer vision program can find it. | 91 #+caption: it can't be found by any modern computer vision program. |
89 #+name: hidden-chair | 92 #+name: hidden-chair |
90 #+ATTR_LaTeX: :width 10cm | 93 #+ATTR_LaTeX: :width 10cm |
91 [[./images/fat-person-sitting-at-desk.jpg]] | 94 [[./images/fat-person-sitting-at-desk.jpg]] |
92 | 95 |
93 Finally, how is it that you can easily tell the difference between | 96 Finally, how is it that you can easily tell the difference between |
478 saved for later use. It is harder to conduct science because it is | 481 saved for later use. It is harder to conduct science because it is |
479 harder to repeat an experiment. The worst thing about using the | 482 harder to repeat an experiment. The worst thing about using the |
480 real world instead of a simulation is the matter of time. Instead | 483 real world instead of a simulation is the matter of time. Instead |
481 of simulated time you get the constant and unstoppable flow of | 484 of simulated time you get the constant and unstoppable flow of |
482 real time. This severely limits the sorts of software you can use | 485 real time. This severely limits the sorts of software you can use |
483 to program the AI because all sense inputs must be handled in real | 486 to program an AI, because all sense inputs must be handled in real |
484 time. Complicated ideas may have to be implemented in hardware or | 487 time. Complicated ideas may have to be implemented in hardware or |
485 may simply be impossible given the current speed of our | 488 may simply be impossible given the current speed of our |
486 processors. Contrast this with a simulation, in which the flow of | 489 processors. Contrast this with a simulation, in which the flow of |
487 time in the simulated world can be slowed down to accommodate the | 490 time in the simulated world can be slowed down to accommodate the |
488 limitations of the character's programming. In terms of cost, | 491 limitations of the character's programming. In terms of cost, |
548 cochlea; each one is sensitive to a slightly different frequency of | 551 cochlea; each one is sensitive to a slightly different frequency of |
549 sound. For eyes, it is rods and cones distributed along the surface | 552 sound. For eyes, it is rods and cones distributed along the surface |
550 of the retina. In each case, we can describe the sense with a | 553 of the retina. In each case, we can describe the sense with a |
551 surface and a distribution of sensors along that surface. | 554 surface and a distribution of sensors along that surface. |
552 | 555 |
553 The neat idea is that every human sense can be effectively | 556 In fact, almost every human sense can be effectively described in |
554 described in terms of a surface containing embedded sensors. If the | 557 terms of a surface containing embedded sensors. If the sense had |
555 sense had any more dimensions, then there wouldn't be enough room | 558 any more dimensions, then there wouldn't be enough room in the |
556 in the spinal chord to transmit the information! | 559 spinal cord to transmit the information! |
557 | 560 |
558 Therefore, =CORTEX= must support the ability to create objects and | 561 Therefore, =CORTEX= must support the ability to create objects and |
559 then be able to ``paint'' points along their surfaces to describe | 562 then be able to ``paint'' points along their surfaces to describe |
560 each sense. | 563 each sense. |
561 | 564 |
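To make the idea concrete, here is a minimal sketch of this representation
in Clojure. It is /not/ the actual =CORTEX= API: the names =make-sense=,
=read-sense=, and the =stimulus= argument are illustrative stand-ins for
however a particular simulation delivers stimulation to the surface.

#+begin_src clojure
;; A sense is just a surface plus a collection of "painted" 2D sensor
;; coordinates.  Reading the sense means sampling a stimulus at each of
;; those points, in a fixed order.
(defn make-sense
  "Describe a sense as sensor points painted onto a named surface."
  [surface-name uv-points]
  {:surface surface-name :sensors (vec uv-points)})

(defn read-sense
  "Sample a stimulus function at every sensor location, producing one
   activation value per sensor."
  [{:keys [sensors]} stimulus]
  (mapv stimulus sensors))

;; Example: a toy 'retina' with three sensors along its surface.
(def toy-retina (make-sense :retina [[0.1 0.5] [0.5 0.5] [0.9 0.5]]))
(read-sense toy-retina (fn [[u _]] (if (< u 0.5) 1.0 0.0)))
;; => [1.0 0.0 0.0]
#+end_src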
2376 (movement-kernel creature muscle))) | 2379 (movement-kernel creature muscle))) |
2377 #+END_SRC | 2380 #+END_SRC |
2378 #+end_listing | 2381 #+end_listing |
2379 | 2382 |
2380 | 2383 |
2381 =movement-kernel= creates a function that will move the nearest | 2384 =movement-kernel= creates a function that controls the movement |
2382 physical object to the muscle node. The muscle exerts a rotational | 2385 of the nearest physical node to the muscle node. The muscle exerts |
2383 force dependent on it's orientation to the object in the blender | 2386 a rotational force dependent on its orientation to the object in |
2384 file. The function returned by =movement-kernel= is also a sense | 2387 the blender file. The function returned by =movement-kernel= is |
2385 function: it returns the percent of the total muscle strength that | 2388 also a sense function: it returns the percent of the total muscle |
2386 is currently being employed. This is analogous to muscle tension | 2389 strength that is currently being employed. This is analogous to |
2387 in humans and completes the sense of proprioception begun in the | 2390 muscle tension in humans and completes the sense of proprioception |
2388 last section. | 2391 begun in the last section. |
2389 | 2392 |
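The following is a hedged sketch of the /shape/ of such a movement
function, not the actual internals of =movement-kernel=: =make-muscle-fn=
and =apply-torque!= are hypothetical stand-ins for the real muscle setup
and the underlying physics-engine call.

#+begin_src clojure
(defn make-muscle-fn
  "Return a movement function for a muscle with the given maximum
   strength.  Calling it with a desired force applies a clamped torque
   and reports the fraction of total strength in use -- the muscle
   tension sense described above."
  [max-strength apply-torque!]
  (fn [desired-force]
    (let [force (max 0 (min desired-force max-strength))]
      (apply-torque! force)
      (/ force max-strength))))

;; Example: a muscle that can exert at most 50 units of force.
(def bicep (make-muscle-fn 50 (fn [force] nil))) ; physics call stubbed out
(bicep 20) ;; => 2/5, i.e. 40% of maximum tension
#+end_src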
2390 ** =CORTEX= brings complex creatures to life! | 2393 ** =CORTEX= brings complex creatures to life! |
2391 | 2394 |
2392 The ultimate test of =CORTEX= is to create a creature with the full | 2395 The ultimate test of =CORTEX= is to create a creature with the full |
2393 gamut of senses and put it through its paces. | 2396 gamut of senses and put it through its paces. |
2489 - Inverse kinematics :: experiments in sense guided motor control | 2492 - Inverse kinematics :: experiments in sense guided motor control |
2490 are easy given =CORTEX='s support -- you can get right to the | 2493 are easy given =CORTEX='s support -- you can get right to the |
2491 hard control problems without worrying about physics or | 2494 hard control problems without worrying about physics or |
2492 senses. | 2495 senses. |
2493 | 2496 |
2497 \newpage | |
2498 | |
2494 * =EMPATH=: action recognition in a simulated worm | 2499 * =EMPATH=: action recognition in a simulated worm |
2495 | 2500 |
2496 Here I develop a computational model of empathy, using =CORTEX= as a | 2501 Here I develop a computational model of empathy, using =CORTEX= as a |
2497 base. Empathy in this context is the ability to observe another | 2502 base. Empathy in this context is the ability to observe another |
2498 creature and infer what sorts of sensations that creature is | 2503 creature and infer what sorts of sensations that creature is |
2500 free-play, where the creature moves around and gains sensory | 2505 free-play, where the creature moves around and gains sensory |
2501 experience. From this experience I construct a representation of the | 2506 experience. From this experience I construct a representation of the |
2502 creature's sensory state space, which I call \Phi-space. Using | 2507 creature's sensory state space, which I call \Phi-space. Using |
2503 \Phi-space, I construct an efficient function which takes the | 2508 \Phi-space, I construct an efficient function which takes the |
2504 limited data that comes from observing another creature and enriches | 2509 limited data that comes from observing another creature and enriches |
2505 it full compliment of imagined sensory data. I can then use the | 2510 it with a full complement of imagined sensory data. I can then use |
2506 imagined sensory data to recognize what the observed creature is | 2511 the imagined sensory data to recognize what the observed creature is |
2507 doing and feeling, using straightforward embodied action predicates. | 2512 doing and feeling, using straightforward embodied action predicates. |
2508 This is all demonstrated using a simple worm-like creature, and | 2513 This is all demonstrated using a simple worm-like creature, and |
2509 recognizing worm-actions based on limited data. | 2514 recognizing worm-actions based on limited data. |
2510 | 2515 |
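Before the worm itself, here is a minimal sketch of the enrichment step,
under simplifying assumptions that are mine rather than the system's:
each \Phi-space entry is a map of senses, the observation carries only
=:proprioception=, and the nearest entry is found with a plain
sum-of-absolute-differences distance rather than whatever smarter lookup
the real implementation uses.

#+begin_src clojure
(defn- distance
  "Crude placeholder distance between two proprioceptive signatures."
  [a b]
  (reduce + (map (fn [x y] (Math/abs (double (- x y)))) a b)))

(defn enrich
  "Find the Phi-space entry whose proprioception is closest to the
   observation and fill in the senses the observation lacks."
  [phi-space observation]
  (let [nearest (apply min-key
                       #(distance (:proprioception %)
                                  (:proprioception observation))
                       phi-space)]
    (merge nearest observation)))

;; Example: two remembered experiences and one partial observation.
(def phi-space
  [{:proprioception [0.0 0.1] :touch :none    :muscle [0.0]}
   {:proprioception [1.2 1.4] :touch :ventral :muscle [0.8]}])

(enrich phi-space {:proprioception [1.1 1.3]})
;; => {:proprioception [1.1 1.3], :touch :ventral, :muscle [0.8]}
#+end_src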
2511 #+caption: Here is the worm with which we will be working. | 2516 #+caption: Here is the worm with which we will be working. |
2553 | 2558 |
2554 ** Action recognition is easy with a full gamut of senses | 2559 ** Action recognition is easy with a full gamut of senses |
2555 | 2560 |
2556 Embodied representations using multiple senses such as touch, | 2561 Embodied representations using multiple senses such as touch, |
2557 proprioception, and muscle tension turn out to be exceedingly | 2562 proprioception, and muscle tension turn out to be exceedingly |
2558 efficient at describing body-centered actions. It is the ``right | 2563 efficient at describing body-centered actions. It is the right |
2559 language for the job''. For example, it takes only around 5 lines | 2564 language for the job. For example, it takes only around 5 lines of |
2560 of LISP code to describe the action of ``curling'' using embodied | 2565 LISP code to describe the action of curling using embodied |
2561 primitives. It takes about 10 lines to describe the seemingly | 2566 primitives. It takes about 10 lines to describe the seemingly |
2562 complicated action of wiggling. | 2567 complicated action of wiggling. |
2563 | 2568 |
2564 The following action predicates each take a stream of sensory | 2569 The following action predicates each take a stream of sensory |
2565 experience, observe however much of it they desire, and decide | 2570 experience, observe however much of it they desire, and decide |
2566 whether the worm is doing the action they describe. =curled?= | 2571 whether the worm is doing the action they describe. =curled?= |
2567 relies on proprioception, =resting?= relies on touch, =wiggling?= | 2572 relies on proprioception, =resting?= relies on touch, =wiggling?= |
2568 relies on a Fourier analysis of muscle contraction, and | 2573 relies on a Fourier analysis of muscle contraction, and |
2569 =grand-circle?= relies on touch and reuses =curled?= as a guard. | 2574 =grand-circle?= relies on touch and reuses =curled?= in its |
2575 definition, showing how embodied predicates can be composed. | |
2570 | 2576 |
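Before the individual listings, here is a small hedged sketch of what that
composition looks like in general: an action predicate is just a function
from an experience stream to a boolean, so predicates combine with ordinary
logic. The starred names and the data shapes below are illustrative
stand-ins, not the real definitions, which follow in the listings.

#+begin_src clojure
;; Stand-in predicates over a vector of experience frames, where each
;; frame is a vector of per-segment sense maps.
(defn curled?*
  [experiences]
  (every? #(> (:flex %) 0.5) (peek experiences)))

(defn touching-self?*
  [experiences]
  (boolean (some :head-tail-contact (peek experiences))))

(defn grand-circle?*
  "Guard on curled?*, then refine with touch -- composition by =and=."
  [experiences]
  (and (curled?* experiences)
       (touching-self?* experiences)))

(grand-circle?* [[{:flex 0.9 :head-tail-contact true}
                  {:flex 0.8 :head-tail-contact false}]])
;; => true
#+end_src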
2571 #+caption: Program for detecting whether the worm is curled. This is the | 2577 #+caption: Program for detecting whether the worm is curled. This is the |
2572 #+caption: simplest action predicate, because it only uses the last frame | 2578 #+caption: simplest action predicate, because it only uses the last frame |
2573 #+caption: of sensory experience, and only uses proprioceptive data. Even | 2579 #+caption: of sensory experience, and only uses proprioceptive data. Even |
2574 #+caption: this simple predicate, however, is automatically frame | 2580 #+caption: this simple predicate, however, is automatically frame |
2575 #+caption: independent and ignores vermopomorphic differences such as | 2581 #+caption: independent and ignores vermopomorphic \footnote{Like |
2576 #+caption: worm textures and colors. | 2582 #+caption: \emph{anthropomorphic}, except for worms instead of humans.} |
2583 #+caption: differences such as worm textures and colors. | |
2577 #+name: curled | 2584 #+name: curled |
2578 #+begin_listing clojure | 2585 #+begin_listing clojure |
2579 #+begin_src clojure | 2586 #+begin_src clojure |
2580 (defn curled? | 2587 (defn curled? |
2581 "Is the worm curled up?" | 2588 "Is the worm curled up?" |
2733 will work regardless of whether the worm is a different color or | 2740 will work regardless of whether the worm is a different color or |
2734 heavily textured, or if the environment has strange lighting. | 2741 heavily textured, or if the environment has strange lighting. |
2735 | 2742 |
2736 The trick now is to make the action predicates work even when the | 2743 The trick now is to make the action predicates work even when the |
2737 sensory data on which they depend is absent. If I can do that, then | 2744 sensory data on which they depend is absent. If I can do that, then |
2738 I will have gained much, | 2745 I will have gained much. |
2739 | 2746 |
2740 ** \Phi-space describes the worm's experiences | 2747 ** \Phi-space describes the worm's experiences |
2741 | 2748 |
2742 As a first step towards building empathy, I need to gather all of | 2749 As a first step towards building empathy, I need to gather all of |
2743 the worm's experiences during free play. I use a simple vector to | 2750 the worm's experiences during free play. I use a simple vector to |