comparison thesis/cortex.org @ 437:c1e6b7221b2f
progress on intro.
author | Robert McIntyre <rlm@mit.edu> |
---|---|
date | Sun, 23 Mar 2014 22:20:44 -0400 |
parents | 853377051f1e |
children | 4dcb923c9b16 |
436:853377051f1e | 437:c1e6b7221b2f |
---|---|
2 #+author: Robert McIntyre | 2 #+author: Robert McIntyre |
3 #+email: rlm@mit.edu | 3 #+email: rlm@mit.edu |
4 #+description: Using embodied AI to facilitate Artificial Imagination. | 4 #+description: Using embodied AI to facilitate Artificial Imagination. |
5 #+keywords: AI, clojure, embodiment | 5 #+keywords: AI, clojure, embodiment |
6 | 6 |
7 * Embodiment is a critical component of Intelligence | 7 |
8 * Empathy and Embodiment as a problem solving strategy | |
9 | |
10 By the end of this thesis, you will have seen a novel approach to | |
11 interpreting video using embodiment and empathy. You will have also | |
12 seen one way to efficiently implement empathy for embodied | |
13 creatures. | |
14 | |
15 The core vision of this thesis is that one of the important ways in | |
16 which we understand others is by imagining ourselves in their | |
17 position and empathically feeling experiences based on our own past | |
18 experiences and imagination. | |
19 | |
20 By understanding events in terms of our own previous corporeal | |
21 experience, we greatly constrain the possibilities of what would | |
22 otherwise be an unwieldy exponential search. This extra constraint | |
23 can be the difference between easily understanding what is happening | |
24 in a video and being completely lost in a sea of incomprehensible | |
25 color and movement. | |
8 | 26 |
9 ** Recognizing actions in video is extremely difficult | 27 ** Recognizing actions in video is extremely difficult |
28 | |
29 Consider for example the problem of determining what is happening in | |
30 a video of which this is one frame: | |
31 | |
32 #+caption: A cat drinking some water. Identifying this action is beyond the state of the art for computers. | |
33 #+ATTR_LaTeX: :width 7cm | |
34 [[./images/cat-drinking.jpg]] | |
35 | |
36 It is currently impossible for any computer program to reliably | |
37 label such a video as "drinking". And rightly so -- it is a very | |
38 hard problem! What features can you describe in terms of low level | |
39 functions of pixels that can even begin to describe what is | |
40 happening here? | |
41 | |
42 Or suppose that you are building a program that recognizes | |
43 chairs. How could you ``see'' the chair in the following picture? | |
44 | |
45 #+caption: When you look at this, do you think ``chair''? I certainly do. | |
46 #+ATTR_LaTeX: :width 10cm | |
47 [[./images/invisible-chair.png]] | |
48 | |
49 #+caption: The chair in this image is quite obvious to humans, but I doubt any computer program can find it. | |
50 #+ATTR_LaTeX: :width 10cm | |
51 [[./images/fat-person-sitting-at-desk.jpg]] | |
52 | |
53 | |
54 I think humans are able to label | |
55 such a video as "drinking" because they imagine /themselves/ as the | |
56 cat, and imagine putting their face up against a stream of water and | |
57 sticking out their tongue. In that imagined world, they can feel the | |
58 cool water hitting their tongue, and feel the water entering their | |
59 body, and are able to recognize that /feeling/ as drinking. So, the | |
60 label of the action is not really in the pixels of the image, but is | |
61 found clearly in a simulation inspired by those pixels. An | |
62 imaginative system, having been trained on drinking and non-drinking | |
63 examples and learning that the most important component of drinking | |
64 is the feeling of water sliding down one's throat, would analyze a | |
65 video of a cat drinking in the following manner: | |
66 | |
67 - Create a physical model of the video by putting a "fuzzy" model | |
68 of its own body in place of the cat. Also, create a simulation of | |
69 the stream of water. | |
70 | |
71 - Play out this simulated scene and generate imagined sensory | |
72 experience. This will include relevant muscle contractions, a | |
73 close up view of the stream from the cat's perspective, and most | |
74 importantly, the imagined feeling of water entering the mouth. | |
75 | |
76 - The action is now easily identified as drinking by the sense of | |
77 taste alone. The other senses (such as the tongue moving in and | |
78 out) help to give plausibility to the simulated action. Note that | |
79 the sense of vision, while critical in creating the simulation, | |
80 is not critical for identifying the action from the simulation. | |
81 | |
82 | |
83 | |
84 | |
85 | |
86 | |
87 | |
10 cat drinking, mimes, leaning, common sense | 88 cat drinking, mimes, leaning, common sense |
11 | 89 |
12 ** Embodiment is the right language for the job | 90 ** =EMPATH= neatly solves recognition problems |
91 | |
92 factorization, right language, etc. | |
13 | 93 |
14 a new possibility for the question ``what is a chair?'' -- it's the | 94 a new possibility for the question ``what is a chair?'' -- it's the |
15 feeling of your butt on something and your knees bent, with your | 95 feeling of your butt on something and your knees bent, with your |
16 back muscles and legs relaxed. | 96 back muscles and legs relaxed. |
17 | 97 |
18 ** =CORTEX= is a system for exploring embodiment | 98 ** =CORTEX= is a toolkit for building sensate creatures |
19 | 99 |
20 Hand integration demo | 100 Hand integration demo |
21 | 101 |
22 ** =CORTEX= solves recognition problems using empathy | 102 ** Contributions |
23 | |
24 worm empathy demo | |
25 | |
26 ** Overview | |
27 | 103 |
28 * Building =CORTEX= | 104 * Building =CORTEX= |
29 | 105 |
30 ** To explore embodiment, we need a world, body, and senses | 106 ** To explore embodiment, we need a world, body, and senses |
31 | 107 |
53 | 129 |
54 ** Embodiment factors action recognition into manageable parts | 130 ** Embodiment factors action recognition into manageable parts |
55 | 131 |
56 ** Action recognition is easy with a full gamut of senses | 132 ** Action recognition is easy with a full gamut of senses |
57 | 133 |
58 ** Digression: bootstrapping with multiple senses | 134 ** Digression: bootstrapping touch using free exploration |
59 | 135 |
60 ** \Phi-space describes the worm's experiences | 136 ** \Phi-space describes the worm's experiences |
61 | 137 |
62 ** Empathy is the process of tracing through \Phi-space | 138 ** Empathy is the process of tracing through \Phi-space |
63 | 139 |
67 - Built =CORTEX=, a comprehensive platform for embodied AI | 143 - Built =CORTEX=, a comprehensive platform for embodied AI |
68 experiments. Has many new features lacking in other systems, such | 144 experiments. Has many new features lacking in other systems, such |
69 as sound. Easy to model/create new creatures. | 145 as sound. Easy to model/create new creatures. |
70 - Created a novel concept for action recognition by using artificial | 146 - Created a novel concept for action recognition by using artificial |
71 imagination. | 147 imagination. |
72 | |
73 * =CORTEX= User Guide | |
74 | |
75 | |
76 | 148 |
77 In the second half of the thesis I develop a computational model of | 149 In the second half of the thesis I develop a computational model of |
78 empathy, using =CORTEX= as a base. Empathy in this context is the | 150 empathy, using =CORTEX= as a base. Empathy in this context is the |
79 ability to observe another creature and infer what sorts of sensations | 151 ability to observe another creature and infer what sorts of sensations |
80 that creature is feeling. My empathy algorithm involves multiple | 152 that creature is feeling. My empathy algorithm involves multiple |
95 language for the job''. For example, it takes only around 5 lines of | 167 language for the job''. For example, it takes only around 5 lines of |
96 LISP code to describe the action of ``curling'' using embodied | 168 LISP code to describe the action of ``curling'' using embodied |
97 primitives. It takes about 8 lines to describe the seemingly | 169 primitives. It takes about 8 lines to describe the seemingly |
98 complicated action of wiggling. | 170 complicated action of wiggling. |
99 | 171 |
172 | |
173 | |
174 * COMMENT names for cortex | |
175 - bioland |