comparison thesis/cortex.org @ 448:af13fc73e851
completing second part of first chapter.
author   | Robert McIntyre <rlm@mit.edu> |
date     | Tue, 25 Mar 2014 22:54:41 -0400 |
parents  | 284316604be0 |
children | 09b7c8dd4365 |
447:284316604be0 | 448:af13fc73e851 |
---|---|
39 hard problem! What features can you describe in terms of low level | 39 hard problem! What features can you describe in terms of low level |
40 functions of pixels that can even begin to describe at a high level | 40 functions of pixels that can even begin to describe at a high level |
41 what is happening here? | 41 what is happening here? |
42 | 42 |
43 Or suppose that you are building a program that recognizes chairs. | 43 Or suppose that you are building a program that recognizes chairs. |
44 How could you ``see'' the chair in figure \ref{invisible-chair} and | 44 How could you ``see'' the chair in figure \ref{hidden-chair}? |
45 figure \ref{hidden-chair}? | |
46 | |
47 #+caption: When you look at this, do you think ``chair''? I certainly do. | |
48 #+name: invisible-chair | |
49 #+ATTR_LaTeX: :width 10cm | |
50 [[./images/invisible-chair.png]] | |
51 | 45 |
52 #+caption: The chair in this image is quite obvious to humans, but I | 46 #+caption: The chair in this image is quite obvious to humans, but I |
53 #+caption: doubt that any computer program can find it. | 47 #+caption: doubt that any modern computer vision program can find it. |
54 #+name: hidden-chair | 48 #+name: hidden-chair |
55 #+ATTR_LaTeX: :width 10cm | 49 #+ATTR_LaTeX: :width 10cm |
56 [[./images/fat-person-sitting-at-desk.jpg]] | 50 [[./images/fat-person-sitting-at-desk.jpg]] |
57 | 51 |
58 Finally, how is it that you can easily tell the difference between | 52 Finally, how is it that you can easily tell the difference between |
60 | 54 |
61 #+caption: The mysterious ``common sense'' appears here as you are able | 55 #+caption: The mysterious ``common sense'' appears here as you are able |
62 #+caption: to discern the difference in how the girl's arm muscles | 56 #+caption: to discern the difference in how the girl's arm muscles |
63 #+caption: are activated between the two images. | 57 #+caption: are activated between the two images. |
64 #+name: girl | 58 #+name: girl |
65 #+ATTR_LaTeX: :width 10cm | 59 #+ATTR_LaTeX: :width 7cm |
66 [[./images/wall-push.png]] | 60 [[./images/wall-push.png]] |
67 | 61 |
68 Each of these examples tells us something about what might be going | 62 Each of these examples tells us something about what might be going |
69 on in our minds as we easily solve these recognition problems. | 63 on in our minds as we easily solve these recognition problems. |
70 | 64 |
83 | 77 |
84 I propose a system that can express the types of recognition | 78 I propose a system that can express the types of recognition |
85 problems above in a form amenable to computation. It is split into | 79 problems above in a form amenable to computation. It is split into |
86 four parts: | 80 four parts: |
87 | 81 |
88 - Free/Guided Play (Training) :: The creature moves around and | 82 - Free/Guided Play :: The creature moves around and experiences the |
89 experiences the world through its unique perspective. Many | 83 world through its unique perspective. Many otherwise |
90 otherwise complicated actions are easily described in the | 84 complicated actions are easily described in the language of a |
91 language of a full suite of body-centered, rich senses. For | 85 full suite of body-centered, rich senses. For example, |
92 example, drinking is the feeling of water sliding down your | 86 drinking is the feeling of water sliding down your throat, and |
93 throat, and cooling your insides. It's often accompanied by | 87 cooling your insides. It's often accompanied by bringing your |
94 bringing your hand close to your face, or bringing your face | 88 hand close to your face, or bringing your face close to water. |
95 close to water. Sitting down is the feeling of bending your | 89 Sitting down is the feeling of bending your knees, activating |
96 knees, activating your quadriceps, then feeling a surface with | 90 your quadriceps, then feeling a surface with your bottom and |
97 your bottom and relaxing your legs. These body-centered action | 91 relaxing your legs. These body-centered action descriptions |
98 descriptions can be either learned or hard coded. | 92 can be either learned or hard coded. |
99 - Alignment (Posture imitation) :: When trying to interpret a video | 93 - Posture Imitation :: When trying to interpret a video or image, |
100 or image, the creature takes a model of itself and aligns it | 94 the creature takes a model of itself and aligns it with |
101 with whatever it sees. This alignment can even cross species, | 95 whatever it sees. This alignment can even cross species, as |
102 as when humans try to align themselves with things like | 96 when humans try to align themselves with things like ponies, |
103 ponies, dogs, or other humans with a different body type. | 97 dogs, or other humans with a different body type. |
104 - Empathy (Sensory extrapolation) :: The alignment triggers | 98 - Empathy :: The alignment triggers associations with |
105 associations with sensory data from prior experiences. For | 99 sensory data from prior experiences. For example, the |
106 example, the alignment itself easily maps to proprioceptive | 100 alignment itself easily maps to proprioceptive data. Any |
107 data. Any sounds or obvious skin contact in the video can to a | 101 sounds or obvious skin contact in the video can to a lesser |
108 lesser extent trigger previous experience. Segments of | 102 extent trigger previous experience. Segments of previous |
109 previous experiences are stitched together to form a coherent | 103 experiences are stitched together to form a coherent and |
110 and complete sensory portrait of the scene. | 104 complete sensory portrait of the scene. |
111 - Recognition (Classification) :: With the scene described in terms | 105 - Recognition :: With the scene described in terms of first |
112 of first person sensory events, the creature can now run its | 106 person sensory events, the creature can now run its |
113 action-identification programs on this synthesized sensory | 107 action-identification programs on this synthesized sensory |
114 data, just as it would if it were actually experiencing the | 108 data, just as it would if it were actually experiencing the |
115 scene first-hand. If previous experience has been accurately | 109 scene first-hand. If previous experience has been accurately |
116 retrieved, and if it is analogous enough to the scene, then | 110 retrieved, and if it is analogous enough to the scene, then |
117 the creature will correctly identify the action in the scene. | 111 the creature will correctly identify the action in the scene. |
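
To make the last stage concrete, here is a minimal, hypothetical sketch of what one body-centered action predicate might look like in Clojure. The map keys and thresholds are invented for illustration and are not the real =EMPATH= interface; the actual predicates run over sequences of sensory frames rather than a single one.

#+begin_src clojure
;; Hypothetical frame format: a map of body-centered sensory data.
(defn sitting? [{:keys [proprioception touch]}]
  (and (> (:knee-bend proprioception) 1.0)   ; knees bent well past straight
       (some #(> % 0.5) (:bottom touch))))   ; pressure on the seat surface

;; One synthesized frame, as the empathy step might produce it:
(sitting? {:proprioception {:knee-bend 1.4}
           :touch          {:bottom [0.0 0.7 0.9]}})
;; => true
#+end_src
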
191 factors the action recognition problem into two easier problems. To | 185 factors the action recognition problem into two easier problems. To |
192 use empathy, you need an /aligner/, which takes the video and a | 186 use empathy, you need an /aligner/, which takes the video and a |
193 model of your body, and aligns the model with the video. Then, you | 187 model of your body, and aligns the model with the video. Then, you |
194 need a /recognizer/, which uses the aligned model to interpret the | 188 need a /recognizer/, which uses the aligned model to interpret the |
195 action. The power in this method lies in the fact that you describe | 189 action. The power in this method lies in the fact that you describe |
196 all actions form a body-centered, viewpoint You are less tied to | 190 all actions from a body-centered viewpoint. You are less tied to |
197 the particulars of any visual representation of the actions. If you | 191 the particulars of any visual representation of the actions. If you |
198 teach the system what ``running'' is, and you have a good enough | 192 teach the system what ``running'' is, and you have a good enough |
199 aligner, the system will from then on be able to recognize running | 193 aligner, the system will from then on be able to recognize running |
200 from any point of view, even strange points of view like above or | 194 from any point of view, even strange points of view like above or |
201 underneath the runner. This is in contrast to action recognition | 195 underneath the runner. This is in contrast to action recognition |
202 schemes that try to identify actions using a non-embodied approach | 196 schemes that try to identify actions using a non-embodied approach. |
203 such as TODO:REFERENCE. If these systems learn about running as | 197 If these systems learn about running as viewed from the side, they |
204 viewed from the side, they will not automatically be able to | 198 will not automatically be able to recognize running from any other |
205 recognize running from any other viewpoint. | 199 viewpoint. |
206 | 200 |
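
The factoring itself is easy to express. The sketch below is only illustrative (the function names are placeholders, not the actual =EMPATH= interface), but it shows how the two pieces compose:

#+begin_src clojure
;; `aligner` fits a model of the body to the video (posture imitation);
;; `recognizer` classifies the action from the aligned, body-centered
;; description. Both are supplied by the caller in this sketch.
(defn recognize-action [aligner recognizer body-model video]
  (->> video
       (aligner body-model)   ; video -> sequence of aligned postures
       recognizer))           ; aligned postures -> action label
#+end_src
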
207 Another powerful advantage is that using the language of multiple | 201 Another powerful advantage is that using the language of multiple |
208 body-centered rich senses to describe body-centered actions offers a | 202 body-centered rich senses to describe body-centered actions offers a |
209 massive boost in descriptive capability. Consider how difficult it | 203 massive boost in descriptive capability. Consider how difficult it |
210 would be to compose a set of HOG filters to describe the action of | 204 would be to compose a set of HOG filters to describe the action of |
232 #+end_listing | 226 #+end_listing |
233 | 227 |
234 | 228 |
235 ** =CORTEX= is a toolkit for building sensate creatures | 229 ** =CORTEX= is a toolkit for building sensate creatures |
236 | 230 |
237 Hand integration demo | 231 I built =CORTEX= to be a general AI research platform for doing |
238 | 232 experiments involving multiple rich senses and a wide variety and |
233 number of creatures. I intend it to be useful as a library for many | |
234 more projects than just this one. =CORTEX= was necessary to meet a | |
235 need among AI researchers at CSAIL and beyond, which is that people | |
236 often will invent neat ideas that are best expressed in the | |
237 language of creatures and senses, but in order to explore those | |
238 ideas they must first build a platform in which they can create | |
239 simulated creatures with rich senses! There are many ideas that | |
240 would be simple to execute (such as =EMPATH=), but attached to them | |
241 is the multi-month effort to make a good creature simulator. Often, | |
242 that initial investment of time proves to be too much, and the | |
243 project must make do with a lesser environment. | |
244 | |
245 =CORTEX= is well suited as an environment for embodied AI research | |
246 for three reasons: | |
247 | |
248 - You can create new creatures using Blender, a popular 3D modeling | |
249 program. Each sense can be specified using special blender nodes | |
250 with biologically inspired parameters. You need not write any | |
251 code to create a creature, and can use a wide library of | |
252 pre-existing blender models as a base for your own creatures. | |
253 | |
254 - =CORTEX= implements a wide variety of senses, including touch, | |
255 proprioception, vision, hearing, and muscle tension. Complicated | |
256 senses like touch, and vision involve multiple sensory elements | |
257 embedded in a 2D surface. You have complete control over the | |
258 distribution of these sensor elements through the use of simple | |
259 png image files. In particular, =CORTEX= implements more | |
260 comprehensive hearing than any other creature simulation system | |
261 available. | |
262 | |
263 - =CORTEX= supports any number of creatures and any number of | |
264 senses. Time in =CORTEX= dilates so that the simulated creatures | |
265 always perceive a perfectly smooth flow of time, regardless of | |
266 the actual computational load. | |
267 | |
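
As a rough illustration of the second point, a sensor-distribution image is just an ordinary picture file, so harvesting sensor locations from one takes only a few lines of Clojure. This sketch uses plain =javax.imageio= and treats every white pixel as one sensor element; the real =CORTEX= code does more (it maps the sensor elements onto the creature's surface), but the idea is the same.

#+begin_src clojure
(ns example.sensor-layout
  (:import (javax.imageio ImageIO)
           (java.io File)))

;; Return the [x y] coordinates of every white pixel in a sensor
;; distribution image. Each coordinate would become one sensory
;; element (e.g. a touch receptor) embedded in the creature's surface.
(defn sensor-coordinates [png-path]
  (let [image  (ImageIO/read (File. png-path))
        width  (.getWidth image)
        height (.getHeight image)]
    (for [x (range width), y (range height)
          :when (= 0xFFFFFF (bit-and 0xFFFFFF (.getRGB image x y)))]
      [x y])))
#+end_src
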
268 =CORTEX= is built on top of =jMonkeyEngine3=, which is a video game | |
269 engine designed to create cross-platform 3D desktop games. =CORTEX= | |
270 is mainly written in clojure, a dialect of =LISP= that runs on the | |
271 java virtual machine (JVM). The API for creating and simulating | |
272 creatures is entirely expressed in clojure. Hearing is implemented | |
273 as a layer of clojure code on top of a layer of java code on top of | |
274 a layer of =C++= code which implements a modified version of | |
275 =OpenAL= to support multiple listeners. =CORTEX= is the only | |
276 simulation environment that I know of that can support multiple | |
277 entities that can each hear the world from their own perspective. | |
278 Other senses also require a small layer of Java code. =CORTEX= also | |
279 uses =bullet=, a physics simulator written in =C=. | |
280 | |
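
Because every layer ultimately runs on the JVM, driving the game engine from Clojure is ordinary Java interop. The fragment below is only a sketch of that layering and is not the =CORTEX= world-building API: it subclasses jMonkeyEngine3's =SimpleApplication= directly.

#+begin_src clojure
(ns example.jme-interop
  (:import (com.jme3.app SimpleApplication)))

;; Bare-bones jMonkeyEngine3 application written via Clojure/Java
;; interop. CORTEX's creature and world API sits a layer above this.
(defn make-app []
  (proxy [SimpleApplication] []
    (simpleInitApp []
      ;; scene-graph, physics, and sensor setup would go here
      )))

;; (.start (make-app))   ; opens an (empty) jME3 window
#+end_src
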
281 #+caption: Here is the worm from above modeled in Blender, a free | |
282 #+caption: 3D-modeling program. Senses and joints are described | |
283 #+caption: using special nodes in Blender. | |
284 #+name: worm-recognition-intro | |
285 #+ATTR_LaTeX: :width 12cm | |
286 [[./images/blender-worm.png]] | |
287 | |
288 During one test with =CORTEX=, I created 3,000 entities each with | |
289 their own independent senses and ran them all at only 1/80 real | |
290 time. In another test, I created a detailed model of my own hand, | |
291 equipped with a realistic distribution of touch (more sensitive at | |
292 the fingertips), as well as eyes and ears, and it ran at around 1/4 | |
293 real time. | |
294 | |
295 #+caption: A model of my own hand, created in Blender and equipped with | |
296 #+caption: a realistic distribution of touch sensors (more sensitive at the | |
297 #+caption: fingertips), as well as eyes and ears. | |
298 #+name: full-hand | |
299 #+ATTR_LaTeX: :width 15cm | |
300 [[./images/full-hand.png]] | |
301 | |
302 | |
303 | |
304 | |
305 | |
239 ** Contributions | 306 ** Contributions |
240 | 307 |
241 * Building =CORTEX= | 308 * Building =CORTEX= |
242 | 309 |
243 ** To explore embodiment, we need a world, body, and senses | 310 ** To explore embodiment, we need a world, body, and senses |