comparison thesis/cortex.org @ 448:af13fc73e851

completing second part of first chapter.
author Robert McIntyre <rlm@mit.edu>
date Tue, 25 Mar 2014 22:54:41 -0400
parents 284316604be0
children 09b7c8dd4365
39 hard problem! What features can you describe in terms of low level 39 hard problem! What features can you describe in terms of low level
40 functions of pixels that can even begin to describe at a high level 40 functions of pixels that can even begin to describe at a high level
41 what is happening here? 41 what is happening here?
42 42
43 Or suppose that you are building a program that recognizes chairs. 43 Or suppose that you are building a program that recognizes chairs.
44 How could you ``see'' the chair in figure \ref{invisible-chair} and 44 How could you ``see'' the chair in figure \ref{hidden-chair}?
45 figure \ref{hidden-chair}?
46
47 #+caption: When you look at this, do you think ``chair''? I certainly do.
48 #+name: invisible-chair
49 #+ATTR_LaTeX: :width 10cm
50 [[./images/invisible-chair.png]]
51 45
52 #+caption: The chair in this image is quite obvious to humans, but I 46 #+caption: The chair in this image is quite obvious to humans, but I
53 #+caption: doubt that any computer program can find it. 47 #+caption: doubt that any modern computer vision program can find it.
54 #+name: hidden-chair 48 #+name: hidden-chair
55 #+ATTR_LaTeX: :width 10cm 49 #+ATTR_LaTeX: :width 10cm
56 [[./images/fat-person-sitting-at-desk.jpg]] 50 [[./images/fat-person-sitting-at-desk.jpg]]
57 51
58 Finally, how is it that you can easily tell the difference between 52 Finally, how is it that you can easily tell the difference between
60 54
61 #+caption: The mysterious ``common sense'' appears here as you are able 55 #+caption: The mysterious ``common sense'' appears here as you are able
62 #+caption: to discern the difference in how the girl's arm muscles 56 #+caption: to discern the difference in how the girl's arm muscles
63 #+caption: are activated between the two images. 57 #+caption: are activated between the two images.
64 #+name: girl 58 #+name: girl
65 #+ATTR_LaTeX: :width 10cm 59 #+ATTR_LaTeX: :width 7cm
66 [[./images/wall-push.png]] 60 [[./images/wall-push.png]]
67 61
68 Each of these examples tells us something about what might be going 62 Each of these examples tells us something about what might be going
69 on in our minds as we easily solve these recognition problems. 63 on in our minds as we easily solve these recognition problems.
70 64
83 77
84 I propose a system that can express the types of recognition 78 I propose a system that can express the types of recognition
85 problems above in a form amenable to computation. It is split into 79 problems above in a form amenable to computation. It is split into
86 four parts: 80 four parts:
87 81
88 - Free/Guided Play (Training) :: The creature moves around and 82 - Free/Guided Play :: The creature moves around and experiences the
89 experiences the world through its unique perspective. Many 83 world through its unique perspective. Many otherwise
90 otherwise complicated actions are easily described in the 84 complicated actions are easily described in the language of a
91 language of a full suite of body-centered, rich senses. For 85 full suite of body-centered, rich senses. For example,
92 example, drinking is the feeling of water sliding down your 86 drinking is the feeling of water sliding down your throat, and
93 throat, and cooling your insides. It's often accompanied by 87 cooling your insides. It's often accompanied by bringing your
94 bringing your hand close to your face, or bringing your face 88 hand close to your face, or bringing your face close to water.
95 close to water. Sitting down is the feeling of bending your 89 Sitting down is the feeling of bending your knees, activating
96 knees, activating your quadriceps, then feeling a surface with 90 your quadriceps, then feeling a surface with your bottom and
97 your bottom and relaxing your legs. These body-centered action 91 relaxing your legs. These body-centered action descriptions
98 descriptions can be either learned or hard coded. 92 can be either learned or hard coded.
99 - Alignment (Posture imitation) :: When trying to interpret a video 93 - Posture Imitation :: When trying to interpret a video or image,
100 or image, the creature takes a model of itself and aligns it 94 the creature takes a model of itself and aligns it with
101 with whatever it sees. This alignment can even cross species, 95 whatever it sees. This alignment can even cross species, as
102 as when humans try to align themselves with things like 96 when humans try to align themselves with things like ponies,
103 ponies, dogs, or other humans with a different body type. 97 dogs, or other humans with a different body type.
104 - Empathy (Sensory extrapolation) :: The alignment triggers 98 - Empathy :: The alignment triggers associations with
105 associations with sensory data from prior experiences. For 99 sensory data from prior experiences. For example, the
106 example, the alignment itself easily maps to proprioceptive 100 alignment itself easily maps to proprioceptive data. Any
107 data. Any sounds or obvious skin contact in the video can to a 101 sounds or obvious skin contact in the video can to a lesser
108 lesser extent trigger previous experience. Segments of 102 extent trigger previous experience. Segments of previous
109 previous experiences are stitched together to form a coherent 103 experiences are stitched together to form a coherent and
110 and complete sensory portrait of the scene. 104 complete sensory portrait of the scene.
111 - Recognition (Classification) :: With the scene described in terms 105 - Recognition :: With the scene described in terms of first
112 of first person sensory events, the creature can now run its 106 person sensory events, the creature can now run its
113 action-identification programs on this synthesized sensory 107 action-identification programs on this synthesized sensory
114 data, just as it would if it were actually experiencing the 108 data, just as it would if it were actually experiencing the
115 scene first-hand. If previous experience has been accurately 109 scene first-hand. If previous experience has been accurately
116 retrieved, and if it is analogous enough to the scene, then 110 retrieved, and if it is analogous enough to the scene, then
117 the creature will correctly identify the action in the scene. 111 the creature will correctly identify the action in the scene.
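
Taken together, these four parts form a single pipeline from raw
video to a recognized action. The following Clojure sketch is only
meant to make that pipeline concrete; every helper function in it
(=align-body-model=, =extrapolate-senses=, =identify-action=) is a
hypothetical stand-in, not part of the actual =EMPATH= program.

#+begin_src clojure
;; Illustrative sketch of the four-part pipeline above.  All helper
;; functions are hypothetical placeholders.
(declare align-body-model    ; posture imitation: fit the self-model to a frame
         extrapolate-senses  ; empathy: recall prior first-person experience
         identify-action)    ; recognition: first-person action predicates

(defn empathic-recognition
  "Interpret VIDEO in terms of the creature's own prior experience."
  [creature prior-experience video]
  (->> video
       ;; Alignment: infer the creature's posture in each frame.
       (map (partial align-body-model creature))
       ;; Empathy: expand each posture into a complete sensory
       ;; portrait, stitched together from prior experience.
       (map (partial extrapolate-senses prior-experience))
       ;; Recognition: classify the synthesized first-person data as
       ;; if the creature were experiencing the scene itself.
       (map identify-action)))
#+end_src
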
191 factors the action recognition problem into two easier problems. To 185 factors the action recognition problem into two easier problems. To
192 use empathy, you need an /aligner/, which takes the video and a 186 use empathy, you need an /aligner/, which takes the video and a
193 model of your body, and aligns the model with the video. Then, you 187 model of your body, and aligns the model with the video. Then, you
194 need a /recognizer/, which uses the aligned model to interpret the 188 need a /recognizer/, which uses the aligned model to interpret the
195 action. The power in this method lies in the fact that you describe 189 action. The power in this method lies in the fact that you describe
196 all actions form a body-centered, viewpoint You are less tied to 190 all actions form a body-centered viewpoint. You are less tied to
197 the particulars of any visual representation of the actions. If you 191 the particulars of any visual representation of the actions. If you
198 teach the system what ``running'' is, and you have a good enough 192 teach the system what ``running'' is, and you have a good enough
199 aligner, the system will from then on be able to recognize running 193 aligner, the system will from then on be able to recognize running
200 from any point of view, even strange points of view like above or 194 from any point of view, even strange points of view like above or
201 underneath the runner. This is in contrast to action recognition 195 underneath the runner. This is in contrast to action recognition
202 schemes that try to identify actions using a non-embodied approach 196 schemes that try to identify actions using a non-embodied approach.
203 such as TODO:REFERENCE. If these systems learn about running as 197 If these systems learn about running as viewed from the side, they
204 viewed from the side, they will not automatically be able to 198 will not automatically be able to recognize running from any other
205 recognize running from any other viewpoint. 199 viewpoint.
206 200
207 Another powerful advantage is that using the language of multiple 201 Another powerful advantage is that using the language of multiple
208 body-centered rich senses to describe body-centered actions offers a 202 body-centered rich senses to describe body-centered actions offers a
209 massive boost in descriptive capability. Consider how difficult it 203 massive boost in descriptive capability. Consider how difficult it
210 would be to compose a set of HOG filters to describe the action of 204 would be to compose a set of HOG filters to describe the action of
232 #+end_listing 226 #+end_listing
233 227
234 228
235 ** =CORTEX= is a toolkit for building sensate creatures 229 ** =CORTEX= is a toolkit for building sensate creatures
236 230
237 Hand integration demo 231 I built =CORTEX= to be a general AI research platform for doing
238 232 experiments involving multiple rich senses and a wide variety of
233 creatures. I intend it to be useful as a library for many
234 more projects than just this one. =CORTEX= meets a
235 need among AI researchers at CSAIL and beyond: people
236 often invent neat ideas that are best expressed in the
237 language of creatures and senses, but in order to explore those
238 ideas they must first build a platform in which they can create
239 simulated creatures with rich senses. There are many ideas that
240 would be simple to execute (such as =EMPATH=), but attached to them
241 is the multi-month effort of building a good creature simulator. Often,
242 that initial investment of time proves to be too much, and the
243 project must make do with a lesser environment.
244
245 =CORTEX= is well suited as an environment for embodied AI research
246 for three reasons:
247
248 - You can create new creatures using Blender, a popular 3D modeling
249 program. Each sense can be specified using special Blender nodes
250 with biologically inspired parameters. You need not write any
251 code to create a creature, and can use a wide library of
252 pre-existing Blender models as a base for your own creatures (see the sketch after this list).
253
254 - =CORTEX= implements a wide variety of senses, including touch,
255 proprioception, vision, hearing, and muscle tension. Complicated
256 senses like touch and vision involve multiple sensory elements
257 embedded in a 2D surface. You have complete control over the
258 distribution of these sensor elements through the use of simple
259 png image files. In particular, =CORTEX= implements more
260 comprehensive hearing than any other creature simulation system
261 available.
262
263 - =CORTEX= supports any number of creatures and any number of
264 senses. Time in =CORTEX= dilates so that the simulated creatures
265 always perceive a perfectly smooth flow of time, regardless of
266 the actual computational load.
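
Concretely, setting up and running a creature is meant to be this
simple. The sketch below is hypothetical: the names
=load-blender-model=, =attach-senses!=, and =simulate= are
illustrative placeholders rather than the exact =CORTEX= API, which
is developed in the next chapter.

#+begin_src clojure
;; Hypothetical sketch of driving CORTEX from Clojure; these names are
;; placeholders, not the real API.
(declare load-blender-model  ; read a creature and its sense nodes from a .blend file
         attach-senses!      ; build touch, vision, hearing, proprioception, muscles
         simulate)           ; step the physics world and deliver sense data

(defn run-creature
  "Load a Blender-defined creature and simulate it with all its senses."
  [blend-file]
  (let [creature (load-blender-model blend-file)
        senses   (attach-senses! creature)]
    ;; Simulation time is decoupled from wall-clock time, so the
    ;; creature perceives a smooth flow of time regardless of load.
    (simulate creature senses)))
#+end_src
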
267
268 =CORTEX= is built on top of =jMonkeyEngine3=, which is a video game
269 engine designed to create cross-platform 3D desktop games. =CORTEX=
270 is mainly written in Clojure, a dialect of =LISP= that runs on the
271 Java Virtual Machine (JVM). The API for creating and simulating
272 creatures is entirely expressed in Clojure. Hearing is implemented
273 as a layer of Clojure code on top of a layer of Java code on top of
274 a layer of =C++= code which implements a modified version of
275 =OpenAL= to support multiple listeners. =CORTEX= is the only
276 simulation environment that I know of that can support multiple
277 entities that can each hear the world from their own perspective.
278 Other senses also require a small layer of Java code. =CORTEX= also
279 uses =bullet=, a physics simulator written in =C++=.
280
281 #+caption: Here is the worm from above modeled in Blender, a free
282 #+caption: 3D-modeling program. Senses and joints are described
283 #+caption: using special nodes in Blender.
284 #+name: worm-recognition-intro
285 #+ATTR_LaTeX: :width 12cm
286 [[./images/blender-worm.png]]
287
288 During one test with =CORTEX=, I created 3,000 entities each with
289 their own independent senses and ran them all at only 1/80 real
290 time. In another test, I created a detailed model of my own hand,
291 equipped with a realistic distribution of touch sensors (more
292 sensitive at the fingertips), as well as eyes and ears, and it ran at around 1/4
293 real time.
294
295 #+caption: A model of my own hand, created in Blender and equipped with
296 #+caption: touch sensors (more sensitive at the fingertips), eyes, and ears.
297 #+caption: Senses and joints are described using special nodes in Blender.
298 #+name: full-hand
299 #+ATTR_LaTeX: :width 15cm
300 [[./images/full-hand.png]]
301
302
303
304
305
239 ** Contributions 306 ** Contributions
240 307
241 * Building =CORTEX= 308 * Building =CORTEX=
242 309
243 ** To explore embodiment, we need a world, body, and senses 310 ** To explore embodiment, we need a world, body, and senses