comparison thesis/cortex.org @ 517:68665d2c32a7
spellcheck; almost done with first draft!
author | Robert McIntyre <rlm@mit.edu> |
---|---|
date | Mon, 31 Mar 2014 00:18:26 -0400 |
parents | ced955c3c84f |
children | d78f5102d693 |
516:ced955c3c84f | 517:68665d2c32a7 |
---|---|
57 corporeal experience, we greatly constrain the possibilities of what | 57 corporeal experience, we greatly constrain the possibilities of what |
58 would otherwise be an unwieldy exponential search. This extra | 58 would otherwise be an unwieldy exponential search. This extra |
59 constraint can be the difference between easily understanding what | 59 constraint can be the difference between easily understanding what |
60 is happening in a video and being completely lost in a sea of | 60 is happening in a video and being completely lost in a sea of |
61 incomprehensible color and movement. | 61 incomprehensible color and movement. |
62 | |
63 | 62 |
64 ** The problem: recognizing actions in video is hard! | 63 ** The problem: recognizing actions in video is hard! |
65 | 64 |
66 Examine the following image. What is happening? As you, and indeed | 65 Examine the following image. What is happening? As you, and indeed |
67 very young children, can easily determine, this is an image of | 66 very young children, can easily determine, this is an image of |
75 Nevertheless, it is beyond the state of the art for a computer | 74 Nevertheless, it is beyond the state of the art for a computer |
76 vision program to describe what's happening in this image. Part of | 75 vision program to describe what's happening in this image. Part of |
77 the problem is that many computer vision systems focus on | 76 the problem is that many computer vision systems focus on |
78 pixel-level details or comparisons to example images (such as | 77 pixel-level details or comparisons to example images (such as |
79 \cite{volume-action-recognition}), but the 3D world is so variable | 78 \cite{volume-action-recognition}), but the 3D world is so variable |
80 that it is hard to descrive the world in terms of possible images. | 79 that it is hard to describe the world in terms of possible images. |
81 | 80 |
82 In fact, the contents of a scene may have much less to do with pixel | 81 In fact, the contents of a scene may have much less to do with pixel |
83 probabilities than with recognizing various affordances: things you | 82 probabilities than with recognizing various affordances: things you |
84 can move, objects you can grasp, spaces that can be filled. For | 83 can move, objects you can grasp, spaces that can be filled. For |
85 example, what processes might enable you to see the chair in figure | 84 example, what processes might enable you to see the chair in figure |
100 #+name: girl | 99 #+name: girl |
101 #+ATTR_LaTeX: :width 7cm | 100 #+ATTR_LaTeX: :width 7cm |
102 [[./images/wall-push.png]] | 101 [[./images/wall-push.png]] |
103 | 102 |
104 Each of these examples tells us something about what might be going | 103 Each of these examples tells us something about what might be going |
105 on in our minds as we easily solve these recognition problems. | 104 on in our minds as we easily solve these recognition problems: |
106 | 105 |
107 The hidden chair shows us that we are strongly triggered by cues | 106 The hidden chair shows us that we are strongly triggered by cues |
108 relating to the position of human bodies, and that we can determine | 107 relating to the position of human bodies, and that we can determine |
109 the overall physical configuration of a human body even if much of | 108 the overall physical configuration of a human body even if much of |
110 that body is occluded. | 109 that body is occluded. |
112 The picture of the girl pushing against the wall tells us that we | 111 The picture of the girl pushing against the wall tells us that we |
113 have common sense knowledge about the kinetics of our own bodies. | 112 have common sense knowledge about the kinetics of our own bodies. |
114 We know well how our muscles would have to work to maintain us in | 113 We know well how our muscles would have to work to maintain us in |
115 most positions, and we can easily project this self-knowledge to | 114 most positions, and we can easily project this self-knowledge to |
116 imagined positions triggered by images of the human body. | 115 imagined positions triggered by images of the human body. |
116 | |
117 The cat tells us that imagination of some kind plays an important | |
118 role in understanding actions. The question is: Can we be more | |
119 precise about what sort of imagination is required to understand | |
120 these actions? | |
117 | 121 |
118 ** A step forward: the sensorimotor-centered approach | 122 ** A step forward: the sensorimotor-centered approach |
119 | 123 |
120 In this thesis, I explore the idea that our knowledge of our own | 124 In this thesis, I explore the idea that our knowledge of our own |
121 bodies, combined with our own rich senses, enables us to recognize | 125 bodies, combined with our own rich senses, enables us to recognize |
137 | 141 |
138 1. Create a physical model of the video by putting a ``fuzzy'' | 142 1. Create a physical model of the video by putting a ``fuzzy'' |
139 model of its own body in place of the cat. Possibly also create | 143 model of its own body in place of the cat. Possibly also create |
140 a simulation of the stream of water. | 144 a simulation of the stream of water. |
141 | 145 |
142 2. Play out this simulated scene and generate imagined sensory | 146 2. ``Play out'' this simulated scene and generate imagined sensory |
143 experience. This will include relevant muscle contractions, a | 147 experience. This will include relevant muscle contractions, a |
144 close up view of the stream from the cat's perspective, and most | 148 close up view of the stream from the cat's perspective, and most |
145 importantly, the imagined feeling of water entering the | 149 importantly, the imagined feeling of water entering the mouth. |
146 mouth. The imagined sensory experience can come from a | 150 The imagined sensory experience can come from a simulation of |
147 simulation of the event, but can also be pattern-matched from | 151 the event, but can also be pattern-matched from previous, |
148 previous, similar embodied experience. | 152 similar embodied experience. |
149 | 153 |
150 3. The action is now easily identified as drinking by the sense of | 154 3. The action is now easily identified as drinking by the sense of |
151 taste alone. The other senses (such as the tongue moving in and | 155 taste alone. The other senses (such as the tongue moving in and |
152 out) help to give plausibility to the simulated action. Note that | 156 out) help to give plausibility to the simulated action. Note that |
153 the sense of vision, while critical in creating the simulation, | 157 the sense of vision, while critical in creating the simulation, |
158 1. Align a model of your body to the person in the image. | 162 1. Align a model of your body to the person in the image. |
159 | 163 |
160 2. Generate proprioceptive sensory data from this alignment. | 164 2. Generate proprioceptive sensory data from this alignment. |
161 | 165 |
162 3. Use the imagined proprioceptive data as a key to lookup related | 166 3. Use the imagined proprioceptive data as a key to lookup related |
163 sensory experience associated with that particular proproceptive | 167 sensory experience associated with that particular proprioceptive |
164 feeling. | 168 feeling. |
165 | 169 |
166 4. Retrieve the feeling of your bottom resting on a surface, your | 170 4. Retrieve the feeling of your bottom resting on a surface, your |
167 knees bent, and your leg muscles relaxed. | 171 knees bent, and your leg muscles relaxed. |
168 | 172 |
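A minimal sketch of steps 2--4 above, in the spirit of the clojure used
throughout this thesis: coarsely quantized proprioceptive data acts as
the key into a store of remembered, full-sensory experience. The
experience entries, the binning scheme, and all of the names here are
hypothetical illustrations, not the =EMPATH= implementation presented
in section \ref{sec-3}.

#+begin_src clojure
;; A remembered moment of embodied experience: the joint angles felt at
;; that moment, together with the other senses recorded alongside them.
(def experiences
  [{:proprioception [0.0 1.4 1.5] :touch :bottom-on-surface :muscles :relaxed}
   {:proprioception [0.0 0.1 0.2] :touch :none              :muscles :standing}])

(defn bin
  "Coarsely quantize joint angles so that similar poses share a key."
  [angles]
  (mapv #(Math/round (* 2.0 %)) angles))

;; Index prior experience by its binned proprioceptive signature.
(def experience-index
  (group-by (comp bin :proprioception) experiences))

(defn imagine
  "Given joint angles imagined from an image (a model aligned to a person
  sitting, say), retrieve the full remembered sensory context."
  [imagined-angles]
  (get experience-index (bin imagined-angles)))

;; A pose close to the remembered `sitting' entry recalls the feeling of
;; a surface under your bottom and relaxed legs.
(imagine [0.1 1.5 1.4])
#+end_src
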
192 If these systems learn about running as viewed from the side, they | 196 If these systems learn about running as viewed from the side, they |
193 will not automatically be able to recognize running from any other | 197 will not automatically be able to recognize running from any other |
194 viewpoint. | 198 viewpoint. |
195 | 199 |
196 Another powerful advantage is that using the language of multiple | 200 Another powerful advantage is that using the language of multiple |
197 body-centered rich senses to describe body-centerd actions offers a | 201 body-centered rich senses to describe body-centered actions offers a |
198 massive boost in descriptive capability. Consider how difficult it | 202 massive boost in descriptive capability. Consider how difficult it |
199 would be to compose a set of HOG filters to describe the action of | 203 would be to compose a set of HOG filters to describe the action of |
200 a simple worm-creature ``curling'' so that its head touches its | 204 a simple worm-creature ``curling'' so that its head touches its |
201 tail, and then behold the simplicity of describing this action in a | 205 tail, and then behold the simplicity of describing this action in a |
202 language designed for the task (listing \ref{grand-circle-intro}): | 206 language designed for the task (listing \ref{grand-circle-intro}): |
203 | 207 |
204 #+caption: Body-centerd actions are best expressed in a body-centered | 208 #+caption: Body-centered actions are best expressed in a body-centered |
205 #+caption: language. This code detects when the worm has curled into a | 209 #+caption: language. This code detects when the worm has curled into a |
206 #+caption: full circle. Imagine how you would replicate this functionality | 210 #+caption: full circle. Imagine how you would replicate this functionality |
207 #+caption: using low-level pixel features such as HOG filters! | 211 #+caption: using low-level pixel features such as HOG filters! |
208 #+name: grand-circle-intro | 212 #+name: grand-circle-intro |
209 #+begin_listing clojure | 213 #+begin_listing clojure |
218 (and (< 0.2 (contact worm-segment-bottom-tip tail-touch)) | 222 (and (< 0.2 (contact worm-segment-bottom-tip tail-touch)) |
219 (< 0.2 (contact worm-segment-top-tip head-touch)))))) | 223 (< 0.2 (contact worm-segment-top-tip head-touch)))))) |
220 #+end_src | 224 #+end_src |
221 #+end_listing | 225 #+end_listing |
222 | 226 |
223 ** =EMPATH= regognizes actions using empathy | 227 ** =EMPATH= recognizes actions using empathy |
224 | 228 |
225 First, I built a system for constructing virtual creatures with | 229 Exploring these ideas further demands a concrete implementation, so |
230 first, I built a system for constructing virtual creatures with | |
226 physiologically plausible sensorimotor systems and detailed | 231 physiologically plausible sensorimotor systems and detailed |
227 environments. The result is =CORTEX=, which is described in section | 232 environments. The result is =CORTEX=, which is described in section |
228 \ref{sec-2}. (=CORTEX= was built to be flexible and useful to other | 233 \ref{sec-2}. |
229 AI researchers; it is provided in full with detailed instructions | |
230 on the web [here].) | |
231 | 234 |
232 Next, I wrote routines which enabled a simple worm-like creature to | 235 Next, I wrote routines which enabled a simple worm-like creature to |
233 infer the actions of a second worm-like creature, using only its | 236 infer the actions of a second worm-like creature, using only its |
234 own prior sensorimotor experiences and knowledge of the second | 237 own prior sensorimotor experiences and knowledge of the second |
235 worm's joint positions. This program, =EMPATH=, is described in | 238 worm's joint positions. This program, =EMPATH=, is described in |
236 section \ref{sec-3}, and the key results of this experiment are | 239 section \ref{sec-3}. Its main components are: |
237 summarized below. | 240 |
238 | 241 - Embodied Action Definitions :: Many otherwise complicated actions |
239 I have built a system that can express the types of recognition | 242 are easily described in the language of a full suite of |
240 problems in a form amenable to computation. It is split into | 243 body-centered, rich senses and experiences. For example, |
241 four parts: | |
242 | |
243 - Free/Guided Play :: The creature moves around and experiences the | |
244 world through its unique perspective. Many otherwise | |
245 complicated actions are easily described in the language of a | |
246 full suite of body-centered, rich senses. For example, | |
247 drinking is the feeling of water sliding down your throat, and | 244 drinking is the feeling of water sliding down your throat, and |
248 cooling your insides. It's often accompanied by bringing your | 245 cooling your insides. It's often accompanied by bringing your |
249 hand close to your face, or bringing your face close to water. | 246 hand close to your face, or bringing your face close to water. |
250 Sitting down is the feeling of bending your knees, activating | 247 Sitting down is the feeling of bending your knees, activating |
251 your quadriceps, then feeling a surface with your bottom and | 248 your quadriceps, then feeling a surface with your bottom and |
252 relaxing your legs. These body-centered action descriptions | 249 relaxing your legs. These body-centered action descriptions |
253 can be either learned or hard coded. | 250 can be either learned or hard coded. |
254 - Posture Imitation :: When trying to interpret a video or image, | 251 |
252 - Guided Play :: The creature moves around and experiences the | |
253 world through its unique perspective. As the creature moves, | |
254 it gathers experiences that satisfy the embodied action | |
255 definitions. | |
256 | |
257 - Posture imitation :: When trying to interpret a video or image, | |
255 the creature takes a model of itself and aligns it with | 258 the creature takes a model of itself and aligns it with |
256 whatever it sees. This alignment can even cross species, as | 259 whatever it sees. This alignment might even cross species, as |
257 when humans try to align themselves with things like ponies, | 260 when humans try to align themselves with things like ponies, |
258 dogs, or other humans with a different body type. | 261 dogs, or other humans with a different body type. |
259 - Empathy :: The alignment triggers associations with | 262 |
263 - Empathy :: The alignment triggers associations with | |
260 sensory data from prior experiences. For example, the | 264 sensory data from prior experiences. For example, the |
261 alignment itself easily maps to proprioceptive data. Any | 265 alignment itself easily maps to proprioceptive data. Any |
262 sounds or obvious skin contact in the video can to a lesser | 266 sounds or obvious skin contact in the video can to a lesser |
263 extent trigger previous experience. Segments of previous | 267 extent trigger previous experience keyed to hearing or touch. |
264 experiences are stitched together to form a coherent and | 268 Segments of previous experiences gained from play are stitched |
265 complete sensory portrait of the scene. | 269 together to form a coherent and complete sensory portrait of |
266 - Recognition :: With the scene described in terms of first | 270 the scene. |
267 person sensory events, the creature can now run its | 271 |
268 action-identification programs on this synthesized sensory | 272 - Recognition :: With the scene described in terms of |
269 data, just as it would if it were actually experiencing the | 273 remembered first person sensory events, the creature can now |
270 scene first-hand. If previous experience has been accurately | 274 run its action-identification programs (such as the one in listing |
275 \ref{grand-circle-intro}) on this synthesized sensory data, | |
276 just as it would if it were actually experiencing the scene | |
277 first-hand. If previous experience has been accurately | |
271 retrieved, and if it is analogous enough to the scene, then | 278 retrieved, and if it is analogous enough to the scene, then |
272 the creature will correctly identify the action in the scene. | 279 the creature will correctly identify the action in the scene. |
273 | |
274 | 280 |
275 My program, =EMPATH=, uses this empathic problem-solving technique | 281 My program, =EMPATH=, uses this empathic problem-solving technique |
276 to interpret the actions of a simple, worm-like creature. | 282 to interpret the actions of a simple, worm-like creature. |
277 | 283 |
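To make the pipeline concrete, here is a sketch of how the four
components above compose. Only the shape of the composition is real;
the function names and signatures (=align-to-video=,
=lookup-experience=, =recognize=) are hypothetical stand-ins for the
routines presented in section \ref{sec-3}.

#+begin_src clojure
(defn align-to-video
  "Posture imitation: fit the creature's body model to each frame to get
  imagined proprioceptive snapshots (stubbed here: frames are taken as
  joint angles directly)."
  [body-model frames]
  (map (fn [frame] {:proprioception frame}) frames))

(defn lookup-experience
  "Empathy: for each imagined snapshot, retrieve the closest remembered
  full-sensory experience gathered during guided play."
  [experience-db snapshots]
  (map #(or (experience-db (:proprioception %)) %) snapshots))

(defn recognize
  "Recognition: run the embodied action definitions (grand-circle?, etc.)
  over the synthesized sensory stream and report which of them fire."
  [action-definitions sensory-stream]
  (set (for [[action-name action?] action-definitions
             :when (action? sensory-stream)]
         action-name)))

(defn empath-sketch
  "Hypothetical top level: video frames in, recognized actions out."
  [body-model experience-db action-definitions frames]
  (->> frames
       (align-to-video body-model)
       (lookup-experience experience-db)
       (recognize action-definitions)))
#+end_src
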
278 #+caption: The worm performs many actions during free play such as | 284 #+caption: The worm performs many actions during free play such as |
285 #+caption: poses by inferring the complete sensory experience | 291 #+caption: poses by inferring the complete sensory experience |
286 #+caption: from proprioceptive data. | 292 #+caption: from proprioceptive data. |
287 #+name: worm-recognition-intro | 293 #+name: worm-recognition-intro |
288 #+ATTR_LaTeX: :width 15cm | 294 #+ATTR_LaTeX: :width 15cm |
289 [[./images/worm-poses.png]] | 295 [[./images/worm-poses.png]] |
290 | |
291 #+caption: From only \emph{proprioceptive} data, =EMPATH= was able to infer | |
292 #+caption: the complete sensory experience and classify these four poses. | |
293 #+caption: The last image is a composite, depicting the intermediate stages | |
294 #+caption: of \emph{wriggling}. | |
295 #+name: worm-recognition-intro-2 | |
296 #+ATTR_LaTeX: :width 15cm | |
297 [[./images/empathy-1.png]] | |
298 | 296 |
299 Next, I developed an experiment to test the power of =CORTEX='s | 297 *** Main Results |
300 sensorimotor-centered language for solving recognition problems. As | 298 |
301 a proof of concept, I wrote routines which enabled a simple | 299 - After one-shot supervised training, =EMPATH= was able to recognize a |
302 worm-like creature to infer the actions of a second worm-like | 300 wide variety of static poses and dynamic actions---ranging from |
303 creature, using only its own previous sensorimotor experiences and | 301 curling in a circle to wiggling with a particular frequency --- |
304 knowledge of the second worm's joints (figure | 302 with 95\% accuracy. |
305 \ref{worm-recognition-intro-2}). The result of this proof of | 303 |
306 concept was the program =EMPATH=, described in section \ref{sec-3}. | 304 - These results were completely independent of viewing angle |
307 | 305 because the underlying body-centered language is fundamentally |
308 ** =EMPATH= is built on =CORTEX=, en environment for making creatures. | 306 independent; once an action is learned, it can be recognized |
309 | 307 equally well from any viewing angle. |
310 # =CORTEX= provides a language for describing the sensorimotor | 308 |
311 # experiences of various creatures. | 309 - =EMPATH= is surprisingly short; the sensorimotor-centered |
310 language provided by =CORTEX= resulted in extremely economical | |
311 recognition routines --- about 500 lines in all --- suggesting | |
312 that such representations are very powerful, and often | |
313 indispensable for the types of recognition tasks considered here. | |
314 | |
315 - Although for expediency's sake, I relied on direct knowledge of | |
316 joint positions in this proof of concept, it would be | |
317 straightforward to extend =EMPATH= so that it (more | |
318 realistically) infers joint positions from its visual data. | |
319 | |
320 ** =EMPATH= is built on =CORTEX=, a creature builder. | |
312 | 321 |
313 I built =CORTEX= to be a general AI research platform for doing | 322 I built =CORTEX= to be a general AI research platform for doing |
314 experiments involving multiple rich senses and a wide variety and | 323 experiments involving multiple rich senses and a wide variety and |
315 number of creatures. I intend it to be useful as a library for many | 324 number of creatures. I intend it to be useful as a library for many |
316 more projects than just this thesis. =CORTEX= was necessary to meet | 325 more projects than just this thesis. =CORTEX= was necessary to meet |
317 a need among AI researchers at CSAIL and beyond, which is that | 326 a need among AI researchers at CSAIL and beyond, which is that |
318 people often will invent neat ideas that are best expressed in the | 327 people often will invent neat ideas that are best expressed in the |
319 language of creatures and senses, but in order to explore those | 328 language of creatures and senses, but in order to explore those |
320 ideas they must first build a platform in which they can create | 329 ideas they must first build a platform in which they can create |
321 simulated creatures with rich senses! There are many ideas that | 330 simulated creatures with rich senses! There are many ideas that |
322 would be simple to execute (such as =EMPATH=), but attached to them | 331 would be simple to execute (such as =EMPATH= or |
323 is the multi-month effort to make a good creature simulator. Often, | 332 \cite{larson-symbols}), but attached to them is the multi-month |
324 that initial investment of time proves to be too much, and the | 333 effort to make a good creature simulator. Often, that initial |
325 project must make do with a lesser environment. | 334 investment of time proves to be too much, and the project must make |
335 do with a lesser environment. | |
326 | 336 |
327 =CORTEX= is well suited as an environment for embodied AI research | 337 =CORTEX= is well suited as an environment for embodied AI research |
328 for three reasons: | 338 for three reasons: |
329 | 339 |
330 - You can create new creatures using Blender, a popular 3D modeling | 340 - You can create new creatures using Blender (\cite{blender}), a |
331 program. Each sense can be specified using special blender nodes | 341 popular 3D modeling program. Each sense can be specified using |
332 with biologically inspired paramaters. You need not write any | 342 special blender nodes with biologically inspired parameters. You |
333 code to create a creature, and can use a wide library of | 343 need not write any code to create a creature, and can use a wide |
334 pre-existing blender models as a base for your own creatures. | 344 library of pre-existing blender models as a base for your own |
345 creatures. | |
335 | 346 |
336 - =CORTEX= implements a wide variety of senses: touch, | 347 - =CORTEX= implements a wide variety of senses: touch, |
337 proprioception, vision, hearing, and muscle tension. Complicated | 348 proprioception, vision, hearing, and muscle tension. Complicated |
338 senses like touch and vision involve multiple sensory elements | 349 senses like touch and vision involve multiple sensory elements |
339 embedded in a 2D surface. You have complete control over the | 350 embedded in a 2D surface. You have complete control over the |
341 png image files. In particular, =CORTEX= implements more | 352 png image files. In particular, =CORTEX= implements more |
342 comprehensive hearing than any other creature simulation system | 353 comprehensive hearing than any other creature simulation system |
343 available. | 354 available. |
344 | 355 |
345 - =CORTEX= supports any number of creatures and any number of | 356 - =CORTEX= supports any number of creatures and any number of |
346 senses. Time in =CORTEX= dialates so that the simulated creatures | 357 senses. Time in =CORTEX= dilates so that the simulated creatures |
347 always precieve a perfectly smooth flow of time, regardless of | 358 always perceive a perfectly smooth flow of time, regardless of |
348 the actual computational load. | 359 the actual computational load. |
349 | 360 |
350 =CORTEX= is built on top of =jMonkeyEngine3=, which is a video game | 361 =CORTEX= is built on top of =jMonkeyEngine3= |
351 engine designed to create cross-platform 3D desktop games. =CORTEX= | 362 (\cite{jmonkeyengine}), which is a video game engine designed to |
352 is mainly written in clojure, a dialect of =LISP= that runs on the | 363 create cross-platform 3D desktop games. =CORTEX= is mainly written |
353 java virtual machine (JVM). The API for creating and simulating | 364 in clojure, a dialect of =LISP= that runs on the java virtual |
354 creatures and senses is entirely expressed in clojure, though many | 365 machine (JVM). The API for creating and simulating creatures and |
355 senses are implemented at the layer of jMonkeyEngine or below. For | 366 senses is entirely expressed in clojure, though many senses are |
356 example, for the sense of hearing I use a layer of clojure code on | 367 implemented at the layer of jMonkeyEngine or below. For example, |
357 top of a layer of java JNI bindings that drive a layer of =C++= | 368 for the sense of hearing I use a layer of clojure code on top of a |
358 code which implements a modified version of =OpenAL= to support | 369 layer of java JNI bindings that drive a layer of =C++= code which |
359 multiple listeners. =CORTEX= is the only simulation environment | 370 implements a modified version of =OpenAL= to support multiple |
360 that I know of that can support multiple entities that can each | 371 listeners. =CORTEX= is the only simulation environment that I know |
361 hear the world from their own perspective. Other senses also | 372 of that can support multiple entities that can each hear the world |
362 require a small layer of Java code. =CORTEX= also uses =bullet=, a | 373 from their own perspective. Other senses also require a small layer |
363 physics simulator written in =C=. | 374 of Java code. =CORTEX= also uses =bullet=, a physics simulator |
375 written in =C=. | |
364 | 376 |
365 #+caption: Here is the worm from figure \ref{worm-intro} modeled | 377 #+caption: Here is the worm from figure \ref{worm-intro} modeled |
366 #+caption: in Blender, a free 3D-modeling program. Senses and | 378 #+caption: in Blender, a free 3D-modeling program. Senses and |
367 #+caption: joints are described using special nodes in Blender. | 379 #+caption: joints are described using special nodes in Blender. |
368 #+name: worm-recognition-intro | 380 #+name: worm-recognition-intro |
373 | 385 |
374 - exploring new ideas about sensory integration | 386 - exploring new ideas about sensory integration |
375 - distributed communication among swarm creatures | 387 - distributed communication among swarm creatures |
376 - self-learning using free exploration | 388 - self-learning using free exploration |
377 - evolutionary algorithms involving creature construction | 389 - evolutionary algorithms involving creature construction |
378 - exploration of exoitic senses and effectors that are not possible | 390 - exploration of exotic senses and effectors that are not possible |
379 in the real world (such as telekenisis or a semantic sense) | 391 in the real world (such as telekinesis or a semantic sense) |
380 - imagination using subworlds | 392 - imagination using subworlds |
381 | 393 |
382 During one test with =CORTEX=, I created 3,000 creatures each with | 394 During one test with =CORTEX=, I created 3,000 creatures each with |
383 their own independent senses and ran them all at only 1/80 of real | 395 their own independent senses and ran them all at only 1/80 of real |
384 time. In another test, I created a detailed model of my own hand, | 396 time. In another test, I created a detailed model of my own hand, |
398 its own finger from the eye in its palm, and that it can feel its | 410 its own finger from the eye in its palm, and that it can feel its |
399 own thumb touching its palm.} | 411 own thumb touching its palm.} |
400 \end{sidewaysfigure} | 412 \end{sidewaysfigure} |
401 #+END_LaTeX | 413 #+END_LaTeX |
402 | 414 |
403 ** Contributions | |
404 | |
405 - I built =CORTEX=, a comprehensive platform for embodied AI | |
406 experiments. =CORTEX= supports many features lacking in other | |
407 systems, such proper simulation of hearing. It is easy to create | |
408 new =CORTEX= creatures using Blender, a free 3D modeling program. | |
409 | |
410 - I built =EMPATH=, which uses =CORTEX= to identify the actions of | |
411 a worm-like creature using a computational model of empathy. | |
412 | |
413 - After one-shot supervised training, =EMPATH= was able recognize a | |
414 wide variety of static poses and dynamic actions---ranging from | |
415 curling in a circle to wriggling with a particular frequency --- | |
416 with 95\% accuracy. | |
417 | |
418 - These results were completely independent of viewing angle | |
419 because the underlying body-centered language fundamentally is | |
420 independent; once an action is learned, it can be recognized | |
421 equally well from any viewing angle. | |
422 | |
423 - =EMPATH= is surprisingly short; the sensorimotor-centered | |
424 language provided by =CORTEX= resulted in extremely economical | |
425 recognition routines --- about 500 lines in all --- suggesting | |
426 that such representations are very powerful, and often | |
427 indispensible for the types of recognition tasks considered here. | |
428 | |
429 - Although for expediency's sake, I relied on direct knowledge of | |
430 joint positions in this proof of concept, it would be | |
431 straightforward to extend =EMPATH= so that it (more | |
432 realistically) infers joint positions from its visual data. | |
433 | |
434 * Designing =CORTEX= | 415 * Designing =CORTEX= |
435 | 416 |
436 In this section, I outline the design decisions that went into | 417 In this section, I outline the design decisions that went into |
437 making =CORTEX=, along with some details about its implementation. | 418 making =CORTEX=, along with some details about its implementation. |
438 (A practical guide to getting started with =CORTEX=, which skips | 419 (A practical guide to getting started with =CORTEX=, which skips |
439 over the history and implementation details presented here, is | 420 over the history and implementation details presented here, is |
440 provided in an appendix at the end of this thesis.) | 421 provided in an appendix at the end of this thesis.) |
441 | 422 |
442 Throughout this project, I intended for =CORTEX= to be flexible and | 423 Throughout this project, I intended for =CORTEX= to be flexible and |
443 extensible enough to be useful for other researchers who want to | 424 extensible enough to be useful for other researchers who want to |
444 test out ideas of their own. To this end, wherver I have had to make | 425 test out ideas of their own. To this end, wherever I have had to make |
445 archetictural choices about =CORTEX=, I have chosen to give as much | 426 architectural choices about =CORTEX=, I have chosen to give as much |
446 freedom to the user as possible, so that =CORTEX= may be used for | 427 freedom to the user as possible, so that =CORTEX= may be used for |
447 things I have not forseen. | 428 things I have not foreseen. |
448 | 429 |
449 ** Building in simulation versus reality | 430 ** Building in simulation versus reality |
450 The most important archetictural decision of all is the choice to | 431 The most important architectural decision of all is the choice to |
451 use a computer-simulated environemnt in the first place! The world | 432 use a computer-simulated environment in the first place! The world |
452 is a vast and rich place, and for now simulations are a very poor | 433 is a vast and rich place, and for now simulations are a very poor |
453 reflection of its complexity. It may be that there is a significant | 434 reflection of its complexity. It may be that there is a significant |
454 qualatative difference between dealing with senses in the real | 435 qualitative difference between dealing with senses in the real |
455 world and dealing with pale facilimilies of them in a simulation | 436 world and dealing with pale facsimiles of them in a simulation |
456 \cite{brooks-representation}. What are the advantages and | 437 \cite{brooks-representation}. What are the advantages and |
457 disadvantages of a simulation vs. reality? | 438 disadvantages of a simulation vs. reality? |
458 | 439 |
459 *** Simulation | 440 *** Simulation |
460 | 441 |
517 ideas in the real world must always worry about getting their | 498 ideas in the real world must always worry about getting their |
518 algorithms to run fast enough to process information in real time. | 499 algorithms to run fast enough to process information in real time. |
519 The need for real time processing only increases if multiple senses | 500 The need for real time processing only increases if multiple senses |
520 are involved. In the extreme case, even simple algorithms will have | 501 are involved. In the extreme case, even simple algorithms will have |
521 to be accelerated by ASIC chips or FPGAs, turning what would | 502 to be accelerated by ASIC chips or FPGAs, turning what would |
522 otherwise be a few lines of code and a 10x speed penality into a | 503 otherwise be a few lines of code and a 10x speed penalty into a |
523 multi-month ordeal. For this reason, =CORTEX= supports | 504 multi-month ordeal. For this reason, =CORTEX= supports |
524 /time-dialiation/, which scales back the framerate of the | 505 /time-dilation/, which scales back the framerate of the |
525 simulation in proportion to the amount of processing required by each frame. | 506 simulation in proportion to the amount of processing required by each frame. |
526 From the perspective of the creatures inside the simulation, time | 507 From the perspective of the creatures inside the simulation, time |
527 always appears to flow at a constant rate, regardless of how | 508 always appears to flow at a constant rate, regardless of how |
528 complicated the envorimnent becomes or how many creatures are in | 509 complicated the environment becomes or how many creatures are in |
529 the simulation. The cost is that =CORTEX= can sometimes run slower | 510 the simulation. The cost is that =CORTEX= can sometimes run slower |
530 than real time. This can also be an advantage, however --- | 511 than real time. This can also be an advantage, however --- |
531 simulations of very simple creatures in =CORTEX= generally run at | 512 simulations of very simple creatures in =CORTEX= generally run at |
532 40x on my machine! | 513 40x on my machine! |
533 | 514 |
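The idea is simple enough to sketch: the timestep handed to the physics
and sense simulation is fixed, no matter how long the previous frame
took to process on the wall clock. This is an illustration only, not
=CORTEX='s actual game-loop code; =advance-physics!= and
=process-senses!= are hypothetical stand-ins.

#+begin_src clojure
(defn dilated-step
  "Advance the world by a fixed simulated timestep, however long the
  sensory processing actually takes.  The creatures inside only ever
  see `sim-dt' seconds pass between frames."
  [world sim-dt advance-physics! process-senses!]
  (let [start   (System/nanoTime)
        world'  (-> world
                    (advance-physics! sim-dt) ; simulated time: always sim-dt
                    (process-senses!))        ; wall-clock cost: unbounded
        elapsed (/ (- (System/nanoTime) start) 1e9)]
    ;; slowdown > 1 means the simulation runs slower than real time;
    ;; slowdown < 1 means it runs faster (e.g. ~40x for simple creatures).
    (assoc world' :slowdown (/ elapsed sim-dt))))
#+end_src
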
534 ** All sense organs are two-dimensional surfaces | 515 ** All sense organs are two-dimensional surfaces |
535 | 516 |
536 If =CORTEX= is to support a wide variety of senses, it would help | 517 If =CORTEX= is to support a wide variety of senses, it would help |
537 to have a better understanding of what a ``sense'' actually is! | 518 to have a better understanding of what a ``sense'' actually is! |
538 While vision, touch, and hearing all seem like they are quite | 519 While vision, touch, and hearing all seem like they are quite |
539 different things, I was supprised to learn during the course of | 520 different things, I was surprised to learn during the course of |
540 this thesis that they (and all physical senses) can be expressed as | 521 this thesis that they (and all physical senses) can be expressed as |
541 exactly the same mathematical object due to a dimensional argument! | 522 exactly the same mathematical object due to a dimensional argument! |
542 | 523 |
543 Human beings are three-dimensional objects, and the nerves that | 524 Human beings are three-dimensional objects, and the nerves that |
544 transmit data from our various sense organs to our brain are | 525 transmit data from our various sense organs to our brain are |
559 complicated surface of the skin onto a two dimensional image. | 540 complicated surface of the skin onto a two dimensional image. |
560 | 541 |
561 Most human senses consist of many discrete sensors of various | 542 Most human senses consist of many discrete sensors of various |
562 properties distributed along a surface at various densities. For | 543 properties distributed along a surface at various densities. For |
563 skin, it is Pacinian corpuscles, Meissner's corpuscles, Merkel's | 544 skin, it is Pacinian corpuscles, Meissner's corpuscles, Merkel's |
564 disks, and Ruffini's endings, which detect pressure and vibration | 545 disks, and Ruffini's endings (\cite{9.01-textbook}), which detect |
565 of various intensities. For ears, it is the stereocilia distributed | 546 pressure and vibration of various intensities. For ears, it is the |
566 along the basilar membrane inside the cochlea; each one is | 547 stereocilia distributed along the basilar membrane inside the |
567 sensitive to a slightly different frequency of sound. For eyes, it | 548 cochlea; each one is sensitive to a slightly different frequency of |
568 is rods and cones distributed along the surface of the retina. In | 549 sound. For eyes, it is rods and cones distributed along the surface |
569 each case, we can describe the sense with a surface and a | 550 of the retina. In each case, we can describe the sense with a |
570 distribution of sensors along that surface. | 551 surface and a distribution of sensors along that surface. |
571 | 552 |
572 The neat idea is that every human sense can be effectively | 553 The neat idea is that every human sense can be effectively |
573 described in terms of a surface containing embedded sensors. If the | 554 described in terms of a surface containing embedded sensors. If the |
574 sense had any more dimensions, then there wouldn't be enough room | 555 sense had any more dimensions, then there wouldn't be enough room |
575 in the spinal cord to transmit the information! | 556 in the spinal cord to transmit the information! |
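
To make the ``surface plus distribution of sensors'' picture concrete:
in =CORTEX=, such a distribution ends up painted onto an image (via a
UV-map, described below), so recovering the sensors is just a scan over
pixels. The sketch here is illustrative only; the file name and the
white-pixel convention are assumptions, not the exact convention used
later in the touch implementation.

#+begin_src clojure
(import '[javax.imageio ImageIO]
        '[java.io File])

(defn sensor-coordinates
  "Treat an image as a sense surface: every pure-white pixel is one
  sensor.  Returns the [x y] position of each sensor on the surface."
  [^java.awt.image.BufferedImage image]
  (for [y (range (.getHeight image))
        x (range (.getWidth image))
        ;; mask off the alpha channel; 0xFFFFFF is a pure white pixel
        :when (= 0xFFFFFF (bit-and 0xFFFFFF (.getRGB image x y)))]
    [x y]))

(comment
  ;; a touch profile painted in Blender's UV editor becomes a list of
  ;; sensor positions on the creature's skin
  (sensor-coordinates (ImageIO/read (File. "touch-profile.png"))))
#+end_src
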
612 ** Video game engines provide ready-made physics and shading | 593 ** Video game engines provide ready-made physics and shading |
613 | 594 |
614 I did not need to write my own physics simulation code or shader to | 595 I did not need to write my own physics simulation code or shader to |
615 build =CORTEX=. Doing so would lead to a system that is impossible | 596 build =CORTEX=. Doing so would lead to a system that is impossible |
616 for anyone but myself to use anyway. Instead, I use a video game | 597 for anyone but myself to use anyway. Instead, I use a video game |
617 engine as a base and modify it to accomodate the additional needs | 598 engine as a base and modify it to accommodate the additional needs |
618 of =CORTEX=. Video game engines are an ideal starting point to | 599 of =CORTEX=. Video game engines are an ideal starting point to |
619 build =CORTEX=, because they are not far from being creature | 600 build =CORTEX=, because they are not far from being creature |
620 building systems themselves. | 601 building systems themselves. |
621 | 602 |
622 First off, general purpose video game engines come with a physics | 603 First off, general purpose video game engines come with a physics |
682 one to create boxes, spheres, etc., and leave that API as the sole | 663 one to create boxes, spheres, etc., and leave that API as the sole |
683 way to create creatures. However, for =CORTEX= to truly be useful | 664 way to create creatures. However, for =CORTEX= to truly be useful |
684 for other projects, it needs a way to construct complicated | 665 for other projects, it needs a way to construct complicated |
685 creatures. If possible, it would be nice to leverage work that has | 666 creatures. If possible, it would be nice to leverage work that has |
686 already been done by the community of 3D modelers, or at least | 667 already been done by the community of 3D modelers, or at least |
687 enable people who are talented at moedling but not programming to | 668 enable people who are talented at modeling but not programming to |
688 design =CORTEX= creatures. | 669 design =CORTEX= creatures. |
689 | 670 |
690 Therefore, I use Blender, a free 3D modeling program, as the main | 671 Therefore, I use Blender, a free 3D modeling program, as the main |
691 way to create creatures in =CORTEX=. However, the creatures modeled | 672 way to create creatures in =CORTEX=. However, the creatures modeled |
692 in Blender must also be simple to simulate in jMonkeyEngine3's game | 673 in Blender must also be simple to simulate in jMonkeyEngine3's game |
702 - Add empty nodes which each contain meta-data relevant to the | 683 - Add empty nodes which each contain meta-data relevant to the |
703 sense, including a UV-map describing the number/distribution of | 684 sense, including a UV-map describing the number/distribution of |
704 sensors if applicable. | 685 sensors if applicable. |
705 - Make each empty-node the child of the top-level node. | 686 - Make each empty-node the child of the top-level node. |
706 | 687 |
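With this convention, finding a creature's sense organs at load time
reduces to walking the scene graph for empty nodes that carry the right
metadata. The sketch below illustrates that walk over plain Clojure
maps standing in for jMonkeyEngine =Node= objects; the actual =CORTEX=
code does the same thing against the real scene graph.

#+begin_src clojure
;; A creature's scene graph, reduced to nested maps for illustration.
(def worm
  {:name "worm"
   :children
   [{:name "eyes"  :children [{:name "eye.R" :meta {:sense :eye   :uv-map "eye.png"}}]}
    {:name "touch" :children [{:name "skin"  :meta {:sense :touch :uv-map "skin.png"}}]}
    {:name "segment-1"}]})

(defn node-seq
  "Every node in the (map-based) scene graph, depth-first."
  [root]
  (tree-seq :children :children root))

(defn sense-organs
  "All empty nodes annotated with metadata for the given sense,
  e.g. (sense-organs worm :touch)."
  [creature sense]
  (filter #(= sense (get-in % [:meta :sense])) (node-seq creature)))
#+end_src
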
707 #+caption: An example of annoting a creature model with empty | 688 #+caption: An example of annotating a creature model with empty |
708 #+caption: nodes to describe the layout of senses. There are | 689 #+caption: nodes to describe the layout of senses. There are |
709 #+caption: multiple empty nodes which each describe the position | 690 #+caption: multiple empty nodes which each describe the position |
710 #+caption: of muscles, ears, eyes, or joints. | 691 #+caption: of muscles, ears, eyes, or joints. |
711 #+name: sense-nodes | 692 #+name: sense-nodes |
712 #+ATTR_LaTeX: :width 10cm | 693 #+ATTR_LaTeX: :width 10cm |
715 ** Bodies are composed of segments connected by joints | 696 ** Bodies are composed of segments connected by joints |
716 | 697 |
717 Blender is a general purpose animation tool, which has been used in | 698 Blender is a general purpose animation tool, which has been used in |
718 the past to create high quality movies such as Sintel | 699 the past to create high quality movies such as Sintel |
719 \cite{blender}. Though Blender can model and render even complicated | 700 \cite{blender}. Though Blender can model and render even complicated |
720 things like water, it is crucual to keep models that are meant to | 701 things like water, it is crucial to keep models that are meant to |
721 be simulated as creatures simple. =Bullet=, which =CORTEX= uses | 702 be simulated as creatures simple. =Bullet=, which =CORTEX= uses |
722 through jMonkeyEngine3, is a rigid-body physics system. This offers | 703 through jMonkeyEngine3, is a rigid-body physics system. This offers |
723 a compromise between the expressiveness of a game level and the | 704 a compromise between the expressiveness of a game level and the |
724 speed at which it can be simulated, and it means that creatures | 705 speed at which it can be simulated, and it means that creatures |
725 should be naturally expressed as rigid components held together by | 706 should be naturally expressed as rigid components held together by |
726 joint constraints. | 707 joint constraints. |
727 | 708 |
728 But humans are more like a squishy bag with wrapped around some | 709 But humans are more like a squishy bag wrapped around some hard |
729 hard bones which define the overall shape. When we move, our skin | 710 bones which define the overall shape. When we move, our skin bends |
730 bends and stretches to accomodate the new positions of our bones. | 711 and stretches to accommodate the new positions of our bones. |
731 | 712 |
732 One way to make bodies composed of rigid pieces connected by joints | 713 One way to make bodies composed of rigid pieces connected by joints |
733 /seem/ more human-like is to use an /armature/ (or /rigging/) | 714 /seem/ more human-like is to use an /armature/ (or /rigging/) |
734 system, which defines an overall ``body mesh'' and how the | 715 system, which defines an overall ``body mesh'' and how the |
735 mesh deforms as a function of the position of each ``bone'' which | 716 mesh deforms as a function of the position of each ``bone'' which |
736 is a standard rigid body. This technique is used extensively to | 717 is a standard rigid body. This technique is used extensively to |
737 model humans and create realistic animations. It is not a good | 718 model humans and create realistic animations. It is not a good |
738 technique for physical simulation, however because it creates a lie | 719 technique for physical simulation because it is a lie -- the skin |
739 -- the skin is not a physical part of the simulation and does not | 720 is not a physical part of the simulation and does not interact with |
740 interact with any objects in the world or itself. Objects will pass | 721 any objects in the world or itself. Objects will pass right though |
741 right though the skin until they come in contact with the | 722 the skin until they come in contact with the underlying bone, which |
742 underlying bone, which is a physical object. Whithout simulating | 723 is a physical object. Without simulating the skin, the sense of |
743 the skin, the sense of touch has little meaning, and the creature's | 724 touch has little meaning, and the creature's own vision will lie to |
744 own vision will lie to it about the true extent of its body. | 725 it about the true extent of its body. Simulating the skin as a |
745 Simulating the skin as a physical object requires some way to | 726 physical object requires some way to continuously update the |
746 continuously update the physical model of the skin along with the | 727 physical model of the skin along with the movement of the bones, |
747 movement of the bones, which is unacceptably slow compared to rigid | 728 which is unacceptably slow compared to rigid body simulation. |
748 body simulation. | |
749 | 729 |
750 Therefore, instead of using the human-like ``deformable bag of | 730 Therefore, instead of using the human-like ``deformable bag of |
751 bones'' approach, I decided to base my body plans on multiple solid | 731 bones'' approach, I decided to base my body plans on multiple solid |
752 objects that are connected by joints, inspired by the robot =EVE= | 732 objects that are connected by joints, inspired by the robot =EVE= |
753 from the movie WALL-E. | 733 from the movie WALL-E. |
760 | 740 |
761 =EVE='s body is composed of several rigid components that are held | 741 =EVE='s body is composed of several rigid components that are held |
762 together by invisible joint constraints. This is what I mean by | 742 together by invisible joint constraints. This is what I mean by |
763 ``eve-like''. The main reason that I use eve-style bodies is for | 743 ``eve-like''. The main reason that I use eve-style bodies is for |
764 efficiency, and so that there will be correspondence between the | 744 efficiency, and so that there will be correspondence between the |
765 AI's semses and the physical presence of its body. Each individual | 745 AI's senses and the physical presence of its body. Each individual |
766 section is simulated by a separate rigid body that corresponds | 746 section is simulated by a separate rigid body that corresponds |
767 exactly with its visual representation and does not change. | 747 exactly with its visual representation and does not change. |
768 Sections are connected by invisible joints that are well supported | 748 Sections are connected by invisible joints that are well supported |
769 in jMonkeyEngine3. Bullet, the physics backend for jMonkeyEngine3, | 749 in jMonkeyEngine3. Bullet, the physics backend for jMonkeyEngine3, |
770 can efficiently simulate hundreds of rigid bodies connected by | 750 can efficiently simulate hundreds of rigid bodies connected by |
868 Since the objects must be physical, the empty-node itself escapes | 848 Since the objects must be physical, the empty-node itself escapes |
869 detection. For the same reason, =joint-targets= | 849 detection. For the same reason, =joint-targets= |
870 must be called /after/ =physical!= is called. | 850 must be called /after/ =physical!= is called. |
871 | 851 |
872 #+caption: Program to find the targets of a joint node by | 852 #+caption: Program to find the targets of a joint node by |
873 #+caption: exponentiallly growth of a search cube. | 853 #+caption: exponential growth of a search cube. |
874 #+name: joint-targets | 854 #+name: joint-targets |
875 #+begin_listing clojure | 855 #+begin_listing clojure |
876 #+begin_src clojure | 856 #+begin_src clojure |
877 (defn joint-targets | 857 (defn joint-targets |
878 "Return the two closest two objects to the joint object, ordered | 858 "Return the two closest two objects to the joint object, ordered |
903 | 883 |
904 Once =CORTEX= finds all joints and targets, it creates them using | 884 Once =CORTEX= finds all joints and targets, it creates them using |
905 a dispatch on the metadata of each joint node. | 885 a dispatch on the metadata of each joint node. |
906 | 886 |
907 #+caption: Program to dispatch on blender metadata and create joints | 887 #+caption: Program to dispatch on blender metadata and create joints |
908 #+caption: sutiable for physical simulation. | 888 #+caption: suitable for physical simulation. |
909 #+name: joint-dispatch | 889 #+name: joint-dispatch |
910 #+begin_listing clojure | 890 #+begin_listing clojure |
911 #+begin_src clojure | 891 #+begin_src clojure |
912 (defmulti joint-dispatch | 892 (defmulti joint-dispatch |
913 "Translate blender pseudo-joints into real JME joints." | 893 "Translate blender pseudo-joints into real JME joints." |
983 #+end_listing | 963 #+end_listing |
984 | 964 |
985 In general, whenever =CORTEX= exposes a sense (or in this case | 965 In general, whenever =CORTEX= exposes a sense (or in this case |
986 physicality), it provides a function of the type =sense!=, which | 966 physicality), it provides a function of the type =sense!=, which |
987 takes in a collection of nodes and augments it to support that | 967 takes in a collection of nodes and augments it to support that |
988 sense. The function returns any controlls necessary to use that | 968 sense. The function returns any controls necessary to use that |
989 sense. In this case =body!= cerates a physical body and returns no | 969 sense. In this case =body!= creates a physical body and returns no |
990 control functions. | 970 control functions. |
991 | 971 |
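Since every sense in =CORTEX= follows the =sense!= convention, it is
worth spelling the contract out once. The sketch below uses a made-up
=nose!= sense; it is not part of =CORTEX=, just an illustration of
``augment the creature, return the controls''.

#+begin_src clojure
(defn nose!
  "Illustration of the sense! convention: take a creature (a collection
  of nodes), augment it to support a hypothetical sense of smell, and
  return the controls needed to use that sense: here, a single function
  that is polled every frame for the current reading."
  [creature]
  (let [current-odors (atom [])]
    ;; ... real code would attach sensor geometry to `creature' here ...
    (fn smell [_world]
      ;; per-frame control function: what does the creature smell now?
      @current-odors)))

;; =body!= follows the same convention, but a physical body needs no
;; per-frame polling, so it returns no control functions.
#+end_src
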
992 #+caption: Program to give joints to a creature. | 972 #+caption: Program to give joints to a creature. |
993 #+name: name | 973 #+name: name |
994 #+begin_listing clojure | 974 #+begin_listing clojure |
1020 The hand from figure \ref{blender-hand}, which was modeled after | 1000 The hand from figure \ref{blender-hand}, which was modeled after |
1021 my own right hand, can now be given joints and simulated as a | 1001 my own right hand, can now be given joints and simulated as a |
1022 creature. | 1002 creature. |
1023 | 1003 |
1024 #+caption: With the ability to create physical creatures from blender, | 1004 #+caption: With the ability to create physical creatures from blender, |
1025 #+caption: =CORTEX= gets one step closer to becomming a full creature | 1005 #+caption: =CORTEX= gets one step closer to becoming a full creature |
1026 #+caption: simulation environment. | 1006 #+caption: simulation environment. |
1027 #+name: name | 1007 #+name: name |
1028 #+ATTR_LaTeX: :width 15cm | 1008 #+ATTR_LaTeX: :width 15cm |
1029 [[./images/physical-hand.png]] | 1009 [[./images/physical-hand.png]] |
1030 | 1010 |
1083 the data. To make this easy for the continuation function, the | 1063 the data. To make this easy for the continuation function, the |
1084 =SceneProcessor= maintains appropriately sized buffers in RAM to | 1064 =SceneProcessor= maintains appropriately sized buffers in RAM to |
1085 hold the data. It does not do any copying from the GPU to the CPU | 1065 hold the data. It does not do any copying from the GPU to the CPU |
1086 itself because that is a slow operation. | 1066 itself because that is a slow operation. |
1087 | 1067 |
1088 #+caption: Function to make the rendered secne in jMonkeyEngine | 1068 #+caption: Function to make the rendered scene in jMonkeyEngine |
1089 #+caption: available for further processing. | 1069 #+caption: available for further processing. |
1090 #+name: pipeline-1 | 1070 #+name: pipeline-1 |
1091 #+begin_listing clojure | 1071 #+begin_listing clojure |
1092 #+begin_src clojure | 1072 #+begin_src clojure |
1093 (defn vision-pipeline | 1073 (defn vision-pipeline |
1158 XZY rotation for the node in blender." | 1138 XZY rotation for the node in blender." |
1159 [#^Node creature #^Spatial eye] | 1139 [#^Node creature #^Spatial eye] |
1160 (let [target (closest-node creature eye) | 1140 (let [target (closest-node creature eye) |
1161 [cam-width cam-height] | 1141 [cam-width cam-height] |
1162 ;;[640 480] ;; graphics card on laptop doesn't support | 1142 ;;[640 480] ;; graphics card on laptop doesn't support |
1163 ;; arbitray dimensions. | 1143 ;; arbitrary dimensions. |
1164 (eye-dimensions eye) | 1144 (eye-dimensions eye) |
1165 cam (Camera. cam-width cam-height) | 1145 cam (Camera. cam-width cam-height) |
1166 rot (.getWorldRotation eye)] | 1146 rot (.getWorldRotation eye)] |
1167 (.setLocation cam (.getWorldTranslation eye)) | 1147 (.setLocation cam (.getWorldTranslation eye)) |
1168 (.lookAtDirection | 1148 (.lookAtDirection |
1343 sound from different points of view, and there is no way to directly | 1323 sound from different points of view, and there is no way to directly |
1344 access the rendered sound data. | 1324 access the rendered sound data. |
1345 | 1325 |
1346 =CORTEX='s hearing is unique because it does not have any | 1326 =CORTEX='s hearing is unique because it does not have any |
1347 limitations compared to other simulation environments. As far as I | 1327 limitations compared to other simulation environments. As far as I |
1348 know, there is no other system that supports multiple listerers, | 1328 know, there is no other system that supports multiple listeners, |
1349 and the sound demo at the end of this section is the first time | 1329 and the sound demo at the end of this section is the first time |
1350 it's been done in a video game environment. | 1330 it's been done in a video game environment. |
1351 | 1331 |
1352 *** Brief Description of jMonkeyEngine's Sound System | 1332 *** Brief Description of jMonkeyEngine's Sound System |
1353 | 1333 |
1382 *** Extending =OpenAL= | 1362 *** Extending =OpenAL= |
1383 | 1363 |
1384 Extending =OpenAL= to support multiple listeners requires 500 | 1364 Extending =OpenAL= to support multiple listeners requires 500 |
1385 lines of =C= code and is too hairy to mention here. Instead, I | 1365 lines of =C= code and is too hairy to mention here. Instead, I |
1386 will show a small amount of extension code and go over the high | 1366 will show a small amount of extension code and go over the high |
1387 level stragety. Full source is of course available with the | 1367 level strategy. Full source is of course available with the |
1388 =CORTEX= distribution if you're interested. | 1368 =CORTEX= distribution if you're interested. |
1389 | 1369 |
1390 =OpenAL= goes to great lengths to support many different systems, | 1370 =OpenAL= goes to great lengths to support many different systems, |
1391 all with different sound capabilities and interfaces. It | 1371 all with different sound capabilities and interfaces. It |
1392 accomplishes this difficult task by providing code for many | 1372 accomplishes this difficult task by providing code for many |
1404 any particular system. These include the Null Device, which | 1384 any particular system. These include the Null Device, which |
1405 doesn't do anything, and the Wave Device, which writes whatever | 1385 doesn't do anything, and the Wave Device, which writes whatever |
1406 sound it receives to a file, if everything has been set up | 1386 sound it receives to a file, if everything has been set up |
1407 correctly when configuring =OpenAL=. | 1387 correctly when configuring =OpenAL=. |
1408 | 1388 |
1409 Actual mixing (doppler shift and distance.environment-based | 1389 Actual mixing (Doppler shift and distance- and environment-based |
1410 attenuation) of the sound data happens in the Devices, and they | 1390 attenuation) of the sound data happens in the Devices, and they |
1411 are the only point in the sound rendering process where this data | 1391 are the only point in the sound rendering process where this data |
1412 is available. | 1392 is available. |
1413 | 1393 |
1414 Therefore, in order to support multiple listeners, and get the | 1394 Therefore, in order to support multiple listeners, and get the |
1621 entity.getMaterial().setColor("Color", ColorRGBA.Gray); | 1601 entity.getMaterial().setColor("Color", ColorRGBA.Gray); |
1622 } | 1602 } |
1623 #+END_SRC | 1603 #+END_SRC |
1624 #+end_listing | 1604 #+end_listing |
1625 | 1605 |
1626 #+caption: First ever simulation of multiple listerners in =CORTEX=. | 1606 #+caption: First ever simulation of multiple listeners in =CORTEX=. |
1627 #+caption: Each cube is a creature which processes sound data with | 1607 #+caption: Each cube is a creature which processes sound data with |
1628 #+caption: the =process= function from listing \ref{sound-test}. | 1608 #+caption: the =process= function from listing \ref{sound-test}. |
1629 #+caption: the ball is constantally emiting a pure tone of | 1609 #+caption: The ball is constantly emitting a pure tone of |
1630 #+caption: constant volume. As it approaches the cubes, they each | 1610 #+caption: constant volume. As it approaches the cubes, they each |
1631 #+caption: change color in response to the sound. | 1611 #+caption: change color in response to the sound. |
1632 #+name: sound-cubes. | 1612 #+name: sound-cubes |
1633 #+ATTR_LaTeX: :width 10cm | 1613 #+ATTR_LaTeX: :width 10cm |
1634 [[./images/java-hearing-test.png]] | 1614 [[./images/java-hearing-test.png]] |
1754 comprise a mesh, while =pixel-triangles= gets those same triangles | 1734 comprise a mesh, while =pixel-triangles= gets those same triangles |
1755 expressed in pixel coordinates (which are UV coordinates scaled to | 1735 expressed in pixel coordinates (which are UV coordinates scaled to |
1756 fit the height and width of the UV image). | 1736 fit the height and width of the UV image). |
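
The scaling itself is simple. As a minimal sketch (using a
hypothetical =uv->pixel= helper of my own, not a =CORTEX= function), a
UV coordinate in $[0,1] \times [0,1]$ becomes a pixel coordinate by
multiplying by the image dimensions:

#+BEGIN_SRC clojure
;; Hypothetical helper for illustration only -- not part of CORTEX.
(defn uv->pixel
  "Scale a UV coordinate pair (each component in [0,1]) to integer
   pixel coordinates for a UV image of the given width and height."
  [width height [u v]]
  [(Math/round (* u (dec width)))
   (Math/round (* v (dec height)))])

;; (uv->pixel 64 64 [0.5 0.25]) => [32 16]
#+END_SRC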
1757 | 1737 |
1758 #+caption: Programs to extract triangles from a geometry and get | 1738 #+caption: Programs to extract triangles from a geometry and get |
1759 #+caption: their verticies in both world and UV-coordinates. | 1739 #+caption: their vertices in both world and UV-coordinates. |
1760 #+name: get-triangles | 1740 #+name: get-triangles |
1761 #+begin_listing clojure | 1741 #+begin_listing clojure |
1762 #+BEGIN_SRC clojure | 1742 #+BEGIN_SRC clojure |
1763 (defn triangle | 1743 (defn triangle |
1764 "Get the triangle specified by triangle-index from the mesh." | 1744 "Get the triangle specified by triangle-index from the mesh." |
1849 | 1829 |
1850 The clojure code below recapitulates the formulas above, using | 1830 The clojure code below recapitulates the formulas above, using |
1851 jMonkeyEngine's =Matrix4f= objects, which can describe any affine | 1831 jMonkeyEngine's =Matrix4f= objects, which can describe any affine |
1852 transformation. | 1832 transformation. |
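
In outline, and assuming both matrices are assembled the same way: if
$P$ is the $4 \times 4$ matrix whose columns encode a triangle's
pixel-space vertices and $W$ is the matrix built identically from its
world-space vertices, then, provided $P$ is invertible, $W P^{-1}$
sends each column of $P$ to the corresponding column of $W$ (since
$(W P^{-1}) P = W$). That product is the transformation that carries
points of the UV image onto the corresponding points of the model's
surface.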
1853 | 1833 |
1854 #+caption: Program to interpert triangles as affine transforms. | 1834 #+caption: Program to interpret triangles as affine transforms. |
1855 #+name: triangle-affine | 1835 #+name: triangle-affine |
1856 #+begin_listing clojure | 1836 #+begin_listing clojure |
1857 #+BEGIN_SRC clojure | 1837 #+BEGIN_SRC clojure |
1858 (defn triangle->matrix4f | 1838 (defn triangle->matrix4f |
1859 "Converts the triangle into a 4x4 matrix: The first three columns | 1839 "Converts the triangle into a 4x4 matrix: The first three columns |
1892 triangle. | 1872 triangle. |
1893 | 1873 |
1894 =inside-triangle?= determines whether a point is inside a triangle | 1874 =inside-triangle?= determines whether a point is inside a triangle |
1895 in 2D pixel-space. | 1875 in 2D pixel-space. |
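
One standard way to make this check, sketched below under my own
naming (an illustration only, not the listing that follows), is the
same-side test: the point is inside exactly when the 2D cross
products taken along each edge all share a sign.

#+BEGIN_SRC clojure
;; Illustrative sketch only -- not the CORTEX implementation.
(defn- cross-2d
  "Z component of the 2D cross product (b - a) x (p - a)."
  [[ax ay] [bx by] [px py]]
  (- (* (- bx ax) (- py ay))
     (* (- by ay) (- px ax))))

(defn inside-triangle-2d?
  "True if point p lies inside (or on an edge of) triangle a-b-c."
  [a b c p]
  (let [d1 (cross-2d a b p)
        d2 (cross-2d b c p)
        d3 (cross-2d c a p)]
    (not (and (or (neg? d1) (neg? d2) (neg? d3))
              (or (pos? d1) (pos? d2) (pos? d3))))))

;; (inside-triangle-2d? [0 0] [4 0] [0 4] [1 1]) => true
;; (inside-triangle-2d? [0 0] [4 0] [0 4] [5 5]) => false
#+END_SRC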
1896 | 1876 |
1897 #+caption: Program to efficiently determine point includion | 1877 #+caption: Program to efficiently determine point inclusion |
1898 #+caption: in a triangle. | 1878 #+caption: in a triangle. |
1899 #+name: in-triangle | 1879 #+name: in-triangle |
1900 #+begin_listing clojure | 1880 #+begin_listing clojure |
1901 #+BEGIN_SRC clojure | 1881 #+BEGIN_SRC clojure |
1902 (defn convex-bounds | 1882 (defn convex-bounds |
2087 #+END_SRC | 2067 #+END_SRC |
2088 #+end_listing | 2068 #+end_listing |
2089 | 2069 |
2090 Armed with the =touch!= function, =CORTEX= becomes capable of | 2070 Armed with the =touch!= function, =CORTEX= becomes capable of |
2091 giving creatures a sense of touch. A simple test is to create a | 2071 giving creatures a sense of touch. A simple test is to create a |
2092 cube that is outfitted with a uniform distrubition of touch | 2072 cube that is outfitted with a uniform distribution of touch |
2093 sensors. It can feel the ground and any balls that it touches. | 2073 sensors. It can feel the ground and any balls that it touches. |
2094 | 2074 |
2095 #+caption: =CORTEX= interface for creating touch in a simulated | 2075 #+caption: =CORTEX= interface for creating touch in a simulated |
2096 #+caption: creature. | 2076 #+caption: creature. |
2097 #+name: touch | 2077 #+name: touch |
2109 (node-seq creature))))) | 2089 (node-seq creature))))) |
2110 #+END_SRC | 2090 #+END_SRC |
2111 #+end_listing | 2091 #+end_listing |
2112 | 2092 |
2113 The tactile-sensor-profile image for the touch cube is a simple | 2093 The tactile-sensor-profile image for the touch cube is a simple |
2114 cross with a unifom distribution of touch sensors: | 2094 cross with a uniform distribution of touch sensors: |
2115 | 2095 |
2116 #+caption: The touch profile for the touch-cube. Each pure white | 2096 #+caption: The touch profile for the touch-cube. Each pure white |
2117 #+caption: pixel defines a touch sensitive feeler. | 2097 #+caption: pixel defines a touch sensitive feeler. |
2118 #+name: touch-cube-uv-map | 2098 #+name: touch-cube-uv-map |
2119 #+ATTR_LaTeX: :width 7cm | 2099 #+ATTR_LaTeX: :width 7cm |
2120 [[./images/touch-profile.png]] | 2100 [[./images/touch-profile.png]] |
2121 | 2101 |
2122 #+caption: The touch cube reacts to canonballs. The black, red, | 2102 #+caption: The touch cube reacts to cannonballs. The black, red, |
2123 #+caption: and white cross on the right is a visual display of | 2103 #+caption: and white cross on the right is a visual display of |
2124 #+caption: the creature's touch. White means that it is feeling | 2104 #+caption: the creature's touch. White means that it is feeling |
2125 #+caption: something strongly, black is not feeling anything, | 2105 #+caption: something strongly, black is not feeling anything, |
2126 #+caption: and gray is in-between. The cube can feel both the | 2106 #+caption: and gray is in-between. The cube can feel both the |
2127 #+caption: floor and the ball. Notice that when the ball causes | 2107 #+caption: floor and the ball. Notice that when the ball causes |
2169 radians you have to move counterclockwise around the axis vector | 2149 radians you have to move counterclockwise around the axis vector |
2170 to get from the first to the second vector. It is not commutative | 2150 to get from the first to the second vector. It is not commutative |
2171 like a normal dot-product angle is. | 2151 like a normal dot-product angle is. |
2172 | 2152 |
2173 The purpose of these functions is to build a system of angle | 2153 The purpose of these functions is to build a system of angle |
2174 measurement that is biologically plausable. | 2154 measurement that is biologically plausible. |
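
The sketch below illustrates the kind of function meant here, using
plain Clojure vectors for clarity (the thesis listing itself follows
below): the ordinary =acos= angle is flipped to $2\pi - \theta$
whenever the cross product of the two vectors points against the
axis.

#+BEGIN_SRC clojure
;; Illustration with plain vectors -- not the CORTEX listing below.
(defn- dot* [a b] (reduce + (map * a b)))
(defn- cross* [[ax ay az] [bx by bz]]
  [(- (* ay bz) (* az by))
   (- (* az bx) (* ax bz))
   (- (* ax by) (* ay bx))])
(defn- norm* [v] (Math/sqrt (dot* v v)))

(defn ccw-angle
  "Radians of counterclockwise rotation about axis needed to move
   from vector a to vector b. Unlike the plain dot-product angle,
   (ccw-angle a b axis) and (ccw-angle b a axis) generally differ."
  [a b axis]
  (let [theta (Math/acos (/ (dot* a b) (* (norm* a) (norm* b))))]
    (if (neg? (dot* (cross* a b) axis))
      (- (* 2 Math/PI) theta)
      theta)))

;; (ccw-angle [1 0 0] [0 1 0] [0 0 1]) => ~1.57  (pi/2)
;; (ccw-angle [0 1 0] [1 0 0] [0 0 1]) => ~4.71  (3*pi/2)
#+END_SRC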
2175 | 2155 |
2176 #+caption: Program to measure angles along a vector | 2156 #+caption: Program to measure angles along a vector |
2177 #+name: helpers | 2157 #+name: helpers |
2178 #+begin_listing clojure | 2158 #+begin_listing clojure |
2179 #+BEGIN_SRC clojure | 2159 #+BEGIN_SRC clojure |
2199 Given a joint, =proprioception-kernel= produces a function that | 2179 Given a joint, =proprioception-kernel= produces a function that |
2200 calculates the Euler angles between the the objects the joint | 2180 calculates the Euler angles between the objects the joint |
2201 connects. The only tricky part here is making the angles relative | 2181 connects. The only tricky part here is making the angles relative |
2202 to the joint's initial ``straightness''. | 2182 to the joint's initial ``straightness''. |
2203 | 2183 |
2204 #+caption: Program to return biologially reasonable proprioceptive | 2184 #+caption: Program to return biologically reasonable proprioceptive |
2205 #+caption: data for each joint. | 2185 #+caption: data for each joint. |
2206 #+name: proprioception | 2186 #+name: proprioception |
2207 #+begin_listing clojure | 2187 #+begin_listing clojure |
2208 #+BEGIN_SRC clojure | 2188 #+BEGIN_SRC clojure |
2209 (defn proprioception-kernel | 2189 (defn proprioception-kernel |
2357 red, instead of shades of gray as I've been using for all the | 2337 red, instead of shades of gray as I've been using for all the |
2358 other senses. This is purely an aesthetic touch. | 2338 other senses. This is purely an aesthetic touch. |
2359 | 2339 |
2360 *** Creating muscles | 2340 *** Creating muscles |
2361 | 2341 |
2362 #+caption: This is the core movement functoion in =CORTEX=, which | 2342 #+caption: This is the core movement function in =CORTEX=, which |
2363 #+caption: implements muscles that report on their activation. | 2343 #+caption: implements muscles that report on their activation. |
2364 #+name: muscle-kernel | 2344 #+name: muscle-kernel |
2365 #+begin_listing clojure | 2345 #+begin_listing clojure |
2366 #+BEGIN_SRC clojure | 2346 #+BEGIN_SRC clojure |
2367 (defn movement-kernel | 2347 (defn movement-kernel |
2415 | 2395 |
2416 With all senses enabled, my right hand model looks like an | 2396 With all senses enabled, my right hand model looks like an |
2417 intricate marionette hand with several strings for each finger: | 2397 intricate marionette hand with several strings for each finger: |
2418 | 2398 |
2419 #+caption: View of the hand model with all sense nodes. You can see | 2399 #+caption: View of the hand model with all sense nodes. You can see |
2420 #+caption: the joint, muscle, ear, and eye nodess here. | 2400 #+caption: the joint, muscle, ear, and eye nodes here. |
2421 #+name: hand-nodes-1 | 2401 #+name: hand-nodes-1 |
2422 #+ATTR_LaTeX: :width 11cm | 2402 #+ATTR_LaTeX: :width 11cm |
2423 [[./images/hand-with-all-senses2.png]] | 2403 [[./images/hand-with-all-senses2.png]] |
2424 | 2404 |
2425 #+caption: An alternate view of the hand. | 2405 #+caption: An alternate view of the hand. |
2428 [[./images/hand-with-all-senses3.png]] | 2408 [[./images/hand-with-all-senses3.png]] |
2429 | 2409 |
2430 With the hand fully rigged with senses, I can run it though a test | 2410 With the hand fully rigged with senses, I can run it through a test |
2431 that will test everything. | 2411 that exercises everything at once. |
2432 | 2412 |
2433 #+caption: A full test of the hand with all senses. Note expecially | 2413 #+caption: A full test of the hand with all senses. Note especially |
2434 #+caption: the interactions the hand has with itself: it feels | 2414 #+caption: the interactions the hand has with itself: it feels |
2435 #+caption: its own palm and fingers, and when it curls its fingers, | 2415 #+caption: its own palm and fingers, and when it curls its fingers, |
2436 #+caption: it sees them with its eye (which is located in the center | 2416 #+caption: it sees them with its eye (which is located in the center |
2437 #+caption: of the palm. The red block appears with a pure tone sound. | 2417 #+caption: of the palm). The red block appears with a pure tone sound. |
2438 #+caption: The hand then uses its muscles to launch the cube! | 2418 #+caption: The hand then uses its muscles to launch the cube! |
2439 #+name: integration | 2419 #+name: integration |
2440 #+ATTR_LaTeX: :width 16cm | 2420 #+ATTR_LaTeX: :width 16cm |
2441 [[./images/integration.png]] | 2421 [[./images/integration.png]] |
2442 | 2422 |
2443 ** =CORTEX= enables many possiblities for further research | 2423 ** =CORTEX= enables many possibilities for further research |
2444 | 2424 |
2445 Often times, the hardest part of building a system involving | 2425 Oftentimes, the hardest part of building a system involving |
2446 creatures is dealing with physics and graphics. =CORTEX= removes | 2426 creatures is dealing with physics and graphics. =CORTEX= removes |
2447 much of this initial difficulty and leaves researchers free to | 2427 much of this initial difficulty and leaves researchers free to |
2448 directly pursue their ideas. I hope that even undergrads with a | 2428 directly pursue their ideas. I hope that even undergrads with a |
2559 :proprioception (proprioception! model) | 2539 :proprioception (proprioception! model) |
2560 :muscles (movement! model)})) | 2540 :muscles (movement! model)})) |
2561 #+end_src | 2541 #+end_src |
2562 #+end_listing | 2542 #+end_listing |
2563 | 2543 |
2564 ** Embodiment factors action recognition into managable parts | 2544 ** Embodiment factors action recognition into manageable parts |
2565 | 2545 |
2566 Using empathy, I divide the problem of action recognition into a | 2546 Using empathy, I divide the problem of action recognition into a |
2567 recognition process expressed in the language of a full compliment | 2547 recognition process expressed in the language of a full complement |
2568 of senses, and an imaganitive process that generates full sensory | 2548 of senses, and an imaginative process that generates full sensory |
2569 data from partial sensory data. Splitting the action recognition | 2549 data from partial sensory data. Splitting the action recognition |
2570 problem in this manner greatly reduces the total amount of work to | 2550 problem in this manner greatly reduces the total amount of work to |
2571 recognize actions: The imaganitive process is mostly just matching | 2551 recognize actions: The imaginative process is mostly just matching |
2572 previous experience, and the recognition process gets to use all | 2552 previous experience, and the recognition process gets to use all |
2573 the senses to directly describe any action. | 2553 the senses to directly describe any action. |
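
Schematically, the factoring looks like the sketch below. The names
here are placeholders of my own, not =EMPATH='s actual entry points
(those appear later): an inference function fills in full sensory
experience, and a table of body-centered predicates then names the
action.

#+begin_src clojure
;; Placeholder names for illustration -- not EMPATH's actual API.
(defn recognize-action
  "Factor recognition into an imaginative step (infer-senses fills in
   full sensory experience from partial data) and a recognition step
   (an ordered seq of [predicate label] pairs over full experience)."
  [infer-senses predicates partial-experience]
  (let [full-experience (infer-senses partial-experience)]
    (or (some (fn [[pred label]]
                (when (pred full-experience) label))
              predicates)
        :unknown)))

;; (recognize-action identity [[#(:curled? %) :curled]] {:curled? true})
;; => :curled
#+end_src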
2574 | 2554 |
2575 ** Action recognition is easy with a full gamut of senses | 2555 ** Action recognition is easy with a full gamut of senses |
2576 | 2556 |
2584 | 2564 |
2585 The following action predicates each take a stream of sensory | 2565 The following action predicates each take a stream of sensory |
2586 experience, observe however much of it they desire, and decide | 2566 experience, observe however much of it they desire, and decide |
2587 whether the worm is doing the action they describe. =curled?= | 2567 whether the worm is doing the action they describe. =curled?= |
2588 relies on proprioception, =resting?= relies on touch, =wiggling?= | 2568 relies on proprioception, =resting?= relies on touch, =wiggling?= |
2589 relies on a fourier analysis of muscle contraction, and | 2569 relies on a Fourier analysis of muscle contraction, and |
2590 =grand-circle?= relies on touch and reuses =curled?= as a gaurd. | 2570 =grand-circle?= relies on touch and reuses =curled?= as a guard. |
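
Because the Fourier step is the least obvious of these, here is a
minimal standalone illustration (my own sketch, not the =wiggling?=
listing below) of measuring how strongly a sequence of muscle
activations oscillates at a given frequency:

#+begin_src clojure
;; Illustration only -- not the wiggling? listing below.
(defn dft-magnitude
  "Magnitude of the k-th discrete Fourier component of xs. A large
   value at a non-zero k means the activations are periodic, which is
   the signature of wiggling."
  [xs k]
  (let [n (count xs)
        re (reduce + (map-indexed
                      (fn [i x] (* x (Math/cos (/ (* 2 Math/PI k i) n)))) xs))
        im (reduce + (map-indexed
                      (fn [i x] (* x (Math/sin (/ (* 2 Math/PI k i) n)))) xs))]
    (Math/sqrt (+ (* re re) (* im im)))))

;; (dft-magnitude [0 40 0 40 0 40 0 40] 4) => 160.0
;; (dft-magnitude [40 40 40 40 40 40 40 40] 4) => 0.0
#+end_src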
2591 | 2571 |
2592 #+caption: Program for detecting whether the worm is curled. This is the | 2572 #+caption: Program for detecting whether the worm is curled. This is the |
2593 #+caption: simplest action predicate, because it only uses the last frame | 2573 #+caption: simplest action predicate, because it only uses the last frame |
2594 #+caption: of sensory experience, and only uses proprioceptive data. Even | 2574 #+caption: of sensory experience, and only uses proprioceptive data. Even |
2595 #+caption: this simple predicate, however, is automatically frame | 2575 #+caption: this simple predicate, however, is automatically frame |
2632 | 2612 |
2633 #+caption: Program for detecting whether the worm is at rest. This program | 2613 #+caption: Program for detecting whether the worm is at rest. This program |
2634 #+caption: uses a summary of the tactile information from the underbelly | 2614 #+caption: uses a summary of the tactile information from the underbelly |
2635 #+caption: of the worm, and is only true if every segment is touching the | 2615 #+caption: of the worm, and is only true if every segment is touching the |
2636 #+caption: floor. Note that this function contains no references to | 2616 #+caption: floor. Note that this function contains no references to |
2637 #+caption: proprioction at all. | 2617 #+caption: proprioception at all. |
2638 #+name: resting | 2618 #+name: resting |
2639 #+begin_listing clojure | 2619 #+begin_listing clojure |
2640 #+begin_src clojure | 2620 #+begin_src clojure |
2641 (def worm-segment-bottom (rect-region [8 15] [14 22])) | 2621 (def worm-segment-bottom (rect-region [8 15] [14 22])) |
2642 | 2622 |
2673 #+end_src | 2653 #+end_src |
2674 #+end_listing | 2654 #+end_listing |
2675 | 2655 |
2676 | 2656 |
2677 #+caption: Program for detecting whether the worm has been wiggling for | 2657 #+caption: Program for detecting whether the worm has been wiggling for |
2678 #+caption: the last few frames. It uses a fourier analysis of the muscle | 2658 #+caption: the last few frames. It uses a Fourier analysis of the muscle |
2679 #+caption: contractions of the worm's tail to determine wiggling. This is | 2659 #+caption: contractions of the worm's tail to determine wiggling. This is |
2680 #+caption: signigicant because there is no particular frame that clearly | 2660 #+caption: significant because there is no particular frame that clearly |
2681 #+caption: indicates that the worm is wiggling --- only when multiple frames | 2661 #+caption: indicates that the worm is wiggling --- only when multiple frames |
2682 #+caption: are analyzed together is the wiggling revealed. Defining | 2662 #+caption: are analyzed together is the wiggling revealed. Defining |
2683 #+caption: wiggling this way also gives the worm an opportunity to learn | 2663 #+caption: wiggling this way also gives the worm an opportunity to learn |
2684 #+caption: and recognize ``frustrated wiggling'', where the worm tries to | 2664 #+caption: and recognize ``frustrated wiggling'', where the worm tries to |
2685 #+caption: wiggle but can't. Frustrated wiggling is very visually different | 2665 #+caption: wiggle but can't. Frustrated wiggling is very visually different |
2736 (resting? experiences) (.setText text "Resting"))) | 2716 (resting? experiences) (.setText text "Resting"))) |
2737 #+end_src | 2717 #+end_src |
2738 #+end_listing | 2718 #+end_listing |
2739 | 2719 |
2740 #+caption: Using =debug-experience=, the body-centered predicates | 2720 #+caption: Using =debug-experience=, the body-centered predicates |
2741 #+caption: work together to classify the behaviour of the worm. | 2721 #+caption: work together to classify the behavior of the worm. |
2742 #+caption: the predicates are operating with access to the worm's | 2722 #+caption: The predicates are operating with access to the worm's |
2743 #+caption: full sensory data. | 2723 #+caption: full sensory data. |
2744 #+name: basic-worm-view | 2724 #+name: basic-worm-view |
2745 #+ATTR_LaTeX: :width 10cm | 2725 #+ATTR_LaTeX: :width 10cm |
2746 [[./images/worm-identify-init.png]] | 2726 [[./images/worm-identify-init.png]] |
2747 | 2727 |
2748 These action predicates satisfy the recognition requirement of an | 2728 These action predicates satisfy the recognition requirement of an |
2749 empathic recognition system. There is power in the simplicity of | 2729 empathic recognition system. There is power in the simplicity of |
2750 the action predicates. They describe their actions without getting | 2730 the action predicates. They describe their actions without getting |
2751 confused in visual details of the worm. Each one is frame | 2731 confused in visual details of the worm. Each one is frame |
2752 independent, but more than that, they are each indepent of | 2732 independent, but more than that, they are each independent of |
2753 irrelevant visual details of the worm and the environment. They | 2733 irrelevant visual details of the worm and the environment. They |
2754 will work regardless of whether the worm is a different color or | 2734 will work regardless of whether the worm is a different color or |
2755 hevaily textured, or if the environment has strange lighting. | 2735 heavily textured, or if the environment has strange lighting. |
2756 | 2736 |
2757 The trick now is to make the action predicates work even when the | 2737 The trick now is to make the action predicates work even when the |
2758 sensory data on which they depend is absent. If I can do that, then | 2738 sensory data on which they depend is absent. If I can do that, then |
2759 I will have gained much, | 2739 I will have gained much. |
2760 | 2740 |
2774 touching and at the same time not also experience the sensation of | 2754 touching and at the same time not also experience the sensation of |
2775 touching itself. | 2755 touching itself. |
2776 | 2756 |
2777 As the worm moves around during free play and its experience vector | 2757 As the worm moves around during free play and its experience vector |
2778 grows larger, the vector begins to define a subspace which is all | 2758 grows larger, the vector begins to define a subspace which is all |
2779 the sensations the worm can practicaly experience during normal | 2759 the sensations the worm can practically experience during normal |
2780 operation. I call this subspace \Phi-space, short for | 2760 operation. I call this subspace \Phi-space, short for |
2781 physical-space. The experience vector defines a path through | 2761 physical-space. The experience vector defines a path through |
2782 \Phi-space. This path has interesting properties that all derive | 2762 \Phi-space. This path has interesting properties that all derive |
2783 from physical embodiment. The proprioceptive components are | 2763 from physical embodiment. The proprioceptive components are |
2784 completely smooth, because in order for the worm to move from one | 2764 completely smooth, because in order for the worm to move from one |
2799 activations of the worm's muscles, because it generally takes a | 2779 activations of the worm's muscles, because it generally takes a |
2800 unique combination of muscle contractions to transform the worm's | 2780 unique combination of muscle contractions to transform the worm's |
2801 body along a specific path through \Phi-space. | 2781 body along a specific path through \Phi-space. |
2802 | 2782 |
2803 There is a simple way of taking \Phi-space and the total ordering | 2783 There is a simple way of taking \Phi-space and the total ordering |
2804 provided by an experience vector and reliably infering the rest of | 2784 provided by an experience vector and reliably inferring the rest of |
2805 the senses. | 2785 the senses. |
2806 | 2786 |
2807 ** Empathy is the process of tracing though \Phi-space | 2787 ** Empathy is the process of tracing through \Phi-space |
2808 | 2788 |
2809 Here is the core of a basic empathy algorithm, starting with an | 2789 Here is the core of a basic empathy algorithm, starting with an |
2815 | 2795 |
2816 Then, given a sequence of proprioceptive input, generate a set of | 2796 Then, given a sequence of proprioceptive input, generate a set of |
2817 matching experience records for each input, using the tiered | 2797 matching experience records for each input, using the tiered |
2818 proprioceptive bins. | 2798 proprioceptive bins. |
2819 | 2799 |
2820 Finally, to infer sensory data, select the longest consective chain | 2800 Finally, to infer sensory data, select the longest consecutive chain |
2821 of experiences. Conecutive experience means that the experiences | 2801 of experiences. Consecutive experience means that the experiences |
2822 appear next to each other in the experience vector. | 2802 appear next to each other in the experience vector. |
2823 | 2803 |
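As a concrete illustration of the chain-selection step, here is a
simplified greedy variant of my own. It is not the =longest-thread=
function described below, which properly prefers the longest
consecutive chain and stitches several chains together, but it shows
the basic idea of favoring indices that continue a run.

#+begin_src clojure
;; Simplified illustration -- not the longest-thread implementation.
(defn greedy-thread
  "Given one set of candidate experience indices per input frame,
   choose an index for each frame, extending the current chain with
   the next consecutive index whenever it is among the candidates."
  [candidate-sets]
  (reduce (fn [chosen candidates]
            (let [prev (peek chosen)
                  next-idx (when prev (inc prev))]
              (conj chosen
                    (if (and next-idx (contains? candidates next-idx))
                      next-idx
                      (when (seq candidates) (apply min candidates))))))
          []
          candidate-sets))

;; (greedy-thread [#{3 10} #{4 11} #{5 20} #{9}]) => [3 4 5 9]
#+end_src
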
2824 This algorithm has three advantages: | 2804 This algorithm has three advantages: |
2825 | 2805 |
2826 1. It's simple | 2806 1. It's simple |
2831 proprioceptive bin. Redundant experiences in \Phi-space can be | 2811 proprioceptive bin. Redundant experiences in \Phi-space can be |
2832 merged to save computation. | 2812 merged to save computation. |
2833 | 2813 |
2834 2. It protects from wrong interpretations of transient ambiguous | 2814 2. It protects from wrong interpretations of transient ambiguous |
2835 proprioceptive data. For example, if the worm is flat for just | 2815 proprioceptive data. For example, if the worm is flat for just |
2836 an instant, this flattness will not be interpreted as implying | 2816 an instant, this flatness will not be interpreted as implying |
2837 that the worm has its muscles relaxed, since the flattness is | 2817 that the worm has its muscles relaxed, since the flatness is |
2838 part of a longer chain which includes a distinct pattern of | 2818 part of a longer chain which includes a distinct pattern of |
2839 muscle activation. Markov chains or other memoryless statistical | 2819 muscle activation. Markov chains or other memoryless statistical |
2840 models that operate on individual frames may very well make this | 2820 models that operate on individual frames may very well make this |
2841 mistake. | 2821 mistake. |
2842 | 2822 |
2853 (flatten) | 2833 (flatten) |
2854 (mapv #(Math/round (* % (Math/pow 10 (dec digits)))))))) | 2834 (mapv #(Math/round (* % (Math/pow 10 (dec digits)))))))) |
2855 | 2835 |
2856 (defn gen-phi-scan | 2836 (defn gen-phi-scan |
2857 "Nearest-neighbors with binning. Only returns a result if | 2837 "Nearest-neighbors with binning. Only returns a result if |
2858 the propriceptive data is within 10% of a previously recorded | 2838 the proprioceptive data is within 10% of a previously recorded |
2859 result in all dimensions." | 2839 result in all dimensions." |
2860 [phi-space] | 2840 [phi-space] |
2861 (let [bin-keys (map bin [3 2 1]) | 2841 (let [bin-keys (map bin [3 2 1]) |
2862 bin-maps | 2842 bin-maps |
2863 (map (fn [bin-key] | 2843 (map (fn [bin-key] |
2880 | 2860 |
2881 =longest-thread= infers sensory data by stitching together pieces | 2861 =longest-thread= infers sensory data by stitching together pieces |
2882 from previous experience. It prefers longer chains of previous | 2862 from previous experience. It prefers longer chains of previous |
2883 experience to shorter ones. For example, during training the worm | 2863 experience to shorter ones. For example, during training the worm |
2884 might rest on the ground for one second before it performs its | 2864 might rest on the ground for one second before it performs its |
2885 excercises. If during recognition the worm rests on the ground for | 2865 exercises. If during recognition the worm rests on the ground for |
2886 five seconds, =longest-thread= will accomodate this five second | 2866 five seconds, =longest-thread= will accommodate this five second |
2887 rest period by looping the one second rest chain five times. | 2867 rest period by looping the one second rest chain five times. |
2888 | 2868 |
2889 =longest-thread= takes time proportinal to the average number of | 2869 =longest-thread= takes time proportional to the average number of |
2890 entries in a proprioceptive bin, because for each element in the | 2870 entries in a proprioceptive bin, because for each element in the |
2891 starting bin it performes a series of set lookups in the preceeding | 2871 starting bin it performs a series of set lookups in the preceding |
2892 bins. If the total history is limited, then this is only a constant | 2872 bins. If the total history is limited, then this is only a constant |
2893 multiple times the number of entries in the starting bin. This | 2873 multiple times the number of entries in the starting bin. This |
2894 analysis also applies even if the action requires multiple longest | 2874 analysis also applies even if the action requires multiple longest |
2895 chains -- it's still the average number of entries in a | 2875 chains -- it's still the average number of entries in a |
2896 proprioceptive bin times the desired chain length. Because | 2876 proprioceptive bin times the desired chain length. Because |
2964 | 2944 |
2965 To use =EMPATH= with the worm, I first need to gather a set of | 2945 To use =EMPATH= with the worm, I first need to gather a set of |
2966 experiences from the worm that includes the actions I want to | 2946 experiences from the worm that includes the actions I want to |
2967 recognize. The =generate-phi-space= program (listing | 2947 recognize. The =generate-phi-space= program (listing |
2968 \ref{generate-phi-space} runs the worm through a series of | 2948 \ref{generate-phi-space}) runs the worm through a series of |
2969 exercices and gatheres those experiences into a vector. The | 2949 exercises and gathers those experiences into a vector. The |
2970 =do-all-the-things= program is a routine expressed in a simple | 2950 =do-all-the-things= program is a routine expressed in a simple |
2971 muscle contraction script language for automated worm control. It | 2951 muscle contraction script language for automated worm control. It |
2972 causes the worm to rest, curl, and wiggle over about 700 frames | 2952 causes the worm to rest, curl, and wiggle over about 700 frames |
2973 (approx. 11 seconds). | 2953 (approx. 11 seconds). |
2974 | 2954 |
2975 #+caption: Program to gather the worm's experiences into a vector for | 2955 #+caption: Program to gather the worm's experiences into a vector for |
2976 #+caption: further processing. The =motor-control-program= line uses | 2956 #+caption: further processing. The =motor-control-program= line uses |
2977 #+caption: a motor control script that causes the worm to execute a series | 2957 #+caption: a motor control script that causes the worm to execute a series |
2978 #+caption: of ``exercices'' that include all the action predicates. | 2958 #+caption: of ``exercises'' that include all the action predicates. |
2979 #+name: generate-phi-space | 2959 #+name: generate-phi-space |
2980 #+begin_listing clojure | 2960 #+begin_listing clojure |
2981 #+begin_src clojure | 2961 #+begin_src clojure |
2982 (def do-all-the-things | 2962 (def do-all-the-things |
2983 (concat | 2963 (concat |
3037 on simulated sensory data just as well as with actual data. Figure | 3017 on simulated sensory data just as well as with actual data. Figure |
3038 \ref{empathy-debug-image} was generated using =empathy-experiment=: | 3018 \ref{empathy-debug-image} was generated using =empathy-experiment=: |
3039 | 3019 |
3040 #+caption: From only proprioceptive data, =EMPATH= was able to infer | 3020 #+caption: From only proprioceptive data, =EMPATH= was able to infer |
3041 #+caption: the complete sensory experience and classify four poses | 3021 #+caption: the complete sensory experience and classify four poses |
3042 #+caption: (The last panel shows a composite image of \emph{wriggling}, | 3022 #+caption: (The last panel shows a composite image of /wiggling/, |
3043 #+caption: a dynamic pose.) | 3023 #+caption: a dynamic pose.) |
3044 #+name: empathy-debug-image | 3024 #+name: empathy-debug-image |
3045 #+ATTR_LaTeX: :width 10cm :placement [H] | 3025 #+ATTR_LaTeX: :width 10cm :placement [H] |
3046 [[./images/empathy-1.png]] | 3026 [[./images/empathy-1.png]] |
3047 | 3027 |
3048 One way to measure the performance of =EMPATH= is to compare the | 3028 One way to measure the performance of =EMPATH= is to compare the |
3049 sutiability of the imagined sense experience to trigger the same | 3029 suitability of the imagined sense experience to trigger the same |
3050 action predicates as the real sensory experience. | 3030 action predicates as the real sensory experience. |
3051 | 3031 |
3052 #+caption: Determine how closely empathy approximates actual | 3032 #+caption: Determine how closely empathy approximates actual |
3053 #+caption: sensory data. | 3033 #+caption: sensory data. |
3054 #+name: test-empathy-accuracy | 3034 #+name: test-empathy-accuracy |
3084 #+end_src | 3064 #+end_src |
3085 #+end_listing | 3065 #+end_listing |
3086 | 3066 |
3087 Running =test-empathy-accuracy= using the very short exercise | 3067 Running =test-empathy-accuracy= using the very short exercise |
3088 program defined in listing \ref{generate-phi-space}, and then doing | 3068 program defined in listing \ref{generate-phi-space}, and then doing |
3089 a similar pattern of activity manually yeilds an accuracy of around | 3069 a similar pattern of activity manually yields an accuracy of around |
3090 73%. This is based on very limited worm experience. By training the | 3070 73%. This is based on very limited worm experience. By training the |
3091 worm for longer, the accuracy dramatically improves. | 3071 worm for longer, the accuracy dramatically improves. |
3092 | 3072 |
3093 #+caption: Program to generate \Phi-space using manual training. | 3073 #+caption: Program to generate \Phi-space using manual training. |
3094 #+name: manual-phi-space | 3074 #+name: manual-phi-space |
3111 After about 1 minute of manual training, I was able to achieve 95% | 3091 After about 1 minute of manual training, I was able to achieve 95% |
3112 accuracy on manual testing of the worm using =init-interactive= and | 3092 accuracy on manual testing of the worm using =init-interactive= and |
3113 =test-empathy-accuracy=. The majority of errors are near the | 3093 =test-empathy-accuracy=. The majority of errors are near the |
3114 boundaries of transitioning from one type of action to another. | 3094 boundaries of transitioning from one type of action to another. |
3115 During these transitions the exact label for the action is more open | 3095 During these transitions the exact label for the action is more open |
3116 to interpretation, and dissaggrement between empathy and experience | 3096 to interpretation, and disagreement between empathy and experience |
3117 is more excusable. | 3097 is more excusable. |
3118 | 3098 |
3119 ** Digression: Learn touch sensor layout through free play | 3099 ** Digression: Learn touch sensor layout through free play |
3120 | 3100 |
3121 In the previous section I showed how to compute actions in terms of | 3101 In the previous section I showed how to compute actions in terms of |
3122 body-centered predicates which relied averate touch activation of | 3102 body-centered predicates which relied on the average touch |
3123 pre-defined regions of the worm's skin. What if, instead of | 3103 activation of pre-defined regions of the worm's skin. What if, |
3124 recieving touch pre-grouped into the six faces of each worm | 3104 instead of receiving touch pre-grouped into the six faces of each |
3125 segment, the true topology of the worm's skin was unknown? This is | 3105 worm segment, the true topology of the worm's skin was unknown? |
3126 more similiar to how a nerve fiber bundle might be arranged. While | 3106 This is more similar to how a nerve fiber bundle might be |
3127 two fibers that are close in a nerve bundle /might/ correspond to | 3107 arranged. While two fibers that are close in a nerve bundle /might/ |
3128 two touch sensors that are close together on the skin, the process | 3108 correspond to two touch sensors that are close together on the |
3129 of taking a complicated surface and forcing it into essentially a | 3109 skin, the process of taking a complicated surface and forcing it |
3130 circle requires some cuts and rerragenments. | 3110 into essentially a circle requires some cuts and rearrangements. |
3131 | 3111 |
3132 In this section I show how to automatically learn the skin-topology of | 3112 In this section I show how to automatically learn the skin-topology of |
3133 a worm segment by free exploration. As the worm rolls around on the | 3113 a worm segment by free exploration. As the worm rolls around on the |
3134 floor, large sections of its surface get activated. If the worm has | 3114 floor, large sections of its surface get activated. If the worm has |
3135 stopped moving, then whatever region of skin that is touching the | 3115 stopped moving, then whatever region of skin is touching the |
3149 (= (set (map first touch)) (set full-contact))) | 3129 (= (set (map first touch)) (set full-contact))) |
3150 #+end_src | 3130 #+end_src |
3151 #+end_listing | 3131 #+end_listing |
3152 | 3132 |
3153 After collecting these important regions, there will many nearly | 3133 After collecting these important regions, there will be many nearly |
3154 similiar touch regions. While for some purposes the subtle | 3134 similar touch regions. While for some purposes the subtle |
3155 differences between these regions will be important, for my | 3135 differences between these regions will be important, for my |
3156 purposes I colapse them into mostly non-overlapping sets using | 3136 purposes I collapse them into mostly non-overlapping sets using |
3157 =remove-similiar= in listing \ref{remove-similiar} | 3137 =remove-similar= in listing \ref{remove-similar}. |
3158 | 3138 |
3159 #+caption: Program to take a lits of set of points and ``collapse them'' | 3139 #+caption: Program to take a list of sets of points and ``collapse them'' |
3160 #+caption: so that the remaining sets in the list are siginificantly | 3140 #+caption: so that the remaining sets in the list are significantly |
3161 #+caption: different from each other. Prefer smaller sets to larger ones. | 3141 #+caption: different from each other. Prefer smaller sets to larger ones. |
3162 #+name: remove-similiar | 3142 #+name: remove-similar |
3163 #+begin_listing clojure | 3143 #+begin_listing clojure |
3164 #+begin_src clojure | 3144 #+begin_src clojure |
3165 (defn remove-similar | 3145 (defn remove-similar |
3166 [coll] | 3146 [coll] |
3167 (loop [result () coll (sort-by (comp - count) coll)] | 3147 (loop [result () coll (sort-by (comp - count) coll)] |
3179 #+end_listing | 3159 #+end_listing |
3180 | 3160 |
3181 Actually running this simulation is easy given =CORTEX='s facilities. | 3161 Actually running this simulation is easy given =CORTEX='s facilities. |
3182 | 3162 |
3183 #+caption: Collect experiences while the worm moves around. Filter the touch | 3163 #+caption: Collect experiences while the worm moves around. Filter the touch |
3184 #+caption: sensations by stable ones, collapse similiar ones together, | 3164 #+caption: sensations by stable ones, collapse similar ones together, |
3185 #+caption: and report the regions learned. | 3165 #+caption: and report the regions learned. |
3186 #+name: learn-touch | 3166 #+name: learn-touch |
3187 #+begin_listing clojure | 3167 #+begin_listing clojure |
3188 #+begin_src clojure | 3168 #+begin_src clojure |
3189 (defn learn-touch-regions [] | 3169 (defn learn-touch-regions [] |
3214 (map view-touch-region | 3194 (map view-touch-region |
3215 (learn-touch-regions))) | 3195 (learn-touch-regions))) |
3216 #+end_src | 3196 #+end_src |
3217 #+end_listing | 3197 #+end_listing |
3218 | 3198 |
3219 The only thing remining to define is the particular motion the worm | 3199 The only thing remaining to define is the particular motion the worm |
3220 must take. I accomplish this with a simple motor control program. | 3200 must take. I accomplish this with a simple motor control program. |
3221 | 3201 |
3222 #+caption: Motor control program for making the worm roll on the ground. | 3202 #+caption: Motor control program for making the worm roll on the ground. |
3223 #+caption: This could also be replaced with random motion. | 3203 #+caption: This could also be replaced with random motion. |
3224 #+name: worm-roll | 3204 #+name: worm-roll |
3273 | 3253 |
3274 While simple, =learn-touch-regions= exploits regularities in both | 3254 While simple, =learn-touch-regions= exploits regularities in both |
3275 the worm's physiology and the worm's environment to correctly | 3255 the worm's physiology and the worm's environment to correctly |
3276 deduce that the worm has six sides. Note that =learn-touch-regions= | 3256 deduce that the worm has six sides. Note that =learn-touch-regions= |
3277 would work just as well even if the worm's touch sense data were | 3257 would work just as well even if the worm's touch sense data were |
3278 completely scrambled. The cross shape is just for convienence. This | 3258 completely scrambled. The cross shape is just for convenience. This |
3279 example justifies the use of pre-defined touch regions in =EMPATH=. | 3259 example justifies the use of pre-defined touch regions in =EMPATH=. |
3280 | 3260 |
3281 * Contributions | 3261 * Contributions |
3282 | 3262 |
3283 In this thesis you have seen the =CORTEX= system, a complete | 3263 In this thesis you have seen the =CORTEX= system, a complete |
3284 environment for creating simulated creatures. You have seen how to | 3264 environment for creating simulated creatures. You have seen how to |
3285 implement five senses: touch, proprioception, hearing, vision, and | 3265 implement five senses: touch, proprioception, hearing, vision, and |
3286 muscle tension. You have seen how to create new creatues using | 3266 muscle tension. You have seen how to create new creatures using |
3287 blender, a 3D modeling tool. I hope that =CORTEX= will be useful in | 3267 blender, a 3D modeling tool. I hope that =CORTEX= will be useful in |
3288 further research projects. To this end I have included the full | 3268 further research projects. To this end I have included the full |
3289 source to =CORTEX= along with a large suite of tests and examples. I | 3269 source to =CORTEX= along with a large suite of tests and examples. I |
3290 have also created a user guide for =CORTEX= which is inculded in an | 3270 have also created a user guide for =CORTEX= which is included in an |
3291 appendix to this thesis \ref{}. | 3271 appendix to this thesis. |
3292 # dxh: todo reference appendix | |
3293 | 3272 |
3294 You have also seen how I used =CORTEX= as a platform to attach the | 3273 You have also seen how I used =CORTEX= as a platform to attack the |
3295 /action recognition/ problem, which is the problem of recognizing | 3274 /action recognition/ problem, which is the problem of recognizing |
3296 actions in video. You saw a simple system called =EMPATH= which | 3275 actions in video. You saw a simple system called =EMPATH= which |
3297 ientifies actions by first describing actions in a body-centerd, | 3276 identifies actions by first describing actions in a body-centered, |
3298 rich sense language, then infering a full range of sensory | 3277 rich sense language, then inferring a full range of sensory |
3299 experience from limited data using previous experience gained from | 3278 experience from limited data using previous experience gained from |
3300 free play. | 3279 free play. |
3301 | 3280 |
3302 As a minor digression, you also saw how I used =CORTEX= to enable a | 3281 As a minor digression, you also saw how I used =CORTEX= to enable a |
3303 tiny worm to discover the topology of its skin simply by rolling on | 3282 tiny worm to discover the topology of its skin simply by rolling on |
3304 the ground. | 3283 the ground. |
3305 | 3284 |
3306 In conclusion, the main contributions of this thesis are: | 3285 In conclusion, the main contributions of this thesis are: |
3307 | 3286 |
3308 - =CORTEX=, a system for creating simulated creatures with rich | 3287 - =CORTEX=, a comprehensive platform for embodied AI experiments. |
3309 senses. | 3288 =CORTEX= supports many features lacking in other systems, such as |
3310 - =EMPATH=, a program for recognizing actions by imagining sensory | 3289 proper simulation of hearing. It is easy to create new =CORTEX= |
3311 experience. | 3290 creatures using Blender, a free 3D modeling program. |
3312 | 3291 |
3313 # An anatomical joke: | 3292 - =EMPATH=, which uses =CORTEX= to identify the actions of a |
3314 # - Training | 3293 worm-like creature using a computational model of empathy. |
3315 # - Skeletal imitation | 3294 |
3316 # - Sensory fleshing-out | |
3317 # - Classification | |
3318 #+BEGIN_LaTeX | 3295 #+BEGIN_LaTeX |
3319 \appendix | 3296 \appendix |
3320 #+END_LaTeX | 3297 #+END_LaTeX |
3298 | |
3321 * Appendix: =CORTEX= User Guide | 3299 * Appendix: =CORTEX= User Guide |
3322 | 3300 |
3323 Those who write a thesis should endeavor to make their code not only | 3301 Those who write a thesis should endeavor to make their code not only |
3324 accessable, but actually useable, as a way to pay back the community | 3302 accessible, but actually usable, as a way to pay back the community |
3325 that made the thesis possible in the first place. This thesis would | 3303 that made the thesis possible in the first place. This thesis would |
3326 not be possible without Free Software such as jMonkeyEngine3, | 3304 not be possible without Free Software such as jMonkeyEngine3, |
3327 Blender, clojure, emacs, ffmpeg, and many other tools. That is why I | 3305 Blender, clojure, emacs, ffmpeg, and many other tools. That is why I |
3328 have included this user guide, in the hope that someone else might | 3306 have included this user guide, in the hope that someone else might |
3329 find =CORTEX= useful. | 3307 find =CORTEX= useful. |
3347 | 3325 |
3348 ** Creating creatures | 3326 ** Creating creatures |
3349 | 3327 |
3350 Creatures are created using /Blender/, a free 3D modeling program. | 3328 Creatures are created using /Blender/, a free 3D modeling program. |
3351 You will need Blender version 2.6 when using the =CORTEX= included | 3329 You will need Blender version 2.6 when using the =CORTEX= included |
3352 in this thesis. You create a =CORTEX= creature in a similiar manner | 3330 in this thesis. You create a =CORTEX= creature in a similar manner |
3353 to modeling anything in Blender, except that you also create | 3331 to modeling anything in Blender, except that you also create |
3354 several trees of empty nodes which define the creature's senses. | 3332 several trees of empty nodes which define the creature's senses. |
3355 | 3333 |
3356 *** Mass | 3334 *** Mass |
3357 | 3335 |
3415 The eye will point outward from the X-axis of the node, and ``up'' | 3393 The eye will point outward from the X-axis of the node, and ``up'' |
3416 will be in the direction of the X-axis of the node. It will help | 3394 will be in the direction of the X-axis of the node. It will help |
3417 to set the empty node's display mode to ``Arrows'' so that you can | 3395 to set the empty node's display mode to ``Arrows'' so that you can |
3418 clearly see the direction of the axes. | 3396 clearly see the direction of the axes. |
3419 | 3397 |
3420 Each retina file should contain white pixels whever you want to be | 3398 Each retina file should contain white pixels wherever you want to be |
3421 sensitive to your chosen color. If you want the entire field of | 3399 sensitive to your chosen color. If you want the entire field of |
3422 view, specify :all of 0xFFFFFF and a retinal map that is entirely | 3400 view, specify :all of 0xFFFFFF and a retinal map that is entirely |
3423 white. | 3401 white. |
3424 | 3402 |
3425 Here is a sample retinal map: | 3403 Here is a sample retinal map: |
3451 #+BEGIN_EXAMPLE | 3429 #+BEGIN_EXAMPLE |
3452 <touch-UV-map-file-name> | 3430 <touch-UV-map-file-name> |
3453 #+END_EXAMPLE | 3431 #+END_EXAMPLE |
3454 | 3432 |
3455 You may also include an optional ``scale'' metadata number to | 3433 You may also include an optional ``scale'' metadata number to |
3456 specifiy the length of the touch feelers. The default is $0.1$, | 3434 specify the length of the touch feelers. The default is $0.1$, |
3457 and this is generally sufficient. | 3435 and this is generally sufficient. |
3458 | 3436 |
3459 The touch UV should contain white pixels for each touch sensor. | 3437 The touch UV should contain white pixels for each touch sensor. |
3460 | 3438 |
3461 Here is an example touch-uv map that approximates a human finger, | 3439 Here is an example touch-uv map that approximates a human finger, |
3473 #+caption: model of a fingertip. | 3451 #+caption: model of a fingertip. |
3474 #+name: guide-fingertip | 3452 #+name: guide-fingertip |
3475 #+ATTR_LaTeX: :width 9cm :placement [H] | 3453 #+ATTR_LaTeX: :width 9cm :placement [H] |
3476 [[./images/finger-2.png]] | 3454 [[./images/finger-2.png]] |
3477 | 3455 |
3478 *** Propriocepotion | 3456 *** Proprioception |
3479 | 3457 |
3480 Proprioception is tied to each joint node -- nothing special must | 3458 Proprioception is tied to each joint node -- nothing special must |
3481 be done in a blender model to enable proprioception other than | 3459 be done in a blender model to enable proprioception other than |
3482 creating joint nodes. | 3460 creating joint nodes. |
3483 | 3461 |
3580 | 3558 |
3581 - =(load-blender-model file-name)= :: create a node structure | 3559 - =(load-blender-model file-name)= :: create a node structure |
3582 representing that described in a blender file. | 3560 representing that described in a blender file. |
3583 | 3561 |
3584 - =(light-up-everything world)= :: distribute a standard compliment | 3562 - =(light-up-everything world)= :: distribute a standard complement |
3585 of lights throught the simulation. Should be adequate for most | 3563 of lights throughout the simulation. Should be adequate for most |
3586 purposes. | 3564 purposes. |
3587 | 3565 |
3588 - =(node-seq node)= :: return a recursuve list of the node's | 3566 - =(node-seq node)= :: return a recursive list of the node's |
3589 children. | 3567 children. |
3590 | 3568 |
3591 - =(nodify name children)= :: construct a node given a node-name and | 3569 - =(nodify name children)= :: construct a node given a node-name and |
3592 desired children. | 3570 desired children. |
3593 | 3571 |
3636 =[activation, length]= pairs for each touch hair. | 3614 =[activation, length]= pairs for each touch hair. |
3637 | 3615 |
3638 - =(proprioception! creature)= :: give the creature the sense of | 3616 - =(proprioception! creature)= :: give the creature the sense of |
3639 proprioception. Returns a list of functions, one for each | 3617 proprioception. Returns a list of functions, one for each |
3640 joint, that when called during a running simulation will | 3618 joint, that when called during a running simulation will |
3641 report the =[headnig, pitch, roll]= of the joint. | 3619 report the =[heading, pitch, roll]= of the joint. |
3642 | 3620 |
3643 - =(movement! creature)= :: give the creature the power of movement. | 3621 - =(movement! creature)= :: give the creature the power of movement. |
3644 Creates a list of functions, one for each muscle, that when | 3622 Creates a list of functions, one for each muscle, that when |
3645 called with an integer, will set the recruitment of that | 3623 called with an integer, will set the recruitment of that |
3646 muscle to that integer, and will report the current power | 3624 muscle to that integer, and will report the current power |
3675 | 3653 |
3676 - =(mega-import-jme3)= :: for experimenting at the REPL. This | 3654 - =(mega-import-jme3)= :: for experimenting at the REPL. This |
3677 function will import all jMonkeyEngine3 classes for immediate | 3655 function will import all jMonkeyEngine3 classes for immediate |
3678 use. | 3656 use. |
3679 | 3657 |
3680 - =(display-dialated-time world timer)= :: Shows the time as it is | 3658 - =(display-dilated-time world timer)= :: Shows the time as it is |
3681 flowing in the simulation on a HUD display. | 3659 flowing in the simulation on a HUD display. |
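
As a closing illustration, here is a small REPL sketch that strings
together several of the functions documented above. The file name is
hypothetical, and the physics and world setup that a real simulation
needs are omitted:

#+begin_src clojure
;; Optional: pull in the jMonkeyEngine classes for REPL experiments.
(mega-import-jme3)

;; Hypothetical file name; substitute your own creature model.
(def model (load-blender-model "Models/creature/creature.blend"))

;; Attach senses and effectors. Each call returns a list of functions
;; that are polled (or, for muscles, driven) while the simulation runs.
(def creature-senses
  {:touch          (touch! model)
   :proprioception (proprioception! model)
   :muscles        (movement! model)})
#+end_src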
3682 | 3660 |
3683 | 3661 |
3684 | 3662 |