comparison thesis/cortex.org @ 547:5d89879fc894

couple hours worth of edits.
author Robert McIntyre <rlm@mit.edu>
date Mon, 28 Apr 2014 15:10:59 -0400
parents b2c66ea58c39
children 0b891e0dd809
546:f4770e3d30ae 547:5d89879fc894
41 [[./images/aurellem-gray.png]] 41 [[./images/aurellem-gray.png]]
42 42
43 43
44 * Empathy \& Embodiment: problem solving strategies 44 * Empathy \& Embodiment: problem solving strategies
45 45
46 By the end of this thesis, you will have seen a novel approach to 46 By the end of this thesis, you will have a novel approach to
47 interpreting video using embodiment and empathy. You will also see 47 representing and recognizing physical actions using embodiment and
48 one way to efficiently implement physical empathy for embodied 48 empathy. You will also see one way to efficiently implement physical
49 creatures. Finally, you will become familiar with =CORTEX=, a system 49 empathy for embodied creatures. Finally, you will become familiar
50 for designing and simulating creatures with rich senses, which I 50 with =CORTEX=, a system for designing and simulating creatures with
51 have designed as a library that you can use in your own research. 51 rich senses, which I have designed as a library that you can use in
52 Note that I /do not/ process video directly --- I start with 52 your own research. Note that I /do not/ process video directly --- I
53 knowledge of the positions of a creature's body parts and work from 53 start with knowledge of the positions of a creature's body parts and
54 there. 54 work from there.
55 55
56 This is the core vision of my thesis: That one of the important ways 56 This is the core vision of my thesis: That one of the important ways
57 in which we understand others is by imagining ourselves in their 57 in which we understand others is by imagining ourselves in their
58 position and empathically feeling experiences relative to our own 58 position and empathically feeling experiences relative to our own
59 bodies. By understanding events in terms of our own previous 59 bodies. By understanding events in terms of our own previous
79 the problem is that many computer vision systems focus on 79 the problem is that many computer vision systems focus on
80 pixel-level details or comparisons to example images (such as 80 pixel-level details or comparisons to example images (such as
81 \cite{volume-action-recognition}), but the 3D world is so variable 81 \cite{volume-action-recognition}), but the 3D world is so variable
82 that it is hard to describe the world in terms of possible images. 82 that it is hard to describe the world in terms of possible images.
83 83
84 In fact, the contents of scene may have much less to do with pixel 84 In fact, the contents of a scene may have much less to do with
85 probabilities than with recognizing various affordances: things you 85 pixel probabilities than with recognizing various affordances:
86 can move, objects you can grasp, spaces that can be filled. For 86 things you can move, objects you can grasp, spaces that can be
87 example, what processes might enable you to see the chair in figure 87 filled. For example, what processes might enable you to see the
88 \ref{hidden-chair}? 88 chair in figure \ref{hidden-chair}?
89 89
90 #+caption: The chair in this image is quite obvious to humans, but 90 #+caption: The chair in this image is quite obvious to humans, but
91 #+caption: it can't be found by any modern computer vision program. 91 #+caption: it can't be found by any modern computer vision program.
92 #+name: hidden-chair 92 #+name: hidden-chair
93 #+ATTR_LaTeX: :width 10cm 93 #+ATTR_LaTeX: :width 10cm
104 [[./images/wall-push.png]] 104 [[./images/wall-push.png]]
105 105
106 Each of these examples tells us something about what might be going 106 Each of these examples tells us something about what might be going
107 on in our minds as we easily solve these recognition problems: 107 on in our minds as we easily solve these recognition problems:
108 108
109 The hidden chair shows us that we are strongly triggered by cues 109 - The hidden chair shows us that we are strongly triggered by cues
110 relating to the position of human bodies, and that we can determine 110 relating to the position of human bodies, and that we can
111 the overall physical configuration of a human body even if much of 111 determine the overall physical configuration of a human body even
112 that body is occluded. 112 if much of that body is occluded.
113 113
114 The picture of the girl pushing against the wall tells us that we 114 - The picture of the girl pushing against the wall tells us that we
115 have common sense knowledge about the kinetics of our own bodies. 115 have common sense knowledge about the kinetics of our own bodies.
116 We know well how our muscles would have to work to maintain us in 116 We know well how our muscles would have to work to maintain us in
117 most positions, and we can easily project this self-knowledge to 117 most positions, and we can easily project this self-knowledge to
118 imagined positions triggered by images of the human body. 118 imagined positions triggered by images of the human body.
119 119
120 The cat tells us that imagination of some kind plays an important 120 - The cat tells us that imagination of some kind plays an important
121 role in understanding actions. The question is: Can we be more 121 role in understanding actions. The question is: Can we be more
122 precise about what sort of imagination is required to understand 122 precise about what sort of imagination is required to understand
123 these actions? 123 these actions?
124 124
125 ** A step forward: the sensorimotor-centered approach 125 ** A step forward: the sensorimotor-centered approach
126 126
127 In this thesis, I explore the idea that our knowledge of our own 127 In this thesis, I explore the idea that our knowledge of our own
128 bodies, combined with our own rich senses, enables us to recognize 128 bodies, combined with our own rich senses, enables us to recognize
133 imagine putting their face up against a stream of water and 133 imagine putting their face up against a stream of water and
134 sticking out their tongue. In that imagined world, they can feel 134 sticking out their tongue. In that imagined world, they can feel
135 the cool water hitting their tongue, and feel the water entering 135 the cool water hitting their tongue, and feel the water entering
136 their body, and are able to recognize that /feeling/ as drinking. 136 their body, and are able to recognize that /feeling/ as drinking.
137 So, the label of the action is not really in the pixels of the 137 So, the label of the action is not really in the pixels of the
138 image, but is found clearly in a simulation inspired by those 138 image, but is found clearly in a simulation / recollection inspired
139 pixels. An imaginative system, having been trained on drinking and 139 by those pixels. An imaginative system, having been trained on
140 non-drinking examples and learning that the most important 140 drinking and non-drinking examples and learning that the most
141 component of drinking is the feeling of water sliding down one's 141 important component of drinking is the feeling of water sliding
142 throat, would analyze a video of a cat drinking in the following 142 down one's throat, would analyze a video of a cat drinking in the
143 manner: 143 following manner:
144 144
145 1. Create a physical model of the video by putting a ``fuzzy'' 145 1. Create a physical model of the video by putting a ``fuzzy''
146 model of its own body in place of the cat. Possibly also create 146 model of its own body in place of the cat. Possibly also create
147 a simulation of the stream of water. 147 a simulation of the stream of water.
148 148
191 action. The power in this method lies in the fact that you describe 191 action. The power in this method lies in the fact that you describe
192 all actions from a body-centered viewpoint. You are less tied to 192 all actions from a body-centered viewpoint. You are less tied to
193 the particulars of any visual representation of the actions. If you 193 the particulars of any visual representation of the actions. If you
194 teach the system what ``running'' is, and you have a good enough 194 teach the system what ``running'' is, and you have a good enough
195 aligner, the system will from then on be able to recognize running 195 aligner, the system will from then on be able to recognize running
196 from any point of view, even strange points of view like above or 196 from any point of view -- even strange points of view like above or
197 underneath the runner. This is in contrast to action recognition 197 underneath the runner. This is in contrast to action recognition
198 schemes that try to identify actions using a non-embodied approach. 198 schemes that try to identify actions using a non-embodied approach.
199 If these systems learn about running as viewed from the side, they 199 If these systems learn about running as viewed from the side, they
200 will not automatically be able to recognize running from any other 200 will not automatically be able to recognize running from any other
201 viewpoint. 201 viewpoint.
202 202
203 Another powerful advantage is that using the language of multiple 203 Another powerful advantage is that using the language of multiple
204 body-centered rich senses to describe body-centered actions offers a 204 body-centered rich senses to describe body-centered actions offers
205 massive boost in descriptive capability. Consider how difficult it 205 a massive boost in descriptive capability. Consider how difficult
206 would be to compose a set of HOG filters to describe the action of 206 it would be to compose a set of HOG (Histogram of Oriented
207 a simple worm-creature ``curling'' so that its head touches its 207 Gradients) filters to describe the action of a simple worm-creature
208 tail, and then behold the simplicity of describing this action in a 208 ``curling'' so that its head touches its tail, and then behold the
209 language designed for the task (listing \ref{grand-circle-intro}): 209 simplicity of describing this action in a language designed for the
210 task (listing \ref{grand-circle-intro}):
210 211
211 #+caption: Body-centered actions are best expressed in a body-centered 212 #+caption: Body-centered actions are best expressed in a body-centered
212 #+caption: language. This code detects when the worm has curled into a 213 #+caption: language. This code detects when the worm has curled into a
213 #+caption: full circle. Imagine how you would replicate this functionality 214 #+caption: full circle. Imagine how you would replicate this functionality
214 #+caption: using low-level pixel features such as HOG filters! 215 #+caption: using low-level pixel features such as HOG filters!
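
(The listing itself falls outside this hunk. To give a flavor of what such a body-centered predicate looks like, here is an illustrative Clojure sketch over a hypothetical experience format; the map keys and thresholds are assumptions for the example, not the thesis's verbatim grand-circle code.)

#+BEGIN_SRC clojure
;; Sketch only: assume each experience frame is a map holding
;; :proprioception (a seq of [flex extension strength] triples, one
;; per joint) and :touch (a map from body segment to contact force).
;; These names are illustrative, not CORTEX's actual API.
(defn curled?
  "True when every joint in the frame is flexed well past neutral."
  [frame]
  (every? (fn [[flex _ext _strength]] (< 0.5 flex))
          (:proprioception frame)))

(defn grand-circle?
  "True when the worm is curled and head and tail feel contact."
  [frame]
  (and (curled? frame)
       (pos? (get-in frame [:touch :head] 0))
       (pos? (get-in frame [:touch :tail] 0))))
#+END_SRC
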
270 extent trigger previous experience keyed to hearing or touch. 271 extent trigger previous experience keyed to hearing or touch.
271 Segments of previous experiences gained from play are stitched 272 Segments of previous experiences gained from play are stitched
272 together to form a coherent and complete sensory portrait of 273 together to form a coherent and complete sensory portrait of
273 the scene. 274 the scene.
274 275
275 - Recognition :: With the scene described in terms of 276 - Recognition :: With the scene described in terms of remembered
276 remembered first person sensory events, the creature can now 277 first person sensory events, the creature can now run its
277 run its action-identified programs (such as the one in listing 278 action-definition programs (such as the one in listing
278 \ref{grand-circle-intro} on this synthesized sensory data, 279 \ref{grand-circle-intro}) on this synthesized sensory data,
279 just as it would if it were actually experiencing the scene 280 just as it would if it were actually experiencing the scene
280 first-hand. If previous experience has been accurately 281 first-hand. If previous experience has been accurately
281 retrieved, and if it is analogous enough to the scene, then 282 retrieved, and if it is analogous enough to the scene, then
282 the creature will correctly identify the action in the scene. 283 the creature will correctly identify the action in the scene.
283 284
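
A minimal sketch of this recognition step, assuming the action programs are ordinary predicates kept in a map from label to function (the registry shape is an assumption for illustration, not =EMPATH='s actual structure):

#+BEGIN_SRC clojure
;; Run every action predicate over the synthesized sensory experience
;; and keep the labels whose predicates fire.
(defn recognize
  [action-predicates synthesized-experience]
  (keep (fn [[label pred?]]
          (when (pred? synthesized-experience) label))
        action-predicates))

;; usage with toy predicates over a toy experience:
(recognize {:curled  (fn [e] (:curled? e))
            :resting (fn [e] (:flat? e))}
           {:curled? true :flat? false})
;; => (:curled)
#+END_SRC
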
325 I built =CORTEX= to be a general AI research platform for doing 326 I built =CORTEX= to be a general AI research platform for doing
326 experiments involving multiple rich senses and a wide variety and 327 experiments involving multiple rich senses and a wide variety and
327 number of creatures. I intend it to be useful as a library for many 328 number of creatures. I intend it to be useful as a library for many
328 more projects than just this thesis. =CORTEX= was necessary to meet 329 more projects than just this thesis. =CORTEX= was necessary to meet
329 a need among AI researchers at CSAIL and beyond, which is that 330 a need among AI researchers at CSAIL and beyond, which is that
330 people often will invent neat ideas that are best expressed in the 331 people often will invent wonderful ideas that are best expressed in
331 language of creatures and senses, but in order to explore those 332 the language of creatures and senses, but in order to explore those
332 ideas they must first build a platform in which they can create 333 ideas they must first build a platform in which they can create
333 simulated creatures with rich senses! There are many ideas that 334 simulated creatures with rich senses! There are many ideas that
334 would be simple to execute (such as =EMPATH= or 335 would be simple to execute (such as =EMPATH= or Larson's
335 \cite{larson-symbols}), but attached to them is the multi-month 336 self-organizing maps (\cite{larson-symbols})), but attached to them
336 effort to make a good creature simulator. Often, that initial 337 is the multi-month effort to make a good creature simulator. Often,
337 investment of time proves to be too much, and the project must make 338 that initial investment of time proves to be too much, and the
338 do with a lesser environment. 339 project must make do with a lesser environment or be abandoned
340 entirely.
339 341
340 =CORTEX= is well suited as an environment for embodied AI research 342 =CORTEX= is well suited as an environment for embodied AI research
341 for three reasons: 343 for three reasons:
342 344
343 - You can create new creatures using Blender (\cite{blender}), a 345 - You can design new creatures using Blender (\cite{blender}), a
344 popular 3D modeling program. Each sense can be specified using 346 popular 3D modeling program. Each sense can be specified using
345 special blender nodes with biologically inspired parameters. You 347 special blender nodes with biologically inspired parameters. You
346 need not write any code to create a creature, and can use a wide 348 need not write any code to create a creature, and can use a wide
347 library of pre-existing blender models as a base for your own 349 library of pre-existing blender models as a base for your own
348 creatures. 350 creatures.
350 - =CORTEX= implements a wide variety of senses: touch, 352 - =CORTEX= implements a wide variety of senses: touch,
351 proprioception, vision, hearing, and muscle tension. Complicated 353 proprioception, vision, hearing, and muscle tension. Complicated
352 senses like touch and vision involve multiple sensory elements 354 senses like touch and vision involve multiple sensory elements
353 embedded in a 2D surface. You have complete control over the 355 embedded in a 2D surface. You have complete control over the
354 distribution of these sensor elements through the use of simple 356 distribution of these sensor elements through the use of simple
355 png image files. In particular, =CORTEX= implements more 357 png image files. =CORTEX= implements more comprehensive hearing
356 comprehensive hearing than any other creature simulation system 358 than any other creature simulation system available.
357 available.
358 359
359 - =CORTEX= supports any number of creatures and any number of 360 - =CORTEX= supports any number of creatures and any number of
360 senses. Time in =CORTEX= dilates so that the simulated creatures 361 senses. Time in =CORTEX= dilates so that the simulated creatures
361 always perceive a perfectly smooth flow of time, regardless of 362 always perceive a perfectly smooth flow of time, regardless of
362 the actual computational load. 363 the actual computational load.
423 over the history and implementation details presented here, is 424 over the history and implementation details presented here, is
424 provided in an appendix at the end of this thesis.) 425 provided in an appendix at the end of this thesis.)
425 426
426 Throughout this project, I intended for =CORTEX= to be flexible and 427 Throughout this project, I intended for =CORTEX= to be flexible and
427 extensible enough to be useful for other researchers who want to 428 extensible enough to be useful for other researchers who want to
428 test out ideas of their own. To this end, wherever I have had to make 429 test ideas of their own. To this end, wherever I have had to make
429 architectural choices about =CORTEX=, I have chosen to give as much 430 architectural choices about =CORTEX=, I have chosen to give as much
430 freedom to the user as possible, so that =CORTEX= may be used for 431 freedom to the user as possible, so that =CORTEX= may be used for
431 things I have not foreseen. 432 things I have not foreseen.
432 433
433 ** Building in simulation versus reality 434 ** Building in simulation versus reality
435 use a computer-simulated environment in the first place! The world 436 use a computer-simulated environment in the first place! The world
436 is a vast and rich place, and for now simulations are a very poor 437 is a vast and rich place, and for now simulations are a very poor
437 reflection of its complexity. It may be that there is a significant 438 reflection of its complexity. It may be that there is a significant
438 qualitative difference between dealing with senses in the real 439 qualitative difference between dealing with senses in the real
439 world and dealing with pale facsimiles of them in a simulation 440 world and dealing with pale facsimiles of them in a simulation
440 \cite{brooks-representation}. What are the advantages and 441 (\cite{brooks-representation}). What are the advantages and
441 disadvantages of a simulation vs. reality? 442 disadvantages of a simulation vs. reality?
442 443
443 *** Simulation 444 *** Simulation
444 445
445 The advantages of virtual reality are that when everything is a 446 The advantages of virtual reality are that when everything is a
446 simulation, experiments in that simulation are absolutely 447 simulation, experiments in that simulation are absolutely
447 reproducible. It's also easier to change the character and world 448 reproducible. It's also easier to change the creature and
448 to explore new situations and different sensory combinations. 449 environment to explore new situations and different sensory
450 combinations.
449 451
450 If the world is to be simulated on a computer, then not only do 452 If the world is to be simulated on a computer, then not only do
451 you have to worry about whether the character's senses are rich 453 you have to worry about whether the creature's senses are rich
452 enough to learn from the world, but whether the world itself is 454 enough to learn from the world, but whether the world itself is
453 rendered with enough detail and realism to give enough working 455 rendered with enough detail and realism to give enough working
454 material to the character's senses. To name just a few 456 material to the creature's senses. To name just a few
455 difficulties facing modern physics simulators: destructibility of 457 difficulties facing modern physics simulators: destructibility of
456 the environment, simulation of water/other fluids, large areas, 458 the environment, simulation of water/other fluids, large areas,
457 nonrigid bodies, lots of objects, smoke. I don't know of any 459 nonrigid bodies, lots of objects, smoke. I don't know of any
458 computer simulation that would allow a character to take a rock 460 computer simulation that would allow a creature to take a rock
459 and grind it into fine dust, then use that dust to make a clay 461 and grind it into fine dust, then use that dust to make a clay
460 sculpture, at least not without spending years calculating the 462 sculpture, at least not without spending years calculating the
461 interactions of every single small grain of dust. Maybe a 463 interactions of every single small grain of dust. Maybe a
462 simulated world with today's limitations doesn't provide enough 464 simulated world with today's limitations doesn't provide enough
463 richness for real intelligence to evolve. 465 richness for real intelligence to evolve.
469 loose in the real world. This has the advantage of eliminating 471 loose in the real world. This has the advantage of eliminating
470 concerns about simulating the world at the expense of increasing 472 concerns about simulating the world at the expense of increasing
471 the complexity of implementing the senses. Instead of just 473 the complexity of implementing the senses. Instead of just
472 grabbing the current rendered frame for processing, you have to 474 grabbing the current rendered frame for processing, you have to
473 use an actual camera with real lenses and interact with photons to 475 use an actual camera with real lenses and interact with photons to
474 get an image. It is much harder to change the character, which is 476 get an image. It is much harder to change the creature, which is
475 now partly a physical robot of some sort, since doing so involves 477 now partly a physical robot of some sort, since doing so involves
476 changing things around in the real world instead of modifying 478 changing things around in the real world instead of modifying
477 lines of code. While the real world is very rich and definitely 479 lines of code. While the real world is very rich and definitely
478 provides enough stimulation for intelligence to develop as 480 provides enough stimulation for intelligence to develop (as
479 evidenced by our own existence, it is also uncontrollable in the 481 evidenced by our own existence), it is also uncontrollable in the
480 sense that a particular situation cannot be recreated perfectly or 482 sense that a particular situation cannot be recreated perfectly or
481 saved for later use. It is harder to conduct science because it is 483 saved for later use. It is harder to conduct science because it is
482 harder to repeat an experiment. The worst thing about using the 484 harder to repeat an experiment. The worst thing about using the
483 real world instead of a simulation is the matter of time. Instead 485 real world instead of a simulation is the matter of time. Instead
484 of simulated time you get the constant and unstoppable flow of 486 of simulated time you get the constant and unstoppable flow of
485 real time. This severely limits the sorts of software you can use 487 real time. This severely limits the sorts of software you can use
486 to program an AI, because all sense inputs must be handled in real 488 to program an AI, because all sense inputs must be handled in real
487 time. Complicated ideas may have to be implemented in hardware or 489 time. Complicated ideas may have to be implemented in hardware or
488 may simply be impossible given the current speed of our 490 may simply be impossible given the current speed of our
489 processors. Contrast this with a simulation, in which the flow of 491 processors. Contrast this with a simulation, in which the flow of
490 time in the simulated world can be slowed down to accommodate the 492 time in the simulated world can be slowed down to accommodate the
491 limitations of the character's programming. In terms of cost, 493 limitations of the creature's programming. In terms of cost, doing
492 doing everything in software is far cheaper than building custom 494 everything in software is far cheaper than building custom
493 real-time hardware. All you need is a laptop and some patience. 495 real-time hardware. All you need is a laptop and some patience.
494 496
495 ** Simulated time enables rapid prototyping \& simple programs 497 ** Simulated time enables rapid prototyping \& simple programs
496 498
497 I envision =CORTEX= being used to support rapid prototyping and 499 I envision =CORTEX= being used to support rapid prototyping and
503 The need for real time processing only increases if multiple senses 505 The need for real time processing only increases if multiple senses
504 are involved. In the extreme case, even simple algorithms will have 506 are involved. In the extreme case, even simple algorithms will have
505 to be accelerated by ASIC chips or FPGAs, turning what would 507 to be accelerated by ASIC chips or FPGAs, turning what would
506 otherwise be a few lines of code and a 10x speed penalty into a 508 otherwise be a few lines of code and a 10x speed penalty into a
507 multi-month ordeal. For this reason, =CORTEX= supports 509 multi-month ordeal. For this reason, =CORTEX= supports
508 /time-dilation/, which scales back the framerate of the 510 /time-dilation/, which scales back the framerate of the simulation
509 simulation in proportion to the amount of processing each frame requires. 511 in proportion to the amount of processing each frame requires. From the
510 From the perspective of the creatures inside the simulation, time 512 perspective of the creatures inside the simulation, time always
511 always appears to flow at a constant rate, regardless of how 513 appears to flow at a constant rate, regardless of how complicated
512 complicated the environment becomes or how many creatures are in 514 the environment becomes or how many creatures are in the
513 the simulation. The cost is that =CORTEX= can sometimes run slower 515 simulation. The cost is that =CORTEX= can sometimes run slower than
514 than real time. This can also be an advantage, however --- 516 real time. Time dilation works both ways, however --- simulations
515 simulations of very simple creatures in =CORTEX= generally run at 517 of very simple creatures in =CORTEX= generally run at 40x real-time
516 40x on my machine! 518 on my machine!
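
To make the bookkeeping concrete, here is the idea reduced to pure Clojure; this is a sketch only, since the real mechanism lives inside jMonkeyEngine3's update loop:

#+BEGIN_SRC clojure
;; Time dilation in miniature: simulated time advances by a fixed
;; step per frame, no matter how long each frame's sense-processing
;; takes in wall-clock time.
(defn dilated-run
  [frames dt process!]
  (reduce (fn [sim-time frame]
            (process! frame)   ; arbitrarily slow processing
            (+ sim-time dt))   ; the simulated clock still ticks evenly
          0.0
          frames))

;; Three frames of 1/60 s each with 100 ms of real processing per
;; frame: about 0.3 s of real time pass, but only 0.05 s of simulated
;; time, and the creature never notices the difference.
;; (dilated-run (range 3) (/ 1.0 60) (fn [_] (Thread/sleep 100)))
#+END_SRC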
517 519
518 ** All sense organs are two-dimensional surfaces 520 ** All sense organs are two-dimensional surfaces
519 521
520 If =CORTEX= is to support a wide variety of senses, it would help 522 If =CORTEX= is to support a wide variety of senses, it would help
521 to have a better understanding of what a ``sense'' actually is! 523 to have a better understanding of what a sense actually is! While
522 While vision, touch, and hearing all seem like they are quite 524 vision, touch, and hearing all seem like they are quite different
523 different things, I was surprised to learn during the course of 525 things, I was surprised to learn during the course of this thesis
524 this thesis that they (and all physical senses) can be expressed as 526 that they (and all physical senses) can be expressed as exactly the
525 exactly the same mathematical object due to a dimensional argument! 527 same mathematical object!
526 528
527 Human beings are three-dimensional objects, and the nerves that 529 Human beings are three-dimensional objects, and the nerves that
528 transmit data from our various sense organs to our brain are 530 transmit data from our various sense organs to our brain are
529 essentially one-dimensional. This leaves up to two dimensions in 531 essentially one-dimensional. This leaves up to two dimensions in
530 which our sensory information may flow. For example, imagine your 532 which our sensory information may flow. For example, imagine your
543 complicated surface of the skin onto a two-dimensional image. 545 complicated surface of the skin onto a two-dimensional image.
544 546
545 Most human senses consist of many discrete sensors of various 547 Most human senses consist of many discrete sensors of various
546 properties distributed along a surface at various densities. For 548 properties distributed along a surface at various densities. For
547 skin, it is Pacinian corpuscles, Meissner's corpuscles, Merkel's 549 skin, it is Pacinian corpuscles, Meissner's corpuscles, Merkel's
548 disks, and Ruffini's endings \cite{textbook901}, which detect 550 disks, and Ruffini's endings (\cite{textbook901}), which detect
549 pressure and vibration of various intensities. For ears, it is the 551 pressure and vibration of various intensities. For ears, it is the
550 stereocilia distributed along the basilar membrane inside the 552 stereocilia distributed along the basilar membrane inside the
551 cochlea; each one is sensitive to a slightly different frequency of 553 cochlea; each one is sensitive to a slightly different frequency of
552 sound. For eyes, it is rods and cones distributed along the surface 554 sound. For eyes, it is rods and cones distributed along the surface
553 of the retina. In each case, we can describe the sense with a 555 of the retina. In each case, we can describe the sense with a
554 surface and a distribution of sensors along that surface. 556 surface and a distribution of sensors along that surface.
555 557
556 In fact, almost every human sense can be effectively described in 558 In fact, almost every human sense can be effectively described in
557 terms of a surface containing embedded sensors. If the sense had 559 terms of a surface containing embedded sensors. If the sense had
558 any more dimensions, then there wouldn't be enough room in the 560 any more dimensions, then there wouldn't be enough room in the
559 spinal chord to transmit the information! 561 spinal cord to transmit the information!
560 562
561 Therefore, =CORTEX= must support the ability to create objects and 563 Therefore, =CORTEX= must support the ability to create objects and
562 then be able to ``paint'' points along their surfaces to describe 564 then be able to ``paint'' points along their surfaces to describe
563 each sense. 565 each sense.
564 566
565 Fortunately this idea is already a well known computer graphics 567 Fortunately this idea is already a well known computer graphics
566 technique called /UV-mapping/. The three-dimensional surface of a 568 technique called /UV-mapping/. In UV-mapping, the three-dimensional
567 model is cut and smooshed until it fits on a two-dimensional 569 surface of a model is cut and smooshed until it fits on a
568 image. You paint whatever you want on that image, and when the 570 two-dimensional image. You paint whatever you want on that image,
569 three-dimensional shape is rendered in a game the smooshing and 571 and when the three-dimensional shape is rendered in a game the
570 cutting is reversed and the image appears on the three-dimensional 572 smooshing and cutting is reversed and the image appears on the
571 object. 573 three-dimensional object.
572 574
573 To make a sense, interpret the UV-image as describing the 575 To make a sense, interpret the UV-image as describing the
574 distribution of that sense's sensors. To get different types of 576 distribution of that sense's sensors. To get different types of
575 sensors, you can either use a different color for each type of 577 sensors, you can either use a different color for each type of
576 sensor, or use multiple UV-maps, each labeled with that sensor 578 sensor, or use multiple UV-maps, each labeled with that sensor
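
(As an aside, here is a sketch of how such a sensor image might be read, assuming pure white pixels mark sensor locations; =CORTEX='s own loader differs in its details.)

#+BEGIN_SRC clojure
;; Recover sensor UV coordinates from a painted image, using
;; javax.imageio from the standard JVM libraries.
(import '[javax.imageio ImageIO]
        '[java.io File])

(defn sensor-coordinates
  "Return [x y] for every pure-white pixel in the UV image."
  [png-path]
  (let [img (ImageIO/read (File. png-path))]
    (for [x (range (.getWidth img))
          y (range (.getHeight img))
          :when (= 0xFFFFFF (bit-and 0xFFFFFF (.getRGB img x y)))]
      [x y])))
#+END_SRC
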
608 tools that can be co-opted to serve as touch, proprioception, and 610 tools that can be co-opted to serve as touch, proprioception, and
609 muscles. Since some games support split screen views, a good video 611 muscles. Since some games support split screen views, a good video
610 game engine will allow you to efficiently create multiple cameras 612 game engine will allow you to efficiently create multiple cameras
611 in the simulated world that can be used as eyes. Video game systems 613 in the simulated world that can be used as eyes. Video game systems
612 offer integrated asset management for things like textures and 614 offer integrated asset management for things like textures and
613 creatures models, providing an avenue for defining creatures. They 615 creature models, providing an avenue for defining creatures. They
614 also understand UV-mapping, since this technique is used to apply a 616 also understand UV-mapping, since this technique is used to apply a
615 texture to a model. Finally, because video game engines support a 617 texture to a model. Finally, because video game engines support a
616 large number of users, as long as =CORTEX= doesn't stray too far 618 large number of developers, as long as =CORTEX= doesn't stray too
617 from the base system, other researchers can turn to this community 619 far from the base system, other researchers can turn to this
618 for help when doing their research. 620 community for help when doing their research.
619 621
620 ** =CORTEX= is based on jMonkeyEngine3 622 ** =CORTEX= is based on jMonkeyEngine3
621 623
622 While preparing to build =CORTEX= I studied several video game 624 While preparing to build =CORTEX= I studied several video game
623 engines to see which would best serve as a base. The top contenders 625 engines to see which would best serve as a base. The top contenders
624 were: 626 were:
625 627
626 - [[http://www.idsoftware.com][Quake II]]/[[http://www.bytonic.de/html/jake2.html][Jake2]] :: The Quake II engine was designed by ID 628 - [[http://www.idsoftware.com][Quake II]]/[[http://www.bytonic.de/html/jake2.html][Jake2]] :: The Quake II engine was designed by ID software
627 software in 1997. All the source code was released by ID 629 in 1997. All the source code was released by ID software into
628 software into the Public Domain several years ago, and as a 630 the Public Domain several years ago, and as a result it has
629 result it has been ported to many different languages. This 631 been ported to many different languages. This engine was
630 engine was famous for its advanced use of realistic shading 632 famous for its advanced use of realistic shading and it had
631 and had decent and fast physics simulation. The main advantage 633 decent and fast physics simulation. The main advantage of the
632 of the Quake II engine is its simplicity, but I ultimately 634 Quake II engine is its simplicity, but I ultimately rejected
633 rejected it because the engine is too tied to the concept of a 635 it because the engine is too tied to the concept of a
634 first-person shooter game. One of the problems I had was that 636 first-person shooter game. One of the problems I had was that
635 there did not seem to be any easy way to attach multiple 637 there did not seem to be any easy way to attach multiple
636 cameras to a single character. There are also several physics 638 cameras to a single character. There are also several physics
637 clipping issues that are corrected in a way that only applies 639 clipping issues that are corrected in a way that only applies
638 to the main character and do not apply to arbitrary objects. 640 to the main character and do not apply to arbitrary objects.
668 creatures. If possible, it would be nice to leverage work that has 670 creatures. If possible, it would be nice to leverage work that has
669 already been done by the community of 3D modelers, or at least 671 already been done by the community of 3D modelers, or at least
670 enable people who are talented at modeling but not programming to 672 enable people who are talented at modeling but not programming to
671 design =CORTEX= creatures. 673 design =CORTEX= creatures.
672 674
673 Therefore, I use Blender, a free 3D modeling program, as the main 675 Therefore I use Blender, a free 3D modeling program, as the main
674 way to create creatures in =CORTEX=. However, the creatures modeled 676 way to create creatures in =CORTEX=. However, the creatures modeled
675 in Blender must also be simple to simulate in jMonkeyEngine3's game 677 in Blender must also be simple to simulate in jMonkeyEngine3's game
676 engine, and must also be easy to rig with =CORTEX='s senses. I 678 engine, and must also be easy to rig with =CORTEX='s senses. I
677 accomplish this with extensive use of Blender's ``empty nodes.'' 679 accomplish this with extensive use of Blender's ``empty nodes.''
678 680
679 Empty nodes have no mass, physical presence, or appearance, but 681 Empty nodes have no mass, physical presence, or appearance, but
680 they can hold metadata and have names. I use a tree structure of 682 they can hold metadata and have names. I use a tree structure of
681 empty nodes to specify senses in the following manner: 683 empty nodes to specify senses in the following manner:
682 684
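
(The specification itself lies outside this hunk. As an illustrative stand-in, here is how sense metadata might be harvested from such a tree, with plain Clojure maps in place of Blender/jMonkeyEngine3 node objects and the node names invented for the example:)

#+BEGIN_SRC clojure
;; Walk a tree of named nodes, collecting every node that belongs to
;; a given sense.
(defn sense-nodes
  [sense-name node]
  (concat (when (= sense-name (:name node)) [node])
          (mapcat #(sense-nodes sense-name %) (:children node))))

;; usage:
(sense-nodes "eyes"
  {:name "creature"
   :children [{:name "head"
               :children [{:name "eyes"
                           :meta {:resolution [64 64]}}]}]})
;; => ({:name "eyes", :meta {:resolution [64 64]}})
#+END_SRC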
697 699
698 ** Bodies are composed of segments connected by joints 700 ** Bodies are composed of segments connected by joints
699 701
700 Blender is a general purpose animation tool, which has been used in 702 Blender is a general purpose animation tool, which has been used in
701 the past to create high quality movies such as Sintel 703 the past to create high quality movies such as Sintel
702 \cite{blender}. Though Blender can model and render even complicated 704 (\cite{blender}). Though Blender can model and render even
703 things like water, it is crucial to keep models that are meant to 705 complicated things like water, it is crucial to keep models that
704 be simulated as creatures simple. =Bullet=, which =CORTEX= uses 706 are meant to be simulated as creatures simple. =Bullet=, which
705 through jMonkeyEngine3, is a rigid-body physics system. This offers 707 =CORTEX= uses through jMonkeyEngine3, is a rigid-body physics
706 a compromise between the expressiveness of a game level and the 708 system. This offers a compromise between the expressiveness of a
707 speed at which it can be simulated, and it means that creatures 709 game level and the speed at which it can be simulated, and it means
708 should be naturally expressed as rigid components held together by 710 that creatures should be naturally expressed as rigid components
709 joint constraints. 711 held together by joint constraints.
710 712
711 But humans are more like a squishy bag wrapped around some hard 713 But humans are more like a squishy bag wrapped around some hard
712 bones which define the overall shape. When we move, our skin bends 714 bones which define the overall shape. When we move, our skin bends
713 and stretches to accommodate the new positions of our bones. 715 and stretches to accommodate the new positions of our bones.
714 716
727 it about the true extent of its body. Simulating the skin as a 729 it about the true extent of its body. Simulating the skin as a
728 physical object requires some way to continuously update the 730 physical object requires some way to continuously update the
729 physical model of the skin along with the movement of the bones, 731 physical model of the skin along with the movement of the bones,
730 which is unacceptably slow compared to rigid body simulation. 732 which is unacceptably slow compared to rigid body simulation.
731 733
732 Therefore, instead of using the human-like ``deformable bag of 734 Therefore, instead of using the human-like ``bony meatbag''
733 bones'' approach, I decided to base my body plans on multiple solid 735 approach, I decided to base my body plans on multiple solid objects
734 objects that are connected by joints, inspired by the robot =EVE= 736 that are connected by joints, inspired by the robot =EVE= from the
735 from the movie WALL-E. 737 movie WALL-E.
736 738
737 #+caption: =EVE= from the movie WALL-E. This body plan turns 739 #+caption: =EVE= from the movie WALL-E. This body plan turns
738 #+caption: out to be much better suited to my purposes than a more 740 #+caption: out to be much better suited to my purposes than a more
739 #+caption: human-like one. 741 #+caption: human-like one.
740 #+ATTR_LaTeX: :width 10cm 742 #+ATTR_LaTeX: :width 10cm
741 [[./images/Eve.jpg]] 743 [[./images/Eve.jpg]]
742 744
743 =EVE='s body is composed of several rigid components that are held 745 =EVE='s body is composed of several rigid components that are held
744 together by invisible joint constraints. This is what I mean by 746 together by invisible joint constraints. This is what I mean by
745 ``eve-like''. The main reason that I use eve-style bodies is for 747 /eve-like/. The main reason that I use eve-like bodies is for
746 efficiency, and so that there will be correspondence between the 748 simulation efficiency, and so that there will be correspondence
747 AI's senses and the physical presence of its body. Each individual 749 between the AI's senses and the physical presence of its body. Each
748 section is simulated by a separate rigid body that corresponds 750 individual section is simulated by a separate rigid body that
749 exactly with its visual representation and does not change. 751 corresponds exactly with its visual representation and does not
750 Sections are connected by invisible joints that are well supported 752 change. Sections are connected by invisible joints that are well
751 in jMonkeyEngine3. Bullet, the physics backend for jMonkeyEngine3, 753 supported in jMonkeyEngine3. Bullet, the physics backend for
752 can efficiently simulate hundreds of rigid bodies connected by 754 jMonkeyEngine3, can efficiently simulate hundreds of rigid bodies
753 joints. Just because sections are rigid does not mean they have to 755 connected by joints. Just because sections are rigid does not mean
754 stay as one piece forever; they can be dynamically replaced with 756 they have to stay as one piece forever; they can be dynamically
755 multiple sections to simulate splitting in two. This could be used 757 replaced with multiple sections to simulate splitting in two. This
756 to simulate retractable claws or =EVE='s hands, which are able to 758 could be used to simulate retractable claws or =EVE='s hands, which
757 coalesce into one object in the movie. 759 are able to coalesce into one object in the movie.
758 760
759 *** Solidifying/Connecting a body 761 *** Solidifying/Connecting a body
760 762
761 =CORTEX= creates a creature in two steps: first, it traverses the 763 =CORTEX= creates a creature in two steps: first, it traverses the
762 nodes in the blender file and creates physical representations for 764 nodes in the blender file and creates physical representations for
2441 2443
2442 - Empathy :: my empathy program leaves many areas for 2444 - Empathy :: my empathy program leaves many areas for
2443 improvement, among which are using vision to infer 2445 improvement, among which are using vision to infer
2444 proprioception and looking up sensory experience with imagined 2446 proprioception and looking up sensory experience with imagined
2445 vision, touch, and sound. 2447 vision, touch, and sound.
2446 - Evolution :: Karl Sims created a rich environment for 2448 - Evolution :: Karl Sims created a rich environment for simulating
2447 simulating the evolution of creatures on a connection 2449 the evolution of creatures on a Connection Machine
2448 machine. Today, this can be redone and expanded with =CORTEX= 2450 (\cite{sims-evolving-creatures}). Today, this can be redone
2449 on an ordinary computer. 2451 and expanded with =CORTEX= on an ordinary computer.
2450 - Exotic senses :: =CORTEX= enables many fascinating senses that are 2452 - Exotic senses :: =CORTEX= enables many fascinating senses that are
2451 not possible to build in the real world. For example, 2453 not possible to build in the real world. For example,
2452 telekinesis is an interesting avenue to explore. You can also 2454 telekinesis is an interesting avenue to explore. You can also
2453 make a ``semantic'' sense which looks up metadata tags on 2455 make a ``semantic'' sense which looks up metadata tags on
2454 objects in the environment; the metadata tags might contain 2456 objects in the environment; the metadata tags might contain
2455 other sensory information. 2457 other sensory information.
2456 - Imagination via subworlds :: this would involve a creature with 2458 - Imagination via subworlds :: this would involve a creature with
2457 an effector which creates an entire new sub-simulation where 2459 an effector which creates an entire new sub-simulation where
2458 the creature has direct control over placement/creation of 2460 the creature has direct control over placement/creation of
2459 objects via simulated telekinesis. The creature observes this 2461 objects via simulated telekinesis. The creature observes this
2460 sub-world through it's normal senses and uses its observations 2462 sub-world through its normal senses and uses its observations
2461 to make predictions about its top level world. 2463 to make predictions about its top level world.
2462 - Simulated prescience :: step the simulation forward a few ticks, 2464 - Simulated prescience :: step the simulation forward a few ticks,
2463 gather sensory data, then supply this data to the creature as 2465 gather sensory data, then supply this data to the creature as
2464 one of its actual senses. The cost of prescience is slowing 2466 one of its actual senses. The cost of prescience is slowing
2465 the simulation down by a factor proportional to however far 2467 the simulation down by a factor proportional to however far
2468 fight each other? 2470 fight each other?
2469 - Swarm creatures :: Program a group of creatures that cooperate 2471 - Swarm creatures :: Program a group of creatures that cooperate
2470 with each other. Because the creatures would be simulated, you 2472 with each other. Because the creatures would be simulated, you
2471 could investigate computationally complex rules of behavior 2473 could investigate computationally complex rules of behavior
2472 which still, from the group's point of view, would happen in 2474 which still, from the group's point of view, would happen in
2473 ``real time''. Interactions could be as simple as cellular 2475 real time. Interactions could be as simple as cellular
2474 organisms communicating via flashing lights, or as complex as 2476 organisms communicating via flashing lights, or as complex as
2475 humanoids completing social tasks, etc. 2477 humanoids completing social tasks, etc.
2476 - =HACKER= for writing muscle-control programs :: Presented with 2478 - =HACKER= for writing muscle-control programs :: Presented with a
2477 low-level muscle control/ sense API, generate higher level 2479 low-level muscle control / sense API, generate higher level
2478 programs for accomplishing various stated goals. Example goals 2480 programs for accomplishing various stated goals. Example goals
2479 might be "extend all your fingers" or "move your hand into the 2481 might be "extend all your fingers" or "move your hand into the
2480 area with blue light" or "decrease the angle of this joint". 2482 area with blue light" or "decrease the angle of this joint".
2481 It would be like Sussman's HACKER, except it would operate 2483 It would be like Sussman's HACKER, except it would operate
2482 with much more data in a more realistic world. Start off with 2484 with much more data in a more realistic world. Start off with
2483 "calisthenics" to develop subroutines over the motor control 2485 "calisthenics" to develop subroutines over the motor control
2484 API. This would be the "spinal cord" of a more intelligent 2486 API. The low-level programming code might be a Turing machine
2485 creature. The low-level programming code might be a Turing 2487 that could develop programs to iterate over a "tape" where
2486 machine that could develop programs to iterate over a "tape" 2488 each entry in the tape could control recruitment of the fibers
2487 where each entry in the tape could control recruitment of the 2489 in a muscle.
2488 fibers in a muscle. 2490 - Sense fusion :: There is much work to be done on sense
2489 - Sense fusion :: There is much work to be done on sense
2490 integration -- building up a coherent picture of the world and 2491 integration -- building up a coherent picture of the world and
2491 the things in it with =CORTEX= as a base, you can explore 2492 the things in it. With =CORTEX= as a base, you can explore
2492 concepts like self-organizing maps or cross modal clustering 2493 concepts like self-organizing maps or cross modal clustering
2493 in ways that have never before been tried. 2494 in ways that have never before been tried.
2494 - Inverse kinematics :: experiments in sense-guided motor control 2495 - Inverse kinematics :: experiments in sense-guided motor control
2495 are easy given =CORTEX='s support -- you can get right to the 2496 are easy given =CORTEX='s support -- you can get right to the
2496 hard control problems without worrying about physics or 2497 hard control problems without worrying about physics or
2759 have terms that consider the color of a person's skin or whether 2760 have terms that consider the color of a person's skin or whether
2760 they are male or female; instead, it gets right to the meat of what 2761 they are male or female; instead, it gets right to the meat of what
2761 jumping actually /is/. 2762 jumping actually /is/.
2762 2763
2763 Of course, the action predicates are not directly applicable to 2764 Of course, the action predicates are not directly applicable to
2764 video data which lacks the advanced sensory information which they 2765 video data, which lacks the advanced sensory information which they
2765 require! 2766 require!
2766 2767
2767 The trick now is to make the action predicates work even when the 2768 The trick now is to make the action predicates work even when the
2768 sensory data on which they depend is absent. If I can do that, then 2769 sensory data on which they depend is absent. If I can do that, then
2769 I will have gained much. 2770 I will have gained much.
2856 #+BEGIN_EXAMPLE 2857 #+BEGIN_EXAMPLE
2857 [ flat, flat, flat, flat, flat, flat, lift-head ] 2858 [ flat, flat, flat, flat, flat, flat, lift-head ]
2858 #+END_EXAMPLE 2859 #+END_EXAMPLE
2859 2860
2860 The worm's previous experience of lying on the ground and lifting 2861 The worm's previous experience of lying on the ground and lifting
2861 its head generates possible interpretations for each frame: 2862 its head generates possible interpretations for each frame (the
2863 numbers are experience-indices):
2862 2864
2863 #+BEGIN_EXAMPLE 2865 #+BEGIN_EXAMPLE
2864 [ flat, flat, flat, flat, flat, flat, flat, lift-head ] 2866 [ flat, flat, flat, flat, flat, flat, flat, lift-head ]
2865 1 1 1 1 1 1 1 4 2867 1 1 1 1 1 1 1 4
2866 2 2 2 2 2 2 2 2868 2 2 2 2 2 2 2
2876 [ flat, flat, flat, flat, flat, flat, flat, lift-head ] 2878 [ flat, flat, flat, flat, flat, flat, flat, lift-head ]
2877 6 7 8 9 1 2 3 4 2879 6 7 8 9 1 2 3 4
2878 #+END_EXAMPLE 2880 #+END_EXAMPLE
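
The selection rule this example illustrates, preferring the longest run of consecutive experience indices, can be sketched directly; this is a simplification for illustration, not the thesis's actual retrieval code:

#+BEGIN_SRC clojure
;; Each frame is the set of experience indices consistent with it;
;; greedily extend the longest run of consecutive indices.
(defn run-length
  "How many frames from position i fit experiences start, start+1, ...?"
  [frames i start]
  (count (take-while (fn [[frame idx]] (contains? frame idx))
                     (map vector (drop i frames) (iterate inc start)))))

(defn longest-thread
  [frames]
  (loop [i 0, thread []]
    (if (>= i (count frames))
      thread
      (let [best (apply max-key #(run-length frames i %) (nth frames i))
            len  (run-length frames i best)]
        (recur (+ i len) (into thread (range best (+ best len))))))))

;; (longest-thread [#{1 6} #{2 7} #{3 8} #{4 9}])
;; => [1 2 3 4] or [6 7 8 9]; both are maximal four-frame chains
#+END_SRC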
2879 2881
2880 The new path through \Phi-space is synthesized from two actual 2882 The new path through \Phi-space is synthesized from two actual
2881 paths that the creature actually experiences, the "1-2-3-4" chain 2883 paths that the creature has experienced: the "1-2-3-4" chain and
2882 and the "6-7-8-9" chain. The "1-2-3-4" chain is necessary because 2884 the "6-7-8-9" chain. The "1-2-3-4" chain is necessary because it
2883 it ends with the worm lifting its head. It originated from a short 2885 ends with the worm lifting its head. It originated from a short
2884 training session where the worm rested on the floor for a brief 2886 training session where the worm rested on the floor for a brief
2885 while and then raised its head. The "6-7-8-9" chain is part of a 2887 while and then raised its head. The "6-7-8-9" chain is part of a
2886 longer chain of inactivity where the worm simply rested on the 2888 longer chain of inactivity where the worm simply rested on the
2887 floor without moving. It is preferred over a "1-2-3" chain (which 2889 floor without moving. It is preferred over a "1-2-3" chain (which
2888 also describes inactivity) because it is longer. The main ideas 2890 also describes inactivity) because it is longer. The main ideas
3798 - =(display-dilated-time world timer)= :: Shows the time as it is 3800 - =(display-dilated-time world timer)= :: Shows the time as it is
3799 flowing in the simulation on a HUD display. 3801 flowing in the simulation on a HUD display.
3800 3802
3801 3803
3802 3804
3805 TODO -- add a paper about detecting biological motion from only a few dots.