comparison thesis/cortex.org @ 547:5d89879fc894
a couple hours' worth of edits.
author | Robert McIntyre <rlm@mit.edu> |
date | Mon, 28 Apr 2014 15:10:59 -0400 |
parents | b2c66ea58c39 |
children | 0b891e0dd809 |
comparison
546:f4770e3d30ae | 547:5d89879fc894 |
---|---|
41 [[./images/aurellem-gray.png]] | 41 [[./images/aurellem-gray.png]] |
42 | 42 |
43 | 43 |
44 * Empathy \& Embodiment: problem solving strategies | 44 * Empathy \& Embodiment: problem solving strategies |
45 | 45 |
46 By the end of this thesis, you will have seen a novel approach to | 46 By the end of this thesis, you will have a novel approach to |
47 interpreting video using embodiment and empathy. You will also see | 47 representing and recognizing physical actions using embodiment and |
48 one way to efficiently implement physical empathy for embodied | 48 empathy. You will also see one way to efficiently implement physical |
49 creatures. Finally, you will become familiar with =CORTEX=, a system | 49 empathy for embodied creatures. Finally, you will become familiar |
50 for designing and simulating creatures with rich senses, which I | 50 with =CORTEX=, a system for designing and simulating creatures with |
51 have designed as a library that you can use in your own research. | 51 rich senses, which I have designed as a library that you can use in |
52 Note that I /do not/ process video directly --- I start with | 52 your own research. Note that I /do not/ process video directly --- I |
53 knowledge of the positions of a creature's body parts and work from | 53 start with knowledge of the positions of a creature's body parts and |
54 there. | 54 work from there. |
55 | 55 |
56 This is the core vision of my thesis: That one of the important ways | 56 This is the core vision of my thesis: That one of the important ways |
57 in which we understand others is by imagining ourselves in their | 57 in which we understand others is by imagining ourselves in their |
58 position and empathically feeling experiences relative to our own | 58 position and empathically feeling experiences relative to our own |
59 bodies. By understanding events in terms of our own previous | 59 bodies. By understanding events in terms of our own previous |
79 the problem is that many computer vision systems focus on | 79 the problem is that many computer vision systems focus on |
80 pixel-level details or comparisons to example images (such as | 80 pixel-level details or comparisons to example images (such as |
81 \cite{volume-action-recognition}), but the 3D world is so variable | 81 \cite{volume-action-recognition}), but the 3D world is so variable |
82 that it is hard to describe the world in terms of possible images. | 82 that it is hard to describe the world in terms of possible images. |
83 | 83 |
84 In fact, the contents of scene may have much less to do with pixel | 84 In fact, the contents of a scene may have much less to do with |
85 probabilities than with recognizing various affordances: things you | 85 pixel probabilities than with recognizing various affordances: |
86 can move, objects you can grasp, spaces that can be filled. For | 86 things you can move, objects you can grasp, spaces that can be |
87 example, what processes might enable you to see the chair in figure | 87 filled. For example, what processes might enable you to see the |
88 \ref{hidden-chair}? | 88 chair in figure \ref{hidden-chair}? |
89 | 89 |
90 #+caption: The chair in this image is quite obvious to humans, but | 90 #+caption: The chair in this image is quite obvious to humans, but |
91 #+caption: it can't be found by any modern computer vision program. | 91 #+caption: it can't be found by any modern computer vision program. |
92 #+name: hidden-chair | 92 #+name: hidden-chair |
93 #+ATTR_LaTeX: :width 10cm | 93 #+ATTR_LaTeX: :width 10cm |
104 [[./images/wall-push.png]] | 104 [[./images/wall-push.png]] |
105 | 105 |
106 Each of these examples tells us something about what might be going | 106 Each of these examples tells us something about what might be going |
107 on in our minds as we easily solve these recognition problems: | 107 on in our minds as we easily solve these recognition problems: |
108 | 108 |
109 The hidden chair shows us that we are strongly triggered by cues | 109 - The hidden chair shows us that we are strongly triggered by cues |
110 relating to the position of human bodies, and that we can determine | 110 relating to the position of human bodies, and that we can |
111 the overall physical configuration of a human body even if much of | 111 determine the overall physical configuration of a human body even |
112 that body is occluded. | 112 if much of that body is occluded. |
113 | 113 |
114 The picture of the girl pushing against the wall tells us that we | 114 - The picture of the girl pushing against the wall tells us that we |
115 have common sense knowledge about the kinetics of our own bodies. | 115 have common sense knowledge about the kinetics of our own bodies. |
116 We know well how our muscles would have to work to maintain us in | 116 We know well how our muscles would have to work to maintain us in |
117 most positions, and we can easily project this self-knowledge to | 117 most positions, and we can easily project this self-knowledge to |
118 imagined positions triggered by images of the human body. | 118 imagined positions triggered by images of the human body. |
119 | 119 |
120 The cat tells us that imagination of some kind plays an important | 120 - The cat tells us that imagination of some kind plays an important |
121 role in understanding actions. The question is: Can we be more | 121 role in understanding actions. The question is: Can we be more |
122 precise about what sort of imagination is required to understand | 122 precise about what sort of imagination is required to understand |
123 these actions? | 123 these actions? |
124 | 124 |
125 ** A step forward: the sensorimotor-centered approach | 125 ** A step forward: the sensorimotor-centered approach |
126 | 126 |
127 In this thesis, I explore the idea that our knowledge of our own | 127 In this thesis, I explore the idea that our knowledge of our own |
128 bodies, combined with our own rich senses, enables us to recognize | 128 bodies, combined with our own rich senses, enables us to recognize |
133 imagine putting their face up against a stream of water and | 133 imagine putting their face up against a stream of water and |
134 sticking out their tongue. In that imagined world, they can feel | 134 sticking out their tongue. In that imagined world, they can feel |
135 the cool water hitting their tongue, and feel the water entering | 135 the cool water hitting their tongue, and feel the water entering |
136 their body, and are able to recognize that /feeling/ as drinking. | 136 their body, and are able to recognize that /feeling/ as drinking. |
137 So, the label of the action is not really in the pixels of the | 137 So, the label of the action is not really in the pixels of the |
138 image, but is found clearly in a simulation inspired by those | 138 image, but is found clearly in a simulation / recollection inspired |
139 pixels. An imaginative system, having been trained on drinking and | 139 by those pixels. An imaginative system, having been trained on |
140 non-drinking examples and learning that the most important | 140 drinking and non-drinking examples and learning that the most |
141 component of drinking is the feeling of water sliding down one's | 141 important component of drinking is the feeling of water sliding |
142 throat, would analyze a video of a cat drinking in the following | 142 down one's throat, would analyze a video of a cat drinking in the |
143 manner: | 143 following manner: |
144 | 144 |
145 1. Create a physical model of the video by putting a ``fuzzy'' | 145 1. Create a physical model of the video by putting a ``fuzzy'' |
146 model of its own body in place of the cat. Possibly also create | 146 model of its own body in place of the cat. Possibly also create |
147 a simulation of the stream of water. | 147 a simulation of the stream of water. |
148 | 148 |
191 action. The power in this method lies in the fact that you describe | 191 action. The power in this method lies in the fact that you describe |
192 all actions from a body-centered viewpoint. You are less tied to | 192 all actions from a body-centered viewpoint. You are less tied to |
193 the particulars of any visual representation of the actions. If you | 193 the particulars of any visual representation of the actions. If you |
194 teach the system what ``running'' is, and you have a good enough | 194 teach the system what ``running'' is, and you have a good enough |
195 aligner, the system will from then on be able to recognize running | 195 aligner, the system will from then on be able to recognize running |
196 from any point of view, even strange points of view like above or | 196 from any point of view -- even strange points of view like above or |
197 underneath the runner. This is in contrast to action recognition | 197 underneath the runner. This is in contrast to action recognition |
198 schemes that try to identify actions using a non-embodied approach. | 198 schemes that try to identify actions using a non-embodied approach. |
199 If these systems learn about running as viewed from the side, they | 199 If these systems learn about running as viewed from the side, they |
200 will not automatically be able to recognize running from any other | 200 will not automatically be able to recognize running from any other |
201 viewpoint. | 201 viewpoint. |
202 | 202 |
203 Another powerful advantage is that using the language of multiple | 203 Another powerful advantage is that using the language of multiple |
204 body-centered rich senses to describe body-centered actions offers a | 204 body-centered rich senses to describe body-centered actions offers |
205 massive boost in descriptive capability. Consider how difficult it | 205 a massive boost in descriptive capability. Consider how difficult |
206 would be to compose a set of HOG filters to describe the action of | 206 it would be to compose a set of HOG (Histogram of Oriented |
207 a simple worm-creature ``curling'' so that its head touches its | 207 Gradients) filters to describe the action of a simple worm-creature |
208 tail, and then behold the simplicity of describing this action in a | 208 ``curling'' so that its head touches its tail, and then behold the |
209 language designed for the task (listing \ref{grand-circle-intro}): | 209 simplicity of describing this action in a language designed for the |
210 task (listing \ref{grand-circle-intro}): | |
210 | 211 |
211 #+caption: Body-centered actions are best expressed in a body-centered | 212 #+caption: Body-centered actions are best expressed in a body-centered |
212 #+caption: language. This code detects when the worm has curled into a | 213 #+caption: language. This code detects when the worm has curled into a |
213 #+caption: full circle. Imagine how you would replicate this functionality | 214 #+caption: full circle. Imagine how you would replicate this functionality |
214 #+caption: using low-level pixel features such as HOG filters! | 215 #+caption: using low-level pixel features such as HOG filters! |
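(The listing itself is outside this excerpt. A minimal sketch of the kind of predicate the caption describes -- assuming a vector of =experiences= and helper functions like =curled?= and =touching?= that are defined elsewhere -- might look like this:)

#+BEGIN_SRC clojure
;; Hypothetical sketch, not the thesis's actual listing: the worm has
;; curled into a full circle when it is curled and both its head and
;; tail segments report contact in the most recent experience.
(defn grand-circle?
  [experiences]
  (and (curled? experiences)                  ; assumed helper predicate
       (let [touch (:touch (peek experiences))]
         (and (touching? (touch :head))       ; assumed touch lookup
              (touching? (touch :tail))))))
#+END_SRC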
270 extent trigger previous experience keyed to hearing or touch. | 271 extent trigger previous experience keyed to hearing or touch. |
271 Segments of previous experiences gained from play are stitched | 272 Segments of previous experiences gained from play are stitched |
272 together to form a coherent and complete sensory portrait of | 273 together to form a coherent and complete sensory portrait of |
273 the scene. | 274 the scene. |
274 | 275 |
275 - Recognition :: With the scene described in terms of | 276 - Recognition :: With the scene described in terms of remembered |
276 remembered first person sensory events, the creature can now | 277 first person sensory events, the creature can now run its |
277 run its action-identified programs (such as the one in listing | 278 action-definition programs (such as the one in listing |
278 \ref{grand-circle-intro} on this synthesized sensory data, | 279 \ref{grand-circle-intro}) on this synthesized sensory data, |
279 just as it would if it were actually experiencing the scene | 280 just as it would if it were actually experiencing the scene |
280 first-hand. If previous experience has been accurately | 281 first-hand. If previous experience has been accurately |
281 retrieved, and if it is analogous enough to the scene, then | 282 retrieved, and if it is analogous enough to the scene, then |
282 the creature will correctly identify the action in the scene. | 283 the creature will correctly identify the action in the scene. |
283 | 284 |
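A minimal sketch of this recognition step (assumed names, not =CORTEX='s actual API): each action is simply a predicate over a sequence of first-person experiences, so recognition amounts to running every known predicate over the synthesized sensory data.

#+BEGIN_SRC clojure
;; Hypothetical sketch: `experiences` is the synthesized first-person
;; sensory data; `action-predicates` maps action names to predicates.
(defn recognize-actions
  [experiences action-predicates]
  (for [[action-name action?] action-predicates
        :when (action? experiences)]
    action-name))

;; e.g. (recognize-actions synthesized-experiences
;;                         {:curling curled? :resting resting?})
#+END_SRC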
325 I built =CORTEX= to be a general AI research platform for doing | 326 I built =CORTEX= to be a general AI research platform for doing |
326 experiments involving multiple rich senses and a wide variety and | 327 experiments involving multiple rich senses and a wide variety and |
327 number of creatures. I intend it to be useful as a library for many | 328 number of creatures. I intend it to be useful as a library for many |
328 more projects than just this thesis. =CORTEX= was necessary to meet | 329 more projects than just this thesis. =CORTEX= was necessary to meet |
329 a need among AI researchers at CSAIL and beyond, which is that | 330 a need among AI researchers at CSAIL and beyond, which is that |
330 people often will invent neat ideas that are best expressed in the | 331 people often will invent wonderful ideas that are best expressed in |
331 language of creatures and senses, but in order to explore those | 332 the language of creatures and senses, but in order to explore those |
332 ideas they must first build a platform in which they can create | 333 ideas they must first build a platform in which they can create |
333 simulated creatures with rich senses! There are many ideas that | 334 simulated creatures with rich senses! There are many ideas that |
334 would be simple to execute (such as =EMPATH= or | 335 would be simple to execute (such as =EMPATH= or Larson's |
335 \cite{larson-symbols}), but attached to them is the multi-month | 336 self-organizing maps (\cite{larson-symbols})), but attached to them |
336 effort to make a good creature simulator. Often, that initial | 337 is the multi-month effort to make a good creature simulator. Often, |
337 investment of time proves to be too much, and the project must make | 338 that initial investment of time proves to be too much, and the |
338 do with a lesser environment. | 339 project must make do with a lesser environment or be abandoned |
340 entirely. | |
339 | 341 |
340 =CORTEX= is well suited as an environment for embodied AI research | 342 =CORTEX= is well suited as an environment for embodied AI research |
341 for three reasons: | 343 for three reasons: |
342 | 344 |
343 - You can create new creatures using Blender (\cite{blender}), a | 345 - You can design new creatures using Blender (\cite{blender}), a |
344 popular 3D modeling program. Each sense can be specified using | 346 popular 3D modeling program. Each sense can be specified using |
345 special Blender nodes with biologically inspired parameters. You | 347 special Blender nodes with biologically inspired parameters. You |
346 need not write any code to create a creature, and can use a wide | 348 need not write any code to create a creature, and can use a wide |
347 library of pre-existing Blender models as a base for your own | 349 library of pre-existing Blender models as a base for your own |
348 creatures. | 350 creatures. |
350 - =CORTEX= implements a wide variety of senses: touch, | 352 - =CORTEX= implements a wide variety of senses: touch, |
351 proprioception, vision, hearing, and muscle tension. Complicated | 353 proprioception, vision, hearing, and muscle tension. Complicated |
352 senses like touch and vision involve multiple sensory elements | 354 senses like touch and vision involve multiple sensory elements |
353 embedded in a 2D surface. You have complete control over the | 355 embedded in a 2D surface. You have complete control over the |
354 distribution of these sensor elements through the use of simple | 356 distribution of these sensor elements through the use of simple |
355 PNG image files. In particular, =CORTEX= implements more | 357 PNG image files. =CORTEX= implements more comprehensive hearing |
356 comprehensive hearing than any other creature simulation system | 358 than any other creature simulation system available. |
357 available. | |
358 | 359 |
359 - =CORTEX= supports any number of creatures and any number of | 360 - =CORTEX= supports any number of creatures and any number of |
360 senses. Time in =CORTEX= dilates so that the simulated creatures | 361 senses. Time in =CORTEX= dilates so that the simulated creatures |
361 always perceive a perfectly smooth flow of time, regardless of | 362 always perceive a perfectly smooth flow of time, regardless of |
362 the actual computational load. | 363 the actual computational load. |
423 over the history and implementation details presented here, is | 424 over the history and implementation details presented here, is |
424 provided in an appendix at the end of this thesis.) | 425 provided in an appendix at the end of this thesis.) |
425 | 426 |
426 Throughout this project, I intended for =CORTEX= to be flexible and | 427 Throughout this project, I intended for =CORTEX= to be flexible and |
427 extensible enough to be useful for other researchers who want to | 428 extensible enough to be useful for other researchers who want to |
428 test out ideas of their own. To this end, wherever I have had to make | 429 test ideas of their own. To this end, wherever I have had to make |
429 architectural choices about =CORTEX=, I have chosen to give as much | 430 architectural choices about =CORTEX=, I have chosen to give as much |
430 freedom to the user as possible, so that =CORTEX= may be used for | 431 freedom to the user as possible, so that =CORTEX= may be used for |
431 things I have not foreseen. | 432 things I have not foreseen. |
432 | 433 |
433 ** Building in simulation versus reality | 434 ** Building in simulation versus reality |
435 use a computer-simulated environment in the first place! The world | 436 use a computer-simulated environment in the first place! The world |
436 is a vast and rich place, and for now simulations are a very poor | 437 is a vast and rich place, and for now simulations are a very poor |
437 reflection of its complexity. It may be that there is a significant | 438 reflection of its complexity. It may be that there is a significant |
438 qualitative difference between dealing with senses in the real | 439 qualitative difference between dealing with senses in the real |
439 world and dealing with pale facsimiles of them in a simulation | 440 world and dealing with pale facsimiles of them in a simulation |
440 \cite{brooks-representation}. What are the advantages and | 441 (\cite{brooks-representation}). What are the advantages and |
441 disadvantages of a simulation vs. reality? | 442 disadvantages of a simulation vs. reality? |
442 | 443 |
443 *** Simulation | 444 *** Simulation |
444 | 445 |
445 The advantages of virtual reality are that when everything is a | 446 The advantages of virtual reality are that when everything is a |
446 simulation, experiments in that simulation are absolutely | 447 simulation, experiments in that simulation are absolutely |
447 reproducible. It's also easier to change the character and world | 448 reproducible. It's also easier to change the creature and |
448 to explore new situations and different sensory combinations. | 449 environment to explore new situations and different sensory |
450 combinations. | |
449 | 451 |
450 If the world is to be simulated on a computer, then not only do | 452 If the world is to be simulated on a computer, then not only do |
451 you have to worry about whether the character's senses are rich | 453 you have to worry about whether the creature's senses are rich |
452 enough to learn from the world, but whether the world itself is | 454 enough to learn from the world, but whether the world itself is |
453 rendered with enough detail and realism to give enough working | 455 rendered with enough detail and realism to give enough working |
454 material to the character's senses. To name just a few | 456 material to the creature's senses. To name just a few |
455 difficulties facing modern physics simulators: destructibility of | 457 difficulties facing modern physics simulators: destructibility of |
456 the environment, simulation of water/other fluids, large areas, | 458 the environment, simulation of water/other fluids, large areas, |
457 nonrigid bodies, lots of objects, smoke. I don't know of any | 459 nonrigid bodies, lots of objects, smoke. I don't know of any |
458 computer simulation that would allow a character to take a rock | 460 computer simulation that would allow a creature to take a rock |
459 and grind it into fine dust, then use that dust to make a clay | 461 and grind it into fine dust, then use that dust to make a clay |
460 sculpture, at least not without spending years calculating the | 462 sculpture, at least not without spending years calculating the |
461 interactions of every single small grain of dust. Maybe a | 463 interactions of every single small grain of dust. Maybe a |
462 simulated world with today's limitations doesn't provide enough | 464 simulated world with today's limitations doesn't provide enough |
463 richness for real intelligence to evolve. | 465 richness for real intelligence to evolve. |
469 loose in the real world. This has the advantage of eliminating | 471 loose in the real world. This has the advantage of eliminating |
470 concerns about simulating the world at the expense of increasing | 472 concerns about simulating the world at the expense of increasing |
471 the complexity of implementing the senses. Instead of just | 473 the complexity of implementing the senses. Instead of just |
472 grabbing the current rendered frame for processing, you have to | 474 grabbing the current rendered frame for processing, you have to |
473 use an actual camera with real lenses and interact with photons to | 475 use an actual camera with real lenses and interact with photons to |
474 get an image. It is much harder to change the character, which is | 476 get an image. It is much harder to change the creature, which is |
475 now partly a physical robot of some sort, since doing so involves | 477 now partly a physical robot of some sort, since doing so involves |
476 changing things around in the real world instead of modifying | 478 changing things around in the real world instead of modifying |
477 lines of code. While the real world is very rich and definitely | 479 lines of code. While the real world is very rich and definitely |
478 provides enough stimulation for intelligence to develop as | 480 provides enough stimulation for intelligence to develop (as |
479 evidenced by our own existence, it is also uncontrollable in the | 481 evidenced by our own existence), it is also uncontrollable in the |
480 sense that a particular situation cannot be recreated perfectly or | 482 sense that a particular situation cannot be recreated perfectly or |
481 saved for later use. It is harder to conduct science because it is | 483 saved for later use. It is harder to conduct Science because it is |
482 harder to repeat an experiment. The worst thing about using the | 484 harder to repeat an experiment. The worst thing about using the |
483 real world instead of a simulation is the matter of time. Instead | 485 real world instead of a simulation is the matter of time. Instead |
484 of simulated time you get the constant and unstoppable flow of | 486 of simulated time you get the constant and unstoppable flow of |
485 real time. This severely limits the sorts of software you can use | 487 real time. This severely limits the sorts of software you can use |
486 to program an AI, because all sense inputs must be handled in real | 488 to program an AI, because all sense inputs must be handled in real |
487 time. Complicated ideas may have to be implemented in hardware or | 489 time. Complicated ideas may have to be implemented in hardware or |
488 may simply be impossible given the current speed of our | 490 may simply be impossible given the current speed of our |
489 processors. Contrast this with a simulation, in which the flow of | 491 processors. Contrast this with a simulation, in which the flow of |
490 time in the simulated world can be slowed down to accommodate the | 492 time in the simulated world can be slowed down to accommodate the |
491 limitations of the character's programming. In terms of cost, | 493 limitations of the creature's programming. In terms of cost, doing |
492 doing everything in software is far cheaper than building custom | 494 everything in software is far cheaper than building custom |
493 real-time hardware. All you need is a laptop and some patience. | 495 real-time hardware. All you need is a laptop and some patience. |
494 | 496 |
495 ** Simulated time enables rapid prototyping \& simple programs | 497 ** Simulated time enables rapid prototyping \& simple programs |
496 | 498 |
497 I envision =CORTEX= being used to support rapid prototyping and | 499 I envision =CORTEX= being used to support rapid prototyping and |
503 The need for real time processing only increases if multiple senses | 505 The need for real time processing only increases if multiple senses |
504 are involved. In the extreme case, even simple algorithms will have | 506 are involved. In the extreme case, even simple algorithms will have |
505 to be accelerated by ASIC chips or FPGAs, turning what would | 507 to be accelerated by ASIC chips or FPGAs, turning what would |
506 otherwise be a few lines of code and a 10x speed penalty into a | 508 otherwise be a few lines of code and a 10x speed penalty into a |
507 multi-month ordeal. For this reason, =CORTEX= supports | 509 multi-month ordeal. For this reason, =CORTEX= supports |
508 /time-dilation/, which scales back the framerate of the | 510 /time-dilation/, which scales back the framerate of the simulation |
509 simulation in proportion to the amount of processing each frame. | 511 in proportion to the amount of processing each frame. From the |
510 From the perspective of the creatures inside the simulation, time | 512 perspective of the creatures inside the simulation, time always |
511 always appears to flow at a constant rate, regardless of how | 513 appears to flow at a constant rate, regardless of how complicated |
512 complicated the environment becomes or how many creatures are in | 514 the environment becomes or how many creatures are in the |
513 the simulation. The cost is that =CORTEX= can sometimes run slower | 515 simulation. The cost is that =CORTEX= can sometimes run slower than |
514 than real time. This can also be an advantage, however --- | 516 real time. Time dilation works both ways, however --- simulations |
515 simulations of very simple creatures in =CORTEX= generally run at | 517 of very simple creatures in =CORTEX= generally run at 40x real-time |
516 40x on my machine! | 518 on my machine! |
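The mechanism is simple. A minimal sketch (assumed names such as =step-physics=, not =CORTEX='s actual implementation): every frame advances the physics by the same fixed simulated timestep, no matter how much wall-clock time the AI spends processing that frame.

#+BEGIN_SRC clojure
;; Hypothetical sketch of time dilation: because each iteration
;; advances the simulation by exactly `simulated-timestep` seconds,
;; creatures perceive an even flow of time regardless of how long
;; `process-frame!` takes in wall-clock time.
(def simulated-timestep (/ 1.0 60))     ; simulated seconds per frame

(defn run-simulation
  [world process-frame!]
  (loop [world world]
    (process-frame! world)              ; may take arbitrarily long
    (recur (step-physics world simulated-timestep))))
#+END_SRC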
517 | 519 |
518 ** All sense organs are two-dimensional surfaces | 520 ** All sense organs are two-dimensional surfaces |
519 | 521 |
520 If =CORTEX= is to support a wide variety of senses, it would help | 522 If =CORTEX= is to support a wide variety of senses, it would help |
521 to have a better understanding of what a ``sense'' actually is! | 523 to have a better understanding of what a sense actually is! While |
522 While vision, touch, and hearing all seem like they are quite | 524 vision, touch, and hearing all seem like they are quite different |
523 different things, I was surprised to learn during the course of | 525 things, I was surprised to learn during the course of this thesis |
524 this thesis that they (and all physical senses) can be expressed as | 526 that they (and all physical senses) can be expressed as exactly the |
525 exactly the same mathematical object due to a dimensional argument! | 527 same mathematical object! |
526 | 528 |
527 Human beings are three-dimensional objects, and the nerves that | 529 Human beings are three-dimensional objects, and the nerves that |
528 transmit data from our various sense organs to our brain are | 530 transmit data from our various sense organs to our brain are |
529 essentially one-dimensional. This leaves up to two dimensions in | 531 essentially one-dimensional. This leaves up to two dimensions in |
530 which our sensory information may flow. For example, imagine your | 532 which our sensory information may flow. For example, imagine your |
543 complicated surface of the skin onto a two-dimensional image. | 545 complicated surface of the skin onto a two-dimensional image. |
544 | 546 |
545 Most human senses consist of many discrete sensors of various | 547 Most human senses consist of many discrete sensors of various |
546 properties distributed along a surface at various densities. For | 548 properties distributed along a surface at various densities. For |
547 skin, it is Pacinian corpuscles, Meissner's corpuscles, Merkel's | 549 skin, it is Pacinian corpuscles, Meissner's corpuscles, Merkel's |
548 disks, and Ruffini's endings \cite{textbook901}, which detect | 550 disks, and Ruffini's endings (\cite{textbook901}), which detect |
549 pressure and vibration of various intensities. For ears, it is the | 551 pressure and vibration of various intensities. For ears, it is the |
550 stereocilia distributed along the basilar membrane inside the | 552 stereocilia distributed along the basilar membrane inside the |
551 cochlea; each one is sensitive to a slightly different frequency of | 553 cochlea; each one is sensitive to a slightly different frequency of |
552 sound. For eyes, it is rods and cones distributed along the surface | 554 sound. For eyes, it is rods and cones distributed along the surface |
553 of the retina. In each case, we can describe the sense with a | 555 of the retina. In each case, we can describe the sense with a |
554 surface and a distribution of sensors along that surface. | 556 surface and a distribution of sensors along that surface. |
555 | 557 |
556 In fact, almost every human sense can be effectively described in | 558 In fact, almost every human sense can be effectively described in |
557 terms of a surface containing embedded sensors. If the sense had | 559 terms of a surface containing embedded sensors. If the sense had |
558 any more dimensions, then there wouldn't be enough room in the | 560 any more dimensions, then there wouldn't be enough room in the |
559 spinal chord to transmit the information! | 561 spinal cord to transmit the information! |
560 | 562 |
561 Therefore, =CORTEX= must support the ability to create objects and | 563 Therefore, =CORTEX= must support the ability to create objects and |
562 then be able to ``paint'' points along their surfaces to describe | 564 then be able to ``paint'' points along their surfaces to describe |
563 each sense. | 565 each sense. |
564 | 566 |
565 Fortunately this idea is already a well known computer graphics | 567 Fortunately this idea is already a well known computer graphics |
566 technique called /UV-mapping/. The three-dimensional surface of a | 568 technique called /UV-mapping/. In UV-mapping, the three-dimensional |
567 model is cut and smooshed until it fits on a two-dimensional | 569 surface of a model is cut and smooshed until it fits on a |
568 image. You paint whatever you want on that image, and when the | 570 two-dimensional image. You paint whatever you want on that image, |
569 three-dimensional shape is rendered in a game the smooshing and | 571 and when the three-dimensional shape is rendered in a game the |
570 cutting is reversed and the image appears on the three-dimensional | 572 smooshing and cutting is reversed and the image appears on the |
571 object. | 573 three-dimensional object. |
572 | 574 |
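A minimal sketch of how a UV-map can encode a sense (an assumed convention, not =CORTEX='s exact file format): treat every white pixel in the UV image as a sensor, and collect its UV-coordinates.

#+BEGIN_SRC clojure
(import '(javax.imageio ImageIO)
        '(java.io File))

;; Hypothetical sketch: return the [x y] UV-coordinates of every
;; white pixel in a PNG file. Opaque white in packed ARGB is
;; 0xFFFFFFFF, which is -1 as a signed int.
(defn sensor-coordinates
  [png-path]
  (let [image (ImageIO/read (File. png-path))]
    (for [x (range (.getWidth image))
          y (range (.getHeight image))
          :when (= -1 (.getRGB image x y))]
      [x y])))
#+END_SRC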
573 To make a sense, interpret the UV-image as describing the | 575 To make a sense, interpret the UV-image as describing the |
574 distribution of that sense's sensors. To get different types of | 576 distribution of that sense's sensors. To get different types of |
575 sensors, you can either use a different color for each type of | 577 sensors, you can either use a different color for each type of |
576 sensor, or use multiple UV-maps, each labeled with that sensor | 578 sensor, or use multiple UV-maps, each labeled with that sensor |
608 tools that can be co-opted to serve as touch, proprioception, and | 610 tools that can be co-opted to serve as touch, proprioception, and |
609 muscles. Since some games support split screen views, a good video | 611 muscles. Since some games support split screen views, a good video |
610 game engine will allow you to efficiently create multiple cameras | 612 game engine will allow you to efficiently create multiple cameras |
611 in the simulated world that can be used as eyes. Video game systems | 613 in the simulated world that can be used as eyes. Video game systems |
612 offer integrated asset management for things like textures and | 614 offer integrated asset management for things like textures and |
613 creatures models, providing an avenue for defining creatures. They | 615 creature models, providing an avenue for defining creatures. They |
614 also understand UV-mapping, since this technique is used to apply a | 616 also understand UV-mapping, since this technique is used to apply a |
615 texture to a model. Finally, because video game engines support a | 617 texture to a model. Finally, because video game engines support a |
616 large number of users, as long as =CORTEX= doesn't stray too far | 618 large number of developers, as long as =CORTEX= doesn't stray too |
617 from the base system, other researchers can turn to this community | 619 far from the base system, other researchers can turn to this |
618 for help when doing their research. | 620 community for help when doing their research. |
619 | 621 |
620 ** =CORTEX= is based on jMonkeyEngine3 | 622 ** =CORTEX= is based on jMonkeyEngine3 |
621 | 623 |
622 While preparing to build =CORTEX= I studied several video game | 624 While preparing to build =CORTEX= I studied several video game |
623 engines to see which would best serve as a base. The top contenders | 625 engines to see which would best serve as a base. The top contenders |
624 were: | 626 were: |
625 | 627 |
626 - [[http://www.idsoftware.com][Quake II]]/[[http://www.bytonic.de/html/jake2.html][Jake2]] :: The Quake II engine was designed by id | 628 - [[http://www.idsoftware.com][Quake II]]/[[http://www.bytonic.de/html/jake2.html][Jake2]] :: The Quake II engine was designed by id Software |
627 Software in 1997. All the source code was released by id | 629 in 1997. All the source code was released by id Software into |
628 Software into the Public Domain several years ago, and as a | 630 the Public Domain several years ago, and as a result it has |
629 result it has been ported to many different languages. This | 631 been ported to many different languages. This engine was |
630 engine was famous for its advanced use of realistic shading | 632 famous for its advanced use of realistic shading and it had |
631 and had decent and fast physics simulation. The main advantage | 633 decent and fast physics simulation. The main advantage of the |
632 of the Quake II engine is its simplicity, but I ultimately | 634 Quake II engine is its simplicity, but I ultimately rejected |
633 rejected it because the engine is too tied to the concept of a | 635 it because the engine is too tied to the concept of a |
634 first-person shooter game. One of the problems I had was that | 636 first-person shooter game. One of the problems I had was that |
635 there does not seem to be any easy way to attach multiple | 637 there does not seem to be any easy way to attach multiple |
636 cameras to a single character. There are also several physics | 638 cameras to a single character. There are also several physics |
637 clipping issues that are corrected in a way that only applies | 639 clipping issues that are corrected in a way that only applies |
638 to the main character and do not apply to arbitrary objects. | 640 to the main character and do not apply to arbitrary objects. |
668 creatures. If possible, it would be nice to leverage work that has | 670 creatures. If possible, it would be nice to leverage work that has |
669 already been done by the community of 3D modelers, or at least | 671 already been done by the community of 3D modelers, or at least |
670 enable people who are talented at modeling but not programming to | 672 enable people who are talented at modeling but not programming to |
671 design =CORTEX= creatures. | 673 design =CORTEX= creatures. |
672 | 674 |
673 Therefore, I use Blender, a free 3D modeling program, as the main | 675 Therefore I use Blender, a free 3D modeling program, as the main |
674 way to create creatures in =CORTEX=. However, the creatures modeled | 676 way to create creatures in =CORTEX=. However, the creatures modeled |
675 in Blender must also be simple to simulate in jMonkeyEngine3's game | 677 in Blender must also be simple to simulate in jMonkeyEngine3's game |
676 engine, and must also be easy to rig with =CORTEX='s senses. I | 678 engine, and must also be easy to rig with =CORTEX='s senses. I |
677 accomplish this with extensive use of Blender's ``empty nodes.'' | 679 accomplish this with extensive use of Blender's ``empty nodes.'' |
678 | 680 |
679 Empty nodes have no mass, physical presence, or appearance, but | 681 Empty nodes have no mass, physical presence, or appearance, but |
680 they can hold metadata and have names. I use a tree structure of | 682 they can hold metadata and have names. I use a tree structure of |
681 empty nodes to specify senses in the following manner: | 683 empty nodes to specify senses in the following manner: |
682 | 684 |
697 | 699 |
698 ** Bodies are composed of segments connected by joints | 700 ** Bodies are composed of segments connected by joints |
699 | 701 |
700 Blender is a general purpose animation tool, which has been used in | 702 Blender is a general purpose animation tool, which has been used in |
701 the past to create high quality movies such as Sintel | 703 the past to create high quality movies such as Sintel |
702 \cite{blender}. Though Blender can model and render even complicated | 704 (\cite{blender}). Though Blender can model and render even |
703 things like water, it is crucial to keep models that are meant to | 705 complicated things like water, it is crucial to keep models that |
704 be simulated as creatures simple. =Bullet=, which =CORTEX= uses | 706 are meant to be simulated as creatures simple. =Bullet=, which |
705 through jMonkeyEngine3, is a rigid-body physics system. This offers | 707 =CORTEX= uses through jMonkeyEngine3, is a rigid-body physics |
706 a compromise between the expressiveness of a game level and the | 708 system. This offers a compromise between the expressiveness of a |
707 speed at which it can be simulated, and it means that creatures | 709 game level and the speed at which it can be simulated, and it means |
708 should be naturally expressed as rigid components held together by | 710 that creatures should be naturally expressed as rigid components |
709 joint constraints. | 711 held together by joint constraints. |
710 | 712 |
711 But humans are more like a squishy bag wrapped around some hard | 713 But humans are more like a squishy bag wrapped around some hard |
712 bones which define the overall shape. When we move, our skin bends | 714 bones which define the overall shape. When we move, our skin bends |
713 and stretches to accommodate the new positions of our bones. | 715 and stretches to accommodate the new positions of our bones. |
714 | 716 |
727 it about the true extent of its body. Simulating the skin as a | 729 it about the true extent of its body. Simulating the skin as a |
728 physical object requires some way to continuously update the | 730 physical object requires some way to continuously update the |
729 physical model of the skin along with the movement of the bones, | 731 physical model of the skin along with the movement of the bones, |
730 which is unacceptably slow compared to rigid body simulation. | 732 which is unacceptably slow compared to rigid body simulation. |
731 | 733 |
732 Therefore, instead of using the human-like ``deformable bag of | 734 Therefore, instead of using the human-like ``bony meatbag'' |
733 bones'' approach, I decided to base my body plans on multiple solid | 735 approach, I decided to base my body plans on multiple solid objects |
734 objects that are connected by joints, inspired by the robot =EVE= | 736 that are connected by joints, inspired by the robot =EVE= from the |
735 from the movie WALL-E. | 737 movie WALL-E. |
736 | 738 |
737 #+caption: =EVE= from the movie WALL-E. This body plan turns | 739 #+caption: =EVE= from the movie WALL-E. This body plan turns |
738 #+caption: out to be much better suited to my purposes than a more | 740 #+caption: out to be much better suited to my purposes than a more |
739 #+caption: human-like one. | 741 #+caption: human-like one. |
740 #+ATTR_LaTeX: :width 10cm | 742 #+ATTR_LaTeX: :width 10cm |
741 [[./images/Eve.jpg]] | 743 [[./images/Eve.jpg]] |
742 | 744 |
743 =EVE='s body is composed of several rigid components that are held | 745 =EVE='s body is composed of several rigid components that are held |
744 together by invisible joint constraints. This is what I mean by | 746 together by invisible joint constraints. This is what I mean by |
745 ``eve-like''. The main reason that I use eve-style bodies is for | 747 /eve-like/. The main reason that I use eve-like bodies is for |
746 efficiency, and so that there will be correspondence between the | 748 simulation efficiency, and so that there will be correspondence |
747 AI's senses and the physical presence of its body. Each individual | 749 between the AI's senses and the physical presence of its body. Each |
748 section is simulated by a separate rigid body that corresponds | 750 individual section is simulated by a separate rigid body that |
749 exactly with its visual representation and does not change. | 751 corresponds exactly with its visual representation and does not |
750 Sections are connected by invisible joints that are well supported | 752 change. Sections are connected by invisible joints that are well |
751 in jMonkeyEngine3. Bullet, the physics backend for jMonkeyEngine3, | 753 supported in jMonkeyEngine3. Bullet, the physics backend for |
752 can efficiently simulate hundreds of rigid bodies connected by | 754 jMonkeyEngine3, can efficiently simulate hundreds of rigid bodies |
753 joints. Just because sections are rigid does not mean they have to | 755 connected by joints. Just because sections are rigid does not mean |
754 stay as one piece forever; they can be dynamically replaced with | 756 they have to stay as one piece forever; they can be dynamically |
755 multiple sections to simulate splitting in two. This could be used | 757 replaced with multiple sections to simulate splitting in two. This |
756 to simulate retractable claws or =EVE='s hands, which are able to | 758 could be used to simulate retractable claws or =EVE='s hands, which |
757 coalesce into one object in the movie. | 759 are able to coalesce into one object in the movie. |
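At the lowest level, connecting two segments is a single Bullet call through jMonkeyEngine3. A minimal sketch (=join-segments= is a hypothetical helper; the real =CORTEX= code reads joint parameters out of the Blender file):

#+BEGIN_SRC clojure
(import '(com.jme3.bullet.joints Point2PointJoint)
        '(com.jme3.math Vector3f))

;; Hypothetical sketch: pin two rigid bodies together at a pivot
;; given in each body's local coordinates. jMonkeyEngine3 also
;; offers hinge and six-degree-of-freedom joints, and the resulting
;; joint must be added to the PhysicsSpace to take effect.
(defn join-segments
  [body-a body-b ^Vector3f pivot-a ^Vector3f pivot-b]
  (Point2PointJoint. body-a body-b pivot-a pivot-b))
#+END_SRC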
758 | 760 |
759 *** Solidifying/Connecting a body | 761 *** Solidifying/Connecting a body |
760 | 762 |
761 =CORTEX= creates a creature in two steps: first, it traverses the | 763 =CORTEX= creates a creature in two steps: first, it traverses the |
762 nodes in the Blender file and creates physical representations for | 764 nodes in the Blender file and creates physical representations for |
2441 | 2443 |
2442 - Empathy :: my empathy program leaves many areas for | 2444 - Empathy :: my empathy program leaves many areas for |
2443 improvement, among which are using vision to infer | 2445 improvement, among which are using vision to infer |
2444 proprioception and looking up sensory experience with imagined | 2446 proprioception and looking up sensory experience with imagined |
2445 vision, touch, and sound. | 2447 vision, touch, and sound. |
2446 - Evolution :: Karl Sims created a rich environment for | 2448 - Evolution :: Karl Sims created a rich environment for simulating |
2447 simulating the evolution of creatures on a connection | 2449 the evolution of creatures on a Connection Machine |
2448 machine. Today, this can be redone and expanded with =CORTEX= | 2450 (\cite{sims-evolving-creatures}). Today, this can be redone |
2449 on an ordinary computer. | 2451 and expanded with =CORTEX= on an ordinary computer. |
2450 - Exotic senses :: Cortex enables many fascinating senses that are | 2452 - Exotic senses :: Cortex enables many fascinating senses that are |
2451 not possible to build in the real world. For example, | 2453 not possible to build in the real world. For example, |
2452 telekinesis is an interesting avenue to explore. You can also | 2454 telekinesis is an interesting avenue to explore. You can also |
2453 make a ``semantic'' sense which looks up metadata tags on | 2455 make a ``semantic'' sense which looks up metadata tags on |
2454 objects in the environment; the metadata tags might contain | 2456 objects in the environment; the metadata tags might contain |
2455 other sensory information. | 2457 other sensory information. |
2456 - Imagination via subworlds :: this would involve a creature with | 2458 - Imagination via subworlds :: this would involve a creature with |
2457 an effector which creates an entirely new sub-simulation where | 2459 an effector which creates an entirely new sub-simulation where |
2458 the creature has direct control over placement/creation of | 2460 the creature has direct control over placement/creation of |
2459 objects via simulated telekinesis. The creature observes this | 2461 objects via simulated telekinesis. The creature observes this |
2460 sub-world through it's normal senses and uses its observations | 2462 sub-world through its normal senses and uses its observations |
2461 to make predictions about its top level world. | 2463 to make predictions about its top level world. |
2462 - Simulated prescience :: step the simulation forward a few ticks, | 2464 - Simulated prescience :: step the simulation forward a few ticks, |
2463 gather sensory data, then supply this data for the creature as | 2465 gather sensory data, then supply this data for the creature as |
2464 one of its actual senses. The cost of prescience is slowing | 2466 one of its actual senses. The cost of prescience is slowing |
2465 the simulation down by a factor proportional to however far | 2467 the simulation down by a factor proportional to however far |
2468 fight each other? | 2470 fight each other? |
2469 - Swarm creatures :: Program a group of creatures that cooperate | 2471 - Swarm creatures :: Program a group of creatures that cooperate |
2470 with each other. Because the creatures would be simulated, you | 2472 with each other. Because the creatures would be simulated, you |
2471 could investigate computationally complex rules of behavior | 2473 could investigate computationally complex rules of behavior |
2472 which still, from the group's point of view, would happen in | 2474 which still, from the group's point of view, would happen in |
2473 ``real time''. Interactions could be as simple as cellular | 2475 real time. Interactions could be as simple as cellular |
2474 organisms communicating via flashing lights, or as complex as | 2476 organisms communicating via flashing lights, or as complex as |
2475 humanoids completing social tasks, etc. | 2477 humanoids completing social tasks, etc. |
2476 - =HACKER= for writing muscle-control programs :: Presented with | 2478 - =HACKER= for writing muscle-control programs :: Presented with a |
2477 low-level muscle control/ sense API, generate higher level | 2479 low-level muscle control / sense API, generate higher level |
2478 programs for accomplishing various stated goals. Example goals | 2480 programs for accomplishing various stated goals. Example goals |
2479 might be "extend all your fingers" or "move your hand into the | 2481 might be "extend all your fingers" or "move your hand into the |
2480 area with blue light" or "decrease the angle of this joint". | 2482 area with blue light" or "decrease the angle of this joint". |
2481 It would be like Sussman's HACKER, except it would operate | 2483 It would be like Sussman's HACKER, except it would operate |
2482 with much more data in a more realistic world. Start off with | 2484 with much more data in a more realistic world. Start off with |
2483 "calisthenics" to develop subroutines over the motor control | 2485 "calisthenics" to develop subroutines over the motor control |
2484 API. This would be the "spinal cord" of a more intelligent | 2486 API. The low-level programming code might be a Turing machine |
2485 creature. The low-level programming code might be a Turing | 2487 that could develop programs to iterate over a "tape" where |
2486 machine that could develop programs to iterate over a "tape" | 2488 each entry in the tape could control recruitment of the fibers |
2487 where each entry in the tape could control recruitment of the | 2489 in a muscle. |
2488 fibers in a muscle. | 2490 - Sense fusion :: There is much work to be done on sense |
2489 - Sense fusion :: There is much work to be done on sense | |
2490 integration -- building up a coherent picture of the world and | 2491 integration -- building up a coherent picture of the world and |
2491 the things in it with =CORTEX= as a base, you can explore | 2492 the things in it. With =CORTEX= as a base, you can explore |
2492 concepts like self-organizing maps or cross modal clustering | 2493 concepts like self-organizing maps or cross modal clustering |
2493 in ways that have never before been tried. | 2494 in ways that have never before been tried. |
2494 - Inverse kinematics :: experiments in sense guided motor control | 2495 - Inverse kinematics :: experiments in sense guided motor control |
2495 are easy given =CORTEX='s support -- you can get right to the | 2496 are easy given =CORTEX='s support -- you can get right to the |
2496 hard control problems without worrying about physics or | 2497 hard control problems without worrying about physics or |
2759 have terms that consider the color of a person's skin or whether | 2760 have terms that consider the color of a person's skin or whether |
2760 they are male or female; instead, it gets right to the meat of what | 2761 they are male or female; instead, it gets right to the meat of what |
2761 jumping actually /is/. | 2762 jumping actually /is/. |
2762 | 2763 |
2763 Of course, the action predicates are not directly applicable to | 2764 Of course, the action predicates are not directly applicable to |
2764 video data which lacks the advanced sensory information which they | 2765 video data, which lacks the advanced sensory information which they |
2765 require! | 2766 require! |
2766 | 2767 |
2767 The trick now is to make the action predicates work even when the | 2768 The trick now is to make the action predicates work even when the |
2768 sensory data on which they depend is absent. If I can do that, then | 2769 sensory data on which they depend is absent. If I can do that, then |
2769 I will have gained much. | 2770 I will have gained much. |
2856 #+BEGIN_EXAMPLE | 2857 #+BEGIN_EXAMPLE |
2857 [ flat, flat, flat, flat, flat, flat, lift-head ] | 2858 [ flat, flat, flat, flat, flat, flat, lift-head ] |
2858 #+END_EXAMPLE | 2859 #+END_EXAMPLE |
2859 | 2860 |
2860 The worm's previous experience of lying on the ground and lifting | 2861 The worm's previous experience of lying on the ground and lifting |
2861 its head generates possible interpretations for each frame: | 2862 its head generates possible interpretations for each frame (the |
2863 numbers are experience-indices): | |
2862 | 2864 |
2863 #+BEGIN_EXAMPLE | 2865 #+BEGIN_EXAMPLE |
2864 [ flat, flat, flat, flat, flat, flat, flat, lift-head ] | 2866 [ flat, flat, flat, flat, flat, flat, flat, lift-head ] |
2865 1 1 1 1 1 1 1 4 | 2867 1 1 1 1 1 1 1 4 |
2866 2 2 2 2 2 2 2 | 2868 2 2 2 2 2 2 2 |
2876 [ flat, flat, flat, flat, flat, flat, flat, lift-head ] | 2878 [ flat, flat, flat, flat, flat, flat, flat, lift-head ] |
2877 6 7 8 9 1 2 3 4 | 2879 6 7 8 9 1 2 3 4 |
2878 #+END_EXAMPLE | 2880 #+END_EXAMPLE |
2879 | 2881 |
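The synthesis below prefers long runs of consecutive experience-indices. A minimal sketch of that preference (an assumed representation, not the thesis's actual implementation): given, for each frame, the set of experience-indices consistent with that frame, grow chains of consecutive indices and keep the longest.

#+BEGIN_SRC clojure
;; Hypothetical sketch: `candidates` is a vector containing, for each
;; frame, the set of experience-indices consistent with that frame.
;; A chain grows while the next frame's set contains the successor
;; of the chain's last index.
(defn longest-chain
  [candidates]
  (apply max-key count
    (for [start (range (count candidates))
          index (nth candidates start)]
      (loop [chain [index] frame (inc start)]
        (if (and (< frame (count candidates))
                 (contains? (nth candidates frame) (inc (peek chain))))
          (recur (conj chain (inc (peek chain))) (inc frame))
          chain)))))
#+END_SRC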
2880 The new path through \Phi-space is synthesized from two actual | 2882 The new path through \Phi-space is synthesized from two actual |
2881 paths that the creature actually experiences, the "1-2-3-4" chain | 2883 paths that the creature has experienced: the "1-2-3-4" chain and |
2882 and the "6-7-8-9" chain. The "1-2-3-4" chain is necessary because | 2884 the "6-7-8-9" chain. The "1-2-3-4" chain is necessary because it |
2883 it ends with the worm lifting its head. It originated from a short | 2885 ends with the worm lifting its head. It originated from a short |
2884 training session where the worm rested on the floor for a brief | 2886 training session where the worm rested on the floor for a brief |
2885 while and then raised its head. The "6-7-8-9" chain is part of a | 2887 while and then raised its head. The "6-7-8-9" chain is part of a |
2886 longer chain of inactivity where the worm simply rested on the | 2888 longer chain of inactivity where the worm simply rested on the |
2887 floor without moving. It is preferred over a "1-2-3" chain (which | 2889 floor without moving. It is preferred over a "1-2-3" chain (which |
2888 also describes inactivity) because it is longer. The main ideas | 2890 also describes inactivity) because it is longer. The main ideas |
3798 - =(display-dilated-time world timer)= :: Shows the time as it is | 3800 - =(display-dilated-time world timer)= :: Shows the time as it is |
3799 flowing in the simulation on a HUD display. | 3801 flowing in the simulation on a HUD display. |
3800 | 3802 |
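For example, a hypothetical setup function might attach the display like this (=world= is assumed to be a jMonkeyEngine3 =Application=, whose =getTimer= method supplies the timer):

#+BEGIN_SRC clojure
;; Hypothetical usage sketch: show dilated simulation time on the HUD.
(defn setup
  [world]
  (display-dilated-time world (.getTimer world)))
#+END_SRC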
3801 | 3803 |
3802 | 3804 |
3805 TODO -- add a paper about detecting biological motion from only a few dots. |