view thesis/cortex.org @ 438:4dcb923c9b16
progress on intro.
author   | Robert McIntyre <rlm@mit.edu>
date     | Sun, 23 Mar 2014 22:22:00 -0400
parents  | c1e6b7221b2f
children | 97dc719fd1ac
#+title: =CORTEX=
#+author: Robert McIntyre
#+email: rlm@mit.edu
#+description: Using embodied AI to facilitate Artificial Imagination.
#+keywords: AI, clojure, embodiment

* Empathy and Embodiment: problem solving strategies

By the end of this thesis, you will have seen a novel approach to interpreting video using embodiment and empathy. You will also have seen one way to efficiently implement empathy for embodied creatures.

The core vision of this thesis is that one of the important ways in which we understand others is by imagining ourselves in their position and empathically feeling experiences based on our own past experiences and imagination.

By understanding events in terms of our own previous corporeal experience, we greatly constrain the possibilities of what would otherwise be an unwieldy exponential search. This extra constraint can be the difference between easily understanding what is happening in a video and being completely lost in a sea of incomprehensible color and movement.

** Recognizing actions in video is extremely difficult

Consider for example the problem of determining what is happening in a video of which this is one frame:

#+caption: A cat drinking some water. Identifying this action is beyond the state of the art for computers.
#+ATTR_LaTeX: :width 7cm
[[./images/cat-drinking.jpg]]

It is currently impossible for any computer program to reliably label such a video as "drinking". And rightly so -- it is a very hard problem! What features can you describe in terms of low-level functions of pixels that can even begin to describe what is happening here?

Or suppose that you are building a program that recognizes chairs. How could you ``see'' the chair in the following picture?

#+caption: When you look at this, do you think ``chair''? I certainly do.
#+ATTR_LaTeX: :width 10cm
[[./images/invisible-chair.png]]

#+caption: The chair in this image is quite obvious to humans, but I doubt any computer program can find it.
#+ATTR_LaTeX: :width 10cm
[[./images/fat-person-sitting-at-desk.jpg]]

I think humans are able to label such a video as "drinking" because they imagine /themselves/ as the cat, and imagine putting their face up against a stream of water and sticking out their tongue. In that imagined world, they can feel the cool water hitting their tongue, and feel the water entering their body, and are able to recognize that /feeling/ as drinking. So, the label of the action is not really in the pixels of the image, but is found clearly in a simulation inspired by those pixels. An imaginative system, having been trained on drinking and non-drinking examples and having learned that the most important component of drinking is the feeling of water sliding down one's throat, would analyze a video of a cat drinking in the following manner:

- Create a physical model of the video by putting a "fuzzy" model of its own body in place of the cat. Also, create a simulation of the stream of water.

- Play out this simulated scene and generate imagined sensory experience. This will include relevant muscle contractions, a close-up view of the stream from the cat's perspective, and most importantly, the imagined feeling of water entering the mouth.

- The action is now easily identified as drinking by the sense of taste alone. The other senses (such as the tongue moving in and out) help to give plausibility to the simulated action. Note that the sense of vision, while critical in creating the simulation, is not critical for identifying the action from the simulation.

cat drinking, mimes, leaning, common sense

** =EMPATH= neatly solves recognition problems

factorization, right language, etc.

A new possibility for the question ``what is a chair?'' -- it's the feeling of your butt on something and your knees bent, with your back muscles and legs relaxed.

** =CORTEX= is a toolkit for building sensate creatures

Hand integration demo

** Contributions

* Building =CORTEX=

** To explore embodiment, we need a world, body, and senses

** Because of Time, simulation is preferable to reality

** Video game engines are a great starting point

** Bodies are composed of segments connected by joints

** Eyes reuse standard video game components

** Hearing is hard; =CORTEX= does it right

** Touch uses hundreds of hair-like elements

** Proprioception is the force that makes everything ``real''

** Muscles are both effectors and sensors

** =CORTEX= brings complex creatures to life!

** =CORTEX= enables many possibilities for further research

* Empathy in a simulated worm

** Embodiment factors action recognition into manageable parts

** Action recognition is easy with a full gamut of senses

** Digression: bootstrapping touch using free exploration

** \Phi-space describes the worm's experiences

** Empathy is the process of tracing through \Phi-space

** Efficient action recognition via empathy

* Contributions

- Built =CORTEX=, a comprehensive platform for embodied AI experiments. It has many new features lacking in other systems, such as sound, and makes it easy to model and create new creatures.
- Created a novel concept for action recognition using artificial imagination.

In the second half of the thesis I develop a computational model of empathy, using =CORTEX= as a base. Empathy in this context is the ability to observe another creature and infer what sorts of sensations that creature is feeling. My empathy algorithm involves multiple phases. First is free play, where the creature moves around and gains sensory experience. From this experience I construct a representation of the creature's sensory state space, which I call \phi-space. Using \phi-space, I construct an efficient function for enriching the limited data that comes from observing another creature with a full complement of imagined sensory data based on previous experience. I can then use the imagined sensory data to recognize what the observed creature is doing and feeling, using straightforward embodied action predicates. This is all demonstrated using a simple worm-like creature, and recognizing worm actions based on limited data.

Embodied representation using multiple senses such as touch, proprioception, and muscle tension turns out to be exceedingly efficient at describing body-centered actions. It is the ``right language for the job''. For example, it takes only around 5 lines of LISP code to describe the action of ``curling'' using embodied primitives. It takes about 8 lines to describe the seemingly complicated action of wiggling.

* COMMENT names for cortex
- bioland