#+title: =CORTEX=
#+author: Robert McIntyre
#+email: rlm@mit.edu
#+description: Using embodied AI to facilitate Artificial Imagination.
#+keywords: AI, clojure, embodiment

* Empathy and Embodiment as problem solving strategies

By the end of this thesis, you will have seen a novel approach to
interpreting video using embodiment and empathy. You will also have
seen one way to efficiently implement empathy for embodied
creatures.

The core vision of this thesis is that one of the important ways in
which we understand others is by imagining ourselves in their
position and empathically feeling experiences based on our own past
experiences and imagination.

By understanding events in terms of our own previous corporeal
experience, we greatly constrain the possibilities of what would
otherwise be an unwieldy exponential search. This extra constraint
can be the difference between easily understanding what is happening
in a video and being completely lost in a sea of incomprehensible
color and movement.

** Recognizing actions in video is extremely difficult

Consider for example the problem of determining what is happening in
a video of which this is one frame:

#+caption: A cat drinking some water. Identifying this action is
#+caption: beyond the state of the art for computers.
#+ATTR_LaTeX: :width 7cm
[[./images/cat-drinking.jpg]]

It is currently impossible for any computer program to reliably
label such a video as "drinking". And rightly so -- it is a very
hard problem! What features can you describe in terms of low-level
functions of pixels that can even begin to describe what is
happening here?

Or suppose that you are building a program that recognizes
chairs. How could you ``see'' the chair in the following pictures?

#+caption: When you look at this, do you think ``chair''? I certainly do.
#+ATTR_LaTeX: :width 10cm
[[./images/invisible-chair.png]]

#+caption: The chair in this image is quite obvious to humans, but I
#+caption: doubt that any computer program can find it.
#+ATTR_LaTeX: :width 10cm
[[./images/fat-person-sitting-at-desk.jpg]]

Finally, how is it that you can easily tell the difference between
how the girl's /muscles/ are working in \ref{girl}?

#+caption: The mysterious ``common sense'' appears here as you are able
#+caption: to ``see'' the difference in how the girl's arm muscles
#+caption: are activated differently in the two images.
#+name: girl
#+ATTR_LaTeX: :width 10cm
[[./images/wall-push.png]]

These problems are difficult because the language of pixels is far
removed from what we would consider to be an acceptable description
of the events in these images. In order to process them, we must
raise the images into some higher level of abstraction where their
descriptions become more similar to how we would describe them in
English. The question is, how can we raise them?

I think humans are able to label such video as "drinking" because
they imagine /themselves/ as the cat, and imagine putting their face
up against a stream of water and sticking out their tongue. In that
imagined world, they can feel the cool water hitting their tongue,
and feel the water entering their body, and are able to recognize
that /feeling/ as drinking. So, the label of the action is not
really in the pixels of the image, but is found clearly in a
simulation inspired by those pixels.

An imaginative system, having been trained on drinking and
non-drinking examples and having learned that the most important
component of drinking is the feeling of water sliding down one's
throat, would analyze a video of a cat drinking in the following
manner (a toy sketch of this pipeline follows the list):

- Create a physical model of the video by putting a "fuzzy" model
  of its own body in place of the cat. Also, create a simulation of
  the stream of water.

- Play out this simulated scene and generate imagined sensory
  experience. This will include relevant muscle contractions, a
  close-up view of the stream from the cat's perspective, and most
  importantly, the imagined feeling of water entering the mouth.

- The action is now easily identified as drinking by the sense of
  taste alone. The other senses (such as the tongue moving in and
  out) help to give plausibility to the simulated action. Note that
  the sense of vision, while critical in creating the simulation,
  is not critical for identifying the action from the simulation.
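Purely as an illustration of the shape of this pipeline, here is a
toy Clojure sketch of those three steps. Every name in it is a
hypothetical placeholder, not part of =CORTEX=, and the "simulation"
is stubbed out with canned values:

#+begin_src clojure
;; Toy sketch of the three-step pipeline above. All functions are
;; hypothetical placeholders, not part of CORTEX.

(defn fit-body-model
  "Step 1: replace the cat in VIDEO with a fuzzy model of our own
   body, plus a simulated stream of water. (Stub.)"
  [video]
  {:body :fuzzy-self-model, :water :simulated-stream})

(defn imagine
  "Step 2: play out the simulated SCENE and return imagined sensory
   experience, keyed by sense. (Stub.)"
  [scene]
  {:taste :cool-water, :touch :wet-tongue, :muscles :lapping})

(defn drinking?
  "Step 3: classify the action from the imagined senses alone;
   taste is decisive, and the other senses add plausibility."
  [video]
  (let [senses (imagine (fit-body-model video))]
    (= :cool-water (:taste senses))))
#+end_src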
cat drinking, mimes, leaning, common sense

** =EMPATH= neatly solves recognition problems

factorization, right language, etc

a new possibility for the question ``what is a chair?'' -- it's the
feeling of your butt on something and your knees bent, with your
back muscles and legs relaxed.

** =CORTEX= is a toolkit for building sensate creatures

Hand integration demo

** Contributions

* Building =CORTEX=

** To explore embodiment, we need a world, body, and senses

** Because of Time, simulation is preferable to reality

** Video game engines are a great starting point

** Bodies are composed of segments connected by joints

** Eyes reuse standard video game components

** Hearing is hard; =CORTEX= does it right

** Touch uses hundreds of hair-like elements

** Proprioception is the sense that makes everything ``real''

** Muscles are both effectors and sensors

** =CORTEX= brings complex creatures to life!

** =CORTEX= enables many possibilities for further research

* Empathy in a simulated worm

** Embodiment factors action recognition into manageable parts

** Action recognition is easy with a full gamut of senses

** Digression: bootstrapping touch using free exploration

** \Phi-space describes the worm's experiences

** Empathy is the process of tracing through \Phi-space

** Efficient action recognition with =EMPATH=

* Contributions

- Built =CORTEX=, a comprehensive platform for embodied AI
  experiments. It has many new features lacking in other systems,
  such as sound, and makes it easy to model and create new
  creatures.
- Created a novel concept for action recognition based on
  artificial imagination.

In the second half of the thesis I develop a computational model of
empathy, using =CORTEX= as a base. Empathy in this context is the
ability to observe another creature and infer what sorts of
sensations that creature is feeling. My empathy algorithm involves
multiple phases. First is free-play, where the creature moves around
and gains sensory experience. From this experience I construct a
representation of the creature's sensory state space, which I call
\phi-space. Using \phi-space, I construct an efficient function for
enriching the limited data that comes from observing another
creature with a full complement of imagined sensory data based on
previous experience. I can then use the imagined sensory data to
recognize what the observed creature is doing and feeling, using
straightforward embodied action predicates. This is all demonstrated
using a simple worm-like creature, and by recognizing worm actions
based on limited data.

Embodied representation using multiple senses such as touch,
proprioception, and muscle tension turns out to be exceedingly
efficient at describing body-centered actions. It is the ``right
language for the job''. For example, it takes only around 5 lines of
LISP code to describe the action of ``curling'' using embodied
primitives, as the sketch below suggests. It takes about 8 lines to
describe the seemingly complicated action of wiggling.
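To illustrate what an embodied action predicate of that size could
look like, here is a minimal Clojure sketch (not the thesis code).
It assumes, hypothetically, that each experience frame stores
proprioception as a sequence of =[heading pitch bend]= angle triples,
one per joint, and the bend threshold is arbitrary:

#+begin_src clojure
;; Sketch of an embodied action predicate. Assumes each experience
;; frame is a map whose :proprioception entry holds one
;; [heading pitch bend] triple of joint angles per joint.
(defn curled?
  "True when every joint in the most recent experience frame is
   strongly bent."
  [experiences]
  (every?
   (fn [[_heading _pitch bend]]
     (> (Math/sin bend) 0.64))   ; bend threshold chosen for illustration
   (:proprioception (peek experiences))))

;; e.g. (curled? [{:proprioception [[0 0 1.2] [0 0 1.4]]}]) => true
#+end_src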
* COMMENT names for cortex
- bioland