#+title: =CORTEX=
#+author: Robert McIntyre
#+email: rlm@mit.edu
#+description: Using embodied AI to facilitate Artificial Imagination.
#+keywords: AI, clojure, embodiment

* Empathy and Embodiment as problem-solving strategies

By the end of this thesis, you will have seen a novel approach to
interpreting video using embodiment and empathy. You will also have
seen one way to efficiently implement empathy for embodied
creatures.

The core vision of this thesis is that one of the important ways in
which we understand others is by imagining ourselves in their
position and empathically feeling experiences based on our own past
experiences and imagination.

By understanding events in terms of our own previous corporeal
experience, we greatly constrain the possibilities of what would
otherwise be an unwieldy exponential search. This extra constraint
can be the difference between easily understanding what is happening
in a video and being completely lost in a sea of incomprehensible
color and movement.

** Recognizing actions in video is extremely difficult

Consider, for example, the problem of determining what is happening
in a video of which this is one frame:

#+caption: A cat drinking some water. Identifying this action is
#+caption: beyond the state of the art for computers.
#+ATTR_LaTeX: :width 7cm
[[./images/cat-drinking.jpg]]

It is currently impossible for any computer program to reliably
label such a video as "drinking". And rightly so -- it is a very
hard problem! What features can you describe in terms of low-level
functions of pixels that can even begin to describe what is
happening here?

Or suppose that you are building a program that recognizes
chairs. How could you ``see'' the chair in the following pictures?

#+caption: When you look at this, do you think ``chair''? I certainly do.
#+ATTR_LaTeX: :width 10cm
[[./images/invisible-chair.png]]

#+caption: The chair in this image is quite obvious to humans, but I
#+caption: doubt that any computer program can find it.
#+ATTR_LaTeX: :width 10cm
[[./images/fat-person-sitting-at-desk.jpg]]

Finally, how is it that you can easily tell the difference between
how the girl's /muscles/ are working in \ref{girl}?

#+caption: The mysterious ``common sense'' appears here as you are able
#+caption: to ``see'' how the girl's arm muscles are activated
#+caption: differently in the two images.
#+name: girl
#+ATTR_LaTeX: :width 10cm
[[./images/wall-push.png]]

These problems are difficult because the language of pixels is far
removed from what we would consider to be an acceptable description
of the events in these images. In order to process them, we must
raise the images into some higher level of abstraction where their
descriptions become more similar to how we would describe them in
English. The question is, how can we raise the raw pixels into that
higher level of abstraction?

I think humans are able to label such video as "drinking" because
they imagine /themselves/ as the cat, and imagine putting their face
up against a stream of water and sticking out their tongue. In that
imagined world, they can feel the cool water hitting their tongue,
and feel the water entering their body, and are able to recognize
that /feeling/ as drinking. So, the label of the action is not
really in the pixels of the image, but is found clearly in a
simulation inspired by those pixels. An imaginative system, having
been trained on drinking and non-drinking examples and learning that
the most important component of drinking is the feeling of water
sliding down one's throat, would analyze a video of a cat drinking
in the following manner (a code sketch follows this list):

- Create a physical model of the video by putting a "fuzzy" model
  of its own body in place of the cat. Also, create a simulation of
  the stream of water.

- Play out this simulated scene and generate imagined sensory
  experience. This will include relevant muscle contractions, a
  close-up view of the stream from the cat's perspective, and most
  importantly, the imagined feeling of water entering the mouth.

- The action is now easily identified as drinking by the sense of
  taste alone. The other senses (such as the tongue moving in and
  out) help to give plausibility to the simulated action. Note that
  the sense of vision, while critical in creating the simulation,
  is not critical for identifying the action from the simulation.
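
A minimal sketch, in Clojure, of what this analysis might look like
as code. Every name here (=fit-body-model=, =simulate=,
=imagined-senses=, =tastes-like-drinking?=) is a hypothetical
placeholder for machinery developed later in this thesis, not an
actual =CORTEX= function:

#+begin_src clojure
;; Hypothetical sketch of the three-step analysis above. The
;; declared helpers are placeholders, not real CORTEX functions.
(declare fit-body-model simulate imagined-senses
         tastes-like-drinking?)

(defn recognize-drinking?
  "Decide whether the action in video-frames is drinking, by
   simulation rather than by low-level pixel features."
  [video-frames]
  ;; Step 1: model the scene, substituting a "fuzzy" model of our
  ;; own body for the cat, plus a simulation of the water.
  (let [scene    (fit-body-model video-frames)
        ;; Step 2: play the scene forward, generating imagined
        ;; sensory experience (muscles, vision, taste, ...).
        imagined (imagined-senses (simulate scene))]
    ;; Step 3: identify the action from the imagined senses; taste
    ;; alone suffices, and the other senses add plausibility.
    (tastes-like-drinking? (:taste imagined))))
#+end_src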

cat drinking, mimes, leaning, common sense

** =EMPATH= neatly solves recognition problems

factorization, right language, etc.

a new possibility for the question ``what is a chair?'' -- it's the
feeling of your butt on something and your knees bent, with your
back muscles and legs relaxed.
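
As a hedged illustration, such an embodied definition might be
written as a short Clojure predicate over a map of imagined
sensations; the keys and thresholds here are invented for this
sketch and are not =CORTEX= primitives:

#+begin_src clojure
;; Illustrative sketch: "chair" as an embodied predicate over
;; imagined sensations. All keys and thresholds are invented.
(defn sitting-on-chair?
  [{:keys [seat-pressure knee-angle back-tension leg-tension]}]
  (and (> seat-pressure 0.1)  ; feeling of your butt on something
       (< knee-angle 120)     ; knees bent (interior angle, degrees)
       (< back-tension 0.2)   ; back muscles relaxed
       (< leg-tension 0.2)))  ; legs relaxed

;; (sitting-on-chair? {:seat-pressure 0.4 :knee-angle 90
;;                     :back-tension 0.05 :leg-tension 0.1})
;; => true
#+end_src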

** =CORTEX= is a toolkit for building sensate creatures

Hand integration demo

** Contributions

* Building =CORTEX=

** To explore embodiment, we need a world, body, and senses

** Because of Time, simulation is preferable to reality

** Video game engines are a great starting point

** Bodies are composed of segments connected by joints

** Eyes reuse standard video game components

** Hearing is hard; =CORTEX= does it right

** Touch uses hundreds of hair-like elements

** Proprioception is the sense that makes everything ``real''

** Muscles are both effectors and sensors

** =CORTEX= brings complex creatures to life!

** =CORTEX= enables many possibilities for further research

* Empathy in a simulated worm

** Embodiment factors action recognition into manageable parts

** Action recognition is easy with a full gamut of senses

** Digression: bootstrapping touch using free exploration

** \Phi-space describes the worm's experiences

** Empathy is the process of tracing through \Phi-space

** Efficient action recognition with =EMPATH=

* Contributions
- Built =CORTEX=, a comprehensive platform for embodied AI
  experiments. =CORTEX= has many new features lacking in other
  systems, such as sound, and makes it easy to model and create
  new creatures.
- Created a novel approach to action recognition that uses
  artificial imagination.

In the second half of the thesis I develop a computational model of
empathy, using =CORTEX= as a base. Empathy in this context is the
ability to observe another creature and infer what sorts of
sensations that creature is feeling. My empathy algorithm involves
multiple phases. First is free play, in which the creature moves
around and gains sensory experience. From this experience I
construct a representation of the creature's sensory state space,
which I call \phi-space. Using \phi-space, I construct an efficient
function for enriching the limited data that comes from observing
another creature with a full complement of imagined sensory data
based on previous experience. I can then use the imagined sensory
data to recognize what the observed creature is doing and feeling,
using straightforward embodied action predicates. This is all
demonstrated using a simple worm-like creature, and recognizing
worm actions based on limited data.
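
The following Clojure sketch shows the shape of these phases.
Representing \phi-space as a flat vector of complete sensory
snapshots, and enrichment as a nearest-neighbor lookup under an
assumed distance metric, are simplifying assumptions made for
illustration, not the exact implementation:

#+begin_src clojure
;; Schematic sketch of the empathy pipeline. Phi-space is modeled
;; here as a plain vector of complete sensory snapshots.

(defn build-phi-space
  "Phase 1: free play. Record the creature's complete sensory
   state at every tick of its exploration."
  [experience-ticks]
  (vec experience-ticks))

(defn empathize
  "Phase 2: enrich a partial observation (e.g. proprioception
   alone) into full imagined senses by recalling the closest
   remembered experience. distance is an assumed metric over the
   observed modality."
  [phi-space distance partial-observation]
  (apply min-key #(distance partial-observation %) phi-space))

(defn recognize-action
  "Phase 3: apply embodied action predicates to the imagined
   senses, returning the first action whose predicate holds."
  [action-predicates imagined-senses]
  (first (for [[action holds?] action-predicates
               :when (holds? imagined-senses)]
           action)))
#+end_src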

Embodied representation using multiple senses such as touch,
proprioception, and muscle tension turns out to be exceedingly
efficient at describing body-centered actions. It is the ``right
language for the job''. For example, it takes only around 5 lines of
LISP code to describe the action of ``curling'' using embodied
primitives. It takes about 8 lines to describe the seemingly
complicated action of wiggling.
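
To give the flavor of such a predicate, here is a hedged Clojure
sketch of a =curled?= test; the joint-flexion representation and
the numeric thresholds are invented for illustration and are not
the thesis's actual primitives:

#+begin_src clojure
;; Illustrative sketch: "curling" in embodied terms. A worm is
;; curled when most of its joints are strongly flexed toward the
;; same side. Flexions are assumed normalized to [-1, 1].
(defn curled?
  [{:keys [joint-flexions]}]
  (letfn [(mostly? [pred]
            (> (count (filter pred joint-flexions))
               (* 0.8 (count joint-flexions))))]
    (or (mostly? #(> % 0.5))
        (mostly? #(< % -0.5)))))
#+end_src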

* COMMENT names for cortex
- bioland