diff thesis/cortex.org @ 516:ced955c3c84f

resurrect old cortex to fix flow issues.
author Robert McIntyre <rlm@mit.edu>
date Sun, 30 Mar 2014 22:48:19 -0400
parents 58fa1ffd481e
children 68665d2c32a7
line wrap: on
line diff
     1.1 --- a/thesis/cortex.org	Sun Mar 30 10:53:13 2014 -0400
     1.2 +++ b/thesis/cortex.org	Sun Mar 30 22:48:19 2014 -0400
     1.3 @@ -42,46 +42,55 @@
     1.4  
     1.5  
     1.6  * Empathy \& Embodiment: problem solving strategies
     1.7 +
     1.8 +  By the end of this thesis, you will have seen a novel approach to
     1.9 +  interpreting video using embodiment and empathy. You will have also
    1.10 +  seen one way to efficiently implement empathy for embodied
    1.11 +  creatures. Finally, you will become familiar with =CORTEX=, a system
    1.12 +  for designing and simulating creatures with rich senses, which you
    1.13 +  may choose to use in your own research.
    1.14    
    1.15 -** The problem: recognizing actions in video is extremely difficult
    1.16 -# developing / requires useful representations
    1.17 +  This is the core vision of my thesis: That one of the important ways
    1.18 +  in which we understand others is by imagining ourselves in their
    1.19 +  position and empathetically feeling experiences relative to our own
    1.20 +  bodies. By understanding events in terms of our own previous
    1.21 +  corporeal experience, we greatly constrain the possibilities of what
    1.22 +  would otherwise be an unwieldy exponential search. This extra
    1.23 +  constraint can be the difference between easily understanding what
    1.24 +  is happening in a video and being completely lost in a sea of
    1.25 +  incomprehensible color and movement.
    1.26 +
    1.27 +  
    1.28 +** The problem: recognizing actions in video is hard!
    1.29     
    1.30 -   Examine the following collection of images. As you, and indeed very
    1.31 -   young children, can easily determine, each one is a picture of
    1.32 -   someone drinking. 
    1.33 -
    1.34 -   # dxh: cat, cup, drinking fountain, rain, straw, coconut
    1.35 +   Examine the following image. What is happening? As you, and indeed
    1.36 +   very young children, can easily determine, this is an image of
    1.37 +   drinking. 
    1.38 +
    1.39     #+caption: A cat drinking some water. Identifying this action is 
    1.40     #+caption: beyond the capabilities of existing computer vision systems.
    1.41     #+ATTR_LaTeX: :width 7cm
    1.42     [[./images/cat-drinking.jpg]]
    1.43       
    1.44     Nevertheless, it is beyond the state of the art for a computer
    1.45 -   vision program to describe what's happening in each of these
    1.46 -   images, or what's common to them. Part of the problem is that many
    1.47 -   computer vision systems focus on pixel-level details or probability
    1.48 -   distributions of pixels, with little focus on [...]
    1.49 -
    1.50 +   vision program to describe what's happening in this image. Part of
    1.51 +   the problem is that many computer vision systems focus on
    1.52 +   pixel-level details or comparisons to example images (such as
    1.53 +   \cite{volume-action-recognition}), but the 3D world is so variable
    1.54 +   that it is hard to describe the world in terms of possible images.
    1.55  
    1.56     In fact, the contents of a scene may have much less to do with pixel
    1.57     probabilities than with recognizing various affordances: things you
    1.58 -   can move, objects you can grasp, spaces that can be filled
    1.59 -   (Gibson). For example, what processes might enable you to see the
    1.60 -   chair in figure \ref{hidden-chair}? 
    1.61 -   # Or suppose that you are building a program that recognizes chairs.
    1.62 -   # How could you ``see'' the chair ?
    1.63 -   
    1.64 -   # dxh: blur chair
    1.65 +   can move, objects you can grasp, spaces that can be filled. For
    1.66 +   example, what processes might enable you to see the chair in figure
    1.67 +   \ref{hidden-chair}?
    1.68 +
    1.69     #+caption: The chair in this image is quite obvious to humans, but I 
    1.70     #+caption: doubt that any modern computer vision program can find it.
    1.71     #+name: hidden-chair
    1.72     #+ATTR_LaTeX: :width 10cm
    1.73     [[./images/fat-person-sitting-at-desk.jpg]]
    1.74  
    1.75 -
    1.76 -   
    1.77 -
    1.78 -   
    1.79     Finally, how is it that you can easily tell the difference between
    1.80     how the girl's /muscles/ are working in figure \ref{girl}?
    1.81     
    1.82 @@ -92,9 +101,6 @@
    1.83     #+ATTR_LaTeX: :width 7cm
    1.84     [[./images/wall-push.png]]
    1.85    
    1.86 -
    1.87 -
    1.88 -
    1.89     Each of these examples tells us something about what might be going
    1.90     on in our minds as we easily solve these recognition problems.
    1.91     
    1.92 @@ -110,10 +116,111 @@
    1.93     imagined positions triggered by images of the human body.
    1.94  
    1.95  ** A step forward: the sensorimotor-centered approach
    1.96 -# ** =EMPATH= recognizes what creatures are doing
    1.97 -# neatly solves recognition problems  
    1.98 +
    1.99     In this thesis, I explore the idea that our knowledge of our own
   1.100 -   bodies enables us to recognize the actions of others. 
   1.101 +   bodies, combined with our own rich senses, enables us to recognize
   1.102 +   the actions of others.
   1.103 +
   1.104 +   For example, I think humans are able to label the cat video as
   1.105 +   ``drinking'' because they imagine /themselves/ as the cat, and
   1.106 +   imagine putting their face up against a stream of water and
   1.107 +   sticking out their tongue. In that imagined world, they can feel
   1.108 +   the cool water hitting their tongue, and feel the water entering
   1.109 +   their body, and are able to recognize that /feeling/ as drinking.
   1.110 +   So, the label of the action is not really in the pixels of the
   1.111 +   image, but is found clearly in a simulation inspired by those
   1.112 +   pixels. An imaginative system, having been trained on drinking and
   1.113 +   non-drinking examples and learning that the most important
   1.114 +   component of drinking is the feeling of water sliding down one's
   1.115 +   throat, would analyze a video of a cat drinking in the following
   1.116 +   manner (a code sketch follows these steps):
   1.117 +   
   1.118 +   1. Create a physical model of the video by putting a ``fuzzy''
   1.119 +      model of its own body in place of the cat. Possibly also create
   1.120 +      a simulation of the stream of water.
   1.121 +
   1.122 +   2. Play out this simulated scene and generate imagined sensory
   1.123 +      experience. This will include relevant muscle contractions, a
   1.124 +      close up view of the stream from the cat's perspective, and most
   1.125 +      importantly, the imagined feeling of water entering the
   1.126 +      mouth. The imagined sensory experience can come from a
   1.127 +      simulation of the event, but can also be pattern-matched from
   1.128 +      previous, similar embodied experience.
   1.129 +
   1.130 +   3. The action is now easily identified as drinking by the sense of
   1.131 +      taste alone. The other senses (such as the tongue moving in and
   1.132 +      out) help to give plausibility to the simulated action. Note that
   1.133 +      the sense of vision, while critical in creating the simulation,
   1.134 +      is not critical for identifying the action from the simulation.
   1.135 +
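         +   Viewed as a program, these three steps form a short pipeline. The
         +   following is only a sketch of that structure: the functions
         +   =fit-fuzzy-body-model=, =imagine-senses=, and =feels-like-drinking?=
         +   are hypothetical stand-ins named after the steps above, not
         +   functions that exist in =CORTEX= or =EMPATH=.
         +
         +   #+begin_src clojure
         +;; Sketch only: the three stage functions are hypothetical
         +;; placeholders for the steps above; they are declared, not
         +;; implemented, here.
         +(declare fit-fuzzy-body-model  ; step 1: place a model of your own body in the scene
         +         imagine-senses        ; step 2: play the scene forward, generating sensory data
         +         feels-like-drinking?) ; step 3: classify from the imagined feeling alone
         +
         +(defn imagined-drinking?
         +  "Hypothetical end-to-end pipeline for the three steps above."
         +  [video]
         +  (-> video
         +      fit-fuzzy-body-model
         +      imagine-senses
         +      feels-like-drinking?))
         +   #+end_src
         +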
   1.136 +   For the chair examples, the process is even easier (again sketched
   1.136 +   in code after the steps):
   1.137 +
   1.138 +    1. Align a model of your body to the person in the image.
   1.139 +
   1.140 +    2. Generate proprioceptive sensory data from this alignment.
   1.141 +  
   1.142 +    3. Use the imagined proprioceptive data as a key to lookup related
   1.143 +       sensory experience associated with that particular proprioceptive
   1.144 +       feeling.
   1.145 +
   1.146 +    4. Retrieve the feeling of your bottom resting on a surface, your
   1.147 +       knees bent, and your leg muscles relaxed.
   1.148 +
   1.149 +    5. This sensory information is consistent with your =sitting?=
   1.150 +       sensory predicate, so you (and the entity in the image) must be
   1.151 +       sitting.
   1.152 +
   1.153 +    6. There must be a chair-like object since you are sitting.
   1.154 +
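         +   Steps 3 through 5 amount to a lookup keyed on proprioception. The
         +   sketch below is not =EMPATH='s actual code: it assumes a
         +   hypothetical map =experience-index= from (suitably discretized)
         +   proprioceptive data to remembered full-sense experience, and takes
         +   the =sitting?= predicate as an argument.
         +
         +   #+begin_src clojure
         +;; Sketch, not EMPATH's actual code. `experience-index' is assumed to
         +;; map proprioceptive data to a previously felt sensory frame;
         +;; `sitting?' is assumed to be a predicate over such frames.
         +(defn inferred-sitting?
         +  "Use imagined proprioception as a key, recall the full feeling
         +   that went with it, and test the sitting? predicate (steps 3-5)."
         +  [proprioception experience-index sitting?]
         +  (let [remembered (get experience-index proprioception)]
         +    (boolean (and remembered (sitting? remembered)))))
         +   #+end_src
         +
         +   Step 6 is then immediate: if the predicate holds, a chair-like
         +   object must be present to support the pose.
         +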
   1.155 +   Empathy offers yet another alternative to the age-old AI
   1.156 +   representation question: ``What is a chair?'' --- A chair is the
   1.157 +   feeling of sitting!
   1.158 +
   1.159 +   One powerful advantage of empathic problem solving is that it
   1.160 +   factors the action recognition problem into two easier problems. To
   1.161 +   use empathy, you need an /aligner/, which takes the video and a
   1.162 +   model of your body, and aligns the model with the video. Then, you
   1.163 +   need a /recognizer/, which uses the aligned model to interpret the
   1.164 +   action. The power in this method lies in the fact that you describe
   1.165 +   all actions from a body-centered viewpoint. You are less tied to
   1.166 +   the particulars of any visual representation of the actions. If you
   1.167 +   teach the system what ``running'' is, and you have a good enough
   1.168 +   aligner, the system will from then on be able to recognize running
   1.169 +   from any point of view, even strange points of view like above or
   1.170 +   underneath the runner. This is in contrast to action recognition
   1.171 +   schemes that try to identify actions using a non-embodied approach.
   1.172 +   If these systems learn about running as viewed from the side, they
   1.173 +   will not automatically be able to recognize running from any other
   1.174 +   viewpoint.
   1.175 +
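         +   In code, this factoring is just function composition. The sketch
         +   below assumes =aligner= and =recognizer= are supplied as functions;
         +   neither name corresponds to actual code in this thesis.
         +
         +   #+begin_src clojure
         +;; Sketch of the aligner/recognizer factoring. Both stages are passed
         +;; in as plain functions, so nothing here is tied to a particular
         +;; visual representation of the action.
         +(defn empathic-recognize
         +  "Align a model of your own body with the video, then interpret the
         +   action from the aligned, body-centered model."
         +  [aligner recognizer body-model video]
         +  (recognizer (aligner body-model video)))
         +   #+end_src
         +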
   1.176 +   Another powerful advantage is that using the language of multiple
   1.177 +   body-centered rich senses to describe body-centered actions offers a
   1.178 +   massive boost in descriptive capability. Consider how difficult it
   1.179 +   would be to compose a set of HOG filters to describe the action of
   1.180 +   a simple worm-creature ``curling'' so that its head touches its
   1.181 +   tail, and then behold the simplicity of describing this action in a
   1.182 +   language designed for the task (listing \ref{grand-circle-intro}):
   1.183 +
   1.184 +   #+caption: Body-centered actions are best expressed in a body-centered 
   1.185 +   #+caption: language. This code detects when the worm has curled into a 
   1.186 +   #+caption: full circle. Imagine how you would replicate this functionality
   1.187 +   #+caption: using low-level pixel features such as HOG filters!
   1.188 +   #+name: grand-circle-intro
   1.189 +   #+begin_listing clojure
   1.190 +   #+begin_src clojure
   1.191 +(defn grand-circle?
   1.192 +  "Does the worm form a majestic circle (one end touching the other)?"
   1.193 +  [experiences]
   1.194 +  (and (curled? experiences)
   1.195 +       (let [worm-touch (:touch (peek experiences))
   1.196 +             tail-touch (worm-touch 0)
   1.197 +             head-touch (worm-touch 4)]
   1.198 +         (and (< 0.2 (contact worm-segment-bottom-tip tail-touch))
   1.199 +              (< 0.2 (contact worm-segment-top-tip    head-touch))))))
   1.200 +   #+end_src
   1.201 +   #+end_listing
   1.202 +
   1.203 +** =EMPATH= recognizes actions using empathy
   1.204  
   1.205     First, I built a system for constructing virtual creatures with
   1.206     physiologically plausible sensorimotor systems and detailed
   1.207 @@ -129,85 +236,6 @@
   1.208     section \ref{sec-3}, and the key results of this experiment are
   1.209     summarized below.
   1.210  
   1.211 -  #+caption: From only \emph{proprioceptive} data, =EMPATH= was able to infer 
   1.212 -  #+caption: the complete sensory experience and classify these four poses.
   1.213 -  #+caption: The last image is a composite, depicting the intermediate stages of \emph{wriggling}.
   1.214 -  #+name: worm-recognition-intro-2
   1.215 -  #+ATTR_LaTeX: :width 15cm
   1.216 -   [[./images/empathy-1.png]]
   1.217 -
   1.218 -   # =CORTEX= provides a language for describing the sensorimotor
   1.219 -   # experiences of various creatures. 
   1.220 -
   1.221 -   # Next, I developed an experiment to test the power of =CORTEX='s
   1.222 -   # sensorimotor-centered language for solving recognition problems. As
   1.223 -   # a proof of concept, I wrote routines which enabled a simple
   1.224 -   # worm-like creature to infer the actions of a second worm-like
   1.225 -   # creature, using only its own previous sensorimotor experiences and
   1.226 -   # knowledge of the second worm's joints (figure
   1.227 -   # \ref{worm-recognition-intro-2}). The result of this proof of
   1.228 -   # concept was the program =EMPATH=, described in section
   1.229 -   # \ref{sec-3}. The key results of this
   1.230 -
   1.231 -   # Using only first-person sensorimotor experiences and third-person
   1.232 -   # proprioceptive data, 
   1.233 -
   1.234 -*** Key results
   1.235 -   - After one-shot supervised training, =EMPATH= was able recognize a
   1.236 -     wide variety of static poses and dynamic actions---ranging from
   1.237 -     curling in a circle to wriggling with a particular frequency ---
   1.238 -     with 95\% accuracy.
   1.239 -   - These results were completely independent of viewing angle
   1.240 -     because the underlying body-centered language fundamentally is
   1.241 -     independent; once an action is learned, it can be recognized
   1.242 -     equally well from any viewing angle.
   1.243 -   - =EMPATH= is surprisingly short; the sensorimotor-centered
   1.244 -     language provided by =CORTEX= resulted in extremely economical
   1.245 -     recognition routines --- about 0000 lines in all --- suggesting
   1.246 -     that such representations are very powerful, and often
   1.247 -     indispensible for the types of recognition tasks considered here.
   1.248 -   - Although for expediency's sake, I relied on direct knowledge of
   1.249 -     joint positions in this proof of concept, it would be
   1.250 -     straightforward to extend =EMPATH= so that it (more
   1.251 -     realistically) infers joint positions from its visual data.
   1.252 -
   1.253 -# because the underlying language is fundamentally orientation-independent
   1.254 -
   1.255 -# recognize the actions of a worm with 95\% accuracy. The
   1.256 -#      recognition tasks 
   1.257 -   
   1.258 -
   1.259 -
   1.260 -
   1.261 -   [Talk about these results and what you find promising about them]
   1.262 -
   1.263 -** Roadmap
   1.264 -   [I'm going to explain how =CORTEX= works, then break down how
   1.265 -   =EMPATH= does its thing. Because the details reveal such-and-such
   1.266 -   about the approach.]
   1.267 -
   1.268 -   # The success of this simple proof-of-concept offers a tantalizing
   1.269 -
   1.270 -
   1.271 -   # explore the idea 
   1.272 -   # The key contribution of this thesis is the idea that body-centered
   1.273 -   # representations (which express 
   1.274 -
   1.275 -
   1.276 -   # the
   1.277 -   # body-centered approach --- in which I try to determine what's
   1.278 -   # happening in a scene by bringing it into registration with my own
   1.279 -   # bodily experiences --- are indispensible for recognizing what
   1.280 -   # creatures are doing in a scene.
   1.281 -
   1.282 -* COMMENT
   1.283 -# body-centered language
   1.284 -   
   1.285 -   In this thesis, I'll describe =EMPATH=, which solves a certain
   1.286 -   class of recognition problems 
   1.287 -
   1.288 -   The key idea is to use self-centered (or first-person) language.
   1.289 -
   1.290     I have built a system that can express the types of recognition
   1.291     problems in a form amenable to computation. It is split into
   1.292     four parts:
   1.293 @@ -243,60 +271,6 @@
   1.294          retrieved, and if it is analogous enough to the scene, then
   1.295          the creature will correctly identify the action in the scene.
   1.296     
   1.297 -   For example, I think humans are able to label the cat video as
   1.298 -   ``drinking'' because they imagine /themselves/ as the cat, and
   1.299 -   imagine putting their face up against a stream of water and
   1.300 -   sticking out their tongue. In that imagined world, they can feel
   1.301 -   the cool water hitting their tongue, and feel the water entering
   1.302 -   their body, and are able to recognize that /feeling/ as drinking.
   1.303 -   So, the label of the action is not really in the pixels of the
   1.304 -   image, but is found clearly in a simulation inspired by those
   1.305 -   pixels. An imaginative system, having been trained on drinking and
   1.306 -   non-drinking examples and learning that the most important
   1.307 -   component of drinking is the feeling of water sliding down one's
   1.308 -   throat, would analyze a video of a cat drinking in the following
   1.309 -   manner:
   1.310 -   
   1.311 -   1. Create a physical model of the video by putting a ``fuzzy''
   1.312 -      model of its own body in place of the cat. Possibly also create
   1.313 -      a simulation of the stream of water.
   1.314 -
   1.315 -   2. Play out this simulated scene and generate imagined sensory
   1.316 -      experience. This will include relevant muscle contractions, a
   1.317 -      close up view of the stream from the cat's perspective, and most
   1.318 -      importantly, the imagined feeling of water entering the
   1.319 -      mouth. The imagined sensory experience can come from a
   1.320 -      simulation of the event, but can also be pattern-matched from
   1.321 -      previous, similar embodied experience.
   1.322 -
   1.323 -   3. The action is now easily identified as drinking by the sense of
   1.324 -      taste alone. The other senses (such as the tongue moving in and
   1.325 -      out) help to give plausibility to the simulated action. Note that
   1.326 -      the sense of vision, while critical in creating the simulation,
   1.327 -      is not critical for identifying the action from the simulation.
   1.328 -
   1.329 -   For the chair examples, the process is even easier:
   1.330 -
   1.331 -    1. Align a model of your body to the person in the image.
   1.332 -
   1.333 -    2. Generate proprioceptive sensory data from this alignment.
   1.334 -  
   1.335 -    3. Use the imagined proprioceptive data as a key to lookup related
   1.336 -       sensory experience associated with that particular proproceptive
   1.337 -       feeling.
   1.338 -
   1.339 -    4. Retrieve the feeling of your bottom resting on a surface, your
   1.340 -       knees bent, and your leg muscles relaxed.
   1.341 -
   1.342 -    5. This sensory information is consistent with the =sitting?=
   1.343 -       sensory predicate, so you (and the entity in the image) must be
   1.344 -       sitting.
   1.345 -
   1.346 -    6. There must be a chair-like object since you are sitting.
   1.347 -
   1.348 -   Empathy offers yet another alternative to the age-old AI
   1.349 -   representation question: ``What is a chair?'' --- A chair is the
   1.350 -   feeling of sitting.
   1.351  
   1.352    My program, =EMPATH=, uses this empathic problem solving technique
   1.353     to interpret the actions of a simple, worm-like creature. 
   1.354 @@ -313,52 +287,28 @@
   1.355     #+name: worm-recognition-intro
   1.356     #+ATTR_LaTeX: :width 15cm
   1.357     [[./images/worm-poses.png]]
   1.358 +
   1.359 +   #+caption: From only \emph{proprioceptive} data, =EMPATH= was able to infer 
   1.360 +   #+caption: the complete sensory experience and classify these four poses.
   1.361 +   #+caption: The last image is a composite, depicting the intermediate stages
   1.362 +   #+caption: of \emph{wriggling}.
   1.363 +   #+name: worm-recognition-intro-2
   1.364 +   #+ATTR_LaTeX: :width 15cm
   1.365 +   [[./images/empathy-1.png]]
   1.366     
   1.367 -   One powerful advantage of empathic problem solving is that it
   1.368 -   factors the action recognition problem into two easier problems. To
   1.369 -   use empathy, you need an /aligner/, which takes the video and a
   1.370 -   model of your body, and aligns the model with the video. Then, you
   1.371 -   need a /recognizer/, which uses the aligned model to interpret the
   1.372 -   action. The power in this method lies in the fact that you describe
   1.373 -   all actions form a body-centered viewpoint. You are less tied to
   1.374 -   the particulars of any visual representation of the actions. If you
   1.375 -   teach the system what ``running'' is, and you have a good enough
   1.376 -   aligner, the system will from then on be able to recognize running
   1.377 -   from any point of view, even strange points of view like above or
   1.378 -   underneath the runner. This is in contrast to action recognition
   1.379 -   schemes that try to identify actions using a non-embodied approach.
   1.380 -   If these systems learn about running as viewed from the side, they
   1.381 -   will not automatically be able to recognize running from any other
   1.382 -   viewpoint.
   1.383 -
   1.384 -   Another powerful advantage is that using the language of multiple
   1.385 -   body-centered rich senses to describe body-centerd actions offers a
   1.386 -   massive boost in descriptive capability. Consider how difficult it
   1.387 -   would be to compose a set of HOG filters to describe the action of
   1.388 -   a simple worm-creature ``curling'' so that its head touches its
   1.389 -   tail, and then behold the simplicity of describing thus action in a
   1.390 -   language designed for the task (listing \ref{grand-circle-intro}):
   1.391 -
   1.392 -   #+caption: Body-centerd actions are best expressed in a body-centered 
   1.393 -   #+caption: language. This code detects when the worm has curled into a 
   1.394 -   #+caption: full circle. Imagine how you would replicate this functionality
   1.395 -   #+caption: using low-level pixel features such as HOG filters!
   1.396 -   #+name: grand-circle-intro
   1.397 -   #+begin_listing clojure
   1.398 -   #+begin_src clojure
   1.399 -(defn grand-circle?
   1.400 -  "Does the worm form a majestic circle (one end touching the other)?"
   1.401 -  [experiences]
   1.402 -  (and (curled? experiences)
   1.403 -       (let [worm-touch (:touch (peek experiences))
   1.404 -             tail-touch (worm-touch 0)
   1.405 -             head-touch (worm-touch 4)]
   1.406 -         (and (< 0.2 (contact worm-segment-bottom-tip tail-touch))
   1.407 -              (< 0.2 (contact worm-segment-top-tip    head-touch))))))
   1.408 -   #+end_src
   1.409 -   #+end_listing
   1.410 -
   1.411 -**  =CORTEX= is a toolkit for building sensate creatures
   1.412 +   Next, I developed an experiment to test the power of =CORTEX='s
   1.413 +   sensorimotor-centered language for solving recognition problems. As
   1.414 +   a proof of concept, I wrote routines which enabled a simple
   1.415 +   worm-like creature to infer the actions of a second worm-like
   1.416 +   creature, using only its own previous sensorimotor experiences and
   1.417 +   knowledge of the second worm's joints (figure
   1.418 +   \ref{worm-recognition-intro-2}). The result of this proof of
   1.419 +   concept was the program =EMPATH=, described in section \ref{sec-3}.
   1.420 +
   1.421 +** =EMPATH= is built on =CORTEX=, an environment for making creatures
   1.422 +
   1.423 +   # =CORTEX= provides a language for describing the sensorimotor
   1.424 +   # experiences of various creatures. 
   1.425  
   1.426     I built =CORTEX= to be a general AI research platform for doing
   1.427     experiments involving multiple rich senses and a wide variety and
   1.428 @@ -412,9 +362,9 @@
   1.429     require a small layer of Java code. =CORTEX= also uses =bullet=, a
   1.430     physics simulator written in =C=.
   1.431  
   1.432 -   #+caption: Here is the worm from above modeled in Blender, a free 
   1.433 -   #+caption: 3D-modeling program. Senses and joints are described
   1.434 -   #+caption: using special nodes in Blender.
   1.435 +   #+caption: Here is the worm from figure \ref{worm-intro} modeled 
   1.436 +   #+caption: in Blender, a free 3D-modeling program. Senses and 
   1.437 +   #+caption: joints are described using special nodes in Blender.
   1.438     #+name: worm-recognition-intro
   1.439     #+ATTR_LaTeX: :width 12cm
   1.440     [[./images/blender-worm.png]]
   1.441 @@ -450,24 +400,7 @@
   1.442     \end{sidewaysfigure}
   1.443  #+END_LaTeX
   1.444  
   1.445 -** Road map
   1.446 -
   1.447 -   By the end of this thesis, you will have seen a novel approach to
   1.448 -  interpreting video using embodiment and empathy. You will have also
   1.449 -  seen one way to efficiently implement empathy for embodied
   1.450 -  creatures. Finally, you will become familiar with =CORTEX=, a system
   1.451 -  for designing and simulating creatures with rich senses, which you
   1.452 -  may choose to use in your own research.
   1.453 -  
   1.454 -  This is the core vision of my thesis: That one of the important ways
   1.455 -  in which we understand others is by imagining ourselves in their
   1.456 -  position and emphatically feeling experiences relative to our own
   1.457 -  bodies. By understanding events in terms of our own previous
   1.458 -  corporeal experience, we greatly constrain the possibilities of what
   1.459 -  would otherwise be an unwieldy exponential search. This extra
   1.460 -  constraint can be the difference between easily understanding what
   1.461 -  is happening in a video and being completely lost in a sea of
   1.462 -  incomprehensible color and movement.
   1.463 +** Contributions
   1.464  
   1.465     - I built =CORTEX=, a comprehensive platform for embodied AI
   1.466       experiments. =CORTEX= supports many features lacking in other
   1.467 @@ -476,14 +409,35 @@
   1.468  
   1.469     - I built =EMPATH=, which uses =CORTEX= to identify the actions of
   1.470       a worm-like creature using a computational model of empathy.
   1.471 -   
   1.472 +
   1.473 +   - After one-shot supervised training, =EMPATH= was able to recognize a
   1.474 +     wide variety of static poses and dynamic actions --- ranging from
   1.475 +     curling in a circle to wriggling with a particular frequency ---
   1.476 +     with 95\% accuracy.
   1.477 +
   1.478 +   - These results were completely independent of viewing angle
   1.479 +     because the underlying body-centered language is fundamentally
   1.480 +     viewpoint-independent; once an action is learned, it can be recognized
   1.481 +     equally well from any viewing angle.
   1.482 +
   1.483 +   - =EMPATH= is surprisingly short; the sensorimotor-centered
   1.484 +     language provided by =CORTEX= resulted in extremely economical
   1.485 +     recognition routines --- about 500 lines in all --- suggesting
   1.486 +     that such representations are very powerful, and often
   1.487 +     indispensable for the types of recognition tasks considered here.
   1.488 +
   1.489 +   - Although, for expediency's sake, I relied on direct knowledge of
   1.490 +     joint positions in this proof of concept, it would be
   1.491 +     straightforward to extend =EMPATH= so that it (more
   1.492 +     realistically) infers joint positions from its visual data.
   1.493  
   1.494  * Designing =CORTEX=
   1.495 +
   1.496    In this section, I outline the design decisions that went into
   1.497 -  making =CORTEX=, along with some details about its
   1.498 -  implementation. (A practical guide to getting started with =CORTEX=,
   1.499 -  which skips over the history and implementation details presented
   1.500 -  here, is provided in an appendix \ref{} at the end of this paper.)
   1.501 +  making =CORTEX=, along with some details about its implementation.
   1.502 +  (A practical guide to getting started with =CORTEX=, which skips
   1.503 +  over the history and implementation details presented here, is
   1.504 +  provided in an appendix at the end of this thesis.)
   1.505  
   1.506    Throughout this project, I intended for =CORTEX= to be flexible and
   1.507    extensible enough to be useful for other researchers who want to
   1.508 @@ -554,7 +508,7 @@
   1.509      doing everything in software is far cheaper than building custom
   1.510      real-time hardware. All you need is a laptop and some patience.
   1.511      
   1.512 -** Simulated time enables rapid prototyping and complex scenes 
   1.513 +** Simulated time enables rapid prototyping \& simple programs
   1.514  
   1.515     I envision =CORTEX= being used to support rapid prototyping and
   1.516     iteration of ideas. Even if I could put together a well constructed