     1 #+title: =CORTEX=

     2 #+author: Robert McIntyre

     3 #+email: rlm@mit.edu

     4 #+description: Using embodied AI to facilitate Artificial Imagination.

     5 #+keywords: AI, clojure, embodiment

     6 #+LaTeX_CLASS_OPTIONS: [nofloat]

     7 

     8 * Empathy and Embodiment as problem solving strategies

     9   

    10   By the end of this thesis, you will have seen a novel approach to

    11   interpreting video using embodiment and empathy. You will have also

    12   seen one way to efficiently implement empathy for embodied

    13   creatures. Finally, you will become familiar with =CORTEX=, a system

    14   for designing and simulating creatures with rich senses, which you

    15   may choose to use in your own research.

    16   

    17   This is the core vision of my thesis: That one of the important ways

    18   in which we understand others is by imagining ourselves in their

    19   position and empathically feeling experiences relative to our own

    20   bodies. By understanding events in terms of our own previous

    21   corporeal experience, we greatly constrain the possibilities of what

    22   would otherwise be an unwieldy exponential search. This extra

    23   constraint can be the difference between easily understanding what

    24   is happening in a video and being completely lost in a sea of

    25   incomprehensible color and movement.

    26 

    27 ** Recognizing actions in video is extremely difficult

    28 

    29    Consider for example the problem of determining what is happening

    30    in a video of which this is one frame:

    31 

    32    #+caption: A cat drinking some water. Identifying this action is 

    33    #+caption: beyond the state of the art for computers.

    34    #+ATTR_LaTeX: :width 7cm

    35    [[./images/cat-drinking.jpg]]

    36    

    37    It is currently impossible for any computer program to reliably

    38    label such a video as ``drinking''. And rightly so -- it is a very

    39    hard problem! What features can you describe in terms of low level

    40    functions of pixels that can even begin to describe at a high level

    41    what is happening here?

    42   

    43    Or suppose that you are building a program that recognizes chairs.

    44    How could you ``see'' the chair in figure \ref{hidden-chair}?

    45    

    46    #+caption: The chair in this image is quite obvious to humans, but I 

    47    #+caption: doubt that any modern computer vision program can find it.

    48    #+name: hidden-chair

    49    #+ATTR_LaTeX: :width 10cm

    50    [[./images/fat-person-sitting-at-desk.jpg]]

    51    

    52    Finally, how is it that you can easily tell the difference between

    53    how the girl's /muscles/ are working in figure \ref{girl}?

    54    

    55    #+caption: The mysterious ``common sense'' appears here as you are able 

    56    #+caption: to discern the difference in how the girl's arm muscles

    57    #+caption: are activated between the two images.

    58    #+name: girl

    59    #+ATTR_LaTeX: :width 7cm

    60    [[./images/wall-push.png]]

    61   

    62    Each of these examples tells us something about what might be going

    63    on in our minds as we easily solve these recognition problems.

    64    

    65    The hidden chairs show us that we are strongly triggered by cues

    66    relating to the position of human bodies, and that we can determine

    67    the overall physical configuration of a human body even if much of

    68    that body is occluded.

    69 

    70    The picture of the girl pushing against the wall tells us that we

    71    have common sense knowledge about the kinetics of our own bodies.

    72    We know well how our muscles would have to work to maintain us in

    73    most positions, and we can easily project this self-knowledge to

    74    imagined positions triggered by images of the human body.

    75 

    76 ** =EMPATH= neatly solves recognition problems  

    77    

    78    I propose a system that can express the types of recognition

    79    problems above in a form amenable to computation. It is split into

    80    four parts:

    81 

    82    - Free/Guided Play :: The creature moves around and experiences the

    83         world through its unique perspective. Many otherwise

    84         complicated actions are easily described in the language of a

    85         full suite of body-centered, rich senses. For example,

    86         drinking is the feeling of water sliding down your throat, and

    87         cooling your insides. It's often accompanied by bringing your

    88         hand close to your face, or bringing your face close to water.

    89         Sitting down is the feeling of bending your knees, activating

    90         your quadriceps, then feeling a surface with your bottom and

    91         relaxing your legs. These body-centered action descriptions

    92         can be either learned or hard coded.

    93    - Posture Imitation :: When trying to interpret a video or image,

    94         the creature takes a model of itself and aligns it with

    95         whatever it sees. This alignment can even cross species, as

    96         when humans try to align themselves with things like ponies,

    97         dogs, or other humans with a different body type.

    98    - Empathy         :: The alignment triggers associations with

    99         sensory data from prior experiences. For example, the

   100         alignment itself easily maps to proprioceptive data. Any

   101         sounds or obvious skin contact in the video can to a lesser

   102         extent trigger previous experience. Segments of previous

   103         experiences are stitched together to form a coherent and

   104         complete sensory portrait of the scene.

   105    - Recognition      :: With the scene described in terms of first

   106         person sensory events, the creature can now run its

   107         action-identification programs on this synthesized sensory

   108         data, just as it would if it were actually experiencing the

   109         scene first-hand. If previous experience has been accurately

   110         retrieved, and if it is analogous enough to the scene, then

   111         the creature will correctly identify the action in the scene.

   112    

   113    For example, I think humans are able to label the cat video as

   114    ``drinking'' because they imagine /themselves/ as the cat, and

   115    imagine putting their face up against a stream of water and

   116    sticking out their tongue. In that imagined world, they can feel

   117    the cool water hitting their tongue, and feel the water entering

   118    their body, and are able to recognize that /feeling/ as drinking.

   119    So, the label of the action is not really in the pixels of the

   120    image, but is found clearly in a simulation inspired by those

   121    pixels. An imaginative system, having been trained on drinking and

   122    non-drinking examples and learning that the most important

   123    component of drinking is the feeling of water sliding down one's

   124    throat, would analyze a video of a cat drinking in the following

   125    manner:

   126    

   127    1. Create a physical model of the video by putting a ``fuzzy''

   128       model of its own body in place of the cat. Possibly also create

   129       a simulation of the stream of water.

   130 

   131    2. Play out this simulated scene and generate imagined sensory

   132       experience. This will include relevant muscle contractions, a

   133       close up view of the stream from the cat's perspective, and most

   134       importantly, the imagined feeling of water entering the

   135       mouth. The imagined sensory experience can come from a

   136       simulation of the event, but can also be pattern-matched from

   137       previous, similar embodied experience.

   138 

   139    3. The action is now easily identified as drinking by the sense of

   140       taste alone. The other senses (such as the tongue moving in and

   141       out) help to give plausibility to the simulated action. Note that

   142       the sense of vision, while critical in creating the simulation,

   143       is not critical for identifying the action from the simulation.

   144 

   145    For the chair examples, the process is even easier:

   146 

   147     1. Align a model of your body to the person in the image.

   148 

   149     2. Generate proprioceptive sensory data from this alignment.

   150   

   151     3. Use the imagined proprioceptive data as a key to look up related

   152        sensory experience associated with that particular proprioceptive

   153        feeling.

   154 

   155     4. Retrieve the feeling of your bottom resting on a surface, your

   156        knees bent, and your leg muscles relaxed.

   157 

   158     5. This sensory information is consistent with the =sitting?=

   159        sensory predicate, so you (and the entity in the image) must be

   160        sitting.

   161 

   162     6. There must be a chair-like object since you are sitting.
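
   The six steps above can be sketched as a lookup followed by a
   predicate. This is only an illustration under stated assumptions:
   the keys =:joint-angles=, =:bottom-contact=, and =:leg-tension=,
   the two-entry experience store, and the functions
   =nearest-experience=, =sitting?=, and =chair-present?= are all
   hypothetical and are not part of =EMPATH=.

   #+begin_src clojure
;; Hypothetical store of remembered full-body experiences.
(def past-experiences
  [{:joint-angles [0.0 0.0] :bottom-contact 0.0 :leg-tension 0.8}   ; standing
   {:joint-angles [1.5 1.5] :bottom-contact 0.9 :leg-tension 0.1}]) ; sitting

(defn squared-distance
  "Sum of squared differences between two equal-length vectors."
  [a b]
  (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) a b)))

(defn nearest-experience
  "Steps 3 and 4: use imagined proprioception as a key to retrieve the
   most similar remembered experience."
  [joint-angles]
  (apply min-key
         #(squared-distance joint-angles (:joint-angles %))
         past-experiences))

(defn sitting?
  "Step 5: a hypothetical sensory predicate. True when the bottom
   feels a surface and the leg muscles are relaxed."
  [experience]
  (and (< 0.5 (:bottom-contact experience))
       (> 0.5 (:leg-tension experience))))

(defn chair-present?
  "Step 6: if the retrieved experience feels like sitting, infer that a
   chair-like object is present."
  [aligned-joint-angles]
  (sitting? (nearest-experience aligned-joint-angles)))

;; Joint angles as they might come out of steps 1 and 2 (made-up values).
(chair-present? [1.4 1.6])
   #+end_src

   Note that nothing in the sketch inspects pixels. Vision is needed
   only to produce the aligned joint angles.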

   163 

   164    Empathy offers yet another alternative to the age-old AI

   165    representation question: ``What is a chair?'' --- A chair is the

   166    feeling of sitting.

   167 

   168    My program, =EMPATH=, uses this empathic problem solving technique

   169    to interpret the actions of a simple, worm-like creature. 

   170    

   171    #+caption: The worm performs many actions during free play such as 

   172    #+caption: curling, wiggling, and resting.

   173    #+name: worm-intro

   174    #+ATTR_LaTeX: :width 15cm

   175    [[./images/worm-intro-white.png]]

   176 

   177    #+caption: =EMPATH= recognized and classified each of these poses by

   178    #+caption: inferring the complete sensory experience from 

   179    #+caption: proprioceptive data.

   180    #+name: worm-recognition-intro

   181    #+ATTR_LaTeX: :width 15cm

   182    [[./images/worm-poses.png]]

   183    

   184    One powerful advantage of empathic problem solving is that it

   185    factors the action recognition problem into two easier problems. To

   186    use empathy, you need an /aligner/, which takes the video and a

   187    model of your body, and aligns the model with the video. Then, you

   188    need a /recognizer/, which uses the aligned model to interpret the

   189    action. The power in this method lies in the fact that you describe

   190    all actions from a body-centered viewpoint. You are less tied to

   191    the particulars of any visual representation of the actions. If you

   192    teach the system what ``running'' is, and you have a good enough

   193    aligner, the system will from then on be able to recognize running

   194    from any point of view, even strange points of view like above or

   195    underneath the runner. This is in contrast to action recognition

   196    schemes that try to identify actions using a non-embodied approach.

   197    If these systems learn about running as viewed from the side, they

   198    will not automatically be able to recognize running from any other

   199    viewpoint.

   200 

   201    Another powerful advantage is that using the language of multiple

   202    body-centered rich senses to describe body-centered actions offers a

   203    massive boost in descriptive capability. Consider how difficult it

   204    would be to compose a set of HOG filters to describe the action of

   205    a simple worm-creature ``curling'' so that its head touches its

   206    tail, and then behold the simplicity of describing this action in a

   207    language designed for the task (listing \ref{grand-circle-intro}):

   208 

   209    #+caption: Body-centered actions are best expressed in a body-centered 

   210    #+caption: language. This code detects when the worm has curled into a 

   211    #+caption: full circle. Imagine how you would replicate this functionality

   212    #+caption: using low-level pixel features such as HOG filters!

   213    #+name: grand-circle-intro

   214    #+attr_latex: [htpb]

   215 #+begin_listing clojure

   216    #+begin_src clojure

   217 (defn grand-circle?

   218   "Does the worm form a majestic circle (one end touching the other)?"

   219   [experiences]

   220   (and (curled? experiences)

   221        (let [worm-touch (:touch (peek experiences))

   222              tail-touch (worm-touch 0)

   223              head-touch (worm-touch 4)]

   224          (and (< 0.55 (contact worm-segment-bottom-tip tail-touch))

   225               (< 0.55 (contact worm-segment-top-tip    head-touch))))))

   226    #+end_src

   227    #+end_listing

   228 

   229 

   230 **  =CORTEX= is a toolkit for building sensate creatures

   231 

   232    I built =CORTEX= to be a general AI research platform for doing

   233    experiments involving multiple rich senses and a wide variety and

   234    number of creatures. I intend it to be useful as a library for many

   235    more projects than just this one. =CORTEX= was necessary to meet a

   236    need among AI researchers at CSAIL and beyond, which is that people

   237    often will invent neat ideas that are best expressed in the

   238    language of creatures and senses, but in order to explore those

   239    ideas they must first build a platform in which they can create

   240    simulated creatures with rich senses! There are many ideas that

   241    would be simple to execute (such as =EMPATH=), but attached to them

   242    is the multi-month effort to make a good creature simulator. Often,

   243    that initial investment of time proves to be too much, and the

   244    project must make do with a lesser environment.

   245 

   246    =CORTEX= is well suited as an environment for embodied AI research

   247    for three reasons:

   248 

   249    - You can create new creatures using Blender, a popular 3D modeling

   250      program. Each sense can be specified using special blender nodes

   251      with biologically inspired parameters. You need not write any

   252      code to create a creature, and can use a wide library of

   253      pre-existing blender models as a base for your own creatures.

   254 

   255    - =CORTEX= implements a wide variety of senses, including touch,

   256      proprioception, vision, hearing, and muscle tension. Complicated

   257      senses like touch and vision involve multiple sensory elements

   258      embedded in a 2D surface. You have complete control over the

   259      distribution of these sensor elements through the use of simple

   260      png image files. In particular, =CORTEX= implements more

   261      comprehensive hearing than any other creature simulation system

   262      available. 

   263 

   264    - =CORTEX= supports any number of creatures and any number of

   265      senses. Time in =CORTEX= dilates so that the simulated creatures

   266      always perceive a perfectly smooth flow of time, regardless of

   267      the actual computational load.
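
   The time dilation described in the last point can be illustrated
   with a fixed-timestep loop. This is a minimal sketch, assuming a
   hypothetical =step= function and a made-up rate of 60 frames per
   simulated second. It does not use the jMonkeyEngine or =CORTEX=
   API.

   #+begin_src clojure
;; Simulated seconds per frame (assumed value).
(def dt (/ 1 60))

(defn step
  "Hypothetical world update: advance simulated time by exactly dt,
   however long the computation takes on the wall clock."
  [world]
  ;; Stand-in for a variable computational load of 0 to 4 milliseconds.
  (Thread/sleep (long (rand-int 5)))
  (update world :sim-time + dt))

(defn run-frames
  "Apply step to world n times and return the final world."
  [world n]
  (reduce (fn [w _] (step w)) world (range n)))

;; 120 frames always amount to the same simulated duration,
;; no matter how long each frame took to compute.
(:sim-time (run-frames {:sim-time 0} 120))
   #+end_src

   Because the creatures only ever observe =:sim-time=, a heavier load
   makes the simulation run slower than real time without changing
   what the creatures sense.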

   268 

   269    =CORTEX= is built on top of =jMonkeyEngine3=, which is a video game

   270    engine designed to create cross-platform 3D desktop games. =CORTEX=

   271    is mainly written in clojure, a dialect of =LISP= that runs on the

   272    java virtual machine (JVM). The API for creating and simulating

   273    creatures and senses is entirely expressed in clojure, though many

   274    senses are implemented at the layer of jMonkeyEngine or below. For

   275    example, for the sense of hearing I use a layer of clojure code on

   276    top of a layer of java JNI bindings that drive a layer of =C++=

   277    code which implements a modified version of =OpenAL= to support

   278    multiple listeners. =CORTEX= is the only simulation environment

   279    that I know of that can support multiple entities that can each

   280    hear the world from their own perspective. Other senses also

   281    require a small layer of Java code. =CORTEX= also uses =bullet=, a

   282    physics simulator written in =C++=.

   283 

   284    #+caption: Here is the worm from above modeled in Blender, a free 

   285    #+caption: 3D-modeling program. Senses and joints are described

   286    #+caption: using special nodes in Blender.

   287    #+name: worm-recognition-intro

   288    #+ATTR_LaTeX: :width 12cm

   289    [[./images/blender-worm.png]]

   290 

   291    Here are some things I anticipate that =CORTEX= might be used for:

   292 

   293    - exploring new ideas about sensory integration

   294    - distributed communication among swarm creatures

   295    - self-learning using free exploration

   296    - evolutionary algorithms involving creature construction

   297    - exploration of exotic senses and effectors that are not possible

   298      in the real world (such as telekinesis or a semantic sense)

   299    - imagination using subworlds

   300 

   301    During one test with =CORTEX=, I created 3,000 creatures each with

   302    their own independent senses and ran them all at only 1/80 real

   303    time. In another test, I created a detailed model of my own hand,

   304    equipped with a realistic distribution of touch (more sensitive at

   305    the fingertips), as well as eyes and ears, and it ran at around 1/4

   306    real time.

   307 

   308 #+BEGIN_LaTeX

   309    \begin{sidewaysfigure}

   310    \includegraphics[width=9.5in]{images/full-hand.png}

   311    \caption{

   312    I modeled my own right hand in Blender and rigged it with all the

   313    senses that {\tt CORTEX} supports. My simulated hand has a

   314    biologically inspired distribution of touch sensors. The senses are

   315    displayed on the right, and the simulation is displayed on the

   316    left. Notice that my hand is curling its fingers, that it can see

   317    its own finger from the eye in its palm, and that it can feel its

   318    own thumb touching its palm.}

   319    \end{sidewaysfigure}

   320 #+END_LaTeX

   321 

   322 ** Contributions

   323 

   324    - I built =CORTEX=, a comprehensive platform for embodied AI

   325      experiments. =CORTEX= supports many features lacking in other

   326      systems, such as proper simulation of hearing. It is easy to create

   327      new =CORTEX= creatures using Blender, a free 3D modeling program.

   328 

   329    - I built =EMPATH=, which uses =CORTEX= to identify the actions of

   330      a worm-like creature using a computational model of empathy.

   331    

   332 * Building =CORTEX=

   333 

   334 ** To explore embodiment, we need a world, body, and senses

   335 

   336 ** Because of Time, simulation is preferable to reality

   337 

   338 ** Video game engines are a great starting point

   339 

   340 ** Bodies are composed of segments connected by joints

   341 

   342 ** Eyes reuse standard video game components

   343 

   344 ** Hearing is hard; =CORTEX= does it right

   345 

   346 ** Touch uses hundreds of hair-like elements

   347 

   348 ** Proprioception is the sense that makes everything ``real''

   349 

   350 ** Muscles are both effectors and sensors

   351 

   352 ** =CORTEX= brings complex creatures to life!

   353 

   354 ** =CORTEX= enables many possibilities for further research

   355 

   356 * Empathy in a simulated worm

   357 

   358   Here I develop a computational model of empathy, using =CORTEX= as a

   359   base. Empathy in this context is the ability to observe another

   360   creature and infer what sorts of sensations that creature is

   361   feeling. My empathy algorithm involves multiple phases. First is

   362   free-play, where the creature moves around and gains sensory

   363   experience. From this experience I construct a representation of the

   364   creature's sensory state space, which I call \Phi-space. Using

   365   \Phi-space, I construct an efficient function which takes the

   366   limited data that comes from observing another creature and enriches

   367   it with a full complement of imagined sensory data. I can then use the

   368   imagined sensory data to recognize what the observed creature is

   369   doing and feeling, using straightforward embodied action predicates.

   370   This is all demonstrated using a simple worm-like creature, and

   371   recognizing worm-actions based on limited data.

   372 

   373   #+caption: Here is the worm with which we will be working. 

   374   #+caption: It is composed of 5 segments. Each segment has a 

   375   #+caption: pair of extensor and flexor muscles. Each of the 

   376   #+caption: worm's four joints is a hinge joint which allows 

   377   #+caption: about 30 degrees of rotation to either side. Each segment

   378   #+caption: of the worm is touch-capable and has a uniform 

   379   #+caption: distribution of touch sensors on each of its faces.

   380   #+caption: Each joint has a proprioceptive sense to detect 

   381   #+caption: relative positions. The worm segments are all the 

   382   #+caption: same except for the first one, which has a much

   383   #+caption: higher weight than the others to allow for easy 

   384   #+caption: manual motor control.

   385   #+name: basic-worm-view

   386   #+ATTR_LaTeX: :width 10cm

   387   [[./images/basic-worm-view.png]]

   388 

   389   #+caption: Program for reading a worm from a blender file and 

   390   #+caption: outfitting it with the senses of proprioception, 

   391   #+caption: touch, and the ability to move, as specified in the 

   392   #+caption: blender file.

   393   #+name: get-worm

   394   #+begin_listing clojure

   395   #+begin_src clojure

   396 (defn worm []

   397   (let [model (load-blender-model "Models/worm/worm.blend")]

   398     {:body (doto model (body!))

   399      :touch (touch! model)

   400      :proprioception (proprioception! model)

   401      :muscles (movement! model)}))

   402   #+end_src

   403   #+end_listing

   404 

   405 ** Embodiment factors action recognition into manageable parts

   406 

   407    Using empathy, I divide the problem of action recognition into a

   408    recognition process expressed in the language of a full complement

   409    of senses, and an imaginative process that generates full sensory

   410    data from partial sensory data. Splitting the action recognition

   411    problem in this manner greatly reduces the total amount of work to

   412    recognize actions: The imaginative process is mostly just matching

   413    previous experience, and the recognition process gets to use all

   414    the senses to directly describe any action.

   415 

   416 ** Action recognition is easy with a full gamut of senses

   417 

   418    Embodied representations using multiple senses such as touch,

   419    proprioception, and muscle tension turn out to be exceedingly

   420    efficient at describing body-centered actions. It is the ``right

   421    language for the job''. For example, it takes only around 5 lines

   422    of LISP code to describe the action of ``curling'' using embodied

   423    primitives. It takes about 10 lines to describe the seemingly

   424    complicated action of wiggling.

   425 

   426    The following action predicates each take a stream of sensory

   427    experience, observe however much of it they desire, and decide

   428    whether the worm is doing the action they describe. =curled?=

   429    relies on proprioception, =resting?= relies on touch, =wiggling?=

   430    relies on a Fourier analysis of muscle contraction, and

   431    =grand-circle?= relies on touch and reuses =curled?= as a guard.

   432    

   433    #+caption: Program for detecting whether the worm is curled. This is the 

   434    #+caption: simplest action predicate, because it only uses the last frame 

   435    #+caption: of sensory experience, and only uses proprioceptive data. Even 

   436    #+caption: this simple predicate, however, is automatically frame 

   437    #+caption: independent and ignores vermopomorphic differences such as 

   438    #+caption: worm textures and colors.

   439    #+name: curled

   440    #+attr_latex: [htpb]

   441 #+begin_listing clojure

   442    #+begin_src clojure

   443 (defn curled?

   444   "Is the worm curled up?"

   445   [experiences]

   446   (every?

   447    (fn [[_ _ bend]]

   448      (> (Math/sin bend) 0.64))

   449    (:proprioception (peek experiences))))

   450    #+end_src

   451    #+end_listing
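
   To see what this predicate responds to, it can be run on hand-made
   experience frames. In the sketch below the definition of =curled?=
   is repeated from the listing so that the block is self-contained,
   and the two frames are invented for illustration. I assume that
   each proprioceptive entry is a triple whose third element is the
   bend angle in radians, as the destructuring suggests. The first two
   elements are placeholders.

   #+begin_src clojure
(defn curled?
  "Is the worm curled up?"
  [experiences]
  (every?
   (fn [[_ _ bend]]
     (> (Math/sin bend) 0.64))
   (:proprioception (peek experiences))))

;; One triple per joint. Only the third element is read by curled?.
(def flat-frame
  {:proprioception [[0 0 0.0] [0 0 0.1] [0 0 0.0] [0 0 0.05]]})

(def curled-frame
  {:proprioception [[0 0 1.0] [0 0 0.9] [0 0 1.1] [0 0 0.8]]})

(curled? [flat-frame])              ; false: no joint is bent far enough
(curled? [flat-frame curled-frame]) ; true: only the last frame is inspected
   #+end_src

   Since =peek= returns the last element of a vector, earlier frames
   have no influence on the result.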

   452 

   453    #+caption: Program for summarizing the touch information in a patch 

   454    #+caption: of skin.

   455    #+name: touch-summary

   456    #+attr_latex: [htpb]

   457 

   458 #+begin_listing clojure

   459    #+begin_src clojure

   460 (defn contact

   461   "Determine how much contact a particular worm segment has with

   462    other objects. Returns a value between 0 and 1, where 1 is full

   463    contact and 0 is no contact."

   464   [touch-region [coords contact :as touch]]

   465   (-> (zipmap coords contact)

   466       (select-keys touch-region)

   467       (vals)

   468       (#(map first %))

   469       (average)

   470       (* 10)

   471       (- 1)

   472       (Math/abs)))

   473    #+end_src

   474    #+end_listing
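
   A small worked example may make the arithmetic in =contact= easier
   to follow. The helpers =average= and =rect-region= are not shown in
   this section, so the versions below are hypothetical stand-ins, and
   the touch data is invented. The arithmetic implies that the first
   element of each touch reading runs from 0 (fully pressed) to 0.1
   (untouched). That reading of the data is my assumption.

   #+begin_src clojure
(defn average
  "Hypothetical helper: arithmetic mean of a non-empty collection."
  [coll]
  (/ (reduce + coll) (count coll)))

(defn rect-region
  "Hypothetical helper: every integer [x y] coordinate in the rectangle
   with corners [x0 y0] and [x1 y1], inclusive."
  [[x0 y0] [x1 y1]]
  (for [x (range x0 (inc x1))
        y (range y0 (inc y1))]
    [x y]))

;; Repeated from the listing above.
(defn contact
  [touch-region [coords contact :as touch]]
  (-> (zipmap coords contact)
      (select-keys touch-region)
      (vals)
      (#(map first %))
      (average)
      (* 10)
      (- 1)
      (Math/abs)))

;; Three sensors. Two lie inside the region of interest, one outside.
(def coords [[0 0] [0 1] [5 5]])
(def region (rect-region [0 0] [1 1]))

;; Each reading is a vector whose first element is used by contact.
(def pressed  [coords [[0.0 0] [0.0 0] [0.1 0]]])
(def hovering [coords [[0.1 0] [0.1 0] [0.0 0]]])

(contact region pressed)  ; close to 1: both sensors in the region are pressed
(contact region hovering) ; close to 0: neither sensor in the region is pressed
   #+end_src

   The sensor at =[5 5]= lies outside the region, so =select-keys=
   drops it and its reading does not affect either result.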

   475 

   476 

   477    #+caption: Program for detecting whether the worm is at rest. This program

   478    #+caption: uses a summary of the tactile information from the underbelly 

   479    #+caption: of the worm, and is only true if every segment is touching the 

   480    #+caption: floor. Note that this function contains no references to 

   481    #+caption: proprioception at all.

   482    #+name: resting

   483    #+attr_latex: [htpb]

   484 #+begin_listing clojure

   485    #+begin_src clojure

   486 (def worm-segment-bottom (rect-region [8 15] [14 22]))

   487 

   488 (defn resting?

   489   "Is the worm resting on the ground?"

   490   [experiences]

   491   (every?

   492    (fn [touch-data]

   493      (< 0.9 (contact worm-segment-bottom touch-data)))

   494    (:touch (peek experiences))))

   495    #+end_src

   496    #+end_listing

   497 

   498    #+caption: Program for detecting whether the worm is curled up into a 

   499    #+caption: full circle. Here the embodied approach begins to shine, as

   500    #+caption: I am able to both use a previous action predicate (=curled?=)

   501    #+caption: as well as the direct tactile experience of the head and tail.

   502    #+name: grand-circle

   503    #+attr_latex: [htpb]

   504 #+begin_listing clojure

   505    #+begin_src clojure

   506 (def worm-segment-bottom-tip (rect-region [15 15] [22 22]))

   507 

   508 (def worm-segment-top-tip (rect-region [0 15] [7 22]))

   509 

   510 (defn grand-circle?

   511   "Does the worm form a majestic circle (one end touching the other)?"

   512   [experiences]

   513   (and (curled? experiences)

   514        (let [worm-touch (:touch (peek experiences))

   515              tail-touch (worm-touch 0)

   516              head-touch (worm-touch 4)]

   517          (and (< 0.55 (contact worm-segment-bottom-tip tail-touch))

   518               (< 0.55 (contact worm-segment-top-tip    head-touch))))))

   519    #+end_src

   520    #+end_listing

   521 

   522 

   523    #+caption: Program for detecting whether the worm has been wiggling for 

   524    #+caption: the last few frames. It uses a Fourier analysis of the muscle 

   525    #+caption: contractions of the worm's tail to determine wiggling. This is 

   526    #+caption: significant because there is no particular frame that clearly 

   527    #+caption: indicates that the worm is wiggling --- only when multiple frames 

   528    #+caption: are analyzed together is the wiggling revealed. Defining 

   529    #+caption: wiggling this way also gives the worm an opportunity to learn 

   530    #+caption: and recognize ``frustrated wiggling'', where the worm tries to 

   531    #+caption: wiggle but can't. Frustrated wiggling is very visually different 

   532    #+caption: from actual wiggling, but this definition gives it to us for free.

   533    #+name: wiggling

   534    #+attr_latex: [htpb]

   535 #+begin_listing clojure

   536    #+begin_src clojure

   537 (defn fft [nums]

   538   (map

   539    #(.getReal %)

   540    (.transform

   541     (FastFourierTransformer. DftNormalization/STANDARD)

   542     (double-array nums) TransformType/FORWARD)))

   543 

   544 (def indexed (partial map-indexed vector))

   545 

   546 (defn max-indexed [s]

   547   (first (sort-by (comp - second) (indexed s))))

   548 

   549 (defn wiggling?

   550   "Is the worm wiggling?"

   551   [experiences]

   552   (let [analysis-interval 0x40]

   553     (when (> (count experiences) analysis-interval)

   554       (let [a-flex 3

   555             a-ex   2

   556             muscle-activity

   557             (map :muscle (vector:last-n experiences analysis-interval))

   558             base-activity

   559             (map #(- (% a-flex) (% a-ex)) muscle-activity)]

   560         (= 2

   561            (first

   562             (max-indexed

   563              (map #(Math/abs %)

   564                   (take 20 (fft base-activity))))))))))

   565    #+end_src

   566    #+end_listing
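
   The test at the heart of =wiggling?= asks whether the strongest of
   the first 20 frequency bins is bin 2, that is, whether the muscle
   signal completes about two cycles per 64-frame window. The sketch
   below reproduces that test on a synthetic signal. It replaces the
   library transform with =dft-real=, a naive stand-in that I wrote
   for illustration, so that the block needs no external dependency.
   The signal =two-cycles= is likewise invented. Like the listing, it
   keeps only the real part of the transform, which captures the
   cosine-phase component of the signal.

   #+begin_src clojure
(defn dft-real
  "Real parts of the discrete Fourier transform of nums, computed
   naively in quadratic time. A stand-in for the library FFT."
  [nums]
  (let [n (count nums)]
    (for [k (range n)]
      (reduce +
              (map-indexed
               (fn [i x] (* x (Math/cos (/ (* 2 Math/PI k i) n))))
               nums)))))

;; Repeated from the listing above.
(def indexed (partial map-indexed vector))

(defn max-indexed [s]
  (first (sort-by (comp - second) (indexed s))))

;; 64 samples of a cosine that completes exactly two cycles.
(def two-cycles
  (map #(Math/cos (/ (* 2 Math/PI 2 %) 64)) (range 64)))

;; Index of the dominant bin among the first 20.
(first
 (max-indexed
  (map #(Math/abs %)
       (take 20 (dft-real two-cycles)))))
   #+end_src

   For this signal the dominant bin is bin 2, which is the condition
   =wiggling?= checks for.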

   567 

   568    With these action predicates, I can now recognize the actions of

   569    the worm while it is moving under my control and I have access to

   570    all the worm's senses.

   571 

   572    #+caption: Use the action predicates defined earlier to report on 

   573    #+caption: what the worm is doing while in simulation.

   574    #+name: report-worm-activity

   575    #+attr_latex: [htpb]

   576 #+begin_listing clojure

   577    #+begin_src clojure

   578 (defn debug-experience

   579   [experiences text]

   580   (cond

   581    (grand-circle? experiences) (.setText text "Grand Circle")

   582    (curled? experiences)       (.setText text "Curled")

   583    (wiggling? experiences)     (.setText text "Wiggling")

   584    (resting? experiences)      (.setText text "Resting")))

   585    #+end_src

   586    #+end_listing

   587 

   588    #+caption: Using =debug-experience=, the body-centered predicates

   589    #+caption: work together to classify the behaviour of the worm. 

   590    #+caption: The predicates are operating with access to the worm's

   591    #+caption: full sensory data.

   592    #+name: basic-worm-view

   593    #+ATTR_LaTeX: :width 10cm

   594    [[./images/worm-identify-init.png]]

   595 

   596    These action predicates satisfy the recognition requirement of an

   597    empathic recognition system. There is power in the simplicity of

   598    the action predicates. They describe their actions without getting

   599    confused in visual details of the worm. Each one is frame

   600    independent, but more than that, they are each independent of

   601    irrelevant visual details of the worm and the environment. They

   602    will work regardless of whether the worm is a different color or

   603    heavily textured, or if the environment has strange lighting.

   604 

   605    The trick now is to make the action predicates work even when the

   606    sensory data on which they depend is absent. If I can do that, then

   607    I will have gained much.

   608 

   609 ** \Phi-space describes the worm's experiences

   610    

   611    As a first step towards building empathy, I need to gather all of

   612    the worm's experiences during free play. I use a simple vector to

   613    store all the experiences. 

   614 

   615    Each element of the experience vector exists in the vast space of

   616    all possible worm-experiences. Most of this vast space is actually

   617    unreachable due to physical constraints of the worm's body. For

   618    example, the worm's segments are connected by hinge joints that put

   619    a practical limit on the worm's range of motion without limiting

   620    its degrees of freedom. Some groupings of senses are impossible;

   621    the worm cannot be bent into a circle so that its ends are

   622    touching and at the same time not also experience the sensation of

   623    touching itself.

   624 

   625    As the worm moves around during free play and its experience vector

   626    grows larger, the vector begins to define a subspace containing all

   627    the sensations the worm can practically experience during normal

   628    operation. I call this subspace \Phi-space, short for

   629    physical-space. The experience vector defines a path through

   630    \Phi-space. This path has interesting properties that all derive

   631    from physical embodiment. The proprioceptive components are

   632    completely smooth, because in order for the worm to move from one

   633    position to another, it must pass through the intermediate

   634    positions. The path invariably forms loops as actions are repeated.

   635    Finally and most importantly, proprioception actually supports very

   636    strong inferences about the other senses. For example, when the worm

   637    is flat, you can infer that it is touching the ground and that its

   638    muscles are not active, because if the muscles were active, the

   639    worm would be moving and would not be perfectly flat. In order to

   640    stay flat, the worm has to be touching the ground, or it would

   641    again be moving out of the flat position due to gravity. If the

   642    worm is positioned in such a way that it interacts with itself,

   643    then it is very likely to be feeling the same tactile sensations as

   644    the last time it was in that position, because it has the same body

   645    as then. If you observe multiple frames of proprioceptive data,

   646    then you can become increasingly confident about the exact

   647    activations of the worm's muscles, because it generally takes a

   648    unique combination of muscle contractions to transform the worm's

   649    body along a specific path through \Phi-space.

   650 

   651    There is a simple way of taking \Phi-space and the total ordering

   652    provided by an experience vector and reliably inferring the rest of

   653    the senses.

   654 

   655 ** Empathy is the process of tracing through \Phi-space

   656 

   657    Here is the core of a basic empathy algorithm, starting with an

   658    experience vector:

   659 

   660    First, group the experiences into tiered proprioceptive bins. I use

   661    three tiers whose bin sizes are powers of 10; the smallest bin has an

   662    approximate size of 0.001 radians in all proprioceptive dimensions.

   663    

   664    Then, given a sequence of proprioceptive input, generate a set of

   665    matching experience records for each input, using the tiered

   666    proprioceptive bins. 

   667 

   668    Finally, to infer sensory data, select the longest consecutive chain

   669    of experiences. Consecutive means that the experiences

   670    appear next to each other in the experience vector.

   671 

   672    This algorithm has three advantages: 

   673 

   674    1. It's simple

   675 

   676    2. It's very fast -- retrieving possible interpretations takes

   677       constant time. Tracing through chains of interpretations takes

   678       time proportional to the average number of experiences in a

   679       proprioceptive bin. Redundant experiences in \Phi-space can be

   680       merged to save computation.

   681

   682    3. It protects against wrong interpretations of transient ambiguous

   683       proprioceptive data. For example, if the worm is flat for just

   684       an instant, this flatness will not be interpreted as implying

   685       that the worm has its muscles relaxed, since the flatness is

   686       part of a longer chain which includes a distinct pattern of

   687       muscle activation. Markov chains or other memoryless statistical

   688       models that operate on individual frames may very well make this

   689       mistake.

   690 

   691    #+caption: Program to convert an experience vector into a 

   692    #+caption: proprioceptively binned lookup function.

   693    #+name: bin

   694    #+attr_latex: [htpb]

   695 #+begin_listing clojure

   696    #+begin_src clojure

   697 (defn bin [digits]

   698   (fn [angles]

   699     (->> angles

   700          (flatten)

   701          (map (juxt #(Math/sin %) #(Math/cos %)))

   702          (flatten)

   703          (mapv #(Math/round (* % (Math/pow 10 (dec digits))))))))

   704 

   705 (defn gen-phi-scan 

   706   "Nearest-neighbors with binning. Only returns a result if

   707    the proprioceptive data is within 10% of a previously recorded

   708    result in all dimensions."

   709   [phi-space]

   710   (let [bin-keys (map bin [3 2 1])

   711         bin-maps

   712         (map (fn [bin-key]

   713                (group-by

   714                 (comp bin-key :proprioception phi-space)

   715                 (range (count phi-space)))) bin-keys)

   716         lookups (map (fn [bin-key bin-map]

   717                        (fn [proprio] (bin-map (bin-key proprio))))

   718                      bin-keys bin-maps)]

   719     (fn lookup [proprio-data]

   720       (set (some #(% proprio-data) lookups)))))

   721    #+end_src

   722    #+end_listing
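
   To make the tiers concrete, here is a minimal sketch that repeats
   the =bin= function from the listing above so that it runs on its
   own, then applies it to two nearby joint angles. The angles 0.100
   and 0.104 are made-up values chosen for illustration, not recorded
   worm data.

   #+begin_src clojure
;; Copy of `bin` from the listing above, repeated so that this sketch
;; is self-contained.
(defn bin [digits]
  (fn [angles]
    (->> angles
         (flatten)
         (map (juxt #(Math/sin %) #(Math/cos %)))
         (flatten)
         (mapv #(Math/round (* % (Math/pow 10 (dec digits))))))))

;; Hypothetical proprioceptive readings: a single joint angle each.
(def angle-a [0.100])
(def angle-b [0.104])

;; Finest tier (3 digits): the two angles land in different bins.
(def fine-a ((bin 3) angle-a))   ; [10 100]
(def fine-b ((bin 3) angle-b))   ; [10 99]

;; Middle tier (2 digits): both angles share the bin [1 10].
(def coarse-a ((bin 2) angle-a))
(def coarse-b ((bin 2) angle-b))
   #+end_src

   If only =angle-a= had been recorded, a query for =angle-b= would
   miss in the finest map and fall through to the coarser tier, where
   both angles share a key.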

   723 

   724    #+caption: =longest-thread= finds the longest path of consecutive 

   725    #+caption: experiences to explain proprioceptive worm data.

   726    #+name: phi-space-history-scan

   727    #+ATTR_LaTeX: :width 10cm

   728    [[./images/aurellem-gray.png]]

   729 

   730    =longest-thread= infers sensory data by stitching together pieces

   731    from previous experience. It prefers longer chains of previous

   732    experience to shorter ones. For example, during training the worm

   733    might rest on the ground for one second before it performs its

   734    exercises. If during recognition the worm rests on the ground for

   735    five seconds, =longest-thread= will accommodate this five-second

   736    rest period by looping the one-second rest chain five times.

   737 

   738    =longest-thread= takes time proportional to the average number of

   739    entries in a proprioceptive bin, because for each element in the

   740    starting bin it performs a series of set lookups in the preceding

   741    bins. If the total history is limited, then this is only a constant

   742    multiple of the number of entries in the starting bin. This

   743    analysis also applies even if the action requires multiple longest

   744    chains -- it's still the average number of entries in a

   745    proprioceptive bin times the desired chain length. Because

   746    =longest-thread= is so efficient and simple, I can interpret

   747    worm-actions in real time.

   748 

   749    #+caption: Program to calculate empathy by tracing through \Phi-space

   750    #+caption: and finding the longest (i.e. most coherent) interpretation

   751    #+caption: of the data.

   752    #+name: longest-thread

   753    #+attr_latex: [htpb]

   754 #+begin_listing clojure

   755    #+begin_src clojure

   756 (defn longest-thread

   757   "Find the longest thread from phi-index-sets. The index sets should

   758    be ordered from most recent to least recent."

   759   [phi-index-sets]

   760   (loop [result '()

   761          [thread-bases & remaining :as phi-index-sets] phi-index-sets]

   762     (if (empty? phi-index-sets)

   763       (vec result)

   764       (let [threads

   765             (for [thread-base thread-bases]

   766               (loop [thread (list thread-base)

   767                      remaining remaining]

   768                 (let [next-index (dec (first thread))]

   769                   (cond (empty? remaining) thread

   770                         (contains? (first remaining) next-index)

   771                         (recur

   772                          (cons next-index thread) (rest remaining))

   773                         :else thread))))

   774             longest-thread

   775             (reduce (fn [thread-a thread-b]

   776                       (if (> (count thread-a) (count thread-b))

   777                         thread-a thread-b))

   778                     '(nil)

   779                     threads)]

   780         (recur (concat longest-thread result)

   781                (drop (count longest-thread) phi-index-sets))))))

   782    #+end_src

   783    #+end_listing
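
   The inner loop of =longest-thread= can be illustrated in isolation.
   The sketch below is not the listing's code: =trace-back= is a
   hypothetical helper that extends one candidate index backwards
   through older index sets, and the index sets are invented for
   illustration.

   #+begin_src clojure
(defn trace-back
  "Hypothetical helper: starting from one index in the newest set,
   walk backwards through the older sets for as long as each older set
   contains the immediately preceding index."
  [base older-sets]
  (loop [thread (list base)
         older older-sets]
    (let [prev (dec (first thread))]
      (if (and (seq older) (contains? (first older) prev))
        (recur (cons prev thread) (rest older))
        thread))))

;; Invented index sets, ordered from most recent to least recent.
(def older-sets [#{4 19} #{3} #{10}])

(trace-back 5 older-sets)  ; => (3 4 5)
(trace-back 20 older-sets) ; => (19 20)
   #+end_src

   Here the base index 5 yields the longer thread, so it would be
   preferred over the base index 20.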

   784 

   785    There is one final piece, which is to replace missing sensory data

   786    with a best-guess estimate. While I could fill in missing data by

   787    using a gradient over the closest known sensory data points,

   788    averages can be misleading. It is certainly possible to create an

   789    impossible sensory state by averaging two possible sensory states.

   790    Therefore, I simply replicate the most recent sensory experience to

   791    fill in the gaps.

   792 

   793    #+caption: Fill in blanks in sensory experience by replicating the most 

   794    #+caption: recent experience.

   795    #+name: infer-nils

   796    #+attr_latex: [htpb]

   797 #+begin_listing clojure

   798    #+begin_src clojure

   799 (defn infer-nils

   800   "Replace nils with the next available non-nil element in the

   801    sequence, or barring that, 0."

   802   [s]

   803   (loop [i (dec (count s))

   804          v (transient s)]

   805     (if (<= i 0) (persistent! v)

   806         (if-let [cur (v i)]

   807           (if (get v (dec i) 0)

   808             (recur (dec i) v)

   809             (recur (dec i) (assoc! v (dec i) cur)))

   810           (recur i (assoc! v i 0))))))

   811    #+end_src

   812    #+end_listing
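
   For comparison, the same fill rule can be written without
   transients. This is a sketch rather than a replacement for
   =infer-nils=; =fill-nils= is a hypothetical name, and the input
   vector is invented. It follows the rule stated in the docstring:
   each nil takes the value of the next non-nil element, or 0 if there
   is none.

   #+begin_src clojure
(defn fill-nils
  "Hypothetical functional variant of infer-nils: each nil is replaced
   by the next non-nil element to its right, or by 0 if none follows."
  [v]
  (->> (reverse v)
       ;; Walk right-to-left, carrying the last non-nil value seen.
       (reductions (fn [carry x] (if (nil? x) carry x)) 0)
       rest          ; drop the initial carry of 0
       reverse
       vec))

(fill-nils [nil 3 nil nil 7 nil]) ; => [3 3 7 7 7 0]
   #+end_src

   The trailing nil becomes 0 because nothing follows it, and the run
   of nils before 7 is filled with 7 rather than with an average of 3
   and 7.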

   813   

   814 ** Efficient action recognition with =EMPATH=

   815    

   816    To use =EMPATH= with the worm, I first need to gather a set of

   817    experiences from the worm that includes the actions I want to

   818    recognize. The =generate-phi-space= program (listing

   819    \ref{generate-phi-space}) runs the worm through a series of

   820    exercises and gathers those experiences into a vector. The

   821    =do-all-the-things= program is a routine expressed in a simple

   822    muscle contraction script language for automated worm control. It

   823    causes the worm to rest, curl, and wiggle over about 700 frames

   824    (approx. 11 seconds).

   825 

   826    #+caption: Program to gather the worm's experiences into a vector for 

   827    #+caption: further processing. The =motor-control-program= line uses

   828    #+caption: a motor control script that causes the worm to execute a series

   829    #+caption: of ``exercises'' that include all the action predicates.

   830    #+name: generate-phi-space

   831    #+attr_latex: [htpb]

   832 #+begin_listing clojure 

   833    #+begin_src clojure

   834 (def do-all-the-things 

   835   (concat

   836    curl-script

   837    [[300 :d-ex 40]

   838     [320 :d-ex 0]]

   839    (shift-script 280 (take 16 wiggle-script))))

   840 

   841 (defn generate-phi-space []

   842   (let [experiences (atom [])]

   843     (run-world

   844      (apply-map 

   845       worm-world

   846       (merge

   847        (worm-world-defaults)

   848        {:end-frame 700

   849         :motor-control

   850         (motor-control-program worm-muscle-labels do-all-the-things)

   851         :experiences experiences})))

   852     @experiences))

   853    #+end_src

   854    #+end_listing

   855 

   856    #+caption: Use longest thread and a phi-space generated from a short

   857    #+caption: exercise routine to interpret actions during free play.

   858    #+name: empathy-debug

   859    #+attr_latex: [htpb]

   860 #+begin_listing clojure

   861    #+begin_src clojure

   862 (defn init []

   863   (def phi-space (generate-phi-space))

   864   (def phi-scan (gen-phi-scan phi-space)))

   865 

   866 (defn empathy-demonstration []

   867   (let [proprio (atom ())]

   868     (fn

   869       [experiences text]

   870       (let [phi-indices (phi-scan (:proprioception (peek experiences)))]

   871         (swap! proprio (partial cons phi-indices))

   872         (let [exp-thread (longest-thread (take 300 @proprio))

   873               empathy (mapv phi-space (infer-nils exp-thread))]

   874           (println-repl (vector:last-n exp-thread 22))

   875           (cond

   876            (grand-circle? empathy) (.setText text "Grand Circle")

   877            (curled? empathy)       (.setText text "Curled")

   878            (wiggling? empathy)     (.setText text "Wiggling")

   879            (resting? empathy)      (.setText text "Resting")

   880            :else                   (.setText text "Unknown")))))))

   881 

   882 (defn empathy-experiment [record]

   883   (.start (worm-world :experience-watch (empathy-demonstration)

   884                       :record record :worm worm*)))

   885    #+end_src

   886    #+end_listing

   887    

   888    The result of running =empathy-experiment= is that the system is

   889    generally able to interpret worm actions using the action-predicates

   890    on simulated sensory data just as well as with actual data. Figure

   891    \ref{empathy-debug-image} was generated using =empathy-experiment=:

   892 

   893   #+caption: From only proprioceptive data, =EMPATH= was able to infer 

   894   #+caption: the complete sensory experience and classify four poses

   895   #+caption: (the last panel shows a composite image of \emph{wiggling},

   896   #+caption: a dynamic pose).

   897   #+name: empathy-debug-image

   898   #+ATTR_LaTeX: :width 10cm :placement [H]

   899   [[./images/empathy-1.png]]

   900 

   901   One way to measure the performance of =EMPATH= is to check how

   902   often the imagined sense experience triggers the same action

   903   predicates as the real sensory experience.

   904   

   905    #+caption: Determine how closely empathy approximates actual 

   906    #+caption: sensory data.

   907    #+name: test-empathy-accuracy

   908    #+attr_latex: [htpb]

   909 #+begin_listing clojure

   910    #+begin_src clojure

   911 (def worm-action-label

   912   (juxt grand-circle? curled? wiggling?))

   913 

   914 (defn compare-empathy-with-baseline [matches]

   915   (let [proprio (atom ())]

   916     (fn

   917       [experiences text]

   918       (let [phi-indices (phi-scan (:proprioception (peek experiences)))]

   919         (swap! proprio (partial cons phi-indices))

   920         (let [exp-thread (longest-thread (take 300 @proprio))

   921               empathy (mapv phi-space (infer-nils exp-thread))

   922               experience-matches-empathy

   923               (= (worm-action-label experiences)

   924                  (worm-action-label empathy))]

   925           (println-repl experience-matches-empathy)

   926           (swap! matches #(conj % experience-matches-empathy)))))))

   927               

   928 (defn accuracy [v]

   929   (float (/ (count (filter true? v)) (count v))))

   930 

   931 (defn test-empathy-accuracy []

   932   (let [res (atom [])]

   933     (run-world

   934      (worm-world :experience-watch

   935                  (compare-empathy-with-baseline res)

   936                  :worm worm*))

   937     (accuracy @res)))

   938    #+end_src

   939    #+end_listing
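
   As a small check of the scoring arithmetic, the sketch below
   repeats =accuracy= and applies it to an invented vector of
   per-frame comparison results. The values are illustrative, not
   measurements.

   #+begin_src clojure
;; Copy of `accuracy` from the listing above.
(defn accuracy [v]
  (float (/ (count (filter true? v)) (count v))))

;; Hypothetical results for eight frames: empathy agreed with
;; experience on six of them.
(def sample-matches [true true false true true true false true])

(accuracy sample-matches) ; => 0.75
   #+end_src

   Note that =accuracy= assumes at least one frame was recorded; an
   empty vector would cause a division by zero.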

   940 

   941   Running =test-empathy-accuracy= using the very short exercise

   942   program defined in listing \ref{generate-phi-space}, and then doing

   943   a similar pattern of activity manually yields an accuracy of around

   944   73%. This is based on very limited worm experience. Training the

   945   worm for longer dramatically improves the accuracy.

   946 

   947    #+caption: Program to generate \Phi-space using manual training.

   948    #+name: manual-phi-space

   949    #+attr_latex: [htpb]

   950    #+begin_listing clojure

   951    #+begin_src clojure

   952 (defn init-interactive []

   953   (def phi-space

   954     (let [experiences (atom [])]

   955       (run-world

   956        (apply-map 

   957         worm-world

   958         (merge

   959          (worm-world-defaults)

   960          {:experiences experiences})))

   961       @experiences))

   962   (def phi-scan (gen-phi-scan phi-space)))

   963    #+end_src

   964    #+end_listing

   965 

   966   After about 1 minute of manual training, I was able to achieve 95%

   967   accuracy on manual testing of the worm using =init-interactive= and

   968   =test-empathy-accuracy=. The majority of errors are near the

   969   boundaries of transitioning from one type of action to another.

   970   During these transitions the exact label for the action is more open

   971   to interpretation, and disagreement between empathy and experience

   972   is more excusable.

   973 

   974 ** Digression: bootstrapping touch using free exploration

   975 

   976    In the previous section I showed how to compute actions in terms of

   977    body-centered predicates which relied on the average touch activation of

   978    pre-defined regions of the worm's skin. What if, instead of receiving

   979    touch pre-grouped into the six faces of each worm segment, the true

   980    topology of the worm's skin were unknown? This is more similar to how

   981    a nerve fiber bundle might be arranged. While two fibers that are

   982    close in a nerve bundle /might/ correspond to two touch sensors that

   983    are close together on the skin, the process of taking a complicated

   984    surface and forcing it into essentially a circle requires some cuts

   985    and rearrangements.

   986    

   987    In this section I show how to automatically learn the skin-topology of

   988    a worm segment by free exploration. As the worm rolls around on the

   989    floor, large sections of its surface get activated. If the worm has

   990    stopped moving, then whatever region of skin is touching the

   991    floor is probably an important region, and should be recorded.

   992    

   993    #+caption: Program to detect whether the worm is in a resting state 

   994    #+caption: with one face touching the floor.

   995    #+name: pure-touch

   996    #+begin_listing clojure

   997    #+begin_src clojure

   998 (def full-contact [(float 0.0) (float 0.1)])

   999 

  1000 (defn pure-touch?

  1001   "This is worm specific code to determine if a large region of touch

  1002    sensors is either all on or all off."

  1003   [[coords touch :as touch-data]]

  1004   (= (set (map first touch)) (set full-contact)))

  1005    #+end_src

  1006    #+end_listing
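
   The behaviour of =pure-touch?= is easiest to see on toy data. The
   sketch below repeats the two definitions and feeds them invented
   readings. It assumes, based on how the listing destructures its
   argument, that each reading is a pair whose first element is the
   activation. The coordinates and values are made up.

   #+begin_src clojure
;; Copies of the definitions from the listing above.
(def full-contact [(float 0.0) (float 0.1)])

(defn pure-touch?
  [[coords touch :as touch-data]]
  (= (set (map first touch)) (set full-contact)))

;; Hypothetical sensor coordinates (not real worm data).
(def coords [[0 0] [0 1] [1 0]])

;; Every activation is exactly 0.0 or 0.1, and both values occur.
(def crisp [coords [[(float 0.0) (float 0.1)]
                    [(float 0.1) (float 0.1)]
                    [(float 0.0) (float 0.1)]]])

;; One sensor reports an intermediate activation.
(def blurry [coords [[(float 0.0) (float 0.1)]
                     [(float 0.05) (float 0.1)]
                     [(float 0.1) (float 0.1)]]])

(pure-touch? crisp)  ; => true
(pure-touch? blurry) ; => false
   #+end_src

   As written, the predicate accepts a frame only when the set of
   activations is exactly the two values in =full-contact=, so a single
   intermediate reading is enough to reject the frame.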

  1007 

  1008    After collecting these important regions, there will be many nearly

  1009    identical touch regions. While for some purposes the subtle

  1010    differences between these regions will be important, for my

  1011    purposes I collapse them into mostly non-overlapping sets using

  1012    =remove-similar= in listing \ref{remove-similiar}.

  1013 

  1014    #+caption: Program to take a list of sets of points and ``collapse'' them

  1015    #+caption: so that the remaining sets in the list are significantly

  1016    #+caption: different from each other. Prefer smaller sets to larger ones.

  1017    #+name: remove-similiar

  1018    #+begin_listing clojure

  1019    #+begin_src clojure

  1020 (defn remove-similar

  1021   [coll]

  1022   (loop [result () coll (sort-by (comp - count) coll)]

  1023     (if (empty? coll) result

  1024         (let  [[x & xs] coll

  1025                c (count x)]

  1026           (if (some

  1027                (fn [other-set]

  1028                  (let [oc (count other-set)]

  1029                    (< (- (count (union other-set x)) c) (* oc 0.1))))

  1030                xs)

  1031             (recur result xs)

  1032             (recur (cons x result) xs))))))

  1033    #+end_src

  1034    #+end_listing
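
   The similarity test inside =remove-similar= can be pulled out and
   tried on small sets. =nearly-covered?= is a hypothetical helper
   name, and the sketch assumes that =union= refers to
   =clojure.set/union=, which the listing does not state. The sets are
   invented.

   #+begin_src clojure
(require '[clojure.set :refer [union]])

(defn nearly-covered?
  "Hypothetical helper: true when the number of other-set's elements
   that fall outside x is less than 10% of other-set's size."
  [x other-set]
  (< (- (count (union other-set x)) (count x))
     (* (count other-set) 0.1)))

(nearly-covered? #{1 2 3 4} #{1 2 3}) ; => true
(nearly-covered? #{1 2 3} #{7 8})     ; => false
   #+end_src

   When the test holds, =remove-similar= drops the larger set =x= and
   keeps the smaller =other-set=, which is why smaller sets are
   preferred.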

  1035 

  1036    Actually running this simulation is easy given =CORTEX='s facilities.

  1037 

  1038    #+caption: Collect experiences while the worm moves around. Keep only the

  1039    #+caption: stable touch sensations, collapse similar ones together,

  1040    #+caption: and report the regions learned.

  1041    #+name: learn-touch

  1042    #+begin_listing clojure

  1043    #+begin_src clojure

  1044 (defn learn-touch-regions []

  1045   (let [experiences (atom [])

  1046         world (apply-map

  1047                worm-world

  1048                (assoc (worm-segment-defaults)

  1049                  :experiences experiences))]

  1050     (run-world world)

  1051     (->>

  1052      @experiences

  1053      (drop 175)

  1054      ;; access the single segment's touch data

  1055      (map (comp first :touch))

  1056      ;; only deal with "pure" touch data to determine surfaces

  1057      (filter pure-touch?)

  1058      ;; associate coordinates with touch values

  1059      (map (partial apply zipmap))

  1060      ;; select those regions where contact is being made

  1061      (map (partial group-by second))

  1062      (map #(get % full-contact))

  1063      (map (partial map first))

  1064      ;; remove redundant/subset regions

  1065      (map set)

  1066      remove-similar)))

  1067 

  1068 (defn learn-and-view-touch-regions []

  1069   (map view-touch-region

  1070        (learn-touch-regions)))

  1071    #+end_src

  1072    #+end_listing
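
   The middle of the =->>= pipeline in =learn-touch-regions= can be
   traced on a single invented frame. The sketch assumes the frame has
   the shape =[coords readings]= with one reading per coordinate, as
   the =zipmap= step implies. The coordinates, the readings, and the
   name =no-contact= are made up for illustration.

   #+begin_src clojure
(def full-contact [(float 0.0) (float 0.1)])
(def no-contact   [(float 0.1) (float 0.1)])   ; hypothetical "off" reading

;; One hypothetical frame of touch data: [coords readings].
(def frame [[[0 0] [0 1] [1 0]]
            [full-contact no-contact full-contact]])

(def contact-region
  (->> frame
       (apply zipmap)             ; coordinate -> reading
       (group-by second)          ; reading -> [[coordinate reading] ...]
       (#(get % full-contact))    ; keep the entries in full contact
       (map first)                ; back to bare coordinates
       set))

contact-region ; => #{[0 0] [1 0]}
   #+end_src

   Each frame that passes =pure-touch?= is reduced to one such set of
   coordinates, and those sets are what =remove-similar= then
   collapses.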

  1073 

  1074    The only thing remaining to define is the particular motion the worm

  1075    must take. I accomplish this with a simple motor control program.

  1076 

  1077    #+caption: Motor control program for making the worm roll on the ground.

  1078    #+caption: This could also be replaced with random motion.

  1079    #+name: worm-roll

  1080    #+begin_listing clojure

  1081    #+begin_src clojure

  1082 (defn touch-kinesthetics []

  1083   [[170 :lift-1 40]

  1084    [190 :lift-1 19]

  1085    [206 :lift-1  0]

  1086 

  1087    [400 :lift-2 40]

  1088    [410 :lift-2  0]

  1089 

  1090    [570 :lift-2 40]

  1091    [590 :lift-2 21]

  1092    [606 :lift-2  0]

  1093 

  1094    [800 :lift-1 30]

  1095    [809 :lift-1 0]

  1096 

  1097    [900 :roll-2 40]

  1098    [905 :roll-2 20]

  1099    [910 :roll-2  0]

  1100 

  1101    [1000 :roll-2 40]

  1102    [1005 :roll-2 20]

  1103    [1010 :roll-2  0]

  1104    

  1105    [1100 :roll-2 40]

  1106    [1105 :roll-2 20]

  1107    [1110 :roll-2  0]

  1108    ])

  1109    #+end_src

  1110    #+end_listing

  1111 

  1112 

  1113    #+caption: The small worm rolls around on the floor, driven

  1114    #+caption: by the motor control program in listing \ref{worm-roll}.

  1115    #+name: worm-roll-image

  1116    #+ATTR_LaTeX: :width 12cm

  1117    [[./images/worm-roll.png]]

  1118 

  1119 

  1120    #+caption: After completing its adventures, the worm now knows 

  1121    #+caption: how its touch sensors are arranged along its skin. These 

  1122    #+caption: are the regions that were deemed important by 

  1123    #+caption: =learn-touch-regions=. Note that the worm has discovered

  1124    #+caption: that it has six sides.

  1125    #+name: worm-touch-map

  1126    #+ATTR_LaTeX: :width 12cm

  1127    [[./images/touch-learn.png]]

  1128 

  1129    While simple, =learn-touch-regions= exploits regularities in both

  1130    the worm's physiology and the worm's environment to correctly

  1131    deduce that the worm has six sides. Note that =learn-touch-regions=

  1132    would work just as well even if the worm's touch sense data were

  1133    completely scrambled. The cross shape is just for convenience. This

  1134    example justifies the use of pre-defined touch regions in =EMPATH=.

  1135 

  1136 * Contributions

  1137 

  1138 

  1139 

  1140 

  1141 # An anatomical joke:

  1142 # - Training

  1143 # - Skeletal imitation

  1144 # - Sensory fleshing-out

  1145 # - Classification
author	Robert McIntyre <rlm@mit.edu>
date	Wed, 26 Mar 2014 22:17:42 -0400
parents	0a4362d1f138
children	6db37c4aa1ee