# HG changeset patch # User Robert McIntyre # Date 1395519034 14400 # Node ID 5205535237fb52fc5cba7f118893c36f1f0101e3 # Parent b5d0f0adf19fc0e45079762e0848d5259b84ea64 fix skew in self-organizing-touch, work on thesis. diff -r b5d0f0adf19f -r 5205535237fb org/movement.org --- a/org/movement.org Fri Mar 21 20:56:56 2014 -0400 +++ b/org/movement.org Sat Mar 22 16:10:34 2014 -0400 @@ -283,7 +283,7 @@ muscles (pics "muscles/0") targets (map #(File. (str base "out/" (format "%07d.png" %))) - (range 0 (count main-view)))] + (range (count main-view)))] (dorun (pmap (comp diff -r b5d0f0adf19f -r 5205535237fb org/proprioception.org --- a/org/proprioception.org Fri Mar 21 20:56:56 2014 -0400 +++ b/org/proprioception.org Sat Mar 22 16:10:34 2014 -0400 @@ -52,7 +52,7 @@ system. The three vectors do not have to be normalized or orthogonal." [vec1 vec2 vec3] - (< 0 (.dot (.cross vec1 vec2) vec3))) + (pos? (.dot (.cross vec1 vec2) vec3))) (defn absolute-angle "The angle between 'vec1 and 'vec2 around 'axis. In the range @@ -328,7 +328,7 @@ proprioception (pics "proprio/0") targets (map #(File. (str base "out/" (format "%07d.png" %))) - (range 0 (count main-view)))] + (range (count main-view)))] (dorun (pmap (comp @@ -385,7 +385,7 @@ * Next -Next time, I'll give the Worm the power to [[./movement.org][move on it's own]]. +Next time, I'll give the Worm the power to [[./movement.org][move on its own]]. * COMMENT generate source diff -r b5d0f0adf19f -r 5205535237fb org/self_organizing_touch.clj --- a/org/self_organizing_touch.clj Fri Mar 21 20:56:56 2014 -0400 +++ b/org/self_organizing_touch.clj Sat Mar 22 16:10:34 2014 -0400 @@ -62,6 +62,7 @@ (merge (worm-world-defaults) {:worm-model single-worm-segment :view single-worm-segment-view + :experience-watch nil :motor-control (motor-control-program worm-single-segment-muscle-labels diff -r b5d0f0adf19f -r 5205535237fb org/touch.org --- a/org/touch.org Fri Mar 21 20:56:56 2014 -0400 +++ b/org/touch.org Sat Mar 22 16:10:34 2014 -0400 @@ -78,7 +78,7 @@ To simulate touch there are three conceptual steps. For each solid object in the creature, you first have to get UV image and scale parameter which define the position and length of the feelers. Then, -you use the triangles which compose the mesh and the UV data stored in +you use the triangles which comprise the mesh and the UV data stored in the mesh to determine the world-space position and orientation of each feeler. Then once every frame, update these positions and orientations to match the current position and orientation of the object, and use @@ -136,7 +136,7 @@ A =Mesh= is composed of =Triangles=, and each =Triangle= has three vertices which have coordinates in world space and UV space. -Here, =triangles= gets all the world-space triangles which compose a +Here, =triangles= gets all the world-space triangles which comprise a mesh, while =pixel-triangles= gets those same triangles expressed in pixel coordinates (which are UV coordinates scaled to fit the height and width of the UV image). @@ -152,7 +152,7 @@ (.getTriangle (.getMesh geo) triangle-index scratch) scratch))) (defn triangles - "Return a sequence of all the Triangles which compose a given + "Return a sequence of all the Triangles which comprise a given Geometry." [#^Geometry geo] (map (partial triangle geo) (range (.getTriangleCount (.getMesh geo))))) @@ -240,7 +240,7 @@ [#^Triangle t] (let [mat (Matrix4f.) 
[vert-1 vert-2 vert-3] - ((comp vec map) #(.get t %) (range 3)) + (mapv #(.get t %) (range 3)) unit-normal (do (.calculateNormal t)(.getNormal t)) vertices [vert-1 vert-2 vert-3 unit-normal]] (dorun diff -r b5d0f0adf19f -r 5205535237fb org/worm_learn.clj --- a/org/worm_learn.clj Fri Mar 21 20:56:56 2014 -0400 +++ b/org/worm_learn.clj Sat Mar 22 16:10:34 2014 -0400 @@ -141,9 +141,6 @@ (> (Math/sin bend) 0.64)) (:proprioception (peek experiences)))) -(defn touch-average [[coords touch]] - (/ (average (map first touch)) (average (map second touch)))) - (defn rect-region [[x0 y0] [x1 y1]] (vec (for [x (range x0 (inc x1)) @@ -225,15 +222,6 @@ (declare phi-space phi-scan) -(defn next-phi-states - "Given proprioception data, determine the most likely next sensory - pattern from previous experience." - [proprio phi-space phi-scan] - (if-let [results (phi-scan proprio)] - (mapv phi-space - (filter (partial > (count phi-space)) - (map inc results))))) - (defn debug-experience [experiences] (cond @@ -257,14 +245,13 @@ (defn worm-world-defaults [] (let [direct-control (worm-direct-control worm-muscle-labels 40)] - {:view worm-side-view - :motor-control (:motor-control direct-control) - :keybindings (:keybindings direct-control) - :record nil - :experiences (atom []) - :experience-watch debug-experience - :worm-model worm-model - :end-frame nil})) + (merge direct-control + {:view worm-side-view + :record nil + :experiences (atom []) + :experience-watch debug-experience + :worm-model worm-model + :end-frame nil}))) (defn dir! [file] (if-not (.exists file) @@ -300,7 +287,7 @@ (position-camera world view) (.setTimer world timer) (display-dilated-time world timer) - (if record + (when record (Capture/captureVideo world (dir! (File. record "main-view")))) @@ -321,13 +308,13 @@ (experience-watch @experiences)) (muscle-display muscle-data - (if record (dir! (File. record "muscle")))) + (when record (dir! (File. record "muscle")))) (prop-display proprioception-data - (if record (dir! (File. record "proprio")))) + (when record (dir! (File. record "proprio")))) (touch-display touch-data - (if record (dir! (File. record "touch"))))))))) + (when record (dir! (File. record "touch"))))))))) @@ -406,22 +393,37 @@ (def phi-scan (gen-phi-scan phi-space)) ) - - +;; (defn infer-nils-dyl [s] +;; (loop [closed () +;; open s +;; anchor 0] +;; (if-not (empty? open) +;; (recur (conj closed +;; (or (peek open) +;; anchor)) +;; (pop open) +;; (or (peek open) anchor)) +;; closed))) + +;; (defn infer-nils [s] +;; (for [i (range (count s))] +;; (or (get s i) +;; (some (comp not nil?) (vector:last-n (- (count s) i))) +;; 0))) (defn infer-nils "Replace nils with the next available non-nil element in the sequence, or barring that, 0." [s] - (loop [i (dec (count s)) v (transient s)] - (if (= i 0) (persistent! v) - (let [cur (v i)] - (if cur - (if (get v (dec i) 0) - (recur (dec i) v) - (recur (dec i) (assoc! v (dec i) cur))) - (recur i (assoc! v i 0))))))) + (loop [i (dec (count s)) + v (transient s)] + (if (zero? i) (persistent! v) + (if-let [cur (v i)] + (if (get v (dec i) 0) + (recur (dec i) v) + (recur (dec i) (assoc! v (dec i) cur))) + (recur i (assoc! v i 0)))))) ;; tests diff -r b5d0f0adf19f -r 5205535237fb thesis/aux/org/first-chapter.html --- a/thesis/aux/org/first-chapter.html Fri Mar 21 20:56:56 2014 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,455 +0,0 @@ - - - - -<code>CORTEX</code> - - - - - - - - - - - - - - - -
  [455 lines of generated XHTML elided: this hunk deletes
   thesis/aux/org/first-chapter.html, the Org HTML export of
   first-chapter.org. Its prose is a verbatim copy of the org source
   removed in the next hunk, so only the stripped markup and embedded
   page boilerplate are omitted here.]
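[Note on the org/proprioception.org hunk above: =right-handed?= tests the
sign of the scalar triple product (vec1 x vec2) . vec3, and the patch only
swaps =(< 0 ...)= for the clearer =(pos? ...)=. A minimal sketch of the
same check — not part of the patch, just an illustration using
jMonkeyEngine's Vector3f as the surrounding code does; the standard basis
is right-handed, so the product is positive:]

(import 'com.jme3.math.Vector3f)

(let [x (Vector3f. 1 0 0)
      y (Vector3f. 0 1 0)
      z (Vector3f. 0 0 1)]
  ;; (x cross y) dot z = 1.0 for a right-handed triple
  (pos? (.dot (.cross x y) z)))
;; => true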
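[Note on the org/worm_learn.clj hunk above: the reworked =infer-nils=
keeps the documented behavior — each nil is back-filled from the nearest
non-nil entry to its right, with trailing nils falling back to 0 — while
replacing the nested =(if cur ...)= with =if-let= and =(= i 0)= with
=zero?=. A small usage sketch (REPL values, not part of the patch); note
the argument must be a vector, since the loop uses =transient= and
indexed lookup:]

(infer-nils [1 nil nil 2 nil])  ;; => [1 2 2 2 0]
(infer-nils [nil nil 3])        ;; => [3 3 3]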
- - diff -r b5d0f0adf19f -r 5205535237fb thesis/aux/org/first-chapter.org --- a/thesis/aux/org/first-chapter.org Fri Mar 21 20:56:56 2014 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,241 +0,0 @@ -#+title: =CORTEX= -#+author: Robert McIntyre -#+email: rlm@mit.edu -#+description: Using embodied AI to facilitate Artificial Imagination. -#+keywords: AI, clojure, embodiment -#+SETUPFILE: ../../aurellem/org/setup.org -#+INCLUDE: ../../aurellem/org/level-0.org -#+babel: :mkdirp yes :noweb yes :exports both -#+OPTIONS: toc:nil, num:nil - -* Artificial Imagination - Imagine watching a video of someone skateboarding. When you watch - the video, you can imagine yourself skateboarding, and your - knowledge of the human body and its dynamics guides your - interpretation of the scene. For example, even if the skateboarder - is partially occluded, you can infer the positions of his arms and - body from your own knowledge of how your body would be positioned if - you were skateboarding. If the skateboarder suffers an accident, you - wince in sympathy, imagining the pain your own body would experience - if it were in the same situation. This empathy with other people - guides our understanding of whatever they are doing because it is a - powerful constraint on what is probable and possible. In order to - make use of this powerful empathy constraint, I need a system that - can generate and make sense of sensory data from the many different - senses that humans possess. The two key proprieties of such a system - are /embodiment/ and /imagination/. - -** What is imagination? - - One kind of imagination is /sympathetic/ imagination: you imagine - yourself in the position of something/someone you are - observing. This type of imagination comes into play when you follow - along visually when watching someone perform actions, or when you - sympathetically grimace when someone hurts themselves. This type of - imagination uses the constraints you have learned about your own - body to highly constrain the possibilities in whatever you are - seeing. It uses all your senses to including your senses of touch, - proprioception, etc. Humans are flexible when it comes to "putting - themselves in another's shoes," and can sympathetically understand - not only other humans, but entities ranging from animals to cartoon - characters to [[http://www.youtube.com/watch?v=0jz4HcwTQmU][single dots]] on a screen! - -# and can infer intention from the actions of not only other humans, -# but also animals, cartoon characters, and even abstract moving dots -# on a screen! - - Another kind of imagination is /predictive/ imagination: you - construct scenes in your mind that are not entirely related to - whatever you are observing, but instead are predictions of the - future or simply flights of fancy. You use this type of imagination - to plan out multi-step actions, or play out dangerous situations in - your mind so as to avoid messing them up in reality. - - Of course, sympathetic and predictive imagination blend into each - other and are not completely separate concepts. One dimension along - which you can distinguish types of imagination is dependence on raw - sense data. Sympathetic imagination is highly constrained by your - senses, while predictive imagination can be more or less dependent - on your senses depending on how far ahead you imagine. 
Daydreaming - is an extreme form of predictive imagination that wanders through - different possibilities without concern for whether they are - related to whatever is happening in reality. - - For this thesis, I will mostly focus on sympathetic imagination and - the constraint it provides for understanding sensory data. - -** What problems can imagination solve? - - Consider a video of a cat drinking some water. - - #+caption: A cat drinking some water. Identifying this action is beyond the state of the art for computers. - #+ATTR_LaTeX: width=5cm - [[../images/cat-drinking.jpg]] - - It is currently impossible for any computer program to reliably - label such an video as "drinking". I think humans are able to label - such video as "drinking" because they imagine /themselves/ as the - cat, and imagine putting their face up against a stream of water - and sticking out their tongue. In that imagined world, they can - feel the cool water hitting their tongue, and feel the water - entering their body, and are able to recognize that /feeling/ as - drinking. So, the label of the action is not really in the pixels - of the image, but is found clearly in a simulation inspired by - those pixels. An imaginative system, having been trained on - drinking and non-drinking examples and learning that the most - important component of drinking is the feeling of water sliding - down one's throat, would analyze a video of a cat drinking in the - following manner: - - - Create a physical model of the video by putting a "fuzzy" model - of its own body in place of the cat. Also, create a simulation of - the stream of water. - - - Play out this simulated scene and generate imagined sensory - experience. This will include relevant muscle contractions, a - close up view of the stream from the cat's perspective, and most - importantly, the imagined feeling of water entering the mouth. - - - The action is now easily identified as drinking by the sense of - taste alone. The other senses (such as the tongue moving in and - out) help to give plausibility to the simulated action. Note that - the sense of vision, while critical in creating the simulation, - is not critical for identifying the action from the simulation. - - More generally, I expect imaginative systems to be particularly - good at identifying embodied actions in videos. - -* Cortex - - The previous example involves liquids, the sense of taste, and - imagining oneself as a cat. For this thesis I constrain myself to - simpler, more easily digitizable senses and situations. - - My system, =CORTEX= performs imagination in two different simplified - worlds: /worm world/ and /stick-figure world/. In each of these - worlds, entities capable of imagination recognize actions by - simulating the experience from their own perspective, and then - recognizing the action from a database of examples. - - In order to serve as a framework for experiments in imagination, - =CORTEX= requires simulated bodies, worlds, and senses like vision, - hearing, touch, proprioception, etc. - -** A Video Game Engine takes care of some of the groundwork - - When it comes to simulation environments, the engines used to - create the worlds in video games offer top-notch physics and - graphics support. These engines also have limited support for - creating cameras and rendering 3D sound, which can be repurposed - for vision and hearing respectively. Physics collision detection - can be expanded to create a sense of touch. 
- - jMonkeyEngine3 is one such engine for creating video games in - Java. It uses OpenGL to render to the screen and uses screengraphs - to avoid drawing things that do not appear on the screen. It has an - active community and several games in the pipeline. The engine was - not built to serve any particular game but is instead meant to be - used for any 3D game. I chose jMonkeyEngine3 it because it had the - most features out of all the open projects I looked at, and because - I could then write my code in Clojure, an implementation of LISP - that runs on the JVM. - -** =CORTEX= Extends jMonkeyEngine3 to implement rich senses - - Using the game-making primitives provided by jMonkeyEngine3, I have - constructed every major human sense except for smell and - taste. =CORTEX= also provides an interface for creating creatures - in Blender, a 3D modeling environment, and then "rigging" the - creatures with senses using 3D annotations in Blender. A creature - can have any number of senses, and there can be any number of - creatures in a simulation. - - The senses available in =CORTEX= are: - - - [[../../cortex/html/vision.html][Vision]] - - [[../../cortex/html/hearing.html][Hearing]] - - [[../../cortex/html/touch.html][Touch]] - - [[../../cortex/html/proprioception.html][Proprioception]] - - [[../../cortex/html/movement.html][Muscle Tension]] - -* A roadmap for =CORTEX= experiments - -** Worm World - - Worms in =CORTEX= are segmented creatures which vary in length and - number of segments, and have the senses of vision, proprioception, - touch, and muscle tension. - -#+attr_html: width=755 -#+caption: This is the tactile-sensor-profile for the upper segment of a worm. It defines regions of high touch sensitivity (where there are many white pixels) and regions of low sensitivity (where white pixels are sparse). -[[../images/finger-UV.png]] - - -#+begin_html -
-  [embedded YouTube player: "The worm responds to touch."]
-#+end_html
-
-#+begin_html
-  [embedded YouTube player: "Proprioception in a worm. The
-   proprioceptive readout is in the upper left corner of the screen."]
-#+end_html - - A worm is trained in various actions such as sinusoidal movement, - curling, flailing, and spinning by directly playing motor - contractions while the worm "feels" the experience. These actions - are recorded both as vectors of muscle tension, touch, and - proprioceptive data, but also in higher level forms such as - frequencies of the various contractions and a symbolic name for the - action. - - Then, the worm watches a video of another worm performing one of - the actions, and must judge which action was performed. Normally - this would be an extremely difficult problem, but the worm is able - to greatly diminish the search space through sympathetic - imagination. First, it creates an imagined copy of its body which - it observes from a third person point of view. Then for each frame - of the video, it maneuvers its simulated body to be in registration - with the worm depicted in the video. The physical constraints - imposed by the physics simulation greatly decrease the number of - poses that have to be tried, making the search feasible. As the - imaginary worm moves, it generates imaginary muscle tension and - proprioceptive sensations. The worm determines the action not by - vision, but by matching the imagined proprioceptive data with - previous examples. - - By using non-visual sensory data such as touch, the worms can also - answer body related questions such as "did your head touch your - tail?" and "did worm A touch worm B?" - - The proprioceptive information used for action identification is - body-centric, so only the registration step is dependent on point - of view, not the identification step. Registration is not specific - to any particular action. Thus, action identification can be - divided into a point-of-view dependent generic registration step, - and a action-specific step that is body-centered and invariant to - point of view. - -** Stick Figure World - - This environment is similar to Worm World, except the creatures are - more complicated and the actions and questions more varied. It is - an experiment to see how far imagination can go in interpreting - actions. diff -r b5d0f0adf19f -r 5205535237fb thesis/aux/org/roadmap.org --- a/thesis/aux/org/roadmap.org Fri Mar 21 20:56:56 2014 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,220 +0,0 @@ -In order for this to be a reasonable thesis that I can be proud of, -what are the /minimum/ number of things I need to get done? - - -* worm OR hand registration - - training from a few examples (2 to start out) - - aligning the body with the scene - - generating sensory data - - matching previous labeled examples using dot-products or some - other basic thing - - showing that it works with different views - -* first draft - - draft of thesis without bibliography or formatting - - should have basic experiment and have full description of - framework with code - - review with Winston - -* final draft - - implement stretch goals from Winston if possible - - complete final formatting and submit - -* CORTEX - DEADLINE: <2014-05-09 Fri> - SHIT THAT'S IN 67 DAYS!!! 
- -** program simple feature matching code for the worm's segments - -Subgoals: -*** DONE Get cortex working again, run tests, no jmonkeyengine updates - CLOSED: [2014-03-03 Mon 22:07] SCHEDULED: <2014-03-03 Mon> -*** DONE get blender working again - CLOSED: [2014-03-03 Mon 22:43] SCHEDULED: <2014-03-03 Mon> -*** DONE make sparce touch worm segment in blender - CLOSED: [2014-03-03 Mon 23:16] SCHEDULED: <2014-03-03 Mon> - CLOCK: [2014-03-03 Mon 22:44]--[2014-03-03 Mon 23:16] => 0:32 -*** DONE make multi-segment touch worm with touch sensors and display - CLOSED: [2014-03-03 Mon 23:54] SCHEDULED: <2014-03-03 Mon> - -*** DONE Make a worm wiggle and curl - CLOSED: [2014-03-04 Tue 23:03] SCHEDULED: <2014-03-04 Tue> - - -** First draft - -Subgoals: -*** Writeup new worm experiments. -*** Triage implementation code and get it into chapter form. - - - - - -** for today - -- guided worm :: control the worm with the keyboard. Useful for - testing the body-centered recog scripts, and for - preparing a cool demo video. - -- body-centered recognition :: detect actions using hard coded - body-centered scripts. - -- cool demo video of the worm being moved and recognizing things :: - will be a neat part of the thesis. - -- thesis export :: refactoring and organization of code so that it - spits out a thesis in addition to the web page. - -- video alignment :: analyze the frames of a video in order to align - the worm. Requires body-centered recognition. Can "cheat". - -- smoother actions :: use debugging controls to directly influence the - demo actions, and to generate recoginition procedures. - -- degenerate video demonstration :: show the system recognizing a - curled worm from dead on. Crowning achievement of thesis. - -** Ordered from easiest to hardest - -Just report the positions of everything. I don't think that this -necessairly shows anything usefull. - -Worm-segment vision -- you initialize a view of the worm, but instead -of pixels you use labels via ray tracing. Has the advantage of still -allowing for visual occlusion, but reliably identifies the objects, -even without rainbow coloring. You can code this as an image. - -Same as above, except just with worm/non-worm labels. - -Color code each worm segment and then recognize them using blob -detectors. Then you solve for the perspective and the action -simultaneously. - -The entire worm can be colored the same, high contrast color against a -nearly black background. - -"Rooted" vision. You give the exact coordinates of ONE piece of the -worm, but the algorithm figures out the rest. - -More rooted vision -- start off the entire worm with one posistion. - -The right way to do alignment is to use motion over multiple frames to -snap individual pieces of the model into place sharing and -propragating the individual alignments over the whole model. We also -want to limit the alignment search to just those actions we are -prepared to identify. This might mean that I need some small "micro -actions" such as the individual movements of the worm pieces. - -Get just the centers of each segment projected onto the imaging -plane. (best so far). - - -Repertoire of actions + video frames --> - directed multi-frame-search alg - - - - - - -!! Could also have a bounding box around the worm provided by -filtering the worm/non-worm render, and use bbbgs. As a bonus, I get -to include bbbgs in my thesis! Could finally do that recursive things -where I make bounding boxes be those things that give results that -give good bounding boxes. 
If I did this I could use a disruptive -pattern on the worm. - -Re imagining using default textures is very simple for this system, -but hard for others. - - -Want to demonstrate, at minimum, alignment of some model of the worm -to the video, and a lookup of the action by simulated perception. - -note: the purple/white points is a very beautiful texture, because -when it moves slightly, the white dots look like they're -twinkling. Would look even better if it was a darker purple. Also -would look better more spread out. - - -embed assumption of one frame of view, search by moving around in -simulated world. - -Allowed to limit search by setting limits to a hemisphere around the -imagined worm! This limits scale also. - - - - - -!! Limited search with worm/non-worm rendering. -How much inverse kinematics do we have to do? -What about cached (allowed state-space) paths, derived from labeled -training. You have to lead from one to another. - -What about initial state? Could start the input videos at a specific -state, then just match that explicitly. - -!! The training doesn't have to be labeled -- you can just move around -for a while!! - -!! Limited search with motion based alignment. - - - - -"play arounds" can establish a chain of linked sensoriums. Future -matches must fall into one of the already experienced things, and once -they do, it greatly limits the things that are possible in the future. - - -frame differences help to detect muscle exertion. - -Can try to match on a few "representative" frames. Can also just have -a few "bodies" in various states which we try to match. - - - -Paths through state-space have the exact same signature as -simulation. BUT, these can be searched in parallel and don't interfere -with each other. - - - - -** Final stretch up to First Draft - -*** DONE complete debug control of worm - CLOSED: [2014-03-17 Mon 17:29] SCHEDULED: <2014-03-17 Mon> - CLOCK: [2014-03-17 Mon 14:01]--[2014-03-17 Mon 17:29] => 3:28 -*** DONE add phi-space output to debug control - CLOSED: [2014-03-17 Mon 17:42] SCHEDULED: <2014-03-17 Mon> - CLOCK: [2014-03-17 Mon 17:31]--[2014-03-17 Mon 17:42] => 0:11 - -*** DONE complete automatic touch partitioning - CLOSED: [2014-03-18 Tue 21:43] SCHEDULED: <2014-03-18 Tue> -*** DONE complete cyclic predicate - CLOSED: [2014-03-19 Wed 16:34] SCHEDULED: <2014-03-18 Tue> - CLOCK: [2014-03-19 Wed 13:16]--[2014-03-19 Wed 16:34] => 3:18 -*** DONE complete three phi-stream action predicatates; test them with debug control - CLOSED: [2014-03-19 Wed 16:35] SCHEDULED: <2014-03-17 Mon> - CLOCK: [2014-03-18 Tue 18:36]--[2014-03-18 Tue 21:43] => 3:07 - CLOCK: [2014-03-18 Tue 18:34]--[2014-03-18 Tue 18:36] => 0:02 - CLOCK: [2014-03-17 Mon 19:19]--[2014-03-17 Mon 21:19] => 2:00 -*** DONE build an automatic "do all the things" sequence. - CLOSED: [2014-03-19 Wed 16:55] SCHEDULED: <2014-03-19 Wed> - CLOCK: [2014-03-19 Wed 16:53]--[2014-03-19 Wed 16:55] => 0:02 -*** DONE implement proprioception based movement lookup in phi-space - CLOSED: [2014-03-19 Wed 22:04] SCHEDULED: <2014-03-19 Wed> - CLOCK: [2014-03-19 Wed 19:32]--[2014-03-19 Wed 22:04] => 2:32 -*** DONE make proprioception reference phi-space indexes - CLOSED: [2014-03-19 Wed 22:47] SCHEDULED: <2014-03-19 Wed> - CLOCK: [2014-03-19 Wed 22:07] - - -*** DONE create test videos, also record positions of worm segments - CLOSED: [2014-03-20 Thu 22:02] SCHEDULED: <2014-03-19 Wed> - -*** TODO Collect intro, worm-learn and cortex creation into draft thesis. 
- diff -r b5d0f0adf19f -r 5205535237fb thesis/org/first-chapter.html --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/thesis/org/first-chapter.html Sat Mar 22 16:10:34 2014 -0400 @@ -0,0 +1,455 @@ + + + + +<code>CORTEX</code> + + + + + + + + + + + + + + + +
  [455 lines of generated XHTML elided: this hunk re-adds the same Org
   HTML export as thesis/org/first-chapter.html. Its prose is a verbatim
   copy of thesis/org/first-chapter.org, added in the next hunk, so only
   the stripped markup and embedded page boilerplate are omitted here.]
+ + diff -r b5d0f0adf19f -r 5205535237fb thesis/org/first-chapter.org --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/thesis/org/first-chapter.org Sat Mar 22 16:10:34 2014 -0400 @@ -0,0 +1,241 @@ +#+title: =CORTEX= +#+author: Robert McIntyre +#+email: rlm@mit.edu +#+description: Using embodied AI to facilitate Artificial Imagination. +#+keywords: AI, clojure, embodiment +#+SETUPFILE: ../../aurellem/org/setup.org +#+INCLUDE: ../../aurellem/org/level-0.org +#+babel: :mkdirp yes :noweb yes :exports both +#+OPTIONS: toc:nil, num:nil + +* Artificial Imagination + Imagine watching a video of someone skateboarding. When you watch + the video, you can imagine yourself skateboarding, and your + knowledge of the human body and its dynamics guides your + interpretation of the scene. For example, even if the skateboarder + is partially occluded, you can infer the positions of his arms and + body from your own knowledge of how your body would be positioned if + you were skateboarding. If the skateboarder suffers an accident, you + wince in sympathy, imagining the pain your own body would experience + if it were in the same situation. This empathy with other people + guides our understanding of whatever they are doing because it is a + powerful constraint on what is probable and possible. In order to + make use of this powerful empathy constraint, I need a system that + can generate and make sense of sensory data from the many different + senses that humans possess. The two key proprieties of such a system + are /embodiment/ and /imagination/. + +** What is imagination? + + One kind of imagination is /sympathetic/ imagination: you imagine + yourself in the position of something/someone you are + observing. This type of imagination comes into play when you follow + along visually when watching someone perform actions, or when you + sympathetically grimace when someone hurts themselves. This type of + imagination uses the constraints you have learned about your own + body to highly constrain the possibilities in whatever you are + seeing. It uses all your senses to including your senses of touch, + proprioception, etc. Humans are flexible when it comes to "putting + themselves in another's shoes," and can sympathetically understand + not only other humans, but entities ranging from animals to cartoon + characters to [[http://www.youtube.com/watch?v=0jz4HcwTQmU][single dots]] on a screen! + +# and can infer intention from the actions of not only other humans, +# but also animals, cartoon characters, and even abstract moving dots +# on a screen! + + Another kind of imagination is /predictive/ imagination: you + construct scenes in your mind that are not entirely related to + whatever you are observing, but instead are predictions of the + future or simply flights of fancy. You use this type of imagination + to plan out multi-step actions, or play out dangerous situations in + your mind so as to avoid messing them up in reality. + + Of course, sympathetic and predictive imagination blend into each + other and are not completely separate concepts. One dimension along + which you can distinguish types of imagination is dependence on raw + sense data. Sympathetic imagination is highly constrained by your + senses, while predictive imagination can be more or less dependent + on your senses depending on how far ahead you imagine. 
Daydreaming + is an extreme form of predictive imagination that wanders through + different possibilities without concern for whether they are + related to whatever is happening in reality. + + For this thesis, I will mostly focus on sympathetic imagination and + the constraint it provides for understanding sensory data. + +** What problems can imagination solve? + + Consider a video of a cat drinking some water. + + #+caption: A cat drinking some water. Identifying this action is beyond the state of the art for computers. + #+ATTR_LaTeX: width=5cm + [[../images/cat-drinking.jpg]] + + It is currently impossible for any computer program to reliably + label such an video as "drinking". I think humans are able to label + such video as "drinking" because they imagine /themselves/ as the + cat, and imagine putting their face up against a stream of water + and sticking out their tongue. In that imagined world, they can + feel the cool water hitting their tongue, and feel the water + entering their body, and are able to recognize that /feeling/ as + drinking. So, the label of the action is not really in the pixels + of the image, but is found clearly in a simulation inspired by + those pixels. An imaginative system, having been trained on + drinking and non-drinking examples and learning that the most + important component of drinking is the feeling of water sliding + down one's throat, would analyze a video of a cat drinking in the + following manner: + + - Create a physical model of the video by putting a "fuzzy" model + of its own body in place of the cat. Also, create a simulation of + the stream of water. + + - Play out this simulated scene and generate imagined sensory + experience. This will include relevant muscle contractions, a + close up view of the stream from the cat's perspective, and most + importantly, the imagined feeling of water entering the mouth. + + - The action is now easily identified as drinking by the sense of + taste alone. The other senses (such as the tongue moving in and + out) help to give plausibility to the simulated action. Note that + the sense of vision, while critical in creating the simulation, + is not critical for identifying the action from the simulation. + + More generally, I expect imaginative systems to be particularly + good at identifying embodied actions in videos. + +* Cortex + + The previous example involves liquids, the sense of taste, and + imagining oneself as a cat. For this thesis I constrain myself to + simpler, more easily digitizable senses and situations. + + My system, =CORTEX= performs imagination in two different simplified + worlds: /worm world/ and /stick-figure world/. In each of these + worlds, entities capable of imagination recognize actions by + simulating the experience from their own perspective, and then + recognizing the action from a database of examples. + + In order to serve as a framework for experiments in imagination, + =CORTEX= requires simulated bodies, worlds, and senses like vision, + hearing, touch, proprioception, etc. + +** A Video Game Engine takes care of some of the groundwork + + When it comes to simulation environments, the engines used to + create the worlds in video games offer top-notch physics and + graphics support. These engines also have limited support for + creating cameras and rendering 3D sound, which can be repurposed + for vision and hearing respectively. Physics collision detection + can be expanded to create a sense of touch. 
+ + jMonkeyEngine3 is one such engine for creating video games in + Java. It uses OpenGL to render to the screen and uses screengraphs + to avoid drawing things that do not appear on the screen. It has an + active community and several games in the pipeline. The engine was + not built to serve any particular game but is instead meant to be + used for any 3D game. I chose jMonkeyEngine3 it because it had the + most features out of all the open projects I looked at, and because + I could then write my code in Clojure, an implementation of LISP + that runs on the JVM. + +** =CORTEX= Extends jMonkeyEngine3 to implement rich senses + + Using the game-making primitives provided by jMonkeyEngine3, I have + constructed every major human sense except for smell and + taste. =CORTEX= also provides an interface for creating creatures + in Blender, a 3D modeling environment, and then "rigging" the + creatures with senses using 3D annotations in Blender. A creature + can have any number of senses, and there can be any number of + creatures in a simulation. + + The senses available in =CORTEX= are: + + - [[../../cortex/html/vision.html][Vision]] + - [[../../cortex/html/hearing.html][Hearing]] + - [[../../cortex/html/touch.html][Touch]] + - [[../../cortex/html/proprioception.html][Proprioception]] + - [[../../cortex/html/movement.html][Muscle Tension]] + +* A roadmap for =CORTEX= experiments + +** Worm World + + Worms in =CORTEX= are segmented creatures which vary in length and + number of segments, and have the senses of vision, proprioception, + touch, and muscle tension. + +#+attr_html: width=755 +#+caption: This is the tactile-sensor-profile for the upper segment of a worm. It defines regions of high touch sensitivity (where there are many white pixels) and regions of low sensitivity (where white pixels are sparse). +[[../images/finger-UV.png]] + + +#+begin_html +
+  [embedded YouTube player: "The worm responds to touch."]
+#+end_html
+
+#+begin_html
+  [embedded YouTube player: "Proprioception in a worm. The
+   proprioceptive readout is in the upper left corner of the screen."]
+#+end_html + + A worm is trained in various actions such as sinusoidal movement, + curling, flailing, and spinning by directly playing motor + contractions while the worm "feels" the experience. These actions + are recorded both as vectors of muscle tension, touch, and + proprioceptive data, but also in higher level forms such as + frequencies of the various contractions and a symbolic name for the + action. + + Then, the worm watches a video of another worm performing one of + the actions, and must judge which action was performed. Normally + this would be an extremely difficult problem, but the worm is able + to greatly diminish the search space through sympathetic + imagination. First, it creates an imagined copy of its body which + it observes from a third person point of view. Then for each frame + of the video, it maneuvers its simulated body to be in registration + with the worm depicted in the video. The physical constraints + imposed by the physics simulation greatly decrease the number of + poses that have to be tried, making the search feasible. As the + imaginary worm moves, it generates imaginary muscle tension and + proprioceptive sensations. The worm determines the action not by + vision, but by matching the imagined proprioceptive data with + previous examples. + + By using non-visual sensory data such as touch, the worms can also + answer body related questions such as "did your head touch your + tail?" and "did worm A touch worm B?" + + The proprioceptive information used for action identification is + body-centric, so only the registration step is dependent on point + of view, not the identification step. Registration is not specific + to any particular action. Thus, action identification can be + divided into a point-of-view dependent generic registration step, + and a action-specific step that is body-centered and invariant to + point of view. + +** Stick Figure World + + This environment is similar to Worm World, except the creatures are + more complicated and the actions and questions more varied. It is + an experiment to see how far imagination can go in interpreting + actions. diff -r b5d0f0adf19f -r 5205535237fb thesis/org/roadmap.org --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/thesis/org/roadmap.org Sat Mar 22 16:10:34 2014 -0400 @@ -0,0 +1,220 @@ +In order for this to be a reasonable thesis that I can be proud of, +what are the /minimum/ number of things I need to get done? + + +* worm OR hand registration + - training from a few examples (2 to start out) + - aligning the body with the scene + - generating sensory data + - matching previous labeled examples using dot-products or some + other basic thing + - showing that it works with different views + +* first draft + - draft of thesis without bibliography or formatting + - should have basic experiment and have full description of + framework with code + - review with Winston + +* final draft + - implement stretch goals from Winston if possible + - complete final formatting and submit + +* CORTEX + DEADLINE: <2014-05-09 Fri> + SHIT THAT'S IN 67 DAYS!!! 
+
+** program simple feature matching code for the worm's segments
+
+Subgoals:
+*** DONE Get cortex working again, run tests, no jmonkeyengine updates
+    CLOSED: [2014-03-03 Mon 22:07] SCHEDULED: <2014-03-03 Mon>
+*** DONE get blender working again
+    CLOSED: [2014-03-03 Mon 22:43] SCHEDULED: <2014-03-03 Mon>
+*** DONE make sparse touch worm segment in blender
+    CLOSED: [2014-03-03 Mon 23:16] SCHEDULED: <2014-03-03 Mon>
+    CLOCK: [2014-03-03 Mon 22:44]--[2014-03-03 Mon 23:16] => 0:32
+*** DONE make multi-segment touch worm with touch sensors and display
+    CLOSED: [2014-03-03 Mon 23:54] SCHEDULED: <2014-03-03 Mon>
+
+*** DONE Make a worm wiggle and curl
+    CLOSED: [2014-03-04 Tue 23:03] SCHEDULED: <2014-03-04 Tue>
+
+
+** First draft
+
+Subgoals:
+*** Write up new worm experiments.
+*** Triage implementation code and get it into chapter form.
+
+
+** for today
+
+- guided worm :: control the worm with the keyboard. Useful for
+     testing the body-centered recog scripts, and for preparing a
+     cool demo video.
+
+- body-centered recognition :: detect actions using hard-coded
+     body-centered scripts.
+
+- cool demo video of the worm being moved and recognizing things ::
+     will be a neat part of the thesis.
+
+- thesis export :: refactoring and organization of code so that it
+     spits out a thesis in addition to the web page.
+
+- video alignment :: analyze the frames of a video in order to align
+     the worm. Requires body-centered recognition. Can "cheat".
+
+- smoother actions :: use debugging controls to directly influence
+     the demo actions, and to generate recognition procedures.
+
+- degenerate video demonstration :: show the system recognizing a
+     curled worm from dead on. Crowning achievement of thesis.
+
+** Ordered from easiest to hardest
+
+Just report the positions of everything. I don't think that this
+necessarily shows anything useful.
+
+Worm-segment vision -- you initialize a view of the worm, but instead
+of pixels you use labels via ray tracing. Has the advantage of still
+allowing for visual occlusion, but reliably identifies the objects,
+even without rainbow coloring. You can code this as an image.
+
+Same as above, except just with worm/non-worm labels.
+
+Color-code each worm segment and then recognize them using blob
+detectors. Then you solve for the perspective and the action
+simultaneously.
+
+The entire worm can be colored the same high-contrast color against a
+nearly black background.
+
+"Rooted" vision. You give the exact coordinates of ONE piece of the
+worm, but the algorithm figures out the rest.
+
+More rooted vision -- start off the entire worm with one position.
+
+The right way to do alignment is to use motion over multiple frames to
+snap individual pieces of the model into place, sharing and
+propagating the individual alignments over the whole model. We also
+want to limit the alignment search to just those actions we are
+prepared to identify. This might mean that I need some small "micro
+actions" such as the individual movements of the worm pieces.
+
+Get just the centers of each segment projected onto the imaging
+plane. (best so far)
+
+Repertoire of actions + video frames -->
+  directed multi-frame-search alg
+
+!! Could also have a bounding box around the worm provided by
+filtering the worm/non-worm render, and use bbbgs. As a bonus, I get
+to include bbbgs in my thesis! Could finally do that recursive thing
+where I make bounding boxes be those things that give results that
+give good bounding boxes.
+If I did this I could use a disruptive pattern on the worm.
+
+Re-imagining using default textures is very simple for this system,
+but hard for others.
+
+Want to demonstrate, at minimum, alignment of some model of the worm
+to the video, and a lookup of the action by simulated perception.
+
+note: the purple/white points make a very beautiful texture, because
+when it moves slightly, the white dots look like they're
+twinkling. Would look even better if it were a darker purple. Also
+would look better more spread out.
+
+Embed the assumption of one frame of view; search by moving around in
+the simulated world.
+
+Allowed to limit the search to a hemisphere around the imagined worm!
+This limits scale also.
+
+!! Limited search with worm/non-worm rendering.
+How much inverse kinematics do we have to do?
+What about cached (allowed state-space) paths, derived from labeled
+training? You have to lead from one to another.
+
+What about initial state? Could start the input videos at a specific
+state, then just match that explicitly.
+
+!! The training doesn't have to be labeled -- you can just move around
+for a while!!
+
+!! Limited search with motion-based alignment.
+
+"play arounds" can establish a chain of linked sensoriums. Future
+matches must fall into one of the already experienced things, and once
+they do, it greatly limits the things that are possible in the future.
+
+Frame differences help to detect muscle exertion (see the sketch at
+the end of this roadmap).
+
+Can try to match on a few "representative" frames. Can also just have
+a few "bodies" in various states which we try to match.
+
+Paths through state-space have the exact same signature as
+simulation. BUT, these can be searched in parallel and don't interfere
+with each other.
+
+** Final stretch up to First Draft
+
+*** DONE complete debug control of worm
+    CLOSED: [2014-03-17 Mon 17:29] SCHEDULED: <2014-03-17 Mon>
+    CLOCK: [2014-03-17 Mon 14:01]--[2014-03-17 Mon 17:29] => 3:28
+*** DONE add phi-space output to debug control
+    CLOSED: [2014-03-17 Mon 17:42] SCHEDULED: <2014-03-17 Mon>
+    CLOCK: [2014-03-17 Mon 17:31]--[2014-03-17 Mon 17:42] => 0:11
+
+*** DONE complete automatic touch partitioning
+    CLOSED: [2014-03-18 Tue 21:43] SCHEDULED: <2014-03-18 Tue>
+*** DONE complete cyclic predicate
+    CLOSED: [2014-03-19 Wed 16:34] SCHEDULED: <2014-03-18 Tue>
+    CLOCK: [2014-03-19 Wed 13:16]--[2014-03-19 Wed 16:34] => 3:18
+*** DONE complete three phi-stream action predicates; test them with debug control
+    CLOSED: [2014-03-19 Wed 16:35] SCHEDULED: <2014-03-17 Mon>
+    CLOCK: [2014-03-18 Tue 18:36]--[2014-03-18 Tue 21:43] => 3:07
+    CLOCK: [2014-03-18 Tue 18:34]--[2014-03-18 Tue 18:36] => 0:02
+    CLOCK: [2014-03-17 Mon 19:19]--[2014-03-17 Mon 21:19] => 2:00
+*** DONE build an automatic "do all the things" sequence.
+    CLOSED: [2014-03-19 Wed 16:55] SCHEDULED: <2014-03-19 Wed>
+    CLOCK: [2014-03-19 Wed 16:53]--[2014-03-19 Wed 16:55] => 0:02
+*** DONE implement proprioception-based movement lookup in phi-space
+    CLOSED: [2014-03-19 Wed 22:04] SCHEDULED: <2014-03-19 Wed>
+    CLOCK: [2014-03-19 Wed 19:32]--[2014-03-19 Wed 22:04] => 2:32
+*** DONE make proprioception reference phi-space indexes
+    CLOSED: [2014-03-19 Wed 22:47] SCHEDULED: <2014-03-19 Wed>
+    CLOCK: [2014-03-19 Wed 22:07]
+
+*** DONE create test videos, also record positions of worm segments
+    CLOSED: [2014-03-20 Thu 22:02] SCHEDULED: <2014-03-19 Wed>
+
+*** TODO Collect intro, worm-learn and cortex creation into draft thesis.
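The note above about frame differences detecting muscle exertion can be made concrete with a very small amount of code. Below is a minimal Clojure sketch of that idea, assuming each frame's muscle state is a flat vector of activation levels; the names =frame-difference= and =exertion-frames= are illustrative and not part of the existing worm code.

#+begin_src clojure
;; Minimal sketch: flag frames where muscle activation changes sharply.
;; Assumes muscle-frames is a sequence of equal-length vectors of
;; activation levels, one vector per frame. Illustrative only.
(defn frame-difference
  "Sum of absolute per-muscle changes between two consecutive frames."
  [frame-a frame-b]
  (reduce + (map (fn [a b] (Math/abs (double (- a b)))) frame-a frame-b)))

(defn exertion-frames
  "Indices of frames whose change from the previous frame exceeds threshold."
  [muscle-frames threshold]
  (keep-indexed
   (fn [i diff] (when (> diff threshold) (inc i)))
   (map frame-difference muscle-frames (rest muscle-frames))))

;; Example:
;; (exertion-frames [[0.0 0.0] [0.1 0.0] [0.9 0.2] [0.9 0.2]] 0.5)
;; => (2)
#+end_src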