changeset 470:3401053124b0

integrating vision into thesis.
author Robert McIntyre <rlm@mit.edu>
date Fri, 28 Mar 2014 17:10:43 -0400
parents ae10f35022ba
children f14fa9e5b67f
files org/vision.org thesis/cortex.org thesis/images/retina-small.png
diffstat 3 files changed, 315 insertions(+), 25 deletions(-) [+]
line wrap: on
line diff
     1.1 --- a/org/vision.org	Fri Mar 28 16:34:35 2014 -0400
     1.2 +++ b/org/vision.org	Fri Mar 28 17:10:43 2014 -0400
     1.3 @@ -174,21 +174,18 @@
     1.4      (bind-sense target cam) cam))
     1.5  #+end_src
     1.6  
     1.7 -#+results: add-eye
     1.8 -: #'cortex.vision/add-eye!
     1.9 -
    1.10  Here, the camera is created based on metadata on the eye-node and
     1.11  attached to the nearest physical object with =bind-sense=.
    1.12  ** The Retina
    1.13  
    1.14  An eye is a surface (the retina) which contains many discrete sensors
    1.15 -to detect light. These sensors have can have different light-sensing
    1.16 -properties.  In humans, each discrete sensor is sensitive to red,
    1.17 -blue, green, or gray. These different types of sensors can have
    1.18 -different spatial distributions along the retina. In humans, there is
    1.19 -a fovea in the center of the retina which has a very high density of
    1.20 -color sensors, and a blind spot which has no sensors at all. Sensor
    1.21 -density decreases in proportion to distance from the fovea.
    1.22 +to detect light. These sensors can have different light-sensing
    1.23 +properties. In humans, each discrete sensor is sensitive to red, blue,
    1.24 +green, or gray. These different types of sensors can have different
    1.25 +spatial distributions along the retina. In humans, there is a fovea in
    1.26 +the center of the retina which has a very high density of color
    1.27 +sensors, and a blind spot which has no sensors at all. Sensor density
    1.28 +decreases in proportion to distance from the fovea.
    1.29  
    1.30  I want to be able to model any retinal configuration, so my eye-nodes
    1.31  in blender contain metadata pointing to images that describe the
     2.1 --- a/thesis/cortex.org	Fri Mar 28 16:34:35 2014 -0400
     2.2 +++ b/thesis/cortex.org	Fri Mar 28 17:10:43 2014 -0400
     2.3 @@ -6,22 +6,36 @@
     2.4  #+LaTeX_CLASS_OPTIONS: [nofloat]
     2.5  
     2.6  * COMMENT templates
     2.7 -  #+caption: 
     2.8 -  #+caption: 
     2.9 -  #+caption: 
    2.10 -  #+caption: 
    2.11 -  #+name: name
    2.12 -  #+begin_listing clojure
    2.13 -  #+begin_src clojure
    2.14 -  #+end_src
    2.15 -  #+end_listing
    2.16 +   #+caption: 
    2.17 +   #+caption: 
    2.18 +   #+caption: 
    2.19 +   #+caption: 
    2.20 +   #+name: name
    2.21 +   #+begin_listing clojure
    2.22 +   #+end_listing
    2.23  
    2.24 -  #+caption: 
    2.25 -  #+caption: 
    2.26 -  #+caption: 
    2.27 -  #+name: name
    2.28 -  #+ATTR_LaTeX: :width 10cm
    2.29 -  [[./images/aurellem-gray.png]]
    2.30 +   #+caption: 
    2.31 +   #+caption: 
    2.32 +   #+caption: 
    2.33 +   #+name: name
    2.34 +   #+ATTR_LaTeX: :width 10cm
    2.35 +   [[./images/aurellem-gray.png]]
    2.36 +
    2.37 +    #+caption: 
    2.38 +    #+caption: 
    2.39 +    #+caption: 
    2.40 +    #+caption: 
    2.41 +    #+name: name
    2.42 +    #+begin_listing clojure
    2.43 +    #+end_listing
    2.44 +
    2.45 +    #+caption: 
    2.46 +    #+caption: 
    2.47 +    #+caption: 
    2.48 +    #+name: name
    2.49 +    #+ATTR_LaTeX: :width 10cm
    2.50 +    [[./images/aurellem-gray.png]]
    2.51 +
    2.52  
    2.53  * COMMENT Empathy and Embodiment as problem solving strategies
    2.54    
    2.55 @@ -942,6 +956,285 @@
    2.56  
    2.57  ** Eyes reuse standard video game components
    2.58  
    2.59 +   Vision is one of the most important senses for humans, so I need to
    2.60 +   build a simulated sense of vision for my AI. I will do this with
    2.61 +   simulated eyes. Each eye can be independently moved and should see
    2.62 +   its own version of the world depending on where it is.
    2.63 +
     2.64 +   Making these simulated eyes a reality is simple because
     2.65 +   jMonkeyEngine already contains extensive support for multiple
     2.66 +   views of the same 3D simulated world. jMonkeyEngine has this
     2.67 +   support because it is needed for split-screen games. Multiple
     2.68 +   views are also used to create efficient pseudo-reflections by
     2.69 +   rendering the scene from a certain perspective and then
     2.70 +   projecting it back onto a surface in the 3D world.
    2.72 +
     2.73 +   #+caption: jMonkeyEngine supports multiple views to enable 
     2.74 +   #+caption: split-screen games, like GoldenEye, which popularized 
     2.75 +   #+caption: four-player split-screen play.
     2.76 +   #+name: goldeneye
    2.77 +   #+ATTR_LaTeX: :width 10cm
    2.78 +   [[./images/goldeneye-4-player.png]]
    2.79 +
    2.80 +*** A Brief Description of jMonkeyEngine's Rendering Pipeline
    2.81 +
    2.82 +    jMonkeyEngine allows you to create a =ViewPort=, which represents a
    2.83 +    view of the simulated world. You can create as many of these as you
    2.84 +    want. Every frame, the =RenderManager= iterates through each
     2.85 +    =ViewPort=, rendering the scene on the GPU. For each =ViewPort= there
     2.86 +    is a =FrameBuffer= which holds the rendered image in GPU memory.
    2.87 +  
    2.88 +    #+caption: =ViewPorts= are cameras in the world. During each frame, 
    2.89 +    #+caption: the =RenderManager= records a snapshot of what each view 
    2.90 +    #+caption: is currently seeing; these snapshots are =FrameBuffer= objects.
     2.91 +    #+name: rendermanager
    2.92 +    #+ATTR_LaTeX: :width 10cm
    2.93 +    [[../images/diagram_rendermanager2.png]]
    2.94 +
     2.95 +    Each =ViewPort= can have any number of attached =SceneProcessor=
     2.96 +    objects, which are called every time a new frame is rendered. A
     2.97 +    =SceneProcessor= receives its =ViewPort='s =FrameBuffer= and can do
     2.98 +    whatever it wants with the data. Often this consists of invoking
     2.99 +    GPU-specific operations on the rendered image. The =SceneProcessor=
    2.100 +    can also copy the GPU image data to RAM and process it with the CPU.
   2.101 +
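          +    For concreteness, here is a minimal sketch of how a view might
          +    be wired up by hand (=CORTEX='s own =add-camera!=, used later,
          +    does something similar). It assumes jMonkeyEngine's
          +    =RenderManager.createPostView=, =ViewPort.attachScene=, and
          +    =ViewPort.addProcessor=; the name =attach-view= and its
          +    arguments are illustrative.
          +
          +    #+begin_src clojure
          +(defn attach-view
          +  "Sketch: create a ViewPort for 'eye-cam, aim it at 'scene, and
          +   attach 'processor so it is called on every rendered frame."
          +  [#^RenderManager render-manager #^Node scene
          +   #^Camera eye-cam #^SceneProcessor processor]
          +  (let [view (.createPostView render-manager "eye-view" eye-cam)]
          +    (.attachScene view scene)
          +    (.addProcessor view processor)
          +    view))
          +    #+end_src
          +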
   2.102 +*** Appropriating Views for Vision
   2.103 +
   2.104 +    Each eye in the simulated creature needs its own =ViewPort= so
   2.105 +    that it can see the world from its own perspective. To this
    2.106 +    =ViewPort=, I add a =SceneProcessor= that feeds the visual data to
    2.107 +    an arbitrary continuation function for further processing. That
    2.108 +    continuation function may perform both CPU and GPU operations on
    2.109 +    the data. To make this easy for the continuation function, the
    2.110 +    =SceneProcessor= maintains appropriately sized buffers in RAM to
    2.111 +    hold the data. It does not itself copy the data from the GPU to
    2.112 +    the CPU, because that is a slow operation.
   2.113 +
    2.114 +    #+caption: Function to make the rendered scene in jMonkeyEngine 
   2.115 +    #+caption: available for further processing.
   2.116 +    #+name: pipeline-1 
   2.117 +    #+begin_listing clojure
   2.118 +    #+begin_src clojure
    2.119 +(defn vision-pipeline
    2.120 +  "Create a SceneProcessor object which wraps a vision processing
    2.121 +  continuation function. The continuation is a function that takes
    2.122 +  [#^Renderer r #^FrameBuffer fb #^ByteBuffer b #^BufferedImage bi],
    2.123 +  each of which has already been appropriately sized."
    2.124 +  [continuation]
    2.125 +  (let [byte-buffer (atom nil)
    2.126 +        renderer    (atom nil)
    2.127 +        image       (atom nil)]
    2.128 +    (proxy [SceneProcessor] []
    2.129 +      (initialize
    2.130 +       [renderManager viewPort]
    2.131 +       (let [cam    (.getCamera viewPort)
    2.132 +             width  (.getWidth cam)
    2.133 +             height (.getHeight cam)]
    2.134 +         (reset! renderer (.getRenderer renderManager))
    2.135 +         (reset! byte-buffer
    2.136 +                 (BufferUtils/createByteBuffer
    2.137 +                  (* width height 4)))
    2.138 +         (reset! image (BufferedImage.
    2.139 +                        width height
    2.140 +                        BufferedImage/TYPE_4BYTE_ABGR))))
    2.141 +      (isInitialized [] (not (nil? @byte-buffer)))
    2.142 +      (reshape [_ _ _])
    2.143 +      (preFrame [_])
    2.144 +      (postQueue [_])
    2.145 +      (postFrame
    2.146 +       [#^FrameBuffer fb]
    2.147 +       (.clear @byte-buffer)
    2.148 +       (continuation @renderer fb @byte-buffer @image))
    2.149 +      (cleanup []))))
   2.150 +    #+end_src
   2.151 +    #+end_listing
   2.152 +
   2.153 +    The continuation function given to =vision-pipeline= above will be
   2.154 +    given a =Renderer= and three containers for image data. The
   2.155 +    =FrameBuffer= references the GPU image data, but the pixel data
    2.156 +    cannot be used directly on the CPU. The =ByteBuffer= and
   2.157 +    =BufferedImage= are initially "empty" but are sized to hold the
   2.158 +    data in the =FrameBuffer=. I call transferring the GPU image data
   2.159 +    to the CPU structures "mixing" the image data.
   2.160 +
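          +    As a sketch of what such a continuation might do, the following
          +    hypothetical helper performs the "mixing" with jMonkeyEngine's
          +    =Renderer.readFrameBuffer= and =Screenshots= utility; the name
          +    =mix-image-data= is illustrative, not =CORTEX= code.
          +
          +    #+begin_src clojure
          +(defn mix-image-data
          +  "Sketch of a continuation: copy the GPU image in fb into the
          +   CPU-side byte-buffer, then decode it into the BufferedImage."
          +  [#^Renderer r #^FrameBuffer fb #^ByteBuffer bb #^BufferedImage bi]
          +  (.readFrameBuffer r fb bb)            ; GPU memory -> RAM
          +  (Screenshots/convertScreenShot bb bi) ; raw bytes -> image
          +  bi)
          +    #+end_src
          +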
   2.161 +*** Optical sensor arrays are described with images and referenced with metadata
   2.162 +
   2.163 +    The vision pipeline described above handles the flow of rendered
   2.164 +    images. Now, =CORTEX= needs simulated eyes to serve as the source
   2.165 +    of these images.
   2.166 +
    2.167 +    An eye is described in blender in the same way as a joint: it is
    2.168 +    a zero-dimensional empty object with no geometry whose local
    2.169 +    coordinate system determines the orientation of the resulting eye.
    2.170 +    All eyes are children of a parent node named "eyes", just as all
    2.171 +    joints have a parent named "joints". An eye binds to the nearest
    2.172 +    physical object with =bind-sense=.
   2.173 +
   2.174 +    #+caption: Here, the camera is created based on metadata on the
   2.175 +    #+caption: eye-node and attached to the nearest physical object 
    2.176 +    #+caption: with =bind-sense=.
   2.177 +    #+name: add-eye
    2.178 +    #+begin_listing clojure
          +    #+begin_src clojure
   2.179 +(defn add-eye!
   2.180 +  "Create a Camera centered on the current position of 'eye which
   2.181 +   follows the closest physical node in 'creature. The camera will
   2.182 +   point in the X direction and use the Z vector as up as determined
   2.183 +   by the rotation of these vectors in blender coordinate space. Use
   2.184 +   XZY rotation for the node in blender."
   2.185 +  [#^Node creature #^Spatial eye]
   2.186 +  (let [target (closest-node creature eye)
   2.187 +        [cam-width cam-height] 
   2.188 +        ;;[640 480] ;; graphics card on laptop doesn't support
    2.189 +                    ;; arbitrary dimensions.
   2.190 +        (eye-dimensions eye)
   2.191 +        cam (Camera. cam-width cam-height)
   2.192 +        rot (.getWorldRotation eye)]
   2.193 +    (.setLocation cam (.getWorldTranslation eye))
   2.194 +    (.lookAtDirection
   2.195 +     cam                           ; this part is not a mistake and
   2.196 +     (.mult rot Vector3f/UNIT_X)   ; is consistent with using Z in
   2.197 +     (.mult rot Vector3f/UNIT_Y))  ; blender as the UP vector.
   2.198 +    (.setFrustumPerspective
   2.199 +     cam (float 45)
   2.200 +     (float (/ (.getWidth cam) (.getHeight cam)))
   2.201 +     (float 1)
   2.202 +     (float 1000))
   2.203 +    (bind-sense target cam) cam))
          +    #+end_src
    2.204 +    #+end_listing
   2.205 +
   2.206 +*** Simulated Retina 
   2.207 +
   2.208 +    An eye is a surface (the retina) which contains many discrete
   2.209 +    sensors to detect light. These sensors can have different
   2.210 +    light-sensing properties. In humans, each discrete sensor is
   2.211 +    sensitive to red, blue, green, or gray. These different types of
   2.212 +    sensors can have different spatial distributions along the retina.
   2.213 +    In humans, there is a fovea in the center of the retina which has
   2.214 +    a very high density of color sensors, and a blind spot which has
   2.215 +    no sensors at all. Sensor density decreases in proportion to
   2.216 +    distance from the fovea.
   2.217 +
    2.218 +    I want to be able to model any retinal configuration, so my
    2.219 +    eye-nodes in blender contain metadata pointing to images that
    2.220 +    describe the precise position of the individual sensors using
    2.221 +    white pixels. The metadata also describes the light sensitivity
    2.222 +    of the sensors that each image describes. An eye can contain any
    2.223 +    number of these images. For example, the metadata for an eye
    2.224 +    might look like this:
   2.225 +
   2.226 +    #+begin_src clojure
   2.227 +{0xFF0000 "Models/test-creature/retina-small.png"}
   2.228 +    #+end_src
   2.229 +
   2.230 +    #+caption: An example retinal profile image. White pixels are 
   2.231 +    #+caption: photo-sensitive elements. The distribution of white 
   2.232 +    #+caption: pixels is denser in the middle and falls off at the 
   2.233 +    #+caption: edges and is inspired by the human retina.
   2.234 +    #+name: retina
   2.235 +    #+ATTR_LaTeX: :width 10cm
   2.236 +    [[./images/retina-small.png]]
   2.237 +
    2.238 +    Together, the number 0xFF0000 and the image above describe the
    2.239 +    placement of red-sensitive sensory elements.
   2.240 +
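          +    Such metadata might be read off an eye-node with a sketch like
          +    this, assuming the map literal is stored as a string under the
          +    user-data key "eye" (the key and the name =retina-profile= are
          +    illustrative):
          +
          +    #+begin_src clojure
          +(defn retina-profile
          +  "Sketch: read the {sensitivity -> image-path} map stored as a
          +   string in an eye node's metadata."
          +  [#^Spatial eye]
          +  (when-let [data (.getUserData eye "eye")]
          +    (read-string data)))
          +    #+end_src
          +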
    2.241 +    Metadata that very crudely approximates a human eye might look
    2.242 +    something like this:
   2.243 +
   2.244 +    #+begin_src clojure
   2.245 +(let [retinal-profile "Models/test-creature/retina-small.png"]
   2.246 +  {0xFF0000 retinal-profile
   2.247 +   0x00FF00 retinal-profile
   2.248 +   0x0000FF retinal-profile
   2.249 +   0xFFFFFF retinal-profile})
   2.250 +    #+end_src
   2.251 +
   2.252 +    The numbers that serve as keys in the map determine a sensor's
   2.253 +    relative sensitivity to the channels red, green, and blue. These
   2.254 +    sensitivity values are packed into an integer in the order
    2.255 +    =|_|R|G|B|= in 8-bit fields. The RGB values of a pixel in the
    2.256 +    rendered image are combined using these sensitivities as linear
   2.257 +    weights. Therefore, 0xFF0000 means sensitive to red only while
   2.258 +    0xFFFFFF means sensitive to all colors equally (gray).
   2.259 +
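          +    A sketch of that weighting, normalized by the total sensitivity
          +    so that a fully sensitive sensor over a white pixel reads 255
          +    (the name =sense-pixel= and the exact normalization are
          +    illustrative; the =pixel-sense= used below may scale
          +    differently):
          +
          +    #+begin_src clojure
          +(defn sense-pixel
          +  "Sketch: weight the RGB channels of 'rgb (a packed ARGB int) by
          +   the 8-bit fields of 'sensitivity; returns a 0-255 activation."
          +  [sensitivity rgb]
          +  (let [field (fn [x n] (bit-and 0xFF (bit-shift-right x n)))
          +        weights [(field sensitivity 16) (field sensitivity 8)
          +                 (field sensitivity 0)]
          +        colors  [(field rgb 16) (field rgb 8) (field rgb 0)]]
          +    ;; weighted sum of channels, normalized by total weight
          +    (quot (reduce + (map * weights colors))
          +          (max 1 (reduce + weights)))))
          +    #+end_src
          +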
   2.260 +    #+caption: This is the core of vision in =CORTEX=. A given eye node 
   2.261 +    #+caption: is converted into a function that returns visual
   2.262 +    #+caption: information from the simulation.
    2.263 +    #+name: vision-kernel
    2.264 +    #+begin_listing clojure
          +    #+begin_src clojure
   2.265 +(defn vision-kernel
   2.266 +  "Returns a list of functions, each of which will return a color
   2.267 +   channel's worth of visual information when called inside a running
   2.268 +   simulation."
   2.269 +  [#^Node creature #^Spatial eye & {skip :skip :or {skip 0}}]
   2.270 +  (let [retinal-map (retina-sensor-profile eye)
   2.271 +        camera (add-eye! creature eye)
   2.272 +        vision-image
   2.273 +        (atom
   2.274 +         (BufferedImage. (.getWidth camera)
   2.275 +                         (.getHeight camera)
   2.276 +                         BufferedImage/TYPE_BYTE_BINARY))
   2.277 +        register-eye!
   2.278 +        (runonce
   2.279 +         (fn [world]
   2.280 +           (add-camera!
   2.281 +            world camera
   2.282 +            (let [counter  (atom 0)]
   2.283 +              (fn [r fb bb bi]
   2.284 +                (if (zero? (rem (swap! counter inc) (inc skip)))
   2.285 +                  (reset! vision-image
   2.286 +                          (BufferedImage! r fb bb bi))))))))]
   2.287 +     (vec
   2.288 +      (map
   2.289 +       (fn [[key image]]
   2.290 +         (let [whites (white-coordinates image)
   2.291 +               topology (vec (collapse whites))
   2.292 +               sensitivity (sensitivity-presets key key)]
   2.293 +           (attached-viewport.
   2.294 +            (fn [world]
   2.295 +              (register-eye! world)
   2.296 +              (vector
   2.297 +               topology
   2.298 +               (vec 
   2.299 +                (for [[x y] whites]
   2.300 +                  (pixel-sense 
   2.301 +                   sensitivity
   2.302 +                   (.getRGB @vision-image x y))))))
   2.303 +            register-eye!)))
   2.304 +         retinal-map))))
          +    #+end_src
    2.305 +    #+end_listing
   2.306 +
    2.307 +    Note that since each of the functions generated by =vision-kernel=
    2.308 +    shares the same =register-eye!= function, the eye will be
    2.309 +    registered only once, the first time any of the functions from
    2.310 +    the list returned by =vision-kernel= is called. Each of the
    2.311 +    functions returned by =vision-kernel= also allows access to the
    2.312 +    =ViewPort= through which it receives images.
   2.313 +
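          +    The once-only behavior comes from =runonce=. A minimal sketch of
          +    that idiom (the actual =runonce= may differ, e.g. in what it
          +    returns on later calls):
          +
          +    #+begin_src clojure
          +(defn runonce
          +  "Sketch: wrap f so that its body runs only on the first call;
          +   subsequent calls are no-ops."
          +  [f]
          +  (let [ran? (atom false)]
          +    (fn [& args]
          +      (when (compare-and-set! ran? false true)
          +        (apply f args)))))
          +    #+end_src
          +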
   2.314 +    All the hard work has been done; all that remains is to apply
   2.315 +    =vision-kernel= to each eye in the creature and gather the results
   2.316 +    into one list of functions.
   2.317 +
   2.319 +    #+caption: With =vision!=, =CORTEX= is already a fine simulation 
   2.320 +    #+caption: environment for experimenting with different types of 
   2.321 +    #+caption: eyes.
   2.322 +    #+name: vision!
    2.323 +    #+begin_listing clojure
          +    #+begin_src clojure
   2.324 +(defn vision!
   2.325 +  "Returns a list of functions, each of which returns visual sensory
   2.326 +   data when called inside a running simulation."
   2.327 +  [#^Node creature & {skip :skip :or {skip 0}}]
   2.328 +  (reduce
   2.329 +   concat 
   2.330 +   (for [eye (eyes creature)]
   2.331 +     (vision-kernel creature eye))))
          +    #+end_src
    2.332 +    #+end_listing
   2.333 +
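          +    As a hypothetical usage example, the functions returned by
          +    =vision!= might be sampled like this inside a running simulation
          +    (=creature= and =world= are placeholders):
          +
          +    #+begin_src clojure
          +(comment
          +  ;; each vision function yields a [topology sensor-values] pair
          +  (let [vision-fns (vision! creature)]
          +    (doseq [v vision-fns]
          +      (let [[topology sensor-values] (v world)]
          +        (println (count sensor-values) "sensor readings")))))
          +    #+end_src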
   2.334 +
   2.338  ** Hearing is hard; =CORTEX= does it right
   2.339  
   2.340  ** Touch uses hundreds of hair-like elements
     3.1 Binary file thesis/images/retina-small.png has changed