Mercurial > cortex

     1 #+title: Simulated Sense of Sight

     2 #+author: Robert McIntyre

     3 #+email: rlm@mit.edu

     4 #+description: Simulated sight for AI research using JMonkeyEngine3 and clojure

     5 #+keywords: computer vision, jMonkeyEngine3, clojure

     6 #+SETUPFILE: ../../aurellem/org/setup.org

     7 #+INCLUDE: ../../aurellem/org/level-0.org

     8 #+babel: :mkdirp yes :noweb yes :exports both

     9 

    10 * Vision

    11 

    12 

    13 Vision is one of the most important senses for humans, so I need to

    14 build a simulated sense of vision for my AI. I will do this with

    15 simulated eyes. Each eye can be independely moved and should see its

    16 own version of the world depending on where it is.

    17 

    18 Making these simulated eyes a reality is fairly simple bacause

    19 jMonkeyEngine already conatains extensive support for multiple views

    20 of the same 3D simulated world. The reason jMonkeyEngine has this

    21 support is because the support is necessary to create games with

    22 split-screen views. Multiple views are also used to create efficient

    23 pseudo-reflections by rendering the scene from a certain perspective

    24 and then projecting it back onto a surface in the 3D world.

    25 

    26 #+caption: jMonkeyEngine supports multiple views to enable split-screen games, like GoldenEye

    27 [[../images/goldeneye-4-player.png]]

    28 

    29 * Brief Description of jMonkeyEngine's Rendering Pipeline

    30 

    31 jMonkeyEngine allows you to create a =ViewPort=, which represents a

    32 view of the simulated world. You can create as many of these as you

    33 want. Every frame, the =RenderManager= iterates through each

    34 =ViewPort=, rendering the scene in the GPU. For each =ViewPort= there

    35 is a =FrameBuffer= which represents the rendered image in the GPU.

    36 

    37 Each =ViewPort= can have any number of attached =SceneProcessor=

    38 objects, which are called every time a new frame is rendered. A

    39 =SceneProcessor= recieves a =FrameBuffer= and can do whatever it wants

    40 to the data.  Often this consists of invoking GPU specific operations

    41 on the rendered image.  The =SceneProcessor= can also copy the GPU

    42 image data to RAM and process it with the CPU.

    43 

    44 * The Vision Pipeline

    45 

    46 Each eye in the simulated creature needs it's own =ViewPort= so that

    47 it can see the world from its own perspective. To this =ViewPort=, I

    48 add a =SceneProcessor= that feeds the visual data to any arbitray

    49 continuation function for further processing.  That continuation

    50 function may perform both CPU and GPU operations on the data. To make

    51 this easy for the continuation function, the =SceneProcessor=

    52 maintains appropriatly sized buffers in RAM to hold the data.  It does

    53 not do any copying from the GPU to the CPU itself.

    54 

    55 #+name: pipeline-1

    56 #+begin_src clojure

    57 (defn vision-pipeline

    58   "Create a SceneProcessor object which wraps a vision processing

    59   continuation function. The continuation is a function that takes 

    60   [#^Renderer r #^FrameBuffer fb #^ByteBuffer b #^BufferedImage bi],

    61   each of which has already been appropiately sized."

    62   [continuation]

    63   (let [byte-buffer (atom nil)

    64 	renderer (atom nil)

    65         image (atom nil)]

    66   (proxy [SceneProcessor] []

    67     (initialize

    68      [renderManager viewPort]

    69      (let [cam (.getCamera viewPort)

    70 	   width (.getWidth cam)

    71 	   height (.getHeight cam)]

    72        (reset! renderer (.getRenderer renderManager))

    73        (reset! byte-buffer

    74 	     (BufferUtils/createByteBuffer

    75 	      (* width height 4)))

    76         (reset! image (BufferedImage.

    77                       width height

    78                       BufferedImage/TYPE_4BYTE_ABGR))))

    79     (isInitialized [] (not (nil? @byte-buffer)))

    80     (reshape [_ _ _])

    81     (preFrame [_])

    82     (postQueue [_])

    83     (postFrame

    84      [#^FrameBuffer fb]

    85      (.clear @byte-buffer)

    86      (continuation @renderer fb @byte-buffer @image))

    87     (cleanup []))))

    88 #+end_src

    89 

    90 The continuation function given to =(vision-pipeline)= above will be

    91 given a =Renderer= and three containers for image data. The

    92 =FrameBuffer= references the GPU image data, but it can not be used

    93 directly on the CPU.  The =ByteBuffer= and =BufferedImage= are

    94 initially "empty" but are sized to hold to data in the

    95 =FrameBuffer=. I call transfering the GPU image data to the CPU

    96 structures "mixing" the image data. I have provided three functions to

    97 do this mixing.

    98 

    99 #+name: pipeline-2

   100 #+begin_src clojure

   101 (defn frameBuffer->byteBuffer!

   102   "Transfer the data in the graphics card (Renderer, FrameBuffer) to

   103    the CPU (ByteBuffer)."  

   104   [#^Renderer r #^FrameBuffer fb #^ByteBuffer bb]

   105   (.readFrameBuffer r fb bb) bb)

   106 

   107 (defn byteBuffer->bufferedImage!

   108   "Convert the C-style BGRA image data in the ByteBuffer bb to the AWT

   109    style ABGR image data and place it in BufferedImage bi."

   110   [#^ByteBuffer bb #^BufferedImage bi]

   111   (Screenshots/convertScreenShot bb bi) bi)

   112 

   113 (defn BufferedImage!

   114   "Continuation which will grab the buffered image from the materials

   115    provided by (vision-pipeline)."

   116   [#^Renderer r #^FrameBuffer fb #^ByteBuffer bb #^BufferedImage bi]

   117   (byteBuffer->bufferedImage!

   118    (frameBuffer->byteBuffer! r fb bb) bi))

   119 #+end_src

   120 

   121 Note that it is possible to write vision processing algorithms

   122 entirely in terms of =BufferedImage= inputs. Just compose that

   123 =BufferedImage= algorithm with =(BufferedImage!)=. However, a vision

   124 processing algorithm that is entirely hosted on the GPU does not have

   125 to pay for this convienence.

   126 

   127 * COMMENT asdasd 

   128 

   129 (vision creature) will take an optional :skip argument which will

   130 inform the continuations in scene processor to skip the given

   131 number of cycles 0 means that no cycles will be skipped.

   132 

   133 (vision creature) will return [init-functions sensor-functions].

   134 The init-functions are each single-arg functions that take the

   135 world and register the cameras and must each be called before the

   136 corresponding sensor-functions.  Each init-function returns the

   137 viewport for that eye which can be manipulated, saved, etc. Each

   138 sensor-function is a thunk and will return data in the same

   139 format as the tactile-sensor functions the structure is

   140 [topology, sensor-data]. Internally, these sensor-functions

   141 maintain a reference to sensor-data which is periodically updated

   142 by the continuation function established by its init-function.

   143 They can be queried every cycle, but their information may not

   144 necessairly be different every cycle.

   145 

   146 

   147 

   148 * Physical Eyes

   149 

   150 The vision pipeline described above handles the flow of rendered

   151 images. Now, we need simulated eyes to serve as the source of these

   152 images. 

   153 

   154 An eye is described in blender in the same way as a joint. They are

   155 zero dimensional empty objects with no geometry whose local coordinate

   156 system determines the orientation of the resulting eye. All eyes are

   157 childern of a parent node named "eyes" just as all joints have a

   158 parent named "joints". An eye binds to the nearest physical object

   159 with =(bind-sense=).

   160 

   161 #+name: add-eye

   162 #+begin_src clojure

   163 (defn add-eye!

   164   "Create a Camera centered on the current position of 'eye which

   165    follows the closest physical node in 'creature and sends visual

   166    data to 'continuation."

   167   [#^Node creature #^Spatial eye]

   168   (let [target (closest-node creature eye)

   169         [cam-width cam-height] (eye-dimensions eye)

   170         cam (Camera. cam-width cam-height)]

   171     (.setLocation cam (.getWorldTranslation eye))

   172     (.setRotation cam (.getWorldRotation eye))

   173     (.setFrustumPerspective

   174      cam 45 (/ (.getWidth cam) (.getHeight cam))

   175      1 1000)

   176     (bind-sense target cam)

   177     cam))

   178 #+end_src

   179 

   180 Here, the camera is created based on metadata on the eye-node and

   181 attached to the nearest physical object with =(bind-sense)=

   182 

   183 

   184 ** The Retina

   185 

   186 An eye is a surface (the retina) which contains many discrete sensors

   187 to detect light. These sensors have can have different-light sensing

   188 properties.  In humans, each discrete sensor is sensitive to red,

   189 blue, green, or gray. These different types of sensors can have

   190 different spatial distributions along the retina. In humans, there is

   191 a fovea in the center of the retina which has a very high density of

   192 color sensors, and a blind spot which has no sensors at all. Sensor

   193 density decreases in proportion to distance from the retina.

   194 

   195 I want to be able to model any retinal configuration, so my eye-nodes

   196 in blender contain metadata pointing to images that describe the

   197 percise position of the individual sensors using white pixels. The

   198 meta-data also describes the percise sensitivity to light that the

   199 sensors described in the image have.  An eye can contain any number of

   200 these images. For example, the metadata for an eye might look like

   201 this:

   202 

   203 #+begin_src clojure

   204 {0xFF0000 "Models/test-creature/retina-small.png"}

   205 #+end_src

   206 

   207 #+caption: The retinal profile image "Models/test-creature/retina-small.png". White pixels are photo-sensitive elements. The distribution of white pixels is denser in the middle and falls off at the edges and is inspired by the human retina.

   208 [[../assets/Models/test-creature/retina-small.png]]

   209 

   210 Together, the number 0xFF0000 and the image image above describe the

   211 placement of red-sensitive sensory elements.

   212 

   213 Meta-data to very crudely approximate a human eye might be something

   214 like this:

   215 

   216 #+begin_src clojure

   217 (let [retinal-profile "Models/test-creature/retina-small.png"]

   218   {0xFF0000 retinal-profile

   219    0x00FF00 retinal-profile

   220    0x0000FF retinal-profile

   221    0xFFFFFF retinal-profile})

   222 #+end_src

   223 

   224 The numbers that serve as keys in the map determine a sensor's

   225 relative sensitivity to the channels red, green, and blue. These

   226 sensitivity values are packed into an integer in the order _RGB in

   227 8-bit fields. The RGB values of a pixel in the image are added

   228 together with these sensitivities as linear weights. Therfore,

   229 0xFF0000 means sensitive to red only while 0xFFFFFF means sensitive to

   230 all colors equally (gray).

   231 

   232 For convienence I've defined a few symbols for the more common

   233 sensitivity values.

   234 

   235 #+name: sensitivity

   236 #+begin_src clojure

   237 (defvar sensitivity-presets

   238   {:all    0xFFFFFF

   239    :red    0xFF0000

   240    :blue   0x0000FF

   241    :green  0x00FF00}

   242   "Retinal sensitivity presets for sensors that extract one channel

   243    (:red :blue :green) or average all channels (:gray)")

   244 #+end_src

   245 

   246 ** Metadata Processing

   247 

   248 =(retina-sensor-profile)= extracts a map from the eye-node in the same

   249 format as the example maps above.  =(eye-dimensions)= finds the

   250 dimansions of the smallest image required to contain all the retinal

   251 sensor maps.

   252 

   253 #+begin_src clojure

   254 (defn retina-sensor-profile

   255   "Return a map of pixel sensitivity numbers to BufferedImages

   256    describing the distribution of light-sensitive components of this

   257    eye. :red, :green, :blue, :gray are already defined as extracting

   258    the red, green, blue, and average components respectively."

   259    [#^Spatial eye]

   260    (if-let [eye-map (meta-data eye "eye")]

   261      (map-vals

   262       load-image

   263       (eval (read-string eye-map)))))

   264 

   265 (defn eye-dimensions

   266   "Returns [width, height] specified in the metadata of the eye"

   267   [#^Spatial eye]

   268   (let [dimensions

   269           (map #(vector (.getWidth %) (.getHeight %))

   270                (vals (retina-sensor-profile eye)))]

   271     [(apply max (map first dimensions))

   272      (apply max (map second dimensions))]))

   273 #+end_src

   274 

   275 

   276 * Eye Creation 

   277 

   278 First off, get the children of the "eyes" empty node to find all the

   279 eyes the creature has.

   280 

   281 #+begin_src clojure

   282 (defvar 

   283   ^{:arglists '([creature])}

   284   eyes

   285   (sense-nodes "eyes")

   286   "Return the children of the creature's \"eyes\" node.")

   287 #+end_src

   288 

   289 Then, 

   290 

   291 #+begin_src clojure

   292 (defn add-camera!

   293   "Add a camera to the world, calling continuation on every frame

   294   produced." 

   295   [#^Application world camera continuation]

   296   (let [width (.getWidth camera)

   297 	height (.getHeight camera)

   298 	render-manager (.getRenderManager world)

   299 	viewport (.createMainView render-manager "eye-view" camera)]

   300     (doto viewport

   301       (.setClearFlags true true true)

   302       (.setBackgroundColor ColorRGBA/Black)

   303       (.addProcessor (vision-pipeline continuation))

   304       (.attachScene (.getRootNode world)))))

   305 

   306 

   307 

   308 

   309 

   310 (defn vision-fn

   311   "Returns a list of functions, each of which will return a color

   312    channel's worth of visual information when called inside a running

   313    simulation."

   314   [#^Node creature #^Spatial eye & {skip :skip :or {skip 0}}]

   315   (let [retinal-map (retina-sensor-profile eye)

   316         camera (add-eye! creature eye)

   317         vision-image

   318         (atom

   319          (BufferedImage. (.getWidth camera)

   320                          (.getHeight camera)

   321                          BufferedImage/TYPE_BYTE_BINARY))

   322         register-eye!

   323         (runonce

   324          (fn [world]

   325            (add-camera!

   326             world camera

   327             (let [counter  (atom 0)]

   328               (fn [r fb bb bi]

   329                 (if (zero? (rem (swap! counter inc) (inc skip)))

   330                   (reset! vision-image

   331                           (BufferedImage! r fb bb bi))))))))]

   332      (vec

   333       (map

   334        (fn [[key image]]

   335          (let [whites (white-coordinates image)

   336                topology (vec (collapse whites))

   337                mask (color-channel-presets key key)]

   338            (fn [world]

   339              (register-eye! world)

   340              (vector

   341               topology

   342               (vec 

   343                (for [[x y] whites]

   344                  (bit-and

   345                   mask (.getRGB @vision-image x y))))))))

   346        retinal-map))))

   347 

   348 

   349 ;; TODO maybe should add a viewport-manipulation function to

   350 ;; automatically change viewport settings, attach shadow filters, etc.

   351 

   352 (defn vision!

   353   "Returns a function which returns visual sensory data when called

   354    inside a running simulation"

   355   [#^Node creature & {skip :skip :or {skip 0}}]

   356   (reduce

   357    concat 

   358    (for [eye (eyes creature)]

   359      (vision-fn creature eye))))

   360 

   361 (defn view-vision

   362   "Creates a function which accepts a list of visual sensor-data and

   363   displays each element of the list to the screen." 

   364   []

   365   (view-sense

   366    (fn 

   367      [[coords sensor-data]]

   368      (let [image (points->image coords)]

   369        (dorun

   370         (for [i (range (count coords))]

   371           (.setRGB image ((coords i) 0) ((coords i) 1)

   372                    (sensor-data i))))

   373        image))))

   374 

   375 #+end_src

   376 

   377 

   378 Note the use of continuation passing style for connecting the eye to a

   379 function to process the output. You can create any number of eyes, and

   380 each of them will see the world from their own =Camera=. Once every

   381 frame, the rendered image is copied to a =BufferedImage=, and that

   382 data is sent off to the continuation function. Moving the =Camera=

   383 which was used to create the eye will change what the eye sees.

   384 

   385 * Example

   386 

   387 #+name: test-vision

   388 #+begin_src clojure

   389 (ns cortex.test.vision

   390   (:use (cortex world util vision))

   391   (:import java.awt.image.BufferedImage)

   392   (:import javax.swing.JPanel)

   393   (:import javax.swing.SwingUtilities)

   394   (:import java.awt.Dimension)

   395   (:import javax.swing.JFrame)

   396   (:import com.jme3.math.ColorRGBA)

   397   (:import com.jme3.scene.Node)

   398   (:import com.jme3.math.Vector3f))

   399 

   400 (defn test-two-eyes

   401   "Testing vision:

   402    Tests the vision system by creating two views of the same rotating

   403    object from different angles and displaying both of those views in

   404    JFrames.

   405 

   406    You should see a rotating cube, and two windows,

   407    each displaying a different view of the cube."

   408   []

   409   (let [candy

   410         (box 1 1 1 :physical? false :color ColorRGBA/Blue)]

   411     (world

   412      (doto (Node.)

   413        (.attachChild candy))

   414      {}

   415      (fn [world]

   416        (let [cam (.clone (.getCamera world))

   417              width (.getWidth cam)

   418              height (.getHeight cam)]

   419          (add-camera! world cam 

   420                   ;;no-op

   421                   (comp (view-image) BufferedImage!)

   422                   )

   423          (add-camera! world

   424                   (doto (.clone cam)

   425                     (.setLocation (Vector3f. -10 0 0))

   426                     (.lookAt Vector3f/ZERO Vector3f/UNIT_Y))

   427                   ;;no-op

   428                   (comp (view-image) BufferedImage!))

   429          ;; This is here to restore the main view

   430          ;; after the other views have completed processing

   431          (add-camera! world (.getCamera world) no-op)))

   432      (fn [world tpf]

   433        (.rotate candy (* tpf 0.2) 0 0)))))

   434 #+end_src

   435 

   436 #+name: vision-header

   437 #+begin_src clojure 

   438 (ns cortex.vision

   439   "Simulate the sense of vision in jMonkeyEngine3. Enables multiple

   440   eyes from different positions to observe the same world, and pass

   441   the observed data to any arbitray function. Automatically reads

   442   eye-nodes from specially prepared blender files and instanttiates

   443   them in the world as actual eyes."

   444   {:author "Robert McIntyre"}

   445   (:use (cortex world sense util))

   446   (:use clojure.contrib.def)

   447   (:import com.jme3.post.SceneProcessor)

   448   (:import (com.jme3.util BufferUtils Screenshots))

   449   (:import java.nio.ByteBuffer)

   450   (:import java.awt.image.BufferedImage)

   451   (:import (com.jme3.renderer ViewPort Camera))

   452   (:import com.jme3.math.ColorRGBA)

   453   (:import com.jme3.renderer.Renderer)

   454   (:import com.jme3.app.Application)

   455   (:import com.jme3.texture.FrameBuffer)

   456   (:import (com.jme3.scene Node Spatial)))

   457 #+end_src

   458 

   459 The example code will create two videos of the same rotating object

   460 from different angles. It can be used both for stereoscopic vision

   461 simulation or for simulating multiple creatures, each with their own

   462 sense of vision.

   463 

   464 - As a neat bonus, this idea behind simulated vision also enables one

   465   to [[../../cortex/html/capture-video.html][capture live video feeds from jMonkeyEngine]].

   466 

   467 

   468 * COMMENT Generate Source

   469 #+begin_src clojure :tangle ../src/cortex/vision.clj

   470 <<eyes>>

   471 #+end_src

   472 

   473 #+begin_src clojure :tangle ../src/cortex/test/vision.clj

   474 <<test-vision>>

   475 #+end_src
author	Robert McIntyre <rlm@mit.edu>
date	Thu, 09 Feb 2012 09:04:17 -0700
parents	319963720179
children	f283c62bd212