annotate org/vision.org @ 213:319963720179

fleshing out vision
author Robert McIntyre <rlm@mit.edu>
date Thu, 09 Feb 2012 08:11:10 -0700
parents 8e9825c38941
children 01d3e9855ef9
rev   line source
rlm@34 1 #+title: Simulated Sense of Sight
rlm@23 2 #+author: Robert McIntyre
rlm@23 3 #+email: rlm@mit.edu
rlm@38 4 #+description: Simulated sight for AI research using jMonkeyEngine3 and Clojure
rlm@34 5 #+keywords: computer vision, jMonkeyEngine3, clojure
rlm@23 6 #+SETUPFILE: ../../aurellem/org/setup.org
rlm@23 7 #+INCLUDE: ../../aurellem/org/level-0.org
rlm@23 8 #+babel: :mkdirp yes :noweb yes :exports both
rlm@23 9
rlm@194 10 * Vision
rlm@23 11
rlm@151 12
rlm@212 13 Vision is one of the most important senses for humans, so I need to
rlm@212 14 build a simulated sense of vision for my AI. I will do this with
rlm@212 15 simulated eyes. Each eye can be independently moved and should see its
rlm@212 16 own version of the world depending on where it is.
rlm@212 17
rlm@212 18 Making these simulated eyes a reality is fairly simple because
rlm@212 19 jMonkeyEngine already contains extensive support for multiple views
rlm@212 20 of the same 3D simulated world. jMonkeyEngine has this
rlm@212 21 support because it is necessary to create games with
rlm@212 22 split-screen views. Multiple views are also used to create efficient
rlm@212 23 pseudo-reflections by rendering the scene from a certain perspective
rlm@212 24 and then projecting it back onto a surface in the 3D world.
rlm@212 25
rlm@212 26 #+caption: jMonkeyEngine supports multiple views to enable split-screen games, like GoldenEye
rlm@212 27 [[../images/goldeneye-4-player.png]]
rlm@212 28
rlm@213 29 * Brief Description of jMonkeyEngine's Rendering Pipeline
rlm@212 30
rlm@213 31 jMonkeyEngine allows you to create a =ViewPort=, which represents a
rlm@213 32 view of the simulated world. You can create as many of these as you
rlm@213 33 want. Every frame, the =RenderManager= iterates through each
rlm@213 34 =ViewPort=, rendering the scene on the GPU. For each =ViewPort= there
rlm@213 35 is a =FrameBuffer= which represents the rendered image on the GPU.
rlm@151 36
rlm@213 37 Each =ViewPort= can have any number of attached =SceneProcessor=
rlm@213 38 objects, which are called every time a new frame is rendered. A
rlm@213 39 =SceneProcessor= receives a =FrameBuffer= and can do whatever it wants
rlm@213 40 to the data. Often this consists of invoking GPU-specific operations
rlm@213 41 on the rendered image. The =SceneProcessor= can also copy the GPU
rlm@213 42 image data to RAM and process it with the CPU.
rlm@151 43
rlm@213 44 * The Vision Pipeline
rlm@151 45
rlm@213 46 Each eye in the simulated creature needs its own =ViewPort= so that
rlm@213 47 it can see the world from its own perspective. To this =ViewPort=, I
rlm@213 48 add a =SceneProcessor= that feeds the visual data to any arbitrary
rlm@213 49 continuation function for further processing. That continuation
rlm@213 50 function may perform both CPU and GPU operations on the data. To make
rlm@213 51 this easy for the continuation function, the =SceneProcessor=
rlm@213 52 maintains appropriately sized buffers in RAM to hold the data. It does
rlm@213 53 not do any copying from the GPU to the CPU itself.
rlm@213 54 #+name: pipeline-1
rlm@213 55 #+begin_src clojure
rlm@113 56 (defn vision-pipeline
rlm@34 57   "Create a SceneProcessor object which wraps a vision processing
rlm@113 58   continuation function. The continuation is a function that takes
rlm@113 59   [#^Renderer r #^FrameBuffer fb #^ByteBuffer b #^BufferedImage bi],
rlm@113 60   each of which has already been appropriately sized."
rlm@23 61   [continuation]
rlm@23 62   (let [byte-buffer (atom nil)
rlm@113 63         renderer (atom nil)
rlm@113 64         image (atom nil)]
rlm@23 65     (proxy [SceneProcessor] []
rlm@23 66       (initialize
rlm@23 67         [renderManager viewPort]
rlm@23 68         (let [cam (.getCamera viewPort)
rlm@23 69               width (.getWidth cam)
rlm@23 70               height (.getHeight cam)]
rlm@23 71           (reset! renderer (.getRenderer renderManager))
rlm@23 72           (reset! byte-buffer
rlm@23 73                   (BufferUtils/createByteBuffer
rlm@113 74                    (* width height 4)))
rlm@113 75           (reset! image (BufferedImage.
rlm@113 76                          width height
rlm@113 77                          BufferedImage/TYPE_4BYTE_ABGR))))
rlm@23 78       (isInitialized [] (not (nil? @byte-buffer)))
rlm@23 79       (reshape [_ _ _])
rlm@23 80       (preFrame [_])
rlm@23 81       (postQueue [_])
rlm@23 82       (postFrame
rlm@23 83         [#^FrameBuffer fb]
rlm@23 84         (.clear @byte-buffer)
rlm@113 85         (continuation @renderer fb @byte-buffer @image))
rlm@23 86       (cleanup []))))
rlm@213 87 #+end_src
rlm@213 88
rlm@213 89 The continuation function given to =(vision-pipeline)= above will be
rlm@213 90 given a =Renderer= and three containers for image data. The
rlm@213 91 =FrameBuffer= references the GPU image data, but it cannot be used
rlm@213 92 directly on the CPU. The =ByteBuffer= and =BufferedImage= are
rlm@213 93 initially "empty" but are sized to hold the data in the
rlm@213 94 =FrameBuffer=. I call transferring the GPU image data to the CPU
rlm@213 95 structures "mixing" the image data. I have provided three functions to
rlm@213 96 do this mixing.
rlm@213 97
rlm@213 98 #+name: pipeline-2
rlm@213 99 #+begin_src clojure
rlm@113 100 (defn frameBuffer->byteBuffer!
rlm@113 101   "Transfer the data in the graphics card (Renderer, FrameBuffer) to
rlm@113 102   the CPU (ByteBuffer)."
rlm@113 103   [#^Renderer r #^FrameBuffer fb #^ByteBuffer bb]
rlm@113 104   (.readFrameBuffer r fb bb) bb)
rlm@113 105
rlm@113 106 (defn byteBuffer->bufferedImage!
rlm@113 107   "Convert the C-style BGRA image data in the ByteBuffer bb to the AWT
rlm@113 108   style ABGR image data and place it in BufferedImage bi."
rlm@113 109   [#^ByteBuffer bb #^BufferedImage bi]
rlm@113 110   (Screenshots/convertScreenShot bb bi) bi)
rlm@113 111
rlm@113 112 (defn BufferedImage!
rlm@113 113   "Continuation which will grab the buffered image from the materials
rlm@113 114   provided by (vision-pipeline)."
rlm@113 115   [#^Renderer r #^FrameBuffer fb #^ByteBuffer bb #^BufferedImage bi]
rlm@113 116   (byteBuffer->bufferedImage!
rlm@113 117    (frameBuffer->byteBuffer! r fb bb) bi))
rlm@213 118 #+end_src
rlm@112 119
rlm@213 120 Note that it is possible to write vision processing algorithms
rlm@213 121 entirely in terms of =BufferedImage= inputs. Just compose that
rlm@213 122 =BufferedImage= algorithm with =(BufferedImage!)=. However, a vision
rlm@213 123 processing algorithm that is entirely hosted on the GPU does not have
rlm@213 124 to pay for this convenience.
rlm@213 125
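As a minimal sketch of such a =BufferedImage=-only algorithm, the
function below averages the red, green, and blue components over every
pixel. The name =mean-brightness= is hypothetical and not part of
cortex; it relies only on =java.awt.image.BufferedImage=.

#+begin_src clojure
(import 'java.awt.image.BufferedImage)

(defn mean-brightness
  "Hypothetical example: average of the red, green, and blue
   components over every pixel of bi, as a double in [0, 255]."
  [^BufferedImage bi]
  (let [w (.getWidth bi)
        h (.getHeight bi)
        total (reduce
               +
               (for [x (range w) y (range h)]
                 ;; getRGB packs the pixel as 0xAARRGGBB
                 (let [rgb (.getRGB bi (int x) (int y))]
                   (+ (bit-and 0xFF (bit-shift-right rgb 16))
                      (bit-and 0xFF (bit-shift-right rgb 8))
                      (bit-and 0xFF rgb)))))]
    (/ total (* 3.0 w h))))
#+end_src

Under these assumptions, =(comp mean-brightness BufferedImage!)= would
be a continuation suitable for =(vision-pipeline)=, since
=BufferedImage!= accepts the four pipeline arguments and returns the
filled =BufferedImage=.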
rlm@213 126
rlm@213 127 * Physical Eyes
rlm@213 128
rlm@213 129 The vision pipeline described above only deals with
rlm@213 130 Each eye in the creature in Blender will work the same way as
rlm@213 131 joints -- a zero-dimensional object with no geometry whose local
rlm@213 132 coordinate system determines the orientation of the resulting
rlm@213 133 eye. All eyes will have a parent named "eyes" just as all joints
rlm@213 134 have a parent named "joints". The resulting camera will be a
rlm@213 135 ChaseCamera or a CameraNode bound to the geometry that is closest to
rlm@213 136 the eye marker. The eye marker will contain the metadata for the
rlm@213 137 eye, and will be moved by its bound geometry. The dimensions of
rlm@213 138 the eye's camera are equal to the dimensions of the eye's "UV"
rlm@213 139 map.
rlm@213 140
rlm@213 141 (vision creature) will take an optional :skip argument which will
rlm@213 142 inform the continuations in the scene processor to skip the given
rlm@213 143 number of cycles; 0 means that no cycles will be skipped.
rlm@213 144
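The skipping behaviour described above can be sketched on its own,
without jMonkeyEngine. The wrapper name =every-nth-call= is an
assumption made for illustration; its counter logic mirrors the one
used inside =vision-fn=.

#+begin_src clojure
(defn every-nth-call
  "Hypothetical helper: wrap f so that it runs only on every
   (skip + 1)-th call. All other calls do nothing and return nil.
   With skip = 0, f runs on every call."
  [skip f]
  (let [counter (atom 0)]
    (fn [& args]
      (when (zero? (rem (swap! counter inc) (inc skip)))
        (apply f args)))))
#+end_src

For instance, wrapping a continuation with a skip of 2 would let it
process one frame out of every three, which trades temporal resolution
for less GPU-to-CPU copying.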
rlm@213 145 (vision creature) will return [init-functions sensor-functions].
rlm@213 146 The init-functions are each single-arg functions that take the
rlm@213 147 world and register the cameras; each must be called before the
rlm@213 148 corresponding sensor-functions. Each init-function returns the
rlm@213 149 viewport for that eye, which can be manipulated, saved, etc. Each
rlm@213 150 sensor-function is a thunk and will return data in the same
rlm@213 151 format as the tactile-sensor functions; the structure is
rlm@213 152 [topology, sensor-data]. Internally, these sensor-functions
rlm@213 153 maintain a reference to sensor-data which is periodically updated
rlm@213 154 by the continuation function established by its init-function.
rlm@213 155 They can be queried every cycle, but their information may not
rlm@213 156 necessarily be different every cycle.
rlm@213 157
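To illustrate the =[topology, sensor-data]= format, here is a small
sketch that samples one color channel from a =BufferedImage= at a list
of pixel coordinates. The function =sample-channel= is hypothetical,
and it uses the raw coordinates as the topology, whereas the real code
derives the topology with cortex helpers.

#+begin_src clojure
(import 'java.awt.image.BufferedImage)

(defn sample-channel
  "Hypothetical example: return [topology sensor-data], where
   topology is the vector of [x y] coordinates and sensor-data holds
   the masked RGB value of image at each coordinate."
  [^BufferedImage image coords mask]
  [(vec coords)
   (vec (for [[x y] coords]
          ;; the mask keeps only the bits of the chosen channel
          (bit-and mask (.getRGB image (int x) (int y)))))])
#+end_src

A mask of =0x00FF00=, for example, keeps only the green bits of each
sampled pixel, so the two vectors stay index-aligned: the i-th value
in sensor-data belongs to the i-th coordinate in the topology.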
rlm@213 158
rlm@213 159 #+begin_src clojure
rlm@169 160 (defn add-camera!
rlm@169 161   "Add a camera to the world, calling continuation on every frame
rlm@34 162   produced."
rlm@167 163   [#^Application world camera continuation]
rlm@23 164   (let [width (.getWidth camera)
rlm@23 165         height (.getHeight camera)
rlm@23 166         render-manager (.getRenderManager world)
rlm@23 167         viewport (.createMainView render-manager "eye-view" camera)]
rlm@23 168     (doto viewport
rlm@23 169       (.setClearFlags true true true)
rlm@112 170       (.setBackgroundColor ColorRGBA/Black)
rlm@113 171       (.addProcessor (vision-pipeline continuation))
rlm@23 172       (.attachScene (.getRootNode world)))))
rlm@151 173
rlm@169 174 (defn retina-sensor-profile
rlm@151 175   "Return a map of pixel selection functions to BufferedImages
rlm@169 176   describing the distribution of light-sensitive components of this
rlm@169 177   eye. Each function creates an integer from the rgb values found in
rlm@169 178   the pixel. :red, :green, :blue, :gray are already defined as
rlm@169 179   extracting the red, green, blue, and average components
rlm@151 180   respectively."
rlm@151 181   [#^Spatial eye]
rlm@151 182   (if-let [eye-map (meta-data eye "eye")]
rlm@151 183     (map-vals
rlm@167 184      load-image
rlm@151 185      (eval (read-string eye-map)))))
rlm@151 186
rlm@151 187 (defn eye-dimensions
rlm@169 188   "Returns [width, height] specified in the metadata of the eye"
rlm@151 189   [#^Spatial eye]
rlm@151 190   (let [dimensions
rlm@151 191         (map #(vector (.getWidth %) (.getHeight %))
rlm@169 192              (vals (retina-sensor-profile eye)))]
rlm@151 193     [(apply max (map first dimensions))
rlm@151 194      (apply max (map second dimensions))]))
rlm@151 195
rlm@167 196 (defvar
rlm@167 197   ^{:arglists '([creature])}
rlm@167 198   eyes
rlm@167 199   (sense-nodes "eyes")
rlm@167 200   "Return the children of the creature's \"eyes\" node.")
rlm@151 201
rlm@169 202 (defn add-eye!
rlm@169 203   "Create a Camera centered on the current position of 'eye which
rlm@169 204   follows the closest physical node in 'creature. Returns the
rlm@169 205   Camera."
rlm@151 206   [#^Node creature #^Spatial eye]
rlm@151 207   (let [target (closest-node creature eye)
rlm@151 208         [cam-width cam-height] (eye-dimensions eye)
rlm@151 209         cam (Camera. cam-width cam-height)]
rlm@151 210     (.setLocation cam (.getWorldTranslation eye))
rlm@151 211     (.setRotation cam (.getWorldRotation eye))
rlm@151 212     (.setFrustumPerspective
rlm@151 213      cam 45 (/ (.getWidth cam) (.getHeight cam))
rlm@151 214      1 1000)
rlm@151 215     (bind-sense target cam)
rlm@151 216     cam))
rlm@151 217
rlm@172 218 (defvar color-channel-presets
rlm@151 219   {:all 0xFFFFFF
rlm@151 220    :red 0xFF0000
rlm@151 221    :blue 0x0000FF
rlm@172 222    :green 0x00FF00}
rlm@172 223   "Bitmasks for common RGB color channels")
rlm@151 224
rlm@169 225 (defn vision-fn
rlm@171 226   "Returns a list of functions, each of which will return a color
rlm@171 227   channel's worth of visual information when called inside a running
rlm@171 228   simulation."
rlm@151 229   [#^Node creature #^Spatial eye & {skip :skip :or {skip 0}}]
rlm@169 230   (let [retinal-map (retina-sensor-profile eye)
rlm@169 231         camera (add-eye! creature eye)
rlm@151 232         vision-image
rlm@151 233         (atom
rlm@151 234          (BufferedImage. (.getWidth camera)
rlm@151 235                          (.getHeight camera)
rlm@170 236                          BufferedImage/TYPE_BYTE_BINARY))
rlm@170 237         register-eye!
rlm@170 238         (runonce
rlm@170 239          (fn [world]
rlm@170 240            (add-camera!
rlm@170 241             world camera
rlm@170 242             (let [counter (atom 0)]
rlm@170 243               (fn [r fb bb bi]
rlm@170 244                 (if (zero? (rem (swap! counter inc) (inc skip)))
rlm@170 245                   (reset! vision-image
rlm@170 246                           (BufferedImage! r fb bb bi))))))))]
rlm@151 247     (vec
rlm@151 248      (map
rlm@151 249       (fn [[key image]]
rlm@151 250         (let [whites (white-coordinates image)
rlm@151 251               topology (vec (collapse whites))
rlm@172 252               mask (color-channel-presets key)]
rlm@170 253           (fn [world]
rlm@170 254             (register-eye! world)
rlm@151 255             (vector
rlm@151 256              topology
rlm@151 257              (vec
rlm@151 258               (for [[x y] whites]
rlm@151 259                 (bit-and
rlm@151 260                  mask (.getRGB @vision-image x y))))))))
rlm@170 261       retinal-map))))
rlm@151 262
rlm@170 263
rlm@170 264 ;; TODO maybe should add a viewport-manipulation function to
rlm@170 265 ;; automatically change viewport settings, attach shadow filters, etc.
rlm@170 266
rlm@170 267 (defn vision!
rlm@170 268   "Returns a sequence of functions, each of which returns visual
rlm@170 269   sensory data when called inside a running simulation."
rlm@151 270   [#^Node creature & {skip :skip :or {skip 0}}]
rlm@151 271   (reduce
rlm@170 272    concat
rlm@167 273    (for [eye (eyes creature)]
rlm@169 274      (vision-fn creature eye :skip skip))))
rlm@151 275
rlm@189 276 (defn view-vision
rlm@189 277   "Creates a function which accepts a list of visual sensor-data and
rlm@189 278   displays each element of the list to the screen."
rlm@189 279   []
rlm@188 280   (view-sense
rlm@188 281    (fn
rlm@188 282      [[coords sensor-data]]
rlm@188 283      (let [image (points->image coords)]
rlm@188 284        (dorun
rlm@188 285         (for [i (range (count coords))]
rlm@188 286           (.setRGB image ((coords i) 0) ((coords i) 1)
rlm@188 287                    (sensor-data i))))
rlm@189 288        image))))
rlm@188 289
rlm@34 290 #+end_src
rlm@23 291
rlm@112 292
rlm@34 293 Note the use of continuation passing style for connecting the eye to a
rlm@34 294 function to process the output. You can create any number of eyes, and
rlm@34 295 each of them will see the world from their own =Camera=. Once every
rlm@34 296 frame, the rendered image is copied to a =BufferedImage=, and that
rlm@34 297 data is sent off to the continuation function. Moving the =Camera=
rlm@34 298 which was used to create the eye will change what the eye sees.
rlm@23 299
rlm@34 300 * Example
rlm@23 301
rlm@66 302 #+name: test-vision
rlm@23 303 #+begin_src clojure
rlm@68 304 (ns cortex.test.vision
rlm@34 305   (:use (cortex world util vision))
rlm@34 306   (:import java.awt.image.BufferedImage)
rlm@34 307   (:import javax.swing.JPanel)
rlm@34 308   (:import javax.swing.SwingUtilities)
rlm@34 309   (:import java.awt.Dimension)
rlm@34 310   (:import javax.swing.JFrame)
rlm@34 311   (:import com.jme3.math.ColorRGBA)
rlm@45 312   (:import com.jme3.scene.Node)
rlm@113 313   (:import com.jme3.math.Vector3f))
rlm@23 314
rlm@36 315 (defn test-two-eyes
rlm@69 316   "Testing vision:
rlm@69 317   Tests the vision system by creating two views of the same rotating
rlm@69 318   object from different angles and displaying both of those views in
rlm@69 319   JFrames.
rlm@69 320
rlm@69 321   You should see a rotating cube, and two windows,
rlm@69 322   each displaying a different view of the cube."
rlm@36 323   []
rlm@58 324   (let [candy
rlm@58 325         (box 1 1 1 :physical? false :color ColorRGBA/Blue)]
rlm@112 326     (world
rlm@112 327      (doto (Node.)
rlm@112 328        (.attachChild candy))
rlm@112 329      {}
rlm@112 330      (fn [world]
rlm@112 331        (let [cam (.clone (.getCamera world))
rlm@112 332              width (.getWidth cam)
rlm@112 333              height (.getHeight cam)]
rlm@169 334          (add-camera! world cam
rlm@113 335                       ;;no-op
rlm@113 336                       (comp (view-image) BufferedImage!)
rlm@112 337                       )
rlm@169 338          (add-camera! world
rlm@112 339                       (doto (.clone cam)
rlm@112 340                         (.setLocation (Vector3f. -10 0 0))
rlm@112 341                         (.lookAt Vector3f/ZERO Vector3f/UNIT_Y))
rlm@113 342                       ;;no-op
rlm@113 343                       (comp (view-image) BufferedImage!))
rlm@112 344          ;; This is here to restore the main view
rlm@112 345          ;; after the other views have completed processing
rlm@169 346          (add-camera! world (.getCamera world) no-op)))
rlm@112 347      (fn [world tpf]
rlm@112 348        (.rotate candy (* tpf 0.2) 0 0)))))
rlm@23 349 #+end_src
rlm@23 350
rlm@213 351 #+name: vision-header
rlm@213 352 #+begin_src clojure
rlm@213 353 (ns cortex.vision
rlm@213 354   "Simulate the sense of vision in jMonkeyEngine3. Enables multiple
rlm@213 355   eyes from different positions to observe the same world, and pass
rlm@213 356   the observed data to any arbitrary function. Automatically reads
rlm@213 357   eye-nodes from specially prepared blender files and instantiates
rlm@213 358   them in the world as actual eyes."
rlm@213 359   {:author "Robert McIntyre"}
rlm@213 360   (:use (cortex world sense util))
rlm@213 361   (:use clojure.contrib.def)
rlm@213 362   (:import com.jme3.post.SceneProcessor)
rlm@213 363   (:import (com.jme3.util BufferUtils Screenshots))
rlm@213 364   (:import java.nio.ByteBuffer)
rlm@213 365   (:import java.awt.image.BufferedImage)
rlm@213 366   (:import (com.jme3.renderer ViewPort Camera))
rlm@213 367   (:import com.jme3.math.ColorRGBA)
rlm@213 368   (:import com.jme3.renderer.Renderer)
rlm@213 369   (:import com.jme3.app.Application)
rlm@213 370   (:import com.jme3.texture.FrameBuffer)
rlm@213 371   (:import (com.jme3.scene Node Spatial)))
rlm@213 372 #+end_src
rlm@112 373
rlm@34 374 The example code will create two videos of the same rotating object
rlm@34 375 from different angles. It can be used both for stereoscopic vision
rlm@34 376 simulation and for simulating multiple creatures, each with their own
rlm@34 377 sense of vision.
rlm@24 378
rlm@35 379 - As a neat bonus, the idea behind simulated vision also enables one
rlm@35 380 to [[../../cortex/html/capture-video.html][capture live video feeds from jMonkeyEngine]].
rlm@35 381
rlm@24 382
rlm@212 383 * COMMENT Generate Source
rlm@34 384 #+begin_src clojure :tangle ../src/cortex/vision.clj
rlm@24 385 <<eyes>>
rlm@24 386 #+end_src
rlm@24 387
rlm@68 388 #+begin_src clojure :tangle ../src/cortex/test/vision.clj
rlm@24 389 <<test-vision>>
rlm@24 390 #+end_src