#+title: Simulated Sense of Sight
#+author: Robert McIntyre
#+email: rlm@mit.edu
#+description: Simulated sight for AI research using JMonkeyEngine3 and clojure
#+keywords: computer vision, jMonkeyEngine3, clojure
#+SETUPFILE: ../../aurellem/org/setup.org
#+INCLUDE: ../../aurellem/org/level-0.org
#+babel: :mkdirp yes :noweb yes :exports both

* Vision

Vision is one of the most important senses for humans, so I need to
build a simulated sense of vision for my AI. I will do this with
simulated eyes. Each eye can be independently moved and should see its
own version of the world depending on where it is.

Making these simulated eyes a reality is fairly simple because
jMonkeyEngine already contains extensive support for multiple views
of the same 3D simulated world. jMonkeyEngine has this support
because it is necessary for creating games with split-screen
views. Multiple views are also used to create efficient
pseudo-reflections by rendering the scene from a certain perspective
and then projecting it back onto a surface in the 3D world.

#+caption: jMonkeyEngine supports multiple views to enable split-screen games, like GoldenEye
[[../images/goldeneye-4-player.png]]

* Brief Description of jMonkeyEngine's Rendering Pipeline

jMonkeyEngine allows you to create a =ViewPort=, which represents a
view of the simulated world. You can create as many of these as you
want. Every frame, the =RenderManager= iterates through each
=ViewPort=, rendering the scene in the GPU. For each =ViewPort= there
is a =FrameBuffer= which represents the rendered image in the GPU.

Each =ViewPort= can have any number of attached =SceneProcessor=
objects, which are called every time a new frame is rendered. A
=SceneProcessor= receives a =FrameBuffer= and can do whatever it wants
to the data. Often this consists of invoking GPU-specific operations
on the rendered image. The =SceneProcessor= can also copy the GPU
image data to RAM and process it with the CPU.
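
As a minimal sketch of how these pieces fit together (=world= is
assumed to be a running jMonkeyEngine =Application=, and
=my-processor= a placeholder for any =SceneProcessor= implementation;
neither name comes from this codebase):

#+begin_src clojure
;; Sketch only: `world` is a running jMonkeyEngine Application and
;; `my-processor` is any SceneProcessor implementation; both names
;; are placeholders.
(let [render-manager (.getRenderManager world)
      camera (.getCamera world)
      view-port (.createMainView render-manager "example-view" camera)]
  ;; Every frame, the RenderManager renders this ViewPort and then
  ;; hands the resulting FrameBuffer to each attached SceneProcessor.
  (.addProcessor view-port my-processor)
  (.attachScene view-port (.getRootNode world)))
#+end_src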

* The Vision Pipeline

Each eye in the simulated creature needs its own =ViewPort= so that
it can see the world from its own perspective. To this =ViewPort=, I
add a =SceneProcessor= that feeds the visual data to any arbitrary
continuation function for further processing. That continuation
function may perform both CPU and GPU operations on the data. To make
this easy for the continuation function, the =SceneProcessor=
maintains appropriately sized buffers in RAM to hold the data. It does
not do any copying from the GPU to the CPU itself.

#+name: pipeline-1
#+begin_src clojure
(defn vision-pipeline
  "Create a SceneProcessor object which wraps a vision processing
  continuation function. The continuation is a function that takes
  [#^Renderer r #^FrameBuffer fb #^ByteBuffer b #^BufferedImage bi],
  each of which has already been appropriately sized."
  [continuation]
  (let [byte-buffer (atom nil)
        renderer (atom nil)
        image (atom nil)]
    (proxy [SceneProcessor] []
      (initialize
        [renderManager viewPort]
        (let [cam (.getCamera viewPort)
              width (.getWidth cam)
              height (.getHeight cam)]
          (reset! renderer (.getRenderer renderManager))
          (reset! byte-buffer
                  (BufferUtils/createByteBuffer
                   (* width height 4)))
          (reset! image (BufferedImage.
                         width height
                         BufferedImage/TYPE_4BYTE_ABGR))))
      (isInitialized [] (not (nil? @byte-buffer)))
      (reshape [_ _ _])
      (preFrame [_])
      (postQueue [_])
      (postFrame
        [#^FrameBuffer fb]
        (.clear @byte-buffer)
        (continuation @renderer fb @byte-buffer @image))
      (cleanup []))))
#+end_src

The continuation function given to =(vision-pipeline)= above will be
given a =Renderer= and three containers for image data. The
=FrameBuffer= references the GPU image data, but it cannot be used
directly on the CPU. The =ByteBuffer= and =BufferedImage= are
initially "empty" but are sized to hold the data in the
=FrameBuffer=. I call transferring the GPU image data to the CPU
structures "mixing" the image data. I have provided three functions to
do this mixing.

#+name: pipeline-2
#+begin_src clojure
(defn frameBuffer->byteBuffer!
  "Transfer the data in the graphics card (Renderer, FrameBuffer) to
  the CPU (ByteBuffer)."
  [#^Renderer r #^FrameBuffer fb #^ByteBuffer bb]
  (.readFrameBuffer r fb bb) bb)

(defn byteBuffer->bufferedImage!
  "Convert the C-style BGRA image data in the ByteBuffer bb to the AWT
  style ABGR image data and place it in BufferedImage bi."
  [#^ByteBuffer bb #^BufferedImage bi]
  (Screenshots/convertScreenShot bb bi) bi)

(defn BufferedImage!
  "Continuation which will grab the buffered image from the materials
  provided by (vision-pipeline)."
  [#^Renderer r #^FrameBuffer fb #^ByteBuffer bb #^BufferedImage bi]
  (byteBuffer->bufferedImage!
   (frameBuffer->byteBuffer! r fb bb) bi))
#+end_src

Note that it is possible to write vision processing algorithms
entirely in terms of =BufferedImage= inputs. Just compose that
=BufferedImage= algorithm with =(BufferedImage!)=. However, a vision
processing algorithm that is entirely hosted on the GPU does not have
to pay for this convenience.
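
For instance, a CPU-only analysis written purely against
=BufferedImage= can be turned into a valid continuation for
=(vision-pipeline)= by composition. In this sketch,
=count-bright-pixels= is a hypothetical function, not part of this
codebase:

#+begin_src clojure
;; Hypothetical example, for illustration only: count the pixels
;; whose red channel exceeds 200.
(defn count-bright-pixels
  [#^BufferedImage bi]
  (count
   (for [x (range (.getWidth bi))
         y (range (.getHeight bi))
         :when (< 200 (bit-and 0xFF (bit-shift-right (.getRGB bi x y) 16)))]
     [x y])))

;; Composing with BufferedImage! yields a continuation suitable for
;; (vision-pipeline):
;; (vision-pipeline (comp count-bright-pixels BufferedImage!))
#+end_src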

* COMMENT notes

(vision creature) will take an optional :skip argument which will
inform the continuations in scene processor to skip the given
number of cycles; 0 means that no cycles will be skipped.

(vision creature) will return [init-functions sensor-functions].
The init-functions are each single-arg functions that take the
world and register the cameras and must each be called before the
corresponding sensor-functions. Each init-function returns the
viewport for that eye which can be manipulated, saved, etc. Each
sensor-function is a thunk and will return data in the same
format as the tactile-sensor functions; the structure is
[topology, sensor-data]. Internally, these sensor-functions
maintain a reference to sensor-data which is periodically updated
by the continuation function established by its init-function.
They can be queried every cycle, but their information may not
necessarily be different every cycle.

* Physical Eyes

The vision pipeline described above handles the flow of rendered
images. Now, we need simulated eyes to serve as the source of these
images.

An eye is described in blender in the same way as a joint: it is a
zero-dimensional empty object with no geometry whose local coordinate
system determines the orientation of the resulting eye. All eyes are
children of a parent node named "eyes", just as all joints have a
parent named "joints". An eye binds to the nearest physical object
with =(bind-sense)=.

#+name: add-eye
#+begin_src clojure
(defn add-eye!
  "Create a Camera centered on the current position of 'eye which
  follows the closest physical node in 'creature. The camera's
  dimensions and orientation are determined by the eye's metadata."
  [#^Node creature #^Spatial eye]
  (let [target (closest-node creature eye)
        [cam-width cam-height] (eye-dimensions eye)
        cam (Camera. cam-width cam-height)]
    (.setLocation cam (.getWorldTranslation eye))
    (.setRotation cam (.getWorldRotation eye))
    (.setFrustumPerspective
     cam 45 (/ (.getWidth cam) (.getHeight cam))
     1 1000)
    (bind-sense target cam)
    cam))
#+end_src

Here, the camera is created based on metadata on the eye-node and
attached to the nearest physical object with =(bind-sense)=.
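
Usage might look something like this sketch (=eyes= is defined below
in the Eye Creation section; =creature= is assumed to be a loaded
blender model):

#+begin_src clojure
(comment
  ;; Sketch, assuming `creature` is a loaded blender model with an
  ;; "eyes" node; `eyes` is defined below in the Eye Creation section.
  (let [eye (first (eyes creature))
        cam (add-eye! creature eye)]
    ;; cam is a jME3 Camera that follows the creature's nearest
    ;; physical node; it can be passed to add-camera! below.
    cam))
#+end_src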

** The Retina

An eye is a surface (the retina) which contains many discrete sensors
to detect light. These sensors can have different light-sensing
properties. In humans, each discrete sensor is sensitive to red,
blue, green, or gray. These different types of sensors can have
different spatial distributions along the retina. In humans, there is
a fovea in the center of the retina which has a very high density of
color sensors, and a blind spot which has no sensors at all. Sensor
density decreases in proportion to distance from the fovea.

I want to be able to model any retinal configuration, so my eye-nodes
in blender contain metadata pointing to images that describe the
precise position of the individual sensors using white pixels. The
metadata also describes the precise sensitivity to light that the
sensors described in the image have. An eye can contain any number of
these images. For example, the metadata for an eye might look like
this:

#+begin_src clojure
{0xFF0000 "Models/test-creature/retina-small.png"}
#+end_src

#+caption: The retinal profile image "Models/test-creature/retina-small.png". White pixels are photo-sensitive elements. The distribution of white pixels is denser in the middle and falls off at the edges, inspired by the human retina.
[[../assets/Models/test-creature/retina-small.png]]

Together, the number 0xFF0000 and the image above describe the
placement of red-sensitive sensory elements.

Metadata to very crudely approximate a human eye might be something
like this:

#+begin_src clojure
(let [retinal-profile "Models/test-creature/retina-small.png"]
  {0xFF0000 retinal-profile
   0x00FF00 retinal-profile
   0x0000FF retinal-profile
   0xFFFFFF retinal-profile})
#+end_src

The numbers that serve as keys in the map determine a sensor's
relative sensitivity to the channels red, green, and blue. These
sensitivity values are packed into an integer in the order _RGB in
8-bit fields. The RGB values of a pixel in the image are added
together with these sensitivities as linear weights. Therefore,
0xFF0000 means sensitive to red only while 0xFFFFFF means sensitive to
all colors equally (gray).
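
As a concrete sketch of this weighting scheme (=sensor-response= is a
hypothetical helper written for illustration; it is not part of the
vision pipeline):

#+begin_src clojure
;; Hypothetical helper illustrating the weighting scheme described
;; above; not part of the actual pipeline.
(defn sensor-response
  "Weight a pixel's R, G, and B components by the corresponding 8-bit
  fields of the sensitivity value, and sum the results."
  [sensitivity rgb]
  (let [field (fn [x shift] (bit-and 0xFF (bit-shift-right x shift)))]
    (reduce + (map (fn [shift]
                     (* (field sensitivity shift) (field rgb shift)))
                   [16 8 0]))))

;; (sensor-response 0xFF0000 pixel) responds to the red channel only,
;; while (sensor-response 0xFFFFFF pixel) weights all channels equally.
#+end_src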

For convenience I've defined a few symbols for the more common
sensitivity values.

#+name: sensitivity
#+begin_src clojure
(defvar sensitivity-presets
  {:all 0xFFFFFF
   :red 0xFF0000
   :blue 0x0000FF
   :green 0x00FF00}
  "Retinal sensitivity presets for sensors that extract one channel
  (:red :blue :green) or respond to all channels equally (:all)")
#+end_src

** Metadata Processing

=(retina-sensor-profile)= extracts a map from the eye-node in the same
format as the example maps above. =(eye-dimensions)= finds the
dimensions of the smallest image required to contain all the retinal
sensor maps.

#+begin_src clojure
(defn retina-sensor-profile
  "Return a map of pixel sensitivity numbers to BufferedImages
  describing the distribution of light-sensitive components of this
  eye. :red, :green, :blue, and :all are already defined as
  extracting the red, green, and blue channels, or all channels,
  respectively."
  [#^Spatial eye]
  (if-let [eye-map (meta-data eye "eye")]
    (map-vals
     load-image
     (eval (read-string eye-map)))))

(defn eye-dimensions
  "Returns [width, height] specified in the metadata of the eye."
  [#^Spatial eye]
  (let [dimensions
        (map #(vector (.getWidth %) (.getHeight %))
             (vals (retina-sensor-profile eye)))]
    [(apply max (map first dimensions))
     (apply max (map second dimensions))]))
#+end_src
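
A sketch of what these return for the single-channel example metadata
above (the values shown are illustrative; actual results depend on
the eye's metadata and the dimensions of the retinal profile images):

#+begin_src clojure
(comment
  ;; Illustrative only; actual values depend on the eye's metadata.
  (retina-sensor-profile eye)
  ;; => {0xFF0000 <BufferedImage of retina-small.png>}
  (eye-dimensions eye)
  ;; => [512 512]  ; hypothetical size of retina-small.png
  )
#+end_src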

* Eye Creation

First off, get the children of the "eyes" empty node to find all the
eyes the creature has.

#+begin_src clojure
(defvar
  ^{:arglists '([creature])}
  eyes
  (sense-nodes "eyes")
  "Return the children of the creature's \"eyes\" node.")
#+end_src

Then, add a camera to the world for each eye, with the vision
pipeline attached so that every rendered frame is passed to a
continuation:

#+begin_src clojure
(defn add-camera!
  "Add a camera to the world, calling continuation on every frame
  produced."
  [#^Application world camera continuation]
  (let [width (.getWidth camera)
        height (.getHeight camera)
        render-manager (.getRenderManager world)
        viewport (.createMainView render-manager "eye-view" camera)]
    (doto viewport
      (.setClearFlags true true true)
      (.setBackgroundColor ColorRGBA/Black)
      (.addProcessor (vision-pipeline continuation))
      (.attachScene (.getRootNode world)))))

(defn vision-fn
  "Returns a list of functions, each of which will return a color
  channel's worth of visual information when called inside a running
  simulation."
  [#^Node creature #^Spatial eye & {skip :skip :or {skip 0}}]
  (let [retinal-map (retina-sensor-profile eye)
        camera (add-eye! creature eye)
        vision-image
        (atom
         (BufferedImage. (.getWidth camera)
                         (.getHeight camera)
                         BufferedImage/TYPE_BYTE_BINARY))
        register-eye!
        (runonce
         (fn [world]
           (add-camera!
            world camera
            (let [counter (atom 0)]
              (fn [r fb bb bi]
                (if (zero? (rem (swap! counter inc) (inc skip)))
                  (reset! vision-image
                          (BufferedImage! r fb bb bi))))))))]
    (vec
     (map
      (fn [[key image]]
        (let [whites (white-coordinates image)
              topology (vec (collapse whites))
              mask (sensitivity-presets key key)]
          (fn [world]
            (register-eye! world)
            (vector
             topology
             (vec
              (for [[x y] whites]
                (bit-and
                 mask (.getRGB @vision-image x y))))))))
      retinal-map))))

;; TODO maybe should add a viewport-manipulation function to
;; automatically change viewport settings, attach shadow filters, etc.

(defn vision!
  "Returns a list of functions, each of which returns visual sensory
  data when called inside a running simulation."
  [#^Node creature & {skip :skip :or {skip 0}}]
  (reduce
   concat
   (for [eye (eyes creature)]
     (vision-fn creature eye :skip skip))))

(defn view-vision
  "Creates a function which accepts a list of visual sensor-data and
  displays each element of the list to the screen."
  []
  (view-sense
   (fn
     [[coords sensor-data]]
     (let [image (points->image coords)]
       (dorun
        (for [i (range (count coords))]
          (.setRGB image ((coords i) 0) ((coords i) 1)
                   (sensor-data i))))
       image))))
#+end_src

Note the use of continuation passing style for connecting the eye to a
function to process the output. You can create any number of eyes, and
each of them will see the world from its own =Camera=. Once every
frame, the rendered image is copied to a =BufferedImage=, and that
data is sent off to the continuation function. Moving the =Camera=
which was used to create the eye will change what the eye sees.
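
A sketch of how the resulting sensor functions might be queried
inside a running simulation (=creature= and =world= are assumed to
already exist):

#+begin_src clojure
(comment
  ;; Sketch, assuming `creature` is a loaded blender model and
  ;; `world` is the running simulation.
  (def vision-data (vision! creature))
  ;; Each element registers its eye on first use, then returns
  ;; [topology sensor-data] for one color channel:
  (let [[topology sensor-data] ((first vision-data) world)]
    (count sensor-data)))
#+end_src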

* Example

#+name: test-vision
#+begin_src clojure
(ns cortex.test.vision
  (:use (cortex world util vision))
  (:import java.awt.image.BufferedImage)
  (:import javax.swing.JPanel)
  (:import javax.swing.SwingUtilities)
  (:import java.awt.Dimension)
  (:import javax.swing.JFrame)
  (:import com.jme3.math.ColorRGBA)
  (:import com.jme3.scene.Node)
  (:import com.jme3.math.Vector3f))

(defn test-two-eyes
  "Testing vision:
   Tests the vision system by creating two views of the same rotating
   object from different angles and displaying both of those views in
   JFrames.

   You should see a rotating cube, and two windows,
   each displaying a different view of the cube."
  []
  (let [candy
        (box 1 1 1 :physical? false :color ColorRGBA/Blue)]
    (world
     (doto (Node.)
       (.attachChild candy))
     {}
     (fn [world]
       (let [cam (.clone (.getCamera world))
             width (.getWidth cam)
             height (.getHeight cam)]
         (add-camera! world cam
                      (comp (view-image) BufferedImage!))
         (add-camera! world
                      (doto (.clone cam)
                        (.setLocation (Vector3f. -10 0 0))
                        (.lookAt Vector3f/ZERO Vector3f/UNIT_Y))
                      (comp (view-image) BufferedImage!))
         ;; This is here to restore the main view
         ;; after the other views have completed processing
         (add-camera! world (.getCamera world) no-op)))
     (fn [world tpf]
       (.rotate candy (* tpf 0.2) 0 0)))))
#+end_src

#+name: vision-header
#+begin_src clojure
(ns cortex.vision
  "Simulate the sense of vision in jMonkeyEngine3. Enables multiple
  eyes from different positions to observe the same world, and pass
  the observed data to any arbitrary function. Automatically reads
  eye-nodes from specially prepared blender files and instantiates
  them in the world as actual eyes."
  {:author "Robert McIntyre"}
  (:use (cortex world sense util))
  (:use clojure.contrib.def)
  (:import com.jme3.post.SceneProcessor)
  (:import (com.jme3.util BufferUtils Screenshots))
  (:import java.nio.ByteBuffer)
  (:import java.awt.image.BufferedImage)
  (:import (com.jme3.renderer ViewPort Camera))
  (:import com.jme3.math.ColorRGBA)
  (:import com.jme3.renderer.Renderer)
  (:import com.jme3.app.Application)
  (:import com.jme3.texture.FrameBuffer)
  (:import (com.jme3.scene Node Spatial)))
#+end_src

The example code will create two videos of the same rotating object
from different angles. It can be used both for stereoscopic vision
simulation and for simulating multiple creatures, each with their own
sense of vision.

- As a neat bonus, the idea behind simulated vision also enables one
  to [[../../cortex/html/capture-video.html][capture live video feeds from jMonkeyEngine]].

* COMMENT Generate Source

#+begin_src clojure :tangle ../src/cortex/vision.clj
<<eyes>>
#+end_src

#+begin_src clojure :tangle ../src/cortex/test/vision.clj
<<test-vision>>
#+end_src