#+title: Simulated Sense of Sight
#+author: Robert McIntyre
#+email: rlm@mit.edu
#+description: Simulated sight for AI research using JMonkeyEngine3 and clojure
#+keywords: computer vision, jMonkeyEngine3, clojure
#+SETUPFILE: ../../aurellem/org/setup.org
#+INCLUDE: ../../aurellem/org/level-0.org
#+babel: :mkdirp yes :noweb yes :exports both

* Vision

Vision is one of the most important senses for humans, so I need to
build a simulated sense of vision for my AI. I will do this with
simulated eyes. Each eye can be independently moved and should see its
own version of the world depending on where it is.

Making these simulated eyes a reality is fairly simple because
jMonkeyEngine already contains extensive support for multiple views
of the same 3D simulated world. jMonkeyEngine includes this support
because it is necessary for split-screen games. Multiple views are
also used to create efficient pseudo-reflections by rendering the
scene from a certain perspective and then projecting it back onto a
surface in the 3D world.

#+caption: jMonkeyEngine supports multiple views to enable split-screen games, like GoldenEye
[[../images/goldeneye-4-player.png]]

* Brief Description of jMonkeyEngine's Rendering Pipeline

jMonkeyEngine allows you to create a =ViewPort=, which represents a
view of the simulated world. You can create as many of these as you
want. Every frame, the =RenderManager= iterates through each
=ViewPort=, rendering the scene on the GPU. For each =ViewPort= there
is a =FrameBuffer= which represents the rendered image in the GPU.

Each =ViewPort= can have any number of attached =SceneProcessor=
objects, which are called every time a new frame is rendered. A
=SceneProcessor= receives a =FrameBuffer= and can do whatever it wants
with the data. Often this consists of invoking GPU-specific operations
on the rendered image. The =SceneProcessor= can also copy the GPU
image data to RAM and process it with the CPU.
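
As a minimal sketch of this API (a hypothetical helper, assuming the
jMonkeyEngine classes imported in the =vision-header= block at the end
of this file, and a running =SimpleApplication= named =app=), creating
an extra view and watching its frames looks like this:

#+begin_src clojure
(defn watch-view
  "Sketch only: create a new ViewPort for camera `cam` and attach a
   SceneProcessor that announces every frame rendered into the
   ViewPort's FrameBuffer."
  [app cam]
  (let [view-port (.createMainView (.getRenderManager app)
                                   "watched-view" cam)]
    (.attachScene view-port (.getRootNode app))
    (.addProcessor
     view-port
     (proxy [SceneProcessor] []
       (initialize [rm vp])
       (isInitialized [] true)
       (reshape [vp w h])
       (preFrame [tpf])
       (postQueue [rq])
       (postFrame [frame-buffer]
         ;; frame-buffer holds this frame's rendered image on the GPU
         (println "rendered a frame"))
       (cleanup [])))
    view-port))
#+end_src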

* The Vision Pipeline

Each eye in the simulated creature needs its own =ViewPort= so that
it can see the world from its own perspective. To this =ViewPort=, I
add a =SceneProcessor= that feeds the visual data to any arbitrary
continuation function for further processing. That continuation
function may perform both CPU and GPU operations on the data. To make
this easy for the continuation function, the =SceneProcessor=
maintains appropriately sized buffers in RAM to hold the data. It does
not do any copying from the GPU to the CPU itself.

#+name: pipeline-1
#+begin_src clojure
(defn vision-pipeline
  "Create a SceneProcessor object which wraps a vision processing
  continuation function. The continuation is a function that takes
  [#^Renderer r #^FrameBuffer fb #^ByteBuffer b #^BufferedImage bi],
  each of which has already been appropriately sized."
  [continuation]
  (let [byte-buffer (atom nil)
        renderer (atom nil)
        image (atom nil)]
    (proxy [SceneProcessor] []
      (initialize
       [renderManager viewPort]
       (let [cam (.getCamera viewPort)
             width (.getWidth cam)
             height (.getHeight cam)]
         (reset! renderer (.getRenderer renderManager))
         (reset! byte-buffer
                 (BufferUtils/createByteBuffer
                  (* width height 4)))
         (reset! image (BufferedImage.
                        width height
                        BufferedImage/TYPE_4BYTE_ABGR))))
      (isInitialized [] (not (nil? @byte-buffer)))
      (reshape [_ _ _])
      (preFrame [_])
      (postQueue [_])
      (postFrame
       [#^FrameBuffer fb]
       ;; reuse the same ByteBuffer every frame
       (.clear @byte-buffer)
       (continuation @renderer fb @byte-buffer @image))
      (cleanup []))))
#+end_src

The continuation function given to =(vision-pipeline)= above will be
given a =Renderer= and three containers for image data. The
=FrameBuffer= references the GPU image data, but it cannot be used
directly on the CPU. The =ByteBuffer= and =BufferedImage= are
initially "empty" but are sized to hold the data in the
=FrameBuffer=. I call transferring the GPU image data to the CPU
structures "mixing" the image data. I have provided three functions to
do this mixing.

#+name: pipeline-2
#+begin_src clojure
(defn frameBuffer->byteBuffer!
  "Transfer the data in the graphics card (Renderer, FrameBuffer) to
   the CPU (ByteBuffer)."
  [#^Renderer r #^FrameBuffer fb #^ByteBuffer bb]
  (.readFrameBuffer r fb bb) bb)

(defn byteBuffer->bufferedImage!
  "Convert the C-style BGRA image data in the ByteBuffer bb to the AWT
   style ABGR image data and place it in BufferedImage bi."
  [#^ByteBuffer bb #^BufferedImage bi]
  (Screenshots/convertScreenShot bb bi) bi)

(defn BufferedImage!
  "Continuation which will grab the buffered image from the materials
   provided by (vision-pipeline)."
  [#^Renderer r #^FrameBuffer fb #^ByteBuffer bb #^BufferedImage bi]
  (byteBuffer->bufferedImage!
   (frameBuffer->byteBuffer! r fb bb) bi))
#+end_src

Note that it is possible to write vision processing algorithms
entirely in terms of =BufferedImage= inputs. Just compose that
=BufferedImage= algorithm with =(BufferedImage!)=. However, a vision
processing algorithm that is entirely hosted on the GPU does not have
to pay for this convenience.
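
For example, here is a sketch of that composition:
=average-brightness= is a hypothetical CPU-only algorithm written
purely against =BufferedImage=, and composing it with
=(BufferedImage!)= yields a continuation suitable for
=(vision-pipeline)=.

#+begin_src clojure
(defn average-brightness
  "Hypothetical example algorithm: return the mean of the red, green,
   and blue channels over every pixel -- a crude single-number summary
   of what the eye sees."
  [#^BufferedImage bi]
  (let [pixels (for [x (range (.getWidth bi))
                     y (range (.getHeight bi))]
                 (.getRGB bi x y))
        intensity (fn [rgb]
                    (/ (+ (bit-and 0xFF (bit-shift-right rgb 16))
                          (bit-and 0xFF (bit-shift-right rgb 8))
                          (bit-and 0xFF rgb))
                       3))]
    (/ (reduce + (map intensity pixels))
       (max 1 (count pixels)))))

;; comp feeds the four pipeline arguments to BufferedImage! first, so
;; the composite is itself a valid continuation function:
(def brightness-continuation
  (comp println average-brightness BufferedImage!))
#+end_src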

* Physical Eyes

The vision pipeline described above only deals with the rendered
image data; the eyes themselves still have to be created and placed
in the simulated world. Each eye in the creature in blender will work
the same way as joints -- a zero dimensional object with no geometry
whose local coordinate system determines the orientation of the
resulting eye. All eyes will have a parent named "eyes" just as all
joints have a parent named "joints". The resulting camera will be a
ChaseCamera or a CameraNode bound to the geometry that is closest to
the eye marker. The eye marker will contain the metadata for the
eye, and will be moved by its bound geometry. The dimensions of
the eye's camera are equal to the dimensions of the eye's "UV"
map.
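
For illustration, the "eye" metadata attached to an eye marker is
expected to read as a Clojure map from color-channel keywords to
retina images whose white pixels mark the positions of that channel's
light sensors (see =(retina-sensor-profile)= below). The image paths
here are hypothetical:

#+begin_src clojure
;; Hypothetical "eye" metadata for one eye marker.  Each value is an
;; image file; (retina-sensor-profile) loads it with load-image and
;; uses its white pixels as sensor coordinates for that channel.
{:all   "Models/test-creature/retina-all.png"
 :green "Models/test-creature/retina-green.png"}
#+end_src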

(vision creature) will take an optional :skip argument which will
inform the continuations in the scene processor to skip the given
number of cycles; 0 means that no cycles will be skipped.

(vision creature) will return [init-functions sensor-functions].
The init-functions are each single-arg functions that take the
world and register the cameras, and must each be called before the
corresponding sensor-functions. Each init-function returns the
viewport for that eye, which can be manipulated, saved, etc. Each
sensor-function takes the world as its argument and returns data in
the same format as the tactile-sensor functions: the structure is
[topology, sensor-data]. Internally, these sensor-functions
maintain a reference to sensor-data which is periodically updated
by the continuation function established by its init-function.
They can be queried every cycle, but their information may not
necessarily be different every cycle.

#+name: eyes
#+begin_src clojure
(defn add-camera!
  "Add a camera to the world, calling continuation on every frame
   produced."
  [#^Application world camera continuation]
  (let [width (.getWidth camera)
        height (.getHeight camera)
        render-manager (.getRenderManager world)
        viewport (.createMainView render-manager "eye-view" camera)]
    (doto viewport
      (.setClearFlags true true true)
      (.setBackgroundColor ColorRGBA/Black)
      (.addProcessor (vision-pipeline continuation))
      (.attachScene (.getRootNode world)))))

(defn retina-sensor-profile
  "Return a map of pixel selection functions to BufferedImages
   describing the distribution of light-sensitive components of this
   eye. Each function creates an integer from the rgb values found in
   the pixel. :red, :green, :blue, :gray are already defined as
   extracting the red, green, blue, and average components
   respectively."
  [#^Spatial eye]
  (if-let [eye-map (meta-data eye "eye")]
    (map-vals
     load-image
     (eval (read-string eye-map)))))

(defn eye-dimensions
  "Returns [width, height] specified in the metadata of the eye"
  [#^Spatial eye]
  (let [dimensions
        (map #(vector (.getWidth %) (.getHeight %))
             (vals (retina-sensor-profile eye)))]
    [(apply max (map first dimensions))
     (apply max (map second dimensions))]))

(defvar
  ^{:arglists '([creature])}
  eyes
  (sense-nodes "eyes")
  "Return the children of the creature's \"eyes\" node.")

(defn add-eye!
  "Create a Camera centered on the current position of 'eye which
   follows the closest physical node in 'creature and sends visual
   data to 'continuation."
  [#^Node creature #^Spatial eye]
  (let [target (closest-node creature eye)
        [cam-width cam-height] (eye-dimensions eye)
        cam (Camera. cam-width cam-height)]
    (.setLocation cam (.getWorldTranslation eye))
    (.setRotation cam (.getWorldRotation eye))
    (.setFrustumPerspective
     cam 45 (/ (.getWidth cam) (.getHeight cam))
     1 1000)
    (bind-sense target cam)
    cam))

(defvar color-channel-presets
  {:all   0xFFFFFF
   :red   0xFF0000
   :blue  0x0000FF
   :green 0x00FF00}
  "Bitmasks for common RGB color channels")

(defn vision-fn
  "Returns a list of functions, each of which will return a color
   channel's worth of visual information when called inside a running
   simulation."
  [#^Node creature #^Spatial eye & {skip :skip :or {skip 0}}]
  (let [retinal-map (retina-sensor-profile eye)
        camera (add-eye! creature eye)
        vision-image
        (atom
         (BufferedImage. (.getWidth camera)
                         (.getHeight camera)
                         BufferedImage/TYPE_BYTE_BINARY))
        register-eye!
        (runonce
         (fn [world]
           (add-camera!
            world camera
            (let [counter (atom 0)]
              (fn [r fb bb bi]
                ;; only mix the image down to the CPU once
                ;; every (inc skip) frames
                (if (zero? (rem (swap! counter inc) (inc skip)))
                  (reset! vision-image
                          (BufferedImage! r fb bb bi))))))))]
    (vec
     (map
      (fn [[key image]]
        (let [whites (white-coordinates image)
              topology (vec (collapse whites))
              mask (color-channel-presets key)]
          (fn [world]
            (register-eye! world)
            (vector
             topology
             (vec
              (for [[x y] whites]
                (bit-and
                 mask (.getRGB @vision-image x y))))))))
      retinal-map))))

;; TODO maybe should add a viewport-manipulation function to
;; automatically change viewport settings, attach shadow filters, etc.

(defn vision!
  "Returns a list of functions, each of which returns visual sensory
   data when called inside a running simulation."
  [#^Node creature & {skip :skip :or {skip 0}}]
  (reduce
   concat
   (for [eye (eyes creature)]
     (vision-fn creature eye))))

(defn view-vision
  "Creates a function which accepts a list of visual sensor-data and
   displays each element of the list to the screen."
  []
  (view-sense
   (fn
     [[coords sensor-data]]
     (let [image (points->image coords)]
       (dorun
        (for [i (range (count coords))]
          (.setRGB image ((coords i) 0) ((coords i) 1)
                   (sensor-data i))))
       image))))
#+end_src

Note the use of continuation passing style for connecting the eye to a
function to process the output. You can create any number of eyes, and
each of them will see the world from its own =Camera=. Once every
frame, the rendered image is copied to a =BufferedImage=, and that
data is sent off to the continuation function. Moving the =Camera=
which was used to create the eye will change what the eye sees.
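
As a sketch of how these pieces fit together (the model-loading
function and file path here are hypothetical, standing in for however
the creature is actually instantiated), querying a creature's vision
looks like this:

#+begin_src clojure
;; Hypothetical setup: `load-creature` and the blend file stand in
;; for the real creature-loading machinery.
(def creature (load-creature "Models/test-creature/worm.blend"))

;; One sensor function per color channel per eye.
(def vision-senses (vision! creature))

(defn print-first-channel
  "Query the first vision sensor-function inside the update loop and
   print the value of its first sensor element."
  [world]
  (let [[topology sensor-data] ((first vision-senses) world)]
    (println "sensor at" (first topology)
             "reports" (first sensor-data))))
#+end_src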

* Example

#+name: test-vision
#+begin_src clojure
(ns cortex.test.vision
  (:use (cortex world util vision))
  (:import java.awt.image.BufferedImage)
  (:import javax.swing.JPanel)
  (:import javax.swing.SwingUtilities)
  (:import java.awt.Dimension)
  (:import javax.swing.JFrame)
  (:import com.jme3.math.ColorRGBA)
  (:import com.jme3.scene.Node)
  (:import com.jme3.math.Vector3f))

(defn test-two-eyes
  "Testing vision:
   Tests the vision system by creating two views of the same rotating
   object from different angles and displaying both of those views in
   JFrames.

   You should see a rotating cube, and two windows, each displaying a
   different view of the cube."
  []
  (let [candy
        (box 1 1 1 :physical? false :color ColorRGBA/Blue)]
    (world
     (doto (Node.)
       (.attachChild candy))
     {}
     (fn [world]
       (let [cam (.clone (.getCamera world))
             width (.getWidth cam)
             height (.getHeight cam)]
         (add-camera! world cam
                      ;;no-op
                      (comp (view-image) BufferedImage!))
         (add-camera! world
                      (doto (.clone cam)
                        (.setLocation (Vector3f. -10 0 0))
                        (.lookAt Vector3f/ZERO Vector3f/UNIT_Y))
                      ;;no-op
                      (comp (view-image) BufferedImage!))
         ;; This is here to restore the main view
         ;; after the other views have completed processing
         (add-camera! world (.getCamera world) no-op)))
     (fn [world tpf]
       (.rotate candy (* tpf 0.2) 0 0)))))
#+end_src

#+name: vision-header
#+begin_src clojure
(ns cortex.vision
  "Simulate the sense of vision in jMonkeyEngine3. Enables multiple
   eyes from different positions to observe the same world, and pass
   the observed data to any arbitrary function. Automatically reads
   eye-nodes from specially prepared blender files and instantiates
   them in the world as actual eyes."
  {:author "Robert McIntyre"}
  (:use (cortex world sense util))
  (:use clojure.contrib.def)
  (:import com.jme3.post.SceneProcessor)
  (:import (com.jme3.util BufferUtils Screenshots))
  (:import java.nio.ByteBuffer)
  (:import java.awt.image.BufferedImage)
  (:import (com.jme3.renderer ViewPort Camera))
  (:import com.jme3.math.ColorRGBA)
  (:import com.jme3.renderer.Renderer)
  (:import com.jme3.app.Application)
  (:import com.jme3.texture.FrameBuffer)
  (:import (com.jme3.scene Node Spatial)))
#+end_src

The example code will create two views of the same rotating object
from different angles. It can be used both for stereoscopic vision
simulation and for simulating multiple creatures, each with their own
sense of vision.

- As a neat bonus, the idea behind simulated vision also enables one
  to [[../../cortex/html/capture-video.html][capture live video feeds from jMonkeyEngine]].

* COMMENT Generate Source
#+begin_src clojure :tangle ../src/cortex/vision.clj
<<vision-header>>
<<pipeline-1>>
<<pipeline-2>>
<<eyes>>
#+end_src

#+begin_src clojure :tangle ../src/cortex/test/vision.clj
<<test-vision>>
#+end_src