view org/vision.org @ 215:f283c62bd212

fixed long standing problem with orientation of eyes in blender, fleshed out text in vision.org
author Robert McIntyre <rlm@mit.edu>
date Fri, 10 Feb 2012 02:19:24 -0700
parents 01d3e9855ef9
children f5ea63245b3b
#+title: Simulated Sense of Sight
#+author: Robert McIntyre
#+email: rlm@mit.edu
#+description: Simulated sight for AI research using JMonkeyEngine3 and clojure
#+keywords: computer vision, jMonkeyEngine3, clojure
#+SETUPFILE: ../../aurellem/org/setup.org
#+INCLUDE: ../../aurellem/org/level-0.org
#+babel: :mkdirp yes :noweb yes :exports both

* Vision

Vision is one of the most important senses for humans, so I need to
build a simulated sense of vision for my AI. I will do this with
simulated eyes. Each eye can be independently moved and should see its
own version of the world depending on where it is.

Making these simulated eyes a reality is fairly simple because
jMonkeyEngine already contains extensive support for multiple views
of the same 3D simulated world. jMonkeyEngine has this support
because it is necessary to create games with split-screen views.
Multiple views are also used to create efficient pseudo-reflections
by rendering the scene from a certain perspective and then projecting
it back onto a surface in the 3D world.

#+caption: jMonkeyEngine supports multiple views to enable split-screen games, like GoldenEye
[[../images/goldeneye-4-player.png]]

* Brief Description of jMonkeyEngine's Rendering Pipeline

jMonkeyEngine allows you to create a =ViewPort=, which represents a
view of the simulated world. You can create as many of these as you
want. Every frame, the =RenderManager= iterates through each
=ViewPort=, rendering the scene on the GPU. For each =ViewPort= there
is a =FrameBuffer= which represents the rendered image in the GPU.

Each =ViewPort= can have any number of attached =SceneProcessor=
objects, which are called every time a new frame is rendered. A
=SceneProcessor= receives a =FrameBuffer= and can do whatever it wants
to the data. Often this consists of invoking GPU-specific operations
on the rendered image. The =SceneProcessor= can also copy the GPU
image data to RAM and process it with the CPU.

* The Vision Pipeline

Each eye in the simulated creature needs its own =ViewPort= so that
it can see the world from its own perspective. To this =ViewPort=, I
add a =SceneProcessor= that feeds the visual data to any arbitrary
continuation function for further processing. That continuation
function may perform both CPU and GPU operations on the data. To make
this easy for the continuation function, the =SceneProcessor=
maintains appropriately sized buffers in RAM to hold the data. It does
not do any copying from the GPU to the CPU itself.

#+name: pipeline-1
#+begin_src clojure
(defn vision-pipeline
  "Create a SceneProcessor object which wraps a vision processing
  continuation function. The continuation is a function that takes
  [#^Renderer r #^FrameBuffer fb #^ByteBuffer b #^BufferedImage bi],
  each of which has already been appropriately sized."
  [continuation]
  (let [byte-buffer (atom nil)
        renderer (atom nil)
        image (atom nil)]
    (proxy [SceneProcessor] []
      (initialize
        [renderManager viewPort]
        (let [cam (.getCamera viewPort)
              width (.getWidth cam)
              height (.getHeight cam)]
          (reset! renderer (.getRenderer renderManager))
          (reset! byte-buffer
                  (BufferUtils/createByteBuffer
                   (* width height 4)))
          (reset! image (BufferedImage.
                         width height
                         BufferedImage/TYPE_4BYTE_ABGR))))
      (isInitialized [] (not (nil? @byte-buffer)))
      (reshape [_ _ _])
      (preFrame [_])
      (postQueue [_])
      (postFrame
        [#^FrameBuffer fb]
        (.clear @byte-buffer)
        (continuation @renderer fb @byte-buffer @image))
      (cleanup []))))
#+end_src
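
The buffer sizing that =(vision-pipeline)= performs in =initialize= can be illustrated with plain JDK classes. The sketch below is not part of cortex: =image-buffers= is a hypothetical name, and I am assuming that =BufferUtils/createByteBuffer= amounts to a direct, native-ordered buffer. It shows why each frame needs =(* width height 4)= bytes: one byte for each of the four channels of every pixel.

#+begin_src clojure
(import (java.awt.image BufferedImage)
        (java.nio ByteBuffer ByteOrder))

(defn image-buffers
  "Hypothetical helper (not part of cortex): allocate CPU-side
  containers for one width x height frame with 4 bytes per pixel."
  [width height]
  {;; assumed to resemble BufferUtils/createByteBuffer: direct memory,
   ;; native byte order, 4 bytes (B, G, R, A) per pixel
   :byte-buffer (.order (ByteBuffer/allocateDirect (* width height 4))
                        (ByteOrder/nativeOrder))
   ;; AWT image whose backing byte array also holds 4 bytes per pixel
   :image (BufferedImage. width height BufferedImage/TYPE_4BYTE_ABGR)})
#+end_src

For a 640x480 eye this comes to 1,228,800 bytes per buffer, which is why the pipeline allocates the buffers once in =initialize= instead of on every frame.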

The continuation function given to =(vision-pipeline)= above will be
given a =Renderer= and three containers for image data. The
=FrameBuffer= references the GPU image data, but it cannot be used
directly on the CPU. The =ByteBuffer= and =BufferedImage= are
initially "empty" but are sized to hold the data in the
=FrameBuffer=. I call transferring the GPU image data to the CPU
structures "mixing" the image data. I have provided three functions to
do this mixing.

#+name: pipeline-2
#+begin_src clojure
(defn frameBuffer->byteBuffer!
  "Transfer the data in the graphics card (Renderer, FrameBuffer) to
  the CPU (ByteBuffer)."
  [#^Renderer r #^FrameBuffer fb #^ByteBuffer bb]
  (.readFrameBuffer r fb bb) bb)

(defn byteBuffer->bufferedImage!
  "Convert the C-style BGRA image data in the ByteBuffer bb to the AWT
  style ABGR image data and place it in BufferedImage bi."
  [#^ByteBuffer bb #^BufferedImage bi]
  (Screenshots/convertScreenShot bb bi) bi)

(defn BufferedImage!
  "Continuation which will grab the buffered image from the materials
  provided by (vision-pipeline)."
  [#^Renderer r #^FrameBuffer fb #^ByteBuffer bb #^BufferedImage bi]
  (byteBuffer->bufferedImage!
   (frameBuffer->byteBuffer! r fb bb) bi))
#+end_src
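
To make the mixing step concrete, here is a pure-Clojure sketch of what a BGRA-to-AWT conversion involves. It is a hypothetical stand-in, not jME's =Screenshots/convertScreenShot=: =bgra-bytes->image= writes into a =TYPE_INT_ARGB= image rather than a =TYPE_4BYTE_ABGR= one, and it assumes the rows arrive bottom row first, which is how OpenGL's =glReadPixels= returns them.

#+begin_src clojure
(import (java.awt.image BufferedImage)
        (java.nio ByteBuffer))

(defn bgra-bytes->image
  "Hypothetical sketch: decode a tightly packed BGRA ByteBuffer holding
  (* width height 4) bytes, stored bottom row first, into a new
  TYPE_INT_ARGB BufferedImage whose row 0 is the top of the picture."
  [^ByteBuffer bb width height]
  (let [img (BufferedImage. width height BufferedImage/TYPE_INT_ARGB)]
    (doseq [y (range height)
            x (range width)]
      (let [i (* 4 (+ x (* y width)))
            ;; Java bytes are signed, so mask each one to 0..255
            channel (fn [offset] (bit-and (.get bb (int (+ i offset))) 0xFF))
            b (channel 0)
            g (channel 1)
            r (channel 2)
            a (channel 3)
            argb (bit-or (bit-shift-left a 24) (bit-shift-left r 16)
                         (bit-shift-left g 8) b)]
        ;; flip vertically: buffer row y becomes image row (height - 1 - y)
        (.setRGB img x (- height y 1) (unchecked-int argb))))
    img))
#+end_src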

Note that it is possible to write vision processing algorithms
entirely in terms of =BufferedImage= inputs. Just compose that
=BufferedImage= algorithm with =(BufferedImage!)=. However, a vision
processing algorithm that is entirely hosted on the GPU does not have
to pay for this convenience.
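
For example, a CPU-side algorithm that only knows about =BufferedImage= can be written and tested without jMonkeyEngine at all. =mean-brightness= below is a hypothetical example of such an algorithm, not part of cortex; it averages =(r + g + b) / 3= over every pixel.

#+begin_src clojure
(import (java.awt.image BufferedImage))

(defn mean-brightness
  "Hypothetical BufferedImage-only vision algorithm: the average of
  (r + g + b) / 3 over all pixels, on a 0-255 scale."
  [^BufferedImage bi]
  (let [w (.getWidth bi)
        h (.getHeight bi)
        pixel-brightness
        (fn [x y]
          (let [rgb (.getRGB bi (int x) (int y))]
            (/ (+ (bit-and (bit-shift-right rgb 16) 0xFF) ; red
                  (bit-and (bit-shift-right rgb 8) 0xFF)  ; green
                  (bit-and rgb 0xFF))                     ; blue
               3.0)))]
    (/ (reduce + (for [x (range w) y (range h)] (pixel-brightness x y)))
       (* w h))))
#+end_src

Because its input is the output of =(BufferedImage!)=, a continuation such as =(comp println mean-brightness BufferedImage!)= could be handed to =(vision-pipeline)= to print the brightness of every rendered frame.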

* COMMENT asdasd

(vision creature) will take an optional :skip argument which will
inform the continuations in scene processor to skip the given
number of cycles; 0 means that no cycles will be skipped.

(vision creature) will return [init-functions sensor-functions].
The init-functions are each single-arg functions that take the
world and register the cameras and must each be called before the
corresponding sensor-functions. Each init-function returns the
viewport for that eye which can be manipulated, saved, etc. Each
sensor-function is a thunk and will return data in the same
format as the tactile-sensor functions; the structure is
[topology, sensor-data]. Internally, these sensor-functions
maintain a reference to sensor-data which is periodically updated
by the continuation function established by its init-function.
They can be queried every cycle, but their information may not
necessarily be different every cycle.

* Physical Eyes

The vision pipeline described above handles the flow of rendered
images. Now, we need simulated eyes to serve as the source of these
images.

Eyes are described in blender in the same way as joints. They are
zero-dimensional empty objects with no geometry whose local coordinate
system determines the orientation of the resulting eye. All eyes are
children of a parent node named "eyes" just as all joints have a
parent named "joints". An eye binds to the nearest physical object
with =(bind-sense)=.

#+name: add-eye
#+begin_src clojure
(in-ns 'cortex.vision)

(import (com.jme3.math Vector3f Quaternion Matrix3f))

(def blender-rotation-correction
  (doto (Quaternion.)
    (.fromRotationMatrix
     (doto (Matrix3f.)
       (.setColumn 0
                   (Vector3f. 1 0 0))
       (.setColumn 1
                   (Vector3f. 0 -1 0))
       (.setColumn 2
                   (Vector3f. 0 0 -1)))

     (doto (Matrix3f.)
       (.setColumn 0
                   (Vector3f.

(defn add-eye!
  "Create a Camera centered on the current position of 'eye which
  follows the closest physical node in 'creature. The camera will
  point in the X direction and use the Z vector as up as determined
  by the rotation of these vectors in blender coordinate space. Use
  XZY rotation for the node in blender."
  [#^Node creature #^Spatial eye]
  (let [target (closest-node creature eye)
        [cam-width cam-height] (eye-dimensions eye)
        cam (Camera. cam-width cam-height)
        rot (.getWorldRotation eye)]
    (.setLocation cam (.getWorldTranslation eye))
    (.lookAtDirection cam (.mult rot Vector3f/UNIT_X)
                      ;; this part is consistent with using Z in
                      ;; blender as the UP vector.
                      (.mult rot Vector3f/UNIT_Y))

    (println-repl "eye unit-z ->" (.mult rot Vector3f/UNIT_Z))
    (println-repl "eye unit-y ->" (.mult rot Vector3f/UNIT_Y))
    (println-repl "eye unit-x ->" (.mult rot Vector3f/UNIT_X))
    (.setFrustumPerspective
     cam 45 (/ (.getWidth cam) (.getHeight cam)) 1 1000)
    (bind-sense target cam) cam))
#+end_src
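
=(add-eye!)= derives the camera's look direction and up vector by rotating the unit X and Y vectors with the eye's world rotation. The pure-Clojure sketch below reimplements that kind of quaternion rotation (computing q v q* for a unit quaternion q) so the idea can be checked without jMonkeyEngine. It is an illustration, not jME's =Quaternion.mult= code; the helper names and the =[w x y z]= representation are my own assumptions.

#+begin_src clojure
(defn quat-mult
  "Hamilton product of quaternions represented as [w x y z]."
  [[w1 x1 y1 z1] [w2 x2 y2 z2]]
  [(- (* w1 w2) (* x1 x2) (* y1 y2) (* z1 z2))
   (+ (* w1 x2) (* x1 w2) (* y1 z2) (- (* z1 y2)))
   (+ (* w1 y2) (- (* x1 z2)) (* y1 w2) (* z1 x2))
   (+ (* w1 z2) (* x1 y2) (- (* y1 x2)) (* z1 w2))])

(defn axis-angle->quat
  "Unit quaternion for a rotation of angle radians about the unit
  vector [ax ay az]."
  [[ax ay az] angle]
  (let [s (Math/sin (/ angle 2.0))]
    [(Math/cos (/ angle 2.0)) (* s ax) (* s ay) (* s az)]))

(defn rotate
  "Rotate vector [x y z] by unit quaternion q, computing q * v * conj(q)."
  [[w qx qy qz :as q] [x y z]]
  (let [conj-q [w (- qx) (- qy) (- qz)]
        [_ rx ry rz] (quat-mult (quat-mult q [0 x y z]) conj-q)]
    [rx ry rz]))

(defn camera-axes
  "Hypothetical helper mirroring add-eye!: the look direction is the
  rotated X axis and the up vector is the rotated Y axis."
  [rot]
  {:direction (rotate rot [1 0 0])
   :up        (rotate rot [0 1 0])})
#+end_src

For instance, an eye rotated 90 degrees about Z looks along +Y, and its up vector becomes -X.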

Here, the camera is created based on metadata on the eye-node and
attached to the nearest physical object with =(bind-sense)=.

** The Retina

An eye is a surface (the retina) which contains many discrete sensors
to detect light. These sensors can have different light-sensing
properties. In humans, each discrete sensor is sensitive to red,
blue, green, or gray. These different types of sensors can have
different spatial distributions along the retina. In humans, there is
a fovea in the center of the retina which has a very high density of
color sensors, and a blind spot which has no sensors at all. Sensor
density decreases in proportion to distance from the fovea.

I want to be able to model any retinal configuration, so my eye-nodes
in blender contain metadata pointing to images that describe the
precise position of the individual sensors using white pixels. The
metadata also describes the precise sensitivity to light that the
sensors described in the image have. An eye can contain any number of
these images. For example, the metadata for an eye might look like
this:

#+begin_src clojure
{0xFF0000 "Models/test-creature/retina-small.png"}
#+end_src

#+caption: The retinal profile image "Models/test-creature/retina-small.png". White pixels are photo-sensitive elements. The distribution of white pixels is denser in the middle and falls off at the edges and is inspired by the human retina.
[[../assets/Models/test-creature/retina-small.png]]

Together, the number 0xFF0000 and the image above describe the
placement of red-sensitive sensory elements.
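
Reading such a profile amounts to collecting the coordinates of its white pixels. cortex does this with =white-coordinates= (used in =(vision-kernel)=), whose source is not shown here. The sketch below is a hypothetical stand-in, =white-pixel-coords=, which scans an image in row-major order and treats a pixel as a sensor only if its red, green, and blue channels are all 255.

#+begin_src clojure
(import (java.awt.image BufferedImage))

(defn white-pixel-coords
  "Hypothetical stand-in for cortex's white-coordinates: return [x y]
  for every pure-white pixel, scanning row by row from the top left."
  [^BufferedImage image]
  (vec
   (for [y (range (.getHeight image))
         x (range (.getWidth image))
         ;; ignore the alpha byte; keep the pixel only if R, G and B are all 0xFF
         :when (= 0xFFFFFF (bit-and (.getRGB image (int x) (int y)) 0xFFFFFF))]
     [x y])))
#+end_src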

Metadata to very crudely approximate a human eye might be something
like this:

#+begin_src clojure
(let [retinal-profile "Models/test-creature/retina-small.png"]
  {0xFF0000 retinal-profile
   0x00FF00 retinal-profile
   0x0000FF retinal-profile
   0xFFFFFF retinal-profile})
#+end_src

The numbers that serve as keys in the map determine a sensor's
relative sensitivity to the channels red, green, and blue. These
sensitivity values are packed into an integer in the order _RGB in
8-bit fields (the highest byte is unused). The RGB values of a pixel
in the image are added together with these sensitivities as linear
weights. Therefore, 0xFF0000 means sensitive to red only while
0xFFFFFF means sensitive to all colors equally (gray).
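
As a worked example of that weighting, here is a minimal sketch assuming each 8-bit field is scaled to a weight between 0 and 1 and the weighted sum is normalized by the total weight, so that 0xFFFFFF yields the plain average of the three channels. The normalization and the names =unpack-rgb= and =sensor-response= are assumptions for illustration; the =(vision-kernel)= code later in this document applies the sensitivity as a bit mask rather than computing this sum.

#+begin_src clojure
(defn unpack-rgb
  "Split an integer packed as _RGB (8 bits per field) into [r g b]."
  [n]
  [(bit-and (bit-shift-right n 16) 0xFF)
   (bit-and (bit-shift-right n 8) 0xFF)
   (bit-and n 0xFF)])

(defn sensor-response
  "Hypothetical linear response of one sensor: the pixel's channels
  weighted by the sensitivity fields (each scaled to 0..1) and
  normalized by the total weight."
  [sensitivity pixel]
  (let [weights (map #(/ % 255.0) (unpack-rgb sensitivity))
        channels (unpack-rgb pixel)
        total-weight (reduce + weights)]
    (if (zero? total-weight)
      0.0 ; a sensor with no sensitivity never responds
      (/ (reduce + (map * weights channels)) total-weight))))
#+end_src

For the pixel 0x102030 (red 16, green 32, blue 48), a 0xFF0000 sensor responds with 16.0 and a 0xFFFFFF sensor with the average, 32.0.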

For convenience I've defined a few symbols for the more common
sensitivity values.

#+name: sensitivity
#+begin_src clojure
(defvar sensitivity-presets
  {:all   0xFFFFFF
   :red   0xFF0000
   :blue  0x0000FF
   :green 0x00FF00}
  "Retinal sensitivity presets for sensors that extract one channel
  (:red :blue :green) or average all channels (:all)")
#+end_src

** Metadata Processing

=(retina-sensor-profile)= extracts a map from the eye-node in the same
format as the example maps above. =(eye-dimensions)= finds the
dimensions of the smallest image required to contain all the retinal
sensor maps.

#+begin_src clojure
(defn retina-sensor-profile
  "Return a map of pixel sensitivity numbers to BufferedImages
  describing the distribution of light-sensitive components of this
  eye. :red, :green, :blue, :all are already defined as extracting
  the red, green, blue, and average components respectively."
  [#^Spatial eye]
  (if-let [eye-map (meta-data eye "eye")]
    (map-vals
     load-image
     (eval (read-string eye-map)))))

(defn eye-dimensions
  "Returns [width, height] specified in the metadata of the eye"
  [#^Spatial eye]
  (let [dimensions
        (map #(vector (.getWidth %) (.getHeight %))
             (vals (retina-sensor-profile eye)))]
    [(apply max (map first dimensions))
     (apply max (map second dimensions))]))
#+end_src

* Eye Creation

First off, get the children of the "eyes" empty node to find all the
eyes the creature has.

#+begin_src clojure
(defvar
  ^{:arglists '([creature])}
  eyes
  (sense-nodes "eyes")
  "Return the children of the creature's \"eyes\" node.")
#+end_src

Then, add the camera created by =(add-eye!)= to the simulation by
creating a new viewport.

#+begin_src clojure
(defn add-camera!
  "Add a camera to the world, calling continuation on every frame
  produced."
  [#^Application world camera continuation]
  (let [render-manager (.getRenderManager world)
        viewport (.createMainView render-manager "eye-view" camera)]
    (doto viewport
      (.setClearFlags true true true)
      (.setBackgroundColor ColorRGBA/Black)
      (.addProcessor (vision-pipeline continuation))
      (.attachScene (.getRootNode world)))))
#+end_src

The continuation function registers the viewport with the simulation
the first time it is called, and uses the CPU to extract the
appropriate pixels from the rendered image and weight them by each
sensor's sensitivity. I have the option to do this filtering in native
code for a slight gain in speed. I could also do it on the GPU for a
massive gain in speed. =(vision-kernel)= generates a list of such
continuation functions, one for each channel of the eye.

#+begin_src clojure
(in-ns 'cortex.vision)

(defrecord attached-viewport [vision-fn viewport-fn]
  clojure.lang.IFn
  (invoke [this world] (vision-fn world))
  (applyTo [this args] (apply vision-fn args)))

(defn vision-kernel
  "Returns a list of functions, each of which will return a color
  channel's worth of visual information when called inside a running
  simulation."
  [#^Node creature #^Spatial eye & {skip :skip :or {skip 0}}]
  (let [retinal-map (retina-sensor-profile eye)
        camera (add-eye! creature eye)
        vision-image
        (atom
         (BufferedImage. (.getWidth camera)
                         (.getHeight camera)
                         BufferedImage/TYPE_BYTE_BINARY))
        register-eye!
        (runonce
         (fn [world]
           (add-camera!
            world camera
            (let [counter (atom 0)]
              (fn [r fb bb bi]
                (if (zero? (rem (swap! counter inc) (inc skip)))
                  (reset! vision-image
                          (BufferedImage! r fb bb bi))))))))]
    (vec
     (map
      (fn [[key image]]
        (let [whites (white-coordinates image)
              topology (vec (collapse whites))
              ;; resolve preset keywords such as :red; raw numbers pass through
              mask (sensitivity-presets key key)]
          (attached-viewport.
           (fn [world]
             (register-eye! world)
             (vector
              topology
              (vec
               (for [[x y] whites]
                 (bit-and
                  mask (.getRGB @vision-image x y))))))
           register-eye!)))
      retinal-map))))

(defn gen-fix-display
  "Create a function to call to restore a simulation's display when it
  is disrupted by a Viewport."
  []
  (runonce
   (fn [world]
     (add-camera! world (.getCamera world) no-op))))
#+end_src
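
The =:skip= option in =(vision-kernel)= works by counting frames and only copying the image when the count is divisible by =(inc skip)=. The same counter logic can be pulled out into a small, testable wrapper; =every-nth-call= is a hypothetical name, not part of cortex.

#+begin_src clojure
(defn every-nth-call
  "Hypothetical helper: wrap f so that it only runs on every
  (inc skip)-th call; skip = 0 runs f on every call. Skipped calls
  return nil."
  [skip f]
  (let [counter (atom 0)]
    (fn [& args]
      (when (zero? (rem (swap! counter inc) (inc skip)))
        (apply f args)))))
#+end_src

With this helper, the continuation inside =(vision-kernel)= could be written as =(every-nth-call skip (fn [r fb bb bi] (reset! vision-image (BufferedImage! r fb bb bi))))=.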

Note that since each of the functions generated by =(vision-kernel)=
shares the same =(register-eye!)= function, the eye will be registered
only once, the first time any of the functions from the list returned
by =(vision-kernel)= is called. Each of the functions returned by
=(vision-kernel)= also allows access to the =Viewport= through which
it receives images.
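
The sharing relies on =runonce= from =cortex.util=, whose source is not shown in this document. The sketch below (=run-once=, a hypothetical name) shows the behavior the paragraph above depends on, under the assumption that the wrapped function's first result is cached and returned on later calls.

#+begin_src clojure
(defn run-once
  "Hypothetical sketch of a run-once wrapper: call f the first time,
  then return the cached first result without calling f again."
  [f]
  (let [state (atom ::unset)]
    (fn [& args]
      ;; lock so two threads cannot both run f
      (locking state
        (when (= ::unset @state)
          (reset! state (apply f args)))
        @state))))
#+end_src

Every channel function in =(vision-kernel)= calls the same wrapped =register-eye!=, so only the first call creates a viewport.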

The in-game display can be disrupted by all the viewports that the
functions created by =(vision-kernel)= add. This doesn't affect the
simulation or the simulated senses, but can be annoying.
=(gen-fix-display)= restores the in-simulation display.

** Vision!

All the hard work has been done; all that remains is to apply
=(vision-kernel)= to each eye in the creature and gather the results
into one list of functions.

#+begin_src clojure
(defn vision!
  "Returns a list of functions, each of which returns visual sensory
  data when called inside a running simulation."
  [#^Node creature & {skip :skip :or {skip 0}}]
  (reduce
   concat
   (for [eye (eyes creature)]
     ;; pass :skip through so every eye honors it
     (vision-kernel creature eye :skip skip))))
#+end_src

** Visualization of Vision

It's vital to have a visual representation for each sense. Here I use
=(view-sense)= to construct a function that will create a display for
visual data.

#+begin_src clojure
(defn view-vision
  "Creates a function which accepts a list of visual sensor-data and
  displays each element of the list to the screen."
  []
  (view-sense
   (fn
     [[coords sensor-data]]
     (let [image (points->image coords)]
       (dorun
        (for [i (range (count coords))]
          (.setRGB image ((coords i) 0) ((coords i) 1)
                   (sensor-data i))))
       image))))
#+end_src
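
=(view-vision)= depends on =points->image= from =cortex.sense= to allocate an image big enough for the sensor topology. Its source is not shown here; as a hypothetical stand-in, =coords->image= below sizes a black image to the largest x and y in the coordinate list, and =paint-sensor-data= (also hypothetical) then paints each sensor value at its coordinate, mirroring the loop in =(view-vision)=.

#+begin_src clojure
(import (java.awt.image BufferedImage))

(defn coords->image
  "Hypothetical stand-in for points->image: a black TYPE_INT_RGB image
  just large enough to hold every [x y] in coords."
  [coords]
  (BufferedImage. (inc (apply max (map first coords)))
                  (inc (apply max (map second coords)))
                  BufferedImage/TYPE_INT_RGB))

(defn paint-sensor-data
  "Write each sensor value as an RGB pixel at its coordinate."
  [coords sensor-data]
  (let [^BufferedImage image (coords->image coords)]
    (dorun
     (map (fn [[x y] value] (.setRGB image (int x) (int y) (int value)))
          coords sensor-data))
    image))
#+end_src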

* Tests

** Basic Test

This is a basic test for the vision system. It only tests the
vision-pipeline and does not deal with loading eyes from a blender
file. The code creates two videos of the same rotating cube from
different angles.

#+name: test-1
#+begin_src clojure
(in-ns 'cortex.test.vision)

(defn test-two-eyes
  "Testing vision:
  Tests the vision system by creating two views of the same rotating
  object from different angles and displaying both of those views in
  JFrames.

  You should see a rotating cube, and two windows,
  each displaying a different view of the cube."
  []
  (let [candy
        (box 1 1 1 :physical? false :color ColorRGBA/Blue)]
    (world
     (doto (Node.)
       (.attachChild candy))
     {}
     (fn [world]
       (let [cam (.clone (.getCamera world))
             width (.getWidth cam)
             height (.getHeight cam)]
         (add-camera! world cam
                      (comp
                       (view-image
                        (File. "/home/r/proj/cortex/render/vision/1"))
                       BufferedImage!))
         (add-camera! world
                      (doto (.clone cam)
                        (.setLocation (Vector3f. -10 0 0))
                        (.lookAt Vector3f/ZERO Vector3f/UNIT_Y))
                      (comp
                       (view-image
                        (File. "/home/r/proj/cortex/render/vision/2"))
                       BufferedImage!))
         ;; This is here to restore the main view
         ;; after the other views have completed processing
         (add-camera! world (.getCamera world) no-op)))
     (fn [world tpf]
       (.rotate candy (* tpf 0.2) 0 0)))))
#+end_src

#+begin_html
<div class="figure">
<video controls="controls" width="755">
<source src="../video/spinning-cube.ogg" type="video/ogg"
        preload="none" poster="../images/aurellem-1280x480.png" />
</video>
<p>A rotating cube viewed from two different perspectives.</p>
</div>
#+end_html

Creating multiple eyes like this can be used for stereoscopic vision
simulation in a single creature or for simulating multiple creatures,
each with their own sense of vision.

** Adding Vision to the Worm

To the worm from the last post, we add a new node that describes its
eyes.

#+attr_html: width=755
#+caption: The worm with newly added empty nodes describing a single eye.
[[../images/worm-with-eye.png]]

The node highlighted in yellow is the root level "eyes" node. It has
a single child, highlighted in orange, which describes a single
eye. This is the "eye" node. The two nodes which are not highlighted
describe the single joint of the worm.

The metadata of the eye-node is:

#+begin_src clojure :results verbatim :exports both
(cortex.sense/meta-data
 (.getChild
  (.getChild (cortex.test.body/worm)
             "eyes") "eye") "eye")
#+end_src

#+results:
: "(let [retina \"Models/test-creature/retina-small.png\"]
: {:all retina :red retina :green retina :blue retina})"

This is the approximation to the human eye described earlier.

#+begin_src clojure
(in-ns 'cortex.test.vision)

(import com.aurellem.capture.Capture)

(defn test-worm-vision []
  (let [the-worm (doto (worm) (body!))
        vision (vision! the-worm)
        vision-display (view-vision)
        fix-display (gen-fix-display)
        me (sphere 0.5 :color ColorRGBA/Blue :physical? false)
        x-axis
        (box 1 0.01 0.01 :physical? false :color ColorRGBA/Red
             :position (Vector3f. 0 -5 0))
        y-axis
        (box 0.01 1 0.01 :physical? false :color ColorRGBA/Green
             :position (Vector3f. 0 -5 0))
        z-axis
        (box 0.01 0.01 1 :physical? false :color ColorRGBA/Blue
             :position (Vector3f. 0 -5 0))]

    (world (nodify [(floor) the-worm x-axis y-axis z-axis me])
           standard-debug-controls
           (fn [world]
             (light-up-everything world)
             ;; add a view from the worm's perspective
             (add-camera!
              world
              (add-eye! the-worm
                        (.getChild
                         (.getChild the-worm "eyes") "eye"))
              (comp
               (view-image
                (File. "/home/r/proj/cortex/render/worm-vision/worm-view"))
               BufferedImage!))
             (set-gravity world Vector3f/ZERO)
             (Capture/captureVideo
              world
              (File. "/home/r/proj/cortex/render/worm-vision/main-view")))
           (fn [world _]
             (.setLocalTranslation me (.getLocation (.getCamera world)))
             (vision-display
              (map #(% world) vision)
              (File. "/home/r/proj/cortex/render/worm-vision"))
             (fix-display world)))))
#+end_src

* Headers

#+name: vision-header
#+begin_src clojure
(ns cortex.vision
  "Simulate the sense of vision in jMonkeyEngine3. Enables multiple
  eyes from different positions to observe the same world, and pass
  the observed data to any arbitrary function. Automatically reads
  eye-nodes from specially prepared blender files and instantiates
  them in the world as actual eyes."
  {:author "Robert McIntyre"}
  (:use (cortex world sense util))
  (:use clojure.contrib.def)
  (:import com.jme3.post.SceneProcessor)
  (:import (com.jme3.util BufferUtils Screenshots))
  (:import java.nio.ByteBuffer)
  (:import java.awt.image.BufferedImage)
  (:import (com.jme3.renderer ViewPort Camera))
  (:import com.jme3.math.ColorRGBA)
  (:import com.jme3.renderer.Renderer)
  (:import com.jme3.app.Application)
  (:import com.jme3.texture.FrameBuffer)
  (:import (com.jme3.scene Node Spatial)))
#+end_src

#+name: test-header
#+begin_src clojure
(ns cortex.test.vision
  (:use (cortex world sense util body vision))
  (:use cortex.test.body)
  (:import java.awt.image.BufferedImage)
  (:import javax.swing.JPanel)
  (:import javax.swing.SwingUtilities)
  (:import java.awt.Dimension)
  (:import javax.swing.JFrame)
  (:import com.jme3.math.ColorRGBA)
  (:import com.jme3.scene.Node)
  (:import com.jme3.math.Vector3f)
  (:import java.io.File))
#+end_src

- As a neat bonus, this idea behind simulated vision also enables one
  to [[../../cortex/html/capture-video.html][capture live video feeds from jMonkeyEngine]].

* COMMENT Generate Source
#+begin_src clojure :tangle ../src/cortex/vision.clj
<<eyes>>
#+end_src

#+begin_src clojure :tangle ../src/cortex/test/vision.clj
<<test-header>>
<<test-1>>
#+end_src