Mercurial > cortex
view org/vision.org @ 215:f283c62bd212
fixed long standing problem with orientation of eyes in blender, fleshed out text in vision.org
| author   | Robert McIntyre <rlm@mit.edu>   |
|----------|---------------------------------|
| date     | Fri, 10 Feb 2012 02:19:24 -0700 |
| parents  | 01d3e9855ef9                    |
| children | f5ea63245b3b                    |
#+title: Simulated Sense of Sight
#+author: Robert McIntyre
#+email: rlm@mit.edu
#+description: Simulated sight for AI research using JMonkeyEngine3 and clojure
#+keywords: computer vision, jMonkeyEngine3, clojure
#+SETUPFILE: ../../aurellem/org/setup.org
#+INCLUDE: ../../aurellem/org/level-0.org
#+babel: :mkdirp yes :noweb yes :exports both

* Vision

Vision is one of the most important senses for humans, so I need to
build a simulated sense of vision for my AI. I will do this with
simulated eyes. Each eye can be independently moved and should see its
own version of the world depending on where it is.

Making these simulated eyes a reality is fairly simple because
jMonkeyEngine already contains extensive support for multiple views
of the same 3D simulated world. jMonkeyEngine has this support
because it is necessary for creating games with split-screen views.
Multiple views are also used to create efficient pseudo-reflections
by rendering the scene from a certain perspective and then projecting
it back onto a surface in the 3D world.

#+caption: jMonkeyEngine supports multiple views to enable split-screen games, like GoldenEye
[[../images/goldeneye-4-player.png]]

* Brief Description of jMonkeyEngine's Rendering Pipeline

jMonkeyEngine allows you to create a =ViewPort=, which represents a
view of the simulated world. You can create as many of these as you
want. Every frame, the =RenderManager= iterates through each
=ViewPort=, rendering the scene on the GPU. For each =ViewPort= there
is a =FrameBuffer= which represents the rendered image in the GPU.

Each =ViewPort= can have any number of attached =SceneProcessor=
objects, which are called every time a new frame is rendered. A
=SceneProcessor= receives a =FrameBuffer= and can do whatever it wants
with the data. Often this consists of invoking GPU-specific operations
on the rendered image. The =SceneProcessor= can also copy the GPU
image data to RAM and process it with the CPU.

* The Vision Pipeline

Each eye in the simulated creature needs its own =ViewPort= so that
it can see the world from its own perspective. To this =ViewPort=, I
add a =SceneProcessor= that feeds the visual data to any arbitrary
continuation function for further processing. That continuation
function may perform both CPU and GPU operations on the data. To make
this easy for the continuation function, the =SceneProcessor=
maintains appropriately sized buffers in RAM to hold the data. It does
not do any copying from the GPU to the CPU itself.

#+name: pipeline-1
#+begin_src clojure
(defn vision-pipeline
  "Create a SceneProcessor object which wraps a vision processing
  continuation function. The continuation is a function that takes
  [#^Renderer r #^FrameBuffer fb #^ByteBuffer b #^BufferedImage bi],
  each of which has already been appropriately sized."
  [continuation]
  (let [byte-buffer (atom nil)
        renderer (atom nil)
        image (atom nil)]
    (proxy [SceneProcessor] []
      (initialize
       [renderManager viewPort]
       (let [cam (.getCamera viewPort)
             width (.getWidth cam)
             height (.getHeight cam)]
         (reset! renderer (.getRenderer renderManager))
         ;; allocate CPU-side buffers matching the camera's dimensions.
         (reset! byte-buffer
                 (BufferUtils/createByteBuffer
                  (* width height 4)))
         (reset! image (BufferedImage.
                        width height
                        BufferedImage/TYPE_4BYTE_ABGR))))
      (isInitialized [] (not (nil? @byte-buffer)))
      (reshape [_ _ _])
      (preFrame [_])
      (postQueue [_])
      (postFrame
       [#^FrameBuffer fb]
       (.clear @byte-buffer)
       (continuation @renderer fb @byte-buffer @image))
      (cleanup []))))
#+end_src
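To get a feel for how =(vision-pipeline)= is meant to be used, here is
a minimal sketch of a continuation. The function =print-frame-size= is
purely illustrative and not part of cortex.vision; it assumes nothing
beyond the four arguments every continuation receives.

#+begin_src clojure
;; Illustrative only: a continuation that ignores the GPU data and
;; just reports the dimensions of the (pre-sized) BufferedImage.
(defn print-frame-size
  [r fb bb bi]
  (println "rendered a" (.getWidth bi) "x" (.getHeight bi) "frame"))

;; A SceneProcessor wrapping it can be attached to any ViewPort:
;; (.addProcessor viewport (vision-pipeline print-frame-size))
#+end_src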
The continuation function given to =(vision-pipeline)= above will be
given a =Renderer= and three containers for image data. The
=FrameBuffer= references the GPU image data, but it cannot be used
directly on the CPU. The =ByteBuffer= and =BufferedImage= are
initially "empty" but are sized to hold the data in the
=FrameBuffer=. I call transferring the GPU image data to the CPU
structures "mixing" the image data. I have provided three functions to
do this mixing.

#+name: pipeline-2
#+begin_src clojure
(defn frameBuffer->byteBuffer!
  "Transfer the data in the graphics card (Renderer, FrameBuffer) to
   the CPU (ByteBuffer)."
  [#^Renderer r #^FrameBuffer fb #^ByteBuffer bb]
  (.readFrameBuffer r fb bb) bb)

(defn byteBuffer->bufferedImage!
  "Convert the C-style BGRA image data in the ByteBuffer bb to the AWT
   style ABGR image data and place it in BufferedImage bi."
  [#^ByteBuffer bb #^BufferedImage bi]
  (Screenshots/convertScreenShot bb bi) bi)

(defn BufferedImage!
  "Continuation which will grab the buffered image from the materials
   provided by (vision-pipeline)."
  [#^Renderer r #^FrameBuffer fb #^ByteBuffer bb #^BufferedImage bi]
  (byteBuffer->bufferedImage!
   (frameBuffer->byteBuffer! r fb bb) bi))
#+end_src

Note that it is possible to write vision processing algorithms
entirely in terms of =BufferedImage= inputs. Just compose that
=BufferedImage= algorithm with =(BufferedImage!)=. However, a vision
processing algorithm that is entirely hosted on the GPU does not have
to pay for this convenience.
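For example, here is a hypothetical =BufferedImage= algorithm composed
with =(BufferedImage!)=; =count-bright-pixels= is illustrative and not
part of cortex.vision.

#+begin_src clojure
;; Illustrative only: count the pixels whose blue channel exceeds a
;; threshold in each rendered frame.
(defn count-bright-pixels
  [#^BufferedImage bi]
  (count
   (for [x (range (.getWidth bi))
         y (range (.getHeight bi))
         :when (< 200 (bit-and 0xFF (.getRGB bi x y)))]
     [x y])))

;; Composing with BufferedImage! yields a valid continuation:
;; (vision-pipeline (comp count-bright-pixels BufferedImage!))
#+end_src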
* COMMENT Design notes

(vision creature) will take an optional :skip argument which will
inform the continuations in scene processor to skip the given
number of cycles; 0 means that no cycles will be skipped.

(vision creature) will return [init-functions sensor-functions].
The init-functions are each single-arg functions that take the
world and register the cameras and must each be called before the
corresponding sensor-functions. Each init-function returns the
viewport for that eye, which can be manipulated, saved, etc. Each
sensor-function is a thunk and will return data in the same
format as the tactile-sensor functions; the structure is
[topology, sensor-data]. Internally, these sensor-functions
maintain a reference to sensor-data which is periodically updated
by the continuation function established by its init-function.
They can be queried every cycle, but their information may not
necessarily be different every cycle.

* Physical Eyes

The vision pipeline described above handles the flow of rendered
images. Now, we need simulated eyes to serve as the source of these
images.

An eye is described in blender in the same way as a joint. It is a
zero-dimensional empty object with no geometry whose local coordinate
system determines the orientation of the resulting eye. All eyes are
children of a parent node named "eyes", just as all joints have a
parent named "joints". An eye binds to the nearest physical object
with =(bind-sense)=.

#+name: add-eye
#+begin_src clojure
(in-ns 'cortex.vision)

(import com.jme3.math.Vector3f)
(import com.jme3.math.Quaternion)
(import com.jme3.math.Matrix3f)

;; a 180 degree rotation about the X axis, compensating for blender's
;; axis conventions.
(def blender-rotation-correction
  (doto (Quaternion.)
    (.fromRotationMatrix
     (doto (Matrix3f.)
       (.setColumn 0 (Vector3f. 1 0 0))
       (.setColumn 1 (Vector3f. 0 -1 0))
       (.setColumn 2 (Vector3f. 0 0 -1))))))

(defn add-eye!
  "Create a Camera centered on the current position of 'eye which
   follows the closest physical node in 'creature. The camera will
   point in the X direction and use the Z vector as up as determined
   by the rotation of these vectors in blender coordinate space. Use
   XZY rotation for the node in blender."
  [#^Node creature #^Spatial eye]
  (let [target (closest-node creature eye)
        [cam-width cam-height] (eye-dimensions eye)
        cam (Camera. cam-width cam-height)
        rot (.getWorldRotation eye)]
    (.setLocation cam (.getWorldTranslation eye))
    (.lookAtDirection cam (.mult rot Vector3f/UNIT_X)
                      ;; this part is consistent with using Z in
                      ;; blender as the UP vector.
                      (.mult rot Vector3f/UNIT_Y))
    (println-repl "eye unit-z ->" (.mult rot Vector3f/UNIT_Z))
    (println-repl "eye unit-y ->" (.mult rot Vector3f/UNIT_Y))
    (println-repl "eye unit-x ->" (.mult rot Vector3f/UNIT_X))
    (.setFrustumPerspective
     cam 45 (/ (.getWidth cam) (.getHeight cam)) 1 1000)
    (bind-sense target cam) cam))
#+end_src

Here, the camera is created based on metadata on the eye-node and
attached to the nearest physical object with =(bind-sense)=.
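As a rough sketch of how =(add-eye!)= is called (a hypothetical REPL
session, assuming the worm creature from cortex.test.body has been
loaded):

#+begin_src clojure
(comment
  ;; grab the "eye" child of the creature's "eyes" node and build a
  ;; Camera from it; the Camera then follows the creature's nearest
  ;; physical part.
  (let [creature (cortex.test.body/worm)
        eye-node (.getChild (.getChild creature "eyes") "eye")]
    (add-eye! creature eye-node)))
#+end_src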
** The Retina

An eye is a surface (the retina) which contains many discrete sensors
to detect light. These sensors can have different light-sensing
properties. In humans, each discrete sensor is sensitive to red,
blue, green, or gray. These different types of sensors can have
different spatial distributions along the retina. In humans, there is
a fovea in the center of the retina which has a very high density of
color sensors, and a blind spot which has no sensors at all. Sensor
density decreases in proportion to distance from the center of the
retina.

I want to be able to model any retinal configuration, so my eye-nodes
in blender contain metadata pointing to images that describe the
precise position of the individual sensors using white pixels. The
metadata also describes the precise sensitivity to light that the
sensors described in the image have. An eye can contain any number of
these images. For example, the metadata for an eye might look like
this:

#+begin_src clojure
{0xFF0000 "Models/test-creature/retina-small.png"}
#+end_src

#+caption: The retinal profile image "Models/test-creature/retina-small.png". White pixels are photo-sensitive elements. The distribution of white pixels is denser in the middle and falls off at the edges, a pattern inspired by the human retina.
[[../assets/Models/test-creature/retina-small.png]]

Together, the number 0xFF0000 and the image above describe the
placement of red-sensitive sensory elements.

Metadata to very crudely approximate a human eye might be something
like this:

#+begin_src clojure
(let [retinal-profile "Models/test-creature/retina-small.png"]
  {0xFF0000 retinal-profile
   0x00FF00 retinal-profile
   0x0000FF retinal-profile
   0xFFFFFF retinal-profile})
#+end_src

The numbers that serve as keys in the map determine a sensor's
relative sensitivity to the channels red, green, and blue. These
sensitivity values are packed into an integer in the order _RGB in
8-bit fields. The RGB values of a pixel in the image are added
together with these sensitivities as linear weights. Therefore,
0xFF0000 means sensitive to red only, while 0xFFFFFF means sensitive
to all colors equally (gray).
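A small sketch of the weighting arithmetic described above (the
function =sensor-response= is illustrative and not part of
cortex.vision):

#+begin_src clojure
;; Illustrative only: unpack the 8-bit R, G, and B fields from a
;; sensitivity value and use them as linear weights over the
;; corresponding fields of a pixel.
(defn sensor-response
  [sensitivity pixel]
  (let [field (fn [shift x] (bit-and 0xFF (bit-shift-right x shift)))]
    (+ (* (field 16 sensitivity) (field 16 pixel))    ; red
       (* (field 8  sensitivity) (field 8  pixel))    ; green
       (* (field 0  sensitivity) (field 0  pixel))))) ; blue

;; (sensor-response 0xFF0000 pixel) responds to red only, while
;; (sensor-response 0xFFFFFF pixel) weights all channels equally.
#+end_src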
For convenience I've defined a few symbols for the more common
sensitivity values.

#+name: sensitivity
#+begin_src clojure
(defvar sensitivity-presets
  {:all   0xFFFFFF
   :red   0xFF0000
   :blue  0x0000FF
   :green 0x00FF00}
  "Retinal sensitivity presets for sensors that extract one channel
  (:red :blue :green) or average all channels (:all).")
#+end_src

** Metadata Processing

=(retina-sensor-profile)= extracts a map from the eye-node in the same
format as the example maps above. =(eye-dimensions)= finds the
dimensions of the smallest image required to contain all the retinal
sensor maps.

#+begin_src clojure
(defn retina-sensor-profile
  "Return a map of pixel sensitivity numbers to BufferedImages
   describing the distribution of light-sensitive components of this
   eye. :red, :green, :blue, :all are already defined as extracting
   the red, green, blue, and average components respectively."
  [#^Spatial eye]
  (if-let [eye-map (meta-data eye "eye")]
    (map-vals
     load-image
     (eval (read-string eye-map)))))

(defn eye-dimensions
  "Returns [width, height] specified in the metadata of the eye."
  [#^Spatial eye]
  (let [dimensions
        (map #(vector (.getWidth %) (.getHeight %))
             (vals (retina-sensor-profile eye)))]
    [(apply max (map first dimensions))
     (apply max (map second dimensions))]))
#+end_src

* Eye Creation

First off, get the children of the "eyes" empty node to find all the
eyes the creature has.

#+begin_src clojure
(defvar
  ^{:arglists '([creature])}
  eyes
  (sense-nodes "eyes")
  "Return the children of the creature's \"eyes\" node.")
#+end_src

Then, add the camera created by =(add-eye!)= to the simulation by
creating a new viewport.

#+begin_src clojure
(defn add-camera!
  "Add a camera to the world, calling continuation on every frame
   produced."
  [#^Application world camera continuation]
  (let [width (.getWidth camera)
        height (.getHeight camera)
        render-manager (.getRenderManager world)
        viewport (.createMainView render-manager "eye-view" camera)]
    (doto viewport
      (.setClearFlags true true true)
      (.setBackgroundColor ColorRGBA/Black)
      (.addProcessor (vision-pipeline continuation))
      (.attachScene (.getRootNode world)))))
#+end_src
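As a hedged usage sketch, the helper below (=attach-main-view!= is
hypothetical, not part of cortex.vision) watches the world's main
camera through the vision pipeline, handing each mixed frame to a
consumer function.

#+begin_src clojure
(defn attach-main-view!
  "Hypothetical helper: render the world's main camera through the
   vision pipeline. frame-consumer receives one BufferedImage per
   frame."
  [world frame-consumer]
  (add-camera! world (.getCamera world)
               (comp frame-consumer BufferedImage!)))
#+end_src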
The continuation function registers the viewport with the simulation
the first time it is called, and uses the CPU to extract the
appropriate pixels from the rendered image and weight them by each
sensor's sensitivity. I have the option to do this filtering in native
code for a slight gain in speed. I could also do it on the GPU for a
massive gain in speed. =(vision-kernel)= generates a list of such
continuation functions, one for each channel of the eye.

#+begin_src clojure
(in-ns 'cortex.vision)

(defrecord attached-viewport [vision-fn viewport-fn]
  clojure.lang.IFn
  (invoke [this world] (vision-fn world))
  (applyTo [this args] (apply vision-fn args)))

(defn vision-kernel
  "Returns a list of functions, each of which will return a color
   channel's worth of visual information when called inside a running
   simulation."
  [#^Node creature #^Spatial eye & {skip :skip :or {skip 0}}]
  (let [retinal-map (retina-sensor-profile eye)
        camera (add-eye! creature eye)
        vision-image
        (atom
         (BufferedImage. (.getWidth camera)
                         (.getHeight camera)
                         BufferedImage/TYPE_BYTE_BINARY))
        register-eye!
        (runonce
         (fn [world]
           (add-camera!
            world camera
            (let [counter (atom 0)]
              (fn [r fb bb bi]
                (if (zero? (rem (swap! counter inc) (inc skip)))
                  (reset! vision-image
                          (BufferedImage! r fb bb bi))))))))]
    (vec
     (map
      (fn [[key image]]
        (let [whites (white-coordinates image)
              topology (vec (collapse whites))
              mask (color-channel-presets key key)]
          (attached-viewport.
           (fn [world]
             (register-eye! world)
             (vector
              topology
              (vec
               (for [[x y] whites]
                 (bit-and
                  mask (.getRGB @vision-image x y))))))
           register-eye!)))
      retinal-map))))

(defn gen-fix-display
  "Create a function to call to restore a simulation's display when it
   is disrupted by a Viewport."
  []
  (runonce
   (fn [world]
     (add-camera! world (.getCamera world) no-op))))
#+end_src

Note that since each of the functions generated by =(vision-kernel)=
shares the same =(register-eye!)= function, the eye will be registered
only once, the first time any of the functions from the list returned
by =(vision-kernel)= is called. Each of the functions returned by
=(vision-kernel)= also allows access to the =Viewport= through which
it receives images.

The in-game display can be disrupted by all the viewports that the
functions created by =(vision-kernel)= add. This doesn't affect the
simulation or the simulated senses, but can be annoying.
=(gen-fix-display)= restores the in-simulation display.

** Vision!

All the hard work has been done; all that remains is to apply
=(vision-kernel)= to each eye in the creature and gather the results
into one list of functions.

#+begin_src clojure
(defn vision!
  "Returns a list of functions, each of which returns visual sensory
   data when called inside a running simulation."
  [#^Node creature & {skip :skip :or {skip 0}}]
  (reduce
   concat
   (for [eye (eyes creature)]
     (vision-kernel creature eye :skip skip))))
#+end_src
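A sketch of how the resulting functions are consumed inside a
simulation loop (hypothetical names; =creature= and =world= are
assumed to come from a running simulation):

#+begin_src clojure
(comment
  (def vision-fns (vision! creature))
  ;; each frame, calling every function against the world yields one
  ;; [topology sensor-data] pair per color channel of each eye.
  (doseq [[topology sensor-data] (map #(% world) vision-fns)]
    (println (count topology) "sensor values sampled")))
#+end_src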
** Visualization of Vision

It's vital to have a visual representation for each sense. Here I use
=(view-sense)= to construct a function that will create a display for
visual data.

#+begin_src clojure
(defn view-vision
  "Creates a function which accepts a list of visual sensor-data and
   displays each element of the list to the screen."
  []
  (view-sense
   (fn
     [[coords sensor-data]]
     (let [image (points->image coords)]
       (dorun
        (for [i (range (count coords))]
          (.setRGB image ((coords i) 0) ((coords i) 1)
                   (sensor-data i))))
       image))))
#+end_src

* Tests

** Basic Test

This is a basic test for the vision system. It only tests the
vision-pipeline and does not deal with loading eyes from a blender
file. The code creates two videos of the same rotating cube from
different angles.

#+name: test-1
#+begin_src clojure
(in-ns 'cortex.test.vision)

(defn test-two-eyes
  "Testing vision:
   Tests the vision system by creating two views of the same rotating
   object from different angles and displaying both of those views in
   JFrames.

   You should see a rotating cube, and two windows,
   each displaying a different view of the cube."
  []
  (let [candy
        (box 1 1 1 :physical? false :color ColorRGBA/Blue)]
    (world
     (doto (Node.)
       (.attachChild candy))
     {}
     (fn [world]
       (let [cam (.clone (.getCamera world))
             width (.getWidth cam)
             height (.getHeight cam)]
         (add-camera! world cam
                      (comp
                       (view-image
                        (File. "/home/r/proj/cortex/render/vision/1"))
                       BufferedImage!))
         (add-camera! world
                      (doto (.clone cam)
                        (.setLocation (Vector3f. -10 0 0))
                        (.lookAt Vector3f/ZERO Vector3f/UNIT_Y))
                      (comp
                       (view-image
                        (File. "/home/r/proj/cortex/render/vision/2"))
                       BufferedImage!))
         ;; This is here to restore the main view
         ;; after the other views have completed processing
         (add-camera! world (.getCamera world) no-op)))
     (fn [world tpf]
       (.rotate candy (* tpf 0.2) 0 0)))))
#+end_src

#+begin_html
<div class="figure">
<video controls="controls" width="755">
  <source src="../video/spinning-cube.ogg" type="video/ogg"
          preload="none" poster="../images/aurellem-1280x480.png" />
</video>
<p>A rotating cube viewed from two different perspectives.</p>
</div>
#+end_html

Creating multiple eyes like this can be used for stereoscopic vision
simulation in a single creature or for simulating multiple creatures,
each with their own sense of vision.

** Adding Vision to the Worm

To the worm from the last post, we add a new node that describes its
eyes.

#+attr_html: width=755
#+caption: The worm with newly added empty nodes describing a single eye.
[[../images/worm-with-eye.png]]

The node highlighted in yellow is the root-level "eyes" node. It has
a single child node, highlighted in orange, which describes a single
eye. This is the "eye" node. The two nodes which are not highlighted
describe the single joint of the worm.

The metadata of the eye-node is:

#+begin_src clojure :results verbatim :exports both
(cortex.sense/meta-data
 (.getChild
  (.getChild (cortex.test.body/worm)
             "eyes") "eye") "eye")
#+end_src

#+results:
: "(let [retina \"Models/test-creature/retina-small.png\"]
:    {:all retina :red retina :green retina :blue retina})"

This is the approximation to the human eye described earlier.

#+begin_src clojure
(in-ns 'cortex.test.vision)

(import com.aurellem.capture.Capture)

(defn test-worm-vision []
  (let [the-worm (doto (worm) (body!))
        vision (vision! the-worm)
        vision-display (view-vision)
        fix-display (gen-fix-display)
        me (sphere 0.5 :color ColorRGBA/Blue :physical? false)
        x-axis
        (box 1 0.01 0.01 :physical? false :color ColorRGBA/Red
             :position (Vector3f. 0 -5 0))
        y-axis
        (box 0.01 1 0.01 :physical? false :color ColorRGBA/Green
             :position (Vector3f. 0 -5 0))
        z-axis
        (box 0.01 0.01 1 :physical? false :color ColorRGBA/Blue
             :position (Vector3f. 0 -5 0))]

    (world (nodify [(floor) the-worm x-axis y-axis z-axis me])
           standard-debug-controls
           (fn [world]
             (light-up-everything world)
             ;; add a view from the worm's perspective
             (add-camera!
              world
              (add-eye! the-worm
                        (.getChild
                         (.getChild the-worm "eyes") "eye"))
              (comp
               (view-image
                (File. "/home/r/proj/cortex/render/worm-vision/worm-view"))
               BufferedImage!))
             (set-gravity world Vector3f/ZERO)
             (Capture/captureVideo
              world
              (File. "/home/r/proj/cortex/render/worm-vision/main-view")))
           (fn [world _]
             (.setLocalTranslation me (.getLocation (.getCamera world)))
             (vision-display
              (map #(% world) vision)
              (File. "/home/r/proj/cortex/render/worm-vision"))
             (fix-display world)))))
#+end_src
* Headers

#+name: vision-header
#+begin_src clojure
(ns cortex.vision
  "Simulate the sense of vision in jMonkeyEngine3. Enables multiple
   eyes from different positions to observe the same world, and pass
   the observed data to any arbitrary function. Automatically reads
   eye-nodes from specially prepared blender files and instantiates
   them in the world as actual eyes."
  {:author "Robert McIntyre"}
  (:use (cortex world sense util))
  (:use clojure.contrib.def)
  (:import com.jme3.post.SceneProcessor)
  (:import (com.jme3.util BufferUtils Screenshots))
  (:import java.nio.ByteBuffer)
  (:import java.awt.image.BufferedImage)
  (:import (com.jme3.renderer ViewPort Camera))
  (:import com.jme3.math.ColorRGBA)
  (:import com.jme3.renderer.Renderer)
  (:import com.jme3.app.Application)
  (:import com.jme3.texture.FrameBuffer)
  (:import (com.jme3.scene Node Spatial)))
#+end_src

#+name: test-header
#+begin_src clojure
(ns cortex.test.vision
  (:use (cortex world sense util body vision))
  (:use cortex.test.body)
  (:import java.awt.image.BufferedImage)
  (:import javax.swing.JPanel)
  (:import javax.swing.SwingUtilities)
  (:import java.awt.Dimension)
  (:import javax.swing.JFrame)
  (:import com.jme3.math.ColorRGBA)
  (:import com.jme3.scene.Node)
  (:import com.jme3.math.Vector3f)
  (:import java.io.File))
#+end_src

- As a neat bonus, the idea behind simulated vision also enables one
  to [[../../cortex/html/capture-video.html][capture live video feeds from jMonkeyEngine]].

* COMMENT Generate Source
#+begin_src clojure :tangle ../src/cortex/vision.clj
<<eyes>>
#+end_src

#+begin_src clojure :tangle ../src/cortex/test/vision.clj
<<test-header>>
<<test-1>>
#+end_src