#+title: Simulated Sense of Sight
#+author: Robert McIntyre
#+email: rlm@mit.edu
#+description: Simulated sight for AI research using JMonkeyEngine3 and clojure
#+keywords: computer vision, jMonkeyEngine3, clojure
#+SETUPFILE: ../../aurellem/org/setup.org
#+INCLUDE: ../../aurellem/org/level-0.org
#+babel: :mkdirp yes :noweb yes :exports both

* Vision

Vision is one of the most important senses for humans, so I need to
build a simulated sense of vision for my AI. I will do this with
simulated eyes. Each eye can be independently moved and should see its
own version of the world depending on where it is.

Making these simulated eyes a reality is fairly simple because
jMonkeyEngine already contains extensive support for multiple views of
the same 3D simulated world. jMonkeyEngine provides this support
because it is necessary for games with split-screen views. Multiple
views are also used to create efficient pseudo-reflections by
rendering the scene from a certain perspective and then projecting it
back onto a surface in the 3D world.

#+caption: jMonkeyEngine supports multiple views to enable split-screen games, like GoldenEye
[[../images/goldeneye-4-player.png]]

* Brief Description of jMonkeyEngine's Rendering Pipeline

jMonkeyEngine allows you to create a =ViewPort=, which represents a
view of the simulated world. You can create as many of these as you
want. Every frame, the =RenderManager= iterates through each
=ViewPort=, rendering the scene in the GPU. For each =ViewPort= there
is a =FrameBuffer= which represents the rendered image in the GPU.

Each =ViewPort= can have any number of attached =SceneProcessor=
objects, which are called every time a new frame is rendered. A
=SceneProcessor= receives a =FrameBuffer= and can do whatever it wants
with the data. Often this consists of invoking GPU-specific operations
on the rendered image. The =SceneProcessor= can also copy the GPU
image data to RAM and process it with the CPU.
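Before the full pipeline below, a minimal sketch of this API may help.
Here =world= is assumed to be a jMonkeyEngine =Application= and =cam=
a =Camera= already in scope; the imports match the =cortex.vision=
namespace declared at the end of this file.

#+begin_src clojure
;; Minimal sketch (not part of cortex): create a second ViewPort
;; watching the same scene. Assumes 'world is an Application and
;; 'cam is a Camera.
(let [view (.createMainView (.getRenderManager world)
                            "extra-view" cam)]
  (.attachScene view (.getRootNode world))
  ;; any number of SceneProcessors may now observe this view:
  ;; (.addProcessor view some-scene-processor)
  view)
#+end_src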
* The Vision Pipeline

Each eye in the simulated creature needs its own =ViewPort= so that it
can see the world from its own perspective. To this =ViewPort=, I add
a =SceneProcessor= that feeds the visual data to any arbitrary
continuation function for further processing. That continuation
function may perform both CPU and GPU operations on the data. To make
this easy for the continuation function, the =SceneProcessor=
maintains appropriately sized buffers in RAM to hold the data. It does
not do any copying from the GPU to the CPU itself.

#+name: pipeline-1
#+begin_src clojure
(defn vision-pipeline
  "Create a SceneProcessor object which wraps a vision processing
  continuation function. The continuation is a function that takes
  [#^Renderer r #^FrameBuffer fb #^ByteBuffer b #^BufferedImage bi],
  each of which has already been appropriately sized."
  [continuation]
  (let [byte-buffer (atom nil)
        renderer (atom nil)
        image (atom nil)]
    (proxy [SceneProcessor] []
      (initialize
       [renderManager viewPort]
       (let [cam (.getCamera viewPort)
             width (.getWidth cam)
             height (.getHeight cam)]
         (reset! renderer (.getRenderer renderManager))
         (reset! byte-buffer
                 (BufferUtils/createByteBuffer
                  (* width height 4)))
         (reset! image (BufferedImage.
                        width height
                        BufferedImage/TYPE_4BYTE_ABGR))))
      (isInitialized [] (not (nil? @byte-buffer)))
      (reshape [_ _ _])
      (preFrame [_])
      (postQueue [_])
      (postFrame
       [#^FrameBuffer fb]
       (.clear @byte-buffer)
       (continuation @renderer fb @byte-buffer @image))
      (cleanup []))))
#+end_src

The continuation function given to =(vision-pipeline)= above will be
given a =Renderer= and three containers for image data. The
=FrameBuffer= references the GPU image data, but it cannot be used
directly on the CPU. The =ByteBuffer= and =BufferedImage= are
initially "empty" but are sized to hold the data in the
=FrameBuffer=. I call transferring the GPU image data to the CPU
structures "mixing" the image data. I have provided three functions to
do this mixing.

#+name: pipeline-2
#+begin_src clojure
(defn frameBuffer->byteBuffer!
  "Transfer the data in the graphics card (Renderer, FrameBuffer) to
   the CPU (ByteBuffer)."
  [#^Renderer r #^FrameBuffer fb #^ByteBuffer bb]
  (.readFrameBuffer r fb bb) bb)

(defn byteBuffer->bufferedImage!
  "Convert the C-style BGRA image data in the ByteBuffer bb to the AWT
   style ABGR image data and place it in BufferedImage bi."
  [#^ByteBuffer bb #^BufferedImage bi]
  (Screenshots/convertScreenShot bb bi) bi)

(defn BufferedImage!
  "Continuation which will grab the buffered image from the materials
   provided by (vision-pipeline)."
  [#^Renderer r #^FrameBuffer fb #^ByteBuffer bb #^BufferedImage bi]
  (byteBuffer->bufferedImage!
   (frameBuffer->byteBuffer! r fb bb) bi))
#+end_src

Note that it is possible to write vision processing algorithms
entirely in terms of =BufferedImage= inputs. Just compose that
=BufferedImage= algorithm with =(BufferedImage!)=. However, a vision
processing algorithm that is entirely hosted on the GPU does not have
to pay for this convenience.
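For example, here is a sketch of such a composition.
=bright-pixel-count= is a hypothetical function invented for
illustration; only =vision-pipeline= and =BufferedImage!= come from
this file.

#+begin_src clojure
;; Hypothetical example: a vision algorithm written purely against
;; BufferedImage, counting pixels whose blue channel exceeds 200.
(defn bright-pixel-count
  [#^BufferedImage bi]
  (count
   (for [x (range (.getWidth bi))
         y (range (.getHeight bi))
         :when (< 200 (bit-and 0xFF (.getRGB bi x y)))]
     [x y])))

;; Composed with BufferedImage!, it becomes a continuation suitable
;; for (vision-pipeline):
(vision-pipeline (comp bright-pixel-count BufferedImage!))
#+end_src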
* COMMENT design notes for the vision API

(vision creature) will take an optional :skip argument which will
inform the continuations in scene processor to skip the given
number of cycles; 0 means that no cycles will be skipped.

(vision creature) will return [init-functions sensor-functions].
The init-functions are each single-arg functions that take the
world and register the cameras and must each be called before the
corresponding sensor-functions. Each init-function returns the
viewport for that eye which can be manipulated, saved, etc. Each
sensor-function is a thunk and will return data in the same
format as the tactile-sensor functions; the structure is
[topology, sensor-data]. Internally, these sensor-functions
maintain a reference to sensor-data which is periodically updated
by the continuation function established by its init-function.
They can be queried every cycle, but their information may not
necessarily be different every cycle.

* Physical Eyes

The vision pipeline described above handles the flow of rendered
images. Now, we need simulated eyes to serve as the source of these
images.

An eye is described in blender in the same way as a joint. They are
zero-dimensional empty objects with no geometry whose local coordinate
system determines the orientation of the resulting eye. All eyes are
children of a parent node named "eyes" just as all joints have a
parent named "joints". An eye binds to the nearest physical object
with =(bind-sense)=.

#+name: add-eye
#+begin_src clojure
(defn add-eye!
  "Create a Camera centered on the current position of 'eye which
   follows the closest physical node in 'creature."
  [#^Node creature #^Spatial eye]
  (let [target (closest-node creature eye)
        [cam-width cam-height] (eye-dimensions eye)
        cam (Camera. cam-width cam-height)]
    (.setLocation cam (.getWorldTranslation eye))
    (.setRotation cam (.getWorldRotation eye))
    (.setFrustumPerspective
     cam 45 (/ (.getWidth cam) (.getHeight cam))
     1 1000)
    (bind-sense target cam)
    cam))
#+end_src

Here, the camera is created based on metadata on the eye-node and
attached to the nearest physical object with =(bind-sense)=.

** The Retina

An eye is a surface (the retina) which contains many discrete sensors
to detect light. These sensors can have different light-sensing
properties. In humans, each discrete sensor is sensitive to red, blue,
green, or gray. These different types of sensors can have different
spatial distributions along the retina. In humans, there is a fovea in
the center of the retina which has a very high density of color
sensors, and a blind spot which has no sensors at all. Sensor density
decreases in proportion to distance from the fovea.

I want to be able to model any retinal configuration, so my eye-nodes
in blender contain metadata pointing to images that describe the
precise position of the individual sensors using white pixels. The
metadata also describes the precise sensitivity to light that the
sensors described in the image have. An eye can contain any number of
these images. For example, the metadata for an eye might look like
this:

#+begin_src clojure
{0xFF0000 "Models/test-creature/retina-small.png"}
#+end_src

#+caption: The retinal profile image "Models/test-creature/retina-small.png". White pixels are photo-sensitive elements. The distribution of white pixels is denser in the middle and falls off at the edges, and is inspired by the human retina.
[[../assets/Models/test-creature/retina-small.png]]

Together, the number 0xFF0000 and the image above describe the
placement of red-sensitive sensory elements.

Metadata to very crudely approximate a human eye might be something
like this:

#+begin_src clojure
(let [retinal-profile "Models/test-creature/retina-small.png"]
  {0xFF0000 retinal-profile
   0x00FF00 retinal-profile
   0x0000FF retinal-profile
   0xFFFFFF retinal-profile})
#+end_src

The numbers that serve as keys in the map determine a sensor's
relative sensitivity to the channels red, green, and blue. These
sensitivity values are packed into an integer in the order =_RGB= in
8-bit fields. The RGB values of a pixel in the image are added
together with these sensitivities as linear weights. Therefore,
0xFF0000 means sensitive to red only while 0xFFFFFF means sensitive to
all colors equally (gray).
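A small worked example (mine, not from the original code) shows how
these masks select channels from a packed RGB pixel, which is exactly
how the vision functions below use =bit-and=:

#+begin_src clojure
;; Worked example (illustration only): applying sensitivity masks
;; to a packed RGB pixel.
(let [pixel 0x112233]           ; R=0x11, G=0x22, B=0x33
  [(bit-and 0xFF0000 pixel)     ; => 0x110000, red component only
   (bit-and 0x0000FF pixel)     ; => 0x000033, blue component only
   (bit-and 0xFFFFFF pixel)])   ; => 0x112233, all components
#+end_src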
For convenience I've defined a few symbols for the more common
sensitivity values.

#+name: sensitivity
#+begin_src clojure
(defvar sensitivity-presets
  {:all   0xFFFFFF
   :red   0xFF0000
   :blue  0x0000FF
   :green 0x00FF00}
  "Retinal sensitivity presets for sensors that extract one channel
   (:red :blue :green) or average all channels (:all)")
#+end_src

** Metadata Processing

=(retina-sensor-profile)= extracts a map from the eye-node in the same
format as the example maps above. =(eye-dimensions)= finds the
dimensions of the smallest image required to contain all the retinal
sensor maps.

#+begin_src clojure
(defn retina-sensor-profile
  "Return a map of pixel sensitivity numbers to BufferedImages
   describing the distribution of light-sensitive components of this
   eye. :red, :green, :blue, :all are already defined as extracting
   the red, green, blue, and average components respectively."
  [#^Spatial eye]
  (if-let [eye-map (meta-data eye "eye")]
    (map-vals
     load-image
     (eval (read-string eye-map)))))

(defn eye-dimensions
  "Returns [width, height] specified in the metadata of the eye."
  [#^Spatial eye]
  (let [dimensions
        (map #(vector (.getWidth %) (.getHeight %))
             (vals (retina-sensor-profile eye)))]
    [(apply max (map first dimensions))
     (apply max (map second dimensions))]))
#+end_src

* Eye Creation

First off, get the children of the "eyes" empty node to find all the
eyes the creature has.

#+begin_src clojure
(defvar
  ^{:arglists '([creature])}
  eyes
  (sense-nodes "eyes")
  "Return the children of the creature's \"eyes\" node.")
#+end_src

Then, =add-camera!= attaches a camera to the world as a new
=ViewPort=, wrapping the given continuation in a =(vision-pipeline)=.

#+begin_src clojure
(defn add-camera!
  "Add a camera to the world, calling continuation on every frame
   produced."
  [#^Application world camera continuation]
  (let [width (.getWidth camera)
        height (.getHeight camera)
        render-manager (.getRenderManager world)
        viewport (.createMainView render-manager "eye-view" camera)]
    (doto viewport
      (.setClearFlags true true true)
      (.setBackgroundColor ColorRGBA/Black)
      (.addProcessor (vision-pipeline continuation))
      (.attachScene (.getRootNode world)))))
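;; A usage sketch (illustration only, not part of cortex): compose
;; view-image (assumed here to live in cortex.util; the test code at
;; the end of this file uses it the same way) with BufferedImage! to
;; watch a camera's output in a window. Wrapped in (comment ...) so
;; it is never executed or tangled into running code.
(comment
  (defn observe-camera!
    [world #^Camera cam]
    (add-camera! world cam (comp (view-image) BufferedImage!))))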
(defn vision-fn
  "Returns a list of functions, each of which will return a color
   channel's worth of visual information when called inside a running
   simulation."
  [#^Node creature #^Spatial eye & {skip :skip :or {skip 0}}]
  (let [retinal-map (retina-sensor-profile eye)
        camera (add-eye! creature eye)
        vision-image
        (atom
         (BufferedImage. (.getWidth camera)
                         (.getHeight camera)
                         BufferedImage/TYPE_BYTE_BINARY))
        register-eye!
        (runonce
         (fn [world]
           (add-camera!
            world camera
            (let [counter (atom 0)]
              ;; only mix the GPU image data down to the CPU once
              ;; every (inc skip) frames.
              (fn [r fb bb bi]
                (if (zero? (rem (swap! counter inc) (inc skip)))
                  (reset! vision-image
                          (BufferedImage! r fb bb bi))))))))]
    (vec
     (map
      (fn [[key image]]
        (let [whites (white-coordinates image)
              topology (vec (collapse whites))
              mask (sensitivity-presets key key)]
          (fn [world]
            (register-eye! world)
            (vector
             topology
             (vec
              (for [[x y] whites]
                (bit-and
                 mask (.getRGB @vision-image x y))))))))
      retinal-map))))

;; TODO maybe should add a viewport-manipulation function to
;; automatically change viewport settings, attach shadow filters, etc.

(defn vision!
  "Returns a function which returns visual sensory data when called
   inside a running simulation."
  [#^Node creature & {skip :skip :or {skip 0}}]
  (reduce
   concat
   (for [eye (eyes creature)]
     (vision-fn creature eye :skip skip))))

(defn view-vision
  "Creates a function which accepts a list of visual sensor-data and
   displays each element of the list to the screen."
  []
  (view-sense
   (fn
     [[coords sensor-data]]
     (let [image (points->image coords)]
       (dorun
        (for [i (range (count coords))]
          (.setRGB image ((coords i) 0) ((coords i) 1)
                   (sensor-data i))))
       image))))
#+end_src

Note the use of continuation passing style for connecting the eye to a
function to process the output. You can create any number of eyes, and
each of them will see the world from its own =Camera=. Once every
frame, the rendered image is copied to a =BufferedImage=, and that
data is sent off to the continuation function. Moving the =Camera=
which was used to create the eye will change what the eye sees.
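To make the calling convention concrete, here is a short sketch.
Everything except =vision!= itself is invented for illustration.

#+begin_src clojure
;; Sketch (illustration only): each sensor function returned by
;; (vision!) takes the world and yields [topology sensor-data].
(comment
  (let [senses (vision! creature)]
    (fn [world tpf]
      (doseq [sense senses]
        (let [[topology data] (sense world)]
          (println (count topology) "sensors,"
                   (count (remove zero? data)) "nonzero"))))))
#+end_src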
Automatically reads442 eye-nodes from specially prepared blender files and instanttiates443 them in the world as actual eyes."444 {:author "Robert McIntyre"}445 (:use (cortex world sense util))446 (:use clojure.contrib.def)447 (:import com.jme3.post.SceneProcessor)448 (:import (com.jme3.util BufferUtils Screenshots))449 (:import java.nio.ByteBuffer)450 (:import java.awt.image.BufferedImage)451 (:import (com.jme3.renderer ViewPort Camera))452 (:import com.jme3.math.ColorRGBA)453 (:import com.jme3.renderer.Renderer)454 (:import com.jme3.app.Application)455 (:import com.jme3.texture.FrameBuffer)456 (:import (com.jme3.scene Node Spatial)))457 #+end_src459 The example code will create two videos of the same rotating object460 from different angles. It can be used both for stereoscopic vision461 simulation or for simulating multiple creatures, each with their own462 sense of vision.464 - As a neat bonus, this idea behind simulated vision also enables one465 to [[../../cortex/html/capture-video.html][capture live video feeds from jMonkeyEngine]].468 * COMMENT Generate Source469 #+begin_src clojure :tangle ../src/cortex/vision.clj470 <<eyes>>471 #+end_src473 #+begin_src clojure :tangle ../src/cortex/test/vision.clj474 <<test-vision>>475 #+end_src