#+title: Simulated Sense of Sight
#+author: Robert McIntyre
#+email: rlm@mit.edu
#+description: Simulated sight for AI research using jMonkeyEngine3 and Clojure
#+keywords: computer vision, jMonkeyEngine3, clojure
#+SETUPFILE: ../../aurellem/org/setup.org
#+INCLUDE: ../../aurellem/org/level-0.org
#+babel: :mkdirp yes :noweb yes :exports both

# SUGGEST: Call functions by their name, without
# parentheses. e.g. =add-eye!=, not =(add-eye!)=. The reason for this
# is that it is potentially easy to confuse the /function/ =f= with its
# /value/ at a particular point =(f x)=. Mathematicians have this
# problem with their notation; we don't need it in ours.

* jMonkeyEngine natively supports multiple views of the same world.

Vision is one of the most important senses for humans, so I need to
build a simulated sense of vision for my AI. I will do this with
simulated eyes. Each eye can be independently moved and should see its
own version of the world depending on where it is.

Making these simulated eyes a reality is simple because jMonkeyEngine
already contains extensive support for multiple views of the same 3D
simulated world. jMonkeyEngine has this support because it is
necessary to create games with split-screen views. Multiple views are
also used to create efficient pseudo-reflections by rendering the
scene from a certain perspective and then projecting it back onto a
surface in the 3D world.

#+caption: jMonkeyEngine supports multiple views to enable split-screen games, like GoldenEye, which was one of the first games to use split-screen views.
[[../images/goldeneye-4-player.png]]

** =ViewPorts=, =SceneProcessors=, and the =RenderManager=.
# =ViewPorts= are cameras; =RenderManager= takes snapshots each frame.
#* A Brief Description of jMonkeyEngine's Rendering Pipeline

jMonkeyEngine allows you to create a =ViewPort=, which represents a
view of the simulated world. You can create as many of these as you
want. Every frame, the =RenderManager= iterates through each
=ViewPort=, rendering the scene in the GPU. For each =ViewPort= there
is a =FrameBuffer= which represents the rendered image in the GPU.

#+caption: =ViewPorts= are cameras in the world. During each frame, the =RenderManager= records a snapshot of what each view is currently seeing; these snapshots are =FrameBuffer= objects.
#+ATTR_HTML: width="400"
[[../images/diagram_rendermanager2.png]]

Each =ViewPort= can have any number of attached =SceneProcessor=
objects, which are called every time a new frame is rendered. A
=SceneProcessor= receives the =FrameBuffer= of its =ViewPort= and can
do whatever it wants to the data. Often this consists of invoking
GPU-specific operations on the rendered image. The =SceneProcessor=
can also copy the GPU image data to RAM and process it with the CPU.

** From Views to Vision
# Appropriating Views for Vision.

Each eye in the simulated creature needs its own =ViewPort= so that
it can see the world from its own perspective. To this =ViewPort=, I
add a =SceneProcessor= that feeds the visual data to any arbitrary
continuation function for further processing. That continuation
function may perform both CPU and GPU operations on the data.
To make this easy for the continuation function, the =SceneProcessor=
maintains appropriately sized buffers in RAM to hold the data. It does
not do any copying from the GPU to the CPU itself because it is a slow
operation.

#+name: pipeline-1
#+begin_src clojure
(defn vision-pipeline
  "Create a SceneProcessor object which wraps a vision processing
  continuation function. The continuation is a function that takes
  [#^Renderer r #^FrameBuffer fb #^ByteBuffer b #^BufferedImage bi],
  each of which has already been appropriately sized."
  [continuation]
  (let [byte-buffer (atom nil)
        renderer (atom nil)
        image (atom nil)]
    (proxy [SceneProcessor] []
      (initialize
        [renderManager viewPort]
        (let [cam (.getCamera viewPort)
              width (.getWidth cam)
              height (.getHeight cam)]
          (reset! renderer (.getRenderer renderManager))
          ;; 4 bytes per pixel
          (reset! byte-buffer
                  (BufferUtils/createByteBuffer
                   (* width height 4)))
          (reset! image (BufferedImage.
                         width height
                         BufferedImage/TYPE_4BYTE_ABGR))))
      (isInitialized [] (not (nil? @byte-buffer)))
      (reshape [_ _ _])
      (preFrame [_])
      (postQueue [_])
      (postFrame
        [#^FrameBuffer fb]
        ;; No GPU->CPU copy happens here; that is left to the continuation.
        (.clear @byte-buffer)
        (continuation @renderer fb @byte-buffer @image))
      (cleanup []))))
#+end_src

The continuation function passed to =(vision-pipeline)= above receives
a =Renderer= and three containers for image data. The =FrameBuffer=
references the GPU image data, but the pixel data cannot be used
directly on the CPU. The =ByteBuffer= and =BufferedImage= are
initially "empty" but are sized to hold the data in the =FrameBuffer=.
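
As a minimal sketch of this calling convention (not code from cortex
itself), a continuation can be any function of four arguments. The
hypothetical =make-frame-counter= below ignores the image containers
entirely and only counts frames, so it never triggers a GPU-to-CPU
copy.

#+begin_src clojure
;; Hypothetical example; the name make-frame-counter is an assumption
;; made for illustration and is not part of cortex.
(defn make-frame-counter
  "Return [frames continuation], where continuation has the
  four-argument shape expected by vision-pipeline and frames is an
  atom holding the number of frames seen so far."
  []
  (let [frames (atom 0)]
    [frames
     (fn [renderer frame-buffer byte-buffer buffered-image]
       ;; No pixel data is read, so nothing is copied off the GPU.
       (swap! frames inc))]))

;; Exercised outside a simulation, with nil standing in for the
;; renderer and the three image containers:
(let [[frames continuation] (make-frame-counter)]
  (dotimes [_ 3] (continuation nil nil nil nil))
  @frames) ; => 3
#+end_src
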
I call transferring the GPU image data to the CPU structures "mixing"
the image data. I have provided three functions to do this mixing.

#+name: pipeline-2
#+begin_src clojure
(defn frameBuffer->byteBuffer!
  "Transfer the data in the graphics card (Renderer, FrameBuffer) to
   the CPU (ByteBuffer)."
  [#^Renderer r #^FrameBuffer fb #^ByteBuffer bb]
  (.readFrameBuffer r fb bb) bb)

(defn byteBuffer->bufferedImage!
  "Convert the C-style BGRA image data in the ByteBuffer bb to the AWT
   style ABGR image data and place it in BufferedImage bi."
  [#^ByteBuffer bb #^BufferedImage bi]
  (Screenshots/convertScreenShot bb bi) bi)

(defn BufferedImage!
  "Continuation which will grab the buffered image from the materials
   provided by (vision-pipeline)."
  [#^Renderer r #^FrameBuffer fb #^ByteBuffer bb #^BufferedImage bi]
  (byteBuffer->bufferedImage!
   (frameBuffer->byteBuffer! r fb bb) bi))
#+end_src

Note that it is possible to write vision processing algorithms
entirely in terms of =BufferedImage= inputs. Just compose that
=BufferedImage= algorithm with =(BufferedImage!)=. However, a vision
processing algorithm that is entirely hosted on the GPU does not have
to pay for this convenience.

* Optical sensor arrays are described with images and referenced with metadata
The vision pipeline described above handles the flow of rendered
images. Now, we need simulated eyes to serve as the source of these
images.

An eye is described in blender in the same way as a joint. Eyes are
zero-dimensional empty objects with no geometry whose local coordinate
system determines the orientation of the resulting eye.
All eyes are children of a parent node named "eyes" just as all joints
have a parent named "joints". An eye binds to the nearest physical
object with =(bind-sense)=.

#+name: add-eye
#+begin_src clojure
(in-ns 'cortex.vision)

(defn add-eye!
  "Create a Camera centered on the current position of 'eye which
   follows the closest physical node in 'creature. The camera will
   point in the X direction and use the Z vector as up as determined
   by the rotation of these vectors in blender coordinate space. Use
   XZY rotation for the node in blender."
  [#^Node creature #^Spatial eye]
  (let [target (closest-node creature eye)
        [cam-width cam-height] (eye-dimensions eye)
        cam (Camera. cam-width cam-height)
        rot (.getWorldRotation eye)]
    (.setLocation cam (.getWorldTranslation eye))
    (.lookAtDirection
     cam                           ; this part is not a mistake and
     (.mult rot Vector3f/UNIT_X)   ; is consistent with using Z in
     (.mult rot Vector3f/UNIT_Y))  ; blender as the UP vector.
    (.setFrustumPerspective
     cam 45 (/ (.getWidth cam) (.getHeight cam)) 1 1000)
    (bind-sense target cam) cam))
#+end_src

Here, the camera is created based on metadata on the eye-node and
attached to the nearest physical object with =(bind-sense)=.

** The Retina

An eye is a surface (the retina) which contains many discrete sensors
to detect light. These sensors can have different light-sensing
properties. In humans, each discrete sensor is sensitive to red,
blue, green, or gray. These different types of sensors can have
different spatial distributions along the retina.
In humans, there is a fovea in the center of the retina which has a
very high density of color sensors, and a blind spot which has no
sensors at all. Sensor density decreases in proportion to distance
from the fovea.

I want to be able to model any retinal configuration, so my eye-nodes
in blender contain metadata pointing to images that describe the
precise position of the individual sensors using white pixels. The
metadata also describes the precise sensitivity to light that the
sensors described in the image have. An eye can contain any number of
these images. For example, the metadata for an eye might look like
this:

#+begin_src clojure
{0xFF0000 "Models/test-creature/retina-small.png"}
#+end_src

#+caption: The retinal profile image "Models/test-creature/retina-small.png". White pixels are photo-sensitive elements. The distribution of white pixels is denser in the middle and falls off at the edges and is inspired by the human retina.
[[../assets/Models/test-creature/retina-small.png]]

Together, the number 0xFF0000 and the image above describe the
placement of red-sensitive sensory elements.

Metadata to very crudely approximate a human eye might be something
like this:

#+begin_src clojure
(let [retinal-profile "Models/test-creature/retina-small.png"]
  {0xFF0000 retinal-profile
   0x00FF00 retinal-profile
   0x0000FF retinal-profile
   0xFFFFFF retinal-profile})
#+end_src

The numbers that serve as keys in the map determine a sensor's
relative sensitivity to the channels red, green, and blue. These
sensitivity values are packed into an integer in the order =|_|R|G|B|=
in 8-bit fields.
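
To illustrate the packing, here is a small sketch that splits such an
integer into its three 8-bit fields. The helper name
=unpack-sensitivity= is hypothetical and is not part of cortex.

#+begin_src clojure
;; Hypothetical helper, for illustration only.
(defn unpack-sensitivity
  "Split an integer packed as |_|R|G|B| into [r g b], each in 0..255."
  [sensitivity]
  [(bit-and 0xFF (bit-shift-right sensitivity 16))  ; red field
   (bit-and 0xFF (bit-shift-right sensitivity 8))   ; green field
   (bit-and 0xFF sensitivity)])                     ; blue field

(unpack-sensitivity 0xFF0000) ; => [255 0 0]
(unpack-sensitivity 0x8040FF) ; => [128 64 255]
#+end_src
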
The RGB values of a pixel in the image are added together with these
sensitivities as linear weights. Therefore, 0xFF0000 means sensitive
to red only while 0xFFFFFF means sensitive to all colors equally
(gray).

For convenience I've defined a few symbols for the more common
sensitivity values.

#+name: sensitivity
#+begin_src clojure
(defvar sensitivity-presets
  {:all   0xFFFFFF
   :red   0xFF0000
   :blue  0x0000FF
   :green 0x00FF00}
  "Retinal sensitivity presets for sensors that extract one channel
   (:red :blue :green) or average all channels (:all)")
#+end_src

** Metadata Processing

=(retina-sensor-profile)= extracts a map from the eye-node in the same
format as the example maps above. =(eye-dimensions)= finds the
dimensions of the smallest image required to contain all the retinal
sensor maps.

#+name: retina
#+begin_src clojure
(defn retina-sensor-profile
  "Return a map of pixel sensitivity numbers to BufferedImages
   describing the distribution of light-sensitive components of this
   eye. :red, :green, :blue, and :all are already defined as
   extracting the red, green, blue, and average components
   respectively."
  [#^Spatial eye]
  (if-let [eye-map (meta-data eye "eye")]
    (map-vals
     load-image
     (eval (read-string eye-map)))))

(defn eye-dimensions
  "Returns [width, height] determined by the metadata of the eye."
  [#^Spatial eye]
  (let [dimensions
        (map #(vector (.getWidth %) (.getHeight %))
             (vals (retina-sensor-profile eye)))]
    ;; The largest width and height over all retinal images.
    [(apply max (map first dimensions))
     (apply max (map second dimensions))]))
#+end_src

* Importing and parsing descriptions of eyes.
First off, get the children of the "eyes" empty node to find all the
eyes the creature has.

#+name: eye-node
#+begin_src clojure
(defvar
  ^{:arglists '([creature])}
  eyes
  (sense-nodes "eyes")
  "Return the children of the creature's \"eyes\" node.")
#+end_src

Then, add the camera created by =(add-eye!)= to the simulation by
creating a new viewport.

#+name: add-camera
#+begin_src clojure
(defn add-camera!
  "Add a camera to the world, calling continuation on every frame
  produced."
  [#^Application world camera continuation]
  (let [width (.getWidth camera)
        height (.getHeight camera)
        render-manager (.getRenderManager world)
        viewport (.createMainView render-manager "eye-view" camera)]
    (doto viewport
      (.setClearFlags true true true)
      (.setBackgroundColor ColorRGBA/Black)
      (.addProcessor (vision-pipeline continuation))
      (.attachScene (.getRootNode world)))))
#+end_src


The eye's continuation function should register the viewport with the
simulation the first time it is called, then use the CPU to extract
the appropriate pixels from the rendered image and weight them by each
sensor's sensitivity. I have the option to do this processing in
native code for a slight gain in speed. I could also do it in the GPU
for a massive gain in speed. =(vision-kernel)= generates a list of
such continuation functions, one for each channel of the eye.
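
The "register only on the first call" behavior can be sketched as
follows. This is an assumption about how a run-once wrapper might be
written, not the actual cortex =runonce= implementation, and the name
=run-once-sketch= is hypothetical.

#+begin_src clojure
;; Hypothetical stand-in for a run-once utility; not the cortex code.
(defn run-once-sketch
  "Wrap f so that only the first call invokes it. Later calls do
  nothing and return nil."
  [f]
  (let [called? (atom false)]
    (fn [& args]
      ;; compare-and-set! succeeds for exactly one caller, so f runs
      ;; at most once even if several threads call the wrapper.
      (when (compare-and-set! called? false true)
        (apply f args)))))

;; Calling the wrapped function twice registers only once.
(let [registrations (atom 0)
      register! (run-once-sketch (fn [world] (swap! registrations inc)))]
  (register! :world)
  (register! :world)
  @registrations) ; => 1
#+end_src
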
#+name: kernel
#+begin_src clojure
(in-ns 'cortex.vision)

(defrecord attached-viewport [vision-fn viewport-fn]
  clojure.lang.IFn
  (invoke [this world] (vision-fn world))
  (applyTo [this args] (apply vision-fn args)))

(defn pixel-sense [sensitivity pixel]
  (let [s-r (bit-shift-right (bit-and 0xFF0000 sensitivity) 16)
        s-g (bit-shift-right (bit-and 0x00FF00 sensitivity) 8)
        s-b (bit-and 0x0000FF sensitivity)

        p-r (bit-shift-right (bit-and 0xFF0000 pixel) 16)
        p-g (bit-shift-right (bit-and 0x00FF00 pixel) 8)
        p-b (bit-and 0x0000FF pixel)

        total-sensitivity (* 255 (+ s-r s-g s-b))]
    ;; Weighted average of the pixel's channels, normalized to [0, 1].
    (float (/ (+ (* s-r p-r)
                 (* s-g p-g)
                 (* s-b p-b))
              total-sensitivity))))

(defn vision-kernel
  "Returns a list of functions, each of which will return a color
   channel's worth of visual information when called inside a running
   simulation."
  [#^Node creature #^Spatial eye & {skip :skip :or {skip 0}}]
  (let [retinal-map (retina-sensor-profile eye)
        camera (add-eye! creature eye)
        vision-image
        (atom
         (BufferedImage. (.getWidth camera)
                         (.getHeight camera)
                         BufferedImage/TYPE_BYTE_BINARY))
        register-eye!
        (runonce
         (fn [world]
           (add-camera!
            world camera
            (let [counter (atom 0)]
              (fn [r fb bb bi]
                ;; Only copy the image off the GPU every (inc skip) frames.
                (if (zero? (rem (swap! counter inc) (inc skip)))
                  (reset! vision-image
                          (BufferedImage! r fb bb bi))))))))]
    (vec
     (map
      (fn [[key image]]
        (let [whites (white-coordinates image)
              topology (vec (collapse whites))
              sensitivity (sensitivity-presets key key)]
          (attached-viewport.
           (fn [world]
             (register-eye! world)
             (vector
              topology
              (vec
               (for [[x y] whites]
                 (pixel-sense
                  sensitivity
                  (.getRGB @vision-image x y))))))
           register-eye!)))
      retinal-map))))

(defn gen-fix-display
  "Create a function to call to restore a simulation's display when it
   is disrupted by a Viewport."
  []
  (runonce
   (fn [world]
     (add-camera! world (.getCamera world) no-op))))
#+end_src

Note that since each of the functions generated by =(vision-kernel)=
shares the same =(register-eye!)= function, the eye will be registered
only once, the first time any of the functions from the list returned
by =(vision-kernel)= is called. Each of the functions returned by
=(vision-kernel)= also allows access to the =Viewport= through which
it receives images.

The in-game display can be disrupted by all the viewports that the
functions created by =(vision-kernel)= add. This doesn't affect the
simulation or the simulated senses, but can be annoying.
=(gen-fix-display)= restores the in-simulation display.

** The =vision!= function creates sensory probes.

All the hard work has been done; all that remains is to apply
=(vision-kernel)= to each eye in the creature and gather the results
into one list of functions.

#+name: main
#+begin_src clojure
(defn vision!
  "Returns a list of functions, each of which returns visual sensory
   data when called inside a running simulation."
  [#^Node creature & {skip :skip :or {skip 0}}]
  (reduce
   concat
   (for [eye (eyes creature)]
     (vision-kernel creature eye))))
#+end_src

** Displaying visual data for debugging.
# Visualization of Vision.
# Maybe less alliteration would be better.
It's vital to have a visual representation for each sense. Here I use
=(view-sense)= to construct a function that will create a display for
visual data.

#+name: display
#+begin_src clojure
(in-ns 'cortex.vision)

(defn view-vision
  "Creates a function which accepts a list of visual sensor-data and
  displays each element of the list to the screen."
  []
  (view-sense
   (fn
     [[coords sensor-data]]
     (let [image (points->image coords)]
       (dorun
        (for [i (range (count coords))]
          ;; Draw each sensor's activation as a gray level at its coordinate.
          (.setRGB image ((coords i) 0) ((coords i) 1)
                   (gray (int (* 255 (sensor-data i)))))))
       image))))
#+end_src

* Demonstrations
** Demonstrating the vision pipeline.

This is a basic test for the vision system. It only tests the
vision-pipeline and does not deal with loading eyes from a blender
file. The code creates two videos of the same rotating cube from
different angles.

#+name: test-1
#+begin_src clojure
(in-ns 'cortex.test.vision)

(defn test-pipeline
  "Testing vision:
   Tests the vision system by creating two views of the same rotating
   object from different angles and displaying both of those views in
   JFrames.

   You should see a rotating cube, and two windows,
   each displaying a different view of the cube."
  []
  (let [candy
        (box 1 1 1 :physical? false :color ColorRGBA/Blue)]
    (world
     (doto (Node.)
       (.attachChild candy))
     {}
     (fn [world]
       (let [cam (.clone (.getCamera world))
             width (.getWidth cam)
             height (.getHeight cam)]
         (add-camera! world cam
                      (comp
                       (view-image
                        (File. "/home/r/proj/cortex/render/vision/1"))
                       BufferedImage!))
         (add-camera! world
                      (doto (.clone cam)
                        (.setLocation (Vector3f. -10 0 0))
                        (.lookAt Vector3f/ZERO Vector3f/UNIT_Y))
                      (comp
                       (view-image
                        (File. "/home/r/proj/cortex/render/vision/2"))
                       BufferedImage!))
         ;; This is here to restore the main view
         ;; after the other views have completed processing
         (add-camera! world (.getCamera world) no-op)))
     (fn [world tpf]
       (.rotate candy (* tpf 0.2) 0 0)))))
#+end_src

#+begin_html
A rotating cube viewed from two different perspectives.
Simulated Vision in a Virtual Environment