#+title: Simulated Sense of Sight
#+author: Robert McIntyre
#+email: rlm@mit.edu
#+description: Simulated sight for AI research using JMonkeyEngine3 and clojure
#+keywords: computer vision, jMonkeyEngine3, clojure
#+SETUPFILE: ../../aurellem/org/setup.org
#+INCLUDE: ../../aurellem/org/level-0.org
#+babel: :mkdirp yes :noweb yes :exports both

* Vision

Vision is one of the most important senses for humans, so I need to
build a simulated sense of vision for my AI. I will do this with
simulated eyes. Each eye can be independently moved and should see its
own version of the world depending on where it is.

Making these simulated eyes a reality is fairly simple because
jMonkeyEngine already contains extensive support for multiple views of
the same 3D simulated world. jMonkeyEngine has this support because it
is necessary for creating games with split-screen views. Multiple
views are also used to create efficient pseudo-reflections by
rendering the scene from a certain perspective and then projecting it
back onto a surface in the 3D world.

#+caption: jMonkeyEngine supports multiple views to enable split-screen games, like GoldenEye
[[../images/goldeneye-4-player.png]]

* Brief Description of jMonkeyEngine's Rendering Pipeline

jMonkeyEngine allows you to create a =ViewPort=, which represents a
view of the simulated world. You can create as many of these as you
want. Every frame, the =RenderManager= iterates through each
=ViewPort=, rendering the scene on the GPU. For each =ViewPort= there
is a =FrameBuffer= which represents the rendered image on the GPU.

Each =ViewPort= can have any number of attached =SceneProcessor=
objects, which are called every time a new frame is rendered. A
=SceneProcessor= receives a =FrameBuffer= and can do whatever it wants
with the data. Often this consists of invoking GPU-specific operations
on the rendered image. The =SceneProcessor= can also copy the GPU
image data to RAM and process it with the CPU.

* The Vision Pipeline

Each eye in the simulated creature needs its own =ViewPort= so that it
can see the world from its own perspective. To this =ViewPort=, I add
a =SceneProcessor= that feeds the visual data to any arbitrary
continuation function for further processing. That continuation
function may perform both CPU and GPU operations on the data. To make
this easy for the continuation function, the =SceneProcessor=
maintains appropriately sized buffers in RAM to hold the data. It does
not do any copying from the GPU to the CPU itself.

#+name: pipeline-1
#+begin_src clojure
(defn vision-pipeline
  "Create a SceneProcessor object which wraps a vision processing
   continuation function. The continuation is a function that takes
   [#^Renderer r #^FrameBuffer fb #^ByteBuffer b #^BufferedImage bi],
   each of which has already been appropriately sized."
  [continuation]
  (let [byte-buffer (atom nil)
        renderer (atom nil)
        image (atom nil)]
    (proxy [SceneProcessor] []
      (initialize
        [renderManager viewPort]
        (let [cam (.getCamera viewPort)
              width (.getWidth cam)
              height (.getHeight cam)]
          (reset! renderer (.getRenderer renderManager))
          (reset! byte-buffer
                  (BufferUtils/createByteBuffer
                   (* width height 4)))
          (reset! image (BufferedImage.
                         width height
                         BufferedImage/TYPE_4BYTE_ABGR))))
      (isInitialized [] (not (nil? @byte-buffer)))
      (reshape [_ _ _])
      (preFrame [_])
      (postQueue [_])
      (postFrame
        [#^FrameBuffer fb]
        (.clear @byte-buffer)
        (continuation @renderer fb @byte-buffer @image))
      (cleanup []))))
#+end_src

The continuation function given to =(vision-pipeline)= above will be
given a =Renderer= and three containers for image data. The
=FrameBuffer= references the GPU image data, but it cannot be used
directly on the CPU. The =ByteBuffer= and =BufferedImage= are
initially "empty" but are sized to hold the data in the
=FrameBuffer=. I call transferring the GPU image data to the CPU
structures "mixing" the image data. I have provided three functions to
do this mixing.

#+name: pipeline-2
#+begin_src clojure
(defn frameBuffer->byteBuffer!
  "Transfer the data in the graphics card (Renderer, FrameBuffer) to
   the CPU (ByteBuffer)."
  [#^Renderer r #^FrameBuffer fb #^ByteBuffer bb]
  (.readFrameBuffer r fb bb) bb)

(defn byteBuffer->bufferedImage!
  "Convert the C-style BGRA image data in the ByteBuffer bb to the AWT
   style ABGR image data and place it in BufferedImage bi."
  [#^ByteBuffer bb #^BufferedImage bi]
  (Screenshots/convertScreenShot bb bi) bi)

(defn BufferedImage!
  "Continuation which will grab the buffered image from the materials
   provided by (vision-pipeline)."
  [#^Renderer r #^FrameBuffer fb #^ByteBuffer bb #^BufferedImage bi]
  (byteBuffer->bufferedImage!
   (frameBuffer->byteBuffer! r fb bb) bi))
#+end_src

Note that it is possible to write vision processing algorithms
entirely in terms of =BufferedImage= inputs: just compose the
=BufferedImage= algorithm with =(BufferedImage!)=. However, a vision
processing algorithm that is hosted entirely on the GPU does not have
to pay for this convenience.
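For example, here is a sketch of such a composition. =mean-brightness=
is a hypothetical =BufferedImage= algorithm, not part of
=cortex.vision=; composing it with =(BufferedImage!)= yields a
continuation that =(vision-pipeline)= can call every frame.

#+begin_src clojure
(defn mean-brightness
  "Hypothetical BufferedImage algorithm: return the average intensity
   (0-255) over all pixels of the image."
  [#^BufferedImage bi]
  (let [coords (for [x (range (.getWidth bi))
                     y (range (.getHeight bi))] [x y])
        intensity
        (fn [[x y]]
          (let [rgb (.getRGB bi x y)]
            (/ (+ (bit-and 0xFF (bit-shift-right rgb 16))
                  (bit-and 0xFF (bit-shift-right rgb 8))
                  (bit-and 0xFF rgb))
               3.0)))]
    (/ (reduce + (map intensity coords)) (count coords))))

;; (vision-pipeline brightness-continuation) would then print the mean
;; brightness of every rendered frame.
(def brightness-continuation
  (comp println mean-brightness BufferedImage!))
#+end_src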
* COMMENT Design notes for (vision creature)

(vision creature) will take an optional :skip argument which will
inform the continuations in the scene processor to skip the given
number of cycles; 0 means that no cycles will be skipped.

(vision creature) will return [init-functions sensor-functions].
The init-functions are each single-arg functions that take the
world and register the cameras and must each be called before the
corresponding sensor-functions. Each init-function returns the
viewport for that eye, which can be manipulated, saved, etc. Each
sensor-function is a thunk and will return data in the same
format as the tactile-sensor functions; the structure is
[topology, sensor-data]. Internally, these sensor-functions
maintain a reference to sensor-data which is periodically updated
by the continuation function established by its init-function.
They can be queried every cycle, but their information may not
necessarily be different every cycle.

* Physical Eyes

The vision pipeline described above handles the flow of rendered
images. Now, we need simulated eyes to serve as the source of these
images.

An eye is described in blender in the same way as a joint. They are
zero dimensional empty objects with no geometry whose local coordinate
system determines the orientation of the resulting eye. All eyes are
children of a parent node named "eyes", just as all joints have a
parent named "joints". An eye binds to the nearest physical object
with =(bind-sense)=.

#+name: add-eye
#+begin_src clojure
(defn add-eye!
  "Create a Camera centered on the current position of 'eye which
   follows the closest physical node in 'creature."
  [#^Node creature #^Spatial eye]
  (let [target (closest-node creature eye)
        [cam-width cam-height] (eye-dimensions eye)
        cam (Camera. cam-width cam-height)]
    (.setLocation cam (.getWorldTranslation eye))
    (.setRotation cam (.getWorldRotation eye))
    (.setFrustumPerspective
     cam 45 (/ (.getWidth cam) (.getHeight cam))
     1 1000)
    (bind-sense target cam)
    cam))
#+end_src

Here, the camera is created based on metadata on the eye-node and
attached to the nearest physical object with =(bind-sense)=.

** The Retina

An eye is a surface (the retina) which contains many discrete sensors
to detect light. These sensors can have different light-sensing
properties. In humans, each discrete sensor is sensitive to red, blue,
green, or gray. These different types of sensors can have different
spatial distributions along the retina. In humans, there is a fovea in
the center of the retina which has a very high density of color
sensors, and a blind spot which has no sensors at all. Sensor density
decreases in proportion to distance from the fovea.

I want to be able to model any retinal configuration, so my eye-nodes
in blender contain metadata pointing to images that describe the
precise position of the individual sensors using white pixels. The
metadata also describes the precise sensitivity to light of the
sensors described in the image. An eye can contain any number of these
images. For example, the metadata for an eye might look like this:

#+begin_src clojure
{0xFF0000 "Models/test-creature/retina-small.png"}
#+end_src

#+caption: The retinal profile image "Models/test-creature/retina-small.png". White pixels are photo-sensitive elements. The distribution of white pixels is denser in the middle and falls off at the edges, inspired by the human retina.
[[../assets/Models/test-creature/retina-small.png]]

Together, the number 0xFF0000 and the image above describe the
placement of red-sensitive sensory elements.

Metadata to very crudely approximate a human eye might be something
like this:

#+begin_src clojure
(let [retinal-profile "Models/test-creature/retina-small.png"]
  {0xFF0000 retinal-profile
   0x00FF00 retinal-profile
   0x0000FF retinal-profile
   0xFFFFFF retinal-profile})
#+end_src

The numbers that serve as keys in the map determine a sensor's
relative sensitivity to the channels red, green, and blue. These
sensitivity values are packed into an integer in the order _RGB in
8-bit fields. The RGB values of a pixel in the image are added
together with these sensitivities as linear weights. Therefore,
0xFF0000 means sensitive to red only while 0xFFFFFF means sensitive to
all colors equally (gray).
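In the current implementation, =(vision-fn)= (defined below) applies
the sensitivity value as a simple bit mask on each rendered pixel,
keeping only the channels the sensor responds to. A small illustration
(the pixel value here is arbitrary):

#+begin_src clojure
;; A red-only sensor (0xFF0000) discards the green and blue components
;; of whatever pixel it reads; a gray sensor (0xFFFFFF) keeps them all.
(let [pixel 0x336699]
  [(format "%06x" (bit-and 0xFF0000 pixel))    ;; => "330000"
   (format "%06x" (bit-and 0xFFFFFF pixel))])  ;; => "336699"
#+end_src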
For convenience I've defined a few symbols for the more common
sensitivity values.

#+name: sensitivity
#+begin_src clojure
(defvar sensitivity-presets
  {:all   0xFFFFFF
   :red   0xFF0000
   :blue  0x0000FF
   :green 0x00FF00}
  "Retinal sensitivity presets for sensors that extract one channel
   (:red :blue :green) or respond equally to all channels (:all)")
#+end_src

** Metadata Processing

=(retina-sensor-profile)= extracts a map from the eye-node in the same
format as the example maps above. =(eye-dimensions)= finds the
dimensions of the smallest image required to contain all the retinal
sensor maps.

#+begin_src clojure
(defn retina-sensor-profile
  "Return a map of pixel sensitivity numbers to BufferedImages
   describing the distribution of light-sensitive components of this
   eye. :red, :green, :blue, and :all are already defined as
   extracting the red, green, blue, or all components respectively."
  [#^Spatial eye]
  (if-let [eye-map (meta-data eye "eye")]
    (map-vals
     load-image
     (eval (read-string eye-map)))))

(defn eye-dimensions
  "Returns [width, height] large enough to contain every retinal
   sensor image named in the eye's metadata."
  [#^Spatial eye]
  (let [dimensions
        (map #(vector (.getWidth %) (.getHeight %))
             (vals (retina-sensor-profile eye)))]
    [(apply max (map first dimensions))
     (apply max (map second dimensions))]))
#+end_src
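For instance, given an =eye= node carrying the crude human-eye
metadata above, these functions would behave roughly as follows (a
sketch; the printed image objects and exact dimensions depend on
retina-small.png):

#+begin_src clojure
(retina-sensor-profile eye)
;; => {0xFF0000 #<BufferedImage ...>, 0x00FF00 #<BufferedImage ...>,
;;     0x0000FF #<BufferedImage ...>, 0xFFFFFF #<BufferedImage ...>}

(eye-dimensions eye)
;; => the [width height] of retina-small.png, since all four retinal
;;    maps share that one image.
#+end_src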
* Eye Creation

First off, get the children of the "eyes" empty node to find all the
eyes the creature has.

#+begin_src clojure
(defvar
  ^{:arglists '([creature])}
  eyes
  (sense-nodes "eyes")
  "Return the children of the creature's \"eyes\" node.")
#+end_src

Then, the following functions define the visual sense itself:
=(add-camera!)= registers a camera and its vision pipeline with the
world, =(vision-fn)= creates the sensor functions for a single eye,
=(vision!)= collects them for every eye in the creature, and
=(view-vision)= displays the resulting sensor data.

#+begin_src clojure
(defn add-camera!
  "Add a camera to the world, calling continuation on every frame
   produced."
  [#^Application world camera continuation]
  (let [width (.getWidth camera)
        height (.getHeight camera)
        render-manager (.getRenderManager world)
        viewport (.createMainView render-manager "eye-view" camera)]
    (doto viewport
      (.setClearFlags true true true)
      (.setBackgroundColor ColorRGBA/Black)
      (.addProcessor (vision-pipeline continuation))
      (.attachScene (.getRootNode world)))))

(defn vision-fn
  "Returns a list of functions, each of which will return a color
   channel's worth of visual information when called inside a running
   simulation."
  [#^Node creature #^Spatial eye & {skip :skip :or {skip 0}}]
  (let [retinal-map (retina-sensor-profile eye)
        camera (add-eye! creature eye)
        vision-image
        (atom
         (BufferedImage. (.getWidth camera)
                         (.getHeight camera)
                         BufferedImage/TYPE_BYTE_BINARY))
        register-eye!
        (runonce
         (fn [world]
           (add-camera!
            world camera
            (let [counter (atom 0)]
              (fn [r fb bb bi]
                (if (zero? (rem (swap! counter inc) (inc skip)))
                  (reset! vision-image
                          (BufferedImage! r fb bb bi))))))))]
    (vec
     (map
      (fn [[key image]]
        (let [whites (white-coordinates image)
              topology (vec (collapse whites))
              mask (sensitivity-presets key key)]
          (fn [world]
            (register-eye! world)
            (vector
             topology
             (vec
              (for [[x y] whites]
                (bit-and
                 mask (.getRGB @vision-image x y))))))))
      retinal-map))))

;; TODO maybe should add a viewport-manipulation function to
;; automatically change viewport settings, attach shadow filters, etc.

(defn vision!
  "Returns a list of functions, each of which returns visual sensory
   data when called inside a running simulation."
  [#^Node creature & {skip :skip :or {skip 0}}]
  (reduce
   concat
   (for [eye (eyes creature)]
     (vision-fn creature eye :skip skip))))

(defn view-vision
  "Creates a function which accepts a list of visual sensor-data and
   displays each element of the list to the screen."
  []
  (view-sense
   (fn
     [[coords sensor-data]]
     (let [image (points->image coords)]
       (dorun
        (for [i (range (count coords))]
          (.setRGB image ((coords i) 0) ((coords i) 1)
                   (sensor-data i))))
       image))))
#+end_src

Note the use of continuation passing style for connecting the eye to a
function to process the output. You can create any number of eyes, and
each of them will see the world from its own =Camera=. Once every
frame, the rendered image is copied to a =BufferedImage=, and that
data is sent off to the continuation function. Moving the =Camera=
which was used to create the eye will change what the eye sees.
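As a quick sketch of how these pieces fit together (illustrative only,
not part of =cortex.vision=): assuming =creature= is a =Node= loaded
from a blender file whose "eyes" empty node contains eye-nodes with
retina metadata, the following would print, every frame, how many
photoreceptors in each channel see a non-black pixel.

#+begin_src clojure
(defn vision-demo
  "Sketch: attach vision to 'creature and, every frame, print how many
   photoreceptors in each channel currently see a non-black pixel."
  [#^Node creature]
  (let [eye-senses (vision! creature)]
    (world
     creature
     {}
     (fn [world] nil)   ;; no special setup
     (fn [world tpf]
       (doseq [sense eye-senses]
         (let [[topology sensor-data] (sense world)]
           (println (count (remove zero? sensor-data)))))))))
#+end_src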
* Example

#+name: test-vision
#+begin_src clojure
(ns cortex.test.vision
  (:use (cortex world util vision))
  (:import java.awt.image.BufferedImage)
  (:import javax.swing.JPanel)
  (:import javax.swing.SwingUtilities)
  (:import java.awt.Dimension)
  (:import javax.swing.JFrame)
  (:import com.jme3.math.ColorRGBA)
  (:import com.jme3.scene.Node)
  (:import com.jme3.math.Vector3f))

(defn test-two-eyes
  "Testing vision:
   Tests the vision system by creating two views of the same rotating
   object from different angles and displaying both of those views in
   JFrames.

   You should see a rotating cube, and two windows, each displaying a
   different view of the cube."
  []
  (let [candy
        (box 1 1 1 :physical? false :color ColorRGBA/Blue)]
    (world
     (doto (Node.)
       (.attachChild candy))
     {}
     (fn [world]
       (let [cam (.clone (.getCamera world))
             width (.getWidth cam)
             height (.getHeight cam)]
         (add-camera! world cam
                      ;;no-op
                      (comp (view-image) BufferedImage!))
         (add-camera! world
                      (doto (.clone cam)
                        (.setLocation (Vector3f. -10 0 0))
                        (.lookAt Vector3f/ZERO Vector3f/UNIT_Y))
                      ;;no-op
                      (comp (view-image) BufferedImage!))
         ;; This is here to restore the main view
         ;; after the other views have completed processing
         (add-camera! world (.getCamera world) no-op)))
     (fn [world tpf]
       (.rotate candy (* tpf 0.2) 0 0)))))
#+end_src

#+name: vision-header
#+begin_src clojure
(ns cortex.vision
  "Simulate the sense of vision in jMonkeyEngine3. Enables multiple
   eyes from different positions to observe the same world, and pass
   the observed data to any arbitrary function. Automatically reads
   eye-nodes from specially prepared blender files and instantiates
   them in the world as actual eyes."
  {:author "Robert McIntyre"}
  (:use (cortex world sense util))
  (:use clojure.contrib.def)
  (:import com.jme3.post.SceneProcessor)
  (:import (com.jme3.util BufferUtils Screenshots))
  (:import java.nio.ByteBuffer)
  (:import java.awt.image.BufferedImage)
  (:import (com.jme3.renderer ViewPort Camera))
  (:import com.jme3.math.ColorRGBA)
  (:import com.jme3.renderer.Renderer)
  (:import com.jme3.app.Application)
  (:import com.jme3.texture.FrameBuffer)
  (:import (com.jme3.scene Node Spatial)))
#+end_src

The example code will create two videos of the same rotating object
from different angles. It can be used both for stereoscopic vision
simulation and for simulating multiple creatures, each with its own
sense of vision.

- As a neat bonus, the idea behind simulated vision also enables one
  to [[../../cortex/html/capture-video.html][capture live video feeds from jMonkeyEngine]].

* COMMENT Generate Source
#+begin_src clojure :tangle ../src/cortex/vision.clj
<>
#+end_src

#+begin_src clojure :tangle ../src/cortex/test/vision.clj
<>
#+end_src