# HG changeset patch
# User Robert McIntyre
# Date 1396041043 14400
# Node ID 3401053124b0a867d1598f36904df062d260a061
# Parent  ae10f35022ba7548f49fe02390053f64bcb27149
integrating vision into thesis.

diff -r ae10f35022ba -r 3401053124b0 org/vision.org
--- a/org/vision.org	Fri Mar 28 16:34:35 2014 -0400
+++ b/org/vision.org	Fri Mar 28 17:10:43 2014 -0400
@@ -174,21 +174,18 @@
     (bind-sense target cam) cam))
 #+end_src
-#+results: add-eye
-: #'cortex.vision/add-eye!
-
 Here, the camera is created based on metadata on the eye-node and
 attached to the nearest physical object with =bind-sense=
 
 ** The Retina
 
 An eye is a surface (the retina) which contains many discrete sensors
-to detect light. These sensors have can have different light-sensing
-properties. In humans, each discrete sensor is sensitive to red,
-blue, green, or gray. These different types of sensors can have
-different spatial distributions along the retina. In humans, there is
-a fovea in the center of the retina which has a very high density of
-color sensors, and a blind spot which has no sensors at all. Sensor
-density decreases in proportion to distance from the fovea.
+to detect light. These sensors can have different light-sensing
+properties. In humans, each discrete sensor is sensitive to red, blue,
+green, or gray. These different types of sensors can have different
+spatial distributions along the retina. In humans, there is a fovea in
+the center of the retina which has a very high density of color
+sensors, and a blind spot which has no sensors at all. Sensor density
+decreases in proportion to distance from the fovea.
 
 I want to be able to model any retinal configuration, so my eye-nodes
 in blender contain metadata pointing to images that describe the
diff -r ae10f35022ba -r 3401053124b0 thesis/cortex.org
--- a/thesis/cortex.org	Fri Mar 28 16:34:35 2014 -0400
+++ b/thesis/cortex.org	Fri Mar 28 17:10:43 2014 -0400
@@ -6,22 +6,36 @@
 #+LaTeX_CLASS_OPTIONS: [nofloat]
 
 * COMMENT templates
-  #+caption: 
-  #+caption: 
-  #+caption: 
-  #+caption: 
-  #+name: name
-  #+begin_listing clojure
-  #+begin_src clojure
-  #+end_src
-  #+end_listing
+  #+caption: 
+  #+caption: 
+  #+caption: 
+  #+caption: 
+  #+name: name
+  #+begin_listing clojure
+  #+end_listing
 
-  #+caption: 
-  #+caption: 
-  #+caption: 
-  #+name: name
-  #+ATTR_LaTeX: :width 10cm
-  [[./images/aurellem-gray.png]]
+  #+caption: 
+  #+caption: 
+  #+caption: 
+  #+name: name
+  #+ATTR_LaTeX: :width 10cm
+  [[./images/aurellem-gray.png]]
+
+  #+caption: 
+  #+caption: 
+  #+caption: 
+  #+caption: 
+  #+name: name
+  #+begin_listing clojure
+  #+end_listing
+
+  #+caption: 
+  #+caption: 
+  #+caption: 
+  #+name: name
+  #+ATTR_LaTeX: :width 10cm
+  [[./images/aurellem-gray.png]]
+
 
 * COMMENT Empathy and Embodiment as problem solving strategies
 
@@ -942,6 +956,285 @@
 
 ** Eyes reuse standard video game components
 
+   Vision is one of the most important senses for humans, so I need to
+   build a simulated sense of vision for my AI. I will do this with
+   simulated eyes. Each eye can be independently moved and should see
+   its own version of the world depending on where it is.
+
+   Making these simulated eyes a reality is simple because
+   jMonkeyEngine already contains extensive support for multiple views
+   of the same 3D simulated world. jMonkeyEngine has this support
+   because it is necessary for creating games with split-screen views.
+   Multiple views are also used to create efficient pseudo-reflections
+   by rendering the scene from a certain perspective and then
+   projecting it back onto a surface in the 3D world.
+
+   #+caption: jMonkeyEngine supports multiple views to enable
+   #+caption: split-screen games, like GoldenEye, which was one of
+   #+caption: the first games to use split-screen views.
+   #+name: goldeneye
+   #+ATTR_LaTeX: :width 10cm
+   [[./images/goldeneye-4-player.png]]
+
+*** A Brief Description of jMonkeyEngine's Rendering Pipeline
+
+    jMonkeyEngine allows you to create a =ViewPort=, which represents
+    a view of the simulated world. You can create as many of these as
+    you want. Every frame, the =RenderManager= iterates through each
+    =ViewPort=, rendering the scene on the GPU. Each =ViewPort= has a
+    =FrameBuffer= which represents the rendered image on the GPU.
+
+    #+caption: =ViewPorts= are cameras in the world. During each frame,
+    #+caption: the =RenderManager= records a snapshot of what each view
+    #+caption: is currently seeing; these snapshots are =FrameBuffer= objects.
+    #+name: rendermanager
+    #+ATTR_LaTeX: :width 10cm
+    [[./images/diagram_rendermanager2.png]]
+
+    Each =ViewPort= can have any number of attached =SceneProcessor=
+    objects, which are called every time a new frame is rendered. A
+    =SceneProcessor= receives its =ViewPort='s =FrameBuffer= and can
+    do whatever it wants to the data. Often this consists of invoking
+    GPU-specific operations on the rendered image. The
+    =SceneProcessor= can also copy the GPU image data to RAM and
+    process it with the CPU.
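+
+    For concreteness, here is a minimal sketch of how a new
+    =ViewPort= with an attached =SceneProcessor= might be created
+    using the standard jMonkeyEngine API. The helper name
+    =attach-processor!= is illustrative rather than part of =CORTEX=,
+    and it assumes the world object exposes =getRenderManager= and
+    =getRootNode= as jMonkeyEngine's =SimpleApplication= does:
+
+    #+begin_src clojure
+;; sketch; assumes (import '(com.jme3.math ColorRGBA))
+(defn attach-processor!
+  "Create a new ViewPort that renders 'camera's perspective and
+   attach 'processor to it, so that the processor is handed the
+   resulting FrameBuffer every frame."
+  [world camera processor]
+  (let [render-manager (.getRenderManager world)
+        view (.createPostView render-manager "eye-view" camera)]
+    (doto view
+      (.setClearFlags true true true)       ; clear color, depth, stencil
+      (.setBackgroundColor ColorRGBA/Black)
+      (.addProcessor processor)             ; e.g. the vision-pipeline below
+      (.attachScene (.getRootNode world)))))
+    #+end_src
+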
+*** Appropriating Views for Vision
+
+    Each eye in the simulated creature needs its own =ViewPort= so
+    that it can see the world from its own perspective. To this
+    =ViewPort=, I add a =SceneProcessor= that feeds the visual data to
+    any arbitrary continuation function for further processing. That
+    continuation function may perform both CPU and GPU operations on
+    the data. To make this easy for the continuation function, the
+    =SceneProcessor= maintains appropriately sized buffers in RAM to
+    hold the data. It does not do any copying from the GPU to the CPU
+    itself, because that is a slow operation.
+
+    #+caption: Function to make the rendered scene in jMonkeyEngine
+    #+caption: available for further processing.
+    #+name: pipeline-1
+    #+begin_listing clojure
+    #+begin_src clojure
+(defn vision-pipeline
+  "Create a SceneProcessor object which wraps a vision processing
+  continuation function. The continuation is a function that takes
+  [#^Renderer r #^FrameBuffer fb #^ByteBuffer b #^BufferedImage bi],
+  each of which has already been appropriately sized."
+  [continuation]
+  (let [byte-buffer (atom nil)
+        renderer (atom nil)
+        image (atom nil)]
+    (proxy [SceneProcessor] []
+      (initialize
+       [renderManager viewPort]
+       (let [cam (.getCamera viewPort)
+             width (.getWidth cam)
+             height (.getHeight cam)]
+         (reset! renderer (.getRenderer renderManager))
+         (reset! byte-buffer
+                 (BufferUtils/createByteBuffer
+                  (* width height 4)))
+         (reset! image (BufferedImage.
+                        width height
+                        BufferedImage/TYPE_4BYTE_ABGR))))
+      (isInitialized [] (not (nil? @byte-buffer)))
+      (reshape [_ _ _])
+      (preFrame [_])
+      (postQueue [_])
+      (postFrame
+       [#^FrameBuffer fb]
+       (.clear @byte-buffer)
+       (continuation @renderer fb @byte-buffer @image))
+      (cleanup []))))
+    #+end_src
+    #+end_listing
+
+    The continuation function given to =vision-pipeline= above will be
+    given a =Renderer= and three containers for image data. The
+    =FrameBuffer= references the GPU image data, but the pixel data
+    cannot be used directly on the CPU. The =ByteBuffer= and
+    =BufferedImage= are initially "empty" but are sized to hold the
+    data in the =FrameBuffer=. I call transferring the GPU image data
+    to the CPU structures "mixing" the image data.
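+
+    The mixing step itself is left to the continuation function. A
+    minimal sketch of such a continuation, assuming jMonkeyEngine's
+    =Screenshots= utility class (the name =mix-image-data!= is
+    illustrative, not part of =CORTEX=), might look like this:
+
+    #+begin_src clojure
+;; sketch; assumes (import '(com.jme3.util Screenshots))
+(defn mix-image-data!
+  "'Mix' the image data by reading the GPU FrameBuffer into the
+   ByteBuffer (the slow GPU->RAM copy), then decoding those raw
+   bytes into the BufferedImage for CPU-side processing. The
+   ByteBuffer has already been cleared by vision-pipeline."
+  [r fb bb bi]
+  (.readFrameBuffer r fb bb)             ; GPU -> RAM
+  (Screenshots/convertScreenShot bb bi)  ; raw bytes -> image
+  bi)
+    #+end_src
+
+    Passing =mix-image-data!=, or any function like it, as the
+    continuation argument of =vision-pipeline= completes the path
+    from the GPU to the CPU.
+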
+*** Optical sensor arrays are described with images and referenced with metadata
+
+    The vision pipeline described above handles the flow of rendered
+    images. Now, =CORTEX= needs simulated eyes to serve as the source
+    of these images.
+
+    An eye is described in blender in the same way as a joint: it is
+    a zero-dimensional empty object with no geometry whose local
+    coordinate system determines the orientation of the resulting
+    eye. All eyes are children of a parent node named "eyes", just as
+    all joints have a parent named "joints". An eye binds to the
+    nearest physical object with =bind-sense=.
+
+    #+caption: Here, the camera is created based on metadata on the
+    #+caption: eye-node and attached to the nearest physical object
+    #+caption: with =bind-sense=.
+    #+name: add-eye
+    #+begin_listing clojure
+    #+begin_src clojure
+(defn add-eye!
+  "Create a Camera centered on the current position of 'eye which
+   follows the closest physical node in 'creature. The camera will
+   point in the X direction and use the Z vector as up as determined
+   by the rotation of these vectors in blender coordinate space. Use
+   XZY rotation for the node in blender."
+  [#^Node creature #^Spatial eye]
+  (let [target (closest-node creature eye)
+        [cam-width cam-height]
+        ;;[640 480] ;; graphics card on laptop doesn't support
+                    ;; arbitrary dimensions.
+        (eye-dimensions eye)
+        cam (Camera. cam-width cam-height)
+        rot (.getWorldRotation eye)]
+    (.setLocation cam (.getWorldTranslation eye))
+    (.lookAtDirection
+     cam                          ; this part is not a mistake and
+     (.mult rot Vector3f/UNIT_X)  ; is consistent with using Z in
+     (.mult rot Vector3f/UNIT_Y)) ; blender as the UP vector.
+    (.setFrustumPerspective
+     cam (float 45)
+     (float (/ (.getWidth cam) (.getHeight cam)))
+     (float 1)
+     (float 1000))
+    (bind-sense target cam) cam))
+    #+end_src
+    #+end_listing
+
+*** Simulated Retina
+
+    An eye is a surface (the retina) which contains many discrete
+    sensors to detect light. These sensors can have different
+    light-sensing properties. In humans, each discrete sensor is
+    sensitive to red, blue, green, or gray. These different types of
+    sensors can have different spatial distributions along the
+    retina. In humans, there is a fovea in the center of the retina
+    which has a very high density of color sensors, and a blind spot
+    which has no sensors at all. Sensor density decreases in
+    proportion to distance from the fovea.
+
+    I want to be able to model any retinal configuration, so my
+    eye-nodes in blender contain metadata pointing to images that
+    describe the precise position of the individual sensors using
+    white pixels. The metadata also describes the precise light
+    sensitivity of the sensors described in the image. An eye can
+    contain any number of these images. For example, the metadata for
+    an eye might look like this:
+
+    #+begin_src clojure
+{0xFF0000 "Models/test-creature/retina-small.png"}
+    #+end_src
+
+    #+caption: An example retinal profile image. White pixels are
+    #+caption: photo-sensitive elements. The distribution of white
+    #+caption: pixels is denser in the middle and falls off at the
+    #+caption: edges, a pattern inspired by the human retina.
+    #+name: retina
+    #+ATTR_LaTeX: :width 10cm
+    [[./images/retina-small.png]]
+
+    Together, the number 0xFF0000 and the image above describe the
+    placement of red-sensitive sensory elements.
+
+    Metadata to very crudely approximate a human eye might be
+    something like this:
+
+    #+begin_src clojure
+(let [retinal-profile "Models/test-creature/retina-small.png"]
+  {0xFF0000 retinal-profile
+   0x00FF00 retinal-profile
+   0x0000FF retinal-profile
+   0xFFFFFF retinal-profile})
+    #+end_src
+
+    The numbers that serve as keys in the map determine a sensor's
+    relative sensitivity to the channels red, green, and blue. These
+    sensitivity values are packed into an integer in the order
+    =|_|R|G|B|= in 8-bit fields. The RGB values of a pixel in the
+    image are added together with these sensitivities as linear
+    weights. Therefore, 0xFF0000 means sensitive to red only, while
+    0xFFFFFF means sensitive to all colors equally (gray).
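+
+    As a concrete sketch of this weighting scheme, the function below
+    unpacks the 8-bit fields and computes the weighted sum,
+    normalized to the range [0, 1]. The actual =pixel-sense= used by
+    =vision-kernel= below may differ in details such as
+    normalization:
+
+    #+begin_src clojure
+(defn pixel-sense
+  "Sketch: weight a pixel's R, G, and B components by the 8-bit
+   sensitivity fields packed as |_|R|G|B|, and return the weighted
+   sum normalized to [0, 1]."
+  [sensitivity pixel]
+  (let [field (fn [x shift] (bit-and 0xFF (bit-shift-right x shift)))
+        weights [(field sensitivity 16)  ; R sensitivity
+                 (field sensitivity 8)   ; G sensitivity
+                 (field sensitivity 0)]  ; B sensitivity
+        values  [(field pixel 16)        ; R value
+                 (field pixel 8)         ; G value
+                 (field pixel 0)]        ; B value
+        ;; the largest possible response is 255 * (sum of weights)
+        max-response (* 255 (apply + weights))]
+    (if (zero? max-response)
+      0.0
+      (float (/ (reduce + (map * weights values)) max-response)))))
+    #+end_src
+
+    With this definition, =(pixel-sense 0xFF0000 0xFF0000)= is 1.0 (a
+    pure-red sensor seeing a pure-red pixel), while
+    =(pixel-sense 0xFFFFFF 0x808080)= is about 0.5.
+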
+    #+caption: This is the core of vision in =CORTEX=. A given eye node
+    #+caption: is converted into a function that returns visual
+    #+caption: information from the simulation.
+    #+name: vision-kernel
+    #+begin_listing clojure
+    #+begin_src clojure
+(defn vision-kernel
+  "Returns a list of functions, each of which will return a color
+   channel's worth of visual information when called inside a running
+   simulation."
+  [#^Node creature #^Spatial eye & {skip :skip :or {skip 0}}]
+  (let [retinal-map (retina-sensor-profile eye)
+        camera (add-eye! creature eye)
+        vision-image
+        (atom
+         (BufferedImage. (.getWidth camera)
+                         (.getHeight camera)
+                         BufferedImage/TYPE_BYTE_BINARY))
+        register-eye!
+        (runonce
+         (fn [world]
+           (add-camera!
+            world camera
+            (let [counter (atom 0)]
+              (fn [r fb bb bi]
+                (if (zero? (rem (swap! counter inc) (inc skip)))
+                  (reset! vision-image
+                          (BufferedImage! r fb bb bi))))))))]
+    (vec
+     (map
+      (fn [[key image]]
+        (let [whites (white-coordinates image)
+              topology (vec (collapse whites))
+              sensitivity (sensitivity-presets key key)]
+          (attached-viewport.
+           (fn [world]
+             (register-eye! world)
+             (vector
+              topology
+              (vec
+               (for [[x y] whites]
+                 (pixel-sense
+                  sensitivity
+                  (.getRGB @vision-image x y))))))
+           register-eye!)))
+      retinal-map))))
+    #+end_src
+    #+end_listing
+
+    Note that since each of the functions generated by
+    =vision-kernel= shares the same =register-eye!= function, the eye
+    will be registered only once, the first time any of the functions
+    from the list returned by =vision-kernel= is called. Each of the
+    functions returned by =vision-kernel= also allows access to the
+    =ViewPort= through which it receives images.
+
+    All the hard work has been done; all that remains is to apply
+    =vision-kernel= to each eye in the creature and gather the
+    results into one list of functions.
+
+    #+caption: With =vision!=, =CORTEX= is already a fine simulation
+    #+caption: environment for experimenting with different types of
+    #+caption: eyes.
+    #+name: vision!
+    #+begin_listing clojure
+    #+begin_src clojure
+(defn vision!
+  "Returns a list of functions, each of which returns visual sensory
+   data when called inside a running simulation."
+  [#^Node creature & {skip :skip :or {skip 0}}]
+  (reduce
+   concat
+   (for [eye (eyes creature)]
+     (vision-kernel creature eye))))
+    #+end_src
+    #+end_listing
+
 ** Hearing is hard; =CORTEX= does it right
 
 ** Touch uses hundreds of hair-like elements
 
diff -r ae10f35022ba -r 3401053124b0 thesis/images/retina-small.png
Binary file thesis/images/retina-small.png has changed