Mercurial repository: cortex
comparison: thesis/cortex.org @ 470:3401053124b0
"integrating vision into thesis."

author:   Robert McIntyre <rlm@mit.edu>
date:     Fri, 28 Mar 2014 17:10:43 -0400
parents:  ae10f35022ba
children: f14fa9e5b67f
comparing 469:ae10f35022ba (parent) with 470:3401053124b0 (this changeset)

#+description: Using embodied AI to facilitate Artificial Imagination.
#+keywords: AI, clojure, embodiment
#+LaTeX_CLASS_OPTIONS: [nofloat]

* COMMENT templates
#+caption:
#+caption:
#+caption:
#+caption:
#+name: name
#+begin_listing clojure
#+end_listing

#+caption:
#+caption:
#+caption:
#+name: name
#+ATTR_LaTeX: :width 10cm
[[./images/aurellem-gray.png]]

#+caption:
#+caption:
#+caption:
#+caption:
#+name: name
#+begin_listing clojure
#+end_listing

#+caption:
#+caption:
#+caption:
#+name: name
#+ATTR_LaTeX: :width 10cm
[[./images/aurellem-gray.png]]

* COMMENT Empathy and Embodiment as problem solving strategies

By the end of this thesis, you will have seen a novel approach to
interpreting video using embodiment and empathy. You will have also
[...]

#+name: physical-hand
#+ATTR_LaTeX: :width 15cm
[[./images/physical-hand.png]]

** Eyes reuse standard video game components

Vision is one of the most important senses for humans, so I need to
build a simulated sense of vision for my AI. I will do this with
simulated eyes. Each eye can be independently moved and should see
its own version of the world depending on where it is.

Making these simulated eyes a reality is simple because
jMonkeyEngine already contains extensive support for multiple views
of the same 3D simulated world. jMonkeyEngine has this support
because it is necessary for creating games with split-screen
views. Multiple views are also used to create efficient
pseudo-reflections by rendering the scene from a certain
perspective and then projecting it back onto a surface in the 3D
world.

#+caption: jMonkeyEngine supports multiple views to enable
#+caption: split-screen games, like GoldenEye, which was one of
#+caption: the first games to use split-screen views.
#+name: goldeneye
#+ATTR_LaTeX: :width 10cm
[[./images/goldeneye-4-player.png]]

*** A Brief Description of jMonkeyEngine's Rendering Pipeline

jMonkeyEngine allows you to create a =ViewPort=, which represents a
view of the simulated world. You can create as many of these as you
want. Every frame, the =RenderManager= iterates through each
=ViewPort=, rendering the scene in the GPU. For each =ViewPort= there
is a =FrameBuffer= which represents the rendered image in the GPU.

#+caption: =ViewPorts= are cameras in the world. During each frame,
#+caption: the =RenderManager= records a snapshot of what each view
#+caption: is currently seeing; these snapshots are =FrameBuffer= objects.
#+name: rendermanager-diagram
#+ATTR_LaTeX: :width 10cm
[[./images/diagram_rendermanager2.png]]

Each =ViewPort= can have any number of attached =SceneProcessor=
objects, which are called every time a new frame is rendered. A
=SceneProcessor= receives its =ViewPort='s =FrameBuffer= and can do
whatever it wants with the data. Often this consists of invoking
GPU-specific operations on the rendered image. The =SceneProcessor=
can also copy the GPU image data to RAM and process it with the CPU.

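As a concrete sketch of how these pieces fit together (my
illustration, not code from =CORTEX=), an extra =ViewPort= with an
attached =SceneProcessor= might be created through Clojure's Java
interop as follows; =render-manager=, =root-node=, and =processor=
are assumed to come from a running jMonkeyEngine application:

#+begin_src clojure
(import '(com.jme3.renderer Camera))

;; Sketch only: render-manager, root-node, and processor are assumed
;; to exist in a running jMonkeyEngine application.
(defn attach-eye-view
  [render-manager root-node processor]
  (let [cam  (Camera. 640 480)
        view (.createMainView render-manager "eye-view" cam)]
    (.setClearFlags view true true true) ; clear color, depth, stencil
    (.attachScene view root-node)        ; this view renders the scene
    (.addProcessor view processor)       ; processor runs every frame
    view))
#+end_src
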
*** Appropriating Views for Vision

Each eye in the simulated creature needs its own =ViewPort= so
that it can see the world from its own perspective. To this
=ViewPort=, I add a =SceneProcessor= that feeds the visual data to
any arbitrary continuation function for further processing. That
continuation function may perform both CPU and GPU operations on
the data. To make this easy for the continuation function, the
=SceneProcessor= maintains appropriately sized buffers in RAM to
hold the data. It does not do any copying from the GPU to the CPU
itself, because that is a slow operation.

#+caption: Function to make the rendered scene in jMonkeyEngine
#+caption: available for further processing.
#+name: pipeline-1
#+begin_listing clojure
#+begin_src clojure
(defn vision-pipeline
  "Create a SceneProcessor object which wraps a vision processing
  continuation function. The continuation is a function that takes
  [#^Renderer r #^FrameBuffer fb #^ByteBuffer b #^BufferedImage bi],
  each of which has already been appropriately sized."
  [continuation]
  ;; assumes com.jme3.post.SceneProcessor, com.jme3.util.BufferUtils,
  ;; and java.awt.image.BufferedImage are imported.
  (let [byte-buffer (atom nil)
        renderer (atom nil)
        image (atom nil)]
    (proxy [SceneProcessor] []
      (initialize
        [renderManager viewPort]
        (let [cam (.getCamera viewPort)
              width (.getWidth cam)
              height (.getHeight cam)]
          (reset! renderer (.getRenderer renderManager))
          (reset! byte-buffer
                  (BufferUtils/createByteBuffer
                   (* width height 4)))
          (reset! image (BufferedImage.
                         width height
                         BufferedImage/TYPE_4BYTE_ABGR))))
      (isInitialized [] (not (nil? @byte-buffer)))
      (reshape [_ _ _])
      (preFrame [_])
      (postQueue [_])
      (postFrame
        [#^FrameBuffer fb]
        (.clear @byte-buffer)
        (continuation @renderer fb @byte-buffer @image))
      (cleanup []))))
#+end_src
#+end_listing

The continuation function given to =vision-pipeline= above will be
given a =Renderer= and three containers for image data. The
=FrameBuffer= references the GPU image data, but the pixel data
cannot be used directly on the CPU. The =ByteBuffer= and
=BufferedImage= are initially "empty" but are sized to hold the
data in the =FrameBuffer=. I call transferring the GPU image data
to the CPU structures "mixing" the image data.

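A continuation that performs this "mixing" might look like the
following sketch. It uses jMonkeyEngine's =Renderer.readFrameBuffer=
and =Screenshots/convertScreenShot=; the helper name =mix-image!= is
mine, not from the thesis:

#+begin_src clojure
(import '(com.jme3.util Screenshots))

(defn mix-image!
  "Sketch of a continuation for vision-pipeline: copy the GPU pixels
   in fb into byte-buffer, then decode them into buffered-image."
  [renderer fb byte-buffer buffered-image]
  ;; read the raw pixels from the GPU into the ByteBuffer
  (.readFrameBuffer renderer fb byte-buffer)
  ;; decode the ByteBuffer into the CPU-side BufferedImage
  (Screenshots/convertScreenShot byte-buffer buffered-image)
  buffered-image)
#+end_src
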
*** Optical sensor arrays are described with images and referenced with metadata

The vision pipeline described above handles the flow of rendered
images. Now, =CORTEX= needs simulated eyes to serve as the source
of these images.

An eye is described in blender in the same way as a joint: eyes
are zero-dimensional empty objects with no geometry whose local
coordinate system determines the orientation of the resulting eye.
All eyes are children of a parent node named "eyes", just as all
joints have a parent named "joints". An eye binds to the nearest
physical object with =bind-sense=.

#+caption: Here, the camera is created based on metadata on the
#+caption: eye-node and attached to the nearest physical object
#+caption: with =bind-sense=.
#+name: add-eye
#+begin_listing clojure
#+begin_src clojure
(defn add-eye!
  "Create a Camera centered on the current position of 'eye which
   follows the closest physical node in 'creature. The camera will
   point in the X direction and use the Z vector as up as determined
   by the rotation of these vectors in blender coordinate space. Use
   XZY rotation for the node in blender."
  [#^Node creature #^Spatial eye]
  (let [target (closest-node creature eye)
        [cam-width cam-height]
        ;;[640 480] ;; graphics card on laptop doesn't support
        ;; arbitrary dimensions.
        (eye-dimensions eye)
        cam (Camera. cam-width cam-height)
        rot (.getWorldRotation eye)]
    (.setLocation cam (.getWorldTranslation eye))
    (.lookAtDirection
     cam                          ; this part is not a mistake and
     (.mult rot Vector3f/UNIT_X)  ; is consistent with using Z in
     (.mult rot Vector3f/UNIT_Y)) ; blender as the UP vector.
    (.setFrustumPerspective
     cam (float 45)
     (float (/ (.getWidth cam) (.getHeight cam)))
     (float 1)
     (float 1000))
    (bind-sense target cam) cam))
#+end_src
#+end_listing

*** Simulated Retina

An eye is a surface (the retina) which contains many discrete
sensors to detect light. These sensors can have different
light-sensing properties. In humans, each discrete sensor is
sensitive to red, blue, green, or gray. These different types of
sensors can have different spatial distributions along the retina.
In humans, there is a fovea in the center of the retina which has
a very high density of color sensors, and a blind spot which has
no sensors at all. Sensor density decreases in proportion to
distance from the fovea.

I want to be able to model any retinal configuration, so my
eye-nodes in blender contain metadata pointing to images that
describe the precise position of the individual sensors using
white pixels. The metadata also describes the precise light
sensitivity of the sensors that the image describes. An eye can
contain any number of these images. For example, the metadata for
an eye might look like this:

#+begin_src clojure
{0xFF0000 "Models/test-creature/retina-small.png"}
#+end_src

#+caption: An example retinal profile image. White pixels are
#+caption: photo-sensitive elements. The distribution of white
#+caption: pixels is denser in the middle and falls off at the
#+caption: edges, and is inspired by the human retina.
#+name: retina
#+ATTR_LaTeX: :width 10cm
[[./images/retina-small.png]]

Together, the number 0xFF0000 and the image above describe the
placement of red-sensitive sensory elements.

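The =vision-kernel= listing below finds these white pixels with a
helper called =white-coordinates=. A minimal sketch of what such a
helper could look like (my reconstruction, not the thesis's code):

#+begin_src clojure
;; Sketch: collect the [x y] coordinates of every white pixel.
;; The actual white-coordinates in CORTEX may differ in detail.
(defn white-coordinates
  [#^BufferedImage image]
  (for [x (range (.getWidth image))
        y (range (.getHeight image))
        ;; mask off the alpha byte before comparing to pure white
        :when (= 0xFFFFFF (bit-and 0xFFFFFF (.getRGB image x y)))]
    [x y]))
#+end_src
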
Metadata to very crudely approximate a human eye might be
something like this:

#+begin_src clojure
(let [retinal-profile "Models/test-creature/retina-small.png"]
  {0xFF0000 retinal-profile
   0x00FF00 retinal-profile
   0x0000FF retinal-profile
   0xFFFFFF retinal-profile})
#+end_src

The numbers that serve as keys in the map determine a sensor's
relative sensitivity to the channels red, green, and blue. These
sensitivity values are packed into an integer in the order
=|_|R|G|B|= in 8-bit fields. The RGB values of a pixel in the
image are added together with these sensitivities as linear
weights. Therefore, 0xFF0000 means sensitive to red only while
0xFFFFFF means sensitive to all colors equally (gray).

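To make the weighting concrete, here is a minimal sketch of how such
a =pixel-sense= function could work; it is my illustration of the
scheme just described, not necessarily the exact implementation used
in =CORTEX=:

#+begin_src clojure
(defn pixel-sense
  "Weight the RGB channels of pixel by the 8-bit fields of
   sensitivity, normalizing the result to the range [0,1]."
  [sensitivity pixel]
  (let [s-r (bit-and 0xFF (bit-shift-right sensitivity 16))
        s-g (bit-and 0xFF (bit-shift-right sensitivity 8))
        s-b (bit-and 0xFF sensitivity)
        p-r (bit-and 0xFF (bit-shift-right pixel 16))
        p-g (bit-and 0xFF (bit-shift-right pixel 8))
        p-b (bit-and 0xFF pixel)]
    (/ (float (+ (* s-r p-r) (* s-g p-g) (* s-b p-b)))
       (* 255.0 (max 1 (+ s-r s-g s-b))))))

;; (pixel-sense 0xFF0000 0x7F0000) => ~0.498  (half-bright red)
#+end_src
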
#+caption: This is the core of vision in =CORTEX=. A given eye node
#+caption: is converted into a function that returns visual
#+caption: information from the simulation.
#+name: vision-kernel
#+begin_listing clojure
#+begin_src clojure
(defn vision-kernel
  "Returns a list of functions, each of which will return a color
   channel's worth of visual information when called inside a running
   simulation."
  [#^Node creature #^Spatial eye & {skip :skip :or {skip 0}}]
  (let [retinal-map (retina-sensor-profile eye)
        camera (add-eye! creature eye)
        vision-image
        (atom
         (BufferedImage. (.getWidth camera)
                         (.getHeight camera)
                         BufferedImage/TYPE_BYTE_BINARY))
        register-eye!
        (runonce
         (fn [world]
           (add-camera!
            world camera
            (let [counter (atom 0)]
              (fn [r fb bb bi]
                (if (zero? (rem (swap! counter inc) (inc skip)))
                  (reset! vision-image
                          (BufferedImage! r fb bb bi))))))))]
    (vec
     (map
      (fn [[key image]]
        (let [whites (white-coordinates image)
              topology (vec (collapse whites))
              sensitivity (sensitivity-presets key key)]
          (attached-viewport.
           (fn [world]
             (register-eye! world)
             (vector
              topology
              (vec
               (for [[x y] whites]
                 (pixel-sense
                  sensitivity
                  (.getRGB @vision-image x y))))))
           register-eye!)))
      retinal-map))))
#+end_src
#+end_listing

Note that since each of the functions generated by =vision-kernel=
shares the same =register-eye!= function, the eye will be
registered only once, the first time any of the functions from the
list returned by =vision-kernel= is called. Each of the functions
returned by =vision-kernel= also allows access to the =ViewPort=
through which it receives images.

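The =runonce= combinator itself is not shown in this section, but
its behavior is easy to reconstruct. A minimal sketch, assuming only
that the first call's result should be cached and returned on all
later calls:

#+begin_src clojure
;; Sketch of a runonce combinator consistent with the usage above;
;; the actual CORTEX implementation may differ.
(defn runonce
  [f]
  (let [ran?   (atom false)
        result (atom nil)]
    (fn [& args]
      ;; only the first caller flips ran? and computes the result
      (when (compare-and-set! ran? false true)
        (reset! result (apply f args)))
      @result)))
#+end_src
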
All the hard work has been done; all that remains is to apply
=vision-kernel= to each eye in the creature and gather the results
into one list of functions.

#+caption: With =vision!=, =CORTEX= is already a fine simulation
#+caption: environment for experimenting with different types of
#+caption: eyes.
#+name: vision!
#+begin_listing clojure
#+begin_src clojure
(defn vision!
  "Returns a list of functions, each of which returns visual sensory
  data when called inside a running simulation."
  [#^Node creature & {skip :skip :or {skip 0}}]
  (reduce
   concat
   (for [eye (eyes creature)]
     (vision-kernel creature eye))))
#+end_src
#+end_listing
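
As a usage sketch (the names =creature= and =world= stand in for
objects from a running =CORTEX= simulation), the functions returned
by =vision!= could be polled once per simulation step:

#+begin_src clojure
;; Sketch: poll every visual channel once per simulation step.
(def vision-fns (vision! creature))

(defn visual-data
  "Return a seq of [topology sensor-values] pairs, one per channel."
  [world]
  (map (fn [channel-fn] (channel-fn world)) vision-fns))
#+end_src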
** Hearing is hard; =CORTEX= does it right

** Touch uses hundreds of hair-like elements