cortex: org/vision.org annotate

annotate org/vision.org @ 213:319963720179

fleshing out vision

author	Robert McIntyre <rlm@mit.edu>
date	Thu, 09 Feb 2012 08:11:10 -0700
parents	8e9825c38941
children	01d3e9855ef9

rev	line source
rlm@34	1 #+title: Simulated Sense of Sight
rlm@23	2 #+author: Robert McIntyre
rlm@23	3 #+email: rlm@mit.edu
rlm@38	4 #+description: Simulated sight for AI research using JMonkeyEngine3 and clojure
rlm@34	5 #+keywords: computer vision, jMonkeyEngine3, clojure
rlm@23	6 #+SETUPFILE: ../../aurellem/org/setup.org
rlm@23	7 #+INCLUDE: ../../aurellem/org/level-0.org
rlm@23	8 #+babel: :mkdirp yes :noweb yes :exports both
rlm@23	9
rlm@194	10 * Vision
rlm@23	11
rlm@151	12
rlm@212	13 Vision is one of the most important senses for humans, so I need to
rlm@212	14 build a simulated sense of vision for my AI. I will do this with
rlm@212	15 simulated eyes. Each eye can be independently moved and should see its
rlm@212	16 own version of the world depending on where it is.
rlm@212	17
rlm@212	18 Making these simulated eyes a reality is fairly simple because
rlm@212	19 jMonkeyEngine already contains extensive support for multiple views
rlm@212	20 of the same 3D simulated world. jMonkeyEngine has this support
rlm@212	21 because it is necessary for creating games with
rlm@212	22 split-screen views. Multiple views are also used to create efficient
rlm@212	23 pseudo-reflections by rendering the scene from a certain perspective
rlm@212	24 and then projecting it back onto a surface in the 3D world.
rlm@212	25
rlm@212	26 #+caption: jMonkeyEngine supports multiple views to enable split-screen games, like GoldenEye
rlm@212	27 [[../images/goldeneye-4-player.png]]
rlm@212	28
rlm@213	29 * Brief Description of jMonkeyEngine's Rendering Pipeline
rlm@212	30
rlm@213	31 jMonkeyEngine allows you to create a =ViewPort=, which represents a
rlm@213	32 view of the simulated world. You can create as many of these as you
rlm@213	33 want. Every frame, the =RenderManager= iterates through each
rlm@213	34 =ViewPort=, rendering the scene in the GPU. For each =ViewPort= there
rlm@213	35 is a =FrameBuffer= which represents the rendered image in the GPU.
rlm@151	36
rlm@213	37 Each =ViewPort= can have any number of attached =SceneProcessor=
rlm@213	38 objects, which are called every time a new frame is rendered. A
rlm@213	39 =SceneProcessor= receives a =FrameBuffer= and can do whatever it wants
rlm@213	40 to the data. Often this consists of invoking GPU-specific operations
rlm@213	41 on the rendered image. The =SceneProcessor= can also copy the GPU
rlm@213	42 image data to RAM and process it with the CPU.
rlm@151	43
rlm@213	44 * The Vision Pipeline
rlm@151	45
rlm@213	46 Each eye in the simulated creature needs its own =ViewPort= so that
rlm@213	47 it can see the world from its own perspective. To this =ViewPort=, I
rlm@213	48 add a =SceneProcessor= that feeds the visual data to any arbitrary
rlm@213	49 continuation function for further processing. That continuation
rlm@213	50 function may perform both CPU and GPU operations on the data. To make
rlm@213	51 this easy for the continuation function, the =SceneProcessor=
rlm@213	52 maintains appropriately sized buffers in RAM to hold the data. It does
rlm@213	53 not do any copying from the GPU to the CPU itself.
rlm@213	54 #+name: pipeline-1
rlm@213	55 #+begin_src clojure
rlm@113	56 (defn vision-pipeline
rlm@34	57 "Create a SceneProcessor object which wraps a vision processing
rlm@113	58 continuation function. The continuation is a function that takes
rlm@113	59 [#^Renderer r #^FrameBuffer fb #^ByteBuffer b #^BufferedImage bi],
rlm@113	60 each of which has already been appropriately sized."
rlm@23	61 [continuation]
rlm@23	62 (let [byte-buffer (atom nil)
rlm@113	63 renderer (atom nil)
rlm@113	64 image (atom nil)]
rlm@23	65 (proxy [SceneProcessor] []
rlm@23	66 (initialize
rlm@23	67 [renderManager viewPort]
rlm@23	68 (let [cam (.getCamera viewPort)
rlm@23	69 width (.getWidth cam)
rlm@23	70 height (.getHeight cam)]
rlm@23	71 (reset! renderer (.getRenderer renderManager))
rlm@23	72 (reset! byte-buffer
rlm@23	73 (BufferUtils/createByteBuffer
rlm@113	74 (* width height 4)))
rlm@113	75 (reset! image (BufferedImage.
rlm@113	76 width height
rlm@113	77 BufferedImage/TYPE_4BYTE_ABGR))))
rlm@23	78 (isInitialized [] (not (nil? @byte-buffer)))
rlm@23	79 (reshape [_ _ _])
rlm@23	80 (preFrame [_])
rlm@23	81 (postQueue [_])
rlm@23	82 (postFrame
rlm@23	83 [#^FrameBuffer fb]
rlm@23	84 (.clear @byte-buffer)
rlm@113	85 (continuation @renderer fb @byte-buffer @image))
rlm@23	86 (cleanup []))))
rlm@213	87 #+end_src
rlm@213	88
rlm@213	89 The continuation function given to =(vision-pipeline)= above will be
rlm@213	90 given a =Renderer= and three containers for image data. The
rlm@213	91 =FrameBuffer= references the GPU image data, but it cannot be used
rlm@213	92 directly on the CPU. The =ByteBuffer= and =BufferedImage= are
rlm@213	93 initially "empty" but are sized to hold the data in the
rlm@213	94 =FrameBuffer=. I call transferring the GPU image data to the CPU
rlm@213	95 structures "mixing" the image data. I have provided three functions to
rlm@213	96 do this mixing.
rlm@213	97
rlm@213	98 #+name: pipeline-2
rlm@213	99 #+begin_src clojure
rlm@113	100 (defn frameBuffer->byteBuffer!
rlm@113	101 "Transfer the data in the graphics card (Renderer, FrameBuffer) to
rlm@113	102 the CPU (ByteBuffer)."
rlm@113	103 [#^Renderer r #^FrameBuffer fb #^ByteBuffer bb]
rlm@113	104 (.readFrameBuffer r fb bb) bb)
rlm@113	105
rlm@113	106 (defn byteBuffer->bufferedImage!
rlm@113	107 "Convert the C-style BGRA image data in the ByteBuffer bb to the AWT
rlm@113	108 style ABGR image data and place it in BufferedImage bi."
rlm@113	109 [#^ByteBuffer bb #^BufferedImage bi]
rlm@113	110 (Screenshots/convertScreenShot bb bi) bi)
rlm@113	111
rlm@113	112 (defn BufferedImage!
rlm@113	113 "Continuation which will grab the buffered image from the materials
rlm@113	114 provided by (vision-pipeline)."
rlm@113	115 [#^Renderer r #^FrameBuffer fb #^ByteBuffer bb #^BufferedImage bi]
rlm@113	116 (byteBuffer->bufferedImage!
rlm@113	117 (frameBuffer->byteBuffer! r fb bb) bi))
rlm@213	118 #+end_src
rlm@112	119
rlm@213	120 Note that it is possible to write vision processing algorithms
rlm@213	121 entirely in terms of =BufferedImage= inputs. Just compose that
rlm@213	122 =BufferedImage= algorithm with =(BufferedImage!)=. However, a vision
rlm@213	123 processing algorithm that is entirely hosted on the GPU does not have
rlm@213	124 to pay for this convenience.
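
A minimal sketch of that composition, assuming a hypothetical
=mean-brightness= algorithm that is not part of cortex. It only needs
a =BufferedImage=, so it can be tried without a running simulation;
the closing comment shows how it might be combined with
=(BufferedImage!)= to form a continuation.

#+begin_src clojure
(import 'java.awt.image.BufferedImage)

(defn mean-brightness
  "Hypothetical BufferedImage-only algorithm: the average of the red,
  green, and blue components over every pixel, in the range 0-255."
  [^BufferedImage image]
  (let [width  (.getWidth image)
        height (.getHeight image)
        total  (reduce +
                       (for [x (range width) y (range height)]
                         (let [rgb (.getRGB image x y)]
                           ;; unpack the three 8-bit color components
                           (+ (bit-and 0xFF (bit-shift-right rgb 16))
                              (bit-and 0xFF (bit-shift-right rgb 8))
                              (bit-and 0xFF rgb)))))]
    (/ total (* 3.0 width height))))

;; Inside cortex.vision, a continuation for vision-pipeline could then
;; be built by composition. This is left as a comment because it needs
;; a running jMonkeyEngine simulation:
;;
;; (comp mean-brightness BufferedImage!)
#+end_src

The algorithm itself never touches the =Renderer= or =FrameBuffer=;
all GPU-to-CPU transfer stays inside =(BufferedImage!)=.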
rlm@213	125
rlm@213	126
rlm@213	127 * Physical Eyes
rlm@213	128
rlm@213	129 The vision pipeline described above only deals with
rlm@213	130 Each eye in the creature in blender will work the same way as
rlm@213	131 joints -- a zero-dimensional object with no geometry whose local
rlm@213	132 coordinate system determines the orientation of the resulting
rlm@213	133 eye. All eyes will have a parent named "eyes" just as all joints
rlm@213	134 have a parent named "joints". The resulting camera will be a
rlm@213	135 ChaseCamera or a CameraNode bound to the geo that is closest to
rlm@213	136 the eye marker. The eye marker will contain the metadata for the
rlm@213	137 eye, and will be moved by its bound geometry. The dimensions of
rlm@213	138 the eye's camera are equal to the dimensions of the eye's "UV"
rlm@213	139 map.
rlm@213	140
rlm@213	141 (vision creature) will take an optional :skip argument which will
rlm@213	142 inform the continuations in the scene processor to skip the given
rlm@213	143 number of cycles; 0 means that no cycles will be skipped.
rlm@213	144
rlm@213	145 (vision creature) will return [init-functions sensor-functions].
rlm@213	146 The init-functions are each single-arg functions that take the
rlm@213	147 world and register the cameras and must each be called before the
rlm@213	148 corresponding sensor-functions. Each init-function returns the
rlm@213	149 viewport for that eye which can be manipulated, saved, etc. Each
rlm@213	150 sensor-function is a thunk and will return data in the same
rlm@213	151 format as the tactile-sensor functions; the structure is
rlm@213	152 [topology, sensor-data]. Internally, these sensor-functions
rlm@213	153 maintain a reference to sensor-data which is periodically updated
rlm@213	154 by the continuation function established by its init-function.
rlm@213	155 They can be queried every cycle, but their information may not
rlm@213	156 necessarily be different every cycle.
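
The skipping behaviour described above can be sketched on its own,
without jMonkeyEngine. =skipping-sensor= is a hypothetical helper, not
part of cortex; it assumes that "skip n cycles" means "store a reading
on every (n+1)-th call", which matches the counter logic used in
=vision-fn=.

#+begin_src clojure
(defn skipping-sensor
  "Hypothetical sketch. Returns [update! sense]: update! stores a new
  reading only on every (inc skip)-th call, and sense is a thunk that
  returns the most recently stored reading (nil before the first one)."
  [skip]
  (let [counter (atom 0)
        latest  (atom nil)]
    [(fn [reading]
       (when (zero? (rem (swap! counter inc) (inc skip)))
         (reset! latest reading)))
     (fn [] @latest)]))

;; With skip = 1, every other reading is dropped.
(let [[update! sense] (skipping-sensor 1)]
  (update! :frame-1)   ; skipped
  (update! :frame-2)   ; stored
  (update! :frame-3)   ; skipped
  (sense))             ; still the reading stored from :frame-2
#+end_src

Querying the thunk between updates simply returns the old reading
again, which is why the data may not change every cycle.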
rlm@213	157
rlm@213	158
rlm@213	159 #+begin_src clojure
rlm@169	160 (defn add-camera!
rlm@169	161 "Add a camera to the world, calling continuation on every frame
rlm@34	162 produced."
rlm@167	163 [#^Application world camera continuation]
rlm@23	164 (let [width (.getWidth camera)
rlm@23	165 height (.getHeight camera)
rlm@23	166 render-manager (.getRenderManager world)
rlm@23	167 viewport (.createMainView render-manager "eye-view" camera)]
rlm@23	168 (doto viewport
rlm@23	169 (.setClearFlags true true true)
rlm@112	170 (.setBackgroundColor ColorRGBA/Black)
rlm@113	171 (.addProcessor (vision-pipeline continuation))
rlm@23	172 (.attachScene (.getRootNode world)))))
rlm@151	173
rlm@169	174 (defn retina-sensor-profile
rlm@151	175 "Return a map of pixel selection functions to BufferedImages
rlm@169	176 describing the distribution of light-sensitive components of this
rlm@169	177 eye. Each function creates an integer from the rgb values found in
rlm@169	178 the pixel. :red, :green, :blue, :gray are already defined as
rlm@169	179 extracting the red, green, blue, and average components
rlm@151	180 respectively."
rlm@151	181 [#^Spatial eye]
rlm@151	182 (if-let [eye-map (meta-data eye "eye")]
rlm@151	183 (map-vals
rlm@167	184 load-image
rlm@151	185 (eval (read-string eye-map)))))
rlm@151	186
rlm@151	187 (defn eye-dimensions
rlm@169	188 "Returns [width, height] specified in the metadata of the eye"
rlm@151	189 [#^Spatial eye]
rlm@151	190 (let [dimensions
rlm@151	191 (map #(vector (.getWidth %) (.getHeight %))
rlm@169	192 (vals (retina-sensor-profile eye)))]
rlm@151	193 [(apply max (map first dimensions))
rlm@151	194 (apply max (map second dimensions))]))
rlm@151	195
rlm@167	196 (defvar
rlm@167	197 ^{:arglists '([creature])}
rlm@167	198 eyes
rlm@167	199 (sense-nodes "eyes")
rlm@167	200 "Return the children of the creature's \"eyes\" node.")
rlm@151	201
rlm@169	202 (defn add-eye!
rlm@169	203 "Create a Camera centered on the current position of 'eye which
rlm@169	204 follows the closest physical node in 'creature. Returns the
rlm@169	205 Camera."
rlm@151	206 [#^Node creature #^Spatial eye]
rlm@151	207 (let [target (closest-node creature eye)
rlm@151	208 [cam-width cam-height] (eye-dimensions eye)
rlm@151	209 cam (Camera. cam-width cam-height)]
rlm@151	210 (.setLocation cam (.getWorldTranslation eye))
rlm@151	211 (.setRotation cam (.getWorldRotation eye))
rlm@151	212 (.setFrustumPerspective
rlm@151	213 cam 45 (/ (.getWidth cam) (.getHeight cam))
rlm@151	214 1 1000)
rlm@151	215 (bind-sense target cam)
rlm@151	216 cam))
rlm@151	217
rlm@172	218 (defvar color-channel-presets
rlm@151	219 {:all 0xFFFFFF
rlm@151	220 :red 0xFF0000
rlm@151	221 :blue 0x0000FF
rlm@172	222 :green 0x00FF00}
rlm@172	223 "Bitmasks for common RGB color channels")
rlm@151	224
rlm@169	225 (defn vision-fn
rlm@171	226 "Returns a list of functions, each of which will return a color
rlm@171	227 channel's worth of visual information when called inside a running
rlm@171	228 simulation."
rlm@151	229 [#^Node creature #^Spatial eye & {skip :skip :or {skip 0}}]
rlm@169	230 (let [retinal-map (retina-sensor-profile eye)
rlm@169	231 camera (add-eye! creature eye)
rlm@151	232 vision-image
rlm@151	233 (atom
rlm@151	234 (BufferedImage. (.getWidth camera)
rlm@151	235 (.getHeight camera)
rlm@170	236 BufferedImage/TYPE_BYTE_BINARY))
rlm@170	237 register-eye!
rlm@170	238 (runonce
rlm@170	239 (fn [world]
rlm@170	240 (add-camera!
rlm@170	241 world camera
rlm@170	242 (let [counter (atom 0)]
rlm@170	243 (fn [r fb bb bi]
rlm@170	244 (if (zero? (rem (swap! counter inc) (inc skip)))
rlm@170	245 (reset! vision-image
rlm@170	246 (BufferedImage! r fb bb bi))))))))]
rlm@151	247 (vec
rlm@151	248 (map
rlm@151	249 (fn [[key image]]
rlm@151	250 (let [whites (white-coordinates image)
rlm@151	251 topology (vec (collapse whites))
rlm@172	252 mask (color-channel-presets key)]
rlm@170	253 (fn [world]
rlm@170	254 (register-eye! world)
rlm@151	255 (vector
rlm@151	256 topology
rlm@151	257 (vec
rlm@151	258 (for [[x y] whites]
rlm@151	259 (bit-and
rlm@151	260 mask (.getRGB @vision-image x y))))))))
rlm@170	261 retinal-map))))
rlm@151	262
rlm@170	263
rlm@170	264 ;; TODO maybe should add a viewport-manipulation function to
rlm@170	265 ;; automatically change viewport settings, attach shadow filters, etc.
rlm@170	266
rlm@170	267 (defn vision!
rlm@170	268 "Returns a function which returns visual sensory data when called
rlm@170	269 inside a running simulation"
rlm@151	270 [#^Node creature & {skip :skip :or {skip 0}}]
rlm@151	271 (reduce
rlm@170	272 concat
rlm@167	273 (for [eye (eyes creature)]
rlm@169	274 (vision-fn creature eye :skip skip))))
rlm@151	275
rlm@189	276 (defn view-vision
rlm@189	277 "Creates a function which accepts a list of visual sensor-data and
rlm@189	278 displays each element of the list to the screen."
rlm@189	279 []
rlm@188	280 (view-sense
rlm@188	281 (fn
rlm@188	282 [[coords sensor-data]]
rlm@188	283 (let [image (points->image coords)]
rlm@188	284 (dorun
rlm@188	285 (for [i (range (count coords))]
rlm@188	286 (.setRGB image ((coords i) 0) ((coords i) 1)
rlm@188	287 (sensor-data i))))
rlm@189	288 image))))
rlm@188	289
rlm@34	290 #+end_src
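
The channel masking done with =color-channel-presets= can be tried in
isolation. The sketch below is illustrative only: =channel-masks= and
=channel-value= are hypothetical names, and the image is a plain
=BufferedImage= rather than one produced by an eye.

#+begin_src clojure
(import 'java.awt.image.BufferedImage)

;; Same bitmask values as color-channel-presets.
(def channel-masks
  {:all 0xFFFFFF :red 0xFF0000 :green 0x00FF00 :blue 0x0000FF})

(defn channel-value
  "Mask the packed ARGB integer at pixel (x, y) down to one channel.
  The result keeps the channel's bit position; it is not shifted down
  to the 0-255 range."
  [^BufferedImage image channel x y]
  (bit-and (channel-masks channel) (.getRGB image x y)))

;; A single pixel with red 0x33, green 0x66, blue 0x99.
(def sample-image
  (doto (BufferedImage. 1 1 BufferedImage/TYPE_INT_RGB)
    (.setRGB 0 0 0x336699)))

(channel-value sample-image :red 0 0)   ; only the red bits survive
#+end_src

Because the mask is not followed by a shift, values from different
channels are not directly comparable in magnitude.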
rlm@23	291
rlm@112	292
rlm@34	293 Note the use of continuation passing style for connecting the eye to a
rlm@34	294 function to process the output. You can create any number of eyes, and
rlm@34	295 each of them will see the world from its own =Camera=. Once every
rlm@34	296 frame, the rendered image is copied to a =BufferedImage=, and that
rlm@34	297 data is sent off to the continuation function. Moving the =Camera=
rlm@34	298 which was used to create the eye will change what the eye sees.
rlm@23	299
rlm@34	300 * Example
rlm@23	301
rlm@66	302 #+name: test-vision
rlm@23	303 #+begin_src clojure
rlm@68	304 (ns cortex.test.vision
rlm@34	305 (:use (cortex world util vision))
rlm@34	306 (:import java.awt.image.BufferedImage)
rlm@34	307 (:import javax.swing.JPanel)
rlm@34	308 (:import javax.swing.SwingUtilities)
rlm@34	309 (:import java.awt.Dimension)
rlm@34	310 (:import javax.swing.JFrame)
rlm@34	311 (:import com.jme3.math.ColorRGBA)
rlm@45	312 (:import com.jme3.scene.Node)
rlm@113	313 (:import com.jme3.math.Vector3f))
rlm@23	314
rlm@36	315 (defn test-two-eyes
rlm@69	316 "Testing vision:
rlm@69	317 Tests the vision system by creating two views of the same rotating
rlm@69	318 object from different angles and displaying both of those views in
rlm@69	319 JFrames.
rlm@69	320
rlm@69	321 You should see a rotating cube, and two windows,
rlm@69	322 each displaying a different view of the cube."
rlm@36	323 []
rlm@58	324 (let [candy
rlm@58	325 (box 1 1 1 :physical? false :color ColorRGBA/Blue)]
rlm@112	326 (world
rlm@112	327 (doto (Node.)
rlm@112	328 (.attachChild candy))
rlm@112	329 {}
rlm@112	330 (fn [world]
rlm@112	331 (let [cam (.clone (.getCamera world))
rlm@112	332 width (.getWidth cam)
rlm@112	333 height (.getHeight cam)]
rlm@169	334 (add-camera! world cam
rlm@113	335 ;;no-op
rlm@113	336 (comp (view-image) BufferedImage!)
rlm@112	337 )
rlm@169	338 (add-camera! world
rlm@112	339 (doto (.clone cam)
rlm@112	340 (.setLocation (Vector3f. -10 0 0))
rlm@112	341 (.lookAt Vector3f/ZERO Vector3f/UNIT_Y))
rlm@113	342 ;;no-op
rlm@113	343 (comp (view-image) BufferedImage!))
rlm@112	344 ;; This is here to restore the main view
rlm@112	345 ;; after the other views have completed processing
rlm@169	346 (add-camera! world (.getCamera world) no-op)))
rlm@112	347 (fn [world tpf]
rlm@112	348 (.rotate candy (* tpf 0.2) 0 0)))))
rlm@23	349 #+end_src
rlm@23	350
rlm@213	351 #+name: vision-header
rlm@213	352 #+begin_src clojure
rlm@213	353 (ns cortex.vision
rlm@213	354 "Simulate the sense of vision in jMonkeyEngine3. Enables multiple
rlm@213	355 eyes from different positions to observe the same world, and pass
rlm@213	356 the observed data to any arbitrary function. Automatically reads
rlm@213	357 eye-nodes from specially prepared blender files and instantiates
rlm@213	358 them in the world as actual eyes."
rlm@213	359 {:author "Robert McIntyre"}
rlm@213	360 (:use (cortex world sense util))
rlm@213	361 (:use clojure.contrib.def)
rlm@213	362 (:import com.jme3.post.SceneProcessor)
rlm@213	363 (:import (com.jme3.util BufferUtils Screenshots))
rlm@213	364 (:import java.nio.ByteBuffer)
rlm@213	365 (:import java.awt.image.BufferedImage)
rlm@213	366 (:import (com.jme3.renderer ViewPort Camera))
rlm@213	367 (:import com.jme3.math.ColorRGBA)
rlm@213	368 (:import com.jme3.renderer.Renderer)
rlm@213	369 (:import com.jme3.app.Application)
rlm@213	370 (:import com.jme3.texture.FrameBuffer)
rlm@213	371 (:import (com.jme3.scene Node Spatial)))
rlm@213	372 #+end_src
rlm@112	373
rlm@34	374 The example code will create two videos of the same rotating object
rlm@34	375 from different angles. It can be used either for stereoscopic vision
rlm@34	376 simulation or for simulating multiple creatures, each with their own
rlm@34	377 sense of vision.
rlm@24	378
rlm@35	379 - As a neat bonus, the idea behind simulated vision also enables one
rlm@35	380 to [[../../cortex/html/capture-video.html][capture live video feeds from jMonkeyEngine]].
rlm@35	381
rlm@24	382
rlm@212	383 * COMMENT Generate Source
rlm@34	384 #+begin_src clojure :tangle ../src/cortex/vision.clj
rlm@24	385 <<eyes>>
rlm@24	386 #+end_src
rlm@24	387
rlm@68	388 #+begin_src clojure :tangle ../src/cortex/test/vision.clj
rlm@24	389 <<test-vision>>
rlm@24	390 #+end_src
