cortex: org/vision.org annotate

annotate org/vision.org @ 485:ac953b562eab

completed first draft.

author	Robert McIntyre <rlm@mit.edu>
date	Sat, 29 Mar 2014 16:22:49 -0400
parents	3401053124b0
children	819968c8a391

rev	line source
rlm@34	1 #+title: Simulated Sense of Sight
rlm@23	2 #+author: Robert McIntyre
rlm@23	3 #+email: rlm@mit.edu
rlm@38	4 #+description: Simulated sight for AI research using JMonkeyEngine3 and clojure
rlm@34	5 #+keywords: computer vision, jMonkeyEngine3, clojure
rlm@23	6 #+SETUPFILE: ../../aurellem/org/setup.org
rlm@23	7 #+INCLUDE: ../../aurellem/org/level-0.org
rlm@23	8 #+babel: :mkdirp yes :noweb yes :exports both
rlm@23	9
ocsenave@264	10 * JMonkeyEngine natively supports multiple views of the same world.
ocsenave@264	11
rlm@212	12 Vision is one of the most important senses for humans, so I need to
rlm@212	13 build a simulated sense of vision for my AI. I will do this with
rlm@306	14 simulated eyes. Each eye can be independently moved and should see its
rlm@212	15 own version of the world depending on where it is.
rlm@212	16
rlm@306	17 Making these simulated eyes a reality is simple because jMonkeyEngine
rlm@306	18 already contains extensive support for multiple views of the same 3D
rlm@218	19 simulated world. jMonkeyEngine has this support because it is
rlm@218	20 necessary for creating games with split-screen
rlm@218	21 views. Multiple views are also used to create efficient
rlm@212	22 pseudo-reflections by rendering the scene from a certain perspective
rlm@212	23 and then projecting it back onto a surface in the 3D world.
rlm@212	24
rlm@218	25 #+caption: jMonkeyEngine supports multiple views to enable split-screen games, like GoldenEye, which was one of the first games to use split-screen views.
rlm@212	26 [[../images/goldeneye-4-player.png]]
rlm@212	27
ocsenave@264	28 ** =ViewPorts=, =SceneProcessors=, and the =RenderManager=.
rlm@306	29 # =ViewPorts= are cameras; =RenderManager= takes snapshots each frame.
ocsenave@264	30 #* A Brief Description of jMonkeyEngine's Rendering Pipeline
rlm@212	31
rlm@213	32 jMonkeyEngine allows you to create a =ViewPort=, which represents a
rlm@213	33 view of the simulated world. You can create as many of these as you
rlm@213	34 want. Every frame, the =RenderManager= iterates through each
rlm@213	35 =ViewPort=, rendering the scene on the GPU. For each =ViewPort= there
rlm@213	36 is a =FrameBuffer= which represents the rendered image on the GPU.
rlm@151	37
rlm@306	38 #+caption: =ViewPorts= are cameras in the world. During each frame, the =RenderManager= records a snapshot of what each view is currently seeing; these snapshots are =FrameBuffer= objects.
ocsenave@265	39 #+ATTR_HTML: width="400"
ocsenave@272	40 [[../images/diagram_rendermanager2.png]]
ocsenave@262	41
rlm@213	42 Each =ViewPort= can have any number of attached =SceneProcessor=
rlm@213	43 objects, which are called every time a new frame is rendered. A
rlm@306	44 =SceneProcessor= receives its =ViewPort's= =FrameBuffer= and can do
rlm@219	45 whatever it wants to the data. Often this consists of invoking GPU
rlm@219	46 specific operations on the rendered image. The =SceneProcessor= can
rlm@219	47 also copy the GPU image data to RAM and process it with the CPU.
rlm@151	48
ocsenave@264	49 ** From Views to Vision
ocsenave@264	50 # Appropriating Views for Vision.
rlm@151	51
ocsenave@264	52 Each eye in the simulated creature needs its own =ViewPort= so that
rlm@213	53 it can see the world from its own perspective. To this =ViewPort=, I
rlm@306	54 add a =SceneProcessor= that feeds the visual data to any arbitrary
rlm@213	55 continuation function for further processing. That continuation
rlm@213	56 function may perform both CPU and GPU operations on the data. To make
rlm@213	57 this easy for the continuation function, the =SceneProcessor=
rlm@306	58 maintains appropriately sized buffers in RAM to hold the data. It does
rlm@218	59 not do any copying from the GPU to the CPU itself because that copy
rlm@218	60 is a slow operation.
rlm@214	61
rlm@213	62 #+name: pipeline-1
rlm@213	63 #+begin_src clojure
rlm@113	64 (defn vision-pipeline
rlm@34	65 "Create a SceneProcessor object which wraps a vision processing
rlm@113	66 continuation function. The continuation is a function that takes
rlm@113	67 [#^Renderer r #^FrameBuffer fb #^ByteBuffer b #^BufferedImage bi],
rlm@306	68 each of which has already been appropriately sized."
rlm@23	69 [continuation]
rlm@23	70 (let [byte-buffer (atom nil)
rlm@113	71 renderer (atom nil)
rlm@113	72 image (atom nil)]
rlm@23	73 (proxy [SceneProcessor] []
rlm@23	74 (initialize
rlm@23	75 [renderManager viewPort]
rlm@23	76 (let [cam (.getCamera viewPort)
rlm@23	77 width (.getWidth cam)
rlm@23	78 height (.getHeight cam)]
rlm@23	79 (reset! renderer (.getRenderer renderManager))
rlm@23	80 (reset! byte-buffer
rlm@23	81 (BufferUtils/createByteBuffer
rlm@113	82 (* width height 4)))
rlm@113	83 (reset! image (BufferedImage.
rlm@113	84 width height
rlm@113	85 BufferedImage/TYPE_4BYTE_ABGR))))
rlm@23	86 (isInitialized [] (not (nil? @byte-buffer)))
rlm@23	87 (reshape [_ _ _])
rlm@23	88 (preFrame [_])
rlm@23	89 (postQueue [_])
rlm@23	90 (postFrame
rlm@23	91 [#^FrameBuffer fb]
rlm@23	92 (.clear @byte-buffer)
rlm@113	93 (continuation @renderer fb @byte-buffer @image))
rlm@23	94 (cleanup []))))
rlm@213	95 #+end_src
rlm@213	96
rlm@273	97 The continuation function given to =vision-pipeline= above will be
rlm@213	98 given a =Renderer= and three containers for image data. The
rlm@218	99 =FrameBuffer= references the GPU image data, but the pixel data
rlm@218	100 cannot be used directly on the CPU. The =ByteBuffer= and =BufferedImage=
rlm@219	101 are initially "empty" but are sized to hold the data in the
rlm@306	102 =FrameBuffer=. I call transferring the GPU image data to the CPU
rlm@213	103 structures "mixing" the image data. I have provided three functions to
rlm@213	104 do this mixing.
rlm@213	105
rlm@213	106 #+name: pipeline-2
rlm@213	107 #+begin_src clojure
rlm@113	108 (defn frameBuffer->byteBuffer!
rlm@113	109   "Transfer the data in the graphics card (Renderer, FrameBuffer) to
rlm@113	110    the CPU (ByteBuffer)."
rlm@113	111   [#^Renderer r #^FrameBuffer fb #^ByteBuffer bb]
rlm@113	112   (.readFrameBuffer r fb bb) bb)
rlm@113	113
rlm@113	114 (defn byteBuffer->bufferedImage!
rlm@113	115   "Convert the C-style BGRA image data in the ByteBuffer bb to the AWT
rlm@113	116    style ABGR image data and place it in BufferedImage bi."
rlm@113	117   [#^ByteBuffer bb #^BufferedImage bi]
rlm@113	118   (Screenshots/convertScreenShot bb bi) bi)
rlm@113	119
rlm@113	120 (defn BufferedImage!
rlm@113	121   "Continuation which will grab the buffered image from the materials
rlm@113	122    provided by (vision-pipeline)."
rlm@113	123   [#^Renderer r #^FrameBuffer fb #^ByteBuffer bb #^BufferedImage bi]
rlm@113	124   (byteBuffer->bufferedImage!
rlm@113	125    (frameBuffer->byteBuffer! r fb bb) bi))
rlm@213	126 #+end_src
rlm@112	127
rlm@213	128 Note that it is possible to write vision processing algorithms
rlm@213	129 entirely in terms of =BufferedImage= inputs. Just compose that
rlm@273	130 =BufferedImage= algorithm with =BufferedImage!=. However, a vision
rlm@213	131 processing algorithm that is entirely hosted on the GPU does not have
rlm@306	132 to pay for this convenience.
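
As a minimal sketch of that composition, assume a hypothetical
=BufferedImage= algorithm called =mean-brightness= (it is not part of
cortex and is only here for illustration). It needs nothing but the
JDK, so it can be tried on a hand-made image before it is ever wired
into a simulation. The =world= and =camera= names in the final comment
are likewise assumptions.

#+begin_src clojure
(import 'java.awt.image.BufferedImage)

(defn mean-brightness
  "Hypothetical example algorithm: average of the R, G and B channels
   over every pixel of image, in the range 0..255."
  [^BufferedImage image]
  (let [w (.getWidth image)
        h (.getHeight image)
        total (reduce
               +
               (for [x (range w) y (range h)]
                 (let [rgb (.getRGB image x y)]
                   (+ (bit-and 0xFF (bit-shift-right rgb 16))
                      (bit-and 0xFF (bit-shift-right rgb 8))
                      (bit-and 0xFF rgb)))))]
    (double (/ total (* 3 w h)))))

;; Stand-alone check on a 2x2 image that is pure red everywhere.
(def demo-image (BufferedImage. 2 2 BufferedImage/TYPE_INT_RGB))
(doseq [x (range 2) y (range 2)]
  (.setRGB demo-image x y 0xFF0000))

(mean-brightness demo-image) ; 85.0, since (255 + 0 + 0) / 3 = 85

;; Inside a simulation the same function could be composed with
;; BufferedImage! and handed to add-camera! as the continuation
;; (world and camera are assumed to be in scope):
;; (add-camera! world camera (comp mean-brightness BufferedImage!))
#+end_src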
rlm@213	133
ocsenave@265	134 * Optical sensor arrays are described with images and referenced with metadata
rlm@214	135 The vision pipeline described above handles the flow of rendered
rlm@214	136 images. Now, we need simulated eyes to serve as the source of these
rlm@214	137 images.
rlm@214	138
rlm@214	139 Eyes are described in blender in the same way as joints. They are
rlm@214	140 zero-dimensional empty objects with no geometry whose local coordinate
rlm@214	141 system determines the orientation of the resulting eye. All eyes are
rlm@306	142 children of a parent node named "eyes" just as all joints have a
rlm@214	143 parent named "joints". An eye binds to the nearest physical object
rlm@273	144 with =bind-sense=.
rlm@214	145
rlm@214	146 #+name: add-eye
rlm@214	147 #+begin_src clojure
rlm@215	148 (in-ns 'cortex.vision)
rlm@215	149
rlm@214	150 (defn add-eye!
rlm@214	151   "Create a Camera centered on the current position of 'eye which
rlm@338	152    follows the closest physical node in 'creature. The camera will
rlm@338	153    point in the X direction and use the Z vector as up as determined
rlm@338	154    by the rotation of these vectors in blender coordinate space. Use
rlm@338	155    XZY rotation for the node in blender."
rlm@214	156   [#^Node creature #^Spatial eye]
rlm@214	157   (let [target (closest-node creature eye)
rlm@338	158         [cam-width cam-height]
rlm@338	159         ;;[640 480] ;; graphics card on laptop doesn't support
rlm@338	160                     ;; arbitrary dimensions.
rlm@338	161         (eye-dimensions eye)
rlm@215	162         cam (Camera. cam-width cam-height)
rlm@215	163         rot (.getWorldRotation eye)]
rlm@214	164     (.setLocation cam (.getWorldTranslation eye))
rlm@218	165     (.lookAtDirection
rlm@338	166      cam                           ; this part is not a mistake and
rlm@338	167      (.mult rot Vector3f/UNIT_X)   ; is consistent with using Z in
rlm@338	168      (.mult rot Vector3f/UNIT_Y))  ; blender as the UP vector.
rlm@214	169     (.setFrustumPerspective
rlm@338	170      cam (float 45)
rlm@338	171      (float (/ (.getWidth cam) (.getHeight cam)))
rlm@338	172      (float 1)
rlm@338	173      (float 1000))
rlm@215	174     (bind-sense target cam) cam))
rlm@214	175 #+end_src
rlm@214	176
rlm@214	177 Here, the camera is created based on metadata on the eye-node and
rlm@273	178 attached to the nearest physical object with =bind-sense=.
rlm@214	179 ** The Retina
rlm@214	180
rlm@214	181 An eye is a surface (the retina) which contains many discrete sensors
rlm@470	182 to detect light. These sensors can have different light-sensing
rlm@470	183 properties. In humans, each discrete sensor is sensitive to red, blue,
rlm@470	184 green, or gray. These different types of sensors can have different
rlm@470	185 spatial distributions along the retina. In humans, there is a fovea in
rlm@470	186 the center of the retina which has a very high density of color
rlm@470	187 sensors, and a blind spot which has no sensors at all. Sensor density
rlm@470	188 decreases in proportion to distance from the fovea.
rlm@214	189
rlm@214	190 I want to be able to model any retinal configuration, so my eye-nodes
rlm@214	191 in blender contain metadata pointing to images that describe the
rlm@306	192 precise position of the individual sensors using white pixels. The
rlm@306	193 metadata also describes the precise sensitivity to light that the
rlm@214	194 sensors described in the image have. An eye can contain any number of
rlm@214	195 these images. For example, the metadata for an eye might look like
rlm@214	196 this:
rlm@214	197
rlm@214	198 #+begin_src clojure
rlm@214	199 {0xFF0000 "Models/test-creature/retina-small.png"}
rlm@214	200 #+end_src
rlm@214	201
rlm@214	202 #+caption: The retinal profile image "Models/test-creature/retina-small.png". White pixels are photo-sensitive elements. The distribution of white pixels is denser in the middle and falls off at the edges and is inspired by the human retina.
rlm@214	203 [[../assets/Models/test-creature/retina-small.png]]
rlm@214	204
rlm@214	205 Together, the number 0xFF0000 and the image above describe the
rlm@214	206 placement of red-sensitive sensory elements.
rlm@214	207
rlm@214	208 Metadata to very crudely approximate a human eye might be something
rlm@214	209 like this:
rlm@214	210
rlm@214	211 #+begin_src clojure
rlm@214	212 (let [retinal-profile "Models/test-creature/retina-small.png"]
rlm@214	213 {0xFF0000 retinal-profile
rlm@214	214 0x00FF00 retinal-profile
rlm@214	215 0x0000FF retinal-profile
rlm@214	216 0xFFFFFF retinal-profile})
rlm@214	217 #+end_src
rlm@214	218
rlm@214	219 The numbers that serve as keys in the map determine a sensor's
rlm@214	220 relative sensitivity to the channels red, green, and blue. These
rlm@218	221 sensitivity values are packed into an integer in the order =\|_\|R\|G\|B\|=
rlm@218	222 in 8-bit fields. The RGB values of a pixel in the image are added
rlm@306	223 together with these sensitivities as linear weights. Therefore,
rlm@214	224 0xFF0000 means sensitive to red only while 0xFFFFFF means sensitive to
rlm@214	225 all colors equally (gray).
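
The packing and weighting can be worked through by hand. The sketch
below uses two illustrative helper names, =unpack-rgb= and
=weighted-response=, which are assumptions made for this example and
are not defined in cortex. It divides by 255 times the sum of the
weights so that the answer lands in the range 0 to 1, the same
normalization that =pixel-sense= applies.

#+begin_src clojure
(defn unpack-rgb
  "Split a packed 0xRRGGBB integer into [r g b], each in 0..255."
  [n]
  [(bit-and 0xFF (bit-shift-right n 16))
   (bit-and 0xFF (bit-shift-right n 8))
   (bit-and 0xFF n)])

(defn weighted-response
  "Illustrative only: weight the channels of pixel by the channels of
   sensitivity and normalize to 0..1. A sensitivity of 0 would divide
   by zero, so it is not a meaningful sensor."
  [sensitivity pixel]
  (let [weights  (unpack-rgb sensitivity)
        channels (unpack-rgb pixel)]
    (double (/ (reduce + (map * weights channels))
               (* 255 (reduce + weights))))))

(unpack-rgb 0x804020)                 ; [128 64 32]
(weighted-response 0xFF0000 0x804020) ; 128/255, about 0.502
(weighted-response 0xFFFFFF 0x804020) ; 224/765, about 0.293
#+end_src

A red-only sensor ignores the green and blue content of the pixel
entirely, while the gray sensor reports the plain average of the three
channels.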
rlm@214	226
rlm@306	227 For convenience I've defined a few symbols for the more common
rlm@214	228 sensitivity values.
rlm@214	229
rlm@214	230 #+name: sensitivity
rlm@214	231 #+begin_src clojure
rlm@317	232 (def sensitivity-presets
rlm@317	233   "Retinal sensitivity presets for sensors that extract one channel
rlm@317	234    (:red :blue :green) or average all channels (:all)"
rlm@214	235   {:all   0xFFFFFF
rlm@214	236    :red   0xFF0000
rlm@214	237    :blue  0x0000FF
rlm@317	238    :green 0x00FF00})
rlm@214	239 #+end_src
rlm@214	240
rlm@214	241 ** Metadata Processing
rlm@214	242
rlm@273	243 =retina-sensor-profile= extracts a map from the eye-node in the same
rlm@273	244 format as the example maps above. =eye-dimensions= finds the
rlm@219	245 dimensions of the smallest image required to contain all the retinal
rlm@214	246 sensor maps.
rlm@214	247
rlm@216	248 #+name: retina
rlm@214	249 #+begin_src clojure
rlm@214	250 (defn retina-sensor-profile
rlm@214	251   "Return a map of pixel sensitivity numbers to BufferedImages
rlm@214	252    describing the distribution of light-sensitive components of this
rlm@214	253    eye. :red, :green, :blue, :all are already defined as extracting
rlm@214	254    the red, green, blue, and average components respectively."
rlm@214	255   [#^Spatial eye]
rlm@214	256   (if-let [eye-map (meta-data eye "eye")]
rlm@214	257     (map-vals
rlm@214	258      load-image
rlm@214	259      (eval (read-string eye-map)))))
rlm@214	260
rlm@218	261 (defn eye-dimensions
rlm@218	262   "Returns [width, height] determined by the metadata of the eye."
rlm@214	263   [#^Spatial eye]
rlm@214	264   (let [dimensions
rlm@214	265         (map #(vector (.getWidth %) (.getHeight %))
rlm@214	266              (vals (retina-sensor-profile eye)))]
rlm@214	267     [(apply max (map first dimensions))
rlm@214	268      (apply max (map second dimensions))]))
rlm@214	269 #+end_src
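
The "smallest image that contains all the sensor maps" rule is just a
per-axis maximum. The following sketch isolates that step on plain
=[width height]= pairs. The name =bounding-dimensions= is an
assumption for this example, not a cortex function.

#+begin_src clojure
(defn bounding-dimensions
  "Smallest [width height] that contains every [w h] pair in dims.
   Throws if dims is empty, because max needs at least one argument."
  [dims]
  [(apply max (map first dims))
   (apply max (map second dims))])

;; A wide retinal image and a tall one need a 64x96 camera.
(bounding-dimensions [[64 48] [32 96]]) ; [64 96]
#+end_src

Because the width and the height are maximized independently, the
result can be larger than any single retinal image. An eye whose
metadata yields no images would hit the empty case noted in the
docstring, and =eye-dimensions= appears to share that limitation.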
rlm@214	270
ocsenave@265	271 * Importing and parsing descriptions of eyes.
rlm@214	272 First off, get the children of the "eyes" empty node to find all the
rlm@214	273 eyes the creature has.
rlm@216	274 #+name: eye-node
rlm@214	275 #+begin_src clojure
rlm@317	276 (def
rlm@317	277 ^{:doc "Return the children of the creature's \"eyes\" node."
rlm@317	278 :arglists '([creature])}
rlm@214	279 eyes
rlm@317	280 (sense-nodes "eyes"))
rlm@214	281 #+end_src
rlm@214	282
rlm@273	283 Then, add the camera created by =add-eye!= to the simulation by
rlm@215	284 creating a new viewport.
rlm@214	285
rlm@216	286 #+name: add-camera
rlm@213	287 #+begin_src clojure
rlm@338	288 (in-ns 'cortex.vision)
rlm@169	289 (defn add-camera!
rlm@169	290 "Add a camera to the world, calling continuation on every frame
rlm@34	291 produced."
rlm@167	292 [#^Application world camera continuation]
rlm@23	293 (let [width (.getWidth camera)
rlm@23	294 height (.getHeight camera)
rlm@23	295 render-manager (.getRenderManager world)
rlm@23	296 viewport (.createMainView render-manager "eye-view" camera)]
rlm@23	297 (doto viewport
rlm@23	298 (.setClearFlags true true true)
rlm@112	299 (.setBackgroundColor ColorRGBA/Black)
rlm@113	300 (.addProcessor (vision-pipeline continuation))
rlm@23	301 (.attachScene (.getRootNode world)))))
rlm@215	302 #+end_src
rlm@151	303
rlm@338	304 #+results: add-camera
rlm@338	305 : #'cortex.vision/add-camera!
rlm@338	306
rlm@151	307
rlm@218	308 The eye's continuation function should register the viewport with the
rlm@218	309 simulation the first time it is called, then use the CPU to extract the
rlm@215	310 appropriate pixels from the rendered image and weight them by each
rlm@218	311 sensor's sensitivity. I have the option to do this processing in
rlm@218	312 native code for a slight gain in speed. I could also do it on the GPU
rlm@273	313 for a massive gain in speed. =vision-kernel= generates a list of
rlm@218	314 such continuation functions, one for each channel of the eye.
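
The "first time it is called" behaviour depends on a run-once wrapper.
The code in this document uses =runonce= from the cortex utilities,
whose definition is not shown here. The sketch below is a hypothetical
stand-in named =run-once= that illustrates the idea, and the real
function may differ in its details.

#+begin_src clojure
(defn run-once
  "Hypothetical stand-in for a run-once decorator: the wrapped function
   f runs only on the first call, and later calls return the cached
   result of that first call."
  [f]
  (let [unset  (Object.)        ; sentinel meaning \"not yet called\"
        result (atom unset)]
    (fn [& args]
      (locking unset
        (when (identical? @result unset)
          (reset! result (apply f args)))
        @result))))

;; Usage: the expensive registration step happens exactly once.
(def calls (atom 0))

(def register!
  (run-once
   (fn [world]
     (swap! calls inc)
     (str "viewport-for-" world))))

(register! "world") ; runs the body, returns "viewport-for-world"
(register! "world") ; returns the cached "viewport-for-world"
@calls              ; 1
#+end_src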
rlm@151	315
rlm@216	316 #+name: kernel
rlm@215	317 #+begin_src clojure
rlm@215	318 (in-ns 'cortex.vision)
rlm@151	319
rlm@215	320 (defrecord attached-viewport [vision-fn viewport-fn]
rlm@215	321 clojure.lang.IFn
rlm@215	322 (invoke [this world] (vision-fn world))
rlm@215	323 (applyTo [this args] (apply vision-fn args)))
rlm@151	324
rlm@216	325 (defn pixel-sense [sensitivity pixel]
rlm@216	326   (let [s-r (bit-shift-right (bit-and 0xFF0000 sensitivity) 16)
rlm@216	327         s-g (bit-shift-right (bit-and 0x00FF00 sensitivity) 8)
rlm@216	328         s-b (bit-and 0x0000FF sensitivity)
rlm@216	329
rlm@216	330         p-r (bit-shift-right (bit-and 0xFF0000 pixel) 16)
rlm@216	331         p-g (bit-shift-right (bit-and 0x00FF00 pixel) 8)
rlm@216	332         p-b (bit-and 0x0000FF pixel)
rlm@216	333
rlm@216	334         total-sensitivity (* 255 (+ s-r s-g s-b))]
rlm@216	335     (float (/ (+ (* s-r p-r)
rlm@216	336                  (* s-g p-g)
rlm@216	337                  (* s-b p-b))
rlm@216	338               total-sensitivity))))
rlm@216	339
rlm@215	340 (defn vision-kernel
rlm@171	341 "Returns a list of functions, each of which will return a color
rlm@171	342 channel's worth of visual information when called inside a running
rlm@171	343 simulation."
rlm@151	344 [#^Node creature #^Spatial eye & {skip :skip :or {skip 0}}]
rlm@169	345 (let [retinal-map (retina-sensor-profile eye)
rlm@169	346 camera (add-eye! creature eye)
rlm@151	347 vision-image
rlm@151	348 (atom
rlm@151	349 (BufferedImage. (.getWidth camera)
rlm@151	350 (.getHeight camera)
rlm@170	351 BufferedImage/TYPE_BYTE_BINARY))
rlm@170	352 register-eye!
rlm@170	353 (runonce
rlm@170	354 (fn [world]
rlm@170	355 (add-camera!
rlm@170	356 world camera
rlm@170	357 (let [counter (atom 0)]
rlm@170	358 (fn [r fb bb bi]
rlm@170	359 (if (zero? (rem (swap! counter inc) (inc skip)))
rlm@170	360 (reset! vision-image
rlm@170	361 (BufferedImage! r fb bb bi))))))))]
rlm@151	362 (vec
rlm@151	363 (map
rlm@151	364 (fn [[key image]]
rlm@151	365 (let [whites (white-coordinates image)
rlm@151	366 topology (vec (collapse whites))
rlm@216	367 sensitivity (sensitivity-presets key key)]
rlm@215	368 (attached-viewport.
rlm@215	369 (fn [world]
rlm@215	370 (register-eye! world)
rlm@215	371 (vector
rlm@215	372 topology
rlm@215	373 (vec
rlm@215	374 (for [[x y] whites]
rlm@216	375 (pixel-sense
rlm@216	376 sensitivity
rlm@216	377 (.getRGB @vision-image x y))))))
rlm@215	378 register-eye!)))
rlm@215	379 retinal-map))))
rlm@151	380
rlm@215	381 (defn gen-fix-display
rlm@215	382   "Create a function to call to restore a simulation's display when it
rlm@215	383    is disrupted by a ViewPort."
rlm@215	384   []
rlm@215	385   (runonce
rlm@215	386    (fn [world]
rlm@215	387      (add-camera! world (.getCamera world) no-op))))
rlm@215	388 #+end_src
rlm@170	389
rlm@273	390 Note that since each of the functions generated by =vision-kernel=
rlm@273	391 shares the same =register-eye!= function, the eye will be registered
rlm@215	392 only once, the first time any of the functions from the list returned
rlm@273	393 by =vision-kernel= is called. Each of the functions returned by
rlm@273	394 =vision-kernel= also allows access to the =ViewPort= through which
rlm@306	395 it receives images.
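
The "function that also carries something extra" pattern comes from
giving a record an =IFn= implementation, as =attached-viewport= does.
Here is a stripped-down sketch of the pattern with stand-in functions.
The record name =callable-probe= and its fields are assumptions made
for this example.

#+begin_src clojure
(defrecord callable-probe [sense-fn extra-fn]
  clojure.lang.IFn
  ;; Calling the record like a function delegates to sense-fn.
  (invoke [this world] (sense-fn world))
  (applyTo [this args] (apply sense-fn args)))

(def probe
  (->callable-probe
   (fn [world] [:data-from world])       ; stands in for the sensor read
   (fn [world] [:viewport-for world])))  ; stands in for the ViewPort lookup

(probe :w)             ; [:data-from :w]
((:extra-fn probe) :w) ; [:viewport-for :w]
#+end_src

Callers that only want sensor data can treat the record as an ordinary
function, while callers that need the extra handle reach it through
the record field.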
rlm@215	396
rlm@306	397 The in-game display can be disrupted by all the =ViewPorts= that the
rlm@306	398 functions generated by =vision-kernel= add. This doesn't affect the
rlm@215	399 simulation or the simulated senses, but can be annoying.
rlm@273	400 =gen-fix-display= restores the in-simulation display.
rlm@215	401
ocsenave@265	402 ** The =vision!= function creates sensory probes.
rlm@215	403
rlm@218	404 All the hard work has been done; all that remains is to apply
rlm@273	405 =vision-kernel= to each eye in the creature and gather the results
rlm@215	406 into one list of functions.
rlm@215	407
rlm@216	408 #+name: main
rlm@215	409 #+begin_src clojure
rlm@170	410 (defn vision!
rlm@348	411   "Returns a list of functions, each of which returns visual sensory
rlm@348	412    data when called inside a running simulation."
rlm@151	413   [#^Node creature & {skip :skip :or {skip 0}}]
rlm@151	414   (reduce
rlm@170	415    concat
rlm@167	416    (for [eye (eyes creature)]
rlm@215	417      (vision-kernel creature eye :skip skip))))
rlm@215	418 #+end_src
rlm@151	419
ocsenave@265	420 ** Displaying visual data for debugging.
ocsenave@265	421 # Visualization of Vision. Maybe less alliteration would be better.
rlm@215	422 It's vital to have a visual representation for each sense. Here I use
rlm@273	423 =view-sense= to construct a function that will create a display for
rlm@215	424 visual data.
rlm@215	425
rlm@216	426 #+name: display
rlm@215	427 #+begin_src clojure
rlm@216	428 (in-ns 'cortex.vision)
rlm@216	429
rlm@189	430 (defn view-vision
rlm@189	431 "Creates a function which accepts a list of visual sensor-data and
rlm@189	432 displays each element of the list to the screen."
rlm@189	433 []
rlm@188	434 (view-sense
rlm@188	435 (fn
rlm@188	436 [[coords sensor-data]]
rlm@188	437 (let [image (points->image coords)]
rlm@188	438 (dorun
rlm@188	439 (for [i (range (count coords))]
rlm@188	440 (.setRGB image ((coords i) 0) ((coords i) 1)
rlm@216	441 (gray (int (* 255 (sensor-data i)))))))
rlm@189	442 image))))
rlm@34	443 #+end_src
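
=view-vision= scales each sensor value from the range 0 to 1 up to 0
to 255 and passes it to a helper named =gray=, whose definition is not
shown in this file. The sketch below is a guess at what such a helper
might do, under the assumption that it copies one intensity into all
three color channels. The names =gray-rgb= and =sensor->rgb= are
hypothetical.

#+begin_src clojure
(defn gray-rgb
  "Hypothetical helper: pack one intensity (clamped to 0..255) into an
   0xRRGGBB integer with equal red, green and blue parts."
  [v]
  (let [v (long (max 0 (min 255 (long v))))]
    (bit-or (bit-shift-left v 16)
            (bit-shift-left v 8)
            v)))

(defn sensor->rgb
  "Map a sensor reading in 0..1 to a gray pixel value."
  [x]
  (gray-rgb (int (* 255 x))))

(format "%06X" (sensor->rgb 1.0)) ; "FFFFFF"
(format "%06X" (sensor->rgb 0.5)) ; "7F7F7F", since (int 127.5) is 127
(format "%06X" (sensor->rgb 0.0)) ; "000000"
#+end_src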
rlm@23	444
ocsenave@264	445 * Demonstrations
ocsenave@264	446 ** Demonstrating the vision pipeline.
rlm@23	447
rlm@215	448 This is a basic test for the vision system. It only tests the
ocsenave@264	449 vision-pipeline and does not deal with loading eyes from a blender
rlm@215	450 file. The code creates two videos of the same rotating cube from
rlm@215	451 different angles.
rlm@23	452
rlm@215	453 #+name: test-1
rlm@23	454 #+begin_src clojure
rlm@215	455 (in-ns 'cortex.test.vision)
rlm@23	456
rlm@219	457 (defn test-pipeline
rlm@69	458 "Testing vision:
rlm@69	459 Tests the vision system by creating two views of the same rotating
rlm@69	460 object from different angles and displaying both of those views in
rlm@69	461 JFrames.
rlm@69	462
rlm@69	463 You should see a rotating cube, and two windows,
rlm@69	464 each displaying a different view of the cube."
rlm@283	465 ([] (test-pipeline false))
rlm@283	466 ([record?]
rlm@283	467 (let [candy
rlm@283	468 (box 1 1 1 :physical? false :color ColorRGBA/Blue)]
rlm@283	469 (world
rlm@283	470 (doto (Node.)
rlm@283	471 (.attachChild candy))
rlm@283	472 {}
rlm@283	473 (fn [world]
rlm@283	474 (let [cam (.clone (.getCamera world))
rlm@283	475 width (.getWidth cam)
rlm@283	476 height (.getHeight cam)]
rlm@283	477 (add-camera! world cam
rlm@283	478 (comp
rlm@283	479 (view-image
rlm@283	480 (if record?
rlm@283	481 (File. "/home/r/proj/cortex/render/vision/1")))
rlm@283	482 BufferedImage!))
rlm@283	483 (add-camera! world
rlm@283	484 (doto (.clone cam)
rlm@283	485 (.setLocation (Vector3f. -10 0 0))
rlm@283	486 (.lookAt Vector3f/ZERO Vector3f/UNIT_Y))
rlm@283	487 (comp
rlm@283	488 (view-image
rlm@283	489 (if record?
rlm@283	490 (File. "/home/r/proj/cortex/render/vision/2")))
rlm@283	491 BufferedImage!))
rlm@341	492 (let [timer (IsoTimer. 60)]
rlm@340	493 (.setTimer world timer)
rlm@340	494 (display-dilated-time world timer))
rlm@283	495 ;; This is here to restore the main view
rlm@340	496 ;; after the other views have completed processing
rlm@283	497 (add-camera! world (.getCamera world) no-op)))
rlm@283	498 (fn [world tpf]
rlm@283	499 (.rotate candy (* tpf 0.2) 0 0))))))
rlm@23	500 #+end_src
rlm@23	501
rlm@340	502 #+results: test-1
rlm@340	503 : #'cortex.test.vision/test-pipeline
rlm@340	504
rlm@215	505 #+begin_html
rlm@215	506 <div class="figure">
rlm@215	507 <video controls="controls" width="755">
rlm@215	508 <source src="../video/spinning-cube.ogg" type="video/ogg"
rlm@215	509 preload="none" poster="../images/aurellem-1280x480.png" />
rlm@215	510 </video>
rlm@309	511 <br> <a href="http://youtu.be/r5Bn2aG7MO0"> YouTube </a>
rlm@215	512 <p>A rotating cube viewed from two different perspectives.</p>
rlm@215	513 </div>
rlm@215	514 #+end_html
rlm@215	515
rlm@215	516 Creating multiple eyes like this can be used for stereoscopic vision
rlm@215	517 simulation in a single creature or for simulating multiple creatures,
rlm@215	518 each with their own sense of vision.
ocsenave@264	519 ** Demonstrating eye import and parsing.
rlm@215	520
rlm@218	521 To the worm from the last post, I add a new node that describes its
rlm@215	522 eyes.
rlm@215	523
rlm@215	524 #+attr_html: width=755
rlm@215	525 #+caption: The worm with newly added empty nodes describing a single eye.
rlm@215	526 [[../images/worm-with-eye.png]]
rlm@215	527
rlm@215	528 The node highlighted in yellow is the root level "eyes" node. It has
rlm@218	529 a single child, highlighted in orange, which describes a single
rlm@218	530 eye. This is the "eye" node. It is placed so that the worm will have
rlm@218	531 an eye located in the center of the flat portion of its lower
rlm@218	532 hemispherical section.
rlm@218	533
rlm@218	534 The two nodes which are not highlighted describe the single joint of
rlm@218	535 the worm.
rlm@215	536
rlm@215	537 The metadata of the eye-node is:
rlm@215	538
rlm@215	539 #+begin_src clojure :results verbatim :exports both
rlm@215	540 (cortex.sense/meta-data
rlm@218	541 (.getChild (.getChild (cortex.test.body/worm) "eyes") "eye") "eye")
rlm@215	542 #+end_src
rlm@215	543
rlm@215	544 #+results:
rlm@215	545 : "(let [retina \"Models/test-creature/retina-small.png\"]
rlm@215	546 : {:all retina :red retina :green retina :blue retina})"
rlm@215	547
rlm@215	548 This is the approximation to the human eye described earlier.
rlm@215	549
rlm@216	550 #+name: test-2
rlm@215	551 #+begin_src clojure
rlm@215	552 (in-ns 'cortex.test.vision)
rlm@215	553
rlm@216	554 (defn change-color [obj color]
rlm@321	555 ;;(println-repl obj)
rlm@216	556 (if obj
rlm@216	557 (.setColor (.getMaterial obj) "Color" color)))
rlm@216	558
rlm@216	559 (defn colored-cannon-ball [color]
rlm@216	560 (comp #(change-color % color)
rlm@216	561 (fire-cannon-ball)))
rlm@215	562
rlm@338	563 (defn gen-worm
rlm@338	564 "create a creature acceptable for testing as a replacement for the
rlm@338	565 worm."
rlm@338	566 []
rlm@338	567 (nodify
rlm@338	568 "worm"
rlm@338	569 [(nodify
rlm@338	570 "eyes"
rlm@338	571 [(doto
rlm@338	572 (Node. "eye1")
rlm@338	573 (.setLocalTranslation (Vector3f. 0 -1.1 0))
rlm@338	574 (.setUserData
rlm@338	575
rlm@338	576 "eye"
rlm@338	577 "(let [retina
rlm@338	578 \"Models/test-creature/retina-small.png\"]
rlm@338	579 {:all retina :red retina
rlm@338	580 :green retina :blue retina})"))])
rlm@338	581 (box
rlm@338	582 0.2 0.2 0.2
rlm@338	583 :name "worm-segment"
rlm@338	584 :position (Vector3f. 0 0 0)
rlm@338	585 :color ColorRGBA/Orange)]))
rlm@338	586
rlm@338	587
rlm@338	588
rlm@283	589 (defn test-worm-vision
rlm@321	590 "Testing vision:
rlm@321	591 You should see the worm suspended in mid-air, looking down at a
rlm@321	592 table. There are four small displays, one each for red, green, blue,
rlm@321	593 and gray channels. You can fire balls of various colors, and the
rlm@321	594 four channels should react accordingly.
rlm@321	595
rlm@321	596 Keys:
rlm@321	597 r : fire red-ball
rlm@321	598 b : fire blue-ball
rlm@321	599 g : fire green-ball
rlm@321	600 <space> : fire white ball"
rlm@338	601
rlm@283	602 ([] (test-worm-vision false))
rlm@283	603 ([record?]
rlm@283	604 (let [the-worm (doto (worm)(body!))
rlm@340	605 vision (vision! the-worm)
rlm@340	606 vision-display (view-vision)
rlm@340	607 fix-display (gen-fix-display)
rlm@283	608 me (sphere 0.5 :color ColorRGBA/Blue :physical? false)
rlm@283	609 x-axis
rlm@283	610 (box 1 0.01 0.01 :physical? false :color ColorRGBA/Red
rlm@283	611 :position (Vector3f. 0 -5 0))
rlm@283	612 y-axis
rlm@283	613 (box 0.01 1 0.01 :physical? false :color ColorRGBA/Green
rlm@283	614 :position (Vector3f. 0 -5 0))
rlm@283	615 z-axis
rlm@283	616 (box 0.01 0.01 1 :physical? false :color ColorRGBA/Blue
rlm@283	617 :position (Vector3f. 0 -5 0))
rlm@340	618
rlm@338	619 ]
rlm@215	620
rlm@335	621 (world
rlm@335	622 (nodify [(floor) the-worm x-axis y-axis z-axis me])
rlm@340	623 (merge standard-debug-controls
rlm@340	624 {"key-r" (colored-cannon-ball ColorRGBA/Red)
rlm@340	625 "key-b" (colored-cannon-ball ColorRGBA/Blue)
rlm@340	626 "key-g" (colored-cannon-ball ColorRGBA/Green)})
rlm@338	627
rlm@335	628 (fn [world]
rlm@340	629 (light-up-everything world)
rlm@340	630 (speed-up world)
rlm@341	631 (let [timer (IsoTimer. 60)]
rlm@340	632 (.setTimer world timer)
rlm@340	633 (display-dilated-time world timer))
rlm@340	634 ;; add a view from the worm's perspective
rlm@340	635 (if record?
rlm@340	636 (Capture/captureVideo
rlm@340	637 world
rlm@340	638 (File.
rlm@340	639 "/home/r/proj/cortex/render/worm-vision/main-view")))
rlm@340	640
rlm@340	641 (add-camera!
rlm@340	642 world
rlm@340	643 (add-eye! the-worm (first (eyes the-worm)))
rlm@340	644 (comp
rlm@340	645 (view-image
rlm@340	646 (if record?
rlm@340	647 (File.
rlm@340	648 "/home/r/proj/cortex/render/worm-vision/worm-view")))
rlm@340	649 BufferedImage!))
rlm@340	650
rlm@340	651 (set-gravity world Vector3f/ZERO)
rlm@340	652 (add-camera! world (.getCamera world) no-op))
rlm@340	653
rlm@340	654 (fn [world _]
rlm@340	655 (.setLocalTranslation me (.getLocation (.getCamera world)))
rlm@340	656 (vision-display
rlm@340	657 (map #(% world) vision)
rlm@338	658 (if record?
rlm@340	659 (File. "/home/r/proj/cortex/render/worm-vision")))
rlm@340	660 (fix-display world)
rlm@335	661 )))))
rlm@215	662 #+end_src
rlm@215	663
rlm@335	664 #+RESULTS: test-2
rlm@337	665 : #'cortex.test.vision/test-worm-vision
rlm@335	666
rlm@335	667
rlm@218	668 The world consists of the worm and a flat gray floor. I can shoot red,
rlm@218	669 green, blue and white cannonballs at the worm. The worm is initially
rlm@218	670 looking down at the floor, and there is no gravity. My perspective
rlm@218	671 (the Main View), the worm's perspective (Worm View) and the 4 sensor
rlm@218	672 channels that comprise the worm's eye are all saved frame-by-frame to
rlm@218	673 disk.
rlm@218	674
rlm@218	675 * Demonstration of Vision
rlm@218	676 #+begin_html
rlm@218	677 <div class="figure">
rlm@218	678 <video controls="controls" width="755">
rlm@218	679 <source src="../video/worm-vision.ogg" type="video/ogg"
rlm@218	680 preload="none" poster="../images/aurellem-1280x480.png" />
rlm@218	681 </video>
rlm@309	682 <br> <a href="http://youtu.be/J3H3iB_2NPQ"> YouTube </a>
rlm@218	683 <p>Simulated Vision in a Virtual Environment</p>
rlm@218	684 </div>
rlm@218	685 #+end_html
rlm@218	686
rlm@218	687 ** Generate the Worm Video from Frames
rlm@216	688 #+name: magick2
rlm@216	689 #+begin_src clojure
rlm@216	690 (ns cortex.video.magick2
rlm@216	691 (:import java.io.File)
rlm@316	692 (:use clojure.java.shell))
rlm@216	693
rlm@216	694 (defn images [path]
rlm@216	695 (sort (rest (file-seq (File. path)))))
rlm@216	696
rlm@216	697 (def base "/home/r/proj/cortex/render/worm-vision/")
rlm@216	698
rlm@216	699 (defn pics [file]
rlm@216	700 (images (str base file)))
rlm@216	701
rlm@216	702 (defn combine-images []
rlm@216	703 (let [main-view (pics "main-view")
rlm@216	704 worm-view (pics "worm-view")
rlm@216	705 blue (pics "0")
rlm@216	706 green (pics "1")
rlm@216	707 red (pics "2")
rlm@216	708 gray (pics "3")
rlm@216	709 blender (let [b-pics (pics "blender")]
rlm@216	710 (concat b-pics (repeat 9001 (last b-pics))))
rlm@216	711 background (repeat 9001 (File. (str base "background.png")))
rlm@216	712 targets (map
rlm@216	713 #(File. (str base "out/" (format "%07d.png" %)))
rlm@216	714 (range 0 (count main-view)))]
rlm@216	715 (dorun
rlm@216	716 (pmap
rlm@216	717 (comp
rlm@216	718 (fn [[background main-view worm-view red green blue gray blender target]]
rlm@216	719 (println target)
rlm@216	720 (sh "convert"
rlm@216	721 background
rlm@216	722 main-view "-geometry" "+18+17" "-composite"
rlm@216	723 worm-view "-geometry" "+677+17" "-composite"
rlm@216	724 green "-geometry" "+685+430" "-composite"
rlm@216	725 red "-geometry" "+788+430" "-composite"
rlm@216	726 blue "-geometry" "+894+430" "-composite"
rlm@216	727 gray "-geometry" "+1000+430" "-composite"
rlm@216	728 blender "-geometry" "+0+0" "-composite"
rlm@216	729 target))
rlm@216	730 (fn [& args] (map #(.getCanonicalPath %) args)))
rlm@216	731 background main-view worm-view red green blue gray blender targets))))
rlm@216	732 #+end_src
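
=combine-images= hands several frame sequences of different lengths to
=pmap=, which, like =map=, stops as soon as the shortest input runs
out. That is why the shorter =blender= overlay is padded with copies
of its last frame and the background is simply repeated. The sketch
below illustrates the same idea with keywords standing in for image
files. The names =pad-with-last= and =frame-tuples= are hypothetical
helpers for illustration and are not part of cortex. It also uses an
unbounded =(repeat x)= rather than =(repeat 9001 x)=, which is one
possible way to avoid silently capping the output at 9001 frames.

#+begin_src clojure
(defn pad-with-last
  "Return `coll` followed by its last element repeated forever (lazily).
  Hypothetical helper, not part of cortex."
  [coll]
  (concat coll (repeat (last coll))))

(defn frame-tuples
  "Zip several frame sequences into one vector per frame. `map` stops
  at the shortest input, so infinite padding is safe as long as at
  least one input is finite."
  [& frame-seqs]
  (apply map vector frame-seqs))

(frame-tuples [:m0 :m1 :m2]              ; finite: sets the frame count
              (pad-with-last [:b0 :b1])  ; shorter overlay held on its last frame
              (repeat :background))      ; constant layer
;; => ([:m0 :b0 :background] [:m1 :b1 :background] [:m2 :b1 :background])
#+end_src

Because the finite sequence alone decides how many tuples are
produced, the padding sequences never need to know the frame count in
advance.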
rlm@216	733
rlm@216	734 #+begin_src sh :results silent
rlm@216	735 cd /home/r/proj/cortex/render/worm-vision
rlm@216	736 ffmpeg -r 25 -i out/%07d.png -b:v 9001k -vcodec libtheora worm-vision.ogg
rlm@216	737 #+end_src
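
The =ffmpeg= input pattern =out/%07d.png= only picks up the composited
frames because =combine-images= names its targets with the same
printf-style pattern, zero-padded to seven digits. A minimal sketch of
that naming scheme follows. The function name =frame-name= is
hypothetical and introduced only for illustration.

#+begin_src clojure
(defn frame-name
  "File name for frame number `n`, zero-padded to seven digits so that
  it matches a `%07d.png` image-sequence pattern. Hypothetical helper."
  [n]
  (format "%07d.png" n))

(map frame-name [0 1 42])
;; => ("0000000.png" "0000001.png" "0000042.png")
#+end_src

Zero-padding also keeps the files in frame order under a plain
lexicographic sort, which is what =images= relies on when it calls
=sort= on the directory listing.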
rlm@236	738
ocsenave@265	739 * Onward!
ocsenave@265	740 - As a neat bonus, the idea behind simulated vision also makes it
ocsenave@265	741 possible to [[../../cortex/html/capture-video.html][capture live video feeds from jMonkeyEngine]].
ocsenave@265	742 - Now that we have vision, it's time to tackle [[./hearing.org][hearing]].
ocsenave@265	743 #+appendix
ocsenave@265	744
rlm@215	745 * Headers
rlm@215	746
rlm@213	747 #+name: vision-header
rlm@213	748 #+begin_src clojure
rlm@213	749 (ns cortex.vision
rlm@213	750 "Simulate the sense of vision in jMonkeyEngine3. Enables multiple
rlm@213	751 eyes from different positions to observe the same world, and pass
rlm@306	752 the observed data to any arbitrary function. Automatically reads
rlm@216	753 eye-nodes from specially prepared Blender files and instantiates
rlm@213	754 them in the world as actual eyes."
rlm@213	755 {:author "Robert McIntyre"}
rlm@213	756 (:use (cortex world sense util))
rlm@213	757 (:import com.jme3.post.SceneProcessor)
rlm@237	758 (:import (com.jme3.util BufferUtils Screenshots))
rlm@213	759 (:import java.nio.ByteBuffer)
rlm@213	760 (:import java.awt.image.BufferedImage)
rlm@213	761 (:import (com.jme3.renderer ViewPort Camera))
rlm@216	762 (:import (com.jme3.math ColorRGBA Vector3f Matrix3f))
rlm@213	763 (:import com.jme3.renderer.Renderer)
rlm@213	764 (:import com.jme3.app.Application)
rlm@213	765 (:import com.jme3.texture.FrameBuffer)
rlm@213	766 (:import (com.jme3.scene Node Spatial)))
rlm@213	767 #+end_src
rlm@112	768
rlm@215	769 #+name: test-header
rlm@215	770 #+begin_src clojure
rlm@215	771 (ns cortex.test.vision
rlm@215	772 (:use (cortex world sense util body vision))
rlm@215	773 (:use cortex.test.body)
rlm@215	774 (:import java.awt.image.BufferedImage)
rlm@215	775 (:import javax.swing.JPanel)
rlm@215	776 (:import javax.swing.SwingUtilities)
rlm@215	777 (:import java.awt.Dimension)
rlm@215	778 (:import javax.swing.JFrame)
rlm@215	779 (:import com.jme3.math.ColorRGBA)
rlm@215	780 (:import com.jme3.scene.Node)
rlm@215	781 (:import com.jme3.math.Vector3f)
rlm@216	782 (:import java.io.File)
rlm@341	783 (:import (com.aurellem.capture Capture RatchetTimer IsoTimer)))
rlm@215	784 #+end_src
rlm@341	785
rlm@341	786 #+results: test-header
rlm@341	787 : com.aurellem.capture.IsoTimer
rlm@341	788
rlm@216	789 * Source Listing
rlm@216	790 - [[../src/cortex/vision.clj][cortex.vision]]
rlm@216	791 - [[../src/cortex/test/vision.clj][cortex.test.vision]]
rlm@216	792 - [[../src/cortex/video/magick2.clj][cortex.video.magick2]]
rlm@216	793 - [[../assets/Models/subtitles/worm-vision-subtitles.blend][worm-vision-subtitles.blend]]
rlm@216	794 #+html: <ul> <li> <a href="../org/vision.org">This org file</a> </li> </ul>
rlm@216	795 - [[http://hg.bortreb.com][source-repository]]
rlm@216	796
rlm@35	797
rlm@273	798 * Next
rlm@273	799 I find some [[./hearing.org][ears]] for the creature while exploring the guts of
rlm@273	800 jMonkeyEngine's sound system.
rlm@24	801
rlm@212	802 * COMMENT Generate Source
rlm@34	803 #+begin_src clojure :tangle ../src/cortex/vision.clj
rlm@216	804 <<vision-header>>
rlm@216	805 <<pipeline-1>>
rlm@216	806 <<pipeline-2>>
rlm@216	807 <<retina>>
rlm@216	808 <<add-eye>>
rlm@216	809 <<sensitivity>>
rlm@216	810 <<eye-node>>
rlm@216	811 <<add-camera>>
rlm@216	812 <<kernel>>
rlm@216	813 <<main>>
rlm@216	814 <<display>>
rlm@24	815 #+end_src
rlm@24	816
rlm@68	817 #+begin_src clojure :tangle ../src/cortex/test/vision.clj
rlm@215	818 <<test-header>>
rlm@215	819 <<test-1>>
rlm@216	820 <<test-2>>
rlm@24	821 #+end_src
rlm@216	822
rlm@216	823 #+begin_src clojure :tangle ../src/cortex/video/magick2.clj
rlm@216	824 <<magick2>>
rlm@216	825 #+end_src
