changeset 470:3401053124b0
integrating vision into thesis.
author      Robert McIntyre <rlm@mit.edu>
date        Fri, 28 Mar 2014 17:10:43 -0400
parents     ae10f35022ba
children    f14fa9e5b67f
files       org/vision.org thesis/cortex.org thesis/images/retina-small.png
diffstat    3 files changed, 315 insertions(+), 25 deletions(-)
--- a/org/vision.org	Fri Mar 28 16:34:35 2014 -0400
+++ b/org/vision.org	Fri Mar 28 17:10:43 2014 -0400
@@ -174,21 +174,18 @@
     (bind-sense target cam) cam))
 #+end_src
 
-#+results: add-eye
-: #'cortex.vision/add-eye!
-
 Here, the camera is created based on metadata on the eye-node and
 attached to the nearest physical object with =bind-sense=
 ** The Retina
 
 An eye is a surface (the retina) which contains many discrete sensors
-to detect light. These sensors have can have different light-sensing
-properties. In humans, each discrete sensor is sensitive to red,
-blue, green, or gray. These different types of sensors can have
-different spatial distributions along the retina. In humans, there is
-a fovea in the center of the retina which has a very high density of
-color sensors, and a blind spot which has no sensors at all. Sensor
-density decreases in proportion to distance from the fovea.
+to detect light. These sensors can have different light-sensing
+properties. In humans, each discrete sensor is sensitive to red, blue,
+green, or gray. These different types of sensors can have different
+spatial distributions along the retina. In humans, there is a fovea in
+the center of the retina which has a very high density of color
+sensors, and a blind spot which has no sensors at all. Sensor density
+decreases in proportion to distance from the fovea.
 
 I want to be able to model any retinal configuration, so my eye-nodes
 in blender contain metadata pointing to images that describe the
--- a/thesis/cortex.org	Fri Mar 28 16:34:35 2014 -0400
+++ b/thesis/cortex.org	Fri Mar 28 17:10:43 2014 -0400
@@ -6,22 +6,36 @@
 #+LaTeX_CLASS_OPTIONS: [nofloat]
 
 * COMMENT templates
-  #+caption: 
-  #+caption: 
-  #+caption: 
-  #+caption: 
-  #+name: name
-  #+begin_listing clojure
-  #+begin_src clojure
-  #+end_src
-  #+end_listing
+   #+caption: 
+   #+caption: 
+   #+caption: 
+   #+caption: 
+   #+name: name
+   #+begin_listing clojure
+   #+end_listing
 
-  #+caption: 
-  #+caption: 
-  #+caption: 
-  #+name: name
-  #+ATTR_LaTeX: :width 10cm
-  [[./images/aurellem-gray.png]]
+   #+caption: 
+   #+caption: 
+   #+caption: 
+   #+name: name
+   #+ATTR_LaTeX: :width 10cm
+   [[./images/aurellem-gray.png]]
+
+   #+caption: 
+   #+caption: 
+   #+caption: 
+   #+caption: 
+   #+name: name
+   #+begin_listing clojure
+   #+end_listing
+
+   #+caption: 
+   #+caption: 
+   #+caption: 
+   #+name: name
+   #+ATTR_LaTeX: :width 10cm
+   [[./images/aurellem-gray.png]]
+
 
 * COMMENT Empathy and Embodiment as problem solving strategies
 
@@ -942,6 +956,285 @@
 
 ** Eyes reuse standard video game components
 
+   Vision is one of the most important senses for humans, so I need
+   to build a simulated sense of vision for my AI. I will do this
+   with simulated eyes. Each eye can be independently moved and
+   should see its own version of the world depending on where it is.
+
+   Making these simulated eyes a reality is simple because
+   jMonkeyEngine already contains extensive support for multiple
+   views of the same 3D simulated world. jMonkeyEngine provides this
+   support because it is necessary for creating split-screen games.
+   Multiple views are also used to create efficient
+   pseudo-reflections by rendering the scene from a certain
+   perspective and then projecting it back onto a surface in the 3D
+   world.
+
+   #+caption: jMonkeyEngine supports multiple views to enable
+   #+caption: split-screen games, like GoldenEye, which was one of
+   #+caption: the first games to use split-screen views.
+   #+name: goldeneye
+   #+ATTR_LaTeX: :width 10cm
+   [[./images/goldeneye-4-player.png]]
+
+*** A Brief Description of jMonkeyEngine's Rendering Pipeline
+
+   jMonkeyEngine allows you to create a =ViewPort=, which represents
+   a view of the simulated world. You can create as many of these as
+   you want. Every frame, the =RenderManager= iterates through each
+   =ViewPort=, rendering the scene on the GPU. For each =ViewPort=
+   there is a =FrameBuffer= which represents the rendered image in
+   the GPU.
+
+   #+caption: =ViewPorts= are cameras in the world. During each frame,
+   #+caption: the =RenderManager= records a snapshot of what each view
+   #+caption: is currently seeing; these snapshots are =FrameBuffer=
+   #+caption: objects.
+   #+name: rendermanager
+   #+ATTR_LaTeX: :width 10cm
+   [[./images/diagram_rendermanager2.png]]
+
+   Each =ViewPort= can have any number of attached =SceneProcessor=
+   objects, which are called every time a new frame is rendered. A
+   =SceneProcessor= receives its =ViewPort's= =FrameBuffer= and can
+   do whatever it wants with the data. Often this consists of
+   invoking GPU-specific operations on the rendered image. The
+   =SceneProcessor= can also copy the GPU image data to RAM and
+   process it with the CPU.
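+
+   For concreteness, here is a minimal sketch of how a
+   =SceneProcessor= might be attached to a fresh =ViewPort= through
+   jMonkeyEngine's API. This is not =CORTEX= code; =app=, =camera=,
+   =root-node=, and =my-scene-processor= are assumed to already
+   exist:
+
+   #+begin_src clojure
+;; sketch: given a running jMonkeyEngine Application `app`,
+;; create a new view and hook a SceneProcessor into it.
+(let [view-port (.createPostView (.getRenderManager app)
+                                 "eye-view" camera)]
+  ;; the new ViewPort renders the same scene graph as the main view
+  (.attachScene view-port root-node)
+  ;; my-scene-processor is called every time a frame is rendered
+  (.addProcessor view-port my-scene-processor))
+   #+end_src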
+
+*** Appropriating Views for Vision
+
+   Each eye in the simulated creature needs its own =ViewPort= so
+   that it can see the world from its own perspective. To this
+   =ViewPort=, I add a =SceneProcessor= that feeds the visual data
+   to any arbitrary continuation function for further processing.
+   That continuation function may perform both CPU and GPU
+   operations on the data. To make this easy for the continuation
+   function, the =SceneProcessor= maintains appropriately sized
+   buffers in RAM to hold the data. It does not do any copying from
+   the GPU to the CPU itself, because that is a slow operation.
+
+   #+caption: Function to make the rendered scene in jMonkeyEngine
+   #+caption: available for further processing.
+   #+name: pipeline-1
+   #+begin_listing clojure
+   #+begin_src clojure
+(defn vision-pipeline
+  "Create a SceneProcessor object which wraps a vision processing
+  continuation function. The continuation is a function that takes
+  [#^Renderer r #^FrameBuffer fb #^ByteBuffer b #^BufferedImage bi],
+  each of which has already been appropriately sized."
+  [continuation]
+  (let [byte-buffer (atom nil)
+        renderer (atom nil)
+        image (atom nil)]
+    (proxy [SceneProcessor] []
+      (initialize
+       [renderManager viewPort]
+       (let [cam (.getCamera viewPort)
+             width (.getWidth cam)
+             height (.getHeight cam)]
+         (reset! renderer (.getRenderer renderManager))
+         (reset! byte-buffer
+                 (BufferUtils/createByteBuffer
+                  (* width height 4)))
+         (reset! image (BufferedImage.
+                        width height
+                        BufferedImage/TYPE_4BYTE_ABGR))))
+      (isInitialized [] (not (nil? @byte-buffer)))
+      (reshape [_ _ _])
+      (preFrame [_])
+      (postQueue [_])
+      (postFrame
+       [#^FrameBuffer fb]
+       (.clear @byte-buffer)
+       (continuation @renderer fb @byte-buffer @image))
+      (cleanup []))))
+   #+end_src
+   #+end_listing
+
+   The continuation function given to =vision-pipeline= above will
+   be given a =Renderer= and three containers for image data. The
+   =FrameBuffer= references the GPU image data, but the pixel data
+   can not be used directly on the CPU. The =ByteBuffer= and
+   =BufferedImage= are initially "empty" but are sized to hold the
+   data in the =FrameBuffer=. I call transferring the GPU image data
+   to the CPU structures "mixing" the image data.
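+
+   As an illustration (a sketch, not =CORTEX='s actual
+   continuation), a continuation that "mixes" the image data into
+   the CPU structures might use jMonkeyEngine's
+   =Renderer.readFrameBuffer= and =Screenshots/convertScreenShot=;
+   the name =mix-image= is hypothetical:
+
+   #+begin_src clojure
+(defn mix-image
+  "Copy the GPU image in `fb` into the pre-sized ByteBuffer `bb`,
+  then unpack those bytes into the BufferedImage `bi`."
+  [#^Renderer r #^FrameBuffer fb #^ByteBuffer bb #^BufferedImage bi]
+  (.readFrameBuffer r fb bb)             ; GPU --> RAM
+  (Screenshots/convertScreenShot bb bi)  ; raw bytes --> image
+  bi)
+   #+end_src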
+
+*** Optical sensor arrays are described with images and referenced with metadata
+
+   The vision pipeline described above handles the flow of rendered
+   images. Now, =CORTEX= needs simulated eyes to serve as the source
+   of these images.
+
+   An eye is described in blender in the same way as a joint: it is
+   a zero dimensional empty object with no geometry whose local
+   coordinate system determines the orientation of the resulting
+   eye. All eyes are children of a parent node named "eyes" just as
+   all joints have a parent named "joints". An eye binds to the
+   nearest physical object with =bind-sense=.
+
+   #+caption: Here, the camera is created based on metadata on the
+   #+caption: eye-node and attached to the nearest physical object
+   #+caption: with =bind-sense=.
+   #+name: add-eye
+   #+begin_listing clojure
+   #+begin_src clojure
+(defn add-eye!
+  "Create a Camera centered on the current position of 'eye which
+  follows the closest physical node in 'creature. The camera will
+  point in the X direction and use the Z vector as up as determined
+  by the rotation of these vectors in blender coordinate space. Use
+  XZY rotation for the node in blender."
+  [#^Node creature #^Spatial eye]
+  (let [target (closest-node creature eye)
+        [cam-width cam-height]
+        ;;[640 480] ;; graphics card on laptop doesn't support
+        ;;             arbitrary dimensions.
+        (eye-dimensions eye)
+        cam (Camera. cam-width cam-height)
+        rot (.getWorldRotation eye)]
+    (.setLocation cam (.getWorldTranslation eye))
+    (.lookAtDirection
+     cam                          ; this part is not a mistake and
+     (.mult rot Vector3f/UNIT_X)  ; is consistent with using Z in
+     (.mult rot Vector3f/UNIT_Y)) ; blender as the UP vector.
+    (.setFrustumPerspective
+     cam (float 45)
+     (float (/ (.getWidth cam) (.getHeight cam)))
+     (float 1)
+     (float 1000))
+    (bind-sense target cam) cam))
+   #+end_src
+   #+end_listing
+
+*** Simulated Retina
+
+   An eye is a surface (the retina) which contains many discrete
+   sensors to detect light. These sensors can have different
+   light-sensing properties. In humans, each discrete sensor is
+   sensitive to red, blue, green, or gray. These different types of
+   sensors can have different spatial distributions along the
+   retina. In humans, there is a fovea in the center of the retina
+   which has a very high density of color sensors, and a blind spot
+   which has no sensors at all. Sensor density decreases in
+   proportion to distance from the fovea.
+
+   I want to be able to model any retinal configuration, so my
+   eye-nodes in blender contain metadata pointing to images that
+   describe the precise position of the individual sensors using
+   white pixels. The metadata also describes the precise sensitivity
+   to light that the sensors described in the image have. An eye can
+   contain any number of these images. For example, the metadata for
+   an eye might look like this:
+
+   #+begin_src clojure
+{0xFF0000 "Models/test-creature/retina-small.png"}
+   #+end_src
+
+   #+caption: An example retinal profile image. White pixels are
+   #+caption: photo-sensitive elements. The distribution of white
+   #+caption: pixels is denser in the middle and falls off at the
+   #+caption: edges, and is inspired by the human retina.
+   #+name: retina
+   #+ATTR_LaTeX: :width 10cm
+   [[./images/retina-small.png]]
+
+   Together, the number 0xFF0000 and the image above describe the
+   placement of red-sensitive sensory elements.
+
+   Metadata to very crudely approximate a human eye might be
+   something like this:
+
+   #+begin_src clojure
+(let [retinal-profile "Models/test-creature/retina-small.png"]
+  {0xFF0000 retinal-profile
+   0x00FF00 retinal-profile
+   0x0000FF retinal-profile
+   0xFFFFFF retinal-profile})
+   #+end_src
+
+   The numbers that serve as keys in the map determine a sensor's
+   relative sensitivity to the channels red, green, and blue. These
+   sensitivity values are packed into an integer in the order
+   =|_|R|G|B|= in 8-bit fields. The RGB values of a pixel in the
+   image are added together with these sensitivities as linear
+   weights. Therefore, 0xFF0000 means sensitive to red only while
+   0xFFFFFF means sensitive to all colors equally (gray).
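+
+   A sketch of how this weighted sum might be computed (the actual
+   =pixel-sense= used by =vision-kernel= below may differ in its
+   details):
+
+   #+begin_src clojure
+(defn pixel-sense
+  "Weight a pixel's RGB channels by the 8-bit fields packed into
+  `sensitivity`, returning an activation normalized to [0,1]."
+  [sensitivity pixel]
+  (let [s-r (bit-and 0xFF (bit-shift-right sensitivity 16))
+        s-g (bit-and 0xFF (bit-shift-right sensitivity 8))
+        s-b (bit-and 0xFF sensitivity)
+        p-r (bit-and 0xFF (bit-shift-right pixel 16))
+        p-g (bit-and 0xFF (bit-shift-right pixel 8))
+        p-b (bit-and 0xFF pixel)
+        total (* 255 (+ s-r s-g s-b))]
+    (if (zero? total)
+      0.0
+      (float (/ (+ (* s-r p-r) (* s-g p-g) (* s-b p-b)) total)))))
+
+;; (pixel-sense 0xFF0000 0xFF8800) => 1.0   ; red-only sensor
+;; (pixel-sense 0xFFFFFF 0x808080) => ~0.5  ; gray sensor, mid-gray
+   #+end_src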
+
+   #+caption: This is the core of vision in =CORTEX=. A given eye node
+   #+caption: is converted into a function that returns visual
+   #+caption: information from the simulation.
+   #+name: vision-kernel
+   #+begin_listing clojure
+   #+begin_src clojure
+(defn vision-kernel
+  "Returns a list of functions, each of which will return a color
+  channel's worth of visual information when called inside a running
+  simulation."
+  [#^Node creature #^Spatial eye & {skip :skip :or {skip 0}}]
+  (let [retinal-map (retina-sensor-profile eye)
+        camera (add-eye! creature eye)
+        vision-image
+        (atom
+         (BufferedImage. (.getWidth camera)
+                         (.getHeight camera)
+                         BufferedImage/TYPE_BYTE_BINARY))
+        register-eye!
+        (runonce
+         (fn [world]
+           (add-camera!
+            world camera
+            (let [counter (atom 0)]
+              (fn [r fb bb bi]
+                (if (zero? (rem (swap! counter inc) (inc skip)))
+                  (reset! vision-image
+                          (BufferedImage! r fb bb bi))))))))]
+    (vec
+     (map
+      (fn [[key image]]
+        (let [whites (white-coordinates image)
+              topology (vec (collapse whites))
+              sensitivity (sensitivity-presets key key)]
+          (attached-viewport.
+           (fn [world]
+             (register-eye! world)
+             (vector
+              topology
+              (vec
+               (for [[x y] whites]
+                 (pixel-sense
+                  sensitivity
+                  (.getRGB @vision-image x y))))))
+           register-eye!)))
+      retinal-map))))
+   #+end_src
+   #+end_listing
+
+   Note that since each of the functions generated by
+   =vision-kernel= shares the same =register-eye!= function, the eye
+   will be registered only once, the first time any of the functions
+   from the list returned by =vision-kernel= is called. Each of the
+   functions returned by =vision-kernel= also allows access to the
+   =ViewPort= through which it receives images.
+
+   All the hard work has been done; all that remains is to apply
+   =vision-kernel= to each eye in the creature and gather the
+   results into one list of functions.
+
+   #+caption: With =vision!=, =CORTEX= is already a fine simulation
+   #+caption: environment for experimenting with different types of
+   #+caption: eyes.
+   #+name: vision!
+   #+begin_listing clojure
+   #+begin_src clojure
+(defn vision!
+  "Returns a list of functions, each of which returns visual sensory
+  data when called inside a running simulation."
+  [#^Node creature & {skip :skip :or {skip 0}}]
+  (reduce
+   concat
+   (for [eye (eyes creature)]
+     (vision-kernel creature eye))))
+   #+end_src
+   #+end_listing
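+
+   As a usage sketch (assuming a simulation loop that calls a
+   per-frame function with the current =world=; =process-vision!= is
+   a hypothetical consumer of the sensor data):
+
+   #+begin_src clojure
+(let [vision-fns (vision! creature)]
+  (fn [world]                        ; called once per frame
+    (doseq [sense-fn vision-fns]
+      ;; each function returns [topology sensor-values], where
+      ;; topology is a vector of [x y] sensor coordinates and
+      ;; sensor-values are activations in [0,1].
+      (let [[topology values] (sense-fn world)]
+        (process-vision! topology values)))))
+   #+end_src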
+
 
 ** Hearing is hard; =CORTEX= does it right
 
 ** Touch uses hundreds of hair-like elements
3.1 Binary file thesis/images/retina-small.png has changed