thesis/cortex.org @ 470:3401053124b0

integrating vision into thesis.
author Robert McIntyre <rlm@mit.edu>
date Fri, 28 Mar 2014 17:10:43 -0400
parents ae10f35022ba
children f14fa9e5b67f

#+description: Using embodied AI to facilitate Artificial Imagination.
#+keywords: AI, clojure, embodiment
#+LaTeX_CLASS_OPTIONS: [nofloat]

* COMMENT templates
#+caption:
#+caption:
#+caption:
#+caption:
#+name: name
#+begin_listing clojure
#+end_listing

#+caption:
#+caption:
#+caption:
#+name: name
#+ATTR_LaTeX: :width 10cm
[[./images/aurellem-gray.png]]

* COMMENT Empathy and Embodiment as problem solving strategies

By the end of this thesis, you will have seen a novel approach to
interpreting video using embodiment and empathy. You will have also

#+name: physical-hand
#+ATTR_LaTeX: :width 15cm
[[./images/physical-hand.png]]

** Eyes reuse standard video game components

Vision is one of the most important senses for humans, so I need to
build a simulated sense of vision for my AI. I will do this with
simulated eyes. Each eye can be independently moved and should see
its own version of the world depending on where it is.

Making these simulated eyes a reality is simple because
jMonkeyEngine already contains extensive support for multiple views
of the same 3D simulated world. jMonkeyEngine needs this support to
create games with split-screen views. Multiple views are also used
to create efficient pseudo-reflections by rendering the scene from
a certain perspective and then projecting it back onto a surface in
the 3D world.

#+caption: jMonkeyEngine supports multiple views to enable
#+caption: split-screen games, like GoldenEye, which was one of
#+caption: the first games to use split-screen views.
#+name: goldeneye
#+ATTR_LaTeX: :width 10cm
[[./images/goldeneye-4-player.png]]

*** A Brief Description of jMonkeyEngine's Rendering Pipeline

jMonkeyEngine allows you to create a =ViewPort=, which represents a
view of the simulated world. You can create as many of these as you
want. Every frame, the =RenderManager= iterates through each
=ViewPort=, rendering the scene on the GPU. For each =ViewPort=
there is a =FrameBuffer= which represents the rendered image on the
GPU.

#+caption: =ViewPorts= are cameras in the world. During each frame,
#+caption: the =RenderManager= records a snapshot of what each view
#+caption: is currently seeing; these snapshots are =FrameBuffer= objects.
#+name: rendermanagers
#+ATTR_LaTeX: :width 10cm
[[./images/diagram_rendermanager2.png]]

Each =ViewPort= can have any number of attached =SceneProcessor=
objects, which are called every time a new frame is rendered. A
=SceneProcessor= receives its =ViewPort's= =FrameBuffer= and can do
whatever it wants to the data. Often this consists of invoking
GPU-specific operations on the rendered image. The =SceneProcessor=
can also copy the GPU image data to RAM and process it with the
CPU.

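As a quick orientation, here is a minimal sketch (not =CORTEX=
code) of the stock jMonkeyEngine calls just described. The function
name and the fixed 640x480 resolution are assumptions of the
sketch; only the jMonkeyEngine methods themselves come from the
real API.

#+begin_src clojure
(import '(com.jme3.renderer Camera))

(defn attach-view-sketch
  "Create a new ViewPort on the RenderManager 'rm, point it at the
  scene 'root, and attach a SceneProcessor such as the one built by
  vision-pipeline below."
  [rm root processor]
  (let [cam  (Camera. 640 480)
        view (.createMainView rm "eye-view" cam)]
    (.setClearFlags view true true true) ; clear color, depth, stencil
    (.attachScene view root)             ; same world, independent view
    (.addProcessor view processor)       ; invoked every rendered frame
    view))
#+end_src
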
*** Appropriating Views for Vision

Each eye in the simulated creature needs its own =ViewPort= so
that it can see the world from its own perspective. To this
=ViewPort=, I add a =SceneProcessor= that feeds the visual data to
any arbitrary continuation function for further processing. That
continuation function may perform both CPU and GPU operations on
the data. To make this easy for the continuation function, the
=SceneProcessor= maintains appropriately sized buffers in RAM to
hold the data. It does not do any copying from the GPU to the CPU
itself, because that is a slow operation.

#+caption: Function to make the rendered scene in jMonkeyEngine
#+caption: available for further processing.
#+name: pipeline-1
#+begin_listing clojure
#+begin_src clojure
(defn vision-pipeline
  "Create a SceneProcessor object which wraps a vision processing
  continuation function. The continuation is a function that takes
  [#^Renderer r #^FrameBuffer fb #^ByteBuffer b #^BufferedImage bi],
  each of which has already been appropriately sized."
  [continuation]
  (let [byte-buffer (atom nil)
        renderer    (atom nil)
        image       (atom nil)]
    (proxy [SceneProcessor] []
      (initialize
       [renderManager viewPort]
       (let [cam    (.getCamera viewPort)
             width  (.getWidth cam)
             height (.getHeight cam)]
         (reset! renderer (.getRenderer renderManager))
         (reset! byte-buffer
                 (BufferUtils/createByteBuffer
                  (* width height 4)))
         (reset! image (BufferedImage.
                        width height
                        BufferedImage/TYPE_4BYTE_ABGR))))
      (isInitialized [] (not (nil? @byte-buffer)))
      (reshape [_ _ _])
      (preFrame [_])
      (postQueue [_])
      (postFrame
       [#^FrameBuffer fb]
       (.clear @byte-buffer)
       (continuation @renderer fb @byte-buffer @image))
      (cleanup []))))
#+end_src
#+end_listing

The continuation function given to =vision-pipeline= above will be
given a =Renderer= and three containers for image data. The
=FrameBuffer= references the GPU image data, but the pixel data
cannot be used directly on the CPU. The =ByteBuffer= and
=BufferedImage= are initially "empty" but are sized to hold the
data in the =FrameBuffer=. I call transferring the GPU image data
to the CPU structures "mixing" the image data.

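To make "mixing" concrete, here is a sketch of one possible
continuation function, assuming the same imports as
=vision-pipeline=. The two jMonkeyEngine calls shown
(=readFrameBuffer= and =Screenshots/convertScreenShot=) are part of
the stock API, but the function itself is an illustration with an
assumed name, not the thesis code.

#+begin_src clojure
(import '(com.jme3.util Screenshots))

(defn frame->image-sketch
  "Mix the GPU image data into the CPU structures: read the
  FrameBuffer 'fb into the ByteBuffer 'bb, then decode 'bb into the
  BufferedImage 'bi. Returns the filled BufferedImage."
  [#^Renderer r #^FrameBuffer fb #^ByteBuffer bb #^BufferedImage bi]
  (.readFrameBuffer r fb bb)            ; GPU --> RAM
  (Screenshots/convertScreenShot bb bi) ; raw bytes --> BufferedImage
  bi)
#+end_src
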
*** Optical sensor arrays are described with images and referenced with metadata

The vision pipeline described above handles the flow of rendered
images. Now, =CORTEX= needs simulated eyes to serve as the source
of these images.

An eye is described in blender in the same way as a joint: it is a
zero-dimensional empty object with no geometry whose local
coordinate system determines the orientation of the resulting eye.
All eyes are children of a parent node named "eyes", just as all
joints have a parent named "joints". An eye binds to the nearest
physical object with =bind-sense=.

#+caption: Here, the camera is created based on metadata on the
#+caption: eye-node and attached to the nearest physical object
#+caption: with =bind-sense=.
#+name: add-eye
#+begin_listing clojure
#+begin_src clojure
(defn add-eye!
  "Create a Camera centered on the current position of 'eye which
  follows the closest physical node in 'creature. The camera will
  point in the X direction and use the Z vector as up as determined
  by the rotation of these vectors in blender coordinate space. Use
  XZY rotation for the node in blender."
  [#^Node creature #^Spatial eye]
  (let [target (closest-node creature eye)
        [cam-width cam-height]
        ;;[640 480] ;; graphics card on laptop doesn't support
        ;;             arbitrary dimensions.
        (eye-dimensions eye)
        cam (Camera. cam-width cam-height)
        rot (.getWorldRotation eye)]
    (.setLocation cam (.getWorldTranslation eye))
    (.lookAtDirection
     cam                          ; this part is not a mistake and
     (.mult rot Vector3f/UNIT_X)  ; is consistent with using Z in
     (.mult rot Vector3f/UNIT_Y)) ; blender as the UP vector.
    (.setFrustumPerspective
     cam (float 45)
     (float (/ (.getWidth cam) (.getHeight cam)))
     (float 1)
     (float 1000))
    (bind-sense target cam) cam))
#+end_src
#+end_listing

*** Simulated Retina

An eye is a surface (the retina) which contains many discrete
sensors to detect light. These sensors can have different
light-sensing properties. In humans, each discrete sensor is
sensitive to red, blue, green, or gray. These different types of
sensors can have different spatial distributions along the retina.
In humans, there is a fovea in the center of the retina which has
a very high density of color sensors, and a blind spot which has
no sensors at all. Sensor density decreases in proportion to
distance from the fovea.

I want to be able to model any retinal configuration, so my
eye-nodes in blender contain metadata pointing to images that
describe the precise position of the individual sensors using
white pixels. The metadata also describes the precise sensitivity
to light that the sensors described in the image have. An eye can
contain any number of these images. For example, the metadata for
an eye might look like this:

#+begin_src clojure
{0xFF0000 "Models/test-creature/retina-small.png"}
#+end_src

#+caption: An example retinal profile image. White pixels are
#+caption: photo-sensitive elements. The distribution of white
#+caption: pixels is denser in the middle and falls off at the
#+caption: edges and is inspired by the human retina.
#+name: retina
#+ATTR_LaTeX: :width 10cm
[[./images/retina-small.png]]

Together, the number 0xFF0000 and the image above describe the
placement of red-sensitive sensory elements.

Metadata to very crudely approximate a human eye might be
something like this:

#+begin_src clojure
(let [retinal-profile "Models/test-creature/retina-small.png"]
  {0xFF0000 retinal-profile
   0x00FF00 retinal-profile
   0x0000FF retinal-profile
   0xFFFFFF retinal-profile})
#+end_src

The numbers that serve as keys in the map determine a sensor's
relative sensitivity to the channels red, green, and blue. These
sensitivity values are packed into an integer in the order
=|_|R|G|B|= in 8-bit fields. The RGB values of a pixel in the
image are added together with these sensitivities as linear
weights. Therefore, 0xFF0000 means sensitive to red only while
0xFFFFFF means sensitive to all colors equally (gray).

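The weighting works as in the following sketch. =CORTEX='s actual
helper for this is the =pixel-sense= function used below; this
stand-alone reimplementation (with an assumed name) just shows the
arithmetic.

#+begin_src clojure
(defn pixel-sense-sketch
  "Weight a pixel's RGB channels by an =|_|R|G|B|= sensitivity mask,
  returning a single intensity in [0,1]."
  [sensitivity pixel]
  (let [field    (fn [x shift] (bit-and 0xFF (bit-shift-right x shift)))
        shifts   [16 8 0]                    ; R, G, B byte offsets
        weights  (map #(field sensitivity %) shifts)
        channels (map #(field pixel %) shifts)
        total    (* 255 (reduce + weights))] ; largest possible weighted sum
    (if (zero? total)
      0.0
      (double (/ (reduce + (map * weights channels)) total)))))

;; (pixel-sense-sketch 0xFF0000 0x80FF00) => 0.501... (red channel only)
;; (pixel-sense-sketch 0xFFFFFF 0x808080) => 0.501... (gray average)
#+end_src
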
#+caption: This is the core of vision in =CORTEX=. A given eye node
#+caption: is converted into a function that returns visual
#+caption: information from the simulation.
#+name: vision-kernel
#+begin_listing clojure
#+begin_src clojure
(defn vision-kernel
  "Returns a list of functions, each of which will return a color
  channel's worth of visual information when called inside a running
  simulation."
  [#^Node creature #^Spatial eye & {skip :skip :or {skip 0}}]
  (let [retinal-map (retina-sensor-profile eye)
        camera (add-eye! creature eye)
        vision-image
        (atom
         (BufferedImage. (.getWidth camera)
                         (.getHeight camera)
                         BufferedImage/TYPE_BYTE_BINARY))
        register-eye!
        (runonce
         (fn [world]
           (add-camera!
            world camera
            (let [counter (atom 0)]
              (fn [r fb bb bi]
                (if (zero? (rem (swap! counter inc) (inc skip)))
                  (reset! vision-image
                          (BufferedImage! r fb bb bi))))))))]
    (vec
     (map
      (fn [[key image]]
        (let [whites (white-coordinates image)
              topology (vec (collapse whites))
              sensitivity (sensitivity-presets key key)]
          (attached-viewport.
           (fn [world]
             (register-eye! world)
             (vector
              topology
              (vec
               (for [[x y] whites]
                 (pixel-sense
                  sensitivity
                  (.getRGB @vision-image x y))))))
           register-eye!)))
      retinal-map))))
#+end_src
#+end_listing

Note that since each of the functions generated by =vision-kernel=
shares the same =register-eye!= function, the eye will be
registered only once, the first time any of the functions from the
list returned by =vision-kernel= is called. Each of the functions
returned by =vision-kernel= also allows access to the =ViewPort=
through which it receives images.

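The once-only behavior comes from =runonce=, which is defined
elsewhere in =CORTEX=. A minimal sketch of the idea (the real
helper may differ in detail) is:

#+begin_src clojure
(defn runonce-sketch
  "Wrap 'function so that its body runs only on the first call;
  the first result is cached and returned on every later call."
  [function]
  (let [result (atom nil)
        run?   (atom false)]
    (fn [& args]
      (when (compare-and-set! run? false true)
        (reset! result (apply function args)))
      @result)))
#+end_src
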
All the hard work has been done; all that remains is to apply
=vision-kernel= to each eye in the creature and gather the results
into one list of functions.

#+caption: With =vision!=, =CORTEX= is already a fine simulation
#+caption: environment for experimenting with different types of
#+caption: eyes.
#+name: vision!
#+begin_listing clojure
#+begin_src clojure
(defn vision!
  "Returns a list of functions, each of which returns visual sensory
  data when called inside a running simulation."
  [#^Node creature & {skip :skip :or {skip 0}}]
  (reduce
   concat
   (for [eye (eyes creature)]
     (vision-kernel creature eye))))
#+end_src
#+end_listing
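
In use, the functions returned by =vision!= would be sampled inside
the simulation loop. The following usage sketch assumes a loaded
'creature and the =world= object from the rest of =CORTEX=;
=do-something-with= is a hypothetical stand-in for whatever
processing comes next.

#+begin_src clojure
(let [vision-fns (vision! creature)]
  (fn [world tpf] ; an update callback run every frame
    (doseq [sense-fn vision-fns]
      ;; each call returns retina coordinates paired with the
      ;; current intensity seen by each sensor on that channel
      (let [[topology sensor-values] (sense-fn world)]
        (do-something-with topology sensor-values)))))
#+end_src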

** Hearing is hard; =CORTEX= does it right

** Touch uses hundreds of hair-like elements