Mercurial repository: cortex
comparison: thesis/cortex.org @ 470:3401053124b0
"integrating vision into thesis."

author:   Robert McIntyre <rlm@mit.edu>
date:     Fri, 28 Mar 2014 17:10:43 -0400
parents:  ae10f35022ba
children: f14fa9e5b67f
comparing 469:ae10f35022ba (parent) with 470:3401053124b0 (this changeset)

#+description: Using embodied AI to facilitate Artificial Imagination.
#+keywords: AI, clojure, embodiment
#+LaTeX_CLASS_OPTIONS: [nofloat]

* COMMENT templates
#+caption:
#+caption:
#+caption:
#+caption:
#+name: name
#+begin_listing clojure
#+end_listing

#+caption:
#+caption:
#+caption:
#+name: name
#+ATTR_LaTeX: :width 10cm
[[./images/aurellem-gray.png]]

#+caption:
#+caption:
#+caption:
#+caption:
#+name: name
#+begin_listing clojure
#+end_listing

#+caption:
#+caption:
#+caption:
#+name: name
#+ATTR_LaTeX: :width 10cm
[[./images/aurellem-gray.png]]

* COMMENT Empathy and Embodiment as problem solving strategies

By the end of this thesis, you will have seen a novel approach to
interpreting video using embodiment and empathy. You will have also
[...]

#+name: physical-hand
#+ATTR_LaTeX: :width 15cm
[[./images/physical-hand.png]]

** Eyes reuse standard video game components

Vision is one of the most important senses for humans, so I need to
build a simulated sense of vision for my AI. I will do this with
simulated eyes. Each eye can be independently moved and should see
its own version of the world depending on where it is.

Making these simulated eyes a reality is simple because
jMonkeyEngine already contains extensive support for multiple views
of the same 3D simulated world. jMonkeyEngine has this support
because it is necessary for creating games with split-screen
views. Multiple views are also used to create efficient
pseudo-reflections by rendering the scene from a certain
perspective and then projecting it back onto a surface in the 3D
world.

#+caption: jMonkeyEngine supports multiple views to enable
#+caption: split-screen games, like GoldenEye, which was one of
#+caption: the first games to use split-screen views.
#+name: goldeneye
#+ATTR_LaTeX: :width 10cm
[[./images/goldeneye-4-player.png]]

*** A Brief Description of jMonkeyEngine's Rendering Pipeline

jMonkeyEngine allows you to create a =ViewPort=, which represents a
view of the simulated world. You can create as many of these as you
want. Every frame, the =RenderManager= iterates through each
=ViewPort=, rendering the scene in the GPU. For each =ViewPort= there
is a =FrameBuffer= which represents the rendered image in the GPU.

#+caption: =ViewPorts= are cameras in the world. During each frame,
#+caption: the =RenderManager= records a snapshot of what each view
#+caption: is currently seeing; these snapshots are =FrameBuffer= objects.
#+name: rendermanager-diagram
#+ATTR_LaTeX: :width 10cm
[[./images/diagram_rendermanager2.png]]

Each =ViewPort= can have any number of attached =SceneProcessor=
objects, which are called every time a new frame is rendered. A
=SceneProcessor= receives its =ViewPort='s =FrameBuffer= and can do
whatever it wants with the data. Often this consists of invoking
GPU-specific operations on the rendered image. The =SceneProcessor=
can also copy the GPU image data to RAM and process it with the CPU.

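As a concrete sketch of how these pieces fit together (my
illustration, not code from =CORTEX=), an extra =ViewPort= with an
attached =SceneProcessor= might be created through Clojure's Java
interop as follows; =render-manager=, =root-node=, and =processor=
are assumed to come from a running jMonkeyEngine application:

#+begin_src clojure
(import '(com.jme3.renderer Camera))

;; Sketch only: render-manager, root-node, and processor are assumed
;; to exist in a running jMonkeyEngine application.
(defn attach-eye-view
  [render-manager root-node processor]
  (let [cam  (Camera. 640 480)
        view (.createMainView render-manager "eye-view" cam)]
    (.setClearFlags view true true true) ; clear color, depth, stencil
    (.attachScene view root-node)        ; this view renders the scene
    (.addProcessor view processor)       ; processor runs every frame
    view))
#+end_src
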
*** Appropriating Views for Vision

Each eye in the simulated creature needs its own =ViewPort= so
that it can see the world from its own perspective. To this
=ViewPort=, I add a =SceneProcessor= that feeds the visual data to
any arbitrary continuation function for further processing. That
continuation function may perform both CPU and GPU operations on
the data. To make this easy for the continuation function, the
=SceneProcessor= maintains appropriately sized buffers in RAM to
hold the data. It does not do any copying from the GPU to the CPU
itself, because that is a slow operation.

#+caption: Function to make the rendered scene in jMonkeyEngine
#+caption: available for further processing.
#+name: pipeline-1
#+begin_listing clojure
#+begin_src clojure
(defn vision-pipeline
  "Create a SceneProcessor object which wraps a vision processing
  continuation function. The continuation is a function that takes
  [#^Renderer r #^FrameBuffer fb #^ByteBuffer b #^BufferedImage bi],
  each of which has already been appropriately sized."
  [continuation]
  ;; assumes com.jme3.post.SceneProcessor, com.jme3.util.BufferUtils,
  ;; and java.awt.image.BufferedImage are imported.
  (let [byte-buffer (atom nil)
        renderer (atom nil)
        image (atom nil)]
    (proxy [SceneProcessor] []
      (initialize
        [renderManager viewPort]
        (let [cam (.getCamera viewPort)
              width (.getWidth cam)
              height (.getHeight cam)]
          (reset! renderer (.getRenderer renderManager))
          (reset! byte-buffer
                  (BufferUtils/createByteBuffer
                   (* width height 4)))
          (reset! image (BufferedImage.
                         width height
                         BufferedImage/TYPE_4BYTE_ABGR))))
      (isInitialized [] (not (nil? @byte-buffer)))
      (reshape [_ _ _])
      (preFrame [_])
      (postQueue [_])
      (postFrame
        [#^FrameBuffer fb]
        (.clear @byte-buffer)
        (continuation @renderer fb @byte-buffer @image))
      (cleanup []))))
#+end_src
#+end_listing

The continuation function given to =vision-pipeline= above will be
given a =Renderer= and three containers for image data. The
=FrameBuffer= references the GPU image data, but the pixel data
cannot be used directly on the CPU. The =ByteBuffer= and
=BufferedImage= are initially "empty" but are sized to hold the
data in the =FrameBuffer=. I call transferring the GPU image data
to the CPU structures "mixing" the image data.

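A continuation that performs this "mixing" might look like the
following sketch. It uses jMonkeyEngine's =Renderer.readFrameBuffer=
and =Screenshots/convertScreenShot=; the helper name =mix-image!= is
mine, not from the thesis:

#+begin_src clojure
(import '(com.jme3.util Screenshots))

(defn mix-image!
  "Sketch of a continuation for vision-pipeline: copy the GPU pixels
   in fb into byte-buffer, then decode them into buffered-image."
  [renderer fb byte-buffer buffered-image]
  ;; read the raw pixels from the GPU into the ByteBuffer
  (.readFrameBuffer renderer fb byte-buffer)
  ;; decode the ByteBuffer into the CPU-side BufferedImage
  (Screenshots/convertScreenShot byte-buffer buffered-image)
  buffered-image)
#+end_src
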
*** Optical sensor arrays are described with images and referenced with metadata

The vision pipeline described above handles the flow of rendered
images. Now, =CORTEX= needs simulated eyes to serve as the source
of these images.

An eye is described in blender in the same way as a joint: eyes
are zero-dimensional empty objects with no geometry whose local
coordinate system determines the orientation of the resulting eye.
All eyes are children of a parent node named "eyes", just as all
joints have a parent named "joints". An eye binds to the nearest
physical object with =bind-sense=.

#+caption: Here, the camera is created based on metadata on the
#+caption: eye-node and attached to the nearest physical object
#+caption: with =bind-sense=.
#+name: add-eye
#+begin_listing clojure
#+begin_src clojure
(defn add-eye!
  "Create a Camera centered on the current position of 'eye which
   follows the closest physical node in 'creature. The camera will
   point in the X direction and use the Z vector as up as determined
   by the rotation of these vectors in blender coordinate space. Use
   XZY rotation for the node in blender."
  [#^Node creature #^Spatial eye]
  (let [target (closest-node creature eye)
        [cam-width cam-height]
        ;;[640 480] ;; graphics card on laptop doesn't support
        ;; arbitrary dimensions.
        (eye-dimensions eye)
        cam (Camera. cam-width cam-height)
        rot (.getWorldRotation eye)]
    (.setLocation cam (.getWorldTranslation eye))
    (.lookAtDirection
     cam                          ; this part is not a mistake and
     (.mult rot Vector3f/UNIT_X)  ; is consistent with using Z in
     (.mult rot Vector3f/UNIT_Y)) ; blender as the UP vector.
    (.setFrustumPerspective
     cam (float 45)
     (float (/ (.getWidth cam) (.getHeight cam)))
     (float 1)
     (float 1000))
    (bind-sense target cam) cam))
#+end_src
#+end_listing

*** Simulated Retina

An eye is a surface (the retina) which contains many discrete
sensors to detect light. These sensors can have different
light-sensing properties. In humans, each discrete sensor is
sensitive to red, blue, green, or gray. These different types of
sensors can have different spatial distributions along the retina.
In humans, there is a fovea in the center of the retina which has
a very high density of color sensors, and a blind spot which has
no sensors at all. Sensor density decreases in proportion to
distance from the fovea.

I want to be able to model any retinal configuration, so my
eye-nodes in blender contain metadata pointing to images that
describe the precise position of the individual sensors using
white pixels. The metadata also describes the precise light
sensitivity of the sensors that the image describes. An eye can
contain any number of these images. For example, the metadata for
an eye might look like this:

#+begin_src clojure
{0xFF0000 "Models/test-creature/retina-small.png"}
#+end_src

#+caption: An example retinal profile image. White pixels are
#+caption: photo-sensitive elements. The distribution of white
#+caption: pixels is denser in the middle and falls off at the
#+caption: edges, and is inspired by the human retina.
#+name: retina
#+ATTR_LaTeX: :width 10cm
[[./images/retina-small.png]]

Together, the number 0xFF0000 and the image above describe the
placement of red-sensitive sensory elements.

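The =vision-kernel= listing below finds these white pixels with a
helper called =white-coordinates=. A minimal sketch of what such a
helper could look like (my reconstruction, not the thesis's code):

#+begin_src clojure
;; Sketch: collect the [x y] coordinates of every white pixel.
;; The actual white-coordinates in CORTEX may differ in detail.
(defn white-coordinates
  [#^BufferedImage image]
  (for [x (range (.getWidth image))
        y (range (.getHeight image))
        ;; mask off the alpha byte before comparing to pure white
        :when (= 0xFFFFFF (bit-and 0xFFFFFF (.getRGB image x y)))]
    [x y]))
#+end_src
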
Metadata to very crudely approximate a human eye might be
something like this:

#+begin_src clojure
(let [retinal-profile "Models/test-creature/retina-small.png"]
  {0xFF0000 retinal-profile
   0x00FF00 retinal-profile
   0x0000FF retinal-profile
   0xFFFFFF retinal-profile})
#+end_src

The numbers that serve as keys in the map determine a sensor's
relative sensitivity to the channels red, green, and blue. These
sensitivity values are packed into an integer in the order
=|_|R|G|B|= in 8-bit fields. The RGB values of a pixel in the
image are added together with these sensitivities as linear
weights. Therefore, 0xFF0000 means sensitive to red only while
0xFFFFFF means sensitive to all colors equally (gray).

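To make the weighting concrete, here is a minimal sketch of how such
a =pixel-sense= function could work; it is my illustration of the
scheme just described, not necessarily the exact implementation used
in =CORTEX=:

#+begin_src clojure
(defn pixel-sense
  "Weight the RGB channels of pixel by the 8-bit fields of
   sensitivity, normalizing the result to the range [0,1]."
  [sensitivity pixel]
  (let [s-r (bit-and 0xFF (bit-shift-right sensitivity 16))
        s-g (bit-and 0xFF (bit-shift-right sensitivity 8))
        s-b (bit-and 0xFF sensitivity)
        p-r (bit-and 0xFF (bit-shift-right pixel 16))
        p-g (bit-and 0xFF (bit-shift-right pixel 8))
        p-b (bit-and 0xFF pixel)]
    (/ (float (+ (* s-r p-r) (* s-g p-g) (* s-b p-b)))
       (* 255.0 (max 1 (+ s-r s-g s-b))))))

;; (pixel-sense 0xFF0000 0x7F0000) => ~0.498  (half-bright red)
#+end_src
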
#+caption: This is the core of vision in =CORTEX=. A given eye node
#+caption: is converted into a function that returns visual
#+caption: information from the simulation.
#+name: vision-kernel
#+begin_listing clojure
#+begin_src clojure
(defn vision-kernel
  "Returns a list of functions, each of which will return a color
   channel's worth of visual information when called inside a running
   simulation."
  [#^Node creature #^Spatial eye & {skip :skip :or {skip 0}}]
  (let [retinal-map (retina-sensor-profile eye)
        camera (add-eye! creature eye)
        vision-image
        (atom
         (BufferedImage. (.getWidth camera)
                         (.getHeight camera)
                         BufferedImage/TYPE_BYTE_BINARY))
        register-eye!
        (runonce
         (fn [world]
           (add-camera!
            world camera
            (let [counter (atom 0)]
              (fn [r fb bb bi]
                (if (zero? (rem (swap! counter inc) (inc skip)))
                  (reset! vision-image
                          (BufferedImage! r fb bb bi))))))))]
    (vec
     (map
      (fn [[key image]]
        (let [whites (white-coordinates image)
              topology (vec (collapse whites))
              sensitivity (sensitivity-presets key key)]
          (attached-viewport.
           (fn [world]
             (register-eye! world)
             (vector
              topology
              (vec
               (for [[x y] whites]
                 (pixel-sense
                  sensitivity
                  (.getRGB @vision-image x y))))))
           register-eye!)))
      retinal-map))))
#+end_src
#+end_listing

Note that since each of the functions generated by =vision-kernel=
shares the same =register-eye!= function, the eye will be
registered only once, the first time any of the functions from the
list returned by =vision-kernel= is called. Each of the functions
returned by =vision-kernel= also allows access to the =ViewPort=
through which it receives images.

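The =runonce= combinator itself is not shown in this section, but
its behavior is easy to reconstruct. A minimal sketch, assuming only
that the first call's result should be cached and returned on all
later calls:

#+begin_src clojure
;; Sketch of a runonce combinator consistent with the usage above;
;; the actual CORTEX implementation may differ.
(defn runonce
  [f]
  (let [ran?   (atom false)
        result (atom nil)]
    (fn [& args]
      ;; only the first caller flips ran? and computes the result
      (when (compare-and-set! ran? false true)
        (reset! result (apply f args)))
      @result)))
#+end_src
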
All the hard work has been done; all that remains is to apply
=vision-kernel= to each eye in the creature and gather the results
into one list of functions.

#+caption: With =vision!=, =CORTEX= is already a fine simulation
#+caption: environment for experimenting with different types of
#+caption: eyes.
#+name: vision!
#+begin_listing clojure
#+begin_src clojure
(defn vision!
  "Returns a list of functions, each of which returns visual sensory
  data when called inside a running simulation."
  [#^Node creature & {skip :skip :or {skip 0}}]
  (reduce
   concat
   (for [eye (eyes creature)]
     (vision-kernel creature eye))))
#+end_src
#+end_listing
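
As a usage sketch (the names =creature= and =world= stand in for
objects from a running =CORTEX= simulation), the functions returned
by =vision!= could be polled once per simulation step:

#+begin_src clojure
;; Sketch: poll every visual channel once per simulation step.
(def vision-fns (vision! creature))

(defn visual-data
  "Return a seq of [topology sensor-values] pairs, one per channel."
  [world]
  (map (fn [channel-fn] (channel-fn world)) vision-fns))
#+end_src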
** Hearing is hard; =CORTEX= does it right

** Touch uses hundreds of hair-like elements