diff thesis/cortex.org @ 472:516a029e0be9
complete first draft of hearing.
| author   | Robert McIntyre <rlm@mit.edu>   |
|----------+---------------------------------|
| date     | Fri, 28 Mar 2014 18:14:04 -0400 |
| parents  | f14fa9e5b67f                    |
| children | 486ce07f5545                    |
--- a/thesis/cortex.org	Fri Mar 28 17:31:33 2014 -0400
+++ b/thesis/cortex.org	Fri Mar 28 18:14:04 2014 -0400
@@ -954,7 +954,7 @@
    #+ATTR_LaTeX: :width 15cm
    [[./images/physical-hand.png]]

-** Eyes reuse standard video game components
+** COMMENT Eyes reuse standard video game components

    Vision is one of the most important senses for humans, so I need to
    build a simulated sense of vision for my AI. I will do this with
@@ -1253,6 +1253,305 @@

** Hearing is hard; =CORTEX= does it right

   At the end of this section I will have simulated ears that work the
   same way as the simulated eyes in the last section. I will be able
   to place any number of ear-nodes in a blender file, and they will
   bind to the closest physical object and follow it as it moves
   around. Each ear will provide access to the sound data it picks up
   between every frame.

   Hearing is one of the more difficult senses to simulate, because
   there is less support for obtaining the actual sound data that is
   processed by jMonkeyEngine3. There is no "split-screen" support for
   rendering sound from different points of view, and there is no way
   to directly access the rendered sound data.

   =CORTEX='s hearing is unique because it does not have the
   limitations of other simulation environments. As far as I know,
   there is no other system that supports multiple listeners, and the
   sound demo at the end of this section is the first time multiple
   listeners have been demonstrated in a video game environment.

*** Brief Description of jMonkeyEngine's Sound System

   jMonkeyEngine's sound system works as follows:

   - jMonkeyEngine uses the =AppSettings= for the particular
     application to determine what sort of =AudioRenderer= should be
     used.
   - Although some support is provided for multiple =AudioRenderer=
     backends, jMonkeyEngine at the time of this writing will either
     pick no =AudioRenderer= at all, or the =LwjglAudioRenderer=.
   - jMonkeyEngine tries to figure out what sort of system you're
     running and extracts the appropriate native libraries.
   - The =LwjglAudioRenderer= uses the [[http://lwjgl.org/][=LWJGL=]] (LightWeight Java Game
     Library) bindings to interface with a C library called [[http://kcat.strangesoft.net/openal.html][=OpenAL=]].
   - =OpenAL= renders the 3D sound and feeds the rendered sound
     directly to any of various sound output devices with which it
     knows how to communicate.

   A consequence of this is that there's no way to access the actual
   sound data produced by =OpenAL=. Even worse, =OpenAL= only supports
   one /listener/ (it renders sound data from only one perspective),
   which normally isn't a problem for games, but becomes a problem
   when trying to make multiple AI creatures that can each hear the
   world from a different perspective.
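
   To make the single-listener limitation concrete, the listing below
   shows the ordinary =OpenAL= client boilerplate (generic =OpenAL=
   usage, not code from =CORTEX=): every context created with
   =alcCreateContext= carries exactly one implicit listener, positioned
   with =alListener3f=, and there is no call analogous to
   =alGenSources= for creating more of them.

   #+caption: Ordinary single-listener =OpenAL= setup (illustration only).
   #+begin_listing C
#include <AL/al.h>
#include <AL/alc.h>

int main(void){
  // Ordinary OpenAL setup: one device, one context.
  ALCdevice  *device  = alcOpenDevice(NULL);
  ALCcontext *context = alcCreateContext(device, NULL);
  alcMakeContextCurrent(context);

  // The listener is implicit and unique per context; sources, by
  // contrast, can be created in any number.
  alListener3f(AL_POSITION, 0.0f, 0.0f, 0.0f);
  ALuint source;
  alGenSources(1, &source);

  // ... play sounds; OpenAL mixes them from the one listener's
  // perspective and sends the result straight to the sound card ...

  alDeleteSources(1, &source);
  alcMakeContextCurrent(NULL);
  alcDestroyContext(context);
  alcCloseDevice(device);
  return 0;
}
   #+end_listing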

   To make many AI creatures in jMonkeyEngine that can each hear the
   world from their own perspective, or to make a single creature with
   many ears, it is necessary to go all the way back to =OpenAL= and
   implement support for simulated hearing there.

*** Extending =OpenAL=

   Extending =OpenAL= to support multiple listeners requires 500
   lines of =C= code and is too hairy to mention here. Instead, I
   will show a small amount of extension code and go over the high
   level strategy. Full source is of course available with the
   =CORTEX= distribution if you're interested.

   =OpenAL= goes to great lengths to support many different systems,
   all with different sound capabilities and interfaces. It
   accomplishes this difficult task by providing code for many
   different sound backends in pseudo-objects called /Devices/.
   There's a device for the Linux Open Sound System and the Advanced
   Linux Sound Architecture, there's one for Direct Sound on Windows,
   and there's even one for Solaris. =OpenAL= solves the problem of
   platform independence by providing all these Devices.

   Wrapper libraries such as LWJGL are free to examine the system on
   which they are running and then select an appropriate device for
   that system.

   There are also a few "special" devices that don't interface with
   any particular system. These include the Null Device, which
   doesn't do anything, and the Wave Device, which writes whatever
   sound it receives to a file, if everything has been set up
   correctly when configuring =OpenAL=.

   Actual mixing (Doppler shift and distance- and environment-based
   attenuation) of the sound data happens in the Devices, and they
   are the only point in the sound rendering process where this data
   is available.

   Therefore, in order to support multiple listeners, and get the
   sound data in a form that the AIs can use, it is necessary to
   create a new Device which supports this feature.

   Adding a device to =OpenAL= is rather tricky -- there are five
   separate files in the =OpenAL= source tree that must be modified
   to do so. I named my device the "Multiple Audio Send" Device, or
   =Send= Device for short, since it sends audio data back to the
   calling application like an Aux-Send on a mixing board.

   The main idea behind the Send device is to take advantage of the
   fact that LWJGL only manages one /context/ when using OpenAL. A
   /context/ is like a container that holds samples and keeps track
   of where the listener is. In order to support multiple listeners,
   the Send device identifies the LWJGL context as the master
   context, and creates any number of slave contexts to represent
   additional listeners. Every time the device renders sound, it
   synchronizes every source from the master LWJGL context to the
   slave contexts. Then, it renders each context separately, using a
   different listener for each one. The rendered sound is made
   available via JNI to jMonkeyEngine.

   Switching between contexts is not the normal operation of a
   Device, and one of the problems with doing so is that a Device
   normally keeps around a few pieces of global state, such as the
   =ClickRemoval= array, which will become corrupted if the contexts
   are not rendered in parallel. The solution is to create a copy of
   this normally global device state for each context, and copy it
   back and forth into and out of the actual device state whenever a
   context is rendered.
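
   Put together, one render pass of the =Send= device works roughly
   like the sketch below. The type and helper names (=send_data=,
   =sync_contexts=, =swap_in_state=, =swap_out_state=,
   =render_context=) are placeholders invented for this illustration,
   not the identifiers used in the actual =CORTEX= patch; only the
   =ALCcontext= machinery is real =OpenAL=.

   #+caption: Sketch of one render pass of the =Send= device
   #+caption: (illustrative names, not the actual =CORTEX= code).
   #+begin_listing C
#include <AL/al.h>
#include <AL/alc.h>

/* Hypothetical names for illustration only. */
typedef struct {
  ALCcontext  *master;     /* the one context LWJGL knows about      */
  ALCcontext **slaves;     /* one extra context per extra listener   */
  ALvoid     **buffers;    /* rendered samples, one buffer per slave */
  int          num_slaves;
} send_data;

void sync_contexts(ALCcontext *master, ALCcontext **slaves, int n);
void swap_in_state (send_data *d, ALCcontext *ctx);
void swap_out_state(send_data *d, ALCcontext *ctx);
void render_context(ALCcontext *ctx, ALvoid *buffer, ALuint samples);

/* One render pass of the Send device. */
void send_render(send_data *d, ALuint samples){
  /* 1. Mirror every source of the master context into the slaves.  */
  sync_contexts(d->master, d->slaves, d->num_slaves);

  for (int i = 0; i < d->num_slaves; i++){
    /* 2. Swap in this context's private copy of the normally
       global device state (e.g. the ClickRemoval buffers).          */
    swap_in_state(d, d->slaves[i]);

    /* 3. Mix the scene from this context's own listener.           */
    render_context(d->slaves[i], d->buffers[i], samples);

    /* 4. Swap the state back out so the next context starts clean. */
    swap_out_state(d, d->slaves[i]);
  }
  /* The filled buffers are later handed to jMonkeyEngine via JNI.  */
}
   #+end_listing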

   The core of the =Send= device is the =syncSources= function, which
   does the job of copying all relevant data from one context to
   another.

   #+caption: Program for extending =OpenAL= to support multiple
   #+caption: listeners via context copying/switching.
   #+name: sync-openal-sources
   #+begin_listing C
void syncSources(ALsource *masterSource, ALsource *slaveSource,
                 ALCcontext *masterCtx, ALCcontext *slaveCtx){
  ALuint master = masterSource->source;
  ALuint slave = slaveSource->source;
  ALCcontext *current = alcGetCurrentContext();

  syncSourcef(master,slave,masterCtx,slaveCtx,AL_PITCH);
  syncSourcef(master,slave,masterCtx,slaveCtx,AL_GAIN);
  syncSourcef(master,slave,masterCtx,slaveCtx,AL_MAX_DISTANCE);
  syncSourcef(master,slave,masterCtx,slaveCtx,AL_ROLLOFF_FACTOR);
  syncSourcef(master,slave,masterCtx,slaveCtx,AL_REFERENCE_DISTANCE);
  syncSourcef(master,slave,masterCtx,slaveCtx,AL_MIN_GAIN);
  syncSourcef(master,slave,masterCtx,slaveCtx,AL_MAX_GAIN);
  syncSourcef(master,slave,masterCtx,slaveCtx,AL_CONE_OUTER_GAIN);
  syncSourcef(master,slave,masterCtx,slaveCtx,AL_CONE_INNER_ANGLE);
  syncSourcef(master,slave,masterCtx,slaveCtx,AL_CONE_OUTER_ANGLE);
  syncSourcef(master,slave,masterCtx,slaveCtx,AL_SEC_OFFSET);
  syncSourcef(master,slave,masterCtx,slaveCtx,AL_SAMPLE_OFFSET);
  syncSourcef(master,slave,masterCtx,slaveCtx,AL_BYTE_OFFSET);

  syncSource3f(master,slave,masterCtx,slaveCtx,AL_POSITION);
  syncSource3f(master,slave,masterCtx,slaveCtx,AL_VELOCITY);
  syncSource3f(master,slave,masterCtx,slaveCtx,AL_DIRECTION);

  syncSourcei(master,slave,masterCtx,slaveCtx,AL_SOURCE_RELATIVE);
  syncSourcei(master,slave,masterCtx,slaveCtx,AL_LOOPING);

  alcMakeContextCurrent(masterCtx);
  ALint source_type;
  alGetSourcei(master, AL_SOURCE_TYPE, &source_type);

  // Only static sources are currently synchronized!
  if (AL_STATIC == source_type){
    ALint master_buffer;
    ALint slave_buffer;
    alGetSourcei(master, AL_BUFFER, &master_buffer);
    alcMakeContextCurrent(slaveCtx);
    alGetSourcei(slave, AL_BUFFER, &slave_buffer);
    if (master_buffer != slave_buffer){
      alSourcei(slave, AL_BUFFER, master_buffer);
    }
  }

  // Synchronize the state of the two sources.
  alcMakeContextCurrent(masterCtx);
  ALint masterState;
  ALint slaveState;

  alGetSourcei(master, AL_SOURCE_STATE, &masterState);
  alcMakeContextCurrent(slaveCtx);
  alGetSourcei(slave, AL_SOURCE_STATE, &slaveState);

  if (masterState != slaveState){
    switch (masterState){
    case AL_INITIAL : alSourceRewind(slave); break;
    case AL_PLAYING : alSourcePlay(slave);   break;
    case AL_PAUSED  : alSourcePause(slave);  break;
    case AL_STOPPED : alSourceStop(slave);   break;
    }
  }
  // Restore whatever context was previously active.
  alcMakeContextCurrent(current);
}
   #+end_listing
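
   The helpers =syncSourcef=, =syncSource3f= and =syncSourcei= are not
   reproduced here; the listing below is a plausible reconstruction of
   =syncSourcef= (the other two differ only in the getter and setter
   they call), intended as a sketch of the idea rather than the exact
   code shipped with =CORTEX=.

   #+caption: Plausible reconstruction of the =syncSourcef= helper.
   #+begin_listing C
// Read one float property of the master source in the master
// context, write it to the slave source in the slave context, and
// restore whichever context was current.  The other sync helpers
// follow the same pattern with alGetSource3f/alSource3f and
// alGetSourcei/alSourcei.
void syncSourcef(ALuint master, ALuint slave,
                 ALCcontext *masterCtx, ALCcontext *slaveCtx,
                 ALenum param){
  ALfloat value;
  ALCcontext *current = alcGetCurrentContext();

  alcMakeContextCurrent(masterCtx);
  alGetSourcef(master, param, &value);

  alcMakeContextCurrent(slaveCtx);
  alSourcef(slave, param, value);

  alcMakeContextCurrent(current);
}
   #+end_listing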

   With this special context-switching device, and some ugly JNI
   bindings that are not worth mentioning, =CORTEX= gains the ability
   to access multiple sound streams from =OpenAL=.

   #+caption: Program to create an ear from a blender empty node. The ear
   #+caption: follows around the nearest physical object and passes
   #+caption: all sensory data to a continuation function.
   #+name: add-ear
   #+begin_listing clojure
(defn add-ear!
  "Create a Listener centered on the current position of 'ear
   which follows the closest physical node in 'creature and
   sends sound data to 'continuation."
  [#^Application world #^Node creature #^Spatial ear continuation]
  (let [target (closest-node creature ear)
        lis (Listener.)
        audio-renderer (.getAudioRenderer world)
        sp (hearing-pipeline continuation)]
    (.setLocation lis (.getWorldTranslation ear))
    (.setRotation lis (.getWorldRotation ear))
    (bind-sense target lis)
    (update-listener-velocity! target lis)
    (.addListener audio-renderer lis)
    (.registerSoundProcessor audio-renderer lis sp)))
   #+end_listing

   The =Send= device, unlike most of the other devices in =OpenAL=,
   does not render sound unless asked. This enables the system to
   slow down or speed up depending on the needs of the AIs who are
   using it to listen. If the device tried to render samples in
   real-time, a complicated AI whose mind takes 100 seconds of
   computer time to simulate 1 second of AI-time would miss almost
   all of the sound in its environment!
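
   At the native level, "render on demand" might look something like
   the sketch below: a JNI entry point that the simulation calls
   whenever it wants the next block of samples for every listener. The
   class, method, and helper names here are placeholders for
   illustration; the actual JNI bindings shipped with =CORTEX= use
   their own identifiers.

   #+caption: Hypothetical sketch of a render-on-demand JNI entry point.
   #+begin_listing C
#include <jni.h>
#include <stdint.h>
#include <AL/al.h>
#include <AL/alc.h>

/* Placeholder for the per-context synchronize/render loop
   described above. */
void render_all_listeners(ALCdevice *device, ALuint samples);

/* Hypothetical native method: advance the audio simulation by
   exactly `samples` frames for every listener.  No sound is mixed
   between calls, so a slow AI can simply ask for audio less often
   per unit of real time without losing any of it. */
JNIEXPORT void JNICALL
Java_com_example_AudioSend_nstep(JNIEnv *env, jclass cls,
                                 jlong deviceHandle, jint samples){
  ALCdevice *device = (ALCdevice *)(intptr_t) deviceHandle;
  render_all_listeners(device, (ALuint) samples);
}
   #+end_listing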

   #+caption: Program to enable arbitrary hearing in =CORTEX=
   #+name: hearing
   #+begin_listing clojure
(defn hearing-kernel
  "Returns a function which returns auditory sensory data when called
   inside a running simulation."
  [#^Node creature #^Spatial ear]
  (let [hearing-data (atom [])
        register-listener!
        (runonce
         (fn [#^Application world]
           (add-ear!
            world creature ear
            (comp #(reset! hearing-data %)
                  byteBuffer->pulse-vector))))]
    (fn [#^Application world]
      (register-listener! world)
      (let [data @hearing-data
            topology
            (vec (map #(vector % 0) (range 0 (count data))))]
        [topology data]))))

(defn hearing!
  "Endow the creature in a particular world with the sense of
   hearing. Will return a sequence of functions, one for each ear,
   which when called will return the auditory data from that ear."
  [#^Node creature]
  (for [ear (ears creature)]
    (hearing-kernel creature ear)))
   #+end_listing

   Armed with these functions, =CORTEX= is able to test possibly the
   first ever instance of multiple listeners in a
   video-game-engine-based simulation!

   #+caption: Here a simple creature responds to sound by changing
   #+caption: its color from gray to green when the total volume
   #+caption: goes over a threshold.
   #+name: sound-test
   #+begin_listing java
/**
 * Respond to sound!  This is the brain of an AI entity that
 * hears its surroundings and reacts to them.
 */
public void process(ByteBuffer audioSamples,
                    int numSamples, AudioFormat format) {
    audioSamples.clear();
    byte[] data = new byte[numSamples];
    float[] out = new float[numSamples];
    audioSamples.get(data);
    FloatSampleTools.
        byte2floatInterleaved
        (data, 0, out, 0, numSamples/format.getFrameSize(), format);

    float max = Float.NEGATIVE_INFINITY;
    for (float f : out){if (f > max) max = f;}
    audioSamples.clear();

    if (max > 0.1){
        entity.getMaterial().setColor("Color", ColorRGBA.Green);
    }
    else {
        entity.getMaterial().setColor("Color", ColorRGBA.Gray);
    }
}
   #+end_listing

   #+caption: First ever simulation of multiple listeners in =CORTEX=.
   #+caption: Each cube is a creature which processes sound data with
   #+caption: the =process= function from listing \ref{sound-test}.
   #+caption: The ball is constantly emitting a pure tone of
   #+caption: constant volume. As it approaches the cubes, they each
   #+caption: change color in response to the sound.
   #+name: sound-cubes
   #+ATTR_LaTeX: :width 10cm
   [[./images/aurellem-gray.png]]

   This system of hearing has also been co-opted by the
   jMonkeyEngine3 community and is used to record audio for demo
   videos.

** Touch uses hundreds of hair-like elements

** Proprioception is the sense that makes everything ``real''