diff thesis/cortex.org @ 472:516a029e0be9

complete first draft of hearing.
author Robert McIntyre <rlm@mit.edu>
date Fri, 28 Mar 2014 18:14:04 -0400
parents f14fa9e5b67f
children 486ce07f5545
     1.1 --- a/thesis/cortex.org	Fri Mar 28 17:31:33 2014 -0400
     1.2 +++ b/thesis/cortex.org	Fri Mar 28 18:14:04 2014 -0400
     1.3 @@ -954,7 +954,7 @@
     1.4      #+ATTR_LaTeX: :width 15cm
     1.5      [[./images/physical-hand.png]]
     1.6  
     1.7 -** Eyes reuse standard video game components
     1.8 +** COMMENT Eyes reuse standard video game components
     1.9  
    1.10     Vision is one of the most important senses for humans, so I need to
    1.11     build a simulated sense of vision for my AI. I will do this with
    1.12 @@ -1253,6 +1253,305 @@
    1.13  
    1.14  ** Hearing is hard; =CORTEX= does it right
    1.15  
    1.16 +   At the end of this section I will have simulated ears that work the
    1.17 +   same way as the simulated eyes in the last section. I will be able to
    1.18 +   place any number of ear-nodes in a blender file, and they will bind to
    1.19 +   the closest physical object and follow it as it moves around. Each ear
    1.20 +   will provide access to the sound data it picks up between every frame.
    1.21 +
     1.22 +   Hearing is one of the more difficult senses to simulate, because
     1.23 +   jMonkeyEngine3 provides little support for obtaining the actual
     1.24 +   sound data that it processes. There is no "split-screen" support
     1.25 +   for rendering sound from different points of view, and there is no
     1.26 +   way to directly access the rendered sound data.
    1.27 +
     1.28 +   =CORTEX='s hearing is unique because it suffers from none of
     1.29 +   these limitations. As far as I know, no other simulation
     1.30 +   environment supports multiple listeners, and the sound demo at the
     1.31 +   end of this section is the first time this has been done in a
     1.32 +   video game environment.
    1.33 +
    1.34 +*** Brief Description of jMonkeyEngine's Sound System
    1.35 +
    1.36 +   jMonkeyEngine's sound system works as follows:
    1.37 +
     1.38 +   - jMonkeyEngine uses the =AppSettings= for the particular
     1.39 +     application to determine what sort of =AudioRenderer= should be
     1.40 +     used (see the sketch after this list).
     1.41 +   - Although some support is provided for multiple audio rendering
     1.42 +     backends, jMonkeyEngine at the time of this writing will either
     1.43 +     pick no =AudioRenderer= at all, or the =LwjglAudioRenderer=.
     1.44 +   - jMonkeyEngine tries to figure out what sort of system you're
     1.45 +     running and extracts the appropriate native libraries.
     1.46 +   - The =LwjglAudioRenderer= uses the [[http://lwjgl.org/][=LWJGL=]] (LightWeight Java Game
     1.47 +     Library) bindings to interface with a C library called [[http://kcat.strangesoft.net/openal.html][=OpenAL=]].
    1.48 +   - =OpenAL= renders the 3D sound and feeds the rendered sound
    1.49 +     directly to any of various sound output devices with which it
    1.50 +     knows how to communicate.
    1.51 +  
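          +   The choice of backend is exposed to applications through the
          +   =AppSettings= object. As a minimal sketch (this is the standard
          +   jMonkeyEngine API, not =CORTEX= code), selecting the default
          +   =LWJGL=/=OpenAL= backend from Clojure looks like this:
          +
          +   #+caption: Sketch of selecting an =AudioRenderer= through
          +   #+caption: =AppSettings= (standard jMonkeyEngine API).
          +   #+name: select-audio-renderer
          +   #+begin_listing clojure
          +(import 'com.jme3.system.AppSettings)
          +
          +(defn lwjgl-audio-settings
          +  "Build settings that select the LwjglAudioRenderer, which
          +   reaches OpenAL through the LWJGL bindings. Pass the result to
          +   an Application via .setSettings before starting it."
          +  []
          +  (doto (AppSettings. true)
          +    (.setAudioRenderer AppSettings/LWJGL_OPENAL)))
          +   #+end_listing
          +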
    1.52 +   A consequence of this is that there's no way to access the actual
    1.53 +   sound data produced by =OpenAL=. Even worse, =OpenAL= only supports
    1.54 +   one /listener/ (it renders sound data from only one perspective),
    1.55 +   which normally isn't a problem for games, but becomes a problem
    1.56 +   when trying to make multiple AI creatures that can each hear the
    1.57 +   world from a different perspective.
    1.58 +
    1.59 +   To make many AI creatures in jMonkeyEngine that can each hear the
    1.60 +   world from their own perspective, or to make a single creature with
    1.61 +   many ears, it is necessary to go all the way back to =OpenAL= and
    1.62 +   implement support for simulated hearing there.
    1.63 +
     1.64 +*** Extending =OpenAL=
    1.65 +
     1.66 +    Extending =OpenAL= to support multiple listeners requires 500
     1.67 +    lines of =C= code and is too hairy to reproduce here in full.
     1.68 +    Instead, I will show a small amount of extension code and go
     1.69 +    over the high-level strategy. The full source is of course
     1.70 +    available with the =CORTEX= distribution if you're interested.
    1.71 +
    1.72 +    =OpenAL= goes to great lengths to support many different systems,
    1.73 +    all with different sound capabilities and interfaces. It
    1.74 +    accomplishes this difficult task by providing code for many
    1.75 +    different sound backends in pseudo-objects called /Devices/.
    1.76 +    There's a device for the Linux Open Sound System and the Advanced
    1.77 +    Linux Sound Architecture, there's one for Direct Sound on Windows,
    1.78 +    and there's even one for Solaris. =OpenAL= solves the problem of
    1.79 +    platform independence by providing all these Devices.
    1.80 +
    1.81 +    Wrapper libraries such as LWJGL are free to examine the system on
    1.82 +    which they are running and then select an appropriate device for
    1.83 +    that system.
    1.84 +
    1.85 +    There are also a few "special" devices that don't interface with
    1.86 +    any particular system. These include the Null Device, which
    1.87 +    doesn't do anything, and the Wave Device, which writes whatever
    1.88 +    sound it receives to a file, if everything has been set up
    1.89 +    correctly when configuring =OpenAL=.
    1.90 +
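          +    To make the Device idea concrete, here is a schematic sketch.
          +    These are not =OpenAL='s actual struct definitions (which
          +    differ between versions); the point is that each backend fills
          +    in a table of function pointers, and the mixer only ever talks
          +    to a backend through that table.
          +
          +    #+caption: Schematic sketch of the Device abstraction. The
          +    #+caption: struct layout is illustrative, not =OpenAL='s own.
          +    #+name: device-sketch
          +    #+begin_listing C
          +/* Each backend is a table of function pointers. */
          +typedef struct Device {
          +  const char *name;
          +  int  (*open)(struct Device *self);
          +  void (*write)(struct Device *self, const float *samples, int n);
          +  void (*close)(struct Device *self);
          +} Device;
          +
          +/* The Null device accepts everything and does nothing with it. */
          +static int  nullOpen (struct Device *self){ return 1; }
          +static void nullWrite(struct Device *self, const float *s, int n){}
          +static void nullClose(struct Device *self){}
          +
          +static Device nullDevice = {"Null", nullOpen, nullWrite, nullClose};
          +    #+end_listing
          +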
     1.91 +    Actual mixing (Doppler shift and distance- and
     1.92 +    environment-based attenuation) of the sound data happens in the
     1.93 +    Devices, and they are the only point in the sound rendering
     1.94 +    process where this data is available.
    1.95 +
    1.96 +    Therefore, in order to support multiple listeners, and get the
    1.97 +    sound data in a form that the AIs can use, it is necessary to
    1.98 +    create a new Device which supports this feature.
    1.99 +
    1.100 +    Adding a device to =OpenAL= is rather tricky -- there are five
   1.101 +    separate files in the =OpenAL= source tree that must be modified
   1.102 +    to do so. I named my device the "Multiple Audio Send" Device, or
   1.103 +    =Send= Device for short, since it sends audio data back to the
   1.104 +    calling application like an Aux-Send cable on a mixing board.
   1.105 +
   1.106 +    The main idea behind the Send device is to take advantage of the
   1.107 +    fact that LWJGL only manages one /context/ when using OpenAL. A
   1.108 +    /context/ is like a container that holds samples and keeps track
   1.109 +    of where the listener is. In order to support multiple listeners,
   1.110 +    the Send device identifies the LWJGL context as the master
   1.111 +    context, and creates any number of slave contexts to represent
   1.112 +    additional listeners. Every time the device renders sound, it
   1.113 +    synchronizes every source from the master LWJGL context to the
   1.114 +    slave contexts. Then, it renders each context separately, using a
   1.115 +    different listener for each one. The rendered sound is made
   1.116 +    available via JNI to jMonkeyEngine.
   1.117 +
    1.118 +    Switching between contexts is not the normal operation of a
    1.119 +    Device, and one of the problems with doing so is that a Device
    1.120 +    normally keeps around a few pieces of global state, such as the
    1.121 +    =ClickRemoval= array that smooths clicks out of the rendered
    1.122 +    sound, which become corrupted if the contexts are not rendered
    1.123 +    in parallel. The solution is to create a copy of this normally
    1.124 +    global device state for each context, and copy it back and forth
    1.125 +    into and out of the actual device state whenever a context is
          +    rendered, as sketched below.
   1.126 +
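          +    The following sketch shows how these pieces fit together in
          +    the =Send= device's render step. The helper names and struct
          +    layouts here are illustrative, not verbatim =send.c=; see the
          +    =CORTEX= distribution for the real code.
          +
          +    #+caption: Sketch of the =Send= device's render step: private
          +    #+caption: device state is swapped in and out around each
          +    #+caption: slave context's render (illustrative names).
          +    #+name: send-render-sketch
          +    #+begin_listing C
          +#include <string.h>   /* memcpy */
          +#include "alMain.h"   /* OpenAL-internal: ALCdevice, MAXCHANNELS */
          +
          +/* Assumed helpers: syncContexts calls syncSources (below) for
          +   every source; renderSamples invokes OpenAL's mixer. */
          +void syncContexts(ALCcontext *master, ALCcontext *slave);
          +void renderSamples(ALCdevice *device, ALvoid *out, ALuint n);
          +
          +typedef struct context_data {
          +  ALfloat ClickRemoval[MAXCHANNELS];  /* private device state */
          +  ALfloat PendingClicks[MAXCHANNELS];
          +  ALvoid *renderBuffer;       /* this listener's rendered sound */
          +  ALCcontext *context;
          +} context_data;
          +
          +typedef struct send_data {
          +  ALCcontext *masterContext;  /* the context LWJGL manages */
          +  context_data **contexts;    /* one per additional listener */
          +  int numContexts;
          +} send_data;
          +
          +static void swapInContext(ALCdevice *device, context_data *c){
          +  memcpy(device->ClickRemoval, c->ClickRemoval,
          +         sizeof(ALfloat)*MAXCHANNELS);
          +  memcpy(device->PendingClicks, c->PendingClicks,
          +         sizeof(ALfloat)*MAXCHANNELS);
          +}
          +
          +static void saveContext(ALCdevice *device, context_data *c){
          +  memcpy(c->ClickRemoval, device->ClickRemoval,
          +         sizeof(ALfloat)*MAXCHANNELS);
          +  memcpy(c->PendingClicks, device->PendingClicks,
          +         sizeof(ALfloat)*MAXCHANNELS);
          +}
          +
          +static void renderAllContexts(ALCdevice *device,
          +                              send_data *data, ALuint samples){
          +  int i;
          +  for (i = 0; i < data->numContexts; i++){
          +    context_data *slave = data->contexts[i];
          +    /* 1. Copy every source's state from the master context. */
          +    syncContexts(data->masterContext, slave->context);
          +    /* 2. Swap in this context's private device state. */
          +    swapInContext(device, slave);
          +    /* 3. Render with this context's own listener. */
          +    alcMakeContextCurrent(slave->context);
          +    renderSamples(device, slave->renderBuffer, samples);
          +    /* 4. Stash the device state for the next render. */
          +    saveContext(device, slave);
          +  }
          +  alcMakeContextCurrent(data->masterContext);
          +}
          +    #+end_listing
          +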
   1.127 +    The core of the =Send= device is the =syncSources= function, which
   1.128 +    does the job of copying all relevant data from one context to
   1.129 +    another. 
   1.130 +
   1.131 +    #+caption: Program for extending =OpenAL= to support multiple
   1.132 +    #+caption: listeners via context copying/switching.
   1.133 +    #+name: sync-openal-sources
   1.134 +    #+begin_listing C
   1.135 +void syncSources(ALsource *masterSource, ALsource *slaveSource, 
   1.136 +		 ALCcontext *masterCtx, ALCcontext *slaveCtx){
   1.137 +  ALuint master = masterSource->source;
   1.138 +  ALuint slave = slaveSource->source;
   1.139 +  ALCcontext *current = alcGetCurrentContext();
   1.140 +
   1.141 +  syncSourcef(master,slave,masterCtx,slaveCtx,AL_PITCH);
   1.142 +  syncSourcef(master,slave,masterCtx,slaveCtx,AL_GAIN);
   1.143 +  syncSourcef(master,slave,masterCtx,slaveCtx,AL_MAX_DISTANCE);
   1.144 +  syncSourcef(master,slave,masterCtx,slaveCtx,AL_ROLLOFF_FACTOR);
   1.145 +  syncSourcef(master,slave,masterCtx,slaveCtx,AL_REFERENCE_DISTANCE);
   1.146 +  syncSourcef(master,slave,masterCtx,slaveCtx,AL_MIN_GAIN);
   1.147 +  syncSourcef(master,slave,masterCtx,slaveCtx,AL_MAX_GAIN);
   1.148 +  syncSourcef(master,slave,masterCtx,slaveCtx,AL_CONE_OUTER_GAIN);
   1.149 +  syncSourcef(master,slave,masterCtx,slaveCtx,AL_CONE_INNER_ANGLE);
   1.150 +  syncSourcef(master,slave,masterCtx,slaveCtx,AL_CONE_OUTER_ANGLE);
   1.151 +  syncSourcef(master,slave,masterCtx,slaveCtx,AL_SEC_OFFSET);
   1.152 +  syncSourcef(master,slave,masterCtx,slaveCtx,AL_SAMPLE_OFFSET);
   1.153 +  syncSourcef(master,slave,masterCtx,slaveCtx,AL_BYTE_OFFSET);
   1.154 +    
   1.155 +  syncSource3f(master,slave,masterCtx,slaveCtx,AL_POSITION);
   1.156 +  syncSource3f(master,slave,masterCtx,slaveCtx,AL_VELOCITY);
   1.157 +  syncSource3f(master,slave,masterCtx,slaveCtx,AL_DIRECTION);
   1.158 +  
   1.159 +  syncSourcei(master,slave,masterCtx,slaveCtx,AL_SOURCE_RELATIVE);
   1.160 +  syncSourcei(master,slave,masterCtx,slaveCtx,AL_LOOPING);
   1.161 +
   1.162 +  alcMakeContextCurrent(masterCtx);
   1.163 +  ALint source_type;
   1.164 +  alGetSourcei(master, AL_SOURCE_TYPE, &source_type);
   1.165 +
   1.166 +  // Only static sources are currently synchronized! 
   1.167 +  if (AL_STATIC == source_type){
   1.168 +    ALint master_buffer;
   1.169 +    ALint slave_buffer;
   1.170 +    alGetSourcei(master, AL_BUFFER, &master_buffer);
   1.171 +    alcMakeContextCurrent(slaveCtx);
   1.172 +    alGetSourcei(slave, AL_BUFFER, &slave_buffer);
   1.173 +    if (master_buffer != slave_buffer){
   1.174 +      alSourcei(slave, AL_BUFFER, master_buffer);
   1.175 +    }
   1.176 +  }
   1.177 +  
   1.178 +  // Synchronize the state of the two sources.
   1.179 +  alcMakeContextCurrent(masterCtx);
   1.180 +  ALint masterState;
   1.181 +  ALint slaveState;
   1.182 +
   1.183 +  alGetSourcei(master, AL_SOURCE_STATE, &masterState);
   1.184 +  alcMakeContextCurrent(slaveCtx);
   1.185 +  alGetSourcei(slave, AL_SOURCE_STATE, &slaveState);
   1.186 +
   1.187 +  if (masterState != slaveState){
   1.188 +    switch (masterState){
   1.189 +    case AL_INITIAL : alSourceRewind(slave); break;
   1.190 +    case AL_PLAYING : alSourcePlay(slave);   break;
   1.191 +    case AL_PAUSED  : alSourcePause(slave);  break;
   1.192 +    case AL_STOPPED : alSourceStop(slave);   break;
   1.193 +    }
   1.194 +  }
   1.195 +  // Restore whatever context was previously active.
   1.196 +  alcMakeContextCurrent(current);
   1.197 +}
   1.198 +    #+end_listing
   1.199 +
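          +    =syncSources= leans on a family of small helpers
          +    (=syncSourcef=, =syncSource3f=, =syncSourcei=) that each copy
          +    one property of a source between contexts. A plausible sketch
          +    of the float version:
          +
          +    #+caption: Sketch of the per-property sync helper used by
          +    #+caption: =syncSources=.
          +    #+name: sync-sourcef-sketch
          +    #+begin_listing C
          +void syncSourcef(ALuint master, ALuint slave,
          +                 ALCcontext *masterCtx, ALCcontext *slaveCtx,
          +                 ALenum param){
          +  ALfloat value;
          +  /* Read the property in the master context... */
          +  alcMakeContextCurrent(masterCtx);
          +  alGetSourcef(master, param, &value);
          +  /* ...and write it to the slave source in its own context. */
          +  alcMakeContextCurrent(slaveCtx);
          +  alSourcef(slave, param, value);
          +}
          +    #+end_listing
          +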
    1.200 +    With this special context-switching device, and some ugly JNI
    1.201 +    bindings that are not worth mentioning, =CORTEX= gains the
    1.202 +    ability to access multiple sound streams from =OpenAL=. On the
          +    Clojure side, each ear-node in the blender file becomes a
          +    jMonkeyEngine =Listener= bound to the nearest physical part
          +    of the creature:
   1.203 +
   1.204 +    #+caption: Program to create an ear from a blender empty node. The ear
   1.205 +    #+caption: follows around the nearest physical object and passes 
   1.206 +    #+caption: all sensory data to a continuation function.
   1.207 +    #+name: add-ear
   1.208 +    #+begin_listing clojure
   1.209 +(defn add-ear!  
   1.210 +  "Create a Listener centered on the current position of 'ear 
   1.211 +   which follows the closest physical node in 'creature and 
   1.212 +   sends sound data to 'continuation."
   1.213 +  [#^Application world #^Node creature #^Spatial ear continuation]
   1.214 +  (let [target (closest-node creature ear)
   1.215 +        lis (Listener.)
   1.216 +        audio-renderer (.getAudioRenderer world)
   1.217 +        sp (hearing-pipeline continuation)]
   1.218 +    (.setLocation lis (.getWorldTranslation ear))
   1.219 +    (.setRotation lis (.getWorldRotation ear))
   1.220 +    (bind-sense target lis)
   1.221 +    (update-listener-velocity! target lis)
   1.222 +    (.addListener audio-renderer lis)
   1.223 +    (.registerSoundProcessor audio-renderer lis sp)))
   1.224 +    #+end_listing
   1.225 +
   1.226 +    
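          +    =add-ear!= relies on =update-listener-velocity!= to keep the
          +    listener's velocity current so that =OpenAL= can render
          +    Doppler shift. Here is a sketch of that helper using the
          +    standard jMonkeyEngine control API (the version in =CORTEX=
          +    is equivalent in spirit):
          +
          +    #+caption: Sketch of a velocity-tracking control for a
          +    #+caption: =Listener=, estimating velocity from per-frame
          +    #+caption: changes in position.
          +    #+name: update-listener-velocity
          +    #+begin_listing clojure
          +(import '(com.jme3.audio Listener)
          +        '(com.jme3.scene Spatial)
          +        '(com.jme3.scene.control AbstractControl))
          +
          +(defn update-listener-velocity!
          +  "Every frame, set the listener's velocity from its change in
          +   position so that OpenAL can compute Doppler shift."
          +  [#^Spatial obj #^Listener lis]
          +  (let [old-position (atom (.clone (.getLocation lis)))]
          +    (.addControl
          +     obj
          +     (proxy [AbstractControl] []
          +       (controlUpdate [tpf]
          +         (when (pos? tpf)
          +           (let [new-position (.clone (.getLocation lis))]
          +             (.setVelocity
          +              lis
          +              (.mult (.subtract new-position @old-position)
          +                     (float (/ 1.0 tpf))))
          +             (reset! old-position new-position))))
          +       (controlRender [_ _])))))
          +    #+end_listing
          +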
   1.227 +    The =Send= device, unlike most of the other devices in =OpenAL=,
   1.228 +    does not render sound unless asked. This enables the system to
   1.229 +    slow down or speed up depending on the needs of the AIs who are
   1.230 +    using it to listen. If the device tried to render samples in
   1.231 +    real-time, a complicated AI whose mind takes 100 seconds of
   1.232 +    computer time to simulate 1 second of AI-time would miss almost
   1.233 +    all of the sound in its environment!
   1.234 +
   1.235 +    #+caption: Program to enable arbitrary hearing in =CORTEX=
   1.236 +    #+name: hearing
   1.237 +    #+begin_listing clojure
   1.238 +(defn hearing-kernel
   1.239 +  "Returns a function which returns auditory sensory data when called
   1.240 +   inside a running simulation."
   1.241 +  [#^Node creature #^Spatial ear]
   1.242 +  (let [hearing-data (atom [])
   1.243 +        register-listener!
   1.244 +        (runonce 
   1.245 +         (fn [#^Application world]
   1.246 +           (add-ear!
   1.247 +            world creature ear
   1.248 +            (comp #(reset! hearing-data %)
   1.249 +                  byteBuffer->pulse-vector))))]
   1.250 +    (fn [#^Application world]
   1.251 +      (register-listener! world)
   1.252 +      (let [data @hearing-data
   1.253 +            topology              
   1.254 +            (vec (map #(vector % 0) (range 0 (count data))))]
   1.255 +        [topology data]))))
   1.256 +    
   1.257 +(defn hearing!
   1.258 +  "Endow the creature in a particular world with the sense of
   1.259 +   hearing. Will return a sequence of functions, one for each ear,
   1.260 +   which when called will return the auditory data from that ear."
   1.261 +  [#^Node creature]
   1.262 +  (for [ear (ears creature)]
   1.263 +    (hearing-kernel creature ear)))
   1.264 +    #+end_listing
   1.265 +
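          +    =hearing-kernel= refers to =byteBuffer->pulse-vector=, which
          +    turns the raw =ByteBuffer= handed over by the =Send= device
          +    into a Clojure vector of floats. A sketch, assuming mono
          +    16-bit little-endian samples (an assumption for illustration;
          +    the actual sample format comes from the audio renderer):
          +
          +    #+caption: Sketch of converting raw audio bytes into a vector
          +    #+caption: of floats in [-1, 1], assuming 16-bit samples.
          +    #+name: pulse-vector-sketch
          +    #+begin_listing clojure
          +(import '(java.nio ByteBuffer ByteOrder))
          +
          +(defn byteBuffer->pulse-vector
          +  "Convert a ByteBuffer of 16-bit little-endian audio samples
          +   into a vector of floats in [-1, 1]."
          +  [#^ByteBuffer buffer]
          +  (let [shorts (.asShortBuffer
          +                (.order (.duplicate buffer)
          +                        ByteOrder/LITTLE_ENDIAN))]
          +    (vec (for [i (range (.limit shorts))]
          +           (/ (float (.get shorts i)) Short/MAX_VALUE)))))
          +    #+end_listing
          +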
    1.266 +    Armed with these functions, =CORTEX= is able to test what may be
    1.267 +    the first ever instance of multiple listeners in a simulation
    1.268 +    based on a video game engine!
   1.269 +
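          +    A minimal usage sketch (=creature= and =world= here are
          +    hypothetical placeholders for a loaded creature and a running
          +    simulation):
          +
          +    #+caption: Usage sketch: polling every ear of a creature once
          +    #+caption: per frame.
          +    #+name: hearing-usage-sketch
          +    #+begin_listing clojure
          +;; hearing! returns one function per ear; calling such a function
          +;; inside a running simulation yields that ear's [topology data]
          +;; pair for the current frame.
          +(def hear-fns (hearing! creature))
          +
          +(defn print-hearing-step [world]
          +  (doseq [hear hear-fns]
          +    (let [[topology data] (hear world)]
          +      (println (count data) "samples heard this frame"))))
          +    #+end_listing
          +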
   1.270 +    #+caption: Here a simple creature responds to sound by changing
   1.271 +    #+caption: its color from gray to green when the total volume
   1.272 +    #+caption: goes over a threshold.
   1.273 +    #+name: sound-test
   1.274 +    #+begin_listing java
   1.275 +/**
   1.276 + * Respond to sound!  This is the brain of an AI entity that 
   1.277 + * hears its surroundings and reacts to them.
   1.278 + */
   1.279 +public void process(ByteBuffer audioSamples, 
   1.280 +		    int numSamples, AudioFormat format) {
   1.281 +    audioSamples.clear();
   1.282 +    byte[] data = new byte[numSamples];
   1.283 +    float[] out = new float[numSamples];
   1.284 +    audioSamples.get(data);
   1.285 +    FloatSampleTools.
   1.286 +	byte2floatInterleaved
   1.287 +	(data, 0, out, 0, numSamples/format.getFrameSize(), format);
   1.288 +
   1.289 +    float max = Float.NEGATIVE_INFINITY;
   1.290 +    for (float f : out){if (f > max) max = f;}
   1.291 +    audioSamples.clear();
   1.292 +
   1.293 +    if (max > 0.1){
   1.294 +	entity.getMaterial().setColor("Color", ColorRGBA.Green);
   1.295 +    }
   1.296 +    else {
   1.297 +	entity.getMaterial().setColor("Color", ColorRGBA.Gray);
   1.298 +    }
    1.299 +}
          +    #+end_listing
   1.300 +
    1.301 +    #+caption: First ever simulation of multiple listeners in =CORTEX=.
    1.302 +    #+caption: Each cube is a creature which processes sound data with
    1.303 +    #+caption: the =process= function from listing \ref{sound-test}.
    1.304 +    #+caption: The ball is constantly emitting a pure tone of
    1.305 +    #+caption: constant volume. As it approaches the cubes, they each
    1.306 +    #+caption: change color in response to the sound.
    1.307 +    #+name: sound-cubes
   1.308 +    #+ATTR_LaTeX: :width 10cm
   1.309 +    [[./images/aurellem-gray.png]]
   1.310 +
   1.311 +    This system of hearing has also been co-opted by the
   1.312 +    jMonkeyEngine3 community and is used to record audio for demo
   1.313 +    videos.
   1.314 +
   1.315  ** Touch uses hundreds of hair-like elements
   1.316  
   1.317  ** Proprioception is the sense that makes everything ``real''