comparison thesis/cortex.org @ 472:516a029e0be9

complete first draft of hearing.
author Robert McIntyre <rlm@mit.edu>
date Fri, 28 Mar 2014 18:14:04 -0400
parents f14fa9e5b67f
children 486ce07f5545
952 #+caption: simulation environment.
953 #+name: name
954 #+ATTR_LaTeX: :width 15cm
955 [[./images/physical-hand.png]]
956 
957 ** Eyes reuse standard video game components → ** COMMENT Eyes reuse standard video game components
958 
959 Vision is one of the most important senses for humans, so I need to
960 build a simulated sense of vision for my AI. I will do this with
961 simulated eyes. Each eye can be independently moved and should see
962 its own version of the world depending on where it is.
1251 community and is now (in modified form) part of a system for
1252 capturing in-game video to a file.
1253 
1254 ** Hearing is hard; =CORTEX= does it right
1255 
1256 At the end of this section I will have simulated ears that work the
1257 same way as the simulated eyes in the last section. I will be able to
1258 place any number of ear-nodes in a blender file, and they will bind to
1259 the closest physical object and follow it as it moves around. Each ear
1260 will provide access to the sound data it picks up between every frame.
1261
1262 Hearing is one of the more difficult senses to simulate, because there
1263 is less support for obtaining the actual sound data that is processed
1264 by jMonkeyEngine3. There is no "split-screen" support for rendering
1265 sound from different points of view, and there is no way to directly
1266 access the rendered sound data.
1267
1268 =CORTEX='s hearing is unique because it is free of the
1269 limitations found in other simulation environments. As far as I
1270 know, no other system supports multiple listeners, and the sound
1271 demo at the end of this section is the first time this has been
1272 done in a video game environment.
1273
1274 *** Brief Description of jMonkeyEngine's Sound System
1275
1276 jMonkeyEngine's sound system works as follows:
1277
1278 - jMonkeyEngine uses the =AppSettings= for the particular
1279 application to determine what sort of =AudioRenderer= should be
1280 used.
1281 - Although some support is provided for multiple audio rendering
1282 backends, jMonkeyEngine at the time of this writing will either
1283 pick no =AudioRenderer= at all, or the =LwjglAudioRenderer=.
1284 - jMonkeyEngine tries to figure out what sort of system you're
1285 running and extracts the appropriate native libraries.
1286 - The =LwjglAudioRenderer= uses the [[http://lwjgl.org/][=LWJGL=]] (LightWeight Java Game
1287 Library) bindings to interface with a C library called [[http://kcat.strangesoft.net/openal.html][=OpenAL=]].
1288 - =OpenAL= renders the 3D sound and feeds the rendered sound
1289 directly to any of various sound output devices with which it
1290 knows how to communicate.
1291
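To make the renderer-selection step concrete, here is a minimal
sketch (not part of the =CORTEX= source) of asking jMonkeyEngine for
the =LwjglAudioRenderer= through =AppSettings=. The interop calls and
the =AppSettings/LWJGL_OPENAL= constant are standard jMonkeyEngine3
API; the function name =use-openal-audio!= is purely illustrative.

#+begin_listing clojure
(import '(com.jme3.system AppSettings)
        '(com.jme3.app SimpleApplication))

(defn use-openal-audio!
  "Illustrative sketch: request the LWJGL/OpenAL audio backend for
   'app by installing AppSettings before the application starts."
  [#^SimpleApplication app]
  (let [settings (AppSettings. true)]
    ;; AppSettings/LWJGL_OPENAL names the LwjglAudioRenderer backend.
    (.setAudioRenderer settings AppSettings/LWJGL_OPENAL)
    (.setSettings app settings)
    app))
#+end_listing
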
1292 A consequence of this is that there's no way to access the actual
1293 sound data produced by =OpenAL=. Even worse, =OpenAL= only supports
1294 one /listener/ (it renders sound data from only one perspective),
1295 which normally isn't a problem for games, but becomes a problem
1296 when trying to make multiple AI creatures that can each hear the
1297 world from a different perspective.
1298
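jMonkeyEngine exposes this one-listener model directly: the stock
=AudioRenderer= interface has a single =setListener= slot. The
following sketch (illustrative, not from =CORTEX=) shows the ordinary
single-listener usage, for contrast with the =addListener= calls that
the extended renderer supports later in this section.

#+begin_listing clojure
(import '(com.jme3.app Application)
        '(com.jme3.audio Listener))

(defn set-single-listener!
  "Illustrative sketch: place the application's one and only
   Listener at the camera's current position."
  [#^Application world]
  (let [lis (Listener.)]
    (.setLocation lis (.getLocation (.getCamera world)))
    ;; The stock AudioRenderer tracks exactly one listener.
    (.setListener (.getAudioRenderer world) lis)
    lis))
#+end_listing
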
1299 To make many AI creatures in jMonkeyEngine that can each hear the
1300 world from their own perspective, or to make a single creature with
1301 many ears, it is necessary to go all the way back to =OpenAL= and
1302 implement support for simulated hearing there.
1303
1304 *** Extending =OpenAL=
1305
1306 Extending =OpenAL= to support multiple listeners requires 500
1307 lines of =C= code and is too hairy to mention here. Instead, I
1308 will show a small amount of extension code and go over the high
1309 level strategy. Full source is of course available with the
1310 =CORTEX= distribution if you're interested.
1311
1312 =OpenAL= goes to great lengths to support many different systems,
1313 all with different sound capabilities and interfaces. It
1314 accomplishes this difficult task by providing code for many
1315 different sound backends in pseudo-objects called /Devices/.
1316 There's a device for the Linux Open Sound System and the Advanced
1317 Linux Sound Architecture, there's one for Direct Sound on Windows,
1318 and there's even one for Solaris. =OpenAL= solves the problem of
1319 platform independence by providing all these Devices.
1320
1321 Wrapper libraries such as LWJGL are free to examine the system on
1322 which they are running and then select an appropriate device for
1323 that system.
1324
1325 There are also a few "special" devices that don't interface with
1326 any particular system. These include the Null Device, which
1327 doesn't do anything, and the Wave Device, which writes whatever
1328 sound it receives to a file, if everything has been set up
1329 correctly when configuring =OpenAL=.
1330
1331 Actual mixing (Doppler shift and distance- and environment-based
1332 attenuation) of the sound data happens in the Devices, and they
1333 are the only point in the sound rendering process where this data
1334 is available.
1335
1336 Therefore, in order to support multiple listeners, and get the
1337 sound data in a form that the AIs can use, it is necessary to
1338 create a new Device which supports this feature.
1339
1340 Adding a device to OpenAL is rather tricky -- there are five
1341 separate files in the =OpenAL= source tree that must be modified
1342 to do so. I named my device the "Multiple Audio Send" Device, or
1343 =Send= Device for short, since it sends audio data back to the
1344 calling application like an Aux-Send cable on a mixing board.
1345
1346 The main idea behind the Send device is to take advantage of the
1347 fact that LWJGL only manages one /context/ when using OpenAL. A
1348 /context/ is like a container that holds samples and keeps track
1349 of where the listener is. In order to support multiple listeners,
1350 the Send device identifies the LWJGL context as the master
1351 context, and creates any number of slave contexts to represent
1352 additional listeners. Every time the device renders sound, it
1353 synchronizes every source from the master LWJGL context to the
1354 slave contexts. Then, it renders each context separately, using a
1355 different listener for each one. The rendered sound is made
1356 available via JNI to jMonkeyEngine.
1357
1358 Switching between contexts is not the normal operation of a
1359 Device, and one of the problems with doing so is that a Device
1360 normally keeps around a few pieces of state, such as its
1361 =ClickRemoval= array, which will become corrupted if the
1362 contexts are not rendered in parallel. The solution is to create a
1363 copy of this normally global device state for each context, and
1364 copy it back and forth into and out of the actual device state
1365 whenever a context is rendered.
1366
1367 The core of the =Send= device is the =syncSources= function, which
1368 does the job of copying all relevant data from one context to
1369 another.
1370
1371 #+caption: Program for extending =OpenAL= to support multiple
1372 #+caption: listeners via context copying/switching.
1373 #+name: sync-openal-sources
1374 #+begin_listing C
void syncSources(ALsource *masterSource, ALsource *slaveSource,
                 ALCcontext *masterCtx, ALCcontext *slaveCtx){
  ALuint master = masterSource->source;
  ALuint slave = slaveSource->source;
  ALCcontext *current = alcGetCurrentContext();

  syncSourcef(master,slave,masterCtx,slaveCtx,AL_PITCH);
  syncSourcef(master,slave,masterCtx,slaveCtx,AL_GAIN);
  syncSourcef(master,slave,masterCtx,slaveCtx,AL_MAX_DISTANCE);
  syncSourcef(master,slave,masterCtx,slaveCtx,AL_ROLLOFF_FACTOR);
  syncSourcef(master,slave,masterCtx,slaveCtx,AL_REFERENCE_DISTANCE);
  syncSourcef(master,slave,masterCtx,slaveCtx,AL_MIN_GAIN);
  syncSourcef(master,slave,masterCtx,slaveCtx,AL_MAX_GAIN);
  syncSourcef(master,slave,masterCtx,slaveCtx,AL_CONE_OUTER_GAIN);
  syncSourcef(master,slave,masterCtx,slaveCtx,AL_CONE_INNER_ANGLE);
  syncSourcef(master,slave,masterCtx,slaveCtx,AL_CONE_OUTER_ANGLE);
  syncSourcef(master,slave,masterCtx,slaveCtx,AL_SEC_OFFSET);
  syncSourcef(master,slave,masterCtx,slaveCtx,AL_SAMPLE_OFFSET);
  syncSourcef(master,slave,masterCtx,slaveCtx,AL_BYTE_OFFSET);

  syncSource3f(master,slave,masterCtx,slaveCtx,AL_POSITION);
  syncSource3f(master,slave,masterCtx,slaveCtx,AL_VELOCITY);
  syncSource3f(master,slave,masterCtx,slaveCtx,AL_DIRECTION);

  syncSourcei(master,slave,masterCtx,slaveCtx,AL_SOURCE_RELATIVE);
  syncSourcei(master,slave,masterCtx,slaveCtx,AL_LOOPING);

  alcMakeContextCurrent(masterCtx);
  ALint source_type;
  alGetSourcei(master, AL_SOURCE_TYPE, &source_type);

  // Only static sources are currently synchronized!
  if (AL_STATIC == source_type){
    ALint master_buffer;
    ALint slave_buffer;
    alGetSourcei(master, AL_BUFFER, &master_buffer);
    alcMakeContextCurrent(slaveCtx);
    alGetSourcei(slave, AL_BUFFER, &slave_buffer);
    if (master_buffer != slave_buffer){
      alSourcei(slave, AL_BUFFER, master_buffer);
    }
  }

  // Synchronize the state of the two sources.
  alcMakeContextCurrent(masterCtx);
  ALint masterState;
  ALint slaveState;

  alGetSourcei(master, AL_SOURCE_STATE, &masterState);
  alcMakeContextCurrent(slaveCtx);
  alGetSourcei(slave, AL_SOURCE_STATE, &slaveState);

  if (masterState != slaveState){
    switch (masterState){
    case AL_INITIAL : alSourceRewind(slave); break;
    case AL_PLAYING : alSourcePlay(slave);   break;
    case AL_PAUSED  : alSourcePause(slave);  break;
    case AL_STOPPED : alSourceStop(slave);   break;
    }
  }
  // Restore whatever context was previously active.
  alcMakeContextCurrent(current);
}
1438 #+end_listing
1439
1440 With this special context-switching device, and some ugly JNI
1441 bindings that are not worth mentioning, =CORTEX= gains the ability
1442 to access multiple sound streams from =OpenAL=.
1443
1444 #+caption: Program to create an ear from a blender empty node. The ear
1445 #+caption: follows around the nearest physical object and passes
1446 #+caption: all sensory data to a continuation function.
1447 #+name: add-ear
1448 #+begin_listing clojure
(defn add-ear!
  "Create a Listener centered on the current position of 'ear
   which follows the closest physical node in 'creature and
   sends sound data to 'continuation."
  [#^Application world #^Node creature #^Spatial ear continuation]
  (let [target (closest-node creature ear)
        lis (Listener.)
        audio-renderer (.getAudioRenderer world)
        sp (hearing-pipeline continuation)]
    (.setLocation lis (.getWorldTranslation ear))
    (.setRotation lis (.getWorldRotation ear))
    (bind-sense target lis)
    (update-listener-velocity! target lis)
    (.addListener audio-renderer lis)
    (.registerSoundProcessor audio-renderer lis sp)))
1464 #+end_listing
1465
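As a usage sketch (not from the thesis source): assuming, as the name
=byteBuffer->pulse-vector= in the next listing suggests, that the
continuation receives the raw =ByteBuffer= of sound rendered for that
ear each frame, an ear could be attached and monitored as follows,
where =world=, =creature=, and the node name "ear" are illustrative.

#+begin_listing clojure
;; Illustrative only: attach an ear at the child node named "ear"
;; and report how much sound data arrives each frame.
(add-ear! world creature (.getChild creature "ear")
          (fn [#^java.nio.ByteBuffer sound]
            (println "heard" (.limit sound) "bytes this frame")))
#+end_listing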
1466
1467 The =Send= device, unlike most of the other devices in =OpenAL=,
1468 does not render sound unless asked. This enables the system to
1469 slow down or speed up depending on the needs of the AIs who are
1470 using it to listen. If the device tried to render samples in
1471 real-time, a complicated AI whose mind takes 100 seconds of
1472 computer time to simulate 1 second of AI-time would miss almost
1473 all of the sound in its environment!
1474
1475 #+caption: Program to enable arbitrary hearing in =CORTEX=
1476 #+name: hearing
1477 #+begin_listing clojure
(defn hearing-kernel
  "Returns a function which returns auditory sensory data when called
   inside a running simulation."
  [#^Node creature #^Spatial ear]
  (let [hearing-data (atom [])
        register-listener!
        (runonce
         (fn [#^Application world]
           (add-ear!
            world creature ear
            (comp #(reset! hearing-data %)
                  byteBuffer->pulse-vector))))]
    (fn [#^Application world]
      (register-listener! world)
      (let [data @hearing-data
            topology
            (vec (map #(vector % 0) (range 0 (count data))))]
        [topology data]))))

(defn hearing!
  "Endow the creature in a particular world with the sense of
   hearing. Will return a sequence of functions, one for each ear,
   which when called will return the auditory data from that ear."
  [#^Node creature]
  (for [ear (ears creature)]
    (hearing-kernel creature ear)))
1504 #+end_listing
1505
1506 Armed with these functions, =CORTEX= is able to test possibly the
1507 first ever instance of multiple listeners in a video game engine
1508 based simulation!
1509
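A brief sketch of how these functions would be consumed in practice
(illustrative, not from the thesis source; =creature= and =world=
stand for whatever blender-derived creature and running =Application=
are in use):

#+begin_listing clojure
;; hearing! returns one sense function per ear; each, given the
;; running world, returns [topology data] for the latest frame.
(def ear-senses (hearing! creature))

(defn poll-ears
  "Call every per-ear sense function, returning a seq of
   [topology data] pairs -- one pair per ear on the creature."
  [world]
  (doall (map (fn [sense] (sense world)) ear-senses)))
#+end_listing
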
1510 #+caption: Here a simple creature responds to sound by changing
1511 #+caption: its color from gray to green when the total volume
1512 #+caption: goes over a threshold.
1513 #+name: sound-test
1514 #+begin_listing java
/**
 * Respond to sound!  This is the brain of an AI entity that
 * hears its surroundings and reacts to them.
 */
public void process(ByteBuffer audioSamples,
                    int numSamples, AudioFormat format) {
  audioSamples.clear();
  byte[] data = new byte[numSamples];
  float[] out = new float[numSamples];
  audioSamples.get(data);
  FloatSampleTools.
    byte2floatInterleaved
    (data, 0, out, 0, numSamples/format.getFrameSize(), format);

  float max = Float.NEGATIVE_INFINITY;
  for (float f : out){if (f > max) max = f;}
  audioSamples.clear();

  if (max > 0.1){
    entity.getMaterial().setColor("Color", ColorRGBA.Green);
  }
  else {
    entity.getMaterial().setColor("Color", ColorRGBA.Gray);
  }
}
1539 #+end_listing
1540
1541 #+caption: First ever simulation of multiple listeners in =CORTEX=.
1542 #+caption: Each cube is a creature which processes sound data with
1543 #+caption: the =process= function from listing \ref{sound-test}.
1544 #+caption: The ball is constantly emitting a pure tone of
1545 #+caption: constant volume. As it approaches the cubes, they each
1546 #+caption: change color in response to the sound.
1547 #+name: sound-cubes.
1548 #+ATTR_LaTeX: :width 10cm
1549 [[./images/aurellem-gray.png]]
1550
1551 This system of hearing has also been co-opted by the
1552 jMonkeyEngine3 community and is used to record audio for demo
1553 videos.
1554
1555 ** Touch uses hundreds of hair-like elements
1556 
1557 ** Proprioception is the sense that makes everything ``real''
1558 
1559 ** Muscles are both effectors and sensors