# HG changeset patch # User Robert McIntyre # Date 1396239506 14400 # Node ID 68665d2c32a7e5cbd51fc037d4b8b3d1f3b0ab20 # Parent ced955c3c84f37d8fd6142887ad40549b50b572a spellcheck; almost done with first draft! diff -r ced955c3c84f -r 68665d2c32a7 thesis/cortex.bib --- a/thesis/cortex.bib Sun Mar 30 22:48:19 2014 -0400 +++ b/thesis/cortex.bib Mon Mar 31 00:18:26 2014 -0400 @@ -12,7 +12,7 @@ year = 2013, addendum = {\why{All complicated creatures in {\tt CORTEX} are described using Blender's extensive 3D modeling - capabilities. Blender is a very sophistaced 3D + capabilities. Blender is a very sophisticated 3D modeling environment and has been used to create a short movie called Sintel \url{http://www.sintel.org/}.}} } @@ -90,10 +90,10 @@ year = "1998", title = "The Man Who Mistook His Wife For A Hat: And Other Clinical Tales", ISBN = "9780330700580", - addendum = {\why{This book describes exoitic cases where the human + addendum = {\why{This book describes exotic cases where the human mind goes wrong. The section on proprioception is - particurally relevant to this thesis, and one of the - best explinations of how important proprioception + particularly relevant to this thesis, and one of the + best explanations of how important proprioception is, though the eyes of someone who has lost the sense.}} } @@ -158,7 +158,7 @@ be improved with {\tt CORTEX}. Larson uses a simple blocks world simulator to explore using self-organizing maps to bootstrap symbols just from - exploration with a simule arm and colored blocks.}} + exploration with a simulated arm and colored blocks.}} } @phdthesis{sussman-hacker, @@ -174,7 +174,7 @@ problem solving is begging to be implemented in {\tt CORTEX}'s rich world.
Will program debugging still work well with many more senses and a more - complicated environement?}} + complicated environment?}} } @phdthesis{coen-x-modal, diff -r ced955c3c84f -r 68665d2c32a7 thesis/cortex.org --- a/thesis/cortex.org Sun Mar 30 22:48:19 2014 -0400 +++ b/thesis/cortex.org Mon Mar 31 00:18:26 2014 -0400 @@ -59,7 +59,6 @@ constraint can be the difference between easily understanding what is happening in a video and being completely lost in a sea of incomprehensible color and movement. - ** The problem: recognizing actions in video is hard! @@ -77,7 +76,7 @@ the problem is that many computer vision systems focus on pixel-level details or comparisons to example images (such as \cite{volume-action-recognition}), but the 3D world is so variable - that it is hard to descrive the world in terms of possible images. + that it is hard to describe the world in terms of possible images. In fact, the contents of scene may have much less to do with pixel probabilities than with recognizing various affordances: things you @@ -102,7 +101,7 @@ [[./images/wall-push.png]] Each of these examples tells us something about what might be going - on in our minds as we easily solve these recognition problems. + on in our minds as we easily solve these recognition problems: The hidden chair shows us that we are strongly triggered by cues relating to the position of human bodies, and that we can determine @@ -115,6 +114,11 @@ most positions, and we can easily project this self-knowledge to imagined positions triggered by images of the human body. + The cat tells us that imagination of some kind plays an important + role in understanding actions. The question is: Can we be more + precise about what sort of imagination is required to understand + these actions? + ** A step forward: the sensorimotor-centered approach In this thesis, I explore the idea that our knowledge of our own @@ -139,13 +143,13 @@ model of its own body in place of the cat. 
Possibly also create a simulation of the stream of water. - 2. Play out this simulated scene and generate imagined sensory + 2. ``Play out'' this simulated scene and generate imagined sensory experience. This will include relevant muscle contractions, a close up view of the stream from the cat's perspective, and most - importantly, the imagined feeling of water entering the - mouth. The imagined sensory experience can come from a - simulation of the event, but can also be pattern-matched from - previous, similar embodied experience. + importantly, the imagined feeling of water entering the mouth. + The imagined sensory experience can come from a simulation of + the event, but can also be pattern-matched from previous, + similar embodied experience. 3. The action is now easily identified as drinking by the sense of taste alone. The other senses (such as the tongue moving in and @@ -160,7 +164,7 @@ 2. Generate proprioceptive sensory data from this alignment. 3. Use the imagined proprioceptive data as a key to lookup related - sensory experience associated with that particular proproceptive + sensory experience associated with that particular proprioceptive feeling. 4. Retrieve the feeling of your bottom resting on a surface, your @@ -194,14 +198,14 @@ viewpoint. Another powerful advantage is that using the language of multiple - body-centered rich senses to describe body-centerd actions offers a + body-centered rich senses to describe body-centered actions offers a massive boost in descriptive capability. Consider how difficult it would be to compose a set of HOG filters to describe the action of a simple worm-creature ``curling'' so that its head touches its tail, and then behold the simplicity of describing thus action in a language designed for the task (listing \ref{grand-circle-intro}): - #+caption: Body-centerd actions are best expressed in a body-centered + #+caption: Body-centered actions are best expressed in a body-centered #+caption: language. 
This code detects when the worm has curled into a #+caption: full circle. Imagine how you would replicate this functionality #+caption: using low-level pixel features such as HOG filters! @@ -220,30 +224,23 @@ #+end_src #+end_listing -** =EMPATH= regognizes actions using empathy - - First, I built a system for constructing virtual creatures with +** =EMPATH= recognizes actions using empathy + + Exploring these ideas further demands a concrete implementation, so + first, I built a system for constructing virtual creatures with physiologically plausible sensorimotor systems and detailed environments. The result is =CORTEX=, which is described in section - \ref{sec-2}. (=CORTEX= was built to be flexible and useful to other - AI researchers; it is provided in full with detailed instructions - on the web [here].) + \ref{sec-2}. Next, I wrote routines which enabled a simple worm-like creature to infer the actions of a second worm-like creature, using only its own prior sensorimotor experiences and knowledge of the second worm's joint positions. This program, =EMPATH=, is described in - section \ref{sec-3}, and the key results of this experiment are - summarized below. - - I have built a system that can express the types of recognition - problems in a form amenable to computation. It is split into - four parts: - - - Free/Guided Play :: The creature moves around and experiences the - world through its unique perspective. Many otherwise - complicated actions are easily described in the language of a - full suite of body-centered, rich senses. For example, + section \ref{sec-3}. Its main components are: + + - Embodied Action Definitions :: Many otherwise complicated actions + are easily described in the language of a full suite of + body-centered, rich senses and experiences. For example, drinking is the feeling of water sliding down your throat, and cooling your insides. It's often accompanied by bringing your hand close to your face, or bringing your face close to water.
@@ -251,26 +248,35 @@ your quadriceps, then feeling a surface with your bottom and relaxing your legs. These body-centered action descriptions can be either learned or hard coded. - - Posture Imitation :: When trying to interpret a video or image, + + - Guided Play :: The creature moves around and experiences the + world through its unique perspective. As the creature moves, + it gathers experiences that satisfy the embodied action + definitions. + + - Posture Imitation :: When trying to interpret a video or image, the creature takes a model of itself and aligns it with - whatever it sees. This alignment can even cross species, as + whatever it sees. This alignment might even cross species, as when humans try to align themselves with things like ponies, dogs, or other humans with a different body type. - - Empathy :: The alignment triggers associations with + + - Empathy :: The alignment triggers associations with sensory data from prior experiences. For example, the alignment itself easily maps to proprioceptive data. Any sounds or obvious skin contact in the video can to a lesser - extent trigger previous experience. Segments of previous - experiences are stitched together to form a coherent and - complete sensory portrait of the scene. - - Recognition :: With the scene described in terms of first - person sensory events, the creature can now run its - action-identification programs on this synthesized sensory - data, just as it would if it were actually experiencing the - scene first-hand. If previous experience has been accurately + extent trigger previous experience keyed to hearing or touch. + Segments of previous experiences gained from play are stitched + together to form a coherent and complete sensory portrait of + the scene.
+ + - Recognition :: With the scene described in terms of + remembered first-person sensory events, the creature can now + run its action-identification programs (such as the one in listing + \ref{grand-circle-intro}) on this synthesized sensory data, + just as it would if it were actually experiencing the scene + first-hand. If previous experience has been accurately retrieved, and if it is analogous enough to the scene, then the creature will correctly identify the action in the scene. - My program, =EMPATH= uses this empathic problem solving technique to interpret the actions of a simple, worm-like creature. @@ -287,28 +293,31 @@ #+name: worm-recognition-intro #+ATTR_LaTeX: :width 15cm [[./images/worm-poses.png]] - - #+caption: From only \emph{proprioceptive} data, =EMPATH= was able to infer - #+caption: the complete sensory experience and classify these four poses. - #+caption: The last image is a composite, depicting the intermediate stages - #+caption: of \emph{wriggling}. - #+name: worm-recognition-intro-2 - #+ATTR_LaTeX: :width 15cm - [[./images/empathy-1.png]] - Next, I developed an experiment to test the power of =CORTEX='s - sensorimotor-centered language for solving recognition problems. As - a proof of concept, I wrote routines which enabled a simple - worm-like creature to infer the actions of a second worm-like - creature, using only its own previous sensorimotor experiences and - knowledge of the second worm's joints (figure - \ref{worm-recognition-intro-2}). The result of this proof of - concept was the program =EMPATH=, described in section \ref{sec-3}. - -** =EMPATH= is built on =CORTEX=, en environment for making creatures. - - # =CORTEX= provides a language for describing the sensorimotor - # experiences of various creatures.
+*** Main Results + + - After one-shot supervised training, =EMPATH= was able to recognize a + wide variety of static poses and dynamic actions---ranging from + curling in a circle to wiggling with a particular frequency --- + with 95\% accuracy. + + - These results were completely independent of viewing angle + because the underlying body-centered language fundamentally is + independent; once an action is learned, it can be recognized + equally well from any viewing angle. + + - =EMPATH= is surprisingly short; the sensorimotor-centered + language provided by =CORTEX= resulted in extremely economical + recognition routines --- about 500 lines in all --- suggesting + that such representations are very powerful, and often + indispensable for the types of recognition tasks considered here. + + - Although for expediency's sake, I relied on direct knowledge of + joint positions in this proof of concept, it would be + straightforward to extend =EMPATH= so that it (more + realistically) infers joint positions from its visual data. + +** =EMPATH= is built on =CORTEX=, a creature builder. I built =CORTEX= to be a general AI research platform for doing experiments involving multiple rich senses and a wide variety and @@ -319,19 +328,21 @@ language of creatures and senses, but in order to explore those ideas they must first build a platform in which they can create simulated creatures with rich senses! There are many ideas that - would be simple to execute (such as =EMPATH=), but attached to them - is the multi-month effort to make a good creature simulator. Often, - that initial investment of time proves to be too much, and the - project must make do with a lesser environment. + would be simple to execute (such as =EMPATH= or + \cite{larson-symbols}), but attached to them is the multi-month + effort to make a good creature simulator. Often, that initial + investment of time proves to be too much, and the project must make + do with a lesser environment.
=CORTEX= is well suited as an environment for embodied AI research for three reasons: - - You can create new creatures using Blender, a popular 3D modeling - program. Each sense can be specified using special blender nodes - with biologically inspired paramaters. You need not write any - code to create a creature, and can use a wide library of - pre-existing blender models as a base for your own creatures. + - You can create new creatures using Blender (\cite{blender}), a + popular 3D modeling program. Each sense can be specified using + special blender nodes with biologically inspired parameters. You + need not write any code to create a creature, and can use a wide + library of pre-existing blender models as a base for your own + creatures. - =CORTEX= implements a wide variety of senses: touch, proprioception, vision, hearing, and muscle tension. Complicated @@ -343,24 +354,25 @@ available. - =CORTEX= supports any number of creatures and any number of - senses. Time in =CORTEX= dialates so that the simulated creatures - always precieve a perfectly smooth flow of time, regardless of + senses. Time in =CORTEX= dilates so that the simulated creatures + always perceive a perfectly smooth flow of time, regardless of the actual computational load. - =CORTEX= is built on top of =jMonkeyEngine3=, which is a video game - engine designed to create cross-platform 3D desktop games. =CORTEX= - is mainly written in clojure, a dialect of =LISP= that runs on the - java virtual machine (JVM). The API for creating and simulating - creatures and senses is entirely expressed in clojure, though many - senses are implemented at the layer of jMonkeyEngine or below. For - example, for the sense of hearing I use a layer of clojure code on - top of a layer of java JNI bindings that drive a layer of =C++= - code which implements a modified version of =OpenAL= to support - multiple listeners. 
=CORTEX= is the only simulation environment - that I know of that can support multiple entities that can each - hear the world from their own perspective. Other senses also - require a small layer of Java code. =CORTEX= also uses =bullet=, a - physics simulator written in =C=. + =CORTEX= is built on top of =jMonkeyEngine3= + (\cite{jmonkeyengine}), which is a video game engine designed to + create cross-platform 3D desktop games. =CORTEX= is mainly written + in Clojure, a dialect of =LISP= that runs on the Java virtual + machine (JVM). The API for creating and simulating creatures and + senses is entirely expressed in Clojure, though many senses are + implemented at the layer of jMonkeyEngine or below. For example, + for the sense of hearing I use a layer of Clojure code on top of a + layer of Java JNI bindings that drive a layer of =C++= code which + implements a modified version of =OpenAL= to support multiple + listeners. =CORTEX= is the only simulation environment that I know + of that can support multiple entities that can each hear the world + from their own perspective. Other senses also require a small layer + of Java code. =CORTEX= also uses =bullet=, a physics simulator + written in =C++=. #+caption: Here is the worm from figure \ref{worm-intro} modeled #+caption: in Blender, a free 3D-modeling program.
Senses and @@ -375,8 +387,8 @@ - distributed communication among swarm creatures - self-learning using free exploration, - evolutionary algorithms involving creature construction - - exploration of exoitic senses and effectors that are not possible - in the real world (such as telekenisis or a semantic sense) + - exploration of exotic senses and effectors that are not possible + in the real world (such as telekinesis or a semantic sense) - imagination using subworlds During one test with =CORTEX=, I created 3,000 creatures each with @@ -400,37 +412,6 @@ \end{sidewaysfigure} #+END_LaTeX -** Contributions - - - I built =CORTEX=, a comprehensive platform for embodied AI - experiments. =CORTEX= supports many features lacking in other - systems, such proper simulation of hearing. It is easy to create - new =CORTEX= creatures using Blender, a free 3D modeling program. - - - I built =EMPATH=, which uses =CORTEX= to identify the actions of - a worm-like creature using a computational model of empathy. - - - After one-shot supervised training, =EMPATH= was able recognize a - wide variety of static poses and dynamic actions---ranging from - curling in a circle to wriggling with a particular frequency --- - with 95\% accuracy. - - - These results were completely independent of viewing angle - because the underlying body-centered language fundamentally is - independent; once an action is learned, it can be recognized - equally well from any viewing angle. - - - =EMPATH= is surprisingly short; the sensorimotor-centered - language provided by =CORTEX= resulted in extremely economical - recognition routines --- about 500 lines in all --- suggesting - that such representations are very powerful, and often - indispensible for the types of recognition tasks considered here. 
- - - Although for expediency's sake, I relied on direct knowledge of - joint positions in this proof of concept, it would be - straightforward to extend =EMPATH= so that it (more - realistically) infers joint positions from its visual data. - * Designing =CORTEX= In this section, I outline the design decisions that went into @@ -441,18 +422,18 @@ Throughout this project, I intended for =CORTEX= to be flexible and extensible enough to be useful for other researchers who want to - test out ideas of their own. To this end, wherver I have had to make - archetictural choices about =CORTEX=, I have chosen to give as much + test out ideas of their own. To this end, wherever I have had to make + architectural choices about =CORTEX=, I have chosen to give as much freedom to the user as possible, so that =CORTEX= may be used for - things I have not forseen. + things I have not foreseen. ** Building in simulation versus reality - The most important archetictural decision of all is the choice to - use a computer-simulated environemnt in the first place! The world + The most important architectural decision of all is the choice to + use a computer-simulated environment in the first place! The world is a vast and rich place, and for now simulations are a very poor reflection of its complexity. It may be that there is a significant - qualatative difference between dealing with senses in the real - world and dealing with pale facilimilies of them in a simulation + qualitative difference between dealing with senses in the real + world and dealing with pale facsimiles of them in a simulation \cite{brooks-representation}. What are the advantages and disadvantages of a simulation vs. reality? @@ -519,13 +500,13 @@ The need for real time processing only increases if multiple senses are involved. 
In the extreme case, even simple algorithms will have to be accelerated by ASIC chips or FPGAs, turning what would - otherwise be a few lines of code and a 10x speed penality into a + otherwise be a few lines of code and a 10x speed penalty into a multi-month ordeal. For this reason, =CORTEX= supports - /time-dialiation/, which scales back the framerate of the + /time-dilation/, which scales back the framerate of the simulation in proportion to the amount of processing each frame. From the perspective of the creatures inside the simulation, time always appears to flow at a constant rate, regardless of how - complicated the envorimnent becomes or how many creatures are in + complicated the environment becomes or how many creatures are in the simulation. The cost is that =CORTEX= can sometimes run slower than real time. This can also be an advantage, however --- simulations of very simple creatures in =CORTEX= generally run at @@ -536,7 +517,7 @@ If =CORTEX= is to support a wide variety of senses, it would help to have a better understanding of what a ``sense'' actually is! While vision, touch, and hearing all seem like they are quite - different things, I was supprised to learn during the course of + different things, I was surprised to learn during the course of this thesis that they (and all physical senses) can be expressed as exactly the same mathematical object due to a dimensional argument! @@ -561,13 +542,13 @@ Most human senses consist of many discrete sensors of various properties distributed along a surface at various densities. For skin, it is Pacinian corpuscles, Meissner's corpuscles, Merkel's - disks, and Ruffini's endings, which detect pressure and vibration - of various intensities. For ears, it is the stereocilia distributed - along the basilar membrane inside the cochlea; each one is - sensitive to a slightly different frequency of sound. For eyes, it - is rods and cones distributed along the surface of the retina. 
In - each case, we can describe the sense with a surface and a - distribution of sensors along that surface. + disks, and Ruffini's endings (\cite{9.01-textbook}), which detect + pressure and vibration of various intensities. For ears, it is the + stereocilia distributed along the basilar membrane inside the + cochlea; each one is sensitive to a slightly different frequency of + sound. For eyes, it is rods and cones distributed along the surface + of the retina. In each case, we can describe the sense with a + surface and a distribution of sensors along that surface. The neat idea is that every human sense can be effectively described in terms of a surface containing embedded sensors. If the @@ -614,7 +595,7 @@ I did not need to write my own physics simulation code or shader to build =CORTEX=. Doing so would lead to a system that is impossible for anyone but myself to use anyway. Instead, I use a video game - engine as a base and modify it to accomodate the additional needs + engine as a base and modify it to accommodate the additional needs of =CORTEX=. Video game engines are an ideal starting point to build =CORTEX=, because they are not far from being creature building systems themselves. @@ -684,7 +665,7 @@ for other projects, it needs a way to construct complicated creatures. If possible, it would be nice to leverage work that has already been done by the community of 3D modelers, or at least - enable people who are talented at moedling but not programming to + enable people who are talented at modeling but not programming to design =CORTEX= creatures. Therefore, I use Blender, a free 3D modeling program, as the main @@ -704,7 +685,7 @@ sensors if applicable. - Make each empty-node the child of the top-level node. - #+caption: An example of annoting a creature model with empty + #+caption: An example of annotating a creature model with empty #+caption: nodes to describe the layout of senses.
There are #+caption: multiple empty nodes which each describe the position #+caption: of muscles, ears, eyes, or joints. @@ -717,7 +698,7 @@ Blender is a general purpose animation tool, which has been used in the past to create high quality movies such as Sintel \cite{blender}. Though Blender can model and render even complicated - things like water, it is crucual to keep models that are meant to + things like water, it is crucial to keep models that are meant to be simulated as creatures simple. =Bullet=, which =CORTEX= uses though jMonkeyEngine3, is a rigid-body physics system. This offers a compromise between the expressiveness of a game level and the @@ -725,9 +706,9 @@ should be naturally expressed as rigid components held together by joint constraints. - But humans are more like a squishy bag with wrapped around some - hard bones which define the overall shape. When we move, our skin - bends and stretches to accomodate the new positions of our bones. + But humans are more like a squishy bag wrapped around some hard + bones which define the overall shape. When we move, our skin bends + and stretches to accommodate the new positions of our bones. One way to make bodies composed of rigid pieces connected by joints /seem/ more human-like is to use an /armature/, (or /rigging/) @@ -735,17 +716,16 @@ mesh deforms as a function of the position of each ``bone'' which is a standard rigid body. This technique is used extensively to model humans and create realistic animations. It is not a good - technique for physical simulation, however because it creates a lie - -- the skin is not a physical part of the simulation and does not - interact with any objects in the world or itself. Objects will pass - right though the skin until they come in contact with the - underlying bone, which is a physical object. Whithout simulating - the skin, the sense of touch has little meaning, and the creature's - own vision will lie to it about the true extent of its body. 
- Simulating the skin as a physical object requires some way to - continuously update the physical model of the skin along with the - movement of the bones, which is unacceptably slow compared to rigid - body simulation. + technique for physical simulation because it is a lie -- the skin + is not a physical part of the simulation and does not interact with + any objects in the world or itself. Objects will pass right through + the skin until they come in contact with the underlying bone, which + is a physical object. Without simulating the skin, the sense of + touch has little meaning, and the creature's own vision will lie to + it about the true extent of its body. Simulating the skin as a + physical object requires some way to continuously update the + physical model of the skin along with the movement of the bones, + which is unacceptably slow compared to rigid body simulation. Therefore, instead of using the human-like ``deformable bag of bones'' approach, I decided to base my body plans on multiple solid @@ -762,7 +742,7 @@ together by invisible joint constraints. This is what I mean by ``eve-like''. The main reason that I use eve-style bodies is for efficiency, and so that there will be correspondence between the - AI's semses and the physical presence of its body. Each individual + AI's senses and the physical presence of its body. Each individual section is simulated by a separate rigid body that corresponds exactly with its visual representation and does not change. Sections are connected by invisible joints that are well supported @@ -870,7 +850,7 @@ must be called /after/ =physical!= is called. #+caption: Program to find the targets of a joint node by - #+caption: exponentiallly growth of a search cube. + #+caption: exponential growth of a search cube. #+name: joint-targets #+begin_listing clojure #+begin_src clojure @@ -905,7 +885,7 @@ a dispatch on the metadata of each joint node.
#+caption: Program to dispatch on blender metadata and create joints - #+caption: sutiable for physical simulation. + #+caption: suitable for physical simulation. #+name: joint-dispatch #+begin_listing clojure #+begin_src clojure @@ -985,8 +965,8 @@ In general, whenever =CORTEX= exposes a sense (or in this case physicality), it provides a function of the type =sense!=, which takes in a collection of nodes and augments it to support that - sense. The function returns any controlls necessary to use that - sense. In this case =body!= cerates a physical body and returns no + sense. The function returns any controls necessary to use that + sense. In this case =body!= creates a physical body and returns no control functions. #+caption: Program to give joints to a creature. @@ -1022,7 +1002,7 @@ creature. #+caption: With the ability to create physical creatures from blender, - #+caption: =CORTEX= gets one step closer to becomming a full creature + #+caption: =CORTEX= gets one step closer to becoming a full creature #+caption: simulation environment. #+name: name #+ATTR_LaTeX: :width 15cm @@ -1085,7 +1065,7 @@ hold the data. It does not do any copying from the GPU to the CPU itself because it is a slow operation. - #+caption: Function to make the rendered secne in jMonkeyEngine + #+caption: Function to make the rendered scene in jMonkeyEngine #+caption: available for further processing. #+name: pipeline-1 #+begin_listing clojure @@ -1160,7 +1140,7 @@ (let [target (closest-node creature eye) [cam-width cam-height] ;;[640 480] ;; graphics card on laptop doesn't support - ;; arbitray dimensions. + ;; arbitrary dimensions. (eye-dimensions eye) cam (Camera. cam-width cam-height) rot (.getWorldRotation eye)] @@ -1345,7 +1325,7 @@ =CORTEX='s hearing is unique because it does not have any limitations compared to other simulation environments. 
As far as I - know, there is no other system that supports multiple listerers, + know, there is no other system that supports multiple listeners, and the sound demo at the end of this section is the first time it's been done in a video game environment. @@ -1384,7 +1364,7 @@ Extending =OpenAL= to support multiple listeners requires 500 lines of =C= code and is too hairy to mention here. Instead, I will show a small amount of extension code and go over the high - level stragety. Full source is of course available with the + level strategy. Full source is of course available with the =CORTEX= distribution if you're interested. =OpenAL= goes to great lengths to support many different systems, @@ -1406,7 +1386,7 @@ sound it receives to a file, if everything has been set up correctly when configuring =OpenAL=. - Actual mixing (doppler shift and distance.environment-based + Actual mixing (Doppler shift and distance.environment-based attenuation) of the sound data happens in the Devices, and they are the only point in the sound rendering process where this data is available. @@ -1623,10 +1603,10 @@ #+END_SRC #+end_listing - #+caption: First ever simulation of multiple listerners in =CORTEX=. + #+caption: First ever simulation of multiple listeners in =CORTEX=. #+caption: Each cube is a creature which processes sound data with #+caption: the =process= function from listing \ref{sound-test}. - #+caption: the ball is constantally emiting a pure tone of + #+caption: the ball is constantly emitting a pure tone of #+caption: constant volume. As it approaches the cubes, they each #+caption: change color in response to the sound. #+name: sound-cubes. @@ -1756,7 +1736,7 @@ fit the height and width of the UV image). #+caption: Programs to extract triangles from a geometry and get - #+caption: their verticies in both world and UV-coordinates. + #+caption: their vertices in both world and UV-coordinates. 
#+name: get-triangles #+begin_listing clojure #+BEGIN_SRC clojure @@ -1851,7 +1831,7 @@ jMonkeyEngine's =Matrix4f= objects, which can describe any affine transformation. - #+caption: Program to interpert triangles as affine transforms. + #+caption: Program to interpret triangles as affine transforms. #+name: triangle-affine #+begin_listing clojure #+BEGIN_SRC clojure @@ -1894,7 +1874,7 @@ =inside-triangle?= determines whether a point is inside a triangle in 2D pixel-space. - #+caption: Program to efficiently determine point includion + #+caption: Program to efficiently determine point inclusion #+caption: in a triangle. #+name: in-triangle #+begin_listing clojure @@ -2089,7 +2069,7 @@ Armed with the =touch!= function, =CORTEX= becomes capable of giving creatures a sense of touch. A simple test is to create a - cube that is outfitted with a uniform distrubition of touch + cube that is outfitted with a uniform distribution of touch sensors. It can feel the ground and any balls that it touches. #+caption: =CORTEX= interface for creating touch in a simulated @@ -2111,7 +2091,7 @@ #+end_listing The tactile-sensor-profile image for the touch cube is a simple - cross with a unifom distribution of touch sensors: + cross with a uniform distribution of touch sensors: #+caption: The touch profile for the touch-cube. Each pure white #+caption: pixel defines a touch sensitive feeler. @@ -2119,7 +2099,7 @@ #+ATTR_LaTeX: :width 7cm [[./images/touch-profile.png]] - #+caption: The touch cube reacts to canonballs. The black, red, + #+caption: The touch cube reacts to cannonballs. The black, red, #+caption: and white cross on the right is a visual display of #+caption: the creature's touch. White means that it is feeling #+caption: something strongly, black is not feeling anything, @@ -2171,7 +2151,7 @@ like a normal dot-product angle is. The purpose of these functions is to build a system of angle - measurement that is biologically plausable. 
+ measurement that is biologically plausible. #+caption: Program to measure angles along a vector #+name: helpers @@ -2201,7 +2181,7 @@ connects. The only tricky part here is making the angles relative to the joint's initial ``straightness''. - #+caption: Program to return biologially reasonable proprioceptive + #+caption: Program to return biologically reasonable proprioceptive #+caption: data for each joint. #+name: proprioception #+begin_listing clojure @@ -2359,7 +2339,7 @@ *** Creating muscles - #+caption: This is the core movement functoion in =CORTEX=, which + #+caption: This is the core movement function in =CORTEX=, which #+caption: implements muscles that report on their activation. #+name: muscle-kernel #+begin_listing clojure @@ -2417,7 +2397,7 @@ intricate marionette hand with several strings for each finger: #+caption: View of the hand model with all sense nodes. You can see - #+caption: the joint, muscle, ear, and eye nodess here. + #+caption: the joint, muscle, ear, and eye nodes here. #+name: hand-nodes-1 #+ATTR_LaTeX: :width 11cm [[./images/hand-with-all-senses2.png]] @@ -2430,7 +2410,7 @@ With the hand fully rigged with senses, I can run it though a test that will test everything. - #+caption: A full test of the hand with all senses. Note expecially + #+caption: A full test of the hand with all senses. Note especially #+caption: the interactions the hand has with itself: it feels #+caption: its own palm and fingers, and when it curls its fingers, #+caption: it sees them with its eye (which is located in the center @@ -2440,7 +2420,7 @@ #+ATTR_LaTeX: :width 16cm [[./images/integration.png]] -** =CORTEX= enables many possiblities for further research +** =CORTEX= enables many possibilities for further research Often times, the hardest part of building a system involving creatures is dealing with physics and graphics. 
=CORTEX= removes @@ -2561,14 +2541,14 @@ #+end_src #+end_listing -** Embodiment factors action recognition into managable parts +** Embodiment factors action recognition into manageable parts Using empathy, I divide the problem of action recognition into a recognition process expressed in the language of a full compliment - of senses, and an imaganitive process that generates full sensory + of senses, and an imaginative process that generates full sensory data from partial sensory data. Splitting the action recognition problem in this manner greatly reduces the total amount of work to - recognize actions: The imaganitive process is mostly just matching + recognize actions: The imaginative process is mostly just matching previous experience, and the recognition process gets to use all the senses to directly describe any action. @@ -2586,8 +2566,8 @@ experience, observe however much of it they desire, and decide whether the worm is doing the action they describe. =curled?= relies on proprioception, =resting?= relies on touch, =wiggling?= - relies on a fourier analysis of muscle contraction, and - =grand-circle?= relies on touch and reuses =curled?= as a gaurd. + relies on a Fourier analysis of muscle contraction, and + =grand-circle?= relies on touch and reuses =curled?= as a guard. #+caption: Program for detecting whether the worm is curled. This is the #+caption: simplest action predicate, because it only uses the last frame @@ -2634,7 +2614,7 @@ #+caption: uses a summary of the tactile information from the underbelly #+caption: of the worm, and is only true if every segment is touching the #+caption: floor. Note that this function contains no references to - #+caption: proprioction at all. + #+caption: proprioception at all. #+name: resting #+begin_listing clojure #+begin_src clojure @@ -2675,9 +2655,9 @@ #+caption: Program for detecting whether the worm has been wiggling for - #+caption: the last few frames. 
It uses a fourier analysis of the muscle + #+caption: the last few frames. It uses a Fourier analysis of the muscle #+caption: contractions of the worm's tail to determine wiggling. This is - #+caption: signigicant because there is no particular frame that clearly + #+caption: significant because there is no particular frame that clearly #+caption: indicates that the worm is wiggling --- only when multiple frames #+caption: are analyzed together is the wiggling revealed. Defining #+caption: wiggling this way also gives the worm an opportunity to learn @@ -2738,7 +2718,7 @@ #+end_listing #+caption: Using =debug-experience=, the body-centered predicates - #+caption: work together to classify the behaviour of the worm. + #+caption: work together to classify the behavior of the worm. #+caption: the predicates are operating with access to the worm's #+caption: full sensory data. #+name: basic-worm-view @@ -2749,10 +2729,10 @@ empathic recognition system. There is power in the simplicity of the action predicates. They describe their actions without getting confused in visual details of the worm. Each one is frame - independent, but more than that, they are each indepent of + independent, but more than that, they are each independent of irrelevant visual details of the worm and the environment. They will work regardless of whether the worm is a different color or - hevaily textured, or if the environment has strange lighting. + heavily textured, or if the environment has strange lighting. The trick now is to make the action predicates work even when the sensory data on which they depend is absent. If I can do that, then @@ -2776,7 +2756,7 @@ As the worm moves around during free play and its experience vector grows larger, the vector begins to define a subspace which is all - the sensations the worm can practicaly experience during normal + the sensations the worm can practically experience during normal operation. I call this subspace \Phi-space, short for physical-space. 
The experience vector defines a path through \Phi-space. This path has interesting properties that all derive @@ -2801,7 +2781,7 @@ body along a specific path through \Phi-space. There is a simple way of taking \Phi-space and the total ordering - provided by an experience vector and reliably infering the rest of + provided by an experience vector and reliably inferring the rest of the senses. ** Empathy is the process of tracing though \Phi-space @@ -2817,8 +2797,8 @@ matching experience records for each input, using the tiered proprioceptive bins. - Finally, to infer sensory data, select the longest consective chain - of experiences. Conecutive experience means that the experiences + Finally, to infer sensory data, select the longest consecutive chain + of experiences. Consecutive experience means that the experiences appear next to each other in the experience vector. This algorithm has three advantages: @@ -2833,8 +2813,8 @@ 2. It protects from wrong interpretations of transient ambiguous proprioceptive data. For example, if the worm is flat for just - an instant, this flattness will not be interpreted as implying - that the worm has its muscles relaxed, since the flattness is + an instant, this flatness will not be interpreted as implying + that the worm has its muscles relaxed, since the flatness is part of a longer chain which includes a distinct pattern of muscle activation. Markov chains or other memoryless statistical models that operate on individual frames may very well make this @@ -2855,7 +2835,7 @@ (defn gen-phi-scan "Nearest-neighbors with binning. Only returns a result if - the propriceptive data is within 10% of a previously recorded + the proprioceptive data is within 10% of a previously recorded result in all dimensions." [phi-space] (let [bin-keys (map bin [3 2 1]) @@ -2882,13 +2862,13 @@ from previous experience. It prefers longer chains of previous experience to shorter ones. 
For example, during training the worm might rest on the ground for one second before it performs its - excercises. If during recognition the worm rests on the ground for - five seconds, =longest-thread= will accomodate this five second + exercises. If during recognition the worm rests on the ground for + five seconds, =longest-thread= will accommodate this five second rest period by looping the one second rest chain five times. - =longest-thread= takes time proportinal to the average number of + =longest-thread= takes time proportional to the average number of entries in a proprioceptive bin, because for each element in the - starting bin it performes a series of set lookups in the preceeding + starting bin it performs a series of set lookups in the preceding bins. If the total history is limited, then this is only a constant multiple times the number of entries in the starting bin. This analysis also applies even if the action requires multiple longest @@ -2966,7 +2946,7 @@ experiences from the worm that includes the actions I want to recognize. The =generate-phi-space= program (listing \ref{generate-phi-space} runs the worm through a series of - exercices and gatheres those experiences into a vector. The + exercises and gathers those experiences into a vector. The =do-all-the-things= program is a routine expressed in a simple muscle contraction script language for automated worm control. It causes the worm to rest, curl, and wiggle over about 700 frames @@ -2975,7 +2955,7 @@ #+caption: Program to gather the worm's experiences into a vector for #+caption: further processing. The =motor-control-program= line uses #+caption: a motor control script that causes the worm to execute a series - #+caption: of ``exercices'' that include all the action predicates. + #+caption: of ``exercises'' that include all the action predicates.
#+name: generate-phi-space #+begin_listing clojure #+begin_src clojure @@ -3039,14 +3019,14 @@ #+caption: From only proprioceptive data, =EMPATH= was able to infer #+caption: the complete sensory experience and classify four poses - #+caption: (The last panel shows a composite image of \emph{wriggling}, + #+caption: (The last panel shows a composite image of /wiggling/, #+caption: a dynamic pose.) #+name: empathy-debug-image #+ATTR_LaTeX: :width 10cm :placement [H] [[./images/empathy-1.png]] One way to measure the performance of =EMPATH= is to compare the - sutiability of the imagined sense experience to trigger the same + suitability of the imagined sense experience to trigger the same action predicates as the real sensory experience. #+caption: Determine how closely empathy approximates actual @@ -3086,7 +3066,7 @@ Running =test-empathy-accuracy= using the very short exercise program defined in listing \ref{generate-phi-space}, and then doing - a similar pattern of activity manually yeilds an accuracy of around + a similar pattern of activity manually yields an accuracy of around 73%. This is based on very limited worm experience. By training the worm for longer, the accuracy dramatically improves. @@ -3113,21 +3093,21 @@ =test-empathy-accuracy=. The majority of errors are near the boundaries of transitioning from one type of action to another. During these transitions the exact label for the action is more open - to interpretation, and dissaggrement between empathy and experience + to interpretation, and disagreement between empathy and experience is more excusable. ** Digression: Learn touch sensor layout through free play In the previous section I showed how to compute actions in terms of - body-centered predicates which relied averate touch activation of - pre-defined regions of the worm's skin. What if, instead of - recieving touch pre-grouped into the six faces of each worm - segment, the true topology of the worm's skin was unknown? 
This is - more similiar to how a nerve fiber bundle might be arranged. While - two fibers that are close in a nerve bundle /might/ correspond to - two touch sensors that are close together on the skin, the process - of taking a complicated surface and forcing it into essentially a - circle requires some cuts and rerragenments. + body-centered predicates which relied on the average touch + activation of pre-defined regions of the worm's skin. What if, + instead of receiving touch pre-grouped into the six faces of each + worm segment, the true topology of the worm's skin was unknown? + This is more similar to how a nerve fiber bundle might be + arranged. While two fibers that are close in a nerve bundle /might/ + correspond to two touch sensors that are close together on the + skin, the process of taking a complicated surface and forcing it + into essentially a circle requires some cuts and rearrangements. In this section I show how to automatically learn the skin-topology of a worm segment by free exploration. As the worm rolls around on the @@ -3151,15 +3131,15 @@ #+end_listing After collecting these important regions, there will many nearly - similiar touch regions. While for some purposes the subtle + similar touch regions. While for some purposes the subtle differences between these regions will be important, for my - purposes I colapse them into mostly non-overlapping sets using - =remove-similiar= in listing \ref{remove-similiar} - - #+caption: Program to take a lits of set of points and ``collapse them'' - #+caption: so that the remaining sets in the list are siginificantly + purposes I collapse them into mostly non-overlapping sets using + =remove-similar= in listing \ref{remove-similar}. + + #+caption: Program to take a list of sets of points and ``collapse them'' + #+caption: so that the remaining sets in the list are significantly #+caption: different from each other. Prefer smaller sets to larger ones.
- #+name: remove-similiar + #+name: remove-similar #+begin_listing clojure #+begin_src clojure (defn remove-similar @@ -3181,7 +3161,7 @@ Actually running this simulation is easy given =CORTEX='s facilities. #+caption: Collect experiences while the worm moves around. Filter the touch - #+caption: sensations by stable ones, collapse similiar ones together, + #+caption: sensations by stable ones, collapse similar ones together, #+caption: and report the regions learned. #+name: learn-touch #+begin_listing clojure @@ -3216,7 +3196,7 @@ #+end_src #+end_listing - The only thing remining to define is the particular motion the worm + The only thing remaining to define is the particular motion the worm must take. I accomplish this with a simple motor control program. #+caption: Motor control program for making the worm roll on the ground. @@ -3275,7 +3255,7 @@ the worm's physiology and the worm's environment to correctly deduce that the worm has six sides. Note that =learn-touch-regions= would work just as well even if the worm's touch sense data were - completely scrambled. The cross shape is just for convienence. This + completely scrambled. The cross shape is just for convenience. This example justifies the use of pre-defined touch regions in =EMPATH=. * Contributions @@ -3283,19 +3263,18 @@ In this thesis you have seen the =CORTEX= system, a complete environment for creating simulated creatures. You have seen how to implement five senses: touch, proprioception, hearing, vision, and - muscle tension. You have seen how to create new creatues using + muscle tension. You have seen how to create new creatures using blender, a 3D modeling tool. I hope that =CORTEX= will be useful in further research projects. To this end I have included the full source to =CORTEX= along with a large suite of tests and examples. I - have also created a user guide for =CORTEX= which is inculded in an - appendix to this thesis \ref{}. 
-# dxh: todo reference appendix + have also created a user guide for =CORTEX= which is included in an + appendix to this thesis. You have also seen how I used =CORTEX= as a platform to attach the /action recognition/ problem, which is the problem of recognizing actions in video. You saw a simple system called =EMPATH= which - ientifies actions by first describing actions in a body-centerd, - rich sense language, then infering a full range of sensory + identifies actions by first describing actions in a body-centered, + rich sense language, then inferring a full range of sensory experience from limited data using previous experience gained from free play. @@ -3305,23 +3284,22 @@ In conclusion, the main contributions of this thesis are: - - =CORTEX=, a system for creating simulated creatures with rich - senses. - - =EMPATH=, a program for recognizing actions by imagining sensory - experience. - -# An anatomical joke: -# - Training -# - Skeletal imitation -# - Sensory fleshing-out -# - Classification + - =CORTEX=, a comprehensive platform for embodied AI experiments. + =CORTEX= supports many features lacking in other systems, such as + proper simulation of hearing. It is easy to create new =CORTEX= + creatures using Blender, a free 3D modeling program. + + - =EMPATH=, which uses =CORTEX= to identify the actions of a + worm-like creature using a computational model of empathy. + #+BEGIN_LaTeX \appendix #+END_LaTeX + * Appendix: =CORTEX= User Guide Those who write a thesis should endeavor to make their code not only - accessable, but actually useable, as a way to pay back the community + accessible, but actually usable, as a way to pay back the community that made the thesis possible in the first place. This thesis would not be possible without Free Software such as jMonkeyEngine3, Blender, clojure, emacs, ffmpeg, and many other tools. That is why I @@ -3349,7 +3327,7 @@
You will need Blender version 2.6 when using the =CORTEX= included - in this thesis. You create a =CORTEX= creature in a similiar manner + in this thesis. You create a =CORTEX= creature in a similar manner to modeling anything in Blender, except that you also create several trees of empty nodes which define the creature's senses. @@ -3417,7 +3395,7 @@ to set the empty node's display mode to ``Arrows'' so that you can clearly see the direction of the axes. - Each retina file should contain white pixels whever you want to be + Each retina file should contain white pixels wherever you want to be sensitive to your chosen color. If you want the entire field of view, specify :all of 0xFFFFFF and a retinal map that is entirely white. @@ -3453,7 +3431,7 @@ #+END_EXAMPLE You may also include an optional ``scale'' metadata number to - specifiy the length of the touch feelers. The default is $0.1$, + specify the length of the touch feelers. The default is $0.1$, and this is generally sufficient. The touch UV should contain white pixels for each touch sensor. @@ -3475,7 +3453,7 @@ #+ATTR_LaTeX: :width 9cm :placement [H] [[./images/finger-2.png]] -*** Propriocepotion +*** Proprioception Proprioception is tied to each joint node -- nothing special must be done in a blender model to enable proprioception other than @@ -3582,10 +3560,10 @@ representing that described in a blender file. - =(light-up-everything world)= :: distribute a standard compliment - of lights throught the simulation. Should be adequate for most + of lights throughout the simulation. Should be adequate for most purposes. - - =(node-seq node)= :: return a recursuve list of the node's + - =(node-seq node)= :: return a recursive list of the node's children. - =(nodify name children)= :: construct a node given a node-name and @@ -3638,7 +3616,7 @@ - =(proprioception! creature)= :: give the creature the sense of proprioception. 
Returns a list of functions, one for each joint, that when called during a running simulation will - report the =[headnig, pitch, roll]= of the joint. + report the =[heading, pitch, roll]= of the joint. - =(movement! creature)= :: give the creature the power of movement. Creates a list of functions, one for each muscle, that when @@ -3677,7 +3655,7 @@ function will import all jMonkeyEngine3 classes for immediate use. - - =(display-dialated-time world timer)= :: Shows the time as it is + - =(display-dilated-time world timer)= :: Shows the time as it is flowing in the simulation on a HUD display. diff -r ced955c3c84f -r 68665d2c32a7 thesis/rlm-cortex-meng.tex --- a/thesis/rlm-cortex-meng.tex Sun Mar 30 22:48:19 2014 -0400 +++ b/thesis/rlm-cortex-meng.tex Mon Mar 31 00:18:26 2014 -0400 @@ -53,7 +53,7 @@ \usepackage{minted} \usepackage[backend=bibtex,style=alphabetic]{biblatex} %\usepackage[section]{placeins} -\usepackage[section,subsection,subsubsection]{extraplaceins} +\usepackage[section,subsection]{extraplaceins} %\floatsetup[listing]{style=Plaintop}