# HG changeset patch # User Robert McIntyre # Date 1398712259 14400 # Node ID 5d89879fc894358ffe44d45d8a303a26b11b95ad # Parent f4770e3d30ae772dc9ecd2cd7aa47bad01816cdd couple hours worth of edits. diff -r f4770e3d30ae -r 5d89879fc894 thesis/cortex.org --- a/thesis/cortex.org Mon Apr 28 13:14:52 2014 -0400 +++ b/thesis/cortex.org Mon Apr 28 15:10:59 2014 -0400 @@ -43,15 +43,15 @@ * Empathy \& Embodiment: problem solving strategies - By the end of this thesis, you will have seen a novel approach to - interpreting video using embodiment and empathy. You will also see - one way to efficiently implement physical empathy for embodied - creatures. Finally, you will become familiar with =CORTEX=, a system - for designing and simulating creatures with rich senses, which I - have designed as a library that you can use in your own research. - Note that I /do not/ process video directly --- I start with - knowledge of the positions of a creature's body parts and works from - there. + By the end of this thesis, you will have seen a novel approach to + representing and recognizing physical actions using embodiment and + empathy. You will also see one way to efficiently implement physical + empathy for embodied creatures. Finally, you will become familiar + with =CORTEX=, a system for designing and simulating creatures with + rich senses, which I have designed as a library that you can use in + your own research. Note that I /do not/ process video directly --- I + start with knowledge of the positions of a creature's body parts and + work from there. This is the core vision of my thesis: That one of the important ways in which we understand others is by imagining ourselves in their @@ -81,11 +81,11 @@ \cite{volume-action-recognition}), but the 3D world is so variable that it is hard to describe the world in terms of possible images.
- In fact, the contents of scene may have much less to do with pixel - probabilities than with recognizing various affordances: things you - can move, objects you can grasp, spaces that can be filled . For - example, what processes might enable you to see the chair in figure - \ref{hidden-chair}? + In fact, the contents of a scene may have much less to do with + pixel probabilities than with recognizing various affordances: + things you can move, objects you can grasp, spaces that can be + filled. For example, what processes might enable you to see the + chair in figure \ref{hidden-chair}? #+caption: The chair in this image is quite obvious to humans, but #+caption: it can't be found by any modern computer vision program. @@ -106,21 +106,21 @@ Each of these examples tells us something about what might be going on in our minds as we easily solve these recognition problems: - The hidden chair shows us that we are strongly triggered by cues - relating to the position of human bodies, and that we can determine - the overall physical configuration of a human body even if much of - that body is occluded. - - The picture of the girl pushing against the wall tells us that we - have common sense knowledge about the kinetics of our own bodies. - We know well how our muscles would have to work to maintain us in - most positions, and we can easily project this self-knowledge to - imagined positions triggered by images of the human body. - - The cat tells us that imagination of some kind plays an important - role in understanding actions. The question is: Can we be more - precise about what sort of imagination is required to understand - these actions? + - The hidden chair shows us that we are strongly triggered by cues + relating to the position of human bodies, and that we can + determine the overall physical configuration of a human body even + if much of that body is occluded.
+ + - The picture of the girl pushing against the wall tells us that we + have common sense knowledge about the kinetics of our own bodies. + We know well how our muscles would have to work to maintain us in + most positions, and we can easily project this self-knowledge to + imagined positions triggered by images of the human body. + + - The cat tells us that imagination of some kind plays an important + role in understanding actions. The question is: Can we be more + precise about what sort of imagination is required to understand + these actions? ** A step forward: the sensorimotor-centered approach @@ -135,12 +135,12 @@ the cool water hitting their tongue, and feel the water entering their body, and are able to recognize that /feeling/ as drinking. So, the label of the action is not really in the pixels of the - image, but is found clearly in a simulation inspired by those - pixels. An imaginative system, having been trained on drinking and - non-drinking examples and learning that the most important - component of drinking is the feeling of water sliding down one's - throat, would analyze a video of a cat drinking in the following - manner: + image, but is found clearly in a simulation / recollection inspired + by those pixels. An imaginative system, having been trained on + drinking and non-drinking examples and learning that the most + important component of drinking is the feeling of water sliding + down one's throat, would analyze a video of a cat drinking in the + following manner: 1. Create a physical model of the video by putting a ``fuzzy'' model of its own body in place of the cat. Possibly also create @@ -193,7 +193,7 @@ the particulars of any visual representation of the actions. 
If you teach the system what ``running'' is, and you have a good enough aligner, the system will from then on be able to recognize running - from any point of view, even strange points of view like above or + from any point of view -- even strange points of view like above or underneath the runner. This is in contrast to action recognition schemes that try to identify actions using a non-embodied approach. If these systems learn about running as viewed from the side, they @@ -201,12 +201,13 @@ viewpoint. Another powerful advantage is that using the language of multiple - body-centered rich senses to describe body-centered actions offers a - massive boost in descriptive capability. Consider how difficult it - would be to compose a set of HOG filters to describe the action of - a simple worm-creature ``curling'' so that its head touches its - tail, and then behold the simplicity of describing thus action in a - language designed for the task (listing \ref{grand-circle-intro}): + body-centered rich senses to describe body-centered actions offers + a massive boost in descriptive capability. Consider how difficult + it would be to compose a set of HOG (Histogram of Oriented + Gradients) filters to describe the action of a simple worm-creature + ``curling'' so that its head touches its tail, and then behold the + simplicity of describing this action in a language designed for the + task (listing \ref{grand-circle-intro}): #+caption: Body-centered actions are best expressed in a body-centered #+caption: language. This code detects when the worm has curled into a @@ -272,10 +273,10 @@ together to form a coherent and complete sensory portrait of the scene.
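The thesis's actual curling predicate (listing \ref{grand-circle-intro}) is written in Clojure against =CORTEX='s touch-sense API. As a rough illustration of why a body-centered language is so compact, here is a hypothetical Python sketch of the same idea: an action is just a small test over body-centered data. Here the (assumed) input is a list of worm-segment positions rather than =CORTEX='s real sensory data structures.

```python
import math

def distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def curled(segment_positions, touch_threshold=0.3):
    """A body-centered 'curling' predicate: true when the head
    segment is close enough to the tail segment to be touching.
    `segment_positions` is a list of (x, y, z) centers, head first.
    (Hypothetical representation, not CORTEX's actual touch data.)"""
    head, tail = segment_positions[0], segment_positions[-1]
    return distance(head, tail) < touch_threshold

# A straight worm is not curled; a worm bent into a near-circle is.
straight = [(float(i), 0.0, 0.0) for i in range(5)]
circle = [(math.cos(t), math.sin(t), 0.0)
          for t in [0.0, 1.5, 3.0, 4.5, 6.0]]
```

Compare this three-line test with the HOG-filter bank the text describes: the body-centered version is viewpoint-independent by construction.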
- - Recognition :: With the scene described in terms of - remembered first person sensory events, the creature can now - run its action-identified programs (such as the one in listing - \ref{grand-circle-intro} on this synthesized sensory data, + - Recognition :: With the scene described in terms of remembered + first person sensory events, the creature can now run its + action-definition programs (such as the one in listing + \ref{grand-circle-intro}) on this synthesized sensory data, just as it would if it were actually experiencing the scene first-hand. If previous experience has been accurately retrieved, and if it is analogous enough to the scene, then @@ -327,20 +328,21 @@ number of creatures. I intend it to be useful as a library for many more projects than just this thesis. =CORTEX= was necessary to meet a need among AI researchers at CSAIL and beyond, which is that - people often will invent neat ideas that are best expressed in the - language of creatures and senses, but in order to explore those + people often will invent wonderful ideas that are best expressed in + the language of creatures and senses, but in order to explore those ideas they must first build a platform in which they can create simulated creatures with rich senses! There are many ideas that - would be simple to execute (such as =EMPATH= or - \cite{larson-symbols}), but attached to them is the multi-month - effort to make a good creature simulator. Often, that initial - investment of time proves to be too much, and the project must make - do with a lesser environment. + would be simple to execute (such as =EMPATH= or Larson's + self-organizing maps (\cite{larson-symbols})), but attached to them + is the multi-month effort to make a good creature simulator. Often, + that initial investment of time proves to be too much, and the + project must make do with a lesser environment or be abandoned + entirely. 
=CORTEX= is well suited as an environment for embodied AI research for three reasons: - - You can create new creatures using Blender (\cite{blender}), a + - You can design new creatures using Blender (\cite{blender}), a popular 3D modeling program. Each sense can be specified using special blender nodes with biologically inspired parameters. You need not write any code to create a creature, and can use a wide @@ -352,9 +354,8 @@ senses like touch and vision involve multiple sensory elements embedded in a 2D surface. You have complete control over the distribution of these sensor elements through the use of simple - png image files. In particular, =CORTEX= implements more - comprehensive hearing than any other creature simulation system - available. + png image files. =CORTEX= implements more comprehensive hearing + than any other creature simulation system available. - =CORTEX= supports any number of creatures and any number of senses. Time in =CORTEX= dilates so that the simulated creatures @@ -425,7 +426,7 @@ Throughout this project, I intended for =CORTEX= to be flexible and extensible enough to be useful for other researchers who want to - test out ideas of their own. To this end, wherever I have had to make + test ideas of their own. To this end, wherever I have had to make architectural choices about =CORTEX=, I have chosen to give as much freedom to the user as possible, so that =CORTEX= may be used for things I have not foreseen. @@ -437,25 +438,26 @@ reflection of its complexity. It may be that there is a significant qualitative difference between dealing with senses in the real world and dealing with pale facsimiles of them in a simulation - \cite{brooks-representation}. What are the advantages and + (\cite{brooks-representation}). What are the advantages and disadvantages of a simulation vs. reality? *** Simulation The advantages of virtual reality are that when everything is a simulation, experiments in that simulation are absolutely - reproducible. 
It's also easier to change the character and world - to explore new situations and different sensory combinations. + reproducible. It's also easier to change the creature and + environment to explore new situations and different sensory + combinations. If the world is to be simulated on a computer, then not only do - you have to worry about whether the character's senses are rich + you have to worry about whether the creature's senses are rich enough to learn from the world, but whether the world itself is rendered with enough detail and realism to give enough working - material to the character's senses. To name just a few + material to the creature's senses. To name just a few difficulties facing modern physics simulators: destructibility of the environment, simulation of water/other fluids, large areas, nonrigid bodies, lots of objects, smoke. I don't know of any - computer simulation that would allow a character to take a rock + computer simulation that would allow a creature to take a rock and grind it into fine dust, then use that dust to make a clay sculpture, at least not without spending years calculating the interactions of every single small grain of dust. Maybe a @@ -471,14 +473,14 @@ the complexity of implementing the senses. Instead of just grabbing the current rendered frame for processing, you have to use an actual camera with real lenses and interact with photons to - get an image. It is much harder to change the character, which is + get an image. It is much harder to change the creature, which is now partly a physical robot of some sort, since doing so involves changing things around in the real world instead of modifying lines of code. 
While the real world is very rich and definitely - provides enough stimulation for intelligence to develop as - evidenced by our own existence, it is also uncontrollable in the + provides enough stimulation for intelligence to develop (as + evidenced by our own existence), it is also uncontrollable in the sense that a particular situation cannot be recreated perfectly or - saved for later use. It is harder to conduct science because it is + saved for later use. It is harder to conduct Science because it is harder to repeat an experiment. The worst thing about using the real world instead of a simulation is the matter of time. Instead of simulated time you get the constant and unstoppable flow of @@ -488,8 +490,8 @@ may simply be impossible given the current speed of our processors. Contrast this with a simulation, in which the flow of time in the simulated world can be slowed down to accommodate the - limitations of the character's programming. In terms of cost, - doing everything in software is far cheaper than building custom + limitations of the creature's programming. In terms of cost, doing + everything in software is far cheaper than building custom real-time hardware. All you need is a laptop and some patience. ** Simulated time enables rapid prototyping \& simple programs @@ -505,24 +507,24 @@ to be accelerated by ASIC chips or FPGAs, turning what would otherwise be a few lines of code and a 10x speed penalty into a multi-month ordeal. For this reason, =CORTEX= supports - /time-dilation/, which scales back the framerate of the - simulation in proportion to the amount of processing each frame. - From the perspective of the creatures inside the simulation, time - always appears to flow at a constant rate, regardless of how - complicated the environment becomes or how many creatures are in - the simulation. The cost is that =CORTEX= can sometimes run slower - than real time. 
This can also be an advantage, however --- - simulations of very simple creatures in =CORTEX= generally run at - 40x on my machine! + /time-dilation/, which scales back the framerate of the simulation + in proportion to the amount of processing each frame requires. From + the perspective of the creatures inside the simulation, time always + appears to flow at a constant rate, regardless of how complicated + the environment becomes or how many creatures are in the + simulation. The cost is that =CORTEX= can sometimes run slower than + real time. Time dilation works both ways, however --- simulations + of very simple creatures in =CORTEX= generally run at 40x real-time + on my machine! ** All sense organs are two-dimensional surfaces If =CORTEX= is to support a wide variety of senses, it would help - to have a better understanding of what a ``sense'' actually is! - While vision, touch, and hearing all seem like they are quite - different things, I was surprised to learn during the course of - this thesis that they (and all physical senses) can be expressed as - exactly the same mathematical object due to a dimensional argument! + to have a better understanding of what a sense actually is! While + vision, touch, and hearing all seem like they are quite different + things, I was surprised to learn during the course of this thesis + that they (and all physical senses) can be expressed as exactly the + same mathematical object! Human beings are three-dimensional objects, and the nerves that transmit data from our various sense organs to our brain are @@ -545,7 +547,7 @@ Most human senses consist of many discrete sensors of various properties distributed along a surface at various densities. For skin, it is Pacinian corpuscles, Meissner's corpuscles, Merkel's - disks, and Ruffini's endings \cite{textbook901}, which detect + disks, and Ruffini's endings (\cite{textbook901}), which detect pressure and vibration of various intensities.
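The time-dilation scheme above can be sketched in a few lines. This is a hypothetical model, not =CORTEX='s actual implementation: each step advances the world by a fixed simulated timestep no matter how long sensory processing takes in wall-clock time, so the creature always perceives time flowing at a constant rate.

```python
import time

SIM_DT = 1.0 / 60.0  # simulated seconds per frame, held fixed

def run(world, creature, frames):
    """Advance the simulation `frames` steps. Wall-clock time
    stretches with per-frame processing cost, but the creature
    only ever sees constant SIM_DT increments. `world` and
    `creature` are hypothetical objects with step/process_senses
    methods."""
    sim_time = 0.0
    start = time.perf_counter()
    for _ in range(frames):
        world.step(SIM_DT)          # physics advances simulated time
        creature.process_senses()   # may be arbitrarily slow
        sim_time += SIM_DT
    wall = time.perf_counter() - start
    # dilation factor > 1 means running slower than real time,
    # < 1 means running faster (e.g. the 40x case in the text)
    return sim_time, wall, wall / sim_time
```

The design choice is that the *simulated* clock is authoritative; wall-clock time is merely however long the loop happens to take.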
For ears, it is the stereocilia distributed along the basilar membrane inside the cochlea; each one is sensitive to a slightly different frequency of @@ -556,19 +558,19 @@ In fact, almost every human sense can be effectively described in terms of a surface containing embedded sensors. If the sense had any more dimensions, then there wouldn't be enough room in the - spinal chord to transmit the information! + spinal cord to transmit the information! Therefore, =CORTEX= must support the ability to create objects and then be able to ``paint'' points along their surfaces to describe each sense. Fortunately this idea is already a well known computer graphics - technique called /UV-mapping/. The three-dimensional surface of a - model is cut and smooshed until it fits on a two-dimensional - image. You paint whatever you want on that image, and when the - three-dimensional shape is rendered in a game the smooshing and - cutting is reversed and the image appears on the three-dimensional - object. + technique called /UV-mapping/. In UV-mapping, the three-dimensional + surface of a model is cut and smooshed until it fits on a + two-dimensional image. You paint whatever you want on that image, + and when the three-dimensional shape is rendered in a game the + smooshing and cutting is reversed and the image appears on the + three-dimensional object. To make a sense, interpret the UV-image as describing the distribution of that sense's sensors. To get different types of @@ -610,12 +612,12 @@ game engine will allow you to efficiently create multiple cameras in the simulated world that can be used as eyes. Video game systems offer integrated asset management for things like textures and - creatures models, providing an avenue for defining creatures. They + creature models, providing an avenue for defining creatures. They also understand UV-mapping, since this technique is used to apply a texture to a model.
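The "interpret the UV-image as a sensor distribution" idea can be sketched concretely. In this hypothetical Python sketch (=CORTEX= itself does this in Clojure from png files), an image is a grid of integers, and every non-zero pixel becomes a sensor at that UV coordinate, with the pixel value selecting the sensor type:

```python
def sensors_from_uv_image(image):
    """Interpret a 2D image as a sensor distribution: each non-zero
    pixel marks one sensor at its (u, v) coordinate, and the pixel
    value selects the sensor type (e.g. different touch receptors).
    `image` is a list of rows of integers (a stand-in for a png)."""
    sensors = []
    for v, row in enumerate(image):
        for u, value in enumerate(row):
            if value != 0:
                sensors.append({"uv": (u, v), "type": value})
    return sensors

# A 3x3 skin patch: dense type-1 sensors along the top edge,
# a single type-2 sensor near the bottom.
patch = [[1, 1, 1],
         [0, 0, 0],
         [0, 2, 0]]
```

Painting a denser region of pixels yields a denser region of sensors, which is exactly how the text describes controlling sensor distribution through image files.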
Finally, because video game engines support a - large number of users, as long as =CORTEX= doesn't stray too far - from the base system, other researchers can turn to this community - for help when doing their research. + large number of developers, as long as =CORTEX= doesn't stray too + far from the base system, other researchers can turn to this + community for help when doing their research. ** =CORTEX= is based on jMonkeyEngine3 @@ -623,14 +625,14 @@ engines to see which would best serve as a base. The top contenders were: - - [[http://www.idsoftware.com][Quake II]]/[[http://www.bytonic.de/html/jake2.html][Jake2]] :: The Quake II engine was designed by ID - software in 1997. All the source code was released by ID - software into the Public Domain several years ago, and as a - result it has been ported to many different languages. This - engine was famous for its advanced use of realistic shading - and had decent and fast physics simulation. The main advantage - of the Quake II engine is its simplicity, but I ultimately - rejected it because the engine is too tied to the concept of a + - [[http://www.idsoftware.com][Quake II]]/[[http://www.bytonic.de/html/jake2.html][Jake2]] :: The Quake II engine was designed by ID software + in 1997. All the source code was released by ID software into + the Public Domain several years ago, and as a result it has + been ported to many different languages. This engine was + famous for its advanced use of realistic shading and it had + decent and fast physics simulation. The main advantage of the + Quake II engine is its simplicity, but I ultimately rejected + it because the engine is too tied to the concept of a first-person shooter game. One of the problems I had was that there does not seem to be any easy way to attach multiple cameras to a single character. There are also several physics @@ -670,11 +672,11 @@ enable people who are talented at modeling but not programming to design =CORTEX= creatures. 
- Therefore, I use Blender, a free 3D modeling program, as the main + Therefore I use Blender, a free 3D modeling program, as the main way to create creatures in =CORTEX=. However, the creatures modeled in Blender must also be simple to simulate in jMonkeyEngine3's game engine, and must also be easy to rig with =CORTEX='s senses. I - accomplish this with extensive use of Blender's ``empty nodes.'' + accomplish this with extensive use of Blender's ``empty nodes.'' Empty nodes have no mass, physical presence, or appearance, but they can hold metadata and have names. I use a tree structure of @@ -699,14 +701,14 @@ Blender is a general purpose animation tool, which has been used in the past to create high quality movies such as Sintel - \cite{blender}. Though Blender can model and render even complicated - things like water, it is crucial to keep models that are meant to - be simulated as creatures simple. =Bullet=, which =CORTEX= uses - though jMonkeyEngine3, is a rigid-body physics system. This offers - a compromise between the expressiveness of a game level and the - speed at which it can be simulated, and it means that creatures - should be naturally expressed as rigid components held together by - joint constraints. + (\cite{blender}). Though Blender can model and render even + complicated things like water, it is crucial to keep models that + are meant to be simulated as creatures simple. =Bullet=, which + =CORTEX= uses through jMonkeyEngine3, is a rigid-body physics + system. This offers a compromise between the expressiveness of a + game level and the speed at which it can be simulated, and it means + that creatures should be naturally expressed as rigid components + held together by joint constraints. But humans are more like a squishy bag wrapped around some hard bones which define the overall shape.
When we move, our skin bends @@ -729,10 +731,10 @@ physical model of the skin along with the movement of the bones, which is unacceptably slow compared to rigid body simulation. - Therefore, instead of using the human-like ``deformable bag of - bones'' approach, I decided to base my body plans on multiple solid - objects that are connected by joints, inspired by the robot =EVE= - from the movie WALL-E. + Therefore, instead of using the human-like ``bony meatbag'' + approach, I decided to base my body plans on multiple solid objects + that are connected by joints, inspired by the robot =EVE= from the + movie WALL-E. #+caption: =EVE= from the movie WALL-E. This body plan turns #+caption: out to be much better suited to my purposes than a more @@ -742,19 +744,19 @@ =EVE='s body is composed of several rigid components that are held together by invisible joint constraints. This is what I mean by - ``eve-like''. The main reason that I use eve-style bodies is for - efficiency, and so that there will be correspondence between the - AI's senses and the physical presence of its body. Each individual - section is simulated by a separate rigid body that corresponds - exactly with its visual representation and does not change. - Sections are connected by invisible joints that are well supported - in jMonkeyEngine3. Bullet, the physics backend for jMonkeyEngine3, - can efficiently simulate hundreds of rigid bodies connected by - joints. Just because sections are rigid does not mean they have to - stay as one piece forever; they can be dynamically replaced with - multiple sections to simulate splitting in two. This could be used - to simulate retractable claws or =EVE='s hands, which are able to - coalesce into one object in the movie. + /eve-like/. The main reason that I use eve-like bodies is for + simulation efficiency, and so that there will be correspondence + between the AI's senses and the physical presence of its body. 
Each + individual section is simulated by a separate rigid body that + corresponds exactly with its visual representation and does not + change. Sections are connected by invisible joints that are well + supported in jMonkeyEngine3. Bullet, the physics backend for + jMonkeyEngine3, can efficiently simulate hundreds of rigid bodies + connected by joints. Just because sections are rigid does not mean + they have to stay as one piece forever; they can be dynamically + replaced with multiple sections to simulate splitting in two. This + could be used to simulate retractable claws or =EVE='s hands, which + are able to coalesce into one object in the movie. *** Solidifying/Connecting a body @@ -2443,10 +2445,10 @@ improvement, among which are using vision to infer proprioception and looking up sensory experience with imagined vision, touch, and sound. - - Evolution :: Karl Sims created a rich environment for - simulating the evolution of creatures on a connection - machine. Today, this can be redone and expanded with =CORTEX= - on an ordinary computer. + - Evolution :: Karl Sims created a rich environment for simulating + the evolution of creatures on a Connection Machine + (\cite{sims-evolving-creatures}). Today, this can be redone + and expanded with =CORTEX= on an ordinary computer. - Exotic senses :: Cortex enables many fascinating senses that are not possible to build in the real world. For example, telekinesis is an interesting avenue to explore. You can also @@ -2457,7 +2459,7 @@ an effector which creates an entire new sub-simulation where the creature has direct control over placement/creation of objects via simulated telekinesis. The creature observes this - sub-world through it's normal senses and uses its observations + sub-world through its normal senses and uses its observations to make predictions about its top level world. 
- Simulated prescience :: step the simulation forward a few ticks, gather sensory data, then supply this data for the creature as @@ -2470,25 +2472,24 @@ with each other. Because the creatures would be simulated, you could investigate computationally complex rules of behavior which still, from the group's point of view, would happen in - ``real time''. Interactions could be as simple as cellular + real time. Interactions could be as simple as cellular organisms communicating via flashing lights, or as complex as humanoids completing social tasks, etc. - - =HACKER= for writing muscle-control programs :: Presented with - low-level muscle control/ sense API, generate higher level + - =HACKER= for writing muscle-control programs :: Presented with a + low-level muscle control / sense API, generate higher level programs for accomplishing various stated goals. Example goals might be "extend all your fingers" or "move your hand into the area with blue light" or "decrease the angle of this joint". It would be like Sussman's HACKER, except it would operate with much more data in a more realistic world. Start off with "calisthenics" to develop subroutines over the motor control - API. This would be the "spinal chord" of a more intelligent - creature. The low level programming code might be a turning - machine that could develop programs to iterate over a "tape" - where each entry in the tape could control recruitment of the - fibers in a muscle. - - Sense fusion :: There is much work to be done on sense + API. The low level programming code might be a Turing machine + that could develop programs to iterate over a "tape" where + each entry in the tape could control recruitment of the fibers + in a muscle. + - Sense fusion :: There is much work to be done on sense integration -- building up a coherent picture of the world and - the things in it with =CORTEX= as a base, you can explore + the things in it.
With =CORTEX= as a base, you can explore concepts like self-organizing maps or cross modal clustering in ways that have never before been tried. - Inverse kinematics :: experiments in sense guided motor control @@ -2761,7 +2762,7 @@ jumping actually /is/. Of course, the action predicates are not directly applicable to - video data which lacks the advanced sensory information which they + video data, which lacks the advanced sensory information which they require! The trick now is to make the action predicates work even when the @@ -2858,7 +2859,8 @@ #+END_EXAMPLE The worm's previous experience of lying on the ground and lifting - its head generates possible interpretations for each frame: + its head generates possible interpretations for each frame (the + numbers are experience-indices): #+BEGIN_EXAMPLE [ flat, flat, flat, flat, flat, flat, flat, lift-head ] @@ -2878,9 +2880,9 @@ #+END_EXAMPLE The new path through \Phi-space is synthesized from two actual - paths that the creature actually experiences, the "1-2-3-4" chain - and the "6-7-8-9" chain. The "1-2-3-4" chain is necessary because - it ends with the worm lifting its head. It originated from a short + paths that the creature has experienced: the "1-2-3-4" chain and + the "6-7-8-9" chain. The "1-2-3-4" chain is necessary because it + ends with the worm lifting its head. It originated from a short training session where the worm rested on the floor for a brief while and then raised its head. The "6-7-8-9" chain is part of a longer chain of inactivity where the worm simply rested on the @@ -3800,3 +3802,4 @@ +TODO -- add a paper about detecting biological motion from only a few dots. 
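The chain-splicing behavior described in the Φ-space passage above (synthesizing a new path from the experienced "1-2-3-4" and "6-7-8-9" chains) can be sketched as a greedy pass over per-frame candidate sets: for each frame, prefer the experience-index that continues the previous frame's chain by one, and only start a new chain when no continuation exists. This hypothetical Python sketch illustrates the heuristic; it is not =EMPATH='s actual retrieval code.

```python
def splice_phi_path(candidates):
    """Synthesize a path through phi-space from per-frame candidate
    experience indices. Greedily extend the current chain (previous
    index + 1) when possible; otherwise fall back to the smallest
    candidate to start a new chain. `candidates` is a list of sets
    of integer experience indices, one set per video frame."""
    path = []
    prev = None
    for frame_candidates in candidates:
        if prev is not None and prev + 1 in frame_candidates:
            choice = prev + 1               # continue the chain
        else:
            choice = min(frame_candidates)  # start a new chain
        path.append(choice)
        prev = choice
    return path

# Each frame is ambiguous between the resting chain (6-7-8-9) and
# the rest-then-lift chain (1-2-3-4); splicing picks one coherent run.
frames = [{1, 6}, {2, 7}, {3, 8}, {4, 9}]
```

The tie-breaking rule (`min`) is an arbitrary placeholder; the thesis's actual system uses the content of the retrieved experiences, not just their indices, to choose among chains.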
diff -r f4770e3d30ae -r 5d89879fc894 thesis/images/blender-worm.png Binary file thesis/images/blender-worm.png has changed diff -r f4770e3d30ae -r 5d89879fc894 thesis/images/empty-sense-nodes.png Binary file thesis/images/empty-sense-nodes.png has changed diff -r f4770e3d30ae -r 5d89879fc894 thesis/rlm-cortex-meng.tex --- a/thesis/rlm-cortex-meng.tex Mon Apr 28 13:14:52 2014 -0400 +++ b/thesis/rlm-cortex-meng.tex Mon Apr 28 15:10:59 2014 -0400 @@ -31,6 +31,7 @@ %\usepackage{floatrow} \usepackage[utf8]{inputenc} \usepackage[T1]{fontenc} +\usepackage[headheight=14pt]{geometry} %\usepackage{fixltx2e} %\usepackage{graphicx} %\usepackage{longtable}