diff -r f639e2139ce2 thesis/cortex.org
--- a/thesis/cortex.org	Sun Mar 30 01:34:43 2014 -0400
+++ b/thesis/cortex.org	Sun Mar 30 10:07:17 2014 -0400
@@ -41,49 +41,46 @@
 [[./images/aurellem-gray.png]]
 
 
-* Empathy and Embodiment as problem solving strategies
+* Empathy \& Embodiment: problem solving strategies
 
-  By the end of this thesis, you will have seen a novel approach to
-  interpreting video using embodiment and empathy. You will have also
-  seen one way to efficiently implement empathy for embodied
-  creatures. Finally, you will become familiar with =CORTEX=, a system
-  for designing and simulating creatures with rich senses, which you
-  may choose to use in your own research.
-
-  This is the core vision of my thesis: That one of the important ways
-  in which we understand others is by imagining ourselves in their
-  position and emphatically feeling experiences relative to our own
-  bodies. By understanding events in terms of our own previous
-  corporeal experience, we greatly constrain the possibilities of what
-  would otherwise be an unwieldy exponential search. This extra
-  constraint can be the difference between easily understanding what
-  is happening in a video and being completely lost in a sea of
-  incomprehensible color and movement.
-
-** Recognizing actions in video is extremely difficult
-
-   Consider for example the problem of determining what is happening
-   in a video of which this is one frame:
-
+** The problem: recognizing actions in video is extremely difficult
+# developing / requires useful representations
+
+   Examine the following collection of images. As you, and indeed very
+   young children, can easily determine, each one is a picture of
+   someone drinking.
+
+   # dxh: cat, cup, drinking fountain, rain, straw, coconut
   #+caption: A cat drinking some water. Identifying this action is
-  #+caption: beyond the state of the art for computers.
+  #+caption: beyond the capabilities of existing computer vision systems.
   #+ATTR_LaTeX: :width 7cm
   [[./images/cat-drinking.jpg]]
+
+   Nevertheless, it is beyond the state of the art for a computer
+   vision program to describe what's happening in each of these
+   images, or what's common to them. Part of the problem is that many
+   computer vision systems focus on pixel-level details or probability
+   distributions of pixels, with little focus on [...]
+
+
+   In fact, the contents of a scene may have much less to do with pixel
+   probabilities than with recognizing various affordances: things you
+   can move, objects you can grasp, spaces that can be filled
+   (Gibson). For example, what processes might enable you to see the
+   chair in figure \ref{hidden-chair}?
+   # Or suppose that you are building a program that recognizes chairs.
+   # How could you ``see'' the chair ?
 
-   It is currently impossible for any computer program to reliably
-   label such a video as ``drinking''. And rightly so -- it is a very
-   hard problem! What features can you describe in terms of low level
-   functions of pixels that can even begin to describe at a high level
-   what is happening here?
-
-   Or suppose that you are building a program that recognizes chairs.
-   How could you ``see'' the chair in figure \ref{hidden-chair}?
-
+   # dxh: blur chair
   #+caption: The chair in this image is quite obvious to humans, but I
   #+caption: doubt that any modern computer vision program can find it.
   #+name: hidden-chair
   #+ATTR_LaTeX: :width 10cm
   [[./images/fat-person-sitting-at-desk.jpg]]
+
+
+
+
 
   Finally, how is it that you can easily tell the difference between
  how the girls /muscles/ are working in figure \ref{girl}?
@@ -95,10 +92,13 @@
  #+ATTR_LaTeX: :width 7cm
  [[./images/wall-push.png]]
 
+
+
+
  Each of these examples tells us something about what might be going
  on in our minds as we easily solve these recognition problems.
 
-  The hidden chairs show us that we are strongly triggered by cues
+  The hidden chair shows us that we are strongly triggered by cues
  relating to the position of human bodies, and that we can determine
  the overall physical configuration of a human body even if much of
  that body is occluded.
@@ -109,10 +109,107 @@
  most positions, and we can easily project this self-knowledge to
  imagined positions triggered by images of the human body.
 
-** =EMPATH= neatly solves recognition problems
+** A step forward: the sensorimotor-centered approach
+# ** =EMPATH= recognizes what creatures are doing
+# neatly solves recognition problems
+   In this thesis, I explore the idea that our knowledge of our own
+   bodies enables us to recognize the actions of others.
+
+   First, I built a system for constructing virtual creatures with
+   physiologically plausible sensorimotor systems and detailed
+   environments. The result is =CORTEX=, which is described in section
+   \ref{sec-2}. (=CORTEX= was built to be flexible and useful to other
+   AI researchers; it is provided in full with detailed instructions
+   on the web [here].)
+
+   Next, I wrote routines which enabled a simple worm-like creature to
+   infer the actions of a second worm-like creature, using only its
+   own prior sensorimotor experiences and knowledge of the second
+   worm's joint positions. This program, =EMPATH=, is described in
+   section \ref{sec-3}, and the key results of this experiment are
+   summarized below.
+
+   #+caption: From only \emph{proprioceptive} data, =EMPATH= was able to infer
+   #+caption: the complete sensory experience and classify these four poses.
+   #+caption: The last image is a composite, depicting the intermediate stages of \emph{wriggling}.
+   #+name: worm-recognition-intro-2
+   #+ATTR_LaTeX: :width 15cm
+   [[./images/empathy-1.png]]
+
+   # =CORTEX= provides a language for describing the sensorimotor
+   # experiences of various creatures.
+
+   # Next, I developed an experiment to test the power of =CORTEX='s
+   # sensorimotor-centered language for solving recognition problems. As
+   # a proof of concept, I wrote routines which enabled a simple
+   # worm-like creature to infer the actions of a second worm-like
+   # creature, using only its own previous sensorimotor experiences and
+   # knowledge of the second worm's joints (figure
+   # \ref{worm-recognition-intro-2}). The result of this proof of
+   # concept was the program =EMPATH=, described in section
+   # \ref{sec-3}. The key results of this
+
+   # Using only first-person sensorimotor experiences and third-person
+   # proprioceptive data,
+
+*** Key results
+    - After one-shot supervised training, =EMPATH= was able to recognize
+      a wide variety of static poses and dynamic actions --- ranging from
+      curling in a circle to wriggling with a particular frequency ---
+      with 95\% accuracy.
+    - These results were completely independent of viewing angle
+      because the underlying body-centered language fundamentally is;
+      once an action is learned, it can be recognized equally well from
+      any viewing angle.
+    - =EMPATH= is surprisingly short; the sensorimotor-centered
+      language provided by =CORTEX= resulted in extremely economical
+      recognition routines --- about 0000 lines in all --- suggesting
+      that such representations are very powerful, and often
+      indispensable for the types of recognition tasks considered here.
+    - Although for expediency's sake I relied on direct knowledge of
+      joint positions in this proof of concept, it would be
+      straightforward to extend =EMPATH= so that it (more
+      realistically) infers joint positions from its visual data.
+
+# because the underlying language is fundamentally orientation-independent
+
+# recognize the actions of a worm with 95\% accuracy. The
+# recognition tasks
 
-   I propose a system that can express the types of recognition
-   problems above in a form amenable to computation. It is split into
+
+
+
+
+   [Talk about these results and what you find promising about them]
+
+** Roadmap
+   [I'm going to explain how =CORTEX= works, then break down how
+   =EMPATH= does its thing. Because the details reveal such-and-such
+   about the approach.]
+
+   # The success of this simple proof-of-concept offers a tantalizing
+
+
+   # explore the idea
+   # The key contribution of this thesis is the idea that body-centered
+   # representations (which express
+
+
+   # the
+   # body-centered approach --- in which I try to determine what's
+   # happening in a scene by bringing it into registration with my own
+   # bodily experiences --- are indispensible for recognizing what
+   # creatures are doing in a scene.
+
+* COMMENT
+# body-centered language
+
+   In this thesis, I'll describe =EMPATH=, which solves a certain
+   class of recognition problems
+
+   The key idea is to use self-centered (or first-person) language.
+
+
+   I have built a system that can express the types of recognition
+   problems in a form amenable to computation. It is split into
  four parts:
 
    - Free/Guided Play :: The creature moves around and experiences the
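To make the body-centered matching idea behind the =EMPATH= results above concrete, here is a minimal sketch in Clojure. It is only an illustration under simplifying assumptions: the function names, the =:proprioception= key, the pose-binning scheme, and the threshold in =curled?= are invented for this sketch and are not =EMPATH='s actual interface.

#+begin_src clojure
;; Sketch: index first-person experience by proprioception, then
;; interpret another creature's observed joint angles by recalling
;; what it felt like to be in that pose.

(defn bin-pose
  "Discretize a vector of joint angles (radians) so that similar
   poses map to the same key."
  [joint-angles resolution]
  (mapv #(Math/round (double (/ % resolution))) joint-angles))

(defn index-experience
  "Build a map from binned pose to the first-person experiences felt
   in that pose. Each experience is a map with at least a
   :proprioception vector; the other senses ride along with it."
  [experiences resolution]
  (group-by #(bin-pose (:proprioception %) resolution) experiences))

(defn recall
  "Given only another creature's observed joint angles, recall the
   experiences recorded in the matching pose."
  [index observed-angles resolution]
  (get index (bin-pose observed-angles resolution) []))

(defn curled?
  "Action predicate written against *recalled* experience: the worm
   counts as curled when every joint is strongly flexed."
  [recalled]
  (boolean (when-let [angles (:proprioception (first recalled))]
             (every? #(> % 1.0) angles))))
#+end_src

Classifying an observed pose then reduces to something like =(curled? (recall index observed-angles 0.35))=, with the naive binning and hand-picked threshold standing in for the more careful inference described in section \ref{sec-3}.
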
@@ -286,14 +383,14 @@
      code to create a creature, and can use a wide library of
      pre-existing blender models as a base for your own creatures.
 
-   - =CORTEX= implements a wide variety of senses, including touch,
+   - =CORTEX= implements a wide variety of senses: touch,
      proprioception, vision, hearing, and muscle tension. Complicated
      senses like touch, and vision involve multiple sensory elements
      embedded in a 2D surface. You have complete control over the
      distribution of these sensor elements through the use of simple
      png image files. In particular, =CORTEX= implements more
      comprehensive hearing than any other creature simulation system
-     available. 
+     available.
 
    - =CORTEX= supports any number of creatures and any number of
      senses. Time in =CORTEX= dialates so that the simulated creatures
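The senses bullet above notes that the distribution of touch and vision sensor elements is controlled by plain png image files. As a rough illustration of that idea, here is a sketch using standard Java imaging; it is not =CORTEX='s actual API, and the convention that white pixels mark sensor locations is an assumption made only for this sketch.

#+begin_src clojure
(import '(javax.imageio ImageIO)
        '(java.io File))

(defn sensor-coordinates
  "Return the [x y] coordinates of every white pixel in a png image.
   For this sketch, each white pixel stands for one sensor element
   on the creature's surface."
  [png-path]
  (let [img (ImageIO/read (File. ^String png-path))]
    (for [x (range (.getWidth img))
          y (range (.getHeight img))
          :when (= 0xFFFFFF (bit-and 0xFFFFFF (.getRGB img x y)))]
      [x y])))
#+end_src

Calling =(sensor-coordinates "touch-profile.png")= (a hypothetical filename) would yield one =[x y]= pair per sensor element, so the density and layout of sensors can be edited with any image editor.
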
@@ -353,7 +450,24 @@
 \end{sidewaysfigure}
 #+END_LaTeX
 
-** Contributions
+** Road map
+
+   By the end of this thesis, you will have seen a novel approach to
+   interpreting video using embodiment and empathy. You will have also
+   seen one way to efficiently implement empathy for embodied
+   creatures. Finally, you will become familiar with =CORTEX=, a system
+   for designing and simulating creatures with rich senses, which you
+   may choose to use in your own research.
+
+   This is the core vision of my thesis: That one of the important ways
+   in which we understand others is by imagining ourselves in their
+   position and empathically feeling experiences relative to our own
+   bodies. By understanding events in terms of our own previous
+   corporeal experience, we greatly constrain the possibilities of what
+   would otherwise be an unwieldy exponential search. This extra
+   constraint can be the difference between easily understanding what
+   is happening in a video and being completely lost in a sea of
+   incomprehensible color and movement.
 
    - I built =CORTEX=, a comprehensive platform for embodied AI
      experiments. =CORTEX= supports many features lacking in other
@@ -363,18 +477,22 @@
    - I built =EMPATH=, which uses =CORTEX= to identify the actions of
      a worm-like creature using a computational model of empathy.
 
-* Building =CORTEX=
-
-  I intend for =CORTEX= to be used as a general-purpose library for
-  building creatures and outfitting them with senses, so that it will
-  be useful for other researchers who want to test out ideas of their
-  own. To this end, wherver I have had to make archetictural choices
-  about =CORTEX=, I have chosen to give as much freedom to the user as
-  possible, so that =CORTEX= may be used for things I have not
-  forseen.
-
-** Simulation or Reality?
-
+
+* Designing =CORTEX=
+  In this section, I outline the design decisions that went into
+  making =CORTEX=, along with some details about its
+  implementation. (A practical guide to getting started with =CORTEX=,
+  which skips over the history and implementation details presented
+  here, is provided in an appendix \ref{} at the end of this paper.)
+
+  Throughout this project, I intended for =CORTEX= to be flexible and
+  extensible enough to be useful for other researchers who want to
+  test out ideas of their own. To this end, wherever I have had to make
+  architectural choices about =CORTEX=, I have chosen to give as much
+  freedom to the user as possible, so that =CORTEX= may be used for
+  things I have not foreseen.
+
+** Building in simulation versus reality
   The most important archetictural decision of all is the choice to
  use a computer-simulated environemnt in the first place! The world
  is a vast and rich place, and for now simulations are a very poor