Task Goal Ideation

From Hanson Robotics Wiki

Goals, from short to long term

  • Within 15 weeks, we hope to achieve a first character-robot prototype within a new framework built from Blender, CogBot, OpenCog, Hanson character software, and computer vision: a character animation rig that tracks people with unprecedented expressivity, engages the user in basic dialogue through CogBot with good-looking lip sync, and demonstrates a first basic integration with OpenCog.
  • Within 15 months, we hope to achieve walking, expressive humanoid robot characters, deployed as products that human users perceive as alive, captivating, and lovable. We hope to take the TED prize as the robot walks on stage, composes its own compelling TED talk, and answers the judges’ questions afterwards.
  • Within 15 years, we hope to achieve human-level AGI, with creative intelligence on par with or exceeding humanity’s greatest geniuses.

Software Strategy, Goals, Tasks

  • a. test model/rig with Morse
    • use Morse especially for Dynamixel control
    • need to find or make good Dynamixel models for Morse (including the electronics and firmware)
    • use both Blender and Morse
  • investigate SmartBody for authoring procedural animations and their state graphs (Massive-Software style)
  • Connect FaceShift to Blender, to control a character rig in blender, then control servos in Hanson robot (both dynamixel and pololu) to correspond to the blender rig
  • b. Get Blender recording and saving these animations.
  • d. remap FaceShift control points to robot control points (in the Blender rig)
  • e. test and strategize the BGE (BGE/Morse for the long term)
  • f. Build a good Dmitry model and rig in Blender/BGE and/or Morse
  • a. Build a good Eva rig
  • g. Output the Blender rig controls to drive the robot accurately
    • i. motion output from Blender, translated to control signal, sent to com ports
    • ii. probably through a 0MQ or TCP/IP port, as a separate layer
    • iii. Later through a ROS stack
  • h. Control the Blender (BGE) rig using Faceshift
    • i. First: teleoperation mode
    • ii. Second: Capture Faceshift animations in Blender, to build a library of such animations
    • iii. Later, represent Dmitry accurately
  • i. We want to test what BGE features will work in Morse
    • can we use a Blender rig and BGE-type interactive character scripting to control our rig in Morse?
    • We want the complete game engine character control, with interactive animations, animation blending, animation authoring pipeline
  • j. Ben’s link to a genetic muscle-based gait generator
  • k. Ben’s 3-layer animator: (1) motion primitives; (2) generators (splines and MOSES-genetic) of multi-objective-optimized primitives (drawing on motion capture, artistic principles, biomechanics, low-energy optimization, path stability, and other useful fitness functions); (3) a high-level semantic representation in the AtomSpace for planning (reasoning, affordances, micro-theories, SLAM). Loops among the layers support learning, hypothesizing, creativity/emergence, and urge-based goal pursuit (including narrative character behavior).
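Item g’s output layer (Blender motion, translated to a control signal and sent out a 0MQ or TCP/IP port as a separate layer) could start as a thin serialization layer. A minimal Python sketch, assuming a newline-delimited JSON frame format; the frame layout and function names are illustrative, not an existing protocol:

```python
import json
import socket

def encode_frame(channels):
    """Pack {channel_name: normalized_position} into one JSON line.
    Positions are assumed to be floats in [-1, 1], with 0 meaning
    each channel's default (rest) pose."""
    return (json.dumps({"channels": channels}) + "\n").encode("utf-8")

def decode_frame(line):
    """Unpack one JSON line back into the channel dict."""
    return json.loads(line.decode("utf-8"))["channels"]

def send_frame(host, port, channels):
    """Ship one frame from the Blender side to the motor layer over TCP."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(encode_frame(channels))
```

Keeping the wire format this simple would make it easy to swap the transport later (a 0MQ socket, or a ROS topic) without touching the Blender side.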

Motivated character behaviour

1. Urges (primitive goal-stimulus-behavior loops)

  • a. Reflexes, reward-seeking loops:
    • i. Gaze behaviour
    • ii. Attentional constructs, saccade primitives
  • b. Affordance models of urges
  • c. Emotions, metabolism, cognitive states
  • d. Correlated behaviors

2. Motives—Built up on/from urges

  • a. Urge subsumption hierarchy
  • b. A Willed Urge may subsume Reflexive Urges and Conditioned Urges; the Conditioned Urges may in turn over-ride the Reflexive Urges
  • c. Reuses repurposed lower urges

3. Goals—built on motives; must use motives and:

  • a. Combined beliefs, urges

Short-term goals

  • a. Lipsync through Blender
    • i. a layer that generates spoken audio and visemes from the dialogue generator (OpenCog/CE/CogBot)

4. Send the text to MS-SAPI (or Acapela SAPI, or another engine); use Annosoft to generate visemes from the resulting speech; then send the visemes to the Blender layer.
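The lipsync pipeline (dialogue text → SAPI TTS → Annosoft visemes → Blender) ultimately needs timed viseme keyframes on the Blender side. A minimal sketch of that last translation step, assuming a simplified phoneme-to-viseme table; the table and function names are illustrative, and Annosoft’s real output is much richer:

```python
# Simplified phoneme-to-viseme lookup; a real pipeline (e.g. Annosoft)
# produces a richer, engine-specific viseme set.
PHONEME_TO_VISEME = {
    "AA": "open", "AE": "open", "IY": "wide",
    "M": "closed", "B": "closed", "P": "closed",
    "F": "teeth-lip", "V": "teeth-lip",
    "OW": "round", "UW": "round",
}

def phonemes_to_keyframes(phonemes, frame_rate=30.0):
    """Turn (phoneme, duration_seconds) pairs into (frame, viseme)
    keyframes suitable for driving Blender shape keys."""
    keyframes, t = [], 0.0
    for phoneme, duration in phonemes:
        viseme = PHONEME_TO_VISEME.get(phoneme, "rest")
        keyframes.append((round(t * frame_rate), viseme))
        t += duration
    return keyframes
```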

  • b. Integrate microcone SDK, for directional speaker tracking
  • c. Perceptual fusion,

Tracking people, objects, space

1. People ID

  • xv. Fusion of persons’ faces, gestures, gaze, mouth motions, speaking, Face ID, shirt-blob tracking, saliency
  • xvii. Representing them in a 3-D space
  • xviii. Building game models based on best-guess perceptions
    • 1. Competing models, weighted in confidence
  • xix. SLAM controls, vs. game engine controls. Integration and separate/parallel use.
    • d. Attention regulation system
  • xx. Reflexes
  • xxi. Goal-urges, for
    • 1. Artistic
    • 2. Information acquisition
    • 3. Game engine scripting vs. dialogue(open-dial), vs. OpenPsi, vs. OpenCog, vs. Chemsim
  • e. Integrate saliency tracking
  • f. Kinect people tracking
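The “competing models, weighted in confidence” idea (xviii.1) can be sketched as confidence-weighted fusion of position estimates from the different trackers (face, shirt-blob, Kinect, audio localization). The function below is an illustrative sketch, not an existing API:

```python
def fuse_estimates(estimates):
    """estimates: list of ((x, y, z), confidence) pairs from competing
    trackers. Returns the confidence-weighted mean position as the
    current best guess for the game-engine person model."""
    total = sum(conf for _, conf in estimates)
    if total <= 0:
        raise ValueError("no confident estimate available")
    return tuple(
        sum(pos[i] * conf for pos, conf in estimates) / total
        for i in range(3)
    )
```

A next step in the same spirit would be to keep the competing models alive and re-weight them over time, rather than collapsing to a single mean.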

Responses, attention

  • xxiv. Look at faces more, but follow people’s hands some
  • xxv. Saliency for boredom, RoI
    • g. Gaze following behaviors, using
  • xxvii. Eye trackers
  • h. Improved face tracking
  • xxviii. Eye-to-eye motion, looking at the mouth
  • xxix. Fusion of eye cam, wide angle body cam, Kinect, audio localization, etc
  • j. CI2CV activities.
  • k. Face Recognition (ID)
  • l. MPT toolkit for facial expression detection
  • m. Voice affect detection
  • n. Accent detection
  • o. Voice biometrics--Speaker ID
  • p. People models/objects/theories

Facts about a person, encoded in an expert database

1. Chronology of a person, both regarding the robot’s experience with that person and

2. Social map of that person, with chronology of relationships and events, affect mapping, desires and goals mapping through time.

  • xxxi. Predictions about a given person
  • xxxii. Robot’s relationship with the person
    • 1. How the person perceives the robot (& guesses about what people think the robot wants and perceives)
    • 2. What the person may expect from the robot—the robot’s obligations and promises to the person
    • 3. The robot’s feelings towards that person and the stories behind the feelings—the rationalizations,
  • xxxiii. What the robot wants from that person (and a schedule and plan for these wants)
  • xxxiv. Perceptions should all map into these

Narrative interaction templates

  • xxxv. Time-based urge patterns
  • xxxvi. Affordance patterns

Layers of robot behaviors and urges

  • xxxvii. Physical reflex-like behaviors (Massive-like)
  • xxxviii. Verbal behaviors. NLG for goals
  • xxxix. OpenPsi behaviors as urges (pseudo-evolutionary psychology, physio-sim)
  • xl. Cognitive/emotional goal graphs
  • xli. Coordination among these layers
  • xlii. Planning for achieving goals
  • xliii. Story generation
  • xliv. Learning new strategies for plans—mimicking old stories
  • xlv. Affordance micro-steps
  • xlvi. Semantic Affordance graphs
    • 1. Learning, experimentation
    • 2. Imagination, simulation in advance, SLAM it
  • xlvii. Narrative templates
    • 1. Reward cycles—
    • 2. Use in learning and evolution (GA)

ROS Motors

Specifications for the BGE/Morse-to-ROS motor communications (RoMoCo) layer.

  • xlviii. Task: get BGE + Morse + a ROS motor-control output working
  • l. Next, add a config table to specify each servo’s characteristics. Validate that this works with several servos.
  • li. Finally, control an actual robot face, using a BGE face rig.


  • lii. ROS
    • 1. ROS nodes exist for all the motor types we are using (and the types of motor controllers we're using).
    • 2. Robot Operating System (ROS): http://www.ros.org/.
    • 3. ROS is the leading standard for low-level robotics control and communication--mostly for sensor and motor data communications, low-level distributed processing, and memory management. So: ROS… is a file format, an application server, a network protocol... it also allows you to wrap up any given software as a node in ROS, which allows the software to communicate with other ROS nodes using the ROS protocols.
    • 4. While ROS doesn't really do much cognitive stuff, it's got good libraries for controlling motors and sensors.
    • 5. The libraries of existing ROS nodes include various motor controllers, which we would use for our BGE-to-Robot interface.
  • liii. Motors output:
    • 1. Some motors are hobby servos, controlled via the Pololu Maestro 24 servo controllers, using SSC mode on the Pololu board; that communicates through an RS-232 COM port.
    • 2. http://answers.ros.org/question/10357/is-there-a-ros-wrapper-for-pololu-smc-04b-servomotor/
    • 3. the other motors are the Robotis MX-Series Dynamixel motors, controlled through the USB-to-Dynamixel controller, which is an RS-485 interface.
    • 4. http://wiki.ros.org/dynamixel_motor
    • 5. ROS has the tools for communicating to the motors, but I have seen nothing like our desired "motion orchestrator" command layer in ROS.
    • 6. Our motor command layer should abstractly represent motions for each (servo) channel
    • 7. probably as -1 to 1, with 0 being the default position
    • 8. the input to the motor control layer will be translated to this abstraction, before going out to the motors. This abstraction will be (16-bit? 32 bit resolution?) floating point
    • 9. The MX Dynamixels have 5000-unit resolution, but other motors may have higher resolution, and we want to keep the door open for that. After we translate to an abstraction of each motion channel (bone/servo), we will translate each channel’s output to the specific desired kind of servo and its corresponding control command.
    • 10. Each motor will have a safe range of operation, and a desired default position.
  • lv. So, we will want to specify the values for each motor output in a config file (probably XML). For each servo output channel, this config file should specify: servo #, servo name, type of servo, type of controller, max position, default position, min position.
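A minimal sketch of the config file and the −1..1 abstraction described in items 6–10 and lv, assuming an illustrative XML schema (the attribute names are our own, not a ROS or Dynamixel format):

```python
import xml.etree.ElementTree as ET

# Illustrative per-servo config: safe min/max range plus default pose.
CONFIG_XML = """
<servos>
  <servo num="0" name="jaw" type="dynamixel-mx" controller="usb2dyn"
         min="1000" default="2500" max="4000"/>
</servos>
"""

def load_config(xml_text):
    """Parse the servo config into {name: (min, default, max)}."""
    servos = {}
    for s in ET.fromstring(xml_text):
        servos[s.get("name")] = (
            int(s.get("min")), int(s.get("default")), int(s.get("max")))
    return servos

def to_raw(value, limits):
    """Map an abstract position in [-1, 1] (0 = default pose) to raw
    servo units, clamped to the safe [min, max] operating range."""
    lo, default, hi = limits
    value = max(-1.0, min(1.0, value))
    if value >= 0:
        return round(default + value * (hi - default))
    return round(default + value * (default - lo))
```

Note the mapping is piecewise: the default position need not sit at the midpoint of the safe range, so the positive and negative halves of the abstract channel are scaled separately.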

Expansion on Long-term goals and our Mission

  • a. To give rise to Genius Machines—automata who match the best of humans in intellect, creativity, compassion, wisdom, ethics, social skills, etc

Expansion on mid-range Goals

  • a. To experiment in new systems and components to achieve more brilliant humanlike behavior in machines
  • b. To facilitate new experimentation in systems and components
  • c. To design the above to be art—perceived by people as powerful, wondrous, smart
  • d. To design the above as science—science of mind, HCI/HRI, materials, biosciences, etc
  • e. To design the above as product—make useful things, widely adopted and profitable
  • f. To get robots to perceive and respond to people in ways that seem smart and useful—numerous

Expansion of short-range goals (and some tasks)

To finish Dmitry

  • i. Lip sync with Annosoft from Mark Zartler, using a live mic feed
  • ii. FaceShift control
    • 1. FaceShift – lip-sync mediation rules
    • 2. FaceShift – face-tracking mediation
  • iii. Make joystick buttons play animations (VSA-authored animations)
    • 1. Either by calling VSA, or by playing the animations from the library of animations
  • iv. Build a new Dynamixel neck for Phil, to use the Phil for testing
  • v. Good face tracking with joystick override
    • 1. Display the wide-angle scene with joystick crosshairs
    • 2. Takes (Blender) input to control both hobby servos and Dynamixels, smoothly playing the animations
    • 3. Take joystick commands into Blender
  • vi. Joystick
    • 1. Upon trigger-pull, the joystick overrides FaceShift and face tracking, allowing the user to take control of the neck and eyes.
      • a. This should take control relative to the current neck/eye positions, not an absolute position
      • b. (Or it could be absolute, IF we let the user see where the joystick positions fall relative to the scene. This can be achieved by showing the user a wide-angle view of the scene, with crosshairs marking where the eyes and neck will go when the trigger is pulled at the current joystick position.)
    • 2. Play animations when joystick buttons are pushed (maybe VSA, maybe your own player using CSV animation files)
  • vii. Do the lip sync using Annosoft (get them in touch with Zartler for help)
  • c. Wide-angle saliency tracking for robot peripheral vision, including attentional rules for glancing toward saliency
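The relative (not absolute) trigger-pull takeover described above can be sketched as blending a joystick deflection into the current neck/eye pose. The axis names and the gain below are illustrative assumptions:

```python
def joystick_override(current, joystick, gain=0.1):
    """On trigger-pull, offset the current neck/eye pose by the joystick
    deflection instead of jumping to an absolute position.

    current, joystick: dicts of axis -> value; joystick axes in [-1, 1].
    Returns the new target pose, clamped to [-1, 1] per axis."""
    target = dict(current)
    for axis, deflection in joystick.items():
        new = current.get(axis, 0.0) + gain * deflection
        target[axis] = max(-1.0, min(1.0, new))
    return target
```

Calling this once per control tick while the trigger is held gives a smooth hand-off from the face tracker’s last pose, which is exactly why relative control feels safer than an absolute jump.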

Character Drives

  • a. Urge-based narrative generation: rationalization and mindshare campaigns. Our stories serve our urges. Urges sublimate into goals.
  • b. Chemsim is just a loose model of bio-based urge mechanisms
    • i. Chemsim / basic bio-drives
      • 1. Metabolic models, organ models
      • 2. Endocrine system
      • 3. Autonomic nervous system
      • 4. Emotions/limbic system
      • 5. Attentional system and reflexes
      • 6. CNS reflexes, central pattern generators
        • a. Saccade and saliency responses
          • i. Informational models built from these, as information-gathering systems
        • b. Emotional models of information input in general
      • 7. Motive/drive sequences
        • a. The narrative of drives—what are we after?
      • 8. “Higher” drives
        • a. Social drives
          • i. Belonging, tribe, family, bonding
          • ii. Rebellion, power struggles, release from the group
          • iii. Moral authority
        • b. Exploration, discovery, play, humor
        • c. Artificial goals (institutional/cultural affects)
        • d. Higher drives are composed from other drives. How do they interoperate to achieve this?

Capture Character Engine logic into other system

  • 1.2.1. like Open Dial
  • 1.2.2. Test with Philip K Dick personality
  • 1.3. Branching initiatives
  • 1.4. Integrate latest Cogbot advances
  • 1.4.1. Compare these to Open Cog systematically
  • 1.4.2. Get Kino to Hong Kong to work with our team here
  • 1.5. Urges/emotion/motivation State Graph,
  • 1.5.1. Massive inspired nodes
  • 1.5.2. create more animations and content to support it
  • 1.5.3. Animations for the initiatives
  • 1.5.4. waking up, checking out the environment, self, exploring,
  • 1.5.10. social interactions, responses to people, moods, mad, sad, proud,
  • 1.5.11. getting tired, going to sleep
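The urges/emotion/motivation state graph in 1.5 (waking up, exploring, social interaction, getting tired, sleeping) can be sketched as a simple transition table; the state names and structure below are our own illustration, not the Massive node format:

```python
import random

# Illustrative urge/emotion state graph (after items 1.5.4-1.5.11):
# each state lists the states it may transition to. In the real system
# each state would trigger its supporting animations and content.
STATE_GRAPH = {
    "sleeping":             ["waking_up"],
    "waking_up":            ["checking_environment"],
    "checking_environment": ["exploring", "social"],
    "exploring":            ["social", "tired"],
    "social":               ["exploring", "tired"],
    "tired":                ["sleeping"],
}

def step(state, rng=random):
    """Advance one (randomly chosen) transition in the state graph."""
    return rng.choice(STATE_GRAPH[state])
```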

Speech Rec improvement tricks

  • 1.6.1. Use multiple speech recognizers
  • 1.6.2. Load a user’s trained profile based on face rec, gender rec, and voice biometric ID
  • Save a user’s speech input as .WAV files for comparison to speech-rec transcripts/log files, for later improvements to that user’s model
  • Different profiles for different noisy environments. Load the profile upon recognizing the environment (for example, by visual cues, by listening for noise patterns, by being told “you are at the Hong Kong convention center”, or by user selection from a handy UI).
  • 1.6.3. Weight the currently-relevant conversational vocabulary subsets, to increase recognition accuracy
  • Dynamic opening/closing of vocabs, as new, separate speech-rec/matching processes. For example, after a specific book is mentioned in dialogue, the vocabulary associated with that book is opened, mapped to the book organized by meanings, allowing a user to search the book, or to search commentary or writings about the book
  • Input of the additional Vocab, harvested from user transcripts
  • 1.6.4. Error correction dialogue logic, based on n-best weighted recognition
  • “I’m sorry, but Did you say X, or did you say Y? Please speak clearly”
  • 1.6.5. Semantic expansion, synonym expansion
    • Wordnet and NL understanding tools
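The n-best error-correction logic (1.6.4) can be sketched as: accept the top hypothesis when it clearly wins, otherwise ask the disambiguating question from the recognizer’s weighted alternatives. The margin threshold is an illustrative assumption:

```python
def clarification(nbest, margin=0.15):
    """nbest: list of (transcript, weight) pairs, sorted by weight,
    highest first. Returns None to accept the top hypothesis, or a
    clarifying question when the runner-up is within `margin`."""
    if len(nbest) < 2 or nbest[0][1] - nbest[1][1] > margin:
        return None
    return ('I\'m sorry, did you say "%s", or did you say "%s"? '
            'Please speak clearly.' % (nbest[0][0], nbest[1][0]))
```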

Integrated Echo-cancellation system

  • 1.7.1. Filters audio between the audio-input /sound-card and applications/processes
  • 1.7.2. Can be turned on and off by outside calls
  • 1.7.3. Applications can request unfiltered or filtered audio data
  • 1.7.4. May want to also buffer the audio data with time information (a kind of memory)
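Item 1.7.4’s timed audio buffer could be a bounded queue of timestamped chunks that applications query by time window; the class and method names are illustrative:

```python
import collections

class TimedAudioBuffer:
    """Bounded buffer of (timestamp, chunk) pairs, so applications can
    request recent raw audio by time window -- a kind of short memory
    sitting between the sound card and the filtering layer."""

    def __init__(self, max_chunks=1000):
        # Oldest chunks fall off automatically once the buffer is full.
        self._buf = collections.deque(maxlen=max_chunks)

    def push(self, timestamp, chunk):
        self._buf.append((timestamp, chunk))

    def window(self, start, end):
        """Return the chunks captured in [start, end], oldest first."""
        return [c for t, c in self._buf if start <= t <= end]
```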

Machine Perception

  • 1.8.1. expression detection
  • 1.8.2. Face rec biometric ID
  • Move away from Cognitec to a more open, cheaper ID system.
  • Connect to dialogue, saying the recognized person's name in dialogue
  • facerec-to-AI/dialogue/animation messaging
  • Facerec, new-face acquisition: we need image-quality validation
  • Enhance face rec with optical-flow & pixel-density (shirt-blob) tracking
  • 1.8.3. Audio localization
  • Separating sound and sending it to speech recognition
  • Correlating the Audio-localization data with the egosphere and users
  • 1.8.4. Saliency tracking
  • smell: http://gigaom.com/2014/04/29/consumer-physics-150-smartphone-spectrometer-can-tell-the-number-of-calories-in-your-food/

Animation improvements

  • 1.9.1. Add second-order smoothing
  • 1.9.2. Programmatic animations
  • 1.9.3. Animate brow up + down for left-, right-, and up-looking motions
  • 1.9.4. Test and tune the face tracking movements
  • 1.9.5. Design better authoring tools
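Second-order smoothing (1.9.1) can be implemented as a critically damped spring toward the target position, which removes the velocity jumps a simple first-order lag leaves in servo motion. A minimal sketch, with illustrative gain values:

```python
def smooth_step(pos, vel, target, stiffness=60.0, dt=1.0 / 30.0):
    """One semi-implicit Euler step of a spring-damper pulling `pos`
    toward `target`. Returns the new (pos, vel). Setting damping to
    2*sqrt(stiffness) gives critical damping: fast settling with no
    overshoot or oscillation."""
    damping = 2.0 * stiffness ** 0.5
    accel = stiffness * (target - pos) - damping * vel
    vel += accel * dt           # update velocity first (semi-implicit)
    pos += vel * dt             # then position, using the new velocity
    return pos, vel
```

Calling this once per animation frame per channel, with the raw face-tracking or keyframe value as `target`, smooths both position and velocity, which is the “second-order” part.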


Integration and fusion of:

  • 2.1. Face tracking
  • 2.2. Saliency tracking
  • 2.3. Sound tracking
  • 2.4. Emotion mirror
  • 3. Integrate Façade character dialogue techniques
  • 4. Integrate Narrative Science tools, for expository generation, and extemporaneous speaking
  • 4.1. Consider rules of emotional effect, for TED prize
  • 5. Integrate IBM Deep QA (aka Watson) for question answering about generated expository
  • 5.1. translate into microtheory frameworks
  • 5.2. Saliency tracking
  • 5.3. Audio localization
  • 5.4. “Good-bye” meaning should result in pause
  • 5.5. use micro-initiatives as Massive-type logic, or micro-theory formation as path planning, and their genetic branching
  • 5.6. Zubek-style probabilistic planning… to achieve reinforcement learning with awareness
  • 5.6.1. read: http://robert.zubek.net/blog/2009/09/23/needs-based-ai-part-3-action-performance/
    • look to the FaceX command
  • 5.7. Save state across sessions,
  • 5.8. 2-or-3-mic localization
  • 5.9. Motion detector/tracking
  • 5.10. Look4faces procedural animations, left right, local
  • 5.11. Remote-control
  • 5.11.1. Mouse/keystroke driven cues
  • 5.11.2. Config table specifying which strokes and cues


  • What remains to be built and how to build it; everything any developer will need to know to move forward
  • How to use the software, how to use the source code
  • Improvements you think need to be made, bugs that need to be fixed
  • Tasks accomplished, what you learned, how it all worked
  • System diagram
  • 6.1.2. Neatly packed install files, for porting to other machines

Organize a network of collaborators