Software architecture

From Hanson Robotics Wiki

This page provides an overview of existing and planned robot animation systems, including the Arthur and Eva head models. The overall flow of control consists of sensory inputs, primarily vision and audio, which are (will be) integrated via the Perception Synthesizer into a 3D spatial map. Behaviors and actions are (will be) generated by the Action Orchestrator; the actions are used to control the Blender API, causing a Blender rig to move and animate. The rig generates physiological action unit (PAU) messages, which are converted into motor control messages for the specific physical robot. Neither the Perception Synthesizer nor the Action Orchestrator exists at this time; both are large subsystems that have yet to be fully specified and built.
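As a rough illustration only, the sketch below chains the stages described above as plain Python functions. None of these components exist in this form; every name and data structure here is a hypothetical placeholder meant to show how data is intended to move through the system.

  # Hypothetical sketch of the intended control flow. All names and data
  # shapes are placeholders; the real subsystems are not yet specified.

  def perception_synthesizer(vision, audio):
      """Fuse vision and audio observations into a simple 3D 'spatial map'."""
      # Here the map is just a list of observed things with x,y,z positions.
      return [{"label": obs, "xyz": (0.0, 0.0, 1.0)} for obs in vision + audio]

  def action_orchestrator(spatial_map):
      """Pick behaviors/actions based on what is in the spatial map."""
      return [("look_at", item["xyz"]) for item in spatial_map]

  def blender_rig(actions):
      """Stand-in for the Blender rig: apply actions, emit PAU messages."""
      return [{"pau": name, "target": target} for name, target in actions]

  def pau2motors(pau_messages):
      """Convert PAU messages into motor commands for the physical head."""
      return [("neck_yaw", 0.5) for _ in pau_messages]   # dummy command

  if __name__ == "__main__":
      pau = blender_rig(action_orchestrator(
          perception_synthesizer(["face"], ["speech"])))
      print(pau2motors(pau))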

The Hardware wiki provides hardware engineering data. The Animation wiki discusses scripted character animations. The precise form that the animations will take, and how they will integrate with the Action Orchestrator, is an open work item. The discussion below addresses some of this.

This page is organized into several sections: first, a list of requirements that a successful system is expected to meet; next, a menu of software technologies and systems that may be used to build subsystem components; finally, quick sketches of the existing systems and how they fit together.

Requirements

The various system architectures are expected to meet a number of requirements. These include:

Technology selection

Much of the HR work consists of integrating existing component technologies into a whole. A survey of possible software components that could be used is given in the Relevant Software page. These include:

  • Saliency tracking -- The tracking of (unknown) objects and perception of novel events in the visual field.
  • Face tracking and feature extraction -- Tracking of faces in the visual background (location, distance, head pose). Extraction of facial features (lip, cheek, eyebrow, gaze, forehead, etc.). Detection and extraction of emotional expressions (happy, sad, angry, etc.). A minimal face-detection sketch is given after this list.
  • Sound/auditory processing -- Tracking location of voice/speech. Extracting voice from background chatter. Identifying stress and emotion in the voice. Basic speech recognition.
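None of the packages above is mandated. As one concrete illustration of what the face-tracking component must produce, the sketch below uses OpenCV's bundled Haar cascade to find face bounding boxes in a single camera frame. This is only an example of the kind of processing involved, not the package actually selected for HR work.

  # Illustrative only: face detection with OpenCV's stock Haar cascade.
  # It shows the kind of data (face bounding boxes) a face tracker emits.
  import cv2

  cascade = cv2.CascadeClassifier(
      cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

  cap = cv2.VideoCapture(0)            # UVC-compatible camera
  ok, frame = cap.read()
  if ok:
      gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
      faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
      for (x, y, w, h) in faces:
          print("face at x=%d y=%d, size %dx%d" % (x, y, w, h))
  cap.release()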

System Architectures

A brief review of the current and planned system architectures. This includes:

  • The current web UI system.
  • The current Arthur head system.
  • The planned Eva head system.

Current Web UI Architecture

The head is controlled via a Web interface. See Robot Web Dashboard for a general strategic discussion of the user interface architecture. The list below is in rough order, from user input to robot control output.

  1. The ros_motors_webui provides a web interface for controlling the Einstein head, including a menu selection of emotional expressions, and individual motor controls.
  2. The basic head api provides a bridge to the ros_pololu_servo code used to drive the actual motors. It is currently deprecated and will be replaced by the XXX YYY ??? package.
  3. Motor control via the ros_pololu_servo package. This package has been forked from the geni-labs ros_pololu_servo package and is no longer compatible with it. Here, "pololu" refers to Pololu Robotics and Electronics, a provider of servo motors and controllers. A minimal command-publishing sketch follows this list.
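The sketch below shows what publishing a single motor command to the forked ros_pololu_servo node could look like from Python. The topic name and the message fields (joint_name, position, speed, acceleration) are assumptions and should be checked against the package's msg/ directory before use.

  # Hedged sketch: publish one motor command to ros_pololu_servo.
  # Message type, field names and topic name are assumptions.
  import rospy
  from ros_pololu_servo.msg import MotorCommand   # assumed message type

  rospy.init_node("motor_command_example")
  pub = rospy.Publisher("pololu/command", MotorCommand, queue_size=10)
  rospy.sleep(1.0)                     # give the publisher time to connect

  cmd = MotorCommand()
  cmd.joint_name = "neck_yaw"          # hypothetical joint name
  cmd.position = 0.3                   # radians (assumed convention)
  cmd.speed = 0.5
  cmd.acceleration = 0.1
  pub.publish(cmd)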

Current Arthur/Einstein Architecture

This section describes the current design and implementation of the Arthur and Einstein heads. The list below is in rough order, from sensory (vision) input to robot control output.

  1. Video in via a UVC-compatible video camera (using the usb_cam driver).
  2. Vision processing to extract face locations. This is done via the HR proprietary fork of ROS pi_vision; the original authors have not supported pi_vision since early 2013.
  3. Control of a blender rig, via the robo_blender ROS node. This node listens to various command messages, and publishes head, neck and eye positions as PAU messages. These are then interpreted by the pau2motors module, below. Multiple modes of operation are supported:
    • Animation mode: plays one of several canned animations
    • Look Around: ???
    • Manual Head: head is under direct control in blender; corresponding position information is output.
    • TrackDev: tracking via vision input
  4. The PAU messages are converted to motor commands via pau2motors (a sketch of this kind of mapping appears after this list).
  5. Motor control is performed by the ros_pololu_servo and dynamixel_motor ROS nodes, which drive the Pololu-controlled servos and the Dynamixel servo motors, respectively.
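To make the PAU-to-motor step concrete, the sketch below shows the kind of mapping pau2motors performs: a normalized PAU coefficient is rescaled into a motor's calibrated range. The function name and calibration values are illustrative, not the package's actual API.

  # Hedged sketch of a PAU-coefficient-to-motor mapping (not pau2motors' API).

  def pau_to_motor_angle(coeff, motor_min, motor_max):
      """Map a PAU coefficient in [0, 1] onto a motor angle range (radians)."""
      coeff = max(0.0, min(1.0, coeff))          # clamp out-of-range values
      return motor_min + coeff * (motor_max - motor_min)

  # Example: an eyebrow shapekey coefficient driving a servo limited to
  # the range [-0.4, 0.4] radians (made-up calibration values).
  print(pau_to_motor_angle(0.75, -0.4, 0.4))     # -> 0.2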

Planned Eva Architecture

  1. Video in via a UVC-compatible video camera, using the usb_cam driver (see also the usb_cam ROS page).
  2. Vision processing to extract face locations. This is done via the HR fork of ROS pi_vision; the original authors have not supported pi_vision since early 2013. The HR version adds support for tracking multiple faces in the scene and for publishing coordinates as positions in 3D space.
  3. Facial feature extraction by guessing feature locations within the face bounding box: the eyes are two-thirds of the way up the face, each one-third in from an edge; the mouth is one-third up from the bottom (a sketch of this heuristic appears after this list). To be performed by ???
  4. Possible future facial feature extraction via Ci2CV or other package. See Relevant Software for other candidate technologies.
  5. Perception Synthesizer, to be created. It will integrate visual, audio and other sensory perceptions into a 3D map, possibly based on code & architecture from Jamie Diprose [link?]. Generated messages will be XXX YYY ??? Queryable data will be XXX YYY ??? The long-term goal is for it to generate messages such as "saw-a-smiling-face" or "heard-sound-of-laughter"; the short-term goal is to generate messages of the form "see-persons-face at coordinate-location x,y,z".
  6. Behavior generation. Subscribes to perception messages and generates behaviors. The current implementation uses behaviors encoded in Owyl trees, implemented in the eva_behavior package. It listens to face appeared/disappeared messages and publishes gaze-at and look-at messages as well as emotion and gesture messages (a sketch of such a node appears after this list). The long-term solution is to specify and design an Action Orchestrator, which can not only respond to perceptions via hand-scripted animations, but also model emotional state via OpenPsi and learn and reason about the environment using OpenCog.
  7. Create a ROS node that receives behavior messages and makes Python calls to the Blender API. This API is a virtual Python class; any Blender rig implementing it will be able to respond to the behaviors generated in the previous steps (a sketch of such an interface appears after this list).
  8. The Eva blender rig implements the Blender API for a specific head, the Eva head. It includes a "command listener" which accepts commands such as setEmotionStates(["amused", ...]) and setEmotionGestures(["nod", ...]) and displays them appropriately.
  9. Creation of PAU messages (specifically PAU ROS messages) by the blender rig. For every frame, three PAU messages are built: neck orientation, eye-gaze direction and face expression. The published topics are specified in outputs.yaml, along with a list of Blender meshes used to convert bone locations into shapekeys.
  10. Conversion of PAU messages into specific motor drive commands. See the ROS messages in pau2motors/pau.
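The bounding-box heuristic from step 3 is simple enough to state directly in code. The sketch below assumes image coordinates with y increasing downwards and uses the fractions given above; the function name and return format are illustrative only.

  # Sketch of the step-3 heuristic: guess eye and mouth positions from a
  # face bounding box (x, y, w, h), y increasing downwards.

  def guess_features(x, y, w, h):
      eye_y   = y + h / 3.0            # eyes two-thirds of the way up the face
      mouth_y = y + 2.0 * h / 3.0      # mouth one-third up from the bottom
      return {
          "left_eye":  (x + w / 3.0,       eye_y),   # each eye 1/3 from an edge
          "right_eye": (x + 2.0 * w / 3.0, eye_y),
          "mouth":     (x + w / 2.0,       mouth_y),
      }

  print(guess_features(100, 50, 90, 120))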
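For step 6, the sketch below outlines a behavior node that listens for face appeared/disappeared events and publishes a gaze-at command. The topic names, message format and the use of std_msgs/String are assumptions for illustration; the real eva_behavior package defines its own topics and messages and uses Owyl trees internally.

  # Hedged sketch of a step-6 behavior node; topics and message format assumed.
  import rospy
  from std_msgs.msg import String

  def on_face_event(msg):
      # e.g. "face_appeared 1" or "face_disappeared 1" (assumed format)
      event, face_id = msg.data.split()
      if event == "face_appeared":
          gaze_pub.publish(String(data="gaze_at %s" % face_id))

  rospy.init_node("behavior_sketch")
  gaze_pub = rospy.Publisher("behavior/gaze_at", String, queue_size=10)
  rospy.Subscriber("perception/face_events", String, on_face_event)
  rospy.spin()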
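Finally, for step 7, the sketch below shows one possible shape for the "virtual Python class" that any Blender rig could implement, so that the behavior layer does not depend on a specific head. The method names are assumptions, chosen to mirror the commands mentioned in step 8, not the finalized API.

  # Hedged sketch of a rig-independent Blender API (method names assumed).
  from abc import ABC, abstractmethod

  class RigAPI(ABC):
      @abstractmethod
      def set_emotion_states(self, emotions):
          """e.g. ["amused", ...] -- blend the given emotional expressions."""

      @abstractmethod
      def set_emotion_gestures(self, gestures):
          """e.g. ["nod", ...] -- play the given gestures."""

      @abstractmethod
      def look_at(self, x, y, z):
          """Turn the head and eyes toward a point in 3D space."""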