Perception

From Hanson Robotics Wiki

Simply put, the goal is for the robot to see and hear with human-level or better ability in object and person recognition, spatial acuity, sound disambiguation and localization, and voice recognition -- in general, to perceive the world without confusion. We will build this up from simple to advanced, as with everything else, but the goal should always be to eliminate confusion and maximize coherence and understanding of the perceived world.

  • Face Tracking
    • Need to integrate with optical flow for better tracking, and with face and object recognition to do true identification and disambiguation of the visual field
  • Face Recognition / Identification (distinguish and remember individual faces)
    • Need to use this in the perception synthesizer to do better tracking, track reacquisition in an interactive context (i.e. not just forget the person you're talking to because the track is lost, but carry on a real interpersonal interaction)
    • Needs to tie into a long-term memory database and be associated with names and other biographical data, so Sophia can build a model of people that can eventually be combined with emotional and cognitive reasoning to associate data with emotions and thoughts
  • General object recognition and tracking
    • Fill in the rest of the world that isn't people, to better comprehend and navigate the world
  • Face Gesture and Emotion Recognition
    • Needs to feed into an emotional understanding and motivation model
    • Mimicry (which, coupled with questioning the user about their gestures, can tie into grounded reinforcement learning of gestures when those systems are online)
  • Skeleton / body gesture tracking
    • Use in the perception synthesizer to do better person tracking (associate with face, gives more track points)
    • Mimicry (which, coupled with questioning the user about their gestures, can tie into grounded reinforcement learning of gestures when those systems are online)
  • Body Gesture Recognition
    • Needs to feed into an emotional understanding and motivation model (body language)
  • Voice stress analysis / vocal intonation emotional analysis
    • Needs to feed into an emotional understanding and motivation model
  • Speech recognition
    • Emotional semantic analysis feeds into an emotional understanding and motivation model
    • Cognitive semantic analysis feeds into cognitive models and goal-driven factual discourse
  • Audio localization
    • Combine with object and person tracking to build a fuller picture of the world
    • Turn to face someone when being spoken to
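
The track-reacquisition idea above -- don't forget the person you're talking to just because the visual track is lost -- can be illustrated with a minimal sketch. This is a toy illustration, not Hanson code: the class name, the distance threshold, and the use of a face label as the identity key are all assumptions for the example.

```python
import math

class PersonTracker:
    """Toy perception-synthesizer sketch: keeps person tracks alive across
    short losses by re-matching on recognized face identity, not just on
    position. A real system would fuse optical flow, skeleton points, etc."""

    def __init__(self, max_dist=50.0):
        self.max_dist = max_dist  # assumed pixel threshold for position matching
        self.tracks = {}          # track_id -> last known (x, y)
        self.identity = {}        # track_id -> face label, if recognized
        self._next_id = 0

    def update(self, detections):
        """detections: list of (x, y, label_or_None). Returns track ids,
        one per detection, reusing ids for reacquired people."""
        assigned = []
        for x, y, label in detections:
            tid = self._match(x, y, label)
            if tid is None:  # genuinely new person/object
                tid = self._next_id
                self._next_id += 1
            self.tracks[tid] = (x, y)
            if label is not None:
                self.identity[tid] = label
            assigned.append(tid)
        return assigned

    def _match(self, x, y, label):
        # 1) Reacquire by identity: a recognized face reclaims its old track,
        #    no matter where it reappears in the frame.
        if label is not None:
            for tid, known in self.identity.items():
                if known == label:
                    return tid
        # 2) Otherwise fall back to the nearest existing track within range.
        best, best_d = None, self.max_dist
        for tid, (tx, ty) in self.tracks.items():
            d = math.hypot(x - tx, y - ty)
            if d < best_d:
                best, best_d = tid, d
        return best
```

Usage: if "alice" is recognized at (10, 10) and later reappears far across the frame, she keeps the same track id, so any memory attached to that id survives the track loss; an unlabeled detection far from every track gets a fresh id.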

All tracking tasks will use visual as well as IR, LIDAR, and other sensing. Whatever gives the best results will back up continually improving visual analysis -- just as humans use all their senses to round out their view of the world despite having very good visual sensing. The perceptions, through perception synthesis, must combine to give the best model of the world we can create.
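One standard way to combine estimates from several sensors, where "whatever gives the best results" dominates, is inverse-variance weighting: each sensor's reading counts in proportion to its confidence. The sketch below is a generic statistical technique, not anything specific to the Hanson stack, and the sensor names in the usage line are made up.

```python
def fuse_estimates(estimates):
    """Inverse-variance weighted fusion of 1-D position estimates.

    estimates: list of (value, variance) pairs, one per sensor
    (e.g. camera, IR, LIDAR). Returns (fused_value, fused_variance);
    the fused variance is always at most the smallest input variance,
    so adding a sensor never makes the estimate worse.
    """
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    value = sum(w * v for (v, _), w in zip(estimates, weights)) / total
    return value, 1.0 / total
```

For example, fusing a camera estimate (10.0, variance 1.0) with a LIDAR estimate (12.0, variance 1.0) gives 11.0 with variance 0.5; a noisier sensor (larger variance) simply pulls the result less.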

Perception Synthesis

The goal of perception synthesis is to build a model of the world around the robot with identifiers for every object; factual and emotional memory associated with each previously identified person and object; automatic acquisition of new people and objects; and updating of memory about already-identified objects as new perceptions arrive.

Perception synthesis proper involves combining all the inputs into a unified in-the-moment model of the robot's surroundings. The further analyses (understanding, motivation, and memory) take the output from perception / perception synthesis and use it to build cognitive and emotional models of the world and respond to it.

  • 3D model of perceived world
  • Identifiers associated with perceived objects
    • With disambiguation and memory, to give object persistence and history
    • Associate emotional and factual history with identifiers
    • Update data based on new perceptions and understanding
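
The three bullets above -- persistent identifiers, associated factual and emotional history, and updates from new perceptions -- can be sketched as a simple data structure. This is a minimal illustration of the idea only; the class and field names are assumptions, and a real world model would also carry 3D geometry and confidence values.

```python
from dataclasses import dataclass, field

@dataclass
class WorldObject:
    """One entry in the in-the-moment world model: a persistent identifier
    plus the factual and emotional memory attached to it."""
    obj_id: str
    position: tuple              # last perceived location
    facts: dict = field(default_factory=dict)     # long-term factual memory
    emotions: list = field(default_factory=list)  # emotional history

class WorldModel:
    """Maps identifiers to objects, giving object persistence: re-perceiving
    a known id updates its record instead of creating a duplicate."""

    def __init__(self):
        self.objects = {}

    def perceive(self, obj_id, position, **facts):
        """Create or update an object record from a new perception."""
        obj = self.objects.get(obj_id)
        if obj is None:
            obj = WorldObject(obj_id, position)
            self.objects[obj_id] = obj
        else:
            obj.position = position   # update, don't re-acquire
        obj.facts.update(facts)       # merge new facts into memory
        return obj
```

Perceiving the same identifier twice yields one object whose position tracks the latest perception and whose facts accumulate over time -- the persistence-and-history behavior the list describes.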

Once the robot can move, touch sensing will be added to the list of perceptions, and path planning and reactive movement to the list of reactions.
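
As a taste of the path-planning step mentioned above, here is a minimal breadth-first planner over a 2-D occupancy grid. This is a textbook algorithm sketch, not a proposal for the actual navigation stack, and the grid encoding (0 = free, 1 = obstacle) is an assumption for the example.

```python
from collections import deque

def plan_path(grid, start, goal):
    """Breadth-first path planning on a 2-D occupancy grid.

    grid: list of rows, 0 = free cell, 1 = obstacle.
    start, goal: (row, col) tuples.
    Returns the shortest 4-connected path as a list of cells,
    or None if the goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:          # reconstruct path by walking back
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in prev):
                prev[(nr, nc)] = cell
                queue.append((nr, nc))
    return None
```

Because breadth-first search expands cells in order of distance, the first time the goal is dequeued the reconstructed path is a shortest one; reactive movement would then follow this path while local sensing handles unexpected obstacles.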