Relevant software

From Hanson Robotics Wiki

This page lists software that may be useful for Hanson Robotics efforts in the near term. The focus is on open-source software (OSS), but commercial software may be listed too, especially if there is no good OSS equivalent.

Big Systems and Frameworks

  • OpenCog, an AI research toolkit / cognitive architecture with many relevant tools at various levels of maturity
  • ROS, Robot Operating System, of course
  • Blender and the Blender Game Engine
  • Gazebo
  • MORSE -- Modular OpenRobots Simulation Engine. Uses Blender for rendering; written in Python; BSD license. Ben says: "Sami, in Ethiopia, spent 6 weeks with MORSE. It doesn't do everything the documentation says, and the developer community is small and inattentive...." Thus, Gazebo seems to be a better choice. See also Armature creation in Morse.

Purpose-Specific Software

Software that could be employed to implement subsystem components.

Saliency tracking

Tracking of (unknown) objects and perception of novel events in the visual field.

  • TLD-Predator; OSS, GPL-3.0; tracks unknown objects in video streams, where the object of interest is defined by a bounding box in a single frame. TLD simultaneously tracks the object, learns its appearance, and detects it whenever it reappears in the video; the result is a real-time tracker that often improves over time. Proposed use: couple it with other software, such as OpenCV or a saliency tracker, that draws a bounding box around a region of interest, which TLD will then learn and track better. Suggested by Ben. Not tested with robots. Links: http://libccv.org/doc/doc-tld/ (new version), http://personal.ee.surrey.ac.uk/Personal/Z.Kalal/tld.html (original MATLAB version), http://www.gnebehay.com/tld/ (C++ version)
  • libccv, a vision-processing toolkit that includes TLD as well as other useful components, such as convolutional neural networks for object recognition.
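The proposed detector/tracker coupling can be sketched in a few lines: a detector (OpenCV-based, saliency-based, etc.) proposes a bounding box, a TLD-style tracker follows it, and the detector re-seeds the tracker when the two boxes stop overlapping. This is a minimal illustration of the coupling logic only; the function names are made up for the example and are not the real libccv or OpenTLD API.

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    ix = max(0, min(ax2, bx2) - max(a[0], b[0]))
    iy = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0


def track_with_reseed(detections, tracker_boxes, min_iou=0.3):
    """Per frame: keep the tracker's box unless it has drifted too far
    from the detector's proposal, in which case re-seed from the detector.
    `detections` may contain None for frames where the detector fired nothing."""
    out = []
    for det, trk in zip(detections, tracker_boxes):
        out.append(trk if det is None or iou(det, trk) >= min_iou else det)
    return out
```

The IoU threshold (0.3 here) is the usual knob for how aggressively the detector overrides the tracker; a real system would also feed the re-seed event back into TLD's learning step.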

Face tracking and feature extraction

Tracking of faces in the visual field (location, distance, headpose). Extraction of facial features (lips, cheeks, eyebrows, gaze, forehead, etc.). Detection and extraction of emotional expressions (happy, sad, angry, etc.).

  • ros-indigo-face-detector: 1.0.8-0
  • Ci2CV, GPL license; extracts and tracks 68 facial feature points from webcam data. Requires users to be close enough to the camera to resolve such fine-grained details, typically within 2-3 meters. Provides sufficient data to determine expression and headpose, and thus the expressed emotion (angry, sad, etc.), though that would require more work. Works well in early testing (per Ean Schussler; not tested with actual robots). Source code is on GitHub.
  • FaceShift, for mapping camera input from faces into facial animations in Maya, Blender, etc. (commercial product, already in use within Hanson Robotics to create facial animations in Blender/MORSE).
  • Cara from IMRSV; visually tracks facial expressions, facial features, and 3D position for large numbers of faces at distances of up to 8 meters; also estimates a given face's gaze, headpose, attention, age, and gender. License terms unclear (this information has been requested of IMRSV CEO Jason Sosa); free for non-commercial and research use, or $39/month per user for commercial use. Status: not yet experimented with. Links: IMRSV page -- news item.
  • OpenCV has announced a $50K challenge with a focus on improving, among other things: (3) human pose estimation, (7) face recognition, (8) gesture recognition, (9) action recognition.
  • Fraunhofer SHORE™: facial recognition, gender detection, and analysis of four facial expressions. Proprietary.
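To make the "determine expression from feature points" step above concrete, here is a toy sketch of turning tracked landmarks (such as the 68 points a tracker like Ci2CV produces) into a coarse expression estimate. The landmark names and the threshold are invented for the example; a real system would calibrate against the tracker's actual point layout and many more features.

```python
import math


def dist(p, q):
    """Euclidean distance between two 2-D landmark points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def smile_score(left_eye, right_eye, left_mouth, right_mouth):
    """Mouth-corner spread normalised by inter-ocular distance, so the
    score does not change with how close the face is to the camera."""
    return dist(left_mouth, right_mouth) / dist(left_eye, right_eye)


def looks_happy(landmarks, threshold=1.0):
    """Crude happy/not-happy decision from four landmark points.
    The dict keys and the threshold value are hypothetical."""
    s = smile_score(landmarks["left_eye"], landmarks["right_eye"],
                    landmarks["left_mouth"], landmarks["right_mouth"])
    return s > threshold
```

Normalising by inter-ocular distance is the standard trick that keeps such ratios usable at the 2-3 meter working range mentioned for Ci2CV.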

Sound/auditory processing

Tracking location of voice/speech. Extracting voice from background chatter. Identifying stress and emotion in the voice. Basic speech recognition.

  • openEAR, for identifying emotion from speech. Status: not yet experimented with. License: OSS (which one?)
  • HARK, short for "Honda Research Institute-Japan (HRI-JP) Audition for Robots with Kyoto University". Provides sound-source localization, sound-source separation, acoustic feature extraction, and automatic speech recognition. This is needed for localizing speech in a noisy, loud environment, e.g. for maintaining a conversation with one person while in a large crowd. Uses microphone arrays, e.g. of 8 microphones. License: possibly open source (unconfirmed).
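The core idea behind microphone-array localization of the kind HARK provides is estimating the time difference of arrival (TDOA) between microphones by cross-correlation and converting that lag into a bearing. The sketch below shows the principle for a two-microphone pair; it is a from-scratch illustration under simplified assumptions (far-field source, no noise), not HARK's actual algorithm.

```python
import math


def tdoa_samples(sig_a, sig_b, max_lag):
    """Lag (in samples) at which sig_b best matches sig_a, found by
    brute-force cross-correlation over [-max_lag, +max_lag]."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(sig_a[i] * sig_b[i + lag]
                    for i in range(len(sig_a))
                    if 0 <= i + lag < len(sig_b))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def bearing_deg(lag, rate_hz, mic_spacing_m, c=343.0):
    """Far-field bearing of the source relative to the two-mic axis,
    from the TDOA lag; c is the speed of sound in m/s."""
    delay = lag / rate_hz
    x = max(-1.0, min(1.0, delay * c / mic_spacing_m))
    return math.degrees(math.acos(x))
```

A zero lag means the source is broadside to the pair (90 degrees); HARK's multi-microphone methods (e.g. MUSIC) generalize this to full direction estimates in noise.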

Speech to text systems

  • Julius voice recognition. See Julius on Wikipedia, and the Julius website. HMM-based tri-gram model. BSD license. Rumoured to work better than CMU Sphinx.
  • HTK, the Hidden Markov Model Toolkit. C code. BSD-like license. Last update was April 2009; appears to be a dead project.
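Both Julius and HTK are built on hidden Markov models: recognition amounts to searching for the most likely hidden state sequence given the acoustic observations. The toy Viterbi decoder below shows that core computation on a made-up two-state model; real recognizers run it over phoneme HMMs with an n-gram language model (Julius's tri-grams) on top.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden state path for an observation sequence.
    start_p/trans_p/emit_p are plain dicts of probabilities."""
    # v[t][s] = (best probability of any path ending in s at time t, predecessor)
    v = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        layer = {}
        for s in states:
            prob, prev = max(
                (v[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states)
            layer[s] = (prob, prev)
        v.append(layer)
    # Backtrack from the best final state.
    last = max(states, key=lambda s: v[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(v[t][path[-1]][1])
    return list(reversed(path))
```

In a speech decoder the "observations" are acoustic feature vectors (emissions become Gaussian mixture likelihoods) and the state graph is the concatenation of phoneme HMMs, but the dynamic program is the same.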