Designing a Restaurant Assistance Robot - Engaging Multiple Parties in Conversation

    The main goal of this project is to achieve accurate recognition of uncertain speech input and to build a stable system by coupling speech and vision into a single unit. To investigate these factors, I chose Human-Robot Interaction as the platform and use-case scenario. After analyzing the modalities of Human-Robot Interaction, I implemented the engagement of multiple parties in a conversation. I designed a Conversation Manager using the speech and vision modules of the NAO (Torso) humanoid robot.

   In human-robot interaction, several social attributes have to be taken into account, such as turn-taking, decision-making, paying attention to the speaker, and handling multiple parties in the conversation. I combined these capabilities into a unit termed the “Conversation Manager”, which is designed to manage the basic modalities of multimodal interaction.

   Finally, the hypothesis is that coupling speech and vision into a single system yields better user ratings than a system that uses speech or vision alone. To test it, I developed two systems, one with speech alone and one with speech and vision combined (the multimodal system), and recorded the user ratings for each.
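
   The comparison between the two systems can be sketched as a difference of mean user ratings. This is only an illustration: the function name and the rating scale are assumptions, and the numbers in the usage line are placeholders, not results from this project.

```python
from statistics import mean

def compare_ratings(speech_only, multimodal):
    """Return the mean rating of each system and their difference.

    Both arguments are lists of numeric user ratings (scale assumed,
    for example 1 to 5). A positive difference favours the multimodal system.
    """
    speech_mean = mean(speech_only)
    multimodal_mean = mean(multimodal)
    return {
        "speech_only": speech_mean,
        "multimodal": multimodal_mean,
        "difference": multimodal_mean - speech_mean,
    }

# Placeholder ratings, for illustration only (not data from the study).
summary = compare_ratings([2, 4], [3, 5])
```

   A difference in means alone does not show that the gap is significant; with real ratings, a statistical test would be the natural next step.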

Demonstration of multi-party conversation with the NAO Torso robot

The two main objectives of this project:

  • To develop a multimodal system by coupling speech and vision into a single unit, and to attain better user ratings for that system

  • To effectively engage people in conversation and handle multiple parties in conversation
     

Face Tracking Module - NAO Torso Robot

     The face detection module is a major resource of this project. The information in this section is taken from the datasheet provided by Aldebaran Robotics. NAO has two cameras in its head for vision recognition and face detection, and its face recognition engine is provided by OKI. Face detection is started through the ALFaceDetection API; once NAO detects a face, it stores the values in the ALMemory module under the key “FaceDetected”. Before it can recognize a face, NAO has to learn its facial structure: it maps the face and stores the necessary values, which are retrieved when NAO sees that particular face again. When NAO detects a face, it stores the information about it in the form of arrays. For each face it stores two arrays, “Timestamp” and “Faceinfo”, which together hold all the details of the face.
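
     The array layout described above can be unpacked with a small helper. This is a minimal sketch that runs without a robot: the function name, the sample value, and the exact positions of the fields inside each entry (shape as [0, alpha, beta, sizeX, sizeY], plus a trailing recognition summary) are assumptions based on my reading of the NAOqi documentation. On a real robot the value would be read from ALMemory instead of being written by hand.

```python
def parse_face_detected(value):
    """Split a face-detection value into a timestamp and a list of faces.

    Assumed layout: value[0] is [seconds, microseconds] and value[1] is a
    list of face entries followed by one recognition-summary element.
    Returns None when no face is present.
    """
    # An empty or too-short value is treated as "no face detected".
    if not value or len(value) < 2:
        return None

    timestamp, face_info = value[0], value[1]
    seconds = timestamp[0] + timestamp[1] / 1e6

    faces = []
    # Skip the last element, assumed to be the recognition summary.
    for entry in face_info[:-1]:
        shape = entry[0]  # assumed: [0, alpha, beta, size_x, size_y]
        faces.append({
            "alpha": shape[1],   # horizontal angle of the face (radians)
            "beta": shape[2],    # vertical angle of the face (radians)
            "size_x": shape[3],
            "size_y": shape[4],
        })
    return {"time": seconds, "faces": faces}

# Hand-written sample value, for illustration only.
sample = [[1700000000, 500000], [[[0, 0.10, -0.05, 0.20, 0.25], []], []]]
parsed = parse_face_detected(sample)
```

     The angles of the first face in the parsed result are the kind of values a face tracker could use as a head target.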

Sound Localization Module - NAO Torso Robot

    In this test, the robot's microphones capture the user's voice and trigger a head movement toward the direction of the sound. The NAO robot has four microphones; when they receive a voice, the signals from all four are compared to find the one with the highest perception rate. The head then moves in the direction of the microphone that perceived the strongest sound signal.
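
    The selection step described above, picking the microphone with the strongest signal and turning the head toward it, can be sketched as follows. This is an illustration of the idea only, not the robot's built-in localization code: the microphone names, the yaw angle assigned to each microphone, and the yaw limit are all assumptions.

```python
import math

# Hypothetical head-yaw target (radians) for each microphone; illustrative only.
MIC_YAW = {
    "front": 0.0,
    "left": math.pi / 2,
    "right": -math.pi / 2,
    "rear": math.pi,
}

def rms(samples):
    """Root-mean-square level of one channel; 0.0 for an empty channel."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def loudest_microphone(channels):
    """Return the name of the channel with the highest RMS level."""
    return max(channels, key=lambda name: rms(channels[name]))

def head_yaw_for(channels, max_yaw=2.0):
    """Yaw target for the loudest microphone, clamped to an assumed joint limit."""
    yaw = MIC_YAW[loudest_microphone(channels)]
    return max(-max_yaw, min(max_yaw, yaw))

# Hand-written sample frames, for illustration only.
frames = {
    "front": [0.1, -0.1],
    "left": [0.5, -0.6],
    "right": [0.0, 0.1],
    "rear": [0.2, 0.2],
}
target = head_yaw_for(frames)
```

    Clamping matters for the rear microphone: the head cannot turn a full half circle, so the target is limited to the assumed maximum yaw.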

NAO's face tracking module. This video was recorded during my test session with the NAO robot. The robot tracks the face and moves its head toward the target location; in this case, the target is the user's face.

NAO's sound localization module. This video was recorded during my test session with the NAO robot.  

Poster Presentation