Appropriate prosodic variation is valued by users

Rafael Escalante-Ruiz, University of Texas at El Paso


When humans converse they can detect and respond to their interlocutor’s fleeting emotional changes. This ability is especially important in tutoring situations, because effective tutors assess the learner’s need for help and encouragement and act appropriately at the correct times. The most effective tutors are able, in addition, to detect emotions in the learner such as uncertainty, over-confidence, and enthusiasm, and then react with emotionally appropriate behaviors. Current speech-based computer systems lack the ability to detect the user’s emotional changes and thus are unable to respond in an emotionally appropriate way. While there has been research in emotion recognition and generation in tutoring systems, it has focused mainly on words. Previous research by Hollingsed and Ward [7] showed that varying the words of the acknowledgments of a speech-based tutoring system made the system better liked by users [7, 13]; however, the users complained about the fixed prosody of the system’s acknowledgments. The present research aimed to create a rule-based model of the behavior and response prosody of a human tutor and to test its effectiveness. The first goal was to discover the different types of emotions expressed by the tutor; to do this, 339 tutor acknowledgments contained in 16 tutor-student interactions were analyzed by a group of raters to find common emotion types. The second goal was to discover how these emotions were expressed, assuming that the cues to interpretation correspond to the speaker’s intent. To do this, responses rated high for each emotion type were grouped and analyzed to find commonalities in their prosody. The third goal was to find when the tutor used each emotion. To do this, the conversations were again analyzed to find commonalities in the context-of-occurrence of each emotion. Each commonality was quantified, and the result was a set of rules that described the behavior of the tutor: for example, talking with a warm tone of voice when the student is uncertain, responding authoritatively to keep control when the student is over-confident, or responding with high energy when the student is enthusiastic. To test their effectiveness, the rules were integrated into a Wizard-of-Oz [8] tutoring system, and 21 students were asked to interact with it and compare it to a system that produced random acknowledgments. The students’ perceptions of friendliness and naturalness were higher when using the rule-based tutoring system. Although only one of the three measures was significant, the users tended to prefer the system that was appropriately emotionally responsive. This suggests that emotional modeling with prosodic variation can be effective.

Subject Area

Computer science

Recommended Citation

Escalante-Ruiz, Rafael, "Appropriate prosodic variation is valued by users" (2009). ETD Collection for University of Texas, El Paso. AAI1482102.