As dialog systems become more capable, users tend to talk more spontaneously and less formally. Spontaneous speech includes features that convey information about the user's state. In particular, filled pauses, such as 'um' and 'uh', can indicate that the user is having trouble, wants more time, wants to hold the floor, or is uncertain. In this paper we present a first study of the acoustic characteristics of filled pauses in tutorial dialogs. We show that in this domain, as in other domains, filled pauses typically have flat pitch and fairly constant energy. We present a simple algorithm based on these features that detects filled pauses with 80% coverage and 67% accuracy. Analysis of the prediction failures shows that some are due to filled pauses of unusual types and to related phenomena: filled pauses marking a change of state, cases where uncertainty is marked by lengthening a vowel in a word, and filled pauses that segue directly into a word.
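The abstract does not spell out the detection algorithm, so the following is only an illustrative sketch of the general idea it describes: flag voiced stretches where both pitch and energy stay nearly constant for long enough. The function name `find_flat_voiced_spans`, the 10 ms frame size, and every threshold (`min_frames`, `max_rel_pitch_range`, `max_energy_range_db`) are assumptions made for this example, not values taken from the paper. The input is assumed to be per-frame pitch in Hz (0 for unvoiced frames) and per-frame energy in dB, as produced by any standard pitch tracker.

```python
from typing import List, Tuple


def find_flat_voiced_spans(
    pitch_hz: List[float],
    energy_db: List[float],
    frame_s: float = 0.01,              # assumed frame step: 10 ms
    min_frames: int = 20,               # assumed minimum length: 200 ms
    max_rel_pitch_range: float = 0.05,  # assumed: pitch may vary by 5%
    max_energy_range_db: float = 3.0,   # assumed: energy may vary by 3 dB
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans where pitch and energy are nearly flat.

    Hypothetical sketch: thresholds are illustrative, not from the paper.
    """
    if len(pitch_hz) != len(energy_db):
        raise ValueError("pitch and energy tracks must have equal length")

    spans: List[Tuple[float, float]] = []
    n = len(pitch_hz)
    start = 0
    while start < n:
        if pitch_hz[start] <= 0:  # skip unvoiced frames
            start += 1
            continue

        # Greedily extend the window while the pitch and energy ranges
        # both stay within their tolerances.
        lo_p = hi_p = pitch_hz[start]
        lo_e = hi_e = energy_db[start]
        end = start + 1
        while end < n and pitch_hz[end] > 0:
            new_lo_p = min(lo_p, pitch_hz[end])
            new_hi_p = max(hi_p, pitch_hz[end])
            new_lo_e = min(lo_e, energy_db[end])
            new_hi_e = max(hi_e, energy_db[end])
            if new_hi_p - new_lo_p > max_rel_pitch_range * new_lo_p:
                break
            if new_hi_e - new_lo_e > max_energy_range_db:
                break
            lo_p, hi_p, lo_e, hi_e = new_lo_p, new_hi_p, new_lo_e, new_hi_e
            end += 1

        if end - start >= min_frames:
            # Long enough to count as a filled-pause candidate.
            spans.append((round(start * frame_s, 3), round(end * frame_s, 3)))
            start = end
        else:
            start += 1
    return spans


if __name__ == "__main__":
    # Synthetic tracks: silence, a flat 120 Hz stretch, silence, a rising glide.
    pitch = [0.0] * 10 + [120.0] * 30 + [0.0] * 10 + [150.0 + 10.0 * i for i in range(20)]
    energy = [30.0] * 10 + [60.0] * 30 + [30.0] * 10 + [60.0] * 20
    print(find_flat_voiced_spans(pitch, energy))
```

A detector of this shape would plausibly share the failure modes the abstract lists: a lengthened vowel inside a word is also flat and steady, so it would be flagged, while a filled pause that runs straight into a word may be cut short by the pitch or energy change at the word onset.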