While many areas of speech processing, including speech recognition and speech synthesis, have advanced enormously over the past few years, progress in dialog has been more modest. This gap is largely attributable to a lack of resources that can support the machine learning of dialog models and dialog phenomena. The research community therefore needs a corpus of spoken dialogs with quality annotations at a resolution of roughly 100 milliseconds. We envisage a large and diverse collection: on the order of fifty hours of data, representing hundreds of speakers and many genres, with every instant labeled for interaction quality by one or more human judges. To make the corpus maximally useful, we will design it as a community effort.
This technical report is an edited version of a National Science Foundation proposal, submitted to the CISE Community Research Infrastructure Program in February 2019.