Co-located Events

COLIPS Invited NLP Special Session

Speaker: Ville Hautamäki

Title: Clustering based models in speaker recognition

Abstract:

The maximum a posteriori adapted Gaussian mixture model (GMM-MAP) is widely used in speaker verification. A GMM has three sets of parameters that can be adapted: means, covariances and weights. In practice, however, adapting the means alone has proven sufficient. Motivated by this, we formulate a maximum a posteriori vector quantization (VQ-MAP) procedure that stores and adapts only the mean vectors (centroids).
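To illustrate what adapting only the centroids means, here is a minimal sketch in plain Python. It assumes the relevance-factor form familiar from GMM-MAP mean adaptation, with each training vector hard-assigned to its nearest UBM centroid. The function name `vq_map_adapt` and the `relevance` parameter are hypothetical choices made for this illustration; they are not taken from the talk, and the exact VQ-MAP formulation presented there may differ.

```python
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_map_adapt(
    ubm_centroids: Sequence[Sequence[float]],
    features: Sequence[Sequence[float]],
    relevance: float = 16.0,  # hypothetical relevance factor, as in GMM-MAP
) -> List[Vector]:
    """Sketch of MAP-style adaptation of VQ centroids (assumed formulation).

    Each feature vector is hard-assigned to its nearest UBM centroid.
    Every centroid is then moved toward the mean of its assigned vectors
    by alpha = n / (n + relevance), where n is the number of assigned vectors.
    Centroids that receive no vectors keep their UBM values.
    """
    k = len(ubm_centroids)
    dim = len(ubm_centroids[0])
    counts = [0] * k
    sums = [[0.0] * dim for _ in range(k)]

    # Hard assignment: accumulate count and vector sum per nearest centroid.
    for x in features:
        j = min(range(k), key=lambda i: squared_distance(x, ubm_centroids[i]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += x[d]

    adapted: List[Vector] = []
    for i in range(k):
        n = counts[i]
        if n == 0:
            # No data for this centroid: fall back to the UBM prior.
            adapted.append(list(ubm_centroids[i]))
            continue
        alpha = n / (n + relevance)
        mean = [s / n for s in sums[i]]
        # Interpolate between the data mean and the UBM centroid.
        adapted.append(
            [alpha * m + (1.0 - alpha) * c for m, c in zip(mean, ubm_centroids[i])]
        )
    return adapted


if __name__ == "__main__":
    ubm = [[0.0, 0.0], [10.0, 10.0]]
    speaker_data = [[1.0, 1.0]] * 4
    print(vq_map_adapt(ubm, speaker_data, relevance=4.0))
```

With four vectors assigned to the first centroid and a relevance factor of 4, the first centroid moves halfway toward the data mean, giving `[0.5, 0.5]`, while the unused second centroid stays at the UBM value `[10.0, 10.0]`. Only centroids are stored and updated. This is the source of the simplicity relative to GMM-MAP, which also keeps covariances and weights.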

In this study, we extensively compare GMM-MAP and VQ-MAP classifiers on the NIST 2005, 2006 and 2008 SRE corpora, using a standard discriminative classifier (GLDS-SVM) as a point of reference. We focus on parameter settings for N-top scoring, model order, and performance with different amounts of training data. We show that the inferior performance of VQ relative to GMM in speaker verification has been due to the lack of adaptation in VQ. With adaptation, VQ achieves accuracy comparable to GMM-MAP, with a simpler implementation and faster adaptation.

The most interesting result, contrary to general belief, is that GMM-UBM yields better results for short segments, whereas VQ-UBM performs well on long utterances. The results also suggest that maximum likelihood training of the UBM is sub-optimal for speaker verification, and hence alternative ways to train the UBM should be considered.


Ville Hautamäki received the MSc degree in Computer Science from the University of Joensuu, Finland, in 2005. He received the PhD degree in Computer Science from the same university in 2008, with the thesis topic "Improving Pattern Recognition Methods for Speaker Recognition". He is currently a postdoctoral researcher, funded by a scholarship from the Academy of Finland, attached to the Institute for Infocomm Research, A*STAR, Singapore. His current research interests are cluster analysis, speaker recognition and speaker diarization. His web page is http://cs.joensuu.fi/~villeh

Speaker: Andreea Niculescu

Title: Multimodal conversational interfaces in HCI - what influences their quality assessment?

Abstract:

Multimodal conversational interfaces now enable users to communicate with computer systems through a wide range of input/output modalities, such as speech, text and touch. There is therefore a growing need not only for reliable, standardized evaluation methods for such interfaces but also for identifying the factors that affect their quality assessment. Research has shown that recognition accuracy is not the limiting factor for the widespread use of such interfaces; rather, it is the quality of the overall dialogue design and how well the application matches its context. Additional factors, such as the visual and acoustic appearance of the interface and the degree of fun and comfort, also play an important role in its quality assessment. Based on a few case studies, this talk will analyze how humans interact with conversational interfaces and highlight some important factors that contribute to a more positive quality assessment.


Andreea Niculescu is a Ph.D. student in the Human Media Interaction Group at the University of Twente (Netherlands). She holds a bachelor's degree in computer science and a master's degree in communication studies, both from the Ruhr-University Bochum (Germany). She also holds a bachelor's degree in German and Romanic Languages from the University of Bucharest (Romania). Her research interests are in the field of Human-Computer Interaction (HCI) and include multimodal conversational interfaces, user experience studies, speech technology and usability prediction models.
Website: http://wwwhome.cs.utwente.nl/~niculescuai/



IALP 2009 is jointly organized by the Chinese and Oriental Languages Information Processing Society (COLIPS) and the IEEE Singapore Computer Chapter (IEEE Singapore CC).

Images courtesy of the Singapore Tourism Board. Layout by Free CSS Templates.