
Language modeling for speech recognition incorporating probabilistic topic models

Haidar, Akmal (2014). Language modeling for speech recognition incorporating probabilistic topic models. Thesis. Québec, Université du Québec, Institut national de la recherche scientifique, Doctorate in Telecommunications, 188 p.



The use of language models in automatic speech recognition helps to find the most likely next word and thus increases recognition accuracy. Statistical language models (LMs) are trained on large collections of text to automatically estimate the model's parameters, encoding linguistic knowledge in a form useful for processing human language. Generally, an LM exploits only the immediate past information. Such models capture short-range dependencies between words very well; however, in any language used for communication, words carry both semantic and syntactic importance. Most speech recognition systems are designed for a specific task and use language models trained on a large amount of text appropriate to that task. A task-specific language model does not perform well on a different domain or topic, and a perfect language model for speech recognition on general language is still far away; language models trained on diverse styles of language can do well, but are not perfectly suited to a given domain. In this research, we introduce new language modeling approaches for automatic speech recognition (ASR) systems incorporating probabilistic topic models.

In the first part of the thesis, we propose three approaches for LM adaptation that cluster the background training data into different topics using latent Dirichlet allocation (LDA). In the first approach, a hard-clustering method is applied to the LDA output to form topics, and we propose an n-gram weighting technique that builds an adapted model by combining the topic models. The adapted models are then further modified with latent semantic marginals (LSM) using a minimum discriminant information (MDI) technique. In the second approach, we introduce a clustering technique in which the background n-grams are assigned to topics using a fraction of each n-gram's global count: the topic-conditional probabilities of the n-grams are multiplied by their global counts and used as the counts for the respective topics. We also introduce a weighting technique that outperforms the n-gram weighting technique. In the third approach, we propose another clustering technique in which the topic probabilities of the training documents are multiplied by the document-based n-gram counts and the products are summed over all training documents, thereby assigning the background n-grams to topics.

In the second part of the thesis, we propose five topic modeling algorithms trained with the expectation-maximization (EM) algorithm. A context-based probabilistic latent semantic analysis (CPLSA) model is proposed to overcome a limitation of the recently proposed unsmoothed bigram PLSA (UBPLSA) model: because the CPLSA model can compute all bigram probabilities in the training phase, it computes the correct topic probabilities of an unseen test document and thereby yields a proper bigram model for that document. The CPLSA model is extended to a document-based CPLSA (DCPLSA) model in which document-based word probabilities for topics are trained; the DCPLSA model is motivated by the fact that the same words can describe different topics in different documents. An interpolated latent Dirichlet language model (ILDLM) is proposed to incorporate long-range semantic information by interpolating distance-based n-grams into a recently proposed LDLM. Analogously to the LDLM and ILDLM models, we propose enhanced PLSA (EPLSA) and interpolated EPLSA (IEPLSA) models in the PLSA framework.

In the final part of the thesis, we propose two new Dirichlet class language models, trained with the variational Bayesian EM (VB-EM) algorithm, that incorporate long-range information into a recently proposed Dirichlet class language model (DCLM). The latent variable of the DCLM represents the class information of an n-gram event rather than a topic as in LDA. We introduce an interpolated DCLM (IDCLM) in which the class information is exploited from the (n-1) previous history words of the n-grams through Dirichlet distributions using interpolated distanced n-grams. A document-based DCLM (DDCLM) is proposed in which the DCLM is trained for each document using document-based n-gram events.

In all the above approaches, the adapted models are interpolated with the background model to capture local lexical regularities. We perform experiments on the ’87-89 Wall Street Journal (WSJ) corpus with a multi-pass continuous speech recognition (CSR) system: in the first pass, the background n-gram language model is used for lattice generation, and the LM adaptation approaches are then applied for lattice rescoring in the second pass.
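The third clustering approach summarized above distributes background n-gram counts across topics by weighting each document's counts with its topic probabilities and summing over documents. A minimal sketch of that count-splitting step, where the toy documents and the topic posteriors (which in the thesis would come from LDA inference) are purely hypothetical:

```python
from collections import Counter, defaultdict

# Soft-assign background bigram counts to topics:
#   c_t(ngram) = sum over documents d of p(t | d) * c_d(ngram)
# The topic posteriors p(t | d) below are hypothetical stand-ins
# for values produced by LDA inference.

docs = [
    # (tokens of the document, [p(topic_0 | d), p(topic_1 | d)])
    (["the", "stock", "market", "fell"], [0.9, 0.1]),
    (["the", "game", "went", "to", "overtime"], [0.2, 0.8]),
]

num_topics = 2
topic_counts = [defaultdict(float) for _ in range(num_topics)]

for tokens, posteriors in docs:
    # Document-based bigram counts c_d(ngram).
    bigrams = Counter(zip(tokens, tokens[1:]))
    for t in range(num_topics):
        for bg, c in bigrams.items():
            topic_counts[t][bg] += posteriors[t] * c

# Each topic now holds fractional bigram counts from which a
# topic-specific n-gram model could be estimated.
```

Each topic ends up with fractional counts reflecting how strongly the documents containing a given n-gram belong to that topic.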
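The final interpolation step, combining an adapted model with the background model to retain local lexical regularities, can be sketched as a simple linear mixture. The weight and the toy distributions below are illustrative assumptions, not values from the thesis:

```python
# Linear interpolation of an adapted LM with the background LM:
#   p(w | h) = lam * p_adapted(w | h) + (1 - lam) * p_background(w | h)
# The interpolation weight lam and the toy probabilities below are
# hypothetical, chosen only to illustrate the mixture.

def interpolate(p_adapted, p_background, lam=0.5):
    """Linearly interpolate two conditional word distributions."""
    vocab = set(p_adapted) | set(p_background)
    return {w: lam * p_adapted.get(w, 0.0)
               + (1.0 - lam) * p_background.get(w, 0.0)
            for w in vocab}

# Toy next-word distributions for one history in a financial topic
# versus the general background corpus.
p_topic = {"market": 0.6, "price": 0.3, "shares": 0.1}
p_bg = {"market": 0.2, "price": 0.2, "shares": 0.3, "room": 0.3}

p_mix = interpolate(p_topic, p_bg, lam=0.7)
# The mixture of two proper distributions is itself a distribution.
assert abs(sum(p_mix.values()) - 1.0) < 1e-9
```

Because the background model assigns nonzero probability to words the adapted model never saw (here, "room"), the mixture keeps broad coverage while shifting mass toward the topic.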

Document type: Thesis
Thesis supervisor: O’Shaughnessy, Douglas
Keywords: speech recognition; language model; latent Dirichlet allocation; latent semantic marginals
Centre: Centre Énergie Matériaux Télécommunications
Deposited: 17 March 2015 20:29
Last modified: 20 November 2015 15:44
URI: http://espace.inrs.ca/id/eprint/2624
