Single record (DigiBib)

State-of-the-art automatic speech recognition (ASR) systems are usually based on hidden Markov models (HMMs) that emit cepstral-based features which are assumed to be piecewise stationary. While not particularly robust to noise, these features are also known to be very sensitive to auxiliary information, such as pitch, energy, and rate-of-speech (ROS). Attempts so far to include such auxiliary information in state-of-the-art ASR systems have often been based on simply appending these auxiliary features to the standard acoustic feature vectors. In the present paper, we investigate different approaches to incorporating this auxiliary information using dynamic Bayesian networks (DBNs) or hybrid HMM/ANNs (HMMs with artificial neural networks). These approaches are motivated by the fact that the auxiliary information is not necessarily (directly) emitted by the HMM states but, rather, carries higher-level information (e.g., speaker characteristics) that is correlated with the standard features. As implicitly done for gender modeling elsewhere, this auxiliary information then appears as a conditional variable in the emission distributions and can be hidden (except in the case of some HMM/ANNs) when its estimates become too noisy. Based on recognition experiments carried out on the OGI Numbers database (free-format numbers spoken over the telephone), we show that auxiliary information that conditions the distribution of the standard features can, in certain conditions, provide more robust recognition than auxiliary information that is appended to the standard features; this is most evident when energy is used as the auxiliary variable in noisy speech.
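The contrast the abstract draws between appending an auxiliary feature and using it as a conditioning (possibly hidden) variable in the emission distribution can be illustrated for a single HMM state. The following is a minimal sketch, not the authors' DBN or HMM/ANN implementation. It assumes diagonal-covariance Gaussian emissions and an auxiliary variable discretized into two bins. All function names (`emission_appended`, `emission_conditional`, `emission_hidden`) and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


# Hypothetical parameters for ONE HMM state q (illustrative values only).
# (1) Appended: the auxiliary value a is just one more (third) observation dimension.
APPENDED = {"mean": [0.0, 1.0, 0.5], "var": [1.0, 1.0, 0.25]}
# (2) Conditional: a discretized auxiliary value a selects which Gaussian models x.
CONDITIONAL = {
    0: {"mean": [-0.5, 0.8], "var": [1.0, 1.0]},  # e.g. a "low energy" bin
    1: {"mean": [0.5, 1.2], "var": [1.0, 1.0]},   # e.g. a "high energy" bin
}
# (3) Hidden: prior P(a | q), used to marginalize a out when it is not observed.
AUX_PRIOR = {0: 0.4, 1: 0.6}


def emission_appended(x, a_value):
    """log p([x, a] | q): auxiliary value treated as one more emitted feature."""
    return log_gauss_diag(list(x) + [a_value], APPENDED["mean"], APPENDED["var"])


def emission_conditional(x, a_bin):
    """log p(x | q, a): auxiliary variable observed and used for conditioning."""
    params = CONDITIONAL[a_bin]
    return log_gauss_diag(x, params["mean"], params["var"])


def emission_hidden(x):
    """log p(x | q) = log sum_a P(a | q) p(x | q, a): auxiliary variable hidden."""
    return logsumexp(
        [math.log(AUX_PRIOR[a]) + emission_conditional(x, a) for a in CONDITIONAL]
    )


if __name__ == "__main__":
    frame = [0.1, 0.9]  # a toy 2-dimensional "cepstral" frame
    print("appended   :", emission_appended(frame, 0.7))
    print("conditional:", emission_conditional(frame, 1))
    print("hidden     :", emission_hidden(frame))
```

One design point the sketch makes visible: with a diagonal covariance, appending `a` only adds a term that does not depend on `x`, so it cannot change how the standard features are scored, whereas conditioning lets `a` reshape the distribution of `x`. In the hidden variant the (possibly noisy) estimate of `a` is not used at all; `a` is summed out under the assumed prior.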

Title:	Speech recognition with auxiliary information
Authors:	STEPHENSON, Todd A.; DOSS, Mathew Magimai; BOURLARD, Hervé
Journal:	IEEE Transactions on Speech and Audio Processing, Vol. 12 (2004), No. 3, pp. 189-203
Published:	New York, NY: Institute of Electrical and Electronics Engineers, 2004
Media type:	academic journal
Extent:	print, 49 references
ISSN:	1063-6676 (print)
Subject terms:	Telecommunications; Exact sciences and technology; Applied sciences; Telecommunications and information theory; Information, signal and communications theory; Signal processing; Speech processing; Cepstral analysis; Probabilistic approach; Hidden Markov models; Modeling; Automatic recognition; Speech recognition; Bayes network; Neural network; Pitch (acoustics); artificial neural networks (ANNs); automatic speech recognition (ASR); auxiliary information; dynamic Bayesian networks (DBNs); energy; hidden Markov models (HMMs); pitch; rate-of-speech (ROS)
Indexed in:	PASCAL Archive
Language:	English
Original material:	INIST-CNRS
Document type:	Article
File description:	text
Author affiliations:	Dalle Molle Institute for Perceptual Artificial Intelligence (IDIAP), 1920 Martigny, Switzerland
Rights:	Copyright 2004 INIST-CNRS; CC BY 4.0. Unless otherwise stated above, the content of this bibliographic record may be used under a CC BY 4.0 licence by Inist-CNRS.
Notes:	Telecommunications and information theory
