Distant Speech Recognition Using a Microphone Array Network : Processing natural speech variability for improved verbal human-computer interaction
In: IEICE transactions on information and systems, vol. 93 (2010), no. 9, pp. 2451-2462
academicJournal
- print, 16 ref
In this work, an artificial neural network (ANN) estimates spatial information about an acoustic source, namely its position and orientation angle. The estimated position of a speaker in an enclosed space is used to refine the time-delay estimates for a delay-and-sum beamformer, which enhances the output signal. The orientation angle is used to restrict the lexicon in the recognition phase, on the assumption that the speaker faces a particular direction while speaking. To compensate for the effect of the transmission channel within a short analysis window, a new cepstral mean normalization (CMN) method based on a Gaussian mixture model (GMM) is investigated; it outperforms conventional CMN on short utterances. The proposed method is evaluated through Japanese digit/command recognition experiments.
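The way a position estimate feeds a delay-and-sum beamformer can be illustrated with a minimal sketch. It is not the authors' implementation: the ANN position estimator is not reproduced, and the function names `steering_delays` and `delay_and_sum`, the speed of sound of 343 m/s, and the rounding of delays to whole samples are all assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def steering_delays(mic_positions, source_position, sample_rate):
    """Per-microphone delays in samples, relative to the nearest microphone.

    Hypothetical helper: the source position would come from the
    estimator described in the abstract; here it is simply given.
    """
    distances = [math.dist(m, source_position) for m in mic_positions]
    nearest = min(distances)
    return [round((d - nearest) / SPEED_OF_SOUND * sample_rate)
            for d in distances]


def delay_and_sum(channels, delays):
    """Advance each channel by its delay and average the aligned samples."""
    # Only keep the samples for which every shifted channel has data.
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(length)
    ]


if __name__ == "__main__":
    # Two microphones 0.343 m apart, source 1 m to the left of the first.
    mics = [(0.0, 0.0), (0.343, 0.0)]
    delays = steering_delays(mics, (-1.0, 0.0), sample_rate=16000)

    # The same pulse reaches the second microphone 16 samples later.
    ch0 = [0.0] * 64
    ch1 = [0.0] * 64
    ch0[10] = 1.0
    ch1[26] = 1.0

    enhanced = delay_and_sum([ch0, ch1], delays)
    print(delays, enhanced.index(max(enhanced)))
```

Because the channels are aligned before averaging, the pulse adds coherently while uncorrelated noise would be attenuated. A practical system would typically use fractional delays rather than whole-sample shifts.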
Title: | Distant Speech Recognition Using a Microphone Array Network : Processing natural speech variability for improved verbal human-computer interaction |
---|---|
Author / Contributor: | NAKANO, Alberto Yoshihiro ; NAKAGAWA, Seiichi ; YAMAMOTO, Kazumasa |
Journal: | IEICE transactions on information and systems, vol. 93 (2010), no. 9, pp. 2451-2462 |
Publication: | Oxford: Oxford University Press, 2010 |
Media type: | academicJournal |
Extent: | print, 16 ref |
ISSN: | 0916-8532 (print) |