Reading materials

Site: MyCourses
Course: ELEC-E5510 - Speech Recognition D, Lecture, 3.11.2021-17.12.2021
Book: Reading materials

Description

Collection of reading materials by topic.

1. Course readings

Good reading material for the ASR course:

2. Material on project topics

Reading materials for project work organized by topic.

2.3. Language recognition

Examples of state-of-the-art models:


2.5. Language Modeling for Indian Languages

2.6. Automatic detection of alcohol intoxication

  • Wang, W. Y., Biadsy, F., Rosenberg, A., & Hirschberg, J. (2013). Automatic detection of speaker state: Lexical, prosodic, and phonetic approaches to level-of-interest and intoxication classification. Computer Speech & Language, 27(1), 168-189.
  • Schuller, B., Steidl, S., Batliner, A., Schiel, F., Krajewski, J., Weninger, F., & Eyben, F. (2014). Medium-term speaker states—A review on intoxication, sleepiness and the first challenge. Computer Speech & Language, 28(2), 346-374.
  • Tools: OpenSMILE, Anaconda?
  • Materials: Alcohol Language Corpus

2.7. Speech adaptation

  • X. Zhu, G. T. Beauregard, and L. L. Wyse: Real-time signal estimation from modified short-time Fourier transform magnitude spectra. IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 5, pp. 1645–1653, July 2007.
  • H. K. Kathania, S. Shahnawazuddin, W. Ahmad, N. Adiga, S. K. Jana, and A. B. Samaddar: Improving children’s speech recognition through time scale modification based speaking rate adaptation. In 2018 International Conference on Signal Processing and Communications (SPCOM), July 2018.
  • H. K. Kathania, W. Ahmad, S. Shahnawazuddin, and A. B. Samaddar: Explicit pitch mapping for improved children’s speech recognition. Circuits, Systems, and Signal Processing, September 2017.

2.8. Speaker adaptation


2.9. Deep denoising autoencoder for speech enhancement

2.10. Native language recognition

2.11. Chatbots


2.12. Comparing subword language models

2.13. Speaker recognition


2.15. DNNs for acoustic modeling


2.16. Connectionist temporal classification

2.18. Data augmentation

2.20. Speech compression

  • http://www.data-compression.com/speech.html
  • B. T. Lilly and K. K. Paliwal, Effect of speech coders on speech recognition performance, Griffith University, Australia.
  • Juan M. Huerta and Richard M. Stern, Speech recognition from GSM codec parameters, Carnegie Mellon University, USA.
  • Dan Chazan, Gilad Cohen, Ron Hoory and Meir Zibulski, Low bit rate speech compression for playback in speech recognition systems, in Proc. Eur. Signal Processing Conf.
  • L. Besacier, C. Bergamini, D. Vaufreydaz and E. Castelli, The effect of speech and audio compression on speech recognition performance, in Proc. IEEE Multimedia Signal Processing Workshop.

2.21. Speech recognition in noise


2.22. Confidence measures for ASR

  • H. Jiang. Confidence measures for speech recognition: A survey. Speech Communication, 45(4):455-470, 2005.
  • T. Schaaf and T. Kemp. Confidence measures for spontaneous speech recognition. In IEEE International Conference on Acoustics, Speech, and Signal Processing ICASSP-97, pages 875-878, IEEE, 1997.
  • J. G. A. Dolfing and A. Wendemuth. Combination of confidence measures in isolated word recognition. In Proceedings of the International Conference on Spoken Language Processing, pages 3237-3240, 1998.
  • F. Wessel, R. Schlüter, K. Macherey, and H. Ney. Confidence measures for large vocabulary continuous speech recognition. IEEE Transactions on Speech and Audio Processing, 9(3):288-298, 2001.
  • T. Fabian. Confidence Measurement Techniques in Automatic Speech Recognition and Dialog Management. Der Andere Verlag, 2008.

2.23. Features for ASR

  • Hynek Hermansky: Should recognizers have ears? Speech Communication 25 (1998) 3-27.
  • Hynek Hermansky's original article in the Journal of the Acoustical Society of America 87(4), 1990.

2.24. Multichannel ASR

2.25. Pronunciation model adaptation

  • Bisani, M., & Ney, H. (2008). Joint-sequence models for grapheme-to-phoneme conversion. Speech Communication, 50(5), 434-451.
  • Maas, A., Xie, Z., Jurafsky, D., & Ng, A. (2015). Lexicon-free conversational speech recognition with neural networks. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 345-354).
  • Tools: Sequitur G2P
  • Materials: WSJ_5k (from exercise 4)

2.26. Audio indexing

  • Steve Renals, Dave Abberley, David Kirby, Tony Robinson: Indexing and retrieval of broadcast news. Speech Communication, Volume 32, Issues 1-2, September 2000, Pages 5-20.
  • John S. Garofolo, Cedric G. P. Auzanne, Ellen M. Voorhees: The TREC Spoken Document Retrieval Track: A Success Story. In 8th Text Retrieval Conference, pages 107-129, Washington, 2000.
  • Chelba, C.; Hazen, T. J.; Saraclar, M.: Retrieval and browsing of spoken content. IEEE Signal Processing Magazine, vol. 25, no. 3, pp. 39-49, May 2008.