2. Material on project topics
2.17. Attention-based ASR
- Chan, W., Jaitly, N., Le, Q. V., & Vinyals, O. (2015). Listen, attend and spell. arXiv preprint arXiv:1508.01211.
- Chorowski, J., Bahdanau, D., Serdyuk, D., Cho, K., & Bengio, Y. (2015). Attention-based models for speech recognition. arXiv preprint arXiv:1506.07503.
- Chiu, C. C., Sainath, T. N., Wu, Y., Prabhavalkar, R., Nguyen, P., Chen, Z., ... & Bacchiani, M. (2018). State-of-the-art speech recognition with sequence-to-sequence models. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4774-4778). IEEE.
- Lüscher, C., Beck, E., Irie, K., Kitza, M., Michel, W., Zeyer, A., ... & Ney, H. (2019). RWTH ASR systems for LibriSpeech: Hybrid vs attention - w/o data augmentation. arXiv preprint arXiv:1905.03072.
- Watanabe, S., Hori, T., Karita, S., Hayashi, T., Nishitoba, J., Unno, Y., ... & Ochiai, T. (2018). ESPnet: End-to-end speech processing toolkit. arXiv preprint arXiv:1804.00015. (ESPnet is another toolkit you could use besides SpeechBrain; see the short usage sketch after this list.)
- Ravanelli, M., Parcollet, T., Plantinga, P., Rouhe, A., Cornell, S., Lugosch, L., ... & Bengio, Y. (2021). SpeechBrain: A general-purpose speech toolkit. arXiv preprint arXiv:2106.04624.
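To get a feel for the toolkit route before training your own attention-based model, the following is a minimal sketch of transcribing a single audio file with a pretrained SpeechBrain encoder-decoder ASR model. The import path, the `EncoderDecoderASR.from_hparams` / `transcribe_file` calls, and the model ID `speechbrain/asr-crdnn-rnnlm-librispeech` reflect the SpeechBrain documentation at the time of writing and should be treated as assumptions; check the current SpeechBrain docs for up-to-date interfaces and model names.

```python
# Minimal sketch: transcribe a WAV file with a pretrained attention-based
# (CRDNN encoder-decoder) SpeechBrain model from LibriSpeech.
# Model ID and import path are assumptions; consult the SpeechBrain docs.
from speechbrain.pretrained import EncoderDecoderASR

# Downloads the pretrained model from HuggingFace on first use and caches it.
asr_model = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-crdnn-rnnlm-librispeech",
    savedir="pretrained_models/asr-crdnn-rnnlm-librispeech",
)

# Returns the decoded transcript of the utterance as a string.
print(asr_model.transcribe_file("path/to/utterance.wav"))
```

ESPnet offers comparable recipes and pretrained models; either toolkit is a reasonable starting point for the project.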