CS-E400209 - Special Course in Computer Science D: Conversational AI & Voice Interaction, Lectures, 21.3.2024-6.6.2024
According to the course settings, this course ended on 06.06.2024.
PART I: FOUNDATIONS
Pre-survey (Please complete by March 20)
Please note this is a tentative course schedule & proposed readings, subject to change.
Week 1. Course Introduction (March 21)
Course topic introduction
Showcasing case studies (Children using Voice Assistants / Chatbots using LLMs & Whisper / Healthcare)
Discussion / Q&A
Readings:
Garg, Radhika, et al. "The last decade of HCI research on children and voice-based conversational agents." Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. 2022.
Sheuli Paul, et al. “A Novel Multimodal Situated Spoken Dialog System for Human Robot Communication in Emergency Evacuation.” Proceedings of the 21st IEEE International Conference on Machine Learning and Applications. 2022.
Week 2. Human-Centered Conversational AI and Voice Interaction (March 27)
Foundations of human-centered design for Conversational AI
Readings:
Murad, Christine, Heloisa Candello, and Cosmin Munteanu. "What’s The Talk on VUI Guidelines? A Meta-Analysis of Guidelines for Voice User Interface Design." Proceedings of the 5th International Conference on Conversational User Interfaces. 2023.
Amershi, Saleema, et al. "Guidelines for human-AI interaction." Proceedings of the 2019 CHI conference on human factors in computing systems. 2019.
Pearl, Cathy. Designing voice user interfaces: Principles of conversational experiences. O'Reilly Media, Inc., 2016.
Week 3. Basics of NLP (April 04)
NLP basics / Language modeling
Readings:
Jurafsky, D., & Martin, J. H. (2021). Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Chapter 6: Vector Semantics & Embeddings (Pages 1-9).
Devlin, Jacob, et al. “Bert: Pre-training of deep bidirectional transformers for language understanding.” arXiv preprint arXiv:1810.04805 (2018).
Gu, Yu, Xiang Deng, and Yu Su. “Don’t Generate, Discriminate: A Proposal for Grounding Language Models to Real-World Environments.” arXiv preprint arXiv:2212.09736 (2022).
Week 4. Speech Recognition and Synthesis (April 11)
Overview of Speech Recognition and Speech Synthesis
Robust Speech Recognition via Large-Scale Weak Supervision (Whisper)
Readings:
Overview of Speech Recognition (Aalto wiki pages): acoustic modeling
Robust Speech Recognition via Large-Scale Weak Supervision (Whisper)
Week 5. Design of Dialogue Systems (April 18)
Dialogue systems based on language models (rule-based + LLM-based)
Readings:
Daniel Jurafsky & James H. Martin. Speech and Language Processing. Chapter 15: Chatbots & Dialogue Systems
Schema-Guided User Satisfaction Modeling for Task-Oriented Dialogues (Feng et al., ACL 2023)
Enhancing Dialogue Generation via Dynamic Graph Knowledge Aggregation (Tang et al., ACL 2023)
Week 6. Half-time project presentations (April 25)
PART II: APPLIED RESEARCH & DESIGN
Week 7. Voice interaction design (May 02)
Readings:
Martin Porcheron, Joel E. Fischer, Stuart Reeves, and Sarah Sharples. 2018. Voice Interfaces in Everyday Life. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI ’18).
Eunkyung Jo, Daniel A. Epstein, Hyunhoon Jung, and Young-Ho Kim. 2023. Understanding the Benefits and Challenges of Deploying Conversational AI Leveraging Large Language Models for Public Health Intervention. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI ’23).
Week 8. Human-AI Interaction, Trust & Participatory Design (May 08)
Readings:
Sciuto, Alex, et al. "'Hey Alexa, What's Up?': A Mixed-Methods Studies of In-Home Conversational Agent Usage." Proceedings of the 2018 designing interactive systems conference. 2018.
Week 9. Personalities & Perceptions in Voice Assistants (May 16)
Readings:
Nass, Clifford, et al. “Anthropomorphism, agency, and ethopoeia: computers as social actors.” INTERACT’93 and CHI’93 conference companion on Human factors in computing systems. 1993.
Xu, Ying, and Mark Warschauer. “What are you talking to?: Understanding children’s perceptions of conversational agents.” Proceedings of the 2020 CHI conference on human factors in computing systems. 2020.
Sunok Lee, Minji Cho, and Sangsu Lee. 2020. What If Conversational Agents Became Invisible? Comparing Users’ Mental Models According to Physical Entity of AI Speaker. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.
Week 10. Trustworthy Conversational AI (May 23)
Readings:
Building responsible and trustworthy conversational AI
Week 11. Privacy & Security in Speech (May 30)
Readings:
Bäckström, Tom. "Introduction to Speech Processing: Chapter Security and Privacy in Speech Technology"
Bäckström, Tom. "Privacy in Speech Technology". 2023.
Week 12. Final Project Presentations (Demos & Poster Session) (June 06)
Final Project Paper & Course Reflections (June 13)