ELEC-E5550 - Statistical Natural Language Processing D, Lecture, 9.1.2024-16.4.2024
According to the course settings, the course ended on 16.04.2024.
-
Lecture schedule 2024 (tentative, topics and dates may still change):
- 09 Jan 1 Introduction & course organization / Mikko Kurimo
- 16 Jan 2 Statistical language models / Mikko Kurimo
- 23 Jan 3 Sentence level processing / Mikko Kurimo
- 30 Jan 4 Word2vec / Tiina Lindh-Knuutila
- 06 Feb 5 Neural language modeling and large language models / Mittul Singh
- 13 Feb 6 Morpheme-level processing / Mathias Creutz
- 20 Feb Exam week, no lecture
- 27 Feb 7 Speech recognition / Tamas Grosz
- 05 Mar 8 Chatbots and dialogue agents / Mikko Kurimo
- 12 Mar 9 Statistical machine translation / Jaakko Väyrynen
- 19 Mar 10 Neural machine translation / Sami Virpioja
- 26 Mar 11 LLMs in industry / Shantipriya Parida
- 02 Apr Spring break, no lecture
- 09 Apr 12 Course conclusion / Mikko Kurimo
- 16 Apr Exam
-
Lecture 1 exercise return box [Assignment]
- What kinds of Natural Language Processing applications have you used?
- What works well? What does not work?
- What kinds of future applications would be useful in your daily life?
Please type or upload the notes from your breakout group discussion here, e.g. as a photo, text or pdf file, to earn a lecture activity point.
-
- statistical language models and their applications
- maximum likelihood estimation of n-grams
- class-based n-grams
- the main smoothing methods for n-grams
- introduction to other statistical and neural language models
Lecture 2 in the course text books:
- Manning-Schütze: Chapter 6, pp. 191-228
- Jurafsky-Martin 3rd (online) edition: Chapter 3, pp. 37-62 (and Chapter 7, pp. 131-150, for simple NNLMs)
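As a minimal sketch of maximum likelihood estimation for bigrams, P(w_i | w_{i-1}) = C(w_{i-1} w_i) / C(w_{i-1}), the code below counts bigrams on a toy corpus padded with the assumed boundary markers <s> and </s>. Add-one (Laplace) smoothing is included only as the simplest baseline for giving unseen bigrams non-zero probability; the helper names and the corpus are hypothetical illustrations, not course material.

```python
from collections import Counter

def train_bigram(sentences):
    """Count history unigrams and bigrams, padding each sentence with <s> and </s>."""
    histories, bigrams = Counter(), Counter()
    vocab = {"</s>"}  # </s> can be predicted; <s> never is
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        vocab.update(tokens[1:-1])
        histories.update(tokens[:-1])                  # C(w_{i-1}) as a history
        bigrams.update(zip(tokens[:-1], tokens[1:]))   # C(w_{i-1} w_i)
    return histories, bigrams, vocab

def p_mle(word, prev, histories, bigrams):
    """Maximum likelihood estimate C(prev word) / C(prev)."""
    if histories[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / histories[prev]

def p_add_one(word, prev, histories, bigrams, vocab):
    """Add-one smoothed estimate (C(prev word) + 1) / (C(prev) + |V|)."""
    return (bigrams[(prev, word)] + 1) / (histories[prev] + len(vocab))

corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
histories, bigrams, vocab = train_bigram(corpus)
print(p_mle("I", "<s>", histories, bigrams))    # 2 of the 3 sentences start with "I"
print(p_mle("Sam", "am", histories, bigrams))   # 0.5
print(p_add_one("Sam", "green", histories, bigrams, vocab))  # unseen bigram, still > 0
```

Add-one smoothing moves a lot of probability mass to unseen events, which is why better smoothing methods are needed in practice.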
-
Lecture 2 exercise return box (Good-Turing) [Assignment]
Watch a video in which Prof. Jurafsky (Stanford) explains Good-Turing smoothing (from 02:00 to 08:45).
- Click:
- Or search for: "Good Turing video Jurafsky"
- Briefly answer these 3 questions in a single file or text field:
- Estimate the probability that the next fish you catch is of a new species, if you have already caught 5 perch, 2 pike, 1 trout, 1 zander and 1 salmon.
- Estimate the probability that the next fish you catch is a salmon.
- What may cause practical problems when applying Good-Turing smoothing to rare words in large text corpora?
Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
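The raw Good-Turing formulas can be sketched as follows, assuming the standard formulation: the total mass of unseen types is N_1 / N, and a type seen c times gets the adjusted count c* = (c + 1) N_{c+1} / N_c, i.e. probability c* / N. Here N is the number of observed tokens and N_c the number of types seen exactly c times. The function name `good_turing` is hypothetical, and exact fractions are used only to keep the arithmetic readable.

```python
from collections import Counter
from fractions import Fraction

def good_turing(counts):
    """Raw (unsmoothed) Good-Turing estimates from observed counts.

    Returns (p_unseen, p_seen):
      p_unseen  = N_1 / N, total probability of all unseen types
      p_seen[x] = c* / N, with c* = (c + 1) * N_{c+1} / N_c
    """
    n = sum(counts.values())
    n_c = Counter(counts.values())  # frequency of frequencies N_c
    p_unseen = Fraction(n_c[1], n)
    p_seen = {}
    for item, c in counts.items():
        c_star = Fraction((c + 1) * n_c[c + 1], n_c[c])
        p_seen[item] = c_star / n
    return p_unseen, p_seen

catch = {"perch": 5, "pike": 2, "trout": 1, "zander": 1, "salmon": 1}
p_unseen, p_seen = good_turing(catch)
print(p_unseen, p_seen["salmon"], p_seen["perch"])
```

Note what happens to pike and perch: N_3 and N_6 are zero here, so the raw formula gives them c* = 0, and the estimates no longer sum to one. Practical implementations therefore smooth the N_c values or apply the correction only to small counts, and renormalise.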
-
Lecture 3 slides (2024) [PDF file]
Part-of-Speech and Named Entity tagging
Hidden Markov models and Viterbi algorithm
Advanced tagging methods
Lecture 3 in the course text books:
- Manning-Schütze (1999), MIT Press: Chapters 9-12
- Jurafsky-Martin 3rd (online) edition: Chapters 8-9
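The Viterbi algorithm for HMM tagging can be sketched as below: for each word and tag it keeps the probability of the best path ending in that tag, plus a back-pointer, and then traces the best path backwards. This is a minimal sketch assuming a first-order (bigram) HMM; the tag set and all probabilities are hypothetical toy values, not the example used in the lecture.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words` under a bigram HMM.

    start_p[t]    = P(t | sentence start)
    trans_p[s][t] = P(t | previous tag s)
    emit_p[t][w]  = P(w | t); words missing from emit_p[t] get probability 0
    """
    # delta[i][t]: probability of the best tag path that ends in tag t at word i
    delta = [{t: start_p[t] * emit_p[t].get(words[0], 0.0) for t in tags}]
    backptr = [{}]  # backptr[i][t]: best previous tag for tag t at word i
    for i in range(1, len(words)):
        delta.append({})
        backptr.append({})
        for t in tags:
            prev, score = max(
                ((s, delta[i - 1][s] * trans_p[s][t]) for s in tags),
                key=lambda pair: pair[1],
            )
            delta[i][t] = score * emit_p[t].get(words[i], 0.0)
            backptr[i][t] = prev
    # Pick the best final tag, then follow the back-pointers to the start
    best_last = max(tags, key=lambda t: delta[-1][t])
    path = [best_last]
    for i in range(len(words) - 1, 0, -1):
        path.append(backptr[i][path[-1]])
    path.reverse()
    return path, delta[-1][best_last]

tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
emit_p = {
    "DET": {"the": 0.7},
    "NOUN": {"dog": 0.4, "barks": 0.1},
    "VERB": {"barks": 0.5, "dog": 0.05},
}
path, prob = viterbi("the dog barks".split(), tags, start_p, trans_p, emit_p)
print(path)  # ['DET', 'NOUN', 'VERB']
```

Real taggers usually work with log probabilities to avoid numerical underflow on long sentences, and need a strategy for unknown words instead of the zero emission probability used here.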
-
Lecture 3 exercise return box (HMM and Viterbi) [Assignment]
Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
Discuss in breakout rooms and propose answers to these 3 questions:
1. Finish the Viterbi search example for POS tagging by hand.
- Return the values in the boxes and the final tag sequence. Either take a photo of your drawing, fill in the provided PowerPoint file, or just type the values into the text box.
2. Did everyone get the same tags? Is the result correct? Why or why not?
3. What are the pros and cons of an HMM tagger?
All submissions, even incorrect or incomplete ones, will be awarded one activity point.
-
Lecture 4 slides (2024) [PDF file]
- distributional semantics
- vector space models
- word2vec
- information retrieval
Lecture 4 in the course text books:
- Jurafsky-Martin 3rd (online) edition: Chapter 6
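The core idea of distributional semantics and vector space models can be sketched with count-based co-occurrence vectors compared by cosine similarity: words that appear in similar contexts get similar vectors. This is a simplified stand-in, not word2vec itself (word2vec learns dense embeddings by prediction), although cosine similarity is commonly used to compare word2vec vectors too. The toy corpus, the window size of 2 and the function names are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within +-window positions."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        tokens = sent.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    dot = sum(count * v[ctx] for ctx, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
    "stocks fell on monday",
]
vecs = cooccurrence_vectors(corpus)
print(cosine(vecs["cat"], vecs["dog"]))     # shared contexts: the, drinks, chases
print(cosine(vecs["cat"], vecs["stocks"]))  # 0.0, no shared contexts
```

Raw counts are dominated by frequent words such as "the"; weighting schemes such as PPMI or tf-idf, or learned embeddings, address this.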
-
Lecture 4 exercise return box (word vectors) [Assignment]
- What are the benefits of distributional semantics?
- What kinds of problems might there be?
- What kinds of applications can you come up with using these models?
Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
-
Lecture 5 slides (2024, draft) [PDF file]
Neural network language models (NNLMs) are discussed in Chapter 7 of the 2020 online edition of the Jurafsky-Martin book.
-
Lecture 5 exercise return box: Self-attention [Assignment]
Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
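Since the exercise questions themselves are not shown on this page, here is only a generic sketch of single-head scaled dot-product self-attention: each token's query is compared with every token's key, the scores are divided by sqrt(d_k) and passed through a softmax, and the resulting weights average the value vectors. The toy inputs and the identity projection matrices are hypothetical; real models learn W_Q, W_K and W_V and use many heads.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention without masking.

    x: one row per token; w_q, w_k, w_v: projection matrices.
    Returns (outputs, weights), where weights[i][j] is how much token i attends to j.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    outputs, weights = [], []
    for q_i in q:
        scores = [sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d_k) for k_j in k]
        alpha = softmax(scores)
        weights.append(alpha)
        outputs.append([sum(alpha[j] * v[j][d] for j in range(len(v)))
                        for d in range(len(v[0]))])
    return outputs, weights

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy 2-d token vectors
identity = [[1.0, 0.0], [0.0, 1.0]]       # hypothetical projections, so Q = K = V = x
outputs, weights = self_attention(x, identity, identity, identity)
for row in weights:
    print([round(w, 3) for w in row])
```

For an autoregressive language model, the scores for future positions (j > i) would additionally be masked to minus infinity before the softmax, so that each token attends only to itself and earlier tokens.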
-
Lecture 6 slides (2024) [PDF file]
Not all of these slides will be covered during the lecture, but all of them are useful reading.
NOTE: No textbook yet covers this material well, so read the slides carefully!
-
Lecture 7 slides (2023) [PDF file]
Hybrid DNN-HMM architecture
End-to-end architectures
Applications
Lecture 7 in the course text books:
- Jurafsky-Martin 3rd (online) edition: Chapter 26
-
Lecture 7 exercise return box [Assignment]
Calculate the WER and CER metrics by comparing each ASR hypothesis to the human transcript:
- ASR hyp 1: he then appeared in the episode smackdown
- Human transcript: he then appeared on an episode of smackdown
- ASR hyp 2: he than apeared on a episode off smacdown
Which metric, in your opinion, better measures the true accuracy, and why?
All submissions, even incorrect or incomplete ones, will be awarded one activity point.
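Both metrics are usually computed from the Levenshtein edit distance: WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions needed to turn the reference into the hypothesis, and N is the number of reference words; CER applies the same formula to characters. The sketch below assumes that spaces count as characters for CER and that no case or punctuation normalisation is done; scoring tools differ on such conventions, and the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost for S, D and I)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]

def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate, counting spaces as characters."""
    return edit_distance(reference, hypothesis) / len(reference)

reference = "he then appeared on an episode of smackdown"
hyp1 = "he then appeared in the episode smackdown"
hyp2 = "he than apeared on a episode off smacdown"
for hyp in (hyp1, hyp2):
    print(f"WER={wer(reference, hyp):.3f}  CER={cer(reference, hyp):.3f}")
```

Because each misspelled word counts as a full word error, small spelling slips raise WER much more than CER.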
-
Lecture 8 slides (2024) [PDF file]
Rule-based and Corpus-based chatbots
Retrieval and Machine Learning based chatbots
Evaluation of chatbots
More information in the course text books:
- Jurafsky-Martin 3rd (online) edition (February 2024): Chapter 15
-
Lecture 8 exercise return box [Assignment]
Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
- Try ELIZA and/or PARRY. When do they fail? How could they be improved?
Eliza: https://www.eclecticenergies.com/ego/eliza
Eliza: http://psych.fullerton.edu/mbirnbaum/psych101/Eliza.htm
Parry: https://www.botlibre.com/browse?id=857177
Parry: https://www.chatbots.org/chatbot/parry/
- Try more chatbots or dialogue agents. Where are they good, and where are they bad?
https://chat.openai.com/chat
https://convai.huggingface.co/
https://www.chatbots.org/
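To see why rule-based chatbots such as ELIZA fail, it helps to sketch the mechanism: keyword patterns are matched against the input, captured fragments have their pronouns swapped ("my" becomes "your"), and the fragment is inserted into a canned response template. The rules below are hypothetical and much smaller than ELIZA's original script; only the general pattern-and-reflection approach is illustrated.

```python
import re

# Hypothetical ELIZA-style rules: (keyword pattern, response template); first match wins
RULES = [
    (re.compile(r"\bi need (.+)", re.I), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.I), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.I), "Tell me more about your family."),
]
DEFAULT = "Please go on."

# Swap first and second person in the captured fragment
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
               "you": "I", "your": "my"}

def reflect(fragment):
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in fragment.split())

def respond(utterance):
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*(reflect(g) for g in match.groups()))
    return DEFAULT  # no keyword matched

print(respond("I need a break from my work."))
print(respond("I am not sure you understand me"))
print(respond("What is the weather like?"))
```

The second reply comes out as "How long have you been not sure I understand you?": the rules copy surface strings without any understanding, and every input without a known keyword falls through to the same default reply.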
-
Lecture 9 slides (2024) [PDF file]
Lecture based on:
- Sections 13.2-13.4 in Manning & Schütze
- Chapter 21 in the old printed edition of Jurafsky & Martin: Speech and Language Processing
- Chapter 11 in the 2020 Jurafsky & Martin: Speech and Language Processing
- Chapter 13 in the 2024 Jurafsky & Martin: Speech and Language Processing
- Koehn: "Statistical Machine Translation", http://www.statmt.org/book/
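Koehn's book introduces word-based statistical translation through the IBM models. As a minimal sketch (not the lecture's own code), the EM training of IBM Model 1 lexical translation probabilities t(f | e) looks like this: the E-step spreads each foreign word's alignment over the English words in its sentence pair in proportion to the current t, and the M-step renormalises the expected counts. The NULL word and all reordering and length modelling are omitted for brevity; the three-sentence German-English corpus is a toy example.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=10):
    """EM training of IBM Model 1 translation probabilities t[(f, e)] = P(f | e).

    pairs: list of (foreign_tokens, english_tokens). No NULL word is added.
    """
    f_vocab = {f for fs, _ in pairs for f in fs}
    uniform = 1.0 / len(f_vocab)
    t = {(f, e): uniform for fs, es in pairs for f in fs for e in es}
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        # E-step: distribute each foreign word over the English words of its sentence
        for fs, es in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise the expected counts into probabilities
        t = {(f, e): c / total[e] for (f, e), c in count.items()}
    return t

corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
t = train_ibm_model1(corpus)
for e in ["the", "house", "book", "a"]:
    best_f = max((f for (f, e2) in t if e2 == e), key=lambda f: t[(f, e)])
    print(e, "->", best_f, round(t[(best_f, e)], 3))
```

Even though no word alignments are given, EM gradually concentrates the probability mass on the consistent pairs (the/das, house/haus, book/buch, a/ein) because they co-occur across several sentence pairs.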
-
Lecture 10 slides (2024) [PDF file]
-
Lecture 11 slides (2024): LLMs in industry [PDF file]
- Overview
  - Generative AI
  - Language Model
  - Large Language Models
- LLMs in Industries
- Use Cases
- Case Study
-
Lecture 12 slides (2024): Conclusion [PDF file]
- The contents of the course
- Info about passing the course and grading
- Info about the exam
- Quick recap of previous lectures