ELEC-E5550 - Statistical Natural Language Processing D, Lecture, 10.1.2023-18.4.2023
This course space end date is set to 18.04.2023
Topic outline
-
Lecture schedule 2023:
- 10 Jan 1 Introduction & Project groups / Mikko Kurimo
- 17 Jan 2 Statistical language models / Mikko Kurimo
- 24 Jan 3 Sentence level processing / Mikko Kurimo
- 31 Jan 4 Word2vec / Tiina Lindh-Knuutila
- 07 Feb 5 Neural language modeling and large language models / Mittul Singh
- 14 Feb 6 Morpheme-level processing / Mathias Creutz
- 21 Feb Exam week, no lecture
- 28 Feb 7 Speech recognition / Tamas Grosz
- 07 Mar 8 Chatbots and dialogue agents / Mikko Kurimo
- 14 Mar 9 Statistical machine translation / Jaakko Väyrynen
- 21 Mar 10 Neural machine translation / Stig-Arne Grönroos
- 28 Mar 11 LLM discussion and course conclusion / Aku Rouhe and Mikko Kurimo
- 04 Apr (no lecture)
- 18 Apr Exam
Below you can find the 2022 lecture slides until they are replaced by 2023 ones as the course progresses. Lecture recordings will also be added here.
-
Lecture 1 exercise return box Assignment
- What kind of Natural Language Processing applications have you used?
- What is working well? What does not work?
- What kind of future applications would be useful in your daily life?
Please type or upload the notes from your breakout group discussion here, e.g. as a photo, text or pdf file, to earn a lecture activity point.
-
- statistical language models and their applications
- maximum likelihood estimation of n-grams
- class-based n-grams
- the main smoothing methods for n-grams
- introduction to other statistical and neural language models
Lecture 2 in the course text books:
- Manning-Schütze: Chapter 6, pp. 191-228
- Jurafsky-Martin 3rd (online) edition: Chapter 3, pp. 37-62 (and Chapter 7, pp. 131-150 for simple NNLMs)
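The maximum likelihood and smoothing topics above can be sketched in a few lines of Python. A minimal illustration with a toy bigram model (the corpus and words below are made up for the example, not from the lecture):

```python
from collections import Counter

def bigram_mle(corpus):
    """Maximum likelihood estimates: P(w2 | w1) = c(w1, w2) / c(w1)."""
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    return {(w1, w2): c / unigrams[w1] for (w1, w2), c in bigrams.items()}

def bigram_laplace(corpus, w1, w2):
    """Add-one (Laplace) smoothing: (c(w1, w2) + 1) / (c(w1) + V)."""
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    vocab_size = len(unigrams)
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab_size)

corpus = "the cat sat on the mat the cat ate".split()
print(bigram_mle(corpus)[("the", "cat")])   # c(the,cat)=2, c(the)=3 -> 2/3
print(bigram_laplace(corpus, "the", "ate")) # unseen bigram still gets probability mass
```

The unseen bigram is exactly what the smoothing lecture topic addresses: the MLE would assign it probability zero, while add-one smoothing gives it a small non-zero estimate.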
-
Lecture 2 A exercise return box (applications) Assignment
List as many potential applications for statistical language models as you can!
Typically these are tasks where you need the probability of, or need to find, the most probable word or sentence given some background information.
Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
-
Lecture 2 B exercise return box (Good-Turing) Assignment
Watch a video where Prof. Jurafsky (Stanford) explains Good-Turing smoothing (between 02:00 and 08:45)
- Click:
- Or search for: "Good Turing video Jurafsky"
- Answer these 3 questions briefly in a single file or text field
- Estimate the probability that the next catch is any new fish species, if you have already caught 5 perch, 2 pike, 1 trout, 1 zander and 1 salmon.
- Estimate the probability that the next catch is a salmon.
- What may cause practical problems when applying Good-Turing smoothing to rare words in large text corpora?
Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
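As a companion to the video, the basic Good-Turing quantities can be sketched as below. The `catch` counts are the ones from question 1, but the functions are a generic illustration under the simple Good-Turing formulation, not a reference solution:

```python
def gt_unseen_mass(counts):
    """Good-Turing: total probability mass reserved for unseen species is
    N1 / N, where N1 = number of species seen exactly once, N = total observations."""
    n = sum(counts.values())
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / n

def gt_adjusted_count(c, count_of_counts):
    """Good-Turing adjusted count: c* = (c + 1) * N_{c+1} / N_c.
    Note: N_{c+1} is often 0 for larger c in real corpora, so practical
    implementations must smooth the count-of-counts themselves."""
    return (c + 1) * count_of_counts.get(c + 1, 0) / count_of_counts[c]

catch = {"perch": 5, "pike": 2, "trout": 1, "zander": 1, "salmon": 1}
print(gt_unseen_mass(catch))               # N1 = 3 singletons out of N = 10 fish
count_of_counts = {1: 3, 2: 1, 5: 1}       # N_c for the catch above
print(gt_adjusted_count(1, count_of_counts) / 10)  # discounted P for a singleton species
```

The `get(c + 1, 0)` fallback hints at question 3: for rare but not unique counts, the count-of-counts table has gaps and zeros, which breaks the naive formula.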
-
Lecture 3 slides (2023) Final File PDF
- Part-of-Speech and Named Entity tagging
- Hidden Markov models and Viterbi algorithm
- Advanced tagging methods
Lecture 3 in the course text books:
- Manning-Schütze (1999), MIT Press: Chapters 9-12
- Jurafsky-Martin 3rd (online) edition: Chapters 8-9
-
Lecture 3 exercise return box (HMM and Viterbi) Assignment
Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
Discuss with each other in breakout rooms and propose answers to these 3 questions:
1. Finish the POS tagging by Viterbi search example by hand.
- Return the values of the boxes and the final tag sequence. Either take a photo of your drawing, fill in the given ppt, or just type the values into the text box.
2. Did everyone get the same tags? Is the result correct? Why / why not?
3. What are the pros and cons of an HMM tagger?
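To illustrate how the Viterbi search in question 1 proceeds, here is a minimal sketch with a two-tag toy HMM. The states and probabilities are invented for this example, not those of the lecture's worked exercise:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Find the most probable tag sequence for obs in a discrete HMM."""
    # V[t][s] = (best probability of any path ending in state s at time t, backpointer)
    V = [{s: (start_p[s] * emit_p[s].get(obs[0], 0.0), None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s].get(obs[t], 0.0), p)
                for p in states)
            V[t][s] = (prob, prev)
    best = max(states, key=lambda s: V[-1][s][0])   # best final state
    path = [best]
    for t in range(len(obs) - 1, 0, -1):            # follow backpointers
        path.append(V[t][path[-1]][1])
    return list(reversed(path)), V[-1][best][0]

states = ["N", "V"]                                 # toy noun/verb tag set
start_p = {"N": 0.6, "V": 0.4}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit_p = {"N": {"fish": 0.5, "sleep": 0.2}, "V": {"fish": 0.3, "sleep": 0.6}}
tags, prob = viterbi(["fish", "sleep"], states, start_p, trans_p, emit_p)
print(tags, prob)   # ['N', 'V'] and probability 0.6*0.5 * 0.7*0.6 = 0.126
```

The table `V` corresponds to the boxes you fill in by hand; the backpointers give the final tag sequence.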
All submissions, even incorrect or incomplete ones, will be awarded one activity point.
-
Lecture 4 slides (2023) File PDF
- distributional semantics
- vector space models
- word2vec
- information retrieval
Lecture 4 in the course text books:
- Jurafsky-Martin 3rd (online) edition: Chapter 6
-
Lecture 4 exercise return box (word vectors) Assignment
- What are the benefits of distributional semantics?
- What kind of problems might there be?
- What kind of applications can you come up with that use these models?
Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
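A small sketch of the core idea behind the vector space models above: words are compared by the cosine of the angle between their vectors. The 3-dimensional vectors below are made-up toy values, not real word2vec embeddings:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(word, vectors):
    """The most similar other word under cosine similarity."""
    return max((w for w in vectors if w != word),
               key=lambda w: cosine(vectors[word], vectors[w]))

vectors = {  # toy embeddings: similar words get similar vectors
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.10, 0.20, 0.90],
}
print(nearest("king", vectors))   # queen
```

The same similarity function underlies information-retrieval ranking: documents and queries are represented as vectors and scored by cosine similarity.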
-
Lecture 5 slides (2023) final File PDF
NNLMs are discussed in Chapter 7 of the 2020 online version of the Jurafsky-Martin book.
-
Lecture 5 exercise return box: Self-attention Assignment
Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
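For the self-attention exercise topic, a minimal sketch of scaled dot-product self-attention in plain Python. To keep it short it assumes the queries, keys and values are all the input vectors themselves (no learned projection matrices), which is a simplification of the full transformer layer:

```python
import math

def softmax(xs):
    m = max(xs)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    """Scaled dot-product self-attention with Q = K = V = X."""
    d = len(X[0])
    out = []
    for q in X:                                       # one query per position
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)                     # attention distribution
        out.append([sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)])
    return out

X = [[1.0, 0.0], [0.0, 1.0]]   # two toy 2-dimensional input vectors
print(self_attention(X))       # each output mixes all inputs, weighted toward itself
```

Each output position is a convex combination of all input vectors, with the weights determined by query-key dot products; that is the mechanism the exercise asks about.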
-
Lecture 6 slides (2023) File PDF
Not all of these slides will be discussed during the lecture, but everything is still useful reading.
NOTE: There is no textbook yet that covers this material well, so read the slides carefully!
-
Lecture 7 slides (2023) File PDF
- Hybrid DNN-HMM architecture
- End-to-end architectures
- Applications
Lecture 7 in the course text books:
- Jurafsky-Martin 3rd (online) edition: Chapter 26
-
Lecture 7 exercise return box Assignment
Calculate the WER and CER metrics by comparing the ASR hypothesis to the human transcript!
- ASR hyp: he then appeared in the episode smackdown
- Human transcript: he then appeared on an episode of smackdown
Which metric measures the true accuracy better in your opinion, and why?
All submissions, even incorrect or incomplete ones, will be awarded one activity point.
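Both WER and CER are edit-distance-based. A sketch of one common way to compute them with plain Levenshtein distance, using the exercise sentences as input (scoring toolkits may differ in details such as normalization):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (substitutions, insertions, deletions, each with cost 1)."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[-1][-1]

def error_rate(ref, hyp):
    """WER for word lists, CER for character strings: edits / reference length."""
    return edit_distance(ref, hyp) / len(ref)

ref = "he then appeared on an episode of smackdown"
hyp = "he then appeared in the episode smackdown"
print(error_rate(ref.split(), hyp.split()))   # WER over words
print(error_rate(ref, hyp))                   # CER over characters, spaces included
```

Note that the error rate is normalized by the reference length, so it can exceed 100% for a hypothesis much longer than the transcript.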
-
Lecture 8 slides (2023) File PDF
- Rule-based and Corpus-based chatbots
- Retrieval and Machine Learning based chatbots
- Evaluation of chatbots
More information in the course text books:
- Jurafsky-Martin 3rd (online) edition: Chapter 24
-
Lecture 8 exercise return box Assignment
Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
- Try ELIZA. When does it fail? How to improve it? https://www.eclecticenergies.com/ego/eliza http://psych.fullerton.edu/mbirnbaum/psych101/Eliza.htm
- Try PARRY. When does it fail? How to improve it? https://www.chatbots.org/chatbot/parry/ https://www.botlibre.com/browse?id=857177
- Try more chatbots or dialogue agents. How could they be evaluated automatically? https://convai.huggingface.co/ https://www.chatbots.org/ https://chat.openai.com/chat
- What ethical issues do chatbots have? Any suggestions on how to solve them?
-
Lecture 9 slides (2023) File PDF
Lecture based on:
- Sections 13.2-13.4 in Manning & Schütze
- Chapter 21 in the OLD Jurafsky & Martin: Speech and Language Processing
- Chapter 11 in the NEW Jurafsky & Martin: Speech and Language Processing
- Koehn: "Statistical Machine Translation", http://www.statmt.org/book/
-
Lecture 10 slides (2023) File PDF
NMT is discussed in Chapter 11 of the 2020 online version of the Jurafsky-Martin book.