Course: ELEC-E5550 - Statistical Natural Language Processing D, Lecture, 10.1.2023-18.4.2023, Topic: Lectures

Section description

Lectures
Lecture schedule 2023:
10 Jan 1 Introduction & Project groups / Mikko Kurimo
17 Jan 2 Statistical language models / Mikko Kurimo
24 Jan 3 Sentence level processing / Mikko Kurimo
31 Jan 4 Word2vec / Tiina Lindh-Knuutila
07 Feb 5 Neural language modeling and large language models / Mittul Singh
14 Feb 6 Morpheme-level processing / Mathias Creutz
21 Feb Exam week, no lecture
28 Feb 7 Speech recognition / Tamas Grosz
07 Mar 8 Chatbots and dialogue agents / Mikko Kurimo
14 Mar 9 Statistical machine translation / Jaakko Väyrynen
21 Mar 10 Neural machine translation / Stig-Arne Grönroos
28 Mar 11 LLM discussion and course conclusion / Aku Rouhe and Mikko Kurimo
04 Apr (no lecture)
18 Apr Exam
Slides from the 2022 lectures are available below; they will be replaced by the 2023 versions as the course progresses. Lecture recordings will also be added here.
- 10 Jan 2023 1: Introduction & Course content / Mikko Kurimo
- Lecture 1 slides (2023), PDF file
  
  Introduction to Statistical Natural Language Processing
  Course practicalities in 2023
  Lecture 1 in the course textbooks:
  Manning-Schütze: Chapters 1-2, pp. 1-80
- Lecture 1 exercise return box, Assignment
  
  Available only if you belong to a group.
  
  - What kind of Natural Language Processing applications have you used?
  - What is working well? What does not work?
  - What kind of future applications would be useful in your daily life?
  Please type or upload the notes from your breakout group discussion here, e.g. as a photo, text or pdf file to earn a lecture activity point.
- 17 Jan 2023 2: Statistical language models / Mikko Kurimo
- Lecture 2 slides (2023) Revised, PDF file
  
  statistical language models and their applications
  maximum likelihood estimation of n-grams
  class-based n-grams
  the main smoothing methods for n-grams
  introduction to other statistical and neural language models
  Lecture 2 in the course textbooks:
  Manning-Schütze: Chapter 6, pp. 191-228
  Jurafsky-Martin 3rd (online) edition: Chapter 3, pp. 37-62 (and Chapter 7, pp. 131-150, for simple NNLMs)
- Lecture 2 A exercise return box (applications), Assignment
  
  Available only if you belong to a group.
  
  List as many potential applications for statistical language models as you can!
  Typically, these are tasks where you need the probability of a word or sentence, or need to find the most probable word or sentence, given some background information.
  Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
- Lecture 2 B exercise return box (Good-Turing), Assignment
  
  Available only if you belong to a group.
  
  Watch the video in which Prof. Jurafsky (Stanford) explains Good-Turing smoothing (from 02:00 to 08:45).
  Click:
  Or search for: "Good Turing video Jurafsky"
  Answer these 3 questions briefly in a single file or text field:
  1. Estimate the probability that your next catch is a new fish species, given that you have already caught 5 perch, 2 pike, 1 trout, 1 zander and 1 salmon.
  2. Estimate the probability that your next catch is a salmon.
  3. What may cause practical problems when applying Good-Turing smoothing to rare words in large text corpora?
  Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
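The Good-Turing questions above can be checked numerically. The following is a minimal sketch of the basic (unsmoothed) Good-Turing estimate, applied to the catch counts from the exercise. The function name `good_turing` and all variable names are illustrative assumptions and do not come from the course material.

```python
from collections import Counter


def good_turing(counts):
    """Basic Good-Turing estimate without smoothing of the N_c values.

    Returns (p_unseen, p_seen), where p_unseen is the total probability
    mass reserved for unseen types and p_seen maps each seen type to its
    adjusted probability.
    """
    total = sum(counts.values())
    # freq_of_freq[c] = N_c, the number of types seen exactly c times
    freq_of_freq = Counter(counts.values())
    # Mass for unseen types: N_1 / N
    p_unseen = freq_of_freq[1] / total
    p_seen = {}
    for item, c in counts.items():
        n_c = freq_of_freq[c]
        n_next = freq_of_freq[c + 1]  # Counter returns 0 for missing keys
        # Adjusted count c* = (c + 1) * N_{c+1} / N_c, then divide by N
        p_seen[item] = (c + 1) * n_next / n_c / total
    return p_unseen, p_seen


# Catch counts taken from the exercise text
catch = {"perch": 5, "pike": 2, "trout": 1, "zander": 1, "salmon": 1}
p_new, p_seen = good_turing(catch)
print("P(new species):", p_new)
print("P(salmon):", p_seen["salmon"])
print("P(pike):", p_seen["pike"])
```

In this sketch the adjusted count drops to zero whenever no type was observed exactly c + 1 times, which happens here for pike and perch. Practical implementations usually smooth the N_c values before applying the formula.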
- 24 Jan 2023 3: Sentence level processing / Mikko Kurimo
- Lecture 3 slides (2023) Final, PDF file
  
  Available only if you belong to a group.
  
  Part-of-Speech and Named Entity tagging
  Hidden Markov models and Viterbi algorithm
  Advanced tagging methods
  
  Lecture 3 in the course textbooks:
  Manning-Schütze (1999), MIT Press: Chapters 9-12
  Jurafsky-Martin 3rd (online) edition: Chapters 8-9
- Lecture 3 exercise return box (HMM and Viterbi), Assignment
  
  Available only if you belong to a group.
  
  Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
  Discuss in breakout rooms and propose answers to these 3 questions:
  
  1. Finish the Viterbi search POS tagging example by hand.
  - Return the values of the boxes and the final tag sequence. Either take a photo of your drawing, fill in the given ppt, or type the values into the text box.
  2. Did everyone get the same tags? Is the result correct? Why / why not?
  3. What are the pros and cons of an HMM tagger?
  
  All submissions, even incorrect or incomplete ones, will be awarded one activity point.
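For checking a hand-computed Viterbi search, a small program can help. The sketch below implements Viterbi decoding for an HMM tagger in log space. The tag set, the sentence and every probability in it are invented for illustration and are not the example from the lecture slides; the names `viterbi`, `start_p`, `trans_p` and `emit_p` are likewise assumptions.

```python
import math


def _log(p):
    """Logarithm that maps zero probability to minus infinity."""
    return math.log(p) if p > 0 else -math.inf


def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words` under an HMM."""
    # best[t][tag] = (log-probability of the best path ending in `tag`
    #                 at position t, previous tag on that path)
    best = [{}]
    for tag in tags:
        p = start_p[tag] * emit_p[tag].get(words[0], 0.0)
        best[0][tag] = (_log(p), None)
    for t in range(1, len(words)):
        best.append({})
        for tag in tags:
            emit = emit_p[tag].get(words[t], 0.0)
            # Extend every path from the previous position and keep the best
            candidates = [
                (best[t - 1][prev][0] + _log(trans_p[prev][tag] * emit), prev)
                for prev in tags
            ]
            best[t][tag] = max(candidates)
    # Backtrace from the best final tag
    last = max(tags, key=lambda tag: best[-1][tag][0])
    path = [last]
    for t in range(len(words) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    return path[::-1]


# Hypothetical toy model: all numbers are made up for this sketch
tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
emit_p = {
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.5, "barks": 0.1},
    "VERB": {"dog": 0.05, "barks": 0.5},
}
words = ["the", "dog", "barks"]
tagged = viterbi(words, tags, start_p, trans_p, emit_p)
print(list(zip(words, tagged)))
```

Working in log space avoids numerical underflow on longer sentences; the boxes in a hand-drawn trellis correspond to the entries of `best`, though those are usually written as plain probabilities.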
- 31 Jan 2023 4: Word2vec / Tiina Lindh-Knuutila
- Lecture 4 slides (2023), PDF file
  
  Available only if you belong to a group.
  
  distributional semantics
  vector space models
  word2vec
  information retrieval
  Lecture 4 in the course textbooks:
  Jurafsky-Martin 3rd (online) edition: Chapter 6
- Lecture 4 exercise return box (word vectors), Assignment
  
  Available only if you belong to a group.
  
  What are the benefits of distributional semantics?
  What kind of problems might there be?
  What kind of applications can you come up with using these models?
  Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
- 07 Feb 2023 5: Neural language modeling and large language models / Mittul Singh
- Lecture 5 slides (2023) final, PDF file
  
  Available only if you belong to a group.
  
  NNLMs are discussed in Chapter 7 of the 2020 online version of the Jurafsky-Martin book.
- Lecture 5 exercise return box: Self-attention, Assignment
  
  Available only if you belong to a group.
  
  Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
- 14 Feb 2023 6: Morpheme-level processing / Mathias Creutz
- Lecture 6 slides (2023), PDF file
  
  Available only if you belong to a group.
  
  Not all of these slides will be discussed during the lecture, but all of them are still useful reading.
  NOTE: There is no textbook yet that covers this material well, so read the slides carefully!
- Lecture 6 exercise return box, Assignment
  
  Students must: Submit
  
  Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
- 21 Feb 2023: Exam week, no lecture
- 28 Feb 2023 7: Speech recognition / Tamas Grosz
- Lecture 7 slides (2023), PDF file
  
  Available only if you belong to a group.
  
  Hybrid DNN-HMM architecture
  End-to-end architectures
  Applications
  
  Lecture 7 in the course textbooks:
  Jurafsky-Martin 3rd (online) edition: Chapter 26
- Lecture 7 exercise return box, Assignment
  
  Available only if you belong to a group.
  
  Calculate the WER and CER metrics by comparing the ASR hypothesis (hyp) to the human transcript!
  
  ASR hyp: he then appeared in the episode smackdown
  Human transcript: he then appeared on an episode of smackdown
  
  Which metric measures the true accuracy better in your opinion and why?
  All submissions, even incorrect or incomplete ones, will be awarded one activity point.
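Both metrics in the exercise above are edit distances normalised by the length of the reference. The sketch below computes them with a plain Levenshtein distance, over words for WER and over characters for CER. The function names are illustrative assumptions, and the CER here counts spaces as characters, which is only one of several conventions in use.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Substitutions, insertions and deletions each cost 1.
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,              # delete r from the reference
                curr[j - 1] + 1,          # insert h from the hypothesis
                prev[j - 1] + (r != h),   # match (cost 0) or substitution
            ))
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref, hyp):
    """Character error rate, with spaces counted as characters (an assumption)."""
    return edit_distance(ref, hyp) / len(ref)


# Sentences taken from the exercise text
hyp = "he then appeared in the episode smackdown"
ref = "he then appeared on an episode of smackdown"
print("WER:", wer(ref, hyp))
print("CER:", cer(ref, hyp))
```

Normalising by the reference length means that both rates can exceed 1 when the hypothesis contains many insertions.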
- 07 Mar 2023 8: Chatbots and dialogue agents / Mikko Kurimo
- Lecture 8 slides (2023), PDF file
  
  Available only if you belong to a group.
  
  Rule-based and Corpus-based chatbots
  Retrieval and Machine Learning based chatbots
  Evaluation of chatbots
  
  More information in the course textbooks:
  Jurafsky-Martin 3rd (online) edition: Chapter 24
- Lecture 8 exercise return box, Assignment
  
  Available only if you belong to a group.
  
  Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
  Try ELIZA. When does it fail? How could it be improved? https://www.eclecticenergies.com/ego/eliza http://psych.fullerton.edu/mbirnbaum/psych101/Eliza.htm
  Try PARRY. When does it fail? How could it be improved? https://www.chatbots.org/chatbot/parry/ https://www.botlibre.com/browse?id=857177
  Try more chatbots or dialogue agents. How could they be evaluated automatically? https://convai.huggingface.co/ https://www.chatbots.org/ https://chat.openai.com/chat
  What ethical issues do chatbots raise? Any suggestions for how to solve them?
- 14 Mar 2023 9: Statistical machine translation / Jaakko Väyrynen
- Lecture 9 slides (2023), PDF file
  
  Available only if you belong to a group.
  
  Lecture based on:
  Chapter 13.2-13.4 in Manning & Schütze
  Chapter 21 in the OLD Jurafsky & Martin: Speech and Language Processing
  Chapter 11 in the NEW Jurafsky & Martin: Speech and Language Processing
  Koehn: "Statistical Machine Translation", http://www.statmt.org/book/
- Lecture 9 exercise return box, Assignment
  
  Students must: Submit
  
  Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
  Consider different levels of language and different kinds of source-target pairs:
  What would be easy/hard to translate with MT?
  Have you seen failed/successful usage or applications of MT?
- 21 Mar 2023 10: Neural machine translation / Stig-Arne Grönroos
- Lecture 10 slides (2023), PDF file
  
  Available only if you belong to a group.
  
  NMT is discussed in Chapter 11 of the 2020 online version of the Jurafsky-Martin book.
- Lecture 10 exercise return box, Assignment
  
  Students must: Submit
- 28 Mar 2023 11: LLM discussion and course conclusion / Aku Rouhe and Mikko Kurimo
- Lecture 11 slides (2023) Conclusion, PDF file
  
  The contents of the course
  Info about passing the course and grading
  Info about the exam
  Quick recap of previous lectures
