Course: ELEC-E5550 - Statistical Natural Language Processing D, Lecture, 9.1.2024-16.4.2024, Section: Lectures


This course space end date is set to 16.04.2024.

Lectures
Lecture schedule 2024 (tentative, topics and dates may still change):
09 Jan 1 Introduction & course organization / Mikko Kurimo
16 Jan 2 Statistical language models / Mikko Kurimo
23 Jan 3 Sentence level processing / Mikko Kurimo
30 Jan 4 Word2vec / Tiina Lindh-Knuutila
06 Feb 5 Neural language modeling and large language models / Mittul Singh
13 Feb 6 Morpheme-level processing / Mathias Creutz
20 Feb Exam week, no lecture
27 Feb 7 Speech recognition / Tamas Grosz
05 Mar 8 Chatbots and dialogue agents / Mikko Kurimo
12 Mar 9 Statistical machine translation / Jaakko Väyrynen
19 Mar 10 Neural machine translation / Sami Virpioja
26 Mar 11 LLMs in industry / Shantipriya Parida
02 Apr Spring break, no lecture
09 Apr 12 Course conclusion / Mikko Kurimo
16 Apr Exam
The 2023 lecture slides below will be replaced by the 2024 versions as the course progresses. Lecture recordings will also be added here.
- 1: Introduction & Course content / Mikko Kurimo
- Lecture 1 slides (2024), PDF file
  
  Introduction to Statistical Natural Language Processing
  Course practicalities in 2024
  Lecture 1 in the course textbooks:
  Manning-Schütze: Chapters 1-2 pp. 1-80
- Lecture 1 exercise return box, assignment
  Available if: you belong to a group
  
  - What kind of Natural Language Processing applications have you used?
  - What is working well? What does not work?
  - What kind of future applications would be useful in your daily life?
  Please type or upload the notes from your breakout group discussion here, e.g. as a photo, text or pdf file to earn a lecture activity point.
- 2: Statistical language models / Mikko Kurimo
- Lecture 2 slides (2024), PDF file
  
  statistical language models and their applications
  maximum likelihood estimation of n-grams
  class-based n-grams
  the main smoothing methods for n-grams
  introduction to other statistical and neural language models
  Lecture 2 in the course textbooks:
  Manning-Schütze: Chapter 6 pp. 191-228
  Jurafsky-Martin 3rd (online) edition: Chapter 3 pp. 37-62 (and Chapter 7 pp. 131-150 for simple NNLMs)
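As a rough illustration of maximum likelihood estimation for n-grams, the sketch below estimates bigram probabilities as P(w | prev) = count(prev, w) / count(prev). The toy corpus, the `<s>` and `</s>` boundary markers and the function name `bigram_mle` are assumptions made for this example, not course material.

```python
from collections import Counter

def bigram_mle(sentences):
    """Return a function prob(word, prev) with maximum likelihood bigram estimates."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        # Every token except the last one acts as a history for the next token.
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))

    def prob(word, prev):
        if history_counts[prev] == 0:
            return 0.0  # unseen history: the MLE is undefined, 0.0 is a convention here
        return bigram_counts[(prev, word)] / history_counts[prev]

    return prob

# Hypothetical toy corpus
toy_corpus = ["i like fish", "i like perch", "you like fish"]
p = bigram_mle(toy_corpus)
print(p("like", "i"))
print(p("fish", "like"))
print(p("pike", "like"))
```

Any bigram that never occurs in the training data gets probability zero under this estimate, which is the problem that the smoothing methods listed above address.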
- Lecture 2 exercise return box (Good-Turing), assignment
  Available if: you belong to a group
  
  Watch a video where Prof. Jurafsky (Stanford) explains Good-Turing smoothing (between 02:00 and 08:45)
  Click:
  Or search for: "Good Turing video Jurafsky"
  Briefly answer these 3 questions in a single file or text field:
  1. Estimate the probability that the next fish you catch is a new species, given that you have already caught 5 perch, 2 pike, 1 trout, 1 zander and 1 salmon.
  2. Estimate the probability that the next fish you catch is a salmon.
  3. What may cause practical problems when applying Good-Turing smoothing to rare words in large text corpora?
  Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
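The Good-Turing estimates asked for in this exercise can be checked with a small script. This is a minimal sketch of the basic (unsmoothed) formulas, P(unseen) = N_1 / N and c* = (c + 1) N_{c+1} / N_c, applied to the catch from the exercise; the function and variable names are assumptions made for this example.

```python
from collections import Counter

def good_turing(catch):
    """Basic Good-Turing estimates from a dict mapping species to observed count."""
    total = sum(catch.values())              # N: total number of fish caught
    freq_of_freq = Counter(catch.values())   # N_c: number of species seen exactly c times

    # Probability mass reserved for species not seen so far: N_1 / N
    p_unseen = freq_of_freq[1] / total

    def p_seen(species):
        c = catch[species]
        # Adjusted count c* = (c + 1) * N_{c+1} / N_c
        c_star = (c + 1) * freq_of_freq[c + 1] / freq_of_freq[c]
        return c_star / total

    return p_unseen, p_seen

catch = {"perch": 5, "pike": 2, "trout": 1, "zander": 1, "salmon": 1}
p_unseen, p_seen = good_turing(catch)
print(p_unseen)
print(p_seen("salmon"))
print(p_seen("perch"))
```

Note that the adjusted count becomes zero whenever N_{c+1} is zero, as happens for perch here. Gaps like this in the frequency-of-frequency counts are one reason practical implementations smooth the N_c values before using them.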
- 3: Sentence level processing / Mikko Kurimo
- Lecture 3 slides (2024), PDF file
  Available if: you belong to a group
  
  Part-of-Speech and Named Entity tagging
  Hidden Markov models and Viterbi algorithm
  Advanced tagging methods
  
  Lecture 3 in the course textbooks:
  Manning-Schütze (1999), MIT Press: Chapters 9-12
  Jurafsky-Martin 3rd (online) edition: Chapters 8-9
- Lecture 3 exercise return box (HMM and Viterbi), assignment
  Available if: you belong to a group
  
  Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
  Discuss with each other in breakout rooms and propose answers for these 3 questions:
  
  1. Finish the POS tagging by Viterbi search example by hand.
  - Return the values of the boxes and the final tag sequence. Either take a photo of your drawing, fill in the given ppt, or just type the values into the text box
  2. Did everyone get the same tags? Is the result correct? Why / why not?
  3. What are the pros and cons of an HMM tagger?
  
  All submissions, even incorrect or incomplete ones, will be awarded one activity point.
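For checking a hand-computed Viterbi trellis, a short script can help. The sketch below runs Viterbi search on a toy HMM tagger; the two tags, the two-word vocabulary and all probabilities are hypothetical values chosen for this example and are not the ones from the lecture exercise.

```python
def viterbi(words, tags, start, trans, emit):
    """Return the most probable tag sequence and its probability under an HMM."""
    # best[t][tag]: probability of the best tag path that ends in `tag` at position t
    best = [{tag: start[tag] * emit[tag].get(words[0], 0.0) for tag in tags}]
    back = [{}]  # back[t][tag]: previous tag on that best path
    for t in range(1, len(words)):
        best.append({})
        back.append({})
        for tag in tags:
            prev = max(tags, key=lambda p: best[t - 1][p] * trans[p][tag])
            best[t][tag] = best[t - 1][prev] * trans[prev][tag] * emit[tag].get(words[t], 0.0)
            back[t][tag] = prev
    # Follow the back-pointers from the best final tag
    last = max(tags, key=lambda tag: best[-1][tag])
    path = [last]
    for t in range(len(words) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]

# Hypothetical toy model: N = noun, V = verb
tags = ["N", "V"]
start = {"N": 0.8, "V": 0.2}
trans = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit = {"N": {"fish": 0.6, "sleep": 0.4}, "V": {"fish": 0.3, "sleep": 0.7}}

path, prob = viterbi(["fish", "sleep"], tags, start, trans, emit)
print(path, prob)
```

The `best` table corresponds to the boxes filled in by hand in the exercise, so replacing the toy probabilities with the ones from the slides gives values to compare against.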
- 4: Word2vec / Tiina Lindh-Knuutila
- Lecture 4 slides (2024), PDF file
  Available if: you belong to a group
  
  distributional semantics
  vector space models
  word2vec
  information retrieval
  Lecture 4 in the course textbooks:
  Jurafsky-Martin 3rd (online) edition: Chapter 6
- Lecture 4 exercise return box (word vectors), assignment
  Available if: you belong to a group
  
  What are the benefits of distributional semantics?
  What kinds of problems might there be?
  What kind of applications can you come up with using these models?
  Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
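To make the idea of distributional semantics concrete, the sketch below compares words by the cosine similarity of their context-count vectors. The words, the context dimensions and the counts are hypothetical values invented for this example; real models such as word2vec learn dense vectors from large corpora instead.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical co-occurrence counts with the context words (water, swim, road, drive)
vectors = {
    "perch": [8, 6, 0, 0],
    "pike": [7, 5, 1, 0],
    "car": [0, 0, 9, 7],
}

def nearest(word):
    """Return the other word whose vector is most similar to that of `word`."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine(vectors[word], vectors[w]))

print(nearest("perch"))
print(cosine(vectors["perch"], vectors["car"]))
```

Words that occur in similar contexts end up close together without any hand-written definitions, while words that never share a context get similarity zero regardless of how related they are.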
- 5: Neural language modeling and Large language models / Mittul Singh
- Lecture 5 slides (2024) draft, PDF file
  Available if: you belong to a group
  
  NNLMs are discussed in Chapter 7 of the 2020 online version of the Jurafsky-Martin book.
- Lecture 5 exercise return box: Self-attention, assignment
  Available if: you belong to a group
  
  Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
- 6: Morpheme-level processing / Mathias Creutz
- Lecture 6 slides (2024), PDF file
  Available if: you belong to a group
  
  Not all of these slides will be discussed during the lecture, but all of them are still useful reading.
  NOTE: There is no textbook yet that covers this material well, so read the slides carefully!
- Lecture 6 exercise return box, assignment
  Students must submit
  
  Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
- 7: Speech recognition / Tamas Grosz
- Lecture 7 slides (2023), PDF file
  Available if: you belong to a group
  
  Hybrid DNN-HMM architecture
  End-to-end architectures
  Applications
  
  Lecture 7 in the course textbooks:
  Jurafsky-Martin 3rd (online) edition: Chapter 26
- Lecture 7 slides (2024), PDF file
- Lecture 7 exercise return box, assignment
  Available if: you belong to a group
  
  Calculate the WER and CER metrics by comparing each ASR hypothesis to the human transcript.
  
  ASR hyp 1: he then appeared in the episode smackdown
  Human transcript: he then appeared on an episode of smackdown
  ASR hyp 2: he than apeared on a episode off smacdown
  
  Which metric measures the true accuracy better in your opinion and why?
  All submissions, even incorrect or incomplete ones, will be awarded one activity point.
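Both metrics in this exercise are edit distances normalized by the length of the reference: WER counts word-level edits and CER counts character-level edits. The sketch below is one common way to compute them, under the assumption that spaces count as characters in CER (some toolkits strip or normalize them first); the function names are chosen for this example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, with all edit costs equal to 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference item
                curr[j - 1] + 1,         # insertion of a hypothesis item
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1]

def wer(ref, hyp):
    """Word error rate: word-level edits divided by the number of reference words."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)

def cer(ref, hyp):
    """Character error rate; assumption: spaces are counted as characters."""
    return edit_distance(ref, hyp) / len(ref)

reference = "he then appeared on an episode of smackdown"
hypotheses = [
    "he then appeared in the episode smackdown",
    "he than apeared on a episode off smacdown",
]
for hyp in hypotheses:
    print(hyp, wer(reference, hyp), cer(reference, hyp))
```

Because the same `edit_distance` function handles both lists of words and strings of characters, the only difference between the two metrics is the unit being compared.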
- 8: Chatbots and dialogue agents / Mikko Kurimo
- Lecture 8 slides (2024), PDF file
  Available if: you belong to a group
  
  Rule-based and Corpus-based chatbots
  Retrieval and Machine Learning based chatbots
  Evaluation of chatbots
  More information in the course text books:
  Jurafsky-Martin 3rd (online) edition (February 2024): Chapter 15
- Lecture 8 exercise return box, assignment
  Available if: you belong to a group
  
  Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
  Try ELIZA and/or PARRY. When do they fail? How could they be improved?
  Eliza: https://www.eclecticenergies.com/ego/eliza
  Eliza: http://psych.fullerton.edu/mbirnbaum/psych101/Eliza.htm
  Parry: https://www.botlibre.com/browse?id=857177
  Parry: https://www.chatbots.org/chatbot/parry/
  Try more chatbots or dialogue agents. Where are they good and where are they bad?
  https://chat.openai.com/chat
  https://convai.huggingface.co/
  https://www.chatbots.org/
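To get a feel for how a rule-based chatbot works before trying the ones linked above, here is a minimal ELIZA-style sketch: an ordered list of regular-expression patterns, each paired with a response template. The three rules and the fallback reply are invented for this example and are far simpler than the original ELIZA script, which also swaps pronouns and ranks keywords.

```python
import re

# Hypothetical rules: (pattern, response template); the first match wins
RULES = [
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
FALLBACK = "Please go on."

def respond(utterance):
    """Return the response of the first matching rule, or a generic fallback."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*match.groups())
    return FALLBACK

for line in ["I need a holiday.", "I am tired of my job", "The weather is nice"]:
    print(line, "->", respond(line))
```

The second input shows a typical failure: the reply echoes "my job" without changing the pronoun, and anything outside the rules falls through to the generic fallback.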
- 9: Statistical machine translation / Jaakko Väyrynen
- Lecture 9 slides (2024), PDF file
  Available if: you belong to a group
  
  Lecture based on:
  Chapter 13.2-13.4 in Manning & Schütze
  Chapter 21 in the OLD printed Jurafsky & Martin: Speech and Language Processing
  Chapter 11 in the 2020 Jurafsky & Martin: Speech and Language Processing
  Chapter 13 in the 2024 Jurafsky & Martin: Speech and Language Processing
  Koehn: "Statistical Machine Translation", http://www.statmt.org/book/
- Lecture 9 exercise return box, assignment
  Students must submit
  
  Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
  Consider different levels of language and different kinds of source-target pairs:
  What would be easy/hard to translate with MT?
  Have you seen failed/successful usage or applications of MT?
- 10: Neural machine translation / Sami Virpioja
- Lecture 10 slides (2024), PDF file
  Available if: you belong to a group
- Lecture 10 exercise return box, assignment
  Students must submit
- 11: LLMs in industry / Shantipriya Parida
- Lecture 11 slides (2024) LLMs in industry, PDF file
  Available if: you belong to a group
  
  Overview
    - Generative AI
    - Language Model
    - Large Language Models
  LLMs in Industries
  Use Cases
  Case Study
- Lecture 11 exercise return box, assignment
  Students must submit
- Lecture 12 slides (2024) Conclusion, PDF file
  Available if: you belong to a group
  
  The contents of the course
  Info about passing the course and grading
  Info about the exam
  Quick recap of previous lectures
