ELEC-E5550 - Statistical Natural Language Processing D, 12.01.2021-14.04.2021
This course space end date is set to 14.04.2021
Topic outline
-
Lecture schedule 2021:
- 12 Jan 1 Introduction & Project groups / Mikko Kurimo
- 19 Jan 2 Statistical language models / Mikko Kurimo
- 26 Jan 3 Word2vec / Tiina Lindh-Knuutila
- 02 Feb 4 Sentence level processing / Mikko Kurimo
- 09 Feb 5 Speech recognition / Janne Pylkkönen
- 16 Feb 6 Chatbots and dialogue agents / Mikko Kurimo
- 23 Feb Exam week, no lecture
- 02 Mar 7 Statistical machine translation / Jaakko Väyrynen
- 09 Mar 8 Morpheme-level processing / Mathias Creutz
- 16 Mar 9 Neural language modeling and BERT / Mittul Singh
- 23 Mar 10 Neural machine translation / Stig-Arne Grönroos
- 30 Mar 11 Societal impacts and course conclusion / Krista Lagus and Mikko Kurimo
Below you can find the 2020 lecture slides until they are replaced by 2021 versions as the course progresses. Lecture recordings will also be added here.
-
Zoom link to participate in the lectures URL
-
NEW Drive link to view the recorded lectures URL
-
OLD Drive link to view the recorded lectures URL
-
Introduction to Statistical Natural Language Processing
Course practicalities in 2021
Lecture 1 in the course text books:
- Manning-Schütze: Chapters 1-2, pp. 1-80
-
Make a submission
Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
-
- statistical language models and their applications
- maximum likelihood estimation of n-grams
- class-based n-grams
- the main smoothing methods for n-grams
- introduction to other statistical and neural language models
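As a concrete illustration of the topics above, here is a minimal sketch of maximum likelihood n-gram estimation with add-one (Laplace) smoothing. The toy corpus and all counts are invented for illustration only, not taken from the lecture material.

```python
from collections import Counter

# Toy corpus; <s> and </s> mark sentence boundaries.
corpus = [
    "<s> the cat sat </s>",
    "<s> the dog sat </s>",
    "<s> the cat ran </s>",
]

unigrams = Counter()
bigrams = Counter()
for sent in corpus:
    tokens = sent.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

vocab_size = len(unigrams)

def p_mle(w, prev):
    """Maximum likelihood estimate P(w | prev) = count(prev, w) / count(prev)."""
    return bigrams[(prev, w)] / unigrams[prev]

def p_laplace(w, prev):
    """Add-one (Laplace) smoothed estimate of P(w | prev)."""
    return (bigrams[(prev, w)] + 1) / (unigrams[prev] + vocab_size)

print(p_mle("cat", "the"))       # "the cat" occurs in 2 of the 3 "the" contexts
print(p_mle("fish", "the"))      # unseen bigram: MLE assigns zero probability
print(p_laplace("fish", "the"))  # smoothing gives it a small nonzero probability
```

The zero probability for the unseen bigram is exactly the problem that the smoothing methods in the lecture address.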
Lecture 2 in the course text books:
- Manning-Schütze: Chapter 6, pp. 191-228
- Jurafsky-Martin 3rd (online) edition: Chapter 3, pp. 37-62 (and Chapter 7, pp. 131-150 for simple NNLMs)
- statistical language models and their applications
-
Make a submission
List as many potential applications for statistical language models as you can!
Typically these are tasks where you need the probability of a word or sentence, or need to find the most probable one, given some background information. Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
-
- distributional semantics
- vector space models
- word2vec
- information retrieval
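The vector space ideas above can be sketched with toy distributional vectors and cosine similarity. The vectors below are invented co-occurrence counts for illustration, not trained word2vec embeddings.

```python
import math

# Toy distributional vectors: co-occurrence counts of each word with
# three context words (eat, drink, drive). Numbers are illustrative only.
vectors = {
    "coffee": [1, 9, 0],
    "tea":    [2, 8, 0],
    "car":    [0, 1, 9],
}

def cosine(u, v):
    """Cosine similarity, the standard comparison in vector space models."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

print(cosine(vectors["coffee"], vectors["tea"]))  # high: similar contexts
print(cosine(vectors["coffee"], vectors["car"]))  # low: different contexts
```

This is the distributional hypothesis in miniature: words occurring in similar contexts get similar vectors; word2vec learns such vectors from raw text instead of building them from explicit counts.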
Lecture 3 in the course text books:
- Jurafsky-Martin 3rd (online) edition: Chapter 6
-
Make a submission
- What are the benefits of distributional semantics?
- What kinds of problems might there be?
- What kind of applications can you come up with using these models?
Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
-
Part-of-Speech and Named Entity tagging
Hidden Markov models and Viterbi algorithm
Advanced tagging methods
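The Viterbi search named above can be sketched as follows. The two-tag HMM and all probabilities are made-up illustrative values, not the ones from the lecture exercise.

```python
# Tiny HMM POS tagger: Viterbi search over a two-tag model (N = noun, V = verb).
# All probabilities below are invented for illustration.

tags = ["N", "V"]
start = {"N": 0.6, "V": 0.4}                # P(tag | sentence start)
trans = {"N": {"N": 0.3, "V": 0.7},
         "V": {"N": 0.8, "V": 0.2}}         # P(tag_i | tag_{i-1})
emit = {"N": {"fish": 0.6, "can": 0.4},
        "V": {"fish": 0.3, "can": 0.7}}     # P(word | tag)

def viterbi(words):
    # delta[t]: probability of the best tag sequence ending in tag t;
    # back[i][t]: the best previous tag, kept for recovering the path.
    delta = {t: start[t] * emit[t].get(words[0], 0.0) for t in tags}
    back = []
    for w in words[1:]:
        prev_delta = delta
        step, delta = {}, {}
        for t in tags:
            best_prev = max(tags, key=lambda p: prev_delta[p] * trans[p][t])
            step[t] = best_prev
            delta[t] = prev_delta[best_prev] * trans[best_prev][t] * emit[t].get(w, 0.0)
        back.append(step)
    # Trace the best path backwards from the best final tag.
    last = max(tags, key=lambda t: delta[t])
    path = [last]
    for step in reversed(back):
        path.append(step[path[-1]])
    return list(reversed(path)), delta[last]

print(viterbi(["fish", "can", "fish"]))
```

The same fill-in-the-boxes-then-backtrack procedure is what the hand-worked exercise for this lecture asks you to do on paper.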
Lecture 4 in the course text books:
- Manning-Schütze (1999), MIT Press: Chapters 9-12
- Jurafsky-Martin 3rd (online) edition: Chapters 8-9
-
Make a submission
Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
Discuss with each other in breakout rooms and propose answers for these 3 questions:
1. Finish the POS tagging by Viterbi search example by hand.
- Return the values of the boxes and the final tag sequence. Either take a photo of your drawing, fill in the given ppt, or just type the values into the text box
2. Did everyone get the same tags? Is the result correct? Why / why not?
3. What are the pros and cons of an HMM tagger?
All submissions, even incorrect or incomplete ones, will be awarded one activity point.
-
Hybrid DNN-HMM architecture
End-to-end architectures
Applications
Lecture 5 in the course text books:
- Jurafsky-Martin 3rd (online) edition: Chapter 26
-
Make a submission
Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
Discuss with each other in breakout rooms and propose answers for these questions:
Think about an application where ASR would be useful but is not yet commonly used. How would ASR change the user experience? What are the biggest challenges for ASR in that use case?
All submissions, even incorrect or incomplete ones, will be awarded one activity point.
-
Rule-based and Corpus-based chatbots
Retrieval and Machine Learning based chatbots
Evaluation of chatbots
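A rule-based chatbot in the ELIZA tradition can be sketched in a few lines: ordered regex rules with back-references into the user's input. The patterns below are illustrative stand-ins, not ELIZA's actual script.

```python
import re

# Ordered (pattern, response-template) rules; {0}, {1} are filled with
# the regex groups captured from the user's utterance.
rules = [
    (r"i need (.*)", "Why do you need {0}?"),
    (r"i am (.*)", "How long have you been {0}?"),
    (r"(.*) mother(.*)", "Tell me more about your family."),
    (r"yes", "You seem quite sure."),
]

def respond(utterance):
    text = utterance.lower().strip(".!?")
    for pattern, template in rules:
        m = re.fullmatch(pattern, text)
        if m:
            return template.format(*m.groups())
    return "Please go on."  # fallback when no rule matches

print(respond("I need a holiday"))
print(respond("My mother is kind"))
print(respond("The weather is bad"))
```

Trying where such pattern matching breaks down (pronoun swapping, context, memory) is a good warm-up for the ELIZA and PARRY questions in the exercise below.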
Lecture 6 in the course text books:
- Jurafsky-Martin 3rd (online) edition: Chapter 24
-
Make a submission
Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
Discuss with each other in breakout rooms and propose answers for these 6 questions:
- Which chatbots and dialogue agents have you used? What can they do, what not?
- Try ELIZA, e.g. https://www.eclecticenergies.com/ego/eliza or http://psych.fullerton.edu/mbirnbaum/psych101/Eliza.htm When does it fail? How to improve it?
- Try PARRY, e.g. https://www.chatbots.org/chatbot/parry/ or https://www.botlibre.com/browse?id=857177 When does it fail? How to improve it?
- Try more chatbots or dialogue agents, e.g. the transformer-based one at https://convai.huggingface.co/ or any from https://www.chatbots.org/
- What do you think: How to make better chatbots? How to automatically evaluate chatbots?
- What ethical issues do chatbots have? Any suggestions how to solve them?
-
Lecture based on:
- Sections 13.2-13.4 in Manning & Schütze
- Chapter 21 in the OLD Jurafsky & Martin: Speech and Language Processing
- Chapter 11 in the NEW Jurafsky & Martin: Speech and Language Processing
- Koehn: "Statistical Machine Translation", http://www.statmt.org/book/
-
Make a submission
Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
Discuss with each other in breakout rooms and propose answers for these 2 questions:
Consider different levels of language and different kinds of source-target pairs:
- What would be easy/hard to translate with MT?
- Have you seen failed or successful usage or applications of MT?
-
Not all of these slides will be discussed during the lecture, but all of them are still useful reading.
NOTE: No text book yet covers this material well, so read the slides carefully!
-
Make a submission
Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
-
NNLMs are discussed in Chapter 7 of the 2020 online version of the Jurafsky-Martin book.
-
Make a submission
Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
-
Make a submission
Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
-
NMT is discussed in Chapter 11 of the 2020 online version of the Jurafsky-Martin book.
-
Make a submission
Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
-
- This material is not included in the text books
- Check the slides and any reading material mentioned there
-
- The contents of the course
- Info about passing the course and grading
- Info about the exam
- Quick recap of previous lectures
-
Discuss with group:
1. Do you think it is possible to detect speaker’s emotions from text? Explain!
2. What good might there be if we could create a "WORRY-O-METER"?
3. What problems do you foresee?
Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
-
Make a submission
Try out Medicine Radar at Laaketutka.fi. For example:
- search for a medicine you know
- look at the list of other medicines, other symptoms, and typical dosages. Is this what you would expect?
Discuss:
1. What is interesting about this prototype?
2. What could it be useful for?
3. Come up with questions and share them with your group!
Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.