Section description

  • Lecture schedule 2024 (tentative, topics and dates may still change):

    • 09 Jan 1 Introduction & course organization / Mikko Kurimo
    • 16 Jan 2 Statistical language models / Mikko Kurimo
    • 23 Jan 3 Sentence level processing / Mikko Kurimo
    • 30 Jan 4 Word2vec / Tiina Lindh-Knuutila
    • 06 Feb 5 Neural language modeling and large language models / Mittul Singh
    • 13 Feb 6 Morpheme-level processing / Mathias Creutz
    • 20 Feb Exam week, no lecture
    • 27 Feb 7 Speech recognition / Tamas Grosz
    • 05 Mar 8 Chatbots and dialogue agents / Mikko Kurimo
    • 12 Mar 9 Statistical machine translation / Jaakko Väyrynen
    • 19 Mar 10 Neural machine translation / Sami Virpioja
    • 26 Mar 11 LLMs in industry / Shantipriya Parida
    • 02 Apr (spring break, no lecture)
    • 09 Apr 12 Course conclusion / Mikko Kurimo
    • 16 Apr Exam
    Below you can find the 2023 lecture slides until they are replaced by the 2024 ones as the course progresses. Lecture recordings will also be added here.

    • 1: Introduction & Course content / Mikko Kurimo

    • Lecture 1 slides:

      Introduction to Statistical Natural Language Processing

      Course practicalities in 2024

      Lecture 1 in the course text books:

      • Manning-Schütze: Chapters 1-2, pp. 1-80

    • Available only when: You belong to any group
      Lecture 1 exercise return box

      -What kind of Natural Language Processing applications have you used?
      -What is working well? What does not work?
      -What kind of future applications would be useful in your daily life?

      Please type or upload the notes from your breakout group discussion here, e.g. as a photo, text or pdf file to earn a lecture activity point.

    • 2: Statistical language models / Mikko Kurimo


    • Lecture 2 slides:
      • statistical language models and their applications
      • maximum likelihood estimation of n-grams
      • class-based n-grams
      • the main smoothing methods for n-grams
      • introduction to other statistical and neural language models

      Lecture 2 in the course text books:

      • Manning-Schütze: Chapter 6, pp. 191-228
      • Jurafsky-Martin 3rd (online) edition: Chapter 3, pp. 37-62 (and Chapter 7, pp. 131-150 for simple NNLMs)
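Maximum likelihood estimation of n-grams, covered in this lecture, can be sketched in a few lines of Python (the toy corpus and function name below are my own, for illustration only):

```python
from collections import Counter

def bigram_mle(tokens):
    """Maximum likelihood estimates: P(w2 | w1) = count(w1, w2) / count(w1)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {(w1, w2): c / unigrams[w1] for (w1, w2), c in bigrams.items()}

corpus = "the cat sat on the mat the cat ran".split()
probs = bigram_mle(corpus)
print(probs[("the", "cat")])  # 2/3: 'the' occurs 3 times, 'the cat' twice
```

Note that any bigram absent from the corpus gets probability zero under MLE, which is exactly the problem the smoothing methods in this lecture address.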

    • Available only when: You belong to any group
      Lecture 2 exercise return box (Good-Turing)
      Watch a video in which Prof. Jurafsky (Stanford) explains Good-Turing smoothing (between 02:00 and 08:45)
      • Click:
      • Or search for: "Good Turing video Jurafsky"
      • Answer these 3 questions briefly in a single file or in the text field
      1. Estimate the probability that the next fish you catch is a new species, given that you have already caught 5 perch, 2 pike, 1 trout, 1 zander and 1 salmon.
      2. Estimate the probability that the next fish you catch is a salmon.
      3. What may cause practical problems when applying Good-Turing smoothing to rare words in large text corpora?

      Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
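For checking your hand calculation, the Good-Turing reestimates from the video can be sketched as follows (the helper function and the illustrative count list are my own, not part of the course material):

```python
from collections import Counter

def good_turing(counts):
    """Simple Good-Turing estimates from a list of observed counts per type.
    Returns (p_unseen, adjusted) where p_unseen = N1/N is the probability
    mass given to unseen events, and adjusted maps a raw count c to
    c* = (c + 1) * N_{c+1} / N_c."""
    n_total = sum(counts)
    freq_of_freq = Counter(counts)        # N_c: how many types occur exactly c times
    p_unseen = freq_of_freq[1] / n_total  # mass reserved for unseen events
    adjusted = {}
    for c, n_c in freq_of_freq.items():
        n_next = freq_of_freq.get(c + 1, 0)
        # c* is zero whenever N_{c+1} = 0 -- a well-known practical issue
        adjusted[c] = (c + 1) * n_next / n_c
    return p_unseen, adjusted

p0, adj = good_turing([5, 2, 1, 1, 1])
print(p0)      # 0.3 = N1/N = 3/10
print(adj[1])  # 2/3 = 2 * N2/N1
```

The zero adjusted counts for high c (where N_{c+1} = 0) hint at the kind of practical problem question 3 asks about.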

    • 3: Sentence level processing / Mikko Kurimo


    • Available only when: You belong to any group
      Lecture 3 slides (2024, PDF)

      Part-of-Speech and Named Entity tagging

      Hidden Markov models and Viterbi algorithm

      Advanced tagging methods


      Lecture 3 in the course text books:

      • Manning-Schütze (1999), MIT Press: Chapters 9-12
      • Jurafsky-Martin 3rd (online) edition: Chapters 8-9
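The Viterbi search used for HMM tagging can be sketched compactly (the two-tag toy model and all its probabilities below are invented for illustration, not the lecture's example):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely state (tag) sequence for an observation sequence under an HMM."""
    # Each cell stores (probability of best path ending here, that path).
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for o in obs[1:]:
        V.append({
            s: max(
                ((p * trans_p[prev][s] * emit_p[s][o], path + [s])
                 for prev, (p, path) in V[-1].items()),
                key=lambda x: x[0],
            )
            for s in states
        })
    return max(V[-1].values(), key=lambda x: x[0])

# Hypothetical two-tag example with invented probabilities
states = ["N", "V"]
start_p = {"N": 0.7, "V": 0.3}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
emit_p = {"N": {"fish": 0.6, "sleep": 0.4}, "V": {"fish": 0.5, "sleep": 0.5}}
prob, tags = viterbi(["fish", "sleep"], states, start_p, trans_p, emit_p)
print(tags)  # ['N', 'V']
```

Filling in the boxes of the exercise by hand follows the same recursion: each cell keeps only the best-scoring predecessor.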

    • Available only when: You belong to any group
      Lecture 3 exercise return box (HMM and Viterbi)

      Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.

      Discuss with each other in breakout rooms and propose answers for these 3 questions:

      1. Finish the POS tagging by Viterbi search example by hand.
      - Return the values of the boxes and the final tag sequence. Either take a photo of your drawing, fill in the given ppt, or just type the values into the text box  
      2. Did everyone get the same tags? Is the result correct? Why / why not?
      3. What are the pros and cons of an HMM tagger?

      All submissions, even incorrect or incomplete ones, will be awarded one activity point.

    • 4: Word2vec / Tiina Lindh-Knuutila


    • Available only when: You belong to any group
      Lecture 4 slides (2024, PDF)
      • distributional semantics
      • vector space models
      • word2vec
      • information retrieval

      Lecture 4 in the course text books:

      • Jurafsky-Martin 3rd (online) edition: Chapter 6
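The core idea of vector space models is that words with similar contexts get similar vectors, compared with cosine similarity. A minimal sketch (the co-occurrence counts and context words below are invented for illustration):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy co-occurrence vectors over invented context words: (eat, drive, park)
vec = {
    "car":   [0, 8, 5],
    "truck": [0, 6, 4],
    "apple": [9, 0, 1],
}
print(cosine(vec["car"], vec["truck"]))  # high: similar contexts
print(cosine(vec["car"], vec["apple"]))  # low: different contexts
```

word2vec learns dense vectors with the same property directly from text, instead of counting co-occurrences.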

    • Available only when: You belong to any group
      Lecture 4 exercise return box (word vectors)
      1. What are the benefits of distributional semantics?
      2. What kinds of problems might there be?
      3. What kind of applications can you come up with using these models?

      Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.

    • 5: Neural language modeling and large language models / Mittul Singh

    • Available only when: You belong to any group
      Lecture 5 slides (2024, draft, PDF)

      NNLMs are discussed in Chapter 7 of the 2020 online version of the Jurafsky-Martin book.
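Single-head scaled dot-product self-attention, the operation the exercise below refers to, can be sketched in a few lines (random toy inputs; this is an illustrative sketch of the standard formula softmax(QKᵀ/√d)V, not the lecture's code):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention: softmax(Q K^T / sqrt(d)) V."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    # Row-wise softmax (shifted by the row max for numerical stability)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))                     # 3 tokens, model dimension 4
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
print(out.shape)         # (3, 4): one contextualized vector per token
print(attn.sum(axis=1))  # each token's attention weights sum to 1
```

Each output row is a weighted mix of all the value vectors, with the weights decided by query-key similarity.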

    • Available only when: You belong to any group
      Lecture 5 exercise return box: Self-attention

      Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.

    • 6: Morpheme-level processing / Mathias Creutz

    • Available only when: You belong to any group
      Lecture 6 slides (2024, PDF)

      Not all of these slides will be discussed during the lecture, but all of them are still useful reading.

      NOTE: No text book yet covers this material well, so read the slides carefully!

    • Lecture 6 exercise return box

      Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.

    • 7: Speech recognition / Tamas Grosz

    • Available only when: You belong to any group
      Lecture 7 slides (2023, PDF)

      Hybrid DNN-HMM architecture

      End-to-end architectures

      Applications


      Lecture 7 in the course text books:

      • Jurafsky-Martin 3rd (online) edition: Chapter 26

    • Available only when: You belong to any group
      Lecture 7 exercise return box

      Calculate the WER and CER metrics by comparing the ASR hypotheses to the
      human transcript!

      • ASR hyp 1: he then appeared in the episode smackdown
      • Human transcript: he then appeared on an episode of smackdown
      • ASR hyp 2: he than apeared on a episode off smacdown

      Which metric measures the true accuracy better in your opinion and why?

      All submissions, even incorrect or incomplete ones, will be awarded one activity point.
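For checking your hand calculation, both WER and CER reduce to Levenshtein edit distance (the helpers below are my own sketch, applied to a neutral example rather than the exercise sentences):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance; substitutions, insertions and deletions all cost 1."""
    d = list(range(len(hyp) + 1))  # DP row for the empty reference prefix
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i       # prev holds the diagonal cell d[i-1][j-1]
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,          # deletion
                                   d[j - 1] + 1,      # insertion
                                   prev + (r != h))   # substitution / match
    return d[-1]

def wer(ref, hyp):
    """Word error rate: word-level edit distance / number of reference words."""
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

def cer(ref, hyp):
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(ref, hyp) / len(ref)

print(wer("the cat sat", "the cat sit"))  # 1/3: one substituted word out of three
```

The same functions work at the character level, which is what makes CER and WER disagree on near-miss spellings.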

    • 8: Chatbots and dialogue agents / Mikko Kurimo


    • Available only when: You belong to any group
      Lecture 8 slides (2024, PDF)

      Rule-based and Corpus-based chatbots

      Retrieval and Machine Learning based chatbots

      Evaluation of chatbots

      More information in the course text books:

      • Jurafsky-Martin 3rd (online) edition (February 2024): Chapter 15

    • Available only when: You belong to any group
      Lecture 8 exercise return box

      Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.

      1. Try ELIZA and/or PARRY. When do they fail? How could they be improved?
        Eliza: https://www.eclecticenergies.com/ego/eliza
        Eliza: http://psych.fullerton.edu/mbirnbaum/psych101/Eliza.htm  
        Parry: https://www.botlibre.com/browse?id=857177
        Parry: https://www.chatbots.org/chatbot/parry/     
      2. Try more chatbots or dialogue agents: where are they good and where are they bad?
        https://chat.openai.com/chat
        https://convai.huggingface.co/
        https://www.chatbots.org/
    • 9: Statistical machine translation / Jaakko Väyrynen


    • Available only when: You belong to any group
      Lecture 9 slides (2024, PDF)

      Lecture based on:

      • Sections 13.2-13.4 in Manning & Schütze
      • Chapter 21 in the OLD printed Jurafsky & Martin: Speech and Language Processing
      • Chapter 11 in the 2020 Jurafsky & Martin: Speech and Language Processing
      • Chapter 13 in the 2024 Jurafsky & Martin: Speech and Language Processing
      • Koehn: "Statistical Machine Translation", http://www.statmt.org/book/
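Statistical MT as presented in these readings rests on the noisy-channel model: pick the translation e that maximizes P(e) · P(f|e), i.e. a language model score times a translation model score. A toy sketch (all candidate strings and scores below are invented for illustration):

```python
import math

def noisy_channel_best(candidates, lm_logprob, tm_logprob, source):
    """Pick argmax over e of [ log P(e) + log P(f|e) ]."""
    return max(candidates, key=lambda e: lm_logprob(e) + tm_logprob(source, e))

# Invented toy scores: both word orders are equally likely under the
# translation model, so the language model has to break the tie.
lm = {"the house": math.log(0.04), "house the": math.log(0.0001)}
tm = {("das Haus", "the house"): math.log(0.5),
      ("das Haus", "house the"): math.log(0.5)}
best = noisy_channel_best(
    ["the house", "house the"],
    lm_logprob=lambda e: lm[e],
    tm_logprob=lambda f, e: tm[(f, e)],
    source="das Haus",
)
print(best)  # "the house"
```

Real SMT decoders search over vastly larger candidate spaces, but the scoring decomposition is the same.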
    • Lecture 9 exercise return box

      Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.

      Consider different levels of language and different kinds of source-target pairs:
        1. What would be easy/hard to translate with MT?
        2. Have you seen failed/successful usage or applications of MT?


    • 10: Neural machine translation / Sami Virpioja


    • Available only when: You belong to any group
      Lecture 10 slides (2024, PDF)
    • Lecture 10 exercise return box
    • 11: LLMs in industry / Shantipriya Parida


    • Available only when: You belong to any group
      Lecture 11 slides (2024): LLMs in industry (PDF)
      • Overview
        ○ Generative AI
        ○ Language models
        ○ Large language models
      • LLMs in industries
      • Use cases
      • Case study
    • Lecture 11 exercise return box
    • Available only when: You belong to any group
      Lecture 12 slides (2024): Conclusion (PDF)
        • The contents of the course
        • Info about passing the course and grading
        • Info about the exam
        • Quick recap of previous lectures