Topic outline

  • Lecture schedule 2022:

    • 11 Jan 1 Introduction & Project groups / Mikko Kurimo
    • 18 Jan 2 Statistical language models / Mikko Kurimo
    • 25 Jan 3 Word2vec / Tiina Lindh-Knuutila
    • 01 Feb 4 Sentence level processing / Mikko Kurimo
    • 08 Feb 5 Speech recognition / Janne Pylkkönen
    • 15 Feb 6 Morpheme-level processing / Mathias Creutz
    • 22 Feb Exam week, no lecture
    • 01 Mar 7 Chatbots and dialogue agents / Mikko Kurimo
    • 08 Mar 8 Neural language modeling and BERT / Mittul Singh
    • 15 Mar 9 Statistical machine translation / Jaakko Väyrynen
    • 22 Mar 10 Neural machine translation / Stig-Arne Grönroos
    • 29 Mar 11 Societal impacts and course conclusion / Krista Lagus and Mikko Kurimo

    Below you can find the slides of the 2021 lectures until they are replaced by the 2022 versions as the course progresses. Lecture recordings will also be added here.

    • Zoom link to participate in the lectures (requires group membership)
    • 11 Jan 2022 1: Introduction & Course content / Mikko Kurimo

    • Slides:

      Introduction to Statistical Natural Language Processing

      Course practicalities in 2022

      Lecture 1 in the course textbooks:

      • Manning & Schütze: Chapters 1-2, pp. 1-80

    • Lecture 1 recording (2022, MP4; requires group membership)
    • Assignment:

      - What kind of Natural Language Processing applications have you used?
      - What is working well? What does not work?
      - What kind of future applications would be useful in your daily life?

      Please type or upload the notes from your breakout group discussion here, e.g. as a photo, text or pdf file to earn a lecture activity point.

    • 18 Jan 2022 2: Statistical language models / Mikko Kurimo


    • Slides:
      • statistical language models and their applications
      • maximum likelihood estimation of n-grams
      • class-based n-grams
      • the main smoothing methods for n-grams
      • introduction to other statistical and neural language models

      Lecture 2 in the course textbooks:

      • Manning & Schütze: Chapter 6, pp. 191-228
      • Jurafsky & Martin, 3rd (online) edition: Chapter 3, pp. 37-62 (and Chapter 7, pp. 131-150 for simple NNLMs)
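The maximum likelihood estimate of an n-gram probability is just its relative frequency in the training corpus. A minimal bigram sketch (the toy corpus is made up for illustration):

```python
from collections import Counter

def bigram_mle(corpus):
    """Estimate P(w2 | w1) as count(w1, w2) / count(w1)."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])                 # histories only
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return {(w1, w2): c / unigrams[w1] for (w1, w2), c in bigrams.items()}

probs = bigram_mle(["the cat sat", "the dog sat", "the cat ran"])
print(probs[("the", "cat")])  # 2/3: "cat" follows "the" in 2 of 3 cases
```

Unseen bigrams get probability zero under MLE, which is exactly why the smoothing methods listed above are needed.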

    • Lecture 2 recording (2022, MP4; requires group membership)
    • Assignment:

      List as many potential applications for statistical language models as you can!
      Typically these are tasks where you need the probability of a word or sentence, or need to find the most probable one, given some background information.

      Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.

    • Assignment:
      Watch the video where Prof. Jurafsky (Stanford) explains Good-Turing smoothing (between 02:00 and 08:45)
      • Search for: "Good Turing video Jurafsky"
      • Answer briefly these 3 questions in a single file or text field
      1. Estimate the probability that the next catch is a new fish species, if you have already caught 5 perch, 2 pike, 1 trout, 1 zander and 1 salmon.
      2. Estimate the probability that the next catch is a salmon.
      3. What may cause practical problems when applying Good-Turing smoothing to rare words in large text corpora?

      Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.
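Good-Turing smoothing reserves the probability mass N1/N for unseen events and discounts seen counts using the counts-of-counts N_c. A minimal sketch with made-up counts (deliberately not the fish example, so it does not answer the questions above):

```python
from collections import Counter

def good_turing(counts):
    """Good-Turing: P(unseen) = N1/N; adjusted count c* = (c+1) * N_{c+1} / N_c."""
    n = sum(counts.values())                  # total observations N
    freq_of_freq = Counter(counts.values())   # N_c: number of types seen c times
    p_unseen = freq_of_freq[1] / n            # mass reserved for new types
    adjusted = {}
    for word, c in counts.items():
        if freq_of_freq[c + 1] > 0:
            c_star = (c + 1) * freq_of_freq[c + 1] / freq_of_freq[c]
            adjusted[word] = c_star / n
        else:
            # N_{c+1} = 0: c* is undefined, fall back to MLE here.
            # These gaps in the counts-of-counts are one practical problem
            # when applying the method to large corpora (question 3).
            adjusted[word] = c / n
    return p_unseen, adjusted

p_new, probs = good_turing({"a": 3, "b": 2, "c": 1, "d": 1})
print(p_new)  # N1/N = 2/7
```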

    • 25 Jan 2022 3: Word2vec / Tiina Lindh-Knuutila


    • Slides:
      • distributional semantics
      • vector space models
      • word2vec
      • information retrieval

      Lecture 3 in the course textbooks:

      • Jurafsky & Martin, 3rd (online) edition: Chapter 6
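Distributional semantics represents each word as a vector, classically of co-occurrence counts, and compares words by cosine similarity. A minimal sketch (the context dimensions and counts are made up):

```python
import math

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

# Hypothetical co-occurrence counts over contexts (eat, drink, drive)
vectors = {"apple": [10, 2, 0], "juice": [4, 12, 0], "car": [0, 1, 9]}
print(cosine(vectors["apple"], vectors["juice"]))  # higher: shared food contexts
print(cosine(vectors["apple"], vectors["car"]))    # lower: few shared contexts
```

Word2vec learns dense vectors instead of raw counts, but comparisons still use the same cosine measure.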

    • Lecture 3 recording (2022, MP4; requires group membership)
    • Assignment:
      1. What are the benefits of distributional semantics?
      2. What kind of problems might there be?
      3. What kind of applications can you come up with using these models?

      Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.

    • 1 Feb 2022 4: Sentence level processing / Mikko Kurimo


    • Slides:

      Part-of-Speech and Named Entity tagging

      Hidden Markov models and Viterbi algorithm

      Advanced tagging methods


      Lecture 4 in the course textbooks:

      • Manning & Schütze (1999), MIT Press: Chapters 9-12
      • Jurafsky & Martin, 3rd (online) edition: Chapters 8-9
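The Viterbi algorithm finds the most probable tag sequence under an HMM by dynamic programming: each cell stores the best probability of reaching a tag, plus a backpointer. A minimal sketch with made-up transition and emission probabilities (not the lecture's example):

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """delta[t][s] = max over prev of delta[t-1][prev] * P(s|prev) * P(word|s)."""
    delta = [{t: start_p[t] * emit_p[t].get(words[0], 0.0) for t in tags}]
    back = [{}]
    for w in words[1:]:
        row, ptr = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: delta[-1][p] * trans_p[p][t])
            row[t] = delta[-1][prev] * trans_p[prev][t] * emit_p[t].get(w, 0.0)
            ptr[t] = prev                      # backpointer for path recovery
        delta.append(row)
        back.append(ptr)
    best = max(tags, key=lambda t: delta[-1][t])
    path = [best]
    for ptr in reversed(back[1:]):             # follow backpointers right to left
        path.insert(0, ptr[path[0]])
    return path

tags = ["N", "V"]
start_p = {"N": 0.7, "V": 0.3}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit_p = {"N": {"fish": 0.5, "sleep": 0.2}, "V": {"fish": 0.3, "sleep": 0.6}}
print(viterbi(["fish", "sleep"], tags, start_p, trans_p, emit_p))  # ['N', 'V']
```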

    • Lecture 4 recording (2022, MP4; requires group membership)
    • Assignment:

      Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.

      Discuss with each other in breakout rooms and propose answers for these 3 questions:

      1. Finish the POS tagging by Viterbi search example by hand.
      - Return the values of the boxes and the final tag sequence. Either take a photo of your drawing, fill in the given ppt, or just type the values into the text box.
      2. Did everyone get the same tags? Is the result correct? Why / why not?
      3. What are the pros and cons of an HMM tagger?

      All submissions, even incorrect or incomplete ones, will be awarded one activity point.

    • 08 Feb 2022 5: Speech recognition / Janne Pylkkönen


    • Slides:

      Hybrid DNN-HMM architecture

      End-to-end architectures

      Applications


      Lecture 5 in the course textbooks:

      • Jurafsky & Martin, 3rd (online) edition: Chapter 26

    • Lecture 5 recording (2022, MP4; requires group membership)
    • Assignment:

      Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.

      Discuss with each other in breakout rooms and propose answers for these questions:

        Think about an application where ASR would be useful, but where
        it is not yet commonly used. How would ASR change the user
        experience? What are the biggest challenges for ASR in that use
        case?

        All submissions, even incorrect or incomplete ones, will be awarded one activity point.

      • 15 Feb 2022 6: Morpheme-level processing / Mathias Creutz


      • Slides:

        Not all of these slides will be discussed during the lecture, but everything is still useful reading.

        NOTE: No textbook yet covers this material well, so read the slides carefully!

      • Lecture 6 recording (2022, MP4; requires group membership)
      • Assignment:

        Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.

      • 22 Feb 2022: Exam week, no lecture


      • 01 Mar 2022 7: Chatbots and dialogue agents / Mikko Kurimo


      • Slides:

        Rule-based and Corpus-based chatbots

        Retrieval and Machine Learning based chatbots

        Evaluation of chatbots


        Lecture 7 in the course textbooks:

        • Jurafsky & Martin, 3rd (online) edition: Chapter 24
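A retrieval-based chatbot does not generate text: it picks the canned reply whose stored utterance is most similar to the user's input. A minimal bag-of-words sketch (the utterance/reply pairs are made up):

```python
from collections import Counter
import math

def similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (math.sqrt(sum(c * c for c in a.values())) *
            math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0

pairs = {  # hypothetical (user utterance -> canned reply) pairs
    "what time does the lecture start": "The lecture starts at 14:15.",
    "where can i find the slides": "The slides are on the course page.",
}

def reply(user_input):
    bag = Counter(user_input.lower().split())
    best = max(pairs, key=lambda u: similarity(bag, Counter(u.split())))
    return pairs[best]

print(reply("when does the lecture start?"))  # The lecture starts at 14:15.
```

Real systems replace raw word counts with learned sentence embeddings, but the retrieve-by-similarity structure is the same.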

      • Lecture 7 recording (2022, MP4; requires group membership)
      • Assignment:

        Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.

        Discuss with each other in breakout rooms and propose answers for these 6 questions:

          1. Which chatbots and dialogue agents have you used? What can they do, and what not?
          2. Try ELIZA, e.g. https://www.eclecticenergies.com/ego/eliza or http://psych.fullerton.edu/mbirnbaum/psych101/Eliza.htm. When does it fail? How could it be improved?
          3. Try PARRY, e.g. https://www.chatbots.org/chatbot/parry/ or https://www.botlibre.com/browse?id=857177. When does it fail? How could it be improved?
          4. Try more chatbots or dialogue agents, e.g. a transformer-based one: https://convai.huggingface.co/ or any from: https://www.chatbots.org/
          5. What do you think: how could better chatbots be made? How could chatbots be evaluated automatically?
          6. What ethical issues do chatbots have? Any suggestions for how to solve them?


        • 08 Mar 2022 8: Neural language modeling / Mittul Singh


        • Slides:

          NNLMs are discussed in Chapter 7 of the 2020 online version of the Jurafsky & Martin book.

        • Lecture 8 recording (2022, MP4; requires group membership)
        • Assignment:

          Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.

        • Assignment:

          Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.

        • 15 Mar 2022 9: Statistical machine translation / Jaakko Väyrynen


        • Slides:

          Lecture based on:

          • Chapters 13.2-13.4 in Manning & Schütze
          • Chapter 21 in the OLD Jurafsky & Martin: Speech and Language Processing
          • Chapter 11 in the NEW Jurafsky & Martin: Speech and Language Processing
          • Koehn: "Statistical Machine Translation", http://www.statmt.org/book/
        • Lecture 9 recording (2022, MP4; requires group membership)
        • Assignment:

          Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.

          Discuss with each other in breakout rooms and propose answers for these 2 questions:

          Consider different levels of language and different kinds of source-target pairs:

            1. What would be easy or hard to translate with MT?
            2. Have you seen failed or successful uses or applications of MT?


          • 22 Mar 2022 10: Neural machine translation / Stig-Arne Grönroos


          • Slides:

            NMT is discussed in Chapter 11 of the 2020 online version of the Jurafsky & Martin book.

          • Lecture 10 recording (2022, MP4; requires group membership)
          • Assignment:

            What other tasks could an NMT architecture be used for?
            Same form, different semantics.

            Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.

          • 29 Mar 2022 11: Societal impacts and course conclusion / Krista Lagus and Mikko Kurimo


          • Slides:

            The first part of Krista's presentation

          • Slides:
            • This material is not included in the text books
            • Check the slides and any reading material mentioned there
          • Slides:
            • The contents of the course
            • Info about passing the course and grading
            • Info about the exam
            • Quick recap of previous lectures
          • Lecture 11 recording, part 1 (2022, MP4; requires group membership)
          • Lecture 11 recording, part 2 (2022, MP4; requires group membership)
          • Assignment:

            Discuss with your group:
            1. Do you think it is possible to detect a speaker’s emotions from text? Explain!
            2. What good might come of it, if we could create a “WORRY-O-METER”?
            3. What problems do you foresee?

            Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.

          • Assignment:

            1. What principles are important to you that you would like to see more of in the world, in discussions, or in social media?
            ‒ Write down at least one, and describe it to your group.
            2. What would the world be like if your chosen principle was adopted or became stronger in the world, or in some particular context or forum? Describe concretely, if possible.

            Please type or upload your answer here, e.g. as a photo, text or pdf file and earn a lecture activity point.