Topic outline

  • Welcome to the summer course Introduction to Text Mining.

    Zoom link to sessions: https://aalto.zoom.us/j/67648007604?pwd=VVVXWW9JeldDVUFTcllYbUpvQlUrZz09 (passcode: 165388)


    This is a beginners' course that familiarizes students with the basic tools of text mining. The emphasis is on teaching students how to adapt and apply some widely used algorithms and techniques from natural language processing (NLP) and data scraping. Students will be provided with code snippets that implement the algorithms.

    In particular, after the course the students will:

    • Know how to build an NLP workflow on a given corpus.

    • Be able to use some Python packages for NLP, especially spaCy.

    • Understand the basic elements of the workflow, such as part-of-speech (POS) tagging, dependency trees, and named entity recognition.

    • Understand how to test and apply algorithms for extracting important words and themes from documents: TF-IDF scores and topic modeling.

    • Know how to scrape data from webpages.
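
    As a rough illustration of the TF-IDF scores mentioned above, the following is a minimal sketch in plain Python. It is not taken from the course material: the function name tf_idf, the whitespace tokenization, the unsmoothed logarithmic weighting, and the toy corpus are all assumptions made for this example, and library implementations typically use different smoothing and normalization.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: score} dict per document (hypothetical helper).

    Assumed definitions for this sketch:
      tf  = occurrences of the word in the document / words in the document
      idf = log(number of documents / number of documents containing the word)
    """
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each word occurs.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


# Toy corpus, invented for illustration only.
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the stock market fell sharply",
]
scores = tf_idf(corpus)

# Print the words of the first document, highest score first.
for word, score in sorted(scores[0].items(), key=lambda kv: -kv[1]):
    print(f"{word}: {score:.3f}")
```

    Under these assumptions, a word that occurs in every document (here "the") receives a score of zero, while words that are specific to one document score highest.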


    Implementation (workload and assessment):

    The course consists of 3 weeks of lectures: 6 sessions of 1 hr 45 min each. Each session will be a mix of lecturing and hands-on coding. Coding will be done in Python in the Google Colab environment. In general, ready-made code and datasets will be provided, and students will be expected to make minor edits. Grading is based 60% on quizzes (multiple choice, in Aalto A+) and 40% on a project (collating the supplied code).


    Credits: 3

    Language: English

    Grading scale: 0-5

    Teacher: Kunal Bhattacharya


    Lecture dates: 9.8, 11.8, 16.8, 18.8, 23.8, and 25.8 (Tuesdays and Thursdays for 3 weeks)

    Time: 13:15-15:00 on all days


    Prerequisites:

    1. Basic algebra and statistics.

    2. Some coding experience will be helpful. At a minimum, students should be willing to edit and run ready-made Python code.

    3. In the absence of (1) and (2), the instructor’s permission may be sought.

    4. A Google account. Python code will be run in the Google Colab environment.


    Textbook: 

    Stefan Jansen. Hands-On Machine Learning for Algorithmic Trading. Packt Publishing Ltd, 2018. 

    Alternatively, the lecture slides.