Please note! The course description is confirmed for two academic years (1.8.2018-31.7.2020), which means that, in general, elements such as learning outcomes, assessment methods and key content stay unchanged. However, the course syllabus may specify or change how each realization of the course is executed, such as how the contact sessions are organized, how assessment methods are weighted, or which materials are used.

LEARNING OUTCOMES

Learning outcomes for this course, upon successful completion, include the ability to: 1) understand principles of programming using the Python programming language, 2) use Python to collect data from various sources for analysis, 3) employ Python for data cleaning, 4) implement statistical and predictive models in Python using business data, 5) understand how to choose the correct statistical or predictive model based on the available data and business context, and 6) understand how the information resulting from data analysis leads to improved business decision-making.

Credits: 6

Schedule: 26.07.2021 - 13.08.2021

Teacher in charge (valid 01.08.2020-31.07.2022): Joan Lofgren

Teacher in charge (applies in this implementation): Dustin White

Contact information for the course (valid 24.06.2020-21.12.2112): We are meeting remotely, but I want to hear from you!
Please feel free to reach out at dusty.white@aalto.fi, or through the Discussion Boards here on MyCourses. Depending on where you are in the world, I might not be able to respond immediately due to time differences, but I will make every effort to respond as promptly as possible!

The course will run each weekday during our session from 1300 UTC to
approximately 1600 UTC. This course will be fast paced, with new content each
class period. You will need to be present for each lecture. Subsequent material
will build off the material that we cover in each class. We will be learning
remotely, but we will be meeting through Zoom, so that we can interact as if we
were in the same place.

Class will consist of two portions: 1) Lecture, and 2)
Office hours. Office hours (or lab time as I will call it) will take place
during the 60-90 minutes immediately following the daily lecture. All portions
of the course will occur via Zoom. During lab, you will be able to break out
into small groups to work together on the material for a given class period. I
will spend the time answering questions and checking in with your groups as you
work through exercises related to the course material.
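
Because students join from many time zones, it may help to convert the class time into local time. The following is a minimal sketch using only the Python standard library. It assumes a fixed UTC+3 offset for Finland, which matches the "1300 UTC (1600 Finland time)" pairing in the course schedule, and it uses a made-up UTC-4 offset as an example of a student's local zone. Both offsets and the helper name `to_local` are illustrative assumptions, not part of the course materials.

```python
from datetime import datetime, timedelta, timezone

# Class start from the syllabus: 1300 UTC on the first listed session date.
class_start_utc = datetime(2020, 7, 27, 13, 0, tzinfo=timezone.utc)

# Assumption: Finland is at UTC+3 during the course (1300 UTC = 1600 Finland time).
finland_zone = timezone(timedelta(hours=3))

# Hypothetical student location at UTC-4, chosen only for illustration.
student_zone = timezone(timedelta(hours=-4))


def to_local(moment_utc, local_zone):
    """Return the same instant expressed in another time zone."""
    return moment_utc.astimezone(local_zone)


finland_start = to_local(class_start_utc, finland_zone)
student_start = to_local(class_start_utc, student_zone)

print("Finland:", finland_start.strftime("%H%M"))
print("Student:", student_start.strftime("%H%M"))
```

Replacing the offset in `student_zone` with your own gives the local start time for every session, since each class begins at the same UTC time.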

CEFR level (applies in this implementation):

Language of instruction and studies (valid 01.08.2020-31.07.2022):

Teaching language: English

Languages of study attainment: English

CONTENT, ASSESSMENT AND WORKLOAD

Content
  • Valid 01.08.2020-31.07.2022:

    This course is intended to introduce the student to programming languages as tools for conducting data analysis, focusing on Python in particular. The course will cover basic principles of programming languages, as well as libraries useful in collecting, cleaning and analyzing data in order to answer research questions. Students will learn to use Python to apply forecasting tools and predictive models to business settings. The course will be divided between lecture and lab time, and labs will be focused on teaching students how to implement the programming techniques and statistical models discussed in lectures.

  • Applies in this implementation:

    Course Schedule

    Session 1 – 27 July 2020

    Introduction to using Python. We will cover opening notebooks, and basic functions in Python.

    Class at 1300 UTC (1600 Finland time)

    No reading or assignments due

    Session 2 – 28 July 2020

    Loops and Conditions. We will focus on creating logical conditions for our programs to meet, as well as looping through code to streamline repeated processes.

    Class at 1300 UTC (1600 Finland time)

    Assignment 1 due one hour prior to the start of class (1200 UTC, 1500 Finland Time).

    Session 3 – 29 July 2020

    Functions. Creating functions in a programming language allows us to reuse code in many contexts and to solve new problems. We will explore how to do this in Python so that we better understand the code we will be using moving forward.

    Class at 1300 UTC (1600 Finland time)

    Assignment 2 due one hour prior to the start of class (1200 UTC, 1500 Finland Time).

    Session 4 – 30 July 2020

    Data Frames and Pandas. We will practice importing and utilizing data in Python. This is the basis for being able to conduct analysis in Python.

    Class at 1300 UTC (1600 Finland time)

    Assignment 3 due one hour prior to the start of class (1200 UTC, 1500 Finland Time).

    Session 5 – 31 July 2020

    Regular expressions and text analysis. Sometimes it is advantageous to be able to process text into quantifiable information. Regex gives us the capability to transform text and quickly extract patterns from raw data.

    Class at 1300 UTC (1600 Finland time)

    Assignment 4 due one hour prior to the start of class (1200 UTC, 1500 Finland Time).

    Read Numsense! Chapter 1

    Session 6 – 3 Aug 2020

    Plotting in Python. We will create visuals using Python to be able to supplement the stories that we tell with data through visual media.

    Class at 1300 UTC (1600 Finland time)

    Assignment 5 due one hour prior to the start of class (1200 UTC, 1500 Finland Time).

    Session 7 – 4 Aug 2020

    Introducing Linear Regression and its implementation in Python. Linear regression provides a jumping-off point for statistical analysis, and gives us a chance to prepare our data for analysis.

    Class at 1300 UTC (1600 Finland time)

    Assignment 6 due one hour prior to the start of class (1200 UTC, 1500 Finland Time).

    Read Numsense! Chapter 6

    Session 8 – 5 Aug 2020

    Classification and Regression Trees. Decision trees will give us a chance to discuss machine learning and why it differs from regression analysis.

    Class at 1300 UTC (1600 Finland time)

    Assignment 7 and project proposal due one hour prior to the start of class (1200 UTC, 1500 Finland Time).

    Read Numsense! Chapter 9

    Session 9 – 6 Aug 2020

    Random Forests and ensemble methods. Ensemble methods provide improved accuracy and robustness relative to single machine learning models. We will explore these properties through random forest models.

    Class at 1300 UTC (1600 Finland time)

    Assignment 8 due one hour prior to the start of class (1200 UTC, 1500 Finland Time).

    Read Numsense! Chapter 10

    Session 10 – 7 Aug 2020

    Clustering models. We will explore unsupervised learning through the k-means clustering algorithm, and learn how to identify groups of observations within data, both as a tool for prediction and as a way to better understand the available data.

    Class at 1300 UTC (1600 Finland time)

    Assignment 9 due one hour prior to the start of class (1200 UTC, 1500 Finland Time).

    Read Numsense! Chapter 2

    Session 11 – 10 Aug 2020

    Cross-Validation. We want our models to work in the real world. Using cross-validation, we can use our data to mimic the real world and ensure that, to the best of our ability, our data practices represent the events that we expect to encounter as we implement our models.

    Class at 1300 UTC (1600 Finland time)

    Assignment 10 due one hour prior to the start of class (1200 UTC, 1500 Finland Time).

    Session 12 – 11 Aug 2020

    Web scraping allows an analyst to collect data from nearly any resource that can be accessed online. This powerful tool allows for the examination of complex problems and the creative collection of resources to address many different needs.

    Class at 1300 UTC (1600 Finland time)

    Assignment 11 due one hour prior to the start of class (1200 UTC, 1500 Finland Time).

    Session 13 – 12 Aug 2020

    Where possible, the use of Web APIs to streamline data collection is a valuable tool. Data collected by API is typically clean and standardized, unlike the data that is collected through web scraping.

    Class at 1300 UTC (1600 Finland time)

    Assignment 12 due one hour prior to the start of class (1200 UTC, 1500 Finland Time).

    Read Numsense! Chapter 5

    Session 14 – 13 Aug 2020

    Project Workday. We will use the time that we have today to finalize our projects and presentations for the last day of class.

    Class at 1300 UTC (1600 Finland time)

    Assignment 13 due one hour prior to the start of class (1200 UTC, 1500 Finland Time).

    Session 15 – 14 Aug 2020

    Project presentations. Each student will present a brief summary of a research question they have answered during the term, and policy implications from the results that they have uncovered.

    Class at 1300 UTC (1600 Finland time)

    Project Presentation and Writeup due one hour prior to the start of class (1200 UTC, 1500 Finland Time).
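
To give a flavor of the Session 5 topic, here is a minimal sketch of using a regular expression to turn raw text into structured records. The sample lines, the pattern, and the function name `extract_orders` are all invented for illustration and are not taken from the course assignments. The sketch relies only on Python's built-in `re` module.

```python
import re

# Hypothetical raw text, invented purely for illustration.
raw_notes = [
    "Order 1042 shipped on 2020-07-31, total EUR 129.50",
    "Order 1043 shipped on 2020-08-03, total EUR 18.00",
    "Order 1044 cancelled",
]

# Named groups capture the order number, the ISO-style date, and the amount.
pattern = re.compile(
    r"Order (?P<order_id>\d+) shipped on (?P<date>\d{4}-\d{2}-\d{2}), "
    r"total EUR (?P<amount>\d+\.\d{2})"
)


def extract_orders(lines):
    """Return one dict per line that matches the shipping pattern."""
    records = []
    for line in lines:
        match = pattern.search(line)
        if match is None:
            continue  # skip lines that do not describe a shipment
        records.append({
            "order_id": int(match.group("order_id")),
            "date": match.group("date"),
            "amount": float(match.group("amount")),
        })
    return records


orders = extract_orders(raw_notes)
for order in orders:
    print(order)
```

The third sample line does not match the pattern, so it is skipped rather than raising an error. This is one common way to handle messy text, though other choices (such as logging the unmatched lines) may suit a given project better.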


    Assignments

    Assignments will be completed in Mimir, with one assignment corresponding to each of the topics covered in class. This makes for 13 assignments in total.

    Project

    Each student will be asked to find a research question based on data from the European Data Portal. Using the tools covered in class, each student will address their research question in a brief written report, and prepare a short presentation to be given on the final day of class. This project is intended to provide students the opportunity to showcase their learning through this course in a way that can be discussed in job interviews and other contexts where data analysis is a valuable skill.

Assessment Methods and Criteria
  • Applies in this implementation:

    Grading

    Course Requirements and Values        Weighting (%) or maximum points

    Homework Assignments (aggregated)     60
    Project                               20
    Project Presentation                  10
    Discussion and Participation          10
    Total                                 100

    Conversion scale                      Final grade (official scale)

    90 - 100                              5
    80 - 89                               4
    70 - 79                               3
    60 - 69                               2
    50 - 59                               1
    0 - 49                                0
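
The weighting and conversion scale above can be sketched in Python as follows. This is only an illustration of the arithmetic. The component scores in the example are hypothetical, the function names are made up, and the handling of fractional scores (no rounding, so 89.5 falls in the 80 - 89 band) is an assumption, since the syllabus does not say how such scores are treated.

```python
# Weights (in percent) taken from the grading table.
WEIGHTS = {
    "homework": 60,
    "project": 20,
    "presentation": 10,
    "participation": 10,
}

# Lower bound of each band in the conversion scale, paired with the final grade.
SCALE = [(90, 5), (80, 4), (70, 3), (60, 2), (50, 1), (0, 0)]


def course_score(component_percentages):
    """Combine per-component results (each 0-100) into a 0-100 course score."""
    return sum(
        WEIGHTS[name] * component_percentages[name] / 100 for name in WEIGHTS
    )


def final_grade(score):
    """Map a 0-100 course score to the 0-5 scale (assumption: no rounding)."""
    for lower_bound, grade in SCALE:
        if score >= lower_bound:
            return grade
    raise ValueError("score must not be negative")


# Hypothetical student results, for illustration only.
example = {"homework": 85, "project": 90, "presentation": 80, "participation": 100}
score = course_score(example)
print(score, final_grade(score))
```

For the hypothetical student, the weighted score is 51 + 18 + 8 + 10 = 87, which falls in the 80 - 89 band.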

     


    Grading Methods

    I will specify the criteria for each assignment, as well as the point value assigned to each of the criteria. Because the assignments will each focus on different types of problems, these criteria will differ from assignment to assignment. Each assignment will be of APPROXIMATELY the same value, though some assignments will be worth slightly more or less than others.

    I will also provide specific grading details for the course project on the project description document.

Workload
  • Applies in this implementation:

     

    ECTS Student Workload (Number of Hours)

    Faculty-led engagement: 45
    (May include synchronous sessions and asynchronous interaction, e.g. viewing recorded lectures, distance teamwork and other peer interaction such as threaded discussions.)

    Self-study hours: 115
    (May include acquisition of content and assignment completion.)
      Work with course materials, e.g. required reading: 40
      Exam preparation: 0
      Individual research & writing: 50
      Team projects (meetings, research, preparation, etc.): 25

    Total of all student workload hours: 160


DETAILS

Study Material
  • Applies in this implementation:

    1. Numsense! Data Science for the Layman: No Math Added (by Annalyn Ng and Kenneth Soo)
    2. Python Data Science Handbook (by Jake VanderPlas)
    Note: Numsense must be purchased, but the Python Data Science Handbook can be obtained at no cost through the link above. The overall cost of materials for the course should be VERY low.

Prerequisites
  • Valid 01.08.2020-31.07.2022:

    none

Registration for Courses
  • Valid 01.08.2020-31.07.2022:

    The course is only for the Mikkeli Campus students and the registration is done at the Mikkeli study office.

  • Applies in this implementation:

    Academic Integrity

    If I find that you have plagiarized, been dishonest in completing your assignments, or cheated on an exam or assignment, then I reserve the right to award you no points on the entire exam, project, or assignment and to report the behavior to the university. I also reserve the right to award a failing grade, independent of your score on other assignments. Academic integrity is essential to education, and I take it very seriously.

    Mimir Software


    Coding exercises will form the entirety of the homework assignments. These assignments will be completed through the Mimir Classroom web application. This application provides access to a virtual machine that can run all of the code you will need to implement for this course. It will run on any machine that is able to use Google Chrome. Other browsers may be sufficient, as well, but my experience suggests that Google Chrome will be the most compatible.

    Using Mimir Classroom will provide you near-instant feedback on your code exercises, and will also provide you the opportunity to submit your code as many times as you would like, so that you can keep practicing until you get the problem right. This will help your grade almost as much as it will help you to learn to code!

    You will receive an email invitation just before the beginning of the course, so that you can register for our classroom. I will walk you through the application on the first day of class.

