Please note! The course description is confirmed for two academic years (1.8.2018-31.7.2020), which means that, in general, the learning outcomes, assessment methods, and key content stay unchanged. However, the course syllabus may specify or change how each realization of the course is executed, such as how the contact sessions are organized, how the assessment methods are weighted, or which materials are used.
LEARNING OUTCOMES
Learning outcomes for this course, upon successful completion, include the ability to: 1) understand principles of programming using the Python programming language, 2) use Python to collect data from various sources for analysis, 3) employ Python for data cleaning, 4) implement statistical and predictive models in Python using business data, 5) understand how to choose the correct statistical or predictive model based on the available data and business context, and 6) understand how the information resulting from data analysis leads to improved business decision-making.
Credits: 6
Schedule: 26.07.2021 - 13.08.2021
Teacher in charge (valid 01.08.2020-31.07.2022): Joan Lofgren
Teacher in charge (applies in this implementation): Dustin White
Contact information for the course (valid 24.06.2020-21.12.2112): We are meeting remotely, but I want to hear from you!
Please feel free to reach out at dusty.white@aalto.fi, or to reach out through the Discussion Boards here on MyCourses. Depending on where you are in the world, I might not be able to respond immediately due to time differences, but I will make every effort to respond as promptly as possible!
The course will run each weekday during our session from 1300 UTC to approximately 1600 UTC. This course will be fast paced, with new content each class period. You will need to be present for each lecture. Subsequent material will build off the material that we cover in each class. We will be learning remotely, but we will be meeting through Zoom, so that we can interact as if we were in the same place.
Class will consist of two portions: 1) Lecture, and 2) Office hours. Office hours (or lab time as I will call it) will take place during the 60-90 minutes immediately following the daily lecture. All portions of the course will occur via Zoom. During lab, you will be able to break out into small groups to work together on the material for a given class period. I will spend the time answering questions and checking in with your groups as you work through exercises related to the course material.
CEFR level (applies in this implementation):
Language of instruction and studies (valid 01.08.2020-31.07.2022):
Teaching language: English
Languages of study attainment: English
CONTENT, ASSESSMENT AND WORKLOAD
Content
Valid 01.08.2020-31.07.2022:
This course is intended to introduce the student to programming languages as tools for conducting data analysis, focusing on Python in particular. The course will cover basic principles of programming languages, as well as libraries useful in collecting, cleaning and analyzing data in order to answer research questions. Students will learn to use Python to apply forecasting tools and predictive models to business settings. The course will be divided between lecture and lab time, and labs will be focused on teaching students how to implement the programming techniques and statistical models discussed in lectures.
Applies in this implementation:
Course Schedule
Session 1 – 27 July 2020
Introduction to using Python. We will cover opening notebooks, and basic functions in Python.
Class at 1300 UTC (1600 Finland time)
No reading or assignments due
Session 2 – 28 July 2020
Loops and Conditions. We will focus on creating logical conditions for our programs to meet, as well as looping through code to streamline repeated processes.
Class at 1300 UTC (1600 Finland time)
Assignment 1 due one hour prior to the start of class (1200 UTC, 1500 Finland Time).
Session 3 – 29 July 2020
Functions. Creating functions in a programming language allows us to reuse code in many contexts and to solve new problems. We will explore how to do this in Python so that we better understand the code we will be using moving forward.
Class at 1300 UTC (1600 Finland time)
Assignment 2 due one hour prior to the start of class (1200 UTC, 1500 Finland Time).
Session 4 – 30 July 2020
Data Frames and Pandas. We will practice importing and utilizing data in Python. This is the basis for being able to conduct analysis in Python.
Class at 1300 UTC (1600 Finland time)
Assignment 3 due one hour prior to the start of class (1200 UTC, 1500 Finland Time).
Session 5 – 31 July 2020
Regular Expression and text analysis. Sometimes it is advantageous to be able to process text into quantifiable information. Regex provides us the capability to transform text and quickly extract patterns from raw data.
Class at 1300 UTC (1600 Finland time)
Assignment 4 due one hour prior to the start of class (1200 UTC, 1500 Finland Time).
Read Numsense! Chapter 1
Session 6 – 3 Aug 2020
Plotting in Python. We will create visuals using Python to be able to supplement the stories that we tell with data through visual media.
Class at 1300 UTC (1600 Finland time)
Assignment 5 due one hour prior to the start of class (1200 UTC, 1500 Finland Time).
Session 7 – 4 Aug 2020
Introducing Linear Regression and its implementation in Python. Linear regression provides a jumping-off point for statistical analysis, and gives us a chance to prepare our data for analysis.
Class at 1300 UTC (1600 Finland time)
Assignment 6 due one hour prior to the start of class (1200 UTC, 1500 Finland Time).
Read Numsense! Chapter 6
Session 8 – 5 Aug 2020
Classification and Regression Trees. Decision trees will give us a chance to discuss machine learning and why it differs from regression analysis.
Class at 1300 UTC (1600 Finland time)
Assignment 7 and project proposal due one hour prior to the start of class (1200 UTC, 1500 Finland Time).
Read Numsense! Chapter 9
Session 9 – 6 Aug 2020
Random Forests and ensemble methods. Ensemble methods provide improved accuracy and robustness relative to single machine learning models. We will explore these properties through random forest models.
Class at 1300 UTC (1600 Finland time)
Assignment 8 due one hour prior to the start of class (1200 UTC, 1500 Finland Time).
Read Numsense! Chapter 10
Session 10 – 7 Aug 2020
Clustering models. We will explore unsupervised learning through the k-means clustering algorithm, and learn about trying to identify various groups of observations within data, both as a tool for prediction, as well as for better understanding the available data.
Class at 1300 UTC (1600 Finland time)
Assignment 9 due one hour prior to the start of class (1200 UTC, 1500 Finland Time).
Read Numsense! Chapter 2
Session 11 – 10 Aug 2020
Cross-Validation. We want our models to work in the real world. Using cross-validation, we can use our data to mimic the real world and ensure that, to the best of our ability, our data practices represent the events that we expect to encounter as we implement our models.
Class at 1300 UTC (1600 Finland time)
Assignment 10 due one hour prior to the start of class (1200 UTC, 1500 Finland Time).
Session 12 – 11 Aug 2020
Web scraping allows an analyst to collect data from nearly any resource that can be accessed online. This powerful tool allows for the examination of complex problems and the creative collection of resources to address many different needs.
Class at 1300 UTC (1600 Finland time)
Assignment 11 due one hour prior to the start of class (1200 UTC, 1500 Finland Time).
Session 13 – 12 Aug 2020
Where possible, the use of Web APIs to streamline data collection is a valuable tool. Data collected by API is typically clean and standardized, unlike the data that is collected through web scraping.
Class at 1300 UTC (1600 Finland time)
Assignment 12 due one hour prior to the start of class (1200 UTC, 1500 Finland Time).
Read Numsense! Chapter 5
Session 14 – 13 Aug 2020
Project Workday. We will use the time that we have today to finalize our projects and presentations for the last day of class.
Class at 1300 UTC (1600 Finland time)
Assignment 13 due one hour prior to the start of class (1200 UTC, 1500 Finland Time).
Session 15 – 14 Aug 2020
Project presentations. Each student will present a brief summary of a research question they have answered during the term, and policy implications from the results that they have uncovered.
Class at 1300 UTC (1600 Finland time)
Project Presentation and Writeup due one hour prior to the start of class (1200 UTC, 1500 Finland Time).
Assignments
Assignments will be completed in Mimir, with one assignment corresponding to each of the topics covered in class. This makes for 13 assignments in total.
Project
Each student will be asked to find a research question based on data from the European Data Portal. Using the tools covered in class, each student will address their research question in a brief written report, and prepare a short presentation to be given on the final day of class. This project is intended to provide students the opportunity to showcase their learning through this course in a way that can be discussed in job interviews and other contexts where data analysis is a valuable skill.
Assessment Methods and Criteria
Applies in this implementation:
Grading
Course Requirements and Values: Weighting (%) or maximum points
Homework Assignments (aggregated): 60
Project: 20
Project Presentation: 10
Discussion and Participation: 10
Total: 100
Conversion scale: Final grade (official scale)
90 - 100: 5
80 - 89: 4
70 - 79: 3
60 - 69: 2
50 - 59: 1
0 - 49: 0
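As a rough illustration of how the weighting and the conversion scale fit together, the following Python sketch combines four component scores into a course total and maps that total to a final grade. The weights and grade bands come from the tables above. Everything else is an assumption made for the example: the names `WEIGHTS`, `GRADE_BANDS`, `weighted_total`, and `final_grade` are hypothetical, each component is assumed to be scored from 0 to 100, and fractional totals are not rounded (so 89.5 would map to a 4), which the syllabus does not specify.

```python
# Weights from the grading table (percent of the final score).
WEIGHTS = {
    "homework": 60,
    "project": 20,
    "presentation": 10,
    "participation": 10,
}

# Lower bound of each band on the conversion scale, highest band first.
GRADE_BANDS = [(90, 5), (80, 4), (70, 3), (60, 2), (50, 1), (0, 0)]


def weighted_total(scores):
    """Combine component scores (assumed 0-100 each) into a 0-100 total."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def final_grade(total):
    """Map a 0-100 course total onto the 0-5 official scale.

    Assumption: no rounding is applied before the lookup.
    """
    for lower_bound, grade in GRADE_BANDS:
        if total >= lower_bound:
            return grade
    return 0


# Hypothetical scores, for illustration only.
example_scores = {
    "homework": 85,
    "project": 90,
    "presentation": 80,
    "participation": 100,
}
total = weighted_total(example_scores)
print(total, final_grade(total))  # prints: 87.0 4
```

With these made-up scores, the homework component contributes 51 of the 87 points, which shows how heavily the assignments weigh on the final grade.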
Grading Methods
I will specify the criteria for each assignment, as well as the point value assigned to each of the criteria. Because the assignments will each focus on different types of problems, these criteria will differ from assignment to assignment. Each assignment will be of APPROXIMATELY the same value, though some assignments will be worth slightly more or less than others.
I will also provide specific grading details for the course project on the project description document.
Workload
Applies in this implementation:
ECTS Student Workload: Number of Hours
Faculty-led engagement (May include synchronous sessions and asynchronous interaction, eg viewing recorded lectures, distance teamwork and other peer interaction such as threaded discussions.): 45
Self-study hours (May include acquisition of content and assignment completion.): 115
Work with course materials, eg required reading: 40
Exam preparation: 0
Individual research & writing: 50
Team projects (meetings, research, preparation, etc.): 25
Total of all student workload hours: 160
DETAILS
Study Material
Applies in this implementation:
Note: Numsense must be purchased, but the Python Data Science Handbook can be obtained at no cost through the link above. The overall cost of materials for the course should be VERY low.
Prerequisites
Valid 01.08.2020-31.07.2022:
none
Registration for Courses
Valid 01.08.2020-31.07.2022:
The course is open only to Mikkeli Campus students, and registration is done at the Mikkeli study office.
Applies in this implementation:
Academic Integrity
If I find that you have plagiarized, been dishonest in completing your assignments, or cheated on an exam or assignment, then I reserve the right to award you no points on the entire exam, project, or assignment and to report the behavior to the university. I also reserve the right to award a failing grade, independent of your score on other assignments. Academic integrity is essential to education, and I take it very seriously.
Mimir Software
Coding exercises will form the entirety of the homework assignments. These assignments will be completed through the Mimir Classroom web application. This application provides access to a virtual machine that can run all of the code you will need to implement for this course. It will run on any machine that is able to use Google Chrome. Other browsers may be sufficient, as well, but my experience suggests that Google Chrome will be the most compatible.
Using Mimir Classroom will provide you near-instant feedback on your code exercises, and will also provide you the opportunity to submit your code as many times as you would like, so that you can keep practicing until you get the problem right. This will help your grade almost as much as it will help you to learn to code!
You will receive an email invitation just before the beginning of the course, so that you can register for our classroom. I will walk you through the application on the first day of class.