Please note! The course description is confirmed for two academic years, which means that, in general, the learning outcomes, assessment methods and key content stay unchanged. However, the course syllabus may specify or change how each realization of the course is carried out, such as how the contact sessions are organized, how the assessment methods are weighted or which materials are used.
LEARNING OUTCOMES
Learning outcomes for this course, upon successful completion, include the ability to: 1) understand principles of programming using the Python programming language, 2) use Python to collect data from various sources for analysis, 3) employ Python for data cleaning, 4) implement statistical and predictive models in Python using business data, 5) understand how to choose the correct statistical or predictive model based on the available data and business context, and 6) understand how the information resulting from data analysis leads to improved business decision-making.
Credits: 6
Schedule: 04.01.2021 - 22.01.2021
Teacher in charge (valid 01.08.2020-31.07.2022): Joan Lofgren
Teacher in charge (applies in this implementation): Dustin White
Contact information for the course (valid 07.12.2020-21.12.2112):
Dustin R White, PhD
Assistant Professor of Economics, University of Nebraska at Omaha
I study sports, labor, and health economics, and teach
lots of data-driven coursework. While I grew up in the Seattle, WA area, I also
lived for two years in southern Brazil, and speak Spanish and Portuguese
(though I am a bit out of practice!). I love learning languages, and I have
caught EVERY SINGLE Pokémon!
Email: drwhite@unomaha.edu
Office hours/lab time will take place during the
60-90 minutes immediately following the daily lecture. All portions of the
course will occur via Zoom. During lab, you will be able to break out into
small groups to work together on the material for a given class period. I will
spend the time answering questions and checking in with your groups as you work
through exercises related to the course material.
CEFR level (applies in this implementation):
Language of instruction and studies (valid 01.08.2020-31.07.2022):
Teaching language: English
Languages of study attainment: English
CONTENT, ASSESSMENT AND WORKLOAD
Content
Valid 01.08.2020-31.07.2022:
This course is intended to introduce the student to programming languages as tools for conducting data analysis, focusing on Python in particular. The course will cover basic principles of programming languages, as well as libraries useful in collecting, cleaning and analyzing data in order to answer research questions. Students will learn to use Python to apply forecasting tools and predictive models to business settings. The course will be divided between lecture and lab time, and labs will be focused on teaching students how to implement the programming techniques and statistical models discussed in lectures.
Applies in this implementation:
Session 1 –
4 Jan 2021
Introduction to using Python. We will cover opening notebooks and basic functions in Python.
Class at 1200 UTC (1500 Finland time)
No reading or assignments due
Session 2 –
5 Jan 2021
Loops and Conditions. We will focus on creating logical conditions for our programs to meet, as well as looping through code to streamline repeated processes.
Class at 1200 UTC (1500 Finland time)
Assignment 1 due one hour prior to the start of class (1100 UTC, 1400 Finland Time).
Wednesday, Jan 6: Epiphany
No course activities
Session 3 –
7 Jan 2021
Functions. Creating functions in a programming language allows us to reuse code in many contexts and to solve new problems. We will explore how to do this in Python so that we better understand the code we will be using moving forward.
Class at 1200 UTC (1500 Finland time)
Assignment 2 due one hour prior to the start of class (1100 UTC, 1400 Finland Time).
Session 4 –
8 Jan 2021
Data Frames and Pandas. We will practice importing and utilizing data in Python. This is the basis for being able to conduct analysis in Python.
Class at 1200 UTC (1500 Finland time)
Assignment 3 due one hour prior to the start of class (1100 UTC, 1400 Finland Time).
Session 5 –
11 Jan 2021
Regular Expressions and text analysis. Sometimes it is advantageous to be able to process text into quantifiable information. Regex gives us the capability to transform text and quickly extract patterns from raw data.
Class at 1200 UTC (1500 Finland time)
Assignment 4 due one hour prior to the start of class (1100 UTC, 1400 Finland Time).
Read Numsense! Chapter 1
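As a rough illustration of the Session 5 topic (not part of the official course materials), the sketch below uses Python's built-in re module to turn free text into numbers. The sample sentences, the pattern, and the function name extract_prices are all invented for this example.

```python
import re

# Hypothetical raw text; these reviews are invented for illustration.
reviews = [
    "Paid $12.50 for lunch, would return.",
    "Great coffee at $3 a cup!",
    "No prices listed on the menu.",
]

# A dollar sign followed by digits, with an optional decimal part.
price_pattern = re.compile(r"\$(\d+(?:\.\d+)?)")


def extract_prices(text):
    """Return every dollar amount found in text as a float."""
    return [float(match) for match in price_pattern.findall(text)]


prices = [extract_prices(review) for review in reviews]
print(prices)
```

Each review becomes a list of numbers, and a review with no price yields an empty list, which is often easier to analyze than the original text.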
Session 6 –
12 Jan 2021
Plotting in Python. We will create visuals using Python to supplement the stories that we tell with data.
Class at 1200 UTC (1500 Finland time)
Assignment 5 due one hour prior to the start of class (1100 UTC, 1400 Finland Time).
Session 7 –
13 Jan 2021
Introducing Linear Regression and its implementation in Python. Linear regression provides a jumping-off point for statistical analysis, and gives us a chance to prepare our data for analysis.
Class at 1200 UTC (1500 Finland time)
Assignment 6 due one hour prior to the start of class (1100 UTC, 1400 Finland Time).
Read Numsense! Chapter 6
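To give a feel for the Session 7 topic, here is a minimal sketch of least-squares regression with a single explanatory variable, written in plain Python. The data and the function name fit_line are invented for illustration; the course itself may well use a statistics library rather than hand-written formulas.

```python
# Hypothetical data: advertising spend (x) and sales (y), invented for illustration.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]


def fit_line(x, y):
    """Ordinary least squares with one regressor: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # slope = sum of cross-deviations / sum of squared deviations of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(ad_spend, sales)
print(f"sales = {intercept:.2f} + {slope:.2f} * ad_spend")
```

The slope is read as the expected change in sales for a one-unit increase in advertising spend, which is the kind of interpretation a business analysis typically needs.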
Session 8 –
14 Jan 2021
Classification and Regression Trees. Decision trees will give us a chance to discuss machine learning and why it differs from regression analysis.
Class at 1200 UTC (1500 Finland time)
Assignment 7 and project proposal due one hour prior to the start of class (1100 UTC, 1400 Finland Time).
Read Numsense! Chapter 9
Session 9 –
15 Jan 2021
Random Forests and ensemble methods. Ensemble methods provide improved accuracy and robustness relative to single machine learning models. We will explore these properties through random forest models.
Class at 1200 UTC (1500 Finland time)
Assignment 8 due one hour prior to the start of class (1100 UTC, 1400 Finland Time).
Read Numsense! Chapter 10
Session 10 –
18 Jan 2021
Clustering models. We will explore unsupervised learning through the k-means clustering algorithm, and learn how to identify groups of observations within data, both as a tool for prediction and as a way to better understand the available data.
Class at 1200 UTC (1500 Finland time)
Assignment 9 due one hour prior to the start of class (1100 UTC, 1400 Finland Time).
Read Numsense! Chapter 2
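The k-means idea from Session 10 can be sketched in a few lines of plain Python: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat. The points, the starting centroids, and the function name kmeans below are assumptions made for illustration, and a library implementation would normally be used in practice.

```python
def kmeans(points, centroids, iterations=10):
    """Run a fixed number of k-means steps on 2-D points.

    `centroids` holds the starting guesses; returns (centroids, labels).
    """
    labels = []
    for _ in range(iterations):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [
            min(
                range(len(centroids)),
                key=lambda k: (p[0] - centroids[k][0]) ** 2
                + (p[1] - centroids[k][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append((
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                ))
            else:
                new_centroids.append(old)  # leave an empty cluster where it was
        centroids = new_centroids
    return centroids, labels


# Hypothetical observations forming two visible groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, labels = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)
print(centroids)
```

No outcome variable is supplied here: the groups come only from the distances between observations, which is what makes the method unsupervised.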
Session 11 –
19 Jan 2021
Cross-Validation. We want our models to work in the real world. Using cross-validation, we can use our data to mimic the real world and ensure that, to the best of our ability, our data practices represent the events that we expect to encounter as we implement our models.
Class at 1200 UTC (1500 Finland time)
Assignment 10 due one hour prior to the start of class (1100 UTC, 1400 Finland Time).
Session 12 –
20 Jan 2021
Web scraping allows an analyst to collect data from nearly any resource that can be accessed online. This powerful tool allows for the examination of complex problems and the creative collection of resources to address many different needs.
Class at 1200 UTC (1500 Finland time)
Assignment 11 due one hour prior to the start of class (1100 UTC, 1400 Finland Time).
Session 13 –
21 Jan 2021
Where possible, the use of Web APIs to streamline data collection is a valuable tool. Data collected by API is typically clean and standardized, unlike the data that is collected through web scraping.
Class at 1200 UTC (1500 Finland time)
Assignment 12 due one hour prior to the start of class (1100 UTC, 1400 Finland Time).
Read Numsense! Chapter 5
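To illustrate why API data tends to need less cleaning, the sketch below parses a JSON payload of the kind a Web API might return. No network request is made: the payload is a hard-coded stand-in, and its field names, values, and the function name win_rates are invented for this example.

```python
import json

# Stand-in for the body of an API response; fields and values are invented.
response_body = """
{
  "results": [
    {"team": "A", "wins": 10, "losses": 6},
    {"team": "B", "wins": 7, "losses": 9}
  ]
}
"""


def win_rates(body):
    """Parse a JSON payload and return each team's share of games won."""
    records = json.loads(body)["results"]
    return {r["team"]: r["wins"] / (r["wins"] + r["losses"]) for r in records}


rates = win_rates(response_body)
print(rates)
```

Because every record arrives with the same named fields, the analysis can start right away, whereas scraped pages usually need pattern matching and cleanup first.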
Session 14 –
22 Jan 2021
Project presentations. Each student will present a brief summary of a research question they have answered during the term, and the policy implications of the results that they have uncovered.
Class at 1200 UTC (1500 Finland time)
Assignment 13 due one hour prior to the start of class (1100 UTC, 1400 Finland Time).
Project Presentation and Writeup due one hour prior to the start of class (1100 UTC, 1400 Finland Time).
Assessment Methods and Criteria
Applies in this implementation:
Mimir Software:
Coding exercises will form the entirety of the homework assignments. These assignments will be completed through the Mimir Classroom web application. This application provides access to a virtual machine that can run all of the code you will need to implement for this course. It will run on any machine that is able to use Google Chrome. Other browsers may be sufficient as well, but my experience suggests that Google Chrome will be the most compatible.
Using Mimir Classroom will provide you near-instant feedback on your code exercises, and will also provide you the opportunity to submit your code as many times as you would like, so that you can keep practicing until you get the problem right. This will help your grade almost as much as it will help you to learn to code!
You will receive an email invitation just before the
beginning of the course, so that you can register for our classroom. I will
walk you through the application on the first day of class.
Workload
Applies in this implementation:
Number of Hours
Faculty-led engagement (may include synchronous sessions and asynchronous interaction, e.g. viewing recorded lectures, distance teamwork and other peer interaction such as threaded discussions): 45
Self-study hours (may include acquisition of content and assignment completion): 115
Work with course materials, e.g. required reading: 40
Exam preparation: 0
Individual research & writing: 50
Team projects (meetings, research, preparation, etc.): 25
Other: 0
Total of all student workload hours: 160
DETAILS
Prerequisites
Valid 01.08.2020-31.07.2022:
none