Python and JupyterHub
Python Basics
This course will make use of Python notebooks. Thus, some basic familiarity with the programming language Python, or the ability to acquire it independently, is a prerequisite for succeeding in this course. We are aware that some students have never touched Python. For that reason, the first week is designed to help you get familiar with the language and the notebook structure.
First, we recommend you take a look at this Python Quickstart. Do not worry (yet) if this seems over your head. This exercise is not about understanding the subtleties of code, but about giving you a flavour of what Data Science is and what kind of tools we will be using. Also, you can ignore the installation part. We do not require you to install any software on your computer, but will provide you with a browser-based Jupyter environment (more on this below).
Second, we ask you to complete two DataCamp online Python courses. Self-studying the Python basics on your own schedule is the most efficient way to get started and will save the tutorial sessions for more complex topics (more on this under the Week 1 tab). But for now, continue to the JupyterHub section below.
JupyterHub
The course requires coding assignments to be completed.
For working on the coding assignments, we will provide you with a web-based Python environment called "JupyterHub". This environment allows you to work with Python notebooks using only a web browser and your Aalto account (the same username and password that you use for logging into MyCourses). If you have taken CS courses, such as "Machine Learning: Basic Principles" or "Deep Learning", you will be familiar with JupyterHub. To avoid frustration, both experienced and new users need to understand what JupyterHub is and what it is not:
What JupyterHub is:
- a pre-configured Python environment that frees you from the task of installing and managing Python packages (e.g. pandas, numpy, sklearn) on your own computer
- a convenient tool that helps students who are new to programming focus on the Data Science tools instead of spending time on Python installations
- an easy way for teachers to publish assignments, tutorials, and the exam to all students at once
- a taste of what you will find at your future workplace
What JupyterHub is not:
- a computationally powerful work environment
- a stable work environment (the service will sometimes crash or be down for maintenance)
- a tool for writing a master's thesis! (seriously!)
You should see JupyterHub as a nice-to-have tool that can make your life easier. It is perfectly fine to do all assignments, including the exam, on the web-based JupyterHub. However, if you want to learn how to manage your own Python installation and get a feel for the "backend" side of Data Science, you may want to explore further. We encourage all students to install Python on their own computers and manage packages via Anaconda. However, as the course schedule is already tight, we do not provide any technical support for these endeavours. That's why we offer the JupyterHub.
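Whether you work on JupyterHub or on your own installation, it can help to confirm that the packages mentioned above are importable before you start an assignment. The following is a minimal sketch, not an official course tool: the helper name check_packages and the list COURSE_PACKAGES are made up for illustration, and the list only contains the three packages named in this guide.

```python
from importlib import util

# Packages named in this guide (an illustrative list, not the full course setup).
# Note that scikit-learn is imported under the name "sklearn".
COURSE_PACKAGES = ["pandas", "numpy", "sklearn"]


def check_packages(names):
    """Return a dict mapping each package name to True if it can be imported."""
    return {name: util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, available in check_packages(COURSE_PACKAGES).items():
        status = "available" if available else "MISSING"
        print(f"{name}: {status}")
```

You can paste this into a notebook cell. A package reported as MISSING on a self-managed installation would need to be installed first, for example via Anaconda.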
That being said, let's go over the typical workflow for completing a coding assignment (the workflow looks different if you manage your own Python environment):
- Once the assignment is published, fetch it into your home folder (on JupyterHub)
- Complete the exercises inside the notebook
- Download the completed notebook (file type .ipynb) to your computer
- Upload the file to the MyCourses submission box
Note: the JupyterHub "submit" function is NOT used in this course. You have to submit all coding assignments and exams to a usual MyCourses submission box.
What is JupyterHub?
JupyterHub is a server which hosts Jupyter Notebooks. In this course, it is used to share tutorials and publish assignments. It currently has capacity for about 500 concurrent users.
Okay, what next?
Go to the site
JupyterHub is hosted at https://jupyter.cs.aalto.fi/.
Login
When the page has loaded, you should be greeted with a login page. Use your Aalto credentials to log in.
Choose the correct module
Next, a Spawner Options page will come up. Choose "30E03000 Data Science for Business I 2020" and press "Spawn". JupyterHub will launch a session for you. Depending on the load on the server, this may take up to a couple of minutes (but usually just seconds).
Home Page
Once your session has started, you will see the screen below. If you have never worked with Aalto's JupyterHub, the "Files" tab will be empty. If you have used JupyterHub in other courses, you might see your old files, which is totally OK. Next, click the "Assignments" tab to see your assignments.
Fetching the assignments
In the Assignments tab, you see a list of assignments available to you. As the course goes on, more assignments will be available.
Check what assignments are available under the "Released assignments" header and click "Fetch". The assignment will now be shown under the "Downloaded assignments" header.
Start working
Go back to the "Files" tab. A new folder named "dsfb2020a" will have appeared. Open it.
A folder named similarly to the fetched assignment will have appeared (if not, refresh the page in your browser). Within the folder you will find the assignment exercise notebook that you are required to complete. Further, the folder usually contains the relevant data files and in some cases images/banners.
Congrats, you are all set and ready to start working on the assignment. Good luck!
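If you prefer to check from a notebook cell what a fetch has put on disk, the folder structure described above can be listed with a few lines of Python. This is only a sketch under assumptions: it supposes that the "dsfb2020a" folder sits directly in your home directory, and the helper name list_assignment_files is made up for illustration.

```python
from pathlib import Path


def list_assignment_files(course_dir):
    """Group the files below course_dir into notebooks and other files."""
    course_dir = Path(course_dir)
    notebooks, others = [], []
    for path in sorted(course_dir.rglob("*")):
        # Skip directories and Jupyter's automatic checkpoint copies.
        if not path.is_file() or ".ipynb_checkpoints" in path.parts:
            continue
        relative = path.relative_to(course_dir)
        if path.suffix == ".ipynb":
            notebooks.append(relative)
        else:
            others.append(relative)
    return notebooks, others


if __name__ == "__main__":
    # Assumed location: the course folder directly under the home directory.
    course_dir = Path.home() / "dsfb2020a"
    if course_dir.is_dir():
        notebooks, others = list_assignment_files(course_dir)
        print("Notebooks:", *notebooks, sep="\n  ")
        print("Data and other files:", *others, sep="\n  ")
    else:
        print(f"{course_dir} not found; fetch an assignment first.")
```

If the listing is empty right after a fetch, refreshing the "Files" tab as described above is still the first thing to try.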
For the final exam and some assignments you may be asked to submit your code to MyCourses.
Download your notebook (not needed for tutorials)
After you have completed the assignment, you can download it to your computer. Mark the file (1.) and then click "download" (2.):
Note: JupyterHub shows a submission function under the "Assignments" tab. However, we are NOT using it in this course (the feature is still in beta stage). Submitting your assignment in JupyterHub is NOT a valid submission!
Submitting your notebook
Submit the downloaded .ipynb file to the correct MyCourses submission box. This step is identical to what you are used to from all other courses.
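Before uploading, you may want to make sure that the downloaded file is a complete notebook and not, say, a truncated download or a saved web page. An .ipynb file is a JSON document, so a rough check is possible with the standard library alone. The sketch below is not part of the course tooling: the helper name check_notebook is made up, and it assumes the current notebook format, in which the cells are stored under a top-level "cells" entry.

```python
import json
from pathlib import Path


def check_notebook(path):
    """Return a list of problems found in a downloaded notebook (empty if none)."""
    path = Path(path)
    problems = []
    if path.suffix != ".ipynb":
        problems.append(f"unexpected file type: {path.suffix or '(none)'}")
    try:
        notebook = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError) as error:
        # Covers missing files, undecodable bytes, and invalid JSON.
        problems.append(f"cannot read file as JSON: {error}")
        return problems
    if not isinstance(notebook, dict) or "cells" not in notebook:
        problems.append("no 'cells' entry; this does not look like a notebook")
    elif not notebook["cells"]:
        problems.append("the notebook contains no cells")
    return problems
```

For example, check_notebook("assignment1.ipynb") (a hypothetical file name) returns an empty list when the file looks fine. The check says nothing about whether your answers are correct; it only catches broken or empty files.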