Introduction to Jupyter

1. What is Jupyter?

Project Jupyter is a collection of open-source web applications (Jupyter Notebook, JupyterLab, and JupyterHub) that allow you to create and share documents containing live code, text, equations, visualizations, images, hyperlinks, and code output. Jupyter supports various programming languages, including Python, R, and Julia, making it an excellent tool for data science, statistical modeling, machine learning, and more.

Jupyter uses the IPYNB ("Interactive Python Notebook") file format. IPYNB files are structured as JSON (JavaScript Object Notation). At the root level they store the notebook format version, kernel and language information, and any other notebook-specific metadata, alongside the list of cells.

The main content of the notebook is organized into cells. Each cell can contain either code, Markdown text, or raw content. Code cells contain executable code snippets. When executed, code cells produce output that can be displayed directly below the cell.

Markdown cells contain formatted text written in Markdown syntax. Markdown cells support various formatting options, including headings, lists, links, images, and more. Markdown cells are rendered as formatted text when the notebook is viewed or exported.
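To make the file structure concrete, here is a minimal sketch of a notebook built by hand as a Python dictionary and serialized to JSON. The cell contents are made up for illustration, and real notebooks written by Jupyter usually carry more metadata (for example kernel information) than shown here.

```python
import json

# A hand-written, minimal notebook structure (notebook format 4).
# The cell contents below are illustrative only.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My first notebook"],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('Hello world')"],
        },
    ],
}

# Serialize to the JSON text that would be stored in a .ipynb file.
ipynb_text = json.dumps(notebook, indent=1)

# Parsing the text again gives back the same structure; list the cell types.
cell_types = [cell["cell_type"] for cell in json.loads(ipynb_text)["cells"]]
print(cell_types)  # ['markdown', 'code']
```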

2. How to work on a *.ipynb file

Aalto Jupyter Hub (strongly recommended)

All Aalto students can log in to Aalto Jupyter Hub with their Aalto credentials. Then select the course, in this case CS-E4730 Computational Social Science (2024), and click on "Start".

[Image: selector for the CSS course]

This will start a Jupyter environment with all the Python libraries you will need in the course. Here you can upload any .ipynb file by clicking the "Upload" button. After the .ipynb file has been uploaded to Aalto Jupyter Hub, you can double-click it to open the notebook.

Another option is installing Jupyter yourself (your own responsibility):

To install Jupyter Notebook on your computer, you can use Python's package manager, pip. Open your terminal or command prompt and run:

pip install notebook

After the installation completes successfully, you still need to install all the Python libraries that you might need to import in the notebook. Note that the TAs on this course will not help troubleshoot your local installation. We recommend that you use Aalto Jupyter Hub.

3. How to submit programming exercises in this course

In the CS-E4730 Computational Social Science course, you will solve some programming exercises in Python. You will find the Jupyter notebooks in A+. You can download the notebooks (*.ipynb files) and then upload them to the CS-E4730 Computational Social Science course space in Aalto Jupyter Hub. The Jupyter notebooks already contain most of the code needed for the task. You will need to read the notebook and implement the missing parts. After you have implemented all the needed parts, go back to A+ and follow the instructions there.

Sometimes you will be asked to upload your completed Jupyter notebook to A+. If this is the case, A+ will run unit tests to check that you have indeed solved the tasks correctly. In some cases, you will not be asked to submit your solved Jupyter notebook directly, but you will be asked to answer some questions on A+. However, answering those questions requires that you have already solved the programming task!

But, before starting, let's review a bit of the Python basics!

4. Using Python in Jupyter

To demonstrate how to work with Python in a Jupyter Notebook, we start with the famous Hello world:
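A minimal sketch of such a first code cell (the variable name message is only chosen for illustration). Running the cell prints the text directly below it.

```python
# Store the greeting in a variable and print it.
message = "Hello world"
print(message)  # Hello world
```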

Python data types

Lists, tuples, and dictionaries

A list is a mutable (changeable) ordered collection of elements. Lists are defined using square brackets [], allow duplicate elements, and can contain elements of different data types. Elements in a list can be accessed by index, starting with 0 (first element).
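A short sketch of these list properties; the values and variable names are made up for illustration.

```python
# Duplicates and mixed data types are allowed in a list.
fruits = ["apple", "banana", "apple", 42]

first = fruits[0]        # indexing starts at 0
fruits[3] = "cherry"     # lists are mutable: replace an element in place
fruits.append("date")    # ... or add a new element at the end

print(first)   # apple
print(fruits)  # ['apple', 'banana', 'apple', 'cherry', 'date']
```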

Tuples are immutable (unchangeable) ordered collections of elements. They are defined using parentheses (), allow duplicate elements, and can contain elements of different data types. Elements in a tuple can be accessed by index, similar to lists. Tuples are often used to store fixed collections of items that should not be modified.
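The following sketch shows indexing into a tuple and what happens when you try to modify one. The values and the changed flag are illustrative assumptions.

```python
# A tuple with a duplicate value and mixed data types.
point = (3, 4, 3, "corner")

x = point[0]  # access by index, just like a list

# Tuples are immutable: assigning to an element raises a TypeError.
try:
    point[0] = 10
    changed = True
except TypeError:
    changed = False

print(x, changed)  # 3 False
```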

Dictionaries are mutable (changeable) collections of key-value pairs and do not allow duplicate keys (whereas a value can appear multiple times). They are defined using curly braces {}. Elements in a dictionary are accessed by key rather than by index. Dictionaries are useful for storing data in a structured format, where each piece of data is associated with a unique identifier (key).

Note: Since Python 3.7, dictionaries preserve insertion order, so the values() method lists the values in the order in which their keys were inserted. This may sometimes look as if the values are sorted by key, especially if the keys are small integers that happened to be inserted in increasing order. However, YOU CANNOT TRUST THIS TO HAPPEN EVERY TIME! Dictionaries are not sorted by key. If you need a particular order, sort the keys explicitly.
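A small sketch of dictionary access and ordering, assuming Python 3.7 or later (where values() follows insertion order). The names and numbers are made up for illustration.

```python
# Keys are inserted in a non-alphabetical order on purpose.
ages = {"carol": 30, "alice": 41}

ages["bob"] = 30     # a value may appear several times
ages["alice"] = 42   # assigning to an existing key overwrites its value

alice_age = ages["alice"]  # access by key, not by position

# values() follows insertion order, not key order.
values_in_insertion_order = list(ages.values())

# To get values ordered by key, sort the keys explicitly.
values_sorted_by_key = [ages[name] for name in sorted(ages)]

print(values_in_insertion_order)  # [30, 42, 30]
print(values_sorted_by_key)       # [42, 30, 30]
```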

Control flow in Python

In Python, the if, elif (else if), and else statements are used for conditional execution of code. They allow you to control the flow of your program based on certain conditions.

The if statement is used to execute a block of code if a certain condition is true.

The elif statement is used to check additional conditions if the preceding if statement or elif statements (if any) evaluate to False. You can have multiple elif statements, each with its own condition. The elif statements are optional and can only appear after an if.

The else statement is used to execute a block of code if none of the preceding conditions (in the if and elif statements) evaluate to True. The else statement is optional and can appear only once in an if-elif-else block.
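As a sketch, the three branches can be combined as follows. The thresholds and labels are arbitrary choices for illustration.

```python
temperature = 15

if temperature < 0:
    description = "freezing"   # runs only if the first condition is True
elif temperature < 20:
    description = "cool"       # checked only if the if condition was False
else:
    description = "warm"       # runs if none of the conditions were True

print(description)  # cool
```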

Loops in Python

In Python, loops are used to iterate over a sequence of elements or to execute a block of code repeatedly. There are two main types of loops in Python: for loops and while loops.

for loops are typically used when you know how many times you want to iterate, or when you want to iterate over the elements of an iterable (such as a list, tuple, string, or dictionary).

while loops are used when you want to execute a block of code repeatedly as long as a condition is true.

In addition to these basic loops, Python also provides a way to control the flow of a loop using statements like break, continue, and else.
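The sketch below shows a for loop, a while loop, and the loop control statements in one place. All variable names are illustrative. Note that the else branch of a loop runs only when the loop finishes without hitting break.

```python
# for loop: iterate over the elements of a list.
squares = []
for number in [1, 2, 3, 4]:
    squares.append(number ** 2)

# while loop: repeat as long as the condition is true.
countdown = []
remaining = 3
while remaining > 0:
    countdown.append(remaining)
    remaining -= 1

# continue skips to the next iteration, break leaves the loop early.
odd_numbers = []
completed = False
for number in range(10):
    if number % 2 == 0:
        continue          # skip even numbers
    if number > 7:
        break             # stop at the first odd number above 7
    odd_numbers.append(number)
else:
    completed = True      # not reached here, because the loop ended via break

print(squares)      # [1, 4, 9, 16]
print(countdown)    # [3, 2, 1]
print(odd_numbers)  # [1, 3, 5, 7]
print(completed)    # False
```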

Functions

Functions are reusable blocks of code that perform a specific task when you call them. They allow you to break down your program into smaller, more manageable pieces, making your code more organized, readable, and maintainable. Functions help promote code reusability, reduce redundancy, and improve the overall structure of your programs.

You define a function using the def keyword followed by the function name and parentheses (). Any input parameters to the function are placed within the parentheses. To use a function, you call it by its name followed by parentheses (). If the function takes parameters, you pass the values for those parameters within the parentheses.

Functions can have zero or more parameters. Parameters are the variables named in the function definition; they receive the values (called arguments) that are passed when the function is called. Parameters can have default values. A default value is specified in the function definition and is used when the caller does not pass a value for that parameter.

Functions can return values using the return statement. This allows functions to send data back to the caller. A function can have multiple return statements, but only one will be executed in each function call. If a function does not explicitly return a value, it implicitly returns None.
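A minimal sketch of defining and calling functions, with a default parameter value, several return statements, and an implicit None. The function names greet and log are hypothetical examples.

```python
def greet(name, greeting="Hello"):
    # Two return statements, but only one runs per call.
    if not name:
        return "Nobody to greet"
    return greeting + ", " + name + "!"


def log(message):
    # No return statement: the function implicitly returns None.
    print(message)


default_greeting = greet("Ada")                # uses the default greeting
custom_greeting = greet("Ada", greeting="Hi")  # overrides the default
empty_greeting = greet("")
log_result = log("done")

print(default_greeting)  # Hello, Ada!
print(custom_greeting)   # Hi, Ada!
print(empty_greeting)    # Nobody to greet
print(log_result)        # None
```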

It is good practice (but not mandatory) to specify the parameter types and the return type of a function using type hints. It is also good practice to document your functions using docstrings. Docstrings are string literals placed immediately after the function header, and they describe what the function does.
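A sketch of a function with type hints and a docstring. The function mean is a hypothetical example, and the list[float] syntax assumes Python 3.9 or later. Type hints are not enforced at run time; they serve as documentation for readers and tools.

```python
def mean(values: list[float]) -> float:
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


average = mean([1.0, 2.0, 3.0])
print(average)       # 2.0
print(mean.__doc__)  # the docstring is available at run time
```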

Generators

Generators in Python are a way to create iterators in a simple and efficient manner. They are functions that generate a sequence of values over time rather than storing all the values in memory at once. This is particularly useful when dealing with large amounts of data or infinite sequences, because values are produced on the fly as they are needed. Generators are commonly used in combination with loops to iterate over sequences of values in a memory-efficient manner.

The key feature of generators is the yield statement. A function that contains a yield statement is a generator function: calling it returns a generator object instead of running the function body right away. Each yield hands a value back to the caller and temporarily suspends the function's execution. When the next value is requested (for example by next() or by a for loop), execution resumes from where it left off.

Here's an example to illustrate how generators work:
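A minimal sketch of a generator; the function name count_up_to is a hypothetical choice for illustration.

```python
def count_up_to(limit):
    """Yield the integers 1, 2, ..., limit one at a time."""
    current = 1
    while current <= limit:
        yield current   # hand back a value and pause here
        current += 1    # execution resumes here on the next request


counter = count_up_to(3)
first = next(counter)    # runs the body until the first yield
second = next(counter)   # resumes after the yield
rest = list(counter)     # consumes the remaining values

# Values are produced one by one, so no full list is stored in memory.
total = sum(count_up_to(100))

print(first, second, rest)  # 1 2 [3]
print(total)                # 5050
```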

Classes and objects

Classes and objects provide a way to model real-world entities and encapsulate their behavior and data. They are fundamental concepts in object-oriented programming (OOP) and are widely used in Python for creating reusable and organized code.

A class is a blueprint for creating objects (instances of a class). A class defines the attributes (data) and methods (functions) that belong to the objects created from it. Objects are instances of classes: each object has its own attribute values, while the methods are defined once in the class and shared by all its instances.

In the following example, we define a class named Car with the class attribute wheels and the instance methods __init__, drive, and honk. The __init__ method is a special method, often called the constructor. It initializes new objects with the provided attributes (make, model, year). The class attribute wheels is shared by all objects of the class (all cars have 4 wheels).
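A sketch of such a Car class. The names Car, wheels, __init__, drive, honk, make, model, and year come from the description above; the method bodies and the example car are assumptions made for illustration.

```python
class Car:
    # Class attribute: shared by every Car object.
    wheels = 4

    def __init__(self, make, model, year):
        # Instance attributes: each object stores its own values.
        self.make = make
        self.model = model
        self.year = year

    def drive(self):
        # Assumed behavior: return a short description.
        return f"The {self.year} {self.make} {self.model} is driving."

    def honk(self):
        # Assumed behavior: return the sound of the horn.
        return "Beep beep!"


my_car = Car("Toyota", "Corolla", 2020)

print(my_car.drive())  # The 2020 Toyota Corolla is driving.
print(my_car.honk())   # Beep beep!
print(my_car.wheels)   # 4
```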

How to read data from files and how to use Python libraries

Python has a vast ecosystem of libraries that cover a wide range of functionalities. Here, we introduce you to some popular Python libraries that are also useful for this course.

In Python, you can work with text data using the built-in file object. You can read a file line by line by calling the file object's readline() method inside a loop, or by iterating over the file object directly.
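A self-contained sketch of reading a text file line by line. To avoid depending on an existing file, it first writes a small example file into a temporary folder; the file name and its contents are made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.txt")

    # Create a small text file to read from.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("first line\nsecond line\nthird line\n")

    # Read with readline() in a loop; an empty string signals end of file.
    lines = []
    with open(path, encoding="utf-8") as handle:
        while True:
            line = handle.readline()
            if not line:
                break
            lines.append(line.rstrip("\n"))

    # Iterating over the file object directly gives the same lines.
    with open(path, encoding="utf-8") as handle:
        same_lines = [line.rstrip("\n") for line in handle]

print(lines)  # ['first line', 'second line', 'third line']
```

Opening files in a with block closes them automatically, even if an error occurs while reading.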

We can also work with JSON data using the built-in json module, which provides functions to parse JSON strings into Python objects (deserialization) and to convert Python objects into JSON strings (serialization).
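A short sketch of both directions using json.loads and json.dumps. The JSON record is made up for illustration.

```python
import json

json_text = '{"name": "Ada", "languages": ["Python", "R"], "active": true}'

# Deserialization: JSON string -> Python objects (dict, list, str, bool, ...).
record = json.loads(json_text)
record["languages"].append("Julia")

# Serialization: Python objects -> JSON string.
round_trip = json.dumps(record, sort_keys=True)

print(record["active"])  # True
print(round_trip)
# {"active": true, "languages": ["Python", "R", "Julia"], "name": "Ada"}
```

For JSON stored in files, the json module also offers json.load and json.dump, which work on file objects instead of strings.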

NetworkX is a Python library for creating, analyzing, and visualizing complex networks (graphs). It provides data structures for representing various types of networks, along with algorithms for network analysis and manipulation. The next cell assumes that NetworkX is installed in your Python environment.
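A minimal sketch of building and inspecting a small undirected graph, assuming NetworkX is installed. The node labels and edges are made up for illustration.

```python
import networkx as nx

# Build an undirected graph from a list of edges.
graph = nx.Graph()
graph.add_edges_from([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])

num_nodes = graph.number_of_nodes()
num_edges = graph.number_of_edges()

# Degree of each node: the number of edges attached to it.
degrees = dict(graph.degree())

# One of the built-in algorithms: shortest path between two nodes.
path = nx.shortest_path(graph, "A", "D")

print(num_nodes, num_edges)  # 4 4
print(degrees)               # {'A': 2, 'B': 2, 'C': 3, 'D': 1}
print(path)                  # ['A', 'C', 'D']
```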