Assignment 2. Basic measurements

Due: Wednesday, 5 October 2022, 10:00 PM

Prerequisites

To complete this assignment, you need the knowledge learned in Assignment 1.
You also need further Linux commands such as dig, curl and ssh, and you need to write shell scripts.

If you are not very familiar with Linux and Python, you can
1. Watch the introductory video for this assignment. (You can also find it in the video section.) The video shows how to use the dig and curl commands, which you may wish to apply in the tasks.
2. View the supporting documents to look at those commands and code in detail.
3. Take a look at some code snippets that may help you. (link fixed on 2022-09-22T08:40 EEST)

Learning outcomes

At the end of this assignment, students should be able to

Develop scripts to measure network latency and throughput.
Get to know more ways to measure network latency and throughput.
Apply some skills to make data processing easier.
Analyse the performance of different servers in terms of latency and throughput.
Develop code for data analysis using Python or R.
Know some basic concepts related to statistical measures.

Introduction

This assignment has three tasks that complement those of the first assignment. Please read all instructions before starting, because this helps you identify work that the tasks have in common.

Task 1: Statistic measurements
Task 2: Measuring latency
Task 3: Measuring throughput

For tasks 2 and 3, you need to write scripts and run them with crontab (or another timing method) for at least 24 hours. The scripts execute different commands to collect the measurements described in the instructions for each task.

TIP: Regarding crontab and SSH usage, check out linux_intro.pdf page 10. A crontab that you edit on an Aalto server is stored on that server only and is not visible on any other computer (server or workstation). To change it later, you therefore need to connect to the same server via SSH again.

Task 1: Statistic measurements

You must answer the following points appropriately:

What are statistical measures? What is their importance and purpose?
What are the differences between mean and median in terms of definition, applicability, relevance to the data set, and disadvantages?
Explain the concept of a quantile and its purpose, as well as how it differs from a percentile and a quartile. What is the meaning of the 75th and 25th percentiles?

Task 2: Measuring latency

For the report, describe your measurement setup for the selected 3 name servers, 3 research servers and 2 iperf servers, chosen according to the instructions in the Servers section of this exercise. Start the measurements and collect data for at least 24 hours using a shell script and crontab.

Note: Leave the measurements running for at least two weeks, because this data will be used in the Final Assignment.

The table below describes the methods to apply for each server selected (based on instructions at the servers section) to measure the latency and defines the running time of the scripts based on the columns ‘Script executed’ and ‘Selecting the minute’ to their respective servers.

Measuring latency of each server with different method
Server	Script executed	Selecting the minute	Method 1	Method 2
3 name servers	Every hour	at studentid modulo 60 [1]	5 ICMP echo request [2]	DNS query [3]
3 research servers	Every 10 mins	at studentid modulo 10 [1]	5 ICMP echo request [2]	-
2 iperf servers	Every 10 mins	at studentid modulo 10 [1]	5 ICMP echo request [2]	TCP connect latency [4]

NOTES:

[1] A recommended way to get a minute value that is evenly distributed among students is to run expr $(id -u) % 60 on Linux. Note that id -u returns your Linux user ID (UID), not your student ID, but it serves as a good substitute.
[2] Use the ping command and focus on the round-trip time (RTT); check the -O and -D options.
[3] Use the dig command and focus on the query time (e.g. dig @<nameserver> ns <any webpage from that country>).
[4] Use the curl command to download the 1K.bin file, which is small enough not to cause much load, and focus on the variables time_connect and time_namelookup (check the -w option). The TCP connect latency is time_connect - time_namelookup.
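The values named in the notes above can be pulled out of logged command output with a few regular expressions. The following is only a sketch: it assumes the usual Linux (iputils) ping reply format containing `time=<value> ms`, the `;; Query time: <n> msec` line printed by dig, and a curl call made with `-w '%{time_namelookup} %{time_connect}'`, which reports both values in seconds. The function names are hypothetical, and the patterns should be checked against your own logs.

```python
import re

def ping_rtts_ms(ping_output):
    """Return the RTT (ms) of every echo reply found in ping output."""
    return [float(v) for v in re.findall(r"time=([\d.]+) ms", ping_output)]

def ping_lost(ping_output, sent=5):
    """Lost probes = probes sent minus replies seen (assumes `sent` probes)."""
    return sent - len(ping_rtts_ms(ping_output))

def dig_query_time_ms(dig_output):
    """Return the query time (ms) reported by dig, or None if absent."""
    match = re.search(r";; Query time: (\d+) msec", dig_output)
    return int(match.group(1)) if match else None

def tcp_connect_latency_ms(curl_w_output):
    """TCP connect latency in ms: time_connect - time_namelookup.

    Expects the text produced by
    curl -w '%{time_namelookup} %{time_connect}' (both in seconds).
    """
    namelookup_s, connect_s = (float(x) for x in curl_w_output.split())
    return (connect_s - namelookup_s) * 1000.0
```

Counting losses as "sent minus replies" avoids relying on the `-O` messages alone, since the last probe of a run may go unanswered without being reported.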

TIPS:

It is possible to calculate the results by hand from the logs, as there are only a few measurements. In the future, however, we will have a bigger dataset that would be tedious or impossible to go through by hand; in addition, automating the processing helps to introduce machine learning later in the analysis.
Create three shell scripts: one for the name servers, one for the research servers, and one for the iperf servers. Run all of them with crontab, which lets you spread the tests out instead of starting them all at the top of the hour. You may log all output and do the filtering and data collection later, or you can filter right away.
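As a sketch of the scheduling described in the table and tips above, the snippet below builds three crontab lines from a UID: one hourly entry at UID modulo 60 and two entries every 10 minutes offset by UID modulo 10. The script names and the `/path/to/scripts` directory are placeholders, not files provided by the course.

```python
import os

def crontab_lines(uid, scripts_dir="/path/to/scripts"):
    """Build crontab entries whose minutes are derived from the UID."""
    hourly_minute = uid % 60          # name servers: once per hour
    offset = uid % 10                 # research/iperf: every 10 minutes
    ten_min = ",".join(str(m) for m in range(offset, 60, 10))
    return [
        f"{hourly_minute} * * * * {scripts_dir}/nameservers.sh",
        f"{ten_min} * * * * {scripts_dir}/research.sh",
        f"{ten_min} * * * * {scripts_dir}/iperf.sh",
    ]

if __name__ == "__main__":
    # os.getuid() is the same number that `id -u` prints on Linux.
    print("\n".join(crontab_lines(os.getuid())))
```

The printed lines can be pasted into `crontab -e` on the server where the measurements run. The explicit minute list (for example 7,17,27,37,47,57) is used instead of a step expression because it is accepted by every cron implementation.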

Report a table with the following metrics for each of your target servers.
- Median delay, treating each lost packet as having a delay of infinity. Thus, if more than 50 % of packets are lost, the median is infinity.
- Mean delay.
- Loss ratio.
- Delay spread, defined as the difference between the 75th and 25th percentiles. If more than 25 % of packets are lost, the spread is infinity.
Finally, draw conclusions about the stability of network delay. Were some of the hosts different from the others? Could you observe any variation by time of day? Do the time zones of the target servers (or your own) have an impact?
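The four metrics above can be computed along the following lines. This is a minimal sketch with stated assumptions: lost packets are represented as `None`, percentiles use the nearest-rank definition (which reproduces the "more than 50 %" and "more than 25 %" rules exactly), and the mean is taken over received packets only, since the text does not say how losses enter the mean. The input must contain at least one sample.

```python
import math

def nearest_rank(sorted_values, q):
    """Nearest-rank quantile: the ceil(q * n)-th smallest value."""
    rank = max(1, math.ceil(q * len(sorted_values)))
    return sorted_values[rank - 1]

def latency_metrics(rtts_ms):
    """rtts_ms: list of RTTs in ms, with None marking a lost packet."""
    total = len(rtts_ms)
    received = [r for r in rtts_ms if r is not None]
    # Lost packets count as infinite delay for median and percentiles.
    delays = sorted(math.inf if r is None else r for r in rtts_ms)
    p25 = nearest_rank(delays, 0.25)
    p75 = nearest_rank(delays, 0.75)
    return {
        "median": nearest_rank(delays, 0.50),
        # Assumption: mean over received packets only.
        "mean": sum(received) / len(received) if received else math.inf,
        "loss_ratio": (total - len(received)) / total,
        # inf - inf would be NaN, so handle an infinite p75 explicitly.
        "spread": math.inf if math.isinf(p75) else p75 - p25,
    }
```

Other percentile definitions (for example linear interpolation) give slightly different numbers on small samples, so state in the report which definition you used.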

Report, task 2

Describe your measurement setup
Table of measurement results.
Conclusions on network stability.

Task 3: Measuring throughput

In this task you will measure throughput in three different ways: by file transfer, by a special measurement tool, and by a measurement service. The measurement service method must be done manually; the others can be automated with, for example, crontab. Finally, you will compare the results.

Start measurement and collect data for at least 24 hours using a shell script and crontab, and describe your measurement setup in the report.

NOTE: Leave the measurements running for at least two weeks, because this data will be used in the Final Assignment.

The table below defines methods that measure throughput in different ways, since most network users are interested in a single factor: how many bits per second they can download or send over their network connection. Run the following throughput tests at home (remember to check your computer's power-saving settings). If you have a metered (mobile) broadband connection, we recommend using the Aalto servers for this task instead.

Ideally, one of the delay measurements should run at the same time as the throughput measurements.

Measuring throughput in different ways
Way	Script executed	Selecting the minute	Method	Tool
1) by file transfer	Every hour	at studentid modulo 60 [2]	HTTP download tool [3]	`curl`
2) by special measurement tool	Every hour	at studentid modulo 60 [2]	Network performance measurement tool [4]	`iperf3`
3) by using measurement service	Manually [1]	-	Measurement service	e.g. Speed Test

NOTES:

[1] Run a few measurements by hand with method 3 within the same time frame, and write down the date, time, and results of each.
[2] A recommended way to get a minute value that is evenly distributed among students is to run expr $(id -u) % 60 on Linux. Note that id -u returns your Linux user ID (UID), not your student ID, but it serves as a good substitute.
[3] Download files from the target server using an HTTP download tool (curl recommended).
[4] Run the network performance measurement tool iperf3 for 10 seconds (the default) in both directions (check the -t option).
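If iperf3 is run with JSON output, the throughput can be read from the result without text scraping. The sketch below assumes the field layout of iperf3's JSON report as commonly seen, with an `end` section holding `sum_sent` and `sum_received`, each with a `bits_per_second` value, and an `error` key when a run fails. Treat these names as assumptions and verify them against the output of your own iperf3 version.

```python
import json

def iperf3_mbps(json_text):
    """Return sender/receiver throughput in Mbit/s, or None on failure."""
    report = json.loads(json_text)
    if "error" in report:
        # For example, the chosen server instance was busy.
        return None
    end = report.get("end", {})
    try:
        return {
            "sent": end["sum_sent"]["bits_per_second"] / 1e6,
            "received": end["sum_received"]["bits_per_second"] / 1e6,
        }
    except KeyError:
        return None
```

A failed run returns `None` rather than zero, so that failures can be reported as anomalies instead of distorting the mean.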

NOTE: Target servers for methods 1 and 2 can be found in the table in the Servers section (select two iperf servers according to the instructions).

Make a table with your results from these three methods and calculate basic statistics, such as mean, median, max, min and average deviation. Note that for methods 2 and 3 you will receive upload (UL) and download (DL) readings. Report them separately.

Example results from throughput measurements in megabits per second.
Key	HTTP (server 1)	HTTP (server 2)	Iperf UL (server 1)	Iperf DL (server 1)	Iperf UL (server 2)	Iperf DL (server 2)	ST UL	ST DL
2021-10-13 18:08:01	59.7905	1.1123	-	-	-	-	33.54	94.17
2021-10-13 19:08:01	54.8992	0.3974	11.8925	86.7741	0.2358	-	-	-
Mean	53.961954	0.4740	10.4326	84.5329	0.3156	0.5744	32.9	91.775
Median	57.3448	0.3261	11.8049	84.4473	0.2358	0.5744	32.9	91.775
Min	40.7213	0.1317	7.6005	82.3772	0.1646	0.4169	32.26	89.38
Max	60.4366	1.1123	11.8925	86.7741	0.5464	0.7319	33.54	94.17
Avg deviation	4.1048	0.1969	0.1298	3.0690	0.1056	0.2335	0.9488	3.5508

DL = Download
UL = Upload
ST = SpeedTest
server 1 and server 2 = the target servers selected.
See definition of Avg deviation.
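The basic statistics for one column of the results table can be sketched as follows. Which deviation measure the course means by "Avg deviation" is defined behind the link above, so both functions here are assumptions: `mean_abs_deviation` is the mean absolute deviation around the mean, and `scaled_median_abs_deviation` is the median absolute deviation multiplied by 1.4826. If the ST UL column holds only the two values shown as Min and Max (32.26 and 33.54), the first gives about 0.64 and the second about 0.95, so check the linked definition before choosing.

```python
import statistics

def mean_abs_deviation(values):
    """Mean of |x - mean(x)| (one reading of 'average deviation')."""
    mu = statistics.fmean(values)
    return statistics.fmean([abs(v - mu) for v in values])

def scaled_median_abs_deviation(values, scale=1.4826):
    """Median of |x - median(x)|, scaled to be comparable to a std dev."""
    med = statistics.median(values)
    return scale * statistics.median([abs(v - med) for v in values])

def summarize(values):
    """Basic statistics for one column of throughput readings (Mbit/s)."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "avg_deviation": mean_abs_deviation(values),
    }
```

Missing readings (shown as "-" in the example table) should be dropped from `values` before calling `summarize`.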

Finally, draw conclusions about the throughput measurement methods, addressing at least the following topics. It may be helpful to graph the results to identify trends.
1. Are the results of the different methods in line with each other?
2. Did any method show a lot of deviation? What do you think might cause this?
3. Did any method give higher values than the others? What do you think might cause this?
4. Is there variation over time? For example, did you get higher throughput during the day or at night?
5. Were there any anomalies? For example, no connection or very different capacity.

Report, task 3

Describe your measurement setup. Include (example) code and samples of results.
Table of results.
Conclusions on throughput methods.

TIPS: Look at the structure of the log file; you can modify the existing parse.py file to suit your purpose.

Servers

Nameservers

On the Aalto servers, the course provides some tools, including mycountry, which assigns you a country based on your userid, performs a few checks, and lists some name servers for that country.

To execute mycountry, run the following commands on one of the Aalto servers:
```
source /work/courses/unix/T/ELEC/E7130/general/use.sh
mycountry
```
Note: You may need to run the kinit command first to get access to the directory. Also, source works with Bourne-compatible shells: bash and zsh (the current Aalto default shell).

The command will print something like this:
```
br OK (Brazil): d.dns.br, e.dns.br, f.dns.br, a.dns.br, c.dns.br, b.dns.br
Your UID is 1346517, thus your ccTLD is br (Brazil)
```
In this example, the assigned country is Brazil, with the following name servers:
- d.dns.br
- e.dns.br
- f.dns.br
- a.dns.br
- c.dns.br
- b.dns.br
To select three servers, test the connection to each of them. If a name server does not respond to ICMP messages, use traceroute to find the last hop before that server and use that hop as the target instead (provided it responds).

Warning: Do not test traffic more frequently than twice an hour.

If you do not know which domain names exist, try searching for news site:br with a popular search engine. In practice, it should not make a difference whether the domain exists or not.

Note: You need to run the mycountry command yourself on one of the Aalto IT servers kosh.aalto.fi, lyta.aalto.fi, brute.aalto.fi or force.aalto.fi. You can access them remotely via SSH. In the Aalto campus network, direct DNS queries are disallowed from normal client networks but allowed from the servers above. The same kind of filtering also applies to typical residential networks. You therefore need to run the actual DNS latency tests from servers that allow direct DNS requests.

Note: Also make sure that you query the remote name server and not your local one!

Research servers

There are a few distributed research testbeds, including CAIDA Ark, that researchers can use for internet-wide measurements. Here we test latency to a few of these sites using the servers below.

Using the following formulas (with integer division), select three servers from the tables below: one from the first table and two from the second.

Server 1: id_0 = studentID % 3
Server 2: id_1a = studentID % 6
Server 3: id_1b = studentID / 7 % 6

Select one: id_0
id_0	server
0	cbg-uk.ark.caida.org
1	arn-se.ark.caida.org
2	pna-es.ark.caida.org

Select two: id_1a and id_1b
id_1	server
0	hlz-nz.ark.caida.org
1	cjj-kr.ark.caida.org
2	per-au.ark.caida.org
3	scl-cl.ark.caida.org
4	eug-us.ark.caida.org
5	san-us.ark.caida.org

Warning: Traffic towards these destinations can be more frequent (measurement intervals of 10 minutes are acceptable).

For example, if the studentID is 123456, applying modulo 3 for server 1 gives 0, and therefore the server to use is cbg-uk.ark.caida.org.
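The selection formulas can be checked with a short script. This sketch copies the host names from the two tables above; the function name is hypothetical, and it does not handle the case where id_1a and id_1b turn out equal, because the instructions do not say what to do then.

```python
FIRST_TABLE = [
    "cbg-uk.ark.caida.org",
    "arn-se.ark.caida.org",
    "pna-es.ark.caida.org",
]
SECOND_TABLE = [
    "hlz-nz.ark.caida.org",
    "cjj-kr.ark.caida.org",
    "per-au.ark.caida.org",
    "scl-cl.ark.caida.org",
    "eug-us.ark.caida.org",
    "san-us.ark.caida.org",
]

def select_research_servers(student_id):
    """Apply the three formulas, using integer division for server 3."""
    id_0 = student_id % 3
    id_1a = student_id % 6
    id_1b = student_id // 7 % 6   # (studentID / 7) % 6, integer division
    return FIRST_TABLE[id_0], SECOND_TABLE[id_1a], SECOND_TABLE[id_1b]
```

For studentID 123456 this gives id_0 = 0, id_1a = 0 and id_1b = 2, that is cbg-uk, hlz-nz and per-au.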

Iperf servers

Iperf3 servers accept only one connection at a time. The hosts running Iperf3 servers used in this course are configured to run 11 different instances, each on a different port from 5200 to 5210. Use, for example, the $RANDOM variable to select a port (-p option) at random.

The first iperf3 server to use is ok1.iperf.comnet-student.eu.

The second iperf3 server is selected by studentID modulo 2, using the following table.

The second iperf server
id_0	Far away iperf server
0	blr1.iperf.comnet-student.eu
1	sgp1.iperf.comnet-student.eu
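As a sketch of the two choices described in this section, the snippet below picks the pair of iperf servers for a given studentID and draws a random port from the 5200 to 5210 range. It is a Python counterpart to the $RANDOM suggestion, and the function names are hypothetical.

```python
import random

FIRST_IPERF = "ok1.iperf.comnet-student.eu"
FAR_IPERF = {
    0: "blr1.iperf.comnet-student.eu",
    1: "sgp1.iperf.comnet-student.eu",
}

def iperf_targets(student_id):
    """Return the fixed first server and the far-away second server."""
    return [FIRST_IPERF, FAR_IPERF[student_id % 2]]

def random_iperf_port():
    """Pick one of the 11 instance ports, 5200..5210 inclusive."""
    return random.randint(5200, 5210)
```

Choosing a new port on every run reduces the chance of hitting an instance that is already busy with another student's test.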

NOTES:

An example URL is http://blr1.iperf.comnet-student.eu:80/10M.bin. Based on your iperf tests, select the file size that is most appropriate.

Sample file sizes
file	size
1K.bin	1KiB
5K.bin	5KiB
10M.bin	10MiB
50M.bin	50MiB
100M.bin	100MiB
500M.bin	500MiB
500G.bin	500GiB

In addition, the iperf servers serve the files above over HTTP on TCP port 80, which is useful for measuring latency with curl in Task 2.

As it is not known in advance how much capacity is available in the network, it is prudent to define a maximum time for the transfer. Depending on the tool used, there are alternatives:
- curl supports the -m secs (max-time) option, which aborts the transfer if it takes more than secs seconds. Remember to include the number of bytes transferred in your output format.
- With any program, it is possible to use the timeout command, which kills (or sends a signal to) the program if it runs longer than the set timeout. The signal can be specified, for example, with -s INT for SIGINT. To limit a wget command to 60 seconds, the following command can be used: timeout 60 wget https://…
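A time-limited HTTP download and the resulting throughput can be sketched as below. The sketch assumes curl's `-w` variables `size_download` (bytes) and `time_total` (seconds); the function names and the 60-second default are illustrative choices, not course requirements. Because `time_total` also covers name lookup and connection setup, the figure slightly understates the pure transfer rate.

```python
import subprocess

def curl_command(url, max_secs=60):
    """Build a curl call that discards the body and reports size and time."""
    return [
        "curl", "-s", "-o", "/dev/null",
        "-m", str(max_secs),                     # abort after max_secs seconds
        "-w", "%{size_download} %{time_total}",  # bytes and seconds
        url,
    ]

def throughput_mbps(curl_w_output):
    """Convert 'bytes seconds' into Mbit/s; None if the output is unusable."""
    fields = curl_w_output.split()
    if len(fields) != 2:
        return None
    size_bytes, total_s = float(fields[0]), float(fields[1])
    if total_s <= 0:
        return None
    return size_bytes * 8 / total_s / 1e6

def measure(url, max_secs=60):
    """Run curl once and return the measured throughput in Mbit/s."""
    result = subprocess.run(curl_command(url, max_secs),
                            capture_output=True, text=True)
    return throughput_mbps(result.stdout)
```

Using the bytes actually transferred, rather than the nominal file size, keeps the result meaningful when the download is cut off by the time limit.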

Grading standard

To pass this course, you need to achieve at least 15 points in this assignment. If you submit the assignment late, you can get a maximum of 15 points.

You can get up to 30 points for this assignment:

Task 1

Explain the concepts requested related to statistical measures. (3p)

Task 2

Successfully complete the measurement and describe detailed measurement steps. (2p)
Perform data processing on the data of each server, and get the table. The table contains the four required items. (8p)
Draw appropriate conclusions about the problems (4p)

Task 3

Successfully complete the measurement and describe detailed measurement steps. (1p)
Perform data processing on the data of each method, and get the table. The table contains the required items. (8p)
Draw appropriate conclusions about the problems (4p)

The quality of the report (bonus 2p)

Instructions for the assignment

Your submission must contain the following (please do not include the original data in your submission):

A zip file that includes your code and scripts.
A PDF file as your report.

Your report must have:

A cover page indicating your name, student ID and e-mail address.
A description of the measurements, a summary of the results, and conclusions based on the results.
An explanation of each problem: how you solved it and why you solved it that way.

Annex

SSH connection to Aalto servers

An SSH connection is useful for accessing datasets or running scripts remotely through crontab. Connect with the following command: ssh -X <aalto_username>@brute.aalto.fi, where -X lets you run some applications (such as Matlab or gedit) remotely even if you do not have them installed on your personal device.

Other Aalto servers are also available, such as kosh, lyta, and force, which are usually online 24/7. The brute and force servers in particular are intended for heavy computation and educational use.

Recommendation: There are several SSH client programs for remote computing. For Windows, we can recommend PuTTY or MobaXterm, or something similar, for connecting to the Aalto servers over SSH.

Delay spread

Delay spread is the absolute difference between two quantiles. For example, if 25% of measurements are <= 125 ms and 75% are <= 175 ms, then the delay spread between the 25th and 75th percentiles is 50 ms (175 ms - 125 ms).

Aalto network topology

basic_measurements.pdf
21 September 2022, 8:05 PM

ELEC-E7130 - Internet Traffic Measurements and Analysis, Lecture, 7.9.2022-5.12.2022