ELEC-E7130 - Internet Traffic Measurements and Analysis, Lecture, 7.9.2022-5.12.2022
According to the course settings, the course ended on 5.12.2022.
Assignment 3. User traffic
Prerequisites
To complete the first two parts of this assignment, you need access to Aalto computers.
For the third part you will need root-level access to a Linux computer (or administrator access on a Windows computer). If you do not have a suitable computer (e.g. if you only have a company laptop), please contact the course staff and a loan computer can be arranged. A virtual computer will also work for that purpose.
If you are not very familiar with network capture tools (tcpdump, Wireshark or tshark), you can
- Watch the introductory video on network capture.
- View the ELEC-E7130 Network capture tutorial to go through the commands and code in detail.
- Take a look at some code snippets that may help you.
Learning outcomes
At the end of this assignment, students should be able to
- Capture internet traffic.
- Analyze captured network traffic from different aspects.
- Get to know passive measurements.
- Compare the memory and time used by different methods and then choose a more appropriate method.
- Plot the traffic volume from the flow data.
- Get to know the differences between flow data and packet capture.
Introduction
This assignment contains three tasks that introduce in more detail the traffic data that can be analysed with different tools. Please read all instructions before starting, because doing so helps you identify work that is common to several tasks.
- Task 1: Introduction to the traffic data
- Task 2: Analyse flow data
- Task 3: Analyse packet capture (user traffic)
To use some of the course-specific tools, some environment settings are needed on the Aalto servers. Depending on your login shell, run one of the following commands on a school computer. The first command is for any Bourne-compatible shell (like the Aalto default zsh or bash); the second is for csh-compatible shells.
source /work/courses/unix/T/ELEC/E7130/general/use.sh
source /work/courses/unix/T/ELEC/E7130/general/use.csh
Note: You may need to run the command
kinit
before accessing the directory to avoid permission issues.
In your report, state the name of each tool and the method (command line, if any) you used to answer the questions in this assignment. We recommend that you use at least one command-line tool for the analysis because the data volume in the final assignment is much larger.
Task 1: Introduction to the traffic data
You must answer the following points appropriately:
- What is passive measurement in the context of network traffic? What kind of information does it provide?
- Explain the concepts of packet capture and flow data and the information they can provide. What are the advantages, disadvantages and importance of network analysis?
- What is hashing? How does a hash algorithm work, and how does it relate to memory management in large data analysis?
Task 2: Analyse flow data
First, use a tool (CoralReef, NetMate, tstat or a program of your choice) to convert the given sample pcap file ($TRACE/capture/flow.pcap) into flows.
Note: Remember to run the commands
kinit
and
source /work/courses/unix/T/ELEC/E7130/general/use.sh
to be able to access the directory.
Once you have the flow data, answer the following points.
- Provide basic statistics of flow data, including
- total number of flows,
- minimum, median, mean and maximum flow sizes in bytes and packets
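A minimal Python sketch of these statistics, assuming the flows have already been exported by your flow tool into a list of (packets, bytes) tuples. The sample values and the helper name `flow_stats` are made up for illustration.

```python
import statistics


def flow_stats(values):
    """Return count, minimum, median, mean and maximum of a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "max": max(values),
    }


# Hypothetical flows as (packets, bytes); real values come from your flow tool.
flows = [(3, 180), (10, 5200), (1, 60), (42, 61000)]

pkt_stats = flow_stats([p for p, _ in flows])
byte_stats = flow_stats([b for _, b in flows])
print("packets:", pkt_stats)
print("bytes:  ", byte_stats)
```

For skewed data such as flow sizes, the median and the mean usually differ a lot, which is why both are requested.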
Plot the traffic volume (bytes) of the flow data file.
Note: Getting the traffic volume from flow data is more difficult because the only known information is the start time, the end time and the flow size in bytes (as shown in the figure), so the bytes have to be spread over the duration of the flow. For example, a flow of 100,000 bytes that starts at 3.4 s and ends at 7.8 s lasts 4.4 s, which gives roughly 22,700 bytes for each full second in that interval. See more information in the Network capture tutorial (Traffic volume in certain interval, p. 14).
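The spreading described in the note can be sketched in Python as follows. This is only an illustration and assumes that each flow sends at a constant rate between its start and end time; the function name `volume_per_bin` and the (start, end, bytes) tuple layout are hypothetical, not the format of any particular flow tool.

```python
import math


def volume_per_bin(flows, bin_width=1.0):
    """Spread each flow's bytes uniformly over the time bins it overlaps.

    flows: iterable of (start, end, nbytes); returns {bin_index: bytes}.
    """
    bins = {}
    for start, end, nbytes in flows:
        duration = end - start
        idx = math.floor(start / bin_width)
        if duration <= 0:
            # Zero-length flow: put everything into the bin of its start time.
            bins[idx] = bins.get(idx, 0.0) + nbytes
            continue
        rate = nbytes / duration  # bytes per second, assumed constant
        while idx * bin_width < end:
            lo = max(start, idx * bin_width)        # overlap start
            hi = min(end, (idx + 1) * bin_width)    # overlap end
            bins[idx] = bins.get(idx, 0.0) + rate * (hi - lo)
            idx += 1
    return bins


# The example flow from the note: 100,000 bytes between 3.4 s and 7.8 s.
volume = volume_per_bin([(3.4, 7.8, 100_000)])
for idx in sorted(volume):
    print(idx, round(volume[idx]))
```

The first and last bins receive proportionally fewer bytes because the flow covers only part of them. To plot bits per second, multiply each bin value by 8 and divide by the bin width.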
Which are the most used protocols, and which are the three most common source ports and the three most common destination ports (according to the flows)? For each one, give in a table
- the number of flows
- the number of packets
- the amount of data (bytes)
- the application or usage
Hint: The column ‘pro’ defines the protocol used.
Which are the top-ten host pairs based on
- number of flows
- number of bytes
Are the pairs the same in both rankings?
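One possible way to rank host pairs is a pair of counters keyed by (source, destination). The sketch below assumes flows are available as (src, dst, bytes) tuples; the addresses and the function name `top_pairs` are invented for illustration.

```python
from collections import Counter


def top_pairs(flows, n=10):
    """Return the n busiest (src, dst) pairs by flow count and by bytes."""
    flow_count = Counter()
    byte_count = Counter()
    for src, dst, nbytes in flows:
        pair = (src, dst)
        flow_count[pair] += 1
        byte_count[pair] += nbytes
    return flow_count.most_common(n), byte_count.most_common(n)


# Hypothetical flows; real ones come from the converted flow file.
flows = [
    ("10.0.0.1", "10.0.0.2", 500),
    ("10.0.0.1", "10.0.0.2", 700),
    ("10.0.0.3", "10.0.0.4", 90000),
]
by_flows, by_bytes = top_pairs(flows)
print(by_flows)
print(by_bytes)
```

This version treats A to B and B to A as different pairs. If both directions should count as one pair, use `tuple(sorted((src, dst)))` as the key instead.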
Plot the number of flows for the 100 most common pairs of hosts
- Using linear scale
- Using logarithmic scale
Repeat the previous plot (both linear and logarithmic scale), this time using the fixed-size (2^16 slots) array approach (Network capture tutorial - Large data analysis, p. 8 and solution #2, p. 10). What can you say about the results?
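As a rough illustration of a fixed-size array, the Python sketch below hashes each host pair into one of 2^16 (65,536) slots and counts flows per slot. The slot count is an assumption, CRC32 is an arbitrary hash chosen here because it gives the same result on every run, and the names `slot_of` and `count_fixed` are hypothetical; the tutorial's own solution may differ.

```python
import zlib

SLOTS = 2 ** 16  # assumed size of the fixed array


def slot_of(src, dst):
    """Map a host pair to a slot index using CRC32 (an arbitrary hash choice)."""
    return zlib.crc32(f"{src}>{dst}".encode()) % SLOTS


def count_fixed(pairs):
    """Count flows per slot; memory use does not grow with the number of pairs."""
    counts = [0] * SLOTS
    for src, dst in pairs:
        counts[slot_of(src, dst)] += 1
    return counts


# Hypothetical host pairs, one entry per flow.
pairs = [("10.0.0.1", "10.0.0.2")] * 3 + [("10.0.0.3", "10.0.0.4")]
counts = count_fixed(pairs)
top100 = sorted(counts, reverse=True)[:100]
print(top100[:5])
```

Different pairs can land in the same slot, so a slot count is an upper bound on the count of any single pair that hashes to it. Comparing this plot with the exact one shows how much the collisions matter.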
Is there a better way to do this (in terms of running time/memory consumption)?
Note: You can use the /bin/time command to get the resource consumption of a command; use -v for more verbose output. It provides more detailed output than the shell built-in time.
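If the analysis is written in Python, time and memory can also be estimated from inside the script. The sketch below uses the standard `time` and `tracemalloc` modules; the helper name `measure` is hypothetical. Note that `tracemalloc` only tracks memory allocated by Python and slows the program down, so /bin/time remains the reference for whole-process figures.

```python
import time
import tracemalloc


def measure(func, *args):
    """Run func(*args); return (result, elapsed_seconds, peak_bytes)."""
    tracemalloc.start()
    t0 = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    tracemalloc.stop()
    return result, elapsed, peak


# Example: cost of allocating a fixed array with 2**16 counters.
result, elapsed, peak = measure(lambda n: [0] * n, 2 ** 16)
print(f"{elapsed:.6f} s, peak {peak} bytes")
```

Running the same helper on two alternative implementations gives a quick side-by-side comparison before confirming the numbers with /bin/time.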
Report, task 2
- Describe how you generated flow data
- Provide descriptive statistics
- Provide a table of top-ten host pairs
- Provide tables of the five most common ports, source ports (sport) and destination ports (dport), with the information requested.
- Provide top-100 most common pairs plots and evaluate them
- Provide top-100 most common pairs plots with the fixed-array approach.
- Discuss the resource (running time and memory) requirements.
Task 3: Analyse packet capture (user traffic)
In this task, capture the traffic data from your computer. If you use a virtual machine (VM), generate the traffic within that virtual computer instead of the usual host, because it acts as a separate computer.
Choose one of the available packet-capturing tools, such as dumpcap, Wireshark or tcpdump, and capture network traffic for one hour or more while using the computer as you normally do (browse the web, check e-mail, watch videos, listen to music, do assignments, and so on).
Analyze the captured data using a suitable tool and answer the following questions:
- How many IPv4 hosts (and IPv6, if any) are communicating?
- Top 5 host countries (e.g. GeoIP)
- Top 15 hosts by byte counts.
- Top 15 hosts by packet counts.
- Top 10 TCP and top 5 UDP port numbers (by packet count).
- Top 10 fastest TCP connections
- Bit and packet rate over time (e.g. tcpstat, capinfos)
- How many hosts did your computer try to contact where the communication failed for one reason or another? Can you identify different subclasses of failed communications?
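For the last question, one way to think about subclasses of failed TCP attempts is to look at which flags were seen for each outgoing connection. The Python sketch below assumes you have already extracted, per connection, whether a SYN was sent, whether a SYN-ACK came back, and whether an RST was seen. The input layout, addresses and labels are hypothetical, and other failure types (for example ICMP unreachable messages or unanswered DNS queries) are not covered.

```python
from collections import Counter


def classify_attempt(syn_sent, synack_seen, rst_seen):
    """Classify one TCP connection attempt from the flags seen in the capture."""
    if not syn_sent:
        return "not an outgoing attempt"
    if synack_seen:
        return "established"
    if rst_seen:
        return "refused"      # the peer answered the SYN with an RST
    return "no response"      # the SYN was never answered


# Hypothetical per-connection summaries: (host, syn_sent, synack_seen, rst_seen).
attempts = [
    ("198.51.100.7", True, True, False),
    ("203.0.113.9", True, False, True),
    ("192.0.2.44", True, False, False),
    ("192.0.2.45", True, False, False),
]
summary = Counter(classify_attempt(s, a, r) for _, s, a, r in attempts)
failed_hosts = {h for h, s, a, r in attempts
                if classify_attempt(s, a, r) in ("refused", "no response")}
print(summary)
print(len(failed_hosts), "hosts with failed attempts")
```

A refused attempt usually points to a closed port, while an unanswered SYN suggests an unreachable host or a filtering firewall.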
Note: Please use one of the mass analysis tools, such as those shown in Table 1 (Mass analysis tools), or another suitable tool. Some packet-capturing software can also analyze such a small amount of data, but it is better to practice with a mass analysis tool.
Report, task 3
- Describe your analysis setup. Include code snippets.
- Answers to questions above.
- Did byte and packet count top hosts differ?
- Any interesting observations?
Grading standard
To pass this course, you need at least 15 points in this assignment. If you submit the assignment late, you can get a maximum of 15 points.
You can get up to 30 points for this assignment:
Task 1
- Explain the concepts requested related to traffic data. (4p)
Task 2
- Use the correct method to convert a given file into flows. (1p)
- Accurately answer the 7 questions raised in the task. (12p)
Task 3
- Accurately answer the 8 questions raised in the task. (11p)
- Summarize your findings based on your answers to the questions. (1p)
The quality of the report (bonus 2p)
Instructions for the assignment
For the assignment, your submission must contain the following (please do not include the original data in your submission):
- A zip file that includes your code and scripts.
- A PDF file as your report.
The report must have:
- A cover page indicating your name, student ID and e-mail address.
- A description of the measurements, a summary of the results and conclusions based on the results.
- An explanation of each problem, including how you solved it and why you chose that approach.
Annex
- How to calculate traffic volume with bits per second?
See more information in the Network capture tutorial (Traffic volume in certain interval, p. 14).
- 28 September 2022, 17:18