37E44000 - Critical Issues in Information Systems Research D, Lecture, 4.5.2023-24.5.2023
Homework
Experience with Network Analysis using Gephi
In this assignment, you will construct a network visualization and interpret it. The dataset comes from the Amazon website and records which products were bought together. We do not have the actual product names, only their IDs. The only attribute we have for each product is its type: technology, books, furniture or household appliances. The product network contains 2,270 nodes and 2,212 edges. When presenting the results, do not just report the numbers. Interpret them and explain their practical significance. An example visualization is presented below.
Software:
Step 1: Download and install Gephi from https://gephi.org/
Data uploading
Two CSV files are present in the Homework folder: Nodes.csv and Edges.csv.
Step 2: Open Gephi and go to Data Laboratory at the top. First, import Nodes.csv. At the end of the import, make sure the graph type is set to Undirected and that the data is appended to the existing workspace.
Step 3: After importing the node table, import Edges.csv. Again, make sure the graph type is set to Undirected and that the data is appended to the existing workspace.
Step 4: Click Overview at the top to see your network visualization.
Step 5: Partition the nodes by Color based on the type of product.
Step 6: Modify the size of a node based on its Degree.
Step 7: Calculate the Degree, Closeness and Betweenness Centrality.
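If you want to sanity-check the values Gephi reports in Step 7, the three measures can be sketched in plain Python. This is only an illustrative sketch, not how Gephi computes them: the column names Source and Target are assumed to follow Gephi's usual edge-table convention, the five-edge sample is made up, and the helper names (read_edges, build_adjacency, and so on) are hypothetical. Gephi may also normalize closeness and betweenness differently, so compare rankings rather than raw values.

```python
import csv
import io
from collections import deque

def read_edges(handle):
    """Read (source, target) pairs; 'Source'/'Target' headers are an assumption."""
    return [(row["Source"], row["Target"]) for row in csv.DictReader(handle)]

def build_adjacency(edges):
    """Build an undirected adjacency map: node -> set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree(adj):
    """Degree = number of neighbours of each node."""
    return {n: len(nbrs) for n, nbrs in adj.items()}

def closeness(adj):
    """Closeness = reachable nodes / sum of shortest-path distances to them."""
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search from s
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        result[s] = (len(dist) - 1) / total if total else 0.0
    return result

def betweenness(adj):
    """Unnormalized betweenness (Brandes' algorithm, undirected, unweighted)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []
        preds = {n: [] for n in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            stack.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair is counted from both endpoints.
    return {n: b / 2 for n, b in bc.items()}

# Made-up sample standing in for Edges.csv (not the assignment data).
sample = io.StringIO("Source,Target\nA,B\nA,C\nA,D\nD,E\n")
adj = build_adjacency(read_edges(sample))
print(degree(adj))
print(closeness(adj))
print(betweenness(adj))
```

In the sample, node A has the highest value on all three measures, while D has a lower degree but still a non-zero betweenness because it is the only route to E. To try this on the assignment data, you could pass an opened Edges.csv to read_edges instead of the sample, provided its headers match the assumed names.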
Answer the following Questions:
Question 1: Paste a screenshot of the Product network visualization presented using any Layout of your choice. Interpret the network based on your first impression.
Question 2: What is the average degree of the network? Interpret it in terms of the product network.
Question 3: Report the density of the network. How do you interpret it? What does it say about the sparseness of the network?
Question 4: Which product has the highest degree centrality in the network? What does this say about the product?
Question 5: Which product has the highest clustering coefficient? What can you do with this information?
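The quantities asked for in Questions 2, 3 and 5 follow from standard definitions for an undirected graph, which can be sketched as below to cross-check what Gephi reports. This is a minimal sketch under stated assumptions: the function names are hypothetical, the four-node toy graph is made up and is not the assignment data, and Gephi's own settings may lead to slightly different reported values.

```python
def average_degree(n_nodes, n_edges):
    """Each undirected edge adds one to the degree of two nodes."""
    return 2 * n_edges / n_nodes

def density(n_nodes, n_edges):
    """Share of all possible undirected node pairs that are connected."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))

def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pairs to compare
    # Each link between two neighbours is seen from both ends, hence / 2.
    links = sum(1 for u in nbrs for v in adj[u] if v in nbrs) / 2
    return 2 * links / (k * (k - 1))

# Node and edge counts as stated in the assignment text.
print(round(average_degree(2270, 2212), 3))
print(round(density(2270, 2212), 6))

# Made-up toy graph: a triangle A-B-C with an extra node D attached to A.
toy = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
print({n: round(local_clustering(toy, n), 3) for n in toy})
```

Note that the network has fewer edges than nodes, so it cannot be one connected piece: a graph with 2,270 nodes and 2,212 edges must have at least 58 separate components. In the toy graph, B and C reach a clustering coefficient of 1.0 with only two neighbours each, which is worth keeping in mind when you interpret the top-ranked product in Question 5.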