The following project topics are available for the course: each team picks one topic and dataset (it's OK for several teams to work on the same dataset).


  • Project: Reality Commons
  • Data description: The data for this topic come from the Reality Mining experiment at MIT. In this experiment, 75 participants were given mobile phones, and their call logs, Bluetooth data, and some other types of data were collected over several months.
  • Possible research questions:
    • Can you correlate any personal attributes/features from the survey data with people's social network structure (like their degrees)?
    • How do people shape their personal networks? How large is the fraction of communication targeted at their top friends? Does this fraction differ between people? Does it differ per channel?
    • Do people have persistent daily rhythms (e.g. staying up late, visible as late-night text messages)? Are people with lots of night-time activity more popular? [see Aledavood et al., EPJ Data Science 2019]
  • Link to data: http://realitycommons.media.mit.edu/index.html
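
As a starting point for the "top friends" question, here is a minimal sketch of how the share of communication going to someone's most-contacted peers could be computed. It assumes the call log has already been reduced to (caller, callee) pairs; the function name `top_k_share` and the toy data are made up for illustration and are not part of the dataset.

```python
from collections import Counter


def top_k_share(events, ego, k=3):
    """Fraction of ego's outgoing events that go to its k most-contacted alters."""
    counts = Counter(dst for src, dst in events if src == ego)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # ego never initiated any communication
    top = sum(n for _, n in counts.most_common(k))
    return top / total


# Toy call log (made-up data): (caller, callee) pairs.
calls = [("a", "b"), ("a", "b"), ("a", "b"), ("a", "c"), ("a", "d"), ("b", "a")]
print(top_k_share(calls, "a", k=1))  # share of a's calls going to the top contact
```

Running the same function separately on calls and text messages would give a per-channel comparison.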

  • Project: SocioPatterns
  • Data description: "SocioPatterns is an interdisciplinary research collaboration formed in 2008 that adopts a data-driven methodology to study social dynamics and human activity. Since 2008, we have collected longitudinal data on the physical proximity and face-to-face contacts of individuals in numerous real-world environments, covering widely varying contexts across several countries: schools, museums, hospitals, etc. We use the data to study human behaviour and to develop agent-based models for the transmission of infectious diseases."
  • Possible research questions:
    • How long do people talk to one another at conferences? How many people do they meet? Are there different "networking strategies"?
    • How do the temporal networks of contacts at schools/workplaces affect the spreading of (simulated) disease? What would be the best interventions to stop the spread?
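
For the disease-spreading question, one possible approach is a simple SIR-type simulation that walks through the contact events in time order. The sketch below assumes the contacts are available as (time, person_i, person_j) tuples; the function `simulate_sir`, its parameters `beta` (infection probability per contact) and `infectious_period`, and the toy events are all illustrative assumptions, not something prescribed by the data.

```python
import random


def simulate_sir(events, seed_node, beta, infectious_period, rng=None):
    """Run one outbreak over contact events (t, i, j); return everyone ever infected."""
    rng = rng or random.Random()
    events = sorted(events, key=lambda e: e[0])  # process contacts in time order
    if not events:
        return {seed_node}
    infected_at = {seed_node: events[0][0]}  # the seed is infected at the first event
    for t, i, j in events:
        for src, dst in ((i, j), (j, i)):
            if src in infected_at and dst not in infected_at:
                still_infectious = t - infected_at[src] < infectious_period
                if still_infectious and rng.random() < beta:
                    infected_at[dst] = t
    return set(infected_at)


# Toy contact sequence (made-up data).
contacts = [(0, "a", "b"), (20, "b", "c"), (40, "c", "d"), (200, "a", "e")]
outbreak = simulate_sir(contacts, "a", beta=1.0, infectious_period=100)

# A crude "intervention": drop all contacts of one person and rerun.
without_b = [e for e in contacts if "b" not in e[1:]]
contained = simulate_sir(without_b, "a", beta=1.0, infectious_period=100)
print(sorted(outbreak), sorted(contained))
```

With `beta` below 1 the outcome is random, so results should be averaged over many runs and seed nodes.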

  • Project: Bitcoin OTC trust
  • Data description: “This is who-trusts-whom network of people who trade using Bitcoin on a platform called Bitcoin OTC. Since Bitcoin users are anonymous, there is a need to maintain a record of users' reputation to prevent transactions with fraudulent and risky users. Members of Bitcoin OTC rate other members in a scale of -10 (total distrust) to +10 (total trust) in steps of 1. This is the first explicit weighted signed directed network available for research.”
  • Possible research questions: How does the rating network evolve? Can someone’s “final”/long-term rating be predicted on the basis of the first ratings? How can you trust a rating?
  • Link to data: https://snap.stanford.edu/data/soc-sign-bitcoin-otc.html
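
To get started on the rating-prediction question, the sketch below compares each user's first few received ratings with the average of all ratings they received. It assumes each line of the data file has the form SOURCE,TARGET,RATING,TIME; the helper `early_vs_final` and the sample rows are invented for illustration.

```python
import csv
import io
from collections import defaultdict


def early_vs_final(rows, n_first=2):
    """Map each rated user to (mean of first n_first ratings, mean of all ratings)."""
    received = defaultdict(list)
    for source, target, rating, time in sorted(rows, key=lambda r: r[3]):
        received[target].append(rating)  # ratings end up in chronological order
    return {
        user: (sum(r[:n_first]) / len(r[:n_first]), sum(r) / len(r))
        for user, r in received.items()
    }


# Made-up sample in the assumed SOURCE,TARGET,RATING,TIME format.
sample = """1,2,4,100
3,2,-10,300
4,2,2,200
1,3,5,150
"""
rows = [(int(s), int(t), int(r), float(ts))
        for s, t, r, ts in csv.reader(io.StringIO(sample))]
result = early_vs_final(rows, n_first=2)
print(result)
```

Plotting the early mean against the long-term mean over all users gives a first impression of how predictive early ratings are.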

  • Project: Public transport networks
  • Data description: Data for the public transport networks of 25 cities across the world in multiple easy-to-use data formats. These data formats include network edge lists, temporal network event lists, SQLite databases, GeoJSON files, and General Transit Feed Specification (GTFS) compatible ZIP-files. The source data for creating these networks has been published by public transport agencies using GTFS data format. To produce the network data extracts for each city, the original data have been curated for errors, filtered spatially and temporally and augmented with walking distances between public transport stops using data from OpenStreetMap. Cities included in this dataset version: Adelaide, Belfast, Berlin, Bordeaux, Brisbane, Canberra, Detroit, Dublin, Grenoble, Helsinki, Kuopio, Lisbon, Luxembourg, Melbourne, Nantes, Palermo, Paris, Prague, Rennes, Rome, Sydney, Toulouse, Turku, Venice, and Winnipeg.
  • Possible research questions:
    • When looking at network topology only (or multiplex network topology), how do the public transport networks of cities differ? Do they look different in cities of different size? On different continents? Of different geographies?
    • Going into the details, using the schedules to compute routes and trip times: do cities differ in terms of accessibility (i.e., how easy it is on average to get from A to B)? See Kujala et al., 2018.
    • Are the networks of some cities more vulnerable than those of others? Why?
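
One simple way to probe vulnerability is to remove stops and see how much of the network stays connected. The sketch below is only one possible measure: it treats the network as undirected, takes a plain list of (stop, stop) edges, and reports the size of the largest connected component after removal. The function `largest_component_size` and the toy edges are illustrative assumptions.

```python
from collections import defaultdict, deque


def largest_component_size(edges, removed=()):
    """Size of the largest connected component after deleting the given nodes."""
    removed = set(removed)
    nodes = {n for edge in edges for n in edge} - removed
    adj = defaultdict(set)
    for u, v in edges:
        if u in removed or v in removed:
            continue  # an edge disappears together with either endpoint
        adj[u].add(v)
        adj[v].add(u)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first search over one component
            node = queue.popleft()
            size += 1
            for nb in adj.get(node, ()):
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best


# Toy network (made-up data): B is a hub connecting A, C and D; E hangs off D.
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("D", "E")]
print(largest_component_size(edges))                  # intact network
print(largest_component_size(edges, removed=["B"]))   # hub removed
```

Comparing how quickly this size drops under random versus targeted removals is one way to compare cities.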


Last modified: Friday, 1 March 2019, 8:36 AM