Friday, 4 January 2013

Personal Data Hacks: Visualizing Data from OpenFlights.org

A friend recently told me about OpenFlights.org, a website that allows you to record, analyze, and share personal flight data. He showed me his dataset, which contained a record of every flight he'd taken over the past 10+ years. I was keen to investigate the dataset further, and my friend was happy to provide me with a copy so I could have a play (thank you Luigi!).

The end result is the following collection of visualizations created in Gephi, with a little help from R. They show key transport hubs and routes for airports, countries, and continents that my friend has visited, and demonstrate some of the fun, insightful ways you can use such personal data.

If you're interested in how the visualizations were created, check out the section at the end of this blog post, where I briefly describe the technologies required and the steps involved.

Note: OpenFlights.org is free to use, and supported by advertising and donations. You can join me in supporting OpenFlights.org via this link.

Hub Airports and Key Routes

The first visualization below shows the primary airports and routes used by Luigi. Each airport is sized and coloured according to the number of other airports it connects to, while each connection is weighted according to the number of times that route was flown. The layout was generated in Gephi and (simply put) places related nodes close together:


As you can see, PSA (Pisa) and STN (London Stansted) are far and away the most used airports. Not only that, but the return journey between the two airports has been taken many times. These two facts make perfect sense given that Luigi is from Pisa, but moved to the UK a few years ago. Other significant hubs are London Heathrow, London Gatwick, and Rome - not too surprising.
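The sizing and weighting described above boil down to two counts. Here is a minimal Python sketch of the idea, not the Gephi workflow itself. The flight list is made up, and treating a route and its return leg as one undirected connection is an assumption:

```python
from collections import Counter, defaultdict

# Hypothetical flight log: one (origin, destination) pair per flight taken.
flights = [
    ("PSA", "STN"), ("STN", "PSA"), ("PSA", "STN"),
    ("STN", "FCO"), ("LHR", "FCO"),
]

def route_key(a, b):
    """Assumption: A->B and B->A count as the same undirected route."""
    return tuple(sorted((a, b)))

# Edge weight: how many times each route was flown, in either direction.
route_weights = Counter(route_key(a, b) for a, b in flights)

# Node degree: how many distinct airports each airport is connected to.
neighbours = defaultdict(set)
for a, b in route_weights:
    neighbours[a].add(b)
    neighbours[b].add(a)
degree = {airport: len(others) for airport, others in neighbours.items()}
```

In this toy data the Pisa to Stansted route gets the heaviest edge, while Stansted and Rome (FCO) tie for the largest node because each connects to two other airports.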

Key Countries and Connections 

Given that many airports are within the same country, is it possible to reflect that in the visualization? One way to achieve this is to partition airports by colours corresponding to different countries, as follows:


So that's kind of OK: the predominance of Italy (yellowish-green) and the UK (blue) is starting to show, but it's quite confusing.

A better approach is to group the airports and connections by country, and to lay out the nodes according to (approximate) geographical positions. The following graph also has a few graphical tweaks for readability:


We're now getting to something approximating a worldwide travel heatmap for my friend. The key travel hubs of the UK and Italy are obvious, and the key routes are also jumping out more: between Italy and the UK, Italy and Germany / France, and the UK and Spain. The significance of the other routes also becomes a bit more apparent: countries further afield correspond to occasional holiday travel (for instance).
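Grouping by country amounts to replacing each airport with its country and summing the flight counts on the merged connections. A minimal Python sketch, assuming a made-up airport-to-country lookup and made-up route counts (Gephi does this grouping through its own interface, so this only illustrates the arithmetic):

```python
from collections import Counter

# Hypothetical lookup and per-route flight counts, for illustration only.
country_of = {
    "PSA": "Italy", "FCO": "Italy",
    "STN": "United Kingdom", "LHR": "United Kingdom",
    "MAD": "Spain",
}
route_counts = {
    ("PSA", "STN"): 12,
    ("FCO", "LHR"): 3,
    ("STN", "MAD"): 2,
    ("PSA", "FCO"): 1,
}

# Collapse each airport-level route onto an undirected country pair.
country_edges = Counter()
for (a, b), n in route_counts.items():
    pair = tuple(sorted((country_of[a], country_of[b])))
    country_edges[pair] += n
```

Domestic flights end up as a pair with the same country twice (a self-loop), which you may want to drop or draw separately.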

Continental Travel

What about different continents? If we return to the original graph and partition the airports by continent, a European bias becomes very clear:


It's also nice to see the groupings of continental airports jumping out, in particular the green nodes in the bottom right corresponding to African airports. Note that I avoided grouping by continent here because the resulting node for Europe dwarfed all the other nodes, which didn't make for a good visualization.

Creating the Visualizations

The flight data is downloadable from OpenFlights.org as Comma Separated Values. I used a little command-line manipulation (awk, sort, and uniq) to compress the data into a list of unique flights, with a count corresponding to the number of times that flight was taken.
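For anyone who prefers a script to a shell pipeline, the same "unique flights with a count" step can be sketched in Python. The sample data is made up, and the `From` and `To` column names are an assumption about the export, so check them against the header of your own file:

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of an export; column names are assumed, not confirmed.
sample_csv = """Date,From,To
2010-01-05,PSA,STN
2010-01-12,STN,PSA
2010-03-01,PSA,STN
2011-07-20,LHR,FCO
"""

def count_flights(csv_file):
    """Count how many times each (origin, destination) pair occurs."""
    reader = csv.DictReader(csv_file)
    return Counter((row["From"], row["To"]) for row in reader)

flight_counts = count_flights(io.StringIO(sample_csv))
for (origin, dest), n in sorted(flight_counts.items()):
    print(origin, dest, n)
```

To run it on a real export, pass `open("flights.csv", newline="")` (a hypothetical file name) instead of the `io.StringIO` wrapper.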

Next, I loaded the data into R, then converted it into a graph which could be easily exported to GML (Graph Modelling Language), then loaded into Gephi and visualized.
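As an illustration of what the GML export contains, here is a minimal Python stand-in for the R step that writes the file by hand. The `to_gml` helper is hypothetical, and storing the flight count under a `weight` key is an assumed convention, so it is worth checking how your Gephi version maps that attribute on import:

```python
def to_gml(flight_counts):
    """Render {(origin, dest): count} as a small directed GML graph."""
    airports = sorted({code for pair in flight_counts for code in pair})
    node_id = {code: i for i, code in enumerate(airports)}

    lines = ["graph [", "  directed 1"]
    for code in airports:
        lines.append(f'  node [ id {node_id[code]} label "{code}" ]')
    for (origin, dest), n in sorted(flight_counts.items()):
        # Assumption: the flight count is stored as an edge "weight".
        lines.append(
            f"  edge [ source {node_id[origin]} target {node_id[dest]} weight {n} ]"
        )
    lines.append("]")
    return "\n".join(lines) + "\n"

# Hypothetical counts, for illustration only.
gml_text = to_gml({("PSA", "STN"): 2, ("STN", "PSA"): 1, ("LHR", "FCO"): 1})
print(gml_text)
```

Writing `gml_text` to a file with a `.gml` extension gives something that can be opened from Gephi's file menu.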

The downloaded dataset didn't contain city, country, or continent data. Adding this required an export of nodes from Gephi, followed by a merge with the OpenFlights.org Airport dataset (spreadsheet magic), and a re-import into Gephi.
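The merge itself is a simple lookup by airport code and doesn't have to be done in a spreadsheet. A minimal Python sketch, assuming the node export has a `Label` column holding the airport code and the airport data has been given `IATA`, `City`, and `Country` headers (all of these column names are hypothetical):

```python
import csv
import io

# Hypothetical node export and airport lookup table.
nodes_csv = """Id,Label
0,FCO
1,PSA
2,XXX
"""
airports_csv = """IATA,City,Country
FCO,Rome,Italy
PSA,Pisa,Italy
STN,London,United Kingdom
"""

def enrich_nodes(nodes_file, airports_file):
    """Left-join nodes to airports on the airport code."""
    lookup = {row["IATA"]: row for row in csv.DictReader(airports_file)}
    enriched = []
    for node in csv.DictReader(nodes_file):
        match = lookup.get(node["Label"], {})
        enriched.append({
            **node,
            # Unknown codes keep empty fields rather than being dropped.
            "City": match.get("City", ""),
            "Country": match.get("Country", ""),
        })
    return enriched

enriched_nodes = enrich_nodes(io.StringIO(nodes_csv), io.StringIO(airports_csv))
```

Keeping unmatched nodes with blank fields makes it easy to spot codes that need fixing by hand before the re-import.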

8 comments:

  1. Hi,
    Nice job !

    I would like to share a similar job I've done several months ago with the OpenFlights.org global dataset.

    http://matthieu-totet.fr/Koumin/iatamaps/

    Eager to see other gephi works from you ;-)

  2. That's interesting to see the different visualisations, but why not simply overlay on a world map?

    1. Would certainly make sense, though it may make it a little harder to make out the connections.

  3. Nice work Jim! My favourite map is the worldwide travel heatmap. With the tools you used would it be possible to add an interactive 'time slider window' to the graph to see what routes were travelled between certain dates? The slider could run horizontally beneath the graph. The lefthand edge of the slider sets the start date and the righthand side the end date. Only data between those dates would be displayed.

  4. Nice ideas. Unfortunately Gephi is fairly static, so it would be a lot of effort to create any kind of dynamic content :/

  5. It's also nice to see the groupings of continental airports jumping out - in particular the Green nodes in the bottom right corresponding to African airports.

    Glyn Willmoth

  6. Excellent visualizations. By the way, is there any corresponding software for that? On the other hand, I have some personal data protection resource to share - (877)-871-1295

  7. Very Good !
    May I ask how to see aircraft registration in the statistics or top 10 ?
