Visualizing Rebel Alliances in the UK Government

The UK will shortly go to the polls for the 2015 General Election. However there's currently no clear front-runner, and in fact no clear coalition on the cards for a new government. The "new normal" of hung parliaments and coalition forming as part of UK politics appears to be here to stay.

As such, I decided to take a look at the open dataset provided by The Public Whip project, with a view to visualizing the relationships between MPs (members of parliament) in the 2010 to 2015 UK government, using a tool called Gephi. The idea was to analyse how MPs are related through their voting patterns in the house of commons, and in particular how they are related through agreement or rebelliousness.

Also I'll admit it: I wanted to write an article with "Rebel Alliance" in the title because I like Star Wars.

In the rest of this article, I'll describe several visualizations that were created from that public whip dataset. These show various aspects of MP relation…

How to create your own replica of the SureChEMBL patent-chemistry dataset

Introduction: Why replicate SureChEMBL? SureChEMBL is a patent chemistry dataset and set of web services that provides a rich source of information to the drug discovery research community. It was previously owned, developed, and sold by Macmillan, but was recently handed over to the European Bioinformatics Institute (EMBL/EBI) and is now free for everyone to use.

SureChEMBL can already be accessed online, so why would a locally hosted replica be needed?

To answer that question, I'll give the reasons provided by a pharmaceutical company who recently commissioned me to develop a SureChEMBL data replication facility:
1) Firewall restrictions can be avoided - companies involved in drug discovery are often working with substructures or other related search queries which may lead to highly lucrative discoveries. As such, researchers are often prohibited from using external web services, even secure services such as SureChEMBL, as a risk-mitigation strategy. Downloading data files - e.g…

Anatomy of an Emerging Knowledge Network: The Zapnito Graph Vizualized

In this article, I take a high-level look Zapnito, a multi-tenant "Networked Knowledge" platform designed around small, expert communities.

Zapnito is a knowledge sharing platform that allows organizations to create branded networks of experts. It's aimed at publishers, consultancies, media companies, and other corporations. Zapnito includes some social features (such as follow relationships, collaboration), but its focus is knowledge sharing rather than social networking.

As the founder puts it: "Zapnito is a white label platform that offers knowledge network capabilities for publishers. We provide both highly private and open networks, and we own neither publisher content or associated data - both of these are retained by publishers." 

The aim of this article is to show some of the interesting insights that can be gained from basic Social Network Analysis (SNA) of Zapnito. I'll be showing visualizations (such as that on the right) built from an anonymiz…

I Know Where You Were Last Summer: London's public bike data is telling everyone where you've been

This article is about a publicly available dataset of bicycle journey data that contains enough information to track the movements of individual cyclists across London, for a six month period just over a year ago.

I'll also explore how this dataset could be linked with other datasets to identify the actual people who made each of these journeys, and the privacy concerns this kind of linking raises.


It probably won't surprise you to learn that there is a publicly available Transport For London dataset that contains records of bike journeys for London's bicycle hire scheme. What may surprise you is that this record includes unique customer identifiers, as well as the location and date/time for the start and end of each journey. The public dataset currently covers a period of six months between 2012 and 2013.

What are the consequences of this? It means that someone who has access to the data can extract and analyse the journeys made by individual cyclists within London du…

London maps and bike rental communities, according to Boris Bike journey data

Every time someone in London makes a journey on a Boris Bike (officially, the Barclays Cycle Hire Scheme), the local government body Transport For London (TFL) record that journey. TFL make some of this data available for download, to allow further analysis and experimentation.

Below, you'll find maps of the most popular bike stations and routes in London, created from the TFL data using Gephi, plus a few simple data processing scripts that I threw together. The idea for these maps originated within a project group at a course on Data Visualisation, held at the Guardian last year. We're working on a more publisher friendly form, so thank you to my course mates for giving me the go ahead to include them here.

First, here's a map showing all bike stations and all popular journeys.

The first map shows the most popular routes and bike stations, those with more than ~150 journeys made during the six months of data that TFL make available. The size of each bike station in this m…

Visualizing The Wheel of Time: Reader Sentiment for an Epic Fantasy Series

In the following blog post, I explore reader sentiment for the Epic Fantasy book series The Wheel of Time, as expressed in user-submitted ratings on Amazon and GoodReads.

If you're a data scientist (or similar), you'll probably be interested in the data analysis which includes some interesting observations about the usefulness - or otherwise - of Amazon reviews. If you're a Epic Fantasy Series reader, you'll be interested in the outcome of my analysis: I've decided to go ahead and read all fourteen books.

Note -This is a spoiler free zone. 
Updated 17th Sept: charts now more accurate, not quite as pretty

Recently, I was looking for a good book to read, and a friend recommended The Wheel of Time series by Robert Jordan. But I'd heard from a few sources that the later volumes were harder going than the earlier volumes. Having struggled with later volumes of the Game of Thrones series (sorry George) I was wary of starting a mammoth fourteen volume series unless I …

Personal Data Hacks: Visualizing Data from

A friend recently told me about, a website that allows you to record, analyze, and share personal flight data. He showed me his dataset, which contained a record of every flight he'd taken over the past 10+ years. I was keen to investigate the dataset further, and my friend was happy to provide me with a copy so I could have a play (thank you Luigi!).

The end result is the following collection of visualizations created in Gephi, with a little help from R. They show key transport hubs and routes for airports, countries, and continents that my friend has visited, and demonstrate some of the fun, insightful ways you can use such personal data.

If you're interested in how the visualizations were created, check out the section at the end of this blog posting where I briefly describe the technologies required and steps involved.

Note: is free to use, and supported by advertising and donations. You can join me in supporting via this lin…