May 20, 2016 Hatem Ben Yacoub

How to Create an Open Data Regional Heatmap with R

The map on the cover of the regional Open Data report for the Arab world was created in R. Looks simple? Let’s see how to do it.

First we need some data, so download the ODB-3rdEdition-Rankings.csv file and read it into R.
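The original snippet for this step is not preserved, so here is a minimal sketch of reading the rankings file. To keep it runnable anywhere, it writes a tiny stand-in CSV first; the column names other than Country, Implementation and Rank.Change (which appear later in this post) and all the values are assumptions, not the real file content.

```r
# Stand-in for ODB-3rdEdition-Rankings.csv so the sketch runs anywhere.
# Column names and values are made up for illustration.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Country,ODB.Score,Implementation,Rank.Change",
  "Tunisia,10,11,1",
  "UAE,20,21,-2",
  "France,30,31,0"
), csv_path)

# With the real file, pass the path of the downloaded CSV instead.
ODBScore2015 <- read.csv(csv_path)

str(ODBScore2015)   # column names and types
head(ODBScore2015)  # first rows
```

Checking `str()` right after reading is a cheap way to see whether the country names came in as character strings or as factors, which matters further down.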

[Screenshot: ODBScore2015]

I’ll use the row IDs here to select the 9 Arab countries.
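Selecting rows by ID can be sketched as follows. The data frame is a toy stand-in and the IDs in `arab_ids` are hypothetical; with the real file they would be the nine row numbers of the Arab countries.

```r
# Toy stand-in for the full rankings table (values are made up).
ODBScore2015 <- data.frame(
  Country   = c("France", "Tunisia", "Kenya", "UAE", "Egypt"),
  ODB.Score = c(30, 10, 15, 20, 12),
  stringsAsFactors = FALSE
)

# Hypothetical row IDs of the countries to keep.
arab_ids <- c(2, 4, 5)

# Row indexing keeps every column for the selected rows.
ODB2015Arab <- ODBScore2015[arab_ids, ]
ODB2015Arab$Country
```

Row numbers are quick but fragile: if the file is re-sorted they point at different countries, so selecting with `Country %in% c(...)` is a safer alternative.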

[Screenshot: ODB2015Arab]

That’s pretty simple, I think! We have the data, so now let’s put it on a map. I’ll be using the ggplot2 library here, so install it first with install.packages() if you don’t have it already.

[Plot: Rplot04]

We don’t need to plot the whole world, which is why we use a subset. I’ll put the whole MENA region in one variable, mena_region, then use odb_region to hold only the countries that are covered in this edition of the ODB.
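A rough sketch of the two subsets, using a toy data frame with the same six columns that `map_data("world")` returns (long, lat, group, order, region, subregion). The coordinates and both country lists are illustrative assumptions, not the real ones.

```r
# Toy stand-in for ggplot2::map_data("world"): one row per polygon vertex.
world <- data.frame(
  long      = c(9, 10, 10, 54, 55, 55, 2, 3, 3, 30, 31, 31),
  lat       = c(34, 34, 35, 24, 24, 25, 46, 46, 47, 26, 26, 27),
  group     = rep(1:4, each = 3),
  order     = 1:12,
  region    = rep(c("Tunisia", "United Arab Emirates", "France", "Egypt"),
                  each = 3),
  subregion = NA_character_,
  stringsAsFactors = FALSE
)

# Illustrative, shortened list of MENA country names.
mena_names <- c("Tunisia", "United Arab Emirates", "Egypt")
mena_region <- subset(world, region %in% mena_names)

# Illustrative list of countries covered by the ODB edition.
odb_names <- c("Tunisia", "Egypt")
odb_region <- subset(mena_region, region %in% odb_names)

unique(odb_region$region)
```

The match with `%in%` is on the exact spelling of the name, so a country written differently in the two sources silently drops out of the subset.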

[Plot: Rplot05]

Here you should notice that one country is missing from the map (I did not notice it myself at first either). Can you spot it? In the ODB data the Emirates appear as UAE, while in map_data they appear as United Arab Emirates, so we need to fix this.
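One way to find and fix the mismatch, sketched on toy data. It assumes the Country column is a character vector; the score values are made up.

```r
# Toy ODB table with the short name used in the ODB data.
ODB2015Arab <- data.frame(
  Country   = c("Tunisia", "UAE"),
  ODB.Score = c(10, 20),
  stringsAsFactors = FALSE
)

# Names as they appear in the map data.
map_names <- c("Tunisia", "United Arab Emirates")

# Countries in the ODB table that have no match on the map.
setdiff(ODB2015Arab$Country, map_names)  # "UAE"

# Rename the entry so both sources use the same spelling.
ODB2015Arab$Country[ODB2015Arab$Country == "UAE"] <- "United Arab Emirates"

setdiff(ODB2015Arab$Country, map_names)  # character(0)
```

Running `setdiff()` before plotting catches this kind of silent mismatch without having to spot the gap on the map.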

[Screenshot: 2016-05-20 at 7.58.22 PM]

This is the behaviour of read.csv, which converts strings to factors (the default in R versions before 4.0.0). A simple way to fix this is to set stringsAsFactors = FALSE from the beginning, when reading the file.
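To see why the factor conversion gets in the way, here is a small demonstration on two toy vectors: assigning a value that is not an existing level of a factor produces NA, while the same assignment on a character vector just works.

```r
# Country names stored as a factor, as older read.csv() defaults would do.
country_factor <- factor(c("Tunisia", "UAE"))

# "United Arab Emirates" is not one of the levels, so R warns about an
# invalid factor level and stores NA instead.
suppressWarnings(
  country_factor[country_factor == "UAE"] <- "United Arab Emirates"
)
is.na(country_factor)  # FALSE TRUE

# The same replacement on a plain character vector behaves as expected.
country_chr <- c("Tunisia", "UAE")
country_chr[country_chr == "UAE"] <- "United Arab Emirates"
country_chr
```

Passing `stringsAsFactors = FALSE` to `read.csv()` gives character columns from the start, which avoids the problem altogether.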

[Plot: Rplot13]

Now I can see the missing country! Well, that’s not the exact code, but you should notice that all the countries are there!

Now I will plot the ODB data on this map. The first step is to merge odb_region with ODB2015Arab.
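A minimal sketch of the merge on toy data. The two data frames stand in for the real ones, and the ODB.Score column name and all values are assumptions. The final re-sort by `order` is my addition: `merge()` sorts its result by the key, and polygon vertices need to stay in drawing order.

```r
# Toy map data with the six map_data() columns.
odb_region <- data.frame(
  long      = c(9, 10, 10, 54, 55, 55),
  lat       = c(34, 34, 35, 24, 24, 25),
  group     = rep(1:2, each = 3),
  order     = 1:6,
  region    = rep(c("Tunisia", "United Arab Emirates"), each = 3),
  subregion = NA_character_,
  stringsAsFactors = FALSE
)

# Toy ODB table with three columns (values are made up).
ODB2015Arab <- data.frame(
  Country     = c("United Arab Emirates", "Tunisia"),
  ODB.Score   = c(20, 10),
  Rank.Change = c(-2, 1),
  stringsAsFactors = FALSE
)

# Rename 'region' so both tables share the key column 'Country'.
names(odb_region)[names(odb_region) == "region"] <- "Country"

# Each map vertex picks up the ODB variables of its country.
odb_region <- merge(odb_region, ODB2015Arab, by = "Country")

# Restore the vertex order so polygons are drawn correctly.
odb_region <- odb_region[order(odb_region$order), ]

ncol(odb_region)  # 6 + 3 - 1 = 8
```

The column count after the merge is the sum of both tables minus the shared key.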

I just renamed the ‘region’ column to be able to merge it by ‘Country’. See the result below :

[Screenshot: odb_region]

You’ll notice that odb_region now has 31 variables, up from only 6 before the merge. You can now plot any variable available in the ODB-3rdEdition-Rankings.csv file.

Let’s start with the ODB score, the one I used on the report cover. I think the code for this is self-explanatory.
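The original plotting code is not preserved, so this is only a sketch of the general approach with ggplot2: fill each country polygon by its score and strip the axes. The two square "countries", the ODB.Score column name and the colours are all assumptions.

```r
library(ggplot2)

# Two toy square "countries"; with real data this would be the merged
# odb_region data frame.
odb_region <- data.frame(
  long      = c(0, 1, 1, 0, 2, 3, 3, 2),
  lat       = c(0, 0, 1, 1, 0, 0, 1, 1),
  group     = rep(1:2, each = 4),
  Country   = rep(c("A", "B"), each = 4),
  ODB.Score = rep(c(10, 20), each = 4)
)

p <- ggplot(odb_region,
            aes(x = long, y = lat, group = group, fill = ODB.Score)) +
  geom_polygon(colour = "white") +                    # country borders
  scale_fill_gradient(low = "lightblue", high = "darkblue",
                      name = "ODB Score") +           # light to dark scale
  coord_fixed() +                                     # keep the aspect ratio
  theme_void()                                        # no axes, ticks, grid

p  # printing the object draws the map
```

`theme_void()` removes axes and labels in one go; the same effect can be had by blanking individual elements with `theme()`.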

[Plot: Rplot07]

I won’t explain the whole code in detail, as it’s just parameters of the ggplot() function to remove the axes and labels, set colors, etc. We can plot Rank.Change, for example, in the same way.
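For a variable like Rank.Change, which can be negative or positive, a diverging colour scale centred on zero is one reasonable choice. This is a sketch on toy polygons with made-up values, not the colours used in the report.

```r
library(ggplot2)

# Three toy square "countries" with a negative, zero and positive change.
odb_region <- data.frame(
  long        = c(0, 1, 1, 0, 2, 3, 3, 2, 4, 5, 5, 4),
  lat         = rep(c(0, 0, 1, 1), times = 3),
  group       = rep(1:3, each = 4),
  Rank.Change = rep(c(-3, 0, 4), each = 4)
)

p <- ggplot(odb_region,
            aes(x = long, y = lat, group = group, fill = Rank.Change)) +
  geom_polygon(colour = "grey50") +
  # Diverging scale: losses in red, no change in white, gains in green.
  scale_fill_gradient2(low = "firebrick", mid = "white", high = "darkgreen",
                       midpoint = 0, name = "Rank change") +
  coord_fixed() +
  theme_void()

p
```

A sequential scale (as for the score) hides the sign of the change, which is why `scale_fill_gradient2()` with `midpoint = 0` fits this variable better.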

[Plot: Rplot12]

Or the Implementation variable:

[Plot: Rplot11]

The only real issue here is choosing the right colours. I hope this helps you play with ODB data and plot any variable for the region of your choice.

Enjoy!


About the Author

Hatem Ben Yacoub Energy Engineer, Entrepreneur, ICT & eGov Consultant with over 15 years experience. Independent Open Data expert.

(HBY) Consultancy