Visualising the confirmed cases of COVID-19 in England (15 March 2020)

Preamble

In [23]:
import numpy as np                   # for multi-dimensional containers 
import pandas as pd                  # for DataFrames
import plotly.graph_objects as go    # for data visualisation
import plotly.io as pio              # to set shahin plot layout
from geopy.geocoders import Nominatim
from geopy.exc import GeocoderTimedOut
from IPython.display import display, clear_output

pio.templates['shahin'] = pio.to_templated(
    go.Figure().update_layout(
        legend=dict(orientation="h", y=1.1, x=.5, xanchor='center'),
        margin=dict(t=0, r=0, b=0, l=0),
    )
).layout.template
pio.templates.default = 'shahin'

Introduction

This section is similar to the previous ones on Visualising the confirmed cases of COVID-19 in England with Scattergeo and Mapbox.

Since publishing those visualisations, it was announced that the original data source is no longer being updated. The data is now available at arcgis.com instead.

Terms of use taken from the data source

No special restrictions or limitations on using the item’s content have been provided.

Visualising the Table

The first step is to read the CSV data into a pandas.DataFrame and display the first five rows.

In [8]:
data = pd.read_csv('https://www.arcgis.com/sharing/rest/content/items/b684319181f94875a6879bbc833ca3a6/data')
data.head()
Out[8]:
GSS_CD GSS_NM TotalCases
0 E09000002 Barking and Dagenham 7
1 E09000003 Barnet 25
2 E08000016 Barnsley 5
3 E06000022 Bath and North East Somerset 2
4 E06000055 Bedford 0

We have the local authority (GSS_NM) and the number of confirmed cases (TotalCases). Let's add two new columns, lat and lon, so that we can store the location information that we'll use to visualise the data.

In [9]:
data["lat"] = np.nan
data["lon"] = np.nan
data.head()
Out[9]:
GSS_CD GSS_NM TotalCases lat lon
0 E09000002 Barking and Dagenham 7 NaN NaN
1 E09000003 Barnet 25 NaN NaN
2 E08000016 Barnsley 5 NaN NaN
3 E06000022 Bath and North East Somerset 2 NaN NaN
4 E06000055 Bedford 0 NaN NaN

We can see the columns are empty. Let's populate these by making requests through GeoPy.

First, we need an instance of the Nominatim (geocoder for OpenStreetMap data) object. We don't want to violate the usage policy, so we'll also pass in a user_agent.

In [10]:
geolocator = Nominatim(user_agent="covid_shahinrostami.com")

Let's see if we can get some location data using one of our GSS_NM items. To demonstrate, we'll use the one for my hometown, Bournemouth.

In [11]:
data.GSS_NM[10]
Out[11]:
'Bournemouth, Christchurch and Poole'

This will now be passed into the geocode() method. We'll also append "UK" to the string for disambiguation, e.g. France has a "Bury" too.

In [12]:
location = geolocator.geocode(f"{data.GSS_NM[10]}, UK")
location
Out[12]:
Location(Bournemouth, Bournemouth, Christchurch and Poole, South West England, England, United Kingdom, (50.744673199999994, -1.8579577396350433, 0.0))

It looks like this has returned all the information we need. We can also access the latitude and longitude directly as attributes.

In [13]:
print(location.latitude, location.longitude)
50.744673199999994 -1.8579577396350433
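
One thing to keep in mind is that geocode() returns None when it can't match the query, in which case reading .latitude raises an AttributeError. A minimal sketch of a guard follows. The helper name coords_or_nan and the stand-in FakeLocation class are hypothetical; they are only there so the example runs without making a network request.

```python
import math
from collections import namedtuple

# Hypothetical stand-in for a geopy Location, used only for illustration
FakeLocation = namedtuple("FakeLocation", ["latitude", "longitude"])

def coords_or_nan(location):
    """Return (lat, lon) for a location, or (nan, nan) if it is None."""
    if location is None:
        return (math.nan, math.nan)
    return (location.latitude, location.longitude)

found = coords_or_nan(FakeLocation(50.74, -1.86))
missing = coords_or_nan(None)
print(found)    # (50.74, -1.86)
print(missing)  # (nan, nan)
```

Falling back to NaN matches how the lat and lon columns were initialised, so a failed lookup simply leaves that row empty.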

Now we need to do this for every local authority (GSS_NM) in our dataset and fill in the missing lat and lon values.

In [14]:
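for index, row in data.iterrows():
    location = geolocator.geocode(f"{row.GSS_NM}, UK", timeout=100)

    # geocode() returns None when no match is found, so only
    # fill in the coordinates when a location came back
    if location is not None:
        data.loc[index, 'lat'] = location.latitude
        data.loc[index, 'lon'] = location.longitude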

    # None of the following code is required
    # I just wanted a progress bar!
    clear_output(wait = True)
    amount_unloaded = np.floor(((data.shape[0]-index)/data.shape[0])*25).astype(int)
    amount_loaded = np.ceil((index/data.shape[0])*25).astype(int)
    loading = f"Retrieving locations >{'|'*amount_loaded}{'.'*amount_unloaded}<"
    display(loading)

print("Done!")
'Retrieving locations >|||||||||||||||||||||||||<'
Done!
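
The preamble imports GeocoderTimedOut, which geopy raises when a request times out, but the loop doesn't handle it. One way to make the loop more forgiving is to retry a failed lookup a few times. The sketch below is an assumption rather than part of the original notebook: geocode_with_retry is a hypothetical helper, and it takes the exception types to retry on as a parameter so that the example runs without geopy or a network connection.

```python
import time

def geocode_with_retry(geocode, query, retry_on, attempts=3, wait=0.0):
    """Call geocode(query), retrying when one of retry_on is raised.

    Re-raises the last exception if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return geocode(query)
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(wait)  # pause before trying again

# Hypothetical flaky geocoder: fails twice, then succeeds
calls = {"count": 0}

def flaky_geocode(query):
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return (51.5, -0.1)

result = geocode_with_retry(flaky_geocode, "Barnet, UK", retry_on=(TimeoutError,))
print(result, calls["count"])
```

In the notebook, this could be called as geocode_with_retry(geolocator.geocode, f"{row.GSS_NM}, UK", retry_on=(GeocoderTimedOut,), wait=1), where the one-second wait also keeps the request rate low.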

Now let's put this on the map! We'll go for a bubble plot on a map of the UK, where larger bubbles indicate more confirmed cases.

Note

To plot on Mapbox maps with Plotly you will need a Mapbox account and a public Mapbox Access Token. I've removed mine from mapbox_access_token in the cell below.

In [21]:
data['text'] = data['GSS_NM'] + '<br>Confirmed Cases ' + (data['TotalCases']).astype(str)

mapbox_access_token = "your_mapbox_access_token"

fig = go.Figure(go.Scattermapbox(
    lon = data['lon'],
    lat = data['lat'],
    mode = 'markers',
    marker = go.scattermapbox.Marker(
        size = data['TotalCases'],  # bubble size grows with confirmed cases
        color = 'rgb(180,0,0)',
    ),
    text = data['text'],
))

fig.update_layout(
    autosize = True,
    hovermode = 'closest',
    mapbox = dict(
        accesstoken = mapbox_access_token,
        bearing = 0,
        center = {'lat': (data.lat.min() + data.lat.max())/2,
                'lon': (data.lon.min() + data.lon.max())/2},
        pitch = 0,
        zoom = 5,
        style = "basic", # try basic, dark, light, outdoors, or satellite.
    ),
)

fig.show()

It's an interactive plot, so you can hover over it to get more information.
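
With size set directly to TotalCases, a local authority with only a handful of cases gets a marker just a few pixels wide, while large counts can dominate the map. Plotly markers accept sizemode, sizeref, and sizemin to control this scaling. The sketch below follows the sizing rule suggested in Plotly's bubble chart documentation. The helper name bubble_sizeref and the 40-pixel maximum are assumptions chosen for illustration.

```python
def bubble_sizeref(values, max_marker_px=40):
    """Scaling factor so the largest value maps to roughly
    max_marker_px when markers use sizemode='area'."""
    return 2.0 * max(values) / (max_marker_px ** 2)

# TotalCases from the first five rows shown earlier
total_cases = [7, 25, 5, 2, 0]

marker = dict(
    size=total_cases,
    sizemode='area',                      # scale bubble area, not diameter
    sizeref=bubble_sizeref(total_cases),  # largest bubble is about 40 px wide
    sizemin=3,                            # keep small counts visible
    color='rgb(180,0,0)',
)
print(marker['sizeref'])
```

This dict could be passed as marker = go.scattermapbox.Marker(**marker), with size taken from data['TotalCases'] rather than the short list used here.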

Conclusion

In this notebook, we went on a rather quick journey. We loaded in some data from a hosted CSV file, used a helpful service to populate some location data, and plotted it all on a map using Plotly.
