Animal Crossing Villagers - Style Co-occurrence

Preamble

In [1]:
import numpy as np                   # for multi-dimensional containers 
import pandas as pd                  # for DataFrames
import itertools
from chord import Chord

Introduction

In previous sections, we visualised co-occurrences of Pokémon types. In this section, we're going to use a dataset of Animal Crossing villagers to visualise the co-occurrence of villager styles, which are recorded in the Style 1 and Style 2 columns.

The Dataset

Each row of the dataset describes one villager, with variables such as Species, Personality, and Hobby alongside the two style columns.

Let's load the dataset and have a look for ourselves.

In [2]:
data_url = '/mnt/d/shahinrostami.github.io/files/datasets/ac/villagers.csv'
data = pd.read_csv(data_url)
data.head()
Out[2]:
Name Species Gender Personality Hobby Birthday Catchphrase Favorite Song Style 1 Style 2 Color 1 Color 2 Wallpaper Flooring Furniture List Filename Unique Entry ID
0 Admiral Bird Male Cranky Nature 27-Jan aye aye Steep Hill Cool Cool Black Blue dirt-clod wall tatami 717;1849;7047;2736;787;5970;3449;3622;3802;410... brd06 B3RyfNEqwGmcccRC3
1 Agent S Squirrel Female Peppy Fitness 2-Jul sidekick Go K.K. Rider Active Simple Blue Black concrete wall colorful tile flooring 7845;7150;3468;4080;290;3971;3449;1708;4756;25... squ05 SGMdki6dzpDZyXAw5
2 Agnes Pig Female Big Sister Play 21-Apr snuffle K.K. House Simple Elegant Pink White gray molded-panel wall arabesque flooring 4129;7236;7235;7802;896;3428;4027;7325;3958;71... pig17 jzWCiDPm9MqtCfecP
3 Al Gorilla Male Lazy Fitness 18-Oct ayyyeee Go K.K. Rider Active Active Red White concrete wall green rubber flooring 1452;4078;4013;833;4116;3697;7845;3307;3946;39... gor08 LBifxETQJGEaLhBjC
4 Alfonso Alligator Male Lazy Play 9-Jun it'sa me Forest Life Simple Simple Red Blue yellow playroom wall green honeycomb tile 4763;3205;3701;1557;3623;85;3208;3584;4761;121... crd00 REpd8KxB8p9aGBRSE

It looks good so far, but let's display the whole DataFrame to see how many villagers and variables we have.

In [3]:
data
Out[3]:
Name Species Gender Personality Hobby Birthday Catchphrase Favorite Song Style 1 Style 2 Color 1 Color 2 Wallpaper Flooring Furniture List Filename Unique Entry ID
0 Admiral Bird Male Cranky Nature 27-Jan aye aye Steep Hill Cool Cool Black Blue dirt-clod wall tatami 717;1849;7047;2736;787;5970;3449;3622;3802;410... brd06 B3RyfNEqwGmcccRC3
1 Agent S Squirrel Female Peppy Fitness 2-Jul sidekick Go K.K. Rider Active Simple Blue Black concrete wall colorful tile flooring 7845;7150;3468;4080;290;3971;3449;1708;4756;25... squ05 SGMdki6dzpDZyXAw5
2 Agnes Pig Female Big Sister Play 21-Apr snuffle K.K. House Simple Elegant Pink White gray molded-panel wall arabesque flooring 4129;7236;7235;7802;896;3428;4027;7325;3958;71... pig17 jzWCiDPm9MqtCfecP
3 Al Gorilla Male Lazy Fitness 18-Oct ayyyeee Go K.K. Rider Active Active Red White concrete wall green rubber flooring 1452;4078;4013;833;4116;3697;7845;3307;3946;39... gor08 LBifxETQJGEaLhBjC
4 Alfonso Alligator Male Lazy Play 9-Jun it'sa me Forest Life Simple Simple Red Blue yellow playroom wall green honeycomb tile 4763;3205;3701;1557;3623;85;3208;3584;4761;121... crd00 REpd8KxB8p9aGBRSE
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
386 Winnie Horse Female Peppy Fashion 31-Jan hay-OK K.K. Country Cool Simple White Gray backyard-fence wall backyard lawn 1626;1626;5970;4003;3509;1620;1627;3467;1631;1... hrs05 b4HwfyvThyipScSAv
387 Wolfgang Wolf Male Cranky Education 25-Nov snarrrl K.K. D&B Cool Active Black Green dark wooden-mosaic wall stripe flooring 4117;7323;7323;3275;4109;3270;3196;4338;3200;3... wol02 RbF2wcn6jRxtgLDRd
388 Yuka Koala Female Snooty Fashion 20-Jul tsk tsk Soulful K.K. Cool Elegant Orange Yellow beige art-deco wall simple purple flooring 3957;3955;2554;3974;3951;794;4106;3959;3958;997 kal00 QDcxk3dCNT6yeD9hk
389 Zell Deer Male Smug Music 7-Jun pronk K.K. Swing Cool Gorgeous Purple Gray cityscape wall monochromatic tile flooring 1875;863;4129;4053;4053;3951;794;3775;4046;423... der02 LodBWtdMRZbjFNga9
390 Zucker Octopus Male Lazy Nature 8-Mar bloop Spring Blossoms Simple Cute Blue Yellow chain-link fence backyard lawn 4042;4412;7526;4077;4077;3064;4077;3946;3617;3... ocp02 2F5tipHqgmWvWXTLC

391 rows × 17 columns

We have 391 villagers, each described by 17 variables.

Data Wrangling

We need to do a bit of data wrangling before we can visualise our data. We can see from the column names that the villager styles are split between the columns Style 1 and Style 2.

In [4]:
pd.DataFrame(data.columns.values.tolist())
Out[4]:
0
0 Name
1 Species
2 Gender
3 Personality
4 Hobby
5 Birthday
6 Catchphrase
7 Favorite Song
8 Style 1
9 Style 2
10 Color 1
11 Color 2
12 Wallpaper
13 Flooring
14 Furniture List
15 Filename
16 Unique Entry ID

So let's work with just these two columns as we move forward.

In the rows displayed above, both style columns are populated, and some villagers, such as Admiral (Cool, Cool) and Al (Active, Active), list the same style twice. We are only interested in the co-occurrence of styles, so we'll count those villagers on the diagonal of the co-occurrence matrix.
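Whether any style entries are missing can be checked directly rather than by eye. The following is a minimal sketch on a small made-up frame (the `Example` row and its missing value are invented for illustration); the same two calls could be applied to `data`.

```python
import pandas as pd

# Made-up rows for illustration; the real data comes from villagers.csv
toy = pd.DataFrame({
    'Name': ['Admiral', 'Agent S', 'Example'],
    'Style 1': ['Cool', 'Active', 'Simple'],
    'Style 2': ['Cool', 'Simple', None],
})

style_columns = ['Style 1', 'Style 2']

# Count the missing entries in each style column
missing_per_column = toy[style_columns].isna().sum()

# Keep only the rows where both styles are present
complete = toy.dropna(subset=style_columns)
print(missing_per_column)
print(len(complete))
```

If every count is zero, no rows need to be dropped.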

Our chord diagram will need two inputs: the co-occurrence matrix, and a list of names to label the segments.

First we'll populate our list of style names by looking for the unique ones.

In [5]:
names = np.unique(data[['Style 1', 'Style 2']]).tolist()
pd.DataFrame(names)
Out[5]:
0
0 Active
1 Cool
2 Cute
3 Elegant
4 Gorgeous
5 Simple

Now we can create our empty co-occurrence matrix using these style names for the row and column indices.

In [6]:
matrix = pd.DataFrame(0, index=names, columns=names)
matrix
Out[6]:
Active Cool Cute Elegant Gorgeous Simple
Active 0 0 0 0 0 0
Cool 0 0 0 0 0 0
Cute 0 0 0 0 0 0
Elegant 0 0 0 0 0 0
Gorgeous 0 0 0 0 0 0
Simple 0 0 0 0 0 0

We can populate the co-occurrence matrix by iterating over the villagers. When a villager's two styles differ, we increment the pairing in both its original and reversed form so that the matrix stays symmetric. When the two styles are the same, we increment the diagonal entry once.

In [8]:
for index, x in data.iterrows():
    if x['Style 1'] != x['Style 2']:
        # Two different styles: count the pairing in both directions
        matrix.at[x['Style 1'], x['Style 2']] += 1
        matrix.at[x['Style 2'], x['Style 1']] += 1
    else:
        # The same style twice: count once on the diagonal
        matrix.at[x['Style 1'], x['Style 1']] += 1

matrix = matrix.values.tolist()
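The same counts can also be produced without an explicit loop. The sketch below is one possible alternative, not the approach used in this section: it cross-tabulates the two columns with `pd.crosstab`, adds the transpose to make the result symmetric, and then restores the diagonal, which the addition would otherwise double. The five rows are the first five villagers shown earlier.

```python
import numpy as np
import pandas as pd

# Style columns of the first five villagers from the table above
toy = pd.DataFrame({
    'Style 1': ['Cool', 'Active', 'Simple', 'Active', 'Simple'],
    'Style 2': ['Cool', 'Simple', 'Elegant', 'Active', 'Simple'],
})

names = sorted(set(toy['Style 1']) | set(toy['Style 2']))

# Directed counts: rows are Style 1, columns are Style 2
counts = pd.crosstab(toy['Style 1'], toy['Style 2'])
counts = counts.reindex(index=names, columns=names, fill_value=0)

# Add the transpose so that each pairing is counted in both directions
symmetric = counts.to_numpy() + counts.to_numpy().T

# The diagonal has now been counted twice, so restore the original values
np.fill_diagonal(symmetric, np.diag(counts.to_numpy()))

matrix = symmetric.tolist()
print(pd.DataFrame(matrix, index=names, columns=names))
```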

We can display the matrix as a DataFrame for better presentation.

In [9]:
pd.DataFrame(matrix)
Out[9]:
0 1 2 3 4 5
0 6 22 13 3 5 45
1 22 9 0 20 22 45
2 13 0 16 22 10 48
3 3 20 22 1 46 17
4 5 22 10 46 1 7
5 45 45 48 17 7 33
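As a quick sanity check, the matrix should be symmetric, and because every villager contributes to exactly one cell on or above the diagonal, the upper triangle should sum to the number of villagers. A small sketch using the values printed above:

```python
import numpy as np

# Values copied from the co-occurrence matrix printed above
matrix = [
    [6, 22, 13, 3, 5, 45],
    [22, 9, 0, 20, 22, 45],
    [13, 0, 16, 22, 10, 48],
    [3, 20, 22, 1, 46, 17],
    [5, 22, 10, 46, 1, 7],
    [45, 45, 48, 17, 7, 33],
]
m = np.array(matrix)

# Each off-diagonal pairing was incremented in both directions
is_symmetric = bool((m == m.T).all())

# np.triu keeps the diagonal and everything above it
villagers_counted = int(np.triu(m).sum())
print(is_symmetric, villagers_counted)
```

The total agrees with the 391 rows reported for the dataset.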
In [120]:
colors = ["#fee440","#00bbf9","#00f5d4","#9b5de5","#f15bb5","#f68251"]


Chord(matrix, names, colors=colors).show()
Chord Diagram

Chord Diagram with Names

It would be nice to show a list of villager names and images when hovering over co-occurring styles. To do this, we can make use of the optional details parameter.

Let's also add a column to our dataset to store URLs that point to the images.

In [102]:
data['URL'] = ""

for index, row in data.iterrows():
    # Build the image file name from the villager's name: spaces and
    # apostrophes become underscores, and full stops are dropped
    file_name = row.Name.replace(' ', '_').replace("'", '_').replace('.', '')
    url = "https://shahinrostami.com/images/data-is-beautiful/villagers/{}.png".format(file_name)
    data.at[index,'URL'] = url
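To make the renaming rule easier to follow, here it is pulled out into a small helper and applied to a whole column at once. This is only a sketch: `to_file_name` is a hypothetical helper name, and the three sample names are used purely to exercise each replacement.

```python
import pandas as pd

BASE_URL = "https://shahinrostami.com/images/data-is-beautiful/villagers/{}.png"

def to_file_name(name):
    """Hypothetical helper: apply the same replacements as the loop above."""
    return name.replace(' ', '_').replace("'", '_').replace('.', '')

# Sample names chosen to exercise a space, an apostrophe, and a full stop
toy = pd.DataFrame({'Name': ['Agent S', "O'Hare", 'Wart Jr.']})

# Series.map applies the helper to every name without an explicit loop
toy['URL'] = toy['Name'].map(lambda name: BASE_URL.format(to_file_name(name)))
print(toy['URL'].tolist())
```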
In [103]:
details = np.empty((len(names),len(names)),dtype=object)
details_thumbs = np.empty((len(names),len(names)),dtype=object)

Now we can populate the details array with lists of villager names in the correct positions.

In [104]:
details = np.empty((len(names),len(names)),dtype=object)
details_thumbs = np.empty((len(names),len(names)),dtype=object)

for count_x, item_x in enumerate(names):
    for count_y, item_y in enumerate(names):
        if item_y == item_x:
            # Diagonal: villagers with the same style listed twice
            mask = (data['Style 1'] == item_x) & (data['Style 2'] == item_x)
        else:
            # Off-diagonal: villagers with this pair of styles, in either order
            mask = (((data['Style 1'] == item_x) & (data['Style 2'] == item_y)) |
                    ((data['Style 2'] == item_x) & (data['Style 1'] == item_y)))

        # Pairs that never co-occur select no rows and give empty lists
        details[count_x][count_y] = data[mask]['Name'].to_list()
        details_thumbs[count_x][count_y] = data[mask]['URL'].to_list()

details=pd.DataFrame(details).values.tolist()
details_thumbs=pd.DataFrame(details_thumbs).values.tolist()
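The nested loop filters the whole DataFrame once per cell. An alternative, sketched below under the assumption that every villager has exactly two style entries, is to walk the rows once and append each name to the cell or cells it belongs to. The four rows are taken from the table shown earlier; `position` is a name introduced here for illustration.

```python
import pandas as pd

# First four villagers from the table above
toy = pd.DataFrame({
    'Name': ['Admiral', 'Agent S', 'Agnes', 'Al'],
    'Style 1': ['Cool', 'Active', 'Simple', 'Active'],
    'Style 2': ['Cool', 'Simple', 'Elegant', 'Active'],
})
names = ['Active', 'Cool', 'Elegant', 'Simple']

# Map each style name to its row/column position in the matrix
position = {name: i for i, name in enumerate(names)}

# A grid of independent empty lists, one per cell
details = [[[] for _ in names] for _ in names]

rows = toy[['Name', 'Style 1', 'Style 2']].itertuples(index=False, name=None)
for name, first, second in rows:
    i, j = position[first], position[second]
    details[i][j].append(name)
    if i != j:
        # Mirror off-diagonal entries so the details stay symmetric
        details[j][i].append(name)

print(details[0][3], details[1][1])
```

The thumbnail URLs could be collected in the same pass by appending to a second grid.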
In [105]:
print(data.Name)
0       Admiral
1       Agent S
2         Agnes
3            Al
4       Alfonso
         ...   
386      Winnie
387    Wolfgang
388        Yuka
389        Zell
390      Zucker
Name: Name, Length: 391, dtype: object

Finally, we can put it all together but this time with the details matrix passed in.

In [121]:
Chord(
    matrix,
    names,
    colors=colors,
    details=details,
    details_thumbs=details_thumbs,
    noun="Villagers",
    thumbs_width=50,
    thumbs_margin=0,
    popup_width=740,
    thumbs_font_size=10,
    credit=True
).show()
Chord Diagram

Conclusion

In this section, we demonstrated how to conduct some data wrangling on a downloaded dataset to prepare it for a chord diagram. Our chord diagram is interactive, so you can use your mouse or touchscreen to investigate the co-occurrences!

League of Legends - Class Combinations

Preamble

In [1]:
import numpy as np                   # for multi-dimensional containers 
import pandas as pd                  # for DataFrames
import itertools
from chord import Chord

Introduction

In previous sections, we visualised co-occurrences of Pokémon types. In this section, we're going to use League of Legends champion data to visualise how often the champion classes, such as Fighter and Tank, occur together.

The Dataset

The dataset is a JSON file with one entry per champion. Each entry carries a type, format, and version, and the champion's own attributes are nested inside a data field.

Let's download the mirrored dataset and have a look for ourselves.

In [2]:
data_url = 'https://shahinrostami.com/datasets/lol/champion.json'
data = pd.read_json(data_url)
data.head()
Out[2]:
type format version data
Aatrox champion standAloneComplex 10.13.1 {'version': '10.13.1', 'id': 'Aatrox', 'key': ...
Ahri champion standAloneComplex 10.13.1 {'version': '10.13.1', 'id': 'Ahri', 'key': '1...
Akali champion standAloneComplex 10.13.1 {'version': '10.13.1', 'id': 'Akali', 'key': '...
Alistar champion standAloneComplex 10.13.1 {'version': '10.13.1', 'id': 'Alistar', 'key':...
Amumu champion standAloneComplex 10.13.1 {'version': '10.13.1', 'id': 'Amumu', 'key': '...

It looks good so far, but the champion attributes are nested inside the data column. Let's expand them into columns of their own, keeping the champion names as the index.

In [3]:
data = pd.DataFrame(data.data.tolist()).set_index(data.index)
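To see what this expansion does, here is a minimal sketch on a two-row frame that mimics the structure of the loaded JSON. The dictionaries are cut down to two keys for illustration. Each dictionary in the `data` column becomes one row of the new DataFrame, and the original index is carried over.

```python
import pandas as pd

# A cut-down stand-in for the loaded JSON: one nested dict per champion
raw = pd.DataFrame(
    {
        'version': ['10.13.1', '10.13.1'],
        'data': [
            {'id': 'Aatrox', 'tags': ['Fighter', 'Tank']},
            {'id': 'Akali', 'tags': ['Assassin']},
        ],
    },
    index=['Aatrox', 'Akali'],
)

# A list of dicts becomes a DataFrame with one column per key
expanded = pd.DataFrame(raw['data'].tolist()).set_index(raw.index)
print(expanded)
```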
In [4]:
data
Out[4]:
version id key name title blurb info image tags partype stats
Aatrox 10.13.1 Aatrox 266 Aatrox the Darkin Blade Once honored defenders of Shurima against the ... {'attack': 8, 'defense': 4, 'magic': 3, 'diffi... {'full': 'Aatrox.png', 'sprite': 'champion0.pn... [Fighter, Tank] Blood Well {'hp': 580, 'hpperlevel': 90, 'mp': 0, 'mpperl...
Ahri 10.13.1 Ahri 103 Ahri the Nine-Tailed Fox Innately connected to the latent power of Rune... {'attack': 3, 'defense': 4, 'magic': 8, 'diffi... {'full': 'Ahri.png', 'sprite': 'champion0.png'... [Mage, Assassin] Mana {'hp': 526, 'hpperlevel': 92, 'mp': 418, 'mppe...
Akali 10.13.1 Akali 84 Akali the Rogue Assassin Abandoning the Kinkou Order and her title of t... {'attack': 5, 'defense': 3, 'magic': 8, 'diffi... {'full': 'Akali.png', 'sprite': 'champion0.png... [Assassin] Energy {'hp': 575, 'hpperlevel': 95, 'mp': 200, 'mppe...
Alistar 10.13.1 Alistar 12 Alistar the Minotaur Always a mighty warrior with a fearsome reputa... {'attack': 6, 'defense': 9, 'magic': 5, 'diffi... {'full': 'Alistar.png', 'sprite': 'champion0.p... [Tank, Support] Mana {'hp': 600, 'hpperlevel': 106, 'mp': 350, 'mpp...
Amumu 10.13.1 Amumu 32 Amumu the Sad Mummy Legend claims that Amumu is a lonely and melan... {'attack': 2, 'defense': 6, 'magic': 8, 'diffi... {'full': 'Amumu.png', 'sprite': 'champion0.png... [Tank, Mage] Mana {'hp': 613.12, 'hpperlevel': 84, 'mp': 287.2, ...
... ... ... ... ... ... ... ... ... ... ... ...
Zed 10.13.1 Zed 238 Zed the Master of Shadows Utterly ruthless and without mercy, Zed is the... {'attack': 9, 'defense': 2, 'magic': 1, 'diffi... {'full': 'Zed.png', 'sprite': 'champion4.png',... [Assassin] Energy {'hp': 584, 'hpperlevel': 85, 'mp': 200, 'mppe...
Ziggs 10.13.1 Ziggs 115 Ziggs the Hexplosives Expert With a love of big bombs and short fuses, the ... {'attack': 2, 'defense': 4, 'magic': 9, 'diffi... {'full': 'Ziggs.png', 'sprite': 'champion4.png... [Mage] Mana {'hp': 536, 'hpperlevel': 92, 'mp': 480, 'mppe...
Zilean 10.13.1 Zilean 26 Zilean the Chronokeeper Once a powerful Icathian mage, Zilean became o... {'attack': 2, 'defense': 5, 'magic': 8, 'diffi... {'full': 'Zilean.png', 'sprite': 'champion4.pn... [Support, Mage] Mana {'hp': 504, 'hpperlevel': 82, 'mp': 452, 'mppe...
Zoe 10.13.1 Zoe 142 Zoe the Aspect of Twilight As the embodiment of mischief, imagination, an... {'attack': 1, 'defense': 7, 'magic': 8, 'diffi... {'full': 'Zoe.png', 'sprite': 'champion4.png',... [Mage, Support] Mana {'hp': 560, 'hpperlevel': 92, 'mp': 425, 'mppe...
Zyra 10.13.1 Zyra 143 Zyra Rise of the Thorns Born in an ancient, sorcerous catastrophe, Zyr... {'attack': 4, 'defense': 3, 'magic': 8, 'diffi... {'full': 'Zyra.png', 'sprite': 'champion4.png'... [Mage, Support] Mana {'hp': 504, 'hpperlevel': 79, 'mp': 418, 'mppe...

148 rows × 11 columns

We have 148 champions, each described by 11 variables.

Data Wrangling

We need to do a bit of data wrangling before we can visualise our data. We can see from the column names that the champion classes are stored in the tags column.

In [5]:
pd.DataFrame(data.columns.values.tolist())
Out[5]:
0
0 version
1 id
2 key
3 name
4 title
5 blurb
6 info
7 image
8 tags
9 partype
10 stats

Each entry in the tags column is a list containing one or two classes, for example [Fighter, Tank] for Aatrox and [Assassin] for Akali. We are only interested in the co-occurrence of classes, so this is the only column we need to build the matrix.

Our chord diagram will need two inputs: the co-occurrence matrix, and a list of names to label the segments.

First we'll populate our list of class names by flattening the lists in the tags column and looking for the unique values.

In [6]:
types = [item for sublist in data.tags.tolist() for item in sublist]
names = np.unique(types).tolist()
pd.DataFrame(names)
Out[6]:
0
0 Assassin
1 Fighter
2 Mage
3 Marksman
4 Support
5 Tank
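The flattening step can also be written with `itertools.chain.from_iterable`, which makes use of the `itertools` import from the preamble. A minimal sketch on three tag lists taken from the table above:

```python
from itertools import chain

# Tag lists for Aatrox, Ahri, and Akali, as shown in the table above
tags = [['Fighter', 'Tank'], ['Mage', 'Assassin'], ['Assassin']]

# chain.from_iterable joins the inner lists into one stream of class names;
# set() removes duplicates and sorted() gives a stable alphabetical order
names = sorted(set(chain.from_iterable(tags)))
print(names)
```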

Now we can create our empty co-occurrence matrix using these class names for the row and column indices.

In [7]:
matrix = pd.DataFrame(0, index=names, columns=names)
matrix
Out[7]:
Assassin Fighter Mage Marksman Support Tank
Assassin 0 0 0 0 0 0
Fighter 0 0 0 0 0 0
Mage 0 0 0 0 0 0
Marksman 0 0 0 0 0 0
Support 0 0 0 0 0 0
Tank 0 0 0 0 0 0

We can populate the co-occurrence matrix by iterating over the tags. Each entry is a list whose length tells us how many classes the champion has. For two classes we increment the pairing in both its original and reversed form, and for a single class we increment the diagonal entry.

In [8]:
len(data.tags.iloc[0])  # number of classes for the first champion
Out[8]:
2
In [9]:
for x in data.tags:
    if len(x) == 2:
        # Two classes: count the pairing in both directions
        matrix.at[x[0], x[1]] += 1
        matrix.at[x[1], x[0]] += 1
    elif len(x) == 1:
        # A single class: count once on the diagonal
        matrix.at[x[0], x[0]] += 1

matrix = matrix.values.tolist()
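Another way to look at the same tally is to count each distinct class combination first and fill the matrix afterwards. The sketch below uses `collections.Counter` on a few made-up tag lists and assumes, as in this dataset, that a champion has at most two classes.

```python
from collections import Counter

import pandas as pd

# Made-up tag lists for illustration
tags = [['Fighter', 'Tank'], ['Tank', 'Fighter'], ['Assassin'], ['Mage', 'Assassin']]

# Sorting each list means [Tank, Fighter] and [Fighter, Tank] count as one combination
combos = Counter(tuple(sorted(t)) for t in tags)

names = sorted({c for t in tags for c in t})
matrix = pd.DataFrame(0, index=names, columns=names)

for combo, count in combos.items():
    # For a single-class combination, first and last are the same class
    first, last = combo[0], combo[-1]
    matrix.at[first, last] += count
    if first != last:
        matrix.at[last, first] += count

print(combos)
print(matrix)
```

The intermediate `combos` counter is also a convenient summary of which combinations are most common.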

We can display the matrix as a DataFrame for better presentation.

In [10]:
pd.DataFrame(matrix)
Out[10]:
0 1 2 3 4 5
0 5 17 8 5 1 0
1 17 3 6 1 3 34
2 8 6 13 6 21 4
3 5 1 6 13 2 0
4 1 3 21 2 1 4
5 0 34 4 0 4 1
In [11]:
colors = ["#ffbe0b","#fb5607","#ff006e","#8338ec","#3a86ff","#80FF72"]

Chord(matrix, names, colors=colors).show()
Chord Diagram

Chord Diagram with Names

It would be nice to show a list of champion names and images when hovering over co-occurring classes. To do this, we can make use of the optional details parameter.

Let's also add a column to our dataset to store URLs that point to the images.

In [12]:
data['URL'] = ""

for index, row in data.iterrows():
    # row.name is the row's index label (for example 'Khazix'), not the value in the 'name' column
    url = f"https://shahinrostami.com/images/data-is-beautiful/lol/champion/{row.name}.png"
    data.at[index,'URL'] = url
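One detail in this loop is easy to miss: on a row produced by `iterrows`, the attribute `row.name` is the index label, while the value of the column called name has to be read with `row['name']`. A small sketch using Kha'Zix, whose index label and display name differ in the table shown later:

```python
import pandas as pd

# Index labels are the champion ids; the 'name' column holds the display names
toy = pd.DataFrame({'name': ["Kha'Zix", 'Akali']}, index=['Khazix', 'Akali'])

labels = []
display_names = []
for index, row in toy.iterrows():
    labels.append(row.name)            # attribute access: the index label
    display_names.append(row['name'])  # item access: the 'name' column

print(labels)
print(display_names)
```

The URLs above are therefore built from the index labels rather than the display names.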
In [13]:
data.URL
Out[13]:
Aatrox     https://shahinrostami.com/images/data-is-beaut...
Ahri       https://shahinrostami.com/images/data-is-beaut...
Akali      https://shahinrostami.com/images/data-is-beaut...
Alistar    https://shahinrostami.com/images/data-is-beaut...
Amumu      https://shahinrostami.com/images/data-is-beaut...
                                 ...                        
Zed        https://shahinrostami.com/images/data-is-beaut...
Ziggs      https://shahinrostami.com/images/data-is-beaut...
Zilean     https://shahinrostami.com/images/data-is-beaut...
Zoe        https://shahinrostami.com/images/data-is-beaut...
Zyra       https://shahinrostami.com/images/data-is-beaut...
Name: URL, Length: 148, dtype: object
In [14]:
data.loc['Akali']
Out[14]:
version                                              10.13.1
id                                                     Akali
key                                                       84
name                                                   Akali
title                                     the Rogue Assassin
blurb      Abandoning the Kinkou Order and her title of t...
info       {'attack': 5, 'defense': 3, 'magic': 8, 'diffi...
image      {'full': 'Akali.png', 'sprite': 'champion0.png...
tags                                              [Assassin]
partype                                               Energy
stats      {'hp': 575, 'hpperlevel': 95, 'mp': 200, 'mppe...
URL        https://shahinrostami.com/images/data-is-beaut...
Name: Akali, dtype: object
In [15]:
names
Out[15]:
['Assassin', 'Fighter', 'Mage', 'Marksman', 'Support', 'Tank']

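Next, we'll split the tags column into two columns, tag_1 and tag_2, so that we can filter on each class. Champions with a single class will have None in tag_2. We'll then create empty multi-dimensional arrays with the same shape as our matrix for our details and thumbnail images.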

In [16]:
data[['tag_1','tag_2']] = pd.DataFrame(data.tags.tolist(), index= data.index)
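The split works because building a DataFrame from lists of unequal length pads the shorter ones with missing values. A minimal sketch on three champions from the table above, assuming no champion has more than two tags:

```python
import pandas as pd

toy = pd.DataFrame(
    {'tags': [['Fighter', 'Tank'], ['Assassin'], ['Mage', 'Support']]},
    index=['Aatrox', 'Akali', 'Zyra'],
)

# Lists of unequal length are padded, so Akali has no value in tag_2
toy[['tag_1', 'tag_2']] = pd.DataFrame(toy['tags'].tolist(), index=toy.index)

# Single-class champions are the ones with a null tag_2
single_class = toy[toy['tag_2'].isnull()].index.tolist()
print(toy[['tag_1', 'tag_2']])
print(single_class)
```

If a champion had three tags, the expansion would produce three columns and the assignment to two column names would fail, so the length check performed earlier is worth keeping.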
In [17]:
data
Out[17]:
version id key name title blurb info image tags partype stats URL tag_1 tag_2
Aatrox 10.13.1 Aatrox 266 Aatrox the Darkin Blade Once honored defenders of Shurima against the ... {'attack': 8, 'defense': 4, 'magic': 3, 'diffi... {'full': 'Aatrox.png', 'sprite': 'champion0.pn... [Fighter, Tank] Blood Well {'hp': 580, 'hpperlevel': 90, 'mp': 0, 'mpperl... https://shahinrostami.com/images/data-is-beaut... Fighter Tank
Ahri 10.13.1 Ahri 103 Ahri the Nine-Tailed Fox Innately connected to the latent power of Rune... {'attack': 3, 'defense': 4, 'magic': 8, 'diffi... {'full': 'Ahri.png', 'sprite': 'champion0.png'... [Mage, Assassin] Mana {'hp': 526, 'hpperlevel': 92, 'mp': 418, 'mppe... https://shahinrostami.com/images/data-is-beaut... Mage Assassin
Akali 10.13.1 Akali 84 Akali the Rogue Assassin Abandoning the Kinkou Order and her title of t... {'attack': 5, 'defense': 3, 'magic': 8, 'diffi... {'full': 'Akali.png', 'sprite': 'champion0.png... [Assassin] Energy {'hp': 575, 'hpperlevel': 95, 'mp': 200, 'mppe... https://shahinrostami.com/images/data-is-beaut... Assassin None
Alistar 10.13.1 Alistar 12 Alistar the Minotaur Always a mighty warrior with a fearsome reputa... {'attack': 6, 'defense': 9, 'magic': 5, 'diffi... {'full': 'Alistar.png', 'sprite': 'champion0.p... [Tank, Support] Mana {'hp': 600, 'hpperlevel': 106, 'mp': 350, 'mpp... https://shahinrostami.com/images/data-is-beaut... Tank Support
Amumu 10.13.1 Amumu 32 Amumu the Sad Mummy Legend claims that Amumu is a lonely and melan... {'attack': 2, 'defense': 6, 'magic': 8, 'diffi... {'full': 'Amumu.png', 'sprite': 'champion0.png... [Tank, Mage] Mana {'hp': 613.12, 'hpperlevel': 84, 'mp': 287.2, ... https://shahinrostami.com/images/data-is-beaut... Tank Mage
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Zed 10.13.1 Zed 238 Zed the Master of Shadows Utterly ruthless and without mercy, Zed is the... {'attack': 9, 'defense': 2, 'magic': 1, 'diffi... {'full': 'Zed.png', 'sprite': 'champion4.png',... [Assassin] Energy {'hp': 584, 'hpperlevel': 85, 'mp': 200, 'mppe... https://shahinrostami.com/images/data-is-beaut... Assassin None
Ziggs 10.13.1 Ziggs 115 Ziggs the Hexplosives Expert With a love of big bombs and short fuses, the ... {'attack': 2, 'defense': 4, 'magic': 9, 'diffi... {'full': 'Ziggs.png', 'sprite': 'champion4.png... [Mage] Mana {'hp': 536, 'hpperlevel': 92, 'mp': 480, 'mppe... https://shahinrostami.com/images/data-is-beaut... Mage None
Zilean 10.13.1 Zilean 26 Zilean the Chronokeeper Once a powerful Icathian mage, Zilean became o... {'attack': 2, 'defense': 5, 'magic': 8, 'diffi... {'full': 'Zilean.png', 'sprite': 'champion4.pn... [Support, Mage] Mana {'hp': 504, 'hpperlevel': 82, 'mp': 452, 'mppe... https://shahinrostami.com/images/data-is-beaut... Support Mage
Zoe 10.13.1 Zoe 142 Zoe the Aspect of Twilight As the embodiment of mischief, imagination, an... {'attack': 1, 'defense': 7, 'magic': 8, 'diffi... {'full': 'Zoe.png', 'sprite': 'champion4.png',... [Mage, Support] Mana {'hp': 560, 'hpperlevel': 92, 'mp': 425, 'mppe... https://shahinrostami.com/images/data-is-beaut... Mage Support
Zyra 10.13.1 Zyra 143 Zyra Rise of the Thorns Born in an ancient, sorcerous catastrophe, Zyr... {'attack': 4, 'defense': 3, 'magic': 8, 'diffi... {'full': 'Zyra.png', 'sprite': 'champion4.png'... [Mage, Support] Mana {'hp': 504, 'hpperlevel': 79, 'mp': 418, 'mppe... https://shahinrostami.com/images/data-is-beaut... Mage Support

148 rows × 14 columns

In [19]:
details = np.empty((len(names),len(names)),dtype=object)
details_thumbs = np.empty((len(names),len(names)),dtype=object)

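Now we can populate the details array with lists of champion names in the correct positions. Champions with a single class have a null tag_2, which is how we'll select the entries for the diagonal.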

In [20]:
data[(data['tag_1'] == "Assassin") & (data["tag_2"].isnull())]
Out[20]:
version id key name title blurb info image tags partype stats URL tag_1 tag_2
Akali 10.13.1 Akali 84 Akali the Rogue Assassin Abandoning the Kinkou Order and her title of t... {'attack': 5, 'defense': 3, 'magic': 8, 'diffi... {'full': 'Akali.png', 'sprite': 'champion0.png... [Assassin] Energy {'hp': 575, 'hpperlevel': 95, 'mp': 200, 'mppe... https://shahinrostami.com/images/data-is-beaut... Assassin None
Khazix 10.13.1 Khazix 121 Kha'Zix the Voidreaver The Void grows, and the Void adapts—in none of... {'attack': 9, 'defense': 4, 'magic': 3, 'diffi... {'full': 'Khazix.png', 'sprite': 'champion1.pn... [Assassin] Mana {'hp': 572.8, 'hpperlevel': 85, 'mp': 327.2, '... https://shahinrostami.com/images/data-is-beaut... Assassin None
Shaco 10.13.1 Shaco 35 Shaco the Demon Jester Crafted long ago as a plaything for a lonely p... {'attack': 8, 'defense': 4, 'magic': 6, 'diffi... {'full': 'Shaco.png', 'sprite': 'champion3.png... [Assassin] Mana {'hp': 587, 'hpperlevel': 89, 'mp': 297.2, 'mp... https://shahinrostami.com/images/data-is-beaut... Assassin None
Talon 10.13.1 Talon 91 Talon the Blade's Shadow Talon is the knife in the darkness, a merciles... {'attack': 9, 'defense': 3, 'magic': 1, 'diffi... {'full': 'Talon.png', 'sprite': 'champion3.png... [Assassin] Mana {'hp': 588, 'hpperlevel': 95, 'mp': 377.2, 'mp... https://shahinrostami.com/images/data-is-beaut... Assassin None
Zed 10.13.1 Zed 238 Zed the Master of Shadows Utterly ruthless and without mercy, Zed is the... {'attack': 9, 'defense': 2, 'magic': 1, 'diffi... {'full': 'Zed.png', 'sprite': 'champion4.png',... [Assassin] Energy {'hp': 584, 'hpperlevel': 85, 'mp': 200, 'mppe... https://shahinrostami.com/images/data-is-beaut... Assassin None
In [21]:
for count_x, item_x in enumerate(names):
    for count_y, item_y in enumerate(names):
        if(item_y == item_x):
            details_urls = data[
                (data['tag_1'] == item_x) & (data["tag_2"].isnull())]['URL'].to_list()

            details_names = data[
                (data['tag_1'] == item_x) & (data["tag_2"].isnull())]['name'].to_list()
        else:
            details_urls = data[
                (data['tag_1'].isin([item_x, item_y])) &
                (data['tag_2'].isin([item_y, item_x]))]['URL'].to_list()

            details_names = data[
                (data['tag_1'].isin([item_x, item_y])) &
                (data['tag_2'].isin([item_y, item_x]))]['name'].to_list()
        
        urls_names = np.column_stack((details_urls, details_names))
        if(urls_names.size > 0):
            details[count_x][count_y] = details_names
            details_thumbs[count_x][count_y] = details_urls

        else:
            details[count_x][count_y] = []
            details_thumbs[count_x][count_y] = []

details=pd.DataFrame(details).values.tolist()
details_thumbs=pd.DataFrame(details_thumbs).values.tolist()
In [22]:
pd.DataFrame(details)
Out[22]:
0 1 2 3 4 5
0 [Akali, Kha'Zix, Shaco, Talon, Zed] [Ekko, Fiora, Fizz, Irelia, Jax, Kayn, Lee Sin... [Ahri, Evelynn, Kassadin, Katarina, LeBlanc, M... [Quinn, Teemo, Tristana, Twitch, Vayne] [Pyke] []
1 [Ekko, Fiora, Fizz, Irelia, Jax, Kayn, Lee Sin... [Gangplank, Mordekaiser, Rek'Sai] [Diana, Elise, Gragas, Rumble, Ryze, Swain] [Jayce] [Kayle, Taric, Thresh] [Aatrox, Blitzcrank, Camille, Darius, Dr. Mund...
2 [Ahri, Evelynn, Kassadin, Katarina, LeBlanc, M... [Diana, Elise, Gragas, Rumble, Ryze, Swain] [Annie, Aurelion Sol, Brand, Cassiopeia, Karth... [Azir, Ezreal, Jhin, Kennen, Kog'Maw, Varus] [Anivia, Bard, Fiddlesticks, Heimerdinger, Ive... [Amumu, Cho'Gath, Galio, Maokai]
3 [Quinn, Teemo, Tristana, Twitch, Vayne] [Jayce] [Azir, Ezreal, Jhin, Kennen, Kog'Maw, Varus] [Aphelios, Caitlyn, Corki, Draven, Graves, Jin... [Ashe, Senna] []
4 [Pyke] [Kayle, Taric, Thresh] [Anivia, Bard, Fiddlesticks, Heimerdinger, Ive... [Ashe, Senna] [Rakan] [Alistar, Braum, Leona, Tahm Kench]
5 [] [Aatrox, Blitzcrank, Camille, Darius, Dr. Mund... [Amumu, Cho'Gath, Galio, Maokai] [] [Alistar, Braum, Leona, Tahm Kench] [Shen]

Finally, we can put it all together but this time with the details matrix passed in.

In [23]:
Chord(
    matrix,
    names,
    colors=colors,
    details=details,
    details_thumbs=details_thumbs,
    noun="Champions",
    thumbs_width=70,
    thumbs_margin=1,
    popup_width=740,
    thumbs_font_size=10,
    credit=True
).show()
Chord Diagram

Conclusion

In this section, we demonstrated how to conduct some data wrangling on a downloaded dataset to prepare it for a chord diagram. Our chord diagram is interactive, so you can use your mouse or touchscreen to investigate the co-occurrences!

Top Olympic Medal Earning Countries

Preamble

In [1]:
import numpy as np                   # for multi-dimensional containers 
import pandas as pd                  # for DataFrames
import itertools
from chord import Chord

Introduction

In previous sections, we visualised co-occurrences of Pokémon types. In this section, we're going to use a dataset of Olympic athlete events to visualise the relationship between the top medal-earning countries and the medals they have won.

The Dataset

The dataset has one row per athlete per event, with columns that include the athlete's National Olympic Committee code (NOC) and the Medal won, if any.

Let's download the mirrored dataset and have a look for ourselves.

In [2]:
data_url = 'https://shahinrostami.com/datasets/athlete_events.csv'
raw_data = pd.read_csv(data_url)
raw_data.head()
Out[2]:
ID Name Sex Age Height Weight Team NOC Games Year Season City Sport Event Medal
0 1 A Dijiang M 24.0 180.0 80.0 China CHN 1992 Summer 1992 Summer Barcelona Basketball Basketball Men's Basketball NaN
1 2 A Lamusi M 23.0 170.0 60.0 China CHN 2012 Summer 2012 Summer London Judo Judo Men's Extra-Lightweight NaN
2 3 Gunnar Nielsen Aaby M 24.0 NaN NaN Denmark DEN 1920 Summer 1920 Summer Antwerpen Football Football Men's Football NaN
3 4 Edgar Lindenau Aabye M 34.0 NaN NaN Denmark/Sweden DEN 1900 Summer 1900 Summer Paris Tug-Of-War Tug-Of-War Men's Tug-Of-War Gold
4 5 Christine Jacoba Aaftink F 21.0 185.0 82.0 Netherlands NED 1988 Winter 1988 Winter Calgary Speed Skating Speed Skating Women's 500 metres NaN
In [3]:
data = raw_data[raw_data.Medal.notna()]
data.head()
Out[3]:
ID Name Sex Age Height Weight Team NOC Games Year Season City Sport Event Medal
3 4 Edgar Lindenau Aabye M 34.0 NaN NaN Denmark/Sweden DEN 1900 Summer 1900 Summer Paris Tug-Of-War Tug-Of-War Men's Tug-Of-War Gold
37 15 Arvo Ossian Aaltonen M 30.0 NaN NaN Finland FIN 1920 Summer 1920 Summer Antwerpen Swimming Swimming Men's 200 metres Breaststroke Bronze
38 15 Arvo Ossian Aaltonen M 30.0 NaN NaN Finland FIN 1920 Summer 1920 Summer Antwerpen Swimming Swimming Men's 400 metres Breaststroke Bronze
40 16 Juhamatti Tapio Aaltonen M 28.0 184.0 85.0 Finland FIN 2014 Winter 2014 Winter Sochi Ice Hockey Ice Hockey Men's Ice Hockey Bronze
41 17 Paavo Johannes Aaltonen M 28.0 175.0 64.0 Finland FIN 1948 Summer 1948 Summer London Gymnastics Gymnastics Men's Individual All-Around Bronze

We've kept only the rows where a medal was awarded. Let's check how many rows remain.

In [4]:
data.shape
Out[4]:
(39783, 15)
In [5]:
data = data[data['NOC'].isin(list(data['NOC'].value_counts()[:20].index))]
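The filter above keeps the rows belonging to the 20 NOCs with the most medal rows. The same pattern is sketched below on a tiny made-up frame, keeping the top two instead of the top 20; `top_n` is a name introduced here for illustration.

```python
import pandas as pd

# Made-up medal rows for illustration
toy = pd.DataFrame({
    'NOC': ['USA', 'USA', 'USA', 'FIN', 'FIN', 'SUI'],
    'Medal': ['Gold', 'Gold', 'Bronze', 'Bronze', 'Silver', 'Gold'],
})

top_n = 2

# value_counts() sorts from most to least frequent, so head() gives the top NOCs
top_nocs = toy['NOC'].value_counts().head(top_n).index

# Keep only the rows whose NOC is one of the top ones
filtered = toy[toy['NOC'].isin(top_nocs)]
print(list(top_nocs))
print(len(filtered))
```

Note that these are counts of medal rows, so a team event contributes one row per athlete.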

That leaves 39783 medal-winning entries across 15 columns. To keep the diagram readable, we've then kept only the 20 NOCs with the most medals.

Data Wrangling

We need to do a bit of data wrangling before we can visualise our data. We are only interested in which country won which medal, and this information is held in the columns NOC and Medal.

So let's select just these two columns and work with them as we move forward.

In [6]:
species_personality = pd.DataFrame(data[['NOC', 'Medal']].values).dropna().astype(str)
species_personality
Out[6]:
0 1
0 FIN Bronze
1 FIN Bronze
2 FIN Bronze
3 FIN Bronze
4 FIN Gold
... ... ...
30152 URS Gold
30153 URS Silver
30154 URS Bronze
30155 RUS Bronze
30156 RUS Silver

30157 rows × 2 columns

In [7]:
species_personality = species_personality.dropna()

Now for the names of our segments: the medals on one side and the NOCs on the other.

In [8]:
# Order the medals explicitly so that they appear as Gold, Silver, Bronze
left = ["Gold", "Silver", "Bronze"]

pd.DataFrame(left)
Out[8]:
0
0 Gold
1 Silver
2 Bronze
In [9]:
# value_counts() sorts by frequency, so the NOCs run from most to fewest medals
right = list(data['NOC'].value_counts().index)
pd.DataFrame(right)
Out[9]:
0
0 USA
1 URS
2 GER
3 GBR
4 FRA
5 ITA
6 SWE
7 CAN
8 AUS
9 RUS
10 HUN
11 NED
12 NOR
13 GDR
14 CHN
15 JPN
16 FIN
17 SUI
18 ROU
19 KOR

We can now combine the two lists to create an empty matrix, with every medal and every NOC as both a row and a column.

In [10]:
features= left+right
d = pd.DataFrame(0, index=features, columns=features)

Our chord diagram will need two inputs: the co-occurrence matrix, and a list of names to label the segments.

We can build a co-occurrence matrix with the following approach. For every NOC and medal pairing in our data, we'll increment both the (NOC, medal) cell and the (medal, NOC) cell, so that the matrix is symmetric.

In [11]:
species_personality.values
Out[11]:
array([['FIN', 'Bronze'],
       ['FIN', 'Bronze'],
       ['FIN', 'Bronze'],
       ...,
       ['URS', 'Bronze'],
       ['RUS', 'Bronze'],
       ['RUS', 'Silver']], dtype=object)
In [12]:
for x in species_personality.values:
    d.at[x[0], x[1]] += 1
    d.at[x[1], x[0]] += 1
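
To make the counting step concrete, here is a minimal sketch of the same loop on a handful of made-up pairings. The toy pairs and the two-NOC label list are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical (NOC, medal) pairings standing in for the real rows
pairs = [("FIN", "Bronze"), ("FIN", "Gold"), ("USA", "Gold"), ("FIN", "Bronze")]

medals = ["Gold", "Silver", "Bronze"]
nocs = ["USA", "FIN"]
labels = medals + nocs

# square matrix of zeros with the same labels on both axes
matrix = pd.DataFrame(0, index=labels, columns=labels)

for noc, medal in pairs:
    matrix.at[noc, medal] += 1   # original direction
    matrix.at[medal, noc] += 1   # mirrored direction keeps the matrix symmetric

print(matrix)
```

Each pairing is counted twice, once in each direction, so the sum of the whole matrix is twice the number of pairings.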

Chord Diagram

Time to visualise the co-occurrence of medals and NOCs using a chord diagram. We are going to use a list of custom colours, starting with gold, silver, and bronze for the medals.

In [13]:
colors =["#FFD700","#C0C0C0","#A57164",
'#e6194b', '#3cb44b', '#ffe119', '#4363d8', '#f58231', '#911eb4', '#46f0f0', '#f032e6', '#bcf60c', '#fabebe', '#008080', '#e6beff', '#9a6324', '#fffac8', '#800000', '#aaffc3', '#808000', '#ffd8b1', '#000075', '#808080', '#ffffff', '#000000'
         #'#e6194B', '#3cb44b', '#ffe119', '#4363d8', '#f58231', '#911eb4', '#42d4f4', '#f032e6', '#bfef45', '#fabed4', '#469990', '#dcbeff', '#9A6324', '#fffac8', '#800000', '#aaffc3', '#808000', '#ffd8b1', '#000075', '#a9a9a9', '#ffffff', '#000000'
        ]
In [14]:
names = left + right
len(names)
Out[14]:
23

Finally, we can put it all together.
In [15]:
Chord(d.values.tolist(), names,credit=True, colors=colors, wrap_labels=False,
      margin=40, font_size_large=7,noun="medals",
        details_separator="", divide=True, divide_idx=len(left),divide_size=.2).show()
Chord Diagram

Chord Diagram with Names

It would be nice to show a list of athlete names when hovering over a co-occurring medal and NOC. To do this, we can make use of the optional details parameter.


Next, we'll create an empty multi-dimensional array with the same shape as our matrix.

In [16]:
details = np.empty((len(names),len(names)),dtype=object)
details_thumbs = np.empty((len(names),len(names)),dtype=object)
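
The reason for dtype=object is that each cell needs to hold a Python list of varying length. The following is a small sketch of that idea, using hypothetical labels and a placeholder name.

```python
import numpy as np

labels = ["Gold", "Silver", "FIN"]   # hypothetical segment labels

# with dtype=object every cell starts out as None and can hold any Python object
details = np.empty((len(labels), len(labels)), dtype=object)
starts_as_none = details[0][0] is None

# give each cell its own empty list, then fill in one pairing in both directions
for i in range(len(labels)):
    for j in range(len(labels)):
        details[i][j] = []
details[0][2].append("Example Athlete")
details[2][0].append("Example Athlete")

# convert to nested Python lists
details_list = details.tolist()
print(details_list[0][2])
```

Assigning a fresh list to each cell avoids every cell sharing the same list object.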

Now we can populate the details array with lists of athlete names in the correct positions.

In [17]:
for count_x, item_x in enumerate(names):
    for count_y, item_y in enumerate(names):
        # rows where one label is the NOC and the other is the medal
        matches = data[
            (data['NOC'].isin([item_x, item_y])) &
            (data['Medal'].isin([item_y, item_x]))]

        details[count_x][count_y] = matches['Name'].to_list()
        # this dataset has no image URL column, so the thumbnails stay empty
        details_thumbs[count_x][count_y] = []

details=pd.DataFrame(details).values.tolist()
details_thumbs=pd.DataFrame(details_thumbs).values.tolist()
In [ ]:
len(right)

Finally, we can put it all together but this time with the details matrix passed in.

In [ ]:
Chord(d.values.tolist(), names,credit=True, colors=colors, wrap_labels=False,
      margin=40, font_size_large=7,details=details,details_thumbs=details_thumbs,noun="medals",
        details_separator="", divide=True, divide_idx=len(left),divide_size=.2, width=850).to_html()
In [ ]:
np.empty(shape=(6,1)).tolist()

Conclusion

In this section, we demonstrated how to conduct some data wrangling on a downloaded dataset to prepare it for a chord diagram. Our chord diagram is interactive, so you can use your mouse or touchscreen to investigate the co-occurrences!

Olympic Weightlifting Medals with Stacked Bar Charts

Preamble

In [1]:
import numpy as np                   # for multi-dimensional containers 
import pandas as pd                  # for DataFrames
import plotly.graph_objects as go    # for data visualisation

# Optional Customisations
import plotly.io as pio              # to set shahin plot layout
pio.templates['shahin'] = pio.to_templated(go.Figure().update_layout(
    legend=dict(orientation="h",y=1.1, x=.5, xanchor='center'),
    margin=dict(t=0,r=0,b=0,l=0))).layout.template
pio.templates.default = 'shahin'
pio.renderers.default = "notebook_connected" # remove when running locally 

Introduction

In this section, we're going to use 120 years of Olympic history to create a visualisation. Let's set our sights on something that illustrates the distribution of Olympic medals awarded for the weightlifting sport.

Weightlifting cats

The Dataset

We'll use the 120 years of Olympic history: athletes and results dataset, which we'll download and load with pandas. You're also welcome to use the mirrored copy that has been used in the following cell.

In [2]:
data_url = 'https://shahinrostami.com/datasets/athlete_events.csv'
raw_data = pd.read_csv(data_url)
raw_data.head()
Out[2]:
ID Name Sex Age Height Weight Team NOC Games Year Season City Sport Event Medal
0 1 A Dijiang M 24.0 180.0 80.0 China CHN 1992 Summer 1992 Summer Barcelona Basketball Basketball Men's Basketball NaN
1 2 A Lamusi M 23.0 170.0 60.0 China CHN 2012 Summer 2012 Summer London Judo Judo Men's Extra-Lightweight NaN
2 3 Gunnar Nielsen Aaby M 24.0 NaN NaN Denmark DEN 1920 Summer 1920 Summer Antwerpen Football Football Men's Football NaN
3 4 Edgar Lindenau Aabye M 34.0 NaN NaN Denmark/Sweden DEN 1900 Summer 1900 Summer Paris Tug-Of-War Tug-Of-War Men's Tug-Of-War Gold
4 5 Christine Jacoba Aaftink F 21.0 185.0 82.0 Netherlands NED 1988 Winter 1988 Winter Calgary Speed Skating Speed Skating Women's 500 metres NaN

It looks like the data was loaded without any issues. Let's have a quick look at the available features.

In [3]:
pd.DataFrame(raw_data.columns)
Out[3]:
0
0 ID
1 Name
2 Sex
3 Age
4 Height
5 Weight
6 Team
7 NOC
8 Games
9 Year
10 Season
11 City
12 Sport
13 Event
14 Medal

Data Wrangling

We're only interested in Olympic weightlifting data for our visualisation, so we'll filter by selecting all rows where the Sport is set to Weightlifting.

In [4]:
data = raw_data[raw_data.Sport =="Weightlifting"]
data.head()
Out[4]:
ID Name Sex Age Height Weight Team NOC Games Year Season City Sport Event Medal
80 22 Andreea Aanei F 22.0 170.0 125.0 Romania ROU 2016 Summer 2016 Summer Rio de Janeiro Weightlifting Weightlifting Women's Super-Heavyweight NaN
154 59 Ivan Nikolov Abadzhiev M 24.0 164.0 71.0 Bulgaria BUL 1956 Summer 1956 Summer Melbourne Weightlifting Weightlifting Men's Lightweight NaN
155 59 Ivan Nikolov Abadzhiev M 28.0 164.0 71.0 Bulgaria BUL 1960 Summer 1960 Summer Roma Weightlifting Weightlifting Men's Middleweight NaN
156 60 Mikhail Abadzhiev M 24.0 172.0 75.0 Bulgaria BUL 1960 Summer 1960 Summer Roma Weightlifting Weightlifting Men's Middleweight NaN
234 112 Aziz Abbas M 21.0 169.0 67.0 Iraq IRQ 1964 Summer 1964 Summer Tokyo Weightlifting Weightlifting Men's Lightweight NaN

If we look at the Medal column in the table above, we can see NaN values for when an athlete was not awarded a medal. As we're only interested in Olympic medalists for this visualisation, let's drop all the rows where no medal was awarded.

In [5]:
data = data[data.Medal.notna()]
data.head()
Out[5]:
ID Name Sex Age Height Weight Team NOC Games Year Season City Sport Event Medal
2331 1301 Sri Wahyuni Agustiani F 21.0 147.0 47.0 Indonesia INA 2016 Summer 2016 Summer Rio de Janeiro Weightlifting Weightlifting Women's Flyweight Silver
2637 1480 Franz Aigner M 32.0 NaN 107.0 Austria AUT 1924 Summer 1924 Summer Paris Weightlifting Weightlifting Men's Heavyweight Silver
3045 1698 Khadzhimurat Magomedovich Akkayev M 19.0 178.0 105.0 Russia RUS 2004 Summer 2004 Summer Athina Weightlifting Weightlifting Men's Middle-Heavyweight Silver
3046 1698 Khadzhimurat Magomedovich Akkayev M 23.0 178.0 105.0 Russia RUS 2008 Summer 2008 Summer Beijing Weightlifting Weightlifting Men's Middle-Heavyweight Bronze
3067 1713 Artur Vladimirovich Akoyev M 26.0 NaN 109.0 Unified Team EUN 1992 Summer 1992 Summer Barcelona Weightlifting Weightlifting Men's Heavyweight II Silver

If we're interested, we can take a peek at how many medals have been awarded in total for bronze, silver, and gold.

In [6]:
pd.DataFrame(data.Medal.value_counts())
Out[6]:
Medal
Gold 217
Bronze 216
Silver 213

Now that we have our filtered and relevant data, let's build a list of participating countries. At first glance, it looks like Team may be the feature we're interested in, and for the Weightlifting sport, it is indeed a good selection. However, in other sports in the same dataset, we will see Teams such as Japan-1 and Japan-2.

In [7]:
pd.DataFrame(raw_data[raw_data.Team.str.contains("Japan")].Team.unique())
Out[7]:
0
0 Japan
1 Japan-1
2 Japan-2
3 Japan-3
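
If we did want to use Team across every sport, one option would be to strip the numeric suffix from the team names. This is only a sketch of how such names could be cleaned and is not something this notebook does; the regular expression assumes the suffix is always a hyphen followed by digits.

```python
import pandas as pd

# Team names as seen above, plus one without a suffix for comparison
teams = pd.Series(["Japan", "Japan-1", "Japan-2", "Japan-3", "United States"])

# drop a trailing "-<digits>" so Japan-1, Japan-2, ... collapse to Japan
base_team = teams.str.replace(r"-\d+$", "", regex=True)
print(base_team.unique().tolist())
```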

For now, we'll continue with the NOC feature, which holds the three-letter code of the National Olympic Committee for each athlete.

In [8]:
noc = data.NOC.unique().tolist()
print(noc)
['INA', 'AUT', 'RUS', 'EUN', 'BUL', 'URS', 'LUX', 'USA', 'JPN', 'IRI', 'TUR', 'FRA', 'BLR', 'GEO', 'IRQ', 'HUN', 'AUS', 'POL', 'ROU', 'GER', 'SWE', 'ITA', 'CUB', 'GDR', 'CHN', 'NED', 'TPE', 'KAZ', 'PRK', 'MDA', 'GBR', 'ARM', 'UKR', 'BEL', 'CAN', 'PHI', 'LTU', 'GRE', 'TCH', 'EGY', 'COL', 'FIN', 'VIE', 'SUI', 'FRG', 'KOR', 'THA', 'NOR', 'DEN', 'MEX', 'EST', 'TTO', 'IND', 'UZB', 'NGR', 'CRO', 'VEN', 'QAT', 'LAT', 'ARG', 'SGP', 'LIB', 'ESP', 'AZE']

Visualising the Data

Now that we have prepared our data, let's create a few visualisations. Instead of just showing you the final visualisation, we will develop our visualisation incrementally, where each subsequent visualisation improves on the last.

Stacked Bar Chart - Iteration 1

When we started this notebook, we had the idea of creating a stacked bar chart to visualise the medals awarded to each country in the weightlifting sport. Our first visualisation may look something like the following.

In [9]:
fig = go.Figure(layout=dict(barmode='stack'))

fig.add_bar(name="Bronze", x=noc, y=data[data.Medal == "Bronze"].NOC
            .value_counts().reindex(noc), marker_color="brown")

fig.add_bar(name="Silver", x=noc, y=data[data.Medal == "Silver"].NOC
            .value_counts().reindex(noc), marker_color="silver")

fig.add_bar(name="Gold", x=noc, y=data[data.Medal == "Gold"].NOC
            .value_counts().reindex(noc), marker_color="gold")

fig.show()

It's not a bad start! We have our bars stacked in the right order, from bronze up to gold, and our colours were selected to be gold, silver, and brown (as there is no named colour for bronze).
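
Each trace above calls reindex(noc) so that its counts line up with the x-axis labels. The following minimal sketch, on made-up rows, shows what that step does. The fill_value=0 variation is an optional alternative and is not used in the notebook.

```python
import pandas as pd

# Toy medal table (illustrative values only)
toy = pd.DataFrame({
    "NOC":   ["USA", "FIN", "USA", "CHN"],
    "Medal": ["Gold", "Gold", "Silver", "Bronze"],
})
order = toy["NOC"].unique().tolist()   # x-axis order: USA, FIN, CHN

# gold medals per NOC; CHN is missing because it has no gold in the toy data
gold = toy[toy["Medal"] == "Gold"]["NOC"].value_counts()

# reindex aligns the counts with the x-axis order; missing NOCs become NaN
aligned = gold.reindex(order)

# optional variation: use explicit zeros instead of NaN
aligned_zero = gold.reindex(order, fill_value=0)
print(aligned_zero)
```

Without the reindex, the counts would be ordered by frequency and would not match the order of the labels passed as x.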

Stacked Bar Chart - Iteration 2

However, we can make some improvements to enhance the usefulness and beauty of the visualisation. Let's try the following:

  • Assign some specific HEX colour codes for our bar colours,
  • Order the bars in descending order by total medals awarded,
  • and Angle the bar (tick) labels at -45 degrees.
In [10]:
fig = go.Figure(layout=dict(
    barmode='stack', 
    xaxis= dict(categoryorder='total descending', tickangle=-45)))

fig.add_bar(name="Bronze", x=noc, y=data[data.Medal == "Bronze"].NOC
            .value_counts().reindex(noc), marker_color="#A57164")

fig.add_bar(name="Silver", x=noc, y=data[data.Medal == "Silver"].NOC
            .value_counts().reindex(noc), marker_color="#C0C0C0")

fig.add_bar(name="Gold", x=noc, y=data[data.Medal == "Gold"].NOC
            .value_counts().reindex(noc), marker_color="#FFD700")

fig.show()

Great! It's already looking easier to navigate, and the colours are more suitable for the data they're representing.

Stacked Bar Chart - Iteration 3

Let's continue to make improvements. This time we'll try the following:

  • Reduce the font-size of the bar (tick) labels, as some currently disappear if the width of the plot is too small (e.g., when shrinking the browser width),
  • Change the font-colour of our bar (tick) labels,
  • Add an outline and some transparency to our bars,
  • Reduce the gaps between our bars,
  • Hide the y-axis ticks,
  • and Add a thick line at the bottom of the x-axis.
In [11]:
fig = go.Figure(layout=dict(
    barmode='stack', bargap = 0.1,
    xaxis= dict(categoryorder='total descending', tickangle=-45,
                showline=True, linewidth=2, linecolor='black',ticks='',
                tickfont=dict(size=8, color='black')),
    yaxis=dict(showticklabels=False)))

fig.add_bar(name="Bronze", x=noc, y=data[data.Medal == "Bronze"].NOC
            .value_counts().reindex(noc), marker_color="#A57164")

fig.add_bar(name="Silver", x=noc, y=data[data.Medal == "Silver"].NOC
            .value_counts().reindex(noc), marker_color="#C0C0C0")

fig.add_bar(name="Gold", x=noc, y=data[data.Medal == "Gold"].NOC
            .value_counts().reindex(noc), marker_color="#FFD700")

fig.update_traces(marker_line_color='#003366',
                  marker_line_width=1, opacity=0.7)

fig.show()

Looking good!

Stacked Bar Chart - Final Iteration

Now to wrap things up, we may be interested in just selecting the "top 15" medal earning countries for weightlifting. We'll also start using the Team feature instead of working with the NOC. This will require some additional preparation. First, we'll determine the top 15 medal earners.

In [12]:
top_15 = data.Team.value_counts()[:15]
pd.DataFrame(top_15)
Out[12]:
Team
Soviet Union 62
China 57
United States 42
Bulgaria 36
Poland 32
Russia 26
Germany 25
Hungary 20
Iran 18
North Korea 17
Kazakhstan 16
Greece 16
France 16
Italy 15
Japan 14

Next, we'll filter our data to only include rows from these teams.

In [13]:
data =  data[data.Team.isin(list(top_15.index.values))]
teams = data.Team.unique().tolist()
data.head()
Out[13]:
ID Name Sex Age Height Weight Team NOC Games Year Season City Sport Event Medal
3045 1698 Khadzhimurat Magomedovich Akkayev M 19.0 178.0 105.0 Russia RUS 2004 Summer 2004 Summer Athina Weightlifting Weightlifting Men's Middle-Heavyweight Silver
3046 1698 Khadzhimurat Magomedovich Akkayev M 23.0 178.0 105.0 Russia RUS 2008 Summer 2008 Summer Beijing Weightlifting Weightlifting Men's Middle-Heavyweight Bronze
4001 2306 Ruslan Vladimirovich Albegov M 24.0 192.0 156.0 Russia RUS 2012 Summer 2012 Summer London Weightlifting Weightlifting Men's Super-Heavyweight Bronze
4360 2483 Rumen Aleksandrov M 20.0 176.0 89.0 Bulgaria BUL 1980 Summer 1980 Summer Moskva Weightlifting Weightlifting Men's Middle-Heavyweight Silver
4404 2511 Vasily Ivanovich Alekseyev M 30.0 185.0 160.0 Soviet Union URS 1972 Summer 1972 Summer Munich Weightlifting Weightlifting Men's Super-Heavyweight Gold

Finally, we'll produce our final visualisation that will display the top 15 medal earning countries (or teams) for the weightlifting sport. We'll also try the following improvements to our visualisation:

  • Changing the fonts to use Muli (if it's available),
  • Hiding the legend as the bar colours are all we need,
  • Adding a title (and some top-margin to give it space),
  • Adding text above our bars indicating the total medals per country,
  • Increasing the thickness of our bar outlines (as we have fewer bars now),
  • and Changing the angle of the bar (tick) labels to 60 degrees, so they stay within the boundaries of our visualisation.
In [14]:
fig = go.Figure(layout=dict(
    title="Top 15 Olympic weightlifting medal earners between {}-{}"
        .format(data.Year.min(),data.Year.max()),
    barmode='stack', bargap = 0.1, margin=dict(t=40, r=0, b=0, l=0),
    font=dict(family="Muli", size=14, color="#212529",), showlegend=False,
    xaxis= dict(categoryorder='total descending', tickangle=60,
                showline=True, linewidth=2, linecolor='black',ticks='',
                tickfont=dict(family="Muli", size=16, color="#212529")),
    yaxis=dict(showticklabels=False)),
)

fig.add_bar(name="Bronze", x=teams, y=data[data.Medal == "Bronze"].Team
            .value_counts().reindex(teams), marker_color="#A57164")

fig.add_bar(name="Silver", x=teams, y=data[data.Medal == "Silver"].Team
            .value_counts().reindex(teams), marker_color="#C0C0C0")

fig.add_bar(name="Gold", x=teams, y=data[data.Medal == "Gold"].Team
            .value_counts().reindex(teams), marker_color="#FFD700",
            text=data.Team.value_counts().reindex(teams), textposition="outside")

fig.update_traces(marker_line_color='#003366',
                  marker_line_width=1.5, opacity=0.7, textfont_size=14)

fig.show()
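
The totals shown above the bars come from data.Team.value_counts(). As a cross-check, the same per-team breakdown can be tabulated in one step. The sketch below uses pd.crosstab on made-up rows; it is an alternative way to inspect the numbers and is not part of the original notebook.

```python
import pandas as pd

# Toy medal table (illustrative values only)
toy = pd.DataFrame({
    "Team":  ["China", "China", "Poland", "China", "Poland", "Japan"],
    "Medal": ["Gold", "Silver", "Bronze", "Gold", "Gold", "Bronze"],
})

# one row per team, one column per medal, in stacking order
counts = (pd.crosstab(toy["Team"], toy["Medal"])
            .reindex(columns=["Bronze", "Silver", "Gold"], fill_value=0))

# total medals per team, largest first, matching the bar ordering
totals = counts.sum(axis=1).sort_values(ascending=False)
print(counts)
print(totals)
```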

Conclusion

In this section, we went through a few improvement cycles to produce a visualisation illustrating the top Olympic weightlifting medal earners in the 120 years of Olympic history: athletes and results dataset.

The visualisation ended up looking great, but a few plotly limitations prevented one final improvement - changing the bar colours to be gradients.

Weightlifting cats

Bootstrap 5 is Here!

Bootstrap 5’s very first alpha has arrived, and it looks like it’s something to celebrate! Amongst the many differences between Bootstrap 5 and its predecessor, Bootstrap 4, there are some major ones to look out for.

We'll explore these in the remainder of this article, but you may want to watch the video too.

jQuery and JavaScript

The first major change, which I think people are going to be very happy about, is that Bootstrap no longer depends on jQuery! For many developers, Bootstrap's dependency on jQuery was a deal-breaker, and many of them moved away to other frameworks. This could bring many of those developers back, meaning Bootstrap may be even more popular moving forward.

I had no issue with jQuery myself, and it was an excellent solution to an old problem. However, a lot has changed over the years, and most of those problems have now been addressed in newer web browsers.

Dropped Support for Internet Explorer

The second major change is about something even older than jQuery. Bootstrap 5 has officially dropped support for Internet Explorer!

Supporting Internet Explorer was certainly a nightmare, especially over a decade ago when I was working in web development. However, with Microsoft promoting Edge, dropping support for Internet Explorer is the norm nowadays.

CSS Custom Properties

Dropping support for Internet Explorer leads us to our third major change: Bootstrap 5 has been able to start using CSS custom properties.

This means being able to define easy-to-understand variables in one place and use them in multiple other places. This should improve the theming experience, and it looks like theme creators will be busy moving their themes over.

Improved Documentation

The fourth major change is actually to the Bootstrap documentation. It looks like the team have put great effort into improving their documentation by removing ambiguity and giving more support to those wanting to extend Bootstrap.

There’s now more content on theming, complete with even more code snippets that help you build on top of Bootstrap's source files. The colour palette has also been expanded!

There’s even an npm project to get you started quicker.

Updated Forms

The fifth major change comes with the re-design of all of Bootstrap's form controls.

Custom form controls for things like checkboxes and switches were possible in Bootstrap 4, but in Bootstrap 5, the claim is that they’ve gone fully custom with standard markup!

Utility API

The sixth major change is the implementation of the new Utility API in Bootstrap 5.

Utilities have become the preferred way to build, which we can see with the success of Utility-first CSS frameworks like Tailwind CSS.

If you build on Bootstrap using the source files, supposedly your mind will be blown by the new experience!

Conclusion

There are many differences in Bootstrap 5, but you’ll, of course, find familiarity too. Things like the grid system are still here in an enhanced form.

If you haven’t seen it before, Bootstrap now has its own icon library called Bootstrap Icons which is definitely worth checking out.

But these are just the first of many enhancements in Bootstrap 5, and it’s just the alpha after all.

It does look like there’s still no built-in dark mode, which appears to be a highly requested addition. Although, for now, custom dark modes can be created by changing a few variables.

Still, you can head over to https://v5.getbootstrap.com to explore the new release for yourself. You can even install the pre-release using the node package manager: npm i bootstrap@next.

StamiStudios.com Everyday Ita Bag - Panels and Colours

Preamble

In [1]:
import numpy as np                   # for multi-dimensional containers 
import pandas as pd                  # for DataFrames
import itertools
from chord import Chord

Introduction

In previous sections, we visualised co-occurrences of Pokémon types. In this section, we're going to apply the same approach to the StamiStudios.com Everyday Ita Bag dataset to visualise the co-occurrence of panels and colours.

The Dataset

The dataset has two variables per sample: the panel and the colour of a bag.

Let's download the mirrored dataset and have a look for ourselves.

In [2]:
data_url = 'https://shahinrostami.com/datasets/stami_bags_panel_colour.csv'
data = pd.read_csv(data_url)
data.head()
Out[2]:
panel colour
0 Heart Pink
1 Circle lilac
2 Sakura Mint
3 circle mint
4 Animal Black
In [3]:
#data['Country.of.Origin'][data['Country.of.Origin'] == 'United States (Hawaii)'] = 'Hawaii'
#data['Country.of.Origin'][data['Country.of.Origin'] == 'Tanzania, United Republic Of'] = 'Tanzania'
In [4]:
# drop any samples with a missing panel or colour
data = data.dropna()
In [5]:
data = data[data['panel'].isin(list(data['panel'].value_counts()[:20].index))]
data = data[data['colour'].isin(list(data['colour'].value_counts()[:11].index))]

Next, we'll capitalise the panel and colour of each sample, so that variants such as circle and Circle are treated as the same value.

In [6]:
data['panel'] = data['panel'].str.capitalize()
data['colour'] = data['colour'].str.capitalize()
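
As a quick illustration of what str.capitalize does, the following sketch applies it to a few made-up panel names. Note that it upper-cases only the first character and lower-cases the rest.

```python
import pandas as pd

# Hypothetical panel names with inconsistent casing
panels = pd.Series(["Circle", "circle", "Bat Wings", "feline-ears"])

# first character upper-case, everything else lower-case
normalised = panels.str.capitalize()
print(normalised.tolist())
```

The two spellings of circle collapse into one value, which is what we want before counting co-occurrences.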

Let's check the shape of our filtered data and list the colours that remain.

In [7]:
data.shape
Out[7]:
(3040, 2)
In [8]:
d_colours = list(data.colour.value_counts().index)
d_colours.sort()
d_colours
Out[8]:
['Black', 'Blue', 'Green', 'Lilac', 'Mint', 'Navy', 'Pink', 'White', 'Yellow']

We're left with 3040 samples across nine colours.

Data Wrangling

We need to do a bit of data wrangling before we can visualise our data. We can see from the column names that the values we're interested in are held in the colour and panel columns.

So let's select just these two columns and work with a list containing only them as we move forward.

In [9]:
species_personality = pd.DataFrame(data[['colour', 'panel']].values).dropna().astype(str)
species_personality
Out[9]:
0 1
0 Pink Heart
1 Mint Sakura
2 Black Animal
3 White Circle
4 White Star
... ... ...
3035 Blue Circle
3036 White Circle
3037 White Heart
3038 White Circle
3039 Black Circle

3040 rows × 2 columns

In [10]:
species_personality = species_personality.dropna()

Now for the names of our colours and panels.

In [11]:
#left = np.unique(pd.DataFrame(species_personality)[0]).tolist()
left = list(data['colour'].value_counts().index)[::-1]
#left.sort()

pd.DataFrame(left)
Out[11]:
0
0 Yellow
1 Green
2 Blue
3 Navy
4 Mint
5 Lilac
6 Pink
7 White
8 Black
In [12]:
#right = np.unique(pd.DataFrame(species_personality)[1]).tolist()
right = list(data['panel'].value_counts().index)
#right.sort()
pd.DataFrame(right)
Out[12]:
0
0 Circle
1 Star
2 Crescent
3 Moon
4 Bat wings
5 Sakura
6 Heart
7 Dice20
8 Frog
9 Animal
10 Feline-ears
11 Cat
12 Angel-wings
13 Hive
14 Bottle
15 Paw
16 Petals
17 Pixel
18 Citrus

Which we can now use to create the matrix.

In [13]:
features= left+right
d = pd.DataFrame(0, index=features, columns=features)

Our chord diagram will need two inputs: the co-occurrence matrix, and a list of names to label the segments.

We can build a co-occurrence matrix with the following approach. We'll start by creating a list with every colour and panel pairing in its original and reversed form, count each pairing, and then scale the counts to percentages of all samples.

In [14]:
species_personality = list(itertools.chain.from_iterable((i, i[::-1]) for i in species_personality.values))
In [15]:
for x in species_personality:
    d.at[x[0], x[1]] += 1
In [16]:
d=d/(d.values.sum()/2)*100
In [17]:
d
Out[17]:
Yellow Green Blue Navy Mint Lilac Pink White Black Circle ... Animal Feline-ears Cat Angel-wings Hive Bottle Paw Petals Pixel Citrus
Yellow 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.032895 ... 0.000000 0.000000 0.000000 0.000000 0.230263 0.000000 0.032895 0.032895 0.000000 0.855263
Green 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.032895
Blue 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.394737 ... 0.296053 0.197368 0.164474 0.164474 0.000000 0.361842 0.098684 0.000000 0.098684 0.000000
Navy 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.625000 ... 0.131579 0.098684 0.263158 0.000000 0.164474 0.328947 0.131579 0.065789 0.098684 0.098684
Mint 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.361842 ... 0.098684 0.000000 0.131579 0.032895 0.065789 0.098684 0.263158 0.065789 0.197368 0.263158
Lilac 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.723684 ... 0.493421 0.296053 0.493421 0.032895 0.098684 0.460526 0.296053 0.526316 0.263158 0.065789
Pink 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.986842 ... 0.427632 0.328947 0.361842 0.263158 0.263158 0.032895 0.328947 0.789474 0.394737 0.164474
White 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 2.927632 ... 1.348684 0.493421 0.493421 2.861842 0.625000 0.361842 0.263158 0.657895 0.197368 0.361842
Black 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 7.796053 ... 1.940789 3.289474 2.138158 0.427632 1.743421 1.315789 0.953947 0.098684 0.953947 0.164474
Circle 0.032895 0.000000 0.394737 0.625000 0.361842 0.723684 0.986842 2.927632 7.796053 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Star 0.098684 0.000000 0.625000 0.427632 0.493421 0.723684 1.118421 1.513158 3.684211 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Crescent 0.098684 0.000000 0.328947 0.657895 0.197368 0.789474 1.743421 0.855263 3.256579 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Moon 0.164474 0.000000 0.690789 0.855263 0.361842 0.921053 0.723684 0.953947 3.059211 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Bat wings 0.000000 0.000000 0.098684 0.230263 0.032895 0.394737 0.065789 0.197368 6.644737 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Sakura 0.000000 0.000000 0.230263 0.065789 0.197368 0.394737 3.026316 0.953947 1.414474 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Heart 0.000000 0.000000 0.164474 0.098684 0.164474 0.592105 1.940789 0.888158 1.743421 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Dice20 0.000000 0.000000 0.164474 0.328947 0.230263 0.493421 0.394737 0.427632 2.993421 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Frog 0.000000 2.565789 0.065789 0.000000 1.907895 0.098684 0.328947 0.032895 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Animal 0.000000 0.000000 0.296053 0.131579 0.098684 0.493421 0.427632 1.348684 1.940789 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Feline-ears 0.000000 0.000000 0.197368 0.098684 0.000000 0.296053 0.328947 0.493421 3.289474 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Cat 0.000000 0.000000 0.164474 0.263158 0.131579 0.493421 0.361842 0.493421 2.138158 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Angel-wings 0.000000 0.000000 0.164474 0.000000 0.032895 0.032895 0.263158 2.861842 0.427632 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Hive 0.230263 0.000000 0.000000 0.164474 0.065789 0.098684 0.263158 0.625000 1.743421 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Bottle 0.000000 0.000000 0.361842 0.328947 0.098684 0.460526 0.032895 0.361842 1.315789 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Paw 0.032895 0.000000 0.098684 0.131579 0.263158 0.296053 0.328947 0.263158 0.953947 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Petals 0.032895 0.000000 0.000000 0.065789 0.065789 0.526316 0.789474 0.657895 0.098684 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Pixel 0.000000 0.000000 0.098684 0.098684 0.197368 0.263158 0.394737 0.197368 0.953947 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Citrus 0.855263 0.032895 0.000000 0.098684 0.263158 0.065789 0.164474 0.361842 0.164474 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000

28 rows × 28 columns

Chord Diagram

Time to visualise the co-occurrences using a chord diagram. We are going to use a list of custom colours, one for each segment of the diagram.

In [18]:
colors =["#ffe75f", "#85e063", "#aed6f8", "#1e3c6f", "#affec6", "#d3b0e7", "#fedbe8", "#f5f4e9", "#222222", "#f7a296", "#f48fb1", "#ce93d8", "#a9a3db", "#89cffa", "#80deea", "#80cbc4", "#a5d6a7", "#e6ee9c", "#fff59d", "#ffe082", "#ffcc80", "#f7a296", "#f06292", "#a76fcb", "#7986cb", "#64b5f6", "#4ecaec", "#4db6ac"]
In [19]:
names = left + right

Finally, we can put it all together.
In [20]:
Chord(d.values.round(2).tolist(), names,credit=True, colors=colors, wrap_labels=False,
      margin=40, noun="percent",
        details_separator="", divide=True, divide_idx=len(left),divide_size=.2, width=850).show()
Chord Diagram

Chord Diagram with Names

It would be nice to show a list of villager names when hovering over a co-occurrence. To do this, we can make use of the optional details parameter.

Next, we'll create an empty multi-dimensional array with the same shape as our matrix.

In [21]:
details = np.empty((len(names),len(names)),dtype=object)
details_thumbs = np.empty((len(names),len(names)),dtype=object)
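
As a minimal sketch of why dtype=object is used here: an object array can hold an arbitrary Python object in each cell, so every cell can carry its own list. The name grid and the example string are placeholders, not part of the notebook.

```python
import numpy as np

# An object array starts out filled with None and can hold any Python object.
grid = np.empty((2, 3), dtype=object)

# Give every cell its own empty list (a fresh list per cell, not one shared list).
for i in range(grid.shape[0]):
    for j in range(grid.shape[1]):
        grid[i, j] = []

# Appending to one cell leaves the other cells untouched.
grid[0, 1].append("example")
```

Creating a new list inside the loop matters: filling the array with a single shared list would make every cell point at the same object.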

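Now we can populate the details array with lists of villager names in the correct positions. Note that the cell below looks up columns named species, personality, url, and name. The traceback shows that data has no species column here, so the lookup raises a KeyError until the column names match those in the dataset.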

In [22]:
for count_x, item_x in enumerate(names):
    for count_y, item_y in enumerate(names):
        details_urls = data[
            (data['species'].isin([item_x, item_y])) &
            (data['personality'].isin([item_y, item_x]))]['url'].to_list()
        
        details_names = data[
            (data['species'].isin([item_x, item_y])) &
            (data['personality'].isin([item_y, item_x]))]['name'].to_list()
        
        urls_names = np.column_stack((details_urls, details_names))
        if(urls_names.size > 0):
            details[count_x][count_y] = details_names
            details_thumbs[count_x][count_y] = details_urls

        else:
            details[count_x][count_y] = []
            details_thumbs[count_x][count_y] = []

details=pd.DataFrame(details).values.tolist()
details_thumbs=pd.DataFrame(details_thumbs).values.tolist()
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/miniconda3/envs/analytics/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2645             try:
-> 2646                 return self._engine.get_loc(key)
   2647             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'species'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-22-f54e3a0b2906> in <module>
      2     for count_y, item_y in enumerate(names):
      3         details_urls = data[
----> 4             (data['species'].isin([item_x, item_y])) &
      5             (data['personality'].isin([item_y, item_x]))]['url'].to_list()
      6 

~/miniconda3/envs/analytics/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2798             if self.columns.nlevels > 1:
   2799                 return self._getitem_multilevel(key)
-> 2800             indexer = self.columns.get_loc(key)
   2801             if is_integer(indexer):
   2802                 indexer = [indexer]

~/miniconda3/envs/analytics/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2646                 return self._engine.get_loc(key)
   2647             except KeyError:
-> 2648                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2649         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2650         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'species'
In [ ]:
len(right)

Finally, we can put it all together but this time with the details matrix passed in.

In [ ]:
Chord(d.values.tolist(), names,credit=True, colors=colors, wrap_labels=False,
      margin=40, font_size_large=7,details=details,details_thumbs=details_thumbs,noun="villagers",
        details_separator="", divide=True, divide_idx=len(left),divide_size=.2, width=850).show()
In [ ]:
np.empty(shape=(6,1)).tolist()

Conclusion

In this section, we demonstrated how to conduct some data wrangling on a downloaded dataset to prepare it for a chord diagram. Our chord diagram is interactive, so you can use your mouse or touchscreen to investigate the co-occurrences!

Arabica Coffee Beans - Origin and Variety

Preamble

In [1]:
import numpy as np                   # for multi-dimensional containers 
import pandas as pd                  # for DataFrames
import itertools
from chord import Chord

Introduction

In previous sections, we visualised co-occurrences of Pokémon types. In this section, we're going to use a dataset of Arabica coffee beans to visualise the relationship between country of origin and variety.

The Dataset

The preview below shows 44 columns for each coffee sample. The two we are interested in are Country.of.Origin and Variety.

Let's download the mirrored dataset and have a look for ourselves.

In [2]:
data_url = 'https://shahinrostami.com/datasets/arabica_data.csv'
data = pd.read_csv(data_url)
data.head()
Out[2]:
Unnamed: 0 Species Owner Country.of.Origin Farm.Name Lot.Number Mill ICO.Number Company Altitude ... Color Category.Two.Defects Expiration Certification.Body Certification.Address Certification.Contact unit_of_measurement altitude_low_meters altitude_high_meters altitude_mean_meters
0 1 Arabica metad plc Ethiopia metad plc NaN metad plc 2014/2015 metad agricultural developmet plc 1950-2200 ... Green 0 April 3rd, 2016 METAD Agricultural Development plc 309fcf77415a3661ae83e027f7e5f05dad786e44 19fef5a731de2db57d16da10287413f5f99bc2dd m 1950.0 2200.0 2075.0
1 2 Arabica metad plc Ethiopia metad plc NaN metad plc 2014/2015 metad agricultural developmet plc 1950-2200 ... Green 1 April 3rd, 2016 METAD Agricultural Development plc 309fcf77415a3661ae83e027f7e5f05dad786e44 19fef5a731de2db57d16da10287413f5f99bc2dd m 1950.0 2200.0 2075.0
2 3 Arabica grounds for health admin Guatemala san marcos barrancas "san cristobal cuch NaN NaN NaN NaN 1600 - 1800 m ... NaN 0 May 31st, 2011 Specialty Coffee Association 36d0d00a3724338ba7937c52a378d085f2172daa 0878a7d4b9d35ddbf0fe2ce69a2062cceb45a660 m 1600.0 1800.0 1700.0
3 4 Arabica yidnekachew dabessa Ethiopia yidnekachew dabessa coffee plantation NaN wolensu NaN yidnekachew debessa coffee plantation 1800-2200 ... Green 2 March 25th, 2016 METAD Agricultural Development plc 309fcf77415a3661ae83e027f7e5f05dad786e44 19fef5a731de2db57d16da10287413f5f99bc2dd m 1800.0 2200.0 2000.0
4 5 Arabica metad plc Ethiopia metad plc NaN metad plc 2014/2015 metad agricultural developmet plc 1950-2200 ... Green 2 April 3rd, 2016 METAD Agricultural Development plc 309fcf77415a3661ae83e027f7e5f05dad786e44 19fef5a731de2db57d16da10287413f5f99bc2dd m 1950.0 2200.0 2075.0

5 rows × 44 columns

In [3]:
# Shorten two long country names; .replace avoids chained assignment on a slice
data['Country.of.Origin'] = data['Country.of.Origin'].replace({
    'United States (Hawaii)': 'Hawaii',
    'Tanzania, United Republic Of': 'Tanzania'})
In [4]:
data = data[data.Variety != 'Other']
data = data[data['Variety'].notna()]
In [5]:
data = data[data['Country.of.Origin'].isin(list(data['Country.of.Origin'].value_counts()[:10].index))]
data = data[data['Variety'].isin(list(data['Variety'].value_counts()[:10].index))]
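
The two cells above drop samples with a missing or "Other" variety, then keep only the ten most frequent countries of origin and the ten most frequent varieties. A minimal sketch of the same top-N filter on made-up data (the rows and the names toy, top_two, and filtered are illustrative only):

```python
import pandas as pd

# Made-up varieties standing in for the real column.
toy = pd.DataFrame(
    {"Variety": ["Bourbon", "Bourbon", "Caturra", "Typica", "Caturra", "Bourbon"]}
)

# value_counts sorts from most to least frequent, so head(2) gives the top two labels.
top_two = toy["Variety"].value_counts().head(2).index

# Keep only the rows whose variety is one of those labels.
filtered = toy[toy["Variety"].isin(top_two)]
```

Because the second filter in the notebook runs on data that has already been filtered by country, its top ten varieties are counted within the remaining countries only.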

The capitalisation step used in other sections is not needed for this dataset, so it is left commented out.

In [6]:
#data['Publisher'] = data['Publisher'].str.capitalize()
#data['Genre'] = data['Genre'].str.capitalize()

It looks good so far, but let's check how many samples and variables are left after filtering.

In [7]:
data.shape
Out[7]:
(851, 44)

We are left with 851 samples, each with 44 variables.

Data Wrangling

We need to do a bit of data wrangling before we can visualise our data. The two variables we are interested in are stored in the columns Country.of.Origin and Variety.

So let's select just these two columns and work with a list containing only them as we move forward.

In [8]:
species_personality = pd.DataFrame(data[['Country.of.Origin', 'Variety']].values).dropna().astype(str)
species_personality
Out[8]:
0 1
0 Guatemala Bourbon
1 Costa Rica Caturra
2 Brazil Bourbon
3 Uganda SL14
4 Honduras Caturra
... ... ...
846 Honduras Catuai
847 Honduras Catuai
848 Mexico Bourbon
849 Guatemala Catuai
850 Honduras Caturra

851 rows × 2 columns

In [9]:
species_personality = species_personality.dropna()

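Now for the names of our categories. The countries of origin go on the left, ordered from least to most frequent, and the varieties go on the right, ordered from most to least frequent.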

In [10]:
#left = np.unique(pd.DataFrame(species_personality)[0]).tolist()
left = list(data['Country.of.Origin'].value_counts().index)[::-1]
#left.sort()

pd.DataFrame(left)
Out[10]:
0
0 Kenya
1 Uganda
2 Hawaii
3 Costa Rica
4 Honduras
5 Taiwan
6 Brazil
7 Colombia
8 Guatemala
9 Mexico
In [11]:
#right = np.unique(pd.DataFrame(species_personality)[1]).tolist()
right = list(data['Variety'].value_counts().index)
#right.sort()
pd.DataFrame(right)
Out[11]:
0
0 Caturra
1 Bourbon
2 Typica
3 Catuai
4 Hawaiian Kona
5 Yellow Bourbon
6 Mundo Novo
7 SL14
8 SL28
9 Pacas

Which we can now use to create the matrix.

In [12]:
features= left+right
d = pd.DataFrame(0, index=features, columns=features)

Our chord diagram will need two inputs: the co-occurrence matrix, and a list of names to label the segments.

We can build a co-occurrence matrix with the following approach. We'll start by creating a list with every origin and variety pairing in its original and reversed form.

In [13]:
species_personality = list(itertools.chain.from_iterable((i, i[::-1]) for i in species_personality.values))
In [14]:
for x in species_personality:
    d.at[x[0], x[1]] += 1
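
As an alternative to the explicit loop, the same symmetric matrix can be sketched with pd.crosstab. This is only an illustration on made-up rows, not the notebook's method; the names pairs, counts, block, and d_alt are placeholders.

```python
import pandas as pd

# Made-up (origin, variety) pairs standing in for the real data.
pairs = pd.DataFrame({
    "origin": ["Guatemala", "Brazil", "Guatemala", "Mexico"],
    "variety": ["Bourbon", "Bourbon", "Caturra", "Bourbon"],
})

left = ["Guatemala", "Brazil", "Mexico"]
right = ["Bourbon", "Caturra"]
features = left + right

# Count how often each (origin, variety) pair occurs.
counts = pd.crosstab(pairs["origin"], pairs["variety"])

# Place the counts in the full square matrix, filling missing cells with 0.
block = counts.reindex(index=features, columns=features, fill_value=0)

# Adding the transpose mirrors the counts, which is what the reversed pairs do in the loop.
d_alt = block + block.T
```

The loop is easier to follow step by step, while the crosstab version avoids iterating over every row in Python.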

Chord Diagram

Time to visualise the co-occurrence of origins and varieties using a chord diagram. We are going to use a list of custom colours, one for each segment of the diagram.

In [15]:
colors =[
    "#ff575c","#ff914d","#ffca38","#f2fa00","#94f000","#00fa68","#0087db","#0054f0","#000cf5","#5d00e0",
    
"#6f1d1b","#955939","#bb9457","#7f5e38","#432818","#6e4021","#99582a","#cc9f69","#ffe6a7","#755939"]
In [16]:
names = left + right

Finally, we can put it all together.
In [17]:
Chord(d.values.tolist(), names,credit=True, colors=colors, wrap_labels=False,
      margin=40, font_size_large=7,noun="coffee beans",
        details_separator="", divide=True, divide_idx=len(left),divide_size=.2, width=850).show()
Chord Diagram

Chord Diagram with Names

It would be nice to show a list of names when hovering over a co-occurring origin and variety. To do this, we can make use of the optional details parameter.

Next, we'll create an empty multi-dimensional array with the same shape as our matrix.

In [18]:
details = np.empty((len(names),len(names)),dtype=object)
details_thumbs = np.empty((len(names),len(names)),dtype=object)

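Now we can populate the details array with lists of names in the correct positions. Note that the cell below still looks up columns named species and personality. The coffee data has no species column, so it raises the KeyError shown below; the lookups need to use Country.of.Origin and Variety instead.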

In [19]:
for count_x, item_x in enumerate(names):
    for count_y, item_y in enumerate(names):
        details_urls = data[
            (data['species'].isin([item_x, item_y])) &
            (data['personality'].isin([item_y, item_x]))]['url'].to_list()
        
        details_names = data[
            (data['species'].isin([item_x, item_y])) &
            (data['personality'].isin([item_y, item_x]))]['name'].to_list()
        
        urls_names = np.column_stack((details_urls, details_names))
        if(urls_names.size > 0):
            details[count_x][count_y] = details_names
            details_thumbs[count_x][count_y] = details_urls

        else:
            details[count_x][count_y] = []
            details_thumbs[count_x][count_y] = []

details=pd.DataFrame(details).values.tolist()
details_thumbs=pd.DataFrame(details_thumbs).values.tolist()
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/miniconda3/envs/analytics/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2645             try:
-> 2646                 return self._engine.get_loc(key)
   2647             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'species'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-19-f54e3a0b2906> in <module>
      2     for count_y, item_y in enumerate(names):
      3         details_urls = data[
----> 4             (data['species'].isin([item_x, item_y])) &
      5             (data['personality'].isin([item_y, item_x]))]['url'].to_list()
      6 

~/miniconda3/envs/analytics/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2798             if self.columns.nlevels > 1:
   2799                 return self._getitem_multilevel(key)
-> 2800             indexer = self.columns.get_loc(key)
   2801             if is_integer(indexer):
   2802                 indexer = [indexer]

~/miniconda3/envs/analytics/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2646                 return self._engine.get_loc(key)
   2647             except KeyError:
-> 2648                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2649         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2650         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'species'
In [ ]:
len(right)

Finally, we can put it all together but this time with the details matrix passed in.

In [ ]:
Chord(d.values.tolist(), names,credit=True, colors=colors, wrap_labels=False,
      margin=40, font_size_large=7,details=details,details_thumbs=details_thumbs,noun="coffee beans",
        details_separator="", divide=True, divide_idx=len(left),divide_size=.2, width=850).show()
In [ ]:
np.empty(shape=(6,1)).tolist()

Conclusion

In this section, we demonstrated how to conduct some data wrangling on a downloaded dataset to prepare it for a chord diagram. Our chord diagram is interactive, so you can use your mouse or touchscreen to investigate the co-occurrences!

Video Game Titles - Publishers and Genres

Preamble

In [1]:
import numpy as np                   # for multi-dimensional containers 
import pandas as pd                  # for DataFrames
import itertools
from chord import Chord

Introduction

In previous sections, we visualised co-occurrences of Pokémon types. In this section, we're going to use a video game sales dataset to visualise the relationship between publishers and genres.

The Dataset

Judging by the data loaded below, we can expect 11 variables for each of 16,598 video game titles.

Let's download the mirrored dataset and have a look for ourselves.

In [2]:
data_url = 'https://shahinrostami.com/datasets/vgsales.csv'
data = pd.read_csv(data_url)
data.head()
Out[2]:
Rank Name Platform Year Genre Publisher NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales
0 1 Wii Sports Wii 2006.0 Sports Nintendo 41.49 29.02 3.77 8.46 82.74
1 2 Super Mario Bros. NES 1985.0 Platform Nintendo 29.08 3.58 6.81 0.77 40.24
2 3 Mario Kart Wii Wii 2008.0 Racing Nintendo 15.85 12.88 3.79 3.31 35.82
3 4 Wii Sports Resort Wii 2009.0 Sports Nintendo 15.75 11.01 3.28 2.96 33.00
4 5 Pokemon Red/Pokemon Blue GB 1996.0 Role-Playing Nintendo 11.27 8.89 10.22 1.00 31.37

For consistency, we'll capitalise the publisher and genre of each title. Note that str.capitalize upper-cases only the first character and lower-cases the rest.

In [3]:
data['Publisher'] = data['Publisher'].str.capitalize()
data['Genre'] = data['Genre'].str.capitalize()

It looks good so far, but let's check how many samples and variables we have.

In [4]:
data.shape
Out[4]:
(16598, 11)

We have 16,598 titles, each with 11 variables.

Data Wrangling

We need to do a bit of data wrangling before we can visualise our data. We can see from the column names that the two variables we are interested in are stored in the columns Publisher and Genre.

In [5]:
pd.DataFrame(data.columns.values.tolist())
Out[5]:
0
0 Rank
1 Name
2 Platform
3 Year
4 Genre
5 Publisher
6 NA_Sales
7 EU_Sales
8 JP_Sales
9 Other_Sales
10 Global_Sales

So let's select just these two columns and work with a list containing only them as we move forward.

In [6]:
species_personality = pd.DataFrame(data[['Publisher', 'Genre']].values).dropna().astype(str)
species_personality
Out[6]:
0 1
0 Nintendo Sports
1 Nintendo Platform
2 Nintendo Racing
3 Nintendo Sports
4 Nintendo Role-playing
... ... ...
16593 Kemco Platform
16594 Infogrames Shooter
16595 Activision Racing
16596 7g//ames Puzzle
16597 Wanadoo Platform

16540 rows × 2 columns

Now for the names of our categories. The 14 publishers with the most titles go on the left, and every genre goes on the right.

In [7]:
#left = np.unique(pd.DataFrame(species_personality)[0]).tolist()
left = list(data.Publisher.value_counts()[:14].index)
pd.DataFrame(left)
Out[7]:
0
0 Electronic arts
1 Activision
2 Namco bandai games
3 Ubisoft
4 Konami digital entertainment
5 Thq
6 Nintendo
7 Sony computer entertainment
8 Sega
9 Take-two interactive
10 Capcom
11 Atari
12 Tecmo koei
13 Square enix
In [8]:
right = np.unique(pd.DataFrame(species_personality)[1]).tolist()
pd.DataFrame(right)
Out[8]:
0
0 Action
1 Adventure
2 Fighting
3 Misc
4 Platform
5 Puzzle
6 Racing
7 Role-playing
8 Shooter
9 Simulation
10 Sports
11 Strategy

Which we can now use to create the matrix.

In [9]:
features= left+right
d = pd.DataFrame(0, index=features, columns=features)

Our chord diagram will need two inputs: the co-occurrence matrix, and a list of names to label the segments.

We can build a co-occurrence matrix with the following approach. We'll start by creating a list with every publisher and genre pairing in its original and reversed form.

In [10]:
species_personality = list(itertools.chain.from_iterable((i, i[::-1]) for i in species_personality.values))
In [11]:
for x in species_personality:
    if(x[0] in left or x[1] in left):
        d.at[x[0], x[1]] += 1
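
The check against left in the loop above matters because the matrix only has rows and columns for the selected publishers. A minimal sketch on made-up pairs (the names pairs, both_ways, and d_toy are placeholders) shows the reversed pairing and the filter together:

```python
import itertools
import pandas as pd

# Made-up pairs; "Kemco" is deliberately not one of the selected publishers.
pairs = [("Nintendo", "Sports"), ("Kemco", "Platform"), ("Sega", "Sports")]
left = ["Nintendo", "Sega"]
right = ["Platform", "Sports"]
features = left + right
d_toy = pd.DataFrame(0, index=features, columns=features)

# Each pair appears twice: once as (publisher, genre) and once reversed.
both_ways = list(itertools.chain.from_iterable((p, p[::-1]) for p in pairs))

for a, b in both_ways:
    # Skip pairs whose publisher is not a selected label;
    # without this check, d_toy.at would raise a KeyError for "Kemco".
    if a in left or b in left:
        d_toy.at[a, b] += 1
```

Counting each pair in both directions is what makes the resulting matrix symmetric.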
In [12]:
d
Out[12]:
Electronic arts Activision Namco bandai games Ubisoft Konami digital entertainment Thq Nintendo Sony computer entertainment Sega Take-two interactive ... Fighting Misc Platform Puzzle Racing Role-playing Shooter Simulation Sports Strategy
Electronic arts 0 0 0 0 0 0 0 0 0 0 ... 39 46 16 7 159 35 139 116 561 37
Activision 0 0 0 0 0 0 0 0 0 0 ... 7 103 60 7 74 41 159 23 144 22
Namco bandai games 0 0 0 0 0 0 0 0 0 0 ... 134 97 19 20 27 151 37 29 51 61
Ubisoft 0 0 0 0 0 0 0 0 0 0 ... 19 151 70 24 52 41 92 119 72 29
Konami digital entertainment 0 0 0 0 0 0 0 0 0 0 ... 20 77 40 10 13 37 40 86 280 28
Thq 0 0 0 0 0 0 0 0 0 0 ... 71 66 85 17 101 8 36 27 31 32
Nintendo 0 0 0 0 0 0 0 0 0 0 ... 18 100 112 74 37 106 26 29 55 32
Sony computer entertainment 0 0 0 0 0 0 0 0 0 0 ... 30 128 66 12 65 49 51 15 124 12
Sega 0 0 0 0 0 0 0 0 0 0 ... 37 62 52 22 48 64 40 12 135 35
Take-two interactive 0 0 0 0 0 0 0 0 0 0 ... 1 27 11 1 20 6 65 4 151 22
Capcom 0 0 0 0 0 0 0 0 0 0 ... 58 11 46 6 13 38 25 2 3 3
Atari 0 0 0 0 0 0 0 0 0 0 ... 37 26 21 22 36 28 40 9 56 17
Tecmo koei 0 0 0 0 0 0 0 0 0 0 ... 12 14 1 0 5 47 3 13 39 50
Square enix 0 0 0 0 0 0 0 0 0 0 ... 3 6 0 4 0 129 16 4 0 9
Action 183 310 248 193 148 194 79 90 101 93 ... 0 0 0 0 0 0 0 0 0 0
Adventure 13 25 58 59 53 47 35 41 31 12 ... 0 0 0 0 0 0 0 0 0 0
Fighting 39 7 134 19 20 71 18 30 37 1 ... 0 0 0 0 0 0 0 0 0 0
Misc 46 103 97 151 77 66 100 128 62 27 ... 0 0 0 0 0 0 0 0 0 0
Platform 16 60 19 70 40 85 112 66 52 11 ... 0 0 0 0 0 0 0 0 0 0
Puzzle 7 7 20 24 10 17 74 12 22 1 ... 0 0 0 0 0 0 0 0 0 0
Racing 159 74 27 52 13 101 37 65 48 20 ... 0 0 0 0 0 0 0 0 0 0
Role-playing 35 41 151 41 37 8 106 49 64 6 ... 0 0 0 0 0 0 0 0 0 0
Shooter 139 159 37 92 40 36 26 51 40 65 ... 0 0 0 0 0 0 0 0 0 0
Simulation 116 23 29 119 86 27 29 15 12 4 ... 0 0 0 0 0 0 0 0 0 0
Sports 561 144 51 72 280 31 55 124 135 151 ... 0 0 0 0 0 0 0 0 0 0
Strategy 37 22 61 29 28 32 32 12 35 22 ... 0 0 0 0 0 0 0 0 0 0

26 rows × 26 columns

Chord Diagram

Time to visualise the co-occurrence of publishers and genres using a chord diagram. First we shorten some of the labels, then we define a list of custom colours, one for each segment of the diagram.

In [13]:
left[5] = 'THQ'
In [14]:
left[0] = 'EA'
left[2] = 'Namco'
left[4] = 'Konami'
left[7] = 'Sony'
left[9] = 'Take-Two'
left[-1] = "Square"
left[-2]= 'Tecmo'
left
Out[14]:
['EA',
 'Activision',
 'Namco',
 'Ubisoft',
 'Konami',
 'THQ',
 'Nintendo',
 'Sony',
 'Sega',
 'Take-Two',
 'Capcom',
 'Atari',
 'Tecmo',
 'Square']
In [15]:
right[7] = 'RPG'
right
Out[15]:
['Action',
 'Adventure',
 'Fighting',
 'Misc',
 'Platform',
 'Puzzle',
 'Racing',
 'RPG',
 'Shooter',
 'Simulation',
 'Sports',
 'Strategy']
In [16]:
colors =["#312f85",
         "#f4e301",
         "#f75802",
         "#3e4682",
         "#ad0332",
         "#666769",
         "#e80113",
         "#f78700",
         "#0100f4",
         "#1272c3","#f7cd01","#dd1a22","#00407b","#f70000",
         
        "#ff4400","#ffcc00","#5c6633","#00e63d","#00d6e6","#566d73","#3d85f2","#00fff2","#0000e6","#290066","#ff80e5","#731d28"]
In [17]:
names = left + right

Finally, we can put it all together.
In [18]:
Chord(d.values.tolist(), names,credit=True, colors=colors, wrap_labels=False,
      margin=40, font_size_large=7,noun="titles",
        details_separator="", divide=True, divide_idx=len(left),divide_size=.2, width=850).show()
Chord Diagram

Chord Diagram with Names

It would be nice to show a list of titles when hovering over a co-occurring publisher and genre. To do this, we can make use of the optional details parameter.

Next, we'll create an empty multi-dimensional array with the same shape as our matrix.

In [19]:
details = np.empty((len(names),len(names)),dtype=object)
details_thumbs = np.empty((len(names),len(names)),dtype=object)

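Now we can populate the details array with lists of titles in the correct positions. Note that the cell below still looks up columns named species and personality. The video game data has no species column, so it raises the KeyError shown below; the lookups need to use Publisher, Genre, and Name instead.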

In [20]:
for count_x, item_x in enumerate(names):
    for count_y, item_y in enumerate(names):
        details_urls = data[
            (data['species'].isin([item_x, item_y])) &
            (data['personality'].isin([item_y, item_x]))]['url'].to_list()
        
        details_names = data[
            (data['species'].isin([item_x, item_y])) &
            (data['personality'].isin([item_y, item_x]))]['name'].to_list()
        
        urls_names = np.column_stack((details_urls, details_names))
        if(urls_names.size > 0):
            details[count_x][count_y] = details_names
            details_thumbs[count_x][count_y] = details_urls

        else:
            details[count_x][count_y] = []
            details_thumbs[count_x][count_y] = []

details=pd.DataFrame(details).values.tolist()
details_thumbs=pd.DataFrame(details_thumbs).values.tolist()
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/miniconda3/envs/analytics/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2645             try:
-> 2646                 return self._engine.get_loc(key)
   2647             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'species'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-20-f54e3a0b2906> in <module>
      2     for count_y, item_y in enumerate(names):
      3         details_urls = data[
----> 4             (data['species'].isin([item_x, item_y])) &
      5             (data['personality'].isin([item_y, item_x]))]['url'].to_list()
      6 

~/miniconda3/envs/analytics/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2798             if self.columns.nlevels > 1:
   2799                 return self._getitem_multilevel(key)
-> 2800             indexer = self.columns.get_loc(key)
   2801             if is_integer(indexer):
   2802                 indexer = [indexer]

~/miniconda3/envs/analytics/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2646                 return self._engine.get_loc(key)
   2647             except KeyError:
-> 2648                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2649         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2650         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'species'
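
A minimal sketch of how a details matrix could be populated for this kind of data, assuming columns named Publisher, Genre, and Name and labels that match the values in the data exactly. The rows are made up and the names toy, toy_names, position, and toy_details are placeholders, not the notebook's variables.

```python
import pandas as pd

# Made-up rows standing in for the real data.
toy = pd.DataFrame({
    "Publisher": ["Nintendo", "Nintendo", "Sega"],
    "Genre": ["Sports", "Platform", "Sports"],
    "Name": ["Wii Sports", "Super Mario Bros.", "Example Tennis"],
})
toy_left = ["Nintendo", "Sega"]
toy_right = ["Platform", "Sports"]
toy_names = toy_left + toy_right

# One initially empty list of titles per cell of the matrix.
toy_details = [[[] for _ in toy_names] for _ in toy_names]

# Map each label to its row and column position.
position = {label: i for i, label in enumerate(toy_names)}

for publisher, genre, title in toy[["Publisher", "Genre", "Name"]].itertuples(index=False):
    i, j = position[publisher], position[genre]
    toy_details[i][j].append(title)   # publisher to genre
    toy_details[j][i].append(title)   # genre to publisher (mirror)
```

A single pass over the rows avoids filtering the whole DataFrame once per cell. If labels have been shortened for display, as with 'EA' or 'RPG' above, the lookup needs the original values rather than the display labels.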
In [ ]:
len(right)

Finally, we can put it all together but this time with the details matrix passed in.

In [ ]:
Chord(d.values.tolist(), names,credit=True, colors=colors, wrap_labels=False,
      margin=40, font_size_large=7,details=details,details_thumbs=details_thumbs,noun="titles",
        details_separator="", divide=True, divide_idx=len(left),divide_size=.2, width=850).show()
In [ ]:
np.empty(shape=(6,1)).tolist()

Conclusion

In this section, we demonstrated how to conduct some data wrangling on a downloaded dataset to prepare it for a chord diagram. Our chord diagram is interactive, so you can use your mouse or touchscreen to investigate the co-occurrences!

US Mortality - Race and Manner of Death

Preamble

In [1]:
import numpy as np                   # for multi-dimensional containers 
import pandas as pd                  # for DataFrames
import itertools
from chord import Chord

Introduction

In previous sections, we visualised co-occurrences of Pokémon types. In this section, we're going to use a 2015 US mortality dataset to visualise the relationship between race and manner of death.

The Dataset

The dataset comes as a CSV file of records, together with a JSON file that maps the coded values to readable labels.

Let's load the dataset and have a look for ourselves.

In [2]:
data_url = '/Users/shahin/Documents/devel/data/2015_data.csv'
data = pd.read_csv(data_url)
data.head()
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-2-a9a58ed19414> in <module>
      1 data_url = '/Users/shahin/Documents/devel/data/2015_data.csv'
----> 2 data = pd.read_csv(data_url)
      3 data.head()

~/miniconda3/envs/analytics/lib/python3.7/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
    674         )
    675 
--> 676         return _read(filepath_or_buffer, kwds)
    677 
    678     parser_f.__name__ = name

~/miniconda3/envs/analytics/lib/python3.7/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    446 
    447     # Create the parser.
--> 448     parser = TextFileReader(fp_or_buf, **kwds)
    449 
    450     if chunksize or iterator:

~/miniconda3/envs/analytics/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
    878             self.options["has_index_names"] = kwds["has_index_names"]
    879 
--> 880         self._make_engine(self.engine)
    881 
    882     def close(self):

~/miniconda3/envs/analytics/lib/python3.7/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
   1112     def _make_engine(self, engine="c"):
   1113         if engine == "c":
-> 1114             self._engine = CParserWrapper(self.f, **self.options)
   1115         else:
   1116             if engine == "python":

~/miniconda3/envs/analytics/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1889         kwds["usecols"] = self.usecols
   1890 
-> 1891         self._reader = parsers.TextReader(src, **kwds)
   1892         self.unnamed_cols = self._reader.unnamed_cols
   1893 

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: [Errno 2] File /Users/shahin/Documents/devel/data/2015_data.csv does not exist: '/Users/shahin/Documents/devel/data/2015_data.csv'
In [ ]:
data['race_recode_5'].value_counts()
In [ ]:
data.columns

The capitalisation step used in other sections is not needed for this dataset, so it is left commented out.

In [ ]:
#data['manner'] = data['manner_of_death']#.str.capitalize()
#data['race_recode_5'] = data['race_recode_5']#.str.capitalize()
#data['species'] = data['species'].str.capitalize()

Let's check how many samples and variables we have.

In [ ]:
data.shape

Data Wrangling

We need to do a bit of data wrangling before we can visualise our data. The two variables we are interested in are stored in the columns manner_of_death and race_recode_5.

In [ ]:
pd.DataFrame(data.columns.values.tolist())

So let's select just these two columns and work with a list containing only them as we move forward.

In [ ]:
data.fillna(0, inplace=True)
In [ ]:
data.iloc[6572].manner_of_death
In [ ]:
import json
In [ ]:
with open("/Users/shahin/Documents/devel/data/2015_data.json", "r") as read_file:
    codes = json.load(read_file)
In [ ]:
codes['manner_of_death']
In [ ]:
codes['manner_of_death']['0'] = codes['manner_of_death'].pop('Blank')
In [ ]:
remove = ["Natural", "Not specified", "Could not determine", "Pending investigation"]
In [ ]:
list(codes['manner_of_death'].values())
In [ ]:
left = list(codes['race_recode_5'].values())
pd.DataFrame(left)
In [ ]:
right = list(codes['manner_of_death'].values())
pd.DataFrame(right)
In [ ]:
right = [x for x in right if x not in remove]
In [ ]:
data['manner_of_death'] = data['manner_of_death'].astype('int32')
data['manner_of_death'] = data['manner_of_death'].astype('str')

data.iloc[6572].manner_of_death
In [ ]:
left.sort()
right.sort()
In [ ]:
left
In [ ]:
data = data.replace({"manner_of_death": codes['manner_of_death']})
In [ ]:
data['race_recode_5'] = data['race_recode_5'].astype('int32')
data['race_recode_5'] = data['race_recode_5'].astype('str')

data.iloc[6572].race_recode_5
In [ ]:
data = data.replace({"race_recode_5": codes['race_recode_5']})
In [ ]:
data.iloc[6572].race_recode_5
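
The cells above cast each coded column to a string and then swap the codes for readable labels. As a minimal sketch of that pattern on toy data (the `code_book` dictionary, the `colour` column, and its values are hypothetical and not taken from the dataset), it might look like this:

```python
import pandas as pd

# Hypothetical code book and records, for illustration only.
code_book = {"colour": {"1": "Red", "2": "Blue", "0": "Unknown"}}
frame = pd.DataFrame({"colour": [1.0, 2.0, 0.0, 1.0]})

# Cast the float codes to int and then to str so that they match
# the string keys of the code book.
frame["colour"] = frame["colour"].astype("int32").astype("str")

# Replace each code with its label, restricted to the named column.
frame = frame.replace({"colour": code_book["colour"]})

print(frame["colour"].tolist())
```

The intermediate cast to `int32` matters in this sketch because a float such as `1.0` would otherwise become the string `"1.0"`, which is not a key in the code book.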
In [ ]:
manner_race = pd.DataFrame(data[['manner_of_death', 'race_recode_5']].values)
manner_race

Now for the names of our types, which we can use to create the matrix.

In [ ]:
features = left + right
d = pd.DataFrame(0, index=features, columns=features)

Our chord diagram will need two inputs: the co-occurrence matrix, and a list of names to label the segments.

We can build a co-occurrence matrix with the following approach. We'll start by creating a list with every type pairing in its original and reversed form.

In [ ]:
manner_race = list(itertools.chain.from_iterable((i, i[::-1]) for i in manner_race.values))
In [ ]:
for x in manner_race:
    if x[0] not in remove and x[1] not in remove:
        d.at[x[0], x[1]] += 1
In [ ]:
d
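
As a minimal sketch of this approach on toy data (the pairings and labels below are made up for illustration and are not taken from the dataset), each pairing is emitted in both orders and counted into a square DataFrame of zeros, which should give a symmetric matrix:

```python
import itertools

import pandas as pd

# Hypothetical pairings and labels, for illustration only.
pairs = [("A", "X"), ("A", "Y"), ("B", "X"), ("A", "X")]
labels = ["A", "B", "X", "Y"]

# Emit each pairing in its original and reversed form.
both_ways = list(itertools.chain.from_iterable((p, p[::-1]) for p in pairs))

# Start from a square matrix of zeros and count every ordered pairing.
toy = pd.DataFrame(0, index=labels, columns=labels)
for first, second in both_ways:
    toy.at[first, second] += 1

print(toy)
```

Because every pairing is counted once in each direction, the entry for (A, X) always equals the entry for (X, A).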
In [ ]:
for race in left:
    # Convert this label's row, then its column, to percentages of their respective totals.
    d.loc[race, :] = (d.loc[race, :] / d.loc[race, :].sum()) * 100
    d.loc[:, race] = (d.loc[:, race] / d.loc[:, race].sum()) * 100
In [ ]:
(d[race].value_counts(normalize=True)*100).astype(int)
In [ ]:
d
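
The loop above rescales counts to percentages. As a minimal sketch of the row-wise half of that idea (the counts and labels are hypothetical, and `DataFrame.div` is used here in place of the explicit loop), we could write:

```python
import pandas as pd

# Hypothetical counts, for illustration only.
counts = pd.DataFrame(
    [[0, 30, 10], [30, 0, 0], [10, 0, 0]],
    index=["A", "X", "Y"],
    columns=["A", "X", "Y"],
    dtype=float,
)

# Divide each row by its own total and scale to a percentage.
row_percent = counts.div(counts.sum(axis=1), axis=0) * 100

print(row_percent.round(2))
```

Note that a row-normalised matrix is generally no longer symmetric: in this sketch the entry for (A, X) differs from the entry for (X, A), because the two rows have different totals.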

Chord Diagram

Time to visualise the co-occurrence of types using a chord diagram. We are going to use a list of custom colours that represent the types.

In [ ]:
colors = ["#2DE1FC","#1883B4","#C5DB66","#90B64D","#DB2B39","#E76926", "#DB9118"]
In [ ]:
names = left + right

We'll shorten the first two labels before plotting.

In [ ]:
names[0] = "AIAN"
names[1] = "Asian or PI"

Finally, we can put it all together.

In [ ]:
Chord(
    d.round(2).values.tolist(),
    names,
    colors=colors,
    credit=True,
    wrap_labels=True,
    margin=50,
    font_size_large=7,
    divide=True,
    noun="percent",
    divide_idx=len(left),
    divide_size=.2,
    width=850).show()

Chord Diagram with Names

It would be nice to show a list of Pokémon names when hovering over co-occurring Pokémon types. To do this, we can make use of the optional details parameter.


Next, we'll create an empty multi-dimensional array with the same shape as our matrix.

In [ ]:
details = np.empty((len(names),len(names)),dtype=object)
details_thumbs = np.empty((len(names),len(names)),dtype=object)

Now we can populate the details array with lists of Pokémon names in the correct positions.

In [ ]:
for count_x, item_x in enumerate(names):
    for count_y, item_y in enumerate(names):
        # Select the rows whose species and personality both fall within this pair of labels.
        mask = (
            data['species'].isin([item_x, item_y]) &
            data['personality'].isin([item_y, item_x]))

        details_urls = data[mask]['url'].to_list()
        details_names = data[mask]['name'].to_list()

        # Store the matches, or an empty list when there are none.
        if details_names:
            details[count_x][count_y] = details_names
            details_thumbs[count_x][count_y] = details_urls
        else:
            details[count_x][count_y] = []
            details_thumbs[count_x][count_y] = []

details = pd.DataFrame(details).values.tolist()
details_thumbs = pd.DataFrame(details_thumbs).values.tolist()
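
As a minimal sketch of the same idea on toy data (the labels and records are hypothetical, and the cells are matched by set equality rather than with `isin`), an object array can hold one list of names per cell:

```python
import numpy as np

# Hypothetical labels and (name, first label, second label) records,
# for illustration only.
labels = ["A", "B", "X"]
records = [("Ann", "A", "X"), ("Bob", "B", "X"), ("Cat", "A", "X")]

# An object array can hold a Python list in every cell.
toy_details = np.empty((len(labels), len(labels)), dtype=object)

for i, row_label in enumerate(labels):
    for j, col_label in enumerate(labels):
        # Collect the names whose two labels match this cell, in either order.
        toy_details[i][j] = [
            name for name, first, second in records
            if {first, second} == {row_label, col_label}
        ]

# Convert to nested lists, the same form the cell above produces.
toy_details = toy_details.tolist()
print(toy_details[0][2])
```

Each cell of `toy_details` lines up with the same cell of the co-occurrence matrix, so the list at position (i, j) names the records behind the count at (i, j).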
In [ ]:
len(right)
In [ ]:
Chord(
    d.values.tolist(), names, credit=True, colors=colors, wrap_labels=False,
    margin=40, font_size_large=7, details=details, details_thumbs=details_thumbs,
    noun="villagers", details_separator="", divide=True, divide_idx=len(left),
    divide_size=.2, width=850,
).show()
In [ ]:
np.empty(shape=(6,1)).tolist()

Conclusion

In this section, we demonstrated how to conduct some data wrangling on a downloaded dataset to prepare it for a chord diagram. Our chord diagram is interactive, so you can use your mouse or touchscreen to investigate the co-occurrences!

Visualisation of Co-occurring Types

Preamble

In [2]:
:dep darn = {version = "0.1.11"}
:dep ndarray = {version = "0.13.0"}
:dep itertools = {version = "0.9.0"}
:dep chord = {version = "0.1.4"}
extern crate ndarray;

use ndarray::prelude::*;
use itertools::Itertools;
use chord::{Chord, Plot};

Introduction

In this section, we're going to use the Complete Pokemon Dataset to visualise the co-occurrence of Pokémon types from generations one to eight.

The Dataset

The dataset documentation states that we can expect two type variables, type_1 and type_2, for each of the 1028 samples from the first eight generations.

Let's download the mirrored dataset and have a look for ourselves.

In [3]:
let data = darn::read_csv("https://shahinrostami.com/datasets/pokemon_gen_1_to_8.csv");
In [4]:
darn::show_frame(&data.0, Some(&data.1));
Out[4]:
pokedex_number name german_name japanese_name generation status species type_number type_1 type_2 height_m weight_kg abilities_number ability_1 ability_2 ability_hidden total_points hp attack defense sp_attack sp_defense speed catch_rate base_friendship base_experience growth_rate egg_type_number egg_type_1 egg_type_2 percentage_male egg_cycles against_normal against_fire against_water against_electric against_grass against_ice against_fight against_poison against_ground against_flying against_psychic against_bug against_rock against_ghost against_dragon against_dark against_steel against_fairy
"0" "1" "Bulbasaur" "Bisasam" "フシギダネ (Fushigidane)" "1" "Normal" "Seed Pokémon" "2" "Grass" "Poison" "0.7" "6.9" "2" "Overgrow" "" "Chlorophyll" "318" "45" "49" "49" "65" "65" "45" "45" "70" "64" "Medium Slow" "2" "Grass" "Monster" "87.5" "20" "1" "2" "0.5" "0.5" "0.25" "2" "0.5" "1" "1" "2" "2" "1" "1" "1" "1" "1" "1" "0.5"
"1" "2" "Ivysaur" "Bisaknosp" "フシギソウ (Fushigisou)" "1" "Normal" "Seed Pokémon" "2" "Grass" "Poison" "1" "13" "2" "Overgrow" "" "Chlorophyll" "405" "60" "62" "63" "80" "80" "60" "45" "70" "142" "Medium Slow" "2" "Grass" "Monster" "87.5" "20" "1" "2" "0.5" "0.5" "0.25" "2" "0.5" "1" "1" "2" "2" "1" "1" "1" "1" "1" "1" "0.5"
"2" "3" "Venusaur" "Bisaflor" "フシギバナ (Fushigibana)" "1" "Normal" "Seed Pokémon" "2" "Grass" "Poison" "2" "100" "2" "Overgrow" "" "Chlorophyll" "525" "80" "82" "83" "100" "100" "80" "45" "70" "236" "Medium Slow" "2" "Grass" "Monster" "87.5" "20" "1" "2" "0.5" "0.5" "0.25" "2" "0.5" "1" "1" "2" "2" "1" "1" "1" "1" "1" "1" "0.5"
"3" "3" "Mega Venusaur" "Bisaflor" "フシギバナ (Fushigibana)" "1" "Normal" "Seed Pokémon" "2" "Grass" "Poison" "2.4" "155.5" "1" "Thick Fat" "" "" "625" "80" "100" "123" "122" "120" "80" "45" "70" "281" "Medium Slow" "2" "Grass" "Monster" "87.5" "20" "1" "1" "0.5" "0.5" "0.25" "1" "0.5" "1" "1" "2" "2" "1" "1" "1" "1" "1" "1" "0.5"
"4" "4" "Charmander" "Glumanda" "ヒトカゲ (Hitokage)" "1" "Normal" "Lizard Pokémon" "1" "Fire" "" "0.6" "8.5" "2" "Blaze" "" "Solar Power" "309" "39" "52" "43" "60" "50" "65" "45" "70" "62" "Medium Slow" "2" "Dragon" "Monster" "87.5" "20" "1" "0.5" "2" "1" "0.5" "0.5" "1" "1" "2" "1" "1" "0.5" "2" "1" "1" "1" "0.5" "0.5"
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
"1023" "888" "Zacian Hero of Many Battles" "" "" "8" "Legendary" "Warrior Pokémon" "1" "Fairy" "" "2.8" "110" "1" "Intrepid Sword" "" "" "670" "92" "130" "115" "80" "115" "138" "" "" "" "Slow" "1" "Undiscovered" "" "" "120" "1" "1" "1" "1" "1" "1" "0.5" "2" "1" "1" "1" "0.5" "1" "1" "0" "0.5" "2" "1"
"1024" "889" "Zamazenta Crowned Shield" "" "" "8" "Legendary" "Warrior Pokémon" "2" "Fighting" "Steel" "2.9" "785" "1" "Dauntless Shield" "" "" "720" "92" "130" "145" "80" "145" "128" "" "" "" "Slow" "1" "Undiscovered" "" "" "120" "0.5" "2" "1" "1" "0.5" "0.5" "2" "0" "2" "1" "1" "0.25" "0.25" "1" "0.5" "0.5" "0.5" "1"
"1025" "889" "Zamazenta Hero of Many Battles" "" "" "8" "Legendary" "Warrior Pokémon" "1" "Fighting" "" "2.9" "210" "1" "Dauntless Shield" "" "" "670" "92" "130" "115" "80" "115" "138" "" "" "" "Slow" "1" "Undiscovered" "" "" "120" "1" "1" "1" "1" "1" "1" "1" "1" "1" "2" "2" "0.5" "0.5" "1" "1" "0.5" "1" "2"
"1026" "890" "Eternatus" "" "" "8" "Legendary" "Gigantic Pokémon" "2" "Poison" "Dragon" "20" "950" "1" "Pressure" "" "" "690" "140" "85" "95" "145" "95" "130" "" "" "" "Slow" "1" "Undiscovered" "" "" "120" "1" "0.5" "0.5" "0.5" "0.25" "2" "0.5" "0.5" "2" "1" "2" "0.5" "1" "1" "2" "1" "1" "1"
"1027" "890" "Eternatus Eternamax" "" "" "8" "Legendary" "Gigantic Pokémon" "2" "Poison" "Dragon" "100" "" "0" "" "" "" "1125" "255" "115" "250" "125" "250" "130" "" "" "" "Slow" "1" "Undiscovered" "" "" "120" "1" "0.5" "0.5" "0.5" "0.25" "2" "0.5" "0.5" "2" "1" "2" "0.5" "1" "1" "2" "1" "1" "1"

It looks good so far: we can clearly see the two type columns. Let's confirm that we have 1028 samples.

In [5]:
&data.0.shape()
Out[5]:
[1028, 51]

Perfect, that's exactly what we were expecting.

Data Wrangling

We need to do a bit of data wrangling before we can visualise our data. We can see from the column names that the Pokémon types are split between the columns type_1 and type_2.

In [96]:
&data.1
Out[96]:
["", "pokedex_number", "name", "german_name", "japanese_name", "generation", "status", "species", "type_number", "type_1", "type_2", "height_m", "weight_kg", "abilities_number", "ability_1", "ability_2", "ability_hidden", "total_points", "hp", "attack", "defense", "sp_attack", "sp_defense", "speed", "catch_rate", "base_friendship", "base_experience", "growth_rate", "egg_type_number", "egg_type_1", "egg_type_2", "percentage_male", "egg_cycles", "against_normal", "against_fire", "against_water", "against_electric", "against_grass", "against_ice", "against_fight", "against_poison", "against_ground", "against_flying", "against_psychic", "against_bug", "against_rock", "against_ghost", "against_dragon", "against_dark", "against_steel", "against_fairy"]

So let's select just these two columns and work with a list containing only them as we move forward.

In [102]:
let types = data.0.slice(s![.., 9..11]).into_owned();
darn::show_frame(&types, None);
Out[102]:
"Grass" "Poison"
"Grass" "Poison"
"Grass" "Poison"
"Grass" "Poison"
"Fire" ""
... ...
"Fairy" ""
"Fighting" "Steel"
"Fighting" ""
"Poison" "Dragon"
"Poison" "Dragon"

Our chord diagram will need two inputs: the co-occurrence matrix, and a list of names to label the segments.

First, we'll populate our list of type names by looking for the unique ones.

In [103]:
let mut names = types.iter().cloned().unique().collect_vec();
names
Out[103]:
["Grass", "Poison", "Fire", "", "Flying", "Dragon", "Water", "Bug", "Normal", "Dark", "Electric", "Psychic", "Ground", "Ice", "Steel", "Fairy", "Fighting", "Rock", "Ghost"]

Let's sort this alphabetically.

In [104]:
names.sort();
names
Out[104]:
["", "Bug", "Dark", "Dragon", "Electric", "Fairy", "Fighting", "Fire", "Flying", "Ghost", "Grass", "Ground", "Ice", "Normal", "Poison", "Psychic", "Rock", "Steel", "Water"]

We'll also remove the empty string that has appeared as a result of samples with only one type.

In [105]:
names.remove(0);
names
Out[105]:
["Bug", "Dark", "Dragon", "Electric", "Fairy", "Fighting", "Fire", "Flying", "Ghost", "Grass", "Ground", "Ice", "Normal", "Poison", "Psychic", "Rock", "Steel", "Water"]

Now we can create our empty co-occurrence matrix with a shape that can hold co-occurrences between our types.

In [106]:
let type_count = names.len();
let mut matrix: Vec<Vec<f64>> = vec![vec![Default::default(); type_count]; type_count];
matrix
Out[106]:
[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]

We can populate a co-occurrence matrix with the following approach. Here, we're looping through every sample in our dataset and incrementing the corresponding matrix entry by one using the type_1 and type_2 indices from the names vector. To make sure we have a co-occurrence matrix, we're also doing the same in reverse, i.e. type_2 and type_1.

In [107]:
for item in types.genrows() {
    // Only samples with both a primary and a secondary type form a pairing.
    if !item[0].is_empty() && !item[1].is_empty() {
        let first = names.iter().position(|s| s == &item[0]).unwrap();
        let second = names.iter().position(|s| s == &item[1]).unwrap();
        // Count the pairing in both directions.
        matrix[second][first] += 1.0;
        matrix[first][second] += 1.0;
    }
};

Chord Diagram

Time to visualise the co-occurrence of types using a chord diagram. We are going to use a list of custom colours that represent the types.

In [108]:
let colors = ["#A6B91A", "#705746", "#6F35FC", "#F7D02C", "#D685AD",
          "#C22E28", "#EE8130", "#A98FF3", "#735797", "#7AC74C",
          "#E2BF65", "#96D9D6", "#A8A77A", "#A33EA1", "#F95587",
          "#B6A136", "#B7B7CE", "#6390F0"];

Finally, we can put it all together.

In [120]:
Chord {
    matrix: matrix.clone(),
    names: names.clone(),
    colors: format!("{:?}", colors),
    margin: 30.0,
    wrap_labels: true,
    ..Chord::default()
}
.show();
Out[120]:
Chord Diagram

Conclusion

In this section, we demonstrated how to conduct some data wrangling on a downloaded dataset to prepare it for a chord diagram. Our chord diagram is interactive, so you can use your mouse or touchscreen to investigate the co-occurrences!