Non-Dominated Sorting

Preamble

In [1]:
# used to create block diagrams
%reload_ext xdiag_magic
%xdiag_output_format svg
    
import numpy as np                   # for multi-dimensional containers
import pandas as pd                  # for DataFrames
import plotly.graph_objects as go    # for data visualisation
import plotly.io as pio              # to set shahin plot layout
import platypus as plat              # multi-objective optimisation framework
import plotly.express as px

pio.templates['shahin'] = pio.to_templated(go.Figure().update_layout(legend=dict(orientation="h",y=1.1, x=.5, xanchor='center'),margin=dict(t=0,r=0,b=40,l=40))).layout.template
pio.templates.default = 'shahin'

Introduction

In this section we're going to take a look at how to use the Platypus framework to apply the non-dominated sorting algorithm to a population of randomly generated solutions. Non-dominated sorting is important during the selection stage of an Evolutionary Algorithm, because it allows us to prioritise the selection of solutions based on their dominance relations with respect to the rest of the population. For example, we may wish to only select 25 solutions from a population of 50, with the intention to use these 25 solutions in the variation stage that follows.

Non-Dominated Sorting with Platypus

Let's define some necessary variables before invoking the Platypus implementation of the non-dominated sorting algorithm. We will use the ZDT1 test function with design variables $\mathrm{D}=30$ throughout this example, with population size $\mathrm{N}=50$.

In [2]:
problem = plat.ZDT1()
D = 30
N = 50

With these variables defined, we will now move onto generating our initial population. We will be using Platypus Solution objects for this, which we will initialise with random problem variables, evaluate, and then append to a list named solutions.

In [3]:
solutions = []

for i in range(N):
    solution = plat.Solution(problem)
    solution.variables = np.random.rand(D)
    solution.evaluate()
    solutions.append(solution)

Let's print out the variables and objectives for the first solution in this list to see what they look like.

In [14]:
print(f"Design variables:\n {solutions[0].variables}")
print(f"Objective values:\n {solutions[0].objectives}")
Design variables:
 [0.52605347 0.89977685 0.01395493 0.64451395 0.2116908  0.0010937
 0.85039043 0.77150837 0.4491369  0.90356755 0.19027337 0.94842878
 0.88791918 0.49979162 0.0801286  0.7354959  0.24616541 0.56134147
 0.638655   0.54388286 0.78525609 0.40214282 0.02438632 0.44368194
 0.03706867 0.8945156  0.15447895 0.56931374 0.0438145  0.30317388]
Objective values:
 [0.5260534732269664, 3.5988779548555936]

Now that we have a population of solutions stored in the solutions variable, we can pass this into the Platypus.nondominated_sort() function that will assign each solution a rank parameter, starting at 0 for the first non-dominated front and incrementing for other detected front.

In [5]:
plat.nondominated_sort(solutions)

Let's check to see if the first solution in our solutions list has one of these newly calculated rank parameters.

In [6]:
solutions[0].rank
Out[6]:
1

We now have a population of solutions that have been assigned their design variables, their objective values, and their ranks according to the non-dominated sorting algorithm.

Visualising the Non-Dominated Fronts

It would be useful to visualise these non-dominated fronts, according to their newly assigned ranks, using a scatterplot for further investigation. Before we do this, we will migrate our solution variables from their Platypus data-structure to a DataFrame. We will do this for the simple reason that our visualisation tools, e.g. Plotly, work quite well with the DataFrame format.

Let's begin by creating a new DataFrame with the columns f1, f2, and front_rank. f1 and f2 will be used to store our objective values for each solution, which is sufficient for the two-objective ZDT1 problem, and rank_front will be used to store the non-dominated sorting ranks that were calculated earlier.

In [7]:
solutions_df = pd.DataFrame(index=range(N),columns=['f1','f2','front_rank'])
solutions_df.head()
Out[7]:
f1 f2 front_rank
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN

We can see that we've also defined an index range that covers the number of solutions in our population, $\mathrm{N}=50$. This means we have $50$ rows ready, but they their values are initialised to NaN (Not A Number), which in this case simply indicates missing data.

Let's use a loop to iterate through our solutions list of $50$ solutions and assign the desired values to the corresponding row in our solutions_df DataFrame

In [8]:
for i in range(N):
    solutions_df.loc[i].f1 = solutions[i].objectives[0]
    solutions_df.loc[i].f2 = solutions[i].objectives[1]
    solutions_df.loc[i].front_rank = solutions[i].rank
        
solutions_df.head()
Out[8]:
f1 f2 front_rank
0 0.526053 3.59888 1
1 0.325999 4.02324 1
2 0.369734 4.01765 3
3 0.201137 4.39002 2
4 0.368714 3.88653 2

We can see our DataFrame now contains the desired values. We're now ready to go with some visualisation, so let's start with a simple scatterplot for the entire population in the objective space.

In [9]:
fig = go.Figure(layout=dict(xaxis=dict(title='f1'), 
                            yaxis=dict(title='f2')))

fig.add_scatter(x=solutions_df.f1, y=solutions_df.f2, mode='markers')

fig.show()