Sample Size Sufficiency

Preamble

In [1]:
import numpy as np                   # for multi-dimensional containers
import pandas as pd                  # for DataFrames
import plotly.graph_objects as go    # for data visualisation
import platypus as plat              # multi-objective optimisation framework
from scipy import stats

# Optional Customisations
import plotly.io as pio              # to set shahin plot layout
pio.templates['shahin'] = pio.to_templated(go.Figure().update_layout(
    legend=dict(orientation="h",y=1.1, x=.5, xanchor='center'),
    margin=dict(t=0,r=0,b=40,l=40))).layout.template
pio.templates.default = 'shahin'
pio.renderers.default = "notebook_connected" # remove when running locally 

Introduction

Before conducting a comparison between algorithms we need to determine whether our sample size will be sufficient, i.e. is our sample size large enough to support our hypothesis? One approach to determine sample size sufficiency is to investigate the relationship between the sample size ($n$) and the Standard Error of the Mean $(SE_M$). This is calculated by taking the standard deviation and dividing it by the square root of the number of samples under consideration. This is done for each sample size incrementally until any further increase offers trivial gains.

Let's try to determine the sufficient sample size for our experiment using this approach.

Executing an Experiment and Generating Results

In this section, we will be using the Platypus implementation of NSGA-II to generate solutions for the DTLZ1 test problem.

First, we will create a list named problems where each element is a DTLZ test problem that we want to use.

In [2]:
problems = [plat.DTLZ1]

Similarly, we will create a list named algorithms where each element is an algorithm that we want to compare.

In [3]:
algorithms = [plat.NSGAII]

Now we can execute an experiment, specifying the number of function evaluations, $nfe=10,000$, and the number of executions per problem, $seed=250$. This may take some time to complete depending on your processor speed and the number of function evaluations.

Warning

Running the code below will take a long time to complete even if you have good hardware. To put things into perspective, you will be executing an optimisation process 250 times, per 1 test problem, per 1 algorithm. That's 250 executions of 10,000 function evaluations, totalling in at 2,500,000 function evaluations.

We are also using the ProcessPoolEvaluator in Platypus to speed things up.

In [4]:
with plat.ProcessPoolEvaluator(10) as evaluator:
    results = plat.experiment(algorithms, problems, nfe=10000, seeds=250, evaluator=evaluator)

Once the above execution has completed, we can initialize an instance of the hypervolume indicator provided by Platypus.

In [5]:
hyp = plat.Hypervolume(minimum=[0, 0, 0], maximum=[1, 1, 1])

Now we can use the calculate function provided by Platypus to calculate all our hypervolume indicator measurements for the results from our above experiment.

In [6]:
hyp_result = plat.calculate(results, hyp)

Finally, let's get the hypervolume indicator scores for all executions of NSGA-II on DTLZ1.

In [7]:
hyp_result = hyp_result['NSGAII']['DTLZ1']['Hypervolume']

Calculating and Plotting the Sample Error from the Mean

It may be tempting at this point to start generating results with other algorithms to start our comparison, however, it's important to determine a sufficient sample size before moving on. One approach is to look at the relationship between sample sizes and the Standard Error of the Mean (SEM). The formula for this is $SE_M = \frac{s}{\sqrt{n}}$

Let's use the sem function from scipy.stats to calculate the $SE_M$ for each sample size made possible by our experiment above. We will incrementally append these to a list so that we can plot them later.

In [8]:
SEM = []

for sample_size in range(3,len(hyp_result)):
    SEM.append(stats.sem(hyp_result[0:sample_size]))

All that's left now is to plot the $SE_M$ for each incrementally ascending sample size.

In [9]:
fig = go.Figure(
    data=go.Scatter(x=list(range(3,len(hyp_result))), y=SEM),
    layout=dict(xaxis=dict(title='Sample Size'),yaxis=dict(title='Standard Error of the Mean'))
)

fig.show()

We may decide that a sufficient sample size can be selected when the $SE_M$ starts to settle below $0.05$. In this case, a sample size of $50$ can be justified.

Conclusion

In this section, we have generated a large sample of results for a single algorithm on a single problem, and we have then calculated the sample error from the mean incrementally on all possible sample sizes. This has allowed us to determine a sample size that may be sufficient in the rest of our experiment. You will find different sample sizes used throughout the literature, however, you will very rarely find a clear justification for the selection. Using this approach, you can increase your confidence in the number of samples in your experiments.

Oversampling the Cardiotocography Data Set

Preamble

You may be missing the imblearn package below. With anaconda, you can grab it using conda install -c conda-forge imbalanced-learn

In [2]:
import numpy as np                   # for multi-dimensional containers
import pandas as pd                  # for DataFrames
import plotly.graph_objects as go    # for data visualisation
import plotly.io as pio              # to set shahin plot layout

from imblearn.over_sampling import ADASYN

pio.templates['shahin'] = pio.to_templated(go.Figure().update_layout(legend=dict(orientation="h",y=1.1, x=.5, xanchor='center'),margin=dict(t=0,r=0,b=40,l=40))).layout.template
pio.templates.default = 'shahin'

Introduction

We've covered class imbalance and over sampling in another section, so this will just serve as another example on a different dataset. In this section, we'll be using the Cardiotocography (CTG) dataset located at https://archive.ics.uci.edu/ml/datasets/cardiotocography. It has 23 attributes, 2 of which are two different classifications of the same samples, CLASS (1 to 10) and NSP (1 to 3).

Downloading the Dataset

To keep this notebook independant, we will download the CTG dataset within our code. If you've already downloaded it and viewed it using some spreadsheet software, you will have noticed that we want to use the "Data" spreadsheet, and that we also want to drop he first row.

In [3]:
url="https://archive.ics.uci.edu/ml/machine-learning-databases/00193/CTG.xls"
data=pd.read_excel(url, sheet_name="Data", header=1)

Let's have a quick look at what we've downloaded.

In [8]:
data.head()
Out[8]:
b e AC FM UC DL DS DP DR Unnamed: 9 ... E AD DE LD FS SUSP Unnamed: 42 CLASS Unnamed: 44 NSP
0 240.0 357.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 NaN ... -1.0 -1.0 -1.0 -1.0 1.0 -1.0 NaN 9.0 NaN 2.0
1 5.0 632.0 4.0 0.0 4.0 2.0 0.0 0.0 0.0 NaN ... -1.0 1.0 -1.0 -1.0 -1.0 -1.0 NaN 6.0 NaN 1.0
2 177.0 779.0 2.0 0.0 5.0 2.0 0.0 0.0 0.0 NaN ... -1.0 1.0 -1.0 -1.0 -1.0 -1.0 NaN 6.0 NaN 1.0
3 411.0 1192.0 2.0 0.0 6.0 2.0 0.0 0.0 0.0 NaN ... -1.0 1.0 -1.0 -1.0 -1.0 -1.0 NaN 6.0 NaN 1.0
4 533.0 1147.0 4.0 0.0 5.0 0.0 0.0 0.0 0.0 NaN ... -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 NaN 2.0 NaN 1.0

5 rows × 46 columns

Preparing the Dataset

Looking at the data, we can see the last 3 rows are not samples

In [59]:
data.tail()
Out[59]:
b e AC FM UC DL DS DP DR Unnamed: 9 ... E AD DE LD FS SUSP Unnamed: 42 CLASS Unnamed: 44 NSP
2124 1576.0 3049.0 1.0 0.0 9.0 0.0 0.0 0.0 0.0 NaN ... 1.0 -1.0 -1.0 -1.0 -1.0 -1.0 NaN 5.0 NaN 2.0
2125 2796.0 3415.0 1.0 1.0 5.0 0.0 0.0 0.0 0.0 NaN ... -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 NaN 1.0 NaN 1.0
2126 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2127 NaN NaN NaN NaN NaN 0.0 0.0 0.0 0.0 NaN ... 72.0 332.0 252.0 107.0 69.0 197.0 NaN NaN NaN NaN
2128 NaN NaN NaN 564.0 23.0 16.0 1.0 4.0 0.0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 46 columns

So we will get rid of them.

In [60]:
data = data.drop(data.tail(3).index)

Let's check to make sure they're gone.

In [61]:
data.tail()
Out[61]:
b e AC FM UC DL DS DP DR Unnamed: 9 ... E AD DE LD FS SUSP Unnamed: 42 CLASS Unnamed: 44 NSP
2121 2059.0 2867.0 0.0 0.0 6.0 0.0 0.0 0.0 0.0 NaN ... 1.0 -1.0 -1.0 -1.0 -1.0 -1.0 NaN 5.0 NaN 2.0
2122 1576.0 2867.0 1.0 0.0 9.0 0.0 0.0 0.0 0.0 NaN ... 1.0 -1.0 -1.0 -1.0 -1.0 -1.0 NaN 5.0 NaN 2.0
2123 1576.0 2596.0 1.0 0.0 7.0 0.0 0.0 0.0 0.0 NaN ... 1.0 -1.0 -1.0 -1.0 -1.0 -1.0 NaN 5.0 NaN 2.0
2124 1576.0 3049.0 1.0 0.0 9.0 0.0 0.0 0.0 0.0 NaN ... 1.0 -1.0 -1.0 -1.0 -1.0 -1.0 NaN 5.0 NaN 2.0
2125 2796.0 3415.0 1.0 1.0 5.0 0.0 0.0 0.0 0.0 NaN ... -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 NaN 1.0 NaN 1.0

5 rows × 46 columns

Now let's create a dataframe that only contains our input features and our desired classification. In the spreadsheet we know our 21 features were labelled with the numbers 1 to 21, starting from the column LB and ending on the column Tendency. Let's display all the columns in our dataframe.

In [62]:
data.columns
Out[62]:
Index(['b', 'e', 'AC', 'FM', 'UC', 'DL', 'DS', 'DP', 'DR', 'Unnamed: 9', 'LB',
       'AC.1', 'FM.1', 'UC.1', 'DL.1', 'DS.1', 'DP.1', 'ASTV', 'MSTV', 'ALTV',
       'MLTV', 'Width', 'Min', 'Max', 'Nmax', 'Nzeros', 'Mode', 'Mean',
       'Median', 'Variance', 'Tendency', 'Unnamed: 31', 'A', 'B', 'C', 'D',
       'E', 'AD', 'DE', 'LD', 'FS', 'SUSP', 'Unnamed: 42', 'CLASS',
       'Unnamed: 44', 'NSP'],
      dtype='object')

There are many ways to reduce all of these columns down to the 21 we want. In this notebook, we're going to be explicit with our column selections to ensure no errors can be introduced from a change in the external spreadsheet data.

In [63]:
columns = ['LB', 'AC.1', 'FM.1', 'UC.1', 'DL.1', 'DS.1', 'DP.1', 'ASTV',
           'MSTV', 'ALTV', 'MLTV', 'Width', 'Min', 'Max', 'Nmax', 'Nzeros',
           'Mode', 'Mean', 'Median', 'Variance', 'Tendency']

features = data[columns]

Let's have a peek at our new dataframe.

In [64]:
features.head()
Out[64]:
LB AC.1 FM.1 UC.1 DL.1 DS.1 DP.1 ASTV MSTV ALTV ... Width Min Max Nmax Nzeros Mode Mean Median Variance Tendency
0 120.0 0.000000 0.0 0.000000 0.000000 0.0 0.0 73.0 0.5 43.0 ... 64.0 62.0 126.0 2.0 0.0 120.0 137.0 121.0 73.0 1.0
1 132.0 0.006380 0.0 0.006380 0.003190 0.0 0.0 17.0 2.1 0.0 ... 130.0 68.0 198.0 6.0 1.0 141.0 136.0 140.0 12.0 0.0
2 133.0 0.003322 0.0 0.008306 0.003322 0.0 0.0 16.0 2.1 0.0 ... 130.0 68.0 198.0 5.0 1.0 141.0 135.0 138.0 13.0 0.0
3 134.0 0.002561 0.0 0.007682 0.002561 0.0 0.0 16.0 2.4 0.0 ... 117.0 53.0 170.0 11.0 0.0 137.0 134.0 137.0 13.0 1.0
4 132.0 0.006515 0.0 0.008143 0.000000 0.0 0.0 16.0 2.4 0.0 ... 117.0 53.0 170.0 9.0 0.0 137.0 136.0 138.0 11.0 1.0

5 rows × 21 columns

Let's also create a new dataframe to store our labels, the NSP column, which contains our desired classifications.

In [65]:
labels = data['NSP']

Again, we'll check to see what we've created.

In [66]:
labels.head()
Out[66]:
0    2.0
1    1.0
2    1.0
3    1.0
4    1.0
Name: NSP, dtype: float64

Frequency of each Classication

Now let's see how many samples we have for each class.

In [67]:
value_counts = labels.value_counts()

fig = go.Figure()
fig.add_bar(x=value_counts.index, y=value_counts.values)
fig.show()

We can see that our dataset is imbalanced, with class 1 having a significantly higher number of samples. This may be a problem for us depending on our approach, so we may want to balance this dataset before continuing our study.

Oversampling

We could perform over-sampling using an implementation of the Adaptive Synthetic (ADASYN) sampling approach in the imbalanced-learn library. We will pass in our features and labels variables, and expect outputs that have been resampled for balanced frequency.

In [68]:
features_resampled, labels_resampled = ADASYN().fit_sample(features, labels)

Let's see how many samples we have for each class this time.

In [69]:
labels_resampled = pd.DataFrame(labels_resampled, columns=['NSP'])
value_counts = labels_resampled['NSP'].value_counts()

fig = go.Figure()
fig.add_bar(x=value_counts.index, y=value_counts.values)
fig.show()

Much better!

Conclusion

In this section, we've used adaptive synthetic sampling to resample and balance our CTG dataset. The output is a balanced dataset, however, it's important to remember that these approaches should only be applied to training data, and never to data that is to be used for testing. For your own experiments, make sure you only apply an approach like this after you have already split your dataset and locked away the test set for later.

Using a Framework to Compare Algorithm Performance

Preamble

In [1]:
import numpy as np                   # for multi-dimensional containers
import pandas as pd                  # for DataFrames
import platypus as plat              # multi-objective optimisation framework

Introduction

When preparing to implement multi-objective optimisation experiments, it's often more convenient to use a ready-made framework/library instead of programming everything from scratch. There are many libraries and frameworks that have been implemented in many different programming languages. With our focus on multi-objective optimisation, our choice is an easy one. We will choose Platypus which has a focus on multi-objective problems and optimisation.

Platypus is a framework for evolutionary computing in Python with a focus on multiobjective evolutionary algorithms (MOEAs). It differs from existing optimization libraries, including PyGMO, Inspyred, DEAP, and Scipy, by providing optimization algorithms and analysis tools for multiobjective optimization.

In this section, we will use the Platypus framework to compare the performance of the Non-dominated Sorting Genetic Algorithm II (NSGA-II)1 and the Pareto Archived Evolution Strategy (PAES)2. To do this, we will use them to generate solutions to four problems in the DTLZ test suite3.

Because both of these algorithms are stochastic, meaning that they will produce different results every time they are executed, we will select a sufficient sample size of 30 per algorithm per test problem. We will also use the default configurations for all the test problems and algorithms employed in this comparison. We will use the Hypervolume Indicator (introduced in earlier sections) as our performance metric.

Executing an Experiment and Generating Results

In this section, we will using the Platypus implementation of NSGA-II and PAES to generate solutions for the DTLZ1, DTLZ2, DTLZ3, and DTLZ4 test problems.

First, we will create a list named problems where each element is a DTLZ test problem that we want to use.

In [2]:
problems = [plat.DTLZ1, plat.DTLZ2, plat.DTLZ3, plat.DTLZ4]

Similarly, we will create a list named algorithms where each element is an algorithm that we want to compare.

In [3]:
algorithms = [plat.NSGAII, plat.PAES]

Now we can execute an experiment, specifying the number of function evaluations, $nfe=10,000$, and the number of executions per problem, $seeds=30$. This may take some time to complete depending on your processor speed and the number of function evaluations.

Warning

Running the code below will take a long time to complete even if you have good hardware. To put things into perspective, you will be executing an optimisation process 30 times, per 4 test problems, per 2 algorithms. That's 240 executions of 10,000 function evaluations, totalling in at 2,400,000 function evaluations. You may prefer to change the value of seeds below to something smaller like 5 for now.

We are also using the ProcessPoolEvaluator in Platypus to speed things up.

In [4]:
with plat.ProcessPoolEvaluator(10) as evaluator:
    results = plat.experiment(algorithms, problems, seeds=30, nfe=10000, evaluator=evaluator)

Once the above execution has completed, we can initialize an instance of the hypervolume indicator provided by Platypus.

In [5]:
hyp = plat.Hypervolume(minimum=[0, 0, 0], maximum=[1, 1, 1])

Now we can use the calculate function provided by Platypus to calculate all our hypervolume indicator measurements for the results from our above experiment.

In [6]:
hyp_result = plat.calculate(results, hyp)

Finally, we can display these resultsu using the display function provided by Platypus.

In [7]:
plat.display(hyp_result, ndigits=3)
NSGAII
    DTLZ1
        Hypervolume : [0.699, 0.024, 0.44, 0.787, 0.122, 0.513, 0.686, 0.044, 0.472, 0.103, 0.439, 0.795, 0.103, 0.769, 0.1, 0.045, 0.0, 0.0, 0.017, 0.0, 0.0, 0.617, 0.0, 0.0, 0.0, 0.199, 0.0, 0.522, 0.0, 0.002]
    DTLZ2
        Hypervolume : [0.209, 0.209, 0.209, 0.209, 0.209, 0.209, 0.209, 0.209, 0.209, 0.209, 0.209, 0.209, 0.209, 0.209, 0.209, 0.209, 0.209, 0.209, 0.209, 0.209, 0.209, 0.209, 0.209, 0.209, 0.209, 0.209, 0.209, 0.209, 0.209, 0.209]
    DTLZ3
        Hypervolume : [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    DTLZ4
        Hypervolume : [0.0, 0.0, 0.0, 0.209, 0.209, 0.209, 0.209, 0.0, 0.0, 0.0, 0.209, 0.209, 0.0, 0.209, 0.209, 0.209, 0.209, 0.209, 0.209, 0.0, 0.209, 0.0, 0.209, 0.209, 0.0, 0.0, 0.0, 0.0, 0.209, 0.0]
PAES
    DTLZ1
        Hypervolume : [0.73, 0.697, 0.841, 0.842, 0.602, 0.829, 0.634, 0.861, 0.838, 0.628, 0.853, 0.73, 0.605, 0.662, 0.616, 0.843, 0.467, 0.835, 0.822, 0.824, 0.826, 0.702, 0.857, 0.598, 0.236, 0.835, 0.766, 0.77, 0.68, 0.783]
    DTLZ2
        Hypervolume : [0.195, 0.196, 0.192, 0.191, 0.192, 0.191, 0.182, 0.186, 0.192, 0.173, 0.186, 0.186, 0.152, 0.197, 0.194, 0.194, 0.177, 0.188, 0.189, 0.155, 0.178, 0.195, 0.189, 0.191, 0.196, 0.18, 0.186, 0.194, 0.152, 0.196]
    DTLZ3
        Hypervolume : [0.027, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.024, 0.0, 0.109, 0.0, 0.068, 0.0, 0.037, 0.0, 0.087, 0.0, 0.076, 0.0, 0.0, 0.092, 0.0, 0.072, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    DTLZ4
        Hypervolume : [0.0, 0.0, 0.0, 0.0, 0.0, 0.148, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

Statistical Comparison of the Hypervolume Results

Now that we have a data structure that has been populated with results from each execution of the algorithms, we can do a quick statistical comparison to give us some indication as to which algorithmn (NSGA-II or PAES) performs better on each problem.

We can see in the output of display above that the data structure is organised as follows:

  • Algorithm name (e.g. NSGAII)
    • Problem name (e.g. DTLZ1)
      • Performance metric (e.g. Hypervolume)
        • The score for each run (e.g. 30 individual scores).

As a quick test, let's try and get the hypervolume indicator score for the first execution of NSGA-II on DTLZ1.

In [8]:
hyp_result['NSGAII']['DTLZ1']['Hypervolume'][0]
Out[8]:
0.6994730803830401

To further demonstrate how this works, let's also get the hypervolume indicator score for the sixth execution of NSGA-II on DTLZ1.

In [9]:
hyp_result['NSGAII']['DTLZ1']['Hypervolume'][5]
Out[9]:
0.5133530126219944

Finally, let's get the hypervolume indicator scores for all executions of NSGA-II on DTLZ1.

In [10]:
hyp_result['NSGAII']['DTLZ1']['Hypervolume']
Out[10]:
[0.6994730803830401,
 0.023633507390145952,
 0.43992961322697366,
 0.7874948337892063,
 0.12188120046384762,
 0.5133530126219944,
 0.6862515722411968,
 0.044290599003643356,
 0.471622401547617,
 0.10284908978058507,
 0.4392461927994993,
 0.7950275064573571,
 0.10266551759637746,
 0.7691174571576805,
 0.09952335665103744,
 0.04478737965014432,
 0.0,
 0.0,
 0.017045525592154418,
 0.0,
 0.0,
 0.6171522594782475,
 0.0,
 0.0,
 0.0,
 0.19925123811511297,
 0.0,
 0.5223248110288524,
 0.0,
 0.0020909300897199268]

Perfect. Now we can use numpy to calculate the mean hypervolume indicator value for all of our executions of NSGA-II on DTLZ1.

In [11]:
np.mean(hyp_result['NSGAII']['DTLZ1']['Hypervolume'])
Out[11]:
0.24996703616881444

Let's do the same for PAES.

In [12]:
np.mean(hyp_result['PAES']['DTLZ1']['Hypervolume'])
Out[12]:
0.7271208306100885

We can see that the mean hypervolume indicator value for PAES on DTLZ1 is higher than that of NSGA-II on DTLZ1. A higher hypervolume indicator value indicates better performance, so we can tentatively say that PAES outperforms NSGA-II on our configuration of DTLZ1 according to the hypervolume indicator. Of course, we haven't determined if this result is statistically significant.

Let's create a DataFrame where each column refers to the mean hypervolume indicator values for the test problems DTLZ1, DTLZ2, DTLZ3, and DTLZ4, and each row represent the performance of an algorithm (in this case, PAES and NSGA-II).

In [13]:
df_hyp_results = pd.DataFrame(index = hyp_result.keys())

for key_algorithm, algorithm in hyp_result.items():
    for key_problem, problem in algorithm.items():
        for hypervolumes in problem['Hypervolume']:
            df_hyp_results.loc[key_algorithm,key_problem] = np.mean(hypervolumes)
            
df_hyp_results
Out[13]:
DTLZ1 DTLZ2 DTLZ3 DTLZ4
NSGAII 0.002091 0.208583 0.0 0.0
PAES 0.783057 0.195858 0.0 0.0

Now we have an overview of how our selected algorithms performed on the selected test problems according to the hypervolume indicator. Personally, I find it easier to compare algorithm performance when each column represents a different algorithm rather than a problem.

In [14]:
df_hyp_results.transpose()
Out[14]:
NSGAII PAES
DTLZ1 0.002091 0.783057
DTLZ2 0.208583 0.195858
DTLZ3 0.000000 0.000000
DTLZ4 0.000000 0.000000

Without consideration for statistical significance, which algorithm performs best on each test problem?

Conclusion

In this section we have demonstrated how we can compare two popular multi-objective evolutionary algorithms on a selection of four test problems using the hypervolume indicator to measure their performance. In this case, we simply compared the mean of a sample of executions per problem per algorithm without consideration for statistical significance, however, this it is important to take this into account to ensure that any differences haven't occurred by chance.

Exercise

Create your own experiment, but this time include different algorithms and problems and determine which algorithm performs the best on each problem.


  1. Deb, K., Pratap, A., Agarwal, S., & Meyarivan, T. A. M. T. (2002). A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE transactions on evolutionary computation, 6(2), 182-197. 

  2. Knowles, J., & Corne, D. (1999, July). The pareto archived evolution strategy: A new baseline algorithm for pareto multiobjective optimisation. In Congress on Evolutionary Computation (CEC99) (Vol. 1, pp. 98-105). 

  3. Deb, K., Thiele, L., Laumanns, M., & Zitzler, E. (2002, May). Scalable multi-objective optimization test problems. In Proceedings of the 2002 Congress on Evolutionary Computation. CEC'02 (Cat. No. 02TH8600) (Vol. 1, pp. 825-830). IEEE. 

Using a Framework to Generate Results

Preamble

In [1]:
import numpy as np                   # for multi-dimensional containers
import pandas as pd                  # for DataFrames
import plotly.graph_objects as go    # for data visualisation
import platypus as plat              # multi-objective optimisation framework

# Optional Customisations
import plotly.io as pio              # to set shahin plot layout
pio.templates['shahin'] = pio.to_templated(go.Figure().update_layout(
    legend=dict(orientation="h",y=1.1, x=.5, xanchor='center'),
    margin=dict(t=0,r=0,b=40,l=40))).layout.template
pio.templates.default = 'shahin'
pio.renderers.default = "notebook_connected" # remove when running locally 

Introduction

When preparing to implement multi-objective optimisation experiments, it's often more convenient to use a ready-made framework/library instead of programming everything from scratch. Many libraries and frameworks have been implemented in many different programming languages. With our focus on multi-objective optimisation, our choice is an easy one. We will choose Platypus which has a focus on multi-objective problems and optimisation.

Platypus is a framework for evolutionary computing in Python with a focus on multiobjective evolutionary algorithms (MOEAs). It differs from existing optimization libraries, including PyGMO, Inspyred, DEAP, and Scipy, by providing optimization algorithms and analysis tools for multiobjective optimization.

In this section, we will use the Platypus framework to optimise solutions to an test function using the Non-dominated Sorting Genetic Algorithm II (NSGA-II)1, a fast and elitist multi-objective genetic algorithm.

Approximating DTLZ2 Solutions using NSGA-II

In this section, we will using the Platypus implementation of NSGA-II to generate solutions for the DTLZ2 test problem2. The DTLZ2 test problem comes from the DTLZ test suite, which consists of box-constrained continuous and scalable multi-objective problems.

Using Platypus, we will instantiate a DTLZ2 test problem object using its constructor with no parameters, this means our DTLZ2 problem will using the default configurations.

In [2]:
problem = plat.DTLZ2()

Next, we want to create an instance of the NSGA-II optimiser, and we'll pass in our problem object as a parameter for its configurations.

In [3]:
algorithm = plat.NSGAII(problem)

Now we can execute the optimisation process. Let's give the optimiser a budget of 10,000 function evaluations as a termination criterion. This may take some time to complete depending on your processor speed and the number of function evaluations.

In [4]:
algorithm.run(10000)

Finally, we can display the results. In this case, we will be printing the objective values for each solution in the final population of the above execution.

In [5]:
for solution in algorithm.result:
    print(solution.objectives)
[1.001007119645248, 7.320663605350456e-06]
[1.2551891561535127e-05, 1.001949596999905]
[0.6365017373606322, 0.7720829478305243]
[0.6040272026651482, 0.7973731691070404]
[0.8943979790285478, 0.44830556677711475]
[0.20737830831722096, 0.9792444903351435]
[0.9641621183072847, 0.2669465813743207]
[0.22974128290102674, 0.9738012806943456]
[0.9074625783291138, 0.42262341807486337]
[0.5626841811487956, 0.8268269319198399]
[0.9955878351151196, 0.09715203869969777]
[0.6541050368669689, 0.7565458811646342]
[0.7931956056922939, 0.6091635809139232]
[0.4075616295976096, 0.9133684712543005]
[0.9551540243895625, 0.2984403865110613]
[0.12580571035464744, 0.9922195208182154]
[0.43084543109258033, 0.9027432757526948]
[0.17262115804228387, 0.9851461356741054]
[0.9742249527397525, 0.22759727216550557]
[0.9436449788695505, 0.3329806896422685]
[0.9693088376345036, 0.2528164331044648]
[0.7802225409928367, 0.6259347805113161]
[0.999710475296015, 0.028776185214175166]
[0.9519338071882103, 0.30774604781643256]
[0.9204735827033173, 0.3920089901267253]
[0.281922030771632, 0.9599324506434516]
[0.25390226439260966, 0.9678889336787766]
[0.1468963654343551, 0.9894488115966512]
[0.9299970521452303, 0.37138370081400407]
[0.456950480235959, 0.8898626456670854]
[0.8042411905133598, 0.5950853419545293]
[0.7595331736440106, 0.650986188441795]
[0.5458607171172147, 0.8389609942511561]
[0.3807573587253349, 0.9251715068025156]
[0.01172481513498524, 1.0000203107268981]
[0.29838119788195683, 0.9545572591415691]
[0.734845723530811, 0.6785300521827218]
[0.9389805778605371, 0.344506149455128]
[0.44296839700883706, 0.8969012445110347]
[0.31266511156459453, 0.9501650900649391]
[0.39380746321248794, 0.9193447968006679]
[0.9941080509449213, 0.11873624512846409]
[0.07969628111758259, 0.9970049861035941]
[0.3648484507124488, 0.93117439742437]
[0.038229760960032735, 0.999510331833269]
[0.6907185248790944, 0.7238588915762386]
[0.9992356692221003, 0.04539132102173066]
[0.8651311855491628, 0.5028723749491782]
[0.855249304419896, 0.5186312952724829]
[0.6661710620674723, 0.7465050495868518]
[0.5796318638279344, 0.8152231549363501]
[0.26501599133844567, 0.9651769949681495]
[0.933352046253645, 0.36202439090847405]
[0.8838267891914454, 0.46846894666308847]
[0.997997396213642, 0.06510743638010075]
[0.8432592774376269, 0.5379509455380959]
[0.8226766438656586, 0.5701827060151605]
[0.5923837311229727, 0.806407150477761]
[0.06275534947409371, 0.9981916424209306]
[0.7487291721792584, 0.663395660577969]
[0.9861294817147326, 0.1733832533117437]
[0.8339862770820583, 0.5527538143146593]
[0.9818788806407794, 0.2090648185097081]
[0.9835000607889485, 0.18336051535183787]
[0.6808948084321051, 0.7331020004362592]
[0.9918923700768506, 0.1318594898862529]
[0.8730953790018411, 0.48849213970593036]
[0.10024212439528354, 0.9951321620855413]
[0.32778266165395553, 0.9451151793788639]
[0.517590548750682, 0.8563042643490033]
[0.9890916647497262, 0.15014403746317279]
[0.16318177705358686, 0.986752875695305]
[0.8289280825984927, 0.5606793667515289]
[0.33550903300448554, 0.9421544434946403]
[0.6742581256624149, 0.739979002778486]
[0.47120056244996794, 0.8828515729177222]
[0.7051604993165188, 0.7114174168549432]
[0.6998400405974342, 0.7148120451338175]
[0.5323980724317485, 0.8480820599282101]
[0.7699751729676161, 0.6387047577442772]
[0.97809843898801, 0.21123101565486438]
[0.8769759115955059, 0.48243722445627285]
[0.7188372916882997, 0.6958954624833902]
[0.048209209909288564, 0.9990611475855025]
[0.48087778551481514, 0.8768944870576701]
[0.49363787446272445, 0.8707621538184233]
[0.34868464753028316, 0.9376914590939949]
[0.8420077404593919, 0.5404270992500956]
[0.8142401932398751, 0.5808454368272058]
[0.7509279490605555, 0.6611220687794795]
[0.9133623018503035, 0.4094732361924201]
[0.5068728297395826, 0.8625919556065812]
[0.7714399919913498, 0.6366329440434589]
[0.3533936063877142, 0.9356229737955203]
[0.09908077052330753, 0.9965607003345781]
[0.7264294568163192, 0.6878543662891023]
[0.5841729417550895, 0.8119896036171829]
[0.9898877169034702, 0.1469169220731843]
[0.9980488161652018, 0.06431439708523554]
[0.8836941506243303, 0.4729211532484433]

Visualising Solutions in Objective Space

In the last section, we concluded by printing the objective values for every solution. This information will be easier to digest using a plot, so let's quickly put the data into a Pandas DataFrame and then use Plotly to create a scatterplot. Let's start by moving our Platypus data structure to a DataFrame.

In [6]:
objective_values = np.empty((0, 2))

for solution in algorithm.result:
    y = solution.objectives
    objective_values = np.vstack([objective_values, y])
    
# convert to DataFrame
objective_values = pd.DataFrame(objective_values, columns=['f1','f2'])

All done, we can have a peek at our DataFrame to make sure we've not made any obvious mistakes.

In [7]:
objective_values
Out[7]:
f1 f2
0 1.001007 0.000007
1 0.000013 1.001950
2 0.636502 0.772083
3 0.604027 0.797373
4 0.894398 0.448306
... ... ...
95 0.726429 0.687854
96 0.584173 0.811990
97 0.989888 0.146917
98 0.998049 0.064314
99 0.883694 0.472921

100 rows × 2 columns

Looking good. All that we have left is the scatter plot.

In [8]:
fig = go.Figure()
fig.add_scatter(x=objective_values.f1, y=objective_values.f2, mode='markers')
fig.show()

Great! If you search for the true Pareto-optimal front for DTLZ2, you can see that our approximation is looking as expected.

Conclusion

In this section we have demonstrated how we can use a popular multi-objective optimisation algorithm, NSGA-II, to approximate multiple trade-off solutions to the DTLZ2 test problem. We did this using the Platypus framework, and also moved the results into a DataFrame in order to easily create a scatterplot.

Exercise

Repeat the same experiment in this section, but this time select a different algorithm that has been implemented in Platypus.


  1. Deb, K., Pratap, A., Agarwal, S., & Meyarivan, T. A. M. T. (2002). A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE transactions on evolutionary computation, 6(2), 182-197. 

  2. Deb, K., Thiele, L., Laumanns, M., & Zitzler, E. (2002, May). Scalable multi-objective optimization test problems. In Proceedings of the 2002 Congress on Evolutionary Computation. CEC'02 (Cat. No. 02TH8600) (Vol. 1, pp. 825-830). IEEE.. 

Docker Toolbox, Windows 7, and Shared Volumes

Introduction

Docker is a container platform with so much to offer, and I've already published a quick guide on how to set it up in an earlier article. However, if you don't meet the system requirements for the newer versions then you will have to rely on Docker Toolbox, which is described as a legacy desktop solution:

Docker Toolbox is for older Mac and Windows systems that do not meet the requirements of Docker for Mac and Docker for Windows. We recommend updating to the newer applications, if possible.

To operate, Docker Toolbox installs a VirtualBox virtual machine named default, running the boot2docker Linux distribution. Because of this additional layer of virtualisation, additional steps are needed when sharing volumes between the host operating system and a Docker container. For example, a computer lab that I use for teaching runs Windows 7 with restricted student access. I have convinced our IT team to install Docker Toolbox, but my students still need some way of sharing a folder on their student drive (herein referred to as the H:\ drive) with the Docker container.

Preparing VirtualBox to Share Folders

To achieve this, the folder (e.g. H:\work) needs to be shared with the VirtualBox virtual machine named default, before it can be mounted as a volume within a Docker container. There may be other approaches, but here are the steps for our particular set-up:

  1. Open "Docker Quickstart Terminal".
  2. Once Docker is running, type docker-machine stop default.
  3. Open the Command Line from the start menu (search for cmd.exe).
  4. Navigate to the VirtualBox folder using cd C:/Program Files/Oracle/VirtualBox
  5. Type VBoxManage.exe sharedfolder add default --name "h/work" --hostpath "\\?\h:\work" --automount
  6. Type VBoxManage.exe setextradata default VBoxInternal2/SharedFoldersEnableSymlinksCreate/v-root 1
  7. Type VBoxManage.exe setextradata default VBoxInternal2/SharedFoldersEnableSymlinksCreate/h/work 1

The above steps will prepare VirtualBox for sharing H:\work. Now we need to mount the directory inside our docker-machine:

  1. Open "Docker Quickstart Terminal".
  2. Once Docker is running, type docker-machine ssh default
  3. Type sudo mkdir --parents /h/work
  4. Type sudo mount -t vboxsf h/work /h/work/
  5. Type exit

You can now create Docker container instances and share the H:\work folder with them as a mounted volume. Let's demonstrate this with jupyter/scipy-notebook

Sharing Folders with a Docker Container

To create a Docker container from the jupyter/scipy-notebook image, type the following command and wait for it to complete execution: docker run --name="scipy" --user root -v /h/work:/home/jovyan -d -e GRANT_SUDO=yes -p 8888:8888 jupyter/scipy-notebook start-notebook.sh --NotebookApp.token=''

This may take some time, as it will need to download and extract the image. Once it's finished, you should be able to access the Jupyter notebook using 127.0.0.1:8888. I hope this helps you get up and running with Docker Toolbox and shared folders. Of course, the process is typically easier when using the non-legacy Docker solutions.

Comments from the Old Blog

Docker Comments

Quick Guide to Docker

Introduction

Docker is a fantastic platform which has changed my workflow for the better. Before Docker, I was using VMWare images and linked-clones to create isolated environments for my experiments. Sharing these was not easy - the file sizes were too large and not everyone has a VMWare license. Performance wasn't great either, and VMWare updates would sometimes cause problems with the image. Before VMWare, I was using MATLAB and directly sharing the scripts for my experiments. This was an unpleasant experience, as each of the many versions of MATLAB (e.g. R2011a, R2011b, R2012a, R2012b, etc.) would break some functionality from one of the toolboxes. Again, not everyone could afford a MATLAB license, and I only had mine as part of my institution.

With Docker, I've moved to using Python's scientific stack and libraries such as TensorFlow and Keras. It took a while to get used to living without the MATLAB IDE, but there was a similar IDE called Spyder, which is a Scientific Python Development EnviRonment. However, I've gotten over the MATLAB IDE since discovering Jupyter. With all these technologies, I can share an environment identical to the one I executed my experiments in, and allow a colleague or reviewer to do the same. My students can also get up and running with the environments needed by my taught units without having to configure everything themselves. The only difficulty I've encountered so far is when someone has a Windows installation which doesn't support Docker for Windows, as Docker Toolbox isn't always a smooth experience.

I've created this video as a brief introduction to Docker, complete with instructions on how to install Docker on macOS, Windows 10, and Ubuntu 16.04.

The aim is to get someone started and tinkering with Docker quickly.

Common Problems

Kernel appears to have died

If your kernel dies when training a network using TensorFlow, then you might need to increase the amount of RAM allocated to Docker.

Kernel Restarting

You can do this in the Docker Preferences:

Minimum requirements not met

I have Windows 10 Pro/Enterprise/Education, but the installer says I don't meet the minimum requirements

You need to be on Windows 10 build 10586 or newer. You can check your Windows build number by clicking the start-bar and winver, and pressing the enter key:

Windows Build

Contributing Hypervolume Indicator

Preamble

In [1]:
import numpy as np                   # for multi-dimensional containers
import pandas as pd                  # for DataFrames
import platypus as plat              # multi-objective optimisation framework
import pygmo as pg                   # multi-objective optimisation framework

Introduction

In this section, we're going to take a look at how to calculate the contributing hypervolume indicator values for a population of solutions.

The Contributing Hypervolume (CHV) indicator is a population sorting mechanism based on an adaptation of the hypervolume indicator. The hypervolume indicator works by calculating the size of the objective space that has been dominated by an entire approximation set with respect to a specified reference point, whereas the CHV indicator assigns each solution in an approximation set with the size of the space that has been dominated by each solution exclusively. With this information, the population can be sorted by the most dominant and diverse solutions. This has been illustrated below in two-dimensional space with a population of three solutions.

Calculating the exact CHV indicator is attractively simple. The method begins by first calculating the hypervolume indicator quality of a population $\mathrm{X}$, and then for each solution in the population, the solution is removed and the hypervolume indicator quality is again calculated for the new population. The new hypervolume indicator value is then subtracted from the hypervolume indicator value of the whole population, which results in the CHV indicator value of the solution which was removed. It is then possible to calculate the CHV indicator values of all the solutions in the population, order them by descending value so that they are ordered by the greatest explicit hypervolume indicator contribution, and select the better solutions to form the next parent population. This approach has been listed in Algorithm below.

$$ \begin{aligned} &\textbf{CHVIndicator}(f^{ref}, X)\\ &\;\;\;\;X_{HV} \leftarrow HV(f^{ref},X)\\ &\;\;\;\;for\;\;{n = 1 : \lambda}\;\;do\\ &\;\;\;\;X_t \leftarrow X \backslash {X_n}\\ &HV_n \leftarrow HV(f^{ref},X_t)\\ &CHV_n \leftarrow X_{HV} - HV_n\\ &\textbf{return}\;\;CHV \end{aligned} $$

Although many optimisation algorithms use the CHV as a sorting criterion for selection, its calculation becomes computationally infeasible as the number of objectives being considered increase. Monte Carlo approximations have been used to speed up the calculation of the CHV in which through empirical experiments has shown that the method does not impair the quality of the approximation set. However, the speed increase provided by the Monte Carlo approximation method still results in an infeasibility of the CHV indicator on problems consisting of five objectives or more.

This particular measure of diversity preservation can also be used to reduce the size of a final approximation set produced by an optimiser, to a size that will not overwhelm and confuse a decision-maker.

CHV Dependence

It is important to note the dependence of the CHV indicator on the population on which it is being calculated. To demonstrate this a population initially consisting of three solutions in two-objective space has been created, these solutions are $A:(1,10)$, $B:(5,3)$, $C:(6,2)$. The figure below illustrates the explicit HV which is contributed by each solution in the population, and it can be observed that in this population solution B contributes the most to the overall HV indicator value. This can be seen clearly by the number of squares shaded by the corresponding colour, and in this case, solution B covers seven squares.

CHV with three solutions

In order to properly demonstrate this dependence, a solution has been inserted into the population used in the previous example. The solutions for the following example are now $A:(1,10)$, $B:(5,3)$, $C:(6,2)$, $D:(4,4)$. The figure below illustrates the explicit HV which is contributed by each solution in this updated population, and it can be observed that in this population solution B now contributes the least to the overall HV indicator value. This can again be seen clearly by the number of squares shaded by the corresponding colour, and in this case, solution B now covers a single square. This is because the new solution, D, offers dominance over new areas of the objective space as well as objective space which is mutually dominated by solution B. As such, the objective which is mutually dominated by both of these solutions is no longer considered an explicit contribution by any one of these solutions.

CHV with four solutions

Calculating the Contributing Hypervolume Indicator with Platypus

Let's define some necessary variables before invoking the Platypus implementation of the hypervolume indicator algorithm to calculate the CHV of each solution. We will use the ZDT1 test function with the number of design variables $\mathrm{D}=30$ throughout this example, and with the population size $\mathrm{N}=100$.

In [2]:
problem = plat.ZDT1()
D = 30
N = 100

With these variables defined, we will now move onto generating our initial population. We will be using Platypus Solution objects for this, which we will initialise with random problem variables, evaluate, and then append to a list named solutions.

In [3]:
solutions = []

for i in range(N):
    solution = plat.Solution(problem)
    solution.variables = np.random.rand(D)
    solution.evaluate()
    solutions.append(solution)

Let's print out the variables and objectives for the first solution in this list to see what they look like.

In [4]:
print(f"Design variables:\n {solutions[0].variables}")
print(f"Objective values:\n {solutions[0].objectives}")
Design variables:
 [0.388372   0.16503775 0.32763606 0.80250695 0.27260434 0.09171043
 0.28853992 0.2869231  0.55675102 0.01486664 0.40790584 0.31637772
 0.6012928  0.90728331 0.75130006 0.87632648 0.03419778 0.39863985
 0.55448981 0.31577517 0.72176196 0.54713074 0.7239315  0.95006824
 0.38856727 0.60305019 0.27776695 0.41258289 0.65094839 0.02544009]
Objective values:
 [0.38837200082254586, 3.7087621950846414]

Now that we have a population of solutions stored in the solutions variable, we can prepare an instance of the Platypus.indicators.Hypervolume() object with the desired reference point for the hypervolume indicator calculation. For ZDT1, the reference point is typically $\langle11,11\rangle$.

In [5]:
hyp = plat.indicators.Hypervolume(minimum=[0, 0], maximum=[11, 11])

We can now use this hyp object to calculate the hypervolume indicator for any population.

Note

The Platypus implementation of the hypervolume indicator requires either a minimum and a maximum point, or a reference_set (not the same as the hypervolume reference point). Normally, a hypervolume indicator algorithm would only require a single vector that defines the reference point $f ^{ref}$. In the case of Platypus, $f ^{ref}$ actually corresponds to maximum, but Platypus also forces us to provide a vector for minimum, which we have set to $\langle0, 0\rangle$

Let's calculate the hypervolume indicator value for the population of solutions we created above and named solution. We'll store this in HV.

In [6]:
HV = hyp.calculate(solutions)
print(f"Hypervolume indicator value: {HV}")
Hypervolume indicator value: 0.7988496734751387

We now have this single hypervolume indicator value that we will use to calculate the explicit hypervolume contribution of each solution in the population.

In [7]:
CHV = np.empty((0, 1))

for i in range(N):
    solutions_subset = solutions[:i] + solutions[i+1:]
    CHV = np.vstack([CHV, HV - hyp.calculate(solutions_subset)])
    print(f"CHV of solution {i}:\t{CHV[i][0]}")
CHV of solution 0:	0.0
CHV of solution 1:	0.0
CHV of solution 2:	0.0
CHV of solution 3:	0.0
CHV of solution 4:	0.0
CHV of solution 5:	0.0
CHV of solution 6:	0.0
CHV of solution 7:	0.0
CHV of solution 8:	2.1016782758342956e-05
CHV of solution 9:	-1.1102230246251565e-16
CHV of solution 10:	1.1102230246251565e-16
CHV of solution 11:	0.0
CHV of solution 12:	-1.1102230246251565e-16
CHV of solution 13:	0.0
CHV of solution 14:	0.0
CHV of solution 15:	0.0
CHV of solution 16:	-1.1102230246251565e-16
CHV of solution 17:	0.0
CHV of solution 18:	0.0
CHV of solution 19:	0.0
CHV of solution 20:	-1.1102230246251565e-16
CHV of solution 21:	0.0
CHV of solution 22:	0.0
CHV of solution 23:	1.1102230246251565e-16
CHV of solution 24:	0.0005987393592888912
CHV of solution 25:	-1.1102230246251565e-16
CHV of solution 26:	0.0
CHV of solution 27:	0.0
CHV of solution 28:	0.0
CHV of solution 29:	0.0
CHV of solution 30:	0.0
CHV of solution 31:	0.0
CHV of solution 32:	0.0
CHV of solution 33:	0.0
CHV of solution 34:	-1.1102230246251565e-16
CHV of solution 35:	0.0
CHV of solution 36:	0.0
CHV of solution 37:	0.0
CHV of solution 38:	0.0
CHV of solution 39:	4.5810210042018795e-06
CHV of solution 40:	-1.1102230246251565e-16
CHV of solution 41:	0.0
CHV of solution 42:	0.0
CHV of solution 43:	-1.1102230246251565e-16
CHV of solution 44:	0.0
CHV of solution 45:	0.0
CHV of solution 46:	0.0001693656124199805
CHV of solution 47:	0.0
CHV of solution 48:	-1.1102230246251565e-16
CHV of solution 49:	0.0
CHV of solution 50:	0.0
CHV of solution 51:	0.0
CHV of solution 52:	0.0
CHV of solution 53:	0.0
CHV of solution 54:	0.0
CHV of solution 55:	6.056482614091863e-06
CHV of solution 56:	-1.1102230246251565e-16
CHV of solution 57:	1.1102230246251565e-16
CHV of solution 58:	0.0
CHV of solution 59:	0.0
CHV of solution 60:	0.0
CHV of solution 61:	0.0
CHV of solution 62:	0.0
CHV of solution 63:	-1.1102230246251565e-16
CHV of solution 64:	-1.1102230246251565e-16
CHV of solution 65:	0.0
CHV of solution 66:	-1.1102230246251565e-16
CHV of solution 67:	0.0
CHV of solution 68:	3.6880928190763385e-05
CHV of solution 69:	0.05976250426304486
CHV of solution 70:	0.0
CHV of solution 71:	0.0
CHV of solution 72:	0.0
CHV of solution 73:	0.0
CHV of solution 74:	0.0
CHV of solution 75:	0.0
CHV of solution 76:	0.0
CHV of solution 77:	0.0
CHV of solution 78:	0.0
CHV of solution 79:	5.772156118832861e-05
CHV of solution 80:	0.0
CHV of solution 81:	-1.1102230246251565e-16
CHV of solution 82:	9.727938862358343e-05
CHV of solution 83:	0.0
CHV of solution 84:	0.0
CHV of solution 85:	0.0
CHV of solution 86:	0.0
CHV of solution 87:	0.0
CHV of solution 88:	-1.1102230246251565e-16
CHV of solution 89:	-1.1102230246251565e-16
CHV of solution 90:	-1.1102230246251565e-16
CHV of solution 91:	0.0
CHV of solution 92:	0.0006112409913804351
CHV of solution 93:	4.781326621960957e-06
CHV of solution 94:	0.0
CHV of solution 95:	-1.1102230246251565e-16
CHV of solution 96:	0.0
CHV of solution 97:	0.0
CHV of solution 98:	3.6825691780317804e-05
CHV of solution 99:	-1.1102230246251565e-16

With the above list, we can consider anything with a CHV of $0$ or around the precision of $16$ positions behind the decimal as a solution that does not contribute to the hypervolume quality of the population.

Calculating the Contributing Hypervolume Indicator with PyGMO

We can also use a different framework's implementation of the hypervolume indicator on our population. We should be expecting the same values, but this is a good exercise to learn how to use a different framework, and perhaps to check that they do indeed arrive at the same value.

This time we will use the PyGMO framework. With PyGMO, we are lucky enough to have a built-in function to calculate the CHV of each solution. PyGMO's hypervolume indicator function can work with a few different data-types, including numpy.array(). We have previously moved our Platypus solutions to a pandas.DataFrame (which can easily be output as a numpy.array()). Let's begin by creating a new DataFrame with the columns f1 and f2 which will be used to store our objective values for each solution.

In [8]:
solutions_df = pd.DataFrame(index=range(N),columns=['f1','f2']).astype(float)
solutions_df.head()
Out[8]:
f1 f2
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN

We can see that we've also defined an index range that covers the number of solutions in our population, $\mathrm{N}=100$. This means we have $100$ rows ready, but their values are initialised to NaN (Not A Number), which in this case simply indicates missing data.

Let's use a loop to iterate through our solutions list of $100$ solutions and assign the desired values to the corresponding row in our solutions_df DataFrame

In [9]:
for i in range(N):
    solutions_df.loc[i].f1 = solutions[i].objectives[0]
    solutions_df.loc[i].f2 = solutions[i].objectives[1]
        
solutions_df.head()
Out[9]:
f1 f2
0 0.388372 3.708762
1 0.682821 3.925579
2 0.451501 3.956080
3 0.482829 4.662264
4 0.481288 4.249790

We can see our DataFrame now contains the desired values. We can now easily get this data as a numpy.array() to feed into PyGMO's hypervolume indicator object constructor.

In [10]:
hyp = pg.hypervolume(solutions_df[['f1','f2']].values)

Now we can invoke the exclusive() method on our hypervolume object and pass in the reference point to calculate the CHV values.

In [11]:
for i in range(N):
    print(f"CHV of solution {i}:\t{hyp.exclusive(i, [11, 11]) / np.prod([11, 11])}")
CHV of solution 0:	0.0
CHV of solution 1:	0.0
CHV of solution 2:	0.0
CHV of solution 3:	0.0
CHV of solution 4:	0.0
CHV of solution 5:	0.0
CHV of solution 6:	0.0
CHV of solution 7:	0.0
CHV of solution 8:	2.1016782758638404e-05
CHV of solution 9:	0.0
CHV of solution 10:	1.1744508029092565e-16
CHV of solution 11:	0.0
CHV of solution 12:	0.0
CHV of solution 13:	1.1744508029092565e-16
CHV of solution 14:	0.0
CHV of solution 15:	1.1744508029092565e-16
CHV of solution 16:	0.0
CHV of solution 17:	0.0
CHV of solution 18:	1.1744508029092565e-16
CHV of solution 19:	0.0
CHV of solution 20:	0.0
CHV of solution 21:	0.0
CHV of solution 22:	0.0
CHV of solution 23:	0.0
CHV of solution 24:	0.0005987393592886903
CHV of solution 25:	1.1744508029092565e-16
CHV of solution 26:	0.0
CHV of solution 27:	0.0
CHV of solution 28:	0.0
CHV of solution 29:	0.0
CHV of solution 30:	0.0
CHV of solution 31:	0.0
CHV of solution 32:	0.0
CHV of solution 33:	0.0
CHV of solution 34:	0.0
CHV of solution 35:	1.1744508029092565e-16
CHV of solution 36:	0.0
CHV of solution 37:	0.0
CHV of solution 38:	0.0
CHV of solution 39:	4.581021004415666e-06
CHV of solution 40:	1.1744508029092565e-16
CHV of solution 41:	0.0
CHV of solution 42:	0.0
CHV of solution 43:	0.0
CHV of solution 44:	0.0
CHV of solution 45:	0.0
CHV of solution 46:	0.00016936561242023008
CHV of solution 47:	0.0
CHV of solution 48:	0.0
CHV of solution 49:	0.0
CHV of solution 50:	1.1744508029092565e-16
CHV of solution 51:	0.0
CHV of solution 52:	0.0
CHV of solution 53:	0.0
CHV of solution 54:	0.0
CHV of solution 55:	6.0564826140698415e-06
CHV of solution 56:	0.0
CHV of solution 57:	0.0
CHV of solution 58:	0.0
CHV of solution 59:	0.0
CHV of solution 60:	0.0
CHV of solution 61:	1.1744508029092565e-16
CHV of solution 62:	0.0
CHV of solution 63:	0.0
CHV of solution 64:	0.0
CHV of solution 65:	0.0
CHV of solution 66:	0.0
CHV of solution 67:	0.0
CHV of solution 68:	3.688092819079091e-05
CHV of solution 69:	0.05976250426304513
CHV of solution 70:	0.0
CHV of solution 71:	0.0
CHV of solution 72:	1.1744508029092565e-16
CHV of solution 73:	0.0
CHV of solution 74:	0.0
CHV of solution 75:	0.0
CHV of solution 76:	0.0
CHV of solution 77:	0.0
CHV of solution 78:	0.0
CHV of solution 79:	5.7721561188396505e-05
CHV of solution 80:	0.0
CHV of solution 81:	0.0
CHV of solution 82:	9.727938862354673e-05
CHV of solution 83:	0.0
CHV of solution 84:	0.0
CHV of solution 85:	0.0
CHV of solution 86:	0.0
CHV of solution 87:	0.0
CHV of solution 88:	0.0
CHV of solution 89:	0.0
CHV of solution 90:	0.0
CHV of solution 91:	1.1744508029092565e-16
CHV of solution 92:	0.0006112409913808416
CHV of solution 93:	4.781326621988483e-06
CHV of solution 94:	0.0
CHV of solution 95:	1.1744508029092565e-16
CHV of solution 96:	1.1744508029092565e-16
CHV of solution 97:	0.0
CHV of solution 98:	3.6825691780362765e-05
CHV of solution 99:	0.0

With the knowledge that Platypus normalises the HV values, we have already normalised PyGMO's results by dividing by the product of the reference point. Now we can see that both frameworks have arrived at the same hypervolume indicator value, although PyGMO has handled the precision issue differently.

Conclusion

In this section, we have introduced the contributing hypervolume indicator as a criterion that can be used in the selection of solutions from a population. With CHV values assigned to solutions in a population, we can, for example, sort them by descending order and select the top 10 solutions to get the 10 solutions that contribute most to the population. We also demonstrated the application of two implementations of the contributing hypervolume indicator, one in Platypus, and one in PyGMO.

Exercise

Try updating the code for the Platypus CHV calculation to address the precision issue.

Exercise

Try using the CHV values for each solution to create a visualisation which highlights the top $T$ (e.g. $T=5$) solutions in a non-dominated front.


  1. E. Zitzler, S. K ̈unzli, Indicator-based selection in multiobjective search, in: Parallel Problem Solving from Nature-PPSNVIII, Springer, 2004, pp. 832–842 

Hypervolume Indicator

Preamble

In [1]:
import numpy as np                   # for multi-dimensional containers
import pandas as pd                  # for DataFrames
import platypus as plat              # multi-objective optimisation framework
import pygmo as pg                   # multi-objective optimisation framework

Introduction

In this section we're going to take a look at how to calculate the hypervolume indicator value for a population of solutions. The hypervolume indicator (or $s$-metric) is a performance metric for indicating the quality of a non-dominated approximation set, introduced by [1] where it is described as the "size of the space covered or size of dominated space". It can be defined as:

$$ HV\big(f ^{ref}, \mathrm{X}\big) = \Lambda \left( \bigcup_{\mathrm{X}_n \in \mathrm{X} } \Big[ f_{1}(\mathrm{X}_n), f_{1}^{ref} \Big] \times \dots \times \Big[ f_{m}(\mathrm{X}_n), f_{m}^{ref} \Big] \right) $$

where $HV\left(f ^{ref}, \mathrm{X}\right)$ resolves the size of the space covered by an approximation set $\mathrm{X}$, $f^{ref} \in \mathbb{R}$ refers to a chosen reference point, and $\Lambda \left(.\right)$ refers to the Lebesgue measure. This can be illustrated in two-dimensional objective space (to allow for easy visualisation) with a population of three solutions.

The hypervolume indicator is appealing because it is compatible with any number of problem objectives and requires no prior knowledge of the true Pareto-optimal front, this is important when working with real-world problems which have not yet been solved. The hypervolume indicator is currently used in the field of multi-objective optimisation as both a proximity and diversity performance metric, as well as in the decision-making process.

Unlike dominance-based criteria which require only two solutions for performing a comparison (which can be used on an ordinal scale), a reference vector is required to calculate the HV indicator value (i.e. it requires the objective to be measured on an interval scale). When used for pairwise or multiple comparison of optimisation algorithms, this reference vector must be the same, otherwise, the resulting HV indicator values are not comparable. This reference vector can be approximated as large values for each problem objective in order for all objective values in any approximation set to be within the reference vector.

Calculating the Hypervolume Indicator with Platypus

Let's define some necessary variables before invoking the Platypus implementation of the hypervolume indicator algorithm. We will use the ZDT1 test function with the number of design variables $\mathrm{D}=30$ throughout this example, and with the population size $\mathrm{N}=100$.

In [2]:
problem = plat.ZDT1()
D = 30
N = 100

With these variables defined, we will now move onto generating our initial population. We will be using Platypus Solution objects for this, which we will initialise with random problem variables, evaluate, and then append to a list named solutions.

In [3]:
solutions = []

for i in range(N):
    solution = plat.Solution(problem)
    solution.variables = np.random.rand(D)
    solution.evaluate()
    solutions.append(solution)

Let's print out the variables and objectives for the first solution in this list to see what they look like.

In [4]:
print(f"Design variables:\n {solutions[0].variables}")
print(f"Objective values:\n {solutions[0].objectives}")
Design variables:
 [0.03319209 0.48451621 0.55483626 0.08112446 0.18948245 0.90540391
 0.41676484 0.65298363 0.59713341 0.89705621 0.17407802 0.00511218
 0.05250722 0.52101625 0.04062096 0.3296622  0.77899607 0.14547564
 0.18833536 0.71929345 0.65747571 0.77174683 0.01498789 0.77254087
 0.27025744 0.97433607 0.23659607 0.13750044 0.61372126 0.4816075 ]
Objective values:
 [0.033192094201290434, 4.526025533331595]

Now that we have a population of solutions stored in the solutions variable, we can prepare an instance of the Platypus.indicators.Hypervolume() object with the desired reference point for the hypervolume indicator calculation. For ZDT1, the reference point is typically $\langle11,11\rangle$.

In [5]:
hyp = plat.indicators.Hypervolume(minimum=[0, 0], maximum=[11, 11])

We can now use this hyp object to calculate the hypervolume indicator for any population.

Note

The Platypus implementation of the hypervolume indicator requires either a minimum and a maximum point, or a reference_set (not the same as the hypervolume reference point). Normally, a hypervolume indicator algorithm would only require a single vector that defines the reference point $f ^{ref}$. In the case of Platypus, $f ^{ref}$ actually corresponds to maximum, but Platypus also forces us to provide a vector for minimum, which we have set to $\langle0, 0\rangle$

Let's calculate the hypervolume indicator value for the population of solutions we created above and named solution.

In [6]:
print(f"Hypervolume indicator value: {hyp.calculate(solutions)}")
Hypervolume indicator value: 0.7779645859838822

We now have this single hypervolume indicator value that we can use to gauge and compare the performance of our population. The higher the value is, the "better" the hypervolume quality.

Calculating the Hypervolume Indicator with PyGMO

We can also use a different framework's implementation of the hypervolume indicator on our population. We should be expecting the same value, but this is a good exercise to learn how to use a different framework, and perhaps to check that they do indeed arrive at the same value.

This time we will use the PyGMO framework. PyGMO's hypervolume indicator function can work with a few different data-types, including numpy.array(). We have previously moved our Platypus solutions to a pandas.DataFrame (which can easily be output as a numpy.array()). Let's begin by creating a new DataFrame with the columns f1 and f2 which will be used to store our objective values for each solution.

In [7]:
solutions_df = pd.DataFrame(index=range(N),columns=['f1','f2']).astype(float)
solutions_df.head()
Out[7]:
f1 f2
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN

We can see that we've also defined an index range that covers the number of solutions in our population, $\mathrm{N}=100$. This means we have $100$ rows ready, but their values are initialised to NaN (Not A Number), which in this case simply indicates missing data.

Let's use a loop to iterate through our solutions list of $100$ solutions and assign the desired values to the corresponding row in our solutions_df DataFrame

In [8]:
for i in range(N):
    solutions_df.loc[i].f1 = solutions[i].objectives[0]
    solutions_df.loc[i].f2 = solutions[i].objectives[1]
        
solutions_df.head()
Out[8]:
f1 f2
0 0.033192 4.526026
1 0.690335 3.981366
2 0.281927 3.885510
3 0.547213 3.484696
4 0.293064 4.689847

We can see our DataFrame now contains the desired values. We can now easily get this data as a numpy.array() to feed into PyGMO's hypervolume indicator object constructor.

In [9]:
hyp = pg.hypervolume(solutions_df[['f1','f2']].values)

Now we can invoke the compute() method on our hypervolume object and pass in the reference point to calculate the value.

In [10]:
hyp.compute([11, 11])
Out[10]:
94.13371490404984

Upon first inspection, this value seems to be different to the one calculated by Platypus so we may decide that we've either done something wrong, or there is a mistake in at least one of the implementations. However, in this case, it is simply the case that one of them (Platypus) has normalised the output. We can do the same with the output from PyGMO by dividing the hypervolume indicator value by the product of the reference point.

In [11]:
hyp.compute([11, 11]) / np.prod([11, 11])
Out[11]:
0.777964585983883

Now we can see that both frameworks have arrived at the same hypervolume indicator.

Conclusion

In this section, we have introduced the hypervolume indicator as a criterion that can be used in the selection of a population. We also demonstrated the application of two implementations of the hypervolume indicator, one in Platypus, and one in PyGMO.

Exercise

Experiment with the hypervolume indicator: try using different reference points, different population sizes, and different problems.


  1. E. Zitzler, S. K ̈unzli, Indicator-based selection in multiobjective search, in: Parallel Problem Solving from Nature-PPSNVIII, Springer, 2004, pp. 832–842 

Non-Dominated Sorting

Preamble

In [1]:
import numpy as np                   # for multi-dimensional containers
import pandas as pd                  # for DataFrames
import plotly.graph_objects as go    # for data visualisation
import platypus as plat              # multi-objective optimisation framework
import plotly.express as px

# Optional Customisations
import plotly.io as pio              # to set shahin plot layout
pio.templates['shahin'] = pio.to_templated(go.Figure().update_layout(
    legend=dict(orientation="h",y=1.1, x=.5, xanchor='center'),
    margin=dict(t=0,r=0,b=40,l=40))).layout.template
pio.templates.default = 'shahin'
pio.renderers.default = "notebook_connected" # remove when running locally 

Introduction

In this section, we're going to take a look at how to use the Platypus framework to apply the non-dominated sorting algorithm to a population of randomly generated solutions. Non-dominated sorting is important during the selection stage of an Evolutionary Algorithm because it allows us to prioritise the selection of solutions based on their dominance relations with respect to the rest of the population. For example, we may wish to only select 25 solutions from a population of 50, with the intention to use these 25 solutions in the variation stage that follows.

Non-Dominated Sorting with Platypus

Let's define some necessary variables before invoking the Platypus implementation of the non-dominated sorting algorithm. We will use the ZDT1 test function with the number design variables $\mathrm{D}=30$ throughout this example, and with the population size $\mathrm{N}=50$.

In [2]:
problem = plat.ZDT1()
D = 30
N = 50

With these variables defined, we will now move onto generating our initial population. We will be using Platypus Solution objects for this, which we will initialise with random problem variables, evaluate, and then append to a list named solutions.

In [3]:
solutions = []

for i in range(N):
    solution = plat.Solution(problem)
    solution.variables = np.random.rand(D)
    solution.evaluate()
    solutions.append(solution)

Let's print out the variables and objectives for the first solution in this list to see what they look like.

In [4]:
print(f"Design variables:\n {solutions[0].variables}")
print(f"Objective values:\n {solutions[0].objectives}")
Design variables:
 [0.28667423 0.55662194 0.90739352 0.04724923 0.15597164 0.43121412
 0.48961484 0.11949491 0.3495653  0.19701299 0.41410021 0.10228634
 0.52024402 0.98179538 0.03733266 0.48695646 0.87985442 0.42723804
 0.32991964 0.55900199 0.59828705 0.41586351 0.59193016 0.17133831
 0.74197031 0.26004467 0.25864923 0.65233514 0.95503633 0.0399368 ]
Objective values:
 [0.28667423276614434, 3.7452491696316215]

Now that we have a population of solutions stored in the solutions variable, we can pass this into the Platypus.nondominated_sort() function that will assign each solution a rank parameter, starting at 0 for the first non-dominated front and incrementing for every additional front that is detected.

In [5]:
plat.nondominated_sort(solutions)

Let's check to see if the first solution in our solutions list has one of these newly calculated rank parameters.

In [6]:
solutions[0].rank
Out[6]:
1

We now have a population of solutions that have been assigned their design variables, their objective values, and their ranks according to the non-dominated sorting algorithm.

Visualising the Non-Dominated Fronts

It would be useful to visualise these non-dominated fronts, according to their newly assigned ranks, using a scatterplot for further investigation. Before we do this, we will migrate our solution variables from their Platypus data-structure to a DataFrame. We will do this for the simple reason that our visualisation tools, e.g. Plotly, work quite well with the DataFrame format.

Let's begin by creating a new DataFrame with the columns f1, f2, and front_rank. f1 and f2 will be used to store our objective values for each solution, which is sufficient for the two-objective ZDT1 problem, and rank_front will be used to store the non-dominated sorting ranks that were calculated earlier.

In [7]:
solutions_df = pd.DataFrame(index=range(N),columns=['f1','f2','front_rank'])
solutions_df.head()
Out[7]:
f1 f2 front_rank
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN

We can see that we've also defined an index range that covers the number of solutions in our population, $\mathrm{N}=50$. This means we have $50$ rows ready, but their values are initialised to NaN (Not A Number), which in this case simply indicates missing data.

Let's use a loop to iterate through our solutions list of $50$ solutions and assign the desired values to the corresponding row in our solutions_df DataFrame

In [8]:
for i in range(N):
    solutions_df.loc[i].f1 = solutions[i].objectives[0]
    solutions_df.loc[i].f2 = solutions[i].objectives[1]
    solutions_df.loc[i].front_rank = solutions[i].rank
        
solutions_df.head()
Out[8]:
f1 f2 front_rank
0 0.286674 3.74525 1
1 0.852155 3.36735 2
2 0.474235 4.17776 4
3 0.808175 3.41983 3
4 0.0473586 5.73006 1

We can see our DataFrame now contains the desired values. We're now ready to go with some visualisation, so let's start with a simple scatterplot for the entire population in the objective space.

In [9]:
fig = go.Figure(layout=dict(xaxis=dict(title='f1'), 
                            yaxis=dict(title='f2')))

fig.add_scatter(x=solutions_df.f1, y=solutions_df.f2, mode='markers')

fig.show()

This visualisation gives us an idea of how our solutions look in the objective space. However, we can do better and visualise each front using a different colour. This way, we can visually distinguish between the different non-dominated fronts in our population.

Before producing the visualisation, we may fish to determine the number of fronts found.

In [10]:
solutions_df.front_rank.nunique()
Out[10]:
7

We will then need to produce a sorted vector containing the rank of each front.

In [11]:
fronts = sorted(solutions_df.front_rank.unique())
print(fronts)
[0, 1, 2, 3, 4, 5, 6]

With this vector, we can now visualise each front on the same scatterplot, using different colours to distinguish between their ranking. If you click on items in the legend, you can also disable/enable each front to examine them independently.

In [12]:
fig = go.Figure(layout=dict(
    xaxis=dict(title='f1'),
    yaxis=dict(title='f2')))

for front in fronts:
    front_solutions_df = solutions_df.loc[solutions_df.front_rank == front]
    fig.add_scatter(x=front_solutions_df.f1, y=front_solutions_df.f2, 
                    name=f'front {front}',  
                    mode='markers',
                    marker=dict(color = px.colors.qualitative.Plotly[front]))

fig.show()

The above visualising is great and the code we used to generate it gave us some finer control over the order in which they are plotted, the colours, the axis labels, and so on. However, I wanted to also share with you an approach to get a very similar plot using the plotly.express. As you can see below, we have achieved a very similar plot in only a single line of code.

In [13]:
px.scatter(solutions_df, x="f1", y="f2", color="front_rank")

Conclusion

In this section, we have demonstrated how we can use a software package such as Platypus to calculate the non-dominated sorting ranks for a population, which is a useful technique in the selection stage of an Evolutionary Algorithm. We then used this information to create a scatterplot that distinguishes between each non-dominated front using different colours for every rank. This is useful when visualising a population either during runtime or after a search has concluded.

Pareto Optimality and Dominance Relations

Preamble

In [1]:
# used to create block diagrams
%reload_ext xdiag_magic
%xdiag_output_format svg
    
import numpy as np                   # for multi-dimensional containers

Introduction

In an earlier section, we briefly covered selection in single-objective problems. There is a fundamental difference between selection in single-objective problems when compared to multi-objective problems. In single-objective problems, we often look for a single solution which has the best objective value, whereas this is not possible in multi-objective problems because they often involve conflicts between multiple objectives. Because multi-objective problems consider more than one objective value, it is often the case that a solution with the best value for one objective will often have degradation in one or more other objective values. This is where a trade-off emerges in the objective space, and it is unlikely that there exists a single solution that can be considered optimal.

Therefore, the solution to a multi-objective optimisation problem is not a single solution vector, but instead an approximation set. This is a set of many candidate solutions that present trade-offs between the multiple objectives, where any improvement in one objective value will result in the degradation in one or more of the other objective values. This notion of "optimum" solutions is called Pareto-optimality.

Pareto-optimality and other approaches to determining dominance relationships between multiple solutions in a population are important during the selection stage of an optimisation algorithm, highlighted below.

In [2]:
%%blockdiag
{
    orientation = portrait
    Initialisation -> Evaluation -> "Terminate?" -> Selection -> Variation -> Evaluation
    Selection [color = '#ffffcc']
}
blockdiag { orientation = portrait Initialisation -> Evaluation -> "Terminate?" -> Selection -> Variation -> Evaluation Selection [color = '#ffffcc'] } InitialisationEvaluationTerminate?SelectionVariation

Notation

Let's define a multi-objective function $f(x)$ consisting of two objectives.

$$ f(x) = (f_{1}(x),f_{2}(x))\tag{1} $$

We have a population $\mathrm{X}$ of $\mathrm{N}$ candidate solutions.

$$ \mathbf{X} = \langle \mathrm{X}_1,\mathrm{X}_2,\ldots,\mathrm{X}_{\mathrm{N}}\rangle $$

where $\mathrm{N}$ refers to the number of solutions in the population, $\mathrm{X}_{n}$ refers to the $n$-th solution in the population, and $x_{dn}$ refers to the $d$-th decision variable of the $n$-th solution in the population.

$$ \mathrm{X}_{n} = \langle x_{1n},x_{2n},\ldots,x_{\mathrm{D}n} \rangle $$

We will also define the corresponding objective values $\mathrm{Y}$ which are calculated when evaluating the function in Equation 1.

$$ \mathbf{Y} = \langle \mathrm{Y}_1,\mathrm{Y}_2,\ldots,\mathrm{Y}_{\mathrm{N}}\rangle $$

where $\mathrm{N}$ refers to the number of objective value sets, $\mathrm{Y}_{n}$ refers to the $n$-th set of objective values in the population, and $y_{mn}$ refers to the $m$-th objective value of the $n$-th set of objective values in the population.

$$ \mathrm{Y}_{n} = \langle y_{1n},y_{2n} \rangle $$

Dominance

When using or designing algorithms to solve multi-objective optimisation problems, we will often encounter the concept of domination. This concept is useful for comparing two solutions to determine whether one is better than the other.

We can now use our notation to define dominance relationships. Let's take two solutions to a two-objective problem: $\mathrm{X}_1$ and $\mathrm{X}_2$, with their corresponding objective values $\mathrm{Y}_{1} = \langle y_{1,1},y_{2,1} \rangle$ and $\mathrm{Y}_{2} = \langle y_{1,2},y_{2,2} \rangle$.

Definition 1: A solution $\mathrm{X}_1$ is said to dominate another solution $\mathrm{X}_2$, if both of the following conditions are satisfied:

  1. The objective values of $\mathrm{X}_1$ are no worse than those of $\mathrm{X}_2$ in all objectives, i.e. for this two-objective problem $f_{m}(\mathrm{X}_1) \leq f_{m}(\mathrm{X}_2)$ for all $m = (1, 2)$.

  2. The objective values of solution $\mathrm{X}_1$ are strictly better than at least one of those of solution $\mathrm{X}_2$, i.e. for this two-objective problem $f_{m}(\mathrm{X}_1) \lt f_{m}(\mathrm{X}_2)$ for at least one $m = (1, 2)$.

If any of the two conditions are violated, the solution $\mathrm{X}_1$ does not dominate the solution $\mathrm{X}_2$ . Otherwise, we can claim $\mathrm{X}_1$ dominates $\mathrm{X}_2$.

Definition 2: Two solutions $\mathrm{X}_1$ and $\mathrm{X}_2$ are said to be non-dominating with respect to each other if the following conditions are satisfied:

  1. The objective values of solution $\mathrm{X}_1$ are strictly better than at least one of those of solution $\mathrm{X}_2$, i.e. for this two-objective problem $f_{m}(\mathrm{X}_1) \lt f_{m}(\mathrm{X}_2)$ for at least one $m = (1, 2)$.

  2. The objective values of solution $\mathrm{X}_1$ are strictly worse than at least one of those of solution $\mathrm{X}_2$, i.e. for this two-objective problem $f_{m}(\mathrm{X}_1) \gt f_{m}(\mathrm{X}_2)$ for at least one $m = (1, 2)$.

Selecting Solutions

Generally, one of our goals throughout the optimisation process is to select the best solutions. This means that solutions that can be categorised as either "dominating" or "non-dominating" are solutions that we are interested in. To be clear, we are not interested in solutions that are dominated, because they do not offer any desirable performance with respect to any of the considered objectives.

Let's use Python to demonstrate these dominance relations that are often used for selection. Here, we will assume a minimisation problem, where smaller values are better. We will initialise four sets of solutions by synthetically assigning objective values that will demonstrate our dominance relations.

Our first set $\mathrm{Y}_1$ and $\mathrm{Y}_2$ will demonstrate the scenario where $\mathrm{Y}_1$ dominates $\mathrm{Y}_2$.

In [3]:
Y1 = np.array([0, 0.5])
Y2 = np.array([0.5, 0.5])

Our second set $\mathrm{Y}_3$ and $\mathrm{Y}_4$ will demonstrate the scenario where $\mathrm{Y}_3$ is identical to $\mathrm{Y}_4$.

In [4]:
Y3 = np.array([0.5, 0.5])
Y4 = np.array([0.5, 0.5])

Our third set $\mathrm{Y}_5$ and $\mathrm{Y}_6$ will demonstrate the scenario where $\mathrm{Y}_5$ and $\mathrm{Y}_6$ are non-dominated with respect to each other.

In [5]:
Y5 = np.array([0, 0.5])
Y6 = np.array([0.5, 0])

Our fourth set $\mathrm{Y}_7$ and $\mathrm{Y}_8$ will demonstrate the scenario where $\mathrm{Y}_7$ is dominated by $\mathrm{Y}_8$.

In [6]:
Y7 = np.array([0.5, 0.5])
Y8 = np.array([0, 0.25])

First, we will define a function to determine whether a solution dominates another or not.

In [7]:
def dominates(X1, X2):
    if(np.any(X1 < X2) and np.all(X1 <= X2)):
        return True
    else:
        return False

Now, let's test it with our four sets of objective values.

In [8]:
dominates(Y1, Y2)
Out[8]:
True
In [9]:
dominates(Y3, Y4)
Out[9]:
False
In [10]:
dominates(Y5, Y6)
Out[10]:
False
In [11]:
dominates(Y7, Y8)
Out[11]:
False

As expected, the only solution pairing that satisfies the criteria for dominance is our first set $\mathrm{Y}_1$ and $\mathrm{Y}_2$.

Next, let's define a function to determine whether two solutions are non-dominated.

In [12]:
def nondominated(X1, X2):
    if(np.any(X1 < X2) and np.any(X1 > X2)):
        return True
    else:
        return False

Again, we will test it with our four sets of objective values.

In [13]:
nondominated(Y1, Y2)
Out[13]:
False
In [14]:
nondominated(Y3, Y4)
Out[14]:
False
In [15]:
nondominated(Y5, Y6)
Out[15]:
True
In [16]:
nondominated(Y7, Y8)
Out[16]:
False

As expected, the only solution pairing that satisfies the criteria for dominance is our first set $\mathrm{Y}_5$ and $\mathrm{Y}_6$.

We can string these two functions together within a decision structure to determine the dominance relation between any two solutions.

In [17]:
def dominance_relation(X1, X2):
    if(np.all(X1 == X2)):
        print("The solutions are identical.")
    elif(dominates(X1, X2)):
        print("The first solution dominates the second.")
    elif(nondominated(X1, X2)):
        print("The two solations are nondominating")
    else:
        print("The first solution is dominated by the second.")

Finally, we will test it with our four sets of objective values.

In [18]:
dominance_relation(Y1, Y2)
The first solution dominates the second.
In [19]:
dominance_relation(Y3, Y4)
The solutions are identical.
In [20]:
dominance_relation(Y5, Y6)
The two solations are nondominating
In [21]:
dominance_relation(Y7, Y8)
The first solution is dominated by the second.

Conclusion

In this section we introduced the concept of Pareto-optimality and looked at dominance relations in more detail, complete with examples in Python. The next challenge we will encounter is when we need to select a subset of solutions from a population that consists of entirely non-dominated solutions. For example, which 100 solutions do we select from a population of 200 non-dominated solutions? We will offer some solutions to this challenge in the later sections.