Pareto Optimality and Dominance Relations

Preamble

In [1]:
# used to create block diagrams
%reload_ext xdiag_magic
%xdiag_output_format svg
    
import numpy as np                   # for multi-dimensional containers

Introduction

In an earlier section, we briefly covered selection in single-objective problems. There is a fundamental difference between selection in single-objective problems when compared to multi-objective problems. In single-objective problems, we often look for a single solution which has the best objective value, whereas this is not possible in multi-objective problems because they often involve conflicts between multiple objectives. Because multi-objective problems consider more than one objective value, it is often the case that a solution with the best value for one objective will often have degradation in one or more other objective values. This is where a trade-off emerges in the objective space, and it is unlikely that there exists a single solution that can be considered optimal.

Therefore, the solution to a multi-objective optimisation problem is not a single solution vector, but instead an approximation set. This is a set of many candidate solutions that present trade-offs between the multiple objectives, where any improvement in one objective value will result in the degradation in one or more of the other objective values. This notion of "optimum" solutions is called Pareto-optimality.

Pareto-optimality and other approaches to determining dominance relationships between multiple solutions in a population are important during the selection stage of an optimisation algorithm, highlighted below.

In [2]:
%%blockdiag
{
    orientation = portrait
    Initialisation -> Evaluation -> "Terminate?" -> Selection -> Variation -> Evaluation
    Selection [color = '#ffffcc']
}
blockdiag { orientation = portrait Initialisation -> Evaluation -> "Terminate?" -> Selection -> Variation -> Evaluation Selection [color = '#ffffcc'] } InitialisationEvaluationTerminate?SelectionVariation

Notation

Let's define a multi-objective function $f(x)$ consisting of two objectives.

$$ f(x) = (f_{1}(x),f_{2}(x))\tag{1} $$

We have a population $\mathrm{X}$ of $\mathrm{N}$ candidate solutions.

$$ \mathbf{X} = \langle \mathrm{X}_1,\mathrm{X}_2,\ldots,\mathrm{X}_{\mathrm{N}}\rangle $$

where $\mathrm{N}$ refers to the number of solutions in the population, $\mathrm{X}_{n}$ refers to the $n$-th solution in the population, and $x_{dn}$ refers to the $d$-th decision variable of the $n$-th solution in the population.

$$ \mathrm{X}_{n} = \langle x_{1n},x_{2n},\ldots,x_{\mathrm{D}n} \rangle $$

We will also define the corresponding objective values $\mathrm{Y}$ which are calculated when evaluating the function in Equation 1.

$$ \mathbf{Y} = \langle \mathrm{Y}_1,\mathrm{Y}_2,\ldots,\mathrm{Y}_{\mathrm{N}}\rangle $$

where $\mathrm{N}$ refers to the number of objective value sets, $\mathrm{Y}_{n}$ refers to the $n$-th set of objective values in the population, and $y_{mn}$ refers to the $m$-th objective value of the $n$-th set of objective values in the population.

$$ \mathrm{Y}_{n} = \langle y_{1n},y_{2n} \rangle $$

Dominance

When using or designing algorithms to solve multi-objective optimisation problems, we will often encounter the concept of domination. This concept is useful for comparing two solutions to determine whether one is better than the other.

We can now use our notation to define dominance relationships. Let's take two solutions to a two-objective problem: $\mathrm{X}_1$ and $\mathrm{X}_2$, with their corresponding objective values $\mathrm{Y}_{1} = \langle y_{1,1},y_{2,1} \rangle$ and $\mathrm{Y}_{2} = \langle y_{1,2},y_{2,2} \rangle$.

Definition 1: A solution $\mathrm{X}_1$ is said to dominate another solution $\mathrm{X}_2$, if both of the following conditions are satisfied:

  1. The objective values of $\mathrm{X}_1$ are no worse than those of $\mathrm{X}_2$ in all objectives, i.e. for this two-objective problem $f_{m}(\mathrm{X}_1) \leq f_{m}(\mathrm{X}_2)$ for all $m = (1, 2)$.

  2. The objective values of solution $\mathrm{X}_1$ are strictly better than at least one of those of solution $\mathrm{X}_2$, i.e. for this two-objective problem $f_{m}(\mathrm{X}_1) \lt f_{m}(\mathrm{X}_2)$ for at least one $m = (1, 2)$.

If any of the two conditions are violated, the solution $\mathrm{X}_1$ does not dominate the solution $\mathrm{X}_2$ . Otherwise, we can claim $\mathrm{X}_1$ dominates $\mathrm{X}_2$.

Definition 2: Two solutions $\mathrm{X}_1$ and $\mathrm{X}_2$ are said to be non-dominating with respect to each other if the following conditions are satisfied:

  1. The objective values of solution $\mathrm{X}_1$ are strictly better than at least one of those of solution $\mathrm{X}_2$, i.e. for this two-objective problem $f_{m}(\mathrm{X}_1) \lt f_{m}(\mathrm{X}_2)$ for at least one $m = (1, 2)$.

  2. The objective values of solution $\mathrm{X}_1$ are strictly worse than at least one of those of solution $\mathrm{X}_2$, i.e. for this two-objective problem $f_{m}(\mathrm{X}_1) \gt f_{m}(\mathrm{X}_2)$ for at least one $m = (1, 2)$.

Selecting Solutions

Generally, one of our goals throughout the optimisation process is to select the best solutions. This means that solutions that can be categorised as either "dominating" or "non-dominating" are solutions that we are interested in. To be clear, we are not interested in solutions that are dominated, because they do not offer any desirable performance with respect to any of the considered objectives.

Let's use Python to demonstrate these dominance relations that are often used for selection. Here, we will assume a minimisation problem, where smaller values are better. We will initialise four sets of solutions by synthetically assigning objective values that will demonstrate our dominance relations.

Our first set $\mathrm{Y}_1$ and $\mathrm{Y}_2$ will demonstrate the scenario where $\mathrm{Y}_1$ dominates $\mathrm{Y}_2$.

In [3]:
Y1 = np.array([0, 0.5])
Y2 = np.array([0.5, 0.5])

Our second set $\mathrm{Y}_3$ and $\mathrm{Y}_4$ will demonstrate the scenario where $\mathrm{Y}_3$ is identical to $\mathrm{Y}_4$.

In [4]:
Y3 = np.array([0.5, 0.5])
Y4 = np.array([0.5, 0.5])

Our third set $\mathrm{Y}_5$ and $\mathrm{Y}_6$ will demonstrate the scenario where $\mathrm{Y}_5$ and $\mathrm{Y}_6$ are non-dominated with respect to each other.

In [5]:
Y5 = np.array([0, 0.5])
Y6 = np.array([0.5, 0])

Our fourth set $\mathrm{Y}_7$ and $\mathrm{Y}_8$ will demonstrate the scenario where $\mathrm{Y}_7$ is dominated by $\mathrm{Y}_8$.

In [6]:
Y7 = np.array([0.5, 0.5])
Y8 = np.array([0, 0.25])

First, we will define a function to determine whether a solution dominates another or not.

In [7]:
def dominates(X1, X2):
    if(np.any(X1 < X2) and np.all(X1 <= X2)):
        return True
    else:
        return False

Now, let's test it with our four sets of objective values.

In [8]:
dominates(Y1, Y2)
Out[8]:
True
In [9]:
dominates(Y3, Y4)
Out[9]:
False
In [10]:
dominates(Y5, Y6)
Out[10]:
False
In [11]:
dominates(Y7, Y8)
Out[11]:
False

As expected, the only solution pairing that satisfies the criteria for dominance is our first set $\mathrm{Y}_1$ and $\mathrm{Y}_2$.

Next, let's define a function to determine whether two solutions are non-dominated.

In [12]:
def nondominated(X1, X2):
    if(np.any(X1 < X2) and np.any(X1 > X2)):
        return True
    else:
        return False

Again, we will test it with our four sets of objective values.

In [13]:
nondominated(Y1, Y2)
Out[13]:
False
In [14]:
nondominated(Y3, Y4)
Out[14]:
False
In [15]:
nondominated(Y5, Y6)
Out[15]:
True
In [16]:
nondominated(Y7, Y8)
Out[16]:
False

As expected, the only solution pairing that satisfies the criteria for dominance is our first set $\mathrm{Y}_5$ and $\mathrm{Y}_6$.

We can string these two functions together within a decision structure to determine the dominance relation between any two solutions.

In [17]:
def dominance_relation(X1, X2):
    if(np.all(X1 == X2)):
        print("The solutions are identical.")
    elif(dominates(X1, X2)):
        print("The first solution dominates the second.")
    elif(nondominated(X1, X2)):
        print("The two solations are nondominating")
    else:
        print("The first solution is dominated by the second.")

Finally, we will test it with our four sets of objective values.

In [18]:
dominance_relation(Y1, Y2)
The first solution dominates the second.
In [19]:
dominance_relation(Y3, Y4)
The solutions are identical.
In [20]:
dominance_relation(Y5, Y6)
The two solations are nondominating
In [21]:
dominance_relation(Y7, Y8)
The first solution is dominated by the second.

Conclusion

In this section we introduced the concept of Pareto-optimality and looked at dominance relations in more detail, complete with examples in Python. The next challenge we will encounter is when we need to select a subset of solutions from a population that consists of entirely non-dominated solutions. For example, which 100 solutions do we select from a population of 200 non-dominated solutions? We will offer some solutions to this challenge in the later sections.

Block Diagrams with Blockdiag

Preamble

In [1]:
# used to create block diagrams
%reload_ext xdiag_magic
%xdiag_output_format svg

Introduction

You may have noticed that I programmatically generate block diagrams in many of my notebooks, such as the one below.

In [2]:
%%blockdiag
{
    orientation = portrait
    Initialisation -> Evaluation -> "Terminate?" -> Selection -> Variation -> Evaluation
    Initialisation [color = '#ffffcc']
}
blockdiag { orientation = portrait Initialisation -> Evaluation -> "Terminate?" -> Selection -> Variation -> Evaluation Initialisation [color = '#ffffcc'] } InitialisationEvaluationTerminate?SelectionVariation

I achieve this using what appears to be a relatively unknown Jupyter extension called xdiag_magic (by GitHub user kaibaesler) which provides custom magic functions for generating diagrams with blockdiag. Unfortunately, it wasn't a straight-forward conda install ... to get it up and running, but it's still quite easy.

Installation

These instructions assume you already have Jupyter Lab up and running, perhaps through Anaconda.

  1. Clone or Download this repository to your your machine. Make sure to extract it if you download it as a ZIP archive.
  2. Using the command line, navigate to the xidag_magic_extension directory, e.g. using cd.
  3. Once you're inside the directory, make sure you are within the desired environment, e.g. if you're using conda make sure you type conda activate <environment_name>.
  4. Type pip install -r requirements.txt.
  5. Type pip install -e ..

Examples

To test it, launch Jupyter Lab and open or create a notebook. You need to run the following code before generating any diagrams.

In [3]:
%reload_ext xdiag_magic
%xdiag_output_format svg

Then you can visit blockdiag's examples and try running some of their example diagrams to see the output. Your diagram code must appear in a Jupyter Lab Code cell, and be surrounded with the following.

In [ ]:
%%blockdiag
{
    
}

For example:

In [5]:
%%blockdiag
{
    orientation = portrait
    A -> B -> C
}
blockdiag { orientation = portrait A -> B -> C } ABC

Single Objective Problems: Rastrigin

Preamble

In [1]:
# used to create block diagrams
%reload_ext xdiag_magic
%xdiag_output_format svg
    
import numpy as np                   # for multi-dimensional containers
import pandas as pd                  # for DataFrames
import plotly.graph_objects as go    # for data visualisation
import plotly.io as pio              # to set shahin plot layout

pio.templates['shahin'] = pio.to_templated(go.Figure().update_layout(margin=dict(t=0,r=0,b=40,l=40))).layout.template
pio.templates.default = 'shahin'

Introduction

Before moving on, let's take some time to have a closer look at a single-objective problem. This will give us some perspective. In single-objective problems, the objective is to find a single solution which represents the global optimum in the entire search space. Determining which solutions out-perform others is a simple task when only considering a single objective because the best solution is simply the one with the highest (for maximisation problems) or lowest (for minimisation problems) objective value. Let's take the Rastrigin function as an example.

The Rastrigin Function

This is a single-objective test function which has been expressed in Equation 1.

$$ f(\mathbf{x}) = A n + \sum_{d=1}^D \left[x_i^2 - A\cos(2 \pi x_d)\right]\tag{1} $$

where $x_d \in [-5.12, 5.12]$, i.e. each problem variable should be between $-5.12$ and $5.12$. It is continuous, convex, and unimodal. It is scalable with regards to its dimensionality in the problem domain, however, we will use its two-dimensional configuration for easy visualisation throughout this notebook. Let's implement this problem in Python so we can evaluate some solutions.

In [2]:
def rastrigin(x):
    A = 10
    y = A * D + np.sum(np.square(x) - A * np.cos(2 * np.pi * x))
    return y

Now let's prepare for the initialisation of five solutions with two problem variables. We will specify the desired population size, $\mathrm{N}$, and the dimensionality of the problem domain, $\mathrm{D}$. We will also specify the lower and upper boundaries for our problem variables.

In [3]:
N = 5
D = 2
D_lower = np.ones((1, D)) * -5.12
D_upper = np.ones((1, D)) * 5.12

Let's initialise a population with random decision variables.

In [4]:
X = pd.DataFrame(np.random.uniform(low=D_lower, high=D_upper, size=(N,D)))
X
Out[4]:
0 1
0 -4.506840 4.083571
1 4.340260 3.006881
2 -0.467388 2.008804
3 0.014341 3.792710
4 -1.056248 4.844613

To generate $\mathrm{Y}$, we will evaluate each solution $\mathrm{X}_{\mathrm{N}}$ using the rastrigin() test function defined above.

In [5]:
Y = np.empty((0, 1))

X = pd.DataFrame(np.random.uniform(low=D_lower, high=D_upper, size=(N,D)))

for n in range(N):
    y = rastrigin(X.iloc[n])
    Y = np.vstack([Y, y])
    
# convert to DataFrame
Y = pd.DataFrame(Y, columns=['f'])

We only have five solutions, so it's feasible to list all our objective values.

In [6]:
Y
Out[6]:
f
0 47.320567
1 13.944034
2 51.926243
3 31.970453
4 62.623311

We can very easily select the best solution from the above population. It is simply the solution with the smallest objective value (as we are concerned with minimisation).

In [7]:
np.min(Y)
Out[7]:
f    13.944034
dtype: float64

Note

The above example relies on random numbers. This means you will see different results upon executing the code in this notebook for yourself.

Let's see if we can get a better understanding of the problem domain by generating and visualising 3000 more solutions. This means setting $\mathrm{N}=3000$. We can achieve this by repeating the same code as above with slightly different parameters to generate our population of solutions, and then evaluate them using our rastrigin() function.

In [8]:
N = 3000

X = pd.DataFrame(
    np.random.uniform(low=D_lower, high=D_upper, size=(N,D)),
    columns=['x1', 'x2']
)

Y = np.empty((0, 1))

for n in range(N):
    y = rastrigin(X.iloc[n])
    Y = np.vstack([Y, y])
    
Y = pd.DataFrame(Y, columns=['f'])

All that's left now is to visualise the results. For this, we'll use a 3D scatterplot to plot the problem variables of each solution along with each corresponding objective value. We will reserve the vertical axis for the objective value, $f$, meaning that "lower" points represent better performing solutions.

In [10]:
fig = go.Figure(layout=dict(scene = dict(xaxis_title='x1',
                                         yaxis_title='x2',
                                         zaxis_title='f')))

fig.add_scatter3d(x=X.x1, y=X.x2, z=Y.f, 
                  mode='markers',
                  marker=dict(color = -Y.f, size=4))

fig.show()

Conclusion

In this section, we covered the very basics in what we briefly covered the topic of single-objective problems using the popular Rastrigin function as an example. The most important lesson from this section is that it is trivial to determine which solution outperforms the rest when working with a single-objective problem.

In the next section, we will demonstrate why this is more difficult for multi-objective problems, and introduce some approaches to selection.

Single Objective Problems

Preamble

In [1]:
# used to create block diagrams
%reload_ext xdiag_magic
%xdiag_output_format svg
    
import numpy as np                   # for multi-dimensional containers
import pandas as pd                  # for DataFrames
import plotly.graph_objects as go    # for data visualisation
import plotly.io as pio              # to set shahin plot layout

pio.templates['shahin'] = pio.to_templated(go.Figure().update_layout(margin=dict(t=0,r=0,b=40,l=40))).layout.template
pio.templates.default = 'shahin'

Introduction

Before moving on, let's take some time to have a closer look at a single-objective problem. This will give us some perspective. In single-objective problems, the objective is to find a single solution which represents the global optimum in the entire search space. Determining which solutions outperform others is a simple task when only considering a single-objective because the best solution is simply the one with the highest (for maximisation problems) or lowest (for minimisation problems) objective value. Let's take the Sphere function as an example.

The Sphere Function

This is a single-objective test function which has been expressed in Equation 1.

$$ f(\boldsymbol{x}) = \sum_{d=1}^{D} x_{d}^{2}\tag{1} $$

where $x_d \in [-5.12, 5.12]$, i.e. each problem variable should be between $-5.12$ and $5.12$. It is continuous, convex, and unimodal. It is scalable with regards to its dimensionality in the problem domain, however, we will use its two-dimensional configuration for easy visualisation throughout this notebook. Let's implement this problem in Python so we can evaluate some solutions.

In [2]:
def sphere(x):
    y = np.sum(np.square(x))
    return y

Now let's prepare for the initialisation of five solutions with two problem variables. We will specify the desired population size, $\mathrm{N}$, and the dimensionality of the problem domain, $\mathrm{D}$. We will also specify the lower and upper boundaries for our problem variables.

In [3]:
N = 5
D = 2
D_lower = np.ones((1, D)) * -5.12
D_upper = np.ones((1, D)) * 5.12

Let's initialise a population with random decision variables.

In [4]:
X = pd.DataFrame(np.random.uniform(low=D_lower, high=D_upper, size=(N,D)))
X
Out[4]:
0 1
0 1.896248 4.409782
1 2.673535 3.220184
2 3.656851 -3.629166
3 -3.381706 -4.333205
4 -3.065257 -0.617601

To generate $\mathrm{Y}$, we will evaluate each solution $\mathrm{X}_{\mathrm{N}}$ using the sphere() test function defined above.

In [5]:
Y = np.empty((0, 1))

X = pd.DataFrame(np.random.uniform(low=D_lower, high=D_upper, size=(N,D)))

for n in range(N):
    y = sphere(X.iloc[n])
    Y = np.vstack([Y, y])
    
# convert to DataFrame
Y = pd.DataFrame(Y, columns=['f'])

We only have five solutions, so it's feasible to list all our objective values.

In [6]:
Y
Out[6]:
f
0 21.716052
1 21.122412
2 34.228878
3 18.797980
4 5.192487

We can very easily select the best solution from the above population. It is simply the solution with the smallest objective value (as we are concerned with minimisation).

In [7]:
np.min(Y)
Out[7]:
f    5.192487
dtype: float64

Note

The above example relies on random numbers. This means you will see different results upon executing the code in this notebook for yourself.

Let's see if we can get a better understanding of the problem domain by generating and visualising 1000 more solutions. This means setting $\mathrm{N}=1000$. We can achieve this by repeating the same code as above with slightly different parameters to generate our population of solutions, and then evaluate them using our sphere() function.

In [8]:
N = 1000

X = pd.DataFrame(
    np.random.uniform(low=D_lower, high=D_upper, size=(N,D)),
    columns=['x1', 'x2']
)

Y = np.empty((0, 1))

for n in range(N):
    y = sphere(X.iloc[n])
    Y = np.vstack([Y, y])
    
Y = pd.DataFrame(Y, columns=['f'])

All that's left now is to visualise the results. For this, we'll use a 3D scatterplot to plot the problem variables of each solution along with each corresponding objective value. We will reserve the vertical axis for the objective value, $f$, meaning that "lower" points represent better performing solutions.

In [10]:
fig = go.Figure(layout=dict(scene = dict(xaxis_title='x1',
                                         yaxis_title='x2',
                                         zaxis_title='f')))

fig.add_scatter3d(x=X.x1, y=X.x2, z=Y.f, 
                  mode='markers',
                  marker=dict(color = -Y.f))

fig.show()

Conclusion

In this section, we covered the very basics on the topic of single-objective optimisation problems using the popular Sphere function as an example. The most important lesson from this section is that it is trivial to determine which solution outperforms the rest when working with a single-objective problem.

In the next section, we will demonstrate why this is more difficult for multi-objective problems, and introduce some approaches to selection.

Exercise

Repeat the same experiment in this section, but this time use the Rastrigin function (Equation 2, where $\mathrm{A} = 10$) instead of the Sphere function. $$ f(\mathbf{x}) = A n + \sum_{d=1}^D \left[x_i^2 - A\cos(2 \pi x_d)\right]\tag{2} $$

Setup Anaconda, Jupyter, and Tensorflow v1

Software Setup

We are taking a practical approach in the following sections. As such, we need the right tools and environments available in order to keep up with the examples and exercises. We will be using Python 3 along with typical packages from its scientific stack, such as numpy (for multi-dimensional containers), pandas (for DataFrames), plotly (for plotting), etc. These instructions are written for Tensorflow v1, so you will see some specific package versions listed throughout for compatibility. We will write all of our code within a Jupyter Notebook, but you are free to use other IDEs such as PyCharm or Spyder.

Jupyter Lab

Figure 1 - A Jupyter Notebook being edited within Jupyter Lab.
Theme from https://github.com/shahinrostami/theme-purple-please

Get Anaconda

There are many different ways to get up and running with an environment that will facilitate our work. One approach I can recommend is to install and use Anaconda.

Anaconda® is a package manager, an environment manager, a Python/R data science distribution, and a collection of over 1,500+ open source packages. Anaconda is free and easy to install, and it offers free community support.

https://docs.anaconda.com/anaconda/

To get up and running, just visit the Anaconda website and download a version of Anaconda for your operating system. I recommend getting the latest Anaconda version (2019.07 at the time of writing this section), and selecting a Python 3.X version.

Anaconda Download

Figure 2 - Downloading an Anaconda Distribution for your operating system.

Create Your Environment

Once Anaconda is installed, we need to create and configure our environment. Again, there are many ways to accomplish this. You can complete all the steps using the Anaconda Navigator (graphical interface), but we will use the command line interface, simply because it will give us a better report if and when something goes wrong.

If you added Anaconda to your PATH environment during the installation process, then you can run these commands directly from Terminal, Powershell, or CMD. If you didn't, then you can search for and run Anaconda Prompt.

Anaconda Prompt

Figure 3 - Searching for Anaconda Prompt on Windows 10.

Now we can create and configure our conda environment using the following commands.

conda create -n dmat python=3.6

This will create a conda environment named dmat with the latest Python 3.6 package ready to go. You should be presented with a list of packages that will be installed, and asked if you wish to proceed. To do so, just enter the character y. If this operation is successful, you should see the following output at the end:

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate dmat
#
# To deactivate an active environment, use
#
#     $ conda deactivate

As the message suggests, you will need to type the following command to activate and start entering commands within our environment named dmat.

conda activate dmat

Once you do that, you should see your terminal prompt now leads with the environment name within parantheses:

(dmat) melica:~ shahin$

Note

The example above shows the macOS machine name "melica" and the user "shahin". You will see something different on your own machine, and it may appear in a different format on a different operating system such as Windows. As long as the prompt leads with "(dmat)", you are on the right track.

This will allow you to identify which environment you are currently operating in. If you restart your machine, you should be able to use conda activate dmat within your Anaconda prompt to get back into the same environment.

Install Packages

If you environment was already configured and ready, you would be able to enter the command jupyter lab to launch an instance of the Jupyter Lab IDE in the current directory. However, if we try that in our newly created environment, we will receive an error:

(dmat) melica:~ shahin$ jupyter lab
-bash: jupyter: command not found

So let's fix that. Let's install a few packages we know we will be needed:

  • Jupyter Lab
  • Numpy
  • Pandas
  • Plotly
  • Tensorflow

We will do them one-by-one to get a better idea of any errors if they occur. This time we will include the -y option which automatically says "yes" to any questions asking during the installation process.

conda install jupyterlab -y
conda install numpy=1.16 -y
conda install pandas -y
conda install plotly -y

You could do this in a single command, e.g.:

conda install jupyterlab numpy=1.16 pandas plotly -y

Let's install nodejs. This is needed to run our Jupyter Lab extension in the next section.

conda install -c conda-forge nodejs -y

Finally, let's install Tensorflow v1.

conda install -c conda-forge tensorflow=1

Install Jupyer Lab Extensions

There's one last thing we need to do before we move on, and that's installing any Jupyter Lab extensions that we may need. One particular extension that we need is the plotly extension, which will allow our Jupyter Notebooks to render our Plotly visualisations. Within your conda environment, simply run the following command:

jupyter labextension install @jupyterlab/plotly-extension

This may take some time, especially when it builds your jupyterlab assets, so keep an eye on it until you're returned control over the anaconda prompt, i.e. when you see the following:

(dmat) melica:~ shahin$

Now we're good to go!

A Quick Test

Let's test if everything is working as it should be. In your anaconda prompt, within your conda environment, run the following command

jupyter lab

This should start the Jupyter Lab server and launch a browser window with the IDE ready to use.

Jupyter Lab

Figure 4 - A fresh installation of Jupyter Lab.

Let's create a new notebook. In the Launcher tab which has opened by default, click "Python 3" under the Notebook heading. This will create a new and empty notebook named Untitled.ipynb in the current directory.

Let's try to import our packages. If everything is configured as it should be, you should see no errors. Type the following into the first cell and click "play" button to execute it and create a new cell.

In [1]:
import numpy as np                   # for multi-dimensional containers
import pandas as pd                  # for DataFrames
import plotly.express as px          # plotly express
import plotly.graph_objects as go    # for data visualisation
import plotly.io as pio              # to set shahin plot layout
from tensorflow import keras         # neural networks API

pio.templates['shahin'] = pio.to_templated(go.Figure().update_layout(legend=dict(orientation="h",y=1.1, x=.5, xanchor='center'),margin=dict(t=0,r=0,b=40,l=40))).layout.template
pio.templates.default = 'shahin'

If we followed all the instructions and didn't encounter any errors, everything should be working. The last two lines beginning with pio.templates are modifications made to the colours/margins of all figures rendered by Plotly, and do not need to be included.

Enter the following code into the new and empty cell (below the first one) to test if Plotly visualisations will render within the notebook.

In [13]:
tips = px.data.tips()
fig = px.histogram(tips, x="total_bill", color="sex")
fig.show()

If the Jupyter Lab extension is installed and functioning, you should see an interactive histogram.

Summary

In this section we've downloaded, installed, configured, and tested our environment such that we're ready to run the following examples and experiments. If you ever find that you're missing packages, you can install them in the same way as we installed numpy and the others in this section.

Using a Framework and the ZDT Test Suite

Preamble

In [1]:
# used to create block diagrams
%reload_ext xdiag_magic
%xdiag_output_format svg
    
import numpy as np                   # for multi-dimensional containers
import pandas as pd                  # for DataFrames
import plotly.graph_objects as go    # for data visualisation
import plotly.io as pio              # to set shahin plot layout
import platypus as plat              # multi-objective optimisation framework

pio.templates['shahin'] = pio.to_templated(go.Figure().update_layout(legend=dict(orientation="h",y=1.1, x=.5, xanchor='center'),margin=dict(t=0,r=0,b=40,l=40))).layout.template
pio.templates.default = 'shahin'

Introduction

When preparing to implement multi-objective optimisation experiments, it's often more convenient to use a ready-made framework/library instead of programming everything from scratch. There are many libraries and frameworks that have been implemented in many different programming languages, but as we're using Python we will be selecting from frameworks such as DEAP, PyGMO, and Platypus.

With our focus on multi-objective optimisation, our choice is an easy one. We will choose Platypus which has a focus on multi-objective problems and optimisation.

Platypus is a framework for evolutionary computing in Python with a focus on multiobjective evolutionary algorithms (MOEAs). It differs from existing optimization libraries, including PyGMO, Inspyred, DEAP, and Scipy, by providing optimization algorithms and analysis tools for multiobjective optimization.

As a first look into Platypus, let's repeat the process covered in the earlier section on "Synthetic Objective Functions and ZDT1", where we randomly initialise a solution and then evaluate it using ZDT1.

In [2]:
%%blockdiag
{
    orientation = portrait
    "Problem Variables" -> "Test Function" -> "Objective Values"
    "Test Function" [color = '#ffffcc']
}
blockdiag { orientation = portrait "Problem Variables" -> "Test Function" -> "Objective Values" "Test Function" [color = '#ffffcc'] } Problem VariablesTest FunctionObjective Values

The ZDT test function

Similar to the last time, we will be using a synthetic test problem throughout this notebook called ZDT1. It is part of the ZDT test suite, consisting of six different two-objective synthetic test problems. This is quite an old test suite, easy to solve, and very easy to visualise.

Mathematically, the ZDT11 two-objective test function can be expressed as:

$$ \begin{aligned} f_1(x_1) &= x_1 \tag{1} \\ f_2(x) &= g \cdot h \\ g(x_2,\ldots,x_{\mathrm{D}}) &= 1 + 9 \cdot \sum_{d=2}^{\mathrm{D}} \frac{x_d}{(V-1)}\\ h(f_1,g) &= 1 - \sqrt{f1/g} \end{aligned} $$

where $x$ is a solution to the problem, defined as a vector of $V$ decision variables.

$$ x= \langle x_{1},x_{2},\ldots,x_{\mathrm{D}} \rangle \tag{2} $$

and all decision variables fall between $0$ and $1$.

$$ 0 \le x_d \le 1, d=1,\ldots,\mathrm{D} \tag{3} $$

For this bi-objective test function, $f_1$ is the first objective, and $f_2$ is the second objective. This particular objective function is, by design, scalable up to any number of problem variables but is restricted to two problem objectives.

Let's start implementing this in Python, beginning with the initialisation of a solution according to Equations 2 and 3. In this case, we will have 30 problem variables $\mathrm{D}=30$.

In [3]:
D = 30
x = np.random.rand(D)
print(x)
[0.87237386 0.92808638 0.12822784 0.42461699 0.07676533 0.1059336
 0.10116919 0.98212044 0.50798059 0.18697349 0.57088301 0.0918622
 0.1445064  0.38973214 0.72360086 0.20382492 0.64733888 0.58831445
 0.39389366 0.68950765 0.75862002 0.16861867 0.97931378 0.51856309
 0.43889415 0.24061761 0.06984669 0.12362507 0.50009499 0.59296542]

Now that we have a solution to evaluate, let's implement the ZDT1 synthetic test function using Equation 1.

In [4]:
def ZDT1(x):
    f1 = x[0]  # objective 1
    g = 1 + 9 * np.sum(x[1:D] / (D-1))
    h = 1 - np.sqrt(f1 / g)
    f2 = g * h  # objective 2
    
    return [f1, f2]

Finally, let's invoke our implemented test function using our solution $x$ from earlier.

In [5]:
objective_values = ZDT1(x)
print(objective_values)
[0.8723738604708684, 2.761515792435468]

Now we can see the two objective values that measure our solution $x$ according to the ZDT1 synthetic test function, which is a minimisation problem.

Using a Framework

We've quickly repeated our earlier exercise, where we move from our mathematical description of ZDT1 to an implementation in Python. Now, let's use the Platypus' implementation of ZDT1, which would have saves us from having to implement it in Python ourselves.

We have already imported Platypus as plat above, so to get an instance of ZDT1 all we need to do is use the object constructor.

In [6]:
problem = plat.ZDT1()

Just like that, our variable problem references an instance of the ZDT1 test problem.

Now we need to create a solution in a structure that is defined by Platypus. This solution object is what Platypus expects when performing all of the operations that it provides.

In [7]:
solution = plat.Solution(problem)

By using the Solution() constructor and passing in our earlier instantiated problem, the solution is initialised with the correct number of variables and objectives. We can check this ourselves.

In [8]:
print(f"This solution's variables are set to:\n{solution.variables}")
print(f"This solution has {len(solution.variables)} variables")
This solution's variables are set to:
[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]
This solution has 30 variables
In [9]:
print(f"This solution's objectives are set to:\n{solution.objectives}")
print(f"This solution has {len(solution.objectives)} objectives")
This solution's objectives are set to:
[None, None]
This solution has 2 objectives

Earlier in this notebook we randomly generated 30 problem variables and stored them in the variable x. Let's assign this to our variables and check that it works.

In [10]:
solution.variables = x
print(f"This solution's variables are set to:\n{solution.variables}")
This solution's variables are set to:
[0.87237386 0.92808638 0.12822784 0.42461699 0.07676533 0.1059336
 0.10116919 0.98212044 0.50798059 0.18697349 0.57088301 0.0918622
 0.1445064  0.38973214 0.72360086 0.20382492 0.64733888 0.58831445
 0.39389366 0.68950765 0.75862002 0.16861867 0.97931378 0.51856309
 0.43889415 0.24061761 0.06984669 0.12362507 0.50009499 0.59296542]

Now we can invoke the evaluate() method which will use the assigned problem to evaluate the problem variables and calculate the objective values. We can print these out afterwards to see the results.

In [11]:
solution.evaluate()
print(solution.objectives)
[0.8723738604708684, 2.761515792435468]

These objectives should be the same as the ones that were calculated by our own implementation of ZDT1, within some margin of error.

Conclusion

In this section, we introduced a framework for multi-objective optimisation and analysis. We used it to create an instance of the ZDT1 test problem, which we then used to initialise an empty solution. This empty solution was then assigned the randomly generated problem variables and then evaluated according to ZDT1 to calculate our objective values.

Exercise

Using the framework introduced in this section, evaluate a number of randomly generated solutions for ZDT2, ZDT3, ZDT4, and ZDT6.


  1. E. Zitzler, K. Deb, and L. Thiele. Comparison of Multiobjective Evolutionary Algorithms: Empirical Results. Evolutionary Computation, 8(2):173-195, 2000 

CVSS Exploratory Data Analysis

Preamble

In [136]:
# used to create block diagrams
%reload_ext xdiag_magic
%xdiag_output_format svg
    
import numpy as np                   # for multi-dimensional containers
import pandas as pd                  # for DataFrames
import plotly.graph_objects as go    # for data visualisation
import plotly.io as pio              # to set shahin plot layout
from wordcloud import WordCloud      # visualising word clouds
import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = [10, 10]
pio.templates['shahin'] = pio.to_templated(go.Figure().update_layout(legend=dict(orientation="h",y=1.1, x=.5, xanchor='center'),margin=dict(t=0,r=0,b=40,l=40))).layout.template
pio.templates.default = 'shahin'

Dataset

In [82]:
data = pd.read_csv('../data/nvd_bufferoverflow.csv')
data['published_date'] = pd.to_datetime(data['published_date']).dt.date # date only, remove time
data.tail()
Out[82]:
vuln_id cvss_score3 cvss_score2 summary published_date
275 CVE-2018-4003 9.8 7.5 An exploitable heap overflow vulnerability ex... 2019-03-21
276 CVE-2019-6778 7.8 4.6 In QEMU 3.0.0, tcp_emu in slirp/tcp_subr.c has... 2019-03-21
277 CVE-2019-9895 9.8 7.5 In PuTTY versions before 0.71 on Unix, a remo... 2019-03-21
278 CVE-2019-9903 6.5 4.3 PDFDoc::markObject in PDFDoc.cc in Poppler 0.... 2019-03-21
279 CVE-2019-9904 6.5 4.3 An issue was discovered in lib\\cdt\\dttree.c ... 2019-03-21

Introduction

In [8]:
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 280 entries, 0 to 279
Data columns (total 5 columns):
vuln_id           280 non-null object
cvss_score3       248 non-null float64
cvss_score2       248 non-null float64
summary           280 non-null object
published_date    280 non-null object
dtypes: float64(2), object(3)
memory usage: 11.0+ KB
In [118]:
print(f"Earliest date {data.published_date.min()}")
print(f"Latest date {data.published_date.max()}")
print(f"Over {(data.published_date.max() - data.published_date.min()).days} days")
Earliest date 2019-03-15
Latest date 2019-07-29
Over 136 days

Two numerical features, cvss_score3 and cvss_score2. There is a difference in severity classification and base score range between the two metrics. E.g. a score of $> 9.0$ is classified as "Critical" in CVSS3, but only "High" in CVSS2.

In [11]:
data.describe()
Out[11]:
cvss_score3 cvss_score2
count 248.000000 248.000000
mean 8.393952 6.742339
std 1.321331 1.815771
min 3.700000 2.100000
25% 7.500000 5.000000
50% 8.800000 6.800000
75% 9.800000 7.500000
max 9.800000 10.000000

How many missing vulnerability scores?

In [135]:
data.isna().sum()
Out[135]:
vuln_id            0
cvss_score3       32
cvss_score2       32
summary            0
published_date     0
dtype: int64
In [77]:
fig = go.Figure()

fig.add_trace(go.Box(y=data.cvss_score3, name='CVSS3'))
fig.add_trace(go.Box(y=data.cvss_score2, name='CVSS2'))

fig.show()

CVSS distributions

In [58]:
fig = go.Figure()

fig.add_trace(go.Histogram(x=data.cvss_score3, name='CVSS3'))
fig.add_trace(go.Histogram(x=data.cvss_score2, name='CVSS2'))

fig.update_traces(opacity=0.75)
fig.show()

The summary field has some description of the event relating to the vulnerability detected. This could potentially be quantified and used for auxiliary analysis.

In [46]:
wordcloud = WordCloud(width=600, height=600, background_color="white").generate(str(data.summary.values))
plt.figure(figsize=(10,10), dpi=80)
plt.imshow(wordcloud)
plt.axis('off')
plt.show()

Timeseries

In [122]:
data = data.set_index('published_date', drop=False)
data.sort_index(inplace=True)
data.tail()
Out[122]:
vuln_id cvss_score3 cvss_score2 summary published_date
published_date
2019-07-28 CVE-2019-14323 NaN NaN SSDP Responder 1.x through 1.5 mishandles inco... 2019-07-28
2019-07-28 CVE-2019-14363 NaN NaN A stack-based buffer overflow in the upnpd bin... 2019-07-28
2019-07-29 CVE-2019-14267 NaN NaN PDFResurrect 0.15 has a buffer overflow via a ... 2019-07-29
2019-07-29 CVE-2019-13126 NaN NaN An integer overflow in NATS Server 2.0.0 allow... 2019-07-29
2019-07-29 CVE-2019-14378 NaN NaN ip_reass in ip_input.c in libslirp 4.0.0 has a... 2019-07-29

Vunerabilities published daily

In [128]:
daily_frequency = data.published_date.value_counts()
daily_frequency.sort_index(inplace=True)

fig = go.Figure()

fig.add_trace(go.Scatter(x=daily_frequency.index.values, y=daily_frequency.values))

fig.show()

Cumulative mean

In [134]:
daily_frequency = data.published_date.value_counts()
daily_frequency.sort_index(inplace=True)

fig = go.Figure()

fig.add_trace(go.Scatter(x=data.published_date, y=data.cvss_score3.expanding().mean(), name='CVSS3'))
fig.add_trace(go.Scatter(x=data.published_date, y=data.cvss_score2.expanding().mean(), name='CVSS2'))

fig.show()

Populations in Objective and Decision Space

Preamble

In [18]:
# used to create block diagrams
%reload_ext xdiag_magic
%xdiag_output_format svg
    
import numpy as np                   # for multi-dimensional containers
import pandas as pd                  # for DataFrames
import plotly.graph_objects as go    # for data visualisation
import plotly.io as pio              # to set shahin plot layout

pio.templates['shahin'] = pio.to_templated(go.Figure().update_layout(margin=dict(t=0,r=0,b=40,l=40))).layout.template
pio.templates.default = 'shahin'

Introduction

In single-objective problems, the objective is to find a single candidate solution which represents the global optimum in the entire search space. Multi-objective problems often involve conflicts between multiple objectives, and as a result, it is unlikely that there exists a single optimal solution. Therefore, the solution to a multi-objective problem cannot be a single candidate solution, but instead a set of candidate solutions which represent the optimum trade-off surface in the objective space. This set is referred to as an approximation set, as it is an approximation of the theoretical optima.

Through the optimisation process in an Evolutionary Multi-Objective Optimisation Algorithm, a population of solutions is iteratively subjected to various operators until termination, at which point this final population (or a sub-set) is determined to be the approximation set.

In [19]:
%%blockdiag
{
    orientation = portrait
    Initialisation ->Evaluation -> "Terminate?" ->Selection -> Variation -> Evaluation
}
blockdiag { orientation = portrait Initialisation ->Evaluation -> "Terminate?" ->Selection -> Variation -> Evaluation } InitialisationEvaluationTerminate?SelectionVariation

We can introduce and describe these concepts through a demonstration, by generating a population of decision variables and using the ZDT1 test function (described in earlier sections) as the objective function.

In [20]:
def ZDT1(x):
    f1 = x[0] # objective 1
    g = 1 + 9 * np.sum(x[1:D] / (D-1))
    h = 1 - np.sqrt(f1/g)
    f2 = g * h # objective 2
    
    return [f1, f2]

We will describe these concepts mathematically and then implement them using Python.

Populations in Objective and Decision Space

We will begin by defining a population of size $\mathrm{N}=50$, number of decision variables $\mathrm{D}=30$ where each variable falls between 0 and 1 ($x_d \in [0,1]$), and number of objective values $\mathrm{M}=2$.

In [21]:
N = 50
D = 30
D_lower = np.zeros((1, D))
D_upper = np.ones((1, D))
M = 2

Let's define a population $\mathrm{X}$ of $\mathrm{N}$ candidate solutions.

$$ \mathbf{X} = \langle \mathrm{X}_1,\mathrm{X}_2,\ldots,\mathrm{X}_{\mathrm{N}}\rangle $$

where $\mathrm{N}$ refers to the number of solutions in the population, $\mathrm{X}_{n}$ refers to the $n$-th solution in the population, and $x_{dn}$ refers to the $d$-th decision variable of the $n$-th solution in the population.

$$ \mathrm{X}_{n} = \langle x_{1n},x_{2n},\ldots,x_{\mathrm{D}n} \rangle $$

Let's initialise this population with random decision variables.

In [22]:
X = pd.DataFrame(np.random.uniform(low=D_lower, high=D_upper, size=(N,D)))
X.head(5) # Show only the first 5 solutions 
Out[22]:
0 1 2 3 4 5 6 7 8 9 ... 20 21 22 23 24 25 26 27 28 29
0 0.658366 0.738013 0.160028 0.553322 0.267253 0.773616 0.519313 0.321656 0.575206 0.281367 ... 0.931161 0.829619 0.117695 0.482267 0.462349 0.358292 0.700312 0.506833 0.362892 0.777778
1 0.346467 0.292960 0.492081 0.256995 0.207796 0.440034 0.966651 0.770660 0.681126 0.124248 ... 0.983806 0.780162 0.460022 0.613621 0.393904 0.630011 0.806656 0.433402 0.595858 0.321959
2 0.410500 0.383482 0.106059 0.818280 0.786929 0.659367 0.947222 0.237282 0.315053 0.385614 ... 0.185948 0.681667 0.483836 0.565063 0.189563 0.101392 0.917253 0.962605 0.811564 0.720876
3 0.814664 0.949217 0.369056 0.443638 0.949463 0.329071 0.013348 0.629007 0.308824 0.976051 ... 0.261085 0.089768 0.009929 0.453266 0.725004 0.482853 0.157002 0.943433 0.675229 0.765054
4 0.033328 0.761892 0.675197 0.384937 0.859946 0.512067 0.300232 0.813839 0.471706 0.386304 ... 0.218133 0.667969 0.967386 0.448930 0.970140 0.950803 0.255558 0.425681 0.774692 0.213606

5 rows × 30 columns

Similarly, we will define the corresponding objective values $\mathrm{Y}$ of the population $\mathrm{X}$.

$$ \mathbf{Y} = \langle \mathrm{Y}_1,\mathrm{Y}_2,\ldots,\mathrm{Y}_{\mathrm{N}}\rangle $$

where $\mathrm{N}$ refers to the number of objective value sets, $\mathrm{Y}_{n}$ refers to the $n$-th set of objective values in the population, and $y_{mn}$ refers to the $m$-th objective value of the $n$-th set of objective values in the population.

$$ \mathrm{Y}_{n} = \langle y_{1n},y_{2n},\ldots,y_{\mathrm{M}n} \rangle $$

To generate $\mathrm{Y}$, we will evaluate each solution $\mathrm{X}_{\mathrm{N}}$ using the ZDT1() test function defined above.

In [23]:
Y = np.empty((0, 2))

for n in range(N):
    y = ZDT1(X.iloc[n])
    Y = np.vstack([Y, y])
    
# convert to DataFrame
Y = pd.DataFrame(Y, columns=['f1','f2'])
Y.head(5) # Shows only first 5 sets of objective values
Out[23]:
f1 f2
0 0.658366 3.631890
1 0.346467 4.304928
2 0.410500 3.849521
3 0.814664 3.448844
4 0.033328 5.218485

Finally, we will visualise each of our 50 solutions in objective space.

In [26]:
fig = go.Figure(layout=dict(xaxis=dict(title='f1'),yaxis=dict(title='f2')))

# This is not how we would normally plot this data.
# Here, it is done this way so that the colours match those in the visualisation below.
for index, row in Y.iterrows():
    fig.add_scatter(x=[row.f1], y=[row.f2], name=f'solution {index+1}', mode='markers')

fig.show()

For completeness, we may also wish to visualise our 50 solutions in the decision space, although it is not very useful at this point.

In [27]:
fig = go.Figure(layout=dict(xaxis=dict(title='decision variables', range=[1, D]),yaxis=dict(title='value')))

for index, row in X.iterrows():
    fig.add_scatter(x=X.columns.values+1, y=row, name=f'solution {index+1}')

fig.show()

Conclusion

In this section, we have covered how to generate and visualise a population in the objective space and decision space. We did this using the ZDT1 synthetic test function, according to the specified population size and dimensionality of the objective space and decision space. You will find many different styles of mathematical notation employed to describe populations in the Evolutionary Algorithm literature, however, you should be able to understand them if you learn the ones described in this section.

Exercise

Using the ZDT test suite paper listed in the references, duplicate this notebook but with the focus on ZDT2 instead of ZDT1.

Population Initialisation

Preamble

In [1]:
# used to create block diagrams
%reload_ext xdiag_magic
%xdiag_output_format svg
    
import numpy as np                   # for multi-dimensional containers
import pandas as pd                  # for DataFrames
import plotly.graph_objects as go    # for data visualisation
import plotly.io as pio              # to set shahin plot layout
import plotly.express as px

pio.templates['shahin'] = pio.to_templated(go.Figure().update_layout(margin=dict(t=0,r=0,b=40,l=40))).layout.template
pio.templates.default = 'shahin'

Introduction

Before the main optimisation process (the "generational loop") can begin, we need to complete the initialisation stage of the algorithm. Typically, this involves generating the initial population of solutions by randomly sampling the search-space. We can see in the figure below that this initialisation stage is the first real stage, and it's only executed once. There are many schemes for generating the initial population, and some even include simply loading in a population from an earlier run of an algorithm.

In [2]:
%%blockdiag
{
    orientation = portrait
    Initialisation -> Evaluation -> "Terminate?" -> Selection -> Variation -> Evaluation
    Initialisation [color = '#ffffcc']
}
blockdiag { orientation = portrait Initialisation -> Evaluation -> "Terminate?" -> Selection -> Variation -> Evaluation Initialisation [color = '#ffffcc'] } InitialisationEvaluationTerminate?SelectionVariation

Randomly sampling the search-space

When generating an initial population, it's often desirable to have a diverse representation of the search space. This supports better exploitation of problem variables earlier on in the search, without having to rely solely on exploration operators.

We previously defined a solution $x$ as consisting of many problem variables.

$$ x=\langle x_{1},x_{2},\ldots,x_{\mathrm{D}} \rangle \tag{1} $$

We also defined a multi-objective function $f(x)$ as consisting of many objectives.

$$ f(x) =(f_{1}(x),f_{2}(x),\ldots,f_{M}(x))\tag{2} $$

However, we need to have a closer look at how we describe a general multi-objective optimisation problem before we initialise our initial population.

$$ \left.\begin{array}{lll}\tag{3} optimise & f_{m}(x), & m=1,2,\ldots,\mathrm{M};\\ subject\, to & g_{j}(x)\geq0, & j=1,2,\ldots,J;\\ & h_{k}(x)=0, & k=1,2,\ldots,K;\\ & x_{d}^{(L)}\leq x_{d}\leq x_{d}^{(U)} & d=1,2,\ldots,\mathrm{D}; \end{array}\right\} $$

We may already be familiar with some parts of Equation 3, but there are some we haven't covered yet. There are $\mathrm{M}$ objective functions which can be either minimised or maximised. The constraint functions $g_j(x)$ and $h_k(x)$ impose inequality and equality constraints which must be satisfied by a solution $x$ for it to be considered a feasible solution. Another condition which affects the feasibility of a solution $x$ is whether the problem variables fall between (inclusively) the lower $x_{d}^{(L)}$ and upper $x_{d}^{(U)}$ boundaries within the decision space.

The lower $x_{d}^{(L)}$ and upper $x_{d}^{(U)}$ boundaries may not be the same for each problem variable. For example, we can define the following upper and lower boundaries for a problem with 10 problem variables.

In [3]:
D_lower = [-2, -2, -2, 0, -5, 0.5, 1, 1, 0, 1]
D_upper = [ 1,  2,  3, 1, .5, 2.5, 5, 5, 8, 2]

In Python, we normally use np.random.rand() to generate random numbers. If we want to generate a population of 20 solutions, each with 10 problem variables ($\mathrm{D} = 10$), we could try something like the following.

In [4]:
D = 10
population = pd.DataFrame(np.random.rand(20,D))
population
Out[4]:
0 1 2 3 4 5 6 7 8 9
0 0.590342 0.009451 0.890784 0.306292 0.984712 0.682825 0.473914 0.560432 0.111785 0.462240
1 0.921900 0.066398 0.327026 0.604339 0.166585 0.673214 0.125520 0.963955 0.099468 0.196180
2 0.945010 0.132590 0.313339 0.556327 0.457996 0.005666 0.861911 0.139357 0.612309 0.731180
3 0.211811 0.759304 0.309044 0.498986 0.130329 0.705464 0.546643 0.090862 0.339554 0.065640
4 0.856414 0.115943 0.593150 0.125143 0.294797 0.850350 0.163504 0.928731 0.150811 0.400341
5 0.472995 0.760259 0.789299 0.738774 0.950839 0.681109 0.388965 0.930970 0.141688 0.685523
6 0.700306 0.272724 0.909531 0.735807 0.799667 0.809203 0.488569 0.805074 0.720130 0.158046
7 0.524256 0.853185 0.128229 0.925865 0.743851 0.165464 0.473188 0.302935 0.212842 0.745215
8 0.814943 0.421095 0.810756 0.482329 0.565891 0.811648 0.965809 0.201598 0.940368 0.292690
9 0.076506 0.718995 0.779093 0.123325 0.898121 0.229695 0.989757 0.364593 0.086867 0.145217
10 0.437626 0.181315 0.101115 0.426660 0.418920 0.036345 0.578733 0.080113 0.514906 0.247020
11 0.300621 0.715011 0.164628 0.958213 0.060206 0.322342 0.112500 0.473499 0.208361 0.132181
12 0.473655 0.872924 0.351757 0.881928 0.050773 0.664500 0.519606 0.987282 0.028262 0.062119
13 0.400474 0.397302 0.328870 0.862925 0.981874 0.885058 0.694371 0.707866 0.578021 0.255265
14 0.790887 0.472954 0.869448 0.352427 0.178975 0.343769 0.325636 0.978543 0.500829 0.676535
15 0.245241 0.711101 0.226324 0.291004 0.499567 0.609855 0.765306 0.861456 0.668297 0.230549
16 0.756543 0.163826 0.555273 0.045854 0.000470 0.232882 0.217587 0.529543 0.045853 0.181732
17 0.896667 0.945724 0.013270 0.651433 0.515852 0.674524 0.932024 0.651124 0.444133 0.354258
18 0.264448 0.247095 0.790230 0.637688 0.239949 0.851213 0.014877 0.825981 0.949973 0.617948
19 0.264197 0.528780 0.007522 0.040722 0.931318 0.009531 0.171899 0.182540 0.677357 0.316056

This works fine if all of our problem variables are to be within the boundaries 0 and 1 ($x_d \in [0,1]$). However, in this case, we have 10 different upper and lower boundaries, so we can use np.random.uniform() instead.

In [5]:
population = pd.DataFrame(np.random.uniform(low=D_lower, high=D_upper, size=(20,D)))
population
Out[5]:
0 1 2 3 4 5 6 7 8 9
0 -0.422994 1.352138 -1.191633 0.680231 -4.692353 1.218220 4.584070 3.037505 2.408068 1.941658
1 0.329578 1.968430 2.595166 0.010203 -3.830213 1.733408 1.507408 2.229782 5.144455