NDArray Index Arrays and Mask Index Arrays

Preamble

In [2]:
:dep darn = {version = "0.1.11"}
:dep ndarray = {version = "0.13.0"}
:dep itertools = {version = "0.9.0"}
:dep plotly = {version = "0.4.0"}
extern crate ndarray;

use ndarray::prelude::*;
use itertools::Itertools;
use plotly::{Plot, Scatter, Layout, Rgb, NamedColor};
use plotly::common::{Mode, Title, Marker, Line};
use plotly::layout::{Axis};

Introduction

NumPy has several features that Rust's NDArray doesn't offer yet, such as index arrays and mask index arrays. When we index a one-dimensional array, we usually pass a single integer to get the element at that position. For instance, given an array example containing nine floating-point values

In [3]:
let example = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9];

We would access the third element using

In [4]:
example[2]
Out[4]:
0.3

However, there is more than one way to index an array! We may wish to index an array with another array to select multiple samples at once, e.g. example[[2, 5, 8]] for the third, sixth, and ninth sample. We may also want to index an array using a boolean mask array, allowing us to select samples based on some criteria, e.g. example[example > 0.5] to select all samples greater than $0.5$.

Currently, NDArray doesn't offer an easy way to index an array with another array or with a mask array, but it can still be achieved with some extra work. What follows is an approach for selecting samples using index and mask arrays.
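To make the two ideas concrete before turning to NDArray, here is a minimal sketch that uses only plain slices and Vec from the standard library. The helper names select_by_index and select_by_mask are hypothetical and are not part of NDArray or NumPy; they only mimic what example[[2, 5, 8]] and example[example > 0.5] would return in NumPy.

```rust
/// Hypothetical helper: copy the elements at the given positions,
/// mimicking NumPy's `example[[2, 5, 8]]`.
/// Panics if any index is out of bounds.
fn select_by_index(data: &[f64], indices: &[usize]) -> Vec<f64> {
    indices.iter().map(|&i| data[i]).collect()
}

/// Hypothetical helper: keep the elements whose mask entry is `true`,
/// mimicking NumPy's `example[mask]`.
/// Panics if the mask and the data differ in length.
fn select_by_mask(data: &[f64], mask: &[bool]) -> Vec<f64> {
    assert_eq!(data.len(), mask.len(), "mask must match the data length");
    data.iter()
        .zip(mask)
        .filter_map(|(&value, &keep)| if keep { Some(value) } else { None })
        .collect()
}
```

With the example array above, select_by_index(&example, &[2, 5, 8]) would collect the third, sixth, and ninth values, while a mask built with example.iter().map(|&x| x > 0.5) would keep only the values above $0.5$.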

Loading our Dataset

We will continue using the Iris Flower dataset, so we will load it using the darn crate to avoid repetition.

We will only be using our species labels, stored in labels, throughout the rest of this section.

In [5]:
let iris = darn::iris_typed();

The darn::iris_typed() function returns a tuple of type (Array2::<f32>, Vec<String>, Array1::<String>), where the first element is an array containing our iris flower features, the second element a vector of our feature headers, and the final element is an array containing our iris species labels. As always, let's have a quick look at a few samples from our features.

In [6]:
darn::show_frame(&iris.0, Some(&iris.1));
Out[6]:
SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm
5.1 3.5 1.4 0.2
4.9 3.0 1.4 0.2
4.7 3.2 1.3 0.2
4.6 3.1 1.5 0.2
5.0 3.6 1.4 0.2
... ... ... ...
6.7 3.0 5.2 2.3
6.3 2.5 5.0 1.9
6.5 3.0 5.2 2.0
6.2 3.4 5.4 2.3
5.9 3.0 5.1 1.8

We'll also check the unique elements in our labels.

In [7]:
iris.2.iter().unique().format("\n")
Out[7]:
"Iris-setosa"
"Iris-versicolor"
"Iris-virginica"

To make things easier for ourselves throughout the rest of this section, let's assign the various parts of our dataset to different variables.

In [8]:
let features = iris.0;
let headers = iris.1;
let labels = iris.2;

Index Array

During our analyses, we may need to select multiple samples from our dataset. For example, we may wish to select the samples at index $0$, $10$, and $20$. Let's output the samples at these indices for reference.

In [9]:
println!("Sample 0: {:}", features.row(0));
Sample 0: [5.1, 3.5, 1.4, 0.2]
In [10]:
println!("Sample 10: {:}", features.row(10));
Sample 10: [5.4, 3.7, 1.5, 0.2]
In [11]:
println!("Sample 20: {:}", features.row(20));
Sample 20: [5.4, 3.4, 1.7, 0.2]

To return these samples all at once using an array as the index, we can use the ArrayBase::select() function:

Select arbitrary subviews corresponding to indices and copy them into a new array.

The first parameter for this function is the axis we wish to select along, and the second is an array containing the desired indices.

In [12]:
println!("{:}", features.select(Axis(0), &[0, 10, 20]));
[[5.1, 3.5, 1.4, 0.2],
 [5.4, 3.7, 1.5, 0.2],
 [5.4, 3.4, 1.7, 0.2]]

If we check these against the individually indexed samples above, we can see that it has worked as intended.
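Conceptually, selecting along Axis(0) copies whole rows into a fresh array. The sketch below illustrates that idea on a flat, row-major buffer using only the standard library. select_rows is a hypothetical helper written for illustration; it is not how NDArray implements select() internally.

```rust
/// Hypothetical helper: copy the requested rows out of a flat, row-major
/// buffer that stores `ncols` values per row.
/// Panics if a row index points past the end of the buffer.
fn select_rows(data: &[f32], ncols: usize, rows: &[usize]) -> Vec<f32> {
    let mut out = Vec::with_capacity(rows.len() * ncols);
    for &r in rows {
        // Each row occupies a contiguous run of `ncols` values.
        let start = r * ncols;
        out.extend_from_slice(&data[start..start + ncols]);
    }
    out
}
```

Because the rows are copied rather than borrowed, the indices in this sketch may repeat or appear in any order, and the result is independent of the original buffer.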

Index Mask Arrays

We may also want to use a mask to index our array. To work around this missing feature in NDArray, we can build a boolean mask and then use it to generate an index array. We can also use column-wise boolean operations when considering multiple column masks.

Building Boolean Masks

We can build a boolean mask of the same shape as our array with true values where some condition is met. For example, in NumPy we could do features > 0.5 to create a mask with true where values are over $0.5$, and false elsewhere.

We can do the same with Rust and ndarray.

In [13]:
let mask = features.map(|elem| *elem > 0.5);

Let's take a peek at our mask to see how it looks.

In [14]:
darn::show_frame(&mask, Some(&headers))
Out[14]:
SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm
true true true false
true true true false
true true true false
true true true false
true true true false
... ... ... ...
true true true true
true true true true
true true true true
true true true true
true true true true

We could also build a mask using our labels. For example, we may want to return true where elements are equal to Iris-virginica.

In [15]:
let mask = labels.map(|elem| elem == "Iris-virginica");
println!("{:}", mask);
[false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true]
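When several criteria are involved, the individual masks can be combined element by element with boolean AND or OR before they are turned into indices. The sketch below shows this on plain boolean slices from the standard library; and_masks and or_masks are hypothetical helper names used only for illustration, and the thresholds in the usage note are arbitrary.

```rust
/// Hypothetical helper: element-wise AND of two masks of equal length.
fn and_masks(a: &[bool], b: &[bool]) -> Vec<bool> {
    assert_eq!(a.len(), b.len(), "masks must have the same length");
    a.iter().zip(b).map(|(&x, &y)| x && y).collect()
}

/// Hypothetical helper: element-wise OR of two masks of equal length.
fn or_masks(a: &[bool], b: &[bool]) -> Vec<bool> {
    assert_eq!(a.len(), b.len(), "masks must have the same length");
    a.iter().zip(b).map(|(&x, &y)| x || y).collect()
}
```

For instance, one mask could mark petal lengths above $5.0$ and another petal widths above $2.0$; and_masks would then keep only the samples that satisfy both criteria, while or_masks would keep those that satisfy at least one.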

Indexing with Mask Arrays

Now let's build an index array and a mask array in a single pass.

In [16]:
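let mut count = 0;
let mut indices = Vec::<usize>::new();
let mask = labels.map(|elem| {
    // Evaluate the criterion once and reuse it for both outputs.
    let is_virginica = elem == "Iris-virginica";
    // Record the current position when the criterion is met.
    if is_virginica { indices.push(count); }
    count += 1;
    is_virginica
});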

With this approach, we're iterating through every element in the array we wish to mask, labels. When our criterion is satisfied, i.e. elem == "Iris-virginica", we push the current index stored in count to a vector named indices. The map transformation itself builds the mask based on the criterion specified.

Let's have a look at the indices returned from this approach.

In [17]:
indices
Out[17]:
[100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149]

Finally, we can use these indices to select all the samples which belong to the Virginica species.

In [18]:
let virginica = features.select(Axis(0), &indices);
darn::show_frame(&virginica, Some(&headers));
Out[18]:
SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm
6.3 3.3 6.0 2.5
5.8 2.7 5.1 1.9
7.1 3.0 5.9 2.1
6.3 2.9 5.6 1.8
6.5 3.0 5.8 2.2
... ... ... ...
6.7 3.0 5.2 2.3
6.3 2.5 5.0 1.9
6.5 3.0 5.2 2.0
6.2 3.4 5.4 2.3
5.9 3.0 5.1 1.8
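The counter-based closure above relies on map visiting the labels in order. As far as we can tell, NDArray's documentation describes map as visiting elements in arbitrary order, so pairing each element with its position explicitly may be the more robust choice. The sketch below shows that pattern on a plain slice using enumerate from the standard library; mask_and_indices is a hypothetical helper, and in NDArray itself something like indexed_iter() could play the role of enumerate.

```rust
/// Hypothetical helper: build a boolean mask and the matching index list in
/// one pass, pairing each element with its position via `enumerate`.
fn mask_and_indices<T, F>(items: &[T], mut predicate: F) -> (Vec<bool>, Vec<usize>)
where
    F: FnMut(&T) -> bool,
{
    let mut mask = Vec::with_capacity(items.len());
    let mut indices = Vec::new();
    for (i, item) in items.iter().enumerate() {
        let keep = predicate(item);
        if keep {
            // The position comes from `enumerate`, not from a side counter.
            indices.push(i);
        }
        mask.push(keep);
    }
    (mask, indices)
}
```

The returned indices can then be passed to a selection step in the same way as the indices vector built earlier.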

Plotting with Plotly

It's always helpful to visualise what we've achieved. Let's plot the petal length and width for all of our samples, and then plot the same for all samples of the Virginica species in a different colour.

In [19]:
let layout = Layout::new()
    .xaxis(Axis::new().title(Title::new("Length (cm)")))
    .yaxis(Axis::new().title(Title::new("Width (cm)")));

let petal = Scatter::new(features.column(2).to_vec(), features.column(3).to_vec())
    .mode(Mode::Markers)
    .name("Petal (All)")
    .marker(Marker::new().color(Rgb::new(69, 57, 172)).size(12));
    
let petal_v = Scatter::new(virginica.column(2).to_vec(), virginica.column(3).to_vec())
    .mode(Mode::Markers)
    .name("Petal (Virginica)")
    .marker(Marker::new().color(Rgb::new(234, 105, 0)).size(12))
    .line(Line::new().color(NamedColor::White).width(0.5));

let mut plot = Plot::new();

plot.set_layout(layout);
plot.add_trace(petal);
plot.add_trace(petal_v);

darn::show_plot(plot);
Out[19]: