Introduction

This demo has two goals. The first is to show how to use the 'ml' module, which makes the 'LightGBM' and 'XGBoost' libraries available for the analysis of hyperspectral images. The second is to present an original way to build a search engine dedicated to hyperspectral imagery, a solution that uses few computational resources while maintaining good accuracy.

Snow entity learning with LightGBM

This is a remake of the previous example "Snow entity learning with Random Forest". The code was revised to improve readability and ease of use.

This example follows the standard three steps that are the hallmark of machine learning.

First, we create the learn and test sets. To do so, we use an original way of clustering the snow spectra from a selection of images.

Next comes the learning process. We take 25% of the snow spectra set for learning; the accuracy is calculated on the remaining 75%. A feature importances graph is generated. This is one of the sweet spots of Gradient Boosting: it has some explanatory capacity.

Finally, we exercise the model prediction on a selected set of images, some with snow and others without. We raise the difficulty by using no-snow images that can confuse the learned model, and in some cases they do; see the first note below. To get better visual feedback, we show the result of applying the model prediction to the whole image, including the spectra used to learn the model. This is not the standard way of doing things, but it makes the results easier to assess visually.

Notes:

  • The electromagnetic spectrum of the images used here is in the visual range. This is not the best range to identify matter like snow; near infrared is a better bet.
  • A basic hyperparameter tuning was done before the learning, but not much more.
  • Tree Gradient Boosting is efficient for hyperspectral image classification; see the paper: B. T. Abe, O. O. Olugbara and T. Marwala, "Hyperspectral Image Classification using Random Forests and Neural Networks", Proceedings of the World Congress on Engineering and Computer Science, Vol. 1, San Francisco, 2012.
  • Results from XGBoost and LightGBM are almost the same. However, LightGBM is faster.
In [14]:
%matplotlib inline

from __future__ import print_function
import os
import os.path as osp
import numpy as np

# pysptools.ml wrap LightGBM and XGBoost
import pysptools.ml as ml
# pysptools.skl is a bridge to some scikit-learn functionalities
import pysptools.skl as skl

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score


# estimator is one of ml.HyperLGBMClassifier or ml.HyperXGBClassifier
def fit_model(rpath, images, cmaps, estimator, param, stat=False):
    
    def accuracy(model, X_test, y_test):
        y_pred = model.predict(X_test)
        # evaluate predictions
        acc = accuracy_score(y_test, y_pred)
        print("Accuracy: %.2f%%" % (acc * 100.0))

    X,y = skl.shape_to_XY(images, cmaps)
    seed = 5
    train_size = 0.25
    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=train_size,
                                                        random_state=seed)
    model = estimator(**param)
    model.fit(X_train, y_train)
    if stat:
        accuracy(model, X_test, y_test)
    if rpath is None:
        model.display_feature_importances(sort=True, n_labels='all')
    else:
        model.plot_feature_importances(rpath, sort=True, n_labels='all')
    return model
In [15]:
# Environment, input/output
home_path = os.environ['HOME']
source_path = osp.join(home_path, 'dev-data/CZ_hsdb')
result_path = None

snow_fname = ['img1','img2','imgc7','imga6','imgb3','imgc1','imgc4','imgc5']
no_snow_fname = ['imga1','imgb1','imgb6','imga7']

# collect scaled images
snow_img = []
# collect classification maps
snow_cmap = []

First step: build a snow spectra set

To extract the snow spectra set we use a clustering method. The clusters are made with scikit-learn estimators, used as pattern-matchers within a trial and error methodology. Note that for the cases presented, the clustering is not fine-tuned; we could do better if we took our time.

The snow spectra set is built from eight images.

In [16]:
# img1

# get_scaled_img_and_class_map:
#    * load the image
#    * shrink it by 3
#    * scale
#    * class map = Cluster(estimator, estimator_param)
#    * return the scaled/shrinked image and the class map

scaled, cmap = ml.get_scaled_img_and_class_map(source_path, result_path, 'img1', 
                          [['Snow',{'rec':(41,79,49,100)}]],
                          skl.HyperGaussianNB, None,
                          display=True)

snow_img.append(scaled)
snow_cmap.append(cmap)
In [17]:
# img2
scaled, cmap = ml.get_scaled_img_and_class_map(source_path, result_path, 'img2', 
                          [['Snow',{'rec':(83,50,100,79)},{'rec':(107,151,111,164)}]],
                          skl.HyperLogisticRegression, {'class_weight':{0:1.0,1:5}},
                          display=True)

snow_img.append(scaled)
snow_cmap.append(cmap)
In [18]:
# imgc7
scaled, cmap = ml.get_scaled_img_and_class_map(source_path, result_path, 'imgc7', 
                          [['Snow',{'rec':(104,4,126,34)},{'rec':(111,79,124,101)}]],
                          skl.HyperSVC, {'class_weight':{0:1,1:10},'gamma':0.5},
                          display=True)

# Clean the top half:
cmap[0:55,0:cmap.shape[1]] = 0
ml.display_img(cmap, 'imgc7 class map cleaned')

snow_img.append(scaled)
snow_cmap.append(cmap)
In [19]:
# imga6
scaled, cmap = ml.get_scaled_img_and_class_map(source_path, result_path, 'imga6', 
                          [['Snow',{'rec':(5,134,8,144)}]],
                          skl.HyperLogisticRegression, {'class_weight':{0:1.0,1:5}},
                          display=True)

snow_img.append(scaled)
snow_cmap.append(cmap)
In [20]:
# imgb3
scaled, cmap = ml.get_scaled_img_and_class_map(source_path, result_path, 'imgb3', 
                          [['Snow',{'rec':(99,69,103,95)}]],
                          skl.HyperLogisticRegression, {'class_weight':{0:1.0,1:5}},
                          display=True)

snow_img.append(scaled)
snow_cmap.append(cmap)
In [21]:
# imgc1
scaled, cmap = ml.get_scaled_img_and_class_map(source_path, result_path, 'imgc1', 
                          [['Snow',{'rec':(51,69,54,91)},{'rec':(101,3,109,16)}]],
                          skl.HyperLogisticRegression, {'class_weight':{0:1.0,1:5}},
                          display=True)

snow_img.append(scaled)
snow_cmap.append(cmap)
In [22]:
# imgc4
scaled, cmap = ml.get_scaled_img_and_class_map(source_path, result_path, 'imgc4', 
                          [['Snow',{'rec':(47,61,49,63)}]],
                          skl.HyperSVC, {'class_weight':{0:0.05,1:40}},
                          display=True)

snow_img.append(scaled)
snow_cmap.append(cmap)
In [23]:
# imgc5
scaled, cmap = ml.get_scaled_img_and_class_map(source_path, result_path, 'imgc5', 
                          [['Snow',{'rec':(17,151,20,156)}]],
                          skl.HyperLogisticRegression, {'class_weight':{0:1.0,1:5}},
                          display=True)

snow_img.append(scaled)
snow_cmap.append(cmap)

Second step: learn

We do a standard binary learning using LightGBM. Note that pysptools.ml can also do multiclass classification. For this example, we use the HyperLGBMClassifier estimator; see the pysptools.ml module doc.
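
Before calling fit_model below, it may help to see how the (image, class map) pairs become a training set. This is a minimal numpy sketch of what skl.shape_to_XY is assumed to do for a single pair: flatten the (v, h, bands) cube into rows of spectra and the class map into a label vector.

import numpy as np

img = np.random.rand(10, 12, 31)   # a toy (v, h, bands) cube
cmap = np.zeros((10, 12))          # 0 = background, 1 = snow
cmap[2:5, 3:7] = 1

X = img.reshape(-1, img.shape[2])  # (v*h, bands) matrix, one spectrum per row
y = cmap.ravel()                   # (v*h,) label vector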

In [24]:
n_shrink = 3
# These images have no snow. They are shrunk and scaled.
no_snow_img = ml.batch_load(source_path, no_snow_fname, n_shrink)

# Take image dimensions
v, h, bands = snow_img[0].shape
# Build an all-background class map;
# for pysptools, a background pixel has an id of zero
bkg_cmap = np.zeros((v, h))

# Parameters used by LightGBM; no real effort was made to fine-tune,
# but num_leaves=10 and max_depth=10 seem a minimum
tune = {'boosting_type':"gbdt", 'num_leaves':10, 'max_depth':10,
        'learning_rate':0.1, 'n_estimators':10,
        'subsample_for_bin':50000, 'objective':None,
        'min_split_gain':0., 'min_child_weight':5, 'min_child_samples':10}

# We use HyperLGBMClassifier as estimator
model = fit_model(result_path, snow_img + no_snow_img,
                  snow_cmap + [bkg_cmap] * 4,
                  ml.HyperLGBMClassifier, tune, stat=True)
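
# Note: switching to XGBoost only requires changing the estimator
# argument; e.g., with a hypothetical (untuned) parameter set xgb_tune:
#   model = fit_model(result_path, snow_img + no_snow_img,
#                     snow_cmap + [bkg_cmap] * 4,
#                     ml.HyperXGBClassifier, xgb_tune, stat=True)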

# Save the model somewhere and reload it
result_path = osp.join(home_path, 'results')
# '2' is the number of classes, i.e. background and snow
model.save(osp.join(result_path, 'lgbm_model'), bands, 2)
# reload
model = ml.load_lgbm_model(osp.join(result_path, 'lgbm_model'))
result_path = None
Accuracy: 98.72%

Third step: verification

The model is applied to a set of images to verify its accuracy. It's a visual validation; false positives are particularly visible. The same idea can be used for a SEARCH ENGINE setup, as sketched after the cell below.

In [25]:
ml.batch_classify(source_path, result_path, model, snow_fname + no_snow_fname, n_shrink)
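
As a sketch of the search engine idea, we can score each image by its fraction of pixels classified as snow and rank the collection by that score. This assumes the reloaded model exposes the usual scikit-learn style predict method; snow_fraction is a hypothetical helper, not part of pysptools.

# Hypothetical helper: score an image by its fraction of snow pixels
def snow_fraction(model, img):
    # flatten the (v, h, bands) cube to a (v*h, bands) spectra matrix
    X = img.reshape(-1, img.shape[2])
    y_pred = model.predict(X)
    # class 1 is snow, class 0 is background
    return np.mean(y_pred == 1)

# rank the whole collection, most snow first
scores = [(fname, snow_fraction(model, img))
          for fname, img in zip(snow_fname + no_snow_fname,
                                snow_img + no_snow_img)]
for fname, score in sorted(scores, key=lambda p: p[1], reverse=True):
    print('%s: %.1f%% snow pixels' % (fname, score * 100))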