LIME For Regression Problems

LIME stands for Local Interpretable Model-Agnostic Explanations. It is a visualization technique that helps explain individual predictions of a machine learning model. Let's unpack what each part of the name means.

Local surrogate models are interpretable models used to explain individual predictions of black-box machine learning models. LIME focuses on training such local surrogate models.

The term model-agnostic means the technique can be applied to any machine learning model. LIME works by perturbing data samples and observing how the model's predictions change.

Understanding the LIME algorithm

LIME explains the prediction for a given input by sampling its neighboring inputs and learning a sparse linear model from the black-box model's predictions on those neighbors. Features with large coefficients in the linear model are then considered important for that input's prediction.

In other words, generating a local explanation means sampling around the input and fitting a simple model that approximates the black-box model in that neighborhood, as sketched below.
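
To make this concrete, here is a minimal sketch of the idea in NumPy and scikit-learn. It is a toy illustration under assumed settings (Gaussian noise, an RBF proximity kernel, a ridge surrogate), not the lime library's actual implementation:

import numpy as np
from sklearn.linear_model import Ridge

def lime_sketch(predict_fn, x, num_samples=1000, kernel_width=0.75):
    # 1. Sample neighbors of x by adding Gaussian noise to each feature.
    rng = np.random.default_rng(0)
    neighbors = x + rng.normal(0.0, 1.0, size=(num_samples, x.shape[0]))
    # 2. Query the black-box model on the perturbed samples.
    preds = predict_fn(neighbors)
    # 3. Weight each neighbor by its proximity to x (an RBF kernel).
    distances = np.linalg.norm(neighbors - x, axis=1)
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)
    # 4. Fit a regularized linear surrogate on the weighted neighborhood.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(neighbors, preds, sample_weight=weights)
    # Large-magnitude coefficients flag locally important features.
    return surrogate.coef_

Features whose coefficients have the largest magnitude are the ones the explanation reports as most influential.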

Let's get some hands-on practice in Python.

Before getting started, we need to install lime

!pip install lime

Next, import all the required libraries

import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

Load the dataset

data_path = '/content/Life Expectancy Data.csv'
data = pd.read_csv(data_path)
print(data.head())

Let's check whether there are null values in the data. There are a lot of missing values in the dataset. For convenience, fill them with the mean of their respective columns.

data.isnull().sum()
# Fill missing values in numeric columns with the column mean
data = data.fillna(data.mean(numeric_only=True))

Now, let's convert the categorical columns to numerical values using label encoding

# Import label encoder
from sklearn import preprocessing
 
# label_encoder object knows how to understand word labels.
label_encoder = preprocessing.LabelEncoder()
 
# Encode labels in the 'Country' and 'Status' columns.
data['Country']= label_encoder.fit_transform(data['Country'])
data['Status']= label_encoder.fit_transform(data['Status'])
data.head()

Split the data into train and test sets, and separate the dependent and independent features

train, test = train_test_split(data, test_size=0.2)
train_x = train.loc[:, train.columns != "Life expectancy "]
test_x = test.loc[:, test.columns != "Life expectancy "]
train_y = train["Life expectancy "]
test_y = test["Life expectancy "]

Now, build the model with a Random Forest Regressor

model = RandomForestRegressor(max_depth=6, random_state=0, n_estimators=10)
model.fit(train_x, train_y)

Make predictions with the model and evaluate them

test_pred = model.predict(test_x)
print(mean_squared_error(test_y, test_pred))  # mean squared error
print(r2_score(test_y, test_pred) * 100)  # R^2 score, scaled to a percentage

Now, interpret the model's predictions with LIME. First, import the lime package

import lime
import lime.lime_tabular

Create the explainer. LIME provides a single tabular explainer that works with any model

explainer = lime.lime_tabular.LimeTabularExplainer(
    train_x.values,
    feature_names=train_x.columns.values.tolist(),
    verbose=True,
    mode='regression'
)

Here, I will choose two instances and use them to explain the predictions

Select the 5th instance

# Choose the 5th instance and use it to predict the results
j = 5
exp = explainer.explain_instance(test_x.values[j], model.predict, num_features=6)
# Show the predictions
exp.show_in_notebook(show_table=True)

Select the 10th instance

# Choose the 10th instance and use it to predict the results
j = 10
exp = explainer.explain_instance(test_x.values[j], model.predict, num_features=6)
# Show the predictions
exp.show_in_notebook(show_table=True)

LIME builds a local surrogate model around the observation to be explained and uses the coefficients of that model to identify the features that most influenced that particular prediction.
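
The explanation object also exposes these coefficients directly: exp.as_list() returns (feature, weight) pairs for the local surrogate, which is handy if you want to inspect the weights programmatically rather than through the notebook widget.

# Print the local surrogate's feature weights for the last explanation
for feature, weight in exp.as_list():
    print(f"{feature}: {weight:+.4f}")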

For the 5th instance, we can see which features push the prediction up and which push it down. Likewise, the influential features for the 10th instance differ from those of the 5th. This shows that the features driving the prediction vary from instance to instance.

References

  1. https://towardsdatascience.com/unboxing-the-black-box-using-lime-5c9756366faf
  2. https://www.kaggle.com/code/prashant111/explain-your-model-predictions-with-lime/notebook
  3. https://www.kaggle.com/datasets/kumarajarshi/life-expectancy-who

AIensured provides solutions for Explainability, Metamorphic Testing, and Counterfactuals.