# Model Explainability using SHAP for Regression Problems

Nowadays we are able to solve various problems using AI. We have built a model that can predict the price of a house in an area, a model which can predict the team that is most likely to win the IPL using data, and predict who will qualify for NFL playoffs. In this blog, we are looking at SHAP for regression.

But will any person looking to buy a house trust your 90% + accuracy model or would a person betting for a tournament trust your model no matter how accurate it is.

No, they won't. No one would trust an AI model just because it was almost right every time. The reason is simple these are all Black Box. We can see the input and the resultant output but not what happens in between.

This is where you can see the importance of Model Explainability. Using techniques like LIME, ALE, SHAP, ICE, etc. we can explain why the model came up with the value.

One such explainability technique is SHAP (SHapley Additive exPlanations) which we are going to be covering in this blog.

SHAP or SHAPley Additive exPlanations is a visualization tool that can be used for making a machine learning model more explainable by visualizing its output. It can be used for explaining the prediction of any model by computing the contribution of each feature to the prediction. It is a combination of various tools like lime, SHAPely sampling values, DeepLift, QII, and many more.

One of the main components of the SHAP tool is SHAPley values because using it, SHAP connects optimal credit allocation with local explanations. When we talk about the SHAPley values we can consider them as a method that can tell how to accurately distribute the contribution by the features, among the features.

Let's look at the code of SHAP.

``````# Importing the required libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
``````

Importing the required libraries

``````# Reading the dataset

``````

Now let's go ahead and split the dataset and train a random forest model.

``````# Defining the dependent and the independent features

X = df.iloc[:,:-1]
Y = df.iloc[:,-1]

# Splitting the data set in train and test

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.3, random_state = 0)

# Training a Random Forest model

rfr = RandomForestRegressor()
rfr.fit(X_train,Y_train)
Y_pred = rfr.predict(X_test)

# Printing the Mean Squared Error

print(mean_squared_error(Y_test,Y_pred,squared=False))
``````

Now that we are done building the model our next step is to install SHAP

``````!pip install shap
``````

Since we have a Tree-based model we will be needing the Tree explainer for the explanations.

``````shap_values = shap.TreeExplainer(rfr).shap_values(X_test, check_additivity = False)
shap.summary_plot(shap_values, X_test, feature_names=X.columns)
``````