Libraries to Interpret Machine Learning Models


Explainable artificial intelligence (XAI) is a set of processes and methods that allows human users to comprehend and trust the results and output created by machine learning algorithms. Explainable AI is used to describe an AI model, its expected impact and potential biases. It helps characterize model accuracy, fairness, transparency and outcomes in AI-powered decision making. [IBM]

The Need for Model Interpretation

In the world of machine learning, models are designed to predict outcomes and utilize those predictions to address various issues. This naturally leads to some important questions:

1. Can we have confidence in these predictions?

2. Are they dependable for making significant choices?

Model interpretation shifts our attention from simply knowing the end result to understanding the underlying reasoning behind that result. By delving into the model's decision-making process, we gain insights into what specific factors influence the model's correct or incorrect classification of a given data point.

Importance of Model Interpretation


Interpretable Machine Learning (IML) refers to the level at which a human can grasp the decision-making process of machine learning models. The significance of IML arises from the following traits:

1. Bias Mitigation: IML helps verify that predictions made by machine learning models are impartial and do not discriminate against specific categories or groups.

2. Data Privacy: Model interpretation helps verify that sensitive information in the data remains protected throughout the process.

3. Robustness: It ensures that the predictions of a machine learning model remain stable and exhibit minimal changes when the input data is slightly altered.

4. Causality Emphasis: IML prioritizes capturing only causal relationships, providing a more accurate understanding of cause-and-effect dynamics within the model.

5. Trust: Individuals find it easier to place trust in interpretable systems as compared to opaque "black box" systems, fostering a stronger sense of belief and confidence.

How to interpret an ML model?

Machine learning models vary widely in complexity and performance, so no single approach fits all cases. Consequently, there are distinct techniques for understanding these models. These techniques can be classified along two dimensions:

1. Model-Specific vs. Model-Agnostic:

  • Model-Specific methods are tailored to particular models, relying on the internal workings of a model to derive specific insights. Examples include interpreting coefficient weights in Generalized Linear Models (GLMs) or analyzing weights and biases in Neural Networks.
  • Model-Agnostic methods are versatile and can be applied to any model post-training. They analyze the relationship between input features and model outputs without delving into the model's internal mechanisms like weights or assumptions.

2. Local vs. Global Interpretation:

  • Local Interpretation pertains to individual predictions, uncovering the rationale behind a specific forecast.
  • Global Interpretation expands beyond isolated data points and encompasses the general behavior of the model as a whole.

Explainability techniques



SHAP

SHAP (SHapley Additive exPlanations) is a framework that explains the output of any model using Shapley values, a game-theoretic approach often used for optimal credit allocation. While it can be used on any black-box model, SHAP values can be computed more efficiently for specific model classes (like tree ensembles).

This method is a member of the additive feature attribution methods class; feature attribution refers to the fact that the change of an outcome to be explained (e.g., a class probability in a classification problem) with respect to a baseline (e.g., average prediction probability for that class in the training set) can be attributed in different proportions to the model input features.

SHAP can be used both globally and locally.
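To make the attribution idea concrete, here is a minimal sketch that computes exact Shapley values by brute force for a three-feature toy function. It does not use the SHAP library; the `model`, `shapley_values`, and all-zero `baseline` names are assumptions made for illustration, and "missing" features are simply replaced by their baseline value.

```python
from itertools import combinations
from math import factorial


def model(x):
    # Hypothetical black box: two additive terms plus one interaction.
    return 2.0 * x[0] + 1.0 * x[1] + x[0] * x[2]


def shapley_values(f, x, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                # Classic Shapley weight for a coalition of this size.
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
print(phi, sum(phi), model(x) - model(baseline))
```

The attributions add up to the difference between the prediction and the baseline prediction, which is the additive property described above. The loop visits every coalition, so the cost grows exponentially with the number of features; this is why approximations and model-specific algorithms matter in practice.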


LIME

Local interpretable model-agnostic explanations (LIME) is a method that fits a surrogate glass-box model around the decision space of any black-box model's prediction. LIME explicitly tries to model the local neighbourhood of any prediction. It works by perturbing an individual data point to generate synthetic data, which is evaluated by the black-box system and then used as the training set for the glass-box model.

LIME has been designed to be applied locally.
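The perturb, score, and fit loop can be sketched in a few lines of plain Python. This is a simplified illustration rather than the LIME library: `black_box`, `lime_explain`, the Gaussian perturbation scale, and the kernel width are all assumptions chosen for the example, and the surrogate is an ordinary weighted linear regression.

```python
import math
import random


def black_box(x):
    # Stand-in for an opaque model: nonlinear in x[0], linear in x[1].
    return x[0] ** 2 + 3.0 * x[1]


def solve(a, b):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def lime_explain(f, x0, n_samples=500, scale=0.3, kernel_width=0.5, seed=0):
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [xi + rng.gauss(0.0, scale) for xi in x0]      # perturbed copy of x0
        dist2 = sum((zi - xi) ** 2 for zi, xi in zip(z, x0))
        rows.append([1.0] + z)                              # intercept + features
        targets.append(f(z))                                # black-box label
        weights.append(math.exp(-dist2 / kernel_width ** 2))  # closer = heavier
    k = len(x0) + 1
    # Weighted normal equations: (X^T W X) w = X^T W y
    xtwx = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows))
             for j in range(k)] for i in range(k)]
    xtwy = [sum(w * r[i] * y for w, r, y in zip(weights, rows, targets))
            for i in range(k)]
    coef = solve(xtwx, xtwy)
    return coef[0], coef[1:]


intercept, slopes = lime_explain(black_box, [2.0, 1.0])
print(slopes)
```

Around the point (2, 1) the surrogate's slopes land close to the local gradient of the toy function (about 4 and 3), which is the kind of local explanation LIME reports. The result depends on the perturbation scale and kernel width, so those settings need care.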

Permutation Importance

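The idea is the following: feature importance can be measured by looking at how much the score (accuracy, F1, R², or any other score of interest) decreases when a feature is not available.

One way to do that is to remove the feature from the dataset, re-train the estimator and check the score, but re-training for every feature is expensive. Permutation importance avoids re-training: it shuffles the values of one feature in the evaluation data, so the column is still present but no longer carries useful information, and measures how much the score drops.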

Of course, permutation importance can only be applied globally.
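A minimal sketch of the shuffle-and-rescore procedure follows. The `predict` function stands in for an already fitted model, and the negative mean squared error score and the `permutation_importance` helper are assumptions made for illustration, not a specific library's API.

```python
import random


def predict(row):
    # Stand-in for an already fitted model that relies only on feature 0.
    return 2.0 * row[0]


def neg_mse(X, y):
    # Higher is better, so a drop in this score signals lost information.
    return -sum((predict(r) - t) ** 2 for r, t in zip(X, y)) / len(y)


def permutation_importance(X, y, n_repeats=5, seed=0):
    rng = random.Random(seed)
    base = neg_mse(X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, column)]
            drops.append(base - neg_mse(X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(200)]
y = [2.0 * r[0] for r in X]
importances = permutation_importance(X, y)
print(importances)
```

Shuffling the feature the model depends on hurts the score, while shuffling the ignored feature leaves it unchanged, so its importance is zero.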

Partial Dependence Plot

The partial dependence plot (short PDP or PD plot) shows the marginal effect one or two features have on the predicted outcome of a machine learning model. A partial dependence plot can show whether the relationship between the target and a feature is linear, monotonic or more complex.

For a perturbation-based interpretability method, it is relatively quick. However, PDP assumes that the features are independent, and the plots can be misleading when this assumption does not hold.

As for the previous one, it can only be applied globally.
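The computation behind a one-feature PDP is short enough to sketch directly: fix the feature at each grid value for every row, predict, and average. The `model` function, the grid, and the `partial_dependence` helper below are assumptions for illustration.

```python
import random


def model(row):
    # Hypothetical fitted model: linear in feature 0, quadratic in feature 1.
    return 3.0 * row[0] + row[1] ** 2


def partial_dependence(f, X, feature, grid):
    pd_values = []
    for v in grid:
        # Overwrite the chosen feature with v in every row, then average.
        preds = [f(r[:feature] + [v] + r[feature + 1:]) for r in X]
        pd_values.append(sum(preds) / len(preds))
    return pd_values


rng = random.Random(0)
X = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(100)]
grid = [0.0, 0.5, 1.0]
pd_values = partial_dependence(model, X, feature=0, grid=grid)
print(list(zip(grid, pd_values)))
```

For this toy model the curve for feature 0 is a straight line with slope 3. Note that every row is evaluated at every grid value, including combinations that may never occur in real data, which is where the independence assumption comes in.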

Morris Sensitivity Analysis

It is a one-at-a-time (OAT) global sensitivity analysis in which only one input has its level (discretized value) adjusted per run. Relative to other sensitivity analysis algorithms, the Morris method is fast (fewer model executions), but this comes at the cost of not being able to distinguish non-linearities from interactions. It is commonly used for screening which inputs are important enough for further analysis.

Again, it can only be applied globally.
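The sketch below illustrates the one-at-a-time idea with elementary effects. It is a simplified variant, assuming random base points with a single step per input instead of the full Morris trajectories on a discretized grid; `model`, `elementary_effects`, and the step size `delta` are illustrative names and values.

```python
import random


def model(x):
    # x[0] acts linearly, x[1] and x[2] interact, x[3] has no effect.
    return 2.0 * x[0] + x[1] * x[2] + 0.0 * x[3]


def elementary_effects(f, n_inputs, n_points=50, delta=0.25, seed=0):
    rng = random.Random(seed)
    effects = [[] for _ in range(n_inputs)]
    for _ in range(n_points):
        base = [rng.uniform(0.0, 1.0 - delta) for _ in range(n_inputs)]
        f_base = f(base)
        for i in range(n_inputs):
            stepped = base[:]
            stepped[i] += delta  # change exactly one input
            effects[i].append((f(stepped) - f_base) / delta)
    # mu*: mean absolute effect (overall importance of the input).
    mu_star = [sum(abs(e) for e in es) / len(es) for es in effects]
    # sigma: spread of the effects (non-linearity and/or interaction).
    sigma = []
    for es in effects:
        mean = sum(es) / len(es)
        sigma.append((sum((e - mean) ** 2 for e in es) / (len(es) - 1)) ** 0.5)
    return mu_star, sigma


mu_star, sigma = elementary_effects(model, n_inputs=4)
print(mu_star, sigma)
```

The linear input shows a large mean effect with almost no spread, the interacting inputs show a noticeable spread, and the inert input scores zero. A high spread alone does not say whether the cause is curvature or interaction, which is the limitation mentioned above.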

Accumulated Local Effects (ALE)

Accumulated Local Effects (ALE) is a method for computing feature effects. The algorithm provides model-agnostic (black box) global explanations for classification and regression models on tabular data. ALE addresses some key shortcomings of Partial Dependence Plots (PDP).

Despite the word "local" in its name, ALE can only be applied globally.
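A first-order ALE curve can be sketched as follows: split the feature into bins, measure how the prediction changes when each point in a bin is moved from the bin's lower edge to its upper edge, then accumulate and center those local effects. The equal-width bins, the unweighted centering, and the `ale_1d` helper are simplifying assumptions made for this illustration.

```python
import random


def model(row):
    # Hypothetical fitted model.
    return 2.0 * row[0] + row[1]


def ale_1d(f, X, feature, n_bins=5):
    values = [r[feature] for r in X]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    edges = [lo + k * width for k in range(n_bins + 1)]
    accumulated = [0.0]
    for k in range(n_bins):
        left, right = edges[k], edges[k + 1]
        last = k == n_bins - 1
        # Only points that actually fall in this bin are used.
        members = [r for r in X
                   if r[feature] >= left and (last or r[feature] < right)]
        if members:
            diffs = [f(r[:feature] + [right] + r[feature + 1:])
                     - f(r[:feature] + [left] + r[feature + 1:])
                     for r in members]
            local = sum(diffs) / len(diffs)
        else:
            local = 0.0
        accumulated.append(accumulated[-1] + local)
    center = sum(accumulated) / len(accumulated)
    return edges, [a - center for a in accumulated]


rng = random.Random(0)
X = []
for _ in range(200):
    x0 = rng.uniform(0.0, 1.0)
    X.append([x0, x0 + rng.gauss(0.0, 0.1)])  # feature 1 is correlated with feature 0
edges, ale = ale_1d(model, X, feature=0)
print(list(zip(edges, ale)))
```

Because each difference is computed only for points that really lie in the bin, the method avoids evaluating the model on unrealistic feature combinations, which is the main PDP shortcoming it addresses.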


Anchors

The idea behind anchors is to explain the behaviour of complex models with high-precision rules called anchors. These anchors are locally sufficient conditions to ensure a certain prediction with a high degree of confidence.

From this, it follows that it can only be applied locally.

Contrastive Explanation Method (CEM)

CEM generates instance-based local black-box explanations for classification models in terms of Pertinent Positives (PP) and Pertinent Negatives (PN). It highlights not only what should be minimally and sufficiently present to justify the classification of an input example by a neural network (pertinent positives), but also what should be minimally and necessarily absent (pertinent negatives), in order to form a more complete and well-rounded explanation.

CEM is designed to be applied locally.

Counterfactual Instances

Counterfactual explanations "interrogate" a model to show how much individual feature values would have to change in order to flip the overall prediction. A counterfactual explanation of an outcome or a situation Y takes the form "If X had not occurred, Y would not have occurred". In the context of a machine learning classifier, X would be an instance of interest and Y would be the label predicted by the model.

Counterfactual Instances is designed to be applied locally.
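As a rough illustration, the sketch below searches for a counterfactual by nudging a single feature until the predicted label flips. Real counterfactual methods solve an optimization problem over all features; the toy `classify` rule, the feature meanings, and the `find_counterfactual` helper are assumptions made for this example.

```python
def classify(x):
    # Toy loan model: approve (1) when a linear score reaches 1.0.
    income, debt = x
    return 1 if 0.5 * income - 0.3 * debt >= 1.0 else 0


def find_counterfactual(f, x, feature, step, max_steps=100):
    """Move one feature in fixed steps until the predicted label changes."""
    original = f(x)
    candidate = list(x)
    for _ in range(max_steps):
        candidate[feature] += step
        if f(candidate) != original:
            return candidate
    return None  # no flip found within the search budget


x = [2.0, 1.0]  # rejected: 0.5 * 2.0 - 0.3 * 1.0 = 0.7
cf = find_counterfactual(classify, x, feature=0, step=0.25)
print(x, classify(x), cf, classify(cf))
```

The result reads as a "what would have to change" statement: with the same debt, an income of 2.75 instead of 2.0 would have led to approval.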

Integrated Gradients

Integrated Gradients aims to attribute an importance value to each input feature of a machine learning model based on the gradients of the model output with respect to the input. It has many use cases including understanding feature importances, identifying data skew, and debugging model performance.

Integrated Gradients is designed to be applied locally.
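The following sketch approximates Integrated Gradients for a small differentiable function. It averages gradients along the straight line from a baseline to the input and scales them by the input difference. Gradients are estimated with finite differences here so the example needs no deep learning framework; `model`, `gradient`, and the all-zero baseline are assumptions for illustration.

```python
def model(x):
    # Hypothetical differentiable model.
    return x[0] ** 2 + 3.0 * x[0] * x[1]


def gradient(f, x, eps=1e-5):
    # Central finite differences stand in for automatic differentiation.
    grads = []
    for i in range(len(x)):
        up = x[:]
        up[i] += eps
        down = x[:]
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


def integrated_gradients(f, x, baseline, steps=200):
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint rule along the path
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = gradient(f, point)
        avg_grad = [a + gi / steps for a, gi in zip(avg_grad, g)]
    # Scale the averaged gradients by the distance from the baseline.
    return [(xi - b) * a for xi, b, a in zip(x, baseline, avg_grad)]


x = [1.0, 2.0]
baseline = [0.0, 0.0]
attributions = integrated_gradients(model, x, baseline)
print(attributions, sum(attributions), model(x) - model(baseline))
```

The attributions sum to the difference between the model output at the input and at the baseline, a property known as completeness. The choice of baseline affects the attributions, so it should be reported alongside them.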

Global Interpretation via Recursive Partitioning (GIRP)

GIRP builds a compact binary tree that interprets ML models globally by representing the most important decision rules implicitly contained in the model, using a contribution matrix of input variables. To generate the interpretation tree, a unified process recursively partitions the input variable space by maximizing the difference in the average contribution of the split variable between the divided spaces.

As we can guess, GIRP can only be applied globally.


ProtoDash

ProtoDash is an approach for finding "prototypes" in an existing machine learning program. A prototype can be thought of as a subset of the data that has a greater influence on the predictive power of the model. The point of a prototype is to say something like: if you removed these data points, the model wouldn't function as well, so that one can understand what's driving predictions.

ProtoDash can only be applied locally.

Scalable Bayesian Rule Lists

Scalable Bayesian Rule Lists learn from the data and create a decision rule list. The result has a logical structure that is a sequence of IF-THEN rules, identical to a decision list or one-sided decision tree.

Scalable Bayesian Rule Lists can be used both globally and locally.

Tree Surrogates

A tree surrogate is an interpretable model that is trained to approximate the predictions of a black-box model. We can draw conclusions about the black-box model by interpreting the surrogate model. The resulting trees are easily human-interpretable and provide quantitative predictions of future behaviour.

Tree Surrogates can be used both globally and locally.

Explainable Boosting Machine (EBM)

EBM is an interpretable model developed at Microsoft Research. It uses modern machine learning techniques like bagging, gradient boosting, and automatic interaction detection to breathe new life into traditional GAMs (Generalized Additive Models).

Explainable Boosting Machine (EBM) is a tree-based, cyclic gradient boosting Generalized Additive Model with automatic interaction detection. EBMs are often as accurate as state-of-the-art blackbox models while remaining completely interpretable. Although EBMs are often slower to train than other modern algorithms, EBMs are extremely compact and fast at prediction time.

EBM can be used both globally and locally.

Explainability tools



Alibi

In 2019, Seldon Technologies introduced the Alibi package. This Python package is designed to provide explanations for predictions made by machine learning models, assess the confidence in decisions made by these models, and ultimately facilitate in-depth analysis of model performance with respect to concept drift and algorithmic bias. The primary aim of this library is to offer a high-quality implementation of explanation techniques (both local and global) for machine learning models. The package includes a range of algorithms that provide local explanations for predictions generated by machine learning models. These algorithms and their summaries can be found in the table below.

[Table: explanation algorithms supported by Alibi; the surviving rows list Kernel SHAP and Tree SHAP. Source: GitHub]

Instance-specific scores such as the trust score and the linearity measure are provided by the tool to estimate the model's confidence in a particular decision. The trust score is the ratio between the distance to the nearest class different from the predicted class and the distance to the predicted class. The linearity measure quantifies how linear the model is around a test instance: it feeds the model linear superpositions of inputs and compares the output with the linear combination of the outputs from predictions on the single inputs.
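The distance ratio behind the trust score can be sketched with a plain nearest-neighbour search. This is a bare illustration of the ratio described above, not Alibi's API, and it omits any preprocessing or filtering a production implementation may apply; the `trust_score` helper and the toy data are assumptions.

```python
import math


def trust_score(x, predicted_label, X_train, y_train):
    """Distance to the nearest other class divided by distance to the predicted class."""
    d_pred = min(math.dist(x, p)
                 for p, label in zip(X_train, y_train) if label == predicted_label)
    d_other = min(math.dist(x, p)
                  for p, label in zip(X_train, y_train) if label != predicted_label)
    if d_pred == 0.0:
        return math.inf  # x coincides with a training point of the predicted class
    return d_other / d_pred


X_train = [[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]]
y_train = [0, 0, 1, 1]
x = [1.0, 0.0]

score_if_class_0 = trust_score(x, 0, X_train, y_train)
score_if_class_1 = trust_score(x, 1, X_train, y_train)
print(score_if_class_0, score_if_class_1)
```

A score above 1 means the instance sits closer to the predicted class than to any other class, which supports the prediction; a score below 1 suggests the prediction deserves a second look.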


Skater

Skater is a unified framework to help users build an IML system for real-world applications. It is designed to demystify both global and local learned structures of the ML model and is implemented in Python. It adopts object-oriented and functional programming paradigms where needed to provide scalability and concurrency while keeping the code brief. This library is in the beta phase, and active development is taking place.

Using the Skater package, users can identify the behavior of variable interaction, enhance domain knowledge, and evaluate the behavior of the ML model on a single instance of dataset or a complete data set. The drawback of the Skater tool is that it relies on different packages for implementing its interpretable methods. For example, to execute the LIME method, it relies on different packages such as NumPy and scikit-learn.


InterpretML

H. Nori et al. developed InterpretML, an open-source Python package that brings interpretability techniques for ML under one roof. It is easy to use and flexible, and it can train interpretable glass-box models as well as explain existing ML models. InterpretML has interactive dashboards that support data filtering and cohort creation to help users observe and understand model performance for different subsets of the dataset. In addition, local and global explanations are displayed on the dashboard in varied representations, and the tool also has debugging capabilities. InterpretML mainly focuses on interpretation techniques that help users understand the reasoning behind the model's predictions.


ELI5

ELI5 is a Python package that helps users debug ML classifiers and explain their predictions. It offers many built-in functions that are easy to use. Features of this tool such as text highlighting and feature filtering can be reused. It supports model-agnostic methods, but its LIME implementation only supports text classifiers.


EthicalML-XAI

The EthicalML-XAI library is built on the eight principles for Responsible Machine Learning, with AI explainability at its core. It contains different tools for the examination and assessment of data and models. More generally, the XAI library is designed around a three-phase machine learning process: data analysis as phase one, evaluation of the ML model as phase two, and production monitoring as phase three. This Python library is currently in an early stage of development.


Anchor

This package is implemented in Python, R, and Java. It explains individual predictions of any ML classifier by finding high-precision rules known as anchors. An anchor fixes the prediction locally, in the sense that changes in the remaining feature values of the instance do not matter. Anchors also address a shortcoming of the local explanation method LIME, which approximates the local behavior of the model linearly. The anchor tool has some limitations, such as potentially conflicting anchors and overly specific anchors. With potentially conflicting anchors, two or more anchors with different predictions may apply to the same test instance, so choosing an anchor becomes ambiguous. With overly specific anchors, predictions that are near the boundary of the ML model's decision function, or predictions of rare classes, may require very specific conditions, and thus their anchors may be complex and provide low coverage.


SHAP

This library is based on game theory and explains the predictions of any ML model. SHAP uses the concept of Shapley values to score and rank the features of the ML model. Shapley values consider all possible predictions for an instance, using every combination of input variables. Because of this comprehensive approach, SHAP can guarantee properties such as consistency and local accuracy. However, exact SHAP value calculation is time-consuming and expensive because it evaluates all possible combinations. SHAP also has a wrapper implemented in R known as shapper. SHAP offers various ML interpretability techniques.


LightGBM

LightGBM is a gradient boosting framework with Python and R interfaces. It uses tree-based learning algorithms, supports different interpretable methods, and produces efficient and accurate results. It handles massive data, requires little memory, and is optimized through parallel and GPU learning.

EXtreme Gradient Boosting (XGBoost)

XGBoost is a distributed platform that executes ML algorithms under the gradient boosting framework. It is lightweight and flexible. XGBoost also provides a parallel boosting mechanism that helps users solve many real-world decision-making problems quickly and accurately. The library is available in various programming languages such as Python, R, Julia, Java, C, and C++. It also supports interpretable techniques.

H2O

H2O is a scalable and distributed framework for ML. The H2O library uses predictive modeling to give users insights about the data quickly, and it is supported through different languages and interfaces such as Java, Scala, Python, R, JSON, and Flow Notebook. Interpretable techniques in H2O are described for both regular and time-series experiments. These techniques do not support image data. It also has an interactive dashboard where users can run their experiments and view explanations on the interpretation model tab. This UI gives users the option to view local, global, and cluster-specific explanations.


LIME

LIME is a Python and R library that supports explaining individual predictions for text classifiers, images, and tabular data. LIME is fast and can explain any ML algorithm, including multi-class models. The output of LIME (local interpretability) helps individuals determine which feature changes have a considerable impact on the prediction. Using LIME is not entirely straightforward, because the kernel settings must be tuned for each application to get accurate explanations of the model's output. Also, linear models are used to explain the local behavior of the ML model, so this approach may not capture complex behavior well. Its explanations can be unstable and are not always consistent across runs.


DALEX

The DALEX package x-rays any ML model and helps to explore and explain its complex behavior. Numerous methods in the DALEX package help users understand the connection between the features and the predicted outcome of the ML model. The implemented interpretable methods help individuals test the model at the level of a single instance and of the whole dataset [25]. This tool, implemented in R, provides useful approaches to compare results across multiple ML models. It does not support multinomial classes.

Statistical Machine Intelligence and Learning Engine (SMILE)

SMILE is a notable library for a wide range of ML tasks. It uses memory efficiently and covers a broad range of ML functionality. It is fast and provides NLP, data visualization, and statistical techniques. SMILE is self-contained and requires only the standard Java library.

AI Explainability 360

AI Explainability 360 is a Python package that supports the interpretability and explainability of datasets and ML models and provides a flexible, familiar programming interface. This toolkit, intended for high-stakes applications of ML models, contains several algorithms and provides explanations for ML models in different ways [32]. In data explanation, users need to understand the characteristics and features of the data before any ML model is applied. Data explanations are provided by the DIPVAE and ProtoDash algorithms. In model explanation, there are several ways to explain the ML model, such as local versus global and directly interpretable versus post-hoc explanation. For global directly interpretable models, the algorithms implemented are Boolean Decision Rules and Generalized Linear Rule models. For global post-hoc explanation, the ProfWeight technique is implemented. For local directly interpretable models, the Teaching AI to Explain its Decisions (TED) technique is implemented. Local post-hoc explanation supports the CEM, LIME, SHAP, and ProtoDash methods. It also supports explainability metrics known as faithfulness and monotonicity.


glmnet

The glmnet package is fast, is implemented in the R programming language, and fits generalized linear models via penalized maximum likelihood. The algorithm fits several regression models, such as linear, logistic, multinomial, and Poisson models. This package is also available in other languages such as Python and MATLAB.


pdp

The pdp package is used for constructing partial dependence plots and individual conditional expectation curves [28]. Implemented in R, it is flexible in producing various kinds of PDPs and has numerous features, for example an option to show progress bars, an option to mitigate the risks of extrapolation, and options to construct PDPs in parallel. Its two main functions are: a) partial, which computes partial dependence functions and individual conditional expectations from various fitted model objects, and b) plotPartial, which builds lattice-based PDP and ICE curves.


PDPbox

The objective of this library is to determine the effect of input features on model predictions for any ML algorithm using PDPs. It supports multi-class and two-variable interaction PDPs. It is similar to ICEbox but implemented in Python. It is also a way of handling complex mutual dependencies among features.


ICEbox

This repository presents the R code for Individual Conditional Expectation (ICE) plots, which are similar to Partial Dependence Plots. ICE curves show the functional relationship between input and output variables of any supervised ML algorithm for individual observations, and users can visualize those graphs and understand the reasoning behind the predicted ML output.


With increasingly complex architectures, model interpretation is something you simply have to do nowadays. 

The tools we have explored in this article are not the only available tools, and there are many ways to make sense of model predictions. Some of these ways might include tools or frameworks, while others might not, and I encourage you to explore them all.

If you still want to explore more techniques and tools to interpret your model, here is a GitHub repo that collects these techniques and tools.

written by - Ankit Mandal