Introducing LightSHAP

LightSHAP is here – a new, lightweight SHAP implementation for tabular data. While heavily inspired by the famous shap package, it has no dependency on it. LightSHAP simplifies working with dataframes (pandas, polars) and categorical data.

Key Features

  • Tree Models: TreeSHAP wrappers for XGBoost, LightGBM, and CatBoost via explain_tree()
  • Model-Agnostic: Permutation SHAP and Kernel SHAP via explain_any()
  • Visualization: Flexible plots

Highlights of the agnostic explainer:

  1. Exact and sampling versions of permutation SHAP and Kernel SHAP
  2. Sampling versions iterate until convergence, and provide standard errors
  3. Parallel processing via joblib
  4. Supports multi-output models
  5. Supports case weights
  6. Accepts numpy, pandas, and polars input, and categorical features
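
To make the idea behind exact permutation SHAP concrete, here is a minimal NumPy sketch. It is not LightSHAP's implementation: the function `exact_permutation_shap()`, the toy model, and the simulated background data are hypothetical and only illustrate the principle. Features are switched on one at a time in every possible order, and each feature is credited with the average change in the mean prediction over the background data.

```python
from itertools import permutations

import numpy as np


def exact_permutation_shap(predict, x, background):
    """SHAP values of one row x, averaged over all feature orderings.

    predict: maps an (n, p) array to (n,) predictions
    x: (p,) row to explain
    background: (m, p) data used to average out "absent" features
    """
    p = x.shape[0]
    phi = np.zeros(p)
    orderings = list(permutations(range(p)))  # p! orderings, small p only

    def value(mask):
        # Mean prediction with masked-in features fixed to x,
        # all other features taken from the background rows
        data = background.copy()
        data[:, mask] = x[mask]
        return float(np.mean(predict(data)))

    for order in orderings:
        mask = np.zeros(p, dtype=bool)
        prev = value(mask)
        for j in order:
            mask[j] = True
            cur = value(mask)
            phi[j] += cur - prev  # marginal contribution of feature j
            prev = cur

    baseline = value(np.zeros(p, dtype=bool))
    return phi / len(orderings), baseline


def toy_model(X):
    # Hypothetical model: one additive term plus an interaction
    return X[:, 0] + 2 * X[:, 1] * X[:, 2]


rng = np.random.default_rng(0)
background = rng.normal(size=(50, 3))
x = np.array([1.0, 2.0, -1.0])

phi, baseline = exact_permutation_shap(toy_model, x, background)

# Additivity: SHAP values plus baseline reproduce the prediction
print(phi.sum() + baseline, toy_model(x[None, :])[0])
```

The cost grows with p!, which is why exact versions are only practical for a small number of features and sampling versions are needed otherwise.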

Some methods of the explanation object:

  • plot.bar(): Feature importance bar plot
  • plot.beeswarm(): Summary beeswarm plot
  • plot.scatter(): Dependence plots
  • plot.waterfall(): Waterfall plot for individual explanations
  • importance(): Returns feature importance values
  • set_X(): Update explanation data, e.g., to replace a numpy array with a DataFrame
  • set_feature_names(): Set or update feature names
  • select_output(): Select a specific output for multi-output models
  • filter(): Subset explanations by condition or indices
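
As a rough illustration of what a SHAP feature importance typically is, the sketch below computes the mean absolute SHAP value per feature from a small matrix. The numbers are made up, and it is an assumption (not stated in this post) that importance() follows this common convention.

```python
import numpy as np

# Toy SHAP matrix: 4 explained rows, 3 features (made-up values)
shap_values = np.array(
    [
        [0.5, -0.1, 0.0],
        [-0.3, 0.2, 0.1],
        [0.4, -0.2, -0.1],
        [-0.6, 0.1, 0.0],
    ]
)
feature_names = ["log_carat", "clarity", "color"]

# Usual SHAP importance: average absolute SHAP value per feature
importance = np.abs(shap_values).mean(axis=0)

# Sort features from most to least important, as in a bar plot
order = np.argsort(-importance)
ranked = [(feature_names[j], float(importance[j])) for j in order]
print(ranked)
```

Taking absolute values before averaging keeps positive and negative contributions from cancelling out.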

Usage

Let’s demonstrate the two workhorses explain_tree() and explain_any() with small examples.

Prepare diamonds data

import catboost
import numpy as np
import seaborn as sns
import statsmodels.formula.api as smf

# pip install lightshap
from lightshap import explain_any, explain_tree

# Prepare data
df0 = sns.load_dataset("diamonds")

df = df0.assign(
    log_carat=lambda x: np.log(x.carat),
    log_price=lambda x: np.log(x.price),
)

# Features only
X = df[["log_carat", "clarity", "color", "cut"]]

Fit and explain boosted trees model

Let’s (naively) build a small CatBoost model and explain it using a sample of 1000 observations.

# Fit naively without validation strategy for simplicity
gbt = catboost.CatBoostRegressor(
    iterations=100, depth=4, cat_features=["clarity", "color", "cut"], verbose=0
)
_ = gbt.fit(X, y=df.log_price)

# SHAP analysis
X_explain = X.sample(1000, random_state=0)
gbt_explanation = explain_tree(gbt, X_explain)

gbt_explanation.plot.bar()
gbt_explanation.plot.beeswarm()
gbt_explanation.plot.scatter(sharey=False)
gbt_explanation.plot.waterfall(row_id=0)
Figure 1: SHAP importance bar plot for the CatBoost model
Figure 2: SHAP beeswarm plot for the CatBoost model
Figure 3: SHAP dependence plots for the CatBoost model
Figure 4: Explaining an individual prediction via SHAP waterfall plot for the CatBoost model

Fit and explain any model

To demonstrate the model-agnostic SHAP cruncher explain_any(), let’s fit a linear regression model with interactions and a natural cubic spline.

lm = smf.ols("log_price ~ cr(log_carat, df=4) + clarity * color + cut", data=df)
lm = lm.fit()

# SHAP analysis - automatically picking exact permutation SHAP
# due to the small number of features
X_explain = X.sample(1000, random_state=0)
lm_explanation = explain_any(lm.predict, X_explain)  # 5s on laptop

lm_explanation.plot.bar()
lm_explanation.plot.beeswarm()
lm_explanation.plot.scatter(sharey=False)
lm_explanation.plot.waterfall(row_id=0)
Figure 5: SHAP importance plot for the linear regression
Figure 6: SHAP beeswarm plot for the linear regression
Figure 7: SHAP dependence plots for the linear regression
Figure 8: SHAP waterfall plot to explain a single prediction of the linear regression
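
The example above can use exact permutation SHAP because there are only four features. With more features, a sampling version is needed. The following sketch shows the general idea with plain Monte Carlo sampling of feature orderings and a standard error per feature. It is not LightSHAP's algorithm: the function `sampled_permutation_shap()`, the fixed number of permutations `n_perm`, and the toy model are all hypothetical, and a real implementation would rather iterate until the standard errors are small enough.

```python
import numpy as np


def sampled_permutation_shap(predict, x, background, n_perm=200, seed=0):
    """Monte Carlo SHAP values of one row x, with standard errors."""
    rng = np.random.default_rng(seed)
    p = x.shape[0]
    contribs = np.zeros((n_perm, p))  # one row of contributions per ordering

    def value(mask):
        # Mean prediction with masked-in features fixed to x
        data = background.copy()
        data[:, mask] = x[mask]
        return float(np.mean(predict(data)))

    for i in range(n_perm):
        mask = np.zeros(p, dtype=bool)
        prev = value(mask)
        for j in rng.permutation(p):  # random feature ordering
            mask[j] = True
            cur = value(mask)
            contribs[i, j] = cur - prev
            prev = cur

    phi = contribs.mean(axis=0)
    se = contribs.std(axis=0, ddof=1) / np.sqrt(n_perm)
    return phi, se


def f(X):
    # Hypothetical model: two interactions, two additive terms
    return X[:, 0] * X[:, 1] + X[:, 2] + 0.5 * X[:, 3] - X[:, 4] * X[:, 5]


rng = np.random.default_rng(1)
bg = rng.normal(size=(100, 6))
x = np.arange(1.0, 7.0)

phi_hat, se = sampled_permutation_shap(f, x, bg)
print(np.round(phi_hat, 3))
print(np.round(se, 3))
```

In this sketch, purely additive features (here the third and fourth) get the same contribution in every ordering, so their standard error is essentially zero. Only features involved in interactions need many permutations.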

How to contribute?

  1. Test, test, test: The more people use and test the current beta version of the package, the better it will get.
  2. Open issues: If you see problems or gaps, please open an issue. We can then discuss whether and by whom it should be addressed.

Future plans

In its current early stage, the project is still a “one-man show”. As it grows, the aim is to move the project to a bigger organisation, e.g., a university.

Jupyter notebook

Comments

2 responses to “Introducing LightSHAP”

  1. Alessandro Mantovani

    Hi, this is a very interesting and important project. I am happy to help by testing the package with my data and reporting any issues.
    Thank you very much.
    Alessandro

    1. Michael Mayer

      Fantastic, thank you so much!
