{"id":1996,"date":"2025-10-15T11:43:21","date_gmt":"2025-10-15T09:43:21","guid":{"rendered":"https:\/\/lorentzen.ch\/?p=1996"},"modified":"2025-10-15T12:14:47","modified_gmt":"2025-10-15T10:14:47","slug":"introducing-lightshap","status":"publish","type":"post","link":"https:\/\/lorentzen.ch\/index.php\/2025\/10\/15\/introducing-lightshap\/","title":{"rendered":"Introducing LightSHAP"},"content":{"rendered":"\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"306\" height=\"65\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/10\/image.png\" alt=\"\" class=\"wp-image-1997\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/10\/image.png 306w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/10\/image-300x64.png 300w\" sizes=\"auto, (max-width: 306px) 100vw, 306px\" \/><\/figure>\n\n\n\n<p><a href=\"https:\/\/github.com\/mayer79\/LightSHAP\">LightSHAP<\/a> is here &#8211; a new, lightweight SHAP implementation for tabular data. While heavily inspired by the famous <a href=\"https:\/\/github.com\/shap\/shap\">shap<\/a> package, it does not depend on it. 
LightSHAP simplifies working with dataframes (pandas, polars) and categorical data.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Key Features<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Tree Models<\/strong>: TreeSHAP wrappers for XGBoost, LightGBM, and CatBoost via&nbsp;<code>explain_tree()<\/code><\/li>\n\n\n\n<li><strong>Model-Agnostic<\/strong>: Permutation SHAP and Kernel SHAP via&nbsp;<code>explain_any()<\/code><\/li>\n\n\n\n<li><strong>Visualization<\/strong>: Flexible plots<\/li>\n<\/ul>\n\n\n\n<p><strong>Highlights of the model-agnostic explainer:<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Exact and sampling versions of permutation SHAP and Kernel SHAP<\/li>\n\n\n\n<li>Sampling versions iterate until convergence and provide standard errors<\/li>\n\n\n\n<li>Parallel processing via joblib<\/li>\n\n\n\n<li>Supports multi-output models<\/li>\n\n\n\n<li>Supports case weights<\/li>\n\n\n\n<li>Accepts numpy, pandas, and polars input and supports categorical features<\/li>\n<\/ol>\n\n\n\n<p><strong>Some methods of the explanation object:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>plot.bar()<\/code>: Feature importance bar plot<\/li>\n\n\n\n<li><code>plot.beeswarm()<\/code>: Summary beeswarm plot<\/li>\n\n\n\n<li><code>plot.scatter()<\/code>: Dependence plots<\/li>\n\n\n\n<li><code>plot.waterfall()<\/code>: Waterfall plot for individual explanations<\/li>\n\n\n\n<li><code>importance()<\/code>: Returns feature importance values<\/li>\n\n\n\n<li><code>set_X()<\/code>: Update explanation data, e.g., to replace a numpy array with a DataFrame<\/li>\n\n\n\n<li><code>set_feature_names()<\/code>: Set or update feature names<\/li>\n\n\n\n<li><code>select_output()<\/code>: Select a specific output for multi-output models<\/li>\n\n\n\n<li><code>filter()<\/code>: Subset explanations by condition or indices<\/li>\n\n\n\n<li>&#8230;<\/li>\n<\/ul>\n\n\n\n<h1 
class=\"wp-block-heading\">Usage<\/h1>\n\n\n\n<p>Let&#8217;s demonstrate the two workhorses <code>explain_tree()<\/code> and <code>explain_any()<\/code> with small examples.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Prepare diamonds data<\/h2>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting=\"{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:&quot;language&quot;,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text\/x-python&quot;,&quot;theme&quot;:&quot;material&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;Python&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}\">import catboost\nimport numpy as np\nimport seaborn as sns\nimport statsmodels.formula.api as smf\n\n# pip install lightshap\nfrom lightshap import explain_any, explain_tree\n\n# Prepare data\ndf0 = sns.load_dataset(&quot;diamonds&quot;)\n\ndf = df0.assign(\n    log_carat=lambda x: np.log(x.carat),\n    log_price=lambda x: np.log(x.price),\n)\n\n# Features only\nX = df[[&quot;log_carat&quot;, &quot;clarity&quot;, &quot;color&quot;, &quot;cut&quot;]]<\/pre><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Fit and explain boosted trees model<\/h2>\n\n\n\n<p>Let&#8217;s (naively) build a small CatBoost model and explain it using a sample of 1000 observations.<\/p>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" 
data-setting=\"{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:&quot;language&quot;,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text\/x-python&quot;,&quot;theme&quot;:&quot;material&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;Python&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}\"># Fit naively without validation strategy for simplicity\ngbt = catboost.CatBoostRegressor(\n    iterations=100, depth=4, cat_features=[&quot;clarity&quot;, &quot;color&quot;, &quot;cut&quot;], verbose=0\n)\n_ = gbt.fit(X, y=df.log_price)\n\n# SHAP analysis\nX_explain = X.sample(1000, random_state=0)\ngbt_explanation = explain_tree(gbt, X_explain)\n\ngbt_explanation.plot.bar()\ngbt_explanation.plot.beeswarm()\ngbt_explanation.plot.scatter(sharey=False)\ngbt_explanation.plot.waterfall(row_id=0)<\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"559\" height=\"371\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/10\/image-1.png\" alt=\"\" class=\"wp-image-1998\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/10\/image-1.png 559w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/10\/image-1-300x199.png 300w\" sizes=\"auto, (max-width: 559px) 100vw, 559px\" \/><figcaption class=\"wp-element-caption\">Figure 1: SHAP importance bar plot for the CatBoost model<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"602\" height=\"371\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/10\/image-2.png\" alt=\"\" class=\"wp-image-1999\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/10\/image-2.png 602w, 
https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/10\/image-2-300x185.png 300w\" sizes=\"auto, (max-width: 602px) 100vw, 602px\" \/><figcaption class=\"wp-element-caption\">Figure 2: SHAP beeswarm plot for the CatBoost model<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"916\" height=\"690\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/10\/image-3.png\" alt=\"\" class=\"wp-image-2000\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/10\/image-3.png 916w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/10\/image-3-300x226.png 300w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/10\/image-3-768x579.png 768w\" sizes=\"auto, (max-width: 916px) 100vw, 916px\" \/><figcaption class=\"wp-element-caption\">Figure 3: SHAP dependence plots for the CatBoost model<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"718\" height=\"383\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/10\/image-4.png\" alt=\"\" class=\"wp-image-2001\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/10\/image-4.png 718w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/10\/image-4-300x160.png 300w\" sizes=\"auto, (max-width: 718px) 100vw, 718px\" \/><figcaption class=\"wp-element-caption\">Figure 4: Explaining an individual prediction via a SHAP waterfall plot for the CatBoost model<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Fit and explain any model<\/h2>\n\n\n\n<p>To demonstrate the model-agnostic SHAP cruncher <code>explain_any()<\/code>, let&#8217;s fit a linear regression model with an interaction between clarity and color and a natural cubic spline in <code>log_carat<\/code>.<\/p>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" 
data-setting=\"{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:&quot;language&quot;,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text\/x-python&quot;,&quot;theme&quot;:&quot;material&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;Python&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}\">lm = smf.ols(&quot;log_price ~ cr(log_carat, df=4) + clarity * color + cut&quot;, data=df)\nlm = lm.fit()\n\n# SHAP analysis - automatically picking exact permutation SHAP\n# due to the small number of features\nX_explain = X.sample(1000, random_state=0)\nlm_explanation = explain_any(lm.predict, X_explain)  # 5s on laptop\n\nlm_explanation.plot.bar()\nlm_explanation.plot.beeswarm()\nlm_explanation.plot.scatter(sharey=False)\nlm_explanation.plot.waterfall(row_id=0)<\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"559\" height=\"371\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/10\/image-5.png\" alt=\"\" class=\"wp-image-2002\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/10\/image-5.png 559w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/10\/image-5-300x199.png 300w\" sizes=\"auto, (max-width: 559px) 100vw, 559px\" \/><figcaption class=\"wp-element-caption\">Figure 5: SHAP importance plot for the linear regression<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"602\" height=\"371\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/10\/image-6.png\" alt=\"\" class=\"wp-image-2003\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/10\/image-6.png 602w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/10\/image-6-300x185.png 300w\" 
sizes=\"auto, (max-width: 602px) 100vw, 602px\" \/><figcaption class=\"wp-element-caption\">Figure 6: SHAP beeswarm plot for the linear regression<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"930\" height=\"690\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/10\/image-7.png\" alt=\"\" class=\"wp-image-2004\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/10\/image-7.png 930w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/10\/image-7-300x223.png 300w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/10\/image-7-768x570.png 768w\" sizes=\"auto, (max-width: 930px) 100vw, 930px\" \/><figcaption class=\"wp-element-caption\">Figure 7: SHAP dependence plots for the linear regression<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"718\" height=\"383\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/10\/image-8.png\" alt=\"\" class=\"wp-image-2005\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/10\/image-8.png 718w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/10\/image-8-300x160.png 300w\" sizes=\"auto, (max-width: 718px) 100vw, 718px\" \/><figcaption class=\"wp-element-caption\">Figure 8: SHAP waterfall plot to explain a single prediction of the linear regression<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">How to contribute?<\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Test, test, test: The more people use and test the current beta version of the package, the better it will get.<\/li>\n\n\n\n<li>Open issues: If you see problems or gaps, please open an <a href=\"https:\/\/github.com\/mayer79\/LightSHAP\/issues\">issue<\/a>. Then we will discuss whether, and by whom, it will be tackled.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">Future plans<\/h2>\n\n\n\n<p>In its current early stage, the project is still a &#8220;one-man show&#8221;. 
As the project grows, the aim is to move it to a larger organisation, e.g., a university.<\/p>\n\n\n\n<p><a href=\"https:\/\/github.com\/lorentzenchr\/notebooks\/blob\/master\/blogposts\/2025-10-15%20intro%20to%20lightshap.ipynb\">Jupyter notebook<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>LightSHAP is a new, lightweight SHAP implementation completely independent of the famous &#8220;shap&#8221; library.<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16,17,9],"tags":[6],"class_list":["post-1996","post","type-post","status-publish","format-standard","hentry","category-machine-learning","category-programming","category-statistics","tag-python"],"featured_image_src":null,"author_info":{"display_name":"Michael Mayer","author_link":"https:\/\/lorentzen.ch\/index.php\/author\/michael\/"},"_links":{"self":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/1996","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/comments?post=1996"}],"version-history":[{"count":3,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/1996\/revisions"}],"predecessor-version":[{"id":2008,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/1996\/revisions\/2008"}],"wp:attachment":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/media?parent=1996"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/categories?post=1996"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/loren
tzen.ch\/index.php\/wp-json\/wp\/v2\/tags?post=1996"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}