{"id":1734,"date":"2024-06-28T16:00:41","date_gmt":"2024-06-28T14:00:41","guid":{"rendered":"https:\/\/lorentzen.ch\/?p=1734"},"modified":"2024-07-04T17:38:36","modified_gmt":"2024-07-04T15:38:36","slug":"shap-values-of-additive-models","status":"publish","type":"post","link":"https:\/\/lorentzen.ch\/index.php\/2024\/06\/28\/shap-values-of-additive-models\/","title":{"rendered":"SHAP Values of Additive Models"},"content":{"rendered":"\n<p>Within only a few years, SHAP (Shapley additive explanations) has emerged as the number 1 way to investigate black-box models. The basic idea is to decompose model predictions into additive contributions of the features in a fair way. Studying decompositions of many predictions allows to derive global properties of the model.<\/p>\n\n\n\n<p><strong>What happens if we apply SHAP algorithms to additive models? Why would this ever make sense?<\/strong><\/p>\n\n\n\n<p>In the spirit of our &#8220;Lost In Translation&#8221; series, we provide both high-quality Python and R code.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The models<\/h2>\n\n\n\n<p>Let&#8217;s build the models using a dataset with three highly correlated covariates and a (deterministic) response.<\/p>\n\n\n<div class=\"wp-block-ub-tabbed-content wp-block-ub-tabbed-content-holder wp-block-ub-tabbed-content-horizontal-holder-mobile wp-block-ub-tabbed-content-horizontal-holder-tablet\" id=\"ub-tabbed-content-28b51e27-fb40-4132-8665-c5e494e2a40b\" style=\"\">\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-holder horizontal-tab-width-mobile horizontal-tab-width-tablet\">\n\t\t\t\t<div role=\"tablist\" class=\"wp-block-ub-tabbed-content-tabs-title wp-block-ub-tabbed-content-tabs-title-mobile-horizontal-tab wp-block-ub-tabbed-content-tabs-title-tablet-horizontal-tab\" style=\"justify-content: flex-start; \"><div role=\"tab\" id=\"ub-tabbed-content-28b51e27-fb40-4132-8665-c5e494e2a40b-tab-0\" aria-controls=\"ub-tabbed-content-28b51e27-fb40-4132-8665-c5e494e2a40b-panel-0\" aria-selected=\"true\" class=\"wp-block-ub-tabbed-content-tab-title-wrap active\" style=\"--ub-tabbed-title-background-color: #6d6d6d; --ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">R<\/div>\n\t\t\t<\/div><div role=\"tab\" id=\"ub-tabbed-content-28b51e27-fb40-4132-8665-c5e494e2a40b-tab-1\" aria-controls=\"ub-tabbed-content-28b51e27-fb40-4132-8665-c5e494e2a40b-panel-1\" aria-selected=\"false\" class=\"wp-block-ub-tabbed-content-tab-title-wrap\" style=\"--ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">Python<\/div>\n\t\t\t<\/div><\/div>\n\t\t\t<\/div>\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tabs-content\" style=\"\"><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap active\" id=\"ub-tabbed-content-2122694c-03da-4101-9853-656777da49eb-panel-0\" aria-labelledby=\"ub-tabbed-content-2122694c-03da-4101-9853-656777da49eb-tab-0\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"r\",\"mime\":\"text\/x-rsrc\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"R\",\"maxHeight\":\"400px\",\"modeName\":\"r\"}'>library(lightgbm)\nlibrary(kernelshap)\nlibrary(shapviz)\n\n#===================================================================\n# Make small data\n#===================================================================\n\nmake_data &lt;- function(n = 100) {\n  x1 &lt;- seq(0.01, 1, length = n)\n  data.frame(\n    x1 = x1,\n    x2 = log(x1),\n    x3 = x1 &gt; 0.7\n  ) |&gt;\n    transform(y = 1 + 0.2 * x1 + 0.5 * x2 + x3 + sin(2 * pi * x1))\n}\ndf &lt;- make_data()\nhead(df)\ncor(df) |&gt;\n  round(2)\n\n#      x1   x2   x3    y\n# x1 1.00 0.90 0.80 0.46\n# x2 0.90 1.00 0.58 0.58\n# x3 0.80 0.58 1.00 0.51\n# y  0.46 0.58 0.51 1.00\n\n#===================================================================\n# Additive linear model and additive boosted trees\n#===================================================================\n\n# Linear regression\nfit_lm &lt;- lm(y ~ poly(x1, 3) + poly(x2, 3) + x3, data = df)\nsummary(fit_lm)\n\n# Boosted trees\nxvars &lt;- setdiff(colnames(df), \"y\")\nX &lt;- data.matrix(df[xvars])\n\nparams &lt;- list(\n  learning_rate = 0.05,\n  objective = \"mse\",\n  max_depth = 1,\n  colsample_bynode = 0.7\n)\n\nfit_lgb &lt;- lgb.train(\n  params = params,\n  data = lgb.Dataset(X, label = df$y),\n  nrounds = 300\n)<\/pre><\/div>\n\n<\/div><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap ub-hide\" id=\"ub-tabbed-content-2122694c-03da-4101-9853-656777da49eb-panel-1\" aria-labelledby=\"ub-tabbed-content-2122694c-03da-4101-9853-656777da49eb-tab-1\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"python\",\"mime\":\"text\/x-python\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"Python\",\"maxHeight\":\"400px\",\"modeName\":\"python\"}'>import numpy as np\nimport lightgbm as lgb\nimport shap\nfrom sklearn.preprocessing import PolynomialFeatures\nfrom sklearn.compose import ColumnTransformer\nfrom sklearn.pipeline import Pipeline\nfrom sklearn.linear_model import LinearRegression\n\n#===================================================================\n# Make small data\n#===================================================================\n\ndef make_data(n=100):\n    x1 = np.linspace(0.01, 1, n)\n    x2 = np.log(x1)\n    x3 = x1 &gt; 0.7\n    X = np.column_stack((x1, x2, x3))\n\n    y = 1 + 0.2 * x1 + 0.5 * x2 + x3 + np.sin(2 * np.pi * x1)\n    \n    return X, y\n\nX, y = make_data()\n\n#===================================================================\n# Additive linear model and additive boosted trees\n#===================================================================\n\n# Linear model with polynomial terms\npoly = PolynomialFeatures(degree=3, include_bias=False)\n\npreprocessor =  ColumnTransformer(\n    transformers=[\n        (\"poly0\", poly, [0]),\n        (\"poly1\", poly, [1]),\n        (\"other\", \"passthrough\", [2]),\n    ]\n)\n\nmodel_lm = Pipeline(\n    steps=[\n        (\"preprocessor\", preprocessor),\n        (\"lm\", LinearRegression()),\n    ]\n)\n_ = model_lm.fit(X, y)\n\n# Boosted trees with single-split trees\nparams = dict(\n    learning_rate=0.05,\n    objective=\"mse\",\n    max_depth=1,\n    colsample_bynode=0.7,\n)\n\nmodel_lgb = lgb.train(\n    params=params,\n    train_set=lgb.Dataset(X, label=y),\n    num_boost_round=300,\n)<\/pre><\/div>\n\n<\/div><\/div>\n\t\t<\/div>\n\n\n<h2 class=\"wp-block-heading\">SHAP<\/h2>\n\n\n\n<p>For both models, we use exact permutation SHAP and exact Kernel SHAP. Furthermore, the linear model is analyzed with &#8220;additive SHAP&#8221;, and the tree-based model with TreeSHAP.<\/p>\n\n\n\n<p>Do the algorithms provide the same?<\/p>\n\n\n<div class=\"wp-block-ub-tabbed-content wp-block-ub-tabbed-content-holder wp-block-ub-tabbed-content-horizontal-holder-mobile wp-block-ub-tabbed-content-horizontal-holder-tablet\" id=\"ub-tabbed-content-fada91e1-740f-4df4-8da6-6636caf90564\" style=\"\">\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-holder horizontal-tab-width-mobile horizontal-tab-width-tablet\">\n\t\t\t\t<div role=\"tablist\" class=\"wp-block-ub-tabbed-content-tabs-title wp-block-ub-tabbed-content-tabs-title-mobile-horizontal-tab wp-block-ub-tabbed-content-tabs-title-tablet-horizontal-tab\" style=\"justify-content: flex-start; \"><div role=\"tab\" id=\"ub-tabbed-content-fada91e1-740f-4df4-8da6-6636caf90564-tab-0\" aria-controls=\"ub-tabbed-content-fada91e1-740f-4df4-8da6-6636caf90564-panel-0\" aria-selected=\"true\" class=\"wp-block-ub-tabbed-content-tab-title-wrap active\" style=\"--ub-tabbed-title-background-color: #6d6d6d; --ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">R<\/div>\n\t\t\t<\/div><div role=\"tab\" id=\"ub-tabbed-content-fada91e1-740f-4df4-8da6-6636caf90564-tab-1\" aria-controls=\"ub-tabbed-content-fada91e1-740f-4df4-8da6-6636caf90564-panel-1\" aria-selected=\"false\" class=\"wp-block-ub-tabbed-content-tab-title-wrap\" style=\"--ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">Python<\/div>\n\t\t\t<\/div><\/div>\n\t\t\t<\/div>\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tabs-content\" style=\"\"><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap active\" id=\"ub-tabbed-content-2122694c-03da-4101-9853-656777da49eb-panel-0\" aria-labelledby=\"ub-tabbed-content-2122694c-03da-4101-9853-656777da49eb-tab-0\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"r\",\"mime\":\"text\/x-rsrc\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"R\",\"maxHeight\":\"400px\",\"modeName\":\"r\"}'>system.time({  # 1s\n  shap_lm &lt;- list(\n    add = shapviz(additive_shap(fit_lm, df)),\n    kern = kernelshap(fit_lm, X = df[xvars], bg_X = df),\n    perm = permshap(fit_lm, X = df[xvars], bg_X = df)\n  )\n\n  shap_lgb &lt;- list(\n    tree = shapviz(fit_lgb, X),\n    kern = kernelshap(fit_lgb, X = X, bg_X = X),\n    perm = permshap(fit_lgb, X = X, bg_X = X)\n  )\n})\n\n# Consistent SHAP values for linear regression\nall.equal(shap_lm$add$S, shap_lm$perm$S)\nall.equal(shap_lm$kern$S, shap_lm$perm$S)\n\n# Consistent SHAP values for boosted trees\nall.equal(shap_lgb$lgb_tree$S, shap_lgb$lgb_perm$S)\nall.equal(shap_lgb$lgb_kern$S, shap_lgb$lgb_perm$S)\n\n# Linear coefficient of x3 equals slope of SHAP values\ntail(coef(fit_lm), 1)                # 1.112096\ndiff(range(shap_lm$kern$S[, \"x3\"]))  # 1.112096\n\nsv_dependence(shap_lm$add, xvars)sv_dependence(shap_lm$add, xvars, color_var = NULL)<\/pre><\/div>\n\n<\/div><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap ub-hide\" id=\"ub-tabbed-content-2122694c-03da-4101-9853-656777da49eb-panel-1\" aria-labelledby=\"ub-tabbed-content-2122694c-03da-4101-9853-656777da49eb-tab-1\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"python\",\"mime\":\"text\/x-python\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"Python\",\"maxHeight\":\"400px\",\"modeName\":\"python\"}'>shap_lm = {\n    \"add\": shap.Explainer(model_lm.predict, masker=X, algorithm=\"additive\")(X),\n    \"perm\": shap.Explainer(model_lm.predict, masker=X, algorithm=\"exact\")(X),\n    \"kern\": shap.KernelExplainer(model_lm.predict, data=X).shap_values(X),\n}\n\nshap_lgb = {\n    \"tree\": shap.Explainer(model_lgb)(X),\n    \"perm\": shap.Explainer(model_lgb.predict, masker=X, algorithm=\"exact\")(X),\n    \"kern\": shap.KernelExplainer(model_lgb.predict, data=X).shap_values(X),\n}\n\n# Consistency for additive linear regression\neps = 1e-12\nassert np.abs(shap_lm[\"add\"].values - shap_lm[\"perm\"].values).max() &lt; eps\nassert np.abs(shap_lm[\"perm\"].values - shap_lm[\"kern\"]).max() &lt; eps\n\n# Consistency for additive boosted trees\nassert np.abs(shap_lgb[\"tree\"].values - shap_lgb[\"perm\"].values).max() &lt; eps\nassert np.abs(shap_lgb[\"perm\"].values - shap_lgb[\"kern\"]).max() &lt; eps\n\n# Linear effect of last feature in the fitted model\nmodel_lm.named_steps[\"lm\"].coef_[-1]  # 1.112096\n\n# Linear effect of last feature derived from SHAP values (ignore the sign)\nshap_lm[\"perm\"][:, 2].values.ptp()    # 1.112096\n\nshap.plots.scatter(shap_lm[\"add\"])<\/pre><\/div>\n\n<\/div><\/div>\n\t\t<\/div>\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"358\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/06\/image-4-1024x358.png\" alt=\"\" class=\"wp-image-1740\" style=\"width:909px;height:auto\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/06\/image-4-1024x358.png 1024w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/06\/image-4-300x105.png 300w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/06\/image-4-768x269.png 768w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/06\/image-4.png 1290w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">SHAP dependence plot of the additive linear model and the additive explainer (Python).<\/figcaption><\/figure>\n\n\n\n<p>Yes &#8211; the three algorithms within model provide the same SHAP values. Furthermore, the SHAP values reconstruct the additive components of the features. <\/p>\n\n\n\n<p>Didactically, this is very helpful when introducing SHAP as a method: Pick a white-box and a black-box model and compare their SHAP dependence plots. For the white-box model, you simply see the additive components, while the dependence plots of the black-box model show scatter due to interactions.<\/p>\n\n\n\n<p><strong>Remark: The exact equivalence between algorithms is lost, when<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>there are too many features for exact procedures (~10+ features), and\/or when<\/li>\n\n\n\n<li>the background data of Kernel\/Permutation SHAP does not agree with the training data. This leads to slightly different estimates of the baseline value, which itself influences the calculation of SHAP values.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Final words<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>SHAP algorithms applied to additive models typically give identical results. Slight differences might occur because sampling versions of the algos are used, or a different baseline value is estimated.<\/li>\n\n\n\n<li>The resulting SHAP values describe the additive components.<\/li>\n\n\n\n<li>Didactically, it helps to see SHAP analyses of white-box and black-box models side by side.<\/li>\n<\/ul>\n\n\n\n<p><a href=\"https:\/\/github.com\/lorentzenchr\/notebooks\/blob\/master\/blogposts\/2024-06-28%20additive%20shap.R\">R script<\/a> , <a href=\"https:\/\/github.com\/lorentzenchr\/notebooks\/blob\/master\/blogposts\/2024-06-28%20additive%20shap.ipynb\">Python notebook<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This post investigates properties of SHAP values of additive models. <\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16,17,9],"tags":[10,6,5],"class_list":["post-1734","post","type-post","status-publish","format-standard","hentry","category-machine-learning","category-programming","category-statistics","tag-lost-in-translation","tag-python","tag-r"],"featured_image_src":null,"author_info":{"display_name":"Michael Mayer","author_link":"https:\/\/lorentzen.ch\/index.php\/author\/michael\/"},"_links":{"self":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/1734","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/comments?post=1734"}],"version-history":[{"count":2,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/1734\/revisions"}],"predecessor-version":[{"id":1743,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/1734\/revisions\/1743"}],"wp:attachment":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/media?parent=1734"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/categories?post=1734"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/tags?post=1734"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}