{"id":952,"date":"2022-09-30T10:06:18","date_gmt":"2022-09-30T08:06:18","guid":{"rendered":"https:\/\/lorentzen.ch\/?p=952"},"modified":"2022-09-30T10:06:19","modified_gmt":"2022-09-30T08:06:19","slug":"kernel-shap-in-r-and-python","status":"publish","type":"post","link":"https:\/\/lorentzen.ch\/index.php\/2022\/09\/30\/kernel-shap-in-r-and-python\/","title":{"rendered":"Kernel SHAP in R and Python"},"content":{"rendered":"\n<h1 class=\"wp-block-heading\">Lost in Translation between R and Python 9<\/h1>\n\n\n\n<p>This is the next article in our series <strong>&#8220;Lost in Translation between R and Python&#8221;<\/strong>. The aim of this series is to provide high-quality R <strong>and<\/strong> Python code to achieve some non-trivial tasks. If you are to learn R, check out the R tab below. Similarly, if you are to learn Python, the Python tab will be your friend.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Kernel SHAP<\/h2>\n\n\n\n<p>SHAP is one of the most used model interpretation technique in Machine Learning. It decomposes predictions into additive contributions of the features in a fair way. For tree-based methods, the fast TreeSHAP algorithm exists. For general models, one has to resort to computationally expensive Monte-Carlo sampling or the faster Kernel SHAP algorithm. Kernel SHAP uses a regression trick to get the SHAP values of an observation with a comparably small number of calls to the predict function of the model. Still, it is much slower than TreeSHAP.<\/p>\n\n\n\n<p>Two good references for Kernel SHAP:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>Scott M. Lundberg and Su-In Lee. A Unified Approach to Interpreting Model Predictions. Advances in Neural Information Processing Systems 30, 2017.<\/li><li>Ian Covert and Su-In Lee. Improving KernelSHAP: Practical Shapley Value Estimation Using Linear Regression. 
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:3457-3465, 2021.<\/li><\/ol>\n\n\n\n<p>In our <a href=\"https:\/\/lorentzen.ch\/index.php\/2022\/08\/12\/kernel-shap\/\">last post<\/a>, we introduced our new &#8220;kernelshap&#8221; package in R. Since then, the package has been substantially improved, not least thanks to the great help of <a href=\"https:\/\/github.com\/dswatson\">David Watson<\/a>: <\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>The package now supports multi-dimensional predictions.<\/li><li>It received a massive speed-up.<\/li><li>Additionally, parallel computing can be activated for even faster calculations.<\/li><li>The interface has become more intuitive.<\/li><li>If the number of features is small (up to ten or eleven), it can provide exact Kernel SHAP values just like the <a href=\"https:\/\/github.com\/slundberg\/shap\">reference Python implementation<\/a>.<\/li><li>For a larger number of features, it now uses partly-exact (&#8220;hybrid&#8221;) calculations, very similar to the logic in the Python implementation.<\/li><\/ol>\n\n\n\n<p>With these changes, the R implementation now meets the Python version at eye level.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Example with four features <\/h2>\n\n\n\n<p>In the following, we use the diamonds data to fit a linear regression with<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>log(price) as response<\/li><li>log(carat) as numeric feature<\/li><li>clarity, color and cut as categorical features (internally dummy encoded)<\/li><li>interactions between log(carat) and the other three &#8220;C&#8221; variables. Note that the interactions are very weak<\/li><\/ul>\n\n\n\n<p>Then, we calculate SHAP decompositions for about 1000 diamonds (every 53rd diamond), using 120 diamonds as the background dataset. 
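As a side note, the regression trick behind exact Kernel SHAP can be made concrete for a small number of features by enumerating the on-off vectors and their Shapley kernel weights. A minimal Python sketch (the weight formula is the standard one from Lundberg and Lee, 2017; the enumeration code is purely illustrative):

```python
from itertools import product
from math import comb

p = 4  # number of features

# All binary on-off vectors except all-zeros and all-ones: 2^4 - 2 = 14
vectors = [z for z in product([0, 1], repeat=p) if 0 < sum(z) < p]

# Shapley kernel weight for a coalition of size s (Lundberg & Lee, 2017)
def kernel_weight(s, p):
    return (p - 1) / (comb(p, s) * s * (p - s))

for z in vectors:
    print(z, kernel_weight(sum(z), p))
```

With only 14 vectors, each weighted this way in the constrained least-squares problem, the exact solution is cheap; the cost only explodes for larger p.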
In this case, both R and Python will use exact calculations based on m=2^4 &#8211; 2 = 14 possible binary on-off vectors (a value of 1 representing a feature value picked from the original observation, a value of 0 a value picked from the background data).<\/p>\n\n\n<div class=\"wp-block-ub-tabbed-content wp-block-ub-tabbed-content-holder wp-block-ub-tabbed-content-horizontal-holder-mobile wp-block-ub-tabbed-content-horizontal-holder-tablet\" id=\"ub-tabbed-content-4379a9e8-8a98-409a-812a-fe43ad823386\" style=\"\">\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-holder horizontal-tab-width-mobile horizontal-tab-width-tablet\">\n\t\t\t\t<div role=\"tablist\" class=\"wp-block-ub-tabbed-content-tabs-title wp-block-ub-tabbed-content-tabs-title-mobile-horizontal-tab wp-block-ub-tabbed-content-tabs-title-tablet-horizontal-tab\" style=\"justify-content: flex-start; \"><div role=\"tab\" id=\"ub-tabbed-content-4379a9e8-8a98-409a-812a-fe43ad823386-tab-0\" aria-controls=\"ub-tabbed-content-4379a9e8-8a98-409a-812a-fe43ad823386-panel-0\" aria-selected=\"false\" class=\"wp-block-ub-tabbed-content-tab-title-wrap\" style=\"--ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">R<\/div>\n\t\t\t<\/div><div role=\"tab\" id=\"ub-tabbed-content-4379a9e8-8a98-409a-812a-fe43ad823386-tab-1\" aria-controls=\"ub-tabbed-content-4379a9e8-8a98-409a-812a-fe43ad823386-panel-1\" aria-selected=\"true\" class=\"wp-block-ub-tabbed-content-tab-title-wrap active\" style=\"--ub-tabbed-title-background-color: #6d6d6d; --ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">Python<\/div>\n\t\t\t<\/div><\/div>\n\t\t\t<\/div>\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tabs-content\" style=\"\"><div role=\"tabpanel\" 
class=\"wp-block-ub-tabbed-content-tab-content-wrap ub-hide\" id=\"ub-tabbed-content-4379a9e8-8a98-409a-812a-fe43ad823386-panel-0\" aria-labelledby=\"ub-tabbed-content-4379a9e8-8a98-409a-812a-fe43ad823386-tab-0\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"r\",\"mime\":\"text\/x-rsrc\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"R\",\"maxHeight\":\"400px\",\"modeName\":\"r\"}'>library(ggplot2)\nlibrary(kernelshap)\n\n# Turn ordinal factors into unordered\nord &lt;- c(\"clarity\", \"color\", \"cut\")\ndiamonds[, ord] &lt;- lapply(diamonds[ord], factor, ordered = FALSE)\n\n# Fit model\nfit &lt;- lm(log(price) ~ log(carat) * (clarity + color + cut), data = diamonds)\n\n# Subset of 120 diamonds used as background data\nbg_X &lt;- diamonds[seq(1, nrow(diamonds), 450), ]\n\n# Subset of 1018 diamonds to explain\nX_small &lt;- diamonds[seq(1, nrow(diamonds), 53), c(\"carat\", ord)]\n\n# Exact KernelSHAP (5 seconds)\nsystem.time(\n  ks &lt;- kernelshap(fit, X_small, bg_X = bg_X)  \n)\nks\n\n# SHAP values of first 2 observations:\n#          carat     clarity     color        cut\n# [1,] -2.050074 -0.28048747 0.1281222 0.01587382\n# [2,] -2.085838  0.04050415 0.1283010 0.03731644\n\n# Using parallel backend\nlibrary(\"doFuture\")\n\nregisterDoFuture()\nplan(multisession, workers = 2)  # Windows\n# plan(multicore, workers = 2)   # Linux, macOS, Solaris\n\n# 3 seconds on second call\nsystem.time(\n  ks3 &lt;- kernelshap(fit, X_small, bg_X = bg_X, parallel = TRUE)  \n)\n\n# Visualization\nlibrary(shapviz)\n\nsv &lt;- shapviz(ks)\nsv_importance(sv, \"bee\")<\/pre><\/div>\n\n<\/div><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap active\" 
id=\"ub-tabbed-content-4379a9e8-8a98-409a-812a-fe43ad823386-panel-1\" aria-labelledby=\"ub-tabbed-content-4379a9e8-8a98-409a-812a-fe43ad823386-tab-1\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"python\",\"mime\":\"text\/x-python\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"Python\",\"maxHeight\":\"400px\",\"modeName\":\"python\"}'>import numpy as np\nimport pandas as pd\nfrom plotnine.data import diamonds\nfrom statsmodels.formula.api import ols\nfrom shap import KernelExplainer\n\n# Turn categoricals into integers because, inconveniently, kernel SHAP\n# requires numpy array as input\nord = [\"clarity\", \"color\", \"cut\"]\nx = [\"carat\"] + ord\ndiamonds[ord] = diamonds[ord].apply(lambda x: x.cat.codes)\nX = diamonds[x].to_numpy()\n\n# Fit model with interactions and dummy variables\nfit = ols(\n  \"np.log(price) ~ np.log(carat) * (C(clarity) + C(cut) + C(color))\", \n  data=diamonds\n).fit()\n\n# Background data (120 rows)\nbg_X = X[0:len(X):450]\n\n# Define subset of 1018 diamonds to explain\nX_small = X[0:len(X):53]\n\n# Calculate KernelSHAP values\nks = KernelExplainer(\n  model=lambda X: fit.predict(pd.DataFrame(X, columns=x)), \n  data = bg_X\n)\nsv = ks.shap_values(X_small)  # 74 seconds\nsv[0:2]\n\n# array([[-2.05007406, -0.28048747,  0.12812216,  0.01587382],\n#        [-2.0858379 ,  0.04050415,  0.12830103,  0.03731644]])<\/pre><\/div>\n\n<\/div><\/div>\n\t\t<\/div>\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"547\" height=\"470\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/09\/imp.jpeg\" alt=\"\" class=\"wp-image-960\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/09\/imp.jpeg 547w, 
https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/09\/imp-300x258.jpeg 300w\" sizes=\"auto, (max-width: 547px) 100vw, 547px\" \/><figcaption>SHAP summary plot (R model)<\/figcaption><\/figure>\n\n\n\n<p><strong>The results match, hurray!<\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Example with nine features<\/h2>\n\n\n\n<p>The computational effort of running <strong>exact<\/strong> Kernel SHAP explodes with the number of features. For nine features, the number of relevant on-off vectors is 2^9 &#8211; 2 = 510, i.e. about 36 times larger than with four features.<\/p>\n\n\n\n<p>We now modify the above example, adding five additional features to the model. <em>Note that the model structure is completely nonsensical. We just use it to get a feeling for the impact of a 36 times larger workload.<\/em><\/p>\n\n\n\n<p>Besides exact calculations, we also use an almost exact hybrid approach in both languages, with 126 on-off vectors (p*(p+1) for the exact part and 4p for the sampling part, where p is the number of features), resulting in a significant speed-up.<\/p>\n\n\n<div class=\"wp-block-ub-tabbed-content wp-block-ub-tabbed-content-holder wp-block-ub-tabbed-content-horizontal-holder-mobile wp-block-ub-tabbed-content-horizontal-holder-tablet\" id=\"ub-tabbed-content-d044df18-8775-4d45-b644-cf1c86c55bd4\" style=\"\">\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-holder horizontal-tab-width-mobile horizontal-tab-width-tablet\">\n\t\t\t\t<div role=\"tablist\" class=\"wp-block-ub-tabbed-content-tabs-title wp-block-ub-tabbed-content-tabs-title-mobile-horizontal-tab wp-block-ub-tabbed-content-tabs-title-tablet-horizontal-tab\" style=\"justify-content: flex-start; \"><div role=\"tab\" id=\"ub-tabbed-content-d044df18-8775-4d45-b644-cf1c86c55bd4-tab-0\" aria-controls=\"ub-tabbed-content-d044df18-8775-4d45-b644-cf1c86c55bd4-panel-0\" aria-selected=\"false\" class=\"wp-block-ub-tabbed-content-tab-title-wrap\" 
style=\"--ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">R<\/div>\n\t\t\t<\/div><div role=\"tab\" id=\"ub-tabbed-content-d044df18-8775-4d45-b644-cf1c86c55bd4-tab-1\" aria-controls=\"ub-tabbed-content-d044df18-8775-4d45-b644-cf1c86c55bd4-panel-1\" aria-selected=\"true\" class=\"wp-block-ub-tabbed-content-tab-title-wrap active\" style=\"--ub-tabbed-title-background-color: #6d6d6d; --ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">Python<\/div>\n\t\t\t<\/div><\/div>\n\t\t\t<\/div>\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tabs-content\" style=\"\"><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap ub-hide\" id=\"ub-tabbed-content-d044df18-8775-4d45-b644-cf1c86c55bd4-panel-0\" aria-labelledby=\"ub-tabbed-content-d044df18-8775-4d45-b644-cf1c86c55bd4-tab-0\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"r\",\"mime\":\"text\/x-rsrc\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"R\",\"maxHeight\":\"400px\",\"modeName\":\"r\"}'>fit &lt;- lm(\n  log(price) ~ log(carat) * (clarity + color + cut) + x + y + z + table + depth, \n  data = diamonds\n)\n\n# Subset of 1018 diamonds to explain\nX_small &lt;- diamonds[seq(1, nrow(diamonds), 53), setdiff(names(diamonds), \"price\")]\n\n# Exact Kernel SHAP: 61 seconds\nsystem.time(\n  ks &lt;- kernelshap(fit, X_small, bg_X = bg_X, exact = TRUE)  \n)\nks\n#          carat        cut     color     clarity         depth         table          x     
      y            z\n# [1,] -1.842799 0.01424231 0.1266108 -0.27033874 -0.0007084443  0.0017787647 -0.1720782 0.001330275 -0.006445693\n# [2,] -1.876709 0.03856957 0.1266546  0.03932912 -0.0004202636 -0.0004871776 -0.1739880 0.001397792 -0.006560624\n\n# Default, using an almost exact hybrid algorithm: 17 seconds\nsystem.time(\n  ks &lt;- kernelshap(fit, X_small, bg_X = bg_X, parallel = TRUE)  \n)\n#          carat        cut     color     clarity         depth         table          x           y            z\n# [1,] -1.842799 0.01424231 0.1266108 -0.27033874 -0.0007084443  0.0017787647 -0.1720782 0.001330275 -0.006445693\n# [2,] -1.876709 0.03856957 0.1266546  0.03932912 -0.0004202636 -0.0004871776 -0.1739880 0.001397792 -0.006560624<\/pre><\/div>\n\n<\/div><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap active\" id=\"ub-tabbed-content-d044df18-8775-4d45-b644-cf1c86c55bd4-panel-1\" aria-labelledby=\"ub-tabbed-content-d044df18-8775-4d45-b644-cf1c86c55bd4-tab-1\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"python\",\"mime\":\"text\/x-python\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"Python\",\"maxHeight\":\"400px\",\"modeName\":\"python\"}'>x = [\"carat\"] + ord + [\"table\", \"depth\", \"x\", \"y\", \"z\"]\nX = diamonds[x].to_numpy()\n\n# Fit model with interactions and dummy variables\nfit = ols(\n  \"np.log(price) ~ np.log(carat) * (C(clarity) + C(cut) + C(color)) + table + depth + x + y + z\", \n  data=diamonds\n).fit()\n\n# Background data (120 rows)\nbg_X = X[0:len(X):450]\n\n# Define subset of 1018 diamonds to explain\nX_small = X[0:len(X):53]\n\n# Calculate KernelSHAP values: 12 minutes\nks = KernelExplainer(\n  model=lambda X: 
fit.predict(pd.DataFrame(X, columns=x)), \n  data = bg_X\n)\nsv = ks.shap_values(X_small)\nsv[0:2]\n# array([[-1.84279897e+00, -2.70338744e-01,  1.26610769e-01,\n#          1.42423108e-02,  1.77876470e-03, -7.08444295e-04,\n#         -1.72078182e-01,  1.33027467e-03, -6.44569296e-03],\n#        [-1.87670887e+00,  3.93291219e-02,  1.26654599e-01,\n#          3.85695742e-02, -4.87177593e-04, -4.20263565e-04,\n#         -1.73988040e-01,  1.39779179e-03, -6.56062359e-03]])\n\n# Now, using a hybrid between exact and sampling: 5 minutes\nsv = ks.shap_values(X_small, nsamples=126)\nsv[0:2]\n# array([[-1.84279897e+00, -2.70338744e-01,  1.26610769e-01,\n#          1.42423108e-02,  1.77876470e-03, -7.08444295e-04,\n#         -1.72078182e-01,  1.33027467e-03, -6.44569296e-03],\n#        [-1.87670887e+00,  3.93291219e-02,  1.26654599e-01,\n#          3.85695742e-02, -4.87177593e-04, -4.20263565e-04,\n#         -1.73988040e-01,  1.39779179e-03, -6.56062359e-03]])<\/pre><\/div>\n\n<\/div><\/div>\n\t\t<\/div>\n\n\n<p>Again, the results are essentially the same between R and Python, but also between the hybrid algorithm and the exact algorithm. 
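The speed advantage of the hybrid follows directly from the evaluation budgets mentioned earlier: exact Kernel SHAP needs 2^p &#8211; 2 on-off vectors per observation, while the hybrid gets by with p*(p+1) + 4p. A quick back-of-the-envelope comparison in Python (illustrative only):

```python
# On-off vector budgets: exact Kernel SHAP (2^p - 2) versus the hybrid
# used above (p*(p+1) for the exact part plus 4*p for the sampling part).
def exact_budget(p):
    return 2**p - 2

def hybrid_budget(p):
    return p * (p + 1) + 4 * p

for p in [4, 9, 12]:
    print(f"p={p}: exact={exact_budget(p)}, hybrid={hybrid_budget(p)}")
```

For p = 9 this gives 510 versus 126 vectors; for small p, exact is cheaper than the hybrid budget anyway, which is why both implementations then switch to exact calculations directly.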
This is interesting because the hybrid algorithm is significantly faster than the exact one.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Wrap-Up<\/h2>\n\n\n\n<ul class=\"wp-block-list\"><li>R is catching up with Python&#8217;s superb &#8220;shap&#8221; package.<\/li><li>For two non-trivial linear regressions with interactions, the &#8220;kernelshap&#8221; package in R provides the same output as Python.<\/li><li>The hybrid between exact and sampling KernelSHAP (as implemented in Python and R) offers a very good trade-off between speed and accuracy.<\/li><li><code>kernelshap()<\/code> in R is fast!<\/li><\/ul>\n\n\n\n<p>The Python and R code can be found here:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><a href=\"https:\/\/github.com\/lorentzenchr\/notebooks\/blob\/master\/blogposts\/2022-09-30%20kernelshap.R\">R code<\/a><\/li><li><a href=\"https:\/\/github.com\/lorentzenchr\/notebooks\/blob\/master\/blogposts\/2022-09-30%20kernelshap.py\">Python code<\/a><\/li><\/ul>\n\n\n\n<p>The examples were run on a Windows notebook with an Intel i7-8650U 4-core CPU.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>&#8220;R <-> Python&#8221; continued&#8230; Kernel SHAP<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16,17,9],"tags":[10,6,5],"class_list":["post-952","post","type-post","status-publish","format-standard","hentry","category-machine-learning","category-programming","category-statistics","tag-lost-in-translation","tag-python","tag-r"],"featured_image_src":null,"author_info":{"display_name":"Michael 
Mayer","author_link":"https:\/\/lorentzen.ch\/index.php\/author\/michael\/"},"_links":{"self":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/952","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/comments?post=952"}],"version-history":[{"count":8,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/952\/revisions"}],"predecessor-version":[{"id":963,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/952\/revisions\/963"}],"wp:attachment":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/media?parent=952"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/categories?post=952"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/tags?post=952"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}