{"id":1061,"date":"2023-01-27T12:18:55","date_gmt":"2023-01-27T11:18:55","guid":{"rendered":"https:\/\/lorentzen.ch\/?p=1061"},"modified":"2023-01-27T12:18:55","modified_gmt":"2023-01-27T11:18:55","slug":"shap-xgboost-tidymodels-love","status":"publish","type":"post","link":"https:\/\/lorentzen.ch\/index.php\/2023\/01\/27\/shap-xgboost-tidymodels-love\/","title":{"rendered":"SHAP + XGBoost + Tidymodels = LOVE"},"content":{"rendered":"\n<p>In this recent <a href=\"https:\/\/lorentzen.ch\/index.php\/2022\/12\/21\/interpret-complex-linear-models-with-shap-within-seconds\/\">post<\/a>, we explained how to use Kernel SHAP to interpret complex linear models. As the plotting backend, we used our fresh CRAN package &#8220;<a href=\"https:\/\/CRAN.R-project.org\/package=shapviz\">shapviz<\/a>&#8221;. <\/p>\n\n\n\n<p>&#8220;shapviz&#8221; has direct connectors to several packages such as XGBoost, LightGBM, H2O, kernelshap, and more. Several people have asked me how to use shapviz when the XGBoost model was fitted with <strong>Tidymodels<\/strong>. The workflow was not 100% clear to me either, but the answer is actually very simple, thanks to Julia Silge&#8217;s <a href=\"https:\/\/juliasilge.com\/blog\/board-games\/\">post<\/a>, where the plots were made with SHAPforxgboost, another cool package for visualizing SHAP values.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Example with shiny diamonds<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Step 1: Preprocessing<\/h3>\n\n\n\n<p>We first write the data preprocessing recipe and apply it to the data rows that we want to explain. In our case, these are 1000 randomly sampled diamonds. 
<\/p>\n\n\n<div class=\"wp-block-ub-tabbed-content wp-block-ub-tabbed-content-holder wp-block-ub-tabbed-content-horizontal-holder-mobile wp-block-ub-tabbed-content-horizontal-holder-tablet\" id=\"ub-tabbed-content-620cf5df-b17b-460e-92cb-319dfbc98577\" style=\"\">\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-holder horizontal-tab-width-mobile horizontal-tab-width-tablet\">\n\t\t\t\t<div role=\"tablist\" class=\"wp-block-ub-tabbed-content-tabs-title wp-block-ub-tabbed-content-tabs-title-mobile-horizontal-tab wp-block-ub-tabbed-content-tabs-title-tablet-horizontal-tab\" style=\"justify-content: flex-start; \"><div role=\"tab\" id=\"ub-tabbed-content-620cf5df-b17b-460e-92cb-319dfbc98577-tab-0\" aria-controls=\"ub-tabbed-content-620cf5df-b17b-460e-92cb-319dfbc98577-panel-0\" aria-selected=\"true\" class=\"wp-block-ub-tabbed-content-tab-title-wrap active\" style=\"--ub-tabbed-title-background-color: #6d6d6d; --ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">R<\/div>\n\t\t\t<\/div><\/div>\n\t\t\t<\/div>\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tabs-content\" style=\"\"><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap active\" id=\"ub-tabbed-content-620cf5df-b17b-460e-92cb-319dfbc98577-panel-0\" aria-labelledby=\"ub-tabbed-content-620cf5df-b17b-460e-92cb-319dfbc98577-tab-0\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"r\",\"mime\":\"text\/x-rsrc\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"R\",\"maxHeight\":\"400px\",\"modeName\":\"r\"}'>library(tidyverse)\nlibrary(tidymodels)\nlibrary(shapviz)\n\n# Integer 
encode factors\ndia_recipe &lt;- diamonds %&gt;%\n  recipe(price ~ carat + cut + clarity + color) %&gt;% \n  step_integer(all_nominal())\n\n# Will explain THIS dataset later\nset.seed(2)\ndia_small &lt;- diamonds[sample(nrow(diamonds), 1000), ]\ndia_small_prep &lt;- bake(\n  prep(dia_recipe), \n  has_role(\"predictor\"),\n  new_data = dia_small, \n  composition = \"matrix\"\n)\nhead(dia_small_prep)\n\n#     carat cut clarity color\n#[1,]  0.57   5       4     4\n#[2,]  1.01   5       2     1\n#[3,]  0.45   1       4     3\n#[4,]  1.04   4       6     5\n#[5,]  0.90   3       6     4\n#[6,]  1.20   3       4     6\n<\/pre><\/div>\n\n<\/div><\/div>\n\t\t<\/div>\n\n\n<h3 class=\"wp-block-heading\">Step 2: Fit Model<\/h3>\n\n\n\n<p>The next step is to tune and build the model. For simplicity, we skipped the tuning part. Bad, bad \ud83d\ude42<\/p>\n\n\n<div class=\"wp-block-ub-tabbed-content wp-block-ub-tabbed-content-holder wp-block-ub-tabbed-content-horizontal-holder-mobile wp-block-ub-tabbed-content-horizontal-holder-tablet\" id=\"ub-tabbed-content-cf622f1b-054e-4205-9bff-53272ecef92c\" style=\"\">\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-holder horizontal-tab-width-mobile horizontal-tab-width-tablet\">\n\t\t\t\t<div role=\"tablist\" class=\"wp-block-ub-tabbed-content-tabs-title wp-block-ub-tabbed-content-tabs-title-mobile-horizontal-tab wp-block-ub-tabbed-content-tabs-title-tablet-horizontal-tab\" style=\"justify-content: flex-start; \"><div role=\"tab\" id=\"ub-tabbed-content-cf622f1b-054e-4205-9bff-53272ecef92c-tab-0\" aria-controls=\"ub-tabbed-content-cf622f1b-054e-4205-9bff-53272ecef92c-panel-0\" aria-selected=\"true\" class=\"wp-block-ub-tabbed-content-tab-title-wrap active\" style=\"--ub-tabbed-title-background-color: #6d6d6d; --ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div 
class=\"wp-block-ub-tabbed-content-tab-title\">R<\/div>\n\t\t\t<\/div><\/div>\n\t\t\t<\/div>\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tabs-content\" style=\"\"><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap active\" id=\"ub-tabbed-content-cf622f1b-054e-4205-9bff-53272ecef92c-panel-0\" aria-labelledby=\"ub-tabbed-content-cf622f1b-054e-4205-9bff-53272ecef92c-tab-0\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"r\",\"mime\":\"text\/x-rsrc\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"R\",\"maxHeight\":\"400px\",\"modeName\":\"r\"}'># Just for illustration - in practice needs tuning!\nxgboost_model &lt;- boost_tree(\n  mode = \"regression\",\n  trees = 200,\n  tree_depth = 5,\n  learn_rate = 0.05,\n  engine = \"xgboost\"\n)\n\ndia_wf &lt;- workflow() %&gt;%\n  add_recipe(dia_recipe) %&gt;%\n  add_model(xgboost_model)\n\nfit &lt;- dia_wf %&gt;%\n  fit(diamonds)<\/pre><\/div>\n\n<\/div><\/div>\n\t\t<\/div>\n\n\n<h3 class=\"wp-block-heading\">Step 3: SHAP Analysis<\/h3>\n\n\n\n<p>We now need to call <code>shapviz()<\/code> on the fitted model. 
To get neat interpretations with the original factor labels, we pass not only the prediction data prepared in Step 1 via <code>bake()<\/code> (argument <code>X_pred<\/code>, used to compute the SHAP values), but also the original data (argument <code>X<\/code>, whose feature values are shown in the plots).<\/p>\n\n\n<div class=\"wp-block-ub-tabbed-content wp-block-ub-tabbed-content-holder wp-block-ub-tabbed-content-horizontal-holder-mobile wp-block-ub-tabbed-content-horizontal-holder-tablet\" id=\"ub-tabbed-content-3348455d-b77c-4506-8cbe-70dbac6efa1c\" style=\"\">\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-holder horizontal-tab-width-mobile horizontal-tab-width-tablet\">\n\t\t\t\t<div role=\"tablist\" class=\"wp-block-ub-tabbed-content-tabs-title wp-block-ub-tabbed-content-tabs-title-mobile-horizontal-tab wp-block-ub-tabbed-content-tabs-title-tablet-horizontal-tab\" style=\"justify-content: flex-start; \"><div role=\"tab\" id=\"ub-tabbed-content-3348455d-b77c-4506-8cbe-70dbac6efa1c-tab-0\" aria-controls=\"ub-tabbed-content-3348455d-b77c-4506-8cbe-70dbac6efa1c-panel-0\" aria-selected=\"true\" class=\"wp-block-ub-tabbed-content-tab-title-wrap active\" style=\"--ub-tabbed-title-background-color: #6d6d6d; --ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">R<\/div>\n\t\t\t<\/div><\/div>\n\t\t\t<\/div>\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tabs-content\" style=\"\"><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap active\" id=\"ub-tabbed-content-3348455d-b77c-4506-8cbe-70dbac6efa1c-panel-0\" aria-labelledby=\"ub-tabbed-content-3348455d-b77c-4506-8cbe-70dbac6efa1c-tab-0\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" 
data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"r\",\"mime\":\"text\/x-rsrc\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"R\",\"maxHeight\":\"400px\",\"modeName\":\"r\"}'>shap &lt;- shapviz(extract_fit_engine(fit), X_pred = dia_small_prep, X = dia_small)\n\nsv_importance(shap, kind = \"both\", show_numbers = TRUE)\nsv_dependence(shap, \"carat\", color_var = \"auto\")\nsv_dependence(shap, \"clarity\", color_var = \"auto\")\nsv_force(shap, row_id = 1)\nsv_waterfall(shap, row_id = 1)\n<\/pre><\/div>\n\n<\/div><\/div>\n\t\t<\/div>\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/01\/imp.jpeg\" alt=\"\" class=\"wp-image-1063\" width=\"632\" height=\"510\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/01\/imp.jpeg 440w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/01\/imp-300x242.jpeg 300w\" sizes=\"auto, (max-width: 632px) 100vw, 632px\" \/><figcaption class=\"wp-element-caption\">Variable importance plot overlaid with SHAP summary beeswarms<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/01\/dep_carat.jpeg\" alt=\"\" class=\"wp-image-1064\" width=\"636\" height=\"513\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/01\/dep_carat.jpeg 440w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/01\/dep_carat-300x242.jpeg 300w\" sizes=\"auto, (max-width: 636px) 100vw, 636px\" \/><figcaption class=\"wp-element-caption\">Dependence plot for carat. 
Note that clarity is shown with its original labels rather than integer codes.<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/01\/dep_clarity.jpeg\" alt=\"\" class=\"wp-image-1065\" width=\"630\" height=\"508\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/01\/dep_clarity.jpeg 440w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/01\/dep_clarity-300x242.jpeg 300w\" sizes=\"auto, (max-width: 630px) 100vw, 630px\" \/><figcaption class=\"wp-element-caption\">Dependence plot for clarity. Again, the x-axis uses the original factor levels, not the integer-encoded values.<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/01\/force.jpeg\" alt=\"\" class=\"wp-image-1066\" width=\"634\" height=\"512\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/01\/force.jpeg 440w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/01\/force-300x242.jpeg 300w\" sizes=\"auto, (max-width: 634px) 100vw, 634px\" \/><figcaption class=\"wp-element-caption\">Force plot of the first observation<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/01\/waterfall.jpeg\" alt=\"\" class=\"wp-image-1067\" width=\"576\" height=\"465\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/01\/waterfall.jpeg 440w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/01\/waterfall-300x242.jpeg 300w\" sizes=\"auto, (max-width: 576px) 100vw, 576px\" \/><figcaption class=\"wp-element-caption\">Waterfall plot for the first observation<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Summary<\/h3>\n\n\n\n<p>Running a SHAP analysis on an XGBoost model fitted with Tidymodels is super 
easy.<\/p>\n\n\n\n<p>The complete R script can be found <a href=\"https:\/\/github.com\/lorentzenchr\/notebooks\/blob\/master\/blogposts\/2023-01-27%20tidymodels.R\">here<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>tidymodels and shapviz to explain XGBoost models<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16,17,9],"tags":[5],"class_list":["post-1061","post","type-post","status-publish","format-standard","hentry","category-machine-learning","category-programming","category-statistics","tag-r"],"featured_image_src":null,"author_info":{"display_name":"Michael Mayer","author_link":"https:\/\/lorentzen.ch\/index.php\/author\/michael\/"},"_links":{"self":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/1061","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/comments?post=1061"}],"version-history":[{"count":4,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/1061\/revisions"}],"predecessor-version":[{"id":1072,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/1061\/revisions\/1072"}],"wp:attachment":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/media?parent=1061"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/categories?post=1061"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/tags?post=1061"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}