{"id":1391,"date":"2024-01-07T10:37:24","date_gmt":"2024-01-07T09:37:24","guid":{"rendered":"https:\/\/lorentzen.ch\/?p=1391"},"modified":"2024-01-19T09:33:51","modified_gmt":"2024-01-19T08:33:51","slug":"explain-that-tidymodels-blackbox","status":"publish","type":"post","link":"https:\/\/lorentzen.ch\/index.php\/2024\/01\/07\/explain-that-tidymodels-blackbox\/","title":{"rendered":"Explain that tidymodels blackbox!"},"content":{"rendered":"\n<p>Let&#8217;s explain a {tidymodels} random forest by classic explainability methods (permutation importance, partial dependence plots (PDP), Friedman&#8217;s H statistics), and also fancy SHAP.<\/p>\n\n\n\n<p>Disclaimer: {hstats}, {kernelshap} and {shapviz} are three of my own packages.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"458\" height=\"565\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-2.png\" alt=\"\" class=\"wp-image-1394\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-2.png 458w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-2-243x300.png 243w\" sizes=\"auto, (max-width: 458px) 100vw, 458px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Diabetes data<\/h2>\n\n\n\n<p>We will use the <a href=\"https:\/\/www.kaggle.com\/datasets\/iammustafatz\/diabetes-prediction-dataset\">diabetes prediction dataset of Kaggle<\/a> to model diabetes (yes\/no) as a function of six demographic features (age, gender, BMI, hypertension, heart disease, and smoking history). 
It has 100k rows.<\/p>\n\n\n\n<p>Note: The data additionally contains the typical diabetes indicators HbA1c level and blood glucose level, but we won&#8217;t use them to avoid potential causality issues, and to gain insights that also apply to people who do not know these values.<\/p>\n\n\n<div class=\"wp-block-ub-tabbed-content wp-block-ub-tabbed-content-holder wp-block-ub-tabbed-content-horizontal-holder-mobile wp-block-ub-tabbed-content-horizontal-holder-tablet\" id=\"ub-tabbed-content-33ddd962-a96c-4bd9-8465-ee8d6624cd57\" style=\"\">\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-holder horizontal-tab-width-mobile horizontal-tab-width-tablet\">\n\t\t\t\t<div role=\"tablist\" class=\"wp-block-ub-tabbed-content-tabs-title wp-block-ub-tabbed-content-tabs-title-mobile-horizontal-tab wp-block-ub-tabbed-content-tabs-title-tablet-horizontal-tab\" style=\"justify-content: flex-start; \"><div role=\"tab\" id=\"ub-tabbed-content-33ddd962-a96c-4bd9-8465-ee8d6624cd57-tab-0\" aria-controls=\"ub-tabbed-content-33ddd962-a96c-4bd9-8465-ee8d6624cd57-panel-0\" aria-selected=\"true\" class=\"wp-block-ub-tabbed-content-tab-title-wrap active\" style=\"--ub-tabbed-title-background-color: #6d6d6d; --ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\"><br><br>R<\/div>\n\t\t\t<\/div><\/div>\n\t\t\t<\/div>\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tabs-content\" style=\"\"><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap active\" id=\"ub-tabbed-content-61eea646-d523-4e66-9f82-7861e79faf4c-panel-0\" aria-labelledby=\"ub-tabbed-content-61eea646-d523-4e66-9f82-7861e79faf4c-tab-0\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" 
data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"r\",\"mime\":\"text\/x-rsrc\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"R\",\"maxHeight\":\"400px\",\"modeName\":\"r\"}'># https:\/\/www.kaggle.com\/datasets\/iammustafatz\/diabetes-prediction-dataset\n\nlibrary(tidyverse)\nlibrary(tidymodels)\nlibrary(hstats)\nlibrary(kernelshap)\nlibrary(shapviz)\nlibrary(patchwork)\n\ndf0 &lt;- read.csv(\"diabetes_prediction_dataset.csv\")  # from above Kaggle link\ndim(df0)  # 100000 9\nhead(df0)\n# gender age hypertension heart_disease smoking_history   bmi HbA1c_level blood_glucose_level diabetes\n# Female  80            0             1           never 25.19         6.6                 140        0\n# Female  54            0             0         No Info 27.32         6.6                  80        0\n#   Male  28            0             0           never 27.32         5.7                 158        0\n# Female  36            0             0         current 23.45         5.0                 155        0\n#   Male  76            1             1         current 20.14         4.8                 155        0\n# Female  20            0             0           never 27.32         6.6                  85        0\n\nsummary(df0)\nanyNA(df0)  # FALSE\ntable(df0$smoking_history, useNA = \"ifany\")\n\n# DATA PREPARATION\n\n# Note: tidymodels needs a factor response for classification\ndf1 &lt;- df0 |&gt;\n  transform(\n    y = factor(diabetes, levels = 0:1, labels = c(\"No\", \"Yes\")),\n    female = (gender == \"Female\") * 1,\n    smoking_history = factor(\n      smoking_history, \n      levels = c(\"No Info\", \"never\", \"former\", \"not current\", \"current\", \"ever\")\n    ),\n    bmi = pmin(bmi, 50)\n  )\n\n# UNIVARIATE ANALYSIS\n\nggplot(df1, aes(diabetes)) +\n  geom_bar(fill = \"chartreuse4\")\n\ndf1  
|&gt;  \n  select(age, bmi, HbA1c_level, blood_glucose_level) |&gt; \n  pivot_longer(everything()) |&gt; \n  ggplot(aes(value)) +\n  geom_histogram(fill = \"chartreuse4\", bins = 19) +\n  facet_wrap(~ name, scales = \"free_x\")\n\nggplot(df1, aes(smoking_history)) +\n  geom_bar(fill = \"chartreuse4\")\n\ndf1 |&gt; \n  select(heart_disease, hypertension, female) |&gt;\n  pivot_longer(everything()) |&gt; \n  ggplot(aes(name, value)) +\n  stat_summary(fun = mean, geom = \"bar\", fill = \"chartreuse4\") +\n  xlab(element_blank())\n<\/pre><\/div>\n\n<\/div><\/div>\n\t\t<\/div>\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"431\" height=\"369\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-7.png\" alt=\"\" class=\"wp-image-1399\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-7.png 431w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-7-300x257.png 300w\" sizes=\"auto, (max-width: 431px) 100vw, 431px\" \/><figcaption class=\"wp-element-caption\">&#8220;yes&#8221; proportion of binary variables (including the response)<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"440\" height=\"357\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-4.png\" alt=\"\" class=\"wp-image-1396\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-4.png 440w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-4-300x243.png 300w\" sizes=\"auto, (max-width: 440px) 100vw, 440px\" \/><figcaption class=\"wp-element-caption\">Distribution of numeric variables<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"436\" height=\"376\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-5.png\" alt=\"\" class=\"wp-image-1397\" 
srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-5.png 436w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-5-300x259.png 300w\" sizes=\"auto, (max-width: 436px) 100vw, 436px\" \/><figcaption class=\"wp-element-caption\">Distribution of smoking_history<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Modeling<\/h2>\n\n\n\n<p>Let&#8217;s fit a random forest via tidymodels with {ranger} backend. <\/p>\n\n\n\n<p>We add a predict function <code>pf()<\/code> that outputs only the probability of the &#8220;Yes&#8221; class.<\/p>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting=\"{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:&quot;language&quot;,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;r&quot;,&quot;mime&quot;:&quot;text\/x-rsrc&quot;,&quot;theme&quot;:&quot;material&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;R&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;r&quot;}\">set.seed(1)\nix &lt;- initial_split(df1, strata = diabetes, prop = 0.8)\ntrain &lt;- training(ix)\ntest &lt;- testing(ix)\n\nxvars &lt;- c(&quot;age&quot;, &quot;bmi&quot;, &quot;smoking_history&quot;, &quot;heart_disease&quot;, &quot;hypertension&quot;, &quot;female&quot;)\n\nrf_spec &lt;- rand_forest(trees = 500) |&gt; \n  set_mode(&quot;classification&quot;) |&gt; \n  set_engine(&quot;ranger&quot;, num.threads = NULL, seed = 49)\n\nrf_wf &lt;- workflow() |&gt; \n  add_model(rf_spec) |&gt;\n  add_formula(reformulate(xvars, &quot;y&quot;))\n\nmodel &lt;- rf_wf |&gt; \n    fit(train)\n\n# predict() gives No\/Yes columns\npredict(model, head(test), type = &quot;prob&quot;)\n# .pred_No .pred_Yes\n#    0.981    0.0185\n\n# We need to extract only the &quot;Yes&quot; probabilities\npf 
&lt;- function(m, X) {\n  predict(m, X, type = &quot;prob&quot;)$.pred_Yes\n}\npf(model, head(test))  # 0.01854290 ...\n<\/pre><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Classic explanation methods<\/h2>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting=\"{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:&quot;language&quot;,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;r&quot;,&quot;mime&quot;:&quot;text\/x-rsrc&quot;,&quot;theme&quot;:&quot;material&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;R&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;r&quot;}\"># 4 times repeated permutation importance wrt test logloss\nimp &lt;- perm_importance(\n  model, X = test, y = &quot;diabetes&quot;, v = xvars, pred_fun = pf, loss = &quot;logloss&quot;\n)\nplot(imp) +\n  xlab(&quot;Increase in test logloss&quot;)\n\n# Partial dependence of age\npartial_dep(model, v = &quot;age&quot;, train, pred_fun = pf) |&gt; \n  plot()\n\n# All PDP in one patchwork\np &lt;- lapply(xvars, function(x) plot(partial_dep(model, v = x, X = train, pred_fun = pf)))\nwrap_plots(p) &amp;\n  ylim(0, 0.23) &amp;\n  ylab(&quot;Probability&quot;)\n\n# Friedman's H stats\nsystem.time( # 20 s\n  H &lt;- hstats(model, train[xvars], approx = TRUE, pred_fun = pf)\n)\nH  # 15% of prediction variability comes from interactions\nplot(H)\n\n# Stratified PDP of strongest interaction\npartial_dep(model, &quot;age&quot;, BY = &quot;bmi&quot;, X = train, pred_fun = pf) |&gt; \n  plot(show_points = FALSE)<\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Feature importance<\/h3>\n\n\n\n<p><em>Permutation importance<\/em> measures by how much the average test loss (in our case log loss) increases when a feature is shuffled before calculating the 
losses. We repeat the process four times and also show standard errors.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"430\" height=\"352\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-11.png\" alt=\"\" class=\"wp-image-1403\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-11.png 430w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-11-300x246.png 300w\" sizes=\"auto, (max-width: 430px) 100vw, 430px\" \/><figcaption class=\"wp-element-caption\">Permutation importance: Age and BMI are the two main risk factors.<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Main effects<\/h3>\n\n\n\n<p>Main effects are estimated by PDPs. They show how the average prediction changes with a feature, keeping every other feature fixed. Using a fixed vertical axis helps to grasp the strength of the effect.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"404\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-10-1024x404.png\" alt=\"\" class=\"wp-image-1402\" style=\"aspect-ratio:2.5346534653465347;width:1014px;height:auto\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-10-1024x404.png 1024w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-10-300x118.png 300w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-10-768x303.png 768w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-10.png 1189w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">PDPs: The diabetes risk tends to increase with age, high (and very low) BMI, and presence of heart disease\/hypertension, and it is a bit lower for females and non-smokers.<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Interaction strength<\/h3>\n\n\n\n<p>Interaction strength can be measured by Friedman&#8217;s H 
statistics, see the <a href=\"https:\/\/lorentzen.ch\/index.php\/2023\/08\/01\/its-the-interactions\/\">earlier blog post<\/a>. A specific interaction can then be visualized by a stratified PDP.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"898\" height=\"386\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-9.png\" alt=\"\" class=\"wp-image-1401\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-9.png 898w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-9-300x129.png 300w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-9-768x330.png 768w\" sizes=\"auto, (max-width: 898px) 100vw, 898px\" \/><figcaption class=\"wp-element-caption\">Friedman&#8217;s H statistics: Left: BMI and age are the two features with clearly strongest interactions. Right: Their pairwise interaction explains about 10% of their joint effect variability.<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"431\" height=\"383\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-8.png\" alt=\"\" class=\"wp-image-1400\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-8.png 431w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-8-300x267.png 300w\" sizes=\"auto, (max-width: 431px) 100vw, 431px\" \/><figcaption class=\"wp-element-caption\">Stratified PDP: The strong interaction between age and BMI is clearly visible. A high BMI makes the age effect on diabetes stronger.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">SHAP<\/h2>\n\n\n\n<p>What insights does a SHAP analysis bring? <\/p>\n\n\n\n<p>We will crunch slow exact permutation SHAP values via <code>kernelshap::permshap()<\/code>. 
If we had more features, we could switch to <\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>kernelshap::kernelshap()<\/code>,<\/li>\n\n\n\n<li>Brandon Greenwell&#8217;s {fastshap}, or to the<\/li>\n\n\n\n<li>{treeshap} package of my colleagues from TU Warsaw.<\/li>\n<\/ul>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting=\"{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:&quot;language&quot;,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;r&quot;,&quot;mime&quot;:&quot;text\/x-rsrc&quot;,&quot;theme&quot;:&quot;material&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;R&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;r&quot;}\">set.seed(1)\nX_explain &lt;- train[sample(1:nrow(train), 1000), xvars]\nX_background &lt;- train[sample(1:nrow(train), 200), ]\n\nsystem.time(  # 10 minutes\n  shap_values &lt;- permshap(model, X = X_explain, bg_X = X_background, pred_fun = pf)\n)\nshap_values &lt;- shapviz(shap_values)\nshap_values  # 'shapviz' object representing 1000 x 6 SHAP matrix\nsaveRDS(shap_values, file = &quot;shap_values.rds&quot;)\n# shap_values &lt;- readRDS(&quot;shap_values.rds&quot;)\n\nsv_importance(shap_values, show_numbers = TRUE)\nsv_importance(shap_values, kind = &quot;bee&quot;)\nsv_dependence(shap_values, v = xvars) &amp;\n  ylim(-0.14, 0.24) &amp;\n  ylab(&quot;Probability&quot;)<\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">SHAP importance<\/h3>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"435\" height=\"375\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-14.png\" alt=\"\" class=\"wp-image-1406\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-14.png 435w, 
https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-14-300x259.png 300w\" sizes=\"auto, (max-width: 435px) 100vw, 435px\" \/><figcaption class=\"wp-element-caption\">SHAP importance: On average, age increases or decreases the diabetes probability by 4.7%, etc. In this case, the top three features are the same as in permutation importance.<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">SHAP &#8220;summary&#8221; plot<\/h3>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"423\" height=\"381\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-13.png\" alt=\"\" class=\"wp-image-1405\" style=\"aspect-ratio:1.110236220472441;width:540px;height:auto\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-13.png 423w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-13-300x270.png 300w\" sizes=\"auto, (max-width: 423px) 100vw, 423px\" \/><figcaption class=\"wp-element-caption\">SHAP &#8220;summary&#8221; plot: In addition to the bar plot, we see that higher age, higher BMI, hypertension, smoking, being male, and having a heart disease are associated with higher diabetes risk.<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">SHAP dependence plots<\/h3>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"375\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-12-1024x375.png\" alt=\"\" class=\"wp-image-1404\" style=\"aspect-ratio:2.7306666666666666;width:1171px;height:auto\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-12-1024x375.png 1024w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-12-300x110.png 300w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-12-768x282.png 768w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2024\/01\/image-12.png 1069w\" sizes=\"auto, (max-width: 1024px) 100vw, 
1024px\" \/><figcaption class=\"wp-element-caption\">SHAP dependence plots: We see similar shapes as in the PDPs. Thanks to the vertical scatter, we can, e.g., spot that the BMI effect strongly depends on the age. As in the PDPs, we have selected a common vertical scale to also see the effect strength.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Final words<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>{hstats}, {kernelshap} and {shapviz} can explain any model with XAI methods like permutation importance, PDPs, Friedman&#8217;s H, and SHAP. This, obviously, also includes models developed with {tidymodels}.<\/li>\n\n\n\n<li>They would actually even work for multi-output models, e.g., classification with more than two categories.<\/li>\n\n\n\n<li>Studying a blackbox with XAI methods is always worth the effort, even if the methods have their issues. I.e., an imperfect explanation is still better than no explanation.<\/li>\n\n\n\n<li>Model-agnostic SHAP takes a little bit of time, but it is usually worth the effort.<\/li>\n<\/ul>\n\n\n\n<p><a href=\"https:\/\/github.com\/lorentzenchr\/notebooks\/blob\/master\/blogposts\/2024-01-07%20Explain_tidymodels.R\">The full R script<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this post you will learn how to explain a {tidymodels} blackbox with classic XAI and SHAP.<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16,17,9],"tags":[5],"class_list":["post-1391","post","type-post","status-publish","format-standard","hentry","category-machine-learning","category-programming","category-statistics","tag-r"],"featured_image_src":null,"author_info":{"display_name":"Michael 
Mayer","author_link":"https:\/\/lorentzen.ch\/index.php\/author\/michael\/"},"_links":{"self":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/1391","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/comments?post=1391"}],"version-history":[{"count":2,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/1391\/revisions"}],"predecessor-version":[{"id":1424,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/1391\/revisions\/1424"}],"wp:attachment":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/media?parent=1391"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/categories?post=1391"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/tags?post=1391"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}