{"id":920,"date":"2022-07-11T10:33:41","date_gmt":"2022-07-11T08:33:41","guid":{"rendered":"https:\/\/lorentzen.ch\/?p=920"},"modified":"2022-07-11T10:33:42","modified_gmt":"2022-07-11T08:33:42","slug":"shapviz-goes-h2o","status":"publish","type":"post","link":"https:\/\/lorentzen.ch\/index.php\/2022\/07\/11\/shapviz-goes-h2o\/","title":{"rendered":"shapviz goes H2O"},"content":{"rendered":"\n<p><a href=\"https:\/\/lorentzen.ch\/index.php\/2022\/06\/10\/visualize-shap-values-without-tears\/\">In a recent post<\/a>, I introduced the initial version of the &#8220;<a href=\"https:\/\/CRAN.R-project.org\/package=shapviz\">shapviz&#8221; package<\/a>. Its motto is to do one thing, but do it well: visualize SHAP values.<\/p>\n\n\n\n<p>The initial community feedback was very positive, and a couple of things have been improved in version 0.2.0. Here are the main changes:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>&#8220;shapviz&#8221; now works with tree-based models of the <a href=\"https:\/\/cran.r-project.org\/package=h2o\"><code>h2o<\/code><\/a> package in R.<\/li><li>Additionally, it wraps the <a href=\"https:\/\/CRAN.R-project.org\/package=shapr\"><code>shapr<\/code><\/a> package, which implements an improved version of Kernel SHAP that takes feature dependence into account.<\/li><li>A simple interface to collapse the SHAP values of dummy variables was added.<\/li><li>The default importance plot is now a bar plot instead of the (slower) beeswarm plot. In later releases, the beeswarm plot might be moved to a separate function <code>sv_summary()<\/code> for consistency with other packages.<\/li><li>The importance plot and the dependence plot now work neatly with <code>ggplotly()<\/code>. The other plot types cannot be translated with <code>ggplotly()<\/code> because they use geoms from packages outside ggplot2. At least, I do not know how to do this&#8230;<\/li><\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Example<\/h3>\n\n\n\n<p>Let&#8217;s build an H2O gradient boosted trees model to explain diamond prices. 
Then, we explain the model with our &#8220;shapviz&#8221; package. Note that H2O itself also offers some SHAP plots. &#8220;shapviz&#8221; is applied directly to the fitted H2O model, so you don&#8217;t have to write a single superfluous line of code.<\/p>\n\n\n<div class=\"wp-block-ub-tabbed-content wp-block-ub-tabbed-content-holder wp-block-ub-tabbed-content-horizontal-holder-mobile wp-block-ub-tabbed-content-horizontal-holder-tablet\" id=\"ub-tabbed-content-5506f5c9-3e8d-45a5-a10d-bfa438d138c4\" style=\"\">\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-holder horizontal-tab-width-mobile horizontal-tab-width-tablet\">\n\t\t\t\t<div role=\"tablist\" class=\"wp-block-ub-tabbed-content-tabs-title wp-block-ub-tabbed-content-tabs-title-mobile-horizontal-tab wp-block-ub-tabbed-content-tabs-title-tablet-horizontal-tab\" style=\"justify-content: flex-start; \"><div role=\"tab\" id=\"ub-tabbed-content-5506f5c9-3e8d-45a5-a10d-bfa438d138c4-tab-0\" aria-controls=\"ub-tabbed-content-5506f5c9-3e8d-45a5-a10d-bfa438d138c4-panel-0\" aria-selected=\"true\" class=\"wp-block-ub-tabbed-content-tab-title-wrap active\" style=\"--ub-tabbed-title-background-color: #6d6d6d; --ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">R<\/div>\n\t\t\t<\/div><\/div>\n\t\t\t<\/div>\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tabs-content\" style=\"\"><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap active\" id=\"ub-tabbed-content-5506f5c9-3e8d-45a5-a10d-bfa438d138c4-panel-0\" aria-labelledby=\"ub-tabbed-content-5506f5c9-3e8d-45a5-a10d-bfa438d138c4-tab-0\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" 
data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"r\",\"mime\":\"text\/x-rsrc\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"R\",\"maxHeight\":\"400px\",\"modeName\":\"r\"}'>library(shapviz)\nlibrary(tidyverse)\nlibrary(h2o)\n\nh2o.init()\n\nset.seed(1)\n\n# Get rid of those darn ordinals (turn ordered factors into unordered ones)\nord &lt;- c(\"clarity\", \"cut\", \"color\")\ndiamonds[, ord] &lt;- lapply(diamonds[, ord], factor, ordered = FALSE)\n\n# Minimally tuned GBM with 260 trees, determined by early stopping with CV\ndia_h2o &lt;- as.h2o(diamonds)\nfit &lt;- h2o.gbm(\n  x = c(\"carat\", \"clarity\", \"color\", \"cut\"),\n  y = \"price\",\n  training_frame = dia_h2o,\n  nfolds = 5,\n  learn_rate = 0.05,\n  max_depth = 4,\n  ntrees = 10000,\n  stopping_rounds = 10,\n  score_each_iteration = TRUE\n)\nfit\n\n# SHAP analysis on 2000 randomly sampled diamonds with carat &lt;= 2.5\nX_small &lt;- diamonds %&gt;%\n  filter(carat &lt;= 2.5) %&gt;%\n  sample_n(2000) %&gt;%\n  as.h2o()\n\n# shapviz() is applied directly to the fitted H2O model\nshp &lt;- shapviz(fit, X_pred = X_small)\n\n# SHAP importance (bar) plot and SHAP summary (beeswarm) plot\nsv_importance(shp, show_numbers = TRUE)\nsv_importance(shp, show_numbers = TRUE, kind = \"bee\")\n\n# Dependence plot of color; \"auto\" selects the feature shown on the color scale\nsv_dependence(shp, \"color\", \"auto\", alpha = 0.5)\n\n# Decompose the first prediction into feature contributions\nsv_force(shp, row_id = 1)\nsv_waterfall(shp, row_id = 1)<\/pre><\/div>\n\n<\/div><\/div>\n\t\t<\/div>\n\n\n<h3 class=\"wp-block-heading\">Summary and importance plots<\/h3>\n\n\n\n<p>The SHAP importance and SHAP summary plots clearly show that carat is the most important variable. On average, it moves the prediction by 3247 USD (its mean absolute SHAP value). The effect of &#8220;cut&#8221; is much smaller: on average, it moves the predictions by plus or minus 112 USD. 
<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"582\" height=\"365\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/07\/bee.jpeg\" alt=\"\" class=\"wp-image-923\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/07\/bee.jpeg 582w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/07\/bee-300x188.jpeg 300w\" sizes=\"auto, (max-width: 582px) 100vw, 582px\" \/><figcaption>SHAP summary plot<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"447\" height=\"365\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/07\/imp.jpeg\" alt=\"\" class=\"wp-image-922\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/07\/imp.jpeg 447w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/07\/imp-300x245.jpeg 300w\" sizes=\"auto, (max-width: 447px) 100vw, 447px\" \/><figcaption>SHAP importance plot<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">SHAP dependence plot<\/h3>\n\n\n\n<p>The SHAP dependence plot shows the effect of &#8220;color&#8221; on the prediction: the better the color (the closer to &#8220;D&#8221;), the higher the price. Using a correlation-based heuristic, the plot automatically selected carat for the color scale. It shows that the color effect is strongly influenced by carat: the impact of color increases with diamond weight. 
This clearly makes sense!<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"583\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/07\/dep_color-1024x583.png\" alt=\"\" class=\"wp-image-924\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/07\/dep_color-1024x583.png 1024w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/07\/dep_color-300x171.png 300w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/07\/dep_color-768x438.png 768w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/07\/dep_color.png 1276w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Dependence plot for &#8220;color&#8221;<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Waterfall and force plots<\/h3>\n\n\n\n<p>Finally, the waterfall and force plots show how a single prediction is decomposed into a baseline value plus one contribution per feature. While this does not tell much about the model itself, it can be helpful for explaining what SHAP values are and for debugging strange predictions.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"584\" height=\"365\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/07\/waterfall.png\" alt=\"\" class=\"wp-image-926\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/07\/waterfall.png 584w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/07\/waterfall-300x188.png 300w\" sizes=\"auto, (max-width: 584px) 100vw, 584px\" \/><figcaption>Waterfall plot<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"711\" height=\"424\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/07\/force.png\" alt=\"\" class=\"wp-image-925\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/07\/force.png 711w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/07\/force-300x179.png 300w\" sizes=\"auto, (max-width: 711px) 
100vw, 711px\" \/><figcaption>Force plot<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Short wrap-up<\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>Combining &#8220;shapviz&#8221; and H2O is fun. Okay, that one was subjective :-).<\/li><li>Good visualization of ML models is extremely helpful and reassuring.<\/li><\/ul>\n\n\n\n<p>The complete R script can be found <a href=\"https:\/\/github.com\/lorentzenchr\/notebooks\/blob\/master\/blogposts\/2022-07-11_shapviz_h2o.R\">here<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The &#8220;shapviz&#8221; package now plays well together with H2O.<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16,17,9],"tags":[5],"class_list":["post-920","post","type-post","status-publish","format-standard","hentry","category-machine-learning","category-programming","category-statistics","tag-r"],"featured_image_src":null,"author_info":{"display_name":"Michael 
Mayer","author_link":"https:\/\/lorentzen.ch\/index.php\/author\/michael\/"},"_links":{"self":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/920","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/comments?post=920"}],"version-history":[{"count":2,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/920\/revisions"}],"predecessor-version":[{"id":928,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/920\/revisions\/928"}],"wp:attachment":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/media?parent=920"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/categories?post=920"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/tags?post=920"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}