{"id":1253,"date":"2023-06-07T16:52:13","date_gmt":"2023-06-07T14:52:13","guid":{"rendered":"https:\/\/lorentzen.ch\/?p=1253"},"modified":"2023-06-07T16:52:13","modified_gmt":"2023-06-07T14:52:13","slug":"geographic-shap","status":"publish","type":"post","link":"https:\/\/lorentzen.ch\/index.php\/2023\/06\/07\/geographic-shap\/","title":{"rendered":"Geographic SHAP"},"content":{"rendered":"\n<h1 class=\"wp-block-heading\">Lost in Translation between R and Python 10<\/h1>\n\n\n\n<p>This is the next article in our series <strong>&#8220;Lost in Translation between R and Python&#8221;<\/strong>. The aim of this series is to provide high-quality R <strong>and<\/strong> Python code to achieve some non-trivial tasks. If you are to learn R, check out the R tab below. Similarly, if you are to learn Python, the Python tab will be your friend.<\/p>\n\n\n\n<p>This post is heavily based on the new {shapviz} <a href=\"https:\/\/modeloriented.github.io\/shapviz\/articles\/geographic.html\">vignette<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Setting<\/h2>\n\n\n\n<p>Besides other features, a model with geographic components contains features like<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>latitude and longitude,<\/li>\n\n\n\n<li>postal code, and\/or<\/li>\n\n\n\n<li>other features that depend on location, e.g., distance to next restaurant.<\/li>\n<\/ul>\n\n\n\n<p>Like any feature, the effect of a single geographic feature can be described using <strong>SHAP dependence<\/strong> plots. However, studying the effect of latitude (or any other location dependent feature) alone is often not very illuminating &#8211; simply due to strong interaction effects and correlations with other geographic features.<\/p>\n\n\n\n<p>That&#8217;s where the <strong>additivity of SHAP<\/strong> values comes into play: The sum of SHAP values of all geographic components represent the total geographic effect, and this sum can be visualized as a heatmap or 3D scatterplot against latitude\/longitude (or any other geographic representation).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">A first example <\/h2>\n\n\n\n<p>For illustration, we will use a beautiful house price dataset containing information on about 14&#8217;000 houses sold in 2016 in Miami-Dade County. Some of the columns are as follows:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>SALE_PRC<\/strong>: Sale price in USD: Its logarithm will be our model response.<\/li>\n\n\n\n<li><em>LATITUDE<\/em>, <em>LONGITUDE<\/em>: Coordinates<\/li>\n\n\n\n<li><em>CNTR_DIST<\/em>: Distance to central business district<\/li>\n\n\n\n<li><em>OCEAN_DIST<\/em>: Distance (ft) to the ocean<\/li>\n\n\n\n<li><em>RAIL_DIST<\/em>: Distance (ft) to the next railway track<\/li>\n\n\n\n<li><em>HWY_DIST<\/em>: Distance (ft) to next highway<\/li>\n\n\n\n<li>TOT_LVG_AREA: Living area in square feet<\/li>\n\n\n\n<li>LND_SQFOOT: Land area in square feet<\/li>\n\n\n\n<li>structure_quality: Measure of building quality (1: worst to 5: best)<\/li>\n\n\n\n<li>age: Age of the building in years<\/li>\n<\/ul>\n\n\n\n<p>(Italic features are geographic components.) For more background on this dataset, see <a href=\"https:\/\/www.mdpi.com\/1595920\">Mayer et al [2].<\/a><\/p>\n\n\n\n<p>We will fit an XGBoost model to explain log(price) as a function of lat\/long, size, and quality\/age.<\/p>\n\n\n<div class=\"wp-block-ub-tabbed-content wp-block-ub-tabbed-content-holder wp-block-ub-tabbed-content-horizontal-holder-mobile wp-block-ub-tabbed-content-horizontal-holder-tablet\" id=\"ub-tabbed-content-2122694c-03da-4101-9853-656777da49eb\" style=\"\">\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-holder horizontal-tab-width-mobile horizontal-tab-width-tablet\">\n\t\t\t\t<div role=\"tablist\" class=\"wp-block-ub-tabbed-content-tabs-title wp-block-ub-tabbed-content-tabs-title-mobile-horizontal-tab wp-block-ub-tabbed-content-tabs-title-tablet-horizontal-tab\" style=\"justify-content: flex-start; \"><div role=\"tab\" id=\"ub-tabbed-content-2122694c-03da-4101-9853-656777da49eb-tab-0\" aria-controls=\"ub-tabbed-content-2122694c-03da-4101-9853-656777da49eb-panel-0\" aria-selected=\"true\" class=\"wp-block-ub-tabbed-content-tab-title-wrap active\" style=\"--ub-tabbed-title-background-color: #6d6d6d; --ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">R<\/div>\n\t\t\t<\/div><div role=\"tab\" id=\"ub-tabbed-content-2122694c-03da-4101-9853-656777da49eb-tab-1\" aria-controls=\"ub-tabbed-content-2122694c-03da-4101-9853-656777da49eb-panel-1\" aria-selected=\"false\" class=\"wp-block-ub-tabbed-content-tab-title-wrap\" style=\"--ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">Python<\/div>\n\t\t\t<\/div><\/div>\n\t\t\t<\/div>\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tabs-content\" style=\"\"><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap active\" id=\"ub-tabbed-content-2122694c-03da-4101-9853-656777da49eb-panel-0\" aria-labelledby=\"ub-tabbed-content-2122694c-03da-4101-9853-656777da49eb-tab-0\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"r\",\"mime\":\"text\/x-rsrc\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"R\",\"maxHeight\":\"400px\",\"modeName\":\"r\"}'>devtools::install_github(\"ModelOriented\/shapviz\", dependencies = TRUE)\nlibrary(xgboost)\nlibrary(ggplot2)\nlibrary(shapviz)  # Needs development version 0.9.0 from github\n\nhead(miami)\n\nx_coord &lt;- c(\"LATITUDE\", \"LONGITUDE\")\nx_nongeo &lt;- c(\"TOT_LVG_AREA\", \"LND_SQFOOT\", \"structure_quality\", \"age\")\nx &lt;- c(x_coord, x_nongeo)\n\n# Train\/valid split\nset.seed(1)\nix &lt;- sample(nrow(miami), 0.8 * nrow(miami))\nX_train &lt;- data.matrix(miami[ix, x])\nX_valid &lt;- data.matrix(miami[-ix, x])\ny_train &lt;- log(miami$SALE_PRC[ix])\ny_valid &lt;- log(miami$SALE_PRC[-ix])\n\n# Fit XGBoost model with early stopping\ndtrain &lt;- xgb.DMatrix(X_train, label = y_train)\ndvalid &lt;- xgb.DMatrix(X_valid, label = y_valid)\n\nparams &lt;- list(learning_rate = 0.2, objective = \"reg:squarederror\", max_depth = 5)\n\nfit &lt;- xgb.train(\n  params = params, \n  data = dtrain, \n  watchlist = list(valid = dvalid), \n  early_stopping_rounds = 20,\n  nrounds = 1000,\n  callbacks = list(cb.print.evaluation(period = 100))\n)<\/pre><\/div>\n\n<\/div><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap ub-hide\" id=\"ub-tabbed-content-2122694c-03da-4101-9853-656777da49eb-panel-1\" aria-labelledby=\"ub-tabbed-content-2122694c-03da-4101-9853-656777da49eb-tab-1\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"python\",\"mime\":\"text\/x-python\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"Python\",\"maxHeight\":\"400px\",\"modeName\":\"python\"}'>%load_ext lab_black\n\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom sklearn.datasets import fetch_openml\n\ndf = fetch_openml(data_id=43093, as_frame=True)\nX, y = df.data, np.log(df.target)\nX.head()\n\n# Data split and model\nfrom sklearn.model_selection import train_test_split\nimport xgboost as xgb\n\nx_coord = [\"LONGITUDE\", \"LATITUDE\"]\nx_nongeo = [\"TOT_LVG_AREA\", \"LND_SQFOOT\", \"structure_quality\", \"age\"]\nx = x_coord + x_nongeo\n\nX_train, X_valid, y_train, y_valid = train_test_split(\n    X[x], y, test_size=0.2, random_state=30\n)\n\n# Fit XGBoost model with early stopping\ndtrain = xgb.DMatrix(X_train, label=y_train)\ndvalid = xgb.DMatrix(X_valid, label=y_valid)\n\nparams = dict(learning_rate=0.2, objective=\"reg:squarederror\", max_depth=5)\n\nfit = xgb.train(\n    params=params,\n    dtrain=dtrain,\n    evals=[(dvalid, \"valid\")],\n    verbose_eval=100,\n    early_stopping_rounds=20,\n    num_boost_round=1000,\n)\n<\/pre><\/div>\n\n<\/div><\/div>\n\t\t<\/div>\n\n\n<h3 class=\"wp-block-heading\">SHAP dependence plots<\/h3>\n\n\n\n<p>Let&#8217;s first study selected SHAP dependence plots, evaluated on the validation dataset with around 2800 observations. Note that we could as well use the training data for this purpose, but it is a bit large.<\/p>\n\n\n<div class=\"wp-block-ub-tabbed-content wp-block-ub-tabbed-content-holder wp-block-ub-tabbed-content-horizontal-holder-mobile wp-block-ub-tabbed-content-horizontal-holder-tablet\" id=\"ub-tabbed-content-ebae3a58-69dc-4e2e-81bf-6db87c162b85\" style=\"\">\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-holder horizontal-tab-width-mobile horizontal-tab-width-tablet\">\n\t\t\t\t<div role=\"tablist\" class=\"wp-block-ub-tabbed-content-tabs-title wp-block-ub-tabbed-content-tabs-title-mobile-horizontal-tab wp-block-ub-tabbed-content-tabs-title-tablet-horizontal-tab\" style=\"justify-content: flex-start; \"><div role=\"tab\" id=\"ub-tabbed-content-ebae3a58-69dc-4e2e-81bf-6db87c162b85-tab-0\" aria-controls=\"ub-tabbed-content-ebae3a58-69dc-4e2e-81bf-6db87c162b85-panel-0\" aria-selected=\"false\" class=\"wp-block-ub-tabbed-content-tab-title-wrap\" style=\"--ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">R<\/div>\n\t\t\t<\/div><div role=\"tab\" id=\"ub-tabbed-content-ebae3a58-69dc-4e2e-81bf-6db87c162b85-tab-1\" aria-controls=\"ub-tabbed-content-ebae3a58-69dc-4e2e-81bf-6db87c162b85-panel-1\" aria-selected=\"true\" class=\"wp-block-ub-tabbed-content-tab-title-wrap active\" style=\"--ub-tabbed-title-background-color: #6d6d6d; --ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">Python<\/div>\n\t\t\t<\/div><\/div>\n\t\t\t<\/div>\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tabs-content\" style=\"\"><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap ub-hide\" id=\"ub-tabbed-content-ebae3a58-69dc-4e2e-81bf-6db87c162b85-panel-0\" aria-labelledby=\"ub-tabbed-content-ebae3a58-69dc-4e2e-81bf-6db87c162b85-tab-0\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"r\",\"mime\":\"text\/x-rsrc\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"R\",\"maxHeight\":\"400px\",\"modeName\":\"r\"}'>sv &lt;- shapviz(fit, X_pred = X_valid)\nsv_dependence(\n  sv, \n  v = c(\"TOT_LVG_AREA\", \"structure_quality\", \"LONGITUDE\", \"LATITUDE\"), \n  alpha = 0.2\n)<\/pre><\/div>\n\n<\/div><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap active\" id=\"ub-tabbed-content-ebae3a58-69dc-4e2e-81bf-6db87c162b85-panel-1\" aria-labelledby=\"ub-tabbed-content-ebae3a58-69dc-4e2e-81bf-6db87c162b85-tab-1\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"python\",\"mime\":\"text\/x-python\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"Python\",\"maxHeight\":\"400px\",\"modeName\":\"python\"}'>import shap\n\nxgb_explainer = shap.Explainer(fit)\nshap_values = xgb_explainer(X_valid)\n\nv = [\"TOT_LVG_AREA\", \"structure_quality\", \"LONGITUDE\", \"LATITUDE\"]\nshap.plots.scatter(shap_values[:, v], color=shap_values[:, v])<\/pre><\/div>\n\n<\/div><\/div>\n\t\t<\/div>\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/06\/image-1024x352.png\" alt=\"\" class=\"wp-image-1254\" width=\"688\" height=\"236\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/06\/image-1024x352.png 1024w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/06\/image-300x103.png 300w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/06\/image-768x264.png 768w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/06\/image.png 1319w\" sizes=\"auto, (max-width: 688px) 100vw, 688px\" \/><figcaption class=\"wp-element-caption\">SHAP dependence plots of selected features (Python output).<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Total coordindate effect<\/h3>\n\n\n\n<p>And now the two-dimensional plot of the sum of SHAP values:<\/p>\n\n\n<div class=\"wp-block-ub-tabbed-content wp-block-ub-tabbed-content-holder wp-block-ub-tabbed-content-horizontal-holder-mobile wp-block-ub-tabbed-content-horizontal-holder-tablet\" id=\"ub-tabbed-content-c4cce703-b446-48e2-810e-ffd0fa1eaa9e\" style=\"\">\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-holder horizontal-tab-width-mobile horizontal-tab-width-tablet\">\n\t\t\t\t<div role=\"tablist\" class=\"wp-block-ub-tabbed-content-tabs-title wp-block-ub-tabbed-content-tabs-title-mobile-horizontal-tab wp-block-ub-tabbed-content-tabs-title-tablet-horizontal-tab\" style=\"justify-content: flex-start; \"><div role=\"tab\" id=\"ub-tabbed-content-c4cce703-b446-48e2-810e-ffd0fa1eaa9e-tab-0\" aria-controls=\"ub-tabbed-content-c4cce703-b446-48e2-810e-ffd0fa1eaa9e-panel-0\" aria-selected=\"true\" class=\"wp-block-ub-tabbed-content-tab-title-wrap active\" style=\"--ub-tabbed-title-background-color: #6d6d6d; --ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">R<\/div>\n\t\t\t<\/div><div role=\"tab\" id=\"ub-tabbed-content-c4cce703-b446-48e2-810e-ffd0fa1eaa9e-tab-1\" aria-controls=\"ub-tabbed-content-c4cce703-b446-48e2-810e-ffd0fa1eaa9e-panel-1\" aria-selected=\"false\" class=\"wp-block-ub-tabbed-content-tab-title-wrap\" style=\"--ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">Python<\/div>\n\t\t\t<\/div><\/div>\n\t\t\t<\/div>\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tabs-content\" style=\"\"><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap active\" id=\"ub-tabbed-content-c4cce703-b446-48e2-810e-ffd0fa1eaa9e-panel-0\" aria-labelledby=\"ub-tabbed-content-c4cce703-b446-48e2-810e-ffd0fa1eaa9e-tab-0\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"r\",\"mime\":\"text\/x-rsrc\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"R\",\"maxHeight\":\"400px\",\"modeName\":\"r\"}'>sv_dependence2D(sv, x = \"LONGITUDE\", y = \"LATITUDE\") +\n  coord_equal()<\/pre><\/div>\n\n<\/div><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap ub-hide\" id=\"ub-tabbed-content-c4cce703-b446-48e2-810e-ffd0fa1eaa9e-panel-1\" aria-labelledby=\"ub-tabbed-content-c4cce703-b446-48e2-810e-ffd0fa1eaa9e-tab-1\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"python\",\"mime\":\"text\/x-python\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"Python\",\"maxHeight\":\"400px\",\"modeName\":\"python\"}'>shap_coord = shap_values[:, x_coord]\nplt.scatter(*list(shap_coord.data.T), c=shap_coord.values.sum(axis=1), s=4)\nax = plt.gca()\nax.set_aspect(\"equal\", adjustable=\"box\")\nplt.colorbar()\nplt.title(\"Total location effect\")\nplt.show()<\/pre><\/div>\n\n<\/div><\/div>\n\t\t<\/div>\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"437\" height=\"433\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/06\/image-1.png\" alt=\"\" class=\"wp-image-1255\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/06\/image-1.png 437w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/06\/image-1-300x297.png 300w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/06\/image-1-150x150.png 150w\" sizes=\"auto, (max-width: 437px) 100vw, 437px\" \/><figcaption class=\"wp-element-caption\">Sum of SHAP values on color scale against coordinates (Python output).<\/figcaption><\/figure>\n\n\n\n<p>The last plot gives a good impression on price levels, but note:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Since we have modeled logarithmic prices, the effects are on relative scale (0.1 means about 10% above average).<\/li>\n\n\n\n<li>Due to interaction effects with non-geographic components, the location effects might depend on features like living area. This is not visible in above plot. We will modify the model now to improve this aspect.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">Two modifications<\/h2>\n\n\n\n<p>We will now change above model in two ways, not unlike the model in <a href=\"https:\/\/www.mdpi.com\/1595920.\">Mayer et al [2].<\/a><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>We will use additional geographic features like distance to railway track or to the ocean.<\/li>\n\n\n\n<li>We will use interaction constraints to allow only interactions between geographic features.<\/li>\n<\/ol>\n\n\n\n<p>The second step leads to a model that is additive in each non-geographic component and also additive in the combined location effect. According to the technical report of <a href=\"https:\/\/arxiv.org\/abs\/2207.14490\">Mayer<\/a> [1], SHAP dependence plots of additive components in a boosted trees model are shifted versions of corresponding partial dependence plots (evaluated at observed values). This allows a &#8220;Ceteris Paribus&#8221; interpretation of SHAP dependence plots of corresponding components.<\/p>\n\n\n<div class=\"wp-block-ub-tabbed-content wp-block-ub-tabbed-content-holder wp-block-ub-tabbed-content-horizontal-holder-mobile wp-block-ub-tabbed-content-horizontal-holder-tablet\" id=\"ub-tabbed-content-32d39b1b-7b5b-404f-bf2b-983a6099e6dd\" style=\"\">\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-holder horizontal-tab-width-mobile horizontal-tab-width-tablet\">\n\t\t\t\t<div role=\"tablist\" class=\"wp-block-ub-tabbed-content-tabs-title wp-block-ub-tabbed-content-tabs-title-mobile-horizontal-tab wp-block-ub-tabbed-content-tabs-title-tablet-horizontal-tab\" style=\"justify-content: flex-start; \"><div role=\"tab\" id=\"ub-tabbed-content-32d39b1b-7b5b-404f-bf2b-983a6099e6dd-tab-0\" aria-controls=\"ub-tabbed-content-32d39b1b-7b5b-404f-bf2b-983a6099e6dd-panel-0\" aria-selected=\"true\" class=\"wp-block-ub-tabbed-content-tab-title-wrap active\" style=\"--ub-tabbed-title-background-color: #6d6d6d; --ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">R<\/div>\n\t\t\t<\/div><div role=\"tab\" id=\"ub-tabbed-content-32d39b1b-7b5b-404f-bf2b-983a6099e6dd-tab-1\" aria-controls=\"ub-tabbed-content-32d39b1b-7b5b-404f-bf2b-983a6099e6dd-panel-1\" aria-selected=\"false\" class=\"wp-block-ub-tabbed-content-tab-title-wrap\" style=\"--ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">Python<\/div>\n\t\t\t<\/div><\/div>\n\t\t\t<\/div>\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tabs-content\" style=\"\"><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap active\" id=\"ub-tabbed-content-32d39b1b-7b5b-404f-bf2b-983a6099e6dd-panel-0\" aria-labelledby=\"ub-tabbed-content-32d39b1b-7b5b-404f-bf2b-983a6099e6dd-tab-0\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"r\",\"mime\":\"text\/x-rsrc\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"R\",\"maxHeight\":\"400px\",\"modeName\":\"r\"}'># Extend the feature set\nmore_geo &lt;- c(\"CNTR_DIST\", \"OCEAN_DIST\", \"RAIL_DIST\", \"HWY_DIST\")\nx2 &lt;- c(x, more_geo)\n\nX_train2 &lt;- data.matrix(miami[ix, x2])\nX_valid2 &lt;- data.matrix(miami[-ix, x2])\n\ndtrain2 &lt;- xgb.DMatrix(X_train2, label = y_train)\ndvalid2 &lt;- xgb.DMatrix(X_valid2, label = y_valid)\n\n# Build interaction constraint vector\nic &lt;- c(\n  list(which(x2 %in% c(x_coord, more_geo)) - 1),\n  as.list(which(x2 %in% x_nongeo) - 1)\n)\n\n# Modify parameters\nparams$interaction_constraints &lt;- ic\n\nfit2 &lt;- xgb.train(\n  params = params, \n  data = dtrain2, \n  watchlist = list(valid = dvalid2), \n  early_stopping_rounds = 20,\n  nrounds = 1000,\n  callbacks = list(cb.print.evaluation(period = 100))\n)\n\n# SHAP analysis\nsv2 &lt;- shapviz(fit2, X_pred = X_valid2)\n\n# Two selected features: Thanks to additivity, structure_quality can be read as \n# Ceteris Paribus\nsv_dependence(sv2, v = c(\"structure_quality\", \"LONGITUDE\"), alpha = 0.2)\n\n# Total geographic effect (Ceteris Paribus thanks to additivity)\nsv_dependence2D(sv2, x = \"LONGITUDE\", y = \"LATITUDE\", add_vars = more_geo) +\n  coord_equal()<\/pre><\/div>\n\n<\/div><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap ub-hide\" id=\"ub-tabbed-content-32d39b1b-7b5b-404f-bf2b-983a6099e6dd-panel-1\" aria-labelledby=\"ub-tabbed-content-32d39b1b-7b5b-404f-bf2b-983a6099e6dd-tab-1\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"python\",\"mime\":\"text\/x-python\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"Python\",\"maxHeight\":\"400px\",\"modeName\":\"python\"}'># Extend the feature set\nmore_geo = [\"CNTR_DIST\", \"OCEAN_DIST\", \"RAIL_DIST\", \"HWY_DIST\"]\nx2 = x + more_geo\n\nX_train2, X_valid2 = train_test_split(X[x2], test_size=0.2, random_state=30)\n\ndtrain2 = xgb.DMatrix(X_train2, label=y_train)\ndvalid2 = xgb.DMatrix(X_valid2, label=y_valid)\n\n# Build interaction constraint vector\nic = [x_coord + more_geo, *[[z] for z in x_nongeo]]\n\n# Modify parameters\nparams[\"interaction_constraints\"] = ic\n\nfit2 = xgb.train(\n    params=params,\n    dtrain=dtrain2,\n    evals=[(dvalid2, \"valid\")],\n    verbose_eval=100,\n    early_stopping_rounds=20,\n    num_boost_round=1000,\n)\n\n# SHAP analysis\nxgb_explainer2 = shap.Explainer(fit2)\nshap_values2 = xgb_explainer2(X_valid2)\n\nv = [\"structure_quality\", \"LONGITUDE\"]\nshap.plots.scatter(shap_values2[:, v], color=shap_values2[:, v])\n\n# Total location effect\nshap_coord2 = shap_values2[:, x_coord]\nc = shap_values2[:, x_coord + more_geo].values.sum(axis=1)\nplt.scatter(*list(shap_coord2.data.T), c=c, s=4)\nax = plt.gca()\nax.set_aspect(\"equal\", adjustable=\"box\")\nplt.colorbar()\nplt.title(\"Total location effect\")\nplt.show()<\/pre><\/div>\n\n<\/div><\/div>\n\t\t<\/div>\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1014\" height=\"459\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/06\/image-2.png\" alt=\"\" class=\"wp-image-1256\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/06\/image-2.png 1014w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/06\/image-2-300x136.png 300w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/06\/image-2-768x348.png 768w\" sizes=\"auto, (max-width: 1014px) 100vw, 1014px\" \/><figcaption class=\"wp-element-caption\">SHAP dependence plots of an additive feature (structure quality, no vertical scatter per unique feature value) and one of the geographic features (Python output).<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"437\" height=\"433\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/06\/image-3.png\" alt=\"\" class=\"wp-image-1257\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/06\/image-3.png 437w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/06\/image-3-300x297.png 300w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/06\/image-3-150x150.png 150w\" sizes=\"auto, (max-width: 437px) 100vw, 437px\" \/><figcaption class=\"wp-element-caption\">Sum of all geographic features (color) against coordinates. There are no interactions to non-geographic features, so the effect can be read Ceteris Paribus (Python output).<\/figcaption><\/figure>\n\n\n\n<p>Again, the resulting total geographic effect looks reasonable. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Wrap-Up<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>SHAP values of all geographic components in a model can be summed up and plotted on the color scale against coordinates (or some other geographic representation). This gives a lightning fast impression of the location effects.<\/li>\n\n\n\n<li>Interaction constraints between geographic and non-geographic features lead to Ceteris Paribus interpretation of <em>total geographic effect<\/em>s.<\/li>\n<\/ul>\n\n\n\n<p>The Python and R notebooks can be found here:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/github.com\/lorentzenchr\/notebooks\/blob\/master\/blogposts\/2023-06-07_geographic_shap.Rmd\">R Markdown<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/github.com\/lorentzenchr\/notebooks\/blob\/master\/blogposts\/2023-06-07_geographic_shap.ipynb\">Python Notebook<\/a><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">References<\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Mayer, Michael. 2022.&nbsp;\u201cSHAP for Additively Modeled Features in a Boosted Trees Model.\u201d&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2207.14490\">https:\/\/arxiv.org\/abs\/2207.14490<\/a>.<\/li>\n\n\n\n<li>Mayer, Michael, Steven C. Bourassa, Martin Hoesli, and Donato Flavio Scognamiglio. 2022.&nbsp;\u201cMachine Learning Applications to Land and Structure Valuation.\u201d&nbsp;<em>Journal of Risk and Financial Management<\/em>.<\/li>\n<\/ol>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>&#8220;R <-> Python&#8221; continued&#8230; Geographic SHAP<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16,17,9],"tags":[10,6,5],"class_list":["post-1253","post","type-post","status-publish","format-standard","hentry","category-machine-learning","category-programming","category-statistics","tag-lost-in-translation","tag-python","tag-r"],"featured_image_src":null,"author_info":{"display_name":"Michael Mayer","author_link":"https:\/\/lorentzen.ch\/index.php\/author\/michael\/"},"_links":{"self":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/1253","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/comments?post=1253"}],"version-history":[{"count":4,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/1253\/revisions"}],"predecessor-version":[{"id":1735,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/1253\/revisions\/1735"}],"wp:attachment":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/media?parent=1253"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/categories?post=1253"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/tags?post=1253"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}