{"id":1905,"date":"2025-05-01T05:00:00","date_gmt":"2025-05-01T03:00:00","guid":{"rendered":"https:\/\/lorentzen.ch\/?p=1905"},"modified":"2025-04-30T21:00:55","modified_gmt":"2025-04-30T19:00:55","slug":"model-diagnostics-statistics-vs-machine-learning","status":"publish","type":"post","link":"https:\/\/lorentzen.ch\/index.php\/2025\/05\/01\/model-diagnostics-statistics-vs-machine-learning\/","title":{"rendered":"Model Diagnostics: Statistics vs Machine Learning"},"content":{"rendered":"\n<p>In this post, we show how different <strong>use cases<\/strong> require different <strong>model diagnostics<\/strong>. In short, we compare (statistical) <strong>inference<\/strong> and <strong>prediction<\/strong>.<\/p>\n\n\n\n<p>As an example, we use a simple linear model for the Munich rent index dataset, which was kindly provided by the authors of <a href=\"https:\/\/doi.org\/10.1007\/978-3-662-63882-8\" data-type=\"link\" data-id=\"https:\/\/doi.org\/10.1007\/978-3-662-63882-8\">Regression &#8211; Models, Methods and Applications 2nd ed. (2021)<\/a>. This dataset contains monthy rents in EUR (<code>rent<\/code>) for about 3000 apartments in Munich, Germany, from 1999. 
The apartments have several features such as living area (<code>area<\/code>) in square meters, year of construction (<code>yearc<\/code>), quality of location (<code>location<\/code>, 0: average, 1: good, 2: top), quality of bathrooms (<code>bath<\/code>, 0: standard, 1: premium), quality of kitchen (<code>kitchen<\/code>, 0: standard, 1: premium), and an indicator for central heating (<code>cheating<\/code>).<\/p>\n\n\n\n<p>The target variable is <code><code><span class=\"katex-eq\" data-katex-display=\"false\">Y=\\text{rent}<\/span><\/code><\/code> and the goal of our model is to predict the mean rent, <code><code><span class=\"katex-eq\" data-katex-display=\"false\">E[Y]<\/span><\/code><\/code> (we omit the conditioning on X for brevity).<\/p>\n\n\n\n<p>Disclaimer: Before presenting the use cases, let me clearly state that I am not in the apartment rent business and everything here is merely for the purpose of demonstrating statistical good practice.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"0-inference\">Inference<\/h3>\n\n\n\n<p>The first use case is about inference on the effects of the features. Imagine the point of view of an investor who wants to know whether installing central heating is worth it (financially). To lay the groundwork for a decision, a statistician must have answers to:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>What is the <em>effect<\/em> of the variable <code>cheating<\/code> on the rent?<\/li>\n\n\n\n<li>Is this effect statistically significant?<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Prediction<\/h3>\n\n\n\n<p>The second use case is about prediction. This time, we take the point of view of someone looking for a new apartment to rent. In order to know whether the rent proposed by the landlord is about right or too high, a reference value would be very convenient. 
One can either ask the neighbors or ask a model to predict the rent of the apartment in question.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Model Fit<\/h2>\n\n\n\n<p>Before answering the above questions and doing some key diagnostics, we must load the data and fit a model. We choose a simple linear model and directly model <code>rent<\/code>.<\/p>\n\n\n\n<p>Notes:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>For rent indices as well as house prices, one often log-transforms the target variable before modelling, or one uses a log-link and an appropriate loss function (e.g. Gamma deviance).<\/li>\n\n\n\n<li>Our Python version uses <code>GeneralizedLinearRegressor<\/code> from the package <a href=\"https:\/\/glum.readthedocs.io\/\">glum<\/a>. We could just as well have chosen other implementations such as <a href=\"https:\/\/www.statsmodels.org\/stable\/generated\/statsmodels.regression.linear_model.OLS.html#statsmodels.regression.linear_model.OLS\">statsmodels.regression.linear_model.OLS<\/a>. Either way, we implement the residual diagnostics ourselves, which makes it transparent what is plotted.<\/li>\n<\/ul>\n\n\n\n<p>For brevity, we skip imports and data loading. 
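<\/p>

<p>The skipped loading step can be sketched as follows. This is a hypothetical stand-in with synthetic data (the real analysis reads the published dataset); only the column names and the train\/test sizes (2465\/617) come from the post.<\/p>

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Munich rent data; column names as in the post,
# all values are made up.
rng = np.random.default_rng(0)
n = 3082  # chosen so that an 80/20 split yields 2465 train and 617 test rows
df = pd.DataFrame(
    {
        "area": rng.integers(20, 160, n),
        "yearc": rng.integers(1918, 1998, n),
        "location": rng.integers(0, 3, n),
        "bath": rng.integers(0, 2, n),
        "kitchen": rng.integers(0, 2, n),
        "cheating": rng.integers(0, 2, n),
    }
)
df["rent"] = 2.0 * df["area"] + 100.0 * df["cheating"] + rng.normal(0, 50, n)

X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="rent"), df["rent"], test_size=0.2, random_state=42
)
```

<p>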
Our model is then fit by:<\/p>\n\n\n<div class=\"wp-block-ub-tabbed-content wp-block-ub-tabbed-content-holder wp-block-ub-tabbed-content-horizontal-holder-mobile wp-block-ub-tabbed-content-horizontal-holder-tablet\" id=\"ub-tabbed-content-8c981e59-b644-4bb0-8cd3-86a06952cf37\" style=\"\">\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-holder horizontal-tab-width-mobile horizontal-tab-width-tablet\">\n\t\t\t\t<div role=\"tablist\" class=\"wp-block-ub-tabbed-content-tabs-title wp-block-ub-tabbed-content-tabs-title-mobile-horizontal-tab wp-block-ub-tabbed-content-tabs-title-tablet-horizontal-tab\" style=\"justify-content: flex-start; \"><div role=\"tab\" id=\"ub-tabbed-content-8c981e59-b644-4bb0-8cd3-86a06952cf37-tab-0\" aria-controls=\"ub-tabbed-content-8c981e59-b644-4bb0-8cd3-86a06952cf37-panel-0\" aria-selected=\"true\" class=\"wp-block-ub-tabbed-content-tab-title-wrap active\" style=\"--ub-tabbed-title-background-color: #eeeeee; --ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #eeeeee; text-align: left; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">Python<\/div>\n\t\t\t<\/div><div role=\"tab\" id=\"ub-tabbed-content-8c981e59-b644-4bb0-8cd3-86a06952cf37-tab-1\" aria-controls=\"ub-tabbed-content-8c981e59-b644-4bb0-8cd3-86a06952cf37-panel-1\" aria-selected=\"false\" class=\"wp-block-ub-tabbed-content-tab-title-wrap\" style=\"--ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #eeeeee; text-align: left; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">R<\/div>\n\t\t\t<\/div><\/div>\n\t\t\t<\/div>\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tabs-content\" style=\"\"><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap active\" id=\"ub-tabbed-content-8c981e59-b644-4bb0-8cd3-86a06952cf37-panel-0\" aria-labelledby=\"ub-tabbed-content-8c981e59-b644-4bb0-8cd3-86a06952cf37-tab-0\" tabindex=\"0\">\n\n<div 
class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"python\",\"mime\":\"text\/x-python\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"Python\",\"language\":\"Python\",\"maxHeight\":\"400px\",\"modeName\":\"python\"}'>lm = glum.GeneralizedLinearRegressor(\n    alpha=0,\n    drop_first=True,  # this is very important if alpha=0\n    formula=\"bs(area, degree=3, df=4) + yearc\"\n      \t\" + C(location) + C(bath) + C(kitchen) + C(cheating)\"\n)\nlm.fit(X_train, y_train)<\/pre><\/div>\n\n<\/div><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap ub-hide\" id=\"ub-tabbed-content-8c981e59-b644-4bb0-8cd3-86a06952cf37-panel-1\" aria-labelledby=\"ub-tabbed-content-8c981e59-b644-4bb0-8cd3-86a06952cf37-tab-1\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"r\",\"mime\":\"text\/x-rsrc\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"R\",\"language\":\"R\",\"maxHeight\":\"400px\",\"modeName\":\"r\"}'>model = lm(\n  formula = rent ~ bs(area, degree = 3, df = 4) + yearc + location + bath + kitchen + cheating,\n  data = df_train\n)<\/pre><\/div>\n\n<\/div><\/div>\n\t\t<\/div>\n\n\n<h2 class=\"wp-block-heading\">Diagnostics for Inference<\/h2>\n\n\n\n<p>The coefficient table will already tell us the effect of the <code>cheating<\/code> variable. 
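<\/p>

<p>As a quick sanity check on the table below: under the normality assumption, the 95% confidence interval is approximately the coefficient plus\/minus 1.96 standard errors. For <code>cheating<\/code>:<\/p>

```python
from scipy import stats

# Values taken from the cheating row of the coefficient table in the post.
coef, se = 107.8, 10.6
z = stats.norm.ppf(0.975)  # two-sided 95% quantile, approximately 1.96
ci = (round(coef - z * se, 1), round(coef + z * se, 1))
print(ci)  # (87.0, 128.6), matching ci_lower and ci_upper
```

<p>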
For more involved models such as gradient-boosted trees or neural nets, one can use partial dependence plots and SHAP values to assess the effect of features.<\/p>\n\n\n<div class=\"wp-block-ub-tabbed-content wp-block-ub-tabbed-content-holder wp-block-ub-tabbed-content-horizontal-holder-mobile wp-block-ub-tabbed-content-horizontal-holder-tablet\" id=\"ub-tabbed-content-f0e7ee03-1654-4365-9827-c7d35472f6bc\" style=\"\">\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-holder horizontal-tab-width-mobile horizontal-tab-width-tablet\">\n\t\t\t\t<div role=\"tablist\" class=\"wp-block-ub-tabbed-content-tabs-title wp-block-ub-tabbed-content-tabs-title-mobile-horizontal-tab wp-block-ub-tabbed-content-tabs-title-tablet-horizontal-tab\" style=\"justify-content: flex-start; \"><div role=\"tab\" id=\"ub-tabbed-content-f0e7ee03-1654-4365-9827-c7d35472f6bc-tab-0\" aria-controls=\"ub-tabbed-content-f0e7ee03-1654-4365-9827-c7d35472f6bc-panel-0\" aria-selected=\"true\" class=\"wp-block-ub-tabbed-content-tab-title-wrap active\" style=\"--ub-tabbed-title-background-color: #eeeeee; --ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #eeeeee; text-align: left; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">Python<\/div>\n\t\t\t<\/div><div role=\"tab\" id=\"ub-tabbed-content-f0e7ee03-1654-4365-9827-c7d35472f6bc-tab-1\" aria-controls=\"ub-tabbed-content-f0e7ee03-1654-4365-9827-c7d35472f6bc-panel-1\" aria-selected=\"false\" class=\"wp-block-ub-tabbed-content-tab-title-wrap\" style=\"--ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #eeeeee; text-align: left; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">R<\/div>\n\t\t\t<\/div><\/div>\n\t\t\t<\/div>\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tabs-content\" style=\"\"><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap active\" 
id=\"ub-tabbed-content-f0e7ee03-1654-4365-9827-c7d35472f6bc-panel-0\" aria-labelledby=\"ub-tabbed-content-f0e7ee03-1654-4365-9827-c7d35472f6bc-tab-0\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"python\",\"mime\":\"text\/x-python\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"Python\",\"language\":\"Python\",\"maxHeight\":\"400px\",\"modeName\":\"python\"}'>lm.coef_table(X_train, y_train)<\/pre><\/div>\n\n<\/div><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap ub-hide\" id=\"ub-tabbed-content-f0e7ee03-1654-4365-9827-c7d35472f6bc-panel-1\" aria-labelledby=\"ub-tabbed-content-f0e7ee03-1654-4365-9827-c7d35472f6bc-tab-1\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"r\",\"mime\":\"text\/x-rsrc\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"R\",\"language\":\"R\",\"maxHeight\":\"400px\",\"modeName\":\"r\"}'>summary(model)\nconfint(model)<\/pre><\/div>\n\n<\/div><\/div>\n\t\t<\/div>\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th><strong>Variable<\/strong><\/th><th><strong>coef<\/strong><\/th><th><strong>se<\/strong><\/th><th><strong>p_value<\/strong><\/th><th><strong>ci_lower<\/strong><\/th><th><strong>ci_upper<\/strong><\/th><\/tr><\/thead><tbody><tr><td>intercept<\/td><td>-3682.5<\/td><td>327.0<\/td><td>0.0<\/td><td>-4323<\/td><td>-3041<\/td><\/tr><tr><td>bs(area, 
..)[1]<\/td><td>88.5<\/td><td>31.3<\/td><td>4.6e-03<\/td><td>27<\/td><td>150<\/td><\/tr><tr><td>bs(area, ..)[2]<\/td><td>316.8<\/td><td>24.5<\/td><td>0.0<\/td><td>269<\/td><td>365<\/td><\/tr><tr><td>bs(area, ..)[3]<\/td><td>547.7<\/td><td>62.8<\/td><td>0.0<\/td><td>425<\/td><td>671<\/td><\/tr><tr><td>bs(area, ..)[4]<\/td><td>733.7<\/td><td>91.7<\/td><td>1.3e-15<\/td><td>554<\/td><td>913<\/td><\/tr><tr><td>yearc<\/td><td>1.9<\/td><td>0.2<\/td><td>0.0<\/td><td>1.6<\/td><td>2.3<\/td><\/tr><tr><td>C(location)[2]<\/td><td>48.2<\/td><td>5.9<\/td><td>4.4e-16<\/td><td>37<\/td><td>60<\/td><\/tr><tr><td>C(location)[3]<\/td><td>137.9<\/td><td>27.7<\/td><td>6.6e-07<\/td><td>84<\/td><td>192<\/td><\/tr><tr><td>C(bath)[1]<\/td><td>50.0<\/td><td>16.5<\/td><td>2.4e-03<\/td><td>18<\/td><td>82<\/td><\/tr><tr><td>C(kitchen)[1]<\/td><td>98.2<\/td><td>18.5<\/td><td>1.1e-07<\/td><td>62<\/td><td>134<\/td><\/tr><tr><td>C(cheating)[1]<\/td><td>107.8<\/td><td>10.6<\/td><td>0.0<\/td><td>87.0<\/td><td>128.6<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>We see that, <em>ceteris paribus<\/em> (all else equal), central heating increases the monthly rent by about 108 EUR. It is not the size of the effect (108 EUR) but the existence of an effect of central heating on the rent that is statistically significant:<br>This is indicated by the very low probability, i.e. 
p-value, for the null hypothesis of <code>cheating<\/code> having a coefficient of zero.<br>We also see the confidence interval at the default confidence level of 95%: [<code>ci_lower<\/code>, <code>ci_upper<\/code>] = [87, 129].<br>It shows the uncertainty of the estimated effect.<\/p>\n\n\n\n<p>For an investment horizon of about 10 years, the estimated effect gives roughly a budget of 13000 EUR per apartment (range is roughly 10500 to 15500 with 95% confidence), i.e. about 130000 EUR for a building with 10 apartments.<\/p>\n\n\n\n<p>A good statistician should ask several further questions:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Is the dataset at hand a good representation of the population?<\/li>\n\n\n\n<li>Are there confounders or interaction effects, in particular between <code>cheating<\/code> and other features?<\/li>\n\n\n\n<li>Are the assumptions for the low p-value and the confidence interval of <code>cheating<\/code> valid?<\/li>\n<\/ul>\n\n\n\n<p>Here, we will only address the last question, and even that one only partially. Which assumptions were made? The error term, <code><code><span class=\"katex-eq\" data-katex-display=\"false\">\\epsilon = Y - E[Y]<\/span><\/code><\/code>, should be homoscedastic and normally distributed. As the error is not observable (because the <em>true model<\/em> for <code><code><span class=\"katex-eq\" data-katex-display=\"false\">E[Y]<\/span><\/code><\/code> is unknown), one replaces <code><code><span class=\"katex-eq\" data-katex-display=\"false\">E[Y]<\/span><\/code><\/code> by the model prediction <code><code><span class=\"katex-eq\" data-katex-display=\"false\">\\hat{E}[Y]<\/span><\/code><\/code>; this gives the residuals, <code><code><span class=\"katex-eq\" data-katex-display=\"false\">\\hat{\\epsilon} = Y - \\hat{E}[Y] = y - \\text{fitted values}<\/span><\/code><\/code>, instead. For homoscedasticity, the residuals should look like white (random) noise. 
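<\/p>

<p>As a toy illustration of that check (synthetic numbers; in the post, <code>y<\/code> would be <code>y_train<\/code> and the fitted values would come from <code>lm.predict(X_train)<\/code>):<\/p>

```python
import numpy as np

# Toy check for heteroscedasticity: residual spread vs. fitted values.
# Synthetic stand-ins; the noise is constructed to grow with the mean.
rng = np.random.default_rng(1)
y_hat = rng.uniform(300, 1200, 1000)      # stand-in for fitted values
y = y_hat + rng.normal(0, 0.1 * y_hat)    # noise proportional to the mean
resid = y - y_hat                         # residuals = y - fitted values

lo = y_hat < np.median(y_hat)             # lower half of fitted values
print(resid[lo].std(), resid[~lo].std())  # visibly different spreads
```

<p>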
Normality, on the other hand, becomes less of a concern with larger data thanks to the central limit theorem. With about 3000 data points, we are far away from <em>small data<\/em>, but it might still be a good idea to check for normality.<\/p>\n\n\n\n<p>The diagnostic tools for these checks are residual and quantile-quantile (QQ) plots.<\/p>\n\n\n<div class=\"wp-block-ub-tabbed-content wp-block-ub-tabbed-content-holder wp-block-ub-tabbed-content-horizontal-holder-mobile wp-block-ub-tabbed-content-horizontal-holder-tablet\" id=\"ub-tabbed-content-63f892f6-5238-44ac-a49e-0b0ec0982b56\" style=\"\">\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-holder horizontal-tab-width-mobile horizontal-tab-width-tablet\">\n\t\t\t\t<div role=\"tablist\" class=\"wp-block-ub-tabbed-content-tabs-title wp-block-ub-tabbed-content-tabs-title-mobile-horizontal-tab wp-block-ub-tabbed-content-tabs-title-tablet-horizontal-tab\" style=\"justify-content: flex-start; \"><div role=\"tab\" id=\"ub-tabbed-content-63f892f6-5238-44ac-a49e-0b0ec0982b56-tab-0\" aria-controls=\"ub-tabbed-content-63f892f6-5238-44ac-a49e-0b0ec0982b56-panel-0\" aria-selected=\"true\" class=\"wp-block-ub-tabbed-content-tab-title-wrap active\" style=\"--ub-tabbed-title-background-color: #eeeeee; --ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #eeeeee; text-align: left; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">Python<\/div>\n\t\t\t<\/div><div role=\"tab\" id=\"ub-tabbed-content-63f892f6-5238-44ac-a49e-0b0ec0982b56-tab-1\" aria-controls=\"ub-tabbed-content-63f892f6-5238-44ac-a49e-0b0ec0982b56-panel-1\" aria-selected=\"false\" class=\"wp-block-ub-tabbed-content-tab-title-wrap\" style=\"--ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #eeeeee; text-align: left; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">R<\/div>\n\t\t\t<\/div><\/div>\n\t\t\t<\/div>\n\t\t\t<div 
class=\"wp-block-ub-tabbed-content-tabs-content\" style=\"\"><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap active\" id=\"ub-tabbed-content-63f892f6-5238-44ac-a49e-0b0ec0982b56-panel-0\" aria-labelledby=\"ub-tabbed-content-63f892f6-5238-44ac-a49e-0b0ec0982b56-tab-0\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"python\",\"mime\":\"text\/x-python\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"Python\",\"language\":\"Python\",\"maxHeight\":\"400px\",\"modeName\":\"python\"}'># See notebook for a definition of residual_plot.\nimport seaborn as sns\nfig, axes = plt.subplots(ncols=2, figsize=(4.8 * 2.1, 6.4))\nax = residual_plot(model=lm, X=X_train, y=y_train, ax=axes[0])\nsns.kdeplot(\n    x=lm.predict(X_train),\n    y=residuals(lm, X_train, y_train, kind=\"studentized\"),\n    thresh=.02,\n    fill=True,\n    ax=axes[1],\n).set(\n    xlabel=\"fitted\",\n    ylabel=\"studentized residuals\",\n    title=\"Contour Plot of Residuals\",\n)<\/pre><\/div>\n\n<\/div><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap ub-hide\" id=\"ub-tabbed-content-63f892f6-5238-44ac-a49e-0b0ec0982b56-panel-1\" aria-labelledby=\"ub-tabbed-content-63f892f6-5238-44ac-a49e-0b0ec0982b56-tab-1\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"r\",\"mime\":\"text\/x-rsrc\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"R\",\"language\":\"R\",\"maxHeight\":\"400px\",\"modeName\":\"r\"}'>autoplot(model, which = c(1, 2))  # from 
library(ggfortify)\n# density plot of residuals\nggplot(model, aes(x = .fitted, y = .resid)) + geom_point() +\n  geom_density_2d() + geom_density_2d_filled(alpha = 0.5)<\/pre><\/div>\n\n<\/div><\/div>\n\t\t<\/div>\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"850\" height=\"578\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/04\/image.png\" alt=\"\" class=\"wp-image-1910\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/04\/image.png 850w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/04\/image-300x204.png 300w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/04\/image-768x522.png 768w\" sizes=\"auto, (max-width: 850px) 100vw, 850px\" \/><figcaption class=\"wp-element-caption\">Residual plots on the training data.<\/figcaption><\/figure>\n\n\n\n<p>The more data points one has, the less informative a scatter plot becomes. Therefore, we put a contour plot on the right.<\/p>\n\n\n\n<p>Visual insights:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>There seems to be a larger variability for larger fitted values. This is a hint that the homoscedasticity assumption might be violated.<\/li>\n\n\n\n<li>The residuals seem to be centered around 0. 
This is a hint that the model is well calibrated (adequate).<\/li>\n<\/ul>\n\n\n<div class=\"wp-block-ub-tabbed-content wp-block-ub-tabbed-content-holder wp-block-ub-tabbed-content-horizontal-holder-mobile wp-block-ub-tabbed-content-horizontal-holder-tablet\" id=\"ub-tabbed-content-3b3e3997-19b3-4929-9efa-27a64d777eae\" style=\"\">\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-holder horizontal-tab-width-mobile horizontal-tab-width-tablet\">\n\t\t\t\t<div role=\"tablist\" class=\"wp-block-ub-tabbed-content-tabs-title wp-block-ub-tabbed-content-tabs-title-mobile-horizontal-tab wp-block-ub-tabbed-content-tabs-title-tablet-horizontal-tab\" style=\"justify-content: flex-start; \"><div role=\"tab\" id=\"ub-tabbed-content-3b3e3997-19b3-4929-9efa-27a64d777eae-tab-0\" aria-controls=\"ub-tabbed-content-3b3e3997-19b3-4929-9efa-27a64d777eae-panel-0\" aria-selected=\"true\" class=\"wp-block-ub-tabbed-content-tab-title-wrap active\" style=\"--ub-tabbed-title-background-color: #eeeeee; --ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #eeeeee; text-align: left; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">Python<\/div>\n\t\t\t<\/div><div role=\"tab\" id=\"ub-tabbed-content-3b3e3997-19b3-4929-9efa-27a64d777eae-tab-1\" aria-controls=\"ub-tabbed-content-3b3e3997-19b3-4929-9efa-27a64d777eae-panel-1\" aria-selected=\"false\" class=\"wp-block-ub-tabbed-content-tab-title-wrap\" style=\"--ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #eeeeee; text-align: left; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">R<\/div>\n\t\t\t<\/div><\/div>\n\t\t\t<\/div>\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tabs-content\" style=\"\"><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap active\" id=\"ub-tabbed-content-3b3e3997-19b3-4929-9efa-27a64d777eae-panel-0\" 
aria-labelledby=\"ub-tabbed-content-3b3e3997-19b3-4929-9efa-27a64d777eae-tab-0\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"python\",\"mime\":\"text\/x-python\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"Python\",\"language\":\"Python\",\"maxHeight\":\"400px\",\"modeName\":\"python\"}'># See notebook for a definition of qq_plot.\nqq_plot(lm, X_train, y_train)<\/pre><\/div>\n\n<\/div><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap ub-hide\" id=\"ub-tabbed-content-3b3e3997-19b3-4929-9efa-27a64d777eae-panel-1\" aria-labelledby=\"ub-tabbed-content-3b3e3997-19b3-4929-9efa-27a64d777eae-tab-1\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"r\",\"mime\":\"text\/x-rsrc\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"R\",\"language\":\"R\",\"maxHeight\":\"400px\",\"modeName\":\"r\"}'>autoplot(model, which = 2)<\/pre><\/div>\n\n<\/div><\/div>\n\t\t<\/div>\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"565\" height=\"455\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/04\/image-1.png\" alt=\"\" class=\"wp-image-1911\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/04\/image-1.png 565w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/04\/image-1-300x242.png 300w\" sizes=\"auto, (max-width: 565px) 100vw, 565px\" \/><\/figure>\n\n\n\n<p>The QQ-plot shows the quantiles of the theoretically assumed distribution of the residuals on the x-axis and the 
ordered values of the residuals on the y-axis. In the Python version, we decided to use the studentized residuals because normality of the error implies a Student t distribution for these residuals.<\/p>\n\n\n\n<p>Concluding remarks:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>We might do similar plots on the test sample, but we don&#8217;t necessarily need a test sample to answer the inference questions.<\/li>\n\n\n\n<li>It is good practice to plot the residuals vs. each of the features as well.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Diagnostics for Prediction<\/h2>\n\n\n\n<p>If we are only interested in predictions of the mean rent, <code><code><span class=\"katex-eq\" data-katex-display=\"false\">\\hat{E}[Y]<\/span><\/code><\/code>, we don&#8217;t care much about the probability distribution of <code><code><span class=\"katex-eq\" data-katex-display=\"false\">Y<\/span><\/code><\/code>. We just want to know if the predictions are close enough to the real mean of the rent <code><code><span class=\"katex-eq\" data-katex-display=\"false\">E[Y]<\/span><\/code><\/code>. By a similar argument as for the error term and residuals, we have to accept that <code><code><span class=\"katex-eq\" data-katex-display=\"false\">E[Y]<\/span><\/code><\/code> is not observable (it is the quantity that we want to predict). 
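<\/p>

<p>A minimal sketch of such a check is the average residual on held-out data; the arrays here are hypothetical stand-ins for the test set, and the <code>compute_bias<\/code> call below reports the same quantity per set:<\/p>

```python
import numpy as np

# Toy version of the unconditional calibration check: the average of
# prediction minus observation on made-up "test" data.
rng = np.random.default_rng(2)
y_obs = rng.normal(600.0, 150.0, 617)
y_pred = y_obs + rng.normal(2.0, 50.0, 617)  # predictions with a tiny bias

bias = np.mean(y_pred - y_obs)
stderr = np.std(y_pred - y_obs, ddof=1) / np.sqrt(len(y_obs))
print(f"mean bias: {bias:.1f} +/- {stderr:.1f}")
```

<p>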
So we have to fall back to the observations of <code><code><span class=\"katex-eq\" data-katex-display=\"false\">Y<\/span><\/code><\/code> in order to judge whether our model is well calibrated, i.e., close to the ideal <code><code><span class=\"katex-eq\" data-katex-display=\"false\">E[Y]<\/span><\/code><\/code>.<\/p>\n\n\n\n<p>Very importantly, here we make use of the test sample in all of our diagnostics because <strong>we fear the in-sample bias<\/strong>.<\/p>\n\n\n\n<p>We start simply with a look at the unconditional calibration, that is, the average (negative) residual <code><code><span class=\"katex-eq\" data-katex-display=\"false\">\\frac{1}{n}\\sum(\\hat{E}[Y_i]-Y_i)<\/span><\/code><\/code>.<\/p>\n\n\n<div class=\"wp-block-ub-tabbed-content wp-block-ub-tabbed-content-holder wp-block-ub-tabbed-content-horizontal-holder-mobile wp-block-ub-tabbed-content-horizontal-holder-tablet\" id=\"ub-tabbed-content-2bb67cbf-cc26-4591-9af0-9c9cdce957c5\" style=\"\">\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-holder horizontal-tab-width-mobile horizontal-tab-width-tablet\">\n\t\t\t\t<div role=\"tablist\" class=\"wp-block-ub-tabbed-content-tabs-title wp-block-ub-tabbed-content-tabs-title-mobile-horizontal-tab wp-block-ub-tabbed-content-tabs-title-tablet-horizontal-tab\" style=\"justify-content: flex-start; \"><div role=\"tab\" id=\"ub-tabbed-content-2bb67cbf-cc26-4591-9af0-9c9cdce957c5-tab-0\" aria-controls=\"ub-tabbed-content-2bb67cbf-cc26-4591-9af0-9c9cdce957c5-panel-0\" aria-selected=\"true\" class=\"wp-block-ub-tabbed-content-tab-title-wrap active\" style=\"--ub-tabbed-title-background-color: #eeeeee; --ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #eeeeee; text-align: left; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">Python<\/div>\n\t\t\t<\/div><div role=\"tab\" id=\"ub-tabbed-content-2bb67cbf-cc26-4591-9af0-9c9cdce957c5-tab-1\" 
aria-controls=\"ub-tabbed-content-2bb67cbf-cc26-4591-9af0-9c9cdce957c5-panel-1\" aria-selected=\"false\" class=\"wp-block-ub-tabbed-content-tab-title-wrap\" style=\"--ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #eeeeee; text-align: left; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">R<\/div>\n\t\t\t<\/div><\/div>\n\t\t\t<\/div>\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tabs-content\" style=\"\"><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap active\" id=\"ub-tabbed-content-2bb67cbf-cc26-4591-9af0-9c9cdce957c5-panel-0\" aria-labelledby=\"ub-tabbed-content-2bb67cbf-cc26-4591-9af0-9c9cdce957c5-tab-0\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"python\",\"mime\":\"text\/x-python\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"Python\",\"language\":\"Python\",\"maxHeight\":\"400px\",\"modeName\":\"python\"}'>compute_bias(\n    y_obs=np.concatenate([y_train, y_test]),\n    y_pred=lm.predict(pd.concat([X_train, X_test])),\n    feature=np.array([\"train\"] * X_train.shape[0] + [\"test\"] * X_test.shape[0]),\n)<\/pre><\/div>\n\n<\/div><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap ub-hide\" id=\"ub-tabbed-content-2bb67cbf-cc26-4591-9af0-9c9cdce957c5-panel-1\" aria-labelledby=\"ub-tabbed-content-2bb67cbf-cc26-4591-9af0-9c9cdce957c5-tab-1\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" 
data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"r\",\"mime\":\"text\/x-rsrc\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"R\",\"language\":\"R\",\"maxHeight\":\"400px\",\"modeName\":\"r\"}'>print(paste(\"Train set mean residual:\", mean(resid(model))))\nprint(paste(\"Test set mean residual: \", mean(df_test$rent - predict(model, df_test))))<\/pre><\/div>\n\n<\/div><\/div>\n\t\t<\/div>\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>set<\/th><th>mean bias<\/th><th>count<\/th><th>stderr<\/th><th>p-value<\/th><\/tr><\/thead><tbody><tr><td>train<\/td><td>-3.2e-12<\/td><td>2465<\/td><td>2.8<\/td><td>1.0<\/td><\/tr><tr><td>test<\/td><td>2.1<\/td><td>617<\/td><td>5.8<\/td><td>0.72<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>It is no surprise that <code>bias_mean<\/code> in the train set is almost zero.<br>This is the <em>balance property<\/em> of (generalized) linear models (with intercept term). On the test set, however, we see a small average bias of about 2 EUR per apartment, which is far from statistically significant (p-value 0.72).<\/p>\n\n\n\n<p>Next, we have a look at reliability diagrams, which contain much more information about the calibration and bias of a model than the unconditional calibration above. In fact, they assess auto-calibration, i.e. 
how well the model uses its own information.<br>An ideal model would lie on the dotted diagonal line.<\/p>\n\n\n<div class=\"wp-block-ub-tabbed-content wp-block-ub-tabbed-content-holder wp-block-ub-tabbed-content-horizontal-holder-mobile wp-block-ub-tabbed-content-horizontal-holder-tablet\" id=\"ub-tabbed-content-fe967c4a-18bc-4376-9063-0e33a0aed6bb\" style=\"\">\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-holder horizontal-tab-width-mobile horizontal-tab-width-tablet\">\n\t\t\t\t<div role=\"tablist\" class=\"wp-block-ub-tabbed-content-tabs-title wp-block-ub-tabbed-content-tabs-title-mobile-horizontal-tab wp-block-ub-tabbed-content-tabs-title-tablet-horizontal-tab\" style=\"justify-content: flex-start; \"><div role=\"tab\" id=\"ub-tabbed-content-fe967c4a-18bc-4376-9063-0e33a0aed6bb-tab-0\" aria-controls=\"ub-tabbed-content-fe967c4a-18bc-4376-9063-0e33a0aed6bb-panel-0\" aria-selected=\"true\" class=\"wp-block-ub-tabbed-content-tab-title-wrap active\" style=\"--ub-tabbed-title-background-color: #eeeeee; --ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #eeeeee; text-align: left; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">Python<\/div>\n\t\t\t<\/div><div role=\"tab\" id=\"ub-tabbed-content-fe967c4a-18bc-4376-9063-0e33a0aed6bb-tab-1\" aria-controls=\"ub-tabbed-content-fe967c4a-18bc-4376-9063-0e33a0aed6bb-panel-1\" aria-selected=\"false\" class=\"wp-block-ub-tabbed-content-tab-title-wrap\" style=\"--ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #eeeeee; text-align: left; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">R<\/div>\n\t\t\t<\/div><\/div>\n\t\t\t<\/div>\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tabs-content\" style=\"\"><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap active\" id=\"ub-tabbed-content-fe967c4a-18bc-4376-9063-0e33a0aed6bb-panel-0\" 
aria-labelledby=\"ub-tabbed-content-fe967c4a-18bc-4376-9063-0e33a0aed6bb-tab-0\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"python\",\"mime\":\"text\/x-python\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"Python\",\"language\":\"Python\",\"maxHeight\":\"400px\",\"modeName\":\"python\"}'>fig, axes = plt.subplots(ncols=2, figsize=(4.8 * 2.1, 6.4))\nplot_reliability_diagram(y_obs=y_train, y_pred=lm.predict(X_train), n_bootstrap=100, ax=axes[0])\naxes[0].set_title(axes[0].get_title() + f\" train set (n={X_train.shape[0]})\")\nplot_reliability_diagram(y_obs=y_test, y_pred=lm.predict(X_test), n_bootstrap=100, ax=axes[1])\naxes[1].set_title(axes[1].get_title() + f\" test set (n={X_test.shape[0]})\")<\/pre><\/div>\n\n<\/div><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap ub-hide\" id=\"ub-tabbed-content-fe967c4a-18bc-4376-9063-0e33a0aed6bb-panel-1\" aria-labelledby=\"ub-tabbed-content-fe967c4a-18bc-4376-9063-0e33a0aed6bb-tab-1\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"r\",\"mime\":\"text\/x-rsrc\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"R\",\"language\":\"R\",\"maxHeight\":\"400px\",\"modeName\":\"r\"}'>iso_train = isoreg(x = model$fitted.values, y = df_train$rent)\niso_test = isoreg(x = predict(model, df_test), y = df_test$rent)\nbind_rows(\n  tibble(set = \"train\", x = iso_train$x[iso_train$ord], y = iso_train$yf),\n  tibble(set = \"test\", x = iso_test$x[iso_test$ord], y = iso_test$yf),\n) |&gt;\n  
ggplot(aes(x=x, y=y, color=set)) + geom_line() +\n  geom_abline(intercept = 0, slope = 1, linetype=\"dashed\") +\n  ggtitle(\"Reliability Diagram\")<\/pre><\/div>\n\n<\/div><\/div>\n\t\t<\/div>\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"866\" height=\"578\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/04\/image-2.png\" alt=\"\" class=\"wp-image-1915\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/04\/image-2.png 866w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/04\/image-2-300x200.png 300w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/04\/image-2-768x513.png 768w\" sizes=\"auto, (max-width: 866px) 100vw, 866px\" \/><\/figure>\n\n\n\n<p>Visual insights:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The graphs on the train and test sets look very similar.<br>The larger uncertainty intervals on the test set stem from the fact that it has a smaller sample size.<\/li>\n\n\n\n<li>The model seems to lie close to the diagonal, indicating good auto-calibration over most of the range.<\/li>\n\n\n\n<li>Very high predicted values seem to be systematically too low, i.e. the graph is above the diagonal.<\/li>\n<\/ul>\n\n\n\n<p>Finally, we assess conditional calibration, i.e. the calibration with respect to the features. To this end, we plot one of our favorite graphs for each feature. 
It consists of:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>average observed value of <code><code><span class=\"katex-eq\" data-katex-display=\"false\">Y<\/span><\/code><\/code> for each (binned) value of the feature<\/li>\n\n\n\n<li>average predicted value<\/li>\n\n\n\n<li>partial dependence<\/li>\n\n\n\n<li>histogram of the feature (grey, right y-axis)<\/li>\n<\/ul>\n\n\n<div class=\"wp-block-ub-tabbed-content wp-block-ub-tabbed-content-holder wp-block-ub-tabbed-content-horizontal-holder-mobile wp-block-ub-tabbed-content-horizontal-holder-tablet\" id=\"ub-tabbed-content-1bb37af4-106c-4c46-9aef-ca5cb222373f\" style=\"\">\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-holder horizontal-tab-width-mobile horizontal-tab-width-tablet\">\n\t\t\t\t<div role=\"tablist\" class=\"wp-block-ub-tabbed-content-tabs-title wp-block-ub-tabbed-content-tabs-title-mobile-horizontal-tab wp-block-ub-tabbed-content-tabs-title-tablet-horizontal-tab\" style=\"justify-content: flex-start; \"><div role=\"tab\" id=\"ub-tabbed-content-1bb37af4-106c-4c46-9aef-ca5cb222373f-tab-0\" aria-controls=\"ub-tabbed-content-1bb37af4-106c-4c46-9aef-ca5cb222373f-panel-0\" aria-selected=\"true\" class=\"wp-block-ub-tabbed-content-tab-title-wrap active\" style=\"--ub-tabbed-title-background-color: #eeeeee; --ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #eeeeee; text-align: left; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">Python<\/div>\n\t\t\t<\/div><div role=\"tab\" id=\"ub-tabbed-content-1bb37af4-106c-4c46-9aef-ca5cb222373f-tab-1\" aria-controls=\"ub-tabbed-content-1bb37af4-106c-4c46-9aef-ca5cb222373f-panel-1\" aria-selected=\"false\" class=\"wp-block-ub-tabbed-content-tab-title-wrap\" style=\"--ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #eeeeee; text-align: left; \" tabindex=\"-1\">\n\t\t\t\t<div 
class=\"wp-block-ub-tabbed-content-tab-title\">R<\/div>\n\t\t\t<\/div><\/div>\n\t\t\t<\/div>\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tabs-content\" style=\"\"><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap active\" id=\"ub-tabbed-content-1bb37af4-106c-4c46-9aef-ca5cb222373f-panel-0\" aria-labelledby=\"ub-tabbed-content-1bb37af4-106c-4c46-9aef-ca5cb222373f-tab-0\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"python\",\"mime\":\"text\/x-python\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"Python\",\"language\":\"Python\",\"maxHeight\":\"400px\",\"modeName\":\"python\"}'>fig, axes = plt.subplots(nrows=5, ncols=2, figsize=(12, 5*4), sharey=True)\nfor i, col in enumerate([\"area\", \"yearc\", \"bath\", \"kitchen\", \"cheating\"]):\n    plot_marginal(\n        y_obs=y_train,\n        y_pred=lm.predict(X_train),\n        X=X_train,\n        feature_name=col,\n        predict_function=lm.predict,\n        ax=axes[i][0],\n    )\n    plot_marginal(\n        y_obs=y_test,\n        y_pred=lm.predict(X_test),\n        X=X_test,\n        feature_name=col,\n        predict_function=lm.predict,\n        ax=axes[i][1],\n    )\n    axes[i][0].set_title(\"Train\")\n    axes[i][1].set_title(\"Test\")\n    if i != 0:\n        axes[i][0].get_legend().remove()\n    axes[i][1].get_legend().remove()\nfig.tight_layout()<\/pre><\/div>\n\n<\/div><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap ub-hide\" id=\"ub-tabbed-content-1bb37af4-106c-4c46-9aef-ca5cb222373f-panel-1\" aria-labelledby=\"ub-tabbed-content-1bb37af4-106c-4c46-9aef-ca5cb222373f-tab-1\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" 
data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"r\",\"mime\":\"text\/x-rsrc\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"R\",\"language\":\"R\",\"maxHeight\":\"400px\",\"modeName\":\"r\"}'>xvars = c(\"area\", \"yearc\", \"bath\", \"kitchen\", \"cheating\")\nm_train = feature_effects(model, v = xvars, data = df_train, y = df_train$rent)\nm_test = feature_effects(model, v = xvars, data = df_test, y = df_test$rent)\n\nc(m_train, m_test) |&gt; \n  plot(\n    share_y = \"rows\",\n    ncol = 2,\n    byrow = FALSE,\n    stats = c(\"y_mean\", \"pred_mean\", \"pd\"),\n    subplot_titles = FALSE,\n    # plotly = TRUE,\n    title = \"Left: Train - Right: Test\",\n  )<\/pre><\/div>\n\n<\/div><\/div>\n\t\t<\/div>\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"612\" height=\"1024\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/04\/image-3-612x1024.png\" alt=\"\" class=\"wp-image-1918\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/04\/image-3-612x1024.png 612w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/04\/image-3-179x300.png 179w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/04\/image-3-768x1285.png 768w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/04\/image-3-918x1536.png 918w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/04\/image-3.png 1189w\" sizes=\"auto, (max-width: 612px) 100vw, 612px\" \/><\/figure>\n\n\n\n<p>Visual insights:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>On the train set, the categorical features seem to have perfect calibration as average observed equals average predicted. This is again a result of the balance property. On the test set, we see a deviation, especially for the categorical level with smaller sample size. 
This is a good demonstration of why plotting on both the train and test sets is a good idea.<\/li>\n\n\n\n<li>The numerical features area and year of construction seem fine, but a closer look can&#8217;t hurt.<\/li>\n<\/ul>\n\n\n\n<p>We next create a bias plot, which shows the average difference of predicted minus observed values per feature value. The values should be around zero, so we can zoom in on the y-axis.<br>This is very similar to a residual plot, but the information is better condensed for its purpose.<\/p>\n\n\n<div class=\"wp-block-ub-tabbed-content wp-block-ub-tabbed-content-holder wp-block-ub-tabbed-content-horizontal-holder-mobile wp-block-ub-tabbed-content-horizontal-holder-tablet\" id=\"ub-tabbed-content-b1a71625-3af8-43f0-aaa1-ae92c6fdaeb8\" style=\"\">\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-holder horizontal-tab-width-mobile horizontal-tab-width-tablet\">\n\t\t\t\t<div role=\"tablist\" class=\"wp-block-ub-tabbed-content-tabs-title wp-block-ub-tabbed-content-tabs-title-mobile-horizontal-tab wp-block-ub-tabbed-content-tabs-title-tablet-horizontal-tab\" style=\"justify-content: flex-start; \"><div role=\"tab\" id=\"ub-tabbed-content-b1a71625-3af8-43f0-aaa1-ae92c6fdaeb8-tab-0\" aria-controls=\"ub-tabbed-content-b1a71625-3af8-43f0-aaa1-ae92c6fdaeb8-panel-0\" aria-selected=\"true\" class=\"wp-block-ub-tabbed-content-tab-title-wrap active\" style=\"--ub-tabbed-title-background-color: #eeeeee; --ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #eeeeee; text-align: left; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">Python<\/div>\n\t\t\t<\/div><div role=\"tab\" id=\"ub-tabbed-content-b1a71625-3af8-43f0-aaa1-ae92c6fdaeb8-tab-1\" aria-controls=\"ub-tabbed-content-b1a71625-3af8-43f0-aaa1-ae92c6fdaeb8-panel-1\" aria-selected=\"false\" class=\"wp-block-ub-tabbed-content-tab-title-wrap\" style=\"--ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #eeeeee; 
text-align: left; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">R<\/div>\n\t\t\t<\/div><\/div>\n\t\t\t<\/div>\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tabs-content\" style=\"\"><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap active\" id=\"ub-tabbed-content-b1a71625-3af8-43f0-aaa1-ae92c6fdaeb8-panel-0\" aria-labelledby=\"ub-tabbed-content-b1a71625-3af8-43f0-aaa1-ae92c6fdaeb8-tab-0\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"python\",\"mime\":\"text\/x-python\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"Python\",\"language\":\"Python\",\"maxHeight\":\"400px\",\"modeName\":\"python\"}'>fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(12, 2*4), sharey=True)\naxes[0,0].set_ylim(-150, 150)\nfor i, col in enumerate([\"area\", \"yearc\"]):\n    plot_bias(\n        y_obs=y_train,\n        y_pred=lm.predict(X_train),\n        feature=X_train[col],\n        ax=axes[i][0],\n    )\n    plot_bias(\n        y_obs=y_test,\n        y_pred=lm.predict(X_test),\n        feature=X_test[col],\n        ax=axes[i][1],\n    )\n    axes[i][0].set_title(\"Train\")\n    axes[i][1].set_title(\"Test\")\nfig.tight_layout()<\/pre><\/div>\n\n<\/div><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap ub-hide\" id=\"ub-tabbed-content-b1a71625-3af8-43f0-aaa1-ae92c6fdaeb8-panel-1\" aria-labelledby=\"ub-tabbed-content-b1a71625-3af8-43f0-aaa1-ae92c6fdaeb8-tab-1\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" 
data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"r\",\"mime\":\"text\/x-rsrc\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"R\",\"language\":\"R\",\"maxHeight\":\"400px\",\"modeName\":\"r\"}'>c(m_train[c(\"area\", \"yearc\")], m_test[c(\"area\", \"yearc\")]) |&gt; \n  plot(\n    ylim = c(-150, 150),\n    ncol = 2,\n    byrow = FALSE,\n    stats = \"resid_mean\",\n    subplot_titles = FALSE,\n    title = \"Left: Train - Right: Test\",\n    # plotly = TRUE,\n    interval = \"ci\"\n  )<\/pre><\/div>\n\n<\/div><\/div>\n\t\t<\/div>\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"682\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/04\/image-4-1024x682.png\" alt=\"\" class=\"wp-image-1925\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/04\/image-4-1024x682.png 1024w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/04\/image-4-300x200.png 300w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/04\/image-4-768x511.png 768w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2025\/04\/image-4.png 1187w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Visual insights:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>For large values of <code>area<\/code> and for <code>yearc<\/code> in the 1940s and 1950s, there are only a few observations available. Still, the model might be improved for those regions.<\/li>\n\n\n\n<li>The bias of <code>yearc<\/code> shows a parabolic curve. The simple linear effect in our model seems too simplistic. 
A refined model could use splines instead, as for <code>area<\/code>.<\/li>\n<\/ul>\n\n\n\n<p>Concluding remarks:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The predictions for <code>area<\/code> larger than around 120 square meters and for year of construction around the Second World War are less reliable.<\/li>\n\n\n\n<li>For all other regions, the bias is smaller than 50 EUR on average.<br>This gives a rough estimate of the prediction uncertainty.<br>It should be enough to prevent improperly high (or low) rents (on average).<\/li>\n<\/ul>\n\n\n\n<p>The full Python and R code is available at:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Python: <a href=\"https:\/\/github.com\/lorentzenchr\/notebooks\/blob\/master\/blogposts\/2025-05-01%20diagnostics.ipynb\">https:\/\/github.com\/lorentzenchr\/notebooks\/blob\/master\/blogposts\/2025-05-01%20diagnostics.ipynb<\/a><\/li>\n\n\n\n<li>R: <a href=\"https:\/\/github.com\/lorentzenchr\/notebooks\/blob\/master\/blogposts\/2025-05-01%20diagnostics.R\">https:\/\/github.com\/lorentzenchr\/notebooks\/blob\/master\/blogposts\/2025-05-01%20diagnostics.R<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>In this post, we show how different use cases require different model diagnostics. In short, we compare (statistical) inference and prediction. As an example, we use a simple linear model for the Munich rent index dataset, which was kindly provided by the authors of Regression &#8211; Models, Methods and Applications 2nd ed. (2021). 
This dataset [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16,17,9],"tags":[10,6,5],"class_list":["post-1905","post","type-post","status-publish","format-standard","hentry","category-machine-learning","category-programming","category-statistics","tag-lost-in-translation","tag-python","tag-r"],"featured_image_src":null,"author_info":{"display_name":"Christian Lorentzen","author_link":"https:\/\/lorentzen.ch\/index.php\/author\/christian\/"},"_links":{"self":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/1905","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/comments?post=1905"}],"version-history":[{"count":13,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/1905\/revisions"}],"predecessor-version":[{"id":1926,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/1905\/revisions\/1926"}],"wp:attachment":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/media?parent=1905"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/categories?post=1905"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/tags?post=1905"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}