{"id":1292,"date":"2024-06-10T13:14:24","date_gmt":"2024-06-10T11:14:24","guid":{"rendered":"https:\/\/lorentzen.ch\/?p=1292"},"modified":"2024-06-10T14:06:50","modified_gmt":"2024-06-10T12:06:50","slug":"a-tweedie-trilogy-part-ii-offsets","status":"publish","type":"post","link":"https:\/\/lorentzen.ch\/index.php\/2024\/06\/10\/a-tweedie-trilogy-part-ii-offsets\/","title":{"rendered":"A Tweedie Trilogy \u2014 Part II: Offsets"},"content":{"rendered":"\n<p><strong>TLDR:<\/strong> This second part of the trilogy will have a deeper look at offsets and sample weights of a GLM. Their non-equivalence stems from the mean-variance relationship. This time, we not only have a Poisson frequency but also a Gamma severity model.<\/p>\n\n\n\n<p>This trilogy celebrates the 40th birthday of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Tweedie_distribution\">Tweedie distributions<\/a> in 2024 and highlights some of their very special properties.<\/p>\n\n\n\n<p>See <a href=\"https:\/\/lorentzen.ch\/index.php\/2024\/06\/03\/a-tweedie-trilogy-part-i-frequency-and-aggregration-invariance\/\" data-type=\"link\" data-id=\"https:\/\/lorentzen.ch\/index.php\/2024\/06\/03\/a-tweedie-trilogy-part-i-frequency-and-aggregration-invariance\/\">part I<\/a>.<\/p>\n\n\n<div class=\"wp-block-ub-table-of-contents-block ub_table-of-contents\" id=\"ub_table-of-contents-c35b717c-d201-4871-925b-8dbb8473c034\" data-linktodivider=\"false\" data-showtext=\"show\" data-hidetext=\"hide\" data-scrolltype=\"auto\" data-enablesmoothscroll=\"false\" data-initiallyhideonmobile=\"false\" data-initiallyshow=\"true\"><div class=\"ub_table-of-contents-header-container\" style=\"\">\n\t\t\t<div class=\"ub_table-of-contents-header\" style=\"text-align: left; \">\n\t\t\t\t<div class=\"ub_table-of-contents-title\">Table of Contents<\/div>\n\t\t\t\t\n\t\t\t<\/div>\n\t\t<\/div><div class=\"ub_table-of-contents-extra-container\" style=\"\">\n\t\t\t<div class=\"ub_table-of-contents-container ub_table-of-contents-1-column \">\n\t\t\t\t<ul style=\"\"><li style=\"\"><a href=\"https:\/\/lorentzen.ch\/index.php\/2024\/06\/10\/a-tweedie-trilogy-part-ii-offsets\/#0-from-mean-variance-relation-to-score-equations\" style=\"\">From Mean-Variance Relation to Score Equations<\/a><\/li><li style=\"\"><a href=\"https:\/\/lorentzen.ch\/index.php\/2024\/06\/10\/a-tweedie-trilogy-part-ii-offsets\/#1-offsets-and-sample-weights\" style=\"\">Offsets and Sample Weights<\/a><ul><li style=\"\"><a href=\"https:\/\/lorentzen.ch\/index.php\/2024\/06\/10\/a-tweedie-trilogy-part-ii-offsets\/#2-poisson-glm\" style=\"\">Poisson GLM<\/a><\/li><li style=\"\"><a href=\"https:\/\/lorentzen.ch\/index.php\/2024\/06\/10\/a-tweedie-trilogy-part-ii-offsets\/#3-tweedie-glm\" style=\"\">Tweedie GLM<\/a><\/li><li style=\"\"><a href=\"https:\/\/lorentzen.ch\/index.php\/2024\/06\/10\/a-tweedie-trilogy-part-ii-offsets\/#4-example\" style=\"\">Example<\/a><\/li><\/ul><\/li><li style=\"\"><a href=\"https:\/\/lorentzen.ch\/index.php\/2024\/06\/10\/a-tweedie-trilogy-part-ii-offsets\/#5-outlook\" style=\"\">Outlook<\/a><\/li><\/ul>\n\t\t\t<\/div>\n\t\t<\/div><\/div>\n\n\n<h2 class=\"wp-block-heading\" id=\"0-from-mean-variance-relation-to-score-equations\">From Mean-Variance Relation to Score Equations<\/h2>\n\n\n\n<p>In the part I, we have already introduced the mean-variance relation of a Tweedie random variable <code><span class=\"katex-eq\" data-katex-display=\"false\">Y\\sim Tw_p(\\mu, \\phi)<\/span><\/code> with Tweedie power <code><span class=\"katex-eq\" 
This variance function directly impacts the estimation of GLMs. Assume the task is to estimate the expectation of a random variable $Y_i \sim Tw_p(\mu_i, \phi/w_i)$, given observations of the target $y_i$ and of explanatory variables, aka features, $x_i \in \mathbb{R}^k$. A GLM then assumes a link function $g(\mu_i) = \sum_{j=1}^k x_{ij}\beta_j$ with coefficients $\beta$ to be estimated via an optimization procedure, of which the first order condition, also called score equation, reads

$$\sum_i w_i \frac{y_i - \mu_i}{v(\mu_i)\, g'(\mu_i)} x_{ij} = 0 \quad \forall j = 1, \ldots, k.$$

This shows that the higher the Tweedie power $p$, entering via $v(\mu)$ only, the less weight is given to deviations of large values. In other words, higher Tweedie powers result in GLMs that are less and less sensitive to what happens at large (expected) values.

This is also reflected in the deviance loss functions, which can be derived from the negative log-likelihood and are given by

$$d_p(y, \mu) = 2 \cdot \begin{cases}
\frac{\max(0, y^{2-p})}{(1-p)(2-p)} - \frac{y\mu^{1-p}}{1-p} + \frac{\mu^{2-p}}{2-p} & p \in \mathbb{R} \setminus \left((0, 1] \cup \{2\}\right) \\
y \log\frac{y}{\mu} - y + \mu & p = 1 \\
\frac{y}{\mu} - \log\frac{y}{\mu} - 1 & p = 2
\end{cases}$$

These are the only strictly consistent scoring functions for the expectation (up to one multiplicative and one additive constant) that are homogeneous functions (of degree $2 - p$), see, e.g., [Fissler et al (2022)](https://arxiv.org/abs/2202.12780). The Poisson deviance ($p = 1$), for example, has a degree of homogeneity of 1 and the same unit as the target variable. The Gamma deviance ($p = 2$), on the other hand, is zero-homogeneous and completely agnostic to the scale of its arguments. This is another way of stating the above: the higher the Tweedie power, the less it cares about large values.
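As a quick check of the homogeneity claim, here is a minimal sketch (not from the original post) that implements the unit deviance above and verifies $d_p(cy, c\mu) = c^{2-p}\, d_p(y, \mu)$ numerically:

```python
import numpy as np

def tweedie_deviance(y, mu, p):
    """Unit deviance d_p(y, mu); powers p in (0, 1) are not allowed."""
    if p == 1:  # Poisson
        return 2 * (y * np.log(y / mu) - y + mu)
    if p == 2:  # Gamma
        return 2 * (y / mu - np.log(y / mu) - 1)
    return 2 * (
        np.maximum(0, y ** (2 - p)) / ((1 - p) * (2 - p))
        - y * mu ** (1 - p) / (1 - p)
        + mu ** (2 - p) / (2 - p)
    )

y, mu, c = 3.0, 2.0, 10.0
for p in [0, 1, 1.5, 2, 3]:
    lhs = tweedie_deviance(c * y, c * mu, p)
    rhs = c ** (2 - p) * tweedie_deviance(y, mu, p)
    print(f"p={p}: homogeneous of degree {2 - p}: {np.isclose(lhs, rhs)}")
# prints True for every p
```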
It is also connected to the fact that Tweedie distributions are the only distributions from the exponential dispersion family that are closed under scale transformations:

$$\begin{align*}
Y &\sim Tw_p(\mu, \phi) \\
cY &\sim Tw_p(c\mu, c^{2-p}\phi) \quad \forall c > 0
\end{align*}$$

Consistency of the second moment is easy to check: $\operatorname{Var}[cY] = c^2 \phi \mu^p = (c^{2-p}\phi)(c\mu)^p$.

## Offsets and Sample Weights

### Poisson GLM

When estimating counts with a Poisson GLM, there is often an exposure measure like *time under consideration* or an underlying *number of things* (insurance policies, trees in a forest, radioactive atoms). One then often finds two different, but equivalent, formulations of a Poisson GLM with log-link.

- Sample weights: Model the frequency $y = \frac{N}{w}$ and fit with sample weights $w$ to estimate $\operatorname{E}[y] = \mu_y = \exp(x\beta)$.
- Offsets: Model the counts $N$, but account for the exposure $w$ via an offset as $\operatorname{E}[N] = \mu_N = \exp(x\beta + \log(w)) = w\mu_y$.

Note that each way models a different target, so we had to use subscripts to distinguish the mean parameters $\mu$.

In this special case of a Poisson GLM with (canonical) log-link, both models are equivalent and will result in the exact same parameters $\beta$. You can plug them into the score equation to convince yourself, as spelled out below.
### Tweedie GLM

> Very importantly, this simple equivalence of GLM formulations with offsets and with sample weights holds **only** for the **Poisson** GLM with log-link. **It does not hold for any other** Tweedie power or even other distributions from the [exponential dispersion family](https://en.wikipedia.org/wiki/Exponential_dispersion_model).

One can show that a Tweedie GLM with log-link and offset (additive in link space) $\log(u)$ on target $y$ with weights $w$ is equivalent to the same Tweedie GLM but with target $\frac{y}{u}$ and weights $w u^{2-p}$.

So one can construct an equivalence between unweighted with offsets and weighted without offsets by setting $u = \sqrt[2-p]{w}$. But note that this does not work for a Gamma GLM, which has $p = 2$: there, $u^{2-p} = u^0 = 1$, so the offset leaves the weights untouched and the $(2-p)$-th root is undefined. A small numerical check of this equivalence follows below.
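Here is a minimal synthetic check of this claim (not from the original post; the data and coefficients are made up for illustration), assuming glum's `TweedieDistribution` family with $p = 1.5$:

```python
import numpy as np
from glum import GeneralizedLinearRegressor, TweedieDistribution

rng = np.random.default_rng(7)
n, p = 10_000, 1.5
X = rng.normal(size=(n, 3))
u = rng.uniform(0.5, 2.0, size=n)  # offset variable, e.g. an exposure
# Positive target with mean exp(X @ beta) * u; the exact distribution
# does not matter for the equivalence of the score equations.
y = np.exp(X @ np.array([0.1, -0.2, 0.3])) * u * rng.gamma(2.0, 0.5, size=n)

# Fit 1: offset log(u), no sample weights
glm_offset = GeneralizedLinearRegressor(
    family=TweedieDistribution(power=p), alpha=0
).fit(X, y, offset=np.log(u))

# Fit 2: target y / u, sample weights u^(2 - p), no offset
glm_weighted = GeneralizedLinearRegressor(
    family=TweedieDistribution(power=p), alpha=0
).fit(X, y / u, sample_weight=u ** (2 - p))

# Coefficients agree up to solver tolerance
print(np.max(np.abs(glm_offset.coef_ - glm_weighted.coef_)))
```

Setting `power=2` would make `u ** (2 - p)` identically 1, which is exactly why no such construction exists for the Gamma GLM.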
### Example

We continue with the same dataset and model as in part I and show this (non-)equivalence with the offsets.

```python
import numpy as np
import pandas as pd
from glum import GeneralizedLinearRegressor

# ... quite some code ... here we abbreviate.
# Model frequency with weights (but without offsets).
y_freq = df["ClaimNb"] / df["Exposure"]
w_freq = df["Exposure"]
X = df[x_vars]
glm_params = {
    "alpha": 0,
    "drop_first": True,
    "gradient_tol": 1e-8,
}
glm_freq = GeneralizedLinearRegressor(
    family="poisson", **glm_params
).fit(X, y_freq, sample_weight=w_freq)

# Model counts N = w * freq with offsets (but without weights).
N = w_freq * y_freq
glm_offset_freq = GeneralizedLinearRegressor(
    family="poisson", **glm_params
).fit(X, N, offset=np.log(w_freq))

print(
    f"intercept freq{'':<8}= {glm_freq.intercept_}\n"
    f"intercept freq offset = {glm_offset_freq.intercept_}"
)
# intercept freq        = -3.756437676421677
# intercept freq offset = -3.7564376764216725

np.max(np.abs(glm_freq.coef_ - glm_offset_freq.coef_)) < 1e-13
# True
```

Next, we model the severity $Y = \frac{\text{loss}}{N}$ with claim counts $N$ as weights. As is standard, we use a Gamma GLM with log-link (which, this time, is not the canonical link).

```python
# Model severity with weights (but without offsets).
y_sev = df["ClaimAmount"] / df["ClaimNb"]
w_sev = df["ClaimNb"].fillna(0)
X = df[x_vars]
# Filter out rows with zero claim count (w_sev == 0).
w_gt_0 = w_sev > 0
y_sev = y_sev[w_gt_0]
X_sev = X[w_gt_0]
w_sev = w_sev[w_gt_0]

glm_sev = GeneralizedLinearRegressor(
    family="gamma", **glm_params
).fit(X_sev, y_sev, sample_weight=w_sev)

# Note that the target is claim amount = w * sev.
claim_amount = w_sev * y_sev
glm_offset_sev = GeneralizedLinearRegressor(
    family="gamma", **glm_params
).fit(X_sev, claim_amount, offset=np.log(w_sev))

print(
    f"intercept sev{'':<8}= {glm_sev.intercept_}\n"
    f"intercept sev offset = {glm_offset_sev.intercept_}"
)
# intercept sev        = 7.287909799461992
# intercept sev offset = 7.236827150674156

np.max(np.abs(glm_sev.coef_ - glm_offset_sev.coef_))
# 0.2119162919285421
```

The deviations might seem small, but they are there and add up:

```python
print(
    "Total predicted claim amounts with weights "
    f"{np.sum(w_sev * glm_sev.predict(X_sev)):_.2f}"
)
print(
    "Total predicted claim amounts offset       "
    f"{np.sum(glm_offset_sev.predict(X_sev, offset=np.log(w_sev))):_.2f}"
)
# Total predicted claim amounts with weights 49_309_687.30
# Total predicted claim amounts offset       48_769_342.47
```

Here, it becomes evident that the two models are quite different.

## Outlook

The full notebook can be found [here](https://github.com/lorentzenchr/notebooks/blob/master/blogposts/2024-06-03%20frequency_freMTPL2.ipynb).

The final part III of the Tweedie trilogy will follow in one week and go into the details of the probability density function.