{"id":51,"date":"2021-01-07T17:30:00","date_gmt":"2021-01-07T16:30:00","guid":{"rendered":"https:\/\/lorentzen.ch\/?p=51"},"modified":"2021-02-19T17:50:12","modified_gmt":"2021-02-19T16:50:12","slug":"illustrating-the-central-limit-theorem","status":"publish","type":"post","link":"https:\/\/lorentzen.ch\/index.php\/2021\/01\/07\/illustrating-the-central-limit-theorem\/","title":{"rendered":"Illustrating The Central Limit Theorem"},"content":{"rendered":"\n<h1 class=\"wp-block-heading\">Lost in Translation between R and Python 1<\/h1>\n\n\n\n<p>This is the first article in our series <strong>&#8220;Lost in Translation between R and Python&#8221;<\/strong>. The aim of this series is to provide high-quality R <strong>and<\/strong> Python 3 code to achieve some non-trivial tasks. If you want to learn R, check out the R tab below. Similarly, if you want to learn Python, the Python tab will be your friend.<\/p>\n\n\n\n<p>Let&#8217;s start with a little bit of statistics &#8211; it won&#8217;t be the last time, friends: illustrating the <strong>Central Limit Theorem<\/strong> (CLT).<\/p>\n\n\n\n<p>Take a sample of a random variable <em>X<\/em> with finite variance. The CLT says: No matter how &#8220;unnormally&#8221; distributed <em>X<\/em> is, its <em>sample mean<\/em> will be approximately normally distributed, at least if the sample size is not too small. This classic result is the basis for constructing simple confidence intervals and hypothesis tests for the (true) mean of <em>X<\/em>; see <a href=\"https:\/\/en.wikipedia.org\/wiki\/Central_limit_theorem\">Wikipedia<\/a> for a lot of additional information.<\/p>\n\n\n\n<p>The code below illustrates this famous statistical result by simulation, using a very asymmetrically distributed <em>X<\/em>, namely <em>X = 1<\/em> with probability 0.2 and <em>X = 0<\/em> otherwise. <em>X<\/em> could represent the result of asking a randomly picked person whether he smokes. 
In such a poll, the mean of the collected <em>sample<\/em> of answers is a statistical estimate of the proportion of people who smoke.<\/p>\n\n\n\n<p>Curiously, with a tiny modification, the same code will also illustrate another key result in statistics &#8211; the <strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Law_of_large_numbers\">Law of Large Numbers<\/a><\/strong>: As the sample size grows, the distribution of the sample mean of <em>X<\/em> contracts around the expectation <em>E(X)<\/em>.<\/p>\n\n\n<div class=\"wp-block-ub-tabbed-content wp-block-ub-tabbed-content-holder wp-block-ub-tabbed-content-horizontal-holder-mobile wp-block-ub-tabbed-content-horizontal-holder-tablet\" id=\"ub-tabbed-content-35b00f9b-11e0-4376-aad5-87b49e7fccd7\" style=\"\">\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-holder horizontal-tab-width-mobile horizontal-tab-width-tablet\">\n\t\t\t\t<div role=\"tablist\" class=\"wp-block-ub-tabbed-content-tabs-title wp-block-ub-tabbed-content-tabs-title-mobile-horizontal-tab wp-block-ub-tabbed-content-tabs-title-tablet-horizontal-tab\" style=\"justify-content: flex-start; \"><div role=\"tab\" id=\"ub-tabbed-content-35b00f9b-11e0-4376-aad5-87b49e7fccd7-tab-0\" aria-controls=\"ub-tabbed-content-35b00f9b-11e0-4376-aad5-87b49e7fccd7-panel-0\" aria-selected=\"true\" class=\"wp-block-ub-tabbed-content-tab-title-wrap active\" style=\"--ub-tabbed-title-background-color: #6d6d6d; --ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">R<\/div>\n\t\t\t<\/div><div role=\"tab\" id=\"ub-tabbed-content-35b00f9b-11e0-4376-aad5-87b49e7fccd7-tab-1\" aria-controls=\"ub-tabbed-content-35b00f9b-11e0-4376-aad5-87b49e7fccd7-panel-1\" aria-selected=\"false\" class=\"wp-block-ub-tabbed-content-tab-title-wrap\" style=\"--ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; 
text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">Python<\/div>\n\t\t\t<\/div><\/div>\n\t\t\t<\/div>\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tabs-content\" style=\"\"><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap active\" id=\"ub-tabbed-content-35b00f9b-11e0-4376-aad5-87b49e7fccd7-panel-0\" aria-labelledby=\"ub-tabbed-content-35b00f9b-11e0-4376-aad5-87b49e7fccd7-tab-0\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"r\",\"mime\":\"text\/x-rsrc\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"R\",\"maxHeight\":\"400px\",\"modeName\":\"r\"}'># Fix seed, set constants\nset.seed(2006)\nsample_sizes &lt;- c(1, 10, 30, 1000)\nnsims &lt;- 10000\n\n# Helper function: Mean of one sample of X\none_mean &lt;- function(n, p = c(0.8, 0.2)) {\n  mean(sample(0:1, n, replace = TRUE, prob = p))\n}\n# one_mean(10)\n\n# Simulate and plot\npar(mfrow = c(2, 2), mai = rep(0.4, 4))\n\nfor (n in sample_sizes) {\n  means &lt;- replicate(nsims, one_mean(n))\n  hist(means, breaks = \"FD\", \n       # xlim = 0:1, # uncomment for LLN\n       main = sprintf(\"n=%i\", n))\n}<\/pre><\/div>\n\n<\/div><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap ub-hide\" id=\"ub-tabbed-content-35b00f9b-11e0-4376-aad5-87b49e7fccd7-panel-1\" aria-labelledby=\"ub-tabbed-content-35b00f9b-11e0-4376-aad5-87b49e7fccd7-tab-1\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" 
data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"python\",\"mime\":\"text\/x-python\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"Python\",\"maxHeight\":\"400px\",\"modeName\":\"python\"}'>import numpy as np\nimport matplotlib.pyplot as plt\n\n# Notebook magic (Jupyter or IPython only); remove this line in a plain script\n%matplotlib inline\n\n# Fix seed, set constants\nnp.random.seed(100)\nsample_sizes = [1, 10, 30, 1000]\nnsims = 10_000\n\n# Helper function: Mean of one sample of X (n draws, each 1 with probability p)\ndef one_mean(n, p=0.2):\n    return np.random.binomial(1, p, n).mean()\n\n# Simulate and plot\nfig, axes = plt.subplots(2, 2, figsize=(8, 8))\n\nfor i, n in enumerate(sample_sizes):\n    means = [one_mean(n) for _ in range(nsims)]\n    ax = axes[i \/\/ 2, i % 2]\n    ax.hist(means, 50)\n    ax.set_title(f'$n = {n}$')\n    ax.set_xlabel('mean')\n    # ax.set_xlim(0, 1)  # uncomment for LLN\nfig.tight_layout()<\/pre><\/div>\n\n<\/div><\/div>\n\t\t<\/div>\n\n\n<p><strong>Result: The Central Limit Theorem<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"765\" height=\"545\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2021\/01\/image-2.png\" alt=\"\" class=\"wp-image-105\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2021\/01\/image-2.png 765w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2021\/01\/image-2-300x214.png 300w\" sizes=\"auto, (max-width: 765px) 100vw, 765px\" \/><figcaption>The larger the samples, the more closely the histogram of the simulated means resembles a symmetric bell-shaped curve (R output for illustration).<\/figcaption><\/figure>\n\n\n\n<p><strong>Result: The Law of Large Numbers<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"765\" height=\"545\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2021\/01\/image-3.png\" alt=\"\" 
class=\"wp-image-106\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2021\/01\/image-3.png 765w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2021\/01\/image-3-300x214.png 300w\" sizes=\"auto, (max-width: 765px) 100vw, 765px\" \/><figcaption>Fixing the x-scale illustrates &#8211; for free(!) &#8211; the Law of Large Numbers: The distribution of the mean contracts more and more around the expectation 0.2 (R output for illustration).<\/figcaption><\/figure>\n\n\n\n<p>See also the Python notebook <a href=\"https:\/\/github.com\/lorentzenchr\/notebooks\/blob\/master\/blogposts\/2021-01-07%20Illustrating%20The%20Central%20Limit%20Theorem.ipynb\" data-type=\"URL\" data-id=\"https:\/\/github.com\/lorentzenchr\/notebooks\/blob\/master\/blogposts\/2021-01-07%20Illustrating%20The%20Central%20Limit%20Theorem.ipynb\">https:\/\/github.com\/lorentzenchr\/notebooks\/blob\/master\/blogposts\/2021-01-07 Illustrating The Central Limit Theorem.ipynb<\/a> and, for many great posts on R, <a href=\"http:\/\/www.R-bloggers.com\">http:\/\/www.R-bloggers.com<\/a>.<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is the first article in our series &#8220;Lost in Translation between R and Python&#8221;. 
We start it by illustrating the famous Central Limit Theorem.<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[10,6,5],"class_list":["post-51","post","type-post","status-publish","format-standard","hentry","category-statistics","tag-lost-in-translation","tag-python","tag-r"],"featured_image_src":null,"author_info":{"display_name":"Michael Mayer","author_link":"https:\/\/lorentzen.ch\/index.php\/author\/michael\/"},"_links":{"self":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/51","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/comments?post=51"}],"version-history":[{"count":60,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/51\/revisions"}],"predecessor-version":[{"id":234,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/51\/revisions\/234"}],"wp:attachment":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/media?parent=51"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/categories?post=51"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/tags?post=51"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}