{"id":711,"date":"2022-02-19T18:57:11","date_gmt":"2022-02-19T17:57:11","guid":{"rendered":"https:\/\/lorentzen.ch\/?p=711"},"modified":"2022-02-19T18:57:12","modified_gmt":"2022-02-19T17:57:12","slug":"avoid-loops-in-r-really","status":"publish","type":"post","link":"https:\/\/lorentzen.ch\/index.php\/2022\/02\/19\/avoid-loops-in-r-really\/","title":{"rendered":"Avoid loops in R! Really?"},"content":{"rendered":"\n<p>It must have been around the year 2000 when I wrote my first snippet of S-PLUS\/R code. One thing I learned back then:<\/p>\n\n\n\n<p><strong>Loops are slow. Replace them with<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\"><li><strong>vectorized calculations or<\/strong><\/li><li><strong>if vectorization is not possible, use sapply() et al.<\/strong><\/li><\/ol>\n\n\n\n<p>Since then, the R core team and the community have invested tons of time to improve R and to make it faster. There are also tools like Rcpp and parallel computing to speed up loops.<\/p>\n\n\n\n<p>But what relatively few R users know is that <strong>loops are not that slow anymore<\/strong>. We want to demonstrate this using two examples.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Example 1: sqrt()<\/h2>\n\n\n\n<p>We use three ways to calculate the square root of a vector of random numbers:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>Vectorized calculation. This should be the way to go because it is implemented in optimized C code internally.<\/li><li>A loop. This is supposed to be super slow for large vectors.<\/li><li>vapply() (as a type-safe alternative to sapply()).<\/li><\/ol>\n\n\n\n<p>The three approaches are compared via bench::mark() regarding their speed for different vector lengths <em>n<\/em>. The results are compared first in terms of absolute median times and then (using an independent run) on a relative scale, where 1 corresponds to the vectorized approach. 
<\/p>\n\n\n<div class=\"wp-block-ub-tabbed-content wp-block-ub-tabbed-content-holder wp-block-ub-tabbed-content-horizontal-holder-mobile wp-block-ub-tabbed-content-horizontal-holder-tablet\" id=\"ub-tabbed-content-b8fcaf6a-8ad9-4b6a-9e3d-69b5189937c4\" style=\"\">\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-holder horizontal-tab-width-mobile horizontal-tab-width-tablet\">\n\t\t\t\t<div role=\"tablist\" class=\"wp-block-ub-tabbed-content-tabs-title wp-block-ub-tabbed-content-tabs-title-mobile-horizontal-tab wp-block-ub-tabbed-content-tabs-title-tablet-horizontal-tab\" style=\"justify-content: flex-start; \"><div role=\"tab\" id=\"ub-tabbed-content-b8fcaf6a-8ad9-4b6a-9e3d-69b5189937c4-tab-0\" aria-controls=\"ub-tabbed-content-b8fcaf6a-8ad9-4b6a-9e3d-69b5189937c4-panel-0\" aria-selected=\"true\" class=\"wp-block-ub-tabbed-content-tab-title-wrap active\" style=\"--ub-tabbed-title-background-color: #6d6d6d; --ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">R<\/div>\n\t\t\t<\/div><\/div>\n\t\t\t<\/div>\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tabs-content\" style=\"\"><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap active\" id=\"ub-tabbed-content-b8fcaf6a-8ad9-4b6a-9e3d-69b5189937c4-panel-0\" aria-labelledby=\"ub-tabbed-content-b8fcaf6a-8ad9-4b6a-9e3d-69b5189937c4-tab-0\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"r\",\"mime\":\"text\/x-rsrc\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"R\",\"maxHeight\":\"400px\",\"modeName\":\"r\"}'>library(tidyverse)\nlibrary(bench)\n\n# Calculate square root for each 
element in loop\nsqrt_loop &lt;- function(x) {\n  out &lt;- numeric(length(x))\n  for (i in seq_along(x)) {\n    out[i] &lt;- sqrt(x[i])\n  }\n  out\n}\n\n# Example\nsqrt_loop(1:4) # 1.000000 1.414214 1.732051 2.000000\n\n# Compare its performance with two alternatives\nsqrt_benchmark &lt;- function(n) {\n  x &lt;- rexp(n)\n  mark(\n    vectorized = sqrt(x),\n    loop = sqrt_loop(x),\n    vapply = vapply(x, sqrt, FUN.VALUE = 0.0)\n    # , relative = TRUE  # uncomment for relative timings\n  )\n}\n\n# Combine results of multiple benchmarks and plot results\nmultiple_benchmarks &lt;- function(one_bench, N) {\n  res &lt;- vector(\"list\", length(N))\n  for (i in seq_along(N)) {\n    res[[i]] &lt;- one_bench(N[i]) %&gt;% \n      mutate(n = N[i], expression = names(expression))\n  }\n  \n  ggplot(bind_rows(res), aes(n, median, color = expression)) +\n    geom_point(size = 3) +\n    geom_line(size = 1) +\n    scale_x_log10() +\n    ggtitle(deparse1(substitute(one_bench))) +\n    theme(legend.position = c(0.8, 0.15))\n}\n\n# Run benchmarks for vector lengths between 10^3 and 10^6\nmultiple_benchmarks(sqrt_benchmark, N = 10^seq(3, 6, 0.25))<\/pre><\/div>\n\n<\/div><\/div>\n\t\t<\/div>\n\n\n<h3 class=\"wp-block-heading\">Absolute timings<\/h3>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"633\" height=\"544\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/02\/Rplot02-1.jpeg\" alt=\"\" class=\"wp-image-716\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/02\/Rplot02-1.jpeg 633w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/02\/Rplot02-1-300x258.jpeg 300w\" sizes=\"auto, (max-width: 633px) 100vw, 633px\" \/><figcaption>Absolute median times on the &#8220;sqrt()&#8221; task<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Relative timings (using a second run)<\/h3>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"633\" height=\"544\" 
src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/02\/Rplot05.jpeg\" alt=\"\" class=\"wp-image-721\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/02\/Rplot05.jpeg 633w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/02\/Rplot05-300x258.jpeg 300w\" sizes=\"auto, (max-width: 633px) 100vw, 633px\" \/><figcaption>Relative median times of a separate run on the &#8220;sqrt()&#8221; task<\/figcaption><\/figure>\n\n\n\n<p>We see:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Run times increase roughly linearly with the vector length.<\/li><li>Vectorization is more than ten times faster than the naive loop.<\/li><li>Most strikingly, vapply() is much slower than the naive loop. Would you have expected this?<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Example 2: paste()<\/h2>\n\n\n\n<p>For the second example, we use a slightly more complex function, namely<\/p>\n\n\n\n<p><code>paste(\"Number\", prettyNum(x, digits = 5))<\/code><\/p>\n\n\n\n<p>How do our three approaches (vectorized, naive loop, vapply()) perform on this task?<\/p>\n\n\n<div class=\"wp-block-ub-tabbed-content wp-block-ub-tabbed-content-holder wp-block-ub-tabbed-content-horizontal-holder-mobile wp-block-ub-tabbed-content-horizontal-holder-tablet\" id=\"ub-tabbed-content-63e87f71-a483-4453-b176-e28eddb0fb77\" style=\"\">\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-holder horizontal-tab-width-mobile horizontal-tab-width-tablet\">\n\t\t\t\t<div role=\"tablist\" class=\"wp-block-ub-tabbed-content-tabs-title wp-block-ub-tabbed-content-tabs-title-mobile-horizontal-tab wp-block-ub-tabbed-content-tabs-title-tablet-horizontal-tab\" style=\"justify-content: flex-start; \"><div role=\"tab\" id=\"ub-tabbed-content-63e87f71-a483-4453-b176-e28eddb0fb77-tab-0\" aria-controls=\"ub-tabbed-content-63e87f71-a483-4453-b176-e28eddb0fb77-panel-0\" aria-selected=\"true\" class=\"wp-block-ub-tabbed-content-tab-title-wrap active\" style=\"--ub-tabbed-title-background-color: #6d6d6d; --ub-tabbed-active-title-color: inherit; 
--ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">R<\/div>\n\t\t\t<\/div><\/div>\n\t\t\t<\/div>\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tabs-content\" style=\"\"><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap active\" id=\"ub-tabbed-content-63e87f71-a483-4453-b176-e28eddb0fb77-panel-0\" aria-labelledby=\"ub-tabbed-content-63e87f71-a483-4453-b176-e28eddb0fb77-tab-0\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"r\",\"mime\":\"text\/x-rsrc\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"R\",\"maxHeight\":\"400px\",\"modeName\":\"r\"}'>pretty_paste &lt;- function(x) {\n  paste(\"Number\", prettyNum(x, digits = 5))\n}\n\n# Example\npretty_paste(pi) # \"Number 3.1416\"\n\n# Again, call pretty_paste() for each element in a loop\npaste_loop &lt;- function(x) {\n  out &lt;- character(length(x))\n  for (i in seq_along(x)) {\n    out[i] &lt;- pretty_paste(x[i])\n  }\n  out\n}\n\n# Compare its performance with two alternatives\npaste_benchmark &lt;- function(n) {\n  x &lt;- rexp(n)\n  mark(\n    vectorized = pretty_paste(x),\n    loop = paste_loop(x),\n    vapply = vapply(x, pretty_paste, FUN.VALUE = \"\")\n    # , relative = TRUE  # uncomment for relative timings\n  )\n}\n\n# Run benchmarks for vector lengths between 10^3 and 10^5\nmultiple_benchmarks(paste_benchmark, N = 10^seq(3, 5, 0.25))<\/pre><\/div>\n\n<\/div><\/div>\n\t\t<\/div>\n\n\n<h3 class=\"wp-block-heading\">Absolute timings<\/h3>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"633\" height=\"544\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/02\/Rplot03.jpeg\" alt=\"\" class=\"wp-image-717\" 
srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/02\/Rplot03.jpeg 633w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/02\/Rplot03-300x258.jpeg 300w\" sizes=\"auto, (max-width: 633px) 100vw, 633px\" \/><figcaption>Absolute median times on the &#8220;paste()&#8221; task<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Relative timings (using a second run)<\/h3>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"633\" height=\"544\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/02\/Rplot04.jpeg\" alt=\"\" class=\"wp-image-718\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/02\/Rplot04.jpeg 633w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/02\/Rplot04-300x258.jpeg 300w\" sizes=\"auto, (max-width: 633px) 100vw, 633px\" \/><figcaption>Relative median times of a separate run on the &#8220;paste()&#8221; task<\/figcaption><\/figure>\n\n\n\n<ul class=\"wp-block-list\"><li>In contrast to the first example, vapply() is now as fast as the naive loop.<\/li><li>The advantage of the vectorized approach is much less impressive: the median time of the loop is only 50% longer.<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<ol class=\"wp-block-list\"><li>Vectorization is fast and easy to read. If it is available, use it. No surprise.<\/li><li>If you use vapply\/sapply\/lapply, do it for style, not for speed. In some cases, a loop will even be faster. And, depending on the situation and the audience, a loop might actually be easier to read.<\/li><\/ol>\n\n\n\n<p>The code can be found on <a href=\"https:\/\/github.com\/lorentzenchr\/notebooks\/blob\/master\/blogposts\/2022-02-19%20loops_in_R.R\">GitHub<\/a>. 
<\/p>\n\n\n\n<p>The runs were made on a Windows 11 system with a four-core Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>It must have been around the year 2000 when I wrote my first snippet of S-PLUS\/R code. One thing I learned back then: Loops are slow. Replace them with vectorized calculations or if vectorization is not possible, use sapply() et al. Since then, the R core team and the community have invested tons of time [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[17],"tags":[5],"class_list":["post-711","post","type-post","status-publish","format-standard","hentry","category-programming","tag-r"],"featured_image_src":null,"author_info":{"display_name":"Michael Mayer","author_link":"https:\/\/lorentzen.ch\/index.php\/author\/michael\/"},"_links":{"self":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/711","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/comments?post=711"}],"version-history":[{"count":11,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/711\/revisions"}],"predecessor-version":[{"id":742,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/711\/revisions\/742"}],"wp:attachment":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/media?parent=711"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/categories?post=711"},{"taxonomy":"post_tag","embedd
able":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/tags?post=711"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}