{"id":1046,"date":"2023-01-09T19:47:06","date_gmt":"2023-01-09T18:47:06","guid":{"rendered":"https:\/\/lorentzen.ch\/?p=1046"},"modified":"2023-01-09T19:47:07","modified_gmt":"2023-01-09T18:47:07","slug":"dplyr-style-without-dplyr","status":"publish","type":"post","link":"https:\/\/lorentzen.ch\/index.php\/2023\/01\/09\/dplyr-style-without-dplyr\/","title":{"rendered":"Dplyr-style without dplyr"},"content":{"rendered":"\n<p>One of the reasons why we love the &#8220;dplyr&#8221; package is that it plays so well together with the forward pipe operator <code>%>%<\/code> from the &#8220;magrittr&#8221; package. Actually, it is not a coincidence that both packages were released at about the same time, in 2014.<\/p>\n\n\n\n<p>What does the pipe do? It puts the object on its left as the first argument into the function on its right: <code>iris %>% head()<\/code> is a funny way of writing <code>head(iris)<\/code>. It helps to avoid long function chains like <code>f(g(h(x)))<\/code>, or repeated assignments.<\/p>\n\n\n\n<p>In 2021, with version 4.1, R received its native forward pipe operator <code>|><\/code> so that we can write nice code like this:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/01\/Base-R-and-piping.png\" alt=\"\" class=\"wp-image-1053\" width=\"475\" height=\"169\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/01\/Base-R-and-piping.png 643w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/01\/Base-R-and-piping-300x107.png 300w\" sizes=\"auto, (max-width: 475px) 100vw, 475px\" \/><figcaption class=\"wp-element-caption\">Imagine this without the pipe&#8230;<\/figcaption><\/figure>\n\n\n\n<p>Since version 4.2, the piped object can be referenced by the underscore <code>_<\/code>, but only once per call and only as a named argument for now; see the example below.<\/p>\n\n\n\n<p>To use the native pipe via CTRL-SHIFT-M in Posit\/RStudio, tick this:<\/p>\n\n\n\n<figure 
class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"649\" height=\"429\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/01\/image.png\" alt=\"\" class=\"wp-image-1047\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/01\/image.png 649w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/01\/image-300x198.png 300w\" sizes=\"auto, (max-width: 649px) 100vw, 649px\" \/><\/figure>\n\n\n\n<p>Combined with the many great functions from the standard distribution of R, we can get a real &#8220;dplyr&#8221; feeling without even loading dplyr. Don&#8217;t get me wrong: I am a huge fan of the whole Tidyverse! But this is a great way to learn &#8220;Standard R&#8221;.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Data chains<\/h3>\n\n\n\n<p>Here is a small selection of standard functions that play well with the pipe. They take a data frame and return a modified data frame:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>subset()<\/code>: Select rows and columns of data frame<\/li>\n\n\n\n<li><code>transform()<\/code>: Add or overwrite columns in data frame<\/li>\n\n\n\n<li><code>aggregate()<\/code>: Grouped calculations<\/li>\n\n\n\n<li><code>rbind()<\/code>, <code>cbind()<\/code>: Bind rows\/columns of data frame\/matrix<\/li>\n\n\n\n<li><code>merge()<\/code>: Join data frames by key<\/li>\n\n\n\n<li><code>head()<\/code>, <code>tail()<\/code>: First\/last few elements of object<\/li>\n\n\n\n<li><code>reshape()<\/code>: Transposition\/Reshaping of data frame (no, I don&#8217;t understand the interface)<\/li>\n<\/ul>\n\n\n<div class=\"wp-block-ub-tabbed-content wp-block-ub-tabbed-content-holder wp-block-ub-tabbed-content-horizontal-holder-mobile wp-block-ub-tabbed-content-horizontal-holder-tablet\" id=\"ub-tabbed-content-b50881d7-bb8b-471d-93de-8cc1a6c20b50\" style=\"\">\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-holder horizontal-tab-width-mobile horizontal-tab-width-tablet\">\n\t\t\t\t<div role=\"tablist\" 
class=\"wp-block-ub-tabbed-content-tabs-title wp-block-ub-tabbed-content-tabs-title-mobile-horizontal-tab wp-block-ub-tabbed-content-tabs-title-tablet-horizontal-tab\" style=\"justify-content: flex-start; \"><div role=\"tab\" id=\"ub-tabbed-content-b50881d7-bb8b-471d-93de-8cc1a6c20b50-tab-0\" aria-controls=\"ub-tabbed-content-b50881d7-bb8b-471d-93de-8cc1a6c20b50-panel-0\" aria-selected=\"true\" class=\"wp-block-ub-tabbed-content-tab-title-wrap active\" style=\"--ub-tabbed-title-background-color: #6d6d6d; --ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">R<\/div>\n\t\t\t<\/div><\/div>\n\t\t\t<\/div>\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tabs-content\" style=\"\"><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap active\" id=\"ub-tabbed-content-b50881d7-bb8b-471d-93de-8cc1a6c20b50-panel-0\" aria-labelledby=\"ub-tabbed-content-b50881d7-bb8b-471d-93de-8cc1a6c20b50-tab-0\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"r\",\"mime\":\"text\/x-rsrc\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"R\",\"maxHeight\":\"400px\",\"modeName\":\"r\"}'>library(ggplot2)  # Need diamonds\n\n# What does the native pipe do?\nquote(diamonds |&gt; head())\n\n# OUTPUT\n# head(diamonds)\n\n# Grouped statistics\ndiamonds |&gt; \n  aggregate(cbind(price, carat) ~ color, FUN = mean)\n\n# OUTPUT\n#   color    price     carat\n# 1     D 3169.954 0.6577948\n# 2     E 3076.752 0.6578667\n# 3     F 3724.886 0.7365385\n# 4     G 3999.136 0.7711902\n# 5     H 4486.669 0.9117991\n# 6     I 5091.875 1.0269273\n# 7     J 5323.818 1.1621368\n\n# Join 
back grouped stats to relevant columns\ndiamonds |&gt; \n  subset(select = c(price, color, carat)) |&gt; \n  transform(price_per_color = ave(price, color)) |&gt; \n  head()\n\n# OUTPUT\n#   price color carat price_per_color\n# 1   326     E  0.23        3076.752\n# 2   326     E  0.21        3076.752\n# 3   327     E  0.23        3076.752\n# 4   334     I  0.29        5091.875\n# 5   335     J  0.31        5323.818\n# 6   336     J  0.24        5323.818\n\n# Plot transformed values\ndiamonds |&gt; \n  transform(\n    log_price = log(price),\n    log_carat = log(carat)\n  ) |&gt; \n  plot(log_price ~ log_carat, col = \"chartreuse4\", pch = \".\", data = _)<\/pre><\/div>\n\n<\/div><\/div>\n\t\t<\/div>\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/01\/scatter.jpeg\" alt=\"\" class=\"wp-image-1048\" width=\"641\" height=\"469\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/01\/scatter.jpeg 389w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/01\/scatter-300x220.jpeg 300w\" sizes=\"auto, (max-width: 641px) 100vw, 641px\" \/><figcaption class=\"wp-element-caption\">A simple scatterplot<\/figcaption><\/figure>\n\n\n\n<p>The plot does not look quite as sexy as &#8220;ggplot2&#8221;, but it&#8217;s a start.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Other chains<\/h3>\n\n\n\n<p>The pipe not only works perfectly with functions that modify a data frame. It also shines with many other functions often applied in a nested way. 
Here are two examples:<\/p>\n\n\n<div class=\"wp-block-ub-tabbed-content wp-block-ub-tabbed-content-holder wp-block-ub-tabbed-content-horizontal-holder-mobile wp-block-ub-tabbed-content-horizontal-holder-tablet\" id=\"ub-tabbed-content-95b34342-87ec-49db-aad1-66f56a592ab5\" style=\"\">\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-holder horizontal-tab-width-mobile horizontal-tab-width-tablet\">\n\t\t\t\t<div role=\"tablist\" class=\"wp-block-ub-tabbed-content-tabs-title wp-block-ub-tabbed-content-tabs-title-mobile-horizontal-tab wp-block-ub-tabbed-content-tabs-title-tablet-horizontal-tab\" style=\"justify-content: flex-start; \"><div role=\"tab\" id=\"ub-tabbed-content-95b34342-87ec-49db-aad1-66f56a592ab5-tab-0\" aria-controls=\"ub-tabbed-content-95b34342-87ec-49db-aad1-66f56a592ab5-panel-0\" aria-selected=\"true\" class=\"wp-block-ub-tabbed-content-tab-title-wrap active\" style=\"--ub-tabbed-title-background-color: #6d6d6d; --ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">R<\/div>\n\t\t\t<\/div><\/div>\n\t\t\t<\/div>\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tabs-content\" style=\"\"><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap active\" id=\"ub-tabbed-content-95b34342-87ec-49db-aad1-66f56a592ab5-panel-0\" aria-labelledby=\"ub-tabbed-content-95b34342-87ec-49db-aad1-66f56a592ab5-tab-0\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"r\",\"mime\":\"text\/x-rsrc\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"R\",\"maxHeight\":\"400px\",\"modeName\":\"r\"}'># Distribution of color within clarity\ndiamonds |&gt; 
\n  subset(select = c(color, clarity)) |&gt; \n  table() |&gt; \n  prop.table(margin = 2) |&gt; \n  addmargins(margin = 1) |&gt; \n  round(3)\n\n# OUTPUT\n# clarity\n# color      I1   SI2   SI1   VS2   VS1  VVS2  VVS1    IF\n#     D   0.057 0.149 0.159 0.138 0.086 0.109 0.069 0.041\n#     E   0.138 0.186 0.186 0.202 0.157 0.196 0.179 0.088\n#     F   0.193 0.175 0.163 0.180 0.167 0.192 0.201 0.215\n#     G   0.202 0.168 0.151 0.191 0.263 0.285 0.273 0.380\n#     H   0.219 0.170 0.174 0.134 0.143 0.120 0.160 0.167\n#     I   0.124 0.099 0.109 0.095 0.118 0.072 0.097 0.080\n#     J   0.067 0.052 0.057 0.060 0.066 0.026 0.020 0.028\n#     Sum 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000\n\n# Barplot from discrete column\ndiamonds$color |&gt; \n  table() |&gt; \n  prop.table() |&gt; \n  barplot(col = \"chartreuse4\", main = \"Color\")<\/pre><\/div>\n\n<\/div><\/div>\n\t\t<\/div>\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/01\/barplot-2.jpeg\" alt=\"\" class=\"wp-image-1052\" width=\"612\" height=\"449\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/01\/barplot-2.jpeg 389w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2023\/01\/barplot-2-300x220.jpeg 300w\" sizes=\"auto, (max-width: 612px) 100vw, 612px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Wrap up<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Piping is fun with and without dplyr.<\/li>\n\n\n\n<li>It is a great motivation to learn standard R.<\/li>\n<\/ul>\n\n\n\n<p>The complete R script can be found <a href=\"https:\/\/github.com\/lorentzenchr\/notebooks\/blob\/master\/blogposts\/2023-01-09%20dplyr_without_dplyr.R\">here<\/a>.<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>How to get &#8220;dplyr&#8221; feeling without 
&#8220;dplyr&#8221;<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[17,9],"tags":[5],"class_list":["post-1046","post","type-post","status-publish","format-standard","hentry","category-programming","category-statistics","tag-r"],"featured_image_src":null,"author_info":{"display_name":"Michael Mayer","author_link":"https:\/\/lorentzen.ch\/index.php\/author\/michael\/"},"_links":{"self":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/1046","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/comments?post=1046"}],"version-history":[{"count":2,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/1046\/revisions"}],"predecessor-version":[{"id":1054,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/1046\/revisions\/1054"}],"wp:attachment":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/media?parent=1046"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/categories?post=1046"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/tags?post=1046"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}