{"id":745,"date":"2022-04-02T16:00:18","date_gmt":"2022-04-02T14:00:18","guid":{"rendered":"https:\/\/lorentzen.ch\/?p=745"},"modified":"2022-04-02T16:00:18","modified_gmt":"2022-04-02T14:00:18","slug":"duckdb-quacking-sql","status":"publish","type":"post","link":"https:\/\/lorentzen.ch\/index.php\/2022\/04\/02\/duckdb-quacking-sql\/","title":{"rendered":"DuckDB: Quacking SQL"},"content":{"rendered":"\n<h1 class=\"wp-block-heading\">Lost in Translation between R and Python 8<\/h1>\n\n\n\n<p>This is the next article in our series <strong>&#8220;Lost in Translation between R and Python&#8221;<\/strong>. The aim of this series is to provide high-quality R <strong>and<\/strong> Python 3 code to achieve some non-trivial tasks. If you are to learn R, check out the R tab below. Similarly, if you are to learn Python, the Python tab will be your friend.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">DuckDB<\/h2>\n\n\n\n<p><a href=\"https:\/\/duckdb.org\/\">DuckDB<\/a> is a fantastic <em>in-process<\/em> SQL database management system written completely in C++. Check its <a href=\"https:\/\/duckdb.org\/docs\/\">official documentation<\/a> and other blogposts like <a href=\"https:\/\/guillaumepressiat.github.io\/\/blog\/2019\/10\/duckdb\">this<\/a> to get a feeling of its superpowers. It is getting better and better!<\/p>\n\n\n\n<p>Some of the highlights:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Easy installation in R and Python, made possible via language bindings.<\/li><li>Multiprocessing and fast.<\/li><li>Allows to work with data bigger than RAM.<\/li><li>Can fire SQL queries on R and Pandas tables.<\/li><li><strong>Can fire SQL queries on (multiple!) csv and\/or Parquet files.<\/strong><\/li><li><a href=\"https:\/\/duckdb.org\/2021\/12\/03\/duck-arrow.html\">Quacks Apache Arrow<\/a>.<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Installation<\/h2>\n\n\n\n<p>DuckDB is super easy to install:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>R: <code>install.packages(\"duckdb\")<\/code><\/li><li>Python: <code>pip install duckdb<\/code><\/li><\/ul>\n\n\n\n<p>Additional packages required to run the code of this post are indicated in the code.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">A first query<\/h2>\n\n\n\n<p>Let&#8217;s start by loading a dataset, initializing DuckDB and running a simple query.<\/p>\n\n\n\n<p>The dataset we use here contains information on over 20,000 sold houses in Kings County. Along with the sale price, different features describe the size and location of the properties. The dataset is available on <a href=\"https:\/\/www.openml.org\/d\/42092\">OpenML.org<\/a> with ID 42092.<\/p>\n\n\n<div class=\"wp-block-ub-tabbed-content wp-block-ub-tabbed-content-holder wp-block-ub-tabbed-content-horizontal-holder-mobile wp-block-ub-tabbed-content-horizontal-holder-tablet\" id=\"ub-tabbed-content-b45d071d-48f6-48d2-a95f-1ccdc8b7eed5\" style=\"\">\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-holder horizontal-tab-width-mobile horizontal-tab-width-tablet\">\n\t\t\t\t<div role=\"tablist\" class=\"wp-block-ub-tabbed-content-tabs-title wp-block-ub-tabbed-content-tabs-title-mobile-horizontal-tab wp-block-ub-tabbed-content-tabs-title-tablet-horizontal-tab\" style=\"justify-content: flex-start; \"><div role=\"tab\" id=\"ub-tabbed-content-b45d071d-48f6-48d2-a95f-1ccdc8b7eed5-tab-0\" aria-controls=\"ub-tabbed-content-b45d071d-48f6-48d2-a95f-1ccdc8b7eed5-panel-0\" aria-selected=\"true\" class=\"wp-block-ub-tabbed-content-tab-title-wrap active\" style=\"--ub-tabbed-title-background-color: #6d6d6d; --ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">R<\/div>\n\t\t\t<\/div><div role=\"tab\" id=\"ub-tabbed-content-b45d071d-48f6-48d2-a95f-1ccdc8b7eed5-tab-1\" aria-controls=\"ub-tabbed-content-b45d071d-48f6-48d2-a95f-1ccdc8b7eed5-panel-1\" aria-selected=\"false\" class=\"wp-block-ub-tabbed-content-tab-title-wrap\" style=\"--ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">Python<\/div>\n\t\t\t<\/div><\/div>\n\t\t\t<\/div>\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tabs-content\" style=\"\"><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap active\" id=\"ub-tabbed-content-b45d071d-48f6-48d2-a95f-1ccdc8b7eed5-panel-0\" aria-labelledby=\"ub-tabbed-content-b45d071d-48f6-48d2-a95f-1ccdc8b7eed5-tab-0\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"r\",\"mime\":\"text\/x-rsrc\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"R\",\"maxHeight\":\"400px\",\"modeName\":\"r\"}'>library(OpenML)\nlibrary(duckdb)\nlibrary(tidyverse)\n\n# Load data\ndf &lt;- getOMLDataSet(data.id = 42092)$data\n\n# Initialize duckdb, register df and materialize first query\ncon = dbConnect(duckdb())\nduckdb_register(con, name = \"df\", df = df)\ncon %&gt;% \n  dbSendQuery(\"SELECT * FROM df limit 5\") %&gt;% \n  dbFetch()<\/pre><\/div>\n\n<\/div><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap ub-hide\" id=\"ub-tabbed-content-b45d071d-48f6-48d2-a95f-1ccdc8b7eed5-panel-1\" aria-labelledby=\"ub-tabbed-content-b45d071d-48f6-48d2-a95f-1ccdc8b7eed5-tab-1\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"python\",\"mime\":\"text\/x-python\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"Python\",\"maxHeight\":\"400px\",\"modeName\":\"python\"}'>import duckdb\nimport pandas as pd\nfrom sklearn.datasets import fetch_openml\n\n# Load data\ndf = fetch_openml(data_id=42092, as_frame=True)[\"frame\"]\n\n# Initialize duckdb, register df and fire first query\n# If out-of-RAM: duckdb.connect(\"py.duckdb\", config={\"temp_directory\": \"a_directory\"})\ncon = duckdb.connect()\ncon.register(\"df\", df)\ncon.execute(\"SELECT * FROM df limit 5\").fetchdf()<\/pre><\/div>\n\n<\/div><\/div>\n\t\t<\/div>\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/04\/image.png\" alt=\"\" class=\"wp-image-749\" width=\"839\" height=\"183\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/04\/image.png 946w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/04\/image-300x66.png 300w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/04\/image-768x168.png 768w\" sizes=\"auto, (max-width: 839px) 100vw, 839px\" \/><figcaption>Result of first query (from R)<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Average price per grade<\/h2>\n\n\n\n<p>If you like SQL, then you can do your data preprocessing and simple analyses with DuckDB. Here, we calculate the average house price per online grade (the higher the grade, the better the house).<\/p>\n\n\n<div class=\"wp-block-ub-tabbed-content wp-block-ub-tabbed-content-holder wp-block-ub-tabbed-content-horizontal-holder-mobile wp-block-ub-tabbed-content-horizontal-holder-tablet\" id=\"ub-tabbed-content-84ff6373-bd49-436e-b8b9-bd282569f9dc\" style=\"\">\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-holder horizontal-tab-width-mobile horizontal-tab-width-tablet\">\n\t\t\t\t<div role=\"tablist\" class=\"wp-block-ub-tabbed-content-tabs-title wp-block-ub-tabbed-content-tabs-title-mobile-horizontal-tab wp-block-ub-tabbed-content-tabs-title-tablet-horizontal-tab\" style=\"justify-content: flex-start; \"><div role=\"tab\" id=\"ub-tabbed-content-84ff6373-bd49-436e-b8b9-bd282569f9dc-tab-0\" aria-controls=\"ub-tabbed-content-84ff6373-bd49-436e-b8b9-bd282569f9dc-panel-0\" aria-selected=\"true\" class=\"wp-block-ub-tabbed-content-tab-title-wrap active\" style=\"--ub-tabbed-title-background-color: #6d6d6d; --ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">R<\/div>\n\t\t\t<\/div><div role=\"tab\" id=\"ub-tabbed-content-84ff6373-bd49-436e-b8b9-bd282569f9dc-tab-1\" aria-controls=\"ub-tabbed-content-84ff6373-bd49-436e-b8b9-bd282569f9dc-panel-1\" aria-selected=\"false\" class=\"wp-block-ub-tabbed-content-tab-title-wrap\" style=\"--ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">Python<\/div>\n\t\t\t<\/div><\/div>\n\t\t\t<\/div>\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tabs-content\" style=\"\"><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap active\" id=\"ub-tabbed-content-84ff6373-bd49-436e-b8b9-bd282569f9dc-panel-0\" aria-labelledby=\"ub-tabbed-content-84ff6373-bd49-436e-b8b9-bd282569f9dc-tab-0\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"r\",\"mime\":\"text\/x-rsrc\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"R\",\"maxHeight\":\"400px\",\"modeName\":\"r\"}'>query &lt;- \n  \"\n  SELECT AVG(price) avg_price, grade \n  FROM df \n  GROUP BY grade\n  ORDER BY grade\n  \"\navg &lt;- con %&gt;% \n  dbSendQuery(query) %&gt;% \n  dbFetch()\n\navg\n<\/pre><\/div>\n\n<\/div><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap ub-hide\" id=\"ub-tabbed-content-84ff6373-bd49-436e-b8b9-bd282569f9dc-panel-1\" aria-labelledby=\"ub-tabbed-content-84ff6373-bd49-436e-b8b9-bd282569f9dc-tab-1\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"python\",\"mime\":\"text\/x-python\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"Python\",\"maxHeight\":\"400px\",\"modeName\":\"python\"}'># Average price per grade\nquery = \"\"\"\n  SELECT AVG(price) avg_price, grade \n  FROM df \n  GROUP BY grade\n  ORDER BY grade\n  \"\"\"\navg = con.execute(query).fetchdf()\navg<\/pre><\/div>\n\n<\/div><\/div>\n\t\t<\/div>\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/04\/image-1.png\" alt=\"\" class=\"wp-image-755\" width=\"121\" height=\"186\"\/><figcaption>R output<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Highlight: queries to files<\/h2>\n\n\n\n<p><strong>The last query will be applied directly to files on disk<\/strong>. To demonstrate this fantastic feature, we first save &#8220;df&#8221; as a parquet file and &#8220;avg&#8221; as a csv file. <\/p>\n\n\n<div class=\"wp-block-ub-tabbed-content wp-block-ub-tabbed-content-holder wp-block-ub-tabbed-content-horizontal-holder-mobile wp-block-ub-tabbed-content-horizontal-holder-tablet\" id=\"ub-tabbed-content-8843d5b5-c2e3-4537-a7f8-cce09f30e765\" style=\"\">\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-holder horizontal-tab-width-mobile horizontal-tab-width-tablet\">\n\t\t\t\t<div role=\"tablist\" class=\"wp-block-ub-tabbed-content-tabs-title wp-block-ub-tabbed-content-tabs-title-mobile-horizontal-tab wp-block-ub-tabbed-content-tabs-title-tablet-horizontal-tab\" style=\"justify-content: flex-start; \"><div role=\"tab\" id=\"ub-tabbed-content-8843d5b5-c2e3-4537-a7f8-cce09f30e765-tab-0\" aria-controls=\"ub-tabbed-content-8843d5b5-c2e3-4537-a7f8-cce09f30e765-panel-0\" aria-selected=\"true\" class=\"wp-block-ub-tabbed-content-tab-title-wrap active\" style=\"--ub-tabbed-title-background-color: #6d6d6d; --ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">R<\/div>\n\t\t\t<\/div><div role=\"tab\" id=\"ub-tabbed-content-8843d5b5-c2e3-4537-a7f8-cce09f30e765-tab-1\" aria-controls=\"ub-tabbed-content-8843d5b5-c2e3-4537-a7f8-cce09f30e765-panel-1\" aria-selected=\"false\" class=\"wp-block-ub-tabbed-content-tab-title-wrap\" style=\"--ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">Python<\/div>\n\t\t\t<\/div><\/div>\n\t\t\t<\/div>\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tabs-content\" style=\"\"><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap active\" id=\"ub-tabbed-content-8843d5b5-c2e3-4537-a7f8-cce09f30e765-panel-0\" aria-labelledby=\"ub-tabbed-content-8843d5b5-c2e3-4537-a7f8-cce09f30e765-tab-0\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"r\",\"mime\":\"text\/x-rsrc\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"R\",\"maxHeight\":\"400px\",\"modeName\":\"r\"}'>write_parquet(df, \"housing.parquet\")\nwrite.csv(avg, \"housing_avg.csv\", row.names = FALSE)\n<\/pre><\/div>\n\n<\/div><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap ub-hide\" id=\"ub-tabbed-content-8843d5b5-c2e3-4537-a7f8-cce09f30e765-panel-1\" aria-labelledby=\"ub-tabbed-content-8843d5b5-c2e3-4537-a7f8-cce09f30e765-tab-1\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"python\",\"mime\":\"text\/x-python\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"Python\",\"maxHeight\":\"400px\",\"modeName\":\"python\"}'># Save df and avg to different file types\ndf.to_parquet(\"housing.parquet\")  # pyarrow=7\navg.to_csv(\"housing_avg.csv\", index=False)<\/pre><\/div>\n\n<\/div><\/div>\n\t\t<\/div>\n\n\n<p>Let&#8217;s load some columns of &#8220;housing.parquet&#8221; data, but only rows with grades having an average price of one million USD. Agreed, that query does not make too much sense but I hope you get the idea&#8230;\ud83d\ude03<\/p>\n\n\n<div class=\"wp-block-ub-tabbed-content wp-block-ub-tabbed-content-holder wp-block-ub-tabbed-content-horizontal-holder-mobile wp-block-ub-tabbed-content-horizontal-holder-tablet\" id=\"ub-tabbed-content-dc6ec10d-5218-4932-aa28-0a634846c65d\" style=\"\">\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-holder horizontal-tab-width-mobile horizontal-tab-width-tablet\">\n\t\t\t\t<div role=\"tablist\" class=\"wp-block-ub-tabbed-content-tabs-title wp-block-ub-tabbed-content-tabs-title-mobile-horizontal-tab wp-block-ub-tabbed-content-tabs-title-tablet-horizontal-tab\" style=\"justify-content: flex-start; \"><div role=\"tab\" id=\"ub-tabbed-content-dc6ec10d-5218-4932-aa28-0a634846c65d-tab-0\" aria-controls=\"ub-tabbed-content-dc6ec10d-5218-4932-aa28-0a634846c65d-panel-0\" aria-selected=\"true\" class=\"wp-block-ub-tabbed-content-tab-title-wrap active\" style=\"--ub-tabbed-title-background-color: #6d6d6d; --ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">R<\/div>\n\t\t\t<\/div><div role=\"tab\" id=\"ub-tabbed-content-dc6ec10d-5218-4932-aa28-0a634846c65d-tab-1\" aria-controls=\"ub-tabbed-content-dc6ec10d-5218-4932-aa28-0a634846c65d-panel-1\" aria-selected=\"false\" class=\"wp-block-ub-tabbed-content-tab-title-wrap\" style=\"--ub-tabbed-active-title-color: inherit; --ub-tabbed-active-title-background-color: #6d6d6d; text-align: center; \" tabindex=\"-1\">\n\t\t\t\t<div class=\"wp-block-ub-tabbed-content-tab-title\">Python<\/div>\n\t\t\t<\/div><\/div>\n\t\t\t<\/div>\n\t\t\t<div class=\"wp-block-ub-tabbed-content-tabs-content\" style=\"\"><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap active\" id=\"ub-tabbed-content-dc6ec10d-5218-4932-aa28-0a634846c65d-panel-0\" aria-labelledby=\"ub-tabbed-content-dc6ec10d-5218-4932-aa28-0a634846c65d-tab-0\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"r\",\"mime\":\"text\/x-rsrc\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"R\",\"maxHeight\":\"400px\",\"modeName\":\"r\"}'># \"Complex\" query\nquery2 &lt;- \"\n  SELECT price, sqft_living, A.grade, avg_price\n  FROM 'housing.parquet' A\n  LEFT JOIN 'housing_avg.csv' B\n  ON A.grade = B.grade\n  WHERE B.avg_price &gt; 1000000\n  \"\n\nexpensive_grades &lt;- con %&gt;% \n  dbSendQuery(query2) %&gt;% \n  dbFetch()\n\nhead(expensive_grades)\n\n# dbDisconnect(con)<\/pre><\/div>\n\n<\/div><div role=\"tabpanel\" class=\"wp-block-ub-tabbed-content-tab-content-wrap ub-hide\" id=\"ub-tabbed-content-dc6ec10d-5218-4932-aa28-0a634846c65d-panel-1\" aria-labelledby=\"ub-tabbed-content-dc6ec10d-5218-4932-aa28-0a634846c65d-tab-1\" tabindex=\"0\">\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting='{\"showPanel\":true,\"languageLabel\":\"language\",\"fullScreenButton\":true,\"copyButton\":true,\"mode\":\"python\",\"mime\":\"text\/x-python\",\"theme\":\"material\",\"lineNumbers\":false,\"styleActiveLine\":false,\"lineWrapping\":false,\"readOnly\":true,\"fileName\":\"\",\"language\":\"Python\",\"maxHeight\":\"400px\",\"modeName\":\"python\"}'># Complex query\nquery2 = \"\"\"\n  SELECT price, sqft_living, A.grade, avg_price\n  FROM 'housing.parquet' A\n  LEFT JOIN 'housing_avg.csv' B\n  ON A.grade = B.grade\n  WHERE B.avg_price &gt; 1000000\n  \"\"\"\nexpensive_grades = con.execute(query2).fetchdf()\nexpensive_grades\n\n# con.close()<\/pre><\/div>\n\n<\/div><\/div>\n\t\t<\/div>\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/04\/image-2.png\" alt=\"\" class=\"wp-image-758\" width=\"371\" height=\"141\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/04\/image-2.png 315w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/04\/image-2-300x114.png 300w\" sizes=\"auto, (max-width: 371px) 100vw, 371px\" \/><figcaption>R output<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Last words<\/h2>\n\n\n\n<ul class=\"wp-block-list\"><li>DuckDB is cool!<\/li><li>If you have strong SQL skills but do not know R or Python so well, this is a great way to get used to those programming languages.<\/li><li>If you are unfamiliar to SQL but like R and\/or Python, you can use DuckDB for a while and end up being an SQL addict.<\/li><li>If your analysis involves combining many large files during preprocessing, then you can try the trick shown in the last example of this post.<\/li><\/ul>\n\n\n\n<p>The Python notebook and R code can be found at:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><a href=\"https:\/\/github.com\/lorentzenchr\/notebooks\/blob\/master\/blogposts\/2022-04-02%20duckdb.R\">R code<\/a><\/li><li><a href=\"https:\/\/github.com\/lorentzenchr\/notebooks\/blob\/master\/blogposts\/2022-04-02%20duckdb.ipynb\">Python notebook<\/a><\/li><\/ul>\n","protected":false},"excerpt":{"rendered":"<p>&#8220;R <-> Python&#8221; continued&#8230; DuckDB: Quacking SQL<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[17],"tags":[10,6,5],"class_list":["post-745","post","type-post","status-publish","format-standard","hentry","category-programming","tag-lost-in-translation","tag-python","tag-r"],"featured_image_src":null,"author_info":{"display_name":"Michael Mayer","author_link":"https:\/\/lorentzen.ch\/index.php\/author\/michael\/"},"_links":{"self":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/745","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/comments?post=745"}],"version-history":[{"count":20,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/745\/revisions"}],"predecessor-version":[{"id":769,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/745\/revisions\/769"}],"wp:attachment":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/media?parent=745"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/categories?post=745"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/tags?post=745"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}