{"id":1859,"date":"2025-04-07T22:45:59","date_gmt":"2025-04-07T20:45:59","guid":{"rendered":"https:\/\/lorentzen.ch\/?p=1859"},"modified":"2025-04-07T22:57:43","modified_gmt":"2025-04-07T20:57:43","slug":"dictionary-for-data-scientists-and-statisticians","status":"publish","type":"post","link":"https:\/\/lorentzen.ch\/index.php\/2025\/04\/07\/dictionary-for-data-scientists-and-statisticians\/","title":{"rendered":"Dictionary for Data Scientists and Statisticians"},"content":{"rendered":"\n<p>During my journey through machine learning (ML) and statistics, I was faced many times with surprisingly different usage of terms. To improve the understanding between data scientists and statisticians, I present a dictionary and hope the humour does not go unnoticed.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>data scientist<\/strong><\/td><td><strong>statistician<\/strong><\/td><td><strong>comment<\/strong><\/td><\/tr><tr><td>sample<\/td><td>observation<\/td><td><\/td><\/tr><tr><td>(training) set<\/td><td>sample<\/td><td><\/td><\/tr><tr><td>feature<\/td><td>covariate, predictor<\/td><td>many more terms<\/td><\/tr><tr><td>label<\/td><td>categorical response<\/td><td><\/td><\/tr><tr><td>inference<\/td><td>prediction, forecast<\/td><td><\/td><\/tr><tr><td>statistics<\/td><td>inference<\/td><td><\/td><\/tr><tr><td>training<\/td><td>fitting<\/td><td><\/td><\/tr><tr><td>training error<\/td><td>in-sample error<\/td><td><\/td><\/tr><tr><td>test\/validation set<\/td><td>hold-out sample<\/td><td><\/td><\/tr><tr><td>regression<\/td><td>regression<\/td><td><\/td><\/tr><tr><td>classification<\/td><td>regression (on categorical response) + decision making<\/td><td>thus the name logistic \/ multinomial regression!<\/td><\/tr><tr><td>supervised machine learning<\/td><td>regression<\/td><td><\/td><\/tr><tr><td>AI<\/td><td>AI for funding, else regression<\/td><td>see <a href=\"https:\/\/artificialintelligenceact.eu\/article\/3\/\">EU 
AI Act article 3<\/a><\/td><\/tr><tr><td>confidence score<\/td><td>predicted probability<\/td><td>confidence scores might not represent probabilities<\/td><\/tr><tr><td>(binary\/multiclass) cross-entropy<\/td><td>(binomial\/multinomial) log likelihood<\/td><td>a.k.a. log loss<\/td><\/tr><tr><td>unbalanced data problem<\/td><td>\ud83e\udd37\u200d\u2642\ufe0fwhat problem?<\/td><td>if any, a small data problem<\/td><\/tr><tr><td>SMOTE<\/td><td>devil&#8217;s work<\/td><td><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>Statistics is about the honest interpretation of data, which is much less appealing than less honest interpretation<\/em>.<\/p>\n<\/blockquote>\n\n\n\n<p>by Prof. Simon Wood, a.k.a. Mr GAM\/mgcv<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>During my journey through machine learning (ML) and statistics, I was faced many times with surprisingly different usage of terms. To improve the understanding between data scientists and statisticians, I present a dictionary and hope the humour does not go unnoticed. 
data scientist statistician comment sample observation (training) set sample feature covariate, predictor many [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16,9],"tags":[30,10,29],"class_list":["post-1859","post","type-post","status-publish","format-standard","hentry","category-machine-learning","category-statistics","tag-data-science","tag-lost-in-translation","tag-statistics"],"featured_image_src":null,"author_info":{"display_name":"Christian Lorentzen","author_link":"https:\/\/lorentzen.ch\/index.php\/author\/christian\/"},"_links":{"self":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/1859","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/comments?post=1859"}],"version-history":[{"count":16,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/1859\/revisions"}],"predecessor-version":[{"id":1902,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/1859\/revisions\/1902"}],"wp:attachment":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/media?parent=1859"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/categories?post=1859"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/tags?post=1859"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}