Dictionary for Data Scientists and Statisticians

During my journey through machine learning (ML) and statistics, I have come across surprisingly different usage of the same terms many times. To improve mutual understanding between data scientists and statisticians, I present a dictionary, and I hope the humour does not go unnoticed.

data scientist | statistician | comment
sample | observation |
(training) set | sample |
feature | covariate, predictor | many more terms
label | categorical response |
inference | prediction, forecast |
statistics | inference |
training | fitting |
training error | in-sample error |
test/validation set | hold-out sample |
regression | regression |
classification | regression (on categorical response) + decision making | thus the name logistic / multinomial regression!
supervised machine learning | regression |
AI | AI for funding, else regression | see EU AI Act article 3
confidence score | predicted probability | confidence scores might not represent probabilities
(binary/multiclass) cross-entropy | (binomial/multinomial) log likelihood | a.k.a. log loss
unbalanced data problem | 🤷‍♂️ what problem? | if any, a small data problem
SMOTE | devil’s work |
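Two rows of the dictionary can be checked in a few lines. The following is a minimal sketch in plain Python with no ML library. `binary_cross_entropy` and `bernoulli_log_likelihood` show that the data scientist's log loss is the statistician's negative Bernoulli log likelihood, averaged over observations. `classify` shows the "decision making" part of classification as a separate thresholding step applied to predicted probabilities. All function names, labels, probabilities and thresholds here are hypothetical and chosen only for illustration.

```python
import math


def binary_cross_entropy(y, p):
    """Mean binary cross-entropy (log loss), as ML libraries usually define it."""
    return -sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                for yi, pi in zip(y, p)) / len(y)


def bernoulli_log_likelihood(y, p):
    """Log likelihood of independent Bernoulli observations y_i with probabilities p_i."""
    # The Bernoulli pmf is p^y * (1 - p)^(1 - y); take logs and sum over observations.
    return sum(math.log(pi ** yi * (1 - pi) ** (1 - yi)) for yi, pi in zip(y, p))


def classify(p, threshold=0.5):
    """Decision step: turn predicted probabilities into class labels via a threshold."""
    return [1 if pi >= threshold else 0 for pi in p]


y = [1, 0, 1, 1, 0]             # hypothetical observed binary responses
p = [0.9, 0.2, 0.6, 0.4, 0.1]   # hypothetical predicted probabilities from some fitted model

# Log loss equals the negative log likelihood divided by the number of observations.
assert math.isclose(binary_cross_entropy(y, p),
                    -bernoulli_log_likelihood(y, p) / len(y))

print(classify(p))        # [1, 0, 1, 0, 0]
print(classify(p, 0.3))   # [1, 0, 1, 1, 0]
```

The threshold is deliberately kept out of the fitting step. The same predicted probabilities yield different labels depending on the cut-off, and that choice depends on the costs of each kind of error, not on the regression itself.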

Statistics is about the honest interpretation of data, which is much less appealing than less honest interpretation.

by Prof. Simon Wood, a.k.a. Mr GAM/mgcv
