Tag: data science

  • Dictionary for Data Scientists and Statisticians

    During my journey through machine learning (ML) and statistics, I have many times been faced with surprisingly different usage of terms. To improve the understanding between data scientists and statisticians, I present a dictionary and hope the humour does not go unnoticed.

    | data scientist | statistician | comment |
    | sample | observation | |
    | (training) set | sample | |
    | feature | covariate, predictor | many more terms |
    | label | categorical response | |
    | inference | prediction, forecast | |
    | statistics | inference | |
    | training | fitting | |
    | training error | in-sample error | |
    | test/validation set | hold-out sample | |
    | regression | regression | |
    | classification | regression (on categorical response) + decision making | thus the name logistic / multinomial regression! |
    | supervised machine learning | regression | |
    | AI | AI for funding, else regression | see EU AI Act article 3 |
    | confidence score | predicted probability | confidence scores might not represent probabilities |
    | (binary/multiclass) cross-entropy | (binomial/multinomial) log likelihood | a.k.a. log loss |
    | unbalanced data problem | 🤷‍♂️ what problem? | if any, a small data problem |
    | SMOTE | devil’s work | |
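
For readers who prefer code to vocabulary, here is a minimal sketch of two of the entries above: the binary cross-entropy is the Bernoulli (binomial) log likelihood with its sign flipped and averaged over observations, and classification is a predicted probability plus a decision rule. The function names, the toy data and the 0.5 threshold are my own illustrative assumptions and not part of the dictionary itself.

```python
import math


def bernoulli_log_likelihood(y, p):
    """Statistician's view: log likelihood of binary responses y
    given predicted probabilities p (summed over observations)."""
    return sum(
        yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
        for yi, pi in zip(y, p)
    )


def binary_cross_entropy(y, p):
    """Data scientist's view: mean of -log(probability assigned
    to the observed label), a.k.a. log loss."""
    losses = [-math.log(pi if yi == 1 else 1 - pi) for yi, pi in zip(y, p)]
    return sum(losses) / len(losses)


def classify(p, threshold=0.5):
    """Decision making on top of regression: turn predicted
    probabilities into labels (threshold is an assumed choice)."""
    return [int(pi >= threshold) for pi in p]


# Toy data, invented for illustration only.
y = [1, 0, 1, 1]
p = [0.9, 0.2, 0.6, 0.4]

cross_entropy = binary_cross_entropy(y, p)
mean_neg_loglik = -bernoulli_log_likelihood(y, p) / len(y)

print(math.isclose(cross_entropy, mean_neg_loglik))
print(classify(p))
```

Both quantities agree up to sign and scaling by the number of observations, which is why minimising the one is the same as maximising the other. The threshold in `classify` is the "decision making" part: the regression model itself only delivers the probabilities.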

    Statistics is about the honest interpretation of data, which is much less appealing than less honest interpretation.

    by Prof. Simon Wood, a.k.a. Mr GAM/mgcv