Monday, December 2, 2019

Performance Measures for Machine Learning


In this blog post I'm presenting a few performance measures for machine learning tasks. These performance measures come up a lot in the marketing domain.

Take-home lessons:
  • the measure you optimize for makes a difference
  • the measure you report makes a difference
  • use a measure appropriate for the problem/community
  • accuracy often is not sufficient/appropriate
  • only accuracy generalizes to >2 classes
  • this is not an exhaustive list of performance measures


Confusion Matrix

First we construct a confusion matrix for a binary classification problem. Given a classification function f(x) -> R and a threshold T that splits its outputs into the classes {0, 1}, we can create a confusion matrix that counts how often each predicted class occurs for each true label.



Accuracy is then measured as the percentage of correct responses (True Positives + True Negatives) over the total number of responses.
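As a minimal sketch, the four counts and the accuracy can be computed as follows. The cell names a, b, c, d are an assumption inferred from the formulas used later in this post (a = true positives, b = false negatives, c = false positives, d = true negatives); the function names and the toy data are made up for illustration.

```python
def confusion_counts(scores, labels, threshold):
    """Count the four confusion-matrix cells for a binary problem.

    Assumed cell naming (inferred from the formulas later in the post):
      a = true positives,  b = false negatives,
      c = false positives, d = true negatives.
    """
    a = b = c = d = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score > threshold else 0
        if label == 1 and predicted == 1:
            a += 1
        elif label == 1 and predicted == 0:
            b += 1
        elif label == 0 and predicted == 1:
            c += 1
        else:
            d += 1
    return a, b, c, d


def accuracy(a, b, c, d):
    """Fraction of correct predictions: (TP + TN) / total."""
    return (a + d) / (a + b + c + d)


# Toy data (illustrative only): model scores f(x) and true labels.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

a, b, c, d = confusion_counts(scores, labels, threshold=0.5)
print("a, b, c, d =", (a, b, c, d))
print("accuracy =", accuracy(a, b, c, d))
```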


Problems with Accuracy

This measure is commonly used, but it can be misleading. The problems arise from the domain we are modelling. For example, if one of the classes is poorly represented, the metric is meaningless: we could always predict the majority class and still get a good result.

• Assumes equal cost for both kinds of errors 
  • cost(b-type-error) = cost(c-type-error)
• is 99% accuracy good?
  • can be excellent, good, mediocre, poor, terrible
  • depends on problem
• Base Rate = accuracy of predicting predominant class
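To make the base-rate point concrete, here is a small sketch on a made-up dataset with 1% positives. The helper name `base_rate` and the class counts are illustrative assumptions, not figures from any real problem.

```python
def base_rate(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    positives = sum(labels)
    negatives = len(labels) - positives
    return max(positives, negatives) / len(labels)


# Hypothetical imbalanced dataset: 990 negatives and 10 positives.
labels = [0] * 990 + [1] * 10

# A "model" that ignores its input and always predicts class 0.
predictions = [0] * len(labels)
acc = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

print("base rate:", base_rate(labels))
print("accuracy of always predicting 0:", acc)
```

On data like this, a 99% accurate model has learned nothing beyond the class frequencies, so accuracy should always be read against the base rate.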


Weighted (Cost sensitive) Accuracy

A modified version of accuracy is "Weighted Accuracy", where we account for the cost of each kind of misclassification.

In this scenario we are aiming for a model and a threshold that minimize the total cost.
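One way to sketch the threshold search is to assign a cost to each kind of error and pick the threshold with the lowest total. The formulation below (cost per false negative times b, plus cost per false positive times c) is a common one and an assumption here; the cost values, function names and toy data are hypothetical.

```python
def total_cost(scores, labels, threshold, cost_fn, cost_fp):
    """Total misclassification cost at one threshold.

    cost_fn is charged per missed positive (b-type error) and cost_fp per
    false alarm (c-type error). Both values are assumptions of this sketch.
    """
    b = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= threshold)
    c = sum(1 for s, y in zip(scores, labels) if y == 0 and s > threshold)
    return cost_fn * b + cost_fp * c


def best_threshold(scores, labels, cost_fn, cost_fp):
    """Try every distinct score as a threshold and keep the cheapest one."""
    # The extra candidate below all scores allows "predict everything as 1".
    candidates = [min(scores) - 1.0] + sorted(set(scores))
    return min(
        candidates,
        key=lambda t: total_cost(scores, labels, t, cost_fn, cost_fp),
    )


# Toy data (illustrative only).
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

# Missing a positive is expensive: the best threshold moves down.
print(best_threshold(scores, labels, cost_fn=5, cost_fp=1))
# A false alarm is expensive: the best threshold moves up.
print(best_threshold(scores, labels, cost_fn=1, cost_fp=5))
```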



If we are not interested in accuracy on the entire dataset but want accurate predictions for 5%, 10% or 20% of the dataset, then we can use the lift measure.
Lift measures how much better than random the predictions are on the fraction of the dataset predicted true (f(x) > threshold).
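A minimal sketch of lift for the top-scoring fraction of the dataset follows. It assumes the usual definition, where lift is the share of all positives captured in the selected fraction divided by the size of that fraction, so a random ranking gives a lift of about 1. The function name and toy data are illustrative, and the sketch assumes at least one positive example.

```python
def lift_at_fraction(scores, labels, fraction):
    """Lift for the top `fraction` of examples ranked by score.

    lift = (share of all positives found in the top fraction)
           / (share of the dataset that was selected)
    """
    n = len(scores)
    k = max(1, round(n * fraction))  # number of examples predicted true
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    positives_in_top = sum(label for _, label in ranked[:k])
    recall_in_top = positives_in_top / sum(labels)
    return recall_in_top / (k / n)


# Toy data (illustrative only): 10 examples, 3 of them positive.
scores = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

print(lift_at_fraction(scores, labels, 0.2))  # top 20% of the dataset
print(lift_at_fraction(scores, labels, 1.0))  # whole dataset
```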


Precision / Recall

  • The Precision measure counts how many of the examples predicted as the class of interest are correct: a/(a+c).
  • The Recall measure counts how many of the examples of the class of interest the model returns: a/(a+b).
In the case below the class of interest is a(1). 

We can sweep the threshold, calculate Precision/Recall at each value, and plot the results as what is called the Precision/Recall curve.

At each different threshold we can see a different tradeoff between the two metrics.
  • When the threshold is too high, everything is predicted as class 0, so a and c both become zero: recall drops to zero and precision is undefined (0/0).
  • When the threshold is too low, everything is predicted as class 1, so b becomes zero: recall reaches 1 while precision falls to the fraction of positives in the dataset.
Both of these metrics are flawed in isolation, and it is up to the modeller to decide which one better represents the problem.
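The threshold sweep can be sketched as below. The cell names (a = true positives, b = false negatives, c = false positives) are assumed from the formulas in this post, and the toy data and chosen thresholds are illustrative. Precision is reported as None when nothing is predicted positive, since a + c = 0 in that case.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall for the positive class at one threshold."""
    pairs = list(zip(scores, labels))
    a = sum(1 for s, y in pairs if y == 1 and s > threshold)   # true positives
    b = sum(1 for s, y in pairs if y == 1 and s <= threshold)  # false negatives
    c = sum(1 for s, y in pairs if y == 0 and s > threshold)   # false positives
    precision = a / (a + c) if (a + c) > 0 else None
    recall = a / (a + b) if (a + b) > 0 else None
    return precision, recall


# Toy data (illustrative only).
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

# Each threshold gives one point of the Precision/Recall curve.
for threshold in [0.0, 0.35, 0.5, 0.7, 0.95]:
    precision, recall = precision_recall(scores, labels, threshold)
    print(f"T={threshold:.2f}  precision={precision}  recall={recall}")
```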


The F-Measure is an attempt to merge the two measures to construct a more meaningful performance measure.
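For reference, a small sketch of the usual definition: the F-Measure is the weighted harmonic mean of precision and recall, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), with beta = 1 giving the familiar F1 score. The function name and the convention of returning 0.0 when both inputs are zero are assumptions of this sketch.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily, beta < 1 weights precision more.
    Returns 0.0 when both precision and recall are zero (a convention
    chosen for this sketch to avoid dividing by zero).
    """
    beta_sq = beta * beta
    denominator = beta_sq * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta_sq) * precision * recall / denominator


print(f_measure(0.5, 1.0))          # F1: penalizes the weaker of the two
print(f_measure(0.5, 1.0, beta=2))  # F2: recall counts more
```

Because the harmonic mean is dominated by the smaller value, a model cannot reach a high F score by maximizing only one of the two measures.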

Receiver Operating Characteristic (ROC)

• Developed in WWII to statistically model false positive and false negative detections of radar operators
• Better statistical foundations than most other measures
• Standard measure in medicine and biology
• Becoming more popular in ML 

Although ROC graphs are apparently simple, there are some common misconceptions and pitfalls when using them in practice.

One of the earliest adopters of ROC graphs in machine learning was Spackman (1989), who demonstrated the value of ROC curves in evaluating and comparing algorithms.

ROC graphs are conceptually simple, but there are some non-obvious complexities that arise when they are used in research. 

ROC Plot


• Sweep threshold and plot
  • TPR vs. FPR
  • Sensitivity vs. 1-Specificity
  • P(true|true) vs. P(true|false)
• Sensitivity = a/(a+b) = Recall = LIFT numerator
• 1 - Specificity = 1 - d/(c+d)
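The sweep can be sketched as follows, producing one (FPR, TPR) point per threshold. TPR = a/(a+b) matches the sensitivity formula above, and FPR = c/(c+d) equals 1 - d/(c+d). The function name and toy data are illustrative, and the sketch assumes both classes are present.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by sweeping the threshold.

    An example is predicted positive when score > threshold. Thresholds run
    from the highest score (nothing predicted positive) down to minus
    infinity (everything predicted positive).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    thresholds = sorted(set(scores), reverse=True) + [float("-inf")]
    points = []
    for t in thresholds:
        a = sum(1 for s, y in zip(scores, labels) if y == 1 and s > t)  # TP
        c = sum(1 for s, y in zip(scores, labels) if y == 0 and s > t)  # FP
        points.append((c / negatives, a / positives))
    return points


# Toy data (illustrative only).
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```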

A ROC graph depicts relative trade-offs between benefits (true positives) and costs (false positives).

  • The lower left point (0,0) represents the strategy of never issuing a positive classification. 
  • The opposite strategy is represented by the upper right point (1,1).
  • The point (0,1) represents perfect classification.
  • The diagonal line y = x represents the strategy of randomly guessing a class. 
  • A random classifier will produce an ROC point that "slides" back and forth on the diagonal based on the frequency with which it guesses the positive class. In order to get away from this diagonal into the upper triangular region, the classifier must exploit some information in the data. 
  • Any classifier that appears in the lower right triangle performs worse than random guessing. This triangle is therefore usually empty in ROC graphs.
  • ROC curves have an attractive property: they are insensitive to changes in class distribution.
  • Any performance metric that uses values from both columns of the confusion matrix will be inherently sensitive to class skews. Metrics such as accuracy, precision, lift and F scores use values from both columns of the confusion matrix. 
  • ROC graphs are based upon TP rate and FP rate, in which each dimension is a strict columnar ratio, so they do not depend on class distributions.
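The insensitivity to class distribution can be checked with a small experiment: replicate every negative example ten times and compare the metrics before and after. This is an illustrative sketch with made-up data; the cell names a, b, c, d follow the convention assumed from the formulas above, and the function name `rates` is hypothetical.

```python
def rates(scores, labels, threshold):
    """TP rate, FP rate, precision and accuracy at one threshold."""
    pairs = list(zip(scores, labels))
    a = sum(1 for s, y in pairs if y == 1 and s > threshold)   # true positives
    b = sum(1 for s, y in pairs if y == 1 and s <= threshold)  # false negatives
    c = sum(1 for s, y in pairs if y == 0 and s > threshold)   # false positives
    d = sum(1 for s, y in pairs if y == 0 and s <= threshold)  # true negatives
    return {
        "tpr": a / (a + b),
        "fpr": c / (c + d),
        "precision": a / (a + c),
        "accuracy": (a + d) / (a + b + c + d),
    }


# Toy data (illustrative only).
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

# Same classifier, but every negative example now appears 10 times.
skewed_scores, skewed_labels = [], []
for s, y in zip(scores, labels):
    copies = 10 if y == 0 else 1
    skewed_scores += [s] * copies
    skewed_labels += [y] * copies

balanced = rates(scores, labels, threshold=0.35)
skewed = rates(skewed_scores, skewed_labels, threshold=0.35)
for name in balanced:
    print(f"{name}: balanced={balanced[name]:.3f}  skewed={skewed[name]:.3f}")
```

TP rate and FP rate are each computed within a single true class, so they stay the same, while precision and accuracy mix counts from both classes and move with the skew.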
