
Tips:Accurately Calculate AUC in SAS

From sasCommunity
Revision as of 21:03, 13 February 2010 by Frankfry


Accurately calculate AUC (Area Under the Curve) in SAS for data rank-ordered by a binary classifier

AUC is the area under the ROC curve and is an important measure of a binary classifier's accuracy. It equals the c statistic of Hosmer and Lemeshow and is closely related to other measures such as the Gini index (Gini = 2·AUC − 1).
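The c statistic can be read as the fraction of (class 1, class 0) pairs that the score orders correctly, with ties counted as one half. Using N1 and N0 for the class sizes and P_1 for the class-1 score (the notation used on this page), the definition can be sketched as:

```latex
\hat{c} = \text{AUC}
  = \frac{1}{N_1 N_0}
    \sum_{i:\,Y_i = 1} \sum_{j:\,Y_j = 0}
    \left[ \mathbf{1}\!\left(P_{1,i} > P_{1,j}\right)
         + \tfrac{1}{2}\,\mathbf{1}\!\left(P_{1,i} = P_{1,j}\right) \right]
```

Evaluating this double sum directly costs O(N1·N0) comparisons, which is the burden a rank-based shortcut avoids.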

To calculate AUC for a SAS data set that is already rank-ordered by a binary classifier (such as linear logistic regression), with binary outcome Y and rank-order score P_0 or P_1 (for class 0 and class 1, respectively), we can use PROC NPAR1WAY to obtain the Wilcoxon rank sum statistics and derive an accurate AUC for the data from them. This avoids the computational burden of numerical integration, and it avoids refitting the model with PROC LOGISTIC, which does not report the c statistic without fitting a model.

The relationship between AUC and Wilcoxon Rank Sum test statistics is:

AUC = (W-W0)/(N1*N0)+0.5

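where N1 and N0 are the numbers of observations in class 1 and class 0, W is the Wilcoxon rank sum of class 1 (ranking all observations by P_1), and W0 = N1(N1+N0+1)/2 is the expected rank sum of class 1 under H0 that the observations are randomly ordered.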
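To see why the identity holds, the short derivation below links it to the Mann-Whitney U statistic. U_1 is notation introduced here for illustration: it counts the (class 1, class 0) pairs in which the class-1 observation has the higher P_1, with ties counted as one half (this holds when tied scores receive averaged ranks).

```latex
\begin{aligned}
N &= N_1 + N_0, \qquad
W_0 = \frac{N_1 (N + 1)}{2} = \frac{N_1 (N_1 + 1)}{2} + \frac{N_1 N_0}{2} \\
U_1 &= W - \frac{N_1 (N_1 + 1)}{2} \\
\frac{W - W_0}{N_1 N_0} + \frac{1}{2}
  &= \frac{U_1 - N_1 N_0 / 2}{N_1 N_0} + \frac{1}{2}
   = \frac{U_1}{N_1 N_0} = \text{AUC}
\end{aligned}
```

So the formula is simply the fraction of correctly ordered pairs, obtained from a single rank sum instead of N1·N0 comparisons.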

Application Example:

PROC LOGISTIC reports c=0.911960

This method gives AUC = 0.9119491555
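The data behind these two figures are not given on this page. As a hedged, language-neutral check of the identity, the Python sketch below (not SAS, and not the author's code) computes AUC from the class-1 rank sum, giving tied scores their average rank, and compares it with a brute-force count of concordant pairs. The toy data set and the function names `midranks`, `auc_rank_sum` and `auc_pairwise` are hypothetical.

```python
from itertools import product


def midranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the last position holding the same value (a tie block).
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def auc_rank_sum(y, p1):
    """AUC = (W - W0) / (N1 * N0) + 0.5, ranking all observations by P_1."""
    ranks = midranks(p1)
    n1 = sum(1 for v in y if v == 1)
    n0 = len(y) - n1
    w = sum(r for r, v in zip(ranks, y) if v == 1)  # Wilcoxon rank sum of class 1
    w0 = n1 * (n1 + n0 + 1) / 2                      # expected rank sum under H0
    return (w - w0) / (n1 * n0) + 0.5


def auc_pairwise(y, p1):
    """Brute-force c statistic: share of (class 1, class 0) pairs ordered correctly."""
    pos = [s for s, v in zip(p1, y) if v == 1]
    neg = [s for s, v in zip(p1, y) if v == 0]
    score = 0.0
    for a, b in product(pos, neg):
        if a > b:
            score += 1.0
        elif a == b:
            score += 0.5  # ties count as half a concordant pair
    return score / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: outcome Y and class-1 score P_1 (note the tie at 0.35).
    y = [0, 0, 1, 0, 1, 1, 0, 1]
    p1 = [0.10, 0.35, 0.35, 0.40, 0.62, 0.80, 0.20, 0.55]
    print(auc_rank_sum(y, p1), auc_pairwise(y, p1))  # prints 0.90625 0.90625
```

Both functions agree on this example. The rank-sum version needs only a sort, O(N log N), whereas the pairwise count is O(N1·N0), which is why going through the rank sum pays off on large scored data sets.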

For sample code, check

For recent discussion threads on SAS-L, check and subsequent posts under the same topic.

Submitted by Liang Xie (My Blog). Contact me at my Discussion Page.
