
Tip of the Day: July 14

From sasCommunity

sasCommunity Tip of the Day

Accurately Calculate AUC (Area Under the Curve) in SAS for Data Rank Ordered by a Binary Classifier

AUC is the area under the receiver operating characteristic (ROC) curve and is an important measure of the accuracy of a binary classifier. It equals the c statistic (concordance index) of Hosmer and Lemeshow, and it is closely related to other measures such as the Gini index (Gini = 2*AUC - 1).
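The c statistic can be read directly as a pairwise concordance probability. The Python sketch below is an illustration of that definition, not the tip's SAS method. It compares every (class 1, class 0) pair, gives full credit when the class-1 observation has the higher score and half credit for a tie, and derives the Gini index from the result. The function name `auc_pairwise` and the toy data are hypothetical.

```python
from typing import Sequence


def auc_pairwise(y: Sequence[int], score: Sequence[float]) -> float:
    """AUC as the concordance probability (c statistic).

    Hypothetical helper for illustration: every (class 1, class 0) pair is
    compared; a pair earns 1 if the class-1 score is higher and 1/2 if the
    scores tie. This costs O(N1 * N0) comparisons.
    """
    pos = [s for label, s in zip(y, score) if label == 1]
    neg = [s for label, s in zip(y, score) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one observation in each class")
    credit = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                credit += 1.0
            elif sp == sn:
                credit += 0.5
    return credit / (len(pos) * len(neg))


# Toy data invented for illustration: outcome Y and class-1 score P_1.
y = [0, 0, 1, 0, 1, 1]
p1 = [0.10, 0.40, 0.35, 0.80, 0.80, 0.90]
auc = auc_pairwise(y, p1)
gini = 2 * auc - 1
print(f"AUC = {auc:.4f}, Gini = {gini:.4f}")  # AUC = 0.7222, Gini = 0.4444
```

Here the three class-1 scores earn 1 + 2.5 + 3 = 6.5 credits out of 3 * 3 = 9 pairs. The double loop is exactly the quadratic cost that a rank-based approach avoids on large data sets.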

To calculate the AUC for a SAS data set that is already rank ordered by a binary classifier (such as a linear logistic regression), where we have the binary outcome Y and a rank-ordering score P_0 or P_1 (for class 0 and class 1, respectively), we can use PROC NPAR1WAY to obtain the Wilcoxon rank sum statistics and derive an accurate AUC for the data from them. This avoids the computational burden of numerical integration, and it avoids refitting the model with PROC LOGISTIC, which reports the c statistic only when it fits a model.

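The relationship between the AUC and the Wilcoxon rank sum test statistic is:

AUC = (W - W0)/(N1*N0) + 0.5

where N1 and N0 are the numbers of observations in class 1 and class 0, W is the Wilcoxon rank sum (the sum of the ranks of the class-1 observations when ranking by P_1), and W0 = N1*(N1+N0+1)/2 is the expected rank sum under H0: the observations are randomly ordered.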
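A short derivation through the Mann-Whitney U statistic shows why the relationship holds. It assumes that W is the rank sum of class 1 with tied scores given average ranks, so that U counts the (class 1, class 0) pairs in which the class-1 observation ranks higher, with ties counted as 1/2. Here N = N1 + N0.

```latex
\begin{aligned}
U &= W - \frac{N_1(N_1+1)}{2}, \qquad \mathrm{AUC} = \frac{U}{N_1 N_0},\\
W_0 &= E_{H_0}[W] = \frac{N_1(N+1)}{2} = \frac{N_1(N_1+N_0+1)}{2},\\
W - W_0 &= U + \frac{N_1(N_1+1)}{2} - \frac{N_1(N_1+N_0+1)}{2} = U - \frac{N_1 N_0}{2},\\
\frac{W - W_0}{N_1 N_0} + \frac{1}{2} &= \frac{U}{N_1 N_0} = \mathrm{AUC}.
\end{aligned}
```

If the ranking used class 0's score and rank sum instead, the same argument goes through with the roles of the two classes swapped.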

Application Example:

PROC LOGISTIC reports c = 0.911960.

This method gives AUC = 0.9119491555.
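The author's data set is not reproduced here, but the rank-sum computation itself is small. The Python sketch below mirrors the steps on hypothetical toy data. It assigns average ranks to tied scores, which we assume matches how PROC NPAR1WAY scores ties for the Wilcoxon test. It then sums the class-1 ranks to get W, computes W0 = N1*(N1+N0+1)/2, and applies the formula. The names `midranks`, `wilcoxon_class1` and `auc_from_wilcoxon` are hypothetical, and W and W0 correspond only roughly to the class-1 sum of scores and its expected value under H0 in the NPAR1WAY output.

```python
from typing import List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """Rank values in ascending order, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_class1(
    y: Sequence[int], score: Sequence[float]
) -> Tuple[int, int, float, float]:
    """Return N1, N0, the class-1 rank sum W, and its expectation W0 under H0."""
    ranks = midranks(score)
    n1 = sum(1 for label in y if label == 1)
    n0 = sum(1 for label in y if label == 0)
    w = sum(r for label, r in zip(y, ranks) if label == 1)
    w0 = n1 * (n1 + n0 + 1) / 2
    return n1, n0, w, w0


def auc_from_wilcoxon(n1: int, n0: int, w: float, w0: float) -> float:
    """AUC = (W - W0) / (N1 * N0) + 0.5."""
    return (w - w0) / (n1 * n0) + 0.5


# Toy data invented for illustration: outcome Y and class-1 score P_1.
y = [0, 0, 1, 0, 1, 1]
p1 = [0.10, 0.40, 0.35, 0.80, 0.80, 0.90]
n1, n0, w, w0 = wilcoxon_class1(y, p1)
auc = auc_from_wilcoxon(n1, n0, w, w0)
print(f"W = {w}, W0 = {w0}, AUC = {auc:.4f}")  # W = 12.5, W0 = 10.5, AUC = 0.7222
```

The class-1 scores 0.35, 0.80 and 0.90 receive ranks 2, 4.5 and 6, so W = 12.5 and AUC = 2/9 + 0.5 = 6.5/9. This equals the fraction of concordant pairs with ties counted as one half. Sorting makes the cost O(N log N) rather than the O(N1 * N0) of comparing every pair.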

For sample code, click here.

For recent discussion threads on SAS-L, see this post and subsequent posts under the same topic.

Submitted by Liang Xie (My Blog). Contact me at my Discussion Page.
