How to implement K-Nearest-Neighbor in SAS
K-Nearest-Neighbor (KNN) is a widely used data mining tool, often called a memory-based, case-based, or instance-based method because no model is fit. A good introduction to KNN can be found in the references below.
Typically, the KNN algorithm relies on a sophisticated data structure called a kd-tree to quickly find the points closest to a given point in high-dimensional space, and a kd-tree is not easy to implement in SAS on your own. There are two ways to work around this:
- Use PROC LOESS. PROC LOESS builds a kd-tree when you specify the DETAILS(KDTREE) option in the MODEL statement, and you can use the ODS table KdTree to output the structure of the tree. The drawbacks of this method are that PROC LOESS is slow even on moderately sized data and cannot handle large data sets, the number of dimensions must be 15 or fewer, and you still have to code your own nearest-neighbor search algorithm over the generated kd-tree.
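As a sketch of this first approach: the step below asks PROC LOESS to report its kd-tree and captures the structure through ODS. The data set `train`, response `y`, and predictors `x1` and `x2` are hypothetical names, not part of the original article.

```
/* Sketch only: `train`, `y`, `x1`, `x2` are assumed names. */
ods output KdTree=kdtree_out;           /* capture the kd-tree structure */
proc loess data=train;
   model y = x1 x2 / details(kdtree);   /* request kd-tree details       */
run;
ods output close;
/* `kdtree_out` now holds the tree; the nearest-neighbor
   search over it must still be coded by hand.              */
```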
- Simply ask PROC DISCRIM to use a nonparametric method by specifying METHOD=NPAR together with the K= option. Do not use the R= option at the same time; it corresponds to the radius-based nearest-neighbor method. Also pay attention to how PROC DISCRIM treats categorical data automatically; sometimes you may want to convert categorical data into metric coordinates in advance. Since PROC DISCRIM does not output the tree it builds internally, use the DATA=, TEST=, and TESTOUT= options to score a new data set.
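A minimal sketch of this second approach, assuming hypothetical data sets `train` and `score`, a class variable `target`, three numeric predictors, and an arbitrary choice of K=5:

```
/* Sketch only: data set names, variable names, and K=5 are assumptions. */
proc discrim data=train test=score testout=scored
             method=npar k=5;   /* 5-nearest-neighbor classification */
   class target;                /* the label to predict              */
   var x1 x2 x3;                /* numeric predictors                */
run;
/* `scored` holds the posterior probabilities and predicted class
   for each observation in `score`.                                 */
```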
- Trevor Hastie, Robert Tibshirani, Jerome Friedman, "The Elements of Statistical Learning", 2nd ed., Springer, 2008
- J. L. Bentley, "Multidimensional binary search trees used for associative searching", Communications of the ACM, 18(9):509-517, 1975
Submitted by Liang Xie.