Finding nearest neighbors

From sasCommunity

Jump to: navigation, search

The problem: For each of a given a set of points with spatial coordinates, find the N closest points. Test data generator:

data have;
do CaseID = '11','22','33','44','55';	
   X = ceil (9 * ranuni(123) );
   Y = ceil (9 * ranuni(123) );
   output;
   end;
run;

Result:

Case
 ID     X    Y

 11     7    3
 22     2    9
 33     4    2
 44     8    4
 55     2    2

Use PROC DISTANCE to build a distance matrix:

proc distance data=have out=distance shape=square; 
id CaseID;
var interval (X Y);
run;

Result:

Case
 ID       _11        _22        _33        _44        _55

 11     0.00000    7.81025    3.16228    1.41421    5.09902
 22     7.81025    0.00000    7.28011    7.81025    7.00000
 33     3.16228    7.28011    0.00000    4.47214    2.00000
 44     1.41421    7.81025    4.47214    0.00000    6.32456
 55     5.09902    7.00000    2.00000    6.32456    0.00000

Now normalize:

proc transpose data = distance
 out = distance_list
       ( where = ( CaseID ne substr(_name_,2) ) 
         rename = (col1 = Distance)
       );
by CaseID;
run;

The WHERE condition serves to discard the trivial diagonal elements. Notice the SUBSTR call. It serves to ignore the leading underscores in the variable names, which in turn exist because all of the CaseID values are not valid as variable names. If the CaseID values were all valid variable names, the underscores would not be there and the SUBSTR transformation would necessarily be removed (leaving a simple reference to _NAME_). If some CaseID values are valid variable names and some are not, things get messy.

Next, sort:

proc sort data=distance_list;
by CaseID Distance;
run;

Result:

Case
 ID     _NAME_    Distance

 11      _44       1.41421
 11      _33       3.16228
 11      _55       5.09902
 11      _22       7.81025
 22      _55       7.00000
 22      _33       7.28011
 22      _11       7.81025
 22      _44       7.81025
 33      _55       2.00000
 33      _11       3.16228
 33      _44       4.47214
 33      _22       7.28011
 44      _11       1.41421
 44      _33       4.47214
 44      _55       6.32456
 44      _22       7.81025
 55      _33       2.00000
 55      _11       5.09902
 55      _44       6.32456
 55      _22       7.00000

Finally, subset:

data nearest;
do countoff = 1 by 1 until (last.CaseID);
   set distance_list	;
   by CaseID;
   if countoff <= 2 then output;
   end;
run;	

End result:

            Case
countoff     ID     _NAME_    Distance

    1        11      _44       1.41421
    2        11      _33       3.16228
    1        22      _55       7.00000
    2        22      _33       7.28011
    1        33      _55       2.00000
    2        33      _11       3.16228
    1        44      _11       1.41421
    2        44      _33       4.47214
    1        55      _33       2.00000
    2        55      _11       5.09902
Personal tools