Finding nearest neighbors
From sasCommunity
The problem: For each of a given a set of points with spatial coordinates, find the N closest points. Test data generator:
data have; do CaseID = '11','22','33','44','55'; X = ceil (9 * ranuni(123) ); Y = ceil (9 * ranuni(123) ); output; end; run;
Result:
Case ID X Y 11 7 3 22 2 9 33 4 2 44 8 4 55 2 2
Use PROC DISTANCE to build a distance matrix:
proc distance data=have out=distance shape=square; id CaseID; var interval (X Y); run;
Result:
Case ID _11 _22 _33 _44 _55 11 0.00000 7.81025 3.16228 1.41421 5.09902 22 7.81025 0.00000 7.28011 7.81025 7.00000 33 3.16228 7.28011 0.00000 4.47214 2.00000 44 1.41421 7.81025 4.47214 0.00000 6.32456 55 5.09902 7.00000 2.00000 6.32456 0.00000
Now normalize:
proc transpose data = distance
out = distance_list
( where = ( CaseID ne substr(_name_,2) )
rename = (col1 = Distance)
);
by CaseID;
run;
The WHERE condition serves to discard the trivial diagonal elements. Notice the SUBSTR call. It serves to ignore the leading underscores in the variable names, which in turn exist because all of the CaseID values are not valid as variable names. If the CaseID values were all valid variable names, the underscores would not be there and the SUBSTR transformation would necessarily be removed (leaving a simple reference to _NAME_). If some CaseID values are valid variable names and some are not, things get messy.
Next, sort:
proc sort data=distance_list; by CaseID Distance; run;
Result:
Case ID _NAME_ Distance 11 _44 1.41421 11 _33 3.16228 11 _55 5.09902 11 _22 7.81025 22 _55 7.00000 22 _33 7.28011 22 _11 7.81025 22 _44 7.81025 33 _55 2.00000 33 _11 3.16228 33 _44 4.47214 33 _22 7.28011 44 _11 1.41421 44 _33 4.47214 44 _55 6.32456 44 _22 7.81025 55 _33 2.00000 55 _11 5.09902 55 _44 6.32456 55 _22 7.00000
Finally, subset:
data nearest; do countoff = 1 by 1 until (last.CaseID); set distance_list ; by CaseID; if countoff <= 2 then output; end; run;
End result:
Case
countoff ID _NAME_ Distance
1 11 _44 1.41421
2 11 _33 3.16228
1 22 _55 7.00000
2 22 _33 7.28011
1 33 _55 2.00000
2 33 _11 3.16228
1 44 _11 1.41421
2 44 _33 4.47214
1 55 _33 2.00000
2 55 _11 5.09902
