Macro CombSort

From sasCommunity
Jump to: navigation, search

Posted to SAS-L by

  • Paul Dorfman
  • 2008-Dec-12

The reason I am posting this is because I have developed a habit of searching SAS-L for... my own code. Having been asked by a coworker about a SAS implementation of the CombSort algorithm, I remembered I had posted it on SAS-L, yet when I resorted to the habit in order to locate it, I could only find a subsequent post of mine stating that I had posted it earlier. Durn it! I need to be able to find not the reference to the real thing, but the thing itself.

Luckily, it is easy to re-implement the algorithm anew. For those who happen to have never heard of it, CombSort is related to the bubble sort approximately as Shell sort is related to the insertion sort. To my knowledge, it was first described in Byte magazine waaaay ago. The nice thing about CombSort is that, although at N=1000000, it is 2-3 times slower than iterative Quicksort, is it much shorter and uncomparably easier to implement and get right. Moreover, at N=10000, the speed difference becomes practically negligible.

SAS CALL SORTN/C routines are quite a bit quicker at large Ns, but again, at N<~10000 the speed difference dwindles to non-essentiality, while CombSort, being an exchange sorting algorithm, has the advantage of being suited in the same guise for both numeric and character arrays. That is, for most practical intents and purposes, it is both simple enough and quick enough - unlike its bubble sort brethren which runs out of breath already at N=1000 due to its O(N**2) nature.

Here is the "naked" DATA step code for sorting an array A ascending (for descending, reverse the CONTINUE-inducing inequality in the code):

    do g = hbound (A) - 1 by 0 while (s or g > 1) ;
       g  = int (g / 1.3) ;
       if     g  in (0    ) then g =  1 ;
       else if g in (9, 10) then g = 11 ;
       s = 0 ;
       do j = lbound (A) to hbound (A) - g ;
          k = j + g ;
          if A[j] >= A[k] then continue ;
          t      = A[j] ;
          A[j] = A[k] ;
          A[k] = t      ;
          s = 1 ;
       end ;
    end ;

On my desktop (same I am using to type this), it sorts a completely disordered numeric array with N=1000000 in 4 seconds flat (quicksort does it just under 2). Or, one may prefer to encapsulate (omitting parameter validation):

%macro combsort (arr =, order = <);
drop __:;
do __g = hbound (&arr) - 1 by 0 while (__s or __g > 1);
   __g = int (__g / 1.3);
   if      __g in (0    ) then __g =  1;
   else if __g in (9, 10) then __g = 11;
   __s = 0;
   do __j = lbound (&arr) to hbound (&arr) - __g;
      __k = __j + __g;
      if &arr[__j] &order &arr[__k] then continue;
      __t      = &arr[__j];
      &arr[__j] = &arr[__k];
      &arr[__k] = __t;
      __s = 1;
      end;
   end;
%mend;

and then, for example:

DATA  _Null_;
   array Test [1000000] _temporary_;
   retain iWantItSorted 1;
 
   do _N_ = lbound (test) to hbound (test);
      test[_N_] = ranuni (1);
   end;
 
   if iWantItSorted then do; 
      %combsort(arr = test);
   end;
 
   sorted = 1;
 
   do _N_ = lbound (test) to hbound (test) - 1  while (sorted);
      if test[_N_ + 1] < test[_N_] then sorted = 0;
   end;
   put sorted =;
run;

Kind regards


Paul Dorfman Jax, FL


References