Autorecode

SPSS has a handy command called autorecode, which converts the values of a variable to consecutive integer categories in alphanumeric order. SAS has no equivalent command, but "data _null_", writing on SAS-L, provided the code below.

First read in some data, containing strings:

data work.test; 
   input string $50.; 
   cards; 
a string
a string
another string
another String
yet another string
more categories
More
more

and more
and one more

;;;;
   run;

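There is no need to sort the input data. In the PROC SUMMARY step that follows, the levels of STRING are ordered by default on their internal (unformatted) values, which on ASCII systems places uppercase letters before lowercase ones. That order can be changed using CLASS statement options.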

Use PROC SUMMARY to create a new dataset (work.recode) containing the levels of the string. The $UPCASE format is added to cause any case differences to be ignored.

proc summary data=work.test nway; 
   class string; 
   output out=work.recode(drop=_type_ _freq_) / levels; 
   format string $upcase50.;
   run; 
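
The CLASS statement options mentioned earlier can be sketched as follows. This is a minimal variation on the step above, assuming you want the levels numbered in order of the upcased (formatted) value rather than the internal value; the output dataset name work.recode_fmt is hypothetical and used only to avoid overwriting work.recode.

```sas
proc summary data=work.test nway;
   /* ORDER=FORMATTED sorts the levels on the $UPCASE-formatted value */
   class string / order=formatted;
   output out=work.recode_fmt(drop=_type_ _freq_) / levels;
   format string $upcase50.;
   run;
```

Other ORDER= values such as FREQ or DATA, and the DESCENDING option, can be used in the same position.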

By adding the MISSING option to the PROC SUMMARY statement above, you can create a missing category that will be _LEVEL_ 1 by default. However, it may be better to handle missing as a special case when the informat is created in the next step.
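
A minimal sketch of the MISSING variant, for comparison only. The dataset name work.recode_miss is hypothetical. Note that the control-data step in the next section adds its own row for missing values, so if this variant were used instead, that explicit row would presumably need to be dropped to avoid defining the blank value twice.

```sas
proc summary data=work.test nway missing;
   /* MISSING keeps blank STRING values as a class level of their own */
   class string;
   output out=work.recode_miss(drop=_type_ _freq_) / levels;
   format string $upcase50.;
   run;
```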


Use the data file (work.recode) to create a PROC FORMAT control data set (work.control):


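data work.control; 
   /* Numeric informat (type 'I') named STR2NUM.                         */
   /* HLO 'U' upcases and 'J' left-justifies the input before matching.  */
   retain fmtname 'str2num' type 'I' hlo 'UJ '; 
   length label $40;
   /* One row per level: START is the upcased string, LABEL its level number */
   do until(eof);
      set work.recode end=eof;
      start = upcase(string); 
      label = vvalue(_level_); 
      output;
      end;
   /* Missing (blank) strings get the next level after the last real one */
   call missing(start);
   _level_ = _level_ + 1;
   label = vvalue(_level_);
   output;
   /* HLO 'O' (other): any unknown value causes an invalid data error */
   label = '_error_';
   hlo = 'UJO';
   output;
   stop;
   run; 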
proc format cntlin=work.control;
   run;
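
Before applying the informat, it can be useful to inspect it. The following is a sketch under the assumption that STR2NUM was stored in the default WORK catalog; the test strings are illustrative and not part of the original post.

```sas
/* List the ranges and labels stored in the STR2NUM informat */
proc format library=work fmtlib;
   select @str2num;   /* the @ prefix selects an informat rather than a format */
   run;

/* Try a few values directly and write the results to the log */
data _null_;
   length s $50;
   do s = 'more', 'MORE', ' ', 'not in the data';
      /* ?? suppresses the invalid-data messages for unknown values */
      n = input(s, ?? str2num.);
      put s= n=;
      end;
   run;
```

With the U flag in HLO, 'more' and 'MORE' should map to the same number, and a value that is not in the data should come back as missing.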


Create a new variable using the informat, which encodes the values of string:

data work.test; 
   set work.test end=eof;
   attrib num length=8 informat=str2num.; 
   num = inputn(string,vinformat(num));
   run;

Print the data to check it:

proc print; 
   run;

Obs    string                num

  1    a string               2
  2    a string               2
  3    another string         5
  4    another String         5
  5    yet another string     7
  6    more categories        6
  7    More                   1
  8    more                   1
  9                           8
 10    and more               3
 11    and one more           4
 12                           8
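
As a further check beyond the listing, a cross-tabulation can confirm that each code corresponds to exactly one string once case is ignored. This is a sketch rather than part of the original post; it assumes work.test now contains NUM as created above.

```sas
proc freq data=work.test;
   /* LIST prints one row per num/string combination; MISSING keeps blank strings */
   tables num*string / list missing;
   format string $upcase50.;
   run;
```

Each value of NUM should appear on a single row of the resulting list.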