Transpose using IDGROUP in PROC SUMMARY

From sasCommunity

Jump to: navigation, search

A question was posted on SAS-L "Array Question" regarding how to transpose two or more variables to wide format in a single step, the actual question related to determining the dimimsion of for the array. This question comes up often where a user wants to make data really wide by transposing many variables into a single observation. Most often I like to use PROC TRANPOSE to tranpose data over a data step transpose, it seems more clear most of the time. However when transposing to "really wide" like this example takes at least two calls to PROC TRANSPOSE and possibly a data step merge.

This example uses a feature of PROC SUMMARY that will allow the user to transpose two or more variables to "really wide" format in one step. Actually two steps including the step to determine the dimension for the SAS enumerated variable lists.

Consider the following data. The object is to transpose all variables by PT so there is one observation per PT.

data have;
   input PT day day2;
   cday  = put(day,words20.);
   cday2 = put(day2,words20.);
   format day day2 f8.1;
   attrib cday: label='Label for Cday';
   cards;
1    10    20
1    11    21
1    12    22
2    13    23
2    14    24
2     .    25
2    16    23
3    13    23
3    14    24
3     .    25
;;;;
   run;

Obs    PT         day        day2    cday        cday2

  1     1        10.0        20.0    ten         twenty
  2     1        11.0        21.0    eleven      twenty-one
  3     1        12.0        22.0    twelve      twenty-two
  4     2        13.0        23.0    thirteen    twenty-three
  5     2        14.0        24.0    fourteen    twenty-four
  6     2          .         25.0    missing     twenty-five
  7     2        16.0        23.0    sixteen     twenty-three
  8     3        13.0        23.0    thirteen    twenty-three
  9     3        14.0        24.0    fourteen    twenty-four
 10     3          .         25.0    missing     twenty-five

We will end up creating several groups of SAS enumerated variables so we want to determine the maximum value for that enumeration. That value needs to become part of SAS code later on so we will put the value into a macro variable. One easy enough way to do that is to use PROC SQL and find the MAX of the record count for each subject.

proc sql noprint feedback;
   select max(ptcount) into :dim 
      from (select count(pt) as ptcount from have group by pt);
   quit;
   run;

Now we know the diminsion of the enumerated variable list we can call PROC SUMMARY to transpose the observations to "really wide format". The macro variable DIM becomes the [N] specification for the OUT parameter in the IDGROUP statement. We also list the names of the variables we want to transpose in the (id-variable-list) specification of the OUT parameter. This list can include characacter or numeric variables. The SAS enumerated variables created from this list also inherit the attributes of the varaibles in the list.

proc summary data=have nway;
   class pt;
   output out=need idgroup(out[&dim](day: cday:)=);
   run;


The statements about produce the following data set.


                     The CONTENTS Procedure

                  Variables in Creation Order

    #    Variable    Type    Len    Format    Label

    1    PT          Num       8
    2    _TYPE_      Num       8
    3    _FREQ_      Num       8
    4    day_1       Num       8    F8.1
    5    day_2       Num       8    F8.1
    6    day_3       Num       8    F8.1
    7    day_4       Num       8    F8.1
    8    day2_1      Num       8    F8.1
    9    day2_2      Num       8    F8.1
   10    day2_3      Num       8    F8.1
   11    day2_4      Num       8    F8.1
   12    cday_1      Char     30              Label for Cday
   13    cday_2      Char     30              Label for Cday
   14    cday_3      Char     30              Label for Cday
   15    cday_4      Char     30              Label for Cday
   16    cday2_1     Char     20              Label for Cday
   17    cday2_2     Char     20              Label for Cday
   18    cday2_3     Char     20              Label for Cday
   19    cday2_4     Char     20              Label for Cday
   20    day         Num       8    F8.1
   21    day2        Num       8    F8.1


Obs  PT  _TYPE_  _FREQ_     day_1     day_2     day_3     day_4

 1    1     1       3        10.0      11.0      12.0        .
 2    2     1       4        13.0      14.0        .       16.0
 3    3     1       3        13.0      14.0        .         .

Obs    day2_1    day2_2    day2_3    day2_4  cday_1     cday_2

 1       20.0      21.0      22.0        .   ten       eleven
 2       23.0      24.0      25.0      23.0  thirteen  fourteen
 3       23.0      24.0      25.0        .   thirteen  fourteen

Obs  cday_3   cday_4   cday2_1         cday2_2      cday2_3

 1   twelve            twenty        twenty-one   twenty-two
 2   missing  sixteen  twenty-three  twenty-four  twenty-five
 3   missing           twenty-three  twenty-four  twenty-five

Obs    cday2_4            day        day2

 1                       11.0        21.0
 2   twenty-three        14.3        23.8
 3                       13.5        24.0


Personal tools