Transpose using IDGROUP in PROC SUMMARY
From sasCommunity
A question was posted on SAS-L "Array Question" regarding how to transpose two or more variables to wide format in a single step, the actual question related to determining the dimimsion of for the array. This question comes up often where a user wants to make data really wide by transposing many variables into a single observation. Most often I like to use PROC TRANPOSE to tranpose data over a data step transpose, it seems more clear most of the time. However when transposing to "really wide" like this example takes at least two calls to PROC TRANSPOSE and possibly a data step merge.
This example uses a feature of PROC SUMMARY that will allow the user to transpose two or more variables to "really wide" format in one step. Actually two steps including the step to determine the dimension for the SAS enumerated variable lists.
Consider the following data. The object is to transpose all variables by PT so there is one observation per PT.
data have; input PT day day2; cday = put(day,words20.); cday2 = put(day2,words20.); format day day2 f8.1; attrib cday: label='Label for Cday'; cards; 1 10 20 1 11 21 1 12 22 2 13 23 2 14 24 2 . 25 2 16 23 3 13 23 3 14 24 3 . 25 ;;;; run; Obs PT day day2 cday cday2 1 1 10.0 20.0 ten twenty 2 1 11.0 21.0 eleven twenty-one 3 1 12.0 22.0 twelve twenty-two 4 2 13.0 23.0 thirteen twenty-three 5 2 14.0 24.0 fourteen twenty-four 6 2 . 25.0 missing twenty-five 7 2 16.0 23.0 sixteen twenty-three 8 3 13.0 23.0 thirteen twenty-three 9 3 14.0 24.0 fourteen twenty-four 10 3 . 25.0 missing twenty-five
We will end up creating several groups of SAS enumerated variables so we want to determine the maximum value for that enumeration. That value needs to become part of SAS code later on so we will put the value into a macro variable. One easy enough way to do that is to use PROC SQL and find the MAX of the record count for each subject.
proc sql noprint feedback;
select max(ptcount) into :dim
from (select count(pt) as ptcount from have group by pt);
quit;
run;
Now we know the diminsion of the enumerated variable list we can call PROC SUMMARY to transpose the observations to "really wide format". The macro variable DIM becomes the [N] specification for the OUT parameter in the IDGROUP statement. We also list the names of the variables we want to transpose in the (id-variable-list) specification of the OUT parameter. This list can include characacter or numeric variables. The SAS enumerated variables created from this list also inherit the attributes of the varaibles in the list.
proc summary data=have nway; class pt; output out=need idgroup(out[&dim](day: cday:)=); run;
The statements about produce the following data set.
The CONTENTS Procedure
Variables in Creation Order
# Variable Type Len Format Label
1 PT Num 8
2 _TYPE_ Num 8
3 _FREQ_ Num 8
4 day_1 Num 8 F8.1
5 day_2 Num 8 F8.1
6 day_3 Num 8 F8.1
7 day_4 Num 8 F8.1
8 day2_1 Num 8 F8.1
9 day2_2 Num 8 F8.1
10 day2_3 Num 8 F8.1
11 day2_4 Num 8 F8.1
12 cday_1 Char 30 Label for Cday
13 cday_2 Char 30 Label for Cday
14 cday_3 Char 30 Label for Cday
15 cday_4 Char 30 Label for Cday
16 cday2_1 Char 20 Label for Cday
17 cday2_2 Char 20 Label for Cday
18 cday2_3 Char 20 Label for Cday
19 cday2_4 Char 20 Label for Cday
20 day Num 8 F8.1
21 day2 Num 8 F8.1
Obs PT _TYPE_ _FREQ_ day_1 day_2 day_3 day_4
1 1 1 3 10.0 11.0 12.0 .
2 2 1 4 13.0 14.0 . 16.0
3 3 1 3 13.0 14.0 . .
Obs day2_1 day2_2 day2_3 day2_4 cday_1 cday_2
1 20.0 21.0 22.0 . ten eleven
2 23.0 24.0 25.0 23.0 thirteen fourteen
3 23.0 24.0 25.0 . thirteen fourteen
Obs cday_3 cday_4 cday2_1 cday2_2 cday2_3
1 twelve twenty twenty-one twenty-two
2 missing sixteen twenty-three twenty-four twenty-five
3 missing twenty-three twenty-four twenty-five
Obs cday2_4 day day2
1 11.0 21.0
2 twenty-three 14.3 23.8
3 13.5 24.0
