As the first step in the decommissioning of sasCommunity.org the site has been converted to read-only mode.


Here are some tips for How to share your SAS knowledge with your professional network.


Transposing a Zero-Observation Data Set

From sasCommunity
Jump to: navigation, search

Q: Is it possible to prevent PROC TRANSPOSE from generating an output observation when the input data set to the transpose contains no observations? This was asked on SAS-L

A: Preventing such observations is dicey, but there are a number of useful techniques which can be used downstream to shed the unwanted observations.

First, let's see just what is generated in this situation. We need, to start, a data set with no observations. We can create it by running:

data nothing;
retain by1 by2 copy1 copy2 var1 var2 0;
stop;
label var1 = 'Label for Var1';
run;

Now we'll transpose it with:

proc transpose data=nothing out=nothing_transposed;
by by1 by2;
copy copy1 copy2;
var var1 var2;
run;

The output looks like this:

by1    by2    copy1    copy2    _NAME_       _LABEL_       
. . . . var1 Label for Var1 . . . . var2

The output data set always includes the _NAME_ variable. In addition we have any and all BY and COPY variables, with missing values of course. If any of the variables being transposed has a label, the _LABEL_ variable will also be in the output. There is one observation for each variable being transposed.

These unwanted observations can be excluded by coding a WHERE filter which requires at least one BY variable to be non-missing:

proc transpose data=nothing
 out=nothing_transposed(where = (n(by1,by2)>0) );
by by1 by2;
copy copy1 copy2;
var var1 var2;
label var1 = 'Label for Var1';
run;

But what if there are no BY variables, or if it is legitimate for there to be observations where all of the BY variables have missing values? Then it's probably necessary to accept a second-best solution, admitting the unwanted observations in the PROC TRANSPOSE output and dealing with them later.

Before showing how that might be done, let's set up some different test data:

data empty not_empty;
set sashelp.class;
label height = 'Height in Inches'
      weight = 'Weight in Pounds';
if _n_ in (2,12);
output not_empty;
run;

NOT_EMPTY looks like this:

Name     Sex    Age    Height    Weight   
Alice F 13 56.5 84 Judy F 14 64.3 90

and EMPTY is of course empty. If we transpose NOT_EMPTY thusly:

proc transpose data=not_empty out=legit;
by name sex;
var height weight;
run;

we get

Sex    Name     _NAME_        _LABEL_         COL1     
F Alice Height Height in Inches 56.5 F Alice Weight Weight in Pounds 84.0 F Judy Height Height in Inches 64.3 F Judy Weight Weight in Pounds 90.0

whereas the parallel transposition of EMPTY:

proc transpose data=    empty out=bogus;
by name sex;
var height weight;
run;

yields:

Sex    Name    _NAME_        _LABEL_              
Height Height in Inches Weight Weight in Pounds

The most straightforward way to get rid of the unwanted observations is with SQL:

delete from legit
 where not exists(select * from not_empty);     
delete from bogus where not exists(select * from empty);

The results are, as desired,

NOTE: No rows were deleted from WORK.LEGIT.
NOTE: 2 rows were deleted from WORK.BOGUS

The same thing could have been done with DATA steps, but it's a bit more complicated:

data legit;
modify legit;
if none then remove;
else stop;
delete;
set not_empty(drop=_all_) end=none;
run;                                         
data bogus; modify bogus; if none then remove; else stop; delete; set empty(drop=_all_) end=none; run;

In the SAS-L thread, it turned out that the user wanted to append the rows to a master table. In that case, why bother deleting? Instead, just lose the unwanted rows during the append process. To demonstrate, we must first create the target:

proc sql;
create table master like legit;
describe table master;
create table master
 (Name    char (8),
  Sex     char (1),
  _NAME_  char (8),
  _LABEL_ char(40),
  COL1    num
 );

Now we can insert the wanted rows:

insert into master
 select * from legit
  where exists(select * from not_empty);                      
insert into master select * from bogus where exists(select * from empty);

The first statement works as it should, and the log shows:

NOTE: 4 rows were inserted into WORK.MASTER.

The second statement, the one involving the table derived from the empty table, is a different story. We get:

ERROR: Attempt to insert fewer columns than specified after the INSERT table name.

That's because the COL1 column is absent in BOGUS, so that the insertion set is not compatible with its target. In a sense there is no problem; we don't want anything inserted and that is precisely the outcome. However, the presence of the ERROR may be unacceptable. Here's a workaround. First create an empty copy of the target:

create table model like master;

Now apply MODEL's structure to the insertion set via the OUTER UNION operator and insert the results:

 insert into master
 select * from model where 0
 outer union corresponding
 select * from legit
  where exists(select * from not_empty);     
insert into master select * from model where 0 outer union corresponding select * from bogus where exists(select * from empty);

Now both statements work.

The strategy of filtering out the unwanted rows during an append process can also be implemented in DATA steps, though once again the code is more obscure:

data master;
if _n_=1 then set not_empty(drop=_all_);
set legit;
output;
delete;
modify master;
run;                                         
data master; if _n_=1 then set empty(drop=_all_); set bogus; output; delete; modify master; run;

Finally, a more general approach to leaving the PROC TRANSPOSE output data sets untouched and instead filtering out the unwanted rows later is to create views for the purpose. Here is the SQL version:

create view legit_filtered as
 select * from legit
  where exists(select * from not_empty);   
create view bogus_filtered as select * from bogus where exists(select * from empty);

The DATA step counterparts are:

data legit_filtered / view=legit_filtered;
set legit;
if none then stop;
output;
delete;
set not_empty(drop=_all_) end=none;
run;                                               
data bogus_filtered / view=bogus_filtered; set bogus; if none then stop; output; delete; set empty(drop=_all_) end=none; run;

Expanding variable lists with PROC TRANSPOSE shows how the variables which arise from transposing an empty SAS data set can be useful.