As the first step in the decommissioning of sasCommunity.org the site has been converted to read-only mode.


Here are some tips for How to share your SAS knowledge with your professional network.


Changing Variable Type

From sasCommunity
Revision as of 10:30, 25 April 2011 by Amos (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

The task is to create a successor to a SAS data set, selectively replacing character variables with numerics, and numeric variables with character ones. Here is a method which preserves variable order and variable labels. First create a data set for the demonstration:

proc sql;
create table mixedtype 
 (   Month num
   , Year num
   , MeanTemp char(6) Label = 'Mean Temperature (Celsius)'
   , Rainfall char(6)  Label = 'Rainfall (Centimeters)'
 );
insert into mixedtype (Month, Year, MeanTemp, Rainfall)
 values (1, 2008, '5.5', '6.2')
 values (2, 2008, '6.4', '5.9')
 values (3, 2008, '9.8', '4.9')
 ;
quit;

A control file specifies details. For example:

data control;
infile cards missover;
input name $ in_format : $15.;
cards; 
Year      4.
MeanTemp  .
;

NAME of course identifies the variables to be converted. IN_FORMAT makes it possible to override the default format/informat used in the conversion.

Here is the SQL code which makes everything happen:

proc sql;

reset noprint;

select case when flagged
            then cat (   case when type EQ 'num'
                              then 'put'
                              when type EQ 'char'
                              then 'input'
                              else ''
                              end
                       , '('
                       , trim(dictcols.name)
                       , ','
                       , case when type EQ 'num'
                              then coalesce(   trim(in_format)
                                             , 'best12.' )
                              when type EQ 'char'
                              then coalesce(   trim(in_format)
                                             , '32.' )
                              else ''
                              end
                       , ') as '
                       , trim(dictcols.name)
                       , case when type EQ 'char'
                                   and
                                   not missing(in_format)
                              then catt(' informat=',in_format)
                              else ''
                              end 
                       , trim(label)
                     )
            else dictcols.name
            end
 into : selections separated by ','
 from (select   name
              , type
              , case when missing(label)
                     then ' ' 
                     else catt (   ' label='
                                 , quote( trim(label) )
                               )
                     end as label
              , varnum 
        from dictionary.columns as dictcols
        where libname EQ 'WORK' and memname EQ 'MIXEDTYPE'
      )
      natural left join
      (select * , 1 as flagged from control)
 order by varnum ;

create table remixed as select &selections from mixedtype; 

quit;

To expose things a bit, include this code:

%put &selections ;

reset print;

select   name     format=$10.
       , memname  format=$10.
       , varnum   format=10.
       , type     format=$8.
       , length   format=6.
       , informat format=$8.
       , label    format=$30.
 from dictionary.columns
 where libname='WORK' and memname in ('MIXEDTYPE','REMIXED')
 order by varnum, memname desc;

Here is the content of the macro variable SELECTIONS:

Month,
put(Year,4.) as Year,
input(MeanTemp,32.) as MeanTemp label="Mean Temperature (Celsius)",
Rainfall

Here is the result of the last SELECT:

                            Column
Column      Member       Number in  Column    Column  Column
Name        Name             Table  Type      Length  Informat  Column Label
------------------------------------------------------------------------------------------
Month       REMIXED              1  num            8
Month       MIXEDTYPE            1  num            8
Year        REMIXED              2  char           4
Year        MIXEDTYPE            2  num            8
MeanTemp    REMIXED              3  num            8            Mean Temperature (Celsius)
MeanTemp    MIXEDTYPE            3  char           6            Mean Temperature (Celsius)
Rainfall    REMIXED              4  char           6            Rainfall (Centimeters)
Rainfall    MIXEDTYPE            4  char           6            Rainfall (Centimeters)

What if there are a lot of variables, and all but a few are to be converted? Just build the CONTROL table with the exceptions (variables to be left alone) and change the logic in the first CASE structure by replacing

when flagged

with

when not flagged

If it is also necessary to to override some formats and informats, include those variables in the CONTROL table but expand the first CASE condition to

when not ( flagged and missing(in_format) )

It may make sense to have a zero-row CONTROL table. This forces either all variables or no variables to be converted. This technique probably makes the most sense when more restrictions are incorporated into the first CASE condition. To convert all variables, or all numerics, or all character variables, the conditions would be

when not flagged 
when type EQ  'num' and not flagged 
when type EQ 'char' and not flagged 

respectively. These three special cases are equivalent to the methods shown in Convert All Variables to Opposite Type, Convert Numeric Variables to Character, and Convert Character Variables to Numeric.