As the first step in the decommissioning of sasCommunity.org the site has been converted to read-only mode.


Here are some tips for How to share your SAS knowledge with your professional network.


Convert Character Variables to Numeric

From sasCommunity
Jump to: navigation, search

The task is to create a successor to a SAS data set, replacing each character variable in the original with a numeric variable. Here is a method which preserves variable order and variable attributes. First create a data set for the demonstration:

proc sql;
create table mixedtype 
 (   Month num format=monyy7.
   , MeanTemp char(6) Label = 'Mean Temperature (Celsius)'
   , Rainfall char(6)  Label = 'Rainfall (Centimeters)'
 );
insert into mixedtype (Month, MeanTemp, Rainfall)
 values ('01Jan2008'd, '5.5', '6.2')
 values ('01Feb2008'd, '6.4', '5.9')
 values ('01Mar2008'd, '9.8', '4.9')
 ;

Result:

         Mean
         Temperature  Rainfall
  Month  (Celsius)    (Centimeters)
-----------------------------------
JAN2008  5.5          6.2
FEB2008  6.4          5.9
MAR2008  9.8          4.9

Next, this somewhat tricky query which uses DICTIONARY.COLUMNS as a source of metadata and creates the bulk of another query:

select case type
        when 'num'  then name
        when 'char' then catx (   ' ' 
                                , 'input ('
                                , name
                                , ', f32. ) as'
                                , name
                                , 'label ='
                                , quote( strip(label) )                               
                              )
        else             catx (   ' '
                                , quote('Error on type')
                                , 'as'
                                , name
                              )
        end
 into : selections separated by ' , '
 from dictionary.columns
 where libname='WORK' and memname='MIXEDTYPE';

To adapt for other tables change only the literals in the WHERE clause.

Result:

Month
input ( MeanTemp , f32. ) as MeanTemp label = "Mean Temperature (Celsius)"
input ( Rainfall , f32. ) as Rainfall label = "Rainfall (Centimeters)"

The INTO clause takes these expressions and concatenates them into a comma-separated string which it stores in the macro variable SELECTIONS, which in turn is used in the statement which actually does the work:

create table allnum as select &selections from mixedtype;
%symdel selections;

Result:

                Mean
         Temperature       Rainfall
  Month    (Celsius)  (Centimeters)
-----------------------------------
JAN2008          5.5            6.2
FEB2008          6.4            5.9
MAR2008          9.8            4.9

Another query against DICTIONARY.COLUMNS compares the attributes of the original and replacement variables:

select   name    format=$10.
       , memname format=$10.
       , varnum  format=10.
       , type    format=$8.
       , length  format=6.
       , format  format=$8.
       , label   format=$30.
 from dictionary.columns
 where libname='WORK' and memname in ('ALLNUM','MIXEDTYPE')
 order by varnum, memname desc;
quit;

Result:

                            Column
Column      Member       Number in  Column    Column  Column
Name        Name             Table  Type      Length  Format    Column Label
------------------------------------------------------------------------------------------
Month       MIXEDTYPE            1  num            8  MONYY7.
Month       ALLNUM               1  num            8  MONYY7.
MeanTemp    MIXEDTYPE            2  char           6            Mean Temperature (Celsius)
MeanTemp    ALLNUM               2  num            8            Mean Temperature (Celsius)
Rainfall    MIXEDTYPE            3  char           6            Rainfall (Centimeters)
Rainfall    ALLNUM               3  num            8            Rainfall (Centimeters)

This demonstrates that variable order and variable attributes are preserved.

Limitation: This technique will not produce correct results for variables requiring special informats (most notably, times and dates).

See also Convert Numeric Variables to Character and Convert All Variables to Opposite Type.