Re-ordering variables

From sasCommunity
Jump to: navigation, search

There are many ways of changing the ordering of variables in SAS. The basic rule is the "Rule of First Encounter:" the first time that SAS encounters a variable in the course of the construction of a new dataset determines the order in which that variable is entered into the new dataset.

So, to change the order of variables, you must change the order in which SAS encounters variables.

Define a dataset:

 DATA OLD;
      A=1; B=2; C=3; OUTPUT;
 RUN;

In this dataset, variables are in order A B C.

 DATA NEW;
      SET OLD;
      LENGTH B A C 3;
 RUN;

Variables are still in order A B C, since they are first encountered in Dataset OLD, and that sets the ordering for dataset NEW.

To change the order, ensure that the new order is encountered first:

 DATA RENEW;
      LENGTH B A C 3;
      SET OLD;
 RUN;

Now Dataset RENEW has a new ordering.

As mentioned earlier any statement that lists the variables in the desired order before any other statement will reorder the variables in the newly created dataset. The most common are the RETAIN, LENGTH, ATTRIB, LABEL, and FORMAT Statements. So which one should be used? That's debatable. But consider the RETAIN statement since all variables coming from a SET dataset are automatically RETAINed (note this is also true for MERGE, MODIFY, and UPDATE statements). As such using a RETAIN statement to ONLY reorder variables in a dataset has no unintended side effect. All other statements require the programmer to specify some attribute of each variable.

 DATA RENEW;
      RETAIN B A C;
      SET OLD;
 RUN;

[Following paragraphs started out as a separate article, so some integration is needed.]

Q: How do I change the order of variables in my data set?

or

How can I reorder variables in data set?

A-1: use retain statement

DATA   Order_I_Want;
retain Var6 Var9 Var2 Var4;
set    Libref.data;

note: easy, variable type (C,N) does not matter

caveat: DO NOT use assignment statements

A-2: use attribute statement

DATA   Order_I_Want;
attrib Var6 
       Var9 
       Var2 
       Var4;
set    Libref.data;

note-1: can add or change other attributes like type and label;

note-2: assignment statements ok.

see also: Attribute statement


PLEASE use the LABEL, ATTRIB, LENGTH, or something other than the RETAIN statement. Make sure you understand how it works and document the fact that no logic is to be placed in the data step where you are using the retain statement or you could be putting your company and possibly your job at risk. If you disagree with me, then it is obvious you don't understand the door you open when you use RETAIN.

On the discussion tab there is a nice conversation about whether or not to use RETAIN, and under what circumstances it is safe/not safe to use when reordering variables. --Art Carpenter (talk) 01:12, 11 May 2014 (CDT)

General Technique

Here is an approach that should work under any circumstances to manipulate the Rule of First Encounter. It is introduced by example. Start with this data set:

data oldclass ;
set sashelp.class ;
label age    = 'Age in Years'
      height = 'Height in Inches'
      weight = 'Weight in Pounds' ;
run ;
 
proc contents data=oldclass varnum ; run ;
            Variables in Creation Order

 #    Variable    Type    Len    Label

 1    Name        Char      8
 2    Sex         Char      1
 3    Age         Num       8    Age in Years
 4    Height      Num       8    Height in Inches
 5    Weight      Num       8    Weight in Pounds

The initial task is to create a new character variable INITIAL containing the first letter of the NAME variable and a new numeric variable BMI containing the body mass index computed for the teens only. Here is a solution.

data newclass ;
set oldclass ;
Initial = substr(name,1,1) ;
if age GT 12 then BMI = 703 * weight / height**2 ;
run ;

But suppose that there is an additional requirement, that the variables in the output data set appear in a somewhat particular order. SEX is to appear first, followed by AGE, BMI, and NAME; the remaining variables (HEIGHT, WEIGHT, and INITIAL) are to follow, but in no particular order.

The solution is now:

data newclass ;
retain Sex Age ;
label BMI=' ' ;
retain Name ;
set oldclass ;
Initial = substr(name,1,1) ;
if age GT 12 then bmi = 703 * weight / height**2 ;
run ;

Notice that 4 variables are mentioned for the sake of the Rule of First Encounter, using RETAIN statements for the variables inherited from OLDCLASS and LABEL statements for the newly computed variables. This is in general the most reliable approach. To see why, try breaking the rules by using RETAIN for a new variable and LABEL for an inherited one:

data trouble ;
retain BMI ;
label age=' ' ;
set oldclass ;
Initial = substr(name,1,1) ;
if age GT 12 then bmi = 703 * weight / height**2 ;
run ;

Two problems materialize.

  • Several BMI values are inappropriately propagated forward.
  • The label for AGE disappears.

There are special cases and circumstances which might permit the first-encounter mentions to be safely consolidated in one statement. For example, if you know all of the variables in the ordering requirement are inherited, use RETAIN. Or, if there are no labels to be inherited, or the labels aren't needed, use LABEL.

Why LABEL statements? There are two advantages. They do not prevent later statements from creating labels for the variables in question, and they defer variable typing (numeric vs. character).

One remaining drawback is the loss of case pattern (upper, lower, or some particular mix) in the names of variables. These have to be re-specified in the RETAIN statement(s); they will not otherwise be inherited. That's why data set TROUBLE has a variable "age" even though it's "Age" in the source data set.

One way to preserve case patterns is to avoid RETAIN and instead use not-executed SET statements like

if 0 then set oldclass(keep = name) ;

Note that a separate SET statement is needed for each variable to maintain complete control over order, so this can get verbose.

See also