Macro AutoMagic

From sasCommunity
Jump to: navigation, search

This article discusses the issues of using arrays of macro variables -- a sequentially-numbered set of macro variables with a common prefix -- and suggests an elegant or proposed solution.

Problems

  • creation and dimension
    • data step and Nobs
    • sql and sql-obs
  •  %do loop with upper-bound = Nobs
  • references with double ampersands &&Var&I
  • text returned:
    • tokens
    • statements
    • steps
    • macro calls

Summary:

  • Creation of a macro array requires at least one step, either data or sql.
  • Reference in a %do loop requires a dimension: N-items, of the array.
  • Output requires a macro to parse.

Proposed solution: a macro function AutoMagic which hides all of the above.

Parameters:

  • data, required
  • action, either of:
    • MacroName, parameters of called macro are same as variables of data set
    • Text, contains macro variable references to variables of data

Examples:

  •  %AutoMagic(data = libref.data,MacroName=)
  •  %AutoMagic(data = libref.data,text=)

Splitting Data

This is a common FAQ: How do I split a data set based on the values of a classification variable?

Old Style: Two Mini Macros!

This log shows an over-simplified example of what it takes to write miniature macros to do the below splitting-data example.

The concepts of encapsulating the loop in one macro and the action in other macros is from Fehd and Carpenter: List Processing Basics.

1          *callmacro-demo-sashelp-old-style;
2          options mprint;
3          proc freq data = sashelp.class;
4                    tables Sex
5                           / list missing noprint
6                       out = Work.Class_freq_sex;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.CLASS_FREQ_SEX has 2 observations and 3 variables.

7          PROC SQL;
7        !           select Sex, Count, Percent
8                    into  :Sex1     -     :Sex&SysMaxLong.
9                         ,:Count1   -   :Count&SysMaxLong.
10                        ,:Percent1 - :Percent&SysMaxLong.
11                   from  Work.Class_freq_sex;
12                   %Let  Dim_Class_sex = &SqlObs;
23                   quit;
27         %macro DoLoop(item=,dim=,MacroName=,semicolon=0);
28         %do I = 1 %to &Dim.;
29             %&MacroName(&Item. = &&&Item&I)
30             %if &Semicolon. %then %do;
31                 ;
32                 %end;
33             %end;
34         %mend;
35         %doloop(item     = sex
36                ,dim      = &Dim_Class_sex.
37                ,macroname= put note
38                ,semicolon= 1)
note(sex = F)
note(sex = M)
39         ;
40         %Macro data_set_name(sex=);
41                Sex_eq_&Sex.
42                %mend;
43         %Macro if_then(sex=);
44                if Sex eq "&Sex." then output Sex_eq_&Sex.;
45                %mend;

46         DATA %doloop(item     = sex
47                     ,dim      = &Dim_Class_sex.
48                     ,macroname= data_set_name)
MPRINT(DATA_SET_NAME):   Sex_eq_F
49                     ;
MPRINT(DATA_SET_NAME):   Sex_eq_M
50         do until(EndoFile);
51            set sashelp.Class end = EndoFile;
52            %doloop(item     = sex
53                   ,dim      = &Dim_Class_sex.
54                   ,macroname= if_then)
MPRINT(IF_THEN):   if Sex eq "F" then output Sex_eq_F;
MPRINT(IF_THEN):   if Sex eq "M" then output Sex_eq_M;
55            end;
56         stop;
57         run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.SEX_EQ_F has 9 observations and 5 variables.
NOTE: The data set WORK.SEX_EQ_M has 10 observations and 5 variables.

Notes:

  • options mprint is necessary in order to see the tokens returned by the mini macros.
  • building the macro array with sql required a complete step
  • three mini macros are required

= Using AutoMatic to split data

13         PROC Freq data   = sashelp.Class;
14                   tables   Sex
15                          / noprint
16                      out = Work.Freq_Class_Sex;
17         run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.FREQ_CLASS_SEX has 2 observations and 3 variables.

**         expected: DATA Sex_M Sex_F;
27         DATA
28              %AutoMagic(data = Work.Freq_Class_Sex
29                        ,text = 
30              (work.Sex_&Sex(label="Count:&Count Pcnt:%scan(&Percent.,1,.)"))
32                        )
note: AutoMagic reading Work.Freq_Class_Sex obs=2 vars=3
note: AutoMagic returns work.Sex_F(label="Count:9 Pcnt:47")
note: AutoMagic returns work.Sex_M(label="Count:10 Pcnt:52")
33              ;*end: data statement;
34         do until(EndoFile);
35            set sashelp.class
36                end = endoFile;
37            %AutoMagic(data = Work.Freq_Class_Sex
38                      ,text = 
39               (if Sex eq "&Sex" then output work.Sex_&Sex; else)
40                      )
note: AutoMagic reading Work.Freq_Class_Sex obs=2 vars=3
note: AutoMagic returns if Sex eq "F" then output work.Sex_F; else
note: AutoMagic returns if Sex eq "M" then output work.Sex_M; else
41               ;*semicolon needed for end: else;
42            end;
43         stop;
44         run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.SEX_F has 9 observations and 5 variables.
NOTE: The data set WORK.SEX_M has 10 observations and 5 variables.

Notes:

  • short and to the point
  • no options mprint required to see tokens generated
  • shorter, cleaner, more obvious: expected tokens are in-line in code and in log

Data Review

Using Format Value Other=Invalid

This example shows how to automate data review. For each variable needing review create a format and add the item other = "&Invalid." to show unexpected values.

This trick of using value ... other = "&Invalid." was also published by Cody.

Program:

  • create format
  • create sample data set with one invalid value of categorical variable
    • add a format to the variable to review
  • create a proc contents output data set
  • data step with choose statements :
    • if put(Var1,format1.) eq 'invalid' then output;

Log

**         macro variable with common label for other=;
11         %Let Fmt_Invalid = invalid;

**         create the data-review format(s);
12         PROC Format library = work
13                     fmtlib;
14                     value $Gender
15                           'F' = 'female'
16                           'M' = 'male'
17                         other = "&Fmt_Invalid.";
NOTE: Format $GENDER has been output.

NOTE: The PROCEDURE FORMAT printed page 1.

**         add the data-review format(s) to the variables to be reviewed;
19         DATA Work.Class;
20         if 0 then set sashelp.class;

21         attrib Sex format = $Gender.;*<---<<< contains other='invalid';

22         do until(EndoFile);
23            set sashelp.class end = EndoFile;
24            output;
25            end;
**         This is the observation that has an invalid value;
26         Name = 'Chris';
27         Sex  = 'I';
28         output;
29         stop;
30         run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.CLASS has 20 observations and 5 variables.

32         PROC Contents data = Work.Class
33                              noprint
34                       out  = Work.Contents
35                      (keep =
36            NAME    %*char(32) label='Variable Name',       ;
37            TYPE    %*num label='Variable Type',            ;
38            LENGTH  %*num label='Variable Length',          ;
39            LABEL   %*char(256) label='Variable Label',     ;
40            FORMAT  %*char(32) label='Variable Format',     ;
41                      );

NOTE: The data set WORK.CONTENTS has 5 observations and 5 variables.

43         PROC SQL;
43       !           describe table Work.Class;
NOTE: SQL table WORK.CLASS was created like:
create table WORK.CLASS( bufsize=4096 )
  (
   Name char(8),
   Sex char(1) format=$GENDER., %*<---<<< generate data-review for this variable;
   Age num,
   Height num,
   Weight num
  );

45                   quit;

46         PROC Print data = Work.Contents;
47                    title  Work.Contents;

NOTE: There were 5 observations read from the data set WORK.CONTENTS.
NOTE: The PROCEDURE PRINT printed page 2.

49         %Macro DataRevu
50           (NAME    =/*char(32) label='Variable Name',       */
51           ,TYPE    =/*num label='Variable Type',            */
52           ,LENGTH  =/*num label='Variable Length',          */
53           ,LABEL   =/*char(256) label='Variable Label',     */
54           ,FORMAT  =/*char(32) label='Variable Format',     */
55         );
56         %if &Format. ne %then %do;
57             %if %sysfunc(getoption(MPRINT)) ne MPRINT %then
58                 %Put note: &SysMacroName %cmpres(
59             if put(&Name.,&Format..) eq "&Fmt_Invalid." then output);
60             if put(&Name.,&Format..) eq "&Fmt_Invalid." then output;
61             %end;
62         %mend;

64         DATA Work.DataReview;
65         do until(EndoFile);
66            set Work.Class end = EndoFile;
67            %AutoMagic(Data      = Work.Contents
68                      ,MacroName = DataRevu)
note: AutoMagic reading Work.Contents obs=5 vars=5
note: DATAREVU if put(Sex,$GENDER.) eq "invalid" then output
69            end;
70         stop;
71         

NOTE: There were 20 observations read from the data set WORK.CLASS.
NOTE: The data set WORK.DATAREVIEW has 1 observations and 5 variables.    

72         Proc Print data = &SysLast.;
73                    title  &SysLast.;
74         run;

NOTE: There were 1 observations read from the data set WORK.DATAREVIEW.


Listing

‚       FORMAT NAME: $GENDER  LENGTH:    7   NUMBER OF VALUES:    3        ‚
‚   MIN LENGTH:   1  MAX LENGTH:  40  DEFAULT LENGTH:   7  FUZZ:        0  ‚

‚START           ‚END             ‚LABEL  (VER. V7|V8   11APR2012:08:17:17)‚
‚F               ‚F               ‚female                                  ‚
‚M               ‚M               ‚male                                    ‚
‚**OTHER**       ‚**OTHER**       ‚invalid                                 ‚
 
Work.Contents      

Obs    NAME      TYPE    LENGTH    LABEL    FORMAT

 1     Age         1        8                      
 2     Height      1        8                      
 3     Name        2        8                      
 4     Sex         2        1               $GENDER
 5     Weight      1        8                      
 
WORK.DATAREVIEW    

Obs    Name       Sex      Age    Height    Weight

 1     Chris    invalid     15     66.5       112 

Using Proc Summary to Identify Hi or Low Values

This example uses an output data set from Proc Summary to identify high (max) or low (min) values.

Program:

  1. allocate global macro variables of parameters
  2. Proc Summary
  3. Print Libref.Data where Var eq high (max) or low (min)
  4. Proc Summary with Class
  5. Print Libref.Data where for each Class, Var eq high (max) or low (min)

Log

10         options mprint;* source2;*testing;
12         
13         %Let In_Lib  = Library;
14         %Let In_Lib  = sashelp;
15         %Let In_Data = Class;
16         %Let Class   = Sex;
17         %Let Var     = age;

21         %Let Summary = Work.Summary;
22         
23         PROC Summary data   = &In_Lib..&In_Data.;
24                      var      &Var.;
25                      output
26                         out = &Summary.
27                     (  drop =  _Type_  _Freq_)
28         %Macro SmryStats(Name=);
29                min   (&Name.)=
30                max   (&Name.)=
31                /*************
32                mean  (&Name.)=
33                median(&Name.)=
34                std   (&Name.)=
35                /***********/
36         %mend;
37         %SmryStats(Name=&Var.)
MPRINT(SMRYSTATS):   = _Type_ _Freq_) min (age)= max (age)=
38                             / autoname;
39         run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.SUMMARY has 1 observations and 2 variables.

41         PROC Print data = &SysLast.;
42                    title3 &SysLast.;
43         run;

NOTE: There were 1 observations read from the data set WORK.SUMMARY.

44         Proc Print data = &In_Lib..&In_Data.
45                    (where = (
46                    %AutoMagic(data =&Summary.
47                              ,text=(   &Var. eq &&&Var._min
48                                     or &Var. eq &&&Var._max)
50                              )
note: AutoMagic reading Work.Summary obs=1 vars=2
note: AutoMagic returns age eq 11 or age eq 16
51                    )         );
52                    title3 "&In_Lib..&In_Data. where &Var. eq  min or max";
53         
54         run;

NOTE: There were 3 observations read from the data set SASHELP.CLASS.
      WHERE age in (11, 16);

55         PROC Summary data   = &In_Lib..&In_Data.;
56                      var      &Var.;
57                      class    &Class.;
58                      output
59                         out = &Summary.
60                     (  drop =  _Type_  _Freq_
61                       where = (&Class. ne ' ')
62                     )
63                      %SmryStats(Name=&Var.)
MPRINT(SMRYSTATS):   min (age)= max (age)=
64                             / autoname;
65         

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.SUMMARY has 2 observations and 3 variables.

66         Proc Print data = &Summary.;
67                    title3 &Summary.;
68         

NOTE: There were 2 observations read from the data set WORK.SUMMARY.

69         Proc Print data = &In_Lib..&In_Data.
70                    (where = (
71                    %AutoMagic(data =&Summary.
72                              ,text=(      &Class. eq "&&&Class."
73                                     and (   &Var. eq  &&&Var._min
74                                          or &Var. eq  &&&Var._max
75                                     ) or )
77                              ) 0
note: AutoMagic reading Work.Summary obs=2 vars=3
note: AutoMagic returns Sex eq "F" and ( age eq 11 or age eq 15 ) or
note: AutoMagic returns Sex eq "M" and ( age eq 11 or age eq 16 ) or
78                    )        );
79                    title3
80                        "&In_Lib..&In_Data. by &Class. where &Var eq min or max";
81         run;

NOTE: There were 5 observations read from the data set SASHELP.CLASS.
      WHERE ((Sex='F') and age in (11, 15)) 
         or ((Sex='M') and age in (11, 16));
NOTE: The PROCEDURE PRINT printed page 4.

Listing

WORK.SUMMARY

Obs    Age_Min    Age_Max
 1        11         16  
 
sashelp.Class where age eq min or max

Obs     Name     Sex    Age    Height    Weight
 11    Joyce      F      11     51.3       50.5
 15    Philip     M      16     72.0      150.0
 18    Thomas     M      11     57.5       85.0

 
Work.Summary

Obs    Sex    Age_Min    Age_Max
 1      F        11         15  
 2      M        11         16  
 
 
sashelp.Class by Sex where age eq min or max

Obs     Name     Sex    Age    Height    Weight
  8    Janet      F      15     62.5      112.5
 11    Joyce      F      11     51.3       50.5
 14    Mary       F      15     66.5      112.0
 15    Philip     M      16     72.0      150.0
 18    Thomas     M      11     57.5       85.0


DISCLAIMER

All these example logs have been heavily edited!  ;-)


References

These are the two macros used in the above description

Predecessors:

-- created by User:Rjf2 12:01 am, 01 April 2012 (EDT)

--Ronald_J._Fehd macro.maven == the radical programmer