14
Feb

The difference between CLASS statements and BY statements in SAS

When I first learned to program in SAS, I remember being confused about the difference between CLASS statements and BY statements. A novice SAS programmer recently asked when to use one instead of the other, so this article explains the difference between the CLASS statement and BY variables in SAS procedures.

The BY statement and the CLASS statement in SAS both enable you to specify one or more categorical variables whose levels define subgroups of the data. (For simplicity, we consider only a single categorical variable.) The primary difference is that the BY statement computes many analyses, each on a subset of the data, whereas the CLASS statement computes a single analysis of all the data. Specifically,

  • The BY statement repeats an analysis on every subgroup. The subgroups are treated as independent samples. If a BY variable defines k groups, the output will contains k copies of every table and graph, one copy for the first group, one copy for the second group, and so on.
  • The CLASS statement enables you to include a categorical variable as part of an analysis. Often the CLASS variable is used to compare the groups, such as in a t test or an ANOVA analysis. In regression models, the CLASS statement enables you to estimate parameters for the levels of a categorical variable, thereby estimating the effect of each level on the response. Another use of a CLASS variable is to define categories for a classification task, such as a discriminant analysis.

To illustrate the differences between an analysis that uses a BY statement and one that uses a CLASS statement, let's create a subset (called Cars) of the Sashelp.Cars data. The levels of the Origin variable indicate whether a vehicle is manufactured in "Asia", "Europe", or the "USA". For efficiency reasons, most classical SAS procedures require that you sort the data when you use a BY statement. Therefore, a call to PROC SORT creates a sorted version of the data called CarsSorted, which will be used for the BY-group analyses.

data Cars;
   set Sashelp.Cars;
   where cylinders in (4,6,8) and type ^= 'Hybrid'; 
run;
 
proc sort data=Cars out=CarsSorted; 
   by Origin; 
run;

Descriptive statistics for grouped data

When you generate descriptive statistics for groups of data, the univariate statistics are identical whether you use a CLASS statement or a BY statement. What changes is the way that the statistics are displayed. When you use the CLASS statement, you get one table that contains all statistics or one graph that shows the distribution of each subgroup. However, when you use the BY statement you get multiple tables and graphs.

The following statements use the CLASS statement to produce descriptive statistics. PROC UNIVARIATE displays one (paneled) graph that shows a comparative histogram for the vehicles that are made in Asia, Europe, and USA. PROC MEANS displays one table that contains descriptive statistics:

proc univariate data=Cars;
   class Origin;
   var Horsepower;
   histogram Horsepower / nrows=3; /* must use NROWS= to get panel */
   ods select histogram;
run;
 
proc means data=Cars N Mean Std;
   class Origin;
   var Horsepower Weight Mpg_Highway;
run;

In contrast, if you run a BY-group analysis on the levels of the Origin variable, you will see three times as many tables and graphs. Each analysis is preceded by a label that identifies each BY group. Notice that the BY-group analysis uses the sorted data.

proc means data=CarsSorted N Mean Std;
   by Origin;
   var Horsepower Weight Mpg_Highway;
run;

Always remember that the output from a BY statement is equivalent to the output from running the procedure multiple times on subsets of the data. For example, the previous statistics could also be generated by calling PROC MEANS three times, each call with a different WHERE clause, as follows:

proc means N Mean Std data=CarsSorted( where=(origin='Asia') );
   var Horsepower Weight Mpg_Highway;
run;
proc means N Mean Std data=CarsSorted( where=(origin='Europe') );
   var Horsepower Weight Mpg_Highway;
run;
proc means N Mean Std data=CarsSorted( where=(origin='USA') );
   var Horsepower Weight Mpg_Highway;
run;

In fact, if you ever find yourself repeating an analysis many times (perhaps by using a macro loop), you should consider whether you can rewrite your program to be more efficient by using a BY statement.

Comparing groups: Use the CLASS statement

As a general rule, you should use a CLASS statement when you want to compare or contrast groups. For example, the following call to PROC GLM performs an ANOVA analysis on the horsepower (response variable) for the three groups defined by the Origin variable. The procedure automatically creates a graph that displays three boxplots, one for each group. The procedure also computes parameter estimates for the levels of the CLASS variable (not shown).

proc glm data=Cars; /* by default, create graph with side-by-side boxplots */
   class Origin;
   model Horsepower = Origin / solution;
run;

You can specify multiple variables on the CLASS statement to include multiple categorical variables in a model. Any variables that are not listed on the CLASS statement are assumed to be continuous. Thus the following call to PROC GLM analyzes a model that has one continuous and one classification variable. The procedure automatically produces a graph that overlays the three regression curves on the data:

ods graphics /antialias=on;
title "CLASS Variable Regression: One Model with Multiple Parameters";
proc GLM data=Cars plots=FitPlot;
   class Origin;
   model Horsepower = Origin | Weight / solution;
   ods select ParameterEstimates ANCOVAPlot;
quit;

In contrast, if you use a BY statement, the Origin variable cannot be part of the model but is used only to subset the data. If you use a BY statement, you obtain three different models of the form Horsepower = Weight. You get three parameter estimates tables and three graphs, each showing one regression line overlaid on a subset of the data.

Predicted Values: CLASS VARIABLE versus BY Variable

When you use a BY statement and fit three models of the form Horsepower = Weight, the procedure fits a total of six parameters. Notice that when you use the CLASS statement and fit the model Horsepower = Origin | Weight, you also fit six free parameters. It turns out that these two methods produce the same predicted values. In fact, you can combine the parameter estimates (for the GLM parameterization) for the CLASS model to obtain the parameter estimates from the BY-variable analysis, as shown below. Each parameter estimate for the BY-variable models are obtained as the sum of two estimates for the CLASS-variable analysis:

For many regression models, the predicted values for the BY-variable analyses are the same as for a particular model that uses a CLASS variable. As shown above, you can even see how the parameters are related when you use a GLM or reference parameterization. However, the CLASS variable formulation can fit models (such as the equal-slope model Horsepower = Origin Weight) that are not available when you use a BY variable to fit three separate models. Furthermore, the CLASS statement provides parameter estimates so that you can see the effect of the groups on the response variable. It is more difficult to compare the models that are produced by using the BY statement.

Other CLASS-like statements in SAS

Some SAS procedures use other syntax to analyze groups. In particular, the SGPLOT procedure calls classification variables "group variables." If you want to overlay graphs for multiple groups, you can use the GROUP= option on many SGPLOT statements. (Some statements support the CATEGORY= option, which is similar.) For example, to replicate the two-variable regression analysis from PROC GLM, you can use the following statements in PROC SGPLOT:

proc sgplot data=Cars;
   reg y=Horsepower x=Weight / group=Origin; /* Horsepower = Origin | Weight */
run;

Summary

In summary, use the BY statement in SAS procedures when you want to repeat an analysis for every level of one or more categorical variables. The variables define the subsets but are not otherwise part of the analysis. In classical SAS procedures, the data must be sorted by the BY variables. A BY-group analysis can produce many tables and graphs, so you might want to suppress the ODS output and write the results to a SAS data set.

Use the CLASS statement when you want to include a categorical variable in a model. A CLASS statement often enables you to compare or contrast subgroups. For example, in regression models you can evaluate the relative effect of each level on the response variable.

In some cases, the BY statement and the CLASS statement produce identical statistics. However, the CLASS statement enables you to fit a wider variety of models.

The post The difference between CLASS statements and BY statements in SAS appeared first on The DO Loop.

8
Feb

Demystifying certification (Part 3): To the finish

Want to see my newly minted certified professional badge? Scroll down to take a peek. Yes, I managed to successfully complete the Base SAS Programmer certification exam… with, ahem, flying colors I might add. Here are my tips to tackle the Base SAS certification exam: 1.  Get clear on the [...]

The post Demystifying certification (Part 3): To the finish appeared first on SAS Learning Post.

1
Feb

Demystifying SAS Certification (Part 2): Write it down

Do you know what the #1 fear in North America is? Most people say fear of public speaking or fear of death, but you may just want to consider this new fear upping the charts - the fear of writing a SAS Certification exam! The real test begins before even [...]

The post Demystifying SAS Certification (Part 2): Write it down appeared first on SAS Learning Post.

25
Jan

Let SAS write batch scripts for you

Batch Scripts in SASIf Necessity is the mother of Invention, then, perhaps, the father of Automation is Laziness. Automation is all about convenience, comfort, and productivity. Why do it yourself if you can devise something to do it for you!

In my previous post Running SAS programs in batch under Unix/Linux, we learned that in order to automate SAS jobs submissions, they are often run in batch mode. We also learned that we usually create batch scripts as a convenient way to run SAS programs in batch. To create a unique SAS log file generated with each batch submission, a typical batch script may look like follows:

#!/usr/bin/sh
dtstamp=$(date +%Y.%m.%d_%H.%M.%S)
pgmname="/sas/code/project1/program1.sas"
logname="/sas/code/project1/program1_$dtstamp.log"
/sas/SASHome/SASFoundation/9.4/sas $pgmname -log $logname

It will allow you to submit your SAS program /sas/code/project1/program1.sas in batch, and also capture SAS log file with a convenient date-time suffix in the same directory.

SAS program to write batch scripts

But what if we are deploying multiple SAS programs? Well, then we would need to create a batch script for each of them. They will all look similar to each other, and that is when most human errors usually occur – when we do something similar, monotonously, over and over again.  Besides, I found working with the Unix Visual Editor (“vi editor”) is not quite a 21st century experience.

What would a normal SAS programmer do in such a situation? That’s right – we would write a SAS program to write a batch script file! Let’s do it. Let’s automate the automation.

In its simplest form, to replicate the above batch script example our SAS program would look like this:

filename b '/sas/code/project1/program1.sh';
 
data _null_;
file b;
input;
put _infile_;
datalines;
#!/usr/bin/sh
dtstamp=$(date +%Y.%m.%d_%H.%M.%S)
pgmname="/sas/code/project1/program1.sas"
logname="/sas/code/project1/program1_$dtstamp.log"
/sas/SASHome/SASFoundation/9.4/sas $pgmname -log $logname
;

Setting up batch file permissions

As we already know from my previous post, we need to assign certain permissions to our batch file in order to make it executable. For example, if you want to give yourself (Owner) and Group execution permissions then your script file permissions can be as:

-rwxr-x---, or 750 in octal representation.

In order to do that you can to add to your SAS code the following x-statement:

options noxwait;
x 'chmod 750 /sas/code/project1/program1.sh';

Alternatively, you can use %SYSEXEC macro statement (no quoting for the OS command) or SYSTASK statement, or CALL SYSTEM routine (used within a data step).

When you create a batch script by running the above code in SAS Enterprise Guide (EG), you don’t have to leave the comfort of your SAS environment or even touch Unix vi editor. Moreover, you can even submit your SAS job in batch mode right from your SAS EG Program Editor.

However, all of these will work fine, unless XCMD System Option is disabled (NOXCMD).

Assigning batch file permissions when XCMD System Option is disabled

ERROR: Shell escape is not valid in this SAS session.

Bummer! Have you ever seen this error message in SAS Enterprise Guide while trying to run SAS code with the X statement? It indicates that executing OS commands in the SAS environment is not allowed.

In many organizations, IT department policies do not allow enabling the SAS XCMD system option due to cyber-security concerns. This is usually done system-wide for the whole SAS Enterprise client-server installation via SAS configuration.  In this case, no operating system command is allowed to be executed from within SAS.

Of course, this substantially limits SAS’ automation power, but that is the goal and the price to pay for enhanced security.

Still, even without OS command execution at our disposal, we can set Unix script file permissions using FILENAME statement’s PERMISSION= option. Then our above filename statement will look like this:

filename b '/sas/code/project1/program1.sh'
   permission='A::u::rwx,A::g::r-x,A::o::---';

However, it is important to realize that your ability to fully control file permissions via FILENAME statement’s PERMISSION= option is still restricted by the Unix umask value set by your IT system administrator. But usually, it is not overly restrictive, at least for the purpose of creating executable files in the environments I have worked with.

The double benefit of the FILENAME statement’s PERMISSION= option is that it can be used for setting up file permissions in any SAS installation whether the XCMD system option is enabled or disabled.

SAS macro to create batch script files

Let’s wrap all the above SAS code pieces into a SAS macro that writes batch scripts. Here is the macro code definition:

%macro write_shell(code);
   %let fdir = %substr(&code,1,%sysfunc(findc(&code,/,b)));
 
   options dlcreatedir;
   libname _flib "&fdir";
   libname _flib;
 
   %let core = %substr(&code,1,%eval(%length(&code)-4));
   filename _fout "&core..sh" permission='A::u::rwx,A::g::r-x,A::o::---'; 
   data _null_;
      file _fout;
      put
         '#!/bin/sh' //
         'now=$(date +%Y.%m.%d_%H.%M.%S)' /
         "pgmname=""&code""" /
         "logname=""&core._$now.log""" /
         'sas $pgmname -log $logname'
         ;
   run;
   filename _fout;
%mend write_shell;

The single macro parameter (code) represents full path name of your SAS code. And here is a macro invocation example:

 
%write_shell(/sas/code/project1/program1.sas)

The assumption here is that the script file gets created in the same directory as the relevant SAS code and SAS logs for each of the batch runs. It will be assigned the same name as your SAS program, only with the .sh name extension. As you can see, we do some string parsing to derive directory name, script file name and SAS log file name from the single macro parameter representing full path name of your SAS code. As an added bonus, if a specified directory (/sas/code/project1/) does not yet exist, it will be created by this macro. DLCREATEDIR System Option (along with the two subsequent libname statements) are responsible for the directory creation.

If you want to create many script files for your multiple SAS programs, you just invoke the macro as many times. You can even go totally data-driven for mass script file creation.

Do you find this useful?

Please let me know in the comments section below if you find this blog post useful. Thank you for reading! I also invite you to share your ideas and experiences on the topic.

Let SAS write batch scripts for you was published on SAS Users.

24
Jan

Getting your SAS 9 code to run multi-threaded in SAS Viya 3.3

This article and accompanying technical white paper are written to help SAS 9 users process existing SAS 9 code multi-threaded in SAS Viya 3.3.
Read the full paper, Getting Your SAS 9 Code to Run Multi-Threaded in SAS Viya 3.3.

The Future is Multi-threaded Processing Using SAS® Viya®

When I first began researching how to run SAS 9 code as multi-threaded in SAS Viya, I decided to compile all the relevant information and detail the internal processing rules that guide how the SAS Viya Cloud Analytics Services (CAS) server handles code and data. What I found was that there are a simple set of guidelines, which if followed, help ensure that most of your existing SAS 9 code will run multi-threaded. I have stashed a lot of great information into a single whitepaper that is available today called “Getting Your SAS 9 Code to Run Multi-threaded in SAS Viya”.

Starting with the basic distinctions between single and parallel processing is key to understanding why and how some of the parallel processing changes have been implemented. You see, SAS Viya technology constructs code so that everything runs in pooled memory using multiple processors. Redefining SAS for this parallel processing paradigm has led to huge gains in decreasing program run-times, as well as concomitant increases in accuracy for a variety of machine learning techniques. Using SAS Viya products helps revolutionize how we think about undertaking large-scale work because now we can complete so many more tasks in a fraction of the time it took before.

The new SAS Viya products bring a ton of value compared to other choices you might have in the analytics marketplace. Unfortunately most open source libraries and packages, especially those developed for use in Python and R, are limited to single-threading. SAS Viya offers a way forward by coding in these languages using an alternative set of SAS objects that can run as parallel, multi-threaded, distributed processes. The real difference is in the shared memory architecture, which is not the same as parallel, distributed processing that you hear claimed from most Hadoop and cloud vendors. Even though parallel, distributed processing is faster than single-threading, it proverbially hits a performance wall that is far below what pooled and persisted data provides when using multi-threaded techniques and code.

For these reasons, I believe that SAS Viya is the future of data/decision science, with shared memory running against hundreds if not thousands of processors, and returning results back almost instantaneously. And it’s not for just a handful of statistical techniques. I’m talking about running every task in the analytics lifecycle as a multi-threaded process, meaning end-to-end processing through data staging, model development and deployment, all potentially accomplished through a point-and-click interface if you choose. Give SAS Viya a try and you will see what I am talking about. Oh, and don’t forget to read my technical white paper that provides a checklist of all the things that you may need to consider when transitioning your SAS 9 code to run multi-threaded in SAS Viya!

Questions or Comments?

If you have any questions about the details found in this paper, feel free to leave them in the comments field below.

Getting your SAS 9 code to run multi-threaded in SAS Viya 3.3 was published on SAS Users.

23
Jan

Demystifying SAS certification – Week 1

Well! I did it! I conducted a workshop for internal SAS employees wishing to take SAS certification. The workshop was to help them understand the following: SAS Certification overview Exam preparation Website Content Exam pass score Sample questions Study material As the day for the exam approaches, on 5 February, [...]

The post Demystifying SAS certification – Week 1 appeared first on SAS Learning Post.

16
Jan

How to secure your REST API credentials in SAS programs

I've used SAS with a bunch of different REST APIs: GitHub, Brightcove, Google Analytics, Lithium, LinkedIn, and more. For most of these I have to send user/password or "secret" application tokens to the web service so that it knows who I am and what data I can retrieve. I do not want to keep this secret information in my SAS program files -- that would be a bad idea. If my credentials were part of the program -- even if they were obfuscated and not stored in clear text -- then anyone who managed to get a copy of my program could run it. And they could gain access to my data, as if they were me.

I've written about this topic for SAS-related passwords. In this article, I'll share the approach that I use for API credentials and tokens.

REST APIs: Each service requires different types of secrets

My REST API services don't require just simple user ID and password combos. It depends on the API, but usually the information is in the form of one or more tokens that I've generated using the vendor's developer console, or perhaps that have been granted by an administrator.

For example, to access the Google Analytics API, I need three things: a client ID, a client secret token, and a valid "refresh" token. I can send these three items to the Google OAuth2 API, and in return I'll receive a live "access" token that I can use to request my data. I think of this like checking into a hotel. I show my ID and a credit card at the front desk, and in exchange I receive a room key. Just like my hotel room key, the access token doesn't last forever and cannot be reused on my next visit.

Other APIs are simpler and require just a single token that never expires. That's more like a house key -- it's mine to use forever, or until someone decides to change the locks.

Whether a static token or a token-for-token exchange, I don't want to just leave these keys lying around for just anyone to find and use.

Hide your tokens in a file that only you can read

My simple approach is to store the token values in a text file within my home directory. Then, I change the permissions on the file such that only my account can read it. Whether I submit my program interactively (in SAS Enterprise Guide or SAS Studio) or as a scheduled batch job, it's running under my account. I'm showing the instructions here for UNIX/Linux, but Windows users can accomplish something similar with Windows permissions.

On Linux, I've used the chmod command to specify the octal value that says "only the owner can read/write." That's "chmod 600 filename". The "ls -l" command shows that this permissions mask has been applied.

chmod 600 ./.google_creds.csv
ls -l ./.google_creds.csv
> -rw------- 1 myid mygroup 184 Jan 15 12:41 ./.google_creds.csv

I stored my tokens in a standard CSV format because it's easy for SAS to read and it's easy for me to read if I ever need to change it.

Use INFILE to read the tokens dynamically

With this critical data now stored externally, and the file permissions in place, I can use SAS to read the credentials/tokens within my program and store the values in SAS macro variables. In the following SAS program, I assigned a macro variable to my user root folder. Since I might run this program on Linux or Windows, I used this trick to determine the proper path notation. I also used the &SYSUSERID macro variable to make my program more portable. If I want to supply this program to any colleagues (or to you!), the only thing that's needed is to create and store the token CSV files in the proper location.

/* My path is different for UNIX vs Windows */
%let authpath = %sysfunc(ifc(&SYSSCP. = WIN,
	 \netsharerootu&sysuserid.,
	 /u/&sysuserid.));
 
/* This should be a file that only YOU or trusted group members can read */
/* Use "chmod 0600 filename" in UNIX environment */
/* "dotfile" notation is convention for on UNIX for "hidden" */
filename auth "&authpath./.google_creds.csv";
 
/* Read in the secret account keys from another file */
data _null_;
 infile auth firstobs=2 dsd delimiter=',' termstr=crlf;
 length client_id $ 100 client_secret $ 30 refresh_token $ 60;
 input client_id client_secret refresh_token;
 call symputx('client_id',client_id);
 call symputx('client_secret',client_secret);
 call symputx('refresh_token',refresh_token);
run;

When I run this code in my production job, I can see the result:

NOTE: The infile AUTH is:
      Filename=/u/myid/.google_creds.csv,
      Owner Name=myid,Group Name=mygroup,
      Access Permission=rw-------,
      Last Modified=Mon Jan 15 12:41:58 2018,
      File Size (bytes)=184

NOTE: 1 record was read from the infile AUTH.
      The minimum record length was 145.
      The maximum record length was 145.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      user cpu time       0.01 seconds

For this example, my next step is to call the Google API to get my access token. I'll use the macro variables that my program created with CALL SYMPUT to build the proper API call.

/* Call Google API to exchange the refresh token for an active access token */
%let oauth2=https://www.googleapis.com/oauth2/v4/token;
filename rtoken temp;
proc http
 method="POST"
 url="&oauth2.?client_id=&client_id.%str(&)client_secret=&client_secret.%str(&)grant_type=refresh_token%str(&)refresh_token=&refresh_token."
 out=rtoken;
run;

See the full explanation of this Google Analytics example in this article.

The post How to secure your REST API credentials in SAS programs appeared first on The SAS Dummy.

11
Jan

How to list your mapped drives in a SAS program

If you work in a team environment, you might be accustomed to using mapped network drives for source data folders or to publish results. If you've recently moved to a SAS server environment, you might not have those mapped drives available. How can you tell?

This question was posted on the SAS Support Communities, and SAS Communities member Patrick provided a handy code snippet that does the trick. Patrick learned the trick himself from SAS Note 24818.

The following code uses a special DRIVEMAP device for the

filename diskinfo DRIVEMAP;
data _null_;
  infile diskinfo;
  input;
  put _infile_;
run;

Here's the output on my PC, where I have a two mapped network drives, U: and W:. (C: and D: are part of my laptop device.)

NOTE: The infile DISKINFO is:
      Drivemap Access Device,
      PROCESS,RECFM=V,LRECL=32767
C:
D:
U:
W:

Mapped drives are typically available only from your local machine. Even if you use SAS on a remote Windows server, the process that defines the mapped drive aliases is usually not triggered when you connect. If this is the case for you, you'll have to adjust your process to use UNC path notation (example: "\serverfolderproject" instead of the familiar "P:project" shortcut). (Additional note for SAS admins: on a Windows SAS server,

The post How to list your mapped drives in a SAS program appeared first on The SAS Dummy.

19
Dec

Running SAS programs in batch under Unix/Linux

Running SAS programs in batchWhile SAS program development is usually done in an interactive SAS environment (SAS Enterprise Guide, SAS Display Manager, SAS Studio, etc.), when it comes to running SAS programs in a production or operations environment, it is routinely done in batch mode.

Why run SAS programs in batch mode?

First and foremost, this is done for automation, as the batch process does not require human participation at the time of run. It can be scheduled to run (using Operating System scheduler or other scheduling software) while we sleep, at any time of the day or at any time interval between two consecutive runs.

Running SAS programs in batch mode allows streamlining SAS processing by eliminating the possibility of human error, submitting multiple SAS jobs (programs) all at once or in a sequence securing programs and/or data dependencies.

SAS batch processing also takes care of self-documenting, as it automatically generates and stores SAS logs and outputs.

Imagine the following scenario. Every night, a SAS batch process “wakes up” at 3 a.m. and runs an ETL process on a SAS Application server that extracts multiple tables from a database, transforms, combines, and loads them into a SAS datamart; then moves some data tables across the network and loads them into SAS LASR server, so when you are back to work in the morning your SAS Visual Analytics application has all its data refreshed and ready to roll. Of course, the process schedule can be custom-tailored to your particular needs; your batch jobs may run every 15 minutes, once a week, every first Friday of the month – you name it.

What is a batch script file?

To submit a single SAS program in batch mode manually, you could submit an OS command that looks something like the following:

Unix/Linux

sas /sas/code/proj1/job1.sas -log /sas/code/proj1/job1.log

DOS/Windows

"C:Program FilesSASHomeSASFoundation9.4Sas.exe" -SYSIN c:proj1job1.sas -NOSPLASH -ICON -LOG c:proj1job1.log

However, submitting an OS command manually has too many drawbacks: it’s too much typing, it only submits one SAS program at a time, and most importantly – it is manual, which means it is prone to human error.

Usually, these OS commands are packaged into so called batch files (shell scripts in Unix) that allow for sequential, parallel, as well as conditional execution of multiple OS line commands. They can be run either manually, or automatically – on schedule, or called by other batch scripts.

In a Windows/DOS Operating System, these script files are called batch files and have .bat filename extensions. In Unix-like operating systems, such as Linux, these script files are called shell scripts and have .sh filename extensions.

Since Windows batch files are similar, but slightly different from the Unix (and its open source cousin Linux) shell scripts, in the below examples we are going to use Unix/Linux shell scripts only, in order to avoid any confusion. And we are going to use terms Unix and Linux interchangeably.

Here is the typical content of a Linux shell script file to run a single SAS program:

#!/usr/bin/sh
dtstamp=$(date +%Y.%m.%d_%H.%M.%S)
pgmname="/sas/code/project1/program1.sas"
logname="/sas/code/project1/program1_$dtstamp.log"
/sas/SASHome/SASFoundation/9.4/sas $pgmname -log $logname

Note, that the shell script syntax allows for some basic programming features like current datetime function, formatting, and variables. It also provides some conditional processing similar to “if-then-else” logic. For detailed information on the shell scripting language you may refer to the following BASH shell script tutorial or any other source of many dialects or flavors of the shell scripting (C Shell, Korn Shell, etc.)

Let’s save the above shell script as the following file:
/sas/code/project1/program1.sh

How to submit a SAS program via Unix script

In order to run this shell script we would submit the following Linux command:
/sas/code/project1/program1.sh

Or, if we navigate to the directory first:
cd /sas/code/project1

then we can submit an abbreviated Linux command
./program1.sh
When run, this shell script not only executes a SAS program (program1.sas), but for every run it also creates and saves a uniquely named SAS Log file. You may create the SAS log file in the same directory where the SAS code is stored, as specified in the script shell above, or specify another directory of your choice.

For example, it creates the following SAS log file:
/sas/code/project1/program1_2017.12.06_09.15.20.log

The file name uniqueness is achieved by adding a date/time stamp suffix between the SAS program name and .log file name extension, in this particular case indicating that this SAS log file was created on December 6, 2017, at 09:15:20 (hours:minutes:seconds).

Unix script for submitting multiple SAS programs

Unix scripts may contain not only OS commands, but also other Unix script calls. You can mix-and-match OS commands and other script calls.

When scripts are created for each individual SAS program that you intend to run in a batch, you can easily combine them into a program flow by creating a flow script containing those single program scripts. For example, let’s create a script file /sas/code/project1/flow1.sh with the following contents:

/sas/code/project1/program1.sh
/sas/code/project1/program2.sh
/sas/code/project1/program3.sh

When submitted as

/sas/code/project1/flow1.sh

it will sequentially execute three scripts - program1.sh, program2.sh, and program3.sh, each of which will execute the corresponding SAS program - program1.sas, program2.sas, and program3.sas, and produce three SAS logs - program1.log, program2.log, and program3.log.

Unix script file permissions

In order to be executable, UNIX script files must have certain permissions. If you create the script file and want to execute it yourself only, the file permissions can be as follows:

-rwxr-----, or 740 in octal representation.

This means that you (the Owner of the script file) have Read (r), Write (w) and Execute (x) permission as indicated by the green highlighting; Group owning the script file has only Read (r) permission as indicated by yellow highlighting;  Others have no permissions to the script file at all as indicated by red highlighting.

If you want to give yourself (Owner) and Group execution permissions then your script file permissions can be as:

-rwxr-x---, or 750 in octal representation.

In this case, your group has Read (r) and Execute (x) permissions as highlighted in yellow.

In Unix, file permissions are assigned using the chmod Unix command.

Note, that in both examples above we do not give Others any permissions at all. Remember that file permissions are a security feature, and you should assign them at the minimum level necessary.

Conditional execution of scripts and SAS programs

Here is an example of a Unix script file that allows running multiple SAS programs and OS commands at different times.

#!/bin/sh

#1 extract data from a database
/sas/code/etl/etl.sh

>#2 copy data to the Visual Analytics autoload directory
scp -B userid@sasAPPservername:/sas/data/*.sas7bdat userid@sasVAservername:/sas/config/.../AutoLoad

#3 run weekly, every Monday
dow=$(date +%w)
if [ $dow -eq 1 ]
then
   /sas/code/alerts_generation.sh
fi

#4 run monthly, first Friday of every month
dom=$(date +%d)
if [ $dow -eq 5 -a $dom -le 7 ]
then
   /sas/code/update_history.sh
   /sas/code/update_transactions.sh
fi

In this script, the following logical operators are used: -eq (equal), -le (less or equal), -a (logical and).

As you can see, the script logic takes care of branching to execute different SAS programs when certain timing conditions are met. With such an approach, you would need to schedule only this single script to run at a specified time/interval, say daily at 3 a.m.

In this case, the script will “wake up” every morning at 3 a.m. and execute its component scripts either unconditionally, or conditionally.

If one of the included programs needs to run at a different, lesser frequency (e.g. every Monday, or monthly on first Friday of every month) the script logic will trigger those executions at the appropriate times.

In the above script example steps #1 and #2 will execute every time (unconditionally) the script runs (daily). Step #1 runs ETL program to extract data from a database, step #2 copies the extracted data across the network from SAS Application server to the SAS LASR Analytic server’s drop zone from where they are automatically loaded (autoloaded) into the LASR.

Step #3 will run conditionally every Monday ( $dow -eq 1). Step #4 will run conditionally every first Friday of a month ($dow -eq 5 -a $dom -le 7).

For more information on how to format date for use in shell scripts please refer to this post.

Do you run your SAS programs in batch?

Please share your batch experiences in the comment section below. I am sure the rest of us will really appreciate it!

Running SAS programs in batch under Unix/Linux was published on SAS Users.

13
Dec

How to get the current TITLE in SAS

The SAS language is large. Even after 20+ years of using SAS, there are many features that I have never used. Recently it became necessary for me to learn about DICTIONARY tables in PROC SQL (and the associated SASHELP views) because I needed to programmatically obtain the text for the current value of the system title in SAS. I had heard a lot about DICTIONARY tables, but this was my first time using them in a program.

This article discusses DICTIONARY tables and shows how to use the DICTIONARY.Titles table to obtain the current value of titles and footnotes in SAS.

DICTIONARY tables and titles

DICTIONARY tables are documented in the PROC SQL documentation. They are special read-only PROC SQL tables that contain information about the current state of SAS, including the state of libraries, data sets, and system options. The documentation lists all DICTIONARY tables, and I determined that I needed to look at the DICTIONARY.Titles table in PROC SQL or (if I needed to use another SAS procedure) the SASHELP.VTitle view, which contains the same information.

After finding out which table to use, I wanted to display the contents of the table. The following call to PROC SQL displays the structure of the table (names and types of variables) and the contents. Equivalently, you can use PROC CONTENTS and PROC PRINT to show similar information for the view SASHELP.VTitle. The output shows the table for a new SAS session:

proc sql;
describe table Dictionary.Titles;         /* writes to SAS log */
select * from Dictionary.Titles;          /* display table */
quit;
 
proc contents data=Sashelp.VTitle;  run;  
proc print data=Sashelp.VTitle;     run;

The table contains three variables:

  • The TYPE variable is a one-character variable with the values 'T' (for title) or 'F' (for footnote).
  • The NUMBER variable is a numeric variable. The value '1' indicates the value of the TITLE1 or FOOTNOTE1 global statements. The value '2' indicates the value of the TITLE2 or FOOTNOTE2 statements, and so on.
  • The TEXT variable is a 256-character variable that contains the value of a title or footnote. The output shows that when you first start SAS, the TITLE1 statement is set to "The SAS System."

You can run an example to see how the contents of the view change after you submit TITLE and FOOTNOTE statements:

title "Normal Distribution";    /* alias for TITLE1 statement */
title2 "mu = 0; sigma = 1;";
footnote "N = 100";             /* alias for FOOTNOTE1 statement */
 
proc print data=Sashelp.VTitle;  run;

Putting a title into a macro variable

The structure of the DICTIONARY.Titles table implies that you can use a WHERE clause to subset the table. For example, the clause WHERE Type="T" & Number=1 selects only the row for the TITLE1 statement. Similarly, the clause WHERE Type="F" & Number=2 selects the row for the FOOTNOTE2 statement. You can use the SELECT INTO :MacroVar statement in PROC SQL to put data into a macro variable, as follows:

PROC SQL noprint;
select Text into :TitleText TRIMMED  /* put the trimmed value into a macro */
  from Dictionary.Titles
  where Type="T" & Number=1;
quit;
%put &=TitleText;
TITLETEXT=Normal Distribution

You need to be a little careful when using this technique in production code. If you ask for the TITLE2 information when that title is not set, then the WHERE clause will return an empty table. For example, if you clear the TITLE1 statement and rerun the previous PROC SQL statement you will see that the SAS log displays NOTE: No rows were selected and value of the TitleText macro is not updated.

One way to handle this potential problem is to use the %LET statement to set the macro variable to an empty value before you call PROC SQL. If the macro variable is empty after PROC SQL runs, then the requested title or footnote is not set. The following macro uses this technique to set a macro variable to the value of the Nth title, where you can specify the parameter N:

/* Get the N_th title into the &TitleText macro variable */
%macro GetTitle(Number=1);
%global TitleText;
%let TitleText = ;           /* value is empty if TITLEn does not exist */
PROC SQL noprint;
select Text into :TitleText TRIMMED    /* value is set if TITLEn exists */
  from Dictionary.Titles
  where Type="T" & Number=&Number;
quit;
%mend;
 
/* test the %GetTitle macro */
title "Lognormal Distribution";   /* set TITLE1; clear TITLE2 */
%GetTitle();
%put Title1 = "&TitleText";      /* text of title1 */
%GetTitle(Number=2);
%put Title2 = "&TitleText";      /* text of title1 */
Title1 = "Lognormal Distribution"
Title2 = ""

This basic macro is sufficient for my purposes. Feel free to propose improvements in the comments. Also, let me know how you use DICTIONARY tables in your work.

If you would like to learn more about DICTIONARY tables, the following two references will get you started. Many papers have been written about DICTIONARY tables and views. An internet search of the form
sas proceedings "dictionary table"
will reveal some of the papers from SAS conferences.

The post How to get the current TITLE in SAS appeared first on The DO Loop.

Back to Top