18
Aug

Making the sas.servers script look pretty

If you are a SAS administrator managing an environment on UNIX or z/OS, you probably use the sas.servers script on a regular basis. As you know, one of its uses is to display the current status of all servers. Is the output accurate? Absolutely. Is it easy to read? Relatively. Is it visually attractive? Not so much.

Human beings are visual creatures. Conventional wisdom says that a picture speaks a thousand words. However, I am not using images to improve the output, but another powerful tool: color. According to this Xerox paper, color captures attention and enhances productivity. It can improve search time, reduce errors, and increase comprehension. As a result, this blog post provides the steps for applying color and an easy-to-read layout to make the sas.servers script look cute and even fun!

Preliminary Steps

Before we begin, it is important to give you some recommendations:

  1. Stop all SAS services.
  2. Back up the files: sas.servers, sas.servers.mid, and sas.servers.pre.
  3. Apply these changes to a Development or Testing environment.

As a matter of fact, I consider steps 1 and 3 optional. This script is only used by the SAS platform to start, stop, restart, or check the status of the servers. Moreover, if you follow all steps carefully, you can apply these changes safely in a Production environment. In contrast, step 2 is essential; it is always good practice to back up key files before modifying them. If you ever need to roll back, you can restore them quickly and easily.

Things to Consider

The About the sas.servers Script section from the SAS 9.4 Intelligence Platform: System Administration Guide provides a caution message: "You should not directly update the sas.servers script." In our case, the type of customization we are about to perform requires a manual update. Don't worry, your script is in good hands.

Furthermore, if you ever need to update the sas.servers script because you want to add or remove a server, you have to run the generate_boot_scripts.sh script to regenerate this file. After doing so, you will lose all the changes made in this post. Keep this in mind, and back up the current files before attempting that task.

Easy-to-read Layout

Let's jump into the details. First of all, let's define the new layout of the desired output. Grab your favorite text editor. I use Notepad++ at the office, and Sublime Text at home. Go to your /SASCONFIG/Lev1/ directory and run a regular status command as the sas user. Assuming you followed the preliminary steps, and depending on the products installed and the number of SASServer instances deployed, you should see output similar to this:

./sas.servers status
SAS servers status:
SAS Web Infrastructure Data Server is NOT up
SAS Metadata Server 1 is NOT up
SAS Object Spawner 1 is NOT up
SAS DIP Job Runner 1 is NOT up
SAS Information Retrieval Studio Server is NOT up
SAS JMS Broker is NOT up
SAS Cache Locator Service ins_41415 is NOT up
SAS Web Server is NOT up
SAS Web App Server SASServer1_1 is NOT up
SAS Environment Manager is NOT up
SAS Environment Manager Agent is NOT up

Copy that output, except the first line, and paste it into your text editor. Here you can modify the output according to your taste. What I did was identify the longest line, which is SAS Information Retrieval Studio Server, add four spaces to the right, and replace the legend "is NOT up" with "[DOWN]". Likewise, I applied these changes to the rest of the servers, removed the instance numbers, and kept them all aligned:

SAS Web Infrastructure Data Server         [DOWN]
SAS Metadata Server                        [DOWN]
SAS Object Spawner                         [DOWN]
SAS DIP Job Runner                         [DOWN]
SAS Information Retrieval Studio Server    [DOWN]
SAS JMS Broker                             [DOWN]
SAS Cache Locator Service ins_41415        [DOWN]
SAS Web Server                             [DOWN]
SAS Web App Server SASServer1_1            [DOWN]
SAS Environment Manager                    [DOWN]
SAS Environment Manager Agent              [DOWN]

I chose the pair [ UP ]/[DOWN] to reflect the status because I wanted the look and feel of the CentOS 6 boot process. If you are or were a CentOS 6 user, you may remember that services display the legends [ OK ] or [FAILED] when booting. You are free to use other alternatives such as ACTIVE/INACTIVE or perhaps RUNNING/STOPPED, with or without brackets. Now that the layout is finished, let's move on to the color department.

All You Need Is Color

The fun finally arrived. Let's integrate the layout into our three scripts and add the main ingredient: color!

Again, I am assuming at this stage that you already backed up the files: sas.servers, sas.servers.mid, and sas.servers.pre. Next, open the sas.servers file with your vi editor and add this code anywhere at the top, below the block of comments:

#*****
# Custom Colors for status command
# RED   for [DOWN]
# GREEN for [ UP ]
# NC    for No Color
#*****
RED='\e[31m'
GREEN='\e[32m'
NC='\e[0m'

These lines create three variables with three different color codes: a RED variable with code 31, a GREEN variable with code 32, and a NC variable with the default color code. The definition of these variables is optional but recommended, since they help you debug problems or change colors more easily. If you prefer to use different colors or attributes, you can play with the codes as shown in Bash tips: Colors and formatting.
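
If you want to verify the escape codes before editing any scripts, here is a quick sanity check you can run at a Bash prompt; it simply reuses the variables defined above:

RED='\e[31m'
GREEN='\e[32m'
NC='\e[0m'
# UP should print in green and DOWN in red
echo -e "SAS Metadata Server                        [ ${GREEN}UP${NC} ]"
echo -e "SAS Object Spawner                         [${RED}DOWN${NC}]"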

Considering that the sas.servers script does not contain all servers, you have to add the same code to the sas.servers.mid and sas.servers.pre files.

First Example

Now that we have defined the three variables in all the necessary files, let's use them. I'll show you how to apply them to a couple of servers, and then you can replicate it to the rest. The first one is the SAS Metadata Server. Open the sas.servers script again and find these lines:

# SAS Metadata Server
SASMETA_WONT_START_OTHERS="The remaining SAS servers will NOT be started as a result."
 
SASMETA1_IS_UP="SAS Metadata Server 1 is UP"
SASMETA2_IS_UP="SAS Metadata Server 2 is UP"
SASMETA3_IS_UP="SAS Metadata Server 3 is UP"
SASMETA4_IS_UP="SAS Metadata Server 4 is UP"
SASMETA5_IS_UP="SAS Metadata Server 5 is UP"
 
SASMETA1_IS_DOWN="SAS Metadata Server 1 is NOT up"
SASMETA2_IS_DOWN="SAS Metadata Server 2 is NOT up"
SASMETA3_IS_DOWN="SAS Metadata Server 3 is NOT up"
SASMETA4_IS_DOWN="SAS Metadata Server 4 is NOT up"
SASMETA5_IS_DOWN="SAS Metadata Server 5 is NOT up"

Since I have a single Metadata Server instance, the only meaningful variables are SASMETA1_IS_UP and SASMETA1_IS_DOWN. Delete their values, copy the correct string from your text editor, and paste it in both variables. Fix them accordingly:

SASMETA1_IS_UP="SAS Metadata Server                        [ UP ]"
#more instances
 
SASMETA1_IS_DOWN="SAS Metadata Server                        [DOWN]"
#more instances

The final touch is to give color to our script. Use the GREEN and RED variables for the UP/DOWN statuses. It is important to include the NC variable at the end to remove all attributes:

SASMETA1_IS_UP="SAS Metadata Server                        [ ${GREEN}UP${NC} ]"
#more instances
 
SASMETA1_IS_DOWN="SAS Metadata Server                        [${RED}DOWN${NC}]"
#more instances

Second Example

The process is the same for all the servers defined in the sas.servers script. For the other two scripts it is a little different, but still quite easy. I am going to use the SAS Web Infrastructure Data Server as the second example. Open the sas.servers.pre script and look for the server_status() function. Pay attention to these lines:

       if [ $? -eq 0 ]; then
          # Server is already running
          echo "SAS Web Infrastructure Data Server is UP"
       else
          echo "SAS Web Infrastructure Data Server is NOT up"
       fi
    }
    else
      echo "SAS Web Infrastructure Data Server is NOT up"

A subtle difference from the previous example is the echo command. In the sas.servers script, there is a logmsg() function that uses the "echo -e" command behind the scenes. In this case, we have to explicitly add the -e option to enable the interpretation of backslash escapes. Let's also integrate the color and layout:

          echo -e "SAS Web Infrastructure Data Server         [ ${GREEN}UP${NC} ]"
       else
          echo -e "SAS Web Infrastructure Data Server         [${RED}DOWN${NC}]"
       fi
    }
    else
      echo -e "SAS Web Infrastructure Data Server         [${RED}DOWN${NC}]"

At this point, with the above examples, you should have a solid idea about the required changes to accomplish our goal. Now it is your turn to apply them to the rest of the servers.

Finished Product

If you followed this article in detail and performed the steps in all the required servers, your output should resemble mine:

Get the green light by running the start command:

Final Thoughts

I am a visual person with a curious mind. One of the things I like is to customize the tools I use the most, so I decided to make the sas.servers script output a little more attractive to my eyes. I hope you liked the result. If you are still not sure whether to implement this idea, suppose a couple of servers in your environment have problems and stop running. Which output would you rather look at? In which one is it easier to spot an issue? Let me know your thoughts in the comments below, or better yet, share your creative outputs!

Making the sas.servers script look pretty was published on SAS Users.

17
Aug

SAS Viya sharing credentials for database access

SAS Viya deployments use credentials for accessing databases and other third-party products that require authentication. In this blog post, I will look at how this sharing of credentials is implemented in SAS Environment Manager.

In SAS Viya, domains are used to store the:

  • Credentials required to access external data sources.
  • Identities that are allowed to use those credentials.

There are three types of domains:

  • Authentication: stores credentials that are used to access an external source; these can then be associated with a caslib.
  • Connection: used when the external database has been set up to require a User ID but no password.
  • Encryption: stores an encryption key required to read data at rest in a path assigned to a caslib.

In this blog post we will focus on authentication domains which are typically used to provide access to data in a database management system. It is a pretty simple concept; an authentication domain makes a set of credentials available to a set of users. This allows SAS Viya to seamlessly access a resource. The diagram below shows a logical view of a domain. In this example, the domain PGAuth stores the credentials for a Postgres database, and makes those credentials available to two groups (and their members) and three users.

How does this work when a user accesses data in a database caslib? The following steps are performed:

1.     Log on to SAS Viya using personal credentials: the user’s identity is established including group memberships.

2.     Access a CASLIB for a database: using the user’s identity and the authentication domain of the CASLIB, Viya will look up the credentials associated with that identity in the domain.

3.     Two results are possible. A credential match is either:

  • Found: the credentials are passed to the database authentication provider to determine access to the data.
  • Not found: no access to the data is provided.

To manage domains in SAS Environment Manager you must be an administrator. In SAS Environment Manager select Security > Domains. There are two views available:  Domains and Credentials. The Domains view lists all defined domains. You can access the credentials for a domain by right-clicking on the domain and selecting Credentials.

The Credentials view lists all credentials defined and the domains for which they are associated.

Whichever way you get to a credential, you can edit it by right-clicking and selecting Edit. In the edit dialog, you can specify the Identities (users and groups) that can use the credential, and the User ID and Password of the credential. Note that only users who are already listed in the Identities field will be able to edit this field, so make sure you are included in it (directly or through group membership) prior to saving.

To use an authentication domain, you reference it in the CASLIB definition. When defining a non-path based CASLIB you must select a domain to provide user credentials to connect to the database server. This can be done when creating a new CASLIB in SAS Environment Manager in the Data > Libraries area.

If you use code to create or access your caslib, use the authenticationdomain option. In this example, we specify authenticationdomain in the table.addcaslib action.
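
For illustration, here is a minimal PROC CAS sketch (not the post's exact example): the caslib name, server, and database values are hypothetical placeholders, and only authenticationDomain refers to the PGAuth domain described above.

cas mysess;
 
proc cas;
   /* Create a Postgres caslib whose credentials are looked up
      in the PGAuth authentication domain */
   table.addCaslib /
      name="pgdata"
      dataSource={srcType="postgres",
                  authenticationDomain="PGAuth",
                  server="pgserver.example.com",
                  database="sales"};
quit;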

If a user is not attached to the authentication domain directly, or through a group membership, they will not be able to access the credentials. An error will occur when they attempt to access the data.

This has been a brief look at storing and using credentials to access databases from SAS Viya. You can find more detail in the SAS Viya Administration Guide.

SAS Viya sharing credentials for database access was published on SAS Users.

16
Aug

Load Data into SAS Viya via REST API

In this blog post I am going to cover an example of importing data into SAS Viya using Cloud Analytic Services (CAS) actions via the REST API. This means you can import data into a CASLib outside of the SAS Self-Service Import user interface. Once this data is loaded into CAS, it is available for use in applications such as SAS Visual Analytics and SAS Visual Data Builder.

Introduction

To import data into SAS Viya via REST API, you need to make a series of REST API calls:

1.     Start CAS Session
2.     Load Data into a CASLib
3.     End CAS Session

I will walk through these various REST API calls in the sections below using the REST API testing application HTTPRequestor, which is a free add-on to the Mozilla Firefox browser.

Before I perform any of my REST API calls, I need to Base-64 encode my credentials. The input for encoding is the user ID and password in the form userID:password. I used the site https://www.base64encode.org/ to encode my credentials. Note: You can use other methods (e.g., Python) to encode your credentials. Use the method preferred by your organization to ensure you are meeting its security protocols.
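
For example, here is one way to do the encoding with Python from a shell prompt (myuser:mypassword is a placeholder):

# Base-64 encode user:password credentials (placeholder values)
python -c "import base64; print(base64.b64encode(b'myuser:mypassword').decode())"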

Below is the header Authorization information I will be sending with each of my requests.

Authorization Header

1.     Start CAS Session

First, I need to start a CAS Session. Below is an example request for starting a CAS Session:

POST https://<YourCASServer:Port>/cas/sessions

Authorization: Basic <Base-64EncodedCredentials>
 Content-Type: application/json

{}

This request returns the CASSessionUUID needed in the next step.

I construct my request in HTTPRequestor as follows and submit the request:

Start CAS Session Request/Response

Here is a screenshot of the raw transaction information.

Start CAS Session Raw Transaction

I need to copy the CAS Session UUID information that was returned for use in the subsequent REST API calls since their CAS Actions must be performed within a CAS Session.

2.     Load Data into a CASLib

Now that I have started my CAS session and have its UUID, I can load the table to CAS. Below is an example request for the table.loadTable CAS Action:

POST https://<YourCASServer:Port>/cas/sessions/<CASSessionUUID>/actions/table.loadTable

Authorization: Basic <Base-64EncodedCredentials>
 Content-Type: application/json

{"casLib":"<InputCASLib>","importOptions":{"fileType":"<FileType>"},"path":"<InputFilePathAndName>",
 "casout":{"caslib":"<OutputCASLib>","name":"<OutputTableName>","promote":true}}

 

This request returns a log message: “NOTE: Cloud Analytic Services made the file <InputFilePathAndName> available as table <OutputTableName> in caslib <OutputCASLib>.”

For my example, I will load the SAS data set BASEBALL located in the helpdata CASLib to the Public CASLib and call the CAS Table SAS_BASEBALL.  I am copying the data to the Public CASLib to make it more readily available to all CAS users. Let’s first confirm that the SAS_BASEBALL table does not currently exist in the Public CASLib.

Public CASLib Before LoadTable CAS Action Called

I construct my request in HTTPRequestor as follows and submit the request:

Load Table Request/Response

Here is a screenshot of the raw transaction information.

Load Table Raw Transaction

Next, I will confirm that the SAS_BASEBALL data set is now loaded in the Public CASLib.

Public CASLib After LoadTable CAS Action Called

The SAS_BASEBALL data set is now available for use in applications such as SAS Visual Analytics and SAS Visual Data Builder.

3.     End CAS Session

Finally, I need to terminate my CAS Session. Below is an example request for the session.endSession CAS Action:

POST https://<YourCASServer:Port>/cas/sessions/<CASSessionUUID>/actions/session.endSession

Authorization: Basic <Base-64EncodedCredentials>
 Content-Type: application/json

{}

 

This request returns a status of 0 indicating there was no error and the CASSessionUUID specified in the request has ended.

I construct my request in HTTPRequestor as follows and submit the request:

End CAS Session Request/Response

Here is a screenshot of the raw transaction information.

End CAS Session Raw Transaction

Conclusion

These calls can be strung together so that you can schedule their execution. For more information on SAS Viya and REST APIs, refer to the SAS Cloud Analytics REST API documentation.
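
As a rough sketch of what stringing the calls together could look like, the Bash script below chains them with curl. The server URL, credentials, and file details are placeholders; it also assumes the session UUID is returned in the session field of the JSON response (as seen in the raw transactions above) and that jq is available to parse it:

#!/bin/bash
CASSERVER="https://yourcasserver:8777"                        # placeholder server:port
AUTH="Authorization: Basic REPLACE_WITH_BASE64_CREDENTIALS"   # placeholder credentials
 
# 1. Start a CAS session and capture its UUID
UUID=$(curl -s -X POST "$CASSERVER/cas/sessions" \
       -H "$AUTH" -H "Content-Type: application/json" -d '{}' \
       | jq -r '.session')
 
# 2. Load the table into the Public caslib (file details are placeholders)
curl -s -X POST "$CASSERVER/cas/sessions/$UUID/actions/table.loadTable" \
     -H "$AUTH" -H "Content-Type: application/json" \
     -d '{"casLib":"helpdata","importOptions":{"fileType":"basesas"},"path":"baseball.sas7bdat","casout":{"caslib":"Public","name":"SAS_BASEBALL","promote":true}}'
 
# 3. End the CAS session
curl -s -X POST "$CASSERVER/cas/sessions/$UUID/actions/session.endSession" \
     -H "$AUTH" -H "Content-Type: application/json" -d '{}'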

Load Data into SAS Viya via REST API was published on SAS Users.

14
Aug

CAS data modeling for performance

This post covers the CAS physical data model: what features CAS offers for data storage, and how to use them to maximize performance in CAS (and consequently in SAS Visual Analytics 8.1, too).

So, specifically let’s answer the question:

What CAS physical table storage features can we use to get better performance in CAS and SAS Visual Analytics/CAS?

CAS Physical Table Storage Features

The following data storage features affect how CAS tables are physically structured:

  • Compression
  • Partitioning
  • Sorting
  • Repeated Tables
  • Extended Data Types (Varchar)
  • User Defined Formats

Compression — the Storage Option that Degrades Performance

Compression trades processing speed for memory savings: a compressed table occupies less space in CAS, but its rows must be decompressed every time they are processed, so most operations against it run slower. Compression is enabled with the COMPRESS= data set option:

data public.MegaCorp (compress=yes);
   set baselib.MegaCorp;
run;

Partitioning and Sorting

Partitioning is a powerful tool for improving the performance of BY-group operations. Many analytic and reporting objects (e.g., Bar Charts, Decision Trees, Linear Regressions) provide grouping as well as classification functionality.

When performing analyses/processing, CAS first groups the data into the required BY-groups. Pre-partitioning on commonly-used BY-groups means CAS can skip this step, vastly improving performance.

Within partitions, tables can be sorted by non-partition-key variables. Pre-sorting by natural ordering variables (e.g. time) allows CAS to skip the ordering step in many cases just like partitioning allows CAS to skip the grouping step.

For a full use-case, consider a line graph that groups sales by region and plots by date. This graph object would benefit greatly from a CAS table that is pre-partitioned by region and pre-sorted by date.
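
Using the same data set option syntax demonstrated later in this post, a sketch for that use case might look like this (table and column names are hypothetical):

/* Pre-partition by region and pre-sort by date for the line graph */
data mycas.sales (partition=(region) orderby=(saledate));
   set casorbase.sales;
run;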

Join Optimization

Partitioning can also support join operations, since both the DATA step MERGE and the CAS FedSQL merge join algorithms utilize BY-GROUP operations to support their processing.

Pre-partitioning tables in anticipation of joins will greatly improve join performance. A good use case is partitioning both a large transaction table and an equally large reference table (e.g. an enormous Customer table) by the common field, customerID. When a DATA Step MERGE or a FedSQL join is performed between the two tables on that field, the join/merge will take advantage of partitioning for the BY-GROUP operation resulting in something similar to a partition-wise join.

Like Compression, partitioning and sorting can be implemented via CAS actions as well as data set options. Using the data set options is demonstrated below:

data mycas.bigOrderTable (partition=(region division) orderby=(year quarter month));
   set CASorBase.bigOrderTable;
run;

Repeated Tables

By default, in distributed CAS Server deployments, CAS divides incoming tables into blocks and distributes those blocks among its worker nodes. A repeated table, by contrast, is copied in full to every worker node; it is created with the DUPLICATE data set option or the REPEAT option on the PROC CASUTIL LOAD statement. Repeated tables have two main use-cases in CAS:

1.     Join Optimization
2.     Small Table Operation Optimization

Join Optimization

For join operations, the default data distribution scheme can result in significant network traffic as matching records from the two tables travel between worker nodes to meet. If one of the two tables was created with the DUPLICATE/REPEAT option, then every possible record from that table is available on every node to the other table. There is no need for any network traffic.

Small Table Operation Optimization

For small tables, even single table operations can perform better with repeated instead of divided distribution. LASR actually implemented the “High Volume Access to Smaller Tables” feature for the same reason. When a table is repeated, CAS runs any required operation on a single worker node against the full copy of the table that resides there, instead of distributing the work.

As stated, repeated tables can be implemented with the DUPLICATE data set option; they can also be implemented with the REPEAT option on the PROC CASUTIL LOAD statement. The CASUTIL method is shown below:

proc casutil ;
   load data=sashelp.prdsale outcaslib="caspath"
           casout="prdsale" replace REPEAT ;
quit ;

Extended Data Types (VARCHAR)

With Viya 3.2 comes SAS’ first widespread implementation of variable-length character fields. While Base SAS offers variable-length character fields through compression, Viya 3.2 is the first major SAS release to include a true VARCHAR data type. Not only does VARCHAR save storage space, it also improves performance by reducing the size of the record being processed. CAS, like any other processing engine, will process narrower records more quickly than wide records.

User Defined Formats

User defined formats (UDFs) exist in CAS in much the same way they do in Base SAS. Their primary function, of course, is to provide display formatting for raw data values. Think about a format for direction. The raw data might be: “E”, “W”, “N”, “S” while the corresponding format values might be “East”, “West”, “North”, “South.”
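
As a sketch, the direction format above could be defined and published to a CAS session like this (the session and format library names are arbitrary examples):

cas mysess;
 
/* Define the direction format and make it available to CAS */
proc format casfmtlib="userformats";
   value $dir 'E'='East' 'W'='West' 'N'='North' 'S'='South';
run;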

So how might user defined formats improve performance in CAS? The same way they do in Base SAS, and the same way that VARCHAR does: by reducing the size of the record that CAS has to process. Imagine replacing multiple 200-byte description fields with 1-byte codes. If you had 10 such fields, the record length would decrease by 1,990 bytes ((10 X 200) – (10 X 1)). This is an extreme example, but it illustrates the point: user defined formats can reduce the amount of data that CAS has to process and, consequently, will lead to performance gains.

CAS data modeling for performance was published on SAS Users.

5
Aug

Try it - you'll like it...

It's a common mantra many parents use to encourage their children to expand their food choices and try something new. Even as adults we’re often more comfortable easing into the unfamiliar, taking small bites, tiny samples, even dipping a toe in the water before diving in headfirst.

Software is no different.

We’ve introduced trials of our latest products – and we encourage you to test them out, see what they are like and consider how they could benefit you and your organization.

SAS is currently offering a free trial of SAS Viya products, the latest evolution of the SAS platform. SAS Viya is a new engine that drives fast analytic answers for more data types, introduces new products, and is accessible to those who code in SAS, use other coding languages, or prefer an interactive interface. Extending SAS 9, customers are enjoying the benefits: running existing code faster, adding new machine learning methods into existing SAS analyses, and solving entirely new business questions, all from one governed platform.

To help you learn more about this modernization of the SAS platform, we are now offering product trials at no cost and with no installation necessary. They even come with test data to deliver a complete experience. To start your free trial of SAS Viya products, just select ‘Try It for Free’ at www.sas.com/viya and choose your analytics trial experience. You’ll use your existing SAS profile – or create one if you don’t already have one – to register and get started. Within a couple of minutes after we receive your request, we’ll send you a link with instructions, videos and step-by-step scripts you can follow to sign onto your SAS-hosted image. All you need is a web browser and a little curiosity and you’re on your way to expanding your analytics arsenal.

So, what do you say? Try it… we know you’ll like it.

Try it - you'll like it... was published on SAS Users.

2
Aug

CALL EXECUTE made easy for SAS data-driven programming

In my prior posts (Data-driven SAS macro loops, Modifying variable attributes in all datasets of a SAS library, Automating the loading of multiple database tables into SAS tables), I presented various data-driven applications using SAS macro loops.

However, macro loops are not the only tools available in SAS for developing data-driven programs.

CALL EXECUTE is one of them. The CALL EXECUTE routine accepts a single argument that is a character string or character expression. The character expression is usually a concatenation of strings containing SAS code elements to be executed after they have been resolved. Components of the argument expression can be character constants, DATA step variables, macro variable references, as well as macro references. CALL EXECUTE dynamically builds SAS code during DATA step iterations; that code executes after the DATA step’s completion, outside its boundary. This makes a DATA step iterating through a driver table an effective SAS code generator, similar to a SAS macro loop.

However, the rather peculiar rules of the CALL EXECUTE argument resolution may make its usage somewhat confusing. Let’s straighten things out.

Argument string has no macro or macro variable reference

If an argument string to the CALL EXECUTE contains SAS code without any macro or macro variable references, that code is simply pushed out (of the current DATA step) and appended to a queue after the current DATA step. As the DATA step iterates, the code is appended to the queue as many times as there are iterations of the DATA step. After the DATA step completes, the code in the queue gets executed in the order of its creation (First In First Out).

The beauty of this process is that the argument string can be a concatenation of character constants (in single or double quotes) and SAS variables which get substituted with their values by CALL EXECUTE for each DATA step iteration. This will produce data-driven, dynamically generated SAS code just like an iterative SAS macro loop.

Let’s consider the following example. Say we need to load multiple Oracle tables into SAS tables.

Step 1. Creating a driver table

In order to make our process data-driven, let’s first create a driver table containing a list of the table names that needed to be extracted and loaded:

/* create a list of tables to extract & load */
libname parmdl '/sas/data/parmdata';
data parmdl.tablelist;
        length tname $8;
        input tname;
        datalines;
ADDRESS
ACCOUNT
BENEFIT
FINANCE
HOUSING
;

This program runs just once to create the driver table parmdl.tablelist.

Step 2. Loading multiple tables

Then, you can use the following data-driven program that runs each time you need to reload Oracle tables into SAS:

/* source ORACLE library */
libname oralib oracle path="xxx" schema="yyy" user="uuu"
 	PASSWORD="{SAS002}ABCDEFG12345678RTUR" access=readonly;
 
/* target SAS library */
libname sasdl '/sas/data/appdata';
 
/* driver table SAS library */
libname parmdl '/sas/data/parmdata';
 
data _null_;
   set parmdl.tablelist;
   call execute(cats(
      'data sasdl.',tname,';',
         'set oralib.',tname,';',
      'run;'));
run;

In order to concatenate the components of the CALL EXECUTE argument I used the cats() SAS function which returns a concatenated character string removing leading and trailing blanks.

When this program runs, the SAS log indicates that after the data _null_ step the following statements are added and executed:

NOTE: CALL EXECUTE generated line.
1   + data sasdl.ADDRESS;set oralib.ADDRESS;run;
2   + data sasdl.ACCOUNT;set oralib.ACCOUNT;run;
3   + data sasdl.BENEFIT;set oralib.BENEFIT;run;
4   + data sasdl.FINANCE;set oralib.FINANCE;run;
5   + data sasdl.HOUSING;set oralib.HOUSING;run;

In this example we use data _null_ step to loop through the list of tables (parmdl.tablelist) and for each value of the tname column a new data step gets generated and executed after the data _null_ step. The following diagram illustrates the process:

Diagram explaining CALL EXECUTE for SAS data-driven programming

Argument string has macro variable reference in double quotes

If an argument to the CALL EXECUTE has macro variable references in double quotes, they will be resolved by the SAS macro pre-processor during the DATA step compilation. Nothing unusual. For example, the following code will execute exactly as the above, and macro variable references &olib and &slib will be resolved to oralib and sasdl before CALL EXECUTE takes any actions:

%let olib = oralib;
%let slib = sasdl;
 
data _null_;
   set parmdl.tablelist;
   call execute (
      "data &slib.."!!strip(tname)!!';'
         "set &olib.."!!strip(tname)!!';'!!
      'run;'
   );
run;

Argument string has macro or macro variable reference in single quotes

Here comes the interesting part. If the argument to CALL EXECUTE has macro or macro variable references in single quotes, they still will be resolved before the code is pushed out of the DATA step, but not by the SAS macro pre-processor during the DATA step compilation as it was in the case of double quotation marks. Macro or macro variable references in single quotes will be resolved by CALL EXECUTE itself. For example, the following code will execute exactly as the above, but macro variable references &olib and &slib will be resolved by CALL EXECUTE:

%let olib = oralib;
%let slib = sasdl;
 
data _null_;
   set parmdl.tablelist;
   call execute('data &slib..'!!strip(tname)!!';'!!
                'set &olib..'!!strip(tname)!!';'!!
                'run;'
               );
run;

Timing considerations

CAUTION: If your macro contains some non-macro language constructs for assigning macro variables during run time, such as a CALL SYMPUT or SYMPUTX statement (in a DATA step) or an INTO clause (in PROC SQL), resolving those macro variable references by CALL EXECUTE will happen too soon, before your macro-generated code gets pushed out and executed. This will result in unresolved macro variables. Let’s run the following code that should extract Oracle tables into SAS tables as above, but also re-arrange column positions to be in alphabetical order:

%macro onetable (tblname);
   proc contents data=oralib.&tblname out=one(keep=name) noprint;
   run;
 
   proc sql noprint;
      select name into :varlist separated by ' ' from one;
   quit;
   %put &=varlist;
 
   data sasdl.&tblname;
      retain &varlist;
      set oralib.&tblname end=last nobs=n;
      if last then call symput('n',strip(put(n,best.)));
   run;
   %put Table &tblname has &n observations.;
%mend onetable;
 
data _null_;
   set parmdl.tablelist;
   call execute('%onetable('!!strip(tname)!!');');
run;

Predictably, the SAS log will show unresolved macro variable references, such as:

WARNING: Apparent symbolic reference VARLIST not resolved.
WARNING: Apparent symbolic reference N not resolved.
Table ADDRESS has &n observations.

SOLUTION: To avoid the timing issue when a macro reference gets resolved by CALL EXECUTE too soon, before macro variables have been assigned during macro-generated step execution, we can strip CALL EXECUTE of the macro resolution privilege. In order to do that, we can mask & and % characters using the %nrstr macro function, thus making CALL EXECUTE “macro-blind,” so it will push the macro code out without resolving it. In this case, macro resolution will happen after the DATA step where CALL EXECUTE resides. If an argument to CALL EXECUTE has a macro invocation, then including it in the %nrstr macro function is the way to go. The following code will run just fine:

data _null_;
   set parmdl.tablelist;
   call execute('%nrstr(%onetable('!!strip(tname)!!'));');
run;

When this DATA step runs, the SAS log indicates that the following statements are added and executed:

NOTE: CALL EXECUTE generated line.
1   + %onetable(ADDRESS);
2   + %onetable(ACCOUNT);
3   + %onetable(BENEFIT);
4   + %onetable(FINANCE);
5   + %onetable(HOUSING);

CALL EXECUTE argument is a SAS variable

The argument to CALL EXECUTE does not necessarily have to contain or be a character constant. It can be a SAS variable, a character variable to be precise. In this case, the behavior of CALL EXECUTE is the same as when the argument is a string in single quotes. It means that if a macro reference is part of the argument value it needs to be masked using the %nrstr() macro function in order to avoid the timing issue mentioned above.

In this case, the argument to the CALL EXECUTE may look like this:

arg = '%nrstr(%mymacro(parm1=VAL1,parm2=VAL2))';
call execute(arg);

Making CALL EXECUTE totally data-driven

In the examples above we used the tablelist driver table to retrieve values for a single macro parameter for each data step iteration. However, we can use a driver table not only to dynamically assign values to one or more macro parameters, but also to control which macro to execute in each data step iteration. The following diagram illustrates the process of totally data-driven SAS program:

Diagram explaining using CALL EXECUTE for SAS data-driven programming
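
As a minimal sketch of this idea, building on the arg variable technique above, a hypothetical control table can hold a complete, %nrstr-masked macro invocation per row, so each iteration launches a different macro with different parameters (the %loadandrank macro is made up for illustration):

data control;
   length arg $200;
   arg = '%nrstr(%onetable(ADDRESS))';            output;
   arg = '%nrstr(%loadandrank(BENEFIT, top=10))'; output;
run;
 
data _null_;
   set control;
   call execute(arg);  /* each masked invocation is pushed to the queue */
run;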

Conclusion

CALL EXECUTE is a powerful tool for developing dynamic data-driven SAS applications. Hopefully, this blog post demonstrates its benefits and clearly explains how to avoid its pitfalls and use it efficiently to your advantage. I welcome your comments, and would love to hear your experiences with CALL EXECUTE.

CALL EXECUTE made easy for SAS data-driven programming was published on SAS Users.

28
Jul

Controlling Stored Process Execution through Request Initialization Code Injection

Recently, I was working with a client who had a unique problem. He needed a way to cancel a stored process from executing in cases where the stored process wasn’t registered to the matching Metadata folder-structure for its selected server context. For example, if a stored process was stored under the DEV folder-structure in Metadata but attempted to be run on the PROD server context, the execution of the stored process would then need to be canceled and alert the user of the error.

Fortunately, this type of behavior with stored processes can be achieved by injecting SAS code that runs before the stored process executes via Request Initialization of the Logical Stored Process Server.

In this example, the client has two logical environments that reside within a single physical/metadata environment. The two logical environments are DEV and PROD. The client doesn’t want stored processes in DEV to run on PROD-associated servers.

Preliminary requirements

A stored process registered on one logical environment (e.g., DEV) won’t run on the server context of another logical environment (e.g., PROD).

The following steps show you how to set up a Request Initialization for a Logical Stored Process Server in order to cancel a stored process from executing on the PROD server, if it isn’t stored in PROD in metadata.

1) Write the desired injection code. In this case, the code checks if the stored process is registered with PROD and then either cancels the stored process or proceeds with normal execution.

/* This code checks macros variables populated by the currently executing stored process to discover the path in metadata where the currently executing stored processes is located. It then compares this metadata path with the expected path for the server context this code will be injected into. For example, the redacted line below might say “PROD” to signify the top-level. It cancels if the path is not a match; otherwise, execution continues normally. */
%macro CheckProdStpRegistration();
	%let stp_path =;
	%if %symexist(_METAFOLDER) %then %let stp_path = &_METAFOLDER;
	%else %if %symexist(_PROGRAM) %then %let stp_path = &_PROGRAM;
	%else %do;
		%let stp_path = INVALID_REGISTRATION;
		%put ERROR: The macro variables _METAFOLDER and _PROGRAM are not 
      defined. Cannot determine stored process registration.;
	%end;
 
	/* Check top-level filepath is what is expected in metadata PROD folders. */
%if ^(%scan(&stp_path,1,%str(/)) = Logical
AND %scan(&stp_path,2,%str(/)) = PROD) %then %do;
		%put ERROR: Only stored processes registered in PROD can be run on a
      PROD-based Stored Process Server. Canceling the stored
      process execution.;
 
		/* Cancel the stored process from running. */
		data _null_;
			abort cancel;
		run;
	%end;
%mend CheckProdStpRegistration;
 
%CheckProdStpRegistration();

In this example, the code checks the macro variables that are populated with running a stored process. After retrieving the metadata path where the stored process is stored, the code then checks that the first and second parts of the path are equivalent to “Logical/PROD”. This matches the logical PROD environment from above.

2) Now that we have our code, we can set it to run before every executed stored process via a Request Initialization under the Logical Stored Process Server options.

NOTE: After changing this property, you’ll have to refresh the Object Spawner so that the changes take hold.

In this example, we go into the properties for the Logical Stored Process Server associated with the PROD server context. In the properties, you can find the Request Initialization under the Options tab > Set Server Properties > Request tab. You then simply provide the path to the SAS program created in step 1.

3) Test a stored process located in metadata under the DEV folder structure and test a stored process located in metadata under the PROD folder structure. Ensure each stored process is set to execute against the Application server that the Request Initialization code was applied to (e.g., the PROD server context).

In this example, the Application Server selected is the one that the Request Initialization code was applied to. The stored process can then simply be moved from one location to another to test the process and ensure that a stored process not registered in PROD cannot be run on the PROD Logical Stored Process Server.

You should note that the part of the code essential to canceling the execution of a stored process is the “abort” statement with the “cancel” option. There is no other way to gracefully terminate a stored process’s execution. Other methods will either allow the stored process to still run or cause errors that can affect the current session and mislead the end user.

/* Cancel the stored process from running. */
		data _null_;
			abort cancel;
		run;

In this example, the ‘abort cancel’ statement/option is shown. It simply needs to run in a _null_ DATA step in order to cancel the upcoming stored process execution.

I hope you found this tip helpful. Feel free to leave a question or comment in the space below.

Controlling Stored Process Execution through Request Initialization Code Injection was published on SAS Users.

26
Jul

SAS Visual Analytics 8.1: Configuring prompts with different source data

In a previous blog, VA 8.1: Cascading Prompts as Report and Page Prompts, I describe a few new features related to report and page prompts in SAS Visual Analytics 8.1, namely the ability to configure cascading prompts.

In this blog, I will cover how to configure prompts, either report, page, or report canvas prompts, that use different data sources.

Different Data Sources with overlapping data values

First, you must have two different data sources added to your Visual Analytics report. These data sources must have overlapping values that you wish to prompt on. Not all of the values need to map, but the sources must have some values in common if you wish to use a shared prompt.

In this example, we will prompt for Product Line. Let’s examine the column values:

I’ve color coded the values that I would like to map together. I see that the only value that matches “out-of-the-box” is Game.

One workaround to get all of the values to match is to create a Custom Category and use that column for the mapping.

In a “real world” scenario, this may not be ideal. The cardinality of the two columns may be so large that you may have to go back to either the source data or ETL job to produce better matching values.

However, if you are using date columns as the mapping columns, things are considerably easier, since year, month, and quarter are standard values that match without extra steps.

Here is my new Custom Category that I will use for my mapping:

Here are my mappings now. I will be using Product Line (New) for the Insight Toys data source moving forward.

Add prompts

There are two different locations where you can add prompts, i.e. Control Objects, which means there are two different ways to configure prompts with different data sources:

1.     Report and Page Prompts

2.     Report Canvas Prompts

Report and Page Prompt configuration for different data sources

For this first example, I will configure a Button Bar object placed in the Page Prompt area to filter two different data sources. For the Button Bar’s Category Role, I will use the data source with the largest available selection, in this case, the Product Line (New) from Insight Toys.

Now let’s configure this button bar to filter both data sources. You must activate the button bar by clicking on it, then right-click and select Edit data source mappings.

Then you simply have to pick your source table’s column to map to your target table’s column.

That’s it. The mapping is complete. Here is what the report would look like with different selections made for the button bar. Notice that since I used the Insight Toys data source for the Role assignment, it has more values than are available in the Mega Corp data. If a selection is made where nothing matches in Mega Corp, as in the Gift example, the Mega Corp bar chart is blank.

Report Canvas Prompt configuration for different data sources

In this second example, I am going to use a List Control object within the report canvas to filter two different data sources. Again, I will use the Insight Toys’ Product Line (New) column as the List Role Category assignment since it has the most values.

Now to configure the list to filter both bar charts. Click on the list control object to activate the window. Then select the Actions pane, and use the Add button to select Add filter.

Then select both bar charts as the target of the filter Action.
Next, select the Map data option.

Select the source data’s column to map to the target data’s column. Use the + to add additional column mapping criteria.

Here is how the report would look with a few of the values selected from the list table. You can see how both Mega Corp and Insight Toys display overlapping values for Product Line but for any unique Product Lines, such as Gift, its values are only displayed on the Insight Toys bar chart.

Now you know how to configure your control objects for multiple data sources. This works no matter how many data sources you add to your report, simply use the Map data option and select the mappings between the source data and target data.

As I mentioned earlier, a frequently used application of mapping prompts for multiple data sources is for date columns. Here is a screenshot of one example using year and month. I also styled the button bar’s selected background and text color to coordinate with the graphs.

 

SAS Visual Analytics 8.1: Configuring prompts with different source data was published on SAS Users.

25
Jul

Programming tips from experienced SAS users

Trivial Pursuit, Justin Bieber and Timbits. Some pretty great things have come from Canada, eh? Well, you can go ahead and add expert SAS programmers to that impressive list.

In this video, six Canadian SAS programmers, with more than 115 years of SAS programming experience between them, share some of their favorite, little-known SAS programming tips. You're sure to discover a new trick or two.

The video includes the following tips and more:

  • Standardizing and documenting your SAS program.
  • Creating parameter lookup tables.
  • Declaring your macro variables.
  • Using the Characterize Data task in SAS Enterprise Guide.
  • Data exploration best practices for SAS Enterprise Guide.

Looking for more great tips to help bring your SAS programming skills to the next level? Check out these great resources and learn even more from your SAS peers:

  • SAS Support Communities: Peer-to-peer support for SAS users about programming, data analysis, installation and deployment issues, tips and best practices and a whole lot more.
  • SAS Blogs: Connecting you to people, products and ideas from SAS with technical tips, support information and more.
  • SAS Newsletters: The latest news, tips, tricks and resources from SAS, plus advice and industry knowledge gleaned from top experts and other SAS users.

 

Programming tips from experienced SAS users was published on SAS Users.

21
Jul

Selecting the top n% and bottom n% of observations from a data set

SAS® offers several ways that you can find the top n% and bottom n% of data values based on a numeric variable. The RANK procedure with the GROUPS= option is one method. Another method is the UNIVARIATE procedure with the PCTLPTS= option. Because there are several ways to perform this task, you can choose the procedure that you are most familiar with. In this blog post, I use the SUMMARY procedure to generate the percentile values and macro logic to dynamically choose the desired percentile statistics. After the percentiles are generated, I subset the data set based on those values. This blog post provides two detailed examples: one calculates percentiles for a single variable, and one calculates percentiles within a grouping variable.

Calculate Percentiles of a Single Variable

Calculating percentiles of a single variable includes the following steps. Within the macro, a PROC SUMMARY step calculates the percentiles. The subsequent DATA step uses CALL SYMPUTX to create macro variables for the percentile values, and the final DATA step uses those macro variables to subset the data. Here is the code, which is explained in detail below:

/* Create sample data */
data test;                   
   do i=1 to 10000;                                                     
      x=ranuni(i)*12345;                                         
      output;                                                         
   end; 
   drop i; 
run;     
 
proc sort data=test;
   by x;
run; 
 
%macro generate_percentiles(ptile1,ptile2); 
/* Output desired percentile values */                         
proc summary data=test;                                               
   var x;                                                       
   output out=test1 &ptile1= &ptile2= / autoname;
run;                                                                 
 
/* Create macro variables for the percentile values */     
data _null_;                                                         
   set test1;                                                         
   call symputx("&amp;ptile1", x_&amp;ptile1);                                     
   call symputx("&amp;ptile2", x_&amp;ptile2);                                     
run;    
%put &&&ptile1;
%put &&&ptile2;
 
data test2;                                                             
   set test;                                                           
/* Use a WHERE statement to subset the data */                         
   where x le &&&ptile1 or x ge &&&ptile2;
run;  
 
proc print;
run; 
 
%mend;
 
options mprint mlogic symbolgen;
%generate_percentiles(p1,p99)
%generate_percentiles(p25,p75)

After creating and sorting the sample data, I begin my macro definition with two parameters that enable me to substitute the desired percentiles in my macro invocation:

%macro generate_percentiles(ptile1,ptile2);

The PROC SUMMARY step writes the desired percentiles for variable X to the Test1 data set. The AUTONAME option names the percentile statistics in the following format, <varname>_<percentile> (for example, x_p25).

proc summary data=test;                                               
   var x;                                                       
   output out=test1 &ptile1= &ptile2= / autoname;
run;

Next, I want to store the values of the percentile statistics in macro variables so that I can use them in later processing. I use CALL SYMPUTX to do this, which gives the macro variables the same name as the statistic. To see the resulting values in the log, I use %PUT statements:

data _null_;                                                         
   set test1;                                                         
   call symputx("&amp;ptile1", x_&amp;ptile1);                                     
   call symputx("&amp;ptile2", x_&amp;ptile2);                                     
run;    
%put &&&ptile1;
%put &&&ptile2;

The SAS log shows the following:

MLOGIC(GENERATE_PERCENTILES):  %PUT &&&ptile1
SYMBOLGEN:  && resolves to &.
SYMBOLGEN:  Macro variable PTILE1 resolves to p1
SYMBOLGEN:  Macro variable P1 resolves to 123.22158288
123.22158288
MLOGIC(GENERATE_PERCENTILES):  %PUT &&&ptile2
SYMBOLGEN:  && resolves to &.
SYMBOLGEN:  Macro variable PTILE2 resolves to p99
SYMBOLGEN:  Macro variable P99 resolves to 12232.136483
12232.136483

I use these macro variables in a WHERE statement within a DATA step to subset the data set based on the percentile values:

data test2;                                                             
   set test;                                                           
/* Use a WHERE statement to subset the data */                         
   where x le &&&ptile1 or x ge &&&ptile2;
run;

Finally, the macro invocations below pass in the desired percentile statistics:

%generate_percentiles(p1,p99)
%generate_percentiles(p25,p75)

The percentile statistics that are available with PROC SUMMARY are included in the documentation for the MEANS and SUMMARY procedures.

Calculate Percentiles within a Grouping Variable

The second example uses the same approach to calculate percentiles within each value of a grouping variable. The PROC SUMMARY step adds a BY statement, and the DATA _NULL_ step creates a set of macro variables for each BY group:

/* Create sample data */
data test; 
 do group='a','b'; 
   do i=1 to 10000;                                                     
      x=ranuni(i)*12345;                                         
      output;                                                         
   end; 
 end;
   drop i; 
run;     
 
proc sort data=test;
   by group x;
run; 
 
%macro generate_percentiles(ptile1,ptile2); 
/* Output desired percentile values by group */                         
proc summary data=test; 
   by group; 
   var x;                                                       
   output out=test1 &ptile1= &ptile2= / autoname;
run;                                                                 
 
/* Create macro variables for each value of the BY variable */
/* Create macro variables for the percentile values for each BY group */ 
/* Create a macro variable that is the count of the unique
values of the BY variable */ 
data _null_;   
  retain count 0; 
   set test1;   
   by group;
   if first.group then do;
    count+1;
    call symputx('val'||left(count),group); 
    call symputx("&amp;ptile1"||'_'||left(count), x_&amp;ptile1);                                     
    call symputx("&amp;ptile2"||'_'||left(count), x_&amp;ptile2);  
   end; 
  call symput('last',left(count));
run;    
%put _user_;
 
/* Loops through each value of the BY variable */ 
%do i=1 %to &last;
 
data test&i;
   set test;  
   where group="&amp;&amp;val&amp;i"; 
/* Use an IF statement to subset the data */
   if x le &&&ptile1._&i or x ge &&&ptile2._&i;
run;  
 
proc print;
run; 
 
%end;
%mend;
 
options mprint mlogic symbolgen;
%generate_percentiles(p1,p99)

Calculating percentiles has many applications, including ranking data, finding outliers, and subsetting data. Using a procedure in Base SAS® that enables you to request percentile statistics along with the power of the macro language, you can dynamically generate desired values that can be used for further processing and analysis.

Selecting the top n% and bottom n% of observations from a data set was published on SAS Users.
