20
Oct

Top 12 Advantages of SAS Viya

Advantages of SAS ViyaThere are many compelling reasons existing SAS users might want to start integrating SAS Viya into their SAS9 programs and applications.  For me, it comes down to ease-of-use, speed, and faster time-to-value.  With the ability to traverse the (necessarily iterative) analytics lifecycle faster than before, we are now able to generate output quicker – better supporting vital decision-making in a reduced timeframe.   In addition to the positive impacts this can have on productivity, it can also change the way we look at current business challenges and how we design possible solutions.

Earlier this year I wrote about how SAS Viya provides a robust analytics environment to handle all of your big data processing needs.  Since then, I’ve been involved in testing the new SAS Viya 3.3 software that will be released near the end of 2017 and found some additional advantages I think warrant attention.  In this article, I rank order the main advantages of SAS Viya processing and new capabilities coming to SAS Viya 3.3 products.  While the new SAS Viya feature list is too long to list everything individually, I’ve put together the top reasons why you might want to start taking advantage of SAS Viya capabilities of the SAS platform.

1.     Multi-threaded everything, including the venerable DATA-step

In SAS Viya, everything that can run multi-threaded - does.  This is the single-most important aspect of the SAS Viya architecture for existing SAS customers.  As part of this new holistic approach to data processing, SAS has enabled the highly flexible DATA step to run multi-threaded, requiring very little modification of code in order to begin taking advantage of this significant new capability (more on that in soon-to-be-released blog).  Migrating to SAS Viya is important especially in those cases where long-running jobs consist of very long DATA steps that act as processing bottle-necks where constraints exist because of older single-threading configurations.

2.     No sorting necessary!

While not 100% true, most sort routines can be removed from your existing SAS programs.  Ask yourself the question: “What portion of my runtimes are due strictly to sorting?”  The answer is likely around 10-25%, maybe more.  In general, the concept of sorting goes away with in-memory processing.  SAS Viya does its own internal memory shuffling as a replacement.  The SAS Viya CAS engine takes care of partitioning and organizing the data so you don’t have to.  So, take those sorts out your existing code!

3.     VARCHAR informat (plus other “variable-blocking” informats/formats)

Not available in SAS 9.4, the VARCHAR informat/format allows you to store byte information without having to allocate room for blank spaces.  Because storage for columnar (input) values varies by row, you have the potential to achieve an enormous amount of (blank space) savings, which is especially important if you are using expensive (fast) disk storage space.  This represents a huge value in terms of potential data storage size reduction.

4.     Reduced I/O in the form of data reads and writes from Hive/HDFS and Teradata to CAS memory

SAS Viya can leverage Hive/HDFS and Teradata platforms by loading (lifting) data up and writing data back down in parallel using CAS pooled memory.  Data I/O, namely reading data from disk and converting it into a SAS binary format needed for processing, is the single most limiting factor of SAS 9.4.  Once you speed up your data loading, especially for extremely large data sets, you will be able to generate faster time to results for all analyses and projects.

5.     Persisted data can stay in memory to support multiple users or processing steps

Similar to SAS LASR, CAS can be structured to persist large data sets in memory, indefinitely.  This allows users to access the same data at the same time and eliminates redundancy and repetitive I/O, potentially saving valuable compute cycles.  Essentially, you can load the data once and then as many people (or processing steps) can reuse it as many times as needed thereafter.

6.     State-of-the-art Machine Learning (ML) techniques (including Gradient Boosting, Random Forest, Support Vector Machines, Factorization Machines, Deep Learning and NLP analytics)

All the most popular ML techniques are represented giving you the flexibility to customize model tournaments to include those techniques most appropriate for your given data and problem set.  We also provide assessment capabilities, thus saving you valuable time to get the types of information you need to make valid model comparisons (like ROC charts, lift charts, etc.) and pick your champion models.  We do not have extreme Gradient Boosting, Factorization Machines, or a specific Assessment procedure in SAS 9.4.  Also, GPU processing is supported in SAS Viya 3.3, for Deep Neural Networks and Convolutional Neural Networks (this has not be available previously).

7.     In-memory TRANSPOSE

The task of transposing data amounts to about 80% of any model building exercise, since predictive analytics requires a specialized data set called a ‘one-row-per-subject’ Analytic Base Table (ABT).  SAS Viya allows you transpose in a fraction of the time that it used to take to develop the critical ABT outputs.  A phenomenal time-saver procedure that now runs entirely multi-threaded, in-memory.

8.     API’s!!!

The ability to code from external interfaces gives coders the flexibility they need in today’s fast-moving programming world.  SAS Viya supports native language bindings for Lua, Java, Python and R.  This means, for example, that you can launch SAS processes from a Jupyter Notebook while staying within a Python coding environment.  SAS also provide a REST API for use in data science and IT departments.

9.     Improved model build and deployment options

The core of SAS  Viya machine learning techniques support auto-tuning.  SAS has the most effective hyper-parameter search and optimization routines, allowing data scientists to arrive at the correct algorithm settings with higher probability and speed, giving them better answers with less effort.  And because ML scoring code output is significantly more complex, SAS Viya Data Mining and Machine Learning allows you to deploy compact binary score files (called Astore files) into databases to help facilitate scoring.  These binary files do not require compilation and can be pushed to ESP-supported edge analytics.  Additionally, training within  event streams is being examined for a future release.

10.    Tons of new SAS visual interface advantages

A.     Less coding – SAS Viya acts as a code generator, producing batch code for repeatability and score code for easier deployment.  Both batch code and score code can be produced in a variety of formats, including SAS, Java, and Python.

B.     Improved data integration between SAS Viya visual analytics products – you can now edit your data in-memory and pass it effortlessly through to reporting, modeling, text, and forecasting applications (new tabs in a single application interface).

C.     Ability to compare modeling pipelines – now data scientists can compare champion models from any number of pipelines (think of SAS9 EM projects or data flows) they’ve created.

D.     Best practices and white box templates – once only available as part of SAS 9 Rapid Predictive Modeler, Model Studio now gives you easy access to basic, intermediate and advanced model templates.

E.     Reusable components – Users can save their best work (including pipelines and individual nodes) and share it with others.  Collaborating is easier than ever.

11.    Data flexibility

You can load big data without having all that data fit into memory.  Before in HPA or LASR engines, the memory environment had to be sized exactly to fit all the data.  That prior requirement has been removed using CAS technology – a really nice feature.

12.    Overall consolidation and consistency

SAS Viya seeks to standardize on common algorithms and techniques provided within every analytic technique so that you don’t get different answers when attempting to do things using alternate procedures or methods. For instance, our deployment of Stochastic Gradient Descent is now the same in every technique that uses that method.  Consistency also applies to the interfaces, as SAS Viya attempts to standardize the look-and-feel of various interfaces to reduce your learning curve when using a new capability.

The net result of these Top 12 advantages is that you have access to state-of-the-art technology, jobs finish faster, and you ultimately get faster time-to-value.  While this idea has been articulated in some of the above points, it is important to re-emphasize because SAS Viya benefits, when added together, result in higher throughputs of work, a greater flexibility in terms of options, and the ability to keep running when other systems would have failed.  You just have a much greater efficiency/productivity level when using SAS Viya as compared to before.  So why not use it?

Learn more about SAS Viya.
Tutorial Library: An introduction to SAS Viya programming for SAS 9 programmers.
Blog: Adding SAS Viya to your SAS 9 programming toolbox.

Top 12 Advantages of SAS Viya was published on SAS Users.

20
Oct

Tips for using the IMPORT procedure to read files that contain delimiters

using the IMPORT procedure to read files that contain delimitersReading an external file that contains delimiters (commas, tabs, or other characters such as a pipe character or an exclamation point) is easy when you use the IMPORT procedure. It's easy in that variable names are on row 1, the data starts on row 2, and the first 20 rows are a good sample of your data. Unfortunately, most delimited files are not created with those restrictions in mind.  So how do you read files that do not follow those restrictions?

You can still use PROC IMPORT to read the comma-, tab-, or otherwise-delimited files. However, depending on the circumstances, you might have to add the GUESSINGROWS= statement to PROC IMPORT or you might need to pre-process the delimited file before you use PROC IMPORT.

Note: PROC IMPORT is available only for use in the Microsoft Windows, UNIX, or Linux operating environments.

The following sections explain four different scenarios for using PROC IMPORT to read files that contain the delimiters that are listed above.

Scenario 1

In this scenario, I use PROC IMPORT to read a comma-delimited file that has variable names on row 1 and data starting on row 2, as shown below:

proc import datafile='c:tempclassdata.csv' 
out=class dbms=csv replace;
run;

 

When I submit this code, the following message appears in my SAS® log:

NOTE: Invalid data for Age in line 28 9-10.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
28        Janet,F,NA,62.5,112.5 21
Name=Janet Sex=F Age=. Height=62.5 Weight=112.5 _ERROR_=1 _N_=27
NOTE: 38 records were read from the infile 'c:tempclassdata.csv'.
      The minimum record length was 17.
      The maximum record length was 21.
NOTE: The data set WORK.CLASS has 38 observations and 5 variables.

 

In this situation, how do you prevent the Invalid Data message in the SAS log?

By default, SAS scans the first 20 rows to determine variable attributes (type and length) when it reads a comma-, tab-, or otherwise-delimited file.  Beginning in SAS® 9.1, a new statement (GUESSINGROWS=) is available in PROC IMPORT that enables you to tell SAS how many rows you want it to scan in order to determine variable attributes. In SAS 9.1 and SAS® 9.2, the GUESSINGROWS= value can range from 1 to 32767.  Beginning in SAS® 9.3, the GUESSINGROWS= value can range from 1 to 2147483647.  Keep in mind that the more rows you scan, the longer it takes for the PROC IMPORT to run.

The following program illustrates the use of the GUESSINGROWS= statement in PROC IMPORT:

proc import datafile='c:tempclassdata.csv' out=class              dbms=csv replace;
guessingrows=100;
run;

 

The example above includes the statement GUESSINGROWS=100, which instructs SAS to scan the first 100 rows of the external file for variable attributes. You might need to increase the GUESSINGROWS= value to something greater than 100 to obtain the results that you want.

Scenario 2

In this scenario, my delimited file has the variable names on row 4 and the data starts on row 5. When you use PROC IMPORT, you can specify the record number at which SAS should begin reading.  Although you can specify which record to start with in PROC IMPORT, you cannot extract the variable names from any other row except the first row of an external file that is comma-, tab-, or an otherwise-delimited.

Then how do you program PROC IMPORT so that it begins reading from a specified row?

To do that, you need to allow SAS to assign the variable names in the form VARx (where x is a sequential number). The following code illustrates how you can skip the first rows of data and start reading from row 4 by allowing SAS to assign the variable names:

proc import datafile='c:tempclass.csv' out=class dbms=csv replace;
getnames=no;
datarow=4;
run;

 

Scenario 3

In this scenario, I want to read only records 6–15 (inclusive) in the delimited file. So the question here is how can you set PROC IMPORT to read just a section of a delimited file?

To do that, you need to use the OBS= option before you execute PROC IMPORT and use the DATAROW= option within PROC IMPORT.

The following example reads the middle ten rows of a CSV file, starting at row 6:

options obs=15; 
 
proc import out=work.test2  
            datafile= "c:tempclass.csv" 
            dbms=csv replace; 
            getnames=yes; 
            datarow=6; 
run; 
 
options obs=max; 
run;

 

Notice that I reset the OBS= option to MAX after the IMPORT procedure to ensure that any code that I run after the procedure processes all observations.

Scenario 4

In this scenario, I again use PROC IMPORT to read my external file. However, I receive more observations in my SAS data set than there are data rows in my delimited file. The external file looks fine when it is opened with Microsoft Excel. However, when I use Microsoft Windows Notepad or TextPad to view some records, my data spans multiple rows for values that are enclosed in quotation marks.  Here is a snapshot of what the file looks like in both Microsoft Excel and TextPad, respectively:

The question for this scenario is how can I use PROC IMPORT to read this data so that the observations in my SAS data set match the number of rows in my delimited file?

In this case, the external file contains embedded carriage return (CR) and line feed (LF) characters in the middle of the data value within a quoted string. The CRLF is an end-of-record marker, so the remaining text in the string becomes the next record. Here are the results from reading the CSV file that is illustrated in the Excel and TextPad files that are shown earlier:

That behavior is why you receive more observations than you expect.  Anytime SAS encounters a CRLF, SAS considers that a new record regardless of where it is found.

A sample program that removes a CRLF character (as long as it is part of a quoted text string) is available in SAS Note 26065, "Remove carriage return and line feed characters within quoted strings."

After you run the code (from the Full Code tab) in SAS Note 26065 to pre-process the external file and remove the erroneous CR/LF characters, you should be able to use PROC IMPORT to read the external file with no problems.

For more information about PROC IMPORT, see "Chapter 35, The IMPORT Procedure" in the Base SAS® 9.4 Procedures Guide, Seventh Edition.

 

 

Tips for using the IMPORT procedure to read files that contain delimiters was published on SAS Users.

20
Oct

Creating reports in style with SAS Enterprise Guide

 

“The difference between style and fashion is quality.”

-Giorgio Armani

With an out-of-the-box SAS Enterprise Guide (EG) installation, when you build a report in SAS EG it is displayed in a nice-looking default style. If you like it, you can keep it, and continue reading. If you don’t quite like it, then stop, take a deep breath, and continue reading carefully – you are about to discover a wealth of styling options available in EG. In any case, you are not bound by the default style that is set during your SAS EG installation.

Changing your SAS EG report style on the fly

Let’s say we run the following SAS program in EG:

SAS code sample to run in SAS EG
When you run a SAS Program or a Process Flow that creates an output, it will open in the Results tab shown in a default style (HtmlBlue). For many it looks quite OK. However, SAS provides many other different styles you can choose from. To change your report style, just click on Properties of the workspace toolbar:
Results tab in SAS EG
This will open the Properties for SAS Report window where you can select any style in the Style drop-down list:

Properties for SAS Report
After you selected desired style, click OK; this will save your change and close the window. Your report will immediately be redrawn and displayed in your new style:

SAS report in new style

That new style will only apply to the Results element of the SAS Program or a Process Flow you ran. If you save the EG project and then re-open it in a new EG session and re-run, EG will still remember and use the style you previously selected for the Results element. All other elements will use the default style.

But how do you know what each style’s look and feel is before you select it? The following sections show how to browse different styles as well as how to change your default styles to new ones.

Browsing SAS EG styles

SAS Enterprise Guide interface provides a quick access to viewing different styles, whether built-in or external. Here is how to get an idea of different styles look and feel.

From the EG main menu click on Tools → Style Manager. In the opened Style Manager window you may browse through the Style List by clicking on each style listed in the left pane and also get a good idea of how each particular style looks by viewing it in the right Preview pane of the Style Manager window:

Style Manager window

From Style Manager window, you can also set up a default style by selecting a style you like from the Style List in the left pane and clicking Set as Default button. However, setting your default style using Style Manager will only affect SAS Report and HTML results formats. But what about other results formats? Not to worry, SAS EG interface has you covered.

Changing your default SAS EG report style

If you like a particular report style and don’t want to be stuck with a pre-set default style and necessity to change it every time you run a report, you may easily change your SAS EG default style for practically any results format.

From the EG main menu click on Tools → Options…, the Options window will open. I that window, under Results → Results General you may select (check) one or multiple Results Formats (SAS Report, HTML, PDF, RTF, Text Output, PowerPoint, and Excel) as well as choose your Default Result Format:

Options window to choose default report format

Let’s set up default styles for different results formats. First, let’s go to Results → SAS Report, you will see (under Appearance → Style) that your default style (set up during initial EG installation) is HtmlBlue. Click on the Style drop-down list to select a different style:

Select new report style

The same way you may set up default styles for other results formats (HTML, RTF, PDF, Excel, and PowerPoint). For Graph, you may select a Graph Format (ActiveX, Java, GIF, JPEG, etc.) When you are done, click OK button, the Options window will close and your selected styles become your new default. They are going to persist across EG sessions.

If you are a SAS Administrator, to ensure consistency across your organization, you may have all your SAS Enterprise Guide users set up the same Default styles for every Result format.

Server-side style templates

Server-side SAS style templates are created using the PROC TEMPLATE of the SAS Output Delivery System (ODS) and are stored in Template Stores within SAS libraries. By definition, a template store is an item store that stores items that were created by the TEMPLATE procedure. In particular, built-in server-side SAS style templates are stored in the SASHELP.TMPLMST item store.

Note, that you will not see these item stores / template stores in the EG Server→Library tree under the SASHELP library as it only shows data tables and views. While there is no access in EG to the Templates Window, you can access the Templates Window from SAS Display Manager.

In Enterprise Guide, in order to view a list of built-in server-side SAS styles in the SASHELP.TMPLMST item store, you may run the following code:

proc template;
   path sashelp.tmplmst;
   list styles;
run;

This will produce the following listing shown in the EG’s Results tab:

Report listing

If you want to view all the server-side styles including built-in and user-defined, you can do that in EG by running the following code:

proc template;
   list styles;
run;

Server-side templates are applied to ALL Results Formats.

CSS styles

Cascading Style Sheet (CSS) styles are available only for SAS Report and HTML result formats. The CSS stylesheet only styles the browser-rendered elements. It will not change a graph image style that is generated on the server.

In the SAS code generated by EG, CSS style is specified in STYLESHEET= option of the ODS statement. It can point to any local or network accessible CSS file, for example:

STYLESHEET=(URL="file:///C:/Program%20Files/SASHome/SASEnterpriseGuide/7.1/Styles/HTMLBlue.css")

In addition, STYLESHEET= option can point to a .css file located on the Internet, for example:

STYLESHEET=(URL="https://www.sas.com/etc/designs/saswww/static.css")

Server-side styles vs. CSS styles

With SAS Enterprise Guide you create Projects, Process Flows, Programs, Tasks, Reports, etc. on your local Window machine. When you Run your Project (or any part of it), EG generates SAS code which gets sent to and executed on the SAS server, and then any visual results are sent back to EG and displayed there.

For every Result Format, a server-side style template is always applied when SAS output is generated on the SAS server.

When that SAS output is returned to SAS EG, for SAS Report and HTML result formats only, an additional optional styling is applied in a form of CSS styles that controls what your SAS Report or HTML output looks like. This CSS styling affects only HTML elements of the output and do not affect graph images that are always generated and styled on the server.

These two kinds of styles are reflected in the EG-generated SAS code that gets shipped to SAS server for execution. If you look at the Code Preview area (Program → Export → Export Program) or Log tab, you will always see ODS statement with STYLE= option that specifies the server-side style. If your selected Result Format is either SAS Report or HTML, then in addition to STYLE= option the ODS statement also contains STYLESHEET= option that specifies HTML CSS stylesheet (external file) accessible via the client.

If you select as default a built-in style (e.g. Harvest) EG will find both server version of it and CSS version of it; you will see this in the SAS log:

STYLE=Harvest
STYLESHEET=(URL="file:///C:/Program%20Files/SASHome/SASEnterpriseGuide/7.1/Styles/Harvest.css")

However, if you select as default some custom CSS or external CSS style (e.g. ABC) that does not have a match in the server template store, the server style will be set to the default server-side style HTMLBlue; you will see in the SAS log the following WARNING:

WARNING: Style ABC not found; Default style will be used instead.

This warning relates to the STYLE= option specifying the server-side style.

Adding your custom SAS EG report style

Even though SAS supplies dozens of styles for you to choose from (Built-in Styles), you can still modify existing styles and create your own custom styles for SAS Report and HTML output types only. You can do this via Style Manager.

Open Style Manager with either one of the following ways:

Tools → Style Manager

Tools → Options… → Results/SAS Reports → Manage Styles

Tools → Options… → Results/HTML → Manage Styles

Note, that style customization via Style Manager is only available for SAS Report and HTML output types.

In the left pane of the Style Manager there are 3 columns:

  1. Style representing style name;
  2. Location indicating whether it is Built-in Style (SAS-supplied CSS), My Style (your custom CSS), or External Style (any CSS - Cascading Style Sheet - on your local machine or on the Web; or a style template on a SAS server);
  3. URL showing the location of the CSS file.

Find a style in the left pane list you wish to modify. Notice that SAS-supplied built-in styles are not editable (Edit button is grayed out). First, make a copy of this style by pressing Create a Copy button. You can also make a copy of a style by right-clicking on it and selecting Create a Copy from the pop-up menu.  This will open Save Style As window where you can give it a name and select a Save in location.

Your new style appears in the Style List of the Style Manager. Click on the new style name and then press Edit button (alternatively, you may right-click on the new style name and select Edit from the pop-up menu):

Style Editor window

This will open the Style Editor window where you can modify text and border attributes, specify background and banner images, as well as assign any custom CSS property name / property value pairs.

Click OK button when you are done to return to the Style Manager. There you may even set your custom style as default, by selecting it first and then pressing the Set as Default button.

Besides editing your new style in Style Manager, you may also open your-new-style.css file in a Text Editor and edit CSS there.

Adding an external style to Enterprise Guide

You can add external styles to your Style List in the Style Manager. While in the Style Manager, click on Add button, this will open the Add New Style window:

Adding an external style

Make sure Add new external style radio box is selected. Type in a Style name for your external style and Style URL, which can be a folder/directory path name on your local machine or your network (e.g. C:your_folderyour_css_file_name.css) or a location on the Web (e.g. http://www.some_domain.com/styles/your_special_style.css).

To make your custom styles available to all SAS EG users in your organization, you may create them as a SAS style template using PROC TEMPLATE and place on a SAS server (server-side style), see this SAS Code Sample.  In this case, you can add your custom style to the Style Manager by selecting This is a SAS server style only check box in the above Add New Style window. The Style URL field will become disabled, as it is only used to specify CSS stylesheet:

Checking This is a SAS server style only checkbox

You would select this checkbox if you only want to use server-side style (the STYLE= option is always present) and do not want to also provide and apply an optional CSS stylesheet (STYLESHEET=).

Conclusion

In this post I tried to present a comprehensive guide on using styles in SAS Enterprise Guide. Please use the Comments section below to share your experience with Enterprise Guide as it relates to reports styling.

Resources

Little SAS Enterprise Guide bookThe Little SAS Enterpriser Guide Book

Point-and-Click Style Editing in SAS® Enterprise Guide®

I Didn’t  Know SAS®  Enterprise Guide®  Could Do  That!

Creating reports in style with SAS Enterprise Guide was published on SAS Users.

15
Sep

Using a web browser as a SAS code editor

Whether you are a SAS code creator, a blogger, a technical writer, an editor-in-chief, an executive, a secretary, a developer or programmer in any programming language or simply someone who uses computer or hand-held device for writing, you need to read this blog post – your life is about to change forever!

Did you know that you can use a web browser as a SAS code editor? I’m not talking about browser-based SAS programming interfaces like SAS University Edition or SAS Studio; these are full-blown applications. I’m talking about converting a regular web browser into a “notepad” where you can type, display, and save your SAS code. Or non-SAS code. Or practically anything. And you don’t even have to be connected to the Internet to use this browser functionality.

Converting a web browser into a notepad

This trick works with most modern browsers:

  • Chrome
  • Firefox
  • Opera
  • Safari

It will not work on Internet Explorer 11.

Try this: open your web browser (I am using Firefox in the examples below) or a new tab in your browser and type the following in the URL field (case insensitive):

data:text/html,<html contentEditable>

Hit Enter. Then click anywhere in the browser body.

Your browser has just turned into a Notepad. You can now type anything in it, including SAS code:

Using web browser as SAS code editor

In order to save your SAS code in Firefox, click on File ⇒ Save Page As… and save it as type Text Files (*.txt;*.text):

Saving SAS code in a browser as text file

This functionality is possible thanks to HTML5’s contentEditable attribute and the browsers’ ability to handle data URL.

I don’t know about you, but I find this browser-notepad feature very cool and handy. Not only does it allow you to type SAS code in your browser, but it also gives you the capability to take notes and copy & paste excerpts or code snippets from other web pages on non-web applications. If you use WebEx or Skype or Lync to present one of the SAS web browser-based products such as SAS Visual Analytics, SAS Visual Statistics, etc., you can share your browser to your audience and make one of the tabs a typeable area. Then during your presentation you may switch between browser tabs depending on whether you are presenting SAS VA/VS or your own on-the-fly typing.

Bookmark notepad in a browser

If you like this Notepad browser feature, you can easily bookmark it by placing it on the Bookmarks toolbar. In this case, I suggest typing the following line in the URL filed:

data:text/html,<html contentEditable><title>Notepad</title>

and then dragging the image in front of this URL string and dropping it to the Bookmarks toolbar to create a button. Then, every time you need a Notepad it is at your fingertips; you just need to click the button:

Bookmark SAS code editor in a browser

Styling your new SAS editor in a browser

By default, your browser editor does not look pretty. However, you can apply CSS styles to it to make it look better. You can control font (style, size, color), margins, paddings, background and other CSS style attributes. For example, try the following URL:

data:text/html, <textarea style="width:100%; height:100%; padding:20px; font-size:2em; font-family: SAS Monospace; color:darkblue; border:none; border-left: 10px solid lightblue; margin-left: 30px;" autofocus/>

Your web browser editor becomes much more presentable:

Customize SAS code editor in a browser

The autofocus attribute places cursor immediately in the typing area of the browser notepad, without having to click on the browser body first.

I want to hear from you!

Do you like this editable browser feature? Would you use it to enhance your presentations? Do you envision yourself writing SAS code in a browser? An article, a blog post? What other usages can you envision using such a web browser transformation? Do you have any ideas to expand this notepad browser functionality beyond presenting, typing, taking notes, copying/pasting, and saving your SAS code? Can you apply SAS color syntax highlighting in a browser? Or a background image? How about submitting your SAS code from a browser?

Using a web browser as a SAS code editor was published on SAS Users.

2
Aug

CALL EXECUTE made easy for SAS data-driven programming

In my prior posts (Data-driven SAS macro loops, Modifying variable attributes in all datasets of a SAS library, Automating the loading of multiple database tables into SAS tables), I presented various data-driven applications using SAS macro loops.

However, macro loops are not the only tools available in SAS for developing data-driven programs.

CALL EXECUTE is one of them. The CALL EXECUTE routine accepts a single argument that is a character string or character expression. The character expression is usually a concatenation of strings containing SAS code elements to be executed after they have been resolved. Components of the argument expression can be character constants, data step variables, macro variable reference, as well as macro references. CALL EXECUTE dynamically builds SAS code during DATA step iterations; that code executes after the DATA step’s completion outside its boundary. This makes a DATA step iterating through a driver table an effective SAS code generator similar to that of SAS macro loops.

However, the rather peculiar rules of the CALL EXECUTE argument resolution may make its usage somewhat confusing. Let’s straighten things out.

Argument string has no macro or macro variable reference

If an argument string to the CALL EXECUTE contains SAS code without any macro or macro variable references, that code is simply pushed out (of the current DATA step) and appended to a queue after the current DATA step. As the DATA step iterates, the code is appended to the queue as many times as there are iterations of the DATA step. After the DATA step completes, the code in the queue gets executed in the order of its creation (First In First Out).

The beauty of this process is that the argument string can be a concatenation of character constants (in single or double quotes) and SAS variables which get substituted with their values by CALL EXECUTE for each DATA step iteration. This will produce data-driven, dynamically generated SAS code just like an iterative SAS macro loop.

Let’s consider the following example. Say we need to load multiple Oracle tables into SAS tables.

Step 1. Creating a driver table

In order to make our process data-driven, let’s first create a driver table containing a list of the table names that needed to be extracted and loaded:

/* create a list of tables to extract & load */
libname parmdl '/sas/data/parmdata';
data parmdl.tablelist;
        length tname $8;
        input tname;
        datalines;
ADDRESS
ACCOUNT
BENEFIT
FINANCE
HOUSING
;

This program runs just once to create the driver table parmdl.tablelist.

Step 2. Loading multiple tables

Then, you can use the following data-driven program that runs each time you need to reload Oracle tables into SAS:

/* source ORACLE library */
libname oralib oracle path="xxx" schema="yyy" user="uuu"
 	PASSWORD="{SAS002}ABCDEFG12345678RTUR" access=readonly;
 
/* target SAS library */
libname sasdl '/sas/data/appdata';
 
/* driver table SAS library */
libname parmdl '/sas/data/parmdata';
 
data _null_;
   set parmdl.tablelist;
   call execute(cats(
      'data sasdl.',tname,';',
         'set oralib.',tname,';',
      'run;'));
run;

In order to concatenate the components of the CALL EXECUTE argument I used the cats() SAS function which returns a concatenated character string removing leading and trailing blanks.

When this program runs, the SAS log indicates that after the data _null_ step the following statements are added and executed:

NOTE: CALL EXECUTE generated line.
1   + data sasdl.ADDRESS;set oralib.ADDRESS;run;
2   + data sasdl.ACCOUNT;set oralib.ACCOUNT;run;
3   + data sasdl.BENEFIT;set oralib.BENEFIT;run;
4   + data sasdl.FINANCE;set oralib.FINANCE;run;
5   + data sasdl.HOUSING;set oralib.HOUSING;run;

In this example we use data _null_ step to loop through the list of tables (parmdl.tablelist) and for each value of the tname column a new data step gets generated and executed after the data _null_ step. The following diagram illustrates the process:

Diagram explaining CALL EXECUTE for SAS data-driven programming

Argument string has macro variable reference in double quotes

If an argument to the CALL EXECUTE has macro variable references in double quotes, they will be resolved by the SAS macro pre-processor during the DATA step compilation. Nothing unusual. For example, the following code will execute exactly as the above, and macro variable references &olib and &slib will be resolved to oralib and sasdl before CALL EXECUTE takes any actions:

%let olib = oralib;
%let slib = sasdl;
 
data _null_;
   set parmdl.tablelist;
   call execute (
      "data &slib.."!!strip(tname)!!';'
         "set &olib.."!!strip(tname)!!';'!!
      'run;'
   );
run;

Argument string has macro or macro variable reference in single quotes

Here comes the interesting part. If the argument to CALL EXECUTE has macro or macro variable references in single quotes, they still will be resolved before the code is pushed out of the DATA step, but not by the SAS macro pre-processor during the DATA step compilation as it was in the case of double quotation marks. Macro or macro variable references in single quotes will be resolved by CALL EXECUTE itself. For example, the following code will execute exactly as the above, but macro variable references &olib and &slib will be resolved by CALL EXECUTE:

%let olib = oralib;
%let slib = sasdl;
 
data _null_;
   set parmdl.tablelist;
   call execute('data &slib..'!!strip(tname)!!';'!!
                'set &olib..'!!strip(tname)!!';'!!
                'run;'
               );
run;

Timing considerations

CAUTION: If your macro contains some non-macro language constructs for assigning macro variables during run time, such as a CALL SYMPUT or SYMPUTX statement (in a DATA step) or an INTO clause (in PROC SQL), resolving those macro variable references by CALL EXECUTE will happen too soon, before your macro-generated code gets pushed out and executed. This will result in unresolved macro variables. Let’s run the following code that should extract Oracle tables into SAS tables as above, but also re-arrange column positions to be in alphabetical order:

%macro onetable (tblname);
   proc contents data=oralib.&tblname out=one(keep=name) noprint;
   run;
 
   proc sql noprint;
      select name into :varlist separated by ' ' from one;
   quit;
   %put &=varlist;
 
   data sasdl.&tblname;
      retain &varlist;
      set oralib.&tblname end=last nobs=n;
      if last then call symput('n',strip(put(n,best.)));
   run;
   %put Table &tblname has &n observations.;
%mend onetable;
 
data _null_;
   set parmdl.tablelist;
   call execute('%onetable('!!strip(tname)!!');');
run;

Predictably, the SAS log will show unresolved macro variable references, such as:

WARNING: Apparent symbolic reference VARLIST not resolved.
WARNING: Apparent symbolic reference N not resolved.
Table ADDRESS has &n observations.

SOLUTION: To avoid the timing issue when a macro reference gets resolved by CALL EXECUTE too soon, before macro variables have been assigned during macro-generated step execution, we can strip CALL EXECUTE of the macro resolution privilege. In order to do that, we can mask & and % characters using the %nrstr macro function, thus making CALL EXECUTE “macro-blind,” so it will push the macro code out without resolving it. In this case, macro resolution will happen after the DATA step where CALL EXECUTE resides. If an argument to CALL EXECUTE has a macro invocation, then including it in the %nrstr macro function is the way to go. The following code will run just fine:

data _null_;
   set parmdl.tablelist;
   call execute('%nrstr(%onetable('!!strip(tname)!!'));');
run;

When this DATA step runs, the SAS log indicates that the following statements are added and executed:

NOTE: CALL EXECUTE generated line.
1   + %onetable(ADDRESS);
2   + %onetable(ACCOUNT);
3   + %onetable(BENEFIT);
4   + %onetable(FINANCE);
5   + %onetable(HOUSING);

CALL EXECUTE argument is a SAS variable

The argument to CALL EXECUTE does not necessarily have to contain or be a character constant. It can be a SAS variable, a character variable to be precise. In this case, the behavior of CALL EXECUTE is the same as when the argument is a string in single quotes. It means that if a macro reference is part of the argument value it needs to be masked using the %nrstr() macro function in order to avoid the timing issue mentioned above.

In this case, the argument to the CALL EXECUTE may look like this:

arg = '%nrstr(%mymacro(parm1=VAL1,parm2=VAL2))';
call execute(arg);

Making CALL EXECUTE totally data-driven

In the examples above we used the tablelist driver table to retrieve values for a single macro parameter for each data step iteration. However, we can use a driver table not only to dynamically assign values to one or more macro parameters, but also to control which macro to execute in each data step iteration. The following diagram illustrates the process of totally data-driven SAS program:

Diagram explaining using CALL EXECUTE for SAS data-driven programming

Conclusion

CALL EXECUTE is a powerful tool for developing dynamic data-driven SAS applications. Hopefully, this blog post demonstrates its benefits and clearly explains how to avoid its pitfalls and use it efficiently to your advantage. I welcome your comments, and would love to hear your experiences with CALL EXECUTE.

CALL EXECUTE made easy for SAS data-driven programming was published on SAS Users.

25
Jul

Programming tips from experienced SAS users

Trivial Pursuit, Justin Bieber and Timbits. Some pretty great things have come from Canada, eh? Well, you can go ahead and add expert SAS programmers to that impressive list.

In this video, six Canadian SAS programmers, with more than 115 years of SAS programming experience between them, share some of their favorite, little-known SAS programming tips. You're sure to discover a new trick or two.

The video includes the following tips and more:

  • Standardizing and documenting your SAS program.
  • Creating parameter lookup tables.
  • Declaring your macro variables.
  • Using the Characterize Data task in SAS Enterprise Guide.
  • Data exploration best practices for SAS Enterprise Guide.

Looking for more great tips to help bring your SAS programming skills to the next level? Check out these great resources and learn even more from your SAS peers:

  • SAS Support Communities: Peer-to-peer support for SAS users about programming, data analysis, installation and deployment issues, tips and best practices and a whole lot more.
  • SAS Blogs: Connecting you to people, products and ideas from SAS with technical tips, support information and more.
  • SAS Newsletters: The latest news, tips, tricks and resources from SAS, plus advice and industry knowledge gleaned from top experts and other SAS users.

 

Programming tips from experienced SAS users was published on SAS Users.

21
Jul

Selecting the top n% and bottom n% of observations from a data set

n% of observations from a data setSAS® offers several ways that you can find the top n% and bottom n% of data values based on a numeric variable. The RANK procedure with the GROUPS= option is one method. Another method is The UNIVARIATE procedure with the PCTLPTS= option. Because there are several ways to perform this task, you can choose the procedure that you are most familiar with. In this blog post, I use the SUMMARY procedure to generate the percentile values and macro logic to dynamically choose the desired percentile statistics. After the percentiles are generated, I subset the data set based on those values. This blog post provides two detailed examples: one calculates percentiles for a single variable and one calculates percentiles within a grouping variable.

Calculate Percentiles of a Single Variable

Calculating percentiles of a single variable includes the following steps. Within the macro, a PROC SUMMARY step calculates the percentiles. The subsequent DATA step uses CALL SYMPUTX to create macro variables for the percentile values, and the final DATA step uses those macro variables to subset the data. Here is the code, which is explained in detail below:

/* Create sample data */
data test;                   
   do i=1 to 10000;                                                     
      x=ranuni(i)*12345;                                         
      output;                                                         
   end; 
   drop i; 
run;     
 
proc sort data=test;
   by x;
run; 
 
%macro generate_percentiles(ptile1,ptile2); 
/* Output desired percentile values */                         
proc summary data=test;                                               
   var x;                                                       
   output out=test1 &amp;ptile1= &amp;ptile2= / autoname;                               
run;                                                                 
 
/* Create macro variables for the percentile values */     
data _null_;                                                         
   set test1;                                                         
   call symputx("&amp;ptile1", x_&amp;ptile1);                                     
   call symputx("&amp;ptile2", x_&amp;ptile2);                                     
run;    
%put &amp;&amp;&amp;ptile1;
%put &amp;&amp;&amp;ptile2; 
 
data test2;                                                             
   set test;                                                           
/* Use a WHERE statement to subset the data */                         
   where x le &amp;&amp;&amp;ptile1 or x ge &amp;&amp;&amp;ptile2;                                       
run;  
 
proc print;
run; 
 
%mend;
 
options mprint mlogic symbolgen;
%generate_percentiles(p1,p99)
%generate_percentiles(p25,p75)

After creating and sorting the sample data, I begin my macro definition with two parameters that enable me to substitute the desired percentiles in my macro invocation:

%macro generate_percentiles(ptile1,ptile2);

The PROC SUMMARY step writes the desired percentiles for variable X to the Test1 data set. The AUTONAME option names the percentile statistics in the following format, <varname>_<percentile> (for example, x_p25).

proc summary data=test;                                               
   var x;                                                       
   output out=test1 &amp;ptile1= &amp;ptile2= / autoname;                               
run;

Next, I want to store the values of the percentile statistics in macro variables so that I can use them in later processing. I use CALL SYMPUTX to do this, which gives the macro variables the same name as the statistic. To see the resulting values in the log, I use

data _null_;                                                         
   set test1;                                                         
   call symputx("&amp;ptile1", x_&amp;ptile1);                                     
   call symputx("&amp;ptile2", x_&amp;ptile2);                                     
run;    
%put &amp;&amp;&amp;ptile1;
%put &amp;&amp;&amp;ptile2;

The SAS log shows the following:

MLOGIC(GENERATE_PERCENTILES):  %PUT &amp;&amp;&amp;ptile1
SYMBOLGEN:  &amp;&amp; resolves to &amp;.
SYMBOLGEN:  Macro variable PTILE1 resolves to p1
SYMBOLGEN:  Macro variable P1 resolves to 123.22158288
123.22158288
MLOGIC(GENERATE_PERCENTILES):  %PUT &amp;&amp;&amp;ptile2
SYMBOLGEN:  &amp;&amp; resolves to &amp;.
SYMBOLGEN:  Macro variable PTILE2 resolves to p99
SYMBOLGEN:  Macro variable P99 resolves to 12232.136483
12232.136483

I use these macro variables in a WHERE statement within a DATA step to subset the data set based on the percentile values:

data test2;                                                             
   set test;                                                           
/* Use a WHERE statement to subset the data */                         
   where x le &amp;&amp;&amp;ptile1 or x ge &amp;&amp;&amp;ptile2;                                       
run;

Finally, the macro invocations below pass in the desired percentile statistics:

%generate_percentiles(p1,p99)
%generate_percentiles(p25,p75)

The percentile statistics that are available with PROC SUMMARY are included in the documentation for the

/* Create sample data */
data test; 
 do group='a','b'; 
   do i=1 to 10000;                                                     
      x=ranuni(i)*12345;                                         
      output;                                                         
   end; 
 end;
   drop i; 
run;     
 
proc sort data=test;
   by group x;
run; 
 
%macro generate_percentiles(ptile1,ptile2); 
/* Output desired percentile values by group */                         
proc summary data=test; 
   by group; 
   var x;                                                       
   output out=test1 &amp;ptile1= &amp;ptile2= / autoname;                               
run;                                                                 
 
/* Create macro variables for each value of the BY variable */
/* Create macro variables for the percentile values for each BY group */ 
/* Create a macro variable that is the count of the unique
values of the BY variable */ 
data _null_;   
  retain count 0; 
   set test1;   
   by group;
   if first.group then do;
    count+1;
    call symputx('val'||left(count),group); 
    call symputx("&amp;ptile1"||'_'||left(count), x_&amp;ptile1);                                     
    call symputx("&amp;ptile2"||'_'||left(count), x_&amp;ptile2);  
   end; 
  call symput('last',left(count));
run;    
%put _user_;
 
/* Loops through each value of the BY variable */ 
%do i=1 %to &amp;last;
 
data test&amp;i;                                                             
   set test;  
   where group="&amp;&amp;val&amp;i"; 
/* Use an IF statement to subset the data */
   if x le &amp;&amp;&amp;ptile1._&amp;i or x ge &amp;&amp;&amp;ptile2._&amp;i;                                       
run;  
 
proc print;
run; 
 
%end;
%mend;
 
options mprint mlogic symbolgen;
%generate_percentiles(p1,p99)

Calculating percentiles has many applications, including ranking data, finding outliers, and subsetting data. Using a procedure in Base SAS® that enables you to request percentile statistics along with the power of the macro language, you can dynamically generate desired values that can be used for further processing and analysis.

Selecting the top n% and bottom n% of observations from a data set was published on SAS Users.

16
Jun

Using parameters within the macro facility

Using parameters within the macro facilityHave you ever written a macro and wondered if there was an easy way to pass values to the macro? You can by using macro parameters. Macro parameters enable you to pass values into the macro at macro invocation, and set default values for macro variables within the macro definition. In this blog post, I also discuss how you can pass in a varying number of parameter values.

There are two types of macro parameters: positional and keyword.

Positional Parameters

You can use positional parameters to assign values based on their position in the macro definition and at invocation. The order that you use to specify the values must match the order in which they are listed in the %MACRO statement. When specifying multiple positional parameters, use a comma to separate the parameters. If you do not pass a value to the macro when it is invoked, a null value is assigned to the macro variable specified in the %MACRO statement.

Here is an example:

%macro test(var1,var2,var3);                                                                                                            
 %put &amp;=var1;                                                                                                                           
 %put &amp;=var2;                                                                                                                           
 %put &amp;=var3;                                                                                                                           
%mend test;                                                                                                                             
 
/** Each value corresponds to the position of each variable in the definition. **/ 
/** Here, I am passing numeric values.                                         **/                                                            
%test(1,2,3)                                                                                                                            
/** The first position matches with var1 and is given a null value.            **/                                                             
%test(,2,3)                                                                                                                             
/** I pass no values, so var1-var3 are created with null values.               **/                                                             
%test()                                                                                                                                 
/** The first value contains a comma, so I use %STR to mask the comma.         **/                                                             
/** Otherwise, I would receive an error similar to this: ERROR: More           **/
/** positional parameters found than defined.                                  **/                                                             
%test(%str(1,1.1),2,3)                                                                                                                  
/** Each value corresponds to the position of each variable in the definition. **/ 
/** Here, I am passing character values.                                       **/                                                            
%test(a,b,c) 
/** I gave the first (var1) and second (var2) positions a value of             **/
/** b and c, so var3 is left with a null value.                                **/                                                             
%test(b,c)

 

Here are the log results:

173  /** Each value corresponds to the position of each variable in the definition. **/
174  /** Here, I am passing numeric values.                                         **/
175  %test(1,2,3)
VAR1=1
VAR2=2
VAR3=3
176  /** The first position matches with var1 and is given a null value.            **/                                                             
177  %test(,2,3)
VAR1=
VAR2=2
VAR3=3
 
178  /** I pass no values, so var1-var3 are created with null values.               **/
179  %test()
VAR1=
VAR2=
VAR3=
180  /** The first value contains a comma, so I use %STR to mask the comma.         **/                                                             
181  /** Otherwise, I would receive an error similar to this: ERROR: More           **/
182  /** positional parameters found than defined.                                  **/                                                             
183  %test(%str(1,1.1),2,3)
VAR1=1,1.1
VAR2=2
VAR3=3
184  /** Each value corresponds to the position of each variable in the definition. **/
185  /** Here, I am passing character values.                                       **/
186  %test(a,b,c)
VAR1=a
VAR2=b
VAR3=c
187  /** I gave the first (var1) and second (var2) positions a value of             **/
188  /** b and c, so var3 is left with a null value.                               **/
189  %test(b,c)
VAR1=b
VAR2=c
VAR3=

 

Keyword Parameters

The benefit of using keyword parameters is the ability to give the macro variables a default value within the macro definition. When you assign values using keyword parameters, you must include an equal sign after the macro variable name.

Here is an example:

%macro test(color=blue,id=123);                                                                                                         
 %put &amp;=color;                                                                                                                          
 %put &amp;=id;                                                                                                                             
%mend test;                                                                                                                             
 
/** Values passed to the macro overwrite default values from the definition. **/                                                                 
%test(color=red,id=456)                                                                                                                 
/** Passing in no values allows the default values to take precedence.      **/                                                                 
%test()                                                                                                                                 
/** You are not required to pass in a value for each keyword parameter.    **/                                                                 
%test(color=green)                                                                                                                      
/** The order of variables does not matter.                               **/                                                                                                 
%test(id=789,color=yellow)

 

Here are the log results:

270  /** Values passed to the macro overwrite default values from the definition. **/
271  %test(color=red,id=456)
COLOR=red
ID=456
272  /** Passing in no values allows the default values to take precedence.     **/
273  %test()
COLOR=blue
ID=123
274  /** You are not required to pass in a value for each keyword parameter.   **/
275  %test(color=green)
COLOR=green
ID=123
276  /** The order of variables does not matter.                              **/
277  %test(id=789,color=yellow)
COLOR=yellow
ID=789

 

If the macro definition combines positional and keyword parameters, positional parameters must come first. If you do not follow this order, this error is generated:

ERROR: All positional parameters must precede keyword parameters.

 

Here is an example:

%macro test(val,color=blue,id=123);                                                                                                     
 %put &amp;=color;                                                                                                                          
 %put &amp;=id;                                                                                                                             
 %put &amp;=val;                                                                                                                            
%mend test;                                                                                                                             
 
/** The positional parameter is listed first. **/                                                                 
%test(1,color=red,id=456)
 
Here are the log results:
 
318  /** The positional parameter is listed first. **/                                                                 319  %test(1,color=red,id=456)
COLOR=red
ID=456
VAL=1

 

PARMBUFF

The PARMBUFF option creates a macro variable called &SYSPBUFF that contains the entire list of parameter values, including the parentheses. This enables you to pass in a varying number of parameter values. In the following example, you can pass any number of parameter values to the macro. This following example illustrates how to parse each word in the parameter list:

%macro makes/parmbuff; 
  /** The COUNTW function counts the number of words within &amp;SYSPBUFF.            **/                                                                                                                 
   %let cnt=%sysfunc(countw(&amp;syspbuff)); 
  /** The %DO loop increments based on the number of words returned to the macro. **/
  /** variable &amp;CNT.                                                              **/                                
   %do i= 1 %to &amp;cnt;  
  /** The %SCAN function extracts each word from &amp;SYSPBUFF.                      **/                                                                                                                  
     %let make=%scan(&amp;syspbuff,&amp;i);                                                                                                     
     %put &amp;make;                                                                                                                        
   %end;                                                                                                                                
%mend makes;                                                                                                                            
 
%makes(toyota,ford,chevy)

 

Here are the log results:

19  %macro makes/parmbuff;
20    /** The COUNTW function counts the number of words within &amp;SYSPBUFF.            **/
21     %let cnt=%sysfunc(countw(&amp;syspbuff));
22    /** The %DO loop increments based on the number of words returned to the macro  **/
23    /** variable &amp;CNT.                                                              **/
24     %do i= 1 %to &amp;cnt;
25    /** The %SCAN function extracts each word from &amp;SYSPBUFF.                       **/
26       %let make=%scan(&amp;syspbuff,&amp;i);
27       %put &amp;make;
28     %end;
29  %mend makes;
30
31  %makes(toyota,ford,chevy)
toyota
ford
chevy

 

When you specify the PARMBUFF option and the macro definition includes both positional and keyword parameters, the parameters still receive values when you invoke the macro. In this scenario, the entire invocation list of values is assigned to &SYSPBUFF. Here is an example:

%macro test(b,a=300)/parmbuff;                                                                                                      
 %put &amp;=syspbuff;                                                                                                                        
 %put _local_;                                                                                                                          
%mend;                                                                                                                                  
 
%test(200,a=100)

 

Here are the log results:

SYSPBUFF=(200,a=100)
TEST A 100
TEST B 200

 

Notice that &SYSPBUFF includes the entire parameter list (including the parentheses), but each individual parameter still receives its own value.

If you need to know all the parameter values that are passed to the macro, specify the PARMBUFF option in the macro definition to get access to &SYSPBUFF, which contains all the parameter values. For more information about PARMBUFF, see %MACRO Statement in SAS® 9.4 Macro Language: Reference, Fifth Edition.

I hope this blog post has helped you understand how to pass values to a macro. If you have SAS macro questions that you would like me to cover in future blog posts, please comment below.

Using parameters within the macro facility was published on SAS Users.

6
Jun

Combine and conquer with SAS

Combine and conquer with SASThe ancient political maxim “divide and conquer” (Lat. “dīvide et īmpera”) has been used for millennia in politics, sociology, and psychology, mainly to manipulate people. In the last two centuries, however, this principle has also been adopted by computer science. It is widely used in computer programming, primarily in computer algorithms such as binary search, quick sort, Fast Fourier Transform (FFT), parallel processing algorithms – to name a few.

A typical Divide and Conquer algorithm consists of the following steps:

  1. Divide – break the given problem into smaller pieces (sub-problems) of the same type;
  2. Conquer – solve these smaller pieces (sub-problems);
  3. Combine – aggregate the results of the solved sub-problems to deduce the result of the initial problem.

However, in this blog post we’ll reverse the “divide and conquer” principle and go backwards from step three to step two, to combine and then conquer.

In our day-to-day programming activities, we may not need to divide a problem in order to solve it. In many cases, we deal with small pieces of a puzzle (language elements) provided by the SAS programming language, and we solve some larger problems by combining those elements.

Let’s consider some examples.

Combining SAS Formats

Suppose you have a variable x in a SAS table on which you need to report with the following requirement. Depending on the variable value, you will need to apply different SAS formats:

Different SAS formats based on variable values

In other words, you need to display percentages that are greater than 10% as whole numbers, percentages between 1% and 10% as numbers with one decimal point, and percentages that are less than 1% as numbers with two decimal points.

You may get away with proc format’s picture statement with a proper multiplier – but good luck with that. Fortunately, there is a much better way of solving this problem. Did you know that you can use already existing formats as labels in proc format’s value statement? In other words, we can combine several SAS formats while defining another SAS format.

The above problem is easily solved by defining a new format as a combination of different PERCENTw.d formats as follows:

proc format;
   value pctmod
      -0.01 <-< 0.01 = [percent10.2] 
      -0.1 - -0.01, 0.01 - 0.1 = [percent10.1]
      low -< -0.1, 0.1 <- high = [percent10.0]
      ;
run;

Here is what this format does:

Unformatted and formatted values

You can combine any other existing formats as needed, including mixing and matching SAS-supplied formats and user-defined formats. You can even use existing formats within the PICTURE statement of the PROC FORMAT. For example, the following format will format values between 0 and 1 as cents (e.g. ¢75) while values greater than 1 as dollars (e.g. $9,340):

proc format;
   picture dolcent
	  0 <-< 1 = '99' (prefix='¢' mult=100)
	  1 - high = [dollar32.0]
	  ;
run;

Here is what this format does:

Formatted values vs. unformatted values

When using an existing format as a label, make sure to enclose the format name in square brackets, e.g. [date9.], and do not enclose the name of the existing format in single or double quotes.

Using an existing format in the format definition can be construed as format nesting. A nested level of one means that you are creating the format A with the format B as a formatted value. However, if you also create format B with the format C as a formatted value, then you have nested level of two. Avoid nesting formats with a level higher than one, as the resource requirements may increase dramatically with each additional level.

Combining SAS Libraries

Suppose you have defined several SAS data libraries, such as

libname apples 'c:projectsgardenapples';
libname oranges 'c:projectsgardenoranges';
libname tomatoes 'c:projectsgardentomatoes';

You can then logically combine (concatenate) them into a single library:

libname combolib (apples oranges tomatoes);

or

libname combolib ('c:projectsgardenapples' 'c:projectsgardenoranges' 'c:projectsgardentomatoes');

Yes, you can combine apples with oranges, and even with tomatoes, and that will enable you to access the SAS data sets in several libraries with one libref.

As you can see, you can combine two or more libraries by their librefs, physical names, or even a combination of librefs and physical names, such as:

libname combolib (apples oranges 'c:projectsgardentomatoes');

SAS will even allow using commas between the concatenated libraries:

libname combolib (apples, oranges, tomatoes);
libname combolib (apples, oranges, 'c:projectsgardentomatoes');

It is important to know that while SAS has no problem combining multiple libraries that contain same-named files, when you reference such a file, the concatenated libraries are searched in order and the first occurrence of the specified file is used.
This proves to be extremely useful when you have multiple versions of the same tables that you promote between different environments during a system development cycle. For example, you might have a SAS data table named SALES in either some or all of the development, testing, and production environments. You can define librefs for each environment as well as a libref for the combined environment:

libname devllib 'c:projectsproj_xdevl';
libname testlib 'c:projectsproj_xtest';
libname prodlib 'c:projectsproj_xprod';
 libname lastlib (prodlib testlib devllib);

Then when you reference your table as lastlib.SALES SAS will return the SALES table from production environment if it exists there; if not then from testing environment if it exists there; if not then from development environment it exists there. That logical hierarchy means that SAS will return the latest and greatest version of that SALES table. If you want that table from a specific environment, then you reference it using a specific libref: prodlib.SALES, testlib.SALES, or devllib.SALES.

Combining SAS Format Libraries

You can combine SAS format libraries to achieve similar version prioritization as in the above case of SAS data libraries using FMTSEARCH= System Option:

options fmtsearch=(prodlib testlib devllib);

This implies that your SAS formats are stored in the catalogs named FORMATS in the above specified libraries: prodlib.FORMATS, testlib.FORMATS, devllib.FORMATS.

One must remember that if not specified, WORK and LIBRARY libraries by default take precedence in the order in which SAS formats are searched, making the above options statement effectively look like:

options fmtsearch=(work library prodlib testlib devllib);

If WORK and LIBRARY libraries are specified in the fmtsearch= then the default is not applied and the search order is as specified.

Your turn

An unlimited space below is reserved for your contribution to the fascinating topic of combine and conquer with SAS. This is your chance to contribute to the list of coding techniques that you use to combine and conquer solving your SAS coding problems.

 

Combine and conquer with SAS was published on SAS Users.

5
Jun

The new Data Pane in SAS Visual Analytics - It's painless!

Refining your data for effective reports is even easier in the 8.1 release of SAS Visual Analytics. In this blog post, I’ll take a look at the data pane, how it displays data from your active data source, and a few tasks that you might want to perform, such as viewing measure details, changing data item properties, and creating geographic data items, hierarchies, and custom categories.  In a future blog, I’ll look at creating filters, new calculated data items, and calculated aggregations.

In the SAS Visual Analytics interface, you can display the Data pane by clicking the Data icon on the left menu.

Measure details

You can view data item properties and make these changes:

  • The name of a data item.
  • The Classification of Measure or Category data item.
  • The format of a Measure or Date/Time/Datetime data item.
  • The aggregation of a Measure data item.
  • Note:  User-defined formats for Category data items are honored, but cannot be changed.  The user-defined format must be available to the CAS server where the data is loaded.

A Measure classification can be changed to Category or Geography. A Category classification can be changed to Geography.  Date and Tim classifications cannot be changed.

You can modify the aggregation for a Measure data item:

You can modify measure formats:

  • You can expand to display available Duration and Currency formats.
  • Over thirty different Currency formats are available for local or international currency.  (Example: $56,790 or USD56,789)

Geography data items identity a geographic region or location, and are typically used to visualize data on a geographic map.

  • You can create geography data items by using predefined geography classifications such as countries or states.
  • You can create a custom geographic classification by providing latitude and longitude coordinates in your data source.
  • For predefined classifications, the values of your category data items must match the lookup values documented here.

Designating a Geography data item:

Designating a custom Geography data item:

A hierarchy enables you to add drill-down functionality to a report.

  • A hierarchy can display information from general to specific.
  • You can create category, date/datetime, or geography hierarchies.
  • Not all report objects support all types of hierarchies.

  • A custom category is a category data item which associates a set of alphanumeric values with intervals, ranges, or specific values of a data item.
  • A custom category can be based on a category or a measure data item.
  • A custom category can provide functionality in a report similar to that of a user-defined format—without having to have a format previously associated with the data item.

Example:
1-20 = ‘First group’
21-30 = ‘Second group’
31-40 = ‘Third group’

Depending on the report object, some roles require only one value; other roles may require multiple values.

You can right-click on a data item in the Roles pane to remove the data item from a role.

Duplicating a data item can enable you to display data in two different ways.

Duplicating the Date data item with 939 different values can enable you to create a report that analyzes production on each day of the week: Monday, Tuesday, Wednesday,…

Duplicating a measure data item can enable you to use the column as a category data item in reports as well.

Note: You can only delete data items that you have created, such duplicated data items or calculated data items.  Deleting a data item means that it no longer appears in the data pane.

This covers many of the basic tasks that you can complete in the new data pane. In my next blog, I’ll take a look at the visual and text editors for creating filter expressions and calculated items and aggregations in the data pane.

The new Data Pane in SAS Visual Analytics - It's painless! was published on SAS Users.

Back to Top