5
Dec

Jedi SAS Tricks: Pattern Search in DS2

Finding a pattern like a phone number or national ID number embedded in text can be difficult and time consuming. The traditional DATA step has a family of functions (collectively referred to as PRX functions) that allow using Perl regular expressions in your SAS programs to make pattern search easier. [...]

The post Jedi SAS Tricks: Pattern Search in DS2 appeared first on SAS Learning Post.

4
Dec

How to scrape data from a web page using SAS

The internet is rich with data, and much of that data seems to exist only on web pages, which -- for some crazy reason -- are designed for humans to read. When students/researchers want to apply data science techniques to analyze collect and analyze that data, they often turn to "data scraping." What is "data scraping?" I define it as using a program to fetch the contents of a web page, sift through its contents with data parsing functions, and save its information into data fields with a structure that facilitates analysis.

Python and R users have their favorite packages that they use for scraping data from the web. New SAS users often ask whether there are similar packages available in the SAS language, perhaps not realizing that Base SAS is already well suited to this task -- no special bundles necessary.

The basic steps for data scraping are:

  1. Fetch the contents of the target web page
  2. Process the source content of the page -- usually HTML source code -- and parse/save the data fields you need.
  3. If necessary, repeat for subsequent pages. This applies to those web sites that serve up lots of information in paginated form, and you want to collect all available pages of data.

Let's map these steps to the SAS programming language:

For this step Use these features
Get the contents of the web page FILENAME URL
Process/parse the web page contents DATA step, with parsing functions such as FIND, regular expressions via PRXMATCH. Use SAS informats to convert text to native data types.
Repeat across subsequent pages SAS macro language (%DO %UNTIL processing) or DATA step with asked about scraping data from the Center for Disease Control (CDC) web site.

Step 0: Find the original data source and skip the scrape

I'm writing this article at the end of 2017, and at this point in our digital evolution, web scraping seems like a quaint pastime. Yes, there are still some cases for it, which is why I'm presenting this article. But if you find information that looks like data on a web page, then there probably is a real data source behind it. And with the movement toward open data (especially in government) and data-driven APIs (in social media and commercial sites) -- there is probably a way for you to get to that data source directly.

In my example page for this post, the source page is hosted by CDC.gov, a government agency whose mandate is to share information with other public and private entities. As one expert pointed out, the CDC shares a ton of data at its dedicated data.cdc.gov site. Does this include the specific table the user wanted? Maybe -- I didn't spend a lot of time looking for it. However, even if the answer is No -- it's a simple process to ask for it. Remember that web sites are run by people, and usually these people are keen to share their information. You can save effort and achieve a more reliable result if you can get the official data source.

If you find an official data source to use instead, you might still want to automate its collection and import into SAS. Here are two articles that cover different techniques:

Web scraping is lossy, fragile process. The information on the web page does not include data types, lengths, or constraints metadata. And one tweak to the presentation of the web page can break any automated scraping process. If you have no other alternative and you're willing to accept these limitations, let's proceed to Step 1.

Step 1: Fetch the web page

In this step, we want to achieve the equivalent of the "Save As..." function from your favorite web browser. Just like your web browser, SAS can act as an HTTP client. The two most popular methods are:

  • FILENAME URL, which assigns a SAS fileref to the web page content. You can then use INFILE and INPUT to read that file from a DATA step.
  • PROC HTTP, which requires a few more lines of code to get and store the web page content in a single SAS procedure step.

I prefer PROC HTTP, and here's why. First, as a separate explicit step it's easier to run just once and then work with the file result over the remainder of your program. You're guaranteed to fetch the file just once. Though a bit more code, it makes it more clear about what happens under the covers. Next reason: PROC HTTP has been refined considerably in SAS 9.4 to run fast and efficient. Its many options make it a versatile tool for all types of internet interactions, so it's a good technique to learn. You'll find many more uses for it.

Here's my code for getting the web page for this example:

/* Could use this, might be slower/less robust */
* filename src url "https://wwwn.cdc.gov/nndss/conditions/search/";
 
/* I prefer PROC HTTP for speed and flexibility */
filename src temp;
proc http
 method="GET"
 url="https://wwwn.cdc.gov/nndss/conditions/search/"
 out=src;
run;

Step 2: Parse the web page contents

Before you can write code that parses a web page, it helps to have some idea of how the page is put together. Most web pages are HTML code, and the data that is "locked up" within them are expressed in table tags: <table>, <tr>, <td> and so on. Before writing the first line of code, open the HTML source in your favorite text editor and see what patterns you can find.

For this example from the CDC, the data we're looking for is in a table that has 6 fields. Fortunately for us, the HTML layout is very regular, which means our parsing code won't need to worry about lots of variation and special cases.

 <tr >
	<td style="text-align:left;vertical-align:middle;">
    	<a href="/nndss/conditions/chancroid/">
		Chancroid
		</a>
	</td>
	<td class="tablet-hidden"></td>
	<td class="tablet-hidden"></td>
	<td class="tablet-hidden"><i>Haemophilus ducreyi</i></td>
	<td  >1944 </td>
	<td  > Current</td>
</tr>

The pattern is simple. We can find a unique "marker" for each entry by looking for the "/nndss/conditions" reference. The line following (call it offset-plus-1) has the disease name. And the lines that are offset-plus-7 and offset-plus-8 contain the date ranges that the original poster was asking for. (Does this seem sort of rough and "hacky?" Welcome to the world of web scraping.)

Knowing this, we can write code that does the following:

Here's the code that I came up with, starting with what the original poster shared:

/* Read the entire file and skip the blank lines */
/* the LEN indicator tells us the length of each line */
data rep;
infile src length=len lrecl=32767;
input line $varying32767. len;
 line = strip(line);
 if len>0;
run;
 
/* Parse the lines and keep just condition names */
/* When a condition code is found, grab the line following (full name of condition) */
/* and the 8th line following (Notification To date)                                */
/* Relies on this page's exact layout and line break scheme */
data parsed (keep=condition_code condition_full note_to);
 length condition_code $ 40 condition_full $ 60 note_to $ 20;
 set rep;
 if find(line,"/nndss/conditions/") then do;
   condition_code=scan(line,4,'/');
   pickup= _n_+1 ;
   pickup2 = _n_+8;
   /* "read ahead" one line */
   set rep (rename=(line=condition_full)) point=pickup;
   condition_full = strip(condition_full);
 
   /* "read ahead" 8 lines */
   set rep (rename=(line=note_to)) point=pickup2;
   /* use SCAN to get just the value we want */
   note_to = strip(scan(note_to,2,'<>'));
 
   /* write this record */
   output;
  end;
run;

The result still needs some cleanup -- there are a few errant data lines in the data set. However, it should be simple to filter out the records that don't make sense with some additional conditions. I left this as a to-do exercise to the original poster. Here's a sample of the result:

Step 3: Loop and repeat, if necessary

For this example with the CDC data, we're done. All of the data that we need appears on a single page. However, many web sites use a pagination scheme to break the data across multiple pages. This helps the page load faster in the browser, but it's less convenient for greedy scraping applications that want all of the data at once. For web sites that paginate, we need to repeat the process to fetch and parse each page that we need.

Web sites often use URL parameters such as "page=" to indicate which page of results to serve up. Once you know the URL parameter that controls this offset, you can create a SAS macro program that iterates through all of the pages that you want to collect. There are many ways to accomplish this, but I'll share just one here -- and I'll use a special page from the SAS Support Communities (communities.sas.com) as a test.

Aside from the macro looping logic, my code uses two tricks that might be new to some readers.

/* Create a temp folder to hold the HTML files */
options dlcreatedir;
%let htmldir = %sysfunc(getoption(WORK))/html;
libname html "&htmldir.";
libname html clear;
 
/* collect the first few pages of results of this site */
%macro getPages(num=);
 %do i = 1 %to &num.;
   %let url = https://communities.sas.com/t5/custom/page/page-id/activity-hub?page=&i.;
   filename dest "&htmldir./page&i..html";
   proc http 
      method="GET"
	  url= "&url."
	  out=dest;
   run;
 %end;
%mend;
 
/* How many pages to collect */
%getPages(num=3);
 
/* Use the wildcard feature of INFILE to read all */
/* matching HTML files in a single step */
data results;
 infile "&htmldir./*.html" length=len lrecl=32767;
 input line $varying32767. len ;
 line = strip(line);
 if len>0;
run;

The result of this program is one long data set that contains the results from all of the pages -- that is, all of lines of HTML code from all of the web pages. You can then use a single DATA step to post-process and parse out the data fields that you want to keep. Like magic, right?

Where to learn more

Each year there are one or two "web scraping" case studies presented at SAS Global Forum. You can find them easily by searching on the keyword "scrape" from the SAS Support site or from lexjansen.com. Here are some that I found interesting:

How JMP 13 sees the CDC page

If you have access to JMP software from SAS, you could try the File->Internet Open... feature. It's useful for a one-time, ad-hoc parsing step that you can later capture into a JSL script for future use. With Internet Open you can specify a URL and select to open it "as HTML," and then JMP will offer a selection of available tables to import as data. This is definitely worth a try if you have JMP on your desktop, as it can lend some insight about the structure of the page you're trying to scrape.

JMP 13 version of the data, imported

The post How to scrape data from a web page using SAS appeared first on The SAS Dummy.

28
Nov

Learn more about SAS Viya with resources from SAS Education

SAS Viya is an exciting addition to the SAS Platform, allowing you to conduct analysis faster than ever before and providing you the flexibility to utilize open source technologies and generate insights from data in any computing environment. The blog post “Top 12 Advantages of SAS Viya” does a great [...]

The post Learn more about SAS Viya with resources from SAS Education appeared first on SAS Learning Post.

21
Nov

Computing a date from the past (or future) with SAS

Social media has brought anniversary dates to the forefront. Every day, my view of Google Photos or Facebook shows me a collection of photos from exactly some number of years ago to remind me of how good things were back then. These apps are performing the simplest of date-based math to trick me. They think, "We have a collection of photos from this month/day from some previous year -- let's collect those together and then prey on Chris' nostalgia instincts."

Sometimes it works and I share them with friends. Weren't my kids so cute back then? It's true, they were -- but I don't pine for those days. I live in the present and I look towards the future -- as all citizens of the world should.

Looking at back dates in SAS

But you know who likes to look at the past? Managers (and probably a few of my colleagues...). Many of my SAS jobs are in support of date-based reports. They answer questions such as, "what happened in the past 30 days?" or "how much activity in the past 6 months?" When I have SAS date values, going back 30 days is simple. The SAS date for "30 days ago" is simply today()-30. (Because, remember, a SAS date is simply an integer representing the count of days since Jan 1, 1960.)

"6 months ago" is a little more nuanced. I could fudge it by subtracting 183 or 184 from the value of today(), but that's not precise enough for the analytical wonks that I work with. Fortunately, SAS provides a function that can compute any date -- given a starting date -- using just about any criteria you can imagine. The function is called INTNX (

six_mo_ago = intnx(
 'month',  /* unit of time interval */
  today(), /* beginning date */
  -6,      /* number of intervals, negative goes to the past */
  'same'   /* alignment of interval date. "Same" is for same day of month */
 );

If I wanted to go 6 years into the past, I would simply change that first argument to 'year'. 6 days? Change it to 'day'. in %SYSFUNC -- the macro function that breaks out of macro processing to invoke built-in SAS functions. I also need to remove the quotes around the interval and alignment values -- the SAS macro processor will treat these as string literals and would not approve of the quotes.

%let six_mo_ago = 
 %sysfunc(
  intnx(
   month,             /* unit of time interval */
   %sysfunc(today()), /* function to get current date */
   -6,                /* number of intervals, negative goes to the past */
   same               /* alignment of interval date. "Same" is for same day of month */
   )
  );
 %put &=six_mo_ago;

Output (from today's run):

45          %put &=six_mo_ago;
SIX_MO_AGO=20960

Notice the result is a number -- which is exactly what I need for date value comparisons. But what if I wanted a formatted version of the date? The

 %let six_mo_ago_fmt = 
  %sysfunc(
   intnx(
   month,             /* unit of time interval */
   %sysfunc(today()), /* function to get current date */
   -6,                /* number of intervals, negative goes to the past */
   same               /* alignment of interval date. "Same" is for same day of month */
   ), date9.  /* Tell %SYSFUNC how to format the result */
  );
 %put &=six_mo_ago_fmt;

Output (from today's run):

56          %put &=six_mo_ago_fmt;
SIX_MO_AGO_FMT=21MAY2017

Adapting INTNX for SAS datetime values

This computed date works perfectly when my data sets contain SAS date values that I want to filter. For example, I can limit the records to those from the past 6 months with code similar to this:

proc freq data=comm.visits (where=(date > &six_mo_ago.)) 
 noprint 
 ;
tables geo_country_code / out=country_visits_6mo;
run;

But I realized that some of my data sets use datetime values, not date values. A SAS datetime value is simply the number of seconds since midnight on Jan 1, 1960. I've seen programs that adapt the code above by converting the computed cutoff date from a date value to a datetime value -- it's simple to do with math:

proc freq data=comm.visits (where=(event_time > %sysevalf( &six_mo_ago. * 60 * 60 * 24 ))) 
 noprint 
 ;
tables geo_country_code / out=country_visits_6mo;
run;

The

%let six_mo_ago_dt = 
 %sysfunc(
  intnx(
   dtmonth,              /* unit of time interval */
   %sysfunc(datetime()), /* function to get current datetime */
   -6,                   /* number of intervals, negative goes to the past */
   same                  /* alignment of interval date. "Same" is for same day of month/same time */
   )
  );
 %put &=six_mo_ago_dt;

Output (at this moment):

34          %put &=six_mo_ago_dt;
SIX_MO_AGO_DT=1810991917.93526

I can then use the much more readable version of my comparison code:

proc freq data=comm.visits (where=(event_time > &six_mo_ago_dt)) 
 noprint 
 ;
tables geo_country_code / out=country_visits_6mo;
run;

Learning more about date and datetime intervals

First, an acknowledgment. I recently had the privilege of presenting at the Quebec user groups with SAS-world superstar Marje Fecht. Marje is an excellent instructor, and she delivered a popular talk about working with SAS date values -- a topic that trips up many new SAS users. It's useful for me -- someone who has used SAS for a long time -- to see a basic topic presented to me as if I were brand new. Her talk reminded me of some good practices that I was able to bring home and apply in my own production SAS jobs. I know that Marje was invited to present this talk at SAS Global Forum 2018, and I hope that I did not steal too much of her thunder. She has a good origin story about the 01JAN1960 date decision. You should attend the conference and see her talk in person.

The INTNX (and its sister function for computing date differences, INTCK) are powerful tools for manipulating date and datetime values. Other programming languages offer complex code libraries to accomplish what these two functions can do as part of Base SAS. They are tricky to learn at first, but once you get the hang of them they can really simplify your SAS programs that deal with time-based data.

Rick Wicklin presented a useful introduction to both functions in INTCK and INTNX: Two essential functions for computing intervals between dates in SAS.

While these functions are available in Base SAS, they are maintained by the developers who look after SAS/ETS (econometrics and time series). You'll find see this thread on the SAS Support Communities.

The post Computing a date from the past (or future) with SAS appeared first on The SAS Dummy.

10
Nov

Simple Proc Print trick for grouped data

Here's a Proc Print trick for grouped data. Suppose your data is divided into groups, such as males and females. You could sort by the grouping variable before printing, like this: Suppose you want to better emphasize the groups. You could add a BY statement, like this: OK, but, personally, [...]

The post Simple Proc Print trick for grouped data appeared first on SAS Learning Post.

8
Nov

How to format rows of a table in SAS

A SAS programmer wanted to display a table in which the rows have different formats. An example is shown below. The programmer wanted columns that represent statistics and rows that represent variables. She wanted to display formats (such as DOLLAR) for some variables—but only for certain statistics. For example, the number of nonmissing observations (N) should never use the format of the variable, whereas the minimum, maximum, and mean values should.

Format rows of a table

In SAS you can easily apply a format to a column, but not to a row. However, SAS provides two techniques that enable you to control the appearance of cells in tables:

Both PROC TEMPLATE and PROC REPORT have a learning curve. If you are in a hurry, an alternative solution is to use the DATA step to create a factoid. Recall that a factoid is an ODS table that contains character values and (formatted) numbers in the same column. A factoid can display mixed-type data by converting every value to a string. This conversion process gives you complete control over the format of each cell in the table. Consequently, you can use context to format some numbers differently from other numbers.

Creating a custom factoid

Suppose that you want to generate the table shown at the top of this article. You can obtain the statistics from PROC MEANS and then transpose the data. The following statements produce an output data set that contains descriptive statistics for three variables in the Sashelp.Cars data:

proc means data=sashelp.cars noprint;
var Cylinders EngineSize MSRP;
output out=stats(drop=_TYPE_ _FREQ_
                 rename=(_STAT_=Label1)
                 where=(Label1 ^= "STD") );
run;
 
proc print data=stats noobs;
run;
Descriptive statistics with formats

I have highlighted the first row of the output data to indicate why this output is not suitable for a report. The output variables retain the format of the original variables. However, the 'N' statistic is the sample size, and a sample of size '$428' does not make sense! Even '428.000' is a strange value to see for a sample size. It would be great to format the first row with an intrger format such as 8.0.

The next DATA step creates three character variables (CVALUE1-CVALUE3) that contain formatted values of the three numerical variables in the data. The SELECT-WHEN statement does the following:

  • For each observation and for each of the three numerical variables:
    • If the LABEL1 variable is 'N', then apply the 8.0 format to the value of the numerical variable.
    • Otherwise use the VVALUE function to get the formatted value for the numerical variable.
data Factoid;
set stats;
array nValue[3] Cylinders EngineSize MSRP;      /* numerical variables */
array cValue[3] $12.;                           /* cValue[i] is formatted version of nValue[i] */
/* with a little effort you could assign labels dynamically */
label cValue1="Cylinders" cValue2="EngineSize" cValue3="MSRP";
 
do i = 1 to dim(nValue);
   select (Label1);
      when ('N')    cValue[i] = put(nvalue[i], 8.0);
      otherwise     cValue[i] = vvalue(nvalue[i]);
   end;
end;
run;
 
proc print data=Factoid label noobs;
   var Label1 cValue:;
run;

The output displays the character columns, which contain formatted numbers. With additional logic, you could display fewer decimal values for the MEAN row.

Transpose the factoid

The previous table contains the information that the programmer wants. However, the programmer asked to display the transpose of the previous table. For completeness, here are SAS statements that transpose the table:

proc transpose data=Factoid out=TFactoid;
   var cValue:;
   ID Label1;
   IDLABEL Label1;
run;
 
proc print data=TFactoid(drop=_NAME_) noobs label;
   label _LABEL_ = "Variable";
run;

The final table is displayed at the top of the article.

Summary

In summary, this article shows one way to display a column of data in which each cell potentially has a different format. The trick is to create a factoid, which means that you create a character copy of numeric data. As you create the copy, you can use the PUT function to apply a custom format, or you can use the VVALUE function to get the existing formatted value.

In general, you should try to avoid creating copies of data, so use PROC TEMPLATE and PROC REPORT if you know how. However, if you don't know how to use those tools, the factoid provides an alternative way to control the formatted values that appear in a table. Although not shown in this article, a factoid also enables you to display character and numeric values in the same column.

The post How to format rows of a table in SAS appeared first on The DO Loop.

6
Nov

SAS tips and tricks: Users-tell-all edition

SAS Tips and TricksThere is certainly no shortage of terrific tips and tricks in various SAS blogs from some of our most distinguished SAS in-house experts. But, there's another group of equally qualified experts who don't often get to share their expertise on this channel: our customers. So, I went on a quest to get the inside scoop from various SAS users, polling Friends of SAS members to get their feedback on their favorite SAS tips.

We asked a few of these Friends of SAS members who are regular SAS users to share with us their top SAS tips and tricks for improving performance or something they wished they had known earlier in their SAS career. Based on that, we got a wide range of tips and tricks from a number of different SAS users – ranging from novice to expert and across various industries and product users. Check out some of them below:

FUNCTIONS

Functions are either built into SAS itself or you can write your own customized code that act in the same manner, all of which help in analyzing and processing data. There are a variety of function categories that include mathematical, date and time, character, truncation, and miscellaneous. Using functions makes us more efficient, and we don’t have to re-invent the wheel every time we want to figure something out. With this being said, some of our regular SAS users have a thing or two to say about dealing with functions that may help you out:

“Before you program any complex code, look for a SAS function that will do the task for you.”
     - John Ladds, Past President, OASUS

“Insert a line break in a concatenated string, such as: manylines = catx('0a'x,a,b,c);”
     - Aroop Ghosh, Principal Consultant, Webtalk Communications

“Use the lag function to create time related variables, for example, in time punch data”
     - Yolanda, Analyst, TD

“A good trick that I have recently learnt [sic]which can make the code less wordier is using the functions IFN and IFC as an alternative to IF THEN ELSE statements in conditional processing.”
     - Sunny Giroti, Master of Business Analytics Candidate, Schulich School of Business

“IFN can be used in place of IF THEN ELSE to shorten code”
     - Neil Menezes, Senior Business Anlyst, CTFS

“Ron Cody’s link from SAS.COM. It has many SAS function examples.”
     - John Lam, CIBC

SYNTAX/SHORTCUTS/EFFICIENCIES

You know what they say: time is money. So for a SAS programmer, finding shortcuts and ways to work more efficiently and faster are important to get a job done quicker. Here are a few ways SAS users think can make your life easy while working with SAS:

“Use missover to ensure no records are skipped when reading in a file”
     - Scott Bellefeuille, IT Solutions Developer (Merchant Services), TD Bank

“Pressing keys 'Ctrl'+'/' to comment out a line of code.”
     - Bunce Leung, Execution Manager, RBC

“Variable Lists - being able to refer to variables using double dashes to indicate all variables between first and last in a dataset is super useful for many procs. The later versions of being able to use the prefix and colon to indicate all datasets with a prefix is a great shortcut as well.”
     - Fareeza Khurshed, Manager (Statistical Services), Alberta Treasury Board and Finance

“I like to use PERL in SAS for finding stuff in character variables.”
     - Peter Timusk, Statistics Officer, Statistics Canada

“Title "SAS can give you an Inheritance". Have an ODBC driver on your local PC but not on a remote server? No problem. Use rsubmit with the inheritlib option. Your remote server will now inherit the ODBC driver and be able to access a database you thought you could only reach with your PC.”
     - Horst Wolter, Manager, TD Bank

“If you want to speed the processing of your program. Run your join statements on the "work" library. It is must faster.”
     - Estela Tavares, Economist, Statistics Canada

“When dealing with probability, can logistic be used in all cases? Trick Q - as A is N0. What about the times, probability is 0 and 1. What if the data is heavily distributed on 1s and 0s.”
     - Mukul Pandey, Student Business Analytics, Schulich School of Business

“Proc tabulate can perform descriptive statistics better than proc freq and proc means.”
     - Taha Azizi, Senior Business Insight Analyst, TD

Your turn

Were any of these tips and tricks useful? Do you use them already? What are some of your top SAS tips and tricks? Please be sure to share in the comments below!

Looking for more tips and tricks? Check out this video featuring six Canadian SAS programmers, including a few Friends of SAS members, who share some of their favourite SAS programming tips.

About Friends of SAS

If you’re not familiar with Friends of SAS, it is an exclusive online community available only to our Canadian SAS customers and partners to recognize and show our appreciation for their affinity to SAS. Members complete activities called 'challenges' and earn points that can be redeemed for rewards. There are opportunities to build powerful connections, gain privileged access to SAS resources and events, and boost your learning and development of SAS all in a fun environment.

Interested in learning more about Friends of SAS? Feel free to email myself at Natasha.Ulanowski@sas.com or Martha.Casanova@sas.com with any questions or more details.

SAS tips and tricks: Users-tell-all edition was published on SAS Users.

6
Nov

What is a factoid in SAS?

A factoid in SAS is a table that displays numeric and chanracter values in a single column

Have you ever seen the "Fit Summary" table from PROC LOESS, as shown to the right? Or maybe you've seen the "Model Information" table that is displayed by some SAS analytical procedures? These tables provide brief interesting facts about a statistical procedure, hence they are called factoids.

In SAS, a "factoid" has a technical meaning:

  • A factoid is an ODS table that can display numerical and character values in the same column.
  • A factoid displays row headers in a column. In most ODS styles, the row headers are displayed in the first column by using a special style, such as a light-blue background color in the HTMLBlue style.
  • A factoid does not display column headers because the columns display disparate and potentially unrelated information.

I want to emphasize the first item in the list. Since variables in a SAS data set must be either character or numeric, you might wonder how to access the data that underlies a factoid. You can use the ODS OUTPUT statement to look at the data object behind any SAS table, as shown below:

proc loess data=sashelp.cars plots=none;
   model mpg_city = weight;
   ods output FitSummary=Fit(drop=SmoothingParameter);
run;
 
proc print data=Fit noobs;
run;
The data object that underlies a factoid in SAS

The PROC PRINT output shows how the factoid manages to display characters and numbers in the same column. The underlying data object contains three columns. The LABEL1 column contains the row headers, which identify each row. The CVALUE1 column is the column that is displayed in the factoid. It is a character column that contains character strings and formatted copies of the numbers in the NVALUE1 column. The NVALUE1 column contains the raw numeric value of every number in the table. Missing values represent rows for which the table displays a character value.

All factoids use the same naming scheme and display the LABEL1 and CVALUE1 columns. The form of the data is important when you want to use the numbers from a factoid in a SAS program. Do not use the CVALUE1 character variable to get numbers! Those values are formatted and possibly truncated, as you can see by looking at the "Smoothing Parameter" row. Instead, read the numbers from the NVALUE1 variable, which stores the full double-precision number.

For example, if you want to use the AICC statistic (the last row) in a program, read it from the NVALUE1 column, as follows:

data _NULL_;
   set Fit(where=( Label1="AICC" ));  /* get row with AICC value */
   call symputx("aicc", NValue1);     /* read value of NValue1 variable into a macro */
run;
%put &=aicc;                          /* display value in log */
AICC=3.196483775

Some procedures produce factoids that display multiple columns. For example, PROC CONTENTS creates the "Attributes" table, which is a factoid that displays four columns. The "Attributes table displays two columns of labels and two columns of values. When you use the ODS OUTPUT statement to create a data set, the variables for the first two columns are LABEL1, CVALUE1, and NVALUE1. The variables for the second two columns are LABEL2, CVALUE2, and NVALUE2.

Be aware that the values in the LABEL1 (and LABEL2) columns depend on the LOCALE= option for your SAS session. This means that some values in the LABEL1 column might be translated into French, German, Korean, and so forth. When you use a WHERE clause to extract a value, be aware that the WHERE clause might be invalid in other locales. If you suspect that your program will be run under multiple locales, use the _N_ automatic variable, such as if _N_=14 then call symputx("aicc", NValue1);. Compared with the WHERE clause, using the _N_ variable is less readable but more portable.

Now that you know what a factoid is, you will undoubtedly notice them everywhere in your SAS output. Remember that if you need to obtain numerical values from a factoid, use the ODS OUTPUT statement to create a data set. The NVALUE1 variable contains the full double-precision numbers in the factoid. The CVALUE1 variable contains character values and formatted versions of the numbers.

The post What is a factoid in SAS? appeared first on The DO Loop.

1
Nov

Evaluate a function by using the function name in SAS/IML

A SAS/IML programmer asked whether you can pass the name of a function as an argument to a SAS/IML module and have the module call the function that is passed in. The answer is "yes." The basic idea is to create a string that represents the function call and then use CALL EXECUTE to evaluate the function.

Suppose that you want to create a "super function" (call it EvalFunc) that can compute several univariate statistics such as the mean, the median, the standard deviation, and so forth. Each of these descriptive statistics are calculated by a SAS/IML function that takes one argument. You can therefore define the EvalFunc function to take two arguments: the name of a SAS/IML function and a vector of data. For example, the syntax m = EvalFunc("mean", x) would evaluate mean(x) whereas the syntax s = EvalFunc("std", x) would evaluate std(x). The following SAS/IML function creates a string of the form "z=func(x);" and uses CALL EXECUTE to run that statement:

proc iml;
start EvalFunc(func, x);
   cmd = "z = " + func + "(x);";    /* z = func(x); */
   call execute( cmd );
   return z;
finish;

You can test the EvalFunc module by passing the names of some SAS/IML functions that compute descriptive statistics:

y = {1,2,3,4,5,6,7,8,20};
stats = {"mean" "median" "std" "var" "skewness" "kurtosis"};
results = j(1, ncol(stats), .);
do i = 1 to ncol(stats);
   results[i] = EvalFunc(stats[i], y);
end;
 
print results[colname=stats];

In the example, there is no error checking. It assumes that you pass in a valid function name that takes one argument. Notice that CALL EXECUTE runs immediately in SAS/IML and runs in the local scope. This is in contrast to the way that CALL EXECUTE works in the SAS DATA step. The DATA step pushes statements onto a queue and executes the queue after the DATA step completes.

If you add some IF-THEN/ELSE logic, you could support alternate values of the func argument. For example, if upcase(func)="Q1", you could define cmd = "call qntl(z, x, 0.25);". When you pass that string to CALL EXECUTE, it calls the QNTL function to compute the first quantile (25th percentile).

If you use the SAS/IML Studio application, you can use the ALIAS statement to directly call a function based on the name and signature of that function. The ALIAS function is not supported in PROC IML.

In summary, this article shows that you can pass the name of a function to a SAS/IML module and use CALL EXECUTE to evaluate the function.

The post Evaluate a function by using the function name in SAS/IML appeared first on The DO Loop.

30
Oct

A SAS programming technique to modify ODS templates

This article demonstrates a SAS programming technique that I call Kuhfeld's template modification technique. The technique enables you to dynamically modify an ODS template and immediately call the modified template to produce a new graph or table. By following the five steps in this article, you can implement the technique in less than 20 lines of SAS code.

Kuhfeld's Template Modification Technique (TMT)

The template modification technique (TMT) is one of several advanced techniques that Warren Kuhfeld created and promoted in a series of articles that culminated with Kuhfeld (2016). (Slides are also available.) The technique that I call the TMT is a DATA _NULL_ program that modifies a SAS-supplied template, uses CALL EXECUTE to compiles it "on the fly," and immediately calls the modified template to render a new graph (pp. 12ff in the paper; p. 14 of his slides).

When I attended Warren's 2016 presentation, I felt a bit overwhelmed by the volume of information. It was difficult for me to see the forest because of all the interconnected "trees." Graph customization is easier to understand if you restrict your attention only to GTL templates. You can make simple modifications to a template (for example, changing the value of an option) by using the following five steps:

  1. Prepend a private item store to the ODS search path. The private item store will store the modified template.
  2. Write the existing template to a temporary text file.
  3. Run a DATA step that reads the existing template one line at a time. Check each line to see if it contains the option that you want to change. If so, modify the line to change the template's behavior.
  4. Compile the modified template into the private item store.
  5. Call PROC SGRENDER to render the graph by using the new template.

This article is intentionally brief. For more information see the chapter "ODS Graphics Template Modification" in the SAS/STAT documentation.

Most of Kuhfeld's examples use the TMT to modify ODS templates that are distributed with SAS. These templates can be complex and often contain conditional logic. In an effort to produce a simpler example, the next section creates a simple template that uses a heat map to display a response variable on a uniform grid.

Example: A template that specifies a color ramp

Programmers who use the GTL to create graphs have probably encountered dynamic variables and run-time macro variables. You can use these to specify certain options at run time, rather than hard-coding information into a template. However, the documentation states that "dynamic variables ... cannot resolve to statement or option keywords." In particular, the GTL does not support changing the color ramp at run time: whatever color ramp is hard-coded in the template is the color ramp that is used. (Note: If you have SAS 9.4M3, the HEATMAPPARM statement in PROC SGPLOT supports specifying a color ramp at run time.)

However, you can use Kuhfeld's TMT to modify the color ramp at run time. The following GTL defines a template named "HEATMAP." The template uses the HEATMAPPARM statement and hard-codes the COLORMODEL= option to be the "ThreeColorRamp" in the current ODS style. The template is adapted from the article "Creating a basic heat map in SAS."

/* Construct example template: a basic heat map with continuous color ramp */
proc template; 
define statgraph HEATMAP;
dynamic _X _Y _Z;           /* dynamic variables */
 begingraph;
 layout overlay;
    heatmapparm x=_X y=_Y colorresponse=_Z /  /* specify variables at run time */
       name="heatmap" primary=true xbinaxis=false ybinaxis=false
       colormodel = THREECOLORRAMP;           /* <== hard-coded color ramp */
    continuouslegend "heatmap";
  endlayout;
endgraph;
end;
run;

When the PROC TEMPLATE step runs, the HEATMAP template is stored in a location that is determined by your ODS search path. You can run ods path show; to display the search path. You can discover where the HEATMAP template is stored by running proc template; list HEATMAP; run;.

The following DATA step defines (x,y) data on a uniform grid and defines z to be a cubic function of x and y. The heat map shows an approximate contour plot of the cubic function:

/* Construct example data: A cubic function z = f(x,y) on regular grid */
do x = -1 to 1 by 0.05;
   do y = -1 to 1 by 0.05;
      z = x**3 - y**2 - x + 0.5;
      output;
   end;
end;
run;
 
/* visualize the data using the built-in color ramp */
proc sgrender data=Cubic template=Heatmap; 
   dynamic _X='X' _Y='Y' _Z='Z';
run;
Heat map rendered with original template

The remainder of this article shows how to use Kuhfeld's TMT to create a new template which is identical to the HEATMAP template except that it uses a different color ramp. Kuhfeld's technique copies the existing template line by line but replaces the text "THREECOLORRAMP" with text that defines a new custom color ramp.

In the following program, capital letters are used for the name of the existing template, the name of the new template, and the value of the new color ramp. Most of the uncapitalized terms are "boilerplate" for the TMT, although each application will require unique logic in the DATA _NULL_ step.

Step 1: Prepend a private item store to the ODS search path

PROC TEMPLATE writes templates to the first (writable) item store on the ODS search path. PROC SGRENDER looks through the ODS search path until it locates a template with a given name. The first step is to prepend a private item store to the search path, as shown by the following statement:

/* 1. Modify the search path for ODS. Prepend an item store in WORK */
ods path (prepend) work.modtemp(update);

The ODS PATH statement ensures that the modified template will be stored in WORK.MODTEMP. You can provide any name you want. The name WORK.TEMPLAT is a convention that is used in the SAS documentation. Because you might already have WORK.TEMPLAT on your ODS path, I used a different name. (Note: Any item stores in WORK are deleted when SAS exits.) The SAS/STAT documentation provides more information about the template search path.

Step 2: Write the existing template to a temporary text file

You need to write the source code for the HEATMAP template to a plain text file so that it can be read by the SAS DATA step. You can use the FILENAME statement to point to a hard-coded location (such as filename t "C:/temp/tmplt.txt";), but I prefer to use the SAS keyword TEMP, which is more portable. The TEMP keyword tells SAS to create a temporary file name in a location that is independent of the operating system and the file system:

/* 2. Write body of existing template to text file */
filename _tmplt TEMP;             /* create temporary file and name */
proc template;
   source HEATMAP / file=_tmplt;  /* write template source to text file */
run;

If you omit the FILE= option, the template source is written to the SAS log. The text file contains only the body of the HEATMAP template.

Steps 3 and 4: Create a new template from the old template

The next step reads and modifies the existing template line by line. There are many SAS character functions that enable you to discover whether some keyword exists in a line of text. Examples include the INDEX and FIND family of functions, the PRXCHANGE function (which uses regular expressions), and the TRANWRD function.

The goal of the current example is to replace the word "THREECOLORRAMP" by another string. The TRANWRD function replaces the string if it is found; the line is unchanged if the word is not found. Therefore it is safe to call the TRANWRD function on every line in the existing template.

You can use the CALL EXECUTE function to place each (possibly modified) line of the template onto a queue, which will be executed when the DATA step completes. However, the text file does not start with a PROC TEMPLATE statement nor end with a RUN statement. You must add those statements to the execution queue in order to compile the template, as follows:

/* one possible way to specify a new color ramp */
%let NEWCOLORRAMP = (blue cyan yellow red);
 
/* 3. Read old template; make modifications.
   4. Use CALL EXECUTE to compile (modified) template with a new name. */
data _null_;                       /* Modify graph template to replace default color ramp */
   infile _tmplt end=eof;          /* read from text file; last line sets EOF flag to true */
   input;                          /* read one line at a time */
   if _n_ = 1 then 
       call execute('proc template;');      /* add 'PROC TEMPLATE;' before */
 
   if findw(_infile_, 'define statgraph') then
      _infile_ = "define statgraph NEWHEATMAP;";  /* optional: assign new template name */
   /* use TRANWRD to make substitutions here */
   _infile_ = tranwrd(_infile_, 'THREECOLORRAMP', "&NEWCOLORRAMP"); /* replace COLORMODEL= option */
 
   call execute(_infile_);                  /* push line onto queue; compile after DATA step */
   if eof then call execute('run;');        /* add 'RUN;' at end */
run;

The DATA step copies the existing template source, but modifies the template name and the COLORMODEL= option. The CALL EXECUTE statements push each (potentially modified) statement onto a queue that is executed when the DATA step exits. The result is that PROC TEMPLATE compiles a new template and stores it in the WORK.MODTEMP item store. Recall that the TRANWRD function and other string-manipulation functions are case sensitive, so be sure to use the exact capitalization that exists in the template. In other words, this technique requires that you know details of the template that you wish to modify!

If you are using this technique to modify one of the built-in SAS templates that are used by SAS procedure, do not modify the template name.

Inspect the modified template

Before you attempt to use the new template, you might want to confirm that it is correct. The following call to PROC TEMPLATE shows the location and creation date for the new template. The source for the template is displayed in the log so that you can verify that the substitutions were made correctly.

proc template;
   list NEWHEATMAP / stats=created; /* location of the modified template */
   source NEWHEATMAP;       /* confirm that substitutions were performed */
run;

Step 5: Render the graph by using the new template

The final step is to call PROC SGRENDER to use the newly created template to create the new graph:

/* 5. Render graph by using new template */
proc sgrender data=Cubic template=NEWHEATMAP; 
   dynamic _X='X' _Y='Y' _Z='Z';
run;
Render graph with modified template

The output demonstrates that the original color ramp ("ThreeColorRamp") has been replaced by a four-color ramp that uses the colors blue, cyan, yellow, and red.

When is this necessary?

You might wonder if the result is worth all this complexity and abstraction. Couldn't I have merely copied the template into a program editor and changed the color ramp? Yes, you can use a program editor to modify a template for your personal use. However, this programming technique enables you to write macros, tasks, and SAS/IML functions that other people can use. For example, you could encapsulate the program in this article into a %CreateHeatmap macro that accepts a color ramp and data set as an argument. Or in SAS/IML you can write modules such as the HEATMAPCONT and HEATMAPDISC subroutines, which enable SAS/IML programmers to specify any color ramp to visualize a matrix.

If you work in a regulated industry and are modifying templates that SAS distributes, this technique provides reproducibility. A reviewer or auditor can run a program to reproduce a graph.

Summary

Kuhfeld's template modification technique (TMT) enables you to write a SAS program that modifies an ODS template. The TMT combines many SAS programming techniques and can be difficult to comprehend when you first see it. This article breaks down the technique into five simpler steps. You can use the TMT to edit ODS templates that are distributed by SAS and used in SAS procedures, or (as shown in this article) to change options at run time for a template that you wrote yourself. You can download the SAS program for this article.

Happy Halloween to all my readers. I hope you'll agree that Kuhfeld's TRICKS are a real TREAT!

The post A SAS programming technique to modify ODS templates appeared first on The DO Loop.

Back to Top