4
Aug

Viewing SAS catalogs and formats in SAS Enterprise Guide

Many years ago I shared a custom task that allows you to view and manage SAS catalogs within SAS Enterprise Guide. As a reminder, a SAS catalog is a special type of SAS file that acts as a container, like a folder, for a variety of content items called catalog entries. Conceptually, this is like a folder in a file system with a diverse collection of file types. The entries and their uses are specific to SAS, and they include things like SOURCE (program code), SCL (SAS Component Language) programs, SAS format definitions, compiled SAS macro programs, and more.

Most catalog entries aren't that interesting for SAS Enterprise Guide users -- with the notable exception of SAS format definitions. Beginning with the day that I shared the task on my blog, my readers began asking for the ability to see inside a SAS format definition. I replied, "great idea! I'll put that on my to-do list." Here it is, 7 years later -- and I've got it done. I am a man of my word.

>> Download the task from here (ZIP file, with README text file instructions)

Pay special attention to the README instructions to deploy the correct version, and make sure you unblock the task DLL if needed.

And more good news -- this task is scheduled to ship as a built-in task (on the Tools menu) in the next release of SAS Enterprise Guide, coming soon. The custom task download will support SAS Enterprise Guide versions 4.3 through 7.13. Later versions will have the task already available!

Key features of the Catalog and Formats Explorer

Like the original task that I shared, this revised task allows you to explore SAS libraries and the catalogs that they contain. You can view the metadata about every catalog entry -- name, type, description, created and modified dates. You can also delete entries that you need to clean up. And you can view the contents of certain types of entries, mainly SOURCE entries and the format entry types. Here are the main enhancements:

  • You can view the contents of FORMAT, FORMATC, INFMT, INFMTC entries
  • The window displays “modeless” – you can have it showing while you still work on other tasks, such as your SAS programs. Define a format in code, run it, and then click Refresh in the task to see it.
  • You'll find special entry for "User-defined formats", which shows just the catalogs that contain format definitions within the FMTSEARCH path. You can still navigate any library to find catalogs that contain formats.

Here's a screenshot of the task. You can also watch my 5-minute video demonstration of the task in action on SAS Support Communities.

For developers: How it was built

Since the original task was shared as a companion example for my custom tasks book, I've also shared the source code that goes into these enhancements. You can find these in the same GitHub repository that contains the original version. Adding these enhancements makes use of other techniques that I've already shared on this blog.

To generate a text report of a format definition, the task submits a PROC FORMAT step with a SELECT statement. Here's an example:

filename fmtout temp;  
%let _catexp1 = %sysfunc(getoption(center));  
%let _catexp2 = %sysfunc(getoption(date));  
%let _catexp3 = %sysfunc(getoption(number));  
%let _catexp4 = %sysfunc(getoption(pagesize));  
options nocenter nodate nonumber pagesize=max;  
ods listing file=fmtout; 
proc format fmtlib lib=dbmlnx.formats; 
 select ab; 
run; 
ods listing close; 
options &_catexp1. &_catexp2. &_catexp3. PS=&_catexp4.;

With the listing file generated, the task uses code similar to this Windows PowerShell example to download the contents of the output. You might notice that the SAS code sets a few system options, but it stores and later replaces the option values so that your other programs won't be affected.

If you like to tinker, feel free to download the source code and see how it works. If you want to learn more about custom tasks in SAS Enterprise Guide, view my Ask the Expert session, available on-demand from here.

The post Viewing SAS catalogs and formats in SAS Enterprise Guide appeared first on The SAS Dummy.

30
May

The BEST D formats that you have ever seen!

You probably know about the w.d and BESTw. formats. They have been around throughout my entire 38-year history of using SAS. Do you also know about the Dw.p and BESTDw.p formats? You might find that they are the best D formats around and are even better than BEST! The Dw.p. and BESTDw.p formats work well when the range of values is sufficiently large that different values need to be formatted by using different w.d formats. Specifying BESTDw.p (where p = w - 1) is often better than specifying BESTw. for columns of numbers, since the decimal does not shift when the last digit is 0.

     d10.1      d10.3      d10.6      d10.9    best10.  bestd10.9       10.5
 
2.06115E-9 2.06115E-9 2.06115E-9 2.06115E-9 2.06115E-9 2.06115E-9    0.00000
 5.6028E-9  5.6028E-9  5.6028E-9  5.6028E-9  5.6028E-9  5.6028E-9    0.00000
  1.523E-8   1.523E-8   1.523E-8   1.523E-8   1.523E-8   1.523E-8    0.00000
 0.0000008 8.31529E-7 8.31529E-7 8.31529E-7 8.31529E-7 8.31529E-7    0.00000
 0.0000167  0.0000167 0.00001670 0.00001670  0.0000167 0.00001670    0.00002
 0.0000454  0.0000454 0.00004540 0.00004540  0.0000454 0.00004540    0.00005
    0.0001  0.0001234 0.00012341 0.00012341 0.00012341 0.00012341    0.00012
    0.0498    0.04979  0.0497871 0.04978707 0.04978707 0.04978707    0.04979
    1.0000    1.00000   1.000000 1.00000000          1          1    1.00000
    2.7183    2.71828   2.718282 2.71828183 2.71828183 2.71828183    2.71828
    7.3891    7.38906   7.389056 7.38905610  7.3890561 7.38905610    7.38906
   54.5982   54.59815  54.598150 54.5981500   54.59815 54.5981500   54.59815
  403.4288  403.42879   403.4288 403.428793 403.428793 403.428793  403.42879
 8103.0839       8103  8103.0839 8103.08393 8103.08393 8103.08393 8103.08393
     22026      22026   22026.47 22026.4658 22026.4658 22026.4658 22026.4658
  24154953   24154953   24154953 24154952.8 24154952.8 24154952.8 24154952.8
  65659969   65659969   65659969 65659969.1 65659969.1 65659969.1 65659969.1

Here is a review of some of the basic formats for displaying numbers.

w.d displays a numeric value in a field w positions wide, with d positions to the right of the decimal place, and with w - d - 1 positions to the left of the decimal place. Use it when you know the range of value (such as correlation coefficients) or when you know the typical range (such as t statistics).

Example:

x = 100 / 3;
y = -9.9999;
z = -12.3333333;
put x 7.3 / y 7.4 / z 7.4;
 33.333
-9.9999
-12.333

Notice that the decimal automatically shifts for extreme values.

BESTw. displays numeric values, in a field w positions wide, using the best format for each particular value. Decimals might not align for many of the values. This can make results difficult to read.

Example:

x1 = 100 / 3;
x2 = x1 ** 2;
x3 = 1 / x1;
x4 = x1 / 10;
x5 = 12;
format x1-x5 BEST8.;
put x1 / x2 / x3 / x4 / x5;
33.33333
1111.111
0.03
3.333333
12

Dw.p displays numeric values, in a field w positions wide, possibly with a great range of values, lining up decimals for values of similar magnitude. The precision parameter is p. The Dw.p format selects a w.d (or possibly Ew.) format for each value, with the number of decimal points d chosen so that values of similar magnitude are displayed with decimal points that line up within a column of values. For a given field width w, this internal selection of w.d formats is made subject to the precision constraint specified by p. That is, every value will be displayed with at least p significant digits shown (counting digits both to the left and the right of the decimal point).

Example:

x1 = 100 / 3;
x2 = x1 ** 2;
x3 = 1 / x1;
x4 = x1 / 10;
x5 = 12;
format x1-x5 D8.;
put x1 / x2 / x3 / x4 / x5;
33.3333
1111.1
0.0300
3.3333
12.0000

BESTDw.p displays numeric values, in a field w positions wide, lining up decimals for values of similar magnitude, and displays integers without decimals. As the name implies, this format combines the BESTw. format for integers and the Dw.p format for nonintegers.

Example:

x1 = 100 / 3;
x2 = x1 ** 2;
x3 = 1 / x1;
x4 = x1 / 10;
x5 = 12;
format x1-x5 BestD8.;
put x1 / x2 / x3 / x4 / x5;
33.3333
1111.1
0.0300
3.3333
12

Example:

x1 = 100 / 3;
x2 = x1 ** 2;
x3 = 1 / x1;
x4 = x1 / 10;
x5 = 12;
format x1-x5 BestD8.7;
put x1 / x2 / x3 / x4 / x5;
33.33333
1111.111
0.030000
3.333333
12

Specifying BESTDw.p (where p = w - 1) is better than specifying BESTw. for columns, since decimals do not shift when the last digit is 0.

Example:

proc iml;
   x     = {-2 -1.2 -.52 0 .5 1.3 1.5 2.4 3 3.5 4 4.5 5 6 7 8};
   knots = {-7 -4 -2 0 5 8 12 17};
   b     = bspline(x,4,knots);
   print b[format=bestd7.6 label='BestD7.6'] '  '
         b[format=best7. label='Best7.'];
quit;
 
BestD7.6                                   Best7.
 
0.16534 0.59550 0.23099 0.00816       0    0.16534  0.5955 0.23099 0.00816       0
0.16534 0.59550 0.23099 0.00816       0    0.16534  0.5955 0.23099 0.00816       0
0.16534 0.59550 0.23099 0.00816       0    0.16534  0.5955 0.23099 0.00816       0
0.16534 0.59550 0.23099 0.00816       0    0.16534  0.5955 0.23099 0.00816       0
0.10848 0.55622 0.31543 0.01986 7.66E-6    0.10848 0.55622 0.31543 0.01986 7.66E-6
0.04958 0.45179 0.44110 0.05718 0.00035    0.04958 0.45179  0.4411 0.05718 0.00035
0.03970 0.42132 0.46769 0.07067 0.00062     0.0397 0.42132 0.46769 0.07067 0.00062
0.01209 0.28103 0.55021 0.15260 0.00407    0.01209 0.28103 0.55021  0.1526 0.00407
0.00423 0.19585 0.56541 0.22458 0.00993    0.00423 0.19585 0.56541 0.22458 0.00993
0.00134 0.13569 0.55242 0.29216 0.01839    0.00134 0.13569 0.55242 0.29216 0.01839
0.00026 0.08757 0.51761 0.36319 0.03137    0.00026 0.08757 0.51761 0.36319 0.03137
0.00002 0.05202 0.46436 0.43334 0.05025    0.00002 0.05202 0.46436 0.43334 0.05025
      0 0.02813 0.39792 0.49737 0.07659          0 0.02813 0.39792 0.49737 0.07659
      0 0.02813 0.39792 0.49737 0.07659          0 0.02813 0.39792 0.49737 0.07659
      0 0.02813 0.39792 0.49737 0.07659          0 0.02813 0.39792 0.49737 0.07659
      0 0.02813 0.39792 0.49737 0.07659          0 0.02813 0.39792 0.49737 0.07659

Dw.p Format Details. Dw.p does not imply w.d. The p specification is a precision specification. It is not a number-of-decimal-places (d) specification. First, note that the effective values for p range from 1 to 9. Dw. (no p), Dw.0, and Dw.3 all mean the same thing: p=3. Here is a column of numbers formatted 6 different ways, illustrating various choices of p:

     d10.1      d10.2      d10.3      d10.5      d10.7      d10.9
 
2.06115E-9 2.06115E-9 2.06115E-9 2.06115E-9 2.06115E-9 2.06115E-9
 0.0000061  0.0000061 6.14421E-6 6.14421E-6 6.14421E-6 6.14421E-6
 0.0000167  0.0000167  0.0000167 0.00001670 0.00001670 0.00001670
    7.3891     7.3891    7.38906   7.389056  7.3890561 7.38905610
   20.0855    20.0855   20.08554  20.085537  20.085537 20.0855369
   54.5982    54.5982   54.59815  54.598150  54.598150 54.5981500
  148.4132   148.4132  148.41316    148.413  148.41316 148.413159
  403.4288   403.4288  403.42879    403.429  403.42879 403.428793
 1096.6332  1096.6332       1097   1096.633  1096.6332 1096.63316
 8103.0839  8103.0839       8103   8103.084  8103.0839 8103.08393
     22026      22026      22026  22026.466  22026.466 22026.4658
     59874      59874      59874  59874.142  59874.142 59874.1417
    162755     162755     162755     162755  162754.79 162754.791
    442413     442413     442413     442413  442413.39 442413.392
   1202604    1202604    1202604    1202604  1202604.3 1202604.28
   8886111    8886111    8886111    8886111  8886110.5 8886110.52
  24154953   24154953   24154953   24154953   24154953 24154952.8
  65659969   65659969   65659969   65659969   65659969 65659969.1
 178482301  178482301  178482301  178482301  178482301  178482301
 485165195  485165195  485165195  485165195  485165195  485165195

Some of the results are as follows:

  • Display at least 1 significant digit. When p = 1 (the minimum) , there are relatively few groups of values and decimals that align within those groups. Hence decimals appear in fewer of the w positions, and as a result, some values might be displayed with relatively less precision. In this case, there is one group of values that is displayed with the Ew. format and three additional groups (decimals in positions 3, 6, and not displayed).
  • Display at least 2 significant digits. When p = 2, there tend to be more ranges of values where decimals align. Hence, decimals might appear in more of the w positions, and values tend to be displayed with more precision. In this case, there is one group of values that is displayed with the Ew. format and three additional groups (decimals in positions 3, 6, and not displayed).
  • Display at least 3 significant digits. When p = 3 (the default), there will tend to be still more ranges of values where decimals align. Hence, decimals might appear in still more of the w positions, and values tend to be displayed with still more precision. In this case, there is one group of values that is displayed with the Ew. format and three additional groups (decimals in positions 3, 5, and not displayed). Notice that some values are displayed with more precision than with smaller values of p.
  • Display at least 9 significant digits. When p = 9 (the maximum), there will tend to be still more ranges of values where decimals align. Hence, decimals can appear in all but the first and last of the w positions, and values will be displayed with maximum precision. Decimals will not align for many values. In this case, there is one group of values that is displayed with the Ew. format and eight additional groups (decimals in positions 2, 8 and not displayed). Notice that most values are displayed with more precision than with smaller values of p.

As p increases, more different w.d formats are likely to get chosen, values tend to be displayed with more precision, and decimals are less likely to align. The choice of p and w are related. If you want to ensure more worst-case precision, then a wider field width is appropriate. Also, one advantage of Dw.d is that it allows more moderate width columns. If all you have is w.d, and you want to ensure reasonable precision for values of different scales, they you're pretty much forced to use very wide columns with a large number of decimal places and hope for the best.

Submit the following steps to see more values that are formatted using a variety of D and other formats. The table of formatted numbers displayed above is a subset of the results displayed by these steps.

data x(drop=i j l f);
   array d[13] d0-d12;
   length l f $ 200;
   do j = 0 to 9;
      l = catx(' ', l, cats('d', j), '=', cats('"d10.',j,'"'));
      f = catx(' ', f, cats('d', j), cats('d10.',j));
   end;
   l = catx(' ', 'label',  l, 'd10 = "best10." d11 = "bestd10.9" d12 = "10.5";');
   f = catx(' ', 'format', f, 'd10 best10. d11 bestd10.9 d12 10.5;');
   call symputx('lf', catx(' ', l, f));
   do i = -20 to 20;
      d0 = exp(i);
      do j = 2 to 13; d[j] = d0; end;
      output;
   end;
run;
 
options ls=142;
proc print noobs label; &lf run;

Letting ODS pick the format. In an ODS table template, you can use the CHOOSE_FORMAT= option to choose a format. Use it with the FORMAT_WIDTH=w option to choose a good format for the specified width (w).

CHOOSE_FORMAT=MAX_ABS
picks a good format for a single number based on the maximum absolute value. This is the simplest case.

CHOOSE_FORMAT=MAX
picks a good format for a single number based on the maximum. Values are all expected to be nonnegative so no space is reserved for a minus sign.

CHOOSE_FORMAT=MIN_MAX
picks a good format for a single number based on the minimum and the maximum. Values can be mixed, positive and negative. This option provides room for a minus sign only where it is actually needed. So if all values happen to be positive, no room is reserved for a minus sign. If values range from say -5 to 10, no extra column is needed for the minus sign. If you also specify FORMAT_NDEC=ndec in PROC TEMPLATE, ndec provides a ceiling on the maximum number of decimal places.

CHOOSE_FORMAT=COMPROMISE
picks a good w.d format for a column of numbers if they are not too heterogeneous.

History. The w.d and BESTw. formats have been around for decades. Many SAS formats were written by Rick Langston, although the original mainframe w.d and BESTw. formats precede Rick. The functionality that underlies the Dw.p format has also been around for decades, but it has not always been available as a format. Mark Little wrote it, and for many years it was only available for procedures to use internally. When SAS started making output using ODS, then that functionality was surfaced as the Dw.p format. The BESTDw.p format came much later.

Conclusions. When you explicitly specify a format, you typically specify w.d when the values have a known range (such as correlation coefficients) or Dw.p when values are more diverse. You might instead use BESTDw.p when you expect a mix of integers and nonintegers. BESTw. is not the best format for columns; it is best when used for scalars. Ew. is useful for large or small numbers or when extra precision is desired. Dw.p. and BESTDw.p work well when the range of values is large. Experiment with the precision parameter p to get the best display of your results. In particular, setting p to w - 1 and specifying BESTDw.p provides a good alternative to the BESTw. format.

The BEST D formats that you have ever seen! was published on SAS Users.

29
Jan

The data behind jail breaks

You've probably seen headlines about the recent jail break in California. Do you think they'll catch those guys? Is there any data available about recapturing escapees? You betcha! ... And do they need some help graphing that data? It appears so! But before we get started on the graphs, here's […]

The post The data behind jail breaks appeared first on SAS Learning Post.

18
Aug

Why should we teach Roman numerals?

In my local paper this morning, I read about how a North Carolina state commission plans to recommend changes to our teaching standards for mathematics. One of the topics that they want to bring back: Roman numerals. Why? According to my exhaustive 30 seconds of Internet research, the only practical applications of Roman numerals are: I) understanding Super Bowl numbering, and II) reading the time on old-fashion clocks.

But I don't need convincing. I believe that there are other advantages of teaching Roman numerals. The main lesson is this: the world has not always revolved around "base 10" numbering, and actually it still doesn't today. Having the ability to express numbers in other forms helps us to understand history, passage of time, technology, and even philosophy*.

In the popular media, binary (base 2) is famous for being "the language of computers". That may be so, but binary is not usually the language of computer programmers. When I was a kid, I spent many hours programming graphics on my TI 99/4A computer. I became proficient in translating decimal to hexadecimal (base 16) to binary -- all to express how the pixels would be drawn on the screen and in what color. Due to lack of practice and today's availability of handy tools and higher-level programming languages, I have since lost the ability to calculate all of these in my head. I also lost the ability to solve any Rubik's Cube that I pick up -- there go all of my party tricks.

But the SAS programming language retains many fun math tricks, including the ability to express numbers in many different ways, instantly. Here's an example of one number expressed six (or 6 or VI or 0110) different ways.

data _null_;
  x = 1956;
  put  / 'Decimal: '     x=best12.;
  put  / 'Roman: '       x=roman10.;
  put  / 'Word: '        x=words50.;
  put  / 'Binary: '      x=binary20.;
  put  / 'Octal: '       x=octal10.;
  put  / 'Hexadecimal: ' x=hex6.;
run;

The output:

Decimal: x=1956
Roman: x=MCMLVI
Word: x=one thousand nine hundred fifty-six
Binary: x=00000000011110100100
Octal: x=0000003644
Hexadecimal: x=0007A4

You might never need some of these number systems or SAS formats in your job, but knowing them makes you a more interesting person. If nothing else, it's a skill that you can trot out during cocktail parties. (I guess I attend different sorts of parties now.)

* For example, the number 'zero' has not always been with us. Introducing it into our numbering system allows us to think about 'nothing' in ways that earlier societies could not.

tags: formats, stem

The post Why should we teach Roman numerals? appeared first on The SAS Dummy.

13
Aug

Format your data like a social media superstar

In my industry of data and computer science, precision is typically regarded as a virtue. The more exact that you can be, the better. Many of my colleagues are passionate about the idea, which isn't surprising for a statistical software company.

But in social media, precision is a stigma -- especially when applied to the number of followers or connections that you have. Do you have so few connections that it can be represented as an exact number? Well, I wouldn't go bragging about that!

For example, let's look at this new LinkedIn joiner (blurred to protect his identity). As of this writing he has only -- and exactly -- 15 connections! (For the record, I consider this guy to be a rising star; I'm sure that his connections will skyrocket very soon.)

LinkedIn newcomer
Compare this to a more seasoned networker, shown below. LinkedIn won't tell us how many connections he has; perhaps they are beyond measure. We can see only that he has 500 or more LinkedIn connections. It might be 501 or it might be 500 million. It doesn't matter, according to LinkedIn. All you need to know is that this guy is a highly sought-after connection.

linkedin_pro
Let's create a SAS format to count connections, LinkedIn-style. It's simple: for cases of 500 or more, the label should be "500+". For fewer than 500, the label will be the embarrassingly precise actual number.

libname library (work); /* "library" is automatically searched for formats */
proc format lib=library;
 value linkedin    
  500 - high ='500+'
  /* all other values fall through to actual number */
 ;
run;
 
/* test data */
data followers;
  length name $ 40 followers 8 displayed 8;
  format displayed linkedin.;
  infile datalines dsd;
  input name followers;
  displayed = followers;
datalines;
Chris Hemedinger, 776
Kevin Bacon, 100543
Norman Newbie, 3
Colleen Connector, 499
;
run;

Here's the result. Notice how the "displayed" value acts as sort of an equalizer among the super-connected. Once you've achieved a certain magnitude of connections, you're placed on par with all other "superstars".
linkedin_data

Social sites like Instagram and Twitter operate on a different scale. While they still sacrifice precision when displaying a large number of "likes" or "followers", they do at least present a ballpark number that shows the magnitude of the activity. For example, instead of showing the exact follower count for Ronda Rousey (public figure, film star, MMA champ, and daughter of an experienced SAS presenter), Instagram simply truncates the number to the thousands and adds a "K".

rrig
To achieve the same in SAS, we can rely on a PICTURE format. This format contains the templates for what a number should look like, plus the multipliers to apply to get to the figure we want to display. Here's an example program, which I repurposed from the BIGMONEY example in SAS documentation.

proc format lib=library;
picture social (round)
 1E03-<1000000='000K' (mult=.001  )
 1E06-<1000000000='000.9M' (mult=.00001)
 1E09-<1000000000000='000.9B' (mult=1E-08)
 1E12-<1000000000000000='000.9T' (mult=1E-11);
run;
 
/* test data */ 
data likes (keep=actual displayed);
format actual comma20.
 displayed social.;
 do i=1 to 12;
  actual=16**i;
  displayed = actual;
  output;
 end;
run;

Here's the result:
socialcount_output

As you can see, the "displayed" values show you the magnitude of big "likes" in a cool-but-not-so-eagerly-precise way.

Here's my complete test program if you want to try it yourself. Happy networking!

tags: formats, LinkedIn, PICTURE formats, precision, social media
7
May

Adding Harvey Balls to your SAS reports

I'm currently working on a large project for a SAS customer. The project comprises many activities and phases, so there is a need to track progress on many different levels. During a recent meeting the project manager announced, "I'm putting together a status deck, and I'll include some Harvey Balls for each item."

"Harvey whats?" I said, nearly spitting out my coffee.

I've since learned that Harvey Balls are a project-management thing. They are used as way to communicate qualitative progress towards a goal: not just whether an item is complete, but also how far along it is. (Please, don't confuse these with Harvey wallbangers, which are used to measure progress towards something else.) Harvey Balls remind me of the little circles that I see in Consumer Reports product reviews.

A web search will yield lots of examples of how to add Harvey Balls to your PowerPoint slides and Excel spreadsheets -- favorite domains of project managers. However, there isn't much (any?) about how to use these in SAS, so with this blog post I hope to corner the market.

First, a visual -- here's what Harvey Balls look like in a SAS report:

hbreport
It turns out that the Harvey Ball glyphs are built into some of the fonts that you probably have installed, and can be accessed by referencing their Unicode character values. For ODS-based reports, we can include any Unicode character that we need by using:

  • the ODS escape character, combined WITH...
  • the {unicode} keyword, AND..
  • the hexadecimal code of the character we want.

For example:

ods escapechar='~';
title "This is a full-circle Harvey Ball: ~{unicode '25CF'x}";

yields this:

hbtitle
For a more data-driven approach, I decided to encode the Harvey Balls in a numeric SAS format. In my mapping, I set values between 0 and 4, inclusive. 0 means "empty" or no progress, whereas 4 means "full" or complete. 1 through 3 represent a quarter, halfway, and three-quarters, respectively.

ods escapechar='~';
proc format lib=work;
value hb  
  0 = "~{unicode '25CB'x}" /* empty circle */
  1 = "~{unicode '25D4'x}" /* quarter */
  2 = "~{unicode '25D1'x}" /* half */
  3 = "~{unicode '25D5'x}" /* three-quarter */
  4 = "~{unicode '25CF'x}" /* full */
;

Here's a partial listing of the PROC REPORT code that uses the format. Note that I added some other appearance treatments to the report column with the Harvey Ball. Because the font glyph tends to be small, I bumped up the font size to 18 points. I also specified Lucida Sans Unicode as the font face because I know that font contains the right glyphs. Finally, I tweaked the alignments to account for the larger padding that accompanies the larger font size, ensuring the data looked vertically aligned properly in each row.

proc report data=progress;
 columns  stage dev test prod;
 define stage / 'Activity' display style=[verticalalign=center];
 define dev  / '% Dev' format=hb.
  style(column)=[
   font_face='Lucida Sans Unicode' 
   font_size=18pt 
   verticalalign=center 
   just=c 
   color=blue
   ];
/* ... */

>> Download the complete example program from here. (Should work with SAS 9.2 or later)

Bonus: If you have SAS 9.4, you can modify this program to place the report directly into PowerPoint! This makes it perfect for consumption by project managers everywhere.

ods powerpoint file="c:\temp\hb.pptx" style=PowerPointLight;
/* proc report section here...*/
ods powerpoint close;

Here's a screenshot of the result (click to see it full size):

hbppt

See also

The Power of Unicode (using these characters in SG plots)
Unicode Tick Values using GTL
The Great Escape(char) - great paper by SAS user Louise S. Hadden

tags: formats, Harvey Balls, ods escapechar, powerpoint, SAS 9.4
30
Dec

SAS Visual Analytics: How Catwoman Influenced My Data

Image: CatWoman from Wikipedia

Image: Wikipedia

Eartha Kitt was the best Cat Woman. It’s true – her voice and diction was what made her so purr-fect for the role. When I was younger I would watch the Batman reruns (POW!) and then try to walk and talk like Eartha.  ”How can BatGirl be the best anything when CatWoman is around??” [Check out CatWoman in action here!]

You’re probably wondering what that statement has to do with SAS or why I would mention it at all. Last week I was raving about how wonderful it is to use formats with SAS Visual Analytics to solve all your problems and pretty much end all suffering in the world (well, reporting world).  But here’s a tip for working with dressed up  data items.

A Costume Does Not a Superhero Make

A format is a way to change how your data appears in a chart or table.  You can think of a format like a costume. For instance, even if I had a Cat Woman costume, it would not suddenly empower me with any special powers other than looking similar to her. You wouldn’t find me thinking I had new agile jumping abilities or for that matter deciding to embark upon a criminal enterprise with my new friend the Joker. So the costume would make me look different, but inside I’m still just a geeky SAS programmer.

Think of a format the same way (this is an important concept litter mates!)  Your underlying data does not change because you changed the format. For instance, using the same example from last week, here is my data. The data item, Arrival Date (Original), is the true value; it shows the calls per minute.  There were several ways to change the data item so it could appear as more than one superhero (or date value).  Actually I guess CatWoman was super villain not a hero.

visual analytics data items formatted

Has the Joker Been in My Report?

Let’s say that we want to create an “Average Actions Per Hour” data item to plot in a chart similar to this one.  The simplest thing to do is duplicate the Actions (Total) measure and change it’s aggregation from Sum to Average.  That’s what I did in the following graphic. But when I double-checked my math on the averages … it was not what I expected.

sas visual analytics chart averages

Click for larger image

My expectation was that 19 actions divided by 3 days would be 6.3 not 3.8.  Eeek … what happened? I doubled checked and the SUM is correct – there were a total of 19 calls.  The average is wrong … but why? Let’s start by looking at the raw data and then removing some assumptions.  My first assumption was that because the X-axis had the actions plotted by Hour that the Actions (Averaged) data item would know what to do.  Turns out it does know what to do … but just not the way I was thinking.

sas visual analytics average aggregationRemember when I said the costume didn’t make me Cat Woman – well the hour format did change the underlying data in this instance either.  Instead of totaling the actions (19) and then dividing by the number of days (3) it divided by the number of values.  In this case – there were 5 values contributing to the end result.  SAS Visual Analytics didn’t sum by hour – instead it did it by minute.

Geez – What is SAS Visual Analytics Doing?

Ok – now we know what we don’t want and how we inadvertently introduced errors into the reporting process. [Another  failed example here.] Seems like we just need to aggregate to the hour level and then it should understand. As we went through last week, create a new calculated item.  Since I want the data to be a aggregated to the hour, I need to remove the minutes and seconds from the data item.  Using the DateTimeFromDateHMS function, I rebuilt the data item to be at the hour level.  The entire trick is really when the 0s are used for the minute and second.

SAS VIsual analytics create calculated item

Click for larger image

 

Then I changed the format to Time showing only the hour.  My result is below.  SAS VA is still using the count of variables as the denominator for the average.  So the other values are not having any influence on the calculation. As Robin would say, “The batcomputer is none too frisky today, Batman!!”

formats_05

Turn off the Bat Signal … I’ve Got it!

Tell commissioner Gordon we’ve figure it out – no Bat Signal required. We have to let the calculation know how we want it calculated. So let’s build a new aggregation for our measure.

  1. Create a New Aggregated Measure.
  2. Use a formula similar to the one below.  It’s just sum of actions divided by distinct count of arrival days.formats_06
  3. Update the chart to use your new measure.

Now let’s look at these results next to our original so we can compare how the chart changes. I added a date range slider so the users can determine how to explore the data. Here’s a video I made to show how the date range slider works for this example. In the video you can see how the new measure handles the distinct number of days.

sas visual analytics date range example

Click for larger image.

 

Oh What Purr-fection!

It’s like many things – when you let the machine do everything it sometimes doesn’t make the choice you expect or intent.  Two lessons from this post – how to create an aggregate and don’t forget to double-check the math.

Don’t forget to check out the video.


Learn More about SAS Visual Analtyics

You can learn more about how companies are using SAS Visual Analytics to add value to their data and analysis in the upcoming SAS Visual Analytics book. If you want to know when the book is available, add your name to the list.

Post SAS Visual Analytics: How Catwoman Influenced My Data appeared first on BI Notes for SAS® Users. Go to BI Notes for SAS® Users to subscribe.

5
Sep

New and improved: Importing SPSS data files in SAS Enterprise Guide

It's been almost 5 years since I posted one of the first custom tasks for SAS Enterprise Guide. It was a task that made it easier to convert SPSS data files into SAS data sets.

Like many projects that begin as custom tasks, this one later became a feature in the product. Since SAS Enterprise Guide 4.2, the "Import SPSS File" task has appeared on the Data menu (along with Import JMP File and Import Stata File).

As a few readers observed, the task is limited because it ignores some of the nuances of SPSS data files. For example, SPSS data files can feature value labels, which are similar in concept to SAS formats. A value label maps a raw data value (such as 1 or 0) to a formatted display value (such a 'Male' or 'Female'). In SAS, it's a user-defined format that contain the rules to map data for display or grouping purposes. Also, the current task stores the imported data in the temporary WORK library; if you want to place it in a permanent location, you need to add another step in the process.

In response to customer requests, we've built a new version of the task that provides more control and some new options. Consider using this new task when:

  • You want to store the imported data in a permanent SAS library, not just in the temporary WORK library. The new version of the task allows you to select any SAS library that you have defined.
  • The SPSS data file contains value label definitions that you want to keep with the data going forward. During the import process, SAS converts SPSS value labels into user-defined SAS formats. These formats are stored in a secondary file called a SAS catalog, which must remain available for use the next time you open your new data in SAS. The new version of the task provides more options for managing these SAS formats.

How do you know whether your SPSS data files contain value labels? When you import an SPSS data file, review the results in the SAS Enterprise Guide data grid. When you see a numeric column (indicated with a blue circle icon) that displays character values, then you know that there is a SAS format at work.

The "giveaway" is the blue circle icon near the column name, as well as the tooltip that appears when you hover the mouse pointer over the column. Despite the fact that the column type is numeric, the displayed values are "female" and "male".

Downloading and using the new task

The new version of the task is built to work with SAS Enterprise Guide 5.1 and later. Like the previous version, this task requires that you have SAS/ACCESS to PC Files installed with SAS, as it still uses PROC IMPORT to convert the SPSS file to a SAS data set. The task imports SPSS .SAV files. You can download the new custom task from here:

>> Download task: Import SPSS data and value labels

The task package is a ZIP archive that contains a DLL and documentation about how to install and use it. (And don't forget this trick for installing if it doesn't work right away.)

Once you have it installed, you'll see the task in Tools->Add-Ins->Import SPSS data and value labels. It does not replace the Import SPSS File task that is already in the Data menu; this is a new task. Here's a preview of the task window:

As always, let me know what you think of the task and whether you find it useful.

tags: formats, SAS custom tasks, SAS Enterprise Guide, SPSS
Back to Top