Preparing Sample Data for SAS-L

From sasCommunity

Some advice on preparing sample data to post on SAS-L.

Note: Some of the examples in this article will be confusing and ineffective if there is unintentional wrapping. The following 50-character "ruler" should occupy just two lines:

0        1         2         3         4         5
12345678901234567890123456789012345678901234567890

If the ruler is broken, try to fix it by widening your browser window or reducing text size.

Avoid Wrapping

This is hard to read:

Name                    Sex                                      
 Age                  Height                  Weight
James M 12 57 83 Jane F 12 60 85 John M 12 59 100 Joyce F 11 51 51 Louise F 12 56 77 Robert M 12 65 128 Thomas M 11 58 85

It's better to get rid of most of the excessive whitespace between columns, so that the table looks more like this:

Name    Sex  Age  Height  Weight
James   M    12   57.3     83.0
Jane    F    12   59.8     84.5
John    M    12   59.0     99.5
Joyce   F    11   51.3     50.5
Louise  F    12   56.3     77.0
Robert  M    12   64.8    128.0
Thomas  M    11   57.5     85.0

Of course, wrapping isn't always a result of excessive whitespace. There may simply be too many variables, or the variable names or values may be rather long. In that case, it may be appropriate to shrink the problem itself, not just the data presentation, by reducing the number of variables, shortening their names, or shortening their values.
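One way to shrink the problem before posting is with data set options. The following is only a sketch: it assumes a data set named myexample already exists in WORK, and the names small and Wt are made up for illustration.

```sas
/* Sketch: assumes work.myexample exists.    */
/* "small" and "Wt" are illustrative names.  */
data small;
  set myexample(keep=Name Age Weight
                rename=(Weight=Wt));
run;
```

KEEP= is applied before RENAME=, so the KEEP= list uses the original variable names.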

Use a Monospace Font

Don't misalign columns, like this:

Name      Sex    Age  Height    Weight
James M 12 57.3 83.0
Jane F 12 59.8 84.5
John M 12 59.0 99.5
Joyce F 11 51.3 50.5
Louise F 12 56.3 77.0
Robert M 12 64.8 128.0
Thomas M 11 57.5 85.0

That seems rather obvious, and it is unlikely that anyone would do this deliberately. Nevertheless, the SAS-L archives contain many, many data presentations with such ragged alignment. This most likely happens when people use a proportional font while editing. Here is the same table, with the same number of space characters between fields, but displayed with Times New Roman, a proportional font:

Name      Sex    Age  Height    Weight
James M 12 57.3 83.0
Jane F 12 59.8 84.5
John M 12 59.0 99.5
Joyce F 11 51.3 50.5
Louise F 12 56.3 77.0
Robert M 12 64.8 128.0
Thomas M 11 57.5 85.0

The columns appear to be aligned (not perfectly, but nearly so), but that's just an illusion.

Generally, it is not possible to edit whitespace within a table in a way that looks right in both proportional and monospace (nonproportional) fonts. However, it is the monospace rendering that matters in a SAS-L post, because the INPUT statement counts characters and does not care about their rendered widths.

So use Courier New or some other monospace font when preparing tabular content. Here is the same example, with the whitespace properly edited:

Name    Sex  Age  Height  Weight
James   M    12   57.3     83.0
Jane    F    12   59.8     84.5
John    M    12   59.0     99.5
Joyce   F    11   51.3     50.5
Louise  F    12   56.3     77.0
Robert  M    12   64.8    128.0
Thomas  M    11   57.5     85.0

What does this look like rendered in Times New Roman? Like this:

Name    Sex  Age  Height  Weight
James   M    12   57.3     83.0
Jane    F    12   59.8     84.5
John    M    12   59.0     99.5
Joyce   F    11   51.3     50.5
Louise  F    12   56.3     77.0
Robert  M    12   64.8    128.0
Thomas  M    11   57.5     85.0

It looks wrong, but in fact it is right for SAS-L.

Get Rid of Tabs

Another possible cause of misalignment is the use of tabs in the whitespace. These will likely be converted to a space, or to multiple spaces, before readers see your data, but not in a way that produces straight columns.

So don't organize data in Excel and copy it directly into your post. Instead, replace the tabs with spaces, then add or remove spaces to align the columns (using a monospace font, of course).
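Tabs are invisible in most editors, so it can help to check a draft for them before posting. The step below is a minimal sketch of such a check; the file name draft_post.txt is a hypothetical placeholder for wherever the draft has been saved as plain text.

```sas
/* Sketch: the file name is a hypothetical    */
/* placeholder for a saved draft of the post. */
data _null_;
  infile 'draft_post.txt';
  input;              /* load one line        */
  /* '09'x is the tab character               */
  if index(_infile_, '09'x) > 0 then
    put 'Tab found on line ' _n_;
run;
```

Each line that contains a tab is reported in the log by line number, so the tab can be replaced with spaces by hand.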

Make it Copy/Paste/Run Ready

This is especially appropriate for the "given" data (as distinguished from the needed results). If it's easier for people to work with your data in SAS, it increases the chances that you will get good answers.

So don't just provide the variable names and values. Incorporate them into a complete DATA step. For example:

data myexample;
infile cards dsd;
input Name $ Sex $ Age Height Weight;
cards;
James,M,12,57.3,83
Jane,F,12,59.8,84.5
John,M,12,59,99.5
Joyce,F,11,51.3,50.5
Louise,F,12,56.3,77
Robert,M,12,64.8,128
Thomas,M,11,57.5,85
;

Of course, this is not particularly easy to absorb visually.

Best Practice: Have It Both Ways

Here's a way to present data so that it is easy to read visually and easy to read into SAS.

data myexample;
informat Name $10. Sex $1.;
input
Name    Sex  Age  Height  Weight  ;
cards;
James   M    12   57.3     83.0
Jane    F    12   59.8     84.5
John    M    12   59.0     99.5
Joyce   F    11   51.3     50.5
Louise  F    12   56.3     77.0
Robert  M    12   64.8    128.0
Thomas  M    11   57.5     85.0
;

The data grid is visually distinct and uncluttered, yet the code is runnable.
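If the data already exist as a SAS data set, a step in this layout can be generated rather than typed. The following is a sketch only: it assumes a data set named myexample with the five variables used above, with no missing character values, and with values that fit the chosen widths (ages below 100, heights below 100, weights below 1000). The column positions are arbitrary choices.

```sas
/* Sketch: writes a post-ready DATA step to   */
/* the log. Assumes work.myexample exists     */
/* and its values fit the widths used here.   */
data _null_;
  set myexample end=last;
  if _n_ = 1 then do;
    put 'data myexample;';
    put 'informat Name $10. Sex $1.;';
    put 'input';
    put @1 'Name' @12 'Sex' @17 'Age'
        @22 'Height' @30 'Weight' @38 ';';
    put 'cards;';
  end;
  /* Same column positions as the header      */
  put @1 Name $10. @12 Sex $1. @17 Age 2.
      @22 Height 4.1 @30 Weight 5.1;
  if last then put ';';
run;
```

The generated lines can be copied from the log into the post. Because the header and the data rows use the same column pointers, the variable names sit directly above their values.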