PROC SQL by Example

From sasCommunity
Jump to: navigation, search

Contents

Overview

60500.gif

Title: PROC SQL by Example: Using SQL within SAS

Author: Howard Schreier

ISBN: 978-1-59994-297-1

First Printing: September 2008

Click on the image to order the book from SAS online bookstore.


Description

Howard Schreier's book explains and illustrates the use of PROC SQL in the context of the SAS DATA step and other SAS procedures (such as SORT, FREQ, MEANS/SUMMARY, APPEND, DATASETS, and TRANSPOSE) whose functionality overlaps and complements that of SQL. Using a side-by-side approach, this concise reference guide includes many extensively explained examples showing equivalent DATA step and SQL code. It enables SAS users to take advantage of existing SAS skills and knowledge while learning about SQL. Discussions cover the differences between SQL and the DATA step as well as situations where SQL and the DATA step are used together to benefit from the strengths of each.

Reviews

Read the first set of reviews about this book.

Errata

Page No(s). Severity Description
33 * The CREATE TABLE statement should be preceded by a PROC SQL statement and followed by a QUIT statement.
41 * The RUN statement which appears in lower case should be in upper case.
62 *** The first SELECT statement should be preceded by a PROC SQL statement and followed by a QUIT statement.
92 *** The hypothetical CREATE TABLE statement should be preceded by a PROC SQL statement and followed by a QUIT statement.
99 ** There are obviously three statements in the code. The first sentence of the paragraph which follows should read: Before we look at the effect of this step (which comprises, apart from the PROC and QUIT statements, a single statement terminated by a single semicolon), let’s look at the syntax and compare it to that of the join.
133, 135, 137, 139 ** The running head (chapter indicator preceding the page number) is incorrect.
135 *** In each of the two SELECT statements, whitespace is needed between the FROM and GROUP BY clauses.
  FROM         twelves
  GROUP BY     sex
135 ** There should be a blank TITLE statement (TITLE;) after the last SELECT statement. Without it, the last declared title will persist.
147 ** There should be blank TITLE and FOOTNOTE statements (TITLE; FOOTNOTE;) after the SELECT statement. Without them, the last declared title and footnote will persist.
150 ** The next-to-last %PUT statement is superfluous.
154, 164, 173 * The RUN statements in the PROC DATASETS steps should be indented.
163 * In the DATA step, several tokens should be in upper case, as in
DATA simple;
DO Measure = 1 TO 3; OUTPUT; END;
RUN;
164, 173, 190 ** The PROC DATASETS steps should be terminated with QUIT statements.
176 *** Each of the two INSERT statements should be preceded by a PROC SQL statement and followed by a QUIT statement.
215 * The RUN statement which appears in lower case should be in upper case.
224 *** The SQL code should be preceded by a PROC SQL statement and followed by a QUIT statement.
228 * To align tokens in the code, one space should be removed before the slash in the BREAK statement.
252 *** The CREATE TABLE statement must be followed by a QUIT statement. The statements must be submitted together in order to get accurate timings.

Severity is reflected in the number of asterisks. One asterisk (*) indicates a minor issue such as a spelling or grammar mistake, or a deviation from style. Two asterisks (**) indicate ambiguity or other source of possible confusion. Three asterisks(***) indicate an incorrect statement (or code which produces incorrect results) where the mistake is fairly obvious. Four asterisks (****) indicate a subtle mistake.

Readers' Questions and Comments

Just edit this page or use the discussion tab.


Q: Some of the examples deliberately trigger ERROR conditions. How does one prevent those from "snowballing" and interfering with subsequent examples.

A: Within a PROC SQL session, run
RESET EXEC;
At the global level, run
OPTIONS NOSYNTAXCHECK OBS=MAX;

pp. 27-28
Q: Is there a second form of CASE, parallel to SELECT (expression); etc.?

A: Yes. To illustrate, suppose that the destinations for the field trips depended only on age, with all 11-year-olds going to the zoo and all 12-year-olds to the museum. Adjusting the CASE structure in the book to these rules would result in
CASE WHEN age=11 THEN 'Zoo'
     WHEN age=12 THEN 'Museum'
     ELSE             '[None]'
     END
This can also be expressed more compactly as
CASE age WHEN 11 THEN 'Zoo'
         WHEN 12 THEN 'Museum'
         ELSE         '[None]'
         END

p. 29
Q: What happens when PROC PRINT or a DATA step is fed zero observations?

A: In both situations the behavior is like that of PROC SQL. If you run
PROC PRINT DATA=preteen;
WHERE age=10;
RUN;
you see
NOTE: No observations were selected from data set WORK.PRETEEN.
If you run
DATA tens;
SET preteen;
WHERE age=10;
RUN;
an empty (zero-observation) data set is created.

p. 45
Q: Is this usage of DISTINCT (within a function call) allowed in clauses other than SELECT?

A: Why not? Here's an example.
PROC SQL;
SELECT       sex  
FROM         sashelp.class
GROUP BY     sex
HAVING       MEAN(DISTINCT age) GT 13;
QUIT ;

pp. 59-78
Names less cryptic than u1 and m1 would make things easier to follow.

The mnemonic intent (not explained in the book) is "U" for "unique" (that is, keys which do not repeat) and "M" for "multiple" (keys which do repeat).

pp. 85-88
Q: The documentation says that NOT can be coded directly before EXISTS or IN to negate the result of the comparison. Are there similar constructs for the other subquery contexts (ANY, ALL)?

A: . ANY and ALL themselves cannot be negated, but the comparison operators to which they bind can be negated, or the overall evaluations can be negated. For example:
SELECT       name
FROM         sashelp.class
WHERE        SUBSTR(name,1,1) EQ ANY (SELECT sex FROM sashelp.class);
This returns only the name "Mary", the only name in the table beginning with F or M. Now replace EQ with NE:
SELECT       name
FROM         sashelp.class
WHERE        SUBSTR(name,1,1) NE ANY (SELECT sex FROM sashelp.class);
As long as the subquery returns at least two distinct values for SEX, the condition will always be true (since any initial will be not equal to one or the other or both). That's the case here, so the result set contains all of the names found in the source table.
Finally, use EQ but prefix the whole expression with NOT:
SELECT       name
FROM         sashelp.class
WHERE        NOT SUBSTR(name,1,1) EQ ANY (SELECT sex FROM sashelp.class);
This returns all names except "Mary" (in other words, the complement of the first result).

pp. 89-91
Q: Where are ANY and ALL documented? They don't appear in the SQL Procedure Component Dictionary.

A: ANY and ALL are documented only by example, as part of the explanation of subqueries. See the section "Query Expressions (Subqueries)" in the "sql-expression" page, which does appear in the SQL Procedure Component Dictionary.

p. 95
Q: Exactly how can the missing values be replaced with zeroes?

A: Use the COALESCE function, with the subquery as the first argument and zero as the second.
PROC SQL;
SELECT       fname,
             age,
             COALESCE( ( SELECT   COUNT(*)
                         FROM     classgirls
                         WHERE    moregirls.age    = classgirls.age
                           AND    moregirls.fname ^= classgirls.fname
                         GROUP BY classgirls.age
                       ) 
                       , 0 ) AS SameAge
FROM         moregirls
;
QUIT;

p. 119
The same table name (UNIONALL) was used in an earlier example. Elsewhere, SQL and DATA step examples use the same names for their output. It would be better to have distinct names for all these tables.


pp. 8, 154, 157, 212
Q: The cited section on "Concepts" appears to be nonexistent. Where is the content?

A: The author would appreciate any information about that. The section did appear in earlier editions of the Procedures Guide, such as 9.1.3.

p. 173:
Q: How can PROC SQL drop the table when PROC DATASETS has just done exactly that?

A: The two steps are either/or alternatives. So to see both in action, the target table has to be re-created. Here is a briefer example.
DATA demo;
RUN;
PROC DATASETS;
DELETE demo; 
   RUN; 
QUIT;
DATA demo; 
RUN;
PROC SQL;
DROP TABLE demo;
QUIT;

pp. 193-194
Q: Why is the integrity constraint violation an ERROR in SQL but just a NOTE when the same thing is attempted in a DATA step?

The author does not have a good answer, and would appreciate hearing from anyone who does.
There's even more to ponder when one looks at the treatment of non-violating rows. The example in the book does not raise this issue, so instead consider:
PROC SQL;
INSERT INTO  demolib.fifteenups
SET          fname   = "Newbie"
SET          fname   = "Ronald"
SET          fname   = "Lastly"
;
QUIT;
The log reports:
ERROR: Add/Update failed for data set DEMOLIB.FIFTEENUPS because
       data value(s) do not comply with integrity constraint norepeats.
NOTE: Deleting the successful inserts before error noted above to
      restore table to a consistent state.
So PROC SQL first inserted the Newbie row, but rolled that back when it detected the duplication in the Ronald row, and stopped before processing the Lastly row. In contrast, here's how the DATA step works:
DATA demolib.fifteenups; 
fname = 'Newbie';
OUTPUT;
fname = 'Ronald';
OUTPUT;
fname = 'Lastly';
OUTPUT;
STOP;
MODIFY demolib.fifteenups;
RUN;
The log shows:
NOTE: The data set DEMOLIB.FIFTEENUPS has been updated.  There were 0
      observations rewritten, 2 observations added and 0 observations deleted.
NOTE: There were 0 rejected updates, 1 rejected adds, and 0 rejected deletes.
So the observation violating the constraint was not written to the data set, but the other two were.
Also, note that the DATA step's _IORC_ automatic variable and the SQLRC automatic macro variable can be used to monitor these processes. Keep in mind that the system option UNDOPOLICY and the PROC SQL option UNDO_POLICY affect the behavior of the SQL process. See Usage Note 11051 for details on the interaction of UNDOPOLICY and SQLRC.

p. 200
Q: Shouldn't the SET statement in the second DATA step on the page be SET GDS_Demo? Similarly, shouldn't the second CREATE TABLE on p. 202 be FROM GDS_Demo?

A: It doesn't matter. The GDS (generation data sets) feature is about naming and referencing data sets, not about the structure and content of those data sets. There is no requirement that the N+1st generation be derived from the Nth.

pp. 229-230
The file names in the ODS HTML statements appear to be specific to Windows. Also, the code won't work unless &PATH can be resolved.


p. 234
Q: Why is the code to create ARRIVALS and DEPARTURES not shown?

A: From the author's files:
DATA Arrivals;
INPUT Name $ Arrival   : TIME5.;
FORMAT       Arrival     TIME5.;
CARDS;
John   14:30
Paul   15:00
Ringo  15:30
George 16:00
;
 
DATA Departures;
INPUT Name $ Departure : TIME5.;
FORMAT       Departure   TIME5.;
CARDS;
John   15:30
Paul   16:00
Ringo  17:30
George 18:00
;

p. 250
Q: Why create the macro variable and include it in a %PUT statement when the outcome is not shown? Also, where did the the column heading in the output originate?

A: Here is code, from the author's files, which displays the answer as output rather than via a macro variable:
PROC SQL;
SELECT       COUNT(*) LABEL = 'Occurrences of 654321'
FROM         myRandoms
WHERE        myRandom = 654321
;
QUIT;
The result:
Occurrences
  of 654321
-----------
         22

pp. 255-269
There are a lot of page numbers, in the boxes, which don't refer to pages in this book, but rather to pages in the various SAS manuals. It's confusing.


p. 261
The V. 9.2 doc includes more choices for object-item, specifically

*table-name.* 
specifies all columns in the PROC SQL table that is specified in table-name.

*table-alias.* 
specifies all columns in the PROC SQL table that has the alias that is specified in table-alias.

*view-name.* 
specifies all columns in the SAS view that is specified in view-name.

*view-alias.* 
specifies all columns in the SAS view that has the alias that is specified in view-alias.
Also, it says that a "SELECT Clause Lists the columns that will appear in the output", whereas this book seems to use "SELECT Clause" more broadly.

p. 271
There is no libref MYLIB assigned.



Visit the SAS Press site for this book.

Link to this page: http://tinyurl.com/sqlbook

E-mail the author.