PROC SQL by Example
Title: PROC SQL by Example: Using SQL within SAS
Author: Howard Schreier
First Printing: September 2008
Click on the image to order the book from SAS online bookstore.
Howard Schreier's book explains and illustrates the use of PROC SQL in the context of the SAS DATA step and other SAS procedures (such as SORT, FREQ, MEANS/SUMMARY, APPEND, DATASETS, and TRANSPOSE) whose functionality overlaps and complements that of SQL. Using a side-by-side approach, this concise reference guide includes many extensively explained examples showing equivalent DATA step and SQL code. It enables SAS users to take advantage of existing SAS skills and knowledge while learning about SQL. Discussions cover the differences between SQL and the DATA step as well as situations where SQL and the DATA step are used together to benefit from the strengths of each.
Read the first set ofabout this book.
|33||*||The CREATE TABLE statement should be preceded by a PROC SQL statement and followed by a QUIT statement.|
|41||*||The RUN statement which appears in lower case should be in upper case.|
|62||***||The first SELECT statement should be preceded by a PROC SQL statement and followed by a QUIT statement.|
|92||***||The hypothetical CREATE TABLE statement should be preceded by a PROC SQL statement and followed by a QUIT statement.|
|99||**||There are obviously three statements in the code. The first sentence of the paragraph which follows should read: Before we look at the effect of this step (which comprises, apart from the PROC and QUIT statements, a single statement terminated by a single semicolon), let’s look at the syntax and compare it to that of the join.|
|133, 135, 137, 139||**||The running head (chapter indicator preceding the page number) is incorrect.|
|135||***|| In each of the two SELECT statements, whitespace is needed between the FROM and GROUP BY clauses.
FROM twelves GROUP BY sex
|135||**||There should be a blank TITLE statement (TITLE;) after the last SELECT statement. Without it, the last declared title will persist.|
|147||**||There should be blank TITLE and FOOTNOTE statements (TITLE; FOOTNOTE;) after the SELECT statement. Without them, the last declared title and footnote will persist.|
|150||**||The next-to-last %PUT statement is superfluous.|
|154, 164, 173||*||The RUN statements in the PROC DATASETS steps should be indented.|
|163||*|| In the DATA step, several tokens should be in upper case, as in
DATA simple; DO Measure = 1 TO 3; OUTPUT; END; RUN;
|164, 173, 190||**||The PROC DATASETS steps should be terminated with QUIT statements.|
|176||***||Each of the two INSERT statements should be preceded by a PROC SQL statement and followed by a QUIT statement.|
|215||*||The RUN statement which appears in lower case should be in upper case.|
|224||***||The SQL code should be preceded by a PROC SQL statement and followed by a QUIT statement.|
|228||*||To align tokens in the code, one space should be removed before the slash in the BREAK statement.|
|252||***||The CREATE TABLE statement must be followed by a QUIT statement. The statements must be submitted together in order to get accurate timings.|
Severity is reflected in the number of asterisks. One asterisk (*) indicates a minor issue such as a spelling or grammar mistake, or a deviation from style. Two asterisks (**) indicate ambiguity or other source of possible confusion. Three asterisks(***) indicate an incorrect statement (or code which produces incorrect results) where the mistake is fairly obvious. Four asterisks (****) indicate a subtle mistake.
Readers' Questions and Comments
Just edit this page or use the discussion tab.
Q: Some of the examples deliberately trigger ERROR conditions. How does one prevent those from "snowballing" and interfering with subsequent examples.
- A: Within a PROC SQL session, run
- At the global level, run
OPTIONS NOSYNTAXCHECK OBS=MAX;
Q: Is there a second form of CASE, parallel to SELECT (expression); etc.?
- A: Yes. To illustrate, suppose that the destinations for the field trips depended only on age, with all 11-year-olds going to the zoo and all 12-year-olds to the museum. Adjusting the CASE structure in the book to these rules would result in
CASE WHEN age=11 THEN 'Zoo' WHEN age=12 THEN 'Museum' ELSE '[None]' END
- This can also be expressed more compactly as
CASE age WHEN 11 THEN 'Zoo' WHEN 12 THEN 'Museum' ELSE '[None]' END
Q: What happens when PROC PRINT or a DATA step is fed zero observations?
- A: In both situations the behavior is like that of PROC SQL. If you run
PROC PRINT DATA=preteen; WHERE age=10; RUN;
- you see
NOTE: No observations were selected from data set WORK.PRETEEN.
- If you run
DATA tens; SET preteen; WHERE age=10; RUN;
- an empty (zero-observation) data set is created.
Q: Is this usage of DISTINCT (within a function call) allowed in clauses other than SELECT?
- A: Why not? Here's an example.
PROC SQL; SELECT sex FROM sashelp.class GROUP BY sex HAVING MEAN(DISTINCT age) GT 13; QUIT ;
Names less cryptic than u1 and m1 would make things easier to follow.
- The mnemonic intent (not explained in the book) is "U" for "unique" (that is, keys which do not repeat) and "M" for "multiple" (keys which do repeat).
Q: The documentation says that NOT can be coded directly before EXISTS or IN to negate the result of the comparison. Are there similar constructs for the other subquery contexts (ANY, ALL)?
- A: . ANY and ALL themselves cannot be negated, but the comparison operators to which they bind can be negated, or the overall evaluations can be negated. For example:
SELECT name FROM sashelp.class WHERE SUBSTR(name,1,1) EQ ANY (SELECT sex FROM sashelp.class);
- This returns only the name "Mary", the only name in the table beginning with F or M. Now replace EQ with NE:
SELECT name FROM sashelp.class WHERE SUBSTR(name,1,1) NE ANY (SELECT sex FROM sashelp.class);
- As long as the subquery returns at least two distinct values for SEX, the condition will always be true (since any initial will be not equal to one or the other or both). That's the case here, so the result set contains all of the names found in the source table.
- Finally, use EQ but prefix the whole expression with NOT:
SELECT name FROM sashelp.class WHERE NOT SUBSTR(name,1,1) EQ ANY (SELECT sex FROM sashelp.class);
- This returns all names except "Mary" (in other words, the complement of the first result).
Q: Where are ANY and ALL documented? They don't appear in the SQL Procedure Component Dictionary.
- A: ANY and ALL are documented only by example, as part of the explanation of subqueries. See the section "Query Expressions (Subqueries)" in the "sql-expression" page, which does appear in the SQL Procedure Component Dictionary.
Q: Exactly how can the missing values be replaced with zeroes?
- A: Use the COALESCE function, with the subquery as the first argument and zero as the second.
PROC SQL; SELECT fname, age, COALESCE( ( SELECT COUNT(*) FROM classgirls WHERE moregirls.age = classgirls.age AND moregirls.fname ^= classgirls.fname GROUP BY classgirls.age ) , 0 ) AS SameAge FROM moregirls ; QUIT;
The same table name (UNIONALL) was used in an earlier example. Elsewhere, SQL and DATA step examples use the same names for their output. It would be better to have distinct names for all these tables.
pp. 8, 154, 157, 212
Q: The cited section on "Concepts" appears to be nonexistent. Where is the content?
- A: The author would appreciate any information about that. The section did appear in earlier editions of the Procedures Guide, such as 9.1.3.
Q: How can PROC SQL drop the table when PROC DATASETS has just done exactly that?
- A: The two steps are either/or alternatives. So to see both in action, the target table has to be re-created. Here is a briefer example.
DATA demo; RUN; PROC DATASETS; DELETE demo; RUN; QUIT;
DATA demo; RUN; PROC SQL; DROP TABLE demo; QUIT;
Q: Why is the integrity constraint violation an ERROR in SQL but just a NOTE when the same thing is attempted in a DATA step?
- The author does not have a good answer, and would appreciate hearing from anyone who does.
- There's even more to ponder when one looks at the treatment of non-violating rows. The example in the book does not raise this issue, so instead consider:
PROC SQL; INSERT INTO demolib.fifteenups SET fname = "Newbie" SET fname = "Ronald" SET fname = "Lastly" ; QUIT;
- The log reports:
ERROR: Add/Update failed for data set DEMOLIB.FIFTEENUPS because data value(s) do not comply with integrity constraint norepeats. NOTE: Deleting the successful inserts before error noted above to restore table to a consistent state.
- So PROC SQL first inserted the Newbie row, but rolled that back when it detected the duplication in the Ronald row, and stopped before processing the Lastly row. In contrast, here's how the DATA step works:
DATA demolib.fifteenups; fname = 'Newbie'; OUTPUT; fname = 'Ronald'; OUTPUT; fname = 'Lastly'; OUTPUT; STOP; MODIFY demolib.fifteenups; RUN;
- The log shows:
NOTE: The data set DEMOLIB.FIFTEENUPS has been updated. There were 0 observations rewritten, 2 observations added and 0 observations deleted. NOTE: There were 0 rejected updates, 1 rejected adds, and 0 rejected deletes.
- So the observation violating the constraint was not written to the data set, but the other two were.
- Also, note that the DATA step's _IORC_ automatic variable and the SQLRC automatic macro variable can be used to monitor these processes. Keep in mind that the system option UNDOPOLICY and the PROC SQL option UNDO_POLICY affect the behavior of the SQL process. See Usage Note 11051 for details on the interaction of UNDOPOLICY and SQLRC.
Q: Shouldn't the SET statement in the second DATA step on the page be SET GDS_Demo? Similarly, shouldn't the second CREATE TABLE on p. 202 be FROM GDS_Demo?
- A: It doesn't matter. The GDS (generation data sets) feature is about naming and referencing data sets, not about the structure and content of those data sets. There is no requirement that the N+1st generation be derived from the Nth.
The file names in the ODS HTML statements appear to be specific to Windows. Also, the code won't work unless &PATH can be resolved.
Q: Why is the code to create ARRIVALS and DEPARTURES not shown?
- A: From the author's files:
DATA Arrivals; INPUT Name $ Arrival : TIME5.; FORMAT Arrival TIME5.; CARDS; John 14:30 Paul 15:00 Ringo 15:30 George 16:00 ; DATA Departures; INPUT Name $ Departure : TIME5.; FORMAT Departure TIME5.; CARDS; John 15:30 Paul 16:00 Ringo 17:30 George 18:00 ;
Q: Why create the macro variable and include it in a %PUT statement when the outcome is not shown? Also, where did the the column heading in the output originate?
- A: Here is code, from the author's files, which displays the answer as output rather than via a macro variable:
PROC SQL; SELECT COUNT(*) LABEL = 'Occurrences of 654321' FROM myRandoms WHERE myRandom = 654321 ; QUIT;
- The result:
Occurrences of 654321 ----------- 22
There are a lot of page numbers, in the boxes, which don't refer to pages in this book, but rather to pages in the various SAS manuals. It's confusing.
The V. 9.2 doc includes more choices for object-item, specifically
*table-name.* specifies all columns in the PROC SQL table that is specified in table-name. *table-alias.* specifies all columns in the PROC SQL table that has the alias that is specified in table-alias. *view-name.* specifies all columns in the SAS view that is specified in view-name. *view-alias.* specifies all columns in the SAS view that has the alias that is specified in view-alias.
- Also, it says that a "SELECT Clause Lists the columns that will appear in the output", whereas this book seems to use "SELECT Clause" more broadly.
There is no libref MYLIB assigned.
Visit the SAS Press site for this book.
Link to this page: http://tinyurl.com/sqlbook