Anatomy of a SQL query

From sasCommunity
Jump to: navigation, search

This page describes valid components of a SQL query. The aim is to provide a visual reference of keywords, options, and order. In SAS, these are run within a PROC SQL step:

   proc sql;
       ...
   quit;

Note that a quit; statement ends the step, rather than run;. Multiple queries may be run in the same PROC SQL block. Each one is terminated with a semicolon.[1]

What makes a valid query?

The necessary clauses of a basic query are SELECT and FROM. Other components are optional. Within a clause, square brackets ([]) represent optional components. Note that you can use lower case for these keywords. Using all uppercase letters is done here for demonstration; lowercase letters may be preferred style.

   CREATE TABLE | VIEW new.table_name AS
   [VALIDATE]
   SELECT col1 [as newName format fmtName.]
       [* /* use an asterisk to select all columns from a table or series of tables */]
       [INTO :var1 [separated by "<character>"]]
       [except <subquery>]
   FROM library.table_name [optional alias letter(s)]
       (in-line view allowed here, e.g. subquery [EXCEPT subquery])
       [inner/left/right/full join table_name on ...]
   WHERE (sub-query allowed here)
       and
       or
   GROUP BY
   HAVING (sub-query allowed here)
   ORDER BY var1 [DESC], var2, ...;
   
   [INTERSECT | OUTER UNION [CORR] | UNION [ALL]
   SELECT...]

For more information on joins, which are performed in the FROM clause, see SQL joins.

Subqueries and in-line queries

Subqueries are queries-within-queries. For example:

proc sql;
    create table current_students as
    select *
    from students
    where id not in (select id
                     from graduated_students);
quit;

In-line views are similar: [1]

proc sql;
    create table current_students
    select *
    from (select * from all_students where status = "current");
quit;

SAS implementation-specific

While ANSI standard SQL may be used in SAS, there are also some additional capabilities in SAS. For example, between...and may be used for subsetting in the WHERE clause as an English equivalent of the math operators > and <. For more information, see SQL vs SAS.

If a column name contains spaces, use "Example Name"n in your query.[2]

You can also use the automatic macro variable &sqlobs to get the count from the latest "result set" from a PROC SQL query.[2]

Pass-through queries

You can use Pass Through Queries to query other databases (DB2, Hadoop, Oracle, SQL, Teradata, etc.) using query language native to that database. The syntax changes from the above to use:

proc sql;
    connect to <dbase engine> (user=... password=... server=... path=... <other parameter(s)>)
...
    FROM connection to <dbase engine>
        (<dbase engine-specific query))
    disconnect from <dbase engine>
...
quit;

You can alternatively use a libname statement to connect to an external database, and proceed as if it were a SAS library. By doing this, SAS will autogenerate the actual query passed, and may thus be less efficient than writing your own pass-through query.

SQL mnemonics

There are several mnemonics for remembering the valid order of SQL queries, specifically Select - From - Where - Group by - Having - Order by. Some are:

  • So Few Workers Go Home On-time[3]
  • Some Fishy Workers Get Hooked On-line[3]
  • Sure Fire Way to Get Huge Output[3]
  • Sly Foxes Watch Geese Hide Offspring[3]
  • Some French workers glue hardwood often[4]
  • San Francisco, where the Grateful Dead heads originated[4]

Table manipulation

Here are some example table manipulation statements:

   insert into tableName
       set var1 = value1,
           var2 = value2
       set var1 = value3,
           var2 = value4;
   
   insert into tableName(column1, column2, column3, column4)
       values (val1, val2, val3, val4);
   
   insert into tableName
       values (val5, val6, val7, val8);
   
   update tableName
       set var1 = newValueOrExpression
       where var2 is null;
   
   alter table tableName
       add var1 [type, format, label, etc.]
       drop var1 [, var2, ...]
       modify var1 [new format];
   
   drop index index-name
       from table-name;
   create [unique] index index-name
       on table-name(col-name1, col-name2, ...);
   
   describe table tableName;
   
   drop table table-name [,table-name-2, ...];

Some examples of altering and updating may be found here.

PROC SQL options

You can also pass options in to the proc sql statement. INOBS= and OUTOBS= define how many observations SAS will read and write, respectively. FEEDBACK will print to the log all column names when SELECT * is used. NOEXEC will allow you to verify your syntax, without executing the query or queries within your PROC SQL block. For more information, see SAS Support.

dataset options

Just like in a DATA step, you can use data set options like DROP and KEEP. For more information, read the paper Using Data Set Options in PROC SQL.

See also

Notes

  1. Note that a query contains only one semicolon after the final clause, whether it be FROM, WHERE, GROUP BY, HAVING, or ORDER BY. If one of the set operators are used (intersect, union, etc.), the semicolon is omitted and is inserted after the final query.
  2. Read more from the SAS blog post How many records are in that PROC SQL result?
  3. 3.0 3.1 3.2 3.3 from the SAS SQL 1: Essentials training
  4. 4.0 4.1 from Dear Miss SASAnswers: A Guide to Efficient PROC SQL Coding