|When it comes to performing PROC SQL joins, users supply the list of tables for joining along with the join conditions, and the PROC SQL optimizer has the task of determining which of the available join algorithms to use for performing the join operation. There are three basic algorithms used in joining:
- Nested Loop Join – When an equality condition is not specified, a read of the complete contents of the right table is processed for each row in the left table.
- Merge Join – When the tables specified are already in the desired sort order, resources will not need to be extended to rearranging the tables.
- Hash Join – When an equality relationship exists, the smaller of the tables is able to fit in memory, no sort operations are required, and each table is read only once.
PROC SQL supports a powerful option called _METHOD. SQL procedure users may find this option worth exploring because it often provides a better understanding of what is actually happening during specific processes, including joins. The various codes produced from the _METHOD option and corresponding descriptions appear at right.
_METHOD Codes and Descriptions
- SQXCRTA Create table as Select.
- SQXSLCT Select statement or clause.
- SQXJSL Step loop join (Cartesian).
- SQXJM Merge join operation.
- SQXJNDX Index join operation.
- SQXJHSH Hash join operation.
- SQXSORT Sort operation.
- SQXSRC Source rows from table.
- SQXFIL Rows filtration.
- SQXSUMG Summary stats (aggregates) with GROUP BY clause.
- SQXSUMN Summary stats with no GROUP BY clause.
The PROC SQL code, shown at right, (using the MOVIES and ACTORS tables), and corresponding SAS Log, illustrates the results from using the _METHOD option with a simple two-way equi-join. The _METHOD option displays information that can help users better understand, as well as tune and debug their join queries. The SAS Log illustrates that the two-source tables are MOVIES and ACTORS, and the join algorithm used by the PROC SQL optimizer is a hash join. Because a hash join utilizes available real memory (and real memory is faster than other types of storage) to perform the join, users frequently experience faster joins than when step loop, merge or index-join algorithms are selected by the optimizer due to the speed of real memory.
PROC SQL _METHOD;
SELECT MOVIES.TITLE, RATING, ACTOR_LEADING
WHERE MOVIES.TITLE = ACTORS.TITLE;
NOTE: SQL execution methods chosen are:
sqxsrc( MOVIES )
sqxsrc( ACTORS )
Feel free to comment on this tip.