As the first step in the decommissioning of sasCommunity.org the site has been converted to read-only mode.

Here are some tips for How to share your SAS knowledge with your professional network.

# User:Paulkaefer/sandbox

*This is a work in progress. You can contribute to this article.**Once it is at a reasonable point, it will be...joined...with the SQL joins page.*

**SQL joins** are the SQL equivalent of DATA Step merges. This page explains the way SQL joins are done using SAS PROC SQL, as well as key concepts.

## Introduction: SAS syntax

The general syntax is as follows. Multiple tables may be joined together in one query, up to 256.^{[1]}

proc sql;SELECT col1, col2, etc. FROM table1 inner/left/right/full join table2 on ... [inner/left/right/full join table3 on ...] [...]quit;

Other SQL statement syntax may also be used to subset the data to be joined. See Anatomy of a SQL query for more on SQL syntax.

## Contents

## Relational databases

The idea of relational databases is to have multiple tables that may be linked to other tables using unique keys. For example, in a company's database, an **employees** table may be linked to a **salary** table via a unique employee ID.

For much more on relational databases, see the Wikipedia article on the subject.

## What does a join do?

A join transforms data by combining multiple sets together. While this sounds like a simple task, there are many possible combinations, even for just two input sets. Do you want to include items that have a match in both sets, or items that only match one set? Understanding basic set theory will reveal that there are many possible **subsets** based on the combination of two sets.

### Join terminology

A **subset** is a "set within a set." A subset may have some or all of the elements in its **superset**. If this is confusing, read more and see simple examples at Wikipedia. You can join on full datasets or subsets, by filtering out the data via a DATA step or SQL query.

A **union** is all the distinct elements that are in one or both sets.^{[2]} **Intersection** refers to the elements that belong to both sets, not one or the other.^{[3]} In joins, the keywords UNION and INTERSECT are used.

The **universal set** is the set of all possible elements. For example, the universal set of numbers contains pi and -42. The subset of integers does not contain pi, and the subset of positive integers does not contain -42. A dataset is likely a subset of a universal set, but in some cases, a data table may contain every possible outcome or event in a given field.

SQL joins may be performed on more than two sets, following the same rules.

## Types of joins

In the following explanations, these two tables will be used as examples:

### Natural join

no keys; a.k.a. Cartesian product

### Cartesian join

The Cartesian join or Cartesian product of

### Inner join

### Outer join

Left, right, or full

### Left join

### Right join

A right join is the same as a left join, but on the other set being joined (on the "right side" of the query). If the order of the datasets is switched, a left join will become a right join.

### full join

Full outer join ref: [5]

Full outer join vs. cross join / Cartesian product, with an example: [6]

### Other types of joins

See SQL Joins -- The Long and The Short of It.

## Keys and indexes

Indexes: basic or composite, see [7]

### Primary key

### Secondary key

### Foreign key

### Business key

### Surrogate key

## Things to consider

- Chasm trap [8]
- Fan trap [9] <-- not SAS
*per se*, but has examples of this and chasm trap - Hash join (up with indexes?) [10] [11]
- Sort-merge join [12]
- table scans [13] [14]

- One example on why they may not be recommended: [15]

- magic numbers; just avoid them? see [16]; maybe ask Cameron to clarify further or provide expertise
- threaded joins [17]
- partitioning
- normalization and denormalization
- And so on....

## See also

- Join (SQL) on Wikipedia, which goes much further in depth, but not including SAS-specifics
- Introduction to Proc SQL, which contains Venn diagrams of what joins look like
- Queries, Joins and Where Clauses, Oh My! Demystifying PROC SQL, which contains plenty of code examples
- MERGING vs. JOINING: Comparing the DATA Step with SQL

- Cartesian Product Joins, for the "default" join if no join criterion are given
- SQL Set Operators: So Handy Venn You Need Them
- Say NO to Venn Diagrams When Explaining JOINs, an alternative for visually explaining joins (discussion on reddit)