As the first step in the decommissioning of sasCommunity.org the site has been converted to read-only mode.
Here are some tips for How to share your SAS knowledge with your professional network.
- This is a work in progress. You can contribute to this article.
- Once it is at a reasonable point, it will be...joined...with the SQL joins page.
Introduction: SAS syntax
The general syntax is as follows. Multiple tables may be joined together in one query, up to 256.
proc sql; SELECT col1, col2, etc. FROM table1 inner/left/right/full join table2 on ... [inner/left/right/full join table3 on ...] [...] quit;
Other SQL statement syntax may also be used to subset the data to be joined. See Anatomy of a SQL query for more on SQL syntax.
- 1 Introduction: SAS syntax
- 2 Relational databases
- 3 What does a join do?
- 4 Types of joins
- 5 Keys and indexes
- 6 Things to consider
- 7 See also
The idea of relational databases is to have multiple tables that may be linked to other tables using unique keys. For example, in a company's database, an employees table may be linked to a salary table via a unique employee ID.
For much more on relational databases, see the Wikipedia article on the subject.
What does a join do?
A join transforms data by combining multiple sets together. While this sounds like a simple task, there are many possible combinations, even for just two input sets. Do you want to include items that have a match in both sets, or items that only match one set? Understanding basic set theory will reveal that there are many possible subsets based on the combination of two sets.
A subset is a "set within a set." A subset may have some or all of the elements in its superset. If this is confusing, read more and see simple examples at Wikipedia. You can join on full datasets or subsets, by filtering out the data via a DATA step or SQL query.
A union is all the distinct elements that are in one or both sets. Intersection refers to the elements that belong to both sets, not one or the other. In joins, the keywords UNION and INTERSECT are used.
The universal set is the set of all possible elements. For example, the universal set of numbers contains pi and -42. The subset of integers does not contain pi, and the subset of positive integers does not contain -42. A dataset is likely a subset of a universal set, but in some cases, a data table may contain every possible outcome or event in a given field.
SQL joins may be performed on more than two sets, following the same rules.
Types of joins
In the following explanations, these two tables will be used as examples:
no keys; a.k.a. Cartesian product
The Cartesian join or Cartesian product of
Left, right, or full
A right join is the same as a left join, but on the other set being joined (on the "right side" of the query). If the order of the datasets is switched, a left join will become a right join.
Full outer join ref: 
Full outer join vs. cross join / Cartesian product, with an example: 
Other types of joins
Keys and indexes
Indexes: basic or composite, see 
Things to consider
- Chasm trap 
- Fan trap  <-- not SAS per se, but has examples of this and chasm trap
- Hash join (up with indexes?)  
- Sort-merge join 
- table scans  
- One example on why they may not be recommended: 
- magic numbers; just avoid them? see ; maybe ask Cameron to clarify further or provide expertise
- threaded joins 
- normalization and denormalization
- And so on....
- Join (SQL) on Wikipedia, which goes much further in depth, but not including SAS-specifics
- Introduction to Proc SQL, which contains Venn diagrams of what joins look like
- Queries, Joins and Where Clauses, Oh My! Demystifying PROC SQL, which contains plenty of code examples
- MERGING vs. JOINING: Comparing the DATA Step with SQL
- Cartesian Product Joins, for the "default" join if no join criterion are given
- SQL Set Operators: So Handy Venn You Need Them
- Say NO to Venn Diagrams When Explaining JOINs, an alternative for visually explaining joins (discussion on reddit)