As the first step in the decommissioning of the site has been converted to read-only mode.

Here are some tips for How to share your SAS knowledge with your professional network.

Tips:Counting and miscounting occurrences using SQL

From sasCommunity
Revision as of 21:19, 4 November 2015 by Frankfry (Talk | contribs)

Jump to: navigation, search

Here's a simple question: How many company managers are not in the sales department? It sounds like a perfect candidate for an SQL sub-query. Count the number of managers who don't have an ID in the sales department.

proc sql;
select count(distinct manager_id) 'Count of Non-Sales Managers'
from company_master
where manager_id not in
(select employee_id from sales_department);

In this particular situation, however, there's a catch. The company's CEO doesn't have a manager. Therefore, the CEO's manager_id value is missing in the company_master dataset. The SQL count function does not count missing values and the answer is wrong. The CEO, although being the highest level manager in the company, isn't counted.

One fix to this 'mis-count' is to add the counts of missing and existing values: Select count(distinct manager_id + nmiss(manager_id) 'Count of Non-Sales Managers'. But who would ever think of doing this when writing sub-queries? It's just not intuitive.

Instead of using a sub-query technique, an inline view would count correctly and provide the correct answer. Why? Because an inline view counts occurrences, not values.

proc sql;
select count(*) 'Count of Non-Sales Managers'
from (select distinct manager_id from company_master
      except all
      select employee_id from sales_department)

Programmers like sub-queries. They are easy to understand. They build look-up lists which can be searched with the outer query. However, they may not be bulletproof. Inline views are almost as easy to understand and may be more reliable.

....see also

Submitted By Jim Sattler