As the first step in the decommissioning of the site has been converted to read-only mode.

Here are some tips for How to share your SAS knowledge with your professional network.

Difference between revisions of "Tips:Counting and miscounting occurrences using SQL"

From sasCommunity
Jump to: navigation, search
(It's READY; and can also be REFINED as needed.)
(Scheduled for November 6, 2015)
Line 40: Line 40:
<!-- Please do not edit below this line, EXCEPT when promoting a tip -->
<!-- Please do not edit below this line, EXCEPT when promoting a tip -->
<div style="float:right">Submitted By [[User:Jsattler|Jim Sattler]]</div>
<div style="float:right">Submitted By [[User:Jsattler|Jim Sattler]]</div>
[[Category:Tip Ready]]
[[Category:Tip in Use]]
[[Category:Tip Awaiting Refinement]]

Revision as of 21:19, 4 November 2015

Here's a simple question: How many company managers are not in the sales department? It sounds like a perfect candidate for an SQL sub-query. Count the number of managers who don't have an ID in the sales department.

proc sql;
select count(distinct manager_id) 'Count of Non-Sales Managers'
from company_master
where manager_id not in
(select employee_id from sales_department);

In this particular situation, however, there's a catch. The company's CEO doesn't have a manager. Therefore, the CEO's manager_id value is missing in the company_master dataset. The SQL count function does not count missing values and the answer is wrong. The CEO, although being the highest level manager in the company, isn't counted.

One fix to this 'mis-count' is to add the counts of missing and existing values: Select count(distinct manager_id + nmiss(manager_id) 'Count of Non-Sales Managers'. But who would ever think of doing this when writing sub-queries? It's just not intuitive.

Instead of using a sub-query technique, an inline view would count correctly and provide the correct answer. Why? Because an inline view counts occurrences, not values.

proc sql;
select count(*) 'Count of Non-Sales Managers'
from (select distinct manager_id from company_master
      except all
      select employee_id from sales_department)

Programmers like sub-queries. They are easy to understand. They build look-up lists which can be searched with the outer query. However, they may not be bulletproof. Inline views are almost as easy to understand and may be more reliable.

....see also

Submitted By Jim Sattler