Count Four Different Ways with SQL
From sasCommunity
Problem: given this table
data demo; input A; cards; 1 2 3 . 2 3 . 2 ;
count the A's. There are several ways to approach this:
- Include all rows, with COUNT(*); notice that this isn't really specific to column A.
- Include only non-missing values, with COUNT(A).
- Exclude not only missing values, but also duplicate values, with COUNT(DISTINCT A).
- Exclude duplicates, but include missing values (except, of course, duplicate missing values). SQL has no built-in mechanism for this specification, but one can be constructed.
Here is query which produces all four counts:
proc sql;
select count( *) as dupes_in__nulls_in
, count( A) as dupes_in__nulls_out
, count(distinct A) as dupes_out_nulls_out
, count(distinct A) +
max( missing(A) ) as dupes_out_nulls_in
from demo;
quit;
Result:
dupes_in__ dupes_in__ dupes_out_ dupes_out_
nulls_in nulls_out nulls_out nulls_in
------------------------------------------------------
8 6 3 4
Note that these expressions work equally well with numeric and character data. Demonstrate by replacing
input A;
with
input A $;
Q: How can varied special missing values be counted separately?
