Count Four Different Ways with SQL

From sasCommunity

Jump to: navigation, search

Problem: given this table

data demo;
input A;
cards;
  1
  2
  3
  .
  2
  3
  .
  2
;

count the A's. There are several ways to approach this:

  • Include all rows, with COUNT(*); notice that this isn't really specific to column A.
  • Include only non-missing values, with COUNT(A).
  • Exclude not only missing values, but also duplicate values, with COUNT(DISTINCT A).
  • Exclude duplicates, but include missing values (except, of course, duplicate missing values). SQL has no built-in mechanism for this specification, but one can be constructed.

Here is query which produces all four counts:

proc sql;
select   count(         *)   as dupes_in__nulls_in
       , count(         A)   as dupes_in__nulls_out
       , count(distinct A)   as dupes_out_nulls_out
       , count(distinct A) +
           max( missing(A) ) as dupes_out_nulls_in
 from demo;
quit;

Result:

  dupes_in__    dupes_in__    dupes_out_    dupes_out_
    nulls_in     nulls_out     nulls_out      nulls_in
------------------------------------------------------
           8             6             3             4

Note that these expressions work equally well with numeric and character data. Demonstrate by replacing

input A;

with

input A $;

Q: How can varied special missing values be counted separately?

Personal tools