As the first step in the decommissioning of sasCommunity.org the site has been converted to read-only mode.


Here are some tips for How to share your SAS knowledge with your professional network.


NOBS option limitations

From sasCommunity
Jump to: navigation, search

The documentation for the SET statement tells us that NOBS= option initializes a variable "whose value is usually the total number of observations in the input data set or data sets". The inclusion of the word "usually" is notable. The text goes on state:

  • "The number of observations includes those observations that are marked for deletion but are not yet deleted."
  • "For certain SAS views, SAS cannot determine the number of observations. In these cases, SAS sets the value of the NOBS= variable to the largest positive integer value that is available in your operating environment."

In fact there are other circumstances which completely prevent NOBS from working or cause it to produce possibly unexpected results.

Here are two native SAS data files which will be used to illustrate the issues.

data elevens twelves ;
set sashelp.class ;
keep name age sex ;
select (age) ; 
   when (11) output elevens ;
   when (12) output twelves ;
   otherwise ;
   end ;
run ; 

ELEVENS:

Obs     Name     Sex    Age

 1     Joyce      F      11
 2     Thomas     M      11

TWELVES:

Obs     Name     Sex    Age

 1     James      M      12
 2     Jane       F      12
 3     John       M      12
 4     Louise     F      12
 5     Robert     M      12

Well-Behaved Operation of NOBS

Here is a trouble-free exercise of the NOBS option:

data _null_; 
put obscount=F16.0-L ;
stop ;
set twelves nobs=obscount ;
run ;

The result is

obscount=5

Obstacles

Views

Now exercise NOBS while processing a DATA step view:

data vv / view = vv ;
set sashelp.class(where = (age EQ 12) ) ;
run ; 

data _null_ ; 
put obscount=F16.0-L ;
stop ;
set vv nobs=obscount ;
run ;

The result (on a Windows system) is

obscount=9007199254740992

Sequential Engines

Sequential engines also get into trouble. We can use the XML engine, which is sequential, as an example:

libname xmlLib xml 'c:\temp\xml.xml' ;

data xmlLib.ss ;
set sashelp.class ;
run ; 

data _null_ ; 
put obscount=F16.0-L ;
stop ;
set xmlLib.ss nobs=obscount ;
run ; 

libname xmlLib clear ;

The result:

obscount=9007199254740992

Data Base Engines

Another situation which defeats the OBS option is third-party data base engines. Excel, though not a data base, shares the behavior in question. To illustrate, run

libname xlLib excel 'c:\temp\xlBook.xls' ; 

proc datasets library=xlLib nolist ;
delete Sheet1 ;
run ;
quit ;

data xlLib.Sheet1 ;
set sashelp.class ;
run ;

data _null_ ; 
put obscount=F16.0-L ;
stop ;
set xlLib.Sheet1 nobs=obscount ;
run ; 

libname xlLib clear ;

The result is

obscount=9007199254740992

OPEN=DEFER

A final circumstance preventing NOBS from working is the OPEN=DEFER option, as in


data _null_ ; 
put obscount=F16.0-L ;
stop ;
set elevens twelves open=defer nobs=obscount ;
run ; 

The result is

obscount=9007199254740992

Effect of Deleted Observations

Now consider circumstances in which NOBS works, but produces possibly unexpected or unwanted results. Specifically, delete some observations from one of the data sets:

data   twelves ;
modify twelves ;
if sex EQ 'F' then remove ;
run ;  

proc print data=twelves ;
run ;

This is the output:

Obs     Name     Sex    Age

 1     James      M      12
 3     John       M      12
 5     Robert     M      12

Notice the presentation of the original observation numbers. That is one clue that the deleted observations are still there. Indeed, exercising NOBS as in

data _null_ ; 
put obscount=F16.0-L ;
stop ;
set twelves nobs=obscount ;
run ; 

yields

obscount=5

NOBS and WHERE processing

NOBS has no interaction with WHERE processing. Consequently, a count generated using NOBS will include all observations. It will not exclude observations which the WHERE condition excludes. Here is an example:

data _null_; 
put obscount=F16.0-L ;
stop ;
set sashelp.class(where = (age EQ 12) ) nobs=obscount ;
run ;

The result is

obscount=19

Workarounds

The large integers indicating the inability of the NOBS option to operate can be replaced by special missing values:

data _null_ ; 
obscount = ifn (  obscount EQ constant('ExactInt')
                , (.N)
                , obscount
               ) ;
put obscount=F16.0-L ;
stop ;
set vv nobs=obscount ;
run ;

Result:

obscount=N

This represents perhaps a slight improvement in the situation.

To get a logical count of observations (that is, excluding deleted observations), use the ATTRN function instead of the NOBS option, as in

data _null_ ; 
dsid = open('twelves') ;
obscount = attrn(dsid,'nlobs') ;
rc = close(dsid) ;
put obscount=F16.0-L ;
run ;

The result is

obscount=3

See also POINT option limitations

Some of the issues illustrated above affect the MODIFY statement as well as the SET statement.

See Jack Hamilton's paper How Many Observations Are In My Data Set? for a macro which provides a more bullet-proof way of generating observation counts.