As the first step in the decommissioning of sasCommunity.org the site has been converted to read-only mode.


Here are some tips for How to share your SAS knowledge with your professional network.


POINT option limitations

From sasCommunity
Jump to: navigation, search

The documentation for the SET statement tells us that POINT= option "specifies a temporary variable whose numeric value determines which observation is read" and goes on to note a number of restrictions.

In fact there are other circumstances which completely prevent POINT from working or cause it to produce results which are not useful.

Here are two native SAS data files which will be used to illustrate the issues.

data elevens twelves ;
set sashelp.class ;
keep name age sex ;
select (age) ; 
   when (11) output elevens ;
   when (12) output twelves ;
   otherwise ;
   end ;
run ; 

ELEVENS:

Obs     Name     Sex    Age

 1     Joyce      F      11
 2     Thomas     M      11

TWELVES:

Obs     Name     Sex    Age

 1     James      M      12
 2     Jane       F      12
 3     John       M      12
 4     Louise     F      12
 5     Robert     M      12

Well-Behaved Operation of POINT

Here is a trouble-free exercise of the POINT option:

data _null_ ;
obsnum = 3 ;
set twelves point=obsnum ; 
put _all_ ;
stop ;
run ; 

The result is

obsnum=3 Name=John Sex=M Age=12 _ERROR_=0 _N_=1

Obstacles

Sequential Engines

Sequential engines are incompatible with POINT. We can use the XML engine, which is sequential, as an example:

libname xmlLib xml 'c:\temp\xml.xml' ;

data xmlLib.ss ;
set sashelp.class ;
run ; 

data _null_ ;
obsnum = 3 ;
set xmlLib.ss point=obsnum ; 
put _all_ ;
stop ;
run ;

libname xmlLib clear ;

The result:

ERROR: File XMLLIB.SS.DATA is sequential.
       This task requires reading observations in a random
       order, but the engine allows only sequential access.

Data Base Engines

Another situation which defeats the POINT option is third-party data base engines. Excel, though not a database, shares the behavior in question. To illustrate, run

libname xlLib excel 'c:\temp\xlBook.xls' ; 

proc datasets library=xlLib nolist ;
delete Sheet1 ;
run ;
quit ;

data xlLib.Sheet1 ;
set sashelp.class ;
run ;

data _null_ ;
obsnum = 3 ;
set xlLib.Sheet1 point=obsnum ; 
put _all_ ;
stop ;
run ; 

libname xlLib clear ;

The result is

ERROR: The POINT= data set option is not valid for the data set XLLIB.Sheet1,
       the data set must be randomly not sequentially accessible for POINT= processing.

OPEN=DEFER

Another circumstance preventing POINT from working is the OPEN=DEFER option, as in


data _null_ ;
obsnum = 3 ;
set elevens twelves open=defer point=obsnum ; 
put _all_ ;
stop ;
run ; 

The result is

ERROR: The POINT= option is incompatible with OPEN=DEFER.

WHERE

Presence of a WHERE condition affecting the data sets in question also incapacitates POINT. To illustrate:

data _null_ ;
obsnum = 3 ;
set sashelp.class(where = (age EQ 12) ) point=obsnum ; 
put _all_ ;
stop ;
run ; 

The result is

ERROR: The POINT= option is inconsistent with a data set
       with WHERE expression processing active.

Effect of Deleted Observations

Now consider circumstances in which POINT works, but presents difficulties. Specifically, delete some observations from one of the data sets:

data   twelves ;
modify twelves ;
if sex EQ 'F' then remove ;
run ;  

proc print data=twelves ;
run ;

This is the output:

Obs     Name     Sex    Age

 1     James      M      12
 3     John       M      12
 5     Robert     M      12

Notice the presentation of the original observation numbers. That is one clue that the deleted observations are still there. Indeed, exercising POINT as in

data _null_ ;
obsnum = 3 ;
set twelves point=obsnum ; 
put _all_ ;
stop ;
run ;

yields

obsnum=3 Name=John Sex=M Age=12 _ERROR_=0 _N_=1

Of course that's because OBSNUM is designating an observation which was not deleted. See what happens if the 3 is changed to 4:

data _null_ ;
obsnum = 4 ;
set twelves point=obsnum ; 
put _all_ ;
stop ;
run ; 

The result is:

obsnum=4 Name=  Sex=  Age=. _ERROR_=1 _N_=1
obsnum=4 Name=  Sex=  Age=. _ERROR_=1 _N_=1

The missing values reflect the failure of the read operation. Repetition occurs because the "PUT _ALL_;" statement tells SAS to dump the program data vector and the setting of the _ERROR_ flag to 1 causes SAS to automatically dump the PDV. What is odd is the absence of an error message along the lines of "ERROR: ATTEMPT TO PROCESS DELETED OBSERVATION."

Consider a slightly more realistic example. Suppose the task is to present the 1st, 4th, and 5th observations. Simply ignoring the deletion issue, we could run this code:

data boys ;	
do obsnum = 1 , 4 , 5 ;
   set twelves point=obsnum ;
   output ;
   end ;
stop ;
run ; 

The result is

 Name     Sex    Age

James      M      12
James      M      12
Robert     M      12

When the attempt is made to read the 4th observation, it fails because that observation has been marked as deleted (though it is actually still in the file). Nothing is changed in the PDV, so the data values from the previous observation ("James" etc.) remain in place and are written again to the output data set.

Workarounds

The simplest solution is to apply the knowledge that observation #4 is unavailable and exclude it from the loop:

data boys ;
do obsnum = 1 , 5 ;
   set twelves point=obsnum ;
   output ;
   end ;
stop ;
run ; 

The output is

 Name     Sex    Age

James      M      12
Robert     M      12

as desired. But typically the required information is not available. What is needed is a programatic approach which will in effect step over the deleted observations.

One approach is to inspect the _ERROR_ flag after each attempt to read, and make the output trigger conditional:

data boys ;		
do obsnum = 1 , 4 , 5 ;
   set twelves point=obsnum ;
   if _ERROR_ then _ERROR_ = 0 ;
   else output ;
   end ;
stop ;
run ; 

The result is again:

 Name     Sex    Age

James      M      12
Robert     M      12

Another technique exploits the IN= data set option:

data boys ;		
do obsnum = 1 , 4 , 5 ;
   in_flag = 0 ;
   set twelves(in=in_flag) point=obsnum ;
   if in_flag then output ;
   else _ERROR_ = 0 ;
   end ;
stop ;
run ;

As before, the result is

 Name     Sex    Age

James      M      12
Robert     M      12

See also NOBS option limitations

Some of the issues illustrated above affect the MODIFY statement as well as the SET statement.