As the first step in the decommissioning of sasCommunity.org the site has been converted to read-only mode.


Here are some tips for How to share your SAS knowledge with your professional network.


Look-Ahead and Look-Back

From sasCommunity
Jump to: navigation, search


As part of the decommissioning effort for sasCommunity.org this article/tip has been migrated to communities.sas.com.

The new home for this article/tip is Look-Ahead and Look-Back (https://communities.sas.com/t5/SAS-Communities-Library/Look-Ahead-and-Look-Back/ta-p/475772)



A common requirement in programming is to be able to look ahead at the next record or back at the previous record in order to make a calculation or comparison. In the SAS DATA step, look-ahead is trickier than look-back. In SQL, there is symmetry.

For example, start with

data have;
input ID $ Index Measure;
cards;
A 1 11
A 2 12
A 3 13
B 1 21
B 2 22
;

How can each observation be extended to include the next and previous values of MEASURE, with missing values in the appropriate places at the beginning, end, and ID boundaries? In other words, how can one derive this data set:

                            Next_      Prev_
 ID    Index    Measure    Measure    Measure    
A 1 11 12 . A 2 12 13 11 A 3 13 . 12 B 1 21 22 . B 2 22 . 21

Here's a DATA step solution:

data look_both_ways;
set have;
by ID;
set have ( firstobs = 2 keep = Measure rename = (Measure = Next_Measure) )
    have (      obs = 1 drop = _all_                                     );
Prev_Measure = ifn( first.ID, (.), lag(Measure) );
Next_Measure = ifn(  last.ID, (.), Next_Measure );
run;

Notice that the look-back values are supplied by the LAG function, while the look-ahead requires an additional SET statement coordinated carefully with the first one.

Here's a PROC SQL solution in the form of a chain of LEFT JOINs:

proc sql;
create table look_both_ways as
 select   have.*
        , next.Measure as Next_Measure
        , prev.Measure as Prev_Measure           
  from have
       left join
       have as prev
         on have.ID    = prev.ID
            and 
            have.Index = prev.Index + 1
       left join
       have as next
         on have.ID    = next.ID
            and
            have.Index = next.Index - 1
;
quit;

Notice that this SQL solution depends on the INDEX values with their equal increments, not on the order of the rows in the given table.

Here's another SQL solution, one which does not need the INDEX column but rather depends on the strict monotonicity of the column (MEASURE in this case) for which we need the look-ahead and look-back. Instead of a chain of joins, there is a structure of nested queries.

create table look_both_ways as
 select lookback.*, next.Measure as Next_Measure         
  from  ( select have.*, prev.Measure as Prev_Measure         
           from have left join have as prev
             on have.ID = prev.ID and prev.Measure < have.Measure
           group by have.ID, have.Measure
           having prev.Measure = max(prev.Measure)
        ) as lookback 
   left join have as next
     on lookback.ID = next.ID and next.Measure > lookback.Measure
   group by lookback.ID, lookback.Measure
   having next.Measure = min(next.Measure)
;

This may be a bit easier to follow if broken into two statements, with the inline view replaced by a named view.

create view lookback as
 select have.*, prev.Measure as Prev_Measure         
  from have left join have as prev
   on have.ID = prev.ID and prev.Measure < have.Measure
  group by have.ID, have.Measure
  having prev.Measure = max(prev.Measure)
  ;
create table look_both_ways as
 select lookback.*, next.Measure as Next_Measure         
  from lookback left join have as next
   on lookback.ID = next.ID and next.Measure > lookback.Measure
  group by lookback.ID, lookback.Measure
  having next.Measure = min(next.Measure)
  ;

All of the solutions produce the specified results, but that's only because of the way the example is contrived. The DATA step "reads" the order of the observations in the source. The first SQL view reads the evenly separated INDEX values. The second SQL solution reads the sequence of the MEASURE values. In general the three methods will produce different results.


See also: Four methods of performing a look-ahead read