As the first step in the decommissioning of the site has been converted to read-only mode.

Here are some tips for How to share your SAS knowledge with your professional network.

Tips Talk:Nodupkey using PROC SQL

From sasCommunity
Jump to: navigation, search

Good tip. Any internal or external references? --Art Carpenter 20:19, 13 February 2012 (EST)

Do we want to promote a tip that is dependent upon an undocumented function? --Art T 18:54, 14 February 2012 (EST)

That is a good point, but do you know under what circumstances, if any, it fails? Usage note 15138 is a bit vague not to mention dated. Perhaps at least a caveat would be warranted. undated comment by ArtCarpenter

Art, no, I have NEVER seen it fail! But I seriously still question why SAS refuses to add it to the documentation, thus am always concerned about recommending it if it is going to be used in production code.--Art T 18:40, 15 February 2012 (EST)

I've seen it fail under some unusual circumstances (e.g., views and external data references). After releasing the monotonic() genie, I've advocated putting it back in the bottle. Preceding a SQL query with a Data step view that sets each row and adds a row counter (_N_) to it will allow us to replace the yield of monotonic with the row counter. One can then not select or drop the row counter from any dataset created. S

I will agree to withdraw my 'Good Tip'. In light of this discussion a better tip might be to write something about why not to use it. The monotonic function has shown up several time in the SAS Forums. It would be good to have an article to reference. Perhaps with an example. --Art Carpenter 01:27, 28 April 2012 (EDT)

The Use of monotonic() is just an example in here. It can be used with many other summary functions like min and max and obtain much useful results as illustrated below:

 data sample;
   input x $ y $ z $ w ;
   a1 b1 c2  7
   a1 b1 c1  5
   a1 b2 c3  2
   a1 b2 c2  9
   a1 b2 c6  6
   a1 b1 c4  4
   a2 b1 c1  6
   a2 b1 c2  4
   a2 b1 c4  1
   a2 b2 c7  7
   a3 b3 c8  9
 proc sort data = sort_out ; /* sorted against z to keep record with minimum of z */
   by x y z;
 proc sort data = sample out = sort_out nodupkey;
   by x  y;

The above 2 steps of proc sort can be simply done by following Proc sql block:

 proc sql;
   create table sql_out(drop = _temp_v)
        as select * ,min(z) as _temp_v from sample group by x,y 
                  having min(z) = z ;

These senarios are in very common where in we need to sort our data according to many field but may be delete/keep/select records based on some or those sorted fields.

Since the use of the MONOTONIC function is incidental to the tip then OP should consider using this alternative SQL step. --Art Carpenter 09:39, 20 July 2012 (EDT)

Actually I don't think the sort step is guaranteed to always return the first (or the last) record in the group. And the alternative SQL above does not handle the case where there are two records with the smallest value of z within a group. Perhaps the point of the tip needs to be rethought? Or qualified so that distinct records are always selected - but with the caveat that they may not always include the same record for each set of dups. --Don Henderson (talk) 20:35, 20 January 2014 (CST)

It's been a while; is there a next step? -- Charlie Shipp (talk) 01:04, 2 May 2015 (CDT)

I'm also wondering about a next step. I like the overall idea, especially showing how much coding is saved with PROC SQL, but I think it would be good to avoid the MONOTONIC function. This is an older discussion and I don't know if we are still in contact with the OP. We might want to discuss this at our next CAB meeting. --Otterm1 (talk) 22:12, 2 May 2015 (CDT)

Usage Note 39458 says the MONOTONIC function can be used for detailed data but the record numbers do not display correctly if the data is being aggregated. That being said, using an undocumented function without explanation, not even a comment, when it is an integral part of the procedure is not good programming practice. - Cameron (talk) 07:49, 4 July 2015 (CDT)