As the first step in the decommissioning of sasCommunity.org the site has been converted to read-only mode.


Here are some tips for How to share your SAS knowledge with your professional network.


Solutions in the Round -- How Many Observations in the Dataset

From sasCommunity
Jump to: navigation, search

Introduction

From the WUSS abstract for this topic

There are many ways to determine the number of observations in a dataset. We will identify several methods for doing this and will discuss the merits of each. Join us for this group discussion! Solutions in the Round is an exciting new section [at WUSS 2014]. The format is simple: we sit in a circle and discuss the many ways to approach and address a programming problem, and we'll identify situations where one approach might be more useful than another. We will keep notes, and those notes will be posted on sasCommunity.org where we can continue the discussion and allow other to join in after the conference. We hope to engage users from all different perspectives and experience levels to participate in these discussions.

Discussion at WUSS

Suggested solutions:

  • PROC CONTENTS
  • Data Step
if 0 then  set xyz nobs=nobs;
only if no deleted observations
maybe not for viewers
  • PROC SQL
proc sql 
   select nobs
   into :nobs
   from dictionary.tables
   where libname eq "&libref" and memname eq "&data" and memtype eq 'Data';
quit;
 
libname eq "%uppercase(&libref)"
  • &sqlobs
  • PROC FREQ with NLEVELS option
  • other macro option:
dsid=%sysfunc(open(dsn));
nobs=%sysfunc(attrn(dsid, nobs);
  • NLOBSF
  • PROC MEANS with N and NMISS
  • Kirk Lafler metadata webinar available
  • other data step option
data a;
   set b end=last;
   x=_n_;
   if last then output;
run;
  • other data step option -- needs an output statement
data a;
   found_some=0;
   set b (where clause);
   found_some=I; /*not sure if this is supposed to be an I or a 1*/
   stop;
run;

Further Discussion -- Open to All

Please join the conversation! Also, if you were one of the live participants, please feel free to correct any mistakes or omissions from our original discussion.

  • First of all, my apologies for the delay in posting this to the site -- several months after the conference. It was a lot of fun to host this section and to collaborate in such a free-formatted section, and I hope that we can continue the discussion here. As the section chair, and in hopes of breaking the ice, I challenged myself to propose the worst solution to every problem. I think that it is important to identify the full spectrum of solutions to a problem, so that we can also identify the things that we are trying to avoid. Also, we can all agree that there is usually not just one solution that fits every single problem. In most cases, we would like a solution that avoids processing through every record on the dataset. However, in cases where there is very little data, this is not always necessary. I look forward to hearing others' thoughts. Cheers, --Otterm1 (talk) 10:58, 30 December 2014 (CST)
  • Many of the above solutions are only applicable if the database engine is SAS. SAS Data Integration Studio counts the number of observations in a table in different ways depending on the type of the database engine. While the NOBS property can be used for SAS tables, this is not available for other database engines. Then Data Integration Studio uses a SQL pass-through procedure with a SELECT statement to COUNT(*) the number of records in the table. Querying the NOBS property is quick to do, but counting records takes time. When, where and how records are counted may change depending on the situation. And you can exploit some situations to count records while doing something else. No one solution is the best in every situation. - Cameron (talk) 14:28, 30 December 2014 (CST)