SAS® offers several ways to find the top n% and bottom n% of data values based on a numeric variable. One method is the RANK procedure with the GROUPS= option. Another is the UNIVARIATE procedure with the PCTLPTS= option. Because there are several ways to perform this task, you can choose the procedure that you are most familiar with. In this blog post, I use the SUMMARY procedure to generate the percentile values and macro logic to dynamically choose the desired percentile statistics. After the percentiles are generated, I subset the data set based on those values. This blog post provides two detailed examples: one calculates percentiles for a single variable and one calculates percentiles within a grouping variable.
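For comparison, here is a minimal sketch of the two alternative methods mentioned above. The data set names (Have, Pctls, Ranked, Extremes), the variable X_GROUP, and the seed are hypothetical and chosen only for illustration. Because PROC RANK assigns groups from ranks rather than from computed percentile values, the rows it keeps might differ slightly from those selected by a percentile cutoff.

```sas
/* Hypothetical sample data: 1,000 uniform random values */
data have;
   call streaminit(2024);
   do i = 1 to 1000;
      x = rand('uniform') * 100;
      output;
   end;
   drop i;
run;

/* PROC UNIVARIATE: write the 1st and 99th percentiles of X to a data set. */
/* PCTLPRE= supplies the variable-name prefix, giving x_p1 and x_p99.      */
proc univariate data=have noprint;
   var x;
   output out=pctls pctlpts=1 99 pctlpre=x_p;
run;

/* PROC RANK: GROUPS=100 assigns each observation a group number from 0 to 99 */
proc rank data=have groups=100 out=ranked;
   var x;
   ranks x_group;
run;

/* Keep only the lowest and highest groups */
data extremes;
   set ranked;
   where x_group in (0, 99);
run;
```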
Calculate Percentiles of a Single Variable
Calculating percentiles of a single variable includes the following steps. Within the macro, a PROC SUMMARY step calculates the percentiles. The subsequent DATA step uses CALL SYMPUTX to create macro variables for the percentile values, and the final DATA step uses those macro variables to subset the data. Here is the code, which is explained in detail below:
/* Create sample data */
data test;
   do i=1 to 10000;
      x=ranuni(i)*12345;
      output;
   end;
   drop i;
run;

proc sort data=test;
   by x;
run;

%macro generate_percentiles(ptile1,ptile2);
   /* Output desired percentile values */
   proc summary data=test;
      var x;
      output out=test1 &ptile1= &ptile2= / autoname;
   run;

   /* Create macro variables for the percentile values */
   data _null_;
      set test1;
      call symputx("&ptile1", x_&ptile1);
      call symputx("&ptile2", x_&ptile2);
   run;

   %put &&&ptile1;
   %put &&&ptile2;

   data test2;
      set test;
      /* Use a WHERE statement to subset the data */
      where x le &&&ptile1 or x ge &&&ptile2;
   run;

   proc print;
   run;
%mend;

options mprint mlogic symbolgen;
%generate_percentiles(p1,p99)
%generate_percentiles(p25,p75)
After creating and sorting the sample data, I begin my macro definition with two parameters that enable me to substitute the desired percentiles in my macro invocation:
%macro generate_percentiles(ptile1,ptile2);
The PROC SUMMARY step writes the desired percentiles for variable X to the Test1 data set. The AUTONAME option names each percentile statistic in the format <varname>_<percentile> (for example, x_p25).
proc summary data=test;
   var x;
   output out=test1 &ptile1= &ptile2= / autoname;
run;
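To make the effect of AUTONAME concrete, the sketch below compares it with naming the output variables explicitly. It assumes that the Test data set from the sample-data step already exists. The output data set names Auto_Names and Explicit_Names are hypothetical.

```sas
/* Assumes the TEST data set from the sample-data step exists. */

/* With AUTONAME, SAS builds the names x_P25 and x_P75 */
proc summary data=test;
   var x;
   output out=auto_names p25= p75= / autoname;
run;

/* Without AUTONAME, the names are supplied explicitly */
proc summary data=test;
   var x;
   output out=explicit_names p25=x_p25 p75=x_p75;
run;

/* Both outputs also contain the automatic variables _TYPE_ and _FREQ_ */
proc print data=auto_names;
run;

proc print data=explicit_names;
run;
```

AUTONAME is convenient inside the macro because the variable names follow from the statistic keywords, so the macro does not have to build them itself.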
Next, I store the values of the percentile statistics in macro variables so that I can use them in later processing. I use CALL SYMPUTX to do this, giving each macro variable the same name as its statistic (for example, p25). To see the resulting values in the log, I use %PUT statements:
data _null_;
   set test1;
   call symputx("&ptile1", x_&ptile1);
   call symputx("&ptile2", x_&ptile2);
run;

%put &&&ptile1;
%put &&&ptile2;
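The three ampersands in &&&ptile1 can be puzzling at first, so here is a small stand-alone sketch of how the macro processor resolves the reference. The value 3086.25 is made up and is assigned with %LET rather than CALL SYMPUTX, purely for illustration.

```sas
/* Hypothetical values, set by hand instead of by CALL SYMPUTX */
%let ptile1 = p25;
%let p25    = 3086.25;

/* One ampersand resolves the parameter itself */
%put &ptile1;     /* writes: p25 */

/* Two ampersands: && becomes &, leaving &ptile1, which resolves to p25 */
%put &&ptile1;    /* writes: p25 */

/* Three ampersands: && becomes & and &ptile1 becomes p25, leaving &p25, */
/* which resolves on the next pass to the stored value                   */
%put &&&ptile1;   /* writes: 3086.25 */
```

This indirect reference is what lets the same macro code read whichever percentile macro variables the invocation requests.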