Concatenation Operations with PROC APPEND

From sasCommunity

Jump to: navigation, search

--KirkLafler 23:02, 24 April 2007 (EDT)
By Kirk Paul Lafler, Software Intelligence Corporation


Contents

[edit] Introduction

The APPEND procedure provides SAS users with an efficient method for concatenating observations from a smaller data set to a larger data set. This paper explores how PROC APPEND is used, how it works, what happens when variables in the smaller data set are not present in the larger data set, and its advantages over other concatenation methods.


[edit] The Essence of PROC APPEND

The APPEND procedure (and APPEND statement of the DATASETS procedure) is an efficient method for concatenating observations from a smaller data set to a larger data set. The BASE= data set option is reserved for the larger of the two data sets with the DATA= option for the smaller data set. Essentially, the APPEND procedure avoids reading any observations in the BASE= data set by positioning the record pointer at the end of the BASE= data set. Each observation from the smaller data set is then applied one at a time to the end of the BASE= data set. In the first example, the BASE= data set identifies a larger data set called MASTER and the DATA= data set identifies the smaller TXN data set.

Code:

PROC APPEND
     BASE=master
     DATA=txn;
RUN;


When two or more data sets need to be concatenated, multiple APPEND procedures are issued. In the next example, two separate PROC APPEND steps are specified to concatenate the two smaller data sets (TXN1 and TXN2) at the end of the larger BASE= (MASTER) data set.

Code:

PROC APPEND
     BASE=master
     DATA=txn1;
RUN;
PROC APPEND
     BASE=master
     DATA=txn2;
RUN;


[edit] Forcing the Concatenation of Data Sets

When one or more variables in the input data set, represented by the DATA= option, are not present in the BASE= data set, or the variables in the input data set are not of the same type as the variables in the BASE= data set, an optional FORCE option can be specified on the PROC APPEND statement to force the concatenation of the data sets. Using the FORCE option also prevents an error from being generated in these situations. The next example illustrates the FORCE option being specified in the concatenation of the TXN data set to the MASTER data set.

Code:

PROC APPEND
     BASE=master
     DATA=txn
     FORCE;
RUN;


[edit] Selecting Observations for Concatenation

By default, all the rows of an input data set are typically concatenated to the end of the base data set. But when only selected rows of an input data set are desired, an optional WHERE clause can be specified for the associated input data set as a data set option. This is accomplished by specifying an optional WHERE clause. The WHERE clause defines the logical conditions that control which observations will be selected from an input data set, and ultimately concatenated to the end of the base data set. In the next example, a WHERE clause is specified to select only ‘PG’ and ‘PG13’ movies from the input data set for concatenation to the end of the base data set.

Code:

PROC APPEND
     BASE=master
     DATA=txn (WHERE=(rating in (‘PG’, ‘PG-13’)));
RUN;


[edit] Conclusion

The APPEND procedure is a powerful tool for SAS users to use for data set concatenation purposes. If used, the smaller data set should be specified as the input data set (DATA= option) and the larger data set as the base data set (BASE= option). Users can also concatenate unlike data sets using the FORCE option on the PROC APPEND statement as well as select which observations are captured from the input data set using a WHERE clause and concatenated to the end of the base data set. Users are encouraged to explore the many capabilities the APPEND procedure (or the APPEND statement of the DATASETS procedure) has to offer.




[edit] Contact Information

If you would like more information or have any questions about this tip, please contact: Kirk Paul Lafler, Software Intelligence Corporation at KirkLafler@cs.com. Kirk has been working with the SAS System since 1979 and is a SAS Certified Professional®. His company provides custom SAS programming, application design and development, consulting services, and hands-on SAS training to clients around the world. Kirk is the author of four books including PROC SQL: Beyond the Basics Using SAS by SAS Institute, and more than two hundred peer-reviewed articles and papers that have appeared in professional journals and SAS User Group proceedings. Kirk can be reached at:

                                       Kirk Paul Lafler
Software Intelligence Corporation
World Headquarters
P.O. Box 1390
Spring Valley, California 91979-1390
E-mail: KirkLafler@cs.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Personal tools