# A simple and powerful way to simulate your individual time series data

This tip is taken from the book Applying Data Science - Business Case Studies Using SAS by Gerhard Svolba. It shows how to use a SAS Datastep with random number generators and a SAS Informat to simulate monthly time series data with specific patterns such as trend, seasonal variation, breakpoints, and outliers. It also outlines options for analyzing the course of the time series with analytical methods that identify breakpoints and outliers.

## Prework: Preparing a Lookup Table for the Seasonal Variation

If you want to introduce a specific monthly variation into your data, you could for example use a sequence of IF/THEN/ELSE or SELECT/WHEN statements. A more elegant and flexible solution is to prepare a SAS Informat with the monthly average values.

    proc format;
      invalue fl_mon
         1 = 438
         2 = 426
         3 = 516
         4 = 494
         5 = 506
         6 = 536
         7 = 566
         8 = 573
         9 = 478
        10 = 508
        11 = 479
        12 = 490;
    run;

This INFORMAT is used with the INPUT function in the datastep to retrieve the respective value per month.
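
The lookup can be tried on its own before it is built into the simulation. The following is a minimal sketch, assuming the FL_MON informat from the PROC FORMAT step above has already been created. The variable names MONTH_CHAR and BASE are illustrative and not part of the original tip.

```sas
/* Minimal sketch: use the FL_MON informat as a lookup table.           */
/* Assumes the PROC FORMAT step above has been run in this session.     */
/* MONTH_CHAR and BASE are hypothetical names used for illustration.    */
data _null_;
  do month = 1 to 12;
    month_char = cats(month);            /* numeric to character, no blanks */
    base = input(month_char, fl_mon.);   /* look up the monthly value       */
    put month= base=;                    /* write the pair to the SAS log   */
  end;
run;
```

The explicit conversion with CATS is a design choice of this sketch. The INPUT function expects a character argument, so passing the numeric MONTH variable directly makes SAS convert it automatically and write a conversion note to the log.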

## Simulating the Data with a SAS Datastep

The datastep that creates the data is explained here step by step.

### Generating Data with a DO Loop

The following statements are used to create the data set FLIGHTS_SIMUL by using a DO loop to loop over the years from 1981 to 2000 and the months 1 to 12.

    data flights_simul;
      *** Initialize the seed for the random number generator;
      call streaminit(20886); *** you can use any number;
      format Date yymmp7. Passengers 8.;
      drop year month;
      do Year = 1981 to 2000;   *** Loop over Years;
        do month = 1 to 12;     *** Loop over Months;
          *** Prepare the TIME Variable;
          date = mdy(month,1,year);

Note that no SET statement is used, as no data set is used as an input source. The data are created in the DATA step with a nested DO loop. The date variable is created with the MDY function from the month and the year value, using the first day of each month.

### Defining the Basic Form of the Time Series

In the next step, the seasonal variation, a linear trend, and a random variation are introduced into the data. Note that the scalars 400, 40, and 1000 in the expressions are arbitrary and are used to shift and re-scale the distribution of the values.

          *** Use the INPUT function to retrieve values from the INFORMAT;
          passengers = (input(month, fl_mon.)-400)*40;

You see that the SAS informat FL_MON that was previously generated is used to “query” the monthly averages.

- For this purpose the MONTH variable is used in the INPUT function with the informat as created above.
- The resulting value is the monthly average for the respective month.

A positive linear trend is introduced and random variation is added with the RAND function that generates a uniformly distributed number.

          *** Add a linear trend to the data;
          passengers = Passengers + (year-1981+1)*1000;
          *** Add random variation to the data;
          passengers = passengers + rand('uniform')*1000;

Note that the RAND function is used here because it is the recommended way to generate random numbers in SAS. It uses the Mersenne-Twister algorithm, whose random number sequences have a much longer period than those of the older RANUNI function. You could alternatively also use the RANUNI function.
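
The interplay of CALL STREAMINIT and RAND can be checked in isolation. The following is a minimal sketch: the data set name RAND_DEMO and the variable names are illustrative, and the normal distribution with mean 500 and standard deviation 250 is an arbitrary alternative that is not used in the original tip.

```sas
/* Minimal sketch: reproducible random numbers with RAND.               */
/* RAND_DEMO and all variable names are hypothetical.                   */
data rand_demo;
  call streaminit(20886);          /* same seed gives the same sequence on rerun */
  do i = 1 to 5;
    u             = rand('uniform');            /* uniform between 0 and 1       */
    noise_uniform = u * 1000;                   /* rescaled as in the tip        */
    noise_normal  = rand('normal', 500, 250);   /* assumption: alternative noise */
    output;
  end;
run;

proc print data=rand_demo;
run;
```

Running the step twice with the same seed should print identical values, which is what makes a simulated data set reproducible for demonstrations.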

### Adding Structural Changes and Outliers

The following statements are used to add structural changes and outliers in the data. A shift of +20% is introduced for the years 1986 and 1987.

          *** Add outliers and level shifts;
          if year in (1986,1987) then passengers = passengers * 1.2;

The values in 1992 are cumulatively decreased by 300 for each month, so that January is reduced by 300 and December by 3600. The expression "Year in (1992)" shows a coding option to avoid an IF-statement: it evaluates to 1 when the condition is true and to 0 otherwise. You receive the same output when using the IF-statement. There are situations where you might want to write your value assignment as a one-line expression.

          passengers = Passengers + (year in (1992)) * (-month*300);
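
That the logical expression and the IF-statement give the same result can be verified with a small comparison. This is a minimal sketch with an arbitrary starting value of 10000. The data set BOOL_DEMO and the variables BASE, V_EXPR, V_IF, and SAME are illustrative names, not part of the original tip.

```sas
/* Minimal sketch: logical expression versus IF statement.              */
/* BOOL_DEMO, BASE, V_EXPR, V_IF, and SAME are hypothetical names.      */
data bool_demo;
  base = 10000;                        /* arbitrary starting value       */
  do year = 1991 to 1993;
    do month = 1, 6, 12;
      /* Variant 1: the comparison evaluates to 1 (true) or 0 (false) */
      v_expr = base + (year in (1992)) * (-month*300);
      /* Variant 2: the equivalent IF statement */
      v_if = base;
      if year = 1992 then v_if = v_if - month*300;
      same = (v_expr = v_if);          /* 1 when both variants agree     */
      output;
    end;
  end;
run;

proc print data=bool_demo;
run;
```

Only the rows for 1992 are changed, and SAME is 1 in every row.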

Positive and negative outliers are introduced for certain months:

          if date = '01APR1997'd then passengers = passengers * 1.25;
          if date = '01SEP1998'd then passengers = passengers * 0.8;
          if date = '01APR1990'd then passengers = passengers * 1.2;

### Output the records and close the Datastep

Finally, the records are output and the DATA step is closed.

          *** Output the record;
          output;
        end;
      end;
    run;

**You see that the SAS datastep is very powerful to simulate your time series data and to specify different types of patterns in the data. You can thus easily generate your data for software demonstrations or test data for your analyses.**
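
A quick plausibility check of the simulated data is a summary per year. The following is a minimal sketch, assuming the FLIGHTS_SIMUL data set has been created as described above. The choice of PROC MEANS and of the statistics shown is an assumption of this sketch and not part of the original tip.

```sas
/* Minimal sketch: yearly summary of the simulated series.              */
/* Assumes FLIGHTS_SIMUL exists; the statistics chosen are arbitrary.   */
proc means data=flights_simul n mean min max maxdec=0;
  class date;
  format date year4.;     /* group the monthly dates by calendar year */
  var passengers;
run;
```

Each year should show 12 observations. The raised level in 1986 and 1987 and the lowered level in 1992 should be visible in the yearly means.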

## Printing Selected Records

The following code prints the records for year 1992. This is the year where the monthly value was cumulatively decreased by 300 every month.

    proc print data=flights_simul;
      where year(date) = 1992;
    run;

    Obs    Date       Passengers
    133    1992.01         13962
    134    1992.02         12558
    135    1992.03         16658
    136    1992.04         15133
    137    1992.05         15567
    138    1992.06         16605
    139    1992.07         16903
    140    1992.08         17028
    141    1992.09         13077
    142    1992.10         14298
    143    1992.11         12073
    144    1992.12         12421

## Plotting the Time Series

The following figure shows the plots of the time series. It was created with the following SAS statements.

    proc sgplot data=flights_simul;
      series x=date y=passengers;
    run;
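
To make the simulated outliers easier to find in the plot, their dates can be marked by hand. This is a minimal sketch that adds a REFLINE statement with the three outlier dates from the datastep. The line pattern is an arbitrary choice, and the manual reference lines are an addition of this sketch rather than the method used in the book.

```sas
/* Minimal sketch: mark the three simulated outlier months in the plot. */
/* The dates are the ones used in the DATA step of this tip.            */
proc sgplot data=flights_simul;
  series x=date y=passengers;
  refline '01APR1990'd '01APR1997'd '01SEP1998'd
          / axis=x lineattrs=(pattern=shortdash);
run;
```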

## Running further Analyses

This example is taken from case study 2 of my book Applying Data Science - Business Case Studies Using SAS. In case study 2 you find an extensive discussion of how to smooth time series data and how to detect breakpoints and outliers with different SAS analytic procedures like PROC ADAPTIVEREG or PROC X13.

### Smoothing the Data with the EXPAND procedure

The data have been smoothed with a 12-month moving average using the CONVERT statement in the EXPAND procedure.
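
The exact call is not shown in this tip. The following is a minimal sketch of how such a smoothing step might look, assuming SAS/ETS is licensed. The output data set FLIGHTS_SMOOTH, the variable PASSENGERS_MA12, and the choice of a trailing rather than a centered moving average are assumptions.

```sas
/* Minimal sketch: 12-month moving average with PROC EXPAND (SAS/ETS).  */
/* FLIGHTS_SMOOTH and PASSENGERS_MA12 are hypothetical names.           */
proc expand data=flights_simul out=flights_smooth
            from=month method=none;   /* no interpolation, only transform */
  id date;
  convert passengers = passengers_ma12 / transformout=(movave 12);
run;

proc sgplot data=flights_smooth;
  series x=date y=passengers;
  series x=date y=passengers_ma12;
run;
```

Replacing MOVAVE with CMOVAVE would give a centered moving average instead of a trailing one.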

### Detecting Breakpoints with the ADAPTIVEREG procedure

The ADAPTIVEREG procedure has been used to automatically identify the breakpoints in the data. You see that the method has been able to spot the inserted changes in the data.
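
The model specification used in the book is not reproduced here. As a minimal sketch under that caveat, a call with the date as the only predictor could look as follows, assuming SAS/STAT is licensed. The knots that the procedure selects for DATE can then be read as candidate breakpoints.

```sas
/* Minimal sketch: search for breakpoints with PROC ADAPTIVEREG.        */
/* Assumption: DATE is the only predictor; the book may use other       */
/* settings.                                                            */
ods graphics on;

proc adaptivereg data=flights_simul plots=all;
  model passengers = date;
run;
```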

### Detecting Outliers with the X13 procedure

The X13 procedure has been used to automatically identify the outliers in the data. You see that the method has been able to spot the inserted outliers in the data.
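
The settings used in the book are not shown in this tip. The following is a minimal sketch of an outlier search with PROC X13, assuming SAS/ETS is licensed. The ARIMA specification (0,1,1)(0,1,1) and the restriction to additive outliers and level shifts are assumptions of this sketch.

```sas
/* Minimal sketch: automatic outlier detection with PROC X13 (SAS/ETS). */
/* Assumptions: airline-type ARIMA model, outlier types AO and LS.      */
proc x13 data=flights_simul date=date;
  var passengers;
  arima model=((0,1,1)(0,1,1));
  outlier type=(ao ls);   /* additive outliers and level shifts */
  estimate;
run;
```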

Note that the reference lines have been automatically inserted into the graph based on the detected time points. A tip that explains this method is planned to be added soon. Questions and comments can be sent to the author.