Kaplan-Meier Survival Plotting Macro %NEWSURV
As part of the decommissioning effort for sasCommunity.org this article/tip has been migrated to communities.sas.com.
- 1 Abstract
- 2 Online Materials
- 3 Key Features
- 3.1 Automates Analysis
- 3.2 Builds its Own Plot Data Set
- 3.3 Customizable Graph
- 4 Examples of Key Parameters
- 4.1 Dataset for Examples
- 4.2 Example 1: Basic Macro Call
- 4.3 Example 2: Plotting Multiple Curves with a CLASS Parameter
- 4.4 Example 3: Modifying the Axes and Curves
- 4.5 Example 4: Modifying the Statistics Table and Showing Event-free Rates
- 4.6 Example 5: Adding Patients-at-Risk
- 4.7 Example 6: Cumulative Incidence
- 4.8 Example 7: Multiple Plots
- 4.9 Example 8: Reference Lines
- 4.10 Example 9: Confidence Intervals
- 5 Expansion Macros
- 6 Contact Info
Abstract
Pharmaceutical research and oncology clinical trials depend heavily on time-to-event endpoints such as overall survival and progression-free survival. One of the best graphical displays of these analyses is the Kaplan-Meier curve, which is simple to generate with the LIFETEST procedure but difficult to customize. Journals generally prefer that statistics such as median time-to-event, number of patients, and time-point event-free rate estimates be displayed within the graphic itself, which was previously difficult to do without an external program such as Microsoft Excel. The NEWSURV macro uses the Graph Template Language (GTL) that was added with the SG graphics engine to provide this level of customization without manual post-processing. Taking this one step further, the macro was extended to generate a lattice of multiple unique Kaplan-Meier plots for side-by-side comparisons or for condensing figures for publication. The paper describes the functionality of the macro and how its key elements work.
View the PDF for this paper
View the PowerPoint Slides: Media:BB13-Kaplan_-_Meier_Survival_Plotting_Macro_%NEWSURV.pptx
Version of the macro: Media:Newsurv_pharmasug.sas
Best Paper Award in Advanced Analytics
View the PDF for this paper
View the PowerPoint Slides: Media:AA08-Kaplan_-_Meier_Survival_Plotting_Macro_%NEWSURV.pptx
Version of the macro: Media:Newsurv_mwsug.sas
SAS Global Forum 2015
View the PDF for Media:newsurv_sgf_paper.pdf
View the PowerPoint Slides: Media:newsurv_sgf_slides.pptx
Version of the macro: Media:newsurv_sgf.sas
View the PDF for Media:Pharmasug_2017_paper.pdf
View the PowerPoint Slides: TBA
Version of the macros:
May 21, 2017: Media:newsurv_05212017.sas
January 18, 2016: Media:Newsurv_01182016.sas
March 18, 2016: Media:Newsurv_03182016.sas
May 9, 2016: Media:Newsurv_05092016.sas
August 1, 2016: Media:Newsurv_08012016.sas
Key Features
The NEWSURV macro automates several analyses commonly performed on time-to-event endpoints and creates a highly customizable, publication-quality plot. The macro finds the number of patients, number of events, time-point estimates, hazard ratios, median time-to-event, and p-value comparisons, and summarizes them in two ways. First, they can be placed directly into the plot in a neatly organized table. Second, they can be displayed in a separate manuscript-quality table.
Automates Analysis
%NEWSURV automates the following analyses using the procedures noted:
- Kaplan-Meier Analyses:
- Number of patients/events (PROC LIFETEST)
- Median Time-to-Event w/95% Confidence Bounds (PROC LIFETEST)
- Time-Point Event-Free Rates w/95% Confidence Bounds (PROC LIFETEST)
- Unstratified/Stratified Logrank/Wilcoxon P-value (PROC LIFETEST)
- Patients-at-Risk Counts (PROC LIFETEST)
- Cumulative Incidence Analyses:
- The code used to compute cumulative incidence was adapted from the SAS autocall macro %CIF. That macro uses PROC IML for most of its calculations, but because PROC IML is an optional add-on that some companies do not have, DATA step array processing is used as a substitute. The matrix operations within PROC FCMP are used to calculate the p-value.
- Number of patients/events (DATA STEP)
- Median Time-to-Event w/95% Confidence Bounds (DATA STEP)
- Time-Point Event-Free Rates w/95% Confidence Bounds (DATA STEP)
- Unstratified/Stratified Gray K-sample P-value (DATA STEP/PROC FCMP)
- Patients-at-Risk Counts (DATA STEP)
- Cox Proportional Hazards Regression
- Hazard Ratio w/95% Confidence Bounds (PROC PHREG)
- Wald Parameter P-values (PROC PHREG)
- Wald/Score/Likelihood-Ratio Type 3 Test P-values (PROC PHREG)
- Competing Risks (PROC PHREG - SAS 9.4+ Only)
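For orientation, the Kaplan-Meier and Cox analyses listed above correspond roughly to procedure calls like the ones below. This is a minimal sketch and not the macro's internal code. It assumes the BMT example data set created later on this page, with STATUS=0 as the censoring value, and the time-points (365 and 730 days) are arbitrary choices for illustration.

```sas
ods graphics on;

/* Kaplan-Meier estimates by group with log-rank and Wilcoxon tests.  */
/* TIMELIST= requests time-point estimates (here at 365 and 730 days) */
/* and ATRISK= adds patients-at-risk counts to the survival plot.     */
proc lifetest data=bmt method=km timelist=365 730
              plots=survival(atrisk=0 to 2500 by 500);
   time ftime*status(0);
   strata diagnosis / test=(logrank wilcoxon);
run;

/* Cox model: hazard ratios with 95% limits, Wald parameter tests,    */
/* and Wald/score/likelihood-ratio Type 3 tests.                      */
proc phreg data=bmt;
   class diagnosis (ref='ALL') / param=ref;
   model ftime*status(0) = diagnosis / ties=breslow risklimits type3(all);
run;
```

The macro gathers the equivalent output from these procedures automatically, so calls like these are only needed when checking its results by hand.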
There are many parameters for customizing the automated analysis, including:
- Stratification where available
- Categorical and continuous adjusting factors for hazard ratios
- Class variable options
- Reference value
- Order of values
- Cumulative incidence variance calculations
- Ability to subset with a WHERE clause without making a new data set
- Ability to indicate censor value or event value
- Method for ties within Cox models
Displays Statistics from Analysis in Plot
The macro can display the following statistics from the automated analysis within the graph itself:
- Number of events and patients
- Median time-to-event w/95% confidence bounds
- Time-point event-free rates w/95% confidence bounds
- Hazard ratios w/95% confidence bounds (adjusted or unadjusted)
- P-values (stratified or unstratified, adjusted or unadjusted)
- Type-3 tests: Score, likelihood ratio, and Wald
- Covariate level comparisons: Wald
- Patients-at-risk at specified time-points
Builds its Own Plot Data Set
The NEWSURV macro pulls the results from the procedures it runs and inserts them into a final data set used for plotting. The data set is built according to the following logic:
- Each model has its own set of columns
- The NEWSURV macro can produce multiple plots within the same image in a rectangular array.
- Each individual plot has its own set of variables to make up the components
- Step plots
- Censor marking scatter plots
- Patients-at-risk block plots
- Confidence bounds
- Reference lines
- Each Class level has its own set of columns
- Within each plot, each level of the class variable has its own set of columns. This makes it easier to customize each component of the plot.
- Time and survival function for step plot and scatter plots
- Time and patients-at-risk numbers for block plot
- Confidence bounds for survival function band plots
- Class stratum level for identification purposes
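The one-set-of-columns-per-level layout can be sketched with a toy data set and a small GTL template. Everything here is hypothetical: the data set name, the column names (T1/S1, T2/S2, CT1/CS1), and the values are made up for illustration and are not the names the macro actually generates.

```sas
/* Hypothetical wide layout: T1/S1 belong to class level 1, T2/S2 to  */
/* level 2, and CT1/CS1 hold censor marks for level 1 (toy values).   */
data plot_sketch;
   input t1 s1 t2 s2 ct1 cs1;
   datalines;
0  1.0 0  1.0 .  .
5  0.8 3  0.9 7  0.8
9  0.6 8  0.7 .  .
12 0.6 15 0.5 12 0.6
;
run;

proc template;
   define statgraph km_sketch;
      begingraph;
         layout overlay / xaxisopts=(label="Time")
                          yaxisopts=(label="Proportion Event-Free"
                                     linearopts=(viewmin=0 viewmax=1));
            /* One step plot per class level, each styled independently */
            stepplot x=t1 y=s1 / name="lvl1" legendlabel="Level 1"
                                 lineattrs=(color=black pattern=solid);
            stepplot x=t2 y=s2 / name="lvl2" legendlabel="Level 2"
                                 lineattrs=(color=red pattern=dash);
            /* Censor marks are drawn from their own pair of columns */
            scatterplot x=ct1 y=cs1 / markerattrs=(symbol=plus color=black);
            discretelegend "lvl1" "lvl2";
         endlayout;
      endgraph;
   end;
run;

proc sgrender data=plot_sketch template=km_sketch;
run;
```

Because each level has its own columns, each plot statement can carry its own color, pattern, and legend label without any grouping logic.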
<Insert Example of data set here>
Customizable Graph
Nearly all parts of the graph are customizable with multiple options. The following sections cover options associated with common components.
Step Plot Curves and Confidence Bounds
- Thickness, color, and pattern
- Thickness, color, pattern, fill color, and transparency of band plot
Censor Indicating Scatter Plot
- Size and color
Patients-at-Risk Block Plot
- Size and weight of font
- Location of block plot (Above x-axis or below x-axis)
- Location of block plot labels (Left of block plot, above block plot, none)
- Match color to step plot curves
- Text size and weight
- Statistics column header text
- Which statistics are displayed and their order
- Number of patients and number of events
- Number of patients and number of events separately ###
- Number of events/number of patients ###/###
- Median time-to-event w/95% confidence bounds ###.# (###.#-###.#)
- Hazard ratios w/95% confidence bounds ###.# (###.#-###.#)
- Time-point event-free rates
- Column for time-points (e.g. 60 days, 4 months, etc) which can be disabled
- Column for event-free rates w/95% confidence bounds ###.# (###.#-###.#)
- P-value format (#.####), values less than 0.0001 shown as <0.0001
- Column for covariate level Wald p-values
- Section for type 3 p-value of the class variable
- Manually entered comments are available with the TABLECOMMENTS parameter
- Text size and weight
- Overall titles/footnotes for the image as well as individual titles/footnotes for each plot
- Superscripts/subscripts/Unicode available
- Y-axis can be set to proportions or percentages
- X-axis can be transformed into other time units (days to months, etc.)
- Automatic or manual labels
- Automatic labels will be "Percent/Proportion with Event" for Y-axis and the time variable's label for the X-axis
- Manual labels can be entered to override automatic labels
- Label and tick value text size and weight
- Top and right frames can be disabled
Examples of Key Parameters
Dataset for Examples
Make the dataset with the following code (also in the macro documentation):
proc format;
   value grpLabel 1='ALL' 2='AML low risk' 3='AML high risk';
run;

data BMT;
   input DIAGNOSIS Ftime Status Gender @@;
   label Ftime="Days";
   format Diagnosis grpLabel.;
   datalines;
1 2081 0 1  1 1602 0 1  1 1496 0 1  1 1462 0 0
1 1433 0 1  1 1377 0 1  1 1330 0 1  1 996 0 1
1 226 0 0  1 1199 0 1  1 1111 0 1  1 530 0 1
1 1182 0 0  1 1167 0 0  1 418 2 1  1 383 1 1
1 276 2 0  1 104 1 1  1 609 1 1  1 172 2 0
1 487 2 1  1 662 1 1  1 194 2 0  1 230 1 0
1 526 2 1  1 122 2 1  1 129 1 0  1 74 1 1
1 122 1 0  1 86 2 1  1 466 2 1  1 192 1 1
1 109 1 1  1 55 1 0  1 1 2 1  1 107 2 1
1 110 1 0  1 332 2 1
2 2569 0 1  2 2506 0 1  2 2409 0 1  2 2218 0 1
2 1857 0 0  2 1829 0 1  2 1562 0 1  2 1470 0 1
2 1363 0 1  2 1030 0 0  2 860 0 0  2 1258 0 0
2 2246 0 0  2 1870 0 0  2 1799 0 1  2 1709 0 0
2 1674 0 1  2 1568 0 1  2 1527 0 0  2 1324 0 1
2 957 0 1  2 932 0 0  2 847 0 1  2 848 0 1
2 1850 0 0  2 1843 0 0  2 1535 0 0  2 1447 0 0
2 1384 0 0  2 414 2 1  2 2204 2 0  2 1063 2 1
2 481 2 1  2 105 2 1  2 641 2 1  2 390 2 1
2 288 2 1  2 421 1 1  2 79 2 0  2 748 1 1
2 486 1 0  2 48 2 0  2 272 1 0  2 1074 2 1
2 381 1 0  2 10 2 1  2 53 2 0  2 80 2 0
2 35 2 0  2 248 1 1  2 704 2 0  2 211 1 1
2 219 1 1  2 606 1 1
3 2640 0 1  3 2430 0 1  3 2252 0 1  3 2140 0 1
3 2133 0 0  3 1238 0 1  3 1631 0 1  3 2024 0 0
3 1345 0 1  3 1136 0 1  3 845 0 0  3 422 1 0
3 162 2 1  3 84 1 0  3 100 1 1  3 2 2 1
3 47 1 1  3 242 1 1  3 456 1 1  3 268 1 0
3 318 2 0  3 32 1 1  3 467 1 0  3 47 1 1
3 390 1 1  3 183 2 0  3 105 2 1  3 115 1 0
3 164 2 0  3 93 1 0  3 120 1 0  3 80 2 1
3 677 2 1  3 64 1 0  3 168 2 0  3 74 2 0
3 16 2 0  3 157 1 0  3 625 1 0  3 48 1 0
3 273 1 1  3 63 2 1  3 76 1 1  3 113 1 0
3 363 2 1
;
run;
- FTIME: Survival time
- STATUS: Survival status (0=Alive, 1=Death, 2=Other Failure)
- DIAGNOSIS: Type of disease (1='ALL' 2='AML low risk' 3='AML high risk')
- Gender: Patient's gender (0=Female, 1=Male)
Example 1: Basic Macro Call
%newsurv(DATA=bmt, TIME=FTIME, CENS=STATUS, CEN_VL=0, SUMMARY=0);
This is a basic macro call that uses almost exclusively required parameters. The graph is plotted with censor markers and confidence bounds by default, and displays the number of patients, number of events, and median time-to-event within the graph. The following parameters are introduced:
- DATA: The dataset to be used by the macro. This dataset will not be modified by the macro.
- TIME: The variable within DATA that contains the patient's time-to-event values. Must be numeric.
- CENS: The variable within DATA that contains the patient's event status. Must be numeric.
- CEN_VL: The value of the CENS variable that indicates a censored observation. Must be a numeric value.
- SUMMARY: This parameter determines if the table summary is displayed along with the plot. 1 is Yes and 0 is No.
Example 2: Plotting Multiple Curves with a CLASS Parameter
%newsurv(DATA=bmt, TIME=FTIME, CENS=STATUS, CEN_VL=0, SUMMARY=0, CLASS=DIAGNOSIS, CLASSREF=ALL,CLASSORDER=1 3 2);
This example shows how to add a CLASS parameter to produce grouped survival curves. Doing so also adds hazard ratios and a p-value to the plot statistical summary table. The example shows how to use other parameters related to the CLASS variable. The following new parameters are introduced:
- CLASS: Variable used to group the plots. Can be character or numeric.
- CLASSREF: Level of the CLASS variable that is to be used as the reference for the hazard ratios. The value specified must be one of the formatted values of the CLASS variable. If not specified then the last value alphabetically is used.
- CLASSORDER: Reorders the CLASS levels within the plot. The values are sorted alphabetically by formatted values by default. They are reordered by entering a numbered list. In this example 1 3 2 is used which specifies the first, then third, then second alphabetical levels. Ordering by 1 2 3 would match the default.
Example 3: Modifying the Axes and Curves
%newsurv(DATA=bmt, TIME=FTIME, CENS=STATUS, CEN_VL=0, SUMMARY=0, CLASS=DIAGNOSIS, CLASSREF=ALL,CLASSORDER=1 3 2, COLOR=black red green, PATTERN=solid, LINESIZE=3pt, SYMBOLSIZE=10pt, XDIVISOR=30.44, XMAX=100, XINCREMENT=10, XLABEL=Months, YTYPE=PPT, YLABEL=Proportion Alive);
This example shows how to change the X and Y axis scale and labels as well as the attributes of the Kaplan-Meier curves. The x-axis is changed from days to months (this does not affect the original dataset) and the tick values are determined by XMAX and XINCREMENT. The y-axis is changed to proportion using the YTYPE parameter. The Kaplan-Meier curves can have their colors, patterns, thickness, and censor symbols modified. The following new parameters are all introduced:
- COLOR: Provides a list of colors that will be applied to the lines in the order of the CLASS variable. If only one color is provided then that color is used for all lines. Default is black.
- PATTERN: Provides a list of patterns that will be applied to the lines in the order of the CLASS variable. If only one pattern is provided then that pattern is used for all lines. Numbers between 1 and 42 can be used as well as certain keywords like SOLID. The default is AUTO, which makes all lines solid when colors are specified and different patterns when only one color is provided.
- LINESIZE: Provides the thickness of the graph lines. Must be a number followed by pt. Default is 1pt.
- SYMBOLSIZE: Provides the size of the censor symbols. Must be a number followed by pt. Default is 3pt.
- XDIVISOR: Converts the time variable to other units by dividing by a scalar value. Does not affect the original dataset.
- XMAX: Determines the maximum value of the x-axis based on final converted units. Computed automatically by default.
- XINCREMENT: Determines the tick value increment of the x-axis of final converted units. Computed automatically by default.
- XLABEL: Determines the label of the X-axis. Default is the time variable label.
- YTYPE: Determines if the Y-axis is in percentages or proportions. PPT selects proportions, as in this example. Default is percentages.
- YLABEL: Determines the label of the Y-axis. Default is either Percentage with Event or Proportion with Event.
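The XDIVISOR value of 30.44 used in this example is the average number of days per month (365.25 / 12 = 30.4375, rounded). The macro applies the conversion internally and leaves the original dataset untouched. The sketch below only illustrates the arithmetic, and the data set name BMT_MONTHS and the variable name MONTHS are made up for the illustration.

```sas
/* Convert follow-up time from days to months by hand.          */
/* 365.25 days per year / 12 months = 30.4375, rounded to 30.44 */
data bmt_months;
   set bmt;
   months = ftime / 30.44;
   label months = "Months";
run;
```

On this scale the longest follow-up in the example data (2640 days) is about 86.7 months, which is why XMAX=100 with XINCREMENT=10 gives a tidy axis.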
Example 4: Modifying the Statistics Table and Showing Event-free Rates
%newsurv(DATA=bmt, TIME=FTIME, CENS=STATUS, CEN_VL=0, SUMMARY=0, CLASS=DIAGNOSIS, CLASSREF=ALL,CLASSORDER=1 3 2, COLOR=black red green, PATTERN=solid, LINESIZE=3pt, SYMBOLSIZE=10pt, XDIVISOR=30.44, XMAX=100, XINCREMENT=10, XLABEL=Months, YTYPE=PPT, YLABEL=Proportion Alive, TIMELIST=20 40,TIMEDX=months,DISPLAY=legend timelist);
This example demonstrates changing which statistics are shown within the plot table and how to display Kaplan-Meier time-point event-free rates. One or more event-free rates can be specified, and they will be displayed vertically. The column that shows the time-point can be disabled when displaying only one time-point. The following new parameters are all introduced:
- DISPLAY: Controls which statistics are displayed within the plot summary table. Items are displayed in the same order that they are listed in the parameter. The default changes depending on what kind of plot is being displayed.
- TIMELIST: A list of one or more time-points, in the final X-axis units, at which to calculate time-point event-free rates. Can be a simple list of numbers (xx xx xx xx) or a list in loop format (xx to xx by xx).
- TIMEDX: This parameter will be used to display the units within the time-point estimates. For example, entering Months will add the text Months after each time-point number within the plot.
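To check the time-point estimates by hand, the same quantities can be requested from PROC LIFETEST directly. This is a sketch under assumptions: it converts time to months first so that the time-points match the final X-axis units, and the data set names BMT_MONTHS and TP_EST are made up for the illustration.

```sas
/* Put time on the same scale as the plot's X-axis (months) */
data bmt_months;
   set bmt;
   months = ftime / 30.44;
run;

/* TIMELIST= requests estimates at 20 and 40 months; REDUCEOUT keeps */
/* only those time-points in the OUTSURV= data set.                  */
proc lifetest data=bmt_months timelist=20 40 reduceout outsurv=tp_est;
   time months*status(0);
   strata diagnosis;
run;

proc print data=tp_est;
run;
```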
Example 5: Adding Patients-at-Risk
%newsurv(DATA=bmt, TIME=FTIME, CENS=STATUS, CEN_VL=0, SUMMARY=0, CLASS=DIAGNOSIS, CLASSREF=ALL,CLASSORDER=1 3 2, COLOR=black red green, PATTERN=solid, LINESIZE=3pt, SYMBOLSIZE=10pt, XDIVISOR=30.44, XMAX=100, XINCREMENT=10, XLABEL=Months, YTYPE=PPT, YLABEL=Proportion Alive, RISKLIST=0 to 100 by 10,RISKLOCATION=BOTTOM,RISKCOLOR=1);
This example gives a basic demonstration of adding the patients-at-risk counts to the bottom of the plot. There are many options to customize the location of the labels, headers, where the table is printed, and even what type of numbers are shown. The following new parameters are all introduced:
- RISKLIST: Enter list of one or more time-points in the final X-axis units to display the current patients-at-risk. Can be a simple list of numbers (xx xx xx) or a list in loop format (xx to xx by xx). Does not have to be the same as TIMELIST and does not have to match X-axis tick marks.
- RISKLOCATION: Determines where the patients-at-risk will be drawn. Default is BOTTOM, which displays the numbers below the X-axis. Specifying INSIDE would put the numbers under the curve above the X-axis.
- RISKCOLOR: Determines if the patients-at-risk numbers are colored to match the Kaplan-Meier curves. Default is 0 (no). Specifying 1 will color the numbers, and can potentially make matching the numbers to the curves visually easier.
Other useful options:
- PARHEADER: Determines the header used above the patients-at-risk table. If left blank then will not be drawn.
- PARHEADERALIGN: Determines where the header will be drawn. Options are LEFT, CENTER, RIGHT, and LABELS. Specifying LABELS will place the header above the labels to the left of the numbers.
- RISKLABELLOCATION: Determines where the labels for the numbers are drawn. Options are LEFT, ABOVE and null. Specifying LEFT draws the labels to the left. Specifying ABOVE draws the labels above the numbers in their own row which is useful for long labels. Specifying nothing will cause the labels to not be drawn. This can be useful when pairing with RISKCOLOR.
- PARDISPLAY (New): Determines what numbers are shown in the patients-at-risk table. One or more items can be listed. Options are PAR (Patients-at-risk), NCENS (Number of cumulative censors), and NEVENTS (Number of cumulative events). Two combinations, PAR_NCENS and PAR_NEVENTS are also allowed.
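The patients-at-risk count at a time-point is the number of patients whose follow-up time is at least that long. A minimal sketch of the counting, not the macro's own code, using the RISKLIST values from this example is shown below. The data set names RISK_TIMES and AT_RISK are made up for the illustration.

```sas
/* Time-points in months, matching RISKLIST=0 to 100 by 10 */
data risk_times;
   do t = 0 to 100 by 10;
      output;
   end;
run;

/* Count patients still at risk at each time-point within each group.  */
/* Time-points at which no patient remains at risk produce no row      */
/* here, whereas a risk table would show them as 0.                    */
proc sql;
   create table at_risk as
   select b.diagnosis, r.t, count(*) as n_at_risk
   from risk_times as r, bmt as b
   where b.ftime / 30.44 >= r.t
   group by b.diagnosis, r.t
   order by b.diagnosis, r.t;
quit;
```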
Example 6: Cumulative Incidence
%newsurv(DATA=bmt, TIME=FTIME, CENS=STATUS, CEN_VL=0, SUMMARY=0, CLASS=DIAGNOSIS, CLASSREF=ALL,CLASSORDER=1 3 2, COLOR=black red green, PATTERN=solid, LINESIZE=3pt, SYMBOLSIZE=10pt, METHOD=CIF, EV_VL=1);
This example shows how to plot cumulative incidence instead of Kaplan-Meier curves. The following new parameters are all introduced:
- METHOD: Determines which method is used to generate the curves. Options are KM (Kaplan-Meier) or CIF (Cumulative Incidence Function).
- EV_VL: Determines which value of the status variable is the event of interest. Non-event and non-censor values are considered other events.
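For comparison, recent SAS/STAT releases can estimate the cumulative incidence function directly in PROC LIFETEST. The sketch below is an alternative way to cross-check results, not what the macro does internally (the macro uses the DATA step and PROC FCMP, as described earlier). It assumes a SAS/STAT release that supports the EVENTCODE= option, and the output data set name CIF_EST is made up.

```sas
/* STATUS=0 is censored, STATUS=1 is the event of interest, and any   */
/* other value (here 2) is treated as a competing event.              */
proc lifetest data=bmt outcif=cif_est;
   time ftime*status(0) / eventcode=1;
   strata diagnosis;
run;
```

With a STRATA variable present, the procedure also reports Gray's test for equality of the cumulative incidence functions across groups.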
Example 7: Multiple Plots
%newsurv(DATA=bmt, TIME=FTIME, CENS=STATUS, CEN_VL=0, SUMMARY=0, CLASS=DIAGNOSIS, CLASSREF=ALL,CLASSORDER=1 3 2, COLOR=black red green, PATTERN=solid, LINESIZE=3pt, SYMBOLSIZE=10pt, METHOD=KM|CIF, EV_VL=1, SREVERSE=1|0, NMODELS=2, ROWS=2, AUTOALIGN=BOTTOMRIGHT|TOPRIGHT);
This example shows how to produce multiple plots in a lattice diagram. Any option that differs between plots uses the | (pipe) delimiter to separate the setting for each plot. Any option without a | delimiter keeps the same setting across all models. This example also demonstrates the difference between plotting CIF and 1-Survival. The following new parameters are all introduced:
- SREVERSE: Determines if Survival or 1-Survival is plotted. 1-Survival is not the same as CIF. Default is 0 (Survival) and 1 indicates 1-Survival.
- NMODELS: Determines how many models will be run. Default is 1.
- ROWS: Determines how many rows will be in the graph lattice. Default is 1.
- AUTOALIGN: Determines where the statistical summary table is shown within the plot. Default is TOPRIGHT. Can be anchored to any of the 9 primary points of the plot (TOPRIGHT, TOP, TOPLEFT, LEFT, CENTER, RIGHT, BOTTOMRIGHT
Example 8: Reference Lines
%newsurv(DATA=bmt, TIME=FTIME, CENS=STATUS, CEN_VL=0, SUMMARY=0, CLASS=DIAGNOSIS, CLASSREF=ALL,CLASSORDER=1 3 2, COLOR=black red green, PATTERN=solid, LINESIZE=3pt, SYMBOLSIZE=10pt, REFLINES=medians,REFLINEAXIS=both);
This example shows how to add reference lines. Reference lines can highlight two different items: medians and time-point estimates. The reference lines can be dropped to either axis. The following new parameters are all introduced:
- REFLINES: Determines which statistic is used for the reference lines. Either MEDIANS or TIMEPOINTS are allowed. If TIMEPOINTS is specified then all time-points in TIMELIST are shown.
- REFLINEAXIS: Determines which axis the reference lines are drawn to. Options are X, Y or Both.
- REFLINEMETHOD: Determines if reference lines are drawn from the KM curves or across the whole plot. Default is DROP. Options are DROP and FULL.
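The values that median reference lines are drawn at can be inspected by capturing the quartile estimates from PROC LIFETEST. This is a sketch for checking results by hand. The output data set name KM_QUARTILES is made up for the illustration.

```sas
/* Capture the quartile table, which includes the median (50th percentile) */
ods output Quartiles=km_quartiles;
proc lifetest data=bmt;
   time ftime*status(0);
   strata diagnosis;
run;

/* The estimate is missing for any group whose curve never drops to 50% */
proc print data=km_quartiles;
   where percent = 50;
run;
```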
Example 9: Confidence Intervals
%newsurv(DATA=bmt, TIME=FTIME, CENS=STATUS, CEN_VL=0, SUMMARY=0, CLASS=DIAGNOSIS, CLASSREF=ALL,CLASSORDER=1 3 2, COLOR=black red green, PATTERN=solid, LINESIZE=3pt, SYMBOLSIZE=10pt, PLOTCI=1);
This example shows how to add confidence intervals. Confidence intervals are automatically added for graphs with no CLASS variable. Confidence intervals can be added as a filled background, as lines, or both. By default only a filled background is used, similar to the LIFETEST procedure. The following new parameters are all introduced:
- PLOTCI: Determines if confidence intervals are drawn. Default is 2, which draws confidence intervals if no CLASS variable is provided and omits them if a CLASS variable is provided. Specify 1 to force confidence intervals or 0 to disable them.
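The LIFETEST default display mentioned above can be reproduced for comparison with a short call. This is a sketch for reference only and is independent of the macro.

```sas
ods graphics on;

/* CL adds pointwise confidence limits as a filled band;          */
/* CONFTYPE=LOGLOG is the procedure's default transformation.     */
proc lifetest data=bmt plots=survival(cl) conftype=loglog;
   time ftime*status(0);
run;
```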
Expansion Macros
Multivariate models are often necessary in survival analysis in order to account for confounding factors. Adjusting for other factors can dramatically change outcomes such as the hazard ratio. When outcomes change dramatically in a multivariate model, it can be inappropriate to plot the unadjusted curves. There are numerous methods available for creating adjusted survival curves, and no single method is correct for all situations. Thus there was a need for a series of macros that could create the high-quality plots of NEWSURV, but with the appropriate methodologies for adjusted survival curves. These macros were designed to adjust survival curves using either the direct adjustment or inverse weights methodologies. A third macro, NEWSURV_DATA, allows users to pre-calculate their own survival curves and then plot them with the customization of NEWSURV.
Example 10: Adjusted Survival Curves
%newsurv_adj_invwts(DATA=bmt, TIME=FTIME, CENS=STATUS, CEN_VL=0, SUMMARY=0, CLASS=DIAGNOSIS, CLASSREF=ALL,CLASSORDER=1 3 2, COLOR=black red green,LINESIZE=3pt, SYMBOLSIZE=10pt, CLASSCOV=gender);
This example shows how to make adjusted survival curves using the inverse weights method. The macro parameters are largely the same as those of NEWSURV. The following new parameters are all introduced:
- CLASSCOV: Specifies discrete covariates to adjust the survival curves, hazard ratios, and p-values by. This is also available in NEWSURV, but will not adjust the actual curves.
- CONTCOV: Specifies continuous covariates to adjust the survival curves, hazard ratios and p-values by. This is also available in NEWSURV, but will not adjust the actual curves.
- PLOT_UNADJUST: Determines if the unadjusted survival curves are drawn. Default is 1 (Yes).
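For readers who want to see what the other methodology mentioned above, direct adjustment, looks like outside the macros, PROC PHREG offers it through the BASELINE statement. This sketch illustrates direct adjustment only. It is not the inverse weights method used in Example 10 and not the macros' internal code. It assumes a SAS/STAT release that supports the DIRADJ option, and the names ADJ_CURVES and ADJ_SURV are made up.

```sas
/* Fit a Cox model with the grouping variable and the adjusting     */
/* covariate, then average the predicted survival curves over the   */
/* covariate distribution of the whole data set, by DIAGNOSIS.      */
proc phreg data=bmt;
   class diagnosis (ref='ALL') gender / param=ref;
   model ftime*status(0) = diagnosis gender;
   baseline covariates=bmt out=adj_curves survival=adj_surv
            / diradj group=diagnosis;
run;
```

The resulting data set holds one adjusted curve per DIAGNOSIS level, which is the kind of pre-calculated input that NEWSURV_DATA is described as plotting.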
Personal E-mail: email@example.com