Applying Data Science - Business Case Studies Using SAS

From sasCommunity

This is the new SAS Press book by Gerhard Svolba. It contains 8 case studies in 28 practical chapters with business explanations, methodological considerations, and lots of SAS code.

Why you want to read this book:

  • This book reflects the author's enthusiasm for using analytical and data science methods to solve business questions and for implementing the solutions in SAS.
  • It shows you the benefits of analytics, how to gain more insight into your data, and how to make better decisions. In eight entertaining and real-world case studies, Svolba combines data science and advanced analytics with business questions, illustrating them with data and SAS code.
  • Written for business analysts, statisticians, data miners, data scientists, and SAS programmers, Applying Data Science bridges the gap between high-level, business-focused books that skimp on the details and technical books that only show SAS code with no business context.
  • This book is written for a variety of different persona groups and profiles.
    • Business Analysts and Business Experts: Businesspeople can review the examples and see what can be achieved with analytical methods. They get insight into the power of analytics and the additional findings that can be generated by these methods. They might not study the SAS implementation and the code in much detail. They would rather hand over the implementation examples to their data scientists to give them a quick start in applying the methods.
    • Statisticians, Data Miners, Data Scientists, and Quantitative Experts: This group of people might be interested in seeing how analytical methods can be applied to real-world business questions. They learn how analytical methods that are established in a certain industry might be applied to other areas. They see practical situations and constraints that they can expect to encounter when they apply data science methods.
    • SAS Programmers: The book contains a lot of SAS code, including SAS macros, SAS DATA step code for data preparation, SAS analytics procedures, and SAS graph procedures. In this code, SAS programmers can find new ways to solve certain problems in SAS and transfer the solutions in these examples to their day-to-day problems.

--> Navigate to the Download Section of the book.

Other books by the author: Data Preparation for Analytics and Data Quality for Analytics. See also: Presentations

Find here an overview of the case studies and the chapters. More details to follow soon.

News and Important Links

Blog on how to individually simulate time series data.

Purchase options: Amazon and SAS Press Website.

Statement from Wolfgang Hauner, Chief Data Officer, Munich Re

The book perfectly combines data science methods with relevant business case studies. It provides insights into how applications of data science can build the basis for better business decisions. The book shows business examples that can directly be transferred to many practical situations. Thus it can be read by different persona groups: business experts who look for analysis ideas, data scientists who look for insight into how to apply different methods, and SAS programmers who find a lot of SAS code. Wolfgang Hauner, Chief Data Officer, Munich Re

Other Statements

  • I recently purchased your book Applying Data Science. Great book! Anxiously awaiting to download the code files so I can continue to work thru the example programs. (Customer in the Health Care Sector, US)
  • I purchased your book (“Applying Data Science”) the other day, and I am thrilled with it. I am so glad you are using EM. ... and have been using proc lifetest (Bingo, your first chapters are relevant!) Your chapters add richness and confirmation for me. I am also forecasting the number of ... per month. So, more chapters are relevant! Thanks so much for your work! (Customer in the Public Sector, US)
  • (Reply in a forum on survival data mining) ... You and/or your customer may also want to check out the first three chapters of the following new SAS publication: Applying Data Science: Business Case Studies Using SAS, which focus on Survival Analysis (quite excellently, I might add). (Consulting Company, US)


Visit the SAS Bookstore for more details.

Analytical Methods and SAS Procedures in this book

  • Kaplan-Meier Estimates – Cox Proportional Hazards Regression – Survival Data Mining – Smoothing of Longitudinal Data – Multivariate Adaptive Regression Splines – Automatic Breakpoint Detection – Automatic Detection of Outliers – ARIMA Models – Linear Regression – Poisson Regression – Quantile Regression – New Product Forecasting – Similarity Search – Imputation of Missing Values – Association Analysis – Benford’s Law – Chi2 Independency Test – Monte Carlo Simulation – Mathematical Programming – Data Matrices – Simulation of Complex Processes
  • LIFETEST - PHREG - ARIMA - X11 - X13 - ADAPTIVEREG - VARCLUS - GLM - TREE - HPGENSELECT - GLMSELECT - QUANTSELECT - QUANTREG - HPQUANTSELECT - IML - SGPLOT - SGPANEL - SGTILE - FREQ - MEANS - TRANSPOSE - SQL

Overview of the Case Studies in the Book

Case Study 1 – Performing Headcount Survival Analysis for Employee Retention

This case study uses employee retention data to illustrate how analytical methods allow you to draw conclusions about the average length of time intervals, even if most of the endpoints have not yet been observed. Survival analysis methods like Kaplan-Meier estimates and Cox Proportional Hazards regression are used to solve the business questions. The case study contains the following chapters:

  • 1 Using Survival Analysis Methods to Analyze Employee Retention Time
  • 2 Analyzing the Effect of Influential Factors on Employee Retention Time
  • 3 Performing Survival Data Mining - The Data Mining Approach for Survival Analysis
  • 4 Visualizing Employee Retention Data
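
The idea behind the Kaplan-Meier estimate mentioned above can be illustrated with a minimal sketch. The book itself works in SAS (for example with PROC LIFETEST); the Python code below is not taken from the book, and the function name `kaplan_meier` and the tenure data are hypothetical. It shows how employees who are still employed (censored cases) stay in the risk set until their current tenure and still contribute to the estimate.

```python
from collections import Counter

def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] for each time with at least one event.

    durations: tenure of each employee (time until leaving or until today)
    observed:  True if the employee left (event), False if still employed (censored)
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # events and censored cases both leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the share of the risk set that stays
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve

# Hypothetical tenure data in months; False means the employee is still employed
tenure = [3, 5, 5, 8, 12, 12, 15, 20]
left = [True, True, False, True, False, True, False, False]
km_curve = kaplan_meier(tenure, left)
for month, prob in km_curve:
    print(f"month {month:>2}: estimated retention probability {prob:.3f}")
```

Only four of the eight hypothetical employees have left, yet the curve can still be estimated because the censored cases reduce the risk set without counting as events.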

Case Study 2 – Detecting Structural Changes and Outliers in Longitudinal Data

This case study shows how analytical methods can be used to automatically detect events and changes in the course of longitudinal data. Example time series data with the number of airline passengers and data from a long-term clinical trial are used to illustrate how data can be smoothed and how breakpoints and outliers can be detected. Analytical methods like multivariate adaptive regression splines, ARIMA models, and moving averages are used to solve the business questions in the following chapters:

  • 5 Analyzing the Course of Longitudinal Data
  • 6 Detecting Structural Changes in Longitudinal Data
  • 7 Detecting Outliers and Level Shifts in Longitudinal Data
  • 8 Results from a Simulation Study with Longitudinal Data
  • 9 Analyzing the Variability of Longitudinal Data

Case Study 3 – Explaining Forecast Errors and Deviations

This case study uses regression methods to identify influential factors that have an impact on the forecast accuracy of time series forecasting models. The forecast error usually differs between factors like product group, forecast horizon, and the analytical method that was used to create the forecast. Analytical methods allow you to identify and isolate these effects to provide more insight into the generation of forecasts. This case study also deals with the important question of whether demand planners really improve forecast accuracy with their manual overrides of the statistical forecast. Linear regression and quantile regression are used to analyze these questions in the following chapters:

  • 10 Investigating Forecast Errors with Descriptive Statistics
  • 11 Investigating Forecast Errors with General Linear Models
  • 12 Interpreting the Coefficients of Categorical Variables in Regression Models
  • 13 Using Quantile Regression to Get More than the Average Picture
  • 14 Analyzing the Effect of Manual Overrides in Forecasting

Case Study 4 – Forecasting the Demand for New Products

This case study shows how demand forecasts can be generated for products that have no or only a short time history of known demand. Methods like Poisson regression or similarity search are used to solve this business question in the following chapters:

  • 15 Performing Demand Forecasting for New Products
  • 16 Using Poisson Regression to Forecast the Demand for New Products
  • 17 Using Similarity Search to Forecast the Demand for New Products

Case Study 5 – Checking the Alignment with Predefined Pattern

A frequent business question is to verify whether different entities or counterparts show the expected behavior or adhere to predefined patterns or processes. For the different interaction strategies you want, for example, to know which customers show a behavior that is far from what you expected. In financial accounting, compliance with Benford’s law is often investigated. Methods like the Chi2 independency test are used to verify these assumptions in the following chapters:

  • 18 Checking Accounting Data For the Benford’s Law
  • 19 Checking the Benford’s Law for Multiple Accounts
  • 20 Checking Different Pattern in the Data
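
As a rough illustration of a Benford check, the following Python sketch (not the book's SAS code) compares the observed first-digit counts of a list of amounts with the counts expected under Benford's law, where digit d has probability log10(1 + 1/d). The function names and the sample data are hypothetical, and the statistic shown is a plain chi-square comparison of observed and expected counts, which is an assumption about how such a check could be set up.

```python
import math

def first_digit(value):
    """Leading non-zero digit of a non-zero number."""
    # Scientific notation puts the leading digit first, e.g. '4.2...e-04'
    return int(f"{abs(value):.15e}"[0])

def benford_expected():
    """Benford probabilities for the first digits 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def benford_chi2(values):
    """Return (observed first-digit counts, chi-square statistic)."""
    values = [v for v in values if v != 0]  # zero has no leading digit
    n = len(values)
    observed = {d: 0 for d in range(1, 10)}
    for v in values:
        observed[first_digit(v)] += 1
    chi2 = sum(
        (observed[d] - n * p) ** 2 / (n * p)
        for d, p in benford_expected().items()
    )
    return observed, chi2

# Hypothetical "amounts": powers of 2 are known to follow Benford's law closely
amounts = [2 ** k for k in range(1, 101)]
observed, chi2 = benford_chi2(amounts)
# With 9 digit classes there are 8 degrees of freedom; the 5% critical value is 15.51
print(observed, round(chi2, 2), "suspicious" if chi2 > 15.51 else "consistent with Benford")
```

For real accounting data, a large statistic only flags an account for closer inspection; it does not prove manipulation.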

Case Study 6 – Listening to Your Data – Discover Relationships with Unsupervised Analysis Methods

This case study shows how you can get answers from your data, even if you do not ask every question in detail. You see which features and properties in the data are closely related. Unsupervised machine learning methods like association analysis and variable clustering are used in the following chapters:

  • 21 Finding Relationships in Your Analysis Data with Association Analysis
  • 22 Using Variable Clustering to Detect Relationships in Your Data
  • 23 Investigating of Clinical Trial Data in an Explanatory Way
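
The kind of relationship that association analysis surfaces can be sketched with the three standard rule measures support, confidence, and lift. The Python code below is only an illustration with hypothetical data and a hypothetical function name; it is not taken from the book, which implements its analyses in SAS.

```python
def rule_metrics(transactions, lhs, rhs):
    """Support, confidence, and lift of the rule lhs => rhs.

    transactions: list of sets of items
    lhs, rhs:     sets of items; lhs is assumed to occur at least once
    """
    n = len(transactions)
    n_lhs = sum(1 for t in transactions if lhs <= t)
    n_rhs = sum(1 for t in transactions if rhs <= t)
    n_both = sum(1 for t in transactions if (lhs | rhs) <= t)
    support = n_both / n               # share of transactions containing both sides
    confidence = n_both / n_lhs        # share of lhs transactions that also contain rhs
    lift = confidence / (n_rhs / n)    # confidence relative to the base rate of rhs
    return support, confidence, lift

# Hypothetical market baskets
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"butter", "milk"},
]
support, confidence, lift = rule_metrics(baskets, {"bread"}, {"butter"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 indicates that the two sides of the rule occur together more often than their individual frequencies would suggest.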

Case Study 7 – Using Monte Carlo Simulations to Understand the Outcome Distribution

This case study shows how simulation studies can be used to get a more comprehensive picture of the outcome distribution. The case study uses the sales project pipeline of a sales manager and answers the question of how likely it is that the sales manager gets fired because he misses a certain minimum target. Methods like Monte Carlo simulations are used in these chapters, and an approach using matrix calculations with SAS/IML software is shown.

  • 24 Calculating the Outcomes of All Possible Scenarios
  • 25 Using Monte Carlo Methods to Simulate the Distribution of the Outcomes
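
The two approaches named in these chapter titles, enumerating all scenarios and simulating them, can be contrasted in a small Python sketch. It is not the book's SAS/IML implementation; the pipeline values, win probabilities, target, and function names are hypothetical, and the projects are assumed to be won or lost independently of each other.

```python
import itertools
import random

# Hypothetical pipeline: (project value, probability of winning the deal)
pipeline = [(100, 0.9), (250, 0.5), (400, 0.3), (150, 0.7), (300, 0.2)]
target = 500  # hypothetical minimum sales target

def exact_miss_probability(projects, target):
    """Enumerate all 2**n win/lose scenarios and add up the probability
    of those whose total sales stay below the target."""
    p_miss = 0.0
    for outcome in itertools.product([0, 1], repeat=len(projects)):
        prob, total = 1.0, 0
        for won, (value, p) in zip(outcome, projects):
            prob *= p if won else 1 - p
            total += value * won
        if total < target:
            p_miss += prob
    return p_miss

def simulated_miss_probability(projects, target, n_runs=100_000, seed=1):
    """Estimate the same probability by drawing random win/lose outcomes."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(n_runs):
        total = sum(value for value, p in projects if rng.random() < p)
        if total < target:
            misses += 1
    return misses / n_runs

exact = exact_miss_probability(pipeline, target)
simulated = simulated_miss_probability(pipeline, target)
print(f"exact: {exact:.4f}  simulated: {simulated:.4f}")
```

Full enumeration is exact but doubles in cost with every additional project, whereas the simulation scales to long pipelines at the price of some sampling error.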

Case Study 8 – Studying Complex Systems – Simulating the Monopoly® Board Game

It is often necessary to learn more about the behavior of complex systems and the relationships between their components. This case study shows how Monte Carlo simulations can be used to simulate the Monopoly board game. The simulations include an analysis of the visit frequency on different fields of the board game as well as a profitability analysis of different properties. Monte Carlo simulations with a SAS DATA step are shown in this case study.

  • 26 Creating Basic Framework to Simulate the Visit Frequency on the Fields of the Monopoly® Board Game
  • 27 Enhancing the Simulation Framework to Consider Special Rules
  • 28 Simulating the Profitability of the Property Fields of the Monopoly® Board Game
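
A very basic visit-frequency simulation in the spirit of the first of these chapters might look like the following Python sketch. The book uses a SAS DATA step; this code is not from the book and the function name is hypothetical. It assumes the standard 40-field board with Go on field 0, Jail on field 10, and Go to Jail on field 30, and it deliberately ignores Chance and Community Chest cards as well as the rules for rolling doubles.

```python
import random

N_FIELDS = 40
GO_TO_JAIL, JAIL = 30, 10

def simulate_visits(n_rolls=200_000, seed=1):
    """Relative visit frequency per field for a single token moved by two dice."""
    rng = random.Random(seed)
    visits = [0] * N_FIELDS
    position = 0
    for _ in range(n_rolls):
        roll = rng.randint(1, 6) + rng.randint(1, 6)
        position = (position + roll) % N_FIELDS
        if position == GO_TO_JAIL:
            position = JAIL  # the token is sent to jail and counted there
        visits[position] += 1
    return [v / n_rolls for v in visits]

freq = simulate_visits()
top5 = sorted(range(N_FIELDS), key=lambda f: freq[f], reverse=True)[:5]
for field in top5:
    print(f"field {field:>2}: {freq[field]:.4f}")
```

Even this stripped-down version shows that the fields are not visited equally often, which is the starting point for a profitability analysis of the property fields.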
