Data Quality for Analytics

Navigation: Overview --- Table of Contents --- Part I - Data Quality for Analytics Defined --- Part II - Profiling and Improvement --- Part III - Simulation Studies --- Download Page

  • Three reasons to read this book
    • Analytics places additional requirements on data quality
    • Analytics contributes methods for better data quality
    • Simulation studies show the consequences of poor data quality on model quality


About

  • Gerhard Svolba worked on his new book "Data Quality for Analytics Using SAS" from 2009 to 2012.
  • The book was published by SAS Press in May 2012. It is also available on amazon.com.
  • Watch my interviews on data preparation and data quality on YouTube.
  • Follow the links at the top of this page for a detailed description of the different parts of the book.
  • Visit Gerhard's "Analytic and Data Preparation" blog for current updates on data preparation, data quality, and analytic topics.
  • Comments and feedback can be sent to the author.

Links and Downloads

  • My paper on "Missing Values" has been published in Analytics Magazine.
  • My paper on "Data Quality Criteria for Analytics" has been published in IT-Briefcase.
  • Picture blog of my books travelling around the world

Hot News

  • Presentations on the new data quality book scheduled so far:
    • A2012, Las Vegas, October 8-9: "The Consequences of Poor Data Quality on Model Accuracy" (confirmed)
    • Predictive Analytics Conference, Vienna, September 24-25: "Data Quality WITH and FOR Analytics" (working title)
    • Statistische Woche, TU-Wien, Vienna, September 18-21: "Analytisches Datenqualitäts Profiling mit SAS und JMP" (submitted)
    • SAS Forum Denmark, Copenhagen, October 3-4: "Data Quality WITH and FOR Analytics" (working title)

Content and Rationale of the Book (Overview)

  • Data quality is getting a lot of attention in the market. However, most data quality initiatives, publications, and papers focus on classical topics such as the elimination of duplicates, standardization of data, lists of values, value ranges, and plausibility checks.
  • This is not to say that these topics are unimportant for analytics; on the contrary, they form the foundation of data for analysis. However, many aspects of data are specific to analytics, and these aspects determine whether data are suitable for analysis.
  • In many cases analytics places higher requirements on data quality, but it also offers more capabilities and options to measure and improve data quality, such as the calculation of representative imputation values for missing values. Thus, there is a symbiosis between the analytical requirements and the analytical capabilities in the data quality context.
  • Analytics is also uniquely able to close the loop on data quality because it reveals anomalies in the data that other applications often miss. The SAS® System, with its range of offerings, is also very well suited to analyzing and improving data quality.
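
The imputation idea mentioned above can be illustrated with a minimal sketch. It is written in Python rather than SAS, and it is not taken from the book: the function name `impute_median`, the choice of the median as the "representative" value, and the sample data are all assumptions made for illustration.

```python
from statistics import median


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values.

    Hypothetical helper for illustration; the median is only one of
    several possible choices for a representative imputation value.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Made-up example data: customer ages with two missing entries.
ages = [34, None, 29, 41, None, 38]
imputed = impute_median(ages)
```

The median is used here because it is less sensitive to outliers than the mean; a real analysis might instead impute per segment or with a model-based method.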

The Data Quality Process

[Figure: DQFA Kreislauf.jpg, the data quality cycle]

  • The logical order of data quality for analytics, as presented here, is to first define the requirements and criteria for data quality for analytics.
  • Based on these definitions, the book investigates how the data quality status can be profiled, how a picture of the criteria important for advanced analytic methods and of the data quality status of the data can be obtained, and how data quality can be improved with analytical methods.
  • Because not all data quality problems can be corrected or solved (or the effort is not justifiable), the book also deals with the consequences of poor data quality. Based on simulation studies, it gives general answers about the usability of certain analytical methods and the effect on model accuracy when data quality criteria are not fulfilled.
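
A simulation study of this kind can be sketched in a few lines. The example below is a toy setup in Python, not the design used in the book: the linear data-generating process, the mean imputation, the missing rates, and the function names `fit_line` and `simulate_rmse` are all assumptions chosen for illustration. It removes a random share of the input values, fills them with the observed mean, fits a simple regression, and measures the prediction error against the true target values.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit for a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def simulate_rmse(missing_rate, n=500, seed=1):
    """Return the RMSE of a model fitted and scored on degraded inputs.

    missing_rate must be below 1 so that some values remain observed.
    """
    rng = random.Random(seed)
    # Assumed data-generating process: y = 2x + 1 + noise.
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]

    # Degrade data quality: drop a random share of x, then mean-impute.
    missing = [rng.random() < missing_rate for _ in range(n)]
    fill = mean(x for x, m in zip(xs, missing) if not m)
    xs_degraded = [fill if m else x for x, m in zip(xs, missing)]

    # Fit and score on the degraded inputs, compare with the true targets.
    slope, intercept = fit_line(xs_degraded, ys)
    squared_errors = [
        (y - (slope * x + intercept)) ** 2 for x, y in zip(xs_degraded, ys)
    ]
    return mean(squared_errors) ** 0.5


rmse_by_rate = {rate: simulate_rmse(rate) for rate in (0.0, 0.2, 0.4)}
```

Because the same seed is used for every rate, the runs differ only in how many input values are removed, so the rise in error can be attributed to the degraded data quality rather than to sampling variation.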