

Gerhard's Blog


DQFA cannot live Without books.jpg

DQFA Gerhard foto.jpg DQFA Desk.jpg DQFA Draft Cover01.jpg


General

Gerhard's BLOG

  • Monday, July 7th, 2014 - I have started working on my third book!
    • Working Title: Business Analyses with SAS
    • Case Studies for Data Analysis with SAS - Business Context, Rationale, SAS Code Examples.
    • Color? Most likely RED  :-) BAWS NewBook.JPG


DPDQ A2014.JPG A2014 in Frankfurt : Pictures and Analytical Data Quality Profiling with SAS Visual Analytics

DPDQ NewYork.JPG New York and SAS Campus Cary, NC : Pictures and Interview in the SAS R&D Blog (coming soon)


  • Wednesday, April 16th, 2014 A2014 in Frankfurt
    • The SAS Analytic Series Conference A2014 takes place in Frankfurt on June 4-5.
    • Prior to the conference on June 2-3 SAS offers pre-conference trainings, where my training "Building Analytic Data Marts" is included as well.
    • Find more details here.
  • Friday, January 17th, 2014 Papers on Data Quality Criteria are available:

AnalyticsMagazine201401.jpg My paper on "Missing Values" has been published in the Analytics Magazine.

    • My paper on "Data Quality Criteria for Analytics" has been published in the IT-Briefcase
  • Tuesday, November 5th, 2013 New Blog Entries and new pictures
    • New Blog Entry, discussing why my Aunt Susanne and her friends are giving us a hard time in statistical analysis: English and German

DPDQ Liechtenstein.JPG Liechtenstein and Mannheim : Pictures and Missing You! Presentation in German

DPDQ Ottakring.jpg New kids on the shelf : Pictures and Book presentation in the SAS Club Austria and Austria -- SAS Club

DPDQ GrandCanyon.JPG At the Grand Canyon Plateau : Pictures and A2012 presentation

DPDQ Florida.JPG Driving around Florida : Pictures

DPDQ Sweden.JPG The Nordic Tour 1 - Sweden : Pictures and Data Quality and the Consequences presentation

  • Thursday, October 3rd, 2013 Pictures of 'Data Preparation and Data Quality on the Road' available
    • A picture collection of my books travelling around the world can be found here; there is also a link at sascommunity.org
  • Monday, June 24th, 2013 New Downloads Available
    • After returning from conferences in Frankfurt, Heidelberg and London, there are new downloads available:
    • A2013 in London, June 19th -- PDF Version of the presentation

DPDQ London.jpg Two Englishmen at A2013 : Pictures and PDF Version (English) of the Missing You! presentation and - A2013 interview on youtube

    • Watch my A2013 interview on YouTube.
    • Data Mining Anwendertag 2013 in Heidelberg -- Download
    • JMP Explorer Seminar Frankfurt, June 2013 -- Download
  • Thursday, May 9th, 2013 YouTube videos are online
    • The SAS Video Team recorded 6 short interviews with me on data preparation and data quality for analytics. You can find the YouTube list here.

DPDQ Chicago.JPG US Trip 2013, Cary NC and Chicago : Pictures and Youtube Interview on data preparation and data quality

  • Thursday, April 4th, 2013 Thoughts on Missing Values

DQFA Tile Chart Missing Values.png

DPDQ Ulm.jpg KSFE 2013 in Ulm, Germany : Pictures and Blog

  • Wednesday, October 10th, 2012 Download section updates
    • I took time to update the download section for my two books, so there is some news:
      • For the data quality book there is a "Changes and Enhancements Document" and some updates to the programs available. See both under this link
      • For the data preparation book there is an update to the MAKEWIDE and MAKELONG macros. They now allow transposing data that have no BY variable. See here.
  • Friday, September 28th, 2012 Autumn conferences
    • This autumn I am very busy with conference engagements. So far I have presented the ideas of my new book at Statistische Woche at the Technical University of Vienna and at the Predictive Analytics Conference in Vienna.
    • Next week I will fly to Copenhagen to speak at SAS Forum Denmark.
    • And the week after I look forward to meeting you in Las Vegas at A2012.
  • Friday, August 3rd, 2012 4 ways to cure yourself after writing a book
    • I wrote some thoughts for the SAS Press blog The SAS Bookshelf on what you should do after you have finished writing a book. Hope you enjoy!
  • Tuesday, June 19th, 2012 After my first e-chat
    • Yesterday, I ran an e-chat at AllAnalytics.com on my new book. A lot of people logged in and it was a new experience for me to get so many questions at the same time. I enjoyed the conversation with you all and took away some thoughts.
  • Sunday, June 17th, 2012 Returned from Cologne
    • Hi, I just returned from A2012 in Cologne. It was a very good experience to give a talk about my new data quality book in front of a crowded room. I enjoyed the talk, and I hope you in the audience did too. Thank you very much for your comments and questions afterwards and in the break; it is the conversation with you that keeps these knowledge projects and the ideas moving. Download here.
  • Tuesday, June 12th, 2012 Cologne is waiting
    • Hi, I look forward to seeing you in Cologne at A2012; more than 400 people have registered.
    • The link for the AllAnalytics.com e-chat is here. I look forward to chatting with you.
  • Wednesday, June 6th 2012 E-Chat and Preparation Tasks
    • I will run an e-chat about my new book "Data Quality for Analytics" on allanalytics.com on June 18th, 11 a.m. ET. I look forward to "e"-chatting with you there.
    • In the meantime I am in the final preparation work for my A2012 presentation in Cologne and the editing of the final manuscript of an article on "Data Quality for Analytics – and the consequences if it is not as good as you thought" for the Analytics Magazine this summer.
    • As one of our customers, Europcar, cannot make it to Cologne next week, I agreed to jump in for their presentation and also give the talk "Yield Management - Europcar", which shows how time series forecasting and optimization are used to better run the car rental business. So there is a lot to do until next week, but I definitely enjoy these interesting tasks.
  • Wednesday, May 23rd 2012 Data Quality Book making holidays
    • After 2 ½ years of work my book deserved a holiday, so I decided to be a nice author and take it out to beautiful places on the coasts of Florida.
  • Tuesday, May 22nd 2012 Downloads for Data Quality for Analytics are available
  • Saturday, May 19th 2012 Invitation to speak at A2012 in Cologne
    • I just got the request to speak at A2012 in Cologne, and I really look forward to seeing you there. The topic of my presentation will be The Consequences of Poor Data Quality on Model Accuracy, and it will run on June 14th from 17:00 to 17:30.
    • Other presentations on the new data quality book that are scheduled so far are:
      • A2012 - Las Vegas, October 8-9th: The Consequences of Poor Data Quality on Model Accuracy (confirmed)
      • Predictive Analytics Conference September 24-25th, Vienna "Data Quality WITH and FOR Analytics" (working title)
      • Statistische Woche, TU-Wien, Vienna, September 18-21 Analytisches Datenqualitäts Profiling mit SAS und JMP (submitted)
      • SAS Forum Denmark, Copenhagen, October 3rd-4th "Data Quality WITH and FOR Analytics" (working title)
  • Wednesday, May 16th 2012 New book Data Quality for Analytics is available
    • The new book Data Quality for Analytics is available as of today. It is again a great experience to have the 2nd book out. I am convinced that the effort that went into this book was worthwhile and will benefit the analytics user community. I look forward to receiving your feedback and entering into discussions with you. LinkedIn or Email.
    • See also Amazon.
  • Thursday, May 10th 2012 First presentation on the new book at a SAS conference
    • We had our SAS Forum Austria today in the Hotel Sofitel in Vienna. In the Information Management stream I gave my first presentation on the new book Data Quality for Analytics with a full room of people and very good feedback from the attendees.
  • Tuesday, May 1st, 2012 Review from Bart Baesens
    • Today I received a review quote from Prof. Bart Baesens on my new book that I want to share with you. It is a pleasure for me to receive such feedback on my work of the last 2 1/2 years.
      • Data Quality is the key ingredient for a successful analytics project.  In our past research, we have extensively shown that the best investment to boost the performance of any analytical model, is by understanding and improving data quality.
      • In his book, dr. Svolba provides a comprehensive coverage of the topic  by defining data quality first, followed by providing well-articulated guidelines to profile and improve data quality, and concluding with illustrating the impact of poor data quality in e.g. predictive modeling and time series forecasting. The book is well-structured and written, with lots of practical examples clarifying the ideas presented. 
      • In short, I consider the book a must-read for anyone working on developing high-performing analytical models! 
      • Prof. Dr. Bart Baesens - Department of Decision Sciences and Information Management -- KU Leuven - Belgium


  • Monday, April 16th MAKEWIDE and MAKELONG macro available again
    • SAS removed my MAKEWIDE and MAKELONG macro from support.sas.com, as they want to promote their new sample based on PROC TRANSPOSE.
    • As I got many requests from users who still want to use my macros, I made them available here.
    • I ran some tests and found out that the performance of my macros is still better than the new transpose sample.
  • Saturday, March 3rd, 2012 New Data Quality pages and New SAS Samples
  • This week I found time to rework the sascommunity.org pages for the new data quality book. A look on the new pages can be found here: Data Quality for Analytics
  • Some weeks ago I contributed a sample to support.sas.com that shows how you can use PROC FCMP to create your own functions.
    • My sample shows how to build a function that calculates the weekday from a date, where Monday=1, Tuesday=2, ..., Sunday=7.
    • This is different from the standard WEEKDAY function that uses 1=Sunday, 2=Monday, ..., 7=Saturday.
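
The difference between the two numbering conventions can be sketched in a few lines. This is a plain Python illustration, not the PROC FCMP sample itself; the function names `weekday_sunday_first` and `weekday_monday_first` are made up for this sketch.

```python
from datetime import date


def weekday_sunday_first(d: date) -> int:
    """Weekday number with 1=Sunday, 2=Monday, ..., 7=Saturday."""
    # isoweekday() returns 1=Monday ... 7=Sunday
    return d.isoweekday() % 7 + 1


def weekday_monday_first(d: date) -> int:
    """Weekday number with 1=Monday, 2=Tuesday, ..., 7=Sunday."""
    # Shift the Sunday-first number so that Monday becomes 1 and Sunday 7
    return (weekday_sunday_first(d) + 5) % 7 + 1


# Hypothetical usage: the date of this blog entry, a Saturday
entry_date = date(2012, 3, 3)
sunday_first = weekday_sunday_first(entry_date)
monday_first = weekday_monday_first(entry_date)
```

The conversion is a fixed shift modulo 7, so a custom function only needs to wrap the standard weekday number rather than recompute it from the date.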
  • Sunday, February 19th, 2012 - All 23 chapters passed the copy-edit process
  • As of this week, all 23 chapters of "Data Quality for Analytics" have passed the copy-edit process. The copy editor at SAS Press did really good work and invested a lot of time, not only creating a more nicely formatted document but also sending back comments that made the text much more readable. Thanks a lot for your efforts. The appendix and the introduction are still pending, but we are moving closer to the final version. At SAS Global Forum in Orlando, display copies of the final version of the book will be available.
  • Wednesday, February 15th 2012 - Correction in scorecode for GENMOD extension node for SAS Enterprise Miner
  • Together with a customer we found out that the scorecode that is generated by the GENMOD node ignores categorical variables and interval class variables. Many thanks also to the developer team of SAS Enterprise Miner. Even though this is an extension node that is not officially supported by the team, they were able to provide a fix within half a day.
  • I will update the download files for my Data Preparation extension nodes soon. If you need an earlier update please contact me by Email.


  • Tuesday, January 10th, 2012 - My extension nodes run under EM 7.1 as well
  • I got some requests asking whether my Data Preparation extension nodes also run under EM 7.1. Yes, they do!
  • I made a quick update to the installation instructions for the extension nodes and will include this in the PDF document soon.


  • Tuesday, December 20th, 2011 - Merry Christmas and Happy New Year
  • The last weeks have been quite hectic and I did not take time to write text for the blog. My book "Data Quality for Analytics" is already in the copy-edit process. Publication is anticipated in April 2012. I wish you a merry Christmas and a successful 2012. Gerhard.
  • Monday, October 17th, 2011 - 23:59 -- Pressing the SEND button
  • Sorry for being quiet over the last weeks. All my energy went into finalizing the full draft of my book "Data Quality for Analytics Using SAS". And yes! On Monday, October 17th, I was able to finalize the full draft version, which considers all recommendations from the many reviewers and contains many changes that I had on my "change-list".
  • The book is now in the hands of SAS Press, who will take it through the copy-edit process over the next weeks. The book is anticipated to be published by the beginning of next year. It should definitely be available at SAS Global Forum 2012 in Orlando.
  • As I have now more "free" time :-) , I hope to be able to update this blog more frequently and also proceed with providing more analytic content.


  • Thursday, October 13th, 2011 - Returning from Cologne
  • Just returning from Cologne where I held my training class Building Analytic Data Marts over the last 3 days for a large German telecommunications company. It was again a good experience to be in touch and in discussion with customers on the data preparation topic.


  • Thursday, October 6th, 2011 - This is your life with SAS
  • Just returning from Brussels. Today I attended SAS Forum Belgium and gave a presentation on my book Data Preparation for Analytics.
  • The main theme of the SAS Forum Belgium was "This is your life with SAS". There is also a very nice comic strip, that shows how SAS improves your life. See also the video.
  • Yesterday I attended SAS Forum NL where I gave a presentation on Forecasting Case Studies of Austrian Companies and had very good and interesting conversations with people from the audience afterwards.
  • I also held a workshop on "Predictive Analytics" together with James Taylor and my colleague Rens Feenstra.


  • Tuesday, August 30th, 2011 -- Moving into the home stretch
  • Back from wonderful holidays in the Austrian Alps and sailing on Neusiedler See: Now I am moving into the home stretch with my book “Data Quality for Analytics”. A lot of reviews have already arrived in my inbox and I am happy to see that all of them are very positive.
  • I have built a time schedule with target date October 17th to deliver the full version to SAS Press. (There is a buffer that would allow delivering it a week later.) So I will try to stay focused over the next weeks.
  • In the meantime I have updated my LinkedIn account a little bit and opened a Twitter account. However, it will still take some time until this goes online. I will post any changes here on this site.

Blog Grossglockner.JPG Blog HoheNock2011.jpg


  • Tuesday, July 19th, 2011 -- GPS Data
  • From the social network point of view I want to mention that I recently also opened a "Linked-In" account. You can check my profile here. I plan to add more content here over time and also want to open a discussion list on the topic of analytics, data preparation and data quality. Currently my favorite name for this is Analytics and Data Preparation (ADP). Comments on the name are welcome.
  • The sailing season has started well and we are collecting lots of data during the races with our GPS tracking device. So my interest in processing and analyzing GPS data got a new boost. Sometimes I am not sure whether I should be happy about good weather for sailing or look forward to bad weather so I can analyze more. I have already started to build a nice demo with JMP software on sailboat race data. I hope to publish this demo soon.
  • As the two pictures show, I am happy to see that my sons also share my interest for sailing (as long as there is good wind).

Blog S1.JPG Blog S2.JPG


  • Friday, July 8th, 2011 -- Book Reviews are coming in
  • My book "Data Quality for Analytics" has been out for review for more than two months now, and the first feedback from reviewers is already coming into my inbox. I am glad to see that all of them invest a lot of time and give very detailed feedback. This is essential to me, as it gives me a broader view of various opinions and lets me incorporate feedback into the book before it is published. My plan is to wait for a few more weeks and then start creating the final version. SAS Press still gives the green light for publication in January 2012.
  • Over the last weeks I have spoken with some analytic customers here in Austria about SAS Model Manager and also got into quite some detail when building a prototype for one customer. Besides the many nice features in model governance and model validation, I really like the stability report and the characteristic report, which are part of the model performance monitoring.
  • Citing from the Users Guide: Section "Data Composition Reports"
    • The two data composition reports, the Characteristic Report and the Stability Report, detect and quantify shifts in the distribution of variable values that occur in input data and scored output data over a period of time.
    • The Characteristic report detects shifts in the distribution of input variables over time.
    • The Stability report measures shifts in the scored output data that a model produces. By analyzing these shifts, you can gain insights on scoring input and output variables.
  • From a data quality perspective these reports are very important as they give you insight into how stable your INPUT data is. In many cases only the outcome in terms of correct response or lift values is measured. However, these assessment statistics often tell you only half the story. The above-mentioned reports help you to divide the problem and get more insight into whether the input data, the output score data or the target variable itself changes.
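
To make "quantifying a shift in the distribution of an input variable" concrete, here is a minimal Python sketch. It uses the population stability index, one widely used measure of this kind. Which statistics the Characteristic and Stability reports actually compute is not stated in this entry, so the choice of index, the bin edges, the `eps` floor and the sample values are all assumptions made for illustration.

```python
import math


def bin_shares(values, bin_edges, eps=1e-6):
    """Share of values per bin; bins are (-inf, e0), [e0, e1), ..., [e_last, inf).

    bin_edges must be sorted in ascending order.
    """
    if not values:
        raise ValueError("values must not be empty")
    counts = [0] * (len(bin_edges) + 1)
    for v in values:
        # The bin index is the number of edges that v has reached or passed
        counts[sum(v >= edge for edge in bin_edges)] += 1
    # eps avoids log(0) for empty bins; its size is an arbitrary choice
    return [max(c / len(values), eps) for c in counts]


def population_stability_index(baseline, current, bin_edges):
    """Sum over bins of (q - p) * ln(q / p); 0 means identical bin shares."""
    p = bin_shares(baseline, bin_edges)
    q = bin_shares(current, bin_edges)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Made-up values of one input variable at model build time and at scoring time
baseline_values = [18, 22, 25, 31, 35, 40, 44, 52, 60, 75]
current_values = [30, 34, 38, 41, 47, 55, 58, 63, 70, 90]
edges = [30, 50]

psi_same = population_stability_index(baseline_values, baseline_values, edges)
psi_shifted = population_stability_index(baseline_values, current_values, edges)
```

Every term of the sum is non-negative, so the index only grows as the two distributions drift apart. Computing it per input variable shows which inputs moved, separately from any change in the scores or the target.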


  • Saturday, June 17th, 2011 -- Returning from Copenhagen
  • Just returned from Copenhagen where I was teaching my training Building Analytic Data Marts. Besides the fact that Copenhagen is an extremely nice city (not only for a sailor), it was a very good experience to have a number of very interested people in the training room. It always extends my horizon to discuss the topic “Data Preparation for Analytics” with the course attendees and to learn about their problems, their experience, their viewpoints and how they can benefit from what I teach:

Dear course attendees, dear customers: thank you for your discussions, questions and opinions. That I can write books and generate ideas on how to solve analytical and data preparation problems is triggered by the interaction with you.

DPFA Kopenhagen.jpg DPFA Harbor CPH.jpg


  • May, 30th 2011 -- Nomination as the new secretary of the “Vienna Biometrical Section”
  • Last week I got nominated as the new secretary of the “Vienna Biometrical Section” (Wiener Biometrische Section), which is part of the International Biometric Society. I was very happy about this nomination as this reflects my statistical roots at the Department of Medical Statistics, where I wrote my PhD. Also I felt a little bit honored, as I am the first board member who comes from a non-academic organization. My role as secretary is to send out the announcements for colloquia. The last presentation discussed for example considerations for the treatment of continuous variables as input variables, whether to group them or to use them in functional form. A topic that goes much beyond medical statistics and is of importance for example in data mining (credit scoring, customer behavior) as well.


  • Week 19 - Friday, May 13th, 2011 -- THE FIRST PRESENTATION
  • I was invited this week to speak at a conference at the Austrian Statistical Office. For me it was the premiere of speaking about my new book and presenting it to a broader audience. So there were many reasons to be excited. It was a great experience to see that the topic and the way it was presented were received very well by the audience. This, besides sending the book to the reviewers, was definitely one of the memorable moments in connection with this book.
  • In my talk I approach the data quality topic from a very unusual starting point: as I am an enthusiastic sailor, I introduce the sailing topic, then show what analysis questions arise when you want to improve your sailing performance. Next I show what data we have available and which problems we have with this data. From this I relate to the data quality problems for analytics. You will see that almost all data quality problems can be discussed on this basis. Stay tuned for announcements of the next talks.

DQFA StatAustria May2011.jpg

  • Week 18 - Friday, May 6th, 2011 -- THE WEEK AFTER
  • We had SAS Forum Austria this week in a very fancy centrally located hotel in Vienna. This year I had a lot of time to talk to our customers as I did not have to do a presentation. Even if it is a stressful day to speak to so many people, it is always a good experience to get in touch with all our analytic customers, share ideas, get their feedback and show them new features in our SAS 9.3 offering.
  • I recently got the confirmation that my course on my first book Data Preparation for Analytics will take place in Copenhagen from June 14th to June 16th. It will be a pleasure for me to see you there. [Registration]
  • Week 17 - Monday, May 2nd 2011 -- PRESSING THE SEND BUTTON
  • After a writing marathon over the last 4 months, I was able to finish the full draft of my book today and submit the document (which now has 314 pages) to SAS Press. Even though the last weeks were really hard and consumed a lot of personal resources, I do not want to complain. It is an indescribable feeling to hold the final version in my hands and touch the outcome that has been produced over the last weeks and months.
  • The nice thing now is that I do not need to feel guilty if I am not writing. I am “allowed” to do other things right now and enjoy the nice spring in Vienna.
  • As you can see from the pictures below my “wall” is full of printouts right now.

DQFA Status All.jpg.jpg DQFA Status 21.jpg DQFA Status 20.jpg DQFA Status 19.jpg DQFA Status14.jpg DQFA Status11.jpg DQFA Status9.jpg DQFA Status 8.jpg DQFA Status 7.jpg DQFA Status 6.jpg DQFA Status4.jpg DQFA Status 2.jpg

  • Week 16 - Sunday, April, 24th 2011 -- MOVING TOWARDS THE FINISH

DQFA at SGF2011.jpg

Just a quick note from myself: I am still alive. Writing still goes on. However, I use every minute to write on the book to meet the April 29th deadline for submitting the full draft to SAS Press. Thus the blog was quiet for some time. More conversation will be coming soon. Gerhard

DQFA Status 21.jpg DQFA Status 20.jpg DQFA Status 19.jpg DQFA Status14.jpg DQFA Status11.jpg DQFA Status9.jpg DQFA Status 8.jpg DQFA Status 7.jpg DQFA Status 6.jpg DQFA Status4.jpg DQFA Status 2.jpg


  • Week 13 - Saturday, April 9th 2011 -- RETURNING FROM THE US
  • Yes, there has been a long delay in adding the next entry to this blog. However this is not due to any inactivity in Data Quality and Data Preparation for Analytics. On the contrary:
    • I spent two weeks in the United States, the first week at the SAS headquarters in Cary, North Carolina. It is always a very good experience to sit in a room together with colleagues from the SAS Global Analytic Practice, Product Management and R&D and to exchange ideas and experiences. I would go so far as to say that nowhere else on this planet can you find such a high concentration of expertise in analytics, data mining and forecasting when it comes to answering practical business questions.
    • Before moving on to Las Vegas to attend SAS Global Forum, a colleague and I flew to Denver to rent a car and drive from Denver over the Rocky Mountains to Las Vegas. A breathtaking experience with skiing resorts, endless horizons on the I-70, hot weather in Arches National Park, snow in Bryce Canyon NP, and a nice finish in Zion NP and at Lake Mead with the Hoover Dam.


  • Week 11 - Sunday, March, 20th 2011 -- ABOUT THE BENEFIT OF A DESSERT PLATE FOR SIMULATION STUDIES

DQFA Apfelstrudel.JPG DQFA Cakeplate.JPG

  • This week I managed to finish part II of the book and the first 14 chapters have been sent out to the reviewers by SAS Press. Finishing part II also means that I can start working on the final version of my simulation studies for part III. Here I simulate the consequences for model quality of artificially introducing data quality errors into the data (missing values, biases, reduction of the number of events or of the length of the time history).
  • The simulations are performed on my laptop. Luckily enough I have a very strong machine which can run 8 processes in parallel. And SAS®Enterprise Miner utilizes them very efficiently. Some of these simulation processes run for hours, often overnight. As a consequence, however, the processor and the whole left side of my laptop get hot, really hot. This even affects the wooden top of my desk. So what to do?
  • The Austrian solution to this is as follows:
    • Eat an “Apfelstrudel” (no one pronounces this as funny as Austria’s most prominent export: Arnold Schwarzenegger, baked by his "Mama") while your simulations are running. My wife baked one this week, and I managed to get hold of a piece before my three sons eliminated it all (illustrative picture 1).
    • After you have finished the “Apfelstrudel”, turn around the dessert plate and put it under the side of your laptop where the processor is located (illustrative picture 2).
    • This will allow enough cooling air, and your laptop and your desk are safe.
  • So I hope I have either stimulated your interest in performing simulations with SAS or at least given you an appetite for an “Apfelstrudel” right now. For those of you who want more details about how to perform simulation studies with SAS®Enterprise Miner I have good news: I decided to include, besides the simulation results, a chapter in my book that describes the architecture and the pros and cons of the simulation environment. Those who have a very urgent need for this can contact me directly.
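
The first step of such a simulation study, artificially introducing missing values into otherwise clean data, can be sketched as follows. This is a plain Python illustration and not the SAS Enterprise Miner environment described in the book; the function name `inject_missing`, the missing rate and the seed are assumptions made for this sketch.

```python
import random


def inject_missing(values, rate, seed=0):
    """Return a copy of values with a fixed share replaced by None.

    The positions are drawn completely at random; a fixed seed makes
    a simulation run reproducible.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    rng = random.Random(seed)
    n_missing = round(rate * len(values))
    positions = set(rng.sample(range(len(values)), n_missing))
    return [None if i in positions else v for i, v in enumerate(values)]


# Hypothetical clean input variable with 100 observations
clean = list(range(1, 101))
damaged = inject_missing(clean, rate=0.25, seed=42)
n_missing = sum(v is None for v in damaged)
```

In a full study the same model would then be trained on the clean and on the damaged data for several rates and seeds, and the model quality compared. That part is omitted here.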

DQFA Status14.jpg DQFA Status11.jpg DQFA Status9.jpg DQFA Status 8.jpg DQFA Status 7.jpg DQFA Status 6.jpg DQFA Status4.jpg DQFA Status 2.jpg



  • Week 10 - Sunday, March, 13th 2011 -- HOW FRAGILE OUR WORLD CAN BE
    • Looking at the dramatic pictures of Japan after the earthquake and the tsunami and thinking of the possible consequences of the nuclear incident, I decided to pause my blog for this week. Against this background many of my daily life problems and challenges lose their relevance.
  • Week 9 - Friday, March, 4th 2011 -- REVIEWERS IMPUTE MISSING VALUES!
    • This week Chapters 1 - 9 were sent to the reviewers. I was delighted to see that I already got initial feedback from one of them. I think that the often hidden and anonymous contribution of the many reviewers is, besides the writing itself, a highly important factor in the completion of a book.
    • Dear colleagues! I highly appreciate the effort and time that you invest to read through my book and give me feedback. Your work increases the quality not only of my book, but of many SAS Press books a lot. Thank you for imputing the missing values in my work!
    • Imputing and replacing missing values was also the focus of my writing this week in chapters 10 and 11. The names of the two chapters are:
      • CHAPTER 10 – PROFILING AND IMPUTATION OF MISSING VALUES
      • CHAPTER 11 - PROFILING AND REPLACEMENT OF MISSING DATA IN TIME SERIES DATA
    • Note that for time series data I intentionally selected MISSING DATA instead of MISSING VALUES, as in time series it is also important (and possible) to check whether whole records are missing, for example if a record 'January2011' is followed directly by 'March2011'. Here PROC TIMESERIES offers a lot of options to detect and insert these records.
    • The sun is currently shining in Vienna and this makes me and some colleagues here in SAS very happy as after lunch we will leave for a SAS skiing weekend in the Austrian Alps at Hinterstoder.
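
The point above about whole records missing from a time series ('January2011' followed by 'March2011') can be illustrated with a small Python sketch. It does not reproduce PROC TIMESERIES or its options; the function names and the sample values are made up for illustration.

```python
from datetime import date


def month_range(start: date, end: date):
    """Yield the first day of every month from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def fill_missing_months(series: dict) -> dict:
    """Insert a record with value None for every month absent from series.

    series maps first-of-month dates to observed values.
    """
    if not series:
        return {}
    return {d: series.get(d) for d in month_range(min(series), max(series))}


# Made-up monthly series in which the February 2011 record is absent
observed = {date(2011, 1, 1): 120.0, date(2011, 3, 1): 135.0}
filled = fill_missing_months(observed)
inserted = [d for d in filled if d not in observed]
```

The inserted records carry a missing value, so after this step the problem is an ordinary missing-value problem that can be profiled and replaced.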

DQFA Status11.jpg DQFA Status9.jpg DQFA Status 8.jpg DQFA Status 7.jpg DQFA Status 6.jpg DQFA Status4.jpg DQFA Status 2.jpg


  • Week 8 - Sunday, February 27th 2011 -- THE COLOR OF DATA QUALITY FOR ANALYTICS
    • Friday this week I managed to finish the last chapter (chapter 9) of Part I 'Process Considerations of Data Quality for Analytics'. I loved to write that chapter as it deals with important topics like Data Quality Responsibilities and Data Quality as an Ongoing Process.
    • Actually, Data Quality is never finished. You may reach a certain data quality status with a project, but you have to keep investing in data quality in order to maintain this status. Data Quality decreases without any assistance from your side. This is similar to how you have to maintain your house regularly, take care of your personal health all the time and service your car at regular intervals.
    • After finishing the introduction section the draft version of the book was ready to be sent to SAS Press in Cary for review of the first part (chapters 1-9) of the book.
    • This version not only contains the chapters of the first part, but also all the other chapters in their current version. This version currently has 227 pages and will be used for the print of the preview copy which will be shown at SAS Global Forum in Las Vegas.
    • I also had to deal with another very important decision over the last 2 weeks: the cover of the book. It is a very funny experience to be in the midst of finishing an important part of the draft version and in parallel having to take care of color tones and graphics on the cover of the book. Of course I invested a serious amount of attention in that. As in my mind the cover of the book has always been green, this color has been selected. A screenshot can be found below, the real print version can be checked in Las Vegas. Feedback is welcome!

DQFA Status9.jpg DQFA Status 8.jpg DQFA Status 7.jpg DQFA Status 6.jpg DQFA Status4.jpg DQFA Status 2.jpg

  • Week 7 - Sunday, February 20th 2011 -- BECOMING A GOOD PRESENTER
    • This week went very well with respect to writing. I finished chapter 8 and even a part of chapter 9. Thus I am a little bit ahead of my schedule. My new milestone is Friday 26th of next week, by which I want to have the following available:
      • Chapter 9. This means that all chapters of Part I of the book are ready and I stayed within my personal time schedule.
      • Introduction to the book as well as the introduction to the three parts of the book.
      • Amended complete version of the book that is sent to SAS Press to allow the review process to start and to act as the preliminary print version for SAS Global Forum in Las Vegas.
    • The focus of my blog entry for this week, however, is a quite personal one. This Sunday evening 15 years ago (1996) I was quite nervous. At that time I was in my first year at the Department for Medical Statistics, University of Vienna. And on Monday morning the first "Statistics and Documentation" class for the dietician students at the General Hospital of Vienna started. And I was the person who had designed this class and had to deliver it for the next 2 weeks. I was not nervous because 15 young ladies were waiting to be taught in this class; rather, teaching and speaking in front of other people was not my favorite task at that time. I also did not believe the words of my former boss at an insurance company, who told me that "everyone can learn to present". I did not consider myself a lousy presenter, but I thought that there was some room for improvement.
    • So I started my class on Monday morning and I survived it. I did not only survive it, I started to enjoy speaking in front of others and I lost my stage fright. Over the years I taught this class many times and I got some routine in practice. When I joined SAS a couple of years later I was doing many more presentations of different types: to small groups of customers, to large groups at marketing events. When I entered the conference room at SUGI31 (now SAS Global Forum) in San Francisco and presented my new book Data Preparation for Analytics to more than 200 people and got very good feedback on the presentation, I was not only happy because this was an important milestone for myself. I was also looking back on how my presentation skills had changed over time.
    • The message to you: If you have any doubt about whether you can improve your presentation skills and get rid of stage fright and nervousness, think about the story above and believe me: You just have to do it and practice it. It will work.
    • And what does all this have to do with analytics? A lot! Presenting your ideas and findings to other people is important, irrespective of whether you are a statistician in a company, a researcher or a consultant.

DQFA Status 8.jpg DQFA Status 7.jpg DQFA Status 6.jpg DQFA Status4.jpg DQFA Status 2.jpg



  • Week 6 - Sunday, February 13th 2011 -- JUST NOTHING
    • No entry for this week!
    • I not only enjoyed this fantastic week in the Austrian Alps with incredible weather, clear skies and wonderful snow. I also appreciated a lot that I managed to clear my head from the first minute of the holiday on. Especially in this hectic time this is very important.
    • Thank you all for the nice feedback to my blog. Some of you wanted to know whether I currently also read books or only write. Yes, I always have at least one book that I read. Currently they are however mostly easy-to-read titles. For example:
      • Dirk Stermann: "6 Österreicher unter den ersten 5" (Six Austrians among the top five). The experiences of a German immigrant over 20 years after coming to Vienna. A slightly critical mirror for us Austrians, presented in a very humorous way.
      • Daniel Glattauer: "Darum". A crazy story about a deliberate murderer who admits his crime, but no one believes him because he is an honored member of society.
      • Ken Follett: "Sturz der Titanen" (English: Fall of Giants). I read one historical novel every year. Ken Follett knows how to take you on a historical ride in many of his books.

DQFA Ski1.JPG DQFA Ski2.JPG


  • Week 5 - Thursday, February 3rd 2011 -- READY FOR A HOLIDAY
    • Yesterday evening I finished chapter 7 "Specifics of Predictive Modeling". I like this chapter a lot, especially as its content was one of the triggers to write this book. Predictive models are indeed very powerful and help to answer many important business questions. However, they pose special requirements on the data.
    • Especially the need to have historic snapshots of the data available and, if the model is to be scored at regular intervals, to have the data available in the same structure and with the same definitions in future periods. In many practical situations, especially if a company or an organisation has not done predictive modeling before, it is challenging to explain the need for historic snapshots of the data. Sometimes it is helpful to tell those people that the name predictive contains the word 'pre', meaning that you need to have data from the 'pre-time' of the event. In German this works even better with the words "Vorhersage-Modell" (prediction model) and "Vorher" (before).
    • Another point is the number of usable observations and events. Sometimes the first look at the data shows a nice number of events and observations. However, as soon as you start to dig a little bit deeper, you see that observations need to be excluded from your training data. Section 7.6 will show this in an example where initially 90,000 observations are available and finally only 4,500 can be used for the analysis. See the picture below. In practice it is a very common situation that you have to balance between using more observations with fewer variables in the analysis or including all variables but having to skip some observations.

Chap7 ReductionofObservations.jpg
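
To make the idea of shrinking observation counts a little more tangible, here is a minimal Python sketch of such a "reduction funnel". It is not the example from section 7.6: the records, the field names (`has_snapshot`, `first_seen`, `n_missing`), the dates and the three exclusion rules are all invented for illustration. It only shows how the remaining number of observations could be tracked after each exclusion step.

```python
from datetime import date

# Hypothetical history requirement (an assumption for this sketch):
# a customer must have been observed on or before this date.
HISTORY_START = date(2009, 6, 30)

# Invented toy records; field names and values are not taken from the book.
observations = [
    {"id": 1, "first_seen": date(2008, 3, 1), "has_snapshot": True, "n_missing": 0},
    {"id": 2, "first_seen": date(2010, 1, 15), "has_snapshot": True, "n_missing": 0},
    {"id": 3, "first_seen": date(2007, 11, 20), "has_snapshot": False, "n_missing": 0},
    {"id": 4, "first_seen": date(2009, 2, 1), "has_snapshot": True, "n_missing": 2},
    {"id": 5, "first_seen": date(2009, 5, 5), "has_snapshot": True, "n_missing": 0},
    {"id": 6, "first_seen": date(2010, 5, 1), "has_snapshot": False, "n_missing": 1},
]

# Named exclusion rules, applied in order. Each rule returns True to keep a record.
exclusion_rules = [
    ("historic snapshot available", lambda r: r["has_snapshot"]),
    ("enough history before snapshot", lambda r: r["first_seen"] <= HISTORY_START),
    ("no missing input variables", lambda r: r["n_missing"] == 0),
]


def attrition_funnel(records, rules):
    """Apply the rules one after another and record the remaining count after each."""
    remaining = list(records)
    report = [("all observations", len(remaining))]
    for name, keep in rules:
        remaining = [r for r in remaining if keep(r)]
        report.append((name, len(remaining)))
    return remaining, report


usable, report = attrition_funnel(observations, exclusion_rules)
for step, count in report:
    print(f"{step:34s}{count:3d}")
```

Dropping the last rule from the list corresponds to the trade-off mentioned above: more observations stay in the analysis, but variables with missing values can then no longer all be used.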

    • From a schedule point of view I managed to keep within my plans until the start of my holidays. No chapter is planned for the next week. But a lot of snow, panorama (webcam) from the core of the Austrian Alps, good Austrian food for the cold season, hot spiced wine with ginger, ... On Saturday school holidays start in the eastern part of Austria and many people will drive 3 hours westward to the good ski resorts.

DQFA Status 7.jpg DQFA Status 6.jpg DQFA Status4.jpg DQFA Status 2.jpg

Panorama.jpg


  • Week 4 - Sunday, January 30th 2011 -- KEEPING THE RHYTHM
    • This week the famous World Cup night slalom took place on the Planai in Schladming (Styria, Austria). Besides the World Cup downhill race in Kitzbuehel, it is one of the most famous and exciting alpine ski races. From an Austrian point of view the results were a little bit disappointing as, unlike in recent years, the win did not go to an Austrian skier. For me it was still worthwhile to watch the race because the co-commentator pointed out a couple of times how important it is for a slalom skier "to stay in rhythm throughout the whole race. As soon as you lose the rhythm you will hardly find your way back to your good performance".
    • An obvious point on the one hand; on the other hand I found a lot of similarity to my own current situation of having to finish a book. In the first days of January I defined a challenging time schedule and set milestones for each week. For me it was very important not to fall behind the schedule already in the first weeks. Even if a delay of one week would not affect the completion of the whole project that much, it is a self-reinforcing process to know that you started very well and that the project keeps going. This gives additional confidence to finish the following tasks. And I am very sure that this is not only true for book projects.
    • And this was a very good week with lots of confidence! I not only finished the scheduled chapters 5 and 6 this week on Saturday, I also got some good ideas to include in a chapter in the 2nd part of the book, "Profiling and Improvement". I am really looking forward to the completion of the book. Not only to have it finished, but also to hold this collection of good ideas around "Data Quality for Analytics" in my hands and to discuss it with SAS users.
    • This week was also a very good week from another point of view: My management decided that I will attend the SAS Global Forum in Las Vegas. I really look forward to meeting many peers from SAS there, but also to getting in touch with lots of customers from the US, for example at the SAS-Press booth. For all of you who will attend SAS Global Forum as well, make sure you pass by the SAS-Press booth. A pre-version of my book will be available there, which will give you a first impression of the structure and content of the book. I look forward to meeting you there.
    • I met a friend this week who started to study law alongside his regular job. A very challenging project. I got a little thoughtful when he told me that he had to give up almost all private activities in favour of his studies. Even though I spend a lot of time with my book and think about it very often, there are still many activities besides work and writing that I pursue. And I am really thankful to my wife and my three boys for distracting me so often and for actively requesting time that I spend with them. By the way, for the German speakers in the Vienna area among you: I can highly recommend the new program of Gerry Seidl, which I saw this week. It is really brilliant.

DQFA Status 6.jpg DQFA Status4.jpg DQFA Status 2.jpg


  • Week 3 - Friday, January 21st 2011 -- A FREE WEEKEND? - UNBELIEVABLE
    • Unbelievable, but true. I managed to finish the revision of the chapters that were planned for this week already by Friday 8 p.m. This means that a weekend without writing obligations is waiting for me. I am still not sure whether I can cope with that (just kidding). My three boys have a lot of semester-end school stress, so I will go to a water park with them tomorrow to distract them a little bit (and me as well). Most likely I will start looking at chapters 5 and 6 on Sunday.
    • How far did I get? The following chapters are finished:
      • Chapter 1 – Introductory Case Studies
      • Chapter 2 – Definition and Scope of Data Quality for Analytics
      • Chapter 3 – Data Availability
      • Chapter 4 – Data Quantity
    • It was a very good experience to work on these chapters this week. I really like the content and I am getting even more confident that the content of the book will be appealing.
    • For psychological reasons I started to print out each chapter as soon as it is available, using the option of 8 pages per printed sheet. The pages of this "small pocket book version" are attached to the shelf in my office. It is a good visual motivation to see the book growing that way. I promise to add a photo here soon.
    • I also had a good idea today for the refinement of the content of chapter 12 - Analytic Data Correction (even if this is not in part "I" of the book, which I am currently reworking). So far I wanted to pack into this chapter all the options for how analytics can improve data quality. For the analytical focus of the book, however, I realised today that this chapter should include all the nice analytical functionality that SAS/STAT, SAS/Enterprise Miner, ... offer for data quality tasks. So I am now collecting these ideas in parallel as they come to my mind.
    • Besides writing the Data Quality for Analytics (DQFA) book, I got a nice email this week from Georg Morsing, the education manager of the Danish SAS office. He asked whether I want to give my training "Building Analytic Data Marts", which I created and which is based on my first book Data Preparation for Analytics (DPFA), in Copenhagen this June. I am very happy about this request as it confirms the value of my work in this area. The training is scheduled for June 14th to 16th 2011. I look forward to seeing you there. See also Building Analytic Data Marts.

DQFA Status4.jpg DQFA Status 2.jpg


  • Week 2 - Friday, January 14th 2011 -- GETTING STARTED AGAIN
    • I returned from Christmas holidays this week. It was a good feeling to have the first draft version finished before 2010 ended. And it was my personal Christmas present to achieve this. I could use the previous week not only to get rid of all our Christmas cookies (especially the "Vanillekipferl") at home, but also to start with the 2011 planning for the book.
    • My goal is to get the full version of my book out to SAS Press this spring. This means that I should especially use the time until the end of March, before the sailing season starts again and it becomes hard for me to spend my weekends at my desk writing the book.
    • Having the first draft version available is a very nice experience with every book. I still remember this from my first book Data Preparation for Analytics. A nice experience not only because it is finished, but also because it is always a very challenging but extremely interesting part to look at the content of all chapters and to decide how the structure of the book should be changed. This results in the creation of new sections, the splitting and merging of chapters, and the transfer of sections to other chapters. There is no general rule for when you can or should finish this process. You just feel it! And when you look at the rearranged version, you know it: That's it! That's the book I want to finish now. And that's the structure readers will benefit from most.
    • That is what happened. Chapters moved, sections moved, chapters were split. And the first part, which I am focusing on right now, has a new structure which can be seen in the table of contents above.
    • The target for this week is to finish Chapters 1 and 2. The case studies I have already finished. I like them a lot, not only because they mirror some of the most interesting projects in my working career, but also because I am convinced that they give the reader a practical and meaningful start into the book.
    • This week and last week I also had some very good discussions with my colleague Mihai about the Predictive Modeling simulation case study, which helped me a lot. However, I first have to get through the conceptual chapters in Part I before I can incorporate this.

DQFA Status 2.jpg