Data Management Solutions Using SAS® Hash Table Operations: A Business Intelligence Case Study

From sasCommunity

Overview

Title: Data Management Solutions Using SAS® Hash Table Operations: A Business Intelligence Case Study

Authors: Paul Dorfman and Don Henderson

ISBN: Not yet available. The book is expected to be published in the fall of 2017.

Solve your Business Intelligence Problems with Hash Tables

The SAS hash object is a tool familiar to many SAS programmers. You may well have used the hash object in a DATA step to create and load hash tables for table lookup, which is by far its most widespread use.

Restricting your use of the hash object to table lookup problems does not do justice to its capabilities; it can do a lot more than you might think! This book concentrates on solving challenging data management and analysis problems with the SAS hash object, whose environment and tools make it possible to build complete, dynamic solutions. To this end, the book provides an in-depth overview of the hash table as an in-memory database, with the CRUD (Create, Retrieve, Update, Delete) cycle implemented by the hash object's tools.

Building on this concept and focusing on real-world problems drawn from sports data sets and statistics, the book aims to help you use the hash object productively. In particular, it shows you how to (among other things):

  • select the appropriate hash object tools to perform hash table operations
  • use the appropriate hash table operations to support specific data management tasks
  • take advantage of the dynamic, run-time nature of hash object programming
  • understand the algorithmic principles behind hash table data lookup, retrieval, and aggregation
  • recognize why the hash object is exceptionally well suited to data aggregation, through a number of examples that create summaries useful in answering Business Intelligence questions
  • manage the memory footprint of hash tables, especially when processing big data
  • use hash object techniques for other data processing tasks, such as filtering, combining, splitting, sorting, and unduplicating

Reviews

No reviews available yet.

Tentative List of Chapters

  • Chapter 1 is the introduction.
  • Chapter 2 provides an overview of our sample data. Understanding the sample data will help you follow the examples in the chapters that follow. The sample data consists of transactional data about a game similar to baseball, created by a series of programs that perform a number of hash object operations. These programs are not covered in detail; instead, Chapter 3 will highlight the hash object operations and methods they use.
  • Chapter 3 provides an overview of the history of the hash object along with a peek under the hood that explains, at a high level, the inner workings of the hash object.
  • Chapter 4 provides an overview of all the types of data management operations the hash object can perform, along with selected simple code examples, and introduces terminology used in the remainder of the book.
  • Chapter 5 will expand on the operations and methods of the hash object, with a specific focus on how the hash object can be used to filter, combine, sort, and split data from various sources. The programs described in Chapter 2 that create our sample data will provide many of the examples in this chapter.
  • Chapter 6 discusses using the hash object to create data warehouse tables, specifically star schema fact and dimension tables. At the end of this chapter, a data warehouse star schema is created from the transactional data described in Chapter 2. That star schema serves as the input data for many of the examples in the chapters that follow.
  • Chapter 7 illustrates the use of the hash object for data aggregation, with examples of aggregation at multiple levels and across different classification or grouping variables. It also addresses calculating distinct counts, as well as the flexibility the hash object offers for developing algorithms and programs that compute metrics to help answer our users' Bizarro Ball Business Intelligence questions.
  • Chapter 8 will introduce an advanced feature of the hash object: creating a hash object whose data variables themselves contain hash object or hash iterator instances. We refer to this as a Hash of Hashes (HoH), which will be combined with data-driven techniques to offer more flexibility and more dynamic code. We will apply the HoH concept to selected examples from earlier chapters to illustrate how it helps minimize hard-coding and makes our hash object programs more nimble.
  • Chapter 9 will provide more examples of advanced techniques, with a special focus on memory management and key-independent data partitioning. Tables created with the hash object are purely memory-resident, so large tables, or a requirement for numerous tables, can exceed the memory capacity of even high-end servers. Several partitioning examples will illustrate how to minimize the hash object's memory footprint while maintaining its full functionality.