As the first step in the decommissioning of sasCommunity.org the site has been converted to read-only mode.


Here are some tips for How to share your SAS knowledge with your professional network.


Tips:Splitting into one table can be faster

From sasCommunity
Jump to: navigation, search

The Splitter transformation in SAS Data Integration Studio is not confined to splitting tables apart. It can also create just one output table that is a subset, or a copy, of the input table. And do so significantly faster than the SQL Extract transformation in some circumstances.

When processing a large table the SQL procedure (which the Extract transformation uses) may determine that a table scan is needed, especially when the WHERE conditions are not indexed. Even so, the SQL procedure still appears to use indexes, or at least some level of sorting, to access the table. Because the Splitter transformation uses a DATA Step containing a subsetting IF statement to output the table, the input table is scanned without the indexing overhead, resulting in faster processing.

If the output from the Splitter is narrower, as well as shorter, it can be useful to create an SQL extract view beforehand with all the columns that will be needed by the splitter: output and subsetting columns. Don't put a where clause in the view - this will just (probably) slow things down again.

Although it would be possible to write a BetterSplitter transformation, we have to put up with some understandable but less-than-optimal behaviour. The first thing the splitter does is create all the datasets it needs along with all the variables in the source; it does the column subsetting in subseqent steps. So by reducing the source columns in the view, the first step will reduce these variables straight away.

Empirically:

  • SQL Extract with query - anything between ten seconds and 30 minutes
  • Splitter with subsetting if - 2'30" - 4'00"
  • SQL Extract view with Splitter - 0'30"

....read more

Submitted By Cameron (talk)