As the first step in the decommissioning of sasCommunity.org the site has been converted to read-only mode.


Here are some tips for How to share your SAS knowledge with your professional network.


Difference between revisions of "Tip of the Day:July 3"

From sasCommunity
Jump to: navigation, search
m (Scheduled Estimating the Size of New SAS Data Sets)
(Scheduled Splitting into one table can be faster)
 
Line 5: Line 5:
 
sasCommunity Tip of the Day
 
sasCommunity Tip of the Day
 
</h1><div style="padding:10px;">
 
</h1><div style="padding:10px;">
{{Tips:Estimating the Size of New SAS Data Sets}}
+
{{Tips:Splitting into one table can be faster}}
 
<div style="width:100%"><hr></div>
 
<div style="width:100%"><hr></div>
Feel free to [[Tips Talk:Estimating the Size of New SAS Data Sets|comment]] on this tip.
+
Feel free to [[Tips Talk:Splitting into one table can be faster|comment]] on this tip.
 
</div>
 
</div>
 
</td></tr>
 
</td></tr>

Latest revision as of 20:54, 22 June 2015

sasCommunity Tip of the Day

The Splitter transformation in SAS Data Integration Studio is not confined to splitting tables apart. It can also create just one output table that is a subset, or a copy, of the input table. And do so significantly faster than the SQL Extract transformation in some circumstances.

When processing a large table the SQL procedure (which the Extract transformation uses) may determine that a table scan is needed, especially when the WHERE conditions are not indexed. Even so, the SQL procedure still appears to use indexes, or at least some level of sorting, to access the table. Because the Splitter transformation uses a DATA Step containing a subsetting IF statement to output the table, the input table is scanned without the indexing overhead, resulting in faster processing.

If the output from the Splitter is narrower, as well as shorter, it can be useful to create an SQL extract view beforehand with all the columns that will be needed by the splitter: output and subsetting columns. Don't put a where clause in the view - this will just (probably) slow things down again.

Although it would be possible to write a BetterSplitter transformation, we have to put up with some understandable but less-than-optimal behaviour. The first thing the splitter does is create all the datasets it needs along with all the variables in the source; it does the column subsetting in subseqent steps. So by reducing the source columns in the view, the first step will reduce these variables straight away.

Empirically:

  • SQL Extract with query - anything between ten seconds and 30 minutes
  • Splitter with subsetting if - 2'30" - 4'00"
  • SQL Extract view with Splitter - 0'30"
Submitted By Cameron (talk)



Feel free to comment on this tip.


Prior tip - Next tip - Random Tip

Submit a Tip