I have a report (e.g. below) where a set of list controls control the data used in the scatter plot and a simple univariate linear regression. The table below has some reference x values, and the "Predicted" column is derived from the linear regression (right click on object, "Derive predicted", and included as an item in the dataset, avg. aggregation.) The linear regression itself updates whenever I select/deselect any item in the controls. But the 'Predicted' values are the values from the initial version of the regression from which I created the field. Is there any way to get the predicted y values from the regression updating dynamically and displayed in the report below? It should change whenever I change the list control selection. I just want the report viewer to be able to see the estimated y values at certain x values as they slice data in different combinations. Thank you!
The app game is now available in the SAS Global Forum conference app. See the Click Game icon for our version of a photo scavenger hunt for #SASGF 2018. Play starting now until Wednesday, April 11 at 10:00 a.m. MT to earn points to win prizes.
The top 20 point earners will be entered into a drawing for the following:
YETI Hopper Flip 8 Portable Cooler
ENO Hammock
Echo Spot
Moodo Scent Machine
Let the games begin! Can't wait to see the photos.
Hello everyone: I'm writing a task, and one of the objects (daterange) returns a pair of dates in two macro variables. I can read these two variables properly into a table with the following code:
options locale=English_UnitedStates;
%let d1 = February 24, 2003;
data test;
   dd = input("&d1", nldate200.);
run;
But I can't read the date in the macro variable and save it in another one with the format I need (date9. would be ideal). I have tried the following, but it doesn't work:
options locale=English_UnitedStates;
%let d1 = February 24, 2003;
%let d2 = %sysfunc(inputn("&d1", nldate200.), date9.);
%put &d2;
Any help is appreciated.
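A possible explanation, offered as an untested sketch rather than a confirmed answer: inside %SYSFUNC the arguments are plain macro text, so the double quotes become part of the value and the unmasked comma in the date is read as an argument separator. Assuming that is the cause, dropping the quotes and masking the value with %BQUOTE may be enough:

```sas
options locale=English_UnitedStates;
%let d1 = February 24, 2003;

/* No quotes around the value: %SYSFUNC arguments are already text.  */
/* %BQUOTE masks the comma so INPUTN sees a single source argument.  */
%let d2 = %sysfunc(inputn(%bquote(&d1), nldate200.), date9.);
%put &d2;
```

The informat and format are the ones from the question; only the quoting changes.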
I've been a long-time SAS user and wanted to try out CAS. I came across the concept of a Personal CAS Server, which is like a sandbox environment, so it seems perfect for my process right now. But there is surprisingly no information to be found on how to set it up. So how do I set up a Personal CAS Server? Does anyone know?
Some people need time to warm up to new things. As a member of the Global Enablement and Learning team, I see this every day. While "late adopters" are generally quite willing to learn the latest twist or tweak on something they already know well, they balk at the truly novel and at complete paradigm shifts. So it is with CAS. Many SAS users are taking a "long" approach.
One group that might be reluctant to adopt CAS is, surprisingly, SAS' most ardent supporters. These users have invested a lot of time in their SAS knowledge. With so much invested, they can be reluctant to work in another construct where they aren't as confident or skilled.
So, for you SAS true believers, let's look at how to get the best performance in CAS and compare it to SAS (standard disclaimers apply about these tests being done on non-optimal, virtual hardware). Hopefully, once you know how to get vastly better performance from CAS than from single-engine (but still awesome!) Base SAS, you'll be more comfortable giving it a try.
1. Group BY Aggregation -- Low Cardinality
When choosing data processing techniques in base SAS, you usually only have DATA Step or PROCs. You have these in CAS as well, but, when looking for optimum performance, the best place to start is with the CAS Actions, and aggregation is no exception.
Thanks to Nicolas Robert, we already know which CAS action aggregates the fastest (at least in his scenario), simple.summary. So, let's compare its performance on some big data against base SAS on the same.
Test Parameters
| Test Parameter | Value |
| --- | --- |
| Input Rows | 160 million |
| Distinct BY Groups (Cardinality) | 8 |
CAS Code -- Simple.Summary
proc cas;
   simple.summary result=r status=s /
      inputs={"revenue","expenses"},
      subSet={"SUM"},
      table={
         name="mega_corp",
         caslib="visual",
         groupBy={"facilityType","productline"}
      },
      casout={name="summaryMC", replace=True, replication=0};
quit;
Base SAS Code -- PROC MEANS
proc means data=mega_corp noprint;
   var revenue expenses;
   class facilityType productline;
   output out=summaryMC sum(revenue)=sumRevenue sum(expenses)=sumExpenses;
run;
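One detail worth noting when comparing outputs: with a CLASS statement and no NWAY option, PROC MEANS also writes the overall total and each one-variable subtotal to the output data set (distinguished by _TYPE_). A minimal variation, assuming only the full facilityType-by-productline combinations are wanted, adds NWAY. This is a sketch for comparison purposes, not the code that was timed, and the output name summaryMC_nway is made up for illustration.

```sas
/* Variation on the PROC MEANS step above (not the timed code).        */
/* NWAY keeps only the rows for the full CLASS-variable combination,   */
/* which is closer to what a groupBy aggregation returns.              */
proc means data=mega_corp noprint nway;
   var revenue expenses;
   class facilityType productline;
   /* summaryMC_nway is a hypothetical output name */
   output out=summaryMC_nway sum(revenue)=sumRevenue sum(expenses)=sumExpenses;
run;
```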
Results
| Engine | Method | Real Time |
| --- | --- | --- |
| CAS | Simple.Summary | 7.39 seconds |
| SAS | PROC MEANS | 2:32.44 (min:sec) |
2. Group BY Aggregation -- High Cardinality
There has been some talk that CAS does not perform well with high cardinality operations. Let's take a look by increasing the number of BY-Groups. We'll use the same code as above but replace the GroupBy and CLASS variables with productID, date, and unit. This gives us approximately 88,000 distinct groups.
Test Parameters
| Test Parameter | Value |
| --- | --- |
| Input Rows | 160 million |
| Distinct BY Groups (Cardinality) | 88,000 |
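For reference, the high-cardinality run described above can be sketched by swapping the grouping variables into the code from section 1. This is a reconstruction from that description, not the exact code used in the test; everything except the three grouping variables is carried over unchanged.

```sas
/* CAS: same simple.summary call, grouped by productID, date, and unit */
proc cas;
   simple.summary result=r status=s /
      inputs={"revenue","expenses"},
      subSet={"SUM"},
      table={
         name="mega_corp",
         caslib="visual",
         groupBy={"productID","date","unit"}
      },
      casout={name="summaryMC", replace=True, replication=0};
quit;

/* Base SAS: same PROC MEANS step with the new CLASS variables */
proc means data=mega_corp noprint;
   var revenue expenses;
   class productID date unit;
   output out=summaryMC sum(revenue)=sumRevenue sum(expenses)=sumExpenses;
run;
```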
Results
| Engine | Method | Real Time |
| --- | --- | --- |
| CAS | Simple.Summary | 18.22 seconds |
| SAS | PROC MEANS | 2:31.65 (min:sec) |
3. De-Duplication
As with aggregation, picking the right technique is key and, thankfully again, Nicolas Robert has already shown us which method to use for de-duplication, the simple.GroupBy CAS action.
So, let's compare simple.GroupBy with PROC SORT.
Test Parameters
| Test Parameter | Value |
| --- | --- |
| Input Rows | 160 million |
| Unique Keys | 88,000 |
CAS Code -- Simple.GroupBy
proc cas;
   simple.groupBy result=r status=rc /
      inputs={"productID", "date", "unit"},
      table={caslib="casuser", name="mega_corp"},
      casOut={name="dedupMC", replace=true, replication=0};
run;
quit;
Base SAS Code -- PROC SORT
proc sort data=mega_corp nodupkey out=dedupMC;
   by productID date unit;
run;
Results
| Engine | Method | Real Time |
| --- | --- | --- |
| CAS | Simple.GroupBy | 12.57 seconds |
| SAS | PROC SORT | 3:09.29 (min:sec) |
Discussion
So, there you have it. CAS is fast. It is plowing through some decent-sized data here in seconds, on a few (5) relatively small (4-way) virtual servers.
If you want performance like this, however, you need to know which techniques to use. Luckily, some of the hard work has already been done for you. In particular, check out these posts:
CAS answers to 4 common data manipulation tasks – Part 1 – APPEND
CAS answers to 4 common data manipulation tasks – Part 2 – SORT
CAS answers to 4 common data manipulation tasks – Part 3 – DE-DUPLICATE
CAS answers to 4 common data manipulation tasks – Part 4 – AGGREGATE
You'll also need to know more about CAS Actions. In particular, you'll need to know how to enhance them so they do exactly what you want. This post should help with that:
CAS Action Computed Columns
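As a rough illustration of what enhancing an action with a computed column can look like, the sketch below adds a derived column to the input table definition of the simple.summary call used earlier. The column name profit, its formula, and the output table name summaryProfit are assumptions made up for this example; they do not come from the tests above.

```sas
proc cas;
   simple.summary result=r status=s /
      inputs={"profit"},                 /* hypothetical computed column */
      subSet={"SUM"},
      table={
         name="mega_corp",
         caslib="visual",
         groupBy={"facilityType","productline"},
         /* declare the new column, then give the expression that fills it */
         computedVars={{name="profit"}},
         computedVarsProgram="profit = revenue - expenses;"
      },
      /* summaryProfit is a hypothetical output table name */
      casout={name="summaryProfit", replace=True, replication=0};
quit;
```

The computed column is evaluated as the action reads the table, so no intermediate table with the extra column has to be created first.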