
StephenFoerster · 09-18-2020

Some people need time to warm up to new things. As a member of the Global Enablement and Learning team, I see this every day. While "late adopters" are generally quite willing to learn the latest twist or tweak on something they already know well, they balk at the truly novel and at complete paradigm shifts. So it is with CAS. Many SAS users are taking a "long" approach. One group that might be reluctant to adopt CAS is, surprisingly, SAS's most ardent supporters. These users have invested a lot of time in their SAS knowledge. With so much invested, they can be reluctant to work in another construct where they aren't as confident or skilled.

So, for you SAS true believers, let's look at how to get the best performance in CAS and compare it to SAS (standard disclaimers about these tests being done on non-optimal, virtual hardware...). Hopefully, once you know how to get vastly better performance from CAS than from single-engine (but still awesome!) base SAS, you'll be more comfortable giving it a try.

1. Group BY Aggregation -- Low Cardinality

When choosing data processing techniques in base SAS, you usually have only the DATA step or PROCs. You have these in CAS as well, but when looking for optimum performance, the best place to start is with the CAS actions, and aggregation is no exception. Thanks to Nicolas Robert, we already know which CAS action aggregates the fastest (at least in his scenario): simple.summary. So, let's compare its performance on some big data against base SAS on the same data.
Test Parameters

| Test Parameter | Value |
|---|---|
| Input Rows | 160 million |
| Distinct BY Groups (Cardinality) | 8 |

CAS Code -- Simple.Summary

```sas
proc cas ;
   simple.summary result=r status=s /
      inputs={"revenue","expenses"},
      subSet={"SUM"},
      table={
         name="mega_corp"
         caslib="visual"
         groupBy={"facilityType","productline"}
      },
      casout={name="summaryMC", replace=True, replication=0} ;
quit ;
```

Base SAS Code -- PROC MEANS

```sas
proc means data=mega_corp noprint;
   var revenue expenses;
   class facilityType productline;
   output out=summaryMC sum(revenue)=sumRevenue sum(expenses)=sumExpenses;
run;
```

Results

| Engine | Method | Real Time |
|---|---|---|
| CAS | Simple.Summary | 7.39 |
| SAS | PROC MEANS | 2:32.44 |

2. Group BY Aggregation -- High Cardinality

There has been some talk that CAS does not perform well with high-cardinality operations. Let's take a look by increasing the number of BY groups. We'll use the same code as above but replace the groupBy and CLASS variables with productID, date, and unit. This gives us approximately 88,000 distinct groups.

Test Parameters

| Test Parameter | Value |
|---|---|
| Input Rows | 160 million |
| Distinct BY Groups (Cardinality) | 88,000 |

Results

| Engine | Method | Real Time |
|---|---|---|
| CAS | Simple.Summary | 18.22 |
| SAS | PROC MEANS | 2:31.65 |

3. De-Duplication

As with aggregation, picking the right technique is key and, thankfully, Nicolas Robert has again already shown us which method to use for de-duplication: the simple.groupBy CAS action. So, let's compare simple.groupBy with PROC SORT.

Test Parameters

| Test Parameter | Value |
|---|---|
| Input Rows | 160 million |
| Unique Keys | 88,000 |

CAS Code -- Simple.GroupBy

```sas
proc cas;
   simple.groupBy result=r status=rc /
      inputs={"productID", "date", "unit"}
      table={caslib="casuser",name="mega_corp"}
      casOut={name="dedupMC",replace=true,replication=0} ;
run ;
quit ;
```

Base SAS Code -- PROC SORT

```sas
proc sort data=mega_corp nodupkey out=dedupMC;
   by productID date unit;
run;
```

Results

| Engine | Method | Real Time |
|---|---|---|
| CAS | Simple.GroupBy | 12.57 |
| SAS | PROC SORT | 3:09.29 |

Discussion

So, there you have it. CAS is fast.
It is plowing through some decent-sized data here in seconds, on just a few (5) relatively small (4-way) virtual servers. If you want performance like this, however, you need to know which techniques to use. Luckily, some of the hard work has already been done for you. In particular, check out these posts:

- CAS answers to 4 common data manipulation tasks – Part 1 – APPEND
- CAS answers to 4 common data manipulation tasks – Part 2 – SORT
- CAS answers to 4 common data manipulation tasks – Part 3 – DE-DUPLICATE
- CAS answers to 4 common data manipulation tasks – Part 4 – AGGREGATE

You'll also need to know more about CAS actions. In particular, you'll need to know how to enhance them so they do exactly what you want. This post should help with that:

- CAS Action Computed Columns
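The high-cardinality test in section 2 is described only as a change to the grouping variables, so the code for it is not shown. A minimal sketch of what that variant might look like follows, assuming that everything else (table name, caslib, output table names, and statistics) stays exactly as in section 1. This is a reconstruction from the description, not the author's posted code.

```sas
/* Sketch only: high-cardinality variant of the section 1 test.        */
/* Assumption: only the grouping variables change; the table, caslib,  */
/* and output names are reused from the section 1 example.             */
proc cas;
   simple.summary result=r status=s /
      inputs={"revenue","expenses"},
      subSet={"SUM"},
      table={
         name="mega_corp",
         caslib="visual",
         groupBy={"productID","date","unit"}   /* ~88,000 groups */
      },
      casout={name="summaryMC", replace=True, replication=0};
quit;

/* Base SAS counterpart: same PROC MEANS step with the CLASS          */
/* variables swapped to match the CAS groupBy list.                    */
proc means data=mega_corp noprint;
   var revenue expenses;
   class productID date unit;
   output out=summaryMC sum(revenue)=sumRevenue sum(expenses)=sumExpenses;
run;
```

Note that PROC MEANS with a CLASS statement and no NWAY option also writes subtotal rows for every combination of the class variables, so its output table will likely hold more rows than the CAS result.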

Richardvan_tHoff

I want to create a master/detail/subdetail report in Base SAS (PROC REPORT). There would be a master row with its own columns. Each master row has detail rows with different columns, and each detail row has subdetail rows with comments. I don't want the master, detail, and subdetail columns all on one row, and a master with its details and subdetails should stay on the same page. The result will be converted to PDF. The output would look something like the example below:

Process name   Start date   End date     Detail info
---------------------------------------------------------------
Walking        01-01-2024   10-04-2024   I walk very slow

    Subprocess   Start date   End date     Start place   End place
    ---------------------------------------------------------------
    Part 1       01-1-2024    10-01-2024   New York      Amsterdam

        Comments
        -------------------------
        I started very well
        The walk was nice

    Subprocess   Start date   End date     Start place   End place
    ---------------------------------------------------------------
    Part 2       10-1-2024    24-01-2024   Amsterdam     Berlin

        Comments
        -------------------------
        Was perfect

Swimming       10-06-2024   10-07-2024   I love swimming
    detail
        subdetail

kyle234

I have a library that I want to subset to only include tables that are dbms_memtype='table'. Thank you!

victor1893

Hi, I have a table, and I want to store in a macro variable the names of all columns that contain a specific value. For example:

```sas
data have;
   input abc def ghi jkl 8.;
   datalines;
1 4 2 3
6 99 0 8
3 99 6 5
2 4 0 99
;
run;
```

Here I would like to obtain all columns that contain the value 99: def, jkl.