Hi, do you guys ever work with the Verhoeff check digit algorithm (https://en.wikipedia.org/wiki/Verhoeff_algorithm) in SAS? I am trying to compute Verhoeff check digits in SAS but haven't found any script to do so.
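For reference, the algorithm is just two table lookups per digit, so a DATA step along these lines should work (a sketch only; the d, p, and inv tables are copied from the linked Wikipedia article, and the data set and variable names are made up):

```sas
/* Sketch of a Verhoeff check-digit DATA step. Assumes the input
   number is a character value containing only digits. */
data verhoeff;
  array d[0:9, 0:9] _temporary_ (
    0 1 2 3 4 5 6 7 8 9
    1 2 3 4 0 6 7 8 9 5
    2 3 4 0 1 7 8 9 5 6
    3 4 0 1 2 8 9 5 6 7
    4 0 1 2 3 9 5 6 7 8
    5 9 8 7 6 0 4 3 2 1
    6 5 9 8 7 1 0 4 3 2
    7 6 5 9 8 2 1 0 4 3
    8 7 6 5 9 3 2 1 0 4
    9 8 7 6 5 4 3 2 1 0);
  array p[0:7, 0:9] _temporary_ (
    0 1 2 3 4 5 6 7 8 9
    1 5 7 6 2 8 3 0 9 4
    5 8 0 3 7 9 6 1 4 2
    8 9 1 6 0 4 3 5 2 7
    9 4 5 3 1 2 6 8 7 0
    4 2 8 6 5 7 3 9 0 1
    2 7 9 3 8 0 6 4 1 5
    7 0 4 6 9 1 3 5 8 2);
  array inv[0:9] _temporary_ (0 4 3 2 1 5 6 7 8 9);
  length num $20;
  input num $;
  c = 0;
  n = lengthn(num);
  /* walk the digits right to left; i=1 is the rightmost digit */
  do i = 1 to n;
    dig = input(char(num, n - i + 1), 1.);
    c = d[c, p[mod(i, 8), dig]];
  end;
  check = inv[c];   /* for num=236 this should give check=3 */
  keep num check;
datalines;
236
;
run;
```

To validate a number that already carries its check digit, the same loop runs with `p[mod(i-1, 8), dig]` (index starting at 0) and the number is valid when the final `c` is 0.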
Hi,
I am new to SAS Admin. I am looking to give read-only permission to Redshift tables in SAS DI Studio for a specific SAS user group in PROD environment.
Example:
CDP SAS Users Group - I have assigned it to the Redshift Read Only Users group.
I have created two auth domains, one for each group.
I have a library that accesses Redshift using CDP_Redshift_Prod, which has write access. See below:
LIBNAME ROCOUR SASIORST DSN=prod_cdp_ods_sas SCHEMA=xxxx AUTHDOMAIN="CDP_Redshift_Prod" ;
My issue is, I only want members of the CDP SAS Users group to have read-only metadata and read-only data access to the Redshift tables. How can I achieve this, given that I cannot change the configuration of the existing library to use CDP_Redshift_Prod_RO?
I tried adding the CDP SAS Users group to the Redshift Write Users group and granting only Read via an ACT, but I can still write to the Redshift tables.
Any help would be more than welcome.
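One possible direction (an assumption, not a confirmed fix, since metadata ACTs do not control what the database credentials themselves permit): define a second library whose auth domain stores Redshift credentials with SELECT-only grants, and add ACCESS=READONLY so the SAS engine itself rejects writes. The ROCOUR_RO libref below is hypothetical; the CDP_Redshift_Prod_RO auth domain is the one mentioned above:

```sas
/* Hypothetical read-only companion library: AUTHDOMAIN maps to a
   Redshift account with SELECT-only grants, and ACCESS=READONLY
   makes SAS reject writes regardless of database grants */
LIBNAME ROCOUR_RO SASIORST DSN=prod_cdp_ods_sas SCHEMA=xxxx
        AUTHDOMAIN="CDP_Redshift_Prod_RO" ACCESS=READONLY;
```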
Hi, I have data like below and I would like to convert each variable into a (test, result) pair. I created a macro and call it once per variable. The problem is that I have more than 400,000 records with 100 variables (which means 100 tests), so each call takes a lot of time, and when I stack the results I get a low-disk-space notification and cannot run my program to completion. Is there any solution for this?
Data I have:
data have;
input Obs $1-2 id $3-7 DT $8-17 var6_d0 $18-20 var20 $21-23 var30 $24-26 var40 $27-29 var70 $30-32 var6_d1 $33-35 var21 $36-38 var31 $39-41 var41 $42-44 var71 $45-47;
datalines;
1 1001 23MAY2023 No
2 1001 23MAY2023 No
3 1001 23MAY2023
4 1001 23MAY2023
1 1001 23MAY2023
2 1001 23MAY2023
3 1001 23MAY2023
4 1001 23MAY2023
1 1001 23MAY2023
2 1001 23MAY2023
3 1001 23MAY2023
4 1001 23MAY2023
1 1001 23MAY2023
2 1001 23MAY2023
3 1001 23MAY2023
4 1001 23MAY2023
1 1001 23MAY2023
2 1001 23MAY2023
3 1001 23MAY2023
4 1001 23MAY2023
1 1001 20JUN2023 No
2 1001 20JUN2023 No
3 1001 20JUN2023
4 1001 20JUN2023
1 1001 20JUN2023
2 1001 20JUN2023
3 1001 20JUN2023
4 1001 20JUN2023
1 1001 20JUN2023
2 1001 20JUN2023
3 1001 20JUN2023
4 1001 20JUN2023
1 1001 20JUN2023
2 1001 20JUN2023
3 1001 20JUN2023
4 1001 20JUN2023
1 1001 20JUN2023
2 1001 20JUN2023
3 1001 20JUN2023
4 1001 20JUN2023
1 1001 15AUG2023 No
2 1001 15AUG2023 No
3 1001 15AUG2023
4 1001 15AUG2023
1 1001 15AUG2023
2 1001 15AUG2023
3 1001 15AUG2023
4 1001 15AUG2023
1 1001 15AUG2023
2 1001 15AUG2023
;
run;
Macro I have created:
%macro convert_variables_columns(test=);
data want;
set have;
where &test. ne "";
test = "&test.";  /* the variable's name */
res  = &test.;    /* the variable's value */
run;
proc sort data=want; by id; run;
%mend;
%convert_variables_columns(test=var6_d0);
%convert_variables_columns(test=var20);
%convert_variables_columns(test=var30);
%convert_variables_columns(test=var40);
%convert_variables_columns(test=var70);
%convert_variables_columns(test=var6_d1);
%convert_variables_columns(test=var21);
%convert_variables_columns(test=var31);
%convert_variables_columns(test=var41);
%convert_variables_columns(test=var71);
Data I want:
data want;
input obs $1-2 id $3-7 dt $8-17 test $18-25 res $26-28;
datalines;
1 1001 23MAY2023 var6_d0 No
2 1001 23MAY2023 var40 No
1 1001 20JUN2023 var6_d0 No
2 1001 20JUN2023 var40 No
1 1001 15AUG2023 var6_d0 No
2 1001 15AUG2023 var40 No
;
run;
Thanks, Chi
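For what it's worth, a single DATA step with an array can produce the long layout in one pass, with no macro loop and no stacking of intermediate data sets. A sketch, assuming all ten listed variables are character (extend the array list for the full 100 variables):

```sas
/* One-pass wide-to-long sketch: for each row, output one record
   per non-missing test variable, carrying the name and the value */
data want(keep=obs id dt test res);
  set have;
  length test $32 res $8;
  array t[*] $ var6_d0 var20 var30 var40 var70
               var6_d1 var21 var31 var41 var71;
  do i = 1 to dim(t);
    if t[i] ne "" then do;
      test = vname(t[i]);   /* variable name, e.g. var6_d0 */
      res  = t[i];          /* its value, e.g. No */
      output;
    end;
  end;
run;

proc sort data=want; by id; run;
```

Reading the source data once instead of 100 times should also remove the disk-space pressure, since only the non-missing rows are ever written out.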
Hi,
I took the SAS 9.4 Base Certification practice exam. Below are the steps for question 13.
This project will use data set cert.input36. At any time, you may save your program as program36 in cert\programs. Write a SAS program that will clean the data in cert.input36 as follows:
Step 1:
create a temporary data set, cleandata36.
In this data set, convert all group values to upper case.
Then keep only observations with group equal to 'A' or 'B'.
Step 2:
Determine the MEDIAN value for the Kilograms variable for each group (A,B) in the cleandata36 data set. Round MEDIAN to the nearest whole number.
Step 3:
create results.output36 from cleandata36
Ensure that all values for variable Kilograms are between 40 and 200, inclusively.
If the value is missing or out of range, replace the value with the MEDIAN Kilograms value for the respective group (A,B) calculated in step 2.
How many observations are in results.output36?
The original dataset had 5000 observations.
My answer was 4897 observations.
The practice exam gives 4992 observations as the answer.
I can't figure out why there is a 95 observation difference.
I did answer the second question regarding the median correctly.
Thanks for your help.
I used the following code:
data cleandata36;
set cert.input36;
group = upcase(group);
where group in ('A' 'B');
run;

proc means data=cleandata36 median maxdec=0;
class group;
var kilograms;
run;

data results.output36;
set cleandata36;
if Group = 'A' and kilograms lt 40 or kilograms gt 200 then kilograms = 79;
if Group = 'B' and kilograms lt 40 or kilograms gt 200 then kilograms = 89;
run;

proc means data=results.output36 maxdec=2 min max mean median n;
class group;
var kilograms;
run;
The answer code is:
data work.cleandata36;
set cert.input36;
group=upcase(group);
if group in ('A','B');
run;
proc means data=work.cleandata36 median;
class group;
var kilograms;
run;
data results.output36;
set cleandata36;
if Kilograms < 40 or Kilograms > 200 then do;
if group='A' then kilograms=79;
else kilograms=89;
end;
run;
proc contents data=results.output36;
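If it helps pin down the 95-observation gap: a WHERE statement is evaluated against the incoming data set before any DATA step statements run, so `where group in ('A' 'B')` tests the original mixed-case values and drops rows whose group is, say, 'a' or 'b', while the answer code's subsetting IF tests the already-upcased values. A minimal sketch of the difference (assuming cert.input36 contains some lowercase group values):

```sas
/* WHERE filters input rows before any statements execute, so the
   upcase() never gets a chance to rescue lowercase groups */
data via_where;
  set cert.input36;
  group = upcase(group);
  where group in ('A','B');  /* tests the raw input values */
run;

/* A subsetting IF runs in statement order, after upcase() */
data via_if;
  set cert.input36;
  group = upcase(group);
  if group in ('A','B');     /* tests the upcased values */
run;
```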
Suppose I sample 30 widgets from a manufacturing line over the course of a 24 hour day. The sampling times are not necessarily random or evenly distributed. Each widget is inspected and either passes or fails. So I have data like:
data have ;
input time time5. fail ;
format time time5. ;
cards ;
00:45 0
01:00 1
02:45 0
03:15 1
03:45 1
05:30 1
05:45 0
06:00 1
06:30 0
06:45 1
07:00 0
09:00 1
10:15 1
12:30 1
14:00 0
16:00 0
17:15 0
17:30 0
17:45 0
19:00 0
20:45 0
21:30 0
21:45 0
22:00 0
22:15 0
22:30 0
22:45 0
23:00 0
23:15 0
23:30 0
;
run ;
If I plot the data I get:
And looking at the plot, I see two clusters. There was a high failure rate between 0:00 and 12:30, and after 14:00 the failure rate was zero. Of course usually the pattern isn't as clear. I could have data like:
At first it looks like there may be three clusters, with a low-failure-rate cluster between 10:00 and 16:00. But when you think about it, there are only four data points in that period. If the failure rate were constant during the day (.3), it would not be unusual to have four consecutive successes (.7**4 = .24). So maybe the correct interpretation is that there are no clusters in that data. Or maybe there are two clusters: 00:00-09:30 with a high failure rate and 10:00-24:00 with a lower one.
I think what I'm looking for is a way to group the data into clusters (by time) such that each cluster's failure rate differs significantly from that of its neighboring clusters.
Or maybe another thing I might want is to identify any clusters (by time) where I can conclude that the failure rate is, say, <.1 (with 95% confidence).
Would appreciate any thoughts/suggestions. I realize this is not a well-framed question. I feel like I want some sort of cluster analysis? ChatGPT recommended looking into "temporal scan" methods which apparently create different time window sizes and then compare them. But I didn't find much on that approach that seemed helpful.
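Not an answer, but the ".7**4" reasoning above can be mechanized, roughly in the spirit of the temporal-scan idea: slide a fixed-size window over the time-ordered samples and ask how surprising each window's failure count is under the overall rate. A rough sketch; the window size K=4 is arbitrary, the whole approach is an assumption, and the LAG calls are hard-coded for that K:

```sas
%let K = 4;   /* window size; lags below are hard-coded to match */

/* overall failure rate across the day */
proc sql noprint;
  select mean(fail) into :p0 trimmed from have;
quit;

data windows;
  set have;
  /* failures among the last &K consecutive samples */
  fails = fail + sum(lag1(fail), lag2(fail), lag3(fail));
  if _n_ >= &K;
  /* PROBBNML(p,n,m) = P(X <= m) for Binomial(n,p) */
  p_low  = probbnml(&p0, &K, fails);                /* unusually few failures? */
  p_high = ifn(fails = 0, 1,
               1 - probbnml(&p0, &K, fails - 1));   /* unusually many failures? */
  keep time fails p_low p_high;
run;
```

Windows where p_low (or p_high) falls below some threshold would mark candidate low-rate (or high-rate) stretches; proper temporal-scan methods additionally vary the window size and adjust for testing many overlapping windows.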