Creating an AI Assistant for SAS Viya in 5 steps (@sassoftware/viya-assistantjs) - Part I
Recent Library Articles
Recently in the SAS Community Library: SAS' @kumardeva debunks the myth that developing AI assistants is too hard. He shows you how to use the @sassoftware/viya-assistantjs library to jump start your development.
If you've seen any of my recent posts, you know that I've been struggling with this for a while now. My supervisor keeps telling me it's easy and it should have only taken a day. What I'm finding is that it's fairly easy to generate the numbers, but very time-consuming and tedious to rearrange them into a summary table. Proc Report seems to help a lot, but there is still a lot of data prep to get to a dataset that can be fed into Proc Report.
Anyway, as the title says, I'm trying to create a summary table. Below is sample data, and the code that I have so far. PUFFWGT is the person weight, while PUFF001 is a replicate weight to account for effects of the sampling/survey design. There are actually 100 replicate weights, but I only included 2 as an example.
Also below is an example table that shows the desired layout. This table was created with the TableN macro, which works great, but unfortunately doesn't work with weights. I'm not too concerned with labels right now, I just need the correct numbers in the correct place. There are a few differences between the Proc Report output and the desired table:
--There is no column for missing.
--There is no column for Total. In each row, this should be the sum of the Missing, no ADRD, and ADRD groups. The Total row has the correct n, but not the correct percentage.
--Once the Total column is created, I can drop the Frequency columns.
--The percentages in () should be column totals for each variable. So the percentage of DEM_AGE=1 + DEM_AGE=2 should equal 100%
--Need to add a column for the p-values from the chisq file.
data have;
infile datalines dsd dlm=',' truncover;
input TM_group ADRD_group DEM_AGE DEM_SEX PUFFWGT PUFF001-PUFF002;
datalines;
0,0,1,1,2110.5357,676.0632354,3108.917787
0,1,1,1,918.4676314,1106.627216,11398.01208
0,0,1,1,2758.096955,4735.215718,1418.690422
0,1,1,1,2456.208457,918.6245816,4548.556261
0,1,1,1,616.4292264,1033.661611,1213.613946
0,0,1,1,3673.275956,6316.504889,1704.047563
0,1,1,1,1102.850296,304.9154776,904.2769054
0,0,1,2,2714.610724,877.3159168,993.7824266
0,1,2,2,20090.52479,6085.947656,7590.092351
1,1,1,2,9098.780523,3498.802425,1335.317433
1,1,1,2,7196.92957,11977.46854,1240.066816
1,1,1,1,842.3146743,1255.501335,4314.886534
1,0,2,2,13516.51008,4775.164303,1106.604846
1,0,1,2,4460.828162,1083.99043,943.9937287
1,0,1,2,4818.845124,9286.932256,8197.376118
1,0,1,1,2848.023551,4739.321046,1113.329766
1,.,1,2,3050.820193,805.9607077,761.0495075
1,.,1,1,4256.565135,1176.233283,1775.665732
1,1,1,1,5211.416525,1570.093954,291.3352611
1,1,1,2,4181.422692,8242.459467,2573.543707
1,1,1,1,3179.943422,4688.70721,6518.281019
.,.,1,1,3451.708714,4256.625072,1167.477312
.,.,1,1,3018.745468,4729.150346,1565.75873
.,.,1,2,4098.775,1486.742512,1283.25
.,.,1,1,3483.708151,1098.980881,1349.598
;RUN;
/*Sort analysis file on BY variables, as required by PROC SURVEYFREQ*/
PROC SORT data=have;
BY TM_group ADRD_group; RUN;
/*** SECTION 1 -- Examine weighted freqs for each group ***/
/*Section 1.1 -- Weighted freqs - both person and sample weights*/
/*Section 1.1.1 -- Use ODS to save results into a file*/
ods output crosstabs = temp.freqs_raw_t1
chisq = temp.chisq_raw_t1;
/*Section 1.1.2 -- Generate weighted freqs*/
title1 "Section 1.1.2 -- Weighted freqs - both person and sample weights V&version.";
PROC SURVEYFREQ data=have VARMETHOD=BRR missing;
WEIGHT PUFFWGT;
REPWEIGHTS PUFF001-PUFF002;
BY TM_group;
TABLES DEM_AGE*ADRD_group DEM_SEX*ADRD_group / rschisq;
RUN;
/*** SECTION 2 -- Reorganize ODS output tables ***/
/*Section 2.1 -- freqs table*/
/*Keep needed variables and use macro to populate Value_Labels with proper values*/
DATA temp.freqs_t1_t; SET temp.freqs_raw_t1;
DROP WgtFreq /*Frequency*/ StdDev StdErr _SkipLine;
/*Create Value_Labels column and populate it with labels from each variable*/
Value_Labels = " ";
%macro val_labs (val_labs);
IF Table = "Table &val_labs. * ADRD_group" THEN Value_Labels = F_&val_labs.;
DROP F_&val_labs. &val_labs.;
%mend;
%val_labs(DEM_AGE)
%val_labs(DEM_SEX)
/*Reformat and combine weighted count and percent columns*/
PERCENT_RD = ROUND(Percent,.01);
Freq_percent = " ";
Freq_percent = COMPRESS(Frequency||"("||put(PERCENT_RD,8.2)||"%"||")");
/*IF Frequency <10 THEN Freq_percent = "<10"*/;
DROP /*Frequency*/ Percent PERCENT_RD; /*Retain Frequency because it is needed for the Total column*/
RUN;
DATA temp.chisq_2nd_t1; SET temp.chisq_raw_t1;
IF Label1 ^= "Pr > ChiSq" THEN delete; RUN;
/*REVISE THIS SECTION TO INCLUDE MISSING COLUMN*/
/*Section 2.1.1 -- Put columns in correct order*/
PROC REPORT data=temp.freqs_t1_t;
COLUMNS table Value_Labels TM_group, ADRD_group,(Freq_percent Frequency);
DEFINE table / group;
DEFINE value_labels / group;
DEFINE TM_group / across;
DEFINE ADRD_group /across;
RUN;
/*Section 2.2 -- chisq table*/
/*Drop unneeded variables, delete results for missing cases, and only keep Rao-Scott chisq p-value rows*/
DATA temp.chisq_t1_t; SET temp.chisq_raw_t1;
DROP Name1 nValue1;
IF TM_group = . THEN delete;
/*IF ADRD_group = . THEN delete;*/
IF Label1 ^= "Pr > ChiSq" THEN delete;
RUN;
MA_group
TM_group
Missing
no ADRD
ADRD
Total
P-value
Missing
no ADRD
ADRD
Total
P-value
(N=14)
(N=21836)
(N=1232)
(N=23082)
(N=17)
(N=29642)
(N=1475)
(N=31134)
Age group, n (%)
<.0001 1
<.0001 1
2:Age Group [65,75)
4 (28.6%)
8701 (39.8%)
179 (14.5%)
8884 (38.5%)
5 (29.4%)
12504 (42.2%)
215 (14.6%)
12724 (40.9%)
3:Age Group >=75
10 (71.4%)
13135 (60.2%)
1053 (85.5%)
14198 (61.5%)
12 (70.6%)
17138 (57.8%)
1260 (85.4%)
18410 (59.1%)
Gender, n (%)
<.0001 1
0.00751
1:Male
6 (42.9%)
9479 (43.4%)
453 (36.8%)
9938 (43.1%)
6 (35.3%)
13623 (46.0%)
619 (42.0%)
14248 (45.8%)
2:Female
8 (57.1%)
12357 (56.6%)
779 (63.2%)
13144 (56.9%)
11 (64.7%)
16019 (54.0%)
856 (58.0%)
16886 (54.2%)
... View more
I have the following input dataset ID Var1 Var1 11 A 23 11 B 121 11 A 23 12 A 32 12 B 158 12 A 32 12 B 158 13 A 87 13 B 567 I want to remove the repetitive records by ID variable and get the following output. ID Var1 Var1 11 A 23 11 B 121 12 A 32 12 B 158 13 A 87 13 B 567 Please suggest the best way to do it.
... View more
I have a dataset with five variables I would like to use to display the data on a graph. The data is grouped by the first variable (p) and sub-grouped by the second variable (i) along the x-axis. The fourth variable (nor) determines the height of the point along the y-axis. The fifth variable (mnor) is the mean of the nor values for each subgroup. I want to replace the circle with a unique symbol determined by the third variable (date), so that the graph is easier to interpret the results from different days the data was collected. I have been able to replace the point with the date but would prefer a symbol to minimize clutter. I've included a sample code below. The first graph is my original graph without including the date variable, and the second graph is replacing the points with the date value. data samp;
input p$ i date: date9. nor mnor;
format date date9.;
datalines;
A 0 16May2024 1.230 1.000
A 0 23May2024 0.770 1.000
A 1 16May2024 0.014 0.021
A 1 23May2024 0.028 0.021
B 0 16May2024 1.576 1.504
B 0 23May2024 1.432 1.504
B 1 16May2024 0.064 0.071
B 1 23May2024 0.078 0.071
C 0 16May2024 0.432 0.510
C 0 23May2024 0.588 0.510
C 1 16May2024 0.011 0.009
C 1 23May2024 0.007 0.009
;
title "Example Graph 1";
proc sgplot data = samp noborder;
scatter x = p y = nor /
markerattrs = (symbol=CircleFilled) group = i groupdisplay = cluster;
highlow x = p low = mnor high = mnor / nofill type = bar barwidth = 0.4 group = i groupdisplay = cluster;
xaxis type = discrete labelattrs = (size = 9) display = (nolabel);
yaxis labelattrs = (size = 9) display = (nolabel) grid;
run;
title "Example Graph 2";
proc sgplot data = samp noborder;
scatter x = p y = nor /
markercharattrs = (weight = bold) markerchar = date group = i groupdisplay = cluster;
highlow x = p low = mnor high = mnor / nofill type = bar barwidth = 0.4 group = i groupdisplay = cluster;
xaxis type = discrete labelattrs = (size = 9) display = (nolabel);
yaxis labelattrs = (size = 9) display = (nolabel) grid;
run;
... View more
SAS compute sessions have always been heavily depending on good disk I/O performance. In this blog I am covering some typical choices for configuring the SASWORK storage for SAS compute sessions.
... View more
Hello In a typical SAS 9.x environment, users often create datasets etc. that can be permanently stored on a physical disk. (Local, shared etc.) Moving to SAS Viya 4 on AWS I am exploring the options where a user can get the same experience and store the data on some form of permanent storage disk. (Content server, database excluded). This could be a storage on AWS itself. Wondering if somebody can enlighten me about the options available.
... View more