Creating an AI Assistant for SAS Viya in 5 steps (@sassoftware/viya-assistantjs) - Part I
Recent Library Articles
Recently in the SAS Community Library: SAS' @kumardeva debunks the myth that developing AI assistants is too hard. He shows you how to use the @sassoftware/viya-assistantjs library to jump-start your development.
I wanted to know the most efficient way to loop through each variable in my SAS dataset and return its data type.
I was also thinking of changing the format of each variable depending on its data type.
For example, if var1 is Num, set its format to a specific one for numeric variables, and likewise for character variables.
Is there a way for me to loop through and efficiently change all of the variables' formats using an IF/ELSE statement?
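One common alternative to an IF/ELSE loop is to query the metadata directly. Below is a minimal sketch, assuming the dataset is WORK.HAVE and treating the two formats (COMMA12.2 and $CHAR20.) as placeholders for your own choices:

```sas
/* A minimal sketch (WORK.HAVE and both formats are placeholders):
   query DICTIONARY.COLUMNS for each variable's type and build a
   single FORMAT statement in a macro variable. */
proc sql noprint;
   select case when type = 'num'
               then catx(' ', name, 'comma12.2')  /* numeric variables   */
               else catx(' ', name, '$char20.')   /* character variables */
          end
   into :fmt_list separated by ' '
   from dictionary.columns
   where libname = 'WORK' and memname = 'HAVE';
quit;

/* Apply all the formats in one pass, without rewriting the data */
proc datasets lib=work nolist;
   modify have;
   format &fmt_list.;
quit;
```

PROC DATASETS only changes the stored format attributes, so this avoids reading and rewriting every observation.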
If you've seen any of my recent posts, you know that I've been struggling with this for a while now. My supervisor keeps telling me it's easy and it should have only taken a day. What I'm finding is that it's fairly easy to generate the numbers, but very time-consuming and tedious to rearrange them into a summary table. Proc Report seems to help a lot, but there is still a lot of data prep to get to a dataset that can be fed into Proc Report.
Anyway, as the title says, I'm trying to create a summary table. Below is sample data, and the code that I have so far. PUFFWGT is the person weight, while PUFF001 is a replicate weight to account for effects of the sampling/survey design. There are actually 100 replicate weights, but I only included 2 as an example.
Also below is an example table that shows the desired layout. This table was created with the TableN macro, which works great but unfortunately doesn't work with weights. I'm not too concerned with labels right now; I just need the correct numbers in the correct place. There are a few differences between the Proc Report output and the desired table:
--There is no column for missing.
--There is no column for Total. In each row, this should be the sum of the Missing, no ADRD, and ADRD groups. The Total row has the correct n, but not the correct percentage.
--Once the Total column is created, I can drop the Frequency columns.
--The percentages in () should be column totals for each variable. So the percentage of DEM_AGE=1 + DEM_AGE=2 should equal 100%
--Need to add a column for the p-values from the chisq file.
data have;
infile datalines dsd dlm=',' truncover;
input TM_group ADRD_group DEM_AGE DEM_SEX PUFFWGT PUFF001-PUFF002;
datalines;
0,0,1,1,2110.5357,676.0632354,3108.917787
0,1,1,1,918.4676314,1106.627216,11398.01208
0,0,1,1,2758.096955,4735.215718,1418.690422
0,1,1,1,2456.208457,918.6245816,4548.556261
0,1,1,1,616.4292264,1033.661611,1213.613946
0,0,1,1,3673.275956,6316.504889,1704.047563
0,1,1,1,1102.850296,304.9154776,904.2769054
0,0,1,2,2714.610724,877.3159168,993.7824266
0,1,2,2,20090.52479,6085.947656,7590.092351
1,1,1,2,9098.780523,3498.802425,1335.317433
1,1,1,2,7196.92957,11977.46854,1240.066816
1,1,1,1,842.3146743,1255.501335,4314.886534
1,0,2,2,13516.51008,4775.164303,1106.604846
1,0,1,2,4460.828162,1083.99043,943.9937287
1,0,1,2,4818.845124,9286.932256,8197.376118
1,0,1,1,2848.023551,4739.321046,1113.329766
1,.,1,2,3050.820193,805.9607077,761.0495075
1,.,1,1,4256.565135,1176.233283,1775.665732
1,1,1,1,5211.416525,1570.093954,291.3352611
1,1,1,2,4181.422692,8242.459467,2573.543707
1,1,1,1,3179.943422,4688.70721,6518.281019
.,.,1,1,3451.708714,4256.625072,1167.477312
.,.,1,1,3018.745468,4729.150346,1565.75873
.,.,1,2,4098.775,1486.742512,1283.25
.,.,1,1,3483.708151,1098.980881,1349.598
;RUN;
/*Sort analysis file on BY variables, as required by PROC SURVEYFREQ*/
PROC SORT data=have;
BY TM_group ADRD_group; RUN;
/*** SECTION 1 -- Examine weighted freqs for each group ***/
/*Section 1.1 -- Weighted freqs - both person and sample weights*/
/*Section 1.1.1 -- Use ODS to save results into a file*/
ods output crosstabs = temp.freqs_raw_t1
chisq = temp.chisq_raw_t1;
/*Section 1.1.2 -- Generate weighted freqs*/
title1 "Section 1.1.2 -- Weighted freqs - both person and sample weights V&version.";
PROC SURVEYFREQ data=have VARMETHOD=BRR missing;
WEIGHT PUFFWGT;
REPWEIGHTS PUFF001-PUFF002;
BY TM_group;
TABLES DEM_AGE*ADRD_group DEM_SEX*ADRD_group / rschisq;
RUN;
/*** SECTION 2 -- Reorganize ODS output tables ***/
/*Section 2.1 -- freqs table*/
/*Keep needed variables and use macro to populate Value_Labels with proper values*/
DATA temp.freqs_t1_t; SET temp.freqs_raw_t1;
DROP WgtFreq /*Frequency*/ StdDev StdErr _SkipLine;
/*Create Value_Labels column and populate it with labels from each variable*/
Value_Labels = " ";
%macro val_labs (val_labs);
IF Table = "Table &val_labs. * ADRD_group" THEN Value_Labels = F_&val_labs.;
DROP F_&val_labs. &val_labs.;
%mend;
%val_labs(DEM_AGE)
%val_labs(DEM_SEX)
/*Reformat and combine weighted count and percent columns*/
PERCENT_RD = ROUND(Percent,.01);
Freq_percent = " ";
Freq_percent = COMPRESS(Frequency||"("||put(PERCENT_RD,8.2)||"%"||")");
/*IF Frequency <10 THEN Freq_percent = "<10"*/;
DROP /*Frequency*/ Percent PERCENT_RD; /*Retain Frequency because it is needed for the Total column*/
RUN;
DATA temp.chisq_2nd_t1; SET temp.chisq_raw_t1;
IF Label1 ^= "Pr > ChiSq" THEN delete; RUN;
/*REVISE THIS SECTION TO INCLUDE MISSING COLUMN*/
/*Section 2.1.1 -- Put columns in correct order*/
PROC REPORT data=temp.freqs_t1_t;
COLUMNS table Value_Labels TM_group, ADRD_group,(Freq_percent Frequency);
DEFINE table / group;
DEFINE value_labels / group;
DEFINE TM_group / across;
DEFINE ADRD_group /across;
RUN;
/*Section 2.2 -- chisq table*/
/*Drop unneeded variables, delete results for missing cases, and only keep Rao-Scott chisq p-value rows*/
DATA temp.chisq_t1_t; SET temp.chisq_raw_t1;
DROP Name1 nValue1;
IF TM_group = . THEN delete;
/*IF ADRD_group = . THEN delete;*/
IF Label1 ^= "Pr > ChiSq" THEN delete;
RUN;
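A hedged sketch of two of the remaining steps listed above (the Total column and the p-value column), assuming Frequency is still present in temp.freqs_t1_t (as noted in the comments) and that both ODS tables carry Table and TM_group columns — verify the names against your actual datasets:

```sas
/* Hedged sketch: sum Frequency across the ADRD_group levels within
   each report row to get the Total column, then join the Rao-Scott
   p-values. Column names are assumptions based on the ODS output. */
proc sql;
   create table temp.freqs_tot as
   select a.*, sum(a.Frequency) as Total_Freq
   from temp.freqs_t1_t as a
   group by a.Table, a.Value_Labels, a.TM_group;
quit;

proc sql;
   create table temp.freqs_final as
   select a.*, b.cValue1 as p_value
   from temp.freqs_tot as a
   left join temp.chisq_t1_t as b
     on a.Table = b.Table and a.TM_group = b.TM_group;
quit;
```

The GROUP BY with a.* re-merges the summary value back onto every detail row, which is the behavior wanted here (SAS SQL will log a note about remerging).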
                     MA_group                                                             TM_group
                     Missing     no ADRD        ADRD          Total          P-value      Missing     no ADRD        ADRD          Total          P-value
                     (N=14)      (N=21836)      (N=1232)      (N=23082)                   (N=17)      (N=29642)      (N=1475)      (N=31134)
Age group, n (%)                                                             <.0001 1                                                            <.0001 1
2:Age Group [65,75)  4 (28.6%)   8701 (39.8%)   179 (14.5%)   8884 (38.5%)                5 (29.4%)   12504 (42.2%)  215 (14.6%)   12724 (40.9%)
3:Age Group >=75     10 (71.4%)  13135 (60.2%)  1053 (85.5%)  14198 (61.5%)               12 (70.6%)  17138 (57.8%)  1260 (85.4%)  18410 (59.1%)
Gender, n (%)                                                                <.0001 1                                                            0.00751
1:Male               6 (42.9%)   9479 (43.4%)   453 (36.8%)   9938 (43.1%)                6 (35.3%)   13623 (46.0%)  619 (42.0%)   14248 (45.8%)
2:Female             8 (57.1%)   12357 (56.6%)  779 (63.2%)   13144 (56.9%)               11 (64.7%)  16019 (54.0%)  856 (58.0%)   16886 (54.2%)
First, let me say that I'm a fan of the Reddit community. I've used it to learn tricks about all types of topics, including home improvement and video game secrets. I've also answered SAS questions in some subreddits. However, over the past several months we've seen many old Reddit topics that are copy/pasted into new threads here on the SAS Community.
This isn't Reddit's fault and has nothing to do with the original authors of the topics. Instead, it's an approach by spammers to create what seems like a legitimate topic on the forums, get some credibility for their profile, and then follow it up with other replies that link to unrelated commercial sites.
The trick they use is to select some of the more provocative topics on Reddit, like "why is SAS so difficult" or "Should I learn SAS instead of Python, what do you guys think." We have no objection to authentic questions like these from community members, but we do not allow this inauthentic approach to generating engagement for misleading commercial purposes.
Ours is not the only community that experiences this. Many of our industry peers who manage other communities are reporting that they see the same thing on their forums.
When we spot cases like these we take action. We mark the topic as Spam (to remove it from view) and then we ban the user account that posted it. How can we tell the content came from Reddit? There is a trick you can use with Google search: copy a unique phrase from the post and paste it into a Google search field in quotation marks to find the exact phrase in other internet sites.
In a recent example, a thread contained the phrase "SAS seems astonishingly unintuitive and overly rigid". A search for this exact phrase yielded a single result: a Reddit thread from 2019. That's all I needed to confirm that this was not an authentic post for our community, but an effort to leverage the popularity of our site for an unrelated purpose.
If you see a community post that seems provocative like this, think twice before you respond. Many community members are quick to jump in and advocate for SAS and encourage the original poster to stick with it and learn more...but we'd hate for you to invest time in a reply that gets deleted because the topic was not genuine.
If you see/suspect spam topics like these, use the Report Inappropriate Content menu item on the message to let us know. We can investigate and then take action as needed.
As always, thank you for your advocacy and for helping fellow SAS users on the community!
Obs    months
  1         2
  2         2
  3         6
  4         2

I want to merge the dataset above with other datasets based on the value of the months variable. For example, I want to merge two months of data for the first record, two months for the second record, six months for the third record, and two months for the fourth record. Is there a systematic way of doing this? I was thinking about creating a variable based on the months variable and creating a loop within a macro to achieve it.
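One systematic alternative to a macro loop, sketched with assumed names (a dataset HAVE with variables Obs and months, and a monthly key named month): expand each record into one row per month, then do an ordinary merge/join on the expanded key.

```sas
/* A minimal sketch (dataset and variable names are assumptions):
   turn "months = 6" into six rows, one per month, so the monthly
   datasets can be merged with a plain join instead of a macro loop. */
data have_expanded;
   set have;                    /* one row per Obs, with MONTHS */
   do month = 1 to months;
      output;                   /* one row per (Obs, month) pair */
   end;
run;

/* Then merge/join the monthly datasets on Obs and month as usual */
```

This keeps the "how many months" logic in the data rather than in macro code, so a fifth record with months=12 needs no changes.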
Hello community,
I am struggling to find the components of a normal mixture distribution when using a CLASS variable: where can I find, for each category, the parameters needed to write my probability function?
My first step was to fit my function to my data like that:
ods graphics on /height=900 width=1600;
ods select DensityPlot;
proc hpfmm data=casuser.ds_11_15 gconv=0 plots=density(bins=90);
class reg_clim_num;
bayes MUPRIORPARMS((0, 2E12 ) (-40000, .) ) ;
model dk_cons =/ dist=normal k=2 /*parms(23000000 ., 34000000 ., 120000000 . )*/;
ods output PostIntervals=PostIntervals;
ods output PostSummaries=PostSummaries;
run;
Then I would like to estimate the parameters by my category variable reg_clim:
ods graphics on /height=900 width=1600;
ods select DensityPlot;
proc hpfmm data=casuser.ds_11_15 gconv=0 plots=density(bins=90);
class reg_clim;
bayes MUPRIORPARMS((0, 2E12 ) (-40000, .) ) ;
model dk_cons = reg_clim/ dist=normal k=2 /*parms(23000000 ., 34000000 ., 120000000 . )*/;
ods output PostIntervals=PostIntervals;
ods output PostSummaries=PostSummaries;
run;
WARNING: Output 'DensityPlot' was not created. Make sure that the output object name, label, or path is spelled correctly. Also,
verify that the appropriate procedure options are used to produce the requested output object. For example, verify that
the NOPRINT option is not used.
I got this warning, with no density plot, and with these outputs.
I don't know how to interpret them: the means have strange values, and reg_clim 15 has missing parameters.
Do you have any advice? Thank you 🙂
I will upload my data and my output.
... View more