I've been looking for a good, efficient method to create "Table 1", which has a more or less standard format in research journals:
| Variable Name | | Category 1 | Category 2 | P-value |
|---|---|---|---|---|
| Variable 1 | 1 | N(%) | N(%) | Chi-sq or Fisher's |
| | 2 | N(%) | N(%) | |
| | 3 | N(%) | N(%) | |
| Variable 2 | | Mean (SD) | Mean (SD) | t-test |
| | | Median (IQR) | Median (IQR) | |
I found this PDF paper, and its code seems to work pretty well, but I'm having two issues. The first is that the chi-square results are saved to a dataset and then merged into the main table, but the value of the "variable" variable is the variable label in one dataset and the variable name in the other, so the rows never match in the merge.
The second issue is that PROC REPORT does not produce a table and instead writes this to the log:
NOTE: Groups are not created because the usage of levels is DISPLAY. To avoid this note, change all GROUP variables to ORDER variables.
WARNING: A GROUP, ORDER, or ACROSS variable is missing on every observation.
data temp.have; /* the later steps read temp.have; assumes the TEMP libref is already assigned */
infile datalines dsd dlm=',' truncover;
input DEM_AGE DEM_SEX cohort_flag TM_group;
datalines;
3,1,1,0
2,1,1,1
3,2,1,1
3,2,1,1
3,2,1,0
2,2,1,1
2,1,1,1
3,1,1,1
2,1,1,1
3,2,1,0
2,1,1,0
2,2,1,1
3,2,1,0
2,2,0,
3,2,1,1
3,2,1,1
3,1,1,0
3,2,1,0
2,1,1,0
3,1,1,1
3,2,1,1
3,2,1,0
3,2,1,1
3,2,1,1
3,2,1,1
;
run;
/*Load formats from existing file in temp folder*/
options fmtsearch=(temp.formats);
/*DEM_SEX sex 1 Male 2 Female*/
/*DEM_AGE AGE2GRP 1:Age Group <65 2:Age Group [65,75) 3:Age Group >=75*/
/*TM_group yesno 1 Yes 2 No*/
/*Generate descriptive statistics*/
proc means data = temp.have noprint n sum mean;
class DEM_AGE DEM_SEX;
var TM_group /*MA_group*/;
ways 1;
output out = temp.expl_PreTable n =
sum =
mean = / autoname;
WHERE cohort_flag = 1;
run;
/*Format descriptive stats*/
data temp.expl_Table (keep = variable levels TM_group_N TM_group_sum
                             TM_group_mean pct ExpPct indexvar);
    set temp.expl_PreTable;
    length variable $ 20;  /* These four variables      */
    length levels   $ 20;  /* will describe the first   */
    length pct      $ 8;   /* four columns of the table */
    length ExpPct   $ 15;
    /* Build the "variable" and "levels" columns for DEM_AGE */
    if DEM_AGE ne . then do;
        variable = 'Age category';
        levels = put(DEM_AGE, age2grp.);
        IndexVar = 1; /* This index is included in case the order of presentation needs to change */
    end;
    /* Build the "variable" and "levels" columns for DEM_SEX */
    if DEM_SEX ne . then do;
        variable = 'Sex';
        levels = put(DEM_SEX, sex.);
        IndexVar = 2;
    end;
    pct = put(TM_group_mean*100, 4.1); /* Calculate % exposed */
    /* Create data in the form "count (%)" */
    ExpPct = compress(put(TM_group_sum, comma4.), ' ')
             || ' (' || compress(pct, ' ') || ')';
run;
/*Run chi-square significance tests and use ODS to create a dataset of these results*/
ods trace on;
ods output chisq = temp.expl_ChiData;
proc freq data = temp.have;
table TM_group*DEM_AGE / chisq;
table TM_group*DEM_SEX / chisq;
WHERE cohort_flag = 1;
run;
ods trace off;
/*Rearrange chi-square dataset so it can be merged with descriptive stats table*/
data temp.expl_ChiData2 (keep = variable prob);
set temp.expl_ChiData (where = (statistic = 'Chi-Square'));
length variable $ 20;
variable = scan(table,-1,' '); /* Returns the last word in a character value from the "table" variable*/
run;
/*Sort both tables so they will merge*/
PROC SORT data=temp.expl_Table OUT=temp.expl_Table_sort; BY variable; RUN;
PROC SORT data=temp.expl_ChiData2 OUT=temp.expl_ChiData2_sort; BY variable; RUN;
/**************MERGE DOES NOT WORK BECAUSE The value of the "variable" variable is the var label in 1 file
and the var name in the other. DUE TO THE 'IF a' STATEMENT, NOTHING FROM THE CHIDATA2 FILE IS MERGED IN****/
/*Merge descriptive stats table with chi-square table*/
DATA temp.expl_TableData;
MERGE temp.expl_Table_sort (in = a) temp.expl_ChiData2_sort (in = b);
BY variable;
IF a;
RUN;
/*Use PROC REPORT to create final output table*/
proc report data = temp.expl_TableData nowd;
column variable levels TM_group_N ExpPct prob;
define variable / "Variable" group format = $variable.;
define levels / " " ;
define TM_group_N / "TM" /*format = comma5.*/;
define ExpPct / "TM Group/n (%)";
define prob / "p-value" group format = pvalue6.4;
Title "Table 1. Descriptive characteristics of individuals in the sample";
RUN;
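One possible way around both issues, offered as an untested sketch rather than the paper's own method. It assumes the datasets built by the steps above already exist. The merge key is fixed by translating the variable name that PROC FREQ writes (the last word of "table") into the same text that the temp.expl_Table step hard-codes. In PROC REPORT, "variable" becomes an ORDER variable and every other column is an explicit DISPLAY column, so no GROUP variable is left to trigger the note and warning. The format = $variable. option is dropped here only because it is not known whether that format exists in temp.formats.

```sas
/* 1. Give the chi-square rows the same key values that temp.expl_Table uses. */
data temp.expl_ChiData2 (keep = variable prob);
    set temp.expl_ChiData (where = (statistic = 'Chi-Square'));
    length variable $ 20;
    select (upcase(scan(table, -1, ' ')));          /* last word = variable name */
        when ('DEM_AGE') variable = 'Age category'; /* must match the text in expl_Table */
        when ('DEM_SEX') variable = 'Sex';
        otherwise        variable = scan(table, -1, ' ');
    end;
run;

/* 2. Re-sort and re-merge as before. */
proc sort data = temp.expl_Table    out = temp.expl_Table_sort;    by variable; run;
proc sort data = temp.expl_ChiData2 out = temp.expl_ChiData2_sort; by variable; run;

data temp.expl_TableData;
    merge temp.expl_Table_sort (in = a) temp.expl_ChiData2_sort;
    by variable;
    if a;
run;

/* 3. ORDER for the row header, DISPLAY for everything else. */
proc report data = temp.expl_TableData nowd missing;
    column variable levels TM_group_N ExpPct prob;
    define variable   / "Variable" order;
    define levels     / " " display;
    define TM_group_N / "N" display;
    define ExpPct     / "TM group, n (%)" display;
    define prob       / "p-value" display format = pvalue6.4;
    title "Table 1. Descriptive characteristics of individuals in the sample";
run;
```

With this layout the p-value repeats on every level row of a variable. Showing it only on the first row would need extra logic (for example a COMPUTE block), which is left out of the sketch.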
Join MSUG for their 1-Day SAS Conference!
Date: Wednesday, June 12, 2024
Time: 8:00 AM - 4:30 PM
Place: VisTaTech Center, Schoolcraft College, 18600 Haggerty Rd, Livonia, MI 48152
Cost: $50 on or before May 28, 2024; $95 after May 28, 2024; $10 for students with proof of student status.
Register Now!
Agenda
Know Thy Data: Techniques for Data Exploration - Charu Shankar, SAS
Bayesian Time Series in PROC MCMC - Danny Modlin, SAS
Introduction to Data Simulation - Jason Brinkley, Abt Associates Inc.
SAS HPSPLIT: A Powerful Machine Learning Tool - Russ Lavery, Independent Consultant
NHANES Dietary Supplement Component: A Parallel Programming Project - Jay Iyengar, Data System Consultants, LLC
Being a Statistical Expert Witness - David Corliss, Grafham Analytics
Binning Procedures for Logistic Regression - Bruce Lund, Independent Consultant
Confessions of a PROC SQL Instructor - Charu Shankar, SAS
Missing Data in PROC MCMC - Danny Modlin, SAS
Regression Models for Count Data - Jason Brinkley, Abt Associates Inc.
An Animated Introduction to Git and GitHub - Russ Lavery, Independent Consultant
SAS Job Searching and Interviewing Tips – Strategies in the Post-Pandemic Era - Jay Iyengar, Data System Consultants, LLC
They are also planning to hold the following training classes before and after the conference. Cost is $185 for a half-day class and $370 for a full-day class. All classes will be held at the VisTaTech Center at Schoolcraft College (18600 Haggerty Rd, Livonia, MI 48152). Click on the titles for the course descriptions.
Tuesday, June 11
8:00 AM - 5:00 PM: SAS Macros in Cartoons: Complex Stuff Made Easy! - Russ Lavery, Independent Consultant
Thursday, June 13
8:00 AM - 12:00 PM: An Overview of Multivariate Statistical Analysis of Quantitative Data (PCA, FA, and Clustering) - Jason Brinkley, Abt Associates Inc.
1:00 - 5:00 PM: An Overview of Causal Inference, Counterfactual Data Analysis, and Propensity Score Methods - Jason Brinkley, Abt Associates Inc.
Greetings. I asked an online LLM to generate some SAS code for me. It did so. I haven't tried the code yet. The procedures contained in the code are listed below. I'm wondering if any of the procedures will be missing from SAS online, SAS Studio? Please explain limitations that exist, such as which procedures are missing, dataset size limits, code length limits, etc.
SAS procedures required in the LLM-generated code: proc assoc, proc cluster, proc corr, proc cvmodelfit, proc fcmp, proc glm, proc glmselect, proc hpensemble, proc hpforest, proc hplogistic, proc hpreduce, proc import, proc means, proc nlin, proc optmodel, proc print, proc score, proc sgplot, proc sql, proc stdize, proc surveyselect, proc transpose, proc tree
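A quick way to see what a given SAS environment offers, sketched here without knowing which environment the poster uses: each procedure ships with a particular SAS product (PROC GLM with SAS/STAT, PROC OPTMODEL with SAS/OR, for example), so listing the licensed and installed products shows which groups of procedures are likely to be available. Both steps write their output to the log.

```sas
/* Lists the products licensed at this site and their expiration dates. */
proc setinit;
run;

/* Lists the products that are installed, with their version numbers. */
proc product_status;
run;
```

Any procedure name from the list that does not belong to one of the reported products will fail with a "procedure not found" error when the code is submitted.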
Does anyone know how to calculate an odds ratio on the original scale from the coefficient and/or odds ratio of a logistic regression model that includes a natural-log-transformed independent variable? For example, logit(p) = ln(p/(1-p)) = b0 + b1 ln(x). If b1 = 0.1 (b1 is the coefficient of the natural-log-transformed independent variable):
What is the OR for a one-unit increase in the natural log of x?
What is the OR for a one-unit increase on the original scale of x?
What is the OR for a 5-unit increase in the natural log of x?
What is the OR for a 5-unit increase in x?
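The arithmetic behind these four questions can be sketched as follows, assuming the model is exactly as written, with x > 0 and no interaction terms involving x. The constants c and k are illustrative symbols that are not in the original post: c is an additive change in x, and k is a multiplicative one.

```latex
\begin{aligned}
\operatorname{logit}(p) = \ln\frac{p}{1-p} &= b_0 + b_1 \ln(x), \qquad b_1 = 0.1 \\
\text{OR for } \ln(x) \to \ln(x) + 1 &= e^{b_1} = e^{0.1} \approx 1.105 \\
\text{OR for } \ln(x) \to \ln(x) + 5 &= e^{5 b_1} = e^{0.5} \approx 1.649 \\
\text{OR for } x \to x + c &= e^{b_1 [\ln(x + c) - \ln(x)]} = \left(\frac{x + c}{x}\right)^{b_1} \\
\text{OR for } x \to kx &= k^{b_1}, \qquad \text{e.g. } 2^{0.1} \approx 1.072
\end{aligned}
```

A one-unit increase in ln(x) is the same as multiplying x by e (about 2.718). For an additive change on the original scale there is no single answer, because the OR depends on the starting value: going from x = 10 to x = 11 gives 1.1^0.1 ≈ 1.010, and from x = 10 to x = 15 gives 1.5^0.1 ≈ 1.041. For that reason, effects of log-transformed predictors are usually reported per multiplicative change in x, such as a doubling or a 10% increase.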