Thoughts of a Crime Show Junkie: Inadmissible Evidence
Recent Library Articles
Recently in the SAS Community Library: SAS' @RhondaWilliams reveals how SAS Law Enforcement Intelligence helps law enforcement agencies expedite detailed data entry using the Evidence Creation feature.
I've been looking for a good, efficient method to create "Table 1", which has a more or less standard format in research journals:
Variable Name
Category 1
Category 2
P-value
Variable 1
1
N(%)
N(%)
Chi-sq or Fisher's
2
N(%)
N(%)
3
N(%)
N(%)
Variable 2
Mean (SD)
Mean (SD)
t-test
Median (IQR)
Median (IQR)
I found this PDF paper. It seems to work pretty well, but I'm having 2 issues. The first is that the chi-square results are saved into a file and then merged into the main file. But the value of the "variable" variable is the var label in 1 file and the var name in the other, so they don't merge.
The second issue is that Proc Report does not produce a table and instead gives me this: NOTE: Groups are not created because the usage of levels is DISPLAY. To avoid this note, change all GROUP variables to ORDER variables.
WARNING: A GROUP, ORDER, or ACROSS variable is missing on every observation.
data have;
infile datalines dsd dlm=',' truncover;
input DEM_AGE DEM_SEX cohort_flag TM_group;
datalines;
3,1,1,0
2,1,1,1
3,2,1,1
3,2,1,1
3,2,1,0
2,2,1,1
2,1,1,1
3,1,1,1
2,1,1,1
3,2,1,0
2,1,1,0
2,2,1,1
3,2,1,0
2,2,0,
3,2,1,1
3,2,1,1
3,1,1,0
3,2,1,0
2,1,1,0
3,1,1,1
3,2,1,1
3,2,1,0
3,2,1,1
3,2,1,1
3,2,1,1
; RUN;
/*Load formats from existing file in temp folder*/
options fmtsearch=(temp.formats);
/*DEM_SEX sex 1 Male 2 Female*/
/*DEM_AGE AGE2GRP 1:Age Group <65 2:Age Group [65,75) 3:Age Group >=75*/
/*TM_group yesno 1 Yes 2 No*/
/*Generate descriptive statistics*/
proc means data = temp.have noprint n sum mean;
class DEM_AGE DEM_SEX;
var TM_group /*MA_group*/;
ways 1;
output out = temp.expl_PreTable n =
sum =
mean = / autoname;
WHERE cohort_flag = 1;
run;
/*Format descriptive stats*/
data temp.expl_Table (keep = variable levels TM_group_N TM_group_sum
TM_group_mean pct ExpPct indexvar); set temp.expl_PreTable;
length variable $ 20; /* These four variables */;
length levels $ 20; /* will describe the first */;
length pct $ 8; /* four columns of the table */;
length ExpPct $ 15;
if DEM_AGE ne . then do; /*Building "variable" and "Levels" columns for "DEM_AGE"*/;
variable = 'Age category';
levels = put(DEM_AGE, age2grp.);
IndexVar = 1; /*This index is included just in case the order of data presentation needs to be changed*/;
end;
if DEM_SEX ne . then do; /*Building "variable" and "Levels" columns for "DEM_SEX"*/;
variable = 'Sex';
levels = put(DEM_SEX,sex.);
IndexVar = 2;
end;
pct = put(TM_group_mean*100,4.1); /*Calculate % exposed */;
ExpPct = compress(put(TM_group_sum,comma4.),' ')
||' '||'('||compress(pct,' ')||')'; /*creating data in the form of "count (%)" */;
run;
/*Run chi-square significance tests and use ODS to create a dataset of these results*/
ods trace on;
ods output chisq = temp.expl_ChiData;
proc freq data = temp.have;
table TM_group*DEM_AGE / chisq;
table TM_group*DEM_SEX / chisq;
WHERE cohort_flag = 1;
run;
ods trace off;
/*Rearrange chi-square dataset so it can be merged with descriptive stats table*/
data temp.expl_ChiData2 (keep = variable prob);
set temp.expl_ChiData (where = (statistic = 'Chi-Square'));
length variable $ 20;
variable = scan(table,-1,' '); /* Returns the last word in a character value from the "table" variable*/
run;
/*Sort both tables so they will merge*/
PROC SORT data=temp.expl_Table OUT=temp.expl_Table_sort; BY variable; RUN;
PROC SORT data=temp.expl_ChiData2 OUT=temp.expl_ChiData2_sort; BY variable; RUN;
/**************MERGE DOES NOT WORK BECAUSE The value of the "variable" variable is the var label in 1 file
and the var name in the other. DUE TO THE 'IF a' STATEMENT, NOTHING FROM THE CHIDATA2 FILE IS MERGED IN****/
/*Merge descriptive stats table with chi-square table*/
DATA temp.expl_TableData;
MERGE temp.expl_Table_sort (in = a) temp.expl_ChiData2_sort (in = b);
BY variable;
IF a;
RUN;
/*Use PROC REPORT to create final output table*/
proc report data = temp.expl_TableData nowd;
column variable levels TM_group_N ExpPct prob;
define variable / "Variable" group format = $variable.;
define levels / " " ;
define TM_group_N / "TM" /*format = comma5.*/;
define ExpPct / "TM Group/n (%)";
define prob / "p-value" group format = pvalue6.4;
Title "Table 1. Descriptive characteristics of individuals in the sample";
RUN;
... View more
Hi there,
I am using ods excel along with proc report. Is there a way to achieve this using ods excel.
The equivalent of excel is here. (In excel select columns that need to be grouped, then go to DATA menu and then group sub menu.
Thanks in advance
RP
... View more
Hi, I think it would be beneficial to have Generation Data group like dataset in SAS similar to the format used in Mainframe. Generation Data Groups (GDGs) are group of datasets related to each other by a common name. The common name is referred as GDG base and each dataset associated with the base is called a GDG version. You can set the limit of the related files(generations). We can easily keep track of all generation of data sets. Any particular generation can be referred easily. We use lot of datasets which regenerates daily , monthly , yearly etc.. Consider I need to create daily transaction data for the month of May. If Generation Data Group is available, I would define the base like "Tran_May2024". Then I would just create dataset daily in the below way. Data Tran_May2024(+1); -- This will create the next available version. set work_tran_table; run; If need to point current or earlier version i would use the dataset below. Tran_May2024(0) - Latest Version Tran_May2024(-1) - Previous Version Tran_May2024(-2) - 2 versions back If I need the whole month data , i just refer the base Data work_tran_may; set Tran_May2024; --This would all the version available in the base run; you can have options like below LIMIT – To limit the maximum number of generations. NOEMPTY – Uncatalog only the oldest generation in GDG when the limit is reached. EMPTY – Uncatalog all the generations when a limit is reached. SCRATCH -Physically delete the dataset(generation) which is uncataloged. NOSCRATCH – Don’t Physically delete the dataset(generation) which is uncataloged. This would help a lot when we create lot of datasets which are created in a repeatable fashion. Thank Ravi
... View more
Hello everyone,
I am trying to create a dynamic proc format that is based on the monthly data for the past 7 or 8 years. I am using SAS version SAS 9.4 and enterprise guide 8.3. Here is a look at a portion of the normal proc format that I am trying to emulate.
proc format; INVALUE classordinal '2016-05'= 1 '2016-06'= 2 '2016-07'= 3 '2016-08'= 4 '2016-09'= 5 '2016-10'= 6 '2016-11'= 7 '2016-12'= 8 '2017-01'= 9 '2017-02'= 10 '2017-03'= 11 '2017-04'= 12
;
RUN;
I figured out how to generate the needed months, but don't know how to put it into a proc format statement.
data empty; format vintage yymmd7.; do i = 96 by -1 to 1; vintage = intnx('month', today(), -i, 'B'); classrank = 97 - i; output; end; drop i; run;
proc sql; select put(vintage, yymmd7.), put(classrank, 8.) into :vintages separated by ' ', :classranks separated by ' ' from empty; quit;
results
2016-05
1
2016-06
2
2016-07
3
2016-08
4
2016-09
5
2016-10
6
2016-11
7
2016-12
8
2017-01
9
2017-02
10
2017-03
11
2017-04
12
Thanks for your help.
... View more
I have a report (e.g. below) where a set of list controls control the data used in the scatter plot and a simple univariate linear regression. The table below has some reference x values, and the "Predicted" column is derived from the linear regression (right click on object, "Derive predicted", and included as an item in the dataset, avg. aggregation.) The linear regression itself updates whenever I select/deselect any item in the controls. But the 'Predicted' values are the values from the initial version of the regression from which I created the field. Is there any way to get the predicted y values from the regression updating dynamically and displayed in the report below? It should change whenever I change the list control selection. I just want the report viewer to be able to see the estimated y values at certain x values as they slice data in different combinations. Thank you!
... View more