Thoughts of a Crime Show Junkie: Inadmissible Evidence
Recent Library Articles
Recently in the SAS Community Library: SAS' @RhondaWilliams reveals how SAS Law Enforcement Intelligence helps law enforcement agencies expedite detailed data entry using the Evidence Creation feature.
I need suggestions on how to widen my dataset: one row per ID, keeping all the data in the sample dataset, so I can later merge it with another dataset by ID. I was unable to transpose the data because an ID can have multiple rows with the same Same_Span value (see ID R in the data example).
Thanks!
Data Have;
input
ID$ Clinic$ Span_Beg :Date9. Span_End :Date9. Same_Span ;
format Span_Beg MMDDYY10. Span_End MMDDYY10.;
datalines;
A Clinic1 17Nov2022 4Jan2023 1
B Clinic2 1Jul2022 29Jul2022 1
B Clinic2 21Nov2022 4Apr2023 2
C Clinic1 1Jul2022 7Jul2022 1
D Clinic3 1Jul2022 12Jul2022 1
E Clinic1 22Nov2022 19Mar2023 1
F Clinic2 26Aug2022 15May2023 1
G Clinic4 1Oct2022 10Apr2023 1
H Clinic5 30Dec2022 19Jan2023 1
I Clinic6 1Jul2022 1Nov2022 1
I Clinic6 11Nov2022 10Jan2023 2
I Clinic6 1Feb2023 30Apr2023 3
J Clinic2 1Jul2022 21Aug2022 1
K Clinic2 9Mar2023 30Jun2023 1
L Clinic5 20Sep2022 20Feb2023 1
M Clinic5 1Jul2022 2Aug2022 1
N Clinic1 6Jul2022 1Sep2022 1
O Clinic7 1Jul2022 1Sep2022 1
P Clinic8 1Jul2022 24Oct2022 1
Q Clinic8 1Jul2022 4Dec2022 1
R Clinic9 19Dec2022 31May2023 1
R Clinic9 8Jun2023 18Jun2023 2
R Clinic1 28Aug2022 18Sep2022 1
S Clinic1 1Jul2022 10Jul2022 1
S Clinic3 4Aug2022 17Nov2022 1
;
run;
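One common pattern for this (a sketch only, untested against your data; the data set names HAVE_SEQ and WIDE_BEG are made up) is to replace Same_Span with a within-ID sequence number that is guaranteed unique, then transpose each variable by that sequence and merge the wide pieces back together by ID:

```sas
/* Build a within-ID sequence so duplicate Same_Span values
   no longer block the transpose. */
proc sort data=have;
   by ID Span_Beg;
run;

data have_seq;
   set have;
   by ID;
   if first.ID then seq = 0;
   seq + 1;   /* 1, 2, 3, ... within each ID */
run;

/* One transpose per variable; SEQ becomes part of the new names
   (Span_Beg_1, Span_Beg_2, ...). Repeat for Clinic, Span_End,
   and Same_Span, then MERGE the wide pieces by ID. */
proc transpose data=have_seq out=wide_beg (drop=_name_) prefix=Span_Beg_;
   by ID;
   id seq;
   var Span_Beg;
run;
```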
The fact that I even have to post this tells you what kind of day I'm having 😞 If the HLT_ALZDEM variable doesn't have a valid value of 1 or 2 (yes/no), then the variables constructed from HLT_ALZDEM should be set to missing. However, they are sometimes given a value of 0 instead.
Below is a portion of the output file, filtered to rows where HLT_ALZDEM is missing. Since missing is clearly not 1 or 2, all ADRD_group and non_ADRD_group values should also be missing. But they're not. Obs 9 and 10 are identical except for these two variables: Obs 9 is missing and Obs 10 is 0 for both.
The only possible explanation I can think of is that there are non-printable characters for some values of HLT_ALZDEM. But even if there are, those non-printable characters are not 1 or 2. Frequencies for HLT_ALZDEM only show values of 1, 2, and missing.
IF cohort_flag = 1 AND ADM_MA_FLAG_YR = 3
THEN MA_group = 1; ELSE MA_group = 0;
IF cohort_flag = 0 THEN MA_group = .;
IF cohort_flag = 1 AND ADM_FFS_FLAG_YR = 3
THEN TM_group = 1; ELSE TM_group = 0;
IF cohort_flag = 0 THEN TM_group = .;
IF cohort_flag = 1 AND HLT_ALZDEM = 1
THEN ADRD_group = 1; ELSE ADRD_group = 0;
IF cohort_flag = 0 THEN ADRD_group = .;
IF HLT_ALZDEM ^in (1,2) THEN ADRD_group = .;
IF cohort_flag = 1 AND HLT_ALZDEM = 2
THEN non_ADRD_group = 1; ELSE non_ADRD_group = 0;
IF cohort_flag = 0 THEN non_ADRD_group = .;
IF HLT_ALZDEM ^in (1,2) THEN non_ADRD_group = .;
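For what it's worth, one way to rule out statement-ordering effects (a sketch only, preserving the intent of the original statements) is to derive each group in a single IF/THEN/ELSE chain, so the "not 1 or 2" case can never be overwritten by a later assignment:

```sas
/* Single chain: each observation hits exactly one assignment. */
if HLT_ALZDEM not in (1, 2) then ADRD_group = .;
else if cohort_flag = 0     then ADRD_group = .;
else if HLT_ALZDEM = 1      then ADRD_group = 1;
else                             ADRD_group = 0;
/* non_ADRD_group is derived analogously with HLT_ALZDEM = 2. */
```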
Obs  HLT_ALZDEM  cohort_flag  MA_group  TM_group  ADRD_group  non_ADRD_group
  1                   0
  2                   1            0         1          0              0
  3                   1            1         0
  4                   0
  5                   0
  6                   1            0         1
  7                   1            1         0
  8                   1            1         0
  9                   1            1         0
 10                   1            1         0          0              0
 11                   1            1         0
 12                   1            0         1
 13                   1            0         1
 14                   1            0         1
 15                   1            0         1
 16                   1            0         1          0              0
I have a process where I am:
- reading 67M rows from SQL Server into SAS;
- transforming the data using SAS;
- writing 67M rows from SAS back to SQL Server.
No rows are dropped or created in the transformation; it's just fixing some incorrect values in historic data. The source and target tables have identical schemas, and the target table is a heap (no indexes, keys, or constraints).
The read operation is taking approx. 5 minutes; the write operation is taking approx. 5 hours. So, yeah 😬😱😠
Here are some excerpts from the log:
Libname statement:
27 %libname_sqlsvr(libref=PROD,server=MYSERVER,port=,database=MYDB,schema=prod)
=================================================================================
LIBREF: PROD
CONNECT: NOPROMPT="Driver={SQL Server Native Client
10.0};Server=MYSERVER;Database=MYDB;Trusted_Connection=yes;"
INTERNAL OPTIONS: schema=prod bulkload=yes dbcommit=100000 direct_exe=delete preserve_names=yes
USER OPTIONS:
=================================================================================
ODBC: AUTOCOMMIT is NO for connection 3
MPRINT(LIBNAME_SQLSVR): libname PROD odbc NOPROMPT="Driver={SQL Server Native Client
10.0};Server=MYSERVER;Database=MYDB;Trusted_Connection=yes;" schema=prod bulkload=yes dbcommit=100000
direct_exe=delete preserve_names=yes ;
NOTE: Libref PROD was successfully assigned as follows:
Engine: ODBC
Physical Name:
Read data from SQL Server to SAS:
27 * Copy data from SQL Server ;
ODBC: AUTOCOMMIT is NO for connection 6
ODBC_28: Prepared: on connection 6
SELECT * FROM "prod"."RLDX_DEMOGRAPHIC_ORIG"
28 data workspde.RLDX_DEMOGRAPHIC;
29 set prod.RLDX_DEMOGRAPHIC_ORIG;
30 run;
ODBC_29: Executed: on connection 6
Prepared statement ODBC_28
NOTE: There were 67,766,170 observations read from the data set PROD.RLDX_DEMOGRAPHIC_ORIG.
NOTE: The data set WORKSPDE.RLDX_DEMOGRAPHIC has 67,766,170 observations and 34 variables.
NOTE: Compressing data set WORKSPDE.RLDX_DEMOGRAPHIC decreased size by 75.73 percent.
NOTE: DATA statement used (Total process time):
real time 5:05.05
user cpu time 6:15.26
system cpu time 57.09 seconds
memory 1401.65k
OS Memory 22828.00k
Timestamp 25/03/2020 07:17:50 AM
Write data from SAS to SQL Server:
27 proc append
ODBC_24: Prepared: on connection 6
SELECT * FROM "prod"."RLDX_DEMOGRAPHIC" WHERE 0=1
ODBC: AUTOCOMMIT is NO for connection 7
ODBC: COMMIT performed on connection 7.
ODBC_25: Prepared: on connection 7
SELECT * FROM "prod"."RLDX_DEMOGRAPHIC"
28 base=prod.RLDX_DEMOGRAPHIC (
29 bulkload=yes bl_options=tablock
30 )
31 data=workspde.RLDX_DEMOGRAPHIC6 (
32 keep=
33 rldx_record_key
34 rldx_audit_key
35 recnum
36 stay_number
37 episode_sequence_number
38 hospital_type
39 source
40 hoscode
41 tfrhosp
42 transfrom
43 mrn
44 surname
45 givnames
46 sex
47 dob
48 age
49 address
50 locality
51 pcode
52 cob
53 admdate
54 sepdate
55 dthdate
56 notifdate
57 patientnum
58 supi
59 moh_auid
60 AHS_hosp
61 dc_extract_date
62 cob_sacc
63 state_of_usual_residence
64 rldx_valid_from
65 rldx_valid_to
66 rldx_current_ind
67 );
68 run;
NOTE: Appending WORKSPDE.RLDX_DEMOGRAPHIC6 to PROD.RLDX_DEMOGRAPHIC.
NOTE: There were 67,766,170 observations read from the data set WORKSPDE.RLDX_DEMOGRAPHIC6.
NOTE: 67766170 observations added.
NOTE: The data set PROD.RLDX_DEMOGRAPHIC has . observations and 34 variables.
NOTE: PROCEDURE APPEND used (Total process time):
real time 5:34:15.35
user cpu time 18:04.33
system cpu time 37.90 seconds
memory 634.87k
OS Memory 21804.00k
Timestamp 25/03/2020 03:34:13 AM
The write operation uses a TRUNCATE TABLE (not shown) followed by the PROC APPEND above.
I note this previous thread: https://communities.sas.com/t5/General-SAS-Programming/Get-data-into-sql-server-more-efficiently/td-p/95560
I will continue experimenting with the libname and/or dataset options, but at 5 hours per run for performance metrics, I thought I'd post to see if someone has further ideas. For example, is it possible that bulk loading needs to be tweaked on SQL Server itself? (Not that I'd know how to diagnose that, and I'm not the DBA.)
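For the record, these are the dataset options I plan to experiment with first (a sketch, untested; values are illustrative, not recommendations). The libname currently forces dbcommit=100000, so committing every 100,000 rows may be breaking the bulk load into many small batches; DBCOMMIT=0 requests a single commit at the end of the load:

```sas
proc append
   base=prod.RLDX_DEMOGRAPHIC (
      bulkload=yes          /* use the SQL Server bulk copy API    */
      bl_options=tablock    /* table lock, as in the original run  */
      dbcommit=0            /* commit once at the end of the load  */
   )
   data=workspde.RLDX_DEMOGRAPHIC6;
run;
```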
Hi SAS Community,
I've been trying to create a grouped forest plot in SAS, but so far, I've only been able to find resources on how to create a single forest plot. However, my requirement is to combine two forest plots into one, similar to the attached sample plot.
I'm wondering if anyone could offer some guidance on how to achieve this. Should I use proc sgplot or proc template for this purpose? Currently, I'm using the proc sgplot code provided in this post: CTSPedia Clinical Graphs - Subgrouped Forest Plot.
Any insights or examples you could share would be greatly appreciated!
Thank you in advance!
Jess
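One possible approach for combining two forest plots into one figure (a sketch only; the data set FOREST and its variables subgroup, panel, estimate, lcl, and ucl are hypothetical stand-ins for the actual structure) is PROC SGPANEL, which stacks one panel per forest plot while sharing a common x-axis:

```sas
proc sgpanel data=forest noautolegend;
   panelby panel / columns=1 onepanel uniscale=column novarname;
   highlow y=subgroup low=lcl high=ucl;    /* confidence-interval whiskers */
   scatter y=subgroup x=estimate /
      markerattrs=(symbol=squarefilled);   /* point estimates */
   refline 1 / axis=x;                     /* null-effect reference line */
   colaxis type=log label="Estimate (95% CI)";
   rowaxis display=(nolabel);
run;
```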