Hi all - How can I connect SAS Viya to files on a web server? I want to automatically fetch the log files, which are stored as .gz archives, and then process them (e.g., domain.com-2024-04-11.gz).
I'm working on an actuarial project to estimate the monthly probability that a person becomes disabled. A portfolio of persons (each with a different 'PolicyNr') is observed for 12 months, and the time until disability is recorded in the variable 'TimetoDisability'. When no disability occurred during the 12 months, the variable 'RightCensored' has the value 1. We also have the variables 'Gender', 'AgeatDisability' (the age at disability, or the age after 12 months for right-censored observations), and 'OccupationClass'.

The data are available in a wide format. For example, the following lines are part of the data:

PolicyNr  TimetoDisability  RightCensored  Gender  AgeatDisability  OccupationClass
001       2 months          0              Male    40 years         1
002       3 months          0              Male    30 years         2
003       12 months         1              Female  42 years         1

Intuitively, I would model this with a Cox proportional hazards model, with 'TimetoDisability' as the time until the disability occurs and 'Gender', 'AgeatDisability', and 'OccupationClass' as covariates. Monthly probabilities are then derived from the survival function.

Now assume that, for practical/technical reasons, it is only possible to fit a binomial GLM. I read that fitting a binomial GLM to data with pseudo observations is analogous to a Cox discrete-time survival model.

To prepare the analysis, I transform the wide dataset into a long dataset (with pseudo observations; see, e.g., https://grodri.github.io/glms/notes/c7s6), in which each line is duplicated according to the variable 'TimetoDisability'. For example, the first line of the table above becomes 2 lines, as it took 2 months to become disabled. The last of these lines has the value 1 for the variable 'Disability', as the disability occurred in month 2. The variable 'AgeatDisability' is transformed into the variable 'Age', which now represents the age during that month. The right-censored observation is transformed into 12 lines, all with the value 0 for 'Disability', as no disability is observed. This becomes:

PolicyNr  Duration  Disability  Gender  Age                 OccupationClass
001       1         0           Male    39 years 11 months  1
001       2         1           Male    40 years            1
002       1         0           Male    29 years 10 months  2
002       2         0           Male    29 years 11 months  2
002       3         1           Male    30 years            2
003       1         0           Female  41 years 1 month    1
003       2         0           Female  41 years 2 months   1
003       3         0           Female  41 years 3 months   1
003       4         0           Female  41 years 4 months   1
003       5         0           Female  41 years 5 months   1
003       6         0           Female  41 years 6 months   1
003       7         0           Female  41 years 7 months   1
003       8         0           Female  41 years 8 months   1
003       9         0           Female  41 years 9 months   1
003       10        0           Female  41 years 10 months  1
003       11        0           Female  41 years 11 months  1
003       12        0           Female  42 years            1

Question: In this long data format, the multiple rows (pseudo observations) for each person are not independent. We have repeated measures for each person. However, I read in Therneau and Grambsch: (quote) "One concern that often arises is that observations [on the same individual] are "correlated," and would thus not be handled by standard methods. This is not actually an issue. The internal computations for a Cox model have a term for each unique death or event time..."

So for a Cox discrete-time survival model, the dependency is not an issue. However, I don't see why the dependency in the data would not be an issue for a binomial GLM. Given the dependency in the data, is it appropriate to fit a GLM to get trustworthy estimates of the monthly probabilities? Or should I go for a mixed-effects model? Thank you.
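For reference, the wide-to-long expansion described above could be sketched in SAS roughly as follows. This is only a minimal sketch under some assumptions: the dataset names (wide, long) and the variable AgeMonths are made up for illustration, 'TimetoDisability' is taken to be stored as a whole number of months, and 'AgeatDisability' as a whole number of years.

/* Wide input: one row per policy, mirroring the three example rows. */
/* Dataset and variable names beyond those in the post are hypothetical. */
data wide;
  input PolicyNr $ TimetoDisability RightCensored Gender $ AgeatDisability OccupationClass;
  datalines;
001 2 0 Male 40 1
002 3 0 Male 30 2
003 12 1 Female 42 1
;
run;

/* Long output: one row per policy-month at risk. */
data long;
  set wide;
  do Duration = 1 to TimetoDisability;
    /* Event indicator: 1 only in the last month of an uncensored policy. */
    Disability = (Duration = TimetoDisability and RightCensored = 0);
    /* Age in months during this month, counted back from the age at exit. */
    AgeMonths = 12*AgeatDisability - (TimetoDisability - Duration);
    output;
  end;
  keep PolicyNr Duration Disability Gender AgeMonths OccupationClass;
run;

With the three example policies this gives 2 + 3 + 12 = 17 rows. For policy 001 in month 1, AgeMonths is 12*40 - 1 = 479, i.e. 39 years 11 months, which matches the long table above.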
Hi all - I have a specific report format that I am trying to achieve using PROC REPORT, and I'm getting pretty close, but there's a column subtotal element that I cannot figure out.
I have sales data, summarized by Client, Quarter, and Week. Each quarter contains an arbitrary number of weeks, up to around 13.
I would like to list Client information down the side, with:
Quarters listed across the top,
each Quarter's corresponding Weeks nested underneath,
the table populated with Sales for each Client-Week,
and Grand Totals on the far right and bottom.
I've gotten this far on my own just fine.
Where I'm getting stuck is that I'd also like to show Quarterly subtotal columns, at the end of each Quarter. I have tried different variations of "break after" and the like, but I'm just not getting there.
Does anyone in the community have any suggestions? Below is some sample code that represents the progress I've made so far. I've also attached an image showing what I currently "have" versus what I "want." I greatly appreciate any and all help - cheers!
data one;
input client_rank client $9. client_id quarter $ weekending sales;
datalines;
1 Apple     12345 Q1 20240106 1000
1 Apple     12345 Q1 20240113 2000
1 Apple     12345 Q1 20240127 5000
1 Apple     12345 Q2 20240413 3000
1 Apple     12345 Q2 20240420 4000
1 Apple     12345 Q2 20240427 2000
2 Microsoft 67890 Q1 20240106 3000
2 Microsoft 67890 Q1 20240113 1000
2 Microsoft 67890 Q1 20240127 2500
2 Microsoft 67890 Q2 20240413 4000
2 Microsoft 67890 Q2 20240420 500
2 Microsoft 67890 Q2 20240427 1500
;
run;
proc report data = one nowd;
column client_rank client client_id quarter, weekending, sales ("Total" sales=tot);
define client_rank / noprint group;
define client / "Client" group;
define client_id / "Client ID" group;
define quarter / " " across;
define weekending / " " across nozero;
define sales / "Sales" analysis sum format=comma12.0;
define tot / "Sales" analysis sum format=comma12.0;
rbreak after / summarize;
compute after;
client = "Total";
endcomp;
define client_rank / order group;
run;
Hi everyone,
I am writing to seek assistance with centering the values of "HR (95% CI)" in a PROC SGPLOT procedure.
Currently, in my code, the values of "HR (95% CI)" (variable name: HR2) are left-aligned, and I would like to center them instead. However, I am unsure how to achieve this.
Below is my SAS code:
proc sgplot data=forest_subgroup_2 nowall noborder nocycleattrs dattrmap=attrmap noautolegend;
format text $txt.;
styleattrs axisextent=data;
refline ref2 / lineattrs=(thickness=13 color=cxf0f0f7);
highlow y=obsid low=CIL high=CIU;
scatter y=obsid x=hr / markerattrs=(symbol=squarefilled);
scatter y=obsid x=hr / markerattrs=(size=0) x2axis;
refline 1 / axis=x;
text x=xl y=obsid text=text / position=bottom contributeoffsets=none strip;
yaxistable subgroup / location=inside position=left textgroup=id labelattrs=(size=7)
textgroupid=text indentweight=indentWt;
yaxistable HR2 pvalue/ location=inside position=right pad=(right=15px)
labelattrs=(size=7) valueattrs=(size=7) ;
yaxis reverse display=none colorbands=odd colorbandsattrs=(transparency=1) offsetmin=0.0;
xaxis display=(nolabel) /* TYPE=LOG TYPE=LOG LOGSTYLE=LOGEXPAND LOGBASE=10 */ values=(0.0 0.5 1.0 1.5 2.0 2.5 3.0 );
x2axis label='Hazard Ratio' display=(noline noticks novalues) labelattrs=(size=8);
run;
Could you please advise me on how I can modify the yaxistable HR2 code to center align the values of "HR (95% CI)"?
Thank you very much for your attention to this matter. I look forward to your guidance.