Home

NewUsrStat

Hi guys, suppose to have the following: data DB; input ID :$20. Date_Admission :date9. Date_Discharge :date9. Event :$20. Date_Event :$20. Index :$20. Final_Date :$20.; format Date_Admission :date9. Date_Discharge :date9. Final_Date :$20.; cards; 0001 18OCT2016 18NOV2016 Change_department MAR2017 1 0 0001 17FEB2018 28FEB2018 . . 0 0 0002 11SEP2014 15SEP2014 Change_department NOV2014 1 0 0002 13DEC2015 25DEC2015 . . 0 0 0003 12MAR2022 15MAR2022 Change_department APR2022 1 0 0004 15APR2016 18APR2016 Change_department AUG2016 1 0 0004 01JAN2018 15FEB2018 Death DEC2018 0 0 0005 15APR2017 18APR2017 Change_department AUG2017 1 0 0005 01JAN2019 15FEB2019 Death FEB2019 0 0 0006 01APR2022 18APR2017 Change_department MAY2022 1 0 0006 13MAR2023 01MAY2023 Death MAY2023 0 0 0007 11NOV2014 13NOV2014 Change_department JAN2015 1 0 .... run; I have to set a Final_Date variable based on dates comparisons according to the following rules: 1) If Index = 1 and Date_Event > Date_Discharge + 90 days and missing (Date_Event) for all other combinations of Date_Admission-Date_Event for the same ID then Final_Date = 30SEP2023 (e.g., ID = 0001) 2) If Index = 1 and Date_Event < Date_Discharge + 90 days then delete the entire ID independently if there are other admissions (e.g., ID = 0002). 3) If Index = 1 and and Date_Event < Date_Discharge + 90 days and there is only one combination Date_Admission-Date_Discharge for that ID, then since Date_Event < Date_Discharge + 90 days then delete (e.g., ID = 0003). 4) If Index = 1 and Date_Event > Date_Discharge + 90 days but there are other admissions-discharges and also an event death, then the Final_Date = the date corresponding to the death. Since there is only the indication of Month and Year, if the death is not in the same month of the last admission-discharge, then set the date to the first day of the month (e.g., for ID=0004 the Final_Date should be 01DEC2018). Otherwise, if the death is in the same month of the last admission-discharge (e.g., ID= 0005) then the final_date should be set as the first day after the corresponding discharge to avoid the final_date falls into the range last admission-discharge. 5) If Index = 1 and Date_Event < Date_Discharge + 90 days (e.g., ID = 0006) then remove all records corresponding to that ID (as point 2) independently from the presence of event Death. 6) If Index = 1 and Date_Event > Date_Discharge + 90 days and there is only one admission-discharge (e.g., ID = 0007) then, if the Date_Event is in the same month of the combination admission-discharge, then the final_date should be the day after the discharge (e.g., ID = 0007), otherwise if not in the same month than set at the first day of the month (e.g., ID = 0008). Desired output: data DB1; input ID :$20. Date_Admission :date9. Date_Discharge :date9. Event :$20. Date_Event :$20. Index :$20. Final_Date :date9.; format Date_Admission :date9. Date_Discharge :date9. Final_Date :date9.; cards; 0001 18OCT2016 18NOV2016 Change_department MAR2017 1 30SEP2022 0001 17FEB2018 28FEB2018 . . 0 30SEP2022 0004 15APR2016 18APR2016 Change_department AUG2016 1 01DEC2018 0004 01JAN2018 15FEB2018 Death DEC2018 0 01DEC2018 0005 15APR2017 18APR2017 Change_department AUG2017 1 16FEB2019 0005 01JAN2019 15FEB2019 Death FEB2019 0 16FEB2018 0007 11NOV2014 13NOV2014 Change_department NOV2014 1 14NOV2014 0008 12DEC2016 15JAN2017 Change_department FEB2017 1 01FEB2017 .... run; Can anyone help me please? Thank you in advance, Best

Quentin

Suppose I sample 30 widgets from a manufacturing line over the course of a 24 hour day. The sampling times are not necessarily random or evenly distributed. Each widget is inspected and either passes or fails. So I have data like: data have ; input time time5. fail ; format time time5. ; cards ; 00:45 0 01:00 1 02:45 0 03:15 1 03:45 1 05:30 1 05:45 0 06:00 1 06:30 0 06:45 1 07:00 0 09:00 1 10:15 1 12:30 1 14:00 0 16:00 0 17:15 0 17:30 0 17:45 0 19:00 0 20:45 0 21:30 0 21:45 0 22:00 0 22:15 0 22:30 0 22:45 0 23:00 0 23:15 0 23:30 0 ; run ; If I plot the data I get: And looking at the plot, I see two clusters. There was a high failure rate between 0:00 and 12:30, and after 14:00 the failure rate was zero. Of course usually the pattern isn't as clear. I could have data like: At first it looks like there may be three clusters, with a low failure rate clusters between 10:00 and 16:00. But then when you think about it there are only four data points in that period. If the failure rate was constant during the day (.3), it would not be unusual to have four consecutive successes (.7**4=.24). So maybe the correct interpretation is that there are no clusters in that data. Or maybe there are two clusters: 00:00-09:30 with a high failure rate and 10:00-24:00 with a lower failure rate. I think what I'm looking for is a way to group the data into clusters (by time) where consecutive clusters have a failure rate significantly different than the failure rate of neighboring clusters. Or maybe another thing I might want is to identify any clusters (by time) where I can conclude that the failure rate is, say, <.1 (with 95% confidence). Would appreciate any thoughts/suggestions. I realize this is not a well-framed question. I feel like I want some sort of cluster analysis? ChatGPT recommended looking into "temporal scan" methods which apparently create different time window sizes and then compare them. But I didn't find much on that approach that seemed helpful.

Ronein

Hello I want to calculate for each numeric variable- number of rows with null values number of rows with zero values number of rows with positive values number of rows with negative values that not null I want that the output will be out to a data set I don't see the desired output I also want to ask about the user format I have defined. Is null value not included in both groups?? . ='Missing' Low-<0='Neg' I want that null be only in first group (. ='Missing') data have; input x y z R $; cards; 1 2 0 a 3 0 -2 b . 3 3 c . . . r 4 4 4 e 4 4 0 f -1 -1 -1 g ; run; proc format; value numeric_Fmt . ='Missing' 0='Zero' 0<-High='POS' Low-<0='Neg' ; run; proc freq data=have; format _NUMERIC_ numeric_Fmt.; tables _NUMERIC_ /out=want missing missprint nocum nopercent; run;

Ronein

Hello, In want to calculate statistics for all numeric variables. There are some statistics that I don't know how to calculate for each numeric variable - Number of rows with zero value (nr_Zero) Number of rows with missing value (nr_Null) Number of rows with positive value (nr_POS) Number of rows with negative value that is not null (nr_Zero) What is the way to calculate it automatically for each numeric var? proc means data=sashelp.class sum min max q1 median q3 P1 p5 p95 P99 P75 P25 cv SUM Range N mean std STACKODSOUTPUT maxdec=2; var _numeric_; ods output summary=Want1(Rename=(Variable=Var_Name)); run; ods output nlevels=LEVELS; proc freq data=sashelp.class nlevels; tables _numeric_ / noprint ; run; Nr_Null Nr_Zero Nr_POS Nr_Neg

Ronein

Hello Let's say that I want to create a data set based on sas_help.class data set. The new data set will have only 2 columns: Var_name , Var_Value What is the way to do it? I expect that the new data set will have 95 rows (because in sas_help.class have 19 rows with 5 columns so 19*5=95) thanks