Home

Ronein

Hello, In want to calculate statistics for all numeric variables. There are some statistics that I don't know how to calculate for each numeric variable - Number of rows with zero value (nr_Zero) Number of rows with missing value (nr_Null) Number of rows with positive value (nr_POS) Number of rows with negative value that is not null (nr_Zero) What is the way to calculate it automatically for each numeric var? proc means data=sashelp.class sum min max q1 median q3 P1 p5 p95 P99 P75 P25 cv SUM Range N mean std STACKODSOUTPUT maxdec=2; var _numeric_; ods output summary=Want1(Rename=(Variable=Var_Name)); run; ods output nlevels=LEVELS; proc freq data=sashelp.class nlevels; tables _numeric_ / noprint ; run; Nr_Null Nr_Zero Nr_POS Nr_Neg

Ronein

Hello I want to calculate for each numeric variable- number of rows with null values number of rows with zero values number of rows with positive values number of rows with negative values that not null I want that the output will be out to a data set I don't see the desired output I also want to ask about the user format I have defined. Is null value not included in both groups?? . ='Missing' Low-<0='Neg' I want that null be only in first group (. ='Missing') data have; input x y z R $; cards; 1 2 0 a 3 0 -2 b . 3 3 c . . . r 4 4 4 e 4 4 0 f -1 -1 -1 g ; run; proc format; value numeric_Fmt . ='Missing' 0='Zero' 0<-High='POS' Low-<0='Neg' ; run; proc freq data=have; format _NUMERIC_ numeric_Fmt.; tables _NUMERIC_ /out=want missing missprint nocum nopercent; run;

Ronein

Hello I have a data set with datetime column with format and informat datetimr22.3 I want to create a new column called time with time value and format time8. I used the following code time(effectiveTimestamp) as time format=time8. As I see the time values are not correct, I see for example datetime 15MAY2024:03:02:08.977 and calculated time 14:17:42 but as I see it should be 03:02:08

Ronein

Hello I want to calculate the following statistics for each numeric variable: nr,nr_POS,nr_Neg,nr_Nulls,nr_0,nr_distinct_values,AVG,STD,UCL,LCL,Sum,Min,Max,Q1,Median, Q3,P1,P5,P95,P99,P75,P25,CV,Range I want to identify automatically the numeric variables. In this code I perform wide to Long change of data structure. My question- Is there a better more efficient way to calculate the required statistics? May you show code? In real life the data set has 2 million rows and 200 numeric variables . I am not sure if this method of work (Long to wide) is efficient in this case proc sql; create table numeric_Vars as select libname, memname, name, type, length, format, informat, label from dictionary.columns where libname = 'SASHELP' and memname='CLASS' and upcase(type)='NUM' ; quit; proc sql noprint; select name into : Numeric_Vars_List SEPARATED by ' ' from numeric_Vars ; quit; %put &Numeric_Vars_List.; /**Age Height Weight**/ proc sql noprint; select count(*) as nr_numeric_Vars into :nr_numeric_Vars from numeric_Vars ; quit; %put &nr_numeric_Vars.; data Long_Structure_Numeric_Vars_Data(DROP=&Numeric_Vars_List. J); set sashelp.class(KEEP=&Numeric_Vars_List.); array vv{&nr_numeric_Vars.} &Numeric_Vars_List.; /* _NUMERIC_ */ do j=1 to dim(vv); Var_Name = vname(vv(j)); Var_Value =vv(j); output; end; run; proc sql; create table Want1 as select Var_Name, count(*) as nr, sum(case when Var_Value>0 then 1 else 0 end ) as nr_POS, sum(case when Var_Value<0 AND Var_Value=. then 1 else 0 end ) as nr_Neg, sum(case when Var_Value=. then 1 else 0 end ) as nr_Nulls, sum(case when Var_Value=0 then 1 else 0 end ) as nr_0, count(distinct Var_Value) as nr_distinct_values, mean(Var_Value) as AVG format=8.2, STD(Var_Value) as STD format=8.2, calculated AVG+calculated STD *3 as UCL, calculated AVG-calculated STD *3 as LCL from Long_Structure_Numeric_Vars_Data group by Var_Name ; quit; proc means data=sashelp.class sum min max q1 median q3 P1 p5 p95 P99 P75 P25 cv SUM Range /**N mean std**/ STACKODSOUTPUT maxdec=2; var _numeric_; ods output summary=Want2(Rename=(Variable=Var_Name)); run; data want_summary_Report; merge Want1 Want2; by Var_Name; Run;

NewUsrStat

Hi guys, suppose to have the following: data DB; input ID :$20. Date_Admission :date9. Date_Discharge :date9. Event :$20. Date_Event :$20. Index :$20. Final_Date :$20.; format Date_Admission :date9. Date_Discharge :date9. Final_Date :$20.; cards; 0001 18OCT2016 18NOV2016 Change_department MAR2017 1 0 0001 17FEB2018 28FEB2018 . . 0 0 0002 11SEP2014 15SEP2014 Change_department NOV2014 1 0 0002 13DEC2015 25DEC2015 . . 0 0 0003 12MAR2022 15MAR2022 Change_department APR2022 1 0 0004 15APR2016 18APR2016 Change_department AUG2016 1 0 0004 01JAN2018 15FEB2018 Death DEC2018 0 0 0005 15APR2017 18APR2017 Change_department AUG2017 1 0 0005 01JAN2019 15FEB2019 Death FEB2019 0 0 0006 01APR2022 18APR2017 Change_department MAY2022 1 0 0006 13MAR2023 01MAY2023 Death MAY2023 0 0 0007 11NOV2014 13NOV2014 Change_department JAN2015 1 0 .... run; I have to set a Final_Date variable based on dates comparisons according to the following rules: 1) If Index = 1 and Date_Event > Date_Discharge + 90 days and missing (Date_Event) for all other combinations of Date_Admission-Date_Event for the same ID then Final_Date = 30SEP2023 (e.g., ID = 0001) 2) If Index = 1 and Date_Event < Date_Discharge + 90 days then delete the entire ID independently if there are other admissions (e.g., ID = 0002). 3) If Index = 1 and and Date_Event < Date_Discharge + 90 days and there is only one combination Date_Admission-Date_Discharge for that ID, then since Date_Event < Date_Discharge + 90 days then delete (e.g., ID = 0003). 4) If Index = 1 and Date_Event > Date_Discharge + 90 days but there are other admissions-discharges and also an event death, then the Final_Date = the date corresponding to the death. Since there is only the indication of Month and Year, if the death is not in the same month of the last admission-discharge, then set the date to the first day of the month (e.g., for ID=0004 the Final_Date should be 01DEC2018). Otherwise, if the death is in the same month of the last admission-discharge (e.g., ID= 0005) then the final_date should be set as the first day after the corresponding discharge to avoid the final_date falls into the range last admission-discharge. 5) If Index = 1 and Date_Event < Date_Discharge + 90 days (e.g., ID = 0006) then remove all records corresponding to that ID (as point 2) independently from the presence of event Death. 6) If Index = 1 and Date_Event > Date_Discharge + 90 days and there is only one admission-discharge (e.g., ID = 0007) then, if the Date_Event is in the same month of the combination admission-discharge, then the final_date should be the day after the discharge (e.g., ID = 0007), otherwise if not in the same month than set at the first day of the month (e.g., ID = 0008). Desired output: data DB1; input ID :$20. Date_Admission :date9. Date_Discharge :date9. Event :$20. Date_Event :$20. Index :$20. Final_Date :date9.; format Date_Admission :date9. Date_Discharge :date9. Final_Date :date9.; cards; 0001 18OCT2016 18NOV2016 Change_department MAR2017 1 30SEP2022 0001 17FEB2018 28FEB2018 . . 0 30SEP2022 0004 15APR2016 18APR2016 Change_department AUG2016 1 01DEC2018 0004 01JAN2018 15FEB2018 Death DEC2018 0 01DEC2018 0005 15APR2017 18APR2017 Change_department AUG2017 1 16FEB2019 0005 01JAN2019 15FEB2019 Death FEB2019 0 16FEB2018 0007 11NOV2014 13NOV2014 Change_department NOV2014 1 14NOV2014 0008 12DEC2016 15JAN2017 Change_department FEB2017 1 01FEB2017 .... run; Can anyone help me please? Thank you in advance, Best