Hello,
In want to calculate statistics for all numeric variables.
There are some statistics that I don't know how to calculate for each numeric variable -
Number of rows with zero value (nr_Zero)
Number of rows with missing value (nr_Null)
Number of rows with positive value (nr_POS)
Number of rows with negative value that is not null (nr_Zero)
What is the way to calculate it automatically for each numeric var?
proc means data=sashelp.class sum min max q1 median q3 P1 p5 p95 P99 P75 P25 cv SUM Range N mean std STACKODSOUTPUT maxdec=2;
var _numeric_;
ods output summary=Want1(Rename=(Variable=Var_Name));
run;
ods output nlevels=LEVELS;
proc freq data=sashelp.class nlevels;
tables _numeric_ / noprint ;
run;
Nr_Null
Nr_Zero
Nr_POS
Nr_Neg
... View more
Hello
I want to calculate for each numeric variable-
number of rows with null values
number of rows with zero values
number of rows with positive values
number of rows with negative values that not null
I want that the output will be out to a data set
I don't see the desired output
I also want to ask about the user format I have defined.
Is null value not included in both groups??
. ='Missing' Low-<0='Neg'
I want that null be only in first group (. ='Missing')
data have;
input x y z R $;
cards;
1 2 0 a
3 0 -2 b
. 3 3 c
. . . r
4 4 4 e
4 4 0 f
-1 -1 -1 g
;
run;
proc format;
value numeric_Fmt
. ='Missing'
0='Zero'
0<-High='POS'
Low-<0='Neg'
;
run;
proc freq data=have;
format _NUMERIC_ numeric_Fmt.;
tables _NUMERIC_ /out=want missing missprint nocum nopercent;
run;
... View more
Hello
I have a data set with datetime column with format and informat datetimr22.3
I want to create a new column called time with time value and format time8.
I used the following code
time(effectiveTimestamp) as time format=time8.
As I see the time values are not correct,
I see for example datetime 15MAY2024:03:02:08.977 and calculated time 14:17:42 but as I see it should be 03:02:08
... View more
Hello
I want to calculate the following statistics for each numeric variable:
nr,nr_POS,nr_Neg,nr_Nulls,nr_0,nr_distinct_values,AVG,STD,UCL,LCL,Sum,Min,Max,Q1,Median,
Q3,P1,P5,P95,P99,P75,P25,CV,Range
I want to identify automatically the numeric variables.
In this code I perform wide to Long change of data structure.
My question- Is there a better more efficient way to calculate the required statistics? May you show code?
In real life the data set has 2 million rows and 200 numeric variables .
I am not sure if this method of work (Long to wide) is efficient in this case
proc sql;
create table numeric_Vars as
select libname, memname, name, type, length,
format, informat, label
from dictionary.columns
where libname = 'SASHELP' and memname='CLASS' and upcase(type)='NUM'
;
quit;
proc sql noprint;
select name into : Numeric_Vars_List SEPARATED by ' '
from numeric_Vars
;
quit;
%put &Numeric_Vars_List.; /**Age Height Weight**/
proc sql noprint;
select count(*) as nr_numeric_Vars into :nr_numeric_Vars
from numeric_Vars
;
quit;
%put &nr_numeric_Vars.;
data Long_Structure_Numeric_Vars_Data(DROP=&Numeric_Vars_List. J);
set sashelp.class(KEEP=&Numeric_Vars_List.);
array vv{&nr_numeric_Vars.} &Numeric_Vars_List.; /* _NUMERIC_ */
do j=1 to dim(vv);
Var_Name = vname(vv(j));
Var_Value =vv(j);
output;
end;
run;
proc sql;
create table Want1 as
select Var_Name,
count(*) as nr,
sum(case when Var_Value>0 then 1 else 0 end ) as nr_POS,
sum(case when Var_Value<0 AND Var_Value=. then 1 else 0 end ) as nr_Neg,
sum(case when Var_Value=. then 1 else 0 end ) as nr_Nulls,
sum(case when Var_Value=0 then 1 else 0 end ) as nr_0,
count(distinct Var_Value) as nr_distinct_values,
mean(Var_Value) as AVG format=8.2,
STD(Var_Value) as STD format=8.2,
calculated AVG+calculated STD *3 as UCL,
calculated AVG-calculated STD *3 as LCL
from Long_Structure_Numeric_Vars_Data
group by Var_Name
;
quit;
proc means data=sashelp.class sum min max q1 median q3 P1 p5 p95 P99 P75 P25 cv SUM Range /**N mean std**/ STACKODSOUTPUT maxdec=2;
var _numeric_;
ods output summary=Want2(Rename=(Variable=Var_Name));
run;
data want_summary_Report;
merge Want1 Want2;
by Var_Name;
Run;
... View more
Hi guys,
suppose to have the following:
data DB;
input ID :$20. Date_Admission :date9. Date_Discharge :date9. Event :$20. Date_Event :$20. Index :$20. Final_Date :$20.;
format Date_Admission :date9. Date_Discharge :date9. Final_Date :$20.;
cards;
0001 18OCT2016 18NOV2016 Change_department MAR2017 1 0
0001 17FEB2018 28FEB2018 . . 0 0
0002 11SEP2014 15SEP2014 Change_department NOV2014 1 0
0002 13DEC2015 25DEC2015 . . 0 0
0003 12MAR2022 15MAR2022 Change_department APR2022 1 0
0004 15APR2016 18APR2016 Change_department AUG2016 1 0
0004 01JAN2018 15FEB2018 Death DEC2018 0 0
0005 15APR2017 18APR2017 Change_department AUG2017 1 0
0005 01JAN2019 15FEB2019 Death FEB2019 0 0
0006 01APR2022 18APR2017 Change_department MAY2022 1 0
0006 13MAR2023 01MAY2023 Death MAY2023 0 0
0007 11NOV2014 13NOV2014 Change_department JAN2015 1 0
....
run;
I have to set a Final_Date variable based on dates comparisons according to the following rules:
1) If Index = 1 and Date_Event > Date_Discharge + 90 days and missing (Date_Event) for all other combinations of Date_Admission-Date_Event for the same ID then Final_Date = 30SEP2023 (e.g., ID = 0001)
2) If Index = 1 and Date_Event < Date_Discharge + 90 days then delete the entire ID independently if there are other admissions (e.g., ID = 0002).
3) If Index = 1 and and Date_Event < Date_Discharge + 90 days and there is only one combination Date_Admission-Date_Discharge for that ID, then since Date_Event < Date_Discharge + 90 days then delete (e.g., ID = 0003).
4) If Index = 1 and Date_Event > Date_Discharge + 90 days but there are other admissions-discharges and also an event death, then the Final_Date = the date corresponding to the death. Since there is only the indication of Month and Year, if the death is not in the same month of the last admission-discharge, then set the date to the first day of the month (e.g., for ID=0004 the Final_Date should be 01DEC2018). Otherwise, if the death is in the same month of the last admission-discharge (e.g., ID= 0005) then the final_date should be set as the first day after the corresponding discharge to avoid the final_date falls into the range last admission-discharge.
5) If Index = 1 and Date_Event < Date_Discharge + 90 days (e.g., ID = 0006) then remove all records corresponding to that ID (as point 2) independently from the presence of event Death.
6) If Index = 1 and Date_Event > Date_Discharge + 90 days and there is only one admission-discharge (e.g., ID = 0007) then, if the Date_Event is in the same month of the combination admission-discharge, then the final_date should be the day after the discharge (e.g., ID = 0007), otherwise if not in the same month than set at the first day of the month (e.g., ID = 0008).
Desired output:
data DB1;
input ID :$20. Date_Admission :date9. Date_Discharge :date9. Event :$20. Date_Event :$20. Index :$20. Final_Date :date9.;
format Date_Admission :date9. Date_Discharge :date9. Final_Date :date9.;
cards;
0001 18OCT2016 18NOV2016 Change_department MAR2017 1 30SEP2022
0001 17FEB2018 28FEB2018 . . 0 30SEP2022
0004 15APR2016 18APR2016 Change_department AUG2016 1 01DEC2018
0004 01JAN2018 15FEB2018 Death DEC2018 0 01DEC2018
0005 15APR2017 18APR2017 Change_department AUG2017 1 16FEB2019
0005 01JAN2019 15FEB2019 Death FEB2019 0 16FEB2018
0007 11NOV2014 13NOV2014 Change_department NOV2014 1 14NOV2014
0008 12DEC2016 15JAN2017 Change_department FEB2017 1 01FEB2017
....
run;
Can anyone help me please?
Thank you in advance,
Best
... View more