Hello
I want to calculate for each numeric variable-
number of rows with null values
number of rows with zero values
number of rows with positive values
number of rows with negative values that not null
I want that the output will be out to a data set
I don't see the desired output
I also want to ask about the user format I have defined.
Is null value not included in both groups??
. ='Missing' Low-<0='Neg'
I want that null be only in first group (. ='Missing')
data have;
input x y z R $;
cards;
1 2 0 a
3 0 -2 b
. 3 3 c
. . . r
4 4 4 e
4 4 0 f
-1 -1 -1 g
;
run;
proc format;
value numeric_Fmt
. ='Missing'
0='Zero'
0<-High='POS'
Low-<0='Neg'
;
run;
proc freq data=have;
format _NUMERIC_ numeric_Fmt.;
tables _NUMERIC_ /out=want missing missprint nocum nopercent;
run;
... View more
Suppose I sample 30 widgets from a manufacturing line over the course of a 24 hour day. The sampling times are not necessarily random or evenly distributed. Each widget is inspected and either passes or fails. So I have data like:
data have ;
input time time5. fail ;
format time time5. ;
cards ;
00:45 0
01:00 1
02:45 0
03:15 1
03:45 1
05:30 1
05:45 0
06:00 1
06:30 0
06:45 1
07:00 0
09:00 1
10:15 1
12:30 1
14:00 0
16:00 0
17:15 0
17:30 0
17:45 0
19:00 0
20:45 0
21:30 0
21:45 0
22:00 0
22:15 0
22:30 0
22:45 0
23:00 0
23:15 0
23:30 0
;
run ;
If I plot the data I get:
And looking at the plot, I see two clusters. There was a high failure rate between 0:00 and 12:30, and after 14:00 the failure rate was zero. Of course usually the pattern isn't as clear. I could have data like:
At first it looks like there may be three clusters, with a low failure rate clusters between 10:00 and 16:00. But then when you think about it there are only four data points in that period. If the failure rate was constant during the day (.3), it would not be unusual to have four consecutive successes (.7**4=.24). So maybe the correct interpretation is that there are no clusters in that data. Or maybe there are two clusters: 00:00-09:30 with a high failure rate and 10:00-24:00 with a lower failure rate.
I think what I'm looking for is a way to group the data into clusters (by time) where consecutive clusters have a failure rate significantly different than the failure rate of neighboring clusters.
Or maybe another thing I might want is to identify any clusters (by time) where I can conclude that the failure rate is, say, <.1 (with 95% confidence).
Would appreciate any thoughts/suggestions. I realize this is not a well-framed question. I feel like I want some sort of cluster analysis? ChatGPT recommended looking into "temporal scan" methods which apparently create different time window sizes and then compare them. But I didn't find much on that approach that seemed helpful.
... View more
Hello
Let's say that I want to create a data set based on sas_help.class data set.
The new data set will have only 2 columns: Var_name , Var_Value
What is the way to do it?
I expect that the new data set will have 95 rows (because in sas_help.class have 19 rows with 5 columns so 19*5=95)
thanks
... View more
Hello,
In want to calculate statistics for all numeric variables.
There are some statistics that I don't know how to calculate for each numeric variable -
Number of rows with zero value (nr_Zero)
Number of rows with missing value (nr_Null)
Number of rows with positive value (nr_POS)
Number of rows with negative value that is not null (nr_Zero)
What is the way to calculate it automatically for each numeric var?
proc means data=sashelp.class sum min max q1 median q3 P1 p5 p95 P99 P75 P25 cv SUM Range N mean std STACKODSOUTPUT maxdec=2;
var _numeric_;
ods output summary=Want1(Rename=(Variable=Var_Name));
run;
ods output nlevels=LEVELS;
proc freq data=sashelp.class nlevels;
tables _numeric_ / noprint ;
run;
Nr_Null
Nr_Zero
Nr_POS
Nr_Neg
... View more
Hello
I have a data set with datetime column with format and informat datetimr22.3
I want to create a new column called time with time value and format time8.
I used the following code
time(effectiveTimestamp) as time format=time8.
As I see the time values are not correct,
I see for example datetime 15MAY2024:03:02:08.977 and calculated time 14:17:42 but as I see it should be 03:02:08
... View more