I used PROC IMPORT to read my data, and one of the date fields (e.g. 4/13/24) was imported as a SAS date number. How do I convert that date to DATE9. format (e.g. 12APR2024)? The field also contains a mix of character values like "No date" and date values. How do I handle them?
data pm;
input exp $ Name $ ;
datalines;
45301 John
44986 Jane
45412 Bob
No_date Emily
10/5/24 Don
;
run;
data temp_test;
set pm;
expiry=input(exp,date9.);
run;
Thank you
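One possible approach, as a minimal sketch only. It assumes the five-digit values such as 45301 are Excel serial dates (an assumption the post does not state; under it, 45301 corresponds to 10JAN2024), that values containing a slash are month/day/year, and that anything else, such as No_date, should become missing. The names pm, exp and expiry come from the posted code.

```sas
data temp_test;
  set pm;
  /* Values with a slash: read as m/d/y; ?? suppresses invalid-data notes */
  if index(exp, '/') > 0 then
    expiry = input(exp, ?? mmddyy10.);
  /* ASSUMPTION: all-digit values are Excel serial dates.            */
  /* Excel serial minus 21916 gives the SAS date (days since 1960).  */
  else if notdigit(strip(exp)) = 0 then
    expiry = input(exp, ?? 8.) - 21916;
  /* Anything else (e.g. No_date) becomes a missing date */
  else expiry = .;
  format expiry date9.;
run;
```

If the numbers are already SAS dates rather than Excel serials, the subtraction would have to be dropped, so it is worth checking one known value first.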
Greetings,
I am running parametric survival models and Cox regression models, and I get counterintuitive results. What might be the reason, please? My data set has just 8 events among 630 individuals.
Hi there. 🤔
I have an issue with decimals in numeric fields. I read somewhere that SAS can be pretty accurate up to 15 to 16 decimals. Let's say I have a numeric field formatted Z19.16 (I think I need the format to get more than 10 decimals, which I believe is the standard if the variable is just defined as numeric) that contains 12.775. I need to send this to a mainframe to which we can only send integers. So to make the mainframe understand it is 12.775, I need to send 12775 and the number of decimal positions as a separate number, i.e. 3. Another example would be 12.6802, where I have to send 126802 and 4 to "tell" the mainframe to divide by 10,000.
My problem is this: I am not much into hex codes or the internal form SAS uses for numbers with many digits, either before or after the decimal point. The solutions I found on the net use the SUBSTR and SCAN functions, which normally do well for text variables but apparently also work for numbers with up to 10 digits before or after the decimal point (I guess that relates to the whole hex issue). Yes, I do sound like somebody who needs training to understand this, but I have a deadline and need to solve this quickly.
In short: the code below works for values with up to 10 digits. With more, the result is unpredictable, so I need to code this differently than I did below. I need code that gives me 12 for 77.123456789012, 3 for 66.378, and 14 for 95.12345678901234. How do I code that, people?
Here is what I did to get the right answer (it only works up to 10 decimals). I built a table with some values and then used two methods I found on the internet to determine the number of decimals:
Run the code if you want and then read what I wrote after it.
/* Create a little test data. */
data decimal_play;
format pi_1 pi_2 pi_3 pi_4 pi_5 Z20.17 ;
pi_1 = 3.141592650000000;
pi_2 = 3.141592653000000;
pi_3 = 3.141592653500000;
pi_4 = 3.141592653580000;
pi_5 = 3.141592653589793;
run;
/* Method 1 */
data Method_1 (keep = dec_pi_1 dec_pi_2 dec_pi_3 dec_pi_4 dec_pi_5);
set decimal_play;
dec_pi_1 = lengthn(scan(cat(pi_1),2,'.'));
dec_pi_2 = lengthn(scan(cat(pi_2),2,'.'));
dec_pi_3 = lengthn(scan(cat(pi_3),2,'.'));
dec_pi_4 = lengthn(scan(cat(pi_4),2,'.'));
dec_pi_5 = lengthn(scan(cat(pi_5),2,'.'));
output;
run;
/* Method 2 */
data Method_2 (keep = dec_pi_1 dec_pi_2 dec_pi_3 dec_pi_4 dec_pi_5);
set decimal_play;
first = substr(pi_1, 1, index(pi_1, '.') - 1); second = left(substr(pi_1, index(pi_1, '.') + 1)); dec_pi_1 = LENGTH(second);
first = substr(pi_2, 1, index(pi_2, '.') - 1); second = left(substr(pi_2, index(pi_2, '.') + 1)); dec_pi_2 = LENGTH(second);
first = substr(pi_3, 1, index(pi_3, '.') - 1); second = left(substr(pi_3, index(pi_3, '.') + 1)); dec_pi_3 = LENGTH(second);
first = substr(pi_4, 1, index(pi_4, '.') - 1); second = left(substr(pi_4, index(pi_4, '.') + 1)); dec_pi_4 = LENGTH(second);
first = substr(pi_5, 1, index(pi_5, '.') - 1); second = left(substr(pi_5, index(pi_5, '.') + 1)); dec_pi_5 = LENGTH(second);
run;
So how should I code this nicely? I need code that reads a number and gives me the number as an integer (max 18 digits long, i.e. 2 before the decimal point and 16 after), with the number of decimals in a separate variable.
3.141592650000000 as 314159265 & 8
3.141592653000000 as 3141592653 & 9
3.141592653500000 as 31415926535 & 10
3.141592653580000 as 314159265358 & 11
3.141592653589793 as 3141592653589793 & 15
Best regards,
Menno 🙈
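A hedged sketch of one way to avoid the character functions altogether. SAS stores numeric values as 8-byte floating point, which holds roughly 15 significant digits reliably, so 2 digits before and 16 after the decimal point cannot be guaranteed by any method. The idea below is to scale the value by increasing powers of ten until the rounded integer divides back to exactly the original number. The output names int_pi1-int_pi5 and dec_pi1-dec_pi5 and the data set name want are made up for illustration, and the exact-equality test is an assumption that should be checked against your own data.

```sas
data want;
  set decimal_play;
  array vals[5] pi_1-pi_5;
  array int_pi[5];   /* scaled integers: int_pi1-int_pi5 (hypothetical names) */
  array dec_pi[5];   /* decimal counts:  dec_pi1-dec_pi5 (hypothetical names) */
  do i = 1 to 5;
    /* Find the smallest d for which the value has no further decimals */
    do d = 0 to 15;
      scaled = round(vals[i] * 10**d);
      if scaled / 10**d = vals[i] then leave;
    end;
    /* d = 16 here means no match was found; the outputs stay missing */
    if d <= 15 then do;
      int_pi[i] = scaled;
      dec_pi[i] = d;
    end;
  end;
  format int_pi: 18.;
  drop i d scaled;
run;
```

The intent is that pi_1 yields 314159265 with 8 decimals, as in the list above. Values with more than about 15 significant digits in total may come back missing or with a different count, which reflects the storage limit rather than the method.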
Hi,
I need to convert a time variable, currently character, to a numeric time, but the data is in a non-standard format:
data have ;
input starttm 8.;
datalines;
13.48
9
10
9.3
8.56
11.10
.
1.2
run;
/*
Want the data to look:
13:48
09:00
10:00
09:30
08:56
11:10
.
01:20
*/
data want;
length srtm1 srtm3 srtm4_ srtm4 $5;
set have;
srtm1=put(starttm,time5.);/* Not working. All observations=0:00 */
/* srtm2=put(starttm,hhmmss5.); */ /* Works only for observations 2,3 */
srtm3=put(starttm,tod5.);/* Not working. All observations=0:00 */
srtm4_=tranwrd(strip(put(starttm,best5.)),".",":");/* Not working. Same as original */
if starttm ^=. then do;
if index(srtm4_,":")=3 then srtm4= srtm4_;
else if index(srtm4_,":")=2 then srtm4= srtm4_||"0";
else if index(srtm4_,":")=20 then srtm4= srtm4_||":00";
end;
if index(srtm4_,":")>0 then srtm5= input(srtm4_,time5.); /* Not working for observations 4,6 */
else srtm5= input(srtm4_,hhmmss5.);
format srtm5 time5.;
run;
Any suggestions as to how to get a format that works for each value?
Thanks.
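One way this could be approached, as a sketch under stated assumptions: starttm is numeric, as in the posted have data set, the integer part is the hour, and the fractional part read as two digits is the minutes, so 9.3 means 09:30 and 1.2 means 01:20, matching the wanted output. The variable name start_time is made up for illustration.

```sas
data want;
  set have;
  /* Hours = integer part; minutes = fractional part taken as two digits */
  if not missing(starttm) then
    start_time = hms(int(starttm), round(mod(starttm, 1) * 100), 0);
  /* TOD5. writes hh:mm with a leading zero (e.g. 09:00) */
  format start_time tod5.;
run;
```

The formats tried in the post show 0:00 because they treat 13.48 as 13.48 seconds past midnight. Building a real SAS time value with HMS first gives the format a proper time to display.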
Hello,
I tried using the bootstrapping method described by Rick Wicklin here to calculate the 95% CI around my statistic of interest (the 99th percentile of a variable).
My sample size is about 3200 participants, and I am planning to use 2000 replicates. However, when I ran the code (below), it returned CI limits that do not surround the initial 99th percentile. For example, the 99th percentile of the variable was 0.13 and the CI limits generated were 11.6 and 42.9. Any thoughts on what I did wrong here? I added p99 to step 2, which was not in the original example, but it is my statistic of interest.
Thanks,
Sophie
proc means data=mydata p99; var myvar; run;
%let NumSamples = 2000; /* number of bootstrap resamples */
/* 1. Generate many bootstrap samples */
proc surveyselect data=mydata NOPRINT seed=1
out=BootSSFreq(rename=(Replicate=SampleID))
method=urs /* resample with replacement */
samprate=1 /* each bootstrap sample has N observations */
/*outhits*/ /* OUTHITS option to suppress the frequency var */
reps=&NumSamples; /* generate NumSamples bootstrap resamples */
run;
/* 2. Compute the statistic for each bootstrap sample */
proc means data=BootSSFreq p99 noprint;
by SampleID;
freq NumberHits;
var myvar;
output out=OutStats skew=Skewness; /* approx sampling distribution */
run;
/* 3. Use approx sampling distribution to make statistical inferences */
proc univariate data=OutStats noprint;
var Skewness;
output out=Pctl pctlpre =CI95_
pctlpts =2.5 97.5 /* compute 95% bootstrap confidence interval */
pctlname=Lower Upper;
run;
proc print data=Pctl noobs; run;
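One likely cause, offered as a sketch rather than a definitive diagnosis: in the posted step 2, the P99 keyword on the PROC MEANS statement does not change what the OUTPUT statement writes, which is still skew=Skewness, so step 3 computes percentiles of the bootstrap skewness rather than of the 99th percentile. The version below requests P99 in the OUTPUT statement and analyzes that variable. The output variable name P99 is chosen here for illustration.

```sas
/* 2. Compute the 99th percentile for each bootstrap sample */
proc means data=BootSSFreq noprint;
   by SampleID;
   freq NumberHits;
   var myvar;
   output out=OutStats p99=P99;   /* one P99 value per resample */
run;

/* 3. Percentile interval from the bootstrap distribution of P99 */
proc univariate data=OutStats noprint;
   var P99;
   output out=Pctl pctlpre =CI95_
          pctlpts =2.5 97.5
          pctlname=Lower Upper;
run;

proc print data=Pctl noobs; run;
```

With about 3200 observations, the 99th percentile depends on only the top few dozen values, so the bootstrap interval for it may still be fairly wide or lumpy.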