I'm trying to follow the code in the section "A chi-square test for association in SAS" of the article "Test for the equality of two proportions in SAS" on The DO Loop. I need to compare the proportion that tested positive among those tested in one area with the proportion among those tested in another area, and see whether the two proportions are significantly different, but I can't get the code to work. I get this error:
NOTE: Invalid data for N in line 79 1-6.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
79 CountyB Yes 71
Group=CountyA Seq=No N=. _ERROR_=1 _N_=2
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
My full code is:
data underfive;
length Group $15 Test $3;
input Group Test N;
datalines;
CountyA Yes 55
CountyA No 45027
CountyB Yes 71
CountyB No 311726
;
Once I had that in I figured I would run this:
proc freq data=underfive order=data;
weight N;
tables Group*Test/chisq;
run;
I'm using SGPANEL, and would like to highlight certain panels that are interesting. I think I just want a way to set the wallcolor for each panel dynamically, is that possible?
The code below makes three panels, all with the wallcolor set to yellow:
data have ;
input panelid x y ;
cards ;
1 10 10
1 20 20
2 10 15
2 20 15
3 10 20
3 20 10
;
proc sgpanel data=have ;
panelby panelid/ layout=panel;
styleattrs wallcolor="yellow" ;
series x=x y=y;
run ;
Is there a way I can make only the second panel have a yellow wallcolor?
I tried using a band plot in the background, which came close, but it still extended the y-axis even though I set the NOEXTEND option. I could try other methods for adding a yellow bar in the background of the second panel (e.g. a REFLINE instead of a BAND), but before I go down that path, I wondered whether I'm missing an easier way.
My band plot approach:
data want ;
set have ;
if panelid=2 then do ;
lowerband=0 ;
upperband=100 ;
end ;
run ;
proc sgpanel data=want ;
panelby panelid/ layout=panel;
band x=x lower=lowerband upper=upperband /fillattrs=(color=yellow) noextend;
series x=x y=y;
run ;
Returned:
Hello, I am getting the following error messages when trying to merge two datasets. One of the datasets I am getting from a csv file, so maybe the issue could be there? I was trying to specify the length of the PID variable for the redcap_sort dataset from the redcap one, which is the one we got from the csv file. However, I keep getting messages that the variable has multiple lengths and it keeps truncating the data. Any PID after 999 gets shortened. So 1000 and 1001 become 100, 1010 becomes 101, etc. Any help or a nudge in the right direction would be greatly appreciated, thank you so much.
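A minimal sketch of one way the truncation described above can arise and be avoided, assuming PID is a character variable stored with different lengths in the two datasets. The dataset names redcap_demo and other_demo, the variables score and site, and all values are hypothetical stand-ins, not the poster's data. In a DATA step the first length SAS encounters for a variable is the one it keeps, so a LENGTH statement placed before MERGE sets the length used for the merged result:

```sas
/* Hypothetical stand-in: PID stored with a short length */
data redcap_demo;
    length PID $3 score 8;
    input PID $ score;
    datalines;
998 1
999 2
;

/* Hypothetical stand-in: PID stored with a longer length */
data other_demo;
    length PID $4 site $1;
    input PID $ site $;
    datalines;
1000 A
1001 B
998 C
999 D
;

proc sort data=redcap_demo; by PID; run;
proc sort data=other_demo;  by PID; run;

/* Without the LENGTH statement, PID would take length 3 from  */
/* redcap_demo (listed first) and 1000 would be cut to 100.    */
data merged_demo;
    length PID $4;          /* declared before MERGE, so it is used */
    merge redcap_demo other_demo;
    by PID;
run;
```

This only helps if the values are still intact in the input datasets. If PID was already cut short when the csv file was read, the length would need to be fixed at the import step instead.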
**ADDENDUM to original post: I realized that this issue was being caused by starting with a "RETAIN" statement, which I use to put the variables in the desired order. But I'd still like to leave this question up because I'd appreciate any feedback on:
How does a RETAIN statement work?
When does it affect the outputs of a command in a DATA step?
Does anyone have alternate/preferred strategies for reordering the variables in a dataset?
Thanks!
***********************************************************************
Original post:
Hello SAS community,
I'm very confused about how SAS deciphers "IF" Statements in the DATA step. In this specific case, I'm working with an account dataset that has some conflicting information about when accounts close, and I am constructing an "effective" close date. Earlier in my data step, I used some IF statements to construct my desired close date. The last step is to convert that numeric close date to a string variable in the format YYYYMM. Here's what I tried:
DATA WORK.dates_test;
SET WORK.raw_dates;
close_eff_n = acct_close_dte_n;
IF closed = 1 AND acct_close_dte_n = . THEN DO;
close_eff_n = maxdate_n;
END;
*(omitting some additional logic used here for parsimony);
IF close_eff_n > 0 THEN DO;
close_dte_eff = put(close_eff_n,yymmn.);
END;
RUN;
I had earlier written this last segment as:
close_dte_eff = put(close_eff_n,yymmn.);
but this populated the string variable close_dte_eff with a value of "." when close_eff_n was missing, which is why I'm now trying to implement this conditional logic. The problem is: where this condition fails, SAS populates the close_dte_eff field with whatever the last non-failed value was, which is completely incorrect. e.g.
I have:
close_eff_n
01MAR2023
01APR2023
.
.
01JUL2021
I want:
close_eff_n close_dte_eff
01MAR2023 202303
01APR2023 202304
.
.
01JUL2021 202107
But instead I get:
close_eff_n close_dte_eff
01MAR2023 202303
01APR2023 202304
. 202304
. 202304
01JUL2021 202107
When I tried to replicate this problem with a simplified dataset, i.e. just taking the final input variables and creating the desired output, I got the result I want, so I suspect it might have something to do with the preceding IF-statements. I can think of plenty of workarounds to get this to work as intended, so my question is not so much how to fix this, but why is this happening?
There's something fundamental about how the "IF-statement" is being processed where rows that fail the "IF" condition are being populated with the value of the last row that met that condition, and I would like to understand when SAS applies this behavior and when it does not. I can see this being a useful feature in some limited cases, but it's generally not what I would want to do when applying conditional logic. I had thought that these sort of situations where SAS operates on one row depending on what was in the previous row only happen when there is a "BY" statement, but obviously that's incorrect as there is no "BY" statement in this DATA step.
I'd really appreciate some explanation as to when actions are applied to rows that do not meet the specified condition in an "IF" statement, and how to control that behavior, so I can make sure that the commands I write are applying to the rows that I expect them to apply to. Please let me know if I can provide any other context or information that would be helpful. Many thanks, Scott
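The behavior described in the addendum can be reproduced with a small sketch, assuming the original step began with a RETAIN statement that listed close_dte_eff. The sample dates come from the post; the dataset names and the explicit width in yymmn6. are assumptions for illustration. A variable named in RETAIN is not reset to missing at the start of each iteration, so when the IF condition is false, the value assigned on an earlier row is still there:

```sas
/* Sample rows from the post (dataset name is hypothetical) */
data raw_dates;
    format close_eff_n date9.;
    do close_eff_n = '01MAR2023'd, '01APR2023'd, ., ., '01JUL2021'd;
        output;
    end;
run;

/* RETAIN used for ordering: close_dte_eff is computed in this step, */
/* so it keeps its previous value on rows where the IF is false.     */
data with_retain;
    retain close_eff_n close_dte_eff;
    length close_dte_eff $6;
    set raw_dates;
    if close_eff_n > 0 then close_dte_eff = put(close_eff_n, yymmn6.);
run;

/* No RETAIN: close_dte_eff is reset to blank on every iteration,    */
/* so rows with a missing date stay blank.                           */
data no_retain;
    set raw_dates;
    length close_dte_eff $6;
    if close_eff_n > 0 then close_dte_eff = put(close_eff_n, yymmn6.);
run;

/* One possible alternative: reorder in a separate final step,       */
/* where every variable already exists and nothing new is computed.  */
data ordered;
    retain close_eff_n close_dte_eff;
    set no_retain;
run;
```

Variables read by SET are already retained automatically and are overwritten on every row, which is why RETAIN is harmless in the final reordering step but changes the result in a step that also computes new variables conditionally.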