Home

mhoward2

Hi, I have some code that sums up monthly values and assigns each month as a new column. The problem is the code is ugly and I feel like there should be a better way to do this. Here is the code: PROC SQL; CREATE TABLE TMP1DAY.MONTHLY_SUMMARY AS SELECT t1.ID_VARIABLE, SUM(IFN(t1.FIRST_YYYYMM BETWEEN "201901" AND "201901" AND t1.SECOND_YYYYMM BETWEEN "201901" AND "201902", t1.SUM_COLUMN, 0)) AS SUM_COLUMN_201901, SUM(IFN(t1.FIRST_YYYYMM BETWEEN "201901" AND "201902" AND t1.SECOND_YYYYMM BETWEEN "201901" AND "201903", t1.SUM_COLUMN, 0)) AS SUM_COLUMN_201902, SUM(IFN(t1.FIRST_YYYYMM BETWEEN "201901" AND "201903" AND t1.SECOND_YYYYMM BETWEEN "201901" AND "201904", t1.SUM_COLUMN, 0)) AS SUM_COLUMN_201903, SUM(IFN(t1.FIRST_YYYYMM BETWEEN "201901" AND "201904" AND t1.SECOND_YYYYMM BETWEEN "201901" AND "201905", t1.SUM_COLUMN, 0)) AS SUM_COLUMN_201904, SUM(IFN(t1.FIRST_YYYYMM BETWEEN "201901" AND "201905" AND t1.SECOND_YYYYMM BETWEEN "201901" AND "201906", t1.SUM_COLUMN, 0)) AS SUM_COLUMN_201905, SUM(IFN(t1.FIRST_YYYYMM BETWEEN "201901" AND "201906" AND t1.SECOND_YYYYMM BETWEEN "201901" AND "201907", t1.SUM_COLUMN, 0)) AS SUM_COLUMN_201906, SUM(IFN(t1.FIRST_YYYYMM BETWEEN "201901" AND "201907" AND t1.SECOND_YYYYMM BETWEEN "201901" AND "201908", t1.SUM_COLUMN, 0)) AS SUM_COLUMN_201907, SUM(IFN(t1.FIRST_YYYYMM BETWEEN "201901" AND "201908" AND t1.SECOND_YYYYMM BETWEEN "201901" AND "201909", t1.SUM_COLUMN, 0)) AS SUM_COLUMN_201908, SUM(IFN(t1.FIRST_YYYYMM BETWEEN "201901" AND "201909" AND t1.SECOND_YYYYMM BETWEEN "201901" AND "201910", t1.SUM_COLUMN, 0)) AS SUM_COLUMN_201909, SUM(IFN(t1.FIRST_YYYYMM BETWEEN "201901" AND "201910" AND t1.SECOND_YYYYMM BETWEEN "201901" AND "201911", t1.SUM_COLUMN, 0)) AS SUM_COLUMN_201910, SUM(IFN(t1.FIRST_YYYYMM BETWEEN "201901" AND "201911" AND t1.SECOND_YYYYMM BETWEEN "201901" AND "201912", t1.SUM_COLUMN, 0)) AS SUM_COLUMN_201911, SUM(IFN(t1.FIRST_YYYYMM BETWEEN "201901" AND "201912" AND t1.SECOND_YYYYMM BETWEEN "201901" AND "202001", t1.SUM_COLUMN, 0)) AS SUM_COLUMN_201912_01, SUM(IFN(t1.FIRST_YYYYMM BETWEEN "201901" AND "201912" AND t1.SECOND_YYYYMM BETWEEN "201901" AND "202002", t1.SUM_COLUMN, 0)) AS SUM_COLUMN_201912_02, SUM(IFN(t1.FIRST_YYYYMM BETWEEN "202001" AND "202001" AND t1.SECOND_YYYYMM BETWEEN "202001" AND "202002", t1.SUM_COLUMN, 0)) AS SUM_COLUMN_202001, SUM(IFN(t1.FIRST_YYYYMM BETWEEN "202001" AND "202002" AND t1.SECOND_YYYYMM BETWEEN "202001" AND "202003", t1.SUM_COLUMN, 0)) AS SUM_COLUMN_202002, SUM(IFN(t1.FIRST_YYYYMM BETWEEN "202001" AND "202003" AND t1.SECOND_YYYYMM BETWEEN "202001" AND "202004", t1.SUM_COLUMN, 0)) AS SUM_COLUMN_202003, SUM(IFN(t1.FIRST_YYYYMM BETWEEN "202001" AND "202004" AND t1.SECOND_YYYYMM BETWEEN "202001" AND "202005", t1.SUM_COLUMN, 0)) AS SUM_COLUMN_202004, SUM(IFN(t1.FIRST_YYYYMM BETWEEN "202001" AND "202005" AND t1.SECOND_YYYYMM BETWEEN "202001" AND "202006", t1.SUM_COLUMN, 0)) AS SUM_COLUMN_202005, SUM(IFN(t1.FIRST_YYYYMM BETWEEN "202001" AND "202006" AND t1.SECOND_YYYYMM BETWEEN "202001" AND "202007", t1.SUM_COLUMN, 0)) AS SUM_COLUMN_202006, SUM(IFN(t1.FIRST_YYYYMM BETWEEN "202001" AND "202007" AND t1.SECOND_YYYYMM BETWEEN "202001" AND "202008", t1.SUM_COLUMN, 0)) AS SUM_COLUMN_202007, SUM(IFN(t1.FIRST_YYYYMM BETWEEN "202001" AND "202008" AND t1.SECOND_YYYYMM BETWEEN "202001" AND "202009", t1.SUM_COLUMN, 0)) AS SUM_COLUMN_202008, SUM(IFN(t1.FIRST_YYYYMM BETWEEN "202001" AND "202009" AND t1.SECOND_YYYYMM BETWEEN "202001" AND "202010", t1.SUM_COLUMN, 0)) AS SUM_COLUMN_202009, SUM(IFN(t1.FIRST_YYYYMM BETWEEN "202001" AND "202010" AND t1.SECOND_YYYYMM BETWEEN "202001" AND "202011", t1.SUM_COLUMN, 0)) AS SUM_COLUMN_202010, SUM(IFN(t1.FIRST_YYYYMM BETWEEN "202001" AND "202011" AND t1.SECOND_YYYYMM BETWEEN "202001" AND "202012", t1.SUM_COLUMN, 0)) AS SUM_COLUMN_202011, SUM(IFN(t1.FIRST_YYYYMM BETWEEN "202001" AND "202012" AND t1.SECOND_YYYYMM BETWEEN "202001" AND "202101", t1.SUM_COLUMN, 0)) AS SUM_COLUMN_202012_01, SUM(IFN(t1.FIRST_YYYYMM BETWEEN "202001" AND "202012" AND t1.SECOND_YYYYMM BETWEEN "202001" AND "202102", t1.SUM_COLUMN, 0)) AS SUM_COLUMN_202012_02 FROM WORK.PRE_MONTHLY_SUMMARY t1 GROUP BY 1; QUIT; How would I instead write this to do the same thing but programmatically/with macro variables? its already out of hand and I will need to add more years which will just make it even worse. Any and all advice is appreciated. Thanks in advance!

scwein

**ADDENDUM to original post: I realized that this issue was being caused by starting with a "RETAIN" statement, which I use to put the variables in the desired order. But I'd still like to leave this question up because I'd appreciate any feedback on: How does a RETAIN statement work? When does it affect the outputs of a command in a DATA step? Does anyone have alternate/preferred strategies for reordering the variables in a dataset? Thanks! *********************************************************************** Original post: Hello SAS community, I'm very confused about how SAS deciphers "IF" Statements in the DATA step. In this specific case, I'm working with an account dataset that has some conflicting information about when accounts close, and I am constructing an "effective" close date. Earlier in my data step, I used some IF statements to construct my desired close date. The last step is to convert that numeric close date to a string variable in the format YYYYMM. Here's what I tried: DATA WORK.dates_test; SET WORK.raw_dates; close_eff_n = acct_close_dte_n; IF closed = 1 AND acct_close_dte_n = . THEN DO; close_eff_n = maxdate_n; END; *(omitting some additional logic used here for parsimony); IF close_eff_n > 0 THEN DO; close_dte_eff = put(close_eff_n,yymmn.); END; RUN; I had earlier written this last segment as: close_dte_eff = put(close_eff_n,yymmn.); but this populated the string variable close_dte_eff with a value of "." when close_eff_n was missing, which is why I'm now trying to implement this conditional logic. The problem is: where this condition fails, SAS populates the close_dte_eff field with whatever the last non-failed value was, which is completely incorrect. e.g. I have: close_eff_n 01MAR2023 01APR2023 . . 01JUL2021 I want: close_eff_n close_dte_eff 01MAR2023 202303 01APR2023 202304 . . 01JUL2021 202107 But instead I get: close_eff_n close_dte_eff 01MAR2023 202303 01APR2023 202304 . 202304 . 202304 01JUL2021 202107 When I tried to replicate this problem with a simplified dataset, i.e. just taking the final input variables and creating the desired output, I got the result I want, so I suspect it might have something to do with the preceding IF-statements. I can think of plenty of workarounds to get this to work as intended, so my question is not so much how to fix this, but why is this happening? There's something fundamental about how the "IF-statement" is being processed where rows that fail the "IF" condition are being populated with the value of the last row that met that condition, and I would like to understand when SAS applies this behavior and when it does not. I can see this being a useful feature in some limited cases, but it's generally not what I would want to do when applying conditional logic. I had thought that these sort of situations where SAS operates on one row depending on what was in the previous row only happen when there is a "BY" statement, but obviously that's incorrect as there is no "BY" statement in this DATA step. I'd really appreciate some explanation as to when actions are applied to rows that do not meet the specified condition in an "IF" statement, and how to control that behavior, so I can make sure that the commands I write are applying to the rows that I expect them to apply to. Please let me know if I can provide any other context or information that would be helpful. Many thanks, Scott

fwu811

Hi, I would like to apply two types of floors, respectively for lower and upper segments of a dataset. Assume the dataset has two columns: Col1 which is unique and asc sorted; Col2 is the actual data the floor applies to 1) Floor 1 on the lower band of Col1 : for any values in Col1 less than3, which is set to by the user - replace their values in Col2 with the Col2 value of Col1 =3 2)Floor 2 on the upper band of Col1: for any values in Col1 greater than 4, which is set to by the user - replace their values in Col2 with the Col2 value of previous Col1 to ensure they are not decreasing when Col1 values increase Below is an example of the dataset I have and what I want. Many thanks in advance. data have; input Col1 Col2; datalines; 1 1 2 2 3 3 4 4 5 3 6 6 7 1 8 3 ; data want; Col1 Col2 1 3 2 3 3 3 4 4 5 4 6 6 7 6 8 6 ;

genemroz

Esteemed Advisors: I am trying to interleave two datasets with a condition that the resulting dataset contains only observations that can be found in both of the two datasets. Below is exemplar code to illustrate the problem. If you run this code and inspect dataset interleave2 you will see that for a group of 3 observations where target=1, two came from Random_A and one came from Random_B. Likewise, for a group of three observations where target=2, two came from Random_B and one came from Random_A. All of these observations need to be retained in the desired dataset. For the group of 3 observations where target=3, all observations came from Random_B only. These are ones that need to be omitted. All observations for a given target that come from a single source dataset are not to be retained in the desired dataset. The challenge for me (and now for you) is to come up with the code that will interleave Random_A and Random_B such that the resultant dataset that only contains the groups of targets that are present in both datasets. Hope this makes sense and thanks for taking a look, Gene data Random_A (drop=i); call streaminit(4786); do i=1 to 100; Source="A"; Target=rand("Integer",1,100); ST=catx('/',Source,Target); output; end; data Random_B (drop=i); call streaminit(6874); do i=1 to 150; Source="B"; Target=rand("Integer",1,100); ST=catx('/',Source,Target); output; end; Proc sort data=Random_A; by ST; run; Proc sort data=Random_B; by ST; run; data interleave1; set random_A random_B; by ST; run; proc sort data=interleave1 out=interleave2 nounikey; by target; run;

hollandnumerics

My default region is Europe, but I need to create a course for PharmaSUG on the server for United States 1. How do I create an account away from my default region?