Recently in the SAS Community Library: SAS' @Sundaresh1 highlights a sometimes overlooked task when applying document embeddings for purposes of similarity-based search. Normalisation of vectors helps obtain relevant matches.
Because of variable length limits, etc., I simply renamed all variables:
i_01
i_02
i_03
But each variable really should be associated with a longer, more descriptive name:
Abc.......
Def.......
Ghi.......
I've imported the dataset using the simple variable names. But I'd like to somehow 'import' the longer respective names as 'labels'. Can that be done? If so, how?
Thanks,
Nicholas Kormanik
... View more
Hi, I am trying to achieve a specific sort order using custom sort for the variables Names and Codes. I have a sample variable with this order below: Names Codes Boys 2795 2796 2900 PHA Total Girls 2795 2796 2900 PHA Total Dads 2795 2796 2900 PHA Total Moms 2795 2796 2900 PHA Total This is the order I wanted Names Codes Dads PHA 2795 2796 2900 Total Moms PHA 2795 2796 2900 Total Boys PHA 2795 2796 2900 Total Girls PHA 2795 2796 2900 Total
... View more
I have a setting file, the top two lines are below(it have 10+ lines).
Need to update the sinx_index value with a macro variable from SAS run.
How to do it?! Any convenient way, thanks,
[global]
sinx_index=100
... View more
Hi folks,
I am attempting to read an xlsx file into SAS without SAS doing anything to modify the the variable types or the characters as they appear when viewing the xlsx file in Excel. In other words, if something looks like a date, regardless of how it's stored in the xlsx file (e.g., "11/25/2023"), it'll show up in sas as a character variable as that same value (e.g., "11/25/2023"). In other words, I'd like to replicate what the Import Data wizard in SAS does when it just defines all fields as character, as in the following attributes for one particular column, for example:
type=String
Source Informat=$CHAR29
Len.=29
Output Format=$CHAR29.
Output Informat=$CHAR29.
The code automatically generated from the Import Data wizard when I do this contains the DATALINES4 statement, which I can't use because I can't just fill all of these in, as I'm trying to load the data in without having to look at the contents. I'd be fine, though, if I can define the variable types/lengths/formats/etc., as I know what the variables are and what they'll be named.
How do I do this without having to fill in the DATALINES4 content?
I need to be able to specify the sheet within the Excel workbook in question as well.
Thanks!
... View more
I have very little experience working with weights, so please correct me if my understanding is wrong.
I'm trying to create a summary table of unadjusted rates of quality of care between the TM and MA groups. I was able to produce a table with Proc Ttest and ODS. However, the survey uses a complex design. I need to add a weight variable and, it appears, replicate weight variables. Unfortunately, Proc Ttest can accommodate a weight variable but not replicate weights.
Just to experiment, I tried running Proc Ttest with the weight variable, and the sig score for the variables improved. That confuses me, because the study documentation says "To permit the calculation of random errors due to sampling, a series of replicate weights were computed. Unless the complex nature is taken into account, estimates of the variance of a survey statistic may be biased downward." In other words, not using weights means underestimating the variance. And if the true variance is actually higher, shouldn't that reduce the significance level? One particular variable I looked at has a probt score of 0.0158 when unweighted, and 0.0025 when weighted.
Based on what I found in the study documentation, I'm trying to use Proc Surveyfreq instead. However, this is confusing me as well. The Pr > ChiSq score is now <.0001 for every variable, even those that were not significant when I used Proc Ttest.
Here is the code, with sample data and Proc TTest commented out. I'm only including 1 of the replicate weights here, but there are actually 100 of them:
data have;
infile datalines dsd dlm=',' truncover;
input ACC_HCTROUBL_r ACC_HCDELAY_r ADRD_group TM_group
PUFFWGT PUFF001;
datalines;
3,1,1,0,1310.792231,1957.576268
2,1,1,1,10621.60998,18588.46812
3,2,1,1,3042.093381,5484.728615
3,2,1,1,3166.358963,5497.289892
3,2,1,0,1481.272986,432.6313548
2,2,1,1,6147.605583,9371.965632
2,1,1,1,14001.79093,16689.25322
3,1,1,1,2035.685768,530.211881
2,1,1,1,6356.258972,1899.874476
3,2,1,0,1487.104781,2018.636444
2,1,1,0,5002.553584,1364.125425
2,2,1,1,2493.79145,4039.542597
3,2,1,0,2260.257377,3495.675613
2,2,0,1,9358.048737,2835.543292
3,2,1,1,2978.506348,4932.378916
3,2,1,1,2794.906054,5118.430973
3,1,1,0,1663.418821,519.7549258
3,2,1,0,2083.459361,3067.105973
2,1,1,0,5106.785048,8672.202644
3,1,1,1,3447.574748,854.6276748
3,2,1,1,2819.233426,899.849234
3,2,1,0,4067.38684,6463.15598
3,2,1,1,1249.96647,2053.666234
3,2,1,1,1730.237908,3058.307502
3,2,1,1,4932.936202,1479.55826
; RUN;
/*PROC TTEST plots=none data=have;
CLASS TM_group;
VAR ACC_HCTROUBL_r ACC_HCDELAY_r;
WEIGHT PUFFWGT;
REPWEIGHT PUFF001; /*REPWEIGHT PUFF001-PUFF100;*/
RUN;*/
PROC SURVEYFREQ data=have VARMETHOD = brr (fay=.30);
TABLE ACC_HCTROUBL_r ACC_HCDELAY_r * TM_group / row chisq lrchisq;
WEIGHT PUFFWGT;
REPWEIGHT PUFF001; /*REPWEIGHT PUFF001-PUFF100;*/
WHERE ADRD_group ^= 1;
RUN;
... View more