Recently in the SAS Community Library: SAS' @StuartRogers provides a close look at the new Microsoft Entra Gallery application and details how it can be used.
data vs;
  input patient cpevent $ vstest $ vsren vsorresu $;
  cards;
100 base   pr     78  min
100 scrn   hr     72  /min
100 week1  sbp    70  mmhg
100 week7  dbp    110 mm
100 week21 weight 75  kg
100 fwp    height 120 kg
100 base   pr     78  min
100 scrn   hr     79  /mn
100 week1  sbp    70  mmhg
100 week7  dbp    110 mm
100 week21 weight 80  pounds
100 fwp    height 75  kg
;
run;
Condition: if a VSTEST has the same VSREN and VSORRESU for 4 consecutive visits, flag those records as "duplicate record".
Based on the condition and data above, how can I create this flag?
Any help would be appreciated.
... View more
I'm trying to create a stacked bar chart comparing MA and TM groups on 8 different binary variables (I've only shown 2 here for simplicity) and one continuous variable with discrete values (0-8). I've created a chart for the continuous variable, and it's pretty close to what I need. Here's what I need to change:
1) The bars are the same color, so it's not clear which group is higher/lower.
2) I'd like a legend showing which group corresponds to each color.
3) I need to replace the variable name total_qual_care_score with a label such as "Total Quality of Care Score".
For the binary variables, I'd like to have both in a single chart. I haven't been able to create an example chart, but imagine that only the 0 and 1 bars exist. And instead of 0 and 1, the columns represent ACC_HCTROUBL_r and ACC_HCDELAY_r (with the labels "Trouble getting care" and "Delay in getting care", respectively).
Here's my current code with sample data:
data have;
infile datalines dsd dlm=',' truncover;
input Obs cohort_flag MA_non_ADRD_group TM_non_ADRD_group total_qual_care_score ACC_HCTROUBL_r ACC_HCDELAY_r;
datalines;
1,1,1,0,7,1,0
2,1,0,1,7,0,1
3,1,0,1,7,1,0
4,1,0,0,1,0,1
5,1,0,0,8,0,1
6,1,0,1,7,0,1
7,1,0,1,3,1,0
8,1,0,1,7,0,1
9,1,0,1,8,1,0
10,1,1,0,5,0,1
11,1,1,0,8,0,0
12,1,0,1,8,1,1
13,1,1,0,8,0,1
14,0,,,7,0,1
15,1,0,1,8,0,1
16,1,0,1,8,0,0
17,1,1,0,8,0,1
18,1,0,0,7,0,0
19,1,1,0,8,1,0
20,1,0,1,6,0,1
21,1,0,1,7,1,1
22,1,1,0,7,0,0
23,1,0,1,5,1,0
24,1,0,1,8,0,1
25,1,0,1,8,0,1
;
run;
title1 "Section 1.2 -- Fig1 Unadj rates quality care TM vs MA without ADRD";
title2 "Version &version.";
PROC MEANS data=have mean n lclm uclm stackods;
class total_qual_care_score;
var TM_non_ADRD_group MA_non_ADRD_group;
ods output summary=temp.TM_MA_groupMean;
WHERE cohort_flag = 1 AND (TM_non_ADRD_group = 1 OR MA_non_ADRD_group = 1);
RUN;
PROC SGPLOT data=temp.TM_MA_groupMean;
vbarparm category=total_qual_care_score response=mean /
  limitlower=lclm
  limitupper=uclm;
label mean="Proportion satisfied";
RUN;
... View more
Hi guys,
Suppose I have the following:
data DB1;
  input ID $ Discharge :date9.;
  format Discharge date9.;
cards;
0001 19JUN2017
0001 07SEP2020
0002 17MAR2016
0003 05MAY2016
0003 08FEB2017
0004 22MAR2017
0004 03MAY2017
0004 28MAR2021
;
data DB2;
  input ID $ Discharge :date9. Flag NewDate :date9.;
  format Discharge NewDate date9.;
cards;
0001 19JUN2017 0 .
0001 07SEP2020 1 08SEP2020
0002 17MAR2016 1 18MAR2016
0003 05MAY2016 0 .
0003 08FEB2017 1 09FEB2017
0004 22MAR2017 0 .
0004 03MAY2017 0 .
0004 28MAR2021 1 29MAR2021
;
Is there a way to add a flag in DB1 marking, for each ID, the latest Discharge date (if an ID has only one date, as for ID 0002, that single date counts as the latest), and then add a NewDate equal to that last date plus 1 day? Desired output: DB2.
Thank you in advance
... View more
Hi all,
At https://developer.sas.com/apis/rest/v3.5/#filters there is supposed to be a link that explains filter expressions:
For a complete description of filter expressions, see the Filtering reference.
However, when clicking this link (on the word "Filtering") I do not see any such page. It looks like this "got lost" during the switch to the new developers.sas.com web site.
Does anyone know where I can find the documentation on API filter expressions?
... View more
Document embeddings (or vectors, as the fashionable like to say) have emerged as a popular area due to the focus on Generative AI. Visual Text Analytics, a SAS Viya offering providing Natural Language Processing (NLP) capabilities, provides an option to train embeddings through the Topics node, backed by the Singular Value Decomposition algorithm. I encourage you to refer here for a detailed discussion of topics.
The purpose of this article is to highlight a sometimes overlooked task when applying document embeddings for purposes of similarity-based search. Normalisation of vectors helps obtain relevant matches.
Why is this important?
First, let's consider vector embeddings. Simply put, these are numerical representations of the text contained within a document. Represented as a series of columns in a table, each column refers to some feature (also known as a dimension) of the source document, and together, these columns represent the document as a whole.
Why do we need to transform a document into embeddings in the first place? Text content can be represented in multiple styles and forms, making it hard to organise, classify and analyse. Motivations for embedding documents include the following:
data standardisation - similar terms are packed as close numbers within dimensions rather than being treated as distinct units
feature engineering - data is organised under different dimensions each of which may carry different meaning
transformation for downstream applications such as analytics and machine learning, for which numbers are more amenable
masking - data is no longer represented as readable text, but as numerical proxies
Now, let's consider the definition of a vector. In mathematics, a vector is a quantity that has both magnitude (length) and direction. Therefore, it isn't just one number (which would make it a scalar) but a set of numbers, one for each dimension.
This is an extremely useful property, since it allows for operations which measure how similar two documents are based on the distance between their vectors. Let's take a simple case involving a two-dimensional vector.
Yes, I know. Poor William's turning over somewhere in Stratford-upon-Avon, but that's the price you pay for fame.
The image above shows vectors for two documents depicted in two-dimensional space. Given their coordinate points, vectors enable calculation of the distance between the embeddings; a simple and common implementation of this is Euclidean distance, which here works out to 1.414 (the approximate square root of 2). As the graph also shows, the vector distance can be viewed as the deviation in direction between the two vectors. A low value indicates that the two documents are more or less similar, which seems to be the case here, albeit to the horror of purists.
However, the utility of the above measure is limited! The reason is that this distance is highly vulnerable to scaling differences which may have been introduced during the embedding training process. Note that embeddings can originate from different sources, and we cannot take their representation as standard. This also affects how we interpret any distance measure that's derived. Is 1.414 small (indicating similar) or large (divergent)? We can't tell without a standard, and a standard is achieved through a process known as normalisation.
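Using the two example vectors from this article, Text 1 = (3, 9) and Text 2 = (4, 8), the raw Euclidean distance can be checked in a few lines of Python (an illustrative sketch only; the article's actual tooling is SAS):

```python
import math

def euclidean_distance(a, b):
    # Straight-line distance between two vectors of equal length:
    # square root of the sum of squared component differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

text1 = [3, 9]
text2 = [4, 8]
print(round(euclidean_distance(text1, text2), 3))  # 1.414
```

This confirms the value quoted above: the distance is the square root of 2, roughly 1.414.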
So, what should I do?
The principle behind vector normalisation is intuitive. Let's consider the same example again.
Let's introduce the unit vector. In the image above, the unit vector lies within the small green box bounded by (1,1). A unit vector is defined as a vector with a magnitude of 1, where magnitude, simply expressed, refers to the length of a vector. Recalling Pythagoras, who used to haunt our geometry books, the magnitude can be calculated using the formula for the hypotenuse of a right-angled triangle, namely,
magnitude = square root(sum of squares of the dimension values)
Another name for the magnitude is norm, hence the term normalising the vector. To arrive at a normalised value, you simply divide the individual vector values by the magnitude. The resultant vector is a unit vector, which acts as a standard for carrying out similarity search and other vector-based operations.
In our simple example, the unit vectors work out to:
Document   Dimension 1           Dimension 2
Text 1     3 / square root(90)   9 / square root(90)
Text 2     4 / square root(80)   8 / square root(80)
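The same arithmetic can be sketched in Python (again purely as an illustration of the maths; the linked program does this in SAS):

```python
import math

def normalise(vec):
    # Divide each component by the vector's magnitude (its length),
    # yielding a unit vector with magnitude 1.
    magnitude = math.sqrt(sum(x * x for x in vec))
    return [x / magnitude for x in vec]

text1 = normalise([3, 9])   # [3 / sqrt(90), 9 / sqrt(90)]
text2 = normalise([4, 8])   # [4 / sqrt(80), 8 / sqrt(80)]

# Both normalised vectors now have magnitude 1
print(round(math.sqrt(sum(x * x for x in text1)), 6))  # 1.0
```

After normalisation, both documents sit on the same scale, so distance (or cosine) comparisons between them become meaningful.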
Do it for me, please ?
Gladly. Please refer here for a SAS program which takes in an input table (or dataset) with vectors, and normalises the columns to a magnitude of 1.
The business end of this program can be found between lines 197 and 329. Notice that this program can run on both CAS and SAS (i.e. SAS 9 / SAS Compute or SAS Programming Runtime Environment) engines and uses array logic to normalise the vectors. Also noteworthy is the use of the dictionary.columns table, which helps identify all "vector" columns in the input table that conform to a given name pattern. Highly convenient when dealing with typical vector data, which does tend to run into the hundreds of columns. Imagine writing an array for each one of those!
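The pattern-matching idea translates outside SAS too. Here's a minimal Python analogue (an illustrative sketch only: the column prefix "vec_" is a made-up example pattern, and the real program uses dictionary.columns and SAS arrays rather than anything shown here):

```python
import math

def normalise_vector_columns(rows, prefix="vec_"):
    # For each row, pick out the vector columns by name pattern,
    # then divide each one by the row's overall magnitude so the
    # row's vector ends up with magnitude 1.
    for row in rows:
        cols = [k for k in row if k.startswith(prefix)]
        magnitude = math.sqrt(sum(row[k] ** 2 for k in cols))
        for k in cols:
            row[k] = row[k] / magnitude
    return rows

table = [{"doc_id": 1, "vec_1": 3.0, "vec_2": 9.0}]
normalise_vector_columns(table)
```

Selecting the columns by pattern, rather than listing them by hand, is what keeps this workable when the vectors span hundreds of columns.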
Give the code a whirl and let me know your feedback. You might also notice that the code has a number of other programs wrapped around it, a strong hint of my intention to also make it available as a SAS Studio Custom Step. Soon.
I want to meet you, shake your hand, and shower praise upon you.
Cash would be better. Actually, thank you, but no sweat. I'm happy to answer further questions, though. You can email me by clicking here. Glad you enjoyed it.
... View more