25
Apr

SAS®: Getting Started with PROC IML

Another powerful SAS procedure, and my favorite, is PROC IML (Interactive Matrix Language). It treats every object as a matrix, which makes it very useful for scientific computations involving vectors and matrices. To get started, we are going to demonstrate and discuss the following:
  • Creating and Shaping Matrices;
  • Matrix Query;
  • Subscripts;
  • Descriptive Statistics;
  • Set Operations;
  • Probability Functions and Subroutine;
  • Linear Algebra;
  • Reading and Creating Data;
The outline above is based on the SAS/IML tip sheet (see Reference 1). To begin with the first bullet, the code for creating and shaping matrices produces the following output:
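The listing itself is not reproduced here, so what follows is only a minimal sketch of code that would create the matrices printed below. The variable names come from the output; the particular functions chosen (`I`, `j`, `t`, `shape`) are an assumption about how the values were built, and the line numbers mentioned in the text refer to the original listing, not to this sketch.

```sas
proc iml;
  scalar   = 5;                        /* 1 x 1 matrix */
  row_vec  = {1 2 3 4 5 6};            /* spaces separate columns */
  col_vec  = {1, 2, 3, 4, 5, 6};       /* commas separate rows */
  num_mat  = {1 2 3, 4 5 6};           /* 2 x 3 numeric matrix */
  chr_mat  = {"Hello,", "world! :D"};  /* 2 x 1 character matrix */
  i_mat    = I(6);                     /* 6 x 6 identity matrix */
  mat_2    = j(3, 1, 2);               /* 3 x 1 matrix filled with 2 */
  trow_vec = t(row_vec);               /* transpose of row_vec, 6 x 1 */
  mat1     = shape(row_vec, 3, 2);     /* reshape row by row into 3 x 2 */

  print scalar, row_vec, col_vec, num_mat, chr_mat,
        i_mat, mat_2, trow_vec, mat1;

  store;                               /* save all matrices to the storage library */
quit;
```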

scalar
5
row_vec
1 2 3 4 5 6
col_vec
1
2
3
4
5
6
num_mat
1 2 3
4 5 6
chr_mat
Hello,
world! :D
i_mat
1 0 0 0 0 0
0 1 0 0 0 0
0 0 1 0 0 0
0 0 0 1 0 0
0 0 0 0 1 0
0 0 0 0 0 1
mat_2
2
2
2
trow_vec
1
2
3
4
5
6
mat1
1 2
3 4
5 6
With the help of the comments in the code, each line should be easy to follow, so I will only explain line 33. In SAS/IML, matrices defined in a session are not kept once the procedure ends unless they are stored first; a later PROC IML session can then retrieve them by loading the storage, as we'll see in the next entry, Matrix Query. The functions discussed there extract the number of rows, the number of columns, and so on. The code for this step produces the following output:
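As a hedged sketch (not the original listing), a second PROC IML session could load the stored matrices and query them as follows. It assumes the matrices were saved earlier with `store`; the result names are taken from the output below.

```sas
proc iml;
  load;                            /* restore every matrix saved by STORE */
  show names;                      /* list symbols with rows, cols, type, size */

  nmat_row  = nrow(num_mat);       /* number of rows */
  nmat_col  = ncol(num_mat);       /* number of columns */
  nmat_dim  = dimension(num_mat);  /* rows and columns as a 1 x 2 vector */
  cmat_len  = length(chr_mat);     /* length of each string, trailing blanks excluded */
  cmat_nlen = nleng(chr_mat);      /* storage length shared by all elements */
  nmat_typ  = type(num_mat);       /* 'N' for numeric */
  cmat_typ  = type(chr_mat);       /* 'C' for character */

  print nmat_row, nmat_col, nmat_dim, cmat_len,
        cmat_nlen, nmat_typ, cmat_typ;
quit;
```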

SYMBOL    ROWS  COLS  TYPE  SIZE
--------  ----  ----  ----  ----
CHR_MAT      2     1  char     9
COL_VEC      6     1  num      8
I_MAT        6     6  num      8
MAT1         3     2  num      8
MAT_2        3     1  num      8
NUM_MAT      2     3  num      8
ROW_VEC      1     6  num      8
SCALAR       1     1  num      8
TROW_VEC     6     1  num      8
Number of symbols = 10 (includes those without values)

nmat_row
2
nmat_col
3
nmat_dim
2 3
cmat_len
6
9
cmat_nlen
9
nmat_typ
N
cmat_typ
C
To load all the variables kept in storage, we use line 3. The succeeding lines are not difficult to understand, and this is what I love about SAS: the statements and functions are self-explanatory. That is a good excuse to proceed to subscripting on matrices, whose code produces the following output:
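The sketch below is one way to obtain the values printed next, assuming `num_mat` and `i_mat` are defined as in the first example. It is a reconstruction from the output rather than the original listing, so the order of statements and the line numbers cited in the text will differ.

```sas
proc iml;
  num_mat = {1 2 3, 4 5 6};
  i_mat   = I(6);

  n22_mat  = num_mat[2, 2];    /* single element in row 2, column 2 */
  nr1_mat  = num_mat[1, ];     /* entire first row */
  ir12_mat = i_mat[1:2, ];     /* rows 1 to 2, all columns */
  ic12_mat = i_mat[, 1:2];     /* all rows, columns 1 to 2 */

  /* subscript reduction operators */
  ngm_mat = num_mat[:];        /* grand mean */
  ncm_mat = num_mat[:, ];      /* mean of each column */
  nrm_mat = num_mat[, :];      /* mean of each row */
  ngs_mat = num_mat[+];        /* grand sum */
  nss_mat = num_mat[##];       /* sum of squares of all elements */
  nrs_mat = num_mat[##, ];     /* sum of squares of each column */
  ncs_mat = num_mat[, ##];     /* sum of squares of each row */

  print num_mat, n22_mat, nr1_mat, ir12_mat, ic12_mat,
        ngm_mat, ncm_mat, nrm_mat, ngs_mat, nrs_mat, ncs_mat, nss_mat;
quit;
```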

NUM_MAT
1 2 3
4 5 6
n22_mat
5
nr1_mat
1 2 3
ir12_mat
1 0 0 0 0 0
0 1 0 0 0 0
ic12_mat
1 0
0 1
0 0
0 0
0 0
0 0
ngm_mat
3.5
ncm_mat
2.5 3.5 4.5
nrm_mat
2
5
ngs_mat
21
nrs_mat
17 29 45
ncs_mat
14
77
nss_mat
91
nrs_mat
17 29 45
ncs_mat
14
77
Line 17 computes the grand mean of the matrix by simply placing the : symbol in the subscript. With num_mat[:, 1], the mean is computed over the row entries, giving the column mean, in this case for the first column. Likewise, num_mat[1, :] computes the mean over the column entries, giving the mean of the first row. Replacing the symbol with + returns the sum of the entries instead. Further, ## returns the sum of squares of the elements, and reducing this to # returns the product of the elements.
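To make the reduction operators concrete, here is a small illustrative sketch on the same 2 by 3 matrix. The result names `c1_mean`, `r1_mean`, and `g_prod` are hypothetical and do not appear in the original output.

```sas
proc iml;
  num_mat = {1 2 3, 4 5 6};

  c1_mean = num_mat[:, 1];   /* mean of column 1: (1 + 4) / 2 = 2.5 */
  r1_mean = num_mat[1, :];   /* mean of row 1: (1 + 2 + 3) / 3 = 2 */
  g_prod  = num_mat[#];      /* product of all elements: 720 */

  print c1_mean r1_mean g_prod;
quit;
```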

Now let's proceed to the next bullet, which is about Descriptive Statistics.
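A minimal sketch that would give the statistics printed below, assuming `row_vec` and `num_mat` hold the values from the first example. The result names are taken from the output; the original listing may have been organized differently.

```sas
proc iml;
  row_vec = {1 2 3 4 5 6};
  num_mat = {1 2 3, 4 5 6};

  csr_vec = cusum(row_vec);   /* cumulative sum of the vector */
  csn_mat = cusum(num_mat);   /* cumulative sum in row-major order */
  mnr_vec = min(row_vec);     /* smallest element */
  mnn_mat = min(num_mat);
  mxr_vec = max(row_vec);     /* largest element */
  mxn_mat = max(num_mat);
  smr_vec = sum(row_vec);     /* sum of all elements */
  smn_mat = sum(num_mat);
  ssr_vec = ssq(row_vec);     /* sum of squares of all elements */
  ssn_mat = ssq(num_mat);

  print csr_vec, csn_mat, mnr_vec, mnn_mat, mxr_vec, mxn_mat,
        smr_vec, smn_mat, ssr_vec, ssn_mat;
quit;
```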

csr_vec
1 3 6 10 15 21
csn_mat
1 3 6
10 15 21
mnr_vec
1
mnn_mat
1
mxr_vec
6
mxn_mat
6
smr_vec
21
smn_mat
21
ssr_vec
91
ssn_mat
91
To generate random numbers from, say, a normal distribution and compute the mean, standard deviation, and other statistics, consider the following:
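The following is a sketch under stated assumptions, not the original listing. The seed value and the parameters of the second sample (mean 50, standard deviation 5) are guesses that are merely consistent with the magnitudes printed below, so running it will not reproduce those exact numbers, and the line numbers cited in the text do not apply to it.

```sas
proc iml;
  call randseed(123);                 /* hypothetical seed */

  x1 = j(20, 1);                      /* allocate a 20 x 1 matrix */
  call randgen(x1, "Normal");         /* fill with standard normal draws */

  x2 = j(20, 1);
  call randgen(x2, "Normal", 50, 5);  /* assumed mean 50 and sd 5 */

  x12     = x1 || x2;                 /* horizontal concatenation, 20 x 2 */
  x12_cor = corr(x12);                /* correlation matrix of the columns */
  x12_cov = cov(x12);                 /* covariance matrix of the columns */
  x1_mu   = mean(x1);                 /* sample mean of x1 */
  x2_std  = std(x2);                  /* sample standard deviation of x2 */

  print x1, x2, x12, x12_cor, x12_cov, x1_mu, x2_std;
quit;
```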

x1
0.2642335
1.0747269
0.8179241
-0.552775
1.5401449
-1.233822
-0.141535
1.0420036
0.0657322
1.225259
-0.148304
0.2901233
-1.149394
-0.482548
-0.452974
0.2738675
-0.224133
0.218553
-0.420015
0.246356
x2
54.993687
58.167325
59.147705
40.74794
45.813645
53.460273
57.877839
51.98273
49.875743
52.570553
54.097005
46.936325
57.509082
50.463228
42.775346
39.376643
53.303455
54.494482
55.747821
44.512206
x12
0.2642335 54.993687
1.0747269 58.167325
0.8179241 59.147705
-0.552775 40.74794
1.5401449 45.813645
-1.233822 53.460273
-0.141535 57.877839
1.0420036 51.98273
0.0657322 49.875743
1.225259 52.570553
-0.148304 54.097005
0.2901233 46.936325
-1.149394 57.509082
-0.482548 50.463228
-0.452974 42.775346
0.2738675 39.376643
-0.224133 53.303455
0.218553 54.494482
-0.420015 55.747821
0.246356 44.512206
x12_cor
1 -0.001531
-0.001531 1
x12_cov
0.5645625 -0.006864
-0.006864 35.614684
x1_mu
0.1126712
x2_std
5.967804
Line 2 above sets the initial seed for the random numbers generated in line 8. Line 5 allocates a 20 by 1 matrix to the variable x1 using the j function. The number of rows of x1 is the sample size. One can also make x1 a row vector, in which case the number of columns is the sample size. The two random samples, x1 and x2, drawn from the same family of distributions (Gaussian/normal), are then concatenated horizontally (||) in line 13 to form a 20 by 2 matrix. Using this new matrix, x12, we can compute the correlation and covariance of the two columns with the corr and cov functions, respectively. The correlation of about -0.0015 in the output above tells us that there is almost no linear relationship between the two.

SAS can also perform set operations, and it's easy. Consider the following:
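A hedged sketch of the set functions follows. The contents of `A` and `B` are inferred from the printed results (they are the smallest sets consistent with them) and may not be the values used in the original listing.

```sas
proc iml;
  A = {"a" "i" "m" "o" "x"};       /* assumed contents of set A */
  B = {"e" "h" "o" "r" "t" "y"};   /* assumed contents of set B */

  B_comp = setdif(A, B);           /* elements of A that are not in B */
  A_comp = setdif(B, A);           /* elements of B that are not in A */
  AuB    = union(A, B);            /* union of A and B */
  AnB    = xsect(A, B);            /* intersection of A and B */
  AB_unq = unique(A || B);         /* sorted unique values of both vectors */

  print B_comp, A_comp, AuB, AnB, AB_unq;
quit;
```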

B_comp
a i m x
A_comp
e h r t y
AuB
a e h i m o r t x y
AnB
o
AB_unq
a e h i m o r t x y
The next bullet is all about Probability Functions and Subroutine. For example, consider an experiment defined by the random variable $X$, which follows an exponential distribution with mean $\beta = 0.5$. What is the probability that $X$ is at most 2, $\mathrm{P}(X\leq 2)$? To solve this we use the CDF function, but note that the exponential density in SAS is given by $$f(x|\beta)=\frac{1}{\beta}\exp\left[-\frac{x}{\beta}\right].$$ So to compute the probability, we evaluate the following integral, $$ \mathrm{P}(X\leq 2)=\int_{0}^{2}\frac{1}{0.5}\exp\left[-\frac{x}{0.5}\right]\operatorname{d}x = 1-\exp(-4) = 0.9816844 $$ To confirm this in SAS, run the following
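A minimal sketch of the check, assuming the result is stored in `px` as in the output below. The third argument of `cdf` for the exponential distribution is the scale parameter, which matches $\beta$ in the density above.

```sas
proc iml;
  px = cdf("Exponential", 2, 0.5);   /* P(X <= 2) with scale 0.5 */
  print px;
quit;
```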

px
0.9816844
Taking the derivative of the Cumulative Distribution Function (CDF) gives the Probability Density Function (PDF), which in SAS is available through the PDF function. For example, we can confirm the above probability by numerically integrating the PDF. To do so, run the following
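One way to carry out this integration is the `quad` subroutine, which integrates a user-defined module over an interval. The sketch below is an assumption about the approach, and the module name `exp_pdf` is hypothetical.

```sas
proc iml;
  /* integrand: exponential density with scale 0.5 */
  start exp_pdf(x);
    return( pdf("Exponential", x, 0.5) );
  finish;

  call quad(px, "exp_pdf", {0 2});   /* integrate from 0 to 2 */
  print px;
quit;
```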

px
0.9816844
To end this topic, consider the inverse of the CDF, which is the quantile function. To compute the lower-tail quantile of a standard normal distribution at the popular level of significance $\alpha = 0.05$, which is $z_{\alpha} = -1.645$, run
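A minimal sketch, assuming the result is stored in `z_a` as in the output below. With no further arguments, the normal distribution defaults to mean 0 and standard deviation 1.

```sas
proc iml;
  z_a = quantile("Normal", 0.05);   /* lower-tail 5% quantile of N(0, 1) */
  print z_a;
quit;
```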

z_a
-1.644854
The next entry is about Linear Algebra, the topic on which this procedure is based. Linear algebra is very useful in Statistics, especially in Regression, Nonlinear Regression, and Multivariate Analysis. To perform this in SAS, consider
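The input matrix is not shown, so the sketch below uses values inferred from the output: inverting the printed `xm_inv` gives the symmetric matrix used for `x_mat`, and the right-hand side `b` is the vector for which the solution equals the printed `x_coef`. Both `x_mat` and `b` are therefore assumptions, as are their names, and eigenvector signs may differ from the printed ones.

```sas
proc iml;
  x_mat = {1 2 3, 2 4 5, 3 5 6};     /* inferred from the printed inverse */
  b     = {1, 0, 1};                 /* inferred from the printed solution */

  xm_det = det(x_mat);               /* determinant */
  xm_inv = inv(x_mat);               /* matrix inverse */
  call eigen(x_evl, x_evc, x_mat);   /* eigenvalues and eigenvectors */
  x_coef = solve(x_mat, b);          /* solves x_mat * x_coef = b */

  print xm_det, xm_inv, x_evl, x_evc, x_coef;
quit;
```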

xm_det
-1
xm_inv
1 -3 2
-3 3 -1
2 -1 4.441E-16
x_evl
11.344814
0.1709152
-0.515729
x_evc
0.3279853 0.591009 0.7369762
0.591009 -0.736976 0.3279853
0.7369762 0.3279853 -0.591009
x_coef
3
-4
2
Finally, one of the coolest capabilities of SAS/IML is reading and creating SAS data sets. The following code demonstrates how to read a SAS data set.
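The data set is not named in the text. The output below (car makes and a mean horsepower) suggests the `sashelp.cars` sample data, so the sketch assumes that data set and its `Make` and `Horsepower` variables; the names `make` and `hp` are hypothetical.

```sas
proc iml;
  use sashelp.cars;                      /* assumed source data set */
  read all var {Make} into make;         /* character column */
  read all var {Horsepower} into hp;     /* numeric column */
  close sashelp.cars;

  x_dat   = make[1:10];                  /* first ten makes */
  hp_mean = mean(hp);                    /* mean horsepower */

  print x_dat, hp_mean;
quit;
```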

x_dat
Acura
Acura
Acura
Acura
Acura
Acura
Acura
Audi
Audi
Audi
hp_mean
215.88551
And to create a SAS data set, run
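A minimal sketch of writing a matrix to a data set and printing it. The data set name `out_dat` is hypothetical; when no column names are supplied, the variables are named COL1, COL2, and so on, as in the listing below.

```sas
proc iml;
  num_mat = {1 2 3, 4 5 6};

  create out_dat from num_mat;   /* open a new data set in WORK */
  append from num_mat;           /* write the rows of the matrix */
  close out_dat;
quit;

proc print data=out_dat;
run;
```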

Obs  COL1  COL2  COL3
  1     1     2     3
  2     4     5     6
To end this post, I want to say that I am loving SAS because of IML. There are still hidden capabilities of this procedure that I would love to explore and share with my readers, so stay tuned. Another great blog about SAS/IML is The DO Loop, whose author, Dr. Rick Wicklin, is also the principal developer of the procedure and of SAS/IML Studio. Do check it out.

Reference

  1. SAS/IML Tip Sheet. Frequently Used SAS/IML Functions and Subroutines.
  2. SAS/IML 13.2 User Guide.
  3. Rick Wicklin. The DO Loop. How to numerically integrate a function in SAS.