7 Aug

The curse of non-unique eigenvectors

A SAS customer asked, "I computed the eigenvectors of a matrix in SAS and in another software package and got different answers. How do I know which answer is correct?"

I've been asked variations of this question dozens of times. The answer is usually "both answers are correct."

The mathematical root of the problem is that eigenvectors are not unique. It is easy to show this: If v is an eigenvector of the matrix A, then by definition A v = λ v for some scalar eigenvalue λ. Notice that if you define u = α v for a scalar α ≠ 0, then u is also an eigenvector because A u = α A v = α λ v = λ u. Thus a multiple of an eigenvector is also an eigenvector.
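You can verify this fact numerically. The following statements (a minimal sketch; the symmetric matrix A is an arbitrary example) compute an eigenvector in SAS/IML, scale it, and confirm that the scaled vector satisfies the same eigenvector equation:

proc iml;
A = {3 1,
     1 3};
call eigen(lambda, V, A);      /* columns of V are eigenvectors */
v = V[,1];                     /* eigenvector for the largest eigenvalue */
u = 2*v;                       /* a scalar multiple of v */
resid = (A*v - lambda[1]*v) || (A*u - lambda[1]*u);
print resid[colname={"Av-Lv" "Au-Lu"}];   /* both columns are ~0 */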

Most statistical software (including SAS) tries to partially circumvent this problem by standardizing an eigenvector to have unit length (|| v || = 1). However, note that v and -v are both eigenvectors that have the same length. Thus even a standardized eigenvector is only unique up to a ± sign, and different software might return eigenvectors that differ in sign. In fact for some problems, the same software can return different answers when run on different operating systems (Windows versus Linux), or when using vendor-supplied basic linear algebra subroutines such as the Intel Math Kernel Library (MKL).

To further complicate the issue, software might sort the eigenvalues and eigenvectors in different ways. Some software (such as MATLAB) orders eigenvalues by magnitude, which is the absolute value of the eigenvalue. Other software (such as SAS) orders eigenvalues according to the value (of the real part) of the eigenvalues. (For most statistical computations, the matrices are symmetric and positive definite (SPD). For SPD matrices, which have real nonnegative eigenvalues, these two orderings are the same.)

Eigenvectors of an example matrix

To illustrate the fact that different software and numerical algorithms can produce different eigenvectors, let's examine the eigenvectors of the following 3 x 3 matrix:

    M = | 1  2  3 |
        | 4  5  6 |
        | 7  8  9 |

The eigenvectors of this matrix will be computed by using five different software packages: SAS, Intel's MKL, MATLAB, Mathematica, and R. The eigenvalues for this matrix are unique and are approximately 16.1, 0, and -1.1. Notice that this matrix is not positive definite, so the order of the eigenvectors will vary depending on the software. Let's compute the eigenvectors in five different ways.

Method 1: SAS/IML EIGEN Call: The following statements compute the eigenvalues and eigenvectors of M by using a built-in algorithm in SAS. This algorithm was introduced in SAS version 6 and was the default algorithm until SAS 9.4.

proc iml;
reset FUZZ;         /* print very small numbers as 0 */
M = {1  2  3,
     4  5  6,
     7  8  9};
reset EIGEN93;      /* use "9.3" algorithm; no vendor BLAS (option required for SAS 9.4m3) */
call eigen(EigVal, SASEigVec, M);
print EigVal, SASEigVec[colname=("E1":"E3")];

Notice that the eigenvalues are sorted by their real part, not by their magnitude. The eigenvectors are returned in a matrix. The i-th column of the matrix is an eigenvector for the i-th eigenvalue. Notice that the eigenvector for the largest eigenvalue (the first column) has all positive components. The eigenvector for the zero eigenvalue (the second column) has a negative component in the second coordinate. The eigenvector for the negative eigenvalue (the third column) has a negative component in the third coordinate.

Method 2: Intel MKL BLAS: Starting with SAS/IML 14.1, you can instruct SAS/IML to call the Intel Math Kernel Library for eigenvalue computation if you are running SAS on a computer that has the MKL installed. This feature is the default behavior in SAS/IML 14.1 (SAS 9.4m3), which is why the previous example used RESET EIGEN93 to get the older "9.3 and before" algorithm. The output for the following statements assumes that you are running SAS 9.4m3 or later and that your computer has Intel's MKL.

reset NOEIGEN93;   /* use Intel MKL, if available */
call eigen(EigVal, MKLEigVec, M);
print MKLEigVec[colname=("E1":"E3")];

This is a different result than before, but it is still a valid set of eigenvectors. The first and third eigenvectors are the negatives of the eigenvectors in the previous experiment. The eigenvectors are sorted in the same order, but that is because SAS (for consistency with earlier releases) internally sorts the eigenvectors that the MKL returns.

Method 3: MATLAB: The following MATLAB statements compute the eigenvalues and eigenvectors for the same matrix:

M = [1, 2, 3; 4, 5, 6; 7, 8, 9];
[EigVec, EigVal] = eig(M);
EigVec =
-0.2320   -0.7858    0.4082
-0.5253   -0.0868   -0.8165
-0.8187    0.6123    0.4082

The eigenvalues are not displayed, but you can tell from the output that the eigenvalues are ordered by magnitude: 16.1, -1.1, and 0. The eigenvectors are the same as the MKL results (within rounding precision), but they are presented in a different order.

Method 4: Mathematica: This example matrix is used in the Mathematica documentation for the Eigenvectors function:

Eigenvectors[{{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}]
0.283349  -1.28335    1
0.641675  -0.141675  -2
1          1          1

This is a different result, but still correct. The symbolic computations in Mathematica do not standardize the eigenvectors to unit length. Instead, they standardize them to have a 1 in the last component. The eigenvalues are sorted by magnitude (like the MATLAB output), but the first column has opposite signs from the MATLAB output.

Method 5: R: The R documentation states that the eigen function in R calls the LAPACK subroutines. Thus I expect it to get the same result as MATLAB.

M <- matrix(c(1:9), nrow=3, byrow=TRUE)
r <- eigen(M)
r$vectors
           [,1]        [,2]       [,3]
[1,] -0.2319707 -0.78583024  0.4082483
[2,] -0.5253221 -0.08675134 -0.8164966
[3,] -0.8186735  0.61232756  0.4082483

Except for rounding, this result is the same as the MATLAB output.

Summary

This article used a simple 3 x 3 matrix to demonstrate that different software packages might produce different eigenvectors for the same input matrix. There were four different answers produced, all of which are correct. This is a result of the mathematical fact that eigenvectors are not unique: any multiple of an eigenvector is also an eigenvector! Different numerical algorithms can produce different eigenvectors, and this is compounded by the fact that you can standardize and order the eigenvectors in several ways.

Although it is hard to compare eigenvectors from different software packages, it is not impossible. First, make sure that the eigenvectors are ordered the same way. (You can skip this step for symmetric positive definite matrices.) Then make sure they are standardized to unit length. If you do those two steps, then the eigenvectors will agree up to a ± sign.
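If you need to compare eigenvectors programmatically, you can automate the standardization. The following SAS/IML module is a sketch (the module name StdizeEvec is mine): it scales each column of an eigenvector matrix to unit length and flips the sign so that the largest-magnitude element in each column is positive. Applying it to the eigenvector matrices from two packages (after sorting the eigenvalues consistently) should make the results agree up to rounding error:

proc iml;
start StdizeEvec(V);
   W = V / sqrt(V[##,]);          /* scale each column to unit length */
   do j = 1 to ncol(W);
      a = abs(W[,j]);
      k = a[<:>];                 /* index of largest |element| in column j */
      if W[k,j] < 0 then W[,j] = -W[,j];   /* make that element positive */
   end;
   return( W );
finish;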


1 Mar

Monte Carlo estimates of joint probabilities

Monte Carlo techniques have many applications, but a primary application is to approximate the probability that some event occurs. The idea is to simulate data from the population and count the proportion of times that the event occurs in the simulated data.

For continuous univariate distributions, the probability of an event is the area under a density curve. The integral of the density from negative infinity to a particular value is the definition of the cumulative distribution function (CDF) for a distribution. Instead of performing numerical integration, you can use Monte Carlo simulation to approximate the probability.

One-dimensional CDFs

In SAS software, you can use the CDF function to compute the CDF of many standard univariate distributions. For example, the statement prob = cdf("Normal", -1) computes the probability that a standard normal random variable takes on a value less than -1.

The CDF function is faster and more accurate than a Monte Carlo approximation, but let's see how the two methods compare. You can estimate the probability P(X < -1) by generating many random values from the N(0,1) distribution and computing the proportion that is less than -1, as shown in the following SAS DATA step:

data _NULL_;
call streaminit(123456);
N = 10000;                       /* sample size */
do i = 1 to N;
   x = rand("Normal");           /* X ~ N(0,1) */
   cnt + (x < -1);               /* sum of counts for value less than -1 */
end;
Prob = cdf("Normal", -1);        /* P(x< -1) */
MCEst = cnt / N;                 /* Monte Carlo approximation */
put Prob=;
put MCEst=;
run;
Prob=0.1586552539
MCEst=0.1551

The Monte Carlo estimate is correct to two decimal places. The accuracy of this Monte Carlo computation is proportional to 1/sqrt(N), where N is the size of the Monte Carlo sample. Thus if you want to double the accuracy you need to quadruple the sample size.
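You can quantify the accuracy for this example by computing the standard error of a binomial proportion, which is a quick back-of-the-envelope check:

data _NULL_;
p = 0.1586552539;             /* true probability P(X < -1) */
N = 10000;                    /* Monte Carlo sample size */
SE = sqrt(p*(1-p)/N);         /* standard error, about 0.0037 */
put SE=;
run;

The estimate 0.1551 is well within two standard errors of the true probability.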

Two-dimensional CDFs

SAS provides the PROBBNRM function for computing the CDF of a bivariate normal distribution, but does not provide a built-in function that computes the CDF for other multivariate probability distributions. However, you can use Monte Carlo techniques to approximate multivariate CDFs for any multivariate probability distribution for which you can generate random variates.

I have previously blogged about how to use the PROBBNRM function to compute the bivariate normal CDF. The following SAS/IML statements demonstrate how to use a Monte Carlo computation to approximate the bivariate normal CDF. The example uses a bivariate normal random variable Z ~ MVN(0, Σ), where Σ is the correlation matrix with off-diagonal element Σ[1,2] = 0.6.

The example computes the probability that a bivariate normal random variable is in the region G = {(x,y) | x<x0 and y<y0}. The program first calls the built-in PROBBNRM function to compute the probability. Then the program calls the RANDNORMAL function to generate 100,000 random values from the bivariate normal distribution. A binary vector (group) indicates whether each observation is in G. The MEAN function computes the proportion of observations that are in the region.

proc iml;
x0  = 0.3; y0 = 0.4; rho = 0.6; 
Prob = probbnrm(x0, y0, rho);      /* P(x<x0 and y<y0) */
 
call randseed(123456);
N = 1e5;                           /* sample size */
Sigma = (1 || rho) // (rho || 1);  /* correlation matrix */
mean = {0 0};
Z = randnormal(N, mean, Sigma);    /* sample from MVN(0, Sigma) */
group = (Z[,1] < x0 & Z[,2] < y0); /* binary vector */
MCEst = mean(group);               /* = sum(group=1) / N  */
print Prob MCEst;

You can use a scatter plot to visualize the Monte Carlo technique. The following statements create a scatter plot and use the DROPLINE statement in PROC SGPLOT to indicate the region G. Of the 100,000 random observations, 49,780 were in the region G. These observations are drawn in red. The observations that are outside the region are drawn in blue.

ods graphics / width=400px height=400px;
title "Estimate of P(x < x0  and y < y0) is 0.4978";
title2 "x0 = 0.3; y0 = 0.4; rho = 0.6";
call scatter(Z[,1], Z[,2]) group=group grid={x y}
     procopt="noautolegend aspect=1"
     option="transparency=0.9  markerattrs=(symbol=CircleFilled)"
     other="dropline x=0.3 y=0.4 / dropto=both;";

Higher dimensions

The Monte Carlo technique works well in low dimensions. As the dimensions get larger, you need to generate a lot of random variates in order to obtain an accurate estimate. For example, the following statements generate 10 million random values from the five-dimensional distribution of uncorrelated normal variates and estimate the probability of all components being less than 0:

d = 5;                         /* dimension */
N = 1e7;                       /* sample size */
mean = j(1, d, 0);             /* {0,0,...,0} */
Z = randnormal(N, mean, I(d)); /* Z ~  MVN (0, I)  */
v0 = {0 0 0 0 0};              /* cutoff values in each component */
ComponentsInRegion = (Z < v0)[,+]; /* number of components in region */
group = (ComponentsInRegion=d);    /* binary indicator vector */
MCEst = mean(group);               /* proportion of obs in region */
print (1/2**d)[label="Prob"] MCEst;

Because the normal components are independent, the joint probability is the product of the probabilities for each component: 1/2^5 = 0.03125. The Monte Carlo estimate is accurate to three decimal places.

The Monte Carlo technique can also handle non-rectangular regions. For example, you can compute the probability that a random variable is in a spherical region.
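For example, the following statements (a sketch that continues the same PROC IML session, so Z and d are still defined) estimate the probability that a five-dimensional normal variate lies inside the unit sphere. The exact value is available because the squared norm of an MVN(0, I) vector follows a chi-square distribution with d degrees of freedom:

r = 1;                                   /* radius of spherical region */
dist2 = Z[,##];                          /* squared distance to the origin */
MCEstSphere = mean(dist2 < r**2);        /* proportion of obs inside sphere */
ProbSphere = cdf("ChiSquare", r**2, d);  /* exact probability */
print ProbSphere MCEstSphere;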

The Monte Carlo method is computationally expensive for high-dimensional distributions. In high dimensions (say, d > 10), you might need billions of random variates to obtain a reasonable approximation to the true probability. This is another example of the curse of dimensionality.


16 Nov

Need to log-transform a distribution? There's a SAS function for that!

At a conference last week, a presenter showed SAS statements that compute the logarithm of a probability density function (PDF). The log-PDF is a common computation because it occurs when maximizing the log-likelihood function.

The presenter computed the quantity in SAS by using an expression that looked like
y = LOG(PDF("distribution", x, params));
In other words, he computed the PDF and then transformed the density by applying the LOG function.

There is a better way. It is more efficient and more accurate to use the LOGPDF function to compute the log-PDF directly.


Log-transformed distributions

SAS provides several functions for computing with log-transformed distributions. In addition to the LOGPDF function, you can use the LOGCDF function to compute the logarithm of the CDF of a distribution.

Let's use the gamma distribution to see why the LOGPDF function is usually more efficient. The gamma density function with unit scale parameter is given by
f(x; a) = x^(a-1) exp(-x) / Γ(a)
where Γ(a) is the value of the gamma function evaluated at a.

The log-transform of the gamma density is
log(f(x; a)) = (a-1) log(x) - x - log(Γ(a))
This formula, which is used when you compute LOGPDF("gamma", x, 2), is cheap to evaluate and does not suffer from numerical underflow when x is large. In particular, recall that in double-precision arithmetic, exp(-x) underflows if x > 709.78, so the PDF function cannot support extremely large values of x. In contrast, the formula for the log-PDF does not experience any numerical problems when x is large. The log-PDF function is also very fast to compute.

The following DATA step illustrates how to use the LOGPDF function to compute the log-gamma density. It computes the log-gamma PDF two ways: by using the LOGPDF function and by using the log-transform of the gamma PDF. The results show that the numerical values are nearly the same, but that the PDF function fails for large values of x:

data LogPDF(drop=a);
a = 2;
do x = 100, 700, 720, 1000;
   LogOfPDF = log( pdf("Gamma", x, a) ); 
   LOGPDF = logpdf("Gamma", x, a);        /* faster and more accurate */
   diff = LOGPDF - LogOfPDF;
   output;
end;
label LOGOfPDF = "LOG(PDF(x))";
run;
 
proc print data=LogPDF noobs label; run;
Density of the log-gamma(2) distribution for large x

By the way, SAS provides other functions that are optimized for computing the log-transformation of several well-known functions and quantities:

  • The LOGBETA function computes the logarithm of the beta function. You should use logbeta(a,b) instead of log(beta(a,b)).
  • The LGAMMA function computes the logarithm of the gamma function. You should use lgamma(a) instead of log(gamma(a)).
  • The LCOMB function computes the logarithm of the number of combinations of n objects taken k at a time. You should use lcomb(n,k) instead of log(comb(n,k)).
  • The LFACT function computes the quantity log(n!) for specified values of n. You should use lfact(n) instead of log(fact(n)).

In general, when software provides a function for directly computing the logarithm of a quantity, you should use it. The special-purpose function is typically faster, more accurate, and will handle arguments that are beyond the domain of the non-specialized function.
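For example, the following DATA step (a small illustration, not from the original presentation) shows that LCOMB agrees with the logarithm of the COMB function when both can be computed. For larger arguments, COMB overflows, but LCOMB continues to work:

data _NULL_;
logC1 = lcomb(100, 50);         /* log of C(100,50), computed directly */
logC2 = log(comb(100, 50));     /* same value, but COMB overflows for large n */
put logC1= logC2=;
run;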

tags: Numerical Analysis


31 Aug

The Lambert W function in SAS

Branches of the real-valued Lambert W function

This article describes how you can evaluate the Lambert W function in SAS/IML software. The Lambert W function is defined implicitly: given a real value x, the function's value w = W(x) is the value of w that satisfies the equation w exp(w) = x. Thus W is the inverse of the function g(w) = w exp(w).

Because the function g has a minimum value at w=-1, there are two branches to the Lambert W function. Both branches are shown in the adjacent graph. The top branch is called the principal (or positive) branch and is denoted Wp. The principal branch has domain x ≥ -1/e and range W(x) ≥ -1. The lower branch, denoted by Wm, is defined on -1/e ≤ x < 0 and has range W(x) ≤ -1. The subscript "m" stands for "minus" because the range contains only negative values. Some authors use W0 for the upper branch and W-1 for the lower branch.

Both branches are used in applications, but the lower branch Wm is especially useful for the statistical programmer because you can use it to simulate data from heavy-tailed distributions. Briefly: for a class of heavy-tailed distributions, you can simulate data by using the inverse CDF method, which requires evaluating the Lambert W function.

Defining the Lambert W function in SAS/IML

You can download the SAS/IML functions that evaluate each branch of the Lambert W function. Both functions use an analytical approximation for the Lambert W function, followed by one or two iterations of Halley's method to ensure maximum precision:

  • The LambertWp function implements the principal branch Wp. The algorithm constructs a direct global approximation on [-1/e, ∞) based on Boyd (1998).
  • The LambertWm function implements the second branch Wm. The algorithm follows Chapeau-Blondeau and Monir (2002) (hereafter CBM). The CBM algorithm approximates Wm(x) on three different domains. The approximation uses a series expansion on [-1/e, -0.333), a rational-function approximation on [-0.333, -0.033], and an asymptotic series expansion on (-0.033, 0).

Calling the Lambert W function in SAS/IML

Download the SAS/IML program for this article and save it to a file called LambertW.sas. Then the following SAS/IML program shows how to call the functions for the upper and lower branches and plot the graphs:

%include "C:\MyPath\LambertW.sas";      /* specify path to file */
proc iml;
load module=(LambertWp LambertWm);
 
title "Lambert W Function: Principal Branch";
x = do(-1/exp(1), 1, 0.01);
Wp = LambertWp(x);
call series(x, Wp) grid={x y} other="refline 0/axis=x; refline 0/axis=y";
 
title "Lambert W Function: Lower Branch";
x = do(-1/exp(1), -0.01, 0.001);
Wm = LambertWm(x);
call series(x, Wm) grid={x y};

The graphs are not shown because they are indistinguishable from the graph at the beginning of the article.
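As a quick sanity check (assuming that the downloaded modules are loaded as above), you can verify a known value of the principal branch. Because 1*exp(1) = e, it follows that Wp(e) = 1:

wp = LambertWp( exp(1) );    /* should be 1 */
print wp;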

You can use a simple error analysis to investigate the accuracy of these functions. After computing w = W(x), apply the inverse function g(w) = w exp(w) and see how close the result is to the input values. The LambertWp and LambertWm functions support an optional second parameter that specifies how many Halley iterations are performed. For LambertWm, only one iteration is performed by default. You can specify "2" as the second parameter to get two Halley iterations. The following statements show that the maximum error for the default computation is 2E-11 on (-1/e, -0.01). The error decreases to 5E-17 if you request two Halley iterations:

x = do(-1/exp(1), -0.01, 0.001);
Wm = LambertWm(x);            /* default precision: 1 Halley iteration */
g = Wm # exp(Wm);             /* apply inverse function */
maxError1 = max(abs(x - g));  /* maximum error */
 
Wm = LambertWm(x, 2);         /* higher precision: 2 Halley iterations */
g = Wm # exp(Wm);
maxError2 = max(abs(x - g));
print maxError1 maxError2;

In summary, the SAS/IML language provides an excellent environment for implementing new functions or statistical procedures that are not built into SAS. In this article I implemented methods from journal articles for evaluating the upper and lower branches of the Lambert W function, which is the inverse function of g(w) = w exp(w). Although the Lambert W function does not have a closed-form solution, you can implement an approximation to the function and then use one or two Halley iterations to quickly converge within machine precision.

tags: Numerical Analysis


24 Aug

Halley's method for finding roots

Edmond Halley (1656-1742) is best known for computing the orbit and predicting the return of the short-period comet that bears his name. However, like many scientists of his era, he was involved in a variety of mathematical and scientific activities. One of his mathematical contributions is a numerical method for finding the roots of a real-valued function. The technique, which is called Halley's method, is similar to Newton's method, but converges more rapidly in the neighborhood of a root.

There are dozens of root-finding methods, but Newton's method is the gold standard for finding roots of general nonlinear functions. It converges quadratically and is easy to implement because the formula requires only the evaluation of a function and its first derivative. Other competing methods do not use derivatives at all, or they use higher-order derivatives.

Halley's method falls into the latter category. Like Newton's method, it requires an initial guess for the root. It evaluates the function and its first two derivatives at an approximate location of the root and uses that information to iteratively improve the approximation. This article compares Halley's method with Newton's method and suggests a class of functions for which Halley's method is preferable.


An example of finding a root in SAS/IML

Consider the function f(x) = x exp(x). If you want to find the values of x for which f(x)=c, you can recast the problem as a root-finding problem and find the roots of the function f(x)-c. For this particular function, the equation f(x)-c has one root if c ≥ 0 and two roots if -1/e < c < 0.

Halley's method: The roots of the function x*exp(x) + 1/(2e)

To be concrete, let c = -1/(2e) so that the equation has two roots. The function f(x)-c is shown to the right and the two roots are marked by red line segments. You can use the built-in FROOT function in SAS/IML to locate the roots. The FROOT function does not use Newton's method. Instead, it requires that you specify an interval on which to search for a root. From the graph, you can specify two intervals that bound roots, as follows:

proc iml;
/* The equation f(x)-c for f(x) = x exp(x) */
start func(x) global(g_target);
   return( x # exp(x) - g_target );
finish;
 
g_target = -1 / (2*constant('e')); /* parameter; function has two roots */
intervals = {-3 -2,                /* one root in [-3,-2] */
             -1  0};               /* another root in [-1,0] */
roots = froot("func", intervals);  /* find roots in each interval */
print roots;

The FROOT function is effective and efficient. It always finds an approximate root provided that you can supply a bounding interval on which the function changes signs. However, sometimes you don't know a bounding interval, you only know an approximate value for the root. In that case, Newton-type methods are preferable.

Newton's method versus Halley's method

Halley's method is an extension of Newton's method that incorporates the second derivative of the target function. Whereas Newton's method iterates the formula
x_{n+1} = x_n - f(x_n) / f′(x_n),
Halley's method contains an extra term in the denominator:
x_{n+1} = x_n - f(x_n) / [f′(x_n) - 0.5 f(x_n) f″(x_n) / f′(x_n)].
Notice that if f″(x_n) = 0, then Halley's method reduces to Newton's method.

For many functions, the computational cost of evaluating the extra term in Halley's formula does not offset the gain in convergence speed. In other words, it is often quicker and easier to use Newton's simpler formula than Halley's more complicated formula. However, Halley's method is worth implementing when the ratio f″(x) / f′(x) can be evaluated cheaply. The current function provides an example: f′(x) = (x+1) exp(x) and f″(x) = (x+2) exp(x), so the ratio is simply (x+2) / (x+1). This leads to the following SAS/IML functions. One function implements Newton's iteration and the other implements Halley's iteration:

/* Compute iterations of Newton's method for several initial guesses:
   f(x) = x#exp(x) - c */
start NewtonIters(x0, numIters=1) global(g_target);
   z = j(numIters+1, ncol(x0));
   z[1,] = x0;
   do i = 2 to numIters+1;
      x = z[i-1,];
      fx = x # exp(x) - g_target;         /* f(x)   */
      dfdx = (1+x) # exp(x);              /* f'(x)  */
      z[i,] = x - fx / dfdx;              /* new approximate root */
   end;
   return z;
finish;
 
/* Compute iterations of Halley's method for several initial guesses:
   f(x) = x#exp(x) - c */
start HalleyIters(x0, numIters=1) global(g_target);
   z = j(numIters+1, ncol(x0));
   z[1,] = x0;
   do i = 2 to numIters+1;
      x = z[i-1,];
      fx = x # exp(x) - g_target;         /* f(x)   */
      dfdx = (1+x) # exp(x);              /* f'(x)  */
      R = (2+x) / (1+x);                  /* ratio f''(x) / f'(x) */
      z[i,] = x - fx / (dfdx - 0.5*fx#R); /* new approximate root */
   end;
   return z;
finish;

Notice that the functions are practically identical. Halley's method computes an extra term (R) and includes that term in the iterative formula that converges to the root. I wrote the functions in vectorized form so that you can pass in multiple initial guesses and the functions will iterate all guesses simultaneously. The following statements provide four initial guesses and apply both Newton's and Halley's method:

x0 = {-3 -2 -0.5 0.25};        /* multiple initial guesses */
N = NewtonIters(x0, 5);        /* compute 5 Newton iterations */
H = HalleyIters(x0, 3);        /* compute 3 Halley iterations */                
rNames = "Iter=0":"Iter=5";
print N[L="Newton's Method" c=("Guess1":"Guess4") r=rNames];
print H[L="Halley's Method" c=("Guess1":"Guess4") r=rNames];

The tables show that Halley's method tends to converge to a root faster than Newton's method. For the four initial conditions, Newton's method required three, four, or five iterations to converge to within 1e-6 of the root. In contrast, Halley's method used three or fewer iterations to reach the same level of convergence.

Again, for Halley's method to be useful in practice, the ratio f″(x) / f′(x) must be easy to evaluate. To generalize this example, the ratio simplifies if the function is the product of a simple function and an exponential: f(x) = P(x)*exp(x). For example, if P(x) is a polynomial, then f″ / f′ is a rational function.
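To see why, note that if f(x) = P(x) exp(x), then f′(x) = (P + P′) exp(x) and f″(x) = (P + 2P′ + P″) exp(x). The exponential factors cancel in the ratio, so f″ / f′ = (P + 2P′ + P″) / (P + P′), which is a ratio of polynomials.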

In summary, Halley's method is a powerful alternative to Newton's method for finding roots of a function f for which the ratio f″(x) / f′(x) has a simple expression. In that case, Halley's root-finding method is easy to implement and converges to roots of f faster than Newton's method for the same initial guess.

tags: Numerical Analysis


18 May

All I really need to know about Newton's method I learned in primary school

I was eleven years old when I first saw Newton's method.

No, I didn't go to a school for geniuses. I didn't even know it was Newton's method until decades later. However, in sixth grade I learned an iterative algorithm that taught me (almost) everything I need to know about Newton's method for finding the roots of functions.

Newton's method and an iterative square root algorithm

The algorithm I learned in the sixth grade is an iterative procedure for computing square roots by hand. It seems like magic because it estimates a square root from an arbitrary guess. Given a positive number S, you can guess any positive number x_0 and then apply the "magic formula" x_{n+1} = (x_n + S / x_n) / 2 until the iterates converge to the square root of S. For details, see my article about the Babylonian algorithm.

It turns out that this Babylonian square-root algorithm is a special case of Newton's general method for finding the roots of functions. To see this, define the function f(x) = x^2 - S. Notice that the square root of S is the positive root of f. Newton's method says that you can find roots by forming the function Q(x) = x - f(x) / f′(x), where f′ is the derivative of f, which is 2x. With a little bit of algebra, you can show that Q(x) = (x + S / x) / 2, which is exactly the "magic formula" in the Babylonian algorithm.

Therefore, the square root algorithm is exactly Newton's method applied to a special quadratic polynomial whose root is the square root of S. The Newton iteration process is visualized by the following graph (click to enlarge), which shows the iteration history of the initial guess 10 when S = 20.

Iteration of initial guess under Babylonian square-root algorithm
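If you want to reproduce the iteration history, a few lines in a SAS DATA step suffice. The following sketch uses the same value S = 20 and initial guess 10 as the graph:

data _NULL_;
S = 20;  x = 10;              /* same setup as the graph */
do n = 1 to 6;
   x = (x + S/x) / 2;         /* Babylonian (Newton) iteration */
   put n= x= best16.;
end;
run;

After five iterations the iterate agrees with sqrt(20) = 4.47213595... to about ten digits, which shows the quadratic convergence in action.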

Everything I need to know about Newton's method...

It turns out that almost everything I need to know about Newton's method, I learned in sixth grade. The Babylonian method for square roots provides practical tips about how to use Newton's method for finding roots:

  • You need to make an initial guess.
  • If you make a good initial guess, the iteration process is shorter.
  • When you get close to a solution, the number of correct digits doubles for each iteration. This is quadratic convergence in action!
  • Avoid inputs where the function is not defined. For the Babylonian method, you can't plug in x=0. In Newton's method, you need to avoid inputs where the derivative is close to zero.
  • Different initial guesses might iterate to different roots. In the square root algorithm, a negative initial guess converges to the negative square root.

In fact, I'll go further. The Babylonian method teaches important lessons in numerical analysis:

  • Root-finding is a fundamental tool that helps you solve all kinds of numerical problems.
  • Many problems that do not have exact answers can be solved with iterative methods.
  • Iterative methods often linearize the problem and use the linear formulation as part of the algorithm.

So you see, all I really needed to know about Newton's method I learned in sixth grade!

tags: Numerical Analysis


4 Nov

Trap and cap: Avoid division-by-zero and domain errors when evaluating functions

Statistical programmers often need to evaluate complicated expressions that contain square roots, logarithms, and other functions whose domain is restricted. Similarly, you might need to evaluate a rational expression in which the denominator of the expression can be zero. In these cases, it is important to avoid evaluating a function at any point that is outside the function's domain.

This article shows a technique that I call "trap and cap." The "trap" part of the technique uses logical conditions to prevent your software from evaluating an expression at points that are outside the domain of the function. The "cap" part is useful for visualizing functions that have singularities or whose range spans many orders of magnitude.

This technique is applicable to any computer programming language, and it goes by many names. It is one tool in the larger toolbox of defensive programming techniques.

This article uses a SAS/IML function to illustrate the technique, but the same principles apply to the SAS DATA step and PROC FCMP. For an example that uses FCMP, see my article about defining a log transformation in PROC FCMP.

ERROR: Invalid argument to function

The following SAS/IML function involves a logarithm and a rational expression. If you naively try to evaluate the function on the interval [-2, 3], an error occurs:

proc iml;
/* Original function module: No checks on domain of function */
start Func(x);
   numer = exp(-2*x) # (4*x-3)##3 # log(3*x);       /* not defined for x <= 0 */
   denom = x-1;                                     /* zero when x=1 */
   return( numer / denom );
finish;
 
x = do(-2, 3, 0.01);
f = Func(x);
ERROR: (execution) Invalid argument to function.

 count     : number of occurrences is 200
 operation : LOG at line 881 column 40

The error says that there were 200 invalid arguments to the LOG function. The program stops running when it encounters the error, so it doesn't encounter the second problem, which is that the expression numer / denom is undefined when denom is zero.

Experienced SAS programmers might know that the DATA step automatically plugs in missing values and issues warnings for the LOG function. However, I don't like warnings any more than I like errors: I want my program to run cleanly!

Trap bad input values

Because the function contains a logarithm, the function is undefined for x ≤ 0. Because the function is a rational expression and the denominator is zero when x = 1, the function is also undefined at that value.

The goal of the "trap" technique is to prevent an expression from being evaluated at points where it is undefined. Accordingly, the best approach is to compute the arguments of any function that has a restricted domain (log, sqrt, arc-trigonometric functions,...) and then use the LOC function to determine the input values that are within the domain of all the relevant functions. The function is only evaluated at input values that pass all domain tests, as shown in the following example:

proc iml;
/* Trap bad domain values; return missing values when x outside domain */
start Func(x);
   f = j(nrow(x), ncol(x), .);    /* initialize return values to missing */
   /* trap bad domain values */
   d1 = 3*x;                      /* argument to LOG */
   d2 = x-1;                      /* denominator */
   domainIdx = loc( d1>0 & d2^=0 );     /* find values inside domain */
   if ncol(domainIdx)=0 then return(f); /* all inputs invalid; return missing */
 
   t = x[domainIdx];              /* elements of t are valid */
   numer = exp(-2*t) # (4*t-3)##3 # log(3*t);
   denom = t-1;
   f[domainIdx] = numer / denom;  /* evaluate f(t) */
   return( f );
finish;
 
x = do(-2, 3, 0.01);
f = Func(x);

This call does not encounter any errors. The function returns missing values for input values that are outside its domain.

In practice, I advise that you avoid testing floating-point numbers for equality with zero. You can use the ABS function to check that quantities are not close to zero. Consequently, a preferable trapping condition follows:

   domainIdx = loc( d1>0 & abs(d2)>1e-14 );  /* find values inside domain */

I also recommend that you use the CONSTANT function to check for domain values that would result in numerical underflows or overflows, which is especially useful if your expression contains an exponential term.
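For example, the argument of the EXP function in the Func module is -2*x, and in double precision EXP overflows when its argument exceeds about 709.78. A sketch of an expanded trapping condition (the LOGBIG constant returns the logarithm of the largest double) follows:

   big = constant("logbig");    /* largest valid argument to EXP, about 709.78 */
   domainIdx = loc( d1>0 & abs(d2)>1e-14 & -2*x < big );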

Cap it! Graphing functions with singularities

You have successfully evaluated the function even though it has two singularities. Let's see what happens if you naively graph the function. The function has a horizontal asymptote at y=0 and vertical asymptotes at x=0 and x=1, so you can use the REFLINE statement in PROC SGPLOT to draw reference lines that aid the visualization. In SAS/IML, you can use the SCATTER subroutine to create a scatter plot, and pass the REFLINE statements by using the OTHER= option, as follows:

refStmt = "refline 0/axis=y; refline 0 1/axis=x;"; /* add asymptotes and axes */
call scatter(x, f) other=refStmt;

The graph is not very useful (look at the vertical scale!) because the magnitude of the function at one point dwarfs the function values at other points. Consequently, you don't obtain a good visualization of the function's shape. This is often the case when visualizing a function that has singularities.

To obtain a better visualization, manually cap the minimum and maximum value of the graph of the function. In the SGPLOT procedure, you can use the MIN= and MAX= options on the YAXIS statement to cap the view range. If you are using the CALL SCATTER routine in SAS/IML, you can add the YAXIS statement to the OTHER= option, as follows:

/* cap the view at +/-5 */
title "Sketch of Function";
title2 "-5 < y < 5";
capStmt = "yaxis min=-5 max=5;";
call scatter(x,f) other=capStmt + refStmt;

This latest graph provides a good overview of the function's shape and behavior. For intervals on which the function is continuous, you can get a nicer graph if you use the SERIES subroutine to connect the points and hide the markers. The following statements re-evaluate the function on the interval (0, 1) and cap the vertical scale by using -1 < y < 1:

title "Zoom into [0, 1]";
title2 "-1 < y < 1";
x = do(0, 1, 0.01);
f = Func(x);
capStmt = "yaxis min=-1 max=1;";
call series(x,f) other=capStmt + refStmt;

The resulting graph shows that the function has two roots in the interval (0, 1) and a local maximum near x=0.43. A visualization like this is often an important step in finding the roots or local extrema of a function.

In summary, the trap-and-cap technique enables you to evaluate functions that have singularities or that have a restricted domain. For SAS/IML functions, you can use the LOC function to keep only the input values that are in the domain of the function. (In a scalar language, such as the SAS DATA step, you can use IF-THEN/ELSE statements instead.) For functions with singularities, you can use the YAXIS statement in PROC SGPLOT to ensure that a graph of the function reveals interesting features and not just extreme values.

tags: Numerical Analysis, Optimization, Tips and Techniques


24 Jun

The sensitivity of Newton's method to an initial guess

Image from Wikipedia: https://commons.wikimedia.org/wiki/File:Julia_set_for_the_rational_function.png

In my article about finding an initial guess for root-finding algorithms, I stated that Newton's root-finding method "might not converge or might converge to a root that is far away from the root that you wanted to find." A reader wanted more information about that statement.

I have previously shown how to implement Newton's method in SAS. The behavior of Newton's method depends on the initial guess. If you provide a guess that is sufficiently close to a simple root, Newton's method will converge quadratically to the nearby root. However, if your guess is near a critical point of the function, Newton's method will produce a "next guess" that is far away from the initial guess. Further iterations might converge to an arbitrary root, might endlessly cycle in a periodic or aperiodic manner, or might diverge to infinity.

The dynamics of Newton iteration can be quite complex. If you owned a PC in the 80's and early 90's, you might have spent countless hours computing Mandelbrot sets and Julia sets. You might have seen pictures like the one at the beginning of this article, which show the domains of attraction for Newton's iteration for a cubic polynomial. In the picture, each point in the complex plane is colored according to which root Newton's method converges to when it begins at that point. The points that eventually converge to a root are the Fatou set, whereas the points that do not converge form the Julia set.

The sensitivity of Newton's method

You can perform the same kind of computer experiments for Newton's method applied to a real function. (You can download the SAS/IML programs that I used to create the graphs in this article.) Consider the polynomial f(x) = x^5 - 2x^4 - 10x^3 + 20x^2 + 9x - 18. The roots of the polynomial are {-3, -1, 1, 2, 3}. The polynomial has critical points (where the derivative vanishes) near -2.3, -0.2, 1.5, and 2.6. Recall that Newton's method involves iteration of the rational function N(x) = x - f(x)/f′(x), which has singularities at the critical points of f.
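You can see the sensitivity directly by taking a single Newton step from a guess near a critical point. The following sketch (the module names f and df are mine) evaluates the polynomial and its derivative and shows that the next iterate lands far from the initial guess:

proc iml;
start f(x);
   return( x##5 - 2*x##4 - 10*x##3 + 20*x##2 + 9*x - 18 );
finish;
start df(x);
   return( 5*x##4 - 8*x##3 - 30*x##2 + 40*x + 9 );   /* derivative of f */
finish;
 
x0 = -0.198;               /* near the critical point at about -0.2 */
x1 = x0 - f(x0)/df(x0);    /* one Newton step */
print x0 x1;               /* x1 is hundreds of units away from x0 */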

You can ask the following question: For each point, x, to which root does Newton's method converge when x is the initial guess? You can also keep track of how many iterations it takes for Newton's method to converge to a root.


If you apply Newton's method to 250 initial conditions on the interval [-4, 4], you get the results that are summarized in the needle plot to the left. (Click to enlarge.) The color of the needle at x indicates the root to which x converges under Newton's method. The height of the needle indicates the number of iterations required to converge.

You can see that initial guesses that are close to a root converge to the nearby root in five or fewer iterations. Near the critical points, Newton's method requires more iterations to converge, often more than 10 and sometimes more than 20 iterations. The regions near the critical points are interlaced bands of color, which indicates that the dynamics of Newton's method is complicated in those regions. A small change in the initial guess can result in a big difference in the Newton iterations.


The dynamics near 0 seem interesting, so let's zoom into that region. In particular, repeat the previous computation for 250 initial conditions on the interval [-0.5, 0.5]. The needle plot to the left reveals additional details. All roots can be reached from initial guesses in this region, even though the nearest roots are at x = -1 and x = 1 (roots #2 and #3). Again, there are regions for which many iterations are required and there is an interlacing of colors, which indicates that Newton's method is sensitive to the initial guess.

You can zoom again into a multicolored region and repeat the computations on a new interval. The behavior continues ad infinitum. You can find two initial guesses that differ by an arbitrarily small amount, yet their iterations under Newton's method eventually diverge from each other and converge to different roots. This is known as sensitive dependence on initial conditions, or, more poetically, as the butterfly effect.

Conclusions

Newton's method is one of my favorite root-finding techniques. However, it is important to understand that the famous quadratic convergence of Newton's method applies to initial guesses that are close to a root. For an arbitrary initial guess, Newton's method can result in divergence, periodic orbits, or convergence to a far-away root. Fractals and chaos are fun topics to explore, but not when you need to find a root as part of your work. Therefore I recommend using a pre-search method, as described in my previous article, to obtain a good initial guess.


tags: Math, Numerical Analysis


22 Jun

Finding roots: Automating the search for an initial guess

A SAS programmer asked an interesting question on a SAS Support Community. The programmer had a nonlinear function with 12 parameters. He also had a file that contained 4,000 lines, where each line contained values for the 12 parameters. In other words, the file specified 4,000 different functions. The programmer wanted to know how to use SAS to find at least one root for each of the 4,000 functions.

Different kinds of root-finding algorithms

Root-finding algorithms fall into two general classes: "shooting methods" and "bounding methods." Shooting methods include the secant algorithm and Newton's method. These iterative methods use derivative information to try to predict the location of a root from a guess. These methods are indispensable for multivariate functions, and you can read about how to implement Newton's method in SAS. Unfortunately, shooting methods might not converge or might converge to a root that is far away from the root that you wanted to find.

For univariate functions, bounding methods are simpler. Bounding methods include the bisection method and Brent's method. The advantage of bounding methods is that they are guaranteed to find a root inside a bounding interval. A bounding interval is an interval [a,b] for which f(a) and f(b) have different signs. If f is a continuous function, there must be a root in the interval.
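For example, a bare-bones bisection sketch in SAS/IML looks like the following. (The cubic function is an arbitrary example; in practice, use the built-in FROOT function, shown later in this article, because it is more robust.)

proc iml;
start Func(x);
   return( x##3 - x - 2 );      /* example function; root near 1.52 */
finish;
 
/* bisection sketch: requires that f(a) and f(b) have different signs */
start Bisection(a0, b0, tol=1e-10);
   a = a0;  b = b0;
   do while (b - a > tol);
      c = (a + b) / 2;          /* midpoint of bounding interval */
      if Func(a) # Func(c) > 0 then a = c;   /* root is in [c, b] */
      else b = c;               /* root is in [a, c] */
   end;
   return( (a + b) / 2 );
finish;
 
root = Bisection(1, 2);
print root;                     /* about 1.5214 */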

How to approximately find a root?

For both classes of methods, it is important to determine an approximate location for the root. If you have a function of one variable, you can graph the function on a wide domain such as [-20,20] or [-100, 100] to approximate the location of roots. If necessary, redraw the graph on a smaller domain to "zoom in" on the details.

For multivariate functions, use the tips in the article "How to find an initial guess for an optimization," and apply them to the squared norm of the function, ||f||^2, which is the sum of squares of the components of the function. When ||f||^2 is small, there might be a root nearby.

How to automate the search for a root?

Graphing a univariate function usually means evaluating the function at many points in the domain and plotting the ordered pairs (x, f(x)). Graphing enables you to see the approximate x values for which f changes signs.

However, you don't actually need to create the graph to use this technique. You could evaluate the function on a uniform subdivision in the domain and then use a program to detect intervals where the function changes signs. I call this a "pre-search" technique, because it isn't actually solving for roots, it is merely finding regions that contain roots.

To find regions that contain roots, choose a large interval [a,b] and subdivide that interval into n smaller subintervals, each of size (b-a)/n. Evaluate the function at each point of the subdivision and locate the intervals for which the function changes sign. You can use the DIF function and the SIGN function to locate sign changes.


Let's see how this technique works in practice. The following SAS/IML function evaluates a fifth-degree polynomial by using Horner's method. The graph of the polynomial (shown to the left) reveals that it has roots near the values {-3, -1, 1, 2.5, 3}. The MeshEval function evaluates the polynomial at evenly spaced points and returns the subintervals on which the polynomial changes signs:

proc iml;
/* Evaluate fifth-degree polynomial function. Make sure this function can 
   evaluate a vector of x values. 
   To generalize, write:  start Func(x) global(p);   */
start Func(x);
   /* c5*x##5 + c4*x##4 + ... + c0 */
   p = {1 -2 -10 20 9 -14};       /* Coefficients: p[1]=c5, p[2]=c4, etc */
   y = j(nrow(x), ncol(x), p[1]); /* initialize to coef of largest deg */
   do j = 2 to ncol(p);           /* Horner's method of evaluation */
      y = y # x + p[j];
   end;
   return(y);
finish;
 
/* Evaluate function 'Func' and return intervals where function
   changes sign. Specify a large interval [a,b] and the number, n, 
   of equally spaced subdivisions of [a,b].    */
start MeshEval(ab, n);
   x = ab[1] + range(ab)/n * (1:n);         /* n values of x */
   y = Func(x);                             
   difSgn = dif(sign(y));                   
   idx = loc(difSgn=2 | difSgn=-2);
   if IsEmpty(idx) then return ( {. .} );
   return( x[idx-1] || x[idx] );
finish;
 
ab = {-10 10};                /* search for roots on this big domain */
n = 100;                      /* break up into n smaller intervals */
intervals = MeshEval(ab,n);   /* intervals on which the function changes sign */
F = Func(intervals);
print intervals[c={"a" "b"} L=""]   F[c={"F(a)" "F(b)"} L=""];

The table shows five bounding intervals on which the function changes signs. Because the function is continuous, there are roots in the intervals [-3.2, -3], [-1, -0.8], [0.6, 0.8], [2.2, 2.4], and [2.8, 3]. You can use the FROOT function to find the roots in each bounding interval:

roots = froot("Func", intervals);
print roots;

The FROOT function found all five roots, one root per bounding interval.

How to find the roots of 4,000 functions?

Returning now to the problem asked in the SAS Support Community, the infrastructure is in place to solve for the roots of an arbitrary number of polynomials. All you need to do is add a GLOBAL statement to the START statement and pass in p as a global variable that contains the coefficients. For each set of coefficients in the file, assign p and call the MeshEval and FROOT functions, as sketched below.
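A sketch of the driver loop might look like the following. (The data set name Coeffs and its layout are hypothetical, and the Func module is assumed to be modified to read the coefficients from the GLOBAL variable p.)

use Coeffs;                      /* hypothetical data set: 4,000 rows of coefficients */
read all var _NUM_ into C;
close Coeffs;
 
roots = j(nrow(C), 1, .);
do i = 1 to nrow(C);
   p = C[i,];                    /* set the global coefficient vector */
   intervals = MeshEval({-10 10}, 100);
   if intervals[1,1] ^= . then
      roots[i] = froot("Func", intervals[1,]);   /* find one root per function */
end;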

In fact, the Func function is written so that it can evaluate a polynomial of any degree, so the same function will work regardless of the number of elements in the p vector.

This approach will always succeed for odd polynomials because you can always find a bounding interval. However, it might not find all roots, and this approach can fail for an arbitrary function. For example, the quadratic polynomial f(x) = x^2 - ε^2 has roots at ±ε. When ε is small, the MeshEval function might not find a bounding interval unless n is very large. If you are sure that the function has a real root, but MeshEval does not find it, you can iteratively increase the value of n and call MeshEval until the root is found.

tags: Numerical Analysis


8 Apr

Compute the rank of a matrix in SAS

A common question from statistical programmers is how to compute the rank of a matrix in SAS. Recall that the rank of a matrix is defined as the number of linearly independent columns in the matrix. (Equivalently, the number of linearly independent rows.) This article describes how to compute the rank of a matrix in SAS by using functions in SAS/IML software.

An important application of the rank is to determine whether a square matrix is nonsingular. The techniques in this article can also help you decide whether a square matrix is singular.

The rank of a matrix is one of those vexing problems in numerical linear algebra. Mathematically, the question is not difficult. However, in finite-precision arithmetic, computing the rank of a matrix (and whether it is nonsingular) can be challenging. This article suggests some best practices. The details have filled many books, journal articles, and PhD dissertations.

The mathematical rank of a matrix

In theory, you can use Gaussian elimination to compute the rank of a matrix. You reduce the matrix to row echelon form; the rank is the number of rows that contain a nonzero element.

For square matrices, the same mathematical process determines whether a matrix is nonsingular. Other mathematical facts are also true: a singular matrix has a determinant equal to 0, at least one eigenvalue that is 0, and at least one singular value that is 0.

However, these mathematical facts are not good ways to numerically determine whether a matrix is rank-deficient or singular. A naive implementation of Gaussian elimination is numerically unstable. A numerical determinant that should be mathematically zero might be computed as a very tiny nonzero number in finite-precision arithmetic. Or an ill-conditioned but non-singular matrix might have a determinant so small that it is numerically indistinguishable from a singular matrix. In any case, you should never compare a numerical value to 0.

The numerical rank of a matrix

So what's a statistical programmer to do? For the rank problem, there are several numerically robust techniques, but I will describe two that are based on the singular value decomposition (SVD). Some researchers use methods based on the QR decomposition, which is faster to compute but less reliable.

Mathematically, the rank of a matrix equals the number of nonzero singular values. Therefore a simple algorithm to estimate the rank is to use the SVD routine in SAS/IML to compute the singular values and then declare that any singular value smaller than some cutoff value is numerically the same as zero. The following example computes and displays the singular values for a 6x5 matrix by using the SVD:

proc iml;
/* rank(A)=4 because only four linearly independent columns */
A = {1 0 1 0 0,
     1 0 0 1 0,
     1 0 0 0 1,
     0 1 1 0 0,
     0 1 0 1 0,
     0 1 0 0 1 };
 
call svd(U, Q, V, A);
print Q[L="SingVals"];

You can see that the last singular value is tiny, and in fact the matrix is rank 4. Although you might be tempted to use a fixed cutoff value such as 1e-8 to determine which singular values are "small," it is usually better to incorporate factors that reflect the scale of the problem. A common cutoff value (used by MATLAB) is δ = d σ_max ε, where d is the maximum dimension of A, σ_max is the maximum singular value of A, and ε is "machine epsilon." The machine epsilon can be computed by using the CONSTANT function in SAS. The following example computes the cutoff value and estimates the rank of the matrix:

tol = max(dimension(A)) * constant("maceps") * (max(Q));
rankSVD = sum(Q > tol);
print tol rankSVD;

There are other techniques that you can use to estimate the rank of a matrix. The SAS/IML documentation recommends using the generalized-inverse technique. The mathematical properties of the generalized inverse (also known as the Moore-Penrose inverse) are explained in a short 1978 article by M. James in The Mathematical Gazette.

The generalized inverse always exists, even for nonsquare matrices and for singular matrices. (If A is nonsingular, then the generalized inverse is the same as the inverse matrix A^{-1}.) To reveal the rank of A, compute the trace of the product of the generalized inverse with A. For a nonsingular matrix, the trace is obviously the dimension of A. For nonsquare or singular matrices, the (rounded) trace is an estimate for the rank of the matrix (Penrose, 1954, p. 408), as follows:

rankGInv = round(trace(ginv(A)*A));
print rankGInv;

The generalized-inverse technique does not enable you to specify a tolerance/scaling parameter, but a parameter is used internally by the GINV function to compute the generalized inverse.

It is useful to package up these techniques into a module, as follows:

start RankMatrix(A, method="SVD", tol=);
   if upcase(method)="SVD" then do;
      call svd(U, Q, V, A);
      if IsEmpty(tol) then 
         tol = max(dimension(A))*constant("maceps")*(max(Q));
      return( sum(Q > tol) );
   end;
   else do;
      return( round(trace(ginv(a)*a)) ); 
   end;
finish;
 
rankSVD = RankMatrix(A);
rankGInv = RankMatrix(A, "GINV");

A numerical test for singularity

Let's test the rank algorithms on a notorious ill-conditioned matrix, the Hilbert matrix. For an n x n Hilbert matrix, the determinant approaches zero quickly, but is always positive, which means that the Hilbert matrix is nonsingular for all values of n.
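The following statements sketch how such a table can be generated. (The sketch assumes the RankMatrix module defined above; the sizes n = 6 to 10 are my choice for illustration.)

result = j(5, 4, .);
do n = 6 to 10;
   H = j(n, n, 0);
   do i = 1 to n;
      do j = 1 to n;
         H[i,j] = 1 / (i + j - 1);      /* Hilbert matrix element */
      end;
   end;
   result[n-5,] = n || det(H) || RankMatrix(H) || RankMatrix(H, "GINV");
end;
print result[colname={"n" "det" "rankSVD" "rankGInv"}];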

The following table shows the result of computing the rank for five Hilbert matrices.

[Table: for each n x n Hilbert matrix, the size n, the determinant, the rank estimated by the SVD method, and the rank estimated by the generalized-inverse method]

The first column shows the size of the problem, which is also the true rank of the Hilbert matrix. The second column shows the determinant of the matrix, which shows that the matrices are very close to being singular. The third column shows the estimated rank by using the SVD algorithm. The SVD algorithm performs very well. It correctly computes the rank of the Hilbert matrix for n ≤ 10, and correctly finds the rank of a matrix with determinant 2E-53. The fourth column shows the estimated rank by using the generalized inverse algorithm. It also performs well and correctly computes the rank of the Hilbert matrix for n ≤ 9 and for a determinant as tiny as 1E-42.

In summary, both methods perform well. The SVD method is slightly faster and performed better on this test, so I recommend that you use the SVD method to estimate the rank of rectangular matrices and to determine whether a square matrix is nonsingular. The SVD method is the default method for the RankMatrix function.

tags: Numerical Analysis

