8.5: Computing Eigenvalues - Mathematics

In practice, the problem of finding eigenvalues of a matrix is virtually never solved by finding the roots of the characteristic polynomial. This is difficult for large matrices, and iterative methods are much better. Two such methods are described briefly in this section.

The Power Method

In Chapter [chap:3] our initial rationale for diagonalizing matrices was to be able to compute the powers of a square matrix, and the eigenvalues were needed to do this. In this section, we are interested in efficiently computing eigenvalues, and it may come as no surprise that the first method we discuss uses the powers of a matrix.

Recall that an eigenvalue \(\lambda\) of an \(n \times n\) matrix \(A\) is called a dominant eigenvalue if \(\lambda\) has multiplicity \(1\), and \[|\lambda| > |\mu| \quad \mbox{ for all eigenvalues } \mu \neq \lambda\] Any corresponding eigenvector is called a dominant eigenvector of \(A\). When such an eigenvalue exists, one technique for finding it is as follows: Let \(\vect{x}_{0}\) in \(\RR^n\) be a first approximation to a dominant eigenvector of \(A\), and compute successive approximations \(\vect{x}_{1}, \vect{x}_{2}, \dots\) as follows: \[\vect{x}_{1} = A\vect{x}_{0} \quad \vect{x}_{2} = A\vect{x}_{1} \quad \vect{x}_{3} = A\vect{x}_{2} \quad \cdots\] In general, we define \[\vect{x}_{k+1} = A\vect{x}_{k} \quad \mbox{ for each } k \geq 0\] If the first estimate \(\vect{x}_{0}\) is good enough, these vectors \(\vect{x}_{k}\) will approximate a dominant eigenvector (see below). This technique is called the power method (because \(\vect{x}_{k} = A^{k}\vect{x}_{0}\) for each \(k \geq 1\)). Observe that if \(\vect{z}\) is any eigenvector corresponding to \(\lambda\), then \[\frac{\vect{z} \dotprod (A\vect{z})}{\vectlength \vect{z} \vectlength^2} = \frac{\vect{z} \dotprod (\lambda\vect{z})}{\vectlength \vect{z} \vectlength^2} = \lambda\] Because the vectors \(\vect{x}_{1}, \vect{x}_{2}, \dots, \vect{x}_{k}, \dots\) approximate dominant eigenvectors, this suggests that we define the Rayleigh quotients as follows: \[r_{k} = \frac{\vect{x}_{k} \dotprod \vect{x}_{k+1}}{\vectlength \vect{x}_{k} \vectlength^2} \quad \mbox{ for } k \geq 1\] Then the numbers \(r_{k}\) approximate the dominant eigenvalue \(\lambda\).

025326 Use the power method to approximate a dominant eigenvector and eigenvalue of \(A = \leftB \begin{array}{rr} 1 & 1 \\ 2 & 0 \end{array} \rightB\).

The eigenvalues of \(A\) are \(2\) and \(-1\), with eigenvectors \(\leftB \begin{array}{r} 1 \\ 1 \end{array} \rightB\) and \(\leftB \begin{array}{r} 1 \\ -2 \end{array} \rightB\). Take \(\vect{x}_{0} = \leftB \begin{array}{r} 1 \\ 0 \end{array} \rightB\) as the first approximation and compute \(\vect{x}_{1}, \vect{x}_{2}, \dots,\) successively, from \(\vect{x}_{1} = A\vect{x}_{0}, \vect{x}_{2} = A\vect{x}_{1}, \dots\). The result is \[\vect{x}_{1} = \leftB \begin{array}{r} 1 \\ 2 \end{array} \rightB, \quad \vect{x}_{2} = \leftB \begin{array}{r} 3 \\ 2 \end{array} \rightB, \quad \vect{x}_{3} = \leftB \begin{array}{r} 5 \\ 6 \end{array} \rightB, \quad \vect{x}_{4} = \leftB \begin{array}{r} 11 \\ 10 \end{array} \rightB, \quad \vect{x}_{5} = \leftB \begin{array}{r} 21 \\ 22 \end{array} \rightB, \dots\] These vectors are approaching scalar multiples of the dominant eigenvector \(\leftB \begin{array}{r} 1 \\ 1 \end{array} \rightB\). Moreover, the Rayleigh quotients are \[r_{1} = \frac{7}{5}, \quad r_{2} = \frac{27}{13}, \quad r_{3} = \frac{115}{61}, \quad r_{4} = \frac{451}{221}, \dots\] and these are approaching the dominant eigenvalue \(2\).
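The iteration in this example is easy to reproduce numerically. A minimal sketch (using NumPy, which the text itself does not assume):

```python
import numpy as np

# Power method for A = [[1, 1], [2, 0]], starting from x0 = (1, 0).
A = np.array([[1.0, 1.0],
              [2.0, 0.0]])

xs = [np.array([1.0, 0.0])]        # x_0
for k in range(6):
    xs.append(A @ xs[-1])          # x_{k+1} = A x_k

# Rayleigh quotients r_k = (x_k . x_{k+1}) / ||x_k||^2
rs = [xs[k] @ xs[k + 1] / (xs[k] @ xs[k]) for k in range(1, 5)]

print(xs[4])    # [11. 10.]
print(rs[2])    # 115/61 = 1.885..., approaching the eigenvalue 2
```

Running a few more steps shows the Rayleigh quotients settling down to 2, matching the example.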

To see why the power method works, let \(\lambda_{1}, \lambda_{2}, \dots, \lambda_{m}\) be eigenvalues of \(A\) with \(\lambda_{1}\) dominant and let \(\vect{y}_{1}, \vect{y}_{2}, \dots, \vect{y}_{m}\) be corresponding eigenvectors. What is required is that the first approximation \(\vect{x}_{0}\) be a linear combination of these eigenvectors: \[\vect{x}_{0} = a_{1}\vect{y}_{1} + a_{2}\vect{y}_{2} + \dots + a_{m}\vect{y}_{m} \quad \mbox{ with } a_{1} \neq 0\] If \(k \geq 1\), the fact that \(\vect{x}_{k} = A^{k}\vect{x}_{0}\) and \(A^k\vect{y}_{i} = \lambda_{i}^k\vect{y}_{i}\) for each \(i\) gives \[\vect{x}_{k} = a_{1}\lambda_{1}^k\vect{y}_{1} + a_{2}\lambda_{2}^k\vect{y}_{2} + \dots + a_{m}\lambda_{m}^k\vect{y}_{m} \quad \mbox{ for } k \geq 1\] Hence \[\frac{1}{\lambda_{1}^k}\vect{x}_{k} = a_{1}\vect{y}_{1} + a_{2}\left(\frac{\lambda_{2}}{\lambda_{1}}\right)^k\vect{y}_{2} + \dots + a_{m}\left(\frac{\lambda_{m}}{\lambda_{1}}\right)^k\vect{y}_{m}\] The right side approaches \(a_{1}\vect{y}_{1}\) as \(k\) increases because \(\lambda_{1}\) is dominant \(\left( \left|\frac{\lambda_{i}}{\lambda_{1}}\right| < 1 \mbox{ for each } i > 1 \right)\). Because \(a_{1} \neq 0\), this means that \(\vect{x}_{k}\) approximates the dominant eigenvector \(a_{1}\lambda_{1}^k\vect{y}_{1}\).

The power method requires that the first approximation \(\vect{x}_{0}\) be a linear combination of eigenvectors. (In Example [exa:025326] the eigenvectors form a basis of \(\RR^2\).) But even in this case the method fails if \(a_{1} = 0\), where \(a_{1}\) is the coefficient of the dominant eigenvector (try \(\vect{x}_{0} = \leftB \begin{array}{r} -1 \\ 2 \end{array} \rightB\) in Example [exa:025326]). In general, the rate of convergence is quite slow if any of the ratios \(\left| \frac{\lambda_{i}}{\lambda_{1}} \right|\) is near \(1\). Also, because the method requires repeated multiplications by \(A\), it is not recommended unless these multiplications are easy to carry out (for example, if most of the entries of \(A\) are zero).


The QR Algorithm

A much better method for approximating the eigenvalues of an invertible matrix \(A\) depends on the factorization (using the Gram-Schmidt algorithm) of \(A\) in the form \[A = QR\] where \(Q\) is orthogonal and \(R\) is invertible and upper triangular (see Theorem [thm:025166]). The QR-algorithm uses this repeatedly to create a sequence of matrices \(A_{1} = A, A_{2}, A_{3}, \dots,\) as follows:

  1. Define \(A_{1} = A\) and factor it as \(A_{1} = Q_{1}R_{1}\).

  2. Define \(A_{2} = R_{1}Q_{1}\) and factor it as \(A_{2} = Q_{2}R_{2}\).

  3. Define \(A_{3} = R_{2}Q_{2}\) and factor it as \(A_{3} = Q_{3}R_{3}\).

In general, \(A_{k}\) is factored as \(A_{k} = Q_{k}R_{k}\) and we define \(A_{k + 1} = R_{k}Q_{k}\). Then \(A_{k + 1}\) is similar to \(A_{k}\) [in fact, \(A_{k+1} = R_{k}Q_{k} = (Q_{k}^{-1}A_{k})Q_{k}\)], and hence each \(A_{k}\) has the same eigenvalues as \(A\). If the eigenvalues of \(A\) are real and have distinct absolute values, the remarkable thing is that the sequence of matrices \(A_{1}, A_{2}, A_{3}, \dots\) converges to an upper triangular matrix with these eigenvalues on the main diagonal. [See below for the case of complex eigenvalues.]
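The three steps above repeat verbatim, so the whole algorithm fits in a few lines. A sketch (NumPy's `qr` is used in place of a hand Gram-Schmidt factorization):

```python
import numpy as np

def qr_algorithm(A, steps=60):
    """Unshifted QR iteration: factor A_k = Q_k R_k, set A_{k+1} = R_k Q_k.

    Each A_{k+1} = Q_k^T A_k Q_k is similar to A_k, so the eigenvalues
    are preserved at every step."""
    Ak = np.array(A, dtype=float)
    for _ in range(steps):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q
    return Ak

Ak = qr_algorithm([[1, 1], [2, 0]])
print(np.diag(Ak))    # approaches the eigenvalues 2 and -1
```

For this matrix the below-diagonal entry shrinks by a factor of about \(|{-1}|/|2| = 1/2\) per step, illustrating the convergence-rate remark made for the power method as well.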

025425 If \(A = \leftB \begin{array}{rr} 1 & 1 \\ 2 & 0 \end{array} \rightB\) as in Example [exa:025326], use the QR-algorithm to approximate the eigenvalues.

The matrices \(A_{1}\), \(A_{2}\), and \(A_{3}\) are as follows: \[\begin{aligned}
A_{1} &= \leftB \begin{array}{rr} 1 & 1 \\ 2 & 0 \end{array} \rightB = Q_{1}R_{1} \quad \mbox{ where } Q_{1} = \frac{1}{\sqrt{5}}\leftB \begin{array}{rr} 1 & 2 \\ 2 & -1 \end{array} \rightB \mbox{ and } R_{1} = \frac{1}{\sqrt{5}}\leftB \begin{array}{rr} 5 & 1 \\ 0 & 2 \end{array} \rightB \\
A_{2} &= \frac{1}{5}\leftB \begin{array}{rr} 7 & 9 \\ 4 & -2 \end{array} \rightB = \leftB \begin{array}{rr} 1.4 & 1.8 \\ 0.8 & -0.4 \end{array} \rightB = Q_{2}R_{2} \quad \mbox{ where } Q_{2} = \frac{1}{\sqrt{65}}\leftB \begin{array}{rr} 7 & 4 \\ 4 & -7 \end{array} \rightB \mbox{ and } R_{2} = \frac{1}{\sqrt{65}}\leftB \begin{array}{rr} 13 & 11 \\ 0 & 10 \end{array} \rightB \\
A_{3} &= \frac{1}{13}\leftB \begin{array}{rr} 27 & -5 \\ 8 & -14 \end{array} \rightB = \leftB \begin{array}{rr} 2.08 & -0.38 \\ 0.62 & -1.08 \end{array} \rightB
\end{aligned}\] This is converging to \(\leftB \begin{array}{rr} 2 & \ast \\ 0 & -1 \end{array} \rightB\) and so is approximating the eigenvalues \(2\) and \(-1\) on the main diagonal.

It is beyond the scope of this book to pursue a detailed discussion of these methods. The reader is referred to J. H. Wilkinson, The Algebraic Eigenvalue Problem (Oxford, England: Oxford University Press, 1965) or G. W. Stewart, Introduction to Matrix Computations (New York: Academic Press, 1973). We conclude with some remarks on the QR-algorithm.

Shifting. Convergence is accelerated if, at stage \(k\) of the algorithm, a number \(s_{k}\) is chosen and \(A_{k} - s_{k}I\) is factored in the form \(Q_{k}R_{k}\) rather than \(A_{k}\) itself. Then \[Q_{k}^{-1}A_{k}Q_{k} = Q_{k}^{-1}(Q_{k}R_{k} + s_{k}I)Q_{k} = R_{k}Q_{k} + s_{k}I\] so we take \(A_{k+1} = R_{k}Q_{k} + s_{k}I\). If the shifts \(s_{k}\) are carefully chosen, convergence can be greatly improved.
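The text leaves the choice of \(s_{k}\) open; one common heuristic (an assumption here, not the book's prescription) is to shift by the current bottom-right entry of \(A_{k}\). A sketch:

```python
import numpy as np

def qr_shifted(A, steps=30):
    """QR iteration with shifts: factor A_k - s_k I = Q_k R_k,
    then set A_{k+1} = R_k Q_k + s_k I (similar to A_k, as shown above).

    The shift s_k = bottom-right entry of A_k is one simple heuristic."""
    Ak = np.array(A, dtype=float)
    n = Ak.shape[0]
    I = np.eye(n)
    for _ in range(steps):
        s = Ak[-1, -1]
        Q, R = np.linalg.qr(Ak - s * I)
        Ak = R @ Q + s * I
    return Ak

Ak = qr_shifted([[1, 1], [2, 0]])
print(np.diag(Ak))    # the eigenvalues 2 and -1, reached in far fewer steps
```

On the example matrix the shifted iteration drives the off-diagonal entry to zero much faster than the plain version.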

Preliminary Preparation. A matrix such as \[\leftB \begin{array}{rrrrr}
\ast & \ast & \ast & \ast & \ast \\
\ast & \ast & \ast & \ast & \ast \\
0 & \ast & \ast & \ast & \ast \\
0 & 0 & \ast & \ast & \ast \\
0 & 0 & 0 & \ast & \ast
\end{array} \rightB\] is said to be in upper Hessenberg form, and the QR-factorizations of such matrices are greatly simplified. Given an \(n \times n\) matrix \(A\), a series of orthogonal matrices \(H_{1}, H_{2}, \dots, H_{m}\) (called Householder matrices) can be easily constructed such that \[B = H_{m}^T \cdots H_{1}^T A H_{1} \cdots H_{m}\] is in upper Hessenberg form. Then the QR-algorithm can be efficiently applied to \(B\) and, because \(B\) is similar to \(A\), it produces the eigenvalues of \(A\).
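A sketch of how such a Householder reduction can be implemented (this follows the standard construction of the reflectors, not a recipe given in the text):

```python
import numpy as np

def hessenberg(A):
    """Reduce A to upper Hessenberg form by Householder similarities.

    Returns B = H_m^T ... H_1^T A H_1 ... H_m, which is similar to A."""
    B = np.array(A, dtype=float)
    n = B.shape[0]
    for j in range(n - 2):
        x = B[j + 1:, j].copy()          # entries to be zeroed below B[j+1, j]
        norm_x = np.linalg.norm(x)
        if norm_x == 0:
            continue                     # column already in the right shape
        v = x.copy()
        v[0] += np.copysign(norm_x, x[0])  # sign choice avoids cancellation
        v /= np.linalg.norm(v)
        H = np.eye(n)
        H[j + 1:, j + 1:] -= 2.0 * np.outer(v, v)
        B = H @ B @ H                    # H is symmetric and orthogonal
    return B

B = hessenberg([[4., 1., 2., 3.],
                [1., 3., 0., 1.],
                [2., 0., 2., 1.],
                [3., 1., 1., 1.]])
print(np.round(B, 3))                    # zeros below the first subdiagonal
```

Because every step is a similarity transformation, the eigenvalues of `B` agree with those of the input matrix.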

Complex Eigenvalues. If some of the eigenvalues of a real matrix \(A\) are not real, the QR-algorithm converges to a block upper triangular matrix where the diagonal blocks are either \(1 \times 1\) (the real eigenvalues) or \(2 \times 2\) (each providing a pair of conjugate complex eigenvalues of \(A\)).

Exercises for 8.5



In each case, find the exact eigenvalues and determine corresponding eigenvectors. Then start with \(\vect{x}_{0} = \leftB \begin{array}{r} 1 \\ 1 \end{array} \rightB\) and compute \(\vect{x}_{4}\) and \(r_{3}\) using the power method.

\(A = \leftB \begin{array}{rr} 2 & -4 \\ -3 & 3 \end{array} \rightB\)  \(A = \leftB \begin{array}{rr} 5 & 2 \\ -3 & -2 \end{array} \rightB\)  \(A = \leftB \begin{array}{rr} 1 & 2 \\ 2 & 1 \end{array} \rightB\)  \(A = \leftB \begin{array}{rr} 3 & 1 \\ 1 & 0 \end{array} \rightB\)

  1. Eigenvalues \(4\), \(-1\); eigenvectors \(\leftB \begin{array}{r} 2 \\ -1 \end{array} \rightB\), \(\leftB \begin{array}{r} 1 \\ -3 \end{array} \rightB\); \(\vect{x}_{4} = \leftB \begin{array}{r} 409 \\ -203 \end{array} \rightB\); \(r_{3} = 3.94\)

  2. Eigenvalues \(\lambda_{1} = \frac{1}{2}(3 + \sqrt{13})\), \(\lambda_{2} = \frac{1}{2}(3 - \sqrt{13})\); eigenvectors \(\leftB \begin{array}{c} \lambda_{1} \\ 1 \end{array} \rightB\), \(\leftB \begin{array}{c} \lambda_{2} \\ 1 \end{array} \rightB\); \(\vect{x}_{4} = \leftB \begin{array}{r} 142 \\ 43 \end{array} \rightB\); \(r_{3} = 3.3027750\) (The true value is \(\lambda_{1} = 3.3027756\), to seven decimal places.)

In each case, find the exact eigenvalues and then approximate them using the QR-algorithm.

\(A = \leftB \begin{array}{rr} 1 & 1 \\ 1 & 0 \end{array} \rightB\)  \(A = \leftB \begin{array}{rr} 3 & 1 \\ 1 & 0 \end{array} \rightB\)

  1. \(A_{1} = \leftB \begin{array}{rr} 3 & 1 \\ 1 & 0 \end{array} \rightB\), \(Q_{1} = \frac{1}{\sqrt{10}}\leftB \begin{array}{rr} 3 & -1 \\ 1 & 3 \end{array} \rightB\), \(R_{1} = \frac{1}{\sqrt{10}}\leftB \begin{array}{rr} 10 & 3 \\ 0 & -1 \end{array} \rightB\)
    \(A_{2} = \frac{1}{10}\leftB \begin{array}{rr} 33 & -1 \\ -1 & -3 \end{array} \rightB\),
    \(Q_{2} = \frac{1}{\sqrt{1090}}\leftB \begin{array}{rr} 33 & 1 \\ -1 & 33 \end{array} \rightB\),
    \(R_{2} = \frac{1}{\sqrt{1090}}\leftB \begin{array}{rr} 109 & -3 \\ 0 & -10 \end{array} \rightB\)
    \(A_{3} = \frac{1}{109}\leftB \begin{array}{rr} 360 & 1 \\ 1 & -33 \end{array} \rightB
    = \leftB \begin{array}{rr} 3.302752 & 0.009174 \\ 0.009174 & -0.302752 \end{array} \rightB\)

Apply the power method to \(A = \leftB \begin{array}{rr} 0 & 1 \\ -1 & 0 \end{array} \rightB\), starting at \(\vect{x}_{0} = \leftB \begin{array}{r} 1 \\ 1 \end{array} \rightB\). Does it converge? Explain.

If (A) is symmetric, show that each matrix (A_{k}) in the QR-algorithm is also symmetric. Deduce that they converge to a diagonal matrix.

Use induction on (k). If (k = 1), (A_{1} = A). In general (A_{k+1} = Q_{k}^{-1}A_{k}Q_{k} = Q_{k}^{T}A_{k}Q_{k}), so the fact that (A_{k}^{T} = A_{k}) implies (A_{k+1}^{T} = A_{k+1}). The eigenvalues of (A) are all real (Theorem [thm:016145]), so the (A_{k}) converge to an upper triangular matrix (T). But (T) must also be symmetric (it is the limit of symmetric matrices), so it is diagonal.

Apply the QR-algorithm to \(A = \leftB \begin{array}{rr} 2 & -3 \\ 1 & -2 \end{array} \rightB\). Explain.

Given a matrix (A), let (A_{k}), (Q_{k}), and (R_{k}), (k geq 1), be the matrices constructed in the QR-algorithm. Show that (A_{k} = (Q_{1}Q_{2} cdots Q_{k})(R_{k} cdots R_{2}R_{1})) for each (k geq 1) and hence that this is a QR-factorization of (A_{k}).

Shortcuts for computing the eigenvalues of a linear transformation

How would you calculate the eigenvalues of the following matrix?

$$A = \begin{pmatrix} -3 & 1 & -1 \\ -7 & 5 & -1 \\ -6 & 6 & -2 \end{pmatrix}, \qquad \chi_A(\lambda) = \det(A-\lambda I) = \begin{vmatrix} -3-\lambda & 1 & -1 \\ -7 & 5-\lambda & -1 \\ -6 & 6 & -2-\lambda \end{vmatrix} = \ldots = 0$$

I'd really like to avoid using the rule of Sarrus. That just leads to a long list of multiplications, and in the end I may even have to guess the roots of the polynomial (maybe with Vieta's theorem) and factor them out via polynomial division. The whole process is tedious and prone to errors, so I'd like to take shortcuts whenever I can.

Here are some shortcuts I already know, which may be used in conjunction:

  • Multiplying two rows by $(-1)$ (this will not change the determinant)
  • Expanding the determinant along a row or column that has a lot of zeros in it (ideally just one nonzero entry) to pull out linear factors of the polynomial.
  • Transforming the matrix via Gaussian elimination (adding multiples of one row to another, which leaves the determinant unchanged) to a matrix which has more zeros in one column or row (ideally: transform it to a lower/upper triangular matrix).
  • Compute the determinant via the theorem for block-diagonal matrices.

However, none of these shortcuts is useful for calculating the eigenvalues of $A$. How would you approach the computation of the determinant of the matrix above? What other shortcuts do you have to share?
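Not a hand-computation shortcut, but a useful safety net: a numerical solver tells you what answer your shortcut should produce. For this particular matrix the characteristic polynomial works out to $\lambda^3 - 12\lambda - 16 = (\lambda-4)(\lambda+2)^2$, so the spectrum is $\{4, -2, -2\}$; a NumPy sketch confirming this:

```python
import numpy as np

# The 3x3 matrix from the question.
A = np.array([[-3, 1, -1],
              [-7, 5, -1],
              [-6, 6, -2]], dtype=float)

# Coefficients of the characteristic polynomial det(lambda I - A):
print(np.real(np.poly(A)))               # ~ [1, 0, -12, -16]

# The eigenvalues themselves (4 and a double root at -2):
print(np.sort(np.linalg.eigvals(A).real))
```

The double eigenvalue at $-2$ is defective here, so the numerical roots may carry tiny imaginary parts; they are rounding artifacts.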


It’s now time to start solving systems of differential equations. We’ve seen that solutions to the system \(\vec x' = A\vec x\) are of the form

\[\vec x = \vec \eta e^{\lambda t}\]

where \(\lambda\) and \(\vec \eta\) are eigenvalues and eigenvectors of the matrix \(A\). We will be working with \(2 \times 2\) systems, so this means that we are going to be looking for two solutions, \({\vec x_1}(t)\) and \({\vec x_2}(t)\), where the determinant of the matrix

\[X = \begin{pmatrix} {\vec x_1} & {\vec x_2} \end{pmatrix}\]

is nonzero.

We are going to start by looking at the case where our two eigenvalues, \(\lambda_1\) and \(\lambda_2\), are real and distinct. In other words, they will be real, simple eigenvalues. Recall as well that the eigenvectors for simple eigenvalues are linearly independent. This means that the solutions we get from these will also be linearly independent. If the solutions are linearly independent the matrix \(X\) must be nonsingular and hence these two solutions will be a fundamental set of solutions. The general solution in this case will then be,

\[\vec x(t) = c_1 e^{\lambda_1 t}\vec \eta^{(1)} + c_2 e^{\lambda_2 t}\vec \eta^{(2)}\]

Note that each of our examples will actually be broken into two examples. The first example will be solving the system and the second example will be sketching the phase portrait for the system. Phase portraits are not always taught in a differential equations course and so we’ll strip those out of the solution process so that if you haven’t covered them in your class you can ignore the phase portrait example for the system.
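The solution process the examples walk through can be mirrored numerically. The matrix and initial condition below are stand-ins (the examples' own numbers were lost from this copy); the point is the construction x(t) = c1 e^{λ1 t} η1 + c2 e^{λ2 t} η2 with the constants fixed by the initial condition:

```python
import numpy as np

# Hypothetical 2x2 system x' = A x, x(0) = x0 (stand-in values).
A = np.array([[1.0, 2.0],
              [3.0, 2.0]])          # eigenvalues -1 and 4: a saddle
x0 = np.array([0.0, -4.0])

lam, V = np.linalg.eig(A)           # columns of V are eigenvectors
c = np.linalg.solve(V, x0)          # constants from x(0) = c1 v1 + c2 v2

def x(t):
    # x(t) = c1 e^{lam1 t} v1 + c2 e^{lam2 t} v2
    return V @ (c * np.exp(lam * t))

print(x(0.0))    # recovers the initial condition, up to rounding
```

Checking that x(0) matches x0 and that x'(t) = A x(t) holds numerically is a quick way to validate a hand solution.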

So, the first thing that we need to do is find the eigenvalues for the matrix.

Now let’s find the eigenvectors for each of these.

The eigenvector in this case is,

The eigenvector in this case is,

The general solution is then,

Now, we need to find the constants. To do this we simply need to apply the initial conditions.

All we need to do now is multiply the constants through and we then get two equations (one for each row) that we can solve for the constants. This gives,

Now, let’s take a look at the phase portrait for the system.

From the last example we know that the eigenvalues and eigenvectors for this system are,

It turns out that this is all the information that we will need to sketch the direction field. We will relate things back to our solution however so that we can see that things are going correctly.

We’ll start by sketching lines that follow the direction of the two eigenvectors. This gives,

Now, from the first example our general solution is

If we have \(c_2 = 0\) then the solution is an exponential times a vector and all that the exponential does is affect the magnitude of the vector; the constant \(c_1\) will affect both the sign and the magnitude of the vector. In other words, the trajectory in this case will be a straight line that is parallel to the vector \(\vec \eta^{(1)}\). Also notice that as \(t\) increases the exponential will get smaller and smaller and hence the trajectory will be moving in towards the origin. If \(c_1 > 0\) the trajectory will be in Quadrant II and if \(c_1 < 0\) the trajectory will be in Quadrant IV.

So, the line in the graph above marked with \(\vec \eta^{(1)}\) will be a sketch of the trajectory corresponding to \(c_2 = 0\), and this trajectory will approach the origin as \(t\) increases.

If we now turn things around and look at the solution corresponding to having \(c_1 = 0\) we will have a trajectory that is parallel to \(\vec \eta^{(2)}\). Also, since the exponential will increase as \(t\) increases, in this case the trajectory will move away from the origin as \(t\) increases. We will denote this with arrows on the lines in the graph above.

Notice that we could have gotten this information without actually going to the solution. All we really need to do is look at the eigenvalues. Eigenvalues that are negative will correspond to solutions that will move towards the origin as (t) increases in a direction that is parallel to its eigenvector. Likewise, eigenvalues that are positive move away from the origin as (t) increases in a direction that will be parallel to its eigenvector.

If both constants are in the solution we will have a combination of these behaviors. For large negative \(t\) the solution will be dominated by the portion that has the negative eigenvalue, since in these cases the exponent will be large and positive. Trajectories for large negative \(t\) will be parallel to \(\vec \eta^{(1)}\) and moving in the same direction.

Solutions for large positive \(t\) will be dominated by the portion with the positive eigenvalue. Trajectories in this case will be parallel to \(\vec \eta^{(2)}\) and moving in the same direction.

In general, it looks like trajectories will start "near" \(\vec \eta^{(1)}\), move in towards the origin, and then as they get closer to the origin they will start moving towards \(\vec \eta^{(2)}\) and then continue up along this vector. Sketching some of these in will give the following phase portrait. Here is a sketch of this with the trajectories corresponding to the eigenvectors marked in blue.

In this case the equilibrium solution \((0,0)\) is called a saddle point and is unstable. In this case unstable means that solutions move away from it as \(t\) increases.

So, we’ve solved a system in matrix form, but remember that we started out without the systems in matrix form. Now let’s take a quick look at an example of a system that isn’t in matrix form initially.

We first need to convert this into matrix form. This is easy enough. Here is the matrix form of the system.

This is just the system from the first example and so we’ve already got the solution to this system. Here it is.

Now, since we want the solution to the system not in matrix form let’s go one step farther here. Let’s multiply the constants and exponentials into the vectors and then add up the two vectors.

So, the solution to the system is then,

Let’s work another example.

So, the first thing that we need to do is find the eigenvalues for the matrix.

Now let’s find the eigenvectors for each of these.

The eigenvector in this case is,

The eigenvector in this case is,

The general solution is then,

Now, we need to find the constants. To do this we simply need to apply the initial conditions.

Now solve the system for the constants.

Now let’s find the phase portrait for this system.

From the last example we know that the eigenvalues and eigenvectors for this system are,

This one is a little different from the first one. However, it starts in the same way. We’ll first sketch the trajectories corresponding to the eigenvectors. Notice as well that both of the eigenvalues are negative and so trajectories for these will move in towards the origin as (t) increases. When we sketch the trajectories we’ll add in arrows to denote the direction they take as (t) increases. Here is the sketch of these trajectories.

Now, here is where the slight difference from the first phase portrait comes up. All of the trajectories will move in towards the origin as (t) increases since both of the eigenvalues are negative. The issue that we need to decide upon is just how they do this. This is actually easier than it might appear to be at first.

The second eigenvalue is larger in absolute value than the first. For large and positive \(t\) this means that the solution for this eigenvalue will be smaller than the solution for the first eigenvalue. Therefore, as \(t\) increases the trajectory will move in towards the origin and do so parallel to \(\vec \eta^{(1)}\). Likewise, since the second eigenvalue is larger in absolute value than the first, this solution will dominate for large and negative \(t\). Therefore, as we decrease \(t\) the trajectory will move away from the origin and do so parallel to \(\vec \eta^{(2)}\).

Adding in some trajectories gives the following sketch.

In these cases we call the equilibrium solution \((0,0)\) a node and it is asymptotically stable. Equilibrium solutions are asymptotically stable if all the trajectories move in towards it as \(t\) increases.

Note that nodes can also be unstable. In the last example if both of the eigenvalues had been positive all the trajectories would have moved away from the origin and in this case the equilibrium solution would have been unstable.

Before moving on to the next section we need to do one more example. When we first started talking about systems it was mentioned that we can convert a higher order differential equation into a system. We need to do an example like this so we can see how to solve higher order differential equations using systems.

So, we first need to convert this into a system. Here’s the change of variables,

Now we need to find the eigenvalues for the matrix.

Now let’s find the eigenvectors.

The eigenvector in this case is,

The eigenvector in this case is,

The general solution is then,

Apply the initial condition.

This gives the system of equations that we can solve for the constants.

The actual solution to the system is then,

we can see that the solution to the original differential equation is just the top row of the solution to the matrix system. The solution to the original differential equation is then,

Notice that as a check, in this case, the bottom row should be the derivative of the top row.

Quick Refresher: What are they?

First, just a quick refresher on what Eigenvalues and Eigenvectors are. Take a square matrix X. If there is a vector v and a scalar λ such that

Xv = λv

then v is an eigenvector of X and λ is the corresponding eigenvalue of X.

In words, if you think of multiplying v by X as applying a function to v, then for this particular vector v this function is nothing but a stretching/squishing scalar multiplication. Usually multiplying a vector by a matrix equates to taking linear combinations of the components of a vector. But if you take an Eigenvector, you don’t need to do all that computation. Just multiply by the Eigenvalue and you are all good.
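That shortcut is easy to see numerically (a small NumPy illustration):

```python
import numpy as np

# Multiplying an eigenvector by the matrix is just scalar multiplication.
X = np.array([[2.0, 1.0],
              [1.0, 2.0]])
v = np.array([1.0, 1.0])     # eigenvector of X with eigenvalue 3

print(X @ v)                  # [3. 3.], i.e. 3 * v: no "real" matrix work needed
```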

Computation of Eigenvalues

This is a linear system for which the matrix coefficient is A − λI. We also know that this system has one and only one solution if and only if the matrix coefficient is invertible, i.e. det(A − λI) ≠ 0. Since the zero-vector is a solution and C is not the zero vector, then we must have

det(A − λI) = 0.

Example. Consider the matrix

The equation translates into

which is equivalent to the quadratic equation

Solving this equation leads to

In other words, the matrix A has only two eigenvalues.

In general, for a square matrix A of order n, the equation

det(A − λI) = 0

will give the eigenvalues of A. This equation is called the characteristic equation or characteristic polynomial of A. It is a polynomial function in λ of degree n. So we know that this equation will not have more than n roots or solutions. So a square matrix A of order n will not have more than n eigenvalues.

Example. Consider the diagonal matrix

D = [ a 0 0 0
      0 b 0 0
      0 0 c 0
      0 0 0 d ]

Its characteristic polynomial is

det(D − λI) = (a − λ)(b − λ)(c − λ)(d − λ).

So the eigenvalues of D are a, b, c, and d, i.e. the entries on the diagonal.

This result is valid for any diagonal matrix of any size. So depending on the values you have on the diagonal, you may have one eigenvalue, two eigenvalues, or more. Anything is possible.
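For instance (a small NumPy check; the diagonal entries below are an arbitrary example with one repeated value):

```python
import numpy as np

# A 4x4 diagonal matrix: each diagonal entry appears as an eigenvalue,
# so a repeated entry gives only three distinct eigenvalues.
D = np.diag([2.0, 2.0, 5.0, -1.0])

print(np.sort(np.linalg.eigvals(D)))   # the diagonal entries, sorted
```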

Remark. It is quite amazing to see that any square matrix A has the same eigenvalues as its transpose A^T, because

det(A^T − λI) = det((A − λI)^T) = det(A − λI).

For any square matrix of order 2, A, where

A = [ a b
      c d ]

the characteristic polynomial is given by the equation

det(A − λI) = λ² − (a + d)λ + (ad − bc).

The number (a + d) is called the trace of A (denoted tr(A)), and clearly the number (ad − bc) is the determinant of A. So the characteristic polynomial of A can be rewritten as

p(λ) = λ² − tr(A)λ + det(A).

Let us evaluate the matrix

p(A) = A² − tr(A)A + det(A)I.

We leave the details to the reader to check that

A² − tr(A)A + det(A)I = 0.

This equation is known as the Cayley-Hamilton theorem. It is true for any square matrix A of any order, i.e.

p(A) = 0

where p(λ) is the characteristic polynomial of A.
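A quick numerical spot-check of the 2×2 case of the Cayley-Hamilton theorem (the sample matrix is an arbitrary choice):

```python
import numpy as np

# Check the 2x2 identity A^2 - tr(A) A + det(A) I = 0 on a sample matrix.
A = np.array([[1.0, 3.0],
              [2.0, 4.0]])

C = A @ A - np.trace(A) * A + np.linalg.det(A) * np.eye(2)
print(C)   # the zero matrix, up to floating-point rounding
```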

We have some properties of the eigenvalues of a matrix.

Theorem. Let A be a square matrix of order n. If λ is an eigenvalue of A, then:

  1. λ^m is an eigenvalue of A^m, for m = 1, 2, ...
  2. If A is invertible, then 1/λ is an eigenvalue of A^-1.
  3. A is not invertible if and only if λ = 0 is an eigenvalue of A.
  4. If α is any number, then λ + α is an eigenvalue of A + αI.
  5. If A and B are similar, then they have the same characteristic polynomial (which implies they also have the same eigenvalues).
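The first two of these properties can be spot-checked numerically (a sketch; the triangular test matrix is an arbitrary choice whose eigenvalues, 2 and 3, can be read off the diagonal):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])     # triangular: eigenvalues 2 and 3

# Property 1: eigenvalues of A^m are the m-th powers 2^3 = 8 and 3^3 = 27.
print(np.sort(np.linalg.eigvals(np.linalg.matrix_power(A, 3))))

# Property 2: eigenvalues of A^{-1} are the reciprocals 1/2 and 1/3.
print(np.sort(np.linalg.eigvals(np.linalg.inv(A))))
```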

The next natural question to answer deals with the eigenvectors. On the next page, we will discuss the problem of finding eigenvectors.


The eigenvectors for a one-qubit unitary are two orthogonal vectors. As such, on the Bloch sphere, they are visualised as a single axis (going through the origin). (Remember that angles on the Bloch sphere are doubled, so orthogonal states are an angle $\pi$ apart on the Bloch sphere, i.e. opposite directions along the same axis.)

The eigenvalue (or, more precisely, the relative angle between the two eigenvalues) is the angle of rotation around that axis.

A state $\rho$ with Bloch sphere coordinates $\boldsymbol r \equiv (x,y,z)$ has the form $$\rho = \frac{I + \boldsymbol r\cdot\boldsymbol\sigma}{2} \equiv \frac{I + x\sigma_x + y\sigma_y + z\sigma_z}{2},$$ with $\sigma_x,\sigma_y,\sigma_z$ the Pauli matrices.

Computing the eigenvalues (eigenvectors) of $\rho$ thus amounts to computing those of $\boldsymbol r\cdot\boldsymbol\sigma$. Observe that $$\boldsymbol r\cdot\boldsymbol\sigma = \begin{pmatrix} z & x-iy \\ x+iy & -z \end{pmatrix},$$ and thus the eigenvalues are $\lambda_\pm = \pm\sqrt{-\det(\boldsymbol r\cdot\boldsymbol\sigma)} = \pm|\boldsymbol r|$. The corresponding eigenvectors are then seen to be $$\lvert\lambda_\pm\rangle = \frac{1}{\mathcal N_\pm}\begin{pmatrix} x-iy \\ \pm|\boldsymbol r| - z \end{pmatrix},$$ with $\mathcal N_\pm$ a normalization constant. The vectors in the Bloch sphere corresponding to $\lvert\lambda_\pm\rangle$ have coordinates $$x_\pm = \pm x/|\boldsymbol r|, \quad y_\pm = \pm y/|\boldsymbol r|, \quad z_\pm = \pm z/|\boldsymbol r|.$$ In other words, the eigenvectors of $\boldsymbol r\cdot\boldsymbol\sigma$ correspond to the two unit vectors in the Bloch sphere along the same direction as $\rho$.

The eigenvectors of $\rho$ are then clearly the same as those of $\boldsymbol r\cdot\boldsymbol\sigma$, while its eigenvalues are $(1+\lambda_\pm)/2 = (1\pm|\boldsymbol r|)/2$.
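A quick numerical check of these eigenvalue formulas (NumPy sketch, with a sample unit Bloch vector):

```python
import numpy as np

# For a unit Bloch vector r, r.sigma has eigenvalues +/-|r| = +/-1,
# and rho = (I + r.sigma)/2 has eigenvalues (1 +/- |r|)/2, i.e. 1 and 0.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

r = np.array([0.6, 0.0, 0.8])                  # |r| = 1: a pure state
r_dot_sigma = r[0] * sx + r[1] * sy + r[2] * sz

print(np.sort(np.linalg.eigvalsh(r_dot_sigma)))   # -1 and 1

rho = (np.eye(2) + r_dot_sigma) / 2
print(np.sort(np.linalg.eigvalsh(rho)))           # 0 and 1
```

(`eigvalsh` applies because both matrices are Hermitian.)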


If T is a linear transformation from a vector space V over a field F into itself and v is a nonzero vector in V, then v is an eigenvector of T if T(v) is a scalar multiple of v. This can be written as

T(v) = λv,

where λ is a scalar in F, known as the eigenvalue, characteristic value, or characteristic root associated with v.

There is a direct correspondence between n-by-n square matrices and linear transformations from an n-dimensional vector space into itself, given any basis of the vector space. Hence, in a finite-dimensional vector space, it is equivalent to define eigenvalues and eigenvectors using either the language of matrices, or the language of linear transformations. [3] [4]

If V is finite-dimensional, the above equation is equivalent to [5]

Au = λu,

where A is the matrix representation of T and u is the coordinate vector of v.

Eigenvalues and eigenvectors feature prominently in the analysis of linear transformations. The prefix eigen- is adopted from the German word eigen (cognate with the English word own) for "proper", "characteristic", "own". [6] [7] Originally used to study principal axes of the rotational motion of rigid bodies, eigenvalues and eigenvectors have a wide range of applications, for example in stability analysis, vibration analysis, atomic orbitals, facial recognition, and matrix diagonalization.

In essence, an eigenvector v of a linear transformation T is a nonzero vector that, when T is applied to it, does not change direction. Applying T to the eigenvector only scales the eigenvector by the scalar value λ, called an eigenvalue. This condition can be written as the equation

T(v) = λv,

referred to as the eigenvalue equation or eigenequation. In general, λ may be any scalar. For example, λ may be negative, in which case the eigenvector reverses direction as part of the scaling, or it may be zero or complex.

The Mona Lisa example pictured here provides a simple illustration. Each point on the painting can be represented as a vector pointing from the center of the painting to that point. The linear transformation in this example is called a shear mapping. Points in the top half are moved to the right, and points in the bottom half are moved to the left, proportional to how far they are from the horizontal axis that goes through the middle of the painting. The vectors pointing to each point in the original image are therefore tilted right or left, and made longer or shorter by the transformation. Points along the horizontal axis do not move at all when this transformation is applied. Therefore, any vector that points directly to the right or left with no vertical component is an eigenvector of this transformation, because the mapping does not change its direction. Moreover, these eigenvectors all have an eigenvalue equal to one, because the mapping does not change their length either.
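The shear described above can be reproduced in a few lines (the shear factor 0.5 is an arbitrary choice for illustration):

```python
import numpy as np

# A horizontal shear: points are pushed sideways in proportion to
# their height y, while the horizontal axis is left fixed.
S = np.array([[1.0, 0.5],
              [0.0, 1.0]])

right = np.array([1.0, 0.0])    # points straight right, on the fixed axis
tilted = np.array([1.0, 1.0])   # off the axis

print(S @ right)    # [1. 0.]: unchanged, an eigenvector with eigenvalue 1
print(S @ tilted)   # [1.5 1. ]: tilted further right, not an eigenvector
```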

Linear transformations can take many different forms, mapping vectors in a variety of vector spaces, so the eigenvectors can also take many forms. For example, the linear transformation could be a differential operator like d/dx, in which case the eigenvectors are functions called eigenfunctions that are scaled by that differential operator, such as

d/dx e^(λx) = λ e^(λx).

Alternatively, the linear transformation could take the form of an n by n matrix, in which case the eigenvectors are n by 1 matrices. If the linear transformation is expressed in the form of an n by n matrix A, then the eigenvalue equation for a linear transformation above can be rewritten as the matrix multiplication

Av = λv,

where the eigenvector v is an n by 1 matrix. For a matrix, eigenvalues and eigenvectors can be used to decompose the matrix, for example by diagonalizing it.

Eigenvalues and eigenvectors give rise to many closely related mathematical concepts, and the prefix eigen- is applied liberally when naming them:

  • The set of all eigenvectors of a linear transformation, each paired with its corresponding eigenvalue, is called the eigensystem of that transformation. [8][9]
  • The set of all eigenvectors of T corresponding to the same eigenvalue, together with the zero vector, is called an eigenspace, or the characteristic space of T associated with that eigenvalue. [10]
  • If a set of eigenvectors of T forms a basis of the domain of T, then this basis is called an eigenbasis.

Eigenvalues are often introduced in the context of linear algebra or matrix theory. Historically, however, they arose in the study of quadratic forms and differential equations.

In the 18th century, Leonhard Euler studied the rotational motion of a rigid body, and discovered the importance of the principal axes. [a] Joseph-Louis Lagrange realized that the principal axes are the eigenvectors of the inertia matrix. [11]

In the early 19th century, Augustin-Louis Cauchy saw how their work could be used to classify the quadric surfaces, and generalized it to arbitrary dimensions. [12] Cauchy also coined the term racine caractéristique (characteristic root) for what is now called an eigenvalue; his term survives in characteristic equation. [b]

Later, Joseph Fourier used the work of Lagrange and Pierre-Simon Laplace to solve the heat equation by separation of variables in his famous 1822 book Théorie analytique de la chaleur. [13] Charles-François Sturm developed Fourier's ideas further, and brought them to the attention of Cauchy, who combined them with his own ideas and arrived at the fact that real symmetric matrices have real eigenvalues. [12] This was extended by Charles Hermite in 1855 to what are now called Hermitian matrices. [14]

Around the same time, Francesco Brioschi proved that the eigenvalues of orthogonal matrices lie on the unit circle, [12] and Alfred Clebsch found the corresponding result for skew-symmetric matrices. [14] Finally, Karl Weierstrass clarified an important aspect in the stability theory started by Laplace, by realizing that defective matrices can cause instability. [12]

In the meantime, Joseph Liouville studied eigenvalue problems similar to those of Sturm; the discipline that grew out of their work is now called Sturm–Liouville theory. [15] Schwarz studied the first eigenvalue of Laplace's equation on general domains towards the end of the 19th century, while Poincaré studied Poisson's equation a few years later. [16]

At the start of the 20th century, David Hilbert studied the eigenvalues of integral operators by viewing the operators as infinite matrices. [17] He was the first to use the German word eigen, which means "own", [7] to denote eigenvalues and eigenvectors in 1904, [c] though he may have been following a related usage by Hermann von Helmholtz. For some time, the standard term in English was "proper value", but the more distinctive term "eigenvalue" is the standard today. [18]

The first numerical algorithm for computing eigenvalues and eigenvectors appeared in 1929, when Richard von Mises published the power method. One of the most popular methods today, the QR algorithm, was proposed independently by John G. F. Francis [19] and Vera Kublanovskaya [20] in 1961. [21] [22]

Eigenvalues and eigenvectors are often introduced to students in the context of linear algebra courses focused on matrices. [23] [24] Furthermore, linear transformations over a finite-dimensional vector space can be represented using matrices, [25] [4] which is especially common in numerical and computational applications. [26]

Consider n -dimensional vectors that are formed as a list of n scalars, such as the three-dimensional vectors

These vectors are said to be scalar multiples of each other, or parallel or collinear, if there is a scalar λ such that

Now consider the linear transformation of n -dimensional vectors defined by an n by n matrix A ,

If it occurs that v and w are scalar multiples, that is if

then v is an eigenvector of the linear transformation A and the scale factor λ is the eigenvalue corresponding to that eigenvector. Equation (1) is the eigenvalue equation for the matrix A .

Equation (1) can be stated equivalently as

where I is the n by n identity matrix and 0 is the zero vector.

Eigenvalues and the characteristic polynomial

Equation (2) has a nonzero solution v if and only if the determinant of the matrix (A − λI) is zero. Therefore, the eigenvalues of A are the values of λ that satisfy the equation

Using Leibniz' rule for the determinant, the left-hand side of Equation (3) is a polynomial function of the variable λ, and the degree of this polynomial is n, the order of the matrix A. Its coefficients depend on the entries of A, except that its term of degree n is always (−1)^n λ^n. This polynomial is called the characteristic polynomial of A. Equation (3) is called the characteristic equation or the secular equation of A.

The fundamental theorem of algebra implies that the characteristic polynomial of an n-by-n matrix A, being a polynomial of degree n, can be factored into the product of n linear terms,

where each λi may be real but in general is a complex number. The numbers λ1, λ2, …, λn, which may not all have distinct values, are roots of the polynomial and are the eigenvalues of A.
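For a 2 by 2 matrix this factorization can be carried out directly: the characteristic polynomial is λ² − tr(A)λ + det(A), and its roots, found with the quadratic formula, are the eigenvalues. A small sketch with an assumed example matrix:

```python
import cmath

# For a 2-by-2 matrix the characteristic polynomial is
# lambda^2 - tr(A)*lambda + det(A); its roots are the eigenvalues.
a, b, c, d = 2.0, 1.0, 1.0, 2.0        # hypothetical matrix [[a, b], [c, d]]
tr, det = a + d, a * d - b * c
disc = cmath.sqrt(tr * tr - 4 * det)   # complex sqrt handles negative discriminants
lam1, lam2 = (tr + disc) / 2, (tr - disc) / 2
print(lam1.real, lam2.real)  # 3.0 1.0
```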

As a brief example, which is described in more detail in the examples section later, consider the matrix

Taking the determinant of (A − λI), the characteristic polynomial of A is

Setting the characteristic polynomial equal to zero, it has roots at λ=1 and λ=3, which are the two eigenvalues of A. The eigenvectors corresponding to each eigenvalue can be found by solving for the components of v in the equation (A − λI)v = 0. In this example, the eigenvectors are any nonzero scalar multiples of

If the entries of the matrix A are all real numbers, then the coefficients of the characteristic polynomial will also be real numbers, but the eigenvalues may still have nonzero imaginary parts. The entries of the corresponding eigenvectors therefore may also have nonzero imaginary parts. Similarly, the eigenvalues may be irrational numbers even if all the entries of A are rational numbers or even if they are all integers. However, if the entries of A are all algebraic numbers, which include the rationals, the eigenvalues are complex algebraic numbers.

The non-real roots of a real polynomial with real coefficients can be grouped into pairs of complex conjugates, namely with the two members of each pair having imaginary parts that differ only in sign and the same real part. If the degree is odd, then by the intermediate value theorem at least one of the roots is real. Therefore, any real matrix with odd order has at least one real eigenvalue, whereas a real matrix with even order may not have any real eigenvalues. The eigenvectors associated with these complex eigenvalues are also complex and also appear in complex conjugate pairs.
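A standard illustrative example (an assumption, not from the text) is the real 2 by 2 matrix [[0, −1], [1, 0]], which rotates the plane by 90 degrees: its characteristic polynomial is λ² + 1, so its eigenvalues are the conjugate pair i and −i, and no real eigenvalue exists.

```python
import cmath

# The real matrix [[0, -1], [1, 0]] has characteristic polynomial
# lambda^2 + 1, so its eigenvalues form the conjugate pair i, -i.
tr, det = 0.0, 1.0                     # trace and determinant of [[0, -1], [1, 0]]
disc = cmath.sqrt(tr * tr - 4 * det)   # sqrt(-4) = 2i
lam1, lam2 = (tr + disc) / 2, (tr - disc) / 2
print(lam1, lam2)  # 1j -1j
```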

Algebraic multiplicity

Let λi be an eigenvalue of an n by n matrix A. The algebraic multiplicity μA(λi) of the eigenvalue is its multiplicity as a root of the characteristic polynomial, that is, the largest integer k such that (λ − λi)^k divides that polynomial evenly. [10] [27] [28]

Suppose a matrix A has dimension n and d ≤ n distinct eigenvalues. Whereas Equation (4) factors the characteristic polynomial of A into the product of n linear terms with some terms potentially repeating, the characteristic polynomial can instead be written as the product of d terms, each corresponding to a distinct eigenvalue and raised to the power of the algebraic multiplicity,

If d = n then the right-hand side is the product of n linear terms and this is the same as Equation (4). The size of each eigenvalue's algebraic multiplicity is related to the dimension n as

If μA(λi) = 1, then λi is said to be a simple eigenvalue. [28] If μA(λi) equals the geometric multiplicity of λi, γA(λi), defined in the next section, then λi is said to be a semisimple eigenvalue.

Eigenspaces, geometric multiplicity, and the eigenbasis for matrices

Given a particular eigenvalue λ of the n by n matrix A, define the set E to be all vectors v that satisfy Equation (2),

On one hand, this set is precisely the kernel or nullspace of the matrix (A − λI). On the other hand, by definition, any nonzero vector that satisfies this condition is an eigenvector of A associated with λ. So, the set E is the union of the zero vector with the set of all eigenvectors of A associated with λ, and E equals the nullspace of (A − λI). E is called the eigenspace or characteristic space of A associated with λ. [29] [10] In general λ is a complex number and the eigenvectors are complex n by 1 matrices. A property of the nullspace is that it is a linear subspace, so E is a linear subspace of ℂ^n.

Because the eigenspace E is a linear subspace, it is closed under addition. That is, if two vectors u and v belong to the set E, written u, v ∈ E, then (u + v) ∈ E or equivalently A(u + v) = λ(u + v). This can be checked using the distributive property of matrix multiplication. Similarly, because E is a linear subspace, it is closed under scalar multiplication. That is, if v ∈ E and α is a complex number, (αv) ∈ E or equivalently A(αv) = λ(αv). This can be checked by noting that multiplication of complex matrices by complex numbers is commutative. As long as u + v and αv are not zero, they are also eigenvectors of A associated with λ.

The dimension of the eigenspace E associated with λ, or equivalently the maximum number of linearly independent eigenvectors associated with λ, is referred to as the eigenvalue's geometric multiplicity γA(λ). Because E is also the nullspace of (A − λI), the geometric multiplicity of λ is the dimension of the nullspace of (A − λI), also called the nullity of (A − λI), which relates to the dimension and rank of (A − λI) as

Because of the definition of eigenvalues and eigenvectors, an eigenvalue's geometric multiplicity must be at least one, that is, each eigenvalue has at least one associated eigenvector. Furthermore, an eigenvalue's geometric multiplicity cannot exceed its algebraic multiplicity. Additionally, recall that an eigenvalue's algebraic multiplicity cannot exceed n.
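A standard example of the gap between the two multiplicities (assumed for illustration, not from the text) is the defective matrix [[2, 1], [0, 2]]: its characteristic polynomial is (λ − 2)², so the algebraic multiplicity of λ = 2 is 2, yet the geometric multiplicity is only 1.

```python
# Defective matrix: algebraic multiplicity 2, geometric multiplicity 1.
# For A = [[2, 1], [0, 2]], the characteristic polynomial is (lambda - 2)^2,
# so mu_A(2) = 2, but A - 2I = [[0, 1], [0, 0]] has rank 1, hence
# nullity (= geometric multiplicity) is n - rank = 1.
A = [[2.0, 1.0], [0.0, 2.0]]
lam = 2.0
M = [[A[i][j] - (lam if i == j else 0.0) for j in range(2)] for i in range(2)]
# M is already in row echelon form here, so rank = number of nonzero rows.
rank = sum(1 for row in M if any(abs(x) > 1e-12 for x in row))
geometric_multiplicity = 2 - rank
print(geometric_multiplicity)  # 1
```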

Additional properties of eigenvalues

Left and right eigenvectors

Many disciplines traditionally represent vectors as matrices with a single column rather than as matrices with a single row. For that reason, the word "eigenvector" in the context of matrices almost always refers to a right eigenvector, namely a column vector that right multiplies the n × n matrix A in the defining equation, Equation (1),

The eigenvalue and eigenvector problem can also be defined for row vectors that left multiply matrix A . In this formulation, the defining equation is

Diagonalization and the eigendecomposition

Suppose the eigenvectors of A form a basis, or equivalently A has n linearly independent eigenvectors v1, v2, …, vn with associated eigenvalues λ1, λ2, …, λn. The eigenvalues need not be distinct. Define a square matrix Q whose columns are the n linearly independent eigenvectors of A,

Since each column of Q is an eigenvector of A, right multiplying A by Q scales each column of Q by its associated eigenvalue,

With this in mind, define a diagonal matrix Λ where each diagonal element Λii is the eigenvalue associated with the ith column of Q. Then

Because the columns of Q are linearly independent, Q is invertible. Right multiplying both sides of the equation by Q −1 ,

or by instead left multiplying both sides by Q −1 ,

A can therefore be decomposed into a matrix composed of its eigenvectors, a diagonal matrix with its eigenvalues along the diagonal, and the inverse of the matrix of eigenvectors. This is called the eigendecomposition and it is a similarity transformation. Such a matrix A is said to be similar to the diagonal matrix Λ or diagonalizable. The matrix Q is the change of basis matrix of the similarity transformation. Essentially, the matrices A and Λ represent the same linear transformation expressed in two different bases. The eigenvectors are used as the basis when representing the linear transformation as Λ.
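A minimal numerical sketch of the eigendecomposition (the 2 by 2 matrix and its eigenvectors are assumptions for illustration): multiplying Q, Λ, and Q⁻¹ back together reproduces A.

```python
# Eigendecomposition A = Q Lambda Q^{-1} for a hypothetical 2x2 example.
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

Q    = [[1.0, 1.0], [1.0, -1.0]]   # columns are eigenvectors of A
Lam  = [[3.0, 0.0], [0.0, 1.0]]    # eigenvalues on the diagonal
Qinv = [[0.5, 0.5], [0.5, -0.5]]   # inverse of Q
A = matmul(matmul(Q, Lam), Qinv)   # reassemble A from its decomposition
print(A)  # [[2.0, 1.0], [1.0, 2.0]]
```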

Conversely, suppose a matrix A is diagonalizable. Let P be a non-singular square matrix such that P −1 AP is some diagonal matrix D. Left multiplying both sides by P, AP = PD . Each column of P must therefore be an eigenvector of A whose eigenvalue is the corresponding diagonal element of D. Since the columns of P must be linearly independent for P to be invertible, there exist n linearly independent eigenvectors of A. It then follows that the eigenvectors of A form a basis if and only if A is diagonalizable.

A matrix that is not diagonalizable is said to be defective. For defective matrices, the notion of eigenvectors generalizes to generalized eigenvectors and the diagonal matrix of eigenvalues generalizes to the Jordan normal form. Over an algebraically closed field, any matrix A has a Jordan normal form and therefore admits a basis of generalized eigenvectors and a decomposition into generalized eigenspaces.

Answer: Similar matrices. Two matrices $A$ and $B$ are similar if there exists an invertible matrix $S$ so that $A = S^{-1}BS$. Similar matrices have the same eigenvalues: $ A\underline{v} = \lambda \underline{v} \quad \Leftrightarrow \quad B(S\underline{v}) = \lambda (S\underline{v}). $ The iterates $A_0,A_1,\ldots$ from the QR algorithm are similar matrices since $ A_{k+1} = R_kQ_k = Q_k^{-1}Q_k R_k Q_k = Q_k^{-1}A_kQ_k. $ Therefore, $A_0,A_1,\ldots$ all have the same eigenvalues.

The secret to why the QR algorithm produces iterates that usually converge to reveal the eigenvalues is the fact that the algorithm is a well-disguised (successive) power method. A very first idea for calculating eigenvalues might be to perform the power iteration on a basis $\underline{x}_1,\ldots,\underline{x}_n$ of $\mathbb{R}^n$ instead of just one vector, that is, to consider the sequences: $ \underline{x}_j,\ A\underline{x}_j,\ A^2\underline{x}_j,\ldots $

Unfortunately, this does not work: the vectors $A^k\underline{x}_j$ all tend to be close to a multiple of the eigenvector of $A$ corresponding to the eigenvalue of largest magnitude. Moreover, $\|A^k\underline{x}_j\|$ usually overflows or underflows for moderate $k$. We can resolve the situation by orthonormalizing the vectors after each application of $A$, which prevents them from all converging to the dominant eigenvector.
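The normalized power iteration described above, combined with the Rayleigh quotient from the earlier section, can be sketched as follows; the matrix and starting vector are illustrative assumptions:

```python
import math

# Power method sketch: repeatedly apply A and normalize; the Rayleigh
# quotient of the iterate converges to the dominant eigenvalue.
A = [[2.0, 1.0], [1.0, 2.0]]   # assumed test matrix, dominant eigenvalue 3
x = [1.0, 0.3]                 # arbitrary starting vector
for _ in range(50):
    Ax = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
    norm = math.sqrt(sum(t * t for t in Ax))
    x = [t / norm for t in Ax]             # normalize to avoid over/underflow
Ax = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
rayleigh = sum(x[i] * Ax[i] for i in range(2))   # x has unit length
print(round(rayleigh, 6))  # 3.0
```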

The QR algorithm based on Gram-Schmidt and the power iteration

Let $\underline{x}_1,\ldots,\underline{x}_n$ be $n$ linearly independent vectors in $\mathbb{R}^n$ and suppose that $v_1,\ldots,v_n$ is an orthonormal basis of Schur vectors for $A$ (see wiki). Consider the following algorithm for computing the vectors $v_1,\ldots,v_n$, based on the Gram-Schmidt procedure and the power method:

While this algorithm is far better than the power iteration without orthogonalization, it is still numerically unstable (as it is based on classical Gram-Schmidt). In particular, the computed vectors $\underline{v}_1,\ldots,\underline{v}_n$ may not be orthonormal numerically. To try it out, we implement this algorithm below for symmetric matrices, so that the Schur vectors are in fact eigenvectors:
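The implementation referred to here did not survive extraction; the following is a hedged Python sketch of the same idea, classical Gram-Schmidt combined with simultaneous power iteration, for a small assumed symmetric matrix:

```python
import math, random

# Sketch (assumed): iterate x -> A x on several vectors, re-orthonormalizing
# with classical Gram-Schmidt after each step, so that for a symmetric A the
# vectors converge to orthonormal eigenvectors.
def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def gram_schmidt(vs):
    basis = []
    for v in vs:
        for b in basis:                      # subtract projections (classical GS)
            coeff = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - coeff * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        basis.append([vi / norm for vi in v])
    return basis

A = [[3.0, 2.0, 4.0], [2.0, 0.0, 2.0], [4.0, 2.0, 3.0]]  # symmetric test matrix
random.seed(0)
V = [[random.random() for _ in range(3)] for _ in range(3)]
for _ in range(200):
    V = gram_schmidt([matvec(A, v) for v in V])
# Rayleigh quotients of the converged vectors approximate the eigenvalues.
eigs = []
for v in V:
    Av = matvec(A, v)
    eigs.append(sum(vi * avi for vi, avi in zip(v, Av)))
print([round(e, 6) for e in eigs])  # [8.0, -1.0, -1.0]
```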

Here, we test this code to witness the numerical instability:

Orthogonal iteration

Of course, we can greatly improve the situation by combining modified Gram-Schmidt with the power method, where we do not wait for the vector $\underline{v}_1$ to converge before starting the power iteration for $\underline{v}_2$. Instead, we run the power method simultaneously on every vector and orthogonalize after every step. An even better idea is to use Householder reflections for extra numerical stability. This gives an algorithm called orthogonal iteration. The algorithm looks like this:

The matrix iterates $U^{(0)},U^{(1)},\ldots$ here are computed so that $(U^{(k)})^T A U^{(k)} \rightarrow T$, where $T$ is upper-triangular.

Orthogonal iteration to QR algorithm

One can view the QR algorithm as a well-designed version of orthogonal iteration with $U^{(0)} = I$. The connection can be seen from the fact that they are both computing QR factorizations of the matrix $A^k$: $ \begin{aligned} \text{QR algorithm:}\quad A^k &= Q_1R_1\,Q_1R_1\cdots Q_1R_1 = Q_1\,Q_2R_2\,Q_2R_2\cdots Q_2R_2\,R_1 = \cdots = (Q_1\cdots Q_k)(R_k\cdots R_1)\\[10pt] \text{Orthogonal iteration:}\quad A^k &= AA^{k-1} = AQ^{(k-1)}(Q^{(k-1)})^TA^{k-1} = Q^{(k)}R^{(k)}(Q^{(k-1)})^TA^{k-1} = \cdots = Q^{(k)}(R^{(k)}\cdots R^{(1)}). \end{aligned} $

John Francis' algorithm computes $A_k = Q_kR_k$ at the $k$th iteration, and these iterates converge to an upper-triangular matrix. This is not too surprising: we expect $Q_1\cdots Q_k$ to converge to an orthogonal matrix, so $Q_k$ should converge to the identity matrix, and hence $A_k = Q_kR_k$ will usually converge to an upper-triangular matrix. Since $A_k = Q_kR_k$ is similar to $A$, the eigenvalues of $R_k$ should converge to the eigenvalues of $A$ as $k \rightarrow \infty$. In the next section, we make this intuition precise.
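The iteration described above can be sketched in a few lines (a minimal unshifted QR iteration built on Gram-Schmidt QR; the symmetric 2 by 2 test matrix is an assumption):

```python
import math

# Minimal unshifted QR iteration: A_{k+1} = R_k Q_k. For a symmetric matrix
# the iterates converge to a diagonal matrix holding the eigenvalues.
def qr(A):
    n = len(A)
    Qcols, R = [], [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [A[i][j] for i in range(n)]            # j-th column of A
        for i, q in enumerate(Qcols):
            R[i][j] = sum(vk * qk for vk, qk in zip(v, q))
            v = [vk - R[i][j] * qk for vk, qk in zip(v, q)]
        R[j][j] = math.sqrt(sum(vk * vk for vk in v))
        Qcols.append([vk / R[j][j] for vk in v])   # Qcols[j] is column j of Q
    return Qcols, R

A = [[2.0, 1.0], [1.0, 2.0]]                       # eigenvalues 3 and 1
for _ in range(40):
    Qcols, R = qr(A)
    A = [[sum(R[i][k] * Qcols[j][k] for k in range(2)) for j in range(2)]
         for i in range(2)]                        # A_{k+1} = R_k Q_k
print([round(A[0][0], 6), round(A[1][1], 6)])  # [3.0, 1.0]
```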

A proof of convergence

Here, we prove the following convergence theorem with many assumptions, to make the proof as easy as possible. Afterwards, we note two ways to make the theorem stronger.

Theorem: Let $A$ be an $n \times n$ symmetric positive definite matrix with distinct eigenvalues given by $ \lambda_1>\lambda_2>\cdots>\lambda_n>0. $ Assume further that $A = Q\Lambda Q^T$ is the eigenvalue decomposition of $A$, where $Q^T=LU$ has an LU decomposition and the diagonal entries of $U$ are nonnegative. Then, the unshifted QR algorithm on $A$ computes iterates $A_1,A_2,A_3,\ldots$ that converge to a diagonal matrix.

Proof. For simplicity, we start by assuming that every computed QR factorization of a matrix involves an upper-triangular matrix $R$ with nonnegative diagonal entries. This ensures that the QR factorization is unique (see class exercises). Let $ A = Q\Lambda Q^T $ be the eigenvalue decomposition for $A$. Then, $A^k$ can be written as $ A^k = Q\Lambda^k Q^T = (Q_1\cdots Q_k)(R_k\cdots R_1). $ Since $Q^T = LU$ exists by assumption (recall that $L$ is unit lower triangular, so has $1$s on its diagonal)

(1) Recall that eigenvalues are roots of the characteristic polynomial $p(\lambda)=\det(A-\lambda I_n)$.
It follows that we have
$$\det(A-\lambda I_n)=\begin{vmatrix} a_{11}-\lambda & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22}-\lambda & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn}-\lambda \end{vmatrix}=\prod_{i=1}^n (\lambda_i-\lambda). \tag{*}$$

Letting $\lambda=0$, we see that $\det(A)=\prod_{i=1}^n \lambda_i$, and this completes the proof of part (a).

(2) Compare the coefficients of $\lambda^{n-1}$ on both sides of (*).
The coefficient of $\lambda^{n-1}$ of the determinant on the left side of (*) is
$$(-1)^{n-1}(a_{11}+a_{22}+\cdots+a_{nn})=(-1)^{n-1}\operatorname{tr}(A).$$
The coefficient of $\lambda^{n-1}$ on the right side of (*) is
$$(-1)^{n-1}\sum_{i=1}^n \lambda_i.$$
Thus we have $\operatorname{tr}(A)=\sum_{i=1}^n \lambda_i$.
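Both identities are easy to check numerically for a small example (the 2 by 2 matrix here is an illustrative assumption): det(A) equals the product of the eigenvalues and tr(A) equals their sum.

```python
# Numerical check of the two identities above for a hypothetical 2x2 matrix
# A = [[2, 1], [1, 2]], whose eigenvalues are 3 and 1.
a, b, c, d = 2.0, 1.0, 1.0, 2.0
det, tr = a * d - b * c, a + d
lam1, lam2 = 3.0, 1.0                 # eigenvalues of A
assert abs(det - lam1 * lam2) < 1e-12   # det(A) = product of eigenvalues
assert abs(tr - (lam1 + lam2)) < 1e-12  # tr(A)  = sum of eigenvalues
print(det, tr)  # 3.0 4.0
```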


Consider multiplying a square 3x3 matrix by a 3x1 (column) vector. The result is a 3x1 (column) vector. The 3x3 matrix can be thought of as an operator - it takes a vector, operates on it, and returns a new vector. There are many instances in mathematics and physics in which we are interested in which vectors are left "essentially unchanged" by the operation of the matrix. Specifically, we are interested in those vectors v for which Av=kv, where A is a square matrix and k is a real number. A vector v for which this equation holds is called an eigenvector of the matrix A, and the associated constant k is called the eigenvalue (or characteristic value) of the vector v. If a matrix has more than one eigenvector, the associated eigenvalues can be different for the different eigenvectors.

Geometrically, the action of a matrix on one of its eigenvectors causes the vector to stretch (or shrink) and/or reverse direction.

In order to find the eigenvalues of an nxn matrix A (if any), we solve Av=kv for scalar(s) k. Rearranging, we have Av-kv=0. But kv=kIv, where I is the nxn identity matrix.

So, 0=Av-kv=Av-kIv=(A-kI)v. This equation is equivalent to a homogeneous system of n equations with n unknowns. This equation has a non-zero solution for v if and only if the determinant det(A-kI) is zero. Thus, by finding the zeros of the polynomial in k determined by the characteristic equation det(A-kI)=0, we will have found the eigenvalues of the matrix A.


To find the eigenvalues of the matrix

we substitute A into the equation det(A-kI)=0 and solve for k. The matrix A-kI is given by

which has determinant k^2-2k-3. Hence, we are looking for values k satisfying k^2-2k-3=0. So, of course, we have k=3 or k=-1 . To find the eigenvectors of the eigenvalue k=3 we look for solutions v of the homogeneous system of equations (A-3I)v=0:

Since the second equation is a constant multiple of the first, this system of equations reduces to the single equation -x+(3/2)y=0, or equivalently x=1.5y. This system has an infinite number of solutions. So, for example, choosing y=2 yields the vector <3,2>, which is thus an eigenvector with eigenvalue k=3. In general form, all eigenvectors with eigenvalue 3 have the form <3t,2t> where t is any real number. It can also be shown (by solving the system (A+I)v=0) that vectors of the form <t,-2t> are eigenvectors with eigenvalue k=-1.
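The matrix display for this example did not survive extraction. Assuming A = [[2, 1.5], [2, 0]] (a hypothetical reconstruction chosen to match the stated determinant k^2-2k-3 and the reduced equation -x+(3/2)y=0), both eigenpairs can be verified directly:

```python
# Assumed matrix consistent with the worked example above (the original
# display is missing): det(A - kI) = k^2 - 2k - 3, and (A - 3I) row-reduces
# to -x + 1.5y = 0.
A = [[2.0, 1.5],
     [2.0, 0.0]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

for vec, k in [([3.0, 2.0], 3.0), ([1.0, -2.0], -1.0)]:
    Av = matvec(A, vec)
    assert all(abs(Av[i] - k * vec[i]) < 1e-12 for i in range(2))
print("both eigenpairs check out")
```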


Find the eigenvalues and corresponding eigenvectors for the matrix

First, we must find det(A-kI):

This leads to the characteristic equation k^2+2k+2=0 which has complex roots k=-1+i and k=-1-i. To find the eigenvectors for k=-1+i, we solve (A-(-1+i)I)v=0 for v:

The second equation is a constant multiple of the first equation, so the system reduces to the single equation (2-i)x-y=0, which implies y=(2-i)x. Once again there are infinitely many eigenvectors of A, of the form <t,(2-i)t> with eigenvalue k=-1+i. By examining the system of equations (A-(-1-i)I)v=0, it can also be shown that vectors of the form <t,(2+i)t> are eigenvectors of A with eigenvalue k=-1-i.

From the examples above we can infer a property of eigenvectors and eigenvalues: eigenvectors from distinct eigenvalues are linearly independent. The following examples illustrate that the situation is not so clear cut when the eigenvalues are not distinct.


has two eigenvalues (1 and 1) but they are obviously not distinct. Since A is the identity matrix, Av=v for any vector v, i.e. any vector is an eigenvector of A. We can thus find two linearly independent eigenvectors (say <-2,1> and <3,-2>) one for each eigenvalue.


also has non-distinct eigenvalues of 1 and 1. All eigenvectors are solutions of (A-I)v=0 and are thus of the form <t,0>. Hence, in this case there do not exist two linearly independent eigenvectors for the two eigenvalues 1 and 1, since <t,0> and <s,0> are not linearly independent for any values of s and t.

Symmetric Matrices

There is a very important class of matrices called symmetric matrices that have quite nice properties concerning eigenvalues and eigenvectors. A symmetric matrix A is a square matrix with the property that A_ij=A_ji for all i and j. The matrices

are symmetric matrices. In symmetric matrices the upper right half and the lower left half of the matrix are mirror images of each other about the diagonal.

If A is an nxn symmetric matrix, then:

  1. A has exactly n (not necessarily distinct) eigenvalues.
  2. There exists a set of n eigenvectors, one for each eigenvalue, that are mutually orthogonal.

Thus, the situation encountered with the matrix D in the example above cannot happen with a symmetric matrix: A symmetric matrix has n eigenvalues and there exist n linearly independent eigenvectors (because of orthogonality) even if the eigenvalues are not distinct .


Find the eigenvalues and a set of mutually orthogonal eigenvectors of the symmetric matrix

Thus, the characteristic equation is (k-8)(k+1)^2=0 which has roots k=-1, k=-1, and k=8. Note that we have listed k=-1 twice since it is a double root. We must find two eigenvectors for k=-1 and one for k=8. We now examine (A+I)v=0 to find the eigenvectors for the eigenvalue k=-1:

It is easily seen that this system reduces to the single equation 2x+y+2z=0, since the other two equations are twice this one. There are two free parameters here (x and z); thus, eigenvectors for k=-1 satisfy y=-2x-2z, which corresponds to vectors of the form <s,-2s-2t,t>. We must choose values of s and t that yield two orthogonal vectors (the third comes from the eigenvalue k=8). First, choose anything, say s=1 and t=0: <1,-2,0>. Now find a vector <x,-2x-2z,z> that is orthogonal to <1,-2,0>; the dot product condition is x+2(2x+2z)=5x+4z=0.

An easy choice here is x=4 and z=-5. So, we now have two orthogonal vectors <1,-2,0> and <4,2,-5> that correspond to the two instances of the eigenvalue k=-1. It can also be shown that the eigenvectors for k=8 are of the form <2r,r,2r> for any value of r. It is easy to check that this vector is orthogonal to the other two we have for any choice of r. So, let's take r=1. We now have the following: eigenvalues and orthogonal eigenvectors:

Note that since this matrix is symmetric we do indeed have 3 eigenvalues and a set of 3 orthogonal (and thus linearly independent) eigenvectors (one for each eigenvalue).
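As a quick check, the three eigenvectors found in this example really are mutually orthogonal; every pairwise dot product vanishes:

```python
# Verify that the three eigenvectors from the symmetric example above are
# mutually orthogonal: all pairwise dot products should be zero.
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

v1, v2, v3 = [1, -2, 0], [4, 2, -5], [2, 1, 2]
print(dot(v1, v2), dot(v1, v3), dot(v2, v3))  # 0 0 0
```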
