Date | Topics discussed
---|---
Tuesday, March 30
This lecture is the last to be devoted solely to linear algebra. I'll
discuss the use of eigenvectors and eigenvalues to perform
diagonalization, and why we might want to diagonalize matrices. In
this I will use two examples I began in the previous lecture.
Example 1
Let's take

    A=(2 1)
      (3 4)

We saw in the last lecture that this A has two eigenvalues and we computed corresponding eigenvectors. Here they are: for the eigenvalue λ=1, the associated eigenvectors are non-zero multiples of (1,-1); for the eigenvalue λ=5, the associated eigenvectors are non-zero multiples of (1,3). I then created the matrix

    C=( 1 1)
      (-1 3)

and did some computations which I said I would explain afterwards. First, I created C^{-1}:

    ( 1 1 | 1 0)~(1 1 | 1 0)~(1 0 | 3/4 -1/4)
    (-1 3 | 0 1) (0 4 | 1 1) (0 1 | 1/4  1/4)

So C^{-1} is

    (3/4 -1/4)
    (1/4  1/4)

and you can check this easily by multiplying C^{-1} with C; the result will be I_{2}, the 2 by 2 identity matrix:

    I_{2}=(1 0)
          (0 1)

Now I computed the rather interesting product C^{-1}AC. Since matrix multiplication is associative, the way I group the factors doesn't matter. That is, C^{-1}(AC) and (C^{-1}A)C will give the same result. Now AC is

    (2 1)( 1 1)=( 1  5)
    (3 4)(-1 3) (-1 15)

and then C^{-1}(AC) is

    (3/4 -1/4)( 1  5)=(1 0)
    (1/4  1/4)(-1 15) (0 5)

Maybe it is remarkable that the result turns out to be a rather simple diagonal matrix, D. On the other hand, let me try to explain what I'm doing. Suppose I take a vector (x,y) in R^{2} and I try to investigate what multiplying the corresponding column vector by the 2 by 2 matrix C^{-1}AC "does". The corresponding column vector is

    (x)=x(1)+y(0)=x e_{1}+y e_{2}
    (y)  (0)  (1)

and multiplication by C distributes over this linear combination. So we just need to understand C e_{1} and C e_{2}. I "built" C so that the results of these computations are eigenvectors of A corresponding to the eigenvalues λ=1 and λ=5:

    ( 1) and (1)
    (-1)     (3)

Therefore when we multiply

    x( 1)+y(1)
     (-1)  (3)

by A the result must be

    x·1·( 1)+y·5·(1)
        (-1)     (3)

C changes bases from the "standard basis" of R^{2} to a basis of the two eigenvectors (I know it is a basis because the two vectors are linearly independent, because the matrix C has an inverse!). What does C^{-1} then do? It changes back from the basis of eigenvectors to the standard basis. The important thing to observe is that x and y are just going to get changed to 1x and 5y, because we used the eigenvectors when we multiplied by A. So the result D, the diagonal matrix with entries 1 and 5, as the value of C^{-1}AC, shouldn't be a surprise.
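The lecture used Maple for such checks; as a sketch of the same verification in plain Python (my addition, not part of the lecture), exact rational arithmetic from the standard fractions module confirms both C^{-1}C=I_{2} and C^{-1}AC=D:

```python
from fractions import Fraction as F

def matmul(X, Y):
    # multiply two matrices stored as lists of rows
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A    = [[F(2), F(1)], [F(3), F(4)]]
C    = [[F(1), F(1)], [F(-1), F(3)]]
Cinv = [[F(3, 4), F(-1, 4)], [F(1, 4), F(1, 4)]]

# C^{-1}C should be the identity, and C^{-1}AC the diagonal matrix D
assert matmul(Cinv, C) == [[1, 0], [0, 1]]
D = matmul(Cinv, matmul(A, C))
assert D == [[1, 0], [0, 5]]
```

Because every entry here is rational, the check is exact -- no roundoff to worry about.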
Comment Although C^{-1}AC should be diagonal, I should confess when I'm doing these computations by hand (and even sometimes with the help of computers!) I make mistakes. And I confidently (!) expect the result to be diagonal, and it is not because, well, I may have "dropped" a minus sign, or I may have entered a command wrong in Maple or ... many, many reasons. There are lots of ways to be human, and "To err is human."
Another short digression
Look again at

    A=(1 283)
      (0   1)

We learned that the only eigenvalue was 1 and that all eigenvectors are non-zero multiples of one vector: (1,0). Can this matrix be diagonalized as I just computed above? The answer is an emphatic No! An n by n matrix can be diagonalized exactly when there is a basis of R^{n} consisting of n eigenvectors, because those are the vectors which would supply us with a matrix, C, which changes coordinates. Sometimes the reasons that a matrix can't be diagonalized are subtle and difficult to explain, but here it is evident: there aren't enough eigenvectors.

So now I diagonalized the matrix

    A=(2 1)
      (3 4)

Why should I/we care about this? There certainly turn out to be theoretical reasons, but there are many practical numerical reasons also. For example, I could want to compute powers of A. What's A^{6}, for example? (Why you might want to compute high powers of matrices: it turns out that the "evolution" of certain systems over time is equivalent to looking at powers of matrices. We don't have enough time in the course to explain this.) Anyway, Maple told me that A^{6} is

    ( 3907  3906)
    (11718 11719)

But I am worried. The matrix looks too nice, and maybe the answer is wrong or ... somehow something got fouled up. Let me show you how to do some cheap and easy checking of this result. We know that C^{-1}AC=D. Therefore we could multiply on the left by C and on the right by C^{-1}. The result would be A=CDC^{-1}. Notice, please, that I must be very careful about the order, because matrix multiplication is not necessarily commutative. That is, order may matter, but "grouping" (associativity) does not. So we must be careful. Since A=CDC^{-1} we can write

    A^{6}=(CDC^{-1})(CDC^{-1})(CDC^{-1})(CDC^{-1})(CDC^{-1})(CDC^{-1})

and we can regroup (associate, but not reorder!). Now each inner product C^{-1}C becomes I_{2}, and I_{2} is a multiplicative identity -- you can multiply by it and not change anything.
Therefore A^{6}=CD^{6}C^{-1}. We can choose to compute A^{6} by computing D^{6} instead, and then pre- and post-multiplying by the appropriate matrices. Multiplying diagonal matrices is quite easy:

    (k_{1}   0  )(m_{1}   0  )=(k_{1}m_{1}      0     )
    (  0   k_{2})(  0   m_{2}) (    0      k_{2}m_{2})

Therefore if

    D=(1 0)  then  D^{6}=(1^{6}   0  )=(1   0  )
      (0 5)              (  0   5^{6}) (0 5^{6})

and I will not "simplify" further since I don't know offhand what 5^{6} is. I can sort of compute CD^{6}C^{-1}, though. Let's see: D^{6}C^{-1} is

    (1   0  )(3/4 -1/4)=(   3/4     -1/4  )
    (0 5^{6})(1/4  1/4) (5^{6}/4  5^{6}/4)

and now let's left multiply by C:

    ( 1 1)(   3/4     -1/4  )=(  [3+5^{6}]/4    [-1+5^{6}]/4 )
    (-1 3)(5^{6}/4  5^{6}/4)  ([-3+3·5^{6}]/4  [1+3·5^{6}]/4)

What a mess this is! But this mess can easily explain some of the (supposedly) correct answer, that A^{6} is

    ( 3907  3906)
    (11718 11719)

Look: the entries in the top row in both answers do differ by 1, as do the entries in the bottom row. The "/4" is exactly what is needed to make things match up. So A^{100} similarly could be computed by evaluating CD^{100}C^{-1}. Maybe this isn't too darn impressive to you, but it is to me. Computing the 100^{th} power of a 2 by 2 matrix directly is ... uhhh ... lots of computation. If, in real life, we had a 36 by 36 matrix (that is really not too big) and I wanted to compute powers of this matrix efficiently, I certainly would prepare by diagonalizing it, computing C and C^{-1} and D. This would be much less work than a direct computation.
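The same exact-arithmetic check can be sketched in Python (my illustration, not from the lecture): CD^{6}C^{-1} agrees both with Maple's answer and with the brute-force product of six copies of A.

```python
from fractions import Fraction as F

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A    = [[F(2), F(1)], [F(3), F(4)]]
C    = [[F(1), F(1)], [F(-1), F(3)]]
Cinv = [[F(3, 4), F(-1, 4)], [F(1, 4), F(1, 4)]]

# D^6 is trivial: just raise the diagonal entries to the 6th power
D6 = [[F(1) ** 6, F(0)], [F(0), F(5) ** 6]]
A6 = matmul(C, matmul(D6, Cinv))

# the hard way, for comparison: five successive multiplications by A
P = A
for _ in range(5):
    P = matmul(P, A)

assert A6 == P == [[3907, 3906], [11718, 11719]]
```

For A^{100} the diagonal route needs two matrix multiplications plus two big integer powers, instead of ninety-nine matrix multiplications.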
Matrix exponentials
Since A=CDC^{-1}, every term of the series for e^{At} can be computed from D. The n^{th} term of the series for e^{Dt} is D^{n}t^{n}/n!, the diagonal matrix

    (t^{n}/n!      0     )
    (   0     (5t)^{n}/n!)

If you sum this up from n=0 to infinity, the entries in the matrix are

    (e^{t}    0  )
    (  0   e^{5t})

and therefore e^{At} is exactly

    C(e^{t}    0  )C^{-1}=( (3/4)e^{t}+(1/4)e^{5t}  -(1/4)e^{t}+(1/4)e^{5t})
     (  0   e^{5t})       (-(3/4)e^{t}+(3/4)e^{5t}   (1/4)e^{t}+(3/4)e^{5t})

The general idea is that all the work can be done componentwise on D, and, once A is diagonalized, you don't really need to do much with A itself. I also admit that the last computation was done by Maple, which has, for these things, much more patience than I do!
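Here is a numerical sketch of the same identity (my addition, in Python floats; the sample time t=0.3 and the 40-term truncation are my choices, not from the lecture): e^{At}=Ce^{Dt}C^{-1} is compared against a directly summed, truncated power series for e^{At}.

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

t = 0.3   # any modest sample time works

# e^{At} via the diagonalization: C * e^{Dt} * C^{-1}
C     = [[1.0, 1.0], [-1.0, 3.0]]
Cinv  = [[0.75, -0.25], [0.25, 0.25]]
expDt = [[math.exp(t), 0.0], [0.0, math.exp(5 * t)]]
expAt = matmul(C, matmul(expDt, Cinv))

# e^{At} via a truncated power series: sum of (At)^n / n!
A  = [[2.0, 1.0], [3.0, 4.0]]
At = [[entry * t for entry in row] for row in A]
term   = [[1.0, 0.0], [0.0, 1.0]]          # the n = 0 term
series = [row[:] for row in term]
for n in range(1, 40):
    term   = [[x / n for x in row] for row in matmul(term, At)]
    series = [[series[i][j] + term[i][j] for j in range(2)] for i in range(2)]

assert max(abs(expAt[i][j] - series[i][j])
           for i in range(2) for j in range(2)) < 1e-9
```

The series needs a matrix multiplication per term; the diagonalized form needs only two scalar exponentials.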
Example 2
Mr. Hunt kindly answered the QotD given last time. That was: find the eigenvalues of

    (0 2 1)
    (2 0 1)
    (1 1 1)

He computed the characteristic polynomial, det(A-λI_{3}), and got a cubic polynomial: λ^{3}-λ^{2}-6λ. We can "peel off" a λ to completely factor this: it becomes λ(λ-3)(λ+2). The roots, the eigenvalues, are 0 and -2 and 3. Each eigenvalue has at least one eigenvector, so I already know I am in good shape for diagonalizing. There will be three eigenvectors, and, if you think about it (this is in the book), eigenvectors corresponding to different eigenvalues are linearly independent.
I then asked the class to produce eigenvectors for each
eigenvalue. After some waiting, we discovered that you could almost
guess the eigenvectors,
since this is an in-class example. So for λ=0 we need (x_{1},x_{2},x_{3}) so that

    (A-0I_{3})X=(0 2 1)(x_{1})=(0)
                (2 0 1)(x_{2}) (0)
                (1 1 1)(x_{3}) (0)

We can guess the answer(s)! It isn't too hard, since we know this is an in-class example. The corresponding eigenvector is (all non-zero multiples of) (1,1,-2). When λ=-2, we need (x_{1},x_{2},x_{3}) so that

    (A-(-2)I_{3})X=(2 2 1)(x_{1})=(0)
                   (2 2 1)(x_{2}) (0)
                   (1 1 3)(x_{3}) (0)

Again we guessed the answer(s)! The corresponding eigenvector doesn't involve x_{3} at all, and it is (all non-zero multiples of) (1,-1,0). When λ=3, we need (x_{1},x_{2},x_{3}) so that

    (A-3I_{3})X=(-3  2  1)(x_{1})=(0)
                ( 2 -3  1)(x_{2}) (0)
                ( 1  1 -2)(x_{3}) (0)

This was somehow the most difficult one to guess. Of course, we could do row reduction, etc. Oh well. We can guess the answer(s)! The corresponding eigenvector is (all non-zero multiples of) (1,1,1). The eigenvectors for this matrix are therefore (1,1,-2) and (1,-1,0) and (1,1,1) corresponding to λ=0 and λ=-2 and λ=3, respectively. I could now diagonalize, etc. But I asked something more difficult of the class. I remarked that this A was not "random", since it was symmetric: A=A^{t} (A is its own transpose). I believe that Mr. Ivanov first noticed that these eigenvectors were orthogonal: the dot product of two different eigenvectors was 0. (That certainly did not happen with our first 2 by 2 example.)
This is generally true. Please see pp.359-360 of the text for the
following results (which are not hard to verify). The results are
very useful in practice:
Consider the matrix whose columns are the three eigenvectors:

    ( 1  1 1)
    ( 1 -1 1)
    (-2  0 1)

What if I were to take the transpose of this matrix:

    (1  1 -2)
    (1 -1  0)
    (1  1  1)

Now check this: the product of the second matrix with the first is

    (6 0 0)
    (0 2 0)
    (0 0 3)

so if I "adjusted" the lengths by multiplying the columns of the initial guess by constants, then the transpose would be the inverse. So I really should take C to be:

    ( 1/sqrt(6)  1/sqrt(2) 1/sqrt(3))
    ( 1/sqrt(6) -1/sqrt(2) 1/sqrt(3))
    (-2/sqrt(6)     0      1/sqrt(3))

and then C^{-1} would be C^{t}:

    (1/sqrt(6)  1/sqrt(6) -2/sqrt(6))
    (1/sqrt(2) -1/sqrt(2)     0     )
    (1/sqrt(3)  1/sqrt(3)  1/sqrt(3))

This is wonderful -- well, wonderful because it is less work. Here is the general recipe:
Here

    A=(0 2 1)
      (2 0 1)
      (1 1 1)

which has eigenvalues 0 and -2 and 3. C is the matrix

    ( 1/sqrt(6)  1/sqrt(2) 1/sqrt(3))
    ( 1/sqrt(6) -1/sqrt(2) 1/sqrt(3))
    (-2/sqrt(6)     0      1/sqrt(3))

and then I claim that C^{t}AC is actually the diagonal matrix

    (0  0 0)
    (0 -2 0)
    (0  0 3)

If you are too tired to check this, well, I don't really blame you. Here is Maple's verification:

    > eval(A);
        [0 2 1]
        [2 0 1]
        [1 1 1]
    > eval(C);
        [ 1/6*6^(1/2)    1/2*2^(1/2)   1/3*3^(1/2)]
        [ 1/6*6^(1/2)   -1/2*2^(1/2)   1/3*3^(1/2)]
        [-1/3*6^(1/2)        0         1/3*3^(1/2)]
    > evalm(transpose(C)&*A&*C);
        [0  0 0]
        [0 -2 0]
        [0  0 3]

The first two instructions ask Maple to display the data structures associated with A and C: they are our A and C. The last instruction asks Maple to evaluate the matrix product C^{t}AC, and we get the predicted diagonal matrix of eigenvalues. That's all for now. The QotD was: suppose A is a 2 by 2 matrix and A^{t}=5A. What can you say about A, and why is what you declare true? Mr. Ivanov asked several times how "often" a matrix is diagonalizable. I tried to evade a general answer. For example, if the matrix is symmetric, then what I wrote above asserts it is diagonalizable. But the general question has been quite well-studied. An "average" (??) matrix should be diagonalizable, but you may have to allow for complex entries. One place to get the full story is Math 350, or look at any advanced linear algebra book.
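A quick floating-point sketch of the same verification Maple did (my addition; Python's math module stands in for Maple's exact square roots, so the check is up to roundoff):

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

s2, s3, s6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
A = [[0, 2, 1], [2, 0, 1], [1, 1, 1]]
C = [[ 1/s6,  1/s2, 1/s3],
     [ 1/s6, -1/s2, 1/s3],
     [-2/s6,   0.0, 1/s3]]
Ct = [list(col) for col in zip(*C)]   # the transpose, which should be C^{-1}

D = matmul(Ct, matmul(A, C))
expected = [[0, 0, 0], [0, -2, 0], [0, 0, 3]]
assert all(abs(D[i][j] - expected[i][j]) < 1e-9
           for i in range(3) for j in range(3))
```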
Thursday, March 25
Ms. Mudrak kindly wrote the answer to the last QotD. Since

    A=(1 a) and B=(b)
      (0 1)       (c)

then AX=B (where X is a 2 by 1 column vector with entries x_{1} and x_{2}) has the solution

           det(b a)
              (c 1)    b-ac
    x_{1}= -------- = ------ = b-ac
           det(1 a)      1
              (0 1)

Then the partial derivative of x_{1} with respect to a is -c, with respect to b is 1, and with respect to c is -a. At (0,0,0), these derivatives are 0, 1, and 0.
Eigenvalues, eigenvectors, etc.
Consider the diagonal matrix

    A=(5 0 0)
      (0 7 0)
      (0 0 2)

If we think about (left) multiplication by A as a function from R^{3} to R^{3}, so X gets sent to AX=Y, then this matrix stretches in the x direction by a factor of 5, in the y direction by a factor of 7, and in the z direction by a factor of 2. The "unit sphere", (x_{1})^{2}+(x_{2})^{2}+(x_{3})^{2}=1, is changed to an ellipsoid centered at 0 with axes of symmetry along the coordinate axes, with the various lengths of the ellipsoid determined by the 5, 7, and 2. We want to generalize this.
DEFINITION
A number λ is called an eigenvalue of A if there is a non-zero vector X so that AX=λX.
How to find eigenvalues
Back to

    A=(5 0 0)
      (0 7 0)
      (0 0 2)

The equation det(A-λI_{3})=0 asks for the determinant of

    (5-λ  0   0 )
    ( 0  7-λ  0 )
    ( 0   0  2-λ)

and the determinant of a diagonal matrix is easy, so the characteristic polynomial is (5-λ)(7-λ)(2-λ). Its only roots are 5 and 7 and 2, so these are the eigenvalues of A. What about the associated eigenvectors? Take λ=5, and let's try to solve (A-5I_{3})X=0 with X not 0. This is

    (0 0  0)(x_{1}) (0)
    (0 2  0)(x_{2})=(0)
    (0 0 -3)(x_{3}) (0)

Then -3x_{3}=0 so x_{3} must be 0, and 2x_{2}=0 so x_{2} must be 0, also. However, there is no restriction on x_{1}, so we can take any non-zero number for x_{1}. Therefore (1,0,0) is an eigenvector associated to the eigenvalue λ=5, and so is (-5,0,0) and so is (sqrt(2),0,0) and ... Generally, any non-zero multiple of an eigenvector will also be an eigenvector, associated to the same eigenvalue. It is similarly easy to find the eigenvectors associated to λ=7 (non-zero multiples of (0,1,0)) and the eigenvectors associated to λ=2 (non-zero multiples of (0,0,1)).
Disaster
A series of examples with n=2

#1 Here

    A=(1 283)
      (0   1)

so the characteristic polynomial is

    det(1-λ 283)
       ( 0  1-λ)

which is (1-λ)^{2}. The only eigenvalue is λ=1. What are the candidates for eigenvectors? We need to set λ=1, and "solve"

    0x_{1}+283x_{2}=0
    0x_{1}+  0x_{2}=0

Therefore the only non-zero solutions are the non-zero multiples of (1,0).

#2 Here

    A=(1 283)
      (0   7)

so the characteristic polynomial is

    det(1-λ 283)
       ( 0  7-λ)

which is (1-λ)(7-λ). The eigenvalues are 1 and 7. What are the candidates for eigenvectors? If λ=1, we solve

    0x_{1}+283x_{2}=0
    0x_{1}+  6x_{2}=0

The only non-zero solutions are the non-zero multiples of (1,0), so these are the eigenvectors associated to the eigenvalue λ=1. If λ=7, we need to solve

    -6x_{1}+283x_{2}=0
     0x_{1}+  0x_{2}=0

Therefore x_{1}=(283/6)x_{2} and the possibilities for the associated eigenvector are all non-zero multiples of (283/6,1).

#3 Here

    A=(0 -1)
      (1  0)

so the characteristic polynomial is

    det(-λ -1)
       ( 1 -λ)

which is λ^{2}+1. The eigenvalues are +/-i. What are the candidates for eigenvectors? If λ=i, we must solve

    -ix_{1}-1x_{2}=0
     1x_{1}-ix_{2}=0

We get one solution by taking x_{2}=1 in the second equation, so x_{1}=i. And if you substitute these values in the first equation then you'll get -i(i)-1(1) which is 0. So the associated eigenvectors are the non-zero multiples of (i,1). If λ=-i, then the associated eigenvectors are the non-zero multiples of (-i,1). What's going on here? Eigenvectors are supposed to be vectors transformed into multiples of themselves. Why, suddenly, do we get i's coming in? In fact this A is really rather nice. It takes the unit vector along the x-axis, (1,0), and changes it into (0,1). It changes (0,1) into (-1,0). This "action" on the basis vectors should tell you what A does: A rotates the plane counterclockwise by 90 degrees (or, in MathLand, Pi/2). Certainly A doesn't take any real vector into a multiple of itself, but it does do this for certain complex vectors and complex multiples.
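The complex eigenvectors in #3 can be checked directly with Python's built-in complex numbers (a sketch I added; 1j is Python's notation for i):

```python
# A rotates the plane by 90 degrees; its eigenvectors are complex.
A = [[0, -1], [1, 0]]

def apply(M, v):
    # multiply the 2 by 2 matrix M by the column vector v
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])

# lambda = i with eigenvector (i, 1): A v should equal i*v
v = (1j, 1)
assert apply(A, v) == (1j * v[0], 1j * v[1])

# lambda = -i with eigenvector (-i, 1)
w = (-1j, 1)
assert apply(A, w) == (-1j * w[0], -1j * w[1])
```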
#4 I said I'd write down a random matrix, and of course I did not. I analyzed

    A=(2 1)
      (3 4)

which is rather carefully chosen. The characteristic polynomial is

    det(2-λ  1 )
       ( 3  4-λ)

which is (2-λ)(4-λ)-3=λ^{2}-6λ+8-3=λ^{2}-6λ+5=(λ-5)(λ-1), so the eigenvalues are 5 and 1. When λ=1, we must solve

    1x_{1}+1x_{2}=0
    3x_{1}+3x_{2}=0

and one solution is (1,-1). So all associated eigenvectors are non-zero multiples of (1,-1). When λ=5, we must solve

    -3x_{1}+1x_{2}=0
     3x_{1}-1x_{2}=0

and one solution is (1,3). So all associated eigenvectors are non-zero multiples of (1,3). Next time I will continue with example #4 and show how to "diagonalize" that A, leading to much faster computation of certain quantities, and maybe better understanding of the action of A.
The disaster, revisited
Here

    A=(1 3  5    -7      2  )
      (0 2 PI  sqrt(2)   8  )
      (0 0  3   1/3    -2/7 )
      (0 0  0    4      17  )
      (0 0  0    0       5  )

It certainly is true that the characteristic polynomial is (1-λ)(2-λ)(3-λ)(4-λ)(5-λ), because the matrix A-λI_{5} is in upper-triangular form, and the determinant is the product of the diagonal elements. So I therefore know that the eigenvalues are 1, 2, 3, 4, and 5. My mistake was being too hasty in telling the class about the eigenvectors. They are not simple, and certainly not as simple as I first stated. One of them is: (1,0,0,0,0). This is an eigenvector associated to the eigenvalue 1. When λ=2, though, we must solve

    (-1 3  5    -7     2  )(x_{1}) (0)
    ( 0 0 PI  sqrt(2)  8  )(x_{2}) (0)
    ( 0 0  1   1/3   -2/7 )(x_{3})=(0)
    ( 0 0  0    2     17  )(x_{4}) (0)
    ( 0 0  0    0      3  )(x_{5}) (0)

The last equation tells me that x_{5} must be 0. Then I can go backwards: the fourth equation tells me that x_{4} must be 0, since I already know that x_{5} is 0. The third equation, because both x_{4} and x_{5} are 0, tells me that x_{3}=0. But now consider the first two equations, which I will write with the last three variables erased since they are 0:

    -1x_{1}+3x_{2}=0
     0x_{1}+0x_{2}=0

Since the coefficient of x_{2} in the second equation is 0, we can't continue as before and conclude that x_{2} is 0. In fact, only the first equation gives a restriction. So (3,1,0,0,0) is an eigenvector associated with the eigenvalue λ=2. It actually gets much worse. Here is a list of the eigenvalues and corresponding eigenvectors created by Maple (I used the command eigenvects in the package linalg):

    Eigenvalue  An eigenvector
        1       (1, 0, 0, 0, 0)
        2       (3, 1, 0, 0, 0)
        3       (3/2*Pi+5/2, Pi, 1, 0, 0)
        4       (1/2*Pi+3/2*2^(1/2)-16/3, 1/2*Pi+3/2*2^(1/2), 1, 3, 0)
        5       (113/168*Pi+17/4*2^(1/2)-4013/168, 113/126*Pi+17/3*2^(1/2)+8/3, 113/42, 17, 1)
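Two of these eigenvectors can be spot-checked against the definition AX=λX in a few lines (my addition; floats stand in for Maple's exact Pi and sqrt(2), so the second check is only up to roundoff):

```python
import math

PI, R2 = math.pi, math.sqrt(2)
A = [[1, 3, 5, -7, 2],
     [0, 2, PI, R2, 8],
     [0, 0, 3, 1/3, -2/7],
     [0, 0, 0, 4, 17],
     [0, 0, 0, 0, 5]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

# eigenvalue 2 with eigenvector (3, 1, 0, 0, 0)
v2 = [3, 1, 0, 0, 0]
assert all(abs(a - 2 * b) < 1e-9 for a, b in zip(matvec(A, v2), v2))

# eigenvalue 3 with Maple's eigenvector (3/2*Pi + 5/2, Pi, 1, 0, 0)
v3 = [1.5 * PI + 2.5, PI, 1, 0, 0]
assert all(abs(a - 3 * b) < 1e-9 for a, b in zip(matvec(A, v3), v3))
```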
The QotD was: find the eigenvalues of

    (0 2 1)
    (2 0 1)
    (1 1 1)

Please read the textbook and hand in 7.8: 5, 8.1: 1 a,b, 7 a,b, 19 a,b
Tuesday, March 23
The last QotD asked: Does the following collection of these 5 vectors in R^{5} form a basis: (5, 2, 1, 0, 0) and (3, 2, 2, 0, -1) and (0, 1, 3, 2, 1) and (2, -2, 2, -2, 2) and (0, 1, 1, 1, 0)? Most students answered this by "assembling" the matrix:

    (5  2 1  0  0)
    (3  2 2  0 -1)
    (0  1 3  2  1)
    (2 -2 2 -2  2)
    (0  1 1  1  0)

and studying its determinant. The actual value of the determinant is -16. Students either got the actual value or did enough work to conclude that the determinant is non-zero. But since I can ask Maple to evaluate the determinant (as I just did!), what matters more to me is establishing the intellectual link: the determinant is non-zero, and therefore the given set of 5 vectors in R^{5} is a basis.
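Here is a sketch of that determinant computation in exact rational arithmetic (plain Python; the elimination routine is my illustration, not the students' work):

```python
from fractions import Fraction as F

def det(M):
    # Gaussian elimination; swaps flip the sign, row additions change nothing
    M = [[F(x) for x in row] for row in M]
    n, sign = len(M), 1
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return F(0)          # a zero column: singular matrix
        if p != c:
            M[c], M[p] = M[p], M[c]
            sign = -sign
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    prod = F(sign)
    for i in range(n):
        prod *= M[i][i]
    return prod

vectors = [[5, 2, 1, 0, 0], [3, 2, 2, 0, -1], [0, 1, 3, 2, 1],
           [2, -2, 2, -2, 2], [0, 1, 1, 1, 0]]
d = det(vectors)
assert d == -16    # non-zero, so the five vectors form a basis of R^5
```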
An explicit formula for A^{-1}
The formula for 3 by 3
Suppose

    A=(a b c)
      (d e f)
      (g h i)

and we evaluate the determinant of A by "expanding" along the first row of A:

              (a b c)
    det(A)=det(d e f)=a·det(e f)-b·det(d f)+c·det(d e)
              (g h i)      (h i)      (g i)      (g h)

I want to resist simplifying anything. Look at the right-hand side of the equation. It seems like a dot product of two 3 dimensional vectors: (a,b,c) and the vector

    (det(e f), -det(d f), det(d e))
        (h i)      (g i)      (g h)

What would the dot product of (d,e,f) and this vector be? Well, it would be

    d·det(e f)-e·det(d f)+f·det(d e)
         (h i)      (g i)      (g h)

This is, if you go backwards (unsimplify!), the determinant of

    (d e f)
    (d e f)
    (g h i)

Since this determinant has two rows the same, its value must be 0. Similarly, if you took the dot product with (g,h,i) you would get 0. This is all quite weird. Similar things occur if you try expanding along the other rows. In fact, "assemble" the yucky matrix

    (+det(e f)  -det(b c)  +det(b c))
    (    (h i)      (h i)      (e f))
    (                               )
    (-det(d f)  +det(a c)  -det(a c))
    (    (g i)      (g i)      (d f))
    (                               )
    (+det(d e)  -det(a b)  +det(a b))
    (    (g h)      (g h)      (d e))

Then the product of that matrix with

    (a b c)
    (d e f)
    (g h i)

is exactly

    (det(A)    0      0   )
    (  0     det(A)   0   )
    (  0       0    det(A))

I remarked above that this was complicated. So if we divided the weird matrix of determinants by det(A) (when det(A) is not 0) then we would get something whose product with the original matrix is I_{3}. This explanation was given so that you would hold still for the general formula. You don't need to memorize this derivation, just learn a bit about the general formula!
A formula for A^{-1}
A 2 by 2 inverse
Suppose

    A=(2 -6)
      (5  7)

Then the matrix of M_{ij}'s would be

    ( 7 5)
    (-6 2)

We need to flip this and then adjust the signs:

    (7 -6) transpose;  ( 7 6)
    (5  2) +'s & -'s   (-5 2)

and now the determinant of the original matrix is 2(7)-(-6)5=44, and we must divide by this:

    ( 7/44 6/44)
    (-5/44 2/44)

This is our candidate for A^{-1}. Is it correct? We can check the product:

    (2 -6)( 7/44 6/44)  this  ((2·7+6·5)/44 (2·6-6·2)/44)
    (5  7)(-5/44 2/44)   is   ((5·7-7·5)/44 (5·6+7·2)/44)

and all the fractions work out, so the result is I_{2}, the 2 by 2 identity matrix. This is not a miracle, but is just as we predicted and should expect.
A 3 by 3 inverse
Suppose

    A=( 3 0 -2)
      ( 4 2  2)
      (-1 0  7)

Then det(A) (expand along the second column) is 2 times the determinant of

    ( 3 -2)
    (-1  7)

which is 21-2=19. So the determinant of A is 38. (As I mentioned in class, I use Maple to check my computations!) Then the matrix we need to work on is:

    (+det(e f)  -det(b c)  +det(b c))   (+det(2 2)  -det(0 -2)  +det(0 -2))
    (    (h i)      (h i)      (e f))   (    (0 7)      (0  7)      (2  2))
    (                               )   (                                 )
    (-det(d f)  +det(a c)  -det(a c)) = (-det(4 2)  +det(3 -2)  -det(3 -2))
    (    (g i)      (g i)      (d f))   (   (-1 7)     (-1  7)      (4  2))
    (                               )   (                                 )
    (+det(d e)  -det(a b)  +det(a b))   (+det(4 2)  -det(3 0)   +det(3 0) )
    (    (g h)      (g h)      (d e))   (   (-1 0)     (-1 0)       (4 2) )

I copied this from the yucky matrix formula above and plugged in the appropriate values of a, b, c, d, e, f, g, h, and i. This matrix is

    ( 14  0   4)
    (-30 19 -14)
    (  2  0   6)

and now divide each entry by 38. The result (which is the predicted A^{-1}) is

    ( 14/38   0     4/38)
    (-30/38 19/38 -14/38)
    (  2/38   0     6/38)

I did some checking of this in class by multiplying some rows by some columns and everything worked out, consistent with the assertion that we have created the inverse of A. I also just checked this result with Maple, and the answer given there was the same.
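The 3 by 3 recipe can be sketched in Python with exact fractions (my illustration; note that minor(M, j, i) deletes row j and column i, which performs the transposing "flip" built into the yucky matrix):

```python
from fractions import Fraction as F

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def minor(M, i, j):
    # the 2 by 2 matrix left after deleting row i and column j
    return [[M[r][c] for c in range(3) if c != j] for r in range(3) if r != i]

def inverse3(M):
    # determinant by first-row expansion, then the signed-minor formula
    d = sum((-1) ** j * M[0][j] * det2(minor(M, 0, j)) for j in range(3))
    return [[(-1) ** (i + j) * det2(minor(M, j, i)) / d for j in range(3)]
            for i in range(3)]

A = [[F(3), F(0), F(-2)], [F(4), F(2), F(2)], [F(-1), F(0), F(7)]]
Ainv = inverse3(A)
assert Ainv == [[F(14, 38), F(0),      F(4, 38)],
                [F(-30, 38), F(19, 38), F(-14, 38)],
                [F(2, 38),  F(0),      F(6, 38)]]
```

Fraction comparisons ignore reduction, so F(14, 38) matches the reduced value 7/19 automatically.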
n=4
Suppose

    A=(x 2 3 4)
      (1 2 3 4)
      (1 x 3 4)
      (1 2 x 4)

Suppose that I very much need to know the (1,2)^{th} entry in A^{-1}. How can I find it? Probably the simplest way is to use the formula we have for A^{-1}. So we will need to know det(A).

Clever math follows
Since A has entries with x, det(A) is some sort of function of x. I suggested cos(x), and, met with derision ("ridicule, mockery" according to the dictionary), I changed my answer to 3cos(x+7). I was corrected: the answer, as a function of x, would be a polynomial in x. I asked why. I was told that the determinant is evaluated by sums of products of entries, and since some of the entries are "x", the result must be a polynomial in x. What degree could this polynomial be? Since there are three x's in A, the highest degree the polynomial could be is 3. So what polynomial of degree 3 is this? What do we know about the polynomial? For example, are there values of x for which the determinant will be 0? Well, if x=1, the first two rows are identical. Then the determinant must be 0 (interchange the rows, see that the matrix doesn't change, but the det changes sign, and the only way -det(A) could equal det(A) is for det(A) to be 0). Also, if x=2, the same logic applies to rows 2 and 3, and if x=3, apply the logic to rows 2 and 4. Therefore the determinant is a polynomial of degree<=3 which has roots at 1 and 2 and 3, so that the polynomial must be (CONSTANT)(x-1)(x-2)(x-3). What's the CONSTANT? The only way to get an x^{3} in the determinant expansion is to take a product with the three x's. If you remember how rook arrangements work, you can see that the only product with three x's is x·x·x·4, and the sign is + (there are 2 reversals -- you can count them). Therefore the CONSTANT is 4, and the determinant must be 4(x-1)(x-2)(x-3). But I'd like the (1,2)^{th} entry in A^{-1}. According to the formula we developed above, this means I need to evaluate the determinant of the (2,1)^{th} minor (flip the coordinates!)
and then multiply by (-1)^{1+2}=-1. Let's see: for the (2,1)^{th} minor we just need to delete the second row and first column:

    (x 2 3 4)        (2 3 4)
    (1 2 3 4)  ===>  (x 3 4)
    (1 x 3 4)        (2 x 4)
    (1 2 x 4)

and then compute the determinant of the resulting 3 by 3 matrix. Again, because of the x's, this is a polynomial of degree at most 2. And when x=2 or x=3, the determinant is 0 because two of the rows are the same (we could also look at columns, but I've been doing row thinking since the text is row-oriented). Therefore the determinant of the 3 by 3 matrix is CONSTANT(x-2)(x-3). Again, the term with two x's multiplied also has a 4, so the CONSTANT is 4. Therefore the determinant is 4(x-2)(x-3). Now let's not forget (-1)^{1+2}=-1. The (1,2)^{th} entry in A^{-1} must be -[4(x-2)(x-3)]/[4(x-1)(x-2)(x-3)]= -1/(x-1). And, hey, I did check this with Maple, which can compute inverses of symbolic matrices (if they aren't too large!), and Maple's answer agreed with the answer we just computed. Mr. Ivanov asked how Maple computed these things, since, well, maybe say x=1, so this would be dividing by 0 or something. I wrote him a long explanatory e-mail with the information I have on the subject.
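Both formulas can be spot-checked in exact arithmetic with a small recursive determinant (my addition, not a symbolic computation like Maple's -- it just tests several sample values of x):

```python
from fractions import Fraction as F

def det(M):
    # cofactor expansion along the first row (fine for these small sizes)
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] *
               det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(n))

def A_of(x):
    x = F(x)
    return [[x, 2, 3, 4], [1, 2, 3, 4], [1, x, 3, 4], [1, 2, x, 4]]

# det(A) = 4(x-1)(x-2)(x-3) at several sample values
for x in (0, 5, 7, F(1, 2)):
    assert det(A_of(x)) == 4 * (x - 1) * (x - 2) * (x - 3)

# the (1,2) entry of A^{-1} is -det(M_21)/det(A); it should equal -1/(x-1)
x = F(5)
A = A_of(x)
M21 = [A[0][1:], A[2][1:], A[3][1:]]   # delete row 2 and column 1
assert -det(M21) / det(A) == F(-1) / (x - 1)
```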
Cramer's Rule
n=2
Consider the system

    3x_{1}-7x_{2}=4
    2x_{1}+5x_{2}=6

Here

    A=(3 -7) X=(x_{1}) B=(4) Q_{1}=(4 -7) Q_{2}=(3 4)
      (2  5)   (x_{2})   (6)       (6  5)       (2 6)

Therefore x_{1} should be det(Q_{1})/det(A) which is 62/29 (they're only 2 by 2 determinants!) and x_{2} should be det(Q_{2})/det(A) which is 10/29. Let's check by direct substitution in the original equations. The left-hand side of the equation 3x_{1}-7x_{2}=4 becomes 3[62/29]-7[10/29]=(186-70)/29=116/29 which actually is 4! The left-hand side of the equation 2x_{1}+5x_{2}=6 becomes 2[62/29]+5[10/29]=(124+50)/29=174/29 which actually is 6! Wow. Maybe, wow. Then I tried another example, from the text.
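Cramer's Rule for this 2 by 2 system takes only a few lines of exact arithmetic (my illustration):

```python
from fractions import Fraction as F

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

A = [[F(3), F(-7)], [F(2), F(5)]]
B = [F(4), F(6)]

Q1 = [[B[0], A[0][1]], [B[1], A[1][1]]]   # B replaces column 1
Q2 = [[A[0][0], B[0]], [A[1][0], B[1]]]   # B replaces column 2
x1 = det2(Q1) / det2(A)
x2 = det2(Q2) / det2(A)
assert (x1, x2) == (F(62, 29), F(10, 29))

# substitute back into the original equations
assert 3 * x1 - 7 * x2 == 4 and 2 * x1 + 5 * x2 == 6
```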
n=3
Consider

     5x_{1}-6x_{2}+1x_{3}=4
    -1x_{1}+3x_{2}-4x_{3}=5
     2x_{1}+3x_{2}+1x_{3}=-8

I did first ask if the solution could be x_{1}=sqrt(2) and x_{2}=PI and x_{3}=e. There was a short period of quiet while people assimilated this ridiculous assertion. Finally, several people remarked that sqrt(2) was irrational (so are PI and e, by the way) but Cramer's Rule asserts that x_{1} should be a quotient of integers, so "my" value of x_{1} had to be incorrect. I think what I then did was write the formula for one of the x_{j}'s, maybe x_{2}:

              ( 5  4  1)
           det(-1  5 -4)
              ( 2 -8  1)
    x_{2}= ---------------
              ( 5 -6  1)
           det(-1  3 -4)
              ( 2  3  1)

where in order to get the second variable I substituted the "B" column for the second column of A (this creates Q_{2}). That's all I did with this example, since I was exhausted and didn't want to compute another determinant. The QotD was the following: suppose we have AX=B, a 2 by 2 linear system where

    A=(1 a) and B=(b)
      (0 1)       (c)

and the two components of the column vector X are x_{1} and x_{2}. Since det(A)=1 the system has a unique solution for all values of a and b and c. Let x_{1}=x_{1}(a,b,c) be that unique solution: it is a function of a and b and c. What are the partial derivatives of x_{1} with respect to a and b and c when a=0 and b=0 and c=0? I think this is a fairly sophisticated question. I haven't looked at the results yet, but I hope a few people got it right.
Cultural comments
Thursday, March 11
Mr. Meiswinkle only had to be urged a
little bit to
present a solution to yesterday's QotD.
    (4 1 0 1)    (1 1 0 0)    (1 1  0 0)    (1 1   0   0 )
    (0 2 1 1)  ~ (0 2 1 1)  ~ (0 2  1 1)  ~ (0 1  1/2 1/2)
    (1 1 0 0)    (4 1 0 1)    (0 -3 0 1)    (0 -3  0   1 )
    (0 0 2 2)    (0 0 2 2)    (0 0  2 2)    (0 0   2   2 )
              REMEMBER -1                REMEMBER 2

  ~ (1 0 -1/2 -1/2)    (1 0 -1/2 -1/2)    (1 0 0  1/3)    (1 0 0  1/3)    (1 0 0 0)
    (0 1  1/2  1/2)  ~ (0 1  1/2  1/2)  ~ (0 1 0 -1/3)  ~ (0 1 0 -1/3)  ~ (0 1 0 0)
    (0 0  3/2  5/2)    (0 0   1   5/3)    (0 0 1  5/3)    (0 0 1  5/3)    (0 0 1 0)
    (0 0   2    2 )    (0 0   2    2 )    (0 0 0 -4/3)    (0 0 0   1 )    (0 0 0 1)
                    REMEMBER 3/2                       REMEMBER -4/3

The product of the things to REMEMBER is (-1)(2)(3/2)(-4/3)=4, so that's the determinant of the matrix. I would very (very!) rarely evaluate a determinant with a computational scheme exactly like the one above. There are numerous other ways to compute determinants, and this lecture is intended to show you some of them. After this lecture, you may compute the determinant of a matrix using any valid method you choose. Of course, I hope you use the method correctly! Onwards:
The official definition
Rook arrangements
Consider a matrix whose only (possibly) non-zero entries form a "rook arrangement": one entry in each row and each column. For example, if we compute the determinant of

    (0 0 a 0 0)
    (0 0 0 0 b)
    (0 0 0 c 0)
    (d 0 0 0 0)
    (0 e 0 0 0)

the result will be -abcde. It turns out that the determinant of a matrix with just a "rook arrangement" of positions occupied will always be (-1)^{# of reversals}(the product of the positions). If the # of reversals is even, then the (-1)^{even} is +1, and if it is odd, then the sign will be -. Now here is the official definition of determinant:

The determinant of an n by n matrix A=(a_{ij}) is the SUM over all n! permutations p of:

    (-1)^{# of reversals of p}a_{1p(1)}a_{2p(2)}a_{3p(3)}...a_{np(n)}

Just so maybe you understand how unwieldy this is, for a 10 by 10 matrix, the SUM has 10!=3,628,800 terms, and each of the terms has a sign (+ or -) and is obtained by taking the product of 10 entries in the matrix. This is almost ludicrous computation. But the determinants of much bigger matrices can be and are computed easily (in time proportional to n^{3}, not to the superexponential growth of n!). By the way, the "rules" for computing the determinants of 2 by 2 and 3 by 3 matrices which were so marvelously illustrated last time are clever methods to put the correct signs in front of each product. I don't know any similar shortcut for the 24 products involved in the definition of a 4 by 4 determinant.
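The official definition can be implemented verbatim for small matrices (a sketch I added; it is hopeless for large n, but fine for n=4, and it agrees with the elimination answer from the QotD):

```python
from itertools import permutations

def det_by_definition(M):
    # SUM over all n! permutations, signed by (-1)^(# of reversals)
    n = len(M)
    total = 0
    for p in permutations(range(n)):
        reversals = sum(1 for i in range(n) for j in range(i + 1, n)
                        if p[i] > p[j])
        prod = 1
        for i in range(n):
            prod *= M[i][p[i]]
        total += (-1) ** reversals * prod
    return total

A = [[4, 1, 0, 1], [0, 2, 1, 1], [1, 1, 0, 0], [0, 0, 2, 2]]
assert det_by_definition(A) == 4
```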
A very very very special case
Example
Start with the same matrix:

    (4 1 0 1) (4  1   0    1 ) (4 1   0    1 ) (4 1   0    1 )
    (0 2 1 1)~(0  2   1    1 )~(0 2   1    1 )~(0 2   1    1 )
    (1 1 0 0) (0 3/4  0  -1/4) (0 0 -3/8 -5/8) (0 0 -3/8 -5/8)
    (0 0 2 2) (0  0   2    2 ) (0 0   2    2 ) (0 0   0  -4/3)

Now multiply the diagonal elements: (4)(2)(-3/8)(-4/3)=4. All I did was row operations to "clear" the lower-triangular elements to 0. This is easy. The needed work for this is proportional to n^{3}, and is therefore much less than using the definition (which would involve n! amount of work).
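A sketch of exactly this method (my addition): row additions only, then the product of the diagonal. It assumes, as in this example, that no row swaps are needed.

```python
from fractions import Fraction as F

A = [[F(4), F(1), F(0), F(1)],
     [F(0), F(2), F(1), F(1)],
     [F(1), F(1), F(0), F(0)],
     [F(0), F(0), F(2), F(2)]]

# Row additions never change the determinant; this loop assumes the
# pivots stay non-zero, which they do for this particular matrix.
for c in range(4):
    for r in range(c + 1, 4):
        f = A[r][c] / A[c][c]
        A[r] = [a - f * b for a, b in zip(A[r], A[c])]

assert [A[i][i] for i in range(4)] == [4, 2, F(-3, 8), F(-4, 3)]
d = F(1)
for i in range(4):
    d *= A[i][i]
assert d == 4
```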
Minors, cofactors, row and column expansions, etc.
Evaluating a determinant by "expanding" along a row
Evaluating a determinant by "expanding" along a column
The important thing to keep track of is the pattern of signs. The signs start at +1 in the upper left corner, and alternate at each vertical or horizontal step:

    (+ - + - + - ...)
    (- + - + - + ...)
    (+ - + - + - ...)
    (- + - + - + ...)
    (+ - + - + - ...)
    (...............)
Examples
Start with

    (4 1 0 1)
    (0 2 1 1)
    (1 1 0 0)
    (0 0 2 2)

I asked students what row they would like to use, and was told: "the third".

Third row expansion

       (4 1 0 1)
    det(0 2 1 1)=+1det(M_{31})-1det(M_{32})+0det(M_{33})-0det(M_{34})
       (1 1 0 0)
       (0 0 2 2)

Now consider:

                   (1 0 1)                     (4 0 1)
    det(M_{31})=det(2 1 1)=4;  det(M_{32})=det(0 1 1)=0.
                   (0 2 2)                     (0 2 2)

I used the special rule for 3 by 3 matrices, but one can continue to expand along rows or columns of the smaller matrices, etc. The final result is 1(4)-1(0)=4, as it should be. I think we then did the third column expansion.

Third column expansion

       (4 1 0 1)
    det(0 2 1 1)=+0det(M_{13})-1det(M_{23})+0det(M_{33})-2det(M_{43})
       (1 1 0 0)
       (0 0 2 2)

And:

                   (4 1 1)                     (4 1 1)
    det(M_{23})=det(1 1 0)=6;  det(M_{43})=det(0 2 1)=-5.
                   (0 0 2)                     (1 1 0)

so that det(A)=-1(6)-2(-5)=4. I evaluated the first det by expanding along the third row, and the second det by using the special rule for 3 by 3 matrices. I have found that expanding along a row or column is sometimes useful when dealing with sparse matrices, those with relatively few non-zero entries. But, generally, I convert to upper-triangular form and take the product of the diagonal elements.
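Row expansion can be sketched recursively (my illustration); as the theory promises, every row gives the same value:

```python
def minor(M, i, j):
    # delete row i and column j
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]

def det(M, row=0):
    # expand along the chosen row; recurse on the (n-1) by (n-1) minors
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** (row + j) * M[row][j] * det(minor(M, row, j))
               for j in range(n))

A = [[4, 1, 0, 1], [0, 2, 1, 1], [1, 1, 0, 0], [0, 0, 2, 2]]
assert [det(A, r) for r in range(4)] == [4, 4, 4, 4]
```

Rows with many zeros (like the third row here, with only two non-zero entries) kill most of the recursive calls, which is exactly why expansion pays off for sparse matrices.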
A recursive definition
Historical note

'Twas brillig, and the slithy toves
Did gyre and gimble in the wabe;
All mimsy were the borogoves,
And the mome raths outgrabe.

This was written using the pseudonym Lewis Carroll.
Transpose

The transpose of
(2  0  4)
(4 -1 16)
is
(2  4)
(0 -1)
(4 16)
The transpose of a p by q matrix is a q by p matrix, so the transpose of a square matrix is a square matrix of the same size. The following is true, and illustrates the fact that row algorithms and column algorithms will produce the same value for determinant:

IMPORTANT: det(A)=det(A^{t})

This really occurs because in a rook arrangement, the total number of rooks below and to the left is always equal to the total number above and to the right, so the signs will match up, and the # of reversals in A will equal the # of reversals in A^{t} (any rook which is below and to the left of another rook has that rook above and to the right!).
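The rook-arrangement argument can be checked numerically with the n!-term definition itself. Here is a small sketch of mine (not course code) that sums over all "rook arrangements" (permutations), signed by the number of reversals, and confirms det(A)=det(A^{t}) on the 4 by 4 example from this lecture:

```python
from itertools import permutations

def det_by_rooks(A):
    """The n!-term definition: sum over all rook arrangements
    (permutations), each term signed by its number of reversals."""
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        # a reversal: a rook further right that sits in an earlier column order
        reversals = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        term = (-1) ** reversals
        for i in range(n):
            term *= A[i][p[i]]
        total += term
    return total

def transpose(A):
    return [list(col) for col in zip(*A)]

M = [[4, 1, 0, 1], [0, 2, 1, 1], [1, 1, 0, 0], [0, 0, 2, 2]]
print(det_by_rooks(M), det_by_rooks(transpose(M)))   # 4 4
```

Of course this brute-force sum is only practical for small n; its whole point here is to exhibit the reversal-counting argument.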
Special names
The QotD was: Please continue reading the text. The homework due at the next meeting is 7.4: 1 and 7.5: 5, 13 and 7.6: 1. Have a good vacation. | ||||||||
Tuesday, March 9 |
Mr. Inserra kindly agreed to write a
solution to the last QotD. Actually, Mr. Inserra was probably
bullied by the instructor of the course, who has the
sensitivity of granite. Mr. Inserra wrote something like
(3 3  3 | 1 0 0)   (1  1  1 | 1/3 0 0)   (1 0 -1/2 | -1/6  1/2 0)
(3 1  0 | 0 1 0) ~ (0 -2 -3 | -1  1 0) ~ (0 1  3/2 |  1/2 -1/2 0)
(3 0 -1 | 0 0 1)   (0 -3 -4 | -1  0 1)   (0 0  1/2 |  1/2 -3/2 1)
and finally
(1 0 -1/2 | -1/6  1/2 0)   (1 0 0 | 1/3 -1  1)
(0 1  3/2 |  1/2 -1/2 0) ~ (0 1 0 |  -1  4 -3)
(0 0  1/2 |  1/2 -3/2 1)   (0 0 1 |   1 -3  2)
So
(1/3 -1  1)
( -1  4 -3)
(  1 -3  2)
is the inverse of
(3 3  3)
(3 1  0)
(3 0 -1)
and this can be easily checked by computing the product of the matrices. The instructor then remarked that the first 6 solutions he read for that QotD were all distinct. Therefore at least 5 of them were incorrect. Ummm ... engineering students should be able to do rational arithmetic. Let's look at n by n matrices, the nicest matrices. What do we know about these matrices? Square matrices represent systems of linear equations. If A is such a matrix, then its rank is an integer between 0 and n. What if A is a "full rank" matrix? Then the RREF form of A is I_{n}, the n by n identity matrix. What happens?
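The Gauss-Jordan procedure above (row-reduce [A | I] until the left half is I; the right half is then A^{-1}) can be done mechanically. A minimal Python sketch of mine, using exact Fraction arithmetic and assuming the matrix is invertible:

```python
from fractions import Fraction

def invert(matrix):
    """Gauss-Jordan: row-reduce the augmented matrix [A | I].
    When the left half becomes I, the right half is A^{-1}.
    Assumes A is invertible (a pivot is always found)."""
    n = len(matrix)
    A = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [x / A[col][col] for x in A[col]]   # make the pivot 1
        for r in range(n):                           # clear the rest of the column
            if r != col and A[r][col] != 0:
                A[r] = [a - A[r][col] * b for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]

B = [[3, 3, 3], [3, 1, 0], [3, 0, -1]]
inv = invert(B)
print([[str(x) for x in row] for row in inv])
# rows: ['1/3', '-1', '1'], ['-1', '4', '-3'], ['1', '-3', '2']
```

Rational arithmetic means no rounding: the output matches Mr. Inserra's answer entry for entry.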
Matrices which are n by n with rank=n
So square matrices having largest rank are very good. We will discuss the determinant. If A is an n by n matrix, then det(A) will be a real number. This number is 0 exactly when the rank of A is less than n. So the value of det(A) could serve as a diagnostic for when the rank=n, if we know how to calculate det(A). When A is 2 by 2 or when it is 3 by 3 there are rather simple recipes for det(A). I want to understand what det(A) means and how to compute it for n larger than 3. Why does one need det(A) for n>3, anyway? Here's an idea I got from Professor Komarova, who is teaching section 1 of Math 421. I hope that M&AE students will appreciate this example. Think of a robot arm (as excellently pictured). What information is needed to understand the "state" of the end of the arm? The word "state" is used here in terms of control theory. We need to know the position (x and y and z coordinates) and, also, if the arm is moving (the arrow in the picture), we would probably also like to record the velocity of the end of the arm, and that's a vector with three components: already we are up to R^{6} for the "state space" of the end of the robot arm! And there may be more complications, such as a joint or two on the arm, etc. While it may be obvious how to record the state of the arm, there may be more advantageous points of view, such as using something on the robot as the origin of the system of coordinates (recording data with respect to the robot itself). Then the problem of changing coordinates occurs, from one system (the "absolute" xyz system) to the local system of the robot. Typically n by n invertible matrices and their inverses will be used. And the n might be larger than we might guess. I am going to take a phenomenological approach to determinants. That is, according to the Oxford English Dictionary, I will deal "with the description or classification of phenomena, not with their explanation or cause."
So I will try to describe properties of determinants, and only vaguely hint at the connections between the properties -- how they logically depend on each other. Determinants are quite complicated, and a detailed explanation of what I will show you would probably take weeks! So I will start with an n by n matrix A, which does the following: if X is a vector in R^{n}, then AX is another vector in R^{n}. So left multiplication by A is a function from R^{n} to R^{n}, taking n-dimensional vectors to n-dimensional vectors (the vectors are column vectors here, of course).
Determinants and geometry

Example 1 Suppose that A is
(2 0)
(0 3)
Then the n-dimensional unit cube (here n=2) is just the two-dimensional square whose corners are (0,0) and (1,0) and (1,1) and (0,1). That part is easy. What happens when we look at A(the unit square)? In this case we get a nice rectangle with corners (0,0) and (2,0) and (2,3) and (0,3). In class I made quite a production of this, and I constantly reminded people that we were dealing with matrix multiplication by A so everything was linear. Therefore A(the unit cube) in two dimensions will always be something "linear", indeed, a parallelogram with one vertex (0,0) etc. It can't be a circle! (It could be a line segment or a point, though: "degenerate parallelograms". You should be able to give A's which transform the unit square to a line segment or a point.) The area of a 2 by 3 rectangle is 6, so det(A)=6 for this matrix.

Example 2 Suppose that A is
(0 1)
(1 0)
Then the unit square becomes ... the unit square? Well, not exactly. The situation is too darn symmetric to see what happens. Suppose I draw a block F in the square and I very carefully try to compare the domain and range versions of the F. Note that this A takes the i unit vector along the x-axis and changes it to j. And then it takes j and changes it to i. The "positive" (counterclockwise) angle from i to j becomes a "negative" (clockwise) angle from Ai to Aj. While I can look down at the plane and read the F on the left, there is no way (!) I can "read" the F on the right! This A reverses orientation, and its determinant will therefore be negative. Since the geometric area is not changed, the determinant is -1.

Example 3 Suppose that A is
(1 LARGE #)
(0    1   )
where LARGE # is indeed some really large positive number. This is an example I would be reluctant to show in a beginning linear algebra course, since it is confusing.
This mapping distorts distances a great deal (the vertical sides of the square in the domain are 1 unit long, and the corresponding edges in the range are >LARGE #). However, what is amazing is that the area distortion factor is 1. The unit square gets changed to a parallelogram of area 1 (base and height are both 1, after all). This somewhat paradoxical A is an example of a 2-dimensional shear. Notice that the orientation is preserved: although the F is distorted, I can still read it without "flipping" it. This A has determinant 1. Notice that since these mappings are LINEAR all the areas are distorted by the same factor. That is, if A is a matrix which changes the square's area by multiplying it by 23, then the area of any other region will also be multiplied by 23. In fact, if you think carefully about the area of the unit square first when it is transformed by A and then by another matrix, B, then you can see that the compound change in area is det(B)det(A). But we are just multiplying the column vector X first by A and then by B: B(AX). Matrix multiplication is associative, and B(AX)=(BA)X. If you think even more, you can see that det(BA)=det(B)det(A). Amazing! This is useful if we can write a weird matrix as a product of simpler ones, and then compute the determinants of the simpler matrices. There are some real-world algorithms which use this approach. The determinant is positive if the mapping preserves orientation, and it is negative if the mapping reverses orientation. My geometric "intuition" is not particularly good in dimension 23 (it barely works in 1 or 2 or 3!) so I can't really tell you what orientation looks like there. In dimension 3, the determinant of A turns out to be equal to: [(row 1)x(row 2)]·(row 3) if you think about the rows of A as three-dimensional vectors. Here x and · are cross and dot products in R^{3}. Sometimes with specific examples of triples of vectors you can "see" the reversal of orientation, but it can be complicated.
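The three 2 by 2 examples and the product rule det(BA)=det(B)det(A) can all be checked with the familiar ad-bc formula. A short sketch of mine (the choice LARGE # = 10^6 is just for illustration):

```python
def det2(M):
    """ad - bc for a 2 by 2 matrix [[a, b], [c, d]]."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def matmul2(B, A):
    """The 2 by 2 product BA."""
    return [[sum(B[i][k] * A[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[2, 0], [0, 3]]           # Example 1: area distortion factor 6
S = [[0, 1], [1, 0]]           # Example 2: the flip, orientation reversed
shear = [[1, 10**6], [0, 1]]   # Example 3, taking LARGE # = 10^6

print(det2(A), det2(S), det2(shear))              # 6 -1 1
print(det2(matmul2(S, A)) == det2(S) * det2(A))   # True: det(BA)=det(B)det(A)
```

The shear really does have determinant 1 no matter how large the upper-right entry is, exactly as the base-times-height argument predicts.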
Now I asked people to try to see the geometry of det(A) in 3 dimensions. The unit cube is changed into a parallelepiped (that's what it is called!) with one vertex at 0. The oriented volume of that object is det(A). There are now a wider variety of examples with det(A)=0. The first example I was given was the all 0's matrix, which I said was not particularly creative. How about
(1 1 1)
(1 1 1)
(1 1 1)
I was asked? Well, this is a degenerate (!) three-dimensional object. Indeed, if you think about it, what you get is the collection of vectors (t,t,t) where t is in [0,1]. This object is 1-dimensional, and its three-dimensional volume is 0, so this determinant must be 0. Maybe slightly more complicated is
(1 1 1)
(1 0 1)
(1 1 1)
where the image in R^{3} is a tilted parallelogram, a two-dimensional object: one edge is (1,1,1) and the other is (1,0,1). This object also has three-dimensional volume equal to 0, so the determinant of this matrix is 0. The two matrices have rank 1 and 2, respectively. These numbers are the same as the dimension count of the image of the unit cube.
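The triple-product formula [(row 1)x(row 2)]·(row 3) mentioned above makes these degenerate examples easy to verify. A small sketch of mine:

```python
def cross(u, v):
    """Cross product of two 3-dimensional vectors."""
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def det3(M):
    """[(row 1) x (row 2)] . (row 3): the oriented volume of the
    parallelepiped spanned by the three rows."""
    return dot(cross(M[0], M[1]), M[2])

print(det3([[1, 1, 1], [1, 1, 1], [1, 1, 1]]))   # 0: the image is a segment
print(det3([[1, 1, 1], [1, 0, 1], [1, 1, 1]]))   # 0: the image is a parallelogram
```

Both degenerate parallelepipeds have zero three-dimensional volume, so both determinants are 0, matching the rank count.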
Determinants and row operations

What does each row operation do to the determinant? The first row operation multiplies a row by a non-zero constant c. Since everything here is linear, this stretches one edge of the image parallelogram by a factor of c, so the determinant is multiplied by c.
What about the second row operation? In two dimensions, we could start with
(a_{11} a_{12})
(a_{21} a_{22})
The lower two triangles in the picture to the right represent the image parallelogram. If we add the second row to the first row, the result is
(a_{11}+a_{21} a_{12}+a_{22})
(   a_{21}        a_{22}   )
Careful observation will show you that the diagonal of the first parallelogram now becomes a side of the "new" parallelogram. The top two triangles are the new one, and basic geometry should convince you that the areas of the two parallelograms are the same. So adding one row to another doesn't change the determinant, and even adding a multiple of one row to another doesn't change the determinant. What happens when you interchange rows? Here the mystery of orientation intervenes, just as in example 2 above. The sign of the determinant flips, from + to - or from - to +. There's a sign change.
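All three effects (adding one row to another leaves the determinant alone; interchanging rows flips the sign) can be seen at a glance with the 2 by 2 formula. The matrix here is an arbitrary example of mine:

```python
def det2(M):
    """ad - bc for a 2 by 2 matrix [[a, b], [c, d]]."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

A = [[3, 5], [2, 7]]                    # an arbitrary 2 by 2 matrix
added   = [[3 + 2, 5 + 7], [2, 7]]      # row 1 := row 1 + row 2
swapped = [[2, 7], [3, 5]]              # interchange the two rows

print(det2(A))         # 11
print(det2(added))     # 11: adding one row to another changes nothing
print(det2(swapped))   # -11: a row interchange flips the sign
```

The shear picture in the text is exactly why the middle value is unchanged: the new parallelogram has the same base and height.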
Example

(0 2 1 1)
(2 1 1 0)
(1 1 1 2)
(2 0 2 0)
Step 1 Exchange rows 1 and 3. REMEMBER -1.
(1 1 1 2)
(2 1 1 0)
(0 2 1 1)
(2 0 2 0)
Step 2 Use multiples of row 1 to clear the remainder of the first column. Nothing to "remember".
(1  1  1  2)
(0 -1 -1 -4)
(0  2  1  1)
(0 -2  0 -4)
Step 3 Multiply row 2 by -1. REMEMBER -1.
(1  1 1  2)
(0  1 1  4)
(0  2 1  1)
(0 -2 0 -4)
Step 4 Use multiples of row 2 to clear the remainder of the second column. Nothing to "remember".
(1 0  0 -2)
(0 1  1  4)
(0 0 -1 -7)
(0 0  2  4)
Step 5 Multiply row 3 by -1. REMEMBER -1.
(1 0 0 -2)
(0 1 1  4)
(0 0 1  7)
(0 0 2  4)
Step 6 Use multiples of row 3 to clear the remainder of the third column. Nothing to "remember".
(1 0 0  -2)
(0 1 0  -3)
(0 0 1   7)
(0 0 0 -10)
Step 7 Multiply row 4 by -1/10. REMEMBER -10.
(1 0 0 -2)
(0 1 0 -3)
(0 0 1  7)
(0 0 0  1)
Step 8 Use multiples of row 4 to clear the remainder of the fourth column. Nothing to "remember".
(1 0 0 0)
(0 1 0 0)
(0 0 1 0)
(0 0 0 1)
Clearly (yeah, I think in this case, "clearly" is appropriate!) the result has determinant 1. The determinant of the original matrix is the product of all of the REMEMBER notes: (-1)(-1)(-1)(-10). So the value of the determinant is 10. By the way, thank goodness, Maple agrees. It will turn out that I did some extra, unnecessary work here. It is enough to convert the matrix into "upper triangular" form and then take the product of the diagonal entries. We will see this next time. So come to class!

In dimension 2, we learn early on that the determinant of
(a_{11} a_{12})
(a_{21} a_{22})
is +a_{11}a_{22}-a_{21}a_{12}. The picture indicates this. In dimension 3, we learn early on that the determinant of
(a_{11} a_{12} a_{13})
(a_{21} a_{22} a_{23})
(a_{31} a_{32} a_{33})
is three positive products of three terms and three negative products of three terms:
+a_{11}a_{22}a_{33}+a_{12}a_{23}a_{31}+a_{13}a_{21}a_{32}-a_{13}a_{22}a_{31}-a_{11}a_{23}a_{32}-a_{12}a_{21}a_{33}
The picture indicates this.
Maybe: if you can understand the picture (the northwest-southeast products are positive and the northeast-southwest products are negative). As I mentioned in class, the number of products goes up. It starts at 2 and 6, and then ... 24 ... 120, and is n! products of n terms with signs. n! is approximately sqrt(2*pi*n)(n/e)^{n} (Stirling's formula), as I said in class, and you can see this grows quickly. 10! is about three and a half million, so a formula like the two above for 10 by 10 matrices would have three and a half million terms. Gaussian elimination is efficient and fast for handling matrices of numbers, but evaluating symbolic determinants efficiently is a current research problem. I again mention: rows/columns are the same. Anything we're doing with rows is valid and correct for columns.

The QotD was: compute the determinant of
(4 1 0 1)
(0 2 1 1)
(1 1 0 0)
(0 0 2 2)
using row operations as shown in this lecture. The answer, I declared, was 4. I hope people got 4. There are lots and lots and lots of algorithms for determinants. Next time we'll look at the official definition, and cofactor expansions. |
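As a footnote to the factorial growth mentioned above, Stirling's formula is easy to try out numerically. A quick sketch of mine comparing the approximation to the exact factorial:

```python
import math

def stirling(n):
    """Stirling's approximation: n! is about sqrt(2*pi*n) * (n/e)^n."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

for n in (2, 5, 10):
    # the ratio approaches 1 as n grows; it is already within 1% at n = 10
    print(n, math.factorial(n), stirling(n) / math.factorial(n))
```

In particular math.factorial(10) is 3628800, the "about three and a half million" quoted in class.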
Maintained by greenfie@math.rutgers.edu and last modified 3/10/2004.