Introduction to Matrix Exponentials
1 Definition of Exponentials
A system of autonomous linear differential equations can be written as
dY/dt = A Y,
where A is an n by n matrix and Y = Y(t) is a vector listing the n dependent variables. (In most of what we'll do, we take n = 2, since we study mainly systems of 2 equations, but the theory is the same for all n.)
If we were dealing with just one (scalar) linear equation
dy/dt = a y,
then the general solution of the equation would be y(t) = e^{at} y(0). It turns out that the solution of a vector equation looks like this as well, provided that we interpret what we mean by ``e^{At}'' when A is a matrix instead of just a scalar. How to define e^{At}? The most obvious procedure is to take the power series which defines the exponential, which, as you surely remember from Calculus, is
e^{x} = 1 + x + (1/2)x^{2} + (1/6)x^{3} + … + (1/k!)x^{k} + …
and just formally plug in x = At. (The answer should be a matrix, so we have to think of the term ``1'' as the identity matrix.) In summary, we define:
e^{At} = I + At + (1/2)(At)^{2} + (1/6)(At)^{3} + … + (1/k!)(At)^{k} + …
where we understand the series as defining a scalar series for each entry of the matrix.
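(If you want to see this definition in action on a computer, here is a small Python sketch, not part of the original notes, that sums a truncated version of the series and compares it with the library routine scipy.linalg.expm; the matrix A below is just an arbitrary example.)

```python
import numpy as np
from scipy.linalg import expm  # library matrix exponential, used as a reference

def expm_series(A, t, terms=30):
    """Approximate e^{At} by summing the first `terms` terms of the power series."""
    result = np.eye(A.shape[0])     # the term ``1'' is the identity matrix
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ (A * t) / k   # builds (At)^k / k! recursively
        result = result + term
    return result

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])          # arbitrary 2 by 2 example
t = 0.7
print(np.allclose(expm_series(A, t), expm(A * t)))  # expected: True
```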
One may prove that:
e^{A(t+s)} = e^{At} e^{As}  for all s, t,    (1)
and therefore, since (obviously) e^{A·0} = I, using s = -t gives
e^{At} e^{-At} = I    (2)
(which is the matrix version of e^{-x} = 1/e^{x}).
We now prove that this matrix exponential has the following property:
(d/dt) e^{At} = A e^{At} = e^{At} A    (3)
for every t.
Proof:
Let us differentiate the series term by term:
(d/dt) e^{At} = (d/dt) ( I + At + (1/2)(At)^{2} + (1/6)(At)^{3} + … + (1/k!)(At)^{k} + … )
= 0 + A + A^{2}t + (1/2)A^{3}t^{2} + … + (1/(k-1)!)A^{k}t^{k-1} + …
= A ( I + At + (1/2)(At)^{2} + (1/6)(At)^{3} + … + (1/k!)(At)^{k} + … )
= A e^{At},
and a similar proof, factoring
A on the right instead of to the left,
gives the equality between the derivative and
e^{At} A.
(Small print: the differentiation term-by-term can be justified using facts about term-by-term differentiation of power series inside their domain of convergence.)
The property (3) is the fundamental property of exponentials of matrices. It provides us immediately with this corollary:
The initial value problem dY/dt = A Y, Y(0) = Y_{0} has the unique solution Y(t) = e^{At} Y_{0}.
We can, indeed, verify that the formula Y(t) = e^{At} Y_{0} defines a solution
of the IVP:

dY(t)/dt = d(e^{At} Y_{0})/dt = (d e^{At}/dt) Y_{0} = (A e^{At}) Y_{0} = A (e^{At} Y_{0}) = A Y(t).
(That it is the unique, i.e., the only, solution is proved as follows: if there were another solution Z(t) of the same IVP, then we could let W(t) = Y(t) - Z(t) and notice that W′ = Y′ - Z′ = A(Y - Z) = AW, and W(0) = Y(0) - Z(0) = 0. Letting V(t) = e^{-At}W(t), and applying the product rule, we have that
V′ = -Ae^{-At}W + e^{-At}W′ = -e^{-At}AW + e^{-At}AW = 0,
so V must be constant. Since V(0) = W(0) = 0, V must be identically zero. Therefore W(t) = e^{At}V(t) is also identically zero, which, because W = Y - Z, means that the functions Y and Z are one and the same, which is what we claimed.)
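(As a quick numerical check of this corollary, and not part of the original notes, the following Python sketch integrates Y′ = AY with a general-purpose ODE solver and compares the result with Y(t) = e^{At}Y_{0}; the matrix A and the initial condition are arbitrary examples, and scipy is assumed to be available.)

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])   # arbitrary example matrix
Y0 = np.array([1.0, 0.0])      # arbitrary initial condition

# Solve Y' = A Y numerically, then compare with the closed form Y(t) = e^{At} Y0.
sol = solve_ivp(lambda t, y: A @ y, (0.0, 2.0), Y0, rtol=1e-9, atol=1e-12)
t_end = sol.t[-1]
print(np.allclose(sol.y[:, -1], expm(A * t_end) @ Y0, atol=1e-6))  # expected: True
```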
So we have, in theory, solved the general linear differential equation.
A potential problem is, however, that it is not always easy to calculate
e^{At}.
1.1 Some Examples
We start with this example:
A = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}.    (4)
We calculate the series by just multiplying A by t:
At = \begin{pmatrix} t & 0 \\ 0 & 2t \end{pmatrix}
and now calculating the powers of At. Notice that, because At is a diagonal matrix, its powers are very easy to compute: we just take the powers of the diagonal entries (why? if you don't understand, stop and think it over right now). So, we get
e^{At} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + \begin{pmatrix} t & 0 \\ 0 & 2t \end{pmatrix} + \frac{1}{2}\begin{pmatrix} t^{2} & 0 \\ 0 & (2t)^{2} \end{pmatrix} + \frac{1}{6}\begin{pmatrix} t^{3} & 0 \\ 0 & (2t)^{3} \end{pmatrix} + … + \frac{1}{k!}\begin{pmatrix} t^{k} & 0 \\ 0 & (2t)^{k} \end{pmatrix} + …
and, just adding coordinatewise, we obtain:
e^{At} = \begin{pmatrix} 1 + t + \frac{1}{2}t^{2} + \frac{1}{6}t^{3} + … + \frac{1}{k!}t^{k} + … & 0 \\ 0 & 1 + 2t + \frac{1}{2}(2t)^{2} + \frac{1}{6}(2t)^{3} + … + \frac{1}{k!}(2t)^{k} + … \end{pmatrix}
which gives us, finally, the conclusion that
e^{At} = \begin{pmatrix} e^{t} & 0 \\ 0 & e^{2t} \end{pmatrix}.
So, in this very special case we obtained the exponential by just taking the exponentials of the diagonal elements and leaving the off-diagonal elements zero (observe that we did not end up with exponentials of the non-diagonal entries, since e^{0} = 1, not 0). In general, computing an exponential is a little more difficult than this, and it is not enough to just take exponentials of coefficients.
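(Here is the same observation on the computer, a minimal sketch not in the original notes, using the diagonal matrix reconstructed above as example (4): the exponential of a diagonal matrix is the diagonal matrix of the exponentials.)

```python
import numpy as np
from scipy.linalg import expm

t = 0.5
A = np.diag([1.0, 2.0])   # the diagonal matrix of example (4)
print(np.allclose(expm(A * t), np.diag([np.exp(t), np.exp(2 * t)])))  # expected: True
```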
Sometimes things that seem surprising (the first time that you see them) may happen. Let us take this example now:
A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.    (5)
To start the calculation of the series, we multiply A by t:
At = \begin{pmatrix} 0 & t \\ -t & 0 \end{pmatrix}
and again calculate the powers of At. This is a little harder than in the first example, but not too hard:
(At)^{2} = \begin{pmatrix} -t^{2} & 0 \\ 0 & -t^{2} \end{pmatrix}, \quad (At)^{3} = \begin{pmatrix} 0 & -t^{3} \\ t^{3} & 0 \end{pmatrix}, \quad (At)^{4} = \begin{pmatrix} t^{4} & 0 \\ 0 & t^{4} \end{pmatrix},
and so on. We won't compute more, because by now you surely have recognized the pattern (right?). We add these up (not forgetting the factorials, of course):
e^{At} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + \begin{pmatrix} 0 & t \\ -t & 0 \end{pmatrix} + \frac{1}{2}\begin{pmatrix} -t^{2} & 0 \\ 0 & -t^{2} \end{pmatrix} + \frac{1}{3!}\begin{pmatrix} 0 & -t^{3} \\ t^{3} & 0 \end{pmatrix} + \frac{1}{4!}\begin{pmatrix} t^{4} & 0 \\ 0 & t^{4} \end{pmatrix} + …
and, just adding each coordinate, we obtain:
e^{At} = \begin{pmatrix} 1 - \frac{t^{2}}{2} + \frac{t^{4}}{4!} - … & t - \frac{t^{3}}{3!} + \frac{t^{5}}{5!} - … \\ -t + \frac{t^{3}}{3!} - \frac{t^{5}}{5!} + … & 1 - \frac{t^{2}}{2} + \frac{t^{4}}{4!} - … \end{pmatrix}
which gives us, finally, the conclusion that
e^{\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} t} = e^{At} = \begin{pmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{pmatrix}.
It is remarkable that trigonometric functions have appeared. Perhaps we made a mistake? How could we make sure? Well, let us check that property (3) holds (we'll check only the first equality; you can check the second one).
We need to test that
\frac{d}{dt}\begin{pmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{pmatrix} = A \begin{pmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{pmatrix}.    (6)
Since (sin t)′ = cos t and (cos t)′ = -sin t, we know that
\frac{d}{dt}\begin{pmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{pmatrix} = \begin{pmatrix} -\sin t & \cos t \\ -\cos t & -\sin t \end{pmatrix}

and, on the other hand, multiplying matrices:
\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}\begin{pmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{pmatrix} = \begin{pmatrix} -\sin t & \cos t \\ -\cos t & -\sin t \end{pmatrix},
so we have verified the equality (6).
As a last example, let us take this matrix:
A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.    (7)
Again we start by writing
At = \begin{pmatrix} t & t \\ 0 & t \end{pmatrix}
and calculating the powers of At. It is easy to see that the powers are:
(At)^{k} = \begin{pmatrix} t^{k} & k t^{k} \\ 0 & t^{k} \end{pmatrix},
since this is obviously true for k = 1 and, recursively, we have
(At)^{k+1} = (At)^{k} (At) = \begin{pmatrix} t^{k} & k t^{k} \\ 0 & t^{k} \end{pmatrix}\begin{pmatrix} t & t \\ 0 & t \end{pmatrix} = \begin{pmatrix} t^{k+1} & t^{k+1} + k t^{k+1} \\ 0 & t^{k+1} \end{pmatrix} = \begin{pmatrix} t^{k+1} & (k+1) t^{k+1} \\ 0 & t^{k+1} \end{pmatrix}.
Therefore,
e^{At} = \begin{pmatrix} e^{t} & t e^{t} \\ 0 & e^{t} \end{pmatrix}.
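(Again, a small numerical sanity check, not in the original notes, assuming example (7) is the upper triangular matrix reconstructed above.)

```python
import numpy as np
from scipy.linalg import expm

t = 0.8
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])    # the matrix of example (7)
expected = np.array([[np.exp(t), t * np.exp(t)],
                     [0.0, np.exp(t)]])
print(np.allclose(expm(A * t), expected))  # expected: True
```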
To summarize, we have worked out three examples:

The first example (4) is a diagonal matrix, and we found that its exponential is obtained by taking exponentials of the diagonal entries.

The second example (5) gave us an exponential matrix that was expressed in terms of trigonometric functions. Notice that this matrix has imaginary eigenvalues equal to i and -i, where i = √(-1).

The last example (7) gave us an exponential matrix which had a nonzero function in the (1,2) position. Notice that this nonzero function was not just the exponential of the (1,2) position in the original matrix. That exponential would give us an e^{t} term. Instead, we got a more complicated te^{t} term.
In a sense, these are all the possibilities. Exponentials of all two by two
matrices can be obtained using functions of the form
e^{at},
te^{at}, and trigonometric functions (possibly multiplied by
e^{at}).
(Not only that, but exponentials of any size matrices, not just 2 by 2,
can be expressed using just polynomial combinations of
t, scalar
exponentials, and trigs. We will not quite prove this fact here; see any
linear algebra book for the details.)
Calculating exponentials using power series is OK for very simple examples,
and important to do a few times, so that you understand what this all means.
But in practice, one uses very different methods for computing matrix
exponentials. (Remember how you first saw the definition of derivative using limits of incremental quotients, and computed some derivatives in this way, but soon learned how to use ``the Calculus'' to calculate derivatives of complicated expressions using the multiplication rule, chain rule, and so on.)
2 Computing Matrix Exponentials
We wish to calculate e^{At}. The key concept for simplifying the computation of matrix exponentials is that of matrix similarity. Suppose that we have found two matrices, L and S, where S is invertible, such that this formula holds:
A = S L S^{-1}    (8)
(one says that A and L are similar matrices).
Then, we claim, it is true that also:
e^{At} = S e^{Lt} S^{-1}    (9)
for all t. Therefore, if the matrix L is one for which e^{Lt} is easy to find (for example, if it is a diagonal matrix), we can then multiply by S and S^{-1} to get e^{At}.
To see why (9) is a consequence of (8), we just notice that At = S(Lt)S^{-1} and we have the following ``telescopic'' property for powers:
(At)^{k} = [S(Lt)S^{-1}][S(Lt)S^{-1}] … [S(Lt)S^{-1}] = S(Lt)^{k}S^{-1},
since all the in-between pairs S^{-1}S cancel out.
Therefore,
e^{At} = I + At + (1/2)(At)^{2} + (1/6)(At)^{3} + … + (1/k!)(At)^{k} + …
= I + S(Lt)S^{-1} + (1/2)S(Lt)^{2}S^{-1} + (1/6)S(Lt)^{3}S^{-1} + … + (1/k!)S(Lt)^{k}S^{-1} + …
= S [ I + Lt + (1/2)(Lt)^{2} + (1/6)(Lt)^{3} + … + (1/k!)(Lt)^{k} + … ] S^{-1}
= S e^{Lt} S^{-1},
as we claimed.
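(The similarity trick is easy to try numerically. The sketch below, not part of the original notes, diagonalizes an arbitrary example matrix with numpy.linalg.eig and checks that e^{At} = S e^{Lt} S^{-1}; it assumes A is diagonalizable, which this example is.)

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])    # arbitrary diagonalizable example
t = 0.6

eigvals, S = np.linalg.eig(A)     # columns of S are eigenvectors; A = S L S^{-1}
L = np.diag(eigvals)

lhs = expm(A * t)
rhs = S @ expm(L * t) @ np.linalg.inv(S)   # S e^{Lt} S^{-1}
print(np.allclose(lhs, rhs))               # expected: True
```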
The basic theorem is this one:
Theorem.
For every n by n matrix A, one can find an invertible matrix S and
an upper triangular matrix L such that (8) holds.
Remember that an upper triangular matrix is one that has the following form:
\begin{pmatrix} λ_{1} & * & \cdots & * \\ 0 & λ_{2} & \cdots & * \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & λ_{n} \end{pmatrix}
where the stars are any numbers. The numbers λ_{1}, …, λ_{n} turn out to be the eigenvalues of A.
There are two reasons that this theorem is interesting.
First, it provides a way to compute exponentials, because it is not difficult
to find exponentials of upper triangular matrices (the example (7)
is actually quite typical) and second because it has important theoretical
consequences.
(Two stronger theorems are possible. One is the ``Jordan canonical form''
theorem, which provides a matrix L that is not only upper triangular but
which has an even more special structure. Jordan canonical forms are not very
useful from a computational point of view, because they are what is known in
numerical analysis as ``numerically unstable'', meaning that small
perturbations of A can give one totally different Jordan forms.
A second strengthening is the ``Schur unitary triangularization theorem''
which says that one can pick the matrix S to be unitary. This last
theorem is extremely useful in practice, and is implemented in many numerical
algorithms.)
We do not prove the theorem here in general, but only show it for n = 2; the
general case can be proved in much the same way, by means of a recursive
process.
We start the proof by remembering that every matrix has at least one eigenvalue, let us call it λ, and an associated eigenvector, v. That is to say, v is a vector different from zero, and
Av = λv.    (10)
To find λ, we find a root of the characteristic equation
det(A - λI) = 0,
which, for two-dimensional systems, is the same as the equation
λ^{2} - trace(A) λ + det(A) = 0
and, recall, for A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, trace(A) = a + d and det(A) = ad - bc.
(There are, for 2 by 2 matrices, either two real eigenvalues, one real eigenvalue with multiplicity two, or two complex eigenvalues. In the last case, the two complex eigenvalues must be conjugates of each other.)
An eigenvector associated to an eigenvalue λ is then found by solving the linear equation
(A - λI)v = 0
(there are an infinite number of solutions; we pick any nonzero one).
With an eigenvalue λ and eigenvector v found, we next pick any vector w with the property that the two vectors v and w are linearly independent. For example, if
v = \begin{pmatrix} a \\ b \end{pmatrix}
and a is not zero, we can take
w = \begin{pmatrix} 0 \\ 1 \end{pmatrix}
(what would you pick for w if a were zero?).
Now, since the set {v, w} forms a basis of two-dimensional space, we can find coefficients c and d so that
Aw = cv + dw.    (11)
We can summarize both (10) and (11) in one matrix equation:
A (v\;w) = (v\;w) \begin{pmatrix} λ & c \\ 0 & d \end{pmatrix}.

We let S = (v\;w) and
L = \begin{pmatrix} λ & c \\ 0 & d \end{pmatrix}.
Then
AS = SL,
which is the same as what we wanted to prove, namely A = S L S^{-1}.
Actually, we can even say more.
It is a fundamental fact in linear algebra that, if two matrices are similar,
then their eigenvalues must be the same.
Now, the eigenvalues of L are λ and d, because the eigenvalues of any triangular matrix are its diagonal elements. Therefore, since A and L are similar, d must also be an eigenvalue of A.
3 The Three Cases for n = 2
The following special cases are worth discussing in detail:
1. A has two different real eigenvalues.
2. A has two complex conjugate eigenvalues.
3. A has a repeated real eigenvalue.
In cases 1 and 2, one can always find a diagonal matrix L. To see why this is true, let us go back to the proof, but now, instead of taking just any linearly independent vector w, let us pick a special one, namely an eigenvector corresponding to the other eigenvalue μ of A:
Aw = μw.
This vector is always linearly independent of v, so the proof can be completed as before. Notice that L is now diagonal, because d = μ and c = 0.
(Proof that v and w are linearly independent: if av + bw = 0, then aλv + bμw = A(av + bw) = 0. On the other hand, multiplying av + bw = 0 by λ we would have aλv + bλw = 0. Subtracting, we would obtain b(λ - μ)w = 0, and as λ - μ ≠ 0 we would arrive at the conclusion that bw = 0. But w, being an eigenvector, is required to be nonzero, so we would have to have b = 0. Plugging this back into our linear dependence would give av = 0, which would require a = 0 as well. This shows us that there are no nonzero coefficients a and b for which av + bw = 0, which means that the eigenvectors v and w are linearly independent.)
Notice that in cases 1 and 3, the matrices L and S are both real. In case 1, we will interpret the solutions with initial conditions on the lines that contain v and w as ``straight line solutions'' and this is the subject of Section 3.2 in the book.
In case 2, the matrices L and S are, in general, not real. Note that, in case 2, if Av = λv, taking complex conjugates gives
A\bar{v} = \bar{λ}\,\bar{v},
and we note that \bar{λ} ≠ λ because λ is not real. So, we can always pick w = \bar{v}, the conjugate of v. It will turn out that solutions can be re-expressed in terms of trigonometric functions (remember example (5)), as we'll see in the next section and in Section 3.4 of the book.
Now let's consider Case 3 (the repeated real eigenvalue). We have that
L = \begin{pmatrix} λ & c \\ 0 & λ \end{pmatrix},
so we can also write L = λI + cN, where N is the following matrix:
N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.
Observe that:
(λI + cN)^{2} = (λI)^{2} + c^{2}N^{2} + 2λcN = λ^{2}I + 2λcN
(because N^{2} = 0) and, for the general power k, recursively:
(λI + cN)^{k} = (λ^{k-1}I + (k-1)λ^{k-2}cN)(λI + cN)
= λ^{k}I + (k-1)λ^{k-1}cN + λ^{k-1}cN + (k-1)λ^{k-2}c^{2}N^{2}
= λ^{k}I + kλ^{k-1}cN,
so
(λI + cN)^{k} t^{k} = (λ^{k}I + kλ^{k-1}cN)t^{k} = \begin{pmatrix} λ^{k}t^{k} & kλ^{k-1}c\,t^{k} \\ 0 & λ^{k}t^{k} \end{pmatrix}
and therefore
e^{Lt} = \begin{pmatrix} e^{λt} & c\,t\,e^{λt} \\ 0 & e^{λt} \end{pmatrix},
because 0 + ct + (2λc)t^{2}/2 + (3λ^{2}c)t^{3}/6 + … = c\,t\,e^{λt}.
(This generalizes the special case in example (7).)
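(A quick numerical check of this formula, not in the original notes: for arbitrary values of λ and c, the exponential of (λI + cN)t should have e^{λt} on the diagonal and c t e^{λt} in the corner.)

```python
import numpy as np
from scipy.linalg import expm

lam, c, t = -0.5, 2.0, 1.1           # arbitrary values for the repeated-eigenvalue case
N = np.array([[0.0, 1.0],
              [0.0, 0.0]])           # nilpotent part: N @ N = 0
L = lam * np.eye(2) + c * N

expected = np.array([[np.exp(lam * t), c * t * np.exp(lam * t)],
                     [0.0, np.exp(lam * t)]])
print(np.allclose(expm(L * t), expected))  # expected: True
```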
4 A Shortcut
If we just want to find the form of the general solution of Y′ = AY, we do not need to actually calculate the exponential of A and the inverse of the matrix S. Let us first take the cases of different eigenvalues (real or complex, that is, cases 1 or 2, it doesn't matter which one). As we saw, L can be taken to be the diagonal matrix consisting of these eigenvalues (which we call here λ and μ instead of λ_{1} and λ_{2}), and S = (v\;w) just lists the two eigenvectors as its columns. We then know that the solution of every initial value problem Y′ = AY, Y(0) = Y_{0} will be of the following form:
Y(t) = e^{At} Y_{0} = S e^{Lt} S^{-1} Y_{0} = (v\;w) \begin{pmatrix} e^{λt} & 0 \\ 0 & e^{μt} \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = a e^{λt} v + b e^{μt} w,
where we just wrote S^{-1}Y_{0} as a column vector of general coefficients a and b.
In conclusion:
The general solution of Y′ = AY, when A has two eigenvalues λ and μ with respective eigenvectors v and w, is of the form
Y(t) = a e^{λt} v + b e^{μt} w    (13)
for some constants a and b.
So, one approach to solving IVPs is to first find eigenvalues and eigenvectors, write the solution in the above general form, and then plug in the initial condition in order to figure out the right constants. Section 3.2 in the book gives us lots of practice with this procedure.
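(The shortcut is easy to carry out numerically as well. This sketch, not part of the original notes, takes an arbitrary example matrix with two distinct real eigenvalues, finds a and b from the initial condition, and checks the answer against e^{At}Y_{0}.)

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, 2.0],
              [0.0, -1.0]])   # arbitrary example with eigenvalues 1 and -1
Y0 = np.array([3.0, 1.0])

eigvals, V = np.linalg.eig(A)  # columns of V are the eigenvectors v and w
a, b = np.linalg.solve(V, Y0)  # coefficients with a v + b w = Y0

def Y(t):
    # the general-solution form: a e^{lambda t} v + b e^{mu t} w
    return a * np.exp(eigvals[0] * t) * V[:, 0] + b * np.exp(eigvals[1] * t) * V[:, 1]

t = 0.9
print(np.allclose(Y(t), expm(A * t) @ Y0))  # expected: True
```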
In the case of non-real eigenvalues, recall that we showed that the two eigenvalues must be conjugates of each other, and the two eigenvectors may be picked to be conjugates of each other. Let us show now that we can write (13) in a form which does not involve any complex numbers. In order to do so, we start by decomposing the first vector function which appears in (13) into its real and imaginary parts:
e^{λt}v = Y_{1}(t) + i Y_{2}(t)    (14)
(let us not worry for now about what the two functions Y_{1} and Y_{2} look like). Since μ is the conjugate of λ and w is the conjugate of v, the second term is:
e^{μt}w = Y_{1}(t) - i Y_{2}(t).    (15)
So we can write the general solution shown in (13) also like this:
a(Y_{1} + iY_{2}) + b(Y_{1} - iY_{2}) = (a + b) Y_{1} + i(a - b) Y_{2}.    (16)
Now, it is easy to see that a and b must be conjugates of each other. (Do this as an optional homework problem. Use the fact that these two coefficients are the components of S^{-1}Y_{0}, and the fact that Y_{0} is real and that the two columns of S are conjugates of each other.) This means that both coefficients a + b and i(a - b) are real numbers. Calling these coefficients ``k_{1}'' and ``k_{2}'', we can summarize the complex case like this:
The general solution of Y′ = AY, when A has a non-real eigenvalue λ with respective eigenvector v, is of the form
k_{1} Y_{1}(t) + k_{2} Y_{2}(t)    (17)
for some real constants k_{1} and k_{2}. The functions Y_{1} and Y_{2} are found by the following procedure: calculate the product e^{λt}v and separate it into real and imaginary parts as in Equation (14).
What do Y_{1} and Y_{2} really look like? This is easy to answer using Euler's formula, which gives
e^{λt} = e^{at + ibt} = e^{at}(\cos bt + i\sin bt) = e^{at}\cos bt + i e^{at}\sin bt,
where a and b are the real and imaginary parts of λ, respectively. This is what we do in Section 3.4 of the book.
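(To see the real form on the computer: the sketch below, not part of the original notes, uses the matrix of example (5) again; since the two terms in (13) are conjugates of each other, the real solution is twice the real part of the first one, i.e. a real combination of Y_{1} and Y_{2}.)

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])    # example (5): eigenvalues i and -i
Y0 = np.array([1.0, 2.0])      # arbitrary real initial condition

eigvals, V = np.linalg.eig(A)  # numpy returns the conjugate eigenpairs
lam, v = eigvals[0], V[:, 0]
a = np.linalg.solve(V, Y0)[0]  # coefficient of e^{lambda t} v in (13)

t = 0.7
# Y(t) = a e^{lambda t} v + (conjugate term) = 2 Re(a e^{lambda t} v),
# a real combination of Y1 = Re(e^{lambda t} v) and Y2 = Im(e^{lambda t} v).
y = 2.0 * np.real(a * np.exp(lam * t) * v)
print(np.allclose(y, expm(A * t) @ Y0))  # expected: True
```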
Finally, in case 3 (repeated eigenvalues) we can write, instead:
Y(t) = e^{At} Y_{0} = S e^{Lt} S^{-1} Y_{0} = (v\;w) \begin{pmatrix} e^{λt} & c\,t\,e^{λt} \\ 0 & e^{λt} \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = a e^{λt} v + b e^{λt}(c t v + w).
When c = 0 we have from A = S L S^{-1} that A must have been the diagonal matrix λI to start with (because L = λI commutes with S). When c ≠ 0, we can write k_{2} = bc and redefine w as (1/c)w. Note that then (11) becomes Aw = v + λw, that is, (A - λI)w = v. Any vector w with this property is linearly independent from v (why?).
So we conclude, for the case of repeated eigenvalues:
The general solution of Y′ = AY, when A has a repeated (real) eigenvalue λ, is either of the form e^{λt}Y_{0} (if A is a diagonal matrix) or, otherwise, is of the form
k_{1} e^{λt} v + k_{2} e^{λt}(t v + w)    (18)
for some real constants k_{1} and k_{2}, where v is an eigenvector corresponding to λ and w is any vector which satisfies (A - λI)w = v. Observe that (A - λI)^{2} w = (A - λI)v = 0. In general, one calls any nonzero vector such that (A - λI)^{k} w = 0 a generalized eigenvector (of order k) of the matrix A (since, when k = 1, we have eigenvectors).
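(A small numerical illustration of the repeated-eigenvalue case, not part of the original notes, using the matrix of example (7): we solve (A - λI)w = v for a generalized eigenvector with a least-squares call and check that formula (18) agrees with e^{At}Y_{0}.)

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])     # example (7): repeated eigenvalue 1, not diagonal
lam = 1.0
v = np.array([1.0, 0.0])       # eigenvector: (A - lam I) v = 0
w = np.linalg.lstsq(A - lam * np.eye(2), v, rcond=None)[0]  # solves (A - lam I) w = v

t, k1, k2 = 0.4, 2.0, -1.0
Y = k1 * np.exp(lam * t) * v + k2 * np.exp(lam * t) * (t * v + w)  # formula (18)
Y0 = k1 * v + k2 * w           # the same expression at t = 0
print(np.allclose(Y, expm(A * t) @ Y0))  # expected: True
```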
5 Forcing Terms
The use of matrix exponentials also helps explain much of what is done in
chapter 4 (forced systems), and renders Laplace transforms unnecessary. Let
us consider non-homogeneous linear differential equations of this type:
\frac{dY}{dt}(t) = A Y(t) + u(t).    (19)
We wrote the arguments ``t'' just this one time, to emphasize that everything is a function of t, but from now on we will drop the t's when they are clear from the context.
Let us write, just as we did when discussing scalar linear equations,
Y′ - AY = u.
We consider the ``integrating factor'' M(t) = e^{-At}. Multiplying both sides of the equation by M, we have, since (e^{-At}Y)′ = e^{-At}Y′ - e^{-At}AY (right?):
\frac{d(e^{-At}Y)}{dt} = e^{-At}u.

Taking antiderivatives:
e^{-At}Y = ∫_{0}^{t} e^{-As}u(s)\,ds + Y_{0}
for some constant vector Y_{0}. Finally, multiplying by e^{At} and remembering that e^{At}e^{-At} = I, we conclude:
Y(t) = e^{At} Y_{0} + e^{At} ∫_{0}^{t} e^{-As}u(s)\,ds.    (20)
This is sometimes called the ``variation of parameters'' form of the general solution of the forced equation (19). Of course, Y_{0} = Y(0) (just plug in t = 0 on both sides).
Notice that, if the vector function u(t) is a polynomial in t, then the
integral in (20) will be a combination of exponentials and powers of
t (integrate by parts). Similarly, if u(t) is a combination of
trigonometric functions, the integral will also combine trigs and polynomials.
This observation justifies the ``guesses'' made for forced systems in chapter
4 (they are, of course, not guesses, but consequences of integration by parts,
but the book would lead you to believe otherwise).
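(Formula (20) can also be evaluated directly by numerical quadrature. The sketch below, not part of the original notes, uses an arbitrary example system and forcing term, and compares the variation-of-parameters formula with a general-purpose ODE solver; it assumes scipy.integrate.quad_vec is available.)

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp, quad_vec

A = np.array([[0.0, 1.0],
              [-2.0, -1.0]])              # arbitrary example system
Y0 = np.array([1.0, 0.0])
u = lambda t: np.array([np.cos(t), 0.0])  # arbitrary forcing term

def Y(t):
    # formula (20): e^{At} Y0 + e^{At} * integral_0^t e^{-As} u(s) ds
    integral, _ = quad_vec(lambda s: expm(-A * s) @ u(s), 0.0, t)
    return expm(A * t) @ (Y0 + integral)

t_end = 1.5
sol = solve_ivp(lambda t, y: A @ y + u(t), (0.0, t_end), Y0, rtol=1e-9, atol=1e-12)
print(np.allclose(Y(t_end), sol.y[:, -1], atol=1e-6))  # expected: True
```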
6 Exercises
1. In each of the following, factor the matrix A into a product S L S^{-1}, with L diagonal:

2. For each of the matrices in Exercise 1, use the S L S^{-1} factorization to calculate A^{6} (do not just multiply A by itself).

3. For each of the matrices in Exercise 1, use the S L S^{-1} factorization to calculate e^{At}.

4. Calculate e^{At} for this matrix, using the power series definition:

5. Consider these matrices A and B, and calculate e^{At}, e^{Bt}, and e^{(A+B)t}. Answer, true or false: is e^{At}e^{Bt} = e^{(A+B)t}?

6. (Challenge problem) Show that, for any two matrices A and B, it is true that
e^{At} e^{Bt} = e^{(A+B)t} for all t
if and only if AB - BA = 0. (The expression ``AB - BA'' is called the ``Lie bracket'' of the two matrices A and B, and it plays a central role in the advanced theory of differential equations.)
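(For the challenge problem, it is instructive to test the claim numerically; this sketch is not part of the original notes, and the two pairs of matrices are arbitrary examples, not the ones from Exercise 5.)

```python
import numpy as np
from scipy.linalg import expm

def same_exponential(A, B, t=1.0):
    """Check numerically whether e^{At} e^{Bt} equals e^{(A+B)t}."""
    return np.allclose(expm(A * t) @ expm(B * t), expm((A + B) * t))

# A commuting pair (AB - BA = 0): two diagonal matrices.
A1, B1 = np.diag([1.0, 2.0]), np.diag([-3.0, 0.5])
# A non-commuting pair (AB - BA != 0).
A2 = np.array([[0.0, 1.0], [0.0, 0.0]])
B2 = np.array([[0.0, 0.0], [1.0, 0.0]])

print(same_exponential(A1, B1))  # expected: True
print(same_exponential(A2, B2))  # expected: False
```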