Bonus Project
Vector Norms
A vector norm assigns a size to a vector, in such a way that scalar multiples do what we
expect, and the triangle inequality is satisfied. There are four common vector norms in n
dimensions:
The L1 vector norm: ||x||_1 = sum_i |x_i|
The L2 (or ``Euclidean'') vector norm: ||x||_2 = sqrt( sum_i |x_i|^2 )
The Lp vector norm: ||x||_p = ( sum_i |x_i|^p )^(1/p)
The L-infinity vector norm: ||x||_inf = max_i |x_i|
To compute the norm of a vector in Matlab:
norm(x,1)            % the L1 vector norm
norm(x,2), norm(x)   % the L2 vector norm (the default)
norm(x,p)            % the Lp vector norm
norm(x,inf)          % the L-infinity vector norm
(Recall that inf is the Matlab name corresponding to infinity.)
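For example, here is a quick check with a vector whose norms are easy to work out by hand
(this vector is only an illustration, not one of the exercise vectors below):
x = [ 3; -4; 0 ];
norm(x,1)      % |3| + |-4| + |0| = 7
norm(x,2)      % sqrt(9 + 16 + 0) = 5
norm(x,inf)    % max(|3|,|-4|,|0|) = 4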
Exercise 1: For each of the following column vectors:
x1 = [ 4; 6; 7 ]
x2 = [ 7; 5; 6 ]
x3 = [ 1; 5; 4 ]
compute the vector norms, using the appropriate Matlab commands. Be sure your
answers are reasonable.
L1 L2 L Infinity
x1 __________ __________ __________
x2 __________ __________ __________
x3 __________ __________ __________
Matrix Norms
A matrix norm assigns a size to a matrix, again, in such a way that scalar multiples do what
we expect, and the triangle inequality is satisfied. However, what's more important is that we
want to be able to mix matrix and vector norms in various computations. So we are going to
be very interested in whether a matrix norm is compatible with a particular vector norm, that
is, when it is safe to say:
||A*x|| <= ||A|| * ||x||
There are four common matrix norms and one ``almost'' norm:
The L1 or ``max column sum'' matrix norm: ||A||_1 = max_j sum_i |A(i,j)|
Remark: This is not the same as the L1 norm of the vector of dimension n^2 whose
components are the same as those of A.
The L2 matrix norm: ||A||_2 = sqrt(lambda_max),
where lambda_max is the largest (necessarily real and non-negative) eigenvalue of A'*A, or
||A||_2 = sigma_max, where sigma_max is the largest singular value of A;
The L-infinity or ``max row sum'' matrix norm: ||A||_inf = max_i sum_j |A(i,j)|
Remark: This is not the same as the L-infinity norm of the vector of dimension n^2 whose
components are the same as those of A.
There is no Lp matrix norm in Matlab for general p.
The ``Frobenius'' matrix norm: ||A||_fro = sqrt( sum_{i,j} |A(i,j)|^2 )
Remark: This is the same as the L2 norm of the vector of dimension n^2 whose
components are the same as those of A.
The spectral radius (not a norm): rho(A) = max_lambda |lambda|
(only defined for a square matrix), where lambda is a (possibly complex) eigenvalue of A.
To compute the norm of a matrix in Matlab:
norm(A,1)            % the L1 matrix norm
norm(A,2), norm(A)   % the L2 matrix norm (the default)
norm(A,inf)          % the L-infinity matrix norm
norm(A,'fro')        % the Frobenius matrix norm
See below for computation of rho(A), the spectral radius of A.
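There is no built-in norm option for the spectral radius, but it follows directly from the
eigenvalues; a one-line sketch:
rhoA = max(abs(eig(A)))   % spectral radius: the largest eigenvalue magnitude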
Compatible Matrix Norms
A matrix can be identified with a linear operator, and the norm of a linear operator is usually
defined in the following way:
||A|| = max ||A*x||, the maximum taken over all vectors x with ||x|| = 1.
(It would be more precise to use ``sup'' rather than ``max'' here, but the surface of a sphere in
finite-dimensional space is a compact set, so the supremum is attained, and the maximum is
correct.) A matrix norm defined in this way is said to be ``vector-bound'' to the given vector
norm.
In order for a matrix norm to be consistent with the linear operator norm, you need to be able
to say the following:
||A*x|| <= ||A|| * ||x||    (1)
but this expression is not necessarily true for an arbitrarily chosen pair of matrix and vector
norms. When it is true, then the two are ``compatible''.
If a matrix norm is vector-bound to a particular vector norm, then the two norms are
guaranteed to be compatible. Thus, for any vector norm, there is always at least one matrix
norm that we can use. But that vector-bound matrix norm is not always the only choice. In
particular, the L2 matrix norm is difficult (time-consuming) to compute, but there is a simple
alternative.
Note that:
The L1, L2, and L-infinity matrix norms can be shown to be vector-bound to the
corresponding vector norms and hence are guaranteed to be compatible with them;
The Frobenius matrix norm is not vector-bound to the L2 vector norm, but is
compatible with it; the Frobenius norm is much faster to compute than the L2 matrix
norm (see Exercise 6 below).
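As a quick numerical illustration of compatibility between the Frobenius matrix norm and
the L2 vector norm (the matrix and vector here are arbitrary choices, not part of any
exercise):
A = [ 2 -1; 4 3 ];
x = [ 1; 2 ];
norm(A*x, 2)                 % left side of the compatibility condition (1)
norm(A,'fro') * norm(x, 2)   % right side; the first value should not exceed this one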
Exercise 2: Consider each of the following column vectors:
x1 = [ 4; 6; 7 ];
x2 = [ 7; 5; 6 ];
x3 = [ 1; 5; 4 ];
and each of the following matrices
A1 = [38 37 80
53 49 49
23 85 46];
A2 = [77 89 78
6 34 10
65 36 26];
Check that the compatibility condition (1) holds for each case in the following table by
computing the ratios
r(A,x) = ||A*x||_q / ( ||A||_p * ||x||_q )
and filling in the following table. The third column should contain the letter ``S'' if the
compatibility condition is satisfied and the letter ``U'' if it is unsatisfied.
Suggestion: I recommend you write a script m-file for this exercise. If you do, please
include it with your summary.
Matrix    Vector
norm (p)  norm (q)  S/U  r(A1,x1)  r(A1,x2)  r(A1,x3)  r(A2,x1)  r(A2,x2)  r(A2,x3)
1 1 ___ _________ __________ __________ __________ __________ __________
1 2 ___ _________ __________ __________ __________ __________ __________
1 inf ___ _________ __________ __________ __________ __________ __________
2 1 ___ _________ __________ __________ __________ __________ __________
2 2 ___ _________ __________ __________ __________ __________ __________
2 inf ___ _________ __________ __________ __________ __________ __________
inf 1 ___ _________ __________ __________ __________ __________ __________
inf 2 ___ _________ __________ __________ __________ __________ __________
inf inf ___ _________ __________ __________ __________ __________ __________
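A minimal sketch of such a script, assuming the ratio r(A,x) defined above (the anonymous
function and the loops over p and q are just one possible arrangement):
% exer2.m: sketch of a script for Exercise 2
x1 = [ 4; 6; 7 ];   x2 = [ 7; 5; 6 ];   x3 = [ 1; 5; 4 ];
A1 = [38 37 80; 53 49 49; 23 85 46];
A2 = [77 89 78;  6 34 10; 65 36 26];
r = @(A,x,p,q) norm(A*x,q) / ( norm(A,p) * norm(x,q) );   % the ratio to tabulate
for p = [1 2 inf]
  for q = [1 2 inf]
    fprintf('%3g %3g   %8.4f %8.4f %8.4f %8.4f %8.4f %8.4f\n', p, q, ...
      r(A1,x1,p,q), r(A1,x2,p,q), r(A1,x3,p,q), ...
      r(A2,x1,p,q), r(A2,x2,p,q), r(A2,x3,p,q));
  end
end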
Types of Errors
A natural assumption to make is that the term ``error'' refers always to the difference between
the computed and exact ``answers.'' We are going to have to discuss several kinds of error, so
let's refer to this first error as ``solution error'' or ``forward error.'' Suppose we want to solve a
linear system of the form A*x=b (with exact solution x), and we computed xApprox. We
define the solution error as ||xApprox - x||.
Usually the solution error cannot be computed, and we would like a substitute whose behavior
is acceptably close to that of the solution error. One example would be during an iterative
solution process where we would like to monitor the progress of the iteration but we do not
yet have the solution and cannot compute the solution error. In this case, we are interested in
the ``residual error'' or ``backward error,'' which is defined by
||bApprox - b|| = ||A*xApprox - b||
where, for convenience, we have defined the variable bApprox
to equal A*xApprox. Another way of looking at the residual error is to see that it's
telling us the difference between the right hand side that would ``work'' for xApprox versus
the right hand side we have.
If we think of the right hand side as being a target, and our solution procedure as determining
how we should aim an arrow so that we hit this target, then
The solution error is telling us how badly we aimed our arrow;
The residual error is telling us how much we would have to move the target in order
for our badly aimed arrow to be a bullseye.
There are problems for which the solution error is huge and the residual error tiny, and all the
other possible combinations also occur.
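As a concrete illustration, here is a made-up 2 by 2 system (not one of the exercise cases
below) and the two kinds of error for an approximate solution:
A = [ 2 0; 0 3 ];
b = [ 2; 3 ];
xTrue   = [ 1; 1 ];                      % exact solution of A*x = b
xApprox = [ 1.1; 0.9 ];                  % some approximate solution
solutionError = norm(xApprox - xTrue)    % forward (solution) error
residualError = norm(A*xApprox - b)      % backward (residual) error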
Exercise 3: For each of the following cases, compute the Euclidean norm (two norm)
of the solution error and residual error and characterize each as ``large'' or ``small.''
(For the purpose of this exercise, take ``large'' to mean greater than 1 and ``small'' to
mean smaller than 0.01.) In each case, you must find the true solution first (Pay
attention! In each case you can solve in your head for the true solution, xTrue.), then
compare it with the approximate solution xApprox.
1. A=[1,1;1,(1-1.e-12)], b=[0;0], xApprox=[1;-1]
2. A=[1,1;1,(1-1.e-12)], b=[1;1], xApprox=[1.00001;0]
3. A=[1,1;1,(1-1.e-12)], b=[1;1], xApprox=[100;100]
4. A=[1.e+12,-1.e+12;1,1], b=[0;2], xApprox=[1.001;1]
Case Residual large/small xTrue Error large/small
1 ________ _______ ______ _______ ________
2 ________ _______ ______ _______ ________
3 ________ _______ ______ _______ ________
4 ________ _______ ______ _______ ________
The linear system ``problem''
The linear system ``problem'' can be posed in the following way. Find an n-
vector x that satisfies the matrix equation
A*x = b    (1)
where A is an m by n matrix and b is an m-vector. In Matlab,
both x and b are column vectors.
You probably already know that there is ``usually'' a solution if the matrix is
square (that is, if m=n). We will concentrate on this case for now. But you may
wonder if there is an intelligent response to this problem for the cases of a square
but singular matrix, or a rectangular system, whether overdetermined or
underdetermined.
At one time or another, you have probably been introduced to several algorithms
for producing a solution to the linear system problem, including Cramer's rule
(using determinants), constructing the inverse matrix, Gauß-Jordan elimination and
Gauß factorization. We will see that it is usually not a good idea to construct the
inverse matrix and we will focus on Gauß factorization for solution of linear
systems.
We will be concerned about several topics:
Efficiency: what algorithms produce a result with less work?
Accuracy: what algorithms produce an answer that is likely to be more
accurate?
Difficulty: what makes a problem difficult or impossible to solve?
Special cases: how do we solve problems that are big? symmetric? banded?
singular? rectangular?
The inverse matrix
A classic result from linear algebra is the following:
Theorem The linear system problem (1) is uniquely solvable for arbitrary b if and
only if the inverse matrix A^(-1) exists. In this case the solution can be expressed
as x = A^(-1)*b.
So what's the catch? There are a few:
Computing the inverse takes a lot of time: more time than is necessary if you
only want to know the solution of a particular system.
In cases when the given matrix actually has very many zero entries, as is the
case with the dif2 matrix, computing the inverse is enormously more
difficult and time consuming than merely solving the system. And it is quite
often true that the inverse matrix requires far more storage than the matrix
itself. The second difference matrix, for example, is an ``M-matrix'' and it
can be proved that all the entries in its inverse are positive numbers. You
only need about 3n numbers to store the dif2 matrix because you can often
avoid storing the zero entries, but you need n^2 numbers to store its inverse.
Sometimes the inverse is so inaccurate that it is not worth the trouble to
multiply by the inverse to get the solution.
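The storage point is easy to see with any tridiagonal matrix. Here is a sketch using Matlab's
sparse facilities; the size n is arbitrary, and spdiags builds a generic second difference
matrix rather than the lab's dif2 routine:
n = 1000;
T = spdiags( ones(n,1)*[-1 2 -1], -1:1, n, n );   % sparse second difference matrix
nnz(T)                 % about 3n stored entries
nnz( inv(full(T)) )    % the inverse is full: about n^2 entries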
In the following exercises, you will see that constructing the inverse matrix takes
time proportional to n^3 (for an n by n matrix), as does simply solving the system,
but that solving the system is several times faster than constructing the inverse
matrix. You will also see that matrices with special formats, such as tridiagonal
matrices, can be solved very efficiently. And you will see that even simple matrices
can be difficult to invert accurately.
You will be measuring elapsed time in order to see how long these calculations
take. The problem sizes are designed to take a modest amount of time (less than 6
minutes) on newer computers.
WARNING: Most newer computers have more than one processor (core). By
default, Matlab will use all the processors available by assigning one ``thread'' to
each available processor. This procedure will mess up our timings, so you should
tell Matlab to use only a single thread. To do this, you should start up Matlab with
the command line
matlab -singleCompThread
If you do not know how to start up Matlab from the command line, use the
following command at the beginning of your session.
maxNumCompThreads(1);
Matlab will warn you that this command will disappear in the future, but it should
work fine now.
Exercise 4: The Matlab command inv(A) computes an approximation for the
inverse of a matrix A. Do not worry for now just how it does it, but be
assured that it uses one of the most efficient and reliable methods available.
Matlab provides the commands tic and toc to measure computational time.
The tic command starts the stopwatch, and the toc command stops it, either
printing the elapsed time or returning its value as in the
expression elapsedTime=toc;. The times are in seconds.
(Note: tic and toc measure elapsed time. When a computer is doing more
than one thing, the cputime function can be used to measure how much
computer time is being used for this task alone.)
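A minimal timing pattern looks like this (the matrix size here is arbitrary):
tic;                        % start the stopwatch
Ainv = inv(rand(1000));     % some work worth timing
elapsedTime = toc           % seconds elapsed since the matching tic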
1. Copy the following code to a Matlab function m-file
named exer4.m and modify it to produce information for the table
below. Be sure to add comments and your name and the date. Include
the file with your summary.

function elapsedTime=exer4(n)
% elapsedTime=exer4(n)
% comments

% your name and the date

if mod(n,2)==0
  error('Please use only odd values for n');
end

A = magic(n);   % only odd n yield invertible matrices
b = ones(n,1);  % the right side vector doesn't change the time
tic;
Ainv = ???        % compute the inverse matrix
xSolution = ???   % compute the solution
elapsedTime=toc;
2. You have seen in lecture that the time required to invert
an n by n matrix should be proportional to n^3. Fill in the following
table, where the column entitled ``ratio'' should contain the ratio of the
time for n divided by the time for the preceding value of n.
(Note: timings are not precise! Furthermore, the first time a function
is used results in Matlab reading the m-file and ``compiling it,'' and
that takes considerable time. The last row of this table may take
several minutes!)
Remark: Compute the first line of the table twice and use the second
value. The first time a function is called involves substantial overhead
that later calls do not require.

Time to compute inverse matrices
     n       time        ratio
   161    _________
   321    _________   _________
   641    _________   _________
  1281    _________   _________
  2561    _________   _________
  5121    _________   _________
 10241    _________   _________

3. Are these solution times roughly proportional to n^3?
Exercise 5: Matlab provides a special operator, the backslash (\) operator,
that is designed to solve a linear system without computing the inverse. It is
used in the following way, for a matrix A and column vector b.
x=A\b;
It may help you to remember this operator if you think of the A as being
``underneath'' the slash. The effect of this operator is to find the solution of
the system of equations A*x=b.
The backslash looks strange, but there is a method to this madness. You
might wonder why you can't just write x=b/A. This would put the column
vector b to the left of the matrix A, and that is the wrong order of operations.
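For example, with a small made-up system:
A = [ 2 1; 1 3 ];
b = [ 3; 5 ];
x = A\b      % solves A*x = b without forming inv(A)
A*x - b      % the residual; should be zero up to roundoff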
1. Copy exer4.m to exer5.m and replace the inverse matrix computation
and solution with an expression using the Matlab ``\'' command, and
fill in the following table.

Time to compute solutions
     n       time        ratio
   161    _________
   321    _________   _________
   641    _________   _________
  1281    _________   _________
  2561    _________   _________
  5121    _________   _________
 10241    _________   _________

2. Are these solution times roughly proportional to n^3?
3. Compare the times for the inverse and solution and fill in the
following table.

Comparison of times
     n      (time for inverse)/(time for solution)
   161    _________
   321    _________
   641    _________
  1281    _________
  2561    _________
  5121    _________
 10241    _________

4. Theory shows that computation of a matrix inverse should take
approximately three times as long as computation of the solution. Are
your results consistent with theory?
You should be getting the message: ``You should never compute an inverse
when all you want to do is solve a system.''
Warning: the ``\'' symbol in Matlab will work when the matrix A is not square. In
fact, sometimes it will work when A actually is a vector. The results are not usually
what you expect and no error message is given. Be careful of this potential for error
when using ``\''. A similar warning is true for the / (division) symbol because
Matlab will try to ``interpret'' it if you use it with matrices or vectors and you will
get ``answers'' that are not what you intend.
Gauß factorization
The standard method for computing the inverse of a matrix is Gaußian elimination.
You already know this algorithm, perhaps by another name such as row reduction.
First, we will prefer to think of the process as having two separable parts: the
``factorization'' (or ``forward substitution'') step and the ``back substitution'' step.
We now turn to writing code for the Gauß factorization. The discussion in this lab
assumes that you are familiar with Gaußian elimination (sometimes called ``row
reduction''), but is entirely self-contained. Because, however, interchanging rows is
a source of confusion, we will assume at first that no row interchanges are
necessary.
The idea is that we are going to use the row reduction operations to turn a given
matrix A into an upper triangular matrix U (all of whose elements below the main
diagonal are zero) and at the same time keep track of the multipliers in the lower
triangular matrix L. These matrices will be built in such a way that A could be
reconstructed from the product L*U at any time during the factorization procedure.
Alert: The fact that A=L*U holds even before the computation is completed can be a
powerful debugging technique. If you get the wrong answer at the end of a
computation, you can test each step of the computation to see where it has gone
wrong. Usually you will find an error in the first step, where the error is easiest to
understand and fix.
Let's do a simple example first. Consider a 2 by 2 matrix A whose first column is [2; 1]
(the entries of the second column do not matter for this illustration).
There is only one step to perform in the row reduction process: turn the 1 in the
lower left into a 0 by subtracting half the first row from the second. Convince
yourself that this process can be written as
[  1    0 ]
[ -1/2  1 ] * A = U    (2)
where U is the resulting upper triangular matrix. If you want to write A = L*U, you need to
have L be the inverse of the first matrix on the left in (2). Thus
L = [ 1    0 ]
    [ 1/2  1 ]
and the upper triangular matrix U
are matrices that describe the row reduction step (L stores the multiplier 1/2) and its result
(U) in such a way that the original matrix A can be recovered as A = L*U.
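You can check such a factorization numerically. Here is a sketch with a made-up matrix
whose first column is [2; 1] (the second column is an arbitrary choice):
A = [ 2 6; 1 4 ];      % hypothetical example; only the first column matters above
M = [ 1 0; -1/2 1 ];   % the elimination matrix on the left in (2)
U = M*A                % upper triangular: the (2,1) entry becomes 0
L = [ 1 0; 1/2 1 ];    % the inverse of M; stores the multiplier 1/2
L*U                    % recovers A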
The Hilbert matrix
The Hilbert matrix is related to the interpolation problem on the interval [0,1]. The
matrix is given by the formula A(i,j) = 1/(i+j-1). For example,
with n=5 the Hilbert matrix is:
[  1    1/2   1/3   1/4   1/5 ]
[ 1/2   1/3   1/4   1/5   1/6 ]
[ 1/3   1/4   1/5   1/6   1/7 ]
[ 1/4   1/5   1/6   1/7   1/8 ]
[ 1/5   1/6   1/7   1/8   1/9 ]
The Hilbert matrix arises in interpolation and approximation contexts because it
happens that A(i,j) is the integral of x^(i-1) * x^(j-1) over the interval [0,1]. The Hilbert
matrix is at once nice because its inverse has integer elements and also not nice because it is
difficult to compute the inverse accurately using the usual formulæ to invert a matrix.
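Both properties are easy to see with Matlab's built-in hilb and invhilb functions
(introduced just below):
H = hilb(5);
exactInv    = invhilb(5);   % the exact inverse, with integer entries
computedInv = inv(H);       % the inverse computed in floating point arithmetic
norm(computedInv - exactInv,'fro') / norm(exactInv,'fro')
                            % relative error: noticeably larger than machine epsilon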
Now, suppose you have a 5 by 5 Hilbert matrix (switching to Matlab notation) A,
and you wish to perform row reduction steps to turn all the entries in the first
column below the first row into zero, and keep track of the operations in the
matrix L and the result in U. Convince yourself that the following code does the
trick.
Matlab has special functions for the Hilbert matrix and its inverse,
called hilb(n) and invhilb(n), and we will be using these functions in this lab.
n=5;
Jcol=1;
A=hilb(5);
L=eye(n); % square n by n identity matrix
U=A;
for Irow=Jcol+1:n
% compute Irow multiplier and save in L(Irow,Jcol)
L(Irow,Jcol)=U(Irow,Jcol)/U(Jcol,Jcol);
% multiply row "Jcol" by L(Irow,Jcol) and subtract from row "Irow"
% This vector statement could be replaced with a loop
U(Irow,Jcol:n)=U(Irow,Jcol:n)-L(Irow,Jcol)*U(Jcol,Jcol:n);
end
Exercise 6:
1. Use the above code to carry out the row reduction steps for the first
column of the matrix A=hilb(5). Print the resulting matrix U and include it in the
summary report. Check that all entries in the first column in the
second through last rows are zero or roundoff. (If any are non-zero,
you have an error somewhere. Find the error before proceeding.)
2. Multiply L*U and confirm for yourself that it equals A. An easy way to
do this is to compute norm(L*U-A,'fro')/norm(A,'fro'). (As you have
seen, the Frobenius norm is faster to compute than the default 2-norm,
especially for large matrices.)
3. Use the code given above as a model for an m-file
called gauss_lu.m that performs Gaußian reduction without pivoting.
The function should start out

function [L,U]=gauss_lu(A)
% function [L,U]=gauss_lu(A)
% performs an LU factorization of the matrix A using
% Gaussian reduction.
% A is the matrix to be factored.
% L is a lower triangular factor with 1's on the diagonal
% U is an upper triangular factor.
% A = L * U

% your name and the date

4. Use your routine to find L and U for the Hilbert matrix of order 5.
Include L and U in your summary.
5. What is U(5,5), to at least four significant figures?
6. Verify that L is lower triangular (has zeros above the diagonal)
and U is upper triangular (has zeros below the diagonal).
7. Confirm that L*U recovers the original matrix.
8. The Matlab expression R=rand(100,100) will generate
a 100 by 100 matrix of random entries. Do not print this matrix: it has
10,000 numbers in it. Use gauss_lu to find its factors LR and UR.
o Use norms to confirm that LR*UR=R, without printing the
matrices.
o Use tril and triu to confirm that LR is lower triangular
and UR is upper triangular, without printing the matrices.
It turns out that omitting pivoting leaves the function you just wrote vulnerable to
failure.
Exercise 7:
1. Compute L1 and U1 for the matrix

A1 = [ -2  1  0  0  0
        1  0  1 -2  0
        0  0  0  1 -2
        1 -2  1  0  0
        0  1 -2  1  0 ]

The method should fail for this matrix, producing matrices with
infinity (inf) and ``Not a Number'' (NaN) in them.
2. On which step of the decomposition (values of Irow and Jcol) does
the method fail? Why does it fail?
3. What is the determinant of A1? What is the condition number
(cond(A1))? Is the matrix singular or ill-conditioned?
In Gaußian elimination it is entirely possible to end up with a zero on the diagonal,
as the previous exercise demonstrates. Since you cannot divide by zero, the
procedure fails. It is also possible to end up with small numbers on the diagonal
and big numbers off the diagonal. In this case, the algorithm doesn't fail, but
roundoff errors can overwhelm the procedure and result in wrong answers. One
solution to these difficulties is to switch rows and columns around so that the
largest remaining entry in the matrix ends up on the diagonal. Instead of this ``full
pivoting'' strategy, it is almost as effective to switch the rows only, instead of both
rows and columns. This is called ``partial pivoting.''
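For reference, here is a sketch of what one partial pivoting step might look like inside the
Jcol loop of gauss_lu (illustrative only; a complete implementation must also keep the
already-computed columns of L and a record of the interchanges consistent):
% find the largest entry on or below the diagonal in column Jcol
[colMax, shift] = max( abs( U(Jcol:n, Jcol) ) );
pivotRow = Jcol + shift - 1;
if pivotRow ~= Jcol
  U([Jcol pivotRow], :) = U([pivotRow Jcol], :);   % swap the two rows of U
end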