Introduction to Applied Mathematics
Gilbert Strang
Introduction to Applied Math offers a comprehensive introductory treatment of the subject. The author's explanations of Applied Mathematics are clearly stated and easy to understand. The reference includes a wide range of timely topics from symmetric linear systems to optimization as well as illuminating hands-on examples.
Chapter 1: Symmetric Linear Systems; Chapter 2: Equilibrium Equations; Chapter 3: Equilibrium in the Continuous Case; Chapter 4: Analytical Methods; Chapter 5: Numerical Methods; Chapter 6: Initial-Value Problems; Chapter 7: Network Flows and Combinatorics; Chapter 8: Optimization; Software for Scientific Computing.
Publisher: Wellesley-Cambridge Press
INTRODUCTION TO APPLIED MATHEMATICS

GILBERT STRANG
Massachusetts Institute of Technology

WELLESLEY-CAMBRIDGE PRESS
Box 812060, Wellesley, Massachusetts 02181

Introduction to Applied Mathematics. Copyright ©1986 by Gilbert Strang. All rights reserved. No part of this work may be reproduced or stored or transmitted by any means, electronic or mechanical, including photocopying and recording, or by any information storage or retrieval system, without the written permission of the publisher, except for brief passages quoted by a reviewer. Translation into any language is strictly prohibited; authorized translations are arranged.

Designed by Michael Michaud, Unicorn Production Services. Printed in the United States of America.

Published by Wellesley-Cambridge Press, Box 812060, Wellesley MA 02181 USA. Telephone (617) 431-8488, FAX (617) 253-4358, gs@math.mit.edu

QA37.2.S87 1986 510 84-52450
ISBN 0-9614088-0-4

A manual for instructors is available from the publisher.

Other texts from Wellesley-Cambridge Press:
Wavelets and Filter Banks, Gilbert Strang and Truong Nguyen, ISBN 0-9614088-7-1 (1996)
Introduction to Linear Algebra, Gilbert Strang, ISBN 0-9614088-5-5 (1993)
Calculus, Gilbert Strang, ISBN 0-9614088-2-0 (1991)
An Analysis of the Finite Element Method, Gilbert Strang and George Fix, ISBN 0-13-032946-0 (1973)

CONTENTS

1. SYMMETRIC LINEAR SYSTEMS
   1.1 Introduction
   1.2 Gaussian Elimination
   1.3 Positive Definite Matrices and A = LDL^T
   1.4 Minimum Principles
   1.5 Eigenvalues and Dynamical Systems
   1.6 A Review of Matrix Theory

2. EQUILIBRIUM EQUATIONS
   2.1 A Framework for the Applications
   2.2 Constraints and Lagrange Multipliers
   2.3 Electrical Networks
   2.4 Structures in Equilibrium
   2.5 Least Squares Estimation and the Kalman Filter

3. EQUILIBRIUM IN THE CONTINUOUS CASE
   3.1 Introduction: One-dimensional Problems
   3.2 Differential Equations of Equilibrium
   3.3 Laplace's Equation and Potential Flow
   3.4 Vector Calculus in Three Dimensions
   3.5 Equilibrium of Fluids and Solids
   3.6 Calculus of Variations

4. ANALYTICAL METHODS
   4.1 Fourier Series and Orthogonal Expansions
   4.2 Discrete Fourier Series and Convolution
   4.3 Fourier Integrals
   4.4 Complex Variables and Conformal Mapping
   4.5 Complex Integration

5. NUMERICAL METHODS
   5.1 Linear and Nonlinear Equations
   5.2 Orthogonalization and Eigenvalue Problems
   5.3 Semi-direct and Iterative Methods
   5.4 The Finite Element Method
   5.5 The Fast Fourier Transform

6. INITIAL-VALUE PROBLEMS
   6.1 Ordinary Differential Equations
   6.2 Stability and the Phase Plane and Chaos
   6.3 The Laplace Transform and the z-Transform
   6.4 The Heat Equation vs. the Wave Equation
   6.5 Difference Methods for Initial-Value Problems
   6.6 Nonlinear Conservation Laws

7. NETWORK FLOWS AND COMBINATORICS
   7.1 Spanning Trees and Shortest Paths
   7.2 The Marriage Problem
   7.3 Matching Algorithms
   7.4 Maximal Flow in a Network
   7.5 The Transportation Problem

8. OPTIMIZATION
   8.1 Introduction to Linear Programming
   8.2 The Simplex Method and Karmarkar's Method
   8.3 Duality in Linear Programming
   8.4 Saddle Points (Minimax) and Game Theory
   8.5 Nonlinear Optimization

Appendix: Software for Scientific Computing
References and Acknowledgements
Solutions to Selected Exercises
Index

PREFACE

I believe that the teaching of applied mathematics needs a fresh approach. That opinion seems to be widely shared. Many of the textbooks in use today were written a generation ago, and they cannot reflect the ideas (or the algorithms) that have brought such significant change.
Certainly there are things that will never be different—but even for the solution of Laplace's equation a great deal is new. In addition to Fourier series and complex variables, it is time to see fast transforms and finite elements. Topics like stability and optimization and matrix methods have earned a more central place—perhaps at the expense of series solutions of differential equations. Applied mathematics is alive and very vigorous. That ought to be reflected in our teaching.

In my own class I became convinced that the textbook is crucial. It must provide a framework into which the applications will fit. A good course has a clear purpose, and you can sense that it is there. It is a pleasure to teach a subject when it is moving forward, and this one is—but the book has to share in that spirit and help to establish it.

The central topics are differential equations and matrix equations—the continuous and discrete. They reinforce each other because they go in parallel. On one side is calculus; on the other is algebra. Differential equations and their transforms are classical, but still beautiful and essential. Nothing is out of date about Fourier! At the same time this course must develop the discrete analogies, in which potential differences rather than derivatives drive the flow. Those analogies are not difficult; they are basic, and I have found that they are welcome. To see the cooperation between calculus and linear algebra is to see one of the best parts of modern applied mathematics.

I am also convinced that these equations are better understood—they are more concrete and useful—when we present at the same time the algorithms that solve them. Those algorithms give support to the theory. And in general, numerical methods belong with the problems they solve. I do not think that the Fast Fourier Transform and difference equations and numerical linear algebra belong only in some uncertain future course.
It is this course that should recognize what the computer can do, without being dominated by it.

Perhaps I may give special mention to the role of linear algebra. If any subject has become indispensable, that is it. The days of explaining matrix notation in an appendix are gone; this book assumes a working knowledge of matrices. It begins with the solution of Ax = b—not by computing A^{-1}, but by recognizing that as elimination goes to an upper triangular matrix, A is being factored into LU or LDL^T. A fuller treatment is given in my textbook Linear Algebra and Its Applications (3rd edition, HBJ, 1988). Here we start with the essential facts of linear algebra, and put them to use.

I do not think that the right approach is to model a few isolated examples. The important goal is to find ideas that are shared by a wide range of applications. This is the contribution a book can make, to recognize and explain the underlying pattern. We need to present Kirchhoff's laws, and to see how they lead in the continuous case to the curl and divergence, but we hope to avoid a total and fatal immersion into vector calculus—which has too frequently replaced applied mathematics, and taken all the fun out of it.

It seems natural for the discrete case to come first, but Chapter 1 should not be too slow. It is Chapter 2 that points out the triple product A^T CA in the equations of equilibrium, and Chapter 3 finds that framework repeated by differential equations. Applications and examples are given throughout. Where the theory strengthens the understanding, it is provided—but this is not a book about proofs.

After the equations are formulated they need to be solved. Chapter 4 uses Fourier methods and complex variables. Chapter 5 uses numerical methods. It presents scientific computing as an integral part of applied mathematics, with the same successes (especially through Fourier) and the same difficulties (including stability in Chapter 6).
Orthogonality is central, as it is for analytical methods.

The book is a text for applied mathematics and advanced calculus and engineering mathematics. It aims to explain what is essential, as far as possible within one book and one year. It also reaches beyond the usual courses, to introduce more recent ideas:

2.5 The Kalman filter
5.3 Iterative methods for Ax = b and Ax = λx
5.4 Finite elements
6.2 Chaos and strange attractors
6.6 Shock waves and solitons
7.1 Combinatorial optimization
7.4 Maximal flows and minimal cuts in networks
8.2 Karmarkar's method for linear programming

You will not have time for all of these; I do not. Those that are unfamiliar may stay that way; no objection. Nevertheless they are part of this subject and they fit perfectly into its framework—even Karmarkar's method has a place, whether or not it fulfills everything that has been promised. (It outdoes the simplex method on some problems but by no means on all.) It solves the piecewise linear equations of mathematical programming through a series of linear equations, whose coefficient matrices are again A^T CA. Finite elements and recursive filters and Laplace's equation fit this same pattern. In my experience the framework is seen and understood by the reader, and appreciated. The whole subject is extremely coherent, and sections or chapters which are left for reference are no less valuable on that account.

In fact this is also a textbook on numerical analysis (with applications included in the course, not separated) and on optimization (emphasizing the quadratic problems of Chapter 2, the network flows of Chapter 7, and the duality theory of Chapter 8). The book starts with ordinary differential equations and makes the transition to partial differential equations; that is not a great obstacle. It presents the minimum principles of least energy and least action, which are more subtle than equations and sometimes more revealing.
But the emphasis must go to equilibrium equations (boundary-value problems) and dynamic equations (initial-value problems). Those are the central questions, continuous and discrete.

Applied mathematics is a big subject, and this is not a short book. Its goal is to be as useful as possible to the reader, in class and out. Engineering and scientific applications play a larger part than before; infinite series play a smaller part. Those series need to give way, in teaching as they have done in practice, to a more direct approach to the solution. We intend to try—working always with specific examples—to combine the algorithms with the theory. That is the way to work and I believe it is also the way to learn. The effort to teach what students will need and use is absolutely worthwhile.

A personal note before the book begins. These years of writing have been a tremendous pleasure, and one reason was the help offered by friends. It was shamelessly accepted, and what can I do in return? A public acknowledgement is made at the end, of my gratitude for their encouragement. It is what keeps an author going.

In one respect this is a special adventure. I decided that it must be possible to see the book all the way through. I care too much about the subject to mail in a manuscript and say goodbye. Therefore Wellesley-Cambridge Press was created, in order to do it right. The book was printed in the normal way, and it should be handled more efficiently than usual. Bookstores (and individual readers) will need a new address and telephone number, given below. No representative will call! I have to depend on those who enjoy it to say so. More than ever it must stand on its own—if it can. I hope you like the book.

Gilbert Strang

Wellesley-Cambridge Press: Box 812060, Wellesley MA 02181, (617) 431-8488
M.I.T.: Room 2-240, Cambridge MA 02139, (617) 253-4383

CHAPTER 1 IN OUTLINE: SYMMETRIC LINEAR SYSTEMS

1.1 Introduction
1.2 Gaussian Elimination—linear equations by rows and by columns
    Geometry of the Columns—independent vs. dependent columns
    Elimination and Pivots—forward elimination and back substitution
    Factorization into A = LU—two triangular systems Lc = b and Ux = c
1.3 Positive Definite Matrices and A = LDL^T—positive pivots in D
    Matrices of Order n = 2—pivots d1 = a and d2 = c − b^2/a
    The Minimum of a Quadratic—positive definite if x^T Ax > 0
    Test for a Minimum: n = 2—positive pivots appear in the squares
    Elimination = Completing the Square—each step removes M = l d l^T
    Elimination = Factorization—A = l1 d1 l1^T + ··· + ln dn ln^T = LDL^T
    Matrix Multiplication—rows times columns or columns times rows
    Block Matrices—multiplication and inversion by blocks
1.4 Minimum Principles—the minimum of a quadratic P = (1/2) x^T Ax − x^T b
    Positive Definiteness of A^T A and A^T CA—independent columns in A
    Least Squares Solution of Ax = b—the normal equation A^T Ax = A^T b
    Fitting Measurements by a Straight Line—replacing Ax = b by Ax = p
    Failure of Independence in the Columns of A—the pseudoinverse A^+
    Minimum Potential Energy—Ax = e and Ce = y and A^T y = f
    The Stiffness Matrix—K = A^T CA is tridiagonal and K^{-1} is positive
1.5 Eigenvalues and Dynamical Systems—Ax = λx, det(A − λI) = 0
    The Diagonal Form—S^{-1} AS = Λ with eigenvectors in S, eigenvalues in Λ
    Differential Equations—solutions e^{λt} x for du/dt = Au
    Second-Order Equations—solutions e^{iωt} x for u_tt + Au = 0
    Single Equations of Higher Order—solutions e^{λt} for u_tt + p u_t + q u = 0
    Symmetric Matrices—real eigenvalues, orthogonal eigenvectors, A = QΛQ^T
1.6 A Review of Matrix Theory—Ax = b for b in the column space of A
    Incidence Matrices—Kirchhoff's laws and the four subspaces
    Factorizations Based on Elimination—A = LDU, PA = LDU, PA = LU
    Factorizations
Based on Eigenvalues—A = SΛS^{-1} and A = MJM^{-1}
    Factorizations Based on A^T A—A = QR, A = Q1ΣQ2^T, A = QS
    Review of the Review—rank one matrices A = vw^T

1 SYMMETRIC LINEAR SYSTEMS

1.1 INTRODUCTION

The simplest model in applied mathematics is a system of linear equations. It is also by far the most important, and we begin this book with an extremely modest example:

    2x1 +  4x2 = 2
    4x1 + 11x2 = 1.

Such a system, with two equations in two unknowns, is easy to solve. But in discussing this particular problem we have four more general goals in mind, and they can be described in advance:

(1) to establish the method of Gaussian elimination for solving a more complicated system, of n equations in n unknowns
(2) to express the solution technique as a matrix factorization
(3) to explain the underlying geometry
(4) to recognize the associated minimum principle.

Part of this will already be familiar. Therefore we mention one thing that is likely to be new: it is the fact that matrix multiplication can be interpreted in four different ways, and that each of those interpretations contributes to the analysis of a linear system. The one which is least known turns out to be surprisingly useful; it multiplies column vectors by row vectors (and produces a full matrix).

The original equations can be written in matrix notation as

    [2  4] [x1]   [2]
    [4 11] [x2] = [1]

and the solution could be dismissed as x = A^{-1}b. To leave it there would be completely wrong. In fact, it is very rarely that the inverse of a matrix needs to be computed. A^{-1} makes the formula easy but it is better to exclude it from the calculations. The real solution to a linear system comes from Gaussian elimination, which has the remarkable effect of splitting A into a product of triangular matrices. For this example the product is A = LDL^T. A lower triangular matrix L multiplies its transpose (which is upper triangular), with a positive diagonal matrix D in between. That is the first of the triple products which dominate this subject.
It is the one that comes from the solution of a linear problem, while the second comes from the formulation of the problem. Perhaps I could also write down the third, which comes from the dynamics (when the problem changes with time):

1. LDL^T: triangular L from elimination, diagonal D from the pivots
2. A^T CA: rectangular A from the geometry, square C from the material properties
3. QΛQ^T: orthogonal Q from the eigenvectors, diagonal Λ from the eigenvalues

The second one is absolutely crucial. A^T CA is present in all the central problems of equilibrium, whether they are discrete (with matrix equations) or continuous (with differential equations). The first examples come in Section 1.4; then the following chapters present the complete framework. It is the key to the applications.

Later, in Section 1.5, there is the third factorization A = QΛQ^T. It is more delicate, because it comes not from Ax = b but from the eigenvalue problem Ax = λx. It depends, as the LDL^T factorization did, on the symmetry of A. The same coefficient 4 multiplies x2 in the first equation and x1 in the second equation, so that A equals its own transpose. (In matrix notation, A = A^T.) And the fact that we have not only a symmetric problem but also a minimum problem must be reflected in the positivity of the eigenvalues.

In part this chapter is a summary of linear algebra. It will include matrices that are not symmetric; Section 1.6 describes the changes that come when symmetry is lost. That section also admits rectangular matrices; A can be an m by n matrix of rank r. The list of factorizations is extended beyond the two that come from elimination and eigenvalues, to give a concise but more complete picture of matrix theory. Most of the chapter is not, however, a review. It moves directly to the special class of matrices, symmetric and positive definite, which are at the center of applied mathematics.
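The first of those triple products can be checked concretely on the 2 by 2 example that opened the chapter. The short program below is a sketch (in Python, assumed here rather than taken from the book): it records the multipliers of elimination in L and the pivots in D, so that A = L D L^T.

```python
# A minimal LDL^T sketch (not the book's code).  For the symmetric matrix
# A = [[2, 4], [4, 11]], elimination gives the multiplier l21 = 2 and the
# positive pivots 2 and 3; multiplying back L * D * L^T recovers A.

def ldlt(A):
    """Factor a symmetric matrix with nonzero pivots into L, D (A = L D L^T)."""
    n = len(A)
    A = [row[:] for row in A]          # work on a copy
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    D = [0.0] * n
    for k in range(n):
        D[k] = A[k][k]                 # the k-th pivot
        for i in range(k + 1, n):
            L[i][k] = A[i][k] / D[k]   # multiplier l_ik
            for j in range(k, n):
                A[i][j] -= L[i][k] * A[k][j]
    return L, D

L, D = ldlt([[2.0, 4.0], [4.0, 11.0]])
print(L)   # [[1.0, 0.0], [2.0, 1.0]]
print(D)   # [2.0, 3.0]
```

The pivots 2 and 3 are positive, which is the test of Section 1.3 for a positive definite matrix.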
Those matrices describe a condition of equilibrium, and they are associated with a minimum of energy—because that is what nature seeks. They also describe oscillations around that equilibrium, when the energy is conserved. Matrices of this kind are linked directly to minimum principles. They extend into n-dimensional space the familiar test from calculus, that where a function has a minimum its second derivative should be positive. Of course the first derivative should be zero; that is the equation Ax = b. Thus the solution to a linear system is also the point at which the energy is minimized. Throughout this book you are faced with that choice, between equations and minimum principles—fortunately it is possible to have both, since they are equivalent statements of the same law:

    minimize (1/2) x^T Ax − x^T b    or    solve Ax = b.

For those who do not want this fast course on matrix theory—or for those who want an even faster course—the main connections to later sections can be given at once. They are

    LDL^T:  elimination (1.2) leading to direct methods for linear systems (5.1)
    A^T CA: least squares (1.4) leading to filters (2.5) and Karmarkar's method (8.2)
    A^T CA: elastic springs (1.4) leading to structural mechanics (2.4)
    A^T CA: incidence matrices (1.6) leading to networks (2.3) and graphs (7.1)
    QΛQ^T:  diagonalization (1.5) leading to the convolution rule (4.2)

These are only the discrete applications. There is a corresponding series of problems in differential equations, in equilibrium and in evolution. They also come from A^T CA or from minimum principles. They are studied analytically by diagonalization (Fourier series) or numerically by discretization (finite differences and finite elements). Therefore the discrete problems come first. The key steps are indicated by gray shading in the text. Beyond that, a number of textbooks offer a fuller treatment of linear algebra and its applications. One that we are familiar with is mentioned in the list of references.
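That equivalence between the minimum principle and the equation can be tested numerically on the 2 by 2 example. The following is a hedged sketch, not the book's code: the energy P(x) = (1/2) x^T Ax − x^T b takes its smallest value at the solution x = (3, −1) of Ax = b, and sampling nearby points only increases it.

```python
# Checking "minimize (1/2) x^T A x - x^T b  <=>  solve Ax = b" on the
# example A = [[2, 4], [4, 11]], b = (2, 1).  (Minimization is verified
# here by sampling, not proved.)

A = [[2.0, 4.0], [4.0, 11.0]]
b = [2.0, 1.0]

def P(x):
    quad = sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))
    return 0.5 * quad - (b[0] * x[0] + b[1] * x[1])

xstar = [3.0, -1.0]            # solves Ax = b
print(P(xstar))                # -2.5

# stepping away from the solution only increases P (A is positive definite)
for dx1, dx2 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.2, -0.3)]:
    assert P([xstar[0] + dx1, xstar[1] + dx2]) > P(xstar)
```

The increase away from x* is exactly (1/2) d^T A d for a step d, which is positive whenever A is positive definite; that is the subject of Section 1.3.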
We start with Gaussian elimination, using examples that are symmetric positive definite. Then the properties of those examples emerge at the same time as the general pattern of elimination (which only cares that the matrix is invertible). We emphasize that although there are other tests for invertibility and other algorithms for solving linear systems—Cramer's rule is the most notorious, based on determinants—it is elimination that is constantly used in applications. For a small matrix that choice probably doesn't matter. For a large matrix it certainly does. For really large problems even elimination is expensive, and it becomes a combinatorial problem to reduce the number of steps; graph theory will help. But in each case the steps of elimination expose part of the inner structure, and the eigenvalues expose another part, and we want to see their relation.

1.2 GAUSSIAN ELIMINATION

We begin with the geometry. One approach is to look at the two equations separately: first 2x1 + 4x2 = 2 and then 4x1 + 11x2 = 1. Each equation is represented by a straight line and those lines intersect at the solution. In our example one line contains every point (x1, x2) for which 2x1 + 4x2 = 2. The other line represents 4x1 + 11x2 = 1, its slope is −4/11, and it meets the first line at the point (3, −1). Therefore the solution is x1 = 3, x2 = −1. That is the only point on both lines and therefore the only solution to both equations. (There is a dotted line in Fig. 1.1, also going through the solution. We will see that it comes from elimination, which produces a simpler equation but leaves the solution unchanged.)

The problem has a second geometric interpretation, less familiar but equally important. It comes from looking at the columns of the system instead of the rows. We can rewrite the original problem in vector notation as

    x1 [2] + x2 [ 4] = [2]                                    (1)
       [4]      [11]   [1]

meaning that we want to find the combination of the two vectors on the left which equals the vector on the right.
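The combination asked for in (1) can be verified in a few lines. This is a small sketch (Python, assumed here), using the weights x1 = 3, x2 = −1 that the row picture already found.

```python
# Column picture of equation (1): the weights x1 = 3, x2 = -1 combine the
# columns (2, 4) and (4, 11) into the right side (2, 1).

col1, col2 = (2, 4), (4, 11)
x1, x2 = 3, -1
combo = (x1 * col1[0] + x2 * col2[0],
         x1 * col1[1] + x2 * col2[1])
print(combo)   # (2, 1)
```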
These are vectors in 2-dimensional space, which is a plane, and they are drawn in Fig. 1.1. To multiply a vector by 3, we triple its length:

    3 [2] = [ 6]
      [4]   [12]

[Fig. 1.1. Row picture and column picture of the geometry: the lines 2x1 + 4x2 = 2 and 4x1 + 11x2 = 1 meet at x1 = 3, x2 = −1.]

When a vector is multiplied by −1, its direction is reversed:

    (−1) [ 4] = [ −4]
         [11]   [−11]

To add vectors, we can describe the result either geometrically by shifting one vector to start where the other one ends, or else algebraically by just adding the corresponding components:

    [ 6] + [ −4] = [2]
    [12]   [−11]   [1]

Both methods lead to the same conclusion, that the right weights are x1 = 3, x2 = −1.

Our second example is in four dimensions:

    2x1 +  x2             = 2
     x1 + 2x2 +  x3       = 1                                 (2)
           x2 + 2x3 +  x4 = 4
                 x3 + 2x4 = 8

The first approach again looks separately at each equation. The graph of 2x1 + x2 = 2 is no longer a line; instead, since x3 and x4 can be chosen arbitrarily, it is a three-dimensional surface in four-dimensional space.† It is harder to visualize, but in some four-dimensional sense it must be flat. It is called a plane, or sometimes a hyperplane, to indicate that it is determined by one linear equation (like an ordinary plane in three dimensions). It has one fewer degree of freedom than the full four-dimensional space. It is perpendicular to the vector (2, 1, 0, 0), and although it is three-dimensional (it would look solid to us) it is a very thin set in four-dimensional space.

The second equation is similar: x1 + 2x2 + x3 = 1 describes another hyperplane. It intersects the first in a two-dimensional surface, still flat and still in four dimensions. Then the third hyperplane x2 + 2x3 + x4 = 4 intersects that surface in a line. Note that the line does not go through the origin, and is not described by a single equation; in four dimensions, it requires three equations to determine a line. Finally the hyperplane from the fourth equation intersects this line in a point, which is the solution.
That completes one picture of the geometry behind a linear system; it is a family of intersecting hyperplanes. In n dimensions there are n planes, and their intersection is the solution to the linear equations.

† The author has tried to draw it, and failed.

Geometry of the Columns: Invertible vs. Singular

The second approach to the geometry, by columns instead of rows, starts from

    x1 [2] + x2 [1] + x3 [0] + x4 [0]   [2]
       [1]      [2]      [1]      [0] = [1]                   (3)
       [0]      [1]      [2]      [1]   [4]
       [0]      [0]      [1]      [2]   [8]

This is still Ax = b. The left side is a "column at a time" multiplication of A times x:

    Ax = [2 1 0 0] [x1]
         [1 2 1 0] [x2] = x1(column 1) + x2(column 2) + x3(column 3) + x4(column 4)
         [0 1 2 1] [x3]
         [0 0 1 2] [x4]

The product Ax is a combination of the four columns of A.‡ To solve Ax = b is to find the combination of the four vectors on the left of (3) which produces the vector b on the right. In this example the combination is especially simple; b is the first column plus four times the last column. Thus the solution is (x1, x2, x3, x4) = (1, 0, 0, 4). It is also the point where the four hyperplanes intersect, in the first approach.

Now we consider a more unpleasant possibility, in which both approaches fail. Suppose the hyperplanes do not intersect in a point. Alternatively, suppose the combinations of the columns of A cannot produce the right side b. This is the degenerate case, or singular case. It is typified by a matrix in which the third row is the sum of the first two rows:

    Ax = [1 1 1] [x1]   [0]
         [1 2 3] [x2] = [1]                                   (4)
         [2 3 4] [x3]   [5]

Each equation represents an ordinary plane in three dimensions. The first plane is x1 + x2 + x3 = 0, perpendicular to the vector (1, 1, 1). This plane goes through the origin, since the point x1 = x2 = x3 = 0 satisfies the equation. It intersects the second plane x1 + 2x2 + 3x3 = 1 in a line. However that line does not touch the third plane—the plane is parallel to the line and they never meet. There is no point on all three planes, and no solution to Ax = b.
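The "column at a time" rule above can be written out directly for the four-dimensional example. This is an illustration in Python (assumed here, not the book's notation): Ax is assembled as x1(column 1) + ... + x4(column 4), and with x = (1, 0, 0, 4) the combination is column 1 plus four times column 4.

```python
# Column-at-a-time multiplication Ax for the tridiagonal example,
# checked against the right side b = (2, 1, 4, 8).

A = [[2, 1, 0, 0],
     [1, 2, 1, 0],
     [0, 1, 2, 1],
     [0, 0, 1, 2]]
x = [1, 0, 0, 4]

b = [0, 0, 0, 0]
for j in range(4):          # add x_j times column j of A
    for i in range(4):
        b[i] += x[j] * A[i][j]
print(b)   # [2, 1, 4, 8]
```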
Algebraically that conclusion is reached by adding the first two equations. We get 2x1 + 3x2 + 4x3 = 1, which contradicts the third equation. Algorithmically the conclusion comes from the failure of elimination, which would lead in a systematic way to 0 = 4. Here we stay with the geometry, to visualize this singular case.

In the "row at a time" approach the three planes do not meet. The problem is not that two planes are parallel; it is one step more subtle. Viewed from the right direction the planes form a triangle (Fig. 1.2). Since they have no point in common, the equations are inconsistent. In the "column at a time" approach, the three columns of A lie in the same plane. Their combinations do not fill 3-dimensional space, and they cannot produce the right side b = (0, 1, 5). That vector b is not a combination of the columns, and there is no solution.

[Fig. 1.2. Dependent rows and dependent columns: three planes with no common point; three columns in a common plane.]

Note one important point: Because there were only 2 independent rows, there were only 2 independent columns. We know that row 3 is the sum of rows 1 and 2. Therefore some combination of the columns will also give zero, and the columns lie in a plane. The combination can be found by solving Ax = 0; it is x = (1, −2, 1). The first and third columns in Fig. 1.2 add to twice the second column.

Of course the equations are not always inconsistent. If b3 = 5 is changed to b3 = 1, then the third equation is the sum of the first two. It contains no new information, but it does not destroy the system. In that case there is a whole line of solutions lying on all three planes, and the system is underdetermined. In the other figure, the right side b = (0, 1, 1) falls into the plane of the columns. It is a combination with weights x1 = −1, x2 = 1, x3 = 0, but we can add any multiple of (1, −2, 1) to those weights and find another solution.

‡ I cannot emphasize that too strongly; it is simple but fundamental.
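The two dependencies in the singular example, among the rows and among the columns, can both be checked in a short sketch (Python, not from the book).

```python
# The singular example (4): row 3 is the sum of rows 1 and 2, and the
# matching column dependency is the null vector x = (1, -2, 1).

A = [[1, 1, 1],
     [1, 2, 3],
     [2, 3, 4]]

# dependent rows: row 3 = row 1 + row 2
assert all(A[2][j] == A[0][j] + A[1][j] for j in range(3))

# dependent columns: Ax = 0 for x = (1, -2, 1)
x = [1, -2, 1]
Ax = [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]
print(Ax)   # [0, 0, 0]
```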
Now we go back to the nonsingular case, where A is invertible and there is a unique solution x = A^{-1}b—which is found not from that formula but from the most basic algorithm in scientific computation.

Elimination and Pivots

Geometry can clarify the underlying problem but it is algebra that solves it. We need a systematic method to compute the solution, and the key idea is simple. The size of the problem is progressively reduced, by eliminating one variable after another, until only the last unknown xn is left. That unknown is determined by a single equation dn xn = cn, which is trivial: xn = cn/dn. Then, by going back to each of the eliminated variables, we solve for the remaining unknowns xn−1, . . . , x1 in reverse order. The final result is a straightforward and efficient algorithm for the solution of n simultaneous linear equations in n unknowns. It is known as Gaussian elimination.

For n = 2 it is deceptively simple. We start with

    2x1 +  4x2 = 2
    4x1 + 11x2 = 1.

The upper left entry 2 is called the pivot. Some multiple of the pivot will eliminate the entry beneath it, and in this case the multiple is two; we subtract twice equation (1) from equation (2) to arrive at the new system

    2x1 + 4x2 =  2
          3x2 = −3.                                           (5)

The second pivot has now appeared; it is 3, the coefficient of x2 on the diagonal after elimination. Normally we would use this pivot d2 to eliminate x2 from the later equations, but in this case there are no later equations and elimination is complete. The second equation is d2 x2 = c2, or 3x2 = −3, and it is this equation which corresponds to the dotted line in Fig. 1.1. It gives x2 = −1, and after substituting that answer into the first equation we find x1 = 3.

This is a first example of a triangular system, so called from the shape of the left hand side in (5). The process of solving it, from the bottom to the top, is called back substitution.
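The two steps just described, one elimination and one back substitution, can be written out explicitly. This is a sketch in Python (assumed here), following the text's arithmetic exactly.

```python
# Elimination and back substitution for the n = 2 example.

a11, a12, b1 = 2.0, 4.0, 2.0
a21, a22, b2 = 4.0, 11.0, 1.0

m = a21 / a11                # the multiple is two: subtract twice row 1
a22 -= m * a12               # second pivot: 11 - 2*4 = 3
b2  -= m * b1                # right side:    1 - 2*2 = -3

x2 = b2 / a22                # 3 x2 = -3   ->  x2 = -1
x1 = (b1 - a12 * x2) / a11   # 2 x1 = 2 - 4(-1) = 6  ->  x1 = 3
print(x1, x2)   # 3.0 -1.0
```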
Now consider the example with n = 4 equations:

    Ax = [2 1 0 0] [x1]   [2]
         [1 2 1 0] [x2] = [1] = b.
         [0 1 2 1] [x3]   [4]
         [0 0 1 2] [x4]   [8]

For the present we ignore the right side b, the "inhomogeneous term" in the equation. The elimination steps are completely decided by the matrix A, as we reduce it one column at a time to a triangular matrix. The same steps are of course applied to b, to maintain equality, but they can be done afterwards. The immediate goal is to simplify A.

The first pivot is d1 = 2, and we use it to eliminate x1 from the remaining equations. In this problem, x1 is already absent from equations (3) and (4). Therefore we subtract from equation (2) the multiple l21 = 1/2 of equation (1):

    [2 1 0 0]    [2  1   0 0]
    [1 2 1 0] →  [0 3/2  1 0]
    [0 1 2 1]    [0  1   2 1]
    [0 0 1 2]    [0  0   1 2]

This time the second pivot is d2 = 3/2, and we use it to eliminate x2 from the third equation. The required multiple is l32 = 2/3, and subtraction produces a zero in row 3, column 2:

    [2  1   0  0]    [2  1   0  0]
    [0 3/2  1  0] →  [0 3/2  1  0]
    [0  1   2  1]    [0  0  4/3 1]
    [0  0   1  2]    [0  0   1  2]

The final step uses the third pivot d3 = 4/3, and multiplies it by l43 = 3/4, in order to annihilate the last nonzero entry below the diagonal. The subtraction 2 − 3/4 = 5/4 gives the fourth pivot and leaves the final triangular form

    U = [2  1   0   0 ]
        [0 3/2  1   0 ]
        [0  0  4/3  1 ]
        [0  0   0  5/4]

This matrix U is upper triangular, with the pivots on its diagonal. Note that the pivots are not the original diagonal entries of A; they are not known until elimination discovers them. The forward elimination is now completed for A. To keep the four equations correct, we apply the same steps to the right side b:

    [2]                          [2]                          [2]
    [1]  subtract 1/2 (row 1) →  [0]  subtract 3/4 (row 3) →  [0]
    [4]      from row 2          [4]      from row 4          [4]
    [8]                          [8]                          [5]

The last step subtracts l43 times the third row from the fourth, and produces 8 − (3/4)(4) = 5. The new system of equations will be Ux = c. The accident of a zero entry meant that the second step produced no change; that is not very important. What is much more important is to recognize the effect of the zeros in the original matrix A. That matrix is tridiagonal.
It has three zeros below the diagonal, all outside the "band" of nonzeros. As a result, three of the normal elimination steps were unnecessary; the multipliers l31, l41, and l42 vanished since the entries in those locations were already zero (and remained zero). Above the band a similar property holds; the zero entries in A are still there in U. Most large matrices that arise in practice are sparse, and it will be a fundamental question to preserve as much as possible the sparsity of A; in this case it was entirely preserved. Elimination has taken us from A to U and from b to c. The systems Ax = b and Ux = c have the same solution, since equality was maintained at each step by applying the same operations to A and to b. These operations are completely reversible; we can return from Ux = c to Ax = b, so that no solutions are created or destroyed. Therefore the problem is now governed by a triangular matrix,

    Ux = [ 2  1   0   0  ] [x1]   [2]
         [ 0 3/2  1   0  ] [x2] = [0] = c,
         [ 0  0  4/3  1  ] [x3]   [4]
         [ 0  0   0  5/4 ] [x4]   [5]

and it is this system that back substitution will solve. The last equation is (5/4)x4 = 5, and thus x4 = 4. Then the third equation becomes (4/3)x3 + x4 = 4, or (4/3)x3 = 0, or x3 = 0. Continuing upwards we have x2 = 0, x1 = 1, and x = (1, 0, 0, 4) as expected.

Remark. Some descriptions of elimination go beyond U, to remove any nonzeros that are above the diagonal. With enough operations the original A can be changed into a diagonal matrix, and even into the identity matrix. This is Gauss-Jordan elimination. It is not used in practice, because there is no reason to pay its extra cost. It does lead to the inverse matrix, but normally A^{-1} is not wanted or needed. All we require is x = A^{-1}b, which comes from back substitution.

Factorization into A = LU

Back substitution solves Ux = c. It is easy because U is triangular; that is the object of elimination. It is the second half of the action on the right side of Ax = b, and we now look back at the first half—the step from b to c.
This was "forward elimination"; the subtractions that are applied to A on the left side are also applied to b on the right. Those subtractions change A and b to U and c, and the key to elimination is to put them into matrix form. The steps were listed in equation (6). Multiples l_ij of one component were subtracted from a later component. Our whole point is that if these multipliers l_ij are put into a matrix L, then the steps from b to c are exactly the same as solving Lc = b. That is the matrix form of forward elimination, and in this case the multipliers are 1/2, 2/3, and 3/4:

    Lc = [  1   0   0   0 ] [c1]   [2]
         [ 1/2  1   0   0 ] [c2] = [1] = b.   (7)
         [  0  2/3  1   0 ] [c3]   [4]
         [  0   0  3/4  1 ] [c4]   [8]

We claim that (7) is the same as (6). Suppose the right side b is given and we are solving for c. Certainly the first component c1 = 2 comes from b1 = 2. Look at the second equation. To find c2 = 0 we subtract half of c1 from b2 = 1; that agrees with the first elimination step. To find c3 we subtract 2/3 of c2 from b3, and again that is correct. Finally c4 = 5 comes from b4 = 8 by subtracting (3/4)c3 = 3. If we turn away from these numbers, and concentrate on the matrix form, the pattern is remarkable. The forward half of elimination is governed by the triangular matrix L. The backward half is governed by U. The effect of elimination is to break Ax = b into two triangular systems, first Lc = b and then Ux = c. The matrices L and U come from elimination on the left side of the original equation; L contains the multipliers and U contains the end result. They are the output from A, and then x is the output from b. A good code separates Gaussian elimination into these two subroutines:

1. Factor (to compute L and U from A)
2. Solve (to compute x from L and U and b).

This applies to unsymmetric as well as symmetric matrices. There may also be row exchanges, which are never needed for positive definite problems. What is most important is the product of L and U; it turns out to equal the original A.
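That product can be checked directly for the tridiagonal example (a sketch; the factors are written out by hand, not computed by a library):

```python
# Hand-written factors of the tridiagonal example: multipliers 1/2, 2/3, 3/4
# below the diagonal of L, pivots 2, 3/2, 4/3, 5/4 on the diagonal of U.
L = [[1, 0, 0, 0],
     [1/2, 1, 0, 0],
     [0, 2/3, 1, 0],
     [0, 0, 3/4, 1]]
U = [[2, 1, 0, 0],
     [0, 3/2, 1, 0],
     [0, 0, 4/3, 1],
     [0, 0, 0, 5/4]]

# Multiply L times U by the usual row-times-column rule.
LU = [[sum(L[i][k] * U[k][j] for k in range(4)) for j in range(4)]
      for i in range(4)]

A = [[2, 1, 0, 0], [1, 2, 1, 0], [0, 1, 2, 1], [0, 0, 1, 2]]
assert all(abs(LU[i][j] - A[i][j]) < 1e-12 for i in range(4) for j in range(4))
```

Notice that the band is preserved: the zeros of A outside the three central diagonals reappear as zeros of L and U.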
1A Elimination goes from Ax = b to the triangular system Ux = c. The step from b to c is another triangular system Lc = b, where L contains the multipliers used in elimination (and 1's on the diagonal). The two equations combine to give LUx = Lc = b, so that

    A = LU.   (8)

Elimination factors A into L times U. The example illustrates this matrix equation LU = A:

    [  1   0   0   0 ] [ 2  1   0   0  ]   [ 2 1 0 0 ]
    [ 1/2  1   0   0 ] [ 0 3/2  1   0  ] = [ 1 2 1 0 ]
    [  0  2/3  1   0 ] [ 0  0  4/3  1  ]   [ 0 1 2 1 ]
    [  0   0  3/4  1 ] [ 0  0   0  5/4 ]   [ 0 0 1 2 ]

Elimination went from A to U by subtractions. It returns from U to A by undoing those subtractions, and adding back l_ij times row j of U to row i. Normally the multipliers fill a lower triangular matrix L, but in the case displayed above L and U and A are band matrices. The factors of a tridiagonal matrix are bidiagonal. This factorization will be studied again in the next section, to make a special point about symmetry. L is related directly to U, after you divide the rows of U by the pivots. (In other words, multiply the four rows by 1/2, 2/3, 3/4, 4/5.) Then the diagonal entries are 1 and the upper triangular matrix becomes L^T. It is the transpose of L. This creates three factors instead of two, but now they preserve the symmetry: A = LDL^T, with the same L, the diagonal matrix of pivots D = diag(2, 3/2, 4/3, 5/4), and the transpose L^T. That is the triple factorization A = LDL^T. Before ending this section we must emphasize the practical importance of A = LU: Once L and U are known, every new right side b involves only two triangular systems Lc = b and Ux = c (n^2 steps). The elimination on A is not repeated, and A^{-1} is not needed. Suppose the components of b are changed to 0, 0, 4, 8. Forward elimination in Lc = b leads to c = (0, 0, 4, 5). Then back substitution gives x = (0, 0, 0, 4). This solution is correct, since 4 times the last column of A does equal b. The essential point is that every vector b can be processed in the same extremely fast way.
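Processing a new right side by the two triangular systems takes only a few lines (a sketch; L and U are again written out by hand):

```python
# Solve Lc = b forward and Ux = c backward for the new right side (0, 0, 4, 8).
L = [[1, 0, 0, 0], [1/2, 1, 0, 0], [0, 2/3, 1, 0], [0, 0, 3/4, 1]]
U = [[2, 1, 0, 0], [0, 3/2, 1, 0], [0, 0, 4/3, 1], [0, 0, 0, 5/4]]
b = [0, 0, 4, 8]

c = b[:]
for i in range(4):            # forward elimination: top to bottom
    for j in range(i):
        c[i] -= L[i][j] * c[j]

x = c[:]
for i in range(3, -1, -1):    # back substitution: bottom to top
    for j in range(i + 1, 4):
        x[i] -= U[i][j] * x[j]
    x[i] /= U[i][i]

print(c)   # forward elimination gives c = (0, 0, 4, 5)
print(x)   # back substitution gives x = (0, 0, 0, 4)
```

The factorization was not repeated; only the two triangular solves touch the new b.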
In the next section we see how the relation of U to L recovers the symmetry of A, and how positive pivots are linked to a minimization.

EXERCISES

1.2.1 Solve

    [ 1 1 1 ] [x1]   [2]
    [ 1 3 3 ] [x2] = [0]
    [ 1 3 5 ] [x3]   [2]

using elimination to reach Ux = c and then back substitution to compute x.

1.2.2 In the preceding problem, describe the graph of the second equation x1 + 3x2 + 3x3 = 0. Find its line of intersection with the first surface x1 + x2 + x3 = 2, by giving two points on the line.

1.2.3 Solve Ax = b with the same tridiagonal A as in the text, but a different right side b. Without repeating the steps from A to U, apply them only to b: solve Lc = b. Then solve Ux = c.

1.2.4 (a) Find the value of q for which elimination fails in the given 2 by 2 system. (b) For this value of q, what happens to the first geometrical interpretation (two intersecting lines) in Fig. 1.1? (c) What happens to the second interpretation, in which b is a combination of the two columns? (d) What value should replace b2 = 4 to make the system solvable for this q?

1.2.5 For the small angle θ in Fig. 1.1, find cos θ from the inner product formula

    cos θ = v^T w / (‖v‖ ‖w‖),

where v and w are the row vectors of the coefficient matrix.

1.2.6 By applying elimination to

    A = [ 1 2 2 ]
        [ 2 7 7 ]
        [ 2 7 9 ]

factor it into A = LU.

1.2.7 From the multiplication LS show that

    L = [  1   0  0 ]                      [   1   0  0 ]
        [ l21  1  0 ]   is the inverse of   S = [ -l21  1  0 ]
        [ l31  0  1 ]                      [ -l31  0  1 ]

S subtracts multiples of row 1 and L adds them back.

1.2.8 Unlike the previous exercise, show that

    L = [  1    0   0 ]                     [   1    0   0 ]
        [ l21   1   0 ]   is not the inverse of   S = [ -l21   1   0 ]
        [ l31  l32  1 ]                     [ -l31  -l32  1 ]

If S is changed to

    E = [       1          0    0 ]
        [     -l21         1    0 ]
        [ l21 l32 - l31  -l32   1 ]

show that E is the correct inverse of L. E contains the elimination steps as they are actually done—subtraction of multiples of row 1 followed by subtraction of a multiple of row 2.

1.2.9 Find examples of 2 by 2 matrices such that (a) LU ≠ UL (b) A^2 = -I, with real entries in A (c) B^2 = 0, with no zeros in B (d) CD = -DC, not allowing CD = 0.
1.2.10 (a) Factor A into LU and solve Ax = b for the 3 right sides: A = 1 1 0 -1 2 -1 0" -1 2_ , b = 1 0 0 , 0 1 0 5 0 0 1 (b) Verify that your solutions x1,x2,x3 are the three columns of A 1. (A times this inverse matrix should give the identity matrix.) 1.2.11 True or false: Every 2 by 2 matrix A can be factored into a lower triangular L times an upper triangular (7, with nonzero diagonals. Find L and U (if possible) when ■[: a- 1.2.12 What combination of the vectors [~2~ 0 _6_ > V2 = "3" 4 _9_ > ^3 = "21 0 _7J gives b ■- 1.2.13 What is the intersection point of the three planes 2xl 4- 3x2 4- 2x3 = 2, 4x2 = — 8, 6xl 4- 9x2 4-7x3 = 7? 1.2.14 What are the possibilities other than Fig. 1.2 for a 3 by 3 matrix to be singular? Draw a different figure and find a corresponding matrix A. 1.2.15 Solve by elimination and back-substitution (and row exchange): [1 1 f] 3 3 4 L1 2 U pi *2 L*3_ = Ol 2 _~4J 1.2.16 Write down a 3 by 3 matrix with row 1—2 row 2 4- row 3 = 0 and find a similar dependence of the columns—a combination that gives zero. 1.3 Positive Definite Matrices and A = LDLT 15 POSITIVE DEFINITE MATRICES AND A = LDLT ■ 1.3 The previous section went from A to an upper triangular matrix U. It used a sequence of row operations, with multiplying factors ltj chosen to produce zeros below the pivots. This section looks again at those multipliers, and puts them into a matrix L. There is a pattern in the thousands (or millions) of numbers that elimination produces, and when recognized it is beautifully simple. We see it as a factorization of A into triangular matrices: A equals L times U. Our plan is to begin by summarizing the main points of the section. It concentrates on symmetric matrices, A = AT, and it takes a major step beyond the mechanics of elimination. In fact it goes one step beyond A — LU. The upper triangular U is split into two parts, by dividing each row by its pivot. 
This is easy to do—we just factor out a diagonal matrix D containing the pivots—but for symmetric matrices it has a remarkable result. The matrix it leaves is exactly L^T, the upper triangular transpose of the lower triangular L. This reappearance of L means that symmetric matrices have a symmetric factorization: A equals L times D times L^T. One more thing. We have assumed that elimination could proceed without row exchanges; the entries in the pivot positions were never zero. For the most important matrices in this book something extra is true: The pivots are positive. A symmetric matrix with positive pivots is called a positive definite matrix, and it is those matrices that will dominate the applications. In the first application, at the end of this section, we concentrate on one of their fundamental properties: They identify the presence of a minimum. Thus the main points are

(1) If the multipliers l_ij are put into a lower triangular matrix L, with ones on the main diagonal, the original A is linked to the upper triangular U by A = LU.
(2) If A is symmetric and the diagonal matrix D containing the pivots is divided out of U, it leaves the transpose of L. Thus U is linked to L by U = DL^T.
(3) If the pivots are positive then A is positive definite and its symmetric factorization is A = LDL^T.

These steps will be illustrated for small matrices before going to the general case. We show that factoring A into LDL^T is the same as the method of "completing the square"—which was discovered by Lagrange in 1759, a century before the invention of matrices. It is the key to deciding when a function F(x1, ..., xn) has a minimum. We mention that Cholesky went one step further, by taking the square roots of the pivots. He combined D^{1/2} with L into a new triangular matrix L̂ = LD^{1/2}. Then the other square root D^{1/2} is in the transpose of L̂, and the diagonal matrix disappears in A = LD^{1/2}D^{1/2}L^T = L̂L̂^T.
This is the Cholesky factorization of a positive definite A—needing only two matrices, at the cost of computing square roots.

Matrices of Order n = 2

The way to understand this section is through a 2 by 2 matrix:

    A = [ a b ]
        [ b c ]

It requires only one elimination step and one multiplier l21 = b/a. Subtracting this multiple of row 1 from row 2 leaves the upper triangular matrix

    U = [ a      b      ]
        [ 0  c - b²/a ]   (1)

Thus the pivots are d1 = a and d2 = c - b²/a. Now we construct the lower triangular L, with unit diagonal and l21 = b/a, and verify each of the steps listed above:

(1) LU = A becomes

    [  1   0 ] [ a      b      ]   [ a b ]
    [ b/a  1 ] [ 0  c - b²/a ] = [ b c ]

In other words, we return from U to A by adding back to row 2 the multiple b/a of row 1. The matrix L gives the inverse of the subtraction step, so that

    [  1   0 ] [  1    0 ]   [ 1 0 ]
    [ b/a  1 ] [ -b/a  1 ] = [ 0 1 ] ,   or LL^{-1} = I.   (2)

The elimination step applied L^{-1} to A; it subtracted l21 times row 1, and reached U. The reverse step applies L to U and recovers A.

(2) Dividing each row of U by its pivot will factor out a diagonal matrix D:

    [ a      b      ]   [ a      0      ] [ 1  b/a ]
    [ 0  c - b²/a ] = [ 0  c - b²/a ] [ 0   1  ]

The final matrix has ones on the diagonal and it is L^T—the transpose of the earlier L.

(3) The factorization A = LU = LDL^T becomes explicit:

    [ a b ]   [  1   0 ] [ a      0      ] [ 1  b/a ]
    [ b c ] = [ b/a  1 ] [ 0  c - b²/a ] [ 0   1  ]   (3)

The symmetry of A on the left is reflected in the symmetry of the factorization on the right. In fact, the rule for transposing LDL^T guarantees that every such combination is symmetric. The transpose of any product AB is B^T A^T, with the transposes in reverse order. Therefore the transpose of a triple product LDL^T has the three individual transposes reversed, but since (L^T)^T is L and D^T is D we come back where we started:

    (LDL^T)^T = (L^T)^T D^T L^T = LDL^T.   (4)

Thus if A has a symmetric factorization A = LDL^T, it must be a symmetric matrix: A = A^T. The real point is in the opposite direction, to determine how and when a matrix allows such a factorization with D > 0.
This is the additional condition on which Cholesky insisted, and we need to emphasize it properly. It is the requirement that the pivots must be positive. Not every symmetric matrix will satisfy this condition. For n = 2, positive pivots mean a > 0 and c - b²/a > 0. The latter gives c > 0, so that both diagonal entries a and c must be positive. But positivity of the main diagonal is only a necessary condition; by itself, a positive diagonal is not enough for positive pivots. The condition c > b²/a also imposes a requirement on b—or on the determinant of the matrix, which is ac - b². The determinant must be positive too. In other words, it is the sign of the diagonal and the size of the off-diagonal that are jointly decisive. Matrices like

    [ 1 2 ]        [ -1  1 ]
    [ 2 1 ]  and   [  1 -2 ]

are not positive definite, although a > 0 and c > 0 in the first case, and the determinant is positive in the second. A symmetric matrix with positive pivots is a positive definite matrix. This definition applies to symmetric matrices of any order n, and it is straightforward to check if n is not large. Nevertheless it is not completely satisfactory. The formulas for the pivots become complicated as n increases, and even with n = 2 it would take some effort to prove the following simple property: The sum of two positive definite matrices is positive definite. Therefore our goal is to complement this definition by another one, equivalent but quite different—so that the second and more basic description can be used when the one based on pivots becomes unworkable.†

The Minimum of a Quadratic

For n = 2, the new definition of positive definite matrices will deal with the special function

    f(x1, x2) = ax1² + 2bx1x2 + cx2².

This is called a quadratic form, since all its terms are of second degree—they involve only squares xi² or products xixj.

† Section 1.5 gives a third definition, again different but also equivalent; it requires A to have positive eigenvalues.
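Both descriptions of a symmetric 2 by 2 matrix, positive pivots and a positive quadratic form, can be tried numerically (a sketch; the function names are ours):

```python
# For the symmetric matrix [a b; b c]: pivots a and c - b^2/a,
# and the quadratic form f = a*x1^2 + 2*b*x1*x2 + c*x2^2.
def pivots(a, b, c):
    return (a, c - b * b / a)

def f(a, b, c, x1, x2):
    return a * x1**2 + 2 * b * x1 * x2 + c * x2**2

# a = 2, b = 4, c = 11: pivots 2 and 3, and f stays positive at sample points.
assert pivots(2, 4, 11) == (2, 3)
assert all(f(2, 4, 11, x1, x2) > 0 for x1, x2 in [(1, 0), (0, 1), (1, -1), (-2, 1)])

# A positive diagonal is not enough: a = c = 1, b = 2 has a negative
# second pivot, and f(1, -1) = 1 - 4 + 1 = -2 is negative.
assert pivots(1, 2, 1)[1] < 0
assert f(1, 2, 1, 1, -1) == -2
```

The sample points only illustrate the claim; the proof that the two descriptions agree is the subject of this section.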
The link between the function f and the matrix A is immediate from a multiplication:

    f = [x1 x2] [ a b ] [x1]  ,   or   f = x^T A x.   (5)
                [ b c ] [x2]

Each off-diagonal term contributes bx1x2, and the diagonal entries yield the squares ax1² and cx2². There is a complete correspondence between symmetric matrices A and quadratic forms f = x^T A x. For the matrix at the beginning of the book,

    A = [ 2  4 ]   corresponds to   f = 2x1² + 8x1x2 + 11x2².
        [ 4 11 ]

The new definition is based on f: A is positive definite if f = x^T A x is always positive (for x ≠ 0). Eventually we show that this is equivalent to positivity of the pivots. Certainly f is zero when x1 = x2 = 0; that point is excluded in asking if f > 0. Also the first derivatives are zero at x1 = x2 = 0, since

    ∂f/∂x1 = 2ax1 + 2bx2  and  ∂f/∂x2 = 2bx1 + 2cx2.   (6)

Everything is contained in the second derivatives, which come directly from the numbers a, b, and c:

    ∂²f/∂x1² = 2a,   ∂²f/∂x1∂x2 = 2b,   ∂²f/∂x2² = 2c.

All higher derivatives are zero. Our question is: Does f have a minimum at x1 = x2 = 0? The function f is zero at the origin, and we ask whether at all other points it is positive:

    f = ax1² + 2bx1x2 + cx2² > 0.   (7)

If this is true, its graph will be shaped like a bowl resting at the origin (Fig. 1.3). If not, the bowl might be upside down (when f has a maximum), or it can change into a saddle (when f is positive at some points and negative at others). In every case the origin is a stationary point, where the tangent is horizontal. The first derivatives are zero, and we are extending to two variables or n variables the test that comes from calculus: f'' > 0 at a minimum.

Fig. 1.3. A minimum, a maximum, and a saddle point (positive definite, negative definite, indefinite).

Test for a Minimum: n = 2

It is certainly necessary that a > 0. Otherwise we look along the x1 axis, where x2 = 0, and f = ax1² is not positive. Similarly we need c > 0, or f will not increase in the x2 direction.
In other words the pure second derivatives fxx and fyy must be positive, imitating f'' > 0. But there is also a condition on b, coming from the cross derivative fxy, and its sign is not what matters. Each of the quadratics

    f1 = x1² + 10x1x2 + x2²   and   f2 = x1² - 10x1x2 + x2²

has a saddle point rather than a minimum at the origin, even though a = c = 1. In the first case, f1 = -8 at x = (1, -1); in the second, f2 = -8 at x = (1, 1). These functions can be both positive and negative. So we come to the central problem. How is it decided whether a quadratic form is everywhere positive? We answer it first for the particular example f = 2x1² + 8x1x2 + 11x2², to see the key idea, and then for a general f = x^T A x. The particular case has a = 2 and c = 11, both positive. But also the cross term 8x1x2 must be dominated by the others, or it will produce a saddle point. What is needed is a way of rewriting f so that it is obviously positive. The trick is to include the cross term within a perfect square, and to write f as a sum of squares:

    2x1² + 8x1x2 + 11x2² = 2(x1 + 2x2)² + 3x2².   (8)

This is the step of completing the square, and the result is seen to be positive. If the cross term had been too big, say 12x1x2, the first square would be 2(x1 + 3x2)². This uses 18x2², more than we have to give. The final term is then (11 - 18)x2², or -7x2². This difference of squares means a saddle point instead of a minimum. The coefficients on the right side of (8) were not hard to find. By matching the left side they proved it to be positive—which is what we wanted. But to leave it there is to miss a chance of understanding the whole theory. Those same coefficients have already appeared, when elimination was applied to A and the result was written as LDL^T:

    [ 2  4 ]   [ 1 0 ] [ 2 0 ] [ 1 2 ]
    [ 4 11 ] = [ 2 1 ] [ 0 3 ] [ 0 1 ]   (9)

The pivots 2 and 3 are the same numbers that are outside the squares in equation (8). The multiplier 2 in L and L^T is the same as the number inside the square.
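Equation (8), and the oversized cross term that produces a saddle, can be spot-checked numerically (a sketch; the sample points are chosen arbitrarily):

```python
# f and its completed-square form should agree everywhere.
def f(x1, x2):
    return 2 * x1**2 + 8 * x1 * x2 + 11 * x2**2

def squares(x1, x2):
    return 2 * (x1 + 2 * x2)**2 + 3 * x2**2

points = [(1, 0), (0, 1), (1, -1), (2, 3), (-1.5, 0.5)]
assert all(f(p, q) == squares(p, q) for p, q in points)

# With cross term 12*x1*x2 the identity has a negative last coefficient:
# 2*x1^2 + 12*x1*x2 + 11*x2^2 = 2*(x1 + 3*x2)^2 - 7*x2^2, a saddle.
g = lambda x1, x2: 2 * x1**2 + 12 * x1 * x2 + 11 * x2**2
assert all(g(p, q) == 2 * (p + 3 * q)**2 - 7 * q**2 for p, q in points)
assert g(3, -1) < 0   # negative somewhere, so no minimum at the origin
```

The positive case stays positive at every sample point; the saddle case does not.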
In fact all the entries of L are inside the squares, if they are written out in full:

    x^T A x = 2(1·x1 + 2·x2)² + 3(0·x1 + 1·x2)²   (10)

The first column of L is in the first square (the coefficients 1 and 2) and the second column (0 and 1) produces the second square. That cannot be an accident. We check first that it always happens in the 2 by 2 case, where the first term ax1² and the cross term 2bx1x2 are matched by the first square,

    S = a(x1 + (b/a)x2)² = ax1² + 2bx1x2 + (b²/a)x2².

The sign of S is the same as the sign of a. The coefficient of x2² is adjusted in order to equal c, and this correction completes the square:

    f = a(x1 + (b/a)x2)² + (c - b²/a)x2².   (11)

This identity reveals the conditions for a minimum:

    f > 0 if and only if a > 0 and c - b²/a > 0.   (12)

These conditions are necessary, because if either one is violated we can make the corresponding term in (11) negative, and the other term zero. They are sufficient, because if both conditions are satisfied then everything is positive (except at the origin x1 = x2 = 0). You recognize a and c - b²/a. They are the pivots, and therefore the quadratic form x^T A x is positive exactly when the pivots of A are positive. The two definitions of positive definiteness coincide (at least for n = 2). The function f has a minimum when the matrix A is positive definite. It is this property that we now extend from two variables to n variables, connecting symmetric matrices A to quadratic forms

    f = x^T A x = [x1 ··· xn] [ a11 ··· a1n ] [x1]
                              [  ⋮        ⋮  ] [ ⋮ ]   (13)
                              [ an1 ··· ann ] [xn]

The notation will be more complicated but not the underlying idea. The pivots come from one algorithm—Gaussian elimination—and x^T A x > 0 is decided by another algorithm—completing the square. The key is to show that those two algorithms are the same.

Elimination = Completing the Square

Mistakenly or not, I feel a duty to prove that A = LDL^T for positive definite matrices of all orders. There are several possible proofs.
Perhaps the quickest is by induction, from order n - 1 to order n; I do not think it is the most illuminating. Two proofs of A = LU are in my text Linear Algebra and Its Applications (the 3rd edition is published by Harcourt Brace Jovanovich, 1988). With symmetry and positive definiteness LU becomes LDL^T. Here I want to try a fresh approach. It looks at each stage of elimination as the removal of a special matrix—one whose rows are all multiples of the same row (the pivot row). That is what Gaussian elimination does. Then the sum of these special matrices, which equals A, can be written as LDL^T by a close look at matrix multiplication. While expressing elimination as A = LDL^T, we are at the same time watching the signs of the pivots—to show they are positive when f = x^T A x is positive. Perhaps we should point to the main steps in advance, so that the details will be partly optional:

(1) The matrix removed by each elimination stage has the form l d l^T, column times pivot times row
(2) The sum of these matrices is LDL^T
(3) If x^T A x > 0 then the pivots are positive—the first is d1 > 0 and the argument applies again to the remaining matrix that is one order smaller.

You will see everything from an example. The important thing is to think of elimination not only as a method for producing zeros, but as a splitting of A into simpler matrices. The example is

    A = [ 1 2 2 ]
        [ 2 7 7 ]
        [ 2 7 9 ]

The first stage of elimination produces zeros below the pivot d1 = 1. It multiplies the first row by 2 and subtracts it from the other rows. The effect is to remove the first row and column from the problem, and to leave behind a new matrix in the lower right corner:

    [ 1 2 2 ]   [ 1 2 2 ]   [ 0 0 0 ]
    [ 2 7 7 ] = [ 2 4 4 ] + [ 0 3 3 ]   (14)
    [ 2 7 9 ]   [ 2 4 4 ]   [ 0 3 5 ]

The matrix that is removed contains multiples of row 1. Then elimination continues on the last matrix. It starts with the next pivot d2 = 3, and the multiple of row 2 to be subtracted is l32 = 1. This second stage of elimination produces zeros by removing another simple matrix and leaving one that is even simpler:

    [ 0 0 0 ]   [ 0 0 0 ]   [ 0 0 0 ]
    [ 0 3 3 ] = [ 0 3 3 ] + [ 0 0 0 ]   (15)
    [ 0 3 5 ]   [ 0 3 3 ]   [ 0 0 2 ]

The last pivot is 2. Thus the original A is the sum of three matrices, the middle one in (14) and the two on the right side of (15). The first contains multiples of the row 1, 2, 2; the second contains multiples of 0, 1, 1; the third contains multiples of 0, 0, 1. You notice that the same is true of the columns. We interpret this splitting in two ways. First, it completes the squares in f = x^T A x. Second, it gives the factorization A = LDL^T. In practice that is the important point. The numbers in LDL^T are a record of elimination, and to solve Ax = b we call up those numbers and apply them to b—which is easy to do for this 3 by 3 matrix. It is only an example, but it illustrates the general rule. We begin with the sum of squares. The quadratic f = x^T A x is

    [x1 x2 x3] [ 1 2 2 ] [x1]
               [ 2 7 7 ] [x2] = x1² + 4x1x2 + 4x1x3 + 7x2² + 14x2x3 + 9x3².   (16)
               [ 2 7 9 ] [x3]

The question is whether this quadratic is always positive. The answer is yes, because the three matrices in the splitting come from perfect squares. The first matrix matches all terms involving x1:

    [x1 x2 x3] [ 1 2 2 ] [x1]
               [ 2 4 4 ] [x2] = x1² + 4x1x2 + 4x1x3 + 4x2² + 8x2x3 + 4x3² = (x1 + 2x2 + 2x3)².   (17)
               [ 2 4 4 ] [x3]

The row and column 1, 2, 2 are inside the square, and the pivot d1 = 1 appears outside. The second matrix in the splitting gives a square with d2 = 3 outside and 0, 1, 1 inside,

    3x2² + 6x2x3 + 3x3² = 3(x2 + x3)².

This matches all remaining terms involving x2. Then the third matrix gives 2x3². These three squares add up to f in (16), which is therefore positive. Exactly the same numbers appear in the symmetric factorization A = LDL^T:

    A = [ 1 0 0 ] [ 1 0 0 ] [ 1 2 2 ]
        [ 2 1 0 ] [ 0 3 0 ] [ 0 1 1 ]
        [ 2 1 1 ] [ 0 0 2 ] [ 0 0 1 ]

Thus the factorization into LDL^T is essentially the same as the splitting into simple matrices.
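The splitting of this example into column-times-pivot-times-row pieces can be reassembled numerically (a sketch; the columns and pivots are copied from the elimination of the 3 by 3 example above):

```python
# A = l1 d1 l1^T + l2 d2 l2^T + l3 d3 l3^T for the 3 by 3 example.
cols = [[1, 2, 2], [0, 1, 1], [0, 0, 1]]   # columns of L
d = [1, 3, 2]                              # pivots

S = [[0] * 3 for _ in range(3)]
for l, pivot in zip(cols, d):
    for i in range(3):
        for j in range(3):
            S[i][j] += pivot * l[i] * l[j]   # add the rank-one matrix l d l^T

assert S == [[1, 2, 2], [2, 7, 7], [2, 7, 9]]   # the original A
```

Each pass through the outer loop adds one of the three simple matrices, and the sum is exactly A.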
That is still to be proved in general, but we can settle the question of positive pivots: For any symmetric matrix they signal the presence of a minimum.

1B The pivots are positive if and only if x^T A x is positive for all nonzero vectors x. In this case the symmetric matrix A is positive definite and f = x^T A x has a minimum at x = 0.

The first matrix to be removed from A is M1. It contains multiples of the first row of A. It uses the first pivot d1, which we call d, and chooses the multiples to match A in the first column. The rows of M1 are

    row 1:   1      times  [ d  a12  ···  a1n ]
    row 2:   a21/d  times  [ d  a12  ···  a1n ]
      ⋮
    row n:   an1/d  times  [ d  a12  ···  a1n ]

The first column contains d, a21, ..., an1 and agrees with the first column of A. But symmetry gives more than that. The column entries are the same as the row entries. If the column of multipliers at the left is l, the row in brackets is d times l^T. Therefore we can write M1 in a most remarkable way, as a column times a row:

    M1 = [   1   ]
         [ a21/d ] [ d  a12  ···  a1n ] = l d l^T   (18)
         [   ⋮   ]
         [ an1/d ]

Notice above all that a column times a row is a matrix. In our example

    [ 1 ]             [ 1 2 2 ]
    [ 2 ] [1 2 2]  =  [ 2 4 4 ]
    [ 2 ]             [ 2 4 4 ]

In general an n by 1 matrix (a column) times a 1 by n matrix (a row) is an n by n matrix. If A is not symmetric then the pattern is the same but the row is different from the column. They lead to A = LU. When M1 is removed from A it leaves a block of order n - 1. There are zeros in the first row and column; we call this new matrix B. The next elimination step removes from B another matrix M2, leaving zeros in two rows and columns and a block of order n - 2. This continues until M1, M2, ..., Mn have been removed and nothing is left. Equation (14) was an example of A = M1 + B, and equation (15) was an example of B = M2 + M3. Together they split A into M1 + M2 + M3. We now connect this splitting to a sum of squares, to prove 1B. If x^T A x is positive then certainly the first pivot is positive. The choice x^T = [1 0 ··· 0] forces x^T A x = d > 0.
This is the necessary condition seen before, that the main diagonal must be positive. Then to display the whole x^T A x as positive we need n squares, and the first square is x^T M1 x. This matches correctly all the terms that involve x1. It is like (x1 + 2x2 + 2x3)², which matched the first terms of (16). It does not match the remaining terms, which will be adjusted by the remaining squares. The next square is x^T M2 x, which matches all the terms in x2. Removing M1 from A leaves a smaller problem governed by B. For the block in B, involving only n - 1 variables and ignoring the first row and column of zeros, we assume that the result is already proved: Its pivots are positive if and only if x^T B x is positive.† The pivots of this block are the last n - 1 pivots of A; elimination continues and these pivots appear. At the same time, x^T A x is positive in dimension n if and only if x^T B x is positive in dimension n - 1. (If the latter failed, we could choose x1 to make the first square zero and then the former would fail.) Therefore the result assumed true for B is also true for A: The pivots are positive if and only if the function x^T A x is positive. That completes the proof of 1B. It follows immediately that if two matrices A1 and A2 are positive definite so is A1 + A2: it passes the test

    x^T(A1 + A2)x = x^T A1 x + x^T A2 x > 0.

This test succeeds easily, while the pivots are much harder to follow. The pivots of A1 + A2 are not the pivots of A1 plus the pivots of A2 (except for the first one). But they must all be positive.

† This is the "induction hypothesis." The result is already known to be correct for n = 1 (and n = 2), and we go from each size to the next larger size.

Elimination = Factorization

The two algorithms, one which eliminated x1 from Ax = b and the other which eliminated it from x^T A x, gave the same matrix B. As they continue, one produces pivots and multipliers and A = LDL^T, and the other expresses x^T A x as a sum of squares. We now compare those two end products. It will be like the 2 by 2 and 3 by 3 examples, where the pivots were outside the squares and the multipliers were inside. We will be proving that elimination is equivalent to the factorization A = LU = LDL^T, and if you stay with it you will be through the longest theoretical section of the book. This is one case in which the theory is basic to numerical applications.

1C Every symmetric positive definite matrix allows a symmetric factorization

    A = LDL^T = l1 d1 l1^T + ··· + ln dn ln^T   (19)

The numbers di > 0 are the pivots in D, and the vectors li are the columns of the multiplier matrix L.

Everything rests on one point: The elimination step from A to B is achieved by removing the n by n matrix

    M1 = [   1   ]
         [ a21/d ]  d  [ 1  a12/d  ···  a1n/d ] = l d l^T
         [   ⋮   ]
         [ an1/d ]

This vector l (or l1) is the first column of L—including the 1 on the diagonal. The vector d l^T (or d1 l1^T) is the first row of U. The matrix M1 is symmetric and removing it leaves a symmetric matrix B. Then the first pivot of B is d2, its multipliers fill the column l2, and elimination continues all the way to a block of order one: the sizes go from n to n - 1 to n - 2, until finally a single entry dn is left. Since Mj = lj dj lj^T is removed at every step, A is decomposed into

    A = l1 d1 l1^T + l2 d2 l2^T + ··· + ln dn ln^T.

All that remains is to recognize this as the matrix multiplication LDL^T. It involves a sum of columns times rows—the columns of L times the rows of L^T—which is an unusual way to multiply matrices! But it is legal, and it fits perfectly with the steps of Gaussian elimination.

Matrix Multiplication

The usual order of matrix multiplication is "row times column." If the matrices are A and B, then row 1 of A multiplies column 1 of B to give the (1, 1) entry in the corner of AB. This product of vectors is an inner product, combining n ordinary multiplications of a1j times bj1. If A and B are n by n, there will be n² entries in AB and they normally involve n³ ordinary multiplications.
However the order of these multiplications is not sacred. Watch a11, when row 1 of A multiplies the columns of B. In these inner products, a11 always multiplies a number in the first row of B. Similarly a12 multiplies the second row of B. Sooner or later, a1n will multiply the last row of B. The addition of all these products involving row 1 of A gives row 1 of AB:

    row 1 of AB = [a11 ··· a1n] [ row 1 of B ]
                                [     ⋮      ] = a11(row 1 of B) + ··· + a1n(row n of B).
                                [ row n of B ]

This is matrix multiplication "a row at a time." If we write all the rows of AB in this way, and assemble the results, the product AB looks like

    AB = [ a11(row 1 of B) + ··· + a1n(row n of B) ]
         [                  ⋮                      ]
         [ an1(row 1 of B) + ··· + ann(row n of B) ]

Now you can see another possibility. This expression can be split up in a different way, without changing the final result. The order of multiplication can become

    AB = [ a11 ]                        [ a1n ]
         [  ⋮  ] [row 1 of B] + ··· +   [  ⋮  ] [row n of B].   (20)
         [ an1 ]                        [ ann ]

We state this as a general rule and then test it on a specific example.

1D The matrix AB can be written as a sum of "outer products" of columns times rows:

    AB = (column 1 of A)(row 1 of B) + ··· + (column n of A)(row n of B).   (21)

Each product is an n by n matrix of rank one and requires n² ordinary multiplications.

EXAMPLE

    [ 1 0 ] [ 2 4 ]   [ 1 ]          [ 0 ]          [ 2 4 ]   [ 0 0 ]   [ 2  4 ]
    [ 2 1 ] [ 0 3 ] = [ 2 ] [2 4] +  [ 1 ] [0 3]  = [ 4 8 ] + [ 0 3 ] = [ 4 11 ]

This is almost A = LDL^T. The matrices on the left are L and U. In the middle is the matrix M1 that was removed by elimination, and the matrix M2 that was left. The remaining step is to divide out the pivots. Instead of U we want DL^T, instead of LU we want LDL^T, and instead of column times row we want column times pivot times row:

    [ 2  4 ]   [ 1 ]             [ 0 ]
    [ 4 11 ] = [ 2 ] (2) [1 2] + [ 1 ] (3) [0 1]   (22)

This is A = LDL^T, a sum of the simple matrices lj dj lj^T.

Block Matrices

That completes a long section, and we add only two remarks. The first is to mention that matrices can also be multiplied "a column at a time." If the columns of B are b1, ..., bn, then

    AB = A[b1 ··· bn] = [Ab1 ··· Abn].   (23)

Each column of AB is actually a combination of the columns of A.
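Rule 1D can be tested on the factors L and U of the 2 by 2 matrix [2 4; 4 11] (a sketch comparing the usual row-times-column rule with the sum of outer products; the function names are ours):

```python
def rows_times_columns(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def sum_of_outer_products(A, B):
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for k in range(n):              # one rank-one matrix for each k
        for i in range(n):
            for j in range(n):
                C[i][j] += A[i][k] * B[k][j]   # (column k of A)(row k of B)
    return C

A = [[1, 0], [2, 1]]          # the lower triangular L
B = [[2, 4], [0, 3]]          # the upper triangular U
assert rows_times_columns(A, B) == sum_of_outer_products(A, B) == [[2, 4], [4, 11]]
```

Both orders perform the same n^3 multiplications; only the grouping changes.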
The second remark is in the same direction, but more general. In practice, the entries of a matrix might come in blocks. The matrix may even be block diagonal (with square blocks of different sizes) or block triangular (with rectangular blocks off the diagonal). Provided the shapes match, we can carry out block multiplication. The blocks can be treated like numbers, except that B A^{-1} replaces b/a and the order of multiplication is important, since AB ≠ BA for most matrices. In block elimination of a symmetric matrix, for example,

    [ I  0 ; -B^T A^{-1}  I ][ A  B ; B^T  C ] = [ A  B ; 0  C - B^T A^{-1} B ].

The block pivots are A and C - B^T A^{-1} B, instead of a and c - b^2/a. Inverting the first matrix gives a block form of A = LU = LDL^T:

    [ A  B ; B^T  C ] = [ I  0 ; B^T A^{-1}  I ][ A  B ; 0  C - B^T A^{-1} B ]
                      = [ I  0 ; B^T A^{-1}  I ][ A  0 ; 0  C - B^T A^{-1} B ][ I  A^{-1} B ; 0  I ].    (24)

The inverse in Exercise 1.3.12 is also a 2 by 2 matrix of blocks. We apologize for all this matrix algebra, but without LDL^T there is no good way to solve symmetric systems.

EXERCISES

1.3.1  Write A = [3 -3; -3 5] in the forms A = LDL^T and A = l_1 d_1 l_1^T + l_2 d_2 l_2^T. Are the pivots positive, so that A is symmetric positive definite? Write 3x_1^2 - 6x_1x_2 + 5x_2^2 as a sum of squares.

1.3.2  Factor A into A = LDL^T. Is this matrix positive definite? Write x^T A x as a combination of two squares.

1.3.3  Find the triangular factors L and U of

    A = [1 1 0 0; 1 2 1 0; 0 1 2 1; 0 0 1 2].

In this case U is the same as L^T. What is the pivot matrix D? Solve Lc = b and Ux = c if b = (1, 0, 0, 0).

1.3.4  How do you know from elimination that the rows of L always start with the same zeros as the rows of A? Note: Zeros inside the central band of A may be lost in L; this is the "fill-in" that is painful for sparse matrices.

1.3.5  Write f = x_1^2 + 10x_1x_2 + x_2^2 as a difference of squares, and f = x_1^2 + 10x_1x_2 + 30x_2^2 as a sum of squares.
What symmetric matrices correspond to these quadratic forms by f = x^T A x?

1.3.6  In the 2 by 2 case, suppose the positive coefficients a and c dominate b in the sense that a + c > 2b. Is this enough to guarantee that ac > b^2 and the matrix is positive definite? Give a proof or a counterexample.

1.3.7  Decide for or against the positive definiteness of

    A = [1 1 1; 1 1 1; 1 1 1]  and  A' = [1 1 1; 1 2 2; 1 2 3].

Write A as l_1 d_1 l_1^T and write A' as LDL^T.

1.3.8  If each diagonal entry a_ii is larger than the sum of the absolute values |a_ij| along the rest of its row, then the symmetric matrix A is positive definite. How large would c have to be in

    A = [c 1 1; 1 c 1; 1 1 c]

for this statement to apply? How large does c actually have to be to assure that A is positive definite? Note that x^T A x = (x_1 + x_2 + x_3)^2 + (c - 1)(x_1^2 + x_2^2 + x_3^2); when is this positive?

1.3.9  (i) The determinant of a triangular matrix is the product of the entries on the diagonal. Thus det L = 1 and det A = det LDL^T = (det L)(det D)(det L^T) = det D. The determinant is the product of the pivots. Show that det A > 0 if A is positive definite. (ii) Give an example with det A > 0 in which A is not positive definite. (iii) What is the determinant of A in Exercise 1.3.3?

1.3.10  Inverting A = LDL^T gives A^{-1} = M D^{-1} M^T, where M is the inverse of L^T. Is M lower triangular or upper triangular? How could you factor A itself, so that the first factor is upper and not lower triangular?

1.3.11  A function F(x, y) has a local minimum at any point where its first derivatives dF/dx and dF/dy are zero and the matrix of second derivatives

    A = [ d^2F/dx^2  d^2F/dxdy ; d^2F/dxdy  d^2F/dy^2 ]

is positive definite. Is this true for F_1 = x^2 - x^2 y^2 + y^2 + y^3 and F_2 = cos x cos y at x = y = 0? Does F_1 have a global minimum or can it approach -infinity?

1.3.12  Find the inverse of the 2 by 2 symmetric matrix [a b; b c]. Verify by direct multiplication that the inverse of a 2 by 2 block symmetric matrix is

    [ A  B ; B^T  C ]^{-1} = [ A^{-1} + A^{-1} B S B^T A^{-1}   -A^{-1} B S ; -S B^T A^{-1}   S ],

where S = (C - B^T A^{-1} B)^{-1}. A and C are square but B can be rectangular.

1.3.13  For the block quadratic form

    f = [x^T y^T][ A  B ; B^T  C ][ x ; y ] = x^T A x + x^T B y + y^T B^T x + y^T C y,

find the term that completes the square:

    f = (x + A^{-1} B y)^T A (x + A^{-1} B y) + y^T ( ? ) y.

The block matrix is positive definite when A and C - B^T A^{-1} B are positive definite.

1.3.14  The rule for block multiplication of AB seems to be: Vertical cuts in A must be matched by horizontal cuts in B, while other cuts (horizontal in A or vertical in B) can be arbitrary. The 3 by 3 examples show cuts of column times row, row times column, and matching blocks. Give two more examples and put in numbers to confirm that the multiplication succeeds.

1.3.15  Find the LDL^T factorization, and Cholesky's LU factorization with L_c = L D^{1/2}, for the matrix

    A = [4 12; 12 45].

What is the connection to x^T A x = (2x_1 + 6x_2)^2 + (3x_2)^2?

1.3.16  Suppose A = LDL^T with L = [1 0 0; 2 1 0; 4 2 1] and given D and b. Solve Ax = b by solving two triangular systems. How do you know that A is symmetric positive definite?

1.3.17  If a_11 = d_1 is the first pivot of A, under what condition is a_22 the second pivot? Are the pivots of A^{-1} equal to the reciprocals 1/d_i of the pivots of A?

1.3.18  Write down in words the sequence of column operations by which the following code computes the usual multipliers l_ij. It overwrites a_ij with these multipliers for i > j, using no extra storage. The notation := signals a definition in terms of existing quantities.
Symmetric factorization LDL^T for positive definite A

    For j = 1, ..., n
        For p = 1, ..., j-1
            r_p := d_p a_jp
        d_j := a_jj - sum of a_jp r_p for p = 1, ..., j-1
        If d_j = 0 then quit
        else For i = j+1, ..., n
            a_ij := (a_ij - sum of a_ip r_p for p = 1, ..., j-1) / d_j

The algorithm requires about n^3/6 multiplications.

1.3.19  Explain (and if possible code) the following solution of Ax = b for positive definite tridiagonal A. The diagonal of A is originally in d_1, ..., d_n and the subdiagonal and superdiagonal in l_1, ..., l_{n-1}; the solution x overwrites b.

    For k = 2, ..., n:    t := l_{k-1};  l_{k-1} := t/d_{k-1};  d_k := d_k - t l_{k-1}
    For k = 2, ..., n:    b_k := b_k - l_{k-1} b_{k-1}
    For k = 1, ..., n:    b_k := b_k / d_k
    For k = n-1, ..., 1:  b_k := b_k - l_k b_{k+1}

Show how this uses 5n multiplications or divisions, and give an example of failure when A is not positive definite.

1.3.20  If a new row v^T is added to A, what is the change in A^T A?

1.4  MINIMUM PRINCIPLES

So far the unknown x has appeared in a linear system Ax = b. Now those systems will have competition. They will still be present at the end, when x is actually computed, but the basic statement of the problem will not be a linear equation. Instead, x will be determined by a minimum principle.

We introduce minimum principles through two examples. The first starts with equations that have no solution; the original matrix A is rectangular, and there are more equations than unknowns. In that case it is generally impossible to solve Ax = b. Nevertheless such systems constantly appear in applications, and the unknown x has to be determined; it will be chosen to make Ax as close as possible to b. It minimizes the quadratic ||Ax - b||^2, and the choice of x which makes this error a minimum is the least squares solution to Ax = b.

The second example is chosen from a thousand possibilities in engineering and science. It is based on a fundamental principle of mechanics, that nature acts so as to minimize energy. The system looks for a point of equilibrium, and an equilibrium is successful only if it is stable.
It should require the input of energy to move away from equilibrium; otherwise, if a movement releases rather than consumes energy, that movement will grow.† It is like a ball on a sphere (or a child on a slide). Balanced at the top, the ball is unstable; the steady state is at the bottom, where the potential energy is a minimum.

Fortunately, the energy is often a quadratic. Its minimum occurs where the derivative is zero, and the derivative of a quadratic function is linear. Therefore the equilibrium point is determined by a system of linear equations. The scalar case, with only one degree of freedom, is simple but typical: the energy is described by a parabola P(x), so that

    P(x) = (1/2)Ax^2 - bx  and  dP/dx = Ax - b.

The derivative is zero at x = A^{-1}b. Provided A is positive this equilibrium is stable; the parabola opens upward. In more dimensions x becomes a vector, A is a symmetric matrix, and the parabola becomes a paraboloid (Fig. 1.4). But the minimum still occurs where Ax = b:

1E  If A is positive definite, then the quadratic P(x) = (1/2)x^T A x - x^T b is minimized at the point where Ax = b. The minimum value is P(A^{-1}b) = -(1/2) b^T A^{-1} b.

Proof  Suppose x is the solution to Ax = b. We want to show that at any other point y, P(y) is larger than P(x). Their difference is

    P(y) - P(x) = (1/2)y^T A y - y^T b - (1/2)x^T A x + x^T b
                = (1/2)y^T A y - y^T A x + (1/2)x^T A x
                = (1/2)(y - x)^T A (y - x).    (1)

Since A is positive definite, the last expression can never be negative; it is zero only if y - x = 0. Therefore P(y) is larger than P(x), and the minimum occurs at x = A^{-1}b. At that point

    P_min = (1/2)(A^{-1}b)^T A (A^{-1}b) - (A^{-1}b)^T b = -(1/2) b^T A^{-1} b,    (2)

completing the proof.

Fig. 1.4. The minimum of a quadratic function P(x) = (1/2)x^T A x - x^T b.

The equilibrium point can also be determined from calculus.

† I think this also applies to us.
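Statement 1E also lends itself to a direct numerical check (a NumPy sketch, our addition), using the matrix A = [2 3; 3 5] and b = (7, 8) of the worked example that follows:

```python
import numpy as np

A = np.array([[2., 3.], [3., 5.]])   # positive definite: pivots 2 and 1/2
b = np.array([7., 8.])

def P(x):
    """The quadratic P(x) = (1/2) x^T A x - x^T b."""
    return 0.5 * x @ A @ x - x @ b

x_star = np.linalg.solve(A, b)       # the equilibrium point, where Ax = b
P_min = P(x_star)

# 1E predicts the minimum value -(1/2) b^T A^{-1} b
assert np.isclose(P_min, -0.5 * b @ x_star)

# P is larger at every other point tried
rng = np.random.default_rng(1)
for _ in range(100):
    y = x_star + rng.standard_normal(2)
    assert P(y) > P_min
```

The check that P(y) > P_min at random perturbations y is exactly equation (1): the difference is (1/2)(y - x)^T A (y - x), positive whenever y differs from x.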
The first derivatives dP/dx_i must be zero, and we take the typical example

    P = (1/2)[x_1 x_2][2 3; 3 5][x_1; x_2] - [x_1 x_2][7; 8]
      = (1/2)(2x_1^2 + 6x_1x_2 + 5x_2^2) - 7x_1 - 8x_2

    dP/dx_1 = 2x_1 + 3x_2 - 7 = 0
    dP/dx_2 = 3x_1 + 5x_2 - 8 = 0.

The vanishing of the derivatives has produced Ax = b. Then to verify that this x gives a minimum, calculus looks also at the second derivatives. It is here that we use the positive definiteness of A, and the work of the previous section. The diagonal entries 2 and 5 are positive, and so is the determinant. For n = 2 this guarantees a minimum.

A third very simple proof of 1E comes from rewriting P as

    (1/2)x^T A x - x^T b = (1/2)(x - A^{-1}b)^T A (x - A^{-1}b) - (1/2) b^T A^{-1} b.

This immediately puts the minimum at x = A^{-1}b, to bring the first term on the right side down to zero. That leaves only the constant term P_min = -(1/2) b^T A^{-1} b. It is interesting that the minimum is never positive.

The geometry is very straightforward. For b = 0 the minimum occurs at the origin x = 0. For other vectors b, the graph of P is shifted until it is centered at A^{-1}b, and it is lowered so that the value of P at equilibrium (the minimum energy) is P_min. The graph of P(x) goes through P(0) = 0, so the minimum is not above zero. The cross section of the bowl in Fig. 1.4 is an ellipse.

Positive Definiteness of A^T A and A^T CA

The examples share one feature which we have not been brave enough to mention. They do not start with a positive definite matrix A! In fact the underlying matrix, in these examples and throughout the book, is very frequently rectangular. It might give the connection between the nodes and edges of a network, and the number of nodes is different from the number of edges. But when everything is put together, and we come to an equation of equilibrium, we do get back to n equations in n unknowns, with a square coefficient matrix. That matrix is not the original A. Instead it is A^T A, or more often A^T CA.

You must recognize the difference between this section and the last one.
There we started with a square matrix and factored it into LDL^T. The equations were given, and the problem was to solve them. That is one side of applied mathematics. The other side, which is probably more important and harder, is to find the equations in the first place. We are given physical laws that connect voltage with current, or pressure with flow. The geometry adds laws of its own, determined by the connections between nodes and edges. From these laws, with the physics represented by C and the geometry by A, come the equations of equilibrium and the minimization of energy. The coefficient matrix gets assembled as A^T CA, in formulating the equations, and then it gets disassembled into LDL^T, in solving them.

The matrices A^T A and A^T CA are always symmetric (if C is) and we need to know when they are positive definite.

1F  (i) If A has linearly independent columns (it can be square or rectangular) then the product A^T A is symmetric positive definite. (ii) If also C is symmetric positive definite, so is the triple product A^T CA.

The first is a special case of the second, with C = I. (The identity matrix is certainly positive definite: its pivots are all 1, its eigenvalues are all 1, and x^T I x = x_1^2 + ... + x_n^2 is positive except at x = 0.) If the matrix A is m by n, then linear independence of the columns forces n <= m. It cannot have more columns than rows, or there would be no chance that the columns were independent. (The number of independent columns equals the number of independent rows, by the fundamental theorem of linear algebra.) A typical example is

    A^T CA = [1 1 0; 2 2 1][1 0 0; 0 1 0; 0 0 3][1 2; 1 2; 0 1] = [2 4; 4 11],

with m = 3 rows, n = 2 columns (independent!), and an m by m positive definite matrix C in the middle. The final product A^T CA is the n by n matrix at the start of this book.

To prove that A^T A is positive definite we test its quadratic form: x^T A^T A x is the same as (Ax)^T(Ax) or ||Ax||^2. This is the square of the length of Ax.
It is positive unless Ax = 0. But now enters the independence of the columns: The combination Ax of the columns is never zero except in the unavoidable case x = 0. Thus x^T A^T A x is positive except if x = 0. That makes A^T A positive definite.†

For A^T CA, the quadratic form is again positive: x^T A^T CA x = (Ax)^T C (Ax), and C is assumed positive definite. The only way to obtain zero is from Ax = 0. Appealing as above to the independence of the columns of A, Ax = 0 happens only if x = 0. Thus A^T CA is positive definite. We now apply these results to the examples.

EXAMPLE 1: Least Squares Solution of Ax = b

We are given m equations in n unknowns, with m > n. The problem is overdetermined, and this system is not likely to have a solution. It is easy to describe the right hand sides for which Ax = b can be solved; the equation asks that they be combinations of the columns of A, since the matrix-vector multiplication Ax gives

    x_1(column 1) + ... + x_n(column n) = b.

The vectors b that can be obtained this way form an n-dimensional subspace; it is the column space of A. This subspace lies within m-dimensional space; all the columns have m components. Thus Ax = b has a solution only if b lies in this n-dimensional subspace of m-dimensional space, and with n < m, that is unlikely. A subspace is very thin.

† Its near relative AA^T is not positive definite (Exercise 1.4.9).

For example, suppose the system Ax = b is

    C + t_1 D = b_1
    C + t_2 D = b_2      or      [1 0; 1 1; 1 3; 1 4][C; D] = [b_1; b_2; b_3; b_4].    (3)
    C + t_3 D = b_3
    C + t_4 D = b_4

These equations ask for a line y = C + Dt that passes through four given points; it tries for y = b_1 at t = 0, y = b_2 at t = 1, y = b_3 at t = 3, and y = b_4 at t = 4. Normally that is impossible (Fig. 1.5). The four equations can be solved only if the four points happen to fall on a line. Then the vector b is a combination of the columns of A.
The b_i might be equal, giving a multiple of the first column and a horizontal line y = C; they might be proportional to the second column 0, 1, 3, 4, corresponding to a special line y = Dt; or they might be a combination of the two. The choice b = (0, 8, 8, 20) is none of these, and there will be an error e = b - Ax.†

Fig. 1.5. The nearest straight line y = 1 + 4t to the measurements 0, 8, 8, 20 at t = 0, 1, 3, 4.

† The usual notation in statistics is e = Y - Xβ. We emphasize that the errors e_i are the vertical distances to the line, not the shortest distances.

The best line is the one that makes e as small as possible. To minimize the error is to minimize ||Ax - b||^2 = (Ax - b)^T(Ax - b), if we measure a vector in the usual way: The square of its length is

    ||e||^2 = e^T e = e_1^2 + e_2^2 + ... + e_n^2.    (4)

For a right triangle this is just Pythagoras' formula for the hypotenuse. In n dimensions there are n perpendicular sides, and ||e|| is the length of the diagonal in an n-dimensional box. An n-dimensional unit cube has a diagonal (1, 1, ..., 1) of length sqrt(n).

Which vector x will minimize

    (Ax - b)^T(Ax - b) = x^T A^T A x - x^T A^T b - b^T A x + b^T b?

This is a quadratic, and we can answer that question by 1E. The constant term b^T b will not affect the minimization and can be ignored. The cross term b^T A x is the same as x^T A^T b, since for real vectors like b and Ax the order in the inner product is irrelevant. If we divide by two, we reach the minimization of P = (1/2)x^T A^T A x - x^T A^T b. That is the problem solved earlier for (1/2)x^T A x - x^T b, and we can copy that solution, except that the matrix is no longer A but A^T A, and the vector multiplying x is no longer b but A^T b. This changes Ax = b into A^T A x = A^T b:

1G  The vector x that minimizes ||Ax - b||^2 is the solution to the normal equations

    A^T A x = A^T b.    (5)

This vector x = (A^T A)^{-1} A^T b is the least squares solution to Ax = b.
The error e = b - Ax will not be zero, but its inner product with every column of A is zero. That gives the normal equations A^T e = 0, or A^T A x = A^T b. The error e is perpendicular to the whole column space of A, and b is decomposed into

    b = Ax + (b - Ax) = closest point in column space + error.

This reflects what is clear geometrically, that if Ax is the closest point to b in the column space, then the line from b to Ax is perpendicular to that space (Fig. 1.6). It is natural to call this closest point p the projection of b onto the column space. Algebraically, the rule is easy to summarize: if Ax = b has no solution, multiply by A^T and solve A^T A x = A^T b.

Fig. 1.6. The nearest point to b is p = column 1 + 4 (column 2).

† The factor 1/2 has no special place in least squares. For physical problems it goes naturally into the definition of energy.

Fitting Measurements by a Straight Line

We return to the four points (0, 0), (1, 8), (3, 8), (4, 20), which are not in a straight line. Equation (3) looks for such a line, but from Fig. 1.5 it is hopeless; the four equations have no solution. There is no choice y = C + Dt which will fit all four points exactly. Therefore we choose the values of C and D which make the error a minimum. These optimal values x = (C, D) come from the normal equations. For those equations we need A^T A and A^T b:

    A^T A = [1 1 1 1; 0 1 3 4][1 0; 1 1; 1 3; 1 4] = [4 8; 8 26]
    A^T b = [1 1 1 1; 0 1 3 4][0; 8; 8; 20] = [36; 112].

Then the system to be solved is A^T A x = A^T b:

    4C + 8D = 36
    8C + 26D = 112.

It gives C = 1, D = 4, and the best straight line is y = 1 + 4t. On the closest line, the values of y at times t = 0, 1, 3, 4 are 1, 5, 13, 17. This is the projection p, the first column of A plus 4 times the second column. It is the closest point to the vector b with components 0, 8, 8, 20. The error b - Ax agrees with the vertical distances -1, 3, -5, 3 in Fig. 1.5, and it should be perpendicular to Ax:

    [-1 3 -5 3][1; 5; 13; 17] = 0.
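The whole computation for this fit, from normal equations to the perpendicularity check, can be replayed in a few lines (a NumPy sketch, our addition):

```python
import numpy as np

t = np.array([0., 1., 3., 4.])
b = np.array([0., 8., 8., 20.])

A = np.column_stack([np.ones(4), t])    # columns for C and D in y = C + Dt
x = np.linalg.solve(A.T @ A, A.T @ b)   # normal equations A^T A x = A^T b
C, D = x                                # best line: C = 1, D = 4

p = A @ x                               # projection: values 1, 5, 13, 17
e = b - p                               # errors -1, 3, -5, 3

assert np.isclose(e @ p, 0.0)           # error is perpendicular to Ax
assert np.isclose(e.sum(), 0.0)         # first normal equation: [1 1 1 1] e = 0
```

In production code one would call np.linalg.lstsq(A, b) rather than form A^T A explicitly, but the normal equations make the connection to 1G visible.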
Notice how b and p and the four errors appear in both figures! In the 4-dimensional figure we see the perpendicularity. In the plane figure we see the actual line.

The sum of the individual errors is zero: -1 + 3 - 5 + 3 = 0. This is always true, since the first row of A^T is [1 1 1 1]. The first equation in A^T A x = A^T b is [1 1 1 1]e = 0, and therefore the sum of the e_i is zero. Geometrically, this can be interpreted in terms of the "center point" at t-bar = (0 + 1 + 3 + 4)/4, b-bar = (0 + 8 + 8 + 20)/4. The best straight line goes through this center point: C + D t-bar = b-bar, or 1 + 4(2) = 9.

This application of a minimum principle has solved one form of a basic problem in statistics, the problem of linear regression:

1H  Given the measurements b_1, ..., b_m at times t_1, ..., t_m, the line y = C + Dt which minimizes the error ||e||^2 is determined by

    [ m       sum t_i   ][ C ]   [ sum b_i     ]
    [ sum t_i sum t_i^2 ][ D ] = [ sum t_i b_i ].    (6)

The best line is y = b-bar + D(t - t-bar), with D = sum (t_i - t-bar) b_i / sum (t_i - t-bar)^2.

Failure of Independence in the Columns of A

We could take a second measurement b_5 at one of the same times, say at t_5 = t_4 = 4, without any difficulty. Repeating a row of A does not spoil the independence of its columns, and the best line tries to fit both b_4 and b_5. But we could not repeat one of the columns, and look for the best line y = C + Dt + E, without serious trouble. In this case C and E cannot be distinguished, and the new third column of A would be identical to the first:

    A = [1 0 1; 1 1 1; 1 3 1; 1 4 1].

These columns are not independent and A^T A will not be positive definite:

    A^T A = [4 8 4; 8 26 8; 4 8 4].

There is a repeated column and a repeated row, and x = (1, 0, -1) gives x^T A^T A x = 0. The quadratic form is zero at other points than the origin.
It chooses one particular solution, in this case C = E = 1/2 and D = 4, which leaves the best line at y = 1 + 4t. This allows the method of least squares to continue even with dependent columns. The optimal Ax is still the projection p, the nearest point to b in the column space. But the formula x = (A^T A)^{-1} A^T b breaks down, and the pseudoinverse has to take the place of the inverse. In almost all applications, the columns will be independent.

EXAMPLE 2: Minimum Potential Energy

Fig. 1.7. Four springs and three masses, with the ends at the top and bottom fixed.

We are given four springs in a line, with masses between them. The ends at the top and bottom are fixed (Fig. 1.7). The problem is to find the displacements x_1, x_2, x_3 of the three masses. It seems simple, and a little old-fashioned, but it will be important for an enormous class of applications. Therefore we write down everything. There are four vectors involved. Two of them are associated with the springs, and the other two are defined at the nodes (where the masses are). They are

(1) the vector x of nodal displacements x_1, x_2, x_3
(2) the vector e of elongations of the springs
(3) the vector y of forces in the springs (the springs exert forces -y on the nodes)
(4) the vector f of external forces on the nodes.

Each vector is connected to the one above it by a matrix equation:

    Ax = e  and  Ce = y  and  A^T y = f.

The first step is to find those matrices A and C and A^T.

(1 to 2)  The elongation in the i-th spring is e_i = x_i - x_{i-1}. One end of the spring moves a distance x_i, the other end moves x_{i-1}, and the difference is the elongation. Since the boundary displacements are zero (fixed endpoints means x_0 = x_4 = 0) we have four equations in three unknowns:

    e = Ax  or  [e_1; e_2; e_3; e_4] = [1 0 0; -1 1 0; 0 -1 1; 0 0 -1][x_1; x_2; x_3].    (7)

A positive e_i describes a spring in tension; e_4 < 0 for the lowest spring, in compression.

(2 to 3)  The force in the i-th spring is proportional to its elongation, y_i = c_i e_i.
This comes from Hooke's law for a linearly elastic material. It is the "constitutive law," determined by the spring constants c_i. It leads to a diagonal (but positive definite) matrix C:

    y = Ce  or  [y_1; y_2; y_3; y_4] = diag(c_1, c_2, c_3, c_4)[e_1; e_2; e_3; e_4].    (8)

(3 to 4)  The net force at each node is zero: f_i = y_i - y_{i+1}. This is the equilibrium equation, balancing the internal forces from the springs against the external forces f acting at the nodes. The force f_i (positive downwards) acts to stretch spring i and compress spring i + 1. Therefore the balance of forces is

    f = A^T y  or  [f_1; f_2; f_3] = [1 -1 0 0; 0 1 -1 0; 0 0 1 -1][y_1; y_2; y_3; y_4].    (9)

It is no accident that the matrix connecting x to e is the transpose of the matrix connecting y to f. That fact appears automatically in the fundamental variational principles of physics. It means that the internal work of stretching the springs equals the external work done at the nodes:

    y^T e = y^T Ax = f^T x.    (10)

In this case the external forces f_1, f_2, f_3 are m_1 g, m_2 g, and m_3 g, the masses multiplied by the gravitational constant.

Fig. 1.8. The basic framework of mechanics: displacements x, elongations e = Ax, spring forces y = Ce, external forces f = A^T y, with K = A^T CA.

Now we assemble the three laws into one. This is simple but crucial. The equations e = Ax and y = Ce combine to give y = CAx; e has been eliminated. Then CAx is substituted for y in the third law A^T y = f, to give the fundamental equation:

1I  The nodal displacements x are determined from the nodal forces f by the equations

    Ax = e,  Ce = y,  A^T y = f,  or  A^T CAx = f.    (11)

The matrix K = A^T CA is symmetric positive definite.

You see here the fundamental equation of mechanics, for a system in equilibrium.

The Stiffness Matrix

The coefficient matrix A^T CA is one step beyond the A^T A of least squares. It combines all three laws, and it is a positive definite matrix, known in computational mechanics as the stiffness matrix and denoted by K. The same matrix K = A^T CA appears in the underlying minimum principle.
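The assembly of K = A^T CA for the four-spring line of Fig. 1.7 can be reproduced directly. A NumPy sketch (our addition; the unit spring constants and unit forces are placeholders):

```python
import numpy as np

# Difference matrix A for four springs and three masses, fixed ends: e_i = x_i - x_{i-1}
A = np.array([[ 1.,  0.,  0.],
              [-1.,  1.,  0.],
              [ 0., -1.,  1.],
              [ 0.,  0., -1.]])

c = np.array([1., 1., 1., 1.])   # spring constants (identical springs here)
C = np.diag(c)

K = A.T @ C @ A                  # stiffness matrix: the tridiagonal 2, -1 pattern
f = np.array([1., 1., 1.])       # unit external forces at the nodes
x = np.linalg.solve(K, f)        # nodal displacements, all positive
print(K)
print(x)
```

With all forces acting downwards, every component of x comes out positive, the positivity property of K^{-1} discussed below in Remark 1.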
The springs search for the position in which the total potential energy, of masses and springs together, is a minimum. The potential energy of the masses is negative,

    -m_1 g x_1 - m_2 g x_2 - m_3 g x_3 = -x^T f,

since it requires that much work to lift them back to their original positions. The potential energy of the springs is positive,

    (1/2)c_1 e_1^2 + (1/2)c_2 e_2^2 + (1/2)c_3 e_3^2 + (1/2)c_4 e_4^2 = (1/2)e^T Ce = (1/2)x^T A^T CAx.

The springs are under strain and will give back energy if they are released.† Thus the total potential energy, for any displacements x, is the quadratic

    P(x) = (1/2)x^T A^T CAx - x^T f.

The vector that minimizes P(x) is the one that solves equation (11): A^T CAx = f.

The structure of A^T CA is important. If A is m by n, then there are m springs and C is a square matrix of order m. The vectors e and y have m components, and x and f have n (one for each node). In our case m = 4 and n = 3; multiplying the three matrices gives

    A^T C A = [ c_1 + c_2   -c_2        0 ;
                -c_2        c_2 + c_3   -c_3 ;
                0           -c_3        c_3 + c_4 ].

Off the diagonal, the nonzero entries reflect the connections between nodes. Since node 1 is not connected to node 3, the (1, 3) and (3, 1) entries are zero. Whenever the springs are connected along a line, their matrix A^T CA will be tridiagonal: there are only three nonzero diagonals, and the (i, j) entry is zero if |i - j| > 1.

Suppose the springs are identical, with spring constants c_i = 1. Then the stiffness matrix A^T CA has the special form

    A^T CA = [2 -1 0; -1 2 -1; 0 -1 2].

It is a "second difference" matrix, since the equation A^T CAx = f involves the differences -x_{i+1} + 2x_i - x_{i-1} = f_i, with x_0 = x_4 = 0. At a typical node like the second, the forces are x_1 - x_2 from the spring above, x_3 - x_2 from the spring below, and f_2 from gravity. We will see how the second differences become second derivatives, for a continuum of springs.

Remark 1  There is a further property of A^T CA which appears after elimination.
Elimination factors the identical-spring matrix into LDL^T:

    [2 -1 0; -1 2 -1; 0 -1 2] = [1 0 0; -1/2 1 0; 0 -2/3 1][2 0 0; 0 3/2 0; 0 0 4/3][1 -1/2 0; 0 1 -2/3; 0 0 1].    (12)

We knew the pivots would be positive, because A^T CA is positive definite. As in the earlier example of elimination (the 1, 2, 1 tridiagonal matrix in Section 1.2, with +1 instead of -1 off the diagonal), the factors L and L^T are bidiagonal; their structure is inherited from A^T CA. The new property is the positivity of the inverse:

    L^{-1} = [1 0 0; 1/2 1 0; 1/3 2/3 1]  and  (A^T CA)^{-1} = L^{-T} D^{-1} L^{-1} = (1/4)[3 2 1; 2 4 2; 1 2 3].    (13)

The inverse of the stiffness matrix is positive. It is also positive definite, but you should not confuse those two properties: neither one implies the other. Positivity means that when all the forces f go in one direction, so do all the displacements x. Multiplying a positive f by a positive matrix gives a positive x = (A^T CA)^{-1} f. In the continuous case we will find the same property for a membrane; when all forces act downwards, the displacement is everywhere down. (It is not true for a beam, governed by fourth derivatives or fourth differences.) The key is in the negative entries off the diagonal of A^T CA.‡ They guarantee K^{-1} > 0.

Remark 2  A computer would solve this problem at the speed of light, but there is a crucial point to be made about the computation. The inverse is a full matrix, without the band structure and the zeros of A^T CA. In this case we actually lose if we allow K^{-1} into the computation; multiplying f by the inverse takes much longer (n^2 steps for an n by n matrix) than ordinary elimination. The solution should be found as x = L^{-T} D^{-1} L^{-1} f, but not by combining those three matrices into a single inverse.

† At elongation e, the force is ce, and the work in a further stretching is ce de. Integrating from e = 0, the total energy stored in the spring is (1/2)ce^2.
Instead each separate step L^{-1}f, D^{-1}(L^{-1}f), and L^{-T}(D^{-1}L^{-1}f) is a simple back substitution, or in the case of c = L^{-1}f it is a forward substitution:

    Lc = f  is  [1 0 0; -1/2 1 0; 0 -2/3 1][c_1; c_2; c_3] = [f_1; f_2; f_3],

and c_1, c_2, c_3 are found in that order. For an n by n matrix this needs only n - 1 multiplications, D^{-1} needs n more, and back substitution uses n - 1, a total of 3n - 2, which is much less than n^2. The solution of tridiagonal systems, or any system with a narrow band matrix, is tremendously fast, provided the inverse never appears.

‡ A matrix with non-positive off-diagonal entries is an M-matrix if its inverse is nonnegative. No fewer than 40 equivalent descriptions have been given without assuming symmetry: all pivots are positive, all real eigenvalues are positive, and 38 others. With symmetry this means it is positive definite.

EXERCISES

1.4.1  Find the minimum value (and the minimizing x) for
(i) P(x) = (1/2)(x_1^2 + x_2^2) - x_1 b_1 - x_2 b_2
(ii) P(x) = (1/2)(x_1^2 + 2x_1x_2 + 2x_2^2) - x_1 + x_2.

1.4.2  What equations determine the minimizing x for
(i) P = (1/2)x^T A x - x^T b
(ii) P = (1/2)x^T A^T A x - x^T A^T b
(iii) E = ||Ax - b||^2?

1.4.3  Find the best least squares solutions to all three problems.

1.4.4  Find the best least squares solution to

    [1 1; 1 2; 1 3][C; D] = [0; 4; 2].

Graph the measurements 0, 4, 2 at times t = 1, 2, 3, and the best straight line.

1.4.5  The best fit to b_1, b_2, b_3, b_4 by a horizontal line (a constant function y = C) is their average C = (b_1 + b_2 + b_3 + b_4)/4. Confirm this by least squares solution of

    [1; 1; 1; 1][C] = [b_1; b_2; b_3; b_4].

From calculus, which C minimizes the error E = (b_1 - C)^2 + ... + (b_4 - C)^2?

1.4.6  Which three equations are to be solved by least squares, if we look for the line y = Dt through the origin that best fits the data y = 4 at t = 1, y = 5 at t = 2, y = 8 at t = 3? What is the least squares solution D that gives the best line?
1.4.7  For the three measurements b = 0, 3, 12 at times t = 0, 1, 2, find
(i) the best horizontal line y = C
(ii) the best straight line y = C + Dt
(iii) the best parabola y = C + Dt + Et^2.

1.4.8  Multiple regression fits two-dimensional data by a plane y = C + Dt + Ez, instead of one-dimensional data by a line. If we are given b_1 = 2 at t = z = 0; b_2 = 2 at t = 0, z = 1; b_3 = 1 at t = 1, z = 0; and b_4 = 5 at t = z = 1, write down the 4 equations in the 3 unknowns C, D, E. What is the least squares solution from the normal equations?

1.4.9  For a matrix with more columns than rows, like

    A = [1 1 0; 2 2 1]  or even  A = [1 2],

the matrix A^T A is not positive definite. Why is it impossible for these columns to be independent? Compute A^T A in both cases and check that it is singular.

1.4.10  In a system with three springs and two forces and displacements, write out the equations e = Ax, y = Ce, and A^T y = f. For unit forces and spring constants, what are the displacements?

1.4.11  Suppose the lowest spring in Fig. 1.7 is removed, leaving masses m_1, m_2, m_3 hanging from the three remaining springs. The equation e = Ax becomes

    [e_1; e_2; e_3] = [1 0 0; -1 1 0; 0 -1 1][x_1; x_2; x_3].

Find the corresponding equations y = Ce and A^T y = f, and solve the last equation for y. This is the determinate case, with square matrices, when the factors in A^T CA can be inverted separately and y can be found before x.

1.4.12  For the same 3 by 3 problem find K = A^T CA and A^{-1} and K^{-1}. If the forces f_1, f_2, f_3 are all positive, acting in the same direction, how do you know that the displacements x_1, x_2, x_3 are also positive?

1.4.13  If the matrix A is a column of 1's as in problem 1.4.5 above, can you invent a four spring, one displacement system which produces this matrix? With spring constants c_1, c_2, c_3, c_4 and force f, what is the displacement x?
1.4.14 If two springs with constants $c_1$ and $c_2$ hang (i) from separate supports, or (ii) with one attached below the other, what masses $m_1$ and $m_2$ in each case will produce equal displacements $x_1$ and $x_2$?

1.4.15 Suppose the three equations are changed to $e = Ax - b$, $y = Ce$, and $A^T y = 0$. Eliminate $e$ and $y$ to find a single equation for $x$.

EIGENVALUES AND DYNAMICAL SYSTEMS ■ 1.5

There is another way to study a symmetric matrix, more subtle than $A = LDL^T$ and less mechanical. It involves the eigenvalues and eigenvectors of $A$. For any square matrix we can look for vectors $x$ that are in the same direction as $Ax$; those are the eigenvectors. Multiplication by $A$ normally changes the direction of a vector, but for certain exceptional vectors $Ax$ is a multiple of $x$. That is the basic equation $Ax = \lambda x$. The number $\lambda$ is the eigenvalue, which tells whether $x$ is stretched or contracted or reversed by $A$.

For the identity matrix nothing changes direction. Every vector is an eigenvector and the eigenvalue is $\lambda = 1$. Of course this is special. It is much more normal to have $n$ different eigenvalues, and they are easy to recognize when the matrix is diagonal:

$$A = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \end{bmatrix} \quad\text{has}\quad \begin{matrix} \lambda_1 = 3 \text{ with } x_1 = (1, 0, 0) \\ \lambda_2 = 2 \text{ with } x_2 = (0, 1, 0) \\ \lambda_3 = 0 \text{ with } x_3 = (0, 0, 1). \end{matrix}$$

On each eigenvector the matrix acts like a multiple of the identity, but the multiples 3, 2, 0 are different in the three directions. A vector like $x = (2, 4, 6)$ is a mixture $2x_1 + 4x_2 + 6x_3$ of eigenvectors, and when $A$ multiplies $x$ it gives

$$Ax = 2\lambda_1 x_1 + 4\lambda_2 x_2 + 6\lambda_3 x_3 = 6x_1 + 8x_2.$$

This was a typical vector $x$—not an eigenvector—but the action of $A$ was still determined by its eigenvalues and eigenvectors. Notice that there was nothing exceptional about $\lambda_3 = 0$. Like every other number, zero might be an eigenvalue and it might not. If it is, then its eigenvector satisfies $Ax = 0x$. Thus $x$ is in the nullspace of $A$; a zero eigenvalue signals that $A$ has linearly dependent columns (and rows). Its determinant is zero.
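The action of the diagonal matrix on the mixture $x = (2, 4, 6)$ can be verified directly (a small pure-Python check, our addition):

```python
# The diagonal matrix A = diag(3, 2, 0) acting on the mixture x = (2, 4, 6):
# A*x should equal 2*lam1*x1 + 4*lam2*x2 + 6*lam3*x3 = 6*x1 + 8*x2 = (6, 8, 0).

A = [[3, 0, 0], [0, 2, 0], [0, 0, 0]]
x = [2, 4, 6]

Ax = [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]
print(Ax)   # [6, 8, 0]
```

The third component is killed because $\lambda_3 = 0$: the $x_3$-part of any vector lies in the nullspace of $A$.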
Invertible matrices have $\lambda \neq 0$, while singular matrices include zero among their eigenvalues.

Diagonal matrices are certainly the simplest. The eigenvalues are the diagonal entries and the eigenvectors are in the coordinate directions. For other matrices we find the eigenvalues first, before the eigenvectors, by the following test: If $Ax = \lambda x$ then $(A - \lambda I)x = 0$ and $A - \lambda I$ has dependent columns. Therefore the determinant of $A - \lambda I$ must be zero. In other words, shifting the matrix by $\lambda I$ makes it singular. We look for those exceptional numbers $\lambda$, and the eigenvector is in the nullspace of $A - \lambda I$.† Actually it is not correct to speak about the eigenvector, since if $x$ is doubled or tripled or multiplied by $-1$ it is still an eigenvector. The equation $Ax = \lambda x$ still holds after rescaling $x$, and it is the direction of $x$ that is important.

EXAMPLE 1.

$$A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \quad\text{and}\quad A - \lambda I = \begin{bmatrix} 2-\lambda & 1 \\ 1 & 2-\lambda \end{bmatrix}.$$

The eigenvalues of $A$ are the numbers that make the determinant of $A - \lambda I$ zero:

$$\det(A - \lambda I) = (2-\lambda)^2 - 1 = \lambda^2 - 4\lambda + 3 = 0.$$

This factors into $(\lambda - 1)(\lambda - 3) = 0$, and the eigenvalues are 1 and 3. For each eigenvalue the equation $(A - \lambda I)x = 0$ yields the corresponding eigenvector:

$$\lambda_1 = 1: \quad A - \lambda_1 I = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \quad x_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

$$\lambda_2 = 3: \quad A - \lambda_2 I = \begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix}, \quad x_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

You see that the shifted matrix $A - \lambda I$ has dependent columns and $(A - \lambda I)x = 0$ in each case. That is equivalent to $Ax = \lambda x$. If the calculation is done in this order, the equation $Ax = \lambda x$ can be used to check the algebra.

This example is partly special and partly typical. For a 2 by 2 matrix, the determinant of $A - \lambda I$ will always be a quadratic polynomial; its first term is $\lambda^2$. The eigenvalues may not be integers like 1 and 3. They can easily turn out to be complex numbers in the unsymmetric case

$$A = \begin{bmatrix} a & b \\ b' & c \end{bmatrix} \quad\text{and}\quad \det\begin{bmatrix} a-\lambda & b \\ b' & c-\lambda \end{bmatrix} = \lambda^2 - (a + c)\lambda + ac - bb'.$$

This polynomial, whose roots are the eigenvalues, is the characteristic polynomial of $A$. For an $n$ by $n$ matrix it has degree $n$ and begins with $(-\lambda)^n$.
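For the 2 by 2 case, the quadratic formula applied to $\lambda^2 - (a+c)\lambda + (ac - bb')$ gives the eigenvalues directly. A short pure-Python sketch (our addition; the unsymmetric matrix $\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ is our example, not the text's) shows both the real and the complex possibilities:

```python
# Roots of the 2 by 2 characteristic polynomial
#   lam^2 - (a + c)*lam + (a*c - b*bp)
# via the quadratic formula; cmath handles the complex case.
import cmath

def eigenvalues_2x2(a, b, bp, c):
    """Eigenvalues of [[a, b], [bp, c]] from its characteristic polynomial."""
    disc = cmath.sqrt((a + c) ** 2 - 4 * (a * c - b * bp))
    return ((a + c + disc) / 2, (a + c - disc) / 2)

# Symmetric Example 1: A = [[2, 1], [1, 2]] has eigenvalues 3 and 1
print(eigenvalues_2x2(2, 1, 1, 2))

# An unsymmetric choice (our illustration): [[0, 1], [-1, 0]]
# has lam^2 + 1 = 0, so the eigenvalues are i and -i
print(eigenvalues_2x2(0, 1, -1, 0))
```

A real symmetric matrix ($b' = b$) always has $(a + c)^2 - 4(ac - b^2) = (a - c)^2 + 4b^2 \geq 0$, so its eigenvalues are real; the unsymmetric case has no such guarantee.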
There are $n$ roots and therefore $n$ eigenvalues. For each eigenvalue there is a solution of $(A - \lambda I)x = 0$—or possibly several independent eigenvectors $x$, if $\lambda$ is a repeated eigenvalue.

In the 2 by 2 case the formula for the roots of a quadratic gives the eigenvalues exactly. When $n = 3$ and $n = 4$ there are still formulas for the roots, although they are seldom remembered and rarely used. For $n > 4$ there is not and never will be such a formula. Most eigenvalues have to be found numerically, and that is not done by computing the determinant of $A - \lambda I$. Instead the matrix is gradually brought into diagonal form (see Chapter 5) and the eigenvalues begin to appear on the main diagonal. The computation takes longer than elimination—perhaps 20 times longer, if it achieves good accuracy—but that is not unreasonable. The algorithms for eigenvalues depend on the size and structure of $A$, and they are a major part of numerical linear algebra.

† We use $I$ to keep vectors and matrices straight; $(A - \lambda)x = 0$ is shorter but mixed up.

The Diagonal Form

It is fair to say that diagonalization is the great contribution of the eigenvectors. Suppose the columns of $S$ are the eigenvectors of $A$. Then we want to show that $S^{-1}AS$ is a diagonal matrix. Furthermore it is extremely special; it contains the eigenvalues. It is denoted by $\Lambda$ (capital lambda) as a way of remembering that $\lambda_1, \ldots, \lambda_n$ are the entries on the diagonal.

Suppose the $n$ by $n$ matrix $A$ has $n$ linearly independent eigenvectors. Then if these vectors are the columns of $S$, it follows that

$$S^{-1}AS = \Lambda = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix}. \qquad (1)$$

This is so important that we do it twice. First the eigenvectors $x_i$ go into $S$ and we multiply a column at a time. Since they are eigenvectors,

$$AS = A\begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix} = \begin{bmatrix} \lambda_1 x_1 & \cdots & \lambda_n x_n \end{bmatrix}.$$
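The diagonalization $S^{-1}AS = \Lambda$ can be checked concretely for the matrix of Example 1 (a pure-Python verification, our addition; the eigenvector order $\lambda_1 = 1$, $\lambda_2 = 3$ follows the example):

```python
# Check S^{-1} A S = Lambda for Example 1: A = [[2, 1], [1, 2]],
# eigenvectors x1 = (1, -1) for lam1 = 1 and x2 = (1, 1) for lam2 = 3.

def matmul(X, Y):
    """Multiply two 2 by 2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[2, 1], [1, 2]]
S = [[1, 1], [-1, 1]]   # columns are the eigenvectors x1, x2

# Inverse of a 2 by 2 matrix by the cofactor formula
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
Sinv = [[ S[1][1] / det, -S[0][1] / det],
        [-S[1][0] / det,  S[0][0] / det]]

Lam = matmul(Sinv, matmul(A, S))
print(Lam)   # [[1.0, 0.0], [0.0, 3.0]]
```

Multiplying $AS$ a column at a time reproduces the argument in the text: $A$ times the first column gives $1 \cdot x_1$ and $A$ times the second gives $3 \cdot x_2$, so $S^{-1}AS$ collapses to the diagonal matrix of eigenvalues.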