Differential Equations with Linear Algebra

Matthew R. Boelkins, J. L. Goldberg, and Merle C. Potter


Oxford University Press, Inc., publishes works that further Oxford University’s objective of excellence in research, scholarship, and education. Oxford New York Auckland Cape Town Dar es Salaam Hong Kong Karachi Kuala Lumpur Madrid Melbourne Mexico City Nairobi New Delhi Shanghai Taipei Toronto With ofﬁces in Argentina Austria Brazil Chile Czech Republic France Greece Guatemala Hungary Italy Japan Poland Portugal Singapore South Korea Switzerland Thailand Turkey Ukraine Vietnam

Copyright © 2009 by Oxford University Press, Inc.

Published by Oxford University Press, Inc.
198 Madison Avenue, New York, New York 10016
www.oup.com

Oxford is a registered trademark of Oxford University Press

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of Oxford University Press.

Library of Congress Cataloging-in-Publication Data
Boelkins, Matthew R.
Differential equations with linear algebra / Matthew R. Boelkins, J.L. Goldberg, Merle C. Potter.
p. cm.
Includes index.
ISBN 978-0-19-538586-1 (cloth)
1. Differential equations, Linear. 2. Algebras, Linear. I. Goldberg, Jack L. (Jack Leonard), 1932– II. Potter, Merle C. III. Title.
QA372.B657 2009
515 .354–dc22
2008050361

9 8 7 6 5 4 3 2 1

Printed in the United States of America on acid-free paper

Contents

Introduction

1 Essentials of linear algebra
  1.1 Motivating problems
  1.2 Systems of linear equations
    1.2.1 Row reduction using Maple
  1.3 Linear combinations
    1.3.1 Markov chains: an application of matrix-vector multiplication
    1.3.2 Matrix products using Maple
  1.4 The span of a set of vectors
  1.5 Systems of linear equations revisited
  1.6 Linear independence
  1.7 Matrix algebra
    1.7.1 Matrix algebra using Maple
  1.8 The inverse of a matrix
    1.8.1 Computer graphics
    1.8.2 Matrix inverses using Maple
  1.9 The determinant of a matrix
    1.9.1 Determinants using Maple
  1.10 The eigenvalue problem
    1.10.1 Markov chains, eigenvectors, and Google
    1.10.2 Using Maple to find eigenvalues and eigenvectors
  1.11 Generalized vectors
  1.12 Bases and dimension in vector spaces
  1.13 For further study
    1.13.1 Computer graphics: geometry and linear algebra at work
    1.13.2 Bézier curves
    1.13.3 Discrete dynamical systems

2 First-order differential equations
  2.1 Motivating problems
  2.2 Definitions, notation, and terminology
    2.2.1 Plotting slope fields using Maple
  2.3 Linear first-order differential equations
  2.4 Applications of linear first-order differential equations
    2.4.1 Mixing problems
    2.4.2 Exponential growth and decay
    2.4.3 Newton’s law of Cooling
  2.5 Nonlinear first-order differential equations
    2.5.1 Separable equations
    2.5.2 Exact equations
  2.6 Euler’s method
    2.6.1 Implementing Euler’s method in Excel
  2.7 Applications of nonlinear first-order differential equations
    2.7.1 The logistic equation
    2.7.2 Torricelli’s law
  2.8 For further study
    2.8.1 Converting certain second-order DEs to first-order DEs
    2.8.2 How raindrops fall
    2.8.3 Riccati’s equation
    2.8.4 Bernoulli’s equation

3 Linear systems of differential equations
  3.1 Motivating problems
  3.2 The eigenvalue problem revisited
  3.3 Homogeneous linear first-order systems
  3.4 Systems with all real linearly independent eigenvectors
    3.4.1 Plotting direction fields for systems using Maple
  3.5 When a matrix lacks two real linearly independent eigenvectors
  3.6 Nonhomogeneous systems: undetermined coefficients
  3.7 Nonhomogeneous systems: variation of parameters
    3.7.1 Applying variation of parameters using Maple
  3.8 Applications of linear systems
    3.8.1 Mixing problems
    3.8.2 Spring-mass systems
    3.8.3 RLC circuits
  3.9 For further study
    3.9.1 Diagonalizable matrices and coupled systems
    3.9.2 Matrix exponential

4 Higher order differential equations
  4.1 Motivating equations
  4.2 Homogeneous equations: distinct real roots
  4.3 Homogeneous equations: repeated and complex roots
    4.3.1 Repeated roots
    4.3.2 Complex roots
  4.4 Nonhomogeneous equations
    4.4.1 Undetermined coefficients
    4.4.2 Variation of parameters
  4.5 Forced motion: beats and resonance
  4.6 Higher order linear differential equations
    4.6.1 Solving characteristic equations using Maple
  4.7 For further study
    4.7.1 Damped motion
    4.7.2 Forced oscillations with damping
    4.7.3 The Cauchy–Euler equation
    4.7.4 Companion systems and companion matrices

5 Laplace transforms
  5.1 Motivating problems
  5.2 Laplace transforms: getting started
  5.3 General properties of the Laplace transform
  5.4 Piecewise continuous functions
    5.4.1 The Heaviside function
    5.4.2 The Dirac delta function
    5.4.3 The Heaviside and Dirac functions in Maple
  5.5 Solving IVPs with the Laplace transform
  5.6 More on the inverse Laplace transform
    5.6.1 Laplace transforms and inverse transforms using Maple
  5.7 For further study
    5.7.1 Laplace transforms of infinite series
    5.7.2 Laplace transforms of periodic forcing functions
    5.7.3 Laplace transforms of systems

6 Nonlinear systems of differential equations
  6.1 Motivating problems
  6.2 Graphical behavior of solutions for 2 × 2 nonlinear systems
    6.2.1 Plotting direction fields of nonlinear systems using Maple
  6.3 Linear approximations of nonlinear systems
  6.4 Euler’s method for nonlinear systems
    6.4.1 Implementing Euler’s method for systems in Excel
  6.5 For further study
    6.5.1 The damped pendulum
    6.5.2 Competitive species

7 Numerical methods for differential equations
  7.1 Motivating problems
  7.2 Beyond Euler’s method
    7.2.1 Heun’s method
    7.2.2 Modified Euler’s method
  7.3 Higher order methods
    7.3.1 Taylor methods
    7.3.2 Runge–Kutta methods
  7.4 Methods for systems and higher order equations
    7.4.1 Euler’s method for systems
    7.4.2 Heun’s method for systems
    7.4.3 Runge–Kutta method for systems
    7.4.4 Methods for higher order IVPs
  7.5 For further study
    7.5.1 Predator–Prey equations
    7.5.2 Competitive species
    7.5.3 The damped pendulum

8 Series solutions for differential equations
  8.1 Motivating problems
  8.2 A review of Taylor and power series
  8.3 Power series solutions of linear equations
  8.4 Legendre’s equation
  8.5 Three important examples
    8.5.1 The Hermite equation
    8.5.2 The Laguerre equation
    8.5.3 The Bessel equation
  8.6 The method of Frobenius
  8.7 For further study
    8.7.1 Taylor series for first-order differential equations
    8.7.2 The Gamma function

Appendix A Review of integration techniques
Appendix B Complex numbers
Appendix C Roots of polynomials
Appendix D Linear transformations
Appendix E Solutions to selected exercises

Index


Introduction

In Differential Equations with Linear Algebra, we endeavor to introduce students to two interesting and important areas of mathematics that enjoy powerful interconnections and applications. Assuming that students have completed a semester of multivariable calculus, the text presents an introduction to critical themes and ideas in linear algebra, and then, in its remaining seven chapters, investigates differential equations while highlighting the role that linearity plays in their study.

Throughout the text, we strive to reach the following goals:

• To motivate the study of linear algebra and differential equations through interesting applications in order that students may see how theoretical results can answer fundamental questions that arise in physical situations.

• To demonstrate the fact that linear algebra and differential equations can be presented as two parts of a mathematical whole that is coherent and interconnected. Indeed, we regularly discuss how the structure of solutions to linear differential equations and systems of equations exemplifies important ideas in linear algebra, and how linear algebra often answers key questions regarding differential equations.

• To present an exposition that is intended to be read and understood by students. While certainly every textbook is written with students in mind, often the rigor and formality of standard mathematical presentation takes over, and books become difficult to read. We employ an examples-first philosophy that uses an intuitive approach as a lead-in to more general, theoretical results.

• To develop in students a deep understanding of what may be their first exposure to post-calculus mathematics. In particular, linear algebra is a fundamental subject that plays a key role in the study of much higher level mathematics; through its study, as well as our investigations of differential equations, we aim to provide a foundation for further study in mathematics for students who are so interested.

Whether designed for mathematics or engineering majors, many universities offer a hybrid course in linear algebra and differential equations, and this text is written for precisely such a class. At other institutions, linear algebra and differential equations are treated in two separate courses; in settings where linear algebra is a prerequisite to the study of differential equations, this text may also be used for the differential equations course, with its first chapter on linear algebra available as a review of previously studied material. More details on the ways the book can be implemented in these courses follow shortly in the section How to use this text. An overriding theme of the book is that if a differential equation or system of such equations is linear, then we can usually solve it exactly.

Linear algebra and systems ﬁrst

In most other texts that present the subjects of differential equations and linear algebra, the presentation begins with first-order differential equations, followed by second- and higher order linear differential equations. Following these topics, a modest amount of linear algebra is introduced before beginning to consider systems of linear differential equations. Here, however, we begin on the very first page of the text with an example that shows the natural way that systems of linear differential equations arise, and use this example to motivate the need to study linear algebra. We then embark on a one-chapter introduction to linear algebra that aims not only to introduce such important concepts as linear combinations, linear independence, and the eigenvalue problem, but also to foreshadow the use of such topics in the study of differential equations.

Following chapter 1, we consider first-order differential equations briefly in chapter 2, using the study of linear first-order equations to highlight some of the key ideas already encountered in linear algebra. From there, we quickly proceed to an in-depth presentation of systems of linear differential equations in chapter 3. In that setting, we show how the eigenvalues of an n × n matrix A naturally provide the general solution to systems of linear differential equations in the form x′ = Ax. Moreover, we include examples that show how any single higher order linear differential equation may be converted to a system of equations, thus providing further motivation for why we choose to study systems first. Through this approach, we again strive to emphasize critical connections between linear algebra and differential equations and to demonstrate the most important ideas that arise in the study of each. In the remainder of the text, the role of linear algebra is continually emphasized, even in the study of nonlinear equations and systems.

Features of the text

Instructors and students alike will find several consistent features in the presentation.

• Each chapter begins with one or two motivating problems that present a natural situation (often a physical application) in which linear algebra or differential equations arise. From such problems, we work to develop related ideas in subsequent sections that enable us to solve the original problem. In discussing the motivating problems, we also endeavor to use our intuition to predict the solution(s) we expect to find, and then later test our results against these predictions.

• In almost every section of the text, we use an examples-first approach. By this we mean that we introduce a certain type of problem that we are interested in solving, and then consider a relatively simple one that can be solved by intuition or ideas studied previously. From the solution of an elementary example, we then discuss how this approach can be generalized or modified to solve more complex examples, and then ultimately prove or state theorems that provide general results that enable the solution of a wide range of problems. With this philosophy, we strive to demonstrate how the general theory of mathematics comes from experimenting and investigating through individual examples followed by looking for overall trends. Moreover, we often use this approach to foreshadow upcoming ideas: for example, while studying linear algebra, we look ahead to a handful of fundamental differential equations. Similarly, early on in our investigations of the Laplace transform, we regularly attempt to demonstrate through examples how the transform will be used to solve initial-value problems.

• While there are many formal theoretical results that hold in both linear algebra and differential equations, we have endeavored to emphasize intuition. Specifically, we use the aforementioned examples-first approach to solve sample problems and then present evidence as to why the details of the solution process for a small number of examples can be generalized to an overall structure and theory. This is in contrast to many books that first present the overall theory, and then demonstrate the theory at work in a sequence of subsequent examples. In addition, we often eschew formal proofs, choosing instead to present more heuristic or intuitive arguments that offer evidence of the truth of important theorems.

• Wherever possible, we use visual reasoning to help explain important ideas. With over 100 graphics included in the text, we have provided figures that help deepen students’ understanding and offer additional perspective on essential concepts. By thinking graphically, we often find that an appropriate picture sheds further light on the solution to a problem and how we should expect it to behave, thus adding to our intuition and understanding.

• With computer algebra systems (CASs), such as Maple and Mathematica, approaching their twentieth year of existence, these technologies are an important part of the landscape of the teaching and learning of mathematics. Especially in more sophisticated subjects with computationally complicated problems, these tools are now indispensable. We have chosen to integrate instructional support for Maple directly within the text, while offering similar commentary for Mathematica, MATLAB, and SAGE on our website, www.oup.com/differentialequations/. For each, students can find directions for how to effectively use computer algebra systems to generate important graphs and execute complicated or tedious calculations. Many sections of the text are followed by a short subsection on “Using Maple to . . ..” Parallel sections for the other CASs, numbered similarly, can be found on the website.

• Each chapter ends with a section titled For further study. In this setting, rather than a full exposition, a sequence of leading questions is presented to guide students to discover some key ideas in more advanced problems that arise naturally from the material developed to date. These sections can be used as a basis for instructor-led in-class discussions or as the foundation for student projects or other assignments. Interested students can also pursue these topics on their own.

How to use this text

There are two courses for which this text is well-suited: a hybrid course in linear algebra and differential equations, or a course in differential equations that requires linear algebra as a prerequisite. We address each course separately with some suggestions for instructors.

Linear algebra and differential equations

For a hybrid course in the two subjects, instructors should begin with chapter 1 on linear algebra. There, in addition to an introduction to many essential ideas in the subject, students will encounter a handful of examples on linear differential equations that foreshadow part of the role of linear algebra in the field of differential equations. The goal of the chapter on linear algebra is to introduce important ideas such as linear combinations, linear independence and span, matrix algebra, and the eigenvalue problem. At the close of chapter 1
we also introduce abstract vector spaces in anticipation of the structural role that vector spaces play in solving linear systems of differential equations and higher order linear differential equations. Instructors may choose to move on from chapter 1 upon completing section 1.10 (the eigenvalue problem), as this is the last topic that is absolutely essential for the solution of linear systems of differential equations in chapter 3. Discussion of ideas like basis, dimension, and vector spaces of functions from the ﬁnal two sections of chapter 1 can occur alongside the development of general solutions to systems of linear differential equations or higher order linear differential equations. Over the past decade or two, ﬁrst-order differential equations have become a standard topic that is normally discussed in calculus courses. As such, chapter 2 can be treated lightly at the instructor’s discretion. In particular, it is reasonable to expect that students are familiar with direction ﬁelds, separable differential equations, Euler’s method, and several fundamental applications, such as Newton’s law of Cooling and the logistic differential equation. It is less likely that students will have been exposed to integrating factors as a solution technique for linear ﬁrst-order equations and the solution methods for exact equations. In any case, chapter 2 is not one on which to linger. Instructors can choose to selectively discuss a small number of sections in class, or assign the pages there as a reading assignment or project for independent investigation. Chapter 3 on systems of linear differential equations is the heart of the text. It can be begun immediately following section 1.10 in chapter 1. Here we ﬁnd not only a large number of rich ideas that are important throughout the study of differential equations, but also evidence of the essential role that linear algebra plays in the solution of these systems. 
As is noted on several occasions in chapter 3, any higher order linear differential equation may be converted to a system of ﬁrst-order equations, and thus an understanding of systems enables one to solve these higher order equations as well. Thus, the material in chapter 4 may be de-emphasized. Instructors may choose to provide a brief overview, in class, of how the ideas in solving linear systems translate naturally to the higher order case, or may choose to have students investigate these details on their own through a sequence of reading and homework assignments or a group project. Section 4.5 on beats and resonance is one to discuss in class as these phenomena are fascinating and important and the perspective of higher order equations is a more natural context in which to consider their solution. The Laplace transform is a topic that affords discussion of a variety of important ideas: linear transformations, differentiation and integration, direct solution of initial-value problems, discontinuous forcing functions, and more. In addition, it can be viewed as a gateway to more sophisticated mathematical techniques encountered in more advanced courses in mathematics, physics, and engineering. Chapter 5 is written with the goal of introducing students to the Laplace transform from the perspective of how it can be used to solve initial-value problems. This emphasis is present throughout the chapter, and culminates in section 5.5.


Finally, a course in both linear algebra and differential equations should not be considered complete until there has been at least some discussion of nonlinearity. Chapter 6 on nonlinear higher order equations and systems offers an examination of this concept from several perspectives, all of which are related to our previous work with linear differential equations. Direction fields, approximation by linear systems, and an introduction to numerical approximation with Euler’s method are natural topics with which to round out the course. Due to the time required to introduce the subject of linear algebra to students, the final two chapters of the text (on numerical methods and series solutions) are ones we would normally not expect to be considered in a hybrid course.

Differential equations with a linear algebra prerequisite

For a differential equations course in which students have already taken linear algebra, chapter 1 may be used as a reference for students, or as a source of review as needed. The comments for the hybrid course above for chapters 2–5 hold for a straight differential equations class as well, and we would expect instructors to use the time not devoted to the study of linear algebra to focus more on the material on nonlinearity in chapter 6, numerical methods in chapter 7, and series solutions in chapter 8. The first several sections of chapter 7 may be treated any time after first-order differential equations have been discussed; only the final section in that chapter is devoted to systems and higher order equations where the methods naturally generalize work with first-order equations. In addition to spending more time on the final three chapters of the text, instructors of a differential equations-only course can take advantage of the many additional topics for consideration in the For further study sections that close each chapter. There is a wide range of subjects from which to choose, both theoretical and applied, including discrete dynamical systems, how raindrops fall, matrix exponentials, companion matrices, Laplace transforms of periodic piecewise continuous forcing functions, and competitive species.

Appendices

Finally, the text closes with ﬁve appendices. The ﬁrst three—on integration techniques, polynomial zeros, and complex numbers—are intended as a review of familiar topics from courses as far back in students’ experience as high school algebra. The instructor can refer to these topics as necessary and encourage students to read them for review. Appendix D is different in that it aims to connect some key ideas in linear algebra and differential equations through a more sophisticated viewpoint: linear transformations of vector spaces. Some of the material there is appropriate for consideration following chapter 1, but it is perhaps more suited to discussion after the Laplace transform has been introduced. Finally, appendix E contains answers to nearly all of the odd-numbered exercises in the text.


Acknowledgments

We are grateful to our institutions for the time and support provided to work on this manuscript; to several anonymous reviewers whose comments have improved it; to our students for their feedback in classroom-testing of the text; and to all students and instructors who choose to use this book. We welcome all comments and suggestions for improvement, while taking full responsibility for any errors or omissions in the text.

Matt Boelkins/J. L. Goldberg/Merle Potter


1 Essentials of linear algebra

1.1 Motivating problems

The subjects of differential equations and linear algebra are particularly important because each finds a wide range of applications in fundamental physical problems. We consider two situations that involve systems of equations to motivate our work in this chapter and much of the remainder of the text.

The pollution of bodies of water is an important issue for humankind. Environmental scientists are particularly interested in systems of rivers and lakes where they can study the flow of a given pollutant from one body of water to another. For example, there is great concern regarding the presence of a variety of pollutants in the Great Lakes (Lakes Michigan, Superior, Huron, Erie, and Ontario), including salt due to snow melt from highways. Due to the large number of possible ways for salt to enter and exit such a system, as well as the many lakes and rivers involved, this problem is mathematically complicated. But we may gain a feel for how one might proceed by considering a simple system of two tanks, say A and B, where there are independent inflows and outflows from each, as well as two pipes with opposite flows connecting the tanks as pictured in figure 1.1.

We will let $x_1$ denote the amount of salt (in grams) in A at time $t$ (in minutes). Since water flows into and out of the tank, and each such flow carries salt, the amount of salt $x_1$ is changing as a function of time. We know from calculus that $dx_1/dt$ measures the rate of change of salt in the tank with respect to time, and is measured in grams per minute. In this basic model, we can see that the rate of change of salt in the tank will be the difference between the net rate of salt flowing in and the net rate of salt flowing out.

Figure 1.1 Two tanks with inflows, outflows, and connecting pipes.

As a simplifying assumption, we will suppose that the volume of solution in each tank remains constant and all inflows and outflows happen at the identical rate of 5 liters per minute. We will further assume that the tanks are uniformly mixed so that the salt concentration in each is identical throughout the tank at a given time $t$.

Let us now suppose that the volume of tank A is 200 liters; as we just noted, the pipe flowing into A delivers solution at a rate of 5 liters per minute. Moreover, suppose that this entering water is contaminated with 4 g of salt per liter. An analysis of the units on these quantities shows that the rate of inflow of salt into A is

\[
5\,\frac{\text{liters}}{\text{min}} \cdot 4\,\frac{\text{g}}{\text{liter}} = 20\,\frac{\text{g}}{\text{min}} \tag{1.1.1}
\]

There is one other inflow to consider, that being the pipe from B, which we will consider momentarily after first examining the behavior of the outflow. For the solution exiting the drain from A at a rate of 5 liters/min, observe its concentration is unknown and depends on the amount of salt in the tank at time $t$. In particular, since there are $x_1$ g of salt in the tank at time $t$, and this is distributed over the volume of 200 liters, we can say (using the simplifying assumption that the tank’s contents stay uniformly mixed) that the rate of outflow of salt in each of the exiting pipes is

\[
5\,\frac{\text{liters}}{\text{min}} \cdot \frac{x_1\ \text{g}}{200\ \text{liters}} = \frac{x_1}{40}\,\frac{\text{g}}{\text{min}} \tag{1.1.2}
\]

Since there are two such exit flows, this means that the combined rate of outflow of salt from A is twice this amount, or $x_1/20$ g/min. Finally, there is one last inflow to consider. Note that solution from B is entering A at a rate of 5 liters per minute. If we assume that B has a (constant) volume of 400 liters, this flow has a salt concentration of $x_2$ g/400 liters. Thus the rate of salt entering A from B is

\[
5\,\frac{\text{liters}}{\text{min}} \cdot \frac{x_2\ \text{g}}{400\ \text{liters}} = \frac{x_2}{80}\,\frac{\text{g}}{\text{min}} \tag{1.1.3}
\]
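The three rate calculations (1.1.1) through (1.1.3) follow one pattern: a flow rate multiplied by a concentration. As a minimal sketch in Python, using exact fractions, the coefficients can be reproduced as follows; the helper name `salt_rate` and the variable names are our own illustrative choices and do not come from the text.

```python
from fractions import Fraction

def salt_rate(flow_l_per_min, grams, liters):
    """Rate (g/min) at which salt moves through a pipe that carries
    solution of concentration grams/liters at the given flow rate."""
    return Fraction(flow_l_per_min) * Fraction(grams, liters)

FLOW = 5  # liters per minute, the same in every pipe

# Fresh inflow to A carries 4 g of salt per liter, as in (1.1.1).
inflow_a = salt_rate(FLOW, 4, 1)

# Coefficient of x1 in (1.1.2): one exit pipe of the 200-liter tank A.
out_coeff_per_pipe = salt_rate(FLOW, 1, 200)

# Coefficient of x2 in (1.1.3): the pipe from the 400-liter tank B.
from_b_coeff = salt_rate(FLOW, 1, 400)

# Two exit pipes leave A, so the combined outflow coefficient doubles.
print(inflow_a, out_coeff_per_pipe, 2 * out_coeff_per_pipe, from_b_coeff)
```

Working with `Fraction` rather than floating-point numbers keeps the coefficients 1/40, 1/20, and 1/80 in the same form in which they appear in the equations.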


Combining the rates of inflow (1.1.1) and (1.1.3) and outflow (1.1.2), where inflows are considered positive and outflows negative, leads us to the differential equation

\[
\frac{dx_1}{dt} = 20 + \frac{x_2}{80} - \frac{x_1}{20} \tag{1.1.4}
\]

Since we have two tanks in the system, there is a second differential equation to consider. Under the assumptions that B has a volume of 400 liters, the pipe entering B carries a concentration of salt of 7 g/liter, and the net rates of inflow and outflow match those into A, a similar analysis to the above reveals that

\[
\frac{dx_2}{dt} = 35 + \frac{x_1}{40} - \frac{x_2}{40} \tag{1.1.5}
\]

Together, these two DEs form a system of DEs, given by

\[
\begin{aligned}
\frac{dx_1}{dt} &= 20 + \frac{x_2}{80} - \frac{x_1}{20} \\
\frac{dx_2}{dt} &= 35 + \frac{x_1}{40} - \frac{x_2}{40}
\end{aligned} \tag{1.1.6}
\]

Systems of DEs are therefore seen to play a key role in environmental processes. Indeed, they find application in studying the vibrations of mechanical systems, the flow of electricity in circuits, the interactions between predators and prey, and much more. We will begin our examination of the mathematics involved with systems of differential equations in chapter 3.

An important question related to the above system of DEs leads us to a more familiar mathematical situation, one that is the foundation of much of the subject of linear algebra. For the system of tanks above, we might ask, “under what circumstances is the amount of salt in the two tanks not changing?” In such a situation, neither $x_1$ nor $x_2$ varies, so the rate of change of each is zero, and therefore

\[
\frac{dx_1}{dt} = \frac{dx_2}{dt} = 0
\]

Substituting these values into the system of DEs, we see that this results in the system of linear equations

\[
\begin{aligned}
0 &= 20 + \frac{x_2}{80} - \frac{x_1}{20} \\
0 &= 35 + \frac{x_1}{40} - \frac{x_2}{40}
\end{aligned} \tag{1.1.7}
\]

Multiplying both sides of the first equation by eighty and the second by forty and rearranging terms, we find an equivalent system to be

\[
\begin{aligned}
4x_1 - x_2 &= 1600 \\
x_1 - x_2 &= -1400
\end{aligned}
\]

Geometrically, this system of linear equations represents the set of all points that simultaneously lie on each of the two lines given by the respective equations.
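As a quick numerical check on the system (1.1.6) and the equilibrium equations derived from it, the following Python sketch solves the 2 × 2 linear system by elimination and then steps the differential equations forward in time. The time stepping uses a crude forward-Euler scheme, a technique borrowed from section 2.6 purely for illustration; the function names, the step size, and the choice of starting with no salt in either tank are all assumptions made for this example.

```python
from fractions import Fraction

def rates(x1, x2):
    """Right-hand sides of system (1.1.6): (dx1/dt, dx2/dt) in g/min."""
    dx1 = 20 + x2 / 80 - x1 / 20
    dx2 = 35 + x1 / 40 - x2 / 40
    return dx1, dx2

def equilibrium():
    """Solve 4*x1 - x2 = 1600 and x1 - x2 = -1400 by elimination."""
    # Subtracting the second equation from the first removes x2,
    # leaving (4 - 1) * x1 = 1600 - (-1400).
    x1 = Fraction(1600 - (-1400), 4 - 1)
    # Back-substitute into x1 - x2 = -1400.
    x2 = x1 + 1400
    return x1, x2

def simulate(x1, x2, minutes, dt=0.1):
    """Forward-Euler stepping of (1.1.6); a rough illustration only."""
    for _ in range(int(round(minutes / dt))):
        dx1, dx2 = rates(x1, x2)
        x1, x2 = x1 + dt * dx1, x2 + dt * dx2
    return x1, x2

eq = equilibrium()
print("equilibrium:", eq, "rates there:", rates(*eq))
print("after 1000 minutes from empty tanks:", simulate(0.0, 0.0, minutes=1000))
```

Both rates vanish at the computed equilibrium, and in this experiment the simulated salt amounts settle close to the same pair of values, which is consistent with (though of course not a proof of) the equilibrium attracting nearby solutions.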


The solution of such 2 × 2 systems is typically discussed in introductory algebra classes where students learn how to solve systems like these with the methods of substitution and elimination. Doing so here leads to the unique solution $x_1 = 1000$, $x_2 = 2400$; one interpretation of this ordered pair is that the system of two tanks has an equilibrium state where, if the two tanks ever reach this level of salinity, that salinity will then stay constant. With further study of linear algebra and DEs, we will be able to show that over time, regardless of how much salt is initially in each tank, the amount of salt in A will approach 1000 g, while that in B will approach 2400 g. We will thus call the equilibrium point stable.

Electrical circuits are another physical situation where systems of linear equations naturally arise. Flow of electricity through a collection of wires is similar to the flow of water through a sequence of pipes: current measures the flow of electrons (charge carriers) past a given point in the circuit. Typically, we think about a battery as a source that provides a flow of electricity, wires as a collection of paths along which the electricity may flow, and resistors as places in the circuit where electricity is converted to some sort of output such as heat or light. While we will discuss the principles behind the flow of electricity in more detail in section 3.8, for now a basic understanding of Kirchhoff’s laws enables us to see an important application of linear systems of equations.

In a given loop or branch $j$ of a circuit, current is measured in amperes (A) and is denoted by the symbol $I_j$. Resistances are measured in ohms (Ω), and the energy produced by the battery is measured in volts. As shown in figure 1.2, we use arrows in the circuit to represent the direction of flow of the current; when this flow is away from the positive side of a battery (the circles in the diagram), then the voltage is taken to be positive. Otherwise, the voltage is negative.

Figure 1.2 A simple circuit with two loops, two batteries, and four resistors. (Labels in the figure: batteries of 10 V and 5 V; resistors of 6 Ω, 2 Ω, 3 Ω, and 4 Ω; currents $I_1$, $I_2$, $I_3$; junctions a and b.)

Two fundamental laws govern how the currents in various loops of the circuit behave. One is Kirchhoff’s current law, which is essentially a conservation law. It states that the sum of all current flowing into a node equals the sum of the current flowing out. For example, in figure 1.2 at junction a,

\[
I_1 + I_3 = I_2 \tag{1.1.8}
\]

Similarly, at junction b, we must have I2 = I1 + I3. This equation is identical to (1.1.8) and adds no new information about the currents.

Ohm’s law governs the flow of electricity through resistors, and states that the voltage drop across a resistor is proportional to the current. That is, V = IR, where R is a constant that is the amount of resistance, measured in ohms. For instance, in the circuit given in figure 1.2, the voltage drop through the 3-Ω resistor on the bottom right is V = 3I1.

Kirchhoff’s voltage law states that, in any closed loop, the sum of the voltage drops must be zero. Since the battery that is present maintains a constant voltage, it follows that in the bottom loop of the given circuit,

\[
4I_1 + 2I_2 + 3I_1 = 5 \tag{1.1.9}
\]

Similarly, in the upper loop, we have

\[
6I_3 + 2I_2 = 10 \tag{1.1.10}
\]

Finally, in the outer loop, taking into account the direction of flow of electricity by regarding opposing flows as having opposing signs, we observe

\[
6I_3 - 4I_1 - 3I_1 = -5 + 10 \tag{1.1.11}
\]

Taking (1.1.8) through (1.1.11), combining like terms, and rearranging each so that indices are in increasing order, we have the system of linear equations

\[
\begin{aligned}
I_1 - I_2 + I_3 &= 0 \\
7I_1 + 2I_2 &= 5 \\
2I_2 + 6I_3 &= 10 \\
-7I_1 + 6I_3 &= 5
\end{aligned}
\tag{1.1.12}
\]

We will call the system (1.1.12) a 4 × 3 system to represent the fact that it is a collection of four linear equations in three unknown variables. Its solution—the set of all possible values of (I1 , I2 , I3 ) that make all four equations simultaneously true—provides the current in each loop of the circuit. In this ﬁrst chapter, we will develop our understanding of the more general situation of systems of linear equations with m linear equations in n unknown variables. This problem will lead us to consider important ideas from the theory of matrices that play key roles in a variety of applications ranging from computer graphics to population dynamics; related ideas will ﬁnd further applications in our subsequent study of systems of differential equations.
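As a preview of what solving (1.1.12) involves, here is a minimal Python sketch. It is not the method developed in this chapter: it swaps in Cramer's rule on the first three equations (a technique the text has not introduced) and then checks the fourth equation separately. The helper names `det3` and `cramer3` are hypothetical.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def cramer3(coeffs, rhs):
    """Solve a 3x3 system with nonzero determinant using Cramer's rule."""
    d = det3(coeffs)
    solution = []
    for col in range(3):
        replaced = [row[:] for row in coeffs]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(Fraction(det3(replaced), d))
    return solution


# Rows of (1.1.12): coefficients of I1, I2, I3 and the right-hand side.
equations = [
    ([1, -1, 1], 0),
    ([7, 2, 0], 5),
    ([0, 2, 6], 10),
    ([-7, 0, 6], 5),
]

coeffs = [row for row, _ in equations[:3]]
rhs = [b for _, b in equations[:3]]
currents = cramer3(coeffs, rhs)

# The fourth equation was not used above, so check all four equations.
residuals = [sum(a * x for a, x in zip(row, currents)) - b for row, b in equations]
print(currents, residuals)
```

All four residuals come out to zero, which reflects the fact that the outer-loop equation (1.1.11) is the difference of the two inner-loop equations and so carries no independent information.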


1.2 Systems of linear equations

Linear equations are the simplest of all possible equations and are involved in many applications of mathematics. In addition, linear equations play a fundamental role in the study of differential equations. As such, the notion of linearity will be a theme throughout this book. Formally, a linear equation in variables x1 , . . . , xn is one having the form a1 x1 + a2 x2 + · · · + an xn = b

(1.2.1)

where the coefficients a1, ..., an and the value b are real or complex numbers. For example, 2x1 + 3x2 − 5x3 = 7 is a linear equation, while x1² + sin x2 − x3 ln x1 = 5 is not. Just as the equation 2x1 + 3x2 = 7 describes a line in the x1–x2 plane, the linear equation 2x1 + 3x2 − 5x3 = 7 determines a plane in three-dimensional space.

A system of m linear equations in n unknown variables is a collection of m linear equations in n variables, say x1, ..., xn. We often refer to such a system as an “m × n system of equations.” For example,

\[
\begin{aligned}
x_1 + 2x_2 + x_3 &= 1 \\
x_1 + x_2 + 2x_3 &= 0
\end{aligned}
\tag{1.2.2}
\]

is a system of two linear equations in three unknown variables. A solution to the system is any point (x1, x2, x3) that makes both equations simultaneously true; the solution set for (1.2.2) is the collection of all such solutions. Geometrically, each of these two equations describes a plane in three-dimensional space, as shown in figure 1.3, and hence the solution set consists of all points that lie on both of the planes. Since the planes are not parallel, we expect this solution set to form a line in R³. Note that R denotes the set of all real numbers; R³ represents familiar three-dimensional Euclidean space, the set of all ordered triples with real entries.

[Figure 1.3: The intersection of the planes x1 + 2x2 + x3 = 1 and x1 + x2 + 2x3 = 0.]

The solution set for the system (1.2.2) may be determined using elementary algebraic steps. We say that two systems are equivalent if they share the same solution set. For example, if we multiply both sides of the first equation by −1 and add this to the second equation, we eliminate x1 in the second equation and get the equivalent system

\[
\begin{aligned}
x_1 + 2x_2 + x_3 &= 1 \\
-x_2 + x_3 &= -1
\end{aligned}
\]

Next, we multiply both sides of the second equation by −1 to get

\[
\begin{aligned}
x_1 + 2x_2 + x_3 &= 1 \\
x_2 - x_3 &= 1
\end{aligned}
\]

Finally, if we multiply the second equation by −2 and add it to the first equation, it follows that

\[
\begin{aligned}
x_1 + 3x_3 &= -1 \\
x_2 - x_3 &= 1
\end{aligned}
\tag{1.2.3}
\]

This shows that any solution (x1, x2, x3) of the original system must satisfy the (simpler) equivalent system of equations x1 = −1 − 3x3 and x2 = 1 + x3. Said differently, any point in R³ of the form (−1 − 3x3, 1 + x3, x3), where x3 ∈ R (here the symbol ‘∈’ means “is an element of”), is a solution to the system. Replacing x3 by the parameter t, we recognize that the solution to the system is the line parameterized by

\[
(-1 - 3t,\; 1 + t,\; t), \quad t \in \mathbb{R} \tag{1.2.4}
\]

which is the intersection of the two planes with which we began, as seen in figure 1.3. Note that this shows there are infinitely many solutions to the given system of equations; a particular example of such a solution may be found by selecting any value of t (i.e., any point on the line). We can also check that the resulting point makes both of the original equations true.

It is not hard to see in the 2 × 2 case that any linear system has either no solution (the lines are parallel), a unique solution (the lines intersect once), or infinitely many solutions (the two equations represent the same line).
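The claim that every point of the form (−1 − 3t, 1 + t, t) solves (1.2.2) can be spot-checked numerically. The short Python sketch below is an illustration only; the function names and the sample values of t are arbitrary choices of ours.

```python
def solution_point(t):
    """Point on the line (1.2.4) for parameter value t."""
    return (-1 - 3 * t, 1 + t, t)


def satisfies_system(point):
    """Check both equations of (1.2.2) at the given point."""
    x1, x2, x3 = point
    return x1 + 2 * x2 + x3 == 1 and x1 + x2 + 2 * x3 == 0


samples = [-2, 0, 1, 5]
checks = [satisfies_system(solution_point(t)) for t in samples]
off_line = satisfies_system((0, 0, 0))   # the origin is not on the line
print(checks, off_line)
```

A finite set of samples does not prove the general statement, of course; substituting the symbolic point into each equation, as suggested in the text, does.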
These three options (no solution, exactly one solution, or infinitely many) turn out to be the only possible cases for any m × n system of linear equations. A system with at least one solution is said to be consistent, while a system with no solution is called inconsistent.

In our work above from (1.2.2) to (1.2.3) in reducing the given system of equations to a simpler equivalent one, it is evident that the coefficients of the system played the key role, while the variables x1, x2, and x3 (and the equals sign) were essentially placeholders. It therefore proves expedient to change notation and collect all of the coefficients into a rectangular array (called a matrix) and eliminate the redundancy of repeatedly writing the variables. Let us reconsider our above work in this light, where we will now refer to rows in the coefficient matrix rather than equations in the original system. When we create a right-most column consisting of the constants from the right-hand side of each equation, we often say we have an augmented matrix. From the ‘simplest’ version of the system at (1.2.3), the corresponding augmented matrix is

\[
\begin{bmatrix} 1 & 0 & 3 & -1 \\ 0 & 1 & -1 & 1 \end{bmatrix}
\]

The 0’s represent variables that have been eliminated in each equation. From this, we see that our goal in working with a matrix that represents a system of equations is essentially to introduce as many zeros as possible through operations that do not change the solution set of the system.

We now repeat the exact same steps we took with the system above, but translate our operations to be on the matrix, rather than the equations themselves. We begin with the augmented matrix

\[
\begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & 2 & 0 \end{bmatrix}
\]

To introduce a zero in the bottom left corner, we add −1 times the first row to the second row, to yield a new row 2 and the updated matrix

\[
\begin{bmatrix} 1 & 2 & 1 & 1 \\ 0 & -1 & 1 & -1 \end{bmatrix}
\]

The ‘0’ in the second entry of the first column shows that we have eliminated the presence of the x1 variable in the second equation. Next, we can multiply row 2 by −1 to obtain an updated row 2 and the augmented matrix

\[
\begin{bmatrix} 1 & 2 & 1 & 1 \\ 0 & 1 & -1 & 1 \end{bmatrix}
\]

Finally, if we multiply row 2 by −2 and add this to row 1, we find a new row 1 and the matrix

\[
\begin{bmatrix} 1 & 0 & 3 & -1 \\ 0 & 1 & -1 & 1 \end{bmatrix}
\]

At this point, we have introduced as many zeros as possible¹, and have arrived at our goal of the simplest possible equivalent system. We can reinterpret the matrix as a system of equations: the first row implies that x1 + 3x3 = −1, while the second row implies x2 − x3 = 1. This leads us to find, as we did above, that any solution (x1, x2, x3) of the original system must be of the form (−1 − 3x3, 1 + x3, x3), where x3 ∈ R.

¹ Any additional row operations to introduce zeros in the third or fourth columns will replace the zeros in columns 1 or 2 with nonzero entries.


We will commonly need to refer to the number of rows and columns in a matrix. For example, the matrix

\[
\begin{bmatrix} 1 & 0 & 3 & -1 \\ 0 & 1 & -1 & 1 \end{bmatrix}
\]

has two rows and four columns; therefore, we say this is a 2 × 4 matrix. In general, an m × n matrix has m rows and n columns. Observe that if we have a 2 × 3 system of equations, its corresponding augmented matrix will be 2 × 4.

The above example demonstrates the general fact that there are basic operations we can perform on an augmented matrix that, at each stage, result in the matrix representing an equivalent system of equations; that is, these operations do not change the solution set of the system, but rather make the solution more easily obtained. In particular, we may

1. Replace one row by the sum of itself and a multiple of another row;
2. Interchange any two rows; or
3. Scale a row by multiplying every entry in a given row by a fixed nonzero constant.

These three types of operations are typically called elementary row operations. Two matrices are row equivalent if there is a sequence of elementary row operations that transforms one matrix into the other. When matrices are used to represent systems of linear equations, as was done above, it is always the case that row-equivalent matrices correspond to equivalent systems.

We desire to use elementary row operations systematically to produce row-equivalent matrices from which we may easily interpret the solution to a system of equations. For example, the solution to the system represented by

\[
\begin{bmatrix} 1 & 0 & 0 & -5 \\ 0 & 1 & 0 & 6 \\ 0 & 0 & 1 & -3 \end{bmatrix}
\tag{1.2.5}
\]

is easy to obtain (in particular, x1 = −5, x2 = 6, x3 = −3), while the solution for

\[
\begin{bmatrix} 3 & -2 & 4 & -39 \\ -1 & 2 & 7 & -4 \\ 6 & 9 & -3 & 33 \end{bmatrix}
\]

is not, even though the two matrices are row equivalent. Therefore, we desire each variable in the system to be represented in its corresponding augmented matrix as infrequently as possible. Essentially our goal is to get as many columns of the matrix as possible to have one entry that is 1, while all the rest of the entries in that column are 0.
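The three elementary row operations translate directly into small functions. The Python sketch below is one possible rendering, not the book's own code (the function names are hypothetical, and rows are indexed from 0 as is usual in Python); it replays the three steps applied earlier to the augmented matrix of (1.2.2).

```python
from fractions import Fraction


def replace_row(m, target, source, factor):
    """Copy of m with row[target] replaced by row[target] + factor * row[source]."""
    out = [row[:] for row in m]
    out[target] = [t + factor * s for t, s in zip(m[target], m[source])]
    return out


def swap_rows(m, i, j):
    """Copy of m with rows i and j interchanged."""
    out = [row[:] for row in m]
    out[i], out[j] = out[j], out[i]
    return out


def scale_row(m, i, factor):
    """Copy of m with row i multiplied by a nonzero factor."""
    if factor == 0:
        raise ValueError("scaling factor must be nonzero")
    out = [row[:] for row in m]
    out[i] = [factor * entry for entry in m[i]]
    return out


# Augmented matrix of (1.2.2), stored with exact fractions.
start = [[Fraction(v) for v in row] for row in ([1, 2, 1, 1], [1, 1, 2, 0])]
step1 = replace_row(start, 1, 0, -1)   # add -1 times row 1 to row 2
step2 = scale_row(step1, 1, -1)        # multiply row 2 by -1
step3 = replace_row(step2, 0, 1, -2)   # add -2 times row 2 to row 1
print(step3)
```

Each function returns a new matrix rather than modifying its argument, so every intermediate stage remains available for inspection, much like writing out each matrix by hand.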


A matrix is said to be in reduced row echelon form (RREF) if and only if the following characteristics are satisfied:

• All nonzero rows are above any rows with all zeros
• The first nonzero entry (or leading entry) in a given row is 1 and is in a column to the right of the first nonzero entry in any row above it
• Every other entry in a column with a leading 1 is 0

For example, the matrix in (1.2.5) is in RREF, while the matrix

\[
\begin{bmatrix} 1 & -2 & 4 & -5 \\ 0 & 2 & 7 & 6 \\ 0 & 0 & -3 & -3 \end{bmatrix}
\]

is not, since two of the rows lack leading 1’s, and columns 2 and 3 lack zeros in the entries above the lowest nonzero locations.

Each leading 1 in RREF is said to be in a pivot position, the column in which the 1 lies is termed a pivot column, and the leading 1 itself is called a pivot. Rows with all zeros do not contain a pivot position. The process by which row operations are applied to a matrix to convert it to RREF is usually called Gauss–Jordan elimination. We will also say that we “row-reduced” a given matrix. While this process can be described in a somewhat cumbersome algorithm, it is best demonstrated with a few examples. By working through the details of the following problems (in particular by deciding which elementary row operations were performed at each stage), the reader will not only learn the basics of row reduction, but also will see and understand the key possibilities for the solution set of a system of linear equations.

Example 1.2.1

Solve the system of equations

\[
\begin{aligned}
3x_1 + 2x_2 - x_3 &= 8 \\
x_1 - 4x_2 + 2x_3 &= -9 \\
-2x_1 + x_2 + x_3 &= -1
\end{aligned}
\tag{1.2.6}
\]

Solution. We begin with the corresponding augmented matrix

\[
\begin{bmatrix} 3 & 2 & -1 & 8 \\ 1 & -4 & 2 & -9 \\ -2 & 1 & 1 & -1 \end{bmatrix}
\]

and then perform a sequence of row operations. The arrows below denote the fact that one or more row operations have been performed to produce a row-equivalent matrix. We find that

\[
\begin{bmatrix} 3 & 2 & -1 & 8 \\ 1 & -4 & 2 & -9 \\ -2 & 1 & 1 & -1 \end{bmatrix}
\to
\begin{bmatrix} 1 & -4 & 2 & -9 \\ 3 & 2 & -1 & 8 \\ -2 & 1 & 1 & -1 \end{bmatrix}
\to
\begin{bmatrix} 1 & -4 & 2 & -9 \\ 0 & 14 & -7 & 35 \\ 0 & -7 & 5 & -19 \end{bmatrix}
\to
\]

\[
\begin{bmatrix} 1 & -4 & 2 & -9 \\ 0 & 1 & -\tfrac{1}{2} & \tfrac{5}{2} \\ 0 & -7 & 5 & -19 \end{bmatrix}
\to
\begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & -\tfrac{1}{2} & \tfrac{5}{2} \\ 0 & 0 & \tfrac{3}{2} & -\tfrac{3}{2} \end{bmatrix}
\to
\begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & -\tfrac{1}{2} & \tfrac{5}{2} \\ 0 & 0 & 1 & -1 \end{bmatrix}
\to
\begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & -1 \end{bmatrix}
\]

This shows us that the original 3 × 3 system has a unique solution, and that this solution is the point (1, 2, −1). Geometrically, this demonstrates that the three planes with equations given by the system (1.2.6) meet in a single point, as we can see in figure 1.4.

[Figure 1.4: The intersection of the three planes given by the linear system (1.2.6).]
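The row reduction in example 1.2.1 can be automated. What follows is a minimal Python sketch of Gauss–Jordan elimination using exact fractions; it is one straightforward way to organize the algorithm, not the book's Maple workflow, and the function name `rref` is our own. Unlike the hand computation, it does not begin by interchanging rows 1 and 2; it simply pivots on the first nonzero entry it finds in each column, so its intermediate matrices differ from those shown above even though the final matrix is the same.

```python
from fractions import Fraction


def rref(matrix):
    """Return the reduced row echelon form of matrix using exact fractions."""
    m = [[Fraction(v) for v in row] for row in matrix]
    rows, cols = len(m), len(m[0])
    pivot_row = 0
    for col in range(cols):
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((r for r in range(pivot_row, rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[pivot_row], m[pivot] = m[pivot], m[pivot_row]      # interchange
        lead = m[pivot_row][col]
        m[pivot_row] = [v / lead for v in m[pivot_row]]      # scale to a leading 1
        for r in range(rows):                                # row replacement
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == rows:
            break
    return m


augmented = [[3, 2, -1, 8], [1, -4, 2, -9], [-2, 1, 1, -1]]
reduced = rref(augmented)
print(reduced)
```

The last column of `reduced` holds the solution (1, 2, −1) found in the example.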

Example 1.2.2 Solve the system of equations

\[
\begin{aligned}
x_1 + 2x_2 - x_3 &= 1 \\
x_1 + x_2 &= 2 \\
3x_1 + x_2 + 2x_3 &= 8
\end{aligned}
\tag{1.2.7}
\]

Solution. We consider the corresponding augmented matrix

\[
\begin{bmatrix} 1 & 2 & -1 & 1 \\ 1 & 1 & 0 & 2 \\ 3 & 1 & 2 & 8 \end{bmatrix}
\]

and again perform a sequence of row operations:

\[
\begin{bmatrix} 1 & 2 & -1 & 1 \\ 1 & 1 & 0 & 2 \\ 3 & 1 & 2 & 8 \end{bmatrix}
\to
\begin{bmatrix} 1 & 2 & -1 & 1 \\ 0 & -1 & 1 & 1 \\ 0 & -5 & 5 & 5 \end{bmatrix}
\to
\begin{bmatrix} 1 & 2 & -1 & 1 \\ 0 & 1 & -1 & -1 \\ 0 & -5 & 5 & 5 \end{bmatrix}
\to
\begin{bmatrix} 1 & 0 & 1 & 3 \\ 0 & 1 & -1 & -1 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]

In this case, we see that one row of the matrix has essentially vanished. This shows that one of the equations in the original system was redundant, and did not contribute any restrictions on the system. Moreover, as the matrix is now in RREF, we can see that the simplest equivalent system is given by the two equations x1 + x3 = 3 and x2 − x3 = −1. In other words, x1 = 3 − x3 and x2 = −1 + x3. Since the variable x3 has no restrictions on it, we call x3 a free variable. This implies that the system under consideration has infinitely many solutions, each having the form

\[
(3 - t,\; -1 + t,\; t), \quad \text{where } t \in \mathbb{R} \tag{1.2.8}
\]
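Reading pivot columns and free variables off an RREF matrix is mechanical, as the following Python sketch suggests. It is an illustration under our own naming (`pivot_columns`, `free`, and so on are hypothetical), with columns indexed from 0, so index 2 corresponds to x3. The sketch also spot-checks the family (1.2.8) against the original system (1.2.7).

```python
def pivot_columns(rref_matrix):
    """Column index of the leading entry in each nonzero row."""
    pivots = []
    for row in rref_matrix:
        for col, entry in enumerate(row):
            if entry != 0:
                pivots.append(col)
                break
    return pivots


# RREF of the augmented matrix from example 1.2.2.
reduced = [[1, 0, 1, 3], [0, 1, -1, -1], [0, 0, 0, 0]]
num_vars = len(reduced[0]) - 1            # last column holds the constants
pivots = pivot_columns(reduced)
free = [c for c in range(num_vars) if c not in pivots]


def solution(t):
    """Member of the solution family (1.2.8)."""
    return (3 - t, -1 + t, t)


def satisfies(point):
    """Check all three equations of (1.2.7)."""
    x1, x2, x3 = point
    return x1 + 2 * x2 - x3 == 1 and x1 + x2 == 2 and 3 * x1 + x2 + 2 * x3 == 8


all_ok = all(satisfies(solution(t)) for t in range(-3, 4))
print(pivots, free, all_ok)
```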

In the next section, we will begin to emphasize the role that vectors play in systems of linear equations. For example, the ordered triple (3 − t, −1 + t, t) in (1.2.8) may be viewed as a vector in R³. In addition, the representation (1.2.8) of the set of all solutions involving the parameter t is often called the parametric vector form of the solution. As with the very first system of equations discussed in this section, example 1.2.2 shows that the planes given in the system (1.2.7) meet in a line.

Example 1.2.3 Solve the system of equations

\[
\begin{aligned}
x_1 + 2x_2 - x_3 &= 1 \\
x_1 + x_2 &= 2 \\
3x_1 + x_2 + 2x_3 &= 7
\end{aligned}
\]

Solution. Observe that the only difference between this example and the previous one is that the “8” in the third equation has been replaced with “7.” We proceed with identical row operations to those above and find that

\[
\begin{bmatrix} 1 & 2 & -1 & 1 \\ 1 & 1 & 0 & 2 \\ 3 & 1 & 2 & 7 \end{bmatrix}
\to
\begin{bmatrix} 1 & 2 & -1 & 1 \\ 0 & -1 & 1 & 1 \\ 0 & -5 & 5 & 4 \end{bmatrix}
\to
\begin{bmatrix} 1 & 2 & -1 & 1 \\ 0 & 1 & -1 & -1 \\ 0 & -5 & 5 & 4 \end{bmatrix}
\to
\begin{bmatrix} 1 & 0 & 1 & 3 \\ 0 & 1 & -1 & -1 \\ 0 & 0 & 0 & -1 \end{bmatrix}
\]

In this case, the final row of the reduced matrix corresponds to the equation 0x1 + 0x2 + 0x3 = −1. Since there are no points (x1, x2, x3) that make this equation true, it follows that there can be no points which simultaneously satisfy all three equations in the system. Said differently, the three planes given in the original system of equations do not meet at a single point, nor do they meet in a line. Therefore, the system has no solution; recall that we call such a system inconsistent.

Note that the only difference between example 1.2.2 and example 1.2.3 is one constant on the right-hand side in the equation of one of the planes. This changed the result dramatically, from the case where the system had infinitely many solutions to one where no solutions were present. This is evident geometrically if we think about a situation where three planes meet in a line, and then we alter the equation of one of the planes to shift it to a new plane parallel to its original location: the three planes will no longer have any points in common.
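The contrast between examples 1.2.2 and 1.2.3 can be replayed for any choice of the constant in the third equation. The Python sketch below is a hedged illustration with hypothetical function names: it applies the same row operations used in the text to the third row and reports whether the resulting row reads “0 = nonzero.”

```python
def last_row(k):
    """Third row of the augmented matrix after the row operations used in the text.

    Starts from rows (1, 2, -1 | 1), (1, 1, 0 | 2), (3, 1, 2 | k).
    """
    r1 = [1, 2, -1, 1]
    r2 = [a - b for a, b in zip([1, 1, 0, 2], r1)]        # R2 - R1
    r3 = [a - 3 * b for a, b in zip([3, 1, 2, k], r1)]    # R3 - 3*R1
    r2 = [-a for a in r2]                                 # scale R2 by -1
    r3 = [a + 5 * b for a, b in zip(r3, r2)]              # R3 + 5*R2
    return r3


def is_consistent(k):
    """The system is inconsistent exactly when the last row reads 0 = nonzero."""
    *coeffs, const = last_row(k)
    return not (all(c == 0 for c in coeffs) and const != 0)


consistent_values = [k for k in range(5, 12) if is_consistent(k)]
print(last_row(8), last_row(7), consistent_values)
```

The test in `is_consistent` relies on the first two rows already containing pivots, which holds for this particular family of systems; it is not a general-purpose consistency check.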


Algebraically, we can see what is so special about the one constant we changed (8 to 7) if we replace this value with an arbitrary constant, say k, and perform row operations:

\[
\begin{bmatrix} 1 & 2 & -1 & 1 \\ 1 & 1 & 0 & 2 \\ 3 & 1 & 2 & k \end{bmatrix}
\to
\begin{bmatrix} 1 & 2 & -1 & 1 \\ 0 & 1 & -1 & -1 \\ 0 & -5 & 5 & k - 3 \end{bmatrix}
\to
\begin{bmatrix} 1 & 0 & 1 & 3 \\ 0 & 1 & -1 & -1 \\ 0 & 0 & 0 & k - 8 \end{bmatrix}
\]

This shows that for any value of k other than 8, the resulting system of linear equations will be inconsistent, therefore having no solutions. In the case that k = 8, we see that a free variable arises and then the system has infinitely many solutions.

Overall, the question of consistency is an important one for any linear system of equations. In asking “is this system consistent?” we investigate whether or not the system has at least one solution. Moreover, we are now in a position to understand how RREF determines the answer to this question. We note from considering the RREF of a matrix that there are two overall cases: either the system contains an equation of the form 0x1 + · · · + 0xn = b, where b is nonzero, or it has no such equation. In the former case, the system is inconsistent and has no solution. In the latter case, it will either be that every variable is uniquely determined, or that there are one or more free variables present, in which case there are infinitely many solutions to the system. This leads us to state the following theorem.

Theorem 1.2.1 For any linear system of equations, there are only three possible cases for the solution set: there are no solutions, there is a unique solution, or there are infinitely many solutions.

This central fact regarding linear systems will play a key role in our studies.

1.2.1 Row-reduction using Maple

Obviously one of the problems with the process of row reducing a matrix is the potential for human arithmetic errors. Soon we will learn how to use computer software to execute all of these computations quickly; first, though, we can deepen our understanding of how the process works, and simultaneously eliminate arithmetic mistakes, by using a computer algebra system in a step-by-step fashion.

Our software of choice is Maple. For now, we only assume that the user is familiar with Maple’s interface, and will introduce relevant commands with examples as we go. We will use the LinearAlgebra package in Maple, which is loaded using the command

    > with(LinearAlgebra):

(The symbol ‘>’ is called a Maple prompt; the program makes this available to the user automatically, and it should not be entered by the user.) To demonstrate various commands, we will revisit the system from example 1.2.1. The reader should explore this code actively by entering and experimenting on his or her own. Recall that we were interested in row-reducing the augmented matrix

\[
\begin{bmatrix} 3 & 2 & -1 & 8 \\ 1 & -4 & 2 & -9 \\ -2 & 1 & 1 & -1 \end{bmatrix}
\]

We enter the augmented matrix, say A, column-wise in Maple with the command

    > A := <<3,1,-2>|<2,-4,1>|<-1,2,1>|<8,-9,-1>>;

We first want to swap rows 1 and 2; this is accomplished by entering

    > A1 := RowOperation(A,[1,2]);

Note that this stores the result of this row operation in the matrix A1, which is convenient for use in the next step. After executing the most recent command, the following matrix will appear on the screen:

\[
A1 := \begin{bmatrix} 1 & -4 & 2 & -9 \\ 3 & 2 & -1 & 8 \\ -2 & 1 & 1 & -1 \end{bmatrix}
\]

To perform row-replacement, our next step is to add (−3) · R1 to R2 (where rows 1 and 2 are denoted R1 and R2) to generate a new second row; similarly, we will add 2 · R1 to R3 for an updated row 3. The commands that accomplish these steps are

    > A2 := RowOperation(A1,[2,1],-3);
    > A3 := RowOperation(A2,[3,1],2);

and lead to the following output:

\[
A3 := \begin{bmatrix} 1 & -4 & 2 & -9 \\ 0 & 14 & -7 & 35 \\ 0 & -7 & 5 & -19 \end{bmatrix}
\]

Next, we will scale row 2 by a factor of 1/14 using the command

    > A4 := RowOperation(A3,2,1/14);

to find that

\[
A4 := \begin{bmatrix} 1 & -4 & 2 & -9 \\ 0 & 1 & -\tfrac{1}{2} & \tfrac{5}{2} \\ 0 & -7 & 5 & -19 \end{bmatrix}
\]


The remainder of the computations in this example involve slightly modified versions of the three forms of the RowOperation command demonstrated above, and are left as an exercise for the reader. Recall that the unique solution to the original system is (1, 2, −1).

Maple is certainly capable of performing all of these steps at once. After completing each step-by-step command above in the row-reduction process, the result can be checked by executing the command

    > ReducedRowEchelonForm(A);

The corresponding output should be

\[
\begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & -1 \end{bmatrix}
\]

which clearly reveals the unique solution to the system, (1, 2, −1).

Exercises 1.2

In exercises 1–4, solve each system of equations or explain why no solution exists.

1. x1 + 2x2 = 1, x1 + x2 = 0

2. x1 + 2x2 = 1, −2x1 − 4x2 = −2

3. x1 + 2x2 = 1, −2x1 − 4x2 = −3

4. 4x1 − 3x2 = 5, −x1 + 4x2 = 2

In exercises 5–9, for each linear system represented by a given augmented matrix in RREF, decide whether or not the system is consistent. If the system is consistent, determine its solution set. For systems with infinitely many solutions, express the solution in parametric vector form.

5. $\begin{bmatrix} 1 & 0 & 0 & 4 \\ 0 & 1 & 0 & -2 \\ 0 & 0 & 1 & 3 \end{bmatrix}$

6. $\begin{bmatrix} 1 & 0 & 0 & 4 \\ 0 & 1 & 1 & -2 \\ 0 & 0 & 0 & 3 \end{bmatrix}$

7. $\begin{bmatrix} 1 & 0 & 2 & -3 \\ 0 & 1 & 1 & -2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$

8. $\begin{bmatrix} 1 & 0 & 0 & -3 & 5 \\ 0 & 0 & 1 & -2 & 4 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$

9. $\begin{bmatrix} 1 & -2 & 0 & 4 & 0 & -1 \\ 0 & 0 & 1 & 3 & 0 & 2 \\ 0 & 0 & 0 & 0 & 1 & -5 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$

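In exercises 10–14, the given augmented matrix represents a system for which some row operations have been performed to partially row-reduce the matrix. By deciding which operations must next be executed, finish row-reducing each matrix. Finally, interpret your results to state the solution set to the system.

10. $\begin{bmatrix} 1 & 3 & 2 & 5 \\ 0 & 1 & -4 & -1 \\ 0 & 0 & 1 & 7 \end{bmatrix}$

11. $\begin{bmatrix} 1 & 0 & 0 & 4 \\ 0 & 0 & 0 & 3 \\ 0 & 1 & 1 & -2 \end{bmatrix}$

12. $\begin{bmatrix} 1 & 0 & 2 & -3 \\ 0 & 1 & 1 & -2 \\ 0 & 3 & 3 & -6 \\ 0 & 2 & 2 & -1 \end{bmatrix}$

13. $\begin{bmatrix} 1 & 0 & 5 & -1 & 6 \\ 0 & 0 & 2 & -8 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$

14. $\begin{bmatrix} 1 & -3 & 0 & 5 & 0 & -3 \\ 0 & 0 & 1 & 3 & 0 & 4 \\ 0 & 0 & 0 & 1 & 2 & -9 \\ 0 & 0 & 0 & 0 & 1 & 4 \end{bmatrix}$

Determine all value(s) of h that make each augmented matrix in exercises 15–18 correspond to a consistent linear system. For such h, describe the solution set to the system.

15. $\begin{bmatrix} 1 & -2 & 7 \\ -3 & 6 & h \end{bmatrix}$

16. $\begin{bmatrix} 1 & -2 & 7 \\ -3 & h & -21 \end{bmatrix}$

17. $\begin{bmatrix} 1 & h & 3 \\ 2 & h & 6 \end{bmatrix}$

18. $\begin{bmatrix} 1 & 2 & 3 \\ -2 & h & 5 \end{bmatrix}$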


Use a computer algebra system to perform step-by-step row operations to solve each of the following linear systems in exercises 19–23. If the system is consistent, determine its solution set. For systems with infinitely many solutions, express the solution in parametric vector form.

19. x1 − x2 + x3 = 5, 2x1 − 4x2 + 3x3 = 0, x1 − 6x2 + 2x3 = 3

20. 4x1 + 2x2 − x3 = −2, x1 − x2 + x3 = 6, −3x1 + x2 − 4x3 = −20

21. 4x1 + 2x2 − x3 = −2, x1 − x2 + x3 = 6, −2x1 − 4x2 + 3x3 = 14

22. 4x1 + 2x2 − x3 = −2, x1 − x2 + x3 = 6, −2x1 − 4x2 + 3x3 = 13

23. 2x2 + 3x3 − 4x4 = 1, 2x3 + 3x4 = 4, 2x1 + 2x2 − 5x3 + 2x4 = 4, 2x1 − 6x3 + 9x4 = 7

In exercises 24–27, determine whether or not the given three lines or planes meet in a single point. Justify your answer using appropriate row operations.

24. x1 + x2 = 5, 2x1 − 3x2 = −5, −4x1 + 2x2 = −2

25. x1 + x2 = 5, 2x1 − 3x2 = −5, −4x1 + 2x2 = −3

26. x1 + x2 + x3 = 5, 2x1 − 3x2 + x3 = 1, −4x1 + 2x2 + 5x3 = 4

27. x1 + x2 + x3 = 5, 2x1 − 3x2 + x3 = 3, −4x1 + 2x2 + 5x3 = 4

28. Consider a linear system whose corresponding augmented matrix has all zeros in its final column. Is it ever possible for such a system to be inconsistent? Why or why not?

29. Is it possible for a 2 × 3 linear system to be inconsistent? Explain.

30. If a 3 × 4 linear system has three pivot columns in its corresponding augmented matrix, can you determine whether or not the system must be consistent? Explain.

31. A system of linear equations has a unique solution. What can be determined about the relationship between the number of pivot columns in the augmented matrix and the number of variables in the system?


32. Decide whether each of the following sentences is true or false. In every case, write one sentence to support your answer.
(a) Two lines must either intersect or be parallel.
(b) A system of three linear equations in three unknown variables can have exactly three solutions.
(c) If the RREF of a matrix has a row of all zeros, then the corresponding system must have a free variable present.
(d) If a system has a free variable present, then the system has infinitely many solutions.
(e) A solution to a 4 × 3 linear system is a list of four numbers (x1, x2, x3, x4) that simultaneously makes every equation in the system true.
(f) A matrix with three columns and four rows is 3 × 4.
(g) A consistent system is one with exactly one solution.

33. Suppose that we would like to find a quadratic function p(t) = a2t² + a1t + a0 that passes through the three points (1, 4), (2, 7), and (3, 6). How does this problem lead to a system of linear equations? Find the function p(t). (Hint: p(1) = 4 implies that 4 = a2 · 1² + a1 · 1 + a0.)

34. Find a quadratic function p(t) = a2t² + a1t + a0 that passes through the three points (−1, 1), (2, −1), and (5, 4). How does this problem involve a system of linear equations?

35. For the circuit shown at the left in figure 1.5, set up and solve a system of linear equations whose solution is the respective currents I1, I2, and I3.

36. For the circuit shown at the right in figure 1.5, set up and solve a system of linear equations whose solution is the respective currents I1, I2, and I3.

[Figure 1.5: Circuits for use in exercises 35 and 36.]


1.3 Linear combinations

An important theme in mathematics that is especially present in linear algebra is the value of considering the same idea from a variety of different perspectives. Often, we can make statements that on the surface may seem unrelated, when in fact they ultimately mean the same thing, and one of the statements is most advantageous for solving a particular problem. Throughout our study of linear algebra, we will see that the subject offers a wide variety of perspectives and terminology for addressing the central concept: systems of linear equations. In this section, we take another look at the concept of consistency, but do so in a different, geometric light.

Example 1.3.1 Consider the system of equations

\[
\begin{aligned}
x_1 - x_2 &= 1 \\
x_1 + x_2 &= 3 \\
x_1 + 2x_2 &= 4
\end{aligned}
\tag{1.3.1}
\]

Rewrite the system in vector form and explore how two vectors are being combined to form a third, particularly in terms of the geometry of R³. Then solve the system.

Solution. In multivariable calculus, we learn to think of vectors in R³ very much like we think of points. For example, given the point (a, b, c), we may write v = ⟨a, b, c⟩ or v = ai + bj + ck to denote the vector v that emanates from (0, 0, 0) and ends at (a, b, c). (Here i, j, and k represent the standard unit coordinate vectors: i is the vector from (0, 0, 0) to (1, 0, 0), j to (0, 1, 0), and k to (0, 0, 1).) In linear algebra, we will prefer to take the perspective of writing such an ordered triple as a matrix with only one column, also known as a column vector, in the form

\[
\mathbf{v} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}
\tag{1.3.2}
\]

To save space, we will sometimes use the equivalent notation² v = [a b c]ᵀ. Recall that two vectors are equal if and only if their corresponding entries are equal, that a vector may be multiplied by a scalar, and that any two vectors of the same size may be added.

² The ‘T’ stands for transpose, and the transpose of a matrix is achieved by turning every column into a row.


We can now re-examine the system of equations (1.3.1) in the light of equality among vectors. In particular, observe that it is equivalent to say

\[
\begin{bmatrix} x_1 - x_2 \\ x_1 + x_2 \\ x_1 + 2x_2 \end{bmatrix}
=
\begin{bmatrix} 1 \\ 3 \\ 4 \end{bmatrix}
\tag{1.3.3}
\]

since two vectors are equal if and only if their corresponding entries are equal. Recalling further that vectors are added component-wise, we can rewrite (1.3.3) as

\[
\begin{bmatrix} x_1 \\ x_1 \\ x_1 \end{bmatrix}
+
\begin{bmatrix} -x_2 \\ x_2 \\ 2x_2 \end{bmatrix}
=
\begin{bmatrix} 1 \\ 3 \\ 4 \end{bmatrix}
\tag{1.3.4}
\]

Finally, we observe in (1.3.4) that the first vector on the left-hand side has a common factor of x1 in each component, and the second vector similarly contains x2. Since a scalar multiple of a vector is computed component-wise, here we can rewrite the equation once more, now in the form

\[
x_1 \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
+
x_2 \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix}
=
\begin{bmatrix} 1 \\ 3 \\ 4 \end{bmatrix}
\tag{1.3.5}
\]

Equation (1.3.5) is equivalent to the original system (1.3.1), but is now being viewed in a very different way. Specifically, this last equation asks if there are values of x1 and x2 for which x1v1 + x2v2 = b, where

\[
\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix}, \quad \text{and} \quad
\mathbf{b} = \begin{bmatrix} 1 \\ 3 \\ 4 \end{bmatrix}
\tag{1.3.6}
\]

If we plot the vectors v1, v2, and b, an interesting situation comes to light, as seen in figure 1.6. In particular, it appears as if all three vectors lie in the same plane. Moreover, if we think about the parallelogram law of vector addition and stretch the vector v1 by a factor of 2, we see the image in figure 1.7. This shows geometrically that it appears b = 2v1 + v2; a quick check of the vector arithmetic confirms that this is in fact the case. In other words, the unique solution to the system (1.3.1) is x1 = 2 and x2 = 1.

Among the many important ideas in example 1.3.1, perhaps most significant is the way we were able to re-cast a problem about a system of linear equations as a question involving vectors. In particular, we saw that it was equivalent to ask if there exist constants x1 and x2 such that

\[
x_1 \mathbf{v}_1 + x_2 \mathbf{v}_2 = \mathbf{b} \tag{1.3.7}
\]

[Figure 1.6: The vectors v1, v2, and b from (1.3.6).]

[Figure 1.7: The parallelogram formed by the vectors 2v1 and v2 from (1.3.6).]

Note that in (1.3.7), we are only taking scalar multiples of vectors and adding them, computations that are linear in nature. We thus naturally come to use the terminology that “x1v1 + x2v2 is a linear combination of the vectors v1 and v2.” A more general definition now follows, from which we will be able to widen our perspective on systems of linear equations.

Definition 1.3.1 If v1, ..., vk are vectors in Rⁿ (that is, each vi is a vector with n entries), and x1, ..., xk are scalars, then the vector b given by

\[
\mathbf{b} = x_1 \mathbf{v}_1 + \cdots + x_k \mathbf{v}_k \tag{1.3.8}
\]

is a linear combination of the vectors v1, ..., vk, with weights or coefficients x1, ..., xk.

Note the notational convention we use, as in example 1.3.1: a bold, non-italicized, lowercase variable, say x, represents a vector, while a non-bold, italicized, lowercase variable, say c, denotes a scalar. A bold, non-italicized, uppercase variable, say A, will represent a matrix with at least two columns.
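Definition 1.3.1 can be stated compactly in code. The Python sketch below is an illustration with a hypothetical function name; it stores vectors as plain lists and confirms the relationship b = 2v1 + v2 found in example 1.3.1.

```python
def linear_combination(weights, vectors):
    """Return x1*v1 + ... + xk*vk for vectors given as equal-length lists."""
    n = len(vectors[0])
    if any(len(v) != n for v in vectors) or len(weights) != len(vectors):
        raise ValueError("need one weight per vector and vectors of equal length")
    return [sum(x * v[i] for x, v in zip(weights, vectors)) for i in range(n)]


# Vectors from (1.3.6).
v1, v2, b = [1, 1, 1], [-1, 1, 2], [1, 3, 4]
combo = linear_combination([2, 1], [v1, v2])
print(combo == b)
```

Other weights give other vectors in the plane containing v1 and v2; the weights (2, 1) are the only ones that produce b.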


In light of this new terminology of linear combinations, in example 1.3.1 we saw that the question “is there a solution to the linear system (1.3.1)?” is equivalent to asking “is the vector b a linear combination of the vectors v1 and v2 ?” If we now consider the more general situation of a system of linear equations, say a11 x1 + a12 x2 + · · · + a1n xn = b1 a21 x1 + a22 x2 + · · · + a2n xn = b2 .. .

am1 x1 + am2 x2 + · · · + amn xn = bm it follows (as in section 1.2) that we can view this system in terms of the augmented matrix [a1 a2 · · · an b] where a1 is the vector in Rm representing the ﬁrst column of the augmented matrix, and so on. Now, however, we have the additional perspective, as in example 1.3.1, that the columns of the augmented matrix A are precisely the vectors being used to form a linear combination in an attempt to construct b. That is, the general m × n linear system above asks the question, “is b a linear combination of a1 , . . . , an ?” We make the connection between linear combinations and augmented matrices more explicit by deﬁning matrix–vector multiplication in terms of linear combinations. Deﬁnition 1.3.2 Given an m × n matrix A with columns a1 , . . . , an that are vectors in Rm , if x is a vector in Rn , then we deﬁne the product Ax by the equation ⎡ ⎤ x1 ⎢ x2 ⎥ ⎢ ⎥ (1.3.9) Ax = [a1 a2 · · · an ] ⎢ .. ⎥ = x1 a1 + x2 a2 + · · · + xn an ⎣ . ⎦ xn That is, the matrix–vector product of A and x is the vector Ax obtained by taking the linear combination of the column vectors of A according to the weights prescribed by the entries in x. Certainly we must have the same number of entries in x as columns in A, or Ax will not be deﬁned. The following example highlights how to compute and interpret matrix–vector products. Example 1.3.2 Let a1 = [1 − 4 2]T and a2 = [−3 1 5]T , and let A be the matrix whose columns are a1 and a2 . Compute Ax, where x = [−5 2]T , and interpret the result in terms of linear combinations.

Linear combinations


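Solution. By definition, we have that

\[
A\mathbf{x} =
\begin{bmatrix} 1 & -3 \\ -4 & 1 \\ 2 & 5 \end{bmatrix}
\begin{bmatrix} -5 \\ 2 \end{bmatrix}
= -5 \begin{bmatrix} 1 \\ -4 \\ 2 \end{bmatrix}
+ 2 \begin{bmatrix} -3 \\ 1 \\ 5 \end{bmatrix}
= \begin{bmatrix} -11 \\ 22 \\ 0 \end{bmatrix}
\]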

The above computations show clearly that the vector $A\mathbf{x} = [-11 \;\; 22 \;\; 0]^T$ is a linear combination of $\mathbf{a}_1$ and $\mathbf{a}_2$.

Following a few more computational examples in homework exercises, the reader will quickly see how to compute the product $A\mathbf{x}$ whenever it is defined; usually we skip past the intermediate stage of writing out the explicit linear combination of the columns and simply write the resulting vector. Matrix–vector multiplication also has several important general properties, some of which will be explored in the exercises. For now, we simply list these properties here for future reference: for any $m \times n$ matrix $A$, vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$, and $c \in \mathbb{R}$,

• $A(\mathbf{x} + \mathbf{y}) = A\mathbf{x} + A\mathbf{y}$
• $A(c\mathbf{x}) = c(A\mathbf{x})$

The first property shows that matrix multiplication distributes over addition; the second demonstrates that a scalar multiple can be taken either before or after multiplying the vector $\mathbf{x}$ by $A$. These two properties of matrix multiplication are often referred to as being properties of linearity; note the use of only scalar multiplication and vector addition in each, and the linear appearance of each equation.³ Finally, note that it is also the case that $A\mathbf{0}_n = \mathbf{0}_m$, where $\mathbf{0}_n$ is the vector in $\mathbb{R}^n$ with all entries being zero, and $\mathbf{0}_m$ is the corresponding zero vector in $\mathbb{R}^m$.

There is one more important perspective that this new matrix–vector product notation permits. Recall that, in example 1.3.1, we learned that the question “is $\mathbf{b}$ a linear combination of $\mathbf{a}_1$ and $\mathbf{a}_2$?” is equivalent to asking “is there a solution to the system of linear equations whose augmented matrix has columns $\mathbf{a}_1$, $\mathbf{a}_2$, and $\mathbf{b}$?” Now, in light of matrix–vector multiplication, we also see that the question “is $\mathbf{b}$ a linear combination of $\mathbf{a}_1$ and $\mathbf{a}_2$?” may be rephrased as asking “does there exist a vector $\mathbf{x}$ such that $A\mathbf{x} = \mathbf{b}$?” That is, are there weights $x_1$ and $x_2$ (the entries in vector $\mathbf{x}$) such that $\mathbf{b}$ is a linear combination of the columns of $A$?
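The definition of the product as a weighted sum of columns, together with the two linearity properties, can be checked numerically. The following is a minimal Python sketch, not part of the text's Maple workflow; the function name `matvec`, the choice to store a matrix as a list of its columns, and the extra test vector `y` are all assumptions made for illustration.

```python
def matvec(columns, x):
    """Return Ax as x1*a1 + ... + xn*an, where `columns` lists the columns of A."""
    if len(columns) != len(x):
        raise ValueError("x must have one entry per column of A")
    m = len(columns[0])
    result = [0] * m
    for weight, col in zip(x, columns):
        for i in range(m):
            result[i] += weight * col[i]  # add weight * (column entry) to row i
    return result

# Data from example 1.3.2
a1 = [1, -4, 2]
a2 = [-3, 1, 5]
A_cols = [a1, a2]
x = [-5, 2]
Ax = matvec(A_cols, x)
print(Ax)  # [-11, 22, 0]

# Check A(x + y) = Ax + Ay for an illustrative vector y (not from the text)
y = [3, 4]
x_plus_y = [xi + yi for xi, yi in zip(x, y)]
lhs = matvec(A_cols, x_plus_y)
rhs = [u + v for u, v in zip(matvec(A_cols, x), matvec(A_cols, y))]
print(lhs == rhs)  # True
```

A check on one pair of vectors illustrates the property but does not prove it; the general argument is the subject of exercise 24.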
In particular, we may now adopt the perspective that we desire to solve the equation $A\mathbf{x} = \mathbf{b}$ for the unknown vector $\mathbf{x}$, where $A$ is a matrix whose entries are known, and $\mathbf{b}$ is a vector whose entries are known. This equation is strikingly similar to the most elementary of equations encountered in algebra, ones such as $2x = 7$. Therefore, we see that the linear equation $A\mathbf{x} = \mathbf{b}$, involving matrices and vectors, is of fundamental importance as it is another way of expressing questions regarding linear combinations and solutions of systems of linear equations. In subsequent sections, we will explore this equation from several perspectives.

³ A deeper discussion of the notion of linear transformations can be found in appendix D.

1.3.1 Markov chains: an application of matrix–vector multiplication

People are often distributed naturally among various groupings. For example, much political discussion in the United States is centered on three classifications of voters: Democrat, Republican, and Independent. A similar situation can be considered with regard to people’s choices for where to live: urban, suburban, or rural. In each case, the state of the population at a given time is its distribution among the relevant categories. Furthermore, in each of these situations, it is natural to assume that if we consider the state of the system at a given point in time, its state depends on the system’s state in the preceding year. For example, the percentage of Democrats, Republicans, and Independents in the year 2020 ought to be connected to the respective percentages in 2019.

Let us assume that a population of voters (of constant size) is considered in which everyone must be classified as either D, R, or I (Democrat, Republican, or Independent). Suppose further that a study of voter registrations over many years reveals the following trends: from one year to the next, 95 percent of Democrats keep their registration the same. The remaining 5 percent change parties: 2 percent of Democrats become Republicans and 3 percent become Independents. Similar data for Republicans and Independents is given in the following table.

Future party (↓) / current party (→)    D(%)    R(%)    I(%)
Democrat                                 95       3       7
Republican                                2      90      13
Independent                               3       7      80

If we let $D_n$, $R_n$, and $I_n$ denote the respective numbers of registered Democrats, Republicans, and Independents in year $n$, then the table shows us how to determine the respective numbers in year $n + 1$. For example,

\[ D_{n+1} = 0.95D_n + 0.03R_n + 0.07I_n \tag{1.3.10} \]

since 95 percent of the Democrats in year $n$ stay registered Democrats, and 3 percent of Republicans and 7 percent of Independents change to Democrats. Similarly, we have

\[ R_{n+1} = 0.02D_n + 0.90R_n + 0.13I_n \tag{1.3.11} \]

\[ I_{n+1} = 0.03D_n + 0.07R_n + 0.80I_n \tag{1.3.12} \]


If we combine (1.3.10), (1.3.11), and (1.3.12) in a single vector equation, then

\[
\begin{bmatrix} D_{n+1} \\ R_{n+1} \\ I_{n+1} \end{bmatrix}
= D_n \begin{bmatrix} 0.95 \\ 0.02 \\ 0.03 \end{bmatrix}
+ R_n \begin{bmatrix} 0.03 \\ 0.90 \\ 0.07 \end{bmatrix}
+ I_n \begin{bmatrix} 0.07 \\ 0.13 \\ 0.80 \end{bmatrix} \tag{1.3.13}
\]

Here we find that linear combinations of vectors have naturally arisen. Note, for example, that the vector $[0.03 \;\; 0.90 \;\; 0.07]^T$ is the Republican vector, and represents the likelihood that a Republican in a given year will be in one of the three parties in the following year. More specifically, we observe that probabilities are involved: a Republican has a 3 percent likelihood of registering as a Democrat in the following year, a 90 percent likelihood of staying a Republican, and a 7 percent chance of becoming an Independent. The sum of the entries in each column vector is 1. If we use the vector $\mathbf{x}^{(n)}$ to represent

\[ \mathbf{x}^{(n)} = \begin{bmatrix} D_n \\ R_n \\ I_n \end{bmatrix} \]

and use matrix–vector multiplication to represent the linear combination of vectors in (1.3.13), then (1.3.13) is equivalently expressed by the equation

\[ \mathbf{x}^{(n+1)} = M\mathbf{x}^{(n)} \tag{1.3.14} \]

where $M$ is the matrix

\[ M = \begin{bmatrix} 0.95 & 0.03 & 0.07 \\ 0.02 & 0.90 & 0.13 \\ 0.03 & 0.07 & 0.80 \end{bmatrix} \]

The matrix $M$ is often called a transition matrix since it shows how the population transitions from state $n$ to state $n + 1$. We observe that in order for such a matrix to represent the probabilities that groups in a particular set of states will transition to another set of states, the entries of $M$ must be nonnegative and the entries in each column must add to 1. Such a matrix is called a stochastic matrix or a Markov matrix. Finally, we call any system such as the one with three classifications of voters, where the state of the system in a given observation period results from applying probabilities to a previous state, a Markov chain or Markov process.

We see, for example, that if we had a group of 250 000 voters that at year $n = 0$ was distributed among Democrats, Republicans, and Independents by the vector (with entries measured in thousands) $\mathbf{x}^{(0)} = [120 \;\; 110 \;\; 20]^T$, then we can easily compute the projected distribution of voters in subsequent years. In particular, (1.3.14) implies

\[
\mathbf{x}^{(1)} = M\mathbf{x}^{(0)} = \begin{bmatrix} 118.70 \\ 104 \\ 27.3 \end{bmatrix}, \quad
\mathbf{x}^{(2)} = M\mathbf{x}^{(1)} = \begin{bmatrix} 117.80 \\ 99.52 \\ 32.68 \end{bmatrix}, \quad
\mathbf{x}^{(3)} = M\mathbf{x}^{(2)} = \begin{bmatrix} 117.18 \\ 96.18 \\ 36.65 \end{bmatrix}
\]
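The repeated multiplication in (1.3.14) is easy to automate. The sketch below, in plain Python rather than the Maple used elsewhere in the text, applies the voter transition matrix three times to the initial distribution; the helper name `step` and the row-by-row storage of `M` are illustrative assumptions.

```python
def step(M, x):
    """Return the next state M x, with M stored as a list of rows."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

# Transition matrix for the voter example (columns: current D, R, I)
M = [
    [0.95, 0.03, 0.07],
    [0.02, 0.90, 0.13],
    [0.03, 0.07, 0.80],
]

# Each column of a Markov matrix should sum to 1
column_sums = [sum(M[i][j] for i in range(3)) for j in range(3)]

# Initial distribution in thousands: 120 D, 110 R, 20 I
states = [[120.0, 110.0, 20.0]]
for _ in range(3):
    states.append(step(M, states[-1]))

for n, state in enumerate(states):
    print(n, [round(v, 2) for v in state])
```

Up to rounding, the printed states for n = 1, 2, 3 should agree with the vectors displayed above. Because every column of M sums to 1, the total population of 250 thousand is preserved at each step.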


Interestingly, if we continue the sequence, we eventually find that there is very little variation from one vector $\mathbf{x}^{(n)}$ to the next. For example,

\[
\mathbf{x}^{(17)} = \begin{bmatrix} 116.67 \\ 85.95 \\ 47.42 \end{bmatrix}
\approx \mathbf{x}^{(18)} = \begin{bmatrix} 116.79 \\ 85.76 \\ 47.44 \end{bmatrix}
\]

In fact, as we will learn in our later study of eigenvectors, there exists a vector $\mathbf{x}^*$ called the steady-state vector for which $\mathbf{x}^* = M\mathbf{x}^*$. This shows that the system can reach a state in which it does not change from one year to the next. Another example is instructive.

Example 1.3.3 Geographers studying a metropolitan area have observed a trend that while the population of the area stays roughly constant, people within the city and its suburbs are migrating back and forth. In particular, suppose that 85 percent of people whose homes are in the city keep their residence from one year to the next; the remainder move to the suburbs. Likewise, while 92 percent of people whose homes are in suburbs will live there the next year, the other 8 percent will move into the city. Assuming that in a given year there are 230 000 people living in the city and 270 000 people in the surrounding suburbs, predict the population distribution over the next 3 years.

Solution. If we let $C_n$ and $S_n$ denote the populations of the city and suburbs in year $n$, the given information tells us that the following relationships hold:

\[
\begin{aligned}
C_{n+1} &= 0.85C_n + 0.08S_n \\
S_{n+1} &= 0.15C_n + 0.92S_n
\end{aligned}
\]

Using the notation

\[ \mathbf{x}^{(n)} = \begin{bmatrix} C_n \\ S_n \end{bmatrix} \]

we can model the changing distribution of the population between the city and suburbs with the Markov process $\mathbf{x}^{(n+1)} = M\mathbf{x}^{(n)}$, where $M$ is the Markov matrix

\[ M = \begin{bmatrix} 0.85 & 0.08 \\ 0.15 & 0.92 \end{bmatrix} \]

In particular, starting with $\mathbf{x}^{(0)} = [230 \;\; 270]^T$ (entries measured in thousands), we see that

\[
\mathbf{x}^{(1)} = \begin{bmatrix} 217.10 \\ 282.90 \end{bmatrix}, \quad
\mathbf{x}^{(2)} = \begin{bmatrix} 207.17 \\ 292.83 \end{bmatrix}, \quad
\mathbf{x}^{(3)} = \begin{bmatrix} 199.52 \\ 300.48 \end{bmatrix}
\]

As with voter distribution, this example is oversimplified. For instance, we have not taken into account members of the population who move into or away from the metropolitan area. Nonetheless, the basic ideas of Markov processes are important in the study of systems whose current state depends on preceding ones, and we see the key role matrices and matrix multiplication play in representing them.
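The steady-state idea mentioned before example 1.3.3 can be explored numerically for the city and suburb model. The sketch below simply iterates the process many times and checks that the result is nearly unchanged by one more step. The iteration count of 200 and the helper name `step` are arbitrary illustrative choices, and this is a numerical experiment, not the eigenvector method the text develops later.

```python
def step(M, x):
    """Return the next state M x, with M stored as a list of rows."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

# Markov matrix from example 1.3.3 (columns: current city, current suburbs)
M = [
    [0.85, 0.08],
    [0.15, 0.92],
]

x = [230.0, 270.0]  # populations in thousands
for _ in range(200):  # 200 steps is an arbitrary, generous choice
    x = step(M, x)

next_x = step(M, x)
change = max(abs(a - b) for a, b in zip(x, next_x))
print([round(v, 2) for v in x], change)
```

A vector satisfying $\mathbf{x}^* = M\mathbf{x}^*$ with total population 500 must have $0.15C = 0.08S$, which gives $C = 4000/23 \approx 173.9$ and $S = 7500/23 \approx 326.1$ (in thousands); the iteration should settle near these values.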


1.3.2 Matrix products using Maple

After becoming comfortable with computing elementary matrix products by hand, it is useful to see how Maple can assist us with more complicated computations. Here, we demonstrate the relevant command. Revisiting example 1.3.2, to compute the product $A\mathbf{x}$, we first enter $A$ and $\mathbf{x}$ using the familiar commands

> A := <<1,-4,2>|<-3,1,5>>; x := <-5,2>;

Next, we use the ‘period’ symbol to inform Maple that we want to multiply. Entering

> b := A.x;

yields the expected output that

\[ \mathbf{b} = \begin{bmatrix} -11 \\ 22 \\ 0 \end{bmatrix} \]

Note: Maple will only perform the multiplication when it is defined. If, say, we were to attempt to multiply a $2 \times 2$ matrix and a $3 \times 1$ vector, Maple would report the following:

Error, (in LinearAlgebra:-MatrixVectorMultiply) vector dimension (3) must be the same as the matrix column dimension (2).

Exercises 1.3 For exercises 1–4, where a matrix A and vector x are given, compute the product Ax in every case that it is deﬁned. If the product is undeﬁned, explain why. 1 −3 2 −1 1. A = , x= 2 −4 1 0 ⎤ ⎡ −1 1 −3 2 , x = ⎣ 2⎦ 2. A = −4 1 0 4 ⎤ ⎡ 5 −2 3 ⎦ ⎣ 1 −1 , x = 3. A = −2 −3 2 ⎡ ⎤ 3

4. A = −4 2 7 , x = ⎣ 5 ⎦ −1


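5. Recall from multivariable calculus that given vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^3$, the dot product of $\mathbf{x}$ and $\mathbf{y}$, $\mathbf{x} \cdot \mathbf{y}$, is computed by taking

\[ \mathbf{x} \cdot \mathbf{y} = x_1y_1 + x_2y_2 + x_3y_3 \]

How can matrix–vector multiplication (when defined) be viewed as the result of computing several appropriate dot products? Explain.

6. For the system of equations given below, determine a vector equation with an equivalent solution. What is the system asking in regard to linear combinations of certain vectors?

\[
\begin{aligned}
x_1 + 2x_2 &= 1 \\
x_1 + x_2 &= 0
\end{aligned}
\]

In addition, determine a matrix $A$ and vector $\mathbf{b}$ so that the equation $A\mathbf{x} = \mathbf{b}$ is equivalent to the given system of equations.

7. For the system of differential equations (1.1.6) (also given below) from the introductory section, how can we rewrite the system in matrix–vector notation?

\[
\begin{aligned}
\frac{dx_1}{dt} &= 20 - \frac{x_1}{20} + \frac{x_2}{80} \\
\frac{dx_2}{dt} &= 35 + \frac{x_1}{40} - \frac{x_2}{40}
\end{aligned}
\]

Hint: recall that if $\mathbf{x}(t)$ is a vector function, we write $\mathbf{x}'(t)$ or $d\mathbf{x}/dt$ for the vector $[dx_1/dt \;\; dx_2/dt]^T$.

8. Determine if the vector $\mathbf{b} = [-3 \;\; 1 \;\; 5]^T$ is a linear combination of the vectors $\mathbf{a}_1 = [-1 \;\; 2 \;\; 1]^T$, $\mathbf{a}_2 = [3 \;\; 1 \;\; 1]^T$, and $\mathbf{a}_3 = [1 \;\; 5 \;\; 3]^T$. If so, will more than one set of weights work?

9. Determine if the vector $\mathbf{b} = [0 \;\; 7 \;\; 4]^T$ is a linear combination of the vectors $\mathbf{a}_1 = [-1 \;\; 2 \;\; 1]^T$, $\mathbf{a}_2 = [3 \;\; 1 \;\; 1]^T$, and $\mathbf{a}_3 = [1 \;\; 5 \;\; 3]^T$. If so, will more than one set of weights work?

10. We know from our work in this section that the matrix equation $A\mathbf{x} = \mathbf{b}$ corresponds both to a vector equation and a system of linear equations. What is the augmented matrix that represents this system of equations?

In exercises 11–15, let $A$ be the stated matrix and $\mathbf{b}$ the given vector. Solve the linear equation $A\mathbf{x} = \mathbf{b}$ by converting the equation to a system of linear equations and row-reducing appropriately. If the system has more than one solution, express the solution in parametric vector form. Finally, write a sentence in each case that explains how the vector $\mathbf{b}$ is related to linear combinations of the columns of $A$.

11. $A = \begin{bmatrix} 4 & 5 & -1 \\ 3 & 1 & 2 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} 13 \\ -4 \end{bmatrix}$

12. $A = \begin{bmatrix} 2 & 5 \\ -3 & -1 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} 5 \\ 6 \end{bmatrix}$

13. $A = \begin{bmatrix} 6 & 2 \\ -3 & -1 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} 7 \\ -1 \end{bmatrix}$

14. $A = \begin{bmatrix} 1 & -3 \\ -2 & 1 \\ 3 & -1 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} 5 \\ -5 \\ 5 \end{bmatrix}$

15. $A = \begin{bmatrix} 5 & -3 & 1 \\ -2 & 1 & 4 \\ 1 & 0 & -2 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} 0 \\ 22 \\ -11 \end{bmatrix}$

16. Linear equations of the form $A\mathbf{x} = \mathbf{0}$ are important for a variety of reasons, some of which we will study in the next section. Explain why the system of linear equations corresponding to the equation $A\mathbf{x} = \mathbf{0}$ is always consistent, regardless of the matrix $A$.

In exercises 17–21, solve the linear equation $A\mathbf{x} = \mathbf{0}$ by row-reducing appropriately. If the system has more than one solution, express the solution in parametric vector form.

17. $A = \begin{bmatrix} 4 & 5 & -1 \\ 3 & 1 & 2 \end{bmatrix}$

18. $A = \begin{bmatrix} 2 & 5 \\ -3 & -1 \end{bmatrix}$

19. $A = \begin{bmatrix} 6 & 2 \\ -3 & -1 \end{bmatrix}$

20. $A = \begin{bmatrix} 1 & -3 \\ -2 & 1 \\ 3 & -1 \end{bmatrix}$

21. $A = \begin{bmatrix} 5 & -3 & 1 \\ -2 & 1 & 4 \\ 1 & 0 & -2 \end{bmatrix}$

22. Let $A = \begin{bmatrix} 3 & -4 \\ -6 & 8 \end{bmatrix}$ and $\mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$. Describe the set of all vectors $\mathbf{b}$ for which the equation $A\mathbf{x} = \mathbf{b}$ is consistent.

23. Let $\mathbf{v}_1 = \begin{bmatrix} 3 \\ -6 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} -4 \\ 8 \end{bmatrix}$, and $\mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$. Describe the set of all vectors $\mathbf{b}$ for which $\mathbf{b}$ is a linear combination of $\mathbf{v}_1$ and $\mathbf{v}_2$.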


24. Let $A$ be an $m \times n$ matrix, $\mathbf{x}$ and $\mathbf{y} \in \mathbb{R}^n$, and $c \in \mathbb{R}$. Show that
(a) $A(\mathbf{x} + \mathbf{y}) = A\mathbf{x} + A\mathbf{y}$
(b) $A(c\mathbf{x}) = c(A\mathbf{x})$

25. Decide whether each of the following sentences is true or false. In every case, write one sentence to support your answer.
(a) To compute the product $A\mathbf{x}$, the vector $\mathbf{x}$ must have the same number of entries as the number of rows in $A$.
(b) A linear combination of three vectors in $\mathbb{R}^3$ produces another vector in $\mathbb{R}^3$.
(c) If $\mathbf{b}$ is a linear combination of $\mathbf{v}_1$ and $\mathbf{v}_2$, then there exist scalars $c_1$ and $c_2$ such that $c_1\mathbf{v}_1 + c_2\mathbf{v}_2 = \mathbf{b}$.
(d) If $A$ is a matrix and $\mathbf{x}$ and $\mathbf{b}$ are vectors such that $A\mathbf{x} = \mathbf{b}$, then $\mathbf{x}$ is a linear combination of the columns of $A$.
(e) The equation $A\mathbf{x} = \mathbf{0}$ can be inconsistent.

26. Suppose that for a large population that stays relatively constant, people are classified as living in urban, suburban, or rural settings. Moreover, assume that the probabilities of the various possible transitions are given by the following table:

Future location (↓) / current location (→)    U(%)    S(%)    R(%)
Urban                                          92       3       2
Suburban                                        7      96      10
Rural                                           1       1      88

Given that the population of 250 million is initially distributed in 100 million urban, 100 million suburban, and fifty million rural, predict the population distribution in each of the following five years.

27. Car-owners can be grouped into classes based on the vehicles they own. A study of owners of sedans, minivans, and sport utility vehicles shows that the likelihood that an owner of one of these automobiles will replace it with another of the same or different type is given by the table

Future vehicle (↓) / current vehicle (→)    Sedan(%)    Minivan(%)    SUV(%)
Sedan                                        91           3             2
Minivan                                       7          95             8
SUV                                           2           2            90

If there are currently 100 000 sedans, 60 000 minivans, and 80 000 SUVs among the owners being studied, predict the distribution of vehicles among the population after each owner has replaced her vehicle 3 times.

1.4 The span of a set of vectors

In section 1.3, we saw that the question “is $\mathbf{b}$ a linear combination of $\mathbf{a}_1$ and $\mathbf{a}_2$?” provides an important new perspective on solutions of linear systems of equations. It is natural to slightly rephrase this question and ask more generally “which vectors $\mathbf{b}$ may be written as linear combinations of $\mathbf{a}_1$ and $\mathbf{a}_2$?” We explore this question further through the following sequence of examples.

Example 1.4.1 Describe the set of all vectors in $\mathbb{R}^2$ that may be written as a linear combination of the vector $\mathbf{a}_1 = [2 \;\; 1]^T$.

Solution. Since we have just one vector $\mathbf{a}_1$, any linear combination of $\mathbf{a}_1$ has the form $c\mathbf{a}_1$, which of course is a scalar multiple of $\mathbf{a}_1$. Geometrically, the vectors that are linear combinations of $\mathbf{a}_1$ are stretches of $\mathbf{a}_1$, which lie on the line through $(0, 0)$ in the direction of $\mathbf{a}_1$, as shown in figure 1.8.

In this first example, we see a visual way to interpret the question about linear combinations: essentially we want to know “which vectors can we create using only linear combinations of $\mathbf{a}_1$?” The answer is not surprising: only vectors that lie on the line through the origin in the direction of $\mathbf{a}_1$. Next, we consider how the situation changes when we consider two parallel vectors.

Figure 1.8 The set of all linear combinations of $\mathbf{a}_1$ in example 1.4.1.


Example 1.4.2 Describe the set of all vectors in $\mathbb{R}^2$ that may be written as a linear combination of the vectors $\mathbf{a}_1 = [2 \;\; 1]^T$ and $\mathbf{a}_2 = [-1 \;\; {-\tfrac{1}{2}}]^T$.

Solution. Observe first that $-\tfrac{1}{2}\mathbf{a}_1 = \mathbf{a}_2$. Here we are considering the set of all vectors $\mathbf{y}$ of the form

\[ \mathbf{y} = c_1 \begin{bmatrix} 2 \\ 1 \end{bmatrix} + c_2 \begin{bmatrix} -1 \\ -\tfrac{1}{2} \end{bmatrix} \]

In figure 1.9, we observe that the vectors $\mathbf{a}_1$ and $\mathbf{a}_2$ point in opposing directions. When we take a linear combination of these vectors to form $\mathbf{y}$, we are adding a stretch of $c_1$ units of the first to a stretch of $c_2$ units of the second. Because the two directions are parallel, this leaves the resulting vector as a stretch of one of the two original vectors, and therefore on the line through the origin in their direction. This may also be seen algebraically since $-\tfrac{1}{2}\mathbf{a}_1 = \mathbf{a}_2$ implies

\[ \mathbf{y} = c_1\mathbf{a}_1 + c_2\mathbf{a}_2 = c_1\mathbf{a}_1 - \tfrac{1}{2}c_2\mathbf{a}_1 = \left(c_1 - \tfrac{1}{2}c_2\right)\mathbf{a}_1. \]

We note particularly that since the two given vectors $\mathbf{a}_1$ and $\mathbf{a}_2$ are parallel, any linear combination of them is actually a scalar multiple of $\mathbf{a}_1$. Thus, the resulting set of all linear combinations is identical to what we found with the single vector given in example 1.4.1. Finally, we consider the situation where we consider all linear combinations of two non-parallel vectors.

Figure 1.9 The set of all linear combinations of $\mathbf{a}_1$ and $\mathbf{a}_2$ in example 1.4.2.
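The algebraic identity in example 1.4.2 can be spot-checked with exact rational arithmetic. The short Python sketch below forms a few linear combinations of the two parallel vectors and confirms that each one equals $(c_1 - \tfrac{1}{2}c_2)\mathbf{a}_1$. The function name `combo` and the sample weights are illustrative choices, not values from the text.

```python
from fractions import Fraction

a1 = [Fraction(2), Fraction(1)]
a2 = [Fraction(-1), Fraction(-1, 2)]  # a2 = -(1/2) a1

def combo(c1, c2):
    """Return the linear combination c1*a1 + c2*a2."""
    return [c1 * u + c2 * v for u, v in zip(a1, a2)]

# Illustrative weights (not from the text)
samples = [(3, 4), (-2, 5), (Fraction(1, 3), -7)]
results = []
for c1, c2 in samples:
    y = combo(c1, c2)
    scale = c1 - Fraction(1, 2) * c2          # predicted multiple of a1
    matches = y == [scale * a1[0], scale * a1[1]]
    results.append(matches)
    print(c1, c2, y, matches)
```

Using `Fraction` avoids floating-point rounding, so the equality test is exact for these samples.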

Example 1.4.3 Describe the set of all vectors in R2 that may be written as a linear combination of the vectors a1 = [2 1]T and a2 = [1 2]T .

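Figure 1.10 Linear combinations of $\mathbf{a}_1$ and $\mathbf{a}_2$ from example 1.4.3. (The plot shows $\mathbf{a}_1$, $\mathbf{a}_2$, and the combinations $\mathbf{a}_1 + \mathbf{a}_2$, $-2\mathbf{a}_1 + 2\mathbf{a}_2$, and $\tfrac{7}{3}\mathbf{a}_1 - \tfrac{5}{3}\mathbf{a}_2$.)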

Solution. Algebraically, we are again considering the set of all vectors $\mathbf{y}$ such that $\mathbf{y} = c_1\mathbf{a}_1 + c_2\mathbf{a}_2$. A visual way to think about how the set of all such vectors $\mathbf{y}$ looks is found in the question, “which vectors can we create by taking a stretch of $\mathbf{a}_1$ and adding this to a stretch of $\mathbf{a}_2$?” If we consider a plot of the given two vectors $\mathbf{a}_1$ and $\mathbf{a}_2$ and think of the “grid” that is formed by considering all of their stretches and the sums of their stretches, we have the picture shown in figure 1.10.

The fact that $\mathbf{a}_1$ and $\mathbf{a}_2$ are not parallel enables us to “get off the line” that each one generates through the origin. For example, if we simply take the sum of these two vectors and set $\mathbf{y} = \mathbf{a}_1 + \mathbf{a}_2$, by the parallelogram law of vector addition we arrive at the new vector $[3 \;\; 3]^T$ shown in figure 1.10. Two other linear combinations are shown as well, and from here it is not hard to visualize the fact that we can create any vector in the plane using linear combinations of the non-parallel vectors $\mathbf{a}_1$ and $\mathbf{a}_2$. In other words, the set of all linear combinations of $\mathbf{a}_1$ and $\mathbf{a}_2$ is $\mathbb{R}^2$.

It is also possible to verify our findings in example 1.4.3 algebraically. We will explore this further in the exercises and in section 1.5. Certainly we are not limited to considering linear combinations of only two vectors. We therefore introduce a more formal perspective and terminology to describe the phenomena examined in the above examples.

Definition 1.4.1 Given a set of vectors $S = \{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$, $\mathbf{v}_i \in \mathbb{R}^m$, the span of $S$, denoted $\mathrm{Span}(S)$ or $\mathrm{Span}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$, is the set of all linear combinations of the vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$. Equivalently, $\mathrm{Span}(S)$ is the set of all vectors $\mathbf{y}$ of the form

\[ \mathbf{y} = c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k \]

where $c_1, \ldots, c_k$ are scalars. We also say that $\mathrm{Span}(S)$ is the subset of $\mathbb{R}^m$ spanned by the vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$.

For any single nonzero vector $\mathbf{v}_1 \in \mathbb{R}^m$, $\mathrm{Span}\{\mathbf{v}_1\}$ consists of all vectors that lie on the line through the origin in $\mathbb{R}^m$ in the direction of $\mathbf{v}_1$. For two nonparallel vectors $\mathbf{v}_1, \mathbf{v}_2 \in \mathbb{R}^m$, $\mathrm{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$ is the plane through the origin that contains both the vectors $\mathbf{v}_1$ and $\mathbf{v}_2$.

Next, let us recall that our interest in linear combinations was motivated by a desire to look at systems of linear equations from a new perspective. How is the concept of span related to linear systems? We begin to answer this question by considering the special situation where $\mathbf{b} = \mathbf{0}$. A system of linear equations that can be represented in matrix form by the equation $A\mathbf{x} = \mathbf{0}$ is said to be homogeneous; the case when $\mathbf{b} \neq \mathbf{0}$ is termed nonhomogeneous. We also call the equation $A\mathbf{x} = \mathbf{0}$ a homogeneous equation. By the definition of matrix–vector multiplication, it is immediately clear that $A\mathbf{0} = \mathbf{0}$ (note that these two zero vectors may be of different sizes), and thus any homogeneous equation has at least one solution and is guaranteed to be consistent. We will usually call the solution $\mathbf{x} = \mathbf{0}$ the trivial solution. Under what circumstances will a homogeneous system have nontrivial solutions? How is this question related to the span of a set of vectors? The following example provides insight into these questions.

Example 1.4.4 Solve the homogeneous system of linear equations given by the equation $A\mathbf{x} = \mathbf{0}$ where $A$ is the matrix

\[ A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & 3 \\ 1 & 0 & -2 & 2 \\ 8 & 5 & -1 & 11 \end{bmatrix} \]

If more than one solution exists, express the solution in parametric vector form.

Solution. To begin, we augment the matrix $A$ with a column of zeros to represent the vector $\mathbf{0}$ in the system given by $A\mathbf{x} = \mathbf{0}$. We then row-reduce this augmented matrix to find

\[
\left[\begin{array}{cccc|c} 1 & 1 & 1 & 1 & 0 \\ 2 & 1 & -1 & 3 & 0 \\ 1 & 0 & -2 & 2 & 0 \\ 8 & 5 & -1 & 11 & 0 \end{array}\right]
\to
\left[\begin{array}{cccc|c} 1 & 0 & -2 & 2 & 0 \\ 0 & 1 & 3 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]
\]

We observe that the system has two free variables, and therefore infinitely many solutions. In particular, these solutions must satisfy the equations

\[
\begin{aligned}
x_1 - 2x_3 + 2x_4 &= 0 \\
x_2 + 3x_3 - x_4 &= 0
\end{aligned}
\]


where $x_3$ and $x_4$ are free. Equivalently, using these equations and vector addition and scalar multiplication, it must be the case that any solution $\mathbf{x}$ to $A\mathbf{x} = \mathbf{0}$ has the form

\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} 2x_3 - 2x_4 \\ -3x_3 + x_4 \\ x_3 \\ x_4 \end{bmatrix}
= x_3 \begin{bmatrix} 2 \\ -3 \\ 1 \\ 0 \end{bmatrix}
+ x_4 \begin{bmatrix} -2 \\ 1 \\ 0 \\ 1 \end{bmatrix} \tag{1.4.1}
\]

where $x_3, x_4 \in \mathbb{R}$. Note particularly that this shows that every solution $\mathbf{x}$ to the original homogeneous equation $A\mathbf{x} = \mathbf{0}$ can be expressed as a linear combination of the two vectors on the rightmost side of (1.4.1). Moreover, it is also the case that every linear combination of these two vectors is a solution to the equation. In light of the terminology of span, we can say that the set of all solutions to the homogeneous equation $A\mathbf{x} = \mathbf{0}$ is $\mathrm{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$, where

\[
\mathbf{v}_1 = \begin{bmatrix} 2 \\ -3 \\ 1 \\ 0 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} -2 \\ 1 \\ 0 \\ 1 \end{bmatrix}
\]

In this section, we have seen that the set of all linear combinations of a set of vectors can be interpreted geometrically, particularly in the case when we only have one or two vectors present, by thinking about lines and planes. In addition, the span of a set of vectors arises naturally in considering homogeneous equations in which infinitely many solutions are present. In that situation, the set of all solutions can be expressed as the span of a set of $k$ vectors, where $k$ is the number of free variables that arise in row-reducing the augmented matrix.

Exercises 1.4 In exercises 1–6, solve the homogeneous equation $A\mathbf{x} = \mathbf{0}$, given the matrix $A$. If infinitely many solutions exist, express the solution set as the span of the smallest possible set of vectors.

1. $A = \begin{bmatrix} 1 & -3 & 2 \\ -4 & 1 & 0 \end{bmatrix}$

2. $A = \begin{bmatrix} -4 & 2 \\ 1 & -3 \\ 6 & 5 \end{bmatrix}$

3. $A = \begin{bmatrix} 8 & -5 \\ 10 & -16 \end{bmatrix}$

4. $A = \begin{bmatrix} 2 & -4 \\ 2 & -1 \end{bmatrix}$

5. $A = \begin{bmatrix} 3 & 1 & -1 \\ 1 & 3 & 1 \\ -1 & 1 & 3 \end{bmatrix}$

6. $A = \begin{bmatrix} 1 & -1 & 2 \\ 4 & -2 & 6 \\ -7 & 3 & -10 \end{bmatrix}$

7. Let $A$ be an $m \times n$ matrix where $n > m$. Is it possible that $A\mathbf{x} = \mathbf{0}$ has only the trivial solution? Explain why or why not.

8. Let $A$ be an $m \times n$ matrix where $n \le m$. Is it guaranteed that $A\mathbf{x} = \mathbf{0}$ will have only the trivial solution? Explain why or why not.

9. Determine if the vector $\mathbf{b} = [11 \;\; {-4}]^T$ is in the span of the vectors $\mathbf{a}_1 = [3 \;\; {-2}]^T$ and $\mathbf{a}_2 = [-9 \;\; 6]^T$. Justify your answer carefully.

10. Determine if the vector $\mathbf{b} = [-17 \;\; 31]^T$ is in the span of the vectors $\mathbf{a}_1 = [1 \;\; 0]^T$ and $\mathbf{a}_2 = [0 \;\; 1]^T$. What do you observe?

11. Determine if the vector $\mathbf{b} = [9 \;\; 17 \;\; 11]^T$ is in the span of the vectors $\mathbf{a}_1 = [-1 \;\; 2 \;\; 1]^T$, $\mathbf{a}_2 = [3 \;\; 1 \;\; 1]^T$, and $\mathbf{a}_3 = [1 \;\; 5 \;\; 3]^T$. Justify your answer.

12. Explain why the vector $\mathbf{b} = [3 \;\; 2]^T$ does not lie in the span of the set $S$, where $S = \{\mathbf{v}\}$ and $\mathbf{v} = [1 \;\; 1]^T$.

13. Describe geometrically the set $W = \mathrm{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$, where $\mathbf{v}_1 = [1 \;\; 1 \;\; 1]^T$ and $\mathbf{v}_2 = [-3 \;\; 0 \;\; 2]^T$.

14. Can every vector $\mathbf{b} \in \mathbb{R}^3$ be found in $W = \mathrm{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$, where $\mathbf{v}_1 = [1 \;\; 1 \;\; 1]^T$ and $\mathbf{v}_2 = [-3 \;\; 0 \;\; 2]^T$? If so, explain why. If not, find a vector not in $W$ and justify your answer.

15. Show that every point (vector) that lies on the line with equation $2x_1 - 3x_2 = 0$ also lies in the set $W = \mathrm{Span}\{\mathbf{v}_1\}$, where $\mathbf{v}_1 = [3 \;\; 2]^T$.

16. Show that every point (vector) that lies on the plane with equation $-x + y + z = 0$ also lies in the set $W = \mathrm{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$, where $\mathbf{v}_1 = [1 \;\; {-1} \;\; 2]^T$ and $\mathbf{v}_2 = [2 \;\; 1 \;\; 1]^T$.

17. Decide whether each of the following sentences is true or false. In every case, write one sentence to support your answer.
(a) The span of a single nonzero vector in $\mathbb{R}^2$ can be thought of as a line through the origin.
(b) The span of any two nonzero vectors in $\mathbb{R}^3$ can be viewed as a plane through the origin in $\mathbb{R}^3$.
(c) If $A\mathbf{x} = \mathbf{b}$ holds true for a given matrix $A$ and vectors $\mathbf{x}$ and $\mathbf{b}$, then $\mathbf{x}$ lies in the span of the columns of $A$.
(d) It is possible for a homogeneous equation $A\mathbf{x} = \mathbf{0}$ to be inconsistent.
(e) The number of free variables present in the solution to $A\mathbf{x} = \mathbf{0}$ is the same as the number of pivot columns in the matrix $A$.
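Returning to example 1.4.4 of this section, the claim that every linear combination of the two spanning vectors solves the homogeneous equation can be spot-checked in a few lines of Python. This is only a sketch: the helper `matvec`, the column-list storage of the matrix, and the small grid of integer weights are assumptions made for illustration, and a finite check does not replace the algebraic argument.

```python
def matvec(columns, x):
    """Return Ax as the linear combination of the columns of A with weights x."""
    m = len(columns[0])
    return [sum(w * col[i] for w, col in zip(x, columns)) for i in range(m)]

# Columns of the 4 x 4 matrix A from example 1.4.4
A_cols = [
    [1, 2, 1, 8],
    [1, 1, 0, 5],
    [1, -1, -2, -1],
    [1, 3, 2, 11],
]

v1 = [2, -3, 1, 0]
v2 = [-2, 1, 0, 1]

# Try every combination s*v1 + t*v2 with integer weights from -3 to 3
all_zero = True
for s in range(-3, 4):
    for t in range(-3, 4):
        x = [s * p + t * q for p, q in zip(v1, v2)]
        if matvec(A_cols, x) != [0, 0, 0, 0]:
            all_zero = False

print(matvec(A_cols, v1), matvec(A_cols, v2), all_zero)
```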

1.5 Systems of linear equations revisited

From our initial work with row-reducing a system of linear equations to our recent discussions of linear combinations and span, we have seen already that there are several perspectives from which to view a system of linear equations. One is purely algebraic: “is there at least one ordered list $(x_1, \ldots, x_n)$ that makes every equation in a given system true?” Here we are viewing the system in the form

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m
\end{aligned}
\]

In light of linear combinations, we can rephrase this question geometrically as “is the vector $\mathbf{b}$ a linear combination of the vectors $\mathbf{a}_1, \ldots, \mathbf{a}_n$?”, where $\mathbf{a}_i$ is the $i$th column of the coefficient matrix of the system. From this standpoint, asking if the system has a solution can be thought of in terms of the question, “does the vector $\mathbf{b}$ belong to the span of the columns of $A$?” Finally, through matrix multiplication, we can also express this system of equations in its simplest form: $A\mathbf{x} = \mathbf{b}$. From all of this, we know that the question, “Does $A\mathbf{x} = \mathbf{b}$ have at least one solution?” is one of fundamental importance.

We have also seen that in the special case of the homogeneous equation $A\mathbf{x} = \mathbf{0}$, the answer to the above questions is always affirmative, since setting $\mathbf{x} = \mathbf{0}$ guarantees that we have at least one solution. In what follows, we further explore the nonhomogeneous case $A\mathbf{x} = \mathbf{b}$, with particular emphasis on understanding characteristics of the matrix $A$ that enable us to answer the questions in the preceding paragraph. We begin by revisiting example 1.4.2 from a more algebraic perspective.

Example 1.5.1 For which vectors $\mathbf{b}$ is the equation $A\mathbf{x} = \mathbf{b}$ consistent, if $A$ is the matrix whose columns are the vectors $\mathbf{a}_1 = [2 \;\; 1]^T$ and $\mathbf{a}_2 = [-1 \;\; {-\tfrac{1}{2}}]^T$?

Solution. By the definition of matrix multiplication, this question is equivalent to asking, “which vectors $\mathbf{b}$ are linear combinations of the columns of $A$?” This question may be equivalently rephrased as “which vectors $\mathbf{b}$ are in the span of the columns of $A$?” We have already answered this question from a geometric perspective in example 1.4.2, where we saw that since $\mathbf{a}_1$ and $\mathbf{a}_2$ are parallel, it follows that every vector in $\mathbb{R}^2$ that lies on the line through the origin in the direction of $\mathbf{a}_1$ can be written as a linear combination of the two vectors. Nonetheless, it is insightful to explore algebraically why this is the case. Letting $\mathbf{b}$ be the vector whose entries are $b_1$ and $b_2$ and writing the equation $A\mathbf{x} = \mathbf{b}$ in the form of an augmented matrix, we row-reduce and find that

\[
\left[\begin{array}{cc|c} 2 & -1 & b_1 \\ 1 & -\tfrac{1}{2} & b_2 \end{array}\right]
\to
\left[\begin{array}{cc|c} 1 & -\tfrac{1}{2} & b_2 \\ 0 & 0 & b_1 - 2b_2 \end{array}\right]
\]

The second row in the augmented matrix represents the equation

\[ 0x_1 + 0x_2 = b_1 - 2b_2 \]

Observe that if $b_1 - 2b_2 \neq 0$, this equation cannot possibly be true, and therefore the system would be inconsistent. Said differently, the only way for $A\mathbf{x} = \mathbf{b}$ to be consistent is for $b_1 - 2b_2 = 0$. That is, if $\mathbf{b}$ is a vector such that $b_1 = 2b_2$, or

\[ \mathbf{b} = \begin{bmatrix} 2b_2 \\ b_2 \end{bmatrix} \]

then $A\mathbf{x} = \mathbf{b}$ is consistent. This makes sense geometrically, since the span of the columns of $A$ is all the stretches of the vector $\mathbf{a}_1 = [2 \;\; 1]^T$.

An important lesson to take from example 1.5.1 is that the equation $A\mathbf{x} = \mathbf{b}$ discussed there is not consistent for every choice of $\mathbf{b}$. In fact, the equation is only consistent for very limited choices of $\mathbf{b}$. For example, if $\mathbf{b} = [6 \;\; 3]^T$, the equation is consistent, but if $\mathbf{b} = [6 \;\; k]^T$ for any $k \neq 3$, the equation is inconsistent. Moreover, we should observe that for the matrix in this example, $A$ does not have a pivot position in every row. This is what ultimately leads to the algebraic equation $0x_1 + 0x_2 = b_1 - 2b_2$, and the potential inconsistency of $A\mathbf{x} = \mathbf{b}$.

At this point in our work, it is important that we begin to generalize our observations in order to apply them in new, but similar, circumstances. We again emphasize that it is a noteworthy characteristic of linear algebra that the discipline often offers great flexibility through the large number of ways to say the same thing; at times, one way of stating a fact can give more insight than others, and therefore it is important to be well versed in shifting among multiple perspectives.
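The consistency condition $b_1 - 2b_2 = 0$ found in example 1.5.1 can be turned into a small computation. The Python sketch below returns weights for the two columns when the condition holds and `None` otherwise. The function name `solve_if_consistent` and the parameter `free_value` (the value assigned to the free variable $x_2$) are hypothetical names introduced here for illustration.

```python
from fractions import Fraction

a1 = [Fraction(2), Fraction(1)]
a2 = [Fraction(-1), Fraction(-1, 2)]

def solve_if_consistent(b, free_value=0):
    """Return [x1, x2] with x1*a1 + x2*a2 = b, or None when b1 - 2*b2 != 0."""
    b1, b2 = Fraction(b[0]), Fraction(b[1])
    if b1 - 2 * b2 != 0:
        return None                    # the row 0 = b1 - 2*b2 cannot hold
    x2 = Fraction(free_value)          # x2 is a free variable
    x1 = b2 + Fraction(1, 2) * x2      # first reduced row: x1 - (1/2) x2 = b2
    return [x1, x2]

def combine(x):
    """Return x1*a1 + x2*a2."""
    return [x[0] * u + x[1] * v for u, v in zip(a1, a2)]

print(solve_if_consistent([6, 3]))                # weights x1 = 3, x2 = 0
print(solve_if_consistent([6, 3], free_value=2))  # weights x1 = 4, x2 = 2
print(solve_if_consistent([6, 4]))                # None, since 6 - 2*4 != 0
```

Choosing different values for the free variable gives different weights that produce the same vector, which illustrates that a consistent system here has infinitely many solutions.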
The following theorem is of the form “the following statements are equivalent”; this means that if any one of the statements is true, all the others are as well. Likewise, if any one statement is false, every statement in the theorem must be false. This theorem formalizes our findings in the example above, and, in some sense, our work in the first several sections of the text.

Theorem 1.5.1 Let $A$ be an $m \times n$ matrix and $\mathbf{b}$ a vector in $\mathbb{R}^m$ so that the equation $A\mathbf{x} = \mathbf{b}$ represents a system of $m$ linear equations in $n$ unknown variables. The following statements are equivalent:

a. The equation $A\mathbf{x} = \mathbf{b}$ is consistent
b. The vector $\mathbf{b}$ is a linear combination of the columns of $A$
c. The vector $\mathbf{b}$ is in the span of the columns of $A$
d. When the augmented matrix $[A \;\; \mathbf{b}]$ is row-reduced, there are no rows where the first $n$ entries are zero and the last entry is nonzero.

The following example demonstrates how we can use theorem 1.5.1 to answer questions about span and linear combinations.

Example 1.5.2 Does the vector $\mathbf{b} = [1 \;\; {-7} \;\; {-14}]^T$ belong to the span of the vectors $\mathbf{a}_1 = [1 \;\; 3 \;\; 4]^T$, $\mathbf{a}_2 = [2 \;\; 1 \;\; {-1}]^T$, and $\mathbf{a}_3 = [0 \;\; 5 \;\; 9]^T$? Does the result change if we ask the same question about the vector $\mathbf{c} = [1 \;\; {-7} \;\; {-13}]^T$?

Solution. By theorem 1.5.1, we know that it is equivalent to ask if the equation $A\mathbf{x} = \mathbf{b}$ is consistent, where $\mathbf{b}$ is the given vector and $A$ is the matrix whose columns are $\mathbf{a}_1$, $\mathbf{a}_2$, and $\mathbf{a}_3$. To answer that question, we consider the augmented matrix $[A \mid \mathbf{b}]$ and row-reduce:

\[
\left[\begin{array}{ccc|c} 1 & 2 & 0 & 1 \\ 3 & 1 & 5 & -7 \\ 4 & -1 & 9 & -14 \end{array}\right]
\to
\left[\begin{array}{ccc|c} 1 & 0 & 2 & -3 \\ 0 & 1 & -1 & 2 \\ 0 & 0 & 0 & 0 \end{array}\right]
\]

Because this system of equations is consistent, it follows that $\mathbf{b}$ is indeed a linear combination of the columns of $A$ and therefore $\mathbf{b}$ lies in the span of $\mathbf{a}_1$, $\mathbf{a}_2$, and $\mathbf{a}_3$. If we instead consider the vector $\mathbf{c}$ stated in the example and proceed similarly, row-reduction shows that

\[
\left[\begin{array}{ccc|c} 1 & 2 & 0 & 1 \\ 3 & 1 & 5 & -7 \\ 4 & -1 & 9 & -13 \end{array}\right]
\to
\left[\begin{array}{ccc|c} 1 & 0 & 2 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 1 \end{array}\right]
\]

which implies that the system is inconsistent and therefore $\mathbf{c}$ is not a linear combination of the columns of $A$, or equivalently, $\mathbf{c}$ does not lie in the span of $\mathbf{a}_1$, $\mathbf{a}_2$, and $\mathbf{a}_3$.

At this point, it is natural to think the situations in examples 1.5.1 and 1.5.2 are somewhat dissatisfying: sometimes $A\mathbf{x} = \mathbf{b}$ is consistent, and sometimes not, all depending on our choice of $\mathbf{b}$. A natural question to ask is, “are there matrices $A$ for which $A\mathbf{x} = \mathbf{b}$ is consistent for every choice of $\mathbf{b}$?” With that question, we are certainly interested in the properties of the matrix $A$ that make this situation occur. We next revisit example 1.4.3 and explore these issues further.
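Statement (d) of theorem 1.5.1 gives a test that is mechanical enough to code. The following Python sketch row-reduces an augmented matrix with exact fractions and then looks for a row whose first $n$ entries are zero and whose last entry is nonzero. The function names `rref` and `is_consistent` are illustrative, and the elimination strategy (Gauss–Jordan using the first available nonzero pivot) is one reasonable choice, not necessarily the sequence of row operations used in the text.

```python
from fractions import Fraction

def rref(rows):
    """Return the reduced row echelon form of a matrix given as a list of rows."""
    M = [[Fraction(v) for v in row] for row in rows]
    n_rows, n_cols = len(M), len(M[0])
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row == n_rows:
            break
        # Find a row at or below pivot_row with a nonzero entry in this column
        pivot = next((r for r in range(pivot_row, n_rows) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[pivot_row], M[pivot] = M[pivot], M[pivot_row]
        lead = M[pivot_row][col]
        M[pivot_row] = [v / lead for v in M[pivot_row]]  # scale pivot to 1
        for r in range(n_rows):
            if r != pivot_row and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[pivot_row])]
        pivot_row += 1
    return M

def is_consistent(A_rows, b):
    """Apply theorem 1.5.1(d) to the augmented matrix [A | b]."""
    augmented = [list(row) + [bi] for row, bi in zip(A_rows, b)]
    for row in rref(augmented):
        if all(v == 0 for v in row[:-1]) and row[-1] != 0:
            return False
    return True

# Matrix with columns a1, a2, a3 from example 1.5.2, stored by rows
A = [[1, 2, 0], [3, 1, 5], [4, -1, 9]]
print(is_consistent(A, [1, -7, -14]))  # True
print(is_consistent(A, [1, -7, -13]))  # False
```

Exact `Fraction` arithmetic matters here: with floating-point numbers, a row that should reduce to all zeros can be left with tiny nonzero entries, which would make the zero tests unreliable.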
Example 1.5.3 For which vectors b is the equation Ax = b consistent, if A is the matrix whose columns are the vectors a1 = [2 1]T and a2 = [1 2]T ?


Solution. Proceeding as in the previous example, we row reduce the augmented matrix form of the equation and ﬁnd that

\[
\left[\begin{array}{cc|c} 2 & 1 & b_1 \\ 1 & 2 & b_2 \end{array}\right] \to \left[\begin{array}{cc|c} 1 & 0 & \frac{2}{3} b_1 - \frac{1}{3} b_2 \\ 0 & 1 & -\frac{1}{3} b_1 + \frac{2}{3} b_2 \end{array}\right]
\]

Algebraically, this shows that regardless of the entries we select for the vector b, we can always find a solution to the equation Ax = b. In particular, x is the vector in $\mathbb{R}^2$ whose components are $x_1 = \frac{2}{3} b_1 - \frac{1}{3} b_2$ and $x_2 = -\frac{1}{3} b_1 + \frac{2}{3} b_2$. Thus the equation Ax = b is consistent for every b in $\mathbb{R}^2$. Note that this is not surprising, given our work in example 1.4.3, where we found that from a geometric perspective, every vector $b \in \mathbb{R}^2$ could be written as a linear combination of $a_1$ and $a_2$. This example simply confirms that finding, but now from an algebraic point of view.

In terms of a key property of the matrix in example 1.5.3, we see that A has a pivot position in every row. In particular, there is no row in RREF(A) where we encounter all zeros, and thus it is impossible to ever encounter an equation of the form 0 = k, where $k \neq 0$. This is, therefore, one property of the matrix A that guarantees consistency for every choice of b. We generalize our findings in this example in the following theorem, which is similar to theorem 1.5.1, but now focuses solely on the matrix A and no longer requires a vector b to be initially chosen.

Theorem 1.5.2 Let A be an m × n matrix. The following statements are equivalent:

a. The equation Ax = b is consistent for every $b \in \mathbb{R}^m$ b. Every vector $b \in \mathbb{R}^m$ is a linear combination of the columns of A c. The span of the columns of A is $\mathbb{R}^m$ d. A has a pivot position in every row. That is, when the matrix A is row-reduced, there are no rows of all zeros.

Our next example shows how we can apply theorem 1.5.2 to answer general questions about the span of a set of vectors and the consistency of related systems of equations.

Example 1.5.4 Does the vector $b = [1 \ {-7} \ {-13}]^T$ belong to the span of the vectors $a_1 = [1 \ 3 \ 4]^T$, $a_2 = [2 \ 1 \ {-1}]^T$, and $a_3 = [0 \ 5 \ 10]^T$? Can every vector in $\mathbb{R}^3$ be found in the span of the vectors $a_1$, $a_2$, and $a_3$?

Solution. Just as in example 1.5.2, we know by theorem 1.5.1 that it is equivalent to ask if the equation Ax = b is consistent, where b is the given vector and A is the matrix whose columns are $a_1$, $a_2$, and $a_3$. We thus consider


the augmented matrix [A | b] and row-reduce:

\[
\left[\begin{array}{ccc|c} 1 & 2 & 0 & 1 \\ 3 & 1 & 5 & -7 \\ 4 & -1 & 10 & -13 \end{array}\right] \to \left[\begin{array}{ccc|c} 1 & 0 & 0 & -5 \\ 0 & 1 & 0 & 3 \\ 0 & 0 & 1 & 1 \end{array}\right]
\]

Because this system of equations is consistent, it follows that b is indeed a linear combination of the columns of A and therefore b lies in the span of $a_1$, $a_2$, and $a_3$. But by theorem 1.5.2 we can now make a much more general observation. Because we see that the coefficient matrix A has a pivot in every row, it follows that regardless of which vector b we choose in $\mathbb{R}^3$, we can write that vector as a linear combination of the columns of A. That is, the vectors $a_1$, $a_2$, and $a_3$ span all of $\mathbb{R}^3$ and the equation Ax = b will be consistent for every choice of b.

This example demonstrates that it is in some sense ideal if a matrix A has a pivot in every row. As we proceed with further study of linear algebra, we will focus more and more on properties of the coefficient matrix and their implications for related systems of equations. We conclude this section by examining a key link between homogeneous and nonhomogeneous equations in order to foreshadow an essential concept in our pending study of differential equations.

Example 1.5.5 Solve the nonhomogeneous system of linear equations given by the equation Ax = b where A and b are

\[
A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & 3 \\ 1 & 0 & -2 & 2 \\ 8 & 5 & -1 & 11 \end{bmatrix}, \quad b = \begin{bmatrix} 1 \\ -8 \\ -9 \\ -22 \end{bmatrix}
\]

If more than one solution exists, express the solution in parametric vector form.

Solution. Note that the coefficient matrix A is identical to the one in example 1.4.4, so that here we are simply considering a related nonhomogeneous equation. We augment the matrix A with b and then row reduce to find

\[
\left[\begin{array}{cccc|c} 1 & 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & 3 & -8 \\ 1 & 0 & -2 & 2 & -9 \\ 8 & 5 & -1 & 11 & -22 \end{array}\right] \to \left[\begin{array}{cccc|c} 1 & 0 & -2 & 2 & -9 \\ 0 & 1 & 3 & -1 & 10 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]
\]

As we found with the homogeneous equation, the system is consistent and has two free variables, and therefore infinitely many solutions. These solutions must satisfy the equations

\[
\begin{aligned} x_1 &= -9 + 2x_3 - 2x_4 \\ x_2 &= 10 - 3x_3 + x_4 \end{aligned}
\]


where $x_3$ and $x_4$ are free. Equivalently, it must be the case that any solution x has the form

\[
x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} -9 + 2x_3 - 2x_4 \\ 10 - 3x_3 + x_4 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} -9 \\ 10 \\ 0 \\ 0 \end{bmatrix} + x_3 \begin{bmatrix} 2 \\ -3 \\ 1 \\ 0 \end{bmatrix} + x_4 \begin{bmatrix} -2 \\ 1 \\ 0 \\ 1 \end{bmatrix}
\]

where $x_3, x_4 \in \mathbb{R}$. Observe that if we let $x_p = [-9 \ 10 \ 0 \ 0]^T$ and let $x_h$ be any vector of the form

\[
x_h = t \begin{bmatrix} 2 \\ -3 \\ 1 \\ 0 \end{bmatrix} + s \begin{bmatrix} -2 \\ 1 \\ 0 \\ 1 \end{bmatrix}
\]

then any solution to the equation Ax = b has the form $x = x_p + x_h$. Moreover, it is now apparent that this vector $x_h$ is the same general solution vector that we found for the corresponding homogeneous equation in example 1.4.4. In addition, it is straightforward to check that $Ax_p = b$. Thus, we see that the general solution to the nonhomogeneous equation contains the general solution to the corresponding homogeneous equation.

It appears from example 1.5.5 that if we have a solution, say $x_p$, to a nonhomogeneous equation Ax = b, we may add any solution $x_h$ to the homogeneous equation Ax = 0 to $x_p$ and still have a solution to Ax = b. To see why any vector of the form $x_p + x_h$ is a solution to Ax = b, let us assume that $x_p$ is a solution to Ax = b, and $x_h$ is a solution to Ax = 0. We claim that $x = x_p + x_h$ is also a solution to Ax = b. This holds since

\[
Ax = A(x_p + x_h) = Ax_p + Ax_h = b + 0 = b \tag{1.5.1}
\]

Clearly, this shows that the solution to the corresponding homogeneous equation plays a central role in the solution of nonhomogeneous equations. One observation we can make is that in the event we can ﬁnd a single particular solution xp to the nonhomogeneous equation, if the corresponding homogeneous equation has at least one free variable, then we know that there must be inﬁnitely many solutions to the nonhomogeneous equation as well. We could even take the perspective that, in order to solve a nonhomogeneous equation, we simply need to do two things: ﬁnd one particular solution to Ax = b, and then combine that particular solution with the general solution to the corresponding homogeneous equation Ax = 0. While this is not so useful with systems of linear algebraic equations, it turns out that this approach of solving the homogeneous equation ﬁrst is essential in the solution of differential equations.
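The structure $x = x_p + x_h$ from example 1.5.5 can be checked directly by matrix–vector multiplication. The following sketch is not part of the original text; the names `mat_vec`, `h1`, `h2`, and `solution` are assumptions introduced here for illustration.

```python
def mat_vec(A, x):
    """Compute the matrix-vector product Ax for A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Data from example 1.5.5.
A = [[1, 1, 1, 1],
     [2, 1, -1, 3],
     [1, 0, -2, 2],
     [8, 5, -1, 11]]
b = [1, -8, -9, -22]

x_p = [-9, 10, 0, 0]   # particular solution
h1 = [2, -3, 1, 0]     # homogeneous solution multiplied by t
h2 = [-2, 1, 0, 1]     # homogeneous solution multiplied by s

def solution(t, s):
    """Return x_p + t*h1 + s*h2."""
    return [p + t * u + s * v for p, u, v in zip(x_p, h1, h2)]

print(mat_vec(A, x_p) == b)                 # particular solution solves Ax = b
print(mat_vec(A, h1), mat_vec(A, h2))       # both products are the zero vector
print(all(mat_vec(A, solution(t, s)) == b
          for t in range(-3, 4) for s in range(-3, 4)))
```

Only integer choices of t and s are sampled here; equation (1.5.1) is what guarantees the result for every real t and s.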


The following example shows how the same structure is present in a class of differential equations that we will discuss in detail in section 2.3.

Example 1.5.6 Consider the differential equations $y' + 3y = 0$ and $y' + 3y = 6$. Compare and contrast the solutions to these two equations.

Solution. The first equation, $y' + 3y = 0$, we will call a homogeneous linear first-order differential equation. Note that it asks a straightforward question: what function y(t) is such that the function’s derivative plus 3 times itself is the zero function? Said differently, we seek a function y such that $y' = -3y$. From our experience with exponential functions in calculus, we know that if $y = e^{-3t}$, then $y' = -3e^{-3t}$. The same is true for functions like $y = 2e^{-3t}$ and $y = -5e^{-3t}$; indeed, we see that for any constant C, the function $y = Ce^{-3t}$ satisfies the differential equation. (It also turns out that these are the only functions that satisfy the differential equation.)

If we next consider the related differential equation $y' + 3y = 6$, one that we will call a nonhomogeneous linear first-order differential equation, we see that there is one obvious solution to the equation. In particular, if we let y(t) be the constant function y(t) = 2, then $y'(t) = 0$ and this function clearly makes the differential equation true since $0 + 3 \cdot 2 = 6$. Now, we should wonder if we have found all of the possible solutions to $y' + 3y = 6$. The answer is no: as we will see in section 2.3, it turns out that the general solution y to this differential equation is

\[
y(t) = 2 + Ce^{-3t}
\]

We can verify that this is the case by direct substitution. Note that $y' = -3Ce^{-3t}$ and therefore

\[
y' + 3y = -3Ce^{-3t} + 3(2 + Ce^{-3t}) = -3Ce^{-3t} + 6 + 3Ce^{-3t} = 6
\]

Observe the structure of this solution function: if we let $y_p = 2$, we have a particular solution to the nonhomogeneous equation. Further, letting $y_h = Ce^{-3t}$, this is the general solution to the related homogeneous equation.
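The substitution check in example 1.5.6 can also be carried out numerically. The following is a rough sketch, not part of the original text; the function names `y` and `residual` and the step size `h` are assumptions made here for illustration. It approximates the derivative with a central difference and confirms that $y' + 3y - 6$ is essentially zero for several choices of C.

```python
import math

def y(t, C):
    """Candidate general solution y(t) = 2 + C e^{-3t}."""
    return 2 + C * math.exp(-3 * t)

def residual(t, C, h=1e-6):
    """Approximate y'(t) + 3 y(t) - 6, using a central difference for y'."""
    dy = (y(t + h, C) - y(t - h, C)) / (2 * h)
    return dy + 3 * y(t, C) - 6

# Sample t in [0, 2] for a few constants C, including C = 0 (the solution y = 2).
for C in (-5, 0, 1, 2):
    worst = max(abs(residual(k / 10, C)) for k in range(21))
    print(C, worst < 1e-6)
```

A small residual at sampled points is evidence rather than proof; the algebraic substitution in the example is what establishes the result for all t.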
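This demonstrates that the overall solution to the nonhomogeneous equation is $y = y_p + y_h = 2 + Ce^{-3t}$.

Exercises 1.5

For each of the following m × n matrices A in exercises 1–8, determine whether the equation Ax = b is consistent for every choice of $b \in \mathbb{R}^m$. If not, describe the set of all $b \in \mathbb{R}^m$ for which the equation is consistent. In each case, explain your reasoning fully.

1. $A = \begin{bmatrix} 4 & -1 \\ 1 & -4 \end{bmatrix}$

2. $A = \begin{bmatrix} 4 & -1 \\ -12 & 3 \end{bmatrix}$

3. $A = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & -3 \end{bmatrix}$

4. $A = \begin{bmatrix} 2 & 1 \\ -1 & 3 \\ 4 & -2 \end{bmatrix}$

5. $A = \begin{bmatrix} 1 & 5 & -2 \\ 2 & -1 & 7 \\ -3 & 4 & -14 \end{bmatrix}$

6. $A = \begin{bmatrix} 1 & 5 & -2 \\ 2 & -1 & 7 \\ -3 & 4 & -13 \end{bmatrix}$

7. $A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$

8. $A = \begin{bmatrix} 1 & 0 & 0 & 2 \\ 0 & 1 & 0 & 5 \\ 0 & 0 & 1 & -3 \end{bmatrix}$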

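9. If A is an m × n matrix and m > n, is it possible for the equation Ax = b to be consistent for every $b \in \mathbb{R}^m$? Explain.

10. If A is an m × n matrix and m ≤ n, is it guaranteed that the equation Ax = b will be consistent for every $b \in \mathbb{R}^m$? Explain.

In each of exercises 11–16, determine whether the given vector b is in the span of the columns of the given matrix A. If b lies in the span of the columns of A, determine weights that enable you to explicitly write b as a linear combination of the columns of A.

11. $b = \begin{bmatrix} 2 \\ 5 \end{bmatrix}$, $A = \begin{bmatrix} 4 & -1 \\ 1 & -4 \end{bmatrix}$

12. $b = \begin{bmatrix} 6 \\ -20 \end{bmatrix}$, $A = \begin{bmatrix} 4 & -1 \\ -12 & 3 \end{bmatrix}$

13. $b = \begin{bmatrix} 6 \\ -2 \end{bmatrix}$, $A = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & -3 \end{bmatrix}$

14. $b = \begin{bmatrix} 1 \\ -11 \\ 14 \end{bmatrix}$, $A = \begin{bmatrix} 2 & 1 \\ -1 & 3 \\ 4 & -2 \end{bmatrix}$

15. $b = \begin{bmatrix} -4 \\ -2 \\ 1 \end{bmatrix}$, $A = \begin{bmatrix} 1 & 5 & -2 \\ 2 & -1 & 7 \\ -3 & 4 & -14 \end{bmatrix}$

16. $b = \begin{bmatrix} -4 \\ -2 \\ 1 \end{bmatrix}$, $A = \begin{bmatrix} 1 & 5 & -2 \\ 2 & -1 & 7 \\ -3 & 4 & -13 \end{bmatrix}$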

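For each matrix A given in exercises 17–21, determine the general solution $x_h$ to the homogeneous equation Ax = 0.

17. $A = \begin{bmatrix} 1 & -3 & 2 \\ -4 & 1 & 3 \end{bmatrix}$

18. $A = \begin{bmatrix} 1 & 2 & 0 & 1 \\ 3 & 1 & 5 & -7 \\ 4 & -1 & 10 & -13 \end{bmatrix}$

19. $A = \begin{bmatrix} 8 & -5 \\ 10 & -16 \end{bmatrix}$

20. $A = \begin{bmatrix} 3 & 1 & -1 \\ 1 & 3 & 1 \\ -1 & 1 & 3 \end{bmatrix}$

21. $A = \begin{bmatrix} 1 & -1 & 2 \\ 4 & -2 & 6 \\ -7 & 3 & -10 \end{bmatrix}$

In exercises 22–26, solve the nonhomogeneous equation Ax = b, given the matrix A and vector b. Express your solution x (if one exists) in the form $x = x_p + x_h$, where $x_p$ is a particular solution to Ax = b and $x_h$ is the solution to the corresponding homogeneous equation Ax = 0. Compare your results to exercises 17–21, respectively.

22. $A = \begin{bmatrix} 1 & -3 & 2 \\ -4 & 1 & 3 \end{bmatrix}$, $b = \begin{bmatrix} 5 \\ -9 \end{bmatrix}$

23. $A = \begin{bmatrix} 1 & 2 & 0 & 1 \\ 3 & 1 & 5 & -7 \\ 4 & -1 & 10 & -13 \end{bmatrix}$, $b = \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix}$

24. $A = \begin{bmatrix} 8 & -5 \\ 10 & -16 \end{bmatrix}$, $b = \begin{bmatrix} -21 \\ 42 \end{bmatrix}$

25. $A = \begin{bmatrix} 3 & 1 & -1 \\ 1 & 3 & 1 \\ -1 & 1 & 3 \end{bmatrix}$, $b = \begin{bmatrix} 3 \\ -1 \\ 1 \end{bmatrix}$

26. $A = \begin{bmatrix} 1 & -1 & 2 \\ 4 & -2 & 6 \\ -7 & 3 & -10 \end{bmatrix}$, $b = \begin{bmatrix} 5 \\ 16 \\ -27 \end{bmatrix}$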


27. Suppose that A is a 6 × 9 matrix that has a pivot in every row. What can you say about the consistency of Ax = b for every b ∈ R6 ? Why? 28. Suppose that A is a 3 × 4 matrix and that the span of the columns of A is R3 . What can you say about the consistency of Ax = b for every b ∈ R3 ? Why? 29. If possible, give an example of a 3 × 2 matrix A such that the span of the columns of A is R3 . If ﬁnding such a matrix is impossible, explain why. 30. Suppose that A is a 4 × 3 matrix for which the homogeneous equation Ax = 0 has only the trivial solution. Will the equation Ax = b be consistent for every b ∈ R4 ? Explain. For the vectors b for which Ax = b is indeed a consistent equation, how many solution vectors x does each equation have? Why? 31. Suppose that A is a 3 × 4 matrix for which the homogeneous equation Ax = 0 has exactly one free variable present. Will the equation Ax = b be consistent for every b ∈ R3 ? Explain. For the vectors b for which Ax = b is indeed a consistent equation, how many solution vectors x does each equation have? Why? 32. Suppose that A is a 4 × 5 matrix for which the homogeneous equation Ax = 0 has exactly two free variables present. Will the equation Ax = b be consistent for every b ∈ R4 ? Explain. For the vectors b for which Ax = b is indeed a consistent equation, how many solution vectors x does each equation have? Why? 33. Decide whether each of the following sentences is true or false. In every case, write one sentence to support your answer. (a) If Ax = b is consistent for at least one vector b, then A has a pivot in every row. (b) If A is a 4 × 3 matrix, then it is possible for the columns of A to span R4 . (c) If A is a 3 × 3 matrix with exactly two pivot columns, then the columns of A do not span R3 . (d) If A is a 3 × 4 matrix, then the columns of A must span R3 . (e) If y and z are solutions to the equation Ax = 0, then the vector y + z is also a solution to Ax = 0. 
(f) If y and z are solutions to the equation Ax = b, where $b \neq 0$, then the vector y + z is also a solution to Ax = b.

34. Solve the linear first-order differential equation $y' + y = 3$ by first finding all functions $y_h$ that satisfy the homogeneous equation $y' + y = 0$ and then determining a constant function $y_p$ that is a solution to $y' + y = 3$. Verify by direct substitution that $y = y_h + y_p$ is a solution to the given equation.

35. Solve the linear first-order differential equation $y' - 5y = 6$ by first finding all functions $y_h$ that satisfy the homogeneous equation $y' - 5y = 0$ and then determining a constant function $y_p$ that is a solution to $y' - 5y = 6$. Verify by direct substitution that $y = y_h + y_p$ is a solution to the given equation.

1.6 Linear independence

In theorem 1.5.2, we found that when solving Ax = b, an ideal situation occurs when A has a pivot position in every row. Equivalently, this means that the equation Ax = b is guaranteed to have at least one solution for every vector $b \in \mathbb{R}^m$ (when A is m × n), or that every $b \in \mathbb{R}^m$ can be written as a linear combination of the columns of A. In other words, regardless of the choice of b, the equation Ax = b is always consistent. Because the equation is consistent, we are guaranteed that at least one solution x exists. In what follows, we explore conditions that imply not only that at least one solution exists, but in fact that only one solution exists.

First, we consider the simpler situation of homogeneous equations. In section 1.4, we discovered that the equation Ax = 0 is always consistent. Because x = 0 always makes this equation true, we know that we at least have the trivial solution present. It is natural to ask: under what conditions on A is the trivial solution the only solution to the homogeneous equation Ax = 0? Geometrically, we are asking whether or not a nontrivial linear combination of the columns of A can be formed that leads to the zero vector. We revisit an earlier example to further explore these issues.

Example 1.6.1 Does the equation Ax = 0 have nontrivial solutions if A is the matrix whose columns are $a_1 = [2 \ 1]^T$ and $a_2 = [-1 \ {-\frac{1}{2}}]^T$? Discuss the geometric implications of your conclusions.

Solution. We first consider the corresponding augmented matrix and row reduce, finding that

\[
\left[\begin{array}{cc|c} 2 & -1 & 0 \\ 1 & -\frac{1}{2} & 0 \end{array}\right] \to \left[\begin{array}{cc|c} 1 & -\frac{1}{2} & 0 \\ 0 & 0 & 0 \end{array}\right]
\]

This shows that any vector $x = [x_1 \ x_2]^T$ that satisfies $x_1 = \frac{1}{2} x_2$ will be a solution to Ax = 0. The presence of the free variable $x_2$ implies that there are infinitely many nontrivial solutions to this equation. If we interpret the matrix–vector product Ax as the linear combination $Ax = x_1 a_1 + x_2 a_2$, then the equation

\[
\tfrac{1}{2} x_2 a_1 + x_2 a_2 = 0
\]

implies geometrically that the zero vector (on the right) may be expressed as a nontrivial linear combination of $a_1$ and $a_2$. For example, $a_1 + 2a_2 = 0$.

Figure 1.11 Linear combinations of a1 and a2 from example 1.6.1.

Indeed, if we consider figure 1.11 this conclusion is evident: if we add one length of $a_1$ to two lengths of $a_2$, we end up at 0. Another way to express the equation $a_1 + 2a_2 = 0$ is to write $a_1 = -2a_2$. In this setting, we can see that $a_1$ depends on $a_2$, and that the relationship is given by a linear equation. We hence say that $a_1$ and $a_2$ are linearly dependent vectors.

The situation in example 1.6.1, where the vectors $a_1$ and $a_2$ are parallel, is in contrast to that of example 1.4.3, where we instead considered the non-parallel vectors $a_1 = [2 \ 1]^T$ and $a_2 = [1 \ 2]^T$; in that setting, if we solve the associated homogeneous equation Ax = 0, we find that

\[
\left[\begin{array}{cc|c} 2 & 1 & 0 \\ 1 & 2 & 0 \end{array}\right] \to \left[\begin{array}{cc|c} 1 & 0 & 0 \\ 0 & 1 & 0 \end{array}\right]
\]

In this case, the only solution to Ax = 0 is the trivial solution, x = 0. The geometry of the situation also informs us: if we desire a linear combination of the vectors $a_1$ and $a_2$ (as shown in figure 1.12) that results in the zero vector, we see that the only way to accomplish this is to take $0a_1 + 0a_2$. Said differently, if we take any nontrivial linear combination $c_1 a_1 + c_2 a_2$, we end up at a location other than the origin. When $a_1$ and $a_2$ in example 1.6.1 were parallel, we said that $a_1$ and $a_2$ were linearly dependent. In the current context, where $a_1$ and $a_2$ are not parallel, it makes sense to say that $a_1$ and $a_2$ are linearly independent, since neither depends on the other.

Figure 1.12 Linear combinations of a1 and a2 from example 1.4.3.

Of course, in linear algebra we often consider sets of more than two vectors. The next definition formalizes what the terms linearly dependent and linearly independent mean in a more general context. Observe that the key criterion is a geometric one: can we form a nontrivial linear combination of vectors that results in 0?

Definition 1.6.1 Given a set $S = \{v_1, \ldots, v_k\}$ where each vector $v_i \in \mathbb{R}^m$, the set S is linearly dependent if there exists a nontrivial solution x to the vector equation

\[
x_1 v_1 + x_2 v_2 + \cdots + x_k v_k = 0 \tag{1.6.1}
\]

If (1.6.1) has only the trivial solution, then we say the set S is linearly independent.

Note that (1.6.1) also takes us back to the fundamental questions about any linear system of equations: “does at least one solution exist?” (Yes; the zero vector is always a solution.) And “is that solution unique?” (Maybe; only if the vectors are linearly independent and the zero vector is the only solution.) The latter question addresses the fundamental issue of linear independence. We consider an example to demonstrate how we interpret the language of this most recent definition as well as how we will generally respond to the question of whether or not a set of vectors is linearly independent.

Example 1.6.2 Determine whether the set $S = \{v_1, v_2, v_3\}$ is linearly independent or linearly dependent if

\[
v_1 = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}, \quad v_2 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}
\]


Solution. By definition, the linear independence of the set S rests on whether or not nontrivial solutions exist to the vector equation $x_1 v_1 + x_2 v_2 + x_3 v_3 = 0$. Letting $A = [v_1 \ v_2 \ v_3]$, we know that this question is equivalent to determining whether or not Ax = 0 has a nontrivial solution. Considering the augmented matrix [A | 0] and row-reducing, we find

\[
\left[\begin{array}{ccc|c} 1 & -1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ -1 & 1 & 1 & 0 \end{array}\right] \to \left[\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{array}\right] \tag{1.6.2}
\]

It follows that Ax = 0 has only the trivial solution, and therefore the set S is linearly independent. Geometrically, this means that if we take any nontrivial combination of $v_1$, $v_2$, and $v_3$, the result is a vector that is not the zero vector.

From example 1.6.2, we see how we will normally test a set of vectors for linear independence: we take advantage of our understanding of linear combinations and matrix multiplication and convert the vector equation $x_1 v_1 + x_2 v_2 + \cdots + x_k v_k = 0$ to the matrix equation Ax = 0, where A is the matrix with columns $v_1, \ldots, v_k$. Row-reducing, we can test whether or not nontrivial solutions exist to Ax = 0 by examining pivot locations in the matrix A.

Several facts about linear dependence and independence will prove to be useful in many aspects of our upcoming work. We simply state them here, and leave their verification to the exercises at the end of this section:

• Any set containing the zero vector is linearly dependent.
• Any set $\{v_1\}$ consisting of a single nonzero vector is linearly independent.
• Any set of two vectors $\{v_1, v_2\}$ is linearly independent whenever neither vector is a scalar multiple of the other.
• The columns of a matrix A are linearly independent if and only if the equation Ax = 0 has only the trivial solution.

The concepts of linear independence and span both involve linear combinations of a set of vectors. Furthermore, there are many important and natural connections between span and linear independence.
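The test described above (row-reduce the matrix whose columns are the given vectors and look for a pivot in every column) can be sketched as follows. This is an illustration under stated assumptions rather than part of the original text: the function names `pivot_columns` and `is_linearly_independent` are hypothetical, and exact `Fraction` arithmetic is used so that the tests for zero are reliable.

```python
from fractions import Fraction

def pivot_columns(vectors):
    """Row-reduce A = [v1 ... vk] and return the indices of its pivot columns."""
    # Row i of A collects the i-th entry of every vector.
    rows = [[Fraction(v[i]) for v in vectors] for i in range(len(vectors[0]))]
    pivots, r = [], 0
    for c in range(len(vectors)):
        src = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if src is None:
            continue  # no pivot in this column, so x_c is a free variable
        rows[r], rows[src] = rows[src], rows[r]
        pivot = rows[r][c]
        rows[r] = [x / pivot for x in rows[r]]
        for i in range(len(rows)):
            if i != r:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    return pivots

def is_linearly_independent(vectors):
    """Independent exactly when every column of A is a pivot column."""
    return len(pivot_columns(vectors)) == len(vectors)

# The set S from example 1.6.2.
S = [[1, 1, -1], [-1, 0, 1], [0, 1, 1]]
# The parallel vectors a1, a2 from example 1.6.1.
parallel = [[2, 1], [-1, Fraction(-1, 2)]]

print(is_linearly_independent(S))         # True
print(is_linearly_independent(parallel))  # False
```

The same function also confirms the first bulleted fact for any particular set: a zero vector contributes a column with no pivot, so a set containing it is reported as dependent.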
The next example extends the previous one and lays the foundation for a discussion of several general results.

Example 1.6.3 Let the vectors $v_1$, $v_2$, $v_3$, and $v_4$ be given by

\[
v_1 = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}, \quad v_2 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \quad v_4 = \begin{bmatrix} 5 \\ 6 \\ -1 \end{bmatrix}
\]

Let $R = \{v_1, v_2\}$, $S = \{v_1, v_2, v_3\}$, and $T = \{v_1, v_2, v_3, v_4\}$. Which of the sets R, S, and T are linearly independent? Which of the sets R, S, and T span $\mathbb{R}^3$?


Solution. We have already seen in example 1.6.2 that the set S is linearly independent. Moreover, we saw that when we let $A = [v_1 \ v_2 \ v_3]$ and row-reduce the augmented matrix for the equation Ax = 0, it follows that

\[
\left[\begin{array}{ccc|c} 1 & -1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ -1 & 1 & 1 & 0 \end{array}\right] \to \left[\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{array}\right]
\]

Not only does this show that the vectors in set S are linearly independent (Ax = 0 has only the trivial solution because A has a pivot in every column so there are no free variables present), but also, by theorem 1.5.2, the vectors in S span $\mathbb{R}^3$ since A has a pivot in every row. Since the vectors in S span $\mathbb{R}^3$, this means that we can write every vector in $\mathbb{R}^3$ as a linear combination of the three vectors in S. Moreover, since A has a pivot in every column, it will also follow that every such linear combination is unique: every vector in $\mathbb{R}^3$ can be written in exactly one way as a linear combination of $v_1$, $v_2$, and $v_3$.

What happens if we remove $v_3$ from S and instead consider the set $R = \{v_1, v_2\}$? To answer the question of linear independence, we ask if there is a nontrivial solution to the vector equation $x_1 v_1 + x_2 v_2 = 0$. Equivalently, we let B be the 3 × 2 matrix whose columns are $v_1$ and $v_2$ and solve Bx = 0. Doing so, we find that

\[
\left[\begin{array}{cc|c} 1 & -1 & 0 \\ 1 & 0 & 0 \\ -1 & 1 & 0 \end{array}\right] \to \left[\begin{array}{cc|c} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{array}\right]
\]

so only the trivial solution exists and thus the set R is linearly independent. Note again that this is due to the fact that B has a pivot in every column. This should not be surprising, since we removed a vector from the linearly independent set S to get the set R: if the vectors in S do not depend on one another, neither should the vectors in R.

On the other hand, we can also say by theorem 1.5.2 that the set R does not span $\mathbb{R}^3$, since B does not have a pivot position in every row. For example, the vector $b = [0 \ 1 \ 1]^T$ cannot be written as a linear combination of $v_1$ and $v_2$. This can be seen by row-reducing the augmented matrix that represents Bx = b, where we find that

\[
\left[\begin{array}{cc|c} 1 & -1 & 0 \\ 1 & 0 & 1 \\ -1 & 1 & 1 \end{array}\right] \to \left[\begin{array}{cc|c} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array}\right]
\]

The last equation tells us that $0x_1 + 0x_2 = 1$, which is impossible, and thus b cannot be written as a linear combination of the vectors in R.

Finally, we consider the set $T = \{v_1, v_2, v_3, v_4\}$.
To test if T is linearly independent, we let C be the matrix whose columns are v1 , v2 , v3 , and v4 ,


and consider the equation Cx = 0, which corresponds to the equation $x_1 v_1 + x_2 v_2 + x_3 v_3 + x_4 v_4 = 0$. Row-reducing,

\[
\left[\begin{array}{cccc|c} 1 & -1 & 0 & 5 & 0 \\ 1 & 0 & 1 & 6 & 0 \\ -1 & 1 & 1 & -1 & 0 \end{array}\right] \to \left[\begin{array}{cccc|c} 1 & 0 & 0 & 2 & 0 \\ 0 & 1 & 0 & -3 & 0 \\ 0 & 0 & 1 & 4 & 0 \end{array}\right]
\]

Note that the variable $x_4$ is free, since C does not have a pivot in its fourth column. This shows that any vector x with entries $x_1$, $x_2$, $x_3$, and $x_4$ such that $x_1 = -2x_4$, $x_2 = 3x_4$, and $x_3 = -4x_4$ will be a solution to the equation Cx = 0. For example, taking $x_4 = 1$, it follows that

\[
-2 \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} + 3 \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} - 4 \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} + 1 \begin{bmatrix} 5 \\ 6 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\]

Thus, the set T is linearly dependent. We can also see from our computations that the set T does indeed span $\mathbb{R}^3$, since the matrix C has a pivot position in every row. This result should be expected: we have already shown that every vector in $\mathbb{R}^3$ can be written as a linear combination of the vectors in S, and the set T contains all three vectors in S.

There are many important generalizations we can make from example 1.6.3. For instance, from an algebraic perspective we see that we can easily answer questions about the linear independence and span of the columns of a matrix simply by considering the location of pivots in the matrix. In particular, the columns of A are linearly independent if and only if A has a pivot in every column, while the columns of A span $\mathbb{R}^m$ if and only if A has a pivot in every row. We state these results formally in the two following theorems.

Theorem 1.6.1 Let A be an m × n matrix. The following statements are equivalent:

a. The columns of A span $\mathbb{R}^m$. b. A has a pivot position in every row. c. The equation Ax = b is consistent for every $b \in \mathbb{R}^m$.

In the next theorem, note particularly the change in emphasis in statement (b) from rows to columns when considering pivot positions in the matrix.

Theorem 1.6.2 Let A be an m × n matrix. The following statements are equivalent:

a. The columns of A are linearly independent. b. A has a pivot position in every column. c. The equation Ax = 0 has only the trivial solution.
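Theorems 1.6.1 and 1.6.2 reduce both questions to counting pivots: the columns span $\mathbb{R}^m$ when the number of pivots equals the number of rows, and they are linearly independent when it equals the number of columns. The sketch below applies this to the sets R, S, and T of example 1.6.3. It is not part of the original text, and the names `count_pivots` and `describe` are assumptions introduced here.

```python
from fractions import Fraction

def count_pivots(rows):
    """Count pivot positions by forward elimination with exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        src = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if src is None:
            continue  # no pivot in this column
        m[rank], m[src] = m[src], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank

def describe(vectors):
    """Report whether the vectors span R^m and whether they are independent."""
    n_rows, n_cols = len(vectors[0]), len(vectors)
    rows = [[v[i] for v in vectors] for i in range(n_rows)]
    pivots = count_pivots(rows)
    return {"spans": pivots == n_rows, "independent": pivots == n_cols}

v1, v2, v3, v4 = [1, 1, -1], [-1, 0, 1], [0, 1, 1], [5, 6, -1]

print(describe([v1, v2]))          # R: independent, does not span
print(describe([v1, v2, v3]))      # S: independent and spans
print(describe([v1, v2, v3, v4]))  # T: spans, not independent
```

Only the set S, whose matrix is square, has both properties at once, which matches the conclusions reached in example 1.6.3.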


At this point, it appears ideal if a set is linearly independent or spans $\mathbb{R}^m$. The best scenario, then, is the case when a set has both of these properties and forms a linearly independent spanning set. In this case, for the matrix whose columns are the vectors in the set, we need the matrix to have a pivot in every column, as well as in every row. As we saw in example 1.6.3 with the set S and the corresponding matrix A, this can only happen when the number of vectors in the set S matches the number of entries in each vector. In other words, the corresponding matrix A must be square. Obviously if a square matrix has a pivot in every row, it must also have a pivot in every column, and vice versa. We close our current discussion with an important result that links the concepts of linear independence and span in the columns of a square matrix; theorem 1.6.3 is a consequence of the two preceding ones.

Theorem 1.6.3 Let A be an n × n matrix. The following statements are equivalent: a. The columns of A are linearly independent. b. The columns of A span $\mathbb{R}^n$. c. A has a pivot position in every column. d. A has a pivot position in every row. e. For each $b \in \mathbb{R}^n$, the equation Ax = b has a unique solution.

Theorem 1.6.3 shows that square matrices play a particularly important role in linear algebra, an idea that will further demonstrate itself when we study the notion of the inverse of a matrix in the following section. We conclude this section with a look ahead to our study of linear differential equations, in which the concepts of linear independence and span will also find a prominent role.

Example 1.6.4 Consider the differential equation $y'' + y = 0$. Explain why the function $y = c_1 \cos t + c_2 \sin t$ is a solution to the differential equation.

Solution. In our upcoming study of differential equations, we will call the equation $y'' + y = 0$ a linear second-order homogeneous equation with constant coefficients.
Equations of this form will be considered in chapter 3 and be the focus of chapter 4. For now, we can intuitively understand why $y = c_1 \cos t + c_2 \sin t$ is a solution to the equation. Note that in order to solve the equation $y'' + y = 0$, we must find all functions y such that $y'' = -y$. From our experience in calculus, we know that

\[
\frac{d}{dt}[\sin t] = \cos t \quad \text{and} \quad \frac{d}{dt}[\cos t] = -\sin t
\]


Furthermore, if we consider second derivatives,

\[
\frac{d^2}{dt^2}[\sin t] = \frac{d}{dt}[\cos t] = -\sin t \quad \text{and} \quad \frac{d^2}{dt^2}[\cos t] = \frac{d}{dt}[-\sin t] = -\cos t
\]

Hence, the second derivative of each basic trigonometric function is the opposite of itself, which makes both $y = \cos t$ and $y = \sin t$ solutions to the equation $y'' + y = 0$. Moreover, it is a straightforward exercise to show (using properties of the derivative) that any scalar multiple (such as $y = 3 \sin t$) of either function is also a solution to the differential equation, as is any combination of the form $y = 2 \cos t + 3 \sin t$. More generally, this makes any function $y = c_1 \cos t + c_2 \sin t$ a solution to the differential equation.

If we think about our understanding of linear independence for a set of two vectors, we find an analogy to the two functions cos t and sin t: since these two functions are not scalar multiples of one another, it makes sense to call these functions linearly independent. Moreover, from the form of the function $y = c_1 \cos t + c_2 \sin t$, we are taking linear combinations of the basic trigonometric functions to form other solutions to the differential equation. We can even go so far as to say that the solution set to the differential equation is the span of the two functions cos t and sin t.

In future work, we will see that this broader perspective on linear independence and span serves us well in solving linear differential equations. In subsequent work, we will also gain additional understanding of why the solution set to every second-order linear homogeneous differential equation with constant coefficients demonstrates a similar structure.

Exercises 1.6

In each of exercises 1–8, determine whether the given set S is linearly independent or linearly dependent.

1. $S = \{v_1, v_2\}$ where $v_1 = [3 \ {-2}]^T$ and $v_2 = [-9 \ 6]^T$

2. $S = \{v_1, v_2\}$ where $v_1 = [1 \ 0]^T$ and $v_2 = [0 \ 1]^T$

3. $S = \{v_1, v_2\}$ where $v_1 = [5 \ {-2}]^T$ and $v_2 = [5 \ 2]^T$

4. $S = \{v_1, v_2, v_3\}$ where $v_1 = [5 \ {-2}]^T$, $v_2 = [5 \ 2]^T$, and $v_3 = [11 \ {-5}]^T$

5. $S = \{v_1, v_2, v_3\}$ where $v_1 = [-1 \ 2 \ 1]^T$, $v_2 = [3 \ 1 \ 1]^T$, and $v_3 = [1 \ 5 \ 3]^T$

6. $S = \{v_1, v_2, v_3\}$ where $v_1 = [-1 \ 2 \ 1]^T$, $v_2 = [3 \ 1 \ 1]^T$, and $v_3 = [1 \ 5 \ 2]^T$

7. $S = \{v_1, v_2\}$ where $v_1 = [1 \ {-2} \ 4 \ 3]^T$ and $v_2 = [-3 \ 6 \ {-12} \ {-9}]^T$

8. $S = \{v_1, v_2, v_3, v_4\}$ where $v_1 = [-1 \ 2 \ 1]^T$, $v_2 = [3 \ 1 \ 1]^T$, $v_3 = [1 \ 5 \ 2]^T$, and $v_4 = [1 \ 1 \ 1]^T$

9. For each of the sets S in exercises 1–8, determine whether or not S spans $\mathbb{R}^m$, where m is chosen appropriately.


10. Suppose that S is a set of three vectors in $\mathbb{R}^5$. Is it possible for S to span $\mathbb{R}^5$? Why or why not?
11. Suppose that S is a set of two vectors in $\mathbb{R}^3$. Is S linearly independent, linearly dependent, or not necessarily either? Explain your answer.
12. Let S be a set of four vectors in $\mathbb{R}^3$. Is it possible for S to be linearly independent? Is it possible for S to span $\mathbb{R}^3$? Why or why not?
13. Let S be a set of five vectors in $\mathbb{R}^4$. Must S span $\mathbb{R}^4$? Is it possible for S to be linearly independent? Explain.
14. If A is an m × n matrix, for what relationship between n and m are the columns of A guaranteed not to span $\mathbb{R}^m$? For what relationship between n and m will the columns have to be linearly dependent?
15. Prove that any set that contains the zero vector must be linearly dependent.
16. Explain why any set consisting of a single nonzero vector must be linearly independent.
17. Show that any set of two vectors, $\{v_1, v_2\}$, is linearly independent if and only if $v_1$ is not a scalar multiple of $v_2$.
18. Explain why the columns of a matrix A are linearly independent if and only if the equation Ax = 0 has only the trivial solution.
19. Let $v_1 = [-1\ 2\ 1]^T$, $v_2 = [3\ 1\ 1]^T$, and $v_3 = [5\ 3\ k]^T$. For what value(s) of k is $\{v_1, v_2, v_3\}$ linearly independent? For what value(s) of k is $v_3$ in the span of $\{v_1, v_2\}$? How are these two questions related?
20. Consider the set $S = \{v_1, v_2, v_3\}$ where $v_1 = [1\ 0\ 0]^T$, $v_2 = [0\ 1\ 0]^T$, and $v_3 = [0\ 0\ 1]^T$. Explain why S spans $\mathbb{R}^3$, and also why S is linearly independent. In addition, determine the weights $x_1$, $x_2$, and $x_3$ that allow you to write the vector $[-27\ 13\ 91]^T$ as a (unique) linear combination of $v_1$, $v_2$, $v_3$. What do you observe?
21. Let A be a 4 × 7 matrix. Suppose that when solving the homogeneous equation Ax = 0 there are three free variables present. Do the columns of A span $\mathbb{R}^4$? Explain. Are the columns of A linearly dependent, linearly independent, or is it impossible to say? Justify your answer.
22. Suppose that A is a 9 × 6 matrix and that A has six pivot columns. Are the columns of A linearly dependent, linearly independent, or is it impossible to say? Do the columns of A span $\mathbb{R}^9$, or is it impossible to tell? Justify your answers.
23. Decide whether each of the following sentences is true or false. In every case, write one sentence to support your answer.
(a) If the system represented by Ax = 0 has a free variable present, then the columns of the matrix A are linearly independent vectors.

Essentials of linear algebra

(b) If a matrix has more columns than rows, then the columns of the matrix must be linearly dependent.
(c) If an m × n matrix A has a pivot in every column, then the columns of A span $\mathbb{R}^m$.
(d) If A is an m × n matrix that is not square, it is possible for its columns to be both linearly independent and span $\mathbb{R}^m$.
24. Consider the linear second-order homogeneous differential equation $y'' - y = 0$. Show by direct substitution that $y_1 = e^t$ and $y_2 = e^{-t}$ are solutions to the differential equation. In addition, show by substitution that any linear combination $y = c_1 e^t + c_2 e^{-t}$ is also a solution.
25. We have seen that the general solution to the linear second-order differential equation $y'' + y = 0$ is given by
\[ y(t) = c_1 \sin(t) + c_2 \cos(t) \]
Suppose we know initial values for $y(0)$ and $y'(0)$ to be $y(0) = 4$ and $y'(0) = -2$. What are the values of $c_1$ and $c_2$? How is a system of linear equations involved?
26. It can be shown that the solution to the linear second-order differential equation $y'' - y = 0$ is given by
\[ y(t) = c_1 e^t + c_2 e^{-t} \]
Suppose we know initial values for $y(0)$ and $y'(0)$ to be $y(0) = 4$ and $y'(0) = -2$. What are the values of $c_1$ and $c_2$? How is a system of linear equations involved?

1.7 Matrix algebra

For a given system of linear equations, we are now interested in solving the vector equation Ax = b, where A is a known m × n matrix, b ∈ $\mathbb{R}^m$ is given, and we seek x ∈ $\mathbb{R}^n$. It is natural to compare this equation to an elementary linear equation such as 2x = 7. The key algebraic step in solving 2x = 7 is to divide both sides of the equation by 2. Said differently, we multiply both sides by the multiplicative inverse of the number 2. In anticipation of a new approach to solving the vector equation Ax = b, we carefully state the details required to solve 2x = 7. In particular, from the equation 2x = 7, it follows that $\frac{1}{2}(2x) = \frac{1}{2}(7)$, so that $(\frac{1}{2} \cdot 2)x = \frac{7}{2}$. Thus, $1 \cdot x = \frac{7}{2}$, so $x = \frac{7}{2}$. From a sophisticated perspective, to solve the equation 2x = 7, we need to be able to multiply, to have a multiplicative identity (that is, the number 1), and to be able to compute a multiplicative inverse (here, the number $\frac{1}{2}$).

In this section, we lay the foundation for similar ideas that provide an alternate way to solve the equation Ax = b: essentially we are interested in determining whether we can find a matrix B so that when we compute BA the result is the matrix equivalent of “1”. To do this, we will first have to learn what it means to multiply two matrices; a simpler (and still important) place to begin is with the addition of matrices and multiplication of matrices by scalars.

We already know how to add vectors and multiply them by scalars; similar principles hold for matrices. Two matrices can be added (or subtracted) if and only if they have an identical number of rows and columns. When addition (subtraction) is defined, the result is computed component-wise. Furthermore, the multiple of a matrix by a scalar c ∈ $\mathbb{R}$ is obtained by multiplying every entry of the matrix by the same constant c. The following example demonstrates these basic facts.

Example 1.7.1 Let A and B be the matrices
\[ A = \begin{bmatrix} 1 & 3 & -4 \\ 0 & -7 & 2 \end{bmatrix}, \quad B = \begin{bmatrix} -6 & 10 & -1 \\ 3 & 2 & 11 \end{bmatrix} \]
Compute A + B and −3A.

Solution. Since A and B are both 2 × 3, their sum is defined and is given by
\[ A + B = \begin{bmatrix} 1 & 3 & -4 \\ 0 & -7 & 2 \end{bmatrix} + \begin{bmatrix} -6 & 10 & -1 \\ 3 & 2 & 11 \end{bmatrix} = \begin{bmatrix} -5 & 13 & -5 \\ 3 & -5 & 13 \end{bmatrix} \]
The scalar multiple of a matrix is always defined, and −3A is given by
\[ -3A = \begin{bmatrix} -3 & -9 & 12 \\ 0 & 21 & -6 \end{bmatrix} \]

Matrix addition, when defined, has all of the expected properties of addition. In particular, A + B = B + A, so order does not matter, and we say matrix addition is commutative. Since A + (B + C) = (A + B) + C, the way we group more than two matrices to add also does not matter and we say matrix addition is associative. There is even a matrix that acts like the number 0. If Z is a matrix of the same number of rows and columns as A such that every entry in Z is zero, then it follows that A + Z = Z + A = A. We call this zero matrix the additive identity.

The next natural operation to consider, of course, is multiplication. What does it mean to multiply two matrices?
And when does it even make sense to multiply two matrices? We know for matrix–vector multiplication that the product Ax computes the vector b that is the unique linear combination of the columns of A having the entries of the vector x as weights. Moreover, this product is only deﬁned when the number of entries in x matches the number of columns of A. If we now consider a matrix B, we can naturally think about the matrix product AB by considering the columns of B, say b1 , . . . , bk . In particular, we make the following deﬁnition.

Definition 1.7.1 If A is an m × n matrix, and B is a matrix whose columns are $b_1, \ldots, b_k$ such that the matrix–vector product $Ab_j$ is defined for each j = 1, . . . , k, then we define the matrix product AB by
\[ AB = [Ab_1 \ \ Ab_2 \ \ \cdots \ \ Ab_k] \tag{1.7.1} \]

Note particularly that since A has n columns, in order for $Ab_j$ to be defined each $b_j$ must belong to $\mathbb{R}^n$. This in turn implies that the matrix B must have dimensions n × k. Specifically, the number of rows in B must equal the number of columns in A. We explore matrix multiplication and its properties in the next example.

Example 1.7.2 Let A and B be the matrices
\[ A = \begin{bmatrix} 1 & 3 & -4 \\ 0 & -7 & 2 \end{bmatrix}, \quad B = \begin{bmatrix} -6 & 10 \\ 3 & 2 \end{bmatrix} \]
Compute the matrix products AB and BA, or explain why they are not defined.

Solution. First we consider AB. To do so, we would have to compute both $Ab_1$ and $Ab_2$, where $b_1$ and $b_2$ are the columns of B. But neither of these products is defined, since A has three columns and B has just two rows. Thus, AB is not defined. On the other hand, BA is defined. For instance, we can compute the first column of BA by taking $Ba_1$, where we see that
\[ Ba_1 = \begin{bmatrix} -6 & 10 \\ 3 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} -6 \\ 3 \end{bmatrix} \]
Similar computations for $Ba_2$ and $Ba_3$ show that
\[ BA = \begin{bmatrix} -6 & -88 & 44 \\ 3 & -5 & -8 \end{bmatrix} \]

There are several important observations to make based on example 1.7.2. One is that if A is m × n and B is n × k so that the product AB is defined, then the resulting matrix AB is m × k. This is true since the columns of AB are each of the form $Ab_j$, thus being linear combinations of the columns of A, which have m entries, so that AB has m rows. Moreover, we have to consider each of the products $Ab_1, \ldots, Ab_k$, therefore giving AB k columns.

Furthermore, we clearly see that order matters in matrix multiplication. Specifically, given matrices A and B for which AB is defined, it is not even guaranteed that BA is defined, much less that AB = BA. Even when both products are defined, it is possible (even typical) that AB ≠ BA. Formally, we say that matrix multiplication is not commutative. This fact will be explored further in the exercises. It is, however, the case that matrix multiplication (for matrices of the appropriate sizes) is both associative and distributive. That is, A(BC) = (AB)C and A(B + C) = AB + AC, again provided the sizes of the matrices make the relevant products and sums defined.
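The column-by-column definition of the product can be sketched in a few lines of plain Python. This is only an illustration, not part of the text's Maple workflow: the helper names `mat_vec` and `mat_mul` are hypothetical, and matrices are assumed to be stored as lists of rows.

```python
def mat_vec(A, x):
    """Return Ax as the linear combination of the columns of A with weights x."""
    if len(A[0]) != len(x):
        raise ValueError("entries in x must match the number of columns of A")
    result = [0] * len(A)
    for j, weight in enumerate(x):
        for i in range(len(A)):
            result[i] += weight * A[i][j]
    return result


def mat_mul(A, B):
    """Return AB, built one column at a time as [A b1, A b2, ..., A bk]."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must equal rows of B")
    k = len(B[0])
    # columns[j] is the j-th column of AB, namely A times the j-th column of B
    columns = [mat_vec(A, [row[j] for row in B]) for j in range(k)]
    # convert the list of columns back into a list of rows
    return [[columns[j][i] for j in range(k)] for i in range(len(A))]


A = [[1, 3, -4], [0, -7, 2]]
B = [[-6, 10], [3, 2]]

BA = mat_mul(B, A)
print(BA)  # [[-6, -88, 44], [3, -5, -8]]

try:
    mat_mul(A, B)
except ValueError as err:
    print("AB is not defined:", err)
```

Building the product from `mat_vec` mirrors definition 1.7.1 directly; a row-times-column formula would give the same entries.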

Now, we should not forget our motivation for considering matrix multiplication: we want to develop an alternative approach to solving equations of the form Ax = b by multiplying A by another matrix B so that the product BA is the matrix equivalent of the number 1 (while simultaneously multiplying b by the same matrix B). What is the matrix equivalent of the number 1? We consider this question and more in the following example.

Example 1.7.3 Consider the matrices
\[ A = \begin{bmatrix} 5 & 11 \\ -3 & -7 \end{bmatrix} \quad \text{and} \quad I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]
Compute $AI_2$ and $I_2A$. What is special about the matrix $I_2$?

Solution. Using the rules for matrix multiplication, we observe that
\[ AI_2 = \begin{bmatrix} 5 & 11 \\ -3 & -7 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 5 & 11 \\ -3 & -7 \end{bmatrix} = A \]
and similarly
\[ I_2A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 5 & 11 \\ -3 & -7 \end{bmatrix} = \begin{bmatrix} 5 & 11 \\ -3 & -7 \end{bmatrix} = A \]
Thus, we see that multiplying the matrix A by $I_2$ has no effect on the matrix A.

The matrix $I_2$ in example 1.7.3 is important because it has the property that $I_2A = A$ for any matrix A with two rows (not simply the matrix A in example 1.7.3) and $AI_2 = A$ for any A with two columns. We can similarly show that if $I_3$ is the matrix
\[ I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]
then $I_3A = A$ for any matrix A with three rows, and $AI_3 = A$ for any matrix A with three columns. Similar results hold for corresponding matrices $I_n$ of larger size; each of these matrices acts like the number 1, since multiplying other matrices by $I_n$ has no effect on the given matrix.

Matrices that leave other matrices unchanged under multiplication are called identity matrices. More formally, the n × n identity matrix $I_n$ is the square matrix whose diagonal entries all equal 1, and whose off-diagonal entries are all 0. (The diagonal entries in a matrix are those whose row and column indices are the same.) Often, when the context is clear, we will write simply I, rather than $I_n$. We also note that $I_n$ is the only matrix that is n × n and acts as a multiplicative identity. Finally, it is evident that for any m × n matrix A, $I_mA = AI_n = A$. In the next section, we will explore the notion of the inverse of a matrix, and there see that identity matrices play a central role.

One final algebraic operation with matrices merits formal introduction here. Given a matrix A, its transpose, denoted $A^T$, is the matrix whose columns
are the rows of A. That is, taking the transpose of a matrix replaces its rows with its columns, and vice versa. For example, if A is the 2 × 3 matrix
\[ A = \begin{bmatrix} 1 & 3 & -4 \\ 0 & -7 & 2 \end{bmatrix} \]
then its transpose $A^T$ is the 3 × 2 matrix
\[ A^T = \begin{bmatrix} 1 & 0 \\ 3 & -7 \\ -4 & 2 \end{bmatrix} \]
Note that this is the same notation we regularly use to express a column vector in the form $b = [1\ 2\ 3]^T$. In the case that A is a square matrix, taking its transpose results in swapping entries across its diagonal. For example, if
\[ A = \begin{bmatrix} 5 & -2 & 7 \\ 0 & -3 & -1 \\ -4 & 8 & -6 \end{bmatrix} \]
then
\[ A^T = \begin{bmatrix} 5 & 0 & -4 \\ -2 & -3 & 8 \\ 7 & -1 & -6 \end{bmatrix} \]
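As a quick check of the two transposes just computed, here is a minimal Python sketch. The function name `transpose` is a hypothetical helper, and matrices are again assumed to be stored as lists of rows.

```python
def transpose(A):
    """Return the transpose of A: row i of A becomes column i of the result."""
    return [list(column) for column in zip(*A)]


A = [[1, 3, -4], [0, -7, 2]]
print(transpose(A))  # [[1, 0], [3, -7], [-4, 2]]

S = [[5, -2, 7], [0, -3, -1], [-4, 8, -6]]
print(transpose(S))  # [[5, 0, -4], [-2, -3, 8], [7, -1, -6]]

# Transposing twice swaps rows and columns back again.
print(transpose(transpose(S)) == S)  # True
```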

The transpose operator has several nice algebraic properties, some of which will be explored in the exercises. For example, for matrices for which the appropriate sums and products are defined,
\[ (A + B)^T = A^T + B^T \quad \text{and} \quad (AB)^T = B^T A^T \]
For a square matrix such as
\[ A = \begin{bmatrix} 3 & -1 \\ -1 & 2 \end{bmatrix} \]
it happens that $A^T = A$. Any square matrix A for which $A^T = A$ is said to be symmetric. It turns out that symmetric matrices have several especially nice properties in the context of more sophisticated concepts that arise later in the text, and we will revisit them at that time.

1.7.1 Matrix algebra using Maple

While it is important that we first learn to add and multiply matrices by hand to understand how these processes work, just as with row-reduction it is reasonable to expect that we will often use available technology to perform tedious computations like multiplying a 4 × 5 and 5 × 7 matrix. Moreover, in real-world applications, it is not uncommon to have to deal with matrices that have thousands of rows and thousands of columns, or more. Here we introduce a few Maple commands that are useful in performing some of the algebraic manipulations we have studied in this section. Let us consider some of the matrices defined in earlier examples:
\[ A = \begin{bmatrix} 1 & 3 & -4 \\ 0 & -7 & 2 \end{bmatrix}, \quad B = \begin{bmatrix} -6 & 10 \\ 3 & 2 \end{bmatrix}, \quad C = \begin{bmatrix} -6 & 10 & -1 \\ 3 & 2 & 11 \end{bmatrix} \]
After defining each of these three matrices with the usual commands in Maple, such as

> A := <<1,0>|<3,-7>|<-4,2>>;

we can execute the sum of A and C and the scalar multiple −3B with the commands

> A + C;
> -3*B;

for which Maple will report the outputs
\[ \begin{bmatrix} -5 & 13 & -5 \\ 3 & -5 & 13 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 18 & -30 \\ -9 & -6 \end{bmatrix} \]
We have previously seen that to compute a matrix–vector product, the period is used to indicate multiplication, as in > A.x;. The same syntax holds for matrix multiplication, where defined. For example, if we wish to compute the product BA, we enter

> B.A;

which yields the output
\[ \begin{bmatrix} -6 & -88 & 44 \\ 3 & -5 & -8 \end{bmatrix} \]
If we try to have Maple compute an undefined product, such as AB through the command > A.B;, we get the error message

Error, (in LinearAlgebra:-MatrixMatrixMultiply) first matrix column dimension (3) <> second matrix row dimension (2)

In the event that we need to execute computations involving an identity matrix, rather than tediously enter all the 1’s and 0’s, we can use the built-in Maple command IdentityMatrix(n); where n is the number of rows and columns in the matrix. For example, entering > Id := IdentityMatrix(4);

results in the output
\[ Id := \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]
Note: Id is the name we are using to store this identity matrix. We cannot use the letter I because I is reserved to represent $\sqrt{-1}$ in Maple.

Finally, if we desire to compute the transpose of a matrix A, such as
\[ A = \begin{bmatrix} 1 & 3 & -4 \\ 0 & -7 & 2 \end{bmatrix} \]
the relevant command is

> Transpose(A);

which generates the output
\[ A^T = \begin{bmatrix} 1 & 0 \\ 3 & -7 \\ -4 & 2 \end{bmatrix} \]
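For readers working without Maple, the sum, scalar multiple, and identity matrix above can be reproduced with a short plain-Python sketch. The helper names `add`, `scale`, and `identity` are hypothetical and are not Maple or library commands; matrices are assumed to be lists of rows.

```python
def add(A, B):
    """Entry-by-entry sum; defined only when A and B have the same size."""
    if len(A) != len(B) or len(A[0]) != len(B[0]):
        raise ValueError("matrices must have the same number of rows and columns")
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def scale(c, A):
    """Multiply every entry of A by the scalar c."""
    return [[c * a for a in row] for row in A]


def identity(n):
    """n x n matrix with 1 on the diagonal and 0 elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


A = [[1, 3, -4], [0, -7, 2]]
B = [[-6, 10], [3, 2]]
C = [[-6, 10, -1], [3, 2, 11]]

print(add(A, C))     # [[-5, 13, -5], [3, -5, 13]]
print(scale(-3, B))  # [[18, -30], [-9, -6]]
print(identity(4))
```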

Exercises 1.7 1. Let A, B, and C be the given matrices. In each of the following problems, compute (by hand) the prescribed algebraic combination of A, B, and C if the operation is deﬁned. If the operation is not deﬁned, explain why. ⎡ ⎡ ⎤ ⎤ −6 10 5 3 3 −5 2 0⎦ , B = ⎣ 2 11⎦ , C = ⎣−1 A= −1 5 −4 2 −4 −3 −2 (a) B + C (f) BA (k) AT + B (p) (BA)T

(b) A + B (g) AA (l) (B + C)T

(c) −2A (h) A(B + C) (m) BT C

(d) −3B + 4C (i) CA (n) BCT

(e) AB (j) C(A + B) (o) (AB)T

2. Let A, B, and C be the given matrices. In each of the following problems, compute (by hand) the prescribed algebraic combination of A, B, and C whenever the operation is deﬁned. If the operation is not deﬁned, explain why. 2 11 1 0 −5 3 A= , B= , C= 2 4 −3 −2 −5 3 (a) B + C (f) BA (k) AT + B (p) (BA)T

(b) A + B (g) AA (l) (B + C)T

(c) −2A (h) A(B + C) (m) BT C

(d) −3B + 4C (i) CA (n) BCT

(e) AB (j) C(A + B) (o) (AB)T

3. Discuss the differences between multiplying two square matrices versus multiplying non-square matrices. That is, under what circumstances can two square matrices be multiplied? How does the situation change for non-square matrices? In addition, if the product AB is deﬁned, is BA? 4. Give an example of 2 × 2 matrices A and B for which AB = BA. 5. Give an example of 2 × 2 matrices A and B for which AB = BA. 6. If A is m × n and B is n × k, and neither A nor B is square, can AB ever equal BA? Explain. In exercises 7–9, let A be the given matrix. If possible, ﬁnd a matrix B such that BA = I2 ; if B exists, determine whether BA = AB. 2 0 7. A = 0 5 2 4 8. A = 0 5 1 −1 9. A = −1 2 In exercises 10 and 11, for the given matrix A, answer each of the following questions: (a) Are the columns of A linearly independent? (b) Do the columns of A span R2 ? (c) How many pivot positions does A have? (d) Solve the equation Ax = 0 by row reducing by hand. Is A row equivalent to an important matrix? (e) If possible, determine a 2 × 2 matrix B such that BA = I2 . 2 −1 10. A = 2 −3 2 −1 11. A = 2 −4 12. Decide whether each of the following sentences is true or false. In every case, write one sentence to support your answer. (a) If A and B are matrices of the same size, then the products AB and BA are always deﬁned. (b) If A and B are matrices such that the products AB and BA are both deﬁned, then AB = BA. (c) If A and B are matrices such that AB is deﬁned, then (AB)T = AT BT .

(d) If A and B are matrices such that A + B is deﬁned, then (A + B)T = AT + BT . 13. Compute the prescribed algebraic computations in exercise 1 using a computer algebra system. 14. Compute the prescribed algebraic computations in exercise 2 using a computer algebra system.

1.8 The inverse of a matrix

We have observed repeatedly that linear algebra is a subject centered on one idea, systems of linear equations, viewed from several different perspectives. Continuing with this theme, we have recently considered an alternative method for solving the equation Ax = b by attempting to find a matrix B such that BA = I, where I is the appropriate identity matrix. If we can in fact find such a matrix B, it follows that
\[ B(Ax) = Bb \tag{1.8.1} \]
By the associativity of matrix multiplication and the defining property of B, it follows that
\[ B(Ax) = (BA)x = Ix = x \tag{1.8.2} \]
Equations (1.8.1) and (1.8.2) together imply that x = Bb. Thus, the existence of such a matrix B shows us how we can solve Ax = b by multiplication. It turns out that from a computational point of view, row-reduction is a superior approach to solving Ax = b; nonetheless, the perspective that it may be possible to solve the equation through the use of a multiplicative inverse has many important theoretical applications. In addition, similar ideas will be encountered in our study of differential equations.

Our work in section 1.7 showed that if A and B are not square matrices, it is never the case that AB and BA are equal. Thus it is only possible to find a matrix B such that AB = BA = I if A is square (though even then it is not always the case that such a matrix B exists). Moreover, as we know from theorem 1.6.3, some square matrices have the important property that the equation Ax = b has a unique solution for every possible choice of b. For the next few sections, we therefore focus our attention almost exclusively on square matrices. Here, our emphasis is on the questions “when does a matrix B exist such that AB = BA = I?” and “when such a matrix B exists, how can we find it?” The next definition formalizes the notion of the inverse of a matrix.

Definition 1.8.1 If A is an n × n matrix, we say that A is invertible if and only if there exists an n × n matrix B such that
\[ AB = BA = I_n \tag{1.8.3} \]

When A is invertible, we call B the inverse of A and write $B = A^{-1}$ (read “B is A-inverse”). If A is not invertible, A is often called a singular matrix, and thus saying “A is invertible” is equivalent to saying “A is nonsingular.”

It can be shown (see exercise 19) that if A is an invertible n × n matrix, then its inverse is unique (i.e., a given matrix cannot have two distinct inverses). In addition, we note from our discussion above in (1.8.1) and (1.8.2) that if A is invertible, then the equation Ax = b has a solution for every b ∈ $\mathbb{R}^n$. In particular, that solution is $x = A^{-1}b$. Moreover, since Ax = b has a solution for every b ∈ $\mathbb{R}^n$, we know from theorem 1.6.1 that A has a pivot position in every row. From this, the fact that A is square, and theorem 1.6.3, it follows that Ax = b has a unique solution for every b ∈ $\mathbb{R}^n$. We state this result formally in the following theorem.

Theorem 1.8.1 If A is an n × n invertible matrix, then the equation Ax = b has a unique solution for every b ∈ $\mathbb{R}^n$.

Before beginning to explore how to find the inverse of a matrix, as well as when the inverse even exists, we consider an example to see how we may check if two matrices are inverses and how to apply an inverse to solve a related equation.

Example 1.8.1 Let A and B be the matrices
\[ A = \begin{bmatrix} 4 & 5 \\ 1 & 2 \end{bmatrix}, \quad B = \begin{bmatrix} 2/3 & -5/3 \\ -1/3 & 4/3 \end{bmatrix} \]
Show that A and B are inverses, and then use this fact to solve Ax = b, where $b = [-7\ 3]^T$, without using row reduction.

Solution. The reader should verify that the following matrix products indeed hold:
\[ AB = \begin{bmatrix} 4 & 5 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 2/3 & -5/3 \\ -1/3 & 4/3 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]
and
\[ BA = \begin{bmatrix} 2/3 & -5/3 \\ -1/3 & 4/3 \end{bmatrix} \begin{bmatrix} 4 & 5 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]
This shows that indeed $B = A^{-1}$. Note, equivalently, that $A = B^{-1}$. Now, we can easily solve the equation Ax = b where b is the given vector:
\[ x = A^{-1}b = \begin{bmatrix} 2/3 & -5/3 \\ -1/3 & 4/3 \end{bmatrix} \begin{bmatrix} -7 \\ 3 \end{bmatrix} = \begin{bmatrix} -29/3 \\ 19/3 \end{bmatrix} \]

Of course, what is not clear in example 1.8.1 is how, given the matrix A, one might determine the entries in the inverse matrix $B = A^{-1}$. We now explore this in the 3 × 3 case for a general matrix A, and along the way learn conditions that guarantee that $A^{-1}$ exists.
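The verification requested in example 1.8.1 can also be carried out exactly with rational arithmetic. The following is a minimal sketch using Python's standard `fractions` module; the helper `mat_mul` is a hypothetical name, and vectors are stored as one-column matrices.

```python
from fractions import Fraction as F


def mat_mul(A, B):
    """Product AB for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


A = [[F(4), F(5)], [F(1), F(2)]]
B = [[F(2, 3), F(-5, 3)], [F(-1, 3), F(4, 3)]]
I2 = [[F(1), F(0)], [F(0), F(1)]]

# Both products must equal the identity for B to be the inverse of A.
print(mat_mul(A, B) == I2, mat_mul(B, A) == I2)  # True True

b = [[F(-7)], [F(3)]]
x = mat_mul(B, b)          # x = A^{-1} b
print(x)                   # [[Fraction(-29, 3)], [Fraction(19, 3)]]
print(mat_mul(A, x) == b)  # True
```

Exact fractions are used here so that the comparison with the identity matrix is not disturbed by floating-point rounding.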

Given a 3 × 3 matrix A, we seek a matrix B such that $AB = I_3$. Let the columns of B be $b_1$, $b_2$, and $b_3$, and the columns of $I_3$ be $e_1$, $e_2$, and $e_3$. The column-wise definition of matrix multiplication then tells us that the following three vector equations must hold:
\[ Ab_1 = e_1, \quad Ab_2 = e_2, \quad \text{and} \quad Ab_3 = e_3 \tag{1.8.4} \]
For the unique inverse matrix B to exist, it follows that each of these equations must have a unique solution. Clearly if A has a pivot position in every row (or, equivalently, the columns of A span $\mathbb{R}^3$), then by theorem 1.6.3 it follows that we can find unique vectors $b_1$, $b_2$, and $b_3$ that make these three equations hold. Thus, any one of the conditions in theorem 1.6.3 will guarantee that $B = A^{-1}$ exists. Moreover, if $A^{-1}$ exists, we know from theorem 1.8.1 that every condition in theorem 1.6.3 also holds.

Momentarily, let us assume that A is indeed invertible. If we proceed to find the matrix B by solving the three equations in (1.8.4), we see that row-reduction provides an approach for producing all three vectors at once. To find these vectors one at a time, it would be necessary to row-reduce each of the three augmented matrices
\[ [A \ \ e_1], \quad [A \ \ e_2], \quad \text{and} \quad [A \ \ e_3] \tag{1.8.5} \]
In each case, the exact same elementary row operations will be applied to A and thus be applied, respectively, to the vectors $e_1$, $e_2$, and $e_3$. As such, we may do all of them at once by considering the augmented matrix
\[ [A \ \ e_1 \ \ e_2 \ \ e_3] \tag{1.8.6} \]
Note particularly that the form of the augmented matrix in (1.8.6) is $[A \ \ I_3]$. If we now row-reduce this matrix, and A has a pivot in every row, it follows that we will be able to read the coefficients of $A^{-1}$ from the result. This process is best illuminated by an example, so we now explore how these computations lead us to $A^{-1}$ in a concrete situation.

Example 1.8.2

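Find the inverse of the matrix
\[ A = \begin{bmatrix} 2 & 1 & -2 \\ 1 & 1 & -1 \\ -2 & -1 & 3 \end{bmatrix} \]

Solution. Following the discussion above, we augment A with the 3 × 3 identity matrix and row-reduce. It follows that
\[ \left[\begin{array}{rrr|rrr} 2 & 1 & -2 & 1 & 0 & 0 \\ 1 & 1 & -1 & 0 & 1 & 0 \\ -2 & -1 & 3 & 0 & 0 & 1 \end{array}\right] \to \left[\begin{array}{rrr|rrr} 1 & 0 & 0 & 2 & -1 & 1 \\ 0 & 1 & 0 & -1 & 2 & 0 \\ 0 & 0 & 1 & 1 & 0 & 1 \end{array}\right] \]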

These computations demonstrate two important things. The first is that the row reduction of A in the first three columns of the augmented matrix shows that A has a pivot position in every row, and therefore A is invertible. Moreover, the row-reduced form of $[A \ \ I_3]$ tells us that $A^{-1}$ is the matrix
\[ A^{-1} = \begin{bmatrix} 2 & -1 & 1 \\ -1 & 2 & 0 \\ 1 & 0 & 1 \end{bmatrix} \]

Again, we observe from our preceding discussion and example 1.8.2 that we have found an algorithm for finding the inverse of a square matrix A. We augment A with the corresponding identity matrix and row-reduce. Provided that A has a pivot in every row, we find by row-reducing that
\[ [A \ \ I] \to [I \ \ A^{-1}] \]
That is, row-reduction of an invertible matrix A augmented with the identity matrix leads us directly to the inverse, $A^{-1}$. Next, we examine what happens in the event that a square matrix is not invertible.

Example 1.8.3 Find the inverse of the matrix
\[ A = \begin{bmatrix} 2 & 1 \\ -6 & -3 \end{bmatrix} \]
provided the inverse exists. If the inverse does not exist, explain why.

Solution. We augment A with the 2 × 2 identity matrix and row-reduce, finding that
\[ \left[\begin{array}{rr|rr} 2 & 1 & 1 & 0 \\ -6 & -3 & 0 & 1 \end{array}\right] \to \left[\begin{array}{rr|rr} 1 & \frac{1}{2} & 0 & -\frac{1}{6} \\ 0 & 0 & 1 & \frac{1}{3} \end{array}\right] \]
Again, we see at least two key facts from these computations: A does not have a pivot position in every row, and thus A is not invertible. In particular, recall that we are solving two vector equations simultaneously in these computations: $Ab_1 = e_1$ and $Ab_2 = e_2$. If we consider the first of these and observe the row-reduction
\[ \left[\begin{array}{rr|r} 2 & 1 & 1 \\ -6 & -3 & 0 \end{array}\right] \to \left[\begin{array}{rr|r} 1 & \frac{1}{2} & 0 \\ 0 & 0 & 1 \end{array}\right] \]
we see that this system of equations is inconsistent: the last row of the augmented matrix is equivalent to the equation $0b_{11} + 0b_{12} = 1$, where $b_1 = [b_{11}\ b_{12}]^T$. This is yet another way of saying that A does not have an inverse.

The above two examples together show us, in general, how we answer two questions at once: does the square matrix A have an inverse? And if so, what is $A^{-1}$? In a computational sense, we can simply row-reduce A augmented with the appropriate identity matrix and then observe if A has a pivot position in every row. If A is row equivalent to the appropriately sized identity matrix, then A is invertible and $A^{-1}$ will be revealed through the row-reduction.
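The procedure of row-reducing [A I] and reading off the inverse can be sketched in Python as follows. This is an illustrative implementation under stated assumptions rather than the text's own method of computation: the function name `inverse` is hypothetical, exact fractions are used to avoid rounding, and `None` is returned when some column has no pivot.

```python
from fractions import Fraction


def inverse(A):
    """Row-reduce [A | I]; return the inverse as a list of rows, or None if A is singular."""
    n = len(A)
    # Build the augmented matrix [A | I] with exact rational entries.
    M = [[Fraction(A[i][j]) for j in range(n)] +
         [Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for col in range(n):
        # Look for a nonzero entry in this column, at or below the diagonal.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None  # no pivot in this column, so A is not invertible
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so that the pivot entry equals 1.
        p = M[col][col]
        M[col] = [entry / p for entry in M[col]]
        # Clear every other entry in the pivot column.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    # The left block is now I; the right block is the inverse.
    return [row[n:] for row in M]


A1 = [[2, 1, -2], [1, 1, -1], [-2, -1, 3]]   # matrix from example 1.8.2
A2 = [[2, 1], [-6, -3]]                      # matrix from example 1.8.3

print(inverse(A1) == [[2, -1, 1], [-1, 2, 0], [1, 0, 1]])  # True
print(inverse(A2))                                          # None
```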

We close this section with a formal statement of a theorem that summarizes our discussion. Note particularly how this result extends theorem 1.6.3 and demonstrates the theme of linear algebra: one idea from several perspectives. We will refer to this result as The Invertible Matrix Theorem.

Theorem 1.8.2 (The Invertible Matrix Theorem) Let A be an n × n matrix. The following statements are equivalent:
a. A is invertible.
b. The columns of A are linearly independent.
c. The columns of A span $\mathbb{R}^n$.
d. A has a pivot position in every column.
e. A has a pivot position in every row.
f. A is row equivalent to $I_n$.
g. For each b ∈ $\mathbb{R}^n$, the equation Ax = b has a unique solution.

In addition to being of great theoretical significance, inverse matrices find many key applications. We investigate one such use in the following subsection.

1.8.1 Computer graphics

Linear algebra is the engine that drives computer animations. While animated movies originally were constructed by artists hand-drawing thousands of similar sketches that were photographed and played in sequence, today such films are created entirely with computers. Once a figure has been constructed, moving the image around the screen is essentially an exercise in matrix multiplication. Every pixel in an image on a computer screen can be represented through coordinates. For an elementary example, consider an animated figure which, at a given point in time, has its hand located at the point (3, 4). To see how a basic animation can be built, assume further that the figure’s elbow is at the origin (0, 0), and that an animator wishes to make the hand wave back and forth. This enables us to represent the forearm of the figure with the vector $v = [3\ 4]^T$. If we now consider the matrix
\[ R = \begin{bmatrix} \sqrt{3}/2 & -1/2 \\ 1/2 & \sqrt{3}/2 \end{bmatrix} \]
and apply the matrix R to the vector v, we see that the product is
\[ Rv = \begin{bmatrix} \sqrt{3}/2 & -1/2 \\ 1/2 & \sqrt{3}/2 \end{bmatrix} \begin{bmatrix} 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 3\sqrt{3}/2 - 4/2 \\ 3/2 + 4\sqrt{3}/2 \end{bmatrix} \approx \begin{bmatrix} 0.598 \\ 4.964 \end{bmatrix} \]

Figure 1.13 The vectors $v = [3\ 4]^T$ and $Rv = [0.598\ 4.964]^T$.

Thus, the figure’s hand is now located at the point (0.598, 4.964). In fact, the hand has been rotated 30° counterclockwise about the origin, as shown in figure 1.13. The matrix R is known as a rotation matrix; its impact on any vector is to rotate the vector 30° counterclockwise about the origin. One way to see why this is so is to compute the vectors $Re_1$ and $Re_2$, where $e_1$ and $e_2$ are the columns of the 2 × 2 identity matrix. Since each of those two vectors is rotated 30° when multiplied by R, the same thing happens to any vector in $\mathbb{R}^2$, because any such vector may be written as a linear combination of $e_1$ and $e_2$.

Not only do computer animations show one application of matrix–vector multiplication, but they also demonstrate the need for inverse matrices. For instance, suppose we knew that the matrix R had been applied to some unknown vector v and that the result was
\[ Rv = \begin{bmatrix} 2 \\ 5 \end{bmatrix} \]
That is, a hand located at some unknown point v was waved and had been moved to the new point (2, 5). An animator might want to wave the hand back so that it ended up at its original location, which is again represented by the vector v. To do so, he must answer the question “for which vector v is $Rv = [2\ 5]^T$?” We now know that one way to solve for v is to use the inverse of R. The matrix R is clearly invertible because its columns are linearly independent; we can compute $R^{-1}$ in the standard way to find that
\[ R^{-1} = \begin{bmatrix} \sqrt{3}/2 & 1/2 \\ -1/2 & \sqrt{3}/2 \end{bmatrix} \]

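We can solve for v by computing
\[ v = R^{-1}(Rv) = R^{-1} \begin{bmatrix} 2 \\ 5 \end{bmatrix} \]
so that
\[ v = R^{-1} \begin{bmatrix} 2 \\ 5 \end{bmatrix} = \begin{bmatrix} \sqrt{3}/2 & 1/2 \\ -1/2 & \sqrt{3}/2 \end{bmatrix} \begin{bmatrix} 2 \\ 5 \end{bmatrix} \approx \begin{bmatrix} 4.232 \\ 3.330 \end{bmatrix} \]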
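The two rotation computations above can be checked numerically. The sketch below uses only Python's standard `math` module; the helpers `rotation` and `apply` are hypothetical names, and the general form of the rotation matrix in terms of cosine and sine is the one that appears in exercise 16 of this section.

```python
import math


def rotation(theta_degrees):
    """2 x 2 matrix that rotates vectors counterclockwise by the given angle."""
    t = math.radians(theta_degrees)
    return [[math.cos(t), -math.sin(t)],
            [math.sin(t), math.cos(t)]]


def apply(M, v):
    """Matrix-vector product Mv for a 2 x 2 matrix M."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]


R = rotation(30)
R_inv = rotation(-30)  # rotating back by 30 degrees undoes R

print([round(c, 3) for c in apply(R, [3, 4])])      # [0.598, 4.964]
print([round(c, 3) for c in apply(R_inv, [2, 5])])  # [4.232, 3.33]
```

Note that `rotation(-30)` has exactly the entries of the matrix $R^{-1}$ displayed above, which is one way to see why that inverse has the form it does.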

Of course, in actual animations, we would not wave the hand by a single 30° rotation, but rather through a sequence of consecutive small rotations, for instance, 1-degree rotations. Again, computers enable us to do thousands of such computations almost instantly and make amazing animations possible. We consider an additional example to see how matrices can store data and how matrices and their inverses can transform that data.

Example 1.8.4 Consider the matrix
\[ B = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \]
Let $v_1 = [2\ 1]^T$, $v_2 = [3\ 3]^T$, and $v_3 = [4\ 0]^T$ be the vertices of a triangle in the plane. Compute $Bv_1$, $Bv_2$, and $Bv_3$. Sketch a picture of the new triangle that has resulted from applying the matrix B to the vertices (2, 1), (3, 3), and (4, 0). What is the impact of the matrix B on each point? Finally, determine the inverse of B. What do you observe?

Solution. We observe first that
\[ Bv_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad Bv_2 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 3 \\ 3 \end{bmatrix} = \begin{bmatrix} 3 \\ 3 \end{bmatrix}, \quad \text{and} \quad Bv_3 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 4 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 4 \end{bmatrix} \]

From these calculations, we see that multiplying by B moves a given point to a new point that corresponds to the one found by switching the coordinates of the given point. Geometrically, the matrix B accomplishes a reflection across the line y = x in the plane, as we can see in figure 1.14. Moreover, if we think about how we might undo reflection across the line y = x, it is clear that to restore a point to its original location, we need to reflect the point back across the line. Said differently, the inverse of the matrix B must be the matrix itself. We can confirm that $B^{-1} = B$ by computing the product
\[ BB = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = I \]
It is noteworthy that the calculations of $Bv_1$, $Bv_2$, and $Bv_3$ can be simplified into a single matrix product if we let $T = [v_1\ v_2\ v_3]$. That is, the matrix T holds the coordinates of the three points in the given triangle; the product BT is then the image of the triangle under multiplication by the matrix B. A more complicated polygonal figure than a triangle would be stored in a matrix with additional columns.

Figure 1.14 The triangle with vertices $v_1 = [2\ 1]^T$, $v_2 = [3\ 3]^T$, and $v_3 = [4\ 0]^T$ and its image under multiplication by the matrix B.

Of course, the actual work of computer animations is much more complicated than what we have presented here. Nonetheless, matrix multiplication is the platform on which the entire enterprise of animated films is built. In addition to achieving rotations and reflections, matrices can be used to dilate (or magnify) images, to shear images, and even to translate them (provided that we are clever about the coordinate system we use to represent points). Finally, matrices are even essential to the storage of images, as each column of a matrix can be viewed as a data point in an image. More about the application of matrices and their inverses to computer graphics can be learned in one of the projects found at the end of this chapter. In addition, a deeper discussion of the notion of linear transformations (of which reflection and rotation matrices are a part) can be found in appendix D.

1.8.2 Matrix inverses using Maple

Certainly we can use Maple’s row-reduction commands to find inverses of matrices. However, an even simpler command exists that enables us to avoid having to enter the corresponding identity matrix. Let us consider the two matrices from examples 1.8.2 and 1.8.3. Let
\[ A = \begin{bmatrix} 2 & 1 & -2 \\ 1 & 1 & -1 \\ -2 & -1 & 3 \end{bmatrix} \]
If we enter the command

> MatrixInverse(A);

we see the resulting output which is indeed $A^{-1}$,
\[ \begin{bmatrix} 2 & -1 & 1 \\ -1 & 2 & 0 \\ 1 & 0 & 1 \end{bmatrix} \]
For the matrix
\[ A = \begin{bmatrix} 2 & 1 \\ -6 & -3 \end{bmatrix} \]
executing the command > MatrixInverse(A); produces the output

Error, (in LinearAlgebra:-LA_Main:-MatrixInverse) singular matrix

which is Maple’s way of saying “A is not invertible.” Exercises 1.8 In exercises 1–5, ﬁnd the inverse of each matrix (doing the computations by hand), or show that the inverse does not exist. 2 1 1. 2 2 5 0 2. 0 −3 2 −1 3. −4 2 ⎡ ⎤ 1 2 −1 3⎦ 4. ⎣0 1 0 0 2 ⎡ ⎤ 1 −2 −1 1 0⎦ 5. ⎣−1 1 3 4 1 3 2 11 −3 6. Let A = and b1 = , b2 = , b3 = . Find A−1 and 1 4 5 4 −7 use it to solve the equations Ax = b1 , Ax = b2 , and Ax = b3 . In addition, show how you can use row reduction to solve all three of these equations simultaneously. 1 −3 10 2 −1/2 7. Let A = and b1 = , b2 = , b3 = . Solve the 1 1 −2 6 −20 equations Ax = b1 , Ax = b2 , and Ax = b3 . What do you observe about the matrix A? 1 −2 3 8. Let A = and b = . Without doing any computations, explain 1 2 5 why b may be written as a linear combination of the columns of A.

Then execute computations to find the explicit weights by which b is a linear combination of the columns of A.

9. Let E be the elementary matrix given by $E = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$. Note that E is obtained by interchanging rows 2 and 3 of the 3 × 3 identity matrix. Choose a 3 × 3 matrix A, and compute EA. What is the effect on A of multiplication by E?

10. Without doing any row reduction, determine $E^{-1}$ where E is the matrix defined in exercise 9. (Hint: $E^{-1}EI = I$. Think about the impact that E has on I, and then what $E^{-1}$ must accomplish.)

11. Let E be the elementary matrix given by $E = \begin{bmatrix} 1 & 0 & 0 \\ 0 & c & 0 \\ 0 & 0 & 1 \end{bmatrix}$. Note that E is obtained by scaling the second row of the 3 × 3 identity matrix by the constant c. Choose a 3 × 3 matrix A, and compute EA. What is the effect on A of multiplication by E?

12. Without doing any row reduction, determine $E^{-1}$ where E is the matrix defined in exercise 11. What do you observe?

13. Let E be the elementary matrix given by $E = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ a & 0 & 1 \end{bmatrix}$. Note that E is obtained by applying the row operation of taking a times row 1 of the 3 × 3 identity matrix and adding it to row 3 to form a new row 3. Choose a 3 × 3 matrix A, and compute EA. What is the effect on A of multiplication by E?

14. Without doing any row reduction, determine $E^{-1}$ where E is the matrix defined in exercise 13. (Hint: $E^{-1}EI = I$. Think about the impact that E has on I, and then what $E^{-1}$ must accomplish.)

15. Let $A = \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}$. Compute $A^{-1}$. What do you observe about the relationship between A and $A^{-1}$?

16. Let θ be any real number and $A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$. Compute $A^T$ and $A^T A$. What do you observe about the relationship between A and $A^T$?

17. Let A and B be invertible n × n matrices with inverses $A^{-1}$ and $B^{-1}$, respectively. Show that AB is also an invertible matrix by finding $(AB)^{-1}$ in terms of $A^{-1}$ and $B^{-1}$.

18. Let A be an invertible matrix. Explain why $A^{-1}$ is also invertible, and find $(A^{-1})^{-1}$.

19. Show that if A is an invertible n × n matrix, then its inverse is unique. (Hint: suppose that both B and C are inverses of A. What can you say about AB and AC?)

20. For real numbers a and b, the Zero Product Property states that “if $a \cdot b = 0$, then $a = 0$ or $b = 0$.” Said differently, if $a \neq 0$ and $b \neq 0$, then $a \cdot b \neq 0$. Let 0 be the 2 × 2 zero matrix (i.e., all entries are zero). Does the Zero Product Property hold for matrices? That is, can you find two nonzero matrices A and B such that AB = 0? Can you find such matrices where none of the entries in A or B are zero? If so, what kind of matrices are A and B?

21. Does there exist a 2 × 2 matrix A, none of whose entries are zero, such that $A^2 = 0$?

22. Does there exist a 2 × 2 matrix A other than the identity matrix such that $A^2 = I$? What is special about such a matrix?

23. Let D be a diagonal matrix, P an invertible matrix, and $A = PDP^{-1}$. Using the expression $PDP^{-1}$ for A, compute and simplify the matrix $A^2 = A \cdot A$. Do likewise for $A^3 = A \cdot A \cdot A$. What will be the simplified form of $A^n$ in terms of P, D, and $P^{-1}$?

24. Let A be the matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$. Find conditions on a, b, c, and d that guarantee that Ax = 0 has infinitely many solutions. What must therefore be true about a, b, c, and d in order for A to be invertible?

25. Let $A = \begin{bmatrix} 1/2 & \sqrt{3}/2 \\ -\sqrt{3}/2 & 1/2 \end{bmatrix}$ and $v_1$, $v_2$, $v_3$ be the vectors that emanate from the origin to the vertices of the triangle given by (2, 1), (3, 3), and (4, 0). Compute the new triangle that results from applying the matrix A to the given vertices, and sketch a picture of the original triangle and the resulting image. What is the effect of multiplying by A?

26. Suppose that A in exercise 25 was applied to a different set of three unknown vectors $x_1$, $x_2$, and $x_3$. The resulting output from these products is $Ax_1 = \begin{bmatrix} -4 \\ 2 \end{bmatrix}$, $Ax_2 = \begin{bmatrix} 0 \\ 3 \end{bmatrix}$, and $Ax_3 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$. In other words, the new image after multiplying by A is the triangle whose vertices are (−4, 2), (0, 3), and (2, 1). Determine the exact vectors $x_1$, $x_2$, and $x_3$ and sketch the original triangle that was mapped to the triangle with vertices (−4, 2), (0, 3), and (2, 1).

27. Consider the matrix $B = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$. Let $v_1 = [2 \;\; 1]^T$, $v_2 = [3 \;\; 3]^T$, and $v_3 = [4 \;\; 0]^T$. Compute $Bv_1$, $Bv_2$, and $Bv_3$. Sketch a picture of the new triangle that has resulted from applying the matrix B to the vertices (2, 1), (3, 3), and (4, 0). What is the geometric effect of the matrix B on each point?

28. Determine the inverse of B in exercise 27. What do you observe?

29. An unknown 2 × 2 matrix C is applied to the two vectors $v_1 = [1 \;\; 1]^T$ and $v_2 = [2 \;\; 3]^T$, and the results are $Cv_1 = [0.1 \;\; 0.7]^T$ and $Cv_2 = [-0.1 \;\; 1.8]^T$. Determine the entries in the matrix C.

30. Suppose that a computer graphics programmer decides to use the matrix $A = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}$. Why is the programmer's choice a bad one? What will be the result of applying this matrix to any collection of points?

31. Suppose that for a large population that stays relatively constant, people are classified as living in urban, suburban, or rural settings. Moreover, assume that the probabilities of the various possible transitions are given by the following table:

Future location (↓) / current location (→)    U (%)    S (%)    R (%)
Urban                                          92       3        2
Suburban                                       7        96       10
Rural                                          3        1        88

Given that the population of 250 million in a certain year is distributed among 100 million urban, 100 million suburban, and 50 million rural, determine the population distribution in each of the preceding two years.

32. Car-owners can be grouped into classes based on the vehicles they own. A study of owners of sedans, minivans, and sport-utility vehicles shows that the likelihood that an owner of one of these automobiles will replace it with another of the same or different type is given by the table

Future vehicle (↓) / current vehicle (→)    Sedan (%)    Minivan (%)    SUV (%)
Sedan                                        91           3              2
Minivan                                      7            95             8
SUV                                          2            2              90

If there are currently 100 000 sedans, 60 000 minivans, and 80 000 SUVs among the owners being studied, determine the distribution of vehicles among the population before each current owner replaced his or her previous vehicle.

33. Decide whether each of the following sentences is true or false. In every case, write one sentence to support your answer.
(a) If A is a matrix with a pivot in every row, then A is invertible.
(b) If A is an invertible matrix, then its columns are linearly independent.
(c) If Ax = b has a unique solution, then A is an invertible matrix.
(d) If A and B are invertible matrices, then $(AB)^{-1}$ exists and $(AB)^{-1} = A^{-1}B^{-1}$.
(e) If A is a square matrix row equivalent to the identity matrix, then A is invertible.
(f) If A is a square matrix and Ax = b has a solution for a given vector b, then Ax = c has a solution for every choice of c.
(g) If R is a matrix that reflects points across a line through the origin, then $R^{-1} = R$.
(h) If A and B are 2 × 2 matrices with all nonzero entries, then AB cannot equal the 2 × 2 zero matrix.

1.9 The determinant of a matrix

The Invertible Matrix Theorem (theorem 1.8.2) tells us that there are several different ways to determine whether or not a matrix is invertible, and hence whether or not an n × n system of linear equations has a unique solution. There is at least one more useful way to characterize invertibility, and that is through the concept of a determinant. As seen in exercise 24 of section 1.8, it may be shown through row reduction that the general 2 × 2 matrix

\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

is invertible if and only if $ad - bc \neq 0$. We call the quantity $ad - bc$ the determinant of the matrix A, and write $\det(A) = ad - bc$ (some authors use the notation $|A|$ instead of $\det(A)$). Note that this expression provides a condition on the entries of matrix A that determines whether or not A is invertible. We can explore similar ideas for larger matrices. For example, if we take an arbitrary 3 × 3 matrix

\[ A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \]

and row-reduce in order to explore conditions under which the matrix has a pivot position in every row, it turns out to be necessary that the quantity

\[ D = a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31} \]

is nonzero. Grouping and factoring, we see that D may be rewritten in the form

\[ D = a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}) \tag{1.9.1} \]

We again call this quantity D the determinant of the matrix A. In (1.9.1) we see evidence of the fact that determinants of larger matrices can be defined recursively in terms of smaller matrices found within the original matrix A. For example, letting

\[ A_{11} = \begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix} \]

it follows that $\det(A_{11}) = a_{22}a_{33} - a_{23}a_{32}$, which is the expression multiplied by $a_{11}$ in (1.9.1). More generally, if we let $A_{ij}$ be the submatrix defined by deleting row i and column j of the original matrix A, then we see from (1.9.1) that

\[ D = a_{11}\det(A_{11}) - a_{12}\det(A_{12}) + a_{13}\det(A_{13}) \]

The formal definition of the determinant of an n × n matrix is given through a similar recursive process.

Definition 1.9.1 The determinant of an n × n matrix A with entries $a_{ij}$ is defined to be the number given by

\[ \det(A) = a_{11}\det(A_{11}) - a_{12}\det(A_{12}) + \cdots + (-1)^{n+1}a_{1n}\det(A_{1n}) \tag{1.9.2} \]

where $A_{ij}$ is the matrix found by deleting row i and column j of A.

We next consider an example to see some concrete computations.

Example 1.9.1 Compute the determinant of the matrix

\[ A = \begin{bmatrix} 2 & -1 & 1 \\ 1 & 1 & 2 \\ -3 & 0 & -3 \end{bmatrix} \]

In addition, determine if A is invertible.

Solution. By definition,

\[ \begin{aligned} \det\begin{bmatrix} 2 & -1 & 1 \\ 1 & 1 & 2 \\ -3 & 0 & -3 \end{bmatrix} &= 2\det\begin{bmatrix} 1 & 2 \\ 0 & -3 \end{bmatrix} - (-1)\det\begin{bmatrix} 1 & 2 \\ -3 & -3 \end{bmatrix} + 1\det\begin{bmatrix} 1 & 1 \\ -3 & 0 \end{bmatrix} \\ &= 2(-3 - 0) + 1(-3 - (-6)) + 1(0 - (-3)) \\ &= -6 + 3 + 3 \\ &= 0 \end{aligned} \]

Next, to determine whether or not A is invertible, we row-reduce A to see if A has a pivot position in every row. Doing so, we find that

\[ \begin{bmatrix} 2 & -1 & 1 \\ 1 & 1 & 2 \\ -3 & 0 & -3 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix} \]

Thus, we see that A does not have a pivot in every row, and therefore A is not invertible.

Of course, we should note that the primary motivation for the concept of the determinant comes from the question, “is A invertible?” Indeed, the 3 × 3 matrix in the above example fails to be invertible precisely because its determinant is zero. Later in this section, we will formally establish the connection between the value of the determinant and the invertibility of a general n × n matrix.

It is clear at this point that determinants of most n × n matrices with n ≥ 3 require a substantial number of computations. Certain matrices, however, have particularly simple determinants to calculate, as the following example demonstrates.

Example 1.9.2 Compute the determinant of the matrix

\[ A = \begin{bmatrix} 2 & -2 & 7 \\ 0 & -5 & 3 \\ 0 & 0 & 4 \end{bmatrix} \]

In addition, determine if A is invertible.

Solution. Again using the definition, we see that

\[ \begin{aligned} \det(A) &= 2\det\begin{bmatrix} -5 & 3 \\ 0 & 4 \end{bmatrix} - (-2)\det\begin{bmatrix} 0 & 3 \\ 0 & 4 \end{bmatrix} + 7\det\begin{bmatrix} 0 & -5 \\ 0 & 0 \end{bmatrix} \\ &= 2(-5 \cdot 4 - 3 \cdot 0) + 2(0 - 0) + 7(0 - 0) \\ &= 2(-5)(4) \\ &= -40 \end{aligned} \]

Note particularly that the determinant of A is the product of its diagonal entries. Moreover, A clearly has a pivot position in every row, and so by this fact (or equivalently by the nonzero determinant of A) we see that A is invertible.

In general, the determinant of any triangular matrix (one where all entries either below or above the diagonal are zero) is simply the product of its diagonal entries. There are other interesting properties that the determinant has, several of which are explored in the next example for the 2 × 2 case.

Example 1.9.3 Let

\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

be an arbitrary 2 × 2 matrix. Explore the effect of elementary row operations on the determinant of A.

Solution. First, let us consider a row swap, calling $A_1$ the matrix

\[ A_1 = \begin{bmatrix} c & d \\ a & b \end{bmatrix} \]

We observe immediately that $\det(A) = ad - bc$ and $\det(A_1) = cb - ad = -\det(A)$. We next consider scaling; let $A_2$ be the matrix whose first row is $[ka \;\; kb]$, a scaled version of row 1 in A. We see that $\det(A_2) = kad - kbc = k(ad - bc) = k \cdot \det(A)$. Finally, replacing, say, row 2 of A by the sum of itself and k times row 1, we arrive at the matrix

\[ A_3 = \begin{bmatrix} a & b \\ c + ka & d + kb \end{bmatrix} \]

Then $\det(A_3) = a(d + kb) - b(c + ka) = ad + kab - bc - kab = ad - bc = \det(A)$. Thus, we see that for the 2 × 2 case, swapping rows in a matrix changes only the sign of the determinant, scaling a row by a nonzero constant scales the determinant by the same constant, and executing a row replacement does not change the value of the determinant at all. These computations demonstrate the effect that the three elementary row operations from the process of row reduction have on the determinant of a 2 × 2 matrix A.

Given that the general definition of the determinant is recursive, it should not be surprising that the properties witnessed in example 1.9.3 can be shown to hold for n × n matrices. We state this result formally as our next theorem.

Theorem 1.9.1 Let A be an n × n matrix and k a nonzero constant. Then
a. If two rows of A are exchanged to produce matrix B, then $\det(B) = -\det(A)$.
b. If one row of A is multiplied by k to produce B, then $\det(B) = k\det(A)$.
c. If B results from a row replacement in A, then $\det(B) = \det(A)$.

Theorem 1.9.1 enables us to more clearly see the link between invertibility and determinants. Through a finite number of row interchanges and row replacements, any square matrix A may be row-reduced to upper triangular form U (where we have all subdiagonal zeros, but we do not necessarily scale to get 1's on the diagonal). It follows from theorem 1.9.1 that $\det(A) = (-1)^k \det(U)$, where k is the number of row interchanges needed. Note that since U is triangular, its determinant is the product of its diagonal entries, and these entries lie in the pivot locations of A. Thus, A has a pivot in every row if and only if this determinant is nonzero. Specifically, we have shown that A is invertible if and only if $\det(A) \neq 0$.

To conclude this section, we note that linear algebra has once again afforded an alternate perspective on the problem of solving an n × n system of linear equations, and we can now add an additional statement involving determinants to the Invertible Matrix Theorem.

Theorem 1.9.2 (Invertible Matrix Theorem) Let A be an n × n matrix. The following statements are equivalent:
a. A is invertible.
b. The columns of A are linearly independent.
c. The columns of A span $\mathbb{R}^n$.
d. A has a pivot position in every column.
e. A has a pivot position in every row.
f. A is row equivalent to $I_n$.
g. For each $b \in \mathbb{R}^n$, the equation Ax = b has a unique solution.
h. $\det(A) \neq 0$.

1.9.1 Determinants using Maple

For most square matrices of size greater than 3 × 3, the computations necessary to find determinants are tedious and present potential for error. As with other concepts that require large numbers of arithmetic operations, Maple offers a single command that enables us to take advantage of the program's computational powers. Given a square matrix A of any size, we simply enter

> Determinant(A);
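To see the recursion of definition 1.9.1 carried out explicitly, the following is a minimal sketch written in Python rather than Maple (a choice made only for this illustration). The helper names `minor` and `det` are hypothetical and do not belong to any library. The sketch expands along the first row exactly as in (1.9.2), and is not meant to describe how Maple itself computes determinants.

```python
def minor(a, i, j):
    """Return the submatrix A_ij: delete row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Determinant by cofactor expansion along the first row (definition 1.9.1)."""
    n = len(a)
    if n == 1:
        return a[0][0]
    if n == 2:
        return a[0][0] * a[1][1] - a[0][1] * a[1][0]
    # With 0-based column index j, the sign (-1)**j alternates +, -, +, ...
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(n))


example_1_9_1 = [[2, -1, 1], [1, 1, 2], [-3, 0, -3]]
example_1_9_2 = [[2, -2, 7], [0, -5, 3], [0, 0, 4]]

print(det(example_1_9_1))  # 0, so the matrix is not invertible
print(det(example_1_9_2))  # -40, the product of the diagonal entries
```

Because each n × n determinant calls for n determinants of size (n − 1) × (n − 1), the amount of work in this approach grows very quickly with n, which is one reason a computer algebra system is attractive for larger matrices.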

As we explore properties of determinants in the exercises of this section, it will prove useful to be able to generate random matrices. Within the LinearAlgebra package in Maple, one accomplishes this for a 3 × 3 matrix with the command

> RandomMatrix(3);

For example, if we wanted to consider the determinant of a random matrix A we could enter the code

> A := RandomMatrix(3);
> Determinant(A);

See exercise 11 for a particular instance where this code will be useful.
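The discussion following theorem 1.9.1 also suggests a second way to compute a determinant: row-reduce A to an upper triangular matrix U using only row interchanges and row replacements, and then use $\det(A) = (-1)^k \det(U)$. The sketch below illustrates this in Python with exact fractions; the function name `det_by_row_reduction`, the range of the random entries, and the random seed are all assumptions made for illustration.

```python
import random
from fractions import Fraction


def det_by_row_reduction(a):
    """det(A) = (-1)^k * (product of the diagonal of U), k = number of row swaps."""
    n = len(a)
    u = [[Fraction(x) for x in row] for row in a]  # exact arithmetic on a copy
    swaps = 0
    for col in range(n):
        # Look at or below the diagonal for a nonzero entry to use as the pivot.
        pivot = next((r for r in range(col, n) if u[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot in this column, so det(A) = 0
        if pivot != col:
            u[col], u[pivot] = u[pivot], u[col]  # a row swap flips the sign
            swaps += 1
        for r in range(col + 1, n):
            factor = u[r][col] / u[col][col]
            # Row replacement: does not change the determinant.
            u[r] = [x - factor * y for x, y in zip(u[r], u[col])]
    product = Fraction(1)
    for i in range(n):
        product *= u[i][i]
    return (-1) ** swaps * product


random.seed(0)  # hypothetical seed, used only so that a run is repeatable
a = [[random.randint(-9, 9) for _ in range(3)] for _ in range(3)]
print(a)
print(det_by_row_reduction(a))
```

No row is ever scaled in this sketch, so part (b) of theorem 1.9.1 is never needed; only the sign changes from row swaps must be tracked.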

Exercises 1.9

Compute (by hand) the determinant of each of the following matrices in exercises 1–7, and hence state whether or not the matrix is invertible.

1. $A = \begin{bmatrix} 2 & 1 \\ 2 & 2 \end{bmatrix}$

2. $A = \begin{bmatrix} 2 & 4 \\ 1 & 2 \end{bmatrix}$

3. $A = \begin{bmatrix} 2 & 1 & -3 \\ 2 & 2 & 5 \\ 2 & 3 & -1 \end{bmatrix}$

4. $A = \begin{bmatrix} 2 & 1 & 3 \\ 2 & 2 & 4 \\ 2 & 3 & 5 \end{bmatrix}$

5. $A = \begin{bmatrix} -3 & 1 & 0 & 5 \\ 0 & 2 & -4 & 0 \\ 0 & 0 & -7 & 11 \\ 0 & 0 & 0 & 6 \end{bmatrix}$

6. $A = \begin{bmatrix} a & a & d \\ b & b & e \\ c & c & f \end{bmatrix}$

7. $I_n$, where $I_n$ is the n × n identity matrix.

8. For which value(s) of h is the matrix $\begin{bmatrix} 1 & 2 \\ -3 & h \end{bmatrix}$ invertible? Explain your answer in at least two different ways.

9. For which value(s) of z is the matrix $\begin{bmatrix} 2 - z & 1 \\ 1 & 2 - z \end{bmatrix}$ invertible? Why?

10. For which value(s) of z do nontrivial solutions x to the equation $\begin{bmatrix} 2 - z & 1 \\ 1 & 2 - z \end{bmatrix} x = 0$ exist? For one such value of z, determine a nontrivial solution x to the equation.

11. In a computer algebra system, devise code that will generate two random 3 × 3 matrices A and B, and that subsequently computes det(A), det(B), and det(AB). What theorem do you conjecture is true about the relationship between det(AB) and the individual determinants det(A) and det(B)?

12. In a computer algebra system, devise code that will generate a random 3 × 3 matrix A and that subsequently computes its transpose $A^T$, as well as det(A) and $\det(A^T)$. What theorem do you conjecture is true about the relationship between det(A) and $\det(A^T)$?

13. Use the formula conjectured in exercise 11 above to show that if A is invertible, then $\det(A^{-1}) = \frac{1}{\det(A)}$. (Hint: $AA^{-1} = I$.)

14. What can you say about the determinant of any square matrix in which one of the columns (or rows) is zero? Why?

15. What can you say about the determinant of any square matrix where one of the columns (or rows) is repeated in the matrix? Why?

16. Suppose that A is an n × n matrix and that Ax = 0 has infinitely many solutions. What can you say about det(A)? Why?

17. Suppose that $A^2$ is not invertible. Can you determine if A is invertible or not? Explain.

18. Two matrices A and B are said to be similar if there exists an invertible matrix P such that $A = PBP^{-1}$. What can you say about the determinants of similar matrices?

19. Let A be an arbitrary 2 × 2 matrix of the form $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ where $a \neq 0$ and A is assumed to be invertible. Working by hand, row reduce the augmented matrix $[A \;\; I_2]$ and hence determine a formula for $A^{-1}$ in terms of the entries of A. What role does det(A) play in the formula for $A^{-1}$?

20. Decide whether each of the following sentences is true or false. In every case, write one sentence to support your answer.
(a) Swapping the rows in a square matrix A does not change the value of det(A).
(b) If A is a square matrix with a pivot in every column, then det(A) = 0.
(c) The determinant of any diagonal matrix is the product of its diagonal entries.
(d) If A is an n × n matrix and Ax = b has a unique solution for every $b \in \mathbb{R}^n$, then det(A) = 0.

1.10 The eigenvalue problem

Another powerful characteristic of linear algebra is the way the subject often allows us to better understand an infinite collection of objects in terms of the properties of a small, finite number of elements in the set. For example, if we have a set of three linearly independent vectors that spans $\mathbb{R}^3$, then every vector in $\mathbb{R}^3$ may be understood as a unique linear combination of the three special vectors in the linearly independent spanning set. Thus, in some ways it is sufficient to understand these three vectors, and to use that knowledge to better understand the rest of the vectors in $\mathbb{R}^3$. In a similar way, as we will see in this section, for an n × n matrix A there are up to n important vectors (called eigenvectors) that enable us to better understand a variety of properties of the matrix.

The process of matrix multiplication enables us to associate a function with any given matrix A. For example, if A is a 2 × 2 matrix, then we may define a function T by the formula

\[ T(x) = Ax \tag{1.10.1} \]

Note that the domain of the function T is $\mathbb{R}^2$, the set of all vectors with two entries. Moreover, note that every output of the function T is also a vector in $\mathbb{R}^2$. We therefore use the notation $T : \mathbb{R}^2 \to \mathbb{R}^2$. This is analogous to familiar functions like $f(x) = x^2$, where for every real number input we obtain a real number output ($f : \mathbb{R} \to \mathbb{R}$); the difference here is that for the function T, for every vector input we get a vector output. In what follows, we go in search of special input vectors to the function T for which the corresponding output is particularly simple to compute. The next example will highlight the properties of the vector(s) we seek.

Example 1.10.1 Explore the geometric effect of the matrix

\[ A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \]

on the vectors $u = [1 \;\; 0]^T$ and $v = [1 \;\; 1]^T$ from the perspective of the function T(x) = Ax.

Solution. We first compute $T(u) = Au = [2 \;\; 1]^T$. In figure 1.15, we see a plot of the vector u on the left, and T(u) on the right. This shows that the geometric effect of T on u is to rotate u and stretch it. For the vector v, we observe that $T(v) = Av = [3 \;\; 3]^T$. Graphically, as shown in figure 1.16, it is clear that T(v) is simply a stretch of v by a factor of 3. Said slightly differently, we might write that

\[ T(v) = Av = \begin{bmatrix} 3 \\ 3 \end{bmatrix} = 3\begin{bmatrix} 1 \\ 1 \end{bmatrix} = 3v \]

This shows that the result of the function T (and hence the matrix A) being applied to the vector v is particularly simple: v is only stretched by T.

Figure 1.15 The vectors u and T(u) in example 1.10.1.

Figure 1.16 The vectors v and T(v).

For any n × n matrix A, there is an associated function $T : \mathbb{R}^n \to \mathbb{R}^n$ defined by T(x) = Ax. This function takes a given vector in $\mathbb{R}^n$ and maps it to a corresponding vector in $\mathbb{R}^n$; in every case, we may view this output as resulting from the input vector being stretched and/or rotated. Input vectors that are only stretched have corresponding outputs that are simplest to determine: the input vector is simply multiplied by a scalar. To put this another way, for these stretched-only vectors, multiplying them by A is equivalent to multiplying them by a constant. Such vectors prove to be important for a host of reasons, and are called the eigenvectors of a matrix A.

Definition 1.10.1 For a given n × n matrix A, a nonzero vector v is said to be an eigenvector of A if and only if there exists a scalar λ such that

\[ Av = \lambda v \tag{1.10.2} \]

The scalar λ is called the eigenvalue corresponding to the eigenvector v.
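Definition 1.10.1 can be checked directly for any proposed vector: compute Av and ask whether it is a scalar multiple of v. The following is a minimal sketch in Python, assuming integer entries so that exact fractions can be used; the function names `mat_vec` and `eigenvalue_if_eigenvector` are hypothetical and chosen only for this illustration. It is applied to the matrix and the two vectors of example 1.10.1.

```python
from fractions import Fraction


def mat_vec(a, x):
    """Return the matrix-vector product Ax."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def eigenvalue_if_eigenvector(a, v):
    """Return the scalar with Av = (scalar) v if v is an eigenvector of a, else None."""
    if all(c == 0 for c in v):
        raise ValueError("eigenvectors are nonzero by definition")
    av = mat_vec(a, v)
    # Use the first nonzero entry of v to propose a candidate scalar.
    k = next(i for i, c in enumerate(v) if c != 0)
    lam = Fraction(av[k], v[k])
    # v is an eigenvector only if every entry of Av equals lam times that of v.
    return lam if all(p == lam * q for p, q in zip(av, v)) else None


A = [[2, 1], [1, 2]]
print(eigenvalue_if_eigenvector(A, [1, 0]))  # None: u is rotated as well as stretched
print(eigenvalue_if_eigenvector(A, [1, 1]))  # 3: v is only stretched, by a factor of 3
```

A check of this kind confirms a candidate eigenvector but does not find one; locating eigenvalues and eigenvectors systematically is a separate problem.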

In example 1.10.1, we found that the vector $v = [1 \;\; 1]^T$ is an eigenvector of the given matrix A with corresponding eigenvalue 3 since Av = 3v. What is not yet clear is how we even begin to find eigenvectors and eigenvalues. We will soon see that some of the many different perspectives we can take on systems of linear equations will help us solve this problem.

In general, given an n × n matrix A, we seek eigenvectors v that are, by definition, nonzero and satisfy the equation Av = λv. In one sense, what makes this problem challenging is that neither v nor λ is initially known. We thus explore some different perspectives on the problem to see if we can highlight the role of either v or λ. Early in this chapter, we spent significant effort studying homogeneous equations and the circumstances under which they have nontrivial solutions. Here, the eigenvector problem can be rephrased in a similar light. Subtracting λv from both sides of (1.10.2), we equivalently seek λ and v such that

\[ Av - \lambda v = 0 \tag{1.10.3} \]

Viewing λv as (λI)v, we can factor (1.10.3) and write

\[ (A - \lambda I)v = 0 \tag{1.10.4} \]

Now the question becomes, “for which values of λ does (1.10.4) have a nontrivial solution?” At this point, we recall theorem 1.6.2, which tells us that the equation Bx = 0 has only the trivial solution if and only if the matrix B has a pivot in every column. To have a nontrivial solution, we therefore want A − λI to not have a pivot in every column. In (1.10.4), the matrix A − λI is square, so by the Invertible Matrix Theorem such a nontrivial solution exists if and only if A − λI is not invertible. This last observation brings us, finally, to determinants. As we saw in section 1.9, a matrix is invertible if and only if its determinant is nonzero. Therefore, a nontrivial solution to (1.10.4) exists whenever λ is such that det(A − λI) = 0. In the next example, we explore how this equation enables us to find the eigenvalues of a matrix A, and hence the eigenvectors as well.

Example 1.10.2 Find the eigenvalues and eigenvectors of the matrix

\[ A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \]

Solution. As seen in our preceding discussion, by the definition of eigenvalues and eigenvectors, λ is an eigenvalue of A if and only if the equation (A − λI)v = 0 has a nontrivial solution. Note first that A − λI is the matrix A with the scalar λ subtracted from each diagonal entry since

\[ A - \lambda I = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} - \begin{bmatrix} \lambda & 0 \\ 0 & \lambda \end{bmatrix} = \begin{bmatrix} 2 - \lambda & 1 \\ 1 & 2 - \lambda \end{bmatrix} \]

We next compute det(A − λI) so that we can see which values of λ make this determinant zero. In particular, we have

\[ \det(A - \lambda I) = \det\begin{bmatrix} 2 - \lambda & 1 \\ 1 & 2 - \lambda \end{bmatrix} = (2 - \lambda)^2 - 1 = \lambda^2 - 4\lambda + 3 \tag{1.10.5} \]

Thus, in order for det(A − λI) = 0, λ must satisfy the equation $\lambda^2 - 4\lambda + 3 = 0$. Factoring, $(\lambda - 3)(\lambda - 1) = 0$, and therefore λ = 3 and λ = 1 are eigenvalues of A. The value λ = 3 is not surprising, given our earlier discoveries in example 1.10.1.

Next, we proceed to find the eigenvectors that correspond to each eigenvalue. Beginning with λ = 3, we seek nonzero vectors v that satisfy Av = 3v, or equivalently

\[ (A - 3I)v = 0 \]

This problem is a familiar one: solving a homogeneous system of linear equations for which infinitely many solutions exist. Augmenting A − 3I with a column of zeros and row-reducing, we find that

\[ \begin{bmatrix} -1 & 1 & 0 \\ 1 & -1 & 0 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & -1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \]

Note that from the very definition of an eigenvector, by which we seek a nontrivial solution to (A − λI)v = 0, it must be the case at this point that the matrix A − λI does not have a pivot in every row. Interpreting the row-reduced matrix with the free variable $v_2$, we find that the vector $v = [v_1 \;\; v_2]^T$ must satisfy $v_1 - v_2 = 0$. Thus, any vector v of the form

\[ v = \begin{bmatrix} v_2 \\ v_2 \end{bmatrix} = v_2\begin{bmatrix} 1 \\ 1 \end{bmatrix} \]

with $v_2 \neq 0$ is an eigenvector of A that corresponds to the eigenvalue λ = 3. In particular, we observe that any nonzero scalar multiple of the vector $v = [1 \;\; 1]^T$ is an eigenvector of A with associated eigenvalue 3. We say that the set of all eigenvectors associated with eigenvalue 3 is the eigenspace corresponding to λ = 3.

It now only remains to find the eigenvectors associated with λ = 1. We proceed in the same manner as above, now solving the homogeneous equation (A − 1I)v = 0. Row-reducing, we find that

\[ \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \]

and therefore the eigenvector v must satisfy $v_1 + v_2 = 0$ and have the form

\[ v = \begin{bmatrix} -v_2 \\ v_2 \end{bmatrix} = v_2\begin{bmatrix} -1 \\ 1 \end{bmatrix} \]

Here, any nonzero scalar multiple of $v = [-1 \;\; 1]^T$ is an eigenvector of A corresponding to λ = 1.

There are several important general observations to be made from example 1.10.2. One is that any 2 × 2 matrix has 0, 1, or 2 real eigenvalues. This comes from the fact that det(A − λI) is a quadratic function in the variable λ, and therefore can have up to two real zeros. While it is possible to consider complex eigenvalues, we will wait until these arise in our study of systems of differential equations to address them in detail. In addition, we note that there are infinitely many eigenvectors associated with each eigenvalue. Often we will be interested in finding representative eigenvectors: ones of which all other eigenvectors with the same eigenvalue are linear combinations. Finally, it is worthwhile to note that the two representative eigenvectors found in example 1.10.2, corresponding respectively to the two distinct eigenvalues, are linearly independent. More on why this is important will be discussed at the end of this section; for now, we remark that it is possible to show that eigenvectors corresponding to distinct eigenvalues are always linearly independent. This fact will be proved in exercise 16.

The observations in the preceding paragraph generalize to the case of n × n matrices. It may be shown that det(A − λI) is a polynomial of degree n in λ. This function is usually called the characteristic polynomial; the equation det(A − λI) = 0 is typically referred to as the characteristic equation. Because the characteristic polynomial has degree n, it follows that A has up to n real eigenvalues (see appendix C for a review and discussion of important properties of roots of polynomial equations). Next we consider two additional examples that demonstrate some more of the possibilities and important ideas that arise in trying to find the eigenvalues and eigenvectors of a given matrix.

Example 1.10.3 Determine the eigenvalues and eigenvectors of the matrix

\[ R = \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix} \]

In addition, explore the geometric effect of the function T(v) = Rv on vectors in $\mathbb{R}^2$.

Solution. We consider the characteristic equation det(R − λI) = 0 and hence solve

\[ 0 = \det\begin{bmatrix} \frac{1}{\sqrt{2}} - \lambda & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} - \lambda \end{bmatrix} = \left(\frac{1}{\sqrt{2}} - \lambda\right)^2 + \frac{1}{2} = \lambda^2 - \sqrt{2}\,\lambda + 1 \]

By the quadratic formula, it follows that

\[ \lambda = \frac{\sqrt{2} \pm \sqrt{2 - 4}}{2} = \frac{\sqrt{2} \pm i\sqrt{2}}{2} \]

which shows that R does not have any real eigenvalues. If we explore the geometric effect of T(v) = Rv graphically, we can better understand why this is the case. Beginning with the vector $e_1 = [1 \;\; 0]^T$ and computing $Re_1 = [1/\sqrt{2} \;\; 1/\sqrt{2}]^T$, as seen in figure 1.17, we see that the function T(x) = Rx rotates the vector $e_1$ counterclockwise by π/4 radians, and (as computing the length of each vector shows) there is no stretching involved. Similarly, for the vector $e_2 = [0 \;\; 1]^T$, we can see that $Re_2 = [-1/\sqrt{2} \;\; 1/\sqrt{2}]^T$. Just as with the previous vector $e_1$, we see that the function T(v) = Rv simply rotates the vector $e_2$ counterclockwise by π/4 radians. In fact, since every vector in $\mathbb{R}^2$ can be written as a linear combination of $e_1$ and $e_2$, it follows that the image Rv of any vector v is simply the original vector rotated counterclockwise π/4 radians. This shows that no vector in $\mathbb{R}^2$ is simply stretched under multiplication by R, and therefore R has no real eigenvectors.
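For a 2 × 2 matrix with rows [a b] and [c d], expanding $(a - \lambda)(d - \lambda) - bc$ gives the characteristic polynomial $\lambda^2 - (a + d)\lambda + (ad - bc)$, so the computations of examples 1.10.2 and 1.10.3 both reduce to the quadratic formula. The following minimal Python sketch carries this out; the function name `eigenvalues_2x2` is hypothetical, and complex arithmetic is used so that a negative discriminant, as in example 1.10.3, is handled rather than rejected.

```python
import cmath
import math


def eigenvalues_2x2(a):
    """Roots of lambda^2 - (a11 + a22) lambda + (a11 a22 - a12 a21) = 0."""
    (p, q), (r, s) = a
    diagonal_sum = p + s
    determinant = p * s - q * r
    discriminant = diagonal_sum * diagonal_sum - 4 * determinant
    root = cmath.sqrt(discriminant)  # complex square root: works when discriminant < 0
    return (diagonal_sum + root) / 2, (diagonal_sum - root) / 2


A = [[2, 1], [1, 2]]     # example 1.10.2: real eigenvalues 3 and 1
c = 1 / math.sqrt(2)
R = [[c, -c], [c, c]]    # example 1.10.3: no real eigenvalues

print(eigenvalues_2x2(A))
print(eigenvalues_2x2(R))  # a complex-conjugate pair with nonzero imaginary parts
```

Since the entries of R are floating-point approximations of $1/\sqrt{2}$, the values printed for R agree with $(\sqrt{2} \pm i\sqrt{2})/2$ only up to rounding error.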

Figure 1.17 The vectors e1 and T(e1) = Re1.

The eigenvalue problem

91

Matrices such as R in example 1.10.3 with the property that they rotate every vector by a ﬁxed angle (with no stretching factor) are usually called rotation matrices. Other interesting cases arise in the search for eigenvectors when some of the eigenvalues are repeated. That is, when a value λ is a multiple root of the characteristic equation det(A − λI) = 0. We explore this further in the next example. Example 1.10.4 Determine all eigenvalues and eigenvectors of the matrix ⎡ ⎤ 5 6 2 A = ⎣0 −1 −8⎦ 1 0 −2 Solution. As in previous examples, we ﬁrst compute det(A − λI). Doing so and simplifying yields det(A − λI) = −36 + 15λ + 2λ2 − λ3 Factoring, it follows that det(A − λI) = −(λ + 4)(λ − 3)2 Setting the characteristic polynomial equal to zero, it is required that −(λ + 4) (λ − 3)2 = 0. This shows that A has two distinct eigenvalues; moreover, just as with zeros of polynomials, we say that λ = −4 has multiplicity 1, while λ = 3 has multiplicity 2. We now ﬁnd the eigenvectors corresponding to each eigenvalue. For λ = −4, we solve the equation (A + 4I)v = 0, and see by row-reducing that ⎡ ⎤ ⎡ ⎤ 1 0 2 0 9 6 2 0 ⎣0 3 −8 0⎦ → ⎣0 1 − 8 0⎦ 3 1 0 2 0 0 0 0 0 Note that v3 is a free variable, and that the corresponding eigenvector v must have components which satisfy v1 + 2v3 = 0 and v2 − 83 v3 = 0, which shows that v has form ⎡ ⎤ −2 ⎢ ⎥ v = v3 ⎣ 83 ⎦ 1 Likewise, for λ = 3, we consider (A − 3I)v = 0, and row-reduce to ﬁnd that ⎡ ⎤ ⎡ ⎤ 2 6 2 1 0 −5 ⎣0 −4 −8⎦ → ⎣0 1 2⎦ 0 0 0 1 0 −5

92

Essentials of linear algebra

This leads us to see that the corresponding eigenvector has form ⎡ ⎤ 5 v = v3 ⎣ −2 ⎦ 1 Therefore, we see that for this matrix A, the matrix has two distinct eigenvalues (−4 and 3), and each of these eigenvalues has only one associated linearly independent eigenvector. That is, every eigenvector of A associated with λ = −4 is a scalar multiple of [−2 83 1]T while every eigenvector associated with λ = 3 is a scalar multiple of [5 − 2 1]T . In the three preceding examples, we have seen that an n × n matrix has up to n real eigenvalues. It turns out that there are also up to n linearly independent eigenvectors of the matrix. For many reasons, the best possible scenario is when a matrix has n linearly independent eigenvectors, such as the matrix A in example 1.10.2. In that 2 × 2 situation, A had two distinct real eigenvalues, and two corresponding linearly independent eigenvectors. One reason that this is so useful is that the eigenvectors are not only linearly independent, but also span R2 . If we call the two eigenvectors found in example 1.10.2 u and v, corresponding to λ = 3 and μ = 1, respectively, then, since these two vectors are linearly independent in R2 and span R2 , we can write every vector in R2 uniquely as a linear combination of u and v. In particular, given a vector x, there exist coefﬁcients α and β such that x = αu + β v If we are interested in computing Ax, we can do so now solely by knowing how A acts on the eigenvectors. Speciﬁcally, if we apply the linearity of matrix multiplication and the deﬁnition of eigenvectors, we have Ax = A(α u + β v) = α Au + β Av = αλu + βμv

This then reduces matrix multiplication essentially to scalar multiplication.

In conclusion, we have seen in this section that every matrix can be viewed as a function that, through multiplication, stretches and rotates vectors. Those vectors that are only stretched are called eigenvectors, and the factors by which the matrix stretches them are called eigenvalues. By knowing the eigenvalues and eigenvectors, we can better understand how A acts on an arbitrary vector, and, with some more sophisticated approaches, even further understand key properties of the matrix. Some of these properties will be studied in detail later in this text when we consider systems of differential equations.
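The identity Ax = αλu + βμv can be checked on a small case. The following is a minimal sketch in Python rather than a definitive implementation: it uses the matrix and eigenvectors of example 1.10.2, a test vector x chosen here purely for illustration, and helper names (mat_vec, coefficients_2d) that are our own. It also confirms, in exact fraction arithmetic, the two eigenpairs found by hand in example 1.10.4.

```python
from fractions import Fraction


def mat_vec(A, x):
    """Return the product Ax for a matrix A (list of rows) and a vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def coefficients_2d(u, v, x):
    """Solve x = alpha*u + beta*v in R^2 by Cramer's rule."""
    det = u[0] * v[1] - v[0] * u[1]
    if det == 0:
        raise ValueError("u and v must be linearly independent")
    alpha = Fraction(x[0] * v[1] - v[0] * x[1], det)
    beta = Fraction(u[0] * x[1] - u[1] * x[0], det)
    return alpha, beta


# Matrix and eigenpairs from example 1.10.2.
A = [[2, 1], [1, 2]]
lam, u = 3, [1, 1]
mu, v = 1, [-1, 1]

x = [4, -2]  # hypothetical test vector, not taken from the text
alpha, beta = coefficients_2d(u, v, x)

# Ax computed only from the eigenvalues and eigenvectors ...
via_eigen = [alpha * lam * ui + beta * mu * vi for ui, vi in zip(u, v)]
# ... and by ordinary matrix multiplication, for comparison.
direct = mat_vec(A, x)

# Eigenpairs found by hand in example 1.10.4, checked exactly.
B = [[5, 6, 2], [0, -1, -8], [1, 0, -2]]
pairs = [
    (-4, [Fraction(-2), Fraction(8, 3), Fraction(1)]),
    (3, [Fraction(5), Fraction(-2), Fraction(1)]),
]
checks = [mat_vec(B, w) == [val * wi for wi in w] for val, w in pairs]

print(direct, via_eigen, checks)
```

Working with Fraction avoids any rounding in the entry 8/3, so the comparison Bw = λw is an exact equality rather than an approximate one.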


1.10.1 Markov chains, eigenvectors, and Google

In a Markov process such as the one discussed in subsection 1.3.1 that represents the transition of voters from one classification to another, it is natural to wonder whether or not there is a distribution of voters for which the total number in each category will remain constant from one year to the next. For example, for the Markov process represented by x^(n+1) = Mx^(n), where M is the matrix
\[ M = \begin{bmatrix} 0.95 & 0.03 & 0.07 \\ 0.02 & 0.90 & 0.13 \\ 0.03 & 0.07 & 0.80 \end{bmatrix} \tag{1.10.6} \]

we can ask: is there a voter distribution x such that Mx = x? In light of our most recent work with eigenvalues and eigenvectors, we see that this question is equivalent to asking if the matrix M has λ = 1 as an eigenvalue with some corresponding eigenvector that can represent a voter distribution. If we compute the eigenvalues and eigenvectors of M, we find that the eigenvalues are λ = 1.000, 0.911, 0.739. The eigenvector corresponding to λ = 1 is v = [0.770 0.558 0.311]^T. Scaling v so that the sum of its entries is 250, we see that the eigenvector v = [117.450 85.113 47.437]^T represents the distribution of a population of 250 000 people in such a way that the total number of Democrats, Republicans, and Independents does not change from one year to the next, under the hypothesis that voters change categories annually according to the likelihoods expressed in the Markov matrix M. This eigenvector is sometimes also called a stationary vector.

Remarkably, we can also note that in our earlier computations in subsection 1.3.1 for this Markov chain, we observed that the sequence of vectors x^(1), x^(2), . . . , x^(20), . . . was approaching a single vector. In fact, the limiting value of this sequence is the eigenvector v = [117.450 85.113 47.437]^T. That this phenomenon occurs is the result of the so-called Power method, a rudimentary numerical technique for computing an eigenvalue–eigenvector pair of a matrix. More about this concept can be studied in the project on discrete dynamical systems found in section 1.13.3.

Example 1.10.5 Find the stationary vector from the matrix in example 1.3.3.

Solution. Under the assumptions stated in example 1.3.3, we saw that the migration of citizens from urban to suburban areas of a metropolitan area, or vice versa, was modeled by the Markov process x^(n+1) = Mx^(n), where M is the matrix
\[ M = \begin{bmatrix} 0.85 & 0.08 \\ 0.15 & 0.92 \end{bmatrix} \]


Solving the equation x = Mx by writing (M − I)x = 0, we see that we need to find the eigenvector of M that corresponds to λ = 1. Doing so, we find that the eigenvector is
\[ v = \begin{bmatrix} 0.4706 \\ 0.8824 \end{bmatrix} \]
Scaling this vector so that the sum of its entries is one, we see that the population stabilizes when it is distributed with 34.78 percent in the city and 65.22 percent in the suburbs, in accordance with the vector [0.3478 0.6522]^T.

One of the most stunning applications of eigenvalues and eigenvectors can be found on the World Wide Web. In particular, the idea of finding a stationary vector that satisfies Mx = x is at the center of Google’s Page Rank Algorithm that it uses to index the importance of billions of pages on the Internet. What is particularly challenging about this problem is the fact that the stochastic matrix M used by the algorithm is a square matrix that has one column for every page on the World Wide Web that is indexed by Google! In early 2007, this meant that M was a matrix with 25 billion columns. Nonetheless, properties of the matrix M and sophisticated numerical algorithms make it possible for modern computers to quickly find the stationary vector of M and hence provide the user with the results we have all grown accustomed to in using Google.^6

1.10.2 Using Maple to find eigenvalues and eigenvectors
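Before turning to Maple's built-in commands, it may help to see how little machinery the Power method of subsection 1.10.1 needs. The following is a minimal sketch in Python, not the algorithm Maple or Google actually uses: it repeatedly applies a Markov matrix to a starting distribution and rescales so that the entries sum to 1. The function name power_method, the uniform starting vectors, and the step count are assumptions made for this illustration.

```python
def power_method(M, x0, steps=300):
    """Repeatedly apply M to x0, rescaling so the entries always sum to 1."""
    x = list(x0)
    for _ in range(steps):
        x = [sum(m * xi for m, xi in zip(row, x)) for row in M]
        total = sum(x)
        x = [xi / total for xi in x]
    return x


# Voter matrix (1.10.6) and the migration matrix of example 1.10.5.
M_voters = [[0.95, 0.03, 0.07],
            [0.02, 0.90, 0.13],
            [0.03, 0.07, 0.80]]
M_city = [[0.85, 0.08],
          [0.15, 0.92]]

# Hypothetical starting distributions: everyone split evenly.
voters = [250 * p for p in power_method(M_voters, [1 / 3, 1 / 3, 1 / 3])]
city = power_method(M_city, [0.5, 0.5])

print([round(t, 3) for t in voters])
print([round(p, 4) for p in city])
```

The iterates should settle close to the stationary vectors quoted in subsection 1.10.1. Small differences in the later digits of the voter vector are to be expected, since the text scales an eigenvector that was first rounded to three decimal places.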

Due to its reliance upon determinants and the solution of polynomial equations, the eigenvalue problem is computationally difficult for any case larger than 3 × 3. Sophisticated algorithms have been developed to compute eigenvalues and eigenvectors efficiently and accurately. One of these is the so-called QR algorithm, which through an iterative technique produces excellent approximations to eigenvalues and eigenvectors simultaneously. While Maple implements these algorithms and can find both eigenvalues and eigenvectors, it is essential that we not only understand what the program is attempting to compute, but also how to interpret the resulting output. As always, in what follows we are working within the LinearAlgebra package. Given an n × n matrix A, we can compute the eigenvalues of A with the command

> Eigenvalues(A);

6 A detailed description of how the Page Rank Algorithm works and the role that eigenvectors play may be read at http://www.ams.org/featurecolumn/archive/pagerank.html.

Doing so for the matrix
\[ A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \]
from example 1.10.2 yields the Maple output
\[ \begin{bmatrix} 3 \\ 1 \end{bmatrix} \]
Despite the vector format, the program is telling us that the two eigenvalues of the matrix A are 3 and 1. If we desire the eigenvectors, too, we can use the command

> Eigenvectors(A);

which leads to the output
\[ \begin{bmatrix} 3 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} \]

Here, the first vector tells us the eigenvalues of A. The following matrix holds the corresponding eigenvectors in its columns; the vector [1 1]^T is the eigenvector corresponding to λ = 3 and [−1 1]^T corresponds to λ = 1.

Maple is extremely powerful. It is not at all bothered by complex numbers. So, if we enter a matrix like the one in example 1.10.3 that has no real eigenvalues, Maple will find complex eigenvalues and eigenvectors. To see how this appears, we enter the matrix
\[ R = \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix} \]
and execute the command

> Eigenvectors(R);

The resulting output is
\[ \begin{bmatrix} \tfrac{1}{2}\sqrt{2} + \tfrac{1}{2} I \sqrt{2} \\ \tfrac{1}{2}\sqrt{2} - \tfrac{1}{2} I \sqrt{2} \end{bmatrix}, \begin{bmatrix} I & -I \\ 1 & 1 \end{bmatrix} \]

Note that here Maple is using ‘I’ to denote not the identity matrix, but rather √−1. Just as we saw in example 1.10.3, R does not have any real eigenvalues. We can use familiar properties of complex numbers (most importantly, I^2 = −1) to actually check that the equation Rx = λx holds for the listed complex eigenvalues and complex eigenvectors above. However, at this point in our study, these complex eigenvectors are of less importance, so we defer further details on them until later work with systems of differential equations.

One final example is relevant here to see how Maple deals with repeated eigenvalues and missing eigenvectors. If we enter the 3 × 3 matrix A from


example 1.10.4 and execute the Eigenvectors command, we receive the output
\[ \begin{bmatrix} 3 \\ 3 \\ -4 \end{bmatrix}, \begin{bmatrix} 5 & 0 & -2 \\ -2 & 0 & \tfrac{8}{3} \\ 1 & 0 & 1 \end{bmatrix} \]
Here we see that 3 is a repeated eigenvalue of A with multiplicity 2. The first two columns of the matrix in the output contain the (potentially) linearly independent eigenvectors which correspond to this eigenvalue. The second column of all zeros indicates that A has only one linearly independent eigenvector corresponding to this particular eigenvalue. The third column, of course, is the eigenvector associated with the eigenvalue λ = −4. The column of all zeros also demonstrates that R^3 does not have a linearly independent spanning set that consists of eigenvectors of A.

Exercises 1.10

In exercises 1–8, compute (by hand) the eigenvalues and any corresponding real eigenvectors of the given matrix A.

1. \( A = \begin{bmatrix} 5 & 1 \\ 0 & 3 \end{bmatrix} \)
2. \( A = \begin{bmatrix} 3 & -1 \\ -1 & 3 \end{bmatrix} \)
3. \( A = \begin{bmatrix} 3 & 4 \\ -5 & -5 \end{bmatrix} \)
4. \( A = \begin{bmatrix} 1 & 4 \\ 1 & 4 \end{bmatrix} \)
5. \( A = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{bmatrix} \)
6. \( A = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix} \)
7. \( A = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix} \)
8. \( A = \begin{bmatrix} -3 & 2 & 5 \\ 0 & 6 & -2 \\ 0 & 0 & 5 \end{bmatrix} \)


9. A 2 × 2 matrix A has eigenvalues 5 and −1 and corresponding eigenvectors u = [0 1]^T and v = [1 0]^T. Use this information to compute Ax, where x is the vector x = [−5 4]^T.
10. A 2 × 2 matrix A has eigenvalues −3 and −2 and corresponding eigenvectors u = [−1 1]^T and v = [1 1]^T. Use this information to compute Ax, where x is the vector x = [−3 5]^T.
11. Consider the matrix
\[ A = \begin{bmatrix} -2 & 1 & 1 \\ 1 & -2 & 1 \\ 1 & 1 & -2 \end{bmatrix} \]
(a) Determine the eigenvalues and eigenvectors of A.
(b) Does R^3 have a linearly independent spanning set that consists of eigenvectors of A?
12. Consider the matrix
\[ A = \begin{bmatrix} 3 & -1 \\ -1 & 3 \end{bmatrix} \]
(a) Determine the eigenvalues and eigenvectors of A, and show that A has two linearly independent eigenvectors.
(b) Let P be the matrix whose columns are two linearly independent eigenvectors of A. Why is P invertible?
(c) Let D be the diagonal matrix whose diagonal entries are the eigenvalues of A; place the eigenvalues on the diagonal in an order corresponding to the order of the eigenvectors in the columns of P, where P is the matrix defined in (b) above. Compute AP and PD. What do you observe?
(d) Explain why A = PDP^{−1}. Use this factorization to compute A^2, A^3, and A^10 in terms of P, D, and P^{−1}. In particular, explain how A^10 can be easily computed by using the diagonal matrix D along with P and P^{−1}.
13. Consider the matrix
\[ A = \begin{bmatrix} 3 & -1 & 1 \\ -1 & 3 & -1 \\ 1 & -1 & 3 \end{bmatrix} \]
(a) Determine the eigenvalues and eigenvectors of A, and show that A has three linearly independent eigenvectors.
(b) Let P be the matrix whose columns are three linearly independent eigenvectors of A. Why is P invertible?
(c) Let D be the diagonal matrix whose diagonal entries are the eigenvalues of A; place the eigenvalues on the diagonal in an order corresponding to the order of the eigenvectors in the columns of P,


where P is the matrix defined in (b) above. Compute AP and PD. What do you observe?
(d) Explain why A = PDP^{−1}. Use this factorization to compute A^2, A^3, and A^10 in terms of P, D, and P^{−1}.
14. Prove that an n × n matrix A is invertible if and only if A has no eigenvalue equal to zero.
15. Show that if A, B, and P are square matrices (with P invertible) such that B = PAP^{−1}, then A and B have the same eigenvalues. (Hint: consider the characteristic equation for PAP^{−1}.)
16. Prove that if A is a 2 × 2 matrix and v and u are eigenvectors of A corresponding to distinct eigenvalues λ and μ, then v and u are linearly independent. (Hint: suppose to the contrary that v and u are linearly dependent.)
17. For a differentiable function y, denote the derivative of y with respect to x by D(y). Now consider the function y = e^{7x}, and compute D(y). For what value of λ is D(y) = λy? Explain how this value behaves like an eigenvalue of the operator D. What is the corresponding eigenvector? How does the problem change if we consider y = e^{rx} for any other real value of r?
18. For a vector-valued function x(t), let the derivative of x with respect to t be denoted by D(x). For the function
\[ x(t) = \begin{bmatrix} e^{-2t} \\ -3e^{-2t} \end{bmatrix} \]
compute D(x). For what value(s) of λ is D(x) = λx? Explain how it appears from your work that the operator D has an eigenvalue-eigenvector pair.
19. Suppose that for a large population that stays relatively constant, people are classified as living in urban, suburban, or rural settings. Moreover, assume that the probabilities of the various possible transitions are given by the following table:

Future location (↓)/current location (→)    U(%)    S(%)    R(%)
Urban                                         90       3       2
Suburban                                       7      96      10
Rural                                          3       1      88

Given that a population of 250 million is present, is there a stationary vector that reveals a population which does not change from year to year?
20. Car-owners can be grouped into classes based on the vehicles they own. A study of owners of sedans, minivans, and sport utility vehicles shows that the likelihood that an owner of one of these automobiles will replace it with another of the same or different type is given by the table

Future vehicle (↓)/current vehicle (→)    Sedan(%)    Minivan(%)    SUV(%)
Sedan                                        91            3           2
Minivan                                       7           95           8
SUV                                           2            2          90

If there are currently 100 000 vehicles in the population under study, is there a stationary vector that represents a distribution in which the number of owners of each type of vehicle will not change as they replace their vehicles?
21. Decide whether each of the following sentences is true or false. In every case, write one sentence to support your answer.
(a) If x is any vector and λ is a constant such that Ax = λx, then x is an eigenvector of A.
(b) If Ax = 0 has nontrivial solutions, then λ = 0 is an eigenvalue of A.
(c) Every 3 × 3 matrix has three real eigenvalues.
(d) If A is a 2 × 2 matrix, then A can have up to two real linearly independent eigenvectors.

1.11 Generalized vectors

Throughout our work with vectors in R^n, we have regularly used several key algebraic properties they possess. For example, any two vectors u and v can be added to form a new vector u + v, any single vector can be multiplied by a scalar to determine a new vector cu, and there is a zero vector 0 with the property that for any vector v, v + 0 = v. Of course, we use other algebraic properties of vectors as well, often implicitly.

Other sets of mathematical objects behave in ways that are algebraically similar to vectors. The purpose of this section is to expand our perspective on what familiar mathematical entities might also reasonably be called vectors; much of this expanded perspective is in anticipation of our pending work with differential equations and their solutions. We motivate our study with several familiar examples, and then summarize a collection of formal properties that all these examples share.

Example 1.11.1 Let M_{2×2} denote the collection of all 2 × 2 matrices with real entries. Show that if A and B are any 2 × 2 matrices and c ∈ R, then A + B and cA are also 2 × 2 matrices. In addition, show that there exists a “zero matrix” Z such that A + Z = A for every matrix A.


Solution. Let
\[ A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} \]
By the definition of matrix addition,
\[ A + B = \begin{bmatrix} a_{11} + b_{11} & a_{12} + b_{12} \\ a_{21} + b_{21} & a_{22} + b_{22} \end{bmatrix} \]
and thus we see that A + B is also a 2 × 2 matrix. Recall that it only makes sense for matrices of the same size to be added; here we are simply pointing out the obvious fact that the sum of two matrices of the same size is yet another matrix of the same size. In the same way,
\[ cA = \begin{bmatrix} ca_{11} & ca_{12} \\ ca_{21} & ca_{22} \end{bmatrix} \]
which shows that not only is the scalar multiple defined, but also that cA is a 2 × 2 matrix. Finally, if we let Z be the 2 × 2 matrix all of whose entries are zero,
\[ Z = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \]
then our work with matrix sums shows us immediately that A + Z = A for every possible 2 × 2 matrix A.

Certainly, we can see that there is nothing particularly special about the 2 × 2 case in this example; the same properties will hold for M_{m×n} for any positive integer values of m and n. Mathematicians often use the language “M_{2×2} is closed under addition and scalar multiplication” and “M_{2×2} contains a zero element” to describe the observations we made in example 1.11.1. Specifically, to say that a set is closed under an operation means simply that if we perform the operation on an appropriate number of elements from the set, the result is another element in the set. We next consider several more examples of sets that demonstrate the properties of being closed and having a zero element.

Example 1.11.2 Let P_2 denote the set of all polynomials of degree 2 or less. That is, P_2 is the set of all functions of the form
\[ p(x) = a_2 x^2 + a_1 x + a_0 \]
where a_0, a_1, a_2 ∈ R. Show that P_2 is closed under addition and scalar multiplication, and that P_2 contains a zero element.

Solution. Before we formally address the stated tasks, let us remind ourselves how we add polynomial functions. If we are given, say, f(x) = 2x^2 − 5x + 11 and g(x) = 4x − 3, we compute (f + g)(x) = f(x) + g(x) = 2x^2 − 5x + 11 + 4x − 3. We can then add like terms to simplify and find that (f + g)(x) = 2x^2 − x + 8.


Similarly, if we wanted to compute (−3f)(x), we have (−3f)(x) = −3f(x) = −3(2x^2 − 5x + 11) = −6x^2 + 15x − 33.

We now show that P_2 is indeed closed under the operations of addition and scalar multiplication. Given two arbitrary elements of P_2, say f(x) = a_2 x^2 + a_1 x + a_0 and g(x) = b_2 x^2 + b_1 x + b_0, it follows upon adding and combining like terms that
\[ (f + g)(x) = (a_2 + b_2)x^2 + (a_1 + b_1)x + (a_0 + b_0) \]
which is obviously a polynomial of degree 2 or lower, and thus f + g is an element of P_2. In the same way, for any real value c,
\[ (cf)(x) = ca_2 x^2 + ca_1 x + ca_0 \]
which also belongs to P_2. Finally, it is evident that if we let z(x) = 0x^2 + 0x + 0 (i.e., z(x) is the zero function), then (f + z)(x) = f(x) for any choice of f in P_2.

Here, too, we should observe that while these properties hold for P_2, there is nothing special about the 2. In fact, P_n (the set of all polynomials of degree n or less) has the exact same properties. Even P, the set of all polynomials, behaves in the same manner.

Example 1.11.3 From calculus, consider the set C[−1, 1] of all continuous functions on the interval [−1, 1]. That is,
\[ C[-1, 1] = \{ f \mid f \text{ is continuous on } [-1, 1] \}. \]

Show that C[−1, 1] is closed under addition and scalar multiplication, and also that C[−1, 1] contains a zero element.

Solution. Two standard facts from calculus tell us that the sum of any two continuous functions is also a continuous function and that a constant multiple of a continuous function is also a continuous function. Thus C[−1, 1] is closed under addition and scalar multiplication. Furthermore, the zero function z(x) = 0 is itself continuous, which shows that C[−1, 1] indeed has a zero element.

One of the principal reasons that we are shifting our attention from vectors in R^n to this more generalized concept of vector, where the objects under consideration are often functions, is the fact that our focus in subsequent chapters will be solving differential equations. The solution to a differential equation is a function that makes the equation true. Moreover, we will also see that for certain important classes of differential equations, there are multiple solutions to the equation and that often these solution sets are closed under addition and scalar multiplication and also contain the zero function.

From each of the above examples, we see that R^n has many important properties that we can consider in a broader context. We therefore introduce the notion of a vector space, which is a set of objects that have defined operations of addition and scalar multiplication that satisfy the list of ten rules below. The concept of a vector space is a generalization of R^n.


While many of the rules are technical in nature, the most important ones to verify turn out to be the three that we have focused on so far: being closed under addition, closed under scalar multiplication, and having a zero element. All three sets described in the above examples are vector spaces, as is R^n.

Definition 1.11.1 A vector space is a nonempty set V of objects, on which operations of addition and scalar multiplication are defined, where the objects in V (called vectors) adhere to the following ten rules:
1. For every u and v in V, the sum u + v is in V (V is “closed under vector addition”)
2. For every u and v in V, u + v = v + u (“vector addition is commutative”)
3. For every u, v, w in V, (u + v) + w = u + (v + w) (“vector addition is associative”)
4. There exists a zero vector 0 in V such that u + 0 = u for every u ∈ V (0 is called the additive identity of V)
5. For every u ∈ V, there is a vector −u such that u + (−u) = 0 (−u is called the additive inverse of u)
6. For every u ∈ V and every scalar c, the scalar multiple cu ∈ V (V is “closed under scalar multiplication”)
7. For every u and v in V and every scalar c, c(u + v) = cu + cv (“scalar multiplication is distributive over vector addition”)
8. For every u ∈ V and scalars c and d, (c + d)u = cu + du
9. For every u ∈ V and scalars c and d, c(du) = (cd)u
10. For every u ∈ V, 1u = u

Sometimes we can take a sub-collection (i.e., a subset) of the vectors in a vector space, and that smaller set itself acts like a vector space. For example, the set of all polynomial functions is a vector space. If we take just the polynomials of degree 2 or less (as in example 1.11.2 above), that subset is itself a vector space. This leads us to introduce the notion of a subspace.

Definition 1.11.2 Given a vector space V, let H be a subset of V (i.e., every object in H is also in V). There are then operations of addition and scalar multiplication on objects in H: specifically, the same addition and scalar multiplication as on the objects in V. We say H is a subspace of V if and only if all three of the following conditions hold:
1. H is closed under addition
2. H is closed under scalar multiplication
3. H contains the zero element of V
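The closure computations for P_2 in example 1.11.2 can be mirrored in a few lines of code. The following is a minimal sketch in Python under one assumption of our own: a polynomial a_2 x^2 + a_1 x + a_0 is identified with its coefficient triple (a_2, a_1, a_0), so that adding polynomials and scaling them become entrywise operations on triples, exactly as for vectors in R^3. The function names are hypothetical.

```python
def add(p, q):
    """Add two polynomials in P_2 given as coefficient triples (a2, a1, a0)."""
    return tuple(a + b for a, b in zip(p, q))


def scale(c, p):
    """Multiply a polynomial in P_2 by the scalar c."""
    return tuple(c * a for a in p)


def evaluate(p, x):
    """Evaluate a2*x^2 + a1*x + a0 at the point x."""
    a2, a1, a0 = p
    return a2 * x**2 + a1 * x + a0


ZERO = (0, 0, 0)      # the zero polynomial z(x) = 0x^2 + 0x + 0

f = (2, -5, 11)       # f(x) = 2x^2 - 5x + 11, as in example 1.11.2
g = (0, 4, -3)        # g(x) = 4x - 3

f_plus_g = add(f, g)        # again a triple, so f + g stays in P_2
minus_3f = scale(-3, f)     # likewise for the scalar multiple

# The triple for f + g evaluates pointwise to f(x) + g(x).
pointwise_ok = all(
    evaluate(f_plus_g, x) == evaluate(f, x) + evaluate(g, x)
    for x in range(-3, 4)
)

print(f_plus_g, minus_3f, add(f, ZERO) == f, pointwise_ok)
```

Because every result is again a triple of real numbers, the sum and the scalar multiple automatically stay inside P_2, which is the content of the first two subspace conditions; the triple ZERO plays the role of the zero element in the third.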


We close this section with two important examples of subspaces. The first is a subspace of R^n associated with a given matrix A. The second is a subspace of the set of all continuous functions on [−1, 1].

Example 1.11.4 Recall the matrix A from example 1.10.4 in section 1.10,
\[ A = \begin{bmatrix} 5 & 6 & 2 \\ 0 & -1 & -8 \\ 1 & 0 & -2 \end{bmatrix} \]
Show that the set of all eigenvectors that correspond to a given eigenvalue of A forms a subspace of R^3.

Solution. In example 1.10.4, we saw that the eigenvalues of A are λ = −4 (with multiplicity 1) and λ = 3 (with multiplicity 2). In addition, the corresponding eigenvectors are v = [−2 8/3 1]^T for λ = −4 and v = [5 −2 1]^T for λ = 3. In particular, recall that every scalar multiple of v_{λ=−4} is also an eigenvector of A corresponding to λ = −4. We now show that the set of all these eigenvectors corresponding to λ = −4 is a subspace of R^3.

Let E_{λ=−4} denote the set of all vectors v such that Av = −4v. First, certainly it is the case that A0 = 0 = −4 · 0. This shows that the zero element of R^3 is an element of E_{λ=−4}. Furthermore, we have already seen that every scalar multiple of an eigenvector is itself an eigenvector, and thus E_{λ=−4} is closed under scalar multiplication. Finally, suppose we have two vectors x and y such that Ax = −4x and Ay = −4y. Observe that by properties of linearity,
\[ A(x + y) = Ax + Ay = -4x - 4y = -4(x + y) \]
which shows that (x + y) is also an eigenvector of A corresponding to λ = −4. Therefore, E_{λ=−4} is closed under addition. This shows that E_{λ=−4} is indeed a subspace of R^3. In a similar fashion, E_{λ=3} is also a subspace of R^3.

Our observations for the eigenspaces of the 3 × 3 matrix A in example 1.11.4 hold in general for any n × n matrix A: the set of all eigenvectors corresponding to a given eigenvalue of A forms a subspace of R^n.

Example 1.11.5 Show that the set of all linear combinations of the sine and cosine functions is a subspace of the vector space C of all continuous functions.

Solution. We let C denote the vector space of all continuous functions, and now let H be the subset of C which is defined to be all functions that are linear combinations of sin t and cos t. That is, a typical element of H is a function f of the form
\[ f(t) = c_1 \sin t + c_2 \cos t \]


where c_1 and c_2 are any real scalars. We need to show that the set H contains the zero function from C, that H is closed under scalar multiplication, and that H is closed under addition. First, if we choose c_1 = c_2 = 0, the function z(t) = 0 sin t + 0 cos t = 0 is the function that is identically zero, which is the (continuous) zero function from C. Next, if we take a function from H, say f(t) = c_1 sin t + c_2 cos t, and multiply it by a scalar k, we get
\[ kf(t) = k(c_1 \sin t + c_2 \cos t) = (kc_1) \sin t + (kc_2) \cos t \]
which is of course another element in H, so H is closed under scalar multiplication. Finally, if we consider two elements f and g in H, given by f(t) = c_1 sin t + c_2 cos t and g(t) = d_1 sin t + d_2 cos t, then it follows that
\[ f(t) + g(t) = (c_1 \sin t + c_2 \cos t) + (d_1 \sin t + d_2 \cos t) = (c_1 + d_1) \sin t + (c_2 + d_2) \cos t \]
so that H is closed under addition, too. Thus, H is a subspace of C.

In fact, it turns out that the subspace considered in example 1.11.5 contains all of the solutions to a familiar differential equation. We will revisit this issue in example 1.11.7. It is also instructive to consider an example of a set that is not a subspace.

Example 1.11.6 Consider the vector space C[−1, 1] of all continuous functions on the interval [−1, 1]. Let H be the set of all functions with the property that f(−1) = f(1) = 2. Determine whether or not H is a subspace of C[−1, 1].

Solution. The set H does not satisfy any of the three required properties of subspaces, so any one of these suffices to show that H is not a subspace. In particular, the zero function z(t) = 0 does not have the property that z(−1) = 2, and thus the zero function from C[−1, 1] does not lie in H, so H is not a subspace. We could also observe that multiplying a function whose value at t = −1 and t = 1 is 2 by any scalar other than 1 results in a new function whose value at these points is not 2; similarly, the sum of two functions whose values at t = −1 and t = 1 are 2 will lead to a new function whose values at these points are 4. These facts together show that H is not closed under scalar multiplication, nor under addition.

As we have already mentioned, we are considering this generalization of the term vector to include mathematical objects like functions because this structure underlies the study of differential equations, and this vector space perspective will help us to better understand a variety of key ideas when we are solving important problems later on. To foreshadow these coming ideas, we present an example of an elementary differential equation that shows how the set of solutions to the equation is in fact the subspace of continuous functions considered in example 1.11.5.


Example 1.11.7 Consider the differential equation
\[ y'' + y = 0 \]
Show that y_1 = sin t and y_2 = cos t are solutions to this differential equation, and that every function of the form y = c_1 y_1 + c_2 y_2 is a solution as well.

Solution. This example is very similar to example 1.6.4. Because of its importance, we discuss the current problem in full detail here as well. For any equation, a solution is an object that makes the equation true. In the above differential equation, y represents a function. The equation asks “for which functions y is the sum of y and its second derivative equal to zero?”

Observe first that if we let y_1 = sin t, then y_1' = cos t, so y_1'' = −sin t, and therefore y_1'' + y_1 = −sin t + sin t = 0. In other words, y_1 is a solution to the differential equation. Similarly, for y_2 = cos t, y_2' = −sin t and y_2'' = −cos t, so that y_2'' + y_2 = −cos t + cos t = 0. Thus, y_2 is also a solution to the differential equation.

Now, consider any function y of the form y = c_1 y_1 + c_2 y_2. That is, let y be any linear combination of the two solutions we have already found. We then have
\[ y = c_1 \sin t + c_2 \cos t \]
so that, using standard properties of the derivative (properties which are linear in nature), it follows that
\[ y' = c_1 \cos t - c_2 \sin t \quad \text{and} \quad y'' = -c_1 \sin t - c_2 \cos t \]
We therefore see that
\begin{align*}
y'' + y &= (-c_1 \sin t - c_2 \cos t) + (c_1 \sin t + c_2 \cos t) \\
&= -c_1 \sin t + c_1 \sin t - c_2 \cos t + c_2 \cos t \\
&= 0
\end{align*}
so that y is indeed also a solution of y'' + y = 0.

In example 1.11.7, we find a large number of connections to our work in systems of linear equations and linear algebra: properties of linearity, linear combinations of vectors, homogeneous equations, infinitely many solutions, and more. In particular, the set of all solutions to the differential equation in example 1.11.7 is precisely the subspace of continuous functions examined in example 1.11.5. Certainly, we will revisit these topics in greater detail as we progress in our study of differential equations.

Exercises 1.11

In exercises 1–16, determine whether or not the set H is a subspace of the given vector space V. If H is a subspace, show that it satisfies the


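three required properties stipulated by the definition; if not, show at least one example of why at least one of the properties does not hold.

1. \( V = R^2,\ H = \left\{ \begin{bmatrix} x \\ y \end{bmatrix} : x \ge 0,\ y \ge 0 \right\} \)
2. \( V = R^2,\ H = \left\{ \begin{bmatrix} x \\ y \end{bmatrix} : x \cdot y \ge 0 \right\} \)
3. \( V = R^3,\ H = \left\{ t \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix} : t \in R \right\} \)
4. \( V = R^3,\ H = \left\{ t \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} : t \in R \right\} \)
5. V = P_2, H = {at^2 : a ∈ R}
6. V = P_2, H = {at^2 + 1 : a ∈ R}
7. \( V = R^2,\ H = \{ x : Ax = b \} \) where \( A = \begin{bmatrix} 2 & -1 \\ -6 & 3 \end{bmatrix} \) and \( b = \begin{bmatrix} 5 \\ -15 \end{bmatrix} \)
8. \( V = R^2,\ H = \{ x : Ax = b \} \) where \( A = \begin{bmatrix} 2 & -1 \\ -6 & 3 \end{bmatrix} \) and \( b = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \)
9. V = M_{2×2}, H = {A ∈ M_{2×2} : A is invertible}
10. V = M_{2×2}, H = {A ∈ M_{2×2} : A is not invertible}
11. \( V = M_{2\times 2},\ H = \left\{ A \in M_{2\times 2} : A = \begin{bmatrix} a & 0 \\ b & c \end{bmatrix} \right\} \)
12. \( V = M_{2\times 2},\ H = \left\{ A \in M_{2\times 2} : A = \begin{bmatrix} a & 1 \\ b & c \end{bmatrix} \right\} \)
13. V = C[−1, 1], H = {f ∈ C[−1, 1] : f(−1) = 0}
14. V = C[−1, 1], H = {f ∈ C[−1, 1] : f(−1) = 5}
15. V = C[−1, 1], H = {f ∈ C[−1, 1] : f + f = 0}
16. V = C[−1, 1], H = {f ∈ C[−1, 1] : f + f = 1}
17. Recall that for a given eigenvalue λ of a matrix A, the eigenspace associated to that eigenvalue is the set of all eigenvectors that correspond to λ. For the matrix \( A = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \), describe all of the eigenspaces of A.
18. For the matrix \( A = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix} \), describe all of the eigenspaces of A.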


19. Explain why for any set of vectors {u, v} in R^n, Span{u, v} is a subspace of R^n. Similarly, explain why Span{v_1, . . . , v_k} is a subspace of R^n for any set {v_1, . . . , v_k}.
20. Let \( V = R^3 \) and \( H = \left\{ \begin{bmatrix} 2a + b \\ a - b \\ 3a + 5b \end{bmatrix} : a, b \in R \right\} \). Determine vectors u and v so that H can be expressed as the set Span{u, v}, and hence explain why H is a subspace of R^3.
21. Let \( V = R^3 \) and \( H = \left\{ \begin{bmatrix} 2a + b \\ -2 \\ 3a + 5b \end{bmatrix} : a, b \in R \right\} \). Explain why H is not a subspace of R^3.
22. Let A be an m × n matrix. The null space of the matrix A, denoted Nul(A), is the set of all solutions to the equation Ax = 0. Explain why Nul(A) is a subspace of R^n.
23. Let A be an m × n matrix. The column space of the matrix A, denoted Col(A), is the set of all linear combinations of the columns of A. Explain why Col(A) is a subspace of R^m.

In exercises 24–27, use the definitions of the null space Nul(A) and column space Col(A) of a matrix given in exercises 22 and 23.

24. Let \( A = \begin{bmatrix} 2 & 1 & -1 \\ 1 & 3 & 4 \end{bmatrix} \). Is the vector v = [−2 1 1]^T in Nul(A)? Justify your answer clearly. In addition, describe all vectors that belong to Nul(A) as the span of a finite set of vectors.
25. Let \( A = \begin{bmatrix} 1 & -2 \\ 3 & 1 \\ -4 & 0 \end{bmatrix} \). Is the vector v = [−2 1 1]^T in Col(A)? Justify your answer. Is the vector u = [−1 4 −4]^T in Col(A)? In addition, describe all vectors that belong to Col(A) as the span of a finite set of vectors.
26. Given a matrix A and a vector v, is it easier to determine whether v lies in Nul(A) or Col(A)? Why?
27. Given a matrix A and a vector v, is it easier to describe Nul(A) or Col(A) as the span of a finite set of vectors? Why?
28. Consider the differential equation y' = 3y. Explain why any function of the form y = Ce^{3t} is a solution to this equation. Is the set of all these solutions a subspace of the vector space of continuous functions?
29. Consider the differential equation y' = 3y − 3. Explain why any function of the form y = Ce^{3t} + 1 is a solution to this equation. Is the set of all these solutions a subspace of the vector space of continuous functions?


30. Decide whether each of the following sentences is true or false. In every case, write one sentence to support your answer.
(a) If H is a subspace of a vector space V, then H is itself a vector space.
(b) If H is a subset of a vector space V, then H is a subspace of V.
(c) The set of all linear combinations of any two vectors in R^3 is a subspace of R^3.
(d) Every nontrivial subspace of a vector space has infinitely many elements.

1.12 Bases and dimension in vector spaces

In section 1.11, we saw that some common sets we encounter in mathematics are very similar to $\mathbb{R}^n$. For instance, the set $M_{2 \times 2}$ of all $2 \times 2$ matrices, the set $P_2$ of all polynomials of degree 2 or less, and the set $C[-1, 1]$ of all continuous functions on $[-1, 1]$ are sets that contain a zero element, are closed under addition, and are closed under scalar multiplication. In addition, because they each satisfy the other seven required characteristics we noted, these sets are all vector spaces. We specifically observe that this enables us to take linear combinations of elements of a vector space, because addition and scalar multiplication are defined and closed in these collections of objects.

Every vector space has further characteristics that are similar to $\mathbb{R}^n$. For example, it is natural to discuss now-familiar concepts such as linear independence and span in the context of the more generalized notion of vector. As we will see, the definitions of these terms in the setting of vector spaces are almost identical to those we encountered earlier in $\mathbb{R}^n$. Moreover, just as we can frequently describe sets in $\mathbb{R}^n$ in terms of a small number of special vectors, we will find that this often occurs in general vector spaces. We begin by updating two key definitions.

Definition 1.12.1 In a vector space $V$, given a set $S = \{v_1, \ldots, v_k\}$ where each vector $v_i \in V$, the set $S$ is linearly dependent if there exists a nontrivial solution to the vector equation
$$x_1 v_1 + x_2 v_2 + \cdots + x_k v_k = 0 \tag{1.12.1}$$
If (1.12.1) has only the trivial solution ($x_1 = \cdots = x_k = 0$), then we say the set $S$ is linearly independent.

The only difference between this definition and definition 1.6.1 that we encountered in section 1.6 is that $\mathbb{R}^n$ has been replaced by $V$. Just as with vectors in $\mathbb{R}^n$, it is an equivalent formulation to say that a set $S$ in a vector space $V$ is linearly independent if and only if no vector in the set may be written as a linear combination of the other vectors in the set. We can also define the span of a set of vectors in a vector space $V$.

Definition 1.12.2 In a vector space $V$, given a set of vectors $S = \{v_1, \ldots, v_k\}$, $v_i \in V$, the span of $S$, denoted $\operatorname{Span}(S)$ or $\operatorname{Span}\{v_1, \ldots, v_k\}$, is the set of all linear combinations of the vectors $v_1, \ldots, v_k$. Equivalently, $\operatorname{Span}(S)$ is the set of all vectors $y$ of the form $y = c_1 v_1 + \cdots + c_k v_k$, where $c_1, \ldots, c_k$ are scalars. We also say that $\operatorname{Span}(S)$ is the subset of $V$ spanned by the vectors $v_1, \ldots, v_k$.

In example 1.6.3 in section 1.6, we studied three sets $R$, $S$, and $T$ in $\mathbb{R}^3$. $R$ contained two vectors and was linearly independent but did not span $\mathbb{R}^3$; $S$ contained three vectors, was linearly independent, and spanned $\mathbb{R}^3$; and $T$ consisted of four vectors, was linearly dependent, and spanned $\mathbb{R}^3$. In that setting, we came to see that the set $S$ was in some ways the best of the three: it had both key properties of being linearly independent and a spanning set. In other words, the set had enough vectors to span $\mathbb{R}^3$, but not so many vectors as to generate redundancy by being linearly dependent. Through the next definition, we will now call such a set a basis, even in the generalized setting of vector spaces and subspaces.

Definition 1.12.3 Let $V$ be a vector space and $H$ a subspace of $V$. A set $B = \{v_1, v_2, \ldots, v_k\}$ of vectors in $H$ is called a basis of $H$ if and only if $B$ is linearly independent and $\operatorname{Span}(B) = H$. That is, $B$ is a basis of $H$ if and only if it is a linearly independent spanning set.

Several examples now follow that use the terminology of linear independence, span, and basis in the context of different vector spaces.

Example 1.12.1 In the vector space $P$ of all polynomials, consider the subspace $H = P_2$ of all polynomials of degree 2 or less. Show that the set $B = \{1, t, t^2\}$ is a basis for $H$. Is the set $\{1, t, t^2, 4 - 3t\}$ also a basis for $H$?

Solution. To begin, we observe that every element of $H = P_2$ is a polynomial function of the form $p(t) = a_0 + a_1 t + a_2 t^2$. In particular, every element of $P_2$ is a linear combination of the functions $1$, $t$, and $t^2$, and therefore the set $B = \{1, t, t^2\}$ spans $H$. In addition, to determine whether the set $B$ is linearly independent, we consider the equation
$$c_0 + c_1 t + c_2 t^2 = 0 \tag{1.12.2}$$
and ask whether or not this equation has a nontrivial solution. Keeping in mind that the '0' on the right-hand side represents the zero function in $P_2$, the function that is everywhere equal to zero, we can see that if at least one of $c_0$, $c_1$, or $c_2$ is nonzero, we will be guaranteed to have either a nonzero constant function, a linear function, or a quadratic function, thus making $c_0 + c_1 t + c_2 t^2$ not identically zero. This shows that (1.12.2) has only the trivial solution, and therefore the set $B = \{1, t, t^2\}$ is linearly independent. Having shown that $B$ is a linearly independent spanning set for $H = P_2$, we can conclude that $B$ is a basis for $H$.

On the other hand, the set $\{1, t, t^2, 4 - 3t\}$ is not a basis for $H$ since we can observe that the element $4 - 3t$ is a linear combination of the elements $1$ and $t$: $4 - 3t = 4 \cdot 1 - 3 \cdot t$. This shows that the set $\{1, t, t^2, 4 - 3t\}$ is linearly dependent and thus cannot be a basis.

Example 1.12.2 Consider the set $H$ of all functions of the form $y = c_1 \sin t + c_2 \cos t$. In the vector space $C$ of all continuous functions, explain why the set $B = \{\sin t, \cos t\}$ is a basis for the subspace $H$.

Solution. First, we recall that $H$ is indeed a subspace of $C[-1, 1]$ due to our work in example 1.11.5. By the definition of $H$ (the set of all functions of the form $y = c_1 \sin t + c_2 \cos t$), we see immediately that $B$ is a spanning set for $H$. In addition, it is clear that the functions $\sin t$ and $\cos t$ are not scalar multiples of one another: any scalar multiple of $\sin t$ is simply a vertical stretch of the function, which cannot result in $\cos t$. This tells us that the set $B = \{\sin t, \cos t\}$ is also linearly independent, and therefore is a basis for $H$.

Example 1.12.3 In $\mathbb{R}^3$, consider the set $B = \{e_1, e_2, e_3\}$, where $e_1 = [1 \; 0 \; 0]^T$, $e_2 = [0 \; 1 \; 0]^T$, and $e_3 = [0 \; 0 \; 1]^T$. Explain why $B$ is a basis for $\mathbb{R}^3$. Is the set $S = \{v_1, v_2, v_3\}$, where $v_1 = [1 \; 2 \; -1]^T$, $v_2 = [-1 \; 1 \; 3]^T$, and $v_3 = [0 \; 3 \; 1]^T$, also a basis for $\mathbb{R}^3$?

Solution. First, we observe that while the formal definition of a basis refers to the basis of a subspace $H$ of a vector space $V$, since every vector space is a subspace of itself, it follows that we can also discuss a basis for a vector space. Considering the set $B = \{e_1, e_2, e_3\}$, we observe that the vectors in this set are the columns of the $3 \times 3$ identity matrix.
By the Invertible Matrix Theorem, it follows that the set $B$ is linearly independent because $I_3$ has a pivot in every column. Likewise, the set $B$ spans $\mathbb{R}^3$ since $I_3$ has a pivot in every row. As a linearly independent spanning set in $\mathbb{R}^3$, $B$ is indeed a basis.

For the set $S$ whose elements are the columns of the matrix
$$A = \begin{bmatrix} 1 & -1 & 0 \\ 2 & 1 & 3 \\ -1 & 3 & 1 \end{bmatrix}$$
we again use the Invertible Matrix Theorem to determine whether or not $S$ is a basis for $\mathbb{R}^3$. Row-reducing $A$, it is straightforward to see that $A$ is row equivalent to the identity matrix, and therefore is invertible. In particular, $A$ has a pivot in every column and every row, and thus the columns of $A$ are linearly independent and span $\mathbb{R}^3$. It follows that $S$ is also a basis for $\mathbb{R}^3$.

The basis $B = \{e_1, e_2, e_3\}$ consisting of the columns of the $3 \times 3$ identity matrix is often referred to as the "standard basis of $\mathbb{R}^3$." In addition, by our work in example 1.12.3, we can see the role that the Invertible Matrix Theorem plays in determining whether a set of vectors in $\mathbb{R}^n$ is a basis or not. Specifically, since we know that it is logically equivalent for the columns of a square matrix $A$ to be linearly independent and to be a spanning set for $\mathbb{R}^n$, it follows that a matrix $A$ is invertible if and only if its columns form a basis for $\mathbb{R}^n$. We therefore update the Invertible Matrix Theorem with an additional statement as follows.

Theorem 1.12.1 (Invertible Matrix Theorem) Let $A$ be an $n \times n$ matrix. The following statements are equivalent:
a. $A$ is invertible.
b. The columns of $A$ are linearly independent.
c. The columns of $A$ span $\mathbb{R}^n$.
d. $A$ has a pivot position in every column.
e. $A$ has a pivot position in every row.
f. $A$ is row equivalent to $I_n$.
g. For each $b \in \mathbb{R}^n$, the equation $Ax = b$ has a unique solution.
h. $\det(A) \neq 0$.
i. The columns of $A$ form a basis for $\mathbb{R}^n$.

Our next example demonstrates how certain families of vectors naturally form subspaces of $\mathbb{R}^n$ and how vector arithmetic can be used to determine a basis for the subspace they form.

Example 1.12.4 Consider the set $W$ of all vectors of the form
$$\begin{bmatrix} 3a + b - c \\ 4a - 5b + c \\ a + 2b - 3c \\ a - b \end{bmatrix}$$
Show that $W$ is a subspace of $\mathbb{R}^4$ and determine a basis for this subspace.

Solution.

First, we observe that a typical element $v$ of $W$ is a vector of the form
$$v = \begin{bmatrix} 3a + b - c \\ 4a - 5b + c \\ a + 2b - 3c \\ a - b \end{bmatrix}$$

Using properties of vector addition and scalar multiplication, we can write
$$v = a \begin{bmatrix} 3 \\ 4 \\ 1 \\ 1 \end{bmatrix} + b \begin{bmatrix} 1 \\ -5 \\ 2 \\ -1 \end{bmatrix} + c \begin{bmatrix} -1 \\ 1 \\ -3 \\ 0 \end{bmatrix}$$
From this, we observe that $W$ may be viewed as the span of the set $S = \{w_1, w_2, w_3\}$, where
$$w_1 = \begin{bmatrix} 3 \\ 4 \\ 1 \\ 1 \end{bmatrix}, \quad w_2 = \begin{bmatrix} 1 \\ -5 \\ 2 \\ -1 \end{bmatrix}, \quad w_3 = \begin{bmatrix} -1 \\ 1 \\ -3 \\ 0 \end{bmatrix}$$
As seen in exercise 19 in section 1.11, the span of any set of vectors in $\mathbb{R}^n$ generates a subspace of $\mathbb{R}^n$; it follows that $W$ is a subspace of $\mathbb{R}^4$. Moreover, we can observe that $S = \{w_1, w_2, w_3\}$ is a linearly independent set since
$$\begin{bmatrix} 3 & 1 & -1 \\ 4 & -5 & 1 \\ 1 & 2 & -3 \\ 1 & -1 & 0 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$$
so that the matrix whose columns are $w_1$, $w_2$, $w_3$ has a pivot in every column. Since $S$ both spans the subspace $W$ and is linearly independent, it follows that $S$ is a basis for $W$.

In example 1.12.4 we used the fact that the span of any set in $\mathbb{R}^n$ is a subspace of $\mathbb{R}^n$. This result extends to general vector spaces and is stated formally in the following theorem.

Theorem 1.12.2 In any vector space $V$, the span of any set of vectors forms a subspace of $V$.

It is not hard to prove this result. Since the span of a set contains all linear combinations of the set, it must contain the zero combination and be closed under both vector addition and scalar multiplication.

One of the reasons that a basis for a subspace is important is that a basis tells us the minimum number of vectors needed to fully describe every element of the subspace. More specifically, given a basis $B$ for a subspace $W$, we know that we can write every element of $W$ uniquely as a linear combination of the elements in the basis. Note that a subspace does not have a unique basis; for example, in example 1.12.3, we saw two different bases for $\mathbb{R}^3$. Furthermore, in $\mathbb{R}^3$ we have seen that the standard basis (and one example of another basis) has three elements. By the Invertible Matrix Theorem, it is clear that every basis of $\mathbb{R}^3$ consists of three vectors since we are required to have a set that is both linearly independent and spans $\mathbb{R}^3$. Likewise, any basis of $\mathbb{R}^n$ will have $n$ elements. It can be shown that even in vector spaces other than $\mathbb{R}^n$, any two bases of a subspace are guaranteed to have the same number of elements. Therefore, this number of elements in a basis can be used to identify a fundamental property of any subspace: the minimum number of elements needed to describe all of the elements in the space. We call this number the dimension of the subspace.

Definition 1.12.4 Given a subspace $W$ in a vector space $V$ and a basis $B$ for $W$, the number of elements in $B$ is the dimension of $W$. Equivalently, if $B$ has $k$ elements, we write $\dim(W) = k$.

Thus we naturally use the language that "$\mathbb{R}^3$ is three-dimensional" and similarly that "$\mathbb{R}^n$ has dimension $n$." Similarly, we can say $\dim(P_2) = 3$ (see example 1.12.1), and that the dimension of the vector space of all linear combinations of the functions $\sin t$ and $\cos t$ is two (see example 1.12.2).

In closing, it is worth recalling example 1.6.3 in section 1.6, where we considered three sets $R$, $S$, and $T$ in $\mathbb{R}^3$. $R$ contained two vectors and was linearly independent but did not span $\mathbb{R}^3$; $S$ contained three vectors, was linearly independent, and spanned $\mathbb{R}^3$; and $T$ consisted of four vectors, was linearly dependent, and spanned $\mathbb{R}^3$. Since the set $S$ has both key properties of being linearly independent and a spanning set, we can say that the set $S$ is a basis for $\mathbb{R}^3$, which further reflects the fact that $\dim(\mathbb{R}^3) = 3$.

Exercises 1.12

In the vector space $V$ given in each of exercises 1–7, determine a basis for the subspace $H$ and hence state the dimension of $H$.
1. $V = \mathbb{R}^3$, $H = \left\{ t \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix} : t \in \mathbb{R} \right\}$
2. $V = P_2$, $H = \{ a t^2 : a \in \mathbb{R} \}$
3. $V = \mathbb{R}^4$, $H = \left\{ \begin{bmatrix} 2a + 3b \\ a - 4b \\ -3a + 2b \\ a - b \end{bmatrix} : a, b \in \mathbb{R} \right\}$
4. $V = P$ (the vector space of all polynomials), $H = P_n$ (the subspace of all polynomials of degree $n$ or less)
5. $V = \mathbb{R}^2$, $H = \{ x : Ax = 0 \}$ where $A = \begin{bmatrix} 2 & -1 \\ -6 & 3 \end{bmatrix}$
6. $V = \mathbb{R}^4$, $H = \{ x : Ax = 0 \}$ where $A = \begin{bmatrix} 1 & -3 & 2 & -1 \\ -2 & 5 & 0 & 4 \end{bmatrix}$
7. $V = M_{2 \times 2}$, $H = \left\{ A \in M_{2 \times 2} : A = \begin{bmatrix} a & 0 \\ b & c \end{bmatrix} \right\}$

8. Determine whether or not the following set $S$ is a basis for $\mathbb{R}^3$. If not, is some subset of $S$ a basis for $\mathbb{R}^3$? Explain.
$$S = \left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix} \right\}$$
9. Is the set $S = \{[1 \; 2]^T, [2 \; 1]^T\}$ a basis for $\mathbb{R}^2$? Justify your answer.
10. Is the set $S = \{[1 \; 2]^T, [-4 \; -8]^T\}$ a basis for $\mathbb{R}^2$? Justify your answer.
11. Is the set $S = \{[1 \; 2 \; 1 \; 1]^T, [2 \; 1 \; 1 \; -1]^T, [-1 \; 1 \; 3 \; 1]^T, [2 \; 4 \; 5 \; 1]^T\}$ a basis for $\mathbb{R}^4$? Justify your answer.
12. Is the set $S = \{[1 \; 2 \; 1 \; 1]^T, [2 \; 1 \; 1 \; -1]^T, [-1 \; 1 \; 3 \; 1]^T, [2 \; 4 \; 5 \; 0]^T\}$ a basis for $\mathbb{R}^4$? Justify your answer.
13. Can a set with three vectors be a basis for $\mathbb{R}^4$? Why or why not?
14. Can a set with seven vectors be a basis for $\mathbb{R}^6$? Why or why not?
15. Not every vector space has a basis with finitely many elements. If there is not a finite basis, then we say that the vector space is infinite dimensional. Explain why the vector space $P$ of all polynomial functions is an infinite dimensional vector space.
16. Let $V$ be the vector space $V = C[-1, 1]$ and $H$ the subset defined by $H = \{ f \in C[-1, 1] : f \text{ is differentiable} \}$. Explain why $H$ is an infinite dimensional subspace of $V$ and why we cannot explicitly write down the elements in a basis for $H$.
17. Recall from exercises 22 and 23 in section 1.11 that the null space of a matrix $A$ is the subspace of all solutions to the equation $Ax = 0$ and that the column space of $A$ is the space spanned by the columns of $A$. By exploring several different examples of matrices $A$ of your choice, discuss how the dimensions of the null and column spaces are related to the number of pivot columns in the matrix. In particular, explain what you can say about the relationship between the sum of the dimensions of the null and column spaces and the number of columns in the matrix $A$.
18. Decide whether each of the following sentences is true or false. In every case, write one sentence to support your answer.
(a) Any set of five vectors is a basis for $\mathbb{R}^5$.
(b) If $S$ is a linearly independent set of six vectors in $\mathbb{R}^6$, then $S$ is a basis for $\mathbb{R}^6$.
(c) If the determinant of a $3 \times 3$ matrix $A$ is zero, then the columns of $A$ form a basis for $\mathbb{R}^3$.
(d) If $A$ is an $n \times n$ matrix whose columns span $\mathbb{R}^n$, then the columns of $A$ form a basis for $\mathbb{R}^n$.
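The pivot-counting test that this section applies through the Invertible Matrix Theorem can also be carried out by machine. What follows is only a minimal sketch, written in Python rather than the Maple used elsewhere in this text. It reduces a matrix to row echelon form with exact fractions and counts the pivots. The function names pivot_count and is_basis_of_Rn are our own hypothetical choices, and the dependent set D is an illustrative example that does not appear in the text; the set S is the one from example 1.12.3.

```python
from fractions import Fraction

def pivot_count(columns):
    """Count the pivots of the matrix whose columns are the given vectors."""
    n_rows = len(columns[0])
    # Store the matrix as a list of rows with exact rational entries.
    rows = [[Fraction(col[i]) for col in columns] for i in range(n_rows)]
    pivots = 0
    for c in range(len(columns)):
        # Look for a nonzero entry in column c at or below the next pivot row.
        pr = next((r for r in range(pivots, n_rows) if rows[r][c] != 0), None)
        if pr is None:
            continue  # no pivot in this column
        rows[pivots], rows[pr] = rows[pr], rows[pivots]
        # Eliminate the entries below the pivot (forward elimination only).
        for r in range(pivots + 1, n_rows):
            factor = rows[r][c] / rows[pivots][c]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[pivots])]
        pivots += 1
    return pivots

def is_basis_of_Rn(columns):
    """A set of n vectors in R^n is a basis exactly when there are n pivots."""
    n = len(columns[0])
    return len(columns) == n and pivot_count(columns) == n

# The set S from example 1.12.3.
S = [[1, 2, -1], [-1, 1, 3], [0, 3, 1]]
# An illustrative dependent set (not from the text): third vector = first + second.
D = [[1, 2, -1], [-1, 1, 3], [0, 3, 2]]

print(is_basis_of_Rn(S), pivot_count(S))
print(is_basis_of_Rn(D), pivot_count(D))
```

Exact fractions are used here so that a pivot is never lost or invented through rounding; a floating-point version would need a tolerance when testing whether an entry is zero.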

1.13 For further study

1.13.1 Computer graphics: geometry and linear algebra at work

In modern computer graphics, images consisting of sets of pixels are moved around the screen through mathematical computations that rely on linear algebra. If we focus on two-dimensional objects, there are several basic moves that we must be able to perform: translation, rotation, reflection, and dilation. In what follows, we explore the role that linear algebra plays in the geometry of linear transformations and computer graphics.

(a) In section 1.8.1 we began to develop an understanding of how matrix multiplication can be used to move a two-dimensional image around the plane. If you have not already read this section, do so now. If we take the perspective that a given point in the plane is stored in the vector $v$, then for any $2 \times 2$ matrix $A$, the matrix $A$ moves the vector via multiplication to the new location $Av$. If we have a finite set of points (which together constitute an image), we can store the points in a matrix $M$ whose columns represent the individual points, and the new image which results from multiplication by $A$ is given by $AM$. Consider the triangle with vertices $(0, 0)$, $(3, 1)$, and $(2, 2)$, stored in the matrix
$$M = \begin{bmatrix} 0 & 3 & 2 \\ 0 & 1 & 2 \end{bmatrix}$$
Choose three different matrices $A$ and compute $AM$. Then explain why it is impossible to use multiplication by a $2 \times 2$ matrix to translate the triangle so that all three of its vertices appear in new locations.

(b) Due to our discovery in (a) that a simple translation is impossible using $2 \times 2$ matrices, we introduce the notion of homogeneous coordinates; instead of representing points in the two-dimensional plane as $[x \; y]^T$, we move to a plane in three-dimensional space where the third coordinate is always 1. That is, instead of $[x \; y]^T$ we use $[x \; y \; 1]^T$. Consider the matrix $A$ given by
$$A = \begin{bmatrix} 1 & 0 & a \\ 0 & 1 & b \\ 0 & 0 & 1 \end{bmatrix} \tag{1.13.1}$$
and the triangle from (a) which can be represented in homogeneous coordinates by the matrix
$$M = \begin{bmatrix} 0 & 3 & 2 \\ 0 & 1 & 2 \\ 1 & 1 & 1 \end{bmatrix}$$

Compute $AM$. What has happened to each vertex of the triangle represented by $M$? Explain in terms of the parameters $a$ and $b$ in $A$.

(c) Using $a = 2$ and $b = -1$ in (1.13.1) along with the triangle $M$ from above, compute $AM$ in order to determine the translation of the triangle 2 units in the $x$-direction and $-1$ units in the $y$-direction. Sketch both the original triangle and its image under this translation.

(d) In order to view some more sophisticated graphics, we use Maple in our computations that follow. Rather than performing operations on a triangle, we will use the syntax
> with(plots): with(LinearAlgebra):
> setoptions(scaling=constrained, axes=boxed, tickmarks=[5,5]):
> X := cos(t)*(1+sin(t))*(1+0.3*cos(8*t))*(1+0.1*cos(24*t)):
> Y := sin(t)*(1+sin(t))*(1+0.3*cos(8*t))*(1+0.1*cos(24*t)):
> plot([X,Y,t=0..2*Pi], color=blue, thickness=3);
which generates a parametric curve whose plot is the leaf shown in figure 1.18. Input these commands in Maple, as well as the syntax
> leaf := plot([X,Y,t=0..2*Pi], color=grey, thickness=1):
to store the image of the original leaf in leaf.

Figure 1.18 A Maple leaf.

Finally, for a given matrix $A$ of the form
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ 0 & 0 & 1 \end{bmatrix}$$
and a vector $Z = [X \; Y \; 1]^T$, compute $AZ$ (by hand) to show how $AZ$ depends on the entries in $A$.

(e) By our work in (c) and (d), if we now let
$$A = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix}$$
the product $AZ$ should result in translation of the leaf by the vector $[2 \; -1]^T$. To test this, we define the matrix $A$ in Maple by
> A := Matrix([[1, 0, 2], [0, 1, -1], [0, 0, 1]]);
and compute the coordinates in the new image by
> Xnew := A[1,1]*X + A[1,2]*Y + A[1,3]*1:
> Ynew := A[2,1]*X + A[2,2]*Y + A[2,3]*1:
> image1 := plot([Xnew,Ynew,t=0..2*Pi], thickness=3, color=blue):
The last command above plots the resulting image and stores it in image1. Display both the original leaf and the new image with the command
> display(leaf, image1);

and show that this results indeed in the translated leaf as shown in figure 1.19.

Figure 1.19 The original leaf and its translation by $[2 \; -1]^T$.

(f) In section 1.8.1, we learned that a matrix of the form
$$R = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$
is known as a rotation matrix and, through multiplication, rotates any vector by $\theta$ radians counterclockwise about the origin. To work with a rotation matrix in homogeneous coordinates, we update the matrix as follows:
$$R = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
Let us say that we wanted to perform two operations on the leaf. First, we wish to translate the leaf as above along the vector $[2 \; -1]^T$, and then we want to rotate the resulting image $\pi/4$ radians clockwise about the origin. We can accomplish this through two matrices by computing their product, as the following discussion shows. From (e), we know that using the matrix
> Translation := Matrix([[1, 0, 2], [0, 1, -1], [0, 0, 1]]);
leads to the desired translation. Likewise, the matrix
> Rotation := Matrix([[cos(-Pi/4), -sin(-Pi/4), 0], [sin(-Pi/4), cos(-Pi/4), 0], [0, 0, 1]]);
will produce the sought rotation. Explain why the matrix
> A := Rotation.Translation;
will produce the combined translation and rotation, and plot the resulting figure by updating your computations for Xnew and Ynew and using the syntax
> image2 := plot([Xnew,Ynew,t=0..2*Pi], thickness=4, color=black):
> display(leaf, image1, image2);
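As a rough numerical companion to (f), the sketch below builds the translation and rotation matrices in homogeneous coordinates and applies them to the triangle from (a) rather than to the leaf. It is written in Python instead of Maple, and the helper names matmul, translation, and rotation are hypothetical conveniences introduced only for this illustration. It compares the single combined matrix with the two-step computation, under the assumption that points are stored as columns.

```python
import math

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def translation(a, b):
    """Homogeneous-coordinate matrix of the form (1.13.1)."""
    return [[1, 0, a], [0, 1, b], [0, 0, 1]]

def rotation(theta):
    """Homogeneous-coordinate rotation by theta radians counterclockwise."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

# Triangle from part (a); each column is a vertex [x, y, 1]^T.
M = [[0, 3, 2],
     [0, 1, 2],
     [1, 1, 1]]

T = translation(2, -1)
R = rotation(-math.pi / 4)        # a clockwise turn is a negative angle

A = matmul(R, T)                  # the rightmost factor acts first: translate, then rotate
image = matmul(A, M)              # one multiplication by the combined matrix
two_step = matmul(R, matmul(T, M))  # translate the points, then rotate them

for row_a, row_b in zip(image, two_step):
    print([round(v, 4) for v in row_a], [round(v, 4) for v in row_b])
```

Forming matmul(T, R) instead would rotate first and translate second, which in general gives a different image; experimenting with both orders is one way to approach the question posed in (f).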

(g) What is the result of applying the matrix
$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
on the leaf? What kind of geometric transformation is performed by this matrix? What matrix would keep the height of the leaf constant but stretch its width by a factor of 2?

(h) It can be shown that to reflect an image across a line through the origin that forms an angle $\alpha$ with the positive $x$-axis, the necessary matrix is
$$A = \begin{bmatrix} \cos 2\alpha & \sin 2\alpha & 0 \\ \sin 2\alpha & -\cos 2\alpha & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
By finding the appropriate value of $\alpha$, find the matrix that will reflect an image across the line $y = x$ and compute and plot the image of the original leaf under this reflection.

(i) Exercises for further practice and investigation:
1. Find the image of the original leaf under rotation about the origin by $2\pi/3$ radians, followed by a reflection across the $y$-axis.
2. Find the image of the original leaf under rotation about the point $(-3, 1)$ by $-\pi/6$ radians. (Hint: To rotate about a point other than the origin, first translate that point to the origin, then rotate, then translate back.)
3. Find the image of the original leaf under translation along the vector $[3 \; 2]^T$, followed by reflection across the line $y = x/2$.

1.13.2 Bézier curves

In what follows,⁷ we explore the use of a specific type of parametric curve, called Bézier curves (pronounced "bezzy-eh"), which have a variety of important applications. These curves were originally developed by two automobile engineers in France in the 1960s, P. Bézier and P. de Casteljau, who were working to develop mathematical formulas to graph the smooth, wiggle-free curves that formed the shape of a car's body. Today, Bézier curves find their way into our lives every day: they are used to create the letters that appear in typeset fonts. The principles that govern these curves involve fundamental mathematics from linear algebra and calculus.

⁷ The material in this project has been adapted from Steven Janke's chapter "Designer Curves" in Applications of Calculus, MAA Notes Number 29, Philip Straffin, Ed.

(a) In calculus, we study parametric curves given in the form $x = f(t)$, $y = g(t)$, where $f$ and $g$ are each functions of the parameter $t$. Another way to denote this situation is to write $P(t) = (f(t), g(t))$ where $t$ belongs to some interval of real numbers. Note that $P(t)$ is essentially a vector; the graph of $P(t)$ is the parametric curve traced out by the vector over time. It will be most convenient if we simply write this as $P(t) = (x(t), y(t))$ in what follows. In this problem we begin to consider some special formulas for $x(t)$ and $y(t)$.

To parameterize the line between the points $P_0(1, 3)$ and $P_1(3, 7)$, we can think about wanting to make $x$ go from 1 to 3, and $y$ go from 3 to 7. Indeed, we want these to occur simultaneously as $t$ goes from 0 to 1. Consider the parameterization:
$$x = x(t) = 1 + t(3 - 1) = t \cdot 3 + (1 - t) \cdot 1$$
$$y = y(t) = 3 + t(7 - 3) = t \cdot 7 + (1 - t) \cdot 3$$
$$0 \le t \le 1$$
Observe that when $t = 0$, $x = 1$ and $y = 3$, and when $t = 1$, $x = 3$ and $y = 7$. Show that the curve parameterized by these two equations is indeed the line segment between $P_0$ and $P_1$. For instance, you might use algebra to eliminate the variable $t$, thereby deducing a relationship between $x$ and $y$.

(b) We can think about the equations for $x$ and $y$ in (a) in a more compact manner. Consider the following vector notation to replace the previous equations:
$$P(t) = \begin{bmatrix} x(t) \\ y(t) \end{bmatrix} = t \begin{bmatrix} 3 \\ 7 \end{bmatrix} + (1 - t) \begin{bmatrix} 1 \\ 3 \end{bmatrix} \tag{1.13.2}$$
This is sometimes referred to as taking a convex combination of the points $(1, 3)$ and $(3, 7)$, because $t$ and $1 - t$ are both nonnegative and sum to 1. Using the above style, write the parametric equations for the line segment that passes between the general points $P_0(x_0, y_0)$ and $P_1(x_1, y_1)$.

(c) An even more concise notation is to simply write $P(t) = (1 - t)P_0 + tP_1$. We will now use this notation to combine two or more of these parameterizations for line segments in a way that constructs curves that can be "controlled" in very interesting ways. Consider three points, labeled $P_0$, $P_1$, and $P_2$. In the most recent form of $P(t)$ given above at (1.13.2), write parameterizations for the two line segments from $P_0$ to $P_1$ and from $P_1$ to $P_2$, as pictured in figure 1.20. Call the first parameterization $P^{(1)}(t)$ and the second parameterization $P^{(2)}(t)$. In addition, determine the parameterizations $P^{(1)}(t)$ and $P^{(2)}(t)$ for the specific set of points $P_0(2, 3)$, $P_1(4, 7)$, and $P_2(7, 1)$. Show your work, and write each out in the expanded form where you have an expression for $x(t)$ and another for $y(t)$.

(d) From the two line-segment parameterizations in (c), we will now create a new parametric plot by taking similar combinations of $P^{(1)}(t)$ and $P^{(2)}(t)$.

Figure 1.20 The line segments from $P_0$ to $P_1$ and $P_1$ to $P_2$.

Consider the function $Q(t)$ defined as follows:
$$Q(t) = (1 - t) \cdot P^{(1)}(t) + t \cdot P^{(2)}(t) \tag{1.13.3}$$

First, substitute in (1.13.3) your expressions for $P^{(1)}(t)$ and $P^{(2)}(t)$ from (c) that involve the general points $P_0$, $P_1$, and $P_2$. Simplify the result as much as possible in order to write the formula for $Q$ in the following form:
$$Q(t) = a_0(t) P_0 + a_1(t) P_1 + a_2(t) P_2$$
where $a_0(t)$, $a_1(t)$, and $a_2(t)$ are polynomial functions of $t$. Then, using the specific parameterizations for $P^{(1)}(t)$ and $P^{(2)}(t)$ for the points $P_0(2, 3)$, $P_1(4, 7)$, and $P_2(7, 1)$, determine the parametric equations for $x(t)$ and $y(t)$ that make up the function $Q(t)$. For each of these three parameterizations ($P^{(1)}$, $P^{(2)}$, and $Q$), use Maple to sketch a plot⁸ and describe the results in detail. For example, how does $Q(t)$ look in comparison to the two line segments? What kind of functions make up the components $x(t)$ and $y(t)$ in $Q$? What is true about $Q(0)$ relative to the points $P_0$, $P_1$, and $P_2$? $Q(1)$? What direction is a particle moving along $Q(t)$ headed as $t$ starts out away from 0? As $t$ gets near to 1?

⁸ The Maple syntax to plot a parametric curve $(f(t), g(t))$ on the interval $[a, b]$ is
> plot([f(t),g(t),t=a..b]);

(e) It turns out that we will have even more freedom and control in drawing curves if we start with four control points, $P_0$, $P_1$, $P_2$, and $P_3$. The development here is similar to what was done above, just using a greater number of points. First, parameterize the segments from $P_0$ to $P_1$ (with $P^{(1)}(t)$), $P_1$ to $P_2$ (with $P^{(2)}(t)$), and from $P_2$ to $P_3$ (with $P^{(3)}(t)$). The usual formulas apply here; write down the basic form of each $P^{(j)}(t)$, $j = 1, 2, 3$, in terms of the various points $P_i$. Then combine, as in (d) above, the parameterizations for the first two segments to get a new function $Q^{(1)}$; also combine the parameterizations for the second two segments to get $Q^{(2)}$. These $Q$ parameterizations are written as
$$Q^{(1)}(t) = (1 - t) \cdot P^{(1)}(t) + t \cdot P^{(2)}(t)$$
$$Q^{(2)}(t) = (1 - t) \cdot P^{(2)}(t) + t \cdot P^{(3)}(t)$$
Finally, combine $Q^{(1)}$ and $Q^{(2)}$ to get a new parametric function that we call $B(t)$ according to the natural formula
$$B(t) = (1 - t) \cdot Q^{(1)}(t) + t \cdot Q^{(2)}(t)$$
By substituting appropriately for $Q^{(1)}(t)$ and $Q^{(2)}(t)$ and then replacing these with the appropriate $P^{(j)}(t)$ functions, show that
$$B(t) = P_0 (1 - t)^3 + 3 P_1 t (1 - t)^2 + 3 P_2 t^2 (1 - t) + P_3 t^3$$
$B(t)$ is called a cubic Bézier curve. By finding and using appropriate $t$ values, show that the points $P_0$ and $P_3$ both lie on the curve given by $B(t)$.

(f) Write the formulas for $x(t)$ and $y(t)$ that give the parameterizations for the cubic Bézier curve that has the four control points $P_0(2, 2)$, $P_1(5, 10)$, $P_2(40, 20)$, and $P_3(10, 5)$. Use Maple to plot each of the parametric curves given by $P^{(j)}(t)$, $j = 1, 2, 3$, $Q^{(1)}(t)$, $Q^{(2)}(t)$, and $B(t)$ in the same window. Discuss how the various curves combine to form others.

(g) For the general Bézier curve with control points $P_0(x_0, y_0)$, $P_1(x_1, y_1)$, $P_2(x_2, y_2)$, and $P_3(x_3, y_3)$, derive the equation for the tangent line to the curve at the point $(x_0, y_0)$, and prove that the point $(x_1, y_1)$ lies on this tangent line. (Hint: to determine the slope of the tangent line, use the chain rule in the standard way for finding $dy/dx$ for a parametric curve.)

(h) Laser printers and the program PostScript use Bézier curves to construct the fonts that we use to represent letters. For example, a picture of the letter g is shown in figure 1.21 that reveals the control points and Bézier curves required to accomplish this. In Maple, use two or more Bézier curves to sketch a reasonable representation of the letter S. (You need not try to emulate the thickness of the 'g' that is shown in figure 1.21.) Then, use an appropriate number of Bézier curves to create an approximation of the lowercase letter 'a,' in the form shown here in quotes. State the control points required for the various curves.

Figure 1.21 The letter g.
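The repeated convex combinations described in (c) through (e) can be explored numerically before the algebra is attempted. The following is a minimal sketch in Python rather than Maple; the helper names lerp, bezier_nested, and bezier_closed are our own and are not part of the text. It evaluates $B(t)$ in two ways for the control points given in (f): by nesting convex combinations exactly as in the construction of $P^{(j)}$, $Q^{(j)}$, and $B$, and by the closed formula stated in (e), so that the two can be compared at sample values of $t$.

```python
def lerp(P, Q, t):
    """Convex combination (1 - t) P + t Q of two points in the plane."""
    return tuple((1 - t) * p + t * q for p, q in zip(P, Q))

def bezier_nested(P0, P1, P2, P3, t):
    """Evaluate B(t) by three levels of convex combinations."""
    # Level 1: the line-segment parameterizations P^(1), P^(2), P^(3).
    S1, S2, S3 = lerp(P0, P1, t), lerp(P1, P2, t), lerp(P2, P3, t)
    # Level 2: the combinations Q^(1) and Q^(2).
    Q1, Q2 = lerp(S1, S2, t), lerp(S2, S3, t)
    # Level 3: B(t) = (1 - t) Q^(1)(t) + t Q^(2)(t).
    return lerp(Q1, Q2, t)

def bezier_closed(P0, P1, P2, P3, t):
    """Evaluate B(t) from the cubic formula stated in (e)."""
    weights = ((1 - t) ** 3, 3 * t * (1 - t) ** 2, 3 * t ** 2 * (1 - t), t ** 3)
    points = (P0, P1, P2, P3)
    return tuple(sum(w * P[i] for w, P in zip(weights, points)) for i in range(2))

# Control points from part (f).
ctrl = ((2, 2), (5, 10), (40, 20), (10, 5))

for t in (0, 0.25, 0.5, 0.75, 1):
    print(t, bezier_nested(*ctrl, t), bezier_closed(*ctrl, t))
```

Agreement at a handful of sample values is only evidence, not a proof; the exercise in (e) still asks for the algebraic verification.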

(i) Discuss the role that vectors and linear combinations play in the development of Bézier curves.

1.13.3 Discrete dynamical systems

A linear discrete dynamical system is a model that represents changes in a system from time $k$ to time $k + 1$ by the rule
$$x^{(k+1)} = A x^{(k)}$$
A discrete dynamical system is similar to a Markov chain, but we no longer require that the columns of the matrix $A$ sum to 1. A key issue in either scenario is the long-term behavior of the quantity $x^{(k)}$ being modeled. In what follows, we explore the role of eigenvalues and eigenvectors in determining this long-term behavior and study an important application of these ideas.

(a) To begin investigating the long-term behavior of the system, we will assume that $A$ is an $n \times n$ matrix with $n$ real linearly independent eigenvectors $v_1, \ldots, v_n$. Furthermore, assume that the corresponding real eigenvalues of $A$ satisfy the relationship
$$|\lambda_1| > |\lambda_2| \ge \cdots \ge |\lambda_n|$$
Consider an initial vector $x^{(0)}$. Explain why there exist constants $c_1, \ldots, c_n$ such that
$$x^{(0)} = c_1 v_1 + c_2 v_2 + \cdots + c_n v_n$$
and show that
$$A x^{(0)} = c_1 \lambda_1 v_1 + c_2 \lambda_2 v_2 + \cdots + c_n \lambda_n v_n$$

Furthermore, show that
$$x^{(k)} = A^k x^{(0)} = c_1 \lambda_1^k v_1 + c_2 \lambda_2^k v_2 + \cdots + c_n \lambda_n^k v_n \tag{1.13.4}$$

(b) In (1.13.4), divide both sides by $\lambda_1^k$. What can you conclude about $(\lambda_2/\lambda_1)^k$ as $k \to \infty$? Why can you make similar conclusions about $(\lambda_j/\lambda_1)^k$ for $j = 3, \ldots, n$? Hence explain why, for large $k$,

$$\frac{1}{\lambda_1^k} A^k x^{(0)} \approx c_1 v_1$$

and thus why $A^k x^{(0)}$ is an approximate eigenvector of $A$ corresponding to $\lambda_1$.

(c) In studying a population like spotted owls, mathematical ecologists often pay close attention to the various numbers of a species at different stages of life. For example, for spotted owls there are three pronounced groupings: juveniles (under 1 year), subadults (1 to 2 years old), and adults (2 years and older). The owls mate during the latter two stages, breed as adults, and can live for up to 20 years. A critical time in the life cycle and survival of these owls is when the juvenile leaves the nest to build a home of its own. [9] Let the number of spotted owls in year $k$ be represented by the vector

$$x^{(k)} = \begin{bmatrix} j_k \\ s_k \\ a_k \end{bmatrix}$$

where $j_k$ is the number of juveniles, $s_k$ the number of subadults, and $a_k$ the number of adults. Using field data, mathematical ecologists have determined [10] that a particular spotted owl population is modeled by the discrete dynamical system

$$x^{(k+1)} = \begin{bmatrix} 0 & 0 & 0.33 \\ 0.18 & 0 & 0 \\ 0 & 0.71 & 0.94 \end{bmatrix} x^{(k)}$$

What does this model imply about the percent of juveniles that survive to become subadults? About the percent of subadults that survive to become adults? About the percent of adults that survive from one year to the next? What percent of adults produce juvenile offspring in a given year?

(d) Assume that in a given region, ecologists have measured the present populations as follows: $j_0 = 200$, $s_0 = 45$, and $a_0 = 725$. Use the model stated in (c) to determine the population $x^{(k)} = [j_k \; s_k \; a_k]^T$ for $k = 1, \ldots, 20$. Do you think the spotted owl will become extinct? Give a convincing argument using not only your computations of the population vectors but also the results of (b).

(e) Say that $r$ is the fraction of juveniles that survive from one year to the next (that is, replace 0.18 in the matrix of the model with $r$). By experimenting with different values of $r$, determine the minimum fraction of juveniles that must survive from one year to the next in order for the spotted owl population not to become extinct. How does your answer depend on the eigenvalues of the matrix?

(f) Let $A$ be the $n \times n$ matrix of a discrete dynamical system and assume that $A$ has $n$ real linearly independent eigenvectors. Let $x^{(0)}$ be an initial vector and let $\rho(A)$ denote the maximum absolute value of an eigenvalue of $A$. Show that the following are true:
(i) If $\rho(A) < 1$, then $\lim_{k \to \infty} A^k x^{(0)} = 0$.
(ii) If $\rho(A) = 1$ and $\lambda = 1$ is the unique eigenvalue having this maximum absolute value, then $\lim_{k \to \infty} A^k x^{(0)}$ is an eigenvector of $A$.
(iii) If $\rho(A) > 1$, then there exist choices of $x^{(0)}$ for which $A^k x^{(0)}$ grows without bound.

[9] To read more about the issue of spotted owl survival, see the introduction to chapter 5 of David C. Lay’s Linear Algebra and its Applications.
[10] R. H. Lamberson et al., “A Dynamic Analysis of the Viability of the Northern Spotted Owl in a Fragmented Forest Environment,” Conservation Biology 6 (1992), 505–512.
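The scaling idea in part (b) can be tried numerically. The following is a minimal sketch on a hypothetical 2 × 2 matrix (not the owl model, and not taken from the text) whose eigenvalues are 3 and 1 with eigenvectors (1, 1) and (1, −1); the helper names `mat_vec` and `iterate` are illustrative assumptions.

```python
def mat_vec(A, x):
    """Multiply a square matrix (list of rows) by a vector (list)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def iterate(A, x0, k):
    """Return A^k x0 by applying the matrix k times."""
    x = list(x0)
    for _ in range(k):
        x = mat_vec(A, x)
    return x


# Hypothetical example: eigenvalues 3 and 1,
# with eigenvectors v1 = (1, 1) and v2 = (1, -1).
A = [[2.0, 1.0],
     [1.0, 2.0]]
x0 = [1.0, 0.0]          # x0 = 0.5*v1 + 0.5*v2, so c1 = 0.5
lam1 = 3.0               # dominant eigenvalue

k = 30
xk = iterate(A, x0, k)
scaled = [component / lam1**k for component in xk]
print(scaled)            # close to c1*v1 = [0.5, 0.5]
```

Dividing by the k-th power of the dominant eigenvalue keeps the iterates from growing or shrinking, so the limiting direction is easy to read off.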


2 First-order differential equations

2.1 Motivating problems

Differential equations arise naturally in many problems encountered when modeling physical phenomena. To begin our study of this subject, we introduce two fundamental examples that demonstrate the central role that differential equations play in our world.

In section 1.1, we discussed how the amount of salt present in a system of two tanks can be modeled through a system of differential equations. Here, an even simpler situation is considered: our goal is to predict the amount of salt present in a city’s water reservoir at time $t$, given a set of determining conditions.

Suppose that the reservoir is filled to its capacity of 10 000 m³, and that measurements indicate an initial concentration of salt of $C_0 = 0.02$ g/m³. It follows that there are $A_0 = 200$ g of salt initially present. As the city draws this solution from the reservoir for use, new solution (water with some salt concentration) from the local treatment facility flows into the reservoir so that the volume of water present in the tank stays constant. Let us assume that the concentration of salt in the inflowing solution is 0.01 g/m³, and that the rate of this inflow is 1000 m³/day. Since the city is also assumed to be drawing solution at an equal rate from the reservoir, the outflow also occurs at a rate of 1000 m³/day.

We are interested in several key questions. How much salt is in the tank at time $t$? What is the concentration of salt in the water being used by the city at time $t$? What happens to these values over time?

We will let $A(t)$ denote the amount of salt in the tank at time $t$. The instantaneous rate of change $dA/dt$ of $A(t)$ is given by the difference between the rate at which salt is entering the tank and the rate at which salt is leaving. Exploring the given information regarding inflow and outflow, we can determine these rates precisely. Since solution is entering the reservoir at 1000 m³/day containing a concentration of 0.01 g/m³, it follows that salt is entering the tank at a rate of

$$1000\ \frac{\text{m}^3}{\text{day}} \cdot 0.01\ \frac{\text{g}}{\text{m}^3} = 10\ \frac{\text{g}}{\text{day}}$$

For salt leaving the reservoir, the situation is slightly more complicated. Since we do not know the exact amount of salt present in the reservoir at time $t$, we denote this by $A(t)$. Assuming that the solution in the reservoir is uniformly mixed, the concentration of salt in the outflowing solution is the ratio of the amount $A(t)$ of salt to the volume of the tank. That is, the outflowing concentration is

$$\frac{A(t)}{10\,000}\ \frac{\text{g}}{\text{m}^3}$$

Since this outflow is occurring at a rate of 1000 m³/day, it follows that salt is leaving the tank at a rate of

$$1000\ \frac{\text{m}^3}{\text{day}} \cdot \frac{A(t)}{10\,000}\ \frac{\text{g}}{\text{m}^3} = \frac{A(t)}{10}\ \frac{\text{g}}{\text{day}}$$

It now follows that the instantaneous rate of change $dA/dt$ of salt in the tank in grams per day is given by the difference of the rate of salt entering and the rate of salt leaving the tank. Specifically,

$$\frac{dA}{dt} = 10 - \frac{A(t)}{10} \tag{2.1.1}$$

Note carefully what this last equation is saying: $A(t)$ is an unknown function, but we have an equation that relates this unknown function to its derivative. Such an equation is called a differential equation. The solution to this equation is a function $A(t)$ that makes the equation true. If we can solve the equation for $A(t)$, we then will be able to predict the amount of salt in the tank at any time $t$. Determining such solutions and their long-term behaviors is the main focus of this chapter.

Another important application of differential equations involves population growth. Consider a population $P(t)$ of animals. Since the likelihood of reproduction depends on the number of animals present, it is natural to assume that the rate of change of $P(t)$ is directly proportional to $P(t)$. Phrased in terms of the derivative, this assumption means that

$$\frac{dP}{dt} = kP(t) \tag{2.1.2}$$

where $k$ is some positive constant. Observe that (2.1.2) is a differential equation involving the function $P$. It is a standard exercise in calculus to show that functions of the form $P(t) = P_0 e^{kt}$ are solutions to (2.1.2).
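The reservoir model (2.1.1) can be explored numerically even before any solution technique is available. The sketch below uses Euler's method, a standard approximation scheme that this section does not introduce; the names `euler` and `salt_rate`, the step size of 0.01 day, and the 50-day horizon are illustrative assumptions. For comparison it uses the function A(t) = 100 + 100e^(−t/10), which can be checked by substitution to satisfy (2.1.1) with A(0) = 200.

```python
import math


def euler(f, t0, y0, h, n_steps):
    """Approximate y' = f(t, y), y(t0) = y0 with n_steps Euler steps of size h."""
    t, y = t0, y0
    history = [(t, y)]
    for _ in range(n_steps):
        y = y + h * f(t, y)   # follow the tangent line for a short time h
        t = t + h
        history.append((t, y))
    return history


def salt_rate(t, A):
    """Right-hand side of (2.1.1): salt entering minus salt leaving, in g/day."""
    return 10.0 - A / 10.0


# A(0) = 200 g, step of 0.01 day, simulate 50 days.
history = euler(salt_rate, 0.0, 200.0, 0.01, 5000)
t_end, A_end = history[-1]

# Comparison value from A(t) = 100 + 100 e^{-t/10} (verify by substitution).
A_exact = 100.0 + 100.0 * math.exp(-t_end / 10.0)
print(f"t = {t_end:.1f} days: Euler {A_end:.3f} g, exact {A_exact:.3f} g")
```

Dividing any computed amount by the reservoir volume of 10 000 m³ gives the corresponding concentration in g/m³.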


Because the function $P(t) = P_0 e^{kt}$ exhibits unbounded growth over time, this exponential growth model is not realistic beyond a relatively short period of time. A related, but more sophisticated, model of population growth is the logistic differential equation

$$\frac{dP}{dt} = kP(t)\left(1 - \frac{P(t)}{A}\right)$$

where the constant $k$ is considered the reproductive rate of the population and the constant $A$ is the surrounding environment’s carrying capacity. For example, if a population had a relative growth rate of $k = 0.02$ and a carrying capacity of $A = 100$, the population function would satisfy the differential equation

$$\frac{dP}{dt} = 0.02P(t)\left(1 - \frac{P(t)}{100}\right)$$

The logistic model, usually credited to the Belgian mathematician Pierre Verhulst, accounts not only for reproductive growth, but also for mortality by considering environmental limitations on maximum population. The logistic equation is more challenging to solve; we will do so in section 2.7.

In addition to mixing problems and models of population growth, differential equations enjoy widespread applications in other physical phenomena. Differential equations are also mathematically interesting in and of themselves, and in upcoming sections we will study not only their applications, but also their key properties and characteristics to better understand the subject as a whole.

2.2 Deﬁnitions, notation, and terminology

As we have seen with the examples

$$\frac{dA}{dt} = 10 - \frac{A}{10} \tag{2.2.1}$$

$$\frac{dP}{dt} = 0.02P\left(1 - \frac{P}{100}\right) \tag{2.2.2}$$

$$y'' + y = 0 \tag{2.2.3}$$

a differential equation is an equation relating an unknown function to one or more of its derivatives. Usually we will suppress the notation “$A(t)$” and instead simply write “$A$,” as in (2.2.1). We will interchangeably use the notations $y'$ and $dy/dt$ to represent the first derivative; similarly, $y'' = d^2y/dt^2$. Other books sometimes employ the notations $y' = D(y) = \dot{y}$ and $y'' = D^2(y) = \ddot{y}$.

A solution of a differential equation is a differentiable function that satisfies the equation on some interval $(a, b)$ of values for the independent variable. For example, the function $y = \sin t$ is a solution to (2.2.3) on $(-\infty, \infty)$ since $y'' = -\sin t$, and $-\sin t + \sin t = 0$ for all values of $t$.

Given any differential equation, we are interested in determining all of its solutions. But many, if not most, differential equations are difficult or impossible to solve. For example, the general solution of the equation $y'' + ty = t$ (which is only a slightly modified version of (2.2.3)) cannot be expressed in terms of elementary functions. [1] In such situations, we may turn to qualitative or approximation methods that may enable us to analyze how a solution should behave, while perhaps not being able to determine an explicit formula for the function.

Equations (2.2.1), (2.2.2), and (2.2.3) are often called ordinary differential equations, in contrast to partial differential equations such as

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0$$

where the solution function $u(x, y)$ has two independent variables $x$ and $y$. Our focus will be on ordinary differential equations, as partial differential equations are beyond the scope of this text.

The order of a differential equation is the order of the highest derivative present. For example, (2.2.1) and (2.2.2) are first-order differential equations since they only involve first derivatives. Equation (2.2.3) is second-order. For now, we limit our attention to first-order equations; higher order equations will be discussed in detail in subsequent chapters.

It is important to note that every student of calculus learns to solve a certain class of differential equations through integration. For example, the problem “find a function $y$ whose derivative is $te^t$” can be stated as the differential equation

$$\frac{dy}{dt} = te^t \tag{2.2.4}$$

Integrating both sides with respect to $t$ and using integration by parts on the right, it follows that $y(t) = te^t - e^t + C$ is a solution for any choice of the constant $C$. Here we see an important fact: differential equations typically have a family of infinitely many solutions. Determining all possible members of that family, like determining all solutions to systems of linear equations in linear algebra, will be a central component of our work.

Calculus students also know that if we are given one more piece of information about the function $y$ along with (2.2.4), it is possible to uniquely determine the integration constant $C$. For example, had the problem above read, “find a function $y$ whose derivative is $te^t$ such that $y(0) = 5$,” we could integrate to find $y = te^t - e^t + C$, just as we did previously, and then use the initial condition $y(0) = 5$ to see that $C$ must satisfy the equation

$$5 = 0 \cdot e^0 - e^0 + C$$

and thus $C = 6$. When we are given a differential equation of order $n$ along with $n$ initial conditions, we say that we are solving an initial-value problem. [2] In the given example, $y = te^t - e^t + 6$ is the solution to the stated initial-value problem.

Based on the example above and our experience in calculus, it is clear that integration is an obvious (and often effective) approach to solving differential equations of the form

$$\frac{dy}{dt} = f(t)$$

where $f(t)$ is a given function. If we can integrate $f$ symbolically, then the differential equation is solved. Even if $f(t)$ cannot be integrated symbolically with respect to $t$, we can still use techniques like numerical integration to successfully attack the problem.

The situation grows more complicated when we want to solve differential equations that also involve the unknown function $y$, such as

$$\frac{dy}{dt} = te^y$$

In what follows in this chapter, we seek to classify first-order equations into types that can be solved in a straightforward way by symbolic means (often involving integration), as well as to develop methods that can be used to generate approximate solutions in situations where a symbolic solution is either difficult or impossible to attain. Throughout, the general form of the equations we are considering will be $y' = f(t, y)$, where the function $f(t, y)$ represents some combination of the independent variable $t$ and the unknown function $y$.

It is also important to note that a wide range of first-order initial-value problems are guaranteed to have unique solutions. This is stated formally in the following theorem, whose proof may be found in more advanced texts.

Theorem 2.2.1 Consider the initial-value problem given by $y' = f(t, y)$, $y(t_0) = y_0$. If the function $f(t, y)$ is continuous on a rectangle that includes $(t_0, y_0)$ in its interior and the partial derivative [3] $f_y(t, y)$ is continuous on that same rectangle, then there exists an interval containing $t_0$ on which the initial-value problem has a unique solution.

[1] This fact is not obvious.
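The remark above about numerical integration can be made concrete. The following is a minimal sketch, assuming composite Simpson's rule as the quadrature method (the text does not name one); the function names `simpson` and `solve_by_quadrature` are illustrative. It treats the initial-value problem dy/dt = te^t, y(0) = 5 and compares the result with the symbolic solution y = te^t − e^t + 6 found earlier.

```python
import math


def simpson(f, a, b, n=200):
    """Composite Simpson's rule for the integral of f over [a, b]; n must be even."""
    if n % 2 != 0:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 == 1 else 2) * f(a + i * h)
    return total * h / 3


def solve_by_quadrature(f, t0, y0, t):
    """Value at t of the solution of y' = f(t), y(t0) = y0."""
    return y0 + simpson(f, t0, t)


def f(t):
    return t * math.exp(t)                       # right-hand side of (2.2.4)


def exact(t):
    return t * math.exp(t) - math.exp(t) + 6     # symbolic solution with y(0) = 5


for t in (0.5, 1.0, 2.0):
    print(t, solve_by_quadrature(f, 0.0, 5.0, t), exact(t))
```

The same two functions apply unchanged when f has no elementary antiderivative; only the comparison against `exact` is then unavailable.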
Often the dependent variable, or unknown function $y$, in a differential equation will model an important quantity in some physical problem: the amount of salt in a tank at time $t$, the number of members of a population at a given time, or the position of a mass attached to a spring. As such, we will place particular emphasis on the graph of the solution function in order to better understand what the differential equation is telling us about the physical situation it models.

[2] We often use the abbreviation IVP to stand for the phrase “initial-value problem.”
[3] We typically use the notation $f_y(t, y) = \partial f/\partial y$.


Just as geometry and graphical interpretations shaped our understanding of linear algebra in chapter 1, these perspectives will prove extremely helpful in our study of differential equations. We begin our explorations of these graphical interpretations through the reservoir problem from section 2.1 and the earlier example $y' = te^t$.

So far in our references to derivatives in the reservoir and population models, we have viewed the derivative as measuring the instantaneous rate of change of a quantity that is varying. From a more geometric point of view, we also know that the derivative of a function measures the slope of the tangent line to the function’s graph at a given point. For example, with the differential equation

$$\frac{dA}{dt} = 10 - \frac{A}{10} \tag{2.2.5}$$

we can say that if, at some time $t$, the amount of salt is $A = 20$, then $dA/dt = 10 - 20/10 = 8$. Thus, if $A(t)$ is a solution to the differential equation, it follows that at any time where $A(t) = 20$, $A'(t) = 8$. Graphically, this means that at such a point, the slope of the tangent line to the curve must be 8.

Since we are interested in the function $A(t)$ over an interval of $t$-values, we also expect that $A(t)$ will take on a wide range of values. As such, it is natural to compute the slope of the tangent line determined by (2.2.5) for a large number of different values of $A$ and $t$. Computers are well suited to such a task, and, as we will see in the introduction to Maple commands at the end of this section, Maple and other computer algebra systems provide tools for doing so. Computing values of $dA/dt$ over a grid of $t$ and $A$ values, we can plot a small portion of each corresponding tangent line at the point $(t, A)$, and see the resulting slope field (or direction field). The slope field for (2.2.5) is shown in figure 2.1.

Figure 2.1 The slope field for (2.2.5); the graph of the solution corresponding to an initial condition $A(0) = 200$ is included. (Horizontal axis: $t$ from 0 to 50; vertical axis: $A(t)$ from 0 to 200.)
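The grid computation behind a slope field can be sketched without any plotting at all. The following assumes a coarse, hand-picked grid of t and A values (an illustrative choice, not the grid used for figure 2.1) and simply tabulates the slope that (2.2.5) prescribes at each grid point; the names `slope` and `slope_grid` are hypothetical.

```python
def slope(t, A):
    """Slope dA/dt prescribed by (2.2.5); it happens not to depend on t."""
    return 10.0 - A / 10.0


def slope_grid(f, t_values, y_values):
    """Return {(t, y): slope} for every grid point."""
    return {(t, y): f(t, y) for t in t_values for y in y_values}


t_values = [0, 10, 20, 30, 40, 50]
A_values = [0, 20, 50, 100, 150, 200]
grid = slope_grid(slope, t_values, A_values)

# Print one row per A value, highest first, as on a plot.
for A in reversed(A_values):
    row = " ".join(f"{grid[(t, A)]:6.1f}" for t in t_values)
    print(f"A = {A:3d}: {row}")
```

Each row of the table is constant because the right-hand side of (2.2.5) involves only A; the slopes are negative above A = 100, zero at A = 100, and positive below it.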


Observe that a slope field provides an intuitive way to understand the information a first-order differential equation possesses: the slope at each point gives the direction of the solution at that point. Indeed, we use arrows instead of small lines in order to indicate the flow of the solution as time increases. In essence, the slope field is a map that the solution must navigate based on the initial point from which the function starts. For example, if we use the initial condition $A(0) = 200$ (as was given in the original example in section 2.1), we can start a graph at the point $(0, 200)$ and follow the map. Doing so yields the curve shown in figure 2.1. Note particularly how we can clearly see the slope of the solution curve fitting with the slopes present in the direction field.

Moreover, observe that the direction field provides an immediate overall sense of how every solution to the differential equation behaves: for any solution $A(t)$, $A(t) \to 100$ as $t \to \infty$. This makes sense physically, too, since the saltwater solution entering the reservoir has concentration 0.01 g/m³. Over time, the concentration of salt in the reservoir should tend to that level, and with 10 000 m³ of solution present in the reservoir, we expect the amount of salt to approach 100 g.

Another example of a differential equation’s slope field provides further insights. For the differential equation

$$\frac{dy}{dt} = te^t \tag{2.2.6}$$

its slope field for the window $-2 \le t \le 1$ and $-2 \le y \le 2$ is given in figure 2.2. We noted earlier that the general solution to this equation is $y = te^t - e^t + C$. Moreover, given any initial condition, we can determine $C$. For example, if $y(0) = 1/2$, $C = 3/2$. Likewise, if $y(0) = 0$, $C = 1$, and if $y(0) = -1$, $C = 0$. If we plot the corresponding three functions with the slope field, these three members of the family of all solutions to the original differential equation appear as shown in figure 2.2.

In integral calculus, students learn about families of antiderivatives [4] and how two members of such a family differ only by a constant. Here, we see this fact graphically in the slope field of figure 2.2, and can add the perspective that there exists a family of solutions to a certain differential equation. In upcoming sections, we will learn new techniques for how to determine solutions analytically in various circumstances, while not losing sight of the fact that every first-order differential equation can be interpreted graphically through a direction field.

Finally, there is an important type of first-order differential equation (DE) for which some solutions can be determined algebraically. A first-order DE is said to be autonomous if it can be written in the form $y' = f(y)$. That is, the independent variable $t$ is not involved explicitly in $f(y)$. For example, the equation

$$y' = 1 - y^2 \tag{2.2.7}$$

is autonomous.

[4] An antiderivative $F$ of a function $f$ is a function that satisfies $F' = f$.


Figure 2.2 The slope field for (2.2.6) along with three solution functions for the initial conditions $y(0) = 1/2$, $y(0) = 0$, and $y(0) = -1$. (Horizontal axis: $t$; vertical axis: $y(t)$.)

In addition, a solution $y$ to a DE is called an equilibrium or constant solution if the function $y$ is constant. In (2.2.7), both $y = 1$ and $y = -1$ are equilibrium solutions. Such a solution is stable if every solution with an initial condition $y(t_0) = y_0$, where $y_0$ is close to the equilibrium value, tends toward the equilibrium solution. Otherwise, the equilibrium solution is called unstable. We close this section with an example regarding an autonomous differential equation.

Example 2.2.1 Consider the differential equation $y' = (y^2 - 1)(y - 3)^2$. Determine all equilibrium solutions to the equation, as well as whether each is stable or unstable. Finally, plot the direction field for the equation and include plots of the equilibrium solutions.

Solution. To find the equilibrium solutions, we assume that $y$ is a constant function, and therefore $y' = 0$. Solving the algebraic equation

$$0 = (y^2 - 1)(y - 3)^2$$

we find that $y = -1$, $y = 1$, and $y = 3$ are the equilibrium solutions of the given DE. We can decide the stability of each equilibrium solution by studying the sign of $y'$ near the equilibrium value; note that $(y - 3)^2$ is always nonnegative.

To consider the stability of $y = -1$, observe that when $y < -1$, $y' = (y + 1)(y - 1)(y - 3)^2 > 0$, since the first two factors are both negative and the third is positive. When $y > -1$ (and $y < 1$), it follows that $y' = (y + 1)(y - 1)(y - 3)^2 < 0$ since the middle factor is negative while the other two are positive. Hence, if a solution starts just below $y = -1$, that solution will increase toward $-1$, whereas if a solution starts just above $y = -1$, it will decrease to $-1$. This makes the equilibrium $y = -1$ stable.

These observations are easiest to make visually in the direction field. As seen in figure 2.3, the constant solution $y = -1$ is stable, since any solution with an initial condition just above or just below $y = -1$ will tend to $y = -1$. However, the solution at $y = 1$ is unstable, since any solution with an initial value just above or just below $y = 1$ will tend away from 1 (and tend toward $y = 3$ or $y = -1$, respectively). Finally, although solutions just below $y = 3$ tend to 3, any solution that begins just above $y = 3$ will increase away from that constant solution, and hence $y = 3$ is also unstable. [5]

Figure 2.3 The slope field for $y' = (y^2 - 1)(y - 3)^2$ along with its three equilibrium solutions. (Horizontal axis: $t$; vertical axis: $y(t)$.)
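The sign analysis in example 2.2.1 can be automated. The sketch below is one possible approach, assuming that sampling the right-hand side a small distance `eps` on either side of each equilibrium is representative (that is, no other equilibrium lies within `eps`); the function name `classify` and the label "semi-stable" for a one-sided equilibrium, which the text counts as unstable, are illustrative choices.

```python
def f(y):
    """Right-hand side of the autonomous DE in example 2.2.1."""
    return (y**2 - 1) * (y - 3)**2


def classify(f, y_eq, eps=1e-3):
    """Classify an equilibrium y_eq of y' = f(y) from the sign of f nearby.

    Assumes eps is small enough that no other equilibrium lies within
    eps of y_eq.
    """
    below = f(y_eq - eps)   # sign of y' just below the equilibrium
    above = f(y_eq + eps)   # sign of y' just above the equilibrium
    if below > 0 and above < 0:
        return "stable"          # solutions on both sides move toward y_eq
    if below < 0 and above > 0:
        return "unstable"        # solutions on both sides move away
    return "semi-stable"         # same sign on both sides (or inconclusive)


for y_eq in (-1, 1, 3):
    print(y_eq, classify(f, y_eq))
```

For this example the three equilibria are reported as stable, unstable, and semi-stable, in agreement with the discussion above.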

2.2.1 Plotting slope ﬁelds using Maple

Just as our work in linear algebra required the use of Maple’s Linear Algebra package, to take advantage of the software’s support for the study of differential equations we use the DEtools package, loading it with the command

> with(DEtools):

To plot the direction field associated with a given differential equation, it is convenient to first define the equation itself in Maple. This is accomplished (for the equation from the reservoir problem) through the following command:

> Eq1 := diff(A(t), t) = 10-1/10*A(t);

Note that the differential equation of interest is now stored in “Eq1”. The slope field may now be generated by the command

> DEplot(Eq1, A(t), t = 0 .. 50, A(t) = 0 .. 200, color = grey, arrows=large);

This command produces the slope field of figure 2.1, but without any particular solution satisfying an initial value included.

The choice of the ranges of $t$ and $A(t)$ values is extremely important. Without a well-chosen window selected by the user, the plot Maple generates may not be very insightful. For example, if the above command were changed so that the range of $A(t)$ values is 0 .. 10, almost no information can be gained from the slope field. As such, we will strive to learn to analyze the expected behavior of a differential equation from its form so that we can choose windows well in related plots; we may often have to experiment and explore to find graphs that are useful.

Finally, if we are interested in one or more related initial-value problems, a variation of the DEplot command enables us to sketch the graph of each corresponding solution. For example, the command

> DEplot(Eq1, A(t), t = 0 .. 50, A(t) = 0 .. 200, color = grey, arrows=large, [[0,200]]);

will generate not only the slope field, but also the graph of the solution $A(t)$ that satisfies $A(0) = 200$, as shown in figure 2.1. Additional curves for different initial conditions may be plotted by listing the other conditions to be satisfied: for example, in the stated command above we could replace [[0,200]] with [[0,200], [0,100], [0,0]] to include the plots of the three solution curves that respectively satisfy $A(0) = 200$, $A(0) = 100$, and $A(0) = 0$.

[5] Some authors call a solution such as $y = 3$ in this example semi-stable, since there is stability on one side and instability on the other.

Exercises 2.2

1. Consider the differential equation $y'' = 4y$.
(a) What is the order of this equation?
(b) Show via substitution that the function $y = e^{2t}$ is a solution to this equation.
(c) Are there any other functions of the form $y = e^{rt}$ ($r \neq 2$) that are also solutions to the equation? If so, which? Justify your answer.


2. For a ball thrown straight up from an initial height $s(0) = 4$ meters at an initial velocity of $s'(0) = 10$ m/s, we know that after being thrown, the only force acting on the ball is gravity, provided we neglect air resistance. Knowing that acceleration due to gravity is constant at $-9.81$ m/s², it follows that $s''(t) = -9.81$. Use the given information to determine $s(t)$, the function that tells us the height of the ball at time $t$. Then determine the maximum height the ball reaches, as well as the time the ball lands.

3. In the differential equation $dA/dt = 10 - A/10$ from the reservoir problem, explain why the function $A(t) = 100$ is an equilibrium solution to the equation. Is it stable or unstable? Why?

4. Consider the logistic differential equation

$$\frac{dP}{dt} = 0.02P\left(1 - \frac{P}{100}\right)$$

Use Maple to plot the direction field for this equation. Print the output and, by hand, sketch the solutions that correspond to the initial conditions $P(0) = 10$, $P(0) = 75$, and $P(0) = 125$. What is the long-term behavior of every solution $P(t)$ for which $P(0) > 0$? Are there any constant (or equilibrium) solutions to the equation? Explain what these observations tell you about the behavior of the population being modeled.

5. For the logistic differential equation

$$\frac{dP}{dt} = 0.001P\left(1 - \frac{P}{25}\right)$$

how should the direction field appear? Use the constant/equilibrium solutions to the equation as well as the long-term behavior of the population to help you sketch, by hand, the direction field for this DE.

6. By constructing tangent lines over a grid with at least sixteen vertices, sketch a direction field by hand for each of the following differential equations.
(a) $y' = 1 - y$
(b) $y' = \frac{1}{2}(t - y)$
(c) $y' = \frac{1}{2}(t + y)$
(d) $y' = 1 - t$

7. Without using Maple to plot direction fields, match each of the following differential equations with its corresponding direction field. Write at least one sentence to explain the reasoning behind each of your choices.
(a) $\dfrac{dy}{dt} = y - t$
(b) $\dfrac{dy}{dt} = ty$
(c) $\dfrac{dy}{dt} = y$
(d) $\dfrac{dy}{dt} = t$


[Direction fields (i), (ii), (iii), and (iv) for exercise 7. In each panel the horizontal axis is $t$ and the vertical axis is $y(t)$, with ticks at $-1.0$ and $1.0$.]

In exercises 8–15, use integration to find a family of solutions for the given differential equation.

8. $y' = t^2 + 2$
9. $y' = t + \cos t$
10. $y' = \dfrac{t}{t^2 + 1}$
11. $y''' = t^2 + 2$
12. $y'' = 5t$
13. $y' = t \sin t$
14. $y' = \dfrac{1}{t^2 + 5t + 6}$
15. $y' = te^{-t^2}$


In exercises 16–23, solve each of the following initial-value problems.

16. $y' = t^2 + 2$, $y(1) = 4$
17. $y' = t + \cos t$, $y(\pi/2) = 1$
18. $y' = \dfrac{t}{t^2 + 1}$, $y(0) = 3$
19. $y''' = t^2 + 2$, $y(-1) = 3$, $y'(-1) = -1$, $y''(-1) = 0$
20. $y'' = 5t$, $y(1) = 4$, $y'(1) = -2$
21. $y' = t \sin t$, $y(0) = 2$
22. $y' = \dfrac{1}{t^2 + 5t + 6}$, $y(0) = 1$
23. $y' = te^{-t^2}$, $y(0) = -1$

24. For an $n$th-order IVP of the form $y^{(n)} = f(t)$, how many initial conditions are needed in order to uniquely determine the solution $y(t)$? Explain.

For each of the autonomous differential equations given in exercises 25–29, algebraically determine all equilibrium solutions to the DE. In addition, plot an appropriate direction field and use it to classify each equilibrium solution as stable or unstable.

25. $y' = 3 - 2y$
26. $y' = -y^2 - 5y - 6$
27. $y' = y - y^3$
28. $y' = e^{-y}(1 + y^2)$
29. $y' = (y - 1)(y - 3)^2$

2.3 Linear ﬁrst-order differential equations

Some classes of differential equations can usually be solved by certain standard techniques. In this section, we consider the class of linear first-order differential equations and develop an approach for solving any such equation. Since any first-order DE is an equation that involves the functions $y$ and $y'$, it is natural for us to consider the different ways in which $y$ and $y'$ may be combined. For example, the equations

$$yy' = e^t \tag{2.3.1}$$

$$2ty' + y \sin t = \cos t \tag{2.3.2}$$

$$y' + \sin y = 2 \tag{2.3.3}$$

are all first-order DEs. Recall that in section 1.12 we discussed linear combinations of generalized vectors. Here we can view $y$ and $y'$ as functions that belong to a vector space, and thus think about whether a certain combination of $y$ and $y'$ is a linear combination or not. We say that any differential equation of the form

$$a_1(t)y' + a_0(t)y = b(t) \tag{2.3.4}$$

is a linear first-order differential equation, since a linear combination of $y$ and $y'$ is being formed. Any other first-order differential equation is said to be nonlinear. If we stipulate that $a_1(t) \neq 0$, we can divide through by $a_1(t)$ and hence write

$$y' + p(t)y = f(t) \tag{2.3.5}$$

as the standard form for a linear first-order equation. We call $f(t)$ the forcing function. Above, note that (2.3.1) and (2.3.3) are nonlinear equations, while (2.3.2) is linear.

The simplest linear first-order differential equations are those for which the forcing function is zero. We naturally call the equation

$$y' + p(t)y = 0 \tag{2.3.6}$$

a homogeneous linear first-order DE. We consider a particular example that shows how every such homogeneous DE may be solved.

Example 2.3.1 Solve the differential equation $y' + (1 + 3t^2)y = 0$. In addition, solve the initial-value problem that is given by the same DE and the initial condition $y(0) = 4$.

Solution. We will use integration to solve for $y$. Rearranging the given equation, we observe that $y' = -(1 + 3t^2)y$. Dividing both sides by $y$, we find that

$$\frac{y'}{y} = -(1 + 3t^2)$$

Keeping in mind the fact that $y$ and $y'$ are each unknown functions of $t$, we integrate both sides of the previous equation with respect to $t$:

$$\int \frac{y'}{y}\,dt = \int -(1 + 3t^2)\,dt$$

We recognize from the chain rule that the left-hand side is $\ln y$. Thus, integrating the polynomial in $t$ on the right yields

$$\ln y = -t - t^3 + C$$

We note that while an arbitrary constant arises on each side of the equation when integrating, it suffices to simply include one constant on the right. Finally, we solve for $y$ using properties of the natural logarithm and exponential functions to find that

$$y = e^{-t - t^3 + C} = e^C e^{-t - t^3}$$


Since $C$ is a constant, so is $e^C$, and thus we write

$$y = Ke^{-t - t^3}$$

Observe that we have found an entire family of functions that solve the original differential equation: regardless of the constant $K$, the above function $y$ is a solution. If we consider the stated initial-value problem and apply the given initial condition $y(0) = 4$, we immediately see that $K = 4$, and the solution to the initial-value problem is

$$y = 4e^{-t - t^3}$$

The solution method in example 2.3.1 can be generalized to apply to any homogeneous linear first-order DE. Using the notation $p(t)$ to replace the function $1 + 3t^2$, which is the coefficient of $y$, the same steps above may be used to find the solution to the standard homogeneous linear first-order differential equation. We state this result in the following theorem.

Theorem 2.3.1 For any homogeneous linear first-order differential equation of the form $y' + p(t)y = 0$, the general solution is $y = Ke^{-P(t)}$, where $P$ is any antiderivative of $p$. Moreover, for the initial condition $y(t_0) = y_0$, if $p(t)$ is continuous on an interval containing $t_0$, then the solution to the corresponding initial-value problem is unique.

The uniqueness of the solution to the initial-value problem follows from theorem 2.2.1. But perhaps the most important lesson to learn from this result is that a homogeneous linear first-order DE can always be solved. This is analogous to our experience with homogeneous linear systems of algebraic equations in chapter 1. In particular, note that by taking $K = 0$, the zero function ($y = 0$) is always a solution to $y' + p(t)y = 0$; in addition, the homogeneous linear first-order DE has infinitely many solutions. This is very similar to how, for a given matrix $A$, the homogeneous equation $Ax = 0$ always has the zero vector as a solution and, in the case where $A$ is singular, $Ax = 0$ has infinitely many solutions.

Having now completely addressed the case of a homogeneous linear first-order DE, we turn to the nonhomogeneous case.
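For reference, the steps of example 2.3.1 that lead to theorem 2.3.1 can be written out with $p(t)$ in place of $1 + 3t^2$. This is a sketch rather than a full proof: it assumes $y \neq 0$ on the interval of interest so that division by $y$ is permitted, and it writes $\ln|y|$ to allow for negative solutions, a refinement the example does not make.

```latex
\begin{align*}
y' + p(t)\,y &= 0 \\
\frac{y'}{y} &= -p(t) \\
\int \frac{y'}{y}\,dt &= -\int p(t)\,dt \\
\ln|y| &= -P(t) + C \\
y &= \pm e^{C} e^{-P(t)} = K e^{-P(t)}
\end{align*}
```

Differentiating confirms the result: since $P' = p$, we have $y' = -p(t)Ke^{-P(t)} = -p(t)y$. The excluded case $y = 0$ corresponds to $K = 0$.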
In particular, we are interested in solving the equation (2.3.7) y + p(t )y = f (t ) where f (t ) is not identical to zero. Recalling the product rule from calculus, d [v(t ) · y ] = v(t )y + v (t )y (2.3.8) dt we observe that the left-hand side of (2.3.7), y + p(t )y, looks similar to the right-hand side of (2.3.8). If we multiply both sides of (2.3.7) by an unknown

142

First-order differential equations

function $v(t)$, we have

$$v(t)y' + v(t)p(t)y = v(t)f(t) \tag{2.3.9}$$

Next, we observe that if $v(t)$ is a function such that $v'(t) = v(t)p(t)$, then it follows from the product rule that (2.3.9) has the form

$$\frac{d}{dt}[v(t)y] = v(t)f(t) \tag{2.3.10}$$

We assume temporarily that such a function $v(t)$ exists; we will proceed to discuss more about $v(t)$ shortly. Integrating both sides of (2.3.10), we now see that

$$v(t)y = \int v(t)f(t)\,dt \tag{2.3.11}$$

To solve for $y$, we divide both sides by $v(t)$, yielding

$$y(t) = \frac{1}{v(t)} \int v(t)f(t)\,dt \tag{2.3.12}$$

Prior to (2.3.10), we stipulated a condition on $v$ that enabled us to proceed. In particular, we noted that "if $v(t)$ is a function such that $v'(t) = v(t)p(t)$," then we could find a solution in terms of $v$. Observe that the differential equation $v$ satisfies is, in fact, a homogeneous linear first-order equation itself ($v' - p(t)v = 0$), and therefore its solution is

$$v(t) = Ke^{P(t)},$$

where $P(t) = \int p(t)\,dt$. Since we only need one such nonzero function $v$ to proceed, we set $K = 1$. From this and our conclusion in (2.3.12), we have determined that

$$y(t) = e^{-P(t)} \int e^{P(t)} f(t)\,dt \tag{2.3.13}$$

where $P(t) = \int p(t)\,dt$. The function $v(t) = e^{P(t)}$ is usually called an integrating factor. We next consider two examples of nonhomogeneous linear first-order differential equations and apply the method we just derived to solve them.
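The integrating-factor formula (2.3.13) can also be evaluated numerically. The following is a minimal sketch, not part of the original text: it assumes the definite-integral form of (2.3.13) with $P(t_0) = 0$, so that $y(t) = e^{-P(t)}\left(y_0 + \int_{t_0}^{t} e^{P(s)} f(s)\,ds\right)$, and approximates both integrals with the trapezoid rule. The function name `solve_linear_ivp` and the test equation $y' + y = t$, $y(0) = 0$ (exact solution $y = t - 1 + e^{-t}$) are illustrative choices.

```python
import math

def solve_linear_ivp(p, f, t0, y0, t, n=2000):
    """Approximate y(t) for y' + p(t) y = f(t), y(t0) = y0.

    Uses the integrating factor v(s) = exp(P(s)), where P(s) is the
    integral of p from t0 to s, so that
        y(t) = exp(-P(t)) * (y0 + integral from t0 to t of v(s) f(s) ds).
    Both integrals are approximated with the composite trapezoid rule.
    """
    h = (t - t0) / n
    P = 0.0          # running value of P(s); P(t0) = 0
    integral = 0.0   # running value of the integral of v(s) f(s)
    s = t0
    prev_p = p(s)
    prev_vf = math.exp(P) * f(s)
    for _ in range(n):
        s_next = s + h
        next_p = p(s_next)
        P += 0.5 * h * (prev_p + next_p)
        next_vf = math.exp(P) * f(s_next)
        integral += 0.5 * h * (prev_vf + next_vf)
        s, prev_p, prev_vf = s_next, next_p, next_vf
    return math.exp(-P) * (y0 + integral)

# Illustrative check on y' + y = t, y(0) = 0, whose exact solution is t - 1 + exp(-t).
approx = solve_linear_ivp(lambda s: 1.0, lambda s: s, 0.0, 0.0, 2.0)
exact = 2.0 - 1.0 + math.exp(-2.0)
print(approx, exact)
```

The numerical value should agree with the closed form to several decimal places; increasing `n` tightens the agreement.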

Example 2.3.2

Solve the differential equation $y' + 2y = 4$.

Solution. In this equation, $p(t) = 2$, and therefore $P(t) = 2t$. From (2.3.13), it follows that

$$y(t) = e^{-P(t)} \int e^{P(t)} f(t)\,dt = e^{-2t} \int e^{2t} \cdot 4\,dt$$

$$= e^{-2t}\left(2e^{2t} + C\right) \tag{2.3.14}$$

$$= 2 + Ce^{-2t} \tag{2.3.15}$$

Linear ﬁrst-order differential equations


There are several important observations to make from our work in example 2.3.2. First, the parentheses at (2.3.14) are essential. Without them, $e^{-2t}$ is not multiplied by the entire antiderivative, and the function $y$ would no longer be a solution to the given DE. A second is that if we had instead solved the corresponding homogeneous differential equation $y' + 2y = 0$, we would have found the so-called complementary solution $y_h = Ce^{-2t}$. Moreover, by observing that $y' = 4 - 2y = 2(2 - y)$, if we consider the function $y_p = 2$, it is apparent that $y_p$ is a solution to the nonhomogeneous equation $y' + 2y = 4$. In addition, if we omit the constant of integration $C$ in (2.3.14), it follows that the method derived in (2.3.13) can be viewed as producing a so-called particular solution $y_p$ that is a solution to the given nonhomogeneous linear first-order differential equation.

Thus we see that the method derived in (2.3.13) and implemented to find (2.3.15) ultimately expresses the solution to the original nonhomogeneous linear first-order DE in the form

$$y = y_p + y_h$$

where $y_p$ is a particular solution to the nonhomogeneous equation, while $y_h$ is the complementary solution, the solution to the corresponding homogeneous equation. This situation reminds us of one way to view the general solution to a system of linear equations given by $Ax = b$, where in (1.5.1) in section 1.5 we found that $x = x_p + x_h$. A further discussion of this property of linear first-order DEs will occur in theorem 2.3.2 to close the current section. Before doing so, we consider another example.

Example 2.3.3 Solve the nonhomogeneous first-order linear differential equation

$$y' + y \tan t = \cos t$$

In addition, solve the initial-value problem (IVP) that is given by the same DE and the initial condition $y(\pi/3) = 1$.

Solution. We first determine the integrating factor $v(t)$. Since $p(t) = \tan t$, it follows that

$$P(t) = \int \tan t\,dt = -\ln(\cos t)$$

Thus, $v(t) = e^{-\ln(\cos t)}$. Applying the integrating factor and using properties of exponential and logarithmic functions, we now observe that

$$y = e^{\ln(\cos t)} \int \cos t \cdot e^{-\ln(\cos t)}\,dt = \cos t \int \cos t \cdot \frac{1}{\cos t}\,dt = \cos t \int 1\,dt = \cos t\,(t + C)$$

Thus, the general solution to the given differential equation is $y = t\cos t + C\cos t$. To solve the corresponding IVP with the condition that $y(\pi/3) = 1$, it follows that $1 = \frac{\pi}{3} \cdot \frac{1}{2} + C \cdot \frac{1}{2}$, so that $C = 2 - \pi/3$. The solution is

$$y = t\cos t + (2 - \pi/3)\cos t$$

As in example 2.3.2, we note that the solution $y = t\cos t + C\cos t$ in example 2.3.3 is of the form $y = y_p + y_h$, where $y_h = C\cos t$ can easily be checked to be the solution to the corresponding homogeneous equation.

Two important results can now be stated in general. The first is a formal statement of our derivation in (2.3.12) that shows how we can use an integrating factor to solve any nonhomogeneous linear first-order DE. The second demonstrates that for any of these types of DEs, if $y_p$ is a particular solution to the nonhomogeneous DE and $y_h$ is a complementary solution to the corresponding homogeneous DE, then $y = y_p + y_h$ is also a solution to the nonhomogeneous DE.

Theorem 2.3.2 For any nonhomogeneous linear first-order differential equation of the form $y' + p(t)y = f(t)$, the general solution is

$$y = e^{-P(t)} \int e^{P(t)} f(t)\,dt$$

where $P(t) = \int p(t)\,dt$. Moreover, for the initial condition $y(t_0) = y_0$, if $p(t)$ and $f(t)$ are continuous on an interval containing $t_0$, then the solution to the corresponding initial-value problem is unique.

The proof of the first part of theorem 2.3.2 is given above in the discussion of (2.3.7)–(2.3.12). The uniqueness of the solution to the IVP follows from theorem 2.2.1. Finally, we observe that given a nonhomogeneous linear first-order differential equation $y' + p(t)y = f(t)$ and a particular solution $y_p$ (so $y_p' + p(t)y_p = f(t)$) and complementary solution $y_h$ to the corresponding homogeneous equation ($y_h' + p(t)y_h = 0$), it follows that

$$(y_p + y_h)' + p(t)(y_p + y_h) = y_p' + y_h' + p(t)y_p + p(t)y_h = (y_p' + p(t)y_p) + (y_h' + p(t)y_h) = f(t) + 0 = f(t)$$


Therefore, $y_p + y_h$ is also a solution to the nonhomogeneous DE. Formally, we have the following result.

Theorem 2.3.3 For any nonhomogeneous linear first-order differential equation

$$y' + p(t)y = f(t),$$

if $y_p$ is a particular solution to the nonhomogeneous equation and $y_h$ is a solution to the corresponding homogeneous equation, then $y = y_p + y_h$ is also a solution to the nonhomogeneous equation.
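The superposition property can be checked numerically on example 2.3.3. The sketch below is an illustration rather than part of the original text: it approximates derivatives with central differences (the helper names `derivative` and `lhs` and the sample points are arbitrary choices) and confirms that $y_p = t\cos t$ satisfies the nonhomogeneous equation, $y_h = C\cos t$ satisfies the homogeneous one, and their sum again satisfies the nonhomogeneous equation.

```python
import math

def derivative(func, t, h=1e-5):
    """Central-difference approximation of func'(t)."""
    return (func(t + h) - func(t - h)) / (2 * h)

C = 2 - math.pi / 3  # constant fixed by y(pi/3) = 1 in example 2.3.3

def y_p(t):
    return t * math.cos(t)   # particular solution

def y_h(t):
    return C * math.cos(t)   # complementary (homogeneous) solution

def y(t):
    return y_p(t) + y_h(t)   # their sum

def lhs(func, t):
    """Left-hand side y' + y tan t of the DE, evaluated for func."""
    return derivative(func, t) + func(t) * math.tan(t)

points = [0.1, 0.5, 1.0, 1.4]  # sample points where cos t > 0
err_particular = max(abs(lhs(y_p, t) - math.cos(t)) for t in points)
err_homogeneous = max(abs(lhs(y_h, t)) for t in points)
err_sum = max(abs(lhs(y, t) - math.cos(t)) for t in points)
print(err_particular, err_homogeneous, err_sum, y(math.pi / 3))
```

All three residuals should be close to zero, up to finite-difference error, and the last printed value should be close to 1, the prescribed initial value.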

Exercises 2.3

In exercises 1–6, classify each equation as linear or nonlinear. Do not attempt to solve the equations.

1. $y' + 7y = e^t$
2. $(\cos t)y' + (\sin t)y = t^2$
3. cos y + sin y = t^2
4. $ty' + t^2 y = t^3$
5. $y' y^2 = 3t$
6. $1 = y'/y$

In exercises 7–13, solve each of the given homogeneous linear first-order DEs.

7. $y' + y = 0$
8. $y' + 2y = 0$
9. $y' + ty = 0$
10. $y' + \frac{2}{t}y = 0$
11. $y' = -y\cot t$
12. $(1 + t^2)y' + 2ty = 0$
13. $y' = -\frac{2}{100 - t}y$

In exercises 14–20, solve each of the given nonhomogeneous linear first-order DEs.

14. $y' + y = 2$
15. $y' + 2y = 2t$
16. $y' + ty = 10t$
17. $y' + \frac{2}{t}y = e^t$
18. $y' = -(y - 1)\cot t$
19. $(1 + t^2)y' + 2ty = 2t$
20. $y' = 0.03 - \frac{2}{100 - t}y$

In exercises 21–27, solve each of the given initial-value problems.

21. $y' + y = 2$, $y(0) = 3$
22. $y' + 2y = 2t$, $y(1) = 0$
23. $y' + ty = 10t$, $y(0) = 5$
24. $y' + \frac{2}{t}y = e^t$, $y(1) = 4$, $t > 0$
25. $y' = -(y - 1)\cot t$, $y(\pi/2) = 1$, $0 < t < \pi$
26. $(1 + t^2)y' + 2ty = 2t$, $y(0) = 1$
27. $y' = 0.03 - \frac{2}{100 - t}y$, $y(0) = 1$

In exercises 28–33, plot a slope field in an appropriate window of $t$ and $y$ values for each of the given DEs. In addition, in the same window, plot the solution to each given IVP. Compare each graph to the solutions you found in the corresponding exercises 21–27.

28. $y' + y = 2$, $y(0) = 3$
29. $y' + 2y = 2t$, $y(1) = 0$
30. $y' + ty = 10t$, $y(0) = 5$
31. $y' + \frac{2}{t}y = e^t$, $y(1) = 4$, $t > 0$
32. $y' = -(y - 1)\cot t$, $y(\pi/2) = 1$, $0 < t < \pi$
33. $(1 + t^2)y' + 2ty = 2t$, $y(0) = 1$

34. With matrix multiplication, we noted that for any matrix $A$ and appropriately sized vectors $x$ and $y$, $A(x + y) = Ax + Ay$. In addition, for any constant $c$, $A(cx) = cAx$. We called these properties "the linearity of matrix multiplication." In calculus, we learn that the derivative operator, $D$, satisfies similar properties of linearity. In particular, if $f$ and $g$ are differentiable functions and $c$ is any constant, what can you say about $D(f + g)$ and $D(cf)$? (Recall that $D(f)$ is alternate notation for $f'$.)


2.4 Applications of linear ﬁrst-order differential equations

A large number of important physical situations can be modeled by linear first-order differential equations. In this section we introduce several such applications through examples and explore further scenarios in the exercises.

2.4.1 Mixing problems

Recall that in section 2.1, we encountered a problem where a saltwater solution was entering and exiting a city's water reservoir. Specifically, in (2.1.1) we encountered the DE

$$\frac{dA}{dt} = 10 - \frac{A(t)}{10}$$

This equation, rewritten in the form

$$A' + \frac{1}{10}A = 10$$

is a linear first-order DE that we now can easily solve. With $p(t) = 1/10$, the integrating factor is $v(t) = e^{t/10}$, and therefore

$$A = e^{-t/10} \int e^{t/10} \cdot 10\,dt \tag{2.4.1}$$

$$= e^{-t/10}\left(100e^{t/10} + C\right) \tag{2.4.2}$$

$$= 100 + Ce^{-t/10} \tag{2.4.3}$$

From this result, we can also confirm our previous observation that as $t \to \infty$, $A(t) \to 100$, for any solution $A(t)$ to the differential equation. Moreover, if we consider the initial condition $A(0) = 200$ stated along with the original problem in section 2.1, it follows that

$$A(t) = 100 + 100e^{-t/10}$$

Certainly we can consider a wide range of variations on this mixing problem by changing concentrations, flow rates, and tank volumes. In every such scenario, the most important thing to keep in mind is that the rate of change of salt (or whatever quantity is under consideration) is the difference between the rate of salt entering and the rate exiting. Furthermore, an analysis of units is often very helpful. We consider one more example to demonstrate what can occur when the entering and exiting solutions are flowing at different rates.

Example 2.4.1 Consider a tank in which 1 g of chlorine is initially present in 100 m³ of a solution of water and chlorine. A chlorine solution concentrated at 0.03 g/m³ flows into the tank at a rate of 1 m³/min, while the uniformly mixed solution exits the tank at 2 m³/min. At what time is the maximum amount of chlorine present in the tank, and how much is present?


Solution. To answer the questions posed, we set up and solve an IVP. We let $A(t)$ denote the amount of chlorine in the tank (in grams) at time $t$ (in minutes). We note from the inflow that the rate at which chlorine is entering the tank is given by

$$\text{rate in} = 1\,\frac{\text{m}^3}{\text{min}} \cdot 0.03\,\frac{\text{g}}{\text{m}^3} \tag{2.4.4}$$

For the exiting flow, we must compute the concentration of chlorine present in the solution leaving the tank. This concentration is given by the ratio of amount present in grams to the total volume of solution in the tank at time $t$. In this problem, note that the volume is changing as a function of time. In particular, since solution enters at 1 m³/min and exits at 2 m³/min, it follows that the volume $V(t)$ of solution present in the tank is decreasing at a rate of 1 m³/min. With 100 m³ initially present, we observe that $V(t) = 100 - t$ is the volume of solution in the tank at time $t$. Thus, the concentration of chlorine in the solution exiting the tank at time $t$ is $A(t)/V(t)$, and the rate at which chlorine leaves the tank is given by

$$\text{rate out} = 2\,\frac{\text{m}^3}{\text{min}} \cdot \frac{A(t)}{V(t)}\,\frac{\text{g}}{\text{m}^3} = \frac{2A(t)}{100 - t}\,\frac{\text{g}}{\text{min}} \tag{2.4.5}$$

It follows from (2.4.4) and (2.4.5) that the overall instantaneous rate of change of chlorine in the tank with respect to time is

$$\frac{dA}{dt} = \text{rate in} - \text{rate out} = 0.03 - \frac{2A}{100 - t}$$

Note that we also have the initial condition $A(0) = 1$. Rearranging the differential equation, we see that we must solve the nonhomogeneous linear first-order equation

$$A' + \frac{2}{100 - t}A = 0.03 \tag{2.4.6}$$

Applying the approach discussed in section 2.3, followed by the initial condition, it can be shown that the solution to (2.4.6) is

$$A(t) = 3 - 0.03t - 0.0002(100 - t)^2$$

From the quadratic nature of this solution, as well as from the direction field shown in figure 2.4, we can see that this function has a maximum value. It is a straightforward exercise to show that this maximum of $A(t)$ occurs when $t = 25$ min and that the maximum is $A = 1.125$ g.
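The claims at the end of example 2.4.1 can be confirmed with a few lines of arithmetic. The sketch below is an illustration, not part of the original text; the function names are arbitrary, and the derivative of $A(t)$ is computed by hand. It checks that the stated $A(t)$ satisfies (2.4.6) and the initial condition, and locates the maximum by solving $A'(t) = 0$.

```python
def amount(t):
    """Chlorine in the tank (grams) at time t (minutes), as stated in example 2.4.1."""
    return 3 - 0.03 * t - 0.0002 * (100 - t) ** 2

def amount_rate(t):
    """A'(t), obtained by differentiating amount(t) by hand."""
    return -0.03 + 0.0004 * (100 - t)

def rate_in_minus_out(t):
    """Right-hand side of the DE: 0.03 - 2A/(100 - t)."""
    return 0.03 - 2 * amount(t) / (100 - t)

# The two rates should agree wherever the model is valid (0 <= t < 100).
worst_gap = max(abs(amount_rate(t) - rate_in_minus_out(t)) for t in range(0, 100, 5))

# A'(t) = 0 gives 0.0004 * (100 - t) = 0.03, so 100 - t = 75.
t_max = 100 - 0.03 / 0.0004
print(worst_gap, amount(0), t_max, amount(t_max))
```

The gap between the two sides of the DE should vanish up to rounding, and the critical point should land at $t = 25$ with $A = 1.125$, in agreement with the values quoted above.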

Figure 2.4 Direction field for (2.4.6) with solution corresponding to the initial condition A(0) = 1.

2.4.2 Exponential growth and decay

A radioactive substance emits particles; in doing so, the substance decreases its mass. This process is known as radioactive decay. For example, the radioactive isotope carbon-14 emits particles and loses half its mass over a period of 5730 years. For any such isotope, the instantaneous rate of decay is proportional to the mass of the substance present at that instant. Thus, assuming an initial mass $M_0$ is present, it follows that the mass $M(t)$ of the substance at time $t$ must satisfy the initial-value problem

$$M' = -kM, \quad M(0) = M_0 \tag{2.4.7}$$

for some positive constant $k$. Note that the minus sign is present in (2.4.7) since the mass $M(t)$ is decreasing. It follows from our work with homogeneous linear first-order DEs in section 2.3 that the solution to this equation is

$$M(t) = M_0 e^{-kt} \tag{2.4.8}$$

Similarly, experiments show that a population with zero death rate (e.g., a colony of bacteria with sufficient food and no predators) grows at a rate proportional to the size of the population at time $t$. In particular, if $P(t)$ is the population present at time $t$ and $P_0$ is the initial population, then $P$ satisfies the initial-value problem $P' = kP$, $P(0) = P_0$, for some positive constant $k$. Here, it follows that

$$P(t) = P_0 e^{kt} \tag{2.4.9}$$

Problems involving radioactive decay and exponential population growth are very similar and should be familiar to students from past courses in calculus and precalculus. We include one example here for review and several more in the exercises at the end of the section.

Example 2.4.2 A radioactive isotope initially has 40 g of mass. After 10 days of radioactive decay, its mass is 39.7 g. What is the isotope's half-life? At what time $t$ will 1 g remain?

Solution. Because the isotope decays radioactively, we know that its mass $M(t)$ must have the form $M(t) = M_0 e^{-kt}$. To answer the questions posed, we must first determine the constant $k$. In the given problem, we know that $M_0 = 40$ and that $M(10) = 39.7$. It follows that

$$39.7 = 40e^{-10k}$$

Dividing both sides of the equation by 40, taking natural logs, and solving for $k$, we find that

$$k = -\frac{1}{10}\ln\frac{39.7}{40}$$

To compute the half-life, we now solve the equation $\frac{M_0}{2} = M_0 e^{-kt}$ for $t$. In particular, we have

$$20 = 40e^{\frac{1}{10}\ln\left(\frac{39.7}{40}\right)t}$$

Dividing by 40 and taking natural logs,

$$\ln\frac{1}{2} = \frac{1}{10}\ln\left(\frac{39.7}{40}\right)t$$

so

$$t = \frac{\ln\frac{1}{2}}{\frac{1}{10}\ln\frac{39.7}{40}}$$

Thus the half-life of the isotope is approximately 921 days. Finally, to determine when 1 g of the substance will remain, we simply solve the equation

$$1 = 40e^{\frac{1}{10}\ln\left(\frac{39.7}{40}\right)t}$$

Doing so shows that $t \approx 4900$ days.
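The arithmetic in example 2.4.2 is easy to reproduce. The following sketch is an illustration, not part of the original text; the variable names are arbitrary. It computes the decay constant from the two mass measurements and then solves $M_0 e^{-kt} = m$ for $t$ with $m = M_0/2$ and $m = 1$.

```python
import math

M0 = 40.0          # initial mass in grams
M10 = 39.7         # mass after 10 days

# From 39.7 = 40 * exp(-10 k):
k = -math.log(M10 / M0) / 10.0

# Solving M0 * exp(-k t) = m for t gives t = ln(M0 / m) / k.
half_life = math.log(2.0) / k
t_one_gram = math.log(M0 / 1.0) / k

print(k, half_life, t_one_gram)
```

The printed half-life and the time to reach 1 g should round to the values quoted in the example (about 921 days and about 4900 days).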

2.4.3 Newton’s law of Cooling

Suppose that $T(t)$ is the temperature of a body immersed in a cooler surrounding medium such as air or water. Sir Isaac Newton postulated (and experiments confirm) that the body will lose heat at a rate proportional to the difference between its present temperature and the temperature of its surroundings. If we assume that the temperature of the surrounding medium is constant, say $T_m$, and that the warmer body's initial temperature is $T(0) = T_0$, then Newton's law of Cooling can be expressed through the initial-value problem

$$T' = -k(T - T_m), \quad T(0) = T_0 \tag{2.4.10}$$

Written in the standard form of a nonhomogeneous linear first-order DE, we find that $T$ satisfies the IVP

$$T' + kT = kT_m, \quad T(0) = T_0 \tag{2.4.11}$$
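As a sketch of the standard computation, applying the integrating-factor formula (2.3.13) to (2.4.11) with $p(t) = k$ and $f(t) = kT_m$ (so that $P(t) = kt$) proceeds along the following lines, where $C$ denotes the constant of integration:

```latex
\begin{aligned}
T(t) &= e^{-kt} \int e^{kt}\, k T_m \, dt \\
     &= e^{-kt} \left( T_m e^{kt} + C \right) \\
     &= T_m + C e^{-kt}, \\
T(0) &= T_0 \;\Longrightarrow\; C = T_0 - T_m .
\end{aligned}
```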


Solving this problem in the standard way reveals that the temperature of the cooling body must satisfy

$$T(t) = (T_0 - T_m)e^{-kt} + T_m \tag{2.4.12}$$

We consider an example with some particular details given in order to analyze the behavior of the temperature function.

Example 2.4.3 A can of soda at room temperature 70°F is placed in a refrigerator that maintains a constant temperature of 40°F. After 1 hour in the refrigerator, the temperature of the soda is 58°F. At what time will the soda's temperature be 41°F?

Solution. Let $T(t)$ denote the temperature of the soda at time $t$ in degrees F; note that $T_0 = 70$. Since the surrounding temperature is 40, $T$ satisfies the initial-value problem

$$T' = -k(T - 40), \quad T(0) = 70$$

and therefore by (2.4.12) $T$ has the form

$$T(t) = 30e^{-kt} + 40$$

In particular, note that the temperature is decreasing exponentially as time increases and tending towards 40°F, the temperature of the refrigerator, as $t \to \infty$. To determine the constant $k$, we use the additional given information that $T(1) = 58$, and therefore

$$58 = 30e^{-k} + 40$$

It follows that $e^{-k} = 3/5$, and thus $k = \ln(5/3)$. To now answer the original question, we solve the equation

$$41 = 30e^{-\ln(5/3)t} + 40$$

and find that $t = \ln(30)/\ln(5/3) \approx 6.658$ h.

Exercises 2.4

1. A population of bacteria is growing at a rate proportional to the number of cells present at time $t$. If initially 100 million cells are present and after 6 hours 300 million cells are present, what is the doubling time of the population? At what time will 100 billion cells be present?
2. The half-life of a radioactive element is 2000 years. What percentage of its original mass is left after 10 000 years? After 11 000 years?
3. The evaporation rate of moisture from a sheet hung on a clothesline is proportional to the sheet's moisture content. If one half of the moisture evaporates in the first 30 min, how long will it take for 95 percent of the moisture to evaporate?


4. A population of 200 million people is observed to grow at a rate proportional to the population present and to be increasing at a rate of 2 percent per year. How long will it take for the population to triple?
5. In a certain lake, wildlife biologists determine that the walleye population is growing very slowly. In particular, they conclude that the population growth is modeled by the differential equation $P' = 0.002P$, where $P$ is measured in thousands of walleye, and time $t$ is measured in years. The biologists estimate that the initial population of walleye in the lake is 100 000 fish. To enhance the fishery, the department of conservation begins planting walleye fingerlings in the lake at a rate of 5000 walleye per year.
(a) Write an IVP that the population $P(t)$ of walleye in the lake in year $t$ will satisfy under the assumption that walleye are being added to the lake at a rate of 5000 fish per year.
(b) Solve the IVP stated in (a).
(c) In 20 years, how many more walleye will be in the lake than if the biologists had not planted any fish?
6. Solve the IVP $A' = 0.03 - \frac{2}{100 - t}A$, $A(0) = 1$, in order to verify the stated solution in example 2.4.1.
7. Brine (saltwater) is entering a 25 m³ tank at a flow rate of 0.25 m³/min and at a concentration of 6 g/m³. The uniformly mixed solution exits the tank at a rate of 0.25 m³/min. Assume that initially there are 15 m³ of solution in the tank at a concentration of 3 g/m³.
(a) State an IVP that is satisfied by $A(t)$, the amount of salt in grams in the tank at time $t$.
(b) What will happen to the amount of salt in the tank as $t \to \infty$? Why?
(c) Plot a direction field for the IVP stated in (a), including a plot of the solution.
(d) At exactly what time will there be 75 g of salt present in the tank?
8. Brine is entering a 25 m³ tank at a flow rate of 0.5 m³/min and at a concentration of 6 g/m³. The uniformly mixed solution exits the tank at a rate of 0.25 m³/min. Assume that initially there are 5 m³ of solution in the tank at a concentration of 25 g/m³.
(a) State an IVP that is satisfied by the amount of salt $A(t)$ in grams in the tank at time $t$.
(b) Solve the IVP stated in (a). For what values of $t$ is this problem valid? Why?
(c) At exactly what time will the least amount of salt be present in the tank? How much salt will there be at that time?
(d) Plot a direction field for the IVP stated in (a), including a plot of the solution. Discuss why this direction field and the solution make sense in the physical context of the problem.


9. A body of water is polluted with mercury. The lake has a volume of 200 million cubic meters and mercury is present in a concentration of 5 grams per million cubic meters. Health officials state that any level above 1 g per million cubic meters is considered unsafe. If water unpolluted by mercury flows into the lake at a rate of 0.5 million cubic meters per day, and uniformly mixed lake water flows out of the lake at the same rate, how long will it take for the lake to reach a mercury concentration that is considered safe?
10. An average person takes eighteen breaths per minute and each breath exhales 0.0016 m³ of air that contains 4 percent more carbon dioxide (CO₂) than was inhaled. At the start of a seminar containing 300 participants, the room air contains 0.4 percent CO₂. The ventilation system delivers 10 m³ of fresh air per minute to the room whose volume is 1500 m³. Find an expression for the concentration level of CO₂ in the room as a function of time; assume that air is leaving the room at the same rate that it enters.
11. Solve the general Newton's law of Cooling IVP $T' = -k(T - T_m)$, $T(0) = T_0$ in order to verify the solution stated in (2.4.12).
12. A potato at room temperature of 72°F is placed in an oven set at 350°F. After 30 min, the potato's temperature is 105°F. At what time will the potato reach a temperature of 165°F?
13. An object at a temperature of 80°C is placed in a refrigerator maintained at 5°C. If the temperature of the object is 75°C at 20 min after it is placed in the refrigerator, determine the time (in hours) the object will reach 10°C.
14. An object at a temperature of 9°C is placed in a refrigerator that is initially at 5°C. At the same time the object is placed in the refrigerator, the refrigerator's thermostat is adjusted in order to raise the temperature inside from 5°C to 10°C; the function that governs the temperature of the refrigerator is $R(t) = \frac{10}{1 + e^{-0.75t}}$.
(a) Using the refrigerator's temperature constant $k$ from exercise 13, modify Newton's law of Cooling appropriately to state an IVP whose solution is the temperature of the object.
(b) Plot a direction field for the IVP from (a) and sketch an approximate solution to the IVP.
(c) Discuss the qualitative behavior of the solution to the IVP. Estimate the minimum temperature the object achieves.
15. On a cold, winter evening with an outdoor temperature of 4°F, a home's furnace fails at 10 pm. At the time of the furnace failure, the indoor temperature was 68°F. At 2 am, the indoor temperature was 60°F.


Assuming the outside temperature remains constant, at what time will the homeowner have to begin to worry about pipes freezing due to an indoor temperature below 32°F?

2.5 Nonlinear ﬁrst-order differential equations

So far in our work with differential equations, we have seen that linear first-order differential equations have many interesting properties. One is that any IVP that corresponds to a linear first-order DE (with reasonably well-behaved functions $p(t)$ and $f(t)$) is guaranteed to have a unique solution. In addition, through our development of integrating factors, we have a method by which we can always (at least in theory) determine a solution for the differential equation.

Any differential equation that is not linear is called nonlinear. Thus, nonlinear differential equations constitute every other type of equation we can conceive. Unfortunately, nonlinear equations are (in general) far more difficult to solve than linear ones. We will limit ourselves in this section to considering a few relatively common special cases of nonlinear first-order differential equations that can be solved analytically. In section 2.6, we will consider qualitative and approximation techniques that enable us to gain valuable information from a nonlinear initial-value problem, even in the event that we cannot solve it explicitly.

2.5.1 Separable equations

In example 2.3.1 in section 2.3, we solved the differential equation $y' = -(1 + 3t^2)y$. While this equation is linear, our method provides insight into how to approach a class of nonlinear equations whose structure is similar. We begin by considering a slightly modified example.

Example 2.5.1

Solve the nonlinear first-order differential equation

$$y' = -(1 + 3t^2)y^2 \tag{2.5.1}$$

Solution. Following our approach in example 2.3.1, we can separate the variables $y$ and $t$ algebraically to arrive at the equation

$$y^{-2}\frac{dy}{dt} = -1 - 3t^2$$

Integrating both sides of this equation with respect to $t$,

$$\int (y(t))^{-2}\frac{dy}{dt}\,dt = \int (-1 - 3t^2)\,dt \tag{2.5.2}$$

The left-hand side may be simplified to $\int y^{-2}\,dy$. Thus, evaluating each integral in (2.5.2), we find that

$$-y^{-1} = -t - t^3 + C \tag{2.5.3}$$


We note again that since an arbitrary constant of integration arises on each side, it suffices to include just one. It is essential here to observe that by successfully integrating, we have removed the presence of $y'$ in the equation, and now have only an algebraic, rather than differential, equation in $t$ and $y$. Solving (2.5.3) algebraically for $y$, it follows that

$$y = \frac{1}{t + t^3 - C}$$
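A quick numerical check of example 2.5.1 is sketched below. It is an illustration, not part of the original text. It takes the right-hand side of the DE to be $-(1 + 3t^2)y^2$, consistent with (2.5.2) and (2.5.3), solves (2.5.3) for $y$ to get $y = 1/(t + t^3 - C)$, and compares a central-difference derivative against the right-hand side. The value $C = -1$ and the sample points are arbitrary choices that keep the denominator positive.

```python
def y(t, C):
    """Candidate solution obtained by solving -1/y = -t - t**3 + C for y."""
    return 1.0 / (t + t**3 - C)

def residual(t, C, h=1e-6):
    """y'(t) + (1 + 3 t^2) y(t)^2, with y' from a central difference."""
    dy = (y(t + h, C) - y(t - h, C)) / (2 * h)
    return dy + (1 + 3 * t**2) * y(t, C) ** 2

C = -1.0  # hypothetical constant; the denominator t + t^3 + 1 stays positive for t >= 0
worst = max(abs(residual(t, C)) for t in [0.0, 0.5, 1.0, 1.5, 2.0])
print(worst)
```

A residual near zero at every sample point is consistent with the function being a solution; it is a sanity check rather than a proof.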

The strategy of example 2.5.1 may be applied to any differential equation of the form $y' = f(t, y)$ where $f(t, y)$ can be decomposed into a product of a function of $t$ only and a function of $y$ only. That is, if we can write

$$f(t, y) = g(t) \cdot h(y)$$

then we are able to separate the variables in the equation, writing all of the $y$-terms on one side (multiplied by $y'$), and writing all of the $t$-terms on the other. Any differential equation of the form $y' = g(t) \cdot h(y)$ is said to be separable. We attempt to solve a separable differential equation by separating the variables and writing

$$\frac{1}{h(y)}\,y' = g(t) \tag{2.5.4}$$

Writing $y'$ in the alternate notation $dy/dt$, we have

$$\frac{1}{h(y)}\frac{dy}{dt} = g(t) \tag{2.5.5}$$

Hence when we integrate both sides of (2.5.5) with respect to $t$, we find

$$\int \frac{1}{h(y)}\,dy = \int g(t)\,dt$$

Now, all of this work is only useful if we arrive at integrals we can actually evaluate. For example, if the left-hand side is $\int \sqrt{\sin y}\,dy$, we are really no closer to solving for $y$ than we were when considering the initial differential equation. In section 2.6, we will address ways to approximate the solution of such equations that we seem unable to solve analytically. For now, we consider a few examples of separable equations that we can solve, with more to follow in the exercises.

Example 2.5.2 Find a family of solutions to the differential equation

$$y' = e^{t + 2y}\,t$$

and a solution to the corresponding initial-value problem with the condition that $y(1) = 1$.


Solution. First, we may write $e^{t + 2y} = e^t e^{2y}$. Thus, we have

$$y' = e^t e^{2y}\,t$$

Separating the variables, it follows that

$$e^{-2y}\frac{dy}{dt} = te^t$$

Integrating both sides with respect to $t$, we may now write

$$\int e^{-2y}\,dy = \int te^t\,dt$$

Using integration by parts on the right and evaluating both integrals, we have

$$-\frac{1}{2}e^{-2y} = (t - 1)e^t + C$$

To now solve algebraically for $y$, we first multiply both sides by $-2$. Since $C$ is an arbitrary constant, $-2C$ is just another constant, one that we will denote by $C_1$. Hence

$$e^{-2y} = -2(t - 1)e^t + C_1$$

Taking logarithms and solving for $y$, we can conclude that

$$y = -\frac{1}{2}\ln\left(-2(t - 1)e^t + C_1\right)$$

is the family of functions that provides the general solution to the original DE. To solve the corresponding IVP with $y(1) = 1$, we observe that

$$y(1) = -\frac{1}{2}\ln\left(-2(1 - 1)e^1 + C_1\right) = -\frac{1}{2}\ln(C_1) = 1$$

so $\ln(C_1) = -2$, and therefore $C_1 = e^{-2}$. The solution to the IVP is

$$y = -\frac{1}{2}\ln\left(-2(t - 1)e^t + e^{-2}\right)$$

Example 2.5.3

Is the following differential equation linear or nonlinear?

$$ty' + y^2 = 4$$

Classify the equation, and solve it to find a general family of solutions.

Solution. We note that the given equation is nonlinear due to the presence of $y^2$ in the equation; said differently, the left-hand side is not a linear combination of $y$ and $y'$. To separate the variables, we first write

$$ty' = 4 - y^2$$

Dividing both sides by $t(4 - y^2)$, it follows that

$$\frac{1}{4 - y^2}\frac{dy}{dt} = \frac{1}{t}$$

and therefore

$$\int \frac{dy}{4 - y^2} = \int \frac{dt}{t}$$

Evaluating both integrals, noting that the left-hand side requires integration by partial fractions or a table of integrals, we have

$$\frac{1}{4}\ln\left(\frac{y + 2}{y - 2}\right) = \ln t + C$$

It only remains to solve for $y$ algebraically. Using rules of logarithms and letting $C = \ln K$, we can write

$$\ln\left(\frac{y + 2}{y - 2}\right)^{1/4} = \ln(Kt)$$

It now follows that

$$\left(\frac{y + 2}{y - 2}\right)^{1/4} = Kt$$

Raising both sides to the fourth power, multiplying by $(y - 2)$, and solving for $y$ yields

$$y = 2\,\frac{(Kt)^4 + 1}{(Kt)^4 - 1}$$

2.5.2 Exact equations

We will consider one other type of nonlinear differential equation that may be solved analytically. We explore this through an example. Let us solve the DE

$$(2 + t^2 y)y' + ty^2 = 0$$

We first observe that this equation is neither linear nor separable. The former is clear from the presence of $y^2$ and $yy'$; the latter is less obvious, but nonetheless true since the presence of the term $(2 + t^2 y)$ makes it impossible to separate the variables $t$ and $y$. We therefore explore another algebraic approach. Considering the derivative in differential notation, we have

$$(2 + t^2 y)\frac{dy}{dt} + ty^2 = 0$$

and thus we may instead write

$$(ty^2)\,dt + (2 + t^2 y)\,dy = 0 \tag{2.5.6}$$

This form may remind us of the total differential $d\phi$ of a function $\phi(t, y)$, as studied in multivariable calculus. Recall that for a differentiable function $\phi(t, y)$, its total differential $d\phi$ is given by

$$d\phi = \phi_t\,dt + \phi_y\,dy$$

where $\phi_t = \partial\phi/\partial t$ and $\phi_y = \partial\phi/\partial y$. Note, therefore, from (2.5.6) that if there exists a function $\phi$ such that $\phi_t = ty^2$ and $\phi_y = 2 + t^2 y$, then (2.5.6) is actually


of the form $d\phi = 0$, from which it follows that $\phi(t, y) = K$, for some constant $K$. Assuming that we can find the function $\phi(t, y)$, we have then transformed the original differential equation in $t$ and $y$ to an algebraic equation in $t$ and $y$, one that we can hopefully solve for $y$. In the current example, let us suppose that such a function $\phi(t, y)$ exists, and therefore that

$$\frac{\partial\phi}{\partial t} = ty^2 \tag{2.5.7}$$

and

$$\frac{\partial\phi}{\partial y} = 2 + t^2 y \tag{2.5.8}$$

Integrating both sides of (2.5.7) with respect to $t$, it follows that

$$\phi(t, y) = \frac{1}{2}t^2 y^2 + g(y)$$

The function $g(y)$ arises since the partial derivative with respect to $t$ of any function of only $y$ is zero. For $\phi$ to satisfy the condition in (2.5.8), we see that we must take the partial derivative with respect to $y$ of our most recent result and set this equal to $2 + t^2 y$. Doing so, we find that

$$\frac{\partial\phi}{\partial y} = t^2 y + g'(y) = 2 + t^2 y$$

Therefore, $g'(y) = 2$, so $g(y) = 2y$, and we have found that

$$\phi(t, y) = \frac{1}{2}t^2 y^2 + 2y$$

Since it is the case that $d\phi = 0$, we know that $\phi(t, y) = K$, and therefore $t$ and $y$ are related by the algebraic equation

$$\frac{1}{2}t^2 y^2 + 2y = K$$

From the quadratic formula, it follows that

$$y = \frac{-2 \pm \sqrt{4 + 2Kt^2}}{t^2}$$

and we have solved the original equation. The choice of "+" or "−" in the solution would depend on the value given in an initial condition.

There are several important lessons to learn from this example. One is some terminology. If a differential equation can be written in the form

$$M(t, y)\,dt + N(t, y)\,dy = 0 \tag{2.5.9}$$

and there exists a function $\phi(t, y)$ such that $\phi_t(t, y) = M(t, y)$ and $\phi_y(t, y) = N(t, y)$, then since the differential equation is of the form $d\phi(t, y) = 0$, we say that the equation is exact. So, certainly a first check of whether an equation might be exact consists in trying to write it in the form of (2.5.9). Still, there is the issue of whether or


not $\phi$ exists. If $\phi$ does exist, and we further assume that $M(t, y)$ and $N(t, y)$ have continuous first-order partial derivatives, then it follows from Clairaut’s Theorem in multivariable calculus that

$$M_y(t, y) = \phi_{ty} = \phi_{yt} = N_t(t, y)$$

Thus, if (2.5.9) is exact, then it must be the case that $M_y = N_t$. Said differently, if $M_y \neq N_t$, then the differential equation is not exact. In fact, it turns out that if $M_y = N_t$, then the equation is guaranteed to be exact, but this result is much more difficult to prove. As a consequence of this, it suffices for us to check if $M_y = N_t$ as a first step; if so, the equation is indeed exact and we then proceed to try to find the function $\phi$ in order to solve the differential equation. If not, another approach is needed. An example is instructive.

Example 2.5.4 Solve the differential equation

$$\frac{t}{y}\, y' + \ln(ty) + 1 = 0$$

Solution. We begin by observing that this equation is neither linear nor separable. Thus, writing the derivative in differential notation, we have

$$\frac{t}{y}\,\frac{dy}{dt} + \ln(ty) + 1 = 0$$

and then rearranging algebraically,

$$(\ln(ty) + 1)\,dt + \frac{t}{y}\,dy = 0 \qquad (2.5.10)$$

Letting $M(t, y) = \ln(ty) + 1$ and $N(t, y) = t/y$, we observe that

$$M_y = \frac{1}{ty} \cdot t = \frac{1}{y} \quad \text{and} \quad N_t = \frac{1}{y}$$

and therefore, $M_y = N_t$. Hence the differential equation is exact and we can assume that a function $\phi$ exists such that $\phi_t = M(t, y)$ and $\phi_y = N(t, y)$. Since the latter equation is more elementary, we consider $\phi_y = t/y$, and integrate both sides with respect to $y$. Doing so, we find that

$$\phi(t, y) = t \ln y + h(t) \qquad (2.5.11)$$

From (2.5.10), $\phi$ must also satisfy $\phi_t = \ln(ty) + 1$, so we take the partial derivative of both sides of (2.5.11) with respect to $t$ to find that

$$\phi_t = \ln y + h'(t) = \ln(ty) + 1$$

From this and properties of the logarithm, we observe that $\ln y + h'(t) = \ln t + \ln y + 1$ and thus $h'(t) = \ln t + 1$. It follows (integrating by parts and simplifying) that $h(t) = t \ln t$. Thus, we have demonstrated that the original equation is indeed


First-order differential equations

exact by finding $\phi(t, y) = t \ln y + t \ln t = t \ln(ty)$. From here, we now know that $\phi(t, y) = K$, and so

$$t \ln(ty) = K$$

Solving for $y$, we have that

$$y = \frac{1}{t}\, e^{K/t}$$
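The work in Example 2.5.4 can be spot-checked numerically. The following is only a sketch, not part of the text's method: it uses central differences (the helper name `partial`, the step `eps`, and the sample points are all illustrative assumptions) to compare $M_y$ with $N_t$, and then confirms that $\phi(t, y) = t\ln(ty)$ stays constant along $y = e^{K/t}/t$.

```python
import math

def M(t, y):
    # M(t, y) = ln(ty) + 1 from (2.5.10)
    return math.log(t * y) + 1.0

def N(t, y):
    # N(t, y) = t / y from (2.5.10)
    return t / y

def phi(t, y):
    # Candidate potential function phi(t, y) = t ln(ty)
    return t * math.log(t * y)

def partial(func, t, y, wrt, eps=1e-6):
    # Central-difference estimate of a partial derivative (illustrative helper)
    if wrt == "t":
        return (func(t + eps, y) - func(t - eps, y)) / (2 * eps)
    return (func(t, y + eps) - func(t, y - eps)) / (2 * eps)

def exactness_gap(t, y):
    # |M_y - N_t|, which should be near zero for an exact equation
    return abs(partial(M, t, y, "y") - partial(N, t, y, "t"))

def solution(t, K):
    # Explicit solution y = e^{K/t} / t found in the example
    return math.exp(K / t) / t

for t, y in [(0.5, 2.0), (1.0, 1.0), (2.0, 0.7)]:
    print(t, y, exactness_gap(t, y))

K = 1.5  # arbitrary constant chosen for the check
print([phi(t, solution(t, K)) for t in (0.5, 1.0, 2.0)])
```

Each printed gap should be close to zero, and each value in the final list should be close to the chosen constant $K$, up to rounding.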

Exercises 2.5

Classify each of the DEs in exercises 1–14 as linear, nonlinear, separable, or exact. Note that it is possible for an equation to satisfy more than one classification.

1. $y' = 10y$
2. $y' = 10y + 10$
3. $y' = 10y^2$
4. $y' = 10y^2 - 10$
5. $t^2 y' + y^2 = 1$
6. $e^{3t+y}\,\dfrac{dy}{dt} = 1$
7. $t\,dy - (y - 1)\,dt = 0$
8. $\dfrac{dy}{dt} = \dfrac{5ty - t}{4 + t^2}$
9. $y - t\,\dfrac{dy}{dt} = 6 - 3t^2\,\dfrac{dy}{dt}$
10. $\dfrac{dy}{dt} = \dfrac{-2ty}{t^2 + 1}$
11. $(2 + t^2)\,y' + 2ty = 0$
12. $3y^2 y' + t^2 = 0$
13. $(y + t)\,y' + y = t$
14. $y' \sin 2t + 2y \cos 2t = 0$

Solve each of the DEs in exercises 15–28.

15. $y' = 10y$
16. $y' = 10y + 10$
17. $y' = 10y^2$
18. $y' = 10y^2 - 10$


19. $t^2 y' + y^2 = 1$
20. $e^{3t+y}\,\dfrac{dy}{dt} = 1$
21. $t\,dy - (y - 1)\,dt = 0$
22. $\dfrac{dy}{dt} = \dfrac{5ty - t}{4 + t^2}$
23. $y - t\,\dfrac{dy}{dt} = 6 - 3t^2\,\dfrac{dy}{dt}$
24. $\dfrac{dy}{dt} = \dfrac{-2ty}{t^2 + 1}$
25. $(2 + t^2)\,y' + 2ty = 0$
26. $3y^2 y' + t^2 = 0$
27. $(y + t)\,y' + y = t$
28. $y' \sin 2t + 2y \cos 2t = 0$

Solve each of the IVPs stated in exercises 29–42. In addition, use a computer algebra system to plot an appropriate direction field for each, and sketch your solution within the plot.

29. $y' = 10y$, $y(0) = 3$
30. $y' = 10y + 10$, $y(0) = 2$
31. $y' = 10y^2$, $y(1) = 4$
32. $y' = 10y^2 - 10$, $y(1) = -1$
33. $t^2 y' + y^2 = 1$, $y(2) = 0$
34. $e^{3t+y}\,\dfrac{dy}{dt} = 1$, $y(0) = 0$
35. $t\,dy - (y - 1)\,dt = 0$, $y(1) = 3$
36. $\dfrac{dy}{dt} = \dfrac{5ty - t}{4 + t^2}$, $y(1) = 1$
37. $y - t\,\dfrac{dy}{dt} = 6 - 3t^2\,\dfrac{dy}{dt}$, $y(1) = 5$
38. $\dfrac{dy}{dt} = \dfrac{-2ty}{t^2 + 1}$, $y(0) = 4$
39. $(2 + t^2)\,y' + 2ty = 0$, $y(1) = 1$
40. $3y^2 y' + t^2 = 0$, $y(0) = 1$


41. $(y + t)\,y' + y = t$, $y(0) = 1$
42. $y' \sin 2t + 2y \cos 2t = 0$, $y(\pi/4) = 1/2$
43. Consider the IVP $y' = \sqrt{y}$, $y(0) = 0$. Show that this IVP has more than one solution. Does this result contradict theorem 2.2.1?

2.6 Euler’s method

While we have learned to solve certain classes of differential equations explicitly (including linear first-order, separable, and exact equations), we must also develop the ability to estimate solutions to initial-value problems that we cannot solve analytically. Direction fields will play a key role in motivating our work, as we see in the following introductory example. Consider the initial-value problem

$$\frac{dy}{dt} + y^2 = t, \quad y(0) = 1 \qquad (2.6.1)$$

This DE is not linear due to the presence of $y^2$. In addition, since we can write $y' = t - y^2$, we see that the right-hand side may not be expressed as a product of two functions that each involve just one of the variables $t$ and $y$. Thus, the equation is not separable. Finally, writing the equation in the form $dy + (y^2 - t)\,dt = 0$, it is straightforward to check that this equation is not exact.

While it may seem frustrating to not be able to use any of the solution methods we have discussed so far, it is important to realize that many differential equations cannot be solved explicitly by analytic techniques. As such, we must explore how we can use our understanding of derivatives to estimate certain values of the solution to an IVP.

For the given DE, writing $y' = t - y^2$, we can generate the direction field that is shown in figure 2.5. For the initial condition $y(0) = 1$, visually estimating how the solution $y(t)$ will flow through the direction field, we can roughly estimate that $y(1/2) \approx 0.75$. But if we think about the calculus underpinnings of slope fields, we can be much more precise in our estimate.

Recall that a direction field for a DE $y' = f(t, y)$ is created by observing that the slope of the tangent line to the solution curve $y(t)$ at the point $(t_0, y_0)$ is $f(t_0, y_0)$. In the current example, we know that the solution to the IVP must pass through the point $(t_0, y_0) = (0, 1)$. At this point, the slope of the tangent line to the solution curve is $m = 0 - 1^2 = -1$; note also that $m \approx \Delta y / \Delta t$, where $\Delta y$ is the exact change in $y$ from $t = 0$ to $t = 1/2$, due to the fact that the tangent line approximates the solution curve for values near the point of tangency. Thus, as we step from $t_0 = 0$ to $t = 1/2$, a change of $\Delta t = 1/2$ in the $t$-direction will generate an approximate change $\Delta y = \Delta t \cdot m = 1/2 \cdot (-1) = -1/2$ in $y$. Therefore, from our original $y$-value of 1, a change of $-1/2$ leads us to the approximation that $y(1/2) \approx 1/2$.


[Figure 2.5 The direction field for (2.6.1); axes $t$ and $y(t)$.]

[Figure 2.6 Taking one step to estimate $y(0.5)$ in (2.6.1); axes $t$ and $y(t)$.]

Graphically, this estimation approach amounts to following the tangent line to the solution curve for some prescribed change in t . We can see this in ﬁgure 2.6, where it is immediately evident that our estimate is too small. In calculus, we learn that while the tangent line approximation to a differentiable function is good near the point of tangency, the approximation gets poorer and poorer the further we move from the point of tangency. Thus, a natural approach to the estimation problem at hand is to take a smaller step, then search the direction ﬁeld for a new direction to follow, and then take another small step. In this situation, we are much like a hiker lost in the woods who is attempting to navigate by compass: just as the hiker is best served by checking a compass frequently, so are we best served by checking slopes frequently.


[Figure 2.7 Two steps of size 0.25 to estimate $y(0.5)$ in (2.6.1); axes $t$ and $y(t)$.]

So, rather than stepping the full distance of 1/2 from $t = 0$ to $t = 1/2$, let us first step to $t = 1/4$, find an estimate to $y(1/4)$, and then proceed from there to estimate $y(1/2)$. Starting at $(0, 1)$, we know that the slope of the tangent line to the solution curve at this point is $m_0 = f(0, 1) = -1$. Stepping $\Delta t = 0.25$, it follows that we experience a change in $y$ along the tangent line of $\Delta y = m_0 \Delta t = -1(0.25) = -0.25$. Thus, we have that $y(0.25) \approx y(0) + \Delta y = 1 - 0.25 = 0.75$.

Now we repeat this process from the point $(0.25, 0.75)$. At this point, the slope of the tangent line to the solution curve is $m_1 = f(0.25, 0.75) = 0.25 - (0.75)^2 = -0.3125$. Taking a step of $\Delta t = 0.25$, it follows that the change in $y$ along the tangent line will be $\Delta y = m_1 \Delta t = -0.3125(0.25) = -0.078125$. Thus we have that $y(0.5) \approx 0.75 - 0.078125 = 0.671875$. We record our work graphically in figure 2.7, where our improved approximation is apparent, though the estimate is still too small.

It is evident from our work in this first example that we can significantly improve our ability to estimate an initial-value problem’s solution at various $t$-values by developing an iterative process that uses reasonably small step sizes. In particular, we want to imitate the way in which we took two steps, but rather be able to take $n$ steps using a step size of $\Delta t = h$. Throughout, the key idea is always that we are estimating the solution function by determining its tangent line at a given point, and then following the tangent line for the determined step size. We observe that when moving along any line from a given point $(t_{\text{old}}, y_{\text{old}})$ to a new point $(t_{\text{new}}, y_{\text{new}})$, it follows that

$$y_{\text{new}} = y_{\text{old}} + \Delta y = y_{\text{old}} + \frac{\Delta y}{\Delta t} \cdot \Delta t = y_{\text{old}} + m \cdot \Delta t \qquad (2.6.2)$$
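The one-step and two-step estimates of $y(1/2)$ worked out above can be reproduced with a short Python sketch. The function name `follow_tangent_lines` is an illustrative choice, not something from the text; the update inside the loop is relation (2.6.2) with $m$ taken from $y' = t - y^2$.

```python
def f(t, y):
    # Right-hand side of y' = t - y^2, from (2.6.1)
    return t - y ** 2

def follow_tangent_lines(t0, y0, t_end, n_steps):
    # Take n_steps equal steps from t0 to t_end, each along the current tangent line
    dt = (t_end - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        m = f(t, y)      # slope of the tangent line at the current point
        y = y + m * dt   # y_new = y_old + m * dt, as in (2.6.2)
        t = t + dt
    return y

print(follow_tangent_lines(0.0, 1.0, 0.5, 1))  # one step of size 0.5
print(follow_tangent_lines(0.0, 1.0, 0.5, 2))  # two steps of size 0.25
```

The two printed values correspond to the estimates 1/2 and 0.671875 obtained by hand; raising `n_steps` shows how the estimate changes as the step size shrinks.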

Another essential observation to make is that the slope $m$ at each step of our approximation is given by $m = y' = f(t, y)$ in the differential equation that we are attempting to solve. In particular, if we have some approximation at time $t_k$ given by $y_k$, the slope of the tangent line to the solution curve at this point is given by $f(t_k, y_k)$. Therefore, using this value for $m$ in (2.6.2) and letting $h = \Delta t$ be the step size, we now have

$$y_{\text{new}} = y_{\text{old}} + h f(t_{\text{old}}, y_{\text{old}}) \qquad (2.6.3)$$

Hence, starting from the initial condition $(t_0, y_0)$, we are able to generate the sequence of points $(t_1, y_1), \ldots, (t_n, y_n)$, where for each $n \geq 0$,

$$t_{n+1} = t_n + h \quad \text{and} \quad y_{n+1} = y_n + h f(t_n, y_n) \qquad (2.6.4)$$

The value $y_n$ is an approximation of the exact solution value $y(t_n)$ at each step, so that $y_n \approx y(t_n)$ for each $n \geq 1$. This method of approximating the solution to an initial-value problem is known as Euler’s method.

Example 2.6.1 For the initial-value problem

$$\frac{dy}{dt} + y^2 = t, \quad y(0) = 1$$

that we have just considered, apply Euler’s method to estimate the value of $y(1/2)$ using $h = 0.1$.

Solution. At the end of this section, the implementation of Euler’s method in a spreadsheet such as Excel will be discussed. Here, we simply report the results of such a computer implementation. If we use a step size of $h = 0.1$, we see that we will take five steps to move from $t_0 = 0$ to $t_5 = 0.5$, the point at which we seek to approximate $y$. Doing so yields the output shown in table 2.1.

Table 2.1 Euler’s method applied to the IVP $y' = t - y^2$, $y(0) = 1$, using $h = 0.1$

    t_n     y_n
    0       1
    0.1     0.9
    0.2     0.829
    0.3     0.7802759
    0.4     0.749392852
    0.5     0.733233887

With just five steps, we can see in the direction field in figure 2.8, together with a piecewise linear plot of the approximate solution, that we have an apparently good estimate in the above table for how the actual solution to this IVP behaves on this interval.

[Figure 2.8 Five steps of size $h = 0.1$ to estimate $y(0.5)$; axes $t$ and $y(t)$.]

In the example we have been considering with various step sizes, one shortcoming is that we do not have a precise sense of how accurate our approximations are. One way to explore this issue is to apply Euler’s method to an IVP that we can solve exactly, and then compare our estimates with actual solution values. We do so in the following example.

Example 2.6.2 Solve the IVP $y' = y - t$, $y(0) = 0.5$ exactly, and use Euler’s method with the step sizes $h = 0.2$ and $h = 0.1$ to estimate the value of $y(1)$. Hence analyze the effect that step size has on error in the method.

Solution. We first observe that $y' = y - t$ is a linear first-order DE. Applying our work from section 2.3, we can determine that the solution to this equation is $y = 1 + t + Ce^t$. The initial condition $y(0) = 0.5$ then implies that $C = -1/2$, so that the solution to the IVP is

$$y(t) = 1 + t - \frac{e^t}{2}$$

If we apply Euler’s method with $h = 0.2$ and take 5 steps to determine $y_n$ at each, and also evaluate $y(t_n)$ at each stage, the resulting output is shown in table 2.2. Here, we observe the obvious pattern that the further we step away from the initial condition, the greater the error we encounter. This is a natural consequence of the use of linear approximations.

To get a further sense of how the error at a given step depends on step size, we now apply the same method with $h = 0.1$. Doing so produces the results in table 2.3. For ease of display and comparison to the case where $h = 0.2$, we only report the results from every other step. By comparing the approximations in the preceding two tables at the common values of $t = 0.2, 0.4, 0.6, 0.8, 1$, we can see that cutting the step size in half appears to have reduced the error by a factor of approximately 2.


Table 2.2 Euler’s method applied to the IVP $y' = y - t$, $y(0) = 0.5$, using $h = 0.2$

    t_n     Euler est. y_n    Solution y(t_n)    Error |y(t_n) − y_n|
    0       0.5               0.5                0
    0.2     0.6               0.5892986          0.0107014
    0.4     0.68              0.6540877          0.0259123
    0.6     0.736             0.6889406          0.0470594
    0.8     0.7632            0.6872295          0.0759705
    1.0     0.75584           0.6408591          0.1149809

Table 2.3 Euler’s method applied to the IVP $y' = y - t$, $y(0) = 0.5$, using $h = 0.1$

    t_n     Euler est. y_n    Solution y(t_n)    Error |y(t_n) − y_n|
    0       0.5               0.5                0
    0.2     0.595             0.5892986          0.0057014
    0.4     0.66795           0.6540877          0.0138623
    0.6     0.7142195         0.6889406          0.0252789
    0.8     0.728205595       0.6872295          0.0409761
    1       0.70312877        0.6408591          0.0622697
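The comparison in tables 2.2 and 2.3 can be reproduced with a minimal Python sketch of rule (2.6.4). The helper names `euler`, `exact`, and `error_at_one` are assumptions made for illustration; the text itself carries out these computations in a spreadsheet.

```python
import math

def euler(f, t0, y0, h, n_steps):
    # Apply y_{n+1} = y_n + h f(t_n, y_n) and return the lists of t_n and y_n
    ts, ys = [t0], [y0]
    for n in range(n_steps):
        ys.append(ys[-1] + h * f(ts[-1], ys[-1]))
        ts.append(t0 + (n + 1) * h)  # recompute t_n from n to limit rounding drift
    return ts, ys

def f(t, y):
    # Right-hand side of y' = y - t
    return y - t

def exact(t):
    # Exact solution y(t) = 1 + t - e^t / 2 of the IVP with y(0) = 0.5
    return 1.0 + t - 0.5 * math.exp(t)

def error_at_one(h):
    # Absolute error of the Euler estimate of y(1) for step size h
    n_steps = round(1.0 / h)
    ts, ys = euler(f, 0.0, 0.5, h, n_steps)
    return abs(exact(ts[-1]) - ys[-1])

e_coarse = error_at_one(0.2)
e_fine = error_at_one(0.1)
print(f"h = 0.2: error at t = 1 is {e_coarse:.7f}")
print(f"h = 0.1: error at t = 1 is {e_fine:.7f}")
print(f"ratio of errors: {e_coarse / e_fine:.3f}")
```

The two errors should agree with the last rows of tables 2.2 and 2.3, and their ratio is somewhat below 2, consistent with the observation that halving the step size roughly halves the error.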

In fact, there are sophisticated ways by which we can analyze the error of Euler’s method in general; we explore these and related issues in depth in chapter 7 on numerical methods. And while Euler’s method can give us an intuitive sense for how a solution is behaving locally, we must note here that its error grows too fast to make it reliable. More sophisticated algorithms for numerically estimating solutions to differential equations exist; several of these are developed in chapter 7.

2.6.1 Implementing Euler’s method in Excel

Any spreadsheet program provides a straightforward way to implement Euler’s method. In our calculations, we will use Microsoft Excel. Recall that in Euler’s method, given an initial-value problem $y' = f(t, y)$, $y(t_0) = y_0$, we seek approximations $y_1, y_2, \ldots$ such that $y_n \approx y(t_n)$, where $t_n = t_0 + nh$ for some chosen step size $h$. In particular, we use the rule

$$y_{n+1} = y_n + h f(t_n, y_n)$$


In a given row of the spreadsheet, we will view the data (as labeled in the cells below) step number $n$, step size $h$, $t$-value $t_n$, approximate current $y$-value $y_n$, slope $f(t_n, y_n)$, and updated $y$-value $y_{n+1}$. We will demonstrate the development of such an Excel spreadsheet for the particular example $y' = t - y^2$, $y(0) = 1$ using a step size of $h = 0.1$. To begin, we establish names for the various columns, say in cells A1, B1, C1, D1, E1, and F1, by entering the text “n”, “h”, etc., in the respective cells shown below.

         A      B      C       D       E              F
    1    n      h      t_n     y_n     f(t_n,y_n)     y_n+1

In row 2, we now enter the given data at step zero. In particular, in cell A2 we enter the step number (“0”), in B2 the chosen step size (“0.1”), in C2 the starting $t$-value (“0”), in D2 the starting $y$-value (“1”), and in E2, we apply the function $f(t, y)$ to get the slope at the point at this step. That is, since in this IVP $f(t, y) = t - y^2$, we enter in E2 the command “=C2 - D2^2”. We now also have enough information entered to compute $y_1$ in cell F2. Using the rule from Euler’s method, we know $y_1 = y_0 + h f(t_0, y_0)$. In our spreadsheet, this implies we must enter “=D2 + B2*E2”. Doing so, the result ($y_1 = 0.9$) appears in cell F2. Now our spreadsheet should appear as shown.

         A      B      C       D       E              F
    1    n      h      t_n     y_n     f(t_n,y_n)     y_n+1
    2    0      0.1    0       1       −1             0.9

In row 3, we may now build subsequent entries based on existing data. To increase the step number, in A3 we enter “=A2 + 1”. Since the step size stays constant throughout, in B3 we input “=B2”. Because the next $t$-value will be the preceding $t$-value plus the step size ($t_1 = t_0 + h$), we enter in C3 the command “=C2 + B2”. We also have the next $y$-value, so in D3 we enter “=F2” to have this data available in the given row. The slope at step 1 is computed according to the same rule (given by $f(t, y)$) as it was at step 0. Hence in cell E3 we simply paste a copy of cell E2, which ensures that Excel uses the same computations, but updates them for the current step. Equivalently, we can directly enter in E3 the text “=C3 - D3^2”. Cell F3 computes the newest $y$-value: the same rule as in step 0 must be followed, so we can copy and paste cell F2 into F3, or equivalently enter in F3 “=D3 + B3*E3”.


At this stage, we see on the screen the following.

         A      B      C       D       E              F
    1    n      h      t_n     y_n     f(t_n,y_n)     y_n+1
    2    0      0.1    0       1       −1             0.9
    3    1      0.1    0.1     0.9     −0.71          0.829

Now we can harness the power of Excel to compute as many subsequent steps as we like. By using the mouse to highlight row 3 (cells A3 through F3), and then placing the cursor on the bottom right corner of cell F3, we can then click and drag downward to fill subsequent rows with similar calculations. For example, doing so through row 7 (i.e., down to F7) yields the following table.

         A      B      C       D              E               F
    1    n      h      t_n     y_n            f(t_n,y_n)      y_n+1
    2    0      0.1    0       1              -1              0.9
    3    1      0.1    0.1     0.9            -0.71           0.829
    4    2      0.1    0.2     0.829          -0.487241       0.7802759
    5    3      0.1    0.3     0.7802759      -0.30883048     0.749392852
    6    4      0.1    0.4     0.749392852    -0.161589647    0.733233887
    7    5      0.1    0.5     0.733233887    -0.037631934    0.729470694
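For readers working without a spreadsheet, the same row-by-row layout can be mirrored in Python. This is a sketch under the assumption that each tuple plays the role of one spreadsheet row (columns A through F); the function name `euler_rows` is hypothetical and not tied to any Excel feature.

```python
def f(t, y):
    # Slope rule for this IVP, matching the Excel formula "=C2 - D2^2"
    return t - y ** 2

def euler_rows(t0, y0, h, n_rows):
    # Build rows (n, h, t_n, y_n, f(t_n, y_n), y_{n+1}), like spreadsheet rows 2, 3, ...
    rows = []
    y = y0
    for n in range(n_rows):
        t = t0 + n * h
        slope = f(t, y)
        y_next = y + h * slope  # matches the Excel formula "=D2 + B2*E2"
        rows.append((n, h, t, y, slope, y_next))
        y = y_next              # matches entering "=F2" in cell D3
    return rows

rows = euler_rows(0.0, 1.0, 0.1, 6)
print(f"{'n':>2} {'h':>5} {'t_n':>5} {'y_n':>12} {'f(t_n,y_n)':>13} {'y_n+1':>12}")
for n, h, t, y, slope, y_next in rows:
    print(f"{n:>2} {h:>5} {t:>5.1f} {y:>12.9f} {slope:>13.9f} {y_next:>12.9f}")
```

Changing the third argument of `euler_rows` plays the same role as editing cell B2: every subsequent row is recomputed with the new step size.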

Besides the ease of iteration past the ﬁrst two rows, there are further advantages Excel offers. One is that changing one appropriately-chosen cell will update all of our computations. For example, if we are interested in the change induced by a different step size, say h = 0.05, all we need to do is enter “0.05” in cell B2, and every other cell will update accordingly. In addition, if we desire to see the graphical results of our work, we can use Excel’s Chart Wizard. To plot our approximations, we can simultaneously highlight the t and y columns in our chart above (cells C2 through C7 and D2 through D7), and then go to Insert menu and select Chart (alternatively, we may click on the Chart Wizard icon on the toolbar). In the prompt window that arises, we choose “XY (Scatter)” and select one of the graph style options at the right by clicking on the desired one. By clicking “Next” in a few subsequent windows (in which advanced users can avail themselves of more options), we eventually get to a ﬁnal window where our graph appears and the option to “Finish.” Clicking on “Finish,” the graph will appear in the spreadsheet and may be moved around


by clicking and dragging it accordingly. We see the resulting plot displayed as in figure 2.9.

[Figure 2.9 An Excel plot of an approximate solution to the IVP $y' = t - y^2$, $y(0) = 1$, for $0 \leq t \leq 0.5$.]

Exercises 2.6

1. Consider the IVP $y' = t/y$, $y(1) = 3$ (where we assume that $y$ is always positive).
(a) Program Excel to use Euler’s method to determine an estimate of the value of $y(3)$. Do so using a step size of $h = 0.2$. Show the results in a table and create an appropriate plot of the approximate solution.
(b) Use an established solution method to determine an algebraic formula for the unique solution $y(t)$ for the given IVP. Then determine $y(t_n)$ exactly and use Excel to determine the error in your approximation at each step $n$. Finally, compare a plot of $y(t)$ to your plot of the approximation above.
(c) Use a computer algebra system appropriately to plot a direction field for the given differential equation. By hand, sketch a solution that satisfies the above IVP. Compare your work in (a) and (b) to the direction field.

2. Consider the IVP $y' = (1 - t)(1 + y)$, $y(0) = 2$.
(a) Program Excel to use Euler’s method to determine an estimate to the value of $y(1.6)$. Do so using step sizes of $h = 0.2$ and $h = 0.1$. Show the results in a table and create an appropriate plot of the approximate solution.


(b) Use an established solution method to determine an algebraic formula for the unique solution $y(t)$ for the given IVP. Then determine $y(t_n)$ exactly and use Excel to determine the error in your approximation at each step $n$. Finally, compare a plot of $y(t)$ to your plot of the approximation above.
(c) Use a computer algebra system appropriately to plot a direction field for the given differential equation. By hand, sketch a solution that satisfies the above IVP. Compare your work in (a) and (b) to the direction field.

3. Consider the IVP $y' = (t - y)^2/4$, $y(0) = 1/2$.
(a) Program Excel to use Euler’s method to determine an estimate to the value of $y(1.5)$. Do so using step sizes of $h = 0.1$ and $h = 0.05$. Show the results in a table and create an appropriate plot of the approximate solution.
(b) Explain why you cannot solve the given IVP explicitly.
(c) Use a computer algebra system appropriately to plot a direction field for the given differential equation. By hand, sketch a solution that satisfies the above IVP. Compare your work in (a) to the direction field.

4. Consider the IVP $y' = e^t - \dfrac{2}{t}\,y$, $y(1) = 4$, $t > 0$.
(a) Program Excel to use Euler’s method to determine an estimate to the value of $y(2.2)$. Do so using step sizes of $h = 0.1$ and $h = 0.05$. Show the results in a table and create an appropriate plot of the approximate solution.
(b) Use an established solution method to determine an algebraic formula for the unique solution $y(t)$ for the given IVP. Then determine $y(t_n)$ exactly and use Excel to determine the error in your approximation at each step $n$. Finally, compare a plot of $y(t)$ to your plot of the approximation above.
(c) Use a computer algebra system appropriately to plot a direction field for the given differential equation. By hand, sketch a solution that satisfies the above IVP. Compare your work in (a) and (b) to the direction field.
In each of exercises 5–10, find an approximate solution to the stated IVP by using Euler’s method with $h = 0.1$ on the interval $[0, 1]$. In addition, find an exact solution and compare the values and plots of the approximate and exact solutions.

5. $y' + 2ty = 0$, $y(0) = -2$
6. $y' = 2y - 1$, $y(0) = 2$
7. $y' - y = 0$, $y(0) = 2$


8. $(y')^2 + 2y = 0$, $y(0) = 2$
9. $y' y^2 = 8$, $y(0) = 1$
10. $(t + 1)\,y y' = -1 - y^2$, $y(0) = 2$

In each of exercises 11–14, find an approximate solution to the stated IVP by using Euler’s method with $h = 0.1$ on the interval $[0, 1]$. In addition, explain why it is not possible to solve the IVP exactly by established methods.

11. $(y')^2 - 2y^2 = t$, $y(0) = 2$
12. $y' - \sin y = 2e^t$, $y(0) = 0$
13. $y' + y^3 = t^3$, $y(0) = 2$
14. $(t + 1)\,y y' = -1 - y^2 - t^2$, $y(0) = 2$

2.7 Applications of nonlinear first-order differential equations

In this section, we explore two examples of nonlinear differential equations. It is important to recall that if an equation is nonlinear, it is possible that we may not be able to solve for the solution function explicitly. Regardless, we can use direction fields to qualitatively understand the behavior of solution curves; furthermore, if we are unable to find an exact solution function, we may employ Euler’s method to generate approximate solutions.

2.7.1 The logistic equation

We have recently learned that if a population is assumed to grow at a constant relative growth rate (or in a way such that the rate of change of the population is proportional to the size of the population), then the population function satisfies the initial-value problem

$$P' = kP, \quad P(0) = P_0$$

This leads to the familiar population model $P(t) = P_0 e^{kt}$, which is also studied in algebra and calculus courses. While this model is a natural one, it is also unrealistic: over significant periods of time, the function $P$ will grow to values that become unreasonable since the function exhibits unbounded growth. Therefore, we now explore a more plausible population model.

Let us assume we know that a given population $P$ has the tendency over time to level off at a value $A$. The value $A$ is often called the carrying capacity of the population; as the name indicates, it is the maximum population sustainable by the surrounding environment. It is natural to further assume that if $P$ is close to, but less than $A$, then $dP/dt$ will be small and positive, indicating that the population will be growing slowly. Similarly, if $P$ is close to, but greater than $A$, we will want $dP/dt$ to be negative and close to zero, so that the population will be decreasing slowly.


At the same time, we want to maintain the natural inherent exponential characteristic of growth, so when $P$ is relatively small (in comparison to $A$), we would like for $dP/dt$ to be approximately $kP$ for some appropriate constant $k$. The combination of all these criteria led Belgian mathematician Pierre Verhulst (1804–1849) to propose the differential equation

$$\frac{dP}{dt} = kP\left(1 - \frac{P}{A}\right) \qquad (2.7.1)$$

as a more realistic model of population growth, where $k$ and $A$ are positive constants. Equation (2.7.1) is known as the logistic differential equation. That the logistic equation may be solved in general (to determine an explicit solution $P$ involving $k$ and $A$) will be shown in the exercises. We consider here a specific example where $k$ and $A$ are given to provide further insight into the behavior of solutions to this equation.

Example 2.7.1 A population $P(t)$ exhibits logistic growth according to the model

$$\frac{dP}{dt} = 0.05P\left(1 - \frac{P}{75}\right), \quad P(0) = 10$$

(a) Determine the values of $P$ for which $P$ is an increasing function
(b) Plot the direction field for the differential equation
(c) Determine the value(s) of $P$ for which $P$ is increasing most rapidly
(d) Solve the IVP explicitly for $P$

Solution.
(a) To determine where $P$ is increasing, we require that $dP/dt > 0$. If $P < 0$, note that $(1 - P/75) > 0$, which makes $dP/dt < 0$, so we need $P > 0$ and $(1 - P/75) > 0$ to make $dP/dt$ positive. This occurs on the interval $0 < P < 75$, so for these $P$ values, $P$ is an increasing function of $t$. We note further that if $P > 75$ or $P < 0$, then $dP/dt < 0$ and $P$ is a decreasing function. Finally, it is evident that both $P = 0$ and $P = 75$ are equilibrium solutions, which makes sense given the physical interpretation of the population model.

(b) Using familiar commands in Maple, we can plot the direction field for this differential equation. Note in advance the behavior we expect from our work above: two equilibrium solutions at 0 and 75, plus certain increasing and decreasing behavior. Finally, note that our analysis of the equation suggests a good range of values to select for $P$ when plotting, say, $P = -10 \ldots 100$. As always, some experimentation with $t$ may be necessary to get a useful plot. The plot is shown in figure 2.10.


[Figure 2.10 The slope field for $dP/dt = 0.05P(1 - P/75)$; axes $t$ and $P(t)$.]

(c) To decide where $P$ is increasing most rapidly, we seek the maximum value of $P'$. Graphically, we can observe in figure 2.10 that this appears to occur approximately halfway between $P = 0$ and $P = 75$. This is reasonable in light of the physical meaning of the logistic equation, since at this point the population has accumulated some substantial numbers to increase its growth rate, while not being close enough to the carrying capacity to have its growth slowed. We can determine this point of greatest increase in $P$ analytically as well. Note that

$$P' = 0.05P\left(1 - \frac{P}{75}\right) = 0.05P - \frac{0.05}{75}P^2 \approx 0.05P - 0.000667P^2$$

so that $P'$ is determined by a quadratic function of $P$. We have already observed that this quadratic function has zeros at the equilibrium solutions ($P = 0$ and $P = 75$), and furthermore, we know that every quadratic function achieves its extremum (a maximum in this case, since the function $g(P) = 0.05P - \frac{0.05}{75}P^2$ is concave down) at the midpoint of its zeros. Hence, $P'$ is maximized precisely when $P = 75/2$.

(d) Our final task is to solve the given initial-value problem explicitly for $P$. We first solve the differential equation

$$\frac{dP}{dt} = 0.05P\left(1 - \frac{P}{75}\right)$$

for $P$. Note that this equation is separable and nonlinear. Separating variables, we first write

$$\frac{dP}{P(1 - P/75)} = 0.05\,dt \qquad (2.7.2)$$

Because the left-hand side is a rational function of $P$, we may use the method of partial fractions to integrate the left-hand side of (2.7.2). Observe that

$$\frac{1}{P(1 - P/75)} = \frac{75}{P(75 - P)}$$

Now, letting

$$\frac{75}{P(75 - P)} = \frac{A}{P} + \frac{B}{75 - P}$$

it follows that $A = 1$ and $B = 1$, so that (2.7.2) may now be written as

$$\left(\frac{1}{P} - \frac{1}{P - 75}\right) dP = 0.05\,dt \qquad (2.7.3)$$

Integrating both sides of (2.7.3), we find that $P$ must satisfy the equation

$$\ln|P| - \ln|P - 75| = 0.05t + C$$

Using a standard property of logarithms, the left-hand side may be expressed as $\ln\left(|P|/|P - 75|\right)$, and hence using the definition of the natural logarithm, it follows that

$$\left|\frac{P}{P - 75}\right| = e^{0.05t + C} = Ke^{0.05t}$$

where $K = e^C$. Since $K$ is an arbitrary constant, the sign of $K$ will absorb the ± that arises from the presence of the absolute value signs, and thus we may write

$$\frac{P}{P - 75} = Ke^{0.05t}$$

Multiplying both sides by $P - 75$ and expanding, we see that

$$P = PKe^{0.05t} - 75Ke^{0.05t}$$

and gathering all terms involving $P$ on the left,

$$P(1 - Ke^{0.05t}) = -75Ke^{0.05t}$$

Thus, it follows that

$$P = \frac{-75Ke^{0.05t}}{1 - Ke^{0.05t}}$$

Multiplying the top and bottom of the right-hand side by $-1/(Ke^{0.05t})$, it follows that

$$P = \frac{75}{1 - Me^{-0.05t}}$$

where $M = 1/K$. In this final form, it is evident that as $t \to \infty$, $P(t) \to 75$, which fits with the given carrying capacity in the original problem. At this point, we can use the initial condition $P(0) = 10$ to solve for $M$; doing so results in the equation $10 = 75/(1 - M)$, which yields that $M = -13/2$, and thus

$$P = \frac{75}{1 + \frac{13}{2} e^{-0.05t}}$$

A plot of this function (shown in figure 2.11), along with comparison to our work throughout this example, demonstrates that our solution is correct.


[Figure 2.11 The solution $P = 75/\left(1 + \frac{13}{2} e^{-0.05t}\right)$ to the IVP $dP/dt = 0.05P(1 - P/75)$, $P(0) = 10$.]

For the general logistic differential equation

$$\frac{dP}{dt} = kP\left(1 - \frac{P}{A}\right)$$

an argument similar to the one we just completed can be used to show that the solution to this equation is

$$P(t) = \frac{A}{1 + Me^{-kt}},$$

where $M$ is a constant that may be determined by an initial condition. This fact will be shown in exercise 1 for this section.
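As a hedged numerical check of Example 2.7.1, the sketch below compares the closed-form solution $P = 75/(1 + \frac{13}{2}e^{-0.05t})$ with an Euler approximation of the same IVP. The function names, the step size $h = 0.1$, and the end time $t = 50$ are illustrative assumptions rather than choices made in the text.

```python
import math

K_RATE = 0.05     # growth constant k from Example 2.7.1
CAPACITY = 75.0   # carrying capacity A from Example 2.7.1
M_CONST = 6.5     # the constant 13/2 found from P(0) = 10

def logistic_rhs(P):
    # dP/dt = 0.05 P (1 - P/75)
    return K_RATE * P * (1.0 - P / CAPACITY)

def logistic_exact(t):
    # Closed-form solution P(t) = 75 / (1 + (13/2) e^{-0.05 t})
    return CAPACITY / (1.0 + M_CONST * math.exp(-K_RATE * t))

def logistic_euler(P0, h, t_end):
    # Euler's method applied to the logistic IVP from t = 0 to t = t_end
    P = P0
    for _ in range(round(t_end / h)):
        P = P + h * logistic_rhs(P)
    return P

approx = logistic_euler(10.0, 0.1, 50.0)
print(f"exact P(50)  = {logistic_exact(50.0):.4f}")
print(f"Euler P(50)  = {approx:.4f}")
print(f"growth rate at P = 37.5: {logistic_rhs(37.5):.4f}")
```

The two values of $P(50)$ should be close, and evaluating `logistic_rhs` at populations on either side of 37.5 gives smaller growth rates, in line with part (c) of the example.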

2.7.2 Torricelli’s law

Suppose that a water tank has a hole in its base with area $a$, through which water is flowing. Let $h(t)$ be the depth of the water and $V(t)$ be the volume of water in the tank at time $t$. At what rates are $h(t)$ and $V(t)$ changing?

Evangelista Torricelli (1608–1647) discovered what has come to be known as Torricelli’s law, which describes the way water in an open tank will flow through a small hole in the bottom. To develop this law, let us consider⁶ how water molecules will rearrange themselves as water exits the tank and the relationship between the potential and kinetic energy of a small mass $m$ of water. The potential energy lost as a small mass $m$ of water falls from a height $h > 0$ is $mgh$, where $g$ is the gravitational constant; at the same time, the kinetic energy gained as an equal mass $m$ exits the tank is $\frac{1}{2}mv^2$, where $v$ is the velocity at which the water is flowing. Equating the potential and kinetic energy, we find that $mgh = \frac{1}{2}mv^2$, so that

$$v = \sqrt{2gh}$$

⁶ Our approach follows that of R. D. Driver in “Torricelli’s law: An Ideal Example of an Elementary ODE,” Amer. Math. Monthly, 105(5) (May 1998), pp. 453–455.

This model assumes that no friction is present; a slightly more realistic model takes a fraction of this velocity, depending on the viscosity. For simplicity, we will consider the ideal case where friction is not considered. If we now consider the water exiting the tank, it follows that the rate of change $dV/dt$ of volume in the tank is determined by the product of the area $a$ of the hole and the exiting water’s velocity $v$. In other words,

$$\frac{dV}{dt} = -av = -a\sqrt{2gh} \qquad (2.7.4)$$

At this point, observe that we have related the rate of change of volume to the height of the water in the tank at time $t$. Instead, we desire to either relate $dV/dt$ and $V$ or $dh/dt$ and $h$. Of course, height and volume are related. If we assume that $A(y)$ denotes the tank’s cross sectional area at height $y$, then integral calculus tells us that the volume of the tank up to height $h$ is given by

$$V(h) = \int_0^h A(y)\,dy$$

Furthermore, by the Fundamental Theorem of Calculus, differentiating $V(h)$ implies $dV/dh = A(h)$, and thus by the chain rule,

$$\frac{dV}{dt} = \frac{dV}{dh}\,\frac{dh}{dt} = A(h)\,\frac{dh}{dt}$$

Using this new expression for $dV/dt$ in (2.7.4), it follows that

$$A(h)\,\frac{dh}{dt} = -a\sqrt{2gh} \qquad (2.7.5)$$

which is a differential equation in $h$. In particular, this nonlinear equation predicts, given a tank of a particular shape (as determined by $A(h)$) with a hole of area $a$, the behavior of the function $h(t)$ that describes the height of the water at time $t$. We explore this further in the following example.

Example 2.7.2 For a cylindrical tank of height 2 m and radius 0.3 m, filled to the top with water, how long does it take the tank to drain once a hole of diameter 4 cm is opened?

Solution. In this situation, the cross sectional area $A(h)$ of the tank at height $h$ is constant because each cross section is a circle of radius 0.3, so that $A(h) = 0.09\pi$. In addition, the area of the hole in square meters is $a = \pi(0.02)^2 = 0.0004\pi$, and the gravitational constant is $g = 9.8$ m/s$^2$. Since we have already established that $A(h)\,dh/dt = -a\sqrt{2gh}$, we therefore conclude that $h$ satisfies the equation
\[ 0.09\pi\,\frac{dh}{dt} = -0.0004\pi\sqrt{19.6h} \]

Figure 2.12 The slope field for $dh/dt = -0.019676\sqrt{h}$ (horizontal axis $t$, vertical axis $h(t)$).

Simplifying, it follows that
\[ \frac{dh}{dt} = -0.019676\sqrt{h} \]
Separating variables, we have $h^{-1/2}\,dh = -0.019676\,dt$, and upon integrating, it follows that $2h^{1/2} = -0.019676t + C$. Thus,
\[ h(t) = (C_0 - 0.009838t)^2 \]
Because $h(0) = 2$, $C_0 = \sqrt{2}$. Furthermore, with $h(t) = (\sqrt{2} - 0.009838t)^2$, we can see that $h(t) = 0$ when $t = 143.75$ sec, at which time the tank is empty. A plot of $h(t)$ confirms precisely the behavior observed in the direction field in figure 2.12.
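As a rough numerical cross-check of this example, one can integrate $dh/dt = -K\sqrt{h}$ directly and compare with the closed form. The sketch below is only illustrative: the names `K`, `exact_depth`, `rk4_depth`, and `empty_time` are assumptions introduced here, and the classical Runge-Kutta method is a substitute for the hand computation, not a technique the text has introduced.

```python
import math

# Coefficient in dh/dt = -K*sqrt(h), from 0.09*pi*dh/dt = -0.0004*pi*sqrt(19.6*h)
K = 0.0004 * math.sqrt(19.6) / 0.09

def exact_depth(t):
    """Closed-form depth with h(0) = 2; the depth stays at 0 once the tank is empty."""
    return max(math.sqrt(2.0) - 0.5 * K * t, 0.0) ** 2

def rk4_depth(t_end, step=0.1, h0=2.0):
    """Integrate dh/dt = -K*sqrt(h) from t = 0 to t_end with classical RK4."""
    def slope(h):
        return -K * math.sqrt(max(h, 0.0))  # guard against tiny negative depths
    h = h0
    for _ in range(round(t_end / step)):
        k1 = slope(h)
        k2 = slope(h + 0.5 * step * k1)
        k3 = slope(h + 0.5 * step * k2)
        k4 = slope(h + step * k3)
        h += step * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return h

# Time at which sqrt(2) - K*t/2 reaches zero
empty_time = 2.0 * math.sqrt(2.0) / K
```

Comparing `rk4_depth(100.0)` with `exact_depth(100.0)` is one way to see that the numerical and analytic depths agree while water remains in the tank; `empty_time` differs slightly from 143.75 only because the text rounds the coefficient to 0.009838.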

Exercises 2.7

1. For a population $P(t)$ that exhibits logistic growth according to the general model
\[ \frac{dP}{dt} = kP\left(1 - \frac{P}{A}\right), \quad P(0) = P_0 \]
(a) Determine the values of $P$ (in terms of $A$ and $k$) for which $P$ is an increasing function.
(b) Sketch by hand the direction field for the differential equation, clearly indicating the role of the constant $A$ in your sketch.
(c) Determine the value(s) of $P$ (in terms of $A$ and $k$) for which $P$ is increasing most rapidly, and justify your answer.
(d) Solve the initial-value problem explicitly for $P$ to show that
\[ P(t) = \frac{A}{1 + Me^{-kt}} \]
and determine $M$ in terms of $A$ and $P_0$.

2. The growth of an animal population is governed by the equation
\[ \frac{500}{P}\frac{dP}{dt} = 50 - P \]
where $P(t)$ is the number of animals in the herd at time $t$. The initial population is known to be 125. Determine the solution $P(t)$, sketch its graph, and decide whether there will ever be more than 125 or fewer than 50 animals present.

3. Consider the differential equation $dP/dt = -0.02P^2 + 0.08P$.
(a) What are the equilibrium solutions to this equation?
(b) Determine whether each equilibrium solution is stable or unstable.
(c) At what value of $P$ is the function growing most rapidly?
(d) Under the initial condition $P(0) = 0.25$, determine the time at which $P(t) = 3$.

4. Consider a fish population that grows according to the model
\[ \frac{dP}{dt} = 0.05P - 0.000005P^2 \]
where $t$ is measured in years, and $P$ is measured in thousands.
(a) Determine the population of fish at time $t$ if initially $P(0) = 1000$. What is the carrying capacity of the population?
(b) Suppose that the fish population is established as growing according to the above model in the absence of fish being removed from the lake. Suppose that harvesting begins at a rate of 20 000 fish per year. How does the differential equation governing the fish population change? Explain.
(c) Plot a direction field for the updated differential equation you found in part (b). Discuss the new equilibrium solutions for the fish population. Can you solve the IVP with $P(0) = 1000$?
(d) How would the DE change if wildlife biologists began planting 30 000 fish per year in the lake, and no harvesting occurred?

5. Solve the initial-value problem
\[ \frac{dP}{dt} = 6 - 7P + P^2, \quad P(0) = 2 \]
Sketch your solution curve $P(t)$ and explain why it makes sense in light of the equilibrium solutions to the given equation and your understanding of where $dP/dt$ is positive and negative.


6. A cruise ship leaves port with 2500 vacationers aboard. At the time the boat leaves the dock, ten recent visitors of an amusement park are sick with the flu. Let $S(t)$ denote the number of people at time $t$ who have had the flu at some time since leaving port.
(a) Assuming that the rate at which the flu virus spreads is directly proportional to the product of the number of people who have had the flu times the number of people not yet infected, write a differential equation whose solution is the function $S(t)$. Explain why the differential equation is a logistic equation.
(b) Solve the differential equation you found in (a). Assume that four days into the trip, 150 people have been sick with the flu. Clearly show how all constants are identified, and sketch a graph of your solution curve.
(c) How many people have been sick seven days into the trip? How long would the boat have to stay at sea for half the vacationers to get ill?

7. A cylindrical tank of height 4 m and radius 1 m is full of water. A small hole of diameter 1 cm is opened in the bottom of the tank. Use Torricelli's law to determine how long it will take for all the water to drain from the tank.

8. A cylindrical tank of height 1.2 m and radius 30 cm is originally full of water. A small hole is opened in the bottom of the tank, and after 15 min, the water in the tank has dropped 10 cm. According to Torricelli's law, how large is the hole and how long will it take the tank to drain?

9. Consider a tank that is generated by taking the curve $x = \sqrt{y}$ and revolving it about the $y$-axis. Assume that the tank is full of water to a depth of 1.2 m and that a hole of diameter 1 cm is opened in the bottom. Use Torricelli's law to determine how long it will take for all the water to drain from the tank.

10. Suppose a hemispherical bowl has top radius of 30 cm and at time $t = 0$ is full of water. At that moment a circular hole with diameter 1.2 mm is opened in the bottom of the tank. Use Torricelli's law to determine how long it will take for all the water to drain from the tank.

11. For an open cylindrical tank, Torricelli's law tells us that if a small hole is opened, the height of the water at time $t$ obeys the IVP
\[ \frac{dh}{dt} = -k\sqrt{h}, \quad h(t_0) = h_0 \]
where $k$ is a constant that depends on the radius of the tank and the radius of the hole. In this exercise, we will take $k = 1$.
(a) Explain why theorem 2.2.1 does not guarantee a unique solution to the IVP
\[ \frac{dh}{dt} = -\sqrt{h}, \quad h(1) = 0 \]
(b) Explain why it is physically impossible to determine the height of the water at time $t < 1$ in a tank which satisfies $h(1) = 0$.
(c) Show that for any $c < 1$, the function
\[ h(t) = \begin{cases} \left(\tfrac{1}{2}c - \tfrac{1}{2}t\right)^2 & \text{if } t < c \\ 0 & \text{if } t \ge c \end{cases} \]
is a solution to the IVP in (a).
(d) Explain how the result of (c) can be interpreted physically in light of the time when the tank becomes empty. Compare your findings to those in (a) and (b).

2.8 For further study

2.8.1 Converting certain second-order DEs to first-order DEs

Linear second-order differential equations such as
\[ y'' + p(t)y' + q(t)y = f(t) \tag{2.8.1} \]
will be the focus of upcoming work in chapters 3 and 4. But there are some second-order equations we can solve at present. For example, if $q(t) = 0$ in (2.8.1), then we can perform a process called reduction of order to convert the equation to a first-order one.

(a) Consider the second-order equation $y'' + p(t)y' = f(t)$. Using the substitution $u = y'$, convert the equation to a new first-order DE involving the function $u$.
(b) Use a standard solution technique to state the solution $u$ to the differential equation in (a) in terms of $p(t)$ and $f(t)$. (Your answer will involve integrals.)
(c) Explain how you would use your result in (b) to find the solution $y$ to the original DE.
(d) Use reduction of order to solve each of the following second-order IVPs.
  (i) $y'' + 2y' = 4$, $y(0) = 2$, $y'(0) = 1$
  (ii) $y'' + \tan(t)\,y' = t$, $y(0) = 1$, $y'(0) = 0$
  (iii) $y'' + \dfrac{2t}{1+t^2}\,y' = t^2$, $y(0) = 0$, $y'(0) = 1$
  (iv) $y'' + \dfrac{1}{4-t}\,y' = 4 - t$, $y(0) = 1$, $y'(0) = 1$
(e) Reduction of order can be performed on certain nonlinear differential equations as well. For instance, suppose that we have an equation of form
\[ y'' = g(y')h(t) \tag{2.8.2} \]
Show that the substitution $u = y'$ converts (2.8.2) to a first-order equation in $u$. Explain how you would approach solving the new equation in $u$.


(f) Solve each of the following second-order IVPs.
  (i) $y'' = (y')^2 t^2$, $y(0) = 1$, $y'(0) = 0$
  (ii) $y'' = \dfrac{t + t(y')^2}{y'}$, $y(0) = 2$, $y'(0) = 1$
  (iii) $y'' = e^{2t + y'}$, $y(0) = 0$, $y'(0) = 0$
  (iv) $y'' = y'$, $y(0) = 3$, $y'(0) = 5$

2.8.2 How raindrops fall

The following questions and discussion are based on the article "Falling Raindrops" by Walter J. Meyer⁷. When a raindrop falls, various forces act upon it. We explore several different models that show the importance of adjusting assumptions appropriately to match physical conditions. Let us first assume that the only force acting upon the raindrop is gravity. Under this assumption, Galileo (1564–1642) hypothesized that the falling raindrop would gain an extra 32 ft/s in velocity for every second for which it falls. In other words, the acceleration of the raindrop is constant and equal to 32 ft/s$^2$.

⁷ See Applications of Calculus, MAA Notes Number 29, pp. 101–111.

(a) Let $y(t)$ denote the distance (in feet) traveled by the raindrop after it has been falling for $t$ seconds. Write an initial-value problem involving $y(t)$ based on the above assumption. Solve this IVP; be sure to introduce appropriate initial conditions based on the context of the problem.
(b) Assuming that the raindrop starts from rest at an elevation of 3000 ft, how long does it take the raindrop to fall to earth? What is the raindrop's velocity when it hits the ground? Why is this model unrealistic?
(c) We next must attempt to account for the air resistance the raindrop encounters through a slightly more sophisticated model. For a raindrop having diameter $d \le 0.00025$ ft, this model, sometimes known as Stokes' law, states that the acceleration of the raindrop due to gravity is opposed by an acceleration directly proportional to the velocity of the raindrop at that instant. Suppose that the constant of proportionality is given by $c/d^2$, where $c \approx 3.29 \times 10^{-6}$ ft$^2$/s is an experimentally determined constant. Write a new IVP (again involving $y(t)$ and its relevant derivatives) for the raindrop having diameter $d$. Do not yet attempt to solve this equation. Leave $d$ as an unknown constant.
(d) Letting $v = y'$ and using the fact that the raindrop starts from rest, convert the IVP in (c) to a first-order IVP involving $v$. Using $d = 0.00012$ ft (which can be considered a drizzle), produce a slope field corresponding to the differential equation in $v$. On this slope field, sketch a graphical approximation of the solution to the stated IVP. Describe the behavior of the raindrop's velocity based on the slope field you constructed in the problem above.
(e) In the model in (d), we will say that the long-term limiting velocity of the raindrop is its terminal velocity, denoted $v_{\text{term}}$. Calculate this terminal velocity by using the IVP to answer the following questions: What is the initial velocity of the raindrop? What is the equilibrium solution of the differential equation? What happens to the velocity of the raindrop if it ever reaches the equilibrium value? Why, in view of the differential equation, must the velocity of the raindrop increase from its initial value to the equilibrium value?
(f) Use your result from (e) to determine the terminal velocities for raindrops having diameters of 0.00009, 0.00012, and 0.00015 ft, respectively. Graph $v_{\text{term}}$ as a function of $d$, and comment on the phenomena observed.
(g) Solve the IVP from (d) explicitly for $v$. Graph your solution, and then use your solution to calculate $v_{\text{term}}$ as well.
(h) Assuming that a raindrop of diameter 0.00012 ft starts from rest at 3000 ft, how long does it take the raindrop to fall to the ground? What is its velocity at the instant it hits the ground? Do your answers surprise you? Is it raining hard or barely raining when raindrops are this size?
(i) When the diameter of the raindrop becomes too large, the force of air resistance on the raindrop becomes so appreciable that Stokes' model loses accuracy as well. This leads to a third model, known as the velocity-squared model. This model states that when a raindrop has diameter $d \ge 0.004$ ft, the acceleration due to gravity is opposed by an acceleration directly proportional to the square of the velocity of the raindrop at that instant. Here the constant of proportionality is given by $k/d$, where $k \approx 0.00046$.
(j) Repeat questions (c), (d), and (e) for the velocity-squared model. Compare your findings with those of Stokes' model. For example, how do the terminal velocities of small raindrops compare with those of large raindrops? For which type of raindrop, small or large, does the terminal velocity increase more rapidly as a function of diameter?
(k) Finally, explicitly solve the IVP arising from the velocity-squared model for the velocity function $v(t)$. Graph your solution $v(t)$ for an appropriate choice of $d$ and compare the result to the results in (j).

2.8.3 Riccati's equation

The Riccati equation
\[ y' + p(t)y + q(t)y^2 = f(t) \tag{2.8.3} \]
and its study are attributed to the Italian mathematician Jacopo Riccati (1676–1754). Observe that this nonlinear equation is a modification of the standard linear first-order equation $y' + p(t)y = f(t)$. Through the following steps, we will use a change of variables to transform the Riccati equation into a linear, second-order differential equation.

(a) We consider a change of variables to convert (2.8.3) from being a differential equation in $y$ to a new equation in $v$. Let $v$ be a function that satisfies the relationship $v' = q(t)y(t)v(t)$.
  (i) Differentiate $v' = qyv$ with respect to $t$ to show that
  \[ v'' = (qyv)' = q'yv + qy'v + qyv' \tag{2.8.4} \]
  (ii) Show that $q'yv = q'v'/q$.
(b) Multiply both sides of the Riccati equation (2.8.3) by $qv$ and use (i) and (ii) to show that the left-hand side may be written
\[ vqy' + vqpy + vq^2y^2 = v'' + \left(p - \frac{q'}{q}\right)v' \tag{2.8.5} \]
(c) Use your work in (b) to show that the Riccati equation may now be re-expressed as the second-order equation in $v$ given by
\[ v'' + \left(p - \frac{q'}{q}\right)v' - vqf = 0 \tag{2.8.6} \]
(d) Explain how you would solve the Riccati equation in the special case when $f(t) = 0$. Note particularly that to solve (2.8.6) with $f(t) = 0$, you must reduce the order of the equation through an appropriate substitution, say $u = v'$. See section 2.8.1 for further details on this technique. In addition, note that your goal is to find the solution $y$ to the original equation (2.8.3). Be sure to explain how the functions $v$ and $u$ are used in this process.
(e) Solve the following differential equations, each of which is a Riccati equation.
  (i) $y' + 2y + 4y^2 = 0$
  (ii) $y' + \frac{1}{t}y + t^2y^2 = 0$
  (iii) $y' + y\tan t + y^2\cos t = 0$

2.8.4 Bernoulli's equation

The Bernoulli brothers, James (1654–1705) and John (1667–1748), contributed to the solution of
\[ y' + p(t)y = q(t)y^n, \quad n \ne 1 \tag{2.8.7} \]
the so-called Bernoulli equation. We will explore the approach credited to John through the following prompts. Similar to the Riccati equation, the Bernoulli equation may be transformed into a linear differential equation through a clever change of variables.

(a) First, multiply (2.8.7) by $y^{-n}$ to obtain
\[ y^{-n}y' + p(t)y^{1-n} = q(t) \tag{2.8.8} \]
Next, consider the change of variables $v = y^{1-n}$. Compute $v'$ to show that
\[ v' = (1 - n)y^{-n}y' \tag{2.8.9} \]
Now use (2.8.8) and (2.8.9) to show that $v$ satisfies the linear first-order equation
\[ v' + (1 - n)p(t)v = (1 - n)q(t) \tag{2.8.10} \]
(b) Explain why in the cases when $n = 1$, $n = 2$, $q(t) = 0$, and $p(t) = 0$ the Bernoulli equation reduces to familiar equations whose solutions are known.
(c) Solve these differential equations, each of which is a Bernoulli equation.
  (i) $y' + 2y = ty^3$
  (ii) $y' + \frac{1}{t}y = 3y^3$
  (iii) $y' + y\cot t = y^3\sin t$


3 Linear systems of differential equations

3.1 Motivating problems

In section 1.1, we considered how the amount of salt present in a system of two tanks can be modeled through a system of differential equations. In that particular example, we assumed that the volume of solution in each tank (as seen in figure 3.1) remains constant and all inflows and outflows happen at the identical rate of 5 liters/min, and further that the tanks are uniformly mixed so that the salt concentration in each is identical throughout each tank at a given time $t$. With the additional premises that the volume of solution in tank A is 200 liters and the independent inflow entering A carries water contaminated with 4 g/liter of salt, we can develop a differential equation that models $x_1(t)$, the amount of salt (in grams) in tank A at time $t$. Likewise, by presuming that tank B holds solution of volume 400 liters and the inflow entering B carries a concentration of salt of 7 g/liter, a similar analysis produces a differential equation whose solution is $x_2(t)$, the amount of salt (in grams) in tank B at time $t$. In particular, we found in (1.1.6) that the following system of differential equations arose:
\[ \begin{aligned} \frac{dx_1}{dt} &= -\frac{x_1}{20} + \frac{x_2}{80} + 20 \\ \frac{dx_2}{dt} &= \frac{x_1}{40} - \frac{x_2}{40} + 35 \end{aligned} \tag{3.1.1} \]

With our experience in linear algebra, we can now represent this system in matrix notation. In particular, if we simultaneously consider the amounts of salt $x_1(t)$ and $x_2(t)$ as entries in the vector function
\[ \mathbf{x}(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} \]
we know that
\[ \mathbf{x}'(t) = \begin{bmatrix} dx_1/dt \\ dx_2/dt \end{bmatrix} \tag{3.1.2} \]

Figure 3.1 Two tanks with inflows, outflows, and connecting pipes.

Moreover, in (3.1.1) we recognize the familiar form of a matrix product in the terms involving $x_1$ and $x_2$. Specifically,
\[ \begin{bmatrix} -x_1/20 + x_2/80 \\ x_1/40 - x_2/40 \end{bmatrix} = \begin{bmatrix} -1/20 & 1/80 \\ 1/40 & -1/40 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \tag{3.1.3} \]
With the observations from (3.1.2) and (3.1.3) substituted into (3.1.1) and replacing the quantities 20 and 35 with the appropriate vector, we may now write the system of differential equations in the form
\[ \mathbf{x}' = \begin{bmatrix} -1/20 & 1/80 \\ 1/40 & -1/40 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 20 \\ 35 \end{bmatrix} \tag{3.1.4} \]
Letting $A$ be the matrix of coefficients that multiplies the vector $\mathbf{x}$ and $\mathbf{b}$ the vector $[20 \;\; 35]^T$, we can also write the system in (3.1.4) in the simplified form
\[ \mathbf{x}' = A\mathbf{x} + \mathbf{b} \tag{3.1.5} \]
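The agreement between the component equations (3.1.1) and the matrix form (3.1.5) can be spot-checked at any sample state. The following is a small sketch using exact rational arithmetic; the function names `rhs_componentwise` and `rhs_matrix` and the sample state are assumptions chosen for illustration.

```python
from fractions import Fraction as F

# Coefficient matrix and constant vector read off from (3.1.4)
A = [[F(-1, 20), F(1, 80)],
     [F(1, 40), F(-1, 40)]]
b = [F(20), F(35)]

def rhs_componentwise(x1, x2):
    """Right-hand sides of (3.1.1), written out equation by equation."""
    return [-x1 / 20 + x2 / 80 + 20,
            x1 / 40 - x2 / 40 + 35]

def rhs_matrix(x):
    """Right-hand side of (3.1.5): the matrix product A x plus the vector b."""
    return [sum(a * xi for a, xi in zip(row, x)) + bi
            for row, bi in zip(A, b)]

# Hypothetical sample state: 100 g of salt in tank A, 300 g in tank B
sample = [F(100), F(300)]
same = rhs_matrix(sample) == rhs_componentwise(*sample)
```

Because `Fraction` arithmetic is exact, `same` compares the two forms without any rounding concerns; any other sample state could be substituted.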

This form reminds us of the familiar nonhomogeneous linear first-order differential equation with constant coefficients, for instance, an equation such as
\[ y' = 2y + 5 \tag{3.1.6} \]

In this chapter, we will study similarities between (3.1.5) and (3.1.6) with the specific goal of learning how to completely solve nonhomogeneous linear systems of differential equations with constant coefficients such as the system (3.1.4). We will be especially interested in the role that linear algebra plays in identifying certain characteristics of the coefficient matrix $A$ that enable us to find all solutions to the system.

Before we proceed to an in-depth study of linear systems of differential equations, at least one more motivating example is appropriate. A spring-mass system is a physical situation that models vibrations; for example, such a system arises any time a mass attached to a spring is set in motion. We choose to envision this situation vertically, as seen in figure 3.2, though one can also imagine the mass resting on a table and moving horizontally. We consider some of the physics of basic springs and motion under the influence of gravity in order to develop a differential equation that describes the spring-mass system.

Figure 3.2 A spring-mass system shown at two different points in time; $-y(t)$ denotes the displacement of the mass from equilibrium (where displacements below the $t$-axis are considered positive).

Initially, the mass will stretch the spring from its natural length. Hooke's law states that the force necessary to stretch a spring a distance $x$ from its natural length is given by the equation
\[ F(x) = kx \]
where $k$ is the spring constant. Assume that the mass stretches the spring a distance $L_0$. Then from Hooke's law, when the system is in equilibrium, we see that the force $F_s$ exerted by the spring must be
\[ F_s = -kL_0 \]
Here the minus sign indicates that the force is opposing the natural downward displacement of the spring. Note particularly that we view the downward direction as positive. We also know that gravity acts on the mass with force $F_g$ given by
\[ F_g = mg \]
If the system is in static equilibrium, we know that the sum of the two forces is zero. In other words, $F_g + F_s = 0$ and therefore
\[ mg = kL_0 \]
Once the system is set in motion by some initial force or displacement, we track the location of the mass at time $t$ with a function $y(t)$. In particular, $y(t)$ represents the displacement of the mass from the equilibrium position at time $t$; note that $y = 0$ is the equilibrium position of the system. We continue to designate the downward direction as positive, so $y(t) > 0$ means that the mass is below the equilibrium position, while $y(t) < 0$ means the mass is above the equilibrium position. We can see the role $y(t)$ plays in figure 3.2 as it tracks the displacement of the mass from equilibrium and thus traces out a curve with respect to time.

We can now use Newton's second law to obtain a differential equation that governs the system. The forces that act on the mass are:
• Gravity, with $F_g = mg$.
• The spring force $F_s$. Note now that at a given time $t$ the displacement of the spring from its natural length is $L_0 + y(t)$, so that by Hooke's law we have $F_s = -k(L_0 + y)$.
• A possible damping force $F_d$. Motion may be damped due to air resistance, friction, or some sort of external damping system (usually called a dashpot). We assume that damping forces are directly proportional to the velocity of the mass. Under this assumption, it follows that $F_d = -cy'$. Again, the minus sign indicates that this force opposes the motion of the mass. The positive constant $c$ is called the damping constant.
• Finally, there may be an external driving force present (such as the periodic force that drives a piston in an engine). We call this a forcing function $F(t)$; the role of forcing functions will be considered in detail later on in this chapter.

Newton's second law demands that the resultant force (that is, the sum of all the forces) on the mass must be equal to $ma$, where $a$ is the body's acceleration (which is also $y''$). Summing all the aforementioned forces and equating the result with $ma = my''$, we find
\[ my'' = F_g + F_s + F_d + F(t) \tag{3.1.7} \]
Using the formulas we developed earlier and substituting in (3.1.7) yields
\[ my'' = mg - k(L_0 + y) - cy' + F(t) \tag{3.1.8} \]
Now recall that $mg - kL_0 = 0$, rearrange (3.1.8), and divide by $m$. This leads us to the standard form of the differential equation that governs a spring-mass system,
\[ y'' + \frac{c}{m}y' + \frac{k}{m}y = \frac{1}{m}F(t) \tag{3.1.9} \]
Note that (3.1.9) is a nonhomogeneous linear second-order differential equation. To see how such a second-order linear differential equation is linked to a system of linear differential equations, let's consider the specific example where $c = 1$, $m = 1$, $k = 6$, and $F(t) = 0$, which results in the equation
\[ y'' + y' + 6y = 0 \tag{3.1.10} \]

If we introduce the functions $x_1$ and $x_2$ through the substitutions $y = x_1$ and $y' = x_2$, then $x_1(t)$ represents the displacement of the mass at time $t$ and $x_2(t)$ is the velocity of the mass at time $t$. Observe first that
\[ x_1' = x_2 \tag{3.1.11} \]
Moreover, since $x_2' = y''$, we can rewrite (3.1.10) as $x_2' + x_2 + 6x_1 = 0$. Equivalently,
\[ x_2' = -6x_1 - x_2 \tag{3.1.12} \]
Thus (3.1.11) and (3.1.12) generate the system of differential equations
\[ \begin{aligned} x_1' &= x_2 \\ x_2' &= -6x_1 - x_2 \end{aligned} \tag{3.1.13} \]
which may also be expressed in matrix form as
\[ \mathbf{x}' = \begin{bmatrix} 0 & 1 \\ -6 & -1 \end{bmatrix} \mathbf{x} \tag{3.1.14} \]
We have therefore shown that the linear second-order differential equation (3.1.10), a particular case of the equation (3.1.9) that describes a spring-mass system, may be converted to the system of linear first-order equations (3.1.14) through the substitution $x_1 = y$, $x_2 = y'$. In fact, any linear higher order differential equation may be converted through a similar substitution to a system of linear first-order equations. Therefore, by learning to understand and solve systems of linear equations, we will be able to determine the behavior of higher order linear equations as well. It is this fact that motivates us to study systems of linear equations prior to the study of higher order single equations.
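The substitution $x_1 = y$, $x_2 = y'$ can be sketched in a few lines for the unforced equation $my'' + cy' + ky = 0$. The function names below (`spring_mass_system`, `state_derivative`, `second_order_residual`) are hypothetical and chosen only for this illustration.

```python
def spring_mass_system(m, c, k):
    """Coefficient matrix of x' = A x for m*y'' + c*y' + k*y = 0,
    using the substitution x1 = y, x2 = y'."""
    return [[0.0, 1.0],
            [-k / m, -c / m]]

def state_derivative(A, x):
    """Return x' = A x for a state x = [x1, x2] = [y, y']."""
    return [A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1]]

def second_order_residual(m, c, k, x):
    """Evaluate m*y'' + c*y' + k*y with y'' read from the system.

    The result is 0 (up to rounding) when the system reproduces
    the original second-order equation.
    """
    y, y_prime = x
    y_double_prime = state_derivative(spring_mass_system(m, c, k), x)[1]
    return m * y_double_prime + c * y_prime + k * y

# The values m = 1, c = 1, k = 6 give the matrix in (3.1.14)
A_example = spring_mass_system(1, 1, 6)
```

With `m = 1`, `c = 1`, `k = 6`, the first row of the matrix encodes $x_1' = x_2$ and the second row encodes $x_2' = -6x_1 - x_2$, matching (3.1.13).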

3.2 The eigenvalue problem revisited

As we begin our study of linear systems of first-order differential equations, we are ultimately interested in two main questions. First, for a linear system $\mathbf{x}' = A\mathbf{x}$ such as
\[ \mathbf{x}' = \begin{bmatrix} 2 & 3 \\ 2 & 1 \end{bmatrix} \mathbf{x} \]
how can we explicitly solve the system for $\mathbf{x}(t)$? Second, what is the long-term behavior of the solution $\mathbf{x}(t)$ to such a system, and how does its graph appear? We start our investigation by thinking carefully about the meaning of the matrix equation $\mathbf{x}' = A\mathbf{x}$ and comparing it with our experience with the single first-order differential equation $x' = ax$. Note that we naturally begin with the homogeneous system $\mathbf{x}' = A\mathbf{x}$; later we will consider nonhomogeneous systems of the form $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$. In every case, we seek a vector function $\mathbf{x}(t)$ that solves the given system. An elementary example is instructive.


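Example 3.2.1 Solve the linear system $\mathbf{x}' = A\mathbf{x}$, where
\[ A = \begin{bmatrix} -3 & 0 \\ 0 & -1 \end{bmatrix} \]
Explain the role that the eigenvalues and eigenvectors of $A$ play in the general solution, and graph and discuss the solution curves for different choices of initial conditions.

Solution. First, we observe that the system
\[ \begin{bmatrix} x_1' \\ x_2' \end{bmatrix} = \mathbf{x}' = \begin{bmatrix} -3 & 0 \\ 0 & -1 \end{bmatrix} \mathbf{x} = \begin{bmatrix} -3 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \tag{3.2.1} \]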

tells us that we seek two functions $x_1(t)$ and $x_2(t)$ such that $x_1' = -3x_1$ and $x_2' = -x_2$. Because the matrix of the system is diagonal, the problem is especially simple. In particular, the system is uncoupled, which means that the differential equation for $x_1$ does not involve $x_2$ and the equation for $x_2$ does not involve $x_1$. From our experience with linear first-order equations, we know that the general solution to $x_1' = -3x_1$ is $x_1(t) = c_1e^{-3t}$ and that the solution to $x_2' = -x_2$ is $x_2(t) = c_2e^{-t}$. Writing the solution to the system as a single vector, we have
\[ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} c_1e^{-3t} \\ c_2e^{-t} \end{bmatrix} \tag{3.2.2} \]
Rewriting $\mathbf{x}$ in another form sheds further light on the key components of this solution. Writing $\mathbf{x}$ as the sum of two vectors, we find
\[ \mathbf{x} = \begin{bmatrix} c_1e^{-3t} \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ c_2e^{-t} \end{bmatrix} = c_1e^{-3t}\begin{bmatrix} 1 \\ 0 \end{bmatrix} + c_2e^{-t}\begin{bmatrix} 0 \\ 1 \end{bmatrix} \tag{3.2.3} \]
Here, we can make a key observation about the eigenvalues and eigenvectors of $A$: because $A$ is diagonal, its eigenvalues are its diagonal entries, $\lambda_1 = -3$ and $\lambda_2 = -1$. Moreover, its corresponding eigenvectors may be easily confirmed to be the vectors
\[ \mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \quad \text{and} \quad \mathbf{v}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \]
Thus, in (3.2.3), we see the interesting fact that the solution has the form $\mathbf{x} = c_1e^{\lambda_1 t}\mathbf{v}_1 + c_2e^{\lambda_2 t}\mathbf{v}_2$; the eigenvalues and eigenvectors therefore play a central role in the system's behavior.

Finally, we explore the solutions to several related initial-value problems for select initial conditions. If we have the initial condition $\mathbf{x}(0) = [4 \;\; 0]^T$, we see in (3.2.3) that $c_1 = 4$ and $c_2 = 0$, so that the solution to the IVP is
\[ \mathbf{x}(t) = 4e^{-3t}\begin{bmatrix} 1 \\ 0 \end{bmatrix} \]
Two key observations can be made about this solution curve: one is that its graph is a straight line, since for every value of $t$, $\mathbf{x}$ is a scalar multiple of the vector $[1 \;\; 0]^T$. Note particularly that the direction of this line is given by the eigenvector corresponding to $\lambda_1 = -3$. The other important fact is that $e^{-3t} \to 0$ as $t \to \infty$, and therefore $\mathbf{x}(t) \to \mathbf{0}$, so that the solution approaches the origin as time increases without bound.

For the initial condition $\mathbf{x}(0) = [0 \;\; 5]^T$, it follows from (3.2.3) that $c_1 = 0$ and $c_2 = 5$, and thus the solution to this IVP is
\[ \mathbf{x}(t) = 5e^{-t}\begin{bmatrix} 0 \\ 1 \end{bmatrix} \]
Similar observations about the behavior of this solution may be made to those noted above for the first chosen initial condition: this solution curve is a straight line and approaches the origin as $t \to \infty$.

Finally, if we consider an initial condition that does not correspond to an eigenvector of the system, such as $\mathbf{x}(0) = [4 \;\; 5]^T$, (3.2.3) tells us that $c_1 = 4$ and $c_2 = 5$, and thus
\[ \mathbf{x} = 4e^{-3t}\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 5e^{-t}\begin{bmatrix} 0 \\ 1 \end{bmatrix} \]
This last solution's graph is not a straight line. As seen in figure 3.3, which shows the three different solutions based on the differing initial conditions, we see the consistent behavior that every solution tends to the origin as $t \to \infty$, as well as that the eigenvectors play a key role in how these graphs appear. We will discuss this graphical perspective further in sections 3.4 and 3.5.

Figure 3.3 Plots of solutions to three IVPs for the system in example 3.2.1 (solutions through (0,5), (4,5), and (4,0) in the $x_1x_2$-plane). Arrows indicate the direction of flow along the solution curve as time increases.

The long-term behavior of the solutions to the system (3.2.1) in example 3.2.1 suggests that every solution tends to the zero vector. In fact, the origin itself is a solution, a so-called constant or equilibrium solution. That is, if we consider whether there is any constant vector $\mathbf{x}$ that is a solution to $\mathbf{x}' = A\mathbf{x}$, it follows that $\mathbf{x}' = \mathbf{0}$, and thus $\mathbf{x}$ must satisfy $A\mathbf{x} = \mathbf{0}$. From our work with homogeneous linear equations, we know that $\mathbf{x} = \mathbf{0}$ is always a solution to this equation, and thus the zero vector is a constant solution to every homogeneous linear system of first-order differential equations. In sections 3.4 and 3.5 we will investigate the so-called stability of this equilibrium solution.

There is a second perspective from which we can see how eigenvectors and eigenvalues arise in the solution of linear systems of differential equations. After constant solutions, the next simplest type of solutions to such a system are straight-line solutions. In other words, solutions whose graph is a straight line in space form a particularly important type of solution to a system. In the preceding example, we saw two such straight-line solutions: each occurred in the direction of an eigenvector and passed through the origin.

In search of a general straight-line solution to $\mathbf{x}' = A\mathbf{x}$, we know that any such solution must have the form $\mathbf{x}(t) = f(t)\mathbf{v}$, where $f(t)$ is a scalar function and $\mathbf{v}$ is a constant vector. This form guarantees that $\mathbf{x}(t)$ traces out a path that is a straight line through $\mathbf{0}$ in the direction of $\mathbf{v}$. In order for $\mathbf{x}(t)$ to satisfy the system, we observe that since $\mathbf{x}'(t) = f'(t)\mathbf{v}$, the equation
\[ f'(t)\mathbf{v} = A(f(t)\mathbf{v}) \tag{3.2.4} \]

must hold. Moreover, since $f(t)$ is a scalar, the linearity of matrix multiplication allows us to rewrite (3.2.4) as
\[ f'(t)\mathbf{v} = f(t)A\mathbf{v} \tag{3.2.5} \]
Equation (3.2.5) is strongly reminiscent of the equation we use to define eigenvalues and eigenvectors: $A\mathbf{x} = \lambda\mathbf{x}$. In fact, if $f'(t) = \lambda f(t)$, then (3.2.5) implies that
\[ \lambda f(t)\mathbf{v} = f(t)A\mathbf{v} \]
Further, if $f(t) \ne 0$, then $\lambda\mathbf{v} = A\mathbf{v}$, and $\lambda$ and $\mathbf{v}$ must be an eigenvalue-eigenvector pair of $A$. It is therefore natural for us to want $f$ to satisfy the single differential equation $f'(t) = \lambda f(t)$. From our work in chapter 2, we know that $f(t) = Ce^{\lambda t}$ is the general solution to this equation. Substituting this form for $f$ in (3.2.5), we now observe that
\[ \lambda e^{\lambda t}\mathbf{v} = e^{\lambda t}A\mathbf{v} \tag{3.2.6} \]
and since $e^{\lambda t}$ is never zero, we can simplify (3.2.6) to
\[ \lambda\mathbf{v} = A\mathbf{v} \tag{3.2.7} \]

which is satisﬁed precisely when v is an eigenvector of A with corresponding eigenvalue λ. Our most recent work has demonstrated that if x(t ) is a function of the form x(t ) = e λt v that is a solution to x = Ax, then (λ, v) is an eigenpair of the coefﬁcient matrix A. In fact, the converse also holds (as will be shown in

The eigenvalue problem revisited

195

the exercises), so that the following result is true for any n × n system of linear ﬁrst-order differential equations. Theorem 3.2.1 Let A be an n × n matrix. The vector function x(t ) = e λt v is a solution to the homogeneous linear system of ﬁrst-order differential equations given by x = Ax if and only if v is an eigenvector of A with corresponding eigenvalue λ. We close this section with one more example to demonstrate theorem 3.2.1 and one of its important consequences. Example 3.2.2 Consider the system of differential equations given by x1 = −2x1 − 2x2 x2 = −4x1 Write the system in the form x = Ax and show that A has two real eigenvalues with corresponding linearly independent eigenvectors. Verify by substitution that for each eigenvalue-eigenvector pair, x(t ) = e λt v is a solution of the system. In addition, show that any linear combination of such solutions is also a solution to the system. Solution. First, we observe that the system can be expressed in the form x = Ax by using the matrix −2 −2 A= −4 0 We brieﬂy review the process of determining the eigenvalues and eigenvectors of a matrix A; in most future occurrences, we will use Maple to determine this information using the commands introduced in section 1.10.2. Since the eigenvalues are the roots of the characteristic equation, we solve det(A − λI) = 0. Doing so, 0 = det(A − λI) −2 − λ −2 = −λ(−2 − λ) − 8 = det −4 −λ = λ2 + 2λ − 8 = (λ + 4)(λ − 2)

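so the eigenvalues of $A$ are $\lambda = -4$ and $\lambda = 2$. To find the eigenvector $\mathbf{v}$ that corresponds to $\lambda = -4$, we solve the equation $(A - (-4I))\mathbf{v} = \mathbf{0}$. Row-reducing the appropriate augmented matrix yields

\[ \left[\begin{array}{rr|r} 2 & -2 & 0 \\ -4 & 4 & 0 \end{array}\right] \rightarrow \left[\begin{array}{rr|r} 1 & -1 & 0 \\ 0 & 0 & 0 \end{array}\right] \]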

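which shows that a corresponding eigenvector is any nonzero scalar multiple of the vector $\mathbf{v}_1 = [1 \;\; 1]^T$. Similar computations show that for $\lambda = 2$, a corresponding eigenvector is $\mathbf{v}_2 = [1 \;\; {-2}]^T$. We now verify directly what theorem 3.2.1 guarantees: that $\mathbf{x}_1(t) = e^{-4t}[1 \;\; 1]^T$ and $\mathbf{x}_2(t) = e^{2t}[1 \;\; {-2}]^T$ are solutions to the given system of equations. Observe first that

\[ \mathbf{x}_1'(t) = -4e^{-4t}\begin{bmatrix} 1 \\ 1 \end{bmatrix} \tag{3.2.8} \]

and that

\[ A\mathbf{x}_1(t) = \begin{bmatrix} -2 & -2 \\ -4 & 0 \end{bmatrix} e^{-4t}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = e^{-4t}\begin{bmatrix} -2 & -2 \\ -4 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = e^{-4t}\begin{bmatrix} -4 \\ -4 \end{bmatrix} = -4e^{-4t}\begin{bmatrix} 1 \\ 1 \end{bmatrix} \tag{3.2.9} \]

Equations (3.2.8) and (3.2.9) confirm that indeed $\mathbf{x}_1'(t) = A\mathbf{x}_1(t)$ and demonstrate the role that eigenvalues and eigenvectors play in the solution. Similarly, for the function $\mathbf{x}_2(t)$,

\[ \mathbf{x}_2'(t) = 2e^{2t}\begin{bmatrix} 1 \\ -2 \end{bmatrix} \]

and

\[ A\mathbf{x}_2(t) = \begin{bmatrix} -2 & -2 \\ -4 & 0 \end{bmatrix} e^{2t}\begin{bmatrix} 1 \\ -2 \end{bmatrix} = e^{2t}\begin{bmatrix} -2 & -2 \\ -4 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ -2 \end{bmatrix} = e^{2t}\begin{bmatrix} 2 \\ -4 \end{bmatrix} = 2e^{2t}\begin{bmatrix} 1 \\ -2 \end{bmatrix} \tag{3.2.10} \]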

This shows that $\mathbf{x}_2'(t) = A\mathbf{x}_2(t)$. Finally, we are asked to show that any linear combination of $\mathbf{x}_1(t)$ and $\mathbf{x}_2(t)$ is also a solution to the differential equation. While we could confirm this somewhat laboriously through direct computations, it is much easier to work more generally and consider known properties of differentiation and matrix multiplication. In particular, differentiation is a linear operator and we know that if we let $\mathbf{y}(t) = c_1\mathbf{x}_1(t) + c_2\mathbf{x}_2(t)$ it follows that

\[ \mathbf{y}'(t) = (c_1\mathbf{x}_1(t) + c_2\mathbf{x}_2(t))' = c_1\mathbf{x}_1'(t) + c_2\mathbf{x}_2'(t) \tag{3.2.11} \]

Similarly, matrix multiplication is a linear process, so

\[ A\mathbf{y}(t) = A(c_1\mathbf{x}_1(t) + c_2\mathbf{x}_2(t)) = c_1A\mathbf{x}_1(t) + c_2A\mathbf{x}_2(t) \]

Since we have already established that $\mathbf{x}_1'(t) = A\mathbf{x}_1(t)$ and $\mathbf{x}_2'(t) = A\mathbf{x}_2(t)$, it follows that

\[ c_1\mathbf{x}_1'(t) + c_2\mathbf{x}_2'(t) = c_1A\mathbf{x}_1(t) + c_2A\mathbf{x}_2(t) \tag{3.2.12} \]

so by (3.2.11) and (3.2.12) we have shown that $\mathbf{y}'(t) = A\mathbf{y}(t)$ and thus indeed every linear combination of $\mathbf{x}_1(t)$ and $\mathbf{x}_2(t)$ is also a solution to $\mathbf{x}' = A\mathbf{x}$.
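The conclusions of example 3.2.2 can also be checked numerically. The following is only an illustrative sketch in Python (the text itself uses Maple); the helper names `matvec`, `x`, and `residual`, the step size `h`, and the sample constants $c_1 = 3$, $c_2 = -1.5$ are our own assumptions. It confirms that $A\mathbf{v} = \lambda\mathbf{v}$ for both eigenpairs and that a linear combination of the two solutions satisfies $\mathbf{x}' = A\mathbf{x}$ up to finite-difference error.

```python
import math

A = [[-2, -2], [-4, 0]]                      # coefficient matrix of example 3.2.2
eigenpairs = [(-4, [1, 1]), (2, [1, -2])]    # (lambda, v) pairs found in the example

def matvec(M, v):
    """Return the matrix-vector product M v as a list."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def x(t, c):
    """Evaluate c[0] e^{-4t} v1 + c[1] e^{2t} v2 at time t."""
    return [sum(ck * math.exp(lam * t) * v[i]
                for ck, (lam, v) in zip(c, eigenpairs))
            for i in range(2)]

def residual(t, c, h=1e-5):
    """Largest entry of |x'(t) - A x(t)|, with x' from a central difference."""
    deriv = [(p - m) / (2 * h) for p, m in zip(x(t + h, c), x(t - h, c))]
    return max(abs(d - r) for d, r in zip(deriv, matvec(A, x(t, c))))

# A v = lambda v holds exactly in integer arithmetic for both pairs.
eigen_ok = all(matvec(A, v) == [lam * vi for vi in v] for lam, v in eigenpairs)
print(eigen_ok)
# The residual is tiny, limited only by the finite-difference approximation.
print(residual(0.5, [3.0, -1.5]))
```

Any other choice of constants or evaluation time should give a similarly small residual, which is what the linearity argument above predicts.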

Example 3.2.2 provides the foundation for much of our study of linear systems of differential equations. It shows that when we can find real eigenvalues and eigenvectors, these lead us directly to solutions of the system. In addition, any linear combination of such solutions is also a solution to the system; we state this formally in the next theorem.

Theorem 3.2.2 If $(\lambda_1, \mathbf{v}_1), (\lambda_2, \mathbf{v}_2), \ldots, (\lambda_k, \mathbf{v}_k)$ are eigenpairs of an $n \times n$ matrix $A$ and $c_1, \ldots, c_k$ are any scalars, then

\[ \mathbf{x}(t) = c_1e^{\lambda_1 t}\mathbf{v}_1 + c_2e^{\lambda_2 t}\mathbf{v}_2 + \cdots + c_ke^{\lambda_k t}\mathbf{v}_k \]

is a solution to $\mathbf{x}' = A\mathbf{x}$.

In upcoming sections, we will determine whether we have found all of the solutions to a given system, address some subtle issues that arise when we cannot find enough real eigenvalues and eigenvectors, and better understand the graphical and long-term behavior of solutions. The exercises in this section will help further illuminate the roles of eigenvalues and eigenvectors as well as some of the issues that arise when there is an insufficient number of real eigenvectors for a given system's matrix.

Exercises 3.2

In exercises 1–7, compute by hand the eigenvalues and eigenvectors of the given matrix.

1. $A = \begin{bmatrix} 1 & 4 \\ 2 & 3 \end{bmatrix}$

2. $A = \begin{bmatrix} 0 & 4 \\ 1 & 0 \end{bmatrix}$

3. $A = \begin{bmatrix} 0 & 3 \\ 3 & 8 \end{bmatrix}$

4. $A = \begin{bmatrix} 2 & 2 \\ -1 & -1 \end{bmatrix}$

5. $A = \begin{bmatrix} 2 & 2 & 0 \\ 1 & 2 & 1 \\ 1 & 2 & 1 \end{bmatrix}$

6. $A = \begin{bmatrix} 3 & 0 & 1 \\ 0 & 2 & 0 \\ 5 & 0 & -1 \end{bmatrix}$

7. $A = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{bmatrix}$

8. Consider the system of differential equations given by
$x_1' = -2x_1 + 3x_2$
$x_2' = x_1 - 4x_2$
(a) Determine a matrix $A$ so that the system may be written in the form $\mathbf{x}' = A\mathbf{x}$.
(b) Determine all constant (equilibrium) solutions to $\mathbf{x}' = A\mathbf{x}$.
(c) Compute the eigenvalues and eigenvectors of $A$.
(d) Determine all straight-line solutions to $\mathbf{x}' = A\mathbf{x}$.
(e) Find a more general solution to $\mathbf{x}' = A\mathbf{x}$ by taking all possible linear combinations of the straight-line solutions from (d).
(f) Solve the initial-value problem $\mathbf{x}' = A\mathbf{x}$, $\mathbf{x}(0) = [1 \;\; 2]^T$. Discuss the graphical behavior of this solution.

9. Consider the system of differential equations given by
$x_1' = -x_1 + 2x_2$
$x_2' = -7x_1 + 8x_2$
(a) Determine a matrix $A$ so that the system may be written in the form $\mathbf{x}' = A\mathbf{x}$.
(b) Determine all constant solutions to $\mathbf{x}' = A\mathbf{x}$.
(c) Compute the eigenvalues and eigenvectors of $A$.
(d) Determine all straight-line solutions to $\mathbf{x}' = A\mathbf{x}$.
(e) Find a more general solution to $\mathbf{x}' = A\mathbf{x}$ by taking all possible linear combinations of the straight-line solutions from (d).
(f) Solve the initial-value problem $\mathbf{x}' = A\mathbf{x}$, $\mathbf{x}(0) = [-2 \;\; 0]^T$. Discuss the graphical behavior of this solution.

10. Consider the system of differential equations given by
$x_1' = 2x_1 + 3x_2$
$x_2' = -4x_2$
(a) Determine a matrix $A$ so that the system may be written in the form $\mathbf{x}' = A\mathbf{x}$.
(b) Determine all constant solutions to $\mathbf{x}' = A\mathbf{x}$.
(c) Compute the eigenvalues and eigenvectors of $A$.
(d) Determine all straight-line solutions to $\mathbf{x}' = A\mathbf{x}$.
(e) Find a more general solution to $\mathbf{x}' = A\mathbf{x}$ by taking all possible linear combinations of the straight-line solutions from (d).
(f) Explain how you could find this same general solution without determining eigenvalues and eigenvectors. (Hint: focus on $x_2(t)$ first.)
(g) Solve the initial-value problem $\mathbf{x}' = A\mathbf{x}$, $\mathbf{x}(0) = [0 \;\; 1]^T$. Discuss the graphical behavior of this solution.

11. Consider the system of differential equations given by
$x_1' = -2x_1 + x_2$
$x_2' = -2x_2$
(a) Determine a matrix $A$ so that the system may be written in the form $\mathbf{x}' = A\mathbf{x}$.
(b) Determine all constant solutions to $\mathbf{x}' = A\mathbf{x}$.
(c) Compute the eigenvalues and eigenvectors of $A$.
(d) Determine all straight-line solutions to $\mathbf{x}' = A\mathbf{x}$.
(e) Find a more general solution to $\mathbf{x}' = A\mathbf{x}$ by taking all possible linear combinations of your straight-line solutions from (d).
(f) Attempt to solve the initial-value problem $\mathbf{x}' = A\mathbf{x}$, $\mathbf{x}(0) = [1 \;\; 1]^T$. What does this tell you about the proposed general solution in (e)?

12. Consider the system of differential equations given by
$x_1' = 2x_1 + 9x_2$
$x_2' = -x_1 - 2x_2$
(a) Determine a matrix $A$ so that the system may be written in the form $\mathbf{x}' = A\mathbf{x}$.
(b) Determine all constant solutions to $\mathbf{x}' = A\mathbf{x}$.
(c) Compute the eigenvalues and eigenvectors of $A$.
(d) Are there any straight-line solutions to $\mathbf{x}' = A\mathbf{x}$? Why or why not?

13. Consider the system of differential equations given by
$x_1' = -3x_1 + x_2$
$x_2' = 3x_1 - x_2$
(a) Determine a matrix $A$ so that the system may be written in the form $\mathbf{x}' = A\mathbf{x}$.
(b) Determine all constant solutions to $\mathbf{x}' = A\mathbf{x}$. Compare and contrast your findings with preceding exercises.
(c) Compute the eigenvalues and eigenvectors of $A$.
(d) Determine all straight-line solutions to $\mathbf{x}' = A\mathbf{x}$. How many such solutions exist?
(e) Find a more general solution to $\mathbf{x}' = A\mathbf{x}$ by taking all possible linear combinations of your straight-line solutions from (d).
(f) Solve the initial-value problem $\mathbf{x}' = A\mathbf{x}$, $\mathbf{x}(0) = [3 \;\; 0]^T$. Discuss the graphical behavior of this solution.

14. Consider the system of differential equations given by
$x_1' = 3x_1 + x_2 + x_3$
$x_2' = x_1 + 3x_2 + x_3$
$x_3' = x_1 + x_2 + 3x_3$

(a) Determine a matrix $A$ so that the system may be written in the form $\mathbf{x}' = A\mathbf{x}$.
(b) Determine all constant solutions to $\mathbf{x}' = A\mathbf{x}$.
(c) Compute the eigenvalues and eigenvectors of $A$.
(d) Determine all straight-line solutions to $\mathbf{x}' = A\mathbf{x}$.
(e) Find a more general solution to $\mathbf{x}' = A\mathbf{x}$ by taking all possible linear combinations of your straight-line solutions from (d).
(f) Solve the initial-value problem $\mathbf{x}' = A\mathbf{x}$, $\mathbf{x}(0) = [1 \;\; 1 \;\; 1]^T$. Discuss the graphical behavior of this solution.

15. Consider the system of differential equations given by
$x_1' = 8x_1 - x_2 - 11x_3$
$x_2' = 18x_1 - 3x_2 - 19x_3$
$x_3' = 2x_1 - x_2 - 5x_3$
(a) Determine a matrix $A$ so that the system may be written in the form $\mathbf{x}' = A\mathbf{x}$.
(b) Determine all constant solutions to $\mathbf{x}' = A\mathbf{x}$.
(c) Compute the eigenvalues and eigenvectors of $A$.
(d) Determine all straight-line solutions to $\mathbf{x}' = A\mathbf{x}$.
(e) Find a more general solution to $\mathbf{x}' = A\mathbf{x}$ by taking all possible linear combinations of your straight-line solutions from (d).
(f) Solve the initial-value problem $\mathbf{x}' = A\mathbf{x}$, $\mathbf{x}(0) = [1 \;\; 1 \;\; 1]^T$. Discuss the graphical behavior of this solution.

Recall from section 3.1 that a second-order linear differential equation whose solution is $y(t)$ may be converted to a system of first-order linear equations whose solution is $\mathbf{x} = [x_1 \;\; x_2]^T$ through the substitution $x_1 = y$, $x_2 = y'$. See, for example, the discussion following (3.1.10). In exercises 16–22, convert each given higher order differential equation to a system of first-order equations through an appropriate substitution.

16. y − 4y = 0

17. $y'' + y' - 12y = 0$

18. $y'' + y' + y = 0$

19. $y'' - 2y' - 8y = e^t$

20. $y''' + 3y'' + 3y' + y = 0$

21. y − 6y + 5y = 0

22. $y^{(4)} + 2y''' - 5y'' + y' - 9y = 0$

In sections 1.1 and 3.1, we showed how two connected tanks containing a solute lead to a system of linear first-order differential equations. In exercises 23–26,

set up, but do not solve, a system of differential equations or initial-value problem whose solution would give the amount of salt in each tank at time $t$. Write each system in matrix form.

23. A system of two tanks is connected in such a way that each of the tanks has an independent inflow that delivers salt solution to it, each has an independent outflow (drain), and each tank is connected to the other with an outflow and an inflow. The relevant information about each tank is given in the table below.

                                   Tank A               Tank B
Tank volume                        100 liters           200 liters
Rate of inflow to the tank         5 liters/min         9 liters/min
Concentration of salt in inflow    7 g/liter            3 g/liter
Rate of drain outflow              4 liters/min         10 liters/min
Rates of outflows to other tank    to B: 3 liters/min   to A: 2 liters/min

24. Suppose that in exercise 23 all of the given information remains the same except for the fact that instead of saltwater flowing into each tank, pure water flows in; that is, the concentration of salt in the entering solution is 0 g/liter for each tank.

25. In a closed system of two tanks (i.e., one for which there are no input flows and no output flows), the following information is given. Tank A is filled with 100 liters of solution whose initial concentration is 0.25 g/liter. Tank B is filled with 50 liters of solution whose initial concentration is 3 g/liter. The two tanks are connected with two pipes having flows in opposite directions; mixed solution from Tank A flows to Tank B at a rate of 4 liters/min. Similarly, mixed solution flows from Tank B to Tank A at a rate of 4 liters/min.

26. In a closed system of three tanks (i.e., one for which there are no input flows and no output flows), the following information is given.

                                   Tank A               Tank B               Tank C
Tank volume                        100 liters           150 liters           125 liters
Rates of outflows to other tanks   to B: 3 liters/min   to C: 1 liter/min    to A: 4 liters/min
Rates of outflows to other tanks   to C: 4 liters/min   to A: 3 liters/min   to B: 1 liter/min

Tank A is filled with 100 liters of solution whose initial concentration is 8 g/liter. Tank B is filled with 150 liters of solution whose initial concentration is 3 g/liter. Tank C is initially filled with 125 liters of pure water. The three tanks are connected with pipes having flows in opposite directions; flow rates are given in the table above.

27. Show that if $(\lambda, \mathbf{v})$ is an eigenpair of the matrix $A$, then $\mathbf{x}(t) = e^{\lambda t}\mathbf{v}$ is a solution to the homogeneous system of linear differential equations given by $\mathbf{x}' = A\mathbf{x}$.

3.3 Homogeneous linear first-order systems

In preceding sections, we have encountered examples of systems of two (or three) linear differential equations in two (or three) unknown functions. More generally, a linear system of $n$ differential equations in $n$ unknown functions (or simply, a linear system) is a collection of differential equations for which we seek unknown functions $x_1(t), \ldots, x_n(t)$ when given $n$ equations with coefficient functions $a_{ij}(t)$ and $b_i(t)$ in the form

\[ \begin{aligned}
\frac{dx_1}{dt} &= a_{11}(t)x_1 + a_{12}(t)x_2 + \cdots + a_{1n}(t)x_n + b_1(t) \\
\frac{dx_2}{dt} &= a_{21}(t)x_1 + a_{22}(t)x_2 + \cdots + a_{2n}(t)x_n + b_2(t) \\
&\;\;\vdots \\
\frac{dx_n}{dt} &= a_{n1}(t)x_1 + a_{n2}(t)x_2 + \cdots + a_{nn}(t)x_n + b_n(t)
\end{aligned} \]

It will be convenient to write the above system in matrix form. If we let $\mathbf{x}$ denote the vector function whose entries are $\mathbf{x}(t) = [x_i(t)]$, $A(t)$ the $n \times n$ matrix of functions whose entries are $A = [a_{ij}(t)]$, and $\mathbf{b}(t)$ the vector of functions whose entries are $\mathbf{b} = [b_i(t)]$, then the above system can be rewritten simply as

\[ \mathbf{x}'(t) = A(t)\mathbf{x}(t) + \mathbf{b}(t) \tag{3.3.1} \]

In much of our work, we will suppress the independent variable $t$ and write $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$. Moreover, it will most often be the case that, as in examples 3.2.1 and 3.2.2, the matrix $A$ has all constant entries. Indeed, from this point on, unless otherwise noted, we will assume the matrix $A$ has constant entries. In the event that $\mathbf{b} = \mathbf{0}$, we say that the linear system is homogeneous. If $\mathbf{b}$ is nonzero, the system is nonhomogeneous. We have already encountered in theorems 3.2.1 and 3.2.2 the important facts that for any homogeneous first-order linear system $\mathbf{x}' = A\mathbf{x}$, every solution of the form $\mathbf{x}(t) = e^{\lambda t}\mathbf{v}$ requires $(\lambda, \mathbf{v})$ to be an eigenpair of $A$, and that any linear combination of such solutions is also a solution to the system. Just as with individual differential equations, to each system of equations we can associate an initial-value problem. Using the matrix notation (3.3.1), if

we assume that we also have the initial condition $\mathbf{x}(t_0) = \mathbf{x}_0$, then we have the standard initial-value problem

\[ \mathbf{x}'(t) = A(t)\mathbf{x}(t), \quad \mathbf{x}(t_0) = \mathbf{x}_0 \tag{3.3.2} \]

We next consider a theoretical result (whose proof we omit) that will frame our overall work with systems. The following theorem is analogous to the earlier result we encountered in theorem 2.2.1 regarding the existence of a unique solution to the initial-value problem associated with a single first-order differential equation.

Theorem 3.3.1 In (3.3.2), let the entries of the matrix $A(t)$ be continuous functions on a common interval $I$ that contains the value $t_0$. Then there exists a unique solution $\mathbf{x}(t)$ to (3.3.2) on the interval $I$.

In particular, we note that in examples where the matrix $A$ has constant coefficients, the entries are continuous functions, so that the IVP $\mathbf{x}' = A\mathbf{x}$, $\mathbf{x}(0) = \mathbf{x}_0$ is guaranteed to have a unique solution. We now examine this result more closely through a particular example, revisiting a problem we considered in the preceding section.

Example 3.3.1 Determine the unique solution to the IVP given by

\[ \mathbf{x}' = \begin{bmatrix} -2 & -2 \\ -4 & 0 \end{bmatrix}\mathbf{x}, \quad \mathbf{x}(0) = \begin{bmatrix} -5 \\ 3 \end{bmatrix} \tag{3.3.3} \]

Solution. We note, by theorem 3.3.1, that a unique solution exists. Moreover, from our work in example 3.2.2, every function of the form

\[ \mathbf{x}(t) = c_1e^{-4t}\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2e^{2t}\begin{bmatrix} 1 \\ -2 \end{bmatrix} \tag{3.3.4} \]

is a solution to the system $\mathbf{x}' = A\mathbf{x}$. We now explore whether we can find constants $c_1$ and $c_2$ in order that the function $\mathbf{x}(t)$ will satisfy the given initial condition in (3.3.3). The initial condition in (3.3.3) and (3.3.4) together imply

\[ \begin{bmatrix} -5 \\ 3 \end{bmatrix} = \mathbf{x}(0) = c_1e^{0}\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2e^{0}\begin{bmatrix} 1 \\ -2 \end{bmatrix} \]

or equivalently

\[ c_1\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2\begin{bmatrix} 1 \\ -2 \end{bmatrix} = \begin{bmatrix} -5 \\ 3 \end{bmatrix} \tag{3.3.5} \]

We note that since the vectors $[1 \;\; 1]^T$ and $[1 \;\; {-2}]^T$ (which are eigenvectors of $A$) are linearly independent and span $\mathbb{R}^2$, we are guaranteed a unique solution to (3.3.5). Row-reducing the system (3.3.5), we find

\[ \left[\begin{array}{rr|r} 1 & 1 & -5 \\ 1 & -2 & 3 \end{array}\right] \rightarrow \left[\begin{array}{rr|r} 1 & 0 & -\tfrac{7}{3} \\ 0 & 1 & -\tfrac{8}{3} \end{array}\right] \]

Thus, we have shown

\[ \mathbf{x}(t) = -\frac{7}{3}e^{-4t}\begin{bmatrix} 1 \\ 1 \end{bmatrix} - \frac{8}{3}e^{2t}\begin{bmatrix} 1 \\ -2 \end{bmatrix} \]

is the unique solution to the given initial-value problem.

One especially important observation from example 3.3.1 can be made regarding the point at which we solved for the constants $c_1$ and $c_2$: we were guaranteed not only that a solution existed, but also that it was unique, due to the fact that two linearly independent eigenvectors of the $2 \times 2$ matrix $A$ were present in the general solution (3.3.4). Indeed, if we imagine wanting to solve any similar IVP with the freedom to choose any initial vector $\mathbf{x}(0)$, it will be necessary that $\mathbf{x}(0)$ can be written as a linear combination of the vectors $\mathbf{v}_1$ and $\mathbf{v}_2$, whenever the general solution has form

\[ \mathbf{x}(t) = c_1e^{\lambda_1 t}\mathbf{v}_1 + c_2e^{\lambda_2 t}\mathbf{v}_2 \]

This situation is indicative of the general fact that for all $2 \times 2$ linear systems of DEs, we must have two parts to the general solution, in order to be able to uniquely determine the constants $c_1$ and $c_2$. Note further that for the solutions $\mathbf{x}_1(t) = e^{\lambda_1 t}\mathbf{v}_1$ and $\mathbf{x}_2(t) = e^{\lambda_2 t}\mathbf{v}_2$ we encountered above, $\mathbf{x}_1(0) = \mathbf{v}_1$ and $\mathbf{x}_2(0) = \mathbf{v}_2$ are linearly independent and form a basis for $\mathbb{R}^2$. This linear independence of the constant vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ turns out to have an important analog in the linear independence of certain solutions to the system of differential equations.

More generally, we can consider these same issues for an $n \times n$ homogeneous system. Because theorem 3.3.1 guarantees the existence of a unique solution to the corresponding IVP for every initial condition $\mathbf{x}(0) \in \mathbb{R}^n$, when we think about the structure of the general solution, it is natural to think this solution will have form

\[ \mathbf{x}(t) = c_1\mathbf{x}_1(t) + c_2\mathbf{x}_2(t) + \cdots + c_n\mathbf{x}_n(t) \]

where $\{\mathbf{x}_1(0), \mathbf{x}_2(0), \ldots, \mathbf{x}_n(0)\}$ form a basis for $\mathbb{R}^n$.
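To make the step of solving for $c_1$ and $c_2$ in example 3.3.1 concrete, here is a small Python sketch (again only an illustration; the function name `solve_2x2` is our own, and Cramer's rule stands in for the row reduction used in the text). Exact rational arithmetic is used so the constants appear as fractions rather than decimals.

```python
from fractions import Fraction

def solve_2x2(v1, v2, x0):
    """Solve c1*v1 + c2*v2 = x0 by Cramer's rule, in exact arithmetic."""
    det = Fraction(v1[0] * v2[1] - v2[0] * v1[1])
    if det == 0:
        # No unique solution exists when the vectors are linearly dependent.
        raise ValueError("v1 and v2 are linearly dependent")
    c1 = (x0[0] * v2[1] - v2[0] * x0[1]) / det
    c2 = (v1[0] * x0[1] - x0[0] * v1[1]) / det
    return c1, c2

v1, v2 = [1, 1], [1, -2]      # eigenvectors for lambda = -4 and lambda = 2
x0 = [-5, 3]                  # initial condition from (3.3.3)
c1, c2 = solve_2x2(v1, v2, x0)
print(c1, c2)
```

The nonzero determinant here is exactly the linear independence of the two eigenvectors, which is why the constants are uniquely determined for every choice of initial vector.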
These observations, together with our earlier work in theorem 3.2.2 that showed that every linear combination of solutions to the general homogeneous linear system of DEs (3.3.1) is also a solution to (3.3.1), help explain why the set of all solutions to $\mathbf{x}' = A\mathbf{x}$, where $A$ is a matrix with constant coefficients, is a vector space of dimension $n$. We state this formally in the following result.

Theorem 3.3.2 The set of all solution vectors to the homogeneous linear system $\mathbf{x}' = A\mathbf{x}$, where $A$ is an $n \times n$ matrix with constant coefficients, forms a vector space of dimension $n$.

Theorem 3.3.2 shows us that in order to solve an $n \times n$ system of homogeneous first-order DEs, we must find $n$ linearly independent solutions to the system. Said differently, the general solution to $\mathbf{x}' = A\mathbf{x}$ will have form

\[ \mathbf{x}(t) = c_1\mathbf{x}_1(t) + c_2\mathbf{x}_2(t) + \cdots + c_n\mathbf{x}_n(t) \tag{3.3.6} \]

where $\mathbf{x}_1(t), \ldots, \mathbf{x}_n(t)$ are linearly independent functions. Thus, our search for the general solution to the system requires us to find these $n$ linearly independent functions $\mathbf{x}_1(t), \ldots, \mathbf{x}_n(t)$. While we need to discuss in more detail what it means for vector functions (rather than constant vectors) to be linearly independent, we can first note that we know by theorem 3.2.1 that when $(\lambda_i, \mathbf{v}_i)$ is an eigenpair of $A$, the function $\mathbf{x}_i(t) = e^{\lambda_i t}\mathbf{v}_i$ is a solution to $\mathbf{x}' = A\mathbf{x}$. This fact, combined with theorem 3.3.2, implies the result stated in theorem 3.3.3.

Theorem 3.3.3 If $A$ is an $n \times n$ matrix with $n$ linearly independent eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$, with corresponding eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ (where the eigenvalues are not necessarily distinct), then the general solution to $\mathbf{x}' = A\mathbf{x}$ is

\[ \mathbf{x}(t) = c_1e^{\lambda_1 t}\mathbf{v}_1 + c_2e^{\lambda_2 t}\mathbf{v}_2 + \cdots + c_ne^{\lambda_n t}\mathbf{v}_n \tag{3.3.7} \]

The linear independence of $\mathbf{v}_1, \ldots, \mathbf{v}_n$ guarantees that we can solve the IVP $\mathbf{x}' = A\mathbf{x}$, $\mathbf{x}(0) = \mathbf{x}_0$ for every possible choice of $\mathbf{x}_0 \in \mathbb{R}^n$, since we may write $\mathbf{x}_0 = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n$ for a unique set of values $c_1, \ldots, c_n$. This shows that the general solution (3.3.7) indeed captures all possible solutions to the system. In our original study of the eigenvalue problem in section 1.10, we observed (and proved in one of the exercises) that eigenvectors corresponding to distinct (real$^1$) eigenvalues are linearly independent. This yields an important consequence of theorem 3.3.3: if $A$ has $n$ distinct real eigenvalues, then $A$ has $n$ linearly independent (real) eigenvectors. In particular, the following corollary is true.

Corollary 3.3.4 If $A$ is an $n \times n$ matrix with $n$ distinct real eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, then the corresponding eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ are linearly independent and the general solution to $\mathbf{x}' = A\mathbf{x}$ is

\[ \mathbf{x}(t) = c_1e^{\lambda_1 t}\mathbf{v}_1 + c_2e^{\lambda_2 t}\mathbf{v}_2 + \cdots + c_ne^{\lambda_n t}\mathbf{v}_n \tag{3.3.8} \]
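The recipe behind corollary 3.3.4 (check the eigenpairs, then expand the initial vector in the eigenvector basis) can be sketched in a few lines of Python. This is an illustration under our own assumptions, not the text's method: the helper names `matvec` and `solve` are hypothetical, and Gauss-Jordan elimination with exact fractions replaces the Maple commands the text relies on. The data are the $3 \times 3$ matrix $A$ with rows $(-4, 1, -1)$, $(-1, -2, 5)$, $(-3, 3, 0)$ and the initial vector $[1 \;\; {-2} \;\; 3]^T$ of example 3.3.2.

```python
from fractions import Fraction

A = [[-4, 1, -1], [-1, -2, 5], [-3, 3, 0]]
eigenpairs = [(-6, [1, -1, 1]), (-3, [1, 1, 0]), (3, [0, 1, 1])]
x0 = [1, -2, 3]

def matvec(M, v):
    """Return the matrix-vector product M v as a list."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def solve(columns, rhs):
    """Gauss-Jordan elimination on [columns | rhs] using exact fractions.

    Assumes the column vectors are linearly independent; otherwise the
    pivot search below fails with StopIteration.
    """
    n = len(rhs)
    aug = [[Fraction(columns[j][i]) for j in range(n)] + [Fraction(rhs[i])]
           for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [entry / aug[col][col] for entry in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

# Each listed pair satisfies A v = lambda v exactly.
eigen_ok = all(matvec(A, v) == [lam * vi for vi in v] for lam, v in eigenpairs)
# Coefficients c with c1 v1 + c2 v2 + c3 v3 = x0.
coeffs = solve([v for _, v in eigenpairs], x0)
print(eigen_ok, coeffs)
```

With the coefficients in hand, the solution of the IVP is the corresponding combination of the functions $e^{\lambda_i t}\mathbf{v}_i$ in (3.3.8).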

We now consider a specific example in which we see corollary 3.3.4 at work.

Example 3.3.2 Determine the general solution to the homogeneous first-order system of DEs $\mathbf{x}' = A\mathbf{x}$ and determine the unique solution to the initial-value problem

\[ \mathbf{x}' = A\mathbf{x} = \begin{bmatrix} -4 & 1 & -1 \\ -1 & -2 & 5 \\ -3 & 3 & 0 \end{bmatrix}\mathbf{x}, \quad \mathbf{x}(0) = \begin{bmatrix} 1 \\ -2 \\ 3 \end{bmatrix} \]

$^1$ We are interested in real solutions to the system $\mathbf{x}' = A\mathbf{x}$; when eigenvalues and eigenvectors are complex, additional work is needed. See section 3.5.

Solution. We begin by computing the eigenvalues and eigenvectors of $A$. Using the Eigenvectors(A) command in Maple, we find that the eigenvalues of $A$ are $\lambda_1 = -6$, $\lambda_2 = -3$, $\lambda_3 = 3$, with corresponding eigenvectors

\[ \mathbf{v}_1 = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} \]

Since the eigenvalues of $A$ are distinct, we know immediately that the corresponding eigenvectors are linearly independent, and therefore by corollary 3.3.4 that the general solution to the given system is

\[ \mathbf{x}(t) = c_1e^{-6t}\begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix} + c_2e^{-3t}\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + c_3e^{3t}\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} \tag{3.3.9} \]

To solve the IVP with

\[ \mathbf{x}(0) = \begin{bmatrix} 1 \\ -2 \\ 3 \end{bmatrix} \]

we set $t = 0$ in (3.3.9) and apply the given condition, which leads to the vector equation

\[ c_1\begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix} + c_2\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + c_3\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ -2 \\ 3 \end{bmatrix} \]

Writing this equation in augmented matrix form and row-reducing shows that

\[ \left[\begin{array}{rrr|r} 1 & 1 & 0 & 1 \\ -1 & 1 & 1 & -2 \\ 1 & 0 & 1 & 3 \end{array}\right] \rightarrow \left[\begin{array}{rrr|r} 1 & 0 & 0 & 2 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 1 \end{array}\right] \]

and, therefore, the solution to the IVP is

\[ \mathbf{x}(t) = 2e^{-6t}\begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix} - e^{-3t}\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + e^{3t}\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} \]

From corollary 3.3.4, we know that if we have an $n \times n$ matrix $A$ with $n$ linearly independent real eigenvectors, then we can completely solve the system $\mathbf{x}' = A\mathbf{x}$. But what if $A$ lacks $n$ real linearly independent eigenvectors? While we will encounter this situation in more detail in section 3.5, here it is worthwhile to note that we will still be seeking $n$ linearly independent solutions $\mathbf{x}_1(t), \ldots, \mathbf{x}_n(t)$ to the general system. For these vector functions, the fundamental meaning of linear independence remains the same as it does for constant vectors: the set of

vector functions $\{\mathbf{x}_1(t), \ldots, \mathbf{x}_n(t)\}$ is linearly independent if and only if the only values of $c_1, \ldots, c_n$ that make

\[ c_1\mathbf{x}_1(t) + \cdots + c_n\mathbf{x}_n(t) = \mathbf{0} \tag{3.3.10} \]

true for all values of $t$ are $c_1 = \cdots = c_n = 0$. Testing the linear independence of vector functions is more involved; to do so, we introduce a new concept and a corresponding theorem.

Definition 3.3.1 Given vector functions $\mathbf{x}_1(t), \ldots, \mathbf{x}_n(t)$ where each $\mathbf{x}_i(t) \in \mathbb{R}^n$ for all $t$, the Wronskian of these functions is

\[ W[\mathbf{x}_1, \ldots, \mathbf{x}_n] = \det[\mathbf{x}_1, \ldots, \mathbf{x}_n] \tag{3.3.11} \]

That is, the Wronskian of a set of $n$ vector functions, each of which lies in $\mathbb{R}^n$, is the determinant of the $n \times n$ matrix whose columns are $\mathbf{x}_1, \ldots, \mathbf{x}_n$. The Wronskian enables us to easily test whether or not vector functions are linearly independent through the following theorem, which will be stated without proof.

Theorem 3.3.5 Let $\mathbf{x}_1(t), \ldots, \mathbf{x}_n(t)$ be vector functions continuous on an interval $I$, where $\mathbf{x}_i(t) \in \mathbb{R}^n$ for all $t \in I$. If at any point $t_0$ in $I$, $W[\mathbf{x}_1, \ldots, \mathbf{x}_n](t_0) \neq 0$, then $\{\mathbf{x}_1(t), \ldots, \mathbf{x}_n(t)\}$ is linearly independent on $I$.

We observe that this result appears reasonable since it is analogous to two statements that appear in the Invertible Matrix theorem: for a set of $n$ constant vectors in $\mathbb{R}^n$, we know that the set is linearly independent if and only if the determinant of the matrix whose columns are these vectors is nonzero. Theorem 3.3.5 is a generalization of this result to the situation where the vectors are not constant. An example will now demonstrate the use of the Wronskian in showing a set of vector functions is linearly independent.

Example 3.3.3 Consider the vector functions $\mathbf{x}_1 = [e^{-t} \;\; {-e^{-t}} \;\; e^{-t}]^T$, $\mathbf{x}_2 = [3e^{2t} \;\; e^{2t} \;\; {-2e^{2t}}]^T$, and $\mathbf{x}_3 = [e^{5t} \;\; e^{5t} \;\; e^{5t}]^T$. Are $\mathbf{x}_1$, $\mathbf{x}_2$, and $\mathbf{x}_3$ linearly independent?

Solution. We use the Wronskian of $\mathbf{x}_1$, $\mathbf{x}_2$, and $\mathbf{x}_3$ to determine their linear independence. Observe that

\[ \begin{aligned}
W[\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3] &= \det\begin{bmatrix} e^{-t} & 3e^{2t} & e^{5t} \\ -e^{-t} & e^{2t} & e^{5t} \\ e^{-t} & -2e^{2t} & e^{5t} \end{bmatrix} \\
&= e^{-t}(e^{2t}e^{5t} + 2e^{5t}e^{2t}) - 3e^{2t}(-e^{-t}e^{5t} - e^{5t}e^{-t}) + e^{5t}(2e^{2t}e^{-t} - e^{2t}e^{-t}) \\
&= e^{-t}(3e^{7t}) - 3e^{2t}(-2e^{4t}) + e^{5t}(e^{t}) \\
&= 10e^{6t} \neq 0
\end{aligned} \]
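The determinant just computed is easy to spot-check numerically. The Python sketch below is only an illustration (the function names `det3` and `wronskian` are our own): it evaluates the Wronskian of the three vector functions of example 3.3.3 at a few sample times and prints it beside $10e^{6t}$ for comparison.

```python
import math

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def wronskian(t):
    """W[x1, x2, x3](t) for the vector functions of example 3.3.3."""
    x1 = [math.exp(-t), -math.exp(-t), math.exp(-t)]
    x2 = [3 * math.exp(2 * t), math.exp(2 * t), -2 * math.exp(2 * t)]
    x3 = [math.exp(5 * t)] * 3
    # The columns of the matrix are x1, x2, x3, so row r collects the r-th entries.
    M = [[x1[r], x2[r], x3[r]] for r in range(3)]
    return det3(M)

for t in (0.0, 0.5, 1.0):
    print(t, wronskian(t), 10 * math.exp(6 * t))
```

A single nonzero value, for instance the one at $t = 0$, is already enough to apply theorem 3.3.5.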

Since $W[\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3] \neq 0$ for at least one $t$-value (in fact, for all $t$), it follows by theorem 3.3.5 that the functions $\mathbf{x}_1$, $\mathbf{x}_2$, and $\mathbf{x}_3$ are linearly independent.

In conclusion, we now know that when we encounter a homogeneous system of $n$ linear first-order differential equations in $n$ unknown functions, the set of all solutions to the system forms an $n$-dimensional vector space. Hence, we seek $n$ linearly independent solutions to the system $\mathbf{x}' = A\mathbf{x}$. Such a set $\mathbf{x}_1, \ldots, \mathbf{x}_n$ of $n$ linearly independent solution vectors to this system is called a fundamental set. Moreover, given a set of fundamental solutions $\mathbf{x}_1, \ldots, \mathbf{x}_n$ to $\mathbf{x}' = A\mathbf{x}$, on some interval $I$, the general solution to the system is

\[ \mathbf{x}(t) = c_1\mathbf{x}_1 + \cdots + c_n\mathbf{x}_n \]

We have also seen that if an $n \times n$ matrix $A$ has $n$ linearly independent real eigenvectors, then these eigenvectors and their corresponding eigenvalues generate a fundamental set for the system $\mathbf{x}' = A\mathbf{x}$. In subsequent sections we will find that, even in the case when an insufficient number of real eigenvectors exists, the eigenvalue problem enables us to build a fundamental set. Moreover, we will investigate how fundamental solutions allow us to fully understand the graphical behavior of solutions and the stability of equilibrium solutions to the system.

Exercises 3.3

1. If $\mathbf{x}' = A\mathbf{x}$ represents the system of differential equations given by a $4 \times 4$ matrix $A$ with constant entries, how many linearly independent solutions to the system do we need to find in order to determine the general solution? What if $A$ is $7 \times 7$?

2. Consider the second-order differential equation $y'' + y = 0$. Using the substitutions $y = x_1$ and $y' = x_2$, convert the given second-order differential equation to a system of first-order equations. What is the dimension of the solution space to the system? What does this tell you about the dimension of the solution space to the original second-order equation?

3. Consider the third-order differential equation $y''' + 3y'' + 3y' + y = 0$. Using the substitutions $y = x_1$, $y' = x_2$, and $y'' = x_3$, convert the given differential equation to a system of first-order equations. What is the dimension of the solution space to the system? What does this tell you about the dimension of the solution space to the original third-order equation?

In exercises 4–8, use the Wronskian to determine if the given set of vector functions is linearly independent.

4. $\mathbf{x}_1(t) = [e^{-t} \;\; {-e^{-t}}]^T$, $\mathbf{x}_2(t) = [e^{2t} \;\; 2e^{2t}]^T$

5. $\mathbf{x}_1(t) = [\cos t \;\; \sin t]^T$, $\mathbf{x}_2(t) = [\sin t \;\; \cos t]^T$

6. $\mathbf{x}_1(t) = [e^{-t} \;\; {-e^{-t}}]^T$, $\mathbf{x}_2(t) = [-3e^{-t} \;\; 3e^{-t}]^T$

7. $\mathbf{x}_1(t) = [e^{t} \;\; {-e^{t}} \;\; e^{t}]^T$, $\mathbf{x}_2(t) = [e^{7t} \;\; 2e^{7t} \;\; {-3e^{7t}}]^T$, $\mathbf{x}_3(t) = [4e^{-4t} \;\; e^{-4t} \;\; {-e^{-4t}}]^T$

8. $\mathbf{x}_1(t) = [\cos t \;\; {-\sin t} \;\; 0]^T$, $\mathbf{x}_2(t) = [\sin t \;\; \cos t \;\; 0]^T$, $\mathbf{x}_3(t) = [0 \;\; 0 \;\; e^{t}]^T$

9. Explain why for a set of two vector functions, the Wronskian is unneeded to check for linear independence. (Hint: what is the simple test for a pair of constant vectors to be linearly independent?)

10. Let $\mathbf{x}' = A\mathbf{x}$ be given by the matrix
\[ A = \begin{bmatrix} 1 & -2 \\ 1 & -2 \end{bmatrix} \]
(a) Compute the eigenvalues and eigenvectors of $A$. Explain why these enable you to find the general solution to $\mathbf{x}' = A\mathbf{x}$.
(b) State the general solution to the system.
(c) Solve the IVP with the initial condition $\mathbf{x}(0) = [3 \;\; 2]^T$.

11. Let $\mathbf{x}' = A\mathbf{x}$ be given by the matrix
\[ A = \begin{bmatrix} 3 & 1 \\ 0 & 3 \end{bmatrix} \]
(a) Compute the eigenvalues and eigenvectors of $A$. Explain why you have found one linearly independent solution to the system, but still need to determine another.
(b) Verify through direct substitution that $\mathbf{x}_2(t) = te^{3t}[1 \;\; 0]^T + e^{3t}[0 \;\; 1]^T$ is a solution to the given system $\mathbf{x}' = A\mathbf{x}$.
(c) Show that the solution you found in (a) above and the solution $\mathbf{x}_2(t)$ in (b) are linearly independent, and hence state the general solution to the system.
(d) Solve the IVP with the initial condition $\mathbf{x}(0) = [3 \;\; 2]^T$.

12. Let $\mathbf{x}' = A\mathbf{x}$ be given by the matrix
\[ A = \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix} \]
(a) Compute the eigenvalues and eigenvectors of $A$. Explain why, despite the repeated eigenvalue, you have found two linearly independent solutions to the system.
(b) State the general solution to the system.

(c) Solve the IVP with the initial condition $\mathbf{x}(0) = [3 \;\; 2]^T$.
(d) Explain how you could solve the original system given in this problem without using eigenvalues and eigenvectors.

13. Let $\mathbf{x}' = A\mathbf{x}$ be given by the matrix
\[ A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \]
(a) Compute the eigenvalues and eigenvectors of $A$. Explain why the eigenvalues and eigenvectors do not produce any real linearly independent solutions to the system.
(b) Verify through direct substitution that $\mathbf{x}_1(t) = [\cos t \;\; \sin t]^T$ and $\mathbf{x}_2(t) = [-\sin t \;\; \cos t]^T$ are solutions to the given system $\mathbf{x}' = A\mathbf{x}$.
(c) Show that the solutions you verified in (b) are linearly independent, and hence state the general solution to the system.
(d) Solve the IVP with the initial condition $\mathbf{x}(0) = [3 \;\; 2]^T$.

14. Let $\mathbf{x}' = A\mathbf{x}$ be given by the matrix
\[ A = \begin{bmatrix} 5 & 6 & 2 \\ 0 & -1 & -8 \\ 1 & 0 & -2 \end{bmatrix} \]
(a) Compute the eigenvalues and eigenvectors of $A$. Explain why your work determines two linearly independent solutions to the system, but that one additional linearly independent solution remains to be found.
(b) Verify through direct substitution that $\mathbf{x}_3(t) = te^{3t}[5 \;\; {-2} \;\; 1]^T + e^{3t}[1 \;\; 1/2 \;\; 0]^T$ is a solution to the given system $\mathbf{x}' = A\mathbf{x}$.
(c) Show that the set of three solutions from (a) and (b) is linearly independent, and hence state the general solution to the system.
(d) Solve the IVP with the initial condition $\mathbf{x}(0) = [3 \;\; 2 \;\; 1]^T$.

15. Consider the second-order differential equation $y'' + y = 0$. Convert this equation to a system of first-order equations and solve the system. Use your work to state the general solution $y$ to the original equation. (Hint: See exercise 13.)

16. Convert the second-order differential equation $y'' + 3y' + 2y = 0$ to a system of first-order equations and solve the system. Use your work to state the general solution $y$ to the original equation.

17. Convert the third-order differential equation y − y = 0 to a system of first-order equations and solve the system. Use your work to state the general solution $y$ to the original equation.

3.4 Systems with all real linearly independent eigenvectors

In this section, we closely examine the graphical and long-term behavior of solutions to $2 \times 2$ systems in the case where the coefficient matrix $A$ has two real, linearly independent eigenvectors. We do so through a sequence of examples that demonstrate a variety of possibilities that naturally lead to discussion of the stability of equilibrium solutions.

We first review the graphical behavior of vector functions, a subject normally encountered in multivariable calculus. For the system $\mathbf{x}' = A\mathbf{x}$ in the case where $A$ is $2 \times 2$, every solution $\mathbf{x}(t)$ is a vector function whose output lies in $\mathbb{R}^2$. In particular, the graph of $\mathbf{x}(t)$ is the curve that is traced out by the vectors $\mathbf{x}(t)$ at various times $t$. For example, if

\[ \mathbf{x}(t) = e^{-t}\begin{bmatrix} 1 \\ 0 \end{bmatrix} + e^{t}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} e^{-t} \\ e^{t} \end{bmatrix} \tag{3.4.1} \]

is a function we have found by solving a system of differential equations, then evaluating $\mathbf{x}(t)$ at $t = -1$, $0$, and $1$ yields the vectors

\[ \mathbf{x}(-1) \approx \begin{bmatrix} 2.718 \\ 0.368 \end{bmatrix}, \quad \mathbf{x}(0) = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad \text{and} \quad \mathbf{x}(1) \approx \begin{bmatrix} 0.368 \\ 2.718 \end{bmatrix} \tag{3.4.2} \]

Plotting these vectors helps indicate how $\mathbf{x}(t)$ traces out the parametric curve given by $(x_1(t), x_2(t)) = (e^{-t}, e^{t})$, shown at left in figure 3.4. In addition, it is important to recall the meaning of $\mathbf{x}'(t)$, the derivative of a vector function. The direction of the vector $\mathbf{x}'(t)$ indicates the instantaneous direction of motion of a particle traveling along the curve traced out by $\mathbf{x}(t)$, while the magnitude of $\mathbf{x}'(t)$ determines the instantaneous speed of the particle at time $t$. For our purposes, the direction of motion is most important because

4

x2

4

(0.368, 2.719) (1,1) −4

−4

x2 (0.368, 2.719) (1,1)

(2.719, 0.368) x1 4 −4

(2.719, 0.368) x1 4

−4

Figure 3.4 At left, the solution curve x(t ) given in (3.4.1). At right, the solution curve x(t ) given in (3.4.1), along with corresponding scaled derivative vectors at times t = −1, t = 0, and t = 1.

212

Linear systems of differential equations

this indicates a ﬂow along the solution curve as time increases. Thus, rather than plotting the vector x (t ) at various times, we plot scaled versions of it, each emanating from the tip of x(t ). For example, since −t −e x (t ) = (3.4.3) et it follows that x (−1) ≈

−2.719 −1 −0.368 , x (0) = , and x (1) ≈ 0.368 1 2.719

(3.4.4)

Plotting scaled versions of each of these vectors emanating from $\mathbf{x}(-1)$, $\mathbf{x}(0)$, and $\mathbf{x}(1)$, respectively, we see the updated image at the right in figure 3.4. These plots of the derivative vectors and the flow of the solution curve remind us of our earlier work with slope fields for single differential equations. Indeed, since a solution curve such as $\mathbf{x}(t)$ will always be the result of solving some differential equation $\mathbf{x}' = A\mathbf{x}$, we realize that we have a formula for $\mathbf{x}'$, just as we had a formula for $y'$ in examples like $y' = -2y$. In the example discussed above, we can view $\mathbf{x}(t)$ as being the solution to the system $\mathbf{x}' = A\mathbf{x}$ where $A$ is the matrix
$$A = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} \tag{3.4.5}$$
so that $\mathbf{x}'(t)$ satisfies the equation

$$\begin{bmatrix} x_1'(t) \\ x_2'(t) \end{bmatrix} = \mathbf{x}'(t) = A\mathbf{x}(t) = \begin{bmatrix} -x_1(t) \\ x_2(t) \end{bmatrix} \tag{3.4.6}$$

In particular, (3.4.6) indicates how, for any point $(x_1, x_2)$ in the plane, we can easily compute $\mathbf{x}'$ at that point, and hence know the direction of the flow of the solution curve that passes through that point. Using a computer to conduct such computations at points sampled throughout the plane (with each resulting vector scaled to be of equal length), we get a picture of the so-called direction field for the system, shown at left in figure 3.5, which is analogous to a direction field for a single differential equation. If we now superimpose the solution curve from figure 3.4 on the direction field, as shown at right in figure 3.5, we see clearly the role that the derivative $\mathbf{x}'$ and the direction field play in determining the graph of the solution $\mathbf{x}$, as well as the typical behavior of a solution as time increases.

The $x_1$–$x_2$ plane is usually called the phase plane; note that the independent variable $t$ is implicit in the flow, while the behavior of the curve relative to the coordinate axes demonstrates the interrelationship between the components $x_1(t)$ and $x_2(t)$ of the solution $\mathbf{x}(t)$. Sample solution curves, such as the one plotted in figure 3.5, are typically called trajectories. Each distinct trajectory is a solution to an initial-value problem; the one in figure 3.5 can be viewed as the solution to $\mathbf{x}' = A\mathbf{x}$, $\mathbf{x}(0) = [1 \ \ 1]^T$.
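The computation behind a direction field can be sketched in a few lines. The text itself uses Maple; the following is only an illustrative Python sketch, and the helper names `derivative`, `unit_direction`, and `solution` are assumptions introduced here, not part of the text. It evaluates $\mathbf{x}' = A\mathbf{x}$ for the matrix in (3.4.5) at the three points of (3.4.2) and rescales each derivative vector to unit length, which is the scaling used when a direction field is drawn.

```python
import math

# Coefficient matrix from (3.4.5), stored as a tuple of rows.
A = ((-1.0, 0.0),
     (0.0, 1.0))

def derivative(A, x):
    """Return x' = A x at the point x = (x1, x2)."""
    return (A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1])

def unit_direction(A, x):
    """Scale x' = A x to length 1; an equilibrium point maps to (0, 0)."""
    dx1, dx2 = derivative(A, x)
    length = math.hypot(dx1, dx2)
    if length == 0.0:
        return (0.0, 0.0)
    return (dx1 / length, dx2 / length)

def solution(t):
    """The solution curve (3.4.1): x(t) = (e^{-t}, e^{t})."""
    return (math.exp(-t), math.exp(t))

# Derivative vectors along the trajectory at t = -1, 0, 1, as in (3.4.4).
for t in (-1.0, 0.0, 1.0):
    point = solution(t)
    print(t, point, derivative(A, point), unit_direction(A, point))
```

Sampling `unit_direction` over a grid of points in the phase plane, rather than only along one trajectory, would give the arrows of the direction field itself.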

Figure 3.5 At left, the direction field for the system $\mathbf{x}' = A\mathbf{x}$ given by (3.4.5). At right, the solution to (3.4.5) that is given by (3.4.1). (Both panels show the $x_1$–$x_2$ plane for $-4 \le x_1, x_2 \le 4$.)

We will now explore the direction field, phase plane, and trajectories for several examples of $2 \times 2$ systems of linear differential equations for which the coefficient matrix has two real linearly independent eigenvectors. An important theme throughout will be the long-range behavior of solutions $\mathbf{x}(t)$ as $t \to \infty$. In addition, we will study the equilibrium solutions of each system; a solution $\mathbf{x}(t)$ is an equilibrium or constant solution if and only if $\mathbf{x}(t)$ is constant for all values of $t$.

Example 3.4.1 Consider the system of differential equations given by $\mathbf{x}' = A\mathbf{x}$ where $A = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}$. Compute the eigenvalues and eigenvectors of $A$ and state the general solution to the system. In addition, determine all equilibrium solutions of the system. Finally, plot the direction field for the system, sketch several trajectories, and discuss the long-term behavior of solutions relative to the equilibrium solution(s).

Solution. The Maple command
> Eigenvectors(A)
produces the output
$$\begin{bmatrix} 5 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}$$
so that $A$ has eigenvalues $\lambda_1 = 5$ and $\lambda_2 = 1$, with corresponding eigenvectors $\mathbf{v}_1 = [1 \ \ 1]^T$ and $\mathbf{v}_2 = [-1 \ \ 1]^T$. We therefore know that the general solution to $\mathbf{x}' = A\mathbf{x}$ is
$$\mathbf{x}(t) = c_1 e^{5t}\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2 e^{t}\begin{bmatrix} -1 \\ 1 \end{bmatrix}$$
To find the equilibrium solution(s), we seek all constant vectors $\mathbf{x}$ that satisfy $\mathbf{x}' = A\mathbf{x}$. In this situation, since $\mathbf{x}$ is constant with respect to $t$, we know that $\mathbf{x}' = \mathbf{0}$, so we must solve the system of linear equations given by $A\mathbf{x} = \mathbf{0}$ where
$$A = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}$$
Since $\det(A) = 5 \neq 0$, it follows that $A$ is an invertible matrix, so the only solution to $A\mathbf{x} = \mathbf{0}$ is $\mathbf{x} = \mathbf{0}$. Thus the system has the origin as its only equilibrium solution.

At the end of this section, in subsection 3.4.1, we will show how to use Maple to plot direction fields for systems. In this and subsequent examples, we'll simply provide these plots for discussion. In figure 3.6, we see not only the direction field generated by the system, but also the plots of several trajectories, which are natural to sketch (even by hand, once the direction field is provided) by following the map that the direction field provides. Note particularly the straight-line solutions that follow the eigenvectors $\mathbf{v}_1 = [1 \ \ 1]^T$ and $\mathbf{v}_2 = [-1 \ \ 1]^T$. Moreover, since both eigenvalues are positive, the respective scalar functions $e^{5t}$ and $e^{t}$ both increase without bound as $t \to \infty$. This explains why the flow along each straight-line solution is away from the origin. Indeed, every solution besides the zero solution flows away from the equilibrium solution at the origin.

Figure 3.6 The direction field for the system $\mathbf{x}' = A\mathbf{x}$ of example 3.4.1 along with several trajectories. (The plot shows the $x_1$–$x_2$ plane for $-4 \le x_1, x_2 \le 4$.)

In chapter 2, we considered single autonomous differential equations such as $y' = 2y - 4$. When we found equilibrium solutions to such equations, we also classified their stability based on the behavior exhibited in the direction field. We do likewise with equilibrium solutions for systems. In example 3.4.1, we found that $\mathbf{x} = \mathbf{0}$ is the only equilibrium solution of the system, and that every non-constant solution flows away from $\mathbf{0}$. This shows that $\mathbf{0}$ is an unstable equilibrium, and in this case we naturally call $\mathbf{0}$ a repelling node. We next explore the behavior of a system where both eigenvalues are negative.

Example 3.4.2 Consider the system of differential equations given by $\mathbf{x}' = A\mathbf{x}$ where $A = \begin{bmatrix} -2 & 2 \\ 1 & -3 \end{bmatrix}$. Compute the eigenvalues and eigenvectors of $A$, and state the general solution to the system. In addition, determine all equilibrium solutions to the system. Finally, plot the direction field for the system, sketch several trajectories, and discuss the long-term behavior of solutions relative to the equilibrium solution(s).

Solution. Using Maple, we find that $A$ has eigenvalues $\lambda_1 = -1$ and $\lambda_2 = -4$, with corresponding eigenvectors $\mathbf{v}_1 = [2 \ \ 1]^T$ and $\mathbf{v}_2 = [-1 \ \ 1]^T$. The general solution to $\mathbf{x}' = A\mathbf{x}$ is therefore
$$\mathbf{x}(t) = c_1 e^{-t}\begin{bmatrix} 2 \\ 1 \end{bmatrix} + c_2 e^{-4t}\begin{bmatrix} -1 \\ 1 \end{bmatrix}$$
To find the equilibrium solution, we set $\mathbf{x}' = \mathbf{0}$. Solving the system of linear equations given by $A\mathbf{x} = \mathbf{0}$, we see that since $A$ is an invertible matrix, the only solution to $A\mathbf{x} = \mathbf{0}$ is $\mathbf{x} = \mathbf{0}$, so the system has the origin as its only equilibrium solution. Plotting the direction field and several trajectories, as shown in figure 3.7, we observe that all solutions flow toward the equilibrium solution at the origin. This makes sense due to the presence of the scalar functions $e^{-4t}$ and $e^{-t}$ in the general solution: each approaches $0$ as $t \to \infty$, and thus $\mathbf{x}(t) \to \mathbf{0}$ as $t \to \infty$. Moreover, note the two straight-line solutions that flow along the lines determined by the two eigenvectors $\mathbf{v}_1 = [2 \ \ 1]^T$ and $\mathbf{v}_2 = [-1 \ \ 1]^T$.

Because every non-constant solution to the system in example 3.4.2 approaches the equilibrium solution at $\mathbf{0}$, we say that the origin is a stable equilibrium. Moreover, based on the patterns in the flow, we use the terminology that $\mathbf{0}$ is an attracting node.
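The eigenpairs quoted in examples 3.4.1 and 3.4.2 can also be checked without Maple. The following Python sketch is an illustration only: the function names `real_eigenvalues` and `is_eigenpair` are hypothetical helpers introduced here, and the matrix of example 3.4.2 is taken to be the one with rows $(-2, 2)$ and $(1, -3)$, the reading that agrees with the eigenpairs stated in that example. The eigenvalues come from the characteristic polynomial $\lambda^2 - (\operatorname{tr} A)\lambda + \det A = 0$, and each eigenpair is confirmed by testing $A\mathbf{v} = \lambda\mathbf{v}$.

```python
import math

def real_eigenvalues(A):
    """Real eigenvalues of a 2x2 matrix from lambda^2 - tr*lambda + det = 0.

    Returns the larger eigenvalue first; raises ValueError if they are complex.
    """
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4 * det
    if disc < 0:
        raise ValueError("complex eigenvalues")
    root = math.sqrt(disc)
    return ((tr + root) / 2, (tr - root) / 2)

def is_eigenpair(A, lam, v, tol=1e-12):
    """Check A v = lam v componentwise."""
    (a, b), (c, d) = A
    Av = (a * v[0] + b * v[1], c * v[0] + d * v[1])
    return all(abs(Av[i] - lam * v[i]) <= tol for i in range(2))

A1 = ((3, 2), (2, 3))      # example 3.4.1
A2 = ((-2, 2), (1, -3))    # example 3.4.2 (assumed reading, see above)

print(real_eigenvalues(A1), real_eigenvalues(A2))
print(is_eigenpair(A1, 5, (1, 1)), is_eigenpair(A1, 1, (-1, 1)))
print(is_eigenpair(A2, -1, (2, 1)), is_eigenpair(A2, -4, (-1, 1)))
```

Working from the trace and determinant keeps the sketch free of external libraries; for larger matrices a numerical linear algebra package would be the usual choice.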
In the next example, we study the third case for a $2 \times 2$ linear system of differential equations with two real, nonzero eigenvalues: the eigenvalues have opposite signs.

Example 3.4.3 Let $A = \begin{bmatrix} 3 & -2 \\ 2 & -2 \end{bmatrix}$ and consider the system of differential equations given by $\mathbf{x}' = A\mathbf{x}$. Find the general solution of the system, determine all equilibrium solutions to the system, and plot the direction field for the system. Include sketches of several trajectories and discuss the long-term behavior of solutions relative to the equilibrium solution(s).

Figure 3.7 The direction field for the system $\mathbf{x}' = A\mathbf{x}$ in example 3.4.2 along with several trajectories. (The plot shows the $x_1$–$x_2$ plane for $-4 \le x_1, x_2 \le 4$.)

Solution. We find that $A$ has eigenvalues $\lambda_1 = 2$ and $\lambda_2 = -1$, with corresponding eigenvectors $\mathbf{v}_1 = [2 \ \ 1]^T$ and $\mathbf{v}_2 = [1 \ \ 2]^T$. It follows that the general solution to $\mathbf{x}' = A\mathbf{x}$ is
$$\mathbf{x}(t) = c_1 e^{2t}\begin{bmatrix} 2 \\ 1 \end{bmatrix} + c_2 e^{-t}\begin{bmatrix} 1 \\ 2 \end{bmatrix}$$
Since $A$ is an invertible matrix, the only solution to $A\mathbf{x} = \mathbf{0}$ is $\mathbf{x} = \mathbf{0}$, so the origin is the only equilibrium solution of the system. As figure 3.8 shows, the direction field and various trajectories exhibit a different type of behavior around the origin. In particular, solutions that do not lie on either eigenvector appear to initially flow toward the origin, and then turn away and tend toward the straight-line solution associated with the positive eigenvalue. More specifically, it appears that solutions that do not pass through a point on the line in the direction of the eigenvector $[1 \ \ 2]^T$ are eventually attracted to the line in the direction of the eigenvector $[2 \ \ 1]^T$. This is reasonable since, in the general solution, $e^{-t}$ tends to $0$ as $t \to \infty$, leaving the function $c_1 e^{2t}[2 \ \ 1]^T$ to dominate.

Since some solutions that pass through points near the origin tend away from the origin as $t \to \infty$, the origin is an unstable equilibrium in example 3.4.3. Moreover, as the trajectories remind us of the contour plot in multivariable calculus of a surface whose graph looks like a saddle, we say in this context as well that the origin is a saddle point.

The preceding examples demonstrate the three possible cases for a $2 \times 2$ system with real, nonzero eigenvalues: both positive, both negative, or of opposite signs. Our next example investigates the situation when one eigenvalue is zero.
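Before turning to that case, the three cases for real, nonzero eigenvalues can be collected into a short Python sketch. This is an illustration under stated assumptions, not a procedure from the text: the function name `classify_origin` is hypothetical, it handles only matrices with two distinct real, nonzero eigenvalues (so that two real linearly independent eigenvectors are guaranteed), and the matrix used for example 3.4.2 is the one with rows $(-2, 2)$ and $(1, -3)$, consistent with that example's stated eigenpairs.

```python
import math

def classify_origin(A):
    """Classify the equilibrium at the origin of x' = A x for a 2x2 matrix A
    with two distinct real, nonzero eigenvalues, using the vocabulary of
    examples 3.4.1-3.4.3."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4 * det
    if disc <= 0 or det == 0:
        raise ValueError("eigenvalues are not distinct, real, and nonzero")
    lam1 = (tr + math.sqrt(disc)) / 2
    lam2 = (tr - math.sqrt(disc)) / 2
    if lam1 > 0 and lam2 > 0:
        return "repelling node (unstable)"
    if lam1 < 0 and lam2 < 0:
        return "attracting node (stable)"
    return "saddle point (unstable)"

print(classify_origin(((3, 2), (2, 3))))     # matrix of example 3.4.1
print(classify_origin(((-2, 2), (1, -3))))   # matrix of example 3.4.2
print(classify_origin(((3, -2), (2, -2))))   # matrix of example 3.4.3
```

The function deliberately raises an error for zero, repeated, or complex eigenvalues, since those situations need the separate treatment given later in the text.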

Figure 3.8 The direction field for the system $\mathbf{x}' = A\mathbf{x}$ of example 3.4.3 along with several trajectories. (The plot shows the $x_1$–$x_2$ plane for $-4 \le x_1, x_2 \le 4$.)

Example 3.4.4 For the matrix $A = \begin{bmatrix} -3 & 1 \\ 3 & -1 \end{bmatrix}$ and the corresponding system of differential equations $\mathbf{x}' = A\mathbf{x}$, find the general solution of the system and determine all equilibrium solutions. Furthermore, plot the direction field for the system along with sketches of several trajectories; discuss the long-term behavior of solutions relative to the equilibrium solution(s).

Solution. We first do the standard computations to find that $A$ has eigenvalues $\lambda_1 = -4$ and $\lambda_2 = 0$, with corresponding eigenvectors $\mathbf{v}_1 = [-1 \ \ 1]^T$ and $\mathbf{v}_2 = [1 \ \ 3]^T$. Thus, the general solution to $\mathbf{x}' = A\mathbf{x}$ is
$$\mathbf{x}(t) = c_1 e^{-4t}\begin{bmatrix} -1 \\ 1 \end{bmatrix} + c_2 \begin{bmatrix} 1 \\ 3 \end{bmatrix}$$
We immediately notice something different about $\mathbf{x}(t)$. In particular, because the second eigenvalue is $0$, the scalar function $e^{0t} = 1$ has no effect on the general solution. Furthermore, with $e^{-4t}$ the only part of $\mathbf{x}(t)$ that changes with $t$, we can see that for any nonzero constant $c_1$ and any $c_2$, the graph of $\mathbf{x}(t)$ is always a straight line whose direction is given by the eigenvector corresponding to the nonzero eigenvalue.

In addition, the presence of a zero eigenvalue has a significant impact on the system's equilibrium solutions. The fact that the columns of $A$ are scalar multiples of each other leads us to see immediately that $A$ is not invertible; this can be equivalently deduced from the fact that $A$ has a zero eigenvalue. The singularity of $A$ further implies that the homogeneous equation $A\mathbf{x} = \mathbf{0}$ has infinitely many solutions. In particular, row-reducing the appropriate augmented matrix, we find that
$$\left[\begin{array}{cc|c} -3 & 1 & 0 \\ 3 & -1 & 0 \end{array}\right] \to \left[\begin{array}{cc|c} 1 & -1/3 & 0 \\ 0 & 0 & 0 \end{array}\right]$$
This implies that any constant vector $\mathbf{x}$ of the form
$$\mathbf{x} = x_1 \begin{bmatrix} 1 \\ 3 \end{bmatrix}$$
satisfies the equation $\mathbf{x}' = A\mathbf{x}$, and therefore is an equilibrium solution. Note especially that $\mathbf{x} = x_1[1 \ \ 3]^T$ is an eigenvector associated with $\lambda = 0$ (when $x_1 \neq 0$), and thus every eigenvector associated with the zero eigenvalue is an equilibrium solution to the system.

The interesting behaviors that we have discussed algebraically are seen in figure 3.9. Specifically, every non-constant solution is a straight-line solution in the direction of the eigenvector $[-1 \ \ 1]^T$ that is drawn toward an equilibrium point that lies on the line determined by the eigenvector $[1 \ \ 3]^T$ corresponding to the zero eigenvalue. The flows in figure 3.9, as well as the long-term behavior of the function $e^{-4t}$ in the general solution $\mathbf{x}(t)$, clearly demonstrate that every equilibrium solution to the system is stable. Moreover, we say that each such equilibrium point is an attracting node.

Figure 3.9 The direction field for the system $\mathbf{x}' = A\mathbf{x}$ of example 3.4.4 along with several trajectories. (The plot shows the $x_1$–$x_2$ plane for $-4 \le x_1, x_2 \le 4$.)

There are two important observations to make in closing. One is that we still must address the situations where $A$ lacks two real linearly independent eigenvectors; we will do so in the next section. In addition, examples 3.4.1–3.4.4 indicate that plotting a direction field is perhaps best left to a computer; however, in the case where $A$ has two real, linearly independent eigenvectors, it is a straightforward exercise to use the eigenvectors to plot the straight-line solutions by hand and to use the signs of the corresponding eigenvalues to understand the flows along those straight-line solutions. Then, it is not difficult to imagine the overall appearance of the direction field and sketch several probable trajectories by hand, thus fully understanding the graphical behavior of all solutions to the system.

3.4.1 Plotting direction fields for systems using Maple

We again use the DEtools package, and load it with the command
> with(DEtools):

To plot the direction field associated with a given system of differential equations, we first define the system itself, similar to how we defined a single differential equation in order to plot its slope field. We do this through the following command for the system with coefficient matrix $A = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}$ from example 3.4.1.
> sys := diff(x(t),t) = 3*x(t)+2*y(t), diff(y(t),t) = 2*x(t)+3*y(t);

The system of differential equations of interest is now stored in "sys". While we typically use $x_1(t)$ and $x_2(t)$ to represent the component functions in our discussion of the theory and solution of systems, in working with Maple it is often simpler to use $x(t)$ and $y(t)$. The direction field may now be generated by the command
> DEplot([sys], [x(t),y(t)], t=-1..1, x=-4..4, y=-4..4, arrows=large, color=gray);

This command produces the output shown at left in figure 3.10. From here, it is a straightforward exercise to sketch trajectories by hand. Of course, Maple has the capacity to include trajectories that pass through any initial conditions we choose. For example, if we are interested in the initial conditions $\mathbf{x}(0) = (-2, 0)$, $(0, -2)$, $(2, 0)$, $(0, 2)$, $(0.1, 0.1)$, $(-0.1, -0.1)$, $(0.1, -0.1)$, and $(-0.1, 0.1)$, we can modify the earlier DEplot command to
> DEplot([sys], [x(t),y(t)], t=-1.6..3.6, x=-4..4, y=-4..4,
    arrows=large, color=gray,
    [[x(0)=-2,y(0)=0], [x(0)=0,y(0)=-2], [x(0)=2,y(0)=0], [x(0)=0,y(0)=2],
     [x(0)=0.1,y(0)=0.1], [x(0)=-0.1,y(0)=-0.1], [x(0)=0.1,y(0)=-0.1], [x(0)=-0.1,y(0)=0.1]]);

Figure 3.10 At left, the direction field for the system $\mathbf{x}' = A\mathbf{x}$ of example 3.4.1. At right, the same direction field with several trajectories. (Both panels show the $x_1$–$x_2$ plane for $-4 \le x_1, x_2 \le 4$.)

The results of this most recent DEplot command are shown at right in figure 3.10. As always, the user can experiment with the window in which the plot is displayed: the range of $x$- and $y$-values can affect how clearly the direction field is revealed, and the range of $t$-values determines how much of each trajectory is plotted.

Exercises 3.4

1. Consider the system of differential equations $\mathbf{x}' = A\mathbf{x}$ given by
$$A = \begin{bmatrix} 2 & -1 \\ 3 & -2 \end{bmatrix}$$
(a) Determine the general solution to the system $\mathbf{x}' = A\mathbf{x}$.
(b) Classify the stability of all equilibrium solutions to the system.
(c) Sketch all straight-line solutions to the system and hence plot several nonlinear trajectories in the phase plane.

2. Consider the system of differential equations $\mathbf{x}' = A\mathbf{x}$ given by
$$A = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}$$
(a) Determine the general solution to the system $\mathbf{x}' = A\mathbf{x}$.
(b) Classify the stability of all equilibrium solutions to the system.
(c) Sketch all straight-line solutions to the system and hence plot several nonlinear trajectories in the phase plane.

3. Consider the system of differential equations $\mathbf{x}' = A\mathbf{x}$ given by
$$A = \begin{bmatrix} -3 & 2 \\ 2 & -3 \end{bmatrix}$$
(a) Determine the general solution to the system $\mathbf{x}' = A\mathbf{x}$.
(b) Classify the stability of all equilibrium solutions to the system.
(c) Sketch all straight-line solutions to the system and hence plot several nonlinear trajectories in the phase plane.

4. Consider the system of differential equations $\mathbf{x}' = A\mathbf{x}$ given by
$$A = \begin{bmatrix} -2 & 0 \\ 0 & -2 \end{bmatrix}$$
(a) Determine the general solution to the system $\mathbf{x}' = A\mathbf{x}$.
(b) Classify the stability of all equilibrium solutions to the system.
(c) Sketch the straight-line solutions to the system that correspond to the two linearly independent eigenvectors. Why is every solution to this system also a straight-line solution?

5. Consider the system of differential equations $\mathbf{x}' = A\mathbf{x}$ given by
$$A = \begin{bmatrix} 2 & -2 \\ 1 & -1 \end{bmatrix}$$
(a) Determine the general solution to the system $\mathbf{x}' = A\mathbf{x}$.
(b) Classify the stability of all equilibrium solutions to the system.
(c) Why is every non-constant solution to this system also a straight-line solution? How are these straight-line solutions related to the eigenvectors of the system?

In exercises 6–9, let $\mathbf{x}(t)$ be the stated general solution to some system $\mathbf{x}' = A\mathbf{x}$. State the straight-line solutions to the system, classify the stability of the origin, and sketch some sample trajectories.

6. $\mathbf{x}(t) = c_1 e^{-2t}\begin{bmatrix} 1 \\ 3 \end{bmatrix} + c_2 e^{-5t}\begin{bmatrix} 3 \\ 1 \end{bmatrix}$

7. $\mathbf{x}(t) = c_1 e^{4t}\begin{bmatrix} 1 \\ 2 \end{bmatrix} + c_2 e^{-3t}\begin{bmatrix} -1 \\ 2 \end{bmatrix}$

8. $\mathbf{x}(t) = c_1 e^{2t}\begin{bmatrix} 2 \\ 1 \end{bmatrix} + c_2 \begin{bmatrix} 1 \\ -1 \end{bmatrix}$

9. $\mathbf{x}(t) = c_1 e^{0.1t}\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2 e^{10t}\begin{bmatrix} -1 \\ 1 \end{bmatrix}$

10. For the system $\mathbf{x}' = A\mathbf{x}$ whose general solution is given in exercise 6, determine a possible matrix $A$ for the system. (Hint: If $A$ is a matrix with all real linearly independent eigenvectors and those eigenvectors are the columns of a matrix $P$, then $A$ satisfies the equation $AP = PD$, where $D$ is the diagonal matrix whose entries are the eigenvalues of $A$ in order corresponding to the eigenvectors in the columns of $P$.)

11. For the system $\mathbf{x}' = A\mathbf{x}$ whose general solution is given in exercise 7, determine a possible matrix $A$ for the system.

12. Consider the four systems of equations given by $\mathbf{x}' = A\mathbf{x}$ where $A$ is given by the matrices I, II, III, and IV below. Match each system with one of the four direction field plots (a), (b), (c), and (d) given below. For each, write one sentence to explain the reasoning behind your choice.
$$\text{I. } A = \begin{bmatrix} 5 & 3 \\ 3 & 5 \end{bmatrix} \qquad \text{II. } A = \begin{bmatrix} 2 & -4 \\ 7 & 2 \end{bmatrix} \qquad \text{III. } A = \begin{bmatrix} 2 & 7 \\ 3 & -6 \end{bmatrix} \qquad \text{IV. } A = \begin{bmatrix} 2 & 3 \\ -1 & 2 \end{bmatrix}$$

(Direction field plots (a), (b), (c), and (d) for exercise 12; each shows the $x_1$–$x_2$ plane for $-4 \le x_1, x_2 \le 4$.)

In exercises 13–17, solve the IVP $\mathbf{x}' = A\mathbf{x}$ with the given matrix $A$ and stated initial condition.

13. $A = \begin{bmatrix} 2 & -1 \\ 3 & -2 \end{bmatrix}$, $\mathbf{x}(0) = [1 \ \ 2]^T$

14. $A = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}$, $\mathbf{x}(0) = [-3 \ \ 1]^T$

15. $A = \begin{bmatrix} -3 & 2 \\ 2 & -3 \end{bmatrix}$, $\mathbf{x}(0) = [1 \ \ {-2}]^T$

16. $A = \begin{bmatrix} -2 & 0 \\ 0 & -2 \end{bmatrix}$, $\mathbf{x}(0) = [-2 \ \ {-2}]^T$

17. $A = \begin{bmatrix} 2 & -2 \\ 1 & -1 \end{bmatrix}$, $\mathbf{x}(0) = [1 \ \ 4]^T$

In exercises 18–22, use the standard substitution to convert the given second-order differential equation to a system of two linear first-order equations. Solve the system and hence determine the solution $y$ to the second-order equation.

18. $y'' - y' - 6y = 0$
19. $y'' - 6y' + 5y = 0$
20. $y'' + 4y' = 0$
21. $y'' + 3y' + 2y = 0$
22. $y'' + y' = 0$

3.5 When a matrix lacks two real linearly independent eigenvectors

We have seen repeatedly, both in theory and in specific examples, that when a $2 \times 2$ matrix $A$ has two real linearly independent eigenvectors, we can determine the general solution to $\mathbf{x}' = A\mathbf{x}$ and its graphical behavior. In this section, we address two remaining cases: when $A$ has a repeated eigenvalue and only one associated real linearly independent eigenvector, and when $A$ has complex eigenvalues and eigenvectors. In each case, we work through preliminary examples to discover general patterns and principles, expand these principles with appropriate theorems, and explore and discuss graphical behavior along the way.

Example 3.5.1 Consider the system of differential equations given by $\mathbf{x}' = A\mathbf{x}$ where $A = \begin{bmatrix} -2 & 1 \\ 0 & -2 \end{bmatrix}$. Compute the eigenvalues and eigenvectors of $A$ and explain why this alone does not lead to the general solution of the system. By noting that the system is partially coupled, solve the system and determine a second real, linearly independent solution. Finally, state the general solution.

Solution. By inspection, since $A$ is a triangular matrix, we see that $\lambda = -2$ is a repeated eigenvalue of $A$ with multiplicity 2. From this, we deduce that $\mathbf{v}_1 = [1 \ \ 0]^T$ is a corresponding eigenvector, and therefore one solution to $\mathbf{x}' = A\mathbf{x}$ is $\mathbf{x}_1 = c_1 e^{-2t}[1 \ \ 0]^T$. However, $A$ lacks a second linearly independent eigenvector associated with $\lambda = -2$; therefore, we need to find a second real linearly independent solution to the system in order to determine the general solution to $\mathbf{x}' = A\mathbf{x}$.

In this example, we are fortunate that the system is only partially coupled, so we may solve the system directly by using techniques for single differential equations from chapter 2. In particular, noting that the second equation in the system is $x_2' = -2x_2$, it follows immediately that the solution to this single differential equation is $x_2(t) = ce^{-2t}$. Substituting this result into the equation $x_1' = -2x_1 + x_2$, it remains for us to solve the single nonhomogeneous linear first-order differential equation
$$x_1' = -2x_1 + ce^{-2t}$$
Applying our understanding of such equations from section 2.3, via the integrating factor $v(t) = e^{2t}$ we know that
$$x_1(t) = \frac{1}{e^{2t}} \int e^{2t} \cdot ce^{-2t} \, dt = e^{-2t}(ct + k)$$
To summarize, with $x_1(t)$ and $x_2(t)$ as the components of $\mathbf{x}(t)$, we have found that a solution to the system is
$$\mathbf{x}(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} = \begin{bmatrix} e^{-2t}(ct + k) \\ ce^{-2t} \end{bmatrix} \tag{3.5.1}$$
If we factor this expression to write $\mathbf{x}(t)$ as a linear combination of two vectors in order to more clearly identify the role of the constants in (3.5.1), we see
$$\mathbf{x}(t) = k\begin{bmatrix} e^{-2t} \\ 0 \end{bmatrix} + c\begin{bmatrix} te^{-2t} \\ e^{-2t} \end{bmatrix} \tag{3.5.2}$$
In this form, two key observations can be made. First, each individual vector in (3.5.2) may be verified to be a solution to the given system.
Moreover, these two vectors are linearly independent. Hence, (3.5.2) is the general solution to the given system.

While it is good that we were able to solve the system in example 3.5.1, it is still unclear how we will proceed in similar circumstances when neither equation in the system may be solved by techniques for single first-order equations. That is, if the equation for $x_1'$ involves $x_2$ and the equation for $x_2'$ involves $x_1$, but the system's matrix has only one linearly independent eigenvector, we cannot employ the approach used in example 3.5.1. However, the general form of the solution (3.5.2) can help us guess an appropriate form of the needed second linearly independent solution in the more general case.

Recall that whenever $(\lambda, \mathbf{v})$ is a real eigenpair of $A$, the function $\mathbf{x}(t) = e^{\lambda t}\mathbf{v}$ is a solution to $\mathbf{x}' = A\mathbf{x}$, and moreover $\mathbf{x}(t)$ is a straight-line solution to the system. In example 3.5.1, we found that for the given matrix, which had a repeated eigenvalue and only one associated linearly independent eigenvector, the scalar function $te^{\lambda t}$ arose in the solution. If we recall that our original work with $e^{\lambda t}\mathbf{v}$ arose from guessing that a function of the form $f(t)\mathbf{v}$ was a solution to $\mathbf{x}' = A\mathbf{x}$, example 3.5.1 now suggests that in the case where we are missing an eigenvector, we consider a vector function that somehow involves the scalar function $te^{\lambda t}$ as a second linearly independent solution to $\mathbf{x}' = A\mathbf{x}$.

A closer look at (3.5.2) suggests the form of this second solution we seek. In particular, recalling that the matrix $A$ in example 3.5.1 had $\mathbf{v}_1 = [1 \ \ 0]^T$ as the eigenvector corresponding to $\lambda = -2$, rewriting (3.5.2) reveals the role $\mathbf{v}_1$ plays in the general solution. Specifically,
$$\mathbf{x}(t) = ke^{-2t}\begin{bmatrix} 1 \\ 0 \end{bmatrix} + cte^{-2t}\begin{bmatrix} 1 \\ 0 \end{bmatrix} + ce^{-2t}\begin{bmatrix} 0 \\ 1 \end{bmatrix} \tag{3.5.3}$$
and since $\mathbf{x}_1(t) = e^{-2t}[1 \ \ 0]^T$ is the standard solution that arises through the eigenpair, we see from (3.5.3) that the second linearly independent solution
$$\mathbf{x}_2(t) = te^{-2t}\begin{bmatrix} 1 \\ 0 \end{bmatrix} + e^{-2t}\begin{bmatrix} 0 \\ 1 \end{bmatrix}$$
has the form $te^{-2t}\mathbf{v} + e^{-2t}\mathbf{u}$, where $\mathbf{u}$ is not an eigenvector of $A$ corresponding to $\lambda = -2$. This suggests a form for the second solution when this case arises in general.

We now consider this situation for an arbitrary matrix with the appropriate properties. Let $A$ be a $2 \times 2$ matrix with a single real, repeated eigenvalue $\lambda$ with only one linearly independent eigenvector $\mathbf{v}$.
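Note specifically that we know $A\mathbf{v} = \lambda\mathbf{v}$ and $\mathbf{x}_1(t) = e^{\lambda t}\mathbf{v}$ is a solution to $\mathbf{x}' = A\mathbf{x}$. Now consider a second function
$$\mathbf{x}_2(t) = te^{\lambda t}\mathbf{v} + e^{\lambda t}\mathbf{u} \tag{3.5.4}$$
where $\mathbf{u}$ is an unknown constant vector and $(\lambda, \mathbf{v})$ remains an eigenpair of $A$. We seek conditions on $\mathbf{u}$ that will make $\mathbf{x}_2(t)$ a solution to $\mathbf{x}' = A\mathbf{x}$; as we have previously encountered in several instances, direct substitution into the differential equation reveals the constraints on $\mathbf{u}$. First, differentiating (3.5.4) gives
$$\mathbf{x}_2'(t) = (\lambda te^{\lambda t} + e^{\lambda t})\mathbf{v} + \lambda e^{\lambda t}\mathbf{u} \tag{3.5.5}$$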

Next, observe that multiplying $\mathbf{x}_2(t)$ by $A$ yields
$$A\mathbf{x}_2(t) = A(te^{\lambda t}\mathbf{v} + e^{\lambda t}\mathbf{u}) = te^{\lambda t}(A\mathbf{v}) + e^{\lambda t}(A\mathbf{u}) \tag{3.5.6}$$

In order for $\mathbf{x}_2(t)$ to be a solution to $\mathbf{x}' = A\mathbf{x}$, it follows from (3.5.5) and (3.5.6) that we require the equality
$$(\lambda te^{\lambda t} + e^{\lambda t})\mathbf{v} + \lambda e^{\lambda t}\mathbf{u} = te^{\lambda t}(A\mathbf{v}) + e^{\lambda t}(A\mathbf{u}) \tag{3.5.7}$$

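to hold. Using the fact that $A\mathbf{v} = \lambda\mathbf{v}$ and expanding, we find
$$\lambda te^{\lambda t}\mathbf{v} + e^{\lambda t}\mathbf{v} + \lambda e^{\lambda t}\mathbf{u} = \lambda te^{\lambda t}\mathbf{v} + e^{\lambda t}(A\mathbf{u}) \tag{3.5.8}$$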

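With $\lambda te^{\lambda t}\mathbf{v}$ present on both sides of (3.5.8), we can simplify the equality to
$$e^{\lambda t}\mathbf{v} + \lambda e^{\lambda t}\mathbf{u} = e^{\lambda t}(A\mathbf{u}) \tag{3.5.9}$$
Since $e^{\lambda t}$ is never zero, we observe from (3.5.9) that $\mathbf{u}$ must satisfy the equation
$$\mathbf{v} + \lambda\mathbf{u} = A\mathbf{u} \tag{3.5.10}$$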

In other words, $(A - \lambda I)\mathbf{u} = \mathbf{v}$, where (as we assumed earlier) $\mathbf{v}$ is an eigenvector of $A$ that corresponds to the eigenvalue $\lambda$. In particular, note that $\mathbf{v}$ satisfies the equation $(A - \lambda I)\mathbf{v} = \mathbf{0}$. We summarize our work above in the following theorem.

Theorem 3.5.1 If $A$ is a $2 \times 2$ matrix with repeated eigenvalue $\lambda$ and only one corresponding linearly independent eigenvector $\mathbf{v}$, then the general solution to $\mathbf{x}' = A\mathbf{x}$ is given by
$$\mathbf{x}(t) = c_1 e^{\lambda t}\mathbf{v} + c_2 e^{\lambda t}(t\mathbf{v} + \mathbf{u})$$
where $\mathbf{u}$ satisfies the equation $(A - \lambda I)\mathbf{u} = \mathbf{v}$.

The vector $\mathbf{u}$ is often called a generalized eigenvector of $A$ corresponding to $\lambda$. We now demonstrate the role of theorem 3.5.1 in the following example.

Example 3.5.2 Let $A = \begin{bmatrix} 1 & 4 \\ -1 & 5 \end{bmatrix}$ and consider the system of differential equations given by $\mathbf{x}' = A\mathbf{x}$. Find the general solution of the system, determine all equilibrium solutions to the system, and plot the direction field for the system. Include sketches of several trajectories and discuss the long-term behavior of solutions relative to the equilibrium solution(s).

Solution. We find that $A$ has a single repeated eigenvalue $\lambda = 3$ with just one corresponding linearly independent eigenvector $\mathbf{v} = [2 \ \ 1]^T$. Thus, one solution to $\mathbf{x}' = A\mathbf{x}$ is $\mathbf{x}_1(t) = e^{3t}\mathbf{v}$. Applying theorem 3.5.1, we determine a second linearly independent solution to the system. Specifically, we first solve the vector equation $(A - 3I)\mathbf{u} = \mathbf{v}$. To do so, we row-reduce the appropriate augmented matrix and find
$$\left[\begin{array}{cc|c} -2 & 4 & 2 \\ -1 & 2 & 1 \end{array}\right] \to \left[\begin{array}{cc|c} 1 & -2 & -1 \\ 0 & 0 & 0 \end{array}\right]$$
It follows that the vector $\mathbf{u}$ must have components $u_1$ and $u_2$ that satisfy the equation $u_1 = 2u_2 - 1$, where $u_2$ is a free variable. Since we only need one such vector $\mathbf{u}$, we choose $u_2 = 0$ and thus $u_1 = -1$. From theorem 3.5.1, it now follows that a second linearly independent solution to $\mathbf{x}' = A\mathbf{x}$ is given by the function $\mathbf{x}_2(t) = e^{3t}(t\mathbf{v} + \mathbf{u})$. In particular, the general solution to $\mathbf{x}' = A\mathbf{x}$ is
$$\mathbf{x}(t) = c_1 e^{3t}\begin{bmatrix} 2 \\ 1 \end{bmatrix} + c_2 e^{3t}\left( t\begin{bmatrix} 2 \\ 1 \end{bmatrix} + \begin{bmatrix} -1 \\ 0 \end{bmatrix} \right)$$
We note further that since $A$ is an invertible matrix, the only solution to $A\mathbf{x} = \mathbf{0}$ is $\mathbf{x} = \mathbf{0}$, so the origin is the only equilibrium solution of the system. As figure 3.11 shows, the direction field and several trajectories exhibit behavior consistent with the fact that the system has just one straight-line solution, the one that corresponds to the single linearly independent eigenvector of $A$. Note as well that since the system's only eigenvalue is positive, every non-constant solution flows away from the origin as $t \to \infty$.

Figure 3.11 The direction field for the system $\mathbf{x}' = A\mathbf{x}$ of example 3.5.2 along with several trajectories. (The plot shows the $x_1$–$x_2$ plane for $-4 \le x_1, x_2 \le 4$.)

In example 3.5.2, the origin is therefore an unstable equilibrium solution. Because there is only one linearly independent eigenvector for the system, we call the origin a degenerate node, and in this case where $\lambda = 3 > 0$ and all the trajectories flow away from the origin, this degenerate node is also called a repelling node.

We now consider an example that reveals the other possible situation that can arise when a matrix $A$ lacks two real linearly independent eigenvectors: when $A$ has no real eigenvalues and no real eigenvectors.
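Before moving to that case, theorem 3.5.1 can be checked numerically for examples 3.5.1 and 3.5.2. The following Python sketch is illustrative only; all function names are hypothetical helpers introduced here, and the matrix of example 3.5.1 is taken to have rows $(-2, 1)$ and $(0, -2)$, the reading that matches the equations $x_1' = -2x_1 + x_2$ and $x_2' = -2x_2$ used in that example. It tests the condition $(A - \lambda I)\mathbf{u} = \mathbf{v}$ and then compares the derivative formula (3.5.5) with $A\mathbf{x}_2(t)$ at a few sample times.

```python
import math

def matvec(A, x):
    """Product A x for a 2x2 matrix A and a vector x."""
    return (A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1])

def is_generalized_eigenvector(A, lam, v, u, tol=1e-12):
    """Check (A - lam I) u = v, the condition on u in theorem 3.5.1."""
    Au = matvec(A, u)
    return all(abs(Au[i] - lam * u[i] - v[i]) <= tol for i in range(2))

def x2(lam, v, u, t):
    """Second solution x2(t) = e^{lam t} (t v + u)."""
    g = math.exp(lam * t)
    return (g * (t * v[0] + u[0]), g * (t * v[1] + u[1]))

def x2_prime(lam, v, u, t):
    """Derivative of x2 as in (3.5.5)."""
    g = math.exp(lam * t)
    return tuple((lam * t * g + g) * v[i] + lam * g * u[i] for i in range(2))

def residual(A, lam, v, u, t):
    """Largest entry of |x2'(t) - A x2(t)|; near zero for a solution."""
    lhs = x2_prime(lam, v, u, t)
    rhs = matvec(A, x2(lam, v, u, t))
    return max(abs(lhs[i] - rhs[i]) for i in range(2))

A1 = ((-2, 1), (0, -2))   # example 3.5.1: lam = -2, v = [1 0]^T, u = [0 1]^T
A2 = ((1, 4), (-1, 5))    # example 3.5.2: lam = 3,  v = [2 1]^T, u = [-1 0]^T

print(is_generalized_eigenvector(A1, -2, (1, 0), (0, 1)))
print(is_generalized_eigenvector(A2, 3, (2, 1), (-1, 0)))
print([residual(A2, 3, (2, 1), (-1, 0), t) for t in (-1.0, 0.0, 0.5, 1.0)])
```

A check at finitely many times is of course not a proof; the algebra leading to (3.5.10) is what establishes the result for all $t$.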

228

Linear systems of differential equations

Example 3.5.3

Consider the system x = Ax given by the matrix 0 −1 A= 1 0

Compute the eigenvalues and eigenvectors of A and explain why this does not lead directly to the general solution of the system. In addition, plot the direction ﬁeld for the system to conﬁrm these observations from a graphical perspective. Using familiarity with solutions to single differential equations and the form of the equations for the given system, determine the general solution to the system. Solution. The eigenvalues of the matrix A are computed using the characteristic equation −λ −1 = λ2 + 1 = 0 det(A − λI) = det 1 −λ √ We see that λ2 = −1, so that λ = ±i, where i is the complex number2 i = −1. To determine the eigenvector associated with λ = i, we solve (A − iI)v = 0. Row-reducing the appropriate matrix with complex entries just as we would a matrix with real entries, we observe 1 −i 0 1 −i 0 −i − 1 0 → → 1 −i 0 0 0 0 −i −1 0 where the ﬁrst step was achieved by swapping the two rows, while the last step was achieved by computing the row replacement iR1 + R2 → R2 . It follows that any eigenvector v associated with λ = i must have components v1 and v2 that satisfy v1 = iv2 . Choosing v 2 = 1, we see that an eigenvector v corresponding to λ = i is v = [i 1]T . Similar computations with λ = −i show that a corresponding eigenvector is v = [−i 1]T . While we might suggest at this point that i x(t ) = e it 1 is a solution to x = Ax, such a solution involves the complex number i, and is not a real solution to the system. A plot of the direction ﬁeld for the system reveals further why no real solutions arise directly from the eigenvectors. In particular, if we examine ﬁgure 3.12, the direction ﬁeld and various trajectories exhibit behavior consistent with the fact that the system has no straight-line solutions due to the fact that it has no real eigenpairs: every trajectory appears to be circular. In this example, we will suspend our work with eigenvalues and eigenvectors and see whether we can determine a solution to the system more directly. 
If we examine the two equations given in the system x = Ax, we observe that we 2

A review of key concepts with complex numbers may be found in appendix B.

When a matrix lacks two real linearly independent eigenvectors

4

229

x2

x1 −4

4

−4 Figure 3.12 The direction ﬁeld for the system

x = Ax of example 3.5.3.

are trying to solve the two equations x1 = −x2 and x2 = x1 simultaneously. In particular, we seek two functions x1 (t ) and x2 (t ) such that the derivative of the ﬁrst is the opposite of the second and the derivative of the second is the ﬁrst. This is a familiar scenario encountered in calculus and we recognize that x1 (t ) = cos t and x2 (t ) = sin t form a pair of such functions. Further consideration reveals that the choices x1 (t ) = − sin t and x2 (t ) = cos t also satisfy the system. Our recent observations show that the vector functions cos t − sin t and x2 (t ) = x1 (t ) = sin t cos t each form a real solution to x = Ax; moreover, it is clear that x1 (t ) and x2 (t ) are not scalar multiples of one another, and thus these are two linearly independent solutions to the system. Therefore, theorem 3.3.2 implies that the general solution to the given system is cos t − sin t x(t ) = c1 (3.5.11) + c2 sin t cos t The presence of the sine and cosine functions in the entries of x will also lead to the circular trajectories we expect from the direction ﬁeld in ﬁgure 3.12. Example 3.5.3 shows several new phenomena. In every preceding example we have considered for 2 × 2 systems x = Ax, eigenpairs have directly provided at least one real solution to the system. But for the latest system we examined, the eigenpairs appeared to not produce any solutions to the system at all. Moreover, for the ﬁrst time in our work with linear systems, the sine and cosine


functions arose. An important question to consider at this point is whether the complex eigenpair
\[ \lambda = i, \quad \mathbf{v} = \begin{bmatrix} i \\ 1 \end{bmatrix} \tag{3.5.12} \]
can be linked to the general solution that we found in (3.5.11). It turns out that the key idea lies in understanding how the exponential function $e^z$ behaves when the input $z$ is a complex number. The great Swiss mathematician Leonhard Euler (1707–1783) is credited with discovering Euler's formula, which states that for any real number $t$,
\[ e^{it} = \cos t + i \sin t \tag{3.5.13} \]
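Euler's formula (3.5.13) is easy to test numerically before accepting it. The sketch below compares the two sides at a handful of sample values of $t$; the function name `euler_gap` and the chosen sample times are assumptions made only for illustration.

```python
import cmath
import math

def euler_gap(t):
    """Return |e^{it} - (cos t + i sin t)| for a real number t."""
    return abs(cmath.exp(1j * t) - complex(math.cos(t), math.sin(t)))

# A few arbitrary real inputs, including a negative one.
sample_times = [0.0, 0.5, 1.0, math.pi, -2.3]
largest_gap = max(euler_gap(t) for t in sample_times)
```

A value of `largest_gap` at the level of rounding error is consistent with the formula, although of course it does not prove it.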

In exercise 14 in this section, one way to derive Euler's formula through Taylor series for the exponential and trigonometric functions is explored. For now, we will simply accept (3.5.13) and put it to use. Using the first complex eigenpair found in example 3.5.3, let us consider the standard form of a potential solution to $\mathbf{x}' = A\mathbf{x}$, $\mathbf{x}(t) = e^{\lambda t}\mathbf{v}$, using the eigenpair identified in (3.5.12). Here, since the solution we are considering is in fact complex, we will use the notation $\mathbf{z}(t)$. Using Euler's formula and complex arithmetic, observe that
\[ \begin{aligned} \mathbf{z}(t) &= e^{it} \begin{bmatrix} i \\ 1 \end{bmatrix} \\ &= (\cos t + i \sin t) \begin{bmatrix} i \\ 1 \end{bmatrix} \\ &= \begin{bmatrix} i\cos t - \sin t \\ \cos t + i \sin t \end{bmatrix} \end{aligned} \tag{3.5.14} \]
When working with complex numbers, it is often useful to identify the real and imaginary parts of the numbers. That is, for a complex number $z = a + ib$ where $a$ and $b$ are real, we call $a$ the real part of $z$, and $b$ the imaginary part of $z$. The same distinctions hold for vectors with complex entries. Considering (3.5.14), if we separate this vector into its real and imaginary parts, we may write
\[ \mathbf{z}(t) = \begin{bmatrix} -\sin t \\ \cos t \end{bmatrix} + i \begin{bmatrix} \cos t \\ \sin t \end{bmatrix} \tag{3.5.15} \]
If we now compare the general solution to $\mathbf{x}' = A\mathbf{x}$ that we found in (3.5.11) to (3.5.15) above, we can make a critical observation. The two linearly independent solutions to the system seen in (3.5.11) are in fact the real and imaginary parts of the vector $\mathbf{z}(t)$ which arose from considering $\mathbf{z}(t) = e^{\lambda t}\mathbf{v}$ where $(\lambda, \mathbf{v})$ was a complex eigenpair of $A$. That this fact holds in general is our next stated theorem.

Theorem 3.5.2 If $A$ is a real 2 × 2 matrix with a complex eigenvalue $\lambda = a + ib$ and corresponding eigenvector $\mathbf{v} = \mathbf{p} + i\mathbf{q}$, where $a$, $b$, $\mathbf{p}$, and $\mathbf{q}$ are real, then


the real and imaginary parts of $\mathbf{z}(t) = e^{(a+bi)t}(\mathbf{p} + i\mathbf{q})$ are real linearly independent solutions to $\mathbf{x}' = A\mathbf{x}$.

We proceed to apply this result in another example involving complex eigenvalues and eigenvectors.

Example 3.5.4 Let $A = \begin{bmatrix} -1 & -2 \\ 2 & -1 \end{bmatrix}$ and consider the system of differential equations given by $\mathbf{x}' = A\mathbf{x}$. Find the general solution of the system, determine all equilibrium solutions to the system, and plot the direction field for the system. Include sketches of several trajectories and discuss the long-term behavior of solutions relative to the equilibrium solution(s).

Solution. For matrices with complex eigenvalues, Maple provides an efficient and valuable approach: the program completes the necessary complex arithmetic automatically and produces the results we need. Doing so, we find that $A$ has complex eigenvalues $\lambda = -1 \pm 2i$ with corresponding complex eigenvectors $\mathbf{v} = [\pm i \ \ 1]^T$. We choose one of these complex eigenpairs and consider the complex function
\[ \mathbf{z}(t) = e^{(-1+2i)t} \begin{bmatrix} i \\ 1 \end{bmatrix} \]
Observe that $e^{(-1+2i)t} = e^{-t}e^{2ti}$, so by Euler's formula
\[ e^{(-1+2i)t} = e^{-t}(\cos 2t + i \sin 2t) \]
Substituting this fact into $\mathbf{z}(t)$, we observe that

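\[ \begin{aligned} \mathbf{z}(t) &= e^{-t}(\cos 2t + i \sin 2t) \begin{bmatrix} i \\ 1 \end{bmatrix} \\ &= e^{-t} \begin{bmatrix} -\sin 2t + i\cos 2t \\ \cos 2t + i \sin 2t \end{bmatrix} \\ &= e^{-t} \begin{bmatrix} -\sin 2t \\ \cos 2t \end{bmatrix} + i e^{-t} \begin{bmatrix} \cos 2t \\ \sin 2t \end{bmatrix} \end{aligned} \]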
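The splitting of $\mathbf{z}(t)$ into real and imaginary parts can be checked numerically. The sketch below, which assumes the matrix and eigenpair of example 3.5.4, forms $\mathbf{z}(t)$ with complex arithmetic and uses a central difference (an approximation, not an exact derivative) to test whether each part satisfies $\mathbf{x}' = A\mathbf{x}$. All function names are illustrative.

```python
import cmath

A = [[-1.0, -2.0], [2.0, -1.0]]
LAM = complex(-1.0, 2.0)   # chosen eigenvalue -1 + 2i
V = [1j, 1.0]              # corresponding eigenvector [i, 1]^T

def z(t):
    """Complex vector function z(t) = e^{lambda t} v."""
    scale = cmath.exp(LAM * t)
    return [scale * V[0], scale * V[1]]

def real_part(t):
    return [w.real for w in z(t)]

def imag_part(t):
    return [w.imag for w in z(t)]

def residual(x, t, h=1e-6):
    """Largest entry of |x'(t) - A x(t)|, with x' from a central difference."""
    ahead, behind, here = x(t + h), x(t - h), x(t)
    worst = 0.0
    for i in range(2):
        derivative = (ahead[i] - behind[i]) / (2 * h)
        rhs = A[i][0] * here[0] + A[i][1] * here[1]
        worst = max(worst, abs(derivative - rhs))
    return worst

# Determinant of the matrix whose columns are the two parts at t = 0;
# a nonzero value indicates linear independence.
r0, i0 = real_part(0.0), imag_part(0.0)
wronskian_at_zero = r0[0] * i0[1] - r0[1] * i0[0]
```

Small residuals for both `real_part` and `imag_part`, together with a nonzero `wronskian_at_zero`, are what theorem 3.5.2 leads us to expect.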

By theorem 3.5.2, it now follows that the real and imaginary parts of $\mathbf{z}(t)$ form two real linearly independent solutions to $\mathbf{x}' = A\mathbf{x}$, and therefore the general solution to $\mathbf{x}' = A\mathbf{x}$ is
\[ \mathbf{x}(t) = c_1 e^{-t} \begin{bmatrix} -\sin 2t \\ \cos 2t \end{bmatrix} + c_2 e^{-t} \begin{bmatrix} \cos 2t \\ \sin 2t \end{bmatrix} \tag{3.5.16} \]
Since $A$ is an invertible matrix, the origin is the only equilibrium solution of the system. Finally, as figure 3.13 shows, the direction field and plotted trajectories exhibit behavior consistent with the fact that the system has no real eigenvectors and therefore no straight-line solutions. Moreover, since the real part of $\lambda = -1 + 2i$ is negative, the role of $e^{-t}$ in the general solution (3.5.16) draws every solution to $\mathbf{0}$ and thus the origin is a stable equilibrium.

Figure 3.13 The direction field for the system $\mathbf{x}' = A\mathbf{x}$ of example 3.5.4 along with several trajectories.

In cases such as the one in example 3.5.4 where there are no straight-line solutions and every nonconstant solution tends to $\mathbf{0}$ as $t \to \infty$, we naturally say that $\mathbf{0}$ is a spiral sink. Note that this case corresponds to the situation where the real part of a complex eigenvalue is negative. If the real part $a$ of $\lambda = a + bi$ is positive, then we will have $e^{at}$ present in the general solution, and this will drive every solution away from the origin. We therefore call $\mathbf{0}$ a spiral source and note that this equilibrium solution is unstable. Finally, in the event that $a = 0$ in the complex eigenvalue $\lambda = a + bi$, as it was in example 3.5.3, all nonconstant solutions will orbit the origin while being neither drawn toward nor repelled from the equilibrium solution. See, for example, figure 3.12. Such an equilibrium is called a center and is considered stable.

In our discussions in this section we have addressed the two possible cases for a 2 × 2 matrix $A$ which lacks two real linearly independent eigenvectors. Our work extends naturally to the case of more general n × n systems where the n × n matrix $A$ may or may not have n real linearly independent eigenvectors. Of course, in the case where $A$ has a full set of n real linearly independent eigenvectors, the eigenpairs allow the general solution to the system to be determined. In cases where some of the eigenvalues are complex, or repeated with missing eigenvectors, we can work with each individual eigenvalue to build real linearly independent solutions in ways similar to our preceding work. Some examples are explored in the exercises that follow.


Table 3.1 The stability of the origin as determined by the eigenvalues of a 2 × 2 matrix A

Eigenvalues of A          Stability of the origin
0 < λ1 ≤ λ2               0 is unstable and called a repelling node
λ1 < 0 < λ2               0 is unstable and called a saddle
λ1 ≤ λ2 < 0               0 is stable and called an attracting node
λ = a ± bi and a > 0      0 is unstable and called a spiral source
λ = a ± bi and a = 0      0 is stable and called a center
λ = a ± bi and a < 0      0 is stable and called a spiral sink
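The classification in table 3.1 can be expressed as a short decision procedure based on the trace and determinant of the matrix. The following is a sketch only: the function name `classify_origin`, the tolerance `tol`, and the returned strings are assumptions, and the case of a zero eigenvalue (which the table does not cover) is reported rather than classified.

```python
def classify_origin(a, b, c, d, tol=1e-12):
    """Classify the origin for x' = Ax with A = [[a, b], [c, d]] per table 3.1."""
    trace = a + d
    det = a * d - b * c
    if abs(det) <= tol:
        return "not covered: zero eigenvalue"
    disc = trace * trace - 4 * det      # discriminant of lambda^2 - trace*lambda + det
    if disc < 0:                        # complex pair with real part trace / 2
        real = trace / 2
        if abs(real) <= tol:
            return "center (stable)"
        return "spiral source (unstable)" if real > 0 else "spiral sink (stable)"
    root = disc ** 0.5
    lam1, lam2 = (trace - root) / 2, (trace + root) / 2   # lam1 <= lam2
    if lam1 > 0:
        return "repelling node (unstable)"
    if lam2 < 0:
        return "attracting node (stable)"
    return "saddle (unstable)"

# The matrices of examples 3.5.3 and 3.5.4.
example_353 = classify_origin(0, -1, 1, 0)
example_354 = classify_origin(-1, -2, 2, -1)
```

Working from the trace and determinant avoids computing eigenvectors, which the table does not need.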

We close this section with a summary in table 3.1 of the stability of the origin as an equilibrium point of $\mathbf{x}' = A\mathbf{x}$ in the cases where both eigenvalues are nonzero.

Exercises 3.5

For each of exercises 1–7, the general solution $\mathbf{x}(t)$ to a homogeneous linear 2 × 2 system of differential equations $\mathbf{x}' = A\mathbf{x}$ is given. For each problem, sketch any straight-line solutions, classify the stability of the equilibrium solution $\mathbf{x} = \mathbf{0}$, and sketch a few trajectories that are not straight lines. Do not use a computer.

1. $\mathbf{x}(t) = c_1 e^{-2t}[1 \ \ 2]^T + c_2 e^{-3t}[-1 \ \ 2]^T$
2. $\mathbf{x}(t) = c_1 e^{-2t}[\cos t \ \ \sin t]^T + c_2 e^{-2t}[-\sin t \ \ \cos t]^T$
3. $\mathbf{x}(t) = c_1 e^{2t}[1 \ \ 1]^T + c_2 e^{-t}[-1 \ \ 1]^T$
4. $\mathbf{x}(t) = c_1 e^{-2t}[1 \ \ 1]^T + c_2 [-1 \ \ 1]^T$
5. $\mathbf{x}(t) = c_1 [2\cos t \ \ \sin t]^T + c_2 [-\sin t \ \ 2\cos t]^T$
6. $\mathbf{x}(t) = c_1 e^{t}[2\cos t \ \ \sin t]^T + c_2 e^{3t}[-\sin t \ \ 2\cos t]^T$
7. $\mathbf{x}(t) = c_1 e^{2t}[4 \ \ 1]^T + c_2 e^{t}[1 \ \ 4]^T$

For each of exercises 8–13, the characteristic polynomial $p(\lambda)$ of a matrix $A$ is given. That is, the zeros of the given polynomial are the eigenvalues of the matrix $A$. For each, classify the stability of the origin as an equilibrium point of the system given by $\mathbf{x}' = A\mathbf{x}$.

8. $p(\lambda) = \lambda^2 - 4$
9. $p(\lambda) = \lambda^2 + 4$
10. $p(\lambda) = \lambda^2 + \lambda + 1$
11. $p(\lambda) = \lambda^2 - 10\lambda + 9$
12. $p(\lambda) = \lambda^2 - 2\lambda + 5$
13. $p(\lambda) = \lambda^2 + 3\lambda + 2$

14. Recall or look up the formulas for the Taylor series about $a = 0$ for each of the functions $e^x$, $\sin x$, and $\cos x$. Assuming that the Taylor series for $e^x$ is valid for complex numbers $x$, compute $e^{ib}$ and compare the result to the expansions for $\cos b$ and $i \sin b$ to show that
\[ e^{ib} = \cos b + i \sin b \]
In addition, show that
\[ e^{a+ib} = e^a(\cos b + i \sin b) \]

In exercises 15–19, a matrix $A$ is given. For each, consider the system of differential equations $\mathbf{x}' = A\mathbf{x}$ and respond to (a)-(d).
(a) Determine the general solution to the system $\mathbf{x}' = A\mathbf{x}$.
(b) Classify the stability of all equilibrium solutions to the system.
(c) How many straight-line solutions does this system of equations have? Why?
(d) Use a computer algebra system to plot the direction field for this system and sketch several trajectories by hand.

15. $A = \begin{bmatrix} 0 & -2 \\ 2 & 0 \end{bmatrix}$
16. $A = \begin{bmatrix} 2 & -3 \\ 3 & 2 \end{bmatrix}$
17. $A = \begin{bmatrix} 1 & -2 \\ 0 & -2 \end{bmatrix}$
18. $A = \begin{bmatrix} -4 & 5 \\ -5 & 4 \end{bmatrix}$
19. $A = \begin{bmatrix} 7 & -1 \\ 4 & 11 \end{bmatrix}$


In exercises 20–24, solve the IVP given by $\mathbf{x}' = A\mathbf{x}$ and the stated initial condition.

20. $A = \begin{bmatrix} 0 & -2 \\ 2 & 0 \end{bmatrix}$, $\mathbf{x}(0) = [1 \ \ 3]^T$
21. $A = \begin{bmatrix} 2 & -3 \\ 3 & 2 \end{bmatrix}$, $\mathbf{x}(0) = [-3 \ \ 1]^T$
22. $A = \begin{bmatrix} 1 & -2 \\ 0 & -2 \end{bmatrix}$, $\mathbf{x}(0) = [2 \ \ -2]^T$
23. $A = \begin{bmatrix} -4 & 5 \\ -5 & 4 \end{bmatrix}$, $\mathbf{x}(0) = [-2 \ \ -3]^T$
24. $A = \begin{bmatrix} 7 & -1 \\ 4 & 11 \end{bmatrix}$, $\mathbf{x}(0) = [0 \ \ 5]^T$

25. Consider the system of differential equations $\mathbf{x}' = A\mathbf{x}$ given by
\[ A = \begin{bmatrix} 3 & 1 & -1 \\ 1 & 3 & 1 \\ -1 & 1 & 3 \end{bmatrix} \]
(a) Determine the general solution to the system $\mathbf{x}' = A\mathbf{x}$.
(b) Classify the stability of all equilibrium solutions to the system.
(c) How many straight-line solutions does this system of equations have? Why?

26. Repeat exercise 25 using the matrix
\[ A = \begin{bmatrix} 0 & 3/2 & -1/2 \\ -1 & -3/2 & 3/2 \\ -1 & 1/2 & -1/2 \end{bmatrix} \]

27. Explain why every 3 × 3 homogeneous linear system of differential equations of the form $\mathbf{x}' = A\mathbf{x}$ must always have at least one straight-line solution. Must every 4 × 4 system have at least one straight-line solution? Explain. What can you say about any n × n homogeneous linear system?

In exercises 28–32, use the standard substitution to convert the given second-order differential equation to a system of two linear first-order equations. Solve the system and hence determine the solution $y$ to the second-order equation.

28. $y'' + y' - 6y = 0$
29. $y'' + 2y' + 5y = 0$
30. $y'' + 4y = 0$
31. $y'' + 3y' - 28y = 0$
32. $y'' + y' + 1 = 0$


3.6 Nonhomogeneous systems: undetermined coefﬁcients

So far in our studies of systems of linear differential equations, we have focused almost exclusively on the case where the system is homogeneous and can be represented in the form $\mathbf{x}' = A\mathbf{x}$. We now begin to investigate nonhomogeneous systems, which are systems of the form $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$ where $\mathbf{b} \ne \mathbf{0}$. In section 3.1, we encountered a system of two tanks where we were interested in the amount of salt in each tank at time $t$. With the amount of salt in the two tanks represented respectively by $x_1(t)$ and $x_2(t)$, we saw that these component functions had to satisfy the system of differential equations given by
\[ \mathbf{x}' = \begin{bmatrix} -1/20 & 1/80 \\ 1/40 & -1/40 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} 20 \\ 35 \end{bmatrix} \tag{3.6.1} \]
and that this system is naturally represented in the form
\[ \mathbf{x}' = A\mathbf{x} + \mathbf{b} \tag{3.6.2} \]

In our most recent work with the homogeneous equation $\mathbf{x}' = A\mathbf{x}$, we noted several times the analogy to solving the single first-order differential equation $x' = ax$. In particular, we observed the key role that $e^{\lambda t}$ plays in the process of solving homogeneous systems of equations, much like $e^{at}$ does in the solution of a single homogeneous linear first-order equation. We next naturally consider the linear first-order analogy of (3.6.2), a nonhomogeneous equation such as
\[ y' = 2y + 5 \tag{3.6.3} \]

In section 2.3, we made the observation in theorem 2.3.3 that for any linear first-order differential equation in the form $y' + p(t)y = f(t)$, if $y_p$ is any solution to the nonhomogeneous equation and $y_h$ is a solution to the corresponding homogeneous equation, then $y = y_p + y_h$ is a solution to the nonhomogeneous equation. In our studies of linear algebra in chapter 1, we made a similar observation in section 1.5: if we have a solution $\mathbf{x}_p$ to the nonhomogeneous equation $A\mathbf{x} = \mathbf{b}$, and we add to $\mathbf{x}_p$ any solution $\mathbf{x}_h$ to the homogeneous equation $A\mathbf{x} = \mathbf{0}$, the result ($\mathbf{x} = \mathbf{x}_p + \mathbf{x}_h$) is also a solution to $A\mathbf{x} = \mathbf{b}$. See (1.5.1) to revisit the details of this discussion. Note that in this purely linear algebra context, $\mathbf{x}$ is a vector whose entries are constant.

These two preceding observations for linear first-order differential equations and systems of linear algebraic equations are now applied to the nonhomogeneous system of linear first-order differential equations, $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$. We note specifically that in this context, $\mathbf{x}(t)$ is a function of $t$. Let's return to the known situation of the homogeneous system $\mathbf{x}' = A\mathbf{x}$ and denote its solution by $\mathbf{x}_h(t)$. In addition, suppose we are able to determine a single solution $\mathbf{x}_p(t)$ to the nonhomogeneous equation $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$. We claim that the function $\mathbf{x}(t) = \mathbf{x}_h(t) + \mathbf{x}_p(t)$ is the general solution of the nonhomogeneous equation. To see this, we substitute directly into $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$ and verify that the equation is satisfied. By properties of linearity, observe that
\[ \mathbf{x}'(t) = \mathbf{x}_h'(t) + \mathbf{x}_p'(t) \tag{3.6.4} \]
and furthermore
\[ A\mathbf{x} + \mathbf{b} = A(\mathbf{x}_h + \mathbf{x}_p) + \mathbf{b} = A\mathbf{x}_h + A\mathbf{x}_p + \mathbf{b} \tag{3.6.5} \]
By how we defined $\mathbf{x}_h(t)$ and $\mathbf{x}_p(t)$, we know that $\mathbf{x}_h'(t) = A\mathbf{x}_h(t)$ and $\mathbf{x}_p'(t) = A\mathbf{x}_p(t) + \mathbf{b}$, and thus (3.6.5) implies
\[ A\mathbf{x} + \mathbf{b} = \mathbf{x}_h'(t) + \mathbf{x}_p'(t) \tag{3.6.6} \]
From (3.6.4) and (3.6.6), we see that $\mathbf{x} = \mathbf{x}_h + \mathbf{x}_p$ is indeed a solution to $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$. In fact, we have found the general solution to the nonhomogeneous system, as stated in the following theorem.

Theorem 3.6.1 Let $A$ be an n × n matrix with constant coefficients. If $\mathbf{x}_h$ is the general solution to the homogeneous system $\mathbf{x}' = A\mathbf{x}$ and $\mathbf{x}_p$ is any solution to the nonhomogeneous system $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$, then $\mathbf{x} = \mathbf{x}_h + \mathbf{x}_p$ is the general solution to $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$.

Theorem 3.6.1 provides an approach that will guide us throughout our efforts to solve nonhomogeneous systems of differential equations. First, we solve the associated homogeneous system to find $\mathbf{x}_h$, a process we are familiar with. We usually call $\mathbf{x}_h$ the complementary solution to the equation $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$. Next, we must find a so-called particular solution $\mathbf{x}_p$ to the nonhomogeneous system $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$. Although a more sophisticated approach will be introduced in the next section, for now we will investigate a few examples in which the process of finding such a particular solution $\mathbf{x}_p$ is relatively straightforward.

Example 3.6.1 From the system of two tanks discussed in sections 1.1 and 3.1, consider the nonhomogeneous system of linear differential equations given by
\[ \mathbf{x}' = \begin{bmatrix} -1/20 & 1/80 \\ 1/40 & -1/40 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 20 \\ 35 \end{bmatrix} \tag{3.6.7} \]
By solving the associated homogeneous system and determining a particular solution to the nonhomogeneous system, find the general solution to the given system. In addition, plot an appropriate direction field and discuss the long-term behavior of solutions and their meaning in the context of the salt in each tank. Determine and sketch the solution to the IVP with initial condition $\mathbf{x}(0) = [2000 \ \ 1000]^T$.


Solution. We begin by solving $\mathbf{x}' = A\mathbf{x}$, where
\[ A = \begin{bmatrix} -1/20 & 1/80 \\ 1/40 & -1/40 \end{bmatrix} \]
The eigenvalues of $A$ are approximately $\lambda_1 = -0.0158$ and $\lambda_2 = -0.0592$, with corresponding eigenvectors approximated by $\mathbf{v}_1 = [0.366 \ \ 1.000]^T$ and $\mathbf{v}_2 = [-1.366 \ \ 1.000]^T$. It follows that the general solution $\mathbf{x}_h$ is
\[ \mathbf{x}_h(t) = c_1 e^{-0.0158t} \begin{bmatrix} 0.366 \\ 1.000 \end{bmatrix} + c_2 e^{-0.0592t} \begin{bmatrix} -1.366 \\ 1.000 \end{bmatrix} \]
Next, we must determine a particular solution $\mathbf{x}_p$ to the nonhomogeneous equation $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$. In this particular example, $\mathbf{b}$ is a constant vector. Therefore, it is natural to guess that a constant vector $\mathbf{x}_p$ will satisfy the nonhomogeneous equation. More than this, we should recall from earlier discussions of the problem leading to the given system that the vector $\mathbf{x}$ represents the amounts of salt in two connected tanks as streams of inflow deliver salt, each at a constant rate. Our intuition suggests that over time the two tanks should approach a stable equilibrium, and hence an equilibrium (and therefore constant) solution should be present. Therefore, we assume that $\mathbf{x}_p$ is a constant vector and observe that this immediately implies that $\mathbf{x}_p' = \mathbf{0}$. Substituting into $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$, it follows that $\mathbf{x}_p$ must satisfy the system of linear equations $\mathbf{0} = A\mathbf{x}_p + \mathbf{b}$ or $A\mathbf{x}_p = -\mathbf{b}$. With the given entries of $A$ and $\mathbf{b}$, this leads us to row reduce the appropriate augmented matrix and find that
\[ \left[\begin{array}{cc|c} -1/20 & 1/80 & -20 \\ 1/40 & -1/40 & -35 \end{array}\right] \to \left[\begin{array}{cc|c} 1 & 0 & 1000 \\ 0 & 1 & 2400 \end{array}\right] \]
This shows $\mathbf{x}_p = [1000 \ \ 2400]^T$ is a particular solution to $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$, and, more specifically, is an equilibrium solution of the system. Moreover, it now follows that the general solution to the system is given by
\[ \mathbf{x}(t) = \mathbf{x}_h(t) + \mathbf{x}_p(t) = c_1 e^{-0.0158t} \begin{bmatrix} 0.366 \\ 1.000 \end{bmatrix} + c_2 e^{-0.0592t} \begin{bmatrix} -1.366 \\ 1.000 \end{bmatrix} + \begin{bmatrix} 1000 \\ 2400 \end{bmatrix} \tag{3.6.8} \]
If we add the initial condition that $\mathbf{x}(0) = [2000 \ \ 1000]^T$, we can solve for the constants $c_1$ and $c_2$, and plot the appropriate corresponding trajectory, as shown in figure 3.14. In both (3.6.8) and figure 3.14 we can see how the long-term behavior of every solution tends to the equilibrium solution. Moreover, in the direction field we can also recognize the straight-line solutions that correspond to lines in the direction of each eigenvector but that now pass through the equilibrium solution (1000, 2400).

From example 3.6.1, we observe that in cases where we want to solve $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$ and $\mathbf{b}$ is itself a constant vector, $\mathbf{x}_p$ may be determined by assuming that $\mathbf{x}_p$ is a constant vector and solving $\mathbf{0} = A\mathbf{x}_p + \mathbf{b}$. If $\mathbf{b}$ is not constant, then the situation is more complicated, as we discover in the following example.
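The two computations in example 3.6.1, the eigenvalues of $A$ and the constant particular solution, can be reproduced with a few lines of arithmetic. This is a sketch under the assumption that the eigenvalues are real (true for this matrix); the helper names `real_eigenvalues` and `solve_2x2` are illustrative, and the linear system is solved with Cramer's rule rather than row reduction.

```python
import math

A = [[-1 / 20, 1 / 80], [1 / 40, -1 / 40]]
b = [20.0, 35.0]

def real_eigenvalues(M):
    """Roots of the characteristic polynomial, assuming they are real."""
    trace = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    root = math.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2

def solve_2x2(M, rhs):
    """Solve M x = rhs by Cramer's rule, assuming det(M) is nonzero."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    x1 = (rhs[0] * M[1][1] - M[0][1] * rhs[1]) / det
    x2 = (M[0][0] * rhs[1] - rhs[0] * M[1][0]) / det
    return [x1, x2]

lam1, lam2 = real_eigenvalues(A)

# A constant particular solution satisfies 0 = A x_p + b, i.e. A x_p = -b.
x_eq = solve_2x2(A, [-b[0], -b[1]])
```

Because both eigenvalues are negative, every solution decays toward the equilibrium stored in `x_eq`.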

Figure 3.14 The direction field for the system $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$ of example 3.6.1, showing the equilibrium solution (1000, 2400) and the solution through (2000, 1000).

Example 3.6.2 Find the general solution of the nonhomogeneous system given by
\[ \mathbf{x}' = \begin{bmatrix} 2 & -1 \\ 3 & -2 \end{bmatrix} \mathbf{x} + \begin{bmatrix} \cos 2t \\ 0 \end{bmatrix} \tag{3.6.9} \]

Solution. Since the eigenvalues of $A = \begin{bmatrix} 2 & -1 \\ 3 & -2 \end{bmatrix}$ are $\lambda_1 = -1$ and $\lambda_2 = 1$ with corresponding eigenvectors $\mathbf{v}_1 = [1 \ \ 3]^T$ and $\mathbf{v}_2 = [1 \ \ 1]^T$, it follows that the complementary solution to the related homogeneous system is
\[ \mathbf{x}_h = c_1 e^{-t} \begin{bmatrix} 1 \\ 3 \end{bmatrix} + c_2 e^{t} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \]
To determine the particular solution $\mathbf{x}_p$ to the given nonhomogeneous system, we need to find a vector function $\mathbf{x}(t)$ that satisfies the system (3.6.9). Due to the presence of $\cos 2t$ in the vector $\mathbf{b}$, it is natural to guess that the components of $\mathbf{x}_p$ will somehow involve $\cos 2t$. In addition, since $\mathbf{x}_p'$ plays a role in the system, we must account for the possibility that the derivative of $\cos 2t$ may also arise; moreover, since $A\mathbf{x}$ will also be computed, linear combinations of vectors that involve the entries in $\mathbf{x}$ will be present. Therefore, we make the reasonable guess that $\mathbf{x}_p$ has the form
\[ \mathbf{x}_p = \begin{bmatrix} a\cos 2t + b \sin 2t \\ c \cos 2t + d \sin 2t \end{bmatrix} \tag{3.6.10} \]
and attempt to determine values for the undetermined coefficients $a$, $b$, $c$, and $d$ that make $\mathbf{x}_p$ a solution to the system. We accomplish this by direct substitution into (3.6.9). First, observe that
\[ \mathbf{x}_p' = \begin{bmatrix} -2a\sin 2t + 2b \cos 2t \\ -2c \sin 2t + 2d \cos 2t \end{bmatrix} \tag{3.6.11} \]


Now substituting (3.6.10) and (3.6.11) into (3.6.9), it follows that
\[ \begin{bmatrix} -2a\sin 2t + 2b \cos 2t \\ -2c \sin 2t + 2d \cos 2t \end{bmatrix} = \begin{bmatrix} 2 & -1 \\ 3 & -2 \end{bmatrix} \begin{bmatrix} a\cos 2t + b \sin 2t \\ c \cos 2t + d \sin 2t \end{bmatrix} + \begin{bmatrix} \cos 2t \\ 0 \end{bmatrix} \]
If we now expand the matrix product and factor out the terms involving $\sin 2t$ and $\cos 2t$ on the right side,
\[ -2a\sin 2t + 2b\cos 2t = (2b - d)\sin 2t + (2a - c + 1)\cos 2t \tag{3.6.12} \]
\[ -2c\sin 2t + 2d\cos 2t = (3b - 2d)\sin 2t + (3a - 2c)\cos 2t \tag{3.6.13} \]
In (3.6.12), we can equate the coefficients of $\sin 2t$ to find that $-2a = 2b - d$. Doing likewise for the coefficients of $\cos 2t$, $2b = 2a - c + 1$. Similarly, (3.6.13) results in the two equations $-2c = 3b - 2d$ and $2d = 3a - 2c$. Reorganizing these four equations in four unknowns, we see that $a$, $b$, $c$, and $d$ must satisfy the system
\[ \begin{aligned} -2a - 2b + d &= 0 \\ -2a + 2b + c &= 1 \\ -3b - 2c + 2d &= 0 \\ -3a + 2c + 2d &= 0 \end{aligned} \]
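The four equations above form an ordinary linear system, so the undetermined coefficients can also be found mechanically. The sketch below performs Gauss-Jordan elimination with exact fractions; the function name `gauss_jordan` is an assumption, and the routine presumes the system has a unique solution (it does not handle singular systems).

```python
from fractions import Fraction

def gauss_jordan(augmented):
    """Solve a square linear system given as an augmented matrix [M | rhs]."""
    rows = [[Fraction(entry) for entry in row] for row in augmented]
    n = len(rows)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot and swap it up.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        scale = rows[col][col]
        rows[col] = [entry / scale for entry in rows[col]]
        # Clear the pivot column in every other row.
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]

# Columns correspond to a, b, c, d; the last column is the right-hand side.
system = [
    [-2, -2, 0, 1, 0],
    [-2, 2, 1, 0, 1],
    [0, -3, -2, 2, 0],
    [-3, 0, 2, 2, 0],
]
coefficients = gauss_jordan(system)
```

Exact fractions are used so that the coefficients come out as rational numbers rather than rounded decimals.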

Row-reducing,
\[ \left[\begin{array}{cccc|c} -2 & -2 & 0 & 1 & 0 \\ -2 & 2 & 1 & 0 & 1 \\ 0 & -3 & -2 & 2 & 0 \\ -3 & 0 & 2 & 2 & 0 \end{array}\right] \to \left[\begin{array}{cccc|c} 1 & 0 & 0 & 0 & -2/5 \\ 0 & 1 & 0 & 0 & 2/5 \\ 0 & 0 & 1 & 0 & -3/5 \\ 0 & 0 & 0 & 1 & 0 \end{array}\right] \]
which shows $a = -2/5$, $b = 2/5$, $c = -3/5$, and $d = 0$, so a particular solution to the nonhomogeneous system is
\[ \mathbf{x}_p = \begin{bmatrix} -\frac{2}{5}\cos 2t + \frac{2}{5}\sin 2t \\ -\frac{3}{5}\cos 2t \end{bmatrix} \]
Finally, it follows that the general solution to the system is
\[ \mathbf{x} = \mathbf{x}_h + \mathbf{x}_p = c_1 e^{-t} \begin{bmatrix} 1 \\ 3 \end{bmatrix} + c_2 e^{t} \begin{bmatrix} 1 \\ 1 \end{bmatrix} + \begin{bmatrix} -\frac{2}{5}\cos 2t + \frac{2}{5}\sin 2t \\ -\frac{3}{5}\cos 2t \end{bmatrix} \]
One lesson to take from example 3.6.2 is that while the process for trying to solve a nonhomogeneous system of differential equations is straightforward, the actual computation of a particular solution $\mathbf{x}_p$ can be quite cumbersome. Indeed, even in the case where the vector $\mathbf{b}$ is quite simple, as it is in the most recent example, tedious calculations can arise. Moreover, it is less clear how one might proceed in the situation where the vector $\mathbf{b}$ is particularly complicated. Specifically, making an appropriate guess for $\mathbf{x}_p$ may be difficult. We usually


call the process of finding $\mathbf{x}_p$ through a guess involving unknown constants the method of undetermined coefficients. To gain a better sense of the guesses that are involved in using undetermined coefficients, we turn to the following example.

Example 3.6.3 For nonhomogeneous linear systems of the form $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$ where $A$ is a matrix with constant entries, state the natural guess to use for $\mathbf{x}_p$ when the vector $\mathbf{b}$ is
\[ \text{(a) } \mathbf{b} = \begin{bmatrix} e^{-t} \\ 2e^{-t} \end{bmatrix} \qquad \text{(b) } \mathbf{b} = \begin{bmatrix} 1 \\ t \end{bmatrix} \qquad \text{(c) } \mathbf{b} = \begin{bmatrix} t^2 \\ 0 \end{bmatrix} \qquad \text{(d) } \mathbf{b} = \begin{bmatrix} e^{-3t} \\ -2 \end{bmatrix} \]

Solution.
(a) With $\mathbf{b} = [e^{-t} \ \ 2e^{-t}]^T$, it is natural to expect that any particular solution must involve $e^{-t}$ in its components. Specifically, we make the guess that
\[ \mathbf{x}_p = \begin{bmatrix} Ae^{-t} \\ Be^{-t} \end{bmatrix} \]
and substitute directly into $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$ in order to attempt to find values of $A$ and $B$ for which $\mathbf{x}_p$ satisfies the given system (see footnote 3).
(b) Given $\mathbf{b} = [1 \ \ t]^T$, we must account for the fact that $\mathbf{x}_p$ and its derivative can involve constant and linear functions of $t$. In particular, we suppose that
\[ \mathbf{x}_p = \begin{bmatrix} At + B \\ Ct + D \end{bmatrix} \]
and substitute appropriately in an effort to determine $A$, $B$, $C$, and $D$.
(c) For $\mathbf{b} = [t^2 \ \ 0]^T$, with one quadratic term present in $\mathbf{b}$, it is necessary to include quadratic terms in each entry of $\mathbf{x}_p$. But since the derivative of $\mathbf{x}_p$ will be taken, linear terms must be included as well. Finally, once linear terms are included, for the same reason we must permit the possibility that constant terms can be present in $\mathbf{x}_p$. Therefore, we guess the form
\[ \mathbf{x}_p = \begin{bmatrix} At^2 + Bt + C \\ Dt^2 + Et + F \end{bmatrix} \]
(d) With $\mathbf{b} = [e^{-3t} \ \ -2]^T$ having both an exponential and constant term present, we account for both of these scalar functions and their derivatives by assuming that
\[ \mathbf{x}_p = \begin{bmatrix} Ae^{-3t} + B \\ Ce^{-3t} + D \end{bmatrix} \]

Footnote 3: It is possible that the guess can fail to work, in which case a modified form for $\mathbf{x}_p$ is required. One setting where this may occur is when $\lambda = -1$ is an eigenvalue of $A$, whereby a vector involving $e^{-t}$ already appears in the complementary solution $\mathbf{x}_h$. See exercise 8 for further investigation of this issue.


The method of undetermined coefficients is not foolproof: it is certainly possible to guess incorrectly (as noted in the footnote related to part (a) of example 3.6.3). If our guess is incorrect, an inconsistent linear system of algebraic equations will arise, which tells us we need to modify our guess. Besides the possibility of guessing incorrectly, it can also be the case that the computations involved in determining $\mathbf{x}_p$ are very cumbersome. In the next section, we consider a different approach, one that parallels our solution of single linear first-order differential equations of the form $y' + p(t)y = f(t)$, that provides, at least in theory, an algorithmic approach to solving any nonhomogeneous system $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$ where the matrix $A$ has real, constant entries.

Finally, we note that the presence of nonconstant entries in the vector $\mathbf{b}$ in a nonhomogeneous system $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$ makes it impossible to plot a direction field for the system. In particular, when we sketch direction fields, we rely on the fact that regardless of time, $t$, the direction vector $\mathbf{x}'$ to the solution curve $\mathbf{x}$ is dependent only on the location $(x_1, x_2)$, and not on $t$. When $\mathbf{b}$ is nonconstant and a function of $t$, this is no longer the case and we therefore are left with only algebraic approaches to the problem. If $\mathbf{b}$ is constant, then we can generate the direction field for the system, such as the one shown in figure 3.14.

Exercises 3.6

In each of exercises 1–4, show by direct substitution that the given particular solution $\mathbf{x}_p$ is indeed a solution to the stated nonhomogeneous system of equations. Hence determine the general solution to the stated system.

1. $\mathbf{x}' = \begin{bmatrix} -1 & 3 \\ 2 & -3 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 5 \\ -1 \end{bmatrix}$, $\mathbf{x}_p = \begin{bmatrix} -4 \\ -3 \end{bmatrix}$
2. $\mathbf{x}' = \begin{bmatrix} 1 & -2 \\ -2 & 1 \end{bmatrix} \mathbf{x} + \begin{bmatrix} e^{2t} \\ 0 \end{bmatrix}$, $\mathbf{x}_p = e^{2t} \begin{bmatrix} -1/3 \\ 2/3 \end{bmatrix}$
3. $\mathbf{x}' = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \mathbf{x} + \begin{bmatrix} \sin t \\ 0 \end{bmatrix}$, $\mathbf{x}_p = \sin t \begin{bmatrix} -2/5 \\ 1/10 \end{bmatrix} + \cos t \begin{bmatrix} -3/10 \\ 1/5 \end{bmatrix}$
4. $\mathbf{x}' = \begin{bmatrix} -3 & 1 \\ 1 & -1 \end{bmatrix} \mathbf{x} + \begin{bmatrix} e^{2t} + 1 \\ 1 \end{bmatrix}$, $\mathbf{x}_p = \begin{bmatrix} 1 \\ 2 \end{bmatrix} + e^{2t} \begin{bmatrix} 3/14 \\ 1/14 \end{bmatrix}$

5. Consider the system of differential equations
\[ \mathbf{x}' = \begin{bmatrix} 1 & 1 \\ 4 & 1 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 1 \\ -3 \end{bmatrix} \]
(a) Explain why it is reasonable to assume that $\mathbf{x}_p$ is a constant vector, and use this assumption to determine a particular solution to the given nonhomogeneous system.
(b) Determine the complementary solution $\mathbf{x}_h$ to the associated homogeneous system, $\mathbf{x}' = A\mathbf{x}$.
(c) State the general solution to the system.
(d) Is there an equilibrium solution to this system? If so, is it stable? Explain.


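6. Consider the system of differential equations
\[ \mathbf{x}' = \begin{bmatrix} 1 & 1 \\ 4 & 1 \end{bmatrix} \mathbf{x} + \begin{bmatrix} e^{4t} \\ 0 \end{bmatrix} \]
(a) Explain why it is reasonable to assume that $\mathbf{x}_p$ is a vector of the form
\[ \mathbf{x}_p = \begin{bmatrix} ae^{4t} \\ be^{4t} \end{bmatrix} \]
Then use this assumption to determine a particular solution to the given nonhomogeneous system.
(b) Determine the complementary solution $\mathbf{x}_h$ to the associated homogeneous system, $\mathbf{x}' = A\mathbf{x}$.
(c) State the general solution to the system.

7. Consider the system of differential equations
\[ \mathbf{x}' = \begin{bmatrix} 1 & 1 \\ 4 & 1 \end{bmatrix} \mathbf{x} + \begin{bmatrix} e^{-2t} + 1 \\ 2e^{-2t} + 3 \end{bmatrix} \]
(a) Explain why it is reasonable to assume that $\mathbf{x}_p$ is a vector of the form
\[ \mathbf{x}_p = \begin{bmatrix} ae^{-2t} + b \\ ce^{-2t} + d \end{bmatrix} \]
Use this assumption to determine a particular solution to the given nonhomogeneous system.
(b) Determine the complementary solution $\mathbf{x}_h$ to the associated homogeneous system, $\mathbf{x}' = A\mathbf{x}$.
(c) State the general solution to the system.

8. Consider the system of differential equations
\[ \mathbf{x}' = \begin{bmatrix} 1 & 1 \\ 4 & 1 \end{bmatrix} \mathbf{x} + \begin{bmatrix} e^{-t} \\ 0 \end{bmatrix} \]
(a) Explain why it is reasonable to assume that $\mathbf{x}_p$ is a vector of the form
\[ \mathbf{x}_p = \begin{bmatrix} ae^{-t} \\ be^{-t} \end{bmatrix} \]
(b) Show that the form of $\mathbf{x}_p$ above does not result in a particular solution to the system.
(c) By assuming that $\mathbf{x}_p$ is a vector of the form
\[ \mathbf{x}_p = \begin{bmatrix} ae^{-t} + bte^{-t} \\ ce^{-t} + dte^{-t} \end{bmatrix} \]
determine a particular solution to the given nonhomogeneous system.
(d) Determine the complementary solution $\mathbf{x}_h$ to the associated homogeneous system, $\mathbf{x}' = A\mathbf{x}$.
(e) State the general solution to the system.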


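For the nonhomogeneous linear systems of differential equations given in exercises 9–17, (a) determine a particular solution $\mathbf{x}_p$ by making an appropriate assumption about the form of $\mathbf{x}_p$, (b) determine the complementary solution $\mathbf{x}_h$ to $\mathbf{x}' = A\mathbf{x}$, and (c) hence state the general solution to the system.

9. $\mathbf{x}' = \begin{bmatrix} -4 & 5 \\ 5 & -4 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 1 \\ -1 \end{bmatrix}$
10. $\mathbf{x}' = \begin{bmatrix} -4 & 5 \\ 5 & -4 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 3e^{-2t} \\ -e^{-2t} \end{bmatrix}$
11. $\mathbf{x}' = \begin{bmatrix} -1 & 1 \\ 0 & 1 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 3e^{-2t} \\ -4e^{-2t} \end{bmatrix}$
12. $\mathbf{x}' = \begin{bmatrix} -1 & 1 \\ 0 & 1 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 2 \\ -5 \end{bmatrix}$
13. $\mathbf{x}' = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 3 \\ -2 \end{bmatrix}$
14. $\mathbf{x}' = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \mathbf{x} + \begin{bmatrix} e^{t} \\ -2e^{t} \end{bmatrix}$
15. $\mathbf{x}' = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 3 + e^{t} \\ -2 - 2e^{t} \end{bmatrix}$
16. $\mathbf{x}' = \begin{bmatrix} 2 & -1 \\ 3 & -2 \end{bmatrix} \mathbf{x} + \begin{bmatrix} t - 2 \\ 3t - 4 \end{bmatrix}$
17. $\mathbf{x}' = \begin{bmatrix} 2 & -1 \\ 3 & -2 \end{bmatrix} \mathbf{x} + \begin{bmatrix} \cos 3t \\ 4 \end{bmatrix}$

18. For the system of differential equations given in exercise 10, solve the IVP with initial condition $\mathbf{x}(0) = [1 \ \ -2]^T$.
19. For the system of differential equations given in exercise 11, solve the IVP with initial condition $\mathbf{x}(0) = [-3 \ \ -2]^T$.
20. For the system of differential equations given in exercise 14, solve the IVP with initial condition $\mathbf{x}(0) = [0 \ \ 4]^T$.
21. For the system of differential equations given in exercise 15, solve the IVP with initial condition $\mathbf{x}(0) = [1 \ \ -2]^T$.
22. Without actually computing $\mathbf{x}_p$, choose and justify the form you would guess for a particular solution to
\[ \mathbf{x}' = \begin{bmatrix} -4 & 5 \\ 5 & -4 \end{bmatrix} \mathbf{x} + e^{-2t} \sin t \begin{bmatrix} 1 \\ -1 \end{bmatrix} \]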


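23. Without actually computing $\mathbf{x}_p$, choose and justify the form you would guess for a particular solution to
\[ \mathbf{x}' = \begin{bmatrix} -4 & 5 \\ 5 & -4 \end{bmatrix} \mathbf{x} + \begin{bmatrix} \sin 3t \\ \cos 2t \end{bmatrix} \]
24. Suppose that $\mathbf{x}_1(t)$ and $\mathbf{x}_2(t)$ are solutions of $\mathbf{x}' = A\mathbf{x} + \mathbf{f}_1(t)$ and $\mathbf{x}' = A\mathbf{x} + \mathbf{f}_2(t)$ respectively. Show that $\mathbf{x}(t) = \mathbf{x}_1(t) + \mathbf{x}_2(t)$ is a solution of
\[ \mathbf{x}' = A\mathbf{x} + \mathbf{f}_1(t) + \mathbf{f}_2(t) \]

3.7 Nonhomogeneous systems: variation of parameters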

In section 3.6, we discovered that solving the nonhomogeneous linear system $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$ requires us to find one particular solution $\mathbf{x}_p$ to the nonhomogeneous system. We then combine this particular solution with the complementary solution $\mathbf{x}_h$, the general solution to the corresponding homogeneous system $\mathbf{x}' = A\mathbf{x}$. While we were able to successfully solve a range of problems, the method of undetermined coefficients is somewhat dissatisfying: essentially we made an educated guess as to the form that $\mathbf{x}_p$ should take, and then substituted to see if our guess was appropriate and resulted in a particular solution. As was shown in exercise 8 in section 3.6, there are instances when the obvious guess fails to work and additional investigation of a possible solution $\mathbf{x}_p$ is needed. Moreover, with undetermined coefficients we only considered functions $\mathbf{b}(t)$ that had entries that were polynomial, sinusoidal, or exponential in nature. We desire a more systematic approach to finding $\mathbf{x}_p$; developing such a method is the purpose of this section.

In section 2.3, we learned that for any linear first-order differential equation of the form $y' + p(t)y = f(t)$, the solution $y$ is given by
\[ y = e^{-P(t)} \int e^{P(t)} f(t)\, dt \tag{3.7.1} \]
where $P(t) = \int p(t)\, dt$. We now seek to establish a similar result for the case of systems of the form $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$, where $A$ is an n × n matrix with constant entries and $\mathbf{b}$ is a vector function of $t$.

Let us first consider the form of the general solution $\mathbf{x}_h$ to the corresponding homogeneous system. Recall that $\mathbf{x} = c_1\mathbf{x}_1 + \cdots + c_n\mathbf{x}_n$, where $\{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$ is a set of n linearly independent solutions to $\mathbf{x}' = A\mathbf{x}$. Being more explicit about the vectors present, say with entries $x_{ij}(t)$, we can rewrite $\mathbf{x} = c_1\mathbf{x}_1 + \cdots + c_n\mathbf{x}_n$ as
\[ \mathbf{x} = c_1 \begin{bmatrix} x_{11} \\ x_{21} \\ \vdots \\ x_{n1} \end{bmatrix} + c_2 \begin{bmatrix} x_{12} \\ x_{22} \\ \vdots \\ x_{n2} \end{bmatrix} + \cdots + c_n \begin{bmatrix} x_{1n} \\ x_{2n} \\ \vdots \\ x_{nn} \end{bmatrix} = \begin{bmatrix} c_1 x_{11} + c_2 x_{12} + \cdots + c_n x_{1n} \\ c_1 x_{21} + c_2 x_{22} + \cdots + c_n x_{2n} \\ \vdots \\ c_1 x_{n1} + c_2 x_{n2} + \cdots + c_n x_{nn} \end{bmatrix} \]


Now observe that the right side of the above equation (the overall vector formulation of $\mathbf{x}$) can be expressed as a matrix product. In particular, we write
\[ \mathbf{x} = \Phi(t)\mathbf{C} \tag{3.7.2} \]
where $\mathbf{C}$ is the vector whose entries are the arbitrary constants $c_1, \ldots, c_n$ that arise in the formulation of the general solution $\mathbf{x}$, and $\Phi(t)$ is the matrix whose columns are the n linearly independent solutions to $\mathbf{x}' = A\mathbf{x}$. We call $\Phi(t)$ the fundamental solution matrix of the system.

At this point, it is essential to make two observations about $\Phi(t)$. The first is that $\Phi(t)$ is nonsingular for every relevant value of $t$. This holds because the columns of $\Phi(t)$ are linearly independent since, by definition, they are linearly independent solutions of $\mathbf{x}' = A\mathbf{x}$. Second, we note that $\Phi'(t) = A\Phi(t)$. Since the derivative of $\Phi(t)$ is taken component-wise, this equation is simply the matrix way to say that each column of $\Phi(t)$ satisfies the homogeneous system of equations $\mathbf{x}' = A\mathbf{x}$.

Now, recall (3.7.2) where we expressed the complementary solution in the form $\mathbf{x}_h = \Phi(t)\mathbf{C}$. As we now seek a particular solution $\mathbf{x}_p$ to the nonhomogeneous equation, it is natural to suppose that $\mathbf{x}_p$ has the form
\[ \mathbf{x}_p(t) = \Phi(t)\mathbf{u}(t) \tag{3.7.3} \]
where $\mathbf{u}(t)$ is a function yet to be determined. We now substitute this guess for $\mathbf{x}_p$ into $\mathbf{x}' = A\mathbf{x} + \mathbf{b}(t)$ to see what conditions $\mathbf{u}$ must satisfy. For ease of display, in what follows we suppress the "$(t)$" notation in each of the functions $\Phi$, $\mathbf{u}$, $\mathbf{u}'$, and $\mathbf{b}$. By the product rule, $\mathbf{x}_p' = (\Phi\mathbf{u})' = \Phi\mathbf{u}' + \Phi'\mathbf{u}$ and so substituting into the system $\mathbf{x}' = A\mathbf{x} + \mathbf{b}(t)$, we have
\[ \Phi\mathbf{u}' + \Phi'\mathbf{u} = A\Phi\mathbf{u} + \mathbf{b} \tag{3.7.4} \]
Recalling our observation above that $\Phi' = A\Phi$, we can substitute in (3.7.4) to find
\[ \Phi\mathbf{u}' + A\Phi\mathbf{u} = A\Phi\mathbf{u} + \mathbf{b} \tag{3.7.5} \]
We next subtract $A\Phi\mathbf{u}$ from both sides of (3.7.5) to deduce that
\[ \Phi\mathbf{u}' = \mathbf{b} \tag{3.7.6} \]
Since we are interested in determining the unknown function $\mathbf{u}$, and we know that $\Phi$ is nonsingular, we may now write
\[ \mathbf{u}' = \Phi^{-1}\mathbf{b} \tag{3.7.7} \]
and, therefore, $\mathbf{u}$ must have the form
\[ \mathbf{u}(t) = \int \Phi^{-1}(t)\mathbf{b}(t)\, dt \tag{3.7.8} \]


Finally, recalling the supposition we made in (3.7.3) that $\mathbf{x}_p = \Phi\mathbf{u}$, (3.7.8) now implies

$$\mathbf{x}_p(t) = \Phi(t)\int \Phi^{-1}(t)\,\mathbf{b}(t)\,dt \tag{3.7.9}$$

It is remarkable how this form of $\mathbf{x}_p$ aligns with our experience with a single linear first-order differential equation and the form of its solution given by (3.7.1). We summarize our above work in the following theorem.

Theorem 3.7.1 If $A$ is an $n \times n$ matrix with constant entries, $\Phi(t)$ is the fundamental solution matrix of the homogeneous system of differential equations $\mathbf{x}' = A\mathbf{x}$, and $\mathbf{b}(t)$ is a continuous vector function, then a particular solution $\mathbf{x}_p$ to the nonhomogeneous system $\mathbf{x}' = A\mathbf{x} + \mathbf{b}(t)$ is given by

$$\mathbf{x}_p(t) = \Phi(t)\int \Phi^{-1}(t)\,\mathbf{b}(t)\,dt \tag{3.7.10}$$

The approach to finding a particular solution given in theorem 3.7.1 is often called variation of parameters. We next consider an example to see theorem 3.7.1 at work.

Example 3.7.1 Find the general solution of the nonhomogeneous system given by

$$\mathbf{x}' = \begin{bmatrix} 2 & -1 \\ 3 & -2 \end{bmatrix}\mathbf{x} + \begin{bmatrix} 0 \\ 4t \end{bmatrix}$$

Solution. From our determination of the eigenvalues and eigenvectors of the same coefficient matrix in example 3.6.2, the complementary solution is

$$\mathbf{x}_h = c_1 e^{-t}\begin{bmatrix} 1 \\ 3 \end{bmatrix} + c_2 e^{t}\begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

Therefore, the fundamental matrix is

$$\Phi(t) = \begin{bmatrix} e^{-t} & e^{t} \\ 3e^{-t} & e^{t} \end{bmatrix}$$

According to (3.7.10), we next need to compute $\Phi^{-1}$. While the inverse of this matrix of functions may be computed by row-reducing $[\Phi \mid I]$ in the usual way, because of the function coefficients in $\Phi$ it is much easier to use a shortcut for computing the inverse of a $2 \times 2$ matrix that we established in exercise 19 of section 1.9. Specifically, if

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

is an invertible matrix, then

$$A^{-1} = \frac{1}{\det(A)}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$

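Here, since $\det(\Phi) = e^{-t}e^{t} - 3e^{-t}e^{t} = -2$, it follows

$$\Phi^{-1} = -\frac{1}{2}\begin{bmatrix} e^{t} & -e^{t} \\ -3e^{-t} & e^{-t} \end{bmatrix}$$

Thus, by (3.7.10), we now have

$$\begin{aligned}
\mathbf{x}_p(t) &= \Phi(t)\int \Phi^{-1}(t)\,\mathbf{b}(t)\,dt \\
&= \begin{bmatrix} e^{-t} & e^{t} \\ 3e^{-t} & e^{t} \end{bmatrix} \int \begin{bmatrix} -\tfrac{1}{2}e^{t} & \tfrac{1}{2}e^{t} \\ \tfrac{3}{2}e^{-t} & -\tfrac{1}{2}e^{-t} \end{bmatrix} \begin{bmatrix} 0 \\ 4t \end{bmatrix} dt \\
&= \begin{bmatrix} e^{-t} & e^{t} \\ 3e^{-t} & e^{t} \end{bmatrix} \int \begin{bmatrix} 2te^{t} \\ -2te^{-t} \end{bmatrix} dt
\end{aligned}$$

Integrating the vector function component-wise by parts and computing the subsequent matrix product,

$$\begin{aligned}
\mathbf{x}_p(t) &= \begin{bmatrix} e^{-t} & e^{t} \\ 3e^{-t} & e^{t} \end{bmatrix} \begin{bmatrix} 2(t-1)e^{t} \\ 2(t+1)e^{-t} \end{bmatrix} \\
&= \begin{bmatrix} 2(t-1) + 2(t+1) \\ 6(t-1) + 2(t+1) \end{bmatrix} \\
&= \begin{bmatrix} 4t \\ 8t - 4 \end{bmatrix}
\end{aligned}$$

Therefore, the general solution to the original nonhomogeneous system is

$$\mathbf{x} = \mathbf{x}_h + \mathbf{x}_p = c_1 e^{-t}\begin{bmatrix} 1 \\ 3 \end{bmatrix} + c_2 e^{t}\begin{bmatrix} 1 \\ 1 \end{bmatrix} + \begin{bmatrix} 4t \\ 8t - 4 \end{bmatrix}$$

Example 3.7.1 demonstrates that there are three key steps in the solution to systems of the form $\mathbf{x}' = A\mathbf{x} + \mathbf{b}(t)$. The first is solving the related homogeneous system $\mathbf{x}' = A\mathbf{x}$ to determine the fundamental solution matrix $\Phi(t)$. Next, we have to compute $\Phi^{-1}(t)$. And finally, we must integrate the vector function given by $\Phi^{-1}(t)\mathbf{b}(t)$. Since we are seeking just one particular solution $\mathbf{x}_p$, there is no need to include the arbitrary constants that arise in antidifferentiating $\Phi^{-1}(t)\mathbf{b}(t)$.

We close this section with a second example that shows the computations involved when more complicated functions are present in $\mathbf{b}(t)$.

Example 3.7.2 Find the general solution of the nonhomogeneous system given by

$$\mathbf{x}' = \begin{bmatrix} 2 & -1 \\ 3 & -2 \end{bmatrix}\mathbf{x} + \begin{bmatrix} 1/(e^{t}+1) \\ 1 \end{bmatrix}$$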

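Solution. We first find $\mathbf{x}_h$. By finding the eigenvalues and eigenvectors of the coefficient matrix $A$, it is straightforward to show that

$$\mathbf{x}_h = c_1 e^{-t}\begin{bmatrix} 1 \\ 3 \end{bmatrix} + c_2 e^{t}\begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

Therefore, the fundamental solution matrix is

$$\Phi(t) = \begin{bmatrix} e^{-t} & e^{t} \\ 3e^{-t} & e^{t} \end{bmatrix}$$

Moreover, we can show that

$$\Phi^{-1}(t) = -\frac{1}{2}\begin{bmatrix} e^{t} & -e^{t} \\ -3e^{-t} & e^{-t} \end{bmatrix}$$

We are now ready to compute $\mathbf{x}_p$ and write

$$\begin{aligned}
\mathbf{x}_p(t) &= \Phi(t)\int \Phi^{-1}(t)\,\mathbf{b}(t)\,dt \\
&= \begin{bmatrix} e^{-t} & e^{t} \\ 3e^{-t} & e^{t} \end{bmatrix} \int -\frac{1}{2}\begin{bmatrix} e^{t} & -e^{t} \\ -3e^{-t} & e^{-t} \end{bmatrix} \begin{bmatrix} 1/(e^{t}+1) \\ 1 \end{bmatrix} dt \\
&= \begin{bmatrix} e^{-t} & e^{t} \\ 3e^{-t} & e^{t} \end{bmatrix} \int \begin{bmatrix} \dfrac{1}{2}\,\dfrac{e^{2t}}{e^{t}+1} \\[2ex] \dfrac{1}{2}\,\dfrac{2e^{-t}-1}{e^{t}+1} \end{bmatrix} dt
\end{aligned}$$

At this point, it is easiest to use a computer algebra system to integrate and complete our calculation of $\mathbf{x}_p$. Doing so, and then finding the required matrix product, we have

$$\begin{aligned}
\mathbf{x}_p(t) &= \begin{bmatrix} e^{-t} & e^{t} \\ 3e^{-t} & e^{t} \end{bmatrix} \begin{bmatrix} \tfrac{1}{2}e^{t} - \tfrac{1}{2}\ln(e^{t}+1) \\[1ex] -e^{-t} - \tfrac{3}{2}t + \tfrac{3}{2}\ln(e^{t}+1) \end{bmatrix} \\
&= \begin{bmatrix} -\tfrac{1}{2} - \tfrac{1}{2}e^{-t}\ln(e^{t}+1) - \tfrac{3}{2}te^{t} + \tfrac{3}{2}e^{t}\ln(e^{t}+1) \\[1ex] \tfrac{1}{2} - \tfrac{3}{2}e^{-t}\ln(e^{t}+1) - \tfrac{3}{2}te^{t} + \tfrac{3}{2}e^{t}\ln(e^{t}+1) \end{bmatrix}
\end{aligned}$$

Hence, the general solution to the given nonhomogeneous system is

$$\mathbf{x} = \mathbf{x}_h + \mathbf{x}_p = c_1 e^{-t}\begin{bmatrix} 1 \\ 3 \end{bmatrix} + c_2 e^{t}\begin{bmatrix} 1 \\ 1 \end{bmatrix} + \begin{bmatrix} -\tfrac{1}{2} - \tfrac{1}{2}e^{-t}\ln(e^{t}+1) - \tfrac{3}{2}te^{t} + \tfrac{3}{2}e^{t}\ln(e^{t}+1) \\[1ex] \tfrac{1}{2} - \tfrac{3}{2}e^{-t}\ln(e^{t}+1) - \tfrac{3}{2}te^{t} + \tfrac{3}{2}e^{t}\ln(e^{t}+1) \end{bmatrix}$$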
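A numerical spot-check can guard against algebra slips in a computation like the one in example 3.7.2. The following Python sketch is our own illustration, separate from the Maple workflow used in this text. It evaluates the particular solution found above, approximates its derivative with a central difference, and confirms that $\mathbf{x}_p' - (A\mathbf{x}_p + \mathbf{b})$ is close to zero at a few sample times. The function names, the sample times, and the step size `h` are assumptions chosen for illustration.

```python
import math

# Coefficient matrix and forcing term from example 3.7.2.
A = [[2.0, -1.0], [3.0, -2.0]]


def b(t):
    """Forcing vector b(t) = [1/(e^t + 1), 1]."""
    return [1.0 / (math.exp(t) + 1.0), 1.0]


def xp(t):
    """Particular solution obtained by variation of parameters."""
    et = math.exp(t)
    log_term = math.log(et + 1.0)
    # Terms shared by both components: -(3/2) t e^t + (3/2) e^t ln(e^t + 1).
    shared = -1.5 * t * et + 1.5 * et * log_term
    return [-0.5 - 0.5 * log_term / et + shared,
            0.5 - 1.5 * log_term / et + shared]


def residual(t, h=1e-5):
    """Largest entry of |xp'(t) - (A xp(t) + b(t))|, with xp' from a central difference."""
    forward, backward, here, force = xp(t + h), xp(t - h), xp(t), b(t)
    worst = 0.0
    for i in range(2):
        derivative = (forward[i] - backward[i]) / (2.0 * h)
        rhs = A[i][0] * here[0] + A[i][1] * here[1] + force[i]
        worst = max(worst, abs(derivative - rhs))
    return worst


# Sample times are arbitrary; any t where the functions are defined would do.
max_residual = max(residual(t) for t in (-1.0, 0.0, 0.5, 2.0))
print(max_residual)  # only finite-difference error should remain
```

A residual that is not small at some sample time would point to an error in one of the three steps: the fundamental matrix, its inverse, or the integration.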

At each stage in applying variation of parameters it is essential to simplify. In particular, $\Phi^{-1}(t)$ should be simplified as much as possible before computing $\Phi^{-1}(t)\mathbf{b}(t)$, and similarly, $\int \Phi^{-1}(t)\mathbf{b}(t)\,dt$ should be simplified as much as possible before computing $\Phi(t)\int \Phi^{-1}(t)\mathbf{b}(t)\,dt$. One option, of course, is to use a computer algebra system to avoid the more tedious aspects of the computations. We offer some suggestions for how to use Maple to assist in the computations in the following subsection.

3.7.1 Applying variation of parameters using Maple

Here we address how Maple can be used to execute the computations in a problem such as the one posed in example 3.7.2, where we are interested in solving the nonhomogeneous linear system of equations given by

$$\mathbf{x}' = \begin{bmatrix} 2 & -1 \\ 3 & -2 \end{bmatrix}\mathbf{x} + \begin{bmatrix} 1/(e^{t}+1) \\ 1 \end{bmatrix}$$

As usual, we load the Linear Algebra package.

> with(LinearAlgebra):

Because we already know how to find the complementary solution, we focus on determining $\mathbf{x}_p$ by variation of parameters. First, we use the complementary solution,

$$\mathbf{x}_h = c_1 e^{-t}\begin{bmatrix} 1 \\ 3 \end{bmatrix} + c_2 e^{t}\begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

to define the fundamental matrix $\Phi(t)$:

> Phi := <<exp(-t),3*exp(-t)>|<exp(t),exp(t)>>;

We next use the MatrixInverse command to find $\Phi^{-1}$ by entering

> MatrixInverse(Phi);

The resulting output is

$$\begin{bmatrix} -\dfrac{1}{2}\,\dfrac{1}{e^{-t}} & \dfrac{1}{2}\,\dfrac{1}{e^{-t}} \\[2ex] \dfrac{3}{2}\,\dfrac{1}{e^{t}} & -\dfrac{1}{2}\,\dfrac{1}{e^{t}} \end{bmatrix}$$

We can simplify this result using negative exponents; Maple can do so through the following command, through which we also store $\Phi^{-1}$ in PhiInv:

> PhiInv := simplify(MatrixInverse(Phi));

Next, in order to compute $\Phi^{-1}(t)\mathbf{b}(t)$, we must enter the function $\mathbf{b}(t)$. We enter

> b := <<1/(exp(t)+1),1>>;

and then

> y := simplify(PhiInv.b);

At this point, y is a $2 \times 1$ array that holds the vector function $\Phi^{-1}(t)\mathbf{b}(t)$. Specifically, the output for y displayed by Maple is

$$y := \begin{bmatrix} \dfrac{1}{2}\,\dfrac{e^{2t}}{e^{t}+1} \\[2ex] -\dfrac{1}{2}\,\dfrac{e^{-t}(-2+e^{t})}{e^{t}+1} \end{bmatrix}$$

To access the components in y, we reference them with the commands y[1,1] and y[2,1]. In particular, since we have to integrate $\Phi^{-1}(t)\mathbf{b}(t)$ component-wise, we enter

> Y := <<int(y[1,1],t),int(y[2,1],t)>>;

This last command produces the output

$$Y := \begin{bmatrix} \tfrac{1}{2}e^{t} - \tfrac{1}{2}\ln(e^{t}+1) \\[1ex] -\dfrac{1}{e^{t}} - \tfrac{3}{2}\ln(e^{t}) + \tfrac{3}{2}\ln(e^{t}+1) \end{bmatrix}$$

and stores $\int \Phi^{-1}(t)\mathbf{b}(t)\,dt$ in Y. Note that Maple has not made the obvious simplification $\ln(e^{t}) = t$. Finally, in order to compute $\Phi(t)\int \Phi^{-1}(t)\mathbf{b}(t)\,dt$, we need to enter Phi.Y. Of course, we again want to simplify, so we use

> simplify(Phi.Y);

which produces the output

$$\begin{bmatrix} -\tfrac{1}{2} - \tfrac{1}{2}e^{-t}\ln(e^{t}+1) - \tfrac{3}{2}e^{t}\ln(e^{t}) + \tfrac{3}{2}e^{t}\ln(e^{t}+1) \\[1ex] \tfrac{1}{2} - \tfrac{3}{2}e^{-t}\ln(e^{t}+1) - \tfrac{3}{2}e^{t}\ln(e^{t}) + \tfrac{3}{2}e^{t}\ln(e^{t}+1) \end{bmatrix}$$

This last result is the particular solution $\mathbf{x}_p$ to the original system of nonhomogeneous equations given in example 3.7.2. Note again that we can simplify $\ln(e^{t})$ to $t$ in each component.

Exercises 3.7

1. Consider the system of differential equations given by
$$\mathbf{x}' = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\mathbf{x} + \begin{bmatrix} 5 \\ -1 \end{bmatrix}$$
(a) Based on the form of $\mathbf{b}(t)$, make a guess and determine $\mathbf{x}_p$ by undetermined coefficients.
(b) Use variation of parameters to determine $\mathbf{x}_p$.

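2. Consider the system of differential equations given by
$$\mathbf{x}' = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\mathbf{x} + \begin{bmatrix} e^{2t} \\ 0 \end{bmatrix}$$
(a) Based on the form of $\mathbf{b}(t)$, make a guess and determine $\mathbf{x}_p$ by undetermined coefficients.
(b) Use variation of parameters to determine $\mathbf{x}_p$.

3. Consider the system of differential equations given by
$$\mathbf{x}' = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\mathbf{x} + \begin{bmatrix} 3e^{t} \\ e^{t} \end{bmatrix}$$
(a) Based on the form of $\mathbf{b}(t)$, what is the natural guess for $\mathbf{x}_p$? Show that this natural guess fails to work.
(b) Compute the complementary solution $\mathbf{x}_h$ to the stated system and use its form to explain why the natural guess in (a) is not a valid one.
(c) Use variation of parameters to determine $\mathbf{x}_p$.

4. Consider the system of differential equations given by
$$\mathbf{x}' = \begin{bmatrix} 0 & 2 \\ 1 & -1 \end{bmatrix}\mathbf{x} + \begin{bmatrix} 4\sin t \\ 2\sin t \end{bmatrix}$$
(a) Based on the form of $\mathbf{b}(t)$, what would be the natural guess to make for $\mathbf{x}_p$? How many undetermined coefficients would need to be computed?
(b) Use variation of parameters to determine $\mathbf{x}_p$.

In each of the exercises 5–12, determine the general solution to the given system by finding $\mathbf{x}_p$ using variation of parameters. Note that in each case, $\Phi(t)$ is given.

5. $\mathbf{x}' = \begin{bmatrix} 1 & 0 \\ -1 & 3 \end{bmatrix}\mathbf{x} + \begin{bmatrix} 2e^{-t} \\ 1 \end{bmatrix}$, $\quad \Phi(t) = \begin{bmatrix} 2e^{t} & 0 \\ e^{t} & e^{3t} \end{bmatrix}$

6. $\mathbf{x}' = \begin{bmatrix} 1 & 0 \\ -1 & 3 \end{bmatrix}\mathbf{x} + \begin{bmatrix} e^{3t} \\ -e^{3t} \end{bmatrix}$, $\quad \Phi(t) = \begin{bmatrix} 2e^{t} & 0 \\ e^{t} & e^{3t} \end{bmatrix}$

7. $\mathbf{x}' = \begin{bmatrix} 1 & 0 \\ -1 & 3 \end{bmatrix}\mathbf{x} + \begin{bmatrix} \cos 2t \\ 2\sin 2t \end{bmatrix}$, $\quad \Phi(t) = \begin{bmatrix} 2e^{t} & 0 \\ e^{t} & e^{3t} \end{bmatrix}$

8. $\mathbf{x}' = \begin{bmatrix} 2 & 1 \\ 3 & 0 \end{bmatrix}\mathbf{x} + \begin{bmatrix} 10t \\ 10t \end{bmatrix}$, $\quad \Phi(t) = \begin{bmatrix} e^{3t} & -e^{-t} \\ e^{3t} & 3e^{-t} \end{bmatrix}$

9. $\mathbf{x}' = \begin{bmatrix} 2 & 1 \\ 3 & 0 \end{bmatrix}\mathbf{x} + \begin{bmatrix} 2e^{-t} \\ 5e^{-t} \end{bmatrix}$, $\quad \Phi(t) = \begin{bmatrix} e^{3t} & -e^{-t} \\ e^{3t} & 3e^{-t} \end{bmatrix}$

10. $\mathbf{x}' = \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}\mathbf{x} + \begin{bmatrix} e^{2t} \\ 0 \end{bmatrix}$, $\quad \Phi(t) = \begin{bmatrix} e^{t}\cos t & e^{t}\sin t \\ -e^{t}\sin t & e^{t}\cos t \end{bmatrix}$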

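11. $\mathbf{x}' = \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}\mathbf{x} + \begin{bmatrix} 2t + 2 \\ 0 \end{bmatrix}$, $\quad \Phi(t) = \begin{bmatrix} e^{t}\cos t & e^{t}\sin t \\ -e^{t}\sin t & e^{t}\cos t \end{bmatrix}$

12. $\mathbf{x}' = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}\mathbf{x} + \begin{bmatrix} e^{t} \\ 1 \\ 0 \end{bmatrix}$, $\quad \Phi(t) = \begin{bmatrix} e^{2t} & te^{2t} & 0 \\ 0 & e^{2t} & 0 \\ 0 & 0 & e^{t} \end{bmatrix}$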

3.8 Applications of linear systems

In this section, we consider three fundamental physical problems that may be modeled and studied using linear systems of differential equations.

3.8.1 Mixing problems

Through our study of the motivating example provided at the start of chapter 1 and reconsidered at the beginning of the current chapter, we have seen that mixing problems naturally lead to nonhomogeneous linear systems of differential equations. Below, we examine a slightly more complicated example.

Consider a system of three tanks connected in such a way that each of the tanks has an independent inflow that delivers salt solution to it, each has an independent outflow (drain), and each tank is connected to the other two with both outflow and inflow pipes. The relevant information about each tank is given in table 3.2.

Table 3.2 Saltwater mixing in three tanks A, B, and C

                                    Tank A                Tank B                Tank C
Tank volume                         50 liters             100 liters            200 liters
Rate of inflow to the tank          2 liters/min          4 liters/min          5 liters/min
Concentration of salt in inflow     0.25 g/liter          2 g/liter             0.9 g/liter
Rate of drain outflow               2 liters/min          4 liters/min          5 liters/min
Rates of outflows to other tanks    to B: 3 liters/min    to C: 1 liter/min     to A: 4 liters/min
Rates of outflows to other tanks    to C: 4 liters/min    to A: 3 liters/min    to B: 1 liter/min

We set up a system of differential equations whose solution represents the amount of salt in each tank at time $t$ and state the system in matrix form. For tank A, we denote the amount of salt (in grams) in the tank at time $t$ (in minutes) by $x_1(t)$. Similarly, we let $x_2(t)$ and $x_3(t)$ represent the amount of salt in tanks B and C. A careful check of the given data shows that for each tank the total rates of inflow and outflow of solution balance so that the volume of solution in each tank is constant.

From the given information on the independent inflow to the tank, we know that tank A gains salt at a rate of

$$0.25\ \frac{\text{g}}{\text{liter}} \cdot 2\ \frac{\text{liters}}{\text{min}} = 0.5\ \frac{\text{g}}{\text{min}} \tag{3.8.1}$$

Furthermore, tank A also gains salt from the two inflows that come from tanks B and C. For tank B, which contains 100 liters of solution, solution flows to A at a rate of 3 liters/min with a concentration of $x_2(t)/100$ g/liter, so that salt is gained by tank A at a rate of

$$\frac{x_2}{100}\ \frac{\text{g}}{\text{liter}} \cdot 3\ \frac{\text{liters}}{\text{min}} = \frac{3x_2}{100}\ \frac{\text{g}}{\text{min}} \tag{3.8.2}$$

Similarly, the flow from tank C to tank A results in A gaining salt at a rate of

$$\frac{x_3}{200}\ \frac{\text{g}}{\text{liter}} \cdot 4\ \frac{\text{liters}}{\text{min}} = \frac{x_3}{50}\ \frac{\text{g}}{\text{min}} \tag{3.8.3}$$

Tank A is also losing salt through its three outflows: a drain, flow to tank B, and flow to tank C. Since the concentration of solution in tank A at time $t$ is $x_1(t)/50$ g/liter, it follows that each outflow carries this concentration of salt, doing so at respective rates of 2 liters/min, 3 liters/min, and 4 liters/min. This shows that solution is leaving tank A at a cumulative rate of 9 liters/min, therefore causing the rate at which salt is lost from tank A to be

$$\frac{x_1}{50}\ \frac{\text{g}}{\text{liter}} \cdot 9\ \frac{\text{liters}}{\text{min}} = \frac{9x_1}{50}\ \frac{\text{g}}{\text{min}} \tag{3.8.4}$$

Combining the rates of inflow and outflow in (3.8.1), (3.8.2), (3.8.3), and (3.8.4), it follows that $x_1(t)$ satisfies the differential equation

$$x_1' = 0.5 + \frac{3x_2}{100} + \frac{4x_3}{200} - \frac{9x_1}{50} \tag{3.8.5}$$

Similar reasoning shows that $x_2(t)$ and $x_3(t)$ satisfy the differential equations

$$x_2' = 8 + \frac{3x_1}{50} + \frac{x_3}{200} - \frac{8x_2}{100} \tag{3.8.6}$$

and

$$x_3' = 4.5 + \frac{4x_1}{50} + \frac{x_2}{100} - \frac{10x_3}{200} \tag{3.8.7}$$

Rearranging (3.8.5), (3.8.6), and (3.8.7) and writing the system they generate in matrix form, we see

$$\mathbf{x}' = \begin{bmatrix} -9/50 & 3/100 & 1/50 \\ 3/50 & -2/25 & 1/200 \\ 2/25 & 1/100 & -1/20 \end{bmatrix}\mathbf{x} + \begin{bmatrix} 0.5 \\ 8 \\ 4.5 \end{bmatrix} \tag{3.8.8}$$
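One quick consistency check on the matrix form (3.8.8) is to look for a constant solution, where $\mathbf{x}' = \mathbf{0}$ and hence $A\mathbf{x} = -\mathbf{b}$. The Python sketch below is an illustration of ours, not part of the text's method: it solves that linear system exactly with rational arithmetic, using a small Gauss-Jordan routine. The helper name `solve` is hypothetical, and the routine assumes the coefficient matrix is nonsingular.

```python
from fractions import Fraction as F

# Coefficient matrix and constant forcing vector from (3.8.8), as exact fractions.
A = [[F(-9, 50), F(3, 100), F(1, 50)],
     [F(3, 50), F(-2, 25), F(1, 200)],
     [F(2, 25), F(1, 100), F(-1, 20)]]
b = [F(1, 2), F(8), F(9, 2)]


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination (assumes a nonsingular matrix)."""
    n = len(matrix)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [value / scale for value in aug[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


# Setting x' = 0 in x' = A x + b gives A x = -b.
equilibrium = solve(A, [-value for value in b])
print([str(value) for value in equilibrium])
```

The amounts returned are in grams; dividing each by its tank volume gives the concentration at which that tank settles.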

We can easily determine the equilibrium solution to the system by setting $\mathbf{x}' = \mathbf{0}$ and row-reducing the resulting linear system of equations. Doing so results in

$$\left[\begin{array}{rrr|r} -9/50 & 3/100 & 1/50 & -0.5 \\ 3/50 & -2/25 & 1/200 & -8 \\ 2/25 & 1/100 & -1/20 & -4.5 \end{array}\right] \rightarrow \left[\begin{array}{rrr|r} 1 & 0 & 0 & 50 \\ 0 & 1 & 0 & 150 \\ 0 & 0 & 1 & 200 \end{array}\right]$$

so that $x_1 = 50$, $x_2 = 150$, $x_3 = 200$ is the only equilibrium solution to the system. In addition, the eigenpairs of the coefficient matrix $A$ are approximately $\lambda = -0.030, -0.204, -0.076$ and $\mathbf{v} = [0.203\ \ 0.346\ \ 1]^T$, $[-2.041\ \ 0.949\ \ 1]^T$, $[-0.168\ \ {-1.250}\ \ 1]^T$. Since all three eigenvalues are real and negative, we can conclude that the above equilibrium is a stable attracting node.

Moreover, we can determine the general solution to the system. The eigenvalues and eigenvectors provide us with $\mathbf{x}_h$, the complementary solution, while $\mathbf{x}_p$ is given by the equilibrium solution so that

$$\mathbf{x}(t) = c_1 e^{-0.030t}\begin{bmatrix} 0.203 \\ 0.346 \\ 1 \end{bmatrix} + c_2 e^{-0.204t}\begin{bmatrix} -2.041 \\ 0.949 \\ 1 \end{bmatrix} + c_3 e^{-0.076t}\begin{bmatrix} -0.168 \\ -1.250 \\ 1 \end{bmatrix} + \begin{bmatrix} 50 \\ 150 \\ 200 \end{bmatrix}$$

We conclude from this example that three connected tanks generate a natural example of a linear system of nonhomogeneous differential equations. Certainly, we can envision similar ideas being applied to more complicated scenarios, such as the spread of a pollutant through a connected chain of rivers and lakes.

3.8.2 Spring-mass systems

In section 3.1, we developed the linear second-order differential equation that governs the behavior of a spring-mass system and converted the equation to a system of two first-order equations. In particular, we learned that for a system with mass $m$, spring constant $k$, damping constant $c$, and driving force $F(t)$, the displacement $y(t)$ of the mass from its equilibrium position satisfies the DE

$$y'' + \frac{c}{m}y' + \frac{k}{m}y = \frac{1}{m}F(t) \tag{3.8.9}$$

Moreover, using the substitution $x_1 = y$ and $x_2 = x_1' = y'$, it follows that (3.8.9) can be represented by the system

$$\begin{aligned} x_1' &= x_2 \\ x_2' &= -\frac{k}{m}x_1 - \frac{c}{m}x_2 + \frac{1}{m}F(t) \end{aligned} \tag{3.8.10}$$

Figure 3.15 Two masses m1 and m2 joined by two springs, at equilibrium. (Figure labels: spring constants k1 and k2, equilibrium positions of m1 and m2, separation L.)

Next, we consider the more complicated case of a system involving two masses and two springs, but omit damping and driving forces. In particular, suppose that a mass $m_1$ is attached to a spring with spring constant $k_1$ and that from $m_1$ a second spring with constant $k_2$ and mass $m_2$ is attached, as shown in figure 3.15. While we represent the masses with boxes, for our theoretical work we assume we are working with point-masses, where all of the mass is concentrated at a single point. We can envision these points as lying at the centers of the respective boxes in figure 3.15. To omit damping, we assume that the surface on which the masses rest is frictionless. Once the masses are set in motion by some collection of initial displacements and velocities, we let $x_1(t)$ denote the displacement of $m_1$ from its equilibrium position and $x_2(t)$ the displacement of $m_2$ from its equilibrium position, as shown in figure 3.16.

We seek a system of first-order differential equations that models this situation. Note that $m_1$ has two springs attached to it, so each spring exerts a force on $m_1$. One is $F_1 = -k_1 x_1$, which is the force the first spring exerts to oppose the displacement of the first mass. Next, observe that when the system is at equilibrium, the distance between the two masses is some constant $L$. Once the system is set in motion, the distance between the two masses is $L + x_2 - x_1$. As such, the second spring is being stretched a length of $x_2 - x_1$ beyond where it is when the system is at equilibrium. On mass $m_1$ this exerts a force in the opposite direction of $F_1$, specifically the force $F_2 = k_2(x_2 - x_1)$. On the second mass $m_2$ there is only this same force exerted by the second spring, but in the opposite direction as on $m_1$. In particular, $F_3 = -k_2(x_2 - x_1)$ acts on $m_2$.

Figure 3.16 Two masses m1 and m2 and two springs displaced from equilibrium. (Figure labels: displacements x1 and x2, equilibrium separation L.)

Now, because we have omitted damping and forcing, these are the only forces acting on $m_1$ and $m_2$. Newton's second law tells us that the sum of all forces acting on an object must equal the object's mass times its acceleration. In particular, we have

$$\begin{aligned} m_1 x_1'' &= -k_1 x_1 + k_2(x_2 - x_1) \\ m_2 x_2'' &= -k_2(x_2 - x_1) \end{aligned}$$

Dividing through by $m_1$ and $m_2$, respectively, these observations lead us to the system of linear second-order differential equations

$$\begin{aligned} x_1'' &= -\frac{k_1}{m_1}x_1 + \frac{k_2}{m_1}(x_2 - x_1) \\ x_2'' &= -\frac{k_2}{m_2}(x_2 - x_1) \end{aligned} \tag{3.8.11}$$

To study the behavior of this system with the techniques that we have developed, we must convert each of the second-order equations to a system of two first-order equations. Before doing so, we introduce specific numerical values for the masses and spring constants to simplify our work. We let $k_1 = 2$ and $k_2 = 1$, and $m_1 = 2$ and $m_2 = 4$. This yields the system

$$\begin{aligned} x_1'' &= -x_1 + 0.5(x_2 - x_1) \\ x_2'' &= -0.25(x_2 - x_1) \end{aligned} \tag{3.8.12}$$

Using the substitutions $y_1 = x_1$, $y_2 = y_1' = x_1'$, $y_3 = x_2$, $y_4 = y_3' = x_2'$, it follows that (3.8.12) results in the system of four first-order equations given by

$$\begin{aligned} y_1' &= y_2 \\ y_2' &= -y_1 + 0.5(y_3 - y_1) \\ y_3' &= y_4 \\ y_4' &= -0.25(y_3 - y_1) \end{aligned} \tag{3.8.13}$$

Letting $\mathbf{y}$ be the vector $[y_1\ y_2\ y_3\ y_4]^T$, we can write (3.8.13) in matrix form,

$$\mathbf{y}' = \begin{bmatrix} 0 & 1 & 0 & 0 \\ -1.5 & 0 & 0.5 & 0 \\ 0 & 0 & 0 & 1 \\ 0.25 & 0 & -0.25 & 0 \end{bmatrix}\mathbf{y} \tag{3.8.14}$$

From this, we can now analyze the overall behavior of the coupled spring-mass system. In particular, the eigenvalues and eigenvectors of the coefficient matrix in (3.8.14) will enable us to find the general solution $\mathbf{y}$. Given initial conditions, we can fully describe the functions $y_i(t)$ (particularly $y_1$ and $y_3$, which represent the respective displacements of the masses in the system) and understand the behavior of the system over time. This problem and others like it are explored further in the exercises at the end of this section.
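Before computing eigenvalues by hand, it can be reassuring to integrate (3.8.14) numerically and confirm a physical property the model should have. The sketch below is an illustration of ours, not the text's method: it advances the system with the classical fourth-order Runge-Kutta scheme and monitors the total mechanical energy, which should stay constant because damping and forcing were omitted. The initial state, step size, number of steps, and all function names are assumptions made for the example.

```python
# Coefficient matrix of (3.8.14); the state is y = [x1, x1', x2, x2'].
A = [[0.0, 1.0, 0.0, 0.0],
     [-1.5, 0.0, 0.5, 0.0],
     [0.0, 0.0, 0.0, 1.0],
     [0.25, 0.0, -0.25, 0.0]]

# Masses and spring constants chosen in the text.
M1, M2, K1, K2 = 2.0, 4.0, 2.0, 1.0


def deriv(y):
    """Right-hand side y' = A y."""
    return [sum(a * v for a, v in zip(row, y)) for row in A]


def rk4_step(y, dt):
    """One classical Runge-Kutta step of size dt."""
    k1 = deriv(y)
    k2 = deriv([v + 0.5 * dt * k for v, k in zip(y, k1)])
    k3 = deriv([v + 0.5 * dt * k for v, k in zip(y, k2)])
    k4 = deriv([v + dt * k for v, k in zip(y, k3)])
    return [v + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
            for v, a, b, c, d in zip(y, k1, k2, k3, k4)]


def energy(y):
    """Kinetic energy of both masses plus potential energy stored in both springs."""
    x1, v1, x2, v2 = y
    return (0.5 * M1 * v1 ** 2 + 0.5 * M2 * v2 ** 2
            + 0.5 * K1 * x1 ** 2 + 0.5 * K2 * (x2 - x1) ** 2)


# Hypothetical initial condition: m1 displaced one unit, everything else at rest.
y = [1.0, 0.0, 0.0, 0.0]
initial_energy = energy(y)
for _ in range(2000):  # 2000 steps of 0.01 covers 20 time units
    y = rk4_step(y, 0.01)
drift = abs(energy(y) - initial_energy)
print(initial_energy, drift)
```

A noticeable energy drift would suggest either a sign error in the coefficient matrix or a step size that is too large for the chosen scheme.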


3.8.3 RLC circuits

The flow of electricity through a circuit, much like the flow of water in a pipe, naturally involves relationships with rates of change. As such, the study of electrical current involves differential equations. Here, we explore some fundamental properties of electricity and how these lead to such equations.

Throughout what follows, we will make use of the analogy that the flow of charge carriers in an electrical circuit is like the flow of particles in a moving stream of water. Just as we consider flow of water in a pipe to be the number of water particles flowing past a given point during a certain time interval, the current $I(t)$ in a circuit at time $t$ is proportional to the number of positive charge carriers that move past any given point per second in the conductor. Note particularly that current measures a rate of change of charge. Current is measured in amperes (amp), the base unit through which all other units will be defined. One ampere corresponds to approximately $6.2415 \times 10^{18}$ charge carriers per second moving past a given point. The unit of charge is a coulomb, which is the amount of charge that flows through a cross section of a wire in one second when a one amp current is flowing. In other words,

$$1\ \text{amp} = 1\ \text{coulomb/s}$$

Here, we begin to see how derivatives and integrals are involved in the study of electricity. The current $I(t)$ at time $t$ is by definition a rate of change of charge. Thus, by the Fundamental Theorem of Calculus, the total amount of charge that flows past a given point on a time interval $[t_0, t_1]$ is given by

$$\int_{t_0}^{t_1} I(s)\,ds \tag{3.8.15}$$

If we let $Q(t)$ measure the total accumulated charge at a given point in the circuit from time $t_0$ up to time $t$, then we have

$$Q(t) = Q(t_0) + \int_{t_0}^{t} I(s)\,ds \tag{3.8.16}$$

and therefore $Q'(t) = I(t)$.

As current flows through a circuit, the charge carriers and elements in the circuit exchange energy. We, therefore, define a potential function $V$ throughout a circuit. The energy (per coulomb of charge) that has been exchanged by the charge carriers as they flow from point $a$ to point $b$ is computed as $V_{ab} = V_a - V_b$, where $V_a$ and $V_b$ are the values of the potential function at points $a$ and $b$ in the circuit. The difference $V_{ab}$ is called the voltage drop from $a$ to $b$ and is measured in joules per coulomb, which are also known as volts. If we again think of the flow of water through a pipe, the concept of voltage drop is analogous to the change in water pressure between points $a$ and $b$. Batteries, for example, maintain a voltage drop between two terminals; the energy provided by a battery's internal chemicals produces a constant amount of energy per coulomb as charge carriers

move throughout the battery, which raises the function $V$ by the voltage rating of the battery.

As current flows through a circuit, energy is lost. This makes the potential $V$ at one end lower than the potential at the other. A portion of a circuit, say from $a$ to $b$, over which a substantial amount of energy is lost is called a resistor. Good examples of resistors are light bulbs and heating elements, because they show how electrical energy can be converted into light and heat. The voltage drop across a resistor and the current flowing through it are modeled by Ohm's law, which says that the potential difference $V_{ab}$ between the endpoints $a$ and $b$ of a resistor is proportional to the current flowing through the resistor. In other words,

$$V_{ab} = IR \tag{3.8.17}$$

where $R$ is a constant called the resistance. The unit of resistance is the ohm, which is equal to one volt per ampere, or one volt-second per coulomb.

A changing electrical current $I(t)$ in a segment of a circuit will create a changing magnetic field that results in a voltage drop between the ends of the segment. When this effect is large, such as in a coil between points $b$ and $c$ (the effect can be magnified by different geometrical arrangements of the circuit), the device that induces the effect is called an inductor. Faraday's law tells us what happens with the voltage drops across inductors. In particular, the voltage drop across an inductor is proportional to the rate of change of the current, or, in other words

$$V_{bc} = L\frac{dI}{dt} \tag{3.8.18}$$

where $L$ is a constant called the inductance. Note specifically that Faraday's law regards the rate of change of current. Inductance is measured in henries.

Finally, if a circuit is broken and we include two plates separated by an insulating material (such as air), and the terminals of the circuit are connected to a voltage source (such as a battery), then charges will build up on the plates. In the ongoing analogy to water, this is similar to a tank used to store water to provide a source of pressure. We call the set of plates a capacitor, and speak of the total charge $Q(t)$ on the capacitor. From (3.8.16), since we know that current $I$ is the rate of change of charge $Q$, if we know an initial charge $Q(t_0)$, then given a current $I(t)$ we can find the charge $Q(t)$ by the relationship

$$Q(t) = Q(t_0) + \int_{t_0}^{t} I(s)\,ds \tag{3.8.19}$$

Finally, Coulomb's law states that the voltage drop $V_{cd}$ across a capacitor between points $c$ and $d$ is proportional to the charge on the capacitor, or

$$V_{cd} = \frac{1}{C}Q(t) = \frac{1}{C}\left[Q(t_0) + \int_{t_0}^{t} I(s)\,ds\right] \tag{3.8.20}$$

where $C$ is called the capacitance of the capacitor and is measured in farads.
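The relationships (3.8.19) and (3.8.20) are easy to explore numerically. The following Python sketch is our own illustration under stated assumptions: it uses a hypothetical current $I(s) = \sin 2s$ amperes, starts from zero initial charge at $t_0 = 0$, approximates the integral with the trapezoid rule, and then divides by a sample capacitance to obtain the voltage drop. The function names and the sample values are not taken from the text.

```python
import math


def current(s):
    """Hypothetical current in amperes, I(s) = sin(2s)."""
    return math.sin(2.0 * s)


def charge(t, q0=0.0, t0=0.0, steps=10_000):
    """Approximate Q(t) = Q(t0) + integral of I from t0 to t with the trapezoid rule."""
    h = (t - t0) / steps
    total = 0.5 * (current(t0) + current(t))
    for k in range(1, steps):
        total += current(t0 + k * h)
    return q0 + h * total


def capacitor_voltage(t, capacitance, q0=0.0):
    """Voltage drop V_cd = Q(t) / C across the capacitor."""
    return charge(t, q0) / capacitance


# For this particular current the integral has the closed form (1 - cos 2t) / 2.
q_numeric = charge(1.0)
q_exact = (1.0 - math.cos(2.0)) / 2.0
v_numeric = capacitor_voltage(1.0, capacitance=0.01)
print(q_numeric, q_exact, v_numeric)
```

Comparing `q_numeric` with `q_exact` shows how closely the trapezoid rule tracks the accumulated charge; for a measured current with no closed form, only the numerical route would be available.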

All three of the laws (3.8.17), (3.8.18), and (3.8.20) are based on experimental observations of circuits. Similarly, Kirchhoff's law is a conservation law that tells us what we can expect for the voltage drops across various parts of a circuit. Simply stated, Kirchhoff's law says that if we pick a sequence of points in a closed circuit, then the sum of the voltage drops across the segments between consecutive points is zero. Specifically, for points $a_1, a_2, \ldots, a_n$,

$$V_{a_1 a_2} + V_{a_2 a_3} + \cdots + V_{a_{n-1} a_n} + V_{a_n a_1} = 0 \tag{3.8.21}$$

A final necessary law for us to consider is Kirchhoff's current law, which tells us that at each point of a circuit, the sum of currents flowing into a point equals the sum of the currents flowing out. For a simple RLC circuit with one loop, Kirchhoff's current law guarantees that we can use a single function $I(t)$ to model the current at any point at a given time $t$; for circuits with multiple loops, several current functions are needed.

Now we are prepared to see how these fundamental laws of electricity lead to a second-order differential equation, and hence a $2 \times 2$ system of first-order DEs. Let us consider an RLC circuit that consists of a resistor, inductor, and capacitor, along with some energy (voltage) source $E(t)$, arranged in series, as shown in figure 3.17. Kirchhoff's law leads us directly to second-order differential equations that determine the behavior of the current $I(t)$ in the circuit and the charge $Q(t)$ on the capacitor.

By Ohm's law, we know that $V_{ab} = IR$. Similarly, Faraday's law implies that $V_{bc} = L\frac{dI}{dt}$ and Coulomb's law tells us that $V_{cd} = \frac{1}{C}Q(t) = \frac{1}{C}\left[Q(t_0) + \int_{t_0}^{t} I(s)\,ds\right]$. Finally, we know from the voltage source that $V_{da} = -E(t)$. Kirchhoff's law now yields the equation $V_{ab} + V_{bc} + V_{cd} + V_{da} = 0$, or

$$RI(t) + LI'(t) + \frac{1}{C}Q(t) = E(t) \tag{3.8.22}$$

Figure 3.17 An RLC circuit with resistance R, inductance L, capacitance C, and energy source E(t). (Figure labels: points a, b, c, d around the loop and current I(t).)

Recalling that $Q'(t) = I(t)$, we may rewrite (3.8.22) in two different ways. If we differentiate both sides of (3.8.22), and rearrange the terms in decreasing order of derivatives, it follows immediately that the current $I(t)$ must satisfy the linear second-order differential equation

$$LI''(t) + RI'(t) + \frac{1}{C}I(t) = E'(t) \tag{3.8.23}$$

If instead we substitute $Q'$ for $I$ in (3.8.22), then we see that $Q$ is the solution to the linear second-order differential equation

$$LQ''(t) + RQ'(t) + \frac{1}{C}Q(t) = E(t) \tag{3.8.24}$$

We can therefore study the behaviors of different RLC circuits based on the given resistance, inductance, capacitance, and supplied voltage. Moreover, as we well know, any such linear second-order differential equation may be converted to a system of first-order equations. For example, letting $x_1 = I$ and $x_2 = I'$, we can convert (3.8.23) to the system of equations

$$\begin{aligned} x_1' &= x_2 \\ x_2' &= -\frac{1}{CL}x_1 - \frac{R}{L}x_2 + \frac{1}{L}E'(t) \end{aligned}$$
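The conversion above is mechanical enough to automate. The Python sketch below is an illustration of ours rather than a method from the text: a hypothetical helper `rlc_system` builds the coefficient matrix of the first-order system from $L$, $R$, and $C$, and a second helper finds the eigenvalues of a $2 \times 2$ matrix from its characteristic polynomial $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A) = 0$. The component values passed in at the end are sample values chosen for illustration.

```python
import cmath


def rlc_system(L, R, C):
    """Return (A, forcing_scale) for the state x1 = I, x2 = I'.

    The system is x' = A x + [0, forcing_scale * E'(t)].
    """
    A = [[0.0, 1.0],
         [-1.0 / (C * L), -R / L]]
    return A, 1.0 / L


def eigenvalues_2x2(A):
    """Roots of lambda^2 - tr(A) lambda + det(A) = 0, allowing complex values."""
    trace = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0


# Sample values: L in henries, R in ohms, C in farads.
A, forcing_scale = rlc_system(L=20.0, R=80.0, C=0.01)
lam1, lam2 = eigenvalues_2x2(A)
print(A, forcing_scale, lam1, lam2)
```

Eigenvalues with negative real part indicate that the homogeneous part of the current decays over time, and a nonzero imaginary part indicates that it oscillates while decaying.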

Example 3.8.1 Determine all solutions $I(t)$ for an RLC circuit when $L = 20$ H, $R = 80\ \Omega$, $C = 10^{-2}$ F, and the external voltage is given by the function $E(t) = 50\sin 2t$.

Solution. From (3.8.23) and the given information, we can immediately determine the second-order differential equation that $I(t)$ satisfies. In particular, since $E(t) = 50\sin 2t$, we have $E'(t) = 100\cos 2t$, and using the values for $L$, $C$, and $R$, $I(t)$ is a solution to the equation

$$20I'' + 80I' + 100I = 100\cos 2t \tag{3.8.25}$$

Using the substitution $x_1 = I$ and $x_2 = I'$ and multiplying both sides of (3.8.25) by $1/20$, the system becomes

$$\begin{aligned} x_1' &= x_2 \\ x_2' &= -5x_1 - 4x_2 + 5\cos 2t \end{aligned}$$

From this, we can write the system in matrix form as

$$\mathbf{x}' = \begin{bmatrix} 0 & 1 \\ -5 & -4 \end{bmatrix}\mathbf{x} + \begin{bmatrix} 0 \\ 5\cos 2t \end{bmatrix} \tag{3.8.26}$$

For the coefficient matrix $A$ in (3.8.26), we compute the eigenvalues and eigenvectors in order to find the complementary solution $\mathbf{x}_h$ of the system.

Doing so, we find that $A$ has complex eigenvalues and eigenvectors; one eigenvalue-eigenvector pair is

$$\lambda = -2 + i, \quad \mathbf{v} = \begin{bmatrix} -2 - i \\ 5 \end{bmatrix}$$

Writing

$$\mathbf{z}(t) = e^{(-2+i)t}\begin{bmatrix} -2 - i \\ 5 \end{bmatrix}$$

we know from theorem 3.5.2 that the real and imaginary parts of the vector function $\mathbf{z}(t)$ will form two real linearly independent solutions to the homogeneous system $\mathbf{x}' = A\mathbf{x}$. Rewriting $\mathbf{z}$ using Euler's formula,

$$\begin{aligned} \mathbf{z}(t) &= e^{-2t}(\cos t + i\sin t)\left(\begin{bmatrix} -2 \\ 5 \end{bmatrix} + i\begin{bmatrix} -1 \\ 0 \end{bmatrix}\right) \\ &= e^{-2t}\begin{bmatrix} -2\cos t + \sin t \\ 5\cos t \end{bmatrix} + i\,e^{-2t}\begin{bmatrix} -\cos t - 2\sin t \\ 5\sin t \end{bmatrix} \end{aligned}$$

The real and imaginary parts of $\mathbf{z}$ are real linearly independent solutions to $\mathbf{x}' = A\mathbf{x}$, so we have determined that the complementary solution to the original system is

$$\mathbf{x}_h = c_1 e^{-2t}\begin{bmatrix} -2\cos t + \sin t \\ 5\cos t \end{bmatrix} + c_2 e^{-2t}\begin{bmatrix} -\cos t - 2\sin t \\ 5\sin t \end{bmatrix}$$

In theory, we are now ready to apply variation of parameters to find a particular solution $\mathbf{x}_p$. While we could do so here, the computations get remarkably cumbersome. In the next chapter on higher order differential equations, we will learn that for certain higher order equations, making a good guess at the form of a particular solution provides the simplest approach. In fact, we will even see that keeping certain second-order equations in that form, rather than converting them to systems of first-order equations, often is the best way to proceed. For now, we will guess a form for $\mathbf{x}_p$. Since

$$\mathbf{b}(t) = \begin{bmatrix} 0 \\ 5\cos 2t \end{bmatrix}$$

we assume that a particular solution $\mathbf{x}_p$ has form

$$\mathbf{x}_p = \begin{bmatrix} a\cos 2t + b\sin 2t \\ c\cos 2t + d\sin 2t \end{bmatrix}$$

From this, it follows

$$\mathbf{x}_p' = \begin{bmatrix} -2a\sin 2t + 2b\cos 2t \\ -2c\sin 2t + 2d\cos 2t \end{bmatrix}$$

Substituting $\mathbf{x}_p$ and $\mathbf{x}_p'$ for $\mathbf{x}$ and $\mathbf{x}'$ in (3.8.26), we have

$$\begin{bmatrix} -2a\sin 2t + 2b\cos 2t \\ -2c\sin 2t + 2d\cos 2t \end{bmatrix} = \begin{bmatrix} c\cos 2t + d\sin 2t \\ -5a\cos 2t - 5b\sin 2t - 4c\cos 2t - 4d\sin 2t \end{bmatrix} + \begin{bmatrix} 0 \\ 5\cos 2t \end{bmatrix}$$

Equating the coefficients of $\sin 2t$ and $\cos 2t$ in the entries of the vectors in this most recent vector equation, the following system of four linear equations in $a$, $b$, $c$, and $d$ arises:

$$\begin{aligned} -2a &= d \\ 2b &= c \\ -2c &= -5b - 4d \\ 2d &= -5a - 4c + 5 \end{aligned}$$

Rearranging this system to write it in matrix form and row-reducing, we observe

$$\left[\begin{array}{rrrr|r} -2 & 0 & 0 & -1 & 0 \\ 0 & 2 & -1 & 0 & 0 \\ 0 & 5 & -2 & 4 & 0 \\ 5 & 0 & 4 & 2 & 5 \end{array}\right] \rightarrow \left[\begin{array}{rrrr|r} 1 & 0 & 0 & 0 & 1/13 \\ 0 & 1 & 0 & 0 & 8/13 \\ 0 & 0 & 1 & 0 & 16/13 \\ 0 & 0 & 0 & 1 & -2/13 \end{array}\right]$$

Thus we conclude that a particular solution is

$$\mathbf{x}_p = \begin{bmatrix} \tfrac{1}{13}\cos 2t + \tfrac{8}{13}\sin 2t \\[1ex] \tfrac{16}{13}\cos 2t - \tfrac{2}{13}\sin 2t \end{bmatrix}$$

In conjunction with our earlier work to find $\mathbf{x}_h$, we have determined that the general solution to the system of first-order differential equations given by (3.8.26) is

$$\mathbf{x} = c_1 e^{-2t}\begin{bmatrix} -2\cos t + \sin t \\ 5\cos t \end{bmatrix} + c_2 e^{-2t}\begin{bmatrix} -\cos t - 2\sin t \\ 5\sin t \end{bmatrix} + \begin{bmatrix} \tfrac{1}{13}\cos 2t + \tfrac{8}{13}\sin 2t \\[1ex] \tfrac{16}{13}\cos 2t - \tfrac{2}{13}\sin 2t \end{bmatrix}$$

Recalling that $x_1 = I$ is the current in the given RLC circuit, we have shown that

$$I(t) = c_1 e^{-2t}(-2\cos t + \sin t) + c_2 e^{-2t}(-\cos t - 2\sin t) + \frac{1}{13}\cos 2t + \frac{8}{13}\sin 2t$$

Given initial conditions for $I(0)$ and $I'(0)$, we can find the values of the constants $c_1$ and $c_2$. Moreover, we note that as $t \to \infty$, the components of the solution that include $e^{-2t}$ will die off, leaving us with long-term behavior of $I(t)$ modeled by $\frac{1}{13}\cos 2t + \frac{8}{13}\sin 2t$. We hence call $\frac{1}{13}\cos 2t + \frac{8}{13}\sin 2t$ the steady-state solution

264

Linear systems of differential equations

of the original equation (3.8.25) and $c_1 e^{-2t}(-2\cos t + \sin t) + c_2 e^{-2t}(-\cos t - 2\sin t)$ the transient solution.

Overall, we have now seen several examples of important phenomena governed by linear systems of differential equations. Further examples will be considered in the exercises.

Exercises 3.8

1. In a closed system of two tanks (i.e., one for which there are no input flows and no output flows), the following information is given. Tank A is filled with 100 liters of solution whose initial concentration is 0.25 g/liter. Tank B is filled with 50 liters of solution whose initial concentration is 1 g/liter. The two tanks are connected with two pipes having flows in opposite directions; mixed solution from Tank A flows to Tank B at a rate of 4 liters/min. Similarly, mixed solution flows from Tank B to Tank A at a rate of 4 liters/min. Set up and solve an initial-value problem whose solution will tell you the amount of salt in each tank at time $t$. Discuss the graphical behavior of the solution $\mathbf{x}(t)$ (whose components are the amount of salt in each tank at time $t$). Is there an equilibrium solution to the system? If so, what is it?

2. Consider a system of two tanks connected in such a way that each of the tanks has an independent inflow that delivers salt solution to it, each has an independent outflow (drain), and each tank is connected to the other with an outflow and an inflow. The relevant information about each tank is given in the table below.

| | Tank A | Tank B |
|---|---|---|
| Tank volume | 100 liters | 200 liters |
| Rate of inflow to the tank | 5 liters/min | 9 liters/min |
| Concentration of salt in inflow | 7 g/liter | 3 g/liter |
| Rate of drain outflow | 4 liters/min | 10 liters/min |
| Rates of outflows to other tank | to B: 3 liters/min | to A: 2 liters/min |

Initially, Tank A has 20 g of salt present in its solution, and Tank B has 75 g of salt present in its solution. Set up and solve an initial-value problem whose solution will determine the amount of salt in each tank at time $t$. Discuss the graphical behavior of the solution $\mathbf{x}(t)$ (whose components are the amount of salt in each tank at time $t$). Is there an equilibrium solution to the system? If so, what is it?

3. Suppose that in exercise 2 all of the given information remains the same except for the fact that instead of saltwater flowing into each tank, pure water flows in. How do the results of your work in exercise 2 change?

4. In a closed system of three tanks (that is, one for which there are no input flows and no output flows), the following information is given.

| | Tank A | Tank B | Tank C |
|---|---|---|---|
| Tank volume | 100 liters | 150 liters | 125 liters |
| Rates of outflows to other tanks | to B: 3 liters/min | to C: 1 liter/min | to A: 4 liters/min |
| Rates of outflows to other tanks | to C: 4 liters/min | to A: 3 liters/min | to B: 1 liter/min |

Tank A is filled with 100 liters of solution whose initial concentration is 8 g/liter. Tank B is filled with 150 liters of solution whose initial concentration is 3 g/liter. Tank C is initially filled with 125 liters of pure water. The three tanks are connected with pipes having flows in opposite directions; flow rates are given in the table above. Set up and solve an initial-value problem whose solution will tell you the amount of salt in each tank at time $t$. Discuss the graphical behavior of the solution $\mathbf{x}(t)$ (whose components are the amount of salt in each tank at time $t$). Is there an equilibrium solution to the system? If so, what is it?

5. In a system of three tanks of saltwater, the following information is given.

| | Tank A | Tank B | Tank C |
|---|---|---|---|
| Tank volume | 400 liters | 200 liters | 300 liters |
| Rate of inflow to the tank | 7 liters/min | 0 liters/min | 0 liters/min |
| Concentration of salt in inflow | 10 g/liter | n/a | n/a |
| Rate of drain outflow | 0 liters/min | 0 liters/min | 7 liters/min |
| Rates of outflows to other tanks | to B: 7 liters/min | to C: 7 liters/min | to A: 0 liters/min |
| Rates of outflows to other tanks | to C: 0 liters/min | to A: 0 liters/min | to B: 0 liters/min |


Each tank is full; Tank A contains solution whose initial concentration is 20 g/liter. Tank B contains solution whose initial concentration is 50 g/liter. Tank C contains pure water. Without setting up a system of differential equations, first use your intuition to describe what you think will be the behavior of the functions $x_1(t)$, $x_2(t)$, and $x_3(t)$ that measure the amount of salt in each of the three respective tanks at time $t$. Then, set up and solve an initial-value problem whose solution will tell you the amount of salt in each tank at time $t$. Discuss the graphical behavior of each component of the solution $\mathbf{x}(t)$ and compare it to your intuitive expectations. Is there an equilibrium solution to the system? If so, what is it?

6. In a system of three tanks of saltwater interconnected with pipes of inflow and outflow to and from each, the following information is given.

| | Tank A | Tank B | Tank C |
|---|---|---|---|
| Tank volume | 400 liters | 800 liters | 500 liters |
| Rate of inflow to the tank | 5 liters/min | 10 liters/min | 5 liters/min |
| Concentration of salt in inflow | 25 g/liter | 15 g/liter | 40 g/liter |
| Rate of drain outflow | 4 liters/min | 7 liters/min | 9 liters/min |
| Rates of outflows to other tanks | to B: 6 liters/min | to C: 5 liters/min | to A: 4 liters/min |
| Rates of outflows to other tanks | to C: 4 liters/min | to A: 5 liters/min | to B: 1 liter/min |

Assume that the system is such that initially there is a concentration of 10 g/liter of salt in each of the three tanks. Set up and solve an initial-value problem whose solution will tell you the amount of salt in each tank at time $t$. Discuss the graphical behavior of each component of the solution $\mathbf{x}(t)$. Is there an equilibrium solution to the system? If so, what is it?

7. Recall that for a spring-mass system of mass $m$, spring constant $k$, and damping constant $c$, the displacement $y(t)$ of the mass from equilibrium is governed by the linear second-order differential equation

$$y'' + \frac{c}{m} y' + \frac{k}{m} y = \frac{1}{m} F(t)$$

For a mass of 0.5 kg with spring constant $k = 2$ N/m in an undamped, unforced system, assume the mass is displaced 0.4 m from equilibrium and released (i.e., $y(0) = 0.4$ and $y'(0) = 0$).
(a) State the second-order IVP that models this situation.
(b) Convert the second-order equation to a system of first-order DEs using the standard substitution: $x_1 = y$, $x_2 = y'$.
(c) Solve the system in (b), and graph the component function $x_1(t)$. Discuss the long-term behavior of the spring-mass system.

8. For a mass of 0.5 kg with spring constant $k = 2$ N/m and damping constant $c = 0.5$ N·s/m in an unforced system, assume the mass is displaced 0.3 m from equilibrium and released.
(a) State the second-order IVP that models this situation.
(b) Convert the second-order equation to a system of first-order DEs using the standard substitution: $x_1 = y$, $x_2 = y'$.
(c) Solve the system in (b), and graph the component function $x_1(t)$. Discuss the long-term behavior of the spring-mass system.

9. For a mass of 0.5 kg with spring constant $k = 2$ N/m and damping constant $c = 0.5$ N·s/m in a forced system with forcing function $F(t) = \cos 2t$ N, assume the mass is initially displaced 0.3 m from equilibrium and released.
(a) State the second-order IVP that models this situation.
(b) Convert the second-order equation to a system of first-order DEs using the standard substitution: $x_1 = y$, $x_2 = y'$.
(c) Use variation of parameters to solve the system in (b), and graph the component function $x_1(t)$. Discuss the long-term behavior of the spring-mass system.

10. In section 3.8.2, we considered a system of two masses attached to two springs in parallel, where a mass $m_1$ is attached to a spring with spring constant $k_1$ and from $m_1$ a second spring with constant $k_2$ and mass $m_2$ is attached. See figure 3.16.
If we assume that the surface on which the masses rest is frictionless, let $x_1(t)$ denote the displacement of $m_1$ from its equilibrium position and $x_2(t)$ the displacement of $m_2$ from its equilibrium position, and set the system in motion, then the system is governed by the system of second-order differential equations

$$x_1'' = -\frac{k_1}{m_1} x_1 + \frac{k_2}{m_1}(x_2 - x_1)$$
$$x_2'' = -\frac{k_2}{m_2}(x_2 - x_1)$$

(a) Suppose that $k_1 = 2$, $m_1 = 1$, $k_2 = 4$ and $m_2 = 0.5$. Using the given constant values and the substitution $y_1 = x_1$, $y_2 = y_1' = x_1'$, $y_3 = x_2$, $y_4 = y_3' = x_2'$, convert the system of two second-order equations to a system of four first-order equations.
(b) Assume that the masses $m_1$ and $m_2$ are each displaced 1 unit from their natural equilibrium and released. That is, assume $x_1(0) = 1$, $x_1'(0) = 0$, $x_2(0) = 1$, and $x_2'(0) = 0$. Solve this initial-value problem using the system in (a), sketch the plots of $y_1$ and $y_3$, and discuss what they tell you about the system.

11. Recall that the current $I(t)$ in an RLC circuit is governed by the linear second-order differential equation

$$L I''(t) + R I'(t) + \frac{1}{C} I(t) = E'(t)$$

where $L$ is the inductance, $R$ the resistance, and $C$ the capacitance of the circuit. Suppose we have an RLC circuit for which an inductor of $L = 1$ henry and capacitor $C = 0.01$ farad are present. Assume further that $I(0) = 100$ and $I'(0) = 0$.
(a) State a second-order IVP whose solution is $I(t)$, the current at time $t$.
(b) Convert the IVP in (a) to a system of first-order IVPs using a standard substitution.
(c) Solve the system in (b) to determine the current $I(t)$ in the cases where the resistance is (i) $R = 0$ Ω, (ii) $R = 16$ Ω, (iii) $R = 20$ Ω, and (iv) $R = 25$ Ω, assuming consistent units. Sketch a plot of each solution $I(t)$ and discuss the impact that changing $R$ has on the current.

12. Suppose we have an RLC circuit for which an inductor of $L = 1$ H, resistor $R = 16$ Ω, and capacitor $C = 0.01$ F are present. Assume further that $I(0) = 100$ A and $I'(0) = 0$. Finally, suppose that the system is provided a voltage source of $E(t) = 100 \sin 10t$.
(a) State a second-order IVP whose solution is $I(t)$, the current in the circuit at time $t$.
(b) Convert the IVP in (a) to a system of first-order IVPs using a standard substitution.
(c) Solve the system in (b) to determine the current $I(t)$ at time $t$. Sketch a plot of the solution $I(t)$ and discuss the impact the forcing function has on the current.

3.9 For further study

3.9.1 Diagonalizable matrices and coupled systems

We have seen that in the case where a system of linear first-order differential equations is uncoupled, such as

$$\begin{bmatrix} x_1' \\ x_2' \end{bmatrix} = \begin{bmatrix} 3 & 0 \\ 0 & -2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 3x_1 \\ -2x_2 \end{bmatrix}$$

the system is particularly straightforward to solve. In addition, even when the coefficient matrix $A$ of the system $\mathbf{x}' = A\mathbf{x}$ is not a diagonal matrix, in the case where $A$ is $n \times n$ and has $n$ real, linearly independent eigenvectors, it is again a straightforward exercise to determine the general solution to $\mathbf{x}' = A\mathbf{x}$. In what follows, we investigate the connections between $A$ having $n$ real linearly independent eigenvectors and the system being uncoupled.

(a) Solve the uncoupled system of linear first-order equations

$$\begin{bmatrix} x_1' \\ x_2' \end{bmatrix} = \begin{bmatrix} 3 & 0 \\ 0 & -2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 3x_1 \\ -2x_2 \end{bmatrix}$$

by directly solving the two individual equations $x_1' = 3x_1$ and $x_2' = -2x_2$.

(b) For the coefficient matrix

$$A = \begin{bmatrix} 3 & 0 \\ 0 & -2 \end{bmatrix}$$

how are your solutions in (a) to the individual differential equations related to the eigenvalues and eigenvectors of $A$?

(c) Determine the eigenvalues and eigenvectors of the matrix $A = \begin{bmatrix} 1 & 6 \\ 5 & 2 \end{bmatrix}$ and show that $A$ has two real, linearly independent eigenvectors.

(d) Let $D$ be the diagonal $2 \times 2$ matrix whose diagonal entries are $\lambda_1$ and $\lambda_2$, the eigenvalues of $A$ from (c), and let $P$ be the $2 \times 2$ matrix whose columns are $\mathbf{x}_1$ and $\mathbf{x}_2$, the eigenvectors of $A$ corresponding to $\lambda_1$ and $\lambda_2$. Show that $AP = PD$.

(e) More generally, let $A$ be an $n \times n$ matrix with $n$ linearly independent real eigenvectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ that correspond to real eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. As in (d), let $D$ be the diagonal matrix whose diagonal entries are the eigenvalues of $A$ and $P$ be the matrix whose columns are the corresponding eigenvectors of $A$. Explain why $AP = PD$ and thus why $A = PDP^{-1}$ and $D = P^{-1}AP$.

A real $n \times n$ matrix $A$ with the property that it has $n$ real, linearly independent eigenvectors is called diagonalizable. When we factor $A$ in the form $A = PDP^{-1}$, we say that we have diagonalized the matrix $A$.

(f) For a $2 \times 2$ diagonalizable matrix $A$, consider the system of differential equations given by $\mathbf{x}' = A\mathbf{x}$. Let $D$ and $P$ be the matrices defined above in (d). Note that in this problem $A$ is an arbitrary diagonalizable matrix: we are not specifying the values of $\lambda_1$ and $\lambda_2$, nor the values of the entries in the corresponding eigenvectors.
(i) Let $\mathbf{y} = P^{-1}\mathbf{x}$. Show that $\mathbf{x}' = P\mathbf{y}'$.
(ii) Use the substitution $\mathbf{y} = P^{-1}\mathbf{x}$ and the fact that $A = PDP^{-1}$ to show that the original system $\mathbf{x}' = A\mathbf{x}$ may be equivalently represented by the system $\mathbf{y}' = D\mathbf{y}$.
(iii) Explain why the system $\mathbf{y}' = D\mathbf{y}$ is preferable to the system $\mathbf{x}' = A\mathbf{x}$.

(g) For the matrix $A = \begin{bmatrix} 1 & 6 \\ 5 & 2 \end{bmatrix}$, solve the system $\mathbf{x}' = A\mathbf{x}$ by executing the following steps.
(i) Diagonalize $A$ by determining matrices $D$ and $P$ such that $A = PDP^{-1}$. Recall that $D$ is the diagonal matrix whose diagonal entries are the eigenvalues of $A$ and $P$ is the matrix whose columns are the corresponding eigenvectors of $A$.
(ii) Follow your work in (f) to introduce a substitution that converts the system $\mathbf{x}' = A\mathbf{x}$ to a new system in the variable $\mathbf{y}$ that is uncoupled and of the form $\mathbf{y}' = D\mathbf{y}$.
(iii) Solve the uncoupled system in (ii) for $\mathbf{y}$.
(iv) Determine the solution $\mathbf{x}$ to the original system by showing that $\mathbf{x} = P\mathbf{y}$ and using this substitution appropriately.

(h) Solve the system $\mathbf{x}' = A\mathbf{x}$ given by

$$A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$$

using the approach outlined in (g).

(i) Solve the system $\mathbf{x}' = A\mathbf{x}$ given by

$$A = \begin{bmatrix} 3 & -1 & 1 \\ -1 & 3 & -1 \\ 1 & -1 & 3 \end{bmatrix}$$

using the approach outlined in (g).

(j) Compare your work in (g)–(i) to how you learned to solve the system $\mathbf{x}' = A\mathbf{x}$ in section 3.3. Is this new approach fundamentally the same or is it markedly different? Explain.

3.9.2 Matrix exponential

An important result in calculus is that $e^x$ can be represented by its Taylor series expansion

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!} + \cdots \tag{3.9.1}$$

and that (3.9.1) holds for every real value of $x$. In what follows, we explore the notion of $e^A$, where $A$ is a matrix, through the use of an analogous expansion, as well as the role of $e^A$ in the solution of systems of differential equations of the form $\mathbf{x}' = A\mathbf{x}$.

(a) Let $A$ be the diagonal matrix

$$A = \begin{bmatrix} 3 & 0 \\ 0 & -2 \end{bmatrix}$$

Explain why

$$A^n = \begin{bmatrix} 3^n & 0 \\ 0 & (-2)^n \end{bmatrix}$$

(b) For the matrix $A$ in (a), show that

$$I + A + \frac{1}{2!}A^2 + \cdots + \frac{1}{n!}A^n = \begin{bmatrix} 1 + 3 + \frac{3^2}{2!} + \cdots + \frac{3^n}{n!} & 0 \\ 0 & 1 - 2 + \frac{(-2)^2}{2!} + \cdots + \frac{(-2)^n}{n!} \end{bmatrix} \tag{3.9.2}$$

Based on the entries in the right-hand matrix of (3.9.2), explain why it is reasonable to write that

$$e^A = I + A + \frac{1}{2!}A^2 + \frac{1}{3!}A^3 + \cdots + \frac{1}{n!}A^n + \cdots \tag{3.9.3}$$

We use (3.9.3) as the definition of $e^A$ for any diagonal matrix $A$.

(c) Now consider the matrix $B = \begin{bmatrix} 2 & -2 \\ -2 & -1 \end{bmatrix}$. Find the eigenvalues and eigenvectors of $B$ and diagonalize $B$ by writing $B = PDP^{-1}$, where $D$ is the diagonal matrix whose diagonal entries are the eigenvalues of $B$ and $P$ is the matrix whose columns are the corresponding eigenvectors of $B$. For more on the notion of a matrix being 'diagonalizable', see subsection 3.9.1.

(d) For an arbitrary diagonalizable matrix $B$ for which $B = PDP^{-1}$ (where $D$ and $P$ have the meaning ascribed in (c)), show that $B^n = PD^nP^{-1}$.

(e) For an arbitrary diagonalizable matrix $B$, explain why

$$I + B + \frac{1}{2!}B^2 + \frac{1}{3!}B^3 + \cdots + \frac{1}{n!}B^n + \cdots = P\left(I + D + \frac{1}{2!}D^2 + \frac{1}{3!}D^3 + \cdots + \frac{1}{n!}D^n + \cdots\right)P^{-1}$$

again where $D$ and $P$ have the meaning ascribed in (c). We thus define $e^B$ for any diagonalizable matrix $B$ by the equation

$$e^B = I + B + \frac{1}{2!}B^2 + \frac{1}{3!}B^3 + \cdots + \frac{1}{n!}B^n + \cdots \tag{3.9.4}$$

(f) Show that if $B$ is any diagonalizable matrix such that $B = PDP^{-1}$ (where $D$ and $P$ have the meaning ascribed in (c)), then $e^B = Pe^DP^{-1}$.

(g) Use the result in (f) to compute $e^B$ for the specific matrix $B$ given in (c).

(h) Recall that when we solve a single homogeneous linear first-order DE such as $y' = 5y$, one way to solve the equation is to guess that the solution is $y = e^{rt}$ and work to determine the value of $r$ that satisfies the DE. Of course we find that $r = 5$ and $y = Ce^{5t}$ is the general solution. Indeed, for any constant $a$, the solution to $y' = ay$ is $y = Ce^{at}$. Now consider solving the system of differential equations

$$\mathbf{x}' = A\mathbf{x} = \begin{bmatrix} 3 & 0 \\ 0 & -2 \end{bmatrix}\mathbf{x} \tag{3.9.5}$$

noting that $A$ is the diagonal matrix from (a) above.
(i) Viewing $t$ as a scalar multiplier of $A$, update your work from (3.9.3) to write a series expansion for $e^{At}$.
(ii) Noting that $e^{At}$ is a matrix, explain why it is reasonable to guess that $\Phi(t) = e^{At}$ is a solution matrix for the system $\mathbf{x}' = A\mathbf{x}$.
(iii) Using your expression from (i) for $\Phi(t) = e^{At}$, compute both $\Phi'(t)$ and $A\Phi(t)$ to verify that the matrix function $\Phi(t)$ satisfies the equation $\Phi'(t) = A\Phi(t)$.
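The series definition (3.9.3) can be explored numerically. The following Python sketch, which is only an illustration and not part of the text's development, sums finitely many terms of $I + A + \frac{1}{2!}A^2 + \cdots$ for the diagonal matrix $A$ of part (a) and compares the result with the matrix whose diagonal entries are $e^3$ and $e^{-2}$. The helper names `mat_mul` and `exp_series`, and the choice of 40 terms, are assumptions made for this example.

```python
import math

def mat_mul(X, Y):
    """Multiply two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def exp_series(A, terms=40):
    """Partial sum I + A + A^2/2! + ... + A^(terms-1)/(terms-1)!."""
    total = [[1.0, 0.0], [0.0, 1.0]]  # running sum, starts at I
    term = [[1.0, 0.0], [0.0, 1.0]]   # current term A^n / n!, starts at I
    for n in range(1, terms):
        # Build A^n / n! from A^(n-1) / (n-1)! by one product and one division.
        term = [[entry / n for entry in row] for row in mat_mul(term, A)]
        total = [[total[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return total

A = [[3.0, 0.0], [0.0, -2.0]]
series = exp_series(A)
expected = [[math.exp(3.0), 0.0], [0.0, math.exp(-2.0)]]

# Largest entrywise difference between the partial sum and diag(e^3, e^-2).
max_error = max(abs(series[i][j] - expected[i][j])
                for i in range(2) for j in range(2))
print(max_error)
```

The printed difference should be at the level of floating-point rounding, which is consistent with reading (3.9.3) entry by entry as the scalar series (3.9.1).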

4 Higher order differential equations

4.1 Motivating equations

Through our study of linear systems of differential equations, we have already encountered higher order differential equations that arise naturally in physical applications. Two particularly important ones are those associated with spring-mass systems and RLC circuits. Here, we briefly revisit these equations.

In section 3.1, we considered a mass $m$ suspended from a spring with spring constant $k$ that is subject to damping with proportionality constant $c$. If $F(t)$ is an external forcing function on the system, then the displacement $y(t)$ of the mass from equilibrium satisfies

$$my'' + cy' + ky = F(t) \tag{4.1.1}$$

This is a nonhomogeneous linear second-order differential equation. While we have already studied this equation by using the substitution $x_1 = y$ and $x_2 = y'$ and considered the resulting linear system of first-order differential equations, there is further insight to be gained by examining (4.1.1) solely as a second-order equation. In fact, while it is theoretically possible to solve (4.1.1) using the corresponding linear system and ideas from chapter 3, doing so in the cases where $F(t) \neq 0$ is often cumbersome; we will see in section 4.4 that this equation may often be solved in a straightforward manner by leaving it in its original form as a second-order equation.

In section 3.8, we encountered another important nonhomogeneous linear second-order differential equation. By viewing the flow of electricity through a circuit as analogous to the flow of water in a pipe, we came to understand a differential equation that models the current $I(t)$. Using results from physics, including Ohm's law, Faraday's law, and Coulomb's law, we learned that the current $I(t)$ must satisfy the linear second-order differential equation

$$LI'' + RI' + \frac{1}{C} I = E'(t) \tag{4.1.2}$$

where $L$ is the inductance, $R$ is the resistance, $C$ is the capacitance, and $E(t)$ represents an external voltage source.

We note specifically that the governing differential equations for spring-mass systems and RLC circuits are both linear nonhomogeneous second-order differential equations with constant coefficients. These differential equations therefore merit further study as we endeavor to more fully understand these physical systems. When the damping constant $c = 0$ and the resistance $R = 0$ in (4.1.1) and (4.1.2), these equations are often called harmonic oscillator equations. When small damping or resistance is present, we refer to them as damped harmonic oscillators.

4.2 Homogeneous equations: distinct real roots

If we consider our experience with single homogeneous linear first-order differential equations and systems thereof, we realize that the exponential function plays a central role in their solution. For example, if we solve the equation $y' - 5y = 0$, the solution is $y = ce^{5t}$. Likewise, if we solve the system given by $\mathbf{x}' = A\mathbf{x}$, where $A$ is a matrix with eigenvalues $\lambda = 2$ and $\lambda = -3$, then the general solution is $\mathbf{x} = c_1 e^{2t}\mathbf{v}_1 + c_2 e^{-3t}\mathbf{v}_2$, where $\mathbf{v}_1$ and $\mathbf{v}_2$ are eigenvectors that correspond to the eigenvalues $\lambda = 2$ and $\lambda = -3$. Given this prominence of the exponential function, it is not surprising that functions of the form $y = e^{rt}$ play a central role in our study of higher order equations.

For example, consider the second-order linear homogeneous differential equation with constant coefficients given by

$$y'' - y' - 6y = 0 \tag{4.2.1}$$

Even without our experience with first-order equations and systems, it is reasonable to think that one or more functions of the form $y = e^{rt}$ will be a solution to this equation because of the question the equation begs: "what function $y$ is such that its second derivative minus its first derivative is equal to 6 times itself?" In essence, we are looking for a function $y$ such that a certain linear combination of the function, its first derivative, and its second derivative, is the zero function. This makes it natural for us to expect that the solution is such that its derivatives are scalar multiples of itself, hence leading us to consider $y = e^{rt}$.

Letting $y = e^{rt}$, we observe that $y' = re^{rt}$ and $y'' = r^2 e^{rt}$. Substituting these functions into (4.2.1) requires $r$ to satisfy the equation

$$r^2 e^{rt} - re^{rt} - 6e^{rt} = 0 \tag{4.2.2}$$

Factoring, we can rewrite (4.2.2) as $e^{rt}(r^2 - r - 6) = 0$, and since $e^{rt}$ is never zero, it follows that $r$ must be such that $r^2 - r - 6 = (r - 3)(r + 2) = 0$. From this, $r = 3$ or $r = -2$, and therefore $y_1 = e^{3t}$ and $y_2 = e^{-2t}$ are both solutions to (4.2.1). Since $y_1 = e^{3t}$ is not a scalar multiple of $y_2 = e^{-2t}$, it follows that $y_1$ and $y_2$ are linearly independent solutions to (4.2.1).

Through our work with homogeneous linear systems, we are accustomed to taking linear combinations of linearly independent solutions in order to form a general solution; the same principle holds here, which we will verify directly. Letting $y = c_1 y_1 + c_2 y_2 = c_1 e^{3t} + c_2 e^{-2t}$, it follows that $y' = 3c_1 e^{3t} - 2c_2 e^{-2t}$ and $y'' = 9c_1 e^{3t} + 4c_2 e^{-2t}$. If we now consider $y'' - y' - 6y$, we have

$$\begin{aligned} y'' - y' - 6y &= (9c_1 e^{3t} + 4c_2 e^{-2t}) - (3c_1 e^{3t} - 2c_2 e^{-2t}) - 6(c_1 e^{3t} + c_2 e^{-2t}) \\ &= (9c_1 e^{3t} - 3c_1 e^{3t} - 6c_1 e^{3t}) + (4c_2 e^{-2t} + 2c_2 e^{-2t} - 6c_2 e^{-2t}) \\ &= 0 \end{aligned}$$
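As a quick numerical sanity check of this verification, the Python sketch below evaluates $y'' - y' - 6y$ for $y = c_1 e^{3t} + c_2 e^{-2t}$ at a few sample points. The function name `residual` and the particular values of $c_1$, $c_2$, and $t$ are arbitrary choices made for illustration; the derivatives are entered by hand exactly as computed in the text.

```python
import math

def residual(c1, c2, t):
    """Evaluate y'' - y' - 6y for y = c1*e^(3t) + c2*e^(-2t)."""
    y = c1 * math.exp(3 * t) + c2 * math.exp(-2 * t)
    dy = 3 * c1 * math.exp(3 * t) - 2 * c2 * math.exp(-2 * t)
    d2y = 9 * c1 * math.exp(3 * t) + 4 * c2 * math.exp(-2 * t)
    return d2y - dy - 6 * y

# Arbitrary (c1, c2, t) triples; any real values should behave the same way.
samples = [(1.0, 1.0, 0.0), (2.5, -4.0, 0.3), (-1.0, 7.0, -0.5)]
max_residual = max(abs(residual(c1, c2, t)) for c1, c2, t in samples)
print(max_residual)
```

The printed value should be zero up to floating-point rounding, regardless of the constants chosen.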

Thus, we have shown that every function of the form $y = c_1 e^{3t} + c_2 e^{-2t}$ is a solution to (4.2.1). This shows that the solution space of (4.2.1) is at least two-dimensional; might there be any other linearly independent solutions to the equation? By our earlier work with systems, we know that the solution space of the equation $\mathbf{x}' = A\mathbf{x}$, where $A$ is $n \times n$, is $n$-dimensional. Since the second-order equation (4.2.1) can be converted to a $2 \times 2$ system of equations, it follows that its solution space has dimension exactly 2, and thus

$$y = c_1 e^{3t} + c_2 e^{-2t} \tag{4.2.3}$$

is the general solution to (4.2.1).

Our work to show that if $y_1$ and $y_2$ are solutions to (4.2.1), then $y = c_1 y_1 + c_2 y_2$ is also a solution may be generalized to any homogeneous linear second-order differential equation. We state this result in the following theorem.

Theorem 4.2.1 If $y_1$ and $y_2$ are solutions to the second-order linear homogeneous equation $y'' + a(t)y' + b(t)y = 0$, then $y = c_1 y_1 + c_2 y_2$ is also a solution for any constants $c_1$ and $c_2$.

The important roles the constants $c_1$ and $c_2$ play are further exemplified by initial-value problems. For example, if we consider the initial-value problem

$$y'' - y' - 6y = 0, \quad y(0) = 2, \quad y'(0) = 1 \tag{4.2.4}$$

we can show that this IVP has a unique solution. Using the general solution $y(t) = c_1 e^{3t} + c_2 e^{-2t}$, the condition $y(0) = 2$ implies that

$$2 = c_1 + c_2 \tag{4.2.5}$$

Differentiating the general solution, we find that $y'(t) = 3c_1 e^{3t} - 2c_2 e^{-2t}$, and therefore $y'(0) = 1$ implies

$$1 = 3c_1 - 2c_2 \tag{4.2.6}$$

Equations (4.2.5) and (4.2.6) form a linear system of two equations in two unknowns. Solving this system, $c_1 = 1$ and $c_2 = 1$, so that the function $y(t) = e^{3t} + e^{-2t}$ is the unique solution to (4.2.4).

Our work with the example equation $y'' - y' - 6y = 0$ is indicative of many broader trends in the study of second-order linear differential equations. Because such equations can be converted to systems, we should not be at all surprised to learn that a broad class of initial-value problems associated with second-order equations have unique solutions, nor that the general solution to a second-order equation belongs to a two-dimensional solution space. We state two theorems in order to formalize these observations.

Theorem 4.2.2 Consider the second-order initial-value problem given by

$$y'' + p(t)y' + q(t)y = f(t), \quad y(t_0) = y_0, \quad y'(t_0) = y_1 \tag{4.2.7}$$

where the coefficient functions $p(t)$ and $q(t)$ and the forcing function $f(t)$ are continuous on an open interval $(a, b)$. Given any $t_0$ in $(a, b)$, (4.2.7) has a unique solution in $(a, b)$.

While the proof of theorem 4.2.2 is beyond the scope of this book, it is notable that in the case that $p(t)$ and $q(t)$ are constant functions, we can prove the theorem. Indeed, we will do so by actually constructing the solution in various cases in this section and those following.

Just as we almost exclusively considered matrices $A$ with constant entries in our work with systems of linear first-order differential equations of the form $\mathbf{x}' = A\mathbf{x}$, in our study of second-order linear differential equations, we will normally consider the situation where the coefficient functions $p(t)$ and $q(t)$ are constant. For this context, we can deduce the following result.

Theorem 4.2.3 The set of all solutions to the second-order homogeneous linear differential equation $y'' + a_1 y' + a_0 y = 0$, where $a_0$ and $a_1$ are constants, is a vector space of dimension 2.

This result can be viewed as a consequence of theorem 3.3.2 for linear systems of differential equations with constant coefficients. In particular, given

$$y'' + a_1 y' + a_0 y = 0 \tag{4.2.8}$$

if we use the standard substitution $x_1 = y$, $x_2 = y'$, then it follows that (4.2.8) is equivalent to the system

$$\mathbf{x}' = A\mathbf{x} = \begin{bmatrix} 0 & 1 \\ -a_0 & -a_1 \end{bmatrix}\mathbf{x}$$

which has a two-dimensional solution space. Thus, in order to solve (4.2.8), we seek two linearly independent solutions that satisfy the equation. In particular, if we can find two functions $y_1 = e^{r_1 t}$ and $y_2 = e^{r_2 t}$ that are both solutions to (4.2.8), where $r_1 \neq r_2$, then the general solution must be $y = c_1 e^{r_1 t} + c_2 e^{r_2 t}$.

More specifically, if we recall our earlier approach following (4.2.1) in the first example in this section, we made the assumption that a solution $y$ has form $y = e^{rt}$. Doing so and substituting in the general equation $y'' + a_1 y' + a_0 y = 0$, we see that $r$ must satisfy

$$r^2 e^{rt} + a_1 r e^{rt} + a_0 e^{rt} = 0 \tag{4.2.9}$$

Since $e^{rt}$ is never zero, it follows that $r$ must be a solution of the characteristic equation of the second-order homogeneous linear equation (4.2.8), which is

$$r^2 + a_1 r + a_0 = 0 \tag{4.2.10}$$

If $r_1$ and $r_2$ are the roots of (4.2.10), then it follows that $y_1 = e^{r_1 t}$ and $y_2 = e^{r_2 t}$ are both solutions to the original equation (4.2.8). In particular, if $r_1 \neq r_2$, then $y_1$ and $y_2$ are linearly independent and we have found the general solution to (4.2.8), which is $y = c_1 e^{r_1 t} + c_2 e^{r_2 t}$. We state this result formally in the following theorem.

Theorem 4.2.4 Given the second-order linear differential equation with constant coefficients $y'' + a_1 y' + a_0 y = 0$, if the characteristic equation $r^2 + a_1 r + a_0 = 0$ has two distinct real roots $r_1$ and $r_2$, then the general solution to (4.2.8) is $y = c_1 e^{r_1 t} + c_2 e^{r_2 t}$.

We close this section with an example.

Figure 4.1 A plot of the solution $y(t)$ to the IVP given in (4.2.11).
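The procedure behind theorem 4.2.4, combined with initial conditions as in (4.2.4), can be sketched in a few lines of Python. This is a minimal illustration under the assumption that the characteristic equation has two distinct real roots; the function name `solve_ivp_distinct_roots` and its argument names are hypothetical and chosen only for this example.

```python
import math

def solve_ivp_distinct_roots(a1, a0, y0, dy0):
    """Return (r1, r2, c1, c2) for y'' + a1*y' + a0*y = 0, y(0)=y0, y'(0)=dy0.

    Assumes the characteristic equation r^2 + a1*r + a0 = 0 has two
    distinct real roots; otherwise a ValueError is raised.
    """
    disc = a1 * a1 - 4 * a0
    if disc <= 0:
        raise ValueError("characteristic equation lacks two distinct real roots")
    r1 = (-a1 + math.sqrt(disc)) / 2
    r2 = (-a1 - math.sqrt(disc)) / 2
    # Initial conditions give c1 + c2 = y0 and r1*c1 + r2*c2 = dy0;
    # solve this 2x2 linear system by elimination.
    c1 = (dy0 - r2 * y0) / (r1 - r2)
    c2 = (r1 * y0 - dy0) / (r1 - r2)
    return r1, r2, c1, c2

# The IVP (4.2.4): y'' - y' - 6y = 0, y(0) = 2, y'(0) = 1.
r1, r2, c1, c2 = solve_ivp_distinct_roots(-1, -6, 2, 1)
print(r1, r2, c1, c2)
```

For the IVP (4.2.4) this reproduces the roots $3$ and $-2$ and the constants $c_1 = c_2 = 1$ found above.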

Example 4.2.1 Solve the second-order initial-value problem given by

$$y'' + 7y' + 12y = 0, \quad y(0) = 3, \quad y'(0) = -1 \tag{4.2.11}$$

Graph the solution and discuss its long-term behavior.

Solution. We begin by assuming that $y = e^{rt}$. Direct substitution into (4.2.11) and removing the factor $e^{rt}$ results in the characteristic equation $r^2 + 7r + 12 = 0$. Factoring, we find that $(r + 3)(r + 4) = 0$, and therefore, $r = -3$ or $r = -4$. Since the two $r$ values are distinct, it follows that $y_1 = e^{-3t}$ and $y_2 = e^{-4t}$ are linearly independent solutions to (4.2.11) and the general solution is

$$y = c_1 e^{-3t} + c_2 e^{-4t} \tag{4.2.12}$$

Applying the given initial conditions, we can solve for $c_1$ and $c_2$. Since $y(0) = 3$ and $y'(0) = -1$, (4.2.12) implies that

$$3 = c_1 + c_2$$
$$-1 = -3c_1 - 4c_2$$

It follows that $c_1 = 11$ and $c_2 = -8$, and thus the unique solution to the given IVP (4.2.11) is $y = 11e^{-3t} - 8e^{-4t}$. Plotting $y(t)$ results in the graph shown in figure 4.1, where we clearly see the given initial behavior at $t = 0$ (the function value is 3 and the slope of the tangent line is $-1$) and that the solution's long-term behavior is that $y(t) \to 0$ as $t \to \infty$.

We can also observe from the negative constants present in the exponents of the general solution $y = c_1 e^{-3t} + c_2 e^{-4t}$ that every such solution must tend to zero as $t \to \infty$. We note that $y = 0$ is the only constant (equilibrium) solution


to the original equation y + 7y + 12y = 0, and that because every solution tends to y = 0, we say y = 0 is a stable equilibrium. Exercises 4.2 In exercises 1–7, determine the general solution to the given second-order homogeneous linear DE. 1. y − y − 12y = 0 2. y + y − 2y = 0 3. y − y = 0 4. y + 3y = 0 5. y = 0 6. y + 4y + 3y = 0 7. y + y − y = 0 In exercises 8–14, solve the stated IVP. In addition, graph your solution and discuss its long-term behavior. Note that the general solution to each equation has been found in exercises 1–7. 8. y − y − 12y = 0,

y(0) = −4,

9. y + y − 2y = 0, 10. y − y = 0,

y(0) = 2,

y(0) = −3,

13. y + 4y + 3y = 0, 14. y + y − y = 0,

y (0) = 2

y (0) = −1

y(0) = 1,

11. y + 3y = 0, 12. y = 0,

y(0) = 2,

y (0) = 1

y (0) = 3

y (0) = 1

y(0) = −2, y(0) = 9,

y (0) = −6

y (0) = −3

In exercises 15–19, construct a second-order homogeneous linear DE having the given functions as solutions.

15. $y_1 = e^{-2t}$, $y_2 = e^{2t}$
16. $y_1 = e^{5t}$, $y_2 = e^{-3t}$
17. $y_1 = e^{4t}$, $y_2 = 1$
18. $y_1 = e^{2t}$, $y_2 = e^{3t}$
19. $y_1 = 1$, $y_2 = t$

20. Consider the second-order homogeneous linear equation $y'' - 6y' + 9y = 0$.
(a) Use the substitution $y = e^{rt}$ to attempt to find two linearly independent solutions to the given equation.
(b) Explain why your work in (a) only results in one linearly independent solution, $y_1(t)$.
(c) Verify by direct substitution that $y_2 = te^{3t}$ is a solution to $y'' - 6y' + 9y = 0$. Explain why this function is linearly independent from $y_1$ found in (a).
(d) State the general solution to the given equation.

21. Consider the second-order homogeneous linear equation $y'' - 2y' + 5y = 0$.
(a) Use the substitution $y = e^{rt}$ to attempt to find two linearly independent solutions to the given equation.
(b) Explain why your work in (a) does not generate any real solutions to the given equation.
(c) Verify by direct substitution that $y_1 = e^{t}\cos 2t$ and $y_2 = e^{t}\sin 2t$ are solutions to $y'' - 2y' + 5y = 0$. Explain why these functions are linearly independent.
(d) State the general solution to the given equation.

22. Consider the second-order homogeneous linear equation $y'' + 4y = 0$.
(a) Use the substitution $y = e^{rt}$ to attempt to find two linearly independent solutions to the given equation.
(b) Explain why your work in (a) does not generate any real solutions to the given equation.
(c) Think about familiar functions that can satisfy the condition that "the second derivative equals $-4$ times the function itself." By making a natural guess and verifying by direct substitution, find two linearly independent functions $y_1$ and $y_2$ that satisfy the given differential equation.
(d) State the general solution to the given equation.

Recall that in a spring-mass system, the displacement $y(t)$ of the mass from its natural equilibrium is governed by the equation

$$y'' + \frac{c}{m} y' + \frac{k}{m} y = \frac{1}{m} F(t)$$

where $c$ is the damping constant, $k$ is the spring constant, $m$ is the mass of the suspended object, and $F$ is the forcing function.

23. For an unforced system with $c = 3$, $k = 2$, and $m = 1$, determine the displacement of the mass at time $t$ if the system is set in motion via the initial conditions $y(0) = 2$, $y'(0) = 1$. Sketch a graph of the solution you determine and discuss the long-term behavior of the spring-mass system. Assume consistent units on all constants.

24. For an unforced spring-mass system with $k = 9$, $c = 12$, and $m = 3$, determine the displacement of the mass from equilibrium at time $t$ if $y(0) = 0$ and $y'(0) = -1$. Assume consistent units on all constants.

Recall that in a standard RLC electrical circuit, the current $I(t)$ satisfies the equation

$$LI''(t) + RI'(t) + \frac{1}{C} I(t) = E'(t)$$

where $L$ is the inductance, $R$ is the resistance, $C$ is the capacitance, and $E(t)$ represents an external voltage source.

25. For an RLC circuit with no external voltage source, $L = 20$, $R = 80$, and $C = 1/60$, determine the current at time $t$ given the initial conditions $I(0) = 100$, $I'(0) = 25$. Graph the solution you determine and discuss the long-term behavior of the current. Assume consistent units on all constants.

26. For an RLC circuit with no external voltage source, $L = 20$, $R = 0$, and $C = 1/60$, determine the current at time $t$ given the initial conditions $I(0) = 100$, $I'(0) = 25$. Graph the solution you determine and discuss the long-term behavior of the current. Assume consistent units on all constants.

4.3 Homogeneous equations: repeated and complex roots

In the preceding section, we observed that any time the characteristic equation of the second-order equation $y'' + a_1 y' + a_0 y = 0$ has two real, distinct roots, the general solution of the differential equation is easily determined. However, in an equation such as
\[ y'' - 6y' + 9y = 0 \tag{4.3.1} \]
with characteristic equation $r^2 - 6r + 9 = 0$, the only root is $r = 3$. Although this leads us to the solution $y_1 = e^{3t}$, we do not immediately see how to find a second linearly independent solution. In a similar way, the equation
\[ y'' - 2y' + 5y = 0 \tag{4.3.2} \]
has characteristic equation $r^2 - 2r + 5 = 0$, whose roots are $r = 1 \pm 2i$. In this case, no real solution to (4.3.2) results from our previous approach, so it remains for us to find two real linearly independent solutions. We now address these two cases: when the roots of the characteristic equation are repeated and when they are complex.

4.3.1 Repeated roots

Let us consider the second-order homogeneous linear DE given by
\[ y'' + 4y' + 4y = 0 \tag{4.3.3} \]

Its characteristic equation is $r^2 + 4r + 4 = (r + 2)^2 = 0$, so that only the solution $y_1 = e^{-2t}$ results from the guess that $y = e^{rt}$. To find a second linearly independent solution, it is natural to think that we need to somehow complicate the function $y = e^{-2t}$, just as we did in section 3.5 when we encountered the similar case where the coefficient matrix of a $2 \times 2$ system of linear first-order DEs had a repeated eigenvalue. Thus, we consider a second potential solution $y_2 = v(t)e^{-2t}$, where $v(t)$ is a function yet to be determined. By substituting this function into the equation $y'' + 4y' + 4y = 0$, we find conditions that $v(t)$ must satisfy. First, observe by the product rule that
\[ y_2' = -2ve^{-2t} + v'e^{-2t} \tag{4.3.4} \]
Similarly,
\[ y_2'' = 4ve^{-2t} - 4v'e^{-2t} + v''e^{-2t} \tag{4.3.5} \]
Next, substituting into (4.3.3), we find
\[ 0 = y_2'' + 4y_2' + 4y_2 = (4ve^{-2t} - 4v'e^{-2t} + v''e^{-2t}) + 4(-2ve^{-2t} + v'e^{-2t}) + 4(ve^{-2t}) = v''e^{-2t} \tag{4.3.6} \]

Since $e^{-2t}$ is never zero, it follows that $v''(t)$ must equal zero for all values of $t$. This implies that $v(t)$ can be any linear function. Because all we seek is one function $y_2 = v(t)e^{-2t}$ that is a solution to (4.3.3) and is linearly independent from $y_1 = e^{-2t}$, it suffices to choose $v(t) = t$. Specifically, $y_2 = te^{-2t}$ is a second linearly independent solution to (4.3.3). The general solution is therefore
\[ y(t) = c_1 e^{-2t} + c_2 te^{-2t} \]
The condition we derived at (4.3.6) for $v(t)$ will hold in any situation where the characteristic equation of a second-order linear homogeneous DE has a repeated root. This leads us to state the following theorem.

Theorem 4.3.1 For any second-order linear homogeneous differential equation of the form $y'' + 2ky' + k^2 y = 0$, whose characteristic equation has repeated real root $r = -k$, the general solution to the differential equation is
\[ y = c_1 e^{-kt} + c_2 te^{-kt} \]
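As a quick numerical sanity check of theorem 4.3.1, the following Python sketch evaluates the left-hand side of $y'' + 2ky' + k^2 y = 0$ for the general solution $y = c_1 e^{-kt} + c_2 te^{-kt}$, using derivatives worked out by hand with the product rule. The function name `repeated_root_residual` and the sample values of $k$, $c_1$, $c_2$, and $t$ are illustrative assumptions and do not come from the text.

```python
import math

def repeated_root_residual(k, c1, c2, t):
    """Evaluate y'' + 2k y' + k^2 y for y = c1 e^{-kt} + c2 t e^{-kt}."""
    e = math.exp(-k * t)
    y = (c1 + c2 * t) * e
    # Product rule: y' = c2 e^{-kt} - k (c1 + c2 t) e^{-kt}
    dy = (c2 - k * (c1 + c2 * t)) * e
    # Differentiating once more: y'' = -2k c2 e^{-kt} + k^2 (c1 + c2 t) e^{-kt}
    d2y = (-2 * k * c2 + k * k * (c1 + c2 * t)) * e
    return d2y + 2 * k * dy + k * k * y

# k = 2 corresponds to equation (4.3.3); the constants are arbitrary.
for t in (0.0, 0.5, 1.0, 2.0):
    assert abs(repeated_root_residual(2.0, 1.5, -3.0, t)) < 1e-9
```

The residual vanishes (up to rounding) for every choice of $c_1$ and $c_2$, which is what the theorem asserts; a check like this does not replace the derivation leading to (4.3.6), it only guards against algebra slips.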


Before proceeding to the case of complex roots, we consider one example to demonstrate theorem 4.3.1 at work.

Example 4.3.1 Determine the general solution to the equation
\[ y'' - 10y' + 25y = 0 \tag{4.3.7} \]
Solution. The characteristic equation of the given DE is $r^2 - 10r + 25 = (r - 5)^2 = 0$, which has the repeated root $r = 5$. By theorem 4.3.1, it follows that the general solution to (4.3.7) is
\[ y = c_1 e^{5t} + c_2 te^{5t} \]

4.3.2 Complex roots

We continue to be guided throughout our work with second-order linear homogeneous equations by the informed guess that the solution has form $y = e^{rt}$. When this guess and the corresponding characteristic equation result in two distinct, real values of $r$, we have found the general solution to the given differential equation. Likewise, we have just shown that when the characteristic equation has only one real root, we can still find the general solution to the DE. We next explore how, even in the complex case, we can find the general solution through our original guess, $y = e^{rt}$. We return to the example
\[ y'' - 2y' + 5y = 0 \tag{4.3.8} \]
and recall that the roots of the characteristic equation are $r = 1 \pm 2i$. While this suggests that $z(t) = e^{(1+2i)t}$ should be a solution of the differential equation, the function $z(t)$ is complex-valued. When we encountered a similar situation in section 3.5 for a linear system whose coefficient matrix had complex eigenvalues and complex eigenvectors, we used Euler's formula to separate such a complex-valued function into real and imaginary parts in order to find real solutions. We proceed similarly here. Recall that Euler's formula states that $e^{i\theta} = \cos\theta + i\sin\theta$, so
\[ e^{(a+bi)t} = e^{at}e^{ibt} = e^{at}(\cos bt + i\sin bt) \]
For the complex solution $z(t)$ to (4.3.8), we thus find that
\[ z(t) = e^{(1+2i)t} = e^{t}(\cos 2t + i\sin 2t) = e^{t}\cos 2t + ie^{t}\sin 2t \tag{4.3.9} \]
In (4.3.9), we see that $z(t)$ has been written in the form $z(t) = \mathrm{Re}(z) + i\,\mathrm{Im}(z)$, where $\mathrm{Re}(z)$ and $\mathrm{Im}(z)$ are themselves real-valued functions of $t$. Based on our experience with systems of differential equations with complex-valued solutions, it is natural at this point to hope that both the real and imaginary parts of $z(t)$ will be linearly independent solutions to (4.3.8). Indeed, if we let $y_1 = e^{t}\cos 2t$ and $y_2 = e^{t}\sin 2t$, then it can be shown by direct substitution that both $y_1$ and $y_2$ are solutions to (4.3.8). Because $y_1$ and $y_2$ are not scalar multiples of each other, these two functions are linearly independent, and therefore, by theorem 4.2.3, it follows that
\[ y(t) = c_1 e^{t}\cos 2t + c_2 e^{t}\sin 2t \]
is the general solution to (4.3.8).

The direct substitution that is used to verify that the real and imaginary parts of $z(t)$ are solutions to the original equation is somewhat tedious, but not difficult. In fact, in the more general case where we have complex roots $a \pm bi$, it can be similarly verified by direct substitution into the corresponding second-order equation that $y_1 = e^{at}\cos bt$ and $y_2 = e^{at}\sin bt$ are each solutions to the equation. Note that this scenario implies that the characteristic equation has form $C(r) = 0$, where
\[ C(r) = [r - (a + bi)][r - (a - bi)] = r^2 - (a + bi)r - (a - bi)r + (a + bi)(a - bi) = r^2 - 2ar + (a^2 + b^2) \tag{4.3.10} \]
This shows that, up to a scalar multiple of the equation, complex roots to the characteristic equation arise from second-order homogeneous linear differential equations of the form
\[ y'' - 2ay' + (a^2 + b^2)y = 0 \tag{4.3.11} \]
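The passage from complex roots to real solutions can be sketched in a few lines of Python using only the standard `cmath` module. The sketch computes the roots of a characteristic equation $r^2 + a_1 r + a_0 = 0$ and then checks that the real and imaginary parts of $z = e^{(a+bi)t}$ satisfy (4.3.11); it relies on the facts that $z' = rz$ and $z'' = r^2 z$ and that the equation has real coefficients. The function names `characteristic_roots` and `complex_residual` are hypothetical helpers introduced here for illustration.

```python
import cmath
import math

def characteristic_roots(a1, a0):
    """Roots of r^2 + a1 r + a0 = 0, returned as complex numbers."""
    disc = cmath.sqrt(a1 * a1 - 4 * a0)
    return (-a1 + disc) / 2, (-a1 - disc) / 2

def complex_residual(a, b, t):
    """Evaluate z'' - 2a z' + (a^2 + b^2) z for z = e^{(a+bi)t}."""
    r = complex(a, b)
    z = cmath.exp(r * t)               # z' = r z and z'' = r^2 z
    return r * r * z - 2 * a * r * z + (a * a + b * b) * z

r1, r2 = characteristic_roots(-2, 5)   # equation (4.3.8): y'' - 2y' + 5y = 0
assert abs(r1 - (1 + 2j)) < 1e-12 and abs(r2 - (1 - 2j)) < 1e-12

t = 0.7                                # arbitrary sample time
z = cmath.exp((1 + 2j) * t)
# Euler's formula: Re(z) = e^t cos 2t and Im(z) = e^t sin 2t, as in (4.3.9).
assert abs(z.real - math.exp(t) * math.cos(2 * t)) < 1e-12
assert abs(z.imag - math.exp(t) * math.sin(2 * t)) < 1e-12
# The real and imaginary parts of the residual vanish separately,
# so y1 = Re(z) and y2 = Im(z) each solve the real equation.
res = complex_residual(1, 2, t)
assert abs(res.real) < 1e-9 and abs(res.imag) < 1e-9
```

Working with the complex exponential avoids differentiating $e^{at}\cos bt$ and $e^{at}\sin bt$ twice by hand, which is the tedious part of the direct substitution.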

Our work above now enables us to state a formal result on finding real, linearly independent solutions from complex-valued ones.

Theorem 4.3.2 Let $a$ and $b$ be real constants with $b \neq 0$. For the second-order homogeneous linear differential equation
\[ y'' - 2ay' + (a^2 + b^2)y = 0 \]
the roots of the corresponding characteristic equation are $r = a \pm bi$ and the general solution to the differential equation is given by
\[ y = c_1 e^{at}\cos bt + c_2 e^{at}\sin bt \]
Note that it is precisely the presence of complex roots to the characteristic equation that produces the periodic functions $\cos bt$ and $\sin bt$ in the solution. In physical situations such as spring-mass systems and RLC circuits where we anticipate that solutions will have a sinusoidal component, we can expect that the characteristic equation will have complex roots. We conclude this section by applying theorem 4.3.2 in the following example.


Example 4.3.2 Solve the initial-value problem given by
\[ y'' + 2y' + 10y = 0, \quad y(0) = 1, \quad y'(0) = 1 \]
Plot the solution and discuss its long-term behavior.

Solution. We first find the general solution to the given differential equation. The corresponding characteristic equation is $r^2 + 2r + 10 = 0$, with roots $r = -1 \pm 3i$. By theorem 4.3.2 it follows that the general solution is
\[ y = c_1 e^{-t}\cos 3t + c_2 e^{-t}\sin 3t \]
To determine the solution to the stated IVP, first note that $y(0) = 1$ implies that
\[ 1 = c_1 e^{0}\cos(0) + c_2 e^{0}\sin(0) \]
so that $c_1 = 1$. In addition, since
\[ y' = -c_1 e^{-t}\cos 3t - c_2 e^{-t}\sin 3t - 3c_1 e^{-t}\sin 3t + 3c_2 e^{-t}\cos 3t \]
it follows from the fact that $y'(0) = 1$ that $1 = -c_1 + 3c_2$. Since $c_1 = 1$, we find that $c_2 = 2/3$ and hence the solution to the IVP is
\[ y = e^{-t}\cos 3t + \frac{2}{3} e^{-t}\sin 3t \]
Plotting the function $y$ in figure 4.2, we see that $y(t)$ oscillates due to the presence of the trigonometric functions, while $y(t) \to 0$ as $t \to \infty$ because of the damping effect of $e^{-t}$. In fact, the graphical behavior demonstrated by $y(t)$ in figure 4.2 is precisely what we would expect if the given IVP were modeling a spring-mass system where relatively small damping is present: the mass will oscillate once set in motion, but will eventually return to equilibrium.

Exercises 4.3
In exercises 1–9, use the characteristic equation to determine the general solution to the given second-order linear homogeneous differential equation.
1. $y'' - 8y' + 16y = 0$
2. $y'' + y' + y = 0$
3. $y'' + y' + \frac{1}{4}y = 0$
4. $y'' - 4y = 0$
5. $y'' + 4y = 0$
6. $y'' - 10y' + 50y = 0$

[Figure 4.2: A plot of the solution $y(t)$ to the IVP given in example 4.3.2. Horizontal axis $t$; vertical axis $y$, from $-1.0$ to $1.0$.]

7. $y'' - 10y' + 25y = 0$
8. $y'' = 0$
9. $2y'' + 7y' + 5y = 0$
In exercises 10–18, solve the stated initial-value problem. In addition, graph your solution and discuss its long-term behavior. Note that the general solution to each equation has been found in corresponding problems in exercises 1–9.
10. $y'' - 8y' + 16y = 0$, $y(0) = 2$, $y'(0) = 1$
11. $y'' + y' + y = 0$, $y(0) = -4$, $y'(0) = 2$
12. $y'' + y' + \frac{1}{4}y = 0$, $y(0) = 0$, $y'(0) = -1$
13. $y'' - 4y = 0$, $y(0) = 7$, $y'(0) = -5$
14. $y'' + 4y = 0$, $y(0) = 2$, $y'(0) = 3$
15. $y'' - 10y' + 50y = 0$, $y(0) = -3$, $y'(0) = 1$
16. $y'' - 10y' + 25y = 0$, $y(0) = -2$, $y'(0) = -6$
17. $y'' = 0$, $y(0) = 0$, $y'(0) = 0$
18. $2y'' + 7y' + 5y = 0$, $y(0) = 9$, $y'(0) = -3$

19. Consider the second-order linear homogeneous equation $y'' - 6y' + 9y = 0$.
(a) Find the general solution $y$ of the given equation.
(b) Convert the given equation to a system $\mathbf{x}' = A\mathbf{x}$ of two first-order equations using the substitution $x_1 = y$, $x_2 = y'$.
(c) Solve the system $\mathbf{x}' = A\mathbf{x}$.
(d) Compare your results for $y$ and $x_1$. What do you observe?
20. Consider the second-order linear homogeneous equation $y'' + 6y' + 10y = 0$.
(a) Find the general solution $y$ of the given equation.
(b) Convert the given equation to a system $\mathbf{x}' = A\mathbf{x}$ of two first-order equations using the substitution $x_1 = y$, $x_2 = y'$.
(c) Solve the system $\mathbf{x}' = A\mathbf{x}$.
(d) Compare your results for $y$ and $x_1$. What do you observe?
21. Consider the general second-order linear homogeneous equation with constant coefficients given by $y'' + a_1 y' + a_0 y = 0$. Under what conditions on $a_1$ and $a_0$ does the equation have two real distinct roots? one real repeated root? two distinct complex roots?
Recall that in a spring-mass system, the displacement $y(t)$ of the mass from its natural equilibrium is governed by the equation
\[ y'' + \frac{c}{m} y' + \frac{k}{m} y = \frac{1}{m} F(t) \]
where $c$ is the damping constant, $k$ is the spring constant, $m$ is the mass of the suspended object, and $F(t)$ is the forcing function. In the following exercises, we assume that units on all quantities and constants are consistent.
22. For an unforced spring-mass system with $c = 2$, $k = 1$, and $m = 1$, determine the displacement of the mass at time $t$ if the system is set in motion with the initial conditions $y(0) = 2$, $y'(0) = 1$. Sketch the solution you determine and discuss the behavior of the spring-mass system.
23. For an unforced, undamped spring-mass system with $k = 9$ and $m = 3$, determine the displacement of the mass from equilibrium at time $t$ if $y(0) = 2$ and $y'(0) = 1$. Sketch the solution you determine and discuss the behavior of the spring-mass system.
24. For an unforced spring-mass system with $c = 1$, $k = 2$, and $m = 1$, determine the displacement of the mass at time $t$ if the system is set in motion with the initial conditions $y(0) = 2$, $y'(0) = 1$. Sketch the solution you determine and discuss the behavior of the spring-mass system.
Recall that in a standard RLC electrical circuit, the current $I(t)$ satisfies the equation
\[ L I''(t) + R I'(t) + \frac{1}{C} I(t) = E'(t) \]
where $L$ is the inductance, $R$ is the resistance, $C$ is the capacitance, and $E(t)$ represents an external voltage source. In the following exercises, we assume that units on all quantities and constants are consistent.
25. For an RLC circuit with no external voltage source, $L = 10$, $R = 40$, and $C = 1/40$, determine the current at time $t$ given the initial conditions $I(0) = 100$, $I'(0) = 25$. Sketch the solution you determine and discuss the behavior of the current.
26. For an RLC circuit with no external voltage source, $L = 10$, $R = 40$, and $C = 1/50$, determine the current at time $t$ given the initial conditions $I(0) = 100$, $I'(0) = 25$. Sketch the solution you determine and discuss the behavior of the current.
27. For an RLC circuit with no external voltage source, $L = 10$, $R = 0$, and $C = 1/90$, determine the current at time $t$ given the initial conditions $I(0) = 100$, $I'(0) = 25$. Sketch the solution you determine and discuss the behavior of the current.

4.4 Nonhomogeneous equations

As motivated by a spring-mass system with a driving force or an RLC circuit with an external voltage source, we are now interested in solving second-order nonhomogeneous linear differential equations of the form
\[ y'' + a_1 y' + a_0 y = f(t) \tag{4.4.1} \]
where $f(t)$ is not zero. We already know a theoretical way to solve such an equation: through the substitution $x_1 = y$ and $x_2 = y'$, we can convert (4.4.1) to a system of two first-order equations in the form $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$ and solve the two first-order DEs. While this approach works in theory, the actual execution of the process can be cumbersome. In fact, it is often much easier to solve (4.4.1) directly through the approaches we present in this section. Analogous to several other types of linear algebraic and linear differential equations, a general principle from our work with nonhomogeneous equations guides us throughout: we first seek a complementary solution $y_h(t)$ to the corresponding homogeneous equation
\[ y'' + a_1 y' + a_0 y = 0 \tag{4.4.2} \]

and then determine a particular solution $y_p(t)$ to the nonhomogeneous equation (4.4.1). It follows that $y = y_h + y_p$ will be the general solution to the nonhomogeneous equation. Indeed, we have the following theorem, a part of whose formal proof will be addressed in exercise 33 at the end of this section.

Theorem 4.4.1 Given the equation
\[ y'' + a_1 y' + a_0 y = f(t) \tag{4.4.3} \]
where $a_0$ and $a_1$ are constants, if $y_h(t)$ is the general solution to the corresponding homogeneous equation $y'' + a_1 y' + a_0 y = 0$ and $y_p(t)$ is any solution to the nonhomogeneous equation (4.4.3), then $y = y_h + y_p$ is the general solution to (4.4.3).


We already understand how to find $y_h$, which depends entirely on the roots of the characteristic equation $r^2 + a_1 r + a_0 = 0$ as discussed in sections 4.2 and 4.3. It remains, however, to find $y_p$. To do so, we explore two methods: the guessing technique of undetermined coefficients, and the brute force technique of variation of parameters. Each of these methods is analogous to those that may be used to solve nonhomogeneous systems of the form $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$.

4.4.1 Undetermined coefficients

At this point in our discussion, examples are instructive. We consider several different nonhomogeneous linear second-order DEs to see how making reasonable guesses for the form of $y_p(t)$ can lead to the general solution in many elementary cases. Throughout, we use the following idea to guide our choice of the form of $y_p(t)$: since the first and second derivatives of many functions are similar to the original function (e.g., derivatives of sine and cosine functions are cosine and sine functions, derivatives of exponential functions are exponential functions, derivatives of polynomial functions are polynomials), and in equations of the form (4.4.3) we take linear combinations of $y$, $y'$, and $y''$ to get $f(t)$, it is reasonable to guess that the form of $y_p(t)$ will be similar to the form of $f(t)$, the forcing function in the nonhomogeneous equation. We first see this for polynomial functions in the first example.

Example 4.4.1 Determine the general solution to
\[ y'' - 3y' - 4y = 4t^2 + 2t - 9 \tag{4.4.4} \]

Solution. For the associated homogeneous equation, $y'' - 3y' - 4y = 0$, by theorem 4.2.4 the complementary solution is $y_h = c_1 e^{-t} + c_2 e^{4t}$. For a particular solution, we naturally guess that $y_p$ has the form
\[ y_p = at^2 + bt + c \tag{4.4.5} \]
based on the form of the forcing function. The undetermined coefficients $a$, $b$, and $c$ are found by direct substitution into (4.4.4). Note that $y_p' = 2at + b$ and $y_p'' = 2a$, so that from (4.4.4) we find
\[ 2a - 3(2at + b) - 4(at^2 + bt + c) = 4t^2 + 2t - 9 \]
Rearranging the left-hand side of this equation, it follows that
\[ -4at^2 + (-6a - 4b)t + (2a - 3b - 4c) = 4t^2 + 2t - 9 \tag{4.4.6} \]
Equating like coefficients of the power functions present in (4.4.6), the system of equations
\[ -4a = 4, \qquad -6a - 4b = 2, \qquad 2a - 3b - 4c = -9 \]
must hold. We see that $a = -1$, from which it follows that $b = 1$ and $c = 1$, so that $y_p = -t^2 + t + 1$. Combining this with $y_h$, we have determined that the general solution to (4.4.4) is
\[ y = c_1 e^{-t} + c_2 e^{4t} - t^2 + t + 1 \]
We can imagine that if $f(t)$ were a polynomial other than $4t^2 + 2t - 9$, we would have guessed that $y_p$ was a general polynomial of the same degree with unknown coefficients. This approach almost always works; we will discuss some exceptions that can arise after examples involving non-polynomial forcing functions.

Example 4.4.2 Determine the general solution to
\[ y'' - y = 16e^{3t} \tag{4.4.7} \]

Solution. Just as in example 4.4.1, we first solve the corresponding homogeneous equation and find $y_h$. Doing so, we observe that for $y'' - y = 0$, the solution $y_h$ is
\[ y_h = c_1 e^{t} + c_2 e^{-t} \]
For the particular solution, we use the natural guess that $y_p = Ae^{3t}$. From this, $y_p' = 3Ae^{3t}$ and $y_p'' = 9Ae^{3t}$, so substituting into (4.4.7), we find
\[ 9Ae^{3t} - Ae^{3t} = 16e^{3t} \]
Equating the coefficients of $e^{3t}$, it follows that $8A = 16$, so $A = 2$ and therefore $y_p = 2e^{3t}$. Hence we have found the general solution of (4.4.7) to be
\[ y = y_h + y_p = c_1 e^{t} + c_2 e^{-t} + 2e^{3t} \]
Here, we observe that if $f(t)$ in (4.4.7) were a different exponential function, say of the form $f(t) = Be^{kt}$, we would again guess that $y_p = Ae^{kt}$. This is based on the fact that our guess for $y_p$ incorporates all the possible forms of the derivatives of $f(t)$. Just as with polynomial forcing functions, this approach almost always works. We will consider situations where these natural educated guesses can fail following one more example.

Example 4.4.3 Determine the general solution to
\[ y'' - y' - 2y = 10\sin t \tag{4.4.8} \]
Solution. First, we observe that the complementary solution can be shown to be
\[ y_h = c_1 e^{2t} + c_2 e^{-t} \]

To find $y_p$, we guess that $y_p = A\sin t + B\cos t$. Note that we must include the cosine function in $y_p$ in order to account for the fact that the cosine function arises in the derivative of $f(t) = 10\sin t$. From our guess for $y_p$, it follows that $y_p' = A\cos t - B\sin t$ and $y_p'' = -A\sin t - B\cos t$. Substituting in (4.4.8), we see that $A$ and $B$ must satisfy the equation
\[ (-A\sin t - B\cos t) - (A\cos t - B\sin t) - 2(A\sin t + B\cos t) = 10\sin t \tag{4.4.9} \]
Rearranging (4.4.9) in order to compare coefficients of the sine and cosine functions, we have
\[ (-A + B - 2A)\sin t + (-B - A - 2B)\cos t = 10\sin t \]
from which it follows that $-3A + B = 10$ and $-A - 3B = 0$. Consequently, $A = -3$ and $B = 1$, so that $y_p = -3\sin t + \cos t$. Therefore we have shown that the general solution of (4.4.8) is
\[ y = y_h + y_p = c_1 e^{2t} + c_2 e^{-t} - 3\sin t + \cos t \]
In the more general setting where we imagine the forcing function $f(t)$ involving $\sin kt$ or $\cos kt$, it will be natural to make the guess that $y_p = A\sin kt + B\cos kt$, which again will work in most cases.

We have hinted that while the method of undetermined coefficients will usually work, it can occasionally fail. What can go wrong? First, if the forcing function $f(t)$ is particularly complicated, this can make determining a reasonable guess for $y_p$ challenging. Moreover, even if $f(t)$ is a relatively simple function, its derivatives may take on unusual forms (for example, $f(t) = \ln t$, where $f'(t)$ and $f''(t)$ are not logarithmic), and we may find it difficult or impossible to find a form of $y_p$ that works. These two situations will be addressed by the variation of parameters method that we introduce in the next subsection. In addition, there is one more case in which undetermined coefficients can fail, yet the difficulty is straightforward to reconcile. An example is instructive.

Example 4.4.4 Find the general solution to the differential equation
\[ y'' - y = 16e^{-t} \tag{4.4.10} \]

Solution. Note that this differential equation is nearly identical to the one considered in example 4.4.2, but here the forcing function is $f(t) = 16e^{-t}$, rather than $f(t) = 16e^{3t}$. As above, it still holds that $y_h = c_1 e^{t} + c_2 e^{-t}$. In addition, we naturally guess that $y_p = Ae^{-t}$, from which it follows that $y_p' = -Ae^{-t}$ and $y_p'' = Ae^{-t}$. Substituting in (4.4.10), we have
\[ Ae^{-t} - Ae^{-t} = 16e^{-t} \]
But this last equality is clearly impossible, regardless of the value of $A$, since $0 = 16e^{-t}$ is never true. We can determine where the method failed by observing that in this case, our guess for the particular solution $y_p$ was actually part of the complementary solution. Note that $y_h = c_1 e^{t} + c_2 e^{-t}$, from which it follows that $y_p$ cannot have the form $Ae^{-t}$, since this latter function belongs to $y_h$. We therefore need a more complicated guess for $y_p$; a natural one to attempt is
\[ y_p = Ate^{-t} \tag{4.4.11} \]
where we have introduced the additional multiplier $t$. From this, $y_p' = -Ate^{-t} + Ae^{-t}$ and $y_p'' = Ate^{-t} - Ae^{-t} - Ae^{-t}$. Substituting in (4.4.10), it now follows that
\[ (Ate^{-t} - 2Ae^{-t}) - (Ate^{-t}) = 16e^{-t} \]
Rearranging and simplifying this last equation in order to compare like coefficients of $e^{-t}$ and $te^{-t}$, we see that the terms involving $te^{-t}$ drop out and we are left with
\[ -2Ae^{-t} = 16e^{-t} \]
so that $A = -8$ and $y_p = -8te^{-t}$. We therefore have shown that the general solution is
\[ y = y_h + y_p = c_1 e^{t} + c_2 e^{-t} - 8te^{-t} \]
The preceding example shows that if the form of the forcing function matches the form of one or more parts of the complementary solution $y_h$, then we have to use a different, more complicated guess for $y_p$ than the most natural one. One more example will be helpful before we make some general conclusions.

Example 4.4.5 Find the general solution of
\[ y'' - y' = 4t \tag{4.4.12} \]

Solution. From the characteristic equation $r^2 - r = 0$ for the corresponding homogeneous equation, we quickly deduce that
\[ y_h = c_1 + c_2 e^{t} \]
Next, since $f(t) = 4t$, we naturally guess that $y_p$ is a first-degree polynomial: $y_p = at + b$. From this, $y_p' = a$ and $y_p'' = 0$. Substituting in (4.4.12), we find
\[ 0 - a = 4t \]
Clearly, there is no value of $a$ that makes $-a = 4t$ for all values of $t$, so there can be no particular solution $y_p$ of the form $y_p = at + b$. From another perspective, we can see why this must be true by observing that the "$b$" in our guess for $y_p$ is already part of the complementary solution, since any constant function is a solution to $y'' - y' = 0$. Therefore, we revise our guess for $y_p$ and assume it has form $y_p = t(at + b) = at^2 + bt$. Doing so, we now have $y_p' = 2at + b$ and $y_p'' = 2a$, so substituting in (4.4.12) it follows that
\[ 2a - (2at + b) = 4t \]
Rearranging so that we can equate like coefficients, we have
\[ -2at + (2a - b) = 4t \]
so $-2a = 4$ and $2a - b = 0$. It follows that $a = -2$ and $b = -4$, and thus $y_p = -2t^2 - 4t$. Therefore, we have found the general solution of (4.4.12) to be
\[ y = c_1 + c_2 e^{t} - 2t^2 - 4t \]
From our work with examples 4.4.1–4.4.5, we observe that the method of undetermined coefficients breaks down into two fundamental cases:
Case 1. No functions in the assumed particular solution $y_p$ are also solutions to the associated homogeneous differential equation.
Case 2. A function in the assumed particular solution $y_p$ is also a solution of the associated homogeneous differential equation.
Moreover, we can observe that when the forcing function $f(t)$ is a sum of polynomial, exponential, and sine and cosine functions, the linearity of the differential equation allows us to guess a form for $y_p$ that is an appropriate sum of all the different types of functions represented. The following example shows some of the variety that arises in choosing the form of $y_p$.

Example 4.4.6 Write an appropriate guess for $y_p$ for each of the following equations. Do not solve for the unknown coefficients.
(a) $y'' + y = 4e^{3t} + 5t^2$
(b) $y'' - 5y' - 6y = 3e^{-2t} + 4\cos 3t$
(c) $y'' - 2y' + 5y = 3te^{t}$
(d) $y'' - 4y' - 5y = 3e^{2t}\sin t$
Solution.
(a) The forcing function $f(t) = 4e^{3t} + 5t^2$ combines an exponential function and a second-degree polynomial, so we would guess that $y_p = Ae^{3t} + bt^2 + ct + d$.
(b) The natural guess is $y_p = Ae^{-2t} + B\cos 3t + C\sin 3t$ to account for the exponential and trigonometric functions present.
(c) $f(t) = 3te^{t}$ is a product of a linear function and an exponential one. Its derivatives will be sums of functions of the same form and constant multiples of exponential functions, so we assume that $y_p = Ate^{t} + Be^{t} = e^{t}(At + B)$.
(d) We observe that every derivative of $f(t) = 3e^{2t}\sin t$ is the sum of functions of the form $Ae^{2t}\cos t + Be^{2t}\sin t$, so that we would guess that $y_p = Ae^{2t}\cos t + Be^{2t}\sin t$.
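The coefficient matching carried out in example 4.4.1 can be sketched as a small triangular linear solve. The Python sketch below assumes a quadratic forcing function and $a_0 \neq 0$, so that the guess $y_p = at^2 + bt + c$ falls into case 1; the function name `quadratic_particular` is a hypothetical helper introduced for illustration, and exact rational arithmetic from the standard `fractions` module is used to avoid rounding.

```python
from fractions import Fraction

def quadratic_particular(a1, a0, p2, p1, p0):
    """Coefficients (a, b, c) of y_p = a t^2 + b t + c solving
    y'' + a1 y' + a0 y = p2 t^2 + p1 t + p0, assuming a0 != 0 (case 1)."""
    if a0 == 0:
        # Constants then solve the homogeneous equation (case 2),
        # so the guess must be multiplied by t, as in example 4.4.5.
        raise ValueError("a0 = 0: the quadratic guess duplicates part of y_h")
    a1, a0 = Fraction(a1), Fraction(a0)
    # Substituting y_p and equating like powers of t gives
    #   t^2:  a0*a               = p2
    #   t^1:  2*a1*a + a0*b      = p1
    #   t^0:  2*a + a1*b + a0*c  = p0
    a = Fraction(p2) / a0
    b = (p1 - 2 * a1 * a) / a0
    c = (p0 - 2 * a - a1 * b) / a0
    return a, b, c

# Example 4.4.1: y'' - 3y' - 4y = 4t^2 + 2t - 9
a, b, c = quadratic_particular(-3, -4, 4, 2, -9)
assert (a, b, c) == (-1, 1, 1)   # y_p = -t^2 + t + 1
```

With $a_1 = -3$ and $a_0 = -4$, the three commented equations reduce to the system $-4a = 4$, $-6a - 4b = 2$, $2a - 3b - 4c = -9$ obtained from (4.4.6).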


Note the general rule we are using in case 1 and example 4.4.6: provided the terms of $f(t)$ do not belong to $y_h$, the form of $y_p$ is a linear combination of all linearly independent functions that are generated by repeated differentiation of the forcing function $f(t)$. For dealing with equations that fall into case 2, we make a guess $y_p$ that is a sum of functions similar to those present in $f(t)$. We then have to tack on powers of $t$ to modify any parts of $y_p$ that already appear in $y_h$. In particular, we use the rule that if any part of $y_p$ contains terms that duplicate terms in $y_h$, then we must multiply that part by $t^n$, using the smallest possible value of $n$ that eliminates the duplication. For example, if we wanted to solve $y'' + 4y' + 4y = 3e^{-2t}$, which has characteristic equation $r^2 + 4r + 4 = (r + 2)^2 = 0$, our work in section 4.3 implies that
\[ y_h = c_1 e^{-2t} + c_2 te^{-2t} \]
Therefore, for the form of $y_p$, which we initially might assume to be $y_p = Ae^{-2t}$, we see that we must in fact introduce a multiplier of $t^2$ in order to ensure that $y_p$ does not appear in $y_h$. Thus, the appropriate form of $y_p$ is $y_p = At^2 e^{-2t}$. A few more examples of the possibilities that arise in case 2 are useful.

Example 4.4.7 Write an appropriate trial solution $y_p$ for each of the following examples. Do not solve for the unknown coefficients.
(a) $y'' - y = 4e^{t} + 5e^{-t}$
(b) $y'' + 4y = 4\cos 2t$
(c) $y'' - 2y' + y = 3te^{t}$
Solution.
(a) Observe from the characteristic equation $r^2 - 1 = 0$ that $y_h = c_1 e^{t} + c_2 e^{-t}$, so both parts of the forcing function appear in $y_h$. We therefore assume that $y_p = Ate^{t} + Bte^{-t}$.
(b) The characteristic equation is $r^2 + 4 = 0$ with roots $r = \pm 2i$. It follows that $y_h = c_1 \sin 2t + c_2 \cos 2t$. Since $\cos 2t$ appears in the forcing function, and both $\sin 2t$ and $\cos 2t$ arise in $y_h$, the appropriate guess for $y_p$ is $y_p = At\cos 2t + Bt\sin 2t$.
(c) Note that the characteristic equation is $r^2 - 2r + 1 = (r - 1)^2 = 0$, so that $y_h = c_1 e^{t} + c_2 te^{t}$. The natural guess for the forcing function $3te^{t}$ is $(At + B)e^{t}$, as in example 4.4.6(c), but both $e^{t}$ and $te^{t}$ are included in $y_h$. Multiplying by $t$ would still leave the term $Bte^{t}$ in $y_h$, so we must multiply by $t^2$ and choose $y_p = t^2(At + B)e^{t} = At^3 e^{t} + Bt^2 e^{t}$.

Clearly, the method of undetermined coefficients requires us to be experienced with a wide range of examples and to understand how the derivatives of the forcing function behave. The exercises at the end of this section will provide further practice in this regard.
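For a purely exponential forcing function $f(t) = Be^{kt}$, the rule for choosing the power of $t$ can be sketched in Python as follows. The sketch relies on an identity that is not stated in the text but can be checked by direct substitution: writing $p(r) = r^2 + a_1 r + a_0$, substituting $y = t^n e^{kt}$ into the left-hand side of the differential equation gives $p(k)e^{kt}$ for $n = 0$, $[tp(k) + p'(k)]e^{kt}$ for $n = 1$, and $[t^2 p(k) + 2tp'(k) + 2]e^{kt}$ for $n = 2$. The function name `exponential_particular` is a hypothetical helper introduced for illustration.

```python
from fractions import Fraction

def exponential_particular(a1, a0, B, k):
    """For y'' + a1 y' + a0 y = B e^{kt}, return (n, A) with y_p = A t^n e^{kt}.

    n is the number of times k occurs as a root of p(r) = r^2 + a1 r + a0.
    """
    a1, a0, B, k = (Fraction(v) for v in (a1, a0, B, k))
    p = k * k + a1 * k + a0   # p(k)
    dp = 2 * k + a1           # p'(k)
    if p != 0:
        return 0, B / p       # case 1: e^{kt} does not appear in y_h
    if dp != 0:
        return 1, B / dp      # k is a simple root: multiply the guess by t
    return 2, B / 2           # k is a repeated root: multiply by t^2 (p'' = 2)

# Example 4.4.2: y'' - y = 16 e^{3t} gives y_p = 2 e^{3t}
assert exponential_particular(0, -1, 16, 3) == (0, 2)
# Example 4.4.4: y'' - y = 16 e^{-t} gives y_p = -8 t e^{-t}
assert exponential_particular(0, -1, 16, -1) == (1, -8)
# y'' + 4y' + 4y = 3 e^{-2t} requires the form A t^2 e^{-2t}
assert exponential_particular(4, 4, 3, -2) == (2, Fraction(3, 2))
```

The first two checks reproduce the values of $A$ found in examples 4.4.2 and 4.4.4; the third confirms that the form $y_p = At^2 e^{-2t}$ discussed above works, with $A = 3/2$ under the stated identity.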


4.4.2 Variation of parameters

Recall that we are focusing on solving the nonhomogeneous linear second-order equation
\[ y'' + a_1 y' + a_0 y = f(t) \]
While the method of undetermined coefficients works well for a reasonable collection of forcing functions, it has some fairly strict limitations. In particular, it is unclear whether it is possible to make a reasonable guess for $y_p$ in order to solve an equation such as $y'' + 4y' - 5y = \ln t$. In fact, we cannot: the derivative of the logarithm function is not a logarithm, and this is the main issue that prevents the use of this method.¹ Here, we study a method that will enable us, in theory, to solve a much wider class of nonhomogeneous linear second-order equations; as always, the approach requires us to find the general solution to the related homogeneous equation first. Let us again consider the equation
\[ y'' + a_1 y' + a_0 y = f(t) \tag{4.4.13} \]

where $a_0$ and $a_1$ are constant, and assume only that $f(t)$ is continuous. Suppose we know that $y_1(t)$ and $y_2(t)$ are linearly independent solutions of the associated homogeneous equation, so the complementary solution is $y_h = c_1 y_1(t) + c_2 y_2(t)$. In the method of undetermined coefficients, we made a guess of a particular solution $y_p$ to (4.4.13) based on the form of $f(t)$. In the method of variation of parameters, we assume instead that the form of $y_p$ is a more complicated version of $y_h$. In particular, we assume that $y_p$ has the form
\[ y_p = u_1(t)y_1(t) + u_2(t)y_2(t) \tag{4.4.14} \]
for unknown functions $u_1$ and $u_2$, where again $y_1$ and $y_2$ are the functions that arose in solving the related homogeneous equation. The goal of variation of parameters is to find the functions $u_1(t)$ and $u_2(t)$ such that the function $y_p = u_1 y_1 + u_2 y_2$ is a particular solution to (4.4.13). Let us explore what conditions $u_1(t)$ and $u_2(t)$ must satisfy. Differentiating $y_p$ yields
\[ y_p' = u_1 y_1' + u_1' y_1 + u_2 y_2' + u_2' y_2 \tag{4.4.15} \]

While it seems natural at this point to differentiate again to ﬁnd yp and substitute into the differential equation, this becomes rather complicated. Above we have seen that the two unknown functions must satisfy one condition (so far), that being the differential equation itself, as stated in (4.4.13). Because we have two functions, we have the freedom to set a second condition as well. In order to make the functions as simple as possible, and to eliminate 1 If we tried the guess y = A ln t , then y = A /t , which introduces a function of an entirely new p p form. If we tried yp = A ln t + B /t , then the derivative leads us to a function involving 1/t 2 , again of a form not considered.

296

Higher order differential equations

the second derivatives of u1 and u2 from arising in yp , we impose a second condition given by (4.4.16) u1 y1 + u2 y2 = 0 Observe now that by substituting the condition (4.4.16) in (4.4.15) we have yp = u1 y1 + u2 y2 so that yp = u1 y1 + u1 y1 + u2 y2 + u2 y2 Substituting the above expressions for yp and yp in (4.4.13) yields (u1 y1 + u1 y1 + u2 y2 + u2 y2 ) + a1 (u1 y1 + u2 y2 ) + a0 (u1 y1 + u2 y2 ) = f (t ) (4.4.17) Reorganizing (4.4.17) according to the terms u1 , u2 , u1 , and u2 , we have u1 (y1 + a1 y1 + a0 y1 ) + u2 (y2 + a1 y2 + a0 y2 ) + (u1 y1 + u2 y2 ) = f (t ) (4.4.18) Now, at this point we recall that y1 and y2 are fundamental solutions to the associated homogeneous equation y + a1 y + a0 = 0, which shows that in (4.4.18) the coefﬁcients of both u1 and u2 are zero. Therefore, (4.4.18) reduces to (4.4.19) u1 y1 + u2 y2 = f (t ) Combining conditions (4.4.16) and (4.4.19) results in the system of linear equations in u1 and u2 given by y1 u1 + y2 u2 = 0 y1 u1 + y2 u2 = f (t ) To solve for u1 and u2 , we multiply the ﬁrst equation by y2 and the second equation by y2 , which gives y2 y1 u1 + y2 y2 u2 = 0 (4.4.20) y2 y1 u1 + y2 y2 u2 = y2 f Subtracting the second equation from the ﬁrst in (4.4.20), we have y2 y1 u1 − y2 y1 u1 = −y2 f and therefore y2 f u1 = y2 y1 − y1 y2

(4.4.21)

Using similar algebra to solve for u2 , we may show that y1 f (4.4.22) u2 = y1 y2 − y2 y1 Finally, to determine u1 and u2 , we integrate to ﬁnd y2 f y1 f (t ) u1 = dt and u2 = dt (4.4.23) y2 y1 − y1 y2 y1 y2 − y2 y1 Once we integrate in (4.4.23) to solve for u1 and u2 , we can conclude that a particular solution yp to the original nonhomogeneous linear second-order


differential equation is $y_p = u_1 y_1 + u_2 y_2$, where $y_1$ and $y_2$ are the fundamental solutions appearing in the complementary solution $y_h = c_1 y_1 + c_2 y_2$. Examples will be helpful to demonstrate the key steps of this method. First, we state the formal result proved by our discussion above.

Theorem 4.4.2 (Variation of Parameters Method) For the differential equation $y'' + a_1 y' + a_0 y = f(t)$, where $f$ is continuous, assume that $y_1$ and $y_2$ are linearly independent solutions of the corresponding homogeneous equation $y'' + a_1 y' + a_0 y = 0$. Then, a particular solution to the nonhomogeneous equation is $y_p = u_1 y_1 + u_2 y_2$, where $u_1$ and $u_2$ satisfy

\[ u_1 = \int \frac{y_2 f}{y_2 y_1' - y_1 y_2'}\,dt \quad \text{and} \quad u_2 = \int \frac{y_1 f}{y_1 y_2' - y_2 y_1'}\,dt \tag{4.4.24} \]

Example 4.4.8 Solve the differential equation

\[ y'' + y = \sec t \tag{4.4.25} \]

where we assume that $-\frac{\pi}{2} < t < \frac{\pi}{2}$.

Solution. We first observe that the corresponding characteristic equation is $r^2 + 1 = 0$, so that the complementary solution is $y_h = c_1 \cos t + c_2 \sin t$. In particular, $y_1 = \cos t$ and $y_2 = \sin t$. We now seek two functions $u_1(t)$ and $u_2(t)$ that satisfy the equations (4.4.24). Since $y_1 = \cos t$ and $y_2 = \sin t$, it follows that $y_1' = -\sin t$ and $y_2' = \cos t$, and therefore, we have

\[ u_1 = \int \frac{y_2 f}{y_2 y_1' - y_1 y_2'}\,dt = \int \frac{\sin t \sec t}{-\sin^2 t - \cos^2 t}\,dt = \int -\sin t \sec t\,dt = -\int \frac{\sin t}{\cos t}\,dt = \ln(\cos t) \]

and

\[ u_2 = \int \frac{y_1 f}{y_1 y_2' - y_2 y_1'}\,dt = \int \frac{\cos t \sec t}{\cos^2 t + \sin^2 t}\,dt = \int 1\,dt = t \]

Note that we have used the fundamental trigonometric identity $\sin^2 t + \cos^2 t = 1$ as well as other standard trigonometric relationships such as $\sec t = 1/\cos t$. Also, since we are seeking any two functions $u_1$ and $u_2$ that satisfy (4.4.24), it is not necessary to include the constants that can arise in integrating. Hence we have found that $u_1 = \ln(\cos t)$ and $u_2 = t$. This enables us to conclude that a particular solution to the equation (4.4.25) is

\[ y_p = u_1 y_1 + u_2 y_2 = \ln(\cos t) \cos t + t \sin t \]

and, therefore, the general solution is

\[ y = y_h + y_p = c_1 \cos t + c_2 \sin t + \ln(\cos t) \cos t + t \sin t \]
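The result of Example 4.4.8 can be sanity-checked numerically. The following is a minimal Python sketch, not part of the text's method: the helper names `u_primes`, `y_p`, and `residual` are illustrative assumptions, and the second derivative is approximated by a central difference rather than computed exactly. It evaluates the integrands of (4.4.24) and confirms that $y_p = \ln(\cos t)\cos t + t \sin t$ leaves only a tiny residual in $y'' + y = \sec t$ at a few sample points in $(-\pi/2, \pi/2)$.

```python
import math

def u_primes(y1, dy1, y2, dy2, f, t):
    """Return (u1'(t), u2'(t)) as given by the integrands in (4.4.24)."""
    w = y1(t) * dy2(t) - y2(t) * dy1(t)   # y1*y2' - y2*y1'
    # u1' = y2 f / (y2 y1' - y1 y2') = -y2 f / w ;  u2' = y1 f / w
    return -y2(t) * f(t) / w, y1(t) * f(t) / w

def y_p(t):
    """Particular solution found in Example 4.4.8."""
    return math.log(math.cos(t)) * math.cos(t) + t * math.sin(t)

def residual(t, h=1e-4):
    """Approximate y_p'' + y_p - sec t using a central difference for y_p''."""
    second = (y_p(t + h) - 2.0 * y_p(t) + y_p(t - h)) / h**2
    return second + y_p(t) - 1.0 / math.cos(t)

sec = lambda t: 1.0 / math.cos(t)
neg_sin = lambda t: -math.sin(t)

# Integrands at t = 0.5: expected to equal (-tan t, 1), matching the example.
u1_prime, u2_prime = u_primes(math.cos, neg_sin, math.sin, math.cos, sec, 0.5)

# Residuals at sample points inside (-pi/2, pi/2); all should be near zero.
checks = [residual(t) for t in (-1.0, 0.0, 0.5, 1.2)]
```

Because the check uses finite differences, the residuals are small but not exactly zero; they only indicate that the formula is consistent with the differential equation.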


Example 4.4.9 Solve the equation

\[ y'' + 4y' + 4y = e^{-2t} \ln t \tag{4.4.26} \]

Solution. To begin, we solve the associated homogeneous equation and get

\[ y_h = c_1 e^{-2t} + c_2 t e^{-2t} \]

Thus for variation of parameters, we assume that $y_p = u_1(t) e^{-2t} + u_2(t) t e^{-2t}$ and we seek $u_1$ and $u_2$. Since $y_1 = e^{-2t}$ and $y_2 = t e^{-2t}$, it follows that $y_1' = -2e^{-2t}$ and $y_2' = -2te^{-2t} + e^{-2t}$, and therefore by (4.4.24)

\[ u_1 = \int \frac{y_2 f}{y_2 y_1' - y_1 y_2'}\,dt = \int \frac{t e^{-2t} (e^{-2t} \ln t)}{t e^{-2t}(-2e^{-2t}) - e^{-2t}(-2te^{-2t} + e^{-2t})}\,dt = \int \frac{t e^{-4t} \ln t}{e^{-4t}(-2t + 2t - 1)}\,dt = -\int t \ln t\,dt = -\frac{1}{2} t^2 \ln t + \frac{1}{4} t^2 \]

and

\[ u_2 = \int \frac{y_1 f}{y_1 y_2' - y_2 y_1'}\,dt = \int \frac{e^{-2t} (e^{-2t} \ln t)}{e^{-4t}(-2t + 1 + 2t)}\,dt = \int \ln t\,dt = t \ln t - t \]

From these expressions for $u_1$ and $u_2$, we can conclude that the overall form of the solution $y$ to (4.4.26) is

\[ \begin{aligned} y = y_h + y_p &= c_1 e^{-2t} + c_2 t e^{-2t} + \left( -\frac{1}{2} t^2 \ln t + \frac{1}{4} t^2 \right) e^{-2t} + (t \ln t - t)\, t e^{-2t} \\ &= c_1 e^{-2t} + c_2 t e^{-2t} + \frac{1}{4} t^2 e^{-2t} (2 \ln t - 3) \end{aligned} \]

Exercises 4.4 In exercises 1–10, determine the complementary solution yh and state the general form of yp that you would guess in applying the method of undetermined coefﬁcients. 1. y − y − 12y = 10e 5t 2. y + y − 2y = 4t 2 − 1 3. y − y = 11e t 4. y + 3y = 3 sin 2t 5. y = t 2 + 3


6. y + 4y + 3y = 2t + 4 cos t 7. y + 4y + 4y = t 2 8. y + 4y = 2 sin 2t 9. y + 4y = 20e t cos t 10. y + y − y = 3 In exercises 11–20, solve the stated IVP using the method of undetermined coefﬁcients. Note that the complementary solutions yh and appropriate guesses for yp were found in the corresponding exercises 1–10. 11. y − y − 12y = 10e 5t ,

y(0) = 2,

y (0) = −1

12. y + y − 2y = 4t 2 − 1,

y(0) = 1,

y (0) = 1

13. y − y = 11e t ,

14. y + 3y = 3 sin 2t , 15. y = t 2 + 3,

y (0) = 2

y(0) = −3, y(0) = −2,

y (0) = −2

16. y + 4y + 3y = 2t + 4 cos t , 17. y + 4y + 4y = t 2 , 18. y + 4y = 2 sin 2t ,

y(0) = 2,

y(0) = 5, y(0) = 1,

19. y + 4y = 20e t cos t , 20. y + y − y = 3,

y (0) = 0

y(0) = 0,

y(0) = 0,

y(0) = −1,

y (0) = 0

y (0) = 3 y (0) = −1 y (0) = −1 y (0) = −1

In exercises 21–27, ﬁnd the general solution of the given differential equation using variation of parameters. − π2 < t

> solve(r^4 - r^3 - 7*r^2 + r + 6 = 0, r);

Maple produces the output

−1, 1, −2, 3

showing that these are the four roots of the characteristic equation. Of course, not all polynomial equations will have all integer solutions, much less all real solutions. For example, if we consider the equation $r^4 + r^3 + r^2 + r + 1 = 0$ and use the solve command, we see that

> solve(r^4 + r^3 + r^2 + r + 1 = 0, r);

results in the output

\[ -\tfrac{1}{4} + \tfrac{1}{4}\sqrt{5} + \tfrac{1}{4} I \sqrt{2}\sqrt{5 + \sqrt{5}}, \quad -\tfrac{1}{4}\sqrt{5} - \tfrac{1}{4} + \tfrac{1}{4} I \sqrt{2}\sqrt{5 - \sqrt{5}}, \quad -\tfrac{1}{4}\sqrt{5} - \tfrac{1}{4} - \tfrac{1}{4} I \sqrt{2}\sqrt{5 - \sqrt{5}}, \quad -\tfrac{1}{4} + \tfrac{1}{4}\sqrt{5} - \tfrac{1}{4} I \sqrt{2}\sqrt{5 + \sqrt{5}} \]

In this case, we might prefer a decimal approximation to the roots rather than the exactness that Maple provides. One way to achieve this is to use the fsolve command:

> fsolve(r^4 + r^3 + r^2 + r + 1 = 0, r, complex);

which generates the result

−0.80902 − 0.58779I, −0.80902 + 0.58779I, 0.30902 − 0.95106I, 0.30902 + 0.95106I
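As a cross-check outside Maple, the exact and decimal outputs above can be compared directly. This is a minimal Python sketch under stated assumptions: it is not a substitute for solve or fsolve, and the names `p`, `exact_roots`, and `residuals` are illustrative. It writes Maple's exact expressions in floating point and substitutes them back into the polynomial.

```python
import math

def p(r):
    """The polynomial r^4 + r^3 + r^2 + r + 1 from the example."""
    return r**4 + r**3 + r**2 + r + 1

s5 = math.sqrt(5.0)
im_plus = math.sqrt(2.0) * math.sqrt(5.0 + s5) / 4.0    # (1/4) sqrt(2) sqrt(5 + sqrt 5)
im_minus = math.sqrt(2.0) * math.sqrt(5.0 - s5) / 4.0   # (1/4) sqrt(2) sqrt(5 - sqrt 5)

# The four exact roots reported by solve, in the order listed above.
exact_roots = [
    complex(-0.25 + s5 / 4.0,  im_plus),
    complex(-0.25 - s5 / 4.0,  im_minus),
    complex(-0.25 - s5 / 4.0, -im_minus),
    complex(-0.25 + s5 / 4.0, -im_plus),
]

# Substituting each root back into p should give a value near zero.
residuals = [abs(p(r)) for r in exact_roots]

# Rounded to five places, these should agree with the fsolve output.
rounded = [(round(r.real, 5), round(r.imag, 5)) for r in exact_roots]
```

The rounded pairs can then be compared by eye with the four decimal values that fsolve reports.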

Note that without the option “complex” in the fsolve command, the command will not generate any output. This is because the default setting for fsolve is to numerically approximate all of the real roots of the polynomial equation and to ignore complex ones. For polynomial equations of degree 5 or more, the fsolve command is the appropriate tool to use to determine accurate approximations of the equation’s solutions. Exercises 4.6 In exercises 1–12, use the characteristic equation to determine the general solution to the given higher order linear homogeneous DE. 1. y − 2y − y + 2y = 0 2. y − 2y − 3y = 0 3. 4y − 13y − 6y = 0 4. y (4) − 13y + 36y = 0 5. y + 3y + 3y + y = 0 6. y (4) − y − 7y + y + 6y = 0 7. y − y + 4y − 4y = 0 8. y (4) − y = 0 9. y (5) − 2y (4) − y + 2y = 0 10. y (6) + 9y (4) + 24y + 16y = 0 11. y (4) + 4y + 6y + 4y + y = 0 12. y (4) + 3y + y − 5y = 0


In exercises 13–22, solve the given IVP. 13. y − 4y = 0,

y(0) = 1, y (0) = 0, y (0) = 2

14. y − 3y + 2y = 0,

y(0) = 0, y (0) = 2, y (0) = 0

15. y − 6y + 11y − 6y = 0,

y(0) = 0, y (0) = 2, y (0) = 0

16. y (4) − 2y − y + 2y = 0,

y(0) = 2, y (0) = 0, y (0) = 10, y (0) = 0

17. y + y + 4y + 4y = 0, 18. y (4) + 5y + 4y = 0, 19. y = 0,

y(0) = 0, y (0) = 10, y (0) = 0

y(0) = 4, y (0) = 0, y (0) = 10, y (0) = 0

y(0) = 2, y (0) = 0, y (0) = 2

20. y (4) − 16y = 0,

y(0) = 4, y (0) = 0, y (0) = 0, y (0) = 0

21. y − 3y + 3y − y = 0, 22. y (5) + y = 0,

y(0) = 1, y (0) = 2, y (0) = 1

y(0) = 1, y (0) = 0, y (0) = 2, y (0) = 0, y (4) (0) = 4

In exercises 23–28, construct a homogeneous linear differential equation of the least possible order that has the given function(s) as solutions. 23. y1 = c, y2 = e t 24. y1 = t 2 e 2t 25. y1 = t , y2 = cos 3t , y3 = e −t 26. y1 = te 4t sin t 27. y1 = e −t /2 cos t , y2 = sin 5t 28. y1 = sin t , y2 (t ) = t sin t 29. Find the general solution to y (4) + 2y + y = cos t . 30. Find a particular solution to y (4) + 2y + y = sin t + 2 cos t . How is your answer similar to the result in exercise 29? In exercises 31–42, use undetermined coefﬁcients to determine the general solution to the stated nonhomogeneous equation. Note that each of the corresponding homogeneous equations has been solved in exercises 1–12. 31. y − 2y − y + 2y = 2 32. y − 2y − 3y = 2e t 33. 4y − 13y − 6y = cos t 34. y (4) − 13y + 36y = t 35. y + 3y + 3y + y = sin t 36. y (4) − y − 7y + y + 6y = t 2 + 3


37. y − y + 4y − 4y = e −t 38. y (4) − y = 3t 39. y (5) − 2y (4) − y + 2y = 7 40. y (6) + 9y (4) + 24y + 16y = t 2 41. y (4) + 4y + 6y + 4y + y = t + cos t 42. y (4) + 3y + y − 5y = 2t − sin t + e t 4.7 For further study 4.7.1 Damped motion

Consider the general form of the spring-mass equation (4.7.1) my + cy + ky = 0 where c = 0 so that viscous damping is present. In what follows, we explore how the values of the constants m, c, and k affect the behavior of the solution y. Note that in this context, m, c, and k are always positive. (a) Show that the roots of the characteristic polynomial of (4.7.1) are √ −c ± c 2 − 4mk λ= 2m (b) We examine the three possible cases for the roots of the characteristic polynomial: √ (i) Suppose that c 2 − 4km > 0. Explain why c 2 − 4mk < c and thus why both roots of the characteristic equation must be negative. State the general solution to the equation (4.7.1) in terms of the constants c, m, and k. (ii) Suppose that c 2 − 4km = 0. Discuss the number of real roots of the characteristic polynomial and state the general solution to the equation (4.7.1) in terms of the constants c and m. (iii) Suppose that c 2 − 4km < 0. Explain why both roots√of the characteristic polynomial are complex. Using = 4mk − c 2 /(2m), state the general solution to the equation (4.7.1) in terms of the constants c, m, and . (c) The respective cases (i), (ii), and (iii) in (b) are typically called overdamping, critical damping, and underdamping. How is the case of underdamping signiﬁcantly different from overdamping and critical damping? Explain both in terms of the algebraic form of the solution as well as in terms of the solution’s expected graph. (d) A 4-kg mass is suspended from a spring with constant k = 25, and a dashpot with various levels of damping viscosity is present. The mass is


displaced 0.5 m from its equilibrium and released. Determine the displacement y(t ) of the mass if (i) c = 15, (ii) c = 20, (iii) c = 25, and (iv) c = 30 In each case, state whether the system is overdamped, critically damped, or underdamped, and sketch the solution curve. (e) The case of underdamping is the most interesting of the three cases, for it is here that multiple oscillations through equilibrium occur. In (b)(iii), you should have shown that the general solution may be expressed in the form c

y = e − 2m t (c1 cos t + c2 sin t ) Show that y may be alternatively expressed in the form c

y = Ae − 2m t cos(t − θ )

+

(4.7.2)

where A = c12 + c22 and tan θ = c1 /c2 . (Hint: Set A cos(t − θ ) = c1 cos t + c2 sin t and equate like coefﬁcients after using the trigonometric identity cos(α − β ) = cos α cos β + sin α sin β .) (f) In the underdamped case, we are interested in how fast the amplitude of the oscillations decays to zero. In what follows, we show how the ratio of consecutive local maxima (or minima) of y(t ) depends only on the constants c, m, and . c

(i) Using y = Ae − 2m t cos(t − θ ) from (e), determine y and show that y = 0 if and only if c tan(t − θ ) = − (4.7.3) 2m (ii) If the solutions of (4.7.3) are denoted by tn , then show that " 1 c # nπ θ + (4.7.4) tn = + arctan − 2m Explain why we expect y(tn ) and y(tn+1 ) to be a local maximum and minimum (or local minimum and maximum), respectively, of y(t ), and hence why y(tn ) and y(tn+2 ) will be consecutive maxima or consecutive minima. (iii) Let yn = y(tn ) and yn+2 = y(tn+2 ). Using (4.7.2), evaluate y(tn ) and y(tn+2 ) and verify that yn cos(tn − θ ) − c (tn −tn+2 ) = (4.7.5) e 2m yn+2 cos(tn+2 − θ ) (iv) Show that (4.7.3) implies (tn − tn+2 ) = −2π and thus cos(tn − θ ) π c /m yn = (4.7.6) e yn+2 cos(tn+2 − θ )


(v) Show that tn − θ = tn+2 − θ − 2π

so that cos(tn − θ ) = cos(tn+2 − θ ) Use this last result to prove that yn = e π c /m yn+2

(4.7.7)

(g) The logarithm of (4.7.7), D = ln

πc yn = ln e π c /m = yn+2 m

(4.7.8)

is called the logarithmic decrement. Note that this quantity is independent of t as well as the initial conditions present in the underdamped case for the DE (4.7.1), and that the value of the logarithmic decrement tells us how rapidly consecutive oscillations diminish in the underdamped case. For each of the following underdamped spring-mass systems, determine the solution function y(t ) and compute the logarithmic decrement. Explain how the value of the logarithmic decrement tells you whether oscillations will die out slowly or rapidly. Using a computer algebra system to execute the routine calculations is particularly appropriate here. In each case, assume the mass is displaced 1 m and released. (i) m = 4, c = 19, k = 25 (ii) m = 4, c = 10, k = 25 (iii) m = 4, c = 1, k = 25 (iv) m = 4, c = 0.1, k = 25 4.7.2 Forced oscillations with damping

Consider the general form of the forced spring-mass equation my + cy + ky = f (t )

(4.7.9)

where c > 0 so that viscous damping is present. Again, we remark that in this context m and k are always positive. (a) Show that if

√ c 2 − 4km = 2m

then the complementary solution of (4.7.9) is c yh (t ) = e − 2m t c1 e t + c2 e −t (b) Explain why lim yh (t ) = 0

t →∞

(4.7.10)


Recall that we call yh (t ) the transient solution. What does this tell us about the role played by the particular solution yp (t ) in the general solution y = yh + yp as t → ∞? (c) We now consider the effects of the periodic forcing function f (t ) = F0 cos ωt . With this function, we have seen that resonance is only possible when no damping is present; here, we wish to explore the impact of the parameters in f (t ) on the steady-state solution yp to (4.7.9). (i) Use the method of undetermined coefﬁcients to show that with f (t ) = F0 cos ωt , the particular solution yp to (4.7.9) is F0 (k − m ω2 ) cω yp = cos ω t + sin ω t (4.7.11) (k − m ω2 )2 + ω2 c 2 k − m ω2 (ii) As in our study of undamped spring-mass systems and resonance, we let ω0 = k /m. Show that yp (t ) may be equivalently expressed in the form F0 yp = 2 2 cos(ωt − θ ) (4.7.12) m (ω 0 − ω 2 )2 + ω 2 c 2 Compare the result to (4.7.2). (iii) Observe that the amplitude of the oscillation of yp in (4.7.12) is ( ω ) =

F0 2 2 m (ω 0 − ω 2 )2 + ω 2 c 2

(4.7.13)

and that ω0 , m, and c are ﬁxed constants determined by the given spring-mass system. We now examine how the size of these oscillations depends on ω. First, compute d dω Then, set d /d ω = 0 to show that the maximum amplitude occurs when c2 ω2 = ω02 − (4.7.14) 2m 2 (iv) Explain why if c satisﬁes c 2 > 2m 2 ω02 , then there is no value of ω that produces a maximum amplitude of oscillation. In addition, note that when a maximum amplitude exists (i.e., provided c 2 < 2m 2 ω02 ), its value is given by (ω) where ω satisﬁes (4.7.14). Use this condition to compute (ω) and show that 2mF0 max = + c 4m 2 ω02 − c 2

(4.7.15)


(v) Consider a particular spring-mass system for which m = 1 and k = 4 where we consider various damping constants c. In addition, assume we apply the forcing function f (t ) = cos ωt , so that F0 = 1. Recall that ω0 = k /m, so ω0 = 2. For each of the c-values c = 0.1, 1, 2, 3, 4, 5, 6, plot the function (ω) =

F0 2 2 m (ω0 − ω2 )2 + ω2 c 2

on the interval ω = 0 . . . 10. When a maximum oscillation exists, where does it occur? How is the size of the maximum oscillation correlated with c and ω? What should we ensure about the relationship between ω and ω0 if we want to avoid large amplitude oscillations? (d) Complete the following exercises which examine the magnitude of oscillations in damped, driven spring-mass systems. (i) A forcing function f (t ) = 10 sin 2t is imposed on a spring-mass system for which m = 2 kg and k = 8 N/m. Determine the damping constant necessary to limit the amplitude of the motion to a maximum of 2 m. (ii) A forcing function f (t ) = 50 cos ωt is imposed on a spring-mass system for which m = 4 kg, k = 100 N/m, and c = 2 kg/s. Calculate the amplitude of the resulting motion for ω = 4, ω = 4.5, ω = 5, and ω = 6. (iii) Determine the input frequency ω that gives the maximum amplitude for the spring-mass system in (ii) above. For this frequency, what is the maximum amplitude? 4.7.3 The Cauchy–Euler equation

The vast majority of our efforts with higher order DEs have involved linear equations with constant coefficients. The Cauchy–Euler equation is an important example of a linear, second-order DE whose coefficients are not constant. In particular, the Cauchy–Euler equation is a differential equation of the form

\[ t^2 y'' + p t y' + q y = 0 \tag{4.7.16} \]

where $p$ and $q$ are real constants and $t > 0$.

(a) Explain why it is reasonable to guess that $y(t) = t^\lambda$ is a solution to (4.7.16). Show by direct substitution in (4.7.16) that the guess $y(t) = t^\lambda$ requires $\lambda$ to be a solution to the characteristic equation

\[ \lambda^2 + (p - 1)\lambda + q = 0 \tag{4.7.17} \]
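The relationship between (4.7.16) and (4.7.17) can be explored numerically before attempting the algebra. The following is a minimal Python sketch, assuming an illustrative equation $t^2 y'' - 2t y' + 2y = 0$ (so $p = -2$, $q = 2$) that is not one of the exercises; the function names `char_roots` and `residual` are hypothetical. It solves (4.7.17) and substitutes $y = t^\lambda$, with its exact derivatives, into the left side of (4.7.16).

```python
import cmath

def char_roots(p, q):
    """Roots of lambda^2 + (p - 1)*lambda + q = 0, equation (4.7.17)."""
    b = p - 1
    disc = cmath.sqrt(b * b - 4 * q)
    return (-b + disc) / 2, (-b - disc) / 2

def residual(p, q, lam, t):
    """Left side of (4.7.16) for y = t**lam, using exact derivatives of t**lam."""
    y = t**lam
    dy = lam * t**(lam - 1)
    d2y = lam * (lam - 1) * t**(lam - 2)
    return t * t * d2y + p * t * dy + q * y

# Illustrative equation: t^2 y'' - 2 t y' + 2 y = 0, so p = -2 and q = 2.
lam1, lam2 = char_roots(-2, 2)

# For roots of (4.7.17) the residual should vanish at every t > 0.
root_residuals = [abs(residual(-2, 2, lam, t))
                  for lam in (lam1, lam2) for t in (0.5, 1.0, 3.0)]

# For an exponent that is not a root, the residual is visibly nonzero.
non_root_residual = residual(-2, 2, 3, 2.0)
```

Here `cmath.sqrt` is used so that the same sketch also handles the case of complex roots, where `t**lam` is evaluated as a complex power for real $t > 0$.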

(b) In the case where (4.7.17) has two distinct real roots $\lambda_1$ and $\lambda_2$, the general solution to the Cauchy–Euler equation is

\[ y = c_1 t^{\lambda_1} + c_2 t^{\lambda_2} \]

Solve each of the following Cauchy–Euler initial-value problems:
(i) $t^2 y'' - 5t y' + 8y = 0$, $y(1) = 1$, $y'(1) = 0$
(ii) $t^2 y'' + 9t y' + 12y = 0$, $y(1) = 1$, $y'(1) = 0$

(c) When (4.7.17) has a repeated real root $\lambda_1 = \lambda_2 = \lambda$, we have only determined one linearly independent solution ($y_1 = t^\lambda$) of the Cauchy–Euler equation. Here we determine a second linearly independent solution.
(i) Assuming that $\lambda$ is a repeated root of (4.7.17), show that $1 - p = 2\lambda$.
(ii) Letting $v(t)$ be an unknown function, consider the guess $y_2 = v \cdot t^\lambda$. By direct substitution in the Cauchy–Euler equation, show that $v$ must satisfy the equation

\[ t^\lambda \left[ t^2 v'' + (2\lambda + p) t v' + (\lambda^2 + (p - 1)\lambda + q) v \right] = 0 \tag{4.7.18} \]

(iii) Use your work in (i) and (ii), as well as the fact that $\lambda$ satisfies the equation

\[ \lambda^2 + (p - 1)\lambda + q = 0 \]

to show that $y_2 = v \cdot t^\lambda$ is a solution of the Cauchy–Euler equation in the case of a repeated root provided that

\[ t v'' + v' = 0 \tag{4.7.19} \]

(iv) Show that $v(t) = \ln t$ is a solution of (4.7.19) and hence state the general solution of the Cauchy–Euler equation in the case where the characteristic equation has a single real repeated root.

(d) Solve each of the following Cauchy–Euler initial-value problems:
(i) $t^2 y'' + 7t y' + 9y = 0$, $y(1) = 1$, $y'(1) = 0$
(ii) $t^2 y'' - 9t y' + 25y = 0$, $y(1) = 1$, $y'(1) = 0$

(e) When (4.7.17) has complex roots, say $\lambda_1 = a + bi$ and $\lambda_2 = a - bi$, we proceed with a corresponding complex solution to the Cauchy–Euler equation and verify that its real and imaginary parts are themselves real, linearly independent solutions to the equation. In particular, with $\lambda = a + bi$, observe that

\[ z(t) = t^\lambda = t^{a + bi} = t^a t^{bi} \]

By writing

\[ t^{bi} = e^{\ln(t^{bi})} = e^{bi \ln t} \]

and applying Euler's formula, show that

\[ z(t) = t^a \left[ \cos(b \ln t) + i \sin(b \ln t) \right] \tag{4.7.20} \]

In addition, show by direct substitution that $y_1(t) = t^a \cos(b \ln t)$ is a solution to the Cauchy–Euler equation when $a + bi$ is a root of the characteristic polynomial. Likewise, show that $y_2(t) = t^a \sin(b \ln t)$ is a solution. Hence, state the general solution to the Cauchy–Euler equation in the case where the characteristic polynomial has complex roots $\lambda = a \pm bi$.

(f) Solve each of the following Cauchy–Euler initial-value problems:
(i) $t^2 y'' + 3t y' + 5y = 0$, $y(1) = 1$, $y'(1) = 0$
(ii) $t^2 y'' - 3t y' + 13y = 0$, $y(1) = 1$, $y'(1) = 0$

4.7.4 Companion systems and companion matrices

Given a second-order linear differential equation with constant coefficients such as

\[ y'' + b y' + c y = 0 \tag{4.7.21} \]

we know that through the substitution $x_1 = y$, $x_2 = y'$ we can convert (4.7.21) to the system of first-order equations given by

\[ \begin{aligned} x_1' &= x_2 \\ x_2' &= -c x_1 - b x_2 \end{aligned} \tag{4.7.22} \]

The system (4.7.22) is called the companion system of (4.7.21). In what follows, we explore the connections between the original equation and its companion system.

(a) Consider the homogeneous linear second-order DE

\[ y'' + 3y' + 2y = 0 \tag{4.7.23} \]

Using the guess $y = e^{rt}$, find the characteristic equation of (4.7.23) and the values of $r$ that make $y = e^{rt}$ a solution of the given DE.

(b) Convert the DE (4.7.23) into a system of first-order equations in the form $\mathbf{x}' = A\mathbf{x}$. In addition, determine the eigenvalues of the matrix $A$.

(c) What do you observe about the roots of the characteristic equation in (a) and the eigenvalues of the matrix in (b)? Why is this result not surprising?

(d) Find the general solution of the second-order equation (4.7.23) using standard methods from chapter 4. Find the general solution of the first-order system you found in (b) using standard methods from chapter 3. Explain how your two results agree.

(e) Now consider the general equation (4.7.21), where $b$ and $c$ are arbitrary constants, and its corresponding companion system.
(i) Show that the roots of the characteristic equation are

\[ r = \frac{-b \pm \sqrt{b^2 - 4c}}{2} \]

and that the eigenvalues of the coefficient matrix of the companion system are

\[ \lambda = \frac{-b \pm \sqrt{b^2 - 4c}}{2} \]

(ii) Assuming that $b^2 - 4c > 0$ so that the values of $r$ in (i) are real and distinct, state the general solution of (4.7.21).
(iii) Show that the eigenvectors of the matrix of the companion system that correspond to $\lambda_1$ and $\lambda_2$ are given by

\[ \mathbf{v}_1 = \begin{bmatrix} 1 \\ \lambda_1 \end{bmatrix} \quad \text{and} \quad \mathbf{v}_2 = \begin{bmatrix} 1 \\ \lambda_2 \end{bmatrix} \]

where $\lambda_1 = (-b + \sqrt{b^2 - 4c})/2$ and $\lambda_2 = (-b - \sqrt{b^2 - 4c})/2$. State the general solution to the companion system.
(iv) Compare your result from (ii) to the result for $x_1$ in (iii), recalling that $x_1 = y$. Do your solutions agree?

(f) Our work above shows that for any second-order differential equation, there exists a companion system of two first-order equations whose vector solution contains the solution of the second-order equation. For the third-order equation

\[ y''' + 2y'' - y' - 2y = 0 \]

find the general solution directly by using standard methods from chapter 4. Then, find the general solution of the first-order companion system constructed from the substitution $x_1 = y$, $x_2 = y'$, $x_3 = y''$ using standard methods from chapter 3. Compare your results.

(g) In both the direct solution of higher order linear differential equations and in the solution of systems of linear first-order equations, the solution methods require us to find roots of polynomials. Our work above enables us to see the fact that any polynomial has an associated matrix, a so-called companion matrix, whose eigenvalues are the same as the zeros of the polynomial. In general, given a polynomial function

\[ p(t) = t^n + a_{n-1} t^{n-1} + a_{n-2} t^{n-2} + \cdots + a_1 t + a_0 \]

the companion matrix of $p(t)$ is given by

\[ C = \begin{bmatrix} 0 & 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & \cdots & 0 & 0 & 1 \\ -a_0 & -a_1 & -a_2 & \cdots & -a_{n-2} & -a_{n-1} \end{bmatrix} \tag{4.7.24} \]

That is, $C$ is an $n \times n$ matrix whose first $n - 1$ rows are all zero except for the entry just above the diagonal, whose value is 1. The final row consists of the opposites of the coefficients of the constant, linear, etc., terms of the polynomial $p$. It can be proved that, in general, the eigenvalues of $C$ are the same as the zeros of $p(t)$. We verify this fact through a few examples.
(i) For the polynomial $p(t) = t^2 + 3t + 2$, determine the companion matrix $C$. Compute the eigenvalues of $C$ directly and compare the result to the zeros of $p(t)$.
(ii) For the polynomial $p(t) = t^3 + 3t^2 + 3t + 1$, determine the companion matrix $C$. Compute the eigenvalues of $C$ directly and compare the result to the zeros of $p(t)$.
(iii) For the polynomial $p(t) = t^4 - 1$, determine the companion matrix $C$. Compute the eigenvalues of $C$ directly and compare the result to the zeros of $p(t)$.

(h) For the $n$th-order linear homogeneous equation

\[ y^{(n)} + a_{n-1} y^{(n-1)} + \cdots + a_1 y' + a_0 y = 0 \tag{4.7.25} \]

show that the coefficient matrix of the corresponding companion system is in fact the companion matrix of the characteristic polynomial of (4.7.25).
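The structure of (4.7.24) can be made concrete with a short computation. The following is a minimal Python sketch under stated assumptions: the helper names `companion`, `matvec`, and `is_eigenpair` are illustrative, and the test polynomial $p(t) = t^3 - 6t^2 + 11t - 6 = (t-1)(t-2)(t-3)$ is chosen only for this illustration. Rather than computing eigenvalues, it checks that each zero $r$ of $p$ satisfies $C\mathbf{v} = r\mathbf{v}$ with $\mathbf{v} = (1, r, r^2, \ldots, r^{n-1})$, which extends the eigenvector pattern seen in part (e)(iii).

```python
def companion(coeffs):
    """Companion matrix (4.7.24) of the monic polynomial
    t^n + a_{n-1} t^{n-1} + ... + a_1 t + a_0, given coeffs = [a_0, ..., a_{n-1}]."""
    n = len(coeffs)
    C = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        C[i][i + 1] = 1                 # ones just above the diagonal
    C[n - 1] = [-a for a in coeffs]     # final row: opposites of the coefficients
    return C

def matvec(C, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(c * x for c, x in zip(row, v)) for row in C]

def is_eigenpair(C, r, tol=1e-12):
    """Check whether C v = r v for the vector v = (1, r, r^2, ..., r^(n-1))."""
    v = [r**k for k in range(len(C))]
    Cv = matvec(C, v)
    return all(abs(a - r * b) <= tol for a, b in zip(Cv, v))

# Illustrative polynomial: t^3 - 6 t^2 + 11 t - 6 = (t - 1)(t - 2)(t - 3).
C = companion([-6, 11, -6])
zeros_are_eigenvalues = [is_eigenpair(C, r) for r in (1, 2, 3)]
non_zero_fails = is_eigenpair(C, 4)     # 4 is not a zero of p
```

This check only confirms that the zeros of $p$ are eigenvalues of $C$ for one example; it is not a proof of the general statement in (g).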


5 Laplace transforms

5.1 Motivating problems

In this chapter, we again consider solving nonhomogeneous linear differential equations such as

\[ y'' + a_1 y' + a_2 y = f(t) \]

but in contexts where the forcing function is different from those we have previously encountered. While we have developed the methods of undetermined coefficients and variation of parameters to approach this problem, there are several reasons to consider a different means of solution. Perhaps most prominent is that in every example to date, we have assumed that the function $f(t)$ is continuous. Indeed, it has also typically been the case that $f(t)$ is a standard function, one belonging to the library of basic functions like $\sin 2t$ and $\ln t$ that we encounter in calculus. In many applications, however, it is possible for $f(t)$ to be piecewise defined, discontinuous, or worse. We consider two examples that demonstrate these possibilities.

Electrical circuits with a voltage source provide a common situation where the forcing function $f(t)$ is not continuous. If we flip a switch to turn the voltage on, then the forcing function is actually a step function that leaps from zero to a constant value. Recall that the charge $Q(t)$ in an RLC circuit is modeled by the second-order equation

\[ L Q'' + R Q' + \frac{1}{C} Q = E(t) \tag{5.1.1} \]

where $E(t)$ is an external voltage source. Suppose that we are given an RLC circuit with an initial charge $Q(0)$ and initial current $Q'(0)$, and that the voltage $E(t) = 1000$ is turned on at $t = 4$. The voltage function $E(t)$ is, therefore, defined piecewise by the formula

\[ E(t) = \begin{cases} 0, & \text{if } 0 \le t < 4 \\ 1000, & \text{if } t \ge 4 \end{cases} \]

Let us further assume that $L = 20$ H, $R = 40\ \Omega$, $C = 10^{-2}$ F, and that $Q(0) = 25$ and $Q'(0) = 0$. From the given information and (5.1.1), we know that $Q(t)$ is modeled by the initial-value problem

\[ 20Q'' + 40Q' + 100Q = E(t), \quad Q(0) = 25, \quad Q'(0) = 0 \tag{5.1.2} \]

We have not yet encountered means to deal with a step function as the forcing function in an initial-value problem. In section 5.4, we will discuss step functions in detail, learning how they may be used to turn other functions on and off; in addition, we will show how the Laplace transform provides an ideal tool for dealing with piecewise-defined functions in initial-value problems. With these tools, we will be able to determine the solution $Q(t)$ for (5.1.2) whose graph is shown in figure 5.1. Observe that we see the expected damped oscillation in $Q(t)$ up until time $t = 4$ when the forcing function $E(t)$ is turned on, at which point we see the solution driven vertically away from zero so that as $t$ increases, $Q(t) \to 10$. That $Q(t)$ approaches 10 should not surprise us since $Q(t) = 10$ is a constant solution to the equation

\[ 20Q'' + 40Q' + 100Q = 1000 \]

In fact, $Q(t) = 10$ is a stable equilibrium solution of the equation.

In addition to functions that get turned on or off at a certain time, another important forcing function to consider is a so-called impulse function. These functions are ones where a force is imparted over an extremely short time interval such as a hammer striking a mass. In section 5.4, we introduce the Dirac delta function, $\delta(t)$, study its properties, and see how it may be used in settings such as the following.

Figure 5.1 The solution $Q(t)$ to (5.1.2).
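The qualitative behavior described for figure 5.1 can be reproduced without the Laplace transform by integrating (5.1.2) numerically. The following is a minimal Python sketch, assuming a classical fourth-order Runge–Kutta scheme with a fixed step; the names `E`, `rhs`, `simulate`, and `exact_before_switch` are illustrative, and this is not the solution method developed in this chapter. Before the switch at $t = 4$ the equation is homogeneous, and standard chapter 4 methods give $Q(t) = 25e^{-t}(\cos 2t + \tfrac{1}{2}\sin 2t)$, which the sketch uses as a cross-check.

```python
import math

def E(t):
    """Step voltage from (5.1.2): off before t = 4, equal to 1000 afterward."""
    return 1000.0 if t >= 4.0 else 0.0

def rhs(t, q, i):
    """First-order form of 20 Q'' + 40 Q' + 100 Q = E(t), with i = Q'."""
    return i, (E(t) - 40.0 * i - 100.0 * q) / 20.0

def simulate(t_end=20.0, h=0.001):
    """Integrate with classical RK4 from Q(0) = 25, Q'(0) = 0; return [(t, Q)]."""
    n = round(t_end / h)
    q, i = 25.0, 0.0
    history = [(0.0, q)]
    for k in range(n):
        t = k * h
        k1q, k1i = rhs(t, q, i)
        k2q, k2i = rhs(t + h / 2, q + h / 2 * k1q, i + h / 2 * k1i)
        k3q, k3i = rhs(t + h / 2, q + h / 2 * k2q, i + h / 2 * k2i)
        k4q, k4i = rhs(t + h, q + h * k3q, i + h * k3i)
        q += h / 6 * (k1q + 2 * k2q + 2 * k3q + k4q)
        i += h / 6 * (k1i + 2 * k2i + 2 * k3i + k4i)
        history.append(((k + 1) * h, q))
    return history

def exact_before_switch(t):
    """Solution of the homogeneous problem, valid only for 0 <= t < 4."""
    return 25.0 * math.exp(-t) * (math.cos(2 * t) + 0.5 * math.sin(2 * t))

history = simulate()
q_at_1 = history[1000][1]    # numerical Q at t = 1 (before the switch)
q_final = history[-1][1]     # numerical Q at t = 20; should be close to 10
```

The step function makes the right-hand side discontinuous at $t = 4$, so the numerical accuracy near that instant is lower than elsewhere; the long-run value is nonetheless expected to settle near the equilibrium $Q = 10$.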

Figure 5.2 The solution curve $y(t)$ to (5.1.3).

Suppose that a mass of 1 kg is attached to a spring with constant $k = 4$ and the system's damping constant is $c = 2$. In addition, assume that the mass is initially displaced 0.5 m from equilibrium and released. At time $t = 4$, the mass is struck with a hammer imparting a unit impulse in the positive direction. The combination of all of these conditions leads to the initial-value problem

\[ y'' + 2y' + 4y = \delta(t - 4), \quad y(0) = 0.5, \quad y'(0) = 0 \tag{5.1.3} \]

where the function δ (t − 4) represents the hammer imparting the unit force of impulse. Just as with piecewise-deﬁned functions, we will learn that the Laplace transform provides an ideal tool for dealing with impulses. Once we develop the appropriate theory, we will be able to solve initial-value problems such as (5.1.3) and see that the solution behaves as shown in ﬁgure 5.2. In the solution, we see the noticeable impact of the impulse as the problem appears to restart, almost as if new initial conditions have been given at time t = 4. In addition to being able to address discontinuous and impulse forcing functions, the Laplace transform is a powerful tool because it handles all allowable forcing functions in the same manner. Moreover, in each case it proceeds directly to the solution of initial-value problems without ﬁrst ﬁnding the general solution to the differential equation. These ideas and more will be studied in subsequent sections.

5.2 Laplace transforms: getting started

The motivating idea behind the Laplace transform is natural: to solve a differential equation, our desire is to integrate. For the simplest examples, such as $y' = y$, we know that we can separate variables and integrate in order to determine $y$. However, if we approach the problem

\[ y' + a_0 y = f(t) \tag{5.2.1} \]

by attempting to integrate both sides from 0 to $s$ with respect to $t$ in order to eliminate $y'$, doing so leads to the equation

\[ \int_0^s y'(t)\,dt + a_0 \int_0^s y(t)\,dt = \int_0^s f(t)\,dt \tag{5.2.2} \]

While $\int_0^s y'(t)\,dt = y(s) - y(0)$ eliminates the derivative $y'$ from the equation, and $\int_0^s f(t)\,dt$ can usually be computed for a given $f$, in (5.2.2) we are left with the expression $\int_0^s y(t)\,dt$, where $y$ is an unknown function. Essentially this step of integrating has replaced the derivative of the unknown function $y$ with its integral in the equation we are endeavoring to solve. This leaves us no closer to finding the solution function $y(t)$.

Rather than simply trying to integrate, the Laplace transform uses a modified approach in which every function in (5.2.1) is multiplied by another function before integrating; this approach will enable us to convert differential equations in $y(t)$ and $y'(t)$ to algebraic equations in a new unknown function $Y(s)$ that we can solve for $Y(s)$. This method is similar to the use of integrating factors when solving linear first-order equations.

Before we formally define the Laplace transform, we discuss a few preliminary ideas, some of which are familiar concepts from calculus. First, we assume throughout this chapter that all forcing functions are piecewise continuous functions defined for $t > 0$ and that

\[ f(0) = f(0^+) = \lim_{t \to 0^+} f(t) \tag{5.2.3} \]

That is, $f$ cannot be discontinuous at the origin itself, though it is allowed to have finitely many discontinuities for $t > 0$. Furthermore, we assume that the forcing function does not grow more rapidly than an exponential function. Formally, we will assume that $f(t)$ is of exponential order, which means that for sufficiently large $t$,

\[ |f(t)| \le M e^{bt} \tag{5.2.4} \]

for positive constants $M$ and $b$. Functions that are piecewise continuous and meet conditions (5.2.3) and (5.2.4) are called acceptable. For example, polynomial functions, $\sin kt$, $e^{kt}$, and sums and products of these functions are acceptable, as are piecewise-defined functions with finitely many discontinuities whose pieces consist of these basic functions. In particular, linear combinations of acceptable functions are acceptable. Functions such as

\[ e^{t^2}, \quad t^{-1/2}, \quad (t - 1)^{-1} \]

are not acceptable. The first grows too rapidly to be of exponential order, the second fails to meet the condition (5.2.3) that a limit exists from the right at the origin, and the third is not piecewise continuous on any interval containing $t = 1$.


In addition, from calculus we recall the following important concepts:

• If $y' = f(t)$ and $y(0) = 0$, then $y = \int_0^t f(s)\,ds$.

• The improper integral $\int_0^\infty f(t)\,dt$ is said to converge whenever

\[ \lim_{r \to \infty} \int_0^r f(t)\,dt \]

exists. If this limit fails to exist, we say the improper integral diverges.

• Given a function of two variables $K(s, t)$, if we integrate this function with respect to $t$ from $t = a$ to $t = b$, the result is a function of $s$. That is,

\[ \int_a^b K(s, t)\,dt \]

is a function of $s$.

Recall our earlier note regarding the overall approach with Laplace transforms: in order to solve an initial-value problem, we integrate both sides of the differential equation after both sides have been multiplied by a more complicated function. The main idea is that we use the transformation given by

\[ \int_0^\infty K(s, t) f(t)\,dt \]

Knowing the prominent role that the exponential function has played throughout our work with differential equations to date, it is not surprising that we choose to use K (s , t ) = e −st . Speciﬁcally, we make the following deﬁnition. Deﬁnition 5.2.1 Let f (t ) be an acceptable function deﬁned on the interval [0, ∞). The Laplace transform of f (t ), denoted L[f ], is the function deﬁned by ∞ L[f ] = e −st f (t ) dt (5.2.5) 0

We note that because $\mathcal{L}[f]$ is a function of $s$, we often write $F(s)$ rather than the more explicit $\mathcal{L}[f(t)]$. We consider an example to see the Laplace transform at work.

Example 5.2.1 Compute the Laplace transform of $f(t) = t$.

Solution. By definition,
$$\mathcal{L}[t] = \int_0^\infty te^{-st}\,dt \tag{5.2.6}$$


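Replacing the improper integral with a limit and integrating by parts, we observe that
$$
\begin{aligned}
\mathcal{L}[t] &= \lim_{r\to\infty}\int_0^r te^{-st}\,dt \\
&= \lim_{r\to\infty}\left[-\frac{1}{s}\left(t+\frac{1}{s}\right)e^{-st}\right]_0^r \\
&= \lim_{r\to\infty}\left[-\frac{1}{s}\left(r+\frac{1}{s}\right)e^{-sr} + \frac{1}{s}\left(0+\frac{1}{s}\right)e^{0}\right] \\
&= \lim_{r\to\infty}\left[-\frac{r}{s}e^{-sr} - \frac{1}{s^2}e^{-sr} + \frac{1}{s^2}\right]
\end{aligned}
\tag{5.2.7}
$$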

By L’Hôpital’s Rule,¹ we know that $re^{-sr} \to 0$ as $r \to \infty$ for each $s > 0$. Combined with the fact that $e^{-sr} \to 0$ as $r \to \infty$, it follows from (5.2.7) that
$$\mathcal{L}[t] = F(s) = \frac{1}{s^2} \tag{5.2.8}$$

Soon we will apply the Laplace transform in order to solve initial-value problems. This process will require us to also use the inverse Laplace transform which asks, “given a function $F(s)$, what function $f(t)$ is such that $\mathcal{L}[f(t)] = F(s)$?” For instance, (5.2.8) tells us we may write
$$\mathcal{L}^{-1}\left[\frac{1}{s^2}\right] = t \tag{5.2.9}$$
Much more on inverse transforms will follow as we progress in our study.

It is not obvious that the Laplace transform of every acceptable function exists. While we omit the proof, it is possible to prove the following theorem by showing that not only does $f(t)$ being acceptable guarantee that $\mathcal{L}[f(t)] = F(s)$ exists, but that $F(s)$ is a function that must tend to 0 as $s \to \infty$.

Theorem 5.2.1 If $f(t)$ is acceptable, then the Laplace transform $F(s)$ of $f(t)$ exists. Moreover,
1. $sF(s)$ is bounded as $s \to \infty$, from which it follows that
2. $\lim_{s\to\infty} F(s) = 0$.
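A brief numerical experiment can make Example 5.2.1 and Theorem 5.2.1 concrete. The sketch below is not from the text: the helper name `laplace_numeric`, the cutoff `upper`, and the choice of Simpson's rule are illustrative assumptions, and the improper integral is only approximated on a finite interval.

```python
import math

def laplace_numeric(f, s, upper=40.0, n=40000):
    """Approximate the integral of f(t) * e^(-s t) over [0, upper] by Simpson's rule.

    Assumption: the tail beyond `upper` is negligible, which holds here because
    the integrand decays like e^(-s t) for the values of s used below.
    """
    h = upper / n  # n must be even for Simpson's rule
    g = lambda t: f(t) * math.exp(-s * t)
    total = g(0.0) + g(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3

# Compare the numerical transform of f(t) = t with the closed form 1/s^2,
# and tabulate s*F(s), which Theorem 5.2.1 says stays bounded as s grows.
for s in (1.0, 2.0, 5.0, 10.0):
    F = laplace_numeric(lambda t: t, s)
    print(f"s = {s:4.1f}   F(s) = {F:.8f}   1/s^2 = {1 / s**2:.8f}   s*F(s) = {s * F:.8f}")
```

The printed columns for $F(s)$ and $1/s^2$ should agree to many digits, while $sF(s)$ behaves like $1/s$.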

Although it is not necessary for a function to be acceptable in order to have a Laplace transform, our focus will be almost exclusively on acceptable functions. In addition, we note that not all elementary functions can be generated by taking the Laplace transform of an acceptable function. For instance, $F(s) = 1$ cannot be the Laplace transform of an acceptable function since both parts of theorem 5.2.1 are contradicted.

¹ $\lim_{r\to\infty} \dfrac{r}{e^{sr}} = \lim_{r\to\infty} \dfrac{1}{se^{sr}} = 0.$


The next three examples further illustrate the definition and notational conventions we use with Laplace transforms.

Example 5.2.2 Compute the Laplace transform of $f(t) = 1$.

Solution. From the definition, we observe that
$$\mathcal{L}[1] = \int_0^\infty e^{-st}\,dt = \lim_{r\to\infty}\left[-\frac{1}{s}e^{-st}\right]_0^r = \lim_{r\to\infty}\left(-\frac{1}{s}e^{-sr} + \frac{1}{s}\right) = \frac{1}{s}$$
since $e^{-sr} \to 0$ as $r \to \infty$.

Example 5.2.3 Find the Laplace transform of $f(t) = e^{at}$.

Solution.

We compute
$$
\begin{aligned}
\mathcal{L}[e^{at}] &= \int_0^\infty e^{at}e^{-st}\,dt = \int_0^\infty e^{(a-s)t}\,dt = \lim_{r\to\infty}\int_0^r e^{(a-s)t}\,dt \\
&= \lim_{r\to\infty}\left[\frac{1}{a-s}e^{(a-s)t}\right]_0^r = \lim_{r\to\infty}\left(\frac{1}{a-s}e^{(a-s)r} - \frac{1}{a-s}\right) = \frac{1}{s-a}
\end{aligned}
$$
provided that $s > a$, for then $e^{(a-s)r} \to 0$ as $r \to \infty$.

At times, we will need to restrict the values of $s$ in order for the Laplace transform to exist. Above, we observed that $\mathcal{L}[e^{at}] = 1/(s-a)$, provided that $s > a$. Usually, we will suppress the discussion of the restriction on $s$-values and simply assume that the domain of the Laplace transform is as large as possible.

Example 5.2.4 Find $\mathcal{L}[\cos kt]$ and $\mathcal{L}[\sin kt]$.

Solution.

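By definition,
$$\mathcal{L}[\cos kt] = \int_0^\infty \cos kt\,e^{-st}\,dt$$
Integrating by parts twice or using a table of integrals,
$$
\begin{aligned}
\mathcal{L}[\cos kt] &= \lim_{r\to\infty}\left[\frac{1}{s^2+k^2}\left(k\sin kt - s\cos kt\right)e^{-st}\right]_0^r \\
&= \lim_{r\to\infty}\left[\frac{1}{s^2+k^2}\left(k\sin kr - s\cos kr\right)e^{-sr} - \frac{1}{s^2+k^2}(0 - s)\right] \\
&= \lim_{r\to\infty}\left[\frac{e^{-sr}k\sin kr}{s^2+k^2} - \frac{e^{-sr}s\cos kr}{s^2+k^2} + \frac{s}{s^2+k^2}\right]
\end{aligned}
\tag{5.2.10}
$$
Since $e^{-sr} \to 0$ as $r \to \infty$ and $|\sin kr|$ and $|\cos kr|$ are bounded by 1 as $r \to \infty$, it follows from (5.2.10) that
$$\mathcal{L}[\cos kt] = \frac{s}{s^2+k^2}$$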


Similar computations show
$$\mathcal{L}[\sin kt] = \frac{k}{s^2+k^2}$$

Table 5.1 Laplace transforms of some basic functions

$f(t)$        $F(s) = \mathcal{L}[f(t)] = \int_0^\infty f(t)e^{-st}\,dt$
$1$           $1/s$
$t$           $1/s^2$
$t^2$         $2/s^3$
$e^{at}$      $1/(s-a)$
$\cos kt$     $s/(s^2+k^2)$
$\sin kt$     $k/(s^2+k^2)$
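The entries of table 5.1 can be spot-checked numerically. The following is a minimal sketch under stated assumptions: the helper `laplace_numeric`, the cutoff `upper`, and the sample values `a = 1`, `k = 2`, `s = 3` (chosen so that `s > a`) are illustrative and not part of the text.

```python
import math

def laplace_numeric(f, s, upper=40.0, n=40000):
    """Simpson's rule for the integral of f(t) * e^(-s t) over [0, upper] (n even)."""
    h = upper / n
    g = lambda t: f(t) * math.exp(-s * t)
    total = g(0.0) + g(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3

a, k, s = 1.0, 2.0, 3.0  # assumed sample values with s > a

# Each row: label, f(t), and the closed form F(s) listed in table 5.1.
table = [
    ("1",      lambda t: 1.0,             1 / s),
    ("t",      lambda t: t,               1 / s**2),
    ("t^2",    lambda t: t * t,           2 / s**3),
    ("e^(at)", lambda t: math.exp(a * t), 1 / (s - a)),
    ("cos kt", lambda t: math.cos(k * t), s / (s**2 + k**2)),
    ("sin kt", lambda t: math.sin(k * t), k / (s**2 + k**2)),
]

max_error = 0.0
for label, f, exact in table:
    approx = laplace_numeric(f, s)
    max_error = max(max_error, abs(approx - exact))
    print(f"{label:8s} numeric = {approx:.8f}   table = {exact:.8f}")
print("largest discrepancy:", max_error)
```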

We close this section with table 5.1, which summarizes the Laplace transforms we have computed so far. Observe that each line in the table may also be written in inverse form. For example, $\mathcal{L}^{-1}[1/(s-a)] = e^{at}$. This will be particularly useful in the next section as we see the first example of how the transform and its inverse can be used to solve an initial-value problem. In order to apply the Laplace transform successfully, we need to develop a deeper understanding of its properties and explore the impact of the transform on a wide range of functions. The following exercises and our investigations in the next section continue our work to this end.

Exercises 5.2

In exercises 1–4, explain why the limit of each function $g(r)$ is 0 as $r \to \infty$. In each, assume $s > 0$.
1. $g(r) = re^{-sr}$
2. $g(r) = r^2e^{-sr}$
3. $g(r) = r^ne^{-sr}$
4. $g(r) = e^{-sr}\sin kr$

In exercises 5–16, use the definition of the Laplace transform to compute $\mathcal{L}[f(t)]$. For each, state the domain of $s$-values on which $\mathcal{L}[f(t)] = F(s)$ is defined.
5. $f(t) = 2t$
6. $f(t) = t - 3$
7. $f(t) = 2 - t$
8. $f(t) = t^2$
9. $f(t) = t^2 - 3$
10. $f(t) = (t-2)^2$
11. $f(t) = e^{3t}$
12. $f(t) = e^{2t-3}$
13. $f(t) = e^{3t+5}$
14. $f(t) = \cos 4t$
15. $f(t) = te^{at}$
16. $f(t) = t\sin 2t$

From examples 5.2.2 and 5.2.1, we know that
$$\mathcal{L}[1] = \frac{1}{s} \quad\text{and}\quad \mathcal{L}[t] = \frac{1}{s^2}$$
Use these facts to compute the Laplace transform of each of the functions in exercises 17–19 with as little computation as possible. What properties of integrals and limits are being used?
17. $f(t) = 1 + t$
18. $f(t) = 3t - 2$
19. $f(t) = c + kt$
20. Explain why the Laplace transform is a linear operator on the vector space of acceptable functions.² That is, explain why for any real numbers $a$ and $b$ and any acceptable functions $f$ and $g$,
$$\mathcal{L}[af(t) + bg(t)] = a\mathcal{L}[f(t)] + b\mathcal{L}[g(t)]$$

5.3 General properties of the Laplace transform

In many ways, the Laplace transform resembles the differentiation and integration operators from calculus. For example, given a function $f(t) = 3t^4 + 5t + 1$, taking the derivative results in a new function $f'(t)$. Using the alternate notation $D[f]$ for the derivative of $f$ with respect to $t$, we see that
$$D[3t^4 + 5t + 1] = 12t^3 + 5$$

² See appendix D for further discussion on linear transformations of vector spaces.

In particular, the “D” operator transforms one function into another. Likewise, if we consider the definite integral of $f(t) = t - 1$ from $t = 0$ to $t = x$, we find that
$$\int_0^x (t-1)\,dt = \frac{1}{2}x^2 - x$$
Letting $I(f) = \int_0^x f(t)\,dt$, we see that $I$ transforms one function $f(t)$ into another function $F(x)$ by the process of integration. In the same way, as we have seen in examples 5.2.1–5.2.4, the Laplace transform takes an acceptable function $f(t)$ and transforms it into a new function $F(s)$ by a process slightly more complicated than standard integration.

From calculus and our preceding work with differential equations, we know that taking the derivative of a function is a linear process, as is calculating the definite integral. More specifically, for any constants $a$ and $b$ and functions $f(t)$ and $g(t)$ that are differentiable and integrable, we know that
$$D[af(t) + bg(t)] = aD[f(t)] + bD[g(t)]$$
and
$$\int_0^x [af(t) + bg(t)]\,dt = a\int_0^x f(t)\,dt + b\int_0^x g(t)\,dt$$

Similarly, because the Laplace transform’s definition involves limits and integrals, it has the same properties of linearity as the derivative and integral operators. In particular, as was shown in exercise 20 of section 5.2, the following theorem holds.

Theorem 5.3.1 For every pair of scalars $a$ and $b$ and acceptable functions $f(t)$ and $g(t)$,
$$\mathcal{L}[af(t) + bg(t)] = a\mathcal{L}[f(t)] + b\mathcal{L}[g(t)] \tag{5.3.1}$$

Theorem 5.3.1 shows that the Laplace transform, like the differential and integral operators, is a linear transformation or linear operator. Formally, a linear transformation is a function $T$ that maps one vector space $V$ to another vector space $W$ where $T$ satisfies the property that for all constants $a$ and $b$ and all elements $u$ and $v$ in $V$, $T(au + bv) = aT(u) + bT(v)$. Appendix D provides further discussion on linear transformations of vector spaces.

In calculus, following the definitions of the derivative and the definite integral, we quickly discover more general properties that enable us to compute derivatives and integrals without using the definition directly. In the same way, while we have seen a few examples of how to use the definition to compute the Laplace transform of certain functions $f(t)$, we can use results such as theorem 5.3.1 to more easily determine the Laplace transform of more complicated functions. Two examples follow.

Example 5.3.1 Find the Laplace transform of $f(t) = 7 - 3e^{2t}$.

Solution. We know from examples 5.2.2 and 5.2.3 that $\mathcal{L}[1] = 1/s$ and $\mathcal{L}[e^{2t}] = 1/(s-2)$. By theorem 5.3.1 it now follows that
$$\mathcal{L}[7 - 3e^{2t}] = 7\mathcal{L}[1] - 3\mathcal{L}[e^{2t}] = \frac{7}{s} - \frac{3}{s-2}$$

We note that the individual Laplace transforms are defined on different domains: $7\mathcal{L}[1] = 7/s$ is valid for $s > 0$, while $3\mathcal{L}[e^{2t}] = 3/(s-2)$ is valid only for $s > 2$. We usually suppress discussion of this issue and assume that $\mathcal{L}[f(t)]$ is defined on the largest interval possible. In example 5.3.1, this domain is $\{s \mid s > 2\}$.

Example 5.3.2 Find the Laplace transform of $\cosh kt$ and $\sinh kt$.

Solution. By definition, the hyperbolic cosine function is given by $\cosh kt = \frac{1}{2}e^{kt} + \frac{1}{2}e^{-kt}$. By the linearity of the Laplace transform, it follows that
$$
\begin{aligned}
\mathcal{L}[\cosh kt] &= \frac{1}{2}\mathcal{L}[e^{kt}] + \frac{1}{2}\mathcal{L}[e^{-kt}] \\
&= \frac{1}{2}\left(\frac{1}{s-k} + \frac{1}{s+k}\right) = \frac{s}{s^2-k^2}
\end{aligned}
$$
Similarly,
$$\mathcal{L}[\sinh kt] = \mathcal{L}\left[\frac{1}{2}\left(e^{kt} - e^{-kt}\right)\right] = \frac{1}{2}\left(\frac{1}{s-k} - \frac{1}{s+k}\right) = \frac{k}{s^2-k^2}$$
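The algebra in examples 5.3.1 and 5.3.2 can be checked with exact rational arithmetic. This is an illustrative sketch only: the function names `L_exp`, `L_cosh`, and `L_sinh` are hypothetical helpers, and the sample values of `k` and `s` are assumptions chosen so that `s > |k|`.

```python
from fractions import Fraction

def L_exp(a, s):
    """Table entry L[e^(at)] = 1/(s - a), evaluated exactly at a rational s > a."""
    return 1 / (Fraction(s) - Fraction(a))

def L_cosh(k, s):
    # Linearity applied to cosh kt = (1/2) e^(kt) + (1/2) e^(-kt).
    return Fraction(1, 2) * L_exp(k, s) + Fraction(1, 2) * L_exp(-k, s)

def L_sinh(k, s):
    # Linearity applied to sinh kt = (1/2) e^(kt) - (1/2) e^(-kt).
    return Fraction(1, 2) * L_exp(k, s) - Fraction(1, 2) * L_exp(-k, s)

k, s = 2, 5  # assumed sample values
print("cosh:", L_cosh(k, s), "vs s/(s^2-k^2) =", Fraction(s, s * s - k * k))
print("sinh:", L_sinh(k, s), "vs k/(s^2-k^2) =", Fraction(k, s * s - k * k))

# Example 5.3.1 at s = 4: 7 L[1] - 3 L[e^(2t)] compared with 7/s - 3/(s - 2).
print("example 5.3.1:", 7 * L_exp(0, 4) - 3 * L_exp(2, 4), "vs", Fraction(7, 4) - Fraction(3, 2))
```

Because the arithmetic is exact, agreement here confirms the simplification of the fractions rather than the underlying integrals.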

In addition to taking linear combinations of functions, we often want to multiply a given function by $t$ or some power of $t$. For example, it is natural to wonder if we can use our work in preceding examples to compute $\mathcal{L}[te^{at}]$. If we first consider the Laplace transforms of the simple power functions $1$, $t$, $t^2$, and so on, we find evidence for a conjecture on how we might approach $\mathcal{L}[te^{at}]$. In particular, note that
$$\mathcal{L}[1] = \frac{1}{s}, \qquad \mathcal{L}[t] = \frac{1}{s^2}, \qquad \mathcal{L}[t^2] = \frac{2}{s^3} \tag{5.3.2}$$

The last result was shown in exercise 8 of section 5.2. In fact, we could go on to show that $\mathcal{L}[t^3] = 6/s^4$. This sequence of results reminds us of derivatives: in particular,
$$\frac{d}{ds}\left[\frac{1}{s}\right] = -\frac{1}{s^2}, \qquad \frac{d}{ds}\left[\frac{1}{s^2}\right] = -\frac{2}{s^3}, \qquad \frac{d}{ds}\left[\frac{2}{s^3}\right] = -\frac{6}{s^4} \tag{5.3.3}$$
From this sequence of examples, it appears that each time we take a given function $f(t) = t^n$ and multiply it by $t$, the transform of the new function is the negative of the derivative of the transform of the original. Using a result from multivariable calculus known as Leibniz’s rule, a formal proof of this fact may be established, not only for power functions, but also for all functions having Laplace transforms. We defer this work to exercise 25 and state the following theorem.

Theorem 5.3.2 If $\mathcal{L}[f(t)] = F(s)$, then
$$\mathcal{L}[tf(t)] = -F'(s) = -\frac{d}{ds}F(s) \tag{5.3.4}$$

Theorem 5.3.2 enables us to expand on our observations above regarding the Laplace transforms of the power functions $t$, $t^2$, $t^3$, and so on. In particular, replacing $F(s)$ with $\mathcal{L}[f(t)]$, we can take the perspective that (5.3.4) implies
$$\mathcal{L}[tf(t)] = -\frac{d}{ds}\mathcal{L}[f(t)] \tag{5.3.5}$$
This shows that, for example,
$$\mathcal{L}[t^4] = \mathcal{L}[t\cdot t^3] = -\frac{d}{ds}\mathcal{L}[t^3] = -\frac{d}{ds}\left[\frac{6}{s^4}\right] = \frac{24}{s^5}$$
In addition, a generalization of this reasoning can be used to show the following corollary to theorem 5.3.2. See exercise 26.

Corollary 5.3.3 For each positive integer $n$,
$$\mathcal{L}[t^nf(t)] = (-1)^nF^{(n)}(s) \tag{5.3.6}$$
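The pattern behind (5.3.2) and (5.3.5) can be traced mechanically. In the sketch below, a transform of the form $c\,s^{-m}$ is stored as the pair `(c, m)`; this representation and the function names are assumptions made for illustration, not notation from the text.

```python
from math import factorial

def multiply_by_t(term):
    """Apply L[t f(t)] = -d/ds F(s) to F(s) = c * s**(-m), stored as (c, m)."""
    c, m = term
    # d/ds [c s^(-m)] = -m c s^(-m-1), so the negative derivative is (m c) s^(-(m+1)).
    return (c * m, m + 1)

def transform_of_power(n):
    """Start from L[1] = 1/s and multiply by t a total of n times."""
    term = (1, 1)
    for _ in range(n):
        term = multiply_by_t(term)
    return term

for n in range(5):
    c, m = transform_of_power(n)
    print(f"L[t^{n}] = {c}/s^{m}   (n! = {factorial(n)})")
```

For `n = 4` the loop reproduces $24/s^5$, matching the computation above, and in each row the numerator equals $n!$.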

We next consider two examples that show how we can use recent results to compute the Laplace transform of familiar functions multiplied by $t$.

Example 5.3.3 Find $\mathcal{L}[te^{at}]$ and $\mathcal{L}[t^2e^{at}]$.

Solution. We know from earlier work that $\mathcal{L}[e^{at}] = 1/(s-a)$. It follows from theorem 5.3.2 that
$$\mathcal{L}[te^{at}] = -\frac{d}{ds}\mathcal{L}[e^{at}] = -\frac{d}{ds}\left[\frac{1}{s-a}\right] = \frac{1}{(s-a)^2}$$
Similarly,
$$\mathcal{L}[t^2e^{at}] = -\frac{d}{ds}\mathcal{L}[te^{at}] = -\frac{d}{ds}\left[\frac{1}{(s-a)^2}\right] = \frac{2}{(s-a)^3}$$
In fact, as we will see in exercise 27, we can show in general that
$$\mathcal{L}[t^ne^{at}] = \frac{n!}{(s-a)^{n+1}} \tag{5.3.7}$$

Example 5.3.4 Find $\mathcal{L}[t\sin kt]$.

Solution. In example 5.2.4, we showed that
$$\mathcal{L}[\sin kt] = \frac{k}{s^2+k^2}$$
Applying theorem 5.3.2, we know that
$$\mathcal{L}[t\sin kt] = -\frac{d}{ds}\left[\frac{k}{s^2+k^2}\right] = \frac{2ks}{(s^2+k^2)^2}$$

As we have noted, we are motivated to develop the Laplace transform by the need to solve initial-value problems that involve unusual forcing functions. For example, we will soon work to solve equations of the form
$$y' + a_0y = f(t) \tag{5.3.8}$$

where $f(t)$ is a step function or other piecewise defined function. We will use our understanding of the Laplace transform to solve these equations by taking the Laplace transform of each side of (5.3.8) to transform the differential equation (in $t$) into an algebraic equation (in $s$). Our hope is that upon doing so, we can solve the new algebraic equation in order to ultimately solve the differential one. To see how this process begins, we take the Laplace transform of both sides of (5.3.8) and apply the linearity property. Doing so results in the equation
$$\mathcal{L}[y'] + a_0\mathcal{L}[y] = \mathcal{L}[f(t)] \tag{5.3.9}$$

Here, we realize that while we can compute $\mathcal{L}[f(t)]$ using the definition or established results, it is unclear how to work with $\mathcal{L}[y']$ and $\mathcal{L}[y]$. Ideally, if we could understand how the Laplace transform $\mathcal{L}[y']$ of the derivative of an unknown function is related to the Laplace transform $\mathcal{L}[y]$ of the function itself, that would enable us to work with one unknown quantity. To this end, we return to the definition and show how $\mathcal{L}[y']$ depends on $\mathcal{L}[y]$.

Let us suppose that $y$ and $y'$ are acceptable functions and that $y$ is continuous. By definition,
$$\mathcal{L}[y'(t)] = \int_0^\infty y'(t)e^{-st}\,dt = \lim_{r\to\infty}\int_0^r y'(t)e^{-st}\,dt \tag{5.3.10}$$
To evaluate $\int_0^r y'(t)e^{-st}\,dt$, we use integration by parts with $u = e^{-st}$ and $dv = y'(t)\,dt$. It follows that $du = -se^{-st}\,dt$ and $v = y(t)$. Integrating³ (5.3.10),
$$
\begin{aligned}
\mathcal{L}[y'(t)] &= \lim_{r\to\infty}\left[\,y(t)e^{-st}\Big|_0^r + s\int_0^r y(t)e^{-st}\,dt\right] \\
&= \lim_{r\to\infty}\left[\,y(r)e^{-sr} - y(0) + s\int_0^r y(t)e^{-st}\,dt\right]
\end{aligned}
\tag{5.3.11}
$$

³ The integration by parts formula holds since $y$ is continuous. If $y$ has a jump discontinuity, then this part of the argument is more complicated.


Since $y$ is an acceptable function, it is of exponential order and $|y(t)| \le Me^{bt}$ for some positive constants $M$ and $b$. Assuming that $s > b$, it follows that $y(r)e^{-sr} \to 0$ as $r \to \infty$. In addition, in (5.3.11) we observe
$$\lim_{r\to\infty} s\int_0^r y(t)e^{-st}\,dt = s\int_0^\infty y(t)e^{-st}\,dt = s\mathcal{L}[y(t)]$$
by the definition of the Laplace transform. Hence, (5.3.11) implies
$$\mathcal{L}[y'(t)] = s\mathcal{L}[y(t)] - y(0) \tag{5.3.12}$$
Our work has proved the following theorem.

Theorem 5.3.4 Suppose $y(t)$ is continuous and $y(t)$ and $y'(t)$ are acceptable. Then
$$\mathcal{L}[y'(t)] = s\mathcal{L}[y(t)] - y(0) \tag{5.3.13}$$

Note particularly the appearance of $y(0)$ in the conclusion of theorem 5.3.4. This foreshadows how we will use the Laplace transform to solve an initial-value problem directly without resorting to a general solution of the associated differential equation. To see further how we will use the Laplace transform, we consider the following example.

Example 5.3.5 Use the Laplace transform to solve the initial-value problem
$$y' + y = e^{-t}, \qquad y(0) = 0 \tag{5.3.14}$$

Solution. We begin by taking the Laplace transform of both sides of (5.3.14) to achieve
$$\mathcal{L}[y'] + \mathcal{L}[y] = \mathcal{L}[e^{-t}] \tag{5.3.15}$$
From example 5.2.3, we know that $\mathcal{L}[e^{-t}] = 1/(s+1)$. Furthermore, we just established
$$\mathcal{L}[y'] = s\mathcal{L}[y] - y(0) \tag{5.3.16}$$
Combining (5.3.15), (5.3.16), and the given fact that $y(0) = 0$, we have
$$s\mathcal{L}[y] + \mathcal{L}[y] = \frac{1}{s+1} \tag{5.3.17}$$
Letting $Y(s) = \mathcal{L}[y]$, factoring, and solving for $Y(s)$,
$$Y(s) = \frac{1}{(s+1)^2} \tag{5.3.18}$$
To solve the initial-value problem, it remains for us to determine the function $y(t)$ whose Laplace transform is $Y(s) = 1/(s+1)^2$. That is, we must find $\mathcal{L}^{-1}[Y(s)] = \mathcal{L}^{-1}[1/(s+1)^2]$. In example 5.3.3, we saw that $\mathcal{L}[te^{at}] = 1/(s-a)^2$. In particular,
$$\mathcal{L}[te^{-t}] = \frac{1}{(s+1)^2} \quad\text{or}\quad \mathcal{L}^{-1}\left[\frac{1}{(s+1)^2}\right] = te^{-t}$$
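The inverse transform just identified, $te^{-t}$, can be compared against a direct numerical solution of (5.3.14). The sketch below uses the classical fourth-order Runge-Kutta method, which is a swapped-in numerical technique and not the method of the text; the function name `rk4` and the step count are assumptions for illustration.

```python
import math

def rk4(rhs, t0, y0, t_end, steps):
    """Integrate y' = rhs(t, y) from t0 to t_end with classical RK4; return y(t_end)."""
    h = (t_end - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        k1 = rhs(t, y)
        k2 = rhs(t + h / 2, y + h * k1 / 2)
        k3 = rhs(t + h / 2, y + h * k2 / 2)
        k4 = rhs(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y

rhs = lambda t, y: math.exp(-t) - y      # (5.3.14) rewritten as y' = e^(-t) - y
candidate = lambda t: t * math.exp(-t)   # inverse transform of 1/(s+1)^2

for t_end in (0.5, 1.0, 2.0, 4.0):
    numeric = rk4(rhs, 0.0, 0.0, t_end, 400)
    print(f"t = {t_end:3.1f}   RK4 = {numeric:.10f}   t e^(-t) = {candidate(t_end):.10f}")
```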


From (5.3.18), it now follows that
$$y(t) = te^{-t}$$
This is precisely the solution we would have found had we applied another method (such as using an integrating factor) to solve (5.3.14).

Note particularly that our work in (5.3.14)–(5.3.17) converted the given initial-value problem (5.3.14) involving $y'$ to an algebraic equation (5.3.17) involving $\mathcal{L}[y] = Y(s)$. We then had to use the inverse Laplace transform in order to determine $y(t)$. This process is typical of how the transform is used to solve IVPs; at this point, we largely need to gain experience with more complicated functions and situations in order to solve more advanced problems.

To help us solve higher order IVPs, we note one more result, which relates the Laplace transform of a second derivative to the transform of the original function. We then proceed to establish additional results on products of familiar functions and on piecewise-defined functions in order to understand the workings of the Laplace transform more fully.

Corollary 5.3.5 Suppose $y(t)$ and $y'(t)$ are continuous and $y(t)$, $y'(t)$, and $y''(t)$ are acceptable. Then
$$\mathcal{L}[y''(t)] = s^2\mathcal{L}[y(t)] - sy(0) - y'(0) \tag{5.3.19}$$
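Both derivative rules, (5.3.13) and (5.3.19), can be checked numerically on a concrete function. The sketch below assumes the test function $y(t) = \cos 2t$ and the sample value $s = 1.5$; the helper `laplace_numeric` and its cutoff are illustrative choices, not part of the text.

```python
import math

def laplace_numeric(f, s, upper=40.0, n=40000):
    """Simpson's rule for the integral of f(t) * e^(-s t) over [0, upper] (n even)."""
    h = upper / n
    g = lambda t: f(t) * math.exp(-s * t)
    total = g(0.0) + g(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3

k, s = 2.0, 1.5                           # assumed sample values
y   = lambda t: math.cos(k * t)           # y(0) = 1
dy  = lambda t: -k * math.sin(k * t)      # y'(0) = 0
d2y = lambda t: -k * k * math.cos(k * t)

Y = laplace_numeric(y, s)

# First-derivative rule (5.3.13): L[y'] = s L[y] - y(0)
lhs_first, rhs_first = laplace_numeric(dy, s), s * Y - y(0.0)
# Second-derivative rule (5.3.19): L[y''] = s^2 L[y] - s y(0) - y'(0)
lhs_second, rhs_second = laplace_numeric(d2y, s), s**2 * Y - s * y(0.0) - dy(0.0)

print(f"L[y']  = {lhs_first:.8f}   s L[y] - y(0)              = {rhs_first:.8f}")
print(f"L[y''] = {lhs_second:.8f}   s^2 L[y] - s y(0) - y'(0) = {rhs_second:.8f}")
```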

The proof of corollary 5.3.5 is straightforward by two applications of theorem 5.3.4; see exercise 28.

In theorem 5.3.2, we computed the Laplace transform of $tf(t)$ in terms of the Laplace transform of $f(t)$. In addition to multiplying by $t$ (or powers of $t$), another function that arises frequently in the study of differential equations is $e^{at}$. Hence we are naturally interested in how $\mathcal{L}[e^{at}f(t)]$ is related to $\mathcal{L}[f(t)]$. Letting $f(t)$ be an acceptable function and $\mathcal{L}[f(t)] = F(s)$, we have by definition that
$$F(s) = \int_0^\infty f(t)e^{-st}\,dt \tag{5.3.20}$$
For the Laplace transform of $e^{at}f(t)$, we note that $e^{at}f(t)$ is an acceptable function and, by definition,
$$\mathcal{L}[e^{at}f(t)] = \int_0^\infty e^{at}f(t)e^{-st}\,dt = \int_0^\infty f(t)e^{-(s-a)t}\,dt \tag{5.3.21}$$
From the right-hand sides of (5.3.20) and (5.3.21), we observe that the only difference is that $s$ has been replaced by $s - a$. In particular, $\mathcal{L}[e^{at}f(t)] = F(s-a)$, where $\mathcal{L}[f(t)] = F(s)$. We say that $F(s)$ has been shifted by multiplying $f(t)$ by $e^{at}$ and call the theorem we have just proved the first shifting property, which is stated as follows.


Theorem 5.3.6 (First Shifting Property). Let $f(t)$ be acceptable and $\mathcal{L}[f(t)] = F(s)$. For any real value of $a$,
$$\mathcal{L}[e^{at}f(t)] = F(s-a)$$

In the next example, we compute three Laplace transforms to show the straightforward application of theorem 5.3.6.

Example 5.3.6 Find $\mathcal{L}[e^{at}\cos kt]$, $\mathcal{L}[e^{at}\sin kt]$, and $\mathcal{L}[e^{at}t^2]$.

Solution. We have already established that
$$\mathcal{L}[\cos kt] = \frac{s}{s^2+k^2}$$
so by the first shifting property,
$$\mathcal{L}[e^{at}\cos kt] = \frac{s-a}{(s-a)^2+k^2}$$
Similarly, from the fact that
$$\mathcal{L}[\sin kt] = \frac{k}{s^2+k^2}$$
we observe
$$\mathcal{L}[e^{at}\sin kt] = \frac{k}{(s-a)^2+k^2}$$
Finally,
$$\mathcal{L}[t^2] = \frac{2}{s^3}$$
and theorem 5.3.6 together imply
$$\mathcal{L}[e^{at}t^2] = \frac{2}{(s-a)^3}$$
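The three results of example 5.3.6 can be spot-checked by comparing a numerical transform of $e^{at}f(t)$ with the shifted closed form $F(s-a)$. This is a sketch under assumptions: the helpers `laplace_numeric` and `shift` and the sample values `a = 1`, `k = 2`, `s = 3` are illustrative and not from the text.

```python
import math

def laplace_numeric(f, s, upper=40.0, n=40000):
    """Simpson's rule for the integral of f(t) * e^(-s t) over [0, upper] (n even)."""
    h = upper / n
    g = lambda t: f(t) * math.exp(-s * t)
    total = g(0.0) + g(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3

def shift(F, a):
    """First shifting property: turn F into the function s -> F(s - a)."""
    return lambda s: F(s - a)

a, k, s = 1.0, 2.0, 3.0  # assumed sample values with s > a

# Each row: label, the product e^(at) f(t), and the unshifted transform F of f.
cases = [
    ("e^(at) cos kt", lambda t: math.exp(a * t) * math.cos(k * t), lambda z: z / (z**2 + k**2)),
    ("e^(at) sin kt", lambda t: math.exp(a * t) * math.sin(k * t), lambda z: k / (z**2 + k**2)),
    ("e^(at) t^2",    lambda t: math.exp(a * t) * t**2,            lambda z: 2 / z**3),
]

errors = []
for label, product, F in cases:
    numeric = laplace_numeric(product, s)
    shifted = shift(F, a)(s)
    errors.append(abs(numeric - shifted))
    print(f"{label:14s} numeric = {numeric:.8f}   F(s - a) = {shifted:.8f}")
```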

A summary of the results we established in this section follows in table 5.2.

Table 5.2 Summary of results on the Laplace transform from section 5.3

$f(t)$              $F(s) = \mathcal{L}[f(t)] = \int_0^\infty f(t)e^{-st}\,dt$
$af(t) + bg(t)$     $a\mathcal{L}[f(t)] + b\mathcal{L}[g(t)]$
$tf(t)$             $-F'(s) = -\frac{d}{ds}\mathcal{L}[f(t)]$
$t^nf(t)$           $(-1)^nF^{(n)}(s)$
$f'(t)$             $s\mathcal{L}[f(t)] - f(0) = sF(s) - f(0)$
$f''(t)$            $s^2\mathcal{L}[f(t)] - sf(0) - f'(0) = s^2F(s) - sf(0) - f'(0)$
$e^{at}f(t)$        $F(s-a)$

Exercises 5.3

In exercises 1–5, use the linearity property and the transforms derived in the examples to find the Laplace transform of the given function.
1. $f(t) = 3 - e^t$
2. $f(t) = 4\cos t + 2\sin t$
3. $f(t) = 3e^{2t} - 3\sin 2t$
4. $f(t) = 2 + 5\sin 3t$
5. $f(t) = 4\cos 5t - 6e^{-2t}$

In exercises 6–11, use theorem 5.3.2 or corollary 5.3.3 and the transforms derived in the examples to find the Laplace transform of the given function.
6. $f(t) = 3te^{3t}$
7. $f(t) = t^2e^{-t}$
8. $f(t) = 3t\cos 4t$
9. $f(t) = t^3\sin t$
10. $f(t) = t^2\cos t$
11. $f(t) = 4\cos 5t - 6e^{-2t}$

In exercises 12–17, use the first shifting property and the transforms derived in the examples to find the Laplace transform of the given function.
12. $f(t) = 3te^{3t}$
13. $f(t) = t^2e^{-t}$
14. $f(t) = e^{-2t}\cos 4t$
15. $f(t) = e^{-t}\sin 2t$
16. $f(t) = e^{4t}\sinh 2t$
17. $f(t) = \cosh 2t\,\sin 3t$

In exercises 18–24, use established general properties and the transforms derived in the examples to find the Laplace transform of the given function.
18. $f(t) = 3te^{3t} - e^{2t}\cos t$
19. $f(t) = 4t^2e^{-t} + 7e^{-3t}\sin t$
20. $f(t) = e^{-2t}(t^2 + 4t + 5)$


21. $f(t) = (t^2 - t)\sin t$
22. $f(t) = t(\cos 4t - 2\sin 4t)$
23. $f(t) = te^{-t}\sin 2t$
24. $f(t) = t^2e^{-t}\sin 2t$
25. In multivariable calculus, students may have encountered Leibniz’s rule, which allows differentiation across the integral sign. In particular, the rule states that under reasonable hypotheses on a function $K(s,t)$,
$$\frac{d}{ds}\int_{t=a}^{t=b} K(s,t)\,dt = \int_{t=a}^{t=b} \frac{\partial}{\partial s}[K(s,t)]\,dt$$
Use Leibniz’s rule to explain why theorem 5.3.2 is true. In particular, show that if $F(s) = \mathcal{L}[f(t)]$, then
$$-F'(s) = \mathcal{L}[tf(t)]$$
26. Using the rule established in theorem 5.3.2, show why corollary 5.3.3 is true. Specifically, show that if $n$ is a positive integer, then
$$\mathcal{L}[t^nf(t)] = (-1)^nF^{(n)}(s)$$
(Hint: Apply the theorem to $\mathcal{L}[t\cdot t^{n-1}f(t)]$ to show that
$$\mathcal{L}[t^nf(t)] = -\frac{d}{ds}\mathcal{L}[t^{n-1}f(t)]$$
and then repeat this line of reasoning on the expression $\mathcal{L}[t^{n-1}f(t)]$.)
27. Use corollary 5.3.3 to show that
$$\mathcal{L}[t^ne^{at}] = \frac{n!}{(s-a)^{n+1}}$$
28. Apply theorem 5.3.4 twice to prove corollary 5.3.5.
29. Express $\mathcal{L}[f^{(4)}(t)]$ in terms of $\mathcal{L}[f(t)]$ and the first three derivatives of $f(t)$ at $t = 0$ by using theorem 5.3.4.
30. We have established that $\mathcal{L}[e^{at}] = 1/(s-a)$ for any real number $a$. Assume now that this formula holds for any complex number $a = \alpha + \beta i$, and hence compute the Laplace transform
$$\mathcal{L}[e^{(\alpha+\beta i)t}]$$
Use Euler’s formula and properties of complex numbers to show that
$$\mathcal{L}[e^{\alpha t}(\cos\beta t + i\sin\beta t)] = \frac{s-\alpha}{(s-\alpha)^2+\beta^2} + i\,\frac{\beta}{(s-\alpha)^2+\beta^2}$$
Explain how equating real and imaginary parts produces an alternate derivation for the Laplace transforms of $e^{\alpha t}\cos\beta t$ and $e^{\alpha t}\sin\beta t$.

Figure 5.3 The translated unit step function $u(t-a)$.

5.4 Piecewise continuous functions

In physical applications, we sometimes encounter step functions that represent some quantity being turned on or off, such as an electric switch. If a mass in a spring-mass system is struck with a hammer or a drug is delivered by muscle injection, impulse functions that involve forces acting over very short time periods play a key role. To help us address these and related situations, we study the application of the Laplace transform to two important functions: the Heaviside function and the Dirac delta function.

5.4.1 The Heaviside function

We define the Heaviside function, or unit step function, denoted $u(t)$, to be the function that is 0 for all $t < 0$ and 1 for all $t \ge 0$. That is,
$$u(t) = \begin{cases} 0, & \text{if } t < 0 \\ 1, & \text{if } t \ge 0 \end{cases} \tag{5.4.1}$$
Often, we will make use of a step function that turns on at $t = a$, rather than $t = 0$. Thus we employ the translated unit step function, $u(t-a)$, which by (5.4.1) is given by
$$u(t-a) = \begin{cases} 0, & \text{if } t < a \\ 1, & \text{if } t \ge a \end{cases} \tag{5.4.2}$$
A plot of the translated unit step function is given in figure 5.3.

Step functions may be used to turn other functions on or off. For example, if we consider the function $f(t) = (4-t)u(t-4)$, we observe that since $u(t-4) = 0$ for $t < 4$ and $u(t-4) = 1$ for $t \ge 4$, it follows
$$f(t) = \begin{cases} 0, & \text{if } t < 4 \\ 4 - t, & \text{if } t \ge 4 \end{cases} \tag{5.4.3}$$
From this perspective, we see that the function $(4-t)$ is off until $t = 4$, at which time it is turned on.

To see how we can use step functions to turn another function both on and off at various times, we consider the function $g(t) = u(t-a) - u(t-b)$, where $a < b$. This difference of translated unit step functions turns on for $a \le t < b$ and turns off when $t \ge b$. More specifically, for $t < a$, both $u(t-a)$ and $u(t-b)$ are zero, so $g(t) = 0$. For $a \le t < b$, $u(t-a) = 1$ and $u(t-b) = 0$, thus $g(t) = 1$. And finally, once $t \ge b$, both $u(t-a) = 1$ and $u(t-b) = 1$, so that $g(t) = 0$. This can be written equivalently as
$$g(t) = \begin{cases} 0, & \text{if } t < a \\ 1, & \text{if } a \le t < b \\ 0, & \text{if } t \ge b \end{cases} \tag{5.4.4}$$
This property of the function $u(t-a) - u(t-b)$ enables us to write a single formula for any piecewise-defined function that arises, rather than the traditional cases format where we stipulate the different formulas on different intervals, as in (5.4.4). The next example demonstrates the role of $u(t-a) - u(t-b)$.

Example 5.4.1 Define the following piecewise function using unit step functions.
$$f(t) = \begin{cases} t, & \text{if } 0 \le t < 2 \\ 2, & \text{if } 2 \le t < 4 \\ 0, & \text{otherwise} \end{cases}$$

Solution. We use the fact that the function $u(t) - u(t-2)$ is 1 in the interval $0 \le t < 2$ and 0 otherwise, and $u(t-2) - u(t-4)$ is 1 on $2 \le t < 4$ and 0 otherwise. Thus, we turn on $t$ for $0 \le t < 2$ and turn on 2 for $2 \le t < 4$ by writing
$$
\begin{aligned}
f(t) &= t[u(t) - u(t-2)] + 2[u(t-2) - u(t-4)] \\
&= tu(t) + (2-t)u(t-2) - 2u(t-4)
\end{aligned}
$$
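The single-formula representation in example 5.4.1 can be checked pointwise against the original cases definition. In this sketch the function names `u`, `window`, `f_cases`, and `f_steps` and the list of sample points are assumptions made for illustration.

```python
def u(t):
    """Heaviside unit step (5.4.1): 0 for t < 0 and 1 for t >= 0."""
    return 1.0 if t >= 0 else 0.0

def window(t, a, b):
    """u(t - a) - u(t - b): equals 1 on a <= t < b and 0 elsewhere, for a < b."""
    return u(t - a) - u(t - b)

def f_cases(t):
    """The piecewise definition of f from example 5.4.1."""
    if 0 <= t < 2:
        return t
    if 2 <= t < 4:
        return 2.0
    return 0.0

def f_steps(t):
    """The single-formula version t u(t) + (2 - t) u(t - 2) - 2 u(t - 4)."""
    return t * u(t) + (2 - t) * u(t - 2) - 2 * u(t - 4)

# Sample points include each breakpoint, where the half-open intervals matter.
samples = [-1.0, 0.0, 0.5, 1.99, 2.0, 3.0, 3.99, 4.0, 10.0]
for t in samples:
    print(f"t = {t:5.2f}   cases = {f_cases(t):.2f}   steps = {f_steps(t):.2f}   window(0,2) = {window(t, 0, 2):.0f}")
```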

A plot of $f(t)$ is shown in figure 5.4.

Figure 5.4 The function $f(t)$ in example 5.4.1.

At this point, we should again not lose sight of our goal: we are interested in using Laplace transforms to solve initial-value problems such as
$$y'' + 2y' + 5y = u(t-2), \qquad y(0) = 1, \quad y'(0) = 0$$
where the forcing function is turned on at time $t = 2$. Since we will solve such equations by taking the Laplace transform of both sides, we must understand the transform of basic step functions. In fact, since step functions will be used to turn other functions on and off, we are more generally interested in $\mathcal{L}[u(t-a)f(t)]$. We return to the definition to explore this situation further. Because we will employ a change of variables in our work, we begin by using $z$ as a different variable of integration than the usual $t$ in the definition. Specifically, from the definition of the Laplace transform we have
$$\mathcal{L}[u(t-a)f(t)] = \int_0^\infty u(z-a)f(z)e^{-sz}\,dz = \int_a^\infty f(z)e^{-sz}\,dz$$

The second equality follows from the fact that $u(z-a) = 0$ for all $z < a$ and $u(z-a) = 1$ for all $z \ge a$, which allows us to eliminate the presence of the unit step function. We now employ the substitution $z = t + a$ and note that $t = z - a$ and $dz = dt$. From this and our work above, we see
$$
\begin{aligned}
\mathcal{L}[u(t-a)f(t)] &= \int_a^\infty f(z)e^{-sz}\,dz \\
&= \lim_{r\to\infty}\int_{z=a}^{z=r} f(z)e^{-sz}\,dz \\
&= \lim_{r\to\infty}\int_{t=0}^{t=r-a} f(t+a)e^{-s(t+a)}\,dt \\
&= \lim_{r\to\infty}\int_{t=0}^{t=r-a} f(t+a)e^{-st}e^{-as}\,dt
\end{aligned}
\tag{5.4.5}
$$

In (5.4.5), since $e^{-as}$ is constant with respect to $t$, we can remove it from the integral. Moreover, we can take the limit as $r \to \infty$ and note that $(r - a) \to \infty$ as well. From this, we now have
$$\mathcal{L}[u(t-a)f(t)] = e^{-as}\int_0^\infty f(t+a)e^{-st}\,dt$$


On the right, we observe that the Laplace transform of $f(t+a)$ has arisen, and therefore
$$\mathcal{L}[u(t-a)f(t)] = e^{-as}\mathcal{L}[f(t+a)]$$
We call this result the second shifting property and state it formally in the next theorem.

Theorem 5.4.1 (Second Shifting Property) If $f(t)$ has a Laplace transform, then
$$\mathcal{L}[u(t-a)f(t)] = e^{-as}\mathcal{L}[f(t+a)] \tag{5.4.6}$$

When working with inverse transforms, we’ll often use the equivalent formulations of this result that
$$\mathcal{L}[u(t-a)f(t-a)] = e^{-as}\mathcal{L}[f(t)] \quad\text{or}\quad \mathcal{L}^{-1}[e^{-as}F(s)] = u(t-a)f(t-a) \tag{5.4.7}$$
which come from replacing $t$ with $t - a$ in the argument of $f$.

To see how the second shifting property works and gain more experience with the roles played by unit step functions, we consider several examples.

Example 5.4.2

Determine the Laplace transform of the step function, u(t − 3).

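Solution. We can view $u(t-3)$ as the function $u(t-3)\cdot 1$. Since we know that $\mathcal{L}[1] = 1/s$, by the second shifting property it follows that
$$\mathcal{L}[u(t-3)] = \mathcal{L}[u(t-3)\cdot 1] = e^{-3s}\mathcal{L}[1] = \frac{e^{-3s}}{s}$$
More generally, we can show that for any $a \ge 0$,
$$\mathcal{L}[u(t-a)] = \frac{e^{-as}}{s} \tag{5.4.8}$$

Example 5.4.3 Determine the Laplace transform of $f(t) = u(t-3)\,t^2$.

Solution. With $t^2$ playing the role of the function multiplied by the step in theorem 5.4.1, by the second shifting property we have
$$
\begin{aligned}
\mathcal{L}[u(t-3)\,t^2] &= e^{-3s}\mathcal{L}[(t+3)^2] \\
&= e^{-3s}\mathcal{L}[t^2 + 6t + 9] \\
&= e^{-3s}\left(\frac{2}{s^3} + \frac{6}{s^2} + \frac{9}{s}\right)
\end{aligned}
$$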
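Example 5.4.3 can be checked numerically from both sides of (5.4.6). The sketch below integrates $f(t)e^{-st}$ from $t = a$ (where the step turns on) and separately evaluates $e^{-as}$ times the transform of $f(t+a)$; the helper `simpson`, the cutoff `upper`, and the sample value `s = 1.5` are assumptions for illustration.

```python
import math

def simpson(g, lo, hi, n=40000):
    """Composite Simpson's rule for the integral of g over [lo, hi] (n even)."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

a, s, upper = 3.0, 1.5, 43.0   # upper is an assumed cutoff standing in for infinity
f = lambda t: t * t

# Left side of (5.4.6): u(t - a) f(t) vanishes for t < a, so integrate from a.
left = simpson(lambda t: f(t) * math.exp(-s * t), a, upper)

# Right side of (5.4.6): e^(-as) times the transform of the shifted function f(t + a).
right = math.exp(-a * s) * simpson(lambda t: f(t + a) * math.exp(-s * t), 0.0, upper - a)

# Closed form from example 5.4.3.
closed = math.exp(-3 * s) * (2 / s**3 + 6 / s**2 + 9 / s)

print(f"left = {left:.10f}   right = {right:.10f}   closed form = {closed:.10f}")
```

Starting the first integral at $t = a$ avoids placing the jump of the step function inside a Simpson panel, which keeps the numerical error small.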

Example 5.4.4

Determine the Laplace transform of f (t ) = u(t − a) − u(t − b).


Solution. Because we know $\mathcal{L}[u(t-a)] = e^{-as}/s$, we can use the linearity of the Laplace transform to find
$$\mathcal{L}[u(t-a) - u(t-b)] = \frac{1}{s}e^{-as} - \frac{1}{s}e^{-bs} = \frac{1}{s}\left(e^{-as} - e^{-bs}\right)$$

With our understanding of the Laplace transform of step functions and the second shifting property, we are now prepared to compute transforms of a wide range of step functions.

Example 5.4.5 Find the Laplace transform of
$$f(t) = \begin{cases} 1, & \text{if } 0 \le t < 1 \\ t, & \text{if } 1 \le t < 2 \\ 2, & \text{if } 2 \le t \end{cases}$$

Solution. We first use step functions to write $f(t)$ with a single formula. Using $u(t) - u(t-1)$ to turn 1 on and off, and similar ideas for $t$ and 2, we have
$$
\begin{aligned}
f(t) &= 1[u(t) - u(t-1)] + t[u(t-1) - u(t-2)] + 2u(t-2) \\
&= u(t) + (t-1)u(t-1) + (2-t)u(t-2)
\end{aligned}
$$
Using the linearity of the Laplace transform, the second shifting property, and familiar transforms,
$$
\begin{aligned}
\mathcal{L}[f(t)] &= \mathcal{L}[u(t)] + \mathcal{L}[(t-1)u(t-1)] + \mathcal{L}[(2-t)u(t-2)] \\
&= \frac{1}{s} + e^{-s}\mathcal{L}[(t+1) - 1] + e^{-2s}\mathcal{L}[2 - (t+2)] \\
&= \frac{1}{s} + e^{-s}\mathcal{L}[t] + e^{-2s}\mathcal{L}[-t] \\
&= \frac{1}{s} + \frac{1}{s^2}e^{-s} - \frac{1}{s^2}e^{-2s}
\end{aligned}
$$

Example 5.4.6 Find the Laplace transform of $f(t)$, where $f(t)$ is the piecewise linear function shown in the following graph.

[Graph of the piecewise linear function $f(t)$, with vertical axis $y$ and horizontal axis $t$.]

Solution. From the graph, we see that $f$ has slope 1 on $[0, 2)$ and slope $-2$ on $[2, 3)$. Therefore, $f$ can be defined piecewise by the rule
$$f(t) = \begin{cases} t, & \text{if } 0 \le t < 2 \\ 6 - 2t, & \text{if } 2 \le t < 3 \\ 0, & \text{if } 3 \le t \end{cases}$$
Using step functions, we can write $f$ according to the formula
$$
\begin{aligned}
f(t) &= t[u(t) - u(t-2)] + (6-2t)[u(t-2) - u(t-3)] \\
&= tu(t) + (6-3t)u(t-2) - (6-2t)u(t-3)
\end{aligned}
$$
Applying the second shifting property, linearity, and familiar transforms, we see that
$$
\begin{aligned}
\mathcal{L}[f(t)] &= \mathcal{L}[tu(t)] + \mathcal{L}[(6-3t)u(t-2)] - \mathcal{L}[(6-2t)u(t-3)] \\
&= \mathcal{L}[t] + e^{-2s}\mathcal{L}[6 - 3(t+2)] - e^{-3s}\mathcal{L}[6 - 2(t+3)] \\
&= \mathcal{L}[t] + e^{-2s}\mathcal{L}[-3t] - e^{-3s}\mathcal{L}[-2t] \\
&= \frac{1}{s^2} - \frac{3}{s^2}e^{-2s} + \frac{2}{s^2}e^{-3s}
\end{aligned}
$$
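The closed forms found in examples 5.4.5 and 5.4.6 can be compared with direct numerical integration of the original piecewise definitions, one smooth piece at a time. This is a sketch under assumptions: the helper names, the representation of a piece as a `(lo, hi, formula)` tuple, and the cutoff `UPPER` are illustrative choices.

```python
import math

def simpson(g, lo, hi, n=20000):
    """Composite Simpson's rule for the integral of g over [lo, hi] (n even)."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

def laplace_piecewise(pieces, s):
    """Sum the integrals of formula(t) * e^(-s t) over each (lo, hi, formula) piece."""
    total = 0.0
    for lo, hi, formula in pieces:
        total += simpson(lambda t: formula(t) * math.exp(-s * t), lo, hi)
    return total

UPPER = 42.0  # assumed cutoff standing in for infinity

# Piecewise definitions copied from examples 5.4.5 and 5.4.6.
example_545 = [(0.0, 1.0, lambda t: 1.0), (1.0, 2.0, lambda t: t), (2.0, UPPER, lambda t: 2.0)]
example_546 = [(0.0, 2.0, lambda t: t), (2.0, 3.0, lambda t: 6 - 2 * t)]

def F_545(s):
    return 1 / s + math.exp(-s) / s**2 - math.exp(-2 * s) / s**2

def F_546(s):
    return 1 / s**2 - 3 * math.exp(-2 * s) / s**2 + 2 * math.exp(-3 * s) / s**2

for s in (1.0, 2.0):
    print(f"s = {s}: 5.4.5 numeric = {laplace_piecewise(example_545, s):.8f}, closed = {F_545(s):.8f}")
    print(f"s = {s}: 5.4.6 numeric = {laplace_piecewise(example_546, s):.8f}, closed = {F_546(s):.8f}")
```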

At this point, we have become familiar with piecewise-defined functions and how the Laplace transform may be applied to them. In the near future, we will be solving initial-value problems of the form
$$y' + 2y = 6u(t-4), \qquad y(0) = 1$$
through the use of Laplace transforms. In order to assess our progress to date, we explore this approach briefly here. Taking the transform of both sides of the differential equation,
$$s\mathcal{L}[y] - 1 + 2\mathcal{L}[y] = \frac{6e^{-4s}}{s}$$
Letting $Y(s) = \mathcal{L}[y]$ and solving for $Y(s)$, it follows that
$$Y(s)(s+2) = 1 + \frac{6e^{-4s}}{s}$$
so
$$Y(s) = \frac{1}{s+2} + 6e^{-4s}\,\frac{1}{s(s+2)} \tag{5.4.9}$$
Here, it remains to determine the function $y(t)$ whose Laplace transform is $Y(s)$. That is, we must compute the inverse Laplace transform of the right-hand side of (5.4.9). Doing so involves using the inverse perspective on the second shifting property, as well as some algebraic work with the quantity $1/(s(s+2))$. We will pursue these and related ideas further in subsequent sections.
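As a small illustration of the inverse perspective mentioned above, the simplest pieces can be inverted using only results already in hand. The choices $a = 2$ and $F(s) = 1/s^2$ below are made purely for illustration; this is not the full inversion of (5.4.9), which the text leaves to later sections.

```latex
% Inverse form (5.4.7) with a = 2 and F(s) = 1/s^2, so that f(t) = t by (5.2.9):
\mathcal{L}^{-1}\!\left[\frac{e^{-2s}}{s^2}\right] = u(t-2)\,(t-2)
  = \begin{cases} 0, & \text{if } t < 2 \\ t - 2, & \text{if } t \ge 2 \end{cases}

% Check with the forward form (5.4.6):
\mathcal{L}[u(t-2)(t-2)] = e^{-2s}\,\mathcal{L}[(t+2) - 2] = e^{-2s}\,\mathcal{L}[t] = \frac{e^{-2s}}{s^2}

% The first term of (5.4.9) inverts directly from table 5.1 with a = -2:
\mathcal{L}^{-1}\!\left[\frac{1}{s+2}\right] = e^{-2t}
```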


Next, however, we turn our attention to the study of impulse functions that can model phenomena such as the striking of a hammer.

5.4.2 The Dirac delta function

In physical situations where a large force is delivered over a very short time interval, unit step functions are no longer sufficient to model the forcing function. For example, if a hammer is used to strike a mass attached to a spring at a given time, it is not immediately clear how we should represent this forcing function. To address this situation, physicist Paul Dirac proposed what is today called the Dirac delta function, denoted $\delta(t)$. We seek to understand this function by first examining what happens when a force of constant magnitude acts over a smaller and smaller time interval.

Suppose that a force $F_h$ of constant magnitude acts on an object over the time interval $[a-h, a+h]$, where $a > 0$. Assume that the force is zero otherwise. The impulse (or amount of push) of the force is defined by
$$I = \int_{a-h}^{a+h} F_h\,dt \tag{5.4.10}$$

If we want this constant force $F_h$ to deliver a one-unit impulse, it follows that

$$F_h = \frac{1}{2h}$$

More specifically, if we wish to view the delivered force $F_h$ as being generated by a forcing function $F_h(t)$, we can use the unit step function to express $F_h(t)$ through the formula

$$F_h(t) = \frac{1}{2h}\,[u(t-(a-h)) - u(t-(a+h))] \tag{5.4.11}$$

A plot of $F_h(t)$ for several different values of $h$ is shown in figure 5.5; the vertical lines in each are technically not a part of the graph of $F_h(t)$, but are included to help contrast the different values of $h$.

Figure 5.5 The forcing function $F_h(t)$ for $h = 0.2$, $h = 0.1$, and $h = 0.05$.

Note particularly that $F_h(t)$ satisfies the property that

$$\int_{-\infty}^{\infty} F_h(t)\,dt = 1 \tag{5.4.12}$$

and that as $h \to 0$, the magnitude of the force grows without bound in order to maintain the same total amount of push being delivered. For an actual impulse, such as when a hammer strikes a mass, we want the force to act instantaneously at time $t = a$, where $a > 0$. This instantaneous impulse function is known as the Dirac delta function (see footnote 4), denoted $\delta(t-a)$, and is determined by letting $h \to 0$ in $F_h(t)$. In particular, we note two key properties of $\delta(t-a)$:

I. $\delta(t-a) = \lim_{h \to 0} F_h(t) = \lim_{h \to 0} \frac{1}{2h}\,[u(t-(a-h)) - u(t-(a+h))]$

II. $\int_{-\infty}^{\infty} \delta(t-a)\,dt = 1$

Property I is the definition of the Dirac delta function; Property II is a consequence of (5.4.12) and taking the limit as $h \to 0$. A good way to think of $\delta(t-a)$ is as a function that is zero everywhere except at $a$, but infinite right at $a$. Actually, $\delta(t-a)$ is a limit of step functions that are nonzero over shorter and shorter intervals, but that always enclose an area of one unit, thus having spikes that grow in magnitude as the interval width shrinks.

In situations such as a mass being struck with a hammer, we can now use the delta function to model the forcing function. For instance, if a hammer strikes the mass at $t = 3$, we can model the forcing function by $f(t) = \delta(t-3)$.

In order to solve initial-value problems that involve the delta function, it will be essential to know the Laplace transform $\mathcal{L}[\delta(t-a)]$. To find it, we first apply the transform to the step function $F_h(t)$. In particular, by familiar properties of the Laplace transform,

$$\begin{aligned}
\mathcal{L}[F_h(t)] &= \mathcal{L}\left[\frac{1}{2h}\,[u(t-(a-h)) - u(t-(a+h))]\right] \\
&= \frac{1}{2h}\mathcal{L}[u(t-(a-h))] - \frac{1}{2h}\mathcal{L}[u(t-(a+h))] \\
&= \frac{1}{2h}\cdot\frac{1}{s}\,e^{-(a-h)s} - \frac{1}{2h}\cdot\frac{1}{s}\,e^{-(a+h)s} \\
&= \frac{e^{-as}}{2hs}\left(e^{hs} - e^{-hs}\right)
\end{aligned} \tag{5.4.13}$$

4 Technically, the Dirac delta function is not a function, because it has the unusual property that it is zero everywhere but a, and inﬁnite at t = a. Ultimately, the Laplace transform is what enables us to make sense of this function.


Since $\delta(t-a)$ is defined as the limit of $F_h(t)$ as $h \to 0$, we naturally define the Laplace transform of $\delta(t-a)$ to be the limit of the Laplace transform of $F_h(t)$ as $h \to 0$. In particular, from (5.4.13), some algebraic rearrangement, and an application of L'Hôpital's Rule, we can state that

$$\begin{aligned}
\lim_{h \to 0} \mathcal{L}[F_h(t)] &= \lim_{h \to 0} \frac{e^{-as}}{2hs}\left(e^{hs} - e^{-hs}\right) \\
&= \frac{e^{-as}}{s}\,\lim_{h \to 0} \frac{e^{hs} - e^{-hs}}{2h} \\
&= \frac{e^{-as}}{s}\,\lim_{h \to 0} \frac{se^{hs} + se^{-hs}}{2} \\
&= \frac{e^{-as}}{s}\cdot s = e^{-as}
\end{aligned}$$
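The limit just computed can also be observed numerically. The following is a minimal Python sketch (not part of the text's Maple material) that evaluates the closed form (5.4.13) for shrinking values of $h$; the function name `transform_of_Fh` and the sample values $a = 3$, $s = 1.5$ are illustrative assumptions.

```python
import math

def transform_of_Fh(s, a, h):
    """Closed form (5.4.13): L[F_h](s) = e^{-as} (e^{hs} - e^{-hs}) / (2hs)."""
    return math.exp(-a * s) * (math.exp(h * s) - math.exp(-h * s)) / (2 * h * s)

# Illustrative values only; any a > 0 and s > 0 would do.
a, s = 3.0, 1.5
target = math.exp(-a * s)  # the claimed limit e^{-as}

# As h shrinks, the transform of F_h should approach e^{-as}.
for h in (0.2, 0.1, 0.05, 0.001):
    value = transform_of_Fh(s, a, h)
    print(f"h = {h:<6} L[F_h] = {value:.10f}   e^(-as) = {target:.10f}")
```

The printed values should approach the target from above, since $(e^{hs} - e^{-hs})/(2hs) = \sinh(hs)/(hs) > 1$ for $h > 0$.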

We therefore define $\mathcal{L}[\delta(t-a)] = e^{-as}$. We close this section with an example that foreshadows the use of the delta function in a spring-mass system and the role of Laplace transforms in solving the corresponding IVP.

Example 5.4.7 Consider a spring-mass system where $m = 1$, $k = 13$, and $c = 4$. Assume that the mass is initially displaced 1 m and released. Finally, assume that at $t = 3$, the mass is struck with a hammer in the positive direction. Set up and solve an initial-value problem that describes this situation.

Solution. Using the delta function, the given problem is a standard damped harmonic oscillator equation with an impulse forcing function. In particular, the displacement $y$ of the mass satisfies the initial-value problem

$$y'' + 4y' + 13y = \delta(t-3), \quad y(0) = 1, \; y'(0) = 0 \tag{5.4.14}$$

Before we solve the IVP, we can use our intuition as a guide: we expect the size of the oscillations of the mass to decrease in magnitude until $t = 3$, at which time we expect the problem to restart as the blow from the hammer will increase the displacement of the mass, from which oscillations should eventually decrease to zero.

We begin to solve (5.4.14) by using the Laplace transform in order to see how far our method enables us to progress. Taking the Laplace transform of both sides of (5.4.14),

$$\mathcal{L}[y''] + 4\mathcal{L}[y'] + 13\mathcal{L}[y] = \mathcal{L}[\delta(t-3)]$$

From corollary 5.3.5, it follows that

$$s^2\mathcal{L}[y] - sy(0) - y'(0) + 4s\mathcal{L}[y] - 4y(0) + 13\mathcal{L}[y] = \mathcal{L}[\delta(t-3)]$$

Using the conditions $y(0) = 1$ and $y'(0) = 0$, as well as the fact that $\mathcal{L}[\delta(t-3)] = e^{-3s}$, we now have

$$s^2\mathcal{L}[y] - s + 4s\mathcal{L}[y] - 4 + 13\mathcal{L}[y] = e^{-3s}$$

Figure 5.6 The solution to the IVP (5.4.14).

Solving for $\mathcal{L}[y] = Y(s)$, we see that $Y(s)(s^2 + 4s + 13) = s + 4 + e^{-3s}$, or

$$Y(s) = \frac{s+4}{s^2 + 4s + 13} + \frac{e^{-3s}}{s^2 + 4s + 13} \tag{5.4.15}$$

It remains for us to learn how to compute the inverse Laplace transform of (5.4.15) in order to find the solution $y$ to the IVP. The following sections are devoted to these ideas. Upon further study, we will be able to show that the function $y(t)$ whose transform is (5.4.15) is

$$y = \frac{1}{3}e^{-2t}(3\cos 3t + 2\sin 3t) + \frac{1}{3}u(t-3)e^{-2(t-3)}\sin 3(t-3)$$

A plot of this solution is shown in figure 5.6, where $y(t)$ demonstrates precisely the type of behavior we expect.

The Laplace transform helps us make sense of the Dirac delta function in several ways. One is that we can imagine wanting to say that a hammer strikes a mass with different intensities. If, say, we want to compare the results of the initial-value problems where a hammer strikes a mass to deliver a given impulse versus what happens when the hammer strikes the mass three times as hard, this at first seems to be nonsense: $\delta(t-3)$ and $3\delta(t-3)$ are both zero everywhere except at $t = 3$, where both are infinite. But the power of the Laplace transform rescues us again. Since by linearity, $\mathcal{L}[3\delta(t-3)] = 3\mathcal{L}[\delta(t-3)] = 3e^{-3s}$, the transform detects the difference in the amount of push delivered by the hammer strike, and the results are shown accordingly in the solution to the initial-value problem. In addition, since $\mathcal{L}[\delta(t-a)] = e^{-as}$, we know that the presence of $e^{-as}$ in $Y(s)$ will lead to the presence of $u(t-a)$ in $y(t)$: here we see how the delta function leads to a restart at $t = a$ as the function $u(t-a)$ turns on at this time in the function $y(t)$.
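One way to see the effect of the impulse in the solution of example 5.4.7 is to check that the displacement stays continuous at $t = 3$ while the velocity jumps by one unit (a unit impulse acting on a unit mass). The Python sketch below does this with one-sided difference quotients; the helper names and the step size `eps` are illustrative assumptions, and the convention $u(0) = 1$ is a choice made here.

```python
import math

def u(t):
    """Unit step function, using the convention u(0) = 1 (an assumption)."""
    return 1.0 if t >= 0 else 0.0

def y(t):
    """Closed-form solution quoted for the IVP (5.4.14)."""
    free = math.exp(-2 * t) * (3 * math.cos(3 * t) + 2 * math.sin(3 * t)) / 3
    kick = u(t - 3) * math.exp(-2 * (t - 3)) * math.sin(3 * (t - 3)) / 3
    return free + kick

eps = 1e-6  # small offset used for one-sided difference quotients

# Displacement just before and just after the hammer strike.
gap = y(3 + eps) - y(3 - eps)

# One-sided slopes on either side of t = 3.
slope_before = (y(3 - eps) - y(3 - 2 * eps)) / eps
slope_after = (y(3 + 2 * eps) - y(3 + eps)) / eps
jump = slope_after - slope_before

print("y(0)             =", y(0.0))
print("gap in y at t=3  =", gap)
print("jump in y' at t=3 =", jump)
```

The gap in $y$ should be negligible, while the jump in $y'$ should be close to 1, matching the size of the impulse.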


5.4.3 The Heaviside and Dirac functions in Maple

Both the Heaviside and Dirac functions belong to Maple’s library of basic functions. The syntax for the Heaviside function is simply

> Heaviside(t);

Similarly, the Dirac function is given by

> Dirac(t);

For work with the Heaviside function, we often denote the function by $u(t)$. In Maple, this can be accomplished with the command

> u := t -> Heaviside(t);

Then, to enter and plot a piecewise-defined function such as

$$f(t) = t\,(u(t) - u(t-2)) + (6 - 2t)(u(t-2) - u(t-3))$$

we may use the syntax

> f := t -> t*(u(t)-u(t-2)) + (6-2*t)*(u(t-2)-u(t-3));
> plot(f(t), t=-1..5, color=black, thickness=2);

to generate the plot shown in figure 5.7. More on both the Heaviside function and the Dirac function in Maple, particularly related to their roles in solving initial-value problems with Laplace transforms, can be found in section 5.6.1.

Figure 5.7 The function $f(t) = t\,(u(t) - u(t-2)) + (6 - 2t)(u(t-2) - u(t-3))$.
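For readers without access to Maple, the same construction can be sketched in plain Python. This is only an illustrative analogue, not the text's method: the function names `u` and `f` mirror the Maple definitions, and the value $u(0) = 1$ is a convention assumed here (Maple's `Heaviside` treats the value at 0 differently).

```python
def u(t):
    """Unit step function, with u(0) = 1 taken as a convention (an assumption)."""
    return 1.0 if t >= 0 else 0.0

def f(t):
    """f(t) = t[u(t) - u(t-2)] + (6 - 2t)[u(t-2) - u(t-3)]."""
    return t * (u(t) - u(t - 2)) + (6 - 2 * t) * (u(t - 2) - u(t - 3))

# Sample the function on each piece: zero before t = 0, the ramp t on [0, 2),
# the line 6 - 2t on [2, 3), and zero afterwards.
for t in (-0.5, 1.0, 2.0, 2.5, 4.0):
    print(t, f(t))
```

Each difference of step functions acts as a switch that is 1 on one interval and 0 elsewhere, which is why the two pieces do not interfere with each other.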

Exercises 5.4

In exercises 1–7, sketch a graph of each of the following functions and write each in terms of unit step functions.

1. $f(t) = \begin{cases} 0, & \text{if } 0 \le t < 1 \\ 1, & \text{if } 1 \le t < 2 \\ 0, & \text{if } 2 \le t \end{cases}$

2. $f(t) = \begin{cases} 1, & \text{if } 0 \le t < 4 \\ 2, & \text{if } 4 \le t \end{cases}$

3. $f(t) = \begin{cases} 0, & \text{if } 0 \le t < 1 \\ t, & \text{if } 1 \le t < 2 \\ t^2, & \text{if } 2 \le t \end{cases}$

4. $f(t) = \begin{cases} t, & \text{if } 0 \le t < 2 \\ 0, & \text{if } 2 \le t \end{cases}$

5. $f(t) = \begin{cases} \sin t, & \text{if } 0 \le t < 2\pi \\ 0, & \text{if } 2\pi \le t \end{cases}$

6. $f(t) = \begin{cases} \sin t, & \text{if } 0 \le t < 2\pi \\ \sin 2t, & \text{if } 2\pi \le t \end{cases}$

7. $f(t) = \begin{cases} t, & \text{if } 0 \le t < 2 \\ 2, & \text{if } 2 \le t < 4 \\ 4 - t, & \text{if } 4 \le t \end{cases}$

8. Determine the Laplace transform of the function $f(t)$ given in (a) Exercise 1 (b) Exercise 2 (c) Exercise 3 (d) Exercise 4 (e) Exercise 5 (f) Exercise 6 (g) Exercise 7

In exercises 9–11, compute the Laplace transform of $f(t)$.

9. $f(t) = 2[u(t-1) - u(t-3)] + \delta(t-5)$

10. $f(t) = 2\sin 5t + \delta(t-3)$

11. $f(t) = 2e^{-3t}\sin 2t + \delta(t-8)$

12. Set up, but do not solve, an initial-value problem that represents a spring-mass system with $m = 4$ kg, spring constant $k = 10$, and damping constant $c = 2$, where a unit impulse is delivered by a hammer at $t = 6$. Assume the units on all quantities are consistent and that the mass is initially displaced 0.25 m and released.

13. Set up, but do not solve, an initial-value problem that represents a spring-mass system with $m = 4$ kg, spring constant $k = 10$, and damping constant $c = 2$, where a forcing function $f(t) = 3\sin 2t$ is turned on at $t = 4$ and an impulse of magnitude 4 is delivered by a hammer at $t = 10$. Assume the units on all quantities are consistent and that the mass is initially displaced 0.25 m and released.

5.5 Solving IVPs with the Laplace transform

As we have seen in examples 5.3.5 and 5.4.7, in order to solve initial-value problems using the Laplace transform, the final step in the process is to answer the question “what function $y(t)$ has Laplace transform $Y(s)$?” In this section, we will further study the inverse Laplace transform, the process that takes the Laplace transform of an unknown function back to the function itself. Throughout, we motivate our work through examples of solving initial-value problems to see some of the typical functions $Y(s)$ that arise in this approach and the steps necessary to determine $y(t) = \mathcal{L}^{-1}[Y(s)]$.

Example 5.5.1 Use Laplace transforms to solve the initial-value problem

$$y' - 2y = 5, \quad y(0) = 4$$

Solution. We begin by taking the Laplace transform of both sides of the differential equation. Using the linearity of the transform,

$$\mathcal{L}[y'] - 2\mathcal{L}[y] = 5\mathcal{L}[1]$$

By theorem 5.3.4 and the familiar transform of the function $f(t) = 1$, it follows that

$$s\mathcal{L}[y] - y(0) - 2\mathcal{L}[y] = \frac{5}{s}$$

Using the given fact that $y(0) = 4$ and denoting $\mathcal{L}[y] = Y(s)$,

$$sY(s) - 2Y(s) = 4 + \frac{5}{s} \tag{5.5.1}$$

Note particularly that (5.5.1) is now an algebraic equation in the unknown function $Y(s)$. Solving for $Y(s)$, we find

$$Y(s) = \frac{4s + 5}{s(s-2)}$$

At this point, we recall that $Y(s) = \mathcal{L}[y]$, where $y(t)$ is the original unknown function we seek as the solution to the stated IVP. Solving the IVP has now been reduced to finding the function $y(t)$ that has Laplace transform $Y(s)$. That is, we seek $y(t) = \mathcal{L}^{-1}[Y(s)]$. With a bit of algebraic rearrangement and insight, we can find the function $y(t)$. In particular, using a partial fraction decomposition, we can show that

$$Y(s) = \frac{4s + 5}{s(s-2)} = -\frac{5/2}{s} + \frac{13/2}{s-2} \tag{5.5.2}$$
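The coefficients in (5.5.2) come from a standard partial fraction computation, sketched here for completeness; the constants $A$ and $B$ are names introduced only for this illustration.

```latex
\begin{align*}
\frac{4s+5}{s(s-2)} &= \frac{A}{s} + \frac{B}{s-2}
  \quad\Longrightarrow\quad 4s + 5 = A(s-2) + Bs \\
s = 0:&\quad 5 = -2A \;\Longrightarrow\; A = -\tfrac{5}{2} \\
s = 2:&\quad 13 = 2B \;\Longrightarrow\; B = \tfrac{13}{2}
\end{align*}
```

As a check, $-\tfrac{5}{2}(s-2) + \tfrac{13}{2}s = 4s + 5$, which recovers the original numerator.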

Recalling that $\mathcal{L}[1] = 1/s$ and $\mathcal{L}[e^{2t}] = 1/(s-2)$, (5.5.2) implies

$$y(t) = -\frac{5}{2} + \frac{13}{2}e^{2t}$$

This is precisely the solution we would find to the IVP were we to use an integrating factor or separation of variables to solve the differential equation.

Whenever we use the Laplace transform to solve an IVP, we will employ a process similar to our work in example 5.5.1:

(1) Take the transform of both sides of the stated differential equation to transform the differential equation in $y(t)$ into an algebraic equation in $Y(s) = \mathcal{L}[y]$;
(2) Use algebra to solve for $Y(s)$;
(3) Determine which function $y(t)$ has the Laplace transform $Y(s)$.

As we have noted previously, given a function $F(s)$, a function $f(t)$ such that $\mathcal{L}[f(t)] = F(s)$ is called the inverse Laplace transform of $F$. We use the notation $\mathcal{L}^{-1}[F(s)] = f(t)$. For our purposes, a good way to view the operator $\mathcal{L}^{-1}$ is as one that reverses the work of the Laplace transform. A key step in working backward will be to decompose the function $F(s)$ into more manageable pieces, often through a partial fraction decomposition. A review of partial fractions can be found in appendix A; partial fractions are an algebraic technique that proves useful for more than just integration, as we will see throughout this section.

Once the pieces of $F(s)$ are in a recognizable form, we use standard rules we have developed for Laplace transforms to compute the inverse transform. For example, after using partial fractions to decompose $Y(s)$ in example 5.5.1, we showed that since $\mathcal{L}[e^{2t}] = 1/(s-2)$, it follows that

$$\mathcal{L}^{-1}\left[\frac{1}{s-2}\right] = e^{2t}$$

More generally, we can state that

$$\mathcal{L}^{-1}\left[\frac{1}{s-a}\right] = e^{at} \tag{5.5.3}$$

Indeed, we realize that we can turn around any known relationship generated by the Laplace transform in order to make a statement about the inverse transform. For example, the inverse transform satisfies the linearity property stated in the following theorem.

Theorem 5.5.1 For every pair of constants $a$ and $b$,

$$\mathcal{L}^{-1}[aF(s) + bG(s)] = a\mathcal{L}^{-1}[F(s)] + b\mathcal{L}^{-1}[G(s)]$$

Both shifting properties we have developed are regularly used in their inverse form. For the first shifting property, given $\mathcal{L}[f(t)] = F(s)$, we know that for any real value of $a$,

$$\mathcal{L}[e^{at}f(t)] = F(s-a)$$

Stated differently, this first shifting property implies

$$\mathcal{L}^{-1}[F(s-a)] = e^{at}f(t) \tag{5.5.4}$$

Likewise, from the slightly revised version of the second shifting property, we know that

$$\mathcal{L}[u(t-a)f(t-a)] = e^{-as}\mathcal{L}[f(t)] = e^{-as}F(s)$$

and therefore, stated in inverse form,

$$\mathcal{L}^{-1}[e^{-as}F(s)] = u(t-a)f(t-a) \tag{5.5.5}$$

In our next example, we see how several of these fundamental concepts are employed in practice, specifically when step functions are involved.

Example 5.5.2 Use Laplace transforms to solve the initial-value problem

$$y' + y = 5u(t-1), \quad y(0) = 4$$

Solution. Taking the Laplace transform of both sides of the differential equation and applying the initial condition,

$$s\mathcal{L}[y] - 4 + \mathcal{L}[y] = 5\mathcal{L}[u(t-1)]$$

Using the established fact that $\mathcal{L}[u(t-1)] = e^{-s}/s$ and letting $Y(s) = \mathcal{L}[y]$,

$$sY(s) - 4 + Y(s) = \frac{5e^{-s}}{s}$$

Solving for $Y(s)$,

$$Y(s) = \frac{4}{s+1} + 5e^{-s}\,\frac{1}{s(s+1)} \tag{5.5.6}$$

At this point, we need to use the inverse transform to solve for $y(t)$. Finding $\mathcal{L}^{-1}[4/(s+1)]$ is straightforward: by linearity and the first shifting property (see footnote 5),

$$\mathcal{L}^{-1}\left[\frac{4}{s+1}\right] = 4e^{-t} \tag{5.5.7}$$

To deal with the remaining term in (5.5.6), we note that with $e^{-s}$ present we will need to use the second shifting property (5.5.5) in reverse. For this, it will be most useful to have the function

$$F(s) = \frac{1}{s(s+1)}$$

in a simpler form. Using its partial fraction decomposition, we observe that

$$F(s) = \frac{1}{s} - \frac{1}{s+1}$$

By (5.5.5), it now follows that

$$\mathcal{L}^{-1}\left[5e^{-s}\left(\frac{1}{s} - \frac{1}{s+1}\right)\right] = 5u(t-1)\left[1 - e^{-(t-1)}\right] \tag{5.5.8}$$

Combining our work at (5.5.7) and (5.5.8) to determine $y(t)$ from (5.5.6), we have shown that

$$y(t) = 4e^{-t} + 5u(t-1) - 5u(t-1)e^{-(t-1)}$$

5 We know $\mathcal{L}^{-1}[1/s] = 1$, and thus the first shifting property implies $\mathcal{L}^{-1}[1/(s+1)] = e^{-t} \cdot 1$.

Figure 5.8 The solution to the IVP of example 5.5.2.

A plot of this solution curve is shown in figure 5.8, where we see qualitative behavior consistent with what we would expect from the forcing function in the IVP. In particular, the forcing function is $5u(t-1)$, which makes the forcing function behave as if the constant function 5 is turned on at $t = 1$ in the initial-value problem. For $t = 0$ to $t = 1$, we see the standard exponential decay that we would expect for the homogeneous equation $y' + y = 0$. But at $t = 1$, the solution function turns and begins to approach the equilibrium solution $y = 5$ that we expect in the nonhomogeneous equation $y' + y = 5$. We note specifically that the Laplace transform has successfully handled all of this at once, including the role of the initial condition $y(0) = 4$ and the corner in the solution function $y(t)$ at $t = 1$.

We next solve a second-order initial-value problem that involves the unit step function. Here, we will see how the higher order of the equation introduces additional complexity in determining the inverse Laplace transform needed to solve the IVP.
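The solution of example 5.5.2 can be cross-checked against an independent numerical method. The sketch below integrates $y' = -y + 5u(t-1)$ with the classical fourth-order Runge-Kutta scheme and compares the result at $t = 6$ with the closed form; the step count, the end time, and the helper names are illustrative assumptions, and the convention $u(0) = 1$ is a choice made here.

```python
import math

def u(t):
    """Unit step function, using the convention u(0) = 1 (an assumption)."""
    return 1.0 if t >= 0 else 0.0

def exact(t):
    """Closed-form solution found in example 5.5.2."""
    return 4 * math.exp(-t) + 5 * u(t - 1) * (1 - math.exp(-(t - 1)))

def rhs(t, y):
    """The IVP in normal form: y' = -y + 5u(t - 1)."""
    return -y + 5 * u(t - 1)

def rk4(f, y0, t_end, n):
    """Classical fourth-order Runge-Kutta with n equal steps on [0, t_end]."""
    dt = t_end / n
    y = y0
    for i in range(n):
        t = i * dt
        k1 = f(t, y)
        k2 = f(t + dt / 2, y + dt * k1 / 2)
        k3 = f(t + dt / 2, y + dt * k2 / 2)
        k4 = f(t + dt, y + dt * k3)
        y += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return y

approx = rk4(rhs, 4.0, 6.0, 6000)
print("Runge-Kutta estimate of y(6):", approx)
print("closed-form value of y(6):   ", exact(6.0))
```

Because the forcing term is discontinuous at $t = 1$, the numerical method loses some accuracy in the step that straddles the jump, so close (not exact) agreement is what one should expect.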

Example 5.5.3 Use the Laplace transform to solve the initial-value problem

$$y'' + 2y' + 5y = u(t-2), \quad y(0) = 1, \; y'(0) = 0 \tag{5.5.9}$$

Solution. Taking the Laplace transform of both sides of (5.5.9) and writing $Y(s) = \mathcal{L}[y(t)]$, we observe that

$$s^2Y(s) - sy(0) - y'(0) + 2(sY(s) - y(0)) + 5Y(s) = \frac{e^{-2s}}{s}$$

Substituting the given initial conditions and factoring on the left, we have

$$Y(s)(s^2 + 2s + 5) = s + 2 + \frac{e^{-2s}}{s}$$

Solving for $Y(s)$, we can write

$$Y(s) = Y_1(s) + Y_2(s) = \frac{s+2}{s^2 + 2s + 5} + e^{-2s}\,\frac{1}{s(s^2 + 2s + 5)} \tag{5.5.10}$$

It remains for us to determine the function $y(t)$ whose transform is $Y(s)$. By linearity, it helps for us to break the function $Y(s)$ into the simplest pieces we can; we begin by determining the inverse transform of $Y_1(s)$. Because of shifting properties of the transform (and because of the fact that we cannot factor $s^2 + 2s + 5$ in an effort to apply partial fractions), it is useful to complete the square in expressions such as $s^2 + 2s + 5$. We instead write $(s+1)^2 + 4$, and seek to identify other parts of the expression that involve $(s+1)$. Separating the numerator $(s+2)$ into $(s+1) + 1$, we can express the first term in (5.5.10) as

$$Y_1(s) = \frac{s+2}{s^2 + 2s + 5} = \frac{s+1}{(s+1)^2 + 4} + \frac{1}{(s+1)^2 + 4} \tag{5.5.11}$$

Recalling that $\mathcal{L}[\cos 2t] = s/(s^2+4)$ and $\mathcal{L}[\sin 2t] = 2/(s^2+4)$, we know

$$\mathcal{L}^{-1}\left[\frac{s}{s^2+4}\right] = \cos 2t \quad \text{and} \quad \mathcal{L}^{-1}\left[\frac{2}{s^2+4}\right] = \sin 2t$$

The inverse of the first shifting property, $\mathcal{L}^{-1}[F(s+1)] = e^{-t}f(t)$, now implies that

$$\mathcal{L}^{-1}\left[\frac{s+1}{(s+1)^2 + 4} + \frac{1}{(s+1)^2 + 4}\right] = e^{-t}\cos 2t + \frac{1}{2}e^{-t}\sin 2t \tag{5.5.12}$$

Hence, the first term $Y_1(s)$ in (5.5.10) comes from taking the Laplace transform of the function $y_1(t) = e^{-t}\cos 2t + \frac{1}{2}e^{-t}\sin 2t$. From (5.5.10), it remains for us to find the function $y_2(t)$ whose Laplace transform is

$$Y_2(s) = e^{-2s}\,\frac{1}{s(s^2 + 2s + 5)}$$

Using a partial fraction decomposition on the rational part of the function, we have

$$e^{-2s}\,\frac{1}{s(s^2 + 2s + 5)} = \frac{1}{5}\,e^{-2s}\left[\frac{1}{s} - \frac{s+2}{s^2 + 2s + 5}\right]$$

Figure 5.9 The solution $y(t)$ to the IVP in example 5.5.3.

Observe that we have already determined the inverse transform of the function $(s+2)/(s^2 + 2s + 5)$ above at (5.5.12). Here, we must deal with the additional presence of the constant $1/5$, the multiplier $e^{-2s}$, and the basic function $1/s$. Recalling the inverse second shifting property, $\mathcal{L}^{-1}[e^{-as}F(s)] = u(t-a)f(t-a)$, and (5.5.12), we observe that

$$\mathcal{L}^{-1}\left[e^{-2s}\left(\frac{1}{s} - \frac{s+2}{s^2 + 2s + 5}\right)\right] = u(t-2)\left[1 - e^{-(t-2)}\left(\cos 2(t-2) + \frac{1}{2}\sin 2(t-2)\right)\right] \tag{5.5.13}$$

Combining (5.5.10), (5.5.12), and (5.5.13), we have shown that the solution $y(t)$ to the initial-value problem is

$$y(t) = e^{-t}\cos 2t + \frac{e^{-t}}{2}\sin 2t + \frac{1}{5}u(t-2)\left[1 - e^{-(t-2)}\cos 2(t-2) - \frac{e^{-(t-2)}}{2}\sin 2(t-2)\right]$$

A plot of the function $y(t)$ is shown in figure 5.9. Here, we see evidence of the qualitative behavior we expect: until the unit step function turns on, the homogeneous equation should show damped oscillations so that $y(t) \to 0$. But once the step function turns on, the forcing function makes the equation nonhomogeneous with a constant forcing function, making $y = 1/5$ the stable equilibrium solution to which $y(t)$ tends.

To further explore the ideas that arise in computing inverse transforms, we next consider a slight modification of the preceding example, but in an applied setting where a more complicated forcing function is present. In particular, we examine a spring-mass system in which a periodic forcing function is introduced at $t = \pi$.
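A quick numerical sanity check of the closed form in example 5.5.3 is to substitute it back into the differential equation. The Python sketch below writes the solution in the form implied by (5.5.12) and (5.5.13) and estimates the residual $y'' + 2y' + 5y - u(t-2)$ with central differences at a few points away from $t = 2$; the sample points, the step size `h`, and the helper names are illustrative assumptions.

```python
import math

def u(t):
    """Unit step function, using the convention u(0) = 1 (an assumption)."""
    return 1.0 if t >= 0 else 0.0

def y(t):
    """Closed-form solution of example 5.5.3, with tau = t - 2."""
    tau = t - 2
    free = math.exp(-t) * (math.cos(2 * t) + 0.5 * math.sin(2 * t))
    forced = 1 - math.exp(-tau) * (math.cos(2 * tau) + 0.5 * math.sin(2 * tau))
    return free + u(tau) * forced / 5

def residual(t, h=1e-4):
    """Central-difference estimate of y'' + 2y' + 5y - u(t - 2)."""
    d1 = (y(t + h) - y(t - h)) / (2 * h)
    d2 = (y(t + h) - 2 * y(t) + y(t - h)) / h**2
    return d2 + 2 * d1 + 5 * y(t) - u(t - 2)

# Points on both sides of t = 2, kept away from the switch itself.
for t in (0.5, 1.5, 3.0, 6.0):
    print(f"t = {t}: residual = {residual(t):.2e}")

print("y(0) =", y(0.0), "  y(30) =", y(30.0))
```

The residuals should be near zero on both sides of the switch, and for large $t$ the value of $y$ should be close to the equilibrium $1/5$.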

Example 5.5.4 Consider a mass of 1 kg attached to a spring with spring constant $k = 13$ such that the system has damping constant $c = 4$. Assume that the mass is displaced 1 m from equilibrium and released at $t = 0$; furthermore, at time $t = \pi$ the forcing function $f(t) = 2\sin 3t$ is applied. Assuming consistent units, set up an IVP that models this situation and solve the IVP using Laplace transforms.

Solution. From our work with spring-mass systems, we know that the displacement $y(t)$ of the mass from equilibrium must satisfy the initial-value problem

$$y'' + 4y' + 13y = 2u(t-\pi)\sin 3t, \quad y(0) = 1, \; y'(0) = 0$$

Taking Laplace transforms, it follows that

$$s^2Y(s) - sy(0) - y'(0) + 4(sY(s) - y(0)) + 13Y(s) = 2\mathcal{L}[u(t-\pi)\sin 3t] \tag{5.5.14}$$

We know that $\mathcal{L}[\sin 3t] = 3/(s^2+9)$, and by the second shifting property

$$\mathcal{L}[u(t-\pi)\sin 3t] = e^{-\pi s}\mathcal{L}[\sin 3(t+\pi)] \tag{5.5.15}$$

At this point, we observe by basic trigonometry that $\sin(3t + 3\pi) = \sin 3t\cos 3\pi + \cos 3t\sin 3\pi = -\sin 3t$. Hence, from (5.5.15) we have

$$\mathcal{L}[u(t-\pi)\sin 3t] = e^{-\pi s}\mathcal{L}[-\sin 3t] = -e^{-\pi s}\,\frac{3}{s^2+9}$$

Returning to (5.5.14) and using the given initial conditions, it follows that

$$s^2Y(s) - s + 4sY(s) - 4 + 13Y(s) = -2e^{-\pi s}\,\frac{3}{s^2+9}$$

Factoring,

$$Y(s)(s^2 + 4s + 13) = s + 4 - 2e^{-\pi s}\,\frac{3}{s^2+9}$$

Solving for $Y(s)$,

$$Y(s) = Y_1(s) + Y_2(s) = \frac{s+4}{s^2 + 4s + 13} - 2e^{-\pi s}\,\frac{3}{(s^2+9)(s^2 + 4s + 13)} \tag{5.5.16}$$

It remains to find the inverse transform of $Y(s)$; we do so one piece at a time using the linearity of the inverse transform. In both $Y_1(s)$ and $Y_2(s)$, we will algebraically rearrange the expression in order to help us more easily determine the inverse Laplace transform, using an approach similar to our work in example 5.5.3.

Taking the first term in (5.5.16), we observe that since the denominator does not factor, we need to write it in a more familiar form. Completing the square and separating the numerator enables us to write

$$Y_1(s) = \frac{s+4}{(s+2)^2 + 9} = \frac{s+2}{(s+2)^2 + 9} + \frac{2}{(s+2)^2 + 9}$$

and see the structure of Laplace transforms of basic functions. In particular, from the first shifting property and the known Laplace transforms of $\cos 3t$ and $\sin 3t$, it follows that

$$\mathcal{L}^{-1}[Y_1(s)] = \mathcal{L}^{-1}\left[\frac{s+2}{(s+2)^2 + 9} + \frac{2}{(s+2)^2 + 9}\right] = e^{-2t}\cos 3t + \frac{2}{3}e^{-2t}\sin 3t \tag{5.5.17}$$

Next we find the inverse transform of the term $Y_2(s)$ in (5.5.16). That is, we must determine

$$\mathcal{L}^{-1}[Y_2(s)] = \mathcal{L}^{-1}\left[-6e^{-\pi s}\,\frac{1}{(s^2+9)(s^2 + 4s + 13)}\right] \tag{5.5.18}$$

From the presence of $e^{-\pi s}$, we know the second shifting property will be used; in addition, we must algebraically rearrange the remaining part of the expression in order to find the inverse transform. Computing the partial fraction decomposition of the rational function in (5.5.18), we equivalently seek

$$\mathcal{L}^{-1}[Y_2(s)] = \mathcal{L}^{-1}\left[\frac{6}{40}\,e^{-\pi s}\left(\frac{s-1}{s^2+9} - \frac{s+3}{s^2 + 4s + 13}\right)\right] \tag{5.5.19}$$

One additional rearrangement will enable us to find the desired inverse transform. Completing the square in the second fraction and separating the numerator in each enables us to rewrite (5.5.19) as

$$\mathcal{L}^{-1}[Y_2(s)] = \mathcal{L}^{-1}\left[\frac{6}{40}\,e^{-\pi s}\left(\frac{s}{s^2+9} - \frac{1}{s^2+9} - \frac{s+2}{(s+2)^2+9} - \frac{1}{(s+2)^2+9}\right)\right]$$

Applying the inverse of the second shifting property to each of the terms in $\mathcal{L}^{-1}[Y_2(s)]$, it follows that

$$\mathcal{L}^{-1}[Y_2(s)] = \frac{6}{40}\,u(t-\pi)\left[\cos 3(t-\pi) - \frac{1}{3}\sin 3(t-\pi) - e^{-2(t-\pi)}\cos 3(t-\pi) - \frac{1}{3}e^{-2(t-\pi)}\sin 3(t-\pi)\right] \tag{5.5.20}$$

Noting that $\sin(3t - 3\pi) = -\sin 3t$ and $\cos(3t - 3\pi) = -\cos 3t$, we can simplify (5.5.20) to

$$\mathcal{L}^{-1}[Y_2(s)] = \frac{3}{20}\,u(t-\pi)\left[-\cos 3t + \frac{1}{3}\sin 3t + e^{-2(t-\pi)}\left(\cos 3t + \frac{1}{3}\sin 3t\right)\right]$$

Combining our work with $\mathcal{L}^{-1}[Y_1(s)]$ and $\mathcal{L}^{-1}[Y_2(s)]$, we have therefore shown that $y(t) = \mathcal{L}^{-1}[Y(s)]$ is the function

$$y(t) = e^{-2t}\left(\cos 3t + \frac{2}{3}\sin 3t\right) + \frac{3}{20}\,u(t-\pi)\left[-\cos 3t + \frac{1}{3}\sin 3t + e^{-2(t-\pi)}\left(\cos 3t + \frac{1}{3}\sin 3t\right)\right]$$
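The partial fraction decomposition used in (5.5.19) can be confirmed by recombining the two fractions over their common denominator. The short computation below is a verification sketch added for the reader, not a derivation taken from the text.

```latex
\begin{align*}
(s-1)(s^2+4s+13) - (s+3)(s^2+9)
  &= (s^3 + 3s^2 + 9s - 13) - (s^3 + 3s^2 + 9s + 27) = -40 \\[4pt]
\Longrightarrow\quad
\frac{1}{(s^2+9)(s^2+4s+13)}
  &= -\frac{1}{40}\left(\frac{s-1}{s^2+9} - \frac{s+3}{s^2+4s+13}\right) \\[4pt]
\Longrightarrow\quad
\frac{-6}{(s^2+9)(s^2+4s+13)}
  &= \frac{6}{40}\left(\frac{s-1}{s^2+9} - \frac{s+3}{s^2+4s+13}\right)
\end{align*}
```

The factor $-6$ from (5.5.18) and the factor $-1/40$ from the decomposition combine to give the positive coefficient $6/40$ that appears in (5.5.19).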

Figure 5.10 The solution to the IVP in example 5.5.4.

A plot of the function $y(t)$ is given in figure 5.10. Until the forcing function activates at $t = \pi$, we see the standard damped oscillations decaying to zero. When the periodic forcing function turns on, the system demonstrates the repeating oscillations generated by this function.

At this point in our work, we have been exposed to most of the main ideas necessary for using the Laplace transform to solve initial-value problems. In addition to knowing the standard properties of the transform and its effects on basic functions, we must understand how to compute the inverse transform and the algebraic rearrangements that such inversion entails. Specifically, we have seen in several examples the need to determine partial fraction decompositions, complete the square, and separate the numerator in fractions. For example, the key computations necessary to find the inverse transform of the function

$$F(s) = \frac{11}{s(s^2 + 6s + 11)}$$

are to first determine the partial fraction decomposition and write

$$F(s) = \frac{1}{s} - \frac{s+6}{s^2 + 6s + 11}$$

The first term is straightforward to invert, but the second term requires further manipulation. Completing the square in the denominator, we see that $s^2 + 6s + 11 = (s+3)^2 + 2$, and therefore it is convenient to write the numerator as $s + 6 = (s+3) + 3$. Doing so,

$$F(s) = \frac{1}{s} - \frac{s+3}{(s+3)^2 + 2} - \frac{3}{(s+3)^2 + 2}$$

It is at this point, together with the first shifting property, that we can finally compute $\mathcal{L}^{-1}[F(s)]$ and find

$$f(t) = \mathcal{L}^{-1}[F(s)] = 1 - e^{-3t}\cos\sqrt{2}\,t - \frac{3}{\sqrt{2}}\,e^{-3t}\sin\sqrt{2}\,t$$

Finally, we have also seen that the second shifting property plays an important role. In the presence of the unit step function $u(t-a)$, the multiplier $e^{-as}$ will arise in $F(s)$. In that case, we must invert $e^{-as}F(s)$; doing so, we get $u(t-a)f(t-a)$, as opposed to simply $f(t)$.

In light of these overall comments, we see the need to practice the computation of inverse Laplace transforms so that we can use these concepts in the solution of initial-value problems. In the next section, we will summarize key properties of the inverse transform, consider a few additional examples of more complicated inverse transforms, demonstrate the role technology plays in computations, and provide exercises for additional practice. We close the current section with an example involving the Dirac delta function.

Example 5.5.5 Consider an undamped spring-mass system with mass $m = 1$ and spring constant $k = 4$. Suppose that the mass is displaced 1 unit from equilibrium and struck with a force to impart an initial velocity of $y'(0) = 1$. In addition, at times $t = 7$ and $t = 20$, a hammer delivers a one-unit impulse to the mass in the positive direction. Assuming consistent units, set up and solve an IVP that models this situation.

Solution. We use the Dirac delta function to represent the impulse forces delivered at times $t = 7$ and $t = 20$. Coupled with the standard equation to represent the spring-mass system, we see that the displacement $y(t)$ of the mass at time $t$ satisfies the initial-value problem

$$y'' + 4y = \delta(t-7) + \delta(t-20), \quad y(0) = 1, \; y'(0) = 1$$

To solve the IVP, we begin by taking Laplace transforms and find that

$$s^2Y(s) - sy(0) - y'(0) + 4Y(s) = \mathcal{L}[\delta(t-7)] + \mathcal{L}[\delta(t-20)]$$

Recalling that $\mathcal{L}[\delta(t-a)] = e^{-as}$ and using the given initial conditions, $Y(s)$ must satisfy the equation

$$s^2Y(s) - s - 1 + 4Y(s) = e^{-7s} + e^{-20s}$$

Factoring, $Y(s)(s^2 + 4) = s + 1 + e^{-7s} + e^{-20s}$, and therefore

$$Y(s) = \frac{s}{s^2+4} + \frac{1}{s^2+4} + e^{-7s}\,\frac{1}{s^2+4} + e^{-20s}\,\frac{1}{s^2+4}$$

Using the second shifting property to find the inverse of the last two terms on the right, we find

$$y(t) = \mathcal{L}^{-1}[Y(s)] = \cos 2t + \frac{1}{2}\sin 2t + \frac{1}{2}u(t-7)\sin 2(t-7) + \frac{1}{2}u(t-20)\sin 2(t-20)$$

A plot of the solution function $y(t)$ is shown in figure 5.11. We know that because the system is undamped, once it is set in motion it will oscillate at the same amplitude indefinitely in the absence of other forces. When the hammer blows are delivered at $t = 7$ and $t = 20$, this will obviously change the amplitude of oscillation. At first the observed behavior may seem counterintuitive, as the hammer strikes are diminishing the amount of oscillation. However, if we note that the impulses are delivered in the positive direction at a time when the mass is traveling in the negative direction, then, indeed, the resulting solution accurately models the physical situation.

Figure 5.11 The solution to the IVP of example 5.5.5.

It is interesting to explore how delivering the impulses at other times impacts the system. Note that our work with Laplace transforms in example 5.5.5 is essentially unchanged by the times the impulses occur. In particular, if the hammer strikes occur at $t = a$ and $t = b$, then the solution will be

$$y(t) = \cos 2t + \frac{1}{2}\sin 2t + \frac{1}{2}u(t-a)\sin 2(t-a) + \frac{1}{2}u(t-b)\sin 2(t-b)$$

If we choose $a = 9$ and $b = 18$, we see substantially different behavior in the solution function due to the fact that these impulses occur in the same direction as the motion at the time they are delivered. A plot of the solution $y(t)$ in this case is shown in figure 5.12.

Exercises 5.5

In exercises 1–20, solve the stated initial-value problem using Laplace transforms. In each case, sketch a plot of your solution.

1. $y' + 5y = 20$, $y(0) = 3$

2. $y' + 3y = e^{2t}$, $y(0) = -2$
3. $y' - 2y = e^{2t}$, $y(0) = 1$
4. $y' + 4y = \sin 3t$, $y(0) = 5$
5. $y' + y = te^{t}$, $y(0) = -1$

Figure 5.12 The solution to the IVP of example 5.5.5 where the impulses instead occur at $t = 9$ and $t = 18$.

6. $y' - 8y = u(t-1)$, $y(0) = -4$
7. $y' - 8y = u(t-3) \cdot t$, $y(0) = -4$
8. $y' - 8y = \delta(t-1)$, $y(0) = -4$
9. $y'' + 9y = 0$, $y(0) = 0$, $y'(0) = 5$
10. $y'' - 9y = 0$, $y(0) = 2$, $y'(0) = 0$
11. $y'' + 9y = 2$, $y(0) = 0$, $y'(0) = 1$
12. $y'' + 9y = 5\cos t$, $y(0) = 0$, $y'(0) = 0$
13. $y'' + 9y = 5\cos 3t$, $y(0) = 0$, $y'(0) = 0$
14. $y'' + 7y' + 12y = 0$, $y(0) = 0$, $y'(0) = 3$
15. $y'' + 6y' + 9y = 0$, $y(0) = 2$, $y'(0) = 0$
16. $y'' + 2y' + y = 3t$, $y(0) = 0$, $y'(0) = 0$
17. $y'' + 2y' + 5y = u(t-4)$, $y(0) = 1$, $y'(0) = 0$
18. $y'' - 2y' - 3y = u(t-3)$, $y(0) = 2$, $y'(0) = 0$
19. $y'' - 2y' - 3y = u(t-3)$, $y(0) = 2$, $y'(0) = 0$
20. $y'' + 2y' + 5y = \delta(t-1)$, $y(0) = 0$, $y'(0) = 0$

For exercises 21–26, solve the stated initial-value problem from exercises 1–20 by standard means developed in preceding chapters (i.e., without using Laplace transforms).

21. $y' + 3y = e^{2t}$, $y(0) = -2$
22. $y' + 4y = \sin 3t$, $y(0) = 5$
23. $y' + y = te^{t}$, $y(0) = -1$
24. $y'' + 9y = 2$, $y(0) = 0$, $y'(0) = 1$
25. $y'' + 9y = 5\cos 3t$, $y(0) = 0$, $y'(0) = 0$
26. $y'' + 2y' + y = 3t$, $y(0) = 0$, $y'(0) = 0$

In exercises 27–32, use Laplace transforms to determine the displacement $y(t)$ of the spring-mass system with spring constant $k = 72$ and mass $m = 2$ kg for the given forcing function $f(t)$. Assume each time the system starts from rest; solve for $y(t)$ in the cases where the damping constant $c$ is (a) $c = 0$, (b) $c = 2$, (c) $c = 24$, and (d) $c = 40$, assuming consistent units. Sketch a plot of each solution.

27. $f(t) = 2$
28. $f(t) = 10\sin 2t$
29. $f(t) = 10\sin 6t$
30. $f(t) = 10[u(t) - u(t-4\pi)]$
31. $f(t) = 10e^{-0.2t}$
32. $f(t) = 100\delta(t)$

In exercises 33–38, consider an RLC circuit for which an inductor of $L = 1$ H and a capacitor of $C = 0.01$ F are present. For each given forcing function $f(t)$, use Laplace transforms to determine the charge $Q(t)$ and current $I(t)$ in the circuit at time $t$ if initially $Q(0) = 0$ and $I(0) = 0$. Determine the charge and current in the cases where the resistance is (a) $R = 0\ \Omega$, (b) $R = 16\ \Omega$, (c) $R = 20\ \Omega$, and (d) $R = 25\ \Omega$, assuming consistent units. Sketch a plot of each solution.

33. $f(t) = 10$
34. $f(t) = 10\sin 10t$
35. $f(t) = 5\sin 10t$
36. $f(t) = 10[u(t) - u(t-2\pi)]$
37. $f(t) = 10\delta(t)$
38. $f(t) = 20e^{-t}$

5.6 More on the inverse Laplace transform

In this section, we provide an overall summary of properties of the inverse transform and present some further practice with computations. We close with a discussion of how transforms and inverse transforms may be found using a computer algebra system. To begin, table 5.3 provides a list of familiar functions F (s) and their inverse transforms, as well as several key general properties of the inverse transform.

Table 5.3 Inverse Laplace transforms of some basic functions and other fundamental properties.

$F(s)$                    $f(t) = \mathcal{L}^{-1}[F(s)]$
$1/s^{n+1}$               $t^n/n!$
$1/(s - a)$               $e^{at}$
$s/(s^2 + k^2)$           $\cos kt$
$k/(s^2 + k^2)$           $\sin kt$
$s/(s^2 - k^2)$           $\cosh kt$
$k/(s^2 - k^2)$           $\sinh kt$
$aF(s) + bG(s)$           $af(t) + bg(t)$
$F(s - a)$                $e^{at}f(t)$
$e^{-as}$                 $\delta(t - a)$
$e^{-as}F(s)$             $u(t - a)f(t - a)$
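As a brief illustration of how several rows of the table combine, the following worked computation inverts two functions chosen here purely as examples (they are not taken from the text). It uses only the power rule $\mathcal{L}^{-1}[1/s^2] = t$, linearity, and the two shifting properties.

```latex
% First shifting property: replace s by s - 2 in 1/s^2
\mathcal{L}^{-1}\!\left[\frac{3}{(s-2)^2}\right]
  = 3e^{2t}\,\mathcal{L}^{-1}\!\left[\frac{1}{s^2}\right]
  = 3te^{2t}

% Second shifting property: the factor e^{-3s} delays f(t) = t by 3 units
\mathcal{L}^{-1}\!\left[\frac{e^{-3s}}{s^2}\right]
  = u(t-3)\,(t-3)
```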

Most of the lines in the table are derived from taking the inverse perspective on statements in tables 5.1 and 5.2. While full tables of Laplace transforms typically number many pages, we present only a small collection for use in standard problems involving spring-mass systems and RLC circuits, leaving other examples for exploration in other sources or computer algebra systems. The next several examples demonstrate standard techniques in the computation of inverse transforms.

Example 5.6.1 Determine $\mathcal{L}^{-1}[F(s)]$ for each of the following functions:

(a) $F(s) = \dfrac{e^{-2s}}{s(s+1)^2}$   (b) $F(s) = \dfrac{2}{s^4 + 4s^2}$   (c) $F(s) = \dfrac{4se^{-2\pi s}}{(s^2 + 2s + 5)(s^2 + 9)}$

Solution. (a) Because of the presence of $e^{-2s}$ in $F(s)$, we will use the second shifting property. But first, we find the partial fraction decomposition

$$\frac{1}{s(s+1)^2} = \frac{1}{s} - \frac{1}{s+1} - \frac{1}{(s+1)^2}$$

and note that

$$\mathcal{L}^{-1}\left[\frac{1}{s(s+1)^2}\right] = \mathcal{L}^{-1}\left[\frac{1}{s} - \frac{1}{s+1} - \frac{1}{(s+1)^2}\right] = 1 - e^{-t} - te^{-t}$$

Now, in order to compute the inverse transform of the given function, we use the second shifting property to address the presence of $e^{-2s}$ in each term and thus find that

$$\mathcal{L}^{-1}\left[\frac{e^{-2s}}{s(s+1)^2}\right] = u(t-2)\left[1 - e^{-(t-2)} - (t-2)e^{-(t-2)}\right]$$

(b) Partial fractions shows that

$$F(s) = \frac{2}{s^4 + 4s^2} = \frac{1}{2}\left(\frac{1}{s^2} - \frac{1}{s^2 + 4}\right)$$

Using the inverses of familiar transforms of $f(t) = t$ and $f(t) = \sin 2t$, we see

$$\mathcal{L}^{-1}[F(s)] = \frac{1}{2}\left(t - \frac{1}{2}\sin 2t\right)$$

(c) Given the function

$$F(s) = \frac{4se^{-2\pi s}}{(s^2 + 2s + 5)(s^2 + 9)}$$

we see that the presence of $e^{-2\pi s}$ implies the inverse of the second shifting property will be used. As is now customary, we first use partial fractions to break the rational part of $F(s)$ into a sum of simpler expressions. Doing so and completing the square to re-express $s^2 + 2s + 5$,

$$\frac{4s}{(s^2 + 2s + 5)(s^2 + 9)} = \frac{1}{13}\left[-\frac{4s - 18}{s^2 + 9} + \frac{4s - 10}{s^2 + 2s + 5}\right]$$

$$= \frac{1}{13}\left[\frac{-4s}{s^2 + 9} + \frac{18}{s^2 + 9} + \frac{4(s+1)}{(s+1)^2 + 4} - \frac{14}{(s+1)^2 + 4}\right]$$

Letting $G(s) = 4s/[(s^2 + 2s + 5)(s^2 + 9)]$, it now follows from familiar rules with inverse transforms and the first shifting property that

$$\mathcal{L}^{-1}[G(s)] = -\frac{4}{13}\cos 3t + \frac{18}{39}\sin 3t + \frac{4}{13}e^{-t}\cos 2t - \frac{7}{13}e^{-t}\sin 2t$$

Finally, since $F(s) = e^{-2\pi s}G(s)$, the second shifting property implies

$$\mathcal{L}^{-1}[F(s)] = u(t - 2\pi)\left[-\frac{4}{13}\cos 3(t - 2\pi) + \frac{6}{13}\sin 3(t - 2\pi)\right] + u(t - 2\pi)\left[\frac{4}{13}e^{-(t-2\pi)}\cos 2(t - 2\pi) - \frac{7}{13}e^{-(t-2\pi)}\sin 2(t - 2\pi)\right]$$

The $2\pi$ shift in each of the sine and cosine functions can be removed; for instance, $\cos 3(t - 2\pi) = \cos 3t$. Doing so throughout shows that

$$\mathcal{L}^{-1}[F(s)] = u(t - 2\pi)\left[-\frac{4}{13}\cos 3t + \frac{6}{13}\sin 3t + \frac{4}{13}e^{-(t-2\pi)}\cos 2t - \frac{7}{13}e^{-(t-2\pi)}\sin 2t\right]$$
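An inverse transform found by partial fractions can be sanity-checked numerically by transforming the answer forward again. The sketch below does this for part (b), comparing a Simpson's-rule approximation of $\int_0^\infty f(t)e^{-st}\,dt$ with $F(s) = 2/(s^4 + 4s^2)$ at a few sample values of $s$. The function names, the truncation point `upper`, and the step count are illustrative choices made here, not part of the text.

```python
import math

def laplace_numeric(f, s, upper=60.0, n=60000):
    """Approximate the integral of f(t)*exp(-s*t) on [0, upper] by composite Simpson's rule.

    Assumes s > 0 is large enough that the tail beyond `upper` is negligible.
    """
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = upper / n
    total = f(0.0) + f(upper) * math.exp(-s * upper)
    for i in range(1, n):
        t = i * h
        weight = 4 if i % 2 else 2
        total += weight * f(t) * math.exp(-s * t)
    return total * h / 3

def f_b(t):
    """Candidate inverse transform from part (b): t/2 - sin(2t)/4."""
    return 0.5 * t - 0.25 * math.sin(2 * t)

def F_b(s):
    """The given transform from part (b)."""
    return 2 / (s**4 + 4 * s**2)

for s in (1.0, 2.0, 3.5):
    # The two columns should agree to many decimal places.
    print(s, laplace_numeric(f_b, s), F_b(s))
```

The same check applies to parts (a) and (c) once the unit step factor is coded as a function that vanishes before the shift.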

There are certainly other properties of the inverse Laplace transform that we could study. For example, theorem 5.3.4 in inverse form allows us to say that if $\mathcal{L}^{-1}[F(s)] = f(t)$ and $f(0) = 0$, then

$$\mathcal{L}^{-1}[sF(s)] = f'(t) \tag{5.6.1}$$

While results like this are theoretically interesting and can occasionally enable us to determine inverse transforms in alternate ways, they are less useful in pragmatic terms when we think of our overarching goal: using Laplace transforms to solve initial-value problems. Indeed, our work throughout this chapter has given us a good overview of how Laplace transforms work, especially the role they play in solving initial-value problems.

Of course, there are also many forcing functions we have not discussed for which Laplace transforms may be taken. There are books that contain lengthy tables of Laplace transforms and inverse transforms that we could, if necessary, consult. But because of the technology available to us, these tables have essentially been rendered obsolete. Most computer algebra systems are fully capable of computing Laplace transforms and their inverses, so we choose not to study methods for these more difficult calculations. The next example demonstrates one such function $F(s)$ which is beyond the methods we have developed but that can easily be handled by a computer algebra system.

Example 5.6.2

Find the inverse Laplace transform of

$$F(s) = \frac{9}{(s^2 + 1)^2(s^2 + 4)^2}$$

Solution. The partial fraction decomposition of $F(s)$ is

$$F(s) = -\frac{2/3}{s^2 + 1} + \frac{1}{(s^2 + 1)^2} + \frac{2/3}{s^2 + 4} + \frac{1}{(s^2 + 4)^2} \tag{5.6.2}$$

Two of the terms in (5.6.2) are straightforward to invert, but the two involving squares of irreducible quadratic terms are not among familiar functions from our previous work. In the following subsection, we demonstrate how to use Maple to compute the inverse transform of such functions. These computations reveal that

$$\mathcal{L}^{-1}\left[\frac{1}{(s^2 + 1)^2}\right] = \frac{1}{2}\sin t - \frac{1}{2}t\cos t$$

and

$$\mathcal{L}^{-1}\left[\frac{1}{(s^2 + 4)^2}\right] = \frac{1}{16}\sin 2t - \frac{1}{8}t\cos 2t$$
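The first of these two inverse transforms can also be confirmed by hand by transforming the claimed answer forward. The sketch below assumes the standard multiplication-by-$t$ rule $\mathcal{L}[t\,f(t)] = -F'(s)$, which is a well-known property but is not one this section relies on.

```latex
\mathcal{L}[t\cos t] = -\frac{d}{ds}\left[\frac{s}{s^2+1}\right]
  = \frac{s^2 - 1}{(s^2+1)^2}

\mathcal{L}\left[\tfrac{1}{2}\sin t - \tfrac{1}{2}t\cos t\right]
  = \frac{1}{2}\left[\frac{1}{s^2+1} - \frac{s^2-1}{(s^2+1)^2}\right]
  = \frac{1}{2}\cdot\frac{(s^2+1) - (s^2-1)}{(s^2+1)^2}
  = \frac{1}{(s^2+1)^2}
```

The same steps with $\cos 2t$ in place of $\cos t$ give the second formula.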

From this work and (5.6.2), we find

$$\mathcal{L}^{-1}[F(s)] = -\frac{2}{3}\sin t + \frac{1}{2}\sin t - \frac{1}{2}t\cos t + \frac{1}{3}\sin 2t + \frac{1}{16}\sin 2t - \frac{1}{8}t\cos 2t$$

$$= -\frac{1}{6}\sin t - \frac{1}{2}t\cos t + \frac{19}{48}\sin 2t - \frac{1}{8}t\cos 2t$$

Further discussion of how to use Maple to compute transforms and inverse transforms follows in the next subsection.

5.6.1 Laplace transforms and inverse transforms using Maple

As we have noted, while we have computed Laplace transforms for a range of functions, there are many more examples we have not considered. Moreover, even for familiar functions, certain combinations of them can lead to tedious, involved calculations. Computer algebra systems such as Maple are fully capable of computing Laplace transforms of functions, as well as inverse transforms. Here we demonstrate the syntax required in the solution of the initial-value problem from example 5.5.4:

$$y'' + 4y' + 13y = 2u(t - \pi)\sin 3t, \quad y(0) = 1, \ y'(0) = 0 \tag{5.6.3}$$

To begin, we load the inttrans package in Maple.

> with(inttrans);

If, for example, we desire to use Maple to compute the Laplace transform of $2u(t - \pi)\sin 3t$, we use the syntax

> laplace(2*Heaviside(t-Pi)*sin(3*t),t,s);

This command results in the output

$$-\frac{6e^{-s\pi}}{s^2 + 9}$$

which is precisely the transform we expect. After computing by hand the transform of the left-hand side of (5.6.3) and solving for $Y(s)$, as shown in detail in example 5.5.4, we have

$$Y(s) = \frac{s + 4}{s^2 + 4s + 13} - 2e^{-\pi s}\frac{3}{(s^2 + 9)(s^2 + 4s + 13)}$$

Here, we may use Maple's invlaplace command to determine $\mathcal{L}^{-1}[Y(s)]$. While we could choose to do so all at once, for simplicity of display we do so in two steps. First,

> invlaplace((s+4)/(s^2 + 4*s + 13),s,t);

results in the output

$$\frac{1}{3}e^{(-2t)}(3\cos(3t) + 2\sin(3t)) \tag{5.6.4}$$

Similarly, for the second term in $Y(s)$, we compute

> invlaplace(2*exp(-Pi*s)*3/((s^2 + 9)*(s^2 + 4*s + 13)),s,t);

Maple produces the output

$$\frac{1}{20}\,\mathrm{Heaviside}(t - \pi)\left(3\cos(3t) - \sin(3t) - e^{(-2t + 2\pi)}(3\cos(3t) + \sin(3t))\right) \tag{5.6.5}$$

which corresponds to our work in example 5.5.4. Because the second term of $Y(s)$ carries a minus sign, subtracting (5.6.5) from (5.6.4) gives precisely the solution to the IVP. Note that in computing the inverse transform (5.6.5), Maple has implicitly executed the partial fraction decomposition of the expression

$$\frac{3}{(s^2 + 9)(s^2 + 4s + 13)}$$

If we wish to find this explicitly, we can use the command

> convert(3/((s^2 + 9)*(s^2 + 4*s + 13)), parfrac, s);

which produces the output

$$\frac{1}{40}\,\frac{3 - 3s}{s^2 + 9} + \frac{1}{40}\,\frac{9 + 3s}{s^2 + 4s + 13}$$

In general, we see that to compute the Laplace transform of $f(t)$ in Maple we use the syntax

> laplace(f(t),t,s);

whereas to compute the inverse transform of $F(s)$, we enter

> invlaplace(F(s),s,t);
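Whatever tool produces an inverse transform, the result can be checked against the original initial-value problem. The following sketch (plain Python rather than Maple, with names chosen here for illustration) assembles the candidate solution of (5.6.3) as the inverse transform of the first term of $Y(s)$ minus that of the second, with the step-function part carrying the coefficient $1/20$ that the partial fraction decomposition yields. It then estimates $y'' + 4y' + 13y - 2u(t - \pi)\sin 3t$ with central differences at points on either side of $t = \pi$.

```python
import math

def heaviside(t):
    """Unit step u(t); the value at t = 0 is a convention and is not used below."""
    return 1.0 if t >= 0 else 0.0

def y(t):
    """Candidate solution of (5.6.3): free response minus the delayed forced response."""
    free = math.exp(-2 * t) * (3 * math.cos(3 * t) + 2 * math.sin(3 * t)) / 3
    forced = heaviside(t - math.pi) * (
        3 * math.cos(3 * t) - math.sin(3 * t)
        - math.exp(-2 * (t - math.pi)) * (3 * math.cos(3 * t) + math.sin(3 * t))
    ) / 20
    return free - forced

def residual(t, h=1e-4):
    """Finite-difference estimate of y'' + 4y' + 13y - 2u(t - pi) sin 3t at time t."""
    d1 = (y(t + h) - y(t - h)) / (2 * h)
    d2 = (y(t + h) - 2 * y(t) + y(t - h)) / h**2
    rhs = 2 * heaviside(t - math.pi) * math.sin(3 * t)
    return d2 + 4 * d1 + 13 * y(t) - rhs

print("y(0) =", y(0.0))
for t in (1.0, 4.0, 6.0):
    # Residuals should be near zero, up to finite-difference error.
    print(t, residual(t))
```

Sample points should stay away from $t = \pi$ itself, where the forcing switches on and the difference quotients straddle the change.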

Exercises 5.6

In exercises 1–9, find the inverse Laplace transform of the given function $F(s)$ using familiar techniques or a computer algebra system.

1. $F(s) = \dfrac{2s}{(s + 3)^2}$
2. $F(s) = \dfrac{4}{(s^2 - 4)^2}$
3. $F(s) = \dfrac{1}{s^2(s - 2)}$
4. $F(s) = \dfrac{2}{(s^2 - 1)^2(s^2 + 1)}$
5. $F(s) = \dfrac{s^2 + 1}{(s + 1)^2(s^2 + 4)}$
6. $F(s) = \dfrac{e^{-s}}{s^2(s - 2)}$
7. $F(s) = e^{-3s}\,\dfrac{2}{(s^2 - 1)^2(s^2 + 1)}$
8. $F(s) = \dfrac{5s^2 + 20}{s(s - 1)(s^2 - 5s + 4)}$
9. $F(s) = e^{-\pi s}\,\dfrac{5s^2 + 20}{s(s - 1)(s^2 - 5s + 4)}$

In exercises 10–22, solve the stated initial-value problem using Laplace transforms (using a computer algebra system as necessary). Sketch a plot of each solution.

10. $y' + y = e^{-t} + te^{-t}$, $y(0) = 1$
11. $y'' + 4y = \sin 2t$, $y(0) = 0$, $y'(0) = 1$
12. $y'' + 4y = \sin 2t + \delta(t - 6)$, $y(0) = 0$, $y'(0) = 1$
13. $y'' + 4y = \sin 2t + \delta(t - 6) + \delta(t - 12)$, $y(0) = 0$, $y'(0) = 1$
14. $y'' + 9y = \cos 3t + t\cos 3t$, $y(0) = 0$, $y'(0) = 1$
15. $y'' + 2y' + 5y = e^{-t}\sin 2t$, $y(0) = 0$, $y'(0) = 1$
16. $y'' + 2y' + 5y = e^{-t}\sin 2t + te^{-t}\sin 2t$, $y(0) = 0$, $y'(0) = 1$
17. $y'' + 2y' + 5y = e^{-t}\sin 2t + u(t - \pi)te^{-t}\sin 2t$, $y(0) = 0$, $y'(0) = 1$
18. $y'' + y' - 2y = 4e^{t} + 1$, $y(0) = 1$, $y'(0) = 0$
19. $y'' + y' - 2y = 4e^{t} + 1 + \delta(t - 3)$, $y(0) = 1$, $y'(0) = 0$
20. $y'' + y' - 2y = 4e^{t} + u(t - 3)$, $y(0) = 1$, $y'(0) = 0$
21. $y'' + 2y' + 5y = e^{-t}\sin 2t + te^{-t}\sin 2t + \delta(t - 5)$, $y(0) = 0$, $y'(0) = 1$
22. $y'' + 2y' + 5y = 13e^{t}\sin t$, $y(0) = 0$, $y'(0) = 0$


5.7 For further study

5.7.1 Laplace transforms of infinite series

If $f(t)$ is a function of exponential order that is analytic⁶ at $t = 0$ with an infinite radius of convergence, then $f(t)$ may be expressed as a power series and also has a Laplace transform. It therefore follows that if

$$f(t) = \sum_{n=0}^{\infty} a_n t^n$$

then its transform is

$$F(s) = \mathcal{L}[f(t)] = \sum_{n=0}^{\infty} a_n \mathcal{L}[t^n] = \sum_{n=0}^{\infty} n!\,a_n \frac{1}{s^{n+1}} \tag{5.7.1}$$
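To see the pattern in (5.7.1) without any convergence questions, it may help to apply it first to a finite sum. The polynomial below is chosen here only for illustration; its coefficients are $a_0 = 1$, $a_1 = 2$, $a_2 = 3$, with all later coefficients zero.

```latex
f(t) = 1 + 2t + 3t^2
\quad\Longrightarrow\quad
F(s) = 0!\cdot 1\cdot\frac{1}{s} + 1!\cdot 2\cdot\frac{1}{s^2} + 2!\cdot 3\cdot\frac{1}{s^3}
     = \frac{1}{s} + \frac{2}{s^2} + \frac{6}{s^3}
```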

We begin by exploring the transforms of some familiar functions through the use of infinite series.

(a) Recall that $f(t) = e^t$ is analytic at $t = 0$ with series expansion

$$e^t = \sum_{n=0}^{\infty} \frac{t^n}{n!} = 1 + t + \frac{t^2}{2!} + \frac{t^3}{3!} + \cdots \tag{5.7.2}$$

By taking the Laplace transform of the series (5.7.2) term-wise,⁷ show that

$$\mathcal{L}[e^t] = \sum_{n=0}^{\infty} \frac{1}{s^{n+1}} \tag{5.7.3}$$

Then, recognize (5.7.3) as a geometric series to show that

$$\mathcal{L}[e^t] = \frac{1}{s - 1}$$

(b) Similarly, use the fact that $f(t) = \sin t$ has the series expansion

$$\sin t = t - \frac{t^3}{3!} + \frac{t^5}{5!} - \cdots$$

to show using infinite series that

$$\mathcal{L}[\sin t] = \frac{1}{s^2 + 1}$$

In addition, develop the Laplace transform of $f(t) = \cos t$ using the series expansion $\cos t = 1 - t^2/2! + t^4/4! - \cdots$.

While power series expansions of such familiar functions as $e^t$, $\sin t$, and $\cos t$ are important and offer a different perspective on the development of the transforms of these functions, power series are even more useful for working with functions that are more complicated. For example, if we seek the transform of

$$f(t) = \frac{e^{-t} - 1}{t} \tag{5.7.4}$$

none of the methods we have previously discussed apply. However, standard techniques⁸ with infinite series may be used to address functions such as (5.7.4).

(c) Use the standard power series expansion for $e^t$ to show that $f(t) = (e^{-t} - 1)/t$ has the series expansion

$$\frac{e^{-t} - 1}{t} = -1 + \frac{t}{2!} - \frac{t^2}{3!} + \frac{t^3}{4!} - \cdots = \sum_{n=1}^{\infty} \frac{(-1)^n}{n!}t^{n-1}$$

Then, compute the Laplace transform of the series expression to show that

$$\mathcal{L}\left[\frac{e^{-t} - 1}{t}\right] = -\frac{1}{s} + \frac{1}{2s^2} - \frac{1}{3s^3} + \cdots \tag{5.7.5}$$

(d) Even though the Laplace transform of an analytic function will result in an infinite sum involving negative powers of $s$, sometimes we can recognize the transform as a familiar function. To see this in (5.7.5), use the known series expansion

$$\ln(1 + x) = x - \frac{1}{2}x^2 + \frac{1}{3}x^3 - \cdots$$

and the substitution $x = 1/s$ to show that

$$\mathcal{L}\left[\frac{e^{-t} - 1}{t}\right] = -\ln\left(1 + \frac{1}{s}\right)$$

(e) From the standard series expansion for the function $\sin t$, determine the Taylor series of

$$f(t) = \frac{\sin t}{t} \tag{5.7.6}$$

and hence compute the Laplace transform of (5.7.6). Then, use the expansion

$$\arctan x = x - \frac{1}{3}x^3 + \frac{1}{5}x^5 - \frac{1}{7}x^7 + \cdots$$

and an appropriate substitution to show that

$$\mathcal{L}\left[\frac{\sin t}{t}\right] = \arctan\frac{1}{s}$$

(f) Use series techniques to show that

$$\mathcal{L}\left[\frac{\cos t - 1}{t}\right] = -\frac{1}{2}\ln\left(1 + \frac{1}{s^2}\right)$$

6 More on power series expansions of functions and the meaning of terms such as "analytic" may be found in section 8.2.
7 While the Laplace transform of a finite sum is the sum of the Laplace transforms of the individual terms, it is not obvious that this property holds for infinite sums. The formal justification that this is valid in what follows is beyond the scope of this text; the reader may assume that this step is valid, and proceed as directed.
8 A review of the development of power series of functions can be found in section 8.2.

5.7.2 Laplace transforms of periodic forcing functions

Nonhomogeneous differential equations often involve periodic forcing functions. In section 4.5, we considered the effects of the forcing function $f(t) = \sin \omega t$ in connection with the natural frequency of a system. More generally, here we examine periodic forcing functions that are piecewise continuous. Such functions satisfy the relationship that for some value of $a$,

$$f(t) = f(t + a) = f(t + 2a) = f(t + 3a) = \cdots = f(t + na) = \cdots \tag{5.7.7}$$

An example of such a function is shown in figure 5.13. Taking the Laplace transform of such a function $f$, we may write the transform as the infinite sum of integrals

$$\mathcal{L}[f(t)] = \int_0^{\infty} f(t)e^{-st}\,dt = \int_0^{a} f(t)e^{-st}\,dt + \int_a^{2a} f(t)e^{-st}\,dt + \int_{2a}^{3a} f(t)e^{-st}\,dt + \cdots \tag{5.7.8}$$

Figure 5.13 A periodic function with period $a$ that is piecewise continuous.

(a) Using the change of variables $t = \tau + a$ in the second integral, $t = \tau + 2a$ in the third, and so on, show that

$$\mathcal{L}[f(t)] = \int_0^{a} f(t)e^{-st}\,dt + \int_0^{a} f(\tau + a)e^{-s(\tau + a)}\,d\tau + \int_0^{a} f(\tau + 2a)e^{-s(\tau + 2a)}\,d\tau + \cdots \tag{5.7.9}$$

(b) By replacing the integration variable $\tau$ with $t$ in (5.7.9), show that

$$\mathcal{L}[f(t)] = \left[1 + e^{-as} + e^{-2as} + \cdots\right]\int_0^{a} f(t)e^{-st}\,dt \tag{5.7.10}$$

Then, use the fact that the infinite series in (5.7.10) is geometric in order to conclude

$$\mathcal{L}[f(t)] = \frac{1}{1 - e^{-as}}\int_0^{a} f(t)e^{-st}\,dt \tag{5.7.11}$$

(c) Use (5.7.11) to determine the Laplace transform of the square wave function shown in figure 5.14. (The vertical lines shown in the graph are not actually part of the function's graph; indeed, $f$ is piecewise constant with value 3 on $[0, 2)$ and value $-3$ on $[2, 4)$, and so on.) In particular, show that

$$\mathcal{L}[f(t)] = \frac{3}{s}\cdot\frac{1 - e^{-2s}}{1 + e^{-2s}}$$

where $f(t)$ is the function pictured in figure 5.14.

Figure 5.14 A square wave with amplitude 3 and period 4.
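The formula (5.7.11) can be tested numerically before it is used. The sketch below, with helper names invented here for illustration, evaluates the transform of the square wave in three ways: through the one-period integral in (5.7.11) with $a = 4$, through the defining integral truncated after many half-periods, and through the closed form stated in part (c).

```python
import math

def simpson(g, lo, hi, n=1000):
    """Composite Simpson's rule for the integral of g over [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

def via_one_period(s):
    """Right-hand side of (5.7.11) for the square wave (value 3 on [0,2), -3 on [2,4))."""
    piece_up = simpson(lambda t: 3.0 * math.exp(-s * t), 0.0, 2.0)
    piece_down = simpson(lambda t: -3.0 * math.exp(-s * t), 2.0, 4.0)
    return (piece_up + piece_down) / (1.0 - math.exp(-4.0 * s))

def via_truncated_integral(s, half_periods=60):
    """Defining integral, cut off at t = 2*half_periods and summed piece by piece."""
    total = 0.0
    for k in range(half_periods):
        value = 3.0 if k % 2 == 0 else -3.0
        total += simpson(lambda t: value * math.exp(-s * t), 2.0 * k, 2.0 * k + 2.0)
    return total

def closed_form(s):
    """The expression claimed in part (c)."""
    return (3.0 / s) * (1.0 - math.exp(-2.0 * s)) / (1.0 + math.exp(-2.0 * s))

for s in (0.5, 1.0, 2.0):
    print(s, via_one_period(s), via_truncated_integral(s), closed_form(s))
```

Integrating each constant piece separately keeps Simpson's rule away from the jumps at $t = 2, 4, 6, \ldots$, where a single rule applied across the discontinuity would lose accuracy.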


(d) Consider the periodic function with period $2\pi$ given by

$$f(t) = \begin{cases} \sin t, & \text{if } 0 < t < \pi \\ 0, & \text{if } \pi < t < 2\pi \end{cases}$$

This function is called the half-rectified sine wave since it consists only of the top half of the standard sine function. Sketch a graph of this function and show that its Laplace transform is

$$\mathcal{L}[f(t)] = \frac{1 + e^{-\pi s}}{(1 - e^{-2\pi s})(s^2 + 1)}$$

(e) Let a slightly damped spring-mass system be given with $m = 1$, $c = 0.02$, and $k = 25$, and be driven by a square-wave periodic forcing function $f(t)$ with amplitude 5 and period $2\pi$. We will use Laplace transforms to solve the initial-value problem that governs this system under the assumption that the system starts from rest.

(i) The stated problem is modeled by the initial-value problem

$$y'' + 0.02y' + 25y = f(t), \quad y(0) = 0, \ y'(0) = 0$$

Take Laplace transforms to show that $Y(s) = \mathcal{L}[y(t)]$ must satisfy the equation

$$Y(s) = \frac{F(s)}{s^2 + 0.02s + 25} \tag{5.7.12}$$

where $F(s) = \mathcal{L}[f(t)]$.

(ii) While we have learned in (c) how to write the transform of a square wave function without using infinite series in its expression, it turns out for this problem that a series expansion is necessary for finding the inverse transform when solving the IVP. By writing the square wave function given in this problem in the form

$$f(t) = 5u(t) - 10u(t - \pi) + 10u(t - 2\pi) - 10u(t - 3\pi) + \cdots$$

show that

$$F(s) = \mathcal{L}[f(t)] = \frac{5}{s}\left[1 - 2e^{-\pi s} + 2e^{-2\pi s} - 2e^{-3\pi s} + \cdots\right] \tag{5.7.13}$$

(iii) Explain why

$$\frac{1}{s^2 + 0.02s + 25} \approx \frac{1}{(s + 0.01)^2 + 5^2} \tag{5.7.14}$$

(iv) Combine (5.7.12), (5.7.13), and (5.7.14) in order to conclude that

$$y(t) = \mathcal{L}^{-1}[Y(s)] = \mathcal{L}^{-1}\left[\frac{5}{s[(s + 0.01)^2 + 5^2]}\left[1 - 2e^{-\pi s} + 2e^{-2\pi s} - \cdots\right]\right] \tag{5.7.15}$$

Explain why we have to find the inverse transform in (5.7.15) term-by-term.

(v) Compute the inverse transform of the first term

$$y_1(t) = \mathcal{L}^{-1}\left[\frac{5}{s[(s + 0.01)^2 + 5^2]}\right]$$

in (5.7.15) given the partial fraction decomposition

$$\frac{5}{s[(s + 0.01)^2 + 5^2]} = \frac{0.2}{s} - \frac{0.2s + 0.004}{(s + 0.01)^2 + 5^2}$$

(Hint: $0.2s + 0.004 = 0.2(s + 0.01) + 0.002$.) Conclude that

$$y_1(t) = 0.2 - e^{-0.01t}(0.2\cos 5t + 0.0004\sin 5t) \tag{5.7.16}$$

(vi) Compute the inverse transform of the second term

$$y_2(t) = \mathcal{L}^{-1}\left[-2e^{-\pi s}\frac{5}{s[(s + 0.01)^2 + 5^2]}\right]$$

in (5.7.15) using (5.7.16) and the second shifting property. Using the fact that $\cos 5(t - \pi) = -\cos 5t$ and $\sin 5(t - \pi) = -\sin 5t$, conclude that

$$y_2(t) = -2u(t - \pi)\left\{0.2 + e^{-0.01(t - \pi)}(0.2\cos 5t + 0.0004\sin 5t)\right\} = -2u(t - \pi)\left\{0.2 + e^{0.01\pi}[0.2 - y_1(t)]\right\} \tag{5.7.17}$$

(vii) Compute the inverse transform of the third term

$$y_3(t) = \mathcal{L}^{-1}\left[2e^{-2\pi s}\frac{5}{s[(s + 0.01)^2 + 5^2]}\right]$$

in (5.7.15) using (5.7.16) and the second shifting property. Using the fact that $\cos 5(t - 2\pi) = \cos 5t$ and $\sin 5(t - 2\pi) = \sin 5t$, conclude that

$$y_3(t) = 2u(t - 2\pi)\left\{0.2 - e^{-0.01(t - 2\pi)}(0.2\cos 5t + 0.0004\sin 5t)\right\} = 2u(t - 2\pi)\left\{0.2 - e^{0.02\pi}[0.2 - y_1(t)]\right\} \tag{5.7.18}$$

(viii) So far, we have found the formula for $y(t)$ valid up to $t = 3\pi$. In fact,

$y(t) = y_1(t)$, if $0 < t < \pi$
$y(t) = y_1(t) + y_2(t)$, if $\pi < t < 2\pi$
$y(t) = y_1(t) + y_2(t) + y_3(t)$, if $2\pi < t < 3\pi$

Using $y_1(t) = 0.2 - e^{0\pi}[0.2 - y_1(t)]$, together with (5.7.17) and (5.7.18), plus the fact that on $2\pi < t < 3\pi$ we know $u(t - \pi) = u(t - 2\pi) = 1$, show that on $2\pi < t < 3\pi$,

$$y(t) = 0.2 - [0.2 - y_1(t)]\left[1 + 2e^{0.01\pi} + 2e^{0.02\pi}\right]$$

(ix) Using the patterns established in (5.7.17) and (5.7.18), explain why

$$y(t) = y_1(t) + y_2(t) + \cdots + y_{n+1}(t) = (-1)^n\,0.2 - [0.2 - y_1(t)]\left[1 + 2e^{0.01\pi} + \cdots + 2e^{0.01n\pi}\right] \tag{5.7.19}$$

is valid for $n\pi < t < (n + 1)\pi$ for any positive integer $n$.

(x) Letting $z(t) = e^{-0.01t}(\cos 5t + 0.002\sin 5t)$, so that $0.2 - y_1(t) = z(t)/5$, and using the fact that $(1 - x^{n+1})/(1 - x) = 1 + x + x^2 + \cdots + x^n$, show that on $n\pi < t < (n + 1)\pi$,

$$y(t) = (-1)^n\frac{1}{5} + \frac{1}{5}z(t) - \frac{2}{5(1 - e^{0.01\pi})}z(t) + \frac{2e^{(n+1)0.01\pi}}{5(1 - e^{0.01\pi})}z(t) \tag{5.7.20}$$

Explain why, as $t \to \infty$, $y(t)$ remains bounded even though the factor $e^{(n+1)0.01\pi}$ grows without bound, and why the resulting oscillation is nonetheless much larger than the amplitude of the forcing function would suggest. Using a computer algebra system, graph the solution function on several consecutive large intervals of width $\pi$, such as $[200\pi, 201\pi]$, $[201\pi, 202\pi]$, etc., and discuss the behavior of the system.
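The bookkeeping in parts (viii) through (x) is easy to get wrong by a sign or an index, so a numerical cross-check is worthwhile. The sketch below, with function names chosen here for illustration, sums the shifted copies of $y_1$ directly, as the term-by-term inversion of (5.7.15) prescribes, and compares the result with a closed form obtained by summing the geometric series in (5.7.19). It takes $y_1(t) = 0.2 - e^{-0.01t}(0.2\cos 5t + 0.0004\sin 5t)$ as its starting point.

```python
import math

def y1(t):
    """First term of the solution: response to the constant force 5 switched on at t = 0."""
    return 0.2 - math.exp(-0.01 * t) * (0.2 * math.cos(5 * t) + 0.0004 * math.sin(5 * t))

def y_direct(t):
    """Sum y1(t) + sum over k >= 1 of 2(-1)^k u(t - k*pi) y1(t - k*pi), term by term."""
    n = int(t // math.pi)  # number of switches of the square wave that have occurred
    total = y1(t)
    for k in range(1, n + 1):
        total += 2 * (-1) ** k * y1(t - k * math.pi)
    return total

def y_closed(t):
    """Closed form on n*pi < t < (n+1)*pi from the geometric sum in (5.7.19)."""
    n = int(t // math.pi)
    x = math.exp(0.01 * math.pi)
    z = math.exp(-0.01 * t) * (math.cos(5 * t) + 0.002 * math.sin(5 * t))
    geometric = (1 - x ** (n + 1)) / (1 - x)  # 1 + x + ... + x^n
    return (-1) ** n / 5 + z / 5 - 2 * z * geometric / 5

samples = [(200 + j / 50) * math.pi for j in range(1, 50)]
print(max(abs(y_direct(t) - y_closed(t)) for t in samples))  # agreement of the two forms
print(max(abs(y_closed(t)) for t in samples))                # size of the late-time oscillation
```

Sampling strictly inside $(200\pi, 201\pi)$ avoids the switching instants, where the count $n$ changes.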

5.7.3 Laplace transforms of systems

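Recall that the standard initial-value problem for a system of first-order DEs is given in matrix form by

$$\mathbf{x}' = A\mathbf{x} + \mathbf{f}(t), \quad \mathbf{x}(0) = \mathbf{b} \tag{5.7.21}$$

In the event that $\mathbf{f}$ is a continuous function, the variation of parameters technique applies. But, if $\mathbf{f}$ is a step function or otherwise piecewise defined, our earlier methods fail, and Laplace transforms may be used. Regardless, the Laplace transform can be a useful tool for systems for many of the same reasons it is for single DEs, such as the fact that it treats all linear systems in a uniform manner and incorporates the initial conditions immediately into the process of finding the solution.

Since each of the three terms in the equation in (5.7.21) is a vector, Laplace transforms may be applied component-wise. For example,

$$\mathcal{L}[\mathbf{x}'(t)] = \mathcal{L}\begin{bmatrix} x_1'(t) \\ x_2'(t) \end{bmatrix} = \begin{bmatrix} \mathcal{L}[x_1'(t)] \\ \mathcal{L}[x_2'(t)] \end{bmatrix} = \begin{bmatrix} sX_1(s) - x_1(0) \\ sX_2(s) - x_2(0) \end{bmatrix} = s\mathbf{X}(s) - \mathbf{x}(0)$$

where we let $\mathbf{X}(s)$ denote the Laplace transform of the vector function $\mathbf{x}(t)$. Letting $\mathbf{F}(s)$ be the transform of the vector $\mathbf{f}(t)$, we may deduce from (5.7.21) and theorem 5.3.4 that

$$s\mathbf{X}(s) - \mathbf{x}(0) = A\mathbf{X}(s) + \mathbf{F}(s) \tag{5.7.22}$$

(a) Solve (5.7.22) for $\mathbf{X}(s)$ to show that

$$\mathbf{X}(s) = Z(s)(\mathbf{F}(s) + \mathbf{b}) \tag{5.7.23}$$

where $Z(s) = (sI - A)^{-1}$ and $\mathbf{b} = \mathbf{x}(0)$. Explain why we must assume that $s$ is not an eigenvalue of $A$ when we write $\mathbf{X}(s)$ in the form (5.7.23).

(b) Next we solve an example system in step-by-step fashion. Consider the IVP

$$\mathbf{x}' = \begin{bmatrix} 1 & 0 \\ -1 & 3 \end{bmatrix}\mathbf{x} + \begin{bmatrix} e^{2t} \\ 3 \end{bmatrix}, \quad \mathbf{x}(0) = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \tag{5.7.24}$$

(i) Compute $\mathbf{F}(s)$ and hence show that

$$\mathbf{F}(s) + \mathbf{x}(0) = \begin{bmatrix} \dfrac{1}{s - 2} + 1 \\[2mm] \dfrac{3}{s} \end{bmatrix}$$

(ii) Use the given coefficient matrix $A$ to compute $Z(s) = (sI - A)^{-1}$ and conclude⁹ that

$$Z(s) = \frac{1}{(s - 1)(s - 3)}\begin{bmatrix} s - 3 & 0 \\ -1 & s - 1 \end{bmatrix}$$

(iii) Compute $\mathbf{X}(s)$ using (5.7.23) to show that

$$\mathbf{X}(s) = \frac{1}{s(s - 2)}\begin{bmatrix} s \\ 2 \end{bmatrix}$$

(iv) Finally, use the inverse Laplace transform component-wise on $\mathbf{X}(s)$ (using standard inverse transform techniques) to find

$$\mathbf{x}(t) = \mathcal{L}^{-1}[\mathbf{X}(s)] = \begin{bmatrix} e^{2t} \\ e^{2t} - 1 \end{bmatrix}$$

(c) Use Laplace transforms and the solution technique outlined in (b) above to find the solution of each system of IVPs below.

(i) $\mathbf{x}' = \begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix}\mathbf{x} + \begin{bmatrix} \cos t \\ -\sin t \end{bmatrix}, \quad \mathbf{x}(0) = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$

(ii) $\mathbf{x}' = \begin{bmatrix} 0 & 2 \\ 1 & -1 \end{bmatrix}\mathbf{x} + \begin{bmatrix} \sin t \\ \sin t \end{bmatrix}, \quad \mathbf{x}(0) = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$

(iii) $\mathbf{x}' = \begin{bmatrix} 2 & 1 \\ 3 & 0 \end{bmatrix}\mathbf{x} + \begin{bmatrix} t \\ t \end{bmatrix}, \quad \mathbf{x}(0) = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$

(iv) $\mathbf{x}' = \begin{bmatrix} 2 & 1 \\ 3 & 0 \end{bmatrix}\mathbf{x} + \begin{bmatrix} t \\ t \end{bmatrix}, \quad \mathbf{x}(0) = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$

(v) $\mathbf{x}' = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -1 \end{bmatrix}\mathbf{x} + \begin{bmatrix} e^{t} \\ 1 \\ 0 \end{bmatrix}, \quad \mathbf{x}(0) = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$

(vi) $\mathbf{x}' = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -1 \end{bmatrix}\mathbf{x} + \begin{bmatrix} e^{t} \\ 1 \\ 0 \end{bmatrix}, \quad \mathbf{x}(0) = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$

9 Recall the shortcut $\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \dfrac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$.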

6 Nonlinear systems of differential equations

6.1 Motivating problems

In our studies so far, we have seen that a variety of interesting physical situations can be modeled by linear systems of differential equations. Moreover, nearly all linear systems may be solved explicitly. But, many important phenomena are nonlinear in nature; in order to motivate our upcoming work with such systems, we consider two applications where nonlinear systems of equations arise.

A pendulum is a mesmerizing phenomenon. Whether on a grandfather clock or in the hand of a hypnotist, there is something fascinating about its motion. It turns out that a nonlinear second-order differential equation (and hence a system of nonlinear first-order equations) models its behavior. To develop this differential equation, let a rigid arm of length $L$ be attached to a point from which it may swing freely. In this discussion, we will assume for simplicity that no damping is present. Similarly, to simplify the physics we assume that the arm itself has negligible mass. Finally, we attach a mass $m$ to the end of the rigid arm and set the pendulum in motion, as shown in figure 6.1.

We are interested in how the mass travels along a circular arc once the mass is set in motion. The quantities of interest to us are noted in figure 6.1; the variable $\theta$ represents the angle (in radians) the arm makes with the vertical axis and $s$ denotes the displacement of the center of the mass along the circular arc. Because the mass is traveling along a circular arc, it follows that $s = L\theta$. Noting that both $s$ and $\theta$ are implicit functions of $t$, we can differentiate with respect to $t$ and find $s'(t) = L\theta'(t)$ and $s''(t) = L\theta''(t)$. In particular, the velocity of the center of the mass along the arc is $s'(t)$ and its acceleration is $s''(t)$.

Figure 6.1 A simple pendulum.

Figure 6.2 Component of gravity's force along the pendulum's motion.

Since the acceleration $a(t)$ is given by $a(t) = s''(t)$, we have

$$a(t) = \frac{d^2s}{dt^2} = L\frac{d^2\theta}{dt^2} \tag{6.1.1}$$

Since we have assumed that there is no damping present, once the mass is set in motion the only force acting on the pendulum is gravity. Because we are studying the displacement, velocity, and acceleration of the mass along its path, we must consider the magnitude of the weight $W = mg$ in the direction of motion. From figure 6.2, we see that gravity induces a force of magnitude $W\sin\theta$ along the circular arc. Note, too, that this force opposes the motion of the pendulum, assuming $s'(t)$ is positive.

From Newton's second law, $F = ma$, it now follows that $ma = -mg\sin\theta$, or

$$a(t) = -g\sin\theta(t) \tag{6.1.2}$$

Using the two equivalent expressions for acceleration in (6.1.1) and (6.1.2), it follows that

$$L\frac{d^2\theta}{dt^2} = -g\sin\theta \tag{6.1.3}$$

If we assume that an initial displacement angle $\theta(0) = \theta_0$ and initial angular velocity $\theta'(0) = \theta_0'$ are given, then after rearranging (6.1.3) it follows that $\theta$ satisfies the initial-value problem

$$\theta'' + \frac{g}{L}\sin\theta = 0, \quad \theta(0) = \theta_0, \ \theta'(0) = \theta_0' \tag{6.1.4}$$

Because of the presence of $\sin\theta$ in this equation, this second-order differential equation is nonlinear, which means that none of our previous solution methods apply. If we use the substitution $x_1 = \theta$ and $x_2 = \theta'$ to recast (6.1.4) as a nonlinear system of first-order differential equations, then it turns out that the system has a natural graphical interpretation through its slope field, just as we saw with linear systems of differential equations. Using this substitution, we observe that the pendulum is governed by the system

$$x_1' = x_2$$
$$x_2' = -\frac{g}{L}\sin x_1$$

with initial conditions $x_1(0) = \theta_0$ and $x_2(0) = \theta_0'$. Besides studying the associated slope field, we will also learn that it is possible to approximate this nonlinear system at key points with a linear system to better understand its behavior, particularly at any equilibrium points it may have. In subsequent sections, we will explore these issues in greater detail and return to this example involving the pendulum several times, including an investigation of what happens when friction is present.

In addition to the pendulum, another system of nonlinear differential equations arises in the study of population dynamics. Let us consider a population $W(t)$ of wolves (in hundreds) that prey upon a population $M(t)$ of moose (in hundreds), where $t$ is time measured in years. A good example of such a situation, and one that biologists have studied in detail, occurs on Isle Royale in Lake Superior. On this remote island, wolves are the only predator of moose and moose are essentially the only prey of wolves.

Suppose that in the absence of moose, the wolves would die off at a rate proportional to their own number according to a differential equation such as

$$\frac{dW}{dt} = -0.75W$$

In the presence of moose, however, we expect more of the wolves to be able to survive, and to do so at a rate proportional to the moose–wolf interactions since these can result in food for the wolves. The number of moose–wolf interactions can be modeled by taking the product of $M$ and $W$; only some fraction of such interactions will be beneficial to the wolves. Thus, the wolf population can be assumed to satisfy a differential equation of the form

$$\frac{dW}{dt} = -0.75W + 0.25MW \tag{6.1.5}$$

Likewise, in the absence of wolves, we would expect the number of moose to grow unencumbered (at least in the short term). We might, therefore, have a differential equation like

$$\frac{dM}{dt} = 0.5M$$

But with wolves around, some of the moose will die due to moose–wolf interactions, hence we assume the moose population satisfies an equation like

$$\frac{dM}{dt} = 0.5M - 0.1MW \tag{6.1.6}$$

Equations (6.1.5) and (6.1.6) lead to the system of nonlinear differential equations

$$\frac{dW}{dt} = -0.75W + 0.25MW$$
$$\frac{dM}{dt} = 0.5M - 0.1MW$$

Systems of this form (regardless of the values of the constants) are typically known as predator–prey or Lotka–Volterra equations. Factoring the right-hand side in each equation above, we see that the wolf and moose populations satisfy

$$\frac{dW}{dt} = W(-0.75 + 0.25M)$$
$$\frac{dM}{dt} = M(0.5 - 0.1W)$$

from which it is evident that the system of differential equations has not only the obvious equilibrium point at the origin, but also one at $(W, M) = (5, 3)$. What kind of behavior should we expect for the wolf and moose populations for initial conditions near $(5, 3)$? In particular, is this equilibrium point stable? Are there ways we can approximate this nonlinear system with a linear one? These questions and more are the focus of subsequent sections as we investigate nonlinear systems of DEs.

Our in-depth study of linear systems of differential equations in chapter 3 will prove useful in the study of nonlinear systems: as we see in section 6.2, we can study the graphical behavior of solutions to nonlinear systems in the phase plane by plotting a direction field, just as we did with linear systems. Moreover, in section 6.3 we will study a process by which we can approximate the nonlinear system at a point by a linear system and use our understanding of the behavior of linear systems to make predictions about the nonlinear system.
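The equilibrium claim for the wolf–moose system can be checked by direct evaluation of the right-hand sides. The sketch below does so and then takes a short sequence of small forward steps from a nearby starting point to give a first feel for the motion. The step size, the starting point, and the function names are illustrative choices made here, and the crude stepping scheme is only a stand-in for the numerical methods the text develops later.

```python
def rates(W, M):
    """Right-hand sides of the wolf-moose system: returns (dW/dt, dM/dt)."""
    dW = -0.75 * W + 0.25 * M * W
    dM = 0.5 * M - 0.1 * M * W
    return dW, dM

def euler_path(W, M, dt=0.001, steps=5000):
    """Advance (W, M) by `steps` forward-Euler steps of size dt (5 years by default)."""
    for _ in range(steps):
        dW, dM = rates(W, M)
        W, M = W + dt * dW, M + dt * dM
    return W, M

print(rates(0.0, 0.0))        # both rates vanish at the origin
print(rates(5.0, 3.0))        # both rates vanish (up to rounding) at W = 5, M = 3
print(euler_path(5.5, 3.0))   # populations 5 years after starting slightly off equilibrium
```

Starting with slightly more wolves than the equilibrium value makes $dM/dt$ negative at first, so the moose population initially declines.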


6.2 Graphical behavior of solutions for 2 × 2 nonlinear systems

In our study of single first-order initial-value problems in chapter 2, we learned that every IVP associated with a linear differential equation with sufficiently well-behaved coefficient functions has a unique solution; moreover, we can determine an explicit formula for the solution. As we learned in chapter 3, essentially the same situation holds for linear systems of differential equations; those with constant coefficients and their corresponding IVPs can always be solved. However, in the case when the governing differential equation or system of equations is nonlinear, we are not guaranteed that solutions to initial-value problems exist, nor that they are unique when they do exist. In addition, as we now study nonlinear systems, we will find that even when unique solutions exist, we are usually unable to determine explicit formulas for them. We therefore turn again to graphical and numerical investigations of the qualitative properties of solutions to nonlinear systems in order to understand their short- and long-term behavior.

To begin, let us choose an example through which we can develop intuition. We consider the system given by

$$x_1' = x_2 - x_1^3$$
$$x_2' = x_1 - x_2^3 \tag{6.2.1}$$

If we let

$$\mathbf{x}(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix}$$

and $\mathbf{F} : \mathbb{R}^2 \to \mathbb{R}^2$ be the function defined by $\mathbf{F}(\mathbf{x}) = \mathbf{F}(x_1, x_2) = (x_2 - x_1^3,\ x_1 - x_2^3)$, then it follows that we may view (6.2.1) as having the form

$$\mathbf{x}' = \mathbf{F}(\mathbf{x}) \tag{6.2.2}$$

This is analogous to our work with linear systems of differential equations that may be expressed in the form $\mathbf{x}' = A\mathbf{x}$, where $A$ is a matrix. In that setting, the right-hand side of the system is a linear function of $\mathbf{x}$, but in (6.2.2), $\mathbf{F}(\mathbf{x})$ is not linear. Nonetheless, a graphical interpretation of the system remains both possible and enlightening.

In section 3.4, we discussed the graphical behavior of a vector function. Here, we simply remind ourselves that for the system $\mathbf{x}' = \mathbf{F}(\mathbf{x})$ in (6.2.1), a solution $\mathbf{x}(t)$ is a vector function whose output lies in $\mathbb{R}^2$ and whose graph is the curve that is traced out by the vectors $\mathbf{x}(t)$ at various times $t$. Moreover, the derivative $\mathbf{x}'(t)$ of $\mathbf{x}(t)$ is itself a vector function that indicates the instantaneous velocity of a particle traveling along the curve traced out by $\mathbf{x}(t)$. In particular, scalar multiples of $\mathbf{x}'(t)$ tell us the direction of motion or flow along the solution curve as time increases.


We therefore turn again to direction fields to study the flow of the solution curves through the vector field generated by the system of differential equations. In particular, (6.2.2) indicates how, for any point $(x_1, x_2)$ in the plane, we can easily compute $\mathbf{x}' = \mathbf{F}(x_1, x_2)$ at that point, and hence know the direction of the flow of the solution curve that passes through that point. Using a computer algebra system to execute these computations repeatedly at points sampled throughout the plane, we can view the direction field for the nonlinear system, which is analogous to the direction field for a linear system. A direction field for (6.2.1) is shown in figure 6.3.

The $x_1$–$x_2$ plane is again called the phase plane; the independent variable $t$ remains implicit in the flow, while the behavior of the curve relative to the coordinate axes demonstrates the interrelationship among the components $x_1(t)$ and $x_2(t)$ of the solution $\mathbf{x}(t)$. Sample solution curves, such as those plotted in figure 6.4, are typically called trajectories. In section 6.4 we will learn how to construct trajectories for systems through numerical approximation techniques such as Euler’s method.

From figures 6.3 and 6.4, it appears that the system (6.2.1) has three equilibrium solutions. Specifically, the behavior of trajectories suggests the possibilities of equilibria at $(-1, -1)$, $(0, 0)$, and $(1, 1)$. We can confirm this algebraically by setting $\mathbf{x}' = \mathbf{0}$ and solving the resulting nonlinear system of equations

$$
0 = x_2 - x_1^3 \tag{6.2.3}
$$

$$
0 = x_1 - x_2^3 \tag{6.2.4}
$$

Figure 6.3 The direction field for the system $\mathbf{x}' = \mathbf{F}(\mathbf{x})$ given in (6.2.1), plotted for $x_1$ and $x_2$ each between $-3$ and $3$.


Figure 6.4 The direction field for the system $\mathbf{x}' = \mathbf{F}(\mathbf{x})$ given in (6.2.1) with three trajectories, plotted for $x_1$ and $x_2$ each between $-3$ and $3$.
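The repeated evaluation of $\mathbf{F}$ at sampled points, which is all a direction field requires, can be sketched in a few lines. The following is a minimal Python sketch (Python is a choice made here; the text itself uses Maple), assuming the right-hand side of (6.2.1) is $\mathbf{F}(x_1, x_2) = (x_2 - x_1^3,\ x_1 - x_2^3)$. The helper names `F`, `unit_direction`, and `sample_grid` are hypothetical.

```python
import math

def F(x1, x2):
    """Assumed right-hand side of (6.2.1): x1' = x2 - x1^3, x2' = x1 - x2^3."""
    return (x2 - x1**3, x1 - x2**3)

def unit_direction(x1, x2):
    """Direction of flow at (x1, x2) as a unit vector, or None where F vanishes."""
    u, v = F(x1, x2)
    norm = math.hypot(u, v)
    if norm == 0.0:
        return None  # an equilibrium: there is no direction of flow
    return (u / norm, v / norm)

def sample_grid(lo=-3.0, hi=3.0, n=7):
    """Sample the direction field on an n-by-n grid over [lo, hi] x [lo, hi]."""
    step = (hi - lo) / (n - 1)
    field = {}
    for i in range(n):
        for j in range(n):
            point = (lo + i * step, lo + j * step)
            field[point] = unit_direction(*point)
    return field

field = sample_grid()
# Grid points where F is exactly zero are equilibria of the system.
equilibria = sorted(p for p, d in field.items() if d is None)
print(equilibria)
```

With the default grid the sample points have integer coordinates, so the only points at which no direction is returned are the three candidate equilibria read off from the figures. A plotting tool would draw a short arrow along each returned unit vector.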

Equation (6.2.3) implies that $x_2 = x_1^3$. Substituting this result in (6.2.4), it follows that

$$
0 = x_1 - (x_1^3)^3
$$

Factoring, we see

$$
0 = x_1(1 - x_1^8) = x_1(1 - x_1^4)(1 + x_1^4) = x_1(1 - x_1^2)(1 + x_1^2)(1 + x_1^4)
$$

from which we determine that $x_1 = 0$, $1$, or $-1$. Recalling that $x_2 = x_1^3$, the corresponding $x_2$-values are $x_2 = 0$, $1$, and $-1$, and we have found that the equilibrium points of the system (6.2.1) are indeed $(-1, -1)$, $(0, 0)$, and $(1, 1)$.

Here, we see another distinction between linear and nonlinear systems of differential equations. For a linear system $\mathbf{x}' = A\mathbf{x}$, the search for equilibrium solutions means we must solve $A\mathbf{x} = \mathbf{0}$, which we know has either a unique solution or infinitely many solutions. With nonlinear systems, it is possible that any number of equilibrium solutions exist (from none to infinitely many). Moreover, there are no guarantees that we can even expect to analytically solve the resulting system of nonlinear algebraic equations to find such equilibria.

When we do find equilibrium solutions to a system, it is natural to ask about their stability. For example, for the equilibrium solution $(0, 0)$ to (6.2.1), we might observe from figure 6.3 that the origin seems to exhibit behavior similar to a saddle point and therefore may be unstable. To investigate this further, one option is to see if there is a linear system of differential equations to which we can compare (6.2.1). For $x_1$ and $x_2$ near zero, observe that both $x_1^3$ and $x_2^3$ are extremely small, so that in this region close to the origin it is reasonable for us to say that

$$
\begin{aligned}
x_1' &= x_2 - x_1^3 \approx x_2 \\
x_2' &= x_1 - x_2^3 \approx x_1
\end{aligned} \tag{6.2.5}
$$


In particular, note that the approximate system is linear, and we can write $\mathbf{x}' = A\mathbf{x}$ for $\mathbf{x}$ near $\mathbf{0}$, with

$$
A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \tag{6.2.6}
$$

The eigenvalues of the matrix $A$ are $\lambda_1 = -1$ and $\lambda_2 = 1$ with corresponding eigenvectors $\mathbf{v}_1 = [-1 \ \ 1]^T$ and $\mathbf{v}_2 = [1 \ \ 1]^T$. Because the eigenvalues are real and of opposite signs, it follows that the origin is indeed a saddle point for this approximating linear system and is therefore unstable. The phase plane for the linear system corresponding to (6.2.6) near $\mathbf{0}$ is displayed in figure 6.5. This behavior is consistent with that observed near the origin in figure 6.3.

We will call the system $\mathbf{x}' = A\mathbf{x}$, where $A$ is given by (6.2.6), the linearization of (6.2.1) near $\mathbf{0}$. In section 6.3, we will study this approximation to a nonlinear system of differential equations near any particular point of interest to us. We close this section with two examples of nonlinear systems in which we determine all equilibrium solutions and examine the graphical behavior of solutions near the equilibria.

Example 6.2.1 Consider the system of differential equations given by

$$
\begin{aligned}
x_1' &= \sin x_2 \\
x_2' &= x_2 - x_1^2
\end{aligned} \tag{6.2.7}
$$

Determine all equilibrium solutions of the system, plot the direction field, and discuss the behavior of solutions near at least two of the equilibrium solutions.

Figure 6.5 The direction field for the linear system $\mathbf{x}' = A\mathbf{x}$ given in (6.2.5), plotted for $x_1$ and $x_2$ each between $-1$ and $1$.


Solution. To find the equilibrium solutions, we set $x_1' = x_2' = 0$ and solve the system of equations

$$
0 = \sin x_2 \tag{6.2.8}
$$

$$
0 = x_2 - x_1^2 \tag{6.2.9}
$$

Equation (6.2.8) implies that $x_2$ must be an integer multiple of $\pi$, while (6.2.9) shows that $x_1$ and $x_2$ must satisfy the relationship $x_1^2 = x_2$. This latter equation implies that $x_2$ must be non-negative, and therefore with $x_2 = k\pi$ for any non-negative integer $k$, it follows that $x_1 = \pm\sqrt{k\pi}$ and we have equilibrium solutions of the form $(\sqrt{k\pi}, k\pi)$, $(-\sqrt{k\pi}, k\pi)$ for $k = 0, 1, 2, \ldots$.

An appropriate window in which to plot the direction field for this system might therefore be $[-3, 3] \times [-1, 8]$, as this will include the five equilibrium solutions $(0, 0)$, $(-\sqrt{\pi}, \pi)$, $(\sqrt{\pi}, \pi)$, $(-\sqrt{2\pi}, 2\pi)$, and $(\sqrt{2\pi}, 2\pi)$. Plotting the direction field, as shown in figure 6.6, we see that the system appears to demonstrate familiar behavior around the equilibrium solutions. For example, at the solutions $(\sqrt{\pi}, \pi)$ and $(-\sqrt{2\pi}, 2\pi)$, each seems to be a saddle point, based on the behavior of trajectories nearby. In addition, at the equilibrium points $(-\sqrt{\pi}, \pi)$ and $(\sqrt{2\pi}, 2\pi)$, the system appears to demonstrate spiraling behavior where the equilibria might act as stable centers or possibly as unstable spiral sources. Based on the periodicity of the sine function, we can reasonably expect that we would see similar behavior demonstrated at other equilibrium points of the form $(\pm\sqrt{k\pi}, k\pi)$, for $k = 3, 4, \ldots$. Note further that all equilibria lie along the parabola $x_2 = x_1^2$, as dictated by (6.2.9). Finally, it is evident that $(0, 0)$ is an unstable equilibrium, though the precise behavior of solutions nearby is not entirely clear from the plot.
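The family of equilibria found in this solution, and the choice of plotting window, can be checked numerically. The sketch below is a minimal Python illustration (not the text's Maple workflow); the function names `F` and `candidate_equilibria` are hypothetical, and the window bounds are the ones quoted in the solution.

```python
import math

def F(x1, x2):
    """Right-hand side of (6.2.7): x1' = sin(x2), x2' = x2 - x1^2."""
    return (math.sin(x2), x2 - x1**2)

def candidate_equilibria(k_max):
    """Points (+/- sqrt(k*pi), k*pi) for k = 0, 1, ..., k_max."""
    points = [(0.0, 0.0)]
    for k in range(1, k_max + 1):
        r = math.sqrt(k * math.pi)
        points.append((-r, k * math.pi))
        points.append((r, k * math.pi))
    return points

points = candidate_equilibria(3)

# Largest component of F at each candidate; zero up to rounding error.
residuals = [max(abs(c) for c in F(*p)) for p in points]

# Candidates that fall inside the window [-3, 3] x [-1, 8].
in_window = [p for p in points if -3 <= p[0] <= 3 and -1 <= p[1] <= 8]

print(max(residuals))
print(len(in_window))
```

The residuals are not exactly zero because $\pi$ and the square roots are rounded in floating-point arithmetic, so a tolerance is needed when testing whether a point is an equilibrium. The two candidates with $k = 3$ lie outside the window, which leaves the five equilibria listed in the solution.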

Figure 6.6 The direction field for the system (6.2.7) with equilibrium points $(0, 0)$, $(-\sqrt{\pi}, \pi)$, $(\sqrt{\pi}, \pi)$, $(-\sqrt{2\pi}, 2\pi)$, and $(\sqrt{2\pi}, 2\pi)$, plotted for $x_1$ between $-3$ and $3$.


Indeed, it is apparent that we desire more precision, and not just in the vicinity of $(0, 0)$; our study of the linearization of a system of nonlinear differential equations in the next section will enable a much more rigorous understanding of a system’s behavior near any equilibrium point.

Example 6.2.2 Consider the system of differential equations given by

$$
\begin{aligned}
x_1' &= -x_1 + x_1 x_2^2 \\
x_2' &= -2x_2 + x_2 x_1
\end{aligned} \tag{6.2.10}
$$

Determine all equilibrium solutions of the system, plot the direction field, and discuss the behavior of solutions near at least two of the equilibrium solutions.

Solution. In the standard way, to find the equilibrium solutions we set $x_1' = x_2' = 0$ and solve the nonlinear system of equations

$$
0 = -x_1 + x_1 x_2^2 = x_1(-1 + x_2^2) \tag{6.2.11}
$$

$$
0 = -2x_2 + x_2 x_1 = x_2(-2 + x_1) \tag{6.2.12}
$$

From (6.2.12), we see that either $x_2 = 0$ or $x_1 = 2$. If $x_2 = 0$, substituting this value for $x_2$ in (6.2.11), it follows that $x_1 = 0$, so one equilibrium solution is $(0, 0)$. If $x_1 = 2$, then (6.2.11) implies that $-1 + x_2^2 = 0$, which in turn shows that $x_2 = \pm 1$. Thus, two additional equilibrium solutions have been found: $(2, 1)$ and $(2, -1)$.

A reasonable window for plotting the direction field for this system is $[-2, 4] \times [-3, 3]$, since this will include the three equilibrium solutions we have found at $(0, 0)$, $(2, 1)$, and $(2, -1)$. As we see in figure 6.7, it appears that $(0, 0)$ is a stable attracting fixed point and that both coordinate axes are straight-line solutions. This observation is not surprising if we also think about linear approximations: for $x_1$ and $x_2$ near zero, $x_1 x_2^2$ and $x_1 x_2$ will be extremely small, and thus for such values the nonlinear system (6.2.10) can be approximated by the linear system

$$
\begin{aligned}
x_1' &= -x_1 \\
x_2' &= -2x_2
\end{aligned} \tag{6.2.13}
$$

The linear system (6.2.13) has the obvious solutions $x_1(t) = e^{-t}$ and $x_2(t) = e^{-2t}$, which lead to the observed behavior near $(0, 0)$ in the nonlinear system. From figure 6.7, it also appears that the equilibrium points $(2, 1)$ and $(2, -1)$ are saddle points.

Figure 6.7 The direction field for the system (6.2.10) with equilibrium points $(0, 0)$, $(2, 1)$, and $(2, -1)$, plotted for $x_1$ between $-2$ and $4$ and $x_2$ between $-3$ and $3$.

From all of our work in this section, we see that equilibrium solutions remain a vital part of our understanding of any system, whether linear or not. In addition, the picture painted by the direction field is fundamental to understanding the behavior of solutions to a nonlinear system. And yet, we are left desiring more detail than the direction field can provide. In section 6.3 we will develop the concept of the linearization of a system in order to link our understanding of linear systems to the behavior of nonlinear systems near equilibrium points. Furthermore, in section 6.4, we will generalize Euler’s method for single differential equations in order to apply it to systems and generate approximations to solutions.

6.2.1 Plotting direction fields of nonlinear systems using Maple

The Maple syntax used to generate the plots in this section is essentially identical to that discussed for direction fields for linear systems in section 3.4.1. As always, we use the DEtools package, and load it with the command

> with(DEtools):

To define the system of differential equations from example 6.2.1 in Maple, we use the command

> sys := diff(x[1](t),t) = sin(x[2](t)), diff(x[2](t),t) = x[2](t) - x[1](t)^2;

The system of differential equations of interest is now stored in “sys.” The direction field may now be generated by the command

> DEplot([sys], [x[1](t),x[2](t)], t=-1..1, x[1]=-3..3, x[2]=-1..8, arrows=large, color=gray);


In plots in section 6.2, we have also included the equilibrium points. These may be generated by the pointplot command, which requires us to load the plots package. For example, the syntax

> with(plots):
> pointplot([[0,0], [sqrt(Pi),Pi], [-sqrt(Pi),Pi], [sqrt(2*Pi),2*Pi], [-sqrt(2*Pi),2*Pi]], symbol=circle, symbolsize=7);

will produce a plot of just these five points in the plane; note that the points are passed to pointplot as a single list. To superimpose these points on the direction field, we can assign names to each plot and then display them together. Giving the respective plots the names DF and EQsol, we can use the display command as follows. Note the use of colons, rather than semicolons, to suppress output when we assign names to the plots.

> DF := DEplot([sys], [x[1](t),x[2](t)], t=-1..1, x[1]=-3..3, x[2]=-1..8, arrows=large, color=gray):
> EQsol := pointplot([[0,0], [sqrt(Pi),Pi], [-sqrt(Pi),Pi], [sqrt(2*Pi),2*Pi], [-sqrt(2*Pi),2*Pi]], symbol=circle, symbolsize=7):
> display(DF, EQsol);

This combination of commands results in the output shown at left in figure 6.8. If desired, we can now sketch trajectories by hand. Maple also has the capacity to include such trajectories, given initial conditions. For example, if we are given the initial conditions $\mathbf{x}(0) = (2, 6)$ and $\mathbf{x}(0) = (-2, 6)$, we can modify the earlier DEplot command by listing the initial conditions after the range of $t$-values:

> DEplot([sys], [x[1](t),x[2](t)], t=-2..2, [[x[1](0)=2,x[2](0)=6], [x[1](0)=-2,x[2](0)=6]], x[1]=-3..3, x[2]=-1..8, arrows=medium, color=gray);

Figure 6.8 At left, the direction field for the system (6.2.7) with equilibrium points $(0, 0)$, $(-\sqrt{\pi}, \pi)$, $(\sqrt{\pi}, \pi)$, $(-\sqrt{2\pi}, 2\pi)$, and $(\sqrt{2\pi}, 2\pi)$. At right, the same direction field with trajectories through $(2, 6)$ and $(-2, 6)$ included.

This most recent command, when saved and displayed simultaneously with the above plot of equilibrium solutions, results in the right-hand plot in figure 6.8. As a reminder, we always expect to experiment some with the window in which the plot is displayed: the ranges of $x_1$- and $x_2$-values certainly affect how clearly the direction field is revealed, and the range of $t$-values impacts how much of each trajectory is plotted. As the most recent section shows, a study of a system’s equilibrium points is a helpful guide for choosing a window in which to display a plot.

Exercises 6.2

In exercises 1–7, (a) determine all equilibrium solutions, (b) use Maple to plot the direction field, and (c) from the direction field, visually estimate whether equilibrium solutions are stable or unstable and discuss the long-term behavior of solutions.

1. $x_1' = x_2 - 2x_1x_2$, $\quad x_2' = 4x_1x_2 - x_1$

2. $x_1' = 4 - x_2^2$, $\quad x_2' = 1 - x_1 + x_2$

3. $x_1' = \cos x_2$, $\quad x_2' = 1 - \sin x_1$

4. $x_1' = 2x_1 - x_2$, $\quad x_2' = -4x_1 + 2x_2$

5. $x_1' = e^{-x_2}$, $\quad x_2' = 1/(1 + x_1^2)$

6. $x_1' = \ln(2 + x_2)$, $\quad x_2' = x_1^2 + x_2$

7. $x_1' = x_2 - x_1^2$, $\quad x_2' = x_1 - 8x_2^2$

8. Recall from section 6.1 that the nonlinear system of differential equations

$$
\begin{aligned}
W' &= -0.75W + 0.25MW \\
M' &= 0.5M - 0.1MW
\end{aligned}
$$

models the numbers of wolves and moose (each measured in hundreds) in a predator–prey situation. Determine all equilibrium solutions to this system, plot an appropriate direction field in a computer algebra system, and discuss the apparent long-term behavior of the wolf and moose populations.

9. Recall that if $x_1 = \theta$ is the angle that the arm of a pendulum forms with the positive $x$-axis (as shown in figure 6.2) and $x_2 = x_1' = \theta'$, then $x_1$ and $x_2$ satisfy the nonlinear system of differential equations

$$
\begin{aligned}
x_1' &= x_2 \\
x_2' &= -\frac{g}{L}\sin x_1
\end{aligned}
$$

Let $g = 9.8$ m/s$^2$ and assume that the length of the arm is $L = 2$ m. Determine all equilibrium solutions to this system, plot an appropriate direction field in a computer algebra system, and discuss the long-term behavior of solutions to the system. Be sure to relate your answers directly to the behavior of the pendulum and corresponding initial conditions.

6.3 Linear approximations of nonlinear systems

In our first look at nonlinear systems in the preceding section, we considered the system

$$
\begin{aligned}
x_1' &= x_2 - x_1^3 \\
x_2' &= x_1 - x_2^3
\end{aligned} \tag{6.3.1}
$$

and observed informally that near the origin where $\mathbf{x} \approx \mathbf{0}$, we can drop the $x_1^3$ and $x_2^3$ terms so that (6.3.1) can be approximated by the linear system $\mathbf{x}' = A\mathbf{x}$ where

$$
A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \tag{6.3.2}
$$

In this section, we make this notion of linear approximation of nonlinear systems more precise and use this approach to classify the stability of equilibria of nonlinear systems.

An important idea in calculus is that all well-behaved functions are locally linear. That is, they appear linear when viewed up close; the line the function emulates is the tangent line to the curve at the point on which we focus. In particular, for a function $f(x)$ that is differentiable at the value $x = a$, $f(x) \approx L(x)$ for $x$ near $a$, where

$$
L(x) = f(a) + f'(a)(x - a) \tag{6.3.3}
$$

The function $L(x)$ is usually called the tangent line approximation or linearization of $f$ at $x = a$.
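A quick numerical experiment shows how good the tangent line approximation (6.3.3) is close to $a$. The sketch below is a minimal Python illustration; the choice of $f(x) = x^3$ and $a = 1$ is an assumption made here purely for demonstration, and the helper name `linearization` is hypothetical.

```python
def linearization(f, df, a):
    """Return L(x) = f(a) + f'(a)(x - a), the tangent line approximation (6.3.3)."""
    fa, slope = f(a), df(a)
    return lambda x: fa + slope * (x - a)

# Illustrative choice (not from the text): f(x) = x^3 at a = 1.
f = lambda x: x**3
df = lambda x: 3 * x**2
L = linearization(f, df, 1.0)

# Error |f(x) - L(x)| at points progressively closer to a.
errors = [abs(f(1.0 + h) - L(1.0 + h)) for h in (0.1, 0.01, 0.001)]
for h, err in zip((0.1, 0.01, 0.001), errors):
    print(h, err)
```

For this choice the error is $3h^2 + h^3$ at $x = 1 + h$, so reducing $h$ by a factor of 10 reduces the error by a factor of about 100. This is the sense in which a differentiable function is locally linear.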


We encounter the very same ideas in multivariable calculus. For a differentiable vector function $\mathbf{r} : \mathbb{R} \to \mathbb{R}^3$ given by

$$
\mathbf{r}(t) = \begin{bmatrix} f(t) \\ g(t) \\ h(t) \end{bmatrix}
$$

for values of $t$ near some fixed value $a$, the curve in space that $\mathbf{r}(t)$ generates can be approximated by the tangent line to the curve. In particular, $\mathbf{r}(t) \approx \mathbf{L}(t)$ where

$$
\mathbf{L}(t) = \mathbf{r}(a) + \mathbf{r}'(a)(t - a) = \begin{bmatrix} f(a) + f'(a)(t - a) \\ g(a) + g'(a)(t - a) \\ h(a) + h'(a)(t - a) \end{bmatrix} \tag{6.3.4}
$$

for $t$ near $a$. As in the case of the scalar function $f$, $\mathbf{L}$ is called the tangent line approximation or linearization of $\mathbf{r}$ at $t = a$.

Similarly, for a differentiable real-valued function of several variables $F : \mathbb{R}^2 \to \mathbb{R}$ given by $z = F(x, y)$, $F(x, y)$ can be approximated by its tangent plane for $(x, y)$ near some fixed point $(a, b)$. That is, we have the approximation $F(x, y) \approx L(x, y)$ where

$$
L(x, y) = F(a, b) + F_x(a, b)(x - a) + F_y(a, b)(y - b) \tag{6.3.5}
$$

$L$ is called the tangent plane approximation or linearization of $F$ at $(a, b)$. There is obviously a great deal of similarity in the algebraic forms of the linear approximations given in (6.3.3), (6.3.4), and (6.3.5).

How can we apply these ideas to systems of nonlinear differential equations? The next example, in which we reconsider (6.3.1), suggests one approach. Because of the pending use of partial derivatives, we will temporarily use the notation $\mathbf{x} = [x_1 \ \ x_2]^T = [x \ \ y]^T$.

Example 6.3.1 Consider the system of differential equations

$$
\begin{aligned}
x' &= f(x, y) = y - x^3 \\
y' &= g(x, y) = x - y^3
\end{aligned} \tag{6.3.6}
$$

Determine linear approximations to both $f(x, y)$ and $g(x, y)$ at the point $(1, 1)$. Then explain how these linear approximations may be combined to form an overall linear approximation of (6.3.6) near $(1, 1)$.

Solution. In section 6.2, we considered this same system (using $x_1$ and $x_2$ for the functions, instead of $x$ and $y$) and learned that the equilibrium solutions to the system are $(-1, -1)$, $(0, 0)$, and $(1, 1)$. As noted at the start of this section, we have already considered a linear approximation of the system at $(0, 0)$. Here, we focus on the behavior of solutions near the equilibrium solution $(1, 1)$.

To first approximate $x' = f(x, y) = y - x^3$ near $(1, 1)$, we use (6.3.5) to find the tangent plane approximation. Noting that $f_x(x, y) = -3x^2$ and $f_y(x, y) = 1$, it follows that $f_x(1, 1) = -3$ and $f_y(1, 1) = 1$. Moreover, $f(1, 1) = 0$ since $(1, 1)$ is an equilibrium solution of the system. Now, it follows that for $(x, y)$ near $(1, 1)$,

$$
f(x, y) \approx f(1, 1) + f_x(1, 1)(x - 1) + f_y(1, 1)(y - 1) = 0 - 3(x - 1) + 1(y - 1) \tag{6.3.7}
$$

Similar ideas applied to $y' = g(x, y) = x - y^3$ show that for $(x, y)$ near $(1, 1)$,

$$
g(x, y) \approx g(1, 1) + g_x(1, 1)(x - 1) + g_y(1, 1)(y - 1) = 0 + 1(x - 1) - 3(y - 1) \tag{6.3.8}
$$

If we now consider the overall system (6.3.6), for $(x, y)$ near $(1, 1)$ we have the approximation

$$
\begin{aligned}
x' &= f(x, y) \approx -3(x - 1) + 1(y - 1) \\
y' &= g(x, y) \approx 1(x - 1) - 3(y - 1)
\end{aligned} \tag{6.3.9}
$$

Using the fact that both equations in (6.3.9) are linear and writing this system in matrix form with $\mathbf{x} = [x \ \ y]^T$, we have

$$
\mathbf{x}' \approx \begin{bmatrix} -3 & 1 \\ 1 & -3 \end{bmatrix} \begin{bmatrix} x - 1 \\ y - 1 \end{bmatrix} = \begin{bmatrix} -3 & 1 \\ 1 & -3 \end{bmatrix} \mathbf{x} + \begin{bmatrix} -3 & 1 \\ 1 & -3 \end{bmatrix} \begin{bmatrix} -1 \\ -1 \end{bmatrix} = \begin{bmatrix} -3 & 1 \\ 1 & -3 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 2 \\ 2 \end{bmatrix} \tag{6.3.10}
$$

Hence we have approximated the original nonlinear system with a linear one by writing it in the form $\mathbf{x}' \approx A(\mathbf{x} - \mathbf{a}) = A\mathbf{x} + \mathbf{b}$, where $\mathbf{b} = -A\mathbf{a}$, for $\mathbf{x}$ near $\mathbf{a}$.

Because we have found that we may approximate the system (6.3.6) with the linear system (6.3.10), we can now use our understanding of linear systems to determine the behavior of the nonlinear system near the chosen equilibrium point. Specifically, the fact that the eigenvalues of the matrix $A$ in (6.3.10) are $\lambda = -2$ and $\lambda = -4$ tells us that the equilibrium solution $(1, 1)$ of (6.3.1) is a stable, attracting node, as we initially conjectured graphically from figure 6.4.

Moreover, the approach we have taken in example 6.3.1 may certainly be generalized. Any nonlinear system of two differential equations may be written in the form

$$
\mathbf{x}' = \mathbf{F}(\mathbf{x}) \tag{6.3.11}
$$

where $\mathbf{F}$ is a function of the form $\mathbf{F}(\mathbf{x}) = \mathbf{F}(x, y) = (f(x, y), g(x, y))$. Given an equilibrium solution of (6.3.11) at $\mathbf{a} = (a, b)$, notice that $\mathbf{F}(\mathbf{a}) = \mathbf{0}$; in particular, $f(a, b) = g(a, b) = 0$.


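If, as in example 6.3.1, we approximate $f$ and $g$ near $(a, b)$ with

$$
f(x, y) \approx f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b) = f_x(a, b)(x - a) + f_y(a, b)(y - b)
$$

$$
g(x, y) \approx g(a, b) + g_x(a, b)(x - a) + g_y(a, b)(y - b) = g_x(a, b)(x - a) + g_y(a, b)(y - b)
$$

we observe that in matrix form we have

$$
\mathbf{x}' = \mathbf{F}(\mathbf{x}) = \begin{bmatrix} f(x, y) \\ g(x, y) \end{bmatrix} \approx \begin{bmatrix} f_x(a, b)(x - a) + f_y(a, b)(y - b) \\ g_x(a, b)(x - a) + g_y(a, b)(y - b) \end{bmatrix} = \begin{bmatrix} f_x(a, b) & f_y(a, b) \\ g_x(a, b) & g_y(a, b) \end{bmatrix} \begin{bmatrix} x - a \\ y - b \end{bmatrix}
$$

In matrix notation, we have written that $\mathbf{x}' = \mathbf{F}(\mathbf{x}) \approx J(\mathbf{a})(\mathbf{x} - \mathbf{a})$ for $\mathbf{x}$ near $\mathbf{a}$, where $\mathbf{a}$ is an equilibrium point of the original system and $J(\mathbf{a})$ is a matrix with constant entries. The matrix $J(\mathbf{a})$, which is defined by

$$
J(\mathbf{a}) = \begin{bmatrix} f_x(a, b) & f_y(a, b) \\ g_x(a, b) & g_y(a, b) \end{bmatrix} \tag{6.3.12}
$$

is known as the Jacobian matrix of the function $\mathbf{F}$ evaluated at the point $(a, b)$. More generally, for any differentiable function $\mathbf{F} : \mathbb{R}^n \to \mathbb{R}^m$ given by $\mathbf{F}(\mathbf{x}) = \mathbf{F}(x_1, \ldots, x_n) = (f_1(x_1, \ldots, x_n), \ldots, f_m(x_1, \ldots, x_n))$, the Jacobian matrix $J(\mathbf{x})$ is given by

$$
J(\mathbf{x}) = \begin{bmatrix}
\partial f_1/\partial x_1 & \partial f_1/\partial x_2 & \cdots & \partial f_1/\partial x_n \\
\partial f_2/\partial x_1 & \partial f_2/\partial x_2 & \cdots & \partial f_2/\partial x_n \\
\vdots & \vdots & \ddots & \vdots \\
\partial f_m/\partial x_1 & \partial f_m/\partial x_2 & \cdots & \partial f_m/\partial x_n
\end{bmatrix} \tag{6.3.13}
$$

The Jacobian enables us to write the linearization of any differentiable function $\mathbf{F}$ for $\mathbf{x}$ near a point $\mathbf{a}$ as

$$
\mathbf{F}(\mathbf{x}) \approx \mathbf{F}(\mathbf{a}) + J(\mathbf{a})(\mathbf{x} - \mathbf{a}) \tag{6.3.14}
$$

which is remarkably similar to the tangent line approximation (6.3.3). Note that we must evaluate the Jacobian matrix at the point $\mathbf{a}$ of interest; moreover, if we are working with a nonlinear system of differential equations with equilibrium point $\mathbf{a}$, it follows that $\mathbf{F}(\mathbf{a}) = \mathbf{0}$, so that we have

$$
\mathbf{x}' = \mathbf{F}(\mathbf{x}) \approx J(\mathbf{a})(\mathbf{x} - \mathbf{a}) \tag{6.3.15}
$$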
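The linearization (6.3.14) can also be assembled numerically when partial derivatives are tedious to compute by hand. The following Python sketch is one possible illustration, not the text's method: it approximates the Jacobian entries by central differences (a numerical substitute for the exact partial derivatives used above) and applies the result to the system (6.3.6) at $(1, 1)$. The names `jacobian` and `linearize` and the step `h = 1e-6` are assumptions made for this example.

```python
def F(x, y):
    """Right-hand side of (6.3.6): x' = y - x^3, y' = x - y^3."""
    return (y - x**3, x - y**3)

def jacobian(F, a, b, h=1e-6):
    """Approximate the 2 x 2 Jacobian (6.3.12) at (a, b) by central differences."""
    f_xp, g_xp = F(a + h, b)
    f_xm, g_xm = F(a - h, b)
    f_yp, g_yp = F(a, b + h)
    f_ym, g_ym = F(a, b - h)
    return [[(f_xp - f_xm) / (2 * h), (f_yp - f_ym) / (2 * h)],
            [(g_xp - g_xm) / (2 * h), (g_yp - g_ym) / (2 * h)]]

def linearize(F, a, b):
    """Return the map (x, y) -> F(a) + J(a)((x, y) - a) from (6.3.14)."""
    J = jacobian(F, a, b)
    f0, g0 = F(a, b)
    def L(x, y):
        dx, dy = x - a, y - b
        return (f0 + J[0][0] * dx + J[0][1] * dy,
                g0 + J[1][0] * dx + J[1][1] * dy)
    return L

J = jacobian(F, 1.0, 1.0)   # close to [[-3, 1], [1, -3]]
L = linearize(F, 1.0, 1.0)
print(J)
print(F(1.1, 0.9), L(1.1, 0.9))
```

The computed matrix agrees with the matrix in (6.3.10) up to the error of the difference quotient. Comparing `F` and `L` at a nearby point such as $(1.1, 0.9)$ shows that the linearization is close but not exact, and the gap shrinks as the point approaches $(1, 1)$.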


This entire discussion of linearizing nonlinear systems is important for several reasons. One is that it demonstrates how we can take a problem we do not fully understand (the nonlinear system) and gain more knowledge of it by approximating the system near a point of interest with a simpler (linear) system that we do understand. Moreover, because we have completely classified the stability of equilibria of linear systems through the eigenvalues of the system’s matrix, we classify the equilibria of nonlinear systems by doing so for the corresponding linearization. We will use the same terminology and classification scheme for equilibria of nonlinear systems that we established for linear ones in sections 3.4 and 3.5. Two examples now follow to demonstrate these ideas in greater detail.

Example 6.3.2 Given the system of differential equations

$$
\begin{aligned}
x_1' &= 9x_2 - x_2^2 \\
x_2' &= x_1
\end{aligned}
$$

determine all equilibrium points of the system, evaluate the Jacobian at each equilibrium point, and find a corresponding linearization of the system in order to analyze the behavior of trajectories near each equilibrium point and the stability of equilibria. Finally, plot the direction field of the given system to confirm the observations made.

Solution. First, we observe that $\mathbf{x}' = \mathbf{F}(\mathbf{x})$ for

$$
\mathbf{F}(\mathbf{x}) = \mathbf{F}(x_1, x_2) = (f(x_1, x_2), g(x_1, x_2)) = (9x_2 - x_2^2, \ x_1)
$$

Setting $\mathbf{x}' = \mathbf{0}$, it follows that $x_1 = 0$ and $x_2(9 - x_2) = 0$, so that the equilibrium points of the system are $(0, 0)$ and $(0, 9)$. Taking the appropriate partial derivatives, the Jacobian of $\mathbf{F}$ is

$$
J(\mathbf{x}) = \begin{bmatrix} 0 & 9 - 2x_2 \\ 1 & 0 \end{bmatrix}
$$

Therefore, for values of $x_1$ and $x_2$ near the equilibrium point $\mathbf{a} = (0, 0) = \mathbf{0}$, we have that $\mathbf{x}' = \mathbf{F}(\mathbf{x}) \approx J(\mathbf{0})(\mathbf{x} - \mathbf{0})$, or

$$
\mathbf{x}' \approx \begin{bmatrix} 0 & 9 \\ 1 & 0 \end{bmatrix} \mathbf{x}
$$

For this linear system, the eigenvalues of the matrix $J(\mathbf{0})$ are $\lambda = 3$ and $\lambda = -3$, so the origin is a saddle point and therefore unstable. Moreover, we expect there to be two approximately straight-line solutions (along the respective eigenvectors of $J(\mathbf{0})$) that pass through the origin, along one of which the solution tends toward $(0, 0)$ while on the other the solution is repelled away from $(0, 0)$.

For $x_1$ and $x_2$ near the equilibrium point $\mathbf{a} = (0, 9)$, we have that $\mathbf{x}' = \mathbf{F}(\mathbf{x}) \approx J(\mathbf{a})(\mathbf{x} - \mathbf{a})$, or

$$
\mathbf{x}' \approx \begin{bmatrix} 0 & -9 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} x_1 - 0 \\ x_2 - 9 \end{bmatrix} = \begin{bmatrix} 0 & -9 \\ 1 & 0 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 81 \\ 0 \end{bmatrix}
$$


Figure 6.9 The direction field for Example 6.3.2, plotted for $x_1$ between $-10$ and $10$ and $x_2$ between $-2.5$ and $10$.

For this nonhomogeneous linear system, the eigenvalues of the matrix $J(0, 9)$ are $\lambda = 3i$ and $\lambda = -3i$. Because the eigenvalues are purely imaginary, it follows that the equilibrium point $(0, 9)$ is a stable center. Near this point, we expect to see trajectories orbit the point in approximately elliptical loops. All of our observations are confirmed by the graphical behavior evidenced in figure 6.9.
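The eigenvalue computations in example 6.3.2, together with the constant vector in the nonhomogeneous form of the linearization, can be reproduced with a short calculation. The sketch below is a minimal Python illustration that finds the eigenvalues of a $2 \times 2$ matrix from its characteristic polynomial $\lambda^2 - (\operatorname{tr} A)\lambda + \det A = 0$; the function names `eigenvalues_2x2`, `J`, and `affine_offset` are hypothetical.

```python
import cmath

def eigenvalues_2x2(A):
    """Roots of lambda^2 - (trace A) lambda + det A = 0 for a 2 x 2 matrix A."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)  # complex sqrt handles a negative discriminant
    return ((tr + root) / 2, (tr - root) / 2)

def J(x1, x2):
    """Jacobian of F(x1, x2) = (9*x2 - x2^2, x1) from example 6.3.2."""
    return [[0.0, 9.0 - 2.0 * x2], [1.0, 0.0]]

def affine_offset(A, a):
    """Return b = -A a, so that A (x - a) = A x + b."""
    return (-(A[0][0] * a[0] + A[0][1] * a[1]),
            -(A[1][0] * a[0] + A[1][1] * a[1]))

eig_origin = eigenvalues_2x2(J(0.0, 0.0))   # real, opposite signs: saddle
eig_center = eigenvalues_2x2(J(0.0, 9.0))   # purely imaginary: center
b = affine_offset(J(0.0, 9.0), (0.0, 9.0))  # constant term at (0, 9)

print(eig_origin, eig_center, b)
```

Using `cmath.sqrt` rather than `math.sqrt` lets the same function return real and complex eigenvalues alike, at the cost of reporting real eigenvalues as complex numbers with zero imaginary part.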

Example 6.3.3 For the system of differential equations

$$
\begin{aligned}
x_1' &= \sin x_2 \\
x_2' &= x_2 - x_1^2
\end{aligned} \tag{6.3.16}
$$

determine all equilibrium points of the system, evaluate the Jacobian at each equilibrium point, and find a corresponding linearization of the system in order to analyze the behavior of trajectories near each equilibrium point and the stability of equilibria. Finally, plot the direction field of the given system to confirm the observations made.

Solution. The given system is the same one that we studied in example 6.2.1 in the preceding section. There we discovered that for any equilibrium solution $\mathbf{x} = (x_1, x_2)$, $x_2$ must be an integer multiple of $\pi$ and $x_1^2 = x_2$, so that $x_2$ must be non-negative. Thus, the equilibrium solutions have the form $(\sqrt{k\pi}, k\pi)$, $(-\sqrt{k\pi}, k\pi)$ for $k = 0, 1, 2, \ldots$. Letting $\mathbf{x}' = \mathbf{F}(\mathbf{x}) = (\sin x_2, \ x_2 - x_1^2)$, it follows that the Jacobian of $\mathbf{F}$ is

$$
J(\mathbf{x}) = \begin{bmatrix} 0 & \cos x_2 \\ -2x_1 & 1 \end{bmatrix}
$$


For values of $x_1$ and $x_2$ near the equilibrium point $\mathbf{a} = (0, 0) = \mathbf{0}$, we have that $\mathbf{x}' = \mathbf{F}(\mathbf{x}) \approx J(\mathbf{0})(\mathbf{x} - \mathbf{0})$, or

$$
\mathbf{x}' \approx \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix} \mathbf{x}
$$

The eigenvalues of the matrix $J(\mathbf{0})$ are $\lambda = 0$ and $\lambda = 1$, so the origin is unstable, because the real eigenvalue $\lambda = 1 > 0$ will drive solutions away from the origin as $t \to \infty$. Moreover, because $\lambda = 0$ is an eigenvalue of $J(\mathbf{0})$, it also follows that all solutions near $\mathbf{0}$ are approximately straight-line solutions.

For $x_1$ and $x_2$ near the equilibrium point $\mathbf{a} = (\sqrt{\pi}, \pi)$, we have that $\mathbf{x}' = \mathbf{F}(\mathbf{x}) \approx J(\mathbf{a})(\mathbf{x} - \mathbf{a})$, or

$$
\mathbf{x}' \approx \begin{bmatrix} 0 & -1 \\ -2\sqrt{\pi} & 1 \end{bmatrix} \begin{bmatrix} x_1 - \sqrt{\pi} \\ x_2 - \pi \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ -2\sqrt{\pi} & 1 \end{bmatrix} \mathbf{x} + \begin{bmatrix} \pi \\ \pi \end{bmatrix}
$$

The eigenvalues of the matrix $J(\sqrt{\pi}, \pi)$ are approximately $\lambda = 2.448$ and $\lambda = -1.448$, and so the equilibrium point $(\sqrt{\pi}, \pi)$ is a saddle point and unstable. However, if we consider the equilibrium point $\mathbf{a} = (-\sqrt{\pi}, \pi)$, we have that $\mathbf{x}' = \mathbf{F}(\mathbf{x}) \approx J(\mathbf{a})(\mathbf{x} - \mathbf{a})$, or

$$
\mathbf{x}' \approx \begin{bmatrix} 0 & -1 \\ 2\sqrt{\pi} & 1 \end{bmatrix} \begin{bmatrix} x_1 + \sqrt{\pi} \\ x_2 - \pi \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ 2\sqrt{\pi} & 1 \end{bmatrix} \mathbf{x} + \begin{bmatrix} \pi \\ \pi \end{bmatrix}
$$

In this case, the eigenvalues of the matrix $J(-\sqrt{\pi}, \pi)$ are approximately $\lambda = 0.5 \pm 1.815i$. Because these complex eigenvalues have positive real parts, it follows that the equilibrium solution $(-\sqrt{\pi}, \pi)$ is a spiral source and is unstable.

If we continue exploring equilibrium points of the form $(\pm\sqrt{k\pi}, k\pi)$, we can show through the Jacobian that whenever $k$ is odd, the point $(\sqrt{k\pi}, k\pi)$ is a saddle point and the point $(-\sqrt{k\pi}, k\pi)$ is a spiral source. Conversely, whenever $k$ is even, $(\sqrt{k\pi}, k\pi)$ is a spiral source and the point $(-\sqrt{k\pi}, k\pi)$ is a saddle. In particular, every equilibrium point of the system is unstable. These observations are all confirmed in the direction field shown in figure 6.10.

Through linear approximation, the tools we developed for linear systems enable us to understand and classify the stability of equilibria and behavior of solutions near equilibrium points for nonlinear systems. In the next section, we will explore how to actually compute approximate solutions via Euler’s method for systems.
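The alternating pattern of saddles and spiral sources described in example 6.3.3 can be spot-checked for the first few values of $k$. The following Python sketch is a minimal illustration under the Jacobian stated in that example; the function names and the tolerance `tol` are assumptions made here, and the `classify` helper deliberately covers only the cases that arise in this example.

```python
import cmath
import math

def jacobian(x1, x2):
    """Jacobian of F(x1, x2) = (sin x2, x2 - x1^2) from (6.3.16)."""
    return [[0.0, math.cos(x2)], [-2.0 * x1, 1.0]]

def eigenvalues_2x2(A):
    """Roots of the characteristic polynomial of a 2 x 2 matrix."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)
    return ((tr + root) / 2, (tr - root) / 2)

def classify(A, tol=1e-9):
    """Label an equilibrium from the eigenvalues of its linearization."""
    l1, l2 = eigenvalues_2x2(A)
    if abs(l1.imag) > tol:                    # complex conjugate pair
        if abs(l1.real) <= tol:
            return "center"
        return "spiral source" if l1.real > 0 else "spiral sink"
    if l1.real * l2.real < 0:                 # real, opposite signs
        return "saddle"
    return "node or degenerate case"

# Key (k, True) is the point (+sqrt(k*pi), k*pi); (k, False) is (-sqrt(k*pi), k*pi).
labels = {}
for k in range(1, 5):
    r = math.sqrt(k * math.pi)
    for x1 in (r, -r):
        labels[(k, x1 > 0)] = classify(jacobian(x1, k * math.pi))

for key in sorted(labels):
    print(key, labels[key])
```

Because $\cos(k\pi)$ alternates between $-1$ and $1$, the sign of the determinant of the Jacobian alternates with $k$ at each of the two points, which is what produces the alternation between saddles and spiral sources.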
Exercises 6.3

In exercises 1–6, find the Jacobian of the given function, $\mathbf{F}$.

1. $\mathbf{F}(x_1, x_2) = (x_1^2 + x_2, \ x_1 - x_2^2)$

2. $\mathbf{F}(x_1, x_2) = (e^{2x_1} x_2, \ \cos x_1 + \sin x_2)$

3. $\mathbf{F}(x_1, x_2) = (x_2 - 2x_1x_2, \ 4x_1x_2 - x_1)$


Figure 6.10 The direction field for the system (6.3.16) with equilibrium points $(0, 0)$, $(-\sqrt{\pi}, \pi)$, $(\sqrt{\pi}, \pi)$, $(-\sqrt{2\pi}, 2\pi)$, and $(\sqrt{2\pi}, 2\pi)$, plotted for $x_1$ between $-3$ and $3$.

4. $\mathbf{F}(x_1, x_2) = (4 - x_2^2, \ 1 - x_1^2)$

5. $\mathbf{F}(x_1, x_2, x_3) = (1/(1 + x_1^2 + x_2^2 + x_3^2), \ e^{-x_1^2 - x_2^2 - x_3^2}, \ 2x_1 - 3x_2^2 + x_3^4)$

6. $\mathbf{F}(x_1, x_2, x_3) = (3x_1 - x_2 + 4x_3, \ x_1 + x_2 - 2x_3, \ -2x_1 + 5x_2 - x_3)$

In exercises 7–10, find the linearization of the given function, $\mathbf{F}(x_1, x_2)$, at the given point $\mathbf{a}$.

7. $\mathbf{F}(x_1, x_2) = (x_1^2 + x_2, \ x_1 - x_2^2)$, $\quad \mathbf{a} = (1, -1)$

8. $\mathbf{F}(x_1, x_2) = (x_2 e^{2x_1}, \ \cos x_1 + \sin x_2)$, $\quad \mathbf{a} = (\pi/2, 0)$

9. $\mathbf{F}(x_1, x_2) = (x_2 - 2x_1x_2, \ 4x_1x_2 - x_1)$, $\quad \mathbf{a} = (1/2, 1/4)$

10. $\mathbf{F}(x_1, x_2) = (4 - x_2^2, \ 1 - x_1^2)$, $\quad \mathbf{a} = (-1, 2)$

In exercises 11–17, find all equilibrium points of the system, determine the linearization of the given system near each equilibrium point, classify the stability of each equilibrium point, and compare your work to a plot of the direction field for the system.¹

11. $x_1' = x_2 - 2x_1x_2$, $\quad x_2' = 4x_1x_2 - x_1$

12. $x_1' = 4 - x_2^2$, $\quad x_2' = 1 - x_1 + x_2$

¹ Note that in the exercises of section 6.2, equilibrium solutions were found and direction fields were plotted in exercises 1–7, which correspond to the same systems of differential equations given here.


13. $x_1' = \cos x_2$, $\quad x_2' = 1 - \sin x_1$

14. $x_1' = 2x_1 - x_2$, $\quad x_2' = -4x_1 + 2x_2$

15. $x_1' = e^{-x_2}$, $\quad x_2' = 1/(1 + x_1^2)$

16. $x_1' = \ln(2 + x_2)$, $\quad x_2' = x_1^2 + x_2$

17. $x_1' = x_2 - x_1^2$, $\quad x_2' = x_1 - 8x_2^2$

18. Recall from section 6.1 that the nonlinear system of differential equations

$$
\begin{aligned}
W' &= -0.75W + 0.25MW \\
M' &= 0.5M - 0.1MW
\end{aligned}
$$

models the numbers of wolves and moose (each measured in hundreds) in a predator–prey situation. Determine the linearization of the system near the nonzero equilibrium solution, classify the stability of this equilibrium, and discuss the long-term behavior of the wolf and moose populations.²

19. Recall that if $x_1 = \theta$ is the angle that the arm of a pendulum forms with the positive $x$-axis (as shown in figure 6.2) and $x_2 = x_1' = \theta'$, then $x_1$ and $x_2$ satisfy the nonlinear system of differential equations

$$
\begin{aligned}
x_1' &= x_2 \\
x_2' &= -\frac{g}{L}\sin x_1
\end{aligned}
$$

Let $g = 9.8$ m/s$^2$ and $L = 2$ m. Determine the linearization of the system near the equilibrium solution at zero and at least one other equilibrium solution, classify the stability of these equilibria, and discuss the long-term behavior of the pendulum. Be sure to relate your answers directly to the behavior of the pendulum and corresponding initial conditions.

20. In example 6.2.2, we considered the system of differential equations given by

$$
\begin{aligned}
x_1' &= -x_1 + x_1 x_2^2 \\
x_2' &= -2x_2 + x_2 x_1
\end{aligned}
$$

Determine the linearization of the system near each equilibrium solution, classify the stability of each equilibrium point, and discuss the behavior of solutions nearby.

² In the exercises of section 6.2, equilibrium solutions were found and the direction field was plotted for this system in exercise 8; similarly, see the results of exercise 9 in section 6.2 for use in exercise 19.


6.4 Euler’s method for nonlinear systems

Just as we experienced with single nonlinear initial-value problems such as

$$
y' = ye^{-y} + 1, \quad y(0) = 1 \tag{6.4.1}
$$

or

$$
y' = t^2 + y^2 + 1, \quad y(0) = -1 \tag{6.4.2}
$$

that we could not solve explicitly, in the past two sections we have encountered systems of nonlinear differential equations for which solutions to corresponding initial-value problems cannot be determined analytically. We therefore desire to explore ways to estimate solutions to these problems.

For IVPs such as (6.4.1) and (6.4.2), we know that we may estimate a solution to the problem through Euler’s method. Recall from section 2.6 that for any first-order IVP in the form $y' = f(t, y)$, $y(t_0) = y_0$, given a step-size $h$ we are able to generate the sequence of points $(t_1, y_1), \ldots, (t_n, y_n)$ such that

$$
t_{n+1} = t_n + h \quad \text{and} \quad y_{n+1} = y_n + hf(t_n, y_n), \quad \text{for } n \ge 0 \tag{6.4.3}
$$

where $y_n \approx y(t_n)$. That is, $y_n$ approximates the solution $y$ to the initial-value problem at the point where $t = t_n$. To explore how we can extend Euler’s method to systems of differential equations, let us consider the initial-value problem given by

$$
\begin{aligned}
x' &= 9y - y^2, & x(0) &= 1 \\
y' &= x, & y(0) &= 8
\end{aligned} \tag{6.4.4}
$$

Here, we choose to use the notation $\mathbf{x} = [x \ \ y]^T$ rather than $[x_1 \ \ x_2]^T$ because we will be using subscripts to label approximations to the component solutions $x(t)$ and $y(t)$. Keeping in mind that $x$ and $y$ are each implicit functions of $t$, we can view (6.4.4) as being of the form

$$
\begin{aligned}
x' &= f(t, x, y), & x(t_0) &= x_0 \\
y' &= g(t, x, y), & y(t_0) &= y_0
\end{aligned} \tag{6.4.5}
$$

To see how to approximate solutions to this system of IVPs, let us reconsider our earlier studies of single differential equations. In section 2.6, we considered the equation $y' = f(t, y)$ in a first-order IVP and emphasized the fact that Euler’s method relies on following the tangent line approximation to $y(t)$ at each step. In particular, if we have some approximation $y_n$ to the solution $y$ at the $t$-value $t_n$, then to move along the tangent line to the next approximation $(t_{n+1}, y_{n+1})$, it follows that

$$
y_{n+1} = y_n + \Delta y = y_n + \frac{\Delta y}{\Delta t} \cdot \Delta t = y_n + m \cdot \Delta t \tag{6.4.6}
$$

where $m$ is the slope at each step of our approximation given by $m = y' = f(t, y)$ in the differential equation that we are attempting to solve. Specifically, given the approximation $y_n$ at time $t_n$, the slope of the tangent line to the solution curve at this point is $f(t_n, y_n)$. Therefore, using this value for $m$ in (6.4.6), letting $h = \Delta t$ be the step size, we have

$$
y_{n+1} = y_n + hf(t_n, y_n) \tag{6.4.7}
$$

An essentially identical approach will work for the system (6.4.5). In particular, given the initial condition $(x_0, y_0)$ and a step-size $h$, we can generate the approximate solution $(x(t_1), y(t_1)) \approx (x_1, y_1)$ by taking

$$
\begin{aligned}
x_1 &= x_0 + h \cdot f(t_0, x_0, y_0) \\
y_1 &= y_0 + h \cdot g(t_0, x_0, y_0)
\end{aligned}
\tag{6.4.8}
$$

The only difference between this approach and our experience with Euler's method for a single equation is that we have to update two approximations at once, as estimates of both $x(t_n)$ and $y(t_n)$ are needed to generate approximations of $x(t_{n+1})$ and $y(t_{n+1})$. We generalize our latest observation in (6.4.8) for a step from the approximation $(x_n, y_n)$ to the approximation $(x_{n+1}, y_{n+1})$ by

$$
\begin{aligned}
x_{n+1} &= x_n + h \cdot f(t_n, x_n, y_n) \\
y_{n+1} &= y_n + h \cdot g(t_n, x_n, y_n)
\end{aligned}
\tag{6.4.9}
$$
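The update rule (6.4.9) can also be carried out in a few lines of code. The following Python sketch is one possible implementation, not the approach the text itself uses (which is a spreadsheet); the names `euler_system`, `Slope`, and `points` are our own illustrative choices. It is applied here to the system $x' = 9y - y^2$, $y' = x$ with $x(0) = 1$, $y(0) = 8$ and $h = 0.1$.

```python
from typing import Callable, List, Tuple

# A right-hand side takes (t, x, y) and returns a slope.
Slope = Callable[[float, float, float], float]


def euler_system(f: Slope, g: Slope, t0: float, x0: float, y0: float,
                 h: float, steps: int) -> List[Tuple[float, float, float]]:
    """Return the points (t_n, x_n, y_n), n = 0..steps, produced by rule (6.4.9)."""
    x, y = x0, y0
    points = [(t0, x, y)]
    for n in range(steps):
        t = t0 + n * h
        # Evaluate both slopes at the current point before updating either
        # variable, so that y_{n+1} is built from x_n rather than x_{n+1}.
        dx = f(t, x, y)
        dy = g(t, x, y)
        x, y = x + h * dx, y + h * dy
        points.append((t0 + (n + 1) * h, x, y))
    return points


# Illustrative use: x' = 9y - y^2, y' = x, x(0) = 1, y(0) = 8, h = 0.1.
points = euler_system(lambda t, x, y: 9 * y - y ** 2,
                      lambda t, x, y: x,
                      t0=0.0, x0=1.0, y0=8.0, h=0.1, steps=3)
for t, x, y in points:
    print(f"{t:.1f}  {x:.5f}  {y:.5f}")
```

The one design point worth noting is that both slopes are computed before either variable is overwritten; updating $x$ first and then using the new $x$ in the formula for $y$ would be a different method from (6.4.9).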

At the end of this section, we will discuss the implementation of Euler's method for systems in Excel. For now, we simply report the results of such an implementation here to see the approximations generated. For the original system we considered above,

$$
\begin{aligned}
x' &= 9y - y^2, &\quad x(0) &= 1 \\
y' &= x, &\quad y(0) &= 8
\end{aligned}
\tag{6.4.10}
$$

recall that this system was also studied in example 6.3.2 in section 6.3. There we observed that the equilibrium solution (0, 9) is a stable center of the system and that we expect elliptical orbits nearby. If, for the IVP (6.4.10), we choose a step-size of h = 0.1 and take enough steps to complete the expected loop in the orbit, we see the abbreviated data in table 6.1. In particular, after taking enough steps to loop back around to near the initial condition (1, 8), we have not returned to this point; we have missed it appreciably, with the two nearest approximations being (0.527, 6.259) and (2.243, 6.312).

If we decrease the step size h and take more steps, we can improve the accuracy of the approximation. Doing so with h = 0.01 results in the values in table 6.2. We see that the approximate trajectory has completed one full loop and has nearly returned to pass through the point (1, 8) where the trajectory began. This behavior is more consistent with what we expected based on the classification of the equilibrium point (0, 9) as a stable center through linearization in the preceding section.

Table 6.1 Euler's method applied to (6.4.10) with step-size h = 0.1

| $t_n$ | $x_n$ | $y_n$ |
|---|---|---|
| 0 | 1 | 8 |
| 0.1 | 1.8 | 8.1 |
| 0.2 | 2.529 | 8.28 |
| 0.3 | 3.12516 | 8.5329 |
| ⋮ | ⋮ | ⋮ |
| 2 | −1.146540202 | 6.373703158 |
| 2.1 | 0.527383445 | 6.259049138 |
| 2.2 | 2.242958058 | 6.311787483 |
| 2.3 | 3.93970067 | 6.536083289 |

Table 6.2 Euler's method applied to (6.4.10) with step-size h = 0.01

| $t_n$ | $x_n$ | $y_n$ |
|---|---|---|
| 0 | 1 | 8 |
| 0.01 | 1.08 | 8.01 |
| 0.02 | 1.159299 | 8.0208 |
| 0.03 | 1.237838674 | 8.03239299 |
| ⋮ | ⋮ | ⋮ |
| 2.09 | 0.934286677 | 7.878994865 |
| 2.1 | 1.022610614 | 7.888337731 |
| 2.11 | 1.110302289 | 7.898563837 |
| 2.12 | 1.197299927 | 7.90966686 |
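The effect of the step-size on how well the approximate orbit closes up can be quantified directly. The sketch below is our own illustration rather than part of the spreadsheet approach of this section: the hypothetical helper `euler_loop_gap` applies Euler's method to the system $x' = 9y - y^2$, $y' = x$ starting from (1, 8), and reports how close the iterates with $1 \le t \le t_{\text{end}}$ come to the starting point. The cutoff $t \ge 1$ is an assumption made only to skip the first few iterates, which are trivially near the start.

```python
import math


def euler_loop_gap(h: float, t_end: float = 2.5) -> float:
    """Smallest distance from (1, 8) to an Euler iterate with 1 <= t_n <= t_end."""
    x, y = 1.0, 8.0
    steps = round(t_end / h)
    best = math.inf
    for n in range(1, steps + 1):
        # Tuple assignment evaluates both right-hand sides with the old x and y.
        x, y = x + h * (9 * y - y * y), y + h * x
        if n * h >= 1.0:
            best = min(best, math.hypot(x - 1.0, y - 8.0))
    return best


for h in (0.1, 0.01):
    print(f"h = {h}: closest return to (1, 8) is at distance {euler_loop_gap(h):.3f}")
```

Running this for the two step-sizes used in tables 6.1 and 6.2 gives a single number for each that summarizes how badly the orbit misses its starting point.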

In the first example with Euler's method that we just completed, we observe one of the major weaknesses of the method: when a large number of steps are needed and some of the changes in x and y are large, a substantial amount of error accumulates in the calculations, since each tangent-line step introduces some error and every later step compounds it. While more sophisticated numerical methods exist (and are studied in chapter 7), for now we limit ourselves to Euler's method in order to first get an intuitive feel for the numerical behavior of approximate solutions. Another example follows.

Example 6.4.1

For the system of initial-value problems given by

$$
\begin{aligned}
x' &= y - x^3, &\quad x(0) &= 2 \\
y' &= x - y^3, &\quad y(0) &= -1
\end{aligned}
\tag{6.4.11}
$$

estimate the solution to the IVP up to $t = 5$ using $h = 0.1$ and comment on the behavior of the trajectory.

Solution. In the given problem, if we take the perspective that $x' = f(t, x, y)$ and $y' = g(t, x, y)$, then it follows that $f(t, x, y) = y - x^3$ and $g(t, x, y) = x - y^3$. Applying (6.4.9) with $h = 0.1$, we have

$$
\begin{aligned}
x_{n+1} &= x_n + 0.1 \cdot (y_n - x_n^3) \\
y_{n+1} &= y_n + 0.1 \cdot (x_n - y_n^3)
\end{aligned}
$$

Beginning this iteration with $x_0 = 2$ and $y_0 = -1$, we generate the following table.

| $t_n$ | $x_n$ | $y_n$ |
|---|---|---|
| 0 | 2 | −1 |
| 0.1 | 1.1 | −0.7 |
| 0.2 | 0.8969 | −0.5557 |
| 0.3 | 0.769180708 | −0.448849846 |
| ⋮ | ⋮ | ⋮ |
| 4.7 | 0.994536765 | 0.994533281 |
| 4.8 | 0.995620126 | 0.995618024 |
| 4.9 | 0.996490144 | 0.996488877 |
| 5 | 0.997188297 | 0.997187534 |

In the table, we see behavior consistent with the fact that the equilibrium point (1, 1) of the system is a stable attracting node. In addition, the numerical data is in agreement with the graphical behavior we expect based on the direction ﬁeld in ﬁgure 6.4 where we ﬁrst considered the given nonlinear system. This behavior is also seen in the following plot in ﬁgure 6.11, which shows the (xn , yn ) data from n = 0, . . . , 50 generated by Excel.

[Figure 6.11: The trajectory for the IVP (6.4.11) generated by Euler's method with h = 0.1.]

Example 6.4.1 shows that when small changes in t lead to very small changes in x(t) and y(t), such as near a stable, attracting node, Euler's method produces reasonable approximations without having to resort to extremely small h-values. We also see the importance of having a theoretical understanding of the expected behavior in advance of executing computations in order to check the reasonableness of our results.

6.4.1 Implementing Euler's method for systems in Excel

Just as we did for single initial-value problems in section 2.6.1, we will use Excel to generate approximate solutions to system IVPs. In this setting, given an initial-value problem

$$
\begin{aligned}
x' &= f(t, x, y), &\quad x(t_0) &= x_0 \\
y' &= g(t, x, y), &\quad y(t_0) &= y_0
\end{aligned}
\tag{6.4.12}
$$

we seek approximations $x_1, x_2, \ldots$ and $y_1, y_2, \ldots$ such that $(x_n, y_n) \approx (x(t_n), y(t_n))$, where $t_{n+1} = t_n + h$ for some chosen step-size $h$. In particular, we have shown that these approximations are generated using Euler's method by the rule

$$
\begin{aligned}
x_{n+1} &= x_n + h \cdot f(t_n, x_n, y_n) \\
y_{n+1} &= y_n + h \cdot g(t_n, x_n, y_n)
\end{aligned}
\tag{6.4.13}
$$

In a spreadsheet, we will view the following data: step number $n$, step-size $h$, $t_n$, $x_n$, $y_n$, $f(t_n, x_n, y_n)$, and $g(t_n, x_n, y_n)$, where $t_n$ is the value of the independent variable and $(x_n, y_n) \approx (x(t_n), y(t_n))$ is an estimate of the solution to the IVP at the value $t_n$. This data will appear in a given row, where the row contains all these values for the corresponding $n$-value. From this, we naturally build subsequent approximations $(x_{n+1}, y_{n+1})$ based on the preceding row. We will demonstrate the development of such an Excel spreadsheet for the particular example

$$
\begin{aligned}
x' &= y - x^3, &\quad x(0) &= 2 \\
y' &= x - y^3, &\quad y(0) &= -1
\end{aligned}
\tag{6.4.14}
$$

that we investigated in example 6.4.1. To begin, we establish names for the various columns, say in cells A1 through G1, and see on our screen in Excel the information below.

|   | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| 1 | $n$ | $h$ | $t_n$ | $x_n$ | $y_n$ | $f(x_n, y_n)$ | $g(x_n, y_n)$ |

In most of the examples we consider with Euler's method, the system will be autonomous (i.e., $t$ does not appear explicitly in the functions $f$ and $g$), and therefore we choose to omit $t$ from the column labels for $f(t_n, x_n, y_n)$ and $g(t_n, x_n, y_n)$. In the subsequent row 2, we now enter the given data at step zero. In particular, in cell A2 we enter the step number (“0”), in B2 the chosen step-size (“0.1”), in C2 the starting t-value (“0”), in D2 the starting x-value (“2”), and in E2 the starting y-value (“-1”). Next, in F2, we apply the function $f(t, x, y)$ to get the slope at the point at this step. That is, since in this IVP $f(t, x, y) = y - x^3$, we enter in F2 the command “= E2 - D2^3”. Similarly, since $g(t, x, y) = x - y^3$, in G2 we enter “= D2 - E2^3”. Now our spreadsheet appears as follows.

|   | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| 1 | $n$ | $h$ | $t_n$ | $x_n$ | $y_n$ | $f(x_n, y_n)$ | $g(x_n, y_n)$ |
| 2 | 0 | 0.1 | 0 | 2 | -1 | -9 | 3 |

In the next row, row 3, we may now build subsequent entries based on existing data. To increase the step number, in A3 we enter “= A2 + 1”. Since the step-size stays constant throughout, in B3 we input “= B2”. Since the next t-value will be the preceding t-value plus the step-size ($t_1 = t_0 + h$), we enter in C3 the command “= C2 + B2”. To compute the next x-value in cell D3 from Euler's method, we know that $x_1 = x_0 + h f(t_0, x_0, y_0)$. Hence, in D3 we write “= D2 + B2*F2”. Similarly, to compute $y_1 = y_0 + h g(t_0, x_0, y_0)$, in cell E3 we enter “= E2 + B2*G2”. Finally, we also need values of $f(t_1, x_1, y_1)$ and $g(t_1, x_1, y_1)$ for use in the following step. This involves simply updating the functions $f(t, x, y)$ and $g(t, x, y)$ at the given t-, x-, and y-values, so we select cell F2, copy it, and paste it into cell F3. Equivalently, we can directly enter in F3 “= E3 - D3^3”. We can similarly copy G2 into G3, or in G3 enter “= D3 - E3^3”. Below is the current state of our spreadsheet.

|   | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| 1 | $n$ | $h$ | $t_n$ | $x_n$ | $y_n$ | $f(x_n, y_n)$ | $g(x_n, y_n)$ |
| 2 | 0 | 0.1 | 0 | 2 | -1 | -9 | 3 |
| 3 | 1 | 0.1 | 0.1 | 1.1 | -0.7 | -2.031 | 1.443 |

Now we can harness the power of Excel to compute as many subsequent steps as we like. By using the mouse to highlight row 3 (cells A3 through G3), and then placing the cursor on the bottom right corner of the last highlighted cell, G3, we can click and drag downward to fill subsequent rows with similar calculations. For example, doing so through row 7 yields the following.

|   | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| 1 | $n$ | $h$ | $t_n$ | $x_n$ | $y_n$ | $f(x_n, y_n)$ | $g(x_n, y_n)$ |
| 2 | 0 | 0.1 | 0 | 2 | -1 | -9 | 3 |
| 3 | 1 | 0.1 | 0.1 | 1.1 | -0.7 | -2.031 | 1.443 |
| 4 | 2 | 0.1 | 0.2 | 0.8969 | -0.5557 | -1.2771929 | 1.0685015 |
| 5 | 3 | 0.1 | 0.3 | 0.7691807 | -0.4488498 | -0.9039271 | 0.8596087 |
| 6 | 4 | 0.1 | 0.4 | 0.6787879 | -0.3628889 | -0.6756426 | 0.7265762 |
| 7 | 5 | 0.1 | 0.5 | 0.6112237 | -0.2902313 | -0.5185811 | 0.6356711 |
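For readers who prefer a programming language to a spreadsheet, the same row-by-row construction can be mirrored in code. The following Python sketch is only an illustration under our own naming (the function `spreadsheet_rows` and the variable `rows` are hypothetical, not part of Excel or of the text); each tuple plays the role of one spreadsheet row with columns A through G, and the update at the bottom of the loop corresponds to the formulas entered in row 3.

```python
from typing import List, Tuple

Row = Tuple[int, float, float, float, float, float, float]


def spreadsheet_rows(h: float, steps: int) -> List[Row]:
    """Rows (n, h, t_n, x_n, y_n, f, g) for x' = y - x^3, y' = x - y^3, x(0) = 2, y(0) = -1."""
    n, t, x, y = 0, 0.0, 2.0, -1.0
    rows = []
    for _ in range(steps + 1):
        f = y - x ** 3          # column F: "= E - D^3"
        g = x - y ** 3          # column G: "= D - E^3"
        rows.append((n, h, t, x, y, f, g))
        # Next row: "= A + 1", "= C + B", "= D + B*F", "= E + B*G"
        n, t, x, y = n + 1, t + h, x + h * f, y + h * g
    return rows


rows = spreadsheet_rows(h=0.1, steps=5)
for n, h, t, x, y, f, g in rows:
    print(f"{n:2d}  {h:.2f}  {t:.1f}  {x: .7f}  {y: .7f}  {f: .7f}  {g: .7f}")
```

Changing the single argument `h` plays the same role as editing cell B2 in the spreadsheet: every subsequent row is recomputed from it.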

As we have noted previously, besides the relative simplicity of these computations, Excel offers further advantages. One is that changing one appropriately chosen cell will update all of our computations. For example, if we are interested in the change induced by a different step-size, say h = 0.01, all we need to do is enter “0.01” in cell B2, and every other cell will update accordingly.

In addition, if we desire to see the graphical results of our work, we can use Excel's Chart Wizard. To plot the trajectory generated by our approximations, we can simultaneously highlight the x and y columns in our spreadsheet above (cells D2 through D7 and E2 through E7), and then go to the Insert menu and select Chart (alternatively, we may click on the Chart Wizard icon on the toolbar). In the prompt window that arises, we choose “XY (Scatter)” and select one of the graph style options at the right. By clicking “Next” in a few subsequent windows (in which advanced users can avail themselves of more options), we eventually get to a final window where our graph appears along with the option to “Finish.” After we click “Finish,” the graph will appear in the spreadsheet and may be moved around by clicking and dragging it accordingly. We see the resulting plot displayed as in figure 6.12.

[Figure 6.12: An Excel plot of an approximate solution to the IVP (6.4.14).]

Exercises 6.4

In exercises 1–7, use Euler's method with the stated h-value to estimate the solution of the given system of IVPs at the given t-value. Compare your work to a plot of the direction field for the system and the classification of any relevant equilibrium solutions.³

1. $x' = y - 2xy$, $y' = 4xy - x$; $x(0) = 0.75$, $y(0) = 0.5$; $t = 1$, $h = 0.1$
2. $x' = 4 - y^2$, $y' = 1 - x + y$; $x(0) = -2$, $y(0) = -1$; $t = 1$, $h = 0.05$
3. $x' = \cos y$, $y' = 1 - \sin x$; $x(0) = 2$, $y(0) = 3$; $t = 1$, $h = 0.1$
4. $x' = 2x - y$, $y' = -4x + 2y$; $x(0) = 1$, $y(0) = 1$; $t = 1$, $h = 0.1$
5. $x' = e^{-y}$, $y' = 1/(1 + x^2)$; $x(0) = 0$, $y(0) = 0$; $t = 1$, $h = 0.05$
6. $x' = \ln(2 + y)$, $y' = x^2 + y$; $x(0) = -1$, $y(0) = -0.5$; $t = 1$, $h = 0.1$
7. $x' = y - x^2$, $y' = x - 8y^2$; $x(0) = 1$, $y(0) = 0.75$; $t = 1$, $h = 0.05$

³ In the exercises of section 6.2, equilibrium solutions were found and direction fields were plotted in exercises 1–7, which correspond to the same systems of differential equations given here. Similarly, in section 6.3, equilibrium solutions were classified through linearization in exercises 11–17, which also correspond to these systems.

8. Recall from section 6.1 that the nonlinear system of differential equations

$$
\begin{aligned}
W' &= -0.75W + 0.25MW \\
M' &= 0.5M - 0.1MW
\end{aligned}
$$

models the numbers of wolves and moose (each measured in hundreds) in a predator-prey model where time is measured in years. Assume that at time t = 0 there are 250 moose and 550 wolves. Estimate the numbers of moose and wolves present at t = 3, 6, and 9 years using a step-size of (a) h = 0.1, and (b) h = 0.01. Discuss your findings and describe the behavior of the trajectory.⁴

⁴ In the exercises of section 6.2, equilibrium solutions were found and the direction field was plotted for this system in exercise 8.

6.5 For further study

6.5.1 The damped pendulum

In our development of the pendulum equation, we learned that for a pendulum with an arm of length $L$ and bob of mass $m$, the angle $\theta$ that the arm forms with the positive x-axis at time $t$ satisfies the IVP

$$
L\theta'' = -g \sin\theta, \quad \theta(0) = \theta_0, \quad \theta'(0) = \theta_0'
\tag{6.5.1}
$$

provided that we assume no friction is present in the screw from which the pendulum hangs and there is no air drag on the bob. Here, we investigate the effects of such resistance on the pendulum's behavior.

(a) Under the natural assumption that the friction or damping that is present is directly proportional to the velocity of the bob along the arc of motion, explain why it follows that the pendulum is governed by the IVP

$$
L\theta'' = -g \sin\theta - c\,\theta', \quad \theta(0) = \theta_0, \quad \theta'(0) = \theta_0'
\tag{6.5.2}
$$

where $c$ is the damping constant.

(b) Using the standard change of variables, convert the nonlinear second-order IVP (6.5.2) to a nonlinear system of first-order IVPs. Write the system in the form $\mathbf{x}' = \mathbf{F}(\mathbf{x})$ for an appropriate function $\mathbf{F}$.

(c) Determine all equilibrium solutions of the system in (b). Are the equilibria different from those of the undamped pendulum?

(d) Let a given pendulum have an arm of length $L = 1$ m, and recall that $g = 9.8$ m/sec². For each of the c-values $c = 0.5$, $c = 1$, $c = 2$, and $c = 5$, plot the direction field for the system in (b) as well as trajectories that correspond to the stated initial conditions below. For each plot, discuss the behavior of the pendulum over time and how damping affects the observed behavior.
   (i) $\theta(0) = 2$, $\theta'(0) = 0$
   (ii) $\theta(0) = 4$, $\theta'(0) = 0$
   (iii) $\theta(0) = 2$, $\theta'(0) = 10$
   (iv) $\theta(0) = 2$, $\theta'(0) = -10$
In addition, be sure to discuss the physical interpretation of each set of initial conditions and how these conditions affect the trajectories.

(e) Using $c = 1$, find the linear approximation of the system in (b) at two different equilibrium points, one that is stable and another that is unstable. Discuss the graphical behavior of the two linear systems you find near the equilibrium points and how this compares to the plot of the corresponding direction field in (d).

(f) Again using $c = 1$ and $L = 1$, apply Euler's method with $h = 0.01$ to the system in (b) with the initial conditions $\theta(0) = 2$, $\theta'(0) = 10$. Experiment with how many steps are needed in order to have the approximations approach the stable equilibrium $(2\pi, 0)$, plot the approximations you compute, and compare the results to the appropriate direction field in (d).

6.5.2 Competitive species

In our development of the predator–prey equations, we used the fundamental assumption that the prey population would, in the absence of a predator, grow according to an exponential model, and similarly that the predator would decay exponentially if no prey is available. These hypotheses led us to equations of the form

$$
\begin{aligned}
x' &= ax - cxy \\
y' &= -by + dxy
\end{aligned}
\tag{6.5.3}
$$

where $x$ is the prey population and $y$ represents the number of predators. Recall that the terms $-cxy$ and $dxy$ represent a fraction of the number of predator–prey interactions that are, respectively, harmful or beneficial to the two species.

In what follows, we consider a similar scenario where, instead of one species preying on the other, two species are competing for resources. In this setting, species interactions (modeled by “$xy$”) are harmful to both species. In addition, rather than assuming exponential growth or decay for the individual populations, we explore the effects of the assumption that each population on its own grows logistically.

(a) Assume that in the absence of another species competing for resources, the population $x(t)$ grows according to the logistic model

$$
x' = ax\left(1 - \frac{x}{A}\right)
$$

where $a$ and $A$ are positive constants ($a$ is the population's growth constant and $A$ is its carrying capacity). Similarly, for a second population $y(t)$, assume that without another competing species present $y(t)$ is governed by the model

$$
y' = by\left(1 - \frac{y}{B}\right)
$$

where $b$ and $B$ are positive constants. By viewing a fraction of the interactions $xy$ as harmful, we can subtract from each of the above differential equations a term proportional to $xy$ (say $\alpha xy$ from $x'$ and $\beta xy$ from $y'$) to account for this competition. Do so, and show that the populations $x(t)$ and $y(t)$ satisfy the system of equations given by

$$
\begin{aligned}
x' &= ax\left(1 - \frac{1}{A}x - \frac{\alpha}{a}y\right) \\
y' &= by\left(1 - \frac{1}{B}y - \frac{\beta}{b}x\right)
\end{aligned}
\tag{6.5.4}
$$

(b) Throughout the remaining questions, we assume that $x$ and $y$ represent populations measured in thousands. We explore the impact of different constants in the equations, as well as various initial conditions. In (6.5.4), let $a = 0.5$, $b = 0.25$, $A = 5$, $B = 2$, $\alpha = 0.04$, and $\beta = 0.02$. Find all equilibrium points of the system. (Hint: there are more than two equilibria.)

(c) At each of the equilibrium points determined in (b), compute the linearization of the system (6.5.4), and hence determine the stability of the equilibrium point.

(d) In an appropriate window, plot the direction field for the system (6.5.4) and discuss how the direction field supports your conclusions regarding the stability of various equilibrium points in (c). Discuss the long-term behavior of the two populations for several different initial conditions.

(e) With the initial conditions $x(0) = 2$, $y(0) = 2$, use Euler's method for systems to estimate the values of the populations at a range of time values. Use a step size of $h = 0.1$ and compare your results to the plot in (d).

(f) In (6.5.4), use the parameter values given in (b), except change the carrying capacity of the second population to $B = 15$. Respond to prompts (b), (c), (d), and (e) for this scenario and compare and contrast the updated system with the first one considered. In the new situation, which population will dominate in the long run? Why do you think this is the case?

(g) In (6.5.4), let $a = 0.5$, $b = 0.25$, $A = 5$, and $B = 2$, but now adjust the parameters $\alpha$ and $\beta$ to reflect greater competition for resources by setting $\alpha = 0.4$ and $\beta = 0.2$. Respond to prompts (b), (c), (d), and (e) for this scenario and compare and contrast the updated system with the first one considered. In the new situation, which population is more likely to dominate in the long run? For which initial conditions is the weaker population able to survive?

(h) Suppose there are three different species $x$, $y$, and $z$, all competing for resources. Under the assumption that population interactions $xy$ and $xz$ are harmful to $x$, and so on, what system of differential equations models the behavior of the three species?

7 Numerical methods for differential equations

7.1 Motivating problems

In previous chapters, we have learned to solve a wide range of differential equations. Primarily, our focus has been on linear differential equations: first-order linear equations, higher order linear equations with constant coefficients, and systems of linear equations with constant coefficients. Indeed, we have learned through a variety of techniques that, provided a differential equation or system is linear, we can almost always find a solution.

The situation is much more complicated for nonlinear equations. For example, while we can use an integrating factor to solve the linear first-order differential equation $y' + y = t$, if we replace $y$ by $y^2$, the differential equation

$$
y' + y^2 = t
\tag{7.1.1}
$$

is no longer linear. In addition, (7.1.1) is not separable, nor is it exact. With none of our established analytical methods available, it appears that we cannot solve this differential equation. If faced with the related initial-value problem

$$
y' + y^2 = t, \quad y(0) = 1
\tag{7.1.2}
$$

we know that we can visually approximate a solution by plotting the direction field that corresponds to the differential equation. Moreover, we learned in section 2.6 that we can generate a sequence of estimates of the values of the solution $y(t)$ at discrete $t$-values separated by a step-size $h$ according to the rule

$$
t_{n+1} = t_n + h \quad \text{and} \quad y_{n+1} = y_n + h f(t_n, y_n), \quad \text{for } n \ge 0
\tag{7.1.3}
$$

The algorithm that generates this sequence of approximations is called Euler's method.

We encounter the same difficulties with higher order differential equations. While we can solve almost any higher order linear equation with constant coefficients, such as

$$
y'' + a_1 y' + a_0 y = f(t)
$$

nonlinear equations are much more difficult. For instance, as discussed in section 6.1, a simple pendulum may be modeled by the nonlinear second-order initial-value problem

$$
\theta'' + \frac{g}{L} \sin\theta = 0, \quad \theta(0) = \theta_0, \quad \theta'(0) = \theta_1
\tag{7.1.4}
$$

where $\theta(t)$ is the angle the arm of the pendulum forms with a vertical axis at time $t$. In chapter 6, we introduced several different approaches to approximate the solution to (7.1.4); each was based on converting the second-order equation to a system of first-order equations and approximating the solution to the resulting system.

Finally, nonlinear systems of differential equations are important in their own right. A prominent example is the predator–prey equations, discussed in detail in section 6.1, where two populations $M(t)$ and $W(t)$ (in hundreds) are modeled by the following system of nonlinear first-order initial-value problems:

$$
\begin{aligned}
W' &= W(-0.75 + 0.25M), &\quad W(0) &= 3 \\
M' &= M(0.5 - 0.1W), &\quad M(0) &= 7
\end{aligned}
\tag{7.1.5}
$$

As with the pendulum, the nonlinearity of these equations makes determining an analytical solution (i.e., formulas for $W(t)$ and $M(t)$) impossible, and therefore we must instead be content to find approximate solutions. In section 6.4, we introduced an extension of Euler's method that can be used to produce some basic approximations to the solution of a system of nonlinear initial-value problems such as (7.1.5). But through a variety of examples considered in sections 2.6 and 6.4, we have seen that Euler's method has a big downside: each step produces significant error, and each step compounds the error from the preceding step. To get an accurate approximation using Euler's method, a very small step-size $h$ is usually needed.
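To see how the Euler estimate depends on the step-size, we can apply rule (7.1.3) to the IVP (7.1.2), for which $f(t, y) = t - y^2$. The Python sketch below is a minimal illustration under our own naming (`euler` and `slope` are hypothetical helper names); it estimates $y(1)$ for three step-sizes so that the change from one estimate to the next can be inspected.

```python
from typing import Callable


def euler(f: Callable[[float, float], float], t0: float, y0: float,
          h: float, steps: int) -> float:
    """Apply y_{n+1} = y_n + h f(t_n, y_n) for the given number of steps; return the last y_n."""
    y = y0
    for n in range(steps):
        t = t0 + n * h          # t_n, computed from n to avoid accumulating additions
        y = y + h * f(t, y)
    return y


def slope(t: float, y: float) -> float:
    """Right-hand side of (7.1.2) written as y' = t - y^2."""
    return t - y ** 2


for h in (0.1, 0.01, 0.001):
    steps = round(1.0 / h)
    print(f"h = {h}: {steps} steps, y(1) is approximately {euler(slope, 0.0, 1.0, h, steps):.6f}")
```

Each tenfold reduction in $h$ requires ten times as many steps, which is the trade-off that motivates the search for more accurate methods.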
With modern computing power so readily available, we might be tempted to simply take very small h-values in this approach and be content to do thousands of computations to get estimates of solutions. But taking smaller and smaller values of $h$ proves to be an unsatisfactory approach for many reasons, perhaps most significantly because, as numbers get extremely small, computers have great difficulty distinguishing them from zero and major round-off errors can result.

Instead, we will seek to develop approaches in the spirit of Euler's method, but more sophisticated in that they naturally reduce the error that comes from using a step of $h = \Delta t$. Our goal is to develop numerical methods for initial-value problems (for first-order, higher order, and systems) that, given a step-size $h$, produce an accurate approximate solution to the initial-value problem. We desire that the methods give reasonably good approximations for small (but not too small) values of $h$, while at the same time not requiring too many calculations. In the upcoming sections, we will discuss problems of the nature of (7.1.2), (7.1.4), (7.1.5), and more, and develop and apply algorithms that produce acceptable approximations to solutions.

7.2 Beyond Euler’s method

To approach an initial-value problem that we cannot solve by standard techniques, such as separation of variables or integrating factors, we have learned that one option is to use Euler's method. Given the IVP

$$
y' = f(t, y), \quad y(t_0) = y_0
$$

this algorithm generates a sequence of points $(t_1, y_1), (t_2, y_2), \ldots, (t_n, y_n)$ according to the rule

$$
y_{n+1} = y_n + h f(t_n, y_n) \quad \text{for } n \ge 0
\tag{7.2.1}
$$

where $t_{n+1} = t_n + h$. Each $y_n$ is an approximation to the value of the actual solution $y$ at the value $t_n$. That is, $y(t_n) \approx y_n$.

Euler's method is developed by using the standard tangent line approximation in calculus. While this is instructive and intuitive, the method is the least accurate of the many available methods. In this section, we begin to develop algorithms beyond Euler's method in an effort to increase the accuracy of our approximations while actually decreasing the number of computations we execute.

Before we develop new approaches, we first revisit some important concepts from numerical integration in calculus. These ideas not only remind us of key issues in approximation techniques, but also inform our efforts to approximate solutions to initial-value problems. Given a continuous function $f(t)$ on an interval $[t_0, t_0 + h]$, there are several basic approximations to $\int_{t_0}^{t_0+h} f(t)\,dt$. Specifically,

$$
\begin{aligned}
\int_{t_0}^{t_0+h} f(t)\,dt &\approx h \cdot f(t_0) && \text{(left endpoint rule)} \\
\int_{t_0}^{t_0+h} f(t)\,dt &\approx h \cdot f(t_0 + h) && \text{(right endpoint rule)} \\
\int_{t_0}^{t_0+h} f(t)\,dt &\approx h \cdot \frac{f(t_0) + f(t_0 + h)}{2} && \text{(trapezoid rule)} \\
\int_{t_0}^{t_0+h} f(t)\,dt &\approx h \cdot f\!\left(t_0 + \frac{h}{2}\right) && \text{(midpoint rule)}
\end{aligned}
$$

It is a standard exercise in calculus to show that the left and right endpoint rules are the least accurate approximations of the four, while the midpoint rule is the best. While one can make sophisticated arguments using Taylor series to justify claims about the size of the error in such an approximation, visual arguments are just as convincing: sampling $f$ at the midpoint of the interval usually balances the behavior of the function and leads to the best approximation of the integral of the four options above.

There is a direct link between the numerical approximation of definite integrals and numerical methods to estimate solutions to initial-value problems such as Euler's method. Given the IVP

$$
y'(t) = f(t, y), \quad y(t_0) = y_0
$$

if we integrate both sides of the differential equation with respect to $t$ from $t = t_0$ to $t = t_0 + h$ for some $h > 0$, then

$$
\int_{t_0}^{t_0+h} y'(t)\,dt = \int_{t_0}^{t_0+h} f(t, y(t))\,dt
\tag{7.2.2}
$$

Integrating the left side of (7.2.2), we have

$$
y(t_0 + h) - y(t_0) = \int_{t_0}^{t_0+h} f(t, y(t))\,dt
$$

or equivalently

$$
y(t_0 + h) = y(t_0) + \int_{t_0}^{t_0+h} f(t, y(t))\,dt
\tag{7.2.3}
$$

Estimating the integral in (7.2.3) with the left endpoint rule,

$$
y(t_0 + h) \approx y(t_0) + h f(t_0, y(t_0))
\tag{7.2.4}
$$

Using the initial condition $y(t_0) = y_0$, it follows that

$$
y(t_0 + h) \approx y_0 + h f(t_0, y_0)
\tag{7.2.5}
$$

which is precisely the first step in Euler's method. That is, we have shown in our efforts to step from $t = t_0$ to $t = t_0 + h$ along the solution $y(t)$ that this process can be equivalently achieved by estimating the value of a definite integral. Moreover, Euler's method can be viewed as arising naturally from estimating the required definite integral through a left endpoint rule. As such, it is not surprising that Euler's method is not an accurate approach, for neither is the left endpoint rule for approximating integrals. The availability of the trapezoid and midpoint rules as better approximations leads us to consider two improvements upon Euler's method.

7.2.1 Heun's method

To improve on Euler's method, we return to (7.2.3), and instead estimate the definite integral on the right-hand side with the trapezoid rule. Doing so, we find

$$
y(t_0 + h) \approx y(t_0) + h \cdot \frac{f(t_0, y(t_0)) + f(t_0 + h, y(t_0 + h))}{2}
\tag{7.2.6}
$$

The difficulty in (7.2.6) is that the last term in the approximation on the right-hand side involves $y(t_0 + h)$, the very quantity we are trying to estimate. One way to view what is occurring in this approach is that we are trying to use not only the slope at $(t_0, y_0)$, computed as $f(t_0, y_0)$, but also the slope at $(t_0 + h, y(t_0 + h))$. While we do not know $y(t_0 + h)$ exactly, we can estimate this value using Euler's method. In particular, if we use the fact that $y(t_0) = y_0$ and employ the Euler approximation $y(t_0 + h) \approx y_0 + h f(t_0, y_0)$, then from (7.2.6) we find that

$$
y(t_0 + h) \approx y_0 + h \cdot \frac{f(t_0, y_0) + f(t_0 + h, y_0 + h f(t_0, y_0))}{2}
\tag{7.2.7}
$$

Generalizing (7.2.7) to the situation where we are moving from the known approximation $y(t_n) \approx y_n$ at point $(t_n, y_n)$ to a new approximation $(t_{n+1}, y_{n+1})$ with $t_{n+1} = t_n + h$, we have developed Heun's method given by

$$
y_{n+1} = y_n + h \cdot \frac{f(t_n, y_n) + f(t_{n+1}, y_n + h f(t_n, y_n))}{2}
\tag{7.2.8}
$$

Because this algorithm is more complicated than Euler's method, some additional notation can assist us in its implementation. We first let

$$
a_n = f(t_n, y_n)
\tag{7.2.9}
$$

which is the slope of the solution curve at $(t_n, y_n)$ given by the IVP. We observe that the expression $a_n$ arises twice in (7.2.8), and that we also have to compute $f(t_{n+1}, y_n + h a_n)$. We therefore let

$$
b_n = f(t_{n+1}, y_n + h a_n)
\tag{7.2.10}
$$

It follows that Heun's method is then executed by computing

$$
y_{n+1} = y_n + h \cdot \frac{a_n + b_n}{2}
\tag{7.2.11}
$$

In this light, we see that Heun's method uses the average of two slopes (the slope at $(t_n, y_n)$ and the approximate slope at $(t_{n+1}, y_{n+1})$) in order to predict the next value of the solution $y(t)$. We consider an example to demonstrate how Heun's method is implemented and to contrast its results with those from Euler's method.

Example 7.2.1 Execute ten steps of Heun's method with $h = 0.1$ to find an approximate solution of the initial-value problem

$$
y' = 2t(2 - y), \quad y(0) = 1
$$

Compare the results to Euler's method as well as the exact solution of the IVP.

Solution. Note first that the given differential equation is both linear and separable. The exact solution of the IVP is $y(t) = 2 - e^{-t^2}$.

To apply Heun's method, we must compute $a_n$, $b_n$, and $y_n$ at each step. To begin, $a_0 = f(t_0, y_0)$. From the stated IVP, $f(t, y) = 2t(2 - y)$ and $(t_0, y_0) = (0, 1)$. Thus,

$$
a_0 = 2 \cdot 0 \cdot (2 - 1) = 0
$$

In addition, $b_0 = f(t_1, y_0 + h a_0)$, so

$$
b_0 = 2 \cdot 0.1 \cdot (2 - (1 + 0.1 \cdot 0)) = 0.2
$$

With both $a_0$ and $b_0$ calculated, we can now determine $y_1$ to be

$$
y_1 = y_0 + \frac{h}{2}(a_0 + b_0) = 1 + \frac{0.1}{2}(0 + 0.2) = 1.01
$$

Repeating these same steps to determine $y_2$, we find that

$$
a_1 = f(t_1, y_1) = f(0.1, 1.01) = 2 \cdot 0.1 \cdot (2 - 1.01) = 0.198
$$

and

$$
b_1 = f(t_2, y_1 + h a_1) = f(0.2, 1.01 + 0.1 \cdot 0.198) = 2 \cdot 0.2 \cdot (2 - 1.0298) = 0.38808
$$

so that

$$
y_2 = y_1 + \frac{0.1}{2}(a_1 + b_1) = 1.01 + 0.05(0.198 + 0.38808) = 1.039304
$$

Implementing the remaining computations in a program such as Excel, it follows that we can generate the values shown in table 7.1. Included in the table are the approximations generated by Euler's method, as well as the errors resulting from both methods, which are computed by comparison to the exact solution of the IVP. For simplicity, we report the results from every other step in each algorithm.

Table 7.1 Euler's method and Heun's method applied to the IVP $y' = 2t(2 - y)$, $y(0) = 1$, using h = 0.1

| $t_n$ | Euler $y_n$ | Heun $y_n$ | Solution $y(t_n)$ | Euler error $\lvert y(t_n) - y_n \rvert$ | Heun error $\lvert y(t_n) - y_n \rvert$ |
|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 0 | 0 |
| 0.2 | 1.02 | 1.039304 | 1.039210561 | 0.019210561 | 0.000093439 |
| 0.4 | 1.115648 | 1.147959794 | 1.147856211 | 0.032208211 | 0.000103583 |
| 0.6 | 1.267756544 | 1.302226785 | 1.302323674 | 0.034567130 | 0.000096889 |
| 0.8 | 1.445838152 | 1.472149858 | 1.472707576 | 0.026869424 | 0.000557718 |
| 1 | 1.618293319 | 1.630946606 | 1.632120559 | 0.013827240 | 0.001173953 |
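The computations behind a table of this kind need not be done in a spreadsheet. As one possible alternative, the following Python sketch implements (7.2.9) through (7.2.11) alongside Euler's method for the IVP $y' = 2t(2 - y)$, $y(0) = 1$; the function names (`euler_step`, `heun_step`, `run`) are our own illustrative choices, and the exact solution $y(t) = 2 - e^{-t^2}$ is used only to measure the errors.

```python
import math
from typing import Callable, List


def f(t: float, y: float) -> float:
    """Right-hand side of the IVP y' = 2t(2 - y)."""
    return 2 * t * (2 - y)


def exact(t: float) -> float:
    """Exact solution y(t) = 2 - exp(-t^2) of the IVP with y(0) = 1."""
    return 2 - math.exp(-t * t)


def euler_step(t: float, y: float, h: float) -> float:
    return y + h * f(t, y)


def heun_step(t: float, y: float, h: float) -> float:
    a = f(t, y)                  # slope at (t_n, y_n), as in (7.2.9)
    b = f(t + h, y + h * a)      # slope at the Euler prediction, as in (7.2.10)
    return y + h * (a + b) / 2   # average of the two slopes, as in (7.2.11)


def run(step: Callable[[float, float, float], float],
        h: float = 0.1, steps: int = 10) -> List[float]:
    """Return [y_0, y_1, ..., y_steps] starting from y(0) = 1."""
    ys = [1.0]
    for n in range(steps):
        ys.append(step(n * h, ys[-1], h))
    return ys


euler_ys = run(euler_step)
heun_ys = run(heun_step)
for n in range(0, 11, 2):        # report every other step
    t = n * 0.1
    print(f"{t:.1f}  {euler_ys[n]:.9f}  {heun_ys[n]:.9f}  {exact(t):.9f}  "
          f"{abs(exact(t) - euler_ys[n]):.9f}  {abs(exact(t) - heun_ys[n]):.9f}")
```

Heun's method costs two evaluations of $f$ per step rather than one, so a fair comparison with Euler's method would allow Euler half the step-size; the sketch makes that experiment a one-line change to the arguments of `run`.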

Obviously, Heun's method is a major improvement over Euler's method. In fact, given that we use the Euler approximation at each step to help forecast the next slope encountered, it is somewhat remarkable how accurate Heun's method is. It can be shown rigorously that the error in Heun's method is a significant improvement over Euler's method by relating the error in the approximation to the step-size $h$; it turns out¹ that the error made in a single step of Euler's method is proportional to $h^2$, while the error made in a single step of Heun's method is proportional to $h^3$.

Finally, we might observe that it appears unusual that the error in Heun's method actually drops from $t_4 = 0.4$ to $t_6 = 0.6$, and that the growth in the error slows in Euler's method at the same stage. This is due to the fact that the solution function $y(t) = 2 - e^{-t^2}$ is an increasing function whose concavity changes (from concave up to concave down) at the point $t = 1/\sqrt{2} \approx 0.71$; the change in concavity allows the linear approximations to temporarily catch up, instead of having the error continue to increase at an increasing rate.

We have seen that Heun's method is developed using an application of the trapezoid rule in numerical integration. We consider another similar method (based on the midpoint rule) before introducing more sophisticated techniques in section 7.3.

7.2.2 Modified Euler's method

The midpoint rule is normally more accurate than the trapezoid rule.² Given our experience with Heun’s method and its connection to the trapezoid rule, it makes sense to see if we can develop a related method that uses the perspective of the midpoint rule. Recalling (7.2.3),

$$y(t_0 + h) = y(t_0) + \int_{t_0}^{t_0 + h} f(t, y(t))\, dt$$

if we use the midpoint rule to estimate the integral, then we have to evaluate the integrand at the midpoint $t_0 + h/2$ of the interval $[t_0, t_0 + h]$. Doing so,

$$y(t_0 + h) \approx y(t_0) + h f\left(t_0 + \frac{h}{2},\ y\left(t_0 + \frac{h}{2}\right)\right) \tag{7.2.12}$$

As with Heun’s method, in the context of trying to solve the IVP $y' = f(t, y)$, $y(t_0) = y_0$, only $y(t_0)$ is known. Thus, we do not know, and therefore have to estimate, the value of $y(t_0 + h/2)$ in (7.2.12). We again employ Euler’s method and write

$$y\left(t_0 + \frac{h}{2}\right) \approx y(t_0) + \frac{h}{2} f(t_0, y(t_0)) \tag{7.2.13}$$

¹ A more formal analysis of errors that shows the dependence on powers of $h$ is discussed in section 7.3.
² On an interval where $f(x)$ has consistent concavity, the midpoint rule is approximately twice as accurate as the trapezoid rule.


Substituting (7.2.13) in (7.2.12) and replacing $y(t_0)$ with $y_0$,

$$y(t_0 + h) \approx y_0 + h f\left(t_0 + \frac{h}{2},\ y_0 + \frac{h}{2} f(t_0, y_0)\right) \tag{7.2.14}$$

Generalizing (7.2.14) to the situation where we are moving from a known approximation $y(t_n) \approx y_n$ at point $(t_n, y_n)$ to the next approximation at $(t_{n+1}, y_{n+1})$, we have developed the Modified Euler method given by

$$y_{n+1} = y_n + h f\left(t_n + \frac{h}{2},\ y_n + \frac{h}{2} f(t_n, y_n)\right) \tag{7.2.15}$$

As with Heun’s method, some additional notation assists us in tracking our computations. Let $a_n = f(t_n, y_n)$ and $c_n = y_n + \frac{h}{2} a_n$, so that

$$y_{n+1} = y_n + h f\left(t_n + \frac{h}{2},\ c_n\right) \tag{7.2.16}$$

We consider an example in order to see the implementation of the Modified Euler method and to compare its results to those of Heun’s method. We again employ an IVP that we can solve exactly in order to compare the errors of the two methods.

Example 7.2.2 Consider the initial-value problem $y' = e^{2t} - y$, $y(0) = 1$. Apply the Modified Euler method to estimate the value of $y(1)$ using $h = 0.1$ and compare the results with Heun’s method and the exact solution.

Solution. Since $y' = e^{2t} - y$ is a linear first-order differential equation, we can find the general solution $y(t) = Ce^{-t} + \frac{1}{3} e^{2t}$, and hence the exact solution to the IVP is

$$y(t) = \frac{2}{3} e^{-t} + \frac{1}{3} e^{2t}$$

To begin the Modified Euler method, we know from the given IVP that $f(t, y) = e^{2t} - y$ and that $(t_0, y_0) = (0, 1)$. Thus, $a_0 = f(t_0, y_0) = e^{2 \cdot 0} - 1 = 0$. Next, we observe that $c_0 = y_0 + \frac{h}{2} a_0 = 1 + 0.05 \cdot 0 = 1$. To compute $y_1$, by (7.2.16) we have

$$y_1 = y_0 + h f\left(t_0 + \frac{h}{2},\ c_0\right) = 1 + 0.1 \cdot \left(e^{2(0 + 0.05)} - 1\right) = 1 + 0.1 \cdot 0.105170918 = 1.010517092$$

Continuing to the next step, $a_1 = f(t_1, y_1) = e^{2 \cdot 0.1} - 1.010517092 = 0.210885666$. Next,

$$c_1 = y_1 + \frac{h}{2} a_1 = 1.010517092 + 0.05 \cdot 0.210885666 = 1.021061375$$


Table 7.2 Heun’s method and Modified Euler’s method (ME) applied to the IVP $y' = e^{2t} - y$, $y(0) = 1$ with $h = 0.1$

t_n    Heun y_n       ME y_n         Solution y(t_n)   Heun error |y(t_n) − y_n|   ME error |y(t_n) − y_n|
0      1              1              1                 0                           0
0.2    1.044572834    1.043396835    1.043095401       0.001477433                 0.000301434
0.4    1.192009094    1.189291538    1.188727007       0.003282087                 0.000564531
0.6    1.478251184    1.473408204    1.472580065       0.005671119                 0.000828139
0.8    1.959569856    1.951698881    1.950563451       0.009006405                 0.00113543
1      2.722082435    2.70981115     2.70827166        0.013810775                 0.001539489

Finally,

$$y_2 = y_1 + h f\left(t_1 + \frac{h}{2},\ c_1\right) = 1.010517092 + 0.1 \cdot \left(e^{2(0.1 + 0.05)} - 1.021061375\right) = 1.010517092 + 0.1 \cdot 0.328797432 = 1.043396835$$
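The hand computations of $y_1$ and $y_2$ above follow the pattern $a_n \to c_n \to y_{n+1}$ of (7.2.16), which can be sketched in Python as follows. The names `modified_euler`, `f`, and `exact` are illustrative assumptions, not part of the text.

```python
import math

def f(t, y):
    # Right-hand side of the IVP y' = e^(2t) - y from example 7.2.2
    return math.exp(2.0 * t) - y

def exact(t):
    # Exact solution y(t) = (2/3)e^(-t) + (1/3)e^(2t)
    return (2.0 / 3.0) * math.exp(-t) + (1.0 / 3.0) * math.exp(2.0 * t)

def modified_euler(f, t0, y0, h, steps):
    """Return [y_0, ..., y_steps] generated by the Modified Euler method (7.2.16)."""
    ys, y = [y0], y0
    for n in range(steps):
        t = t0 + n * h
        a = f(t, y)                    # a_n: slope at the current point
        c = y + 0.5 * h * a            # c_n: Euler estimate of y at the midpoint
        y = y + h * f(t + 0.5 * h, c)  # step using the estimated midpoint slope
        ys.append(y)
    return ys

if __name__ == "__main__":
    ys = modified_euler(f, 0.0, 1.0, 0.1, 10)
    for n in range(0, 11, 2):          # report every other step
        t = 0.1 * n
        print(f"{t:.1f}  {ys[n]:.9f}  {exact(t):.9f}  {abs(exact(t) - ys[n]):.9f}")
```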

Executing eight more steps using a computer, we find the results in table 7.2. We also show the results from Heun’s method in order to make a comparison between the two approaches we have developed beyond Euler’s method, again reporting the results from every other step.

From the table, we see that the Modified Euler method is an improvement over Heun’s method. This is not too surprising since the former stems from the midpoint rule for integration, while the latter stems from the trapezoid rule. In addition, if we plot the exact solution function, we see that the solution is always increasing and concave up over the interval of interest; in the presence of such consistent concavity in the solution function, the midpoint rule will generate noticeably more accurate approximations than will the trapezoid rule.

Both Heun’s method and the Modified Euler method are substantial improvements over the standard Euler’s method. Not only are their errors much smaller, but the errors grow less quickly. To better understand why this is so, observe that Euler’s method relies solely on presently available data in generating its estimates. That is, the method relies on just one data point in order to proceed to the next approximation. Our two newest methods instead look into the future: rather than using the current point and the slope at that location, they use the current point and an estimate of the slope at a point that is ahead of our current location. We create these estimates using only


the currently available data, but the approaches lead to a substantial increase in accuracy that makes us hopeful for significant improvements through other predictive approximation techniques that we have yet to investigate.

Exercises 7.2

In exercises 1–10, use (a) Euler’s method, (b) Heun’s method, and (c) the Modified Euler method to estimate $y(1)$ using $h = 0.1$, and compare the approximations generated by the three methods. In exercises 1–6, compare the approximations with the exact solution.

1. $y' + 2ty = 0$, $y(0) = -2$
2. $y' = 2y - 1$, $y(0) = 2$
3. $y' - y = 0$, $y(0) = 2$
4. $(y')^2 - 2y = 0$, $y(0) = 2$
5. $y' - y^2 = 1$, $y(0) = 0$
6. $tyy' = -1 - y^2$, $y(0) = 2$
7. $y' + ty = t^2$, $y(0) = 1$
8. $y' + y^2 = t$, $y(0) = 1$

9. y + sin y = 2e −t , √ 10. y = 2e t /2 sin y,

y(0) = 0 y(0) = 0

7.3 Higher order methods

In calculus, we learn that if $F(x)$ is a function with $n + 1$ derivatives in an interval surrounding a value $x = a$, then $F$ has a Taylor polynomial expansion that obeys the relationship

$$F(x) = F(a) + F'(a)(x - a) + \frac{F''(a)}{2!}(x - a)^2 + \cdots + \frac{F^{(n)}(a)}{n!}(x - a)^n + \frac{F^{(n+1)}(\zeta_x)}{(n + 1)!}(x - a)^{n+1} \tag{7.3.1}$$

which is valid for $x$-values in an interval surrounding $a$, where $\zeta_x$ is a number within that interval that depends on $x$. If we think of our interest in the solution $y(t)$ of an initial-value problem, assuming that $y$ is sufficiently differentiable, the Taylor series expansion of $y$ provides insight into errors that arise in approximation schemes. In (7.3.1), if we replace $F$ by $y$, $a$ by $t_0$, and $x$ by $t_0 + h$, noting that $x - a = h$, it follows that

$$y(t_0 + h) = y(t_0) + h y'(t_0) + \frac{h^2}{2!} y''(t_0) + \cdots + \frac{h^n}{n!} y^{(n)}(t_0) + O(h^{n+1}) \tag{7.3.2}$$

where by “$O(h^{n+1})$” we mean “of order $h^{n+1}$” or “proportional to $h^{n+1}$.”


From (7.3.2), we can discern the so-called truncation error of certain methods. For example, if we use the approximation

$$y(t_0 + h) \approx y(t_0) + h y'(t_0) \tag{7.3.3}$$

which corresponds to Euler’s method,³ we see that the truncation error is proportional to $h^2$ from the equation $y(t_0 + h) = y(t_0) + h y'(t_0) + O(h^2)$. We therefore say that Euler’s method is first-order, in reference to the highest power of $h$ present in (7.3.3). Since we use a small step-size $h$, it is evident that higher order methods are superior: in the error due to truncation, higher powers of $h$ will approach zero faster. In what follows, we will investigate second-, third-, and fourth-order approaches. The first two arise through using the Taylor series expansion directly, and are therefore called Taylor methods.

7.3.1 Taylor methods

To employ a second-order Taylor method, from (7.3.2) we must be able to compute

$$y(t_0 + h) \approx y(t_0) + h y'(t_0) + \frac{h^2}{2} y''(t_0) \tag{7.3.4}$$

In a standard initial-value problem, we are given $y' = f(t, y)$ (plus an initial condition), so we can compute $y''$ from the form of the differential equation. In particular, since $y'(t) = f(t, y(t))$, the chain rule for functions of two variables⁴ implies that

$$\begin{aligned}
y''(t) &= \frac{d}{dt}\left[f(t, y(t))\right] \\
&= f_t(t, y)\frac{d}{dt}[t] + f_y(t, y)\frac{d}{dt}[y] \\
&= f_t(t, y) + f_y(t, y)\, y' \\
&= f_t(t, y) + f_y(t, y) f(t, y)
\end{aligned} \tag{7.3.5}$$

Combining (7.3.5) with (7.3.4), we have developed the second-order Taylor method given by

$$y(t_0 + h) \approx y(t_0) + h f(t_0, y_0) + \frac{h^2}{2}\left[f_t(t_0, y_0) + f_y(t_0, y_0) f(t_0, y_0)\right] \tag{7.3.6}$$

Generalizing (7.3.6) to the step from $y_n$ to $y_{n+1}$, we find that

$$y_{n+1} = y_n + h f(t_n, y_n) + \frac{h^2}{2}\left[f_t(t_n, y_n) + f_y(t_n, y_n) f(t_n, y_n)\right] \tag{7.3.7}$$

³ Observe that we are writing $y'(t_0)$, which is given by $f(t_0, y_0)$ in Euler’s method.
⁴ We are using the rule that if $f(x, y)$ is a differentiable function of $x$ and $y$, and $x$ and $y$ are each differentiable functions of $t$, then $\frac{d}{dt}[f(x, y)] = f_x(x, y)\frac{dx}{dt} + f_y(x, y)\frac{dy}{dt}$.


where $y_n \approx y(t_n)$. We consider an example to demonstrate the implementation of this method and compare it to results previously considered.

Example 7.3.1 Execute ten steps of the second-order Taylor series method with $h = 0.1$ to find an approximate solution of the initial-value problem

$$y' = e^{2t} - y, \quad y(0) = 1$$

Compare the results to those of Heun’s method and to the exact solution.

Solution. This is the same IVP that we considered in example 7.2.2 with Heun’s method and the Modified Euler method. To employ (7.3.7), we first must compute $f_t(t, y)$ and $f_y(t, y)$. Since $f(t, y) = e^{2t} - y$, we know that $f_t(t, y) = 2e^{2t}$ and $f_y(t, y) = -1$. In addition, to simplify the implementation of the method, we use notation similar to Heun’s method. We let $a_n = f(t_n, y_n)$, $r_n = f_t(t_n, y_n)$, and $s_n = f_y(t_n, y_n)$, so that

$$y_{n+1} = y_n + h a_n + \frac{h^2}{2}\left[r_n + s_n a_n\right]$$

Beginning with $t_0 = 0$ and $y_0 = 1$, observe that

$$a_0 = f(0, 1) = e^{2 \cdot 0} - 1 = 0, \quad r_0 = f_t(0, 1) = 2e^{2 \cdot 0} = 2, \quad s_0 = f_y(0, 1) = -1$$

We then have

$$y_1 = y_0 + h a_0 + \frac{h^2}{2}\left[r_0 + s_0 a_0\right] = 1 + 0.1 \cdot 0 + \frac{0.1^2}{2}\left[2 - 1 \cdot 0\right] = 1.01$$

Similarly, we can compute

$$a_1 = f(0.1, 1.01) = e^{2 \cdot 0.1} - 1.01 = 0.211402758, \quad r_1 = f_t(0.1, 1.01) = 2e^{2 \cdot 0.1} = 2.442805516, \quad s_1 = f_y(0.1, 1.01) = -1$$

and thus

$$y_2 = y_1 + h a_1 + \frac{h^2}{2}\left[r_1 + s_1 a_1\right] = 1.01 + 0.1 \cdot 0.211402758 + \frac{0.1^2}{2}\left[2.442805516 - 1 \cdot 0.211402758\right] = 1.04229729$$

Continuing these computations through ten steps, we find the results noted in table 7.3, which are listed for every other step. Note, too, that we have included

Table 7.3 Taylor’s method and Heun’s method applied to the IVP $y' = e^{2t} - y$, $y(0) = 1$ using $h = 0.1$

t_n    Taylor y_n     Heun y_n       Solution y(t_n)   Taylor error |y(t_n) − y_n|   Heun error |y(t_n) − y_n|
0      1              1              1                 0                             0
0.2    1.04229729     1.044572834    1.043095401       0.000798112                   0.001477433
0.4    1.186750654    1.192009094    1.188727007       0.001976353                   0.003282087
0.6    1.468880073    1.478251184    1.472580065       0.003699992                   0.005671119
0.8    1.944339609    1.959569856    1.950563451       0.006223842                   0.009006405
1      2.698337638    2.722082435    2.70827166        0.009934023                   0.013810775

the results of Heun’s method from its application to the same IVP with the same step-size $h = 0.1$.

From table 7.3, we can see that the errors in Heun’s method and the second-order Taylor method are roughly proportionate and seem to grow at the same rate. This suggests that Heun’s method may also be a second-order method, an assertion that may be proved by studying related higher order methods. In particular, Heun’s method can be viewed as one of a collection of algorithms known as Runge–Kutta methods, which we will consider after some additional work with Taylor methods.

Having shown that we can use the Taylor series (7.3.2) to motivate the development of the second-order method (7.3.7), it is natural to wonder if we could extend this work further to a third-order method. This is desirable since if the error in our method is proportional to $h^4$, then the method will be more accurate without having to use smaller values of $h$. It is indeed possible to develop a third-order method, provided that the function $f(t, y)$ from the given IVP is sufficiently differentiable. In particular, in order to write

$$y(t_0 + h) \approx y(t_0) + h y'(t_0) + \frac{h^2}{2} y''(t_0) + \frac{h^3}{3!} y'''(t_0) \tag{7.3.8}$$

we must compute the third derivative of $y$. From our earlier work (7.3.5), we know that

$$y'' = f_t(t, y) + f_y(t, y) f(t, y) \tag{7.3.9}$$


Applying the chain rule to the first term in (7.3.9), along with the fact that $y' = f(t, y)$,

$$\frac{d}{dt}\left[f_t(t, y)\right] = f_{tt}(t, y)\frac{d}{dt}[t] + f_{ty}(t, y)\frac{d}{dt}[y] = f_{tt}(t, y) + f_{ty}(t, y) f(t, y) \tag{7.3.10}$$

where the final step follows from using $y' = f(t, y)$. Using both the product rule and the chain rule on the second term in (7.3.9) and suppressing the “$(t, y)$” argument of each function present,

$$\begin{aligned}
\frac{d}{dt}\left[f_y f\right] &= f_y \frac{d}{dt}[f] + f \frac{d}{dt}\left[f_y\right] \\
&= f_y (f_t + f_y f) + (f_{yt} + f_{yy} f) f \\
&= f_y f_t + f_y^2 f + f_{yt} f + f_{yy} f^2
\end{aligned} \tag{7.3.11}$$

Combining (7.3.10) and (7.3.11) and using the fact that $f_{ty} = f_{yt}$, we have shown that

$$\begin{aligned}
y''' &= f_{tt} + f_{ty} f + f_y f_t + f_y^2 f + f_{yt} f + f_{yy} f^2 \\
&= f_{tt} + 2 f_{ty} f + f_y f_t + f_y^2 f + f_{yy} f^2
\end{aligned} \tag{7.3.12}$$

From (7.3.12), we understand why we normally do not use third-order Taylor methods in practice: the computations are extremely cumbersome. Were we to attempt to write

$$y(t_0 + h) \approx y(t_0) + h y'(t_0) + \frac{h^2}{2} y''(t_0) + \frac{h^3}{3!} y'''(t_0)$$

in terms of the function $f$ from the given IVP, we would have to compute

$$y(t_0 + h) \approx y_0 + h f + \frac{h^2}{2}(f_t + f_y f) + \frac{h^3}{3!}\left(f_{tt} + 2 f_{ty} f + f_y f_t + f_y^2 f + f_{yy} f^2\right)$$

where each appearance of the function $f$ or one of its partial derivatives is also being evaluated at the point $(t_0, y_0)$. This combination of the determination of a large number of functions and the evaluation of each at every stage of an algorithm makes Taylor methods of orders higher than two unreasonable to use. Hence, we next introduce one of the most popular and effective families of numerical methods for the solution of IVPs, known as Runge–Kutta methods, which enable us to achieve higher order approximations without the difficulty of computing multiple partial derivatives and evaluating these functions repeatedly.
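To make the bookkeeping in (7.3.7) and (7.3.12) concrete, the following Python sketch implements one Taylor step that always includes the $h^2$ term and adds the $h^3$ term only when the second partial derivatives are supplied. The function name `taylor_step` and the keyword arguments are assumptions made for illustration; the partial derivatives are those of $f(t, y) = e^{2t} - y$ from example 7.3.1, for which $f_{tt} = 4e^{2t}$ and $f_{ty} = f_{yy} = 0$.

```python
import math

def taylor_step(t, y, h, f, ft, fy, ftt=None, fty=None, fyy=None):
    """One step of the second-order Taylor method (7.3.7); if the second
    partials are supplied, the h^3 term built from (7.3.12) is added."""
    a = f(t, y)
    d2 = ft(t, y) + fy(t, y) * a                       # y'' from (7.3.5)
    step = h * a + h ** 2 / 2.0 * d2
    if ftt is not None:
        d3 = (ftt(t, y) + 2.0 * fty(t, y) * a + fy(t, y) * ft(t, y)
              + fy(t, y) ** 2 * a + fyy(t, y) * a ** 2)  # y''' from (7.3.12)
        step += h ** 3 / 6.0 * d3
    return y + step

# Data for the IVP y' = e^(2t) - y, y(0) = 1
f = lambda t, y: math.exp(2.0 * t) - y
ft = lambda t, y: 2.0 * math.exp(2.0 * t)
fy = lambda t, y: -1.0
ftt = lambda t, y: 4.0 * math.exp(2.0 * t)
fty = lambda t, y: 0.0
fyy = lambda t, y: 0.0

def run(h, steps, third_order=False):
    """Apply the Taylor method repeatedly from (0, 1) and return all y_n."""
    extra = dict(ftt=ftt, fty=fty, fyy=fyy) if third_order else {}
    ys = [1.0]
    for n in range(steps):
        ys.append(taylor_step(n * h, ys[-1], h, f, ft, fy, **extra))
    return ys

if __name__ == "__main__":
    exact1 = (2.0 / 3.0) * math.exp(-1.0) + (1.0 / 3.0) * math.exp(2.0)
    print(abs(exact1 - run(0.1, 10)[-1]))                    # second-order error at t = 1
    print(abs(exact1 - run(0.1, 10, third_order=True)[-1]))  # third-order error at t = 1
```

Even for this simple right-hand side the third-order version needs five partial derivatives, which illustrates why the text moves on to methods that evaluate only $f$ itself.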

7.3.2 Runge–Kutta methods

Where higher order Taylor methods require finding partial derivatives of $y' = f(t, y)$ and evaluating these derivatives at each stage of the algorithm, Runge–Kutta methods seek to avoid using partial derivatives altogether, while


still achieving the desired higher order accuracy. Instead, in Runge–Kutta methods the function $f$ is evaluated at a greater number of points, essentially seeking to compute the slope at the current and future points in an effort to make as accurate a prediction as possible.

Formally, Runge–Kutta methods can be viewed as a generalization of Heun’s method. Recall that in Heun’s method we write

$$y_{n+1} = y_n + \frac{h}{2}(a_n + b_n)$$

where $a_n = f(t_n, y_n)$ and $b_n = f(t_{n+1}, y_n + h a_n)$. Rather than prescribing that we compute or estimate slopes at the points $(t_n, y_n)$ and $(t_{n+1}, y_{n+1})$ and simply average them, a two-stage Runge–Kutta method takes an arbitrary combination of the function values $f(t_n, y_n)$ and $f(t_n + \alpha h, y_n + \beta h f(t_n, y_n))$. Specifically, we set

$$y_{n+1} = y_n + c_1 h f(t_n, y_n) + c_2 h f(t_n + \alpha h,\ y_n + \beta h f(t_n, y_n)) \tag{7.3.13}$$

and then determine conditions on $c_1$, $c_2$, $\alpha$, and $\beta$ that guarantee the approximation generated by (7.3.13) is second-order through a comparison to the Taylor expansion of $y(t_n + h)$. It can be shown that among the infinitely many possible valid choices for $c_1$, $c_2$, $\alpha$, and $\beta$, taking $\alpha = \beta = 1$ and $c_1 = c_2 = 1/2$ results in Heun’s method, which justifies the fact that Heun’s method is second-order.

Heun’s method is an example of a two-stage Runge–Kutta method; two-stage refers to the fact that slopes are evaluated or estimated at two points. It is possible to achieve even higher order Runge–Kutta methods by generalizing the idea in (7.3.13). In particular, we can take arbitrary combinations of the values (or estimated values) of $f(t, y)$ at points in the interval $t_n \le t \le t_{n+1}$ and select the weights so that the approximation agrees with the Taylor series expansion for $y(t_n + h)$ up to, and including, the term involving $h^4$, $h^5$, or whatever accuracy we desire. The details of the rigorous development of such methods are complicated and unenlightening.
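As an informal check on the two-stage family (7.3.13), the sketch below implements the rule with its four parameters and estimates the order of accuracy empirically. The text does not list the conditions on the parameters; the standard second-order conditions, stated here as an assumption from outside the text, are $c_1 + c_2 = 1$ and $c_2 \alpha = c_2 \beta = 1/2$. Heun’s method satisfies them, and so does the choice $c_1 = 0$, $c_2 = 1$, $\alpha = \beta = 1/2$, which reproduces the Modified Euler method (7.2.15). The names `rk2` and `observed_order` are illustrative.

```python
import math

def rk2(f, t0, y0, h, steps, c1, c2, alpha, beta):
    """Two-stage Runge-Kutta rule (7.3.13); returns the final approximation."""
    y = y0
    for n in range(steps):
        t = t0 + n * h
        k1 = f(t, y)                                # slope at (t_n, y_n)
        k2 = f(t + alpha * h, y + beta * h * k1)    # slope at the shifted point
        y = y + h * (c1 * k1 + c2 * k2)
    return y

def f(t, y):
    # Test IVP from example 7.2.2: y' = e^(2t) - y, y(0) = 1
    return math.exp(2.0 * t) - y

def exact(t):
    return (2.0 / 3.0) * math.exp(-t) + (1.0 / 3.0) * math.exp(2.0 * t)

HEUN = dict(c1=0.5, c2=0.5, alpha=1.0, beta=1.0)
MODIFIED_EULER = dict(c1=0.0, c2=1.0, alpha=0.5, beta=0.5)

def observed_order(params):
    """Estimate the order from the errors at t = 1 with h = 0.1 and h = 0.05."""
    e1 = abs(exact(1.0) - rk2(f, 0.0, 1.0, 0.1, 10, **params))
    e2 = abs(exact(1.0) - rk2(f, 0.0, 1.0, 0.05, 20, **params))
    return math.log2(e1 / e2)

if __name__ == "__main__":
    print("Heun:", observed_order(HEUN))
    print("Modified Euler:", observed_order(MODIFIED_EULER))
```

For a second-order method, halving $h$ should cut the accumulated error at $t = 1$ by roughly a factor of four, so both printed values should be close to 2.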
But a more intuitive approach can help us gain a better sense of why the Runge–Kutta method works so well and where the formulas used in the algorithm come from. If we recall our development of Heun’s method and the Modified Euler method, each was linked to the idea of numerically approximating a definite integral. Specifically, Heun’s method is analogous to the trapezoid rule, and the Modified Euler method corresponds to the midpoint rule. The trapezoid rule and midpoint rule both give the exact value of the definite integral of any linear function; in addition, when a function has consistent concavity over an interval, the midpoint rule is roughly twice as accurate as the trapezoid rule and the errors in the midpoint and trapezoid rules have opposite signs. As such, it makes sense to take a weighted average of the two rules in an effort to cancel out the error of each. Computing the weighted average

$$\frac{2 \cdot \text{MID} + \text{TRAP}}{3}$$


results in a new method known as Simpson’s rule that is a remarkably accurate approximation of the definite integral. In fact, it can be shown that Simpson’s rule is exact for every cubic polynomial.

This same increase in accuracy can be accomplished through similar ideas in the numerical approximation of solutions to initial-value problems. Recalling our work with Heun’s method (H) and the Modified Euler method (ME),

$$\text{H:} \quad y_{n+1} = y_n + h\, \frac{f(t_n, y_n) + f(t_{n+1}, y_n + h f(t_n, y_n))}{2} \tag{7.3.14}$$

$$\text{ME:} \quad y_{n+1} = y_n + h f\left(t_n + \frac{h}{2},\ y_n + \frac{h}{2} \cdot f(t_n, y_n)\right) \tag{7.3.15}$$

we note that each uses a different expression for $\Delta y$, the approximate change in $y(t)$ in moving from $t_n$ to $t_{n+1}$. If we let

$$\Delta y_H = \frac{h}{2}\left[f(t_n, y_n) + f(t_{n+1}, y_n + h f(t_n, y_n))\right]$$

and

$$\Delta y_{ME} = h f\left(t_n + \frac{h}{2},\ y_n + \frac{h}{2} \cdot f(t_n, y_n)\right)$$

then the analogy to Simpson’s rule for approximating the solution $y$ to the IVP $y' = f(t, y)$, $y(t_0) = y_0$ is given by

$$y_{n+1} = y_n + \frac{2 \Delta y_{ME} + \Delta y_H}{3} \tag{7.3.16}$$

Using (7.3.14) and (7.3.15) and letting $a_n = f(t_n, y_n)$, we have the approximation rule given by $y_{n+1} = y_n + \Delta y_S$ where

$$\begin{aligned}
\Delta y_S &= \frac{2}{3} h f\left(t_n + \frac{h}{2},\ y_n + \frac{h}{2} a_n\right) + \frac{1}{3} \cdot \frac{h}{2}\left[a_n + f(t_{n+1}, y_n + h a_n)\right] \\
&= \frac{h}{6}\left[a_n + 4 f\left(t_n + \frac{h}{2},\ y_n + \frac{h}{2} a_n\right) + f(t_{n+1}, y_n + h a_n)\right]
\end{aligned} \tag{7.3.17}$$

If we slightly modify this expression for $\Delta y_S$ in recognition of the fact that as we proceed across the interval we have more and more information available (and hence a better approximation of the slope to use), the fourth-order Runge–Kutta rule emerges. In particular, rather than rely on the value $a_n$ at every stage in (7.3.17), we recognize that we are attempting to compute approximate slopes at not just the left endpoint, but also at the midpoint and right endpoint. It makes sense that we should use these approximations as they become available to us; for instance, when we compute the approximate slope at the right endpoint, we ought to use the approximate slope at the midpoint to do so. Furthermore, given that the midpoint slope is weighted at 4 and the others at 1 in the average given by (7.3.17), it is reasonable to invest additional effort ensuring that the midpoint slope is as accurate as possible.
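The quadrature fact that motivates (7.3.16), namely that the weighted average $(2 \cdot \text{MID} + \text{TRAP})/3$ is exact for cubics while the two individual rules err in opposite directions, can be checked on a small example. The sketch below uses exact rational arithmetic on a single interval; the cubic $g(x) = x^3 - 2x + 1$ and the function names are arbitrary choices made for illustration.

```python
from fractions import Fraction

def midpoint(g, a, b):
    # One-interval midpoint rule
    return (b - a) * g((a + b) / 2)

def trapezoid(g, a, b):
    # One-interval trapezoid rule
    return (b - a) * (g(a) + g(b)) / 2

def simpson(g, a, b):
    # Weighted average (2*MID + TRAP)/3 described in the text
    return (2 * midpoint(g, a, b) + trapezoid(g, a, b)) / 3

def g(x):
    # An arbitrary cubic; its integral over [0, 1] is 1/4 - 1 + 1 = 1/4
    return x ** 3 - 2 * x + 1

a, b = Fraction(0), Fraction(1)
exact_value = Fraction(1, 4)

mid_error = exact_value - midpoint(g, a, b)     # 1/4 - 1/8 = 1/8
trap_error = exact_value - trapezoid(g, a, b)   # 1/4 - 1/2 = -1/4
simpson_error = exact_value - simpson(g, a, b)  # 0

if __name__ == "__main__":
    print(mid_error, trap_error, simpson_error)
```

Here the trapezoid error is exactly twice the midpoint error in size and of opposite sign, so the 2-to-1 weighting cancels the two errors completely.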


As in Heun’s method, the computations are easier to understand, track, and implement if we introduce some additional notation. In particular, letting

$$\begin{aligned}
a_n &= f(t_n, y_n) && \text{slope at left endpoint} \\
b_n &= f\left(t_n + \tfrac{1}{2} h,\ y_n + \tfrac{1}{2} h a_n\right) && \text{slope at midpoint} \\
c_n &= f\left(t_n + \tfrac{1}{2} h,\ y_n + \tfrac{1}{2} h b_n\right) && \text{updated slope at midpoint} \\
d_n &= f(t_n + h,\ y_n + h c_n) && \text{slope at right endpoint}
\end{aligned} \tag{7.3.18}$$

we can replace the expression $4 f(t_n + h/2,\ y_n + (h/2) a_n)$ in (7.3.17) with the more accurate estimate $2 b_n + 2 c_n$, and replace $f(t_{n+1}, y_n + h a_n)$ with $f(t_{n+1}, y_n + h c_n)$; each of these updates takes advantage of the most recent calculation of the approximate slope at points nearby. We thus arrive at the fourth-order Runge–Kutta method by setting $y_{n+1} = y_n + \Delta y$ to find

$$y_{n+1} = y_n + \frac{h}{6}(a_n + 2 b_n + 2 c_n + d_n) \tag{7.3.19}$$

where $a_n$, $b_n$, $c_n$, and $d_n$ are defined as in (7.3.18). Again, through a lengthy development involving complicated calculations, it can be established rigorously that (7.3.19) is a fourth-order approximation technique: the resulting truncation error in the approximation is proportional to $h^5$.

The next example demonstrates the remarkable accuracy of the Runge–Kutta method.

Example 7.3.2 Execute ten steps of the fourth-order Runge–Kutta method with $h = 0.1$ to find an approximate solution of the initial-value problem

$$y' = e^{2t} - y, \quad y(0) = 1$$

Compare the results to those of the second-order Taylor method.

Solution. This is the same IVP as we considered in example 7.3.1. Recall that the exact solution to the problem is $y(t) = \frac{2}{3} e^{-t} + \frac{1}{3} e^{2t}$. To implement the Runge–Kutta method, we use $f(t, y) = e^{2t} - y$ and compute $a_n$, $b_n$, $c_n$, and $d_n$ as given by (7.3.18). Using the initial condition $(t_0, y_0) = (0, 1)$, we compute

$$\begin{aligned}
a_0 &= f(t_0, y_0) = f(0, 1) = e^{2 \cdot 0} - 1 = 0 \\
b_0 &= f\left(t_0 + \frac{h}{2},\ y_0 + \frac{h a_0}{2}\right) = f(0.05,\ 1 + 0.05 \cdot 0) = f(0.05, 1) = e^{2 \cdot 0.05} - 1 = 0.105170918 \\
c_0 &= f\left(t_0 + \frac{h}{2},\ y_0 + \frac{h b_0}{2}\right) = f(0.05,\ 1 + 0.05 \cdot 0.105170918) = f(0.05, 1.005258546) = e^{2 \cdot 0.05} - 1.005258546 = 0.099912372 \\
d_0 &= f(t_1,\ y_0 + h c_0) = f(0.1,\ 1 + 0.1 \cdot 0.099912372) = f(0.1, 1.009991237) = e^{2 \cdot 0.1} - 1.009991237 = 0.211411521
\end{aligned}$$


Table 7.4 Fourth-order Runge–Kutta method and second-order Taylor’s method applied to the IVP $y' = e^{2t} - y$, $y(0) = 1$ using $h = 0.1$

t_n    Runge–Kutta (RK) y_n   Solution y(t_n)   RK error |y(t_n) − y_n|   Taylor error |y(t_n) − y_n|
0      1                      1                 0                         0
0.2    1.043096313            1.043095401       0.000000912               0.000798112
0.4    1.188729047            1.188727007       0.000002040               0.001976353
0.6    1.472583611            1.472580065       0.000003546               0.003699992
0.8    1.950569107            1.950563451       0.000005656               0.006223842
1      2.708280362            2.70827166        0.000008701               0.009934023

and therefore

$$y_1 = y_0 + \frac{h}{6}(a_0 + 2 b_0 + 2 c_0 + d_0) = 1 + \frac{0.1}{6}(0 + 0.210341836 + 0.199824744 + 0.211411521) = 1.010359635$$

Implementing these same calculations for subsequent steps, we can generate the output displayed in table 7.4, where again we report the results from every other step. The error from Taylor’s method is reproduced from table 7.3.

In table 7.4 we can see the exceptional accuracy of the fourth-order Runge–Kutta method. In one sense, this is not surprising. Because the method is fourth-order, we expect the error in the first step to be proportional to $h^5 = (0.1)^5 = 0.00001$, which is in contrast to the second-order Taylor’s method with error proportional to $h^3 = 0.001$. In each method, the errors are in fact much smaller; one reason why this is so can be understood by thinking about the coefficient $1/5! = 1/120$ that arises in the Taylor expansion of $y(t_0 + h)$ and multiplies $h^5$.

What can be considered surprising about the Runge–Kutta method is that it generates such significant accuracy through a relatively limited number of computations and by only evaluating the function $f(t, y)$ from the IVP at a select number of points, without the need to compute higher order derivatives. Fundamentally, the method takes four actual or approximate slopes and computes a weighted average of them in order to predict the next value of the solution function $y(t)$. This fourth-order Runge–Kutta method is so accurate that it is used as the standard plotting tool in Maple when using the DEplot command. In addition, if we command Maple to produce a


numerical estimate to the solution of a stated IVP, the standard option in the dsolve command is a slightly more sophisticated algorithm known as the Runge–Kutta–Fehlberg method.

Exercises 7.3

In exercises 1–10, use (a) the second-order Taylor’s method and (b) the fourth-order Runge–Kutta method to estimate $y(1)$ using $h = 0.1$, and compare the approximations generated by the methods. In exercises 1–6, compare the approximations with the exact solution. Each IVP in exercises 1–10 is identical to the corresponding IVP in exercises 1–10 of section 7.2.

1. $y' + 2ty = 0$, $y(0) = -2$
2. $y' = 2y - 1$, $y(0) = 2$
3. $y' - y = 0$, $y(0) = 2$
4. $(y')^2 - 2y = 0$, $y(0) = 2$
5. $y' - y^2 = 1$, $y(0) = 0$
6. $tyy' = -1 - y^2$, $y(0) = 2$
7. $y' + ty = t^2$, $y(0) = 1$
8. $y' + y^2 = t$, $y(0) = 1$

9. y + sin y = 2e −t , √ 10. y = 2e t /2 sin y,

y(0) = 0 y(0) = 0
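For readers who wish to carry out the Runge–Kutta computations of this section on a computer, the four slopes of (7.3.18) and the update (7.3.19) can be sketched in Python as follows. The function names `rk4`, `f`, and `exact` are illustrative assumptions; the IVP is the one from example 7.3.2.

```python
import math

def f(t, y):
    # Right-hand side of the IVP y' = e^(2t) - y from example 7.3.2
    return math.exp(2.0 * t) - y

def exact(t):
    return (2.0 / 3.0) * math.exp(-t) + (1.0 / 3.0) * math.exp(2.0 * t)

def rk4(f, t0, y0, h, steps):
    """Return [y_0, ..., y_steps] from the fourth-order Runge-Kutta rule (7.3.19)."""
    ys, y = [y0], y0
    for n in range(steps):
        t = t0 + n * h
        a = f(t, y)                            # slope at left endpoint
        b = f(t + 0.5 * h, y + 0.5 * h * a)    # slope at midpoint
        c = f(t + 0.5 * h, y + 0.5 * h * b)    # updated slope at midpoint
        d = f(t + h, y + h * c)                # slope at right endpoint
        y = y + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        ys.append(y)
    return ys

if __name__ == "__main__":
    ys = rk4(f, 0.0, 1.0, 0.1, 10)
    for n in range(0, 11, 2):                  # report every other step
        t = 0.1 * n
        print(f"{t:.1f}  {ys[n]:.9f}  {exact(t):.9f}  {abs(exact(t) - ys[n]):.9f}")
```

Only the function `f` needs to change to apply the same routine to another first-order IVP written in the form $y' = f(t, y)$.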

7.4 Methods for systems and higher order equations

In section 6.4, we introduced an extension of Euler’s method for estimating the solution to nonlinear IVPs such as

$$\begin{aligned}
x' &= 9y - y^2, & x(0) &= 1 \\
y' &= x, & y(0) &= 8
\end{aligned} \tag{7.4.1}$$

We again choose to use the notation $\mathbf{x} = [x \ \ y]^T$ rather than $[x_1 \ \ x_2]^T$ because we will be using subscripts to label approximations to the component solutions $x(t)$ and $y(t)$: for instance, $x_1 \approx x(t_1)$, where $t_1 = t_0 + h$. Recalling that $x$ and $y$ are each implicit functions of $t$, we can view (7.4.1) in the form

$$\begin{aligned}
x' &= f(t, x, y), & x(t_0) &= x_0 \\
y' &= g(t, x, y), & y(t_0) &= y_0
\end{aligned} \tag{7.4.2}$$

For a single initial-value problem $y' = f(t, y)$, $y(0) = y_0$, we have developed a variety of methods for estimating the solution, including Euler’s method, Heun’s method, and Runge–Kutta, in order of increasing accuracy. We will generalize each of these methods to the situation for systems, leaving it as an exercise for


the reader to consider other alternatives, such as the Modified Euler method. Throughout, we keep in mind that for a single IVP, every method has the form

$$y_{n+1} = y_n + \Delta y$$

where $\Delta y$ is an estimate that is obtained by taking the step-size $h$ times some approximation of the slope of the solution $y$ at or near $(t_n, y_n)$. Because Euler’s method is the simplest, we begin there.

7.4.1 Euler’s method for systems

Recall that for a single IVP $y' = f(t, y)$, $y(0) = y_0$, Euler’s method is given by the algorithm

$$y_{n+1} = y_n + h f(t_n, y_n) \tag{7.4.3}$$

where $t_{n+1} = t_n + h$, given a step-size $h$. As was shown in section 6.4, to implement Euler’s method for a system of two IVPs in the form (7.4.2), for the step from the approximation $(x_n, y_n)$ to the approximation $(x_{n+1}, y_{n+1})$, we compute

$$\begin{aligned}
x_{n+1} &= x_n + h \cdot f(t_n, x_n, y_n) \\
y_{n+1} &= y_n + h \cdot g(t_n, x_n, y_n)
\end{aligned} \tag{7.4.4}$$

Viewed from a vector perspective, if we let

$$\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix} \quad \text{and} \quad \mathbf{F}(t, \mathbf{x}) = \begin{bmatrix} f(t, x, y) \\ g(t, x, y) \end{bmatrix}$$

it follows that Euler’s method for systems is given by the rule

$$\mathbf{x}^{(n+1)} = \mathbf{x}^{(n)} + h \mathbf{F}(t_n, \mathbf{x}^{(n)}) \tag{7.4.5}$$
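The vector rule (7.4.5) translates almost directly into code. The sketch below stores each approximation as a Python list and, as an assumed test case, uses the linear system $x' = -x + 2y$, $y' = -2x - y$ with $x(0) = 2$, $y(0) = 0$ that is studied in example 7.4.1; the names `euler_system` and `F` are illustrative.

```python
def F(t, x):
    # Right-hand side of the linear system from example 7.4.1:
    # x' = -x + 2y, y' = -2x - y
    return [-x[0] + 2.0 * x[1], -2.0 * x[0] - x[1]]

def euler_system(F, t0, x0, h, steps):
    """Return the list of vectors x^(0), ..., x^(steps) from rule (7.4.5)."""
    xs = [list(x0)]
    for n in range(steps):
        t = t0 + n * h
        x = xs[-1]
        slope = F(t, x)
        # Update every component with the same step-size h
        xs.append([xi + h * si for xi, si in zip(x, slope)])
    return xs

if __name__ == "__main__":
    xs = euler_system(F, 0.0, [2.0, 0.0], 0.1, 10)
    for n in range(0, 11, 2):     # report every other step
        print(f"{0.1 * n:.1f}  {xs[n][0]: .9f}  {xs[n][1]: .9f}")
```

Because `F` returns a list with one slope per component, the same routine applies unchanged to systems of any size.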

We use the superscript notation $\mathbf{x}^{(n)} \approx \mathbf{x}(t_n)$ to denote the approximation since subscripts on vectors often indicate particular entries in the vector. In section 6.4, we saw evidence that Euler’s method is not very effective because of the errors that arise. To demonstrate this further, we consider an example involving a linear system whose solution we know exactly.

Example 7.4.1 Use Euler’s method with $h = 0.1$ to estimate the solution $\mathbf{x}(1)$ to the initial-value problem

$$\mathbf{x}' = \begin{bmatrix} -1 & 2 \\ -2 & -1 \end{bmatrix} \mathbf{x}, \quad \mathbf{x}(0) = \begin{bmatrix} 2 \\ 0 \end{bmatrix}$$

Compare the results to the exact solution.

Solution. Using established methods from chapter 3, it is straightforward to show that the solution to the given IVP is

$$\mathbf{x}(t) = 2e^{-t} \begin{bmatrix} \cos 2t \\ -\sin 2t \end{bmatrix}$$


To estimate this solution via Euler’s method, we first observe that

$$\mathbf{x}' = \mathbf{F}(t, \mathbf{x}) = \begin{bmatrix} -1 & 2 \\ -2 & -1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -x + 2y \\ -2x - y \end{bmatrix}$$

To compute $\mathbf{x}^{(1)} \approx \mathbf{x}(t_1)$, we use (7.4.5) and write

$$\mathbf{x}^{(1)} = \mathbf{x}^{(0)} + h \mathbf{F}(0, \mathbf{x}^{(0)}) = \begin{bmatrix} 2 \\ 0 \end{bmatrix} + 0.1 \begin{bmatrix} -1 & 2 \\ -2 & -1 \end{bmatrix} \begin{bmatrix} 2 \\ 0 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \end{bmatrix} + 0.1 \begin{bmatrix} -2 \\ -4 \end{bmatrix} = \begin{bmatrix} 1.8 \\ -0.4 \end{bmatrix}$$

Continuing Euler’s method in this manner for the subsequent nine steps with $h = 0.1$ to estimate $\mathbf{x}(1)$, we find the results shown in table 7.5, where the values from every other step are reported.

Table 7.5 Euler’s method applied to the IVP in example 7.4.1 using $h = 0.1$ (each vector is listed as its pair of entries)

t_n    Euler’s method x^(n)            Exact solution x(t_n)           Euler error ||x(t_n) − x^(n)||
0      (2, 0)                          (2, 0)                          0.000000000
0.2    (1.54, −0.72)                   (1.508201923, −0.637657545)     0.088268894
0.4    (0.9266, −1.1088)               (0.934032947, −0.961716336)     0.147271358
0.6    (0.314314, −1.187352)           (0.397732304, −1.023027791)     0.184285265
0.8    (−0.18542494, −1.02741408)      (−0.026240382, −0.898274743)    0.204979735
1      (−0.512646273, −0.724355863)    (−0.306183731, −0.669023658)    0.213748529

The final column in table 7.5 merits some discussion. Since our exact solution is a vector function and the approximate solutions are also vectors, the error at each stage is given by the vector $\mathbf{e}^{(n)} = |\mathbf{x}(t_n) - \mathbf{x}^{(n)}|$, where $|\cdot|$ denotes the absolute value function. The size of a vector can be measured by a single number, its length (or magnitude or norm), which is computed by taking the square root of the sum of the squares of its entries. For a vector $\mathbf{x} \in \mathbb{R}^3$, its length is $\|\mathbf{x}\| = \sqrt{x_1^2 + x_2^2 + x_3^2}$, where $x_1$, $x_2$, and $x_3$ are the entries in $\mathbf{x}$. The entries in


the final column in table 7.5 are computed by taking the length of the vector $\mathbf{e}^{(n)}$, which is the difference between the exact solution and the approximate solution at step $n$. For example, the error that is present at the second step is

$$\|\mathbf{e}^{(2)}\| = \left\| \begin{bmatrix} 1.54 \\ -0.72 \end{bmatrix} - \begin{bmatrix} 1.5082 \\ -0.6376 \end{bmatrix} \right\| = \left\| \begin{bmatrix} 0.03180 \\ -0.08234 \end{bmatrix} \right\| = \sqrt{(0.03180)^2 + (-0.08234)^2} = 0.08827$$

which is the second entry in the final column of table 7.5.

Clearly, the errors in Euler’s method are significant. From our earlier work with Heun’s method and the Runge–Kutta method, we expect that we can attain much better approximations by using analogous approaches for systems. We consider Heun’s method next.

7.4.2 Heun’s method for systems

From our most recent work, we know that if we view a system of IVPs from the perspective of vector functions, we are trying to estimate the solution to

$$\mathbf{x}' = \mathbf{F}(t, \mathbf{x}), \quad \mathbf{x}(t_0) = \mathbf{x}_0$$

and that from this point of view, the vector version of Euler’s method is

$$\mathbf{x}^{(n+1)} = \mathbf{x}^{(n)} + h \mathbf{F}(t_n, \mathbf{x}^{(n)})$$

Recalling that Heun’s method for a single differential equation is given by the rule

$$y_{n+1} = y_n + \frac{h}{2}(a_n + b_n) \tag{7.4.6}$$

where $a_n = f(t_n, y_n)$ and $b_n = f(t_{n+1}, y_n + h a_n)$, we realize that the vector analog of (7.4.6) is

$$\mathbf{x}^{(n+1)} = \mathbf{x}^{(n)} + \frac{h}{2}\left(\mathbf{a}^{(n)} + \mathbf{b}^{(n)}\right) \tag{7.4.7}$$

where $\mathbf{a}^{(n)}$ and $\mathbf{b}^{(n)}$ are given by

$$\mathbf{a}^{(n)} = \mathbf{F}(t_n, \mathbf{x}^{(n)}) \quad \text{and} \quad \mathbf{b}^{(n)} = \mathbf{F}(t_{n+1}, \mathbf{x}^{(n)} + h \mathbf{a}^{(n)}) \tag{7.4.8}$$

In order to compare and contrast the vector version of Heun’s method with Euler’s method, we consider the following example, which builds upon example 7.4.1.

Example 7.4.2 Use Heun’s method with $h = 0.1$ to estimate the solution $\mathbf{x}(1)$ to the initial-value problem

$$\mathbf{x}' = \begin{bmatrix} -1 & 2 \\ -2 & -1 \end{bmatrix} \mathbf{x}, \quad \mathbf{x}(0) = \begin{bmatrix} 2 \\ 0 \end{bmatrix}$$

Compare the results to the exact solution and to those from Euler’s method in example 7.4.1.


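Solution. We are considering the IVP

$$\mathbf{x}' = \mathbf{F}(t, \mathbf{x}) = \begin{bmatrix} -1 & 2 \\ -2 & -1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -x + 2y \\ -2x - y \end{bmatrix}, \quad \mathbf{x}(0) = \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \end{bmatrix}$$

To compute $\mathbf{x}^{(1)} \approx \mathbf{x}(0.1)$ by Heun’s method, we first compute

$$\mathbf{a}^{(0)} = \mathbf{F}(t_0, \mathbf{x}^{(0)}) = \begin{bmatrix} -1 & 2 \\ -2 & -1 \end{bmatrix} \mathbf{x}^{(0)} = \begin{bmatrix} -1 & 2 \\ -2 & -1 \end{bmatrix} \begin{bmatrix} 2 \\ 0 \end{bmatrix} = \begin{bmatrix} -2 \\ -4 \end{bmatrix}$$

Next, to determine $\mathbf{b}^{(0)}$ we write

$$\mathbf{b}^{(0)} = \mathbf{F}(t_1, \mathbf{x}^{(0)} + h \mathbf{a}^{(0)}) = \begin{bmatrix} -1 & 2 \\ -2 & -1 \end{bmatrix} \left(\mathbf{x}^{(0)} + h \mathbf{a}^{(0)}\right) = \begin{bmatrix} -1 & 2 \\ -2 & -1 \end{bmatrix} \begin{bmatrix} 2 + 0.1 \cdot (-2) \\ 0 + 0.1 \cdot (-4) \end{bmatrix} = \begin{bmatrix} -2.6 \\ -3.2 \end{bmatrix}$$

Finally, we determine $\mathbf{x}^{(1)} = \mathbf{x}^{(0)} + \frac{h}{2}\left(\mathbf{a}^{(0)} + \mathbf{b}^{(0)}\right)$ to find

$$\mathbf{x}^{(1)} = \begin{bmatrix} 2 \\ 0 \end{bmatrix} + 0.05 \left( \begin{bmatrix} -2 \\ -4 \end{bmatrix} + \begin{bmatrix} -2.6 \\ -3.2 \end{bmatrix} \right) = \begin{bmatrix} 1.77 \\ -0.36 \end{bmatrix}$$

Updating our work and computing the subsequent approximations results in the values for $\mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(10)}$ shown in table 7.6, where we also display the errors computed in table 7.5 for Euler’s method applied to the same IVP.

It is apparent from table 7.6 that just as Heun’s method for a single IVP is a substantial improvement over Euler’s method, it is also better for systems. At the same time, knowing that even higher order methods such as Runge–Kutta are available, we aspire to develop even more accurate methods for systems by converting the Runge–Kutta method for a single DE to one for systems.

7.4.3 Runge–Kutta method for systems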

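Recall that for the single first-order IVP $y' = f(t, y)$, $y(t_0) = y_0$, the fourth-order Runge–Kutta method is given by

$$y_{n+1} = y_n + \frac{h}{6}(a_n + 2 b_n + 2 c_n + d_n) \tag{7.4.9}$$

where

$$\begin{aligned}
a_n &= f(t_n, y_n) \\
b_n &= f\left(t_n + \tfrac{1}{2} h,\ y_n + \tfrac{1}{2} h a_n\right) \\
c_n &= f\left(t_n + \tfrac{1}{2} h,\ y_n + \tfrac{1}{2} h b_n\right) \\
d_n &= f(t_n + h,\ y_n + h c_n)
\end{aligned} \tag{7.4.10}$$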


Table 7.6 Heun’s method applied to the IVP in example 7.4.2 using h = 0.1

| $t_n$ | Heun $\mathbf{x}^{(n)}$ | Solution $\mathbf{x}(t_n)$ | Heun error $\lVert \mathbf{x}(t_n) - \mathbf{x}^{(n)} \rVert$ | Euler error $\lVert \mathbf{x}(t_n) - \mathbf{x}^{(n)} \rVert$ |
|---|---|---|---|---|
| 0 | (2, 0) | (2, 0) | 0 | 0 |
| 0.2 | (1.50165, −0.6372) | (1.508201923, −0.637657545) | 0.006567879 | 0.088268894 |
| 0.4 | (0.924464441, −0.95685138) | (0.934032947, −0.961716336) | 0.010734249 | 0.147271358 |
| 0.6 | (0.389258164, −1.012962308) | (0.397732304, −1.023027791) | 0.013157697 | 0.184285265 |
| 0.8 | (−0.03046503, −0.884575076) | (−0.026240382, −0.898274743) | 0.014336266 | 0.204979735 |
| 1 | (−0.304699526, −0.654454923) | (−0.306183731, −0.669023658) | 0.014644143 | 0.213748529 |

Just as with Euler’s method and Heun’s method, we can develop the vector analog of the Runge–Kutta method. We do so by letting

\[ \mathbf{x}^{(n+1)} = \mathbf{x}^{(n)} + \frac{h}{6}\left( \mathbf{a}^{(n)} + 2\mathbf{b}^{(n)} + 2\mathbf{c}^{(n)} + \mathbf{d}^{(n)} \right) \tag{7.4.11} \]

where

\[ \begin{aligned} \mathbf{a}^{(n)} &= \mathbf{F}\left(t_n,\; \mathbf{x}^{(n)}\right) \\ \mathbf{b}^{(n)} &= \mathbf{F}\left(t_n + \tfrac{1}{2}h,\; \mathbf{x}^{(n)} + \tfrac{1}{2}h\,\mathbf{a}^{(n)}\right) \\ \mathbf{c}^{(n)} &= \mathbf{F}\left(t_n + \tfrac{1}{2}h,\; \mathbf{x}^{(n)} + \tfrac{1}{2}h\,\mathbf{b}^{(n)}\right) \\ \mathbf{d}^{(n)} &= \mathbf{F}\left(t_n + h,\; \mathbf{x}^{(n)} + h\,\mathbf{c}^{(n)}\right) \end{aligned} \tag{7.4.12} \]

The computations for the Runge–Kutta method for systems can be implemented in a way very similar to those for Heun’s method. Doing so and applying the Runge–Kutta method to the IVP stated in examples 7.4.1 and 7.4.2 results in the values shown in table 7.7; we also display the error from Heun’s method by way of contrast.

As with single IVPs, the results of the Runge–Kutta method for systems are impressive. This is again due to the fact that the Runge–Kutta method is fourth-order, while Heun’s method is only second-order. We close this section by recalling the important link between higher order differential equations and systems of first-order equations.


Table 7.7 Runge–Kutta method applied to the IVP in example 7.4.2 using h = 0.1

| $t_n$ | RK $\mathbf{x}^{(n)}$ | Solution $\mathbf{x}(t_n)$ | RK error $\lVert \mathbf{x}(t_n) - \mathbf{x}^{(n)} \rVert$ | Heun error $\lVert \mathbf{x}(t_n) - \mathbf{x}^{(n)} \rVert$ |
|---|---|---|---|---|
| 0 | (2, 0) | (2, 0) | 0 | 0 |
| 0.2 | (1.508211151, −0.637671316) | (1.508201923, −0.637657545) | 0.00001658 | 0.006567879 |
| 0.4 | (0.934038085, −0.96174299) | (0.934032947, −0.961716336) | 0.00002714 | 0.010734249 |
| 0.6 | (0.397725368, −1.023060398) | (0.397732304, −1.023027791) | 0.00003334 | 0.013157697 |
| 0.8 | (−0.026261217, −0.89830458) | (−0.026240382, −0.898274743) | 0.00003639 | 0.014336266 |
| 1 | (−0.306215262, −0.66904348) | (−0.306183731, −0.669023658) | 0.00003724 | 0.014644143 |
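The Runge–Kutta rule (7.4.11)–(7.4.12) can be sketched in the same style as Heun’s method. This is an illustrative sketch only: the names `rk4_step`, `rk4_solve`, `F`, and `exact` are our own, and the closed form $\mathbf{x}(t) = e^{-t}[2\cos 2t \;\; -2\sin 2t]^T$ used for comparison is not derived in this section; it is the standard solution of the linear system in example 7.4.2 and agrees with the “Solution” column of table 7.7.

```python
import math


def rk4_step(F, t, x, h):
    """One fourth-order Runge-Kutta step for x' = F(t, x), per (7.4.11)-(7.4.12)."""
    a = F(t, x)
    b = F(t + 0.5 * h, [xi + 0.5 * h * ai for xi, ai in zip(x, a)])
    c = F(t + 0.5 * h, [xi + 0.5 * h * bi for xi, bi in zip(x, b)])
    d = F(t + h, [xi + h * ci for xi, ci in zip(x, c)])
    return [xi + h / 6.0 * (ai + 2.0 * bi + 2.0 * ci + di)
            for xi, ai, bi, ci, di in zip(x, a, b, c, d)]


def rk4_solve(F, t0, x0, h, n_steps):
    """Apply n_steps Runge-Kutta steps starting from x(t0) = x0."""
    x = list(x0)
    for k in range(n_steps):
        x = rk4_step(F, t0 + k * h, x, h)
    return x


def F(t, x):
    """System from example 7.4.2: x' = -x + 2y, y' = -2x - y."""
    return [-x[0] + 2.0 * x[1], -2.0 * x[0] - x[1]]


def exact(t):
    """Assumed closed-form solution with x(0) = [2, 0]."""
    return [2.0 * math.exp(-t) * math.cos(2.0 * t),
            -2.0 * math.exp(-t) * math.sin(2.0 * t)]


x10 = rk4_solve(F, 0.0, [2.0, 0.0], 0.1, 10)
error = math.hypot(x10[0] - exact(1.0)[0], x10[1] - exact(1.0)[1])
print(x10, error)  # the error should be far smaller than Heun's 0.0146
```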

7.4.4 Methods for higher order IVPs

We have repeatedly used the fact that any linear nth-order differential equation can be converted to a system of linear first-order equations. For example, given a second-order equation such as $y'' + 2y' - 3y = \sin t$, we know that with the substitution $x_1 = y$, $x_2 = y'$, it follows that $\mathbf{x} = [x_1 \;\; x_2]^T$ is a solution to the system of differential equations

\[ \begin{aligned} x_1' &= x_2 \\ x_2' &= 3x_1 - 2x_2 + \sin t \end{aligned} \]

Given our current interest in approximating solutions to initial-value problems, we are particularly focused on nonlinear equations, including

\[ \theta'' + \frac{g}{L} \sin\theta = 0, \quad \theta(0) = a, \quad \theta'(0) = b \]

which governs the motion of a simple undamped pendulum, as developed in section 6.1. In this setting, we are unable to determine an exact solution, and thus wish to generate an approximate one. More generally, we want to be able to develop an approximate solution to any nonlinear IVP. In the second-order case, we can view this problem as having the form

\[ y'' = f(t, y, y'), \quad y(0) = a, \quad y'(0) = b \tag{7.4.13} \]


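We introduce the substitution $z = y'$. Then $z' = y'' = f(t, y, y') = f(t, y, z)$, so that (7.4.13) may be rewritten as the system of IVPs

\[ \begin{aligned} y' &= z, & y(0) &= a \\ z' &= f(t, y, z), & z(0) &= b \end{aligned} \tag{7.4.14} \]

Letting $\mathbf{x} = [y \;\; z]^T$ and $\mathbf{F}(t, \mathbf{x}) = [z \;\; f(t, y, z)]^T$, we may rewrite (7.4.14) in the form

\[ \mathbf{x}' = \mathbf{F}(t, \mathbf{x}), \quad \mathbf{x}(0) = \begin{bmatrix} a \\ b \end{bmatrix} \]

which is precisely the form we considered for Euler’s method, Heun’s method, and the Runge–Kutta method for systems. That is, once we have converted a higher order IVP to a system of first-order IVPs, we may choose from any of our existing approximation methods for systems of DEs. We demonstrate this for a particular example using Heun’s method.

Example 7.4.3 Use Heun’s method to estimate the solution $y(t)$ from $t = 0$ to $t = 1$ to the second-order IVP

\[ y'' + 0.1y' + 4 \sin y = 0, \quad y(0) = 1, \quad y'(0) = 0 \]

with step-size $h = 0.1$.

Solution. We begin by letting $z = y'$, so that $z' = y'' = -4 \sin y - 0.1y' = -4 \sin y - 0.1z$. Writing $\mathbf{x} = [y \;\; z]^T$, it follows that

\[ \mathbf{x}' = \begin{bmatrix} z \\ -4 \sin y - 0.1z \end{bmatrix} = \mathbf{F}(t, \mathbf{x}) \]

Recalling Heun’s method, we must compute

\[ \mathbf{x}^{(n+1)} = \mathbf{x}^{(n)} + \frac{h}{2}\left(\mathbf{a}^{(n)} + \mathbf{b}^{(n)}\right) \]

where

\[ \mathbf{a}^{(n)} = \mathbf{F}(t_n, \mathbf{x}^{(n)}) \quad \text{and} \quad \mathbf{b}^{(n)} = \mathbf{F}(t_{n+1}, \mathbf{x}^{(n)} + h\,\mathbf{a}^{(n)}) \]

With the initial condition $\mathbf{x}^{(0)} = [1 \;\; 0]^T$, we first find that

\[ \mathbf{a}^{(0)} = \begin{bmatrix} 0 \\ -4 \sin(1) - 0.1 \cdot 0 \end{bmatrix} = \begin{bmatrix} 0 \\ -3.366 \end{bmatrix} \]

from which it follows that

\[ \mathbf{b}^{(0)} = \mathbf{F}(0.1, \mathbf{x}^{(0)} + h\,\mathbf{a}^{(0)}) = \begin{bmatrix} -0.3366 \\ -3.332 \end{bmatrix} \]

Therefore, $\mathbf{x}^{(1)}$ is given by

\[ \mathbf{x}^{(1)} = \mathbf{x}^{(0)} + \frac{h}{2}\left(\mathbf{a}^{(0)} + \mathbf{b}^{(0)}\right) = \begin{bmatrix} 1 \\ 0 \end{bmatrix} + \frac{0.1}{2} \left( \begin{bmatrix} 0 \\ -3.366 \end{bmatrix} + \begin{bmatrix} -0.3366 \\ -3.332 \end{bmatrix} \right) = \begin{bmatrix} 0.98317 \\ -0.33490 \end{bmatrix} \]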


Table 7.8 Heun’s method applied to the second-order IVP in example 7.4.3 using h = 0.1

| $n$ | $\mathbf{x}^{(n)}$ | $\mathbf{a}^{(n)}$ | $\mathbf{b}^{(n)}$ | $\mathbf{x}^{(n+1)}$ |
|---|---|---|---|---|
| 0 | (1, 0) | (0, −3.365883939) | (−0.336588394, −3.3322251) | (0.98317058, −0.334905452) |
| 2 | (0.933202302, −0.659006349) | (−0.659006349, −3.148220455) | (−0.973828394, −2.952961961) | (0.851560565, −0.96406547) |
| 4 | (0.740589862, −1.240510452) | (−1.240510452, −2.574842511) | (−1.497994703, −2.163059344) | (0.603664604, −1.477405545) |
| 6 | (0.445309489, −1.663161107) | (−1.663161107, −1.556632719) | (−1.818824379, −0.919669919) | (0.271210214, −1.786976239) |
| 8 | (0.088048126, −1.840715689) | (−1.840715689, −0.167666053) | (−1.857482294, 0.569252015) | (−0.096861773, −1.820636391) |
| 10 | (−0.276080886, −1.728307853) | (−1.728307853, 1.263178986) | (−1.601989955, 1.896140173) | (−0.442595776, −1.570341895) |
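The reduction from $y'' = f(t, y, y')$ to the first-order system (7.4.14), followed by Heun’s method, can be sketched as follows. The helper names `second_order_to_system` and `heun_solve` are our own illustrative choices, not notation from the text; the right-hand side is the one from example 7.4.3.

```python
import math


def second_order_to_system(f):
    """Return F(t, [y, z]) = [z, f(t, y, z)] for the equation y'' = f(t, y, y')."""
    def F(t, x):
        y, z = x
        return [z, f(t, y, z)]
    return F


def heun_solve(F, t0, x0, h, n_steps):
    """Apply n_steps steps of Heun's method for systems starting at x(t0) = x0."""
    x = list(x0)
    for k in range(n_steps):
        t = t0 + k * h
        a = F(t, x)
        b = F(t + h, [xi + h * ai for xi, ai in zip(x, a)])
        x = [xi + 0.5 * h * (ai + bi) for xi, ai, bi in zip(x, a, b)]
    return x


def f(t, y, z):
    """Example 7.4.3 solved for y'': y'' = -4 sin y - 0.1 y'."""
    return -4.0 * math.sin(y) - 0.1 * z


F = second_order_to_system(f)
x1 = heun_solve(F, 0.0, [1.0, 0.0], 0.1, 1)
x10 = heun_solve(F, 0.0, [1.0, 0.0], 0.1, 10)
print(x1)      # first step, compare with the n = 0 row of table 7.8
print(x10[0])  # estimate of y(1), compare with the n = 10 row
```

Only the first component of the final state approximates $y(1)$; the second component approximates $y'(1)$.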

Executing similar computations for the remaining nine steps to approximate $\mathbf{x}(1)$, we find the results shown in table 7.8. From the results of table 7.8, we see that

\[ \mathbf{x}(1) \approx \mathbf{x}^{(10)} = \begin{bmatrix} -0.276080886 \\ -1.728307853 \end{bmatrix} \]

Recalling that $\mathbf{x}(t) = [y(t) \;\; z(t)]^T$ and that our ultimate goal is to estimate the solution $y(t)$ to the stated IVP, it follows that $y(1) \approx -0.2761$.

The approach in example 7.4.3 can be implemented for higher order initial-value problems through a substitution to convert a given higher order equation to a system of first-order ones. More accurate results may be obtained through applying the fourth-order Runge–Kutta method for systems. We note particularly that not only can we estimate solutions to nonlinear equations, but even those with non-constant coefficients. For example, solutions to IVPs like

\[ y'' + ty = 10 \sin 2t, \quad y(0) = y'(0) = 0 \]

can now be approximated.

Exercises 7.4

In exercises 1–6, (a) use Euler’s method for systems with h = 0.1 to estimate the solution x(1) to the initial-value problem, (b) use Heun’s method


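for systems with h = 0.1 to estimate the solution x(1) to the initial-value problem, and (c) if possible, compare the results to the exact solution.

1. $\mathbf{x}' = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \mathbf{x}$, $\mathbf{x}(0) = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$

2. $\mathbf{x}' = \begin{bmatrix} 1 & -1 \\ 3 & 1 \end{bmatrix} \mathbf{x}$, $\mathbf{x}(0) = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$

3. $\mathbf{x}' = \begin{bmatrix} 2 & 1 \\ 2 & -4 \end{bmatrix} \mathbf{x}$, $\mathbf{x}(0) = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$

4. $\mathbf{x}' = \begin{bmatrix} t & -1 \\ 1 & 0 \end{bmatrix} \mathbf{x}$, $\mathbf{x}(0) = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$

5. $\mathbf{x}' = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 1 \\ t \end{bmatrix}$, $\mathbf{x}(0) = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$

6. $\mathbf{x}' = \begin{bmatrix} 1 & -1 \\ t & 0 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, $\mathbf{x}(0) = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$

In exercises 7–13, (a) use Heun’s method and (b) use the Runge–Kutta method to estimate the solution of the system of IVPs at the given t-value using the stated h-value.

7. $x' = y - 2xy$, $x(0) = 0.75$; $y' = 4xy - x$, $y(0) = 0.5$; $t = 1$, $h = 0.1$

8. $x' = 4 - y^2$, $x(0) = -2$; $y' = 1 - x + y$, $y(0) = -1$; $t = 3$, $h = 0.05$

9. $x' = \cos y$, $x(0) = 2$; $y' = 1 - \sin x$, $y(0) = 3$; $t = 1.5$, $h = 0.1$

10. $x' = 2x - y$, $x(0) = 1$; $y' = -4x + 2y$, $y(0) = 1$; $t = 1.5$, $h = 0.1$

11. $x' = e^{-y}$, $x(0) = 0$; $y' = 1/(1 + x^2)$, $y(0) = 0$; $t = 2$, $h = 0.05$

12. $x' = \ln(2 + y)$, $x(0) = -1$; $y' = x^2 + y$, $y(0) = -0.5$; $t = 2$, $h = 0.1$

13. $x' = y - x^2$, $x(0) = 1$; $y' = x - 8y^2$, $y(0) = 0.75$; $t = 1$, $h = 0.05$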

14. Recall from section 6.1 that the nonlinear system of differential equations

\[ \begin{aligned} W' &= -0.75W + 0.25MW \\ M' &= 0.5M - 0.1MW \end{aligned} \]

models the numbers of wolves and moose (each measured in hundreds) in a predator–prey model, where time is measured in years. Assume that at time t = 0 there are 250 moose and 550 wolves present. Estimate the numbers of moose and wolves present at t = 3, 6, and 9 years using a step-size of (a) h = 0.1, and (b) h = 0.05 with both Euler’s method and Heun’s method.

In exercises 15–18, (a) convert the given second-order IVP to a system of first-order IVPs, (b) use Euler’s method for systems with h = 0.1 to estimate the solution y(1) to the initial-value problem, (c) use Heun’s method for systems with h = 0.1 to estimate the solution y(1) to the initial-value problem, and (d) if possible, compare the results to the exact solution.

15. $y'' + 16y = 2t + 1$, $y(0) = y'(0) = 0$

16. $y'' + 16y = 2 \sin 2t$, $y(0) = y'(0) = 0$

17. $y'' + 16y^2 = 2 \sin 2t$, $y(0) = y'(0) = 0$

18. $y'' + 0.2(y')^2 + 2y^2 = 4e^{-t} \sin t$, $y(0) = y'(0) = 0$

7.5 For further study

7.5.1 Predator–prey equations

Recall that a predator–prey scenario is modeled by the equations

\[ \begin{aligned} x' &= 0.6x - 0.3xy, & x(0) &= 2 \\ y' &= -0.9y + 0.6xy, & y(0) &= 3 \end{aligned} \tag{7.5.1} \]

(a) Determine the nontrivial equilibrium solution of (7.5.1) and use a computer algebra system to plot the direction field of the system in a suitable window containing the equilibrium solution and the given initial condition.

(b) Use a computer to implement Heun’s method to estimate the solution (x(t), y(t)) of (7.5.1) on the interval 0 ≤ t ≤ 20 using h = 0.1.

(c) Use your data from (b) to generate two plots: one a parametric plot of the approximate curve (x(t), y(t)) and the other a simultaneous plot of the separate functions x(t) and y(t) on the same coordinate axes. Discuss the behavior of the populations x(t) and y(t) over time.

(d) Modify your calculations in (b) appropriately to investigate the impact of changing the parameter ‘0.3’ in the first equation to each of the values 0.1, 0.2, 0.4, 0.5, and 0.9. In each case, generate the same plots as instructed in (c). What impact does this have on the behavior of the populations?

(e) Modify your calculations in (b) in order to consider the following different initial conditions: x(0) = 1.7, y(0) = 1.8; x(0) = 2.5, y(0) = 3.6; x(0) = 5, y(0) = 1. In each case, generate the same plots as instructed in (c). What impact do the initial conditions have on the behavior of the populations?

7.5.2 Competitive species

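In section 6.5.2, we developed the model

\[ \begin{aligned} x' &= ax\left(1 - \frac{1}{A}x - \frac{\alpha}{a}y\right) \\ y' &= by\left(1 - \frac{1}{B}y - \frac{\beta}{b}x\right) \end{aligned} \tag{7.5.2} \]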

where a, A, and α are positive constants (a is the growth constant of population x(t), A its carrying capacity, and α a parameter that reflects the competition for resources from population y(t)). The constants b, B, and β play the same roles for the second population.

(a) In (7.5.2), let a = 0.5, b = 0.25, A = 5, B = 2, α = 0.04, and β = 0.02. Find all equilibrium points of the system and use a computer algebra system to plot a direction field in a window that contains all the equilibrium solutions.

(b) Apply Heun’s method to estimate the solution (x(t), y(t)) of (7.5.2) on the interval 0 ≤ t ≤ 20 using h = 0.1. Plot the trajectory of the approximate solution.

(c) Leaving all other parameters the same, change the value of B to B = 8. Repeat questions (a) and (b) and discuss the differences between the results for the two B-values.

(d) Repeat question (c) with B = 15.

(e) What is the largest value of B for which the two populations can coexist with a stable equilibrium in which each population tends to a nonzero value as t → ∞? What value(s) of B ensure that population y(t) will dominate as t → ∞ and force x(t) → 0?

(f) For each of the three values of B above, experiment with the impact of the following different sets of initial conditions: x(0) = 1, y(0) = 1; x(0) = 5, y(0) = 1; x(0) = 1, y(0) = 5; x(0) = 5, y(0) = 5. How do the different initial conditions impact the behaviors of the two populations?

7.5.3 The damped pendulum

In section 6.5.1, it was shown that for a pendulum with an arm of length L, bob of mass m, and damping constant c, the angle θ that the arm forms with the vertical axis at time t satisfies the IVP

\[ L\theta'' = -g \sin\theta - c\,\theta', \quad \theta(0) = \theta_0, \quad \theta'(0) = \theta'_0 \tag{7.5.3} \]


(a) Using the change of variables $x = \theta$, $y = x'$, show that the nonlinear second-order IVP (6.5.2) is equivalent to the system

\[ \begin{aligned} x' &= y \\ y' &= -\frac{g}{L} \sin x - \frac{c}{L}\,y \end{aligned} \tag{7.5.4} \]

(b) Apply Heun’s method to estimate the solution (x(t), y(t)) of (7.5.4) with g = 9.8, L = 1, and c = 1 with initial conditions x(0) = 2, y(0) = 2 on the interval 0 ≤ t ≤ 10 using h = 0.1. Plot the trajectory of the approximate solution.

(c) Repeat question (b) using c = 0.1 and c = 5. Discuss the differences in the results.

(d) Investigate the effects of changing the initial conditions to the following: x(0) = 2, y(0) = 5; x(0) = 2, y(0) = 15; x(0) = 2, y(0) = −5. Do so for each of the three c-values noted above and discuss the differences among the results and the physical interpretation that explains how the pendulum is behaving.


8 Series solutions for differential equations

8.1 Motivating problems

In more sophisticated courses in mathematical physics or special functions, a type of linear differential equation frequently arises that differs from those we have studied to date. From several perspectives, we have thoroughly analyzed the behavior of linear differential equations with constant coefficients of the form

\[ y'' + a_1 y' + a_0 y = f(t) \]

But there are other important and well-known equations with non-constant coefficients. We list some of these here in anticipation of more in-depth study in subsequent sections.

Airy’s equation is a linear second-order equation that arises in physics in the study of light refraction. While it can be stated in a slightly more general form, a good example to begin with is

\[ y'' + ty = 0 \tag{8.1.1} \]

The explicit presence of the coefficient “$t$” in (8.1.1) makes this equation substantially different from those (such as $y'' + y = 0$) we have already solved. If we recall the initial approach to solving $y'' + y = 0$, we can gain intuition for how to proceed with (8.1.1). We know that guessing $y = e^{rt}$ in $y'' + y = 0$ leads to the characteristic equation $r^2 + 1 = 0$, so that $y = e^{it}$ or $y = e^{-it}$. We then know from Euler’s formula that both $y = \sin t$ and $y = \cos t$ arise as linearly independent solutions to $y'' + y = 0$. One key characteristic the exponential, sine, and cosine functions have in common is that they can be expressed as infinite power series; indeed, this fact was used to justify the validity of Euler’s formula.


In particular, we can write

\[ e^t = 1 + t + \frac{t^2}{2!} + \frac{t^3}{3!} + \cdots + \frac{t^n}{n!} + \cdots \tag{8.1.2} \]

\[ \sin t = t - \frac{t^3}{3!} + \frac{t^5}{5!} - \cdots + (-1)^n \frac{t^{2n+1}}{(2n+1)!} + \cdots \tag{8.1.3} \]

\[ \cos t = 1 - \frac{t^2}{2!} + \frac{t^4}{4!} - \cdots + (-1)^n \frac{t^{2n}}{(2n)!} + \cdots \tag{8.1.4} \]

Each of these expressions for $e^t$, $\sin t$, and $\cos t$ is of the form $\sum_{n=0}^{\infty} a_n t^n$ and is valid for every real number $t$. In the upcoming chapter, rather than making guesses of the form $y = e^{rt}$, we instead assume much more generally that $y$ is a nice enough function to have a power series expansion of the form $y = \sum_{n=0}^{\infty} a_n t^n$, and then substitute this form of the potential solution function $y$ into the differential equation in order to deduce the coefficients $a_n$.

Other well-known differential equations that we will consider include the Hermite equation

\[ y'' - 2ty' + 2qy = 0 \tag{8.1.5} \]

where $q$ is a constant, the Laguerre equation

\[ ty'' + (1 - t)y' + qy = 0 \tag{8.1.6} \]

(again where $q$ is constant), and the Bessel equation

\[ t^2 y'' + ty' + (t^2 - n^2)y = 0 \tag{8.1.7} \]

where $n$ is a constant. Again, in each of (8.1.5), (8.1.6), and (8.1.7), it is the presence of nonconstant coefficient(s) involving $t$ that makes us seek new ways to find solutions.

Finally, recalling an elementary differential equation from calculus further motivates the importance of infinite series representations of functions. Among the simplest of all first-order differential equations are those of the form $y' = f(t)$; these can be solved (in theory) by integrating. But if we consider an example such as

\[ y' = e^{-t^2} \]

we are immediately stuck since the function $e^{-t^2}$ lacks an elementary antiderivative. If we use (8.1.2) and replace $t$ with $-t^2$, then we can write

\[ y' = e^{-t^2} = 1 - t^2 + \frac{t^4}{2!} - \frac{t^6}{3!} + \cdots + (-1)^n \frac{t^{2n}}{n!} + \cdots \]

Integrating, it follows that

\[ y = C + t - \frac{t^3}{3} + \frac{t^5}{5 \cdot 2!} - \frac{t^7}{7 \cdot 3!} + \cdots + (-1)^n \frac{t^{2n+1}}{(2n+1) \cdot n!} + \cdots \]

Hence we are able to determine the general solution function $y$, although we must be content to leave $y$ in its series representation. Discovering solutions in this power series form will be typical of the results we obtain in our work in this chapter.
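Although the antiderivative of $e^{-t^2}$ has no elementary formula, the series obtained by term-by-term integration is easy to evaluate numerically. The sketch below sums the terms $t - t^3/3 + t^5/(5 \cdot 2!) - t^7/(7 \cdot 3!) + \cdots$ with $C = 0$ and compares the result with $\frac{\sqrt{\pi}}{2}\,\mathrm{erf}(t)$, a standard identity for $\int_0^t e^{-s^2}\,ds$ that is not used in the text and is brought in here only as an independent check. The function name `integral_series` is our own.

```python
import math


def integral_series(t, n_terms):
    """Partial sum of t - t^3/3 + t^5/(5*2!) - t^7/(7*3!) + ... (taking C = 0)."""
    total = 0.0
    for n in range(n_terms):
        # n-th term: alternating sign, odd power, denominator (2n+1) * n!
        total += (-1) ** n * t ** (2 * n + 1) / ((2 * n + 1) * math.factorial(n))
    return total


t = 1.0
approx = integral_series(t, 20)
reference = 0.5 * math.sqrt(math.pi) * math.erf(t)  # closed form via the error function
print(approx, reference)
```

Because the terms shrink factorially, a modest number of terms is enough for moderate values of t.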

8.2 A review of Taylor and power series

From calculus, we know that if a function has a derivative at a given point $t = a$, then the function is approximately linear near $t = a$. Indeed, the existence of the first derivative ensures that the function is smooth: the function must be continuous at $a$ and its graph cannot have a corner there. Of course, if having one derivative is a good thing, having several derivatives is even better. The best possible scenario of all is that the function is infinitely differentiable at $t = a$. That is, $f^{(k)}(a)$ exists for every $k = 0, 1, 2, \ldots$. A function that is infinitely differentiable at $t = a$ and at all points in some small open interval containing $a$ is said to be analytic$^1$ at $t = a$.

If a function fails to be analytic at a given point, we say that $f$ is singular at that point. For example, the rational function

\[ f(t) = \frac{t}{(t^2 + 9)(t - 4)} \]

is singular at $t = 4$ and $t = \pm 3i$ since it is undefined at these values (as are each of its derivatives). At every other value of $t$, $f(t)$ is analytic.

Much of the theory of analytic functions is a natural extension of the ideas of Taylor polynomials and Taylor series from calculus. Here our intention is not to develop a complete theory of analytic functions, but rather to remind the reader of important results on Taylor series and extend this perspective slightly in order to suit our purposes. Most results will be stated without proof.

To begin, we assume that $f$ is an analytic function at $a = 0$ and recall that the polynomial functions

\[ \begin{aligned} P_0(t) &= f(0) \\ P_1(t) &= f(0) + f'(0)t \\ P_2(t) &= f(0) + f'(0)t + \frac{f''(0)}{2!}t^2 \\ &\;\;\vdots \\ P_k(t) &= f(0) + f'(0)t + \frac{f''(0)}{2!}t^2 + \cdots + \frac{f^{(k)}(0)}{k!}t^k \end{aligned} \tag{8.2.1} \]

are called the Taylor polynomials of $f$ at $a = 0$ and form the sequence of partial sums of the infinite series

\[ P(t) = f(0) + f'(0)t + \frac{f''(0)}{2!}t^2 + \cdots + \frac{f^{(k)}(0)}{k!}t^k + \cdots \tag{8.2.2} \]

1 Usually when analytic functions are discussed, we allow the function to have complex inputs and consider a disk of a given radius around a complex point. For our purposes, a discussion restricted to real values is sufﬁcient.


In particular, the function $P_k(t)$ in (8.2.1) is the kth Taylor polynomial of $f$ at $a = 0$, and the infinite series (8.2.2) is called the Taylor series of $f$ centered at $a = 0$; the series converges in (8.2.2) if and only if the sequence of partial sums converges. That is, $P(t)$ is defined if and only if

\[ \lim_{k \to \infty} P_k(t) \]

exists. If this limit fails to exist, we say that the Taylor series diverges at this point. What is perhaps most remarkable is the fact that wherever the series (8.2.2) converges, it does so to the value of the given analytic function $f$; moreover, the Taylor series converges in an interval centered at $t = 0$ that extends to the nearest singular point. Formally, we have the following theorem.

Theorem 8.2.1 Suppose that $f(t)$ is an analytic function at 0 and $R$ is the distance from 0 to the nearest singular point of $f(t)$. Then the Taylor series of $f(t)$ centered at $t = 0$ converges to $f(t)$ in the interval $|t| < R$ and diverges in the interval $|t| > R$.

The number $R$ is called the radius of convergence of the Taylor series. We note, too, that it is possible for singular points to be complex, so $R$ is not necessarily the distance from 0 to the nearest real singular point. We also observe specifically that for any $t$ such that $|t| < R$, we know

\[ f(t) = f(0) + f'(0)t + \frac{f''(0)}{2!}t^2 + \cdots + \frac{f^{(k)}(0)}{k!}t^k + \cdots \]

We consider an example to see many of these ideas at work.

Example 8.2.1 Find the Taylor series of $f(t) = \ln(1 + t)$ centered at $t = 0$ and determine the radius of convergence of the series.

Solution. We begin by taking the first several derivatives of $f$ and evaluating them at 0:

\[ \begin{aligned} f(t) &= \ln(1 + t) & f(0) &= \ln(1) = 0 \\ f'(t) &= (1 + t)^{-1} & f'(0) &= 1 \\ f''(t) &= (-1)(1 + t)^{-2} & f''(0) &= -1 \\ f'''(t) &= (-2)(-1)(1 + t)^{-3} & f'''(0) &= 2! \\ f^{(4)}(t) &= (-3)(-2)(-1)(1 + t)^{-4} & f^{(4)}(0) &= -3! \end{aligned} \]

From these calculations, we see that the fourth Taylor polynomial is

\[ P_4(t) = 0 + 1t - \frac{1}{2!}t^2 + \frac{2!}{3!}t^3 - \frac{3!}{4!}t^4 = t - \frac{1}{2}t^2 + \frac{1}{3}t^3 - \frac{1}{4}t^4 \]


The established pattern implies that the Taylor series of $f(t) = \ln(1 + t)$ is

\[ P(t) = t - \frac{1}{2}t^2 + \frac{1}{3}t^3 - \frac{1}{4}t^4 + \cdots = \sum_{n=1}^{\infty} (-1)^{n+1} \frac{1}{n} t^n \]

From calculus, the standard way to test a power series for convergence is to use the Ratio Test. Doing so here with $a_n = (-1)^{n+1}(1/n)t^n$, we observe that

\[ \lim_{n \to \infty} \left| \frac{a_{n+1}}{a_n} \right| = \lim_{n \to \infty} \left| \frac{(-1)^{n+2} \frac{1}{n+1} t^{n+1}}{(-1)^{n+1} \frac{1}{n} t^n} \right| = \lim_{n \to \infty} \left| -1 \cdot \frac{n}{n+1} \cdot t \right| = |t| \]

The Ratio Test states that a given series converges if $\lim_{n \to \infty} |a_{n+1}/a_n| < 1$. Thus, if $|t| < 1$, it follows that

\[ \ln(1 + t) = t - \frac{1}{2}t^2 + \frac{1}{3}t^3 - \frac{1}{4}t^4 + \cdots = \sum_{n=1}^{\infty} (-1)^{n+1} \frac{1}{n} t^n \tag{8.2.3} \]

converges.

The result of example 8.2.1 makes further sense in light of theorem 8.2.1 since we know that $f(t) = \ln(1 + t)$ has a singularity at $t = -1$. If we substitute $t = -1$ in (8.2.3), the opposite of the harmonic series arises ($-1 - \frac{1}{2} - \frac{1}{3} - \frac{1}{4} - \cdots$), which diverges. However, it can be shown by the alternating series test that (8.2.3) does converge when $t = 1$; indeed, for any power series that converges for $|t| < R$, it is possible for the series to converge at both $t = \pm R$, neither, or just one of the points. While this is an interesting mathematical topic in its own right, it is largely irrelevant in our discussion of series solutions to differential equations.

We next state several prominent Taylor series expansions along with their respective radii of convergence and leave the development and testing of these series for convergence to the exercises at the end of this section.

\[ \begin{aligned} e^t &= 1 + t + \frac{t^2}{2!} + \frac{t^3}{3!} + \cdots + \frac{t^n}{n!} + \cdots & R &= \infty \\ \sin t &= t - \frac{t^3}{3!} + \frac{t^5}{5!} - \cdots + (-1)^n \frac{t^{2n+1}}{(2n+1)!} + \cdots & R &= \infty \\ \cos t &= 1 - \frac{t^2}{2!} + \frac{t^4}{4!} - \cdots + (-1)^n \frac{t^{2n}}{(2n)!} + \cdots & R &= \infty \\ \frac{1}{1 - t} &= 1 + t + t^2 + t^3 + \cdots + t^n + \cdots & R &= 1 \end{aligned} \tag{8.2.4} \]
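The role of the radius of convergence can be seen numerically by computing partial sums of the series (8.2.3) for $\ln(1 + t)$ inside, on the boundary of, and outside the interval $|t| < 1$. This is a small illustrative sketch; the function name `log_series` and the sample points are our own choices.

```python
import math


def log_series(t, n_terms):
    """Partial sum t - t^2/2 + t^3/3 - ... with n_terms terms, as in (8.2.3)."""
    return sum((-1) ** (n + 1) * t ** n / n for n in range(1, n_terms + 1))


# Inside the interval of convergence: rapid agreement with ln(1.5).
inside = log_series(0.5, 60)

# At the endpoint t = 1 the series still converges (to ln 2), but slowly.
boundary = log_series(1.0, 1000)

# Outside the interval (t = 1.5) the partial sums grow without bound.
outside = [abs(log_series(1.5, n)) for n in (10, 20, 40)]

print(inside, math.log(1.5))
print(boundary, math.log(2.0))
print(outside)
```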


From these fundamental Taylor series, the series expansions of other related functions may often be easily found. The following example demonstrates one way in which this may be accomplished.

Example 8.2.2 Find the Taylor series expansion of

\[ f(t) = \frac{t}{1 + 4t^2} \]

as well as its radius of convergence.

Solution. If we first omit the $t$ in the numerator of $f(t)$, we can use the final result from (8.2.4) and substitute $-4t^2$ for $t$, writing

\[ \begin{aligned} \frac{1}{1 - (-4t^2)} &= 1 + (-4t^2) + (-4t^2)^2 + (-4t^2)^3 + \cdots + (-4t^2)^n + \cdots \\ &= 1 - 4t^2 + 16t^4 - 64t^6 + \cdots + (-4)^n t^{2n} + \cdots \end{aligned} \tag{8.2.5} \]

To get the Taylor series of $f(t)$, we now multiply both sides of (8.2.5) by $t$, and have

\[ f(t) = \frac{t}{1 + 4t^2} = t - 4t^3 + 16t^5 - 64t^7 + \cdots + (-4)^n t^{2n+1} + \cdots \tag{8.2.6} \]

Since the original series from (8.2.4) converges for $|t| < 1$ and we replaced $t$ with $-4t^2$, it follows that (8.2.5) converges for $|-4t^2| < 1$, or in other words for $|t| < 1/2$. Multiplying (8.2.5) by $t$ has no effect on the radius of convergence of the series, and therefore (8.2.6) converges for $|t| < 1/2$. Note further that the denominator $1 + 4t^2$ of $f(t)$ is zero at $t = \pm i/2$; each of these complex numbers lies a distance of 1/2 unit away from the origin and is a singular point of $f$. This observation is additional evidence that $R = 1/2$ is the radius of convergence of the series expansion of $f(t)$.

Similar reasoning may be used to find expansions for such functions as $e^{-t^2}$, $t \sin 4t$, and $(\cos t - 1)/t^2$. In each case, the approach of example 8.2.2 is far simpler than using the definition of Taylor series directly and computing derivatives of the given function.

One reason why the development of Taylor series for functions similar to those in (8.2.4) is so straightforward is the fact that Taylor series are unique. Said differently, if we can find a power series expression for a given function, it must be the Taylor series. This is stated formally in the following theorem.

Theorem 8.2.2 The series $\sum_{k=0}^{\infty} b_k t^k$ converges in the interval $|t| < R$ to the function $f(t)$ if and only if $f(t)$ is analytic for all $t$ such that $|t| < R$ and

\[ b_k = \frac{1}{k!} f^{(k)}(0) \]


An immediate consequence of theorem 8.2.2 is that if $\sum_{k=0}^{\infty} b_k t^k = 0$ for $|t| < R$, then $b_k = 0$ for every $k$. We will use this result frequently when we solve differential equations by equating like coefficients of two equal power series.

If we cannot use substitution to find a Taylor series expansion (as we did in example 8.2.2), it may be possible to use differentiation or integration to do so. The following example introduces this approach.

Example 8.2.3 Find the Taylor series expansion and radius of convergence of $f(t) = \arctan t$.

Solution. If we were to attempt to find the series via the definition by taking derivatives, we would find that the process becomes laborious after computing $f'(t) = 1/(1 + t^2)$, since differentiating will involve both the chain and quotient rules. Instead, we observe that

\[ f'(t) = \frac{1}{1 + t^2} \]

itself has a series expansion that is not difficult to find. Similar to our work in example 8.2.2, we use the final result in (8.2.4) and substitute $-t^2$ for $t$ to write

\[ \begin{aligned} f'(t) = \frac{1}{1 - (-t^2)} &= 1 + (-t^2) + (-t^2)^2 + (-t^2)^3 + \cdots + (-t^2)^n + \cdots \\ &= 1 - t^2 + t^4 - t^6 + \cdots + (-1)^n t^{2n} + \cdots \end{aligned} \tag{8.2.7} \]

Because we now have a series expansion for $f'(t)$, it is natural to integrate both sides of (8.2.7) to find the series for $f(t)$. Doing so, we see that

\[ f(t) = \arctan t = C + t - \frac{1}{3}t^3 + \frac{1}{5}t^5 - \frac{1}{7}t^7 + \cdots + \frac{(-1)^n}{2n+1} t^{2n+1} + \cdots \tag{8.2.8} \]

It is a straightforward exercise to use the Ratio Test to show that (8.2.8) converges for all $t$ such that $|t| < 1$. Moreover, since $\arctan(0) = 0$, it follows that $C = 0$.

While intuition guides our work in example 8.2.3, and we certainly know that we can integrate any finite polynomial, the one step that is perhaps questionable is when we say we will integrate both sides of (8.2.7) to find the series for $f(t)$. That this step is legitimate (and that it preserves the radius of convergence) is the conclusion of our next formal result, the Taylor series Differentiation and Integration Theorem.

Theorem 8.2.3 If $f(t)$ has the Taylor series expansion

\[ f(t) = \sum_{k=0}^{\infty} b_k t^k, \quad |t| < R \]

then its antiderivative $F(t) = \int_0^t f(x)\,dx$ and its derivative $f'(t)$ have the respective Taylor series expansions

\[ F(t) = \sum_{k=0}^{\infty} \int_0^t b_k x^k\,dx = \sum_{k=0}^{\infty} \frac{b_k}{k+1} t^{k+1}, \quad |t| < R \tag{8.2.9} \]

\[ f'(t) = \sum_{k=0}^{\infty} b_k \frac{d}{dt}\left[t^k\right] = \sum_{k=1}^{\infty} k b_k t^{k-1}, \quad |t| < R \tag{8.2.10} \]
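Term-wise integration and differentiation act on a power series simply by shifting and rescaling its list of coefficients, which can be sketched in code. In the illustration below the helper names `integrate_termwise`, `differentiate_termwise`, and `evaluate` are our own, exact rational arithmetic from Python’s `fractions` module is used for the coefficients, and the series is truncated at an arbitrarily chosen 60 terms. Starting from the coefficients of $1/(1 + t^2)$ in (8.2.7), integration produces the coefficients of $\arctan t$ in (8.2.8).

```python
import math
from fractions import Fraction


def integrate_termwise(b):
    """Coefficients of the integral from 0 to t of sum b[k] t^k, as in (8.2.9)."""
    return [Fraction(0)] + [Fraction(bk, k + 1) for k, bk in enumerate(b)]


def differentiate_termwise(b):
    """Coefficients of the derivative of sum b[k] t^k, as in (8.2.10)."""
    return [k * bk for k, bk in enumerate(b)][1:]


def evaluate(coeffs, t):
    """Evaluate a truncated power series at t using Horner's rule."""
    total = 0.0
    for c in reversed(coeffs):
        total = total * t + float(c)
    return total


# 1/(1 + t^2) = 1 - t^2 + t^4 - ...: even coefficients alternate in sign, odd ones vanish.
N = 60
b = [(-1) ** (k // 2) if k % 2 == 0 else 0 for k in range(N)]

arctan_coeffs = integrate_termwise(b)
print(arctan_coeffs[:8])             # 0, 1, 0, -1/3, 0, 1/5, 0, -1/7
print(evaluate(arctan_coeffs, 0.5))  # close to math.atan(0.5)
```

Differentiating the integrated coefficients recovers the original list, mirroring the fact that (8.2.9) and (8.2.10) are inverse operations on the coefficients.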

That is, theorem 8.2.3 states that any power series may be differentiated or integrated term-wise and that doing so does not change the radius of convergence of the power series. This fact makes more reasonable our plan to solve differential equations by letting $y$ be an unknown power series, taking its appropriate derivative(s), and substituting into the differential equation to determine the coefficients in the series.

Finally, it is not always possible to determine an explicit expression for the nth coefficient of the Taylor series expansion of a function in terms of $n$. In this situation, we must be content with knowing the values of the first few coefficients. For this type of computation, we sometimes abbreviate the tail end of a power series by writing

\[ O(t^n) = c_n t^n + c_{n+1} t^{n+1} + \cdots \tag{8.2.11} \]

where we read the notation $O(t^n)$ as “order of $t^n$”. For instance, we could write

\[ e^t = 1 + t + \frac{t^2}{2} + O(t^3) \]

The next example emphasizes the fact that we cannot always explicitly determine a formula for the general nth term in the Taylor expansion of a function.

Example 8.2.4 Find the first four terms of the Taylor series expansion about $t = 0$ of the function

\[ f(t) = \frac{t}{e^t + 1} \]

Solution. Because $f$ is the quotient of two functions that are analytic everywhere and the denominator is never zero, it follows that $f$ is analytic everywhere. In particular, $f$ is analytic at $a = 0$ and, therefore, has a Taylor series expansion there of the form

\[ \frac{t}{e^t + 1} = b_0 + b_1 t + b_2 t^2 + b_3 t^3 + \cdots \tag{8.2.12} \]

We know from the standard expansion of $e^t$ that

\[ e^t + 1 = 2 + t + \frac{t^2}{2!} + \frac{t^3}{3!} + \cdots \]

Multiplying both sides of (8.2.12) by this expression for $e^t + 1$, we obtain the identity

\[ t = \left( 2 + t + \frac{t^2}{2!} + \frac{t^3}{3!} + \cdots \right) \left( b_0 + b_1 t + b_2 t^2 + b_3 t^3 + \cdots \right) \]

Distributing to multiply these two series, we find that

\[ t = 2b_0 + (2b_1 + b_0)t + \left( 2b_2 + b_1 + \frac{b_0}{2} \right) t^2 + \left( 2b_3 + b_2 + \frac{b_1}{2} + \frac{b_0}{6} \right) t^3 + \cdots \]

In order for this identity to hold, the uniqueness of Taylor series expansions established in theorem 8.2.2 implies that all of the coefficients of powers of $t$ on the left must equal the corresponding coefficients of powers of $t$ on the right. In particular, it must be the case that

\[ \begin{aligned} 0 &= 2b_0 \\ 1 &= 2b_1 + b_0 \\ 0 &= 2b_2 + b_1 + \tfrac{1}{2}b_0 \\ 0 &= 2b_3 + b_2 + \tfrac{1}{2}b_1 + \tfrac{1}{6}b_0 \end{aligned} \]

From this sequence of equalities, it follows that $b_0 = 0$, $b_1 = 1/2$, $b_2 = -1/4$, and $b_3 = 0$, so that

\[ f(t) = \frac{t}{e^t + 1} = \frac{1}{2}t - \frac{1}{4}t^2 + 0t^3 + \cdots \]

Exercises 8.2

In exercises 1–4, determine the radius of convergence of the stated power series.

1.

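$\displaystyle \sum_{n=1}^{\infty} \frac{t^n}{n}$

2. $\displaystyle \sum_{n=1}^{\infty} \frac{2^n t^n}{n!}$

3. $\displaystyle \sum_{n=1}^{\infty} \frac{n^2 (t - 2)^n}{5^n}$

4. $\displaystyle \sum_{n=1}^{\infty} \frac{(n!)^2 (t + 3)^n}{(2n)!}$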

In exercises 5–17, find the first four nonzero coefficients of the Taylor series expansion for each function $f(t)$ about $a = 0$. In addition, state the radius of convergence of the series expansion. Wherever possible, use known expansions and the techniques of examples 8.2.2, 8.2.3, and 8.2.4.

5. $f(t) = \sqrt{t + 1}$

6. $f(t) = t^3 + 5t^2 - 3t + 8$

7. $f(t) = \dfrac{1}{1 + t^4}$

8. $f(t) = e^{-t^2}$

9. $f(t) = \dfrac{e^{2t} - 1}{2t}$

10. $f(t) = \dfrac{\sin t}{t}$

11. $f(t) = t^3 \sin t^2$

12. $f(t) = \cos t^3$

13. $f(t) = \cos t \sin t$

14. $f(t) = \cos^2(t)$

15. $f(t) = e^{-t} \sin t$

16. $f(t) = \dfrac{e^t}{1 + t}$

17. $f(t) = \arctan t^2$

In exercises 18–24, find the first four nonzero coefficients of the Taylor series expansion for each integral by first finding the expansion of the integrand and then integrating term by term.$^2$

18. $\displaystyle \int_0^t \frac{1}{1 + s^4}\,ds$

19. $\displaystyle \int_0^t e^{-s^2}\,ds$

20. $\displaystyle \int_0^t \frac{e^{2s} - 1}{2s}\,ds$

21. $\displaystyle \int_0^t \frac{\sin s}{s}\,ds$

22. $\displaystyle \int_0^t s^3 \sin s^2\,ds$

23. $\displaystyle \int_0^t \cos s^3\,ds$

24. $\displaystyle \int_0^t \arctan s^2\,ds$

2 Your work in exercises 5–17 will be helpful.

8.3 Power series solutions of linear equations

In this section, we begin solving linear differential equations by assuming that the solution function may be expressed as a power series. To motivate our work, we revisit a familiar first-order equation (which we can solve easily by other means) to explore how series can be used in this way.

Example 8.3.1 By assuming that $y$ has a power series expansion of the form $y(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + \cdots$, determine the solution to the initial-value problem

$$y' = y, \quad y(0) = 1$$

Solution. Writing $y(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + \cdots$, we know

$$y'(t) = a_1 + 2a_2 t + 3a_3 t^2 + 4a_4 t^3 + \cdots$$

Equating $y$ and $y'$, we observe that

$$a_0 + a_1 t + a_2 t^2 + a_3 t^3 + \cdots = a_1 + 2a_2 t + 3a_3 t^2 + 4a_4 t^3 + \cdots \tag{8.3.1}$$

Because of the uniqueness of Taylor series expansions (theorem 8.2.2), we may equate like coefficients of powers of $t$ in (8.3.1), from which we deduce that the following recurrence relation among the coefficients $a_i$ must hold:

$$a_0 = a_1, \quad a_1 = 2a_2, \quad a_2 = 3a_3, \quad \ldots, \quad a_n = (n + 1)a_{n+1}$$

Provided that we know $a_0$, we can find all of the remaining values of $a_i$. Clearly, $a_0 = y(0)$, so using the initial condition $y(0) = 1$,

$$a_0 = 1, \quad a_1 = 1, \quad a_2 = \frac{1}{2}, \quad a_3 = \frac{1}{3} a_2 = \frac{1}{3 \cdot 2}, \quad \ldots$$

From this sequence of coefficients and the general recurrence relation $a_{n+1} = \frac{1}{n+1} a_n$, we observe that $a_n = \frac{1}{n!}$, and therefore

$$y(t) = 1 + t + \frac{1}{2!} t^2 + \frac{1}{3!} t^3 + \cdots + \frac{1}{n!} t^n + \cdots$$

which we recognize as the familiar power series expansion of $y(t) = e^t$, the solution to the IVP $y' = y$, $y(0) = 1$.
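The recurrence in example 8.3.1 is easy to check by machine. The following Python sketch (the function name `exp_series_coefficients` is our own, introduced only for illustration) generates the coefficients from $a_{n+1} = a_n/(n+1)$ in exact rational arithmetic and compares them with $1/n!$.

```python
from fractions import Fraction
from math import factorial


def exp_series_coefficients(a0, count):
    """Return [a_0, ..., a_{count-1}] generated by a_{n+1} = a_n / (n + 1)."""
    coeffs = [Fraction(a0)]
    for n in range(count - 1):
        coeffs.append(coeffs[n] / (n + 1))
    return coeffs


coeffs = exp_series_coefficients(1, 8)
# Each coefficient should equal 1/n!, the Taylor coefficient of e^t.
matches = all(c == Fraction(1, factorial(n)) for n, c in enumerate(coeffs))
print(coeffs[:5], matches)
```

Changing the first argument corresponds to a different initial value $y(0) = a_0$, which simply scales every coefficient.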


Obviously there is no need to use power series to solve the IVP given in example 8.3.1, as it is a standard linear first-order equation. However, given our desire to solve higher order equations that are linear, but for which we currently lack a method for obtaining an analytic solution, this example is important since we hope to generalize from the simpler first-order constant coefficient case to the more difficult second-order non-constant coefficient one. For example, a linear second-order differential equation such as

$$y'' - 2ty' + y = 0 \tag{8.3.2}$$

in which the coefficients of $y$, $y'$, and $y''$ are not all constant is not among the collection of equations whose solutions we can currently determine. Equations such as (8.3.2) belong to a family of equations of the general form

$$y'' + p(t)y' + q(t)y = f(t) \tag{8.3.3}$$

that we now aspire to solve. Before we solve equations of form (8.3.3), we consider one more familiar example that introduces other critical ideas that arise when solving linear second-order equations through power series expansions. Because we already know the solution to the equation we consider, we will be able to check our work appropriately and better see the role that series expansions play.

Example 8.3.2 Solve the initial-value problem

$$y'' + y = 0, \quad y(0) = 1, \quad y'(0) = 1$$

by assuming that $y$ has a power series expansion $y(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4 + \cdots$.

Solution.

Since $y = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4 + \cdots$, it follows that

$$y' = a_1 + 2a_2 t + 3a_3 t^2 + 4a_4 t^3 + \cdots$$

and

$$y'' = 2a_2 + 3 \cdot 2\, a_3 t + 4 \cdot 3\, a_4 t^2 + 5 \cdot 4\, a_5 t^3 + \cdots$$

Substituting for $y$ and $y''$ in the given equation $y'' + y = 0$, we have

$$(a_0 + a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4 + \cdots) + (2a_2 + 6a_3 t + 12a_4 t^2 + 20a_5 t^3 + \cdots) = 0$$

Gathering terms with like powers of $t$,

$$(a_0 + 2a_2) + (a_1 + 6a_3)t + (a_2 + 12a_4)t^2 + (a_3 + 20a_5)t^3 + \cdots = 0 \tag{8.3.4}$$

Setting each coefficient of powers of $t$ in (8.3.4) equal to zero implies that the following sequence of equalities holds:

$$\begin{array}{ll}
a_0 = -2a_2 & a_1 = -6a_3 \\
a_2 = -12a_4 & a_3 = -20a_5 \\
a_4 = -30a_6 & a_5 = -42a_7 \\
\quad\vdots & \quad\vdots \\
a_{2n} = -(2n + 2)(2n + 1)a_{2n+2} & a_{2n+1} = -(2n + 3)(2n + 2)a_{2n+3}
\end{array}$$


We group these equations into the two columns shown for the natural reason that the coefficients with even indices depend recursively on one another, as do the coefficients with odd indices. Furthermore, we see that if we can identify both $a_0$ and $a_1$ (which we can through the two stated initial conditions), then we can determine all of the remaining coefficients. Specifically, since $y(0) = 1$ and $a_0 = y(0)$, it follows that $a_0 = 1$. Similarly, with the given condition $y'(0) = 1$ and the fact that $a_1 = y'(0)$, we know $a_1 = 1$. Thus, from the sequence of equalities with even indices above,

$$a_0 = 1, \quad a_2 = -\frac{1}{2}, \quad a_4 = -\frac{1}{4 \cdot 3} a_2 = \frac{1}{4 \cdot 3 \cdot 2} = \frac{1}{4!}, \quad \text{and} \quad a_6 = -\frac{1}{30} a_4 = -\frac{1}{6 \cdot 5 \cdot 4!} = -\frac{1}{6!}$$

From this and the stated recurrence relation for $a_{2n}$ and $a_{2n+2}$, we observe that

$$a_{2n} = (-1)^n \frac{1}{(2n)!}, \quad n = 0, 1, 2, \ldots \tag{8.3.5}$$

The formula (8.3.5) implies that the portion of the series expansion for $y$ in which all of the powers of $t$ are even will be

$$y_1 = 1 - \frac{1}{2!} t^2 + \frac{1}{4!} t^4 - \frac{1}{6!} t^6 + \cdots \tag{8.3.6}$$

which we recognize as the familiar series expansion for $\cos t$. Returning to the recurrence relation involving the coefficients with odd indices, nearly identical work to that with the even coefficients shows that

$$a_1 = 1, \quad a_3 = -\frac{1}{3!}, \quad a_5 = -\frac{1}{5 \cdot 4} a_3 = \frac{1}{5!}, \quad \text{and} \quad a_7 = -\frac{1}{42} a_5 = -\frac{1}{7!}$$

These observations imply that the part of the expansion of $y$ involving odd powers of $t$ has form

$$y_2 = t - \frac{1}{3!} t^3 + \frac{1}{5!} t^5 - \frac{1}{7!} t^7 + \cdots \tag{8.3.7}$$

which is $\sin t$. Hence our work with series expansions at (8.3.6) and (8.3.7) has shown that

$$\begin{aligned}
y &= 1 + t - \frac{1}{2!} t^2 - \frac{1}{3!} t^3 + \frac{1}{4!} t^4 + \frac{1}{5!} t^5 - \frac{1}{6!} t^6 - \frac{1}{7!} t^7 + \cdots \\
&= \left(1 - \frac{1}{2!} t^2 + \frac{1}{4!} t^4 - \frac{1}{6!} t^6 + \cdots\right) + \left(t - \frac{1}{3!} t^3 + \frac{1}{5!} t^5 - \frac{1}{7!} t^7 + \cdots\right) \\
&= \cos t + \sin t
\end{aligned} \tag{8.3.8}$$
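As a sanity check on example 8.3.2, the two-step recurrence $a_{n+2} = -a_n/((n+2)(n+1))$ can be iterated directly. The sketch below is only illustrative (the helper names `second_order_coefficients` and `partial_sum` are ours); it builds the first 20 coefficients from $a_0 = a_1 = 1$ and compares a partial sum with $\cos t + \sin t$ at a sample point.

```python
from fractions import Fraction
from math import cos, sin


def second_order_coefficients(a0, a1, count):
    """Coefficients of y'' + y = 0 from a_{n+2} = -a_n / ((n + 2)(n + 1))."""
    coeffs = [Fraction(a0), Fraction(a1)]
    for n in range(count - 2):
        coeffs.append(-coeffs[n] / ((n + 2) * (n + 1)))
    return coeffs


def partial_sum(coeffs, t):
    """Evaluate the truncated power series, the sum of a_n t^n."""
    return sum(float(c) * t**n for n, c in enumerate(coeffs))


coeffs = second_order_coefficients(1, 1, 20)
t = 0.7
# The two printed values should agree to many decimal places.
print(partial_sum(coeffs, t), cos(t) + sin(t))
```

Note that the even-indexed and odd-indexed coefficients never interact in the loop, which mirrors the two-column grouping used in the example.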

Again, it is no surprise that $y = \cos t + \sin t$ is the solution to the IVP $y'' + y = 0$, $y(0) = 1$, $y'(0) = 1$. We know from our work in several different contexts that the general solution to this differential equation is $y = c_1 \cos t + c_2 \sin t$, and can easily see that the given two initial conditions lead to $c_1 = c_2 = 1$. Even without the initial conditions, we could have determined from our work in example 8.3.2 that $y = a_0 \cos t + a_1 \sin t$. Regardless, there is a great deal we can learn about series solutions to differential equations by thinking carefully about our work in this familiar example.

First, we saw that in order to get the recurrence relations started, we needed to know the values of $a_0$ and $a_1$. This reinforces the fact that the solution space to the second-order equation is two dimensional, and suggests that the power series expansion has the property that it detects the need for two linearly independent solutions. Next, we observe from our work in example 8.3.2 that two different unlinked series solutions arose in the solution; these turned out to be the expansions for the cosine and sine functions, respectively, each of which has an infinite radius of convergence. This led to the overall solution series being convergent for every value of $t$. Finally, we note that normally we will need to be content with expressions that state the first few nonzero terms of a power series expansion, as we cannot expect in general to be able to recognize familiar power series expansions within solutions, as we did at (8.3.8).

In general, we will be interested in linear differential equations of the form

$$y'' + p(t)y' + q(t)y = 0 \tag{8.3.9}$$

If $p(t)$ and $q(t)$ are both analytic functions at $t = a$ (that is, both have a Taylor expansion at $a$), then we call $t = a$ an ordinary point of the DE (8.3.9). Otherwise, $t = a$ is a singular point of (8.3.9). The following theorem tells us that if $t = 0$ is an ordinary point of (8.3.9), then there exist two linearly independent solutions to the DE that may be represented by Taylor series centered at $t = 0$.

Theorem 8.3.1 If $t = 0$ is an ordinary point of (8.3.9), then there exist two linearly independent solutions

$$y_1(t) = \sum_{n=0}^{\infty} a_n t^n \quad \text{and} \quad y_2(t) = \sum_{n=0}^{\infty} b_n t^n \tag{8.3.10}$$

Both series converge in a disk $|t| < R$, where $R$ is at least as large as the distance from the origin to the nearest singular point of the functions $p(t)$ and $q(t)$.

In example 8.3.2, the coefficient functions of $y'$ and $y$ in the DE were simply the constant functions 0 and 1, which are each analytic everywhere. Theorem 8.3.1 implies that the two series expansions we found (which were those of the cosine and sine functions) must therefore converge everywhere. We see from this result that anytime the coefficient functions $p(t)$ and $q(t)$ are constant, the series solutions that arise must converge everywhere. This is not surprising, given our experience that in the case of linear differential equations with constant coefficients, solutions essentially consist of the functions $e^{kt}$, $\sin kt$, and $\cos kt$. More generally, we can now state that if $p(t)$ and $q(t)$ are polynomial functions, which are also analytic everywhere, then the series in (8.3.10) must both converge everywhere.

We now consider an example involving a differential equation that we are unable to solve by other means in order to gain more understanding of the role played by infinite series in its solution.


Example 8.3.3 Consider the linear second-order differential equation

$$y'' - 2ty' + y = 0 \tag{8.3.11}$$

Determine two linearly independent series solutions to this equation. Then, solve the initial-value problem given by this DE along with the initial conditions $y(0) = 2$, $y'(0) = -1$.

Solution. We begin by assuming that $y = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + \cdots$. From this, it follows

$$y' = a_1 + 2a_2 t + 3a_3 t^2 + 4a_4 t^3 + \cdots = \sum_{n=1}^{\infty} n a_n t^{n-1}$$

$$-2ty' = -2a_1 t - 4a_2 t^2 - 6a_3 t^3 - 8a_4 t^4 - \cdots = -\sum_{n=1}^{\infty} 2n a_n t^n$$

$$y'' = 2a_2 + 6a_3 t + 12a_4 t^2 + 20a_5 t^3 + \cdots = \sum_{n=2}^{\infty} n(n - 1) a_n t^{n-2}$$

In many instances, it will be most convenient to work with power series represented in the shorthand sigma ($\Sigma$) notation, which is how we will proceed from here. Substituting in (8.3.11) with the series expressions for $y''$, $-2ty'$, and $y$, we find

$$\sum_{n=2}^{\infty} n(n - 1) a_n t^{n-2} - \sum_{n=1}^{\infty} 2n a_n t^n + \sum_{n=0}^{\infty} a_n t^n = 0 \tag{8.3.12}$$

In order to equate the coefficients of like powers of $t$, it is helpful to write each series in (8.3.12) using the same indices for the sum. Replacing $n$ with $n + 2$ allows us to write

$$\sum_{n=2}^{\infty} n(n - 1) a_n t^{n-2} = \sum_{n=0}^{\infty} (n + 2)(n + 1) a_{n+2} t^n$$

In addition, observe that

$$\sum_{n=1}^{\infty} 2n a_n t^n = \sum_{n=0}^{\infty} 2n a_n t^n$$

because the term $2n a_n t^n$ vanishes when $n = 0$. Therefore we can revise (8.3.12) to have the form

$$\sum_{n=0}^{\infty} (n + 2)(n + 1) a_{n+2} t^n + \sum_{n=0}^{\infty} -2n a_n t^n + \sum_{n=0}^{\infty} a_n t^n = 0 \tag{8.3.13}$$

Now that each series is indexed from $n = 0$ with corresponding powers of $t$, we can combine the three sums into one and write

$$\sum_{n=0}^{\infty} \left[(n + 2)(n + 1) a_{n+2} - 2n a_n + a_n\right] t^n = 0 \tag{8.3.14}$$


Because (8.3.14) implies that every coefficient of the series must be zero, we see that the constants $a_n$ must satisfy the recurrence relation $(n + 2)(n + 1) a_{n+2} - 2n a_n + a_n = 0$, or equivalently

$$a_{n+2} = \frac{2n - 1}{(n + 2)(n + 1)} a_n, \quad n = 0, 1, 2, \ldots \tag{8.3.15}$$

Here it is essential to observe that since the subscripts differ by two in (8.3.15), we can obtain two distinct series solutions to the original equation (8.3.11), one involving all of the even terms and the other all of the odd ones. In particular, considering $n = 0, 2, 4, \ldots$, we have from (8.3.15) that

$$a_2 = -\frac{1}{2} a_0, \quad a_4 = \frac{3}{3 \cdot 4} a_2 = \frac{-1 \cdot 3}{2 \cdot 3 \cdot 4} a_0, \quad \text{and} \quad a_6 = \frac{7}{6 \cdot 5} a_4 = \frac{-1 \cdot 3 \cdot 7}{2 \cdot 3 \cdot 4 \cdot 5 \cdot 6} a_0$$

More generally, the pattern

$$a_{2n} = \frac{-1 \cdot 3 \cdot 7 \cdots (4n - 5)}{(2n)!} a_0$$

holds and therefore

$$\begin{aligned}
y_1(t) &= a_0 - \frac{1}{2} a_0 t^2 - \frac{1}{8} a_0 t^4 - \frac{7}{240} a_0 t^6 - \cdots \\
&= a_0 - a_0 \sum_{n=1}^{\infty} \frac{1 \cdot 3 \cdot 7 \cdots (4n - 5)}{(2n)!} t^{2n}
\end{aligned} \tag{8.3.16}$$

where the product in the numerator is understood to be 1 when $n = 1$. Similarly, if we examine the odd terms for $n = 1, 3, 5, \ldots$ in (8.3.15), we see

$$a_3 = \frac{1}{2 \cdot 3} a_1, \quad a_5 = \frac{5}{4 \cdot 5} a_3 = \frac{1 \cdot 5}{2 \cdot 3 \cdot 4 \cdot 5} a_1, \quad \text{and} \quad a_7 = \frac{9}{6 \cdot 7} a_5 = \frac{1 \cdot 5 \cdot 9}{2 \cdot 3 \cdot 4 \cdot 5 \cdot 6 \cdot 7} a_1$$

Thus, we find

$$a_{2n+1} = \frac{1 \cdot 5 \cdot 9 \cdots (4n - 3)}{(2n + 1)!} a_1$$

and therefore

$$\begin{aligned}
y_2(t) &= a_1 t + \frac{1}{6} a_1 t^3 + \frac{1}{24} a_1 t^5 + \cdots \\
&= a_1 t + a_1 \sum_{n=1}^{\infty} \frac{1 \cdot 5 \cdot 9 \cdots (4n - 3)}{(2n + 1)!} t^{2n+1}
\end{aligned} \tag{8.3.17}$$

Because $y_1$ only involves even powers of $t$ and $y_2$ only involves odd powers of $t$, it is obvious that $y_1$ and $y_2$ must be linearly independent functions: it is impossible for one to be a scalar multiple of the other. Hence we have found the two basic solutions to the given DE and, with $a_0$ and $a_1$ arbitrary, the general solution is

$$y = y_1 + y_2 = a_0 \left[1 - \sum_{n=1}^{\infty} \frac{1 \cdot 3 \cdot 7 \cdots (4n - 5)}{(2n)!} t^{2n}\right] + a_1 \left[t + \sum_{n=1}^{\infty} \frac{1 \cdot 5 \cdot 9 \cdots (4n - 3)}{(2n + 1)!} t^{2n+1}\right]$$


Moreover, since $p(t) = -2t$ and $q(t) = 1$ are analytic everywhere, it follows from theorem 8.3.1 that both $y_1$ and $y_2$ converge for all values of $t$, as must the general solution. Finally, if we desire to solve the initial-value problem with $y(0) = 2$ and $y'(0) = -1$, we need only observe from our beginning assumption regarding the series expansion of $y$ that $y(0) = a_0 = 2$ and $y'(0) = a_1 = -1$. Therefore, the solution to the IVP is

$$y = 2 \left[1 - \sum_{n=1}^{\infty} \frac{1 \cdot 3 \cdot 7 \cdots (4n - 5)}{(2n)!} t^{2n}\right] - \left[t + \sum_{n=1}^{\infty} \frac{1 \cdot 5 \cdot 9 \cdots (4n - 3)}{(2n + 1)!} t^{2n+1}\right]$$
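When a closed form for the coefficients is awkward, the recurrence itself is usually the most practical description of the solution. The following sketch (function names are our own, chosen for illustration) iterates (8.3.15) from $a_0 = 2$, $a_1 = -1$ and confirms that every coefficient of $t^n$ in $y'' - 2ty' + y$, as displayed in (8.3.14), is zero for the terms computed.

```python
from fractions import Fraction


def series_coefficients(a0, a1, count):
    """Coefficients for y'' - 2ty' + y = 0 from recurrence (8.3.15)."""
    coeffs = [Fraction(a0), Fraction(a1)]
    for n in range(count - 2):
        coeffs.append(Fraction(2 * n - 1, (n + 2) * (n + 1)) * coeffs[n])
    return coeffs


def residual_coefficients(coeffs):
    """Coefficient of t^n in y'' - 2ty' + y, for each n the list determines."""
    return [
        (n + 2) * (n + 1) * coeffs[n + 2] - 2 * n * coeffs[n] + coeffs[n]
        for n in range(len(coeffs) - 2)
    ]


coeffs = series_coefficients(2, -1, 10)
print(coeffs[:8])
print(all(r == 0 for r in residual_coefficients(coeffs)))
```

The first eight coefficients agree with doubling the series in (8.3.16) and negating the series in (8.3.17), each taken with unit leading coefficient.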

In the recurrence relation that arises from assuming that $y = a_0 + a_1 t + a_2 t^2 + \cdots$, it is not always obvious that two linearly independent solutions to the original linear second-order equation arise. Often, we must content ourselves with finding the first several terms of the overall general solution and rely on theorem 8.3.1 to tell us that both have been found. We close this section with an example that demonstrates this fact through connections to earlier material we have studied.

Example 8.3.4 Use infinite series to determine the solution to the initial-value problem

$$y'' - 2y' - 3y = 0, \quad y(0) = 4, \quad y'(0) = 0 \tag{8.3.18}$$

Compare your result to the known solution to this IVP, which can be found without using series.

Solution. Considering the series expansions for $y$, $y'$, and $y''$, we observe that

$$\begin{aligned}
y &= a_0 + a_1 t + a_2 t^2 + a_3 t^3 + \cdots + a_n t^n + \cdots \\
y' &= a_1 + 2a_2 t + 3a_3 t^2 + 4a_4 t^3 + \cdots + (n + 1) a_{n+1} t^n + \cdots \\
y'' &= 2a_2 + 6a_3 t + 12a_4 t^2 + 20a_5 t^3 + \cdots + (n + 2)(n + 1) a_{n+2} t^n + \cdots
\end{aligned}$$

From the differential equation $y'' - 2y' - 3y = 0$, we know that $y'' = 2y' + 3y$. Equating like coefficients from the expressions for $y''$ and $2y' + 3y$, we find the recurrence relation

$$\begin{aligned}
2a_2 &= 2a_1 + 3a_0 \\
6a_3 &= 4a_2 + 3a_1 \\
12a_4 &= 6a_3 + 3a_2 \\
20a_5 &= 8a_4 + 3a_3 \\
&\;\;\vdots
\end{aligned}$$


More generally, we can state that for any $n \geq 2$,

$$a_n = \frac{(2n - 2) a_{n-1} + 3a_{n-2}}{n(n - 1)}$$

Using the given initial conditions, we find that $a_0 = y(0) = 4$ and $a_1 = y'(0) = 0$, and subsequently that

$$a_2 = \frac{2a_1 + 3a_0}{2} = \frac{0 + 12}{2} = 6$$

$$a_3 = \frac{4a_2 + 3a_1}{6} = \frac{24 + 0}{6} = 4$$

$$a_4 = \frac{6a_3 + 3a_2}{12} = \frac{24 + 18}{12} = \frac{7}{2}$$

and therefore the solution to the IVP is

$$y = 4 + 6t^2 + 4t^3 + \frac{7}{2} t^4 + \cdots$$

We can confirm that this is in fact the correct solution by solving the IVP through another approach and considering power series expansions of the basic solution functions. In particular, since the characteristic equation of (8.3.18) is $r^2 - 2r - 3 = 0$ with roots $r = 3$ and $r = -1$, the general solution of the DE is

$$y = c_1 e^{3t} + c_2 e^{-t}$$

It is a standard exercise to show that the values of the constants that satisfy the initial conditions are $c_1 = 1$ and $c_2 = 3$, so that

$$y = e^{3t} + 3e^{-t}$$

If we now employ the standard power series expansion for $e^t$ to write series expansions for the two solutions present in $y$, and then combine like terms, we observe that

$$\begin{aligned}
y &= e^{3t} + 3e^{-t} \\
&= \left(1 + 3t + \frac{9t^2}{2!} + \frac{27t^3}{3!} + \frac{81t^4}{4!} + \cdots\right) + \left(3 - 3t + \frac{3t^2}{2!} - \frac{3t^3}{3!} + \frac{3t^4}{4!} - \cdots\right) \\
&= 4 + \frac{12t^2}{2!} + \frac{24t^3}{3!} + \frac{84t^4}{4!} + \cdots \\
&= 4 + 6t^2 + 4t^3 + \frac{7}{2} t^4 + \cdots
\end{aligned}$$

which is precisely the power series expansion of the solution we found at the outset.
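The comparison in example 8.3.4 can be pushed beyond the first five terms with a short computation. In this sketch (the function names are ours, introduced for illustration), one list of coefficients comes from the recurrence and the other from the Taylor coefficients $(3^n + 3(-1)^n)/n!$ of $e^{3t} + 3e^{-t}$; exact rational arithmetic lets us test the two lists for equality.

```python
from fractions import Fraction
from math import factorial


def recurrence_coefficients(a0, a1, count):
    """Coefficients for y'' - 2y' - 3y = 0 from n(n-1)a_n = (2n-2)a_{n-1} + 3a_{n-2}."""
    coeffs = [Fraction(a0), Fraction(a1)]
    for n in range(2, count):
        numerator = (2 * n - 2) * coeffs[n - 1] + 3 * coeffs[n - 2]
        coeffs.append(numerator / (n * (n - 1)))
    return coeffs


def closed_form_coefficients(count):
    """Taylor coefficients of e^{3t} + 3e^{-t}: (3^n + 3(-1)^n) / n!."""
    return [Fraction(3**n + 3 * (-1) ** n, factorial(n)) for n in range(count)]


series = recurrence_coefficients(4, 0, 12)
print(series[:5])
print(series == closed_form_coefficients(12))
```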


Example 8.3.4 demonstrates that although the series form of the solution can hide some of the inherent structure in the solution, this approach is nonetheless straightforward to apply and will effectively lead us to the power series expansion of the solution to a stated IVP. Exercises 8.3 In exercises 1–13, ﬁnd the ﬁrst four terms in the Taylor series representation of the general solution to the stated DE. 1. y + ty = 0 2. y + 4y = 0 3. y + 4y = 0 4. y + ty = 0 5. y + 6y + 5y = 0 6. y + y + 4y = 0 7. y − y − 6y = 0 8. y + t 2 y = 0 9. (1 − t )y + y = 0 10. (t 2 − 1)y − 4y = 0 11. y + 3ty + 3y = 0 12. (t 2 + 1)y − 2y = 0 13. (1 − t 2 )y − 12ty − 18y = 0 In exercises 14–17, ﬁnd the ﬁrst four nonzero coefﬁcients of the Taylor series expansion for the solution to the stated IVP. 14. (4 − t 2 )y + 2y = 0, 15. y + (1 − t )y = 0,

y(0) = 1,

16. y − t 2 y + y sin t = 0, 17. y + y sin t = 0,

y (0) = 1

y(0) = 0,

y (0) = 0

y(0) = 0,

y(0) = 1,

y (0) = 1

y (0) = 0

8.4 Legendre’s equation

A differential equation that arises naturally in physics, particularly when using spherical coordinates, is the Legendre equation,

$$(1 - t^2) y'' - 2ty' + \lambda(\lambda + 1) y = 0 \tag{8.4.1}$$

The parameter $\lambda$ is often a positive integer, though it is allowed to be any real, non-negative constant. If we divide both sides of (8.4.1) by $1 - t^2$ to write the equation in standard form $y'' + p(t)y' + q(t)y = 0$, we have

$$y'' - \frac{2t}{1 - t^2} y' + \frac{\lambda(\lambda + 1)}{1 - t^2} y = 0 \tag{8.4.2}$$

With

$$p(t) = -\frac{2t}{1 - t^2} \quad \text{and} \quad q(t) = \frac{\lambda(\lambda + 1)}{1 - t^2}$$

it follows that the origin is an ordinary point of Legendre's equation and the nearest singularities lie at $t = \pm 1$. We therefore expect that we can find Taylor series expansions about $t = 0$ for each of the two linearly independent solutions of (8.4.1), and the radius of convergence of each such series will be at least 1. To solve the Legendre equation, we assume that

$$y(t) = \sum_{n=0}^{\infty} a_n t^n$$

and consider the three terms present in the DE: $(1 - t^2) y''$, $-2ty'$, and $\lambda(\lambda + 1) y$. Letting $\alpha = \lambda(\lambda + 1)$ and writing each of these expressions as a series, we have

$$\begin{aligned}
(1 - t^2) y'' &= (1 - t^2) \sum_{n=2}^{\infty} n(n - 1) a_n t^{n-2} = \sum_{n=2}^{\infty} n(n - 1) a_n t^{n-2} - \sum_{n=2}^{\infty} n(n - 1) a_n t^n \\
&= \sum_{n=0}^{\infty} (n + 2)(n + 1) a_{n+2} t^n - \sum_{n=0}^{\infty} n(n - 1) a_n t^n
\end{aligned} \tag{8.4.3}$$

$$-2ty' = -2t \sum_{n=1}^{\infty} n a_n t^{n-1} = \sum_{n=1}^{\infty} -2n a_n t^n = \sum_{n=0}^{\infty} -2n a_n t^n \tag{8.4.4}$$

$$\alpha y = \sum_{n=0}^{\infty} \alpha a_n t^n \tag{8.4.5}$$

To achieve the final expression for $(1 - t^2) y''$ in (8.4.3), we re-indexed the first sum by replacing $n$ with $n + 2$ and lowering the starting index, and re-indexed the second sum by noting that when $n = 0$ and $n = 1$, the coefficient $n(n - 1)$ vanishes, so starting at $n = 0$ is the same as starting at $n = 2$. Likewise, for the expression for $-2ty'$, the term $n a_n t^n$ is zero when $n = 0$, so we can start the sum at $n = 0$ instead of $n = 1$ in (8.4.4). Thus, all three series are written in terms of powers $t^n$ starting at $n = 0$.

Next, to satisfy Legendre's equation (8.4.1), we take the series expressions in (8.4.3), (8.4.4), and (8.4.5) and set their collective sum to zero. Doing so,

$$\begin{aligned}
0 &= (1 - t^2) y'' - 2ty' + \alpha y \\
&= \sum_{n=0}^{\infty} (n + 2)(n + 1) a_{n+2} t^n - \sum_{n=0}^{\infty} n(n - 1) a_n t^n + \sum_{n=0}^{\infty} -2n a_n t^n + \sum_{n=0}^{\infty} \alpha a_n t^n \\
&= \sum_{n=0}^{\infty} \left[(n + 2)(n + 1) a_{n+2} - (n(n - 1) + 2n - \alpha) a_n\right] t^n \\
&= \sum_{n=0}^{\infty} \left[(n + 2)(n + 1) a_{n+2} - (n^2 + n - \alpha) a_n\right] t^n
\end{aligned} \tag{8.4.6}$$

We thus observe that (8.4.6) implies the recurrence relation

$$(n + 2)(n + 1) a_{n+2} - (n^2 + n - \alpha) a_n = 0 \tag{8.4.7}$$

Recalling that $\alpha = \lambda(\lambda + 1) = \lambda^2 + \lambda$, we may write

$$n^2 + n - \alpha = n^2 + n - \lambda^2 - \lambda = (n - \lambda)(n + \lambda + 1) \tag{8.4.8}$$

Hence, (8.4.7) and (8.4.8) together show

$$a_{n+2} = \frac{(n - \lambda)(n + \lambda + 1)}{(n + 2)(n + 1)} a_n \tag{8.4.9}$$

As we have seen in certain other DEs, the recurrence relation (8.4.9) makes all of the even coefficients in the expansion for $y$ depend on $a_0$, and all of the odd coefficients depend on $a_1$. Assuming that $a_0 = 1$ and computing the first few even coefficients, we find that

$$a_0 = 1, \quad a_2 = \frac{(-\lambda)(\lambda + 1)}{2 \cdot 1} a_0, \quad a_4 = \frac{(2 - \lambda)(3 + \lambda)}{4 \cdot 3} a_2$$

so that one solution to the Legendre equation is

$$y_1(t) = 1 - \frac{1}{2!} \lambda(\lambda + 1) t^2 + \frac{1}{4!} \lambda(\lambda + 1)(\lambda - 2)(\lambda + 3) t^4 + \cdots \tag{8.4.10}$$

Similar computations for the odd coefficients with $a_1 = 1$ result in the function

$$y_2(t) = t - \frac{1}{3!} (\lambda - 1)(\lambda + 2) t^3 + \frac{1}{5!} (\lambda - 1)(\lambda - 3)(\lambda + 2)(\lambda + 4) t^5 + \cdots \tag{8.4.11}$$

The solutions $y_1$ and $y_2$ are clearly linearly independent and therefore form a basis for the set of all solutions to the Legendre equation. Note particularly that each depends directly on the parameter $\lambda$, as the Legendre equation is actually a family of equations where each equation depends on $\lambda$. In our development of $y_1$ and $y_2$, note that we took $a_0 = 1$ (with $a_1 = 0$) for $y_1$ and $a_1 = 1$ (with $a_0 = 0$) for $y_2$, which is equivalent to assuming that $y_1(0) = 1$, $y_1'(0) = 0$ and $y_2(0) = 0$, $y_2'(0) = 1$. The general solution of the Legendre equation is $y = a_0 y_1 + a_1 y_2$, where $y_1$ and $y_2$ are given by (8.4.10) and (8.4.11), respectively.

The case when $\lambda$ is a non-negative integer is particularly interesting. From the recurrence relation (8.4.9), whenever $\lambda = n$, it follows that $a_{n+2} = 0$ and hence $a_{n+4}, a_{n+6}, \ldots$ are all zero. Since this causes the series expansion of $y_1$ or $y_2$ to terminate, one of the resulting solutions to the differential equation is a polynomial. In particular, if $\lambda$ is an even integer, say $\lambda = 2m$, then $y_1(t)$ is a polynomial of degree $2m$. For example,

$$\begin{aligned}
\lambda = 0: \quad & y_1(t) = 1 \\
\lambda = 2: \quad & y_1(t) = 1 - 3t^2 \\
\lambda = 4: \quad & y_1(t) = 1 - 10t^2 + \frac{35}{3} t^4
\end{aligned}$$

Similarly, in the case where $\lambda = 2m + 1$ is an odd integer, $y_2(t)$ is a polynomial of degree $2m + 1$. The first few examples for small values of $\lambda$ are

$$\begin{aligned}
\lambda = 1: \quad & y_2(t) = t \\
\lambda = 3: \quad & y_2(t) = t - \frac{5}{3} t^3 \\
\lambda = 5: \quad & y_2(t) = t - \frac{14}{3} t^3 + \frac{21}{5} t^5
\end{aligned}$$

These polynomials demonstrate that when $\lambda$ is a non-negative integer, at least one basic solution of the Legendre equation is a polynomial function. Moreover, since the Legendre equation is linear, any scalar multiple of a solution is also a solution, so we can scale these polynomials however we like. Doing so to make the polynomial's value 1 when $t = 1$ results in the family of polynomials

$$\begin{aligned}
P_0(t) &= 1 \\
P_1(t) &= t \\
P_2(t) &= \frac{3}{2} t^2 - \frac{1}{2} \\
P_3(t) &= \frac{5}{2} t^3 - \frac{3}{2} t \\
P_4(t) &= \frac{35}{8} t^4 - \frac{30}{8} t^2 + \frac{3}{8} \\
P_5(t) &= \frac{63}{8} t^5 - \frac{70}{8} t^3 + \frac{15}{8} t
\end{aligned}$$

The polynomials $P_n(t)$, which can also be described through a recurrence relation linking $P_{n+2}$ to $P_{n+1}$ and $P_n$, are known as the Legendre polynomials and form a well-known class of so-called orthogonal polynomials. The Legendre polynomials have many interesting properties, including the fact that each $P_n$ has $n$ real, distinct roots that lie in the interval $(-1, 1)$ and demonstrates an oscillatory behavior similar to the graph of $P_{11}(t)$ shown in figure 8.1. The study of orthogonal polynomials has important ramifications in many areas of mathematics and physics, but lies beyond the scope of this text.

[Figure 8.1: The degree 11 Legendre polynomial, $P_{11}(t)$.]

Regardless of whether $\lambda$ is a non-negative integer or not, the two infinite series expansions for $y_1$ and $y_2$ in (8.4.10) and (8.4.11) are the two linearly independent solutions of the Legendre equation. In the case where $\lambda$ is a non-negative integer, we have shown that one of these two infinite series terminates to form a polynomial, a scalar multiple of one of the Legendre polynomials. The other solution turns out to have recognizable structure as well. For instance, when $\lambda = 0$, we know that one solution to the Legendre equation comes from $y_1(t) = 1 = P_0(t)$. Setting $\lambda = 0$ in $y_2(t)$, it follows

$$\begin{aligned}
y_2(t) &= t - \frac{-1 \cdot 2}{3!} t^3 + \frac{-1 \cdot (-3) \cdot 2 \cdot 4}{5!} t^5 + \cdots \\
&= t + \frac{1}{3} t^3 + \frac{1}{5} t^5 + \cdots
\end{aligned}$$

It can be shown from this expansion that

$$y_2(t) = \frac{1}{2} \ln\left(\frac{1 + t}{1 - t}\right) \tag{8.4.12}$$

Thus, when $\lambda = 0$, a second linearly independent solution is given by $Q_0(t) = \frac{1}{2} \ln\left(\frac{1 + t}{1 - t}\right)$ and we write $y = c_1 P_0 + c_2 Q_0$. More generally, it can be shown that for any non-negative integer $\lambda = n$, a related expression involving $Q_0$ exists for the second linearly independent solution $Q_n$ that is not a polynomial. In particular, these functions are known as Legendre functions of the second kind; the first several of these functions are given by

$$\begin{aligned}
Q_0(t) &= \frac{1}{2} \ln\left(\frac{1 + t}{1 - t}\right) \\
Q_1(t) &= P_1(t) Q_0(t) - 1 \\
Q_2(t) &= P_2(t) Q_0(t) - \frac{3}{2} t \\
Q_3(t) &= P_3(t) Q_0(t) - \frac{5}{2} t^2 + \frac{2}{3} \\
Q_4(t) &= P_4(t) Q_0(t) - \frac{35}{8} t^3 + \frac{55}{24} t
\end{aligned}$$

Note that the presence of $Q_0(t)$ in each solution highlights the fact that singularities are present in the Legendre equation at $t = \pm 1$. The functions $P_1(t), P_2(t), \ldots$ are the previously noted Legendre polynomials. Further, the general solution of the Legendre equation with $\lambda = n \geq 0$ is therefore

$$y(t) = c_1 P_n(t) + c_2 Q_n(t) \tag{8.4.13}$$
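The polynomial solutions $P_n(t)$ can be generated mechanically from the recurrence (8.4.9). The sketch below is one way to do so under our own naming (`legendre_polynomial` is not from the text): it starts the recurrence at $a_0 = 1$ for even $\lambda$ or $a_1 = 1$ for odd $\lambda$, and then rescales so the polynomial takes the value 1 at $t = 1$.

```python
from fractions import Fraction


def legendre_polynomial(lam):
    """Coefficients [c_0, ..., c_lam] of P_lam(t), lowest power first.

    Runs recurrence (8.4.9) from a_0 = 1 (lam even) or a_1 = 1 (lam odd),
    then rescales so that the polynomial equals 1 at t = 1.
    """
    coeffs = [Fraction(0)] * (lam + 1)
    start = lam % 2
    coeffs[start] = Fraction(1)
    for n in range(start, lam - 1, 2):
        factor = Fraction((n - lam) * (n + lam + 1), (n + 2) * (n + 1))
        coeffs[n + 2] = factor * coeffs[n]
    value_at_one = sum(coeffs)  # the polynomial evaluated at t = 1
    return [c / value_at_one for c in coeffs]


for lam in range(6):
    print(lam, legendre_polynomial(lam))
```

The loop stops at degree $\lambda$ because the factor $(n - \lambda)$ in (8.4.9) makes every later coefficient zero.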

We close this section with an example.

Example 8.4.1 Find the solution of the initial-value problem

$$(1 - t^2) y'' - 2ty' + 12y = 0, \quad y(0) = 1, \quad y'(0) = 1$$

Solution. First, observe that the given DE is Legendre's equation with $\lambda = 3$, since $3(3 + 1) = 12$. From our earlier work in this section, we know that the general solution is

$$\begin{aligned}
y(t) &= c_1 P_3(t) + c_2 Q_3(t) \\
&= c_1 P_3(t) + c_2 \left(P_3(t) Q_0(t) - \frac{5}{2} t^2 + \frac{2}{3}\right) \\
&= P_3(t)\left(c_1 + c_2 Q_0(t)\right) + c_2 \left(-\frac{5}{2} t^2 + \frac{2}{3}\right) \\
&= \left(\frac{5}{2} t^3 - \frac{3}{2} t\right)\left(c_1 + \frac{c_2}{2} \ln\frac{1 + t}{1 - t}\right) + c_2 \left(-\frac{5}{2} t^2 + \frac{2}{3}\right)
\end{aligned} \tag{8.4.14}$$

Applying the initial conditions $y(0) = 1$ and $y'(0) = 1$ to (8.4.14), we can show that $c_1 = -2/3$ and $c_2 = 3/2$, and thus

$$y = \left(\frac{5}{2} t^3 - \frac{3}{2} t\right)\left(-\frac{2}{3} + \frac{3}{4} \ln\frac{1 + t}{1 - t}\right) - \frac{15}{4} t^2 + 1$$

is the solution to the given IVP.
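A numerical spot check of example 8.4.1 is straightforward. The sketch below assumes $Q_0(t) = \frac{1}{2}\ln\frac{1+t}{1-t}$ as in (8.4.12), uses central differences with a step size `h` that is our own choice, and evaluates the residual $(1 - t^2)y'' - 2ty' + 12y$ at a few sample points in $(-1, 1)$. The residuals should be zero up to finite-difference error.

```python
from math import log


def y(t):
    """Candidate solution of example 8.4.1: c1*P3 + c2*Q3 with c1 = -2/3, c2 = 3/2."""
    p3 = 2.5 * t**3 - 1.5 * t
    q0 = 0.5 * log((1 + t) / (1 - t))
    q3 = p3 * q0 - 2.5 * t**2 + 2.0 / 3.0
    return (-2.0 / 3.0) * p3 + 1.5 * q3


def derivative(f, t, h=1e-4):
    """Central-difference approximation of f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)


def second_derivative(f, t, h=1e-4):
    """Central-difference approximation of f''(t)."""
    return (f(t + h) - 2 * f(t) + f(t - h)) / h**2


def legendre_residual(t):
    """Approximate value of (1 - t^2) y'' - 2t y' + 12 y at t."""
    return (1 - t**2) * second_derivative(y, t) - 2 * t * derivative(y, t) + 12 * y(t)


print(y(0.0), derivative(y, 0.0))
print([legendre_residual(t) for t in (-0.5, 0.0, 0.3, 0.6)])
```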

Exercises 8.4

1. Verify by direct substitution that the Legendre equation is satisfied by the polynomials $P_2(t)$ and $P_3(t)$ when $\lambda = 2$ and $\lambda = 3$, respectively.
2. Verify by direct substitution that $Q_0(t) = \frac{1}{2} \ln\left(\frac{1 + t}{1 - t}\right)$ is a solution of Legendre's equation with $\lambda = 0$.
3. Determine the Taylor series expansion about $a = 0$ of $f(t) = \frac{1}{2} \ln\left(\frac{1 + t}{1 - t}\right)$ and confirm that this matches (8.4.12).
4. Determine expressions for $P_6(t)$ and $P_7(t)$.

In exercises 5–7, find the general solution of the stated differential equation in terms of $P_n(t)$ and $Q_n(t)$. (Hint: Use the method of undetermined coefficients in the standard way to find a particular solution of each equation.)

5. $(1 - t^2) y'' - 2ty' + 6y = 6$
6. $(1 - t^2) y'' - 2ty' + 20y = 36t$
7. $(1 - t^2) y'' - 2ty' + 30y = 12t^2$

In exercises 8–13, find the first four nonzero coefficients of the Taylor series expansion (about $t = 0$) for the solution to the stated IVP.

8. $(1 - t^2) y'' - 2ty' + 2y = 0$, $y(0) = 1$, $y'(0) = 0$
9. $(1 - t^2) y'' - 2ty' + 3y = 0$, $y(0) = 1$, $y'(0) = 0$
10. $(1 - t^2) y'' - 2ty' + 20y = 18t$, $y(0) = 0$, $y'(0) = 1$
11. $9(1 - t^2) y'' - 18ty' + 4y = 0$, $y(0) = 0$, $y'(0) = 1$
12. $(1 - t^2) y'' - 2ty' + 20y = 0$, $y(0) = 1$, $y'(0) = 1$
13. $(1 - t^2) y'' - 2ty' + 20y = 14t^2$, $y(0) = 3$, $y'(0) = 1$

8.5 Three important examples

In this penultimate section on series solutions to differential equations, we consider and discuss three examples that arise in applied physics.

8.5.1 The Hermite equation

The Hermite equation is the linear second-order differential equation given by

$$y'' - 2ty' + 2qy = 0 \tag{8.5.1}$$

where $q$ is a real constant. Using the Taylor series expansions for $y$, $y'$, and $y''$ in the usual way with $y = a_0 + a_1 t + a_2 t^2 + \cdots$, it can be shown that

$$\sum_{n=0}^{\infty} \left[(n + 2)(n + 1) a_{n+2} - 2(n - q) a_n\right] t^n = 0 \tag{8.5.2}$$

from which follows the recurrence relation

$$a_{n+2} = \frac{2(n - q)}{(n + 1)(n + 2)} a_n, \quad n = 0, 1, 2, \ldots \tag{8.5.3}$$

As we have seen in previous examples, the even-subscripted coefficients depend on $y(0) = a_0$, and the odd-subscripted coefficients involve $y'(0) = a_1$.


To calculate the first few nonzero terms in the expansions for the solution $y_1(t)$ involving even powers of $t$, we observe that

$$\begin{aligned}
a_2 &= \frac{2(0 - q)}{1 \cdot 2} a_0 = -2 \frac{q}{2!} a_0 \\
a_4 &= \frac{2(2 - q)}{3 \cdot 4} a_2 = -2^2 \frac{q(2 - q)}{4!} a_0 \\
a_6 &= \frac{2(4 - q)}{5 \cdot 6} a_4 = -2^3 \frac{q(2 - q)(4 - q)}{6!} a_0
\end{aligned}$$

More generally, it follows that

$$a_{2k} = -2^k \frac{q(2 - q) \cdots (2k - 2 - q)}{(2k)!} a_0 \tag{8.5.4}$$

If we elect to use the initial conditions $y(0) = 1$ and $y'(0) = 0$, this implies that $a_0 = 1$ and $a_1 = 0$; the latter condition and the recurrence relation (8.5.3) imply that all odd-subscripted coefficients are zero, and hence one solution to the Hermite differential equation is

$$\begin{aligned}
y_1(t) &= a_0 + a_1 t + a_2 t^2 + \cdots \\
&= 1 - \frac{2q}{2!} t^2 - \frac{2^2 q(2 - q)}{4!} t^4 - \cdots \\
&= 1 - \sum_{n=1}^{\infty} 2^n \frac{q(2 - q) \cdots (2n - 2 - q)}{(2n)!} t^{2n}
\end{aligned} \tag{8.5.5}$$

Using similar reasoning with odd-subscripted coefficients, (8.5.3) implies

$$\begin{aligned}
a_3 &= \frac{2(1 - q)}{2 \cdot 3} a_1 \\
a_5 &= \frac{2(3 - q)}{4 \cdot 5} a_3 = 2^2 \frac{(1 - q)(3 - q)}{5!} a_1 \\
a_7 &= \frac{2(5 - q)}{6 \cdot 7} a_5 = 2^3 \frac{(1 - q)(3 - q)(5 - q)}{7!} a_1
\end{aligned}$$

From this, we can deduce that the general odd coefficient is given by

$$a_{2k+1} = 2^k \frac{(1 - q)(3 - q) \cdots (2k - 1 - q)}{(2k + 1)!} a_1 \tag{8.5.6}$$

Using the initial conditions $y(0) = 0 = a_0$ and $y'(0) = 1 = a_1$, a second solution to the Hermite equation is

$$y_2(t) = t + \sum_{n=1}^{\infty} 2^n \frac{(1 - q)(3 - q) \cdots (2n - 1 - q)}{(2n + 1)!} t^{2n+1} \tag{8.5.7}$$


Since $y_1(t)$ and $y_2(t)$ are linearly independent, the general solution to the Hermite equation is

$$\begin{aligned}
y &= c_1 y_1 + c_2 y_2 \\
&= c_1 \left[1 - \sum_{n=1}^{\infty} 2^n \frac{q(2 - q) \cdots (2n - 2 - q)}{(2n)!} t^{2n}\right] + c_2 \left[t + \sum_{n=1}^{\infty} 2^n \frac{(1 - q)(3 - q) \cdots (2n - 1 - q)}{(2n + 1)!} t^{2n+1}\right]
\end{aligned} \tag{8.5.8}$$
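The series in (8.5.8) are generated entirely by the recurrence (8.5.3), so a few lines of code suffice to list their coefficients. The following sketch (the function name is ours, and it assumes an integer value of $q$ so that exact fractions can be used) shows what happens to the coefficients when $q$ is a non-negative integer.

```python
from fractions import Fraction


def hermite_series_coefficients(q, a0, a1, count):
    """Coefficients of a solution of y'' - 2ty' + 2qy = 0 via recurrence (8.5.3).

    Assumes q is an integer so that every coefficient is an exact fraction.
    """
    coeffs = [Fraction(a0), Fraction(a1)]
    for n in range(count - 2):
        coeffs.append(Fraction(2 * (n - q), (n + 1) * (n + 2)) * coeffs[n])
    return coeffs


# q = 4 with y(0) = 1, y'(0) = 0: the even series stops after the t^4 term.
even_q4 = hermite_series_coefficients(4, 1, 0, 10)
# q = 3 with y(0) = 0, y'(0) = 1: the odd series stops after the t^3 term.
odd_q3 = hermite_series_coefficients(3, 0, 1, 10)
print(even_q4)
print(odd_q3)
```

In both cases the factor $(n - q)$ in (8.5.3) vanishes at $n = q$, which forces all later coefficients of the same parity to be zero.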

Just as we experienced with Legendre's equation, there are values for the constant $q$ in the Hermite equation that lead to polynomial solutions. In particular, the presence of the factor $(2n - 2 - q)$ in $y_1(t)$ implies that whenever $q$ is an even, non-negative integer, then $y_1(t)$ is a polynomial. Specifically, from (8.5.5), when $q = 0$, $q = 2$, and $q = 4$, it follows that

$$\begin{aligned}
q = 0: \quad & y_1(t) = 1 \\
q = 2: \quad & y_1(t) = 1 - 2t^2 \\
q = 4: \quad & y_1(t) = 1 - 4t^2 + \frac{4}{3} t^4
\end{aligned} \tag{8.5.9}$$

Similarly, for $q = 1$, $q = 3$, and $q = 5$, the function $y_2(t)$ that is a solution to the Hermite equation is found to be

$$\begin{aligned}
q = 1: \quad & y_2(t) = t \\
q = 3: \quad & y_2(t) = t - \frac{2}{3} t^3 \\
q = 5: \quad & y_2(t) = t - \frac{4}{3} t^3 + \frac{4}{15} t^5
\end{aligned} \tag{8.5.10}$$

The polynomial solutions to Hermite's equation given in (8.5.9) and (8.5.10) are usually called the Hermite polynomials $H_n(t)$ when scaled such that the coefficient of the highest power of $t$ is $2^n$. The first four Hermite polynomials are

$$\begin{aligned}
H_0(t) &= 1 \\
H_1(t) &= 2t \\
H_2(t) &= 4t^2 - 2 \\
H_3(t) &= 8t^3 - 12t
\end{aligned}$$

The Hermite polynomials are another example of a family of orthogonal polynomials; Hermite polynomials are orthogonal on $(-\infty, \infty)$ with respect to the weighting function $w(t) = e^{-t^2}$. Like Legendre polynomials, they have a wide range of interesting properties and the possibilities they present for further study go well beyond the scope of this text. A plot of $H_{11}(t)$ is shown in figure 8.2. The Hermite polynomials have large oscillations; the degree 11 polynomial has two more zeros, located at approximately $\pm 3.7$, which are not shown in figure 8.2.

[Figure 8.2: The degree 11 Hermite polynomial, $H_{11}(t)$, plotted on the interval $[-3, 3]$; the vertical axis runs from $-10^6$ to $10^6$.]

8.5.2 The Laguerre equation

The Laguerre equation is given by

$$ty'' + (1 - t)y' + qy = 0 \tag{8.5.11}$$

where $q$ is, once again, a real constant. If we divide through by $t$, Laguerre's equation is equivalently expressed as

$$y'' + \frac{1 - t}{t} y' + \frac{q}{t} y = 0$$

Since the coefficient functions $(1 - t)/t$ of $y'$ and $q/t$ of $y$ are each undefined at $t = 0$, the Laguerre equation has a singular point at the origin. Nonetheless, it turns out that we can find a series expansion for a solution at the origin. Letting $y = a_0 + a_1 t + a_2 t^2 + \cdots$ and substituting for $y$, $y'$, and $y''$ in (8.5.11), it can be shown that the coefficients $a_n$ must satisfy

$$\sum_{n=0}^{\infty} \left[(n + 1)^2 a_{n+1} + (q - n) a_n\right] t^n = 0 \tag{8.5.12}$$

It follows from (8.5.12) that $(n + 1)^2 a_{n+1} + (q - n) a_n = 0$ and therefore

$$a_{n+1} = -\frac{q - n}{(n + 1)^2} a_n \tag{8.5.13}$$


Note that this recurrence relation relies only on the value of $a_0$, and therefore only leads to one solution to the Laguerre equation.³ Applying (8.5.13), we see

$$\begin{aligned}
a_1 &= -\frac{q}{1^2} a_0 \\
a_2 &= -\frac{q - 1}{2^2} a_1 = \frac{1}{(1 \cdot 2)^2} (q - 1) q\, a_0 \\
a_3 &= -\frac{q - 2}{3^2} a_2 = -\frac{1}{(1 \cdot 2 \cdot 3)^2} (q - 2)(q - 1) q\, a_0
\end{aligned}$$

More generally,

$$a_n = -\frac{q - n + 1}{n^2} a_{n-1} = (-1)^n \frac{(q - n + 1) \cdots (q - 1) q}{(n!)^2} a_0$$

Taking $a_0 = 1$, we have found that one solution to the Laguerre equation is

$$y_1(t) = 1 + \sum_{n=1}^{\infty} (-1)^n \frac{(q - n + 1) \cdots (q - 1) q}{(n!)^2} t^n \tag{8.5.14}$$
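The single-step recurrence (8.5.13) is simple to iterate. The sketch below (with a function name of our own choosing, and assuming an integer $q$ so that exact fractions apply) lists the coefficients of $y_1(t)$ in (8.5.14) for $q = 1, 2, 3, 4$ with $a_0 = 1$.

```python
from fractions import Fraction


def laguerre_series_coefficients(q, count):
    """Coefficients of y_1(t) in (8.5.14) with a_0 = 1, via recurrence (8.5.13).

    Assumes q is an integer so that every coefficient is an exact fraction.
    """
    coeffs = [Fraction(1)]
    for n in range(count - 1):
        coeffs.append(-Fraction(q - n, (n + 1) ** 2) * coeffs[n])
    return coeffs


for q in range(1, 5):
    print(q, laguerre_series_coefficients(q, 6))
```

For each of these values of $q$, the factor $(q - n)$ vanishes at $n = q$, so the coefficients beyond $t^q$ are all zero.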

When $q$ is a non-negative integer, we see from (8.5.14) that $y_1(t)$ is a polynomial of degree $q$. Recalling the binomial coefficient $\binom{q}{n}$ given by

$$\binom{q}{n} = \frac{q!}{n!(q - n)!} = \frac{q(q - 1) \cdots (q - n + 1)}{n!} \tag{8.5.15}$$

we are able to find a relatively simple expression for these polynomial solutions. The Laguerre polynomial of degree $q$ is given by

$$L_q(t) = 1 + \sum_{n=1}^{q} \frac{(-1)^n}{n!} \binom{q}{n} t^n \tag{8.5.16}$$

and these functions turn out to be the only solutions (up to scalar multiples) of the Laguerre equation that are analytic at $t = 0$. The Laguerre polynomials are yet another family of orthogonal polynomials. The first few of these polynomials are given below, followed by a graph of $L_{11}(t)$ in figure 8.3.

$$\begin{aligned}
L_1(t) &= 1 - t \\
L_2(t) &= 1 - 2t + \frac{1}{2} t^2 \\
L_3(t) &= 1 - 3t + \frac{3}{2} t^2 - \frac{1}{6} t^3 \\
L_4(t) &= 1 - 4t + 3t^2 - \frac{2}{3} t^3 + \frac{1}{24} t^4
\end{aligned}$$

³ A second solution can be found by more sophisticated techniques that lie beyond the scope of this book.

[Figure 8.3: The degree 11 Laguerre polynomial $L_{11}(t)$ plotted on the interval $[0, 10]$.]

8.5.3 The Bessel equation

The Bessel equation

$$t^2 y'' + t y' + (t^2 - \lambda^2) y = 0 \tag{8.5.17}$$

is a very important DE in mathematical physics. The properties of its solutions have been well studied; the equation often appears in the process of solving certain partial differential equations that appear when using cylindrical coordinates. The parameter $\lambda$ in (8.5.17) is a real constant.

Like the Laguerre equation, the Bessel equation has a singular point at $t = 0$, so we cannot expect to find solutions to the equation with Taylor series centered at $a = 0$. Nonetheless, as we will show shortly, a solution analytic at $t = 0$ exists when $\lambda$ is a non-negative integer. While a second linearly independent solution to the Bessel equation can be found, the techniques required are beyond the scope of this text. Here we only explore the series solutions that do exist for the Bessel equation.

Let $\lambda = m$ be a non-negative integer and assume that $y_1(t) = a_0 + a_1 t + a_2 t^2 + \cdots$. Substituting directly in (8.5.17) leads to

$$-m^2 a_0 + (1 - m^2)\, a_1 t + \sum_{k=2}^{\infty} \left[ (k^2 - m^2)\, a_k + a_{k-2} \right] t^k = 0 \tag{8.5.18}$$

Since each coefficient of powers of $t$ in (8.5.18) must be zero, it follows that $m^2 a_0 = 0$, $(1 - m^2)\, a_1 = 0$, and

$$(k^2 - m^2)\, a_k + a_{k-2} = 0, \quad k \ge 2 \tag{8.5.19}$$

If $k < m$, then it follows $a_k = 0$ for each such $k$ by the three preceding equalities. When $k = m$, the coefficient $k^2 - m^2$ of $a_k$ vanishes and thus (8.5.19) becomes an identity, rendering the value of $a_m$ arbitrary. Note further that $a_{m+1} = a_{m+3} = \cdots = 0$ is another consequence of (8.5.19). Thus, $a_m$ can be any constant, and subsequent terms must satisfy the recurrence

$$a_{k+2} = -\frac{1}{(k+2)^2 - m^2}\, a_k = -\frac{1}{(k+2-m)(k+2+m)}\, a_k, \quad k = m,\ m+2,\ m+4, \ldots \tag{8.5.20}$$

Hence, given a positive integer $\lambda = m$ and a value for $a_m$, we can determine all of the coefficients of the Taylor expansion of an analytic solution to the Bessel equation. In particular, these coefficients $a_{m+2j}$ for $j \ge 0$ must satisfy the recurrence relation (8.5.20), from which using $a_m = 1$ we find the closed formula

$$a_{m+2j} = 2^{-2j}\, \frac{(-1)^j}{j!\,(m+1)(m+2) \cdots (m+j)} \tag{8.5.21}$$

Hence, one solution of Bessel's equation (again, when $\lambda = m$ is a positive integer) is

$$y_1(t) = \sum_{j=0}^{\infty} 2^{-2j}\, \frac{(-1)^j}{j!\,(m+1)(m+2) \cdots (m+j)}\, t^{m+2j} \tag{8.5.22}$$

The Bessel function of the first kind of order $n$ (it is standard to use $n$ rather than $m$ for the order of the Bessel function) is the scalar multiple of $y_1(t)$ given by

$$J_n(t) = \frac{2^{-n}}{n!}\, y_1(t) = \sum_{j=0}^{\infty} 2^{-2j-n}\, \frac{(-1)^j}{j!\,(n+j)!}\, t^{n+2j} \tag{8.5.23}$$

For example, the first two Bessel functions are

$$J_0(t) = \sum_{j=0}^{\infty} 2^{-2j}\, \frac{(-1)^j}{j!\,j!}\, t^{2j} \tag{8.5.24}$$

and

$$J_1(t) = \sum_{j=0}^{\infty} 2^{-2j-1}\, \frac{(-1)^j}{j!\,(j+1)!}\, t^{2j+1} \tag{8.5.25}$$
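The series (8.5.23) can be evaluated numerically by truncating the sum. The sketch below assumes a fixed cutoff of 40 terms, which is ample for moderate $t$ but is an arbitrary choice; the name `bessel_j` and the `terms` parameter are illustrative rather than anything defined in the text.

```python
import math

def bessel_j(n, t, terms=40):
    """Partial sum of (8.5.23) for J_n(t), with n a non-negative integer."""
    total = 0.0
    for j in range(terms):
        # 2^(-2j-n) t^(n+2j) is written as (t/2)^(n+2j)
        coefficient = (-1) ** j / (math.factorial(j) * math.factorial(n + j))
        total += coefficient * (t / 2.0) ** (n + 2 * j)
    return total

# Sample values on [0, 10]; the sign changes reflect the oscillation of J_0.
for t in (0.0, 2.0, 4.0, 6.0, 8.0, 10.0):
    print(t, bessel_j(0, t))
```

For large $t$ the alternating terms grow before they decay, so a plain partial sum loses accuracy to cancellation; this sketch is meant only for small and moderate arguments.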

The graph of $J_0(t)$ in figure 8.4 shows that the Bessel function exhibits damped oscillation.

Figure 8.4 The Bessel function of the first kind, $J_0(t)$.

In this section, through the Hermite, Laguerre, and Bessel equations, we have encountered examples not only of three important DEs, but also of the various types of important functions that arise as solutions to these equations. Hermite polynomials, Laguerre polynomials, and Bessel functions are often studied in courses on special functions and demonstrate a wide range of interesting properties that mathematicians, engineers, and physicists have studied.

Exercises 8.5

1. Determine the degree 4 and 5 Hermite polynomials, $H_4(t)$ and $H_5(t)$.

In exercises 2–4, find the first three nonzero terms in the Taylor series representation of the general solution to the given Hermite equation.

2. $y'' - 2ty' + 6y = 0$
3. $y'' - 2ty' + 10y = 0$
4. $y'' - 2ty' + 4y = 0$

In exercises 5–7, find the first three nonzero terms in the Taylor series representation of the general solution to the given IVP.

5. $y'' - 2ty' + 6y = 0$, $y(0) = 2$, $y'(0) = 10$
6. $y'' - 2ty' + 10y = 0$, $y(0) = 1$, $y'(0) = 0$
7. $y'' - 2ty' + 4y = 8t$, $y(0) = 1$, $y'(0) = 0$

8. Determine the degree 5 and 6 Laguerre polynomials, $L_5(t)$ and $L_6(t)$.

Given that a general solution of Laguerre's equation is $c_1 L_q(t) + c_2 u_2(t)$, where $u_2(t)$ is singular at the origin, in exercises 9–11, determine the solution to the given IVP.

9. $ty'' + (1 - t)y' + 3y = 0$, $y(0)$ = finite, $y(1) = 1$
10. $ty'' + (1 - t)y' + 4y = 0$, $y(0)$ = finite, $y(2) = 2$
11. $ty'' + (1 - t)y' + 4y = 3t$, $y(0)$ = finite, $y(1) = 4$

12. Determine the first five nonzero terms in the series expansion of $J_2(t)$ about $t = 0$. In addition, state the form of $J_2(t)$ in sigma notation.

It can be shown that a second linearly independent solution to the Bessel equation when $\lambda = n$ (called the Bessel function of the second kind of order $n$) is given by

$$Y_n(t) = \frac{2}{\pi}\, J_n(t) \left[ \ln \frac{t}{2} + \gamma \right] + R(t) + u(t)$$

where $R(t)$ is a rational function, $\gamma \approx 0.577215665$ is Euler's constant, and $u(t)$ is a power series convergent for all $t$. Note that $Y_n(t)$ is singular at the origin. In exercises 13–15, determine the general solution to the given equation.

13. $t^2 y'' + ty' + (t^2 - 4)y = 0$
14. $t^2 y'' + ty' + (t^2 - 9)y = 0$
15. $t^2 y'' + ty' + (t^2 - 16)y = 0$

In exercises 16–18, determine the solution to the given IVP.

16. $t^2 y'' + ty' + (t^2 - 4)y = 0$, $y(0)$ = finite, $y(1) = 1$
17. $t^2 y'' + ty' + (t^2 - 9)y = 0$, $y(0)$ = finite, $y(1) = -3$
18. $t^2 y'' + ty' + (t^2 - 16)y = 0$, $y(0)$ = finite, $y(1) = 2$

8.6 The Method of Frobenius

Some second-order linear DEs that appear in physical applications do not have two linearly independent analytic solutions about $t = 0$. Perhaps the most important and well-studied example is the Bessel equation (8.5.17). A somewhat simpler example is

$$t^2 y'' + \frac{3}{2}\, t y' - \frac{1}{2}\, y = 0 \tag{8.6.1}$$

which is a Cauchy–Euler equation (on which more information can be found in section 4.7.3). It is a straightforward exercise to show that for all $t > 0$, $y_1(t) = t^{-1}$ and $y_2(t) = \sqrt{t}$ are linearly independent solutions of (8.6.1). Note that neither $y_1$ nor $y_2$ has a derivative at the origin, and therefore neither is analytic at $t = 0$; thus, each lacks a Taylor series expansion at the origin.

F. Georg Frobenius (1849–1917) showed that the solutions of a certain class of linear second-order DEs with a singular point at the origin can be represented in series form by a slight generalization of a Taylor series. In particular, he showed that these series solutions have the form

$$y = t^r \sum_{k=0}^{\infty} b_k t^k = \sum_{k=0}^{\infty} b_k t^{k+r} \tag{8.6.2}$$

where $r$ is a real number and $\sum_{k=0}^{\infty} b_k t^k$ converges in some open interval containing the origin. The series (8.6.2) is called a Frobenius series, and the method we will discuss for obtaining $r$ and the coefficients $b_k$ is known as the Method of Frobenius.
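Returning to the Cauchy–Euler example (8.6.1): substituting a trial function $y = t^r$ gives $\left[ r(r-1) + \tfrac{3}{2} r - \tfrac{1}{2} \right] t^r = 0$, so the admissible exponents are the roots of a quadratic. The sketch below is a hedged illustration of that calculation for a general equation $t^2 y'' + p_0 t y' + q_0 y = 0$ with constant $p_0$ and $q_0$; the function names are hypothetical, and complex roots are deliberately not handled.

```python
import math

def power_solution_exponents(p0, q0):
    """Real roots r of r(r - 1) + p0*r + q0 = 0, so that t**r solves the equation."""
    b = p0 - 1.0                      # the quadratic is r^2 + (p0 - 1) r + q0
    disc = b * b - 4.0 * q0
    if disc < 0:
        raise ValueError("complex exponents are not handled in this sketch")
    root = math.sqrt(disc)
    return ((-b - root) / 2.0, (-b + root) / 2.0)

def residual(r, t, p0=1.5, q0=-0.5):
    """Left side of t^2 y'' + p0 t y' + q0 y for y = t**r, using exact derivatives."""
    y = t ** r
    dy = r * t ** (r - 1)
    d2y = r * (r - 1) * t ** (r - 2)
    return t * t * d2y + p0 * t * dy + q0 * y

# Equation (8.6.1) has p0 = 3/2 and q0 = -1/2.
exponents = power_solution_exponents(1.5, -0.5)
print(exponents)
print([residual(r, 2.0) for r in exponents])
```

The two exponents correspond to the solutions $t^{-1}$ and $\sqrt{t}$ quoted above, and the residuals confirm that each power satisfies (8.6.1) at the sample point $t = 2$.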


The Cauchy–Euler equation and the Bessel equation both belong to this class of equations that can be solved by the Method of Frobenius. In what follows, we focus particularly on equations of the form

$$t^2 y'' + t\,p(t)\, y' + q(t)\, y = 0 \tag{8.6.3}$$

where $p(t)$ and $q(t)$ are low-degree polynomials. Note that $p$ and $q$ are analytic at the origin, and therefore each has a convergent Taylor series there. Any linear second-order DE with this property is said to have a regular singular point at the origin. The Method of Frobenius applies to all such equations. Finally, observe that if $p(t)$ and $q(t)$ are constant polynomials, then (8.6.3) reduces to a Cauchy–Euler equation.

To begin, we suppose that there is a solution of (8.6.3) that has a series expansion of the form

$$y = \sum_{k=0}^{\infty} b_k t^{k+r} \tag{8.6.4}$$

where $b_0 \ne 0$ and $\sum_{k=0}^{\infty} b_k t^k$ converges in $0 < |t| < R$. From this, it follows that

$$y' = \sum_{k=0}^{\infty} (k+r)\, b_k t^{k+r-1} \tag{8.6.5}$$

and

$$y'' = \sum_{k=0}^{\infty} (k+r)(k+r-1)\, b_k t^{k+r-2} \tag{8.6.6}$$

Furthermore, we suppose that $p(t)$ and $q(t)$ have the expansions

$$p(t) = p_0 + p_1 t + p_2 t^2 + \cdots + p_n t^n + \cdots$$

$$q(t) = q_0 + q_1 t + q_2 t^2 + \cdots + q_n t^n + \cdots$$

Substituting these expressions for $y$, $y'$, $y''$, $p$, and $q$ in (8.6.3) and gathering like terms, we find that

$$\begin{aligned}
0 &= t^2 y'' + t\,p(t)\, y' + q(t)\, y \\
  &= \sum_{k=0}^{\infty} (k+r)(k+r-1)\, b_k t^{k+r} + (p_0 + p_1 t + p_2 t^2 + \cdots + p_n t^n + \cdots) \sum_{k=0}^{\infty} (k+r)\, b_k t^{k+r} \\
  &\quad + (q_0 + q_1 t + q_2 t^2 + \cdots + q_n t^n + \cdots) \sum_{k=0}^{\infty} b_k t^{k+r} \\
  &= t^r \left[ \left( r(r-1) + p_0 r + q_0 \right) b_0 + c_1 t + c_2 t^2 + \cdots \right]
\end{aligned} \tag{8.6.7}$$

where the general term cn depends on n and all earlier coefﬁcients for each n ≥ 1. A general formula for cn turns out to be complicated and not particularly useful for the examples we wish to study, so we choose not to derive such a formula.


The most important conclusion to draw from (8.6.7) comes from the fact that each coefficient of the general power series expansion must be zero, so that since $b_0 \ne 0$,

$$r(r-1) + p_0 r + q_0 = 0 \tag{8.6.8}$$

Equation (8.6.8) is called the indicial equation for the Method of Frobenius. Note that this equation is quadratic in $r$; its two roots are the values of $r$ that are used in (8.6.2). At this point, it is useful for us to turn our attention to two specific examples of the Method of Frobenius at work.

Example 8.6.1 Find a Frobenius series solution for the Bessel–Clifford equation

$$t^2 y'' + (1-a)\, t y' + t y = 0 \tag{8.6.9}$$

where $a$ is a constant.

Solution. With $a$ being a constant, we have $p(t) = 1 - a$, so in the series expansion for $p$, $p_0 = 1 - a$. Moreover, $q(t) = t$, so $q_0 = 0$. Thus, for the given DE the indicial equation is

$$r(r-1) + (1-a)\, r = 0$$

Rearranging, we see that $r(r - 1 + 1 - a) = r(r - a) = 0$, and thus the roots of the indicial equation are $r = 0$ and $r = a$. In the case that $r = 0$, the Method of Frobenius provides an analytic solution to (8.6.9) of the form

$$y_1 = \sum_{k=0}^{\infty} b_k t^k$$

Dividing both sides of (8.6.9) by $t$ and substituting this expression for $y$ using the standard series methods we have already discussed, it follows that

$$\sum_{k=0}^{\infty} \left[ (k+1)(k+1-a)\, b_{k+1} + b_k \right] t^k = 0$$

from which we obtain the recurrence relation

$$b_{k+1} = \frac{-1}{(k+1)(k+1-a)}\, b_k \tag{8.6.10}$$

It follows from (8.6.10) that the closed form expression for $b_k$ is

$$b_k = \frac{(-1)^k\, b_0}{k!\,(1-a)(2-a) \cdots (k-a)}, \quad k \ge 1$$

so we find that

$$y_1(t) = b_0 \left[ 1 + \sum_{k=1}^{\infty} \frac{(-1)^k}{k!\,(1-a)(2-a) \cdots (k-a)}\, t^k \right] \tag{8.6.11}$$

which is valid for all $t$ provided that $a \ne 1, 2, \ldots$. Note that from this recurrence relation, every $b_n$ is a function of $b_0$, and thus there cannot be two linearly


independent solutions to the Bessel–Clifford equation that are analytic at 0. Indeed, every solution linearly independent of $y_1(t)$ must be singular at 0. And while the equation has a singular point at the origin, there is an analytic solution there for every $a$ except when $a$ is a positive integer.

We now turn to the other root of the indicial equation in search of a second solution to the Bessel–Clifford equation. Using $r = a$, we have

$$t\,y(t) = \sum_{k=0}^{\infty} b_k t^{k+a+1}$$

$$(1-a)\, t\, y'(t) = \sum_{k=0}^{\infty} (1-a)(k+a)\, b_k t^{k+a}$$

$$t^2 y''(t) = \sum_{k=0}^{\infty} (k+a)(k+a-1)\, b_k t^{k+a}$$

Adding these equations forms the left side of the differential equation we aspire to solve; doing so and simplifying, we find that

$$0 = t^2 y''(t) + (1-a)\, t\, y'(t) + t\, y(t) = \sum_{k=0}^{\infty} k(k+a)\, b_k t^{k+a} + \sum_{k=0}^{\infty} b_k t^{k+a+1}$$

Since the first term in the first sum is zero, if we adjust the index of the summation in the second sum and combine, we have

$$\sum_{k=1}^{\infty} \left[ k(k+a)\, b_k + b_{k-1} \right] t^{k+a} = 0$$

from which it follows that

$$k(k+a)\, b_k + b_{k-1} = 0, \quad k \ge 1$$

This standard recurrence relation can be solved to write every $b_k$ in terms of $b_0$. Indeed, we see

$$b_k = \frac{(-1)^k\, b_0}{k!\,(1+a)(2+a) \cdots (k+a)}, \quad k \ge 1$$

so that the Frobenius series representation of the solution is

$$y_2(t) = b_0\, t^a \left[ 1 + \sum_{k=1}^{\infty} \frac{(-1)^k}{k!\,(1+a)(2+a) \cdots (k+a)}\, t^k \right] \tag{8.6.12}$$

We close this example with a few important observations. First, if $a = 0$, then the Frobenius solution $y_2(t)$ is identical to the earlier obtained $y_1(t)$. Moreover, if $a$ is a non-negative integer, then the Method of Frobenius produces a Taylor series expansion that is analytic at $t = 0$. Thus, the cases for a valid analytic solution excluded by our approach in finding $y_1(t)$ are here reconciled. Finally, if $a$ is not an integer, then $y_2(t)$ is singular at $t = 0$ and, together with the analytic $y_1(t)$ given by (8.6.11), we have found a linearly independent set of solutions for the Bessel–Clifford equation valid for $t > 0$.

To complete this section, we consider a second example.

Example 8.6.2 Find a Frobenius series solution of Bessel's equation,

$$t^2 y'' + t y' + (t^2 - \lambda^2)\, y = 0 \tag{8.6.13}$$

Solution. In section 8.5.3, we derived a solution to (8.6.13) in the case where $\lambda$ is an integer. Thus, in what follows we assume that $\lambda > 0$ is not an integer. Since $p(t) = 1$ and $q(t) = -\lambda^2 + t^2$, we have $p_0 = 1$ and $q_0 = -\lambda^2$, which tells us that the indicial equation is

$$r(r-1) + r - \lambda^2 = r^2 - \lambda^2 = 0$$

Thus, $r = \pm\lambda$. Choosing $r = \lambda$ and using (8.6.4), (8.6.5), and (8.6.6), we find that the three relevant series for the differential equation (8.6.13) are

$$(t^2 - \lambda^2)\, y(t) = \sum_{k=0}^{\infty} b_k t^{k+\lambda+2} - \sum_{k=0}^{\infty} \lambda^2 b_k t^{k+\lambda}$$

$$t\, y'(t) = \sum_{k=0}^{\infty} (k+\lambda)\, b_k t^{k+\lambda}$$

$$t^2 y''(t) = \sum_{k=0}^{\infty} (k+\lambda)(k+\lambda-1)\, b_k t^{k+\lambda}$$

From the form of Bessel's equation, the sum of these three expressions vanishes; adding and simplifying with $(k+\lambda)(k+\lambda-1) + (k+\lambda) - \lambda^2 = k(k+2\lambda)$, we observe that

$$\sum_{k=0}^{\infty} k(k+2\lambda)\, b_k t^{k+\lambda} + \sum_{k=0}^{\infty} b_k t^{k+\lambda+2} = 0$$

To combine the sums, we step up the index in the second summation by 2 and find

$$(1+2\lambda)\, b_1 t^{1+\lambda} + \sum_{k=2}^{\infty} \left[ k(k+2\lambda)\, b_k + b_{k-2} \right] t^{k+\lambda} = 0$$

So, $(1+2\lambda)\, b_1 = 0$, and

$$k(k+2\lambda)\, b_k + b_{k-2} = 0, \quad k \ge 2 \tag{8.6.14}$$

One solution to this recurrence relation is obtained by setting $b_0 = 1$ and $b_1 = 0$. Then, since we are assuming that $\lambda$ is not an integer and $b_1 = 0$, (8.6.14) implies that all odd-subscripted coefficients are zero and that

$$b_k = \frac{-1}{k(2\lambda + k)}\, b_{k-2}, \quad k = 2, 4, \ldots$$

Therefore, it follows that in closed form we have

$$b_{2k} = \frac{(-1)^k\, 2^{-2k}}{k!\,(1+\lambda)(2+\lambda) \cdots (k+\lambda)}$$

and thus a Frobenius solution to the Bessel equation is

$$y(t) = t^{\lambda} + \sum_{k=1}^{\infty} \frac{(-1)^k\, 2^{-2k}}{k!\,(1+\lambda)(2+\lambda) \cdots (k+\lambda)}\, t^{2k+\lambda}$$
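As a numerical sanity check on this Frobenius solution, the sketch below sums the series for $t > 0$ by building each term from the previous one via (8.6.14). The comparison value relies on a fact from outside the text: for $\lambda = 1/2$ the series sums to $\sin t / \sqrt{t}$. The function name `frobenius_bessel` and the cutoff of 30 terms are illustrative assumptions.

```python
import math

def frobenius_bessel(lam, t, terms=30):
    """Partial sum of the Frobenius solution with b_0 = 1, for t > 0 and lam > 0."""
    term = 1.0    # k = 0 term of the series after factoring out t**lam
    total = 1.0
    for k in range(1, terms):
        # (8.6.14) with index 2k: b_{2k} = -b_{2k-2} / (2k (2 lam + 2k)) = -b_{2k-2} / (4 k (k + lam))
        term *= -(t * t) / (4.0 * k * (k + lam))
        total += term
    return t ** lam * total

# For lam = 1/2 the series is expected to match sin(t) / sqrt(t) (outside knowledge).
for t in (0.5, 1.0, 2.0, 5.0):
    print(t, frobenius_bessel(0.5, t), math.sin(t) / math.sqrt(t))
```

Updating `term` multiplicatively avoids computing large factorials and powers separately, and mirrors how the recurrence generates each coefficient from its predecessor.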

Note that since $\lambda > 0$, the ratio test can be applied to show that this series converges for all values of $t$.

A more detailed study of the Method of Frobenius is beyond the scope of this text. (For further discussion, see Potter and Goldberg, Mathematical Methods, second edition, Great Lakes Press 1995.)

Exercises 8.6

In exercises 1–10, find the indicial equation and use the root that either is not an integer or that is the larger integer to find the first three nonzero coefficients in a Frobenius series solution to the given DE.

1. $2t^2 y'' - ty' + (1 + t)y = 0$
2. $2ty'' + y' + ty = 0$
3. $ty'' + (t - 2)y' + y = 0$
4. $2ty'' + (1 + 4t)y' + y = 0$
5. $t^2 y'' - t(t + 5)y' + (t + 5)y = 0$
6. $2t^2 y'' - ty' + (t - 5)y = 0$
7. $4t^2 y'' + 6ty' + (t - 2)y = 0$
8. $2ty'' + (1 - t)y' - y = 0$
9. $t^2 y'' + ty' + (t - 3)y = 0$
10. $3t^2 y'' - ty' - 4y = 0$
11. Find the indicial equation for the Cauchy–Euler equation $t^2 y'' + pty' + qy = 0$
12. Show that the roots of the indicial equation are equal for the Laguerre equation $ty'' + (1 - t)y' + qy = 0$


8.7 For further study

8.7.1 Taylor series for first-order differential equations

Let $y(t) = \sum_{n=0}^{\infty} a_n t^n$ be the Taylor series of a solution of

$$t y' + \lambda y = f(t) \tag{8.7.1}$$

where $\lambda$ is constant and $f(t) = \sum_{n=0}^{\infty} f_n t^n$.

(a) Show that

$$y(t) = \sum_{n=0}^{\infty} \frac{f_n}{n+\lambda}\, t^n$$

(b) In terms of the infinite series derived in (a), what is the general solution to (8.7.1)?

(c) Using series expansions appropriately and your work in (a), determine the general solution to each of the following DEs.
(i) $ty' + 2y = e^t$
(ii) $ty' + 3y = \sin t$
(iii) $ty' + 4y = \arctan t$

(d) Show that

$$\sum_{n=0}^{\infty} \frac{f_n}{n+\lambda}\, t^n = t^{-\lambda} \sum_{n=0}^{\infty} \frac{f_n}{n+\lambda}\, t^{n+\lambda} = t^{-\lambda} \sum_{n=0}^{\infty} f_n \int_0^t x^{n+\lambda-1}\, dx = t^{-\lambda} \int_0^t x^{\lambda-1} \sum_{n=0}^{\infty} f_n x^n\, dx = t^{-\lambda} \int_0^t x^{\lambda-1} f(x)\, dx$$

(e) Substitute directly in (8.7.1) to show that

$$y(t) = t^{-\lambda} \int_0^t x^{\lambda-1} f(x)\, dx$$

is indeed a solution.

(f) Solve (8.7.1) by use of an integrating factor (see section 2.3) and compare your result to $y(t)$ as given in (e).

8.7.2 The Gamma function

The Gamma function $\Gamma(x)$, like Bessel functions and families of orthogonal polynomials, is a special function that plays an important role in many areas of mathematics. The Gamma function is defined by

$$\Gamma(s+1) = \int_0^{\infty} e^{-t}\, t^s\, dt, \quad s > -1 \tag{8.7.2}$$

(a) Show that $\Gamma(1) = 1$.

(b) Use integration by parts to show that $\Gamma(s+1) = s\,\Gamma(s)$.

(c) Show that if $s$ is a positive integer, then $\Gamma(s+1) = s!$.

(d) Let $r > 0$ be given and recall that $\mathcal{L}[t^r] = \int_0^{\infty} e^{-st}\, t^r\, dt$. Hence show that

$$\mathcal{L}[t^{r-1}] = \frac{\Gamma(r)}{s^r}$$

(e) Show that

$$\Gamma\!\left(\tfrac{1}{2}\right) = 2 \int_0^{\infty} e^{-x^2}\, dx = \sqrt{\pi}$$

(f) Use (b) to show that

$$h^n\, \frac{\Gamma(n + x/h)}{\Gamma(x/h)} = x(x+h)(x+2h) \cdots (x + (n-1)h)$$

Hence, show that

$$1 \cdot 3 \cdot 5 \cdots (2n-1) = 2^n\, \frac{\Gamma(n + 1/2)}{\Gamma(1/2)} = \frac{2^n}{\sqrt{\pi}}\, \Gamma(n + 1/2)$$

(g) Finally, explain why $1 \cdot 3 \cdot 5 \cdots (2n-1) = (2n)!/(2^n\, n!)$ and therefore show

$$\Gamma\!\left(n + \tfrac{1}{2}\right) = \frac{(2n)!}{2^{2n}\, n!}\, \sqrt{\pi}$$
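Before attempting the proofs, it can be reassuring to see the identities hold numerically. The sketch below is only a spot check and does not replace the arguments the exercises ask for; it compares Python's built-in `math.gamma` against the right-hand side of the identity in (g) for small $n$. The helper name `gamma_half_integer` is illustrative.

```python
import math

def gamma_half_integer(n):
    """Right-hand side of the identity in (g): (2n)! sqrt(pi) / (2^(2n) n!)."""
    return math.factorial(2 * n) * math.sqrt(math.pi) / (2 ** (2 * n) * math.factorial(n))

# Compare against the library Gamma function at n + 1/2.
for n in range(6):
    print(n, math.gamma(n + 0.5), gamma_half_integer(n))

# Spot check of Gamma(s + 1) = s! for a small positive integer s.
print(math.gamma(5), math.factorial(4))
```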

A Review of integration techniques

Several standard solution techniques for differential equations require us to integrate functions. Here we brieﬂy review some fundamentals from calculus.

u-substitution

For integrals of the form

$$\int f(g(t))\, g'(t)\, dt$$

we can evaluate the integral by undoing the chain rule through a change of variables. Letting $u = g(t)$, it follows $du = g'(t)\, dt$, and thus

$$\int f(g(t))\, g'(t)\, dt = \int f(u)\, du$$

If we can evaluate the new, simpler integral in $u$, all that remains is to substitute back to the variable $t$. For instance, to evaluate $\int t \sin t^2\, dt$ we let $u = t^2$ and $du = 2t\, dt$. We note that $t\, dt = \frac{1}{2}\, du$. Thus, substituting for $t^2$ and $t\, dt$, we find that the given integral is equivalently

$$\frac{1}{2} \int \sin u\, du$$

Evaluating the integral in $u$ and substituting back to $t$,

$$\int t \sin t^2\, dt = \frac{1}{2} \int \sin u\, du = -\frac{1}{2} \cos u + C = -\frac{1}{2} \cos t^2 + C$$
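One way to gain confidence in an antiderivative found by substitution is to differentiate it numerically and compare with the integrand. The sketch below does this for the example above with a symmetric difference quotient; the step size `h` and the function names are arbitrary choices of ours.

```python
import math

def antiderivative(t):
    """Candidate antiderivative from the example: -cos(t^2) / 2 (taking C = 0)."""
    return -0.5 * math.cos(t * t)

def integrand(t):
    """The original integrand t * sin(t^2)."""
    return t * math.sin(t * t)

def central_difference(f, t, h=1e-5):
    """Symmetric difference quotient approximating f'(t)."""
    return (f(t + h) - f(t - h)) / (2.0 * h)

# The two columns should agree to several decimal places at each sample point.
for t in (0.3, 1.0, 2.0):
    print(t, central_difference(antiderivative, t), integrand(t))
```

Agreement at a few points does not prove the formula, but a mismatch would immediately expose an algebra slip such as a lost factor of $\frac{1}{2}$.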


Overall, u-substitution is particularly relevant for working with composite functions. In attempting to use u-substitution, we should search the integrand for an inside function, and then hope that its derivative (up to a constant multiple) is present outside the composite function.

Examples for further practice:

1. $\int t e^{-t^2}\, dt$
2. $\int t^{21} (4t^{22} - 13)^{20}\, dt$
3. $\int 6 e^{1/t} \cdot t^{-2}\, dt$
4. $\int \dfrac{\sin t}{1 + \cos^2 t}\, dt$
5. $\int (\sin t)^3\, dt$ Hint: $\sin^2 t = 1 - \cos^2 t$.

Integration by parts

As u-substitution is used to undo the chain rule, integration by parts undoes the product rule. It is particularly applicable to integrals that involve products of basic functions such as $\int t e^t\, dt$. Recall that the product rule states

$$\frac{d}{dt} \left[ u(t)\, v(t) \right] = u(t)\, v'(t) + v(t)\, u'(t) \tag{A.1}$$

Integrating both sides of (A.1), it follows that

$$u(t)\, v(t) = \int u(t)\, v'(t)\, dt + \int v(t)\, u'(t)\, dt \tag{A.2}$$

Solving for $\int u(t)\, v'(t)\, dt$, we have

$$\int u(t)\, v'(t)\, dt = u(t)\, v(t) - \int v(t)\, u'(t)\, dt \tag{A.3}$$

Writing $dv = v'(t)\, dt$ and $du = u'(t)\, dt$ and suppressing the presence of $t$, we see in (A.3) the standard statement of the integration by parts rule:

$$\int u\, dv = uv - \int v\, du \tag{A.4}$$

For example, let's evaluate $\int t e^t\, dt$. Letting $u = t$ and $dv = e^t\, dt$, we observe that $du = dt$ and $v = e^t$. Thus, integrating by parts,

$$\int t e^t\, dt = t e^t - \int e^t\, dt = t e^t - e^t + C$$
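The result of the worked example can be compared against a numerical approximation of the definite integral over a test interval. The sketch below uses composite Simpson's rule on $[0, 1]$; the interval, the number of subintervals, and the function names are assumptions made for illustration.

```python
import math

def simpson(f, a, b, n=200):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior points alternate between weight 4 (odd index) and weight 2 (even index).
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def by_parts_antiderivative(t):
    """t e^t - e^t, the antiderivative obtained above (taking C = 0)."""
    return t * math.exp(t) - math.exp(t)

numeric = simpson(lambda t: t * math.exp(t), 0.0, 1.0)
exact = by_parts_antiderivative(1.0) - by_parts_antiderivative(0.0)
print(numeric, exact)
```

If the two printed numbers disagreed beyond the expected quadrature error, the most likely culprit would be a sign error in the $-\int v\, du$ term.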


A good way to think of integration by parts is to view it as integrating the product $u\, dv$ by trading $u$ for its derivative and trading $dv$ for its antiderivative. In particular, once we have decided to use integration by parts, we must make appropriate choices for $u$ and $dv$. One guideline is that $dv$ should be fairly easy to antidifferentiate; another is that the derivative of $u$ should not be significantly more complicated than $u$ itself. Overall, we generally want the integral of $v\, du$ to be somehow simpler (or at least not more complicated) than the integral of $u\, dv$.

Examples for further practice:

1. $\int t^4 \ln t\, dt$
2. $\int 5t \sin t\, dt$
3. $\int 3t e^{2t}\, dt$
4. $\int t \sqrt{7t + 5}\, dt$
5. $\int \ln t\, dt$ Hint: Try $dv = 1\, dt$.
6. $\int t^2 e^t\, dt$
7. $\int e^t \cos t\, dt$

Partial fractions

A remarkable fact is that any rational function (that is, any quotient of two polynomials) may be integrated. The standard method for approaching an integration problem of the form

$$\int \frac{p(t)}{q(t)}\, dt$$

is the technique known as partial fractions. It is necessary to assume (or apply long division so) that the degree of $p$ is less than the degree of $q$. While partial fractions is an important technique for integration, it is also a useful tool in its own right. For example, we frequently use it when working with the Laplace transform; see sections 5.5 and 5.6. The method is best understood through a sequence of examples.

Example A.1 Evaluate the integral

$$\int \frac{t}{t^2 + 5t + 6}\, dt \tag{A.5}$$


Solution. Factoring the denominator of the integrand, we can write

$$\frac{t}{t^2 + 5t + 6} = \frac{t}{(t+2)(t+3)} \tag{A.6}$$

If we view the right-hand side as the result of adding two simpler fractions, we can make the reasonable assumption that two fractions of the form $A/(t+2)$ and $B/(t+3)$ had to be combined by getting a common denominator to form (A.6). Thus we assume

$$\frac{t}{(t+2)(t+3)} = \frac{A}{t+2} + \frac{B}{t+3} \tag{A.7}$$

and seek values of $A$ and $B$ which make this relationship hold for all values of $t$. Multiplying both sides of (A.7) by $(t+2)(t+3)$, we find

$$t = A(t+3) + B(t+2) \tag{A.8}$$

Since (A.8) must be valid for every value of $t$, we can choose $t$-values that make it especially easy to identify $A$ and $B$. Choosing $t = -2$, we see that $-2 = A(-2 + 3) = A$. Choosing $t = -3$, it follows $-3 = B(-3 + 2)$, so $B = 3$. Thus, we have determined

$$\frac{t}{(t+2)(t+3)} = -\frac{2}{t+2} + \frac{3}{t+3} \tag{A.9}$$

Having completed the partial fraction decomposition, we can now integrate. In particular,

$$\int \frac{t}{t^2 + 5t + 6}\, dt = \int \left( -\frac{2}{t+2} + \frac{3}{t+3} \right) dt = -2 \ln(t+2) + 3 \ln(t+3) + C$$
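The trick of choosing $t$ equal to each root in turn generalizes to any denominator with distinct linear factors: evaluating at $t = r_i$ leaves $A_i = p(r_i) / \prod_{j \ne i} (r_i - r_j)$. The sketch below implements that formula with exact fractions. It assumes a monic denominator $\prod (t - r_i)$ with distinct roots and a numerator of lower degree; the name `cover_up` is ours.

```python
from fractions import Fraction

def cover_up(numerator, roots):
    """Coefficients A_i in p(t) / prod(t - r_i) = sum of A_i / (t - r_i), distinct roots only.

    `numerator` is a function evaluating p(t); its degree must be below len(roots).
    """
    coeffs = []
    for i, r in enumerate(roots):
        denom = Fraction(1)
        for j, s in enumerate(roots):
            if j != i:
                denom *= (r - s)   # product of (r_i - r_j) over the other roots
        coeffs.append(Fraction(numerator(r)) / denom)
    return coeffs

# Example A.1: t / ((t + 2)(t + 3)) has roots -2 and -3.
print(cover_up(lambda t: t, [Fraction(-2), Fraction(-3)]))
```

The returned list is ordered like `roots`, so the first entry is the coefficient over $t + 2$ and the second is the coefficient over $t + 3$.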

The approach of example A.1 works any time the denominator $q(t)$ can be written as a product of distinct linear terms. That is, if $q(t) = (t - r_1)(t - r_2) \cdots (t - r_n)$, then we can write

$$\frac{p(t)}{q(t)} = \frac{A_1}{t - r_1} + \frac{A_2}{t - r_2} + \cdots + \frac{A_n}{t - r_n}$$

and use algebra similar to our work above to determine $A_1, \ldots, A_n$.

Example A.2 Evaluate the integral

$$\int \frac{t^2 - 4}{t^3 + t^2}\, dt$$

Solution. Factoring the denominator of the integrand, we have

$$\frac{t^2 - 4}{t^3 + t^2} = \frac{t^2 - 4}{t^2 (t+1)}$$


If we think of the possible simpler fractions from which the given one can arise, we see that it is possible for terms of the form

$$\frac{A}{t}, \quad \frac{B}{t^2}, \quad \text{and} \quad \frac{C}{t+1}$$

to be present. In particular, we must include $A/t$ since this denominator is included in the necessary $B/t^2$. Thus we write

$$\frac{t^2 - 4}{t^2 (t+1)} = \frac{A}{t} + \frac{B}{t^2} + \frac{C}{t+1} \tag{A.10}$$

Multiplying both sides of (A.10) by the least common denominator $t^2 (t+1)$, we find

$$t^2 - 4 = At(t+1) + B(t+1) + Ct^2$$

Setting $t = 0$ implies $-4 = B$; using $t = -1$ shows $-3 = C$. To find $A$, we may use any other value of $t$, along with the established values of $B$ and $C$. With $t = 1$,

$$-3 = A(1)(2) + (-4)(2) + (-3)1^2$$

and therefore $A = 4$. We now apply the partial fractions decomposition and integrate:

$$\int \frac{t^2 - 4}{t^3 + t^2}\, dt = \int \left( \frac{4}{t} - \frac{4}{t^2} - \frac{3}{t+1} \right) dt = 4 \ln t + 4t^{-1} - 3 \ln(t+1) + C$$
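A decomposition with a repeated factor is easy to get subtly wrong, so a quick exact check is worthwhile. The sketch below evaluates both sides of (A.10) at a handful of rational sample points using the constants found above; the sample points and function names are arbitrary choices for illustration.

```python
from fractions import Fraction

def original(t):
    """The integrand (t^2 - 4) / (t^3 + t^2)."""
    return Fraction(t * t - 4) / (t ** 3 + t ** 2)

def decomposition(t, A=4, B=-4, C=-3):
    """A/t + B/t^2 + C/(t + 1) with the constants found in Example A.2."""
    return Fraction(A) / t + Fraction(B) / (t * t) + Fraction(C) / (t + 1)

# Avoid t = 0 and t = -1, where both sides are undefined.
sample_points = [Fraction(1), Fraction(2), Fraction(-3), Fraction(5, 7)]
agree = all(original(t) == decomposition(t) for t in sample_points)
print(agree)
```

Since both sides share the denominator $t^2 (t+1)$ and their numerators have degree at most 2, agreement at three or more points already forces the two expressions to coincide.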

In any rational function where the denominator contains a repeated factor, we use a similar form of partial fraction decomposition. For instance,

$$\frac{t^3 - 2t + 1}{(t+4)^3 (t-2)^2 (t-5)} = \frac{A}{t+4} + \frac{B}{(t+4)^2} + \frac{C}{(t+4)^3} + \frac{D}{t-2} + \frac{E}{(t-2)^2} + \frac{F}{t-5}$$

so that each repeated factor is represented once for each possible order up to the highest power.

Example A.3 Evaluate the integral

$$\int \frac{t - 5}{t^3 + t}\, dt$$

Solution. When we factor the integrand, we observe that a quadratic


Contents

Introduction

1 Essentials of linear algebra
  1.1 Motivating problems
  1.2 Systems of linear equations
    1.2.1 Row reduction using Maple
  1.3 Linear combinations
    1.3.1 Markov chains: an application of matrix-vector multiplication
    1.3.2 Matrix products using Maple
  1.4 The span of a set of vectors
  1.5 Systems of linear equations revisited
  1.6 Linear independence
  1.7 Matrix algebra
    1.7.1 Matrix algebra using Maple
  1.8 The inverse of a matrix
    1.8.1 Computer graphics
    1.8.2 Matrix inverses using Maple
  1.9 The determinant of a matrix
    1.9.1 Determinants using Maple
  1.10 The eigenvalue problem
    1.10.1 Markov chains, eigenvectors, and Google
    1.10.2 Using Maple to find eigenvalues and eigenvectors
  1.11 Generalized vectors
  1.12 Bases and dimension in vector spaces
  1.13 For further study
    1.13.1 Computer graphics: geometry and linear algebra at work
    1.13.2 Bézier curves
    1.13.3 Discrete dynamical systems

2 First-order differential equations
  2.1 Motivating problems
  2.2 Definitions, notation, and terminology
    2.2.1 Plotting slope fields using Maple
  2.3 Linear first-order differential equations
  2.4 Applications of linear first-order differential equations
    2.4.1 Mixing problems
    2.4.2 Exponential growth and decay
    2.4.3 Newton's law of Cooling
  2.5 Nonlinear first-order differential equations
    2.5.1 Separable equations
    2.5.2 Exact equations
  2.6 Euler's method
    2.6.1 Implementing Euler's method in Excel
  2.7 Applications of nonlinear first-order differential equations
    2.7.1 The logistic equation
    2.7.2 Torricelli's law
  2.8 For further study
    2.8.1 Converting certain second-order DEs to first-order DEs
    2.8.2 How raindrops fall
    2.8.3 Riccati's equation
    2.8.4 Bernoulli's equation

3 Linear systems of differential equations
  3.1 Motivating problems
  3.2 The eigenvalue problem revisited
  3.3 Homogeneous linear first-order systems
  3.4 Systems with all real linearly independent eigenvectors
    3.4.1 Plotting direction fields for systems using Maple
  3.5 When a matrix lacks two real linearly independent eigenvectors
  3.6 Nonhomogeneous systems: undetermined coefficients
  3.7 Nonhomogeneous systems: variation of parameters
    3.7.1 Applying variation of parameters using Maple
  3.8 Applications of linear systems
    3.8.1 Mixing problems
    3.8.2 Spring-mass systems
    3.8.3 RLC circuits
  3.9 For further study
    3.9.1 Diagonalizable matrices and coupled systems
    3.9.2 Matrix exponential

4 Higher order differential equations
  4.1 Motivating equations
  4.2 Homogeneous equations: distinct real roots
  4.3 Homogeneous equations: repeated and complex roots
    4.3.1 Repeated roots
    4.3.2 Complex roots
  4.4 Nonhomogeneous equations
    4.4.1 Undetermined coefficients
    4.4.2 Variation of parameters
  4.5 Forced motion: beats and resonance
  4.6 Higher order linear differential equations
    4.6.1 Solving characteristic equations using Maple
  4.7 For further study
    4.7.1 Damped motion
    4.7.2 Forced oscillations with damping
    4.7.3 The Cauchy–Euler equation
    4.7.4 Companion systems and companion matrices

5 Laplace transforms
  5.1 Motivating problems
  5.2 Laplace transforms: getting started
  5.3 General properties of the Laplace transform
  5.4 Piecewise continuous functions
    5.4.1 The Heaviside function
    5.4.2 The Dirac delta function
    5.4.3 The Heaviside and Dirac functions in Maple
  5.5 Solving IVPs with the Laplace transform
  5.6 More on the inverse Laplace transform
    5.6.1 Laplace transforms and inverse transforms using Maple
  5.7 For further study
    5.7.1 Laplace transforms of infinite series
    5.7.2 Laplace transforms of periodic forcing functions
    5.7.3 Laplace transforms of systems

6 Nonlinear systems of differential equations
  6.1 Motivating problems
  6.2 Graphical behavior of solutions for 2 × 2 nonlinear systems
    6.2.1 Plotting direction fields of nonlinear systems using Maple
  6.3 Linear approximations of nonlinear systems
  6.4 Euler's method for nonlinear systems
    6.4.1 Implementing Euler's method for systems in Excel
  6.5 For further study
    6.5.1 The damped pendulum
    6.5.2 Competitive species

7 Numerical methods for differential equations
  7.1 Motivating problems
  7.2 Beyond Euler's method
    7.2.1 Heun's method
    7.2.2 Modified Euler's method
  7.3 Higher order methods
    7.3.1 Taylor methods
    7.3.2 Runge–Kutta methods
  7.4 Methods for systems and higher order equations
    7.4.1 Euler's method for systems
    7.4.2 Heun's method for systems
    7.4.3 Runge–Kutta method for systems
    7.4.4 Methods for higher order IVPs
  7.5 For further study
    7.5.1 Predator–Prey equations
    7.5.2 Competitive species
    7.5.3 The damped pendulum

8 Series solutions for differential equations
  8.1 Motivating problems
  8.2 A review of Taylor and power series
  8.3 Power series solutions of linear equations
  8.4 Legendre's equation
  8.5 Three important examples
    8.5.1 The Hermite equation
    8.5.2 The Laguerre equation
    8.5.3 The Bessel equation
  8.6 The method of Frobenius
  8.7 For further study
    8.7.1 Taylor series for first-order differential equations
    8.7.2 The Gamma function

Appendix A Review of integration techniques
Appendix B Complex numbers
Appendix C Roots of polynomials
Appendix D Linear transformations
Appendix E Solutions to selected exercises

Index


Introduction

In Differential Equations with Linear Algebra, we endeavor to introduce students to two interesting and important areas of mathematics that enjoy powerful interconnections and applications. Assuming that students have completed a semester of multivariable calculus, the text presents an introduction to critical themes and ideas in linear algebra, and then, in its remaining seven chapters, investigates differential equations while highlighting the role that linearity plays in their study. Throughout the text, we strive to reach the following goals:

• To motivate the study of linear algebra and differential equations through interesting applications in order that students may see how theoretical results can answer fundamental questions that arise in physical situations.
• To demonstrate the fact that linear algebra and differential equations can be presented as two parts of a mathematical whole that is coherent and interconnected. Indeed, we regularly discuss how the structure of solutions to linear differential equations and systems of equations exemplifies important ideas in linear algebra, and how linear algebra often answers key questions regarding differential equations.
• To present an exposition that is intended to be read and understood by students. While certainly every textbook is written with students in mind, often the rigor and formality of standard mathematical presentation takes over, and books become difficult to read. We employ an examples-first philosophy that uses an intuitive approach as a lead-in to more general, theoretical results.

• To develop in students a deep understanding of what may be their first exposure to post-calculus mathematics. In particular, linear algebra is a fundamental subject that plays a key role in the study of much higher level mathematics; through its study, as well as our investigations of differential equations, we aim to provide a foundation for further study in mathematics for students who are so interested.

Whether designed for mathematics or engineering majors, many universities offer a hybrid course in linear algebra and differential equations, and this text is written for precisely such a class. At other institutions, linear algebra and differential equations are treated in two separate courses; in settings where linear algebra is a prerequisite to the study of differential equations, this text may also be used for the differential equations course, with its first chapter on linear algebra available as a review of previously studied material. More details on the ways the book can be implemented in these courses follow shortly in the section How to Use this Text.

An overriding theme of the book is that if a differential equation or system of such equations is linear, then we can usually solve it exactly.

Linear algebra and systems first

In most other texts that present the subjects of differential equations and linear algebra, the presentation begins with first-order differential equations, followed by second- and higher order linear differential equations. Following these topics, a modest amount of linear algebra is introduced before beginning to consider systems of linear differential equations. Here, however, we begin on the very first page of the text with an example that shows the natural way that systems of linear differential equations arise, and use this example to motivate the need to study linear algebra. We then embark on a one-chapter introduction to linear algebra that aims not only to introduce such important concepts as linear combinations, linear independence, and the eigenvalue problem, but also to foreshadow the use of such topics in the study of differential equations.

Following chapter 1, we consider first-order differential equations briefly in chapter 2, using the study of linear first-order equations to highlight some of the key ideas already encountered in linear algebra. From there, we quickly proceed to an in-depth presentation of systems of linear differential equations in chapter 3. In that setting, we show how the eigenvalues of an n × n matrix A naturally provide the general solution to systems of linear differential equations in the form $\mathbf{x}' = A\mathbf{x}$. Moreover, we include examples that show how any single higher order linear differential equation may be converted to a system of equations, thus providing further motivation for why we choose to study systems first. Through this approach, we again strive to emphasize critical connections between linear algebra and differential equations and to demonstrate the most important ideas that arise in the study of each. In the remainder of the text, the role of linear algebra is continually emphasized, even in the study of nonlinear equations and systems.

Features of the text

Instructors and students alike will find several consistent features in the presentation.

• Each chapter begins with one or two motivating problems that present a natural situation, often a physical application, in which linear algebra or differential equations arises. From such problems, we work to develop related ideas in subsequent sections that enable us to solve the original problem. In discussing the motivating problems, we also endeavor to use our intuition to predict the solution(s) we expect to find, and then later test our results against these predictions.
• In almost every section of the text, we use an examples-first approach. By this we mean that we introduce a certain type of problem that we are interested in solving, and then consider a relatively simple one that can be solved by intuition or ideas studied previously. From the solution of an elementary example, we then discuss how this approach can be generalized or modified to solve more complex examples, and then ultimately prove or state theorems that provide general results that enable the solution of a wide range of problems. With this philosophy, we strive to demonstrate how the general theory of mathematics comes from experimenting and investigating through individual examples followed by looking for overall trends. Moreover, we often use this approach to foreshadow upcoming ideas: for example, while studying linear algebra, we look ahead to a handful of fundamental differential equations. Similarly, early on in our investigations of the Laplace transform, we regularly attempt to demonstrate through examples how the transform will be used to solve initial-value problems.
• While there are many formal theoretical results that hold in both linear algebra and differential equations, we have endeavored to emphasize intuition. Specifically, we use the aforementioned examples-first approach to solve sample problems and then present evidence as to why the details of the solution process for a small number of examples can be generalized to an overall structure and theory. This is in contrast to many books that first present the overall theory, and then demonstrate the theory at work in a sequence of subsequent examples. In addition, we often eschew formal proofs, choosing instead to present more heuristic or intuitive arguments that offer evidence of the truth of important theorems.
• Wherever possible, we use visual reasoning to help explain important ideas. With over 100 graphics included in the text, we have provided figures that help deepen students’ understanding and offer additional perspective on essential concepts. By thinking graphically, we often find that an appropriate picture sheds further light on the solution to a problem and how we should expect it to behave, thus adding to our intuition and understanding.
• With computer algebra systems (CASs), such as Maple and Mathematica, approaching their twentieth year of existence, these technologies are an important part of the landscape of the teaching and learning of mathematics. Especially in more sophisticated subjects with computationally complicated problems, these tools are now indispensable. We have chosen to integrate instructional support for Maple directly within the text, while offering similar commentary for Mathematica, MATLAB, and SAGE on our website, www.oup.com/differentialequations/. For each, students can find directions for how to effectively use computer algebra systems to generate important graphs and execute complicated or tedious calculations. Many sections of the text are followed by a short subsection on “Using Maple to . . ..” Parallel sections for the other CASs, numbered similarly, can be found on the website.
• Each chapter ends with a section titled For further study. In this setting, rather than a full exposition, a sequence of leading questions is presented to guide students to discover some key ideas in more advanced problems that arise naturally from the material developed to date. These sections can be used as a basis for instructor-led in-class discussions or as the foundation for student projects or other assignments. Interested students can also pursue these topics on their own.

How to use this text

There are two courses for which this text is well-suited: a hybrid course in linear algebra and differential equations, or a course in differential equations that requires linear algebra as a prerequisite. We address each course separately with some suggestions for instructors.

Linear algebra and differential equations

For a hybrid course in the two subjects, instructors should begin with chapter 1 on linear algebra. There, in addition to an introduction to many essential ideas in the subject, students will encounter a handful of examples on linear differential equations that foreshadow part of the role of linear algebra in the field of differential equations. The goal of the chapter on linear algebra is to introduce important ideas such as linear combinations, linear independence and span, matrix algebra, and the eigenvalue problem. At the close of chapter 1 we also introduce abstract vector spaces in anticipation of the structural role that vector spaces play in solving linear systems of differential equations and higher order linear differential equations. Instructors may choose to move on from chapter 1 upon completing section 1.10 (the eigenvalue problem), as this is the last topic that is absolutely essential for the solution of linear systems of differential equations in chapter 3. Discussion of ideas like basis, dimension, and vector spaces of functions from the final two sections of chapter 1 can occur alongside the development of general solutions to systems of linear differential equations or higher order linear differential equations.

Over the past decade or two, first-order differential equations have become a standard topic that is normally discussed in calculus courses. As such, chapter 2 can be treated lightly at the instructor’s discretion. In particular, it is reasonable to expect that students are familiar with direction fields, separable differential equations, Euler’s method, and several fundamental applications, such as Newton’s law of cooling and the logistic differential equation. It is less likely that students will have been exposed to integrating factors as a solution technique for linear first-order equations and the solution methods for exact equations. In any case, chapter 2 is not one on which to linger. Instructors can choose to selectively discuss a small number of sections in class, or assign the pages there as a reading assignment or project for independent investigation.

Chapter 3 on systems of linear differential equations is the heart of the text. It can be begun immediately following section 1.10 in chapter 1. Here we find not only a large number of rich ideas that are important throughout the study of differential equations, but also evidence of the essential role that linear algebra plays in the solution of these systems.
As is noted on several occasions in chapter 3, any higher order linear differential equation may be converted to a system of first-order equations, and thus an understanding of systems enables one to solve these higher order equations as well. Thus, the material in chapter 4 may be de-emphasized. Instructors may choose to provide a brief overview, in class, of how the ideas in solving linear systems translate naturally to the higher order case, or may choose to have students investigate these details on their own through a sequence of reading and homework assignments or a group project. Section 4.5 on beats and resonance is one to discuss in class as these phenomena are fascinating and important and the perspective of higher order equations is a more natural context in which to consider their solution.

The Laplace transform is a topic that affords discussion of a variety of important ideas: linear transformations, differentiation and integration, direct solution of initial-value problems, discontinuous forcing functions, and more. In addition, it can be viewed as a gateway to more sophisticated mathematical techniques encountered in more advanced courses in mathematics, physics, and engineering. Chapter 5 is written with the goal of introducing students to the Laplace transform from the perspective of how it can be used to solve initial-value problems. This emphasis is present throughout the chapter, and culminates in section 5.5.

Finally, a course in both linear algebra and differential equations should not be considered complete until there has been at least some discussion of nonlinearity. Chapter 6 on nonlinear higher order equations and systems offers an examination of this concept from several perspectives, all of which are related to our previous work with linear differential equations. Direction fields, approximation by linear systems, and an introduction to numerical approximation with Euler’s method are natural topics with which to round out the course. Due to the time required to introduce the subject of linear algebra to students, the final two chapters of the text (on numerical methods and series solutions) are ones we would normally not expect to be considered in a hybrid course.

Differential equations with a linear algebra prerequisite

For a differential equations course in which students have already taken linear algebra, chapter 1 may be used as a reference for students, or as a source of review as needed. The comments for the hybrid course above for chapters 2–5 hold for a straight differential equations class as well, and we would expect instructors to use the time not devoted to the study of linear algebra to focus more on the material on nonlinearity in chapter 6, numerical methods in chapter 7, and series solutions in chapter 8. The first several sections of chapter 7 may be treated any time after first-order differential equations have been discussed; only the final section in that chapter is devoted to systems and higher order equations where the methods naturally generalize work with first-order equations.

In addition to spending more time on the final three chapters of the text, instructors of a differential equations-only course can take advantage of the many additional topics for consideration in the For further study sections that close each chapter. There is a wide range of subjects from which to choose, both theoretical and applied, including discrete dynamical systems, how raindrops fall, matrix exponentials, companion matrices, Laplace transforms of periodic piecewise continuous forcing functions, and competitive species.

Appendices

Finally, the text closes with five appendices. The first three (on integration techniques, polynomial zeros, and complex numbers) are intended as a review of familiar topics from courses as far back in students’ experience as high school algebra. The instructor can refer to these topics as necessary and encourage students to read them for review. Appendix D is different in that it aims to connect some key ideas in linear algebra and differential equations through a more sophisticated viewpoint: linear transformations of vector spaces. Some of the material there is appropriate for consideration following chapter 1, but it is perhaps more suited to discussion after the Laplace transform has been introduced. Finally, appendix E contains answers to nearly all of the odd-numbered exercises in the text.

Acknowledgments

We are grateful to our institutions for the time and support provided to work on this manuscript; to several anonymous reviewers whose comments have improved it; to our students for their feedback in classroom-testing of the text; and to all students and instructors who choose to use this book. We welcome all comments and suggestions for improvement, while taking full responsibility for any errors or omissions in the text.

Matt Boelkins/J. L. Goldberg/Merle Potter


1 Essentials of linear algebra

1.1 Motivating problems

The subjects of differential equations and linear algebra are particularly important because each finds a wide range of applications in fundamental physical problems. We consider two situations that involve systems of equations to motivate our work in this chapter and much of the remainder of the text.

The pollution of bodies of water is an important issue for humankind. Environmental scientists are particularly interested in systems of rivers and lakes where they can study the flow of a given pollutant from one body of water to another. For example, there is great concern regarding the presence of a variety of pollutants in the Great Lakes (Lakes Michigan, Superior, Huron, Erie, and Ontario), including salt due to snow melt from highways. Due to the large number of possible ways for salt to enter and exit such a system, as well as the many lakes and rivers involved, this problem is mathematically complicated. But we may gain a feel for how one might proceed by considering a simple system of two tanks, say A and B, where there are independent inflows and outflows from each, as well as two pipes with opposite flows connecting the tanks as pictured in figure 1.1.

We will let $x_1$ denote the amount of salt (in grams) in A at time $t$ (in minutes). Since water flows into and out of the tank, and each such flow carries salt, the amount of salt $x_1$ is changing as a function of time. We know from calculus that $dx_1/dt$ measures the rate of change of salt in the tank with respect to time, and is measured in grams per minute. In this basic model, we can see that the rate of change of salt in the tank will be the difference between the net rate of salt flowing in and the net rate of salt flowing out.

Figure 1.1 Two tanks with inflows, outflows, and connecting pipes.

As a simplifying assumption, we will suppose that the volume of solution in each tank remains constant and all inflows and outflows happen at the identical rate of 5 liters per minute. We will further assume that the tanks are uniformly mixed so that the salt concentration in each is identical throughout the tank at a given time $t$. Let us now suppose that the volume of tank A is 200 liters; as we just noted, the pipe flowing into A delivers solution at a rate of 5 liters per minute. Moreover, suppose that this entering water is contaminated with 4 g of salt per liter. An analysis of the units on these quantities shows that the rate of inflow of salt into A is

$$\frac{5\ \text{liters}}{\text{min}} \cdot \frac{4\ \text{g}}{\text{liter}} = 20\ \frac{\text{g}}{\text{min}} \tag{1.1.1}$$

There is one other inflow to consider, that being the pipe from B, which we will consider momentarily after first examining the behavior of the outflow. For the solution exiting the drain from A at a rate of 5 liters/min, observe its concentration is unknown and depends on the amount of salt in the tank at time $t$. In particular, since there are $x_1$ g of salt in the tank at time $t$, and this is distributed over the volume of 200 liters, we can say (using the simplifying assumption that the tank’s contents stay uniformly mixed) that the rate of outflow of salt in each of the exiting pipes is

$$\frac{5\ \text{liters}}{\text{min}} \cdot \frac{x_1\ \text{g}}{200\ \text{liters}} = \frac{x_1}{40}\ \frac{\text{g}}{\text{min}} \tag{1.1.2}$$

Since there are two such exit flows, this means that the combined rate of outflow of salt from A is twice this amount, or $x_1/20$ g/min. Finally, there is one last inflow to consider. Note that solution from B is entering A at a rate of 5 liters per minute. If we assume that B has a (constant) volume of 400 liters, this flow has a salt concentration of $x_2$ g/400 liters. Thus the rate of salt entering A from B is

$$\frac{5\ \text{liters}}{\text{min}} \cdot \frac{x_2\ \text{g}}{400\ \text{liters}} = \frac{x_2}{80}\ \frac{\text{g}}{\text{min}} \tag{1.1.3}$$

Combining the rates of inflow (1.1.1) and (1.1.3) and outflow (1.1.2), where inflows are considered positive and outflows negative, leads us to the differential equation

$$\frac{dx_1}{dt} = 20 + \frac{x_2}{80} - \frac{x_1}{20} \tag{1.1.4}$$

Since we have two tanks in the system, there is a second differential equation to consider. Under the assumptions that B has a volume of 400 liters, the pipe entering B carries a concentration of salt of 7 g/liter, and the net rates of inflow and outflow match those into A, a similar analysis to the above reveals that

$$\frac{dx_2}{dt} = 35 + \frac{x_1}{40} - \frac{x_2}{40} \tag{1.1.5}$$

Together, these two DEs form a system of DEs, given by

$$\begin{aligned} \frac{dx_1}{dt} &= 20 + \frac{x_2}{80} - \frac{x_1}{20} \\ \frac{dx_2}{dt} &= 35 + \frac{x_1}{40} - \frac{x_2}{40} \end{aligned} \tag{1.1.6}$$

Systems of DEs are therefore seen to play a key role in environmental processes. Indeed, they find application in studying the vibrations of mechanical systems, the flow of electricity in circuits, the interactions between predators and prey, and much more. We will begin our examination of the mathematics involved with systems of differential equations in chapter 3.

An important question related to the above system of DEs leads us to a more familiar mathematical situation, one that is the foundation of much of the subject of linear algebra. For the system of tanks above, we might ask, “under what circumstances is the amount of salt in the two tanks not changing?” In such a situation, neither $x_1$ nor $x_2$ varies, so the rate of change of each is zero, and therefore

$$\frac{dx_1}{dt} = \frac{dx_2}{dt} = 0$$

Substituting these values into the system of DEs, we see that this results in the system of linear equations

$$\begin{aligned} 0 &= 20 + \frac{x_2}{80} - \frac{x_1}{20} \\ 0 &= 35 + \frac{x_1}{40} - \frac{x_2}{40} \end{aligned} \tag{1.1.7}$$

Multiplying both sides of the first equation by eighty and the second by forty and rearranging terms, we find an equivalent system to be

$$\begin{aligned} 4x_1 - x_2 &= 1600 \\ x_1 - x_2 &= -1400 \end{aligned}$$

Geometrically, this system of linear equations represents the set of all points that simultaneously lie on each of the two lines given by the respective equations.
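The equilibrium system can also be checked numerically. The following is a minimal sketch, not part of the original text: the function names `solve_2x2` and `euler_tanks` are hypothetical, the closed-form formulas are what one obtains by eliminating one variable at a time, and the forward Euler loop (with an assumed step size of 0.1 minutes) is only a rough preview of how the two tank equations behave over time.

```python
from fractions import Fraction


def solve_2x2(a, b, c, d, e, f):
    """Solve a*x + b*y = e and c*x + d*y = f exactly.

    The formulas are the result of eliminating one variable at a time;
    they require a*d - b*c to be nonzero (the two lines are not parallel).
    """
    a, b, c, d, e, f = (Fraction(v) for v in (a, b, c, d, e, f))
    det = a * d - b * c
    if det == 0:
        raise ValueError("the lines are parallel or identical")
    x = (e * d - b * f) / det
    y = (a * f - e * c) / det
    return x, y


def euler_tanks(x1, x2, dt=0.1, t_end=1000.0):
    """Step the two tank equations forward with forward Euler.

    dt and t_end are in minutes; both are illustrative choices.
    """
    steps = int(round(t_end / dt))
    for _ in range(steps):
        dx1 = 20 + x2 / 80 - x1 / 20   # rate of change of salt in tank A
        dx2 = 35 + x1 / 40 - x2 / 40   # rate of change of salt in tank B
        x1, x2 = x1 + dt * dx1, x2 + dt * dx2
    return x1, x2


if __name__ == "__main__":
    # 4*x1 - x2 = 1600 and x1 - x2 = -1400
    print(solve_2x2(4, -1, 1, -1, 1600, -1400))
    # Two different starting amounts of salt (grams)
    print(euler_tanks(0.0, 0.0))
    print(euler_tanks(5000.0, 0.0))
```

Exact fractions are used for the algebraic solve so that the answer carries no rounding error; the Euler loop uses ordinary floating-point numbers and should only be read as an approximation.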


The solution of such 2 × 2 systems is typically discussed in introductory algebra classes where students learn how to solve systems like these with the methods of substitution and elimination. Doing so here leads to the unique solution $x_1 = 1000$, $x_2 = 2400$; one interpretation of this ordered pair is that the system of two tanks has an equilibrium state where, if the two tanks ever reach this level of salinity, that salinity will then stay constant. With further study of linear algebra and DEs, we will be able to show that over time, regardless of how much salt is initially in each tank, the amount of salt in A will approach 1000 g, while that in B will approach 2400 g. We will thus call the equilibrium point stable.

Electrical circuits are another physical situation where systems of linear equations naturally arise. Flow of electricity through a collection of wires is similar to the flow of water through a sequence of pipes: current measures the flow of electrons (charge carriers) past a given point in the circuit. Typically, we think about a battery as a source that provides a flow of electricity, wires as a collection of paths along which the electricity may flow, and resistors as places in the circuit where electricity is converted to some sort of output such as heat or light. While we will discuss the principles behind the flow of electricity in more detail in section 3.8, for now a basic understanding of Kirchhoff’s laws enables us to see an important application of linear systems of equations.

In a given loop or branch $j$ of a circuit, current is measured in amperes (A) and is denoted by the symbol $I_j$. Resistances are measured in ohms (Ω), and the energy produced by the battery is measured in volts. As shown in figure 1.2, we use arrows in the circuit to represent the direction of flow of the current; when this flow is away from the positive side of a battery (the circles in the diagram), then the voltage is taken to be positive. Otherwise, the voltage is negative.

Figure 1.2 A simple circuit with two loops, two batteries, and four resistors.

Two fundamental laws govern how the currents in various loops of the circuit behave. One is Kirchhoff’s current law, which is essentially a conservation law. It states that the sum of all current flowing into a node equals the sum of the current flowing out. For example, in figure 1.2 at junction a,

$$I_1 + I_3 = I_2 \tag{1.1.8}$$

Similarly, at junction b, we must have $I_2 = I_1 + I_3$. This equation is identical to (1.1.8) and adds no new information about the currents. Ohm’s law governs the flow of electricity through resistors, and states that the voltage drop across a resistor is proportional to the current. That is, $V = IR$, where $R$ is a constant that is the amount of resistance, measured in ohms. For instance, in the circuit given in figure 1.2, the voltage drop through the 3-Ω resistor on the bottom right is $V = 3I_1$. Kirchhoff’s voltage law states that, in any closed loop, the sum of the voltage drops must be zero. Since the battery that is present maintains a constant voltage, it follows that in the bottom loop of the given circuit,

$$4I_1 + 2I_2 + 3I_1 = 5 \tag{1.1.9}$$

Similarly, in the upper loop, we have

$$6I_3 + 2I_2 = 10 \tag{1.1.10}$$

Finally, in the outer loop, taking into account the direction of flow of electricity by regarding opposing flows as having opposing signs, we observe

$$6I_3 - 4I_1 - 3I_1 = -5 + 10 \tag{1.1.11}$$

Taking (1.1.8) through (1.1.11), combining like terms, and rearranging each so that indices are in increasing order, we have the system of linear equations

$$\begin{aligned} I_1 - I_2 + I_3 &= 0 \\ 7I_1 + 2I_2 \phantom{{}+ I_3} &= 5 \\ 2I_2 + 6I_3 &= 10 \\ -7I_1 \phantom{{}+ I_2} + 6I_3 &= 5 \end{aligned} \tag{1.1.12}$$

We will call the system (1.1.12) a 4 × 3 system to represent the fact that it is a collection of four linear equations in three unknown variables. Its solution, the set of all possible values of $(I_1, I_2, I_3)$ that make all four equations simultaneously true, provides the current in each loop of the circuit.

In this first chapter, we will develop our understanding of the more general situation of systems of linear equations with m linear equations in n unknown variables. This problem will lead us to consider important ideas from the theory of matrices that play key roles in a variety of applications ranging from computer graphics to population dynamics; related ideas will find further applications in our subsequent study of systems of differential equations.
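As a quick check that the four equations in (1.1.12) can all hold at once, the sketch below solves the first three equations and then tests the fourth. It is an illustration only: it uses Cramer's rule, a technique swapped in here for brevity rather than one the text has introduced at this point, and the helper names `det3` and `cramer3` are hypothetical.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def cramer3(coeffs, rhs):
    """Solve a 3 x 3 linear system exactly with Cramer's rule."""
    d = det3(coeffs)
    if d == 0:
        raise ValueError("coefficient matrix is singular")
    solution = []
    for col in range(3):
        # Replace one column of the coefficient matrix by the right-hand side.
        m = [row[:] for row in coeffs]
        for r in range(3):
            m[r][col] = rhs[r]
        solution.append(Fraction(det3(m), d))
    return solution


# First three equations of (1.1.12), unknowns ordered as I1, I2, I3.
coeffs = [[1, -1, 1],
          [7, 2, 0],
          [0, 2, 6]]
rhs = [0, 5, 10]

I1, I2, I3 = cramer3(coeffs, rhs)

# The fourth equation, -7*I1 + 6*I3 = 5, is the outer-loop equation.
fourth_holds = (-7 * I1 + 6 * I3 == 5)

if __name__ == "__main__":
    print(I1, I2, I3, fourth_holds)
```

Under these assumptions the fourth equation is satisfied by the solution of the first three, which is consistent with the outer-loop equation being a combination of the two inner-loop equations.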


1.2 Systems of linear equations

Linear equations are the simplest of all possible equations and are involved in many applications of mathematics. In addition, linear equations play a fundamental role in the study of differential equations. As such, the notion of linearity will be a theme throughout this book. Formally, a linear equation in variables $x_1, \ldots, x_n$ is one having the form

$$a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = b \tag{1.2.1}$$

where the coefficients $a_1, \ldots, a_n$ and the value $b$ are real or complex numbers. For example, $2x_1 + 3x_2 - 5x_3 = 7$ is a linear equation, while $x_1^2 + \sin x_2 - x_3 \ln x_1 = 5$ is not. Just as the equation $2x_1 + 3x_2 = 7$ describes a line in the $x_1$–$x_2$ plane, the linear equation $2x_1 + 3x_2 - 5x_3 = 7$ determines a plane in three-dimensional space.

A system of m linear equations in n unknown variables is a collection of m linear equations in n variables, say $x_1, \ldots, x_n$. We often refer to such a system as an “m × n system of equations.” For example,

$$\begin{aligned} x_1 + 2x_2 + x_3 &= 1 \\ x_1 + x_2 + 2x_3 &= 0 \end{aligned} \tag{1.2.2}$$

is a system of two linear equations in three unknown variables. A solution to the system is any point $(x_1, x_2, x_3)$ that makes both equations simultaneously true; the solution set for (1.2.2) is the collection of all such solutions. Geometrically, each of these two equations describes a plane in three-dimensional space, as shown in figure 1.3, and hence the solution set consists of all points that lie on both of the planes. Since the planes are not parallel, we expect this solution set to form a line in $\mathbb{R}^3$. Note that $\mathbb{R}$ denotes the set of all real numbers; $\mathbb{R}^3$ represents familiar three-dimensional Euclidean space, the set of all ordered triples with real entries.

Figure 1.3 The intersection of the planes $x_1 + 2x_2 + x_3 = 1$ and $x_1 + x_2 + 2x_3 = 0$.

The solution set for the system (1.2.2) may be determined using elementary algebraic steps. We say that two systems are equivalent if they share the same solution set. For example, if we multiply both sides of the first equation by −1 and add this to the second equation, we eliminate $x_1$ in the second equation and get the equivalent system

$$\begin{aligned} x_1 + 2x_2 + x_3 &= 1 \\ -x_2 + x_3 &= -1 \end{aligned}$$

Next, we multiply both sides of the second equation by −1 to get

$$\begin{aligned} x_1 + 2x_2 + x_3 &= 1 \\ x_2 - x_3 &= 1 \end{aligned}$$

Finally, if we multiply the second equation by −2 and add it to the first equation, it follows that

$$\begin{aligned} x_1 + 3x_3 &= -1 \\ x_2 - x_3 &= 1 \end{aligned} \tag{1.2.3}$$

This shows that any solution $(x_1, x_2, x_3)$ of the original system must satisfy the (simpler) equivalent system of equations $x_1 = -1 - 3x_3$ and $x_2 = 1 + x_3$. Said differently, any point in $\mathbb{R}^3$ of the form $(-1 - 3x_3,\ 1 + x_3,\ x_3)$, where $x_3 \in \mathbb{R}$ (here the symbol ‘∈’ means “is an element of”), is a solution to the system. Replacing $x_3$ by the parameter $t$, we recognize that the solution to the system is the line parameterized by

$$(-1 - 3t,\ 1 + t,\ t), \quad t \in \mathbb{R} \tag{1.2.4}$$

which is the intersection of the two planes with which we began, as seen in figure 1.3. Note that this shows there are infinitely many solutions to the given system of equations; a particular example of such a solution may be found by selecting any value of $t$ (i.e., any point on the line). We can also check that the resulting point makes both of the original equations true.

It is not hard to see in the 2 × 2 case that any linear system has either no solution (the lines are parallel), a unique solution (the lines intersect once), or infinitely many solutions (the two equations represent the same line).
These three options (no solution, exactly one solution, or infinitely many) turn out to be the only possible cases for any m × n system of linear equations. A system with at least one solution is said to be consistent, while a system with no solution is called inconsistent.

In our work above from (1.2.2) to (1.2.3) in reducing the given system of equations to a simpler equivalent one, it is evident that the coefficients of the system played the key role, while the variables $x_1$, $x_2$, and $x_3$ (and the equals sign) were essentially placeholders. It proves expedient to therefore change notation and collect all of the coefficients into a rectangular array (called a matrix) and eliminate the redundancy of repeatedly writing the variables. Let us reconsider our above work in this light, where we will now refer to rows in the coefficient matrix rather than equations in the original system. When we create a right-most column consisting of the constants from the right-hand side of each equation, we often say we have an augmented matrix. From the ‘simplest’ version of the system at (1.2.3), the corresponding augmented matrix is

$$\begin{bmatrix} 1 & 0 & 3 & -1 \\ 0 & 1 & -1 & 1 \end{bmatrix}$$

The 0’s represent variables that have been eliminated in each equation. From this, we see that our goal in working with a matrix that represents a system of equations is essentially to introduce as many zeros as possible through operations that do not change the solution set of the system. We now repeat the exact same steps we took with the system above, but translate our operations to be on the matrix, rather than the equations themselves. We begin with the augmented matrix

$$\begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & 2 & 0 \end{bmatrix}$$

To introduce a zero in the bottom left corner, we add −1 times the first row to the second row, to yield a new row 2 and the updated matrix

$$\begin{bmatrix} 1 & 2 & 1 & 1 \\ 0 & -1 & 1 & -1 \end{bmatrix}$$

The ‘0’ in the second entry of the first column shows that we have eliminated the presence of the $x_1$ variable in the second equation. Next, we can multiply row 2 by −1 to obtain an updated row 2 and the augmented matrix

$$\begin{bmatrix} 1 & 2 & 1 & 1 \\ 0 & 1 & -1 & 1 \end{bmatrix}$$

Finally, if we multiply row 2 by −2 and add this to row 1, we find a new row 1 and the matrix

$$\begin{bmatrix} 1 & 0 & 3 & -1 \\ 0 & 1 & -1 & 1 \end{bmatrix}$$

At this point, we have introduced as many zeros as possible,¹ and have arrived at our goal of the simplest possible equivalent system. We can reinterpret the matrix as a system of equations: the first row implies that $x_1 + 3x_3 = -1$, while the second row implies $x_2 - x_3 = 1$. This leads us to find, as we did above, that any solution $(x_1, x_2, x_3)$ of the original system must be of the form $(-1 - 3x_3,\ 1 + x_3,\ x_3)$, where $x_3 \in \mathbb{R}$.

¹ Any additional row operations to introduce zeros in the third or fourth columns will replace the zeros in columns 1 or 2 with nonzero entries.
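The three row operations used on the augmented matrix for (1.2.2) can be replayed in a few lines of code. This is a minimal sketch under stated assumptions: the helper names `add_multiple`, `scale`, and `on_both_planes` are hypothetical, rows are indexed from 0, and exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction


def add_multiple(m, target, source, factor):
    """Replace row `target` by itself plus `factor` times row `source`."""
    m[target] = [t + factor * s for t, s in zip(m[target], m[source])]


def scale(m, row, factor):
    """Multiply every entry of one row by a fixed nonzero constant."""
    if factor == 0:
        raise ValueError("scaling by zero would change the solution set")
    m[row] = [factor * x for x in m[row]]


# Augmented matrix for x1 + 2*x2 + x3 = 1 and x1 + x2 + 2*x3 = 0.
aug = [[Fraction(v) for v in row] for row in ([1, 2, 1, 1], [1, 1, 2, 0])]

add_multiple(aug, 1, 0, -1)   # add -1 times row 1 to row 2
scale(aug, 1, -1)             # multiply row 2 by -1
add_multiple(aug, 0, 1, -2)   # add -2 times row 2 to row 1


def on_both_planes(t):
    """Check that the point (-1 - 3t, 1 + t, t) satisfies both equations."""
    x1, x2, x3 = -1 - 3 * t, 1 + t, t
    return x1 + 2 * x2 + x3 == 1 and x1 + x2 + 2 * x3 == 0


if __name__ == "__main__":
    print(aug)
    print(all(on_both_planes(t) for t in range(-5, 6)))
```

Checking a handful of integer values of t does not prove the parameterization, but it is a cheap way to catch a sign error in the reduction.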


We will commonly need to refer to the number of rows and columns in a matrix. For example, the matrix

\[
\begin{bmatrix} 1 & 0 & 3 & -1 \\ 0 & 1 & -1 & 1 \end{bmatrix}
\]

has two rows and four columns; therefore, we say this is a $2 \times 4$ matrix. In general, an $m \times n$ matrix has $m$ rows and $n$ columns. Observe that if we have a $2 \times 3$ system of equations, its corresponding augmented matrix will be $2 \times 4$.

The above example demonstrates the general fact that there are basic operations we can perform on an augmented matrix that, at each stage, result in the matrix representing an equivalent system of equations; that is, these operations do not change the solution set of the system, but rather make the solution more easily obtained. In particular, we may

1. Replace one row by the sum of itself and a multiple of another row;
2. Interchange any two rows; or
3. Scale a row by multiplying every entry in a given row by a fixed nonzero constant.

These three types of operations are typically called elementary row operations. Two matrices are row equivalent if there is a sequence of elementary row operations that transforms one matrix into the other. When matrices are used to represent systems of linear equations, as was done above, it is always the case that row-equivalent matrices correspond to equivalent systems.

We desire to use elementary row operations systematically to produce row equivalent matrices from which we may easily interpret the solution to a system of equations. For example, the solution to the system represented by

\[
\begin{bmatrix} 1 & 0 & 0 & -5 \\ 0 & 1 & 0 & 6 \\ 0 & 0 & 1 & -3 \end{bmatrix}
\tag{1.2.5}
\]

is easy to obtain (in particular, $x_1 = -5$, $x_2 = 6$, $x_3 = -3$), while the solution for

\[
\begin{bmatrix} 3 & -2 & 4 & -39 \\ -1 & 2 & 7 & -4 \\ 6 & 9 & -3 & 33 \end{bmatrix}
\]

is not, even though the two matrices are row equivalent. Therefore, we desire each variable in the system to be represented in its corresponding augmented matrix as infrequently as possible. Essentially our goal is to get as many columns of the matrix as possible to have one entry that is 1, while all the rest of the entries in that column are 0.
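As a quick sanity check on the claim that the two matrices above represent equivalent systems, one can substitute the solution read off from (1.2.5) into both. The sketch below is in Python, and the function name `satisfies` is an illustrative assumption. It confirms only that the point solves both systems; it is a consistency check, not a proof of row equivalence.

```python
def satisfies(augmented, solution):
    """True when `solution` satisfies every equation encoded by the augmented matrix."""
    for row in augmented:
        *coefficients, constant = row
        if sum(c * x for c, x in zip(coefficients, solution)) != constant:
            return False
    return True

reduced = [[1, 0, 0, -5], [0, 1, 0, 6], [0, 0, 1, -3]]           # matrix (1.2.5)
unreduced = [[3, -2, 4, -39], [-1, 2, 7, -4], [6, 9, -3, 33]]    # the harder-to-read matrix
candidate = (-5, 6, -3)

print(satisfies(reduced, candidate), satisfies(unreduced, candidate))
```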


A matrix is said to be in reduced row echelon form (RREF) if and only if the following characteristics are satisfied:

• All nonzero rows are above any rows with all zeros
• The first nonzero entry (or leading entry) in a given row is 1 and is in a column to the right of the first nonzero entry in any row above it
• Every other entry in a column with a leading 1 is 0

For example, the matrix in (1.2.5) is in RREF, while the matrix

\[
\begin{bmatrix} 1 & -2 & 4 & -5 \\ 0 & 2 & 7 & 6 \\ 0 & 0 & -3 & -3 \end{bmatrix}
\]

is not, since two of the rows lack leading 1's, and columns 2 and 3 lack zeros in the entries above the lowest nonzero locations.

Each leading 1 in RREF is said to be in a pivot position, the column in which the 1 lies is termed a pivot column, and the leading 1 itself is called a pivot. Rows with all zeros do not contain a pivot position. The process by which row operations are applied to a matrix to convert it to RREF is usually called Gauss–Jordan elimination. We will also say that we "row-reduced" a given matrix. While this process can be described in a somewhat cumbersome algorithm, it is best demonstrated with a few examples. By working through the details of the following problems (in particular by deciding which elementary row operations were performed at each stage), the reader will not only learn the basics of row reduction, but also will see and understand the key possibilities for the solution set of a system of linear equations.

Example 1.2.1

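Solve the system of equations

\[
\begin{aligned}
3x_1 + 2x_2 - x_3 &= 8 \\
x_1 - 4x_2 + 2x_3 &= -9 \\
-2x_1 + x_2 + x_3 &= -1
\end{aligned}
\tag{1.2.6}
\]

Solution. We begin with the corresponding augmented matrix

\[
\begin{bmatrix} 3 & 2 & -1 & 8 \\ 1 & -4 & 2 & -9 \\ -2 & 1 & 1 & -1 \end{bmatrix}
\]

and then perform a sequence of row operations. The arrows below denote the fact that one or more row operations have been performed to produce a row equivalent matrix. We find that

\[
\begin{bmatrix} 3 & 2 & -1 & 8 \\ 1 & -4 & 2 & -9 \\ -2 & 1 & 1 & -1 \end{bmatrix}
\to
\begin{bmatrix} 1 & -4 & 2 & -9 \\ 3 & 2 & -1 & 8 \\ -2 & 1 & 1 & -1 \end{bmatrix}
\to
\begin{bmatrix} 1 & -4 & 2 & -9 \\ 0 & 14 & -7 & 35 \\ 0 & -7 & 5 & -19 \end{bmatrix}
\to
\begin{bmatrix} 1 & -4 & 2 & -9 \\ 0 & 1 & -\tfrac{1}{2} & \tfrac{5}{2} \\ 0 & -7 & 5 & -19 \end{bmatrix}
\]

\[
\to
\begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & -\tfrac{1}{2} & \tfrac{5}{2} \\ 0 & 0 & \tfrac{3}{2} & -\tfrac{3}{2} \end{bmatrix}
\to
\begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & -\tfrac{1}{2} & \tfrac{5}{2} \\ 0 & 0 & 1 & -1 \end{bmatrix}
\to
\begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & -1 \end{bmatrix}
\]

[Figure 1.4 The intersection of the three planes given by the linear system (1.2.6).]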

This shows us that the original 3 × 3 system has a unique solution, and that this solution is the point (1, 2, −1). Geometrically, this demonstrates that the three planes with equations given by the system (1.2.6) meet in a single point, as we can see in ﬁgure 1.4.
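The row reduction in example 1.2.1 can also be automated. What follows is a minimal Gauss–Jordan sketch in Python under stated assumptions: the function name `rref` is hypothetical, exact fractions are used, and the pivot in each column is simply the first nonzero entry found, so the intermediate matrices may differ from those shown above even though the final RREF agrees.

```python
from fractions import Fraction

def rref(matrix):
    """Return the reduced row echelon form of `matrix` using exact arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row == len(rows):
            break
        # Look for a row at or below pivot_row with a nonzero entry in this column.
        candidates = [r for r in range(pivot_row, len(rows)) if rows[r][col] != 0]
        if not candidates:
            continue                                   # no pivot in this column
        r = candidates[0]
        rows[pivot_row], rows[r] = rows[r], rows[pivot_row]        # interchange rows
        lead = rows[pivot_row][col]
        rows[pivot_row] = [v / lead for v in rows[pivot_row]]      # scale to a leading 1
        for other in range(len(rows)):
            if other != pivot_row and rows[other][col] != 0:
                factor = rows[other][col]
                # Row replacement: clear the rest of the pivot column.
                rows[other] = [a - factor * b
                               for a, b in zip(rows[other], rows[pivot_row])]
        pivot_row += 1
    return rows

example_121 = [[3, 2, -1, 8], [1, -4, 2, -9], [-2, 1, 1, -1]]
for row in rref(example_121):
    print([str(v) for v in row])
```

Only the three elementary row operations are used, so the output is row equivalent to the input at every stage.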

Example 1.2.2

Solve the system of equations

\[
\begin{aligned}
x_1 + 2x_2 - x_3 &= 1 \\
x_1 + x_2 \phantom{{}+ 2x_3} &= 2 \\
3x_1 + x_2 + 2x_3 &= 8
\end{aligned}
\tag{1.2.7}
\]

Solution. We consider the corresponding augmented matrix

\[
\begin{bmatrix} 1 & 2 & -1 & 1 \\ 1 & 1 & 0 & 2 \\ 3 & 1 & 2 & 8 \end{bmatrix}
\]

and again perform a sequence of row operations:

\[
\begin{bmatrix} 1 & 2 & -1 & 1 \\ 1 & 1 & 0 & 2 \\ 3 & 1 & 2 & 8 \end{bmatrix}
\to
\begin{bmatrix} 1 & 2 & -1 & 1 \\ 0 & -1 & 1 & 1 \\ 0 & -5 & 5 & 5 \end{bmatrix}
\to
\begin{bmatrix} 1 & 2 & -1 & 1 \\ 0 & 1 & -1 & -1 \\ 0 & -5 & 5 & 5 \end{bmatrix}
\to
\begin{bmatrix} 1 & 0 & 1 & 3 \\ 0 & 1 & -1 & -1 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]

In this case, we see that one row of the matrix has essentially vanished. This shows that one of the equations in the original system was redundant, and did not contribute any restrictions on the system. Moreover, as the matrix is now in RREF, we can see that the simplest equivalent system is given by the two equations $x_1 + x_3 = 3$ and $x_2 - x_3 = -1$. In other words, $x_1 = 3 - x_3$ and $x_2 = -1 + x_3$. Since the variable $x_3$ has no restrictions on it, we call $x_3$ a free variable. This implies that the system under consideration has infinitely many solutions, each having the form

\[
(3 - t,\ -1 + t,\ t), \quad \text{where } t \in \mathbb{R}
\tag{1.2.8}
\]
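To see that every member of the family (1.2.8) really does solve the system (1.2.7), one can substitute a few parameter values and confirm that each equation is satisfied. This is a small illustrative sketch in Python; the names `solution` and `residuals` are hypothetical.

```python
from fractions import Fraction

# Coefficients and right-hand sides of system (1.2.7).
coefficients = [[1, 2, -1], [1, 1, 0], [3, 1, 2]]
constants = [1, 2, 8]

def solution(t):
    """Point on the solution line (1.2.8) for the parameter value t."""
    return (3 - t, -1 + t, t)

def residuals(point):
    """Left-hand side minus right-hand side for each equation of the system."""
    return [sum(c * x for c, x in zip(row, point)) - b
            for row, b in zip(coefficients, constants)]

for t in (Fraction(-2), Fraction(0), Fraction(1, 3), Fraction(5)):
    print(t, residuals(solution(t)))
```

A residual list of all zeros means the point lies on all three planes at once. Checking finitely many values of t is of course only evidence; the algebra in the example is what establishes the result for every real t.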

In the next section, we will begin to emphasize the role that vectors play in systems of linear equations. For example, the ordered triple $(3 - t,\ -1 + t,\ t)$ in (1.2.8) may be viewed as a vector in $\mathbb{R}^3$. In addition, the representation (1.2.8) of the set of all solutions involving the parameter $t$ is often called the parametric vector form of the solution. As we saw in the very first system of equations discussed in this section, example 1.2.2 shows that the three planes given in the system (1.2.7) meet in a line.

Example 1.2.3

Solve the system of equations

\[
\begin{aligned}
x_1 + 2x_2 - x_3 &= 1 \\
x_1 + x_2 \phantom{{}+ 2x_3} &= 2 \\
3x_1 + x_2 + 2x_3 &= 7
\end{aligned}
\]

Solution. Observe that the only difference between this example and the previous one is that the "8" in the third equation has been replaced with "7." We proceed with identical row operations to those above and find that

\[
\begin{bmatrix} 1 & 2 & -1 & 1 \\ 1 & 1 & 0 & 2 \\ 3 & 1 & 2 & 7 \end{bmatrix}
\to
\begin{bmatrix} 1 & 2 & -1 & 1 \\ 0 & -1 & 1 & 1 \\ 0 & -5 & 5 & 4 \end{bmatrix}
\to
\begin{bmatrix} 1 & 2 & -1 & 1 \\ 0 & 1 & -1 & -1 \\ 0 & -5 & 5 & 4 \end{bmatrix}
\to
\begin{bmatrix} 1 & 0 & 1 & 3 \\ 0 & 1 & -1 & -1 \\ 0 & 0 & 0 & -1 \end{bmatrix}
\]

In this case, the final row of the reduced matrix corresponds to the equation $0x_1 + 0x_2 + 0x_3 = -1$. Since there are no points $(x_1, x_2, x_3)$ that make this equation true, it follows that there can be no points which simultaneously satisfy all three equations in the system. Said differently, the three planes given in the original system of equations do not meet at a single point, nor do they meet in a line. Therefore, the system has no solution; recall that we call such a system inconsistent.

Note that the only difference between example 1.2.2 and example 1.2.3 is one constant on the right-hand side in the equation of one of the planes. This changed the result dramatically, from the case where the system had infinitely many solutions to one where no solutions were present. This is evident geometrically if we think about a situation where three planes meet in a line, and then we alter the equation of one of the planes to shift it to a new plane parallel to its original location: the three planes will no longer have any points in common.
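The contrast between examples 1.2.2 and 1.2.3 can be reproduced by replaying the same row operations with the third constant left as an input. The Python sketch below is illustrative only; `reduce_with_constant` and `is_consistent` are hypothetical names, and the consistency test simply looks for a row whose coefficients are all zero while its constant is nonzero.

```python
def reduce_with_constant(k):
    """Apply the row operations of examples 1.2.2 and 1.2.3 with third constant k."""
    r1, r2, r3 = [1, 2, -1, 1], [1, 1, 0, 2], [3, 1, 2, k]
    r2 = [b - a for a, b in zip(r1, r2)]          # R2 <- R2 - R1
    r3 = [c - 3 * a for a, c in zip(r1, r3)]      # R3 <- R3 - 3 R1
    r2 = [-b for b in r2]                         # R2 <- (-1) R2
    r1 = [a - 2 * b for a, b in zip(r1, r2)]      # R1 <- R1 - 2 R2
    r3 = [c + 5 * b for b, c in zip(r2, r3)]      # R3 <- R3 + 5 R2
    return [r1, r2, r3]

def is_consistent(reduced):
    """False when some row reads 0 ... 0 | b with b nonzero."""
    return not any(all(v == 0 for v in row[:-1]) and row[-1] != 0 for row in reduced)

for k in (8, 7):
    reduced = reduce_with_constant(k)
    print(k, reduced[-1], is_consistent(reduced))
```

With the constant 8 the last row vanishes entirely, while with 7 it encodes an equation that no point can satisfy.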


Algebraically, we can see what is so special about the one constant we changed (8 to 7) if we replace this value with an arbitrary constant, say $k$, and perform row operations:

\[
\begin{bmatrix} 1 & 2 & -1 & 1 \\ 1 & 1 & 0 & 2 \\ 3 & 1 & 2 & k \end{bmatrix}
\to
\begin{bmatrix} 1 & 2 & -1 & 1 \\ 0 & 1 & -1 & -1 \\ 0 & -5 & 5 & k - 3 \end{bmatrix}
\to
\begin{bmatrix} 1 & 0 & 1 & 3 \\ 0 & 1 & -1 & -1 \\ 0 & 0 & 0 & k - 8 \end{bmatrix}
\]

This shows that for any value of $k$ other than 8, the resulting system of linear equations will be inconsistent, therefore having no solutions. In the case that $k = 8$, we see that a free variable arises and then the system has infinitely many solutions.

Overall, the question of consistency is an important one for any linear system of equations. In asking "is this system consistent?" we investigate whether or not the system has at least one solution. Moreover, we are now in a position to understand how RREF determines the answer to this question. We note from considering the RREF of a matrix that there are two overall cases: either the system contains an equation of the form $0x_1 + \cdots + 0x_n = b$, where $b$ is nonzero, or it has no such equation. In the former case, the system is inconsistent and has no solution. In the latter case, it will either be that every variable is uniquely determined, or that there are one or more free variables present, in which case there are infinitely many solutions to the system. This leads us to state the following theorem.

Theorem 1.2.1 For any linear system of equations, there are only three possible cases for the solution set: there are no solutions, there is a unique solution, or there are infinitely many solutions.

This central fact regarding linear systems will play a key role in our studies.

1.2.1 Row-reduction using Maple

Obviously one of the problems with the process of row reducing a matrix is the potential for human arithmetic errors. Soon we will learn how to use computer software to execute all of these computations quickly; first, though, we can deepen our understanding of how the process works, and simultaneously eliminate arithmetic mistakes, by using a computer algebra system in a step-by-step fashion. Our software of choice is Maple. For now, we only assume that the user is familiar with Maple's interface, and will introduce relevant commands with examples as we go. We will use the LinearAlgebra package in Maple, which is loaded using the command

> with(LinearAlgebra):

(The symbol '>' is called a Maple prompt; the program makes this available to the user automatically, and it should not be entered by the user.) To demonstrate various commands, we will revisit the system from example 1.2.1. The reader should explore this code actively by entering and experimenting on his or her own. Recall that we were interested in row-reducing the augmented matrix

\[
\begin{bmatrix} 3 & 2 & -1 & 8 \\ 1 & -4 & 2 & -9 \\ -2 & 1 & 1 & -1 \end{bmatrix}
\]

We enter the augmented matrix, say A, column-wise in Maple with the command

> A := <<3,1,-2>|<2,-4,1>|<-1,2,1>|<8,-9,-1>>;

We first want to swap rows 1 and 2; this is accomplished by entering

> A1 := RowOperation(A,[1,2]);

Note that this stores the result of this row operation in the matrix A1, which is convenient for use in the next step. After executing the most recent command, the following matrix will appear on the screen:

\[
A1 := \begin{bmatrix} 1 & -4 & 2 & -9 \\ 3 & 2 & -1 & 8 \\ -2 & 1 & 1 & -1 \end{bmatrix}
\]

To perform row-replacement, our next step is to add $(-3) \cdot R_1$ to $R_2$ (where rows 1 and 2 are denoted $R_1$ and $R_2$) to generate a new second row; similarly, we will add $2 \cdot R_1$ to $R_3$ for an updated row 3. The commands that accomplish these steps are

> A2 := RowOperation(A1,[2,1],-3);
> A3 := RowOperation(A2,[3,1],2);

and lead to the following output:

\[
A3 := \begin{bmatrix} 1 & -4 & 2 & -9 \\ 0 & 14 & -7 & 35 \\ 0 & -7 & 5 & -19 \end{bmatrix}
\]

Next, we will scale row 2 by a factor of 1/14 using the command

> A4 := RowOperation(A3,2,1/14);

to find that

\[
A4 := \begin{bmatrix} 1 & -4 & 2 & -9 \\ 0 & 1 & -\tfrac{1}{2} & \tfrac{5}{2} \\ 0 & -7 & 5 & -19 \end{bmatrix}
\]


The remainder of the computations in this example involve slightly modified versions of the three forms of the RowOperation command demonstrated above, and are left as an exercise for the reader. Recall that the unique solution to the original system is (1, 2, −1).

Maple is certainly capable of performing all of these steps at once. After completing each step-by-step command above in the row-reduction process, the result can be checked by executing the command

> ReducedRowEchelonForm(A);

The corresponding output should be

\[
\begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & -1 \end{bmatrix}
\]

which clearly reveals the unique solution to the system, (1, 2, −1).

Exercises 1.2

In exercises 1–4, solve each system of equations or explain why no solution exists.

1. $\begin{aligned} x_1 + 2x_2 &= 1 \\ x_1 + x_2 &= 0 \end{aligned}$

2. $\begin{aligned} x_1 + 2x_2 &= 1 \\ -2x_1 - 4x_2 &= -2 \end{aligned}$

3. $\begin{aligned} x_1 + 2x_2 &= 1 \\ -2x_1 - 4x_2 &= -3 \end{aligned}$

4. $\begin{aligned} 4x_1 - 3x_2 &= 5 \\ -x_1 + 4x_2 &= 2 \end{aligned}$

In exercises 5–9, for each linear system represented by a given augmented matrix in RREF, decide whether or not the system is consistent. If the system is consistent, determine its solution set. For systems with infinitely many solutions, express the solution in parametric vector form.

5. $\begin{bmatrix} 1 & 0 & 0 & 4 \\ 0 & 1 & 0 & -2 \\ 0 & 0 & 1 & 3 \end{bmatrix}$

6. $\begin{bmatrix} 1 & 0 & 0 & 4 \\ 0 & 1 & 1 & -2 \\ 0 & 0 & 0 & 3 \end{bmatrix}$

7. $\begin{bmatrix} 1 & 0 & 2 & -3 \\ 0 & 1 & 1 & -2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$

8. $\begin{bmatrix} 1 & 0 & 0 & -3 & 5 \\ 0 & 0 & 1 & -2 & 4 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$

9. $\begin{bmatrix} 1 & -2 & 0 & 4 & 0 & -1 \\ 0 & 0 & 1 & 3 & 0 & 2 \\ 0 & 0 & 0 & 0 & 1 & -5 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$

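In exercises 10–14, the given augmented matrix represents a system for which some row operations have been performed to partially row-reduce the matrix. By deciding which operations must next be executed, finish row-reducing each matrix. Finally, interpret your results to state the solution set to the system.

10. $\begin{bmatrix} 1 & 3 & 2 & 5 \\ 0 & 1 & -4 & -1 \\ 0 & 0 & 1 & 7 \end{bmatrix}$

11. $\begin{bmatrix} 1 & 0 & 0 & 4 \\ 0 & 0 & 0 & 3 \\ 0 & 1 & 1 & -2 \end{bmatrix}$

12. $\begin{bmatrix} 1 & 0 & 2 & -3 \\ 0 & 1 & 1 & -2 \\ 0 & 3 & 3 & -6 \\ 0 & 2 & 2 & -1 \end{bmatrix}$

13. $\begin{bmatrix} 1 & 0 & 5 & -1 & 6 \\ 0 & 0 & 2 & -8 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$

14. $\begin{bmatrix} 1 & -3 & 0 & 5 & 0 & -3 \\ 0 & 0 & 1 & 3 & 0 & 4 \\ 0 & 0 & 0 & 1 & 2 & -9 \\ 0 & 0 & 0 & 0 & 1 & 4 \end{bmatrix}$

Determine all value(s) of $h$ that make each augmented matrix in exercises 15–18 correspond to a consistent linear system. For such $h$, describe the solution set to the system.

15. $\begin{bmatrix} 1 & -2 & 7 \\ -3 & 6 & h \end{bmatrix}$

16. $\begin{bmatrix} 1 & -2 & 7 \\ -3 & h & -21 \end{bmatrix}$

17. $\begin{bmatrix} 1 & h & 3 \\ 2 & h & 6 \end{bmatrix}$

18. $\begin{bmatrix} 1 & 2 & 3 \\ -2 & h & 5 \end{bmatrix}$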


Use a computer algebra system to perform step-by-step row operations to solve each of the following linear systems in exercises 19–23. If the system is consistent, determine its solution set. For systems with infinitely many solutions, express the solution in parametric vector form.

19. $\begin{aligned} x_1 - x_2 + x_3 &= 5 \\ 2x_1 - 4x_2 + 3x_3 &= 0 \\ x_1 - 6x_2 + 2x_3 &= 3 \end{aligned}$

20. $\begin{aligned} 4x_1 + 2x_2 - x_3 &= -2 \\ x_1 - x_2 + x_3 &= 6 \\ -3x_1 + x_2 - 4x_3 &= -20 \end{aligned}$

21. $\begin{aligned} 4x_1 + 2x_2 - x_3 &= -2 \\ x_1 - x_2 + x_3 &= 6 \\ -2x_1 - 4x_2 + 3x_3 &= 14 \end{aligned}$

22. $\begin{aligned} 4x_1 + 2x_2 - x_3 &= -2 \\ x_1 - x_2 + x_3 &= 6 \\ -2x_1 - 4x_2 + 3x_3 &= 13 \end{aligned}$

23. $\begin{aligned} 2x_2 + 3x_3 - 4x_4 &= 1 \\ 2x_3 + 3x_4 &= 4 \\ 2x_1 + 2x_2 - 5x_3 + 2x_4 &= 4 \\ 2x_1 - 6x_3 + 9x_4 &= 7 \end{aligned}$

In exercises 24–27, determine whether or not the given three lines or planes meet in a single point. Justify your answer using appropriate row operations.

24. $x_1 + x_2 = 5$, $2x_1 - 3x_2 = -5$, $-4x_1 + 2x_2 = -2$

25. $x_1 + x_2 = 5$, $2x_1 - 3x_2 = -5$, $-4x_1 + 2x_2 = -3$

26. $x_1 + x_2 + x_3 = 5$, $2x_1 - 3x_2 + x_3 = 1$, $-4x_1 + 2x_2 + 5x_3 = 4$

27. $x_1 + x_2 + x_3 = 5$, $2x_1 - 3x_2 + x_3 = 3$, $-4x_1 + 2x_2 + 5x_3 = 4$

28. Consider a linear system whose corresponding augmented matrix has all zeros in its final column. Is it ever possible for such a system to be inconsistent? Why or why not?

29. Is it possible for a $2 \times 3$ linear system to be inconsistent? Explain.

30. If a $3 \times 4$ linear system has three pivot columns in its corresponding augmented matrix, can you determine whether or not the system must be consistent? Explain.

31. A system of linear equations has a unique solution. What can be determined about the relationship between the number of pivot columns in the augmented matrix and the number of variables in the system?


32. Decide whether each of the following sentences is true or false. In every case, write one sentence to support your answer.
(a) Two lines must either intersect or be parallel.
(b) A system of three linear equations in three unknown variables can have exactly three solutions.
(c) If the RREF of a matrix has a row of all zeros, then the corresponding system must have a free variable present.
(d) If a system has a free variable present, then the system has infinitely many solutions.
(e) A solution to a $4 \times 3$ linear system is a list of four numbers $(x_1, x_2, x_3, x_4)$ that simultaneously makes every equation in the system true.
(f) A matrix with three columns and four rows is $3 \times 4$.
(g) A consistent system is one with exactly one solution.

33. Suppose that we would like to find a quadratic function $p(t) = a_2 t^2 + a_1 t + a_0$ that passes through the three points (1, 4), (2, 7), and (3, 6). How does this problem lead to a system of linear equations? Find the function $p(t)$. (Hint: $p(1) = 4$ implies that $4 = a_2 (1)^2 + a_1 (1) + a_0$.)

34. Find a quadratic function $p(t) = a_2 t^2 + a_1 t + a_0$ that passes through the three points (−1, 1), (2, −1), and (5, 4). How does this problem involve a system of linear equations?

35. For the circuit shown at the left in figure 1.5, set up and solve a system of linear equations whose solution is the respective currents $I_1$, $I_2$, and $I_3$.

36. For the circuit shown at the right in figure 1.5, set up and solve a system of linear equations whose solution is the respective currents $I_1$, $I_2$, and $I_3$.

[Figure 1.5 Circuits for use in exercises 35 and 36.]


1.3 Linear combinations

An important theme in mathematics that is especially present in linear algebra is the value of considering the same idea from a variety of different perspectives. Often, we can make statements that on the surface may seem unrelated, when in fact they ultimately mean the same thing, and one of the statements is most advantageous for solving a particular problem. Throughout our study of linear algebra, we will see that the subject offers a wide variety of perspectives and terminology for addressing the central concept: systems of linear equations. In this section, we take another look at the concept of consistency, but do so in a different, geometric light.

Example 1.3.1

Consider the system of equations

\[
\begin{aligned}
x_1 - x_2 &= 1 \\
x_1 + x_2 &= 3 \\
x_1 + 2x_2 &= 4
\end{aligned}
\tag{1.3.1}
\]

Rewrite the system in vector form and explore how two vectors are being combined to form a third, particularly in terms of the geometry of $\mathbb{R}^3$. Then solve the system.

Solution. In multivariable calculus, we learn to think of vectors in $\mathbb{R}^3$ very much like we think of points. For example, given the point $(a, b, c)$, we may write $\mathbf{v} = \langle a, b, c \rangle$ or $\mathbf{v} = a\mathbf{i} + b\mathbf{j} + c\mathbf{k}$ to denote the vector $\mathbf{v}$ that emanates from $(0, 0, 0)$ and ends at $(a, b, c)$. (Here $\mathbf{i}$, $\mathbf{j}$, and $\mathbf{k}$ represent the standard unit coordinate vectors: $\mathbf{i}$ is the vector from $(0, 0, 0)$ to $(1, 0, 0)$, $\mathbf{j}$ to $(0, 1, 0)$, and $\mathbf{k}$ to $(0, 0, 1)$.) In linear algebra, we will prefer to take the perspective of writing such an ordered triple as a matrix with only one column, also known as a column vector, in the form

\[
\mathbf{v} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}
\tag{1.3.2}
\]

To save space, we will sometimes use the equivalent notation² $\mathbf{v} = [a \ b \ c]^T$. Recall that two vectors are equal if and only if their corresponding entries are equal, that a vector may be multiplied by a scalar, and that any two vectors of the same size may be added.

² The 'T' stands for transpose, and the transpose of a matrix is achieved by turning every column into a row.


We can now re-examine the system of equations (1.3.1) in the light of equality among vectors. In particular, observe that it is equivalent to say

\[
\begin{bmatrix} x_1 - x_2 \\ x_1 + x_2 \\ x_1 + 2x_2 \end{bmatrix}
=
\begin{bmatrix} 1 \\ 3 \\ 4 \end{bmatrix}
\tag{1.3.3}
\]

since two vectors are equal if and only if their corresponding entries are equal. Recalling further that vectors are added component-wise, we can rewrite (1.3.3) as

\[
\begin{bmatrix} x_1 \\ x_1 \\ x_1 \end{bmatrix}
+
\begin{bmatrix} -x_2 \\ x_2 \\ 2x_2 \end{bmatrix}
=
\begin{bmatrix} 1 \\ 3 \\ 4 \end{bmatrix}
\tag{1.3.4}
\]

Finally, we observe in (1.3.4) that the first vector on the left-hand side has a common factor of $x_1$ in each component, and the second vector similarly contains $x_2$. Since a scalar multiple of a vector is computed component-wise, here we can rewrite the equation once more, now in the form

\[
x_1 \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
+
x_2 \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix}
=
\begin{bmatrix} 1 \\ 3 \\ 4 \end{bmatrix}
\tag{1.3.5}
\]

Equation (1.3.5) is equivalent to the original system (1.3.1), but is now being viewed in a very different way. Specifically, this last equation asks if there are values of $x_1$ and $x_2$ for which $x_1 \mathbf{v}_1 + x_2 \mathbf{v}_2 = \mathbf{b}$, where

\[
\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix}, \quad \text{and} \quad
\mathbf{b} = \begin{bmatrix} 1 \\ 3 \\ 4 \end{bmatrix}
\tag{1.3.6}
\]

If we plot the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{b}$, an interesting situation comes to light, as seen in figure 1.6. In particular, it appears as if all three vectors lie in the same plane. Moreover, if we think about the parallelogram law of vector addition and stretch the vector $\mathbf{v}_1$ by a factor of 2, we see the image in figure 1.7. This shows geometrically that it appears $\mathbf{b} = 2\mathbf{v}_1 + \mathbf{v}_2$; a quick check of the vector arithmetic confirms that this is in fact the case. In other words, the unique solution to the system (1.3.1) is $x_1 = 2$ and $x_2 = 1$.

Among the many important ideas in example 1.3.1, perhaps most significant is the way we were able to re-cast a problem about a system of linear equations as a question involving vectors. In particular, we saw that it was equivalent to ask if there exist constants $x_1$ and $x_2$ such that

\[
x_1 \mathbf{v}_1 + x_2 \mathbf{v}_2 = \mathbf{b}
\tag{1.3.7}
\]

[Figure 1.6 The vectors v1, v2, and b from (1.3.6).]

[Figure 1.7 The parallelogram formed by the vectors 2v1 and v2 from (1.3.6).]

Note that in (1.3.7), we are only taking scalar multiples of vectors and adding them, computations that are linear in nature. We thus naturally come to use the terminology that "$x_1 \mathbf{v}_1 + x_2 \mathbf{v}_2$ is a linear combination of the vectors $\mathbf{v}_1$ and $\mathbf{v}_2$." A more general definition now follows, from which we will be able to widen our perspective on systems of linear equations.

Definition 1.3.1 If $\mathbf{v}_1, \ldots, \mathbf{v}_k$ are vectors in $\mathbb{R}^n$ (that is, each $\mathbf{v}_i$ is a vector with $n$ entries), and $x_1, \ldots, x_k$ are scalars, then the vector $\mathbf{b}$ given by

\[
\mathbf{b} = x_1 \mathbf{v}_1 + \cdots + x_k \mathbf{v}_k
\tag{1.3.8}
\]

is a linear combination of the vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$, with weights or coefficients $x_1, \ldots, x_k$.

Note the notational convention we use, as in example 1.3.1: a bold, non-italicized, lowercase variable, say $\mathbf{x}$, represents a vector, while a non-bold, italicized, lowercase variable, say $c$, denotes a scalar. A bold, non-italicized, uppercase variable, say $\mathbf{A}$, will represent a matrix with at least two columns.
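Definition 1.3.1 translates directly into a few lines of code. The Python sketch below is illustrative; the name `linear_combination` and the representation of vectors as plain lists are assumptions made here, not notation from the text. It checks the claim from example 1.3.1 that b = 2v1 + v2.

```python
def linear_combination(weights, vectors):
    """Return x1*v1 + ... + xk*vk for equal-length vectors given as lists."""
    if len(weights) != len(vectors):
        raise ValueError("need exactly one weight per vector")
    size = len(vectors[0])
    if any(len(v) != size for v in vectors):
        raise ValueError("all vectors must have the same number of entries")
    return [sum(x * v[i] for x, v in zip(weights, vectors)) for i in range(size)]

# Vectors from (1.3.6).
v1, v2, b = [1, 1, 1], [-1, 1, 2], [1, 3, 4]

combo = linear_combination([2, 1], [v1, v2])
print(combo, combo == b)
```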


In light of this new terminology of linear combinations, in example 1.3.1 we saw that the question "is there a solution to the linear system (1.3.1)?" is equivalent to asking "is the vector $\mathbf{b}$ a linear combination of the vectors $\mathbf{v}_1$ and $\mathbf{v}_2$?" If we now consider the more general situation of a system of linear equations, say

\[
\begin{aligned}
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n &= b_1 \\
a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n &= b_2 \\
&\ \,\vdots \\
a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n &= b_m
\end{aligned}
\]

it follows (as in section 1.2) that we can view this system in terms of the augmented matrix

\[
[\mathbf{a}_1 \ \mathbf{a}_2 \ \cdots \ \mathbf{a}_n \ \mathbf{b}]
\]

where $\mathbf{a}_1$ is the vector in $\mathbb{R}^m$ representing the first column of the augmented matrix, and so on. Now, however, we have the additional perspective, as in example 1.3.1, that the columns $\mathbf{a}_1, \ldots, \mathbf{a}_n$ of the coefficient matrix $\mathbf{A}$ are precisely the vectors being used to form a linear combination in an attempt to construct $\mathbf{b}$. That is, the general $m \times n$ linear system above asks the question, "is $\mathbf{b}$ a linear combination of $\mathbf{a}_1, \ldots, \mathbf{a}_n$?"

We make the connection between linear combinations and augmented matrices more explicit by defining matrix–vector multiplication in terms of linear combinations.

Definition 1.3.2 Given an $m \times n$ matrix $\mathbf{A}$ with columns $\mathbf{a}_1, \ldots, \mathbf{a}_n$ that are vectors in $\mathbb{R}^m$, if $\mathbf{x}$ is a vector in $\mathbb{R}^n$, then we define the product $\mathbf{A}\mathbf{x}$ by the equation

\[
\mathbf{A}\mathbf{x} = [\mathbf{a}_1 \ \mathbf{a}_2 \ \cdots \ \mathbf{a}_n]
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
= x_1 \mathbf{a}_1 + x_2 \mathbf{a}_2 + \cdots + x_n \mathbf{a}_n
\tag{1.3.9}
\]

That is, the matrix–vector product of $\mathbf{A}$ and $\mathbf{x}$ is the vector $\mathbf{A}\mathbf{x}$ obtained by taking the linear combination of the column vectors of $\mathbf{A}$ according to the weights prescribed by the entries in $\mathbf{x}$. Certainly we must have the same number of entries in $\mathbf{x}$ as columns in $\mathbf{A}$, or $\mathbf{A}\mathbf{x}$ will not be defined. The following example highlights how to compute and interpret matrix–vector products.

Example 1.3.2 Let $\mathbf{a}_1 = [1 \ {-4} \ 2]^T$ and $\mathbf{a}_2 = [-3 \ 1 \ 5]^T$, and let $\mathbf{A}$ be the matrix whose columns are $\mathbf{a}_1$ and $\mathbf{a}_2$. Compute $\mathbf{A}\mathbf{x}$, where $\mathbf{x} = [-5 \ 2]^T$, and interpret the result in terms of linear combinations.

Solution. By definition, we have that

\[
\mathbf{A}\mathbf{x}
= \begin{bmatrix} 1 & -3 \\ -4 & 1 \\ 2 & 5 \end{bmatrix}
\begin{bmatrix} -5 \\ 2 \end{bmatrix}
= -5 \begin{bmatrix} 1 \\ -4 \\ 2 \end{bmatrix}
+ 2 \begin{bmatrix} -3 \\ 1 \\ 5 \end{bmatrix}
= \begin{bmatrix} -11 \\ 22 \\ 0 \end{bmatrix}
\]

The above computations show clearly that the vector $\mathbf{A}\mathbf{x} = [-11 \ 22 \ 0]^T$ is a linear combination of $\mathbf{a}_1$ and $\mathbf{a}_2$.

Following a few more computational examples in homework exercises, the reader will quickly see how to compute the product $\mathbf{A}\mathbf{x}$ whenever it is defined; usually we skip past the intermediate stage of writing out the explicit linear combination of the columns and simply write the resulting vector. Matrix–vector multiplication also has several important general properties, some of which will be explored in the exercises. For now, we simply list these properties here for future reference: for any $m \times n$ matrix $\mathbf{A}$, vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$, and $c \in \mathbb{R}$,

• $\mathbf{A}(\mathbf{x} + \mathbf{y}) = \mathbf{A}\mathbf{x} + \mathbf{A}\mathbf{y}$
• $\mathbf{A}(c\mathbf{x}) = c(\mathbf{A}\mathbf{x})$

The first property shows that matrix multiplication distributes over addition; the second demonstrates that a scalar multiple can be taken either before or after multiplying the vector $\mathbf{x}$ by $\mathbf{A}$. These two properties of matrix multiplication are often referred to as being properties of linearity; note the use of only scalar multiplication and vector addition in each, and the linear appearance of each equation.³ Finally, note that it is also the case that $\mathbf{A}\mathbf{0}_n = \mathbf{0}_m$, where $\mathbf{0}_n$ is the vector in $\mathbb{R}^n$ with all entries being zero, and $\mathbf{0}_m$ is the corresponding zero vector in $\mathbb{R}^m$.

There is one more important perspective that this new matrix–vector product notation permits. Recall that, in example 1.3.1, we learned that the question "is $\mathbf{b}$ a linear combination of $\mathbf{a}_1$ and $\mathbf{a}_2$?" is equivalent to asking "is there a solution to the system of linear equations whose augmented matrix has columns $\mathbf{a}_1$, $\mathbf{a}_2$, and $\mathbf{b}$?" Now, in light of matrix–vector multiplication, we also see that the question "is $\mathbf{b}$ a linear combination of $\mathbf{a}_1$ and $\mathbf{a}_2$?" may be rephrased as asking "does there exist a vector $\mathbf{x}$ such that $\mathbf{A}\mathbf{x} = \mathbf{b}$?" That is, are there weights $x_1$ and $x_2$ (the entries in vector $\mathbf{x}$) such that $\mathbf{b}$ is a linear combination of the columns of $\mathbf{A}$?
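The definition of the product Ax as a combination of columns, together with the two linearity properties, can be spot-checked numerically. The Python sketch below assumes a matrix is stored as a list of its columns, which is a choice made here purely for illustration; `matvec` is a hypothetical name, and the vector y and scalar c are arbitrary test values rather than data from the text.

```python
def matvec(columns, x):
    """Compute Ax as the combination x1*a1 + ... + xn*an of the columns of A."""
    if len(columns) != len(x):
        raise ValueError("x needs one entry per column of A")
    m = len(columns[0])
    return [sum(xj * col[i] for xj, col in zip(x, columns)) for i in range(m)]

a1, a2 = [1, -4, 2], [-3, 1, 5]
A = [a1, a2]                         # A stored as a list of its columns
print(matvec(A, [-5, 2]))            # the product from example 1.3.2

# Spot-check A(x + y) = Ax + Ay and A(cx) = c(Ax) on sample values.
x, y, c = [-5, 2], [3, 7], 4
lhs_sum = matvec(A, [xi + yi for xi, yi in zip(x, y)])
rhs_sum = [p + q for p, q in zip(matvec(A, x), matvec(A, y))]
lhs_scale = matvec(A, [c * xi for xi in x])
rhs_scale = [c * p for p in matvec(A, x)]
print(lhs_sum == rhs_sum, lhs_scale == rhs_scale)
```

Agreement on one pair of vectors does not prove the properties in general; that argument is left to the exercises, as the text indicates.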
In particular, we may now adopt the perspective that we desire to solve the equation $\mathbf{A}\mathbf{x} = \mathbf{b}$ for the unknown vector $\mathbf{x}$, where $\mathbf{A}$ is a matrix whose entries are known, and $\mathbf{b}$ is a vector whose entries are known. This equation is strikingly similar to the most elementary of equations encountered in algebra, ones such as $2x = 7$. Therefore, we see that the linear equation $\mathbf{A}\mathbf{x} = \mathbf{b}$, involving matrices and vectors, is of fundamental importance as it is another way of expressing questions regarding linear combinations and solutions of systems of linear equations. In subsequent sections, we will explore this equation from several perspectives.

³ A deeper discussion of the notion of linear transformations can be found in appendix D.

1.3.1 Markov chains: an application of matrix–vector multiplication

People are often distributed naturally among various groupings. For example, much political discussion in the United States is centered on three classifications of voters: Democrat, Republican, and Independent. A similar situation can be considered with regard to people's choices for where to live: urban, suburban, or rural. In each case, the state of the population at a given time is its distribution among the relevant categories. Furthermore, in each of these situations, it is natural to assume that if we consider the state of the system at a given point in time, its state depends on the system's state in the preceding year. For example, the percentage of Democrats, Republicans, and Independents in the year 2020 ought to be connected to the respective percentages in 2019.

Let us assume that a population of voters (of constant size) is considered in which everyone must be classified as either D, R, or I (Democrat, Republican, or Independent). Suppose further that a study of voter registrations over many years reveals the following trends: from one year to the next, 95 percent of Democrats keep their registration the same. The remaining 5 percent change parties: 2 percent of Democrats become Republicans and 3 percent become Independents. Similar data for Republicans and Independents is given in the following table.

Future party (↓) / current party (→)    D (%)    R (%)    I (%)
Democrat                                 95       3        7
Republican                               2        90       13
Independent                              3        7        80

If we let $D_n$, $R_n$, and $I_n$ denote the respective numbers of registered Democrats, Republicans, and Independents in year $n$, then the table shows us how to determine the respective numbers in year $n + 1$. For example,

\[
D_{n+1} = 0.95 D_n + 0.03 R_n + 0.07 I_n
\tag{1.3.10}
\]

since 95 percent of the Democrats in year $n$ stay registered Democrats, and 3 percent of Republicans and 7 percent of Independents change to Democrats. Similarly, we have

\[
R_{n+1} = 0.02 D_n + 0.90 R_n + 0.13 I_n
\tag{1.3.11}
\]

\[
I_{n+1} = 0.03 D_n + 0.07 R_n + 0.80 I_n
\tag{1.3.12}
\]


If we combine (1.3.10), (1.3.11), and (1.3.12) in a single vector equation, then

\[
\begin{bmatrix} D_{n+1} \\ R_{n+1} \\ I_{n+1} \end{bmatrix}
= D_n \begin{bmatrix} 0.95 \\ 0.02 \\ 0.03 \end{bmatrix}
+ R_n \begin{bmatrix} 0.03 \\ 0.90 \\ 0.07 \end{bmatrix}
+ I_n \begin{bmatrix} 0.07 \\ 0.13 \\ 0.80 \end{bmatrix}
\tag{1.3.13}
\]

Here we find that linear combinations of vectors have naturally arisen. Note, for example, that the vector $[0.03 \ 0.90 \ 0.07]^T$ is the Republican vector, and represents the likelihood that a Republican in a given year will be in one of the three parties in the following year. More specifically, we observe that probabilities are involved: a Republican has a 3 percent likelihood of registering as a Democrat in the following year, a 90 percent likelihood of staying a Republican, and a 7 percent chance of becoming an Independent. The sum of the entries in each column vector is 1. If we use the vector $\mathbf{x}^{(n)}$ to represent

\[
\mathbf{x}^{(n)} = \begin{bmatrix} D_n \\ R_n \\ I_n \end{bmatrix}
\]

and use matrix–vector multiplication to represent the linear combination of vectors in (1.3.13), then (1.3.13) is equivalently expressed by the equation

\[
\mathbf{x}^{(n+1)} = \mathbf{M}\mathbf{x}^{(n)}
\tag{1.3.14}
\]

where $\mathbf{M}$ is the matrix

\[
\mathbf{M} = \begin{bmatrix} 0.95 & 0.03 & 0.07 \\ 0.02 & 0.90 & 0.13 \\ 0.03 & 0.07 & 0.80 \end{bmatrix}
\]

The matrix $\mathbf{M}$ is often called a transition matrix since it shows how the population transitions from state $n$ to state $n + 1$. We observe that in order for such a matrix to represent the probabilities that groups in a particular set of states will transition to another set of states, the entries of $\mathbf{M}$ must be nonnegative and the entries in each column must add to 1. Such a matrix is called a stochastic matrix or a Markov matrix. Finally, we call any system such as the one with three classifications of voters, where the state of the system in a given observation period results from applying probabilities to a previous state, a Markov chain or Markov process.

We see, for example, that if we had a group of 250 000 voters that at year $n = 0$ was distributed among Democrats, Republicans, and Independents by the vector (with entries measured in thousands) $\mathbf{x}^{(0)} = [120 \ 110 \ 20]^T$, then we can easily compute the projected distribution of voters in subsequent years. In particular, (1.3.14) implies

\[
\mathbf{x}^{(1)} = \mathbf{M}\mathbf{x}^{(0)} = \begin{bmatrix} 118.70 \\ 104 \\ 27.3 \end{bmatrix}, \quad
\mathbf{x}^{(2)} = \mathbf{M}\mathbf{x}^{(1)} = \begin{bmatrix} 117.80 \\ 99.52 \\ 32.68 \end{bmatrix}, \quad
\mathbf{x}^{(3)} = \mathbf{M}\mathbf{x}^{(2)} = \begin{bmatrix} 117.18 \\ 96.18 \\ 36.65 \end{bmatrix}
\]
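The repeated products in (1.3.14) are easy to carry out by machine. Here is a minimal Python sketch of the voter model using ordinary floating-point arithmetic; the function names `step` and `project` are hypothetical, and values are rounded to two decimals only when printed.

```python
# Transition matrix M from the voter table (rows: future D, R, I).
M = [[0.95, 0.03, 0.07],
     [0.02, 0.90, 0.13],
     [0.03, 0.07, 0.80]]

def step(matrix, state):
    """One year of the Markov chain: return matrix times state."""
    return [sum(m * s for m, s in zip(row, state)) for row in matrix]

def project(matrix, state, years):
    """Return the list of states x(0), x(1), ..., x(years)."""
    history = [state]
    for _ in range(years):
        history.append(step(matrix, history[-1]))
    return history

# Initial distribution in thousands of voters.
history = project(M, [120.0, 110.0, 20.0], 3)
for n, state in enumerate(history):
    print(n, [round(v, 2) for v in state])

# Each column of M should sum to 1, so the total population is preserved.
column_sums = [sum(row[j] for row in M) for j in range(3)]
print([round(s, 10) for s in column_sums])
```

Because every column of M sums to 1, the entries of each projected state continue to add up to the original 250 (thousand) voters, up to rounding.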


Interestingly, if we continue the sequence, we eventually find that there is very little variation from one vector $x^{(n)}$ to the next. For example,

\[
x^{(17)} = \begin{bmatrix} 116.67 \\ 85.95 \\ 47.42 \end{bmatrix} \approx
x^{(18)} = \begin{bmatrix} 116.79 \\ 85.76 \\ 47.44 \end{bmatrix}
\]

In fact, as we will learn in our later study of eigenvectors, there exists a vector $x^*$ called the steady-state vector for which $x^* = Mx^*$. This shows that the system can reach a state in which it does not change from one year to the next. Another example is instructive.

Example 1.3.3 Geographers studying a metropolitan area have observed a trend that while the population of the area stays roughly constant, people within the city and its suburbs are migrating back and forth. In particular, suppose that 85 percent of people whose homes are in the city keep their residence from one year to the next; the remainder move to the suburbs. Likewise, while 92 percent of people whose homes are in suburbs will live there the next year, the other 8 percent will move into the city. Assuming that in a given year there are 230 000 people living in the city and 270 000 people in the surrounding suburbs, predict the population distribution over the next 3 years.

Solution. If we let $C_n$ and $S_n$ denote the populations of the city and suburbs in year n, the given information tells us that the following relationships hold:

\[
\begin{aligned}
C_{n+1} &= 0.85C_n + 0.08S_n \\
S_{n+1} &= 0.15C_n + 0.92S_n
\end{aligned}
\]

Using the notation

\[
x^{(n)} = \begin{bmatrix} C_n \\ S_n \end{bmatrix}
\]

we can model the changing distribution of the population between the city and suburbs with the Markov process $x^{(n+1)} = Mx^{(n)}$, where M is the Markov matrix

\[
M = \begin{bmatrix} 0.85 & 0.08 \\ 0.15 & 0.92 \end{bmatrix}
\]

In particular, starting with $x^{(0)} = [230 \ 270]^T$ (entries measured in thousands), we see that

\[
x^{(1)} = \begin{bmatrix} 217.10 \\ 282.90 \end{bmatrix}, \quad
x^{(2)} = \begin{bmatrix} 207.17 \\ 292.83 \end{bmatrix}, \quad
x^{(3)} = \begin{bmatrix} 199.52 \\ 300.48 \end{bmatrix}
\]

As with voter distribution, this example is oversimplified. For instance, we have not taken into account members of the population who move into or away from the metropolitan area. Nonetheless, the basic ideas of Markov processes are important in the study of systems whose current state depends on preceding ones, and we see the key role matrices and matrix multiplication play in representing them.
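The iteration in example 1.3.3 can be sketched in a few lines of Python. This is only an illustration, not part of the text's Maple workflow: the helper names `mat_vec`, `column_sums`, and `iterate` are hypothetical, and the matrix is stored as a plain list of rows.

```python
# Markov matrix from example 1.3.3, stored as a list of rows.
M = [[0.85, 0.08],
     [0.15, 0.92]]

def mat_vec(A, x):
    """Return the product Ax for a matrix A (list of rows) and a vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def column_sums(A):
    """Sum the entries of each column; each sum is 1 for a Markov matrix."""
    return [sum(col) for col in zip(*A)]

def iterate(A, x0, steps):
    """Return [x(0), x(1), ..., x(steps)], where x(n+1) = A x(n)."""
    states = [x0]
    for _ in range(steps):
        states.append(mat_vec(A, states[-1]))
    return states

print("column sums:", column_sums(M))
states = iterate(M, [230.0, 270.0], 3)   # populations in thousands
for n, (city, suburbs) in enumerate(states):
    print(f"year {n}: city = {city:.2f}, suburbs = {suburbs:.2f}")
```

Because each column of M sums to 1, the total population (500 thousand here) is preserved at every step, which gives a quick sanity check on the computed vectors.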


1.3.2 Matrix products using Maple

After becoming comfortable with computing elementary matrix products by hand, it is useful to see how Maple can assist us with more complicated computations. Here, we demonstrate the relevant command. Revisiting example 1.3.2, to compute the product Ax, we first enter A and x using the familiar commands

    > A := ; x := ;

Next, we use the 'period' symbol to inform Maple that we want to multiply. Entering

    > b := A.x;

yields the expected output that

\[
b = \begin{bmatrix} -11 \\ 22 \\ 0 \end{bmatrix}
\]

Note: Maple will obviously only perform the multiplication when it is deﬁned. If, say, we were to attempt to multiply a 2 × 2 matrix and a 3 × 1 vector, Maple would report the following: Error, (in LinearAlgebra:-MatrixVectorMultiply) vector dimension (3) must be the same as the matrix column dimension (2).
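For readers working outside Maple, the same idea can be sketched in Python. This is a rough analogue under stated assumptions: `mat_vec` is a hypothetical helper, the matrix below is illustrative and not taken from the text, and the error message only imitates the spirit of Maple's. The product is built as a linear combination of the columns of A, and a dimension mismatch raises an error.

```python
def mat_vec(A, x):
    """Compute Ax as a linear combination of the columns of A, weighted by the entries of x."""
    n_cols = len(A[0])
    if len(x) != n_cols:
        raise ValueError(
            f"vector dimension ({len(x)}) must equal the matrix column dimension ({n_cols})"
        )
    result = [0] * len(A)
    for j, weight in enumerate(x):
        column = [row[j] for row in A]
        result = [r + weight * c for r, c in zip(result, column)]
    return result

A = [[1, 0, 2],
     [-1, 3, 1]]                  # illustrative 2 x 3 matrix (not from the text)
print(mat_vec(A, [2, 1, 1]))      # 2*(column 1) + 1*(column 2) + 1*(column 3)

try:
    mat_vec([[1, 2], [3, 4]], [1, 0, 5])   # 2 x 2 matrix times a 3 x 1 vector
except ValueError as err:
    print("Error:", err)
```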

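Exercises 1.3

For exercises 1–4, where a matrix A and vector x are given, compute the product Ax in every case that it is defined. If the product is undefined, explain why.

1. $A = \begin{bmatrix} 1 & -3 & 2 \\ -4 & 1 & 0 \end{bmatrix}$, $x = \begin{bmatrix} -1 \\ 2 \end{bmatrix}$

2. $A = \begin{bmatrix} 1 & -3 & 2 \\ -4 & 1 & 0 \end{bmatrix}$, $x = \begin{bmatrix} -1 \\ 2 \\ 4 \end{bmatrix}$

3. $A = \begin{bmatrix} 5 & -2 \\ 1 & -1 \\ -3 & 2 \end{bmatrix}$, $x = \begin{bmatrix} 3 \\ -2 \end{bmatrix}$

4. $A = \begin{bmatrix} -4 & 2 & 7 \end{bmatrix}$, $x = \begin{bmatrix} 3 \\ 5 \\ -1 \end{bmatrix}$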


5. Recall from multivariable calculus that given vectors $x, y \in \mathbb{R}^3$, the dot product of x and y, $x \cdot y$, is computed by taking

\[
x \cdot y = x_1y_1 + x_2y_2 + x_3y_3
\]

How can matrix–vector multiplication (when defined) be viewed as the result of computing several appropriate dot products? Explain.

6. For the system of equations given below, determine a vector equation with an equivalent solution. What is the system asking in regard to linear combinations of certain vectors?

\[
\begin{aligned}
x_1 + 2x_2 &= 1 \\
x_1 + x_2 &= 0
\end{aligned}
\]

In addition, determine a matrix A and vector b so that the equation Ax = b is equivalent to the given system of equations.

7. For the system of differential equations (1.1.6) (also given below) from the introductory section, how can we rewrite the system in matrix–vector notation?

\[
\begin{aligned}
\frac{dx_1}{dt} &= 20 - \frac{x_1}{20} + \frac{x_2}{80} \\
\frac{dx_2}{dt} &= 35 + \frac{x_1}{40} - \frac{x_2}{40}
\end{aligned}
\]

Hint: recall that if $x(t)$ is a vector function, we write $x'(t)$ or $dx/dt$ for the vector $[dx_1/dt \ \ dx_2/dt]^T$.

8. Determine if the vector $b = [-3 \ 1 \ 5]^T$ is a linear combination of the vectors $a_1 = [-1 \ 2 \ 1]^T$, $a_2 = [3 \ 1 \ 1]^T$, and $a_3 = [1 \ 5 \ 3]^T$. If so, will more than one set of weights work?

9. Determine if the vector $b = [0 \ 7 \ 4]^T$ is a linear combination of the vectors $a_1 = [-1 \ 2 \ 1]^T$, $a_2 = [3 \ 1 \ 1]^T$, and $a_3 = [1 \ 5 \ 3]^T$. If so, will more than one set of weights work?

10. We know from our work in this section that the matrix equation Ax = b corresponds both to a vector equation and a system of linear equations. What is the augmented matrix that represents this system of equations?

In exercises 11–15, let A be the stated matrix and b the given vector. Solve the linear equation Ax = b by converting the equation to a system of linear equations and row-reducing appropriately. If the system has more than one solution, express the solution in parametric vector form. Finally, write a sentence in each case that explains how the vector b is related to linear combinations of the columns of A.

11. $A = \begin{bmatrix} 4 & 5 & -1 \\ 3 & 1 & 2 \end{bmatrix}$, $b = \begin{bmatrix} 13 \\ -4 \end{bmatrix}$

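12. $A = \begin{bmatrix} 2 & 5 \\ -3 & -1 \end{bmatrix}$, $b = \begin{bmatrix} 5 \\ 6 \end{bmatrix}$

13. $A = \begin{bmatrix} 6 & 2 \\ -3 & -1 \end{bmatrix}$, $b = \begin{bmatrix} 7 \\ -1 \end{bmatrix}$

14. $A = \begin{bmatrix} 1 & -3 \\ -2 & 1 \\ 3 & -1 \end{bmatrix}$, $b = \begin{bmatrix} 5 \\ -5 \\ 5 \end{bmatrix}$

15. $A = \begin{bmatrix} 5 & -3 & 1 \\ -2 & 1 & 4 \\ 1 & 0 & -2 \end{bmatrix}$, $b = \begin{bmatrix} 0 \\ 22 \\ -11 \end{bmatrix}$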

16. Linear equations of the form Ax = 0 are important for a variety of reasons, some of which we will study in the next section. Explain why the system of linear equations corresponding to the equation Ax = 0 is always consistent, regardless of the matrix A.

In exercises 17–21, solve the linear equation Ax = 0 by row-reducing appropriately. If the system has more than one solution, express the solution in parametric vector form.

17. $A = \begin{bmatrix} 4 & 5 & -1 \\ 3 & 1 & 2 \end{bmatrix}$

18. $A = \begin{bmatrix} 2 & 5 \\ -3 & -1 \end{bmatrix}$

19. $A = \begin{bmatrix} 6 & 2 \\ -3 & -1 \end{bmatrix}$

20. $A = \begin{bmatrix} 1 & -3 \\ -2 & 1 \\ 3 & -1 \end{bmatrix}$

21. $A = \begin{bmatrix} 5 & -3 & 1 \\ -2 & 1 & 4 \\ 1 & 0 & -2 \end{bmatrix}$

22. Let $A = \begin{bmatrix} 3 & -4 \\ -6 & 8 \end{bmatrix}$ and $b = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$. Describe the set of all vectors b for which the equation Ax = b is consistent.

23. Let $v_1 = \begin{bmatrix} 3 \\ -6 \end{bmatrix}$, $v_2 = \begin{bmatrix} -4 \\ 8 \end{bmatrix}$, and $b = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$. Describe the set of all vectors b for which b is a linear combination of $v_1$ and $v_2$.


24. Let A be an m × n matrix, $x$ and $y \in \mathbb{R}^n$, and $c \in \mathbb{R}$. Show that
(a) A(x + y) = Ax + Ay
(b) A(cx) = c(Ax)

25. Decide whether each of the following sentences is true or false. In every case, write one sentence to support your answer.
(a) To compute the product Ax, the vector x must have the same number of entries as the number of rows in A.
(b) A linear combination of three vectors in $\mathbb{R}^3$ produces another vector in $\mathbb{R}^3$.
(c) If b is a linear combination of $v_1$ and $v_2$, then there exist scalars $c_1$ and $c_2$ such that $c_1v_1 + c_2v_2 = b$.
(d) If A is a matrix and x and b are vectors such that Ax = b, then x is a linear combination of the columns of A.
(e) The equation Ax = 0 can be inconsistent.

26. Suppose that for a large population that stays relatively constant, people are classified as living in urban, suburban, or rural settings. Moreover, assume that the probabilities of the various possible transitions are given by the following table:

Future location (↓) / current location (→)    U(%)    S(%)    R(%)
Urban                                           92       3       2
Suburban                                         7      96      10
Rural                                            1       1      88

Given that the population of 250 million is initially distributed in 100 million urban, 100 million suburban, and fifty million rural, predict the population distribution in each of the following five years.

27. Car-owners can be grouped into classes based on the vehicles they own. A study of owners of sedans, minivans, and sport utility vehicles shows that the likelihood that an owner of one of these automobiles will replace it with another of the same or different type is given by the table

Future vehicle (↓) / current vehicle (→)    Sedan(%)    Minivan(%)    SUV(%)
Sedan                                          91           3            2
Minivan                                         7          95            8
SUV                                             2           2           90

If there are currently 100 000 sedans, 60 000 minivans, and 80 000 SUVs among the owners being studied, predict the distribution of vehicles among the population after each owner has replaced her vehicle 3 times.

1.4 The span of a set of vectors

In section 1.3, we saw that the question "is b a linear combination of $a_1$ and $a_2$?" provides an important new perspective on solutions of linear systems of equations. It is natural to slightly rephrase this question and ask more generally "which vectors b may be written as linear combinations of $a_1$ and $a_2$?" We explore this question further through the following sequence of examples.

Example 1.4.1 Describe the set of all vectors in $\mathbb{R}^2$ that may be written as a linear combination of the vector $a_1 = [2 \ 1]^T$.

Solution. Since we have just one vector $a_1$, any linear combination of $a_1$ has the form $ca_1$, which of course is a scalar multiple of $a_1$. Geometrically, the vectors that are linear combinations of $a_1$ are stretches of $a_1$, which lie on the line through (0, 0) in the direction of $a_1$, as shown in figure 1.8.

In this first example, we see a visual way to interpret the question about linear combinations: essentially we want to know "which vectors can we create using only linear combinations of $a_1$?" The answer is not surprising: only vectors that lie on the line through the origin in the direction of $a_1$. Next, we consider how the situation changes when we consider two parallel vectors.

[Figure 1.8: The set of all linear combinations of $a_1$ in example 1.4.1, plotted in the $x_1x_2$-plane.]


Example 1.4.2 Describe the set of all vectors in $\mathbb{R}^2$ that may be written as a linear combination of the vectors $a_1 = [2 \ 1]^T$ and $a_2 = [-1 \ -\tfrac{1}{2}]^T$.

Solution. Observe first that $-\tfrac{1}{2}a_1 = a_2$. Here we are considering the set of all vectors y of the form

\[
y = c_1 \begin{bmatrix} 2 \\ 1 \end{bmatrix} + c_2 \begin{bmatrix} -1 \\ -\tfrac{1}{2} \end{bmatrix}
\]

In figure 1.9, we observe that the vectors $a_1$ and $a_2$ point in opposing directions. When we take a linear combination of these vectors to form y, we are adding a stretch of $c_1$ units of the first to a stretch of $c_2$ units of the second. Because the two directions are parallel, this leaves the resulting vector as a stretch of one of the two original vectors, and therefore on the line through the origin in their direction. This may also be seen algebraically since $-\tfrac{1}{2}a_1 = a_2$ implies

\[
y = c_1a_1 + c_2a_2 = c_1a_1 - \tfrac{1}{2}c_2a_1 = \left(c_1 - \tfrac{1}{2}c_2\right)a_1
\]

We note particularly that since the two given vectors $a_1$ and $a_2$ are parallel, any linear combination of them is actually a scalar multiple of $a_1$. Thus, the resulting set of all linear combinations is identical to what we found with the single vector given in example 1.4.1. Finally, we consider the situation where we consider all linear combinations of two non-parallel vectors.

[Figure 1.9: The set of all linear combinations of $a_1$ and $a_2$ in example 1.4.2, plotted in the $x_1x_2$-plane.]
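The algebra in example 1.4.2 can be spot-checked with exact rational arithmetic. The sketch below is only an illustration: the helpers `combo` and `on_line_through` are hypothetical names, and the test for "lies on the line through the origin in the direction of a" uses the fact that two vectors in the plane are parallel exactly when the 2 × 2 determinant formed from them is zero.

```python
from fractions import Fraction

a1 = [Fraction(2), Fraction(1)]
a2 = [Fraction(-1), Fraction(-1, 2)]   # a2 = -(1/2) a1

def combo(c1, c2):
    """Return the linear combination c1*a1 + c2*a2."""
    return [c1 * u + c2 * v for u, v in zip(a1, a2)]

def on_line_through(a, y):
    """True when y is a scalar multiple of the nonzero vector a in the plane."""
    return a[0] * y[1] - a[1] * y[0] == 0

for c1, c2 in [(1, 1), (3, -4), (Fraction(-5, 2), 7)]:
    y = combo(c1, c2)
    scale = c1 - Fraction(1, 2) * c2           # weight predicted by the algebra
    print(y == [scale * u for u in a1], on_line_through(a1, y))

print(on_line_through(a1, [1, 2]))   # [1 2]^T is not a stretch of a1
```

Every combination tried collapses to the predicted multiple of $a_1$, while a vector such as $[1 \ 2]^T$ fails the test, in line with the geometric picture.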

Example 1.4.3 Describe the set of all vectors in $\mathbb{R}^2$ that may be written as a linear combination of the vectors $a_1 = [2 \ 1]^T$ and $a_2 = [1 \ 2]^T$.

[Figure 1.10: Linear combinations of $a_1$ and $a_2$ from example 1.4.3, plotted in the $x_1x_2$-plane; the labeled combinations are $a_1 + a_2$, $-2a_1 + 2a_2$, and $\tfrac{7}{3}a_1 - \tfrac{5}{3}a_2$.]

Solution. Algebraically, we are again considering the set of all vectors y such that $y = c_1a_1 + c_2a_2$. A visual way to think about how the set of all such vectors y looks is found in the question, "which vectors can we create by taking a stretch of $a_1$ and adding this to a stretch of $a_2$?" If we consider a plot of the given two vectors $a_1$ and $a_2$ and think of the "grid" that is formed by considering all of their stretches and the sums of their stretches, we have the picture shown in figure 1.10. The fact that $a_1$ and $a_2$ are not parallel enables us to "get off the line" that each one generates through the origin. For example, if we simply take the sum of these two vectors and set $y = a_1 + a_2$, by the parallelogram law of vector addition we arrive at the new vector $[3 \ 3]^T$ shown in figure 1.10. Two other linear combinations are shown as well, and from here it is not hard to visualize the fact that we can create any vector in the plane using linear combinations of the non-parallel vectors $a_1$ and $a_2$. In other words, the set of all linear combinations of $a_1$ and $a_2$ is $\mathbb{R}^2$.

It is also possible to verify our findings in example 1.4.3 algebraically. We will explore this further in the exercises and in section 1.5. Certainly we are not limited to considering linear combinations of only two vectors. We therefore introduce a more formal perspective and terminology to describe the phenomena examined in the above examples.

Definition 1.4.1 Given a set of vectors $S = \{v_1, \ldots, v_k\}$, $v_i \in \mathbb{R}^m$, the span of S, denoted Span(S) or Span$\{v_1, \ldots, v_k\}$, is the set of all linear combinations of the vectors $v_1, \ldots, v_k$. Equivalently, Span(S) is the set of all vectors y of the form

\[
y = c_1v_1 + \cdots + c_kv_k
\]


where $c_1, \ldots, c_k$ are scalars. We also say that Span(S) is the subset of $\mathbb{R}^m$ spanned by the vectors $v_1, \ldots, v_k$.

For any single nonzero vector $v_1 \in \mathbb{R}^m$, Span$\{v_1\}$ consists of all vectors that lie on the line through the origin in $\mathbb{R}^m$ in the direction of $v_1$. For two nonparallel vectors $v_1, v_2 \in \mathbb{R}^m$, Span$\{v_1, v_2\}$ is the plane through the origin that contains both the vectors $v_1$ and $v_2$.

Next, let us recall that our interest in linear combinations was motivated by a desire to look at systems of linear equations from a new perspective. How is the concept of span related to linear systems? We begin to answer this question by considering the special situation where b = 0. A system of linear equations that can be represented in matrix form by the equation Ax = 0 is said to be homogeneous; the case when $b \neq 0$ is termed nonhomogeneous. We also call the equation Ax = 0 a homogeneous equation. By the definition of matrix–vector multiplication, it is immediately clear that A0 = 0 (note that these two zero vectors may be of different sizes), and thus any homogeneous equation has at least one solution and is guaranteed to be consistent. We will usually call the solution x = 0 the trivial solution. Under what circumstances will a homogeneous system have nontrivial solutions? How is this question related to the span of a set of vectors? The following example provides insight into these questions.

Example 1.4.4 Solve the homogeneous system of linear equations given by the equation Ax = 0 where A is the matrix

\[
A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & 3 \\ 1 & 0 & -2 & 2 \\ 8 & 5 & -1 & 11 \end{bmatrix}
\]

If more than one solution exists, express the solution in parametric vector form.

Solution. To begin, we augment the matrix A with a column of zeros to represent the vector 0 in the system given by Ax = 0. We then row-reduce this augmented matrix to find

\[
\left[\begin{array}{cccc|c} 1 & 1 & 1 & 1 & 0 \\ 2 & 1 & -1 & 3 & 0 \\ 1 & 0 & -2 & 2 & 0 \\ 8 & 5 & -1 & 11 & 0 \end{array}\right]
\to
\left[\begin{array}{cccc|c} 1 & 0 & -2 & 2 & 0 \\ 0 & 1 & 3 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]
\]

We observe that the system has two free variables, and therefore infinitely many solutions. In particular, these solutions must satisfy the equations

\[
\begin{aligned}
x_1 - 2x_3 + 2x_4 &= 0 \\
x_2 + 3x_3 - x_4 &= 0
\end{aligned}
\]


where $x_3$ and $x_4$ are free. Equivalently, using these equations and vector addition and scalar multiplication, it must be the case that any solution x to Ax = 0 has the form

\[
x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} 2x_3 - 2x_4 \\ -3x_3 + x_4 \\ x_3 \\ x_4 \end{bmatrix}
= x_3 \begin{bmatrix} 2 \\ -3 \\ 1 \\ 0 \end{bmatrix} + x_4 \begin{bmatrix} -2 \\ 1 \\ 0 \\ 1 \end{bmatrix}
\qquad (1.4.1)
\]

where $x_3, x_4 \in \mathbb{R}$. Note particularly that this shows that every solution x to the original homogeneous equation Ax = 0 can be expressed as a linear combination of the two vectors on the rightmost side of (1.4.1). Moreover, it is also the case that every linear combination of these two vectors is a solution to the equation. In light of the terminology of span, we can say that the set of all solutions to the homogeneous equation Ax = 0 is Span$\{v_1, v_2\}$, where

\[
v_1 = \begin{bmatrix} 2 \\ -3 \\ 1 \\ 0 \end{bmatrix}, \quad
v_2 = \begin{bmatrix} -2 \\ 1 \\ 0 \\ 1 \end{bmatrix}
\]

In this section, we have seen that the set of all linear combinations of a set of vectors can be interpreted geometrically, particularly in the case when we only have one or two vectors present, by thinking about lines and planes. In addition, the span of a set of vectors arises naturally in considering homogeneous equations in which infinitely many solutions are present. In that situation, the set of all solutions can be expressed as the span of a set of k vectors, where k is the number of free variables that arise in row-reducing the augmented matrix.

Exercises 1.4

In exercises 1–6, solve the homogeneous equation Ax = 0, given the matrix A. If infinitely many solutions exist, express the solution set as the span of the smallest possible set of vectors.

1. $A = \begin{bmatrix} 1 & -3 & 2 \\ -4 & 1 & 0 \end{bmatrix}$

2. $A = \begin{bmatrix} -4 & 2 \\ 1 & -3 \\ 6 & 5 \end{bmatrix}$

3. $A = \begin{bmatrix} 8 & -5 \\ 10 & -16 \end{bmatrix}$

4. $A = \begin{bmatrix} 2 & -4 \\ 2 & -1 \end{bmatrix}$

5. $A = \begin{bmatrix} 3 & 1 & -1 \\ 1 & 3 & 1 \\ -1 & 1 & 3 \end{bmatrix}$

6. $A = \begin{bmatrix} 1 & -1 & 2 \\ 4 & -2 & 6 \\ -7 & 3 & -10 \end{bmatrix}$

7. Let A be an m × n matrix where n > m. Is it possible that Ax = 0 has only the trivial solution? Explain why or why not.

8. Let A be an m × n matrix where n ≤ m. Is it guaranteed that Ax = 0 will have only the trivial solution? Explain why or why not.

9. Determine if the vector $b = [11 \ -4]^T$ is in the span of the vectors $a_1 = [3 \ -2]^T$ and $a_2 = [-9 \ 6]^T$. Justify your answer carefully.

10. Determine if the vector $b = [-17 \ 31]^T$ is in the span of the vectors $a_1 = [1 \ 0]^T$ and $a_2 = [0 \ 1]^T$. What do you observe?

11. Determine if the vector $b = [9 \ 17 \ 11]^T$ is in the span of the vectors $a_1 = [-1 \ 2 \ 1]^T$, $a_2 = [3 \ 1 \ 1]^T$, and $a_3 = [1 \ 5 \ 3]^T$. Justify your answer.

12. Explain why the vector $b = [3 \ 2]^T$ does not lie in the span of the set S, where $S = \{v\}$ and $v = [1 \ 1]^T$.

13. Describe geometrically the set $W = $ Span$\{v_1, v_2\}$, where $v_1 = [1 \ 1 \ 1]^T$ and $v_2 = [-3 \ 0 \ 2]^T$.

14. Can every vector $b \in \mathbb{R}^3$ be found in $W = $ Span$\{v_1, v_2\}$, where $v_1 = [1 \ 1 \ 1]^T$ and $v_2 = [-3 \ 0 \ 2]^T$? If so, explain why. If not, find a vector not in W and justify your answer.

15. Show that every point (vector) that lies on the line with equation $2x_1 - 3x_2 = 0$ also lies in the set $W = $ Span$\{v_1\}$, where $v_1 = [3 \ 2]^T$.

16. Show that every point (vector) that lies on the plane with equation $-x + y + z = 0$ also lies in the set $W = $ Span$\{v_1, v_2\}$, where $v_1 = [1 \ -1 \ 2]^T$ and $v_2 = [2 \ 1 \ 1]^T$.

17. Decide whether each of the following sentences is true or false. In every case, write one sentence to support your answer.
(a) The span of a single nonzero vector in $\mathbb{R}^2$ can be thought of as a line through the origin.
(b) The span of any two nonzero vectors in $\mathbb{R}^3$ can be viewed as a plane through the origin in $\mathbb{R}^3$.
(c) If Ax = b holds true for a given matrix A and vectors x and b, then x lies in the span of the columns of A.
(d) It is possible for a homogeneous equation Ax = 0 to be inconsistent.
(e) The number of free variables present in the solution to Ax = 0 is the same as the number of pivot columns in the matrix A.

1.5 Systems of linear equations revisited

From our initial work with row-reducing a system of linear equations to our recent discussions of linear combinations and span, we have seen already that there are several perspectives from which to view a system of linear equations. One is purely algebraic: "is there at least one ordered list $(x_1, \ldots, x_n)$ that makes every equation in a given system true?" Here we are viewing the system in the form

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m
\end{aligned}
\]

In light of linear combinations, we can rephrase this question geometrically as "is the vector b a linear combination of the vectors $a_1, \ldots, a_n$?", where $a_i$ is the ith column of the coefficient matrix of the system. From this standpoint, asking if the system has a solution can be thought of in terms of the question, "does the vector b belong to the span of the columns of A?" Finally, through matrix multiplication, we can also express this system of equations in its simplest form: Ax = b. From all of this, we know that the question, "Does Ax = b have at least one solution?" is one of fundamental importance.

We have also seen that in the special case of the homogeneous equation Ax = 0, the answer to the above questions is always affirmative, since setting x = 0 guarantees that we have at least one solution. In what follows, we further explore the nonhomogeneous case Ax = b, with particular emphasis on understanding characteristics of the matrix A that enable us to answer the questions in the preceding paragraph. We begin by revisiting example 1.4.2 from a more algebraic perspective.

Example 1.5.1 For which vectors b is the equation Ax = b consistent, if A is the matrix whose columns are the vectors $a_1 = [2 \ 1]^T$ and $a_2 = [-1 \ -\tfrac{1}{2}]^T$?

Solution. By the definition of matrix multiplication, this question is equivalent to asking, "which vectors b are linear combinations of the columns of A?" This question may be equivalently rephrased as "which vectors b are in the span of the columns of A?" We have already answered this question from a geometric perspective in example 1.4.2, where we saw that since $a_1$ and $a_2$ are parallel, it follows that every vector in $\mathbb{R}^2$ that lies on the line through the origin in


the direction of $a_1$ can be written as a linear combination of the two vectors. Nonetheless, it is insightful to explore algebraically why this is the case. Letting b be the vector whose entries are $b_1$ and $b_2$ and writing the equation Ax = b in the form of an augmented matrix, we row-reduce and find that

\[
\left[\begin{array}{cc|c} 2 & -1 & b_1 \\ 1 & -\tfrac{1}{2} & b_2 \end{array}\right]
\to
\left[\begin{array}{cc|c} 1 & -\tfrac{1}{2} & b_2 \\ 0 & 0 & b_1 - 2b_2 \end{array}\right]
\]

The second row in the augmented matrix represents the equation

\[
0x_1 + 0x_2 = b_1 - 2b_2
\]

Observe that if $b_1 - 2b_2 \neq 0$, this equation cannot possibly be true, and therefore the system would be inconsistent. Said differently, the only way for Ax = b to be consistent is for $b_1 - 2b_2 = 0$. That is, if b is a vector such that $b_1 = 2b_2$, or

\[
b = \begin{bmatrix} 2b_2 \\ b_2 \end{bmatrix}
\]

then Ax = b is consistent. This makes sense geometrically, since the span of the columns of A is all the stretches of the vector $a_1 = [2 \ 1]^T$.

An important lesson to take from example 1.5.1 is that the equation Ax = b discussed there is not consistent for every choice of b. In fact, the equation is only consistent for very limited choices of b. For example, if $b = [6 \ 3]^T$, the equation is consistent, but if $b = [6 \ k]^T$ for any $k \neq 3$, the equation is inconsistent. Moreover, we should observe that for the matrix in this example, A does not have a pivot position in every row. This is what ultimately leads to the algebraic equation $0x_1 + 0x_2 = b_1 - 2b_2$, and the potential inconsistency of Ax = b.

At this point in our work, it is important that we begin to generalize our observations in order to apply them in new, but similar, circumstances. We again emphasize that it is a noteworthy characteristic of linear algebra that the discipline often offers great flexibility through the large number of ways to say the same thing; at times, one way of stating a fact can give more insight than others, and therefore it is important to be well versed in shifting among multiple perspectives.
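The consistency condition from example 1.5.1 can be illustrated with a short sketch in exact arithmetic. The function name `solve_if_consistent` is hypothetical; it simply encodes the zero row $0x_1 + 0x_2 = b_1 - 2b_2$ from the row reduction and, when the system is consistent, returns the particular solution obtained by setting the free variable $x_2$ to 0 in $x_1 - \tfrac{1}{2}x_2 = b_2$.

```python
from fractions import Fraction

# Columns of A are a1 = [2 1]^T and a2 = [-1 -1/2]^T from example 1.5.1.
A = [[Fraction(2), Fraction(-1)],
     [Fraction(1), Fraction(-1, 2)]]

def mat_vec(A, x):
    """Return the product Ax."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def solve_if_consistent(b):
    """Return one solution of Ax = b, or None when the system is inconsistent."""
    b1, b2 = (Fraction(v) for v in b)
    if b1 - 2 * b2 != 0:            # the row 0x1 + 0x2 = b1 - 2b2 cannot hold
        return None
    return [b2, Fraction(0)]        # free variable x2 set to 0

for b in ([6, 3], [6, 4], [-1, Fraction(-1, 2)]):
    x = solve_if_consistent(b)
    if x is None:
        print(b, "-> inconsistent")
    else:
        print(b, "-> consistent; one solution is", x, "check:", mat_vec(A, x) == list(b))
```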
The following theorem is of the form "the following statements are equivalent"; this means that if any one of the statements is true, all the others are as well. Likewise, if any one statement is false, every statement in the theorem must be false. This theorem formalizes our findings in the example above, and, in some sense, our work in the first several sections of the text.

Theorem 1.5.1 Let A be an m × n matrix and b a vector in $\mathbb{R}^m$ so that the equation Ax = b represents a system of m linear equations in n unknown variables. The following statements are equivalent:

a. The equation Ax = b is consistent
b. The vector b is a linear combination of the columns of A
c. The vector b is in the span of the columns of A
d. When the augmented matrix [A b] is row-reduced, there are no rows where the first n entries are zero and the last entry is nonzero.

The following example demonstrates how we can use theorem 1.5.1 to answer questions about span and linear combinations.

Example 1.5.2 Does the vector $b = [1 \ -7 \ -14]^T$ belong to the span of the vectors $a_1 = [1 \ 3 \ 4]^T$, $a_2 = [2 \ 1 \ -1]^T$, and $a_3 = [0 \ 5 \ 9]^T$? Does the result change if we ask the same question about the vector $c = [1 \ -7 \ -13]^T$?

Solution. By theorem 1.5.1, we know that it is equivalent to ask if the equation Ax = b is consistent, where b is the given vector and A is the matrix whose columns are $a_1$, $a_2$, and $a_3$. To answer that question, we consider the augmented matrix [A | b] and row-reduce:

\[
\left[\begin{array}{ccc|c} 1 & 2 & 0 & 1 \\ 3 & 1 & 5 & -7 \\ 4 & -1 & 9 & -14 \end{array}\right]
\to
\left[\begin{array}{ccc|c} 1 & 0 & 2 & -3 \\ 0 & 1 & -1 & 2 \\ 0 & 0 & 0 & 0 \end{array}\right]
\]

Because this system of equations is consistent, it follows that b is indeed a linear combination of the columns of A and therefore b lies in the span of $a_1$, $a_2$, and $a_3$. If we instead consider the vector c stated in the example and proceed similarly, row-reduction shows that

\[
\left[\begin{array}{ccc|c} 1 & 2 & 0 & 1 \\ 3 & 1 & 5 & -7 \\ 4 & -1 & 9 & -13 \end{array}\right]
\to
\left[\begin{array}{ccc|c} 1 & 0 & 2 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 1 \end{array}\right]
\]

which implies that the system is inconsistent and therefore c is not a linear combination of the columns of A, or equivalently, c does not lie in the span of $a_1$, $a_2$, and $a_3$.

At this point, it is natural to think the situations in examples 1.5.1 and 1.5.2 are somewhat dissatisfying: sometimes Ax = b is consistent, and sometimes not, all depending on our choice of b. A natural question to ask is, "are there matrices A for which Ax = b is consistent for every choice of b?" With that question, we are certainly interested in the properties of the matrix A that make this situation occur. We next revisit example 1.4.3 and explore these issues further.
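The row reductions in example 1.5.2 can be reproduced with a small Gauss-Jordan routine in exact arithmetic. This is a minimal sketch, not the text's method of computation (the book uses hand reduction and Maple); the names `rref` and `is_consistent` are hypothetical, and `is_consistent` checks statement (d) of theorem 1.5.1 directly.

```python
from fractions import Fraction

def rref(rows):
    """Return the reduced row echelon form of a matrix given as a list of rows."""
    M = [[Fraction(v) for v in row] for row in rows]
    pivot_row = 0
    for col in range(len(M[0])):
        # Look for a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((r for r in range(pivot_row, len(M)) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[pivot_row], M[pivot] = M[pivot], M[pivot_row]
        lead = M[pivot_row][col]
        M[pivot_row] = [v / lead for v in M[pivot_row]]
        for r in range(len(M)):
            if r != pivot_row and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[pivot_row])]
        pivot_row += 1
        if pivot_row == len(M):
            break
    return M

def is_consistent(A, b):
    """Statement (d) of theorem 1.5.1: no row of the form [0 ... 0 | nonzero]."""
    reduced = rref([row + [bi] for row, bi in zip(A, b)])
    return not any(all(v == 0 for v in row[:-1]) and row[-1] != 0 for row in reduced)

A = [[1, 2, 0], [3, 1, 5], [4, -1, 9]]   # columns are a1, a2, a3
b = [1, -7, -14]
c = [1, -7, -13]
for name, rhs in (("b", b), ("c", c)):
    print(name, "in the span of a1, a2, a3:", is_consistent(A, rhs))
```

Applying `rref` to the coefficient matrix alone also shows whether A has a pivot in every row, which is the question taken up next.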
Example 1.5.3 For which vectors b is the equation Ax = b consistent, if A is the matrix whose columns are the vectors $a_1 = [2 \ 1]^T$ and $a_2 = [1 \ 2]^T$?

Solution. Proceeding as in the previous example, we row-reduce the augmented matrix form of the equation and find that

\[
\left[\begin{array}{cc|c} 2 & 1 & b_1 \\ 1 & 2 & b_2 \end{array}\right]
\to
\left[\begin{array}{cc|c} 1 & 0 & \tfrac{2}{3}b_1 - \tfrac{1}{3}b_2 \\ 0 & 1 & -\tfrac{1}{3}b_1 + \tfrac{2}{3}b_2 \end{array}\right]
\]

Algebraically, this shows that regardless of the entries we select for the vector b, we can always find a solution to the equation Ax = b. In particular, x is the vector in $\mathbb{R}^2$ whose components are $x_1 = \tfrac{2}{3}b_1 - \tfrac{1}{3}b_2$ and $x_2 = -\tfrac{1}{3}b_1 + \tfrac{2}{3}b_2$. Thus the equation Ax = b is consistent for every b in $\mathbb{R}^2$. Note that this is not surprising, given our work in example 1.4.3, where we found that from a geometric perspective, every vector $b \in \mathbb{R}^2$ could be written as a linear combination of $a_1$ and $a_2$. This example simply confirms that finding, but now from an algebraic point of view.

In terms of a key property of the matrix in example 1.5.3, we see that A has a pivot position in every row. In particular, there is no row in RREF(A) where we encounter all zeros, and thus it is impossible to ever encounter an equation of the form 0 = k, where $k \neq 0$. This is, therefore, one property of the matrix A that guarantees consistency for every choice of b. We generalize our findings in this example in the following theorem, which is similar to theorem 1.5.1, but now focuses solely on the matrix A and no longer requires a vector b to be initially chosen.

Theorem 1.5.2 Let A be an m × n matrix. The following statements are equivalent:

a. The equation Ax = b is consistent for every $b \in \mathbb{R}^m$
b. Every vector $b \in \mathbb{R}^m$ is a linear combination of the columns of A
c. The span of the columns of A is $\mathbb{R}^m$
d. A has a pivot position in every row. That is, when the matrix A is row-reduced, there are no rows of all zeros.

Our next example shows how we can apply theorem 1.5.2 to answer general questions about the span of a set of vectors and the consistency of related systems of equations.

Example 1.5.4 Does the vector $b = [1 \ -7 \ -13]^T$ belong to the span of the vectors $a_1 = [1 \ 3 \ 4]^T$, $a_2 = [2 \ 1 \ -1]^T$, and $a_3 = [0 \ 5 \ 10]^T$? Can every vector in $\mathbb{R}^3$ be found in the span of the vectors $a_1$, $a_2$, and $a_3$?

Solution. Just as in example 1.5.2, we know by theorem 1.5.1 that it is equivalent to ask if the equation Ax = b is consistent, where b is the given vector and A is the matrix whose columns are $a_1$, $a_2$, and $a_3$. We thus consider


the augmented matrix [A | b] and row-reduce:

\[
\left[\begin{array}{ccc|c} 1 & 2 & 0 & 1 \\ 3 & 1 & 5 & -7 \\ 4 & -1 & 10 & -13 \end{array}\right]
\to
\left[\begin{array}{ccc|c} 1 & 0 & 0 & -5 \\ 0 & 1 & 0 & 3 \\ 0 & 0 & 1 & 1 \end{array}\right]
\]

Because this system of equations is consistent, it follows that b is indeed a linear combination of the columns of A and therefore b lies in the span of $a_1$, $a_2$, and $a_3$. But by theorem 1.5.2 we can now make a much more general observation. Because we see that the coefficient matrix A has a pivot in every row, it follows that regardless of which vector b we choose in $\mathbb{R}^3$, we can write that vector as a linear combination of the columns of A. That is, the vectors $a_1$, $a_2$, and $a_3$ span all of $\mathbb{R}^3$ and the equation Ax = b will be consistent for every choice of b.

This example demonstrates that it is in some sense ideal if a matrix A has a pivot in every row. As we proceed with further study of linear algebra, we will focus more and more on properties of the coefficient matrix and their implications for related systems of equations. We conclude this section by examining a key link between homogeneous and nonhomogeneous equations in order to foreshadow an essential concept in our pending study of differential equations.

Example 1.5.5 Solve the nonhomogeneous system of linear equations given by the equation Ax = b where A and b are

\[
A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & 3 \\ 1 & 0 & -2 & 2 \\ 8 & 5 & -1 & 11 \end{bmatrix}, \quad
b = \begin{bmatrix} 1 \\ -8 \\ -9 \\ -22 \end{bmatrix}
\]

If more than one solution exists, express the solution in parametric vector form.

Solution. Note that the coefficient matrix A is identical to the one in example 1.4.4, so that here we are simply considering a related nonhomogeneous equation. We augment the matrix A with b and then row-reduce to find

\[
\left[\begin{array}{cccc|c} 1 & 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & 3 & -8 \\ 1 & 0 & -2 & 2 & -9 \\ 8 & 5 & -1 & 11 & -22 \end{array}\right]
\to
\left[\begin{array}{cccc|c} 1 & 0 & -2 & 2 & -9 \\ 0 & 1 & 3 & -1 & 10 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]
\]

As we found with the homogeneous equation, the system is consistent and has two free variables, and therefore infinitely many solutions. These solutions must satisfy the equations

\[
\begin{aligned}
x_1 &= -9 + 2x_3 - 2x_4 \\
x_2 &= 10 - 3x_3 + x_4
\end{aligned}
\]


where $x_3$ and $x_4$ are free. Equivalently, it must be the case that any solution x has the form

\[
x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} -9 + 2x_3 - 2x_4 \\ 10 - 3x_3 + x_4 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} -9 \\ 10 \\ 0 \\ 0 \end{bmatrix}
+ x_3 \begin{bmatrix} 2 \\ -3 \\ 1 \\ 0 \end{bmatrix}
+ x_4 \begin{bmatrix} -2 \\ 1 \\ 0 \\ 1 \end{bmatrix}
\]

where $x_3, x_4 \in \mathbb{R}$. Observe that if we let $x_p = [-9 \ 10 \ 0 \ 0]^T$ and let $x_h$ be any vector of the form

\[
x_h = t \begin{bmatrix} 2 \\ -3 \\ 1 \\ 0 \end{bmatrix} + s \begin{bmatrix} -2 \\ 1 \\ 0 \\ 1 \end{bmatrix}
\]

then any solution to the equation Ax = b has the form $x = x_p + x_h$. Moreover, it is now apparent that this vector $x_h$ is the same general solution vector that we found for the corresponding homogeneous equation in example 1.4.4. In addition, it is straightforward to check that $Ax_p = b$. Thus, we see that the general solution to the nonhomogeneous equation contains the general solution to the corresponding homogeneous equation.

It appears from example 1.5.5 that if we have a solution, say $x_p$, to a nonhomogeneous equation Ax = b, we may add any solution $x_h$ to the homogeneous equation Ax = 0 to $x_p$ and still have a solution to Ax = b. To see why any vector of the form $x_p + x_h$ is a solution to Ax = b, let us assume that $x_p$ is a solution to Ax = b, and $x_h$ is a solution to Ax = 0. We claim that $x = x_p + x_h$ is also a solution to Ax = b. This holds since

\[
Ax = A(x_p + x_h) = Ax_p + Ax_h = b + 0 = b \qquad (1.5.1)
\]

Clearly, this shows that the solution to the corresponding homogeneous equation plays a central role in the solution of nonhomogeneous equations. One observation we can make is that in the event we can ﬁnd a single particular solution xp to the nonhomogeneous equation, if the corresponding homogeneous equation has at least one free variable, then we know that there must be inﬁnitely many solutions to the nonhomogeneous equation as well. We could even take the perspective that, in order to solve a nonhomogeneous equation, we simply need to do two things: ﬁnd one particular solution to Ax = b, and then combine that particular solution with the general solution to the corresponding homogeneous equation Ax = 0. While this is not so useful with systems of linear algebraic equations, it turns out that this approach of solving the homogeneous equation ﬁrst is essential in the solution of differential equations.
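The structure $x = x_p + x_h$ from example 1.5.5 is easy to spot-check numerically. The sketch below uses only the matrix and vectors stated in the example; the helper names `mat_vec` and `general_solution` are hypothetical, and a finite grid of integer weights is of course a sanity check rather than a proof.

```python
A = [[1, 1, 1, 1],
     [2, 1, -1, 3],
     [1, 0, -2, 2],
     [8, 5, -1, 11]]
b = [1, -8, -9, -22]

xp = [-9, 10, 0, 0]      # particular solution of Ax = b
v1 = [2, -3, 1, 0]       # the two vectors spanning the solutions of Ax = 0
v2 = [-2, 1, 0, 1]

def mat_vec(A, x):
    """Return the product Ax."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def general_solution(t, s):
    """Return xp + t*v1 + s*v2."""
    return [p + t * a + s * c for p, a, c in zip(xp, v1, v2)]

print("A xp == b:", mat_vec(A, xp) == b)
print("A v1, A v2:", mat_vec(A, v1), mat_vec(A, v2))
print("A(xp + xh) == b on a grid of weights:",
      all(mat_vec(A, general_solution(t, s)) == b
          for t in range(-3, 4) for s in range(-3, 4)))
```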


The following example shows how the same structure is present in a class of differential equations that we will discuss in detail in section 2.3.

Example 1.5.6 Consider the differential equations $y' + 3y = 0$ and $y' + 3y = 6$. Compare and contrast the solutions to these two equations.

Solution. The first equation, $y' + 3y = 0$, we will call a homogeneous linear first-order differential equation. Note that it asks a straightforward question: what function $y(t)$ is such that the function's derivative plus 3 times itself is the zero function? Said differently, we seek a function y such that $y' = -3y$. From our experience with exponential functions in calculus, we know that if $y = e^{-3t}$, then $y' = -3e^{-3t}$. The same is true for functions like $y = 2e^{-3t}$ and $y = -5e^{-3t}$; indeed, we see that for any constant C, the function $y = Ce^{-3t}$ satisfies the differential equation. (It also turns out that these are the only functions that satisfy the differential equation.)

If we next consider the related differential equation $y' + 3y = 6$, one that we will call a nonhomogeneous linear first-order differential equation, we see that there is one obvious solution to the equation. In particular, if we let $y(t)$ be the constant function $y(t) = 2$, then $y'(t) = 0$ and this function clearly makes the differential equation true since 3 × 2 = 6. Now, we should wonder if we have found all of the possible solutions to $y' + 3y = 6$. The answer is no: as we will see in section 2.3, it turns out that the general solution y to this differential equation is

\[
y(t) = 2 + Ce^{-3t}
\]

We can verify that this is the case by direct substitution. Note that $y' = -3Ce^{-3t}$ and therefore

\[
y' + 3y = -3Ce^{-3t} + 3\left(2 + Ce^{-3t}\right) = -3Ce^{-3t} + 6 + 3Ce^{-3t} = 6
\]

Observe the structure of this solution function: if we let $y_p = 2$, we have a particular solution to the nonhomogeneous equation. Further, letting $y_h = Ce^{-3t}$, this is the general solution to the related homogeneous equation.
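The substitution check can also be carried out numerically. The following sketch evaluates the residual $y' + 3y - 6$ for the proposed solution $y(t) = 2 + Ce^{-3t}$ at a few sample values of C and t; the function names are hypothetical, and the sample points are arbitrary choices made for illustration.

```python
import math

def y(t, C):
    """Proposed general solution y(t) = 2 + C e^{-3t}."""
    return 2 + C * math.exp(-3 * t)

def y_prime(t, C):
    """Its derivative, y'(t) = -3 C e^{-3t}."""
    return -3 * C * math.exp(-3 * t)

def residual(t, C):
    """Left side minus right side of y' + 3y = 6."""
    return y_prime(t, C) + 3 * y(t, C) - 6

worst = max(abs(residual(t, C))
            for C in (0, 1, 2, -5)
            for t in (0.0, 0.5, 1.0, 2.0))
print("largest residual:", worst)   # zero up to floating-point rounding
```

Taking C = 0 recovers the particular solution $y_p = 2$ on its own, so the same check covers both pieces of the solution.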
This demonstrates that the overall solution to the nonhomogeneous equation is
$$y = y_p + y_h = 2 + Ce^{-3t}$$

Exercises 1.5

For each of the following $m \times n$ matrices $A$ in exercises 1–8, determine whether the equation $A\mathbf{x} = \mathbf{b}$ is consistent for every choice of $\mathbf{b} \in \mathbb{R}^m$. If not, describe the set of all $\mathbf{b} \in \mathbb{R}^m$ for which the equation is consistent. In each case, explain your reasoning fully.

1. $A = \begin{bmatrix} 4 & -1 \\ 1 & -4 \end{bmatrix}$
2. $A = \begin{bmatrix} 4 & -1 \\ -12 & 3 \end{bmatrix}$


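3. $A = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & -3 \end{bmatrix}$
4. $A = \begin{bmatrix} 2 & 1 \\ -1 & 3 \\ 4 & -2 \end{bmatrix}$
5. $A = \begin{bmatrix} 1 & 5 & -2 \\ 2 & -1 & 7 \\ -3 & 4 & -14 \end{bmatrix}$
6. $A = \begin{bmatrix} 1 & 5 & -2 \\ 2 & -1 & 7 \\ -3 & 4 & -13 \end{bmatrix}$
7. $A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$
8. $A = \begin{bmatrix} 1 & 0 & 0 & 2 \\ 0 & 1 & 0 & 5 \\ 0 & 0 & 1 & -3 \end{bmatrix}$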

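9. If $A$ is an $m \times n$ matrix and $m > n$, is it possible for the equation $A\mathbf{x} = \mathbf{b}$ to be consistent for every $\mathbf{b} \in \mathbb{R}^m$? Explain.
10. If $A$ is an $m \times n$ matrix and $m \le n$, is it guaranteed that the equation $A\mathbf{x} = \mathbf{b}$ will be consistent for every $\mathbf{b} \in \mathbb{R}^m$? Explain.

In each of exercises 11–16, determine whether the given vector $\mathbf{b}$ is in the span of the columns of the given matrix $A$. If $\mathbf{b}$ lies in the span of the columns of $A$, determine weights that enable you to explicitly write $\mathbf{b}$ as a linear combination of the columns of $A$.

11. $\mathbf{b} = \begin{bmatrix} 2 \\ 5 \end{bmatrix}$, $A = \begin{bmatrix} 4 & -1 \\ 1 & -4 \end{bmatrix}$
12. $\mathbf{b} = \begin{bmatrix} 6 \\ -20 \end{bmatrix}$, $A = \begin{bmatrix} 4 & -1 \\ -12 & 3 \end{bmatrix}$
13. $\mathbf{b} = \begin{bmatrix} 6 \\ -2 \end{bmatrix}$, $A = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & -3 \end{bmatrix}$
14. $\mathbf{b} = \begin{bmatrix} 1 \\ -11 \\ 14 \end{bmatrix}$, $A = \begin{bmatrix} 2 & 1 \\ -1 & 3 \\ 4 & -2 \end{bmatrix}$
15. $\mathbf{b} = \begin{bmatrix} -4 \\ -2 \\ 1 \end{bmatrix}$, $A = \begin{bmatrix} 1 & 5 & -2 \\ 2 & -1 & 7 \\ -3 & 4 & -14 \end{bmatrix}$
16. $\mathbf{b} = \begin{bmatrix} -4 \\ -2 \\ 1 \end{bmatrix}$, $A = \begin{bmatrix} 1 & 5 & -2 \\ 2 & -1 & 7 \\ -3 & 4 & -13 \end{bmatrix}$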

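For each matrix $A$ given in exercises 17–21, determine the general solution $\mathbf{x}_h$ to the homogeneous equation $A\mathbf{x} = \mathbf{0}$.

17. $A = \begin{bmatrix} 1 & -3 & 2 \\ -4 & 1 & 3 \end{bmatrix}$
18. $A = \begin{bmatrix} 1 & 2 & 0 & 1 \\ 3 & 1 & 5 & -7 \\ 4 & -1 & 10 & -13 \end{bmatrix}$
19. $A = \begin{bmatrix} 8 & -5 \\ 10 & -16 \end{bmatrix}$
20. $A = \begin{bmatrix} 3 & 1 & -1 \\ 1 & 3 & 1 \\ -1 & 1 & 3 \end{bmatrix}$
21. $A = \begin{bmatrix} 1 & -1 & 2 \\ 4 & -2 & 6 \\ -7 & 3 & -10 \end{bmatrix}$

In exercises 22–26, solve the nonhomogeneous equation $A\mathbf{x} = \mathbf{b}$, given the matrix $A$ and vector $\mathbf{b}$. Express your solution $\mathbf{x}$ (if one exists) in the form $\mathbf{x} = \mathbf{x}_p + \mathbf{x}_h$, where $\mathbf{x}_p$ is a particular solution to $A\mathbf{x} = \mathbf{b}$ and $\mathbf{x}_h$ is the solution to the corresponding homogeneous equation $A\mathbf{x} = \mathbf{0}$. Compare your results to exercises 17–21, respectively.

22. $A = \begin{bmatrix} 1 & -3 & 2 \\ -4 & 1 & 3 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} 5 \\ -9 \end{bmatrix}$
23. $A = \begin{bmatrix} 1 & 2 & 0 & 1 \\ 3 & 1 & 5 & -7 \\ 4 & -1 & 10 & -13 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix}$
24. $A = \begin{bmatrix} 8 & -5 \\ 10 & -16 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} -21 \\ 42 \end{bmatrix}$
25. $A = \begin{bmatrix} 3 & 1 & -1 \\ 1 & 3 & 1 \\ -1 & 1 & 3 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} 3 \\ -1 \\ 1 \end{bmatrix}$
26. $A = \begin{bmatrix} 1 & -1 & 2 \\ 4 & -2 & 6 \\ -7 & 3 & -10 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} 5 \\ 16 \\ -27 \end{bmatrix}$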


27. Suppose that A is a 6 × 9 matrix that has a pivot in every row. What can you say about the consistency of Ax = b for every b ∈ R6 ? Why? 28. Suppose that A is a 3 × 4 matrix and that the span of the columns of A is R3 . What can you say about the consistency of Ax = b for every b ∈ R3 ? Why? 29. If possible, give an example of a 3 × 2 matrix A such that the span of the columns of A is R3 . If ﬁnding such a matrix is impossible, explain why. 30. Suppose that A is a 4 × 3 matrix for which the homogeneous equation Ax = 0 has only the trivial solution. Will the equation Ax = b be consistent for every b ∈ R4 ? Explain. For the vectors b for which Ax = b is indeed a consistent equation, how many solution vectors x does each equation have? Why? 31. Suppose that A is a 3 × 4 matrix for which the homogeneous equation Ax = 0 has exactly one free variable present. Will the equation Ax = b be consistent for every b ∈ R3 ? Explain. For the vectors b for which Ax = b is indeed a consistent equation, how many solution vectors x does each equation have? Why? 32. Suppose that A is a 4 × 5 matrix for which the homogeneous equation Ax = 0 has exactly two free variables present. Will the equation Ax = b be consistent for every b ∈ R4 ? Explain. For the vectors b for which Ax = b is indeed a consistent equation, how many solution vectors x does each equation have? Why? 33. Decide whether each of the following sentences is true or false. In every case, write one sentence to support your answer. (a) If Ax = b is consistent for at least one vector b, then A has a pivot in every row. (b) If A is a 4 × 3 matrix, then it is possible for the columns of A to span R4 . (c) If A is a 3 × 3 matrix with exactly two pivot columns, then the columns of A do not span R3 . (d) If A is a 3 × 4 matrix, then the columns of A must span R3 . (e) If y and z are solutions to the equation Ax = 0, then the vector y + z is also a solution to Ax = 0. 
(f) If $\mathbf{y}$ and $\mathbf{z}$ are solutions to the equation $A\mathbf{x} = \mathbf{b}$, where $\mathbf{b} \ne \mathbf{0}$, then the vector $\mathbf{y} + \mathbf{z}$ is also a solution to $A\mathbf{x} = \mathbf{b}$.

34. Solve the linear first-order differential equation $y' + y = 3$ by first finding all functions $y_h$ that satisfy the homogeneous equation $y' + y = 0$ and then determining a constant function $y_p$ that is a solution to $y' + y = 3$. Verify by direct substitution that $y = y_h + y_p$ is a solution to the given equation.
35. Solve the linear first-order differential equation $y' - 5y = 6$ by first finding all functions $y_h$ that satisfy the homogeneous equation $y' - 5y = 0$ and then determining a constant function $y_p$ that is a solution to $y' - 5y = 6$. Verify by direct substitution that $y = y_h + y_p$ is a solution to the given equation.

1.6 Linear independence

In theorem 1.5.2, we found that when solving $A\mathbf{x} = \mathbf{b}$, an ideal situation occurs when $A$ has a pivot position in every row. Equivalently, this means that the equation $A\mathbf{x} = \mathbf{b}$ is guaranteed to have at least one solution for every vector $\mathbf{b} \in \mathbb{R}^m$ (when $A$ is $m \times n$), or that every $\mathbf{b} \in \mathbb{R}^m$ can be written as a linear combination of the columns of $A$. In other words, regardless of the choice of $\mathbf{b}$, the equation $A\mathbf{x} = \mathbf{b}$ is always consistent. Because the equation is consistent, we are guaranteed that at least one solution $\mathbf{x}$ exists. In what follows, we explore conditions that imply not only that at least one solution exists, but in fact that only one solution exists.

First, we consider the simpler situation of homogeneous equations. In section 1.4, we discovered that the equation $A\mathbf{x} = \mathbf{0}$ is always consistent. Because $\mathbf{x} = \mathbf{0}$ always makes this equation true, we know that we at least have the trivial solution present. It is natural to ask: under what conditions on $A$ is the trivial solution the only solution to the homogeneous equation $A\mathbf{x} = \mathbf{0}$? Geometrically, we are asking whether or not a nontrivial linear combination of the columns of $A$ can be formed that leads to the zero vector. We revisit an earlier example to further explore these issues.

Example 1.6.1 Does the equation $A\mathbf{x} = \mathbf{0}$ have nontrivial solutions if $A$ is the matrix whose columns are $\mathbf{a}_1 = [2 \;\; 1]^T$ and $\mathbf{a}_2 = [-1 \;\; -\tfrac{1}{2}]^T$? Discuss the geometric implications of your conclusions.

Solution. We first consider the corresponding augmented matrix and row reduce, finding that
$$\begin{bmatrix} 2 & -1 & 0 \\ 1 & -\tfrac{1}{2} & 0 \end{bmatrix} \to \begin{bmatrix} 1 & -\tfrac{1}{2} & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
This shows that any vector $\mathbf{x} = [x_1 \;\; x_2]^T$ that satisfies $x_1 = \tfrac{1}{2}x_2$ will be a solution to $A\mathbf{x} = \mathbf{0}$. The presence of the free variable $x_2$ implies that there are infinitely many nontrivial solutions to this equation. If we interpret the matrix–vector product $A\mathbf{x}$ as the linear combination $A\mathbf{x} = x_1\mathbf{a}_1 + x_2\mathbf{a}_2$, then the equation
$$\tfrac{1}{2}x_2\mathbf{a}_1 + x_2\mathbf{a}_2 = \mathbf{0}$$
implies geometrically that the zero vector (on the right) may be expressed as a nontrivial linear combination of $\mathbf{a}_1$ and $\mathbf{a}_2$. For example, $\mathbf{a}_1 + 2\mathbf{a}_2 = \mathbf{0}$.
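The conclusion of example 1.6.1 can be confirmed with a few lines of exact arithmetic. This is a minimal Python sketch under the assumption that vectors are stored as plain lists; the helper name `lin_comb` is hypothetical and introduced only for this illustration.

```python
from fractions import Fraction

def lin_comb(weights, vectors):
    """Return sum_i weights[i] * vectors[i], computed entry by entry."""
    n = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(n)]

# The columns of A from example 1.6.1.
a1 = [Fraction(2), Fraction(1)]
a2 = [Fraction(-1), Fraction(-1, 2)]

# Every choice of the free variable x2 gives a solution x = (x2/2, x2) of Ax = 0.
free_choices_ok = all(
    lin_comb([x2 / 2, x2], [a1, a2]) == [0, 0]
    for x2 in (Fraction(2), Fraction(-3), Fraction(10))
)

# The specific relation quoted in the example: a1 + 2*a2 = 0.
relation = lin_comb([1, 2], [a1, a2])
print(free_choices_ok, relation)
```

Using `Fraction` rather than floating-point numbers keeps the entry $-\tfrac{1}{2}$ exact, so the comparison with the zero vector is not affected by rounding.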

Figure 1.11 Linear combinations of a1 and a2 from example 1.6.1.

Indeed, if we consider figure 1.11, this conclusion is evident: if we add one length of $\mathbf{a}_1$ to two lengths of $\mathbf{a}_2$, we end up at $\mathbf{0}$. Another way to express the equation $\mathbf{a}_1 + 2\mathbf{a}_2 = \mathbf{0}$ is to write $\mathbf{a}_1 = -2\mathbf{a}_2$. In this setting, we can see that $\mathbf{a}_1$ depends on $\mathbf{a}_2$, and that the relationship is given by a linear equation. We hence say that $\mathbf{a}_1$ and $\mathbf{a}_2$ are linearly dependent vectors.

The situation in example 1.6.1, where the vectors $\mathbf{a}_1$ and $\mathbf{a}_2$ are parallel, is in contrast to that of example 1.4.3, where we instead considered the non-parallel vectors $\mathbf{a}_1 = [2 \;\; 1]^T$ and $\mathbf{a}_2 = [1 \;\; 2]^T$; in that setting, if we solve the associated homogeneous equation $A\mathbf{x} = \mathbf{0}$, we find that
$$\begin{bmatrix} 2 & 1 & 0 \\ 1 & 2 & 0 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$$
In this case, the only solution to $A\mathbf{x} = \mathbf{0}$ is the trivial solution, $\mathbf{x} = \mathbf{0}$. The geometry of the situation also informs us: if we desire a linear combination of the vectors $\mathbf{a}_1$ and $\mathbf{a}_2$ (as shown in figure 1.12) that results in the zero vector, we see that the only way to accomplish this is to take $0\mathbf{a}_1 + 0\mathbf{a}_2$. Said differently, if we take any nontrivial linear combination $c_1\mathbf{a}_1 + c_2\mathbf{a}_2$, we end up at a location other than the origin.

Figure 1.12 Linear combinations of a1 and a2 from example 1.4.3.

When $\mathbf{a}_1$ and $\mathbf{a}_2$ in example 1.6.1 were parallel, we said that $\mathbf{a}_1$ and $\mathbf{a}_2$ were linearly dependent. In the current context, where $\mathbf{a}_1$ and $\mathbf{a}_2$ are not parallel, it makes sense to say that $\mathbf{a}_1$ and $\mathbf{a}_2$ are linearly independent, since neither depends on the other.

Of course, in linear algebra we often consider sets of more than two vectors. The next definition formalizes what the terms linearly dependent and linearly independent mean in a more general context. Observe that the key criterion is a geometric one: can we form a nontrivial linear combination of vectors that results in $\mathbf{0}$?

Definition 1.6.1 Given a set $S = \{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ where each vector $\mathbf{v}_i \in \mathbb{R}^m$, the set $S$ is linearly dependent if there exists a nontrivial solution $\mathbf{x}$ to the vector equation
$$x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + \cdots + x_k\mathbf{v}_k = \mathbf{0} \tag{1.6.1}$$
If (1.6.1) has only the trivial solution, then we say the set $S$ is linearly independent.

Note that (1.6.1) also takes us back to the fundamental questions about any linear system of equations: "does at least one solution exist?" (Yes; the zero vector is always a solution.) And "is that solution unique?" (Maybe; only if the vectors are linearly independent and the zero vector is the only solution.) The latter question addresses the fundamental issue of linear independence.

We consider an example to demonstrate how we interpret the language of this most recent definition as well as how we will generally respond to the question of whether or not a set of vectors is linearly independent.

Example 1.6.2 Determine whether the set $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is linearly independent or linearly dependent if
$$\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$$


Solution. By definition, the linear independence of the set $S$ rests on whether or not nontrivial solutions exist to the vector equation $x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + x_3\mathbf{v}_3 = \mathbf{0}$. Letting $A = [\mathbf{v}_1 \; \mathbf{v}_2 \; \mathbf{v}_3]$, we know that this question is equivalent to determining whether or not $A\mathbf{x} = \mathbf{0}$ has a nontrivial solution. Considering the augmented matrix $[A \;\; \mathbf{0}]$ and row-reducing, we find
$$\begin{bmatrix} 1 & -1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ -1 & 1 & 1 & 0 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \tag{1.6.2}$$
It follows that $A\mathbf{x} = \mathbf{0}$ has only the trivial solution, and therefore the set $S$ is linearly independent. Geometrically, this means that if we take any nontrivial combination of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$, the result is a vector that is not the zero vector.

From example 1.6.2, we see how we will normally test a set of vectors for linear independence: we take advantage of our understanding of linear combinations and matrix multiplication and convert the vector equation $x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + \cdots + x_k\mathbf{v}_k = \mathbf{0}$ to the matrix equation $A\mathbf{x} = \mathbf{0}$, where $A$ is the matrix with columns $\mathbf{v}_1, \ldots, \mathbf{v}_k$. Row-reducing, we can test whether or not nontrivial solutions exist to $A\mathbf{x} = \mathbf{0}$ by examining pivot locations in the matrix $A$.

Several facts about linear dependence and independence will prove to be useful in many aspects of our upcoming work. We simply state them here, and leave their verification to the exercises at the end of this section:

• Any set containing the zero vector is linearly dependent.
• Any set $\{\mathbf{v}_1\}$ consisting of a single nonzero vector is linearly independent.
• Any set of two vectors $\{\mathbf{v}_1, \mathbf{v}_2\}$ is linearly independent whenever neither vector is a scalar multiple of the other.
• The columns of a matrix $A$ are linearly independent if and only if the equation $A\mathbf{x} = \mathbf{0}$ has only the trivial solution.

The concepts of linear independence and span both involve linear combinations of a set of vectors. Furthermore, there are many important and natural connections between span and linear independence.
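The row-reduction test just described can be sketched in code. The following Python example is an illustration only, not the text's own procedure for machine computation: the function names `rref` and `is_linearly_independent` are hypothetical, and the reduction is applied to $A$ itself rather than to $[A \;\; \mathbf{0}]$, which is equivalent because a column of zeros never changes under row operations.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivot_cols): the reduced row echelon form and its pivot columns."""
    M = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(M), len(M[0])
    pivot_cols = []
    r = 0  # index of the row where the next pivot will be placed
    for c in range(n_cols):
        # Look for a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, n_rows) if M[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column
        M[r], M[p] = M[p], M[r]
        piv = M[r][c]
        M[r] = [x / piv for x in M[r]]          # scale the pivot to 1
        for i in range(n_rows):
            if i != r and M[i][c] != 0:         # clear the rest of the column
                f = M[i][c]
                M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        pivot_cols.append(c)
        r += 1
        if r == n_rows:
            break
    return M, pivot_cols

def is_linearly_independent(vectors):
    """The vectors are independent exactly when every column of A is a pivot column."""
    A = [list(row) for row in zip(*vectors)]  # the vectors become the columns of A
    _, pivots = rref(A)
    return len(pivots) == len(vectors)

# The set S from example 1.6.2, and the parallel pair from example 1.6.1.
v1, v2, v3 = [1, 1, -1], [-1, 0, 1], [0, 1, 1]
independent_S = is_linearly_independent([v1, v2, v3])
dependent_example = is_linearly_independent([[2, 1], [-1, Fraction(-1, 2)]])
print(independent_S, dependent_example)
```

Exact rational arithmetic is used so that the test `!= 0` for a pivot is reliable; with floating-point entries a tolerance would be needed instead.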
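The next example extends the previous one and lays the foundation for a discussion of several general results.

Example 1.6.3 Let the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$, and $\mathbf{v}_4$ be given by
$$\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \quad \mathbf{v}_4 = \begin{bmatrix} 5 \\ 6 \\ -1 \end{bmatrix}$$
Let $R = \{\mathbf{v}_1, \mathbf{v}_2\}$, $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$, and $T = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$. Which of the sets $R$, $S$, and $T$ are linearly independent? Which of the sets $R$, $S$, and $T$ span $\mathbb{R}^3$?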


Solution. We have already seen in example 1.6.2 that the set $S$ is linearly independent. Moreover, we saw that when we let $A = [\mathbf{v}_1 \; \mathbf{v}_2 \; \mathbf{v}_3]$ and row-reduce the augmented matrix for the equation $A\mathbf{x} = \mathbf{0}$, it follows that
$$\begin{bmatrix} 1 & -1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ -1 & 1 & 1 & 0 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$$
Not only does this show that the vectors in set $S$ are linearly independent ($A\mathbf{x} = \mathbf{0}$ has only the trivial solution because $A$ has a pivot in every column, so there are no free variables present), but also, by theorem 1.5.2, the vectors in $S$ span $\mathbb{R}^3$ since $A$ has a pivot in every row. Since the vectors in $S$ span $\mathbb{R}^3$, this means that we can write every vector in $\mathbb{R}^3$ as a linear combination of the three vectors in $S$. Moreover, since $A$ has a pivot in every column, it will also follow that every such linear combination is unique: every vector in $\mathbb{R}^3$ can be written in exactly one way as a linear combination of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$.

What happens if we remove $\mathbf{v}_3$ from $S$ and instead consider the set $R = \{\mathbf{v}_1, \mathbf{v}_2\}$? To answer the question of linear independence, we ask if there is a nontrivial solution to the vector equation $x_1\mathbf{v}_1 + x_2\mathbf{v}_2 = \mathbf{0}$. Equivalently, we let $B$ be the $3 \times 2$ matrix whose columns are $\mathbf{v}_1$ and $\mathbf{v}_2$ and solve $B\mathbf{x} = \mathbf{0}$. Doing so, we find that
$$\begin{bmatrix} 1 & -1 & 0 \\ 1 & 0 & 0 \\ -1 & 1 & 0 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
so only the trivial solution exists and thus the set $R$ is linearly independent. Note again that this is due to the fact that $B$ has a pivot in every column. This should not be surprising, since we removed a vector from the linearly independent set $S$ to get the set $R$: if the vectors in $S$ do not depend on one another, neither should the vectors in $R$.

On the other hand, we can also say by theorem 1.5.2 that the set $R$ does not span $\mathbb{R}^3$, since $B$ does not have a pivot position in every row. For example, the vector $\mathbf{b} = [0 \;\; 1 \;\; 1]^T$ cannot be written as a linear combination of $\mathbf{v}_1$ and $\mathbf{v}_2$. This can be seen by row-reducing the augmented matrix that represents $B\mathbf{x} = \mathbf{b}$, where we find that
$$\begin{bmatrix} 1 & -1 & 0 \\ 1 & 0 & 1 \\ -1 & 1 & 1 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
The last equation tells us that $0x_1 + 0x_2 = 1$, which is impossible, and thus $\mathbf{b}$ cannot be written as a linear combination of the vectors in $R$.

Finally, we consider the set $T = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$.
To test if $T$ is linearly independent, we let $C$ be the matrix whose columns are $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$, and $\mathbf{v}_4$, and consider the equation $C\mathbf{x} = \mathbf{0}$, which corresponds to the equation $x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + x_3\mathbf{v}_3 + x_4\mathbf{v}_4 = \mathbf{0}$. Row-reducing,
$$\begin{bmatrix} 1 & -1 & 0 & 5 & 0 \\ 1 & 0 & 1 & 6 & 0 \\ -1 & 1 & 1 & -1 & 0 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 0 & 2 & 0 \\ 0 & 1 & 0 & -3 & 0 \\ 0 & 0 & 1 & 4 & 0 \end{bmatrix}$$
Note that the variable $x_4$ is free, since $C$ does not have a pivot in its fourth column. This shows that any vector $\mathbf{x}$ with entries $x_1$, $x_2$, $x_3$, and $x_4$ such that $x_1 = -2x_4$, $x_2 = 3x_4$, and $x_3 = -4x_4$ will be a solution to the equation $C\mathbf{x} = \mathbf{0}$. For example, taking $x_4 = 1$, it follows that
$$-2\begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} + 3\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} - 4\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} + 1\begin{bmatrix} 5 \\ 6 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$
Thus, the set $T$ is linearly dependent. We can also see from our computations that the set $T$ does indeed span $\mathbb{R}^3$, since the matrix $C$ has a pivot position in every row. This result should be expected: we have already shown that every vector in $\mathbb{R}^3$ can be written as a linear combination of the vectors in $S$, and the set $T$ contains all three vectors in $S$.

There are many important generalizations we can make from example 1.6.3. For instance, from an algebraic perspective we see that we can easily answer questions about the linear independence and span of the columns of a matrix simply by considering the location of pivots in the matrix. In particular, the columns of $A$ are linearly independent if and only if $A$ has a pivot in every column, while the columns of $A$ span $\mathbb{R}^m$ if and only if $A$ has a pivot in every row. We state these results formally in the two following theorems.

Theorem 1.6.1 Let $A$ be an $m \times n$ matrix. The following statements are equivalent:
a. The columns of $A$ span $\mathbb{R}^m$.
b. $A$ has a pivot position in every row.
c. The equation $A\mathbf{x} = \mathbf{b}$ is consistent for every $\mathbf{b} \in \mathbb{R}^m$.

In the next theorem, note particularly the change in emphasis in statement (b) from rows to columns when considering pivot positions in the matrix.

Theorem 1.6.2 Let $A$ be an $m \times n$ matrix. The following statements are equivalent:
a. The columns of $A$ are linearly independent.
b. $A$ has a pivot position in every column.
c. The equation $A\mathbf{x} = \mathbf{0}$ has only the trivial solution.
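Theorems 1.6.1 and 1.6.2 reduce both questions to counting pivots, which is easy to sketch in code. The following Python example is a hedged illustration: the helper names `pivot_count`, `is_independent`, and `spans` are hypothetical, and forward elimination with exact fractions is just one reasonable way to count pivots. It is applied to the sets $R$, $S$, and $T$ of example 1.6.3.

```python
from fractions import Fraction

def pivot_count(columns):
    """Count the pivots of the matrix whose columns are the given vectors."""
    M = [[Fraction(x) for x in row] for row in zip(*columns)]  # rows of the matrix
    rank = 0
    for c in range(len(columns)):
        # Search for a usable pivot in column c, at or below row `rank`.
        p = next((i for i in range(rank, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue  # no pivot here: this column corresponds to a free variable
        M[rank], M[p] = M[p], M[rank]
        for i in range(rank + 1, len(M)):
            f = M[i][c] / M[rank][c]
            M[i] = [x - f * y for x, y in zip(M[i], M[rank])]
        rank += 1
    return rank

def is_independent(columns):
    """Theorem 1.6.2: independent exactly when there is a pivot in every column."""
    return pivot_count(columns) == len(columns)

def spans(columns):
    """Theorem 1.6.1: the columns span R^m exactly when there is a pivot in every row."""
    return pivot_count(columns) == len(columns[0])

v1, v2, v3, v4 = [1, 1, -1], [-1, 0, 1], [0, 1, 1], [5, 6, -1]
R, S, T = [v1, v2], [v1, v2, v3], [v1, v2, v3, v4]

# Each entry is (linearly independent?, spans R^3?).
summary = {name: (is_independent(cols), spans(cols))
           for name, cols in (("R", R), ("S", S), ("T", T))}
print(summary)
```

The results agree with the example: $R$ is independent but does not span $\mathbb{R}^3$, $S$ has both properties, and $T$ spans $\mathbb{R}^3$ but is dependent.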


At this point, it appears ideal if a set is linearly independent or spans $\mathbb{R}^m$. The best scenario, then, is the case when a set has both of these properties and forms a linearly independent spanning set. In this case, for the matrix whose columns are the vectors in the set, we need the matrix to have a pivot in every column, as well as in every row. As we saw in example 1.6.3 with the set $S$ and the corresponding matrix $A$, this can only happen when the number of vectors in the set $S$ matches the number of entries in each vector. In other words, the corresponding matrix $A$ must be square. Obviously if a square matrix has a pivot in every row, it must also have a pivot in every column, and vice versa. We close our current discussion with an important result that links the concepts of linear independence and span in the columns of a square matrix; theorem 1.6.3 is a consequence of the two preceding ones.

Theorem 1.6.3 Let $A$ be an $n \times n$ matrix. The following statements are equivalent:
a. The columns of $A$ are linearly independent.
b. The columns of $A$ span $\mathbb{R}^n$.
c. $A$ has a pivot position in every column.
d. $A$ has a pivot position in every row.
e. For each $\mathbf{b} \in \mathbb{R}^n$, the equation $A\mathbf{x} = \mathbf{b}$ has a unique solution.

Theorem 1.6.3 shows that square matrices play a particularly important role in linear algebra, an idea that will further demonstrate itself when we study the notion of the inverse of a matrix in the following section.

We conclude this section with a look ahead to our study of linear differential equations, in which the concepts of linear independence and span will also find a prominent role.

Example 1.6.4 Consider the differential equation $y'' + y = 0$. Explain why the function $y = c_1 \cos t + c_2 \sin t$ is a solution to the differential equation.

Solution. In our upcoming study of differential equations, we will call the equation $y'' + y = 0$ a linear second-order homogeneous equation with constant coefficients. Equations of this form will be considered in chapter 3 and be the focus of chapter 4. For now, we can intuitively understand why $y = c_1 \cos t + c_2 \sin t$ is a solution to the equation. Note that in order to solve the equation $y'' + y = 0$, we must find all functions $y$ such that $y'' = -y$. From our experience in calculus, we know that
$$\frac{d}{dt}[\sin t] = \cos t \quad \text{and} \quad \frac{d}{dt}[\cos t] = -\sin t$$


Furthermore, if we consider second derivatives,
$$\frac{d^2}{dt^2}[\sin t] = \frac{d}{dt}[\cos t] = -\sin t \quad \text{and} \quad \frac{d^2}{dt^2}[\cos t] = \frac{d}{dt}[-\sin t] = -\cos t$$
Hence, the second derivative of each basic trigonometric function is the opposite of itself, which makes both $y = \cos t$ and $y = \sin t$ solutions to the equation $y'' + y = 0$. Moreover, it is a straightforward exercise to show (using properties of the derivative) that any scalar multiple (such as $y = 3 \sin t$) of either function is also a solution to the differential equation, as is any combination of the form $y = 2 \cos t + 3 \sin t$. More generally, this makes any function $y = c_1 \cos t + c_2 \sin t$ a solution to the differential equation.

If we think about our understanding of linear independence for a set of two vectors, we find an analogy to the two functions $\cos t$ and $\sin t$: since these two functions are not scalar multiples of one another, it makes sense to call these functions linearly independent. Moreover, from the form of the function $y = c_1 \cos t + c_2 \sin t$, we are taking linear combinations of the basic trigonometric functions to form other solutions to the differential equation. We can even go so far as to say that the solution set to the differential equation is the span of the two functions $\cos t$ and $\sin t$.

In future work, we will see that this broader perspective on linear independence and span serves us well in solving linear differential equations. In subsequent work, we will also gain additional understanding of why the solution set to every second-order linear homogeneous differential equation with constant coefficients demonstrates a similar structure.

Exercises 1.6

In each of exercises 1–8, determine whether the given set $S$ is linearly independent or linearly dependent.

1. $S = \{\mathbf{v}_1, \mathbf{v}_2\}$ where $\mathbf{v}_1 = [3 \;\; -2]^T$ and $\mathbf{v}_2 = [-9 \;\; 6]^T$
2. $S = \{\mathbf{v}_1, \mathbf{v}_2\}$ where $\mathbf{v}_1 = [1 \;\; 0]^T$ and $\mathbf{v}_2 = [0 \;\; 1]^T$
3. $S = \{\mathbf{v}_1, \mathbf{v}_2\}$ where $\mathbf{v}_1 = [5 \;\; -2]^T$ and $\mathbf{v}_2 = [5 \;\; 2]^T$
4. $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ where $\mathbf{v}_1 = [5 \;\; -2]^T$, $\mathbf{v}_2 = [5 \;\; 2]^T$, and $\mathbf{v}_3 = [11 \;\; -5]^T$
5. $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ where $\mathbf{v}_1 = [-1 \;\; 2 \;\; 1]^T$, $\mathbf{v}_2 = [3 \;\; 1 \;\; 1]^T$, and $\mathbf{v}_3 = [1 \;\; 5 \;\; 3]^T$
6. $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ where $\mathbf{v}_1 = [-1 \;\; 2 \;\; 1]^T$, $\mathbf{v}_2 = [3 \;\; 1 \;\; 1]^T$, and $\mathbf{v}_3 = [1 \;\; 5 \;\; 2]^T$
7. $S = \{\mathbf{v}_1, \mathbf{v}_2\}$ where $\mathbf{v}_1 = [1 \;\; -2 \;\; 4 \;\; 3]^T$ and $\mathbf{v}_2 = [-3 \;\; 6 \;\; -12 \;\; -9]^T$
8. $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$ where $\mathbf{v}_1 = [-1 \;\; 2 \;\; 1]^T$, $\mathbf{v}_2 = [3 \;\; 1 \;\; 1]^T$, $\mathbf{v}_3 = [1 \;\; 5 \;\; 2]^T$, and $\mathbf{v}_4 = [1 \;\; 1 \;\; 1]^T$
9. For each of the sets $S$ in exercises 1–8, determine whether or not $S$ spans $\mathbb{R}^m$, where $m$ is chosen appropriately.


10. Suppose that S is a set of three vectors in R5 . Is it possible for S to span R5 ? Why or why not? 11. Suppose that S is a set of two vectors in R3 . Is S linearly independent, linearly dependent, or not necessarily either? Explain your answer. 12. Let S be a set of four vectors in R3 . Is it possible for S to be linearly independent? Is it possible for S to span R3 ? Why or why not? 13. Let S be a set of ﬁve vectors in R4 . Must S span R4 ? Is it possible for S to be linearly independent? Explain. 14. If A is an m × n matrix, for what relationship between n and m are the columns of A guaranteed to not span Rm ? For what relationship between n and m will the columns have to be linearly dependent? 15. Prove that any set that contains the zero vector must be linearly dependent. 16. Explain why any set consisting of a single nonzero vector must be linearly independent. 17. Show that any set of two vectors, {v1 , v2 }, is linearly independent if and only if v1 is not a scalar multiple of v2 . 18. Explain why the columns of a matrix A are linearly independent if and only if the equation Ax = 0 has only the trivial solution. 19. Let v1 = [−1 2 1]T , v2 = [3 1 1]T , and v3 = [5 3 k ]T . For what value(s) of k is {v1 , v2 , v3 } linearly independent? For what value(s) of k is v3 in the span of {v1 , v2 }? How are these two questions related? 20. Consider the set S = {v1 , v2 , v3 } where v1 = [1 0 0]T , v2 = [0 1 0]T , and v3 = [0 0 1]T . Explain why S spans R3 , and also why S is linearly independent. In addition, determine the weights x1 , x2 , and x3 that allow you to write the vector [−27 13 91]T as a (unique) linear combination of v1 , v2 , v3 . What do you observe? 21. Let A be a 4 × 7 matrix. Suppose that when solving the homogeneous equation Ax = 0 there are three free variables present. Do the columns of A span R4 ? Explain. Are the columns of A linearly dependent, linearly independent, or is it impossible to say? Justify your answer. 22. 
Suppose that A is a 9 × 6 matrix and that A has six pivot columns. Are the columns of A linearly dependent, linearly independent, or is it impossible to say? Do the columns of A span R9 , or is it impossible to tell? Justify your answers. 23. Decide whether each of the following sentences is true or false. In every case, write one sentence to support your answer. (a) If the system represented by Ax = 0 has a free variable present, then the columns of the matrix A are linearly independent vectors.


(b) If a matrix has more columns than rows, then the columns of the matrix must be linearly dependent. (c) If an $m \times n$ matrix $A$ has a pivot in every column, then the columns of $A$ span $\mathbb{R}^m$. (d) If $A$ is an $m \times n$ matrix that is not square, it is possible for its columns to be both linearly independent and span $\mathbb{R}^m$.

24. Consider the linear second-order homogeneous differential equation $y'' - y = 0$. Show by direct substitution that $y_1 = e^{t}$ and $y_2 = e^{-t}$ are solutions to the differential equation. In addition, show by substitution that any linear combination $y = c_1 e^{t} + c_2 e^{-t}$ is also a solution.
25. We have seen that the general solution to the linear second-order differential equation $y'' + y = 0$ is given by
$$y(t) = c_1 \sin(t) + c_2 \cos(t)$$
Suppose we know initial values for $y(0)$ and $y'(0)$ to be $y(0) = 4$ and $y'(0) = -2$. What are the values of $c_1$ and $c_2$? How is a system of linear equations involved?
26. It can be shown that the solution to the linear second-order differential equation $y'' - y = 0$ is given by
$$y(t) = c_1 e^{t} + c_2 e^{-t}$$
Suppose we know initial values for $y(0)$ and $y'(0)$ to be $y(0) = 4$ and $y'(0) = -2$. What are the values of $c_1$ and $c_2$? How is a system of linear equations involved?

1.7 Matrix algebra

For a given system of linear equations, we are now interested in solving the vector equation $A\mathbf{x} = \mathbf{b}$, where $A$ is a known $m \times n$ matrix, $\mathbf{b} \in \mathbb{R}^m$ is given, and we seek $\mathbf{x} \in \mathbb{R}^n$. It is natural to compare this equation to an elementary linear equation such as $2x = 7$. The key algebraic step in solving $2x = 7$ is to divide both sides of the equation by 2. Said differently, we multiply both sides by the multiplicative inverse of the number 2. In anticipation of a new approach to solving the vector equation $A\mathbf{x} = \mathbf{b}$, we carefully state the details required to solve $2x = 7$. In particular, from the equation $2x = 7$, it follows that $\tfrac{1}{2}(2x) = \tfrac{1}{2}(7)$, so that $(\tfrac{1}{2} \cdot 2)x = \tfrac{7}{2}$. Thus, $1 \cdot x = \tfrac{7}{2}$, so $x = \tfrac{7}{2}$. From a sophisticated perspective, to solve the equation $2x = 7$, we need to be able to multiply, to have a multiplicative identity (that is, the number 1), and to be able to compute a multiplicative inverse (here, the number $\tfrac{1}{2}$).


In this section, we lay the foundation for similar ideas that provide an alternate way to solve the equation $A\mathbf{x} = \mathbf{b}$: essentially we are interested in determining whether we can find a matrix $B$ so that when we compute $BA$ the result is the matrix equivalent of "1". To do this, we will first have to learn what it means to multiply two matrices; a simpler (and still important) place to begin is with the addition of matrices and multiplication of matrices by scalars.

We already know how to add vectors and multiply them by scalars; similar principles hold for matrices. Two matrices can be added (or subtracted) if and only if they have an identical number of rows and columns. When addition (subtraction) is defined, the result is computed component-wise. Furthermore, the multiple of a matrix by a scalar $c \in \mathbb{R}$ is attained by multiplying every entry of the matrix by the same constant $c$. The following example demonstrates these basic facts.

Example 1.7.1 Let $A$ and $B$ be the matrices
$$A = \begin{bmatrix} 1 & 3 & -4 \\ 0 & -7 & 2 \end{bmatrix}, \quad B = \begin{bmatrix} -6 & 10 & -1 \\ 3 & 2 & 11 \end{bmatrix}$$
Compute $A + B$ and $-3A$.

Solution. Since $A$ and $B$ are both $2 \times 3$, their sum is defined and is given by
$$A + B = \begin{bmatrix} 1 & 3 & -4 \\ 0 & -7 & 2 \end{bmatrix} + \begin{bmatrix} -6 & 10 & -1 \\ 3 & 2 & 11 \end{bmatrix} = \begin{bmatrix} -5 & 13 & -5 \\ 3 & -5 & 13 \end{bmatrix}$$
The scalar multiple of a matrix is always defined, and $-3A$ is given by
$$-3A = \begin{bmatrix} -3 & -9 & 12 \\ 0 & 21 & -6 \end{bmatrix}$$

Matrix addition, when defined, has all of the expected properties of addition. In particular, $A + B = B + A$, so order does not matter, and we say matrix addition is commutative. Since $A + (B + C) = (A + B) + C$, the way we group more than two matrices to add also does not matter and we say matrix addition is associative. There is even a matrix that acts like the number 0. If $Z$ is a matrix of the same number of rows and columns as $A$ such that every entry in $Z$ is zero, then it follows that $A + Z = Z + A = A$. We call this zero matrix the additive identity.

The next natural operation to consider, of course, is multiplication. What does it mean to multiply two matrices? And when does it even make sense to multiply two matrices? We know for matrix–vector multiplication that the product $A\mathbf{x}$ computes the vector $\mathbf{b}$ that is the unique linear combination of the columns of $A$ having the entries of the vector $\mathbf{x}$ as weights. Moreover, this product is only defined when the number of entries in $\mathbf{x}$ matches the number of columns of $A$. If we now consider a matrix $B$, we can naturally think about the matrix product $AB$ by considering the columns of $B$, say $\mathbf{b}_1, \ldots, \mathbf{b}_k$. In particular, we make the following definition.


Definition 1.7.1 If $A$ is an $m \times n$ matrix, and $B$ is a matrix whose columns are $\mathbf{b}_1, \ldots, \mathbf{b}_k$ such that the matrix–vector product $A\mathbf{b}_j$ is defined for each $j = 1, \ldots, k$, then we define the matrix product $AB$ by
$$AB = [A\mathbf{b}_1 \;\; A\mathbf{b}_2 \;\; \cdots \;\; A\mathbf{b}_k] \tag{1.7.1}$$

Note particularly that since $A$ has $n$ columns, in order for $A\mathbf{b}_j$ to be defined each $\mathbf{b}_j$ must belong to $\mathbb{R}^n$. This in turn implies that the matrix $B$ must have dimensions $n \times k$. Specifically, the number of rows in $B$ must equal the number of columns in $A$. We explore matrix multiplication and its properties in the next example.

Example 1.7.2 Let $A$ and $B$ be the matrices
$$A = \begin{bmatrix} 1 & 3 & -4 \\ 0 & -7 & 2 \end{bmatrix}, \quad B = \begin{bmatrix} -6 & 10 \\ 3 & 2 \end{bmatrix}$$
Compute the matrix products $AB$ and $BA$, or explain why they are not defined.

Solution. First we consider $AB$. To do so, we would have to compute both $A\mathbf{b}_1$ and $A\mathbf{b}_2$, where $\mathbf{b}_1$ and $\mathbf{b}_2$ are the columns of $B$. But neither of these products is defined, since $A$ has three columns and $B$ has just two rows. Thus, $AB$ is not defined. On the other hand, $BA$ is defined. For instance, we can compute the first column of $BA$ by taking $B\mathbf{a}_1$, where we see that
$$B\mathbf{a}_1 = \begin{bmatrix} -6 & 10 \\ 3 & 2 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} -6 \\ 3 \end{bmatrix}$$
Similar computations for $B\mathbf{a}_2$ and $B\mathbf{a}_3$ show that
$$BA = \begin{bmatrix} -6 & -88 & 44 \\ 3 & -5 & -8 \end{bmatrix}$$

There are several important observations to make based on example 1.7.2. One is that if $A$ is $m \times n$ and $B$ is $n \times k$ so that the product $AB$ is defined, then the resulting matrix $AB$ is $m \times k$. This is true since the columns of $AB$ are each of the form $A\mathbf{b}_j$, thus being linear combinations of the columns of $A$, which have $m$ entries, so that $AB$ has $m$ rows. Moreover, we have to consider each of the products $A\mathbf{b}_1, \ldots, A\mathbf{b}_k$, therefore giving $AB$ $k$ columns.

Furthermore, we clearly see that order matters in matrix multiplication. Specifically, given matrices $A$ and $B$ for which $AB$ is defined, it is not even guaranteed that $BA$ is defined, much less that $AB = BA$. Even when both products are defined, it is possible (even typical) that $AB \ne BA$. Formally, we say that matrix multiplication is not commutative. This fact will be explored further in the exercises. It is, however, the case that matrix multiplication (for matrices of the appropriate sizes) is both associative and distributive. That is, $A(BC) = (AB)C$ and $A(B + C) = AB + AC$, again provided the sizes of the matrices make the relevant products and sums defined.
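Definition 1.7.1 translates almost directly into code: build $AB$ one column at a time from the products $A\mathbf{b}_j$. The following Python sketch is an illustration under stated assumptions: matrices are stored as lists of rows, the helper names `matvec` and `matmul` are hypothetical, and the matrices `P` and `Q` are invented here only to show that two defined products can differ.

```python
def matvec(A, x):
    """Return Ax, the linear combination of the columns of A weighted by x."""
    if len(A[0]) != len(x):
        raise ValueError("Ax undefined: columns of A must match entries of x")
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def matmul(A, B):
    """Return AB = [A b1  A b2 ... A bk], built one column at a time."""
    if len(A[0]) != len(B):
        raise ValueError("AB undefined: columns of A must match rows of B")
    cols_of_B = list(zip(*B))
    cols_of_AB = [matvec(A, b) for b in cols_of_B]
    return [list(row) for row in zip(*cols_of_AB)]  # convert back to a list of rows

# The matrices of example 1.7.2.
A = [[1, 3, -4],
     [0, -7, 2]]
B = [[-6, 10],
     [3, 2]]

BA = matmul(B, A)          # defined: B is 2 x 2 and A is 2 x 3
try:
    matmul(A, B)           # not defined: A has 3 columns, B has 2 rows
    AB_defined = True
except ValueError:
    AB_defined = False

# Hypothetical 2 x 2 matrices (not from the text) for which both products exist.
P = [[1, 2], [3, 4]]
Q = [[0, 1], [1, 0]]
print(BA, AB_defined, matmul(P, Q) == matmul(Q, P))
```

Raising an error for mismatched sizes mirrors the requirement in the definition that each product $A\mathbf{b}_j$ be defined before $AB$ makes sense.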


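Now, we should not forget our motivation for considering matrix multiplication: we want to develop an alternative approach to solving equations of the form $A\mathbf{x} = \mathbf{b}$ by multiplying $A$ by another matrix $B$ so that the product $BA$ is the matrix equivalent of the number 1 (while simultaneously multiplying $\mathbf{b}$ by the same matrix $B$). What is the matrix equivalent of the number 1? We consider this question and more in the following example.

Example 1.7.3 Consider the matrices
$$A = \begin{bmatrix} 5 & 11 \\ -3 & -7 \end{bmatrix} \quad \text{and} \quad I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
Compute $AI_2$ and $I_2A$. What is special about the matrix $I_2$?

Solution. Using the rules for matrix multiplication, we observe that
$$AI_2 = \begin{bmatrix} 5 & 11 \\ -3 & -7 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 5 & 11 \\ -3 & -7 \end{bmatrix} = A$$
and similarly
$$I_2A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 5 & 11 \\ -3 & -7 \end{bmatrix} = \begin{bmatrix} 5 & 11 \\ -3 & -7 \end{bmatrix} = A$$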

Thus, we see that multiplying the matrix A by I2 has no effect on the matrix A. The matrix I2 in example 1.7.3 is important because it has the property that I2 A = A for any matrix A with two rows (not simply the matrix A in example 1.7.3) and AI2 = A for any A with two columns. We can similarly show that if I3 is the matrix ⎤ ⎡ 1 0 0 I3 = ⎣0 1 0⎦ 0 0 1 then I3 A = A for any matrix A with three rows, and AI3 = A for any matrix A with three columns. Similar results hold for corresponding matrices In of larger size; each of these matrices acts like the number 1, since multiplying other matrices by In has no effect on the given matrix. Matrices which when multiplied by other matrices do not change the other matrices, are called identity matrices. More formally, the n × n identity matrix In is the square matrix whose diagonal entries all equal 1, and whose off-diagonal entries are all 0. (The diagonal entries in a matrix are those whose row and column indices are the same.) Often, when the context is clear, we will write simply I, rather than In . We also note that In is the only matrix that is n × n and acts as a multiplicative identity. Finally, it is evident that for any m × n matrix A, Im A = AIn = A. In the next section, we will explore the notion of the inverse of a matrix, and there see that identity matrices play a central role. One ﬁnal algebraic operation with matrices merits formal introduction here. Given a matrix A, its transpose, denoted AT , is the matrix whose columns


are the rows of $A$. That is, taking the transpose of a matrix replaces its rows with its columns, and vice versa. For example, if $A$ is the $2 \times 3$ matrix
\[ A = \begin{bmatrix} 1 & 3 & -4 \\ 0 & -7 & 2 \end{bmatrix} \]
then its transpose $A^T$ is the $3 \times 2$ matrix
\[ A^T = \begin{bmatrix} 1 & 0 \\ 3 & -7 \\ -4 & 2 \end{bmatrix} \]

Note that this is the same notation we regularly use to express a column vector in the form $b = [1 \; 2 \; 3]^T$. In the case that $A$ is a square matrix, taking its transpose results in swapping entries across its diagonal. For example, if
\[ A = \begin{bmatrix} 5 & -2 & 7 \\ 0 & -3 & -1 \\ -4 & 8 & -6 \end{bmatrix} \]
then
\[ A^T = \begin{bmatrix} 5 & 0 & -4 \\ -2 & -3 & 8 \\ 7 & -1 & -6 \end{bmatrix} \]
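The two transposes above can be reproduced with a short sketch. It is written in Python rather than Maple, and the function name `transpose` and the variable name `S` (for the square example) are illustrative assumptions.

```python
def transpose(M):
    """Return the matrix whose rows are the columns of M."""
    return [list(column) for column in zip(*M)]

A = [[1, 3, -4],
     [0, -7, 2]]
S = [[5, -2, 7],
     [0, -3, -1],
     [-4, 8, -6]]

print(transpose(A))  # [[1, 0], [3, -7], [-4, 2]]
print(transpose(S))  # [[5, 0, -4], [-2, -3, 8], [7, -1, -6]]
```

Applying `transpose` twice returns the original matrix, which is one quick sanity check on any implementation of this kind.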

The transpose operator has several nice algebraic properties, some of which will be explored in the exercises. For example, for matrices for which the appropriate sums and products are defined,
\[ (A + B)^T = A^T + B^T \quad \text{and} \quad (AB)^T = B^T A^T \]
For a square matrix such as
\[ A = \begin{bmatrix} 3 & -1 \\ -1 & 2 \end{bmatrix} \]
it happens that $A^T = A$. Any square matrix $A$ for which $A^T = A$ is said to be symmetric. It turns out that symmetric matrices have several especially nice properties in the context of more sophisticated concepts that arise later in the text, and we will revisit them at that time.

1.7.1 Matrix algebra using Maple

While it is important that we ﬁrst learn to add and multiply matrices by hand to understand how these processes work, just like with row-reduction it is reasonable to expect that we will often use available technology to perform tedious computations like multiplying a 4 × 5 and 5 × 7 matrix. Moreover, in real-world applications, it is not uncommon to have to deal with matrices that


have thousands of rows and thousands of columns, or more. Here we introduce a few Maple commands that are useful in performing some of the algebraic manipulations we have studied in this section. Let us consider some of the matrices defined in earlier examples:
\[ A = \begin{bmatrix} 1 & 3 & -4 \\ 0 & -7 & 2 \end{bmatrix}, \quad B = \begin{bmatrix} -6 & 10 \\ 3 & 2 \end{bmatrix}, \quad C = \begin{bmatrix} -6 & 10 & -1 \\ 3 & 2 & 11 \end{bmatrix} \]
After defining each of these three matrices with the usual commands in Maple, such as

> A := ;

we can execute the sum of $A$ and $C$ and the scalar multiple $-3B$ with the commands

> A + C;
> -3*B;

for which Maple will report the outputs
\[ \begin{bmatrix} -5 & 13 & -5 \\ 3 & -5 & 13 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 18 & -30 \\ -9 & -6 \end{bmatrix} \]
We have previously seen that to compute a matrix–vector product, the period is used to indicate multiplication, as in > A.x;. The same syntax holds for matrix multiplication, where defined. For example, if we wish to compute the product $BA$, we enter

> B.A;

which yields the output

\[ \begin{bmatrix} -6 & -88 & 44 \\ 3 & -5 & -8 \end{bmatrix} \]
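For readers without access to Maple, the same three computations can be sketched in plain Python. This is an illustration under stated assumptions, not a substitute for the Maple commands above: the helper names `add`, `scale`, and `matmul` are hypothetical, and matrices are stored as lists of rows.

```python
def add(X, Y):
    """Entrywise sum; defined only for matrices of the same size."""
    if len(X) != len(Y) or len(X[0]) != len(Y[0]):
        raise ValueError("sum is not defined: sizes differ")
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(c, X):
    """Multiply every entry of X by the scalar c."""
    return [[c * x for x in row] for row in X]

def matmul(X, Y):
    """Matrix product; defined only when columns of X match rows of Y."""
    if len(X[0]) != len(Y):
        raise ValueError("product is not defined: inner dimensions differ")
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[1, 3, -4], [0, -7, 2]]
B = [[-6, 10], [3, 2]]
C = [[-6, 10, -1], [3, 2, 11]]

print(add(A, C))     # [[-5, 13, -5], [3, -5, 13]]
print(scale(-3, B))  # [[18, -30], [-9, -6]]
print(matmul(B, A))  # [[-6, -88, 44], [3, -5, -8]]
```

In this sketch, a product whose dimensions do not match, such as `matmul(A, B)`, raises a `ValueError` instead of returning a matrix.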

If we try to have Maple compute an undefined product, such as $AB$ through the command > A.B;, we get the error message

Error, (in LinearAlgebra:-MatrixMatrixMultiply) first matrix column dimension (3) <> second matrix row dimension (2)

In the event that we need to execute computations involving an identity matrix, rather than tediously enter all the 1’s and 0’s, we can use the built-in Maple command IdentityMatrix(n); where n is the number of rows and columns in the matrix. For example, entering > Id := IdentityMatrix(4);


results in the output

\[ Id := \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

Note: Id is the name we are using to store this identity matrix. We cannot use the letter I because I is reserved to represent $\sqrt{-1}$ in Maple. Finally, if we desire to compute the transpose of a matrix $A$, such as
\[ A = \begin{bmatrix} 1 & 3 & -4 \\ 0 & -7 & 2 \end{bmatrix} \]
the relevant command is

> Transpose(A);

which generates the output

\[ A^T = \begin{bmatrix} 1 & 0 \\ 3 & -7 \\ -4 & 2 \end{bmatrix} \]

Exercises 1.7

1. Let $A$, $B$, and $C$ be the given matrices. In each of the following problems, compute (by hand) the prescribed algebraic combination of $A$, $B$, and $C$ if the operation is defined. If the operation is not defined, explain why.
\[ A = \begin{bmatrix} 3 & -5 & 2 \\ -1 & 5 & -4 \end{bmatrix}, \quad B = \begin{bmatrix} -6 & 10 \\ 2 & 11 \\ 2 & -4 \end{bmatrix}, \quad C = \begin{bmatrix} 5 & 3 \\ -1 & 0 \\ -3 & -2 \end{bmatrix} \]
(a) $B + C$
(b) $A + B$
(c) $-2A$
(d) $-3B + 4C$
(e) $AB$
(f) $BA$
(g) $AA$
(h) $A(B + C)$
(i) $CA$
(j) $C(A + B)$
(k) $A^T + B$
(l) $(B + C)^T$
(m) $B^T C$
(n) $BC^T$
(o) $(AB)^T$
(p) $(BA)^T$

2. Let $A$, $B$, and $C$ be the given matrices. In each of the following problems, compute (by hand) the prescribed algebraic combination of $A$, $B$, and $C$ whenever the operation is defined. If the operation is not defined, explain why.
2 11 1 0 −5 3 A= , B= , C= 2 4 −3 −2 −5 3
(a) $B + C$
(b) $A + B$
(c) $-2A$
(d) $-3B + 4C$
(e) $AB$
(f) $BA$
(g) $AA$
(h) $A(B + C)$
(i) $CA$
(j) $C(A + B)$
(k) $A^T + B$
(l) $(B + C)^T$
(m) $B^T C$
(n) $BC^T$
(o) $(AB)^T$
(p) $(BA)^T$


3. Discuss the differences between multiplying two square matrices versus multiplying non-square matrices. That is, under what circumstances can two square matrices be multiplied? How does the situation change for non-square matrices? In addition, if the product $AB$ is defined, is $BA$?

4. Give an example of 2 × 2 matrices A and B for which AB = BA.

5. Give an example of 2 × 2 matrices A and B for which AB = BA.

6. If $A$ is $m \times n$ and $B$ is $n \times k$, and neither $A$ nor $B$ is square, can $AB$ ever equal $BA$? Explain.

In exercises 7–9, let $A$ be the given matrix. If possible, find a matrix $B$ such that $BA = I_2$; if $B$ exists, determine whether $BA = AB$.

7. $A = \begin{bmatrix} 2 & 0 \\ 0 & 5 \end{bmatrix}$

8. $A = \begin{bmatrix} 2 & 4 \\ 0 & 5 \end{bmatrix}$

9. $A = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}$

In exercises 10 and 11, for the given matrix $A$, answer each of the following questions:
(a) Are the columns of $A$ linearly independent?
(b) Do the columns of $A$ span $\mathbb{R}^2$?
(c) How many pivot positions does $A$ have?
(d) Solve the equation $Ax = 0$ by row reducing by hand. Is $A$ row equivalent to an important matrix?
(e) If possible, determine a $2 \times 2$ matrix $B$ such that $BA = I_2$.

10. $A = \begin{bmatrix} 2 & -1 \\ 2 & -3 \end{bmatrix}$

11. $A = \begin{bmatrix} 2 & -1 \\ 2 & -4 \end{bmatrix}$

12. Decide whether each of the following sentences is true or false. In every case, write one sentence to support your answer.
(a) If $A$ and $B$ are matrices of the same size, then the products $AB$ and $BA$ are always defined.
(b) If $A$ and $B$ are matrices such that the products $AB$ and $BA$ are both defined, then $AB = BA$.
(c) If $A$ and $B$ are matrices such that $AB$ is defined, then $(AB)^T = A^T B^T$.


(d) If $A$ and $B$ are matrices such that $A + B$ is defined, then $(A + B)^T = A^T + B^T$.

13. Compute the prescribed algebraic computations in exercise 1 using a computer algebra system.

14. Compute the prescribed algebraic computations in exercise 2 using a computer algebra system.

1.8 The inverse of a matrix

We have observed repeatedly that linear algebra is a subject centered on one idea, systems of linear equations, viewed from several different perspectives. Continuing with this theme, we have recently considered an alternative method for solving the equation $Ax = b$ by attempting to find a matrix $B$ such that $BA = I$, where $I$ is the appropriate identity matrix. If we can in fact find such a matrix $B$, it follows that
\[ B(Ax) = Bb \tag{1.8.1} \]
By the associativity of matrix multiplication and the defining property of $B$, it follows that
\[ B(Ax) = (BA)x = Ix = x \tag{1.8.2} \]

Equations (1.8.1) and (1.8.2) together imply that $x = Bb$. Thus, the existence of such a matrix $B$ shows us how we can solve $Ax = b$ by multiplication. It turns out that from a computational point of view, row-reduction is a superior approach to solving $Ax = b$; nonetheless, the perspective that it may be possible to solve the equation through the use of a multiplicative inverse has many important theoretical applications. In addition, similar ideas will be encountered in our study of differential equations.

Our work in section 1.7 showed that if $A$ and $B$ are not square matrices, it is never the case that $AB$ and $BA$ are equal. Thus it is only possible to find a matrix $B$ such that $AB = BA = I$ if $A$ is square (though even then it is not always the case that such a matrix $B$ exists). Moreover, as we know from theorem 1.6.3, some square matrices have the important property that the equation $Ax = b$ has a unique solution for every possible choice of $b$. For the next few sections, we therefore focus our attention almost exclusively on square matrices. Here, our emphasis is on the questions “when does a matrix $B$ exist such that $AB = BA = I$?” and “when such a matrix $B$ exists, how can we find it?” The next definition formalizes the notion of the inverse of a matrix.

Definition 1.8.1 If $A$ is an $n \times n$ matrix, we say that $A$ is invertible if and only if there exists an $n \times n$ matrix $B$ such that
\[ AB = BA = I_n \tag{1.8.3} \]


When $A$ is invertible, we call $B$ the inverse of $A$ and write $B = A^{-1}$ (read “B is A-inverse”). If $A$ is not invertible, $A$ is often called a singular matrix, and thus saying “A is invertible” is equivalent to saying “A is nonsingular.” It can be shown (see exercise 19) that if $A$ is an invertible $n \times n$ matrix, then its inverse is unique (i.e., a given matrix cannot have two distinct inverses).

In addition, we note from our discussion above in (1.8.1) and (1.8.2) that if $A$ is invertible, then the equation $Ax = b$ has a solution for every $b \in \mathbb{R}^n$. In particular, that solution is $x = A^{-1}b$. Moreover, since $Ax = b$ has a solution for every $b \in \mathbb{R}^n$, we know from theorem 1.6.1 that $A$ has a pivot position in every row. From this, the fact that $A$ is square, and theorem 1.6.3, it follows that $Ax = b$ has a unique solution for every $b \in \mathbb{R}^n$. We state this result formally in the following theorem.

Theorem 1.8.1 If $A$ is an $n \times n$ invertible matrix, then the equation $Ax = b$ has a unique solution for every $b \in \mathbb{R}^n$.

Before beginning to explore how to find the inverse of a matrix, as well as when the inverse even exists, we consider an example to see how we may check if two matrices are inverses and how to apply an inverse to solve a related equation.

Example 1.8.1 Let $A$ and $B$ be the matrices
\[ A = \begin{bmatrix} 4 & 5 \\ 1 & 2 \end{bmatrix}, \quad B = \begin{bmatrix} 2/3 & -5/3 \\ -1/3 & 4/3 \end{bmatrix} \]
Show that $A$ and $B$ are inverses, and then use this fact to solve $Ax = b$, where $b = [-7 \; 3]^T$, without using row reduction.

Solution. The reader should verify that the following matrix products indeed hold:
\[ AB = \begin{bmatrix} 4 & 5 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 2/3 & -5/3 \\ -1/3 & 4/3 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]
and
\[ BA = \begin{bmatrix} 2/3 & -5/3 \\ -1/3 & 4/3 \end{bmatrix} \begin{bmatrix} 4 & 5 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]
This shows that indeed $B = A^{-1}$. Note, equivalently, that $A = B^{-1}$. Now, we can easily solve the equation $Ax = b$ where $b$ is the given vector:
\[ x = A^{-1}b = \begin{bmatrix} 2/3 & -5/3 \\ -1/3 & 4/3 \end{bmatrix} \begin{bmatrix} -7 \\ 3 \end{bmatrix} = \begin{bmatrix} -29/3 \\ 19/3 \end{bmatrix} \]
Of course, what is not clear in example 1.8.1 is how, given the matrix $A$, one might determine the entries in the inverse matrix $B = A^{-1}$.
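The verification requested in example 1.8.1 can be carried out exactly with rational arithmetic. The sketch below uses Python's standard `fractions` module rather than Maple; the helper names `matmul` and `matvec` are illustrative assumptions.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Matrix product for matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def matvec(M, v):
    """Matrix-vector product Mv."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

A = [[4, 5], [1, 2]]
B = [[F(2, 3), F(-5, 3)], [F(-1, 3), F(4, 3)]]
I2 = [[1, 0], [0, 1]]
b = [-7, 3]

print(matmul(A, B) == I2 and matmul(B, A) == I2)  # True, so B is the inverse of A
x = matvec(B, b)                                  # x = A^{-1} b
print(x)               # [Fraction(-29, 3), Fraction(19, 3)]
print(matvec(A, x) == b)  # True: x really does solve Ax = b
```

Exact fractions are used here so that the comparison with the identity matrix is not affected by floating-point rounding.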
We now explore this in the 3 × 3 case for a general matrix A, and along the way learn conditions that guarantee that A−1 exists.


Given a $3 \times 3$ matrix $A$, we seek a matrix $B$ such that $AB = I_3$. Let the columns of $B$ be $b_1$, $b_2$, and $b_3$, and the columns of $I_3$ be $e_1$, $e_2$, and $e_3$. The column-wise definition of matrix multiplication then tells us that the following three vector equations must hold:
\[ Ab_1 = e_1, \quad Ab_2 = e_2, \quad \text{and} \quad Ab_3 = e_3 \tag{1.8.4} \]
For the unique inverse matrix $B$ to exist, it follows that each of these equations must have a unique solution. Clearly if $A$ has a pivot position in every row (or, equivalently, the columns of $A$ span $\mathbb{R}^3$), then by theorem 1.6.3 it follows that we can find unique vectors $b_1$, $b_2$, and $b_3$ that make these three equations hold. Thus, any one of the conditions in theorem 1.6.3 will guarantee that $B = A^{-1}$ exists. Moreover, if $A^{-1}$ exists, we know from theorem 1.8.1 that every condition in theorem 1.6.3 also holds.

Momentarily, let us assume that $A$ is indeed invertible. If we proceed to find the matrix $B$ by solving the three equations in (1.8.4), we see that row-reduction provides an approach for producing all three vectors at once. To find these vectors one at a time, it would be necessary to row-reduce each of the three augmented matrices
\[ [A \; e_1], \quad [A \; e_2], \quad \text{and} \quad [A \; e_3] \tag{1.8.5} \]
In each case, the exact same elementary row operations will be applied to $A$ and thus be applied, respectively, to the vectors $e_1$, $e_2$, and $e_3$. As such, we may do all of them at once by considering the augmented matrix
\[ [A \; e_1 \; e_2 \; e_3] \tag{1.8.6} \]
Note particularly that the form of the augmented matrix in (1.8.6) is $[A \; I_3]$. If we now row-reduce this matrix, and $A$ has a pivot in every row, it follows that we will be able to read the coefficients of $A^{-1}$ from the result. This process is best illuminated by an example, so we now explore how these computations lead us to $A^{-1}$ in a concrete situation.

Example 1.8.2

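Find the inverse of the matrix
\[ A = \begin{bmatrix} 2 & 1 & -2 \\ 1 & 1 & -1 \\ -2 & -1 & 3 \end{bmatrix} \]

Solution. Following the discussion above, we augment $A$ with the $3 \times 3$ identity matrix and row-reduce. It follows that
\[ \left[\begin{array}{ccc|ccc} 2 & 1 & -2 & 1 & 0 & 0 \\ 1 & 1 & -1 & 0 & 1 & 0 \\ -2 & -1 & 3 & 0 & 0 & 1 \end{array}\right] \rightarrow \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 2 & -1 & 1 \\ 0 & 1 & 0 & -1 & 2 & 0 \\ 0 & 0 & 1 & 1 & 0 & 1 \end{array}\right] \]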

These computations demonstrate two important things. The ﬁrst is that the row reduction of A in the ﬁrst three columns of the augmented matrix shows


that $A$ has a pivot position in every row, and therefore $A$ is invertible. Moreover, the row-reduced form of $[A \; I_3]$ tells us that $A^{-1}$ is the matrix
\[ A^{-1} = \begin{bmatrix} 2 & -1 & 1 \\ -1 & 2 & 0 \\ 1 & 0 & 1 \end{bmatrix} \]
Again, we observe from our preceding discussion and example 1.8.2 that we have found an algorithm for finding the inverse of a square matrix $A$. We augment $A$ with the corresponding identity matrix and row-reduce. Provided that $A$ has a pivot in every row, we find by row-reducing that
\[ [A \; I] \rightarrow [I \; A^{-1}] \]
That is, row-reduction of an invertible matrix $A$ augmented with the identity matrix leads us directly to the inverse, $A^{-1}$. Next, we examine what happens in the event that a square matrix is not invertible.

Example 1.8.3 Find the inverse of the matrix
\[ A = \begin{bmatrix} 2 & 1 \\ -6 & -3 \end{bmatrix} \]
provided the inverse exists. If the inverse does not exist, explain why.

Solution. We augment $A$ with the $2 \times 2$ identity matrix and row-reduce, finding that
\[ \left[\begin{array}{cc|cc} 2 & 1 & 1 & 0 \\ -6 & -3 & 0 & 1 \end{array}\right] \rightarrow \left[\begin{array}{cc|cc} 1 & 1/2 & 0 & -1/6 \\ 0 & 0 & 1 & 1/3 \end{array}\right] \]
Again, we see at least two key facts from these computations: $A$ does not have a pivot position in every row, and thus $A$ is not invertible. In particular, recall that we are solving two vector equations simultaneously in these computations: $Ab_1 = e_1$ and $Ab_2 = e_2$. If we consider the first of these and observe the row-reduction
\[ \left[\begin{array}{cc|c} 2 & 1 & 1 \\ -6 & -3 & 0 \end{array}\right] \rightarrow \left[\begin{array}{cc|c} 1 & 1/2 & 0 \\ 0 & 0 & 1 \end{array}\right] \]
we see that this system of equations is inconsistent: the last row of the augmented matrix is equivalent to the equation $0b_{11} + 0b_{12} = 1$, where $b_1 = [b_{11} \; b_{12}]^T$. This is yet another way of saying that $A$ does not have an inverse.

The above two examples together show us, in general, how we answer two questions at once: does the square matrix $A$ have an inverse? And if so, what is $A^{-1}$? In a computational sense, we can simply row-reduce $A$ augmented with the appropriate identity matrix and then observe if $A$ has a pivot position in every row. If $A$ is row equivalent to the appropriately sized identity matrix, then $A$ is invertible and $A^{-1}$ will be revealed through the row-reduction.
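The procedure of row-reducing $[A \; I]$ can be sketched in a few lines of code. The following Python function is one possible arrangement of the row operations, not the only one; the name `inverse` and the convention of returning `None` for a non-invertible matrix are assumptions made for this illustration. Exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction

def inverse(A):
    """Row-reduce [A | I]; return the inverse as a list of rows,
    or None if A does not have a pivot position in every row."""
    n = len(A)
    # Build the augmented matrix [A | I] with exact rational entries.
    M = [[Fraction(A[i][j]) for j in range(n)] +
         [Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for col in range(n):
        # Look for a nonzero entry in this column, at or below the diagonal.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None  # no pivot available, so A is not invertible
        M[col], M[pivot] = M[pivot], M[col]        # row interchange
        p = M[col][col]
        M[col] = [entry / p for entry in M[col]]   # scale so the pivot is 1
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                # Row replacement: clear the rest of the pivot column.
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    # The left block is now I, so the right block holds the inverse.
    return [row[n:] for row in M]

print(inverse([[2, 1, -2], [1, 1, -1], [-2, -1, 3]]) ==
      [[2, -1, 1], [-1, 2, 0], [1, 0, 1]])   # True (example 1.8.2)
print(inverse([[2, 1], [-6, -3]]))           # None (example 1.8.3)
```

The two calls at the end reproduce the conclusions of examples 1.8.2 and 1.8.3.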


We close this section with a formal statement of a theorem that summarizes our discussion. Note particularly how this result extends theorem 1.6.3 and demonstrates the theme of linear algebra: one idea from several perspectives. We will refer to this result as The Invertible Matrix Theorem.

Theorem 1.8.2 (The Invertible Matrix Theorem) Let $A$ be an $n \times n$ matrix. The following statements are equivalent:
a. $A$ is invertible.
b. The columns of $A$ are linearly independent.
c. The columns of $A$ span $\mathbb{R}^n$.
d. $A$ has a pivot position in every column.
e. $A$ has a pivot position in every row.
f. $A$ is row equivalent to $I_n$.
g. For each $b \in \mathbb{R}^n$, the equation $Ax = b$ has a unique solution.

In addition to being of great theoretical significance, inverse matrices find many key applications. We investigate one such use in the following subsection.

1.8.1 Computer graphics

Linear algebra is the engine that drives computer animations. While animated movies originally were constructed by artists hand-drawing thousands of similar sketches that were photographed and played in sequence, today such films are created entirely with computers. Once a figure has been constructed, moving the image around the screen is essentially an exercise in matrix multiplication.

Every pixel in an image on a computer screen can be represented through coordinates. For an elementary example, consider an animated figure which, at a given point in time, has its hand located at the point (3, 4). To see how a basic animation can be built, assume further that the figure’s elbow is at the origin (0, 0), and that an animator wishes to make the hand wave back and forth. This enables us to represent the forearm of the figure with the vector $v = [3 \; 4]^T$. If we now consider the matrix
\[ R = \begin{bmatrix} \sqrt{3}/2 & -1/2 \\ 1/2 & \sqrt{3}/2 \end{bmatrix} \]
and apply the matrix $R$ to the vector $v$, we see that the product is
\[ Rv = \begin{bmatrix} \sqrt{3}/2 & -1/2 \\ 1/2 & \sqrt{3}/2 \end{bmatrix} \begin{bmatrix} 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 3\sqrt{3}/2 - 4/2 \\ 3/2 + 4\sqrt{3}/2 \end{bmatrix} \approx \begin{bmatrix} 0.598 \\ 4.964 \end{bmatrix} \]

Figure 1.13 The vectors $v = [3 \; 4]^T$ and $Rv = [0.598 \; 4.964]^T$.

Thus, the figure’s hand is now located at the point (0.598, 4.964). In fact, the hand has been rotated 30° counterclockwise about the origin, as shown in figure 1.13. The matrix $R$ is known as a rotation matrix; its impact on any vector is to rotate the vector 30° counterclockwise about the origin. One way to see why this is so is to compute the vectors $Re_1$ and $Re_2$, where $e_1$ and $e_2$ are the columns of the $2 \times 2$ identity matrix. Since each of those two vectors is rotated 30° when multiplied by $R$, the same thing happens to any vector in $\mathbb{R}^2$, because any such vector may be written as a linear combination of $e_1$ and $e_2$.

Not only do computer animations show one application of matrix–vector multiplication, but they also demonstrate the need for inverse matrices. For instance, suppose we knew that the matrix $R$ had been applied to some unknown vector $v$ and that the result was
\[ Rv = \begin{bmatrix} 2 \\ 5 \end{bmatrix} \]
That is, a hand located at some unknown point $v$ was waved and had been moved to the new point (2, 5). An animator might want to wave the hand back so that it ended up at its original location, which is again represented by the vector $v$. To do so, he must answer the question “for which vector $v$ is $Rv = [2 \; 5]^T$?” We now know that one way to solve for $v$ is to use the inverse of $R$. The matrix $R$ is clearly invertible because its columns are linearly independent; we can compute $R^{-1}$ in the standard way to find that
\[ R^{-1} = \begin{bmatrix} \sqrt{3}/2 & 1/2 \\ -1/2 & \sqrt{3}/2 \end{bmatrix} \]

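We can solve for $v$ by computing
\[ v = R^{-1}(Rv) = R^{-1} \begin{bmatrix} 2 \\ 5 \end{bmatrix} \]
so that
\[ v = R^{-1} \begin{bmatrix} 2 \\ 5 \end{bmatrix} = \begin{bmatrix} \sqrt{3}/2 & 1/2 \\ -1/2 & \sqrt{3}/2 \end{bmatrix} \begin{bmatrix} 2 \\ 5 \end{bmatrix} \approx \begin{bmatrix} 4.232 \\ 3.330 \end{bmatrix} \]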
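The two computations with $R$ and $R^{-1}$ can be reproduced numerically. The sketch below is written in Python and makes two assumptions that go slightly beyond the text: the helper names `rotation` and `apply` are hypothetical, and the inverse is built as a rotation through −30°, which is the standard fact that undoing a counterclockwise rotation means rotating clockwise by the same angle.

```python
import math

def rotation(degrees):
    """Matrix that rotates vectors counterclockwise about the origin."""
    t = math.radians(degrees)
    return [[math.cos(t), -math.sin(t)],
            [math.sin(t),  math.cos(t)]]

def apply(M, v):
    """Matrix-vector product Mv for a 2 x 2 matrix M."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

R = rotation(30)
Rv = apply(R, [3, 4])
print([round(c, 3) for c in Rv])   # [0.598, 4.964]

R_inv = rotation(-30)              # assumed form of the inverse: rotate back by 30 degrees
v = apply(R_inv, [2, 5])
print([round(c, 3) for c in v])    # [4.232, 3.33]
```

Applying `R` to the recovered `v` returns a point within rounding error of (2, 5), which confirms that the hand has been waved back to its original location.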

Of course, in actual animations, we would not wave the hand by a single 30◦ rotation, but rather through a sequence of consecutive small rotations, for instance, 1-degree rotations. Again, computers enable us to do thousands of such computations almost instantly and make amazing animations possible. We consider an additional example to see the role of matrices to store data as well as matrices and their inverses to transform the data. Example 1.8.4

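Consider the matrix
\[ B = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \]
Let $v_1 = [2 \; 1]^T$, $v_2 = [3 \; 3]^T$, and $v_3 = [4 \; 0]^T$ be the vertices of a triangle in the plane. Compute $Bv_1$, $Bv_2$, and $Bv_3$. Sketch a picture of the new triangle that has resulted from applying the matrix $B$ to the vertices (2, 1), (3, 3), and (4, 0). What is the impact of the matrix $B$ on each point? Finally, determine the inverse of $B$. What do you observe?

Solution. We observe first that
\[ Bv_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad Bv_2 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 3 \\ 3 \end{bmatrix} = \begin{bmatrix} 3 \\ 3 \end{bmatrix}, \quad \text{and} \quad Bv_3 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 4 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 4 \end{bmatrix} \]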

From these calculations, we see that multiplying by $B$ moves a given point to a new point that corresponds to the one found by switching the coordinates of the given point. Geometrically, the matrix $B$ accomplishes a reflection across the line $y = x$ in the plane, as we can see in figure 1.14. Moreover, if we think about how we might undo reflection across the line $y = x$, it is clear that to restore a point to its original location, we need to reflect the point back across the line. Said differently, the inverse of the matrix $B$ must be the matrix itself. We can confirm that $B^{-1} = B$ by computing the product
\[ BB = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = I \]
It is noteworthy that the calculations of $Bv_1$, $Bv_2$, and $Bv_3$ can be simplified into a single matrix product if we let $T = [v_1 \; v_2 \; v_3]$. That is, the matrix $T$ holds the

Figure 1.14 The triangle with vertices $v_1 = [2 \; 1]^T$, $v_2 = [3 \; 3]^T$, and $v_3 = [4 \; 0]^T$ and its image under multiplication by the matrix $B$.

coordinates of the three points in the given triangle; the product $BT$ is then the image of the triangle under multiplication by the matrix $B$. A more complicated polygonal figure than a triangle would be stored in a matrix with additional columns.

Of course, the actual work of computer animations is much more complicated than what we have presented here. Nonetheless, matrix multiplication is the platform on which the entire enterprise of animated films is built. In addition to achieving rotations and reflections, matrices can be used to dilate (or magnify) images, to shear images, and even to translate them (provided that we are clever about the coordinate system we use to represent points). Finally, matrices are even essential to the storage of images, as each column of a matrix can be viewed as a data point in an image. More about the application of matrices and their inverses to computer graphics can be learned in one of the projects found at the end of this chapter. In addition, a deeper discussion of the notion of linear transformations (of which reflection and rotation matrices are a part) can be found in appendix D.

1.8.2 Matrix inverses using Maple

Certainly we can use Maple’s row-reduction commands to find inverses of matrices. However, an even simpler command exists that enables us to avoid having to enter the corresponding identity matrix. Let us consider the two matrices from examples 1.8.2 and 1.8.3. Let
\[ A = \begin{bmatrix} 2 & 1 & -2 \\ 1 & 1 & -1 \\ -2 & -1 & 3 \end{bmatrix} \]
If we enter the command

> MatrixInverse(A);


we see the resulting output which is indeed $A^{-1}$,
\[ \begin{bmatrix} 2 & -1 & 1 \\ -1 & 2 & 0 \\ 1 & 0 & 1 \end{bmatrix} \]
For the matrix
\[ A = \begin{bmatrix} 2 & 1 \\ -6 & -3 \end{bmatrix} \]
executing the command > MatrixInverse(A); produces the output

Error, (in LinearAlgebra:-LA_Main:-MatrixInverse) singular matrix

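which is Maple’s way of saying “A is not invertible.”

Exercises 1.8

In exercises 1–5, find the inverse of each matrix (doing the computations by hand), or show that the inverse does not exist.

1. $\begin{bmatrix} 2 & 1 \\ 2 & 2 \end{bmatrix}$

2. $\begin{bmatrix} 5 & 0 \\ 0 & -3 \end{bmatrix}$

3. $\begin{bmatrix} 2 & -1 \\ -4 & 2 \end{bmatrix}$

4. $\begin{bmatrix} 1 & 2 & -1 \\ 0 & 1 & 3 \\ 0 & 0 & 2 \end{bmatrix}$

5. $\begin{bmatrix} 1 & -2 & -1 \\ -1 & 1 & 0 \\ 1 & 3 & 4 \end{bmatrix}$

6. Let $A = \begin{bmatrix} 1 & 3 \\ 1 & 4 \end{bmatrix}$ and $b_1 = \begin{bmatrix} 2 \\ 5 \end{bmatrix}$, $b_2 = \begin{bmatrix} 11 \\ 4 \end{bmatrix}$, $b_3 = \begin{bmatrix} -3 \\ -7 \end{bmatrix}$. Find $A^{-1}$ and use it to solve the equations $Ax = b_1$, $Ax = b_2$, and $Ax = b_3$. In addition, show how you can use row reduction to solve all three of these equations simultaneously.

1 −3 10 2 −1/2 7. Let A = and b1 = , b2 = , b3 = . Solve the 1 1 −2 6 −20 equations Ax = b1 , Ax = b2 , and Ax = b3 . What do you observe about the matrix A?

8. Let $A = \begin{bmatrix} 1 & -2 \\ 1 & 2 \end{bmatrix}$ and $b = \begin{bmatrix} 3 \\ 5 \end{bmatrix}$. Without doing any computations, explain why $b$ may be written as a linear combination of the columns of $A$.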


Then execute computations to find the explicit weights by which $b$ is a linear combination of the columns of $A$.

9. Let $E$ be the elementary matrix given by $E = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$. Note that $E$ is obtained by interchanging rows 2 and 3 of the $3 \times 3$ identity matrix. Choose a $3 \times 3$ matrix $A$, and compute $EA$. What is the effect on $A$ of multiplication by $E$?

10. Without doing any row-reduction, determine $E^{-1}$ where $E$ is the matrix defined in exercise 9. (Hint: $E^{-1}EI = I$. Think about the impact that $E$ has on $I$, and then what $E^{-1}$ must accomplish.)

11. Let $E$ be the elementary matrix given by $E = \begin{bmatrix} 1 & 0 & 0 \\ 0 & c & 0 \\ 0 & 0 & 1 \end{bmatrix}$. Note that $E$ is obtained by scaling the second row of the $3 \times 3$ identity matrix by the constant $c$. Choose a $3 \times 3$ matrix $A$, and compute $EA$. What is the effect on $A$ of multiplication by $E$?

12. Without doing any row reduction, determine $E^{-1}$ where $E$ is the matrix defined in exercise 11. What do you observe?

13. Let $E$ be the elementary matrix given by $E = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ a & 0 & 1 \end{bmatrix}$. Note that $E$ is obtained by applying the row operation of taking $a$ times row 1 of the $3 \times 3$ identity matrix and adding it to row 3 to form a new row 3. Choose a $3 \times 3$ matrix $A$, and compute $EA$. What is the effect on $A$ of multiplication by $E$?

14. Without doing any row reduction, determine $E^{-1}$ where $E$ is the matrix defined in exercise 13. (Hint: $E^{-1}EI = I$. Think about the impact that $E$ has on $I$, and then what $E^{-1}$ must accomplish.)

15. Let $A = \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}$. Compute $A^{-1}$. What do you observe about the relationship between $A$ and $A^{-1}$?

16. Let $\theta$ be any real number and $A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$. Compute $A^T$ and $A^TA$. What do you observe about the relationship between $A$ and $A^T$?

17. Let $A$ and $B$ be invertible $n \times n$ matrices with inverses $A^{-1}$ and $B^{-1}$, respectively. Show that $AB$ is also an invertible matrix by finding $(AB)^{-1}$ in terms of $A^{-1}$ and $B^{-1}$.

18. Let $A$ be an invertible matrix. Explain why $A^{-1}$ is also invertible, and find $(A^{-1})^{-1}$.

19. Show that if $A$ is an invertible $n \times n$ matrix, then its inverse is unique. (Hint: suppose that both $B$ and $C$ are inverses of $A$. What can you say about $AB$ and $AC$?)


20. For real numbers $a$ and $b$, the Zero Product Property states that “if $a \cdot b = 0$, then $a = 0$ or $b = 0$.” Said differently, if $a \neq 0$ and $b \neq 0$, then $a \cdot b \neq 0$. Let $0$ be the $2 \times 2$ zero matrix (i.e., all entries are zero). Does the Zero Product Property hold for matrices? That is, can you find two nonzero matrices $A$ and $B$ such that $AB = 0$? Can you find such matrices where none of the entries in $A$ or $B$ are zero? If so, what kind of matrices are $A$ and $B$?

21. Does there exist a $2 \times 2$ matrix $A$, none of whose entries are zero, such that $A^2 = 0$?

22. Does there exist a $2 \times 2$ matrix $A$ other than the identity matrix such that $A^2 = I$? What is special about such a matrix?

23. Let $D$ be a diagonal matrix, $P$ an invertible matrix, and $A = PDP^{-1}$. Using the expression $PDP^{-1}$ for $A$, compute and simplify the matrix $A^2 = A \cdot A$. Do likewise for $A^3 = A \cdot A \cdot A$. What will be the simplified form of $A^n$ in terms of $P$, $D$, and $P^{-1}$?

24. Let $A$ be the matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$. Find conditions on $a$, $b$, $c$, and $d$ that guarantee that $Ax = 0$ has infinitely many solutions. What must therefore be true about $a$, $b$, $c$, and $d$ in order for $A$ to be invertible?

25. Let $A = \begin{bmatrix} 1/2 & \sqrt{3}/2 \\ -\sqrt{3}/2 & 1/2 \end{bmatrix}$ and $v_1$, $v_2$, $v_3$ be the vectors that emanate from the origin to the vertices of the triangle given by (2, 1), (3, 3), and (4, 0). Compute the new triangle that results from applying the matrix $A$ to the given vertices, and sketch a picture of the original triangle and the resulting image. What is the effect of multiplying by $A$?

26. Suppose that $A$ in exercise 25 was applied to a different set of three unknown vectors $x_1$, $x_2$, and $x_3$. The resulting output from these products is
\[ Ax_1 = \begin{bmatrix} -4 \\ 2 \end{bmatrix}, \quad Ax_2 = \begin{bmatrix} 0 \\ 3 \end{bmatrix}, \quad \text{and} \quad Ax_3 = \begin{bmatrix} 2 \\ 1 \end{bmatrix} \]
In other words, the new image after multiplying by $A$ is the triangle whose vertices are (−4, 2), (0, 3), and (2, 1). Determine the exact vectors $x_1$, $x_2$, and $x_3$ and sketch the original triangle that was mapped to the triangle with vertices (−4, 2), (0, 3), and (2, 1).

27. Consider the matrix
\[ B = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \]
Let $v_1 = [2 \; 1]^T$, $v_2 = [3 \; 3]^T$, and $v_3 = [4 \; 0]^T$. Compute $Bv_1$, $Bv_2$, and $Bv_3$. Sketch a picture of the new triangle that has resulted from applying the matrix $B$ to the vertices (2, 1), (3, 3), and (4, 0). What is the geometric effect of the matrix $B$ on each point?


28. Determine the inverse of $B$ in exercise 27. What do you observe?

29. An unknown $2 \times 2$ matrix $C$ is applied to the two vectors $v_1 = [1 \; 1]^T$ and $v_2 = [2 \; 3]^T$, and the results are $Cv_1 = [0.1 \; 0.7]^T$ and $Cv_2 = [-0.1 \; 1.8]^T$. Determine the entries in the matrix $C$.

30. Suppose that a computer graphics programmer decides to use the matrix
\[ A = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix} \]
Why is the programmer’s choice a bad one? What will be the result of applying this matrix to any collection of points?

31. Suppose that for a large population that stays relatively constant, people are classified as living in urban, suburban, or rural settings. Moreover, assume that the probabilities of the various possible transitions are given by the following table:

Future location (↓) / current location (→)    U(%)    S(%)    R(%)
Urban                                          92       3       2
Suburban                                        7      96      10
Rural                                           3       1      88

Given that the population of 250 million in a certain year is distributed among 100 million urban, 100 million suburban, and 50 million rural, determine the population distribution in each of the preceding two years.

32. Car-owners can be grouped into classes based on the vehicles they own. A study of owners of sedans, minivans, and sport-utility vehicles shows that the likelihood that an owner of one of these automobiles will replace it with another of the same or different type is given by the table

Future vehicle (↓) / current vehicle (→)    Sedan(%)    Minivan(%)    SUV(%)
Sedan                                         91           3            2
Minivan                                        7          95            8
SUV                                            2           2           90

If there are currently 100 000 sedans, 60 000 minivans, and 80 000 SUVs among the owners being studied, determine the distribution of vehicles among the population before each current owner replaced his or her previous vehicle.


33. Decide whether each of the following sentences is true or false. In every case, write one sentence to support your answer.
(a) If $A$ is a matrix with a pivot in every row, then $A$ is invertible.
(b) If $A$ is an invertible matrix, then its columns are linearly independent.
(c) If $Ax = b$ has a unique solution, then $A$ is an invertible matrix.
(d) If $A$ and $B$ are invertible matrices, then $(AB)^{-1}$ exists and $(AB)^{-1} = A^{-1}B^{-1}$.
(e) If $A$ is a square matrix row equivalent to the identity matrix, then $A$ is invertible.
(f) If $A$ is a square matrix and $Ax = b$ has a solution for a given vector $b$, then $Ax = c$ has a solution for every choice of $c$.
(g) If $R$ is a matrix that reflects points across a line through the origin, then $R^{-1} = R$.
(h) If $A$ and $B$ are $2 \times 2$ matrices with all nonzero entries, then $AB$ cannot equal the $2 \times 2$ zero matrix.

1.9 The determinant of a matrix

The Invertible Matrix Theorem (theorem 1.8.2) tells us that there are several different ways to determine whether or not a matrix is invertible, and hence whether or not an $n \times n$ system of linear equations has a unique solution. There is at least one more useful way to characterize invertibility, and that is through the concept of a determinant.

As seen in exercise 24 of section 1.8, it may be shown through row-reduction that the general $2 \times 2$ matrix
\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]
is invertible if and only if $ad - bc \neq 0$. We call the quantity $(ad - bc)$ the determinant of the matrix $A$, and write $\det(A) = ad - bc$. (Some authors use the notation $|A|$ instead of $\det(A)$.) Note that this expression provides a condition on the entries of matrix $A$ that determines whether or not $A$ is invertible.

We can explore similar ideas for larger matrices. For example, if we take an arbitrary $3 \times 3$ matrix
\[ A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \]
and row-reduce in order to explore conditions under which the matrix has a pivot position in every row, it turns out to be necessary that the quantity
\[ D = a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31} \]


is nonzero. Grouping and factoring, we see that $D$ may be rewritten in the form
\[ D = a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}) \tag{1.9.1} \]
We again call this quantity $D$ the determinant of the matrix $A$. In (1.9.1) we see evidence of the fact that determinants of larger matrices can be defined recursively in terms of smaller matrices found within the original matrix $A$. For example, letting
\[ A_{11} = \begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix} \]
it follows that $\det(A_{11}) = a_{22}a_{33} - a_{23}a_{32}$, which is the expression multiplied by $a_{11}$ in (1.9.1). More generally, if we let $A_{ij}$ be the submatrix defined by deleting row $i$ and column $j$ of the original matrix $A$, then we see from (1.9.1) that
\[ D = a_{11}\det(A_{11}) - a_{12}\det(A_{12}) + a_{13}\det(A_{13}) \]
The formal definition of the determinant of an $n \times n$ matrix is given through a similar recursive process.

Definition 1.9.1 The determinant of an $n \times n$ matrix $A$ with entries $a_{ij}$ is defined to be the number given by
\[ \det(A) = a_{11}\det(A_{11}) - a_{12}\det(A_{12}) + \cdots + (-1)^{n+1}a_{1n}\det(A_{1n}) \tag{1.9.2} \]
where $A_{ij}$ is the matrix found by deleting row $i$ and column $j$ of $A$.

We next consider an example to see some concrete computations.

Example 1.9.1 Compute the determinant of the matrix
\[ A = \begin{bmatrix} 2 & -1 & 1 \\ 1 & 1 & 2 \\ -3 & 0 & -3 \end{bmatrix} \]
In addition, determine if $A$ is invertible.

Solution. By definition,
\[ \det \begin{bmatrix} 2 & -1 & 1 \\ 1 & 1 & 2 \\ -3 & 0 & -3 \end{bmatrix} = 2\det \begin{bmatrix} 1 & 2 \\ 0 & -3 \end{bmatrix} - (-1)\det \begin{bmatrix} 1 & 2 \\ -3 & -3 \end{bmatrix} + 1\det \begin{bmatrix} 1 & 1 \\ -3 & 0 \end{bmatrix} \]
\[ = 2(-3 - 0) + 1(-3 - (-6)) + 1(0 - (-3)) = -6 + 3 + 3 = 0 \]
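The recursive definition (1.9.2) translates almost line for line into code. The Python sketch below is an illustration only: the function name `det` is hypothetical, and the base case (the determinant of a $1 \times 1$ matrix is its single entry) is an assumption added so that the recursion terminates; it is consistent with the $2 \times 2$ formula $ad - bc$.

```python
def det(A):
    """Determinant by cofactor expansion along the first row, as in (1.9.2)."""
    n = len(A)
    if n == 1:
        return A[0][0]  # assumed base case: a 1 x 1 matrix has determinant equal to its entry
    total = 0
    for j in range(n):
        # Delete row 1 and column j+1 of A to form the submatrix A_{1,j+1}.
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        # With zero-based j, the sign (-1)**j matches (-1)^{1+(j+1)} in the definition.
        total += (-1) ** j * A[0][j] * det(minor)
    return total

A = [[2, -1, 1],
     [1, 1, 2],
     [-3, 0, -3]]
print(det(A))                 # 0, in agreement with example 1.9.1
print(det([[1, 2], [3, 4]]))  # -2, which is ad - bc for a 2 x 2 matrix
```

This direct recursion performs on the order of $n!$ multiplications, so it is practical only for small matrices.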


Next, to determine whether or not A is invertible, we row-reduce A to see if A has a pivot position in every row. Doing so, we find that

\[ \begin{bmatrix} 2 & -1 & 1 \\ 1 & 1 & 2 \\ -3 & 0 & -3 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix} \]

Thus, we see that A does not have a pivot in every row, and therefore A is not invertible. Of course, we should note that the primary motivation for the concept of the determinant comes from the question, “is A invertible?” Indeed, one reason the 3 × 3 matrix in the above example is not invertible is precisely because its determinant is zero. Later in this section, we will formally establish the connection between the value of the determinant and the invertibility of a general n × n matrix.

It is clear at this point that determinants of most n × n matrices with n ≥ 3 require a substantial number of computations. Certain matrices, however, have particularly simple determinants to calculate, as the following example demonstrates.

Example 1.9.2

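Compute the determinant of the matrix

\[ A = \begin{bmatrix} 2 & -2 & 7 \\ 0 & -5 & 3 \\ 0 & 0 & 4 \end{bmatrix} \]

In addition, determine if A is invertible.

Solution. Again using the definition, we see that

\[ \begin{aligned} \det(A) &= 2\det\begin{bmatrix} -5 & 3 \\ 0 & 4 \end{bmatrix} - (-2)\det\begin{bmatrix} 0 & 3 \\ 0 & 4 \end{bmatrix} + 7\det\begin{bmatrix} 0 & -5 \\ 0 & 0 \end{bmatrix} \\ &= 2(-5 \cdot 4 - 3 \cdot 0) + 2(0 - 0) + 7(0 - 0) \\ &= 2(-5)(4) = -40 \end{aligned} \]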

Note particularly that the determinant of A is the product of its diagonal entries. Moreover, A clearly has a pivot position in every row, and so by this fact (or equivalently by the nonzero determinant of A) we see that A is invertible. In general, the determinant of any triangular matrix (one where all entries either below or above the diagonal are zero) is simply the product of its diagonal entries. There are other interesting properties that the determinant has, several of which are explored in the next example for the 2 × 2 case. Example 1.9.3

Let

\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

be an arbitrary 2 × 2 matrix. Explore the effect of elementary row operations on the determinant of A.

Solution. First, let us consider a row swap, calling $A_1$ the matrix

\[ A_1 = \begin{bmatrix} c & d \\ a & b \end{bmatrix} \]

We observe immediately that det(A) = ad − bc and $\det(A_1) = cb - ad = -\det(A)$. We next consider scaling; let $A_2$ be the matrix whose first row is [ka kb], a scaled version of row 1 in A. We see that $\det(A_2) = kad - kbc = k(ad - bc) = k \cdot \det(A)$. Finally, replacing, say, row 2 of A by the sum of k times row 1 with itself, we arrive at the matrix

\[ A_3 = \begin{bmatrix} a & b \\ c + ka & d + kb \end{bmatrix} \]

Then $\det(A_3) = a(d + kb) - b(c + ka) = ad + kab - bc - kab = ad - bc = \det(A)$. Thus, we see that for the 2 × 2 case, swapping rows in a matrix changes only the sign of the determinant, scaling a row by a nonzero constant scales the determinant by the same constant, and executing a row replacement does not change the value of the determinant at all. These demonstrate the effect that the three elementary row operations from the process of row-reduction have on the determinant of a 2 × 2 matrix A.

Given that the general definition of the determinant is recursive, it should not be surprising that the properties witnessed in example 1.9.3 can be shown to hold for n × n matrices. We state this result formally as our next theorem.

Theorem 1.9.1 Let A be an n × n matrix and k a nonzero constant. Then
a. If two rows of A are exchanged to produce matrix B, then det(B) = −det(A).
b. If one row of A is multiplied by k to produce B, then det(B) = k det(A).
c. If B results from a row replacement in A, then det(B) = det(A).

Theorem 1.9.1 enables us to more clearly see the link between invertibility and determinants. Through a finite number of row interchanges and row replacements, any square matrix A may be row-reduced to upper triangular form U (where we have all subdiagonal zeros, but we do not necessarily scale to get 1’s on the diagonal). It follows from theorem 1.9.1 that $\det(A) = (-1)^k \det(U)$, where k here denotes the number of row interchanges needed.
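The reduction to upper triangular form just described suggests a way to compute determinants that is far cheaper than the recursive definition. The following Python sketch is one possible realization under our own assumptions: the function name det_by_row_reduction is hypothetical, and exact fractions are used so that no rounding occurs. Only row swaps and row replacements are applied, so by theorem 1.9.1 the determinant changes only in sign, once per swap.

```python
from fractions import Fraction


def det_by_row_reduction(A):
    """Reduce A to upper triangular U with row swaps and row replacements,
    then return (-1)**swaps times the product of U's diagonal entries."""
    U = [[Fraction(x) for x in row] for row in A]  # exact arithmetic
    n = len(U)
    swaps = 0
    for col in range(n):
        # look for a nonzero entry at or below the diagonal in this column
        pivot = next((r for r in range(col, n) if U[r][col] != 0), None)
        if pivot is None:
            # no pivot in this column, so the determinant is zero
            return Fraction(0)
        if pivot != col:
            U[col], U[pivot] = U[pivot], U[col]
            swaps += 1  # a row swap changes the sign (theorem 1.9.1a)
        for r in range(col + 1, n):
            factor = U[r][col] / U[col][col]
            # a row replacement leaves the determinant unchanged (theorem 1.9.1c)
            U[r] = [a - factor * b for a, b in zip(U[r], U[col])]
    product = Fraction(1)
    for i in range(n):
        product *= U[i][i]
    return (-1) ** swaps * product


print(det_by_row_reduction([[2, -1, 1], [1, 1, 2], [-3, 0, -3]]))
```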
Note that since U is triangular, its determinant is the product of its diagonal entries, and these entries lie in the pivot locations of A. Thus, A has a pivot in every row if and only if this determinant is nonzero. Specifically, we have shown that A is invertible if and only if det(A) ≠ 0.

To conclude this section, we note that linear algebra has once again afforded an alternate perspective on the problem of solving an n × n system of linear equations, and we can now add an additional statement involving determinants to the Invertible Matrix Theorem.

Theorem 1.9.2 (Invertible Matrix Theorem) Let A be an n × n matrix. The following statements are equivalent:
a. A is invertible.
b. The columns of A are linearly independent.
c. The columns of A span $\mathbb{R}^n$.
d. A has a pivot position in every column.
e. A has a pivot position in every row.
f. A is row equivalent to $I_n$.
g. For each $b \in \mathbb{R}^n$, the equation Ax = b has a unique solution.
h. det(A) ≠ 0.

1.9.1 Determinants using Maple

Obviously for most square matrices of size greater than 3 × 3, the computations necessary to ﬁnd determinants are tedious and present potential for error. As with other concepts that require large numbers of arithmetic operations, Maple offers a single command that enables us to take advantage of the program’s computational powers. Given a square matrix A of any size, we simply enter > Determinant(A);

As we explore properties of determinants in the exercises of this section, it will prove useful to be able to generate random matrices. Within the LinearAlgebra package in Maple, one accomplishes this for a 3 × 3 matrix with the command > RandomMatrix(3);

For example, if we wanted to consider the determinant of a random matrix A we could enter the code > A := RandomMatrix(3); > Determinant(A);

See exercise 11 for a particular instance where this code will be useful.


Exercises 1.9

Compute (by hand) the determinant of each of the following matrices in exercises 1–7, and hence state whether or not the matrix is invertible.

1. $A = \begin{bmatrix} 2 & 1 \\ 2 & 2 \end{bmatrix}$

2. $A = \begin{bmatrix} 2 & 4 \\ 1 & 2 \end{bmatrix}$

3. $A = \begin{bmatrix} 2 & 1 & -3 \\ 2 & 2 & 5 \\ 2 & 3 & -1 \end{bmatrix}$

4. $A = \begin{bmatrix} 2 & 1 & 3 \\ 2 & 2 & 4 \\ 2 & 3 & 5 \end{bmatrix}$

5. $A = \begin{bmatrix} -3 & 1 & 0 & 5 \\ 0 & 2 & -4 & 0 \\ 0 & 0 & -7 & 11 \\ 0 & 0 & 0 & 6 \end{bmatrix}$

6. $A = \begin{bmatrix} a & a & d \\ b & b & e \\ c & c & f \end{bmatrix}$

7. $I_n$, where $I_n$ is the n × n identity matrix.

8. For which value(s) of h is the matrix $\begin{bmatrix} 1 & 2 \\ -3 & h \end{bmatrix}$ invertible? Explain your answer in at least two different ways.

9. For which value(s) of z is the matrix $\begin{bmatrix} 2-z & 1 \\ 1 & 2-z \end{bmatrix}$ invertible? Why?

10. For which value(s) of z do nontrivial solutions x to the equation $\begin{bmatrix} 2-z & 1 \\ 1 & 2-z \end{bmatrix} x = 0$ exist? For one such value of z, determine a nontrivial solution x to the equation.

11. In a computer algebra system, devise code that will generate two random 3 × 3 matrices A and B, and that subsequently computes det(A), det(B), and det(AB). What theorem do you conjecture is true about the relationship between det(AB) and the individual determinants det(A) and det(B)?


12. In a computer algebra system, devise code that will generate a random 3 × 3 matrix A and that subsequently computes its transpose $A^T$, as well as det(A) and $\det(A^T)$. What theorem do you conjecture is true about the relationship between det(A) and $\det(A^T)$?

13. Use the formula conjectured in exercise 11 above to show that if A is invertible, then $\det(A^{-1}) = \dfrac{1}{\det(A)}$. (Hint: $AA^{-1} = I$.)

14. What can you say about the determinant of any square matrix in which one of the columns (or rows) is zero? Why?

15. What can you say about the determinant of any square matrix where one of the columns (or rows) is repeated in the matrix? Why?

16. Suppose that A is an n × n matrix and that Ax = 0 has infinitely many solutions. What can you say about det(A)? Why?

17. Suppose that $A^2$ is not invertible. Can you determine if A is invertible or not? Explain.

18. Two matrices A and B are said to be similar if there exists an invertible matrix P such that $A = PBP^{-1}$. What can you say about the determinants of similar matrices?

19. Let A be an arbitrary 2 × 2 matrix of the form $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ where a ≠ 0 and A is assumed to be invertible. Working by hand, row reduce the augmented matrix $[A \;\; I_2]$ and hence determine a formula for $A^{-1}$ in terms of the entries of A. What role does det(A) play in the formula for $A^{-1}$?

20. Decide whether each of the following sentences is true or false. In every case, write one sentence to support your answer.
(a) Swapping the rows in a square matrix A does not change the value of det(A).
(b) If A is a square matrix with a pivot in every column, then det(A) = 0.
(c) The determinant of any diagonal matrix is the product of its diagonal entries.
(d) If A is an n × n matrix and Ax = b has a unique solution for every $b \in \mathbb{R}^n$, then det(A) = 0.

1.10 The eigenvalue problem

Another powerful characteristic of linear algebra is the way the subject often allows us to better understand an infinite collection of objects in terms of the properties of a small, finite number of elements in the set. For example, if we have a set of three linearly independent vectors that spans $\mathbb{R}^3$, then every vector in $\mathbb{R}^3$ may be understood as a unique linear combination of the three special vectors in the linearly independent spanning set. Thus, in some ways it is sufficient to understand these three vectors, and to use that knowledge to better understand the rest of the vectors in $\mathbb{R}^3$. In a similar way, as we will see in this section, for an n × n matrix A there are up to n important vectors (called eigenvectors) that enable us to better understand a variety of properties of the matrix.

The process of matrix multiplication enables us to associate a function with any given matrix A. For example, if A is a 2 × 2 matrix, then we may define a function T by the formula

\[ T(x) = Ax \tag{1.10.1} \]

Note that the domain of the function T is $\mathbb{R}^2$, the set of all vectors with two entries. Moreover, note that every output of the function T is also a vector in $\mathbb{R}^2$. We therefore use the notation $T : \mathbb{R}^2 \to \mathbb{R}^2$. This is analogous to familiar functions like $f(x) = x^2$, where for every real number input we obtain a real number output ($f : \mathbb{R} \to \mathbb{R}$); the difference here is that for the function T, for every vector input we get a vector output. In what follows, we go in search of special input vectors to the function T for which the corresponding output is particularly simple to compute. The next example will highlight the properties of the vector(s) we seek.

Example 1.10.1 Explore the geometric effect of the matrix

\[ A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \]

on the vectors $u = [1 \;\; 0]^T$ and $v = [1 \;\; 1]^T$ from the perspective of the function T(x) = Ax.

Solution. We first compute $T(u) = Au = [2 \;\; 1]^T$. In figure 1.15, we see a plot of the vector u on the left, and T(u) on the right. This shows that the geometric effect of T on u is to rotate u and stretch it. For the vector v, we observe that $T(v) = Av = [3 \;\; 3]^T$. Graphically, as shown in figure 1.16, it is clear that T(v) is simply a stretch of v by a factor of 3. Said slightly differently, we might write that

\[ T(v) = Av = \begin{bmatrix} 3 \\ 3 \end{bmatrix} = 3\begin{bmatrix} 1 \\ 1 \end{bmatrix} = 3v \]

This shows that the result of the function T (and hence the matrix A) being applied to the vector v is particularly simple: v is only stretched by T.

Figure 1.15 The vectors u and T(u) in example 1.10.1.

Figure 1.16 The vectors v and T(v).

For any n × n matrix A, there is an associated function $T : \mathbb{R}^n \to \mathbb{R}^n$ defined by T(x) = Ax. This function takes a given vector in $\mathbb{R}^n$ and maps it to a corresponding vector in $\mathbb{R}^n$; in every case, we may view this output as resulting from the input vector being stretched and/or rotated. Input vectors that are only stretched have corresponding outputs that are simplest to determine: the input vector is simply multiplied by a scalar. To put this another way, for these stretched-only vectors, multiplying them by A is equivalent to multiplying them by a constant. Such vectors prove to be important for a host of reasons, and are called the eigenvectors of a matrix A.

Definition 1.10.1 For a given n × n matrix A, a nonzero vector v is said to be an eigenvector of A if and only if there exists a scalar λ such that

\[ Av = \lambda v \tag{1.10.2} \]

The scalar λ is called the eigenvalue corresponding to the eigenvector v.
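Definition 1.10.1 can be tested directly for any candidate vector. The following Python sketch is an illustration only; the helper names mat_vec and stretch_factor are our own, and integer entries are assumed so that exact fractions can be used. It reports the scalar λ when Av = λv and reports nothing otherwise.

```python
from fractions import Fraction


def mat_vec(A, x):
    """Matrix-vector product Ax, with A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def stretch_factor(A, v):
    """Return λ if Av = λv for the nonzero integer vector v, else None."""
    Av = mat_vec(A, v)
    # the first nonzero entry of v fixes the only possible value of λ
    i = next(k for k, vk in enumerate(v) if vk != 0)
    lam = Fraction(Av[i], v[i])
    if all(Avk == lam * vk for Avk, vk in zip(Av, v)):
        return lam
    return None


A = [[2, 1], [1, 2]]              # the matrix of example 1.10.1
print(stretch_factor(A, [1, 1]))  # v is only stretched
print(stretch_factor(A, [1, 0]))  # u is rotated as well, so no λ exists
```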


In example 1.10.1, we found that the vector $v = [1 \;\; 1]^T$ is an eigenvector of the given matrix A with corresponding eigenvalue 3 since Av = 3v. What is not yet clear is how we even begin to find eigenvectors and eigenvalues. We will soon see that some of the many different perspectives we can take on systems of linear equations will help us solve this problem.

In general, given an n × n matrix A, we seek eigenvectors v that are, by definition, nonzero and satisfy the equation Av = λv. In one sense, what makes this problem challenging is that neither v nor λ is initially known. We thus explore some different perspectives on the problem to see if we can highlight the role of either v or λ. Early in this chapter, we spent significant effort studying homogeneous equations and the circumstances under which they have nontrivial solutions. Here, the eigenvector problem can be rephrased in a similar light. Subtracting λv from both sides of (1.10.2), we equivalently seek λ and v such that

\[ Av - \lambda v = 0 \tag{1.10.3} \]

Viewing λv as (λI)v, we can factor (1.10.3) and write

\[ (A - \lambda I)v = 0 \tag{1.10.4} \]

Now the question becomes, “for which values of λ does (1.10.4) have a nontrivial solution?” At this point, we recall theorem 1.6.2, which tells us that the equation Bx = 0 has only the trivial solution if and only if the matrix B has a pivot in every column. To have a nontrivial solution, we therefore want A − λI to not have a pivot in every column. In (1.10.4), the matrix A − λI is square, so by the Invertible Matrix Theorem such a nontrivial solution exists if and only if A − λI is not invertible. This last observation brings us, finally, to determinants. As we saw in section 1.9, a matrix is invertible if and only if its determinant is nonzero. Therefore, a nontrivial solution to (1.10.4) exists whenever λ is such that det(A − λI) = 0. In the next example, we explore how this equation enables us to find the eigenvalues of a matrix A, and hence the eigenvectors as well.

Example 1.10.2 Find the eigenvalues and eigenvectors of the matrix

\[ A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \]

Solution. As seen in our preceding discussion, by the definition of eigenvalues and eigenvectors, λ is an eigenvalue of A if and only if the equation (A − λI)v = 0 has a nontrivial solution. Note first that A − λI is the matrix A with the scalar λ subtracted from each diagonal entry since

\[ A - \lambda I = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} - \begin{bmatrix} \lambda & 0 \\ 0 & \lambda \end{bmatrix} = \begin{bmatrix} 2-\lambda & 1 \\ 1 & 2-\lambda \end{bmatrix} \]


We next compute det(A − λI) so that we can see which values of λ make this determinant zero. In particular, we have

\[ \det(A - \lambda I) = \det\begin{bmatrix} 2-\lambda & 1 \\ 1 & 2-\lambda \end{bmatrix} = (2-\lambda)^2 - 1 = \lambda^2 - 4\lambda + 3 \tag{1.10.5} \]

Thus, in order for det(A − λI) = 0, λ must satisfy the equation $\lambda^2 - 4\lambda + 3 = 0$. Factoring, (λ − 3)(λ − 1) = 0, and therefore λ = 3 and λ = 1 are eigenvalues of A. The value λ = 3 is not surprising, given our earlier discoveries in example 1.10.1.

Next, we proceed to find the eigenvectors that correspond to each eigenvalue. Beginning with λ = 3, we seek nonzero vectors v that satisfy Av = 3v, or equivalently

\[ (A - 3I)v = 0 \]

This problem is a familiar one: solving a homogeneous system of linear equations for which infinitely many solutions exist. Augmenting A − 3I with a column of zeros and row-reducing, we find that

\[ \begin{bmatrix} -1 & 1 & 0 \\ 1 & -1 & 0 \end{bmatrix} \to \begin{bmatrix} 1 & -1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \]

Note that from the very definition of an eigenvector, by which we seek a nontrivial solution to (A − λI)v = 0, it must be the case at this point that the matrix A − λI does not have a pivot in every row. Interpreting the row-reduced matrix with the free variable $v_2$, we find that the vector $v = [v_1 \;\; v_2]^T$ must satisfy $v_1 - v_2 = 0$. Thus, any vector v of the form

\[ v = \begin{bmatrix} v_2 \\ v_2 \end{bmatrix} = v_2 \begin{bmatrix} 1 \\ 1 \end{bmatrix} \]

is an eigenvector of A that corresponds to the eigenvalue λ = 3. In particular, we observe that any nonzero scalar multiple of the vector $v = [1 \;\; 1]^T$ is an eigenvector of A with associated eigenvalue 3. We say that the set of all eigenvectors associated with eigenvalue 3 is the eigenspace corresponding to λ = 3.

It now only remains to find the eigenvectors associated with λ = 1. We proceed in the same manner as above, now solving the homogeneous equation (A − 1I)v = 0. Row-reducing, we find that

\[ \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \end{bmatrix} \to \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \]


and therefore the eigenvector v must satisfy $v_1 + v_2 = 0$ and have the form

\[ v = \begin{bmatrix} -v_2 \\ v_2 \end{bmatrix} = v_2 \begin{bmatrix} -1 \\ 1 \end{bmatrix} \]

Here, any nonzero scalar multiple of $v = [-1 \;\; 1]^T$ is an eigenvector of A corresponding to λ = 1.

There are several important general observations to be made from example 1.10.2. One is that for any 2 × 2 matrix, the matrix will have 0, 1, or 2 real eigenvalues. This comes from the fact that det(A − λI) is a quadratic function in the variable λ, and therefore can have up to two real zeros. While it is possible to consider complex eigenvalues, we will wait until these arise in our study of systems of differential equations to address them in detail. In addition, we note that there are infinitely many eigenvectors associated with each eigenvalue. Often we will be interested in finding representative eigenvectors, that is, ones for which all others with the same eigenvalue are linear combinations. Finally, it is worthwhile to note that the two representative eigenvectors found in example 1.10.2, corresponding respectively to the two distinct eigenvalues, are linearly independent. More on why this is important will be discussed at the end of this section; for now, we remark that it is possible to show that eigenvectors corresponding to distinct eigenvalues are always linearly independent. This fact will be proved in exercise 16.

The observations in the preceding paragraph generalize to the case of n × n matrices. It may be shown that det(A − λI) is a polynomial of degree n in λ. This function is usually called the characteristic polynomial; the equation det(A − λI) = 0 is typically referred to as the characteristic equation. Because the characteristic polynomial has degree n, it follows that A has up to n real eigenvalues.⁵ Next we consider two additional examples that demonstrate some more of the possibilities and important ideas that arise in trying to find the eigenvalues and eigenvectors of a given matrix.
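For a general 2 × 2 matrix with rows [a b] and [c d], expanding det(A − λI) = (a − λ)(d − λ) − bc gives the quadratic λ² − (a + d)λ + (ad − bc), which for the matrix of example 1.10.2 is exactly (1.10.5). The following Python sketch, whose function name real_eigenvalues_2x2 is our own, applies the quadratic formula to this polynomial and returns whichever real eigenvalues exist, illustrating the "0, 1, or 2 real eigenvalues" observation. It works in floating point, so results are approximate in general.

```python
import math


def real_eigenvalues_2x2(A):
    """Real roots of det(A - λI) = λ² - (a + d)λ + (ad - bc), largest first."""
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4 * det  # discriminant of the characteristic polynomial
    if disc < 0:
        return []                   # no real eigenvalues
    root = math.sqrt(disc)
    # a set removes the duplicate when the root is repeated
    return sorted({(trace + root) / 2, (trace - root) / 2}, reverse=True)


print(real_eigenvalues_2x2([[2, 1], [1, 2]]))   # two distinct real eigenvalues
print(real_eigenvalues_2x2([[2, 0], [0, 2]]))   # one repeated eigenvalue
print(real_eigenvalues_2x2([[0, -1], [1, 0]]))  # none
```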
Example 1.10.3 Determine the eigenvalues and eigenvectors of the matrix

\[ R = \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix} \]

In addition, explore the geometric effect of the function T(v) = Rv on vectors in $\mathbb{R}^2$.

5 See appendix C for a review and discussion of important properties of roots of polynomial equations.


Solution. We consider the characteristic equation det(R − λI) = 0 and hence solve

\[ 0 = \det\begin{bmatrix} \frac{1}{\sqrt{2}} - \lambda & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} - \lambda \end{bmatrix} = \left(\frac{1}{\sqrt{2}} - \lambda\right)^2 + \frac{1}{2} = \lambda^2 - \sqrt{2}\,\lambda + 1 \]

By the quadratic formula, it follows that

\[ \lambda = \frac{\sqrt{2} \pm \sqrt{2 - 4}}{2} = \frac{\sqrt{2} \pm i\sqrt{2}}{2} \]

which shows that R does not have any real eigenvalues. If we explore the geometric effect of T(v) = Rv graphically, we can better understand why this is the case. Beginning with the vector $e_1 = [1 \;\; 0]^T$ and computing $Re_1 = [1/\sqrt{2} \;\; 1/\sqrt{2}]^T$, as seen in figure 1.17, we see that the function T(x) = Rx rotates the vector $e_1$ counterclockwise by π/4 radians, and (as computing the length of each vector shows) there is no stretching involved. Similarly, for the vector $e_2 = [0 \;\; 1]^T$, we can see that $Re_2 = [-1/\sqrt{2} \;\; 1/\sqrt{2}]^T$. Just as with the previous vector $e_1$, we see that the function T(v) = Rv simply rotates the vector $e_2$ counterclockwise by π/4 radians. In fact, since every vector in $\mathbb{R}^2$ can be written as a linear combination of $e_1$ and $e_2$, it follows that the image Rv of any vector v is simply the original vector rotated counterclockwise π/4 radians. This shows that no vector in $\mathbb{R}^2$ is simply stretched under multiplication by R, and therefore R has no real eigenvectors.
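Both halves of this example can be checked numerically. The Python sketch below is an illustration under our own naming (apply, angle_and_length, and the variable names are hypothetical): it finds the roots of λ² − √2 λ + 1 with complex arithmetic, and it measures the angle and length of Re1 and Re2 to confirm the rotation by π/4 with no stretching. Floating-point arithmetic is used, so the agreement is only up to rounding.

```python
import cmath
import math

s = 1 / math.sqrt(2)
R = [[s, -s], [s, s]]

# roots of λ² + bλ + c = 0 with b = -√2 and c = 1, allowing complex values
b, c = -math.sqrt(2), 1.0
disc = b * b - 4 * c                      # negative, so the roots are not real
roots = [(-b + cmath.sqrt(disc)) / 2, (-b - cmath.sqrt(disc)) / 2]


def apply(M, v):
    """Product of a 2 × 2 matrix M with a vector v."""
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]


def angle_and_length(v):
    """Angle from the positive horizontal axis (radians) and length of v."""
    return math.atan2(v[1], v[0]), math.hypot(v[0], v[1])


for e in ([1.0, 0.0], [0.0, 1.0]):
    before, after = angle_and_length(e), angle_and_length(apply(R, e))
    print(after[0] - before[0], after[1])  # change in angle, new length
print(roots)
```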

Figure 1.17 The vectors e1 and T(e1) = Re1.

Matrices such as R in example 1.10.3 with the property that they rotate every vector by a fixed angle (with no stretching factor) are usually called rotation matrices. Other interesting cases arise in the search for eigenvectors when some of the eigenvalues are repeated, that is, when a value λ is a multiple root of the characteristic equation det(A − λI) = 0. We explore this further in the next example.

Example 1.10.4 Determine all eigenvalues and eigenvectors of the matrix

\[ A = \begin{bmatrix} 5 & 6 & 2 \\ 0 & -1 & -8 \\ 1 & 0 & -2 \end{bmatrix} \]

Solution. As in previous examples, we first compute det(A − λI). Doing so and simplifying yields

\[ \det(A - \lambda I) = -36 + 15\lambda + 2\lambda^2 - \lambda^3 \]

Factoring, it follows that

\[ \det(A - \lambda I) = -(\lambda + 4)(\lambda - 3)^2 \]

Setting the characteristic polynomial equal to zero, it is required that $-(\lambda + 4)(\lambda - 3)^2 = 0$. This shows that A has two distinct eigenvalues; moreover, just as with zeros of polynomials, we say that λ = −4 has multiplicity 1, while λ = 3 has multiplicity 2. We now find the eigenvectors corresponding to each eigenvalue. For λ = −4, we solve the equation (A + 4I)v = 0, and see by row-reducing that

\[ \begin{bmatrix} 9 & 6 & 2 & 0 \\ 0 & 3 & -8 & 0 \\ 1 & 0 & 2 & 0 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 2 & 0 \\ 0 & 1 & -\frac{8}{3} & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]

Note that $v_3$ is a free variable, and that the corresponding eigenvector v must have components which satisfy $v_1 + 2v_3 = 0$ and $v_2 - \frac{8}{3}v_3 = 0$, which shows that v has form

\[ v = v_3 \begin{bmatrix} -2 \\ \frac{8}{3} \\ 1 \end{bmatrix} \]

Likewise, for λ = 3, we consider (A − 3I)v = 0, and row-reduce to find that

\[ \begin{bmatrix} 2 & 6 & 2 \\ 0 & -4 & -8 \\ 1 & 0 & -5 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & -5 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{bmatrix} \]


This leads us to see that the corresponding eigenvector has form

\[ v = v_3 \begin{bmatrix} 5 \\ -2 \\ 1 \end{bmatrix} \]

Therefore, we see that for this matrix A, the matrix has two distinct eigenvalues (−4 and 3), and each of these eigenvalues has only one associated linearly independent eigenvector. That is, every eigenvector of A associated with λ = −4 is a scalar multiple of $[-2 \;\; \frac{8}{3} \;\; 1]^T$ while every eigenvector associated with λ = 3 is a scalar multiple of $[5 \;\; -2 \;\; 1]^T$.

In the three preceding examples, we have seen that an n × n matrix has up to n real eigenvalues. It turns out that there are also up to n linearly independent eigenvectors of the matrix. For many reasons, the best possible scenario is when a matrix has n linearly independent eigenvectors, such as the matrix A in example 1.10.2. In that 2 × 2 situation, A had two distinct real eigenvalues, and two corresponding linearly independent eigenvectors. One reason that this is so useful is that the eigenvectors are not only linearly independent, but also span $\mathbb{R}^2$. If we call the two eigenvectors found in example 1.10.2 u and v, corresponding to λ = 3 and μ = 1, respectively, then, since these two vectors are linearly independent in $\mathbb{R}^2$ and span $\mathbb{R}^2$, we can write every vector in $\mathbb{R}^2$ uniquely as a linear combination of u and v. In particular, given a vector x, there exist coefficients α and β such that

\[ x = \alpha u + \beta v \]

If we are interested in computing Ax, we can do so now solely by knowing how A acts on the eigenvectors. Specifically, if we apply the linearity of matrix multiplication and the definition of eigenvectors, we have

\[ Ax = A(\alpha u + \beta v) = \alpha Au + \beta Av = \alpha\lambda u + \beta\mu v \]
This then reduces matrix multiplication essentially to scalar multiplication. In conclusion, we have seen in this section that via matrix multiplication, every matrix can be viewed as a function in the way that, through multiplication, it stretches and rotates vectors. Those vectors that are only stretched are called eigenvectors, and the factors by which the matrix stretches them are called eigenvalues. By knowing the eigenvalues and eigenvectors, we can better understand how A acts on an arbitrary vector, and, with some more sophisticated approaches, even further understand key properties of the matrix. Some of these properties will be studied in detail later in this text when we consider systems of differential equations.
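The computation Ax = αλu + βμv described above can be sketched as follows for the eigenvectors of example 1.10.2. This Python illustration rests on our own choices: the function names are hypothetical, the test vector x is arbitrary, and the coefficients α and β are found with Cramer's rule for the 2 × 2 system x = αu + βv, a technique we substitute here for row-reduction. Integer entries are assumed so that exact fractions can be used.

```python
from fractions import Fraction


def coefficients(u, v, x):
    """Solve x = alpha*u + beta*v by Cramer's rule (u, v linearly independent)."""
    d = u[0] * v[1] - v[0] * u[1]  # determinant of the matrix with columns u, v
    alpha = Fraction(x[0] * v[1] - v[0] * x[1], d)
    beta = Fraction(u[0] * x[1] - x[0] * u[1], d)
    return alpha, beta


def apply_via_eigenvectors(u, lam, v, mu, x):
    """Compute Ax as alpha*lam*u + beta*mu*v, never using A itself."""
    alpha, beta = coefficients(u, v, x)
    return [alpha * lam * u[i] + beta * mu * v[i] for i in range(2)]


u, lam = [1, 1], 3    # eigenvector and eigenvalue from example 1.10.2
v, mu = [-1, 1], 1
x = [4, 2]            # an arbitrary vector chosen for illustration
print(apply_via_eigenvectors(u, lam, v, mu, x))
```

For this x the result agrees with the direct product of A with x, since 2(4) + 1(2) = 10 and 1(4) + 2(2) = 8.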


1.10.1 Markov chains, eigenvectors, and Google

In a Markov process such as the one discussed in subsection 1.3.1 that represents the transition of voters from one classification to another, it is natural to wonder whether or not there is a distribution of voters for which the total number in each category will remain constant from one year to the next. For example, for the Markov process represented by

\[ x^{(n+1)} = M x^{(n)} \tag{1.10.6} \]

where M is the matrix

\[ M = \begin{bmatrix} 0.95 & 0.03 & 0.07 \\ 0.02 & 0.90 & 0.13 \\ 0.03 & 0.07 & 0.80 \end{bmatrix} \]

we can ask: is there a voter distribution x such that Mx = x? In light of our most recent work with eigenvalues and eigenvectors, we see that this question is equivalent to asking if the matrix M has λ = 1 as an eigenvalue with some corresponding eigenvector that can represent a voter distribution. If we compute the eigenvalues and eigenvectors of M, we find that the eigenvalues are λ = 1.000, 0.911, 0.739. The eigenvector corresponding to λ = 1 is $v = [0.770 \;\; 0.558 \;\; 0.311]^T$. Scaling v so that the sum of its entries is 250, we see that the eigenvector $v = [117.450 \;\; 85.113 \;\; 47.437]^T$ represents the distribution of a population of 250 000 people in such a way that the total number of Democrats, Republicans, and Independents does not change from one year to the next, under the hypothesis that voters change categories annually according to the likelihoods expressed in the Markov matrix M. This eigenvector is sometimes also called a stationary vector.

Remarkably, we can also note that in our earlier computations in subsection 1.3.1 for this Markov chain, we observed that the sequence of vectors $x^{(1)}, x^{(2)}, \ldots, x^{(20)}, \ldots$ was approaching a single vector. In fact, the limiting value of this sequence is the eigenvector $v = [117.450 \;\; 85.113 \;\; 47.437]^T$. That this phenomenon occurs is the result of the so-called Power method, a rudimentary numerical technique for computing an eigenvalue–eigenvector pair of a matrix. More about this concept can be studied in the project on discrete dynamical systems found in section 1.13.3.

Example 1.10.5 Find the stationary vector from the matrix in example 1.3.3.

Solution. Under the assumptions stated in example 1.3.3, we saw that the migration of citizens from urban to suburban areas of a metropolitan area, or vice versa, was modeled by the Markov process $x^{(n+1)} = M x^{(n)}$ where M is the matrix

\[ M = \begin{bmatrix} 0.85 & 0.08 \\ 0.15 & 0.92 \end{bmatrix} \]
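Before solving for the stationary vector exactly, the limiting behavior mentioned above can be observed numerically for this matrix. The following Python sketch (not the Maple used in this text) repeatedly multiplies a starting distribution by M, which is the idea behind the power method; the starting vector x0 and the number of steps are hypothetical choices of ours. Because each column of M sums to one, the entries of every iterate continue to sum to one, so no rescaling is needed along the way.

```python
def mat_vec(M, x):
    """Matrix-vector product Mx, with M given as a list of rows."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def iterate(M, x, steps):
    """Apply x -> Mx the given number of times and return the final vector."""
    for _ in range(steps):
        x = mat_vec(M, x)
    return x


M = [[0.85, 0.08], [0.15, 0.92]]
x0 = [0.5, 0.5]           # assumed starting distribution; entries sum to 1
x = iterate(M, x0, 200)   # 200 steps is an arbitrary but ample choice
print(x)                  # approximately [0.3478, 0.6522]
print(mat_vec(M, x))      # essentially the same vector, so Mx = x up to rounding
```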


Solving the equation x = Mx by writing (M − I)x = 0, we see that we need to find the eigenvector of M that corresponds to λ = 1. Doing so, we find that the eigenvector is

\[ v = \begin{bmatrix} 0.4706 \\ 0.8824 \end{bmatrix} \]

Scaling this vector so that the sum of its entries is one, we see that the population stabilizes when it is distributed with 34.78 percent in the city and 65.22 percent in the suburbs, in accordance with the vector $[0.3478 \;\; 0.6522]^T$.

One of the most stunning applications of eigenvalues and eigenvectors can be found on the World Wide Web. In particular, the idea of finding a stationary vector that satisfies Mx = x is at the center of Google’s Page Rank Algorithm that it uses to index the importance of billions of pages on the Internet. What is particularly challenging about this problem is the fact that the stochastic matrix M used by the algorithm is a square matrix that has one column for every page on the World Wide Web that is indexed by Google! In early 2007, this meant that M was a matrix with 25 billion columns. Nonetheless, properties of the matrix M and sophisticated numerical algorithms make it possible for modern computers to quickly find the stationary vector of M and hence provide the user with the results we have all grown accustomed to in using Google.⁶

1.10.2 Using Maple to find eigenvalues and eigenvectors

Due to its reliance upon determinants and the solution of polynomial equations, the eigenvalue problem is computationally difficult for any case larger than 3 × 3. Sophisticated algorithms have been developed to compute eigenvalues and eigenvectors efficiently and accurately. One of these is the so-called QR algorithm, which through an iterative technique produces excellent approximations to eigenvalues and eigenvectors simultaneously. While Maple implements these algorithms and can find both eigenvalues and eigenvectors, it is essential that we not only understand what the program is attempting to compute, but also how to interpret the resulting output. As always, in what follows we are working within the LinearAlgebra package. Given an n × n matrix A, we can compute the eigenvalues of A with the command > Eigenvalues(A);

6 A detailed description of how the Page Rank Algorithm works and the role that eigenvectors play may be read at http://www.ams.org/featurecolumn/archive/pagerank.html.

Doing so for the matrix

\[ A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \]

from example 1.10.2 yields the Maple output

\[ \begin{bmatrix} 3 \\ 1 \end{bmatrix} \]

Despite the vector format, the program is telling us that the two eigenvalues of the matrix A are 3 and 1. If we desire the eigenvectors, too, we can use the command > Eigenvectors(A);

which leads to the output

\[ \begin{bmatrix} 3 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} \]

Here, the first vector tells us the eigenvalues of A. The following matrix holds the corresponding eigenvectors in its columns; the vector $[1 \;\; 1]^T$ is the eigenvector corresponding to λ = 3 and $[-1 \;\; 1]^T$ corresponds to λ = 1. Maple is extremely powerful. It is not at all bothered by complex numbers. So, if we enter a matrix like the one in example 1.10.3 that has no real eigenvalues, Maple will find complex eigenvalues and eigenvectors. To see how this appears, we enter the matrix

\[ R = \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix} \]

and execute the command > Eigenvectors(R);

The resulting output is

\[ \begin{bmatrix} \frac{1}{2}\sqrt{2} + \frac{1}{2} I \sqrt{2} \\ \frac{1}{2}\sqrt{2} - \frac{1}{2} I \sqrt{2} \end{bmatrix}, \quad \begin{bmatrix} I & -I \\ 1 & 1 \end{bmatrix} \]

Note that here Maple is using ‘I’ to denote not the identity matrix, but rather $\sqrt{-1}$. Just as we saw in example 1.10.3, R does not have any real eigenvalues. We can use familiar properties of complex numbers (most importantly, $I^2 = -1$) to actually check that the equation Rx = λx holds for the listed complex eigenvalues and complex eigenvectors above. However, at this point in our study, these complex eigenvectors are of less importance, so we defer further details on them until later work with systems of differential equations.

One final example is relevant here to see how Maple deals with repeated eigenvalues and missing eigenvectors. If we enter the 3 × 3 matrix A from example 1.10.4 and execute the Eigenvectors command, we receive the output

\[ \begin{bmatrix} 3 \\ 3 \\ -4 \end{bmatrix}, \quad \begin{bmatrix} 5 & 0 & -2 \\ -2 & 0 & \frac{8}{3} \\ 1 & 0 & 1 \end{bmatrix} \]

Here we see that 3 is a repeated eigenvalue of A with multiplicity 2. The first two columns of the matrix in the output contain the (potentially) linearly independent eigenvectors which correspond to this eigenvalue. The second column of all zeros indicates that A has only one linearly independent eigenvector corresponding to this particular eigenvalue. The third column, of course, is the eigenvector associated with the eigenvalue λ = −4. The column of all zeros also demonstrates that $\mathbb{R}^3$ does not have a linearly independent spanning set that consists of eigenvectors of A.

Exercises 1.10

In exercises 1–8, compute (by hand) the eigenvalues and any corresponding real eigenvectors of the given matrix A.

1. $A = \begin{bmatrix} 5 & 1 \\ 0 & 3 \end{bmatrix}$

2. $A = \begin{bmatrix} 3 & -1 \\ -1 & 3 \end{bmatrix}$

3. $A = \begin{bmatrix} 3 & 4 \\ -5 & -5 \end{bmatrix}$

4. $A = \begin{bmatrix} 1 & 4 \\ 1 & 4 \end{bmatrix}$

5. $A = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{bmatrix}$

6. $A = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}$

7. $A = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}$

8. $A = \begin{bmatrix} -3 & 2 & 5 \\ 0 & 6 & -2 \\ 0 & 0 & 5 \end{bmatrix}$


9. A 2 × 2 matrix A has eigenvalues 5 and −1 and corresponding eigenvectors u = [0 1]T and v = [1 0]T . Use this information to compute Ax, where x is the vector x = [−5 4]T . 10. A 2 × 2 matrix A has eigenvalues −3 and −2 and corresponding eigenvectors u = [−1 1]T and v = [1 1]T . Use this information to compute Ax, where x is the vector x = [−3 5]T . 11. Consider the matrix

⎡

⎤ −2 1 1 1⎦ A = ⎣ 1 −2 1 1 −2

(a) Determine the eigenvalues and eigenvectors of A. (b) Does R3 have a linearly independent spanning set that consists of eigenvectors of A? 12. Consider the matrix

A=

3 −1 −1 3

(a) Determine the eigenvalues and eigenvectors of A, and show that A has two linearly independent eigenvectors.
(b) Let P be the matrix whose columns are two linearly independent eigenvectors of A. Why is P invertible?
(c) Let D be the diagonal matrix whose diagonal entries are the eigenvalues of A; place the eigenvalues on the diagonal in an order corresponding to the order of the eigenvectors in the columns of P, where P is the matrix defined in (b) above. Compute AP and PD. What do you observe?
(d) Explain why $A = PDP^{-1}$. Use this factorization to compute $A^2$, $A^3$, and $A^{10}$ in terms of P, D, and $P^{-1}$. In particular, explain how $A^{10}$ can be easily computed by using the diagonal matrix D along with P and $P^{-1}$.

13. Consider the matrix

$$A = \begin{bmatrix} 3 & -1 & 1 \\ -1 & 3 & -1 \\ 1 & -1 & 3 \end{bmatrix}$$

(a) Determine the eigenvalues and eigenvectors of A, and show that A has three linearly independent eigenvectors.
(b) Let P be the matrix whose columns are three linearly independent eigenvectors of A. Why is P invertible?
(c) Let D be the diagonal matrix whose diagonal entries are the eigenvalues of A; place the eigenvalues on the diagonal in an order corresponding to the order of the eigenvectors in the columns of P, where P is the matrix defined in (b) above. Compute AP and PD. What do you observe?
(d) Explain why $A = PDP^{-1}$. Use this factorization to compute $A^2$, $A^3$, and $A^{10}$ in terms of P, D, and $P^{-1}$.

14. Prove that an n × n matrix A is invertible if and only if A has no eigenvalue equal to zero.

15. Show that if A, B, and P are square matrices (with P invertible) such that $B = PAP^{-1}$, then A and B have the same eigenvalues. (Hint: consider the characteristic equation for $PAP^{-1}$.)

16. Prove that if A is a 2 × 2 matrix and v and u are eigenvectors of A corresponding to distinct eigenvalues λ and μ, then v and u are linearly independent. (Hint: suppose to the contrary that v and u are linearly dependent.)

17. For a differentiable function y, denote the derivative of y with respect to x by D(y). Now consider the function $y = e^{7x}$, and compute D(y). For what value of λ is D(y) = λy? Explain how this value behaves like an eigenvalue of the operator D. What is the corresponding eigenvector? How does the problem change if we consider $y = e^{rx}$ for any other real value of r?

18. For a vector-valued function x(t), let the derivative of x with respect to t be denoted by D(x). For the function

$$x(t) = \begin{bmatrix} e^{-2t} \\ -3e^{-2t} \end{bmatrix}$$

compute D(x). For what value(s) of λ is D(x) = λx? Explain how it appears from your work that the operator D has an eigenvalue-eigenvector pair.

19. Suppose that for a large population that stays relatively constant, people are classified as living in urban, suburban, or rural settings. Moreover, assume that the probabilities of the various possible transitions are given by the following table:

Future location (↓) / current location (→)    U (%)    S (%)    R (%)
Urban                                           90        3        2
Suburban                                         7       96       10
Rural                                            3        1       88

Given that a population of 250 million is present, is there a stationary vector that reveals a population which does not change from year to year?

20. Car-owners can be grouped into classes based on the vehicles they own. A study of owners of sedans, minivans, and sport utility vehicles shows that the likelihood that an owner of one of these automobiles will replace it with another of the same or different type is given by the table

Future vehicle (↓) / current vehicle (→)    Sedan (%)    Minivan (%)    SUV (%)
Sedan                                           91            3            2
Minivan                                          7           95            8
SUV                                              2            2           90

If there are currently 100 000 vehicles in the population under study, is there a stationary vector that represents a distribution in which the number of owners of each type of vehicle will not change as they replace their vehicles?

21. Decide whether each of the following sentences is true or false. In every case, write one sentence to support your answer.
(a) If x is any vector and λ is a constant such that Ax = λx, then x is an eigenvector of A.
(b) If Ax = 0 has nontrivial solutions, then λ = 0 is an eigenvalue of A.
(c) Every 3 × 3 matrix has three real eigenvalues.
(d) If A is a 2 × 2 matrix, then A can have up to two real linearly independent eigenvectors.

1.11 Generalized vectors

Throughout our work with vectors in $\mathbb{R}^n$, we have regularly used several key algebraic properties they possess. For example, any two vectors u and v can be added to form a new vector u + v, any single vector can be multiplied by a scalar to determine a new vector cu, and there is a zero vector 0 with the property that for any vector v, v + 0 = v. Of course, we use other algebraic properties of vectors as well, often implicitly.

Other sets of mathematical objects behave in ways that are algebraically similar to vectors. The purpose of this section is to expand our perspective on what familiar mathematical entities might also reasonably be called vectors; much of this expanded perspective is in anticipation of our pending work with differential equations and their solutions. We motivate our study with several familiar examples, and then summarize a collection of formal properties that all these examples share.

Example 1.11.1 Let $M_{2 \times 2}$ denote the collection of all 2 × 2 matrices with real entries. Show that if A and B are any 2 × 2 matrices and $c \in \mathbb{R}$, then A + B and cA are also 2 × 2 matrices. In addition, show that there exists a “zero matrix” Z such that A + Z = A for every matrix A.


Solution. Let

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}$$

By the definition of matrix addition,

$$A + B = \begin{bmatrix} a_{11} + b_{11} & a_{12} + b_{12} \\ a_{21} + b_{21} & a_{22} + b_{22} \end{bmatrix}$$

and thus we see that A + B is also a 2 × 2 matrix. Recall that it only makes sense for matrices of the same size to be added; here we are simply pointing out the obvious fact that the sum of two matrices of the same size is yet another matrix of the same size. In the same way,

$$cA = \begin{bmatrix} ca_{11} & ca_{12} \\ ca_{21} & ca_{22} \end{bmatrix}$$

which shows that not only is the scalar multiple defined, but also that cA is a 2 × 2 matrix. Finally, if we let Z be the 2 × 2 matrix all of whose entries are zero,

$$Z = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$$

then our work with matrix sums shows us immediately that A + Z = A for every possible 2 × 2 matrix A.

Certainly, we can see that there is nothing particularly special about the 2 × 2 case in this example; the same properties will hold for $M_{m \times n}$ for any positive integer values of m and n. Mathematicians often use the language “$M_{2 \times 2}$ is closed under addition and scalar multiplication” and “$M_{2 \times 2}$ contains a zero element” to describe the observations we made in example 1.11.1. Specifically, to say that a set is closed under an operation means simply that if we perform the operation on an appropriate number of elements from the set, the result is another element in the set. We next consider several more examples of sets that demonstrate the properties of being closed and having a zero element.

Example 1.11.2 Let $P_2$ denote the set of all polynomials of degree 2 or less. That is, $P_2$ is the set of all functions of the form $p(x) = a_2 x^2 + a_1 x + a_0$ where $a_0, a_1, a_2 \in \mathbb{R}$. Show that $P_2$ is closed under addition and scalar multiplication, and that $P_2$ contains a zero element.

Solution. Before we formally address the stated tasks, let us remind ourselves how we add polynomial functions. If we are given, say, $f(x) = 2x^2 - 5x + 11$ and $g(x) = 4x - 3$, we compute $(f + g)(x) = f(x) + g(x) = 2x^2 - 5x + 11 + 4x - 3$. We can then add like terms to simplify and find that $(f + g)(x) = 2x^2 - x + 8$.


Similarly, if we wanted to compute (−3f)(x), we have $(-3f)(x) = -3f(x) = -3(2x^2 - 5x + 11) = -6x^2 + 15x - 33$.

We now show that $P_2$ is indeed closed under the operations of addition and scalar multiplication. Given two arbitrary elements of $P_2$, say $f(x) = a_2 x^2 + a_1 x + a_0$ and $g(x) = b_2 x^2 + b_1 x + b_0$, it follows upon adding and combining like terms that

$$(f + g)(x) = (a_2 + b_2)x^2 + (a_1 + b_1)x + (a_0 + b_0)$$

which is obviously a polynomial of degree 2 or lower, and thus f + g is an element of $P_2$. In the same way, for any real value c,

$$(cf)(x) = ca_2 x^2 + ca_1 x + ca_0$$

which also belongs to $P_2$. Finally, it is evident that if we let $z(x) = 0x^2 + 0x + 0$ (i.e., z(x) is the zero function), then (f + z)(x) = f(x) for any choice of f in $P_2$.

Here, too, we should observe that while these properties hold for $P_2$, there is nothing special about the 2. In fact, $P_n$ (the set of all polynomials of degree n or less) has the exact same properties. Even P, the set of all polynomials, behaves in the same manner.

Example 1.11.3 From calculus, consider the set C[−1, 1] of all continuous functions on the interval [−1, 1]. That is, C[−1, 1] = {f | f is continuous on [−1, 1]}.

Show that C[−1, 1] is closed under addition and scalar multiplication, and also that C[−1, 1] contains a zero element.

Solution. Two standard facts from calculus tell us that the sum of any two continuous functions is also a continuous function and that a constant multiple of a continuous function is also a continuous function. Thus C[−1, 1] is closed under addition and scalar multiplication. Furthermore, the zero function z(x) = 0 is itself continuous, which shows that C[−1, 1] indeed has a zero element.

One of the principal reasons that we are shifting our attention from vectors in $\mathbb{R}^n$ to this more generalized concept of vector, where the objects under consideration are often functions, is the fact that our focus in subsequent chapters will be solving differential equations. The solution to a differential equation is a function that makes the equation true. Moreover, we will also see that for certain important classes of differential equations, there are multiple solutions to the equation and that often these solution sets are closed under addition and scalar multiplication and also contain the zero function.

From each of the above examples, we see that $\mathbb{R}^n$ has many important properties that we can consider in a broader context. We therefore introduce the notion of a vector space, which is a set of objects that have defined operations of addition and scalar multiplication that satisfy the list of ten rules below. The concept of a vector space is a generalization of $\mathbb{R}^n$.
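Before turning to the formal rules, the closure computations from example 1.11.2 can be mirrored in a few lines of code. The following is only a minimal sketch: it assumes (as a convention of this illustration, not of the text) that a polynomial $a_2 x^2 + a_1 x + a_0$ is stored as the coefficient tuple (a0, a1, a2), and the names add, scale, and ZERO are hypothetical helpers.

```python
# Assumption of this sketch: a polynomial a2*x^2 + a1*x + a0 in P2 is stored
# as the coefficient tuple (a0, a1, a2).

def add(p, q):
    """Add two elements of P2 coefficient by coefficient."""
    return tuple(a + b for a, b in zip(p, q))

def scale(c, p):
    """Multiply an element of P2 by the scalar c."""
    return tuple(c * a for a in p)

ZERO = (0, 0, 0)  # the zero polynomial z(x) = 0x^2 + 0x + 0

f = (11, -5, 2)   # f(x) = 2x^2 - 5x + 11
g = (-3, 4, 0)    # g(x) = 4x - 3

f_plus_g = add(f, g)       # coefficients of 2x^2 - x + 8
minus_3f = scale(-3, f)    # coefficients of -6x^2 + 15x - 33
```

Because add and scale always return a tuple of three real coefficients, the results are again elements of the set under this encoding, which is exactly what closure asserts.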


While many of the rules are technical in nature, the most important ones to verify turn out to be the three that we have focused on so far: being closed under addition, closed under scalar multiplication, and having a zero element. All three sets described in the above examples are vector spaces, as is $\mathbb{R}^n$.

Definition 1.11.1 A vector space is a nonempty set V of objects, on which operations of addition and scalar multiplication are defined, where the objects in V (called vectors) adhere to the following ten rules:
1. For every u and v in V, the sum u + v is in V (V is “closed under vector addition”)
2. For every u and v in V, u + v = v + u (“vector addition is commutative”)
3. For every u, v, w in V, (u + v) + w = u + (v + w) (“vector addition is associative”)
4. There exists a zero vector 0 in V such that u + 0 = u for every u ∈ V (0 is called the additive identity of V)
5. For every u ∈ V, there is a vector −u such that u + (−u) = 0 (−u is called the additive inverse of u)
6. For every u ∈ V and every scalar c, the scalar multiple cu ∈ V (V is “closed under scalar multiplication”)
7. For every u and v in V and every scalar c, c(u + v) = cu + cv (“scalar multiplication is distributive over vector addition”)
8. For every u ∈ V and scalars c and d, (c + d)u = cu + du
9. For every u ∈ V and scalars c and d, c(du) = (cd)u
10. For every u ∈ V, 1u = u

Sometimes we can take a sub-collection (i.e., a subset) of the vectors in a vector space, and that smaller set itself acts like a vector space. For example, the set of all polynomial functions is a vector space. If we take just the polynomials of degree 2 or less (as in example 1.11.2 above), that subset is itself a vector space. This leads us to introduce the notion of a subspace.

Definition 1.11.2 Given a vector space V, let H be a subset of V (i.e., every object in H is also in V). There are then operations of addition and scalar multiplication on objects in H: specifically, the same addition and scalar multiplication as on the objects in V. We say H is a subspace of V if and only if all three of the following conditions hold:
1. H is closed under addition
2. H is closed under scalar multiplication
3. H contains the zero element of V
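The three conditions of definition 1.11.2 can be spot-checked on a handful of sample vectors. The sketch below is an illustration under stated assumptions: vectors in $\mathbb{R}^2$ are stored as tuples, the function spot_check_subspace and the two test sets are hypothetical and not taken from the text, and a finite check can only disprove the subspace property, never prove it.

```python
def spot_check_subspace(contains, zero, samples, scalars):
    """Spot-check the three subspace conditions on finitely many samples.

    A failed check shows H is not a subspace; passing every check is only
    supporting evidence, not a proof.
    """
    has_zero = contains(zero)
    closed_add = all(
        contains(tuple(a + b for a, b in zip(u, v)))
        for u in samples for v in samples
    )
    closed_scale = all(
        contains(tuple(c * a for a in u))
        for u in samples for c in scalars
    )
    return has_zero, closed_add, closed_scale

# Hypothetical test sets in R^2 (chosen for this sketch only).
def line_through_origin(v):   # H1 = {(x, y) : y = 2x}
    return v[1] == 2 * v[0]

def shifted_line(v):          # H2 = {(x, y) : y = 2x + 1}
    return v[1] == 2 * v[0] + 1

result_h1 = spot_check_subspace(line_through_origin, (0, 0),
                                [(1, 2), (-3, -6)], [0, 2, -1])
result_h2 = spot_check_subspace(shifted_line, (0, 0),
                                [(0, 1), (1, 3)], [0, 2, -1])
```

Here the line through the origin passes all three checks, while the shifted line already fails the zero-element check, which by itself is enough to rule it out as a subspace.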


We close this section with two important examples of subspaces. The first is a subspace of $\mathbb{R}^n$ associated with a given matrix A. The second is a subspace of the set of all continuous functions on [−1, 1].

Example 1.11.4 Recall the matrix A from example 1.10.4 in section 1.10,

$$A = \begin{bmatrix} 5 & 6 & 2 \\ 0 & -1 & -8 \\ 1 & 0 & -2 \end{bmatrix}$$

Show that the set of all eigenvectors that correspond to a given eigenvalue of A forms a subspace of $\mathbb{R}^3$.

Solution. In example 1.10.4, we saw that the eigenvalues of A are λ = −4 (with multiplicity 1) and λ = 3 (with multiplicity 2). In addition, the corresponding eigenvectors are $v = [-2 \;\; \tfrac{8}{3} \;\; 1]^T$ for λ = −4 and $v = [5 \;\; {-2} \;\; 1]^T$ for λ = 3. In particular, recall that every scalar multiple of $v_{\lambda=-4}$ is also an eigenvector of A corresponding to λ = −4. We now show that the set of all these eigenvectors corresponding to λ = −4 is a subspace of $\mathbb{R}^3$.

Let $E_{\lambda=-4}$ denote the set of all vectors v such that Av = −4v. First, certainly it is the case that $A\mathbf{0} = -4 \cdot \mathbf{0}$. This shows that the zero element of $\mathbb{R}^3$ is an element of $E_{\lambda=-4}$. Furthermore, we have already seen that every scalar multiple of an eigenvector is itself an eigenvector, and thus $E_{\lambda=-4}$ is closed under scalar multiplication. Finally, suppose we have two vectors x and y such that Ax = −4x and Ay = −4y. Observe that by properties of linearity,

$$A(x + y) = Ax + Ay = -4x - 4y = -4(x + y)$$

which shows that (x + y) is also an eigenvector of A corresponding to λ = −4. Therefore, $E_{\lambda=-4}$ is closed under addition. This shows that $E_{\lambda=-4}$ is indeed a subspace of $\mathbb{R}^3$. In a similar fashion, $E_{\lambda=3}$ is also a subspace of $\mathbb{R}^3$.

Our observations for the eigenspaces of the 3 × 3 matrix A in example 1.11.4 hold in general for any n × n matrix A: the set of all eigenvectors corresponding to a given eigenvalue of A forms a subspace of $\mathbb{R}^n$.

Example 1.11.5 Show that the set of all linear combinations of the sine and cosine functions is a subspace of the vector space C of all continuous functions.

Solution. We let C denote the vector space of all continuous functions, and now let H be the subset of C which is defined to be all functions that are linear combinations of sin t and cos t. That is, a typical element of H is a function f of the form

$$f(t) = c_1 \sin t + c_2 \cos t$$


where $c_1$ and $c_2$ are any real scalars. We need to show that the set H contains the zero function from C, that H is closed under scalar multiplication, and that H is closed under addition.

First, if we choose $c_1 = c_2 = 0$, the function $z(t) = 0 \sin t + 0 \cos t = 0$ is the function that is identically zero, which is the (continuous) zero function from C. Next, if we take a function from H, say $f(t) = c_1 \sin t + c_2 \cos t$, and multiply it by a scalar k, we get

$$kf(t) = k(c_1 \sin t + c_2 \cos t) = (kc_1) \sin t + (kc_2) \cos t$$

which is of course another element in H, so H is closed under scalar multiplication. Finally, if we consider two elements f and g in H, given by $f(t) = c_1 \sin t + c_2 \cos t$ and $g(t) = d_1 \sin t + d_2 \cos t$, then it follows that

$$f(t) + g(t) = (c_1 \sin t + c_2 \cos t) + (d_1 \sin t + d_2 \cos t) = (c_1 + d_1) \sin t + (c_2 + d_2) \cos t$$

so that H is closed under addition, too. Thus, H is a subspace of C.

In fact, it turns out that the subspace considered in example 1.11.5 contains all of the solutions to a familiar differential equation. We will revisit this issue in example 1.11.7. It is also instructive to consider an example of a set that is not a subspace.

Example 1.11.6 Consider the vector space C[−1, 1] of all continuous functions on the interval [−1, 1]. Let H be the set of all functions with the property that f(−1) = f(1) = 2. Determine whether or not H is a subspace of C[−1, 1].

Solution. The set H does not satisfy any of the three required properties of subspaces, so any one of these suffices to show that H is not a subspace. In particular, the zero function z(t) = 0 does not have the property that z(−1) = 2, and thus the zero function from C[−1, 1] does not lie in H, so H is not a subspace. We could also observe that multiplying a function whose value at t = −1 and t = 1 is 2 by any scalar other than 1 results in a new function whose value at these points is not 2; similarly, the sum of two functions whose values at t = −1 and t = 1 are 2 is a new function whose values at these points are 4. These facts together show that H is not closed under scalar multiplication, nor under addition.

As we have already mentioned, we are considering this generalization of the term vector to include mathematical objects like functions because this structure underlies the study of differential equations, and this vector space perspective will help us to better understand a variety of key ideas when we are solving important problems later on. To foreshadow these coming ideas, we present an example of an elementary differential equation that shows how the set of solutions to the equation is in fact the subspace of continuous functions considered in example 1.11.5.
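The foreshadowed connection can be previewed computationally before it is worked by hand. The sketch below is hedged accordingly: it assumes, as a convention of this illustration only, that a function $c_1 \sin t + c_2 \cos t$ from the subspace of example 1.11.5 is stored as the pair (c1, c2), and it checks that adding such a function to its second derivative gives the zero function. The helper names are hypothetical.

```python
import math

# Assumption of this sketch: y(t) = c1*sin(t) + c2*cos(t) is stored as (c1, c2).

def derivative(y):
    """d/dt [c1 sin t + c2 cos t] = -c2 sin t + c1 cos t."""
    c1, c2 = y
    return (-c2, c1)

def add(y, z):
    """Sum of two functions in the subspace, coefficient by coefficient."""
    return (y[0] + z[0], y[1] + z[1])

def evaluate(y, t):
    """Value of the stored function at the point t."""
    c1, c2 = y
    return c1 * math.sin(t) + c2 * math.cos(t)

y = (3, -2)                             # y(t) = 3 sin t - 2 cos t
y_second = derivative(derivative(y))    # coefficients of the second derivative
residual = add(y_second, y)             # second derivative plus y
```

Nothing about the particular pair (3, −2) is special here; differentiating twice always sends (c1, c2) to (−c1, −c2), so the residual is the zero pair for every choice of coefficients.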


Example 1.11.7 Consider the differential equation

$$y'' + y = 0$$

Show that $y_1 = \sin t$ and $y_2 = \cos t$ are solutions to this differential equation, and that every function of the form $y = c_1 y_1 + c_2 y_2$ is a solution as well.

Solution. This example is very similar to example 1.6.4. Because of its importance, we discuss the current problem in full detail here as well. For any equation, a solution is an object that makes the equation true. In the above differential equation, y represents a function. The equation asks “for which functions y is the sum of y and its second derivative equal to zero?”

Observe first that if we let $y_1 = \sin t$, then $y_1' = \cos t$, so $y_1'' = -\sin t$, and therefore $y_1'' + y_1 = -\sin t + \sin t = 0$. In other words, $y_1$ is a solution to the differential equation. Similarly, for $y_2 = \cos t$, $y_2' = -\sin t$ and $y_2'' = -\cos t$, so that $y_2'' + y_2 = -\cos t + \cos t = 0$. Thus, $y_2$ is also a solution to the differential equation.

Now, consider any function y of the form $y = c_1 y_1 + c_2 y_2$. That is, let y be any linear combination of the two solutions we have already found. We then have

$$y = c_1 \sin t + c_2 \cos t$$

so that, using standard properties of the derivative (properties which are linear in nature), it follows that

$$y' = c_1 \cos t - c_2 \sin t$$

and

$$y'' = -c_1 \sin t - c_2 \cos t$$

We therefore see that

$$y'' + y = (-c_1 \sin t - c_2 \cos t) + (c_1 \sin t + c_2 \cos t) = -c_1 \sin t + c_1 \sin t - c_2 \cos t + c_2 \cos t = 0$$

so that y is indeed also a solution of $y'' + y = 0$.

In example 1.11.7, we find a large number of connections to our work in systems of linear equations and linear algebra: properties of linearity, linear combinations of vectors, homogeneous equations, infinitely many solutions, and more. In particular, the set of all solutions to the differential equation in example 1.11.7 is precisely the subspace of continuous functions examined in example 1.11.5. Certainly, we will revisit these topics in greater detail as we progress in our study of differential equations.

Exercises 1.11

In exercises 1–16, determine whether or not the set H is a subspace of the given vector space V. If H is a subspace, show that it satisfies the


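three required properties stipulated by the definition; if not, show at least one example of why at least one of the properties does not hold.

1. $V = \mathbb{R}^2$, $H = \left\{ \begin{bmatrix} x \\ y \end{bmatrix} : x \ge 0,\ y \ge 0 \right\}$

2. $V = \mathbb{R}^2$, $H = \left\{ \begin{bmatrix} x \\ y \end{bmatrix} : x \cdot y \ge 0 \right\}$

3. $V = \mathbb{R}^3$, $H = \left\{ t \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix} : t \in \mathbb{R} \right\}$

4. $V = \mathbb{R}^3$, $H = \left\{ t \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} : t \in \mathbb{R} \right\}$

5. $V = P_2$, $H = \{ at^2 : a \in \mathbb{R} \}$

6. $V = P_2$, $H = \{ at^2 + 1 : a \in \mathbb{R} \}$

7. $V = \mathbb{R}^2$, $H = \{ x : Ax = b \}$ where $A = \begin{bmatrix} 2 & -1 \\ -6 & 3 \end{bmatrix}$ and $b = \begin{bmatrix} 5 \\ -15 \end{bmatrix}$

8. $V = \mathbb{R}^2$, $H = \{ x : Ax = b \}$ where $A = \begin{bmatrix} 2 & -1 \\ -6 & 3 \end{bmatrix}$ and $b = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$

9. $V = M_{2 \times 2}$, $H = \{ A \in M_{2 \times 2} : A \text{ is invertible} \}$

10. $V = M_{2 \times 2}$, $H = \{ A \in M_{2 \times 2} : A \text{ is not invertible} \}$

11. $V = M_{2 \times 2}$, $H = \left\{ A \in M_{2 \times 2} : A = \begin{bmatrix} a & 0 \\ b & c \end{bmatrix} \right\}$

12. $V = M_{2 \times 2}$, $H = \left\{ A \in M_{2 \times 2} : A = \begin{bmatrix} a & 1 \\ b & c \end{bmatrix} \right\}$

13. V = C[−1, 1], H = {f ∈ C[−1, 1] : f(−1) = 0}

14. V = C[−1, 1], H = {f ∈ C[−1, 1] : f(−1) = 5}

15. V = C[−1, 1], H = {f ∈ C[−1, 1] : f + f = 0}

16. V = C[−1, 1], H = {f ∈ C[−1, 1] : f + f = 1}

17. Recall that for a given eigenvalue λ of a matrix A, the eigenspace associated to that eigenvalue is the set of all eigenvectors that correspond to λ. For the matrix $A = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}$, describe all of the eigenspaces of A.

18. For the matrix $A = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}$, describe all of the eigenspaces of A.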


19. Explain why for any set of vectors {u, v} in $\mathbb{R}^n$, Span{u, v} is a subspace of $\mathbb{R}^n$. Similarly, explain why Span$\{v_1, \ldots, v_k\}$ is a subspace of $\mathbb{R}^n$ for any set $\{v_1, \ldots, v_k\}$.

20. Let $V = \mathbb{R}^3$ and $H = \left\{ \begin{bmatrix} 2a + b \\ a - b \\ 3a + 5b \end{bmatrix} : a, b \in \mathbb{R} \right\}$. Determine vectors u and v so that H can be expressed as the set Span{u, v}, and hence explain why H is a subspace of $\mathbb{R}^3$.

21. Let $V = \mathbb{R}^3$ and $H = \left\{ \begin{bmatrix} 2a + b \\ -2 \\ 3a + 5b \end{bmatrix} : a, b \in \mathbb{R} \right\}$. Explain why H is not a subspace of $\mathbb{R}^3$.

22. Let A be an m × n matrix. The null space of the matrix A, denoted Nul(A), is the set of all solutions to the equation Ax = 0. Explain why Nul(A) is a subspace of $\mathbb{R}^n$.

23. Let A be an m × n matrix. The column space of the matrix A, denoted Col(A), is the set of all linear combinations of the columns of A. Explain why Col(A) is a subspace of $\mathbb{R}^m$.

In exercises 24–27, use the definitions of the null space Nul(A) and column space Col(A) of a matrix given in exercises 22 and 23.

24. Let $A = \begin{bmatrix} 2 & 1 & -1 \\ 1 & 3 & 4 \end{bmatrix}$. Is the vector $v = [-2 \;\; 1 \;\; 1]^T$ in Nul(A)? Justify your answer clearly. In addition, describe all vectors that belong to Nul(A) as the span of a finite set of vectors.

25. Let $A = \begin{bmatrix} 1 & -2 \\ 3 & 1 \\ -4 & 0 \end{bmatrix}$. Is the vector $v = [-2 \;\; 1 \;\; 1]^T$ in Col(A)? Justify your answer. Is the vector $u = [-1 \;\; 4 \;\; {-4}]^T$ in Col(A)? In addition, describe all vectors that belong to Col(A) as the span of a finite set of vectors.

26. Given a matrix A and a vector v, is it easier to determine whether v lies in Nul(A) or Col(A)? Why?

27. Given a matrix A and a vector v, is it easier to describe Nul(A) or Col(A) as the span of a finite set of vectors? Why?

28. Consider the differential equation $y' = 3y$. Explain why any function of the form $y = Ce^{3t}$ is a solution to this equation. Is the set of all these solutions a subspace of the vector space of continuous functions?

29. Consider the differential equation $y' = 3y - 3$. Explain why any function of the form $y = Ce^{3t} + 1$ is a solution to this equation. Is the set of all these solutions a subspace of the vector space of continuous functions?


30. Decide whether each of the following sentences is true or false. In every case, write one sentence to support your answer.
(a) If H is a subspace of a vector space V, then H is itself a vector space.
(b) If H is a subset of a vector space V, then H is a subspace of V.
(c) The set of all linear combinations of any two vectors in $\mathbb{R}^3$ is a subspace of $\mathbb{R}^3$.
(d) Every nontrivial subspace of a vector space has infinitely many elements.

1.12 Bases and dimension in vector spaces

In section 1.11, we saw that some common sets we encounter in mathematics are very similar to $\mathbb{R}^n$. For instance, the set $M_{2 \times 2}$ of all 2 × 2 matrices, the set $P_2$ of all polynomials of degree 2 or less, and the set C[−1, 1] of all continuous functions on [−1, 1] are sets that contain a zero element, are closed under addition, and are closed under scalar multiplication. In addition, because they each satisfy the other required seven characteristics we noted, these sets are all vector spaces. We specifically observe that this enables us to take linear combinations of elements of a vector space, because addition and scalar multiplication are defined and closed in these collections of objects.

Every vector space has further characteristics that are similar to $\mathbb{R}^n$. For example, it is natural to discuss now-familiar concepts such as linear independence and span in the context of the more generalized notion of vector. As we will see, the definitions of these terms in the setting of vector spaces are almost identical to those we encountered earlier in $\mathbb{R}^n$. Moreover, just as we can frequently describe sets in $\mathbb{R}^n$ in terms of a small number of special vectors, we will find that this often occurs in general vector spaces. We begin by updating two key definitions.

Definition 1.12.1 In a vector space V, given a set $S = \{v_1, \ldots, v_k\}$ where each vector $v_i \in V$, the set S is linearly dependent if there exists a nontrivial solution to the vector equation

$$x_1 v_1 + x_2 v_2 + \cdots + x_k v_k = 0 \tag{1.12.1}$$

If (1.12.1) has only the trivial solution ($x_1 = \cdots = x_k = 0$), then we say the set S is linearly independent.

The only difference between this definition and definition 1.6.1 that we encountered in section 1.6 is that $\mathbb{R}^n$ has been replaced by V. Just as with vectors in $\mathbb{R}^n$, it is an equivalent formulation to say that a set S in a vector space V is linearly independent if and only if no vector in the set may be written as a linear combination of the other vectors in the set. We can also define the span of a set of vectors in a vector space V.
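For vectors in $\mathbb{R}^n$ specifically, the test in definition 1.12.1 reduces to row reduction: equation (1.12.1) has only the trivial solution exactly when the matrix built from the vectors has rank k. The sketch below illustrates this under stated assumptions; the functions rank and is_linearly_independent are hypothetical helpers written for this illustration, exact fractions are used to avoid rounding, and the sample vectors are not taken from the text.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    r = 0  # index of the next pivot row
    for col in range(n_cols):
        # Find a row at or below r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate the entries below the pivot.
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def is_linearly_independent(vectors):
    """x1*v1 + ... + xk*vk = 0 has only the trivial solution iff rank == k."""
    return rank(vectors) == len(vectors)

# Hypothetical sample sets in R^3.
independent_set = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
dependent_set = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (4, -3, 0)]

first_result = is_linearly_independent(independent_set)
second_result = is_linearly_independent(dependent_set)
```

The vectors are entered as rows rather than columns, which is harmless here because a matrix and its transpose have the same rank. In the second set the last vector equals 4 times the first minus 3 times the second, so the rank stays at 3 while the set has 4 elements.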


Definition 1.12.2 In a vector space V, given a set of vectors $S = \{v_1, \ldots, v_k\}$, $v_i \in V$, the span of S, denoted Span(S) or Span$\{v_1, \ldots, v_k\}$, is the set of all linear combinations of the vectors $v_1, \ldots, v_k$. Equivalently, Span(S) is the set of all vectors y of the form $y = c_1 v_1 + \cdots + c_k v_k$, where $c_1, \ldots, c_k$ are scalars. We also say that Span(S) is the subset of V spanned by the vectors $v_1, \ldots, v_k$.

In example 1.6.3 in section 1.6, we studied three sets R, S, and T in $\mathbb{R}^3$. R contained two vectors and was linearly independent but did not span $\mathbb{R}^3$; S contained three vectors, was linearly independent, and spanned $\mathbb{R}^3$; and T consisted of four vectors, was linearly dependent, and spanned $\mathbb{R}^3$. In that setting, we came to see that the set S was in some ways the best of the three: it had both key properties of being linearly independent and a spanning set. In other words, the set had enough vectors to span $\mathbb{R}^3$, but not so many vectors as to generate redundancy by being linearly dependent. Through the next definition, we will now call such a set a basis, even in the generalized setting of vector spaces and subspaces.

Definition 1.12.3 Let V be a vector space and H a subspace of V. A set $B = \{v_1, v_2, \ldots, v_k\}$ of vectors in H is called a basis of H if and only if B is linearly independent and Span(B) = H. That is, B is a basis of H if and only if it is a linearly independent spanning set.

Several examples now follow that use the terminology of linear independence, span, and basis in the context of different vector spaces.

Example 1.12.1 In the vector space P of all polynomials, consider the subspace $H = P_2$ of all polynomials of degree 2 or less. Show that the set $B = \{1, t, t^2\}$ is a basis for H. Is the set $\{1, t, t^2, 4 - 3t\}$ also a basis for H?

Solution. To begin, we observe that every element of $H = P_2$ is a polynomial function of the form $p(t) = a_0 + a_1 t + a_2 t^2$. In particular, every element of $P_2$ is a linear combination of the functions 1, t, and $t^2$, and therefore the set $B = \{1, t, t^2\}$ spans H. In addition, to determine whether the set B is linearly independent, we consider the equation

$$c_0 + c_1 t + c_2 t^2 = 0 \tag{1.12.2}$$

and ask whether or not this equation has a nontrivial solution. Keeping in mind that the ‘0’ on the right-hand side represents the zero function in $P_2$, the function that is everywhere equal to zero, we can see that if at least one of $c_0$, $c_1$, or $c_2$ is nonzero, we will be guaranteed to have either a nonzero constant function, a linear function, or a quadratic function, thus making $c_0 + c_1 t + c_2 t^2$


not identically zero. This shows that (1.12.2) has only the trivial solution, and therefore the set $B = \{1, t, t^2\}$ is linearly independent. Having shown that B is a linearly independent spanning set for $H = P_2$, we can conclude that B is a basis for H.

On the other hand, the set $\{1, t, t^2, 4 - 3t\}$ is not a basis for H since we can observe that the element 4 − 3t is a linear combination of the elements 1 and t: $4 - 3t = 4 \cdot 1 - 3 \cdot t$. This shows that the set $\{1, t, t^2, 4 - 3t\}$ is linearly dependent and thus cannot be a basis.

Example 1.12.2 Consider the set H of all functions of the form $y = c_1 \sin t + c_2 \cos t$. In the vector space C of all continuous functions, explain why the set B = {sin t, cos t} is a basis for the subspace H.

Solution. First, we recall that H is indeed a subspace of C due to our work in example 1.11.5. By the definition of H (the set of all functions of the form $y = c_1 \sin t + c_2 \cos t$), we see immediately that B is a spanning set for H. In addition, it is clear that the functions sin t and cos t are not scalar multiples of one another: any scalar multiple of sin t is simply a vertical stretch of the function, which cannot result in cos t. This tells us that the set B = {sin t, cos t} is also linearly independent, and therefore is a basis for H.

Example 1.12.3 In $\mathbb{R}^3$, consider the set $B = \{e_1, e_2, e_3\}$, where $e_1 = [1 \;\; 0 \;\; 0]^T$, $e_2 = [0 \;\; 1 \;\; 0]^T$, and $e_3 = [0 \;\; 0 \;\; 1]^T$. Explain why B is a basis for $\mathbb{R}^3$. Is the set $S = \{v_1, v_2, v_3\}$, where $v_1 = [1 \;\; 2 \;\; {-1}]^T$, $v_2 = [-1 \;\; 1 \;\; 3]^T$, and $v_3 = [0 \;\; 3 \;\; 1]^T$, also a basis for $\mathbb{R}^3$?

Solution. First, we observe that while the formal definition of a basis refers to the basis of a subspace H of a vector space V, since every vector space is a subspace of itself, it follows that we can also discuss a basis for a vector space. Considering the set $B = \{e_1, e_2, e_3\}$, we observe that the vectors in this set are the columns of the 3 × 3 identity matrix. By the Invertible Matrix Theorem, it follows that the set B is linearly independent because $I_3$ has a pivot in every column. Likewise, the set B spans $\mathbb{R}^3$ since $I_3$ has a pivot in every row. As a linearly independent spanning set in $\mathbb{R}^3$, B is indeed a basis.

For the set S whose elements are the columns of the matrix

$$A = \begin{bmatrix} 1 & -1 & 0 \\ 2 & 1 & 3 \\ -1 & 3 & 1 \end{bmatrix}$$

we again use the Invertible Matrix Theorem to determine whether or not S is a basis for $\mathbb{R}^3$. Row-reducing A, it is straightforward to see that A is row equivalent to the identity matrix, and therefore is invertible. In particular, A has a pivot in


every column and every row, and thus the columns of A are linearly independent and span $\mathbb{R}^3$. It follows that S is also a basis for $\mathbb{R}^3$.

The basis $B = \{e_1, e_2, e_3\}$ consisting of the columns of the 3 × 3 identity matrix is often referred to as the “standard basis of $\mathbb{R}^3$.” In addition, by our work in example 1.12.3, we can see the role that the Invertible Matrix Theorem plays in determining whether a set of vectors in $\mathbb{R}^n$ is a basis or not. Specifically, since we know that it is logically equivalent for the columns of a square matrix A to be linearly independent and to be a spanning set for $\mathbb{R}^n$, it follows that a matrix A is invertible if and only if its columns form a basis for $\mathbb{R}^n$. We therefore update the Invertible Matrix Theorem with an additional statement as follows.

Theorem 1.12.1 (Invertible Matrix Theorem) Let A be an n × n matrix. The following statements are equivalent:
a. A is invertible.
b. The columns of A are linearly independent.
c. The columns of A span $\mathbb{R}^n$.
d. A has a pivot position in every column.
e. A has a pivot position in every row.
f. A is row equivalent to $I_n$.
g. For each $b \in \mathbb{R}^n$, the equation Ax = b has a unique solution.
h. det(A) ≠ 0.
i. The columns of A form a basis for $\mathbb{R}^n$.

Our next example demonstrates how certain families of vectors naturally form subspaces of $\mathbb{R}^n$ and how vector arithmetic can be used to determine a basis for the subspace they form.

Example 1.12.4 Consider the set W of all vectors of the form

$$\begin{bmatrix} 3a + b - c \\ 4a - 5b + c \\ a + 2b - 3c \\ a - b - c \end{bmatrix}$$

Show that W is a subspace of $\mathbb{R}^4$ and determine a basis for this subspace.

Solution. First, we observe that a typical element v of W is a vector of the form

$$v = \begin{bmatrix} 3a + b - c \\ 4a - 5b + c \\ a + 2b - 3c \\ a - b - c \end{bmatrix}$$

Using properties of vector addition and scalar multiplication, we can write

$$v = a \begin{bmatrix} 3 \\ 4 \\ 1 \\ 1 \end{bmatrix} + b \begin{bmatrix} 1 \\ -5 \\ 2 \\ -1 \end{bmatrix} + c \begin{bmatrix} -1 \\ 1 \\ -3 \\ -1 \end{bmatrix}$$

From this, we observe that W may be viewed as the span of the set $S = \{w_1, w_2, w_3\}$, where

$$w_1 = \begin{bmatrix} 3 \\ 4 \\ 1 \\ 1 \end{bmatrix}, \quad w_2 = \begin{bmatrix} 1 \\ -5 \\ 2 \\ -1 \end{bmatrix}, \quad w_3 = \begin{bmatrix} -1 \\ 1 \\ -3 \\ -1 \end{bmatrix}$$

As seen in exercise 19 in section 1.11, the span of any set of vectors in $\mathbb{R}^n$ generates a subspace of $\mathbb{R}^n$; it follows that W is a subspace of $\mathbb{R}^4$. Moreover, we can observe that $S = \{w_1, w_2, w_3\}$ is a linearly independent set since

$$\begin{bmatrix} 3 & 1 & -1 \\ 4 & -5 & 1 \\ 1 & 2 & -3 \\ 1 & -1 & -1 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$$

Since S both spans the subspace W and is linearly independent, it follows that S is a basis for W.

In example 1.12.4 we used the fact that the span of any set in $\mathbb{R}^n$ is a subspace of $\mathbb{R}^n$. This result extends to general vector spaces and is stated formally in the following theorem.

Theorem 1.12.2 In any vector space V, the span of any set of vectors forms a subspace of V.

It is not hard to prove this result. Since the span of a set contains all linear combinations of the set, it must contain the zero combination and be closed under both vector addition and scalar multiplication. One of the reasons that a basis for a subspace is important is that a basis tells us the minimum number of vectors needed to fully describe every element of the subspace. More speciﬁcally, given a basis B for a subspace W , we know that we can write every element of W uniquely as a linear combination of the elements in the basis. Note that a subspace does not have a unique basis; for example, in example 1.12.3, we saw two different bases for R3 . Furthermore, in R3 we have seen that the standard basis (and one example of another basis) has three elements. By the Invertible Matrix Theorem, it is clear that every basis of R3 consists of three vectors since we are required to have a set that is both linearly independent and spans R3 . Likewise, any basis of Rn will have n elements. It can be shown that even in vector spaces


other than $\mathbb{R}^n$, any two bases of a subspace are guaranteed to have the same number of elements. Therefore, this number of elements in a basis can be used to identify a fundamental property of any subspace: the minimum number of elements needed to describe all of the elements in the space. We call this number the dimension of the subspace.

Definition 1.12.4 Given a subspace W in a vector space V and a basis B for W, the number of elements in B is the dimension of W. Equivalently, if B has k elements, we write dim(W) = k.

Thus we naturally use the language that “$\mathbb{R}^3$ is three-dimensional” and similarly that “$\mathbb{R}^n$ has dimension n.” Similarly, we can say $\dim(P_2) = 3$ (see example 1.12.1), and that the dimension of the vector space of all linear combinations of the functions sin t and cos t is two (see example 1.12.2).

In closing, it is worth recalling example 1.6.3 in section 1.6, where we considered three sets R, S, and T in $\mathbb{R}^3$. R contained two vectors and was linearly independent but did not span $\mathbb{R}^3$; S contained three vectors, was linearly independent, and spanned $\mathbb{R}^3$; and T consisted of four vectors, was linearly dependent, and spanned $\mathbb{R}^3$. Since the set S has both key properties of being linearly independent and a spanning set, we can say that the set S is a basis for $\mathbb{R}^3$, which further reflects the fact that $\dim(\mathbb{R}^3) = 3$.

Exercises 1.12

In the vector space V given in each of exercises 1–7, determine a basis for the subspace H and hence state the dimension of H.
1. $V = \mathbb{R}^3$, $H = \left\{ t \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix} : t \in \mathbb{R} \right\}$
2. $V = P_2$, $H = \{ at^2 : a \in \mathbb{R} \}$
3. $V = \mathbb{R}^4$, $H = \left\{ \begin{bmatrix} 2a + 3b \\ a - 4b \\ -3a + 2b \\ a - b \end{bmatrix} : a, b \in \mathbb{R} \right\}$
4. $V = P$ (the vector space of all polynomials), $H = P_n$ (the subspace of all polynomials of degree n or less)
5. $V = \mathbb{R}^2$, $H = \{ x : Ax = 0 \}$ where $A = \begin{bmatrix} 2 & -1 \\ -6 & 3 \end{bmatrix}$
6. $V = \mathbb{R}^4$, $H = \{ x : Ax = 0 \}$ where $A = \begin{bmatrix} 1 & -3 & 2 & -1 \\ -2 & 5 & 0 & 4 \end{bmatrix}$
7. $V = M_{2 \times 2}$, $H = \left\{ A \in M_{2 \times 2} : A = \begin{bmatrix} a & 0 \\ b & c \end{bmatrix} \right\}$


8. Determine whether or not the following set S is a basis for $\mathbb{R}^3$. If not, is some subset of S a basis for $\mathbb{R}^3$? Explain.
\[ S = \left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix} \right\} \]
9. Is the set $S = \{[1 \;\; 2]^T, [2 \;\; 1]^T\}$ a basis for $\mathbb{R}^2$? Justify your answer.
10. Is the set $S = \{[1 \;\; 2]^T, [-4 \;\; -8]^T\}$ a basis for $\mathbb{R}^2$? Justify your answer.
11. Is the set $S = \{[1 \;\; 2 \;\; 1 \;\; 1]^T, [2 \;\; 1 \;\; 1 \;\; -1]^T, [-1 \;\; 1 \;\; 3 \;\; 1]^T, [2 \;\; 4 \;\; 5 \;\; 1]^T\}$ a basis for $\mathbb{R}^4$? Justify your answer.
12. Is the set $S = \{[1 \;\; 2 \;\; 1 \;\; 1]^T, [2 \;\; 1 \;\; 1 \;\; -1]^T, [-1 \;\; 1 \;\; 3 \;\; 1]^T, [2 \;\; 4 \;\; 5 \;\; 0]^T\}$ a basis for $\mathbb{R}^4$? Justify your answer.
13. Can a set with three vectors be a basis for $\mathbb{R}^4$? Why or why not?
14. Can a set with seven vectors be a basis for $\mathbb{R}^6$? Why or why not?
15. Not every vector space has a basis with finitely many elements. If there is not a finite basis, then we say that the vector space is infinite dimensional. Explain why the vector space P of all polynomial functions is an infinite dimensional vector space.
16. Let V be the vector space $V = C[-1, 1]$ and H the subset defined by
\[ H = \{ f \in C[-1, 1] : f \text{ is differentiable} \} \]
Explain why H is an infinite dimensional subspace of V and why we cannot explicitly write down the elements in a basis for H.
17. Recall from exercises 22 and 23 in section 1.11 that the null space of a matrix A is the subspace of all solutions to the equation Ax = 0 and that the column space of A is the space spanned by the columns of A. By exploring several different examples of matrices A of your choice, discuss how the dimensions of the null and column spaces are related to the number of pivot columns in the matrix. In particular, explain what you can say about the relationship between the sum of the dimensions of the null and column spaces and the number of columns in the matrix A.
18. Decide whether each of the following sentences is true or false. In every case, write one sentence to support your answer.
(a) Any set of five vectors is a basis for $\mathbb{R}^5$.
(b) If S is a linearly independent set of six vectors in $\mathbb{R}^6$, then S is a basis for $\mathbb{R}^6$.
(c) If the determinant of a 3 × 3 matrix A is zero, then the columns of A form a basis for $\mathbb{R}^3$.
(d) If A is an n × n matrix whose columns span $\mathbb{R}^n$, then the columns of A form a basis for $\mathbb{R}^n$.
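The pivot-counting test used throughout this section (a set of vectors is linearly independent exactly when the matrix having those vectors as columns has a pivot in every column) can also be sketched in code. The following is a minimal Python sketch using only the standard library, not a method from the text; the function names `rank`, `is_basis_for_span`, and `is_basis_for_Rn` are illustrative assumptions, and the sample vectors are $w_1, w_2, w_3$ from example 1.12.4.

```python
from fractions import Fraction

def rank(columns):
    """Number of pivot columns of the matrix whose columns are the given vectors."""
    # Store the matrix by rows, using exact rational arithmetic.
    rows = [[Fraction(col[i]) for col in columns] for i in range(len(columns[0]))]
    pivots = 0
    for c in range(len(columns)):
        # Find a row at or below the current pivot row with a nonzero entry in column c.
        pivot_row = next((r for r in range(pivots, len(rows)) if rows[r][c] != 0), None)
        if pivot_row is None:
            continue  # no pivot in this column
        rows[pivots], rows[pivot_row] = rows[pivot_row], rows[pivots]
        # Eliminate the entries below the pivot.
        for r in range(pivots + 1, len(rows)):
            factor = rows[r][c] / rows[pivots][c]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[pivots])]
        pivots += 1
    return pivots

def is_basis_for_span(vectors):
    """A set is a basis for its own span exactly when it is linearly independent."""
    return rank(vectors) == len(vectors)

def is_basis_for_Rn(vectors):
    """A basis for R^n needs exactly n vectors and a pivot in every column."""
    n = len(vectors[0])
    return len(vectors) == n and rank(vectors) == n

# The vectors w1, w2, w3 from example 1.12.4.
w1, w2, w3 = [3, 4, 1, 1], [1, -5, 2, -1], [-1, 1, -3, -1]
print(rank([w1, w2, w3]))               # number of pivot columns
print(is_basis_for_span([w1, w2, w3]))  # basis for W = span{w1, w2, w3}?
print(is_basis_for_Rn([w1, w2, w3]))    # three vectors cannot be a basis for R^4
```

Exact fractions are used so that the pivot test is not affected by floating-point rounding; for large matrices a numerical library would be the more usual choice.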


1.13 For further study

1.13.1 Computer graphics: geometry and linear algebra at work

In modern computer graphics, images consisting of sets of pixels are moved around the screen through mathematical computations that rely on linear algebra. If we focus on two-dimensional objects, there are several basic moves that we must be able to perform: translation, rotation, reflection, and dilation. In what follows, we explore the role that linear algebra plays in the geometry of linear transformations and computer graphics.

(a) In section 1.8.1 we began to develop an understanding of how matrix multiplication can be used to move a two-dimensional image around the plane. If you have not already read this section, do so now. If we take the perspective that a given point in the plane is stored in the vector v, then for any 2 × 2 matrix A, the matrix A moves the vector via multiplication to the new location Av. If we have a finite set of points (which together constitute an image), we can store the points in a matrix M whose columns represent the individual points, and the new image which results from multiplication by A is given by AM. Consider the triangle with vertices (0, 0), (3, 1), and (2, 2), stored in the matrix
\[ M = \begin{bmatrix} 0 & 3 & 2 \\ 0 & 1 & 2 \end{bmatrix} \]
Choose three different matrices A and compute AM. Then explain why it is impossible to use multiplication by a 2 × 2 matrix to translate the triangle so that all three of its vertices appear in new locations.

(b) Due to our discovery in (a) that a simple translation is impossible using 2 × 2 matrices, we introduce the notion of homogeneous coordinates; instead of representing points in the two-dimensional plane as $[x \;\; y]^T$, we move to a plane in three-dimensional space where the third coordinate is always 1. That is, instead of $[x \;\; y]^T$ we use $[x \;\; y \;\; 1]^T$. Consider the matrix A given by
\[ A = \begin{bmatrix} 1 & 0 & a \\ 0 & 1 & b \\ 0 & 0 & 1 \end{bmatrix} \tag{1.13.1} \]
and the triangle from (a) which can be represented in homogeneous coordinates by the matrix
\[ M = \begin{bmatrix} 0 & 3 & 2 \\ 0 & 1 & 2 \\ 1 & 1 & 1 \end{bmatrix} \]


Compute AM. What has happened to each vertex of the triangle represented by M? Explain in terms of the parameters a and b in A.

(c) Using a = 2 and b = −1 in (1.13.1) along with the triangle M from above, compute AM in order to determine the translation of the triangle 2 units in the x-direction and −1 units in the y-direction. Sketch both the original triangle and its image under this translation.

(d) In order to view some more sophisticated graphics, we use Maple in our computations that follow. Rather than performing operations on a triangle, we will use the syntax
> with(plots): with(LinearAlgebra):
> setoptions(scaling=constrained, axes=boxed, tickmarks=[5,5]):
> X := cos(t)*(1+sin(t))*(1+0.3*cos(8*t))*(1+0.1*cos(24*t)):
> Y := sin(t)*(1+sin(t))*(1+0.3*cos(8*t))*(1+0.1*cos(24*t)):
> plot([X,Y,t=0..2*Pi], color=blue, thickness=3);
which generates a parametric curve whose plot is the leaf shown in figure 1.18. Input these commands in Maple, as well as the syntax
> leaf := plot([X,Y,t=0..2*Pi], color=grey, thickness=1):
to store the image of the original leaf in leaf.

Figure 1.18 A Maple leaf.


Finally, for a given matrix A of the form
\[ A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ 0 & 0 & 1 \end{bmatrix} \]
and a vector $Z = [X \;\; Y \;\; 1]^T$, compute AZ (by hand) to show how AZ depends on the entries in A.

(e) By our work in (c) and (d), if we now let
\[ A = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix} \]
the product AZ should result in translation of the leaf by the vector $[2 \;\; -1]^T$. To test this, we define the matrix A in Maple by
> A := Matrix([[1,0,2],[0,1,-1],[0,0,1]]);
and compute the coordinates in the new image by
> Xnew := A[1,1]*X + A[1,2]*Y + A[1,3]*1:
> Ynew := A[2,1]*X + A[2,2]*Y + A[2,3]*1:
> image1 := plot([Xnew,Ynew,t=0..2*Pi], thickness=3, color=blue):
The last command above plots the resulting image and stores it in image1. Display both the original leaf and the new image with the command
> display(leaf, image1);

and show that this results indeed in the translated leaf as shown in figure 1.19.

Figure 1.19 The original leaf and its translation by $[2 \;\; -1]^T$.

(f) In section 1.8.1, we learned that a matrix of the form
\[ R = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \]
is known as a rotation matrix and, through multiplication, rotates any vector by θ radians counterclockwise about the origin. To work with a rotation matrix in homogeneous coordinates, we update the matrix as follows:
\[ R = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \]
Let us say that we wanted to perform two operations on the leaf. First, we wish to translate the leaf as above along the vector $[2 \;\; -1]^T$, and then we want to rotate the resulting image π/4 radians clockwise about the origin. We can accomplish this through two matrices by computing their product, as the following discussion shows. From (e), we know that using the matrix
> Translation := Matrix([[1,0,2],[0,1,-1],[0,0,1]]);
leads to the desired translation. Likewise, the matrix
> Rotation := Matrix([[cos(-Pi/4),-sin(-Pi/4),0],[sin(-Pi/4),cos(-Pi/4),0],[0,0,1]]);
will produce the sought rotation. Explain why the matrix
> A := Rotation.Translation;
will produce the combined translation and rotation, and plot the resulting figure by updating your computations for Xnew and Ynew and using the syntax
> image2 := plot([Xnew,Ynew,t=0..2*Pi], thickness=4, color=black):
> display(leaf, image1, image2);
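The order of the factors in a product of homogeneous-coordinate matrices can be explored numerically outside Maple as well. The following is a minimal Python sketch under stated assumptions: the helper names `matmul`, `translation`, `rotation`, and `apply` are illustrative, not from the text, and the sample point (1, 0) is hypothetical. It applies the translation by $[2 \;\; -1]^T$ and the clockwise rotation by π/4 one after the other, and compares the result with the single product matrix in each order.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def translation(a, b):
    """Homogeneous-coordinate matrix of the form (1.13.1)."""
    return [[1, 0, a], [0, 1, b], [0, 0, 1]]

def rotation(theta):
    """Homogeneous-coordinate rotation by theta radians counterclockwise."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def apply(A, point):
    """Apply A to the point (x, y) written as the column [x, y, 1]^T."""
    x, y = point
    col = matmul(A, [[x], [y], [1]])
    return (col[0][0], col[1][0])

T = translation(2, -1)
R = rotation(-math.pi / 4)   # a negative angle gives a clockwise rotation
RT = matmul(R, T)            # acts on a column vector as: translate, then rotate
TR = matmul(T, R)            # acts on a column vector as: rotate, then translate

p = (1.0, 0.0)               # hypothetical sample point
step_by_step = apply(R, apply(T, p))
combined = apply(RT, p)
other_order = apply(TR, p)
print(step_by_step, combined, other_order)
```

In this sketch `combined` agrees with `step_by_step` while `other_order` does not, which illustrates that the matrix written on the right is the one applied first.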

(g) What is the result of applying the matrix
\[ A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 \\ 0 & 0 & 1 \end{bmatrix} \]
to the leaf? What kind of geometric transformation is performed by this matrix? What matrix would keep the height of the leaf constant but stretch its width by a factor of 2?

(h) It can be shown that to reflect an image across a line through the origin that forms an angle α with the positive x-axis, the necessary matrix is
\[ A = \begin{bmatrix} \cos 2\alpha & \sin 2\alpha & 0 \\ \sin 2\alpha & -\cos 2\alpha & 0 \\ 0 & 0 & 1 \end{bmatrix} \]
By finding the appropriate value of α, find the matrix that will reflect an image across the line y = x and compute and plot the image of the original leaf under this reflection.

(i) Exercises for further practice and investigation:
1. Find the image of the original leaf under rotation about the origin by 2π/3 radians, followed by a reflection across the y-axis.
2. Find the image of the original leaf under rotation about the point (−3, 1) by −π/6 radians. (Hint: To rotate about a point other than the origin, first translate that point to the origin, then rotate, then translate back.)
3. Find the image of the original leaf under translation along the vector $[3 \;\; 2]^T$, followed by reflection across the line y = x/2.

1.13.2 Bézier curves

In what follows$^7$, we explore the use of a specific type of parametric curve, called Bézier curves (pronounced “bezzy-eh”), which have a variety of important applications. These curves were originally developed by two automobile engineers in France in the 1960s, P. Bézier and P. de Casteljau, who were working to develop mathematical formulas to graph the smooth, wiggle-free curves that formed the shape of a car’s body. Today, Bézier curves find their way into our lives every day: they are used to create the letters that appear in typeset fonts. The principles that govern these curves involve fundamental mathematics from linear algebra and calculus.

$^7$ The material in this project has been adapted from Steven Janke’s chapter “Designer Curves” in Applications of Calculus, MAA Notes Number 29, Philip Straffin, Ed.

(a) In calculus, we study parametric curves given in the form x = f(t), y = g(t), where f and g are each functions of the parameter t. Another way to denote this situation is to write
\[ P(t) = (f(t), g(t)) \]
where t belongs to some interval of real numbers. Note that P(t) is essentially a vector; the graph of P(t) is the parametric curve traced out by the vector over time. It will be most convenient if we simply write this as
\[ P(t) = (x(t), y(t)) \]
in what follows. In this problem we begin to consider some special formulas for x(t) and y(t). To parameterize the line segment between the points $P_0(1, 3)$ and $P_1(3, 7)$, we can think about wanting to make x go from 1 to 3, and y go from 3 to 7. Indeed, we want these to occur simultaneously as t goes from 0 to 1. Consider the parameterization:
\[ x = x(t) = 1 + t(3 - 1) = t \cdot 3 + (1 - t) \cdot 1 \]
\[ y = y(t) = 3 + t(7 - 3) = t \cdot 7 + (1 - t) \cdot 3 \]
\[ 0 \le t \le 1 \]
Observe that when t = 0, x = 1 and y = 3, and when t = 1, x = 3 and y = 7. Show that the curve parameterized by these two equations is indeed the line segment between $P_0$ and $P_1$. For instance, you might use algebra to eliminate the variable t, thereby deducing a relationship between x and y.

(b) We can think about the equations for x and y in (a) in a more compact manner. Consider the following vector notation to replace the previous equations:
\[ P(t) = \begin{bmatrix} x(t) \\ y(t) \end{bmatrix} = t \begin{bmatrix} 3 \\ 7 \end{bmatrix} + (1 - t) \begin{bmatrix} 1 \\ 3 \end{bmatrix} \tag{1.13.2} \]
This is sometimes referred to as taking a convex combination of the points (1, 3) and (3, 7), because t and 1 − t are both nonnegative and sum to 1. Using the above style, write the parametric equations for the line segment that passes between the general points $P_0(x_0, y_0)$ and $P_1(x_1, y_1)$.

(c) An even more concise notation is to simply write $P(t) = (1 - t)P_0 + tP_1$. We will now use this notation to combine two or more of these parameterizations for line segments in a way that constructs curves that can be “controlled” in very interesting ways. Consider three points, labeled $P_0$, $P_1$, and $P_2$. In the most recent form of P(t) given above at (1.13.2), write parameterizations for the two line segments from $P_0$ to $P_1$ and from $P_1$ to $P_2$, as pictured below. Call the first parameterization $P^{(1)}(t)$ and the second parameterization $P^{(2)}(t)$. In addition, determine the parameterizations $P^{(1)}(t)$ and $P^{(2)}(t)$ for the specific set of points $P_0(2, 3)$, $P_1(4, 7)$, and $P_2(7, 1)$. Show your work, and write each out in the expanded form where you have an expression for x(t) and another for y(t).

(d) From the two line-segment parameterizations in (c), we will now create a new parametric plot by taking similar combinations of $P^{(1)}(t)$ and $P^{(2)}(t)$.

Figure 1.20 The line segments from $P_0$ to $P_1$ and $P_1$ to $P_2$.

Consider the function Q(t) defined as follows:
\[ Q(t) = (1 - t) \cdot P^{(1)}(t) + t \cdot P^{(2)}(t) \tag{1.13.3} \]
First, substitute in (1.13.3) your expressions for $P^{(1)}(t)$ and $P^{(2)}(t)$ from (c) that involve the general points $P_0$, $P_1$, and $P_2$. Simplify the result as much as possible in order to write the formula for Q in the following form:
\[ Q(t) = a_0(t)P_0 + a_1(t)P_1 + a_2(t)P_2 \]
where $a_0(t)$, $a_1(t)$, and $a_2(t)$ are polynomial functions of t. Then, using the specific parameterizations for $P^{(1)}(t)$ and $P^{(2)}(t)$ for the points $P_0(2, 3)$, $P_1(4, 7)$, and $P_2(7, 1)$, determine the parametric equations for x(t) and y(t) that make up the function Q(t). For each of these three parameterizations ($P^{(1)}$, $P^{(2)}$, and Q), use Maple to sketch a plot$^8$ and describe the results in detail. For example, how does Q(t) look in comparison to the two line segments? What kind of functions make up the components x(t) and y(t) in Q? What is true about Q(0) relative to the points $P_0$, $P_1$, and $P_2$? Q(1)? What direction is a particle moving along Q(t) headed as t starts out away from 0? As t gets near to 1?

$^8$ The Maple syntax to plot a parametric curve (f(t), g(t)) on the interval [a, b] is
> plot([f(t),g(t),t=a..b]);

(e) It turns out that we will have even more freedom and control in drawing curves if we start with four control points, $P_0$, $P_1$, $P_2$, and $P_3$. The development here is similar to what was done above, just using a greater number of points. First, parameterize the segments from $P_0$ to $P_1$ (with $P^{(1)}(t)$), $P_1$ to $P_2$ (with $P^{(2)}(t)$), and from $P_2$ to $P_3$ (with $P^{(3)}(t)$). The usual formulas apply here; write down the basic form of each $P^{(j)}(t)$, j = 1, 2, 3, in terms of the various points $P_i$. Then combine, as in (d) above, the parameterizations for the first two segments to get a new function $Q^{(1)}$; also combine the parameterizations for the second two segments to get $Q^{(2)}$. These Q parameterizations are written as
\[ Q^{(1)}(t) = (1 - t) \cdot P^{(1)}(t) + t \cdot P^{(2)}(t) \]
\[ Q^{(2)}(t) = (1 - t) \cdot P^{(2)}(t) + t \cdot P^{(3)}(t) \]
Finally, combine $Q^{(1)}$ and $Q^{(2)}$ to get a new parametric function that we call B(t) according to the natural formula
\[ B(t) = (1 - t) \cdot Q^{(1)}(t) + t \cdot Q^{(2)}(t) \]
By substituting appropriately for $Q^{(1)}(t)$ and $Q^{(2)}(t)$ and then replacing these with the appropriate $P^{(j)}(t)$ functions, show that
\[ B(t) = P_0 (1 - t)^3 + 3P_1 t(1 - t)^2 + 3P_2 t^2 (1 - t) + P_3 t^3 \]
B(t) is called a cubic Bézier curve. By finding and using appropriate t values, show that the points $P_0$ and $P_3$ both lie on the curve given by B(t).

(f) Write the formulas for x(t) and y(t) that give the parameterizations for the cubic Bézier curve that has the four control points $P_0(2, 2)$, $P_1(5, 10)$, $P_2(40, 20)$, and $P_3(10, 5)$. Use Maple to plot each of the parametric curves given by $P^{(j)}(t)$, j = 1, 2, 3, $Q^{(1)}(t)$, $Q^{(2)}(t)$, and B(t) in the same window. Discuss how the various curves combine to form others.

(g) For the general Bézier curve with control points $P_0(x_0, y_0)$, $P_1(x_1, y_1)$, $P_2(x_2, y_2)$, and $P_3(x_3, y_3)$, derive the equation for the tangent line to the curve at the point $(x_0, y_0)$, and prove that the point $(x_1, y_1)$ lies on this tangent line. (Hint: to determine the slope of the tangent line, use the chain rule in the standard way for finding dy/dx for a parametric curve.)

(h) Laser printers and the PostScript language use Bézier curves to construct the fonts that we use to represent letters. For example, a picture of the letter g is shown below that reveals the control points and Bézier curves required to accomplish this.
In Maple, use two or more Bézier curves to sketch a reasonable representation of the letter S. (You need not try to emulate the thickness of the ‘g’ that is shown above.) Then, use an appropriate number of Bézier curves to create an approximation of the lowercase letter ‘a,’ in the form shown here in quotes. State the control points required for the various curves.


Figure 1.21 The letter g.
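The repeated convex combinations of parts (c) through (e) can be explored numerically as a check on hand computations. The following is a minimal Python sketch under stated assumptions: the function names `lerp`, `de_casteljau`, and `cubic_bernstein` are illustrative, and the four control points are hypothetical (they are not the points from part (f)). It evaluates a cubic curve two ways, by repeatedly combining neighbouring points and by the closed formula for B(t), and prints both values so they can be compared.

```python
def lerp(P, Q, t):
    """Convex combination (1 - t) P + t Q of two points."""
    return tuple((1 - t) * p + t * q for p, q in zip(P, Q))

def de_casteljau(points, t):
    """Combine neighbouring points repeatedly until a single point remains."""
    pts = list(points)
    while len(pts) > 1:
        pts = [lerp(pts[i], pts[i + 1], t) for i in range(len(pts) - 1)]
    return pts[0]

def cubic_bernstein(P0, P1, P2, P3, t):
    """Evaluate P0 (1-t)^3 + 3 P1 t (1-t)^2 + 3 P2 t^2 (1-t) + P3 t^3."""
    s = 1 - t
    coeffs = (s**3, 3 * t * s**2, 3 * t**2 * s, t**3)
    return tuple(sum(c * P[i] for c, P in zip(coeffs, (P0, P1, P2, P3)))
                 for i in range(2))

# Hypothetical control points, chosen only for illustration.
control = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]

for k in range(5):
    t = k / 4
    print(t, de_casteljau(control, t), cubic_bernstein(*control, t))
```

A numerical agreement at a handful of t values is only a sanity check; it does not replace the algebraic derivation requested in (e).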

(i) Discuss the role that vectors and linear combinations play in the development of Bézier curves.

1.13.3 Discrete dynamical systems

A linear discrete dynamical system is a model that represents changes in a system from time k to time k + 1 by the rule
\[ x^{(k+1)} = Ax^{(k)} \]
A discrete dynamical system is similar to a Markov chain, but we no longer require that the columns of the matrix A sum to 1. A key issue in either scenario is the long-term behavior of the quantity $x^{(k)}$ being modeled. In what follows, we explore the role of eigenvalues and eigenvectors in determining this long-term behavior and study an important application of these ideas.

(a) To begin investigating the long-term behavior of the system, we will assume that A is an n × n matrix with n real linearly independent eigenvectors $v_1, \ldots, v_n$. Furthermore, assume that the corresponding real eigenvalues of A satisfy the relationship
\[ |\lambda_1| > |\lambda_2| \ge \cdots \ge |\lambda_n| \]
Consider an initial vector $x^{(0)}$. Explain why there exist constants $c_1, \ldots, c_n$ such that
\[ x^{(0)} = c_1 v_1 + c_2 v_2 + \cdots + c_n v_n \]
and show that
\[ Ax^{(0)} = c_1 \lambda_1 v_1 + c_2 \lambda_2 v_2 + \cdots + c_n \lambda_n v_n \]
Furthermore, show that
\[ x^{(k)} = A^k x^{(0)} = c_1 \lambda_1^k v_1 + c_2 \lambda_2^k v_2 + \cdots + c_n \lambda_n^k v_n \tag{1.13.4} \]

(b) In (1.13.4), divide both sides by $\lambda_1^k$. What can you conclude about $(\lambda_2/\lambda_1)^k$ as k → ∞? Why can you make similar conclusions about $(\lambda_j/\lambda_1)^k$ for j = 3, . . . , n? Hence explain why for large k
\[ \frac{1}{\lambda_1^k} A^k x^{(0)} \approx c_1 v_1 \]
and thus why $A^k x^{(0)}$ is an approximate eigenvector of A corresponding to $\lambda_1$.

(c) In studying a population like spotted owls, mathematical ecologists often pay close attention to the various numbers of a species at different stages of life. For example, for spotted owls there are three pronounced groupings: juveniles (under 1 year), subadults (1 to 2 years old), and adults (2 years and older). The owls mate during the latter two stages, breed as adults, and can live for up to 20 years. A critical time in the life cycle and survival of these owls is when the juvenile leaves the nest to build a home of its own.$^9$ Let the number of spotted owls in year k be represented by the vector
\[ x^{(k)} = \begin{bmatrix} j_k \\ s_k \\ a_k \end{bmatrix} \]
where $j_k$ is the number of juveniles, $s_k$ the number of subadults, and $a_k$ the number of adults. Using field data, mathematical ecologists have determined$^{10}$ that a particular spotted owl population is modeled by the discrete dynamical system
\[ x^{(k+1)} = \begin{bmatrix} 0 & 0 & 0.33 \\ 0.18 & 0 & 0 \\ 0 & 0.71 & 0.94 \end{bmatrix} x^{(k)} \]
What does this model imply about the percent of juveniles that survive to become subadults? About the percent of subadults that survive to become adults? About the percent of adults that survive from one year to the next? What percent of adults produce juvenile offspring in a given year?

$^9$ To read more about the issue of spotted owl survival, see the introduction to chapter 5 of David C. Lay’s Linear Algebra and its Applications.
$^{10}$ R. H. Lamberson et al., “A Dynamic Analysis of the Viability of the Northern Spotted Owl in a Fragmented Forest Environment,” Conservation Biology 6 (1992), 505–512.

(d) Assume that in a given region, ecologists have measured the present populations as follows: $j_0 = 200$, $s_0 = 45$, and $a_0 = 725$. Use the model stated in (c) to determine the population $x^{(k)} = [j_k \;\; s_k \;\; a_k]^T$ for k = 1, . . . , 20. Do you think the spotted owl will become extinct? Give a convincing argument using not only your computations of the population vectors but also the results of (b).

(e) Say that r is the fraction of juveniles that survive from one year to the next (that is, replace 0.18 in the matrix of the model with r). By experimenting with different values of r, determine the minimum fraction of juveniles that must survive from one year to the next in order for the spotted owl population not to become extinct. How does your answer depend on the eigenvalues of the matrix?

(f) Let A be the n × n matrix of a discrete dynamical system and assume that A has n real linearly independent eigenvectors. Let $x^{(0)}$ be an initial vector and let ρ(A) denote the maximum absolute value of an eigenvalue of A. Show that the following are true:
(i) If ρ(A) < 1, then $\lim_{k \to \infty} A^k x^{(0)} = 0$.
(ii) If ρ(A) = 1 and λ = 1 is the unique eigenvalue having this maximum absolute value, then $\lim_{k \to \infty} A^k x^{(0)}$ is an eigenvector of A.
(iii) If ρ(A) > 1, then there exist choices of $x^{(0)}$ for which $A^k x^{(0)}$ grows without bound.
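The scaling argument in (b) can be watched in action on a small example. The following is a minimal Python sketch under stated assumptions: the 2 × 2 matrix, the initial vector, and the helper names `step` and `iterate` are hypothetical and chosen only for illustration (this is deliberately not the owl model, so the questions in (d) and (e) remain open). The matrix has eigenvalues 3 and 1 with eigenvectors (1, 1) and (1, −1), and the initial vector is $\tfrac{1}{2}(1, 1) + \tfrac{1}{2}(1, -1)$, so $c_1 = \tfrac{1}{2}$.

```python
from fractions import Fraction

def step(A, x):
    """One application of the rule x(k+1) = A x(k); A is a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def iterate(A, x0, k):
    """Return A^k x0 by applying the rule k times."""
    x = list(x0)
    for _ in range(k):
        x = step(A, x)
    return x

# Hypothetical matrix with eigenvalues 3 and 1, eigenvectors (1, 1) and (1, -1).
A = [[2, 1], [1, 2]]
lam1 = 3
# x0 = (1/2)(1, 1) + (1/2)(1, -1), so c1 = 1/2.
x0 = [Fraction(1), Fraction(0)]

for k in (1, 5, 10):
    xk = iterate(A, x0, k)
    scaled = [float(xi / lam1**k) for xi in xk]
    print(k, scaled)  # the scaled vector drifts toward c1 * (1, 1)
```

Exact fractions keep the arithmetic free of rounding, so any remaining gap between the scaled vector and $c_1 v_1$ comes only from the $(\lambda_2/\lambda_1)^k$ term.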


2 First-order differential equations

2.1 Motivating problems

Differential equations arise naturally in many problems encountered when modeling physical phenomena. To begin our study of this subject, we introduce two fundamental examples that demonstrate the central role that differential equations play in our world.

In section 1.1, we discussed how the amount of salt present in a system of two tanks can be modeled through a system of differential equations. Here, an even simpler situation is considered: our goal is to predict the amount of salt present in a city’s water reservoir at time t, given a set of determining conditions. Suppose that the reservoir is filled to its capacity of $10\,000\ \text{m}^3$, and that measurements indicate an initial concentration of salt of $C_0 = 0.02\ \text{g/m}^3$. Note that it follows there are $A_0 = 200$ g of salt initially present. As the city draws this solution from the reservoir for use, new solution (water with some salt concentration) from the local treatment facility flows into the reservoir so that the volume of water present in the tank stays constant. Let us assume that the concentration of salt in the inflowing solution is $0.01\ \text{g/m}^3$, and that the rate of this inflow is $1000\ \text{m}^3/\text{day}$. Since the city is also assumed to be drawing solution at an equal rate from the reservoir, the outflow also occurs at a rate of $1000\ \text{m}^3/\text{day}$.

We are interested in several key questions. How much salt is in the tank at time t? What is the concentration of salt in the water being used by the city at time t? What happens to these values over time?

We will let A(t) denote the amount of salt in the tank at time t. The instantaneous rate of change dA/dt of A(t) is given by the difference between the rate at which salt is entering the tank and the rate at which salt is leaving. Exploring the given information regarding inflow and outflow, we can determine these rates precisely. Since solution is entering the reservoir at $1000\ \text{m}^3/\text{day}$ containing a concentration of $0.01\ \text{g/m}^3$, it follows that salt is entering the tank at a rate of
\[ 1000\ \frac{\text{m}^3}{\text{day}} \cdot 0.01\ \frac{\text{g}}{\text{m}^3} = 10\ \frac{\text{g}}{\text{day}} \]
For salt leaving the reservoir, the situation is slightly more complicated. Since we do not know the exact amount of salt present in the reservoir at time t, we denote this by A(t). Assuming that the solution in the reservoir is uniformly mixed, the concentration of salt in the outflowing solution is the ratio of the amount A(t) of salt to the volume of the tank. That is, the outflowing concentration is
\[ \frac{A(t)\ \text{g}}{10\,000\ \text{m}^3} \]
Since this outflow is occurring at a rate of $1000\ \text{m}^3/\text{day}$, it follows that salt is leaving the tank at a rate of
\[ 1000\ \frac{\text{m}^3}{\text{day}} \cdot \frac{A(t)\ \text{g}}{10\,000\ \text{m}^3} = \frac{A(t)}{10}\ \frac{\text{g}}{\text{day}} \]
It now follows that the instantaneous rate of change dA/dt of salt in the tank in grams per day is given by the difference of the rate of salt entering and the rate of salt leaving the tank. Specifically,
\[ \frac{dA}{dt} = 10 - \frac{A(t)}{10} \tag{2.1.1} \]
Note carefully what this last equation is saying: A(t) is an unknown function, but we have an equation that relates this unknown function to its derivative. Such an equation is called a differential equation. The solution to this equation is a function A(t) that makes the equation true. If we can solve the equation for A(t), we then will be able to predict the amount of salt in the tank at any time t. Determining such solutions and their long-term behaviors is the main focus of this chapter.

Another important application of differential equations involves population growth. Consider a population P(t) of animals. As likelihood of reproduction depends on the number of animals present, it is natural to assume that the rate of change of P(t) is directly proportional to P(t). Phrased in terms of the derivative, this assumption means that
\[ \frac{dP}{dt} = kP(t) \tag{2.1.2} \]
where k is some positive constant. Observe that (2.1.2) is a differential equation involving the function P. It is a standard exercise in calculus to show that functions of the form $P(t) = P_0 e^{kt}$ are solutions to (2.1.2).


Because the function $P(t) = P_0 e^{kt}$ exhibits unbounded growth over time, it turns out that this exponential growth model is not realistic beyond a relatively short period of time. A related, but more sophisticated, model of population growth is the logistic differential equation
\[ \frac{dP}{dt} = kP(t)\left(1 - \frac{P(t)}{A}\right) \]
where the constant k is considered the reproductive rate of the population and the constant A is the surrounding environment’s carrying capacity. For example, if a population had a relative growth rate of k = 0.02 and a carrying capacity of A = 100, the population function would satisfy the differential equation
\[ \frac{dP}{dt} = 0.02P(t)\left(1 - \frac{P(t)}{100}\right) \]
The logistic model, usually credited to the Belgian mathematician Pierre Verhulst, accounts not only for reproductive growth, but also for mortality by considering environmental limitations on maximum population. The logistic equation is more challenging to solve; we will do so in section 2.7.

In addition to mixing problems and models of population growth, differential equations enjoy widespread applications in other physical phenomena. Differential equations are also mathematically interesting in and of themselves, and in upcoming sections we will study not only their applications, but also their key properties and characteristics to better understand the subject as a whole.
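Returning to the reservoir model (2.1.1), a differential equation of this kind can be explored numerically even before any solution technique is available. The following is a minimal Python sketch under stated assumptions: it uses a simple fixed-step scheme (forward Euler), which is a choice made here for illustration and not a method introduced in this section, and the names `euler`, `salt_rate`, and `candidate` are hypothetical. The function `candidate` is $A(t) = 100 + 100e^{-t/10}$; differentiating it shows that it satisfies (2.1.1) with A(0) = 200, so it serves as a reference value.

```python
import math

def euler(f, t0, y0, h, steps):
    """Advance y' = f(t, y) from (t0, y0) using `steps` steps of size h."""
    t, y = t0, y0
    for _ in range(steps):
        y += h * f(t, y)   # follow the current slope for a short time h
        t += h
    return y

def salt_rate(t, A):
    """Right-hand side of (2.1.1): salt entering minus salt leaving, in g/day."""
    return 10 - A / 10

def candidate(t):
    """Reference function; one can check by differentiation that it satisfies (2.1.1)."""
    return 100 + 100 * math.exp(-t / 10)

# Approximate the amount of salt after 10 days, starting from 200 g.
approx = euler(salt_rate, 0.0, 200.0, 0.001, 10000)
print(approx, candidate(10))
```

Running the same sketch over a much longer time span suggests that the amount of salt settles toward 100 g, the level at which the inflow and outflow rates in (2.1.1) balance.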

2.2 Deﬁnitions, notation, and terminology

As we have seen with the examples
\[ \frac{dA}{dt} = 10 - \frac{A}{10} \tag{2.2.1} \]
\[ \frac{dP}{dt} = 0.02P\left(1 - \frac{P}{100}\right) \tag{2.2.2} \]
\[ y'' + y = 0 \tag{2.2.3} \]

a differential equation is an equation relating an unknown function to one or more of its derivatives. Usually we will suppress the notation “A(t)” and instead simply write “A,” as in (2.2.1). We will interchangeably use the notations $y'$ and dy/dt to represent the first derivative; similarly, $y'' = d^2y/dt^2$. Other books sometimes employ the notations $y' = D(y) = \dot{y}$ and $y'' = D^2(y) = \ddot{y}$.

A solution of a differential equation is a differentiable function that satisfies the equation on some interval (a, b) of values for the independent variable. For example, the function y = sin t is a solution to (2.2.3) on (−∞, ∞) since $y'' = -\sin t$, and −sin t + sin t = 0 for all values of t.

Given any differential equation, we are interested in determining all of its solutions. But many, if not most, differential equations are difficult or impossible to solve. For example, the equation $y'' + ty = t$ (which is only a slightly modified version of (2.2.3)) has no general solution in terms of elementary functions.$^1$ In such situations, we may turn to qualitative or approximation methods that may enable us to analyze how a solution should behave, while perhaps not being able to determine an explicit formula for the function.

$^1$ This fact is not obvious.

Equations (2.2.1), (2.2.2), and (2.2.3) are often called ordinary differential equations, in contrast to partial differential equations such as
\[ \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0 \]
where the solution function u(x, y) has two independent variables x and y. Our focus will be on ordinary differential equations, as partial differential equations are beyond the scope of this text.

The order of a differential equation is the order of the highest derivative present. For example, (2.2.1) and (2.2.2) are first-order differential equations since they only involve first derivatives. Equation (2.2.3) is second-order. For now, we limit our attention to first-order equations; higher order equations will be discussed in detail in subsequent chapters.

It is important to note that every student of calculus learns to solve a certain class of differential equations through integration. For example, the problem, “find a function y whose derivative is $te^t$” can be restated as a differential equation. In particular, this problem can be stated as the differential equation
\[ \frac{dy}{dt} = te^t \tag{2.2.4} \]
Integrating both sides with respect to t and using integration by parts on the right, it follows that
\[ y(t) = te^t - e^t + C \]
is a solution for any choice of the constant C. Here we see an important fact: differential equations typically have a family of infinitely many solutions. Determining all possible members of that family, like determining all solutions to systems of linear equations in linear algebra, will be a central component of our work.

Calculus students also know that if we are given one more piece of information about the function y along with (2.2.4), it is possible to uniquely determine the integration constant, C. For example, had the problem above read, “find a function y whose derivative is $te^t$ such that y(0) = 5,” we could integrate to find $y = te^t - e^t + C$, just as we did previously, and then use the initial condition y(0) = 5 to see that C must satisfy the equation
\[ 5 = 0 \cdot e^0 - e^0 + C \]

Deﬁnitions, notation, and terminology

131

and thus C = 6. When we are given a differential equation of order n along with n initial conditions, we say that we are solving an initial-value problem.2 In the given example, y = te t − e t + 6 is the solution to the stated initial-value problem. Based on the example above and our experience in calculus, it is clear that integration is an obvious (and often effective) approach to solving differential equations of the form dy = f (t ) dt where f (t ) is a given function. If we can integrate f symbolically, then the differential equation is solved. Even if f (t ) cannot by integrated symbolically with respect to t , we can still use techniques like numerical integration to successfully attack the problem. The situation grows more complicated when we want to solve differential equations that also involve the unknown function y, such as dy = te y dt In what follows in this chapter, we seek to classify ﬁrst-order equations into types that can be solved in a straightforward way by symbolic means (often involving integration), as well as to develop methods that can be used to generate approximate solutions in situations where a symbolic solution is either difﬁcult or impossible to attain. Throughout, the general form of the equations we are considering will be y = f (t , y), where the function f (t , y) represents some combination of the independent variable t and the unknown function y. It is also important to note that a wide range of ﬁrst-order initial-value problems are guaranteed to have unique solutions. This is stated formally in the following theorem, whose proof may be found in more advanced texts. Theorem 2.2.1 Consider the initial-value problem given by y = f (t , y), y(t0 ) = y0 . If the function f (t , y) is continuous on a rectangle that includes (t0 , y0 ) in its interior and the partial derivative3 fy (t , y) is continuous on that same rectangle, then there exists an interval containing t0 on which the initial-value problem has a unique solution. 
Often the dependent variable, or unknown function y, in a differential equation will model an important quantity in some physical problem: the amount of salt in a tank at time t , the number of members of a population at a given time, or the position of a mass attached to a spring. As such, we will place particular emphasis on the graph of the solution function in order to better understand what the differential equation is telling us about the physical situation it models. 2 3

We often use the abbreviation IVP to stand for the phrase “initial-value problem.” We typically use the notation fy (t , y) = ∂ f /∂ y.
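The initial-value computation earlier in this section (finding $C$ from $y(0) = 5$ for $dy/dt = te^t$) can be checked numerically. The following is only a minimal sketch: the function names and the central-difference step size are illustrative choices, not part of the text.

```python
import math

def y(t, C):
    """Family of solutions y = t e^t - e^t + C of dy/dt = t e^t."""
    return t * math.exp(t) - math.exp(t) + C

def constant_from_initial_condition(t0, y0):
    """Solve y(t0) = y0 for the integration constant C."""
    return y0 - (t0 * math.exp(t0) - math.exp(t0))

def derivative(t, C, h=1e-6):
    """Central-difference approximation to y'(t); h is an assumed step size."""
    return (y(t + h, C) - y(t - h, C)) / (2 * h)

# Initial condition y(0) = 5 from the example in the text.
C = constant_from_initial_condition(0.0, 5.0)
print("C =", C)
print("y'(0.7) approx:", derivative(0.7, C), " t e^t at 0.7:", 0.7 * math.exp(0.7))
```

Because the constant $C$ drops out when differentiating, the derivative check would succeed for any member of the family; only the initial condition singles out $C = 6$.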


Just as geometry and graphical interpretations shaped our understanding of linear algebra in chapter 1, these perspectives will prove extremely helpful in our study of differential equations. We begin our explorations of these graphical interpretations through the reservoir problem from section 2.1 and the earlier example $y' = te^t$.

So far in our references to derivatives in the reservoir and population models, we have viewed the derivative as measuring the instantaneous rate of change of a quantity that is varying. From a more geometric point of view, we also know that the derivative of a function measures the slope of the tangent line to the function's graph at a given point. For example, with the differential equation

\[ \frac{dA}{dt} = 10 - \frac{A}{10} \tag{2.2.5} \]

we can say that if, at some time $t$, the amount of salt is $A = 20$, then $dA/dt = 10 - 20/10 = 8$. Thus, if $A(t)$ is a solution to the differential equation, it follows that at any time where $A(t) = 20$, $A'(t) = 8$. Graphically, this means that at such a point, the slope of the tangent line to the curve must be 8.

Since we are interested in the function $A(t)$ over an interval of $t$-values, we also expect that $A(t)$ will take on a wide range of values. As such, it is natural to compute the slope of the tangent line determined by (2.2.5) for a large number of different values of $A$ and $t$. Computers are best suited to such a task, and, as we will see in the introduction to Maple commands at the end of this section, Maple and other computer algebra systems provide tools for doing so. Computing values of $dA/dt$ over a grid of $t$ and $A$ values, we can plot a small portion of each corresponding tangent line at the point $(t, A)$, and see the resulting slope field (or direction field). The slope field for (2.2.5) is shown in figure 2.1.

Figure 2.1 The slope field for (2.2.5), plotted for $0 \le t \le 50$ and $0 \le A \le 200$; the graph of the solution corresponding to an initial condition $A(0) = 200$ is included.
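The grid computation behind a slope field can be sketched in a few lines. This is only an illustration of the idea, not the way Maple produces figure 2.1; the grid spacing and the names `slope` and `slope_grid` are assumptions made for the example.

```python
def slope(t, A):
    """Right-hand side of dA/dt = 10 - A/10; it does not depend on t."""
    return 10 - A / 10

def slope_grid(t_values, A_values):
    """Map each grid point (t, A) to the slope of the tangent line there."""
    return {(t, A): slope(t, A) for t in t_values for A in A_values}

t_values = range(0, 51, 10)     # t = 0, 10, ..., 50 (assumed spacing)
A_values = range(0, 201, 50)    # A = 0, 50, ..., 200 (assumed spacing)
grid = slope_grid(t_values, A_values)

# Print one row of slopes per A value, largest A first, like reading the plot.
for A in reversed(A_values):
    print(A, [grid[(t, A)] for t in t_values])
```

Every row of the printout is constant because the equation is independent of $t$; the slopes are negative above $A = 100$, zero at $A = 100$, and positive below it.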


Observe that a slope field provides an intuitive way to understand the information a first-order differential equation possesses: the slope at each point gives the direction of the solution at that point. Indeed, we use arrows instead of small lines in order to indicate the flow of the solution as time increases. In essence, the slope field is a map that the solution must navigate based on the initial point from which the function starts. For example, if we use the initial condition $A(0) = 200$ (as was given in the original example in section 2.1), we can start a graph at the point $(0, 200)$ and follow the map. Doing so yields the curve shown in figure 2.1. Note particularly how the slope of the solution curve fits with the slopes present in the direction field.

Moreover, observe that the direction field provides an immediate overall sense of how every solution to the differential equation behaves: for any solution $A(t)$, $A(t) \to 100$ as $t \to \infty$. This makes sense physically, too, since the saltwater solution entering the reservoir has concentration 0.01 g/m$^3$. Over time, the concentration of solution in the reservoir should tend to that level, and with 10 000 m$^3$ of solution present in the reservoir, we expect the amount of salt to approach 100 g.

Another example of a differential equation's slope field provides further insights. For the differential equation

\[ \frac{dy}{dt} = te^t \tag{2.2.6} \]

its slope field for the window $-2 \le t \le 1$ and $-2 \le y \le 2$ is given in figure 2.2. We noted earlier that the general solution to this equation is $y = te^t - e^t + C$. Moreover, given any initial condition, we can determine $C$. For example, if $y(0) = 1/2$, $C = 3/2$. Likewise, if $y(0) = 0$, $C = 1$, and if $y(0) = -1$, $C = 0$. Plotting the corresponding three functions with the slope field shows three members of the family of all solutions to the original differential equation (figure 2.2).

Figure 2.2 The slope field for (2.2.6) along with three solution functions for the initial conditions $y(0) = 1/2$, $y(0) = 0$, and $y(0) = -1$.

In integral calculus, students learn about families of antiderivatives4 and how two members of such a family differ only by a constant. Here, we see this fact graphically in the slope field of figure 2.2, and can add the perspective that there exists a family of solutions to a certain differential equation. In upcoming sections, we will learn new techniques for how to determine solutions analytically in various circumstances, while not losing sight of the fact that every first-order differential equation can be interpreted graphically through a direction field.

Finally, there is an important type of first-order differential equation (DE) for which solutions can be determined algebraically. A first-order DE is said to be autonomous if it can be written in the form $y' = f(y)$. That is, the independent variable $t$ is not involved explicitly in $f(y)$. For example, the equation

\[ y' = 1 - y^2 \tag{2.2.7} \]

is autonomous.

4 An antiderivative $F$ of a function $f$ is a function that satisfies $F' = f$.

In addition, a solution $y$ to a DE is called an equilibrium or constant solution if the function $y$ is constant. In (2.2.7), both $y = 1$ and $y = -1$ are equilibrium solutions. An equilibrium solution is stable if every solution whose initial condition $y(t_0) = y_0$ has $y_0$ close to the equilibrium value tends toward the equilibrium solution. Otherwise, the equilibrium solution is called unstable. We close this section with an example regarding an autonomous differential equation.

Example 2.2.1 Consider the differential equation $y' = (y^2 - 1)(y - 3)^2$. Determine all equilibrium solutions to the equation, as well as whether each is stable or unstable. Finally, plot the direction field for the equation and include plots of the equilibrium solutions.

Solution. To find the equilibrium solutions, we assume that $y$ is a constant function, and therefore $y' = 0$. Solving the algebraic equation

\[ 0 = (y^2 - 1)(y - 3)^2 \]

we find that $y = -1$, $y = 1$, and $y = 3$ are the equilibrium solutions of the given DE. We can decide the stability of each equilibrium solution by studying the sign of $y'$ near the equilibrium value; note that $(y - 3)^2$ is always nonnegative.

To consider the stability of $y = -1$, observe that when $y < -1$, $y' = (y + 1)(y - 1)(y - 3)^2 > 0$, since the first two factors are both negative and the third is positive. When $y > -1$ (and $y < 1$), it follows that

\[ y' = (y + 1)(y - 1)(y - 3)^2 < 0 \]

since the middle factor is negative while the other two are positive. Hence, if a solution starts just below $y = -1$, that solution will increase toward $-1$, whereas if a solution starts just above $y = -1$, it will decrease to $-1$. This makes the equilibrium $y = -1$ stable.

These observations are easiest to make visually in the direction field. As seen in figure 2.3, the constant solution $y = -1$ is stable, since any solution with an initial condition just above or just below $y = -1$ will tend to $y = -1$. However, the solution at $y = 1$ is unstable, since any solution with an initial value just above or just below $y = 1$ will tend away from 1 (and tend toward $y = 3$ or $y = -1$, respectively). Finally, although solutions just below $y = 3$ tend to 3, any solution that begins just above $y = 3$ will increase away from that constant solution, and hence $y = 3$ is also unstable.5

Figure 2.3 The slope field for $y' = (y^2 - 1)(y - 3)^2$ along with its three equilibrium solutions.
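The sign analysis used in example 2.2.1 can be automated for any autonomous equation $y' = f(y)$ whose equilibria are already known. The sketch below is an illustration under assumptions: the helper `classify` and the offset `eps` are hypothetical names, and `eps` must be small enough that no other equilibrium lies within that distance.

```python
def f(y):
    """Right-hand side of the autonomous equation y' = (y^2 - 1)(y - 3)^2."""
    return (y**2 - 1) * (y - 3)**2

def classify(y_eq, eps=1e-3):
    """Classify an equilibrium from the sign of y' just below and just above it."""
    below, above = f(y_eq - eps), f(y_eq + eps)
    if below > 0 and above < 0:
        return "stable"        # nearby solutions move toward y_eq from both sides
    if below < 0 and above > 0:
        return "unstable"      # nearby solutions move away on both sides
    return "semi-stable"       # attracts on one side only; the text counts this as unstable

for y_eq in (-1, 1, 3):
    print(y_eq, classify(y_eq))
```

The third label follows the usage of some authors mentioned in the text's footnote; under the text's own definition such an equilibrium is unstable.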

5 Some authors call a solution such as $y = 3$ in example 2.2.1 semi-stable, since there is stability on one side and instability on the other.

2.2.1 Plotting slope fields using Maple

Just as our work in linear algebra required the use of Maple's Linear Algebra package, to take advantage of the software's support for the study of differential equations we use the DEtools package, loading it with the command

> with(DEtools):

To plot the direction field associated with a given differential equation, it is convenient to first define the equation itself in Maple. This is accomplished (for the equation from the reservoir problem) through the following command:

> Eq1 := diff(A(t), t) = 10-1/10*A(t);

Note that the differential equation of interest is now stored in "Eq1". The slope field may now be generated by the command

> DEplot(Eq1, A(t), t = 0 .. 50, A(t) = 0 .. 200, color = grey, arrows=large);

This command produces the slope field of figure 2.1, but without any particular solution satisfying an initial value included.

The choice of ranges for $t$ and $A(t)$ is extremely important. Without a well-chosen window selected by the user, the plot Maple generates may not be very insightful. For example, if the above command were changed so that the range of $A(t)$ values is 0 .. 10, almost no information can be gained from the slope field. As such, we will strive to learn to analyze the expected behavior of a differential equation from its form so that we can choose windows well in related plots; we may often have to experiment and explore to find graphs that are useful.

Finally, if we are interested in one or more related initial-value problems, a variation of the DEplot command enables us to sketch the graph of each corresponding solution. For example, the command

> DEplot(Eq1, A(t), t = 0 .. 50, A(t) = 0 .. 200, color = grey, arrows=large, [[0,200]]);

will generate not only the slope field, but also the graph of the solution $A(t)$ that satisfies $A(0) = 200$, as shown in figure 2.1. Additional curves for different initial conditions may be plotted by listing the other conditions to be satisfied: for example, in the stated command above we could replace [[0,200]] with [[0,200], [0,100], [0,0]] to include the plots of the three solution curves that respectively satisfy $A(0) = 200$, $A(0) = 100$, and $A(0) = 0$.

Exercises 2.2

1. Consider the differential equation $y'' = 4y$.
(a) What is the order of this equation?
(b) Show via substitution that the function $y = e^{2t}$ is a solution to this equation.
(c) Are there any other functions of the form $y = e^{rt}$ ($r \neq 2$) that are also solutions to the equation? If so, which? Justify your answer.

2. For a ball thrown straight up from an initial height $s(0) = 4$ meters at an initial velocity of $s'(0) = 10$ m/s, we know that after being thrown, the only force acting on the ball is gravity, provided we neglect air resistance. Knowing that acceleration due to gravity is constant at $-9.81$ m/s$^2$, it follows that $s''(t) = -9.81$. Use the given information to determine $s(t)$, the function that tells us the height of the ball at time $t$. Then determine the maximum height the ball reaches, as well as the time the ball lands.

3. In the differential equation $dA/dt = 10 - A/10$ from the reservoir problem, explain why the function $A(t) = 100$ is an equilibrium solution to the equation. Is it stable or unstable? Why?

4. Consider the logistic differential equation

\[ \frac{dP}{dt} = 0.02P\left(1 - \frac{P}{100}\right) \]

Use Maple to plot the direction field for this equation. Print the output and, by hand, sketch the solutions that correspond to the initial conditions $P(0) = 10$, $P(0) = 75$, and $P(0) = 125$. What is the long-term behavior of every solution $P(t)$ for which $P(0) > 0$? Are there any constant (or equilibrium) solutions to the equation? Explain what these observations tell you about the behavior of the population being modeled.

5. For the logistic differential equation

\[ \frac{dP}{dt} = 0.001P\left(1 - \frac{P}{25}\right) \]

how should the direction field appear? Use the constant/equilibrium solutions to the equation as well as the long-term behavior of the population to help you sketch, by hand, the direction field for this DE.

6. By constructing tangent lines over a grid with at least sixteen vertices, sketch a direction field by hand for each of the following differential equations.
(a) $y' = 1 - y$
(b) $y' = \frac{1}{2}(t - y)$
(c) $y' = \frac{1}{2}(t + y)$
(d) $y' = 1 - t$

7. Without using Maple to plot direction fields, match each of the following differential equations with its corresponding direction field. Write at least one sentence to explain the reasoning behind each of your choices.
(a) $\dfrac{dy}{dt} = y - t$
(b) $\dfrac{dy}{dt} = ty$
(c) $\dfrac{dy}{dt} = y$
(d) $\dfrac{dy}{dt} = t$

[Figure for exercise 7: four direction fields, labeled (i), (ii), (iii), and (iv), each plotted for $-1 \le t \le 1$ and $-1 \le y \le 1$.]

In exercises 8–15, use integration to find a family of solutions for the given differential equation.

8. $y' = t^2 + 2$
9. $y' = t + \cos t$
10. $y' = \dfrac{t}{t^2 + 1}$
11. $y''' = t^2 + 2$
12. $y'' = 5t$
13. $y' = t \sin t$
14. $y' = \dfrac{1}{t^2 + 5t + 6}$
15. $y' = te^{-t^2}$

In exercises 16–23, solve each of the following initial-value problems.

16. $y' = t^2 + 2$, $y(1) = 4$
17. $y' = t + \cos t$, $y(\pi/2) = 1$
18. $y' = \dfrac{t}{t^2 + 1}$, $y(0) = 3$
19. $y''' = t^2 + 2$, $y(-1) = 3$, $y'(-1) = -1$, $y''(-1) = 0$
20. $y'' = 5t$, $y(1) = 4$, $y'(1) = -2$
21. $y' = t \sin t$, $y(0) = 2$
22. $y' = \dfrac{1}{t^2 + 5t + 6}$, $y(0) = 1$
23. $y' = te^{-t^2}$, $y(0) = -1$

24. For an $n$th order IVP of the form $y^{(n)} = f(t)$, how many initial conditions are needed in order to uniquely determine the solution $y(t)$? Explain.

For each of the autonomous differential equations given in exercises 25–29, algebraically determine all equilibrium solutions to the DE. In addition, plot an appropriate direction field and use it to classify each equilibrium solution as stable or unstable.

25. $y' = 3 - 2y$
26. $y' = -y^2 - 5y - 6$
27. $y' = y - y^3$
28. $y' = e^{-y}(1 + y^2)$
29. $y' = (y - 1)(y - 3)^2$

2.3 Linear first-order differential equations

Some classes of differential equations can usually be solved by certain standard techniques. In this section, we consider the class of linear first-order differential equations and develop an approach for solving any such equation. Since any first-order DE is an equation that involves the functions $y$ and $y'$, it is natural for us to consider the different ways in which $y$ and $y'$ may be combined. For example, the equations

\[ y y' = e^t \tag{2.3.1} \]

\[ 2t y' + y \sin t = \cos t \tag{2.3.2} \]

\[ y' + \sin y = 2 \tag{2.3.3} \]

are all first-order DEs. Recall that in section 1.12 we discussed linear combinations of generalized vectors. Here we can view $y$ and $y'$ as functions that belong to a vector space, and thus think about whether a certain combination of $y$ and $y'$ is a linear combination or not. We say that any differential equation of the form

\[ a_1(t) y' + a_0(t) y = b(t) \tag{2.3.4} \]

is a linear first-order differential equation, since a linear combination of $y$ and $y'$ is being formed. Any other first-order differential equation is said to be nonlinear. If we stipulate that $a_1(t) \neq 0$, we can divide through by $a_1(t)$ and hence write

\[ y' + p(t) y = f(t) \tag{2.3.5} \]

as the standard form for a linear first-order equation. We call $f(t)$ the forcing function. Above, note that (2.3.1) and (2.3.3) are nonlinear equations, while (2.3.2) is linear.

The simplest linear first-order differential equations are those for which the forcing function is zero. We naturally call the equation

\[ y' + p(t) y = 0 \tag{2.3.6} \]

a homogeneous linear first-order DE. We consider a particular example that shows how every such homogeneous DE may be solved.

Example 2.3.1 Solve the differential equation $y' + (1 + 3t^2)y = 0$. In addition, solve the initial-value problem that is given by the same DE and the initial condition $y(0) = 4$.

Solution. We will use integration to solve for $y$. Rearranging the given equation, we observe that $y' = -(1 + 3t^2)y$. Dividing both sides by $y$, we find that

\[ \frac{y'}{y} = -(1 + 3t^2) \]

Keeping in mind the fact that $y$ and $y'$ are each unknown functions of $t$, we integrate both sides of the previous equation with respect to $t$:

\[ \int \frac{y'}{y}\,dt = \int -(1 + 3t^2)\,dt \]

We recognize from the chain rule that the left-hand side is $\ln y$. Thus, integrating the polynomial in $t$ on the right yields

\[ \ln y = -t - t^3 + C \]

We note that while an arbitrary constant arises on each side of the equation when integrating, it suffices to simply include one constant on the right. Finally, we solve for $y$ using properties of the natural logarithm and exponential functions to find that

\[ y = e^{-t - t^3 + C} = e^C e^{-t - t^3} \]

Since $C$ is a constant, so is $e^C$, and thus we write

\[ y = Ke^{-t - t^3} \]

Observe that we have found an entire family of functions that solve the original differential equation: regardless of the constant $K$, the above function $y$ is a solution. If we consider the stated initial-value problem and apply the given initial condition $y(0) = 4$, we immediately see that $K = 4$, and the solution to the initial-value problem is

\[ y = 4e^{-t - t^3} \]

The solution method in example 2.3.1 can be generalized to apply to any homogeneous linear first-order DE. Using the notation $p(t)$ to replace the function $1 + 3t^2$, which is the coefficient of $y$, the same steps above may be used to find the solution to the standard homogeneous linear first-order differential equation. We state this result in the following theorem.

Theorem 2.3.1 For any homogeneous linear first-order differential equation of the form $y' + p(t)y = 0$, the general solution is $y = Ke^{-P(t)}$, where $P$ is any antiderivative of $p$. Moreover, for the initial condition $y(t_0) = y_0$, if $p(t)$ is continuous on an interval containing $t_0$, then the solution to the corresponding initial-value problem is unique.

The uniqueness of the solution to the initial-value problem follows from theorem 2.2.1. But perhaps the most important lesson to learn from this result is that a homogeneous linear first-order DE can always be solved. This is analogous to our experience with homogeneous linear systems of algebraic equations in chapter 1. In particular, note that by taking $K = 0$, the zero function ($y = 0$) is always a solution to $y' + p(t)y = 0$; in addition, the homogeneous linear first-order DE has infinitely many solutions. This is very similar to how, for a given matrix $A$, the homogeneous equation $A\mathbf{x} = \mathbf{0}$ always has the zero vector as a solution and, in the case where $A$ is singular, $A\mathbf{x} = \mathbf{0}$ has infinitely many solutions.

Having now completely addressed the case of a homogeneous linear first-order DE, we turn to the nonhomogeneous case.
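The formula $y = Ke^{-P(t)}$ of theorem 2.3.1 can be spot-checked numerically for example 2.3.1. The following is a minimal sketch; the helper names and the finite-difference step are assumptions made for illustration only.

```python
import math

def p(t):
    return 1 + 3 * t**2          # coefficient of y in y' + p(t) y = 0

def P(t):
    return t + t**3              # one antiderivative of p

def solution(t, K):
    """General solution y = K e^{-P(t)} from theorem 2.3.1."""
    return K * math.exp(-P(t))

def residual(t, K, h=1e-6):
    """Approximate y' + p(t) y with a central difference; should be near 0."""
    dy = (solution(t + h, K) - solution(t - h, K)) / (2 * h)
    return dy + p(t) * solution(t, K)

# The initial condition y(0) = 4 determines K, since y(0) = K e^{-P(0)}.
K = 4 / math.exp(-P(0))
print("K =", K)
print("residuals:", residual(0.5, K), residual(-0.3, K))
```

Changing `p` and `P` together (with `P` an antiderivative of `p`) gives the same check for any other homogeneous linear first-order equation.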
In particular, we are interested in solving the equation

\[ y' + p(t)y = f(t) \tag{2.3.7} \]

where $f(t)$ is not identically zero. Recalling the product rule from calculus,

\[ \frac{d}{dt}\left[v(t) \cdot y\right] = v(t)y' + v'(t)y \tag{2.3.8} \]

we observe that the left-hand side of (2.3.7), $y' + p(t)y$, looks similar to the right-hand side of (2.3.8). If we multiply both sides of (2.3.7) by an unknown function $v(t)$, we have

\[ v(t)y' + v(t)p(t)y = v(t)f(t) \tag{2.3.9} \]

Next, we observe that if $v(t)$ is a function such that $v'(t) = v(t)p(t)$, then it follows from the product rule that (2.3.9) has the form

\[ \frac{d}{dt}\left[v(t)y\right] = v(t)f(t) \tag{2.3.10} \]

We assume temporarily that such a function $v(t)$ exists; we will proceed to discuss more about $v(t)$ shortly. Integrating both sides of (2.3.10), we now see that

\[ v(t)y = \int v(t)f(t)\,dt \tag{2.3.11} \]

To solve for $y$, we divide both sides by $v(t)$, yielding

\[ y(t) = \frac{1}{v(t)} \int v(t)f(t)\,dt \tag{2.3.12} \]

Prior to (2.3.10), we stipulated a condition on $v$ that enabled us to proceed. In particular, we noted that "if $v(t)$ is a function such that $v'(t) = v(t)p(t)$," then we could find a solution in terms of $v$. Observe that the differential equation $v$ satisfies is, in fact, a homogeneous linear first-order equation itself ($v' - p(t)v = 0$), and therefore its solution is

\[ v(t) = Ke^{P(t)}, \]

where $P(t) = \int p(t)\,dt$. Since we only need one such nonzero function $v$ to proceed, we set $K = 1$. From this and our conclusion in (2.3.12), we have determined that

\[ y(t) = e^{-P(t)} \int e^{P(t)} f(t)\,dt \tag{2.3.13} \]

where $P(t) = \int p(t)\,dt$. The function $v(t) = e^{P(t)}$ is usually called an integrating factor.

We next consider two examples of nonhomogeneous linear first-order differential equations and apply the method we just derived to solve them.

Example 2.3.2 Solve the differential equation $y' + 2y = 4$.

Solution. In this equation, $p(t) = 2$, and therefore $P(t) = 2t$. From (2.3.13), it follows that

\begin{align*}
y(t) &= e^{-P(t)} \int e^{P(t)} f(t)\,dt \\
     &= e^{-2t} \int e^{2t} \cdot 4\,dt \\
     &= e^{-2t}\left(2e^{2t} + C\right) \tag{2.3.14} \\
     &= 2 + Ce^{-2t} \tag{2.3.15}
\end{align*}
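When the integral in (2.3.13) is awkward to compute by hand, the same formula can be evaluated numerically. The sketch below is one possible approach under stated assumptions: it rewrites (2.3.13) with a definite integral from $t_0$ so that the initial condition $y(t_0) = y_0$ is built in, it uses composite Simpson's rule as the quadrature method (a choice made here, not one from the text), and the initial value $y(0) = 5$ is a hypothetical one chosen only to compare against (2.3.15).

```python
import math

def simpson(g, a, b, n=200):
    """Composite Simpson's rule for the integral of g over [a, b] (n even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def solve_linear_ivp(P, f, t0, y0, t):
    """y(t) = e^{-P(t)} * ( integral from t0 to t of e^{P(s)} f(s) ds + y0 e^{P(t0)} )."""
    integral = simpson(lambda s: math.exp(P(s)) * f(s), t0, t)
    return math.exp(-P(t)) * (integral + y0 * math.exp(P(t0)))

# Example 2.3.2: y' + 2y = 4, so p(t) = 2 and P(t) = 2t.
P = lambda t: 2 * t
f = lambda t: 4.0

y0 = 5.0                                      # hypothetical initial condition y(0) = 5
numeric = solve_linear_ivp(P, f, 0.0, y0, 1.5)
exact = 2 + (y0 - 2) * math.exp(-2 * 1.5)     # (2.3.15) with C = y0 - 2
print(numeric, exact)
```

The two printed values should agree to many decimal places; the small discrepancy comes only from the quadrature.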

There are several important observations to make from our work in example 2.3.2. First, the parentheses in (2.3.14) are essential. Without them, $e^{-2t}$ is not multiplied by the entire antiderivative, and the function $y$ would no longer be a solution to the given DE. Second, if we had instead solved the corresponding homogeneous differential equation $y' + 2y = 0$, we would have found the so-called complementary solution $y_h = Ce^{-2t}$. Moreover, by observing that $y' = 4 - 2y = 2(2 - y)$, if we consider the function $y_p = 2$, it is apparent that $y_p$ is a solution to the nonhomogeneous equation $y' + 2y = 4$. In addition, if we omit the constant of integration $C$ in (2.3.14), it follows that the method derived in (2.3.13) can be viewed as producing a so-called particular solution $y_p$ that is a solution to the given nonhomogeneous linear first-order differential equation.

Thus we see that the method derived in (2.3.13) and implemented to find (2.3.15) ultimately expresses the solution to the original nonhomogeneous linear first-order DE in the form

\[ y = y_p + y_h \]

where $y_p$ is a particular solution to the nonhomogeneous equation, while $y_h$ is the complementary solution, the solution to the corresponding homogeneous equation. This situation reminds us of one way to view the general solution to a system of linear equations given by $A\mathbf{x} = \mathbf{b}$, where in (1.5.1) in section 1.5 we found that $\mathbf{x} = \mathbf{x}_p + \mathbf{x}_h$. A further discussion of this property of linear first-order DEs will occur in theorem 2.3.3 to close the current section. Before doing so, we consider another example.

Example 2.3.3 Solve the nonhomogeneous first-order linear differential equation

\[ y' + y \tan t = \cos t \]

In addition, solve the initial-value problem (IVP) that is given by the same DE and the initial condition $y(\pi/3) = 1$.

Solution. We first determine the integrating factor $v(t)$. Since $p(t) = \tan t$, it follows that

\[ P(t) = \int \tan t\,dt = -\ln(\cos t) \]

Thus, $v(t) = e^{-\ln(\cos t)}$. Applying the integrating factor and using properties of exponential and logarithmic functions, we now observe that

\begin{align*}
y &= e^{\ln(\cos t)} \int \cos t \cdot e^{-\ln(\cos t)}\,dt \\
  &= \cos t \int \cos t \cdot \frac{1}{\cos t}\,dt \\
  &= \cos t \int 1\,dt \\
  &= \cos t\,(t + C)
\end{align*}

Thus, the general solution to the given differential equation is $y = t \cos t + C \cos t$. To solve the corresponding IVP with the condition that $y(\pi/3) = 1$, it follows that $1 = \frac{\pi}{3} \cdot \frac{1}{2} + C \cdot \frac{1}{2}$, so that $C = 2 - \pi/3$. The solution is

\[ y = t \cos t + (2 - \pi/3) \cos t \]

As in example 2.3.2, we note that the solution $y = t \cos t + C \cos t$ in example 2.3.3 is of the form $y = y_p + y_h$, where $y_h = C \cos t$ can easily be checked to be the solution to the corresponding homogeneous equation.

Two important results can now be stated in general. The first is a formal statement of our derivation in (2.3.12) that shows how we can use an integrating factor to solve any nonhomogeneous linear first-order DE. The second demonstrates that for any of these types of DEs, if $y_p$ is a particular solution to the nonhomogeneous DE and $y_h$ is a complementary solution to the corresponding homogeneous DE, then $y = y_p + y_h$ is also a solution to the nonhomogeneous DE.

Theorem 2.3.2 For any nonhomogeneous linear first-order differential equation of the form $y' + p(t)y = f(t)$, the general solution is

\[ y = e^{-P(t)} \int e^{P(t)} f(t)\,dt \]

where $P(t) = \int p(t)\,dt$. Moreover, for the initial condition $y(t_0) = y_0$, if $p(t)$ and $f(t)$ are continuous on an interval containing $t_0$, then the solution to the corresponding initial-value problem is unique.

The proof of the first part of theorem 2.3.2 is given above in the discussion of (2.3.7)–(2.3.12). The uniqueness of the solution to the IVP follows from theorem 2.2.1.

Finally, we observe that given a nonhomogeneous linear first-order differential equation $y' + p(t)y = f(t)$ and a particular solution $y_p$ (so $y_p' + p(t)y_p = f(t)$) and complementary solution $y_h$ to the corresponding homogeneous equation ($y_h' + p(t)y_h = 0$), it follows that

\begin{align*}
(y_p + y_h)' + p(t)(y_p + y_h) &= y_p' + y_h' + p(t)y_p + p(t)y_h \\
  &= (y_p' + p(t)y_p) + (y_h' + p(t)y_h) \\
  &= f(t) + 0 \\
  &= f(t)
\end{align*}

Therefore, $y_p + y_h$ is also a solution to the nonhomogeneous DE. Formally, we have the following result.

Theorem 2.3.3 For any nonhomogeneous linear first-order differential equation

\[ y' + p(t)y = f(t) \]

if $y_p$ is a particular solution to the nonhomogeneous equation and $y_h$ is a solution to the corresponding homogeneous equation, then $y = y_p + y_h$ is also a solution to the nonhomogeneous equation.
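The decomposition $y = y_p + y_h$ in theorem 2.3.3 can be illustrated with the functions from example 2.3.3. This is only a numerical spot check at a single sample point, not a proof; the helper `lhs` and the sample point $t = 0.4$ are assumptions made for the example.

```python
import math

C = 2 - math.pi / 3              # constant fixed by the condition y(pi/3) = 1

def y_p(t):
    return t * math.cos(t)       # particular solution of y' + y tan t = cos t

def y_h(t):
    return C * math.cos(t)       # solution of the homogeneous equation y' + y tan t = 0

def y(t):
    return y_p(t) + y_h(t)       # their sum, the solution of the IVP

def lhs(func, t, h=1e-6):
    """Approximate func'(t) + func(t) tan t with a central difference."""
    d = (func(t + h) - func(t - h)) / (2 * h)
    return d + func(t) * math.tan(t)

t = 0.4                          # arbitrary sample point with cos t != 0
print("homogeneous part:", lhs(y_h, t))
print("particular part :", lhs(y_p, t), " cos t:", math.cos(t))
print("sum             :", lhs(y, t))
print("y(pi/3)         :", y(math.pi / 3))
```

The first line printed should be close to zero, while the second and third should both be close to $\cos t$, mirroring the computation $f(t) + 0 = f(t)$ in the text.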

Exercises 2.3

In exercises 1–6, classify each equation as linear or nonlinear. Do not attempt to solve the equations.

1. $y' + 7y = e^t$
2. $(\cos t)\,y' + (\sin t)\,y = t^2$
3. $\cos y' + \sin y = t^2$
4. $ty' + t^2 y = t^3$
5. $y' y^2 = 3t$
6. $1 = y'/y$

In exercises 7–13, solve each of the given homogeneous linear first-order DEs.

7. $y' + y = 0$
8. $y' + 2y = 0$
9. $y' + ty = 0$
10. $y' + \dfrac{2}{t}\,y = 0$
11. $y' = -y \cot t$
12. $(1 + t^2)y' + 2ty = 0$
13. $y' = -\dfrac{2}{100 - t}\,y$

In exercises 14–20, solve each of the given nonhomogeneous linear first-order DEs.

14. $y' + y = 2$
15. $y' + 2y = 2t$
16. $y' + ty = 10t$

17. $y' + \dfrac{2}{t}\,y = e^t$
18. $y' = -(y - 1) \cot t$
19. $(1 + t^2)y' + 2ty = 2t$
20. $y' = 0.03 - \dfrac{2}{100 - t}\,y$

In exercises 21–27, solve each of the given initial-value problems.

21. $y' + y = 2$, $y(0) = 3$
22. $y' + 2y = 2t$, $y(1) = 0$
23. $y' + ty = 10t$, $y(0) = 5$
24. $y' + \dfrac{2}{t}\,y = e^t$, $y(1) = 4$, $t > 0$
25. $y' = -(y - 1) \cot t$, $y(\pi/2) = 1$, $0 < t < \pi$
26. $(1 + t^2)y' + 2ty = 2t$, $y(0) = 1$
27. $y' = 0.03 - \dfrac{2}{100 - t}\,y$, $y(0) = 1$

In exercises 28–33, plot a slope field in an appropriate window of $t$ and $y$ values for each of the given DEs. In addition, in the same window, plot the solution to each given IVP. Compare each graph to the solutions you found in the corresponding exercises 21–27.

28. $y' + y = 2$, $y(0) = 3$
29. $y' + 2y = 2t$, $y(1) = 0$
30. $y' + ty = 10t$, $y(0) = 5$
31. $y' + \dfrac{2}{t}\,y = e^t$, $y(1) = 4$, $t > 0$
32. $y' = -(y - 1) \cot t$, $y(\pi/2) = 1$, $0 < t < \pi$
33. $(1 + t^2)y' + 2ty = 2t$, $y(0) = 1$

34. With matrix multiplication, we noted that for any matrix $A$ and appropriately sized vectors $\mathbf{x}$ and $\mathbf{y}$, $A(\mathbf{x} + \mathbf{y}) = A\mathbf{x} + A\mathbf{y}$. In addition, for any constant $c$, $A(c\mathbf{x}) = cA\mathbf{x}$. We called these properties "the linearity of matrix multiplication." In calculus, we learn that the derivative operator, $D$, satisfies similar properties of linearity. In particular, if $f$ and $g$ are differentiable functions and $c$ is any constant, what can you say about $D(f + g)$ and $D(cf)$? (Recall that $D(f)$ is alternate notation for $f'$.)


2.4 Applications of linear first-order differential equations

A large number of important physical situations can be modeled by linear first-order differential equations. In this section we introduce several such applications through examples and explore further scenarios in the exercises.

2.4.1 Mixing problems

Recall that in section 2.1, we encountered a problem where a saltwater solution was entering and exiting a city's water reservoir. Specifically, in (2.1.1) we encountered the DE

\[ \frac{dA}{dt} = 10 - \frac{A(t)}{10} \]

This equation, rewritten in the form

\[ A' + \frac{1}{10}A = 10 \]

is a linear first-order DE that we now can easily solve. With $p(t) = 1/10$, the integrating factor is $v(t) = e^{t/10}$, and therefore

\begin{align*}
A &= e^{-t/10} \int e^{t/10} \cdot 10\,dt \tag{2.4.1} \\
  &= e^{-t/10}\left(100e^{t/10} + C\right) \tag{2.4.2} \\
  &= 100 + Ce^{-t/10} \tag{2.4.3}
\end{align*}

From this result, we can also confirm our previous observation that as $t \to \infty$, $A(t) \to 100$, for any solution $A(t)$ to the differential equation. Moreover, if we consider the initial condition $A(0) = 200$ stated along with the original problem in section 2.1, it follows that

\[ A(t) = 100 + 100e^{-t/10} \]

Certainly we can consider a wide range of variations on this mixing problem by changing concentrations, flow rates, and tank volumes. In every such scenario, the most important thing to keep in mind is that the rate of change of salt (or whatever quantity is under consideration) is the difference between the rate of salt entering and the rate exiting. Furthermore, an analysis of units is often very helpful. We consider one more example to demonstrate what can occur when the entering and exiting solutions are flowing at different rates.

Example 2.4.1 Consider a tank in which 1 g of chlorine is initially present in 100 m$^3$ of a solution of water and chlorine. A chlorine solution concentrated at 0.03 g/m$^3$ flows into the tank at a rate of 1 m$^3$/min, while the uniformly mixed solution exits the tank at 2 m$^3$/min. At what time is the maximum amount of chlorine present in the tank, and how much is present?

148

First-order differential equations

Solution. To answer the questions posed, we set up and solve an IVP. We let $A(t)$ denote the amount of chlorine in the tank (in grams) at time $t$ (in minutes). We note from the inflow that the rate at which chlorine is entering the tank is given by

$$\text{rate in} = 1\ \frac{\text{m}^3}{\text{min}} \cdot 0.03\ \frac{\text{g}}{\text{m}^3} \tag{2.4.4}$$

For the exiting flow, we must compute the concentration of chlorine present in the solution leaving the tank. This concentration is given by the ratio of amount present in grams to the total volume of solution in the tank at time $t$. In this problem, note that the volume is changing as a function of time. In particular, since solution enters at 1 m$^3$/min and exits at 2 m$^3$/min, it follows that the volume $V(t)$ of solution present in the tank is decreasing at a rate of 1 m$^3$/min. With 100 m$^3$ initially present, we observe that $V(t) = 100 - t$ is the volume of solution in the tank at time $t$. Thus, the concentration of chlorine in the solution exiting the tank at time $t$ is $A(t)/V(t)$ g/m$^3$, and the rate at which chlorine leaves the tank is given by

$$\text{rate out} = 2\ \frac{\text{m}^3}{\text{min}} \cdot \frac{A(t)}{V(t)}\ \frac{\text{g}}{\text{m}^3} = \frac{2A(t)}{100 - t}\ \frac{\text{g}}{\text{min}} \tag{2.4.5}$$

It follows from (2.4.4) and (2.4.5) that the overall instantaneous rate of change of chlorine in the tank with respect to time is

$$\frac{dA}{dt} = \text{rate in} - \text{rate out} = 0.03 - \frac{2A}{100 - t}$$

Note that we also have the initial condition $A(0) = 1$. Rearranging the differential equation, we see that we must solve the nonhomogeneous linear first-order equation

$$A' + \frac{2}{100 - t} A = 0.03 \tag{2.4.6}$$

Applying the approach discussed in section 2.3, followed by the initial condition, it can be shown that the solution to (2.4.6) is

$$A(t) = 3 - 0.03t - 0.0002(100 - t)^2$$

From the quadratic nature of this solution, as well as from the direction field shown in figure 2.4, we can see that this function has a maximum value. It is a straightforward exercise to show that this maximum of $A(t)$ occurs when $t = 25$ min and that the maximum is $A = 1.125$ g.
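The claims in example 2.4.1 can be checked numerically. The following is a minimal Python sketch, not part of the original text: the function names `chlorine`, `rate`, and `max_residual` are illustrative choices, and the derivative is approximated by a central difference rather than computed symbolically. It checks that the stated $A(t)$ satisfies the differential equation and the initial condition, and it locates the maximum by sampling whole minutes.

```python
def chlorine(t):
    """Stated solution A(t) of A' + 2A/(100 - t) = 0.03 with A(0) = 1."""
    return 3 - 0.03 * t - 0.0002 * (100 - t) ** 2


def rate(t, a):
    """Right-hand side of the DE: rate in minus rate out (g/min)."""
    return 0.03 - 2 * a / (100 - t)


def max_residual(ts, h=1e-5):
    """Largest gap between a central-difference A'(t) and rate(t, A(t))."""
    return max(
        abs((chlorine(t + h) - chlorine(t - h)) / (2 * h) - rate(t, chlorine(t)))
        for t in ts
    )


# Sample whole minutes while the tank still holds solution (0 <= t < 100).
sample_times = range(0, 100)
t_max = max(sample_times, key=chlorine)

print("A(0) =", chlorine(0))
print("largest residual in the DE:", max_residual(sample_times))
print("maximum at t =", t_max, "min with A =", chlorine(t_max), "g")
```

Because the sampling is restricted to whole minutes, this only confirms the location of the maximum to the nearest minute; setting $A'(t) = 0$ by hand gives the exact value.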

2.4.2 Exponential growth and decay

A radioactive substance emits particles; in doing so, the substance decreases its mass. This process is known as radioactive decay. For example, the radioactive isotope carbon-14 emits particles and loses half its mass over a period of 5730 years. For any such isotope, the instantaneous rate of decay is proportional to the mass of the substance present at that instant. Thus, assuming an initial mass $M_0$ is present, it follows that the mass $M(t)$ of the substance at time $t$ must satisfy the initial-value problem

$$M' = -kM, \quad M(0) = M_0 \tag{2.4.7}$$

for some positive constant $k$. Note that the minus sign is present in (2.4.7) since the mass $M(t)$ is decreasing. It follows from our work with homogeneous linear first-order DEs in section 2.3 that the solution to this equation is

$$M(t) = M_0 e^{-kt} \tag{2.4.8}$$

[Figure 2.4: Direction field for (2.4.6) with solution corresponding to the initial condition $A(0) = 1$.]

Similarly, experiments show that a population with zero death rate (e.g., a colony of bacteria with sufficient food and no predators) grows at a rate proportional to the size of the population at time $t$. In particular, if $P(t)$ is the population present at time $t$ and $P_0$ is the initial population, then $P$ satisfies the initial-value problem $P' = kP$, $P(0) = P_0$, for some positive constant $k$. Here, it follows that

$$P(t) = P_0 e^{kt} \tag{2.4.9}$$

Problems involving radioactive decay and exponential population growth are very similar and should be familiar to students from past courses in calculus and precalculus. We include one example here for review and several more in the exercises at the end of the section.

Example 2.4.2 A radioactive isotope initially has 40 g of mass. After 10 days of radioactive decay, its mass is 39.7 g. What is the isotope’s half-life? At what time $t$ will 1 g remain?

Solution. Because the isotope decays radioactively, we know that its mass $M(t)$ must have the form $M(t) = M_0 e^{-kt}$. To answer the questions posed, we must first determine the constant $k$. In the given problem, we know that $M_0 = 40$ and that $M(10) = 39.7$. It follows that

$$39.7 = 40e^{-10k}$$

Dividing both sides of the equation by 40, taking natural logs, and solving for $k$, we find that

$$k = -\frac{1}{10} \ln\left(\frac{39.7}{40}\right)$$

To compute the half-life, we now solve the equation $\frac{M_0}{2} = M_0 e^{-kt}$ for $t$. In particular, we have

$$20 = 40e^{\frac{1}{10} \ln\left(\frac{39.7}{40}\right) t}$$

Dividing by 40 and taking natural logs,

$$\ln\left(\frac{1}{2}\right) = \frac{1}{10} \ln\left(\frac{39.7}{40}\right) t$$

so

$$t = \frac{\ln\left(\frac{1}{2}\right)}{\frac{1}{10} \ln\left(\frac{39.7}{40}\right)}$$

Thus the half-life of the isotope is approximately 921 days. Finally, to determine when 1 g of the substance will remain, we simply solve the equation

$$1 = 40e^{\frac{1}{10} \ln\left(\frac{39.7}{40}\right) t}$$

Doing so shows that $t \approx 4900$ days.
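The arithmetic in example 2.4.2 is easy to reproduce. The sketch below is an illustration only; the names `M0`, `M10`, and `time_to_reach` are assumptions introduced here, not notation from the text. It computes $k$ from the two mass measurements and then solves $M_0 e^{-kt} = m$ for $t$.

```python
import math

M0 = 40.0   # initial mass in grams
M10 = 39.7  # mass after 10 days, in grams

# From 39.7 = 40 e^{-10k}, solve for the decay constant k (per day).
k = -math.log(M10 / M0) / 10


def time_to_reach(mass):
    """Time t (days) at which M0 * exp(-k t) equals the given mass."""
    return math.log(M0 / mass) / k


half_life = time_to_reach(M0 / 2)
t_one_gram = time_to_reach(1.0)

print(f"k = {k:.6f} per day")
print(f"half-life = {half_life:.1f} days")
print(f"1 g remains at t = {t_one_gram:.0f} days")
```

Writing the solve step once as `time_to_reach` makes it clear that the half-life and the time to 1 g are the same calculation with different target masses.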

2.4.3 Newton’s law of Cooling

Suppose that $T(t)$ is the temperature of a body immersed in a cooler surrounding medium such as air or water. Sir Isaac Newton postulated (and experiments confirm) that the body will lose heat at a rate proportional to the difference between its present temperature and the temperature of its surroundings. If we assume that the temperature of the surrounding medium is constant, say $T_m$, and that the warmer body’s initial temperature is $T(0) = T_0$, then Newton’s law of Cooling can be expressed through the initial-value problem

$$T' = -k(T - T_m), \quad T(0) = T_0 \tag{2.4.10}$$

Written in the standard form of a nonhomogeneous linear first-order DE, we find that $T$ satisfies the IVP

$$T' + kT = kT_m, \quad T(0) = T_0 \tag{2.4.11}$$

Solving this problem in the standard way reveals that the temperature of the cooling body must satisfy

$$T(t) = (T_0 - T_m)e^{-kt} + T_m \tag{2.4.12}$$

We consider an example with some particular details given in order to analyze the behavior of the temperature function.

Example 2.4.3 A can of soda at room temperature 70°F is placed in a refrigerator that maintains a constant temperature of 40°F. After 1 hour in the refrigerator, the temperature of the soda is 58°F. At what time will the soda’s temperature be 41°F?

Solution. Let $T(t)$ denote the temperature of the soda at time $t$ in degrees F; note that $T_0 = 70$. Since the surrounding temperature is 40, $T$ satisfies the initial-value problem

$$T' = -k(T - 40), \quad T(0) = 70$$

and therefore by (2.4.12) $T$ has the form

$$T(t) = 30e^{-kt} + 40$$

In particular, note that the temperature is decreasing exponentially as time increases and tending towards 40°F, the temperature of the refrigerator, as $t \to \infty$. To determine the constant $k$, we use the additional given information that $T(1) = 58$, and therefore

$$58 = 30e^{-k} + 40$$

It follows that $e^{-k} = 3/5$, and thus $k = \ln(5/3)$. To now answer the original question, we solve the equation

$$41 = 30e^{-\ln(5/3)t} + 40$$

and find that $t = \ln(30)/\ln(5/3) \approx 6.658$ h.

Exercises 2.4

1. A population of bacteria is growing at a rate proportional to the number of cells present at time $t$. If initially 100 million cells are present and after 6 hours 300 million cells are present, what is the doubling time of the population? At what time will 100 billion cells be present?

2. The half-life of a radioactive element is 2000 years. What percentage of its original mass is left after 10 000 years? After 11 000 years?

3. The evaporation rate of moisture from a sheet hung on a clothesline is proportional to the sheet’s moisture content. If one half of the moisture evaporates in the first 30 min, how long will it take for 95 percent of the moisture to evaporate?

4. A population of 200 million people is observed to grow at a rate proportional to the population present and to be increasing at a rate of 2 percent per year. How long will it take for the population to triple?

5. In a certain lake, wildlife biologists determine that the walleye population is growing very slowly. In particular, they conclude that the population growth is modeled by the differential equation $P' = 0.002P$, where $P$ is measured in thousands of walleye, and time $t$ is measured in years. The biologists estimate that the initial population of walleye in the lake is 100 000 fish. To enhance the fishery, the department of conservation begins planting walleye fingerlings in the lake at a rate of 5000 walleye per year.
(a) Write an IVP that the population $P(t)$ of walleye in the lake in year $t$ will satisfy under the assumption that walleye are being added to the lake at a rate of 5000 fish per year.
(b) Solve the IVP stated in (a).
(c) In 20 years, how many more walleye will be in the lake than if the biologists had not planted any fish?

6. Solve the IVP $A' = 0.03 - \frac{2}{100 - t} A$, $A(0) = 1$, in order to verify the stated solution in example 2.4.1.

7. Brine (saltwater) is entering a 25 m$^3$ tank at a flow rate of 0.25 m$^3$/min and at a concentration of 6 g/m$^3$. The uniformly mixed solution exits the tank at a rate of 0.25 m$^3$/min. Assume that initially there are 15 m$^3$ of solution in the tank at a concentration of 3 g/m$^3$.
(a) State an IVP that is satisfied by $A(t)$, the amount of salt in grams in the tank at time $t$.
(b) What will happen to the amount of salt in the tank as $t \to \infty$? Why?
(c) Plot a direction field for the IVP stated in (a), including a plot of the solution.
(d) At exactly what time will there be 75 g of salt present in the tank?

8. Brine is entering a 25 m$^3$ tank at a flow rate of 0.5 m$^3$/min and at a concentration of 6 g/m$^3$. The uniformly mixed solution exits the tank at a rate of 0.25 m$^3$/min. Assume that initially there are 5 m$^3$ of solution in the tank at a concentration of 25 g/m$^3$.
(a) State an IVP that is satisfied by the amount of salt $A(t)$ in grams in the tank at time $t$.
(b) Solve the IVP stated in (a). For what values of $t$ is this problem valid? Why?
(c) At exactly what time will the least amount of salt be present in the tank? How much salt will there be at that time?
(d) Plot a direction field for the IVP stated in (a), including a plot of the solution. Discuss why this direction field and the solution make sense in the physical context of the problem.

9. A body of water is polluted with mercury. The lake has a volume of 200 million cubic meters and mercury is present in a concentration of 5 grams per million cubic meters. Health officials state that any level above 1 g per million cubic meters is considered unsafe. If water unpolluted by mercury flows into the lake at a rate of 0.5 million cubic meters per day, and uniformly mixed lake water flows out of the lake at the same rate, how long will it take for the lake to reach a mercury concentration that is considered safe?

10. An average person takes eighteen breaths per minute and each breath exhales 0.0016 m$^3$ of air that contains 4 percent more carbon dioxide (CO$_2$) than was inhaled. At the start of a seminar containing 300 participants, the room air contains 0.4 percent CO$_2$. The ventilation system delivers 10 m$^3$ of fresh air per minute to the room whose volume is 1500 m$^3$. Find an expression for the concentration level of CO$_2$ in the room as a function of time; assume that air is leaving the room at the same rate that it enters.

11. Solve the general Newton’s law of Cooling IVP $T' = -k(T - T_m)$, $T(0) = T_0$, in order to verify the solution stated in (2.4.12).

12. A potato at room temperature of 72°F is placed in an oven set at 350°F. After 30 min, the potato’s temperature is 105°F. At what time will the potato reach a temperature of 165°F?

13. An object at a temperature of 80°C is placed in a refrigerator maintained at 5°C. If the temperature of the object is 75°C at 20 min after it is placed in the refrigerator, determine the time (in hours) the object will reach 10°C.

14. An object at a temperature of 9°C is placed in a refrigerator that is initially at 5°C. At the same time the object is placed in the refrigerator, the refrigerator’s thermostat is adjusted in order to raise the temperature inside from 5°C to 10°C; the function that governs the temperature of the refrigerator is

$$R(t) = \frac{10}{1 + e^{-0.75t}}$$

(a) Using the refrigerator’s temperature constant $k$ from exercise 13, modify Newton’s law of Cooling appropriately to state an IVP whose solution is the temperature of the object.
(b) Plot a direction field for the IVP from (a) and sketch an approximate solution to the IVP.
(c) Discuss the qualitative behavior of the solution to the IVP. Estimate the minimum temperature the object achieves.

15. On a cold, winter evening with an outdoor temperature of 4°F, a home’s furnace fails at 10 pm. At the time of the furnace failure, the indoor temperature was 68°F. At 2 am, the indoor temperature was 60°F. Assuming the outside temperature remains constant, at what time will the homeowner have to begin to worry about pipes freezing due to an indoor temperature below 32°F?

2.5 Nonlinear ﬁrst-order differential equations

So far in our work with differential equations, we have seen that linear first-order differential equations have many interesting properties. One is that any IVP that corresponds to a linear first-order DE (with reasonably well-behaved functions $p(t)$ and $f(t)$) is guaranteed to have a unique solution. In addition, through our development of integrating factors, we have a method by which we can always (at least in theory) determine a solution for the differential equation.

Any differential equation that is not linear is called nonlinear. Thus, nonlinear differential equations constitute every other type of equation we can conceive. Unfortunately, nonlinear equations are (in general) far more difficult to solve than linear ones. We will limit ourselves in this section to considering a few relatively common special cases of nonlinear first-order differential equations that can be solved analytically. In section 2.6, we will consider qualitative and approximation techniques that enable us to gain valuable information from a nonlinear initial-value problem, even in the event that we cannot solve it explicitly.

2.5.1 Separable equations

In example 2.3.1 in section 2.3, we solved the differential equation $y' = -(1 + t^2)y$. While this equation is linear, our method provides insight into how to approach a class of nonlinear equations whose structure is similar. We begin by considering a slightly modified example.

Example 2.5.1 Solve the nonlinear first-order differential equation

$$y' = -(1 + t^2)y^2 \tag{2.5.1}$$

Solution. Following our approach in example 2.3.1, we can separate the variables $y$ and $t$ algebraically to arrive at the equation

$$y^{-2} \frac{dy}{dt} = -(1 + t^2)$$

Integrating both sides of this equation with respect to $t$,

$$\int (y(t))^{-2} \frac{dy}{dt}\, dt = \int -(1 + t^2)\, dt \tag{2.5.2}$$

The left-hand side may be simplified to $\int y^{-2}\, dy$. Thus, evaluating each integral in (2.5.2), we find that

$$-y^{-1} = -t - \frac{1}{3}t^3 + C \tag{2.5.3}$$

We note again that since an arbitrary constant of integration arises on each side, it suffices to include just one. It is essential here to observe that by successfully integrating, we have removed the presence of $y'$ in the equation, and now have only an algebraic, rather than differential, equation in $t$ and $y$. Solving (2.5.3) algebraically for $y$, it follows that

$$y = \frac{1}{t + \frac{1}{3}t^3 - C}$$
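A quick numerical sanity check of example 2.5.1 is sketched below. It is an illustration under stated assumptions: the function names are hypothetical, the constant $C = -1$ is an arbitrary choice that keeps the denominator positive for $t \ge 0$, and the derivative is approximated by a central difference. The check confirms that $y = 1/(t + \frac{1}{3}t^3 - C)$ satisfies $y' = -(1 + t^2)y^2$ at a few sample points.

```python
def y(t, c):
    """Candidate solution y = 1 / (t + t^3/3 - C)."""
    return 1.0 / (t + t**3 / 3 - c)


def f(t, y_val):
    """Right-hand side of the separable DE y' = -(1 + t^2) y^2."""
    return -(1 + t**2) * y_val**2


def residual(t, c, h=1e-6):
    """|y'(t) - f(t, y(t))| with y' estimated by a central difference."""
    dy_dt = (y(t + h, c) - y(t - h, c)) / (2 * h)
    return abs(dy_dt - f(t, y(t, c)))


C = -1.0  # arbitrary constant chosen for illustration
worst = max(residual(t, C) for t in (0.0, 0.5, 1.0, 2.0))
print("largest residual:", worst)
```

A residual near the level of floating-point noise is what one expects if the formula is a solution; a check of this kind does not replace the derivation, but it catches algebra slips quickly.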

The strategy of example 2.5.1 may be applied to any differential equation of the form $y' = f(t, y)$ where $f(t, y)$ can be decomposed into a product of two functions, one of $t$ only and one of $y$ only. That is, if we can write $f(t, y) = g(t) \cdot h(y)$, then we are able to separate the variables in the equation, writing all of the $y$-terms on one side (multiplied by $y'$), and writing all of the $t$-terms on the other. Any differential equation of the form $y' = g(t) \cdot h(y)$ is said to be separable. We attempt to solve a separable differential equation by separating the variables and writing

$$\frac{1}{h(y)}\, y' = g(t) \tag{2.5.4}$$

Writing $y'$ in the alternate notation $dy/dt$, we have

$$\frac{1}{h(y)} \frac{dy}{dt} = g(t) \tag{2.5.5}$$

Hence when we integrate both sides of (2.5.5) with respect to $t$, we find

$$\int \frac{1}{h(y)}\, dy = \int g(t)\, dt$$

Now, all of this work is only useful if we arrive at integrals we can actually evaluate. For example, if the left-hand side is $\int \sqrt{\sin y}\, dy$, we are really no closer to solving for $y$ than we were when considering the initial differential equation. In section 2.6, we will address ways to approximate the solution of such equations that we seem unable to solve analytically. For now, we consider a few examples of separable equations that we can solve, with more to follow in the exercises.

Example 2.5.2 Find a family of solutions to the differential equation

$$\frac{y'}{t} = e^{t + 2y}$$

and a solution to the corresponding initial-value problem with the condition that $y(1) = 1$.

Solution. First, we may write $e^{t + 2y} = e^t e^{2y}$. Thus, we have

$$\frac{y'}{t} = e^t e^{2y}$$

Separating the variables, it follows that

$$e^{-2y} \frac{dy}{dt} = te^t$$

Integrating both sides with respect to $t$, we may now write

$$\int e^{-2y}\, dy = \int te^t\, dt$$

Using integration by parts on the right and evaluating both integrals, we have

$$-\frac{1}{2} e^{-2y} = (t - 1)e^t + C$$

To now solve algebraically for $y$, we first multiply both sides by $-2$. Since $C$ is an arbitrary constant, $-2C$ is just another constant, one that we will denote by $C_1$. Hence

$$e^{-2y} = -2(t - 1)e^t + C_1$$

Taking logarithms and solving for $y$, we can conclude that

$$y = -\frac{1}{2} \ln\left(-2(t - 1)e^t + C_1\right)$$

is the family of functions that provides the general solution to the original DE. To solve the corresponding IVP with $y(1) = 1$, we observe that

$$y(1) = -\frac{1}{2} \ln\left(-2(1 - 1)e^1 + C_1\right) = -\frac{1}{2} \ln(C_1) = 1$$

so $\ln(C_1) = -2$, and therefore $C_1 = e^{-2}$. The solution to the IVP is

$$y = -\frac{1}{2} \ln\left(-2(t - 1)e^t + e^{-2}\right)$$

Example 2.5.3

Is the following differential equation linear or nonlinear?

$$ty' + y^2 = 4$$

Classify the equation, and solve it to find a general family of solutions.

Solution. We note that the given equation is nonlinear due to the presence of $y^2$ in the equation; said differently, the left-hand side is not a linear combination of $y$ and $y'$. To separate the variables, we first write

$$ty' = 4 - y^2$$

Dividing both sides by $t(4 - y^2)$, it follows that

$$\frac{1}{4 - y^2} \frac{dy}{dt} = \frac{1}{t}$$

and therefore

$$\int \frac{dy}{4 - y^2} = \int \frac{dt}{t}$$

Evaluating both integrals, noting that the left-hand side requires integration by partial fractions or a table of integrals, we have

$$\frac{1}{4} \ln\left|\frac{y + 2}{y - 2}\right| = \ln|t| + C$$

It only remains to solve for $y$ algebraically. Using rules of logarithms and letting $C = \ln K$, we can write

$$\ln\left|\frac{y + 2}{y - 2}\right|^{1/4} = \ln(Kt)$$

It now follows that

$$\left|\frac{y + 2}{y - 2}\right|^{1/4} = Kt$$

Raising both sides to the fourth power, multiplying by $(y - 2)$, and solving for $y$ yields

$$y = 2\, \frac{(Kt)^4 + 1}{(Kt)^4 - 1}$$

2.5.2 Exact equations

We will consider one other type of nonlinear differential equation that may be solved analytically. We explore this through an example. Let us solve the DE

$$(2 + t^2 y)y' + ty^2 = 0$$

We first observe that this equation is neither linear nor separable. The former is clear from the presence of $y^2$ and $yy'$; the latter is less obvious, but nonetheless true since the presence of the term $(2 + t^2 y)$ makes it impossible to separate the variables $t$ and $y$. We therefore explore another algebraic approach. Considering the derivative in differential notation, we have

$$(2 + t^2 y) \frac{dy}{dt} + ty^2 = 0$$

and thus we may instead write

$$(ty^2)\, dt + (2 + t^2 y)\, dy = 0 \tag{2.5.6}$$

This form may remind us of the total differential $d\phi$ of a function $\phi(t, y)$, as studied in multivariable calculus. Recall that for a differentiable function $\phi(t, y)$, its total differential $d\phi$ is given by

$$d\phi = \phi_t\, dt + \phi_y\, dy$$

where $\phi_t = \partial\phi/\partial t$ and $\phi_y = \partial\phi/\partial y$. Note, therefore, from (2.5.6) that if there exists a function $\phi$ such that $\phi_t = ty^2$ and $\phi_y = 2 + t^2 y$, then (2.5.6) is actually of the form $d\phi = 0$, from which it follows that $\phi(t, y) = K$, for some constant $K$. Assuming that we can find the function $\phi(t, y)$, we have then transformed the original differential equation in $t$ and $y$ to an algebraic equation in $t$ and $y$, one that we can hopefully solve for $y$.

In the current example, let us suppose that such a function $\phi(t, y)$ exists, and therefore that

$$\frac{\partial\phi}{\partial t} = ty^2 \tag{2.5.7}$$

and

$$\frac{\partial\phi}{\partial y} = 2 + t^2 y \tag{2.5.8}$$

Integrating both sides of (2.5.7) with respect to $t$, it follows that

$$\phi(t, y) = \frac{1}{2} t^2 y^2 + g(y)$$

The function $g(y)$ arises since the partial derivative with respect to $t$ of any function of only $y$ is zero. For $\phi$ to satisfy the condition in (2.5.8), we see that we must take the partial derivative with respect to $y$ of our most recent result and set this equal to $2 + t^2 y$. Doing so, we find that

$$\frac{\partial\phi}{\partial y} = t^2 y + g'(y) = 2 + t^2 y$$

Therefore, $g'(y) = 2$, so $g(y) = 2y$, and we have found that

$$\phi(t, y) = \frac{1}{2} t^2 y^2 + 2y$$

Since it is the case that $d\phi = 0$, we know that $\phi(t, y) = K$, and therefore $t$ and $y$ are related by the algebraic equation

$$\frac{1}{2} t^2 y^2 + 2y = K$$

From the quadratic formula, it follows that

$$y = \frac{-2 \pm \sqrt{4 + 2Kt^2}}{t^2}$$

and we have solved the original equation. The choice of “+” or “−” in the solution would depend on the value given in an initial condition.

There are several important lessons to learn from this example. One is some terminology. If a differential equation can be written in the form

$$M(t, y)\, dt + N(t, y)\, dy = 0 \tag{2.5.9}$$

and there exists a function $\phi(t, y)$ such that $\phi_t(t, y) = M(t, y)$ and $\phi_y(t, y) = N(t, y)$, then since the differential equation is of the form $d\phi(t, y) = 0$, we say that the equation is exact.

So, certainly a first check of whether an equation might be exact consists in trying to write it in the form of (2.5.9). Still, there is the issue of whether or not $\phi$ exists. If $\phi$ does exist, and we further assume that $M(t, y)$ and $N(t, y)$ have continuous first-order partial derivatives, then it follows from Clairaut’s Theorem in multivariable calculus that

$$M_y(t, y) = \phi_{ty} = \phi_{yt} = N_t(t, y)$$

Thus, if (2.5.9) is exact, then it must be the case that $M_y = N_t$. Said differently, if $M_y \ne N_t$, then the differential equation is not exact. In fact, it turns out that if $M_y = N_t$ (at least on a rectangular region of the $ty$-plane), then the equation is guaranteed to be exact, but this result is much more difficult to prove. As a consequence of this, it suffices for us to check if $M_y = N_t$ as a first step; if so, the equation is indeed exact and we then proceed to try to find the function $\phi$ in order to solve the differential equation. If not, another approach is needed. An example is instructive.

Example 2.5.4 Solve the differential equation

$$\frac{t}{y}\, y' + \ln(ty) + 1 = 0$$

Solution. We begin by observing that this equation is neither linear nor separable. Thus, writing the derivative in differential notation, we have

$$\frac{t}{y} \frac{dy}{dt} + \ln(ty) + 1 = 0$$

and then rearranging algebraically,

$$(\ln(ty) + 1)\, dt + \frac{t}{y}\, dy = 0 \tag{2.5.10}$$

Letting $M(t, y) = \ln(ty) + 1$ and $N(t, y) = t/y$, we observe that

$$M_y = \frac{1}{ty} \cdot t = \frac{1}{y} \quad \text{and} \quad N_t = \frac{1}{y}$$

and therefore, $M_y = N_t$. Hence the differential equation is exact and we can assume that a function $\phi$ exists such that $\phi_t = M(t, y)$ and $\phi_y = N(t, y)$. Since the latter equation is more elementary, we consider $\phi_y = t/y$, and integrate both sides with respect to $y$. Doing so, we find that

$$\phi(t, y) = t \ln y + h(t) \tag{2.5.11}$$

From (2.5.10), $\phi$ must also satisfy $\phi_t = \ln(ty) + 1$, so we take the partial derivative of both sides of (2.5.11) with respect to $t$ to find that

$$\phi_t = \ln y + h'(t) = \ln(ty) + 1$$

From this and properties of the logarithm, we observe that

$$\ln y + h'(t) = \ln t + \ln y + 1$$

and thus $h'(t) = \ln t + 1$. It follows (integrating by parts and simplifying) that $h(t) = t \ln t$. Thus, we have demonstrated that the original equation is indeed exact by finding $\phi(t, y) = t \ln y + t \ln t = t \ln(ty)$. From here, we now know that $\phi(t, y) = K$, and so

$$t \ln(ty) = K$$

Solving for $y$, we have that

$$y = \frac{1}{t}\, e^{K/t}$$
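The exactness test and the final answer of example 2.5.4 can both be spot-checked numerically. The sketch below is illustrative only: the helper `partial`, the sample points, and the value $K = 1.5$ are assumptions chosen here, and partial derivatives are estimated by central differences. It compares $M_y$ with $N_t$ and checks that $y = \frac{1}{t}e^{K/t}$ keeps $\phi(t, y) = t \ln(ty)$ constant.

```python
import math


def M(t, y):
    """Coefficient of dt in (2.5.10)."""
    return math.log(t * y) + 1


def N(t, y):
    """Coefficient of dy in (2.5.10)."""
    return t / y


def partial(func, t, y, wrt, h=1e-6):
    """Central-difference partial derivative of func with respect to 't' or 'y'."""
    if wrt == "t":
        return (func(t + h, y) - func(t - h, y)) / (2 * h)
    return (func(t, y + h) - func(t, y - h)) / (2 * h)


def phi(t, y):
    """Potential function found in the example."""
    return t * math.log(t * y)


def solution(t, K):
    """Explicit solution y = e^{K/t} / t."""
    return math.exp(K / t) / t


# Sample points with t*y > 0, where ln(ty) is defined.
points = [(0.5, 2.0), (1.0, 1.0), (2.0, 0.7)]
exactness_gap = max(
    abs(partial(M, t, y, "y") - partial(N, t, y, "t")) for t, y in points
)

K = 1.5  # arbitrary constant chosen for illustration
phi_drift = max(abs(phi(t, solution(t, K)) - K) for t in (0.5, 1.0, 2.0, 4.0))

print("max |M_y - N_t|:", exactness_gap)
print("max |phi - K| along the solution:", phi_drift)
```

Agreement at a handful of points is evidence, not proof, that $M_y = N_t$; the symbolic computation in the example is what establishes exactness.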

Exercises 2.5

Classify each of the DEs in exercises 1–14 as linear, nonlinear, separable, or exact. Note that it is possible for an equation to satisfy more than one classification.

1. $y' = 10y$
2. $y' = 10y + 10$
3. $y' = 10y^2$
4. $y' = 10y^2 - 10$
5. $t^2 y' + y^2 = 1$
6. $e^{3t + y} \dfrac{dy}{dt} = 1$
7. $t\, dy - (y - 1)\, dt = 0$
8. $\dfrac{dy}{dt} = \dfrac{5ty - t}{4 + t^2}$
9. $y - t \dfrac{dy}{dt} = 6 - 3t^2 \dfrac{dy}{dt}$
10. $\dfrac{dy}{dt} = \dfrac{-2ty}{t^2 + 1}$
11. $(2 + t^2)y' + 2ty = 0$
12. $3y^2 y' + t^2 = 0$
13. $(y + t)y' + y = t$
14. $y' \sin 2t + 2y \cos 2t = 0$

Solve each of the DEs in exercises 15–28.

15. $y' = 10y$
16. $y' = 10y + 10$
17. $y' = 10y^2$
18. $y' = 10y^2 - 10$
19. $t^2 y' + y^2 = 1$
20. $e^{3t + y} \dfrac{dy}{dt} = 1$
21. $t\, dy - (y - 1)\, dt = 0$
22. $\dfrac{dy}{dt} = \dfrac{5ty - t}{4 + t^2}$
23. $y - t \dfrac{dy}{dt} = 6 - 3t^2 \dfrac{dy}{dt}$
24. $\dfrac{dy}{dt} = \dfrac{-2ty}{t^2 + 1}$
25. $(2 + t^2)y' + 2ty = 0$
26. $3y^2 y' + t^2 = 0$
27. $(y + t)y' + y = t$
28. $y' \sin 2t + 2y \cos 2t = 0$

Solve each of the IVPs stated in exercises 29–42. In addition, use a computer algebra system to plot an appropriate direction field for each, and sketch your solution within the plot.

29. $y' = 10y$, $y(0) = 3$
30. $y' = 10y + 10$, $y(0) = 2$
31. $y' = 10y^2$, $y(1) = 4$
32. $y' = 10y^2 - 10$, $y(1) = -1$
33. $t^2 y' + y^2 = 1$, $y(2) = 0$
34. $e^{3t + y} \dfrac{dy}{dt} = 1$, $y(0) = 0$
35. $t\, dy - (y - 1)\, dt = 0$, $y(1) = 3$
36. $\dfrac{dy}{dt} = \dfrac{5ty - t}{4 + t^2}$, $y(1) = 1$
37. $y - t \dfrac{dy}{dt} = 6 - 3t^2 \dfrac{dy}{dt}$, $y(1) = 5$
38. $\dfrac{dy}{dt} = \dfrac{-2ty}{t^2 + 1}$, $y(0) = 4$
39. $(2 + t^2)y' + 2ty = 0$, $y(1) = 1$
40. $3y^2 y' + t^2 = 0$, $y(0) = 1$
41. $(y + t)y' + y = t$, $y(0) = 1$
42. $y' \sin 2t + 2y \cos 2t = 0$, $y(\pi/4) = 1/2$

43. Consider the IVP $y' = \sqrt{y}$, $y(0) = 0$. Show that this IVP has more than one solution. Does this result contradict theorem 2.2.1?

2.6 Euler’s method

While we have learned to solve certain classes of differential equations explicitly (including linear first-order, separable, and exact equations), we must also develop the ability to estimate solutions to initial-value problems that we cannot solve analytically. Direction fields will play a key role in motivating our work, as we see in the following introductory example. Consider the initial-value problem

$$\frac{dy}{dt} + y^2 = t, \quad y(0) = 1 \tag{2.6.1}$$

This DE is not linear due to the presence of $y^2$. In addition, since we can write $y' = t - y^2$, we see that the right-hand side may not be expressed as a product of two functions that each involve just one of the variables $t$ and $y$. Thus, the equation is not separable. Finally, writing the equation in the form $dy + (y^2 - t)\, dt = 0$, it is straightforward to check that this equation is not exact. While it may seem frustrating to not be able to use any of the solution methods we have discussed so far, it is important to realize that many differential equations cannot be solved explicitly by analytic techniques. As such, we must explore how we can use our understanding of derivatives to estimate certain values of the solution to an IVP.

For the given DE, writing $y' = t - y^2$, we can generate the direction field that is shown in figure 2.5. For the initial condition $y(0) = 1$, visually estimating how the solution $y(t)$ will flow through the direction field, we can roughly estimate that $y(1/2) \approx 0.75$. But if we think about the calculus underpinnings of slope fields, we can be much more precise in our estimate.

Recall that a direction field for a DE $y' = f(t, y)$ is created by observing that the slope of the tangent line to the solution curve $y(t)$ at the point $(t_0, y_0)$ is $f(t_0, y_0)$. In the current example, we know that the solution to the IVP must pass through the point $(t_0, y_0) = (0, 1)$. At this point, the slope of the tangent line to the solution curve is $m = 0 - 1^2 = -1$; note also that $m \approx \Delta y / \Delta t$, where $\Delta y$ is the exact change in $y$ from $t = 0$ to $t = 1/2$, due to the fact that the tangent line approximates the solution curve for values near the point of tangency. Thus, as we step from $t_0 = 0$ to $t = 1/2$, a change of $1/2$ in the $t$-direction will generate an approximate change $\Delta y = \Delta t \cdot m = 1/2 \cdot (-1) = -1/2$ in $y$. Therefore, from our original $y$-value of 1, a change of $-1/2$ leads us to the approximation that $y(1/2) \approx 1/2$.

[Figure 2.5: The direction field for (2.6.1).]

[Figure 2.6: Taking one step to estimate $y(0.5)$ in (2.6.1).]

Graphically, this estimation approach amounts to following the tangent line to the solution curve for some prescribed change in $t$. We can see this in figure 2.6, where it is immediately evident that our estimate is too small. In calculus, we learn that while the tangent line approximation to a differentiable function is good near the point of tangency, the approximation gets poorer and poorer the further we move from the point of tangency. Thus, a natural approach to the estimation problem at hand is to take a smaller step, then search the direction field for a new direction to follow, and then take another small step. In this situation, we are much like a hiker lost in the woods who is attempting to navigate by compass: just as the hiker is best served by checking a compass frequently, so are we best served by checking slopes frequently.

[Figure 2.7: Two steps of size 0.25 to estimate $y(0.5)$ in (2.6.1).]

So, rather than stepping the full distance of 1/2 from $t = 0$ to $t = 1/2$, let us first step to $t = 1/4$, find an estimate to $y(1/4)$, and then proceed from there to estimate $y(1/2)$. Starting at $(0, 1)$, we know that the slope of the tangent line to the solution curve at this point is $m_0 = f(0, 1) = -1$. Stepping $\Delta t = 0.25$, it follows that we experience a change in $y$ along the tangent line of $\Delta y = m_0 \Delta t = -1(0.25) = -0.25$. Thus, we have that $y(0.25) \approx y(0) + \Delta y = 1 - 0.25 = 0.75$.

Now we repeat this process from the point $(0.25, 0.75)$. At this point, the slope of the tangent line to the solution curve is $m_1 = f(0.25, 0.75) = 0.25 - (0.75)^2 = -0.3125$. Taking a step of $\Delta t = 0.25$, it follows that the change in $y$ along the tangent line will be $\Delta y = m_1 \Delta t = -0.3125(0.25) = -0.078125$. Thus we have that $y(0.5) \approx 0.75 - 0.078125 = 0.671875$. We record our work graphically in figure 2.7, where our improved approximation is apparent, though the estimate is still too small.

It is evident from our work in this first example that we can significantly improve our ability to estimate an initial-value problem’s solution at various $t$-values by developing an iterative process that uses reasonably small step sizes. In particular, we want to imitate the way in which we took two steps, but rather be able to take $n$ steps using a step-size of $\Delta t = h$. Throughout, the key idea is always that we are estimating the solution function by determining its tangent line at a given point, and then following the tangent line for the determined step size. We observe that when moving along any line from a given point $(t_{\text{old}}, y_{\text{old}})$ to a new point $(t_{\text{new}}, y_{\text{new}})$, it follows that

$$y_{\text{new}} = y_{\text{old}} + \Delta y = y_{\text{old}} + \frac{\Delta y}{\Delta t} \cdot \Delta t = y_{\text{old}} + m \cdot \Delta t \tag{2.6.2}$$
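The one-step and two-step estimates worked out above are direct applications of the update in (2.6.2). A minimal Python sketch follows; the names `f` and `follow_tangent` are illustrative assumptions, with `f` standing for the slope $t - y^2$ obtained from (2.6.1).

```python
def f(t, y):
    """Slope of the solution curve for the DE y' = t - y^2 in (2.6.1)."""
    return t - y**2


def follow_tangent(t_old, y_old, dt):
    """Apply y_new = y_old + m * dt once, with m = f(t_old, y_old)."""
    m = f(t_old, y_old)
    return t_old + dt, y_old + m * dt


# One step of size 0.5 from the initial condition (0, 1).
_, y_one_step = follow_tangent(0.0, 1.0, 0.5)

# Two steps of size 0.25 from the same starting point.
t1, y1 = follow_tangent(0.0, 1.0, 0.25)
t2, y2 = follow_tangent(t1, y1, 0.25)

print("one step: ", y_one_step)
print("two steps:", y1, y2)
```

Returning the new $t$-value along with the new $y$-value lets each call feed the next, which is the pattern that a loop over $n$ steps would follow.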


Another essential observation to make is that the slope $m$ at each step of our approximation is given by $m = y' = f(t, y)$ in the differential equation that we are attempting to solve. In particular, if we have some approximation at time $t_k$ given by $y_k$, the slope of the tangent line to the solution curve at this point is given by $f(t_k, y_k)$. Therefore, using this value for $m$ in (2.6.2) and letting $h = \Delta t$ be the step size, we now have

$$y_{\text{new}} = y_{\text{old}} + hf(t_{\text{old}}, y_{\text{old}}) \tag{2.6.3}$$

Hence, starting from the initial condition $(t_0, y_0)$, we are able to generate the sequence of points $(t_1, y_1), \ldots, (t_n, y_n)$, where for each $n \ge 0$,

$$t_{n+1} = t_n + h \quad \text{and} \quad y_{n+1} = y_n + hf(t_n, y_n) \tag{2.6.4}$$

The value $y_n$ is an approximation of the exact solution value $y(t_n)$ at each step, so that $y_n \approx y(t_n)$ for each $n \ge 1$. This method of approximating the solution to an initial-value problem is known as Euler’s method.

Example 2.6.1 For the initial-value problem

$$\frac{dy}{dt} + y^2 = t, \quad y(0) = 1$$

that we have just considered, apply Euler’s method to estimate the value of $y(1/2)$ using $h = 0.1$.

Solution. At the end of this section, the implementation of Euler’s method in a spreadsheet such as Excel will be discussed. Here, we simply report the results of such a computer implementation. If we use a step size of $h = 0.1$, we see that we will take five steps to move from $t_0 = 0$ to $t_5 = 0.5$, the point at which we seek to approximate $y$. Doing so yields the output shown in table 2.1.

Table 2.1 Euler’s method applied to the IVP $y' = t - y^2$, $y(0) = 1$, using $h = 0.1$

t_n     y_n
0       1
0.1     0.9
0.2     0.829
0.3     0.7802759
0.4     0.749392852
0.5     0.733233887

With just five steps, we can see in the direction field in figure 2.8, together with a piecewise linear plot of the approximate solution, that we have an apparently good estimate in the above table for how the actual solution to this IVP behaves on this interval.

[Figure 2.8: Five steps of size $h = 0.1$ to estimate $y(0.5)$.]

In the example we have been considering with various step sizes, one shortcoming is that we do not have a precise sense of how accurate our approximations are. One way to explore this issue is to apply Euler’s method to an IVP that we can solve exactly, and then compare our estimates with actual solution values. We do so in the following example.

Example 2.6.2 Solve the IVP $y' = y - t$, $y(0) = 0.5$ exactly, and use Euler’s method with the step sizes $h = 0.2$ and $h = 0.1$ to estimate the value of $y(1)$. Hence analyze the effect that step size has on error in the method.

Solution. We first observe that $y' = y - t$ is a linear first-order DE. Applying our work from section 2.3, we can determine that the solution to this equation is $y = 1 + t + Ce^t$. The initial condition $y(0) = 0.5$ then implies that $C = -1/2$, so that the solution to the IVP is

$$y(t) = 1 + t - \frac{e^t}{2}$$

If we apply Euler’s method with $h = 0.2$ and take five steps, recording $y_n$ and the exact value $y(t_n)$ at each step, the resulting output is shown in table 2.2. Here, we observe the obvious pattern that the further we step away from the initial condition, the greater the error we encounter. This is a natural consequence of the use of linear approximations. To get a further sense of how the error at a given step depends on step size, we now apply the same method with $h = 0.1$. Doing so produces the results in table 2.3. For ease of display and comparison to the case where $h = 0.2$, we only report the results from every other step. By comparing the approximations in tables 2.2 and 2.3 at the common values $t = 0.2, 0.4, 0.6, 0.8, 1$, we can see that cutting the step size in half appears to have reduced the error by a factor of approximately 2.


Table 2.2 Euler’s method applied to the IVP $y' = y - t$, $y(0) = 0.5$, using $h = 0.2$

        Euler Est.    Solution      Error
t_n     y_n           y(t_n)        |y(t_n) − y_n|
0       0.5           0.5           0
0.2     0.6           0.5892986     0.0107014
0.4     0.68          0.6540877     0.0259123
0.6     0.736         0.6889406     0.0470594
0.8     0.7632        0.6872295     0.0759705
1.0     0.75584       0.6408591     0.1149809

Table 2.3 Euler’s method applied to the IVP $y' = y - t$, $y(0) = 0.5$, using $h = 0.1$

        Euler Est.     Solution      Error
t_n     y_n            y(t_n)        |y(t_n) − y_n|
0       0.5            0.5           0
0.2     0.595          0.5892986     0.0057014
0.4     0.66795        0.6540877     0.0138623
0.6     0.7142195      0.6889406     0.0252789
0.8     0.728205595    0.6872295     0.0409761
1       0.70312877     0.6408591     0.0622697
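The comparison in Example 2.6.2 can also be reproduced outside a spreadsheet. The following Python sketch is an illustration only and not part of the text's Excel approach; the function names `euler`, `exact`, and `error_at_one` are our own choices. It applies rule (2.6.4) to $y' = y - t$, $y(0) = 0.5$ and reports the error at $t = 1$ for each step size.

```python
import math

def euler(f, t0, y0, h, n_steps):
    """Apply Euler's method; return the lists [t_0..t_n] and [y_0..y_n]."""
    ts, ys = [t0], [y0]
    for n in range(n_steps):
        # y_{n+1} = y_n + h * f(t_n, y_n), using the most recent point
        ys.append(ys[-1] + h * f(ts[-1], ys[-1]))
        ts.append(t0 + (n + 1) * h)
    return ts, ys

def f(t, y):
    # Right-hand side of the DE in Example 2.6.2
    return y - t

def exact(t):
    # Exact solution found in Example 2.6.2
    return 1 + t - math.exp(t) / 2

def error_at_one(h):
    """Absolute error of the Euler estimate of y(1) for step size h."""
    n_steps = round(1 / h)
    ts, ys = euler(f, 0.0, 0.5, h, n_steps)
    return abs(exact(ts[-1]) - ys[-1])

for h in (0.2, 0.1):
    print(h, error_at_one(h))
```

The two printed errors should agree with the final rows of tables 2.2 and 2.3; trying smaller values of `h` is a simple way to explore how the error continues to shrink.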

In fact, there are sophisticated ways by which we can analyze the error of Euler’s method in general; we explore these and related issues in depth in chapter 7 on numerical methods. And while Euler’s method can give us an intuitive sense for how a solution is behaving locally, we must note here that its error grows too fast to make it reliable. More sophisticated algorithms for numerically estimating solutions to differential equations exist; several of these are developed in chapter 7.

2.6.1 Implementing Euler’s method in Excel

Any spreadsheet program provides a straightforward way to implement Euler’s method. In our calculations, we will use Microsoft Excel. Recall that in Euler’s method, given an initial-value problem $y' = f(t, y)$, $y(t_0) = y_0$, we seek approximations $y_1, y_2, \ldots$ such that $y_n \approx y(t_n)$, where $t_n = t_0 + nh$ for some chosen step size $h$. In particular, we use the rule

$$y_{n+1} = y_n + h f(t_n, y_n)$$


In a given row of the spreadsheet, we will view the data (as labeled in the cells below) step number $n$, step size $h$, $t$-value $t_n$, approximate current $y$-value $y_n$, slope $f(t_n, y_n)$, and updated $y$-value $y_{n+1}$. We will demonstrate the development of such an Excel spreadsheet for the particular example $y' = t - y^2$, $y(0) = 1$ using a step size of $h = 0.1$. To begin, we establish names for the various columns in cells A1, B1, C1, D1, E1, and F1 by entering the text “n”, “h”, etc., in the respective cells shown below.

     A    B    C      D      E             F
1    n    h    t_n    y_n    f(t_n,y_n)    y_n+1

In row 2, we now enter the given data at step zero. In particular, in cell A2 we enter the step number (“0”), in B2 the chosen step size (“0.1”), in C2 the starting $t$-value (“0”), in D2 the starting $y$-value (“1”), and in E2, we apply the function $f(t, y)$ to get the slope at the point at this step. That is, since in this IVP $f(t, y) = t - y^2$, we enter in E2 the command “=C2 - D2^2”. We now also have enough information entered to compute $y_1$ in cell F2. Using the rule from Euler’s method, we know $y_1 = y_0 + h f(t_0, y_0)$. In our spreadsheet, this implies we must enter “=D2 + B2*E2”. Doing so, the result ($y_1 = 0.9$) appears in cell F2. Now our spreadsheet should appear as shown.

     A    B      C      D      E             F
1    n    h      t_n    y_n    f(t_n,y_n)    y_n+1
2    0    0.1    0      1      −1            0.9

In row 3, we may now build subsequent entries based on existing data. To increase the step number, in A3 we enter “=A2 + 1”. Since the step size stays constant throughout, in B3 we input “=B2”. Because the next $t$-value will be the preceding $t$-value plus the step size ($t_1 = t_0 + h$), we enter in C3 the command “=C2 + B2”. We also have the next $y$-value, so in D3 we enter “=F2” to have this data available in the given row. The slope at step 1 is computed according to the same rule (given by $f(t, y)$) as it was at step 0. Hence in cell E3 we simply paste a copy of cell E2, which ensures that Excel uses the same computations, but updates them for the current step. Equivalently, we can directly enter in E3 the text “=C3 - D3^2”. Cell F3 computes the newest $y$-value: the same rule as in step 0 must be followed, so we can copy and paste cell F2 into F3, or equivalently enter in F3 “=D3 + B3*E3”.


At this stage, we see on the screen the following.

     A    B      C      D      E             F
1    n    h      t_n    y_n    f(t_n,y_n)    y_n+1
2    0    0.1    0      1      −1            0.9
3    1    0.1    0.1    0.9    −0.71         0.829

Now we can harness the power of Excel to compute as many subsequent steps as we like. By using the mouse to highlight row 3 (cells A3 through F3), and then placing the cursor on the bottom right corner of cell F3, we can click and drag downward to fill subsequent rows with similar calculations. For example, doing so through step 5 (i.e., down to row 7, ending at cell F7) yields the following table.

     A    B      C      D              E               F
1    n    h      t_n    y_n            f(t_n,y_n)      y_n+1
2    0    0.1    0      1              -1              0.9
3    1    0.1    0.1    0.9            -0.71           0.829
4    2    0.1    0.2    0.829          -0.487241       0.7802759
5    3    0.1    0.3    0.7802759      -0.30883048     0.749392852
6    4    0.1    0.4    0.749392852    -0.161589647    0.733233887
7    5    0.1    0.5    0.733233887    -0.037631934    0.729470694
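For readers who prefer a scripting language to a spreadsheet, the same row-by-row layout can be imitated in a few lines of Python. This is only a sketch under our own naming (`euler_rows` is a hypothetical helper, not something from the text): each tuple holds the six spreadsheet columns $n$, $h$, $t_n$, $y_n$, $f(t_n, y_n)$, and $y_{n+1}$ for the IVP $y' = t - y^2$, $y(0) = 1$.

```python
def euler_rows(f, t0, y0, h, n_rows):
    """Build rows (n, h, t_n, y_n, f(t_n, y_n), y_{n+1}) like the spreadsheet."""
    rows = []
    t, y = t0, y0
    for n in range(n_rows):
        slope = f(t, y)             # column E: slope at the current point
        y_next = y + h * slope      # column F: Euler update
        rows.append((n, h, t, y, slope, y_next))
        # The next row starts from the updated point (columns C and D)
        t, y = t0 + (n + 1) * h, y_next
    return rows

rows = euler_rows(lambda t, y: t - y**2, 0.0, 1.0, 0.1, 6)
for row in rows:
    print("{:d}  {:.2f}  {:.1f}  {:.9f}  {:.9f}  {:.9f}".format(*row))
```

Changing the single argument `0.1` plays the same role as editing cell B2: every row is recomputed with the new step size.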

Besides the ease of iteration past the first two rows, there are further advantages Excel offers. One is that changing one appropriately chosen cell will update all of our computations. For example, if we are interested in the change induced by a different step size, say $h = 0.05$, all we need to do is enter “0.05” in cell B2, and every other cell will update accordingly. In addition, if we desire to see the graphical results of our work, we can use Excel’s Chart Wizard. To plot our approximations, we can simultaneously highlight the $t$ and $y$ columns in our table above (cells C2 through C7 and D2 through D7), and then go to the Insert menu and select Chart (alternatively, we may click on the Chart Wizard icon on the toolbar). In the prompt window that arises, we choose “XY (Scatter)” and select one of the graph style options at the right by clicking on the desired one. By clicking “Next” in a few subsequent windows (in which advanced users can avail themselves of more options), we eventually get to a final window where our graph appears along with the option to “Finish.” Clicking on “Finish,” the graph will appear in the spreadsheet and may be moved around by clicking and dragging it accordingly. We see the resulting plot displayed as in figure 2.9.

Figure 2.9 An Excel plot of an approximate solution to the IVP $y' = t - y^2$, $y(0) = 1$, for $0 \le t \le 0.5$.

Exercises 2.6

1. Consider the IVP $y' = t/y$, $y(1) = 3$ (where we assume that $y$ is always positive).
(a) Program Excel to use Euler’s method to determine an estimate of the value of $y(3)$. Do so using a step size of $h = 0.2$. Show the results in a table and create an appropriate plot of the approximate solution.
(b) Use an established solution method to determine an algebraic formula for the unique solution $y(t)$ for the given IVP. Then determine $y(t_n)$ exactly and use Excel to determine the error in your approximation at each step $n$. Finally, compare a plot of $y(t)$ to your plot of the approximation above.
(c) Use a computer algebra system appropriately to plot a direction field for the given differential equation. By hand, sketch a solution that satisfies the above IVP. Compare your work in (a) and (b) to the direction field.

2. Consider the IVP $y' = (1 - t)(1 + y)$, $y(0) = 2$.
(a) Program Excel to use Euler’s method to determine an estimate of the value of $y(1.6)$. Do so using step sizes of $h = 0.2$ and $h = 0.1$. Show the results in a table and create an appropriate plot of the approximate solution.


(b) Use an established solution method to determine an algebraic formula for the unique solution $y(t)$ for the given IVP. Then determine $y(t_n)$ exactly and use Excel to determine the error in your approximation at each step $n$. Finally, compare a plot of $y(t)$ to your plot of the approximation above.
(c) Use a computer algebra system appropriately to plot a direction field for the given differential equation. By hand, sketch a solution that satisfies the above IVP. Compare your work in (a) and (b) to the direction field.

3. Consider the IVP $y' = (t - y)^2/4$, $y(0) = 1/2$.
(a) Program Excel to use Euler’s method to determine an estimate of the value of $y(1.5)$. Do so using step sizes of $h = 0.1$ and $h = 0.05$. Show the results in a table and create an appropriate plot of the approximate solution.
(b) Explain why you cannot solve the given IVP explicitly.
(c) Use a computer algebra system appropriately to plot a direction field for the given differential equation. By hand, sketch a solution that satisfies the above IVP. Compare your work in (a) to the direction field.

4. Consider the IVP $y' = e^t - \frac{2}{t}y$, $y(1) = 4$, $t > 0$.
(a) Program Excel to use Euler’s method to determine an estimate of the value of $y(2.2)$. Do so using step sizes of $h = 0.1$ and $h = 0.05$. Show the results in a table and create an appropriate plot of the approximate solution.
(b) Use an established solution method to determine an algebraic formula for the unique solution $y(t)$ for the given IVP. Then determine $y(t_n)$ exactly and use Excel to determine the error in your approximation at each step $n$. Finally, compare a plot of $y(t)$ to your plot of the approximation above.
(c) Use a computer algebra system appropriately to plot a direction field for the given differential equation. By hand, sketch a solution that satisfies the above IVP. Compare your work in (a) and (b) to the direction field.
In each of exercises 5–10, find an approximate solution to the stated IVP by using Euler’s method with $h = 0.1$ on the interval $[0, 1]$. In addition, find an exact solution and compare the values and plots of the approximate and exact solutions.

5. $y' + 2ty = 0$, $y(0) = -2$
6. $y' = 2y - 1$, $y(0) = 2$
7. $y' - y = 0$, $y(0) = 2$


8. (y )2 + 2y = 0, 9.

y y 2

10. (t

= 8,

+ 1)yy

y(0) = 2

y(0) = 1 = −1 − y 2 ,

y(0) = 2

In each of exercises 11–14, find an approximate solution to the stated IVP by using Euler’s method with $h = 0.1$ on the interval $[0, 1]$. In addition, explain why it is not possible to solve the IVP exactly by established methods.

11. $(y')^2 - 2y^2 = t$, $y(0) = 2$
12. $y' - \sin y = 2e^t$, $y(0) = 0$
13. $y' + y^3 = t^3$, $y(0) = 2$
14. $(t + 1)yy' = -1 - y^2 - t^2$, $y(0) = 2$

2.7 Applications of nonlinear first-order differential equations

In this section, we explore two examples of nonlinear differential equations. It is important to recall that if an equation is nonlinear, it is possible that we may not be able to solve for the solution function explicitly. Regardless, we can use direction fields to qualitatively understand the behavior of solution curves; furthermore, if we are unable to find an exact solution function, we may employ Euler’s method to generate approximate solutions.

2.7.1 The logistic equation

We have recently learned that if a population is assumed to grow at a constant relative growth rate (or in a way such that the rate of change of the population is proportional to the size of the population), then the population function satisfies the initial-value problem

$$P' = kP, \quad P(0) = P_0$$

This leads to the familiar population model $P(t) = P_0 e^{kt}$, which is also studied in algebra and calculus courses. While this model is a natural one, it is also unrealistic: over significant periods of time, the function $P$ will grow to values that become unreasonable since the function exhibits unbounded growth. Therefore, we now explore a more plausible population model. Let us assume we know that a given population $P$ has the tendency over time to level off at a value $A$. The value $A$ is often called the carrying capacity of the population; as the name indicates, it is the maximum population sustainable by the surrounding environment. It is natural to further assume that if $P$ is close to, but less than $A$, then $dP/dt$ will be small and positive, indicating that the population will be growing slowly. Similarly, if $P$ is close to, but greater than $A$, we will want $dP/dt$ to be negative and close to zero, so that the population will be decreasing slowly.


At the same time, we want to maintain the natural inherent exponential characteristic of growth, so when $P$ is relatively small (in comparison to $A$), we would like for $dP/dt$ to be approximately $kP$ for some appropriate constant $k$. The combination of all these criteria led Belgian mathematician Pierre Verhulst (1804–1849) to propose the differential equation

$$\frac{dP}{dt} = kP\left(1 - \frac{P}{A}\right) \qquad (2.7.1)$$

as a more realistic model of population growth, where $k$ and $A$ are positive constants. Equation (2.7.1) is known as the logistic differential equation. That the logistic equation may be solved in general (to determine an explicit solution $P$ involving $k$ and $A$) will be shown in the exercises. We consider here a specific example where $k$ and $A$ are given to provide further insight into the behavior of solutions to this equation.

Example 2.7.1 A population $P(t)$ exhibits logistic growth according to the model

$$\frac{dP}{dt} = 0.05P\left(1 - \frac{P}{75}\right), \quad P(0) = 10$$

(a) Determine the values of $P$ for which $P$ is an increasing function.
(b) Plot the direction field for the differential equation.
(c) Determine the value(s) of $P$ for which $P$ is increasing most rapidly.
(d) Solve the IVP explicitly for $P$.

Solution. (a) To determine where $P$ is increasing, we require that $dP/dt > 0$. If $P < 0$, note that $(1 - P/75) > 0$, which makes $dP/dt < 0$, so we need $P > 0$ and $(1 - P/75) > 0$ to make $dP/dt$ positive. This occurs on the interval $0 < P < 75$, so for these $P$ values, $P$ is an increasing function of $t$. We note further that if $P > 75$ or $P < 0$, then $dP/dt < 0$ and $P$ is a decreasing function. Finally, it is evident that both $P = 0$ and $P = 75$ are equilibrium solutions, which makes sense given the physical interpretation of the population model.

(b) Using familiar commands in Maple, we can plot the direction field for this differential equation. Note in advance the behavior we expect from our work above: two equilibrium solutions at 0 and 75, plus certain increasing and decreasing behavior. Finally, note that our analysis of the equation suggests a good range of values to select for $P$ when plotting, say, $P = -10 \ldots 100$. As always, some experimentation with $t$ may be necessary to get a useful plot. The plot is shown in figure 2.10.

Figure 2.10 The slope field for $dP/dt = 0.05P(1 - P/75)$.

(c) To decide where $P$ is increasing most rapidly, we seek the maximum value of $P'$. Graphically, we can observe in figure 2.10 that this appears to occur approximately halfway between $P = 0$ and $P = 75$. This is reasonable in light of the physical meaning of the logistic equation, since at this point the population has accumulated some substantial numbers to increase its growth rate, while not being close enough to the carrying capacity to have its growth slowed. We can determine this point of greatest increase in $P$ analytically as well. Note that

$$P' = 0.05P\left(1 - \frac{P}{75}\right) = 0.05P - \frac{0.05}{75}P^2$$

so that $P'$ is determined by a quadratic function of $P$. We have already observed that this quadratic function has zeros at the equilibrium solutions ($P = 0$ and $P = 75$), and furthermore, we know that every quadratic function achieves its extremum (a maximum in this case, since the function $g(P) = 0.05P - \frac{0.05}{75}P^2$ is concave down) at the midpoint of its zeros. Hence, $P'$ is maximized precisely when $P = 75/2$.

(d) Our final task is to solve the given initial-value problem explicitly for $P$. We first solve the differential equation

$$\frac{dP}{dt} = 0.05P\left(1 - \frac{P}{75}\right)$$

for $P$. Note that this equation is separable and nonlinear. Separating variables, we first write

$$\frac{dP}{P(1 - P/75)} = 0.05\,dt \qquad (2.7.2)$$

Because the left-hand side is a rational function of $P$, we may use the method of partial fractions to integrate the left-hand side of (2.7.2). Observe that

$$\frac{1}{P(1 - P/75)} = \frac{75}{P(75 - P)}$$

Now, letting

$$\frac{75}{P(75 - P)} = \frac{A}{P} + \frac{B}{75 - P}$$

it follows that $A = 1$ and $B = 1$, so that (2.7.2) may now be written as

$$\left(\frac{1}{P} - \frac{1}{P - 75}\right) dP = 0.05\,dt \qquad (2.7.3)$$

Integrating both sides of (2.7.3), we find that $P$ must satisfy the equation

$$\ln|P| - \ln|P - 75| = 0.05t + C$$

Using a standard property of logarithms, the left-hand side may be expressed as $\ln(|P|/|P - 75|)$, and hence using the definition of the natural logarithm, it follows that

$$\left|\frac{P}{P - 75}\right| = e^{0.05t + C} = Ke^{0.05t}$$

where $K = e^C$. Since $K$ is an arbitrary constant, the sign of $K$ will absorb the $\pm$ that arises from the presence of the absolute value signs, and thus we may write

$$\frac{P}{P - 75} = Ke^{0.05t}$$

Multiplying both sides by $P - 75$ and expanding, we see that

$$P = PKe^{0.05t} - 75Ke^{0.05t}$$

and gathering all terms involving $P$ on the left,

$$P(1 - Ke^{0.05t}) = -75Ke^{0.05t}$$

Thus, it follows that

$$P = \frac{-75Ke^{0.05t}}{1 - Ke^{0.05t}}$$

Multiplying the top and bottom of the right-hand side by $-1/(Ke^{0.05t})$, it follows that

$$P = \frac{75}{1 - Me^{-0.05t}}$$

where $M = 1/K$. In this final form, it is evident that as $t \to \infty$, $P(t) \to 75$, which fits with the given carrying capacity in the original problem. At this point, we can use the initial condition $P(0) = 10$ to solve for $M$; doing so results in the equation $10 = 75/(1 - M)$, which yields that $M = -13/2$, and thus

$$P = \frac{75}{1 + \frac{13}{2}e^{-0.05t}}$$

A plot of this function (shown in figure 2.11), along with comparison to our work throughout this example, confirms that our solution behaves as expected.

Figure 2.11 The solution $P = 75/(1 + \frac{13}{2}e^{-0.05t})$ to the IVP $dP/dt = 0.05P(1 - P/75)$, $P(0) = 10$.

For the general logistic differential equation

$$\frac{dP}{dt} = kP\left(1 - \frac{P}{A}\right)$$

an argument similar to the one we just completed can be used to show that the solution to this equation is

$$P(t) = \frac{A}{1 + Me^{-kt}}$$

where $M$ is a constant that may be determined by an initial condition. This fact will be shown in exercise 1 for this section.
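As a quick sanity check on the formula found in Example 2.7.1, one can verify numerically that it satisfies both the initial condition and the differential equation. The Python sketch below is illustrative only: the function names are ours, the value $M = 13/2$ is taken directly from the example, and the derivative is estimated by a centered difference rather than computed symbolically.

```python
import math

def logistic_exact(t):
    """Solution from Example 2.7.1: P(t) = 75 / (1 + (13/2) e^{-0.05 t})."""
    return 75.0 / (1.0 + 6.5 * math.exp(-0.05 * t))

def logistic_rhs(P):
    """Right-hand side of the DE: 0.05 P (1 - P/75)."""
    return 0.05 * P * (1.0 - P / 75.0)

def residual(t, dt=1e-5):
    """Centered-difference estimate of P'(t) minus the DE's right-hand side."""
    dPdt = (logistic_exact(t + dt) - logistic_exact(t - dt)) / (2 * dt)
    return dPdt - logistic_rhs(logistic_exact(t))

print(logistic_exact(0.0))                 # initial value; should be 10
print([residual(t) for t in (0, 20, 60)])  # should all be very close to 0
print(logistic_exact(1000.0))              # long-run value; approaches 75
```

Residuals near zero at several times, together with the correct initial and limiting values, are consistent with the solution derived by separation of variables.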

2.7.2 Torricelli’s law

Suppose that a water tank has a hole in its base with area $a$, through which water is flowing. Let $h(t)$ be the depth of the water and $V(t)$ be the volume of water in the tank at time $t$. At what rates are $h(t)$ and $V(t)$ changing? Evangelista Torricelli (1608–1647) discovered what has come to be known as Torricelli’s law, which describes the way water in an open tank will flow through a small hole in the bottom. To develop this law, let us consider⁶ how water molecules will rearrange themselves as water exits the tank and the relationship between the potential and kinetic energy of a small mass $m$ of water. The potential energy lost as a small mass $m$ of water falls from a height $h > 0$ is $mgh$, where $g$ is the gravitational constant; at the same time, the kinetic energy gained as an equal mass $m$ exits the tank is $\frac{1}{2}mv^2$, where $v$ is the velocity at which the water is flowing. Equating the potential and kinetic energy, we find that $mgh = \frac{1}{2}mv^2$, so that

$$v = \sqrt{2gh}$$

This model assumes that no friction is present; a slightly more realistic model takes a fraction of this velocity, depending on the viscosity. For simplicity, we will consider the ideal case where friction is not considered. If we now consider the water exiting the tank, it follows that the rate of change $dV/dt$ of volume in the tank is determined by the product of the area $a$ of the hole and the exiting water’s velocity $v$. In other words,

$$\frac{dV}{dt} = -av = -a\sqrt{2gh} \qquad (2.7.4)$$

At this point, observe that we have related the rate of change of volume to the height of the water in the tank at time $t$. Instead, we desire to relate either $dV/dt$ and $V$ or $dh/dt$ and $h$. Of course, height and volume are related. If we assume that $A(y)$ denotes the tank’s cross-sectional area at height $y$, then integral calculus tells us that the volume of the tank up to height $h$ is given by

$$V(h) = \int_0^h A(y)\,dy$$

Furthermore, by the Fundamental Theorem of Calculus, differentiating $V(h)$ implies $dV/dh = A(h)$, and thus by the chain rule,

$$\frac{dV}{dt} = \frac{dV}{dh}\frac{dh}{dt} = A(h)\frac{dh}{dt}$$

Using this new expression for $dV/dt$ in (2.7.4), it follows that

$$A(h)\frac{dh}{dt} = -a\sqrt{2gh} \qquad (2.7.5)$$

which is a differential equation in $h$. In particular, this nonlinear equation predicts, given a tank of a particular shape (as determined by $A(h)$) with a hole of area $a$, the behavior of the function $h(t)$ that describes the height of the water at time $t$. We explore this further in the following example.

⁶ Our approach follows that of R. D. Driver in “Torricelli’s law: An Ideal Example of an Elementary ODE,” Amer. Math. Monthly, 105(5) (May 1998), pp. 453–455.

Example 2.7.2 For a cylindrical tank of height 2 m and radius 0.3 m, filled to the top with water, how long does it take the tank to drain once a hole of diameter 4 cm is opened?

Solution. In this situation, the cross-sectional area $A(h)$ of the tank at height $h$ is constant because each cross section is a circle of radius 0.3, so that $A(h) = 0.09\pi$. In addition, the area of the hole in square meters is $a = \pi(0.02)^2 = 0.0004\pi$, and the gravitational constant is $g = 9.8$ m/s$^2$. Since we have already established that $A(h)\,dh/dt = -a\sqrt{2gh}$, we therefore conclude that $h$ satisfies the equation

$$0.09\pi\frac{dh}{dt} = -0.0004\pi\sqrt{19.6h}$$

Figure 2.12 The slope field for $dh/dt = -0.019676\sqrt{h}$.

Simplifying, it follows that

$$\frac{dh}{dt} = -0.019676\sqrt{h}$$

Separating variables, we have $h^{-1/2}\,dh = -0.019676\,dt$, and upon integrating, it follows that $2h^{1/2} = -0.019676t + C$. Thus,

$$h(t) = (C_0 - 0.009838t)^2$$

Because $h(0) = 2$, $C_0 = \sqrt{2}$. Furthermore, with $h(t) = (\sqrt{2} - 0.009838t)^2$, we can see that $h(t) = 0$ when $t \approx 143.75$ sec, at which time the tank is empty. A plot of $h(t)$ confirms precisely the behavior observed in the direction field in figure 2.12.
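The calculation in Example 2.7.2 generalizes to any cylindrical tank, since for constant $A$ equation (2.7.5) integrates to $\sqrt{h} = \sqrt{h_0} - \frac{a}{2A}\sqrt{2g}\,t$. The short Python sketch below packages this; the function name `drain_time` and its parameters are our own, and it assumes a vertical cylinder with a circular hole and no friction.

```python
import math

def drain_time(tank_radius, hole_radius, h0, g=9.8):
    """Time (s) for a full cylindrical tank to empty under Torricelli's law.

    Assumes constant cross-sectional area A, a circular hole of area a,
    and the ideal (frictionless) model A dh/dt = -a sqrt(2 g h).
    All lengths are in meters.
    """
    A = math.pi * tank_radius**2
    a = math.pi * hole_radius**2
    # Setting sqrt(h) = sqrt(h0) - (a / (2A)) sqrt(2g) t equal to zero
    return 2 * A * math.sqrt(h0) / (a * math.sqrt(2 * g))

# Example 2.7.2: radius 0.3 m, hole diameter 4 cm (radius 0.02 m), depth 2 m
print(drain_time(0.3, 0.02, 2.0))
```

The printed value differs very slightly from the 143.75 sec reported in the example because the example rounds the coefficient to 0.009838 before solving for $t$.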

Exercises 2.7

1. For a population $P(t)$ that exhibits logistic growth according to the general model

$$\frac{dP}{dt} = kP\left(1 - \frac{P}{A}\right), \quad P(0) = P_0$$

(a) Determine the values of $P$ (in terms of $A$ and $k$) for which $P$ is an increasing function.
(b) Sketch by hand the direction field for the differential equation, clearly indicating the role of the constant $A$ in your sketch.
(c) Determine the value(s) of $P$ (in terms of $A$ and $k$) for which $P$ is increasing most rapidly, and justify your answer.
(d) Solve the initial-value problem explicitly for $P$ to show that

$$P(t) = \frac{A}{1 + Me^{-kt}}$$

and determine $M$ in terms of $A$ and $P_0$.

2. The growth of an animal population is governed by the equation

$$\frac{500}{P}\frac{dP}{dt} = 50 - P$$

where $P(t)$ is the number of animals in the herd at time $t$. The initial population is known to be 125. Determine the solution $P(t)$, sketch its graph, and decide whether there will ever be more than 125 or fewer than 50 animals present.

3. Consider the differential equation $dP/dt = -0.02P^2 + 0.08P$.
(a) What are the equilibrium solutions to this equation?
(b) Determine whether each equilibrium solution is stable or unstable.
(c) At what value of $P$ is the function growing most rapidly?
(d) Under the initial condition $P(0) = 0.25$, determine the time at which $P(t) = 3$.

4. Consider a fish population that grows according to the model

$$\frac{dP}{dt} = 0.05P - 0.000005P^2$$

where $t$ is measured in years, and $P$ is measured in thousands.
(a) Determine the population of fish at time $t$ if initially $P(0) = 1000$. What is the carrying capacity of the population?
(b) Suppose that the fish population is established as growing according to the above model in the absence of fish being removed from the lake. Suppose that harvesting begins at a rate of 20 000 fish per year. How does the differential equation governing the fish population change? Explain.
(c) Plot a direction field for the updated differential equation you found in part (b). Discuss the new equilibrium solutions for the fish population. Can you solve the IVP with $P(0) = 1000$?
(d) How would the DE change if wildlife biologists began planting 30 000 fish per year in the lake, and no harvesting occurred?

5. Solve the initial-value problem

$$\frac{dP}{dt} = 6 - 7P + P^2, \quad P(0) = 2$$

Sketch your solution curve $P(t)$ and explain why it makes sense in light of the equilibrium solutions to the given equation and your understanding of where $dP/dt$ is positive and negative.


6. A cruise ship leaves port with 2500 vacationers aboard. At the time the boat leaves the dock, ten recent visitors of an amusement park are sick with the flu. Let $S(t)$ denote the number of people at time $t$ who have had the flu at some time since leaving port.
(a) Assuming that the rate at which the flu virus spreads is directly proportional to the product of the number of people who have had the flu times the number of people not yet infected, write a differential equation whose solution is the function $S(t)$. Explain why the differential equation is a logistic equation.
(b) Solve the differential equation you found in (a). Assume that four days into the trip, 150 people have been sick with the flu. Clearly show how all constants are identified, and sketch a graph of your solution curve.
(c) How many people have been sick seven days into the trip? How long would the boat have to stay at sea for half the vacationers to get ill?

7. A cylindrical tank of height 4 m and radius 1 m is full of water. A small hole of diameter 1 cm is opened in the bottom of the tank. Use Torricelli’s law to determine how long it will take for all the water to drain from the tank.

8. A cylindrical tank of height 1.2 m and radius 30 cm is originally full of water. A small hole is opened in the bottom of the tank, and after 15 min, the water in the tank has dropped 10 cm. According to Torricelli’s law, how large is the hole and how long will it take the tank to drain?

9. Consider a tank that is generated by taking the curve $x = \sqrt{y}$ and revolving it about the $y$-axis. Assume that the tank is full of water to a depth of 1.2 m and that a hole of diameter 1 cm is opened in the bottom. Use Torricelli’s law to determine how long it will take for all the water to drain from the tank.

10. Suppose a hemispherical bowl has top radius of 30 cm and at time $t = 0$ is full of water. At that moment a circular hole with diameter 1.2 mm is opened in the bottom of the bowl. Use Torricelli’s law to determine how long it will take for all the water to drain from the bowl.

11. For an open cylindrical tank, Torricelli’s law tells us that if a small hole is opened, the height of the water at time $t$ obeys the IVP

$$\frac{dh}{dt} = -k\sqrt{h}, \quad h(t_0) = h_0$$

where $k$ is a constant that depends on the radius of the tank and the radius of the hole. In this exercise, we will take $k = 1$.
(a) Explain why theorem 2.2.1 does not guarantee a unique solution to the IVP

$$\frac{dh}{dt} = -\sqrt{h}, \quad h(1) = 0$$

(b) Explain why it is physically impossible to determine the height of the water at time $t < 1$ in a tank which satisfies $h(1) = 0$.
(c) Show that for any $c < 1$, the function

$$h(t) = \begin{cases} \left(\tfrac{1}{2}c - \tfrac{1}{2}t\right)^2 & \text{if } t < c \\ 0 & \text{if } t \ge c \end{cases}$$

is a solution to the IVP in (a).
(d) Explain how the result of (c) can be interpreted physically in light of the time when the tank becomes empty. Compare your findings to those in (a) and (b).

2.8 For further study

2.8.1 Converting certain second-order DEs to first-order DEs

Linear second-order differential equations such as

$$y'' + p(t)y' + q(t)y = f(t) \qquad (2.8.1)$$

will be the focus of upcoming work in chapters 3 and 4. But there are some second-order equations we can solve at present. For example, if $q(t) = 0$ in (2.8.1), then we can perform a process called reduction of order to convert the equation to a first-order one.

(a) Consider the second-order equation $y'' + p(t)y' = f(t)$. Using the substitution $u = y'$, convert the equation to a new first-order DE involving the function $u$.
(b) Use a standard solution technique to state the solution $u$ to the differential equation in (a) in terms of $p(t)$ and $f(t)$. (Your answer will involve integrals.)
(c) Explain how you would use your result in (b) to find the solution $y$ to the original DE.
(d) Use reduction of order to solve each of the following second-order IVPs.
(i) $y'' + 2y' = 4$, $y(0) = 2$, $y'(0) = 1$
(ii) $y'' + \tan(t)\,y' = t$, $y(0) = 1$, $y'(0) = 0$
(iii) $y'' + \frac{2t}{1 + t^2}y' = t^2$, $y(0) = 0$, $y'(0) = 1$
(iv) $y'' + \frac{1}{4 - t}y' = 4 - t$, $y(0) = 1$, $y'(0) = 1$
(e) Reduction of order can be performed on certain nonlinear differential equations as well. For instance, suppose that we have an equation of form

$$y'' = g(y')h(t) \qquad (2.8.2)$$

Show that the substitution $u = y'$ converts (2.8.2) to a first-order equation in $u$. Explain how you would approach solving the new equation in $u$.


(f) Solve each of the following second-order IVPs. (i) y = (y )2 t 2 , (ii) y =

t + t (y )2 , y

(iii) y = e 2t +y , (iv) y =

y(0) = 1,

y ,

y (0) = 0

y(0) = 2,

y(0) = 0, y(0) = 3,

y (0) = 1

y (0) = 0 y (0) = 5

2.8.2 How raindrops fall

The following questions and discussion are based on the article "Falling Raindrops" by Walter J. Meyer.⁷ When a raindrop falls, various forces act upon it. We explore several different models that show the importance of adjusting assumptions appropriately to match physical conditions.

Let us first assume that the only force acting upon the raindrop is gravity. Under this assumption, Galileo (1564–1642) hypothesized that the falling raindrop would gain an extra 32 ft/s in velocity for every second for which it falls. In other words, the acceleration of the raindrop is constant and equal to 32 ft/s$^2$.

(a) Let $y(t)$ denote the distance (in feet) traveled by the raindrop after it has been falling for $t$ seconds. Write an initial-value problem involving $y(t)$ based on the above assumption. Solve this IVP; be sure to introduce appropriate initial conditions based on the context of the problem.
(b) Assuming that the raindrop starts from rest at an elevation of 3000 ft, how long does it take the raindrop to fall to earth? What is the raindrop's velocity when it hits the ground? Why is this model unrealistic?
(c) We next attempt to account for the air resistance the raindrop encounters through a slightly more sophisticated model. For a raindrop having diameter $d \le 0.00025$ ft, this model, sometimes known as Stokes' law, states that the acceleration of the raindrop due to gravity is opposed by an acceleration directly proportional to the velocity of the raindrop at that instant. Suppose that the constant of proportionality is given by $c/d^2$, where $c \approx 3.29 \times 10^{-6}$ ft$^2$/s is an experimentally determined constant. Write a new IVP (again involving $y(t)$ and its relevant derivatives) for the raindrop having diameter $d$. Do not yet attempt to solve this equation. Leave $d$ as an unknown constant.
(d) Letting $v = y'$ and using the fact that the raindrop starts from rest, convert the IVP in (c) to a first-order IVP involving $v$. Using $d = 0.00012$ ft (which can be considered a drizzle), produce a slope field corresponding to the differential equation in $v$. On this slope field, sketch a graphical approximation of the solution to the stated IVP. Describe the behavior of the raindrop's velocity based on the slope field you constructed.
(e) In the model in (d), we will say that the long-term limiting velocity of the raindrop is its terminal velocity, denoted $v_{\text{term}}$. Calculate this terminal velocity by using the IVP to answer the following questions: What is the initial velocity of the raindrop? What is the equilibrium solution of the differential equation? What happens to the velocity of the raindrop if it ever reaches the equilibrium value? Why, in view of the differential equation, must the velocity of the raindrop increase from its initial value to the equilibrium value?
(f) Use your result from (e) to determine the terminal velocities for raindrops having diameters of 0.00009, 0.00012, and 0.00015 ft, respectively. Graph $v_{\text{term}}$ as a function of $d$, and comment on the phenomena observed.
(g) Solve the IVP from (d) explicitly for $v$. Graph your solution, and then use your solution to calculate $v_{\text{term}}$ as well.
(h) Assuming that a raindrop of diameter 0.00012 ft starts from rest at 3000 ft, how long does it take the raindrop to fall to the ground? What is its velocity at the instant it hits the ground? Do your answers surprise you? Is it raining hard or barely raining when raindrops are this size?
(i) When the diameter of the raindrop becomes too large, the force of air resistance on the raindrop becomes so appreciable that Stokes' model loses accuracy as well. This leads to a third model, known as the velocity-squared model. This model states that when a raindrop has diameter $d \ge 0.004$ ft, the acceleration due to gravity is opposed by an acceleration directly proportional to the square of the velocity of the raindrop at that instant. Here the constant of proportionality is given by $k/d$, where $k \approx 0.00046$.
(j) Repeat questions (c), (d), and (e) for the velocity-squared model. Compare your findings with those of Stokes' model. For example, how do the terminal velocities of small raindrops compare with those of large raindrops? For which type of raindrop, small or large, does the terminal velocity increase more rapidly as a function of diameter?
(k) Finally, explicitly solve the IVP arising from the velocity-squared model for the velocity function $v(t)$. Graph your solution $v(t)$ for an appropriate choice of $d$ and compare the result to the results in (j).

⁷ See Applications of Calculus, MAA Notes Number 29, pp. 101–111.

2.8.3 Riccati's equation

The Riccati equation

$$y' + p(t)y + q(t)y^2 = f(t) \tag{2.8.3}$$

and its study are attributed to the Italian mathematician Jacopo Riccati (1676–1754). Observe that this nonlinear equation is a modification of the standard linear first-order equation $y' + p(t)y = f(t)$. Through the following steps, we will use a change of variables to transform the Riccati equation into a linear, second-order differential equation.

(a) We consider a change of variables to convert (2.8.3) from a differential equation in $y$ to a new equation in $v$. Let $v$ be a function that satisfies the relationship $v' = q(t)y(t)v(t)$.
    (i) Differentiate $v' = qyv$ with respect to $t$ to show that

    $$v'' = (qyv)' = q'yv + qy'v + qyv' \tag{2.8.4}$$

    (ii) Show that $q'yv = q'v'/q$.
(b) Multiply both sides of the Riccati equation (2.8.3) by $qv$ and use (i) and (ii) to show that the left-hand side may be written

$$vqy' + vqpy + vq^2y^2 = v'' + \left(p - \frac{q'}{q}\right)v' \tag{2.8.5}$$

(c) Use your work in (b) to show that the Riccati equation may now be re-expressed as the second-order equation in $v$ given by

$$v'' + \left(p - \frac{q'}{q}\right)v' - vqf = 0 \tag{2.8.6}$$

(d) Explain how you would solve the Riccati equation in the special case when $f(t) = 0$. Note particularly that to solve (2.8.6) with $f(t) = 0$, you must reduce the order of the equation through an appropriate substitution, say $u = v'$. See section 2.8.1 for further details on this technique. In addition, note that your goal is to find the solution $y$ to the original equation (2.8.3). Be sure to explain how the functions $v$ and $u$ are used in this process.
(e) Solve the following differential equations, each of which is a Riccati equation.
    (i) $y' + 2y + 4y^2 = 0$
    (ii) $y' + \frac{1}{t}y + t^2y^2 = 0$
    (iii) $y' + y\tan t + y^2\cos t = 0$

2.8.4 Bernoulli's equation

The Bernoulli brothers, James (1654–1705) and John (1667–1748), contributed to the solution of

$$y' + p(t)y = q(t)y^n, \quad n \neq 1 \tag{2.8.7}$$

the so-called Bernoulli equation. We will explore the approach credited to John through the following prompts. Similar to the Riccati equation, the Bernoulli equation may be transformed into a linear differential equation through a clever change of variables.

(a) First, multiply (2.8.7) by $y^{-n}$ to obtain

$$y^{-n}y' + p(t)y^{1-n} = q(t) \tag{2.8.8}$$

Next, consider the change of variables $v = y^{1-n}$. Compute $v'$ to show that

$$v' = (1 - n)y^{-n}y' \tag{2.8.9}$$

Now use (2.8.8) and (2.8.9) to show that $v$ satisfies the linear first-order equation

$$v' + (1 - n)p(t)v = (1 - n)q(t) \tag{2.8.10}$$

(b) Explain why in the cases when n = 1, n = 2, $q(t) = 0$, and $p(t) = 0$ the Bernoulli equation reduces to familiar equations whose solutions are known.
(c) Solve these differential equations, each of which is a Bernoulli equation.
    (i) $y' + 2y = ty^3$
    (ii) $y' + \frac{1}{t}y = 3y^3$
    (iii) $y' + y\cot t = y^3\sin t$
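As an illustration of the change of variables in (a), the following worked sketch treats the equation $y' + y = y^2$. This equation is not taken from the text; it is an assumed example chosen so that $n = 2$, $p(t) = 1$, and $q(t) = 1$.

```latex
\begin{align*}
y' + y &= y^{2} && \text{(Bernoulli equation with } n = 2,\ p(t) = 1,\ q(t) = 1\text{)} \\
v &= y^{1-n} = y^{-1} && \text{(change of variables)} \\
v' - v &= -1 && \text{(equation (2.8.10) with } 1 - n = -1\text{)} \\
v(t) &= 1 + Ce^{t} && \text{(general solution of the linear equation)} \\
y(t) &= \frac{1}{1 + Ce^{t}} && \text{(since } y = v^{-1}\text{)}
\end{align*}
```

Because the method multiplies by $y^{-n}$, it assumes $y \neq 0$; in this example the constant function $y = 0$ also satisfies the equation and must be checked separately.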


3 Linear systems of differential equations

3.1 Motivating problems

In section 1.1, we considered how the amount of salt present in a system of two tanks can be modeled through a system of differential equations. In that particular example, we assumed that the volume of solution in each tank (as seen in figure 3.1) remains constant, that all inflows and outflows happen at the identical rate of 5 liters/min, and further that the tanks are uniformly mixed so that the salt concentration is the same throughout each tank at a given time $t$. With the additional premises that the volume of solution in tank A is 200 liters and the independent inflow entering A carries water contaminated with 4 g/liter of salt, we can develop a differential equation that models $x_1(t)$, the amount of salt (in grams) in tank A at time $t$. Likewise, by presuming that tank B holds solution of volume 400 liters and the inflow entering B carries a concentration of salt of 7 g/liter, a similar analysis produces a differential equation whose solution is $x_2(t)$, the amount of salt (in grams) in tank B at time $t$. In particular, we found in (1.1.6) that the following system of differential equations arose:

$$\begin{aligned}
\frac{dx_1}{dt} &= -\frac{x_1}{20} + \frac{x_2}{80} + 20 \\
\frac{dx_2}{dt} &= \frac{x_1}{40} - \frac{x_2}{40} + 35
\end{aligned} \tag{3.1.1}$$

Figure 3.1 Two tanks with inflows, outflows, and connecting pipes.

With our experience in linear algebra, we can now represent this system in matrix notation. In particular, if we simultaneously consider the amounts of salt $x_1(t)$ and $x_2(t)$ as entries in the vector function

$$\mathbf{x}(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix}$$

we know that

$$\mathbf{x}'(t) = \begin{bmatrix} dx_1/dt \\ dx_2/dt \end{bmatrix} \tag{3.1.2}$$

Moreover, in (3.1.1) we recognize the familiar form of a matrix product in the terms involving $x_1$ and $x_2$. Specifically,

$$\begin{bmatrix} -x_1/20 + x_2/80 \\ x_1/40 - x_2/40 \end{bmatrix} = \begin{bmatrix} -1/20 & 1/80 \\ 1/40 & -1/40 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \tag{3.1.3}$$

With the observations from (3.1.2) and (3.1.3) substituted into (3.1.1) and replacing the quantities 20 and 35 with the appropriate vector, we may now write the system of differential equations in the form

$$\mathbf{x}' = \begin{bmatrix} -1/20 & 1/80 \\ 1/40 & -1/40 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 20 \\ 35 \end{bmatrix} \tag{3.1.4}$$

Letting $A$ be the matrix of coefficients that multiplies the vector $\mathbf{x}$ and $\mathbf{b}$ the vector $[20\ 35]^T$, we can also write the system in (3.1.4) in the simplified form

$$\mathbf{x}' = A\mathbf{x} + \mathbf{b} \tag{3.1.5}$$

This form reminds us of the familiar nonhomogeneous linear first-order differential equation with constant coefficients, for instance, an equation such as

$$y' = 2y + 5 \tag{3.1.6}$$
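The agreement between the component equations (3.1.1) and the matrix form (3.1.5) can be checked numerically. The following is a minimal sketch in Python, assuming NumPy is available; the function names and the sample state `x_sample` are hypothetical choices made only for illustration.

```python
import numpy as np

# Coefficient matrix A and constant vector b from (3.1.4)
A = np.array([[-1/20, 1/80],
              [ 1/40, -1/40]])
b = np.array([20.0, 35.0])

def rhs_matrix(x):
    """Right-hand side of x' = Ax + b, as in (3.1.5)."""
    return A @ x + b

def rhs_componentwise(x):
    """Right-hand side of (3.1.1), written one equation at a time."""
    x1, x2 = x
    return np.array([-x1/20 + x2/80 + 20,
                      x1/40 - x2/40 + 35])

# Hypothetical amounts of salt (in grams) in tanks A and B
x_sample = np.array([100.0, 240.0])
print(rhs_matrix(x_sample))
print(rhs_componentwise(x_sample))
```

Both calls evaluate the same rates of change, which is all that the matrix notation asserts; the matrix form simply packages the two equations into one.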

In this chapter, we will study similarities between (3.1.5) and (3.1.6) with the specific goal of learning how to completely solve nonhomogeneous linear systems of differential equations with constant coefficients such as the system (3.1.4). We will be especially interested in the role that linear algebra plays in identifying certain characteristics of the coefficient matrix $A$ that enable us to find all solutions to the system. Before we proceed to an in-depth study of linear systems of differential equations, at least one more motivating example is appropriate.

A spring-mass system is a physical situation that models vibrations; for example, such a system arises any time a mass attached to a spring is set in motion. We choose to envision this situation vertically, as seen in figure 3.2, though one can also imagine the mass resting on a table and moving horizontally.

Figure 3.2 A spring-mass system shown at two different points in time; $-y(t)$ denotes the displacement of the mass from equilibrium (where displacements below the $t$-axis are considered positive).

We consider some of the physics of basic springs and motion under the influence of gravity in order to develop a differential equation that describes the spring-mass system. Initially, the mass will stretch the spring from its natural length. Hooke's law states that the force necessary to stretch a spring a distance $x$ from its natural length is given by the equation

$$F(x) = kx$$

where $k$ is the spring constant. Assume that the mass stretches the spring a distance $L_0$. Then from Hooke's law, when the system is in equilibrium, we see that the force $F_s$ exerted by the spring must be

$$F_s = -kL_0$$

Here the minus sign indicates that the force is opposing the natural downward displacement of the spring. Note particularly that we view the downward direction as positive. We also know that gravity acts on the mass with force $F_g$ given by

$$F_g = mg$$

If the system is in static equilibrium, we know that the sum of the two forces is zero. In other words, $F_g + F_s = 0$ and therefore

$$mg = kL_0$$

Once the system is set in motion by some initial force or displacement, we track the location of the mass at time $t$ with a function $y(t)$. In particular, $y(t)$ represents the displacement of the mass from the equilibrium position at time $t$; note that $y = 0$ is the equilibrium position of the system. We continue to


designate the downward direction as positive, so $y(t) > 0$ means that the mass is below the equilibrium position, while $y(t) < 0$ means the mass is above the equilibrium position. We can see the role $y(t)$ plays in figure 3.2 as it tracks the displacement of the mass from equilibrium and thus traces out a curve with respect to time. We can now use Newton's second law to obtain a differential equation that governs the system. The forces that act on the mass are:

• Gravity, with $F_g = mg$.
• The spring force $F_s$. Note now that at a given time $t$ the displacement of the spring from its natural length is $L_0 + y(t)$, so that by Hooke's law we have $F_s = -k(L_0 + y)$.
• A possible damping force $F_d$. Motion may be damped due to air resistance, friction, or some sort of external damping system (usually called a dashpot). We assume that damping forces are directly proportional to the velocity of the mass. Under this assumption, it follows that $F_d = -cy'$. Again, the minus sign indicates that this force opposes the motion of the mass. The positive constant $c$ is called the damping constant.
• Finally, there may be an external driving force present (such as the periodic force that drives a piston in an engine). We call this a forcing function $F(t)$; the role of forcing functions will be considered in detail later on in this chapter.

Newton's second law demands that the resultant force (that is, the sum of all the forces) on the mass must be equal to $ma$, where $a$ is the body's acceleration (which is also $y''$). Summing all the aforementioned forces and equating the result with $ma = my''$, we find

$$my'' = F_g + F_s + F_d + F(t) \tag{3.1.7}$$

Using the formulas we developed earlier and substituting in (3.1.7) yields

$$my'' = mg - k(L_0 + y) - cy' + F(t) \tag{3.1.8}$$

Now recall that $mg - kL_0 = 0$, rearrange (3.1.8), and divide by $m$. This leads us to the standard form of the differential equation that governs a spring-mass system,

$$y'' + \frac{c}{m}y' + \frac{k}{m}y = \frac{1}{m}F(t) \tag{3.1.9}$$

Note that (3.1.9) is a nonhomogeneous linear second-order differential equation. To see how such a second-order linear differential equation is linked to a system of linear differential equations, let's consider the specific example where $c = 1$, $m = 1$, $k = 6$, and $F(t) = 0$, which results in the equation

$$y'' + y' + 6y = 0 \tag{3.1.10}$$


If we introduce the functions $x_1$ and $x_2$ through the substitutions $x_1 = y$ and $x_2 = y'$, then $x_1(t)$ represents the displacement of the mass at time $t$ and $x_2(t)$ is the velocity of the mass at time $t$. Observe first that

$$x_1' = x_2 \tag{3.1.11}$$

Moreover, since $x_2' = y''$, we can rewrite (3.1.10) as $x_2' + x_2 + 6x_1 = 0$. Equivalently,

$$x_2' = -6x_1 - x_2 \tag{3.1.12}$$

Thus (3.1.11) and (3.1.12) generate the system of differential equations

$$\begin{aligned}
x_1' &= x_2 \\
x_2' &= -6x_1 - x_2
\end{aligned} \tag{3.1.13}$$

which may also be expressed in matrix form as

$$\mathbf{x}' = \begin{bmatrix} 0 & 1 \\ -6 & -1 \end{bmatrix} \mathbf{x} \tag{3.1.14}$$

We have therefore shown that the linear second-order differential equation (3.1.10), a particular case of the spring-mass equation (3.1.9), may be converted to the system of linear first-order equations (3.1.14) through the substitution $x_1 = y$, $x_2 = y'$. In fact, any linear higher order differential equation may be converted through a similar substitution to a system of linear first-order equations. Therefore, by learning to understand and solve systems of linear equations, we will be able to determine the behavior of higher order linear equations as well. It is this fact that motivates us to study systems of linear equations prior to the study of higher order single equations.
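The substitution described above can be automated for a homogeneous linear equation with constant coefficients. The sketch below, written in Python and assuming NumPy is available, builds the coefficient matrix for $y^{(n)} + a_{n-1}y^{(n-1)} + \cdots + a_1y' + a_0y = 0$ under the substitution $x_k = y^{(k-1)}$. The function name `first_order_system_matrix` and the ordering of its argument are illustrative assumptions, not notation from the text.

```python
import numpy as np

def first_order_system_matrix(coeffs):
    """Return A so that x' = Ax is equivalent to
    y^(n) + a_{n-1} y^(n-1) + ... + a_1 y' + a_0 y = 0,
    where coeffs = [a_0, a_1, ..., a_{n-1}] and x_k = y^(k-1)."""
    n = len(coeffs)
    A = np.zeros((n, n))
    # Rows 1 through n-1 encode x_k' = x_{k+1}
    for k in range(n - 1):
        A[k, k + 1] = 1.0
    # The last row encodes x_n' = -a_0 x_1 - ... - a_{n-1} x_n
    A[n - 1, :] = -np.asarray(coeffs, dtype=float)
    return A

# Equation (3.1.10): y'' + y' + 6y = 0, so a_0 = 6 and a_1 = 1
A_spring = first_order_system_matrix([6.0, 1.0])
print(A_spring)
```

For the spring-mass example, the matrix produced has rows $[0\ \ 1]$ and $[-6\ \ {-1}]$, matching (3.1.14).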

3.2 The eigenvalue problem revisited

As we begin our study of linear systems of first-order differential equations, we are ultimately interested in two main questions. First, for a linear system $\mathbf{x}' = A\mathbf{x}$ such as

$$\mathbf{x}' = \begin{bmatrix} 2 & 3 \\ 2 & 1 \end{bmatrix} \mathbf{x}$$

how can we explicitly solve the system for $\mathbf{x}(t)$? Second, what is the long-term behavior of the solution $\mathbf{x}(t)$ to such a system, and how does its graph appear? We start our investigation by thinking carefully about the meaning of the matrix equation $\mathbf{x}' = A\mathbf{x}$ and comparing it with our experience with the single first-order differential equation $x' = ax$. Note that we naturally begin with the homogeneous system $\mathbf{x}' = A\mathbf{x}$; later we will consider nonhomogeneous systems of the form $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$. In every case, we seek a vector function $\mathbf{x}(t)$ that solves the given system. An elementary example is instructive.


Example 3.2.1

Solve the linear system $\mathbf{x}' = A\mathbf{x}$, where

$$A = \begin{bmatrix} -3 & 0 \\ 0 & -1 \end{bmatrix}$$

Explain the role that the eigenvalues and eigenvectors of $A$ play in the general solution, and graph and discuss the solution curves for different choices of initial conditions.

Solution. First, we observe that the system

$$\mathbf{x}' = \begin{bmatrix} -3 & 0 \\ 0 & -1 \end{bmatrix} \mathbf{x}, \quad \text{that is,} \quad \begin{bmatrix} x_1' \\ x_2' \end{bmatrix} = \begin{bmatrix} -3 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \tag{3.2.1}$$

tells us that we seek two functions $x_1(t)$ and $x_2(t)$ such that $x_1' = -3x_1$ and $x_2' = -x_2$. Because the matrix of the system is diagonal, the problem is especially simple. In particular, the system is uncoupled, which means that the differential equation for $x_1$ does not involve $x_2$ and the equation for $x_2$ does not involve $x_1$. From our experience with linear first-order equations, we know that the general solution to $x_1' = -3x_1$ is $x_1(t) = c_1e^{-3t}$ and that the solution to $x_2' = -x_2$ is $x_2(t) = c_2e^{-t}$. Writing the solution to the system as a single vector, we have

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} c_1e^{-3t} \\ c_2e^{-t} \end{bmatrix} \tag{3.2.2}$$

Rewriting $\mathbf{x}$ in another form sheds further insight on the key components of this solution. Writing $\mathbf{x}$ as the sum of two vectors, we find

$$\mathbf{x} = \begin{bmatrix} c_1e^{-3t} \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ c_2e^{-t} \end{bmatrix} = c_1e^{-3t}\begin{bmatrix} 1 \\ 0 \end{bmatrix} + c_2e^{-t}\begin{bmatrix} 0 \\ 1 \end{bmatrix} \tag{3.2.3}$$

Here, we can make a key observation about the eigenvalues and eigenvectors of $A$: because $A$ is diagonal, its eigenvalues are its diagonal entries, $\lambda_1 = -3$ and $\lambda_2 = -1$. Moreover, its corresponding eigenvectors may be easily confirmed to be the vectors

$$\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \quad \text{and} \quad \mathbf{v}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

Thus, in (3.2.3), we see the interesting fact that the solution has the form $\mathbf{x} = c_1e^{\lambda_1t}\mathbf{v}_1 + c_2e^{\lambda_2t}\mathbf{v}_2$; the eigenvalues and eigenvectors therefore play a central role in the system's behavior.

Finally, we explore the solutions to several related initial-value problems for select initial conditions. If we have the initial condition $\mathbf{x}(0) = [4\ 0]^T$, we see in (3.2.3) that $c_1 = 4$ and $c_2 = 0$, so that the solution to the IVP is

$$\mathbf{x}(t) = 4e^{-3t}\begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

Two key observations can be made about this solution curve: one is that its graph is a straight line, since for every value of $t$, $\mathbf{x}$ is a scalar multiple of the


vector $[1\ 0]^T$. Note particularly that the direction of this line is given by the eigenvector corresponding to $\lambda_1 = -3$. The other important fact is that $e^{-3t} \to 0$ as $t \to \infty$, and therefore $\mathbf{x}(t) \to \mathbf{0}$, so that the solution approaches the origin as time increases without bound.

For the initial condition $\mathbf{x}(0) = [0\ 5]^T$, it follows from (3.2.3) that $c_1 = 0$ and $c_2 = 5$, and thus the solution to this IVP is

$$\mathbf{x}(t) = 5e^{-t}\begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

Similar observations about the behavior of this solution may be made to those noted above for the first chosen initial condition: this solution curve is linear and approaches the origin as $t \to \infty$.

Finally, if we consider an initial condition that does not correspond to an eigenvector of the system, such as $\mathbf{x}(0) = [4\ 5]^T$, (3.2.3) tells us that $c_1 = 4$ and $c_2 = 5$, and thus

$$\mathbf{x} = 4e^{-3t}\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 5e^{-t}\begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

This last solution's graph is not a straight line. Figure 3.3, which shows the three different solutions based on the differing initial conditions, displays the consistent behavior that every solution tends to the origin as $t \to \infty$, and shows that the eigenvectors play a key role in how these graphs appear. We will discuss this graphical perspective further in sections 3.4 and 3.5.

Figure 3.3 Plots of solutions to three IVPs for the system in example 3.2.1 (the solutions through (4,0), (0,5), and (4,5) in the $x_1x_2$-plane). Arrows indicate the direction of flow along the solution curve as time increases.

The long-term behavior of the solutions to the system (3.2.1) in example 3.2.1 suggests that every solution tends to the zero vector. In fact, the origin itself is a solution, a so-called constant or equilibrium solution. That is, if we consider whether there is any constant vector $\mathbf{x}$ that is a solution to $\mathbf{x}' = A\mathbf{x}$, it follows that $\mathbf{x}' = \mathbf{0}$, and thus $\mathbf{x}$ must satisfy $A\mathbf{x} = \mathbf{0}$. From our work with homogeneous linear equations, we know that $\mathbf{x} = \mathbf{0}$ is always a solution to this equation, and thus the zero vector is a constant solution to every homogeneous linear system of first-order differential equations. In sections 3.4 and 3.5 we will investigate the so-called stability of this equilibrium solution.

There is a second perspective from which we can see how eigenvectors and eigenvalues arise in the solution of linear systems of differential equations. After constant solutions, the next simplest type of solutions to such a system are straight-line solutions. In other words, solutions whose graph is a straight line in space form a particularly important type of solution to a system. In the preceding example, we saw two such straight-line solutions: each occurred in the direction of an eigenvector and passed through the origin.

In search of a general straight-line solution to $\mathbf{x}' = A\mathbf{x}$, we know that any such solution must have the form $\mathbf{x}(t) = f(t)\mathbf{v}$, where $f(t)$ is a scalar function and $\mathbf{v}$ is a constant vector. This form guarantees that $\mathbf{x}(t)$ traces out a path that is a straight line through $\mathbf{0}$ in the direction of $\mathbf{v}$. In order for $\mathbf{x}(t)$ to satisfy the system, we observe that since $\mathbf{x}'(t) = f'(t)\mathbf{v}$, the equation

$$f'(t)\mathbf{v} = A(f(t)\mathbf{v}) \tag{3.2.4}$$

must hold. Moreover, since $f(t)$ is a scalar, the linearity of matrix multiplication allows us to rewrite (3.2.4) as

$$f'(t)\mathbf{v} = f(t)A\mathbf{v} \tag{3.2.5}$$

Equation (3.2.5) is strongly reminiscent of the equation we use to define eigenvalues and eigenvectors: $A\mathbf{x} = \lambda\mathbf{x}$. In fact, if $f'(t) = \lambda f(t)$, then (3.2.5) implies that

$$\lambda f(t)\mathbf{v} = f(t)A\mathbf{v}$$

Further, if $f(t) \neq 0$, then $\lambda\mathbf{v} = A\mathbf{v}$, and $\lambda$ and $\mathbf{v}$ must be an eigenvalue-eigenvector pair of $A$. It is therefore natural for us to want $f$ to satisfy the single differential equation $f'(t) = \lambda f(t)$. From our work in chapter 2, we know that $f(t) = Ce^{\lambda t}$ is the general solution to this equation. Substituting this form for $f$ in (3.2.5), we now observe that

$$\lambda e^{\lambda t}\mathbf{v} = e^{\lambda t}A\mathbf{v} \tag{3.2.6}$$

and since $e^{\lambda t}$ is never zero, we can simplify (3.2.6) to

$$\lambda\mathbf{v} = A\mathbf{v} \tag{3.2.7}$$

which is satisfied precisely when $\mathbf{v}$ is an eigenvector of $A$ with corresponding eigenvalue $\lambda$. Our most recent work has demonstrated that if $\mathbf{x}(t)$ is a function of the form $\mathbf{x}(t) = e^{\lambda t}\mathbf{v}$ that is a solution to $\mathbf{x}' = A\mathbf{x}$, then $(\lambda, \mathbf{v})$ is an eigenpair of the coefficient matrix $A$. In fact, the converse also holds (as will be shown in


the exercises), so that the following result is true for any $n \times n$ system of linear first-order differential equations.

Theorem 3.2.1 Let $A$ be an $n \times n$ matrix. The vector function $\mathbf{x}(t) = e^{\lambda t}\mathbf{v}$ is a solution to the homogeneous linear system of first-order differential equations given by $\mathbf{x}' = A\mathbf{x}$ if and only if $\mathbf{v}$ is an eigenvector of $A$ with corresponding eigenvalue $\lambda$.

We close this section with one more example to demonstrate theorem 3.2.1 and one of its important consequences.

Example 3.2.2 Consider the system of differential equations given by

$$\begin{aligned}
x_1' &= -2x_1 - 2x_2 \\
x_2' &= -4x_1
\end{aligned}$$

Write the system in the form $\mathbf{x}' = A\mathbf{x}$ and show that $A$ has two real eigenvalues with corresponding linearly independent eigenvectors. Verify by substitution that for each eigenvalue-eigenvector pair, $\mathbf{x}(t) = e^{\lambda t}\mathbf{v}$ is a solution of the system. In addition, show that any linear combination of such solutions is also a solution to the system.

Solution. First, we observe that the system can be expressed in the form $\mathbf{x}' = A\mathbf{x}$ by using the matrix

$$A = \begin{bmatrix} -2 & -2 \\ -4 & 0 \end{bmatrix}$$

We briefly review the process of determining the eigenvalues and eigenvectors of a matrix $A$; in most future occurrences, we will use Maple to determine this information using the commands introduced in section 1.10.2. Since the eigenvalues are the roots of the characteristic equation, we solve $\det(A - \lambda I) = 0$. Doing so,

$$\begin{aligned}
0 = \det(A - \lambda I) &= \det\begin{bmatrix} -2 - \lambda & -2 \\ -4 & -\lambda \end{bmatrix} \\
&= -\lambda(-2 - \lambda) - 8 \\
&= \lambda^2 + 2\lambda - 8 \\
&= (\lambda + 4)(\lambda - 2)
\end{aligned}$$

so the eigenvalues of $A$ are $\lambda = -4$ and $\lambda = 2$. To find the eigenvector $\mathbf{v}$ that corresponds to $\lambda = -4$, we solve the equation $(A - (-4)I)\mathbf{v} = \mathbf{0}$. Row-reducing the appropriate augmented matrix yields

$$\left[\begin{array}{rr|r} 2 & -2 & 0 \\ -4 & 4 & 0 \end{array}\right] \to \left[\begin{array}{rr|r} 1 & -1 & 0 \\ 0 & 0 & 0 \end{array}\right]$$


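which shows that a corresponding eigenvector is any scalar multiple of the vector $\mathbf{v}_1 = [1\ 1]^T$. Similar computations show that for $\lambda = 2$, a corresponding eigenvector is $\mathbf{v}_2 = [1\ {-2}]^T$.

We now verify directly what theorem 3.2.1 guarantees: that $\mathbf{x}_1(t) = e^{-4t}[1\ 1]^T$ and $\mathbf{x}_2(t) = e^{2t}[1\ {-2}]^T$ are solutions to the given system of equations. Observe first that

$$\mathbf{x}_1'(t) = -4e^{-4t}\begin{bmatrix} 1 \\ 1 \end{bmatrix} \tag{3.2.8}$$

and that

$$A\mathbf{x}_1(t) = \begin{bmatrix} -2 & -2 \\ -4 & 0 \end{bmatrix} e^{-4t}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = e^{-4t}\begin{bmatrix} -2 & -2 \\ -4 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = e^{-4t}\begin{bmatrix} -4 \\ -4 \end{bmatrix} = -4e^{-4t}\begin{bmatrix} 1 \\ 1 \end{bmatrix} \tag{3.2.9}$$

Equations (3.2.8) and (3.2.9) confirm that indeed $\mathbf{x}_1'(t) = A\mathbf{x}_1(t)$ and demonstrate the role that eigenvalues and eigenvectors play in the solution. Similarly, for the function $\mathbf{x}_2(t)$,

$$\mathbf{x}_2'(t) = 2e^{2t}\begin{bmatrix} 1 \\ -2 \end{bmatrix}$$

and

$$A\mathbf{x}_2(t) = \begin{bmatrix} -2 & -2 \\ -4 & 0 \end{bmatrix} e^{2t}\begin{bmatrix} 1 \\ -2 \end{bmatrix} = e^{2t}\begin{bmatrix} -2 & -2 \\ -4 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ -2 \end{bmatrix} = e^{2t}\begin{bmatrix} 2 \\ -4 \end{bmatrix} = 2e^{2t}\begin{bmatrix} 1 \\ -2 \end{bmatrix} \tag{3.2.10}$$

This shows that $\mathbf{x}_2'(t) = A\mathbf{x}_2(t)$.

Finally, we are asked to show that any linear combination of $\mathbf{x}_1(t)$ and $\mathbf{x}_2(t)$ is also a solution to the differential equation. While we could confirm this somewhat laboriously through direct computations, it is much easier to work more generally and consider known properties of differentiation and matrix multiplication. In particular, differentiation is a linear operator, so if we let $\mathbf{y}(t) = c_1\mathbf{x}_1(t) + c_2\mathbf{x}_2(t)$ it follows that

$$\mathbf{y}'(t) = (c_1\mathbf{x}_1(t) + c_2\mathbf{x}_2(t))' = c_1\mathbf{x}_1'(t) + c_2\mathbf{x}_2'(t) \tag{3.2.11}$$

Similarly, matrix multiplication is a linear process, so

$$A\mathbf{y}(t) = A(c_1\mathbf{x}_1(t) + c_2\mathbf{x}_2(t)) = c_1A\mathbf{x}_1(t) + c_2A\mathbf{x}_2(t) \tag{3.2.12}$$

Since we have already established that $\mathbf{x}_1'(t) = A\mathbf{x}_1(t)$ and $\mathbf{x}_2'(t) = A\mathbf{x}_2(t)$, it follows that

$$c_1\mathbf{x}_1'(t) + c_2\mathbf{x}_2'(t) = c_1A\mathbf{x}_1(t) + c_2A\mathbf{x}_2(t)$$

so by (3.2.11) and (3.2.12) we have shown that $\mathbf{y}'(t) = A\mathbf{y}(t)$ and thus indeed every linear combination of $\mathbf{x}_1(t)$ and $\mathbf{x}_2(t)$ is also a solution to $\mathbf{x}' = A\mathbf{x}$.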
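The verification carried out by hand in example 3.2.2 can also be repeated numerically. The following Python sketch assumes NumPy is available (the text itself uses Maple for such computations); the helper names `x` and `x_prime` and the constants in `c` are arbitrary illustrative choices.

```python
import numpy as np

A = np.array([[-2.0, -2.0],
              [-4.0,  0.0]])

# Eigenpairs found by hand in example 3.2.2
eigenpairs = [(-4.0, np.array([1.0, 1.0])),
              ( 2.0, np.array([1.0, -2.0]))]

def x(t, c):
    """Linear combination c[0] e^{-4t} v1 + c[1] e^{2t} v2."""
    return sum(ck * np.exp(lam * t) * v
               for ck, (lam, v) in zip(c, eigenpairs))

def x_prime(t, c):
    """Exact derivative of x(t): each term is multiplied by its eigenvalue."""
    return sum(ck * lam * np.exp(lam * t) * v
               for ck, (lam, v) in zip(c, eigenpairs))

# Each pair satisfies the eigenvalue equation A v = lambda v
for lam, v in eigenpairs:
    assert np.allclose(A @ v, lam * v)

# An arbitrary linear combination satisfies x' = Ax at several times
c = (3.0, -1.5)
for t in (0.0, 0.5, 1.0):
    assert np.allclose(x_prime(t, c), A @ x(t, c))

# Eigenvalues computed numerically, for comparison with the hand computation
print(np.sort(np.linalg.eigvals(A)))
```

The check at a few sample times is not a proof, but it mirrors the substitution argument in (3.2.8) through (3.2.12).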


Example 3.2.2 provides the foundation for much of our study of linear systems of differential equations. It shows that when we can find real eigenvalues and eigenvectors, these lead us directly to solutions of the system. In addition, any linear combination of such solutions is also a solution to the system; we state this formally in the next theorem.

Theorem 3.2.2 If $(\lambda_1, \mathbf{v}_1), (\lambda_2, \mathbf{v}_2), \ldots, (\lambda_k, \mathbf{v}_k)$ are eigenpairs of an $n \times n$ matrix $A$ and $c_1, \ldots, c_k$ are any scalars, then $\mathbf{x}(t) = c_1e^{\lambda_1t}\mathbf{v}_1 + c_2e^{\lambda_2t}\mathbf{v}_2 + \cdots + c_ke^{\lambda_kt}\mathbf{v}_k$ is a solution to $\mathbf{x}' = A\mathbf{x}$.

In upcoming sections, we will determine whether we have found all of the solutions to a given system, address some subtle issues that arise when we cannot find enough real eigenvalues and eigenvectors, and better understand the graphical and long-term behavior of solutions. The exercises in this section will help further illuminate the roles of eigenvalues and eigenvectors as well as some of the issues that arise when there is an insufficient number of real eigenvectors for a given system's matrix.

Exercises 3.2

In exercises 1–7, compute by hand the eigenvalues and eigenvectors of the given matrix.

1. $A = \begin{bmatrix} 1 & 4 \\ 2 & 3 \end{bmatrix}$
2. $A = \begin{bmatrix} 0 & 4 \\ 1 & 0 \end{bmatrix}$
3. $A = \begin{bmatrix} 0 & 3 \\ 3 & 8 \end{bmatrix}$
4. $A = \begin{bmatrix} 2 & 2 \\ -1 & -1 \end{bmatrix}$
5. $A = \begin{bmatrix} 2 & 2 & 0 \\ 1 & 2 & 1 \\ 1 & 2 & 1 \end{bmatrix}$
6. $A = \begin{bmatrix} 3 & 0 & 1 \\ 0 & 2 & 0 \\ 5 & 0 & -1 \end{bmatrix}$
7. $A = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{bmatrix}$


8. Consider the system of differential equations given by
   $x_1' = -2x_1 + 3x_2$
   $x_2' = x_1 - 4x_2$
   (a) Determine a matrix $A$ so that the system may be written in the form $\mathbf{x}' = A\mathbf{x}$.
   (b) Determine all constant (equilibrium) solutions to $\mathbf{x}' = A\mathbf{x}$.
   (c) Compute the eigenvalues and eigenvectors of $A$.
   (d) Determine all straight-line solutions to $\mathbf{x}' = A\mathbf{x}$.
   (e) Find a more general solution to $\mathbf{x}' = A\mathbf{x}$ by taking all possible linear combinations of the straight-line solutions from (d).
   (f) Solve the initial-value problem $\mathbf{x}' = A\mathbf{x}$, $\mathbf{x}(0) = [1\ 2]^T$. Discuss the graphical behavior of this solution.

9. Consider the system of differential equations given by
   $x_1' = -x_1 + 2x_2$
   $x_2' = -7x_1 + 8x_2$
   (a) Determine a matrix $A$ so that the system may be written in the form $\mathbf{x}' = A\mathbf{x}$.
   (b) Determine all constant solutions to $\mathbf{x}' = A\mathbf{x}$.
   (c) Compute the eigenvalues and eigenvectors of $A$.
   (d) Determine all straight-line solutions to $\mathbf{x}' = A\mathbf{x}$.
   (e) Find a more general solution to $\mathbf{x}' = A\mathbf{x}$ by taking all possible linear combinations of the straight-line solutions from (d).
   (f) Solve the initial-value problem $\mathbf{x}' = A\mathbf{x}$, $\mathbf{x}(0) = [-2\ 0]^T$. Discuss the graphical behavior of this solution.

10. Consider the system of differential equations given by
   $x_1' = 2x_1 + 3x_2$
   $x_2' = -4x_2$
   (a) Determine a matrix $A$ so that the system may be written in the form $\mathbf{x}' = A\mathbf{x}$.
   (b) Determine all constant solutions to $\mathbf{x}' = A\mathbf{x}$.
   (c) Compute the eigenvalues and eigenvectors of $A$.
   (d) Determine all straight-line solutions to $\mathbf{x}' = A\mathbf{x}$.
   (e) Find a more general solution to $\mathbf{x}' = A\mathbf{x}$ by taking all possible linear combinations of the straight-line solutions from (d).
   (f) Explain how you could find this same general solution without determining eigenvalues and eigenvectors. (Hint: focus on $x_2(t)$ first.)
   (g) Solve the initial-value problem $\mathbf{x}' = A\mathbf{x}$, $\mathbf{x}(0) = [0\ 1]^T$. Discuss the graphical behavior of this solution.


11. Consider the system of differential equations given by
   $x_1' = -2x_1 + x_2$
   $x_2' = -2x_2$
   (a) Determine a matrix $A$ so that the system may be written in the form $\mathbf{x}' = A\mathbf{x}$.
   (b) Determine all constant solutions to $\mathbf{x}' = A\mathbf{x}$.
   (c) Compute the eigenvalues and eigenvectors of $A$.
   (d) Determine all straight-line solutions to $\mathbf{x}' = A\mathbf{x}$.
   (e) Find a more general solution to $\mathbf{x}' = A\mathbf{x}$ by taking all possible linear combinations of your straight-line solutions from (d).
   (f) Attempt to solve the initial-value problem $\mathbf{x}' = A\mathbf{x}$, $\mathbf{x}(0) = [1\ 1]^T$. What does this tell you about the proposed general solution in (e)?

12. Consider the system of differential equations given by
   $x_1' = 2x_1 + 9x_2$
   $x_2' = -x_1 - 2x_2$
   (a) Determine a matrix $A$ so that the system may be written in the form $\mathbf{x}' = A\mathbf{x}$.
   (b) Determine all constant solutions to $\mathbf{x}' = A\mathbf{x}$.
   (c) Compute the eigenvalues and eigenvectors of $A$.
   (d) Are there any straight-line solutions to $\mathbf{x}' = A\mathbf{x}$? Why or why not?

13. Consider the system of differential equations given by
   $x_1' = -3x_1 + x_2$
   $x_2' = 3x_1 - x_2$
   (a) Determine a matrix $A$ so that the system may be written in the form $\mathbf{x}' = A\mathbf{x}$.
   (b) Determine all constant solutions to $\mathbf{x}' = A\mathbf{x}$. Compare and contrast your findings with preceding exercises.
   (c) Compute the eigenvalues and eigenvectors of $A$.
   (d) Determine all straight-line solutions to $\mathbf{x}' = A\mathbf{x}$. How many such solutions exist?
   (e) Find a more general solution to $\mathbf{x}' = A\mathbf{x}$ by taking all possible linear combinations of your straight-line solutions from (d).
   (f) Solve the initial-value problem $\mathbf{x}' = A\mathbf{x}$, $\mathbf{x}(0) = [3\ 0]^T$. Discuss the graphical behavior of this solution.

14. Consider the system of differential equations given by
   $x_1' = 3x_1 + x_2 + x_3$
   $x_2' = x_1 + 3x_2 + x_3$
   $x_3' = x_1 + x_2 + 3x_3$


   (a) Determine a matrix $A$ so that the system may be written in the form $\mathbf{x}' = A\mathbf{x}$.
   (b) Determine all constant solutions to $\mathbf{x}' = A\mathbf{x}$.
   (c) Compute the eigenvalues and eigenvectors of $A$.
   (d) Determine all straight-line solutions to $\mathbf{x}' = A\mathbf{x}$.
   (e) Find a more general solution to $\mathbf{x}' = A\mathbf{x}$ by taking all possible linear combinations of your straight-line solutions from (d).
   (f) Solve the initial-value problem $\mathbf{x}' = A\mathbf{x}$, $\mathbf{x}(0) = [1\ 1\ 1]^T$. Discuss the graphical behavior of this solution.

15. Consider the system of differential equations given by
   $x_1' = 8x_1 - x_2 - 11x_3$
   $x_2' = 18x_1 - 3x_2 - 19x_3$
   $x_3' = 2x_1 - x_2 - 5x_3$
   (a) Determine a matrix $A$ so that the system may be written in the form $\mathbf{x}' = A\mathbf{x}$.
   (b) Determine all constant solutions to $\mathbf{x}' = A\mathbf{x}$.
   (c) Compute the eigenvalues and eigenvectors of $A$.
   (d) Determine all straight-line solutions to $\mathbf{x}' = A\mathbf{x}$.
   (e) Find a more general solution to $\mathbf{x}' = A\mathbf{x}$ by taking all possible linear combinations of your straight-line solutions from (d).
   (f) Solve the initial-value problem $\mathbf{x}' = A\mathbf{x}$, $\mathbf{x}(0) = [1\ 1\ 1]^T$. Discuss the graphical behavior of this solution.

Recall from section 3.1 that a second-order linear differential equation whose solution is $y(t)$ may be converted to a system of first-order linear equations whose solution is $\mathbf{x} = [x_1\ x_2]^T$ through the substitution $x_1 = y$, $x_2 = y'$. See, for example, the discussion following (3.1.10). In exercises 16–22, convert each given higher order differential equation to a system of first-order equations through an appropriate substitution.

16. $y'' - 4y = 0$
17. $y'' + y' - 12y = 0$
18. $y'' + y' + y = 0$
19. $y'' - 2y' - 8y = e^t$
20. $y''' + 3y'' + 3y' + y = 0$
21. $y'' - 6y' + 5y = 0$
22. $y^{(4)} + 2y''' - 5y'' + y' - 9y = 0$

In sections 1.1 and 3.1, we showed how two connected tanks containing a solute lead to a system of linear first-order differential equations. In exercises 23–26,


set up, but do not solve, a system of differential equations or initial-value problem whose solution would give the amount of salt in each tank at time $t$. Write each system in matrix form.

23. A system of two tanks is connected in such a way that each of the tanks has an independent inflow that delivers salt solution to it, each has an independent outflow (drain), and each tank is connected to the other with an outflow and an inflow. The relevant information about each tank is given in the table below.

                                     Tank A                Tank B
    Tank volume                      100 liters            200 liters
    Rate of inflow to the tank       5 liters/min          9 liters/min
    Concentration of salt in inflow  7 g/liter             3 g/liter
    Rate of drain outflow            4 liters/min          10 liters/min
    Rates of outflows to other tank  to B: 3 liters/min    to A: 2 liters/min

24. Suppose that in exercise 23 all of the given information remains the same except for the fact that instead of saltwater flowing into each tank, pure water flows in; that is, the concentration of salt in the entering solution is 0 g/liter for each tank.

25. In a closed system of two tanks (i.e., one for which there are no input flows and no output flows), the following information is given. Tank A is filled with 100 liters of solution whose initial concentration is 0.25 g/liter. Tank B is filled with 50 liters of solution whose initial concentration is 3 g/liter. The two tanks are connected with two pipes having flows in opposite directions; mixed solution from Tank A flows to Tank B at a rate of 4 liters/min. Similarly, mixed solution flows from Tank B to Tank A at a rate of 4 liters/min.

26. In a closed system of three tanks (i.e., one for which there are no input flows and no output flows), the following information is given.

                                    Tank A               Tank B               Tank C
Tank volume                         100 liters           150 liters           125 liters
Rates of outflows to other tanks    to B: 3 liters/min   to C: 1 liter/min    to A: 4 liters/min
                                    to C: 4 liters/min   to A: 3 liters/min   to B: 1 liter/min


Tank A is filled with 100 liters of solution whose initial concentration is 8 g/liter. Tank B is filled with 150 liters of solution whose initial concentration is 3 g/liter. Tank C is initially filled with 125 liters of pure water. The three tanks are connected with pipes having flows in opposite directions; flow rates are given in the table above.

27. Show that if $(\lambda, \mathbf{v})$ is an eigenpair of the matrix $A$, then $\mathbf{x}(t) = e^{\lambda t}\mathbf{v}$ is a solution to the homogeneous system of linear differential equations given by $\mathbf{x}' = A\mathbf{x}$.

3.3 Homogeneous linear first-order systems

In preceding sections, we have encountered examples of systems of two (or three) linear differential equations in two (or three) unknown functions. More generally, a linear system of $n$ differential equations in $n$ unknown functions (or simply, a linear system) is a collection of differential equations for which we seek unknown functions $x_1(t), \ldots, x_n(t)$ when given $n$ equations with coefficient functions $a_{ij}(t)$ and $b_i(t)$ in the form

$$\begin{aligned}
\frac{dx_1}{dt} &= a_{11}(t)x_1 + a_{12}(t)x_2 + \cdots + a_{1n}(t)x_n + b_1(t) \\
\frac{dx_2}{dt} &= a_{21}(t)x_1 + a_{22}(t)x_2 + \cdots + a_{2n}(t)x_n + b_2(t) \\
&\;\;\vdots \\
\frac{dx_n}{dt} &= a_{n1}(t)x_1 + a_{n2}(t)x_2 + \cdots + a_{nn}(t)x_n + b_n(t)
\end{aligned}$$

It will be convenient to write the above system in matrix form. If we let $\mathbf{x}(t) = [x_i(t)]$ denote the vector function of unknowns, $A(t) = [a_{ij}(t)]$ the $n \times n$ matrix of coefficient functions, and $\mathbf{b}(t) = [b_i(t)]$ the vector of functions $b_i$, then the above system can be rewritten simply as

$$\mathbf{x}'(t) = A(t)\mathbf{x}(t) + \mathbf{b}(t) \tag{3.3.1}$$

In much of our work, we will suppress the independent variable $t$ and write $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$. Moreover, it will most often be the case that, as in examples 3.2.1 and 3.2.2, the matrix $A$ has all constant entries. Indeed, from this point on, unless otherwise noted, we will assume the matrix $A$ has constant entries. In the event that $\mathbf{b} = \mathbf{0}$, we say that the linear system is homogeneous. If $\mathbf{b}$ is nonzero, the system is nonhomogeneous.

We have already encountered in theorems 3.2.1 and 3.2.2 the important facts that for any homogeneous first-order linear system $\mathbf{x}' = A\mathbf{x}$, every solution of the form $\mathbf{x}(t) = e^{\lambda t}\mathbf{v}$ requires $(\lambda, \mathbf{v})$ to be an eigenpair of $A$, and that any linear combination of such solutions is also a solution to the system.

Just as with individual differential equations, to each system of equations we can associate an initial-value problem. Using the matrix notation (3.3.1), if


we assume that we also have the initial condition $\mathbf{x}(t_0) = \mathbf{x}_0$, then we have the standard initial-value problem

$$\mathbf{x}'(t) = A(t)\mathbf{x}(t), \quad \mathbf{x}(t_0) = \mathbf{x}_0 \tag{3.3.2}$$

We next consider a theoretical result (whose proof we omit) that will frame our overall work with systems. The following theorem is analogous to the earlier result we encountered in theorem 2.2.1 regarding the existence of a unique solution to the initial-value problem associated with a single first-order differential equation.

Theorem 3.3.1 In (3.3.2), let the entries of the matrix $A(t)$ be continuous functions on a common interval $I$ that contains the value $t_0$. Then there exists a unique solution $\mathbf{x}(t)$ to (3.3.2) on the interval $I$.

In particular, we note that in examples where the matrix $A$ has constant coefficients, the entries are continuous functions, so that the IVP $\mathbf{x}' = A\mathbf{x}$, $\mathbf{x}(0) = \mathbf{x}_0$ is guaranteed to have a unique solution. We now examine this result more closely through a particular example, revisiting a problem we considered in the preceding section.

Example 3.3.1 Determine the unique solution to the IVP given by

$$\mathbf{x}' = \begin{bmatrix} -2 & -2 \\ -4 & 0 \end{bmatrix}\mathbf{x}, \quad \mathbf{x}(0) = \begin{bmatrix} -5 \\ 3 \end{bmatrix} \tag{3.3.3}$$

Solution. We note, by theorem 3.3.1, that a unique solution exists. Moreover, from our work in example 3.2.2, every function of the form

$$\mathbf{x}(t) = c_1 e^{-4t}\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2 e^{2t}\begin{bmatrix} 1 \\ -2 \end{bmatrix} \tag{3.3.4}$$

is a solution to the system $\mathbf{x}' = A\mathbf{x}$. We now explore whether we can find constants $c_1$ and $c_2$ in order that the function $\mathbf{x}(t)$ will satisfy the given initial condition in (3.3.3). The initial condition in (3.3.3) and (3.3.4) together imply

$$\begin{bmatrix} -5 \\ 3 \end{bmatrix} = \mathbf{x}(0) = c_1 e^{0}\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2 e^{0}\begin{bmatrix} 1 \\ -2 \end{bmatrix}$$

or equivalently

$$c_1\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2\begin{bmatrix} 1 \\ -2 \end{bmatrix} = \begin{bmatrix} -5 \\ 3 \end{bmatrix} \tag{3.3.5}$$

We note that since the vectors $[1 \;\; 1]^T$ and $[1 \;\; {-2}]^T$ (which are eigenvectors of $A$) are linearly independent and span $\mathbb{R}^2$, we are guaranteed a unique solution to (3.3.5). Row-reducing the system (3.3.5), we find

$$\left[\begin{array}{cc|c} 1 & 1 & -5 \\ 1 & -2 & 3 \end{array}\right] \rightarrow \left[\begin{array}{cc|c} 1 & 0 & -\tfrac{7}{3} \\ 0 & 1 & -\tfrac{8}{3} \end{array}\right]$$


Thus, we have shown

$$\mathbf{x}(t) = -\frac{7}{3}e^{-4t}\begin{bmatrix} 1 \\ 1 \end{bmatrix} - \frac{8}{3}e^{2t}\begin{bmatrix} 1 \\ -2 \end{bmatrix}$$

is the unique solution to the given initial-value problem.

One especially important observation from example 3.3.1 can be made regarding the point at which we solved for the constants $c_1$ and $c_2$: we were guaranteed not only that a solution existed, but also that it was unique, due to the fact that two linearly independent eigenvectors of the $2 \times 2$ matrix $A$ were present in the general solution (3.3.4). Indeed, if we imagine wanting to solve any similar IVP with the freedom to choose any initial vector $\mathbf{x}(0)$, it will be necessary that $\mathbf{x}(0)$ can be written as a linear combination of the vectors $\mathbf{v}_1$ and $\mathbf{v}_2$, whenever the general solution has form

$$\mathbf{x}(t) = c_1 e^{\lambda_1 t}\mathbf{v}_1 + c_2 e^{\lambda_2 t}\mathbf{v}_2$$

This situation is indicative of the general fact that for all $2 \times 2$ linear systems of DEs, we must have two parts to the general solution, in order to be able to uniquely determine the constants $c_1$ and $c_2$. Note further that for the solutions $\mathbf{x}_1(t) = e^{\lambda_1 t}\mathbf{v}_1$ and $\mathbf{x}_2(t) = e^{\lambda_2 t}\mathbf{v}_2$ we encountered above, $\mathbf{x}_1(0) = \mathbf{v}_1$ and $\mathbf{x}_2(0) = \mathbf{v}_2$ are linearly independent and form a basis for $\mathbb{R}^2$. This linear independence of the constant vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ turns out to have an important analog in the linear independence of certain solutions to the system of differential equations.

More generally, we can consider these same issues for an $n \times n$ homogeneous system. Because theorem 3.3.1 guarantees the existence of a unique solution to the corresponding IVP for every initial condition $\mathbf{x}(0) \in \mathbb{R}^n$, when we think about the structure of the general solution, it is natural to think this solution will have form

$$\mathbf{x}(t) = c_1\mathbf{x}_1(t) + c_2\mathbf{x}_2(t) + \cdots + c_n\mathbf{x}_n(t)$$

where $\{\mathbf{x}_1(0), \mathbf{x}_2(0), \ldots, \mathbf{x}_n(0)\}$ form a basis for $\mathbb{R}^n$.
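The arithmetic in example 3.3.1 can be checked with a few lines of code. The following is only a sketch using the Python standard library: it takes the eigenpairs quoted in the example as given, solves (3.3.5) by Cramer's rule (a choice made here for brevity, in place of the row reduction used in the text), and evaluates the resulting solution. The names `c1`, `c2`, and `x` are illustrative.

```python
import math

# Eigenpairs and initial vector as quoted in example 3.3.1.
lam1, v1 = -4.0, (1.0, 1.0)
lam2, v2 = 2.0, (1.0, -2.0)
x0 = (-5.0, 3.0)

# Solve c1*v1 + c2*v2 = x0 by Cramer's rule.
# The determinant is nonzero because v1 and v2 are linearly independent.
det = v1[0] * v2[1] - v2[0] * v1[1]
c1 = (x0[0] * v2[1] - v2[0] * x0[1]) / det
c2 = (v1[0] * x0[1] - x0[0] * v1[1]) / det


def x(t):
    """Evaluate x(t) = c1*e^(lam1*t)*v1 + c2*e^(lam2*t)*v2."""
    e1, e2 = math.exp(lam1 * t), math.exp(lam2 * t)
    return (c1 * e1 * v1[0] + c2 * e2 * v2[0],
            c1 * e1 * v1[1] + c2 * e2 * v2[1])


print(c1, c2)   # the coefficients found by row reduction in the text
print(x(0.0))   # should reproduce the initial vector
```

If the eigenvectors were linearly dependent, `det` would be zero and the division would fail, which mirrors the observation that independence is what makes the constants uniquely determined.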
These observations, together with our earlier work in theorem 3.2.2 that showed that every linear combination of solutions to the general homogeneous linear system of DEs (3.3.1) is also a solution to (3.3.1), help explain why the set of all solutions to $\mathbf{x}' = A\mathbf{x}$, where $A$ is a matrix with constant coefficients, is a vector space of dimension $n$. We state this formally in the following result.

Theorem 3.3.2 The set of all solution vectors to the homogeneous linear system $\mathbf{x}' = A\mathbf{x}$, where $A$ is an $n \times n$ matrix with constant coefficients, forms a vector space of dimension $n$.

Theorem 3.3.2 shows us that in order to solve an $n \times n$ system of homogeneous first-order DEs, we must find $n$ linearly independent solutions to the system. Said differently, the general solution to $\mathbf{x}' = A\mathbf{x}$ will have form

$$\mathbf{x}(t) = c_1\mathbf{x}_1(t) + c_2\mathbf{x}_2(t) + \cdots + c_n\mathbf{x}_n(t) \tag{3.3.6}$$


where $\mathbf{x}_1(t), \ldots, \mathbf{x}_n(t)$ are linearly independent functions. Thus, our search for the general solution to the system requires us to find these $n$ linearly independent functions $\mathbf{x}_1(t), \ldots, \mathbf{x}_n(t)$. While we need to discuss in more detail what it means for vector functions (rather than constant vectors) to be linearly independent, we can first note that we know by theorem 3.2.1 that when $(\lambda_i, \mathbf{v}_i)$ is an eigenpair of $A$, the function $\mathbf{x}_i(t) = e^{\lambda_i t}\mathbf{v}_i$ is a solution to $\mathbf{x}' = A\mathbf{x}$. This fact, combined with theorem 3.3.2, implies the result stated in theorem 3.3.3.

Theorem 3.3.3 If $A$ is an $n \times n$ matrix with $n$ linearly independent eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$, with corresponding eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ (where the eigenvalues are not necessarily distinct), then the general solution to $\mathbf{x}' = A\mathbf{x}$ is

$$\mathbf{x}(t) = c_1 e^{\lambda_1 t}\mathbf{v}_1 + c_2 e^{\lambda_2 t}\mathbf{v}_2 + \cdots + c_n e^{\lambda_n t}\mathbf{v}_n \tag{3.3.7}$$

The linear independence of $\mathbf{v}_1, \ldots, \mathbf{v}_n$ guarantees that we can solve the IVP $\mathbf{x}' = A\mathbf{x}$, $\mathbf{x}(0) = \mathbf{x}_0$ for every possible choice of $\mathbf{x}_0 \in \mathbb{R}^n$, since we may write $\mathbf{x}_0 = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n$ for a unique set of values $c_1, \ldots, c_n$. This shows that the general solution (3.3.7) indeed captures all possible solutions to the system.

In our original study of the eigenvalue problem in section 1.10, we observed (and proved in one of the exercises) that eigenvectors corresponding to distinct (real¹) eigenvalues are linearly independent. Combined with theorem 3.3.3, this yields an important consequence: if $A$ has $n$ distinct real eigenvalues, then $A$ has $n$ linearly independent (real) eigenvectors. In particular, the following corollary is true.

Corollary 3.3.4 If $A$ is an $n \times n$ matrix with $n$ distinct real eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, then the corresponding eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ are linearly independent and the general solution to $\mathbf{x}' = A\mathbf{x}$ is

$$\mathbf{x}(t) = c_1 e^{\lambda_1 t}\mathbf{v}_1 + c_2 e^{\lambda_2 t}\mathbf{v}_2 + \cdots + c_n e^{\lambda_n t}\mathbf{v}_n \tag{3.3.8}$$
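In the 2 × 2 case, the hypothesis and conclusion of corollary 3.3.4 are easy to explore by hand or in code. The sketch below is one possible illustration, not a general-purpose eigenvalue routine: it finds the eigenvalues from the characteristic polynomial, refuses to proceed unless they are real and distinct, and then assembles the function in (3.3.8). The helper names `eigenpairs_2x2` and `general_solution` are hypothetical, and the sample matrix is the one from example 3.3.1.

```python
import math


def eigenpairs_2x2(a, b, c, d):
    """Eigenpairs of A = [[a, b], [c, d]], assuming two distinct real eigenvalues."""
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4.0 * det          # discriminant of lam^2 - tr*lam + det
    if disc <= 0.0:
        raise ValueError("eigenvalues are repeated or complex; corollary 3.3.4 does not apply")
    root = math.sqrt(disc)
    pairs = []
    for lam in ((tr - root) / 2.0, (tr + root) / 2.0):
        # Solve (A - lam*I)v = 0 using the first row; fall back to the
        # second row if the first row happens to be zero.
        v = (b, lam - a)
        if abs(v[0]) + abs(v[1]) < 1e-12:
            v = (lam - d, c)
        pairs.append((lam, v))
    return pairs


def general_solution(pairs, coeffs, t):
    """Evaluate c1*e^(lam1*t)*v1 + c2*e^(lam2*t)*v2 at time t."""
    return tuple(
        sum(ck * math.exp(lam * t) * v[i] for ck, (lam, v) in zip(coeffs, pairs))
        for i in range(2)
    )


pairs = eigenpairs_2x2(-2.0, -2.0, -4.0, 0.0)
print(pairs)
print(general_solution(pairs, (1.0, 0.0), 0.0))
```

The eigenvectors returned are scalar multiples of the ones quoted in the text; any nonzero multiple of an eigenvector serves equally well in (3.3.8), since the difference is absorbed into the constants.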

We now consider a specific example in which we see corollary 3.3.4 at work.

Example 3.3.2 Determine the general solution to the homogeneous first-order system of DEs $\mathbf{x}' = A\mathbf{x}$ and determine the unique solution to the initial-value problem

$$\mathbf{x}' = A\mathbf{x} = \begin{bmatrix} -4 & 1 & -1 \\ -1 & -2 & 5 \\ -3 & 3 & 0 \end{bmatrix}\mathbf{x}, \quad \mathbf{x}(0) = \begin{bmatrix} 1 \\ -2 \\ 3 \end{bmatrix}$$

¹ We are interested in real solutions to the system $\mathbf{x}' = A\mathbf{x}$; when eigenvalues and eigenvectors are complex, additional work is needed. See section 3.5.


Solution. We begin by computing the eigenvalues and eigenvectors of $A$. Using the Eigenvectors(A) command in Maple, we find that the eigenvalues of $A$ are $\lambda_1 = -6$, $\lambda_2 = -3$, $\lambda_3 = 3$, with corresponding eigenvectors

$$\mathbf{v}_1 = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$$

Since the eigenvalues of $A$ are distinct, we know immediately that the corresponding eigenvectors are linearly independent, and therefore by corollary 3.3.4 that the general solution to the given system is

$$\mathbf{x}(t) = c_1 e^{-6t}\begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix} + c_2 e^{-3t}\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + c_3 e^{3t}\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} \tag{3.3.9}$$

To solve the IVP with

$$\mathbf{x}(0) = \begin{bmatrix} 1 \\ -2 \\ 3 \end{bmatrix}$$

we set $t = 0$ in (3.3.9) and apply the given condition, which leads to the vector equation

$$c_1\begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix} + c_2\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + c_3\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ -2 \\ 3 \end{bmatrix}$$

Writing this equation in augmented matrix form and row-reducing shows that

$$\left[\begin{array}{ccc|c} 1 & 1 & 0 & 1 \\ -1 & 1 & 1 & -2 \\ 1 & 0 & 1 & 3 \end{array}\right] \rightarrow \left[\begin{array}{ccc|c} 1 & 0 & 0 & 2 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 1 \end{array}\right]$$

and, therefore, the solution to the IVP is

$$\mathbf{x}(t) = 2e^{-6t}\begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix} - e^{-3t}\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + e^{3t}\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$$

From corollary 3.3.4, we know that if we have an $n \times n$ matrix $A$ with $n$ linearly independent real eigenvectors, then we can completely solve the system $\mathbf{x}' = A\mathbf{x}$. But what if $A$ lacks $n$ real linearly independent eigenvectors? While we will encounter this situation in more detail in section 3.5, here it is worthwhile to note that we will still be seeking $n$ linearly independent solutions $\mathbf{x}_1(t), \ldots, \mathbf{x}_n(t)$ to the general system. For these vector functions, the fundamental meaning of linear independence remains the same as it does for constant vectors: the set of


vector functions $\{\mathbf{x}_1(t), \ldots, \mathbf{x}_n(t)\}$ is linearly independent if and only if the only values of $c_1, \ldots, c_n$ that make

$$c_1\mathbf{x}_1(t) + \cdots + c_n\mathbf{x}_n(t) = \mathbf{0} \tag{3.3.10}$$

true for all values of $t$ are $c_1 = \cdots = c_n = 0$. Testing the linear independence of vector functions is more involved; to do so, we introduce a new concept and a corresponding theorem.

Definition 3.3.1 Given vector functions $\mathbf{x}_1(t), \ldots, \mathbf{x}_n(t)$ where each $\mathbf{x}_i(t) \in \mathbb{R}^n$ for all $t$, the Wronskian of these functions is

$$W[\mathbf{x}_1, \ldots, \mathbf{x}_n] = \det[\mathbf{x}_1, \ldots, \mathbf{x}_n] \tag{3.3.11}$$

That is, the Wronskian of a set of $n$ vector functions, each of which lies in $\mathbb{R}^n$, is the determinant of the $n \times n$ matrix whose columns are $\mathbf{x}_1, \ldots, \mathbf{x}_n$. The Wronskian enables us to easily test whether or not vector functions are linearly independent through the following theorem, which will be stated without proof.

Theorem 3.3.5 Let $\mathbf{x}_1(t), \ldots, \mathbf{x}_n(t)$ be vector functions continuous on an interval $I$, where $\mathbf{x}_i(t) \in \mathbb{R}^n$ for all $t \in I$. If at some point $t_0$ in $I$, $W[\mathbf{x}_1, \ldots, \mathbf{x}_n](t_0) \neq 0$, then $\{\mathbf{x}_1(t), \ldots, \mathbf{x}_n(t)\}$ is linearly independent on $I$.

We observe that this result appears reasonable since it is analogous to two statements that appear in the Invertible Matrix theorem: for a set of $n$ constant vectors in $\mathbb{R}^n$, we know that the set is linearly independent if and only if the determinant of the matrix whose columns are these vectors is nonzero. Theorem 3.3.5 is a generalization of this result to the situation where the vectors are not constant. An example will now demonstrate the use of the Wronskian in showing a set of vector functions is linearly independent.

Example 3.3.3 Consider the vector functions $\mathbf{x}_1 = [e^{-t} \;\; {-e^{-t}} \;\; e^{-t}]^T$, $\mathbf{x}_2 = [3e^{2t} \;\; e^{2t} \;\; {-2e^{2t}}]^T$, and $\mathbf{x}_3 = [e^{5t} \;\; e^{5t} \;\; e^{5t}]^T$. Are $\mathbf{x}_1$, $\mathbf{x}_2$, and $\mathbf{x}_3$ linearly independent?

Solution. We use the Wronskian of $\mathbf{x}_1$, $\mathbf{x}_2$, and $\mathbf{x}_3$ to determine their linear independence. Observe that

$$\begin{aligned}
W[\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3] &= \det\begin{bmatrix} e^{-t} & 3e^{2t} & e^{5t} \\ -e^{-t} & e^{2t} & e^{5t} \\ e^{-t} & -2e^{2t} & e^{5t} \end{bmatrix} \\
&= e^{-t}\left(e^{2t}e^{5t} + 2e^{5t}e^{2t}\right) - 3e^{2t}\left(-e^{-t}e^{5t} - e^{5t}e^{-t}\right) + e^{5t}\left(2e^{2t}e^{-t} - e^{2t}e^{-t}\right) \\
&= e^{-t}\left(3e^{7t}\right) - 3e^{2t}\left(-2e^{4t}\right) + e^{5t}\left(e^{t}\right) \\
&= 10e^{6t} \neq 0
\end{aligned}$$
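A hand computation like the cofactor expansion above is easy to spot-check numerically. The sketch below (standard-library Python; the helper names `det3` and `wronskian` are illustrative) evaluates the Wronskian of the three vector functions from example 3.3.3 at a few sample times and prints it next to the closed form $10e^{6t}$ for comparison.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def wronskian(t):
    """W[x1, x2, x3](t) for example 3.3.3; the columns are x1, x2, x3."""
    e1, e2, e5 = math.exp(-t), math.exp(2 * t), math.exp(5 * t)
    return det3([[e1, 3 * e2, e5],
                 [-e1, e2, e5],
                 [e1, -2 * e2, e5]])


for t in (-1.0, 0.0, 0.5):
    # The two printed values should agree up to rounding error.
    print(t, wronskian(t), 10 * math.exp(6 * t))
```

A numerical check at finitely many points cannot prove the identity, but by theorem 3.3.5 a single nonzero value of the Wronskian already suffices for linear independence.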

Since $W[\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3] \neq 0$ for at least one $t$-value (in fact, for all $t$), it follows by theorem 3.3.5 that the functions $\mathbf{x}_1$, $\mathbf{x}_2$, and $\mathbf{x}_3$ are linearly independent.

In conclusion, we now know that when we encounter a homogeneous system of $n$ linear first-order differential equations in $n$ unknown functions, the set of all solutions to the system forms an $n$-dimensional vector space. Hence, we seek $n$ linearly independent solutions to the system $\mathbf{x}' = A\mathbf{x}$. Such a set $\mathbf{x}_1, \ldots, \mathbf{x}_n$ of $n$ linearly independent solution vectors to this system is called a fundamental set. Moreover, given a fundamental set of solutions $\mathbf{x}_1, \ldots, \mathbf{x}_n$ to $\mathbf{x}' = A\mathbf{x}$ on some interval $I$, the general solution to the system is

$$\mathbf{x}(t) = c_1\mathbf{x}_1 + \cdots + c_n\mathbf{x}_n$$

We have also seen that if an $n \times n$ matrix $A$ has $n$ linearly independent real eigenvectors, then these eigenvectors and their corresponding eigenvalues generate a fundamental set for the system $\mathbf{x}' = A\mathbf{x}$. In subsequent sections we will find that, even in the case when an insufficient number of real eigenvectors exists, the eigenvalue problem enables us to build a fundamental set. Moreover, we will investigate how fundamental solutions allow us to fully understand the graphical behavior of solutions and the stability of equilibrium solutions to the system.

Exercises 3.3

1. If $\mathbf{x}' = A\mathbf{x}$ represents the system of differential equations given by a $4 \times 4$ matrix $A$ with constant entries, how many linearly independent solutions to the system do we need to find in order to determine the general solution? What if $A$ is $7 \times 7$?

2. Consider the second-order differential equation $y'' + y = 0$. Using the substitutions $y = x_1$ and $y' = x_2$, convert the given second-order differential equation to a system of first-order equations. What is the dimension of the solution space to the system? What does this tell you about the dimension of the solution space to the original second-order equation?

3. Consider the third-order differential equation $y''' + 3y'' + 3y' + y = 0$. Using the substitutions $y = x_1$, $y' = x_2$, and $y'' = x_3$, convert the given differential equation to a system of first-order equations. What is the dimension of the solution space to the system? What does this tell you about the dimension of the solution space to the original third-order equation?


In exercises 4–8, use the Wronskian to determine if the given set of vector functions is linearly independent.

4. $\mathbf{x}_1(t) = [e^{-t} \;\; {-e^{-t}}]^T$, $\mathbf{x}_2(t) = [e^{2t} \;\; 2e^{2t}]^T$

5. $\mathbf{x}_1(t) = [\cos t \;\; \sin t]^T$, $\mathbf{x}_2(t) = [\sin t \;\; \cos t]^T$

6. $\mathbf{x}_1(t) = [e^{-t} \;\; {-e^{-t}}]^T$, $\mathbf{x}_2(t) = [-3e^{-t} \;\; 3e^{-t}]^T$

7. $\mathbf{x}_1(t) = [e^{t} \;\; {-e^{t}} \;\; e^{t}]^T$, $\mathbf{x}_2(t) = [e^{7t} \;\; 2e^{7t} \;\; {-3e^{7t}}]^T$, $\mathbf{x}_3(t) = [4e^{-4t} \;\; e^{-4t} \;\; {-e^{-4t}}]^T$

8. $\mathbf{x}_1(t) = [\cos t \;\; {-\sin t} \;\; 0]^T$, $\mathbf{x}_2(t) = [\sin t \;\; \cos t \;\; 0]^T$, $\mathbf{x}_3(t) = [0 \;\; 0 \;\; e^{t}]^T$

9. Explain why for a set of two vector functions, the Wronskian is unneeded to check for linear independence. (Hint: what is the simple test for a pair of constant vectors to be linearly independent?)

10. Let $\mathbf{x}' = A\mathbf{x}$ be given by the matrix

$$A = \begin{bmatrix} 1 & -2 \\ 1 & -2 \end{bmatrix}$$

(a) Compute the eigenvalues and eigenvectors of $A$. Explain why these enable you to find the general solution to $\mathbf{x}' = A\mathbf{x}$.
(b) State the general solution to the system.
(c) Solve the IVP with the initial condition $\mathbf{x}(0) = [3 \;\; 2]^T$.

11. Let $\mathbf{x}' = A\mathbf{x}$ be given by the matrix

$$A = \begin{bmatrix} 3 & 1 \\ 0 & 3 \end{bmatrix}$$

(a) Compute the eigenvalues and eigenvectors of $A$. Explain why you have found one linearly independent solution to the system, but still need to determine another.
(b) Verify through direct substitution that $\mathbf{x}_2(t) = te^{3t}[1 \;\; 0]^T + e^{3t}[0 \;\; 1]^T$ is a solution to the given system $\mathbf{x}' = A\mathbf{x}$.
(c) Show that the solution you found in (a) above and the solution $\mathbf{x}_2(t)$ in (b) are linearly independent, and hence state the general solution to the system.
(d) Solve the IVP with the initial condition $\mathbf{x}(0) = [3 \;\; 2]^T$.

12. Let $\mathbf{x}' = A\mathbf{x}$ be given by the matrix

$$A = \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix}$$

(a) Compute the eigenvalues and eigenvectors of $A$. Explain why, despite the repeated eigenvalue, you have found two linearly independent solutions to the system.
(b) State the general solution to the system.


(c) Solve the IVP with the initial condition $\mathbf{x}(0) = [3 \;\; 2]^T$.
(d) Explain how you could solve the original system given in this problem without using eigenvalues and eigenvectors.

13. Let $\mathbf{x}' = A\mathbf{x}$ be given by the matrix

$$A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$$

(a) Compute the eigenvalues and eigenvectors of $A$. Explain why the eigenvalues and eigenvectors do not produce any real linearly independent solutions to the system.
(b) Verify through direct substitution that $\mathbf{x}_1(t) = [\cos t \;\; \sin t]^T$ and $\mathbf{x}_2(t) = [-\sin t \;\; \cos t]^T$ are solutions to the given system $\mathbf{x}' = A\mathbf{x}$.
(c) Show that the solutions you verified in (b) are linearly independent, and hence state the general solution to the system.
(d) Solve the IVP with the initial condition $\mathbf{x}(0) = [3 \;\; 2]^T$.

14. Let $\mathbf{x}' = A\mathbf{x}$ be given by the matrix

$$A = \begin{bmatrix} 5 & 6 & 2 \\ 0 & -1 & -8 \\ 1 & 0 & -2 \end{bmatrix}$$

(a) Compute the eigenvalues and eigenvectors of $A$. Explain why your work determines two linearly independent solutions to the system, but that one additional linearly independent solution remains to be found.
(b) Verify through direct substitution that $\mathbf{x}_3(t) = te^{3t}[5 \;\; {-2} \;\; 1]^T + e^{3t}[1 \;\; 1/2 \;\; 0]^T$ is a solution to the given system $\mathbf{x}' = A\mathbf{x}$.
(c) Show that the set of three solutions from (a) and (b) is linearly independent, and hence state the general solution to the system.
(d) Solve the IVP with the initial condition $\mathbf{x}(0) = [3 \;\; 2 \;\; 1]^T$.

15. Consider the second-order differential equation $y'' + y = 0$. Convert this equation to a system of first-order equations and solve the system. Use your work to state the general solution $y$ to the original equation. (Hint: See exercise 13.)

16. Convert the second-order differential equation $y'' + 3y' + 2y = 0$ to a system of first-order equations and solve the system. Use your work to state the general solution $y$ to the original equation.

17. Convert the third-order differential equation $y''' - y' = 0$ to a system of first-order equations and solve the system. Use your work to state the general solution $y$ to the original equation.


3.4 Systems with all real linearly independent eigenvectors

In this section, we closely examine the graphical and long-term behavior of solutions to $2 \times 2$ systems in the case where the coefficient matrix $A$ has two real, linearly independent eigenvectors. We do so through a sequence of examples that demonstrate a variety of possibilities that naturally lead to discussion of the stability of equilibrium solutions.

We first review the graphical behavior of vector functions, a subject normally encountered in multivariable calculus. For the system $\mathbf{x}' = A\mathbf{x}$ in the case where $A$ is $2 \times 2$, every solution $\mathbf{x}(t)$ is a vector function whose output lies in $\mathbb{R}^2$. In particular, the graph of $\mathbf{x}(t)$ is the curve that is traced out by the vectors $\mathbf{x}(t)$ at various times $t$. For example, if

$$\mathbf{x}(t) = e^{-t}\begin{bmatrix} 1 \\ 0 \end{bmatrix} + e^{t}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} e^{-t} \\ e^{t} \end{bmatrix} \tag{3.4.1}$$

is a function we have found by solving a system of differential equations, then evaluating $\mathbf{x}(t)$ at $t = -1$, $0$, and $1$ yields the vectors

$$\mathbf{x}(-1) \approx \begin{bmatrix} 2.718 \\ 0.368 \end{bmatrix}, \quad \mathbf{x}(0) = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad \text{and} \quad \mathbf{x}(1) \approx \begin{bmatrix} 0.368 \\ 2.718 \end{bmatrix} \tag{3.4.2}$$

Plotting these vectors helps indicate how $\mathbf{x}(t)$ traces out the parametric curve given by $(x_1(t), x_2(t)) = (e^{-t}, e^{t})$, shown at left in figure 3.4.

Figure 3.4 At left, the solution curve $\mathbf{x}(t)$ given in (3.4.1). At right, the solution curve $\mathbf{x}(t)$ given in (3.4.1), along with corresponding scaled derivative vectors at times $t = -1$, $t = 0$, and $t = 1$.

In addition, it is important to recall the meaning of $\mathbf{x}'(t)$, the derivative of a vector function. The direction of the vector $\mathbf{x}'(t)$ indicates the instantaneous direction of motion of a particle traveling along the curve traced out by $\mathbf{x}(t)$, while the magnitude of $\mathbf{x}'(t)$ determines the instantaneous speed of the particle at time $t$. For our purposes, the direction of motion is most important because this indicates a flow along the solution curve as time increases. Thus, rather than plotting the vector $\mathbf{x}'(t)$ at various times, we plot scaled versions of it, each emanating from the tip of $\mathbf{x}(t)$. For example, since

$$\mathbf{x}'(t) = \begin{bmatrix} -e^{-t} \\ e^{t} \end{bmatrix} \tag{3.4.3}$$

it follows that

$$\mathbf{x}'(-1) \approx \begin{bmatrix} -2.718 \\ 0.368 \end{bmatrix}, \quad \mathbf{x}'(0) = \begin{bmatrix} -1 \\ 1 \end{bmatrix}, \quad \text{and} \quad \mathbf{x}'(1) \approx \begin{bmatrix} -0.368 \\ 2.718 \end{bmatrix} \tag{3.4.4}$$
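The sample values in (3.4.2) and (3.4.4) come from evaluating the component functions directly. As a minimal sketch (standard-library Python, with illustrative function names `x` and `x_prime`), the following tabulates the position and derivative vectors at the three times used in the text, rounded to three decimal places.

```python
import math


def x(t):
    """Position vector from (3.4.1): (e^(-t), e^t)."""
    return (math.exp(-t), math.exp(t))


def x_prime(t):
    """Derivative vector from (3.4.3): (-e^(-t), e^t)."""
    return (-math.exp(-t), math.exp(t))


for t in (-1, 0, 1):
    position = tuple(round(value, 3) for value in x(t))
    velocity = tuple(round(value, 3) for value in x_prime(t))
    print(t, position, velocity)
```

Each derivative vector would be drawn with its tail at the corresponding position vector's tip, which is how the arrows in the right-hand panel of figure 3.4 are placed.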

Plotting scaled versions of each of these vectors emanating from $\mathbf{x}(-1)$, $\mathbf{x}(0)$, and $\mathbf{x}(1)$, respectively, we see the updated image at the right in figure 3.4. These plots of the derivative vectors and the flow of the solution curve remind us of our earlier work with slope fields for single differential equations. Indeed, since a solution curve such as $\mathbf{x}(t)$ will always be the result of solving some differential equation $\mathbf{x}' = A\mathbf{x}$, we realize that we have a formula for $\mathbf{x}'$, just as we had a formula for $y'$ in examples like $y' = -2y$. In the example discussed above, we can view $\mathbf{x}(t)$ as being the solution to the system $\mathbf{x}' = A\mathbf{x}$ where $A$ is the matrix

$$A = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} \tag{3.4.5}$$

so that $\mathbf{x}'(t)$ satisfies the equation

$$\begin{bmatrix} x_1'(t) \\ x_2'(t) \end{bmatrix} = \mathbf{x}'(t) = A\mathbf{x}(t) = \begin{bmatrix} -x_1(t) \\ x_2(t) \end{bmatrix} \tag{3.4.6}$$

In particular, (3.4.6) indicates how, for any point $(x_1, x_2)$ in the plane, we can easily compute $\mathbf{x}'$ at that point, and hence know the direction of the flow of the solution curve that passes through that point. Using a computer to conduct such computations at points sampled throughout the plane (with each resulting vector scaled to be of equal length), we get a picture of the so-called direction field for the system, shown at left in figure 3.5, which is analogous to a direction field for a single differential equation. If we now superimpose our plot of the solution curve in figure 3.4 on the direction field, as shown on the right in figure 3.5, we see clearly the role that the derivative $\mathbf{x}'$ and the direction field play in determining the graph of the solution $\mathbf{x}$, as well as the typical behavior of a solution as time increases.

The $x_1$–$x_2$ plane is usually called the phase plane; note that the independent variable $t$ is implicit in the flow, while the behavior of the curve relative to the coordinate axes demonstrates the interrelationship between the components $x_1(t)$ and $x_2(t)$ of the solution $\mathbf{x}(t)$. Sample solution curves, such as the one plotted in figure 3.5, are typically called trajectories. Each distinct trajectory is a solution to an initial-value problem; the one in figure 3.5 can be viewed as the solution to $\mathbf{x}' = A\mathbf{x}$, $\mathbf{x}(0) = [1 \;\; 1]^T$.
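The computation behind a direction field is short enough to sketch directly. The example below is an illustration in standard-library Python rather than the plotting approach the text develops later with Maple: at each point of a coarse grid it evaluates $A\mathbf{x}$ for the matrix in (3.4.5) and rescales the result to unit length. The function name `direction`, the grid spacing, and the dictionary `field` are choices made for this sketch.

```python
import math

A = ((-1.0, 0.0), (0.0, 1.0))  # the matrix in (3.4.5)


def direction(A, x1, x2):
    """Unit vector in the direction of x' = Ax at the point (x1, x2).

    Returns None where Ax = 0, since an equilibrium point has no direction.
    """
    dx1 = A[0][0] * x1 + A[0][1] * x2
    dx2 = A[1][0] * x1 + A[1][1] * x2
    norm = math.hypot(dx1, dx2)
    if norm == 0.0:
        return None
    return (dx1 / norm, dx2 / norm)


# Sample the plane on a coarse grid; a plotting tool would draw one short
# arrow per grid point using these unit vectors.
grid = [-4, -2, 0, 2, 4]
field = {(p, q): direction(A, p, q) for p in grid for q in grid}

print(field[(2, 2)])   # flow at (2, 2) points up and to the left
print(field[(0, 0)])   # None: the origin is an equilibrium
```

Scaling every vector to the same length discards the speed information in $\mathbf{x}'$ but keeps the picture readable, which matches the convention described above.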

Figure 3.5 At left, the direction field for the system $\mathbf{x}' = A\mathbf{x}$ given by (3.4.5). At right, the solution to (3.4.5) that is given by (3.4.1).

We will now explore the direction field, phase plane, and trajectories for several examples of $2 \times 2$ systems of linear differential equations for which the coefficient matrix has two real linearly independent eigenvectors. An important theme throughout will be the long-range behavior of solutions $\mathbf{x}(t)$ as $t \to \infty$. In addition, we will study the equilibrium solutions of each system; a solution $\mathbf{x}(t)$ is an equilibrium or constant solution if and only if $\mathbf{x}(t)$ is constant for all values of $t$.

Example 3.4.1 Consider the system of differential equations given by $\mathbf{x}' = A\mathbf{x}$ where

$$A = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}$$

Compute the eigenvalues and eigenvectors of $A$ and state the general solution to the system. In addition, determine all equilibrium solutions of the system. Finally, plot the direction field for the system, sketch several trajectories, and discuss the long-term behavior of solutions relative to the equilibrium solution(s).

Solution. The Maple command

> Eigenvectors(A)

produces the output

$$\begin{bmatrix} 5 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}$$

so that $A$ has eigenvalues $\lambda_1 = 5$ and $\lambda_2 = 1$, with corresponding eigenvectors $\mathbf{v}_1 = [1 \;\; 1]^T$ and $\mathbf{v}_2 = [-1 \;\; 1]^T$. We therefore know that the general solution to $\mathbf{x}' = A\mathbf{x}$ is

$$\mathbf{x}(t) = c_1 e^{5t}\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2 e^{t}\begin{bmatrix} -1 \\ 1 \end{bmatrix}$$

To find the equilibrium solution(s), we seek all constant vectors $\mathbf{x}$ that satisfy $\mathbf{x}' = A\mathbf{x}$. In this situation, since $\mathbf{x}$ is constant with respect to $t$, we know that $\mathbf{x}' = \mathbf{0}$, so we must solve the system of linear equations given by $A\mathbf{x} = \mathbf{0}$ where

$$A = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}$$

Since $\det(A) = 5 \neq 0$, it follows that $A$ is an invertible matrix, so the only solution to $A\mathbf{x} = \mathbf{0}$ is $\mathbf{x} = \mathbf{0}$. Thus the system has the origin as its only equilibrium solution.

At the end of this section, in subsection 3.4.1, we will show how to use Maple to plot direction fields for systems. In this and subsequent examples, we'll simply provide these plots for discussion. In figure 3.6, we see not only the direction field generated by the system, but also the plots of several trajectories, which are natural to sketch (even by hand, once the direction field is provided) by following the map that the direction field provides. Note particularly the straight-line solutions that follow the eigenvectors $\mathbf{v}_1 = [1 \;\; 1]^T$ and $\mathbf{v}_2 = [-1 \;\; 1]^T$. Moreover, since both eigenvalues are positive, the respective scalar functions $e^{5t}$ and $e^{t}$ both increase without bound as $t \to \infty$. This explains why the flow along each straight-line solution is away from the origin. Indeed, every solution besides the zero solution flows away from the equilibrium solution at the origin.

Figure 3.6 The direction field for the system $\mathbf{x}' = A\mathbf{x}$ of example 3.4.1 along with several trajectories.

In chapter 2, we considered single autonomous differential equations such as $y' = 2y - 4$. When we found equilibrium solutions to such equations, we also classified their stability based on the behavior exhibited in the direction field. We do likewise with equilibrium solutions for systems. In example 3.4.1,

we found that $\mathbf{x} = \mathbf{0}$ is the only equilibrium solution of the system, and that every non-constant solution flows away from $\mathbf{0}$. This shows that $\mathbf{0}$ is an unstable equilibrium, and in this case we naturally call $\mathbf{0}$ a repelling node. We next explore the behavior of a system where both eigenvalues are negative.

Example 3.4.2 Consider the system of differential equations given by $\mathbf{x}' = A\mathbf{x}$ where

$$A = \begin{bmatrix} -2 & 2 \\ 1 & -3 \end{bmatrix}$$

Compute the eigenvalues and eigenvectors of $A$, and state the general solution to the system. In addition, determine all equilibrium solutions to the system. Finally, plot the direction field for the system, sketch several trajectories, and discuss the long-term behavior of solutions relative to the equilibrium solution(s).

Solution. Using Maple, we find that $A$ has eigenvalues $\lambda_1 = -1$ and $\lambda_2 = -4$, with corresponding eigenvectors $\mathbf{v}_1 = [2 \;\; 1]^T$ and $\mathbf{v}_2 = [-1 \;\; 1]^T$. The general solution to $\mathbf{x}' = A\mathbf{x}$ is therefore

$$\mathbf{x}(t) = c_1 e^{-t}\begin{bmatrix} 2 \\ 1 \end{bmatrix} + c_2 e^{-4t}\begin{bmatrix} -1 \\ 1 \end{bmatrix}$$

To find the equilibrium solution, we set $\mathbf{x}' = \mathbf{0}$. Solving the system of linear equations given by $A\mathbf{x} = \mathbf{0}$, we see that since $A$ is an invertible matrix, the only solution to $A\mathbf{x} = \mathbf{0}$ is $\mathbf{x} = \mathbf{0}$, so the system has the origin as its only equilibrium solution. Plotting the direction field and several trajectories, as shown in figure 3.7, we observe that all solutions flow towards the equilibrium solution at the origin. This makes sense due to the presence of the scalar functions $e^{-4t}$ and $e^{-t}$ in the general solution, as each approaches $0$ as $t \to \infty$, and thus it follows that $\mathbf{x}(t) \to \mathbf{0}$ as $t \to \infty$. Moreover, note the two straight-line solutions that show flow along stretches of the two eigenvectors $\mathbf{v}_1 = [2 \;\; 1]^T$ and $\mathbf{v}_2 = [-1 \;\; 1]^T$.

Figure 3.7 The direction field for the system $\mathbf{x}' = A\mathbf{x}$ in example 3.4.2 along with several trajectories.

Because every non-constant solution to the system in example 3.4.2 approaches the equilibrium solution at $\mathbf{0}$, we say that the origin is a stable equilibrium. Moreover, based on the patterns in the flow, we use the terminology that $\mathbf{0}$ is an attracting node.

We study the third case for a $2 \times 2$ linear system of differential equations with two real, nonzero eigenvalues in the next example: the eigenvalues have opposing signs.

Example 3.4.3 Let

$$A = \begin{bmatrix} 3 & -2 \\ 2 & -2 \end{bmatrix}$$

and consider the system of differential equations given by $\mathbf{x}' = A\mathbf{x}$. Find the general solution of the system, determine all equilibrium solutions to the system, and plot the direction field for the system. Include sketches of several trajectories and discuss the long-term behavior of solutions relative to the equilibrium solution(s).

Solution. We ﬁnd that A has eigenvalues λ1 = 2 and λ2 = −1, with corresponding eigenvectors v1 = [2 1]T and v2 = [1 2]T . It follows that the general solution to x = Ax is −t 1 2t 2 x(t ) = c1 e + c2 e 1 2 Since A is an invertible matrix, the only solution to Ax = 0 is x = 0, so the origin is only equilibrium solution of the system. As ﬁgure 3.8 shows, the direction ﬁeld and various trajectories exhibit a different type of behavior around the origin. In particular, solutions that do not lie on either eigenvector appear to initially ﬂow toward the origin, and then turn away and tend toward the straight-line solution associated with the positive eigenvalue. More speciﬁcally, it appears that solutions that do not pass through a point on the line in the direction of the eigenvector [1 2]T are eventually attracted to stretches of the eigenvector [2 1]T . This is reasonable since in the general solution, e −t will tend to 0 as t → ∞, leaving the function c1 e 2t [2 1]T to dominate. Since some solutions that pass through points near the origin tend away from the origin as t → ∞, the origin is an unstable equilibrium in example 3.4.3. Moreover, as the trajectories remind us of the contour plot in multivariable calculus of a surface whose graph looks like a saddle, we say in this context as well that the origin is a saddle point. The preceding examples demonstrate the three possible cases for a 2 × 2 system with real, nonzero eigenvalues: both positive, both negative, or opposites. Our next example investigates the situation when one eigenvalue is zero.


Figure 3.8 The direction field for the system $\mathbf{x}' = A\mathbf{x}$ of example 3.4.3 along with several trajectories.

Example 3.4.4 For the matrix $A = \begin{bmatrix} -3 & 1 \\ 3 & -1 \end{bmatrix}$ and the corresponding system of differential equations $\mathbf{x}' = A\mathbf{x}$, find the general solution of the system and determine all equilibrium solutions. Furthermore, plot the direction field for the system along with sketches of several trajectories; discuss the long-term behavior of solutions relative to the equilibrium solution(s).

Solution. We first do the standard computations to find that $A$ has eigenvalues $\lambda_1 = -4$ and $\lambda_2 = 0$, with corresponding eigenvectors $\mathbf{v}_1 = [-1\ 1]^T$ and $\mathbf{v}_2 = [1\ 3]^T$. Thus, the general solution to $\mathbf{x}' = A\mathbf{x}$ is

$$\mathbf{x}(t) = c_1 e^{-4t} \begin{bmatrix} -1 \\ 1 \end{bmatrix} + c_2 \begin{bmatrix} 1 \\ 3 \end{bmatrix}$$

We immediately notice something different about $\mathbf{x}(t)$. In particular, because the second eigenvalue is 0, the scalar function $e^{0t}$ has no effect on the general solution. Furthermore, with $e^{-4t}$ the only part of $\mathbf{x}(t)$ that changes with $t$, we can see that for any nonzero constant $c_1$ and any $c_2$, the graph of $\mathbf{x}(t)$ is always a straight line where the direction is given by the eigenvector corresponding to the nonzero eigenvalue.

In addition, the presence of a zero eigenvalue has a significant impact on the system's equilibrium solutions. The fact that the columns of $A$ are scalar multiples of each other leads us to see immediately that $A$ is not invertible; this can be equivalently deduced from the fact that $A$ has a zero eigenvalue. The singularity of $A$ further implies that the homogeneous equation $A\mathbf{x} = \mathbf{0}$ has infinitely many solutions. In particular, row-reducing the appropriate


augmented matrix, we find that

$$\left[\begin{array}{cc|c} -3 & 1 & 0 \\ 3 & -1 & 0 \end{array}\right] \to \left[\begin{array}{cc|c} 1 & -1/3 & 0 \\ 0 & 0 & 0 \end{array}\right]$$

This implies that any constant vector $\mathbf{x}$ of the form

$$\mathbf{x} = x_1 \begin{bmatrix} 1 \\ 3 \end{bmatrix}$$

satisfies the equation $\mathbf{x}' = A\mathbf{x}$, since both sides are $\mathbf{0}$, and therefore is an equilibrium solution. Note especially that for $x_1 \neq 0$, $\mathbf{x} = x_1 [1\ 3]^T$ is an eigenvector associated with $\lambda = 0$, and thus every eigenvector associated with the zero eigenvalue is an equilibrium solution to the system.

The interesting behaviors that we have discussed algebraically are seen in figure 3.9. Specifically, every non-constant solution is a straight-line solution in the direction of the eigenvector $[-1\ 1]^T$ that is drawn toward an equilibrium point that lies on the line in the direction of the eigenvector $[1\ 3]^T$ corresponding to the zero eigenvalue. The flows in figure 3.9, as well as the long-term behavior of the function $e^{-4t}$ in the general solution $\mathbf{x}(t)$, clearly demonstrate that every equilibrium solution to the system is stable. Moreover, we say that each such equilibrium point is an attracting node.

Figure 3.9 The direction field for the system $\mathbf{x}' = A\mathbf{x}$ of example 3.4.4 along with several trajectories.

There are two important observations to make in closing. One is that we still must address the situations where $A$ lacks two real linearly independent eigenvectors; we will do so in the next section. In addition, examples 3.4.1–3.4.4 indicate that plotting a direction field is perhaps best left to a computer; however, in the case where $A$ has two real, linearly independent eigenvectors, it is a straightforward exercise to use the eigenvectors to plot the straight-line solutions by hand and to use the signs of the corresponding eigenvalues to understand the flows along the straight-line solutions. Then, it is not difficult to imagine the overall appearance of the direction field and sketch several probable trajectories by hand, thus fully understanding the graphical behavior of all solutions to the system.

3.4.1 Plotting direction fields for systems using Maple

We again use the DEtools package, and load it with the command

> with(DEtools):

To plot the direction field associated with a given system of differential equations, we first define the system itself, similar to how we defined a single differential equation in order to plot its slope field. We do this through the following command for the system with coefficient matrix $A = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}$ from example 3.4.1.

> sys := diff(x(t),t)= 3*x(t)+2*y(t), diff(y(t),t)= 2*x(t)+3*y(t);

The system of differential equations of interest is now stored in “sys”. While we typically use $x_1(t)$ and $x_2(t)$ to represent the component functions in our discussion of the theory and solution of systems, in working with Maple it is often simpler to use $x(t)$ and $y(t)$. The direction field may now be generated by the command

> DEplot([sys], [x(t),y(t)], t=-1..1, x=-4..4, y=-4..4, arrows=large, color=gray);

This command produces the output shown at left in figure 3.10. From here, it is a straightforward exercise to sketch trajectories by hand. Of course, Maple has the capacity to include trajectories that pass through any initial conditions we choose. For example, if we are interested in the initial conditions $\mathbf{x}(0) = (-2, 0)$, $(0, -2)$, $(2, 0)$, $(0, 2)$, $(0.1, 0.1)$, $(-0.1, -0.1)$, $(0.1, -0.1)$, and $(-0.1, 0.1)$, we can modify the earlier DEplot command to

> DEplot([sys], [x(t),y(t)], t=-1.6..3.6, x=-4..4, y=-4..4, arrows=large, color=gray, [[x(0)=-2,y(0)=0], [x(0)=0,y(0)=-2], [x(0)=2,y(0)=0], [x(0)=0,y(0)=2], [x(0)=0.1,y(0)=0.1], [x(0)=-0.1,y(0)=-0.1], [x(0)=0.1,y(0)=-0.1], [x(0)=-0.1,y(0)=0.1]]);

Figure 3.10 At left, the direction field for the system $\mathbf{x}' = A\mathbf{x}$ of example 3.4.1. At right, the same direction field with several trajectories.

The results of this most recent DEplot command are shown at right in figure 3.10. As always, the user can experiment with the window in which the plot is displayed: the range of x- and y-values can affect how clearly the direction field is revealed, and the range of t-values determines how much of each trajectory is plotted.

Exercises 3.4

1. Consider the system of differential equations $\mathbf{x}' = A\mathbf{x}$ given by $A = \begin{bmatrix} 2 & -1 \\ 3 & -2 \end{bmatrix}$
(a) Determine the general solution to the system $\mathbf{x}' = A\mathbf{x}$.
(b) Classify the stability of all equilibrium solutions to the system.
(c) Sketch all straight-line solutions to the system and hence plot several nonlinear trajectories in the phase plane.

2. Consider the system of differential equations $\mathbf{x}' = A\mathbf{x}$ given by $A = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}$
(a) Determine the general solution to the system $\mathbf{x}' = A\mathbf{x}$.
(b) Classify the stability of all equilibrium solutions to the system.


(c) Sketch all straight-line solutions to the system and hence plot several nonlinear trajectories in the phase plane.

3. Consider the system of differential equations $\mathbf{x}' = A\mathbf{x}$ given by $A = \begin{bmatrix} 2 & -3 \\ 2 & -3 \end{bmatrix}$
(a) Determine the general solution to the system $\mathbf{x}' = A\mathbf{x}$.
(b) Classify the stability of all equilibrium solutions to the system.
(c) Sketch all straight-line solutions to the system and hence plot several nonlinear trajectories in the phase plane.

4. Consider the system of differential equations $\mathbf{x}' = A\mathbf{x}$ given by $A = \begin{bmatrix} 0 & -2 \\ 0 & -2 \end{bmatrix}$
(a) Determine the general solution to the system $\mathbf{x}' = A\mathbf{x}$.
(b) Classify the stability of all equilibrium solutions to the system.
(c) Sketch the straight-line solutions to the system that correspond to the two linearly independent eigenvectors. Why is every solution to this system also a straight-line solution?

5. Consider the system of differential equations $\mathbf{x}' = A\mathbf{x}$ given by $A = \begin{bmatrix} 2 & -2 \\ 1 & -1 \end{bmatrix}$
(a) Determine the general solution to the system $\mathbf{x}' = A\mathbf{x}$.
(b) Classify the stability of all equilibrium solutions to the system.
(c) Why is every non-constant solution to this system also a straight-line solution? How are these straight-line solutions related to the eigenvectors of the system?

In exercises 6–9, let $\mathbf{x}(t)$ be the stated general solution to some system $\mathbf{x}' = A\mathbf{x}$. State the straight-line solutions to the system, classify the stability of the origin, and sketch some sample trajectories.

6. $\mathbf{x}(t) = c_1 e^{-2t} \begin{bmatrix} 1 \\ 3 \end{bmatrix} + c_2 e^{-5t} \begin{bmatrix} 3 \\ 1 \end{bmatrix}$

7. $\mathbf{x}(t) = c_1 e^{4t} \begin{bmatrix} 1 \\ 2 \end{bmatrix} + c_2 e^{-3t} \begin{bmatrix} -1 \\ 2 \end{bmatrix}$

8. $\mathbf{x}(t) = c_1 e^{2t} \begin{bmatrix} 2 \\ 1 \end{bmatrix} + c_2 \begin{bmatrix} 1 \\ -1 \end{bmatrix}$


9. $\mathbf{x}(t) = c_1 e^{0.1t} \begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2 e^{10t} \begin{bmatrix} -1 \\ 1 \end{bmatrix}$

10. For the system $\mathbf{x}' = A\mathbf{x}$ whose general solution is given in exercise 6, determine a possible matrix $A$ for the system. (Hint: If $A$ is a matrix with all real linearly independent eigenvectors and those eigenvectors are the columns of a matrix $P$, then $A$ satisfies the equation $AP = PD$, where $D$ is the diagonal matrix whose entries are the eigenvalues of $A$ in order corresponding to the eigenvectors in the columns of $P$.)

11. For the system $\mathbf{x}' = A\mathbf{x}$ whose general solution is given in exercise 7, determine a possible matrix $A$ for the system.

12. Consider the four systems of equations given by $\mathbf{x}' = A\mathbf{x}$ where $A$ is given by the matrices I, II, III, and IV below. Match each system with one of the four direction field plots (a), (b), (c), and (d) given below. For each, write one sentence explaining the reasoning behind your choice.

I. $A = \begin{bmatrix} 5 & 3 \\ 3 & 5 \end{bmatrix}$   II. $A = \begin{bmatrix} 2 & -4 \\ 7 & 2 \end{bmatrix}$   III. $A = \begin{bmatrix} 2 & 7 \\ 3 & -6 \end{bmatrix}$   IV. $A = \begin{bmatrix} 2 & 3 \\ -1 & 2 \end{bmatrix}$

[Direction field plots (a), (b), (c), and (d) for exercise 12, each drawn on the window $-4 \le x_1 \le 4$, $-4 \le x_2 \le 4$.]


In exercises 13–17, solve the IVP $\mathbf{x}' = A\mathbf{x}$ with the given matrix $A$ and stated initial condition.

13. $A = \begin{bmatrix} 2 & -1 \\ 3 & -2 \end{bmatrix}$, $\mathbf{x}(0) = [1\ 2]^T$

14. $A = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}$, $\mathbf{x}(0) = [-3\ 1]^T$

15. $A = \begin{bmatrix} 2 & -3 \\ 2 & -3 \end{bmatrix}$, $\mathbf{x}(0) = [1\ -2]^T$

16. $A = \begin{bmatrix} 0 & -2 \\ 0 & -2 \end{bmatrix}$, $\mathbf{x}(0) = [-2\ -2]^T$

17. $A = \begin{bmatrix} 2 & -2 \\ 1 & -1 \end{bmatrix}$, $\mathbf{x}(0) = [1\ 4]^T$

In exercises 18–22, use the standard substitution to convert the given second-order differential equation to a system of two linear first-order equations. Solve the system to hence determine the solution $y$ to the second-order equation.

18. $y'' - y' - 6y = 0$

19. $y'' - 6y' + 5y = 0$

20. $y'' + 4y' = 0$

21. $y'' + 3y' + 2y = 0$

22. $y'' + y' = 0$

3.5 When a matrix lacks two real linearly independent eigenvectors

We have seen repeatedly, both in theory and in specific examples, that when a $2 \times 2$ matrix $A$ has two real linearly independent eigenvectors, we can determine the general solution to $\mathbf{x}' = A\mathbf{x}$ and its graphical behavior. In this section, we address two remaining cases: when $A$ has a repeated eigenvalue and only one associated real linearly independent eigenvector, and when $A$ has complex eigenvalues and eigenvectors. In each case, we work through preliminary examples to discover general patterns and principles, expand these principles with appropriate theorems, and explore and discuss graphical behavior along the way.

Example 3.5.1 Consider the system of differential equations given by $\mathbf{x}' = A\mathbf{x}$ where $A = \begin{bmatrix} -2 & 1 \\ 0 & -2 \end{bmatrix}$. Compute the eigenvalues and eigenvectors of $A$ and


explain why this alone does not lead to the general solution of the system. By noting that the system is partially coupled, solve the system and determine a second real, linearly independent solution. Finally, state the general solution.

Solution. By inspection, since $A$ is a triangular matrix, we see that $\lambda = -2$ is a repeated eigenvalue of $A$ with multiplicity 2. From this, we deduce that $\mathbf{v}_1 = [1\ 0]^T$ is a corresponding eigenvector, and therefore one solution to $\mathbf{x}' = A\mathbf{x}$ is $\mathbf{x}_1 = c_1 e^{-2t} [1\ 0]^T$. However, $A$ lacks a second linearly independent eigenvector associated with $\lambda = -2$; therefore, we need to find a second real linearly independent solution to the system in order to determine the general solution to $\mathbf{x}' = A\mathbf{x}$.

In this example, we are fortunate that the system is only partially coupled and that therefore we may solve the system directly by using techniques for single differential equations from chapter 2. In particular, noting that the second equation in the system is $x_2' = -2x_2$, it follows immediately that the solution to this single differential equation is $x_2(t) = ce^{-2t}$. Substituting this result into the equation $x_1' = -2x_1 + x_2$, it remains for us to solve the single nonhomogeneous linear first-order differential equation

$$x_1' = -2x_1 + ce^{-2t}$$

Applying our understanding of such equations from section 2.3, via the integrating factor $v(t) = e^{2t}$ we know that

$$x_1(t) = \frac{1}{e^{2t}} \int e^{2t} \cdot ce^{-2t} \, dt = e^{-2t}(ct + k)$$

To summarize, with $x_1(t)$ and $x_2(t)$ as the components of $\mathbf{x}(t)$, we have found that a solution to the system is

$$\mathbf{x}(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} = \begin{bmatrix} e^{-2t}(ct + k) \\ ce^{-2t} \end{bmatrix} \tag{3.5.1}$$

If we factor this expression to write $\mathbf{x}(t)$ as a linear combination of two vectors in order to more clearly identify the role of the constants in (3.5.1), we see

$$\mathbf{x}(t) = k \begin{bmatrix} e^{-2t} \\ 0 \end{bmatrix} + c \begin{bmatrix} te^{-2t} \\ e^{-2t} \end{bmatrix} \tag{3.5.2}$$

In this form, two key observations can be made. First, each individual vector in (3.5.2) may be verified to be a solution to the given system.
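To make the first observation concrete, here is the verification for the vector in (3.5.2) that contains $te^{-2t}$, using the matrix of this example, whose equations are $x_1' = -2x_1 + x_2$ and $x_2' = -2x_2$. The name $\mathbf{w}$ is introduced here only for this check and is not notation from the text.

```latex
\mathbf{w}(t) = \begin{bmatrix} t e^{-2t} \\ e^{-2t} \end{bmatrix},
\qquad
\mathbf{w}'(t) = \begin{bmatrix} e^{-2t} - 2t e^{-2t} \\ -2 e^{-2t} \end{bmatrix},
\qquad
A\mathbf{w}(t)
= \begin{bmatrix} -2 & 1 \\ 0 & -2 \end{bmatrix}
  \begin{bmatrix} t e^{-2t} \\ e^{-2t} \end{bmatrix}
= \begin{bmatrix} -2t e^{-2t} + e^{-2t} \\ -2 e^{-2t} \end{bmatrix}
= \mathbf{w}'(t)
```

The other vector, $e^{-2t}[1\ 0]^T$, is the solution that comes directly from the eigenpair, so no separate computation is needed for it.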
Moreover, these two vectors are linearly independent. Hence, (3.5.2) is the general solution to the given system. While it is good that we were able to solve the system in example 3.5.1, it is still unclear how we will proceed in similar circumstances when neither equation in the system may be solved by techniques for single ﬁrst-order equations. That is,


if the equation for $x_1'$ involves $x_2$ and the equation for $x_2'$ involves $x_1$, but the system's matrix has only one linearly independent eigenvector, we cannot employ the approach used in example 3.5.1. However, the general form of the solution (3.5.2) can help us guess an appropriate form of the needed second linearly independent solution in the more general case.

Recall that we know that whenever $(\lambda, \mathbf{v})$ is a real eigenpair of $A$, the function $\mathbf{x}(t) = e^{\lambda t}\mathbf{v}$ is a solution to $\mathbf{x}' = A\mathbf{x}$, and moreover $\mathbf{x}(t)$ is a straight-line solution to the system. In example 3.5.1, we found that for the given matrix, which had a repeated eigenvalue and only one associated linearly independent eigenvector, the scalar function $te^{\lambda t}$ arose in the solution. If we recall that our original work with $e^{\lambda t}\mathbf{v}$ arose from guessing that a function of the form $f(t)\mathbf{v}$ was a solution to $\mathbf{x}' = A\mathbf{x}$, example 3.5.1 now suggests that in the case where we are missing an eigenvector, we consider a vector function that somehow involves the scalar function $te^{\lambda t}$ as a second linearly independent solution to $\mathbf{x}' = A\mathbf{x}$.

A closer look at (3.5.2) suggests the form of this second solution we seek. In particular, recalling that the matrix $A$ in example 3.5.1 had $\mathbf{v}_1 = [1\ 0]^T$ as the eigenvector corresponding to $\lambda = -2$, rewriting (3.5.2) reveals the role $\mathbf{v}_1$ plays in the general solution. Specifically,

$$\mathbf{x}(t) = ke^{-2t} \begin{bmatrix} 1 \\ 0 \end{bmatrix} + cte^{-2t} \begin{bmatrix} 1 \\ 0 \end{bmatrix} + ce^{-2t} \begin{bmatrix} 0 \\ 1 \end{bmatrix} \tag{3.5.3}$$

and since $\mathbf{x}_1(t) = e^{-2t}[1\ 0]^T$ is the standard solution that arises through the eigenpair, we see from (3.5.3) that the second linearly independent solution

$$\mathbf{x}_2(t) = te^{-2t} \begin{bmatrix} 1 \\ 0 \end{bmatrix} + e^{-2t} \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

has the form $te^{-2t}\mathbf{v} + e^{-2t}\mathbf{u}$, where $\mathbf{u}$ is not an eigenvector of $A$ corresponding to $\lambda = -2$. This suggests a form for the second solution when this case arises in general.

We now consider this situation for an arbitrary matrix with the appropriate properties. Let $A$ be a $2 \times 2$ matrix with a single real, repeated eigenvalue $\lambda$ with only one linearly independent eigenvector $\mathbf{v}$.
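Note specifically that we know $A\mathbf{v} = \lambda\mathbf{v}$ and $\mathbf{x}_1(t) = e^{\lambda t}\mathbf{v}$ is a solution to $\mathbf{x}' = A\mathbf{x}$. Now consider a second function

$$\mathbf{x}_2(t) = te^{\lambda t}\mathbf{v} + e^{\lambda t}\mathbf{u} \tag{3.5.4}$$

where $\mathbf{u}$ is an unknown constant vector and $(\lambda, \mathbf{v})$ remains an eigenpair of $A$. We seek conditions on $\mathbf{u}$ that will make $\mathbf{x}_2(t)$ a solution to $\mathbf{x}' = A\mathbf{x}$; as we have previously encountered in several instances, direct substitution into the differential equation reveals the constraints on $\mathbf{u}$. First, differentiating (3.5.4) gives

$$\mathbf{x}_2'(t) = (\lambda te^{\lambda t} + e^{\lambda t})\mathbf{v} + \lambda e^{\lambda t}\mathbf{u} \tag{3.5.5}$$

Next, observe that multiplying $\mathbf{x}_2(t)$ by $A$ yields

$$A\mathbf{x}_2(t) = A(te^{\lambda t}\mathbf{v} + e^{\lambda t}\mathbf{u}) = te^{\lambda t}(A\mathbf{v}) + e^{\lambda t}(A\mathbf{u}) \tag{3.5.6}$$

In order for $\mathbf{x}_2(t)$ to be a solution to $\mathbf{x}' = A\mathbf{x}$, it follows from (3.5.5) and (3.5.6) that we require the equality

$$(\lambda te^{\lambda t} + e^{\lambda t})\mathbf{v} + \lambda e^{\lambda t}\mathbf{u} = te^{\lambda t}(A\mathbf{v}) + e^{\lambda t}(A\mathbf{u}) \tag{3.5.7}$$

to hold. Using the fact that $A\mathbf{v} = \lambda\mathbf{v}$ and expanding, we find

$$\lambda te^{\lambda t}\mathbf{v} + e^{\lambda t}\mathbf{v} + \lambda e^{\lambda t}\mathbf{u} = \lambda te^{\lambda t}\mathbf{v} + e^{\lambda t}(A\mathbf{u}) \tag{3.5.8}$$

With $\lambda te^{\lambda t}\mathbf{v}$ present on both sides of (3.5.8), we can simplify the equality to

$$e^{\lambda t}\mathbf{v} + \lambda e^{\lambda t}\mathbf{u} = e^{\lambda t}(A\mathbf{u}) \tag{3.5.9}$$

Since $e^{\lambda t}$ is never zero, we observe from (3.5.9) that $\mathbf{u}$ must satisfy the equation

$$\mathbf{v} + \lambda\mathbf{u} = A\mathbf{u} \tag{3.5.10}$$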

In other words, $(A - \lambda I)\mathbf{u} = \mathbf{v}$, where (as we assumed earlier) $\mathbf{v}$ is an eigenvector of $A$ that corresponds to the eigenvalue $\lambda$. In particular, note that $\mathbf{v}$ satisfies the equation $(A - \lambda I)\mathbf{v} = \mathbf{0}$. We summarize our work above in the following theorem.

Theorem 3.5.1 If $A$ is a $2 \times 2$ matrix with repeated eigenvalue $\lambda$ and only one corresponding linearly independent eigenvector $\mathbf{v}$, then the general solution to $\mathbf{x}' = A\mathbf{x}$ is given by

$$\mathbf{x}(t) = c_1 e^{\lambda t}\mathbf{v} + c_2 e^{\lambda t}(t\mathbf{v} + \mathbf{u})$$

where $\mathbf{u}$ satisfies the equation $(A - \lambda I)\mathbf{u} = \mathbf{v}$.

The vector $\mathbf{u}$ is often called a generalized eigenvector of $A$ corresponding to $\lambda$. We now demonstrate the role of theorem 3.5.1 in the following example.

Example 3.5.2 Let $A = \begin{bmatrix} 1 & 4 \\ -1 & 5 \end{bmatrix}$ and consider the system of differential equations given by $\mathbf{x}' = A\mathbf{x}$. Find the general solution of the system, determine all equilibrium solutions to the system, and plot the direction field for the system. Include sketches of several trajectories and discuss the long-term behavior of solutions relative to the equilibrium solution(s).

Solution. We find that $A$ has a single repeated eigenvalue $\lambda = 3$ with just one corresponding linearly independent eigenvector $\mathbf{v} = [2\ 1]^T$. Thus, one linearly independent solution to $\mathbf{x}' = A\mathbf{x}$ is $\mathbf{x}_1(t) = e^{3t}\mathbf{v}$. Applying theorem 3.5.1, we determine a second linearly independent solution to the system. Specifically, we first solve the vector equation $(A - 3I)\mathbf{u} = \mathbf{v}$. To do so, we row-reduce the appropriate augmented matrix and find

$$\left[\begin{array}{cc|c} -2 & 4 & 2 \\ -1 & 2 & 1 \end{array}\right] \to \left[\begin{array}{cc|c} 1 & -2 & -1 \\ 0 & 0 & 0 \end{array}\right]$$

It follows that the vector $\mathbf{u}$ must have components $u_1$ and $u_2$ that satisfy the equation $u_1 = 2u_2 - 1$, where $u_2$ is a free variable. Since we only need one such vector $\mathbf{u}$, we choose $u_2 = 0$ and thus $u_1 = -1$. From theorem 3.5.1, it now follows that a second linearly independent solution to $\mathbf{x}' = A\mathbf{x}$ is given by the function $\mathbf{x}_2(t) = e^{3t}(t\mathbf{v} + \mathbf{u})$. In particular, the general solution to $\mathbf{x}' = A\mathbf{x}$ is

$$\mathbf{x}(t) = c_1 e^{3t} \begin{bmatrix} 2 \\ 1 \end{bmatrix} + c_2 e^{3t} \left( t \begin{bmatrix} 2 \\ 1 \end{bmatrix} + \begin{bmatrix} -1 \\ 0 \end{bmatrix} \right)$$

We note further that since $A$ is an invertible matrix, the only solution to $A\mathbf{x} = \mathbf{0}$ is $\mathbf{x} = \mathbf{0}$, so the origin is the only equilibrium solution of the system. As figure 3.11 shows, the direction field and several trajectories exhibit behavior consistent with the fact that the system has just one straight-line solution, the one that corresponds to the single linearly independent eigenvector of $A$. Note as well that since the system's only eigenvalue is positive, every non-constant solution flows away from the origin as $t \to \infty$. In example 3.5.2, the origin is therefore an unstable equilibrium solution. Because there is only one linearly independent eigenvector for the system, we call the origin a degenerate node, and in this case where $\lambda = 3 > 0$ and all the trajectories flow away from the origin, this degenerate node is also called a repelling node.

Figure 3.11 The direction field for the system $\mathbf{x}' = A\mathbf{x}$ of example 3.5.2 along with several trajectories.

We now consider an example that reveals the other possible situation that can arise when a matrix $A$ lacks two real linearly independent eigenvectors: when $A$ has no real eigenvalues and no real eigenvectors.
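Looking back at example 3.5.2, the choice of generalized eigenvector can be checked numerically. The Python sketch below confirms that $(A - 3I)\mathbf{u} = \mathbf{v}$ for $\mathbf{u} = [-1\ 0]^T$ and estimates how well $\mathbf{x}_2(t) = e^{3t}(t\mathbf{v} + \mathbf{u})$ satisfies $\mathbf{x}' = A\mathbf{x}$. The helper names `matvec`, `x2`, and `residual` are hypothetical, and the derivative is approximated by a central difference, so the residual is expected to be small rather than exactly zero.

```python
import math

A = [[1.0, 4.0], [-1.0, 5.0]]   # matrix of example 3.5.2
lam = 3.0                        # repeated eigenvalue
v = [2.0, 1.0]                   # eigenvector
u = [-1.0, 0.0]                  # generalized eigenvector chosen in the example

def matvec(M, w):
    """Product of a 2x2 matrix with a 2-vector."""
    return [M[0][0] * w[0] + M[0][1] * w[1], M[1][0] * w[0] + M[1][1] * w[1]]

def x2(t):
    """Second solution e^(lam*t) * (t*v + u) from theorem 3.5.1."""
    s = math.exp(lam * t)
    return [s * (t * v[0] + u[0]), s * (t * v[1] + u[1])]

def residual(t, h=1e-6):
    """Largest component of x2'(t) - A*x2(t), with x2' from a central difference."""
    plus, minus = x2(t + h), x2(t - h)
    deriv = [(p - m) / (2 * h) for p, m in zip(plus, minus)]
    rhs = matvec(A, x2(t))
    return max(abs(d - r) for d, r in zip(deriv, rhs))

shifted = [[A[0][0] - lam, A[0][1]], [A[1][0], A[1][1] - lam]]
chain = matvec(shifted, u)       # (A - 3I)u, which should reproduce v
```

Any other choice of the free variable $u_2$ changes $\mathbf{u}$ by a multiple of $\mathbf{v}$, and that difference is absorbed into the constant $c_1$ in the general solution.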


Example 3.5.3 Consider the system $\mathbf{x}' = A\mathbf{x}$ given by the matrix

$$A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$$

Compute the eigenvalues and eigenvectors of $A$ and explain why this does not lead directly to the general solution of the system. In addition, plot the direction field for the system to confirm these observations from a graphical perspective. Using familiarity with solutions to single differential equations and the form of the equations for the given system, determine the general solution to the system.

Solution. The eigenvalues of the matrix $A$ are computed using the characteristic equation

$$\det(A - \lambda I) = \det \begin{bmatrix} -\lambda & -1 \\ 1 & -\lambda \end{bmatrix} = \lambda^2 + 1 = 0$$

We see that $\lambda^2 = -1$, so that $\lambda = \pm i$, where $i$ is the complex number $i = \sqrt{-1}$ (see footnote 2). To determine the eigenvector associated with $\lambda = i$, we solve $(A - iI)\mathbf{v} = \mathbf{0}$. Row-reducing the appropriate matrix with complex entries just as we would a matrix with real entries, we observe

$$\left[\begin{array}{cc|c} -i & -1 & 0 \\ 1 & -i & 0 \end{array}\right] \to \left[\begin{array}{cc|c} 1 & -i & 0 \\ -i & -1 & 0 \end{array}\right] \to \left[\begin{array}{cc|c} 1 & -i & 0 \\ 0 & 0 & 0 \end{array}\right]$$

where the first step was achieved by swapping the two rows, while the last step was achieved by computing the row replacement $iR_1 + R_2 \to R_2$. It follows that any eigenvector $\mathbf{v}$ associated with $\lambda = i$ must have components $v_1$ and $v_2$ that satisfy $v_1 = iv_2$. Choosing $v_2 = 1$, we see that an eigenvector $\mathbf{v}$ corresponding to $\lambda = i$ is $\mathbf{v} = [i\ 1]^T$. Similar computations with $\lambda = -i$ show that a corresponding eigenvector is $\mathbf{v} = [-i\ 1]^T$.

While we might suggest at this point that

$$\mathbf{x}(t) = e^{it} \begin{bmatrix} i \\ 1 \end{bmatrix}$$

is a solution to $\mathbf{x}' = A\mathbf{x}$, such a solution involves the complex number $i$, and is not a real solution to the system. A plot of the direction field for the system reveals further why no real solutions arise directly from the eigenvectors. In particular, if we examine figure 3.12, the direction field and various trajectories exhibit behavior consistent with the fact that the system has no straight-line solutions due to the fact that it has no real eigenpairs: every trajectory appears to be circular.

In this example, we will suspend our work with eigenvalues and eigenvectors and see whether we can determine a solution to the system more directly.
If we examine the two equations given in the system $\mathbf{x}' = A\mathbf{x}$, we observe that we are trying to solve the two equations $x_1' = -x_2$ and $x_2' = x_1$ simultaneously. In particular, we seek two functions $x_1(t)$ and $x_2(t)$ such that the derivative of the first is the opposite of the second and the derivative of the second is the first. This is a familiar scenario encountered in calculus and we recognize that $x_1(t) = \cos t$ and $x_2(t) = \sin t$ form a pair of such functions. Further consideration reveals that the choices $x_1(t) = -\sin t$ and $x_2(t) = \cos t$ also satisfy the system. Our recent observations show that the vector functions

$$\mathbf{x}_1(t) = \begin{bmatrix} \cos t \\ \sin t \end{bmatrix} \quad \text{and} \quad \mathbf{x}_2(t) = \begin{bmatrix} -\sin t \\ \cos t \end{bmatrix}$$

each form a real solution to $\mathbf{x}' = A\mathbf{x}$; moreover, it is clear that $\mathbf{x}_1(t)$ and $\mathbf{x}_2(t)$ are not scalar multiples of one another, and thus these are two linearly independent solutions to the system. Therefore, theorem 3.3.2 implies that the general solution to the given system is

$$\mathbf{x}(t) = c_1 \begin{bmatrix} \cos t \\ \sin t \end{bmatrix} + c_2 \begin{bmatrix} -\sin t \\ \cos t \end{bmatrix} \tag{3.5.11}$$

The presence of the sine and cosine functions in the entries of $\mathbf{x}$ will also lead to the circular trajectories we expect from the direction field in figure 3.12.

Figure 3.12 The direction field for the system $\mathbf{x}' = A\mathbf{x}$ of example 3.5.3.

Footnote 2: A review of key concepts with complex numbers may be found in appendix B.

Example 3.5.3 shows several new phenomena. In every preceding example we have considered for $2 \times 2$ systems $\mathbf{x}' = A\mathbf{x}$, eigenpairs have directly provided at least one real solution to the system. But for the latest system we examined, the eigenpairs appeared not to produce any real solutions to the system at all. Moreover, for the first time in our work with linear systems, the sine and cosine


functions arose. An important question to consider at this point is whether the complex eigenpair

$$\lambda = i, \quad \mathbf{v} = \begin{bmatrix} i \\ 1 \end{bmatrix} \tag{3.5.12}$$

can be linked to the general solution that we found in (3.5.11). It turns out that the key idea lies in understanding how the exponential function $e^z$ behaves when the input $z$ is a complex number. The great Swiss mathematician Leonhard Euler (1707–1783) is credited with discovering Euler's formula, which states that for any real number $t$,

$$e^{it} = \cos t + i \sin t \tag{3.5.13}$$
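Before relying on (3.5.13), it can be reassuring to test it numerically. The short Python sketch below compares both sides at a handful of sample values of $t$ using the standard `cmath` and `math` modules; the function name `euler_gap` and the sample points are choices made here for illustration, and a numerical comparison of this kind is a sanity check, not a proof.

```python
import cmath
import math

def euler_gap(t):
    """Distance between e^(it) and cos t + i sin t for a real number t."""
    return abs(cmath.exp(1j * t) - complex(math.cos(t), math.sin(t)))

samples = [0.0, 1.0, math.pi / 2, math.pi, -2.5]
largest_gap = max(euler_gap(t) for t in samples)
```

The value of `largest_gap` is at the level of floating-point rounding, consistent with the two sides of Euler's formula agreeing.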

In exercise 14 in this section, one way to derive Euler's formula through Taylor series for the exponential and trigonometric functions is explored. For now, we will simply accept (3.5.13) and put it to use.

Using the first complex eigenpair found in example 3.5.3, let us consider the standard form of a potential solution to $\mathbf{x}' = A\mathbf{x}$, $\mathbf{x}(t) = e^{\lambda t}\mathbf{v}$, using the eigenpair identified in (3.5.12). Here, since the solution we are considering is in fact complex, we will use the notation $\mathbf{z}(t)$. Using Euler's formula and complex arithmetic, observe that

$$\mathbf{z}(t) = e^{it} \begin{bmatrix} i \\ 1 \end{bmatrix} = (\cos t + i \sin t) \begin{bmatrix} i \\ 1 \end{bmatrix} = \begin{bmatrix} i \cos t - \sin t \\ \cos t + i \sin t \end{bmatrix} \tag{3.5.14}$$

When working with complex numbers, it is often useful to identify the real and imaginary parts of the numbers. That is, for a complex number $z = a + ib$ where $a$ and $b$ are real, we call $a$ the real part of $z$, and $b$ the imaginary part of $z$. The same distinctions hold for vectors with complex entries. Considering (3.5.14), if we separate this vector into its real and imaginary parts, we may write

$$\mathbf{z}(t) = \begin{bmatrix} -\sin t \\ \cos t \end{bmatrix} + i \begin{bmatrix} \cos t \\ \sin t \end{bmatrix} \tag{3.5.15}$$

If we now compare the general solution to $\mathbf{x}' = A\mathbf{x}$ that we found in (3.5.11) to (3.5.15) above, we can make a critical observation. The two linearly independent solutions to the system seen in (3.5.11) are in fact the real and imaginary parts of the vector $\mathbf{z}(t)$ which arose from considering $\mathbf{z}(t) = e^{\lambda t}\mathbf{v}$ where $(\lambda, \mathbf{v})$ was a complex eigenpair of $A$. That this fact holds in general is our next stated theorem.

Theorem 3.5.2 If $A$ is a real $2 \times 2$ matrix with a complex eigenvalue $\lambda = a + ib$ and corresponding eigenvector $\mathbf{v} = \mathbf{p} + i\mathbf{q}$, where $a$, $b$, $\mathbf{p}$, and $\mathbf{q}$ are real, then


the real and imaginary parts of $\mathbf{z}(t) = e^{(a+bi)t}(\mathbf{p} + i\mathbf{q})$ are real linearly independent solutions to $\mathbf{x}' = A\mathbf{x}$.

We proceed to apply this result in another example involving complex eigenvalues and eigenvectors.

Example 3.5.4 Let $A = \begin{bmatrix} -1 & -2 \\ 2 & -1 \end{bmatrix}$ and consider the system of differential equations given by $\mathbf{x}' = A\mathbf{x}$. Find the general solution of the system, determine all equilibrium solutions to the system, and plot the direction field for the system. Include sketches of several trajectories and discuss the long-term behavior of solutions relative to the equilibrium solution(s).

Solution. For matrices with complex eigenvalues, Maple provides an efficient and valuable approach: the program completes the necessary complex arithmetic automatically and produces the results we need. Doing so, we find that $A$ has complex eigenvalues $\lambda = -1 \pm 2i$ with corresponding complex eigenvectors $\mathbf{v} = [\pm i\ 1]^T$. We choose one of these complex eigenpairs and consider the complex function

$$\mathbf{z}(t) = e^{(-1+2i)t} \begin{bmatrix} i \\ 1 \end{bmatrix}$$

Observe that $e^{(-1+2i)t} = e^{-t}e^{2ti}$, so by Euler's formula

$$e^{(-1+2i)t} = e^{-t}(\cos 2t + i \sin 2t)$$

Substituting this fact into $\mathbf{z}(t)$, we observe that

$$\mathbf{z}(t) = e^{-t}(\cos 2t + i \sin 2t) \begin{bmatrix} i \\ 1 \end{bmatrix} = e^{-t} \begin{bmatrix} -\sin 2t + i \cos 2t \\ \cos 2t + i \sin 2t \end{bmatrix} = e^{-t} \begin{bmatrix} -\sin 2t \\ \cos 2t \end{bmatrix} + ie^{-t} \begin{bmatrix} \cos 2t \\ \sin 2t \end{bmatrix}$$

By theorem 3.5.2, it now follows that the real and imaginary parts of $\mathbf{z}(t)$ form two real linearly independent solutions to $\mathbf{x}' = A\mathbf{x}$, and therefore the general solution to $\mathbf{x}' = A\mathbf{x}$ is

$$\mathbf{x}(t) = c_1 e^{-t} \begin{bmatrix} -\sin 2t \\ \cos 2t \end{bmatrix} + c_2 e^{-t} \begin{bmatrix} \cos 2t \\ \sin 2t \end{bmatrix} \tag{3.5.16}$$

Since $A$ is an invertible matrix, the origin is the only equilibrium solution of the system. Finally, as figure 3.13 shows, the direction field and plotted trajectories exhibit behavior consistent with the fact that the system has no real eigenvectors and therefore no straight-line solutions. Moreover, since the real part of $\lambda = -1 + 2i$ is negative, the role of $e^{-t}$ in the general solution (3.5.16) draws every solution to $\mathbf{0}$ and thus the origin is a stable equilibrium.

Figure 3.13 The direction field for the system $\mathbf{x}' = A\mathbf{x}$ of example 3.5.4 along with several trajectories.

In cases such as the one in example 3.5.4 where there are no straight-line solutions and every nonconstant solution tends to $\mathbf{0}$ as $t \to \infty$, we naturally say that $\mathbf{0}$ is a spiral sink. Note that this case corresponds to the situation where the real part of a complex eigenvalue is negative. If the real part $a$ of $\lambda = a + bi$ is positive, then we will have $e^{at}$ present in the general solution, and this will drive every solution away from the origin. We therefore call $\mathbf{0}$ a spiral source and note that this equilibrium solution is unstable. Finally, in the event that $a = 0$ in the complex eigenvalue $\lambda = a + bi$, as it was in example 3.5.3, then all nonconstant solutions will orbit the origin while being neither drawn toward nor repelled from the equilibrium solution. See, for example, figure 3.12. Such an equilibrium is called a center and is considered stable.

In our discussions in this section we have addressed the two possible cases for a $2 \times 2$ matrix $A$ which lacks two linearly independent eigenvectors. Our work extends naturally to the case of more general $n \times n$ systems where the $n \times n$ matrix $A$ may or may not have $n$ real linearly independent eigenvectors. Of course, in the case where $A$ has a full set of $n$ real linearly independent eigenvectors, the eigenpairs allow the general solution to the system to be determined. In cases where some of the eigenvalues are complex, or repeated with missing eigenvectors, we can work with each individual eigenvalue to build real linearly independent solutions in ways similar to our preceding work. Some examples are explored in the exercises that follow.
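Theorem 3.5.2 can also be sanity-checked numerically on example 3.5.4. The Python sketch below builds $\mathbf{z}(t) = e^{\lambda t}\mathbf{v}$ with complex arithmetic and tests whether its real and imaginary parts satisfy $\mathbf{x}' = A\mathbf{x}$. It relies on the fact that $\mathbf{z}'(t) = \lambda\mathbf{z}(t)$, so the derivative of the real part is the real part of $\lambda\mathbf{z}(t)$, and similarly for the imaginary part. The helper names `z`, `matvec`, `residuals`, and `norm_real_part` are hypothetical and introduced only for this check.

```python
import cmath
import math

A = [[-1.0, -2.0], [2.0, -1.0]]   # matrix of example 3.5.4
lam = complex(-1.0, 2.0)          # eigenvalue -1 + 2i
v = [1j, 1.0]                     # eigenvector [i 1]^T

def z(t):
    """Complex solution z(t) = e^(lam*t) * v."""
    s = cmath.exp(lam * t)
    return [s * v[0], s * v[1]]

def matvec(M, w):
    """Product of a 2x2 matrix with a 2-vector."""
    return [M[0][0] * w[0] + M[0][1] * w[1], M[1][0] * w[0] + M[1][1] * w[1]]

def residuals(t):
    """Largest errors in x' = Ax for x = Re z(t) and for x = Im z(t)."""
    zt = z(t)
    dz = [lam * w for w in zt]                 # z'(t) = lam * z(t)
    re, im = [w.real for w in zt], [w.imag for w in zt]
    a_re, a_im = matvec(A, re), matvec(A, im)
    err_re = max(abs(d.real - r) for d, r in zip(dz, a_re))
    err_im = max(abs(d.imag - r) for d, r in zip(dz, a_im))
    return err_re, err_im

def norm_real_part(t):
    """Length of Re z(t), which the formula for z(t) predicts to be e^(-t)."""
    zt = z(t)
    return math.hypot(zt[0].real, zt[1].real)
```

Both residuals come out at rounding level, and the length of the real part decays like $e^{-t}$, in line with the spiral-sink behavior described above.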


Table 3.1 The stability of the origin as determined by the eigenvalues of a $2 \times 2$ matrix $A$

$0 < \lambda_1 \le \lambda_2$: $\mathbf{0}$ is unstable and called a repelling node
$\lambda_1 < 0 < \lambda_2$: $\mathbf{0}$ is unstable and called a saddle
$\lambda_1 \le \lambda_2 < 0$: $\mathbf{0}$ is stable and called an attracting node
$\lambda = a \pm bi$ and $a > 0$: $\mathbf{0}$ is unstable and called a spiral source
$\lambda = a \pm bi$ and $a = 0$: $\mathbf{0}$ is stable and called a center
$\lambda = a \pm bi$ and $a < 0$: $\mathbf{0}$ is stable and called a spiral sink
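The rows of table 3.1 can be turned into a small classifier. The Python sketch below is one possible way to do so, not a method from the text: instead of computing eigenvalues explicitly, it uses the trace and determinant of $A$, since the eigenvalues are the roots of $\lambda^2 - (\operatorname{tr} A)\lambda + \det A = 0$. The function name `classify_origin` and the returned labels are choices made here. Following the table, a repeated eigenvalue is reported as a node, and the exact comparison `det == 0` assumes the entries are given exactly (for example, as integers).

```python
def classify_origin(a, b, c, d):
    """Classify the origin for x' = Ax, A = [[a, b], [c, d]], following table 3.1."""
    trace, det = a + d, a * d - b * c
    if det == 0:
        return "zero eigenvalue: outside table 3.1"
    disc = trace * trace - 4 * det
    if disc < 0:                      # complex pair with real part trace/2
        if trace > 0:
            return "spiral source (unstable)"
        if trace < 0:
            return "spiral sink (stable)"
        return "center (stable)"
    if det < 0:                       # real eigenvalues of opposite sign
        return "saddle (unstable)"
    # det > 0 and disc >= 0: real eigenvalues sharing the sign of the trace
    return "repelling node (unstable)" if trace > 0 else "attracting node (stable)"
```

For instance, `classify_origin(3, -2, 2, -2)` reports a saddle for example 3.4.3, and `classify_origin(-1, -2, 2, -1)` reports a spiral sink for example 3.5.4.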

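We close this section with a summary in table 3.1 of the stability of the origin as an equilibrium point of $\mathbf{x}' = A\mathbf{x}$ in the cases where both eigenvalues are nonzero.

Exercises 3.5

For each of exercises 1–7, the general solution $\mathbf{x}(t)$ to a homogeneous linear $2 \times 2$ system of differential equations $\mathbf{x}' = A\mathbf{x}$ is given. For each problem, sketch any straight-line solutions, classify the stability of the equilibrium solution $\mathbf{x} = \mathbf{0}$, and sketch a few trajectories that are not straight lines. Do not use a computer.

1. $\mathbf{x}(t) = c_1 e^{-2t} \begin{bmatrix} 1 \\ 2 \end{bmatrix} + c_2 e^{-3t} \begin{bmatrix} -1 \\ 2 \end{bmatrix}$

2. $\mathbf{x}(t) = c_1 e^{-2t} \begin{bmatrix} \cos t \\ \sin t \end{bmatrix} + c_2 e^{-2t} \begin{bmatrix} -\sin t \\ \cos t \end{bmatrix}$

3. $\mathbf{x}(t) = c_1 e^{2t} \begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2 e^{-t} \begin{bmatrix} -1 \\ 1 \end{bmatrix}$

4. $\mathbf{x}(t) = c_1 e^{-2t} \begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2 \begin{bmatrix} -1 \\ 1 \end{bmatrix}$

5. $\mathbf{x}(t) = c_1 \begin{bmatrix} 2\cos t \\ \sin t \end{bmatrix} + c_2 \begin{bmatrix} -\sin t \\ 2\cos t \end{bmatrix}$

6. $\mathbf{x}(t) = c_1 e^{t} \begin{bmatrix} 2\cos t \\ \sin t \end{bmatrix} + c_2 e^{3t} \begin{bmatrix} -\sin t \\ 2\cos t \end{bmatrix}$

7. $\mathbf{x}(t) = c_1 e^{2t} \begin{bmatrix} 4 \\ 1 \end{bmatrix} + c_2 e^{t} \begin{bmatrix} 1 \\ 4 \end{bmatrix}$

For each of exercises 8–13, the characteristic polynomial $p(\lambda)$ of a matrix $A$ is given. That is, the zeros of the given polynomial are the eigenvalues of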


the matrix $A$. For each, classify the stability of the origin as an equilibrium point of the system given by $\mathbf{x}' = A\mathbf{x}$.

8. $p(\lambda) = \lambda^2 - 4$

9. $p(\lambda) = \lambda^2 + 4$

10. $p(\lambda) = \lambda^2 + \lambda + 1$

11. $p(\lambda) = \lambda^2 - 10\lambda + 9$

12. $p(\lambda) = \lambda^2 - 2\lambda + 5$

13. $p(\lambda) = \lambda^2 + 3\lambda + 2$

14. Recall or look up the formulas for the Taylor series about $a = 0$ for each of the functions $e^x$, $\sin x$, and $\cos x$. Assuming that the Taylor series for $e^x$ is valid for complex numbers $x$, compute $e^{ib}$ and compare the result to the expansions for $\cos b$ and $i \sin b$ to show that

$$e^{ib} = \cos b + i \sin b$$

In addition, show that

$$e^{a+ib} = e^a(\cos b + i \sin b)$$

In exercises 15–19, a matrix $A$ is given. For each, consider the system of differential equations $\mathbf{x}' = A\mathbf{x}$ and respond to (a)–(d).
(a) Determine the general solution to the system $\mathbf{x}' = A\mathbf{x}$.
(b) Classify the stability of all equilibrium solutions to the system.
(c) How many straight-line solutions does this system of equations have? Why?
(d) Use a computer algebra system to plot the direction field for this system and sketch several trajectories by hand.

15. $A = \begin{bmatrix} 0 & -2 \\ 2 & 0 \end{bmatrix}$

16. $A = \begin{bmatrix} 2 & -3 \\ 3 & 2 \end{bmatrix}$

17. $A = \begin{bmatrix} 1 & -2 \\ 0 & -2 \end{bmatrix}$

18. $A = \begin{bmatrix} -4 & 5 \\ -5 & 4 \end{bmatrix}$

19. $A = \begin{bmatrix} 7 & -1 \\ 4 & 11 \end{bmatrix}$


In exercises 20–24, solve the IVP given by $\mathbf{x}' = A\mathbf{x}$ and the stated initial condition.

20. $A = \begin{bmatrix} 0 & -2 \\ 2 & 0 \end{bmatrix}$, $\mathbf{x}(0) = [1\ 3]^T$

21. $A = \begin{bmatrix} 2 & -3 \\ 3 & 2 \end{bmatrix}$, $\mathbf{x}(0) = [-3\ 1]^T$

22. $A = \begin{bmatrix} 1 & -2 \\ 0 & -2 \end{bmatrix}$, $\mathbf{x}(0) = [2\ -2]^T$

23. $A = \begin{bmatrix} -4 & 5 \\ -5 & 4 \end{bmatrix}$, $\mathbf{x}(0) = [-2\ -3]^T$

24. $A = \begin{bmatrix} 7 & -1 \\ 4 & 11 \end{bmatrix}$, $\mathbf{x}(0) = [0\ 5]^T$

25. Consider the system of differential equations $\mathbf{x}' = A\mathbf{x}$ given by

$$A = \begin{bmatrix} 3 & 1 & -1 \\ 1 & 3 & 1 \\ -1 & 1 & 3 \end{bmatrix}$$

(a) Determine the general solution to the system $\mathbf{x}' = A\mathbf{x}$.
(b) Classify the stability of all equilibrium solutions to the system.
(c) How many straight-line solutions does this system of equations have? Why?

26. Repeat exercise 25 using the matrix

$$A = \begin{bmatrix} 0 & 3/2 & -1/2 \\ -1 & -3/2 & 3/2 \\ -1 & 1/2 & -1/2 \end{bmatrix}$$

27. Explain why every $3 \times 3$ homogeneous linear system of differential equations of the form $\mathbf{x}' = A\mathbf{x}$ must always have at least one straight-line solution. Must every $4 \times 4$ system have at least one straight-line solution? Explain. What can you say about any $n \times n$ homogeneous linear system?

In exercises 28–32, use the standard substitution to convert the given second-order differential equation to a system of two linear first-order equations. Solve the system to hence determine the solution $y$ to the second-order equation.

28. $y'' + y' - 6y = 0$

29. $y'' + 2y' + 5y = 0$

30. $y'' + 4y = 0$

31. $y'' + 3y' - 28y = 0$

32. $y'' + y' + 1 = 0$


Linear systems of differential equations

3.6 Nonhomogeneous systems: undetermined coefﬁcients

So far in our studies of systems of linear differential equations, we have focused almost exclusively on the case where the system is homogeneous and can be represented in the form $\mathbf{x}' = A\mathbf{x}$. We now begin to investigate nonhomogeneous systems, which are systems of the form $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$ where $\mathbf{b} \neq \mathbf{0}$. In section 3.1, we encountered a system of two tanks where we were interested in the amount of salt in each tank at time t. With the amount of salt in the two tanks represented respectively by $x_1(t)$ and $x_2(t)$, we saw that these component functions had to satisfy the system of differential equations given by
$$\mathbf{x}' = \begin{bmatrix} -1/20 & 1/80 \\ 1/40 & -1/40 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} 20 \\ 35 \end{bmatrix} \tag{3.6.1}$$
and that this system is naturally represented in the form
$$\mathbf{x}' = A\mathbf{x} + \mathbf{b} \tag{3.6.2}$$

In our most recent work with the homogeneous equation $\mathbf{x}' = A\mathbf{x}$, we noted several times the analogy to solving the single first-order differential equation $x' = ax$. In particular, we observed the key role that $e^{\lambda t}$ plays in the process of solving homogeneous systems of equations, much like $e^{at}$ does in the solution of a single homogeneous linear first-order equation. We next naturally consider the linear first-order analog of (3.6.2), a nonhomogeneous equation such as
$$y' = 2y + 5 \tag{3.6.3}$$
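As a brief worked illustration (a sketch added here, using only the constant-solution idea that the section develops below for systems), equation (3.6.3) can be solved by pairing a constant particular solution with the homogeneous solution; the symbol $C$ denotes an arbitrary constant.

```latex
% Constant particular solution: set y_p' = 0 in y' = 2y + 5
0 = 2y_p + 5 \quad\Longrightarrow\quad y_p = -\tfrac{5}{2}
% Homogeneous equation y' = 2y
y_h = Ce^{2t}
% Sum of the two, and a check by substitution
y = Ce^{2t} - \tfrac{5}{2}, \qquad
y' = 2Ce^{2t} = 2\left(Ce^{2t} - \tfrac{5}{2}\right) + 5 = 2y + 5
```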

In section 2.3, we observed in theorem 2.3.3 that for any linear first-order differential equation of the form $y' + p(t)y = f(t)$, if $y_p$ is any solution to the nonhomogeneous equation and $y_h$ is a solution to the corresponding homogeneous equation, then $y = y_p + y_h$ is a solution to the nonhomogeneous equation. In our studies of linear algebra in chapter 1, we made a similar observation in section 1.5: if we have a solution $\mathbf{x}_p$ to the nonhomogeneous equation $A\mathbf{x} = \mathbf{b}$, and we add to $\mathbf{x}_p$ any solution $\mathbf{x}_h$ to the homogeneous equation $A\mathbf{x} = \mathbf{0}$, the result ($\mathbf{x} = \mathbf{x}_p + \mathbf{x}_h$) is also a solution to $A\mathbf{x} = \mathbf{b}$. See (1.5.1) to revisit the details of this discussion. Note that in this purely linear algebra context, $\mathbf{x}$ is a vector whose entries are constant.

These two observations, for linear first-order differential equations and for systems of linear algebraic equations, are now applied to the nonhomogeneous system of linear first-order differential equations $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$. We note specifically that in this context, $\mathbf{x}(t)$ is a function of t. Let's return to the known situation of the homogeneous system $\mathbf{x}' = A\mathbf{x}$ and denote its solution by $\mathbf{x}_h(t)$. In addition, suppose we are able to determine a single solution $\mathbf{x}_p(t)$ to the nonhomogeneous equation $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$. We claim that the function $\mathbf{x}(t) = \mathbf{x}_h(t) + \mathbf{x}_p(t)$ is the general solution of the nonhomogeneous equation. To see this, we substitute directly into $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$ and verify that the equation is satisfied. By properties of linearity, observe that
$$\mathbf{x}'(t) = \mathbf{x}_h'(t) + \mathbf{x}_p'(t) \tag{3.6.4}$$
and furthermore
$$A\mathbf{x} + \mathbf{b} = A(\mathbf{x}_h + \mathbf{x}_p) + \mathbf{b} = A\mathbf{x}_h + A\mathbf{x}_p + \mathbf{b} \tag{3.6.5}$$
By how we defined $\mathbf{x}_h(t)$ and $\mathbf{x}_p(t)$, we know that $\mathbf{x}_h'(t) = A\mathbf{x}_h(t)$ and $\mathbf{x}_p'(t) = A\mathbf{x}_p(t) + \mathbf{b}$, and thus (3.6.5) implies
$$A\mathbf{x} + \mathbf{b} = \mathbf{x}_h'(t) + \mathbf{x}_p'(t) \tag{3.6.6}$$

From (3.6.4) and (3.6.6), we see that $\mathbf{x} = \mathbf{x}_h + \mathbf{x}_p$ is indeed a solution to $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$. In fact, we have found the general solution to the nonhomogeneous system, as stated in the following theorem.

Theorem 3.6.1 Let A be an n × n matrix with constant coefficients. If $\mathbf{x}_h$ is the general solution to the homogeneous system $\mathbf{x}' = A\mathbf{x}$ and $\mathbf{x}_p$ is any solution to the nonhomogeneous system $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$, then $\mathbf{x} = \mathbf{x}_h + \mathbf{x}_p$ is the general solution to $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$.

Theorem 3.6.1 provides an approach that will guide us throughout our efforts to solve nonhomogeneous systems of differential equations. First, we solve the associated homogeneous system to find $\mathbf{x}_h$, a process we are familiar with. We usually call $\mathbf{x}_h$ the complementary solution to the equation $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$. Next, we must find a so-called particular solution $\mathbf{x}_p$ to the nonhomogeneous system $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$. Although a more sophisticated approach will be introduced in the next section, for now we will investigate a few examples in which the process of finding such a particular solution $\mathbf{x}_p$ is relatively straightforward.

Example 3.6.1 From the system of two tanks discussed in sections 1.1 and 3.1, consider the nonhomogeneous system of linear differential equations given by
$$\mathbf{x}' = \begin{bmatrix} -1/20 & 1/80 \\ 1/40 & -1/40 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 20 \\ 35 \end{bmatrix} \tag{3.6.7}$$
By solving the associated homogeneous system and determining a particular solution to the nonhomogeneous system, find the general solution to the given system. In addition, plot an appropriate direction field and discuss the long-term behavior of solutions and their meaning in the context of the salt in each tank. Determine and sketch the solution to the IVP with initial condition $\mathbf{x}(0) = [2000 \;\; 1000]^T$.


Solution. We begin by solving $\mathbf{x}' = A\mathbf{x}$, where
$$A = \begin{bmatrix} -1/20 & 1/80 \\ 1/40 & -1/40 \end{bmatrix}$$
The eigenvalues of A are approximately $\lambda_1 = -0.0158$ and $\lambda_2 = -0.0592$ (exactly, $\lambda = (-3 \pm \sqrt{3})/80$), with corresponding eigenvectors approximated by $\mathbf{v}_1 = [0.366 \;\; 1.000]^T$ and $\mathbf{v}_2 = [-1.366 \;\; 1.000]^T$. It follows that the general solution $\mathbf{x}_h$ is
$$\mathbf{x}_h(t) = c_1 e^{-0.0158t} \begin{bmatrix} 0.366 \\ 1.000 \end{bmatrix} + c_2 e^{-0.0592t} \begin{bmatrix} -1.366 \\ 1.000 \end{bmatrix}$$
Next, we must determine a particular solution $\mathbf{x}_p$ to the nonhomogeneous equation $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$. In this example, $\mathbf{b}$ is a constant vector. Therefore, it is natural to guess that a constant vector $\mathbf{x}_p$ will satisfy the nonhomogeneous equation. More than this, we should recall from earlier discussions of the problem leading to the given system that the vector $\mathbf{x}$ represents the amounts of salt in two connected tanks as streams of inflow deliver salt, each at a constant rate. Our intuition suggests that over time the two tanks should approach a stable equilibrium, and hence an equilibrium (and therefore constant) solution should be present. Therefore, we assume that $\mathbf{x}_p$ is a constant vector and observe that this immediately implies that $\mathbf{x}_p' = \mathbf{0}$. Substituting into $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$, it follows that $\mathbf{x}_p$ must satisfy the system of linear equations $\mathbf{0} = A\mathbf{x}_p + \mathbf{b}$, or $A\mathbf{x}_p = -\mathbf{b}$. With the given entries of A and $\mathbf{b}$, this leads us to row reduce the appropriate augmented matrix and find that
$$\left[\begin{array}{cc|c} -1/20 & 1/80 & -20 \\ 1/40 & -1/40 & -35 \end{array}\right] \rightarrow \left[\begin{array}{cc|c} 1 & 0 & 1000 \\ 0 & 1 & 2400 \end{array}\right]$$
This shows $\mathbf{x}_p = [1000 \;\; 2400]^T$ is a particular solution to $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$, and, more specifically, is an equilibrium solution of the system. Moreover, it now follows that the general solution to the system is given by
$$\mathbf{x}(t) = \mathbf{x}_h(t) + \mathbf{x}_p(t) = c_1 e^{-0.0158t} \begin{bmatrix} 0.366 \\ 1.000 \end{bmatrix} + c_2 e^{-0.0592t} \begin{bmatrix} -1.366 \\ 1.000 \end{bmatrix} + \begin{bmatrix} 1000 \\ 2400 \end{bmatrix} \tag{3.6.8}$$
If we add the initial condition that $\mathbf{x}(0) = [2000 \;\; 1000]^T$, we can solve for the constants $c_1$ and $c_2$, and plot the corresponding trajectory, as shown in figure 3.14. In both (3.6.8) and figure 3.14 we can see how the long-term behavior of every solution tends to the equilibrium solution. Moreover, in the direction field we can also recognize the straight-line solutions that correspond to lines in the direction of each eigenvector but that now pass through the equilibrium solution (1000, 2400).

From example 3.6.1, we observe that in cases where we want to solve $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$ and $\mathbf{b}$ is itself a constant vector, $\mathbf{x}_p$ may be determined by assuming that $\mathbf{x}_p$ is a constant vector and solving $\mathbf{0} = A\mathbf{x}_p + \mathbf{b}$. If $\mathbf{b}$ is not constant, then the situation is more complicated, as we discover in the following example.

Figure 3.14 The direction field for the system $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$ of example 3.6.1, plotted in the $x_1 x_2$-plane, showing the equilibrium solution (1000, 2400) and the solution through (2000, 1000).
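The computations in example 3.6.1 can be cross-checked numerically. The following is a minimal sketch in Python using only the standard library; the helper name `solve_2x2` and the variable names are illustrative choices made here, not part of the text. It solves $A\mathbf{x}_p = -\mathbf{b}$ exactly, finds the eigenvalues from the characteristic polynomial, and solves for $c_1$ and $c_2$ from the initial condition.

```python
from fractions import Fraction as F
import math

# Coefficient matrix and constant forcing vector from (3.6.7)
A = [[F(-1, 20), F(1, 80)],
     [F(1, 40), F(-1, 40)]]
b = [F(20), F(35)]

def solve_2x2(M, rhs):
    """Solve the 2x2 system M z = rhs by Cramer's rule (hypothetical helper)."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    z1 = (rhs[0] * M[1][1] - M[0][1] * rhs[1]) / det
    z2 = (M[0][0] * rhs[1] - rhs[0] * M[1][0]) / det
    return [z1, z2]

# Equilibrium (constant particular) solution: 0 = A xp + b, i.e. A xp = -b
xp = solve_2x2(A, [-b[0], -b[1]])

# Eigenvalues from lambda^2 - (trace) lambda + det = 0
trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
root = math.sqrt(float(trace * trace - 4 * det))
lam1 = (float(trace) + root) / 2
lam2 = (float(trace) - root) / 2

# Eigenvectors scaled so the second entry is 1: (A - lam I) v = 0
v1 = [float(A[0][1]) / (lam1 - float(A[0][0])), 1.0]
v2 = [float(A[0][1]) / (lam2 - float(A[0][0])), 1.0]

# Constants for x(0) = [2000, 1000]^T: c1 v1 + c2 v2 = x(0) - xp
x0 = [2000.0, 1000.0]
c1, c2 = solve_2x2([[v1[0], v2[0]], [v1[1], v2[1]]],
                   [x0[0] - float(xp[0]), x0[1] - float(xp[1])])

print("xp =", xp)
print("eigenvalues:", lam1, lam2)
print("eigenvectors:", v1, v2)
print("c1, c2 =", c1, c2)
```

Both eigenvalues are negative, which is consistent with every solution tending to the equilibrium as t grows.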

Example 3.6.2 Find the general solution of the nonhomogeneous system given by
$$\mathbf{x}' = \begin{bmatrix} 2 & -1 \\ 3 & -2 \end{bmatrix} \mathbf{x} + \begin{bmatrix} \cos 2t \\ 0 \end{bmatrix} \tag{3.6.9}$$

Solution. Since the eigenvalues of $A = \begin{bmatrix} 2 & -1 \\ 3 & -2 \end{bmatrix}$ are $\lambda_1 = -1$ and $\lambda_2 = 1$ with corresponding eigenvectors $\mathbf{v}_1 = [1 \;\; 3]^T$ and $\mathbf{v}_2 = [1 \;\; 1]^T$, it follows that the complementary solution to the related homogeneous system is
$$\mathbf{x}_h = c_1 e^{-t} \begin{bmatrix} 1 \\ 3 \end{bmatrix} + c_2 e^{t} \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$
To determine the particular solution $\mathbf{x}_p$ to the given nonhomogeneous system, we need to find a vector function $\mathbf{x}(t)$ that satisfies the system (3.6.9). Due to the presence of $\cos 2t$ in the vector $\mathbf{b}$, it is natural to guess that the components of $\mathbf{x}_p$ will somehow involve $\cos 2t$. In addition, since $\mathbf{x}_p'$ plays a role in the system, we must account for the possibility that the derivative of $\cos 2t$ may also arise; moreover, since $A\mathbf{x}$ will also be computed, linear combinations of the entries in $\mathbf{x}$ will be present. Therefore, we make the reasonable guess that $\mathbf{x}_p$ has the form
$$\mathbf{x}_p = \begin{bmatrix} a \cos 2t + b \sin 2t \\ c \cos 2t + d \sin 2t \end{bmatrix} \tag{3.6.10}$$
and attempt to determine values for the undetermined coefficients a, b, c, and d that make $\mathbf{x}_p$ a solution to the system. We accomplish this by direct substitution into (3.6.9). First, observe that
$$\mathbf{x}_p' = \begin{bmatrix} -2a \sin 2t + 2b \cos 2t \\ -2c \sin 2t + 2d \cos 2t \end{bmatrix} \tag{3.6.11}$$


Now substituting (3.6.10) and (3.6.11) into (3.6.9), it follows that
$$\begin{bmatrix} -2a \sin 2t + 2b \cos 2t \\ -2c \sin 2t + 2d \cos 2t \end{bmatrix} = \begin{bmatrix} 2 & -1 \\ 3 & -2 \end{bmatrix} \begin{bmatrix} a \cos 2t + b \sin 2t \\ c \cos 2t + d \sin 2t \end{bmatrix} + \begin{bmatrix} \cos 2t \\ 0 \end{bmatrix}$$
If we now expand the matrix product and collect the terms involving $\sin 2t$ and $\cos 2t$ on the right side, we obtain
$$-2a \sin 2t + 2b \cos 2t = (2b - d) \sin 2t + (2a - c + 1) \cos 2t \tag{3.6.12}$$
$$-2c \sin 2t + 2d \cos 2t = (3b - 2d) \sin 2t + (3a - 2c) \cos 2t \tag{3.6.13}$$
In (3.6.12), we can equate the coefficients of $\sin 2t$ to find that $-2a = 2b - d$. Doing likewise for the coefficients of $\cos 2t$, $2b = 2a - c + 1$. Similarly, (3.6.13) results in the two equations $-2c = 3b - 2d$ and $2d = 3a - 2c$. Reorganizing these four equations in four unknowns, we see that a, b, c, and d must satisfy the system
$$\begin{aligned} -2a - 2b + d &= 0 \\ -2a + 2b + c &= 1 \\ -3b - 2c + 2d &= 0 \\ -3a + 2c + 2d &= 0 \end{aligned}$$
Row-reducing,
$$\left[\begin{array}{cccc|c} -2 & -2 & 0 & 1 & 0 \\ -2 & 2 & 1 & 0 & 1 \\ 0 & -3 & -2 & 2 & 0 \\ -3 & 0 & 2 & 2 & 0 \end{array}\right] \rightarrow \left[\begin{array}{cccc|c} 1 & 0 & 0 & 0 & -2/5 \\ 0 & 1 & 0 & 0 & 2/5 \\ 0 & 0 & 1 & 0 & -3/5 \\ 0 & 0 & 0 & 1 & 0 \end{array}\right]$$
which shows $a = -2/5$, $b = 2/5$, $c = -3/5$, and $d = 0$, so a particular solution to the nonhomogeneous system is

$$\mathbf{x}_p = \begin{bmatrix} -\frac{2}{5} \cos 2t + \frac{2}{5} \sin 2t \\ -\frac{3}{5} \cos 2t \end{bmatrix}$$
Finally, it follows that the general solution to the system is
$$\mathbf{x} = \mathbf{x}_h + \mathbf{x}_p = c_1 e^{-t} \begin{bmatrix} 1 \\ 3 \end{bmatrix} + c_2 e^{t} \begin{bmatrix} 1 \\ 1 \end{bmatrix} + \begin{bmatrix} -\frac{2}{5} \cos 2t + \frac{2}{5} \sin 2t \\ -\frac{3}{5} \cos 2t \end{bmatrix}$$
One lesson to take from example 3.6.2 is that while the process for trying to solve a nonhomogeneous system of differential equations is straightforward, the actual computation of a particular solution $\mathbf{x}_p$ can be quite cumbersome. Indeed, even in the case where the vector $\mathbf{b}$ is quite simple, as it is in the most recent example, tedious calculations can arise. Moreover, it is less clear how one might proceed in the situation where the vector $\mathbf{b}$ is particularly complicated. Specifically, making an appropriate guess for $\mathbf{x}_p$ may be difficult. We usually call the process of finding $\mathbf{x}_p$ through a guess involving unknown constants the method of undetermined coefficients. To gain a better sense of the guesses that are involved in using undetermined coefficients, we turn to the following example.

Example 3.6.3 For nonhomogeneous linear systems of the form $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$ where A is a matrix with constant entries, state the natural guess to use for $\mathbf{x}_p$ when the vector $\mathbf{b}$ is

(a) $\mathbf{b} = \begin{bmatrix} e^{-t} \\ 2e^{-t} \end{bmatrix}$  (b) $\mathbf{b} = \begin{bmatrix} 1 \\ t \end{bmatrix}$  (c) $\mathbf{b} = \begin{bmatrix} t^2 \\ 0 \end{bmatrix}$  (d) $\mathbf{b} = \begin{bmatrix} e^{-3t} \\ -2 \end{bmatrix}$

Solution.
(a) With $\mathbf{b} = [e^{-t} \;\; 2e^{-t}]^T$, it is natural to expect that any particular solution must involve $e^{-t}$ in its components. Specifically, we make the guess that
$$\mathbf{x}_p = \begin{bmatrix} Ae^{-t} \\ Be^{-t} \end{bmatrix}$$
and substitute directly into $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$ in order to attempt to find values of A and B for which $\mathbf{x}_p$ satisfies the given system.$^3$
(b) Given $\mathbf{b} = [1 \;\; t]^T$, we must account for the fact that $\mathbf{x}_p$ and its derivative can involve constant and linear functions of t. In particular, we suppose that
$$\mathbf{x}_p = \begin{bmatrix} At + B \\ Ct + D \end{bmatrix}$$
and substitute appropriately in an effort to determine A, B, C, and D.
(c) For $\mathbf{b} = [t^2 \;\; 0]^T$, with one quadratic term present in $\mathbf{b}$, it is necessary to include quadratic terms in each entry of $\mathbf{x}_p$. But since the derivative of $\mathbf{x}_p$ will be taken, linear terms must be included as well. Finally, once linear terms are included, for the same reason we must permit the possibility that constant terms can be present in $\mathbf{x}_p$. Therefore, we guess the form
$$\mathbf{x}_p = \begin{bmatrix} At^2 + Bt + C \\ Dt^2 + Et + F \end{bmatrix}$$
(d) With $\mathbf{b} = [e^{-3t} \;\; -2]^T$ having both an exponential and a constant term present, we account for both of these scalar functions and their derivatives by assuming that
$$\mathbf{x}_p = \begin{bmatrix} Ae^{-3t} + B \\ Ce^{-3t} + D \end{bmatrix}$$

$^3$ It is possible that the guess can fail to work, in which case a modified form for $\mathbf{x}_p$ is required. One setting where this may occur is when $\lambda = -1$ is an eigenvalue of A, whereby a vector involving $e^{-t}$ already appears in the complementary solution $\mathbf{x}_h$. See exercise 8 for further investigation of this issue.
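Returning to example 3.6.2, the four equations for the undetermined coefficients can also be solved mechanically. The sketch below, in Python with exact rational arithmetic, is one way to do so; the function names `solve_linear` and `residual` are hypothetical helpers introduced here, and the unknowns are ordered (a, b, c, d) as in the example. It then checks numerically that the resulting $\mathbf{x}_p$ satisfies (3.6.9) at a sample time.

```python
from fractions import Fraction as F
import math

def solve_linear(M, rhs):
    """Gauss-Jordan elimination with exact fractions.

    Assumes M is square and nonsingular; a singular matrix makes
    the pivot search raise StopIteration.
    """
    n = len(M)
    aug = [[F(v) for v in row] + [F(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Clear the pivot column in every other row
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

# Coefficient system from example 3.6.2, unknowns ordered (a, b, c, d)
M = [[-2, -2, 0, 1],
     [-2, 2, 1, 0],
     [0, -3, -2, 2],
     [-3, 0, 2, 2]]
rhs = [0, 1, 0, 0]
a, b, c, d = solve_linear(M, rhs)

def residual(t):
    """Largest entry of |xp' - (A xp + forcing)| at time t, from (3.6.9)-(3.6.11)."""
    cs, sn = math.cos(2 * t), math.sin(2 * t)
    x1 = float(a) * cs + float(b) * sn
    x2 = float(c) * cs + float(d) * sn
    dx1 = -2 * float(a) * sn + 2 * float(b) * cs
    dx2 = -2 * float(c) * sn + 2 * float(d) * cs
    r1 = dx1 - (2 * x1 - x2 + cs)
    r2 = dx2 - (3 * x1 - 2 * x2)
    return max(abs(r1), abs(r2))

print(a, b, c, d)
print(residual(0.7))
```

Exact fractions are used so that the coefficients come out as rationals rather than rounded decimals; the residual is only zero up to floating-point rounding.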


The method of undetermined coefficients is not foolproof: it is certainly possible to guess incorrectly (as noted in the footnote related to part (a) of example 3.6.3). If our guess is incorrect, an inconsistent linear system of algebraic equations will arise, which tells us we need to modify our guess. Besides the possibility of guessing incorrectly, it can also be the case that the computations involved in determining $\mathbf{x}_p$ are very cumbersome. In the next section, we consider a different approach, one that parallels our solution of single linear first-order differential equations of the form $y' + p(t)y = f(t)$, and that provides, at least in theory, an algorithmic approach to solving any nonhomogeneous system $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$ where the matrix A has real, constant entries.

Finally, we note that the presence of nonconstant entries in the vector $\mathbf{b}$ in a nonhomogeneous system $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$ makes it impossible to plot a direction field for the system. In particular, when we sketch direction fields, we rely on the fact that regardless of time t, the direction vector $\mathbf{x}'$ to the solution curve $\mathbf{x}$ depends only on the location $(x_1, x_2)$, and not on t. When $\mathbf{b}$ is nonconstant and a function of t, this is no longer the case, and we are therefore left with only algebraic approaches to the problem. If $\mathbf{b}$ is constant, then we can generate the direction field for the system, such as the one shown in figure 3.14.

Exercises 3.6

In each of exercises 1–4, show by direct substitution that the given particular solution $\mathbf{x}_p$ is indeed a solution to the stated nonhomogeneous system of equations. Hence determine the general solution to the stated system.

1. $\mathbf{x}' = \begin{bmatrix} -1 & 3 \\ 2 & -3 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 5 \\ -1 \end{bmatrix}$, $\mathbf{x}_p = \begin{bmatrix} -4 \\ -3 \end{bmatrix}$
2. $\mathbf{x}' = \begin{bmatrix} 1 & -2 \\ -2 & 1 \end{bmatrix} \mathbf{x} + \begin{bmatrix} e^{2t} \\ 0 \end{bmatrix}$, $\mathbf{x}_p = e^{2t} \begin{bmatrix} -1/3 \\ 2/3 \end{bmatrix}$
3. $\mathbf{x}' = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \mathbf{x} + \begin{bmatrix} \sin t \\ 0 \end{bmatrix}$, $\mathbf{x}_p = \sin t \begin{bmatrix} -2/5 \\ 1/10 \end{bmatrix} + \cos t \begin{bmatrix} -3/10 \\ 1/5 \end{bmatrix}$
4. $\mathbf{x}' = \begin{bmatrix} -3 & 1 \\ 1 & -1 \end{bmatrix} \mathbf{x} + \begin{bmatrix} e^{2t} + 1 \\ 1 \end{bmatrix}$, $\mathbf{x}_p = \begin{bmatrix} 1 \\ 2 \end{bmatrix} + e^{2t} \begin{bmatrix} 3/14 \\ 1/14 \end{bmatrix}$
5. Consider the system of differential equations
$$\mathbf{x}' = \begin{bmatrix} 1 & 1 \\ 4 & 1 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 1 \\ -3 \end{bmatrix}$$
(a) Explain why it is reasonable to assume that $\mathbf{x}_p$ is a constant vector, and use this assumption to determine a particular solution to the given nonhomogeneous system.
(b) Determine the complementary solution $\mathbf{x}_h$ to the associated homogeneous system, $\mathbf{x}' = A\mathbf{x}$.
(c) State the general solution to the system.
(d) Is there an equilibrium solution to this system? If so, is it stable? Explain.


6. Consider the system of differential equations
$$\mathbf{x}' = \begin{bmatrix} 1 & 1 \\ 4 & 1 \end{bmatrix} \mathbf{x} + \begin{bmatrix} e^{4t} \\ 0 \end{bmatrix}$$
(a) Explain why it is reasonable to assume that $\mathbf{x}_p$ is a vector of the form
$$\mathbf{x}_p = \begin{bmatrix} ae^{4t} \\ be^{4t} \end{bmatrix}$$
Then use this assumption to determine a particular solution to the given nonhomogeneous system.
(b) Determine the complementary solution $\mathbf{x}_h$ to the associated homogeneous system, $\mathbf{x}' = A\mathbf{x}$.
(c) State the general solution to the system.
7. Consider the system of differential equations
$$\mathbf{x}' = \begin{bmatrix} 1 & 1 \\ 4 & 1 \end{bmatrix} \mathbf{x} + \begin{bmatrix} e^{-2t} + 1 \\ 2e^{-2t} + 3 \end{bmatrix}$$
(a) Explain why it is reasonable to assume that $\mathbf{x}_p$ is a vector of the form
$$\mathbf{x}_p = \begin{bmatrix} ae^{-2t} + b \\ ce^{-2t} + d \end{bmatrix}$$
Use this assumption to determine a particular solution to the given nonhomogeneous system.
(b) Determine the complementary solution $\mathbf{x}_h$ to the associated homogeneous system, $\mathbf{x}' = A\mathbf{x}$.
(c) State the general solution to the system.
8. Consider the system of differential equations
$$\mathbf{x}' = \begin{bmatrix} 1 & 1 \\ 4 & 1 \end{bmatrix} \mathbf{x} + \begin{bmatrix} e^{-t} \\ 0 \end{bmatrix}$$
(a) Explain why it is reasonable to assume that $\mathbf{x}_p$ is a vector of the form
$$\mathbf{x}_p = \begin{bmatrix} ae^{-t} \\ be^{-t} \end{bmatrix}$$
(b) Show that the form of $\mathbf{x}_p$ above does not result in a particular solution to the system.
(c) By assuming that $\mathbf{x}_p$ is a vector of the form
$$\mathbf{x}_p = \begin{bmatrix} ae^{-t} + bte^{-t} \\ ce^{-t} + dte^{-t} \end{bmatrix}$$
determine a particular solution to the given nonhomogeneous system.
(d) Determine the complementary solution $\mathbf{x}_h$ to the associated homogeneous system, $\mathbf{x}' = A\mathbf{x}$.
(e) State the general solution to the system.


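For the nonhomogeneous linear systems of differential equations given in exercises 9–17, (a) determine a particular solution $\mathbf{x}_p$ by making an appropriate assumption about the form of $\mathbf{x}_p$, (b) determine the complementary solution $\mathbf{x}_h$ to $\mathbf{x}' = A\mathbf{x}$, and (c) hence state the general solution to the system.

9. $\mathbf{x}' = \begin{bmatrix} -4 & 5 \\ 5 & -4 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 1 \\ -1 \end{bmatrix}$
10. $\mathbf{x}' = \begin{bmatrix} -4 & 5 \\ 5 & -4 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 3e^{-2t} \\ -e^{-2t} \end{bmatrix}$
11. $\mathbf{x}' = \begin{bmatrix} -1 & 1 \\ 0 & 1 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 3e^{-2t} \\ -4e^{-2t} \end{bmatrix}$
12. $\mathbf{x}' = \begin{bmatrix} -1 & 1 \\ 0 & 1 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 2 \\ -5 \end{bmatrix}$
13. $\mathbf{x}' = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 3 \\ -2 \end{bmatrix}$
14. $\mathbf{x}' = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \mathbf{x} + \begin{bmatrix} e^{t} \\ -2e^{t} \end{bmatrix}$
15. $\mathbf{x}' = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 3 + e^{t} \\ -2 - 2e^{t} \end{bmatrix}$
16. $\mathbf{x}' = \begin{bmatrix} 2 & -1 \\ 3 & -2 \end{bmatrix} \mathbf{x} + \begin{bmatrix} t - 2 \\ 3t - 4 \end{bmatrix}$
17. $\mathbf{x}' = \begin{bmatrix} 2 & -1 \\ 3 & -2 \end{bmatrix} \mathbf{x} + \begin{bmatrix} \cos 3t \\ 4 \end{bmatrix}$
18. For the system of differential equations given in exercise 10, solve the IVP with initial condition $\mathbf{x}(0) = [1 \;\; -2]^T$.
19. For the system of differential equations given in exercise 11, solve the IVP with initial condition $\mathbf{x}(0) = [-3 \;\; -2]^T$.
20. For the system of differential equations given in exercise 14, solve the IVP with initial condition $\mathbf{x}(0) = [0 \;\; 4]^T$.
21. For the system of differential equations given in exercise 15, solve the IVP with initial condition $\mathbf{x}(0) = [1 \;\; -2]^T$.
22. Without actually computing $\mathbf{x}_p$, choose and justify the form you would guess for a particular solution to
$$\mathbf{x}' = \begin{bmatrix} -4 & 5 \\ 5 & -4 \end{bmatrix} \mathbf{x} + e^{-2t} \sin t \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$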


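23. Without actually computing $\mathbf{x}_p$, choose and justify the form you would guess for a particular solution to
$$\mathbf{x}' = \begin{bmatrix} -4 & 5 \\ 5 & -4 \end{bmatrix} \mathbf{x} + \begin{bmatrix} \sin 3t \\ \cos 2t \end{bmatrix}$$
24. Suppose that $\mathbf{x}_1(t)$ and $\mathbf{x}_2(t)$ are solutions of $\mathbf{x}' = A\mathbf{x} + \mathbf{f}_1(t)$ and $\mathbf{x}' = A\mathbf{x} + \mathbf{f}_2(t)$, respectively. Show that $\mathbf{x}(t) = \mathbf{x}_1(t) + \mathbf{x}_2(t)$ is a solution of $\mathbf{x}' = A\mathbf{x} + \mathbf{f}_1(t) + \mathbf{f}_2(t)$.

3.7 Nonhomogeneous systems: variation of parameters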

In section 3.6, we discovered that solving the nonhomogeneous linear system $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$ requires us to find one particular solution $\mathbf{x}_p$ to the nonhomogeneous system. We then combine this particular solution with the complementary solution $\mathbf{x}_h$, the general solution to the corresponding homogeneous system $\mathbf{x}' = A\mathbf{x}$. While we were able to successfully solve a range of problems, the method of undetermined coefficients is somewhat dissatisfying: essentially we made an educated guess as to the form that $\mathbf{x}_p$ should take, and then substituted to see if our guess was appropriate and resulted in a particular solution. As was shown in exercise 8 in section 3.6, there are instances when the obvious guess fails to work and additional investigation of a possible solution $\mathbf{x}_p$ is needed. Moreover, with undetermined coefficients we only considered functions $\mathbf{b}(t)$ whose entries were polynomial, sinusoidal, or exponential in nature. We desire a more systematic approach to finding $\mathbf{x}_p$; developing such a method is the purpose of this section.

In section 2.3, we learned that for any linear first-order differential equation of the form $y' + p(t)y = f(t)$, the solution y is given by
$$y = e^{-P(t)} \int e^{P(t)} f(t)\, dt \tag{3.7.1}$$
where $P(t) = \int p(t)\, dt$. We now seek to establish a similar result for the case of systems of the form $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$, where A is an n × n matrix with constant entries and $\mathbf{b}$ is a vector function of t.

Let us first consider the form of the general solution $\mathbf{x}_h$ to the corresponding homogeneous system. Recall that $\mathbf{x} = c_1 \mathbf{x}_1 + \cdots + c_n \mathbf{x}_n$, where $\{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$ is a set of n linearly independent solutions to $\mathbf{x}' = A\mathbf{x}$. Being more explicit about the vectors present, say with entries $x_{ij}(t)$, we can rewrite $\mathbf{x} = c_1 \mathbf{x}_1 + \cdots + c_n \mathbf{x}_n$ as
$$\mathbf{x} = c_1 \begin{bmatrix} x_{11} \\ x_{21} \\ \vdots \\ x_{n1} \end{bmatrix} + c_2 \begin{bmatrix} x_{12} \\ x_{22} \\ \vdots \\ x_{n2} \end{bmatrix} + \cdots + c_n \begin{bmatrix} x_{1n} \\ x_{2n} \\ \vdots \\ x_{nn} \end{bmatrix} = \begin{bmatrix} c_1 x_{11} + c_2 x_{12} + \cdots + c_n x_{1n} \\ c_1 x_{21} + c_2 x_{22} + \cdots + c_n x_{2n} \\ \vdots \\ c_1 x_{n1} + c_2 x_{n2} + \cdots + c_n x_{nn} \end{bmatrix}$$


Now observe that the right side of the above equation (the overall vector formulation of $\mathbf{x}$) can be expressed as a matrix product. In particular, we write
$$\mathbf{x} = \Phi \mathbf{C} \tag{3.7.2}$$
where $\mathbf{C}$ is the vector whose entries are the arbitrary constants $c_1, \ldots, c_n$ that arise in the formulation of the general solution $\mathbf{x}$, and $\Phi(t)$ is the matrix whose columns are the n linearly independent solutions to $\mathbf{x}' = A\mathbf{x}$. We call $\Phi(t)$ the fundamental solution matrix of the system.

At this point, it is essential to make two observations about $\Phi(t)$. The first is that $\Phi(t)$ is nonsingular for every relevant value of t. This holds because the columns of $\Phi(t)$ are linearly independent since, by definition, they are linearly independent solutions of $\mathbf{x}' = A\mathbf{x}$. Second, we note that $\Phi'(t) = A\Phi(t)$. Since the derivative of $\Phi(t)$ is taken component-wise, this equation is simply the matrix way to say that each column of $\Phi(t)$ satisfies the homogeneous system of equations $\mathbf{x}' = A\mathbf{x}$.

Now, recall (3.7.2), where we expressed the complementary solution in the form $\mathbf{x}_h = \Phi(t)\mathbf{C}$. As we now seek a particular solution $\mathbf{x}_p$ to the nonhomogeneous equation, it is natural to suppose that $\mathbf{x}_p$ has the form
$$\mathbf{x}_p(t) = \Phi(t)\mathbf{u}(t) \tag{3.7.3}$$
where $\mathbf{u}(t)$ is a function yet to be determined. We now substitute this guess for $\mathbf{x}_p$ into $\mathbf{x}' = A\mathbf{x} + \mathbf{b}(t)$ to see what conditions $\mathbf{u}$ must satisfy. For ease of display, in what follows we suppress the "(t)" notation in each of the functions $\Phi$, $\mathbf{u}$, $\mathbf{u}'$, and $\mathbf{b}$. By the product rule, $\mathbf{x}_p' = (\Phi\mathbf{u})' = \Phi'\mathbf{u} + \Phi\mathbf{u}'$, and so substituting into the system $\mathbf{x}' = A\mathbf{x} + \mathbf{b}(t)$, we have
$$\Phi'\mathbf{u} + \Phi\mathbf{u}' = A\Phi\mathbf{u} + \mathbf{b} \tag{3.7.4}$$
Recalling our observation above that $\Phi' = A\Phi$, we can substitute in (3.7.4) to find
$$\Phi\mathbf{u}' + A\Phi\mathbf{u} = A\Phi\mathbf{u} + \mathbf{b} \tag{3.7.5}$$
We next subtract $A\Phi\mathbf{u}$ from both sides of (3.7.5) to deduce that
$$\Phi\mathbf{u}' = \mathbf{b} \tag{3.7.6}$$
Since we are interested in determining the unknown function $\mathbf{u}$, and we know that $\Phi$ is nonsingular, we may now write
$$\mathbf{u}' = \Phi^{-1}\mathbf{b} \tag{3.7.7}$$
and, therefore, $\mathbf{u}$ must have the form
$$\mathbf{u}(t) = \int \Phi^{-1}(t)\mathbf{b}(t)\, dt \tag{3.7.8}$$


Finally, recalling the supposition we made in (3.7.3) that $\mathbf{x}_p = \Phi\mathbf{u}$, (3.7.8) now implies
$$\mathbf{x}_p(t) = \Phi(t) \int \Phi^{-1}(t)\mathbf{b}(t)\, dt \tag{3.7.9}$$
It is remarkable how this form of $\mathbf{x}_p$ aligns with our experience with a single linear first-order differential equation and the form of its solution given by (3.7.1). We summarize our above work in the following theorem.

Theorem 3.7.1 If A is an n × n matrix with constant entries, $\Phi(t)$ is the fundamental solution matrix of the homogeneous system of differential equations $\mathbf{x}' = A\mathbf{x}$, and $\mathbf{b}(t)$ is a continuous vector function, then a particular solution $\mathbf{x}_p$ to the nonhomogeneous system $\mathbf{x}' = A\mathbf{x} + \mathbf{b}(t)$ is given by
$$\mathbf{x}_p(t) = \Phi(t) \int \Phi^{-1}(t)\mathbf{b}(t)\, dt \tag{3.7.10}$$

The approach to finding a particular solution given in theorem 3.7.1 is often called variation of parameters. We next consider an example to see theorem 3.7.1 at work.

Example 3.7.1 Find the general solution of the nonhomogeneous system given by
$$\mathbf{x}' = \begin{bmatrix} 2 & -1 \\ 3 & -2 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 0 \\ 4t \end{bmatrix}$$

Solution. From our determination of the eigenvalues and eigenvectors of the same coefficient matrix in example 3.6.2, the complementary solution is
$$\mathbf{x}_h = c_1 e^{-t} \begin{bmatrix} 1 \\ 3 \end{bmatrix} + c_2 e^{t} \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$
Therefore, the fundamental matrix is
$$\Phi(t) = \begin{bmatrix} e^{-t} & e^{t} \\ 3e^{-t} & e^{t} \end{bmatrix}$$
According to (3.7.10), we next need to compute $\Phi^{-1}$. While the inverse of this matrix of functions may be computed by row-reducing $[\Phi \mid I]$ in the usual way, because of the function coefficients in $\Phi$ it is much easier to use a shortcut for computing the inverse of a 2 × 2 matrix that we established in exercise 19 of section 1.9. Specifically, if
$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$
is an invertible matrix, then
$$A^{-1} = \frac{1}{\det(A)} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$


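Here, since $\det(\Phi) = e^{-t}e^{t} - 3e^{-t}e^{t} = -2$, it follows that
$$\Phi^{-1} = -\frac{1}{2} \begin{bmatrix} e^{t} & -e^{t} \\ -3e^{-t} & e^{-t} \end{bmatrix}$$
Thus, by (3.7.10), we now have
$$\begin{aligned} \mathbf{x}_p(t) &= \Phi(t) \int \Phi^{-1}(t)\mathbf{b}(t)\, dt \\ &= \begin{bmatrix} e^{-t} & e^{t} \\ 3e^{-t} & e^{t} \end{bmatrix} \int \begin{bmatrix} -\frac{1}{2}e^{t} & \frac{1}{2}e^{t} \\ \frac{3}{2}e^{-t} & -\frac{1}{2}e^{-t} \end{bmatrix} \begin{bmatrix} 0 \\ 4t \end{bmatrix} dt \\ &= \begin{bmatrix} e^{-t} & e^{t} \\ 3e^{-t} & e^{t} \end{bmatrix} \int \begin{bmatrix} 2te^{t} \\ -2te^{-t} \end{bmatrix} dt \end{aligned}$$
Integrating the vector function component-wise by parts and computing the subsequent matrix product,
$$\mathbf{x}_p(t) = \begin{bmatrix} e^{-t} & e^{t} \\ 3e^{-t} & e^{t} \end{bmatrix} \begin{bmatrix} 2(t - 1)e^{t} \\ 2(t + 1)e^{-t} \end{bmatrix} = \begin{bmatrix} 2(t - 1) + 2(t + 1) \\ 6(t - 1) + 2(t + 1) \end{bmatrix} = \begin{bmatrix} 4t \\ 8t - 4 \end{bmatrix}$$
Therefore, the general solution to the original nonhomogeneous system is
$$\mathbf{x} = \mathbf{x}_h + \mathbf{x}_p = c_1 e^{-t} \begin{bmatrix} 1 \\ 3 \end{bmatrix} + c_2 e^{t} \begin{bmatrix} 1 \\ 1 \end{bmatrix} + \begin{bmatrix} 4t \\ 8t - 4 \end{bmatrix}$$

Example 3.7.1 demonstrates that there are three key steps in the solution to systems of the form $\mathbf{x}' = A\mathbf{x} + \mathbf{b}(t)$. The first is solving the related homogeneous system $\mathbf{x}' = A\mathbf{x}$ to determine the fundamental solution matrix $\Phi(t)$. Next, we have to compute $\Phi^{-1}(t)$. And finally, we must integrate the vector function given by $\Phi^{-1}(t)\mathbf{b}(t)$. Since we are seeking just one particular solution $\mathbf{x}_p$, there is no need to include the arbitrary constants that arise in antidifferentiating $\Phi^{-1}(t)\mathbf{b}(t)$. We close this section with a second example that shows the computations involved when more complicated functions are present in $\mathbf{b}(t)$.

Example 3.7.2 Find the general solution of the nonhomogeneous system given by
$$\mathbf{x}' = \begin{bmatrix} 2 & -1 \\ 3 & -2 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 1/(e^{t} + 1) \\ 1 \end{bmatrix}$$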


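Solution. We first find $\mathbf{x}_h$. By finding the eigenvalues and eigenvectors of the coefficient matrix A, it is straightforward to show that
$$\mathbf{x}_h = c_1 e^{-t} \begin{bmatrix} 1 \\ 3 \end{bmatrix} + c_2 e^{t} \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$
Therefore, the fundamental solution matrix is
$$\Phi(t) = \begin{bmatrix} e^{-t} & e^{t} \\ 3e^{-t} & e^{t} \end{bmatrix}$$
Moreover, we can show that
$$\Phi^{-1}(t) = -\frac{1}{2} \begin{bmatrix} e^{t} & -e^{t} \\ -3e^{-t} & e^{-t} \end{bmatrix}$$
We are now ready to compute $\mathbf{x}_p$ and write
$$\begin{aligned} \mathbf{x}_p(t) &= \Phi(t) \int \Phi^{-1}(t)\mathbf{b}(t)\, dt \\ &= \begin{bmatrix} e^{-t} & e^{t} \\ 3e^{-t} & e^{t} \end{bmatrix} \int -\frac{1}{2} \begin{bmatrix} e^{t} & -e^{t} \\ -3e^{-t} & e^{-t} \end{bmatrix} \begin{bmatrix} 1/(e^{t} + 1) \\ 1 \end{bmatrix} dt \\ &= \begin{bmatrix} e^{-t} & e^{t} \\ 3e^{-t} & e^{t} \end{bmatrix} \int \begin{bmatrix} \dfrac{1}{2} \dfrac{e^{2t}}{e^{t} + 1} \\[2ex] \dfrac{1}{2} \dfrac{2e^{-t} - 1}{e^{t} + 1} \end{bmatrix} dt \end{aligned}$$
At this point, it is easiest to use a computer algebra system to integrate and complete our calculation of $\mathbf{x}_p$. Doing so, and then finding the required matrix product, we have
$$\begin{aligned} \mathbf{x}_p(t) &= \begin{bmatrix} e^{-t} & e^{t} \\ 3e^{-t} & e^{t} \end{bmatrix} \begin{bmatrix} \frac{1}{2}e^{t} - \frac{1}{2}\ln(e^{t} + 1) \\ -e^{-t} - \frac{3}{2}t + \frac{3}{2}\ln(e^{t} + 1) \end{bmatrix} \\ &= \begin{bmatrix} -\frac{1}{2} - \frac{1}{2}e^{-t}\ln(e^{t} + 1) - \frac{3}{2}te^{t} + \frac{3}{2}e^{t}\ln(e^{t} + 1) \\ \frac{1}{2} - \frac{3}{2}e^{-t}\ln(e^{t} + 1) - \frac{3}{2}te^{t} + \frac{3}{2}e^{t}\ln(e^{t} + 1) \end{bmatrix} \end{aligned}$$
Hence, the general solution to the given nonhomogeneous system is
$$\mathbf{x} = \mathbf{x}_h + \mathbf{x}_p = c_1 e^{-t} \begin{bmatrix} 1 \\ 3 \end{bmatrix} + c_2 e^{t} \begin{bmatrix} 1 \\ 1 \end{bmatrix} + \begin{bmatrix} -\frac{1}{2} - \frac{1}{2}e^{-t}\ln(e^{t} + 1) - \frac{3}{2}te^{t} + \frac{3}{2}e^{t}\ln(e^{t} + 1) \\ \frac{1}{2} - \frac{3}{2}e^{-t}\ln(e^{t} + 1) - \frac{3}{2}te^{t} + \frac{3}{2}e^{t}\ln(e^{t} + 1) \end{bmatrix}$$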
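Because the particular solutions produced by variation of parameters can be lengthy, it is worth checking them against the differential equation. The sketch below, in Python with the standard library only, estimates $\mathbf{x}_p'$ by a central difference and compares it with $A\mathbf{x}_p + \mathbf{b}$ for the closed forms found in examples 3.7.1 and 3.7.2. The names `residual`, `xp1`, `xp2`, `b1`, and `b2`, and the step size `h`, are choices made here for illustration; the check is numerical and approximate, not a proof.

```python
import math

# Coefficient matrix shared by examples 3.7.1 and 3.7.2
A = [[2.0, -1.0], [3.0, -2.0]]

def residual(xp, b, t, h=1e-5):
    """Largest entry of |xp'(t) - (A xp(t) + b(t))|.

    xp'(t) is approximated by a central difference with step h,
    so small nonzero values are expected even for exact solutions.
    """
    lo, hi = xp(t - h), xp(t + h)
    x, f = xp(t), b(t)
    worst = 0.0
    for i in range(2):
        deriv = (hi[i] - lo[i]) / (2 * h)
        rhs = A[i][0] * x[0] + A[i][1] * x[1] + f[i]
        worst = max(worst, abs(deriv - rhs))
    return worst

# Example 3.7.1: b = [0, 4t]^T and xp = [4t, 8t - 4]^T
def b1(t):
    return [0.0, 4.0 * t]

def xp1(t):
    return [4.0 * t, 8.0 * t - 4.0]

# Example 3.7.2: b = [1/(e^t + 1), 1]^T and the xp found in the example
def b2(t):
    return [1.0 / (math.exp(t) + 1.0), 1.0]

def xp2(t):
    e, em = math.exp(t), math.exp(-t)
    L = math.log(e + 1.0)
    return [-0.5 - 0.5 * em * L - 1.5 * t * e + 1.5 * e * L,
            0.5 - 1.5 * em * L - 1.5 * t * e + 1.5 * e * L]

for t in (-1.0, 0.0, 0.5, 2.0):
    print(t, residual(xp1, b1, t), residual(xp2, b2, t))
```

A residual far from zero at any sample time would signal an algebra or integration slip somewhere in the three steps.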


At each stage in applying variation of parameters it is essential to simplify. In particular, $\Phi^{-1}(t)$ should be simplified as much as possible before computing $\Phi^{-1}(t)\mathbf{b}(t)$, and similarly, $\int \Phi^{-1}(t)\mathbf{b}(t)\, dt$ should be simplified as much as possible before computing $\Phi(t) \int \Phi^{-1}(t)\mathbf{b}(t)\, dt$. One option, of course, is to use a computer algebra system to avoid the more tedious aspects of the computations. We offer some suggestions for how to use Maple to assist in the computations in the following subsection.

3.7.1 Applying variation of parameters using Maple

Here we address how Maple can be used to execute the computations in a problem such as the one posed in example 3.7.2, where we are interested in solving the nonhomogeneous linear system of equations given by
$$\mathbf{x}' = \begin{bmatrix} 2 & -1 \\ 3 & -2 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 1/(e^{t} + 1) \\ 1 \end{bmatrix}$$
As usual, we load the LinearAlgebra package.

> with(LinearAlgebra):

Because we already know how to find the complementary solution, we focus on determining $\mathbf{x}_p$ by variation of parameters. First, we use the complementary solution,
$$\mathbf{x}_h = c_1 e^{-t} \begin{bmatrix} 1 \\ 3 \end{bmatrix} + c_2 e^{t} \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$
to define the fundamental matrix $\Phi(t)$:

> Phi := <<exp(-t),3*exp(-t)>|<exp(t),exp(t)>>;

We next use the MatrixInverse command to find $\Phi^{-1}$ by entering

> MatrixInverse(Phi);

The resulting output is
$$\begin{bmatrix} -\dfrac{1}{2}\dfrac{1}{e^{-t}} & \dfrac{1}{2}\dfrac{1}{e^{-t}} \\[2ex] \dfrac{3}{2}\dfrac{1}{e^{t}} & -\dfrac{1}{2}\dfrac{1}{e^{t}} \end{bmatrix}$$
We can simplify this result using negative exponents; Maple can do so through the following command, through which we also store $\Phi^{-1}$ in PhiInv:

> PhiInv := simplify(MatrixInverse(Phi));

Next, in order to compute $\Phi^{-1}(t)\mathbf{b}(t)$, we must enter the function $\mathbf{b}(t)$. We enter

> b := <1/(exp(t)+1),1>;

and then

> y := simplify(PhiInv.b);

At this point, y is a 2 × 1 array that holds the vector function $\Phi^{-1}(t)\mathbf{b}(t)$. Specifically, the output for y displayed by Maple is
$$y := \begin{bmatrix} \dfrac{1}{2}\dfrac{e^{2t}}{e^{t} + 1} \\[2ex] -\dfrac{1}{2}\dfrac{e^{-t}(-2 + e^{t})}{e^{t} + 1} \end{bmatrix}$$

To access the components in y, we reference them with the commands y[1,1] and y[2,1]. In particular, since we have to integrate $\Phi^{-1}(t)\mathbf{b}(t)$ component-wise, we enter

> Y := <int(y[1,1],t),int(y[2,1],t)>;

This last command produces the output
$$Y := \begin{bmatrix} \frac{1}{2}e^{t} - \frac{1}{2}\ln(e^{t} + 1) \\ -\dfrac{1}{e^{t}} - \frac{3}{2}\ln(e^{t}) + \frac{3}{2}\ln(e^{t} + 1) \end{bmatrix}$$
and stores $\int \Phi^{-1}(t)\mathbf{b}(t)\, dt$ in Y. Note that Maple has not made the simplification $\ln(e^{t}) = t$. Finally, in order to compute $\Phi(t) \int \Phi^{-1}(t)\mathbf{b}(t)\, dt$, we need to enter Phi.Y. Of course, we again want to simplify, so we use

> simplify(Phi.Y);

which produces the output
$$\begin{bmatrix} -\frac{1}{2} - \frac{1}{2}e^{-t}\ln(e^{t} + 1) - \frac{3}{2}e^{t}\ln(e^{t}) + \frac{3}{2}e^{t}\ln(e^{t} + 1) \\ \frac{1}{2} - \frac{3}{2}e^{-t}\ln(e^{t} + 1) - \frac{3}{2}e^{t}\ln(e^{t}) + \frac{3}{2}e^{t}\ln(e^{t} + 1) \end{bmatrix}$$

This last result is the particular solution $\mathbf{x}_p$ to the original system of nonhomogeneous equations given in example 3.7.2. Note again that we can simplify $\ln(e^{t})$ to t in each component.

Exercises 3.7

1. Consider the system of differential equations given by
$$\mathbf{x}' = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 5 \\ -1 \end{bmatrix}$$
(a) Based on the form of $\mathbf{b}(t)$, make a guess and determine $\mathbf{x}_p$ by undetermined coefficients.
(b) Use variation of parameters to determine $\mathbf{x}_p$.


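2. Consider the system of differential equations given by
$$\mathbf{x}' = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix} \mathbf{x} + \begin{bmatrix} e^{2t} \\ 0 \end{bmatrix}$$
(a) Based on the form of $\mathbf{b}(t)$, make a guess and determine $\mathbf{x}_p$ by undetermined coefficients.
(b) Use variation of parameters to determine $\mathbf{x}_p$.
3. Consider the system of differential equations given by
$$\mathbf{x}' = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 3e^{t} \\ e^{t} \end{bmatrix}$$
(a) Based on the form of $\mathbf{b}(t)$, what is the natural guess for $\mathbf{x}_p$? Show that this natural guess fails to work.
(b) Compute the complementary solution $\mathbf{x}_h$ to the stated system and use its form to explain why the natural guess in (a) is not a valid one.
(c) Use variation of parameters to determine $\mathbf{x}_p$.
4. Consider the system of differential equations given by
$$\mathbf{x}' = \begin{bmatrix} 0 & 2 \\ 1 & -1 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 4 \sin t \\ 2 \sin t \end{bmatrix}$$
(a) Based on the form of $\mathbf{b}(t)$, what would be the natural guess to make for $\mathbf{x}_p$? How many undetermined coefficients would need to be computed?
(b) Use variation of parameters to determine $\mathbf{x}_p$.

In each of the exercises 5–12, determine the general solution to the given system by finding $\mathbf{x}_p$ using variation of parameters. Note that in each case, $\Phi(t)$ is given.

5. $\mathbf{x}' = \begin{bmatrix} 1 & 0 \\ -1 & 3 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 2e^{-t} \\ 1 \end{bmatrix}$, $\Phi(t) = \begin{bmatrix} 2e^{t} & 0 \\ e^{t} & e^{3t} \end{bmatrix}$
6. $\mathbf{x}' = \begin{bmatrix} 1 & 0 \\ -1 & 3 \end{bmatrix} \mathbf{x} + \begin{bmatrix} e^{3t} \\ -e^{3t} \end{bmatrix}$, $\Phi(t) = \begin{bmatrix} 2e^{t} & 0 \\ e^{t} & e^{3t} \end{bmatrix}$
7. $\mathbf{x}' = \begin{bmatrix} 1 & 0 \\ -1 & 3 \end{bmatrix} \mathbf{x} + \begin{bmatrix} \cos 2t \\ 2 \sin 2t \end{bmatrix}$, $\Phi(t) = \begin{bmatrix} 2e^{t} & 0 \\ e^{t} & e^{3t} \end{bmatrix}$
8. $\mathbf{x}' = \begin{bmatrix} 2 & 1 \\ 3 & 0 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 10t \\ 10t \end{bmatrix}$, $\Phi(t) = \begin{bmatrix} e^{3t} & -e^{-t} \\ e^{3t} & 3e^{-t} \end{bmatrix}$
9. $\mathbf{x}' = \begin{bmatrix} 2 & 1 \\ 3 & 0 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 2e^{-t} \\ 5e^{-t} \end{bmatrix}$, $\Phi(t) = \begin{bmatrix} e^{3t} & -e^{-t} \\ e^{3t} & 3e^{-t} \end{bmatrix}$
10. $\mathbf{x}' = \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} \mathbf{x} + \begin{bmatrix} e^{2t} \\ 0 \end{bmatrix}$, $\Phi(t) = \begin{bmatrix} e^{t} \cos t & e^{t} \sin t \\ -e^{t} \sin t & e^{t} \cos t \end{bmatrix}$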

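11. $\mathbf{x}' = \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 2t + 2 \\ 0 \end{bmatrix}$, $\Phi(t) = \begin{bmatrix} e^{t} \cos t & e^{t} \sin t \\ -e^{t} \sin t & e^{t} \cos t \end{bmatrix}$
12. $\mathbf{x}' = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix} \mathbf{x} + \begin{bmatrix} e^{t} \\ 1 \\ 0 \end{bmatrix}$, $\Phi(t) = \begin{bmatrix} e^{2t} & te^{2t} & 0 \\ 0 & e^{2t} & 0 \\ 0 & 0 & e^{t} \end{bmatrix}$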

3.8 Applications of linear systems

In this section, we consider three fundamental physical problems that may be modeled and studied using linear systems of differential equations.

3.8.1 Mixing problems

Through our study of the motivating example provided at the start of chapter 1 and reconsidered at the beginning of the current chapter, we have seen that mixing problems naturally lead to nonhomogeneous linear systems of differential equations. Below, we examine a slightly more complicated example.

Consider a system of three tanks connected in such a way that each of the tanks has an independent inflow that delivers salt solution to it, each has an independent outflow (drain), and each tank is connected to the other two with both outflow and inflow pipes. The relevant information about each tank is given in table 3.2. We set up a system of differential equations whose solution represents the amount of salt in each tank at time $t$ and state the system in matrix form.

Table 3.2 Saltwater mixing in three tanks A, B, and C

                                    Tank A               Tank B               Tank C
Tank volume                         50 liters            100 liters           200 liters
Rate of inflow to the tank          2 liters/min         4 liters/min         5 liters/min
Concentration of salt in inflow     0.25 g/liter         2 g/liter            0.9 g/liter
Rate of drain outflow               2 liters/min         4 liters/min         5 liters/min
Rates of outflows to other tanks    to B: 3 liters/min   to C: 1 liter/min    to A: 4 liters/min
Rates of outflows to other tanks    to C: 4 liters/min   to A: 3 liters/min   to B: 1 liter/min

For tank A, we denote the amount of salt (in grams) in the tank at time $t$ (in minutes) by $x_1(t)$. Similarly, we let $x_2(t)$ and $x_3(t)$ represent the amount of salt in tanks B and C. A careful check of the given data shows that for each tank the total rates of inflow and outflow of solution balance, so that the volume of solution in each tank is constant. For tank A, for instance, solution enters at $2 + 3 + 4 = 9$ liters/min and leaves at $2 + 3 + 4 = 9$ liters/min. From the given information on the independent inflow to the tank, we know that tank A gains salt at a rate of
\[ 0.25\ \frac{\text{g}}{\text{liter}} \cdot 2\ \frac{\text{liters}}{\text{min}} = 0.5\ \frac{\text{g}}{\text{min}} \tag{3.8.1} \]

Furthermore, tank A also gains salt from the two inflows that come from tanks B and C. Tank B contains 100 liters of solution, which flows to A at a rate of 3 liters/min with a concentration of $x_2(t)/100$ g/liter, so that salt is gained by tank A at a rate of
\[ \frac{x_2}{100}\ \frac{\text{g}}{\text{liter}} \cdot 3\ \frac{\text{liters}}{\text{min}} = \frac{3x_2}{100}\ \frac{\text{g}}{\text{min}} \tag{3.8.2} \]
Similarly, the flow from tank C to tank A results in A gaining salt at a rate of
\[ \frac{x_3}{200}\ \frac{\text{g}}{\text{liter}} \cdot 4\ \frac{\text{liters}}{\text{min}} = \frac{x_3}{50}\ \frac{\text{g}}{\text{min}} \tag{3.8.3} \]
Tank A is also losing salt through its three outflows: a drain, flow to tank B, and flow to tank C. Since the concentration of solution in tank A at time $t$ is $x_1(t)/50$ g/liter, each outflow carries this concentration of salt, doing so at respective rates of 2 liters/min, 3 liters/min, and 4 liters/min. Solution is therefore leaving tank A at a cumulative rate of 9 liters/min, so the rate at which salt is lost from tank A is
\[ \frac{x_1}{50}\ \frac{\text{g}}{\text{liter}} \cdot 9\ \frac{\text{liters}}{\text{min}} = \frac{9x_1}{50}\ \frac{\text{g}}{\text{min}} \tag{3.8.4} \]

Combining the rates of inflow and outflow in (3.8.1), (3.8.2), (3.8.3), and (3.8.4), it follows that $x_1(t)$ satisfies the differential equation
\[ x_1' = 0.5 + \frac{3x_2}{100} + \frac{4x_3}{200} - \frac{9x_1}{50} \tag{3.8.5} \]
Similar reasoning shows that $x_2(t)$ and $x_3(t)$ satisfy the differential equations
\[ x_2' = 8 + \frac{3x_1}{50} + \frac{x_3}{200} - \frac{8x_2}{100} \tag{3.8.6} \]
and
\[ x_3' = 4.5 + \frac{4x_1}{50} + \frac{x_2}{100} - \frac{10x_3}{200} \tag{3.8.7} \]
Rearranging (3.8.5), (3.8.6), and (3.8.7) and writing the system they generate in matrix form, we see
\[ \mathbf{x}' = \begin{bmatrix} -9/50 & 3/100 & 1/50 \\ 3/50 & -2/25 & 1/200 \\ 2/25 & 1/100 & -1/20 \end{bmatrix}\mathbf{x} + \begin{bmatrix} 0.5 \\ 8 \\ 4.5 \end{bmatrix} \tag{3.8.8} \]
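As a cross-check on (3.8.8), the entries of the coefficient matrix can be assembled directly from the data in table 3.2. The following is a minimal sketch using only Python's standard library; the dictionary layout and the helper names `total_in`, `total_out`, and `solve` are our own choices for illustration and do not come from the text. It confirms that the flow rates balance for each tank and solves $A\mathbf{x} = -\mathbf{b}$ exactly to find where $\mathbf{x}' = \mathbf{0}$.

```python
from fractions import Fraction as F

# Data transcribed from table 3.2 (liters, liters/min, g/liter).
volume = {"A": 50, "B": 100, "C": 200}
inflow = {"A": 2, "B": 4, "C": 5}
conc_in = {"A": F(1, 4), "B": F(2), "C": F(9, 10)}
drain = {"A": 2, "B": 4, "C": 5}
# transfer[src][dst] = flow rate from tank src to tank dst
transfer = {"A": {"B": 3, "C": 4}, "B": {"C": 1, "A": 3}, "C": {"A": 4, "B": 1}}
tanks = ["A", "B", "C"]

def total_out(t):
    """Total rate at which solution leaves tank t."""
    return drain[t] + sum(transfer[t].values())

def total_in(t):
    """Total rate at which solution enters tank t."""
    return inflow[t] + sum(transfer[s].get(t, 0) for s in tanks if s != t)

# Coefficient matrix A and forcing vector b of x' = Ax + b.
A = [[F(0)] * 3 for _ in tanks]
for i, dst in enumerate(tanks):
    for j, src in enumerate(tanks):
        if i == j:
            A[i][j] = -F(total_out(dst), volume[dst])          # salt leaving dst
        else:
            A[i][j] = F(transfer[src].get(dst, 0), volume[src])  # salt arriving from src
b = [inflow[t] * conc_in[t] for t in tanks]

def solve(M, rhs):
    """Gauss-Jordan elimination with exact fractions; assumes M is invertible."""
    n = len(M)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for r in range(n):
            if r != c:
                aug[r] = [vr - aug[r][c] * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[-1] for row in aug]

equilibrium = solve(A, [-v for v in b])   # the point where Ax + b = 0
print([str(v) for v in A[0]], [str(v) for v in b])
print([str(v) for v in equilibrium])
```

Exact fractions are used here so that the computed entries can be compared with (3.8.8) without rounding; floating-point arithmetic would serve equally well for a quick check.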


We can easily determine the equilibrium solution to the system by setting $\mathbf{x}' = \mathbf{0}$ and row-reducing the resulting linear system of equations. Doing so results in
\[ \left[\begin{array}{ccc|c} -9/50 & 3/100 & 1/50 & -0.5 \\ 3/50 & -2/25 & 1/200 & -8 \\ 2/25 & 1/100 & -1/20 & -4.5 \end{array}\right] \to \left[\begin{array}{ccc|c} 1 & 0 & 0 & 50 \\ 0 & 1 & 0 & 150 \\ 0 & 0 & 1 & 200 \end{array}\right] \]
so that $x_1 = 50$, $x_2 = 150$, $x_3 = 200$ is the only equilibrium solution to the system. In addition, the eigenpairs of the coefficient matrix $A$ are approximately $\lambda = -0.030,\ -0.204,\ -0.076$ and $\mathbf{v} = [0.203\ \ 0.346\ \ 1]^T,\ [-2.041\ \ 0.949\ \ 1]^T,\ [-0.168\ \ {-1.250}\ \ 1]^T$. Since all three eigenvalues are real and negative, we can conclude that the above equilibrium is a stable attracting node.

Moreover, we can determine the general solution to the system. The eigenvalues and eigenvectors provide us with $\mathbf{x}_h$, the complementary solution, while $\mathbf{x}_p$ is given by the equilibrium solution, so that
\[ \mathbf{x}(t) = c_1 e^{-0.030t}\begin{bmatrix} 0.203 \\ 0.346 \\ 1 \end{bmatrix} + c_2 e^{-0.204t}\begin{bmatrix} -2.041 \\ 0.949 \\ 1 \end{bmatrix} + c_3 e^{-0.076t}\begin{bmatrix} -0.168 \\ -1.250 \\ 1 \end{bmatrix} + \begin{bmatrix} 50 \\ 150 \\ 200 \end{bmatrix} \]
We conclude from this example that three connected tanks generate a natural example of a linear system of nonhomogeneous differential equations. Certainly, we can envision similar ideas being applied to more complicated scenarios, such as the spread of a pollutant through a connected chain of rivers and lakes.

3.8.2 Spring-mass systems

In section 3.1, we developed the linear second-order differential equation that governs the behavior of a spring-mass system and converted the equation to a system of two first-order equations. In particular, we learned that for a system with mass $m$, spring constant $k$, damping constant $c$, and driving force $F(t)$, the displacement $y(t)$ of the mass from its equilibrium position satisfies the DE
\[ y'' + \frac{c}{m}y' + \frac{k}{m}y = \frac{1}{m}F(t) \tag{3.8.9} \]
Moreover, using the substitution $x_1 = y$ and $x_2 = x_1' = y'$, it follows that (3.8.9) can be represented by the system
\[ \begin{aligned} x_1' &= x_2 \\ x_2' &= -\frac{k}{m}x_1 - \frac{c}{m}x_2 + \frac{1}{m}F(t) \end{aligned} \tag{3.8.10} \]
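One practical payoff of the first-order form (3.8.10) is that it can be handed directly to a numerical integrator. The sketch below is illustrative only: it uses a hand-written classical Runge-Kutta step (not a method discussed in this section), and the values $m = 1$, $k = 4$, $c = 0$, $F(t) = 0$ are hypothetical, chosen because the exact solution $y = \cos 2t$ is known and repeats after $t = \pi$.

```python
import math

def spring_mass_rhs(t, x, m, k, c, force):
    """Right-hand side of (3.8.10), where x = [x1, x2] = [y, y']."""
    x1, x2 = x
    return [x2, -(k / m) * x1 - (c / m) * x2 + force(t) / m]

def rk4(rhs, x0, t0, t1, steps):
    """Classical fourth-order Runge-Kutta integration with a fixed step size."""
    h = (t1 - t0) / steps
    t, x = t0, list(x0)
    for _ in range(steps):
        s1 = rhs(t, x)
        s2 = rhs(t + h / 2, [xi + h / 2 * si for xi, si in zip(x, s1)])
        s3 = rhs(t + h / 2, [xi + h / 2 * si for xi, si in zip(x, s2)])
        s4 = rhs(t + h, [xi + h * si for xi, si in zip(x, s3)])
        x = [xi + h / 6 * (p + 2 * q + 2 * r + s)
             for xi, p, q, r, s in zip(x, s1, s2, s3, s4)]
        t += h
    return x

# Hypothetical test case: m = 1, k = 4, no damping, no forcing,
# with y(0) = 1 and y'(0) = 0, so that y(t) = cos(2t) exactly.
m, k, c = 1.0, 4.0, 0.0

def rhs(t, x):
    return spring_mass_rhs(t, x, m, k, c, lambda s: 0.0)

y_end, v_end = rk4(rhs, [1.0, 0.0], 0.0, math.pi, 2000)
print(y_end, v_end)   # after one full period the state should be close to [1, 0]
```

Nonzero damping or a driving force can be explored by changing `c` or passing a different `force` function; the structure of the right-hand side stays the same.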


Figure 3.15 Two masses m1 and m2 joined by two springs with constants k1 and k2, at equilibrium. The distance between the masses at equilibrium is L.

Next, we consider the more complicated case of a system involving two masses and two springs, but omit damping and driving forces. In particular, suppose that a mass $m_1$ is attached to a spring with spring constant $k_1$ and that from $m_1$ a second spring with constant $k_2$ and mass $m_2$ is attached, as shown in figure 3.15. While we represent the masses with boxes, for our theoretical work we assume we are working with point-masses, where all of the mass is concentrated at a single point. We can envision these points as lying at the centers of the respective boxes in figure 3.15. To omit damping, we assume that the surface on which the masses rest is frictionless. The masses are set in motion by some collection of initial displacements and velocities, and we let $x_1(t)$ denote the displacement of $m_1$ from its equilibrium position and $x_2(t)$ the displacement of $m_2$ from its equilibrium position, as shown in figure 3.16.

We seek a system of first-order differential equations that models this situation. Note that $m_1$ has two springs attached to it, so each spring exerts a force on $m_1$. One is $F_1 = -k_1 x_1$, which is the force the first spring exerts to oppose the displacement of the first mass. Next, observe that when the system is at equilibrium, the distance between the two masses is some constant $L$. Once the system is set in motion, the distance between the two masses is $L + x_2 - x_1$. As such, the second spring is stretched a length of $x_2 - x_1$ beyond its length at equilibrium. On mass $m_1$ this exerts the force $F_2 = k_2(x_2 - x_1)$, which pulls $m_1$ toward $m_2$ whenever the second spring is stretched. On the second mass $m_2$ the only force is the one exerted by the second spring, which has the same magnitude as $F_2$ but the opposite direction. In particular, $F_3 = -k_2(x_2 - x_1)$ acts on $m_2$.

Figure 3.16 Two masses m1 and m2 and two springs displaced from equilibrium, with displacements x1 and x2.


Now, because we have omitted damping and forcing, these are the only forces acting on $m_1$ and $m_2$. Newton's second law tells us that the sum of all forces acting on an object must equal the object's mass times its acceleration. In particular, we have
\[ \begin{aligned} m_1 x_1'' &= -k_1 x_1 + k_2(x_2 - x_1) \\ m_2 x_2'' &= -k_2(x_2 - x_1) \end{aligned} \]
Dividing through by $m_1$ and $m_2$, respectively, these observations lead us to the system of linear second-order differential equations
\[ \begin{aligned} x_1'' &= -\frac{k_1}{m_1}x_1 + \frac{k_2}{m_1}(x_2 - x_1) \\ x_2'' &= -\frac{k_2}{m_2}(x_2 - x_1) \end{aligned} \tag{3.8.11} \]
To study the behavior of this system with the techniques that we have developed, we must convert each of the second-order equations to a system of two first-order equations. Before doing so, we introduce specific numerical values for the masses and spring constants to simplify our work. We let $k_1 = 2$ and $k_2 = 1$, and $m_1 = 2$ and $m_2 = 4$. This yields the system
\[ \begin{aligned} x_1'' &= -x_1 + 0.5(x_2 - x_1) \\ x_2'' &= -0.25(x_2 - x_1) \end{aligned} \tag{3.8.12} \]
Using the substitutions $y_1 = x_1$, $y_2 = y_1' = x_1'$, $y_3 = x_2$, $y_4 = y_3' = x_2'$, it follows that (3.8.12) results in the system of four first-order equations given by
\[ \begin{aligned} y_1' &= y_2 \\ y_2' &= -y_1 + 0.5(y_3 - y_1) \\ y_3' &= y_4 \\ y_4' &= -0.25(y_3 - y_1) \end{aligned} \tag{3.8.13} \]
Letting $\mathbf{y}$ be the vector $[y_1\ y_2\ y_3\ y_4]^T$, we can write (3.8.13) in matrix form,
\[ \mathbf{y}' = \begin{bmatrix} 0 & 1 & 0 & 0 \\ -1.5 & 0 & 0.5 & 0 \\ 0 & 0 & 0 & 1 \\ 0.25 & 0 & -0.25 & 0 \end{bmatrix}\mathbf{y} \tag{3.8.14} \]
From this, we can now analyze the overall behavior of the coupled spring-mass system. In particular, the eigenvalues and eigenvectors of the coefficient matrix in (3.8.14) will enable us to find the general solution $\mathbf{y}$. Given initial conditions, we can fully describe the functions $y_i(t)$, particularly $y_1$ and $y_3$, which represent the respective displacements of the masses in the system, and understand the behavior of the system over time. This problem and others like it are explored further in the exercises at the end of this section.
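The eigenvalues of the matrix in (3.8.14) can be found without a 4 × 4 computation. If $\mathbf{y} = e^{\lambda t}\mathbf{v}$ solves (3.8.14), then the displacement components satisfy the second-order system (3.8.12), which forces $\lambda^2$ to be an eigenvalue of the 2 × 2 matrix $K$ of coefficients in (3.8.12). The sketch below applies this observation using only Python's standard library; the names `K`, `mus`, `lambdas`, and `char_poly_4x4` are ours and are introduced only for illustration.

```python
import cmath
import math

# Second-order form (3.8.12): x'' = K x with k1 = 2, k2 = 1, m1 = 2, m2 = 4.
k1, k2, m1, m2 = 2.0, 1.0, 2.0, 4.0
K = [[-(k1 + k2) / m1, k2 / m1],
     [k2 / m2, -k2 / m2]]

# Eigenvalues mu of K from its characteristic polynomial mu^2 - tr*mu + det.
trace = K[0][0] + K[1][1]
det = K[0][0] * K[1][1] - K[0][1] * K[1][0]
disc = math.sqrt(trace ** 2 - 4 * det)
mus = [(trace + disc) / 2, (trace - disc) / 2]

# Each mu contributes the pair of eigenvalues +sqrt(mu), -sqrt(mu) of the 4x4 matrix.
lambdas = [sign * cmath.sqrt(mu) for mu in mus for sign in (1, -1)]

# Both mu are negative here, so every lambda is purely imaginary and the
# motion is a superposition of two oscillations with these angular frequencies.
frequencies = sorted(abs(lam.imag) for lam in lambdas[::2])

def char_poly_4x4(lam):
    """det(lam*I - M) for the matrix M in (3.8.14), which equals det(lam^2*I - K)."""
    return (lam ** 2 - K[0][0]) * (lam ** 2 - K[1][1]) - K[0][1] * K[1][0]

print(lambdas)
print(frequencies)
```

Purely imaginary eigenvalues are what one expects from an undamped, unforced system: neither mass settles down, and neither displacement grows without bound.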


3.8.3 RLC circuits

The flow of electricity through a circuit, much like the flow of water in a pipe, naturally involves relationships with rates of change. As such, the study of electrical current involves differential equations. Here, we explore some fundamental properties of electricity and how these lead to such equations.

Throughout what follows, we will make use of the analogy that the flow of charge carriers in an electrical circuit is like the flow of particles in a moving stream of water. Just as we consider flow of water in a pipe to be the number of water particles flowing past a given point during a certain time interval, the current $I(t)$ in a circuit at time $t$ is proportional to the number of positive charge carriers that move past any given point per second in the conductor. Note particularly that current measures a rate of change of charge.

Current is measured in amperes (amp), the base unit through which all other units will be defined. One ampere corresponds to approximately $6.2415 \times 10^{18}$ charge carriers per second moving past a given point. The unit of charge is the coulomb, which is the amount of charge that flows through a cross section of a wire in one second when a one amp current is flowing. In other words,
\[ 1\ \text{amp} = 1\ \text{coulomb/s} \]
Here, we begin to see how derivatives and integrals are involved in the study of electricity. The current $I(t)$ at time $t$ is by definition a rate of change of charge. Thus, by the Fundamental Theorem of Calculus, the total amount of charge that flows past a given point on a time interval $[t_0, t_1]$ is given by
\[ \int_{t_0}^{t_1} I(s)\,ds \tag{3.8.15} \]
If we let $Q(t)$ measure the total accumulated charge at a given point in the circuit from time $t_0$ up to time $t$, then we have
\[ Q(t) = Q(t_0) + \int_{t_0}^{t} I(s)\,ds \tag{3.8.16} \]
and therefore $Q'(t) = I(t)$.

As current flows through a circuit, the charge carriers and elements in the circuit exchange energy. We, therefore, define a potential function $V$ throughout a circuit. The energy (per coulomb of charge) that has been exchanged by the charge carriers as they flow from point $a$ to point $b$ is computed as $V_{ab} = V_a - V_b$, where $V_a$ and $V_b$ are the values of the potential function at points $a$ and $b$ in the circuit. The difference $V_{ab}$ is called the voltage drop from $a$ to $b$ and is measured in joules per coulomb, which are also known as volts. If we again think of the flow of water through a pipe, the concept of voltage drop is analogous to the change in water pressure between points $a$ and $b$. Batteries, for example, maintain a voltage drop between two terminals; the energy provided by a battery's internal chemicals produces a constant amount of energy per coulomb as charge carriers move throughout the battery, which raises the function $V$ by the voltage rating of the battery.

As current flows through a circuit, energy is lost. This makes the potential $V$ at one end lower than the potential at the other. A portion of a circuit, say from $a$ to $b$, over which a substantial amount of energy is lost is called a resistor. Good examples of resistors are light bulbs and heating elements, because they show how electrical energy can be converted into light and heat. The voltage drop across a resistor and the current flowing through it are modeled by Ohm's law, which says that the potential difference $V_{ab}$ between the endpoints $a$ and $b$ of a resistor is proportional to the current flowing through the resistor. In other words,
\[ V_{ab} = IR \tag{3.8.17} \]
where $R$ is a constant called the resistance. The unit of resistance is the ohm, which is equal to one volt per ampere, or one volt-second per coulomb.

A changing electrical current $I(t)$ in a segment of a circuit will create a changing magnetic field that results in a voltage drop between the ends of the segment. When this effect is large, such as in a coil between points $b$ and $c$ (the effect can be magnified by different geometrical arrangements of the circuit), the device that induces the effect is called an inductor. Faraday's law tells us what happens with the voltage drops across inductors. In particular, the voltage drop across an inductor is proportional to the rate of change of the current, or, in other words,
\[ V_{bc} = L\frac{dI}{dt} \tag{3.8.18} \]
where $L$ is a constant called the inductance. Note specifically that Faraday's law regards the rate of change of current. Inductance is measured in henries.

Finally, if a circuit is broken and we include two plates separated by an insulating material (such as air), and the terminals of the circuit are connected to a voltage source (such as a battery), then charges will build up on the plates.
In the ongoing analogy to water, this is similar to a tank used to store water to provide a source of pressure. We call the set of plates a capacitor, and speak of the total charge $Q(t)$ on the capacitor. From (3.8.16), since we know that current $I$ is the rate of change of charge $Q$, if we know an initial charge $Q(t_0)$, then given a current $I(t)$ we can find the charge $Q(t)$ by the relationship
\[ Q(t) = Q(t_0) + \int_{t_0}^{t} I(s)\,ds \tag{3.8.19} \]
Coulomb's law then states that the voltage drop $V_{cd}$ across a capacitor between points $c$ and $d$ is proportional to the charge on the capacitor, or
\[ V_{cd} = \frac{1}{C}Q(t) = \frac{1}{C}\left[ Q(t_0) + \int_{t_0}^{t} I(s)\,ds \right] \tag{3.8.20} \]
where $C$ is called the capacitance of the capacitor and is measured in farads.
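Relations (3.8.19) and (3.8.20) are easy to evaluate numerically when the current is known only as a function that can be sampled. The following is a rough sketch, assuming the composite trapezoid rule as the quadrature method (any standard rule would do); the function names and the sample current $I(t) = \cos t$ with $Q(0) = 0$ are hypothetical choices made so that the exact answer $Q(t) = \sin t$ is available for comparison.

```python
import math

def accumulated_charge(current, q0, t0, t, n=1000):
    """Approximate Q(t) = Q(t0) + integral of I(s) ds with the composite trapezoid rule."""
    h = (t - t0) / n
    total = 0.5 * (current(t0) + current(t))
    for i in range(1, n):
        total += current(t0 + i * h)
    return q0 + h * total

def capacitor_voltage(current, q0, t0, t, capacitance):
    """Voltage drop across the capacitor according to (3.8.20)."""
    return accumulated_charge(current, q0, t0, t) / capacitance

# Hypothetical current I(t) = cos t (amperes) with Q(0) = 0, so Q(t) = sin t exactly.
q = accumulated_charge(math.cos, 0.0, 0.0, math.pi / 2)
v = capacitor_voltage(math.cos, 0.0, 0.0, math.pi / 2, capacitance=0.01)
print(q, v)   # q should be close to 1 coulomb, v close to 100 volts
```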


All three of the laws (3.8.17), (3.8.18), and (3.8.20) are based on experimental observations of circuits. Similarly, Kirchhoff's law is a conservation law that tells us what we can expect for the voltage drops across various parts of a circuit. Simply stated, Kirchhoff's law says that if we pick a sequence of points in a closed circuit, then the sum of the voltage drops across the segments between consecutive points is zero. Specifically, for points $a_1, a_2, \ldots, a_n$,
\[ V_{a_1 a_2} + V_{a_2 a_3} + \cdots + V_{a_{n-1} a_n} + V_{a_n a_1} = 0 \tag{3.8.21} \]
A final necessary law for us to consider is Kirchhoff's current law, which tells us that at each point of a circuit, the sum of the currents flowing into the point equals the sum of the currents flowing out. For a simple RLC circuit with one loop, Kirchhoff's current law guarantees that we can use a single function $I(t)$ to model the current at any point at a given time $t$; for circuits with multiple loops, multiple functions $I(t)$ are needed.

Now we are prepared to see how these fundamental laws of electricity lead to a second-order differential equation, and hence a 2 × 2 system of first-order DEs. Let us consider an RLC circuit that consists of a resistor, inductor, and capacitor, along with some energy (voltage) source $E(t)$, arranged in series, as shown in figure 3.17. Kirchhoff's law leads us directly to second-order differential equations that determine the behavior of the current $I(t)$ in the circuit and the charge $Q(t)$ on the capacitor. By Ohm's law, we know that $V_{ab} = IR$. Similarly, Faraday's law implies that $V_{bc} = L\frac{dI}{dt}$ and Coulomb's law tells us that $V_{cd} = \frac{1}{C}Q(t) = \frac{1}{C}\left[ Q(t_0) + \int_{t_0}^{t} I(s)\,ds \right]$. Finally, we know from the voltage source that $V_{da} = -E(t)$. Kirchhoff's law now yields the equation $V_{ab} + V_{bc} + V_{cd} + V_{da} = 0$, or
\[ RI(t) + LI'(t) + \frac{1}{C}Q(t) = E(t) \tag{3.8.22} \]

Figure 3.17 An RLC circuit with resistance R, inductance L, capacitance C, and energy source E(t). The resistor lies between points a and b, the inductor between b and c, the capacitor between c and d, and the source between d and a.


Recalling that $Q'(t) = I(t)$, we may rewrite (3.8.22) in two different ways. If we differentiate both sides of (3.8.22), and rearrange the terms in decreasing order of derivatives, it follows immediately that the current $I(t)$ must satisfy the linear second-order differential equation
\[ LI''(t) + RI'(t) + \frac{1}{C}I(t) = E'(t) \tag{3.8.23} \]
If instead we substitute $Q'$ for $I$ in (3.8.22), then we see that $Q$ is the solution to the linear second-order differential equation
\[ LQ''(t) + RQ'(t) + \frac{1}{C}Q(t) = E(t) \tag{3.8.24} \]
We can therefore study the behaviors of different RLC circuits based on the given resistance, inductance, capacitance, and supplied voltage. Moreover, as we well know, any such linear second-order differential equation may be converted to a system of first-order equations. For example, letting $x_1 = I$ and $x_2 = I'$, we can convert (3.8.23) to the system of equations
\[ \begin{aligned} x_1' &= x_2 \\ x_2' &= -\frac{1}{CL}x_1 - \frac{R}{L}x_2 + \frac{1}{L}E'(t) \end{aligned} \]
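The conversion above is mechanical enough to write as a small helper. This is only a sketch: the function names `rlc_current_system` and `rlc_forcing` are ours, and the component values in the usage lines are hypothetical numbers chosen for easy arithmetic, not values taken from the text.

```python
def rlc_current_system(L, R, C):
    """Coefficient matrix of the first-order system for x = [I, I']."""
    return [[0.0, 1.0],
            [-1.0 / (C * L), -R / L]]

def rlc_forcing(L, dE_dt):
    """Forcing vector b(t) = [0, E'(t)/L], returned as a function of t."""
    return lambda t: [0.0, dE_dt(t) / L]

# Hypothetical component values, chosen only for illustration:
# L = 2 H, R = 6 ohms, C = 0.25 F, and E(t) = 4t volts so that E'(t) = 4.
A = rlc_current_system(L=2.0, R=6.0, C=0.25)
b = rlc_forcing(2.0, lambda t: 4.0)
print(A)        # [[0.0, 1.0], [-2.0, -3.0]]
print(b(0.0))   # [0.0, 2.0]
```

Note that the forcing term uses the derivative $E'(t)$ rather than $E(t)$ itself, because the system is built from (3.8.23) and not from (3.8.22).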

Example 3.8.1 Determine all solutions $I(t)$ for an RLC circuit when $L = 20$ H, $R = 80\ \Omega$, $C = 10^{-2}$ F, and the external voltage is given by the function $E(t) = 50 \sin 2t$.

Solution. From (3.8.23) and the given information, we can immediately determine the second-order differential equation that $I(t)$ satisfies. In particular, since $E(t) = 50 \sin 2t$, we have $E'(t) = 100 \cos 2t$, and using the values for $L$, $C$, and $R$, $I(t)$ is a solution to the equation
\[ 20I'' + 80I' + 100I = 100\cos 2t \tag{3.8.25} \]
Using the substitution $x_1 = I$ and $x_2 = I'$ and multiplying both sides of (3.8.25) by 1/20, the system becomes
\[ \begin{aligned} x_1' &= x_2 \\ x_2' &= -5x_1 - 4x_2 + 5\cos 2t \end{aligned} \]
From this, we can write the system in matrix form as
\[ \mathbf{x}' = \begin{bmatrix} 0 & 1 \\ -5 & -4 \end{bmatrix}\mathbf{x} + \begin{bmatrix} 0 \\ 5\cos 2t \end{bmatrix} \tag{3.8.26} \]
For the coefficient matrix $A$ in (3.8.26), we compute the eigenvalues and eigenvectors in order to find the complementary solution $\mathbf{x}_h$ of the system.
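For a 2 × 2 matrix this eigenvalue computation reduces to the quadratic formula, which Python's complex arithmetic handles directly. The sketch below is an illustration under our own naming (`eigenvector`, `residual`); it relies on the fact that for a matrix whose first row is $[0\ \ 1]$, the vector $[1\ \ \lambda]^T$ is an eigenvector for each eigenvalue $\lambda$.

```python
import cmath

A = [[0, 1], [-5, -4]]   # coefficient matrix from (3.8.26)

# Roots of the characteristic polynomial lambda^2 - tr(A)*lambda + det(A).
tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
root = cmath.sqrt(tr * tr - 4 * det)
eigenvalues = [(tr + root) / 2, (tr - root) / 2]

def eigenvector(lam):
    """For a matrix of the form [[0, 1], [p, q]], the vector (1, lam) is an eigenvector."""
    return [1, lam]

def residual(lam, v):
    """Largest entry of A v - lam v in absolute value (zero for an exact eigenpair)."""
    Av = [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]
    return max(abs(Av[i] - lam * v[i]) for i in range(2))

for lam in eigenvalues:
    print(lam, eigenvector(lam), residual(lam, eigenvector(lam)))
```

Any nonzero complex multiple of an eigenvector is again an eigenvector, so hand computations may produce a different-looking vector for the same eigenvalue; the `residual` function can be used to test any candidate pair.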


Doing so, we find that $A$ has complex eigenvalues and eigenvectors; one eigenvalue-eigenvector pair is
\[ \lambda = -2 + i, \qquad \mathbf{v} = \begin{bmatrix} -2 - i \\ 5 \end{bmatrix} \]
Writing
\[ \mathbf{z}(t) = e^{(-2+i)t}\begin{bmatrix} -2 - i \\ 5 \end{bmatrix} \]
we know from theorem 3.5.2 that the real and imaginary parts of the vector function $\mathbf{z}(t)$ will form two real linearly independent solutions to the homogeneous system $\mathbf{x}' = A\mathbf{x}$. Rewriting $\mathbf{z}$ using Euler's formula,
\[ \begin{aligned} \mathbf{z}(t) &= e^{-2t}(\cos t + i\sin t)\left( \begin{bmatrix} -2 \\ 5 \end{bmatrix} + i\begin{bmatrix} -1 \\ 0 \end{bmatrix} \right) \\ &= e^{-2t}\begin{bmatrix} -2\cos t + \sin t \\ 5\cos t \end{bmatrix} + i e^{-2t}\begin{bmatrix} -\cos t - 2\sin t \\ 5\sin t \end{bmatrix} \end{aligned} \]
The real and imaginary parts of $\mathbf{z}$ are real linearly independent solutions to $\mathbf{x}' = A\mathbf{x}$, so we have determined that the complementary solution to the original system is
\[ \mathbf{x}_h = c_1 e^{-2t}\begin{bmatrix} -2\cos t + \sin t \\ 5\cos t \end{bmatrix} + c_2 e^{-2t}\begin{bmatrix} -\cos t - 2\sin t \\ 5\sin t \end{bmatrix} \]
In theory, we are now ready to apply variation of parameters to find a particular solution $\mathbf{x}_p$. While we could do so here, the computations get remarkably cumbersome. In the next chapter on higher order differential equations, we will learn that for certain higher order equations, making a good guess at the form of a particular solution provides the simplest approach. In fact, we will even see that keeping certain second-order equations in that form, rather than converting them to systems of first-order equations, often is the best way to proceed. For now, we will guess a form for $\mathbf{x}_p$. Since
\[ \mathbf{b}(t) = \begin{bmatrix} 0 \\ 5\cos 2t \end{bmatrix} \]
we assume that a particular solution $\mathbf{x}_p$ has the form
\[ \mathbf{x}_p = \begin{bmatrix} a\cos 2t + b\sin 2t \\ c\cos 2t + d\sin 2t \end{bmatrix} \]
From this, it follows that
\[ \mathbf{x}_p' = \begin{bmatrix} -2a\sin 2t + 2b\cos 2t \\ -2c\sin 2t + 2d\cos 2t \end{bmatrix} \]
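The coefficients $a$, $b$, $c$, and $d$ in this guess are determined by requiring $\mathbf{x}_p' = A\mathbf{x}_p + \mathbf{b}(t)$. Writing $\mathbf{x}_p = \mathbf{u}\cos 2t + \mathbf{w}\sin 2t$ with $\mathbf{u} = [a\ \ c]^T$ and $\mathbf{w} = [b\ \ d]^T$, and matching the $\cos 2t$ and $\sin 2t$ terms, gives a 4 × 4 linear system. The sketch below sets up and solves that system with exact rational arithmetic; the block layout and the helper `solve` are our own illustration, assuming only Python's standard library.

```python
from fractions import Fraction as F

A = [[0, 1], [-5, -4]]   # coefficient matrix from (3.8.26)
b0 = [0, 5]              # b(t) = b0 * cos(2t)
w = 2                    # angular frequency of the forcing term

# With xp = u cos(wt) + v sin(wt), matching cos and sin terms in xp' = A xp + b gives
#   A u - w v = -b0   (cosine terms)
#   w u + A v = 0     (sine terms)
# which is a 4x4 linear system in the unknowns (u1, u2, v1, v2).
M = [[A[0][0], A[0][1], -w, 0],
     [A[1][0], A[1][1], 0, -w],
     [w, 0, A[0][0], A[0][1]],
     [0, w, A[1][0], A[1][1]]]
rhs = [-b0[0], -b0[1], 0, 0]

def solve(matrix, right):
    """Gauss-Jordan elimination with exact fractions; assumes the matrix is invertible."""
    n = len(matrix)
    aug = [[F(v) for v in row] + [F(r)] for row, r in zip(matrix, right)]
    for col in range(n):
        p = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[p] = aug[p], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                aug[r] = [vr - aug[r][col] * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

u1, u2, v1, v2 = solve(M, rhs)
# Translate back to the names used in the guess: u = (a, c) and v = (b, d).
a, b, c, d = u1, v1, u2, v2
print(a, b, c, d)
```

The same block structure applies to any forcing of the form $\mathbf{b}_0\cos\omega t$, provided $\pm i\omega$ is not an eigenvalue of $A$, which is the condition under which this linear system has a unique solution.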


Substituting $\mathbf{x}_p$ and $\mathbf{x}_p'$ for $\mathbf{x}$ and $\mathbf{x}'$ in (3.8.26), we have
\[ \begin{bmatrix} -2a\sin 2t + 2b\cos 2t \\ -2c\sin 2t + 2d\cos 2t \end{bmatrix} = \begin{bmatrix} c\cos 2t + d\sin 2t \\ -5a\cos 2t - 5b\sin 2t - 4c\cos 2t - 4d\sin 2t \end{bmatrix} + \begin{bmatrix} 0 \\ 5\cos 2t \end{bmatrix} \]
Equating the coefficients of $\sin 2t$ and $\cos 2t$ in the entries of the vectors in this most recent vector equation, the following system of four linear equations in $a$, $b$, $c$, and $d$ arises:
\[ \begin{aligned} -2a &= d \\ 2b &= c \\ -2c &= -5b - 4d \\ 2d &= -5a - 4c + 5 \end{aligned} \]
Rearranging this system to write it in matrix form and row-reducing, we observe
\[ \left[\begin{array}{cccc|c} -2 & 0 & 0 & -1 & 0 \\ 0 & 2 & -1 & 0 & 0 \\ 0 & 5 & -2 & 4 & 0 \\ 5 & 0 & 4 & 2 & 5 \end{array}\right] \to \left[\begin{array}{cccc|c} 1 & 0 & 0 & 0 & 1/13 \\ 0 & 1 & 0 & 0 & 8/13 \\ 0 & 0 & 1 & 0 & 16/13 \\ 0 & 0 & 0 & 1 & -2/13 \end{array}\right] \]
Thus we conclude that a particular solution is
\[ \mathbf{x}_p = \begin{bmatrix} \frac{1}{13}\cos 2t + \frac{8}{13}\sin 2t \\ \frac{16}{13}\cos 2t - \frac{2}{13}\sin 2t \end{bmatrix} \]
In conjunction with our earlier work to find $\mathbf{x}_h$, we have determined that the general solution to the system of first-order differential equations given by (3.8.26) is
\[ \mathbf{x} = c_1 e^{-2t}\begin{bmatrix} -2\cos t + \sin t \\ 5\cos t \end{bmatrix} + c_2 e^{-2t}\begin{bmatrix} -\cos t - 2\sin t \\ 5\sin t \end{bmatrix} + \begin{bmatrix} \frac{1}{13}\cos 2t + \frac{8}{13}\sin 2t \\ \frac{16}{13}\cos 2t - \frac{2}{13}\sin 2t \end{bmatrix} \]
Recalling that $x_1 = I$ is the current in the given RLC circuit, we have shown that
\[ I(t) = c_1 e^{-2t}(-2\cos t + \sin t) + c_2 e^{-2t}(-\cos t - 2\sin t) + \frac{1}{13}\cos 2t + \frac{8}{13}\sin 2t \]
Given initial conditions for $I(0)$ and $I'(0)$, we can find the values of the constants $c_1$ and $c_2$. Moreover, we note that as $t \to \infty$, the components of the solution that include $e^{-2t}$ will die off, leaving us with long-term behavior of $I(t)$ modeled by $\frac{1}{13}\cos 2t + \frac{8}{13}\sin 2t$. We hence call $\frac{1}{13}\cos 2t + \frac{8}{13}\sin 2t$ the steady-state solution of the original equation (3.8.25) and $c_1 e^{-2t}(-2\cos t + \sin t) + c_2 e^{-2t}(-\cos t - 2\sin t)$ the transient solution.

Overall, we have now seen several examples of important phenomena governed by linear systems of differential equations. Further examples will be considered in the exercises.

Exercises 3.8

1. In a closed system of two tanks (i.e., one for which there are no input flows and no output flows), the following information is given. Tank A is filled with 100 liters of solution whose initial concentration is 0.25 g/liter. Tank B is filled with 50 liters of solution whose initial concentration is 1 g/liter. The two tanks are connected with two pipes having flows in opposite directions; mixed solution from Tank A flows to Tank B at a rate of 4 liters/min. Similarly, mixed solution flows from Tank B to Tank A at a rate of 4 liters/min. Set up and solve an initial-value problem whose solution will tell you the amount of salt in each tank at time $t$. Discuss the graphical behavior of the solution $\mathbf{x}(t)$ (whose components are the amount of salt in each tank at time $t$). Is there an equilibrium solution to the system? If so, what is it?

2. Consider a system of two tanks connected in such a way that each of the tanks has an independent inflow that delivers salt solution to it, each has an independent outflow (drain), and each tank is connected to the other with an outflow and an inflow. The relevant information about each tank is given in the table below.

                                    Tank A               Tank B
Tank volume                         100 liters           200 liters
Rate of inflow to the tank          5 liters/min         9 liters/min
Concentration of salt in inflow     7 g/liter            3 g/liter
Rate of drain outflow               4 liters/min         10 liters/min
Rates of outflows to other tank     to B: 3 liters/min   to A: 2 liters/min

Initially, Tank A has 20 g of salt present in its solution, and Tank B has 75 g of salt present in its solution. Set up and solve an initial-value problem whose solution will determine the amount of salt in each tank at time $t$. Discuss the graphical behavior of the solution $\mathbf{x}(t)$ (whose components are the amount of salt in each tank at time $t$). Is there an equilibrium solution to the system? If so, what is it?

3. Suppose that in exercise 2 all of the given information remains the same except for the fact that instead of saltwater flowing into each tank, pure water flows in. How do the results of your work in exercise 2 change?

4. In a closed system of three tanks (that is, one for which there are no input flows and no output flows), the following information is given.

                                    Tank A               Tank B               Tank C
Tank volume                         100 liters           150 liters           125 liters
Rates of outflows to other tanks    to B: 3 liters/min   to C: 1 liter/min    to A: 4 liters/min
Rates of outflows to other tanks    to C: 4 liters/min   to A: 3 liters/min   to B: 1 liter/min

Tank A is filled with 100 liters of solution whose initial concentration is 8 g/liter. Tank B is filled with 150 liters of solution whose initial concentration is 3 g/liter. Tank C is initially filled with 125 liters of pure water. The three tanks are connected with pipes having flows in opposite directions; flow rates are given in the table above. Set up and solve an initial-value problem whose solution will tell you the amount of salt in each tank at time $t$. Discuss the graphical behavior of the solution $\mathbf{x}(t)$ (whose components are the amount of salt in each tank at time $t$). Is there an equilibrium solution to the system? If so, what is it?

5. In a system of three tanks of saltwater, the following information is given.

                                    Tank A               Tank B               Tank C
Tank volume                         400 liters           200 liters           300 liters
Rate of inflow to the tank          7 liters/min         0 liters/min         0 liters/min
Concentration of salt in inflow     10 g/liter           n/a                  n/a
Rate of drain outflow               0 liters/min         0 liters/min         7 liters/min
Rates of outflows to other tanks    to B: 7 liters/min   to C: 7 liters/min   to A: 0 liters/min
Rates of outflows to other tanks    to C: 0 liters/min   to A: 0 liters/min   to B: 0 liters/min


Each tank is full; tank A contains solution whose initial concentration is 20 g/liter. Tank B contains solution whose initial concentration is 50 g/liter. Tank C contains pure water. Without setting up a system of differential equations, first use your intuition to describe what you think will be the behavior of the functions $x_1(t)$, $x_2(t)$, and $x_3(t)$ that measure the amount of salt in each of the three respective tanks at time $t$. Then, set up and solve an initial-value problem whose solution will tell you the amount of salt in each tank at time $t$. Discuss the graphical behavior of each component of the solution $\mathbf{x}(t)$ and compare it to your intuitive expectations. Is there an equilibrium solution to the system? If so, what is it?

6. In a system of three tanks of saltwater interconnected with pipes of inflow and outflow to and from each, the following information is given.

                                    Tank A               Tank B               Tank C
Tank volume                         400 liters           800 liters           500 liters
Rate of inflow to the tank          5 liters/min         10 liters/min        5 liters/min
Concentration of salt in inflow     25 g/liter           15 g/liter           40 g/liter
Rate of drain outflow               4 liters/min         7 liters/min         9 liters/min
Rates of outflows to other tanks    to B: 6 liters/min   to C: 5 liters/min   to A: 4 liters/min
Rates of outflows to other tanks    to C: 4 liters/min   to A: 5 liters/min   to B: 1 liter/min

Assume that the system is such that initially there is a concentration of 10 g/liter of salt in each of the three tanks. Set up and solve an initial-value problem whose solution will tell you the amount of salt in each tank at time $t$. Discuss the graphical behavior of each component of the solution $\mathbf{x}(t)$. Is there an equilibrium solution to the system? If so, what is it?

7. Recall that for a spring-mass system of mass $m$, spring constant $k$, and damping constant $c$, the displacement $y(t)$ of the mass from equilibrium is governed by the linear second-order differential equation
\[ y'' + \frac{c}{m}y' + \frac{k}{m}y = \frac{1}{m}F(t) \]
For a mass of 0.5 kg with spring constant $k = 2$ N/m in an undamped, unforced system, assume the mass is displaced 0.4 m from equilibrium and released (i.e., $y(0) = 0.4$ and $y'(0) = 0$).
(a) State the second-order IVP that models this situation.
(b) Convert the second-order equation to a system of first-order DEs using the standard substitution: $x_1 = y$, $x_2 = y'$.
(c) Solve the system in (b), and graph the component function $x_1(t)$. Discuss the long-term behavior of the spring-mass system.

8. For a mass of 0.5 kg with spring constant $k = 2$ N/m and damping constant $c = 0.5$ N·s/m in an unforced system, assume the mass is displaced 0.3 m from equilibrium and released.
(a) State the second-order IVP that models this situation.
(b) Convert the second-order equation to a system of first-order DEs using the standard substitution: $x_1 = y$, $x_2 = y'$.
(c) Solve the system in (b), and graph the component function $x_1(t)$. Discuss the long-term behavior of the spring-mass system.

9. For a mass of 0.5 kg with spring constant $k = 2$ N/m and damping constant $c = 0.5$ N·s/m in a forced system with forcing function $F(t) = \cos 2t$ N, assume the mass is initially displaced 0.3 m from equilibrium and released.
(a) State the second-order IVP that models this situation.
(b) Convert the second-order equation to a system of first-order DEs using the standard substitution: $x_1 = y$, $x_2 = y'$.
(c) Use variation of parameters to solve the system in (b), and graph the component function $x_1(t)$. Discuss the long-term behavior of the spring-mass system.

10. In section 3.8.2, we considered a system of two masses attached to two springs, where a mass $m_1$ is attached to a spring with spring constant $k_1$ and from $m_1$ a second spring with constant $k_2$ and mass $m_2$ is attached. See figure 3.16. If we assume that the surface on which the masses rest is frictionless, let $x_1(t)$ denote the displacement of $m_1$ from its equilibrium position and $x_2(t)$ the displacement of $m_2$ from its equilibrium position, and set the system in motion, then the system is governed by the system of second-order differential equations
\[ \begin{aligned} x_1'' &= -\frac{k_1}{m_1}x_1 + \frac{k_2}{m_1}(x_2 - x_1) \\ x_2'' &= -\frac{k_2}{m_2}(x_2 - x_1) \end{aligned} \]
(a) Suppose that $k_1 = 2$, $m_1 = 1$, $k_2 = 4$ and $m_2 = 0.5$. Using the given constant values and the substitution $y_1 = x_1$, $y_2 = y_1' = x_1'$, $y_3 = x_2$, $y_4 = y_3' = x_2'$, convert the system of two second-order equations to a system of four first-order equations.
(b) Assume that the masses $m_1$ and $m_2$ are each displaced 1 unit from their natural equilibrium and released. That is, assume $x_1(0) = 1$, $x_1'(0) = 0$, $x_2(0) = 1$, and $x_2'(0) = 0$. Solve this initial-value problem using the system in (a), sketch the plots of $y_1$ and $y_3$, and discuss what they tell you about the system.

11. Recall that the current $I(t)$ in an RLC circuit is governed by the linear second-order differential equation
\[ LI''(t) + RI'(t) + \frac{1}{C}I(t) = E'(t) \]
where $L$ is the inductance, $R$ the resistance, and $C$ the capacitance of the circuit. Suppose we have an RLC circuit for which an inductor of $L = 1$ henry and capacitor $C = 0.01$ farad are present. Assume further that $I(0) = 100$ and $I'(0) = 0$.
(a) State a second-order IVP whose solution is $I(t)$, the current at time $t$.
(b) Convert the IVP in (a) to a system of first-order IVPs using a standard substitution.
(c) Solve the system in (b) to determine the current $I(t)$ in the cases where the resistance is (i) $R = 0\ \Omega$, (ii) $R = 16\ \Omega$, (iii) $R = 20\ \Omega$, and (iv) $R = 25\ \Omega$, assuming consistent units. Sketch a plot of each solution $I(t)$ and discuss the impact that changing $R$ has on the current.

12. Suppose we have an RLC circuit for which an inductor of $L = 1$ H, resistor $R = 16\ \Omega$, and capacitor $C = 0.01$ F are present. Assume further that $I(0) = 100$ A and $I'(0) = 0$. Finally, suppose that the system is provided a voltage source of $E(t) = 100 \sin 10t$.
(a) State a second-order IVP whose solution is $I(t)$, the current in the circuit at time $t$.
(b) Convert the IVP in (a) to a system of first-order IVPs using a standard substitution.
(c) Solve the system in (b) to determine the current $I(t)$ at time $t$. Sketch a plot of the solution $I(t)$ and discuss the impact the forcing function has on the current.

3.9 For further study

3.9.1 Diagonalizable matrices and coupled systems

We have seen that in the case where a system of linear first-order differential equations is uncoupled, such as
\[ \begin{bmatrix} x_1' \\ x_2' \end{bmatrix} = \begin{bmatrix} 3 & 0 \\ 0 & -2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 3x_1 \\ -2x_2 \end{bmatrix} \]
the system is particularly straightforward to solve. In addition, even when the coefficient matrix $A$ of the system $\mathbf{x}' = A\mathbf{x}$ is not a diagonal matrix, in the case where $A$ is $n \times n$ and has $n$ real, linearly independent eigenvectors, it is again a straightforward exercise to determine the general solution to $\mathbf{x}' = A\mathbf{x}$. In what follows, we investigate the connections between $A$ having $n$ real linearly independent eigenvectors and the system being uncoupled.

(a) Solve the uncoupled system of linear first-order equations
\[ \begin{bmatrix} x_1' \\ x_2' \end{bmatrix} = \begin{bmatrix} 3 & 0 \\ 0 & -2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 3x_1 \\ -2x_2 \end{bmatrix} \]
by directly solving the two individual equations $x_1' = 3x_1$ and $x_2' = -2x_2$.

(b) For the coefficient matrix
\[ A = \begin{bmatrix} 3 & 0 \\ 0 & -2 \end{bmatrix} \]
how are your solutions in (a) to the individual differential equations related to the eigenvalues and eigenvectors of $A$?

(c) Determine the eigenvalues and eigenvectors of the matrix $A = \begin{bmatrix} 1 & 6 \\ 5 & 2 \end{bmatrix}$ and show that $A$ has two real, linearly independent eigenvectors.

(d) Let $D$ be the diagonal 2 × 2 matrix whose diagonal entries are $\lambda_1$ and $\lambda_2$, the eigenvalues of $A$ from (c), and let $P$ be the 2 × 2 matrix whose columns are $\mathbf{x}_1$ and $\mathbf{x}_2$, the eigenvectors of $A$ corresponding to $\lambda_1$ and $\lambda_2$. Show that $AP = PD$.

(e) More generally, let $A$ be an $n \times n$ matrix with $n$ linearly independent real eigenvectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ that correspond to real eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. As in (d), let $D$ be the diagonal matrix whose diagonal entries are the eigenvalues of $A$ and $P$ be the matrix whose columns are the corresponding eigenvectors of $A$. Explain why $AP = PD$ and thus why $A = PDP^{-1}$ and $D = P^{-1}AP$.

A real $n \times n$ matrix $A$ with the property that it has $n$ real, linearly independent eigenvectors is called diagonalizable. When we factor $A$ in the form $A = PDP^{-1}$, we say that we have diagonalized the matrix $A$.

(f) For a 2 × 2 diagonalizable matrix $A$, consider the system of differential equations given by $\mathbf{x}' = A\mathbf{x}$. Let $D$ and $P$ be the matrices defined above in (d). Note that in this problem $A$ is an arbitrary diagonalizable matrix: we are not specifying the values of $\lambda_1$ and $\lambda_2$, nor the values of the entries in the corresponding eigenvectors.
(i) Let $\mathbf{y} = P^{-1}\mathbf{x}$. Show that $\mathbf{x}' = P\mathbf{y}'$.
(ii) Use the substitution $\mathbf{y} = P^{-1}\mathbf{x}$ and the fact that $A = PDP^{-1}$ to show that the original system $\mathbf{x}' = A\mathbf{x}$ may be equivalently represented by the system $\mathbf{y}' = D\mathbf{y}$.
(iii) Explain why the system $\mathbf{y}' = D\mathbf{y}$ is preferable to the system $\mathbf{x}' = A\mathbf{x}$.


(g) For the matrix $A = \begin{bmatrix} 1 & 6 \\ 5 & 2 \end{bmatrix}$, solve the system $\mathbf{x}' = A\mathbf{x}$ by executing the following steps.
(i) Diagonalize $A$ by determining matrices $D$ and $P$ such that $A = PDP^{-1}$. Recall that $D$ is the diagonal matrix whose diagonal entries are the eigenvalues of $A$ and $P$ is the matrix whose columns are the corresponding eigenvectors of $A$.
(ii) Follow your work in (f) to introduce a substitution that converts the system $\mathbf{x}' = A\mathbf{x}$ to a new system in the variable $\mathbf{y}$ that is uncoupled and of the form $\mathbf{y}' = D\mathbf{y}$.
(iii) Solve the uncoupled system in (ii) for $\mathbf{y}$.
(iv) Determine the solution $\mathbf{x}$ to the original system by showing that $\mathbf{x} = P\mathbf{y}$ and using this substitution appropriately.

(h) Solve the system $\mathbf{x}' = A\mathbf{x}$ given by
$$A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$$
using the approach outlined in (g).

(i) Solve the system $\mathbf{x}' = A\mathbf{x}$ given by
$$A = \begin{bmatrix} 3 & -1 & 1 \\ -1 & 3 & -1 \\ 1 & -1 & 3 \end{bmatrix}$$
using the approach outlined in (g).

(j) Compare your work in (g)–(i) to how you learned to solve the system $\mathbf{x}' = A\mathbf{x}$ in section 3.3. Is this new approach fundamentally the same or is it markedly different? Explain.

3.9.2 Matrix exponential

An important result in calculus is that $e^x$ can be represented by its Taylor series expansion
$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!} + \cdots \tag{3.9.1}$$
and that (3.9.1) holds for every real value of $x$. In what follows, we explore the notion of $e^A$, where $A$ is a matrix, through the use of an analogous expansion, as well as the role of $e^A$ in the solution of systems of differential equations of the form $\mathbf{x}' = A\mathbf{x}$.

(a) Let $A$ be the diagonal matrix
$$A = \begin{bmatrix} 3 & 0 \\ 0 & -2 \end{bmatrix}$$
Explain why
$$A^n = \begin{bmatrix} 3^n & 0 \\ 0 & (-2)^n \end{bmatrix}$$

(b) For the matrix $A$ in (a), show that
$$I + A + \frac{1}{2!}A^2 + \cdots + \frac{1}{n!}A^n = \begin{bmatrix} 1 + 3 + \frac{3^2}{2!} + \cdots + \frac{3^n}{n!} & 0 \\ 0 & 1 - 2 + \frac{(-2)^2}{2!} + \cdots + \frac{(-2)^n}{n!} \end{bmatrix} \tag{3.9.2}$$
Based on the entries in the right-hand matrix of (3.9.2), explain why it is reasonable to write that
$$e^A = I + A + \frac{1}{2!}A^2 + \frac{1}{3!}A^3 + \cdots + \frac{1}{n!}A^n + \cdots \tag{3.9.3}$$
We use (3.9.3) as the definition of $e^A$ for any diagonal matrix $A$.

(c) Now consider the matrix $B = \begin{bmatrix} 2 & -2 \\ -2 & -1 \end{bmatrix}$. Find the eigenvalues and eigenvectors of $B$ and diagonalize $B$ by writing $B = PDP^{-1}$, where $D$ is the diagonal matrix whose diagonal entries are the eigenvalues of $B$ and $P$ is the matrix whose columns are the corresponding eigenvectors of $B$. For more on the notion of a matrix being 'diagonalizable', see subsection 3.9.1.

(d) For an arbitrary diagonalizable matrix $B$ for which $B = PDP^{-1}$ (where $D$ and $P$ have the meaning ascribed in (c)), show that
$$B^n = PD^nP^{-1}$$

(e) For an arbitrary diagonalizable matrix $B$, explain why
$$I + B + \frac{1}{2!}B^2 + \frac{1}{3!}B^3 + \cdots + \frac{1}{n!}B^n + \cdots = P\left(I + D + \frac{1}{2!}D^2 + \frac{1}{3!}D^3 + \cdots + \frac{1}{n!}D^n + \cdots\right)P^{-1}$$
again where $D$ and $P$ have the meaning ascribed in (c). We thus define $e^B$ for any diagonalizable matrix $B$ by the equation
$$e^B = I + B + \frac{1}{2!}B^2 + \frac{1}{3!}B^3 + \cdots + \frac{1}{n!}B^n + \cdots \tag{3.9.4}$$

(f) Show that if $B$ is any diagonalizable matrix such that $B = PDP^{-1}$ (where $D$ and $P$ have the meaning ascribed in (c)), then
$$e^B = Pe^DP^{-1}$$

(g) Use the result in (f) to compute $e^B$ for the specific matrix $B$ given in (c).

(h) Recall that when we solve a single homogeneous linear first-order DE such as $y' = 5y$, one way to solve the equation is to guess that the solution is $y = e^{rt}$ and work to determine the value of $r$ that satisfies the DE. Of course we find that $r = 5$ and $y = Ce^{5t}$ is the general solution. Indeed, for any constant $a$, the solution to $y' = ay$ is $y = Ce^{at}$. Now consider solving the system of differential equations
$$\mathbf{x}' = A\mathbf{x} = \begin{bmatrix} 3 & 0 \\ 0 & -2 \end{bmatrix}\mathbf{x} \tag{3.9.5}$$
noting that $A$ is the diagonal matrix from (a) above.
(i) Viewing $t$ as a scalar multiplier of $A$, update your work from (3.9.3) to write a series expansion for $e^{At}$.
(ii) Noting that $e^{At}$ is a matrix, explain why it is reasonable to guess that $\Phi(t) = e^{At}$ is a solution matrix for the system $\mathbf{x}' = A\mathbf{x}$.
(iii) Using your expression from (i) for $\Phi(t) = e^{At}$, compute both $\Phi'(t)$ and $A\Phi(t)$ to verify that the matrix function $\Phi(t)$ satisfies the equation $\Phi'(t) = A\Phi(t)$.
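As a numerical companion to parts (e)–(g), the following minimal sketch compares two routes to $e^B$ for the matrix $B$ of part (c): the factorization $Pe^DP^{-1}$ and a truncated version of the series (3.9.4). The helper names `matmul` and `expm_series`, the choice of 40 series terms, and the hand-computed eigenpairs ($\lambda = 3$ with eigenvector $(2, -1)$ and $\lambda = -2$ with eigenvector $(1, 2)$) are assumptions made for this illustration; they are not taken from the text, and the eigenpairs should be checked against your own work in (c).

```python
import math

def matmul(X, Y):
    """Multiply two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm_series(M, terms=40):
    """Approximate e^M by the partial sum I + M + M^2/2! + ... (terms terms)."""
    total = [[1.0, 0.0], [0.0, 1.0]]   # running sum, starts at I
    term = [[1.0, 0.0], [0.0, 1.0]]    # current term M^k / k!, starts at k = 0
    for k in range(1, terms):
        term = matmul(term, M)
        term = [[entry / k for entry in row] for row in term]
        total = [[total[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return total

B = [[2.0, -2.0], [-2.0, -1.0]]
# Assumed eigenpairs (computed by hand): 3 with (2, -1) and -2 with (1, 2).
P = [[2.0, 1.0], [-1.0, 2.0]]
P_inv = [[0.4, -0.2], [0.2, 0.4]]      # (1/5) * [[2, -1], [1, 2]]
exp_D = [[math.exp(3.0), 0.0], [0.0, math.exp(-2.0)]]

exp_B_diag = matmul(matmul(P, exp_D), P_inv)   # P e^D P^{-1}
exp_B_series = expm_series(B)                  # truncated series (3.9.4)
max_gap = max(abs(exp_B_diag[i][j] - exp_B_series[i][j])
              for i in range(2) for j in range(2))
print(exp_B_diag)
print("largest entrywise difference:", max_gap)
```

The two results should agree up to rounding error, which is what part (f) predicts. The series route needs many matrix products, while the diagonalization route needs only two scalar exponentials once $P$ and $D$ are known.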

4 Higher order differential equations

4.1 Motivating equations

Through our study of linear systems of differential equations, we have already encountered higher order differential equations that arise naturally in physical applications. Two particularly important ones are those associated with spring-mass systems and RLC circuits. Here, we briefly revisit these equations.

In section 3.1, we considered a mass $m$ suspended from a spring with spring constant $k$ that is subject to damping with proportionality constant $c$. If $F(t)$ is an external forcing function on the system, then the displacement $y(t)$ of the mass from equilibrium satisfies
$$my'' + cy' + ky = F(t) \tag{4.1.1}$$
This is a nonhomogeneous linear second-order differential equation. While we have already studied this equation by using the substitution $x_1 = y$ and $x_2 = y'$ and considered the resulting linear system of first-order differential equations, there is further insight to be gained by examining (4.1.1) solely as a second-order equation. In fact, while it is theoretically possible to solve (4.1.1) using the corresponding linear system and ideas from chapter 3, doing so in the cases where $F(t) \neq 0$ is often cumbersome; we will see in section 4.4 that this equation may often be solved in a straightforward manner by leaving it in its original form as a second-order equation.

In section 3.8, we encountered another important nonhomogeneous linear second-order differential equation. By viewing the flow of electricity through a circuit as analogous to the flow of water in a pipe, we came to understand a differential equation that models the current $I(t)$. Using results from physics, including Ohm's law, Faraday's law, and Coulomb's law, we learned that the current $I(t)$ must satisfy the linear second-order differential equation
$$LI'' + RI' + \frac{1}{C}I = E'(t) \tag{4.1.2}$$
where $L$ is the inductance, $R$ is the resistance, $C$ is the capacitance, and $E(t)$ represents an external voltage source.

We note specifically that the governing differential equations for spring-mass systems and RLC circuits are both linear nonhomogeneous second-order differential equations with constant coefficients. These differential equations therefore merit further study as we endeavor to more fully understand these physical systems. When the damping constant $c = 0$ and the resistance $R = 0$ in (4.1.1) and (4.1.2), these equations are often called harmonic oscillator equations. When small damping or resistance is present, we refer to them as damped harmonic oscillators.

4.2 Homogeneous equations: distinct real roots

If we consider our experience with single homogeneous linear first-order differential equations and systems thereof, we realize that the exponential function plays a central role in their solution. For example, if we solve the equation $y' - 5y = 0$, the solution is $y = ce^{5t}$. Likewise, if we solve the system given by $\mathbf{x}' = A\mathbf{x}$, where $A$ is a matrix with eigenvalues $\lambda = 2$ and $\lambda = -3$, then the general solution is
$$\mathbf{x} = c_1 e^{2t}\mathbf{v}_1 + c_2 e^{-3t}\mathbf{v}_2$$
where $\mathbf{v}_1$ and $\mathbf{v}_2$ are eigenvectors that correspond to the eigenvalues $\lambda = 2$ and $\lambda = -3$.

Given this prominence of the exponential function, it is not surprising that functions of the form $y = e^{rt}$ play a central role in our study of higher order equations. For example, consider the second-order linear homogeneous differential equation with constant coefficients given by
$$y'' - y' - 6y = 0 \tag{4.2.1}$$
Even without our experience with first-order equations and systems, it is reasonable to think that one or more functions of the form $y = e^{rt}$ will be a solution to this equation because of the question the equation poses: "what function $y$ is such that its second derivative minus its first derivative is equal to 6 times itself?" In essence, we are looking for a function $y$ such that a certain linear combination of the function, its first derivative, and its second derivative is the zero function. This makes it natural for us to expect that the solution is such that its derivatives are scalar multiples of itself, hence leading us to consider $y = e^{rt}$.

Letting $y = e^{rt}$, we observe that $y' = re^{rt}$ and $y'' = r^2 e^{rt}$. Substituting these functions into (4.2.1) requires $r$ to satisfy the equation
$$r^2 e^{rt} - re^{rt} - 6e^{rt} = 0 \tag{4.2.2}$$
Factoring, we can rewrite (4.2.2) as $e^{rt}(r^2 - r - 6) = 0$, and since $e^{rt}$ is never zero, it follows that $r$ must be such that $r^2 - r - 6 = (r - 3)(r + 2) = 0$. From this, $r = 3$ or $r = -2$, and therefore $y_1 = e^{3t}$ and $y_2 = e^{-2t}$ are both solutions to (4.2.1). Since $y_1 = e^{3t}$ is not a scalar multiple of $y_2 = e^{-2t}$, it follows that $y_1$ and $y_2$ are linearly independent solutions to (4.2.1).

Through our work with homogeneous linear systems, we are accustomed to taking linear combinations of linearly independent solutions in order to form a general solution; the same principle holds here, which we will verify directly. Letting $y = c_1 y_1 + c_2 y_2 = c_1 e^{3t} + c_2 e^{-2t}$, it follows that $y' = 3c_1 e^{3t} - 2c_2 e^{-2t}$ and $y'' = 9c_1 e^{3t} + 4c_2 e^{-2t}$. If we now consider $y'' - y' - 6y$, we have
$$\begin{aligned} y'' - y' - 6y &= (9c_1 e^{3t} + 4c_2 e^{-2t}) - (3c_1 e^{3t} - 2c_2 e^{-2t}) - 6(c_1 e^{3t} + c_2 e^{-2t}) \\ &= (9c_1 e^{3t} - 3c_1 e^{3t} - 6c_1 e^{3t}) + (4c_2 e^{-2t} + 2c_2 e^{-2t} - 6c_2 e^{-2t}) \\ &= 0 \end{aligned}$$
Thus, we have shown that every function of the form $y = c_1 e^{3t} + c_2 e^{-2t}$ is a solution to (4.2.1). This shows that the solution space of (4.2.1) is at least two-dimensional; might there be any other linearly independent solutions to the equation? By our earlier work with systems, we know that the solution space of the equation $\mathbf{x}' = A\mathbf{x}$, where $A$ is $n \times n$, is $n$-dimensional. Since the second-order equation (4.2.1) can be converted to a $2 \times 2$ system of equations, it follows that its solution space has dimension exactly 2, and thus
$$y = c_1 e^{3t} + c_2 e^{-2t} \tag{4.2.3}$$
is the general solution to (4.2.1).

Our work to show that if $y_1$ and $y_2$ are solutions to (4.2.1), then $y = c_1 y_1 + c_2 y_2$ is also a solution may be generalized to any homogeneous linear second-order differential equation. We state this result in the following theorem.

Theorem 4.2.1 If $y_1$ and $y_2$ are solutions to the second-order linear homogeneous equation
$$y'' + a(t)y' + b(t)y = 0$$
then $y = c_1 y_1 + c_2 y_2$ is also a solution for any constants $c_1$ and $c_2$.
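The verification above can also be spot-checked numerically. The following is a minimal sketch, assuming a constant-coefficient equation $y'' + a_1 y' + a_0 y = 0$ whose characteristic polynomial has positive discriminant; the function names `char_roots` and `residual`, the sample constants $c_1 = 2.5$, $c_2 = -1.5$, and the sample times are hypothetical choices for illustration only.

```python
import math

def char_roots(a1, a0):
    """Real roots of r^2 + a1*r + a0 = 0, assuming a positive discriminant."""
    disc = a1 * a1 - 4.0 * a0
    if disc <= 0:
        raise ValueError("this sketch only handles two distinct real roots")
    s = math.sqrt(disc)
    return (-a1 + s) / 2.0, (-a1 - s) / 2.0

def residual(c1, c2, r1, r2, a1, a0, t):
    """Evaluate y'' + a1*y' + a0*y for y = c1*e^(r1 t) + c2*e^(r2 t)."""
    e1, e2 = math.exp(r1 * t), math.exp(r2 * t)
    y = c1 * e1 + c2 * e2
    dy = c1 * r1 * e1 + c2 * r2 * e2              # derivative taken analytically
    d2y = c1 * r1 ** 2 * e1 + c2 * r2 ** 2 * e2
    return d2y + a1 * dy + a0 * y

# Equation (4.2.1): y'' - y' - 6y = 0, so a1 = -1 and a0 = -6.
r1, r2 = char_roots(-1.0, -6.0)
worst = max(abs(residual(2.5, -1.5, r1, r2, -1.0, -6.0, t))
            for t in (-1.0, 0.0, 0.5, 1.0))
print("roots:", r1, r2)
print("largest residual:", worst)
```

A residual at the level of rounding error for arbitrary constants is consistent with theorem 4.2.1, although a numerical check at a few points is of course no substitute for the algebraic argument.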


The important roles the constants $c_1$ and $c_2$ play are further exemplified by initial-value problems. For example, if we consider the initial-value problem
$$y'' - y' - 6y = 0, \quad y(0) = 2, \quad y'(0) = 1 \tag{4.2.4}$$
we can show that this IVP has a unique solution. Using the general solution $y(t) = c_1 e^{3t} + c_2 e^{-2t}$, the condition $y(0) = 2$ implies that
$$2 = c_1 + c_2 \tag{4.2.5}$$
Differentiating the general solution, we find that $y'(t) = 3c_1 e^{3t} - 2c_2 e^{-2t}$, and therefore $y'(0) = 1$ implies
$$1 = 3c_1 - 2c_2 \tag{4.2.6}$$
Equations (4.2.5) and (4.2.6) form a linear system of two equations in two unknowns. Solving this system, $c_1 = 1$ and $c_2 = 1$, so that the function $y(t) = e^{3t} + e^{-2t}$ is the unique solution to (4.2.4).

Our work with the example equation $y'' - y' - 6y = 0$ is indicative of many broader trends in the study of second-order linear differential equations. Because such equations can be converted to systems, we should not be at all surprised to learn that a broad class of initial-value problems associated with second-order equations have unique solutions, nor that the general solution to a second-order equation belongs to a two-dimensional solution space. We state two theorems in order to formalize these observations.

Theorem 4.2.2 Consider the second-order initial-value problem given by
$$y'' + p(t)y' + q(t)y = f(t), \quad y(t_0) = y_0, \quad y'(t_0) = y_1 \tag{4.2.7}$$
where the coefficient functions $p(t)$ and $q(t)$ and the forcing function $f(t)$ are continuous on an open interval $(a, b)$. Given any $t_0$ in $(a, b)$, (4.2.7) has a unique solution in $(a, b)$.

While the proof of theorem 4.2.2 is beyond the scope of this book, it is notable that in the case that $p(t)$ and $q(t)$ are constant functions, we can prove the theorem. Indeed, we will do so by actually constructing the solution in various cases in this section and those following.

Just as we almost exclusively considered matrices $A$ with constant entries in our work with systems of linear first-order differential equations of the form $\mathbf{x}' = A\mathbf{x}$, in our study of second-order linear differential equations, we will normally consider the situation where the coefficient functions $p(t)$ and $q(t)$ are constant. For this context, we can deduce the following result.

Theorem 4.2.3 The set of all solutions to the second-order homogeneous linear differential equation $y'' + a_1 y' + a_0 y = 0$, where $a_0$ and $a_1$ are constants, is a vector space of dimension 2.


This result can be viewed as a consequence of theorem 3.3.2 for linear systems of differential equations with constant coefficients. In particular, given
$$y'' + a_1 y' + a_0 y = 0 \tag{4.2.8}$$
if we use the standard substitution $x_1 = y$, $x_2 = y'$, then it follows that (4.2.8) is equivalent to the system
$$\mathbf{x}' = A\mathbf{x} = \begin{bmatrix} 0 & 1 \\ -a_0 & -a_1 \end{bmatrix}\mathbf{x}$$
which has a two-dimensional solution space. Thus, in order to solve (4.2.8), we seek two linearly independent solutions that satisfy the equation. In particular, if we can find two functions $y_1 = e^{r_1 t}$ and $y_2 = e^{r_2 t}$ that are both solutions to (4.2.8), where $r_1 \neq r_2$, then the general solution must be
$$y = c_1 e^{r_1 t} + c_2 e^{r_2 t}$$
More specifically, if we recall our earlier approach following (4.2.1) in the first example in this section, we made the assumption that a solution $y$ has form $y = e^{rt}$. Doing so and substituting in the general equation $y'' + a_1 y' + a_0 y = 0$, we see that $r$ must satisfy
$$r^2 e^{rt} + a_1 r e^{rt} + a_0 e^{rt} = 0 \tag{4.2.9}$$
Since $e^{rt}$ is never zero, it follows that $r$ must be a solution of the characteristic equation of the second-order homogeneous linear equation (4.2.8), which is
$$r^2 + a_1 r + a_0 = 0 \tag{4.2.10}$$
If $r_1$ and $r_2$ are the roots of (4.2.10), then it follows that $y_1 = e^{r_1 t}$ and $y_2 = e^{r_2 t}$ are both solutions to the original equation (4.2.8). In particular, if $r_1 \neq r_2$, then $y_1$ and $y_2$ are linearly independent and we have found the general solution to (4.2.8), which is
$$y = c_1 e^{r_1 t} + c_2 e^{r_2 t}$$
We state this result formally in the following theorem.

Theorem 4.2.4 Given the second-order linear differential equation with constant coefficients
$$y'' + a_1 y' + a_0 y = 0$$
if the characteristic equation $r^2 + a_1 r + a_0 = 0$ has two distinct real roots $r_1$ and $r_2$, then the general solution to the differential equation is
$$y = c_1 e^{r_1 t} + c_2 e^{r_2 t}$$
We close this section with an example.
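Once two distinct real roots are known, fitting initial conditions $y(0) = y_0$, $y'(0) = y_1$ amounts to solving the $2 \times 2$ linear system $c_1 + c_2 = y_0$, $r_1 c_1 + r_2 c_2 = y_1$, exactly as in (4.2.5) and (4.2.6). The sketch below applies Cramer's rule to that system; the function name `ivp_constants` is a hypothetical helper introduced here, and the approach assumes the initial time is $t_0 = 0$.

```python
def ivp_constants(r1, r2, y0, dy0):
    """Solve c1 + c2 = y0 and r1*c1 + r2*c2 = dy0 by Cramer's rule.

    Assumes distinct real roots r1 != r2 and initial data given at t = 0.
    """
    det = r2 - r1                      # determinant of [[1, 1], [r1, r2]]
    if det == 0:
        raise ValueError("roots must be distinct")
    c1 = (y0 * r2 - dy0) / det
    c2 = (dy0 - r1 * y0) / det
    return c1, c2

# IVP (4.2.4): roots 3 and -2, with y(0) = 2 and y'(0) = 1.
c1, c2 = ivp_constants(3.0, -2.0, 2.0, 1.0)
print(c1, c2)
```

For the IVP (4.2.4) this reproduces the constants $c_1 = 1$ and $c_2 = 1$ found by hand. The determinant $r_2 - r_1$ is nonzero precisely because the roots are distinct, which is one way to see why the constants are uniquely determined.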

Figure 4.1 A plot of the solution y(t) to the IVP given in (4.2.11).

Example 4.2.1 Solve the second-order initial-value problem given by
$$y'' + 7y' + 12y = 0, \quad y(0) = 3, \quad y'(0) = -1 \tag{4.2.11}$$
Graph the solution and discuss its long-term behavior.

Solution. We begin by assuming that $y = e^{rt}$. Direct substitution into (4.2.11) and removing the factor $e^{rt}$ results in the characteristic equation
$$r^2 + 7r + 12 = 0$$
Factoring, we find that $(r + 3)(r + 4) = 0$, and therefore, $r = -3$ or $r = -4$. Since the two $r$ values are distinct, it follows that $y_1 = e^{-3t}$ and $y_2 = e^{-4t}$ are linearly independent solutions to (4.2.11) and the general solution is
$$y = c_1 e^{-3t} + c_2 e^{-4t} \tag{4.2.12}$$
Applying the given initial conditions, we can solve for $c_1$ and $c_2$. Since $y(0) = 3$ and $y'(0) = -1$, (4.2.12) implies that
$$\begin{aligned} 3 &= c_1 + c_2 \\ -1 &= -3c_1 - 4c_2 \end{aligned}$$
It follows that $c_1 = 11$ and $c_2 = -8$, and thus the unique solution to the given IVP (4.2.11) is $y = 11e^{-3t} - 8e^{-4t}$. Plotting $y(t)$ results in the graph shown in figure 4.1, where we clearly see the given initial behavior at $t = 0$ (the function value is 3 and the slope of the tangent line is $-1$) and that the solution's long-term behavior is that $y(t) \to 0$ as $t \to \infty$. We can also observe from the negative constants present in the exponents of the general solution $y = c_1 e^{-3t} + c_2 e^{-4t}$ that every such solution must tend to zero as $t \to \infty$. We note that $y = 0$ is the only constant (equilibrium) solution to the original equation $y'' + 7y' + 12y = 0$, and that because every solution tends to $y = 0$, we say $y = 0$ is a stable equilibrium.

Exercises 4.2

In exercises 1–7, determine the general solution to the given second-order homogeneous linear DE.
1. $y'' - y' - 12y = 0$
2. $y'' + y' - 2y = 0$
3. y − y = 0
4. y + 3y = 0
5. $y'' = 0$
6. $y'' + 4y' + 3y = 0$
7. $y'' + y' - y = 0$

In exercises 8–14, solve the stated IVP. In addition, graph your solution and discuss its long-term behavior. Note that the general solution to each equation has been found in exercises 1–7.
8. $y'' - y' - 12y = 0$, $y(0) = -4$, $y'(0) = 2$
9. $y'' + y' - 2y = 0$, $y(0) = 2$, $y'(0) = -1$
10. y − y = 0, $y(0) = -3$, $y'(0) = 1$
11. y + 3y = 0, $y(0) = 1$, $y'(0) = 3$
12. $y'' = 0$, $y(0) = 2$, $y'(0) = 1$
13. $y'' + 4y' + 3y = 0$, $y(0) = -2$, $y'(0) = -6$
14. $y'' + y' - y = 0$, $y(0) = 9$, $y'(0) = -3$

In exercises 15–19, construct a second-order homogeneous linear DE having the given functions as solutions.
15. $y_1 = e^{-2t}$, $y_2 = e^{2t}$
16. $y_1 = e^{5t}$, $y_2 = e^{-3t}$
17. $y_1 = e^{4t}$, $y_2 = 1$
18. $y_1 = e^{2t}$, $y_2 = e^{3t}$
19. $y_1 = 1$, $y_2 = t$

20. Consider the second-order homogeneous linear equation $y'' - 6y' + 9y = 0$.
(a) Use the substitution $y = e^{rt}$ to attempt to find two linearly independent solutions to the given equation.


(b) Explain why your work in (a) only results in one linearly independent solution, $y_1(t)$.
(c) Verify by direct substitution that $y_2 = te^{3t}$ is a solution to $y'' - 6y' + 9y = 0$. Explain why this function is linearly independent from $y_1$ found in (a).
(d) State the general solution to the given equation.

21. Consider the second-order homogeneous linear equation $y'' - 2y' + 5y = 0$.
(a) Use the substitution $y = e^{rt}$ to attempt to find two linearly independent solutions to the given equation.
(b) Explain why your work in (a) does not generate any real solutions to the given equation.
(c) Verify by direct substitution that $y_1 = e^t \cos 2t$ and $y_2 = e^t \sin 2t$ are solutions to $y'' - 2y' + 5y = 0$. Explain why these functions are linearly independent.
(d) State the general solution to the given equation.

22. Consider the second-order homogeneous linear equation $y'' + 4y = 0$.
(a) Use the substitution $y = e^{rt}$ to attempt to find two linearly independent solutions to the given equation.
(b) Explain why your work in (a) does not generate any real solutions to the given equation.
(c) Think about familiar functions that can satisfy the condition that "the second derivative equals $-4$ times the function itself." By making a natural guess and verifying by direct substitution, find two linearly independent functions $y_1$ and $y_2$ that satisfy the given differential equation.
(d) State the general solution to the given equation.

Recall that in a spring-mass system, the displacement $y(t)$ of the mass from its natural equilibrium is governed by the equation
$$y'' + \frac{c}{m}y' + \frac{k}{m}y = \frac{1}{m}F(t)$$
where $c$ is the damping constant, $k$ is the spring constant, $m$ is the mass of the suspended object, and $F$ is the forcing function.

23. For an unforced system with $c = 3$, $k = 2$, and $m = 1$, determine the displacement of the mass at time $t$ if the system is set in motion via the initial conditions $y(0) = 2$, $y'(0) = 1$. Sketch a graph of the solution you determine and discuss the long-term behavior of the spring-mass system. Assume consistent units on all constants.
24. For an unforced spring-mass system with $k = 9$, $c = 12$, and $m = 3$, determine the displacement of the mass from equilibrium at time $t$ if $y(0) = 0$ and $y'(0) = -1$. Assume consistent units on all constants.


Recall that in a standard RLC electrical circuit, the current $I(t)$ satisfies the equation
$$LI''(t) + RI'(t) + \frac{1}{C}I(t) = E'(t)$$
where $L$ is the inductance, $R$ is the resistance, $C$ is the capacitance, and $E(t)$ represents an external voltage source.

25. For an RLC circuit with no external voltage source, $L = 20$, $R = 80$, and $C = 1/60$, determine the current at time $t$ given the initial conditions $I(0) = 100$, $I'(0) = 25$. Graph the solution you determine and discuss the long-term behavior of the current. Assume consistent units on all constants.
26. For an RLC circuit with no external voltage source, $L = 20$, $R = 0$, and $C = 1/60$, determine the current at time $t$ given the initial conditions $I(0) = 100$, $I'(0) = 25$. Graph the solution you determine and discuss the long-term behavior of the current. Assume consistent units on all constants.

4.3 Homogeneous equations: repeated and complex roots

In the preceding section, we observed that any time the characteristic equation of the second-order equation $y'' + a_1 y' + a_0 y = 0$ has two real, distinct roots, the general solution of the differential equation is easily determined. However, in an equation such as
$$y'' - 6y' + 9y = 0 \tag{4.3.1}$$
with characteristic equation $r^2 - 6r + 9 = 0$, the only root of this equation is $r = 3$. Although this leads us to the solution $y_1 = e^{3t}$, we do not immediately see how to find a second linearly independent solution.

In a similar way, the equation
$$y'' - 2y' + 5y = 0 \tag{4.3.2}$$
has characteristic equation $r^2 - 2r + 5 = 0$, and its roots are
$$r = 1 \pm 2i$$
In this case, we see that no real solution to (4.3.2) results using our previous approach, so it remains for us to find two real linearly independent solutions. Now we will endeavor to understand how to address these two cases: when roots of the characteristic equation are repeated and when the roots of the characteristic equation are complex.

4.3.1 Repeated roots

Let us consider the second-order homogeneous linear DE given by
$$y'' + 4y' + 4y = 0 \tag{4.3.3}$$
Its characteristic equation is $r^2 + 4r + 4 = (r + 2)^2 = 0$, so that only the solution $y_1 = e^{-2t}$ results from the guess that $y = e^{rt}$. To find a second linearly independent solution, it is natural to think that we need to somehow complicate the function $y = e^{-2t}$, just as we did in section 3.5 when we encountered the similar case where the coefficient matrix of a $2 \times 2$ system of linear first-order DEs had a repeated eigenvalue. Thus, we consider a second potential solution $y_2 = v(t)e^{-2t}$, where $v(t)$ is a function yet to be determined. By using this function and substituting into the equation $y'' + 4y' + 4y = 0$, we find conditions that $v(t)$ must satisfy. First, observe by the product rule that
$$y_2' = -2ve^{-2t} + v'e^{-2t} \tag{4.3.4}$$
Similarly,
$$y_2'' = 4ve^{-2t} - 4v'e^{-2t} + v''e^{-2t} \tag{4.3.5}$$
Next, substituting into (4.3.3), we find
$$\begin{aligned} 0 &= y_2'' + 4y_2' + 4y_2 \\ &= (4ve^{-2t} - 4v'e^{-2t} + v''e^{-2t}) + 4(-2ve^{-2t} + v'e^{-2t}) + 4(ve^{-2t}) \\ &= v''e^{-2t} \end{aligned} \tag{4.3.6}$$
Since $e^{-2t}$ is never zero, it follows that $v''(t)$ must equal zero for all values of $t$. This implies that $v(t)$ can be any linear function. Because all we seek is one function $y_2 = v(t)e^{-2t}$ that is a solution to (4.3.3) and is linearly independent from $y_1 = e^{-2t}$, it suffices to choose $v(t) = t$. Specifically, $y_2 = te^{-2t}$ is a second linearly independent solution to (4.3.3). The general solution is therefore
$$y(t) = c_1 e^{-2t} + c_2 te^{-2t}$$
The condition we derived at (4.3.6) for $v(t)$ will hold in any situation where the characteristic equation of a second-order linear homogeneous DE has a repeated root. This leads us to state the following theorem.

Theorem 4.3.1 For any second-order linear homogeneous differential equation of the form
$$y'' + 2ky' + k^2 y = 0$$
whose characteristic equation has repeated real root $r = -k$, the general solution to the differential equation is
$$y = c_1 e^{-kt} + c_2 te^{-kt}$$
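Theorem 4.3.1 can be spot-checked by evaluating the left side of $y'' + 2ky' + k^2 y = 0$ at the proposed general solution. The following is a minimal sketch in which the derivatives of $y = (c_1 + c_2 t)e^{-kt}$ are written out by the product rule; the function name `residual_repeated` and the sample constants and times are hypothetical choices made for illustration.

```python
import math

def residual_repeated(k, c1, c2, t):
    """Evaluate y'' + 2k*y' + k^2*y for y = (c1 + c2*t) * e^(-k t)."""
    e = math.exp(-k * t)
    u = c1 + c2 * t                              # the linear factor v(t)
    y = u * e
    dy = (c2 - k * u) * e                        # product rule
    d2y = (-2.0 * k * c2 + k * k * u) * e        # product rule applied again
    return d2y + 2.0 * k * dy + k * k * y

# Equation (4.3.3) corresponds to k = 2.
worst = max(abs(residual_repeated(2.0, 1.5, -0.75, t))
            for t in (-1.0, 0.0, 0.5, 2.0))
print("largest residual:", worst)
```

The residual vanishes up to rounding error for any choice of $c_1$ and $c_2$, mirroring the cancellation seen in (4.3.6), where only the $v''$ term survives and $v'' = 0$ for a linear $v$.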


Before proceeding to the case of complex roots, we consider one example to demonstrate theorem 4.3.1 at work.

Example 4.3.1 Determine the general solution to the equation
$$y'' - 10y' + 25y = 0 \tag{4.3.7}$$
Solution. The characteristic equation of the given DE is $r^2 - 10r + 25 = (r - 5)^2 = 0$, which has the repeated root $r = 5$. By theorem 4.3.1, it follows that the general solution to (4.3.7) is
$$y = c_1 e^{5t} + c_2 te^{5t}$$

4.3.2 Complex roots

We continue to be guided throughout our work with second-order linear homogeneous equations by the informed guess that the solution has form $y = e^{rt}$. When this guess and the corresponding characteristic equation result in two distinct, real values of $r$, we have found the general solution to the given differential equation. Likewise, we have just shown that when the characteristic equation has only one real root, we can still find the general solution to the DE. We next explore how, even in the complex case, we can find the general solution through our original guess, $y = e^{rt}$.

We return to the example
$$y'' - 2y' + 5y = 0 \tag{4.3.8}$$
and recall that the roots of the characteristic equation are $r = 1 \pm 2i$. While this suggests that $z(t) = e^{(1+2i)t}$ should be a solution of the differential equation, the function $z(t)$ is complex-valued. When we encountered a similar situation in section 3.5 for a linear system whose coefficient matrix had complex eigenvalues and complex eigenvectors, we used Euler's formula to separate such a complex-valued function into real and imaginary parts in order to find real solutions. We proceed similarly here. Recall that Euler's formula states that $e^{i\theta} = \cos\theta + i\sin\theta$, so
$$e^{(a+bi)t} = e^{at}e^{ibt} = e^{at}(\cos bt + i\sin bt)$$
For the complex solution $z(t)$ to (4.3.8), we thus find that
$$z(t) = e^{(1+2i)t} = e^t(\cos 2t + i\sin 2t) = e^t\cos 2t + ie^t\sin 2t \tag{4.3.9}$$
In (4.3.9), we see that $z(t)$ has been written in the form $z(t) = \mathrm{Re}(z) + i\,\mathrm{Im}(z)$, where $\mathrm{Re}(z)$ and $\mathrm{Im}(z)$ are themselves real-valued functions of $t$. Based on our experience with systems of differential equations with complex-valued solutions, it is natural at this point to hope that both the real and imaginary parts of $z(t)$ will be linearly independent solutions to (4.3.8). Indeed, if we let $y_1 = e^t\cos 2t$ and $y_2 = e^t\sin 2t$, then it can be shown by direct substitution that both $y_1$ and $y_2$ are solutions to (4.3.8). Because $y_1$ and $y_2$ are not scalar multiples of each other, these two functions are linearly independent, and therefore, by theorem 4.2.3, it follows that
$$y(t) = c_1 e^t\cos 2t + c_2 e^t\sin 2t$$
is the general solution to (4.3.8).

The direct substitution that is used to verify that the real and imaginary parts of $z(t)$ are solutions to the original equation is somewhat tedious, but not difficult. In fact, in the more general case where we have complex roots $a \pm bi$, it can be similarly verified by direct substitution into the corresponding second-order equation that $y_1 = e^{at}\cos bt$ and $y_2 = e^{at}\sin bt$ are each solutions to the equation. Note that this scenario implies that the characteristic equation has form $C(r) = 0$, where
$$\begin{aligned} C(r) &= [r - (a + bi)][r - (a - bi)] \\ &= r^2 - (a + bi)r - (a - bi)r + (a + bi)(a - bi) \\ &= r^2 - 2ar + (a^2 + b^2) \end{aligned} \tag{4.3.10}$$
This shows that, up to a scalar multiple of the equation, complex roots to the characteristic equation arise from second-order homogeneous linear differential equations of the form
$$y'' - 2ay' + (a^2 + b^2)y = 0 \tag{4.3.11}$$
Our work above now enables us to state a formal result on finding real, linearly independent solutions from complex-valued ones.

Theorem 4.3.2 Let $a$ and $b$ be real constants with $b \neq 0$. For the second-order homogeneous linear differential equation
$$y'' - 2ay' + (a^2 + b^2)y = 0$$
the roots of the corresponding characteristic equation are $r = a \pm bi$ and the general solution to the differential equation is given by
$$y = c_1 e^{at}\cos bt + c_2 e^{at}\sin bt$$
Note that it is precisely the presence of complex roots to the characteristic equation that produces the periodic functions $\cos bt$ and $\sin bt$ in the solution. In physical situations such as spring-mass systems and RLC circuits where we anticipate that solutions will have a sinusoidal component, we can expect that the characteristic equation will have complex roots. We conclude this section by applying theorem 4.3.2 in the following example.
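The passage from complex roots to real solutions can be illustrated with complex arithmetic. The sketch below, under the assumption that the characteristic polynomial $r^2 + a_1 r + a_0$ has negative discriminant, extracts $a$ and $b$ from the coefficients and then compares the real and imaginary parts of $e^{(a+bi)t}$ with $e^{at}\cos bt$ and $e^{at}\sin bt$ at a sample time. The function name `complex_roots` and the sample time $t = 0.7$ are hypothetical choices for illustration.

```python
import cmath
import math

def complex_roots(a1, a0):
    """Return (a, b) with roots a +/- bi of r^2 + a1*r + a0 = 0.

    Assumes a negative discriminant, so that b > 0.
    """
    disc = a1 * a1 - 4.0 * a0
    if disc >= 0:
        raise ValueError("this sketch only handles complex roots")
    return -a1 / 2.0, math.sqrt(-disc) / 2.0

# Equation (4.3.8): y'' - 2y' + 5y = 0, so a1 = -2 and a0 = 5.
a, b = complex_roots(-2.0, 5.0)

t = 0.7                                     # arbitrary sample time
z = cmath.exp(complex(a, b) * t)            # complex solution z(t) = e^{(a+bi)t}
y1 = math.exp(a * t) * math.cos(b * t)      # expected real part
y2 = math.exp(a * t) * math.sin(b * t)      # expected imaginary part
gap = max(abs(z.real - y1), abs(z.imag - y2))
print("a, b:", a, b)
print("largest difference from Euler's formula:", gap)
```

Agreement up to rounding error is simply Euler's formula in action; it does not by itself show that $y_1$ and $y_2$ solve the differential equation, which still requires the direct substitution described above.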


Example 4.3.2 Solve the initial-value problem given by
$$y'' + 2y' + 10y = 0, \quad y(0) = 1, \quad y'(0) = 1$$
Plot the solution and discuss its long-term behavior.

Solution. We first find the general solution to the given differential equation. The corresponding characteristic equation is $r^2 + 2r + 10 = 0$, with roots
$$r = -1 \pm 3i$$
By theorem 4.3.2 it follows that the general solution is
$$y = c_1 e^{-t}\cos 3t + c_2 e^{-t}\sin 3t$$
To determine the solution to the stated IVP, first note that $y(0) = 1$ implies that
$$1 = c_1 e^0\cos(0) + c_2 e^0\sin(0)$$
so that $c_1 = 1$. In addition, since
$$y' = -c_1 e^{-t}\cos 3t - c_2 e^{-t}\sin 3t - 3c_1 e^{-t}\sin 3t + 3c_2 e^{-t}\cos 3t$$
it follows from the fact that $y'(0) = 1$ that
$$1 = -c_1 + 3c_2$$
Since $c_1 = 1$, we find that $c_2 = 2/3$ and hence the solution to the IVP is
$$y = e^{-t}\cos 3t + \frac{2}{3}e^{-t}\sin 3t$$
Plotting the function $y$ in figure 4.2, we see that the function $y(t)$ oscillates due to the presence of the trigonometric functions, while $y(t) \to 0$ as $t \to \infty$ because of the damping effect of $e^{-t}$. In fact, the graphical behavior demonstrated by $y(t)$ in figure 4.2 is precisely what we would expect if the given IVP were modeling a spring-mass system where relatively small damping is present: the mass will oscillate once set in motion, but will eventually return to equilibrium.

Figure 4.2 A plot of the solution y(t) to the IVP given in example 4.3.2.

Exercises 4.3

In exercises 1–9, use the characteristic equation to determine the general solution to the given second-order linear homogeneous differential equation.
1. $y'' - 8y' + 16y = 0$
2. $y'' + y' + y = 0$
3. $y'' + y' + \frac{1}{4}y = 0$
4. y − 4y = 0
5. y + 4y = 0
6. $y'' - 10y' + 50y = 0$
7. $y'' - 10y' + 25y = 0$
8. $y'' = 0$
9. $2y'' + 7y' + 5y = 0$

In exercises 10–18, solve the stated initial-value problem. In addition, graph your solution and discuss its long-term behavior. Note that the general solution to each equation has been found in corresponding problems in exercises 1–9.
10. $y'' - 8y' + 16y = 0$, $y(0) = 2$, $y'(0) = 1$
11. $y'' + y' + y = 0$, $y(0) = -4$, $y'(0) = 2$
12. $y'' + y' + \frac{1}{4}y = 0$, $y(0) = 0$, $y'(0) = -1$
13. y − 4y = 0, $y(0) = 7$, $y'(0) = -5$
14. y + 4y = 0, $y(0) = 2$, $y'(0) = 3$
15. $y'' - 10y' + 50y = 0$, $y(0) = -3$, $y'(0) = 1$
16. $y'' - 10y' + 25y = 0$, $y(0) = -2$, $y'(0) = -6$
17. $y'' = 0$, $y(0) = 0$, $y'(0) = 0$
18. $2y'' + 7y' + 5y = 0$, $y(0) = 9$, $y'(0) = -3$

19. Consider the second-order linear homogeneous equation $y'' - 6y' + 9y = 0$.
(a) Find the general solution $y$ of the given equation.
(b) Convert the given equation to a system $\mathbf{x}' = A\mathbf{x}$ of two first-order equations using the substitution $x_1 = y$, $x_2 = y'$.


(c) Solve the system $\mathbf{x}' = A\mathbf{x}$.
(d) Compare your results for $y$ and $x_1$. What do you observe?

20. Consider the second-order linear homogeneous equation $y'' + 6y' + 10y = 0$.
(a) Find the general solution $y$ of the given equation.
(b) Convert the given equation to a system $\mathbf{x}' = A\mathbf{x}$ of two first-order equations using the substitution $x_1 = y$, $x_2 = y'$.
(c) Solve the system $\mathbf{x}' = A\mathbf{x}$.
(d) Compare your results for $y$ and $x_1$. What do you observe?

21. Consider the general second-order linear homogeneous equation with constant coefficients given by
$$y'' + a_1 y' + a_0 y = 0$$
Under what conditions on $a_1$ and $a_0$ does the characteristic equation have two real distinct roots? one real repeated root? two distinct complex roots?

Recall that in a spring-mass system, the displacement $y(t)$ of the mass from its natural equilibrium is governed by the equation
$$y'' + \frac{c}{m}y' + \frac{k}{m}y = \frac{1}{m}F(t)$$
where $c$ is the damping constant, $k$ is the spring constant, $m$ is the mass of the suspended object, and $F(t)$ is the forcing function. In the following exercises, we assume that units on all quantities and constants are consistent.

22. For an unforced spring-mass system with $c = 2$, $k = 1$, and $m = 1$, determine the displacement of the mass at time $t$ if the system is set in motion with the initial conditions $y(0) = 2$, $y'(0) = 1$. Sketch the solution you determine and discuss the behavior of the spring-mass system.
23. For an unforced, undamped spring-mass system with $k = 9$ and $m = 3$, determine the displacement of the mass from equilibrium at time $t$ if $y(0) = 2$ and $y'(0) = 1$. Sketch the solution you determine and discuss the behavior of the spring-mass system.
24. For an unforced spring-mass system with $c = 1$, $k = 2$, and $m = 1$, determine the displacement of the mass at time $t$ if the system is set in motion with the initial conditions $y(0) = 2$, $y'(0) = 1$. Sketch the solution you determine and discuss the behavior of the spring-mass system.

Recall that in a standard RLC electrical circuit, the current $I(t)$ satisfies the equation
$$LI''(t) + RI'(t) + \frac{1}{C}I(t) = E'(t)$$
where $L$ is the inductance, $R$ is the resistance, $C$ is the capacitance, and $E(t)$ represents an external voltage source. In the following exercises, we assume that units on all quantities and constants are consistent.


25. For an RLC circuit with no external voltage source, $L = 10$, $R = 40$, and $C = 1/40$, determine the current at time $t$ given the initial conditions $I(0) = 100$, $I'(0) = 25$. Sketch the solution you determine and discuss the behavior of the current.

26. For an RLC circuit with no external voltage source, $L = 10$, $R = 40$, and $C = 1/50$, determine the current at time $t$ given the initial conditions $I(0) = 100$, $I'(0) = 25$. Sketch the solution you determine and discuss the behavior of the current.

27. For an RLC circuit with no external voltage source, $L = 10$, $R = 0$, and $C = 1/90$, determine the current at time $t$ given the initial conditions $I(0) = 100$, $I'(0) = 25$. Sketch the solution you determine and discuss the behavior of the current.

4.4 Nonhomogeneous equations

As motivated by a spring-mass system with a driving force or an RLC circuit with an external voltage source, we are now interested in solving second-order nonhomogeneous linear differential equations of the form

$$y'' + a_1 y' + a_0 y = f(t) \tag{4.4.1}$$

where $f(t)$ is not zero. We already know a theoretical way to solve such an equation: through the substitution $x_1 = y$ and $x_2 = y'$, we can convert (4.4.1) to a system of two first-order equations in the form $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$ and solve the two first-order DEs. While this approach works in theory, the actual execution of the process can be cumbersome. In fact, it is often much easier to solve (4.4.1) directly through the approaches we present in this section.

Analogous to several other types of linear algebraic and linear differential equations, a general principle from our work with nonhomogeneous equations guides us throughout: we first seek a complementary solution $y_h(t)$ to the corresponding homogeneous equation

$$y'' + a_1 y' + a_0 y = 0 \tag{4.4.2}$$

and then determine a particular solution $y_p(t)$ to the nonhomogeneous equation (4.4.1). It follows that $y = y_h + y_p$ will be the general solution to the nonhomogeneous equation. Indeed, we have the following theorem, a part of whose formal proof will be addressed in exercise 33 at the end of this section.

Theorem 4.4.1 Given the equation

$$y'' + a_1 y' + a_0 y = f(t) \tag{4.4.3}$$

where $a_0$ and $a_1$ are constants, if $y_h(t)$ is the general solution to the corresponding homogeneous equation $y'' + a_1 y' + a_0 y = 0$ and $y_p(t)$ is any solution to the nonhomogeneous equation (4.4.3), then $y = y_h + y_p$ is the general solution to (4.4.3).
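One half of the theorem can be seen directly from the linearity of differentiation. The following computation is only a sketch of that step: it shows that $y_h + y_p$ is a solution of (4.4.3), and it does not address why every solution must have this form.

$$(y_h + y_p)'' + a_1 (y_h + y_p)' + a_0 (y_h + y_p) = \underbrace{(y_h'' + a_1 y_h' + a_0 y_h)}_{=\,0} + \underbrace{(y_p'' + a_1 y_p' + a_0 y_p)}_{=\,f(t)} = f(t)$$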


We already understand how to find $y_h$, which depends entirely on the roots of the characteristic equation $r^2 + a_1 r + a_0 = 0$ as discussed in sections 4.2 and 4.3. It remains, however, to find $y_p$. To do so, we explore two methods: the guessing technique of undetermined coefficients, and the brute force technique of variation of parameters. Each of these methods is analogous to those that may be used to solve nonhomogeneous systems of the form $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$.

4.4.1 Undetermined coefficients

At this point in our discussion, examples are instructive. We consider several different nonhomogeneous linear second-order DEs to see how making reasonable guesses for the form of $y_p(t)$ can lead to the general solution in many elementary cases. Throughout, we use the following idea to guide our choice of the form of $y_p(t)$: since the first and second derivatives of many functions are similar to the original function (e.g., derivatives of sine and cosine functions are cosine and sine functions, derivatives of exponential functions are exponential functions, derivatives of polynomial functions are polynomials), and in equations of the form (4.4.3) we take linear combinations of $y$, $y'$, and $y''$ to get $f(t)$, it is reasonable to guess that the form of $y_p(t)$ will be similar to the form of $f(t)$, the forcing function in the nonhomogeneous equation. We first see this for polynomial functions in the first example.

Example 4.4.1 Determine the general solution to

$$y'' - 3y' - 4y = 4t^2 + 2t - 9 \tag{4.4.4}$$

Solution. For the associated homogeneous equation, $y'' - 3y' - 4y = 0$, by theorem 4.2.4 the complementary solution is $y_h = c_1 e^{-t} + c_2 e^{4t}$. For a particular solution, we naturally guess that $y_p$ has the form

$$y_p = at^2 + bt + c \tag{4.4.5}$$

based on the form of the forcing function. The undetermined coefficients $a$, $b$, and $c$ are found by direct substitution into (4.4.4). Note that $y_p' = 2at + b$ and $y_p'' = 2a$, so that from (4.4.4) we find

$$2a - 3(2at + b) - 4(at^2 + bt + c) = 4t^2 + 2t - 9$$

Rearranging the left-hand side of this equation, it follows that

$$-4at^2 + (-6a - 4b)t + (2a - 3b - 4c) = 4t^2 + 2t - 9 \tag{4.4.6}$$

Equating like coefficients of the power functions present in (4.4.6), the system of equations

$$-4a = 4, \qquad -6a - 4b = 2, \qquad 2a - 3b - 4c = -9$$


must hold. We see that $a = -1$, from which it follows that $b = 1$ and $c = 1$, so that $y_p = -t^2 + t + 1$. Combining this with $y_h$, we have determined that the general solution to (4.4.4) is

$$y = c_1 e^{-t} + c_2 e^{4t} - t^2 + t + 1$$

We can imagine that if $f(t)$ were a polynomial other than $4t^2 + 2t - 9$, we would have guessed that $y_p$ was a general polynomial of the same degree with unknown coefficients. This approach almost always works; we will discuss some exceptions that can arise after examples involving non-polynomial forcing functions.

Example 4.4.2 Determine the general solution to

$$y'' - y = 16e^{3t} \tag{4.4.7}$$

Solution. Just as in example 4.4.1, we first solve the corresponding homogeneous equation and find $y_h$. Doing so, we observe that for $y'' - y = 0$, the solution $y_h$ is

$$y_h = c_1 e^{t} + c_2 e^{-t}$$

For the particular solution, we use the natural guess that $y_p = Ae^{3t}$. From this, $y_p' = 3Ae^{3t}$ and $y_p'' = 9Ae^{3t}$, so substituting into (4.4.7), we find

$$9Ae^{3t} - Ae^{3t} = 16e^{3t}$$

Equating the coefficients of $e^{3t}$, it follows that $8A = 16$, so $A = 2$ and therefore $y_p = 2e^{3t}$. Hence we have found the general solution of (4.4.7) to be

$$y = y_h + y_p = c_1 e^{t} + c_2 e^{-t} + 2e^{3t}$$

Here, we observe that if $f(t)$ in (4.4.7) were a different exponential function, say of the form $f(t) = Be^{kt}$, we would again guess that $y_p = Ae^{kt}$. This is based on the fact that our guess for $y_p$ incorporates all the possible forms of the derivatives of $f(t)$. Just as with polynomial forcing functions, this approach almost always works. We will consider situations where these natural educated guesses can fail following one more example.

Example 4.4.3

Determine the general solution to

$$y'' - y' - 2y = 10\sin t \tag{4.4.8}$$

Solution. First, we observe that the complementary solution can be shown to be

$$y_h = c_1 e^{2t} + c_2 e^{-t}$$

To find $y_p$, we guess that

$$y_p = A\sin t + B\cos t$$

Note that we must include the cosine function in $y_p$ in order to account for the fact that the cosine function arises in the derivative of $f(t) = 10\sin t$.


From our guess for $y_p$, it follows that $y_p' = A\cos t - B\sin t$ and $y_p'' = -A\sin t - B\cos t$. Substituting in (4.4.8), we see that $A$ and $B$ must satisfy the equation

$$(-A\sin t - B\cos t) - (A\cos t - B\sin t) - 2(A\sin t + B\cos t) = 10\sin t \tag{4.4.9}$$

Rearranging (4.4.9) in order to compare coefficients of the sine and cosine functions, we have

$$(-A + B - 2A)\sin t + (-B - A - 2B)\cos t = 10\sin t$$

from which it follows that $-3A + B = 10$ and $-A - 3B = 0$. Consequently, $A = -3$ and $B = 1$, so that $y_p = -3\sin t + \cos t$. Therefore we have shown that the general solution of (4.4.8) is

$$y = y_h + y_p = c_1 e^{2t} + c_2 e^{-t} - 3\sin t + \cos t$$

In the more general setting where we imagine the forcing function $f(t)$ involving $\sin kt$ or $\cos kt$, it will be natural to make the guess that $y_p = A\sin kt + B\cos kt$, which again will work in most cases.

We have hinted that while the method of undetermined coefficients will usually work, it can occasionally fail. What can go wrong? First, if the forcing function $f(t)$ is particularly complicated, this can make determining a reasonable guess for $y_p$ challenging. Moreover, even if $f(t)$ is a relatively simple function, when its derivatives take on unusual forms (for example, $f(t) = \ln t$, where $f'(t)$ and $f''(t)$ are not logarithmic) we may find it difficult or impossible to find a form of $y_p$ that works. These two situations will be addressed by the variation of parameters method that we introduce in the next subsection. In addition, there is one more case in which undetermined coefficients can fail, yet the difficulty is straightforward to reconcile. An example is instructive.

Example 4.4.4 Find the general solution to the differential equation

$$y'' - y = 16e^{-t} \tag{4.4.10}$$

Solution. Note that this differential equation is nearly identical to the one considered in example 4.4.2, but here the forcing function is $f(t) = 16e^{-t}$, rather than $f(t) = 16e^{3t}$. As above, it still holds that $y_h = c_1 e^{t} + c_2 e^{-t}$. In addition, we naturally guess that $y_p = Ae^{-t}$, from which it follows that $y_p' = -Ae^{-t}$ and $y_p'' = Ae^{-t}$. Substituting in (4.4.10), we have

$$Ae^{-t} - Ae^{-t} = 16e^{-t}$$

But this last equality is clearly impossible, regardless of the value of $A$, since $0 = 16e^{-t}$ is never true. We can determine where the method failed by observing that in this case, our guess for the particular solution $y_p$ was actually part of the complementary solution. Note that $y_h = c_1 e^{t} + c_2 e^{-t}$, from which it follows that $y_p$ cannot have the form $Ae^{-t}$, since this latter function belongs to $y_h$.


We therefore need a more complicated guess for $y_p$; a natural one to attempt is

$$y_p = Ate^{-t} \tag{4.4.11}$$

where we have introduced the additional multiplier $t$. From this, $y_p' = -Ate^{-t} + Ae^{-t}$ and $y_p'' = Ate^{-t} - Ae^{-t} - Ae^{-t}$. Substituting in (4.4.10), it now follows that

$$(Ate^{-t} - 2Ae^{-t}) - (Ate^{-t}) = 16e^{-t}$$

Rearranging and simplifying this last equation in order to compare like coefficients of $e^{-t}$ and $te^{-t}$, we see that the terms involving $te^{-t}$ drop out and we are left with

$$-2Ae^{-t} = 16e^{-t}$$

so that $A = -8$ and $y_p = -8te^{-t}$. We therefore have shown that the general solution is

$$y = y_h + y_p = c_1 e^{t} + c_2 e^{-t} - 8te^{-t}$$

The preceding example shows that if the form of the forcing function matches the form of one or more parts of the complementary solution $y_h$, then we have to use a different, more complicated guess for $y_p$ than the most natural one. One more example will be helpful before we make some general conclusions.

Example 4.4.5

Find the general solution of

$$y'' - y' = 4t \tag{4.4.12}$$

Solution. From the characteristic equation $r^2 - r = 0$ for the corresponding homogeneous equation, we quickly deduce that

$$y_h = c_1 + c_2 e^{t}$$

Next, since $f(t) = 4t$, we naturally guess that $y_p$ is a first order polynomial: $y_p = at + b$. From this, $y_p' = a$ and $y_p'' = 0$. Substituting in (4.4.12), we find

$$0 - a = 4t$$

Clearly, there is no value of $a$ that makes $-a = 4t$ for all values of $t$, so there can be no particular solution $y_p$ of the form $y_p = at + b$. From another perspective, we can see why this must be true by observing that the "$b$" in our guess for $y_p$ is already part of the complementary solution, since any constant function is a solution to $y'' - y' = 0$.

Therefore, we revise our guess for $y_p$ and assume it has the form $y_p = t(at + b) = at^2 + bt$. Doing so, we now have $y_p' = 2at + b$ and $y_p'' = 2a$, so substituting in (4.4.12) it follows that

$$2a - (2at + b) = 4t$$


Rearranging so that we can equate like coefficients, we have

$$-2at + (2a - b) = 4t$$

so $-2a = 4$ and $2a - b = 0$. It follows that $a = -2$ and $b = -4$, and thus $y_p = -2t^2 - 4t$. Therefore, we have found the general solution of (4.4.12) to be

$$y = c_1 + c_2 e^{t} - 2t^2 - 4t$$

From our work with examples 4.4.1–4.4.5, we observe that the method of undetermined coefficients breaks down into two fundamental cases.

Case 1. No functions in the assumed particular solution $y_p$ are also solutions to the associated homogeneous differential equation.

Case 2. A function in the assumed particular solution $y_p$ is also a solution of the associated homogeneous differential equation.

Moreover, we can observe that when the forcing function $f(t)$ is a sum of polynomial, exponential, and sine and cosine functions, the linearity of the differential equation allows us to guess a form for $y_p$ that is an appropriate sum of all the different types of functions represented. The following example shows some of the variety that arises in choosing the form of $y_p$.

Example 4.4.6 Write an appropriate guess for $y_p$ for each of the following equations. Do not solve for the unknown coefficients.
(a) $y'' + y = 4e^{3t} + 5t^2$
(b) $y'' - 5y' - 6y = 3e^{-2t} + 4\cos 3t$
(c) $y'' - 2y' + 5y = 3te^{t}$
(d) $y'' - 4y' - 5y = 3e^{2t}\sin t$

Solution.
(a) The forcing function $f(t) = 4e^{3t} + 5t^2$ combines an exponential function and a second degree polynomial, so we would guess that $y_p = Ae^{3t} + bt^2 + ct + d$.
(b) The natural guess is $y_p = Ae^{-2t} + B\cos 3t + C\sin 3t$ to account for the exponential and trigonometric functions present.
(c) $f(t) = 3te^{t}$ is a product of a linear function and an exponential one. Its derivatives will be sums of functions of the same form and constant multiples of exponential functions, so we assume that $y_p = Ate^{t} + Be^{t} = e^{t}(At + B)$.
(d) We observe that every derivative of $f(t) = 3e^{2t}\sin t$ is the sum of functions of the form $Ae^{2t}\cos t + Be^{2t}\sin t$, so that we would guess that $y_p = Ae^{2t}\cos t + Be^{2t}\sin t$.
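The particular solutions found in examples 4.4.1–4.4.5 can be sanity-checked numerically. The following Python sketch is not part of the method itself: it assumes a hypothetical helper `residual` that approximates $y_p'$ and $y_p''$ by central differences and evaluates $y_p'' + a_1 y_p' + a_0 y_p - f(t)$, which should be close to zero (up to finite-difference error) whenever $y_p$ really is a particular solution.

```python
import math

def residual(y, a1, a0, f, t, h=1e-4):
    """Approximate y'' + a1*y' + a0*y - f(t) using central differences."""
    d1 = (y(t + h) - y(t - h)) / (2 * h)
    d2 = (y(t + h) - 2 * y(t) + y(t - h)) / (h * h)
    return d2 + a1 * d1 + a0 * y(t) - f(t)

# Each entry: (y_p, a1, a0, f) for the equation y'' + a1*y' + a0*y = f(t).
examples = {
    "4.4.1": (lambda t: -t**2 + t + 1, -3, -4, lambda t: 4 * t**2 + 2 * t - 9),
    "4.4.2": (lambda t: 2 * math.exp(3 * t), 0, -1, lambda t: 16 * math.exp(3 * t)),
    "4.4.3": (lambda t: -3 * math.sin(t) + math.cos(t), -1, -2,
              lambda t: 10 * math.sin(t)),
    "4.4.4": (lambda t: -8 * t * math.exp(-t), 0, -1, lambda t: 16 * math.exp(-t)),
    "4.4.5": (lambda t: -2 * t**2 - 4 * t, -1, 0, lambda t: 4 * t),
}

def max_residual(name, points=(0.0, 0.5, 1.0)):
    """Largest absolute residual of the named example over a few sample times."""
    y, a1, a0, f = examples[name]
    return max(abs(residual(y, a1, a0, f, t)) for t in points)

if __name__ == "__main__":
    for name in examples:
        print(name, max_residual(name))
```

A small residual at a handful of sample times is evidence, not proof; the substitution carried out by hand in each example remains the actual verification. By contrast, feeding in the failed guess $Ae^{-t}$ from example 4.4.4 leaves a residual of about $-16e^{-t}$ for every value of $A$.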


Note the general rule we are using in case 1 and example 4.4.6: provided the terms of $f(t)$ do not belong to $y_h$, the form of $y_p$ is a linear combination of all linearly independent functions that are generated by repeated differentiation of the forcing function $f(t)$.

For dealing with equations that fall into case 2, we make a guess $y_p$ that is a sum of functions similar to those present in $f(t)$. We then have to tack on powers of $t$ to modify any parts of $y_p$ that already appear in $y_h$. In particular, we use the rule that if any part of $y_p$ contains terms that duplicate terms in $y_h$, then we must multiply that part by $t^n$, using the smallest possible value of $n$ that eliminates the duplication. For example, if we wanted to solve $y'' + 4y' + 4y = 3e^{-2t}$, which has characteristic equation $r^2 + 4r + 4 = (r + 2)^2 = 0$, our work in section 4.3 implies that

$$y_h = c_1 e^{-2t} + c_2 te^{-2t}$$

Therefore, for the form of $y_p$, which we initially might assume to be $y_p = Ae^{-2t}$, we see that we must in fact introduce a multiplier of $t^2$ in order to ensure that $y_p$ does not appear in $y_h$. Thus, the appropriate form of $y_p$ is $y_p = At^2 e^{-2t}$. A few more examples of the possibilities that arise in case 2 are useful.

Example 4.4.7 Write an appropriate trial solution $y_p$ for each of the following examples. Do not solve for the unknown coefficients.
(a) $y'' - y = 4e^{t} + 5e^{-t}$
(b) $y'' + 4y = 4\cos 2t$
(c) $y'' - 2y' + y = 3te^{t}$

Solution.
(a) Observe from the characteristic equation $r^2 - 1 = 0$ that $y_h = c_1 e^{t} + c_2 e^{-t}$, so both parts of the forcing function appear in $y_h$. We therefore assume that $y_p = Ate^{t} + Bte^{-t}$.
(b) The characteristic equation is $r^2 + 4 = 0$ with roots $r = \pm 2i$. It follows that $y_h = c_1\sin 2t + c_2\cos 2t$. Since $\cos 2t$ appears in the forcing function, and both $\sin 2t$ and $\cos 2t$ arise in $y_h$, the appropriate guess for $y_p$ is $y_p = At\cos 2t + Bt\sin 2t$.
(c) Note that the characteristic equation is $r^2 - 2r + 1 = (r - 1)^2 = 0$, so that $y_h = c_1 e^{t} + c_2 te^{t}$. The natural guess for the forcing function $3te^{t}$ is $e^{t}(At + B)$, as in example 4.4.6(c). Since both $e^{t}$ and $te^{t}$ are included in $y_h$, we must multiply this guess by $t^2$ and choose $y_p = t^2 e^{t}(At + B) = At^3 e^{t} + Bt^2 e^{t}$.

Clearly, the method of undetermined coefficients requires us to be experienced with a wide range of examples and to understand how the derivatives of the forcing function behave. The exercises at the end of this section will provide further practice in this regard.
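The case 2 rule can be phrased in terms of the characteristic equation. For a forcing term built on $e^{rt}$ (with $r = 0$ for a polynomial and $r = \alpha \pm \beta i$ for $e^{\alpha t}\cos\beta t$ or $e^{\alpha t}\sin\beta t$), the smallest workable power $n$ equals the number of times $r$ occurs as a root of $r^2 + a_1 r + a_0 = 0$. This is a standard restatement rather than a formula given above, and the function name `multiplier_power` below is a hypothetical choice made only for this sketch.

```python
def multiplier_power(a1, a0, r, tol=1e-9):
    """Power n of t needed so that t**n * e^{r t} does not duplicate y_h.

    a1, a0 are the coefficients of y'' + a1*y' + a0*y = f(t); r may be complex.
    n is the multiplicity of r as a root of the characteristic polynomial.
    """
    p = r * r + a1 * r + a0   # characteristic polynomial evaluated at r
    dp = 2 * r + a1           # its derivative evaluated at r
    if abs(p) > tol:
        return 0              # case 1: r is not a characteristic root
    if abs(dp) > tol:
        return 1              # simple root: multiply the guess by t
    return 2                  # repeated root: multiply the guess by t**2

if __name__ == "__main__":
    print(multiplier_power(4, 4, -2))   # y'' + 4y' + 4y = 3e^{-2t}
    print(multiplier_power(0, 4, 2j))   # y'' + 4y = 4 cos 2t
    print(multiplier_power(0, -1, 3))   # y'' - y = 16e^{3t}
```

For a forcing term such as $3te^{t}$, the same power of $t$ multiplies the whole natural guess $e^{t}(At + B)$, not just one of its terms.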


4.4.2 Variation of parameters

Recall that we are focusing on solving the nonhomogeneous linear second-order equation

$$y'' + a_1 y' + a_0 y = f(t)$$

While the method of undetermined coefficients works well for a reasonable collection of forcing functions, it has some fairly strict limitations. In particular, it is unclear whether it is possible to make a reasonable guess for $y_p$ in order to solve an equation such as $y'' + 4y' - 5y = \ln t$. In fact, we cannot: the derivative of the logarithm function is not a logarithm, and this is the main issue that prevents the use of this method.¹ Here, we study a method that will enable us, in theory, to solve a much wider class of nonhomogeneous linear second-order equations; as always, the approach requires us to find the general solution to the related homogeneous equation first.

Let us again consider the equation

$$y'' + a_1 y' + a_0 y = f(t) \tag{4.4.13}$$

where $a_0$ and $a_1$ are constant, and assume only that $f(t)$ is continuous. Suppose we know that $y_1(t)$ and $y_2(t)$ are linearly independent solutions of the associated homogeneous equation, so the complementary solution is $y_h = c_1 y_1(t) + c_2 y_2(t)$. In the method of undetermined coefficients, we made a guess of a particular solution $y_p$ to (4.4.13) based on the form of $f(t)$. In the method of variation of parameters, we assume instead that the form of $y_p$ is a more complicated version of $y_h$. In particular, we assume that $y_p$ has the form

$$y_p = u_1(t) y_1(t) + u_2(t) y_2(t) \tag{4.4.14}$$

for unknown functions $u_1$ and $u_2$, where again $y_1$ and $y_2$ are the functions that arose in solving the related homogeneous equation. The goal of variation of parameters is to find the functions $u_1(t)$ and $u_2(t)$ such that the function $y_p = u_1 y_1 + u_2 y_2$ is a particular solution to (4.4.13). Let us explore what conditions $u_1(t)$ and $u_2(t)$ must satisfy. Differentiating $y_p$ yields

$$y_p' = u_1 y_1' + u_1' y_1 + u_2 y_2' + u_2' y_2 \tag{4.4.15}$$

While it seems natural at this point to differentiate again to find $y_p''$ and substitute into the differential equation, this becomes rather complicated. Above we have seen that the two unknown functions must satisfy one condition (so far), that being the differential equation itself, as stated in (4.4.13). Because we have two functions, we have the freedom to set a second condition as well. In order to make the functions as simple as possible, and to eliminate the second derivatives of $u_1$ and $u_2$ from arising in $y_p''$, we impose a second condition given by

$$u_1' y_1 + u_2' y_2 = 0 \tag{4.4.16}$$

¹ If we tried the guess $y_p = A\ln t$, then $y_p' = A/t$, which introduces a function of an entirely new form. If we tried $y_p = A\ln t + B/t$, then the derivative leads us to a function involving $1/t^2$, again of a form not considered.

Observe now that by substituting the condition (4.4.16) in (4.4.15) we have

$$y_p' = u_1 y_1' + u_2 y_2'$$

so that

$$y_p'' = u_1 y_1'' + u_1' y_1' + u_2 y_2'' + u_2' y_2'$$

Substituting the above expressions for $y_p'$ and $y_p''$ in (4.4.13) yields

$$(u_1 y_1'' + u_1' y_1' + u_2 y_2'' + u_2' y_2') + a_1(u_1 y_1' + u_2 y_2') + a_0(u_1 y_1 + u_2 y_2) = f(t) \tag{4.4.17}$$

Reorganizing (4.4.17) according to the terms $u_1$, $u_2$, $u_1'$, and $u_2'$, we have

$$u_1(y_1'' + a_1 y_1' + a_0 y_1) + u_2(y_2'' + a_1 y_2' + a_0 y_2) + (u_1' y_1' + u_2' y_2') = f(t) \tag{4.4.18}$$

Now, at this point we recall that $y_1$ and $y_2$ are fundamental solutions to the associated homogeneous equation $y'' + a_1 y' + a_0 y = 0$, which shows that in (4.4.18) the coefficients of both $u_1$ and $u_2$ are zero. Therefore, (4.4.18) reduces to

$$u_1' y_1' + u_2' y_2' = f(t) \tag{4.4.19}$$

Combining conditions (4.4.16) and (4.4.19) results in the system of linear equations in $u_1'$ and $u_2'$ given by

$$y_1 u_1' + y_2 u_2' = 0$$
$$y_1' u_1' + y_2' u_2' = f(t)$$

To solve for $u_1'$ and $u_2'$, we multiply the first equation by $y_2'$ and the second equation by $y_2$, which gives

$$y_2' y_1 u_1' + y_2' y_2 u_2' = 0$$
$$y_2 y_1' u_1' + y_2 y_2' u_2' = y_2 f \tag{4.4.20}$$

Subtracting the second equation from the first in (4.4.20), we have

$$y_2' y_1 u_1' - y_2 y_1' u_1' = -y_2 f$$

and therefore

$$u_1' = \frac{y_2 f}{y_2 y_1' - y_1 y_2'} \tag{4.4.21}$$

Using similar algebra to solve for $u_2'$, we may show that

$$u_2' = \frac{y_1 f}{y_1 y_2' - y_2 y_1'} \tag{4.4.22}$$

Finally, to determine $u_1$ and $u_2$, we integrate to find

$$u_1 = \int \frac{y_2 f}{y_2 y_1' - y_1 y_2'}\,dt \quad\text{and}\quad u_2 = \int \frac{y_1 f}{y_1 y_2' - y_2 y_1'}\,dt \tag{4.4.23}$$

Once we integrate in (4.4.23) to solve for $u_1$ and $u_2$, we can conclude that a particular solution $y_p$ to the original nonhomogeneous linear second-order


differential equation is $y_p = u_1 y_1 + u_2 y_2$, where $y_h = c_1 y_1 + c_2 y_2$. Examples will be helpful to demonstrate the key steps of this method. First, we state the formal result proved by our discussion above.

Theorem 4.4.2 (Variation of Parameters Method) For the differential equation $y'' + a_1 y' + a_0 y = f(t)$, where $f$ is continuous, assume that $y_1$ and $y_2$ are linearly independent solutions of the corresponding homogeneous equation $y'' + a_1 y' + a_0 y = 0$. Then, a particular solution to the nonhomogeneous equation is $y_p = u_1 y_1 + u_2 y_2$, where $u_1$ and $u_2$ satisfy

$$u_1 = \int \frac{y_2 f}{y_2 y_1' - y_1 y_2'}\,dt \quad\text{and}\quad u_2 = \int \frac{y_1 f}{y_1 y_2' - y_2 y_1'}\,dt \tag{4.4.24}$$

Example 4.4.8 Solve the differential equation

$$y'' + y = \sec t \tag{4.4.25}$$

where we assume that $-\frac{\pi}{2} < t < \frac{\pi}{2}$.

Solution. We first observe that the corresponding characteristic equation is $r^2 + 1 = 0$, so that the complementary solution is $y_h = c_1\cos t + c_2\sin t$. In particular, $y_1 = \cos t$ and $y_2 = \sin t$. We now seek two functions $u_1(t)$ and $u_2(t)$ that satisfy the equations (4.4.24). Since $y_1 = \cos t$ and $y_2 = \sin t$, it follows that $y_1' = -\sin t$ and $y_2' = \cos t$, and therefore, we have

$$u_1 = \int \frac{y_2 f}{y_2 y_1' - y_1 y_2'}\,dt = \int \frac{\sin t \sec t}{-\sin^2 t - \cos^2 t}\,dt = \int -\sin t \sec t\,dt = -\int \frac{\sin t}{\cos t}\,dt = \ln(\cos t)$$

and

$$u_2 = \int \frac{y_1 f}{y_1 y_2' - y_2 y_1'}\,dt = \int \frac{\cos t \sec t}{\cos^2 t + \sin^2 t}\,dt = \int 1\,dt = t$$

Note that we have used the fundamental trigonometric identity $\sin^2 t + \cos^2 t = 1$ as well as other standard trigonometric relationships such as $\sec t = 1/\cos t$. Also, since we are seeking any two functions $u_1$ and $u_2$ that satisfy (4.4.24), it is not necessary to include the constants that can arise in integrating. Hence we have found that $u_1 = \ln(\cos t)$ and $u_2 = t$. This enables us to conclude that a particular solution to the equation (4.4.25) is

$$y_p = u_1 y_1 + u_2 y_2 = \ln(\cos t)\cos t + t\sin t$$

and, therefore, the general solution is

$$y = y_h + y_p = c_1\cos t + c_2\sin t + \ln(\cos t)\cos t + t\sin t$$
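When the integrals in (4.4.24) cannot be found in closed form, they can still be approximated numerically. The sketch below is an illustration under stated assumptions, not part of the theorem: the helper names `simpson` and `particular_solution` are hypothetical, the integrals are taken from a chosen lower limit `t0` (which only changes $y_p$ by a multiple of $y_1$ and $y_2$), and composite Simpson's rule stands in for exact integration. With `t0 = 0` the result should agree with the closed form of example 4.4.8, because $\ln(\cos 0) = 0$.

```python
import math

def simpson(g, a, b, n=200):
    """Composite Simpson's rule for the integral of g from a to b (n even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def particular_solution(y1, dy1, y2, dy2, f, t, t0=0.0):
    """Evaluate y_p(t) = u1*y1 + u2*y2 with u1, u2 integrated from t0 to t."""
    def w(s):
        # The denominator y1*y2' - y2*y1' that appears in (4.4.22).
        return y1(s) * dy2(s) - y2(s) * dy1(s)
    u1 = simpson(lambda s: -y2(s) * f(s) / w(s), t0, t)  # same as (4.4.21)
    u2 = simpson(lambda s: y1(s) * f(s) / w(s), t0, t)
    return u1 * y1(t) + u2 * y2(t)

def yp_example_448(t):
    """Numerical y_p for y'' + y = sec t, using y1 = cos t and y2 = sin t."""
    return particular_solution(
        math.cos, lambda s: -math.sin(s),
        math.sin, math.cos,
        lambda s: 1 / math.cos(s),
        t,
    )

if __name__ == "__main__":
    t = 0.7
    exact = math.log(math.cos(t)) * math.cos(t) + t * math.sin(t)
    print(yp_example_448(t), exact)
```

The same `particular_solution` helper could be pointed at other pairs $y_1$, $y_2$, provided the forcing function is continuous on the interval between `t0` and `t`.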


Example 4.4.9 Solve the equation

$$y'' + 4y' + 4y = e^{-2t}\ln t \tag{4.4.26}$$

Solution. To begin, we solve the associated homogeneous equation and get

$$y_h = c_1 e^{-2t} + c_2 te^{-2t}$$

Thus for variation of parameters, we assume that $y_p = u_1(t)e^{-2t} + u_2(t)te^{-2t}$ and we seek $u_1$ and $u_2$. Since $y_1 = e^{-2t}$ and $y_2 = te^{-2t}$, it follows that $y_1' = -2e^{-2t}$ and $y_2' = -2te^{-2t} + e^{-2t}$, and therefore by (4.4.24)

$$u_1 = \int \frac{y_2 f}{y_2 y_1' - y_1 y_2'}\,dt = \int \frac{te^{-2t}(e^{-2t}\ln t)}{te^{-2t}(-2e^{-2t}) - e^{-2t}(-2te^{-2t} + e^{-2t})}\,dt = \int \frac{te^{-4t}\ln t}{e^{-4t}(-2t + 2t - 1)}\,dt = -\int t\ln t\,dt = -\frac{1}{2}t^2\ln t + \frac{1}{4}t^2$$

and

$$u_2 = \int \frac{y_1 f}{y_1 y_2' - y_2 y_1'}\,dt = \int \frac{e^{-2t}(e^{-2t}\ln t)}{e^{-4t}(-2t + 1 + 2t)}\,dt = \int \ln t\,dt = t\ln t - t$$

From these expressions for $u_1$ and $u_2$, we can conclude that the overall form of the solution $y$ to (4.4.26) is

$$y = y_h + y_p = c_1 e^{-2t} + c_2 te^{-2t} + \left(-\frac{1}{2}t^2\ln t + \frac{1}{4}t^2\right)e^{-2t} + (t\ln t - t)te^{-2t} = c_1 e^{-2t} + c_2 te^{-2t} + \frac{1}{4}t^2 e^{-2t}(2\ln t - 3)$$

Exercises 4.4

In exercises 1–10, determine the complementary solution $y_h$ and state the general form of $y_p$ that you would guess in applying the method of undetermined coefficients.

1. y − y − 12y = 10e 5t 2. y + y − 2y = 4t 2 − 1 3. y − y = 11e t 4. y + 3y = 3 sin 2t 5. y = t 2 + 3


6. y + 4y + 3y = 2t + 4 cos t 7. y + 4y + 4y = t 2 8. y + 4y = 2 sin 2t 9. y + 4y = 20e t cos t 10. y + y − y = 3 In exercises 11–20, solve the stated IVP using the method of undetermined coefﬁcients. Note that the complementary solutions yh and appropriate guesses for yp were found in the corresponding exercises 1–10. 11. y − y − 12y = 10e 5t ,

y(0) = 2,

y (0) = −1

12. y + y − 2y = 4t 2 − 1,

y(0) = 1,

y (0) = 1

13. y − y = 11e t ,

14. y + 3y = 3 sin 2t , 15. y = t 2 + 3,

y (0) = 2

y(0) = −3, y(0) = −2,

y (0) = −2

16. y + 4y + 3y = 2t + 4 cos t , 17. y + 4y + 4y = t 2 , 18. y + 4y = 2 sin 2t ,

y(0) = 2,

y(0) = 5, y(0) = 1,

19. y + 4y = 20e t cos t , 20. y + y − y = 3,

y (0) = 0

y(0) = 0,

y(0) = 0,

y(0) = −1,

y (0) = 0

y (0) = 3 y (0) = −1 y (0) = −1 y (0) = −1

In exercises 21–27, ﬁnd the general solution of the given differential equation using variation of parameters. − π2 < t

> solve(r^4 - r^3 - 7*r^2 + r + 6 = 0, r);

Maple produces the output −1, 1, −2, 3

showing that these are the four roots of the characteristic equation. Of course, not all polynomial equations will have all integer solutions, much less all real solutions. For example, if we consider the equation $r^4 + r^3 + r^2 + r + 1 = 0$ and use the solve command, we see that

> solve(r^4 + r^3 + r^2 + r + 1 = 0, r);


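results in the output

$$-\frac{1}{4} + \frac{1}{4}\sqrt{5} + \frac{1}{4}I\sqrt{2}\sqrt{5 + \sqrt{5}}, \quad -\frac{1}{4}\sqrt{5} - \frac{1}{4} + \frac{1}{4}I\sqrt{2}\sqrt{5 - \sqrt{5}},$$
$$-\frac{1}{4}\sqrt{5} - \frac{1}{4} - \frac{1}{4}I\sqrt{2}\sqrt{5 - \sqrt{5}}, \quad -\frac{1}{4} + \frac{1}{4}\sqrt{5} - \frac{1}{4}I\sqrt{2}\sqrt{5 + \sqrt{5}}$$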

In this case, we might prefer a decimal approximation to the roots rather than the exactness that Maple provides. One way to achieve this is to use the fsolve command:

> fsolve(r^4 + r^3 + r^2 + r + 1 = 0, r, complex);

which generates the result −0.80902 − 0.58779I , −0.80902 + 0.58779I , 0.30902 − 0.95106I , 0.30902 + 0.95106I
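For readers without access to Maple, the same kind of approximation can be sketched in plain Python. The code below is not how fsolve works internally; it is a minimal illustration that swaps in the Durand–Kerner iteration, a simple method that refines guesses for all complex roots of a polynomial simultaneously. The function name `poly_roots`, the starting guesses, and the fixed iteration count are assumptions made for this sketch.

```python
def poly_roots(coeffs, iterations=200):
    """Approximate all complex roots of a polynomial by Durand-Kerner iteration.

    coeffs lists the coefficients from the highest power down to the constant.
    """
    lead = coeffs[0]
    c = [complex(a) / lead for a in coeffs]   # normalize to a monic polynomial
    n = len(c) - 1

    def p(z):
        # Horner evaluation of the monic polynomial at z.
        value = 0j
        for a in c:
            value = value * z + a
        return value

    # Customary starting guesses: powers of a fixed complex number.
    roots = [(0.4 + 0.9j) ** k for k in range(n)]
    for _ in range(iterations):
        for i in range(n):
            denom = 1 + 0j
            for j in range(n):
                if j != i:
                    denom *= roots[i] - roots[j]
            roots[i] -= p(roots[i]) / denom
    return roots

if __name__ == "__main__":
    for z in poly_roots([1, 1, 1, 1, 1]):      # r^4 + r^3 + r^2 + r + 1
        print(f"{z.real:.5f} {z.imag:+.5f}I")
    for z in poly_roots([1, -1, -7, 1, 6]):    # r^4 - r^3 - 7r^2 + r + 6
        print(f"{z.real:.5f} {z.imag:+.5f}I")
```

The order in which the roots are printed depends on the starting guesses, so it need not match the order of Maple's output.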

Note that without the option “complex” in the fsolve command, the command will not generate any output. This is because the default setting for fsolve is to numerically approximate all of the real roots of the polynomial equation and to ignore complex ones. For polynomial equations of degree 5 or more, the fsolve command is the appropriate tool to use to determine accurate approximations of the equation’s solutions. Exercises 4.6 In exercises 1–12, use the characteristic equation to determine the general solution to the given higher order linear homogeneous DE. 1. y − 2y − y + 2y = 0 2. y − 2y − 3y = 0 3. 4y − 13y − 6y = 0 4. y (4) − 13y + 36y = 0 5. y + 3y + 3y + y = 0 6. y (4) − y − 7y + y + 6y = 0 7. y − y + 4y − 4y = 0 8. y (4) − y = 0 9. y (5) − 2y (4) − y + 2y = 0 10. y (6) + 9y (4) + 24y + 16y = 0 11. y (4) + 4y + 6y + 4y + y = 0 12. y (4) + 3y + y − 5y = 0


In exercises 13–22, solve the given IVP. 13. y − 4y = 0,

y(0) = 1, y (0) = 0, y (0) = 2

14. y − 3y + 2y = 0,

y(0) = 0, y (0) = 2, y (0) = 0

15. y − 6y + 11y − 6y = 0,

y(0) = 0, y (0) = 2, y (0) = 0

16. y (4) − 2y − y + 2y = 0,

y(0) = 2, y (0) = 0, y (0) = 10, y (0) = 0

17. y + y + 4y + 4y = 0, 18. y (4) + 5y + 4y = 0, 19. y = 0,

y(0) = 0, y (0) = 10, y (0) = 0

y(0) = 4, y (0) = 0, y (0) = 10, y (0) = 0

y(0) = 2, y (0) = 0, y (0) = 2

20. y (4) − 16y = 0,

y(0) = 4, y (0) = 0, y (0) = 0, y (0) = 0

21. y − 3y + 3y − y = 0, 22. y (5) + y = 0,

y(0) = 1, y (0) = 2, y (0) = 1

y(0) = 1, y (0) = 0, y (0) = 2, y (0) = 0, y (4) (0) = 4

In exercises 23–28, construct a homogeneous linear differential equation of the least possible order that has the given function(s) as solutions. 23. y1 = c, y2 = e t 24. y1 = t 2 e 2t 25. y1 = t , y2 = cos 3t , y3 = e −t 26. y1 = te 4t sin t 27. y1 = e −t /2 cos t , y2 = sin 5t 28. y1 = sin t , y2 (t ) = t sin t 29. Find the general solution to y (4) + 2y + y = cos t . 30. Find a particular solution to y (4) + 2y + y = sin t + 2 cos t . How is your answer similar to the result in exercise 29? In exercises 31–42, use undetermined coefﬁcients to determine the general solution to the stated nonhomogeneous equation. Note that each of the corresponding homogeneous equations has been solved in exercises 1–12. 31. y − 2y − y + 2y = 2 32. y − 2y − 3y = 2e t 33. 4y − 13y − 6y = cos t 34. y (4) − 13y + 36y = t 35. y + 3y + 3y + y = sin t 36. y (4) − y − 7y + y + 6y = t 2 + 3


37. y − y + 4y − 4y = e −t 38. y (4) − y = 3t 39. y (5) − 2y (4) − y + 2y = 7 40. y (6) + 9y (4) + 24y + 16y = t 2 41. y (4) + 4y + 6y + 4y + y = t + cos t 42. y (4) + 3y + y − 5y = 2t − sin t + e t 4.7 For further study 4.7.1 Damped motion

Consider the general form of the spring-mass equation (4.7.1) my + cy + ky = 0 where c = 0 so that viscous damping is present. In what follows, we explore how the values of the constants m, c, and k affect the behavior of the solution y. Note that in this context, m, c, and k are always positive. (a) Show that the roots of the characteristic polynomial of (4.7.1) are √ −c ± c 2 − 4mk λ= 2m (b) We examine the three possible cases for the roots of the characteristic polynomial: √ (i) Suppose that c 2 − 4km > 0. Explain why c 2 − 4mk < c and thus why both roots of the characteristic equation must be negative. State the general solution to the equation (4.7.1) in terms of the constants c, m, and k. (ii) Suppose that c 2 − 4km = 0. Discuss the number of real roots of the characteristic polynomial and state the general solution to the equation (4.7.1) in terms of the constants c and m. (iii) Suppose that c 2 − 4km < 0. Explain why both roots√of the characteristic polynomial are complex. Using = 4mk − c 2 /(2m), state the general solution to the equation (4.7.1) in terms of the constants c, m, and . (c) The respective cases (i), (ii), and (iii) in (b) are typically called overdamping, critical damping, and underdamping. How is the case of underdamping signiﬁcantly different from overdamping and critical damping? Explain both in terms of the algebraic form of the solution as well as in terms of the solution’s expected graph. (d) A 4-kg mass is suspended from a spring with constant k = 25, and a dashpot with various levels of damping viscosity is present. The mass is


displaced 0.5 m from its equilibrium and released. Determine the displacement y(t ) of the mass if (i) c = 15, (ii) c = 20, (iii) c = 25, and (iv) c = 30 In each case, state whether the system is overdamped, critically damped, or underdamped, and sketch the solution curve. (e) The case of underdamping is the most interesting of the three cases, for it is here that multiple oscillations through equilibrium occur. In (b)(iii), you should have shown that the general solution may be expressed in the form c

y = e − 2m t (c1 cos t + c2 sin t ) Show that y may be alternatively expressed in the form c

y = Ae − 2m t cos(t − θ )

+

(4.7.2)

where A = c12 + c22 and tan θ = c1 /c2 . (Hint: Set A cos(t − θ ) = c1 cos t + c2 sin t and equate like coefﬁcients after using the trigonometric identity cos(α − β ) = cos α cos β + sin α sin β .) (f) In the underdamped case, we are interested in how fast the amplitude of the oscillations decays to zero. In what follows, we show how the ratio of consecutive local maxima (or minima) of y(t ) depends only on the constants c, m, and . c

(i) Using y = Ae − 2m t cos(t − θ ) from (e), determine y and show that y = 0 if and only if c tan(t − θ ) = − (4.7.3) 2m (ii) If the solutions of (4.7.3) are denoted by tn , then show that " 1 c # nπ θ + (4.7.4) tn = + arctan − 2m Explain why we expect y(tn ) and y(tn+1 ) to be a local maximum and minimum (or local minimum and maximum), respectively, of y(t ), and hence why y(tn ) and y(tn+2 ) will be consecutive maxima or consecutive minima. (iii) Let yn = y(tn ) and yn+2 = y(tn+2 ). Using (4.7.2), evaluate y(tn ) and y(tn+2 ) and verify that yn cos(tn − θ ) − c (tn −tn+2 ) = (4.7.5) e 2m yn+2 cos(tn+2 − θ ) (iv) Show that (4.7.3) implies (tn − tn+2 ) = −2π and thus cos(tn − θ ) π c /m yn = (4.7.6) e yn+2 cos(tn+2 − θ )


(v) Show that tn − θ = tn+2 − θ − 2π

so that cos(tn − θ ) = cos(tn+2 − θ ) Use this last result to prove that yn = e π c /m yn+2

(4.7.7)

(g) The logarithm of (4.7.7), D = ln

πc yn = ln e π c /m = yn+2 m

(4.7.8)

is called the logarithmic decrement. Note that this quantity is independent of t as well as the initial conditions present in the underdamped case for the DE (4.7.1), and that the value of the logarithmic decrement tells us how rapidly consecutive oscillations diminish in the underdamped case. For each of the following underdamped spring-mass systems, determine the solution function y(t ) and compute the logarithmic decrement. Explain how the value of the logarithmic decrement tells you whether oscillations will die out slowly or rapidly. Using a computer algebra system to execute the routine calculations is particularly appropriate here. In each case, assume the mass is displaced 1 m and released. (i) m = 4, c = 19, k = 25 (ii) m = 4, c = 10, k = 25 (iii) m = 4, c = 1, k = 25 (iv) m = 4, c = 0.1, k = 25 4.7.2 Forced oscillations with damping

Consider the general form of the forced spring-mass equation

$$my'' + cy' + ky = f(t) \tag{4.7.9}$$

where c > 0 so that viscous damping is present. Again, we remark that in this context m and k are always positive. (a) Show that if

√ c 2 − 4km = 2m

then the complementary solution of (4.7.9) is c yh (t ) = e − 2m t c1 e t + c2 e −t (b) Explain why lim yh (t ) = 0

t →∞

(4.7.10)


Recall that we call $y_h(t)$ the transient solution. What does this tell us about the role played by the particular solution $y_p(t)$ in the general solution $y = y_h + y_p$ as $t \to \infty$?

(c) We now consider the effects of the periodic forcing function $f(t) = F_0 \cos \omega t$. With this function, we have seen that resonance is only possible when no damping is present; here, we wish to explore the impact of the parameters in $f(t)$ on the steady-state solution $y_p$ to (4.7.9).

(i) Use the method of undetermined coefficients to show that with $f(t) = F_0 \cos \omega t$, the particular solution $y_p$ to (4.7.9) is

$$y_p = \frac{F_0 (k - m\omega^2)}{(k - m\omega^2)^2 + \omega^2 c^2} \left[ \cos \omega t + \frac{c\omega}{k - m\omega^2} \sin \omega t \right] \tag{4.7.11}$$

(ii) As in our study of undamped spring-mass systems and resonance, we let $\omega_0 = \sqrt{k/m}$. Show that $y_p(t)$ may be equivalently expressed in the form

$$y_p = \frac{F_0}{\sqrt{m^2(\omega_0^2 - \omega^2)^2 + \omega^2 c^2}} \cos(\omega t - \theta) \tag{4.7.12}$$

Compare the result to (4.7.2).

(iii) Observe that the amplitude of the oscillation of $y_p$ in (4.7.12), viewed as a function of $\omega$, is

$$\frac{F_0}{\sqrt{m^2(\omega_0^2 - \omega^2)^2 + \omega^2 c^2}} \tag{4.7.13}$$

and that $\omega_0$, $m$, and $c$ are fixed constants determined by the given spring-mass system. We now examine how the size of these oscillations depends on $\omega$. First, compute the derivative of the amplitude (4.7.13) with respect to $\omega$. Then, set this derivative equal to zero to show that the maximum amplitude occurs when

$$\omega^2 = \omega_0^2 - \frac{c^2}{2m^2} \tag{4.7.14}$$

(iv) Explain why if $c$ satisfies $c^2 > 2m^2\omega_0^2$, then there is no value of $\omega$ that produces a maximum amplitude of oscillation. In addition, note that when a maximum amplitude exists (i.e., provided $c^2 < 2m^2\omega_0^2$), its value is given by the amplitude (4.7.13) evaluated where $\omega$ satisfies (4.7.14). Use this condition to show that the maximum amplitude is

$$\frac{2mF_0}{c\sqrt{4m^2\omega_0^2 - c^2}} \tag{4.7.15}$$


(v) Consider a particular spring-mass system for which $m = 1$ and $k = 4$ where we consider various damping constants $c$. In addition, assume we apply the forcing function $f(t) = \cos \omega t$, so that $F_0 = 1$. Recall that $\omega_0 = \sqrt{k/m}$, so $\omega_0 = 2$. For each of the $c$-values $c = 0.1, 1, 2, 3, 4, 5, 6$, plot the amplitude

$$\frac{F_0}{\sqrt{m^2(\omega_0^2 - \omega^2)^2 + \omega^2 c^2}}$$

as a function of $\omega$ on the interval $\omega = 0 \ldots 10$. When a maximum oscillation exists, where does it occur? How is the size of the maximum oscillation correlated with $c$ and $\omega$? What should we ensure about the relationship between $\omega$ and $\omega_0$ if we want to avoid large amplitude oscillations?

(d) Complete the following exercises which examine the magnitude of oscillations in damped, driven spring-mass systems.

(i) A forcing function $f(t) = 10 \sin 2t$ is imposed on a spring-mass system for which $m = 2$ kg and $k = 8$ N/m. Determine the damping constant necessary to limit the amplitude of the motion to a maximum of 2 m.
(ii) A forcing function $f(t) = 50 \cos \omega t$ is imposed on a spring-mass system for which $m = 4$ kg, $k = 100$ N/m, and $c = 2$ kg/s. Calculate the amplitude of the resulting motion for $\omega = 4$, $\omega = 4.5$, $\omega = 5$, and $\omega = 6$.
(iii) Determine the input frequency $\omega$ that gives the maximum amplitude for the spring-mass system in (ii) above. For this frequency, what is the maximum amplitude?

4.7.3 The Cauchy–Euler equation

The vast majority of our efforts with higher order DEs have involved linear equations with constant coefficients. The Cauchy–Euler equation is an important example of a linear, second-order DE whose coefficients are not constant. In particular, the Cauchy–Euler equation is a differential equation of the form

$$t^2 y'' + pty' + qy = 0 \tag{4.7.16}$$

where $p$ and $q$ are real constants and $t > 0$.

(a) Explain why it is reasonable to guess that $y(t) = t^\lambda$ is a solution to (4.7.16). Show by direct substitution in (4.7.16) that the guess $y(t) = t^\lambda$ requires $\lambda$ to be a solution to the characteristic equation

$$\lambda^2 + (p - 1)\lambda + q = 0 \tag{4.7.17}$$

(b) In the case where (4.7.17) has two distinct real roots $\lambda_1$ and $\lambda_2$, the general solution to the Cauchy–Euler equation is

$$y = c_1 t^{\lambda_1} + c_2 t^{\lambda_2}$$


Solve each of the following Cauchy–Euler initial-value problems:

(i) $t^2 y'' - 5ty' + 8y = 0$, $y(1) = 1$, $y'(1) = 0$
(ii) $t^2 y'' + 9ty' + 12y = 0$, $y(1) = 1$, $y'(1) = 0$

(c) When (4.7.17) has a repeated real root $\lambda_1 = \lambda_2 = \lambda$, then we have only determined one linearly independent solution ($y_1 = t^\lambda$) of the Cauchy–Euler equation. Here we determine a second linearly independent solution.

(i) Assuming that $\lambda$ is a repeated root of (4.7.17), show that $1 - p = 2\lambda$.
(ii) Letting $v(t)$ be an unknown function, consider the guess $y_2 = v \cdot t^\lambda$. By direct substitution in the Cauchy–Euler equation, show that $v$ must satisfy the equation

$$t^\lambda \left[ t^2 v'' + (2\lambda + p)tv' + (\lambda^2 + (p - 1)\lambda + q)v \right] = 0 \tag{4.7.18}$$

(iii) Use your work in (i) and (ii), as well as the fact that $\lambda$ satisfies the equation $\lambda^2 + (p - 1)\lambda + q = 0$, to show that $y_2 = v \cdot t^\lambda$ is a solution of the Cauchy–Euler equation in the case of a repeated root provided that

$$tv'' + v' = 0 \tag{4.7.19}$$

(iv) Show that $v(t) = \ln t$ is a solution of (4.7.19) and hence state the general solution of the Cauchy–Euler equation in the case where the characteristic equation has a single real repeated root.

(d) Solve each of the following Cauchy–Euler initial-value problems:

(i) $t^2 y'' + 7ty' + 9y = 0$, $y(1) = 1$, $y'(1) = 0$
(ii) $t^2 y'' - 9ty' + 25y = 0$, $y(1) = 1$, $y'(1) = 0$

(e) When (4.7.17) has complex roots, say $\lambda_1 = a + bi$ and $\lambda_2 = a - bi$, then we proceed with a corresponding complex solution to the Cauchy–Euler equation and verify that its real and imaginary parts are themselves real, linearly independent solutions to the equation. In particular, with $\lambda = a + bi$, observe that

$$z(t) = t^\lambda = t^{a + bi} = t^a t^{bi}$$

By writing

$$t^{bi} = e^{\ln(t^{bi})} = e^{bi \ln t}$$

and applying Euler's formula, show that

$$z(t) = t^a \left[ \cos(b \ln t) + i \sin(b \ln t) \right] \tag{4.7.20}$$

In addition, show by direct substitution that $y_1(t) = t^a \cos(b \ln t)$ is a solution to the Cauchy–Euler equation when $a + bi$ is a root of the


characteristic polynomial. Likewise, show that $y_2(t) = t^a \sin(b \ln t)$ is a solution. Hence, state the general solution to the Cauchy–Euler equation in the case where the characteristic polynomial has complex roots $\lambda = a \pm bi$.

(f) Solve each of the following Cauchy–Euler initial-value problems:

(i) $t^2 y'' + 3ty' + 5y = 0$, $y(1) = 1$, $y'(1) = 0$
(ii) $t^2 y'' - 3ty' + 13y = 0$, $y(1) = 1$, $y'(1) = 0$

4.7.4 Companion systems and companion matrices

Given a second-order linear differential equation with constant coefficients such as

$$y'' + by' + cy = 0 \tag{4.7.21}$$

we know that through the substitution $x_1 = y$, $x_2 = y'$ we can convert (4.7.21) to the system of first-order equations given by

$$\begin{aligned} x_1' &= x_2 \\ x_2' &= -cx_1 - bx_2 \end{aligned} \tag{4.7.22}$$

The system (4.7.22) is called the companion system of (4.7.21). In what follows, we explore the connections between the original equation and its companion system.

(a) Consider the homogeneous linear second-order DE

$$y'' + 3y' + 2y = 0 \tag{4.7.23}$$

Using the guess $y = e^{rt}$, find the characteristic equation of (4.7.23) and the values of $r$ that make $y = e^{rt}$ a solution of the given DE.

(b) Convert the DE (4.7.23) into a system of first-order equations in the form $\mathbf{x}' = A\mathbf{x}$. In addition, determine the eigenvalues of the matrix $A$.

(c) What do you observe about the roots of the characteristic equation in (a) and the eigenvalues of the matrix in (b)? Why is this result not surprising?

(d) Find the general solution of the second-order equation (4.7.23) using standard methods from chapter 4. Find the general solution of the first-order system you found in (b) using standard methods from chapter 3. Explain how your two results agree.

(e) Now consider the general equation (4.7.21) where $b$ and $c$ are arbitrary constants and its corresponding companion system.

(i) Show that the roots of the characteristic equation are

$$r = \frac{-b \pm \sqrt{b^2 - 4c}}{2}$$


and that the eigenvalues of the coefficient matrix of the companion system are

$$\lambda = \frac{-b \pm \sqrt{b^2 - 4c}}{2}$$

(ii) Assuming that $b^2 - 4c > 0$ so that the values of $r$ in (i) are real and distinct, state the general solution of (4.7.21).

(iii) Show that the eigenvectors of the matrix of the companion system that correspond to $\lambda_1$ and $\lambda_2$ are given by

$$\mathbf{v}_1 = \begin{bmatrix} 1 \\ \lambda_1 \end{bmatrix} \quad \text{and} \quad \mathbf{v}_2 = \begin{bmatrix} 1 \\ \lambda_2 \end{bmatrix}$$

where $\lambda_1 = (-b + \sqrt{b^2 - 4c})/2$ and $\lambda_2 = (-b - \sqrt{b^2 - 4c})/2$. State the general solution to the companion system.

(iv) Compare your result from (ii) to the result for $x_1$ in (iii), recalling that $x_1 = y$. Do your solutions agree?

(f) Our work above shows that for any second-order differential equation, there exists a companion system of two first-order equations whose vector solution contains the solution of the second-order equation. For the third-order equation

$$y''' + 2y'' - y' - 2y = 0$$

find the solution of the equation directly by using standard methods from chapter 4. Then, find the general solution of the first-order companion system constructed from the substitution $x_1 = y$, $x_2 = y'$, $x_3 = y''$ using standard methods from chapter 3. Compare your results.

(g) In both the direct solution of higher order linear differential equations and in the solution of systems of linear first-order equations, the solution methods require us to find roots of polynomials. Our work above enables us to see the fact that any polynomial has an associated matrix, a so-called companion matrix, whose eigenvalues are the same as the zeros of the polynomial. In general, given a polynomial function

$$p(t) = t^n + a_{n-1}t^{n-1} + a_{n-2}t^{n-2} + \cdots + a_1 t + a_0$$

the companion matrix of $p(t)$ is given by

$$C = \begin{bmatrix} 0 & 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & \cdots & 0 & 0 & 1 \\ -a_0 & -a_1 & -a_2 & \cdots & -a_{n-2} & -a_{n-1} \end{bmatrix} \tag{4.7.24}$$

That is, C is an n × n matrix whose ﬁrst n − 1 rows are all zero except for the entry just above the diagonal, whose value is 1. The ﬁnal row consists


of the opposites of the coefficients of the constant, linear, etc., terms of the polynomial $p$. It can be proved that, in general, the eigenvalues of $C$ are the same as the zeros of $p(t)$. We verify this fact through a few examples.

(i) For the polynomial $p(t) = t^2 + 3t + 2$, determine the companion matrix $C$. Compute the eigenvalues of $C$ directly and compare the result to the zeros of $p(t)$.
(ii) For the polynomial $p(t) = t^3 + 3t^2 + 3t + 1$, determine the companion matrix $C$. Compute the eigenvalues of $C$ directly and compare the result to the zeros of $p(t)$.
(iii) For the polynomial $p(t) = t^4 - 1$, determine the companion matrix $C$. Compute the eigenvalues of $C$ directly and compare the result to the zeros of $p(t)$.

(h) For the $n$th-order linear homogeneous equation

$$y^{(n)} + a_{n-1}y^{(n-1)} + \cdots + a_1 y' + a_0 y = 0 \tag{4.7.25}$$

show that the coefficient matrix of the corresponding companion system is in fact the companion matrix of the characteristic polynomial of (4.7.25).
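The claim in (g), that the zeros of $p(t)$ are eigenvalues of its companion matrix, can be spot-checked by machine. The following is a minimal sketch, not part of the text's development: the helper names `companion`, `det`, and `char_poly_at` are hypothetical, and the check simply evaluates $\det(C - rI)$ at known zeros $r$ of $p$ using cofactor expansion, which is adequate only for small matrices.

```python
def companion(coeffs):
    """Companion matrix of t^n + a_{n-1} t^{n-1} + ... + a_1 t + a_0.

    coeffs = [a_0, a_1, ..., a_{n-1}]; the leading coefficient 1 is implied.
    """
    n = len(coeffs)
    C = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        C[i][i + 1] = 1                  # 1 just above the diagonal
    C[n - 1] = [-a for a in coeffs]      # last row: -a_0, ..., -a_{n-1}
    return C


def det(M):
    """Determinant by cofactor expansion along the first row (small n only)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def char_poly_at(C, r):
    """Evaluate det(C - r I) at the (possibly complex) number r."""
    n = len(C)
    shifted = [[C[i][j] - (r if i == j else 0) for j in range(n)]
               for i in range(n)]
    return det(shifted)


C2 = companion([2, 3])           # p(t) = t^2 + 3t + 2, zeros -1 and -2
C3 = companion([1, 3, 3])        # p(t) = t^3 + 3t^2 + 3t + 1, zero -1
C4 = companion([-1, 0, 0, 0])    # p(t) = t^4 - 1, zeros 1, -1, i, -i

for name, C, zeros in [("C2", C2, (-1, -2)),
                       ("C3", C3, (-1,)),
                       ("C4", C4, (1, -1, 1j, -1j))]:
    print(name, [char_poly_at(C, r) for r in zeros])
```

Each printed value is $\det(C - rI)$ at a zero $r$ of the corresponding polynomial, so a value of zero indicates that $r$ is an eigenvalue of $C$. This checks particular examples only; it is not a substitute for the general argument requested in (h).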


5 Laplace transforms

5.1 Motivating problems

In this chapter, we again consider solving nonhomogeneous linear differential equations such as

$$y'' + a_1 y' + a_2 y = f(t)$$

but in contexts where the forcing function is different from those we have previously encountered. While we have developed the methods of undetermined coefficients and variation of parameters to approach this problem, there are several reasons to consider a different means of solution. Perhaps most prominent is that in every example to date, we have assumed that the function $f(t)$ is continuous. Indeed, it has also typically been the case that $f(t)$ is a standard function, one belonging to the library of basic functions like $\sin 2t$ and $\ln t$ that we encounter in calculus. In many applications, however, it is possible for $f(t)$ to be piecewise defined, discontinuous, or worse. We consider two examples that demonstrate these possibilities.

Electrical circuits with a voltage source provide a common situation where the forcing function $f(t)$ is not continuous. If we flip a switch to turn the voltage on, then the forcing function is actually a step function that leaps from zero to a constant value. Recall that the charge $Q(t)$ in an RLC circuit is modeled by the second-order equation

$$LQ'' + RQ' + \frac{1}{C}Q = E(t) \tag{5.1.1}$$

where $E(t)$ is an external voltage source. Suppose that we are given an RLC circuit with an initial charge $Q(0)$ and initial current $Q'(0)$, and that the voltage $E(t) = 1000$ is turned on at $t = 4$. The voltage function $E(t)$ is, therefore, defined piecewise by the formula

$$E(t) = \begin{cases} 0, & \text{if } 0 \le t < 4 \\ 1000, & \text{if } t \ge 4 \end{cases}$$

Let us further assume that $L = 20$ H, $R = 40\ \Omega$, $C = 10^{-2}$ F, and that $Q(0) = 25$ and $Q'(0) = 0$. From the given information and (5.1.1), we know that $Q(t)$ is modeled by the initial-value problem

$$20Q'' + 40Q' + 100Q = E(t), \quad Q(0) = 25, \quad Q'(0) = 0 \tag{5.1.2}$$

We have not yet encountered means to deal with a step function as the forcing function in an initial-value problem. In section 5.4, we will discuss step functions in detail, learning how they may be used to turn other functions on and off; in addition, we will show how the Laplace transform provides an ideal tool for dealing with piecewise-defined functions in initial-value problems. With these tools, we will be able to determine the solution $Q(t)$ for (5.1.2) whose graph is shown in figure 5.1. Observe that we see the expected damped oscillation in $Q(t)$ up until time $t = 4$ when the forcing function $E(t)$ is turned on, at which point we see the solution driven vertically away from zero so that as $t$ increases, $Q(t) \to 10$. That $Q(t)$ approaches 10 should not surprise us since $Q(t) = 10$ is a constant solution to the equation

$$20Q'' + 40Q' + 100Q = 1000$$

In fact, $Q(t) = 10$ is a stable equilibrium solution of the equation.

In addition to functions that get turned on or off at a certain time, another important forcing function to consider is a so-called impulse function. These functions are ones where a force is imparted over an extremely short time interval such as a hammer striking a mass. In section 5.4, we introduce the Dirac delta function, $\delta(t)$, study its properties, and see how it may be used in settings such as the following.

Figure 5.1 The solution $Q(t)$ to (5.1.2).
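The behavior described for (5.1.2), a damped oscillation until $t = 4$ followed by an approach to $Q = 10$, can be previewed numerically before any transform theory is available. The sketch below is only an approximation under stated assumptions: it rewrites (5.1.2) as a first-order system with $I = Q'$ and applies the classical fourth-order Runge–Kutta method with a fixed step size; the function names `E`, `rhs`, and `simulate` and the step size are choices made here for illustration, not part of the text.

```python
import math


def E(t):
    """Step voltage from the text: off until t = 4, then 1000."""
    return 0.0 if t < 4 else 1000.0


def rhs(t, Q, I):
    """First-order form of 20 Q'' + 40 Q' + 100 Q = E(t), with I = Q'."""
    return I, (E(t) - 40.0 * I - 100.0 * Q) / 20.0


def simulate(t_end, h=0.001, Q0=25.0, I0=0.0):
    """Approximate Q(t_end) by fixed-step classical Runge-Kutta (RK4)."""
    Q, I = Q0, I0
    for n in range(round(t_end / h)):
        t = n * h
        k1Q, k1I = rhs(t, Q, I)
        k2Q, k2I = rhs(t + h / 2, Q + h / 2 * k1Q, I + h / 2 * k1I)
        k3Q, k3I = rhs(t + h / 2, Q + h / 2 * k2Q, I + h / 2 * k2I)
        k4Q, k4I = rhs(t + h, Q + h * k3Q, I + h * k3I)
        Q += h / 6 * (k1Q + 2 * k2Q + 2 * k3Q + k4Q)
        I += h / 6 * (k1I + 2 * k2I + 2 * k3I + k4I)
    return Q


Q_before = simulate(4.0)    # charge at the moment the voltage is switched on
Q_late = simulate(30.0)     # charge long after the switch

# Before t = 4 the equation is homogeneous, with solution
# Q(t) = e^{-t} (25 cos 2t + 12.5 sin 2t) for the given initial conditions.
Q_exact_at_4 = math.exp(-4) * (25 * math.cos(8) + 12.5 * math.sin(8))
print(Q_before, Q_exact_at_4, Q_late)
```

The comparison value `Q_exact_at_4` uses the homogeneous solution found by the methods of chapter 4, so it is valid only up to $t = 4$. The late-time value illustrates the approach to the equilibrium $Q = 10$ noted above.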


Figure 5.2 The solution curve $y(t)$ to (5.1.3).

Suppose that a mass of 1 kg is attached to a spring with constant $k = 4$ and the system's damping constant is $c = 2$. In addition, assume that the mass is initially displaced 0.5 m from equilibrium and released. At time $t = 4$, the mass is struck with a hammer imparting a unit impulse in the positive direction. The combination of all of these conditions leads to the initial-value problem

$$y'' + 2y' + 4y = \delta(t - 4), \quad y(0) = 0.5, \quad y'(0) = 0 \tag{5.1.3}$$

where the function δ (t − 4) represents the hammer imparting the unit force of impulse. Just as with piecewise-deﬁned functions, we will learn that the Laplace transform provides an ideal tool for dealing with impulses. Once we develop the appropriate theory, we will be able to solve initial-value problems such as (5.1.3) and see that the solution behaves as shown in ﬁgure 5.2. In the solution, we see the noticeable impact of the impulse as the problem appears to restart, almost as if new initial conditions have been given at time t = 4. In addition to being able to address discontinuous and impulse forcing functions, the Laplace transform is a powerful tool because it handles all allowable forcing functions in the same manner. Moreover, in each case it proceeds directly to the solution of initial-value problems without ﬁrst ﬁnding the general solution to the differential equation. These ideas and more will be studied in subsequent sections.

5.2 Laplace transforms: getting started

The motivating idea behind the Laplace transform is natural: to solve a differential equation, our desire is to integrate. For the simplest examples, such as $y' = y$, we know that we can separate variables and integrate in order to determine $y$. However, if we approach the problem

$$y' + a_0 y = f(t) \tag{5.2.1}$$


by attempting to integrate both sides from 0 to $s$ with respect to $t$ in order to eliminate $y'$, doing so leads to the equation

$$\int_0^s y'(t)\,dt + a_0 \int_0^s y(t)\,dt = \int_0^s f(t)\,dt \tag{5.2.2}$$

While $\int_0^s y'(t)\,dt = y(s) - y(0)$ eliminates the derivative $y'$ from the equation, and $\int_0^s f(t)\,dt$ can usually be computed for a given $f$, in (5.2.2) we are left with the expression $\int_0^s y(t)\,dt$, where $y$ is an unknown function. Essentially this step of integrating has replaced the derivative of the unknown function $y$ with its integral in the equation we are endeavoring to solve. This leaves us no closer to finding the solution function $y(t)$.

Rather than simply trying to integrate, the Laplace transform uses a modified approach in which every function in (5.2.1) is multiplied by another function before integrating; this approach will enable us to convert differential equations in $y(t)$ and $y'(t)$ to algebraic equations in a new unknown function $Y(s)$ that we can solve for $Y(s)$. This method is similar to the use of integrating factors when solving linear first-order equations.

Before we formally define the Laplace transform, we discuss a few preliminary ideas, some of which are familiar concepts from calculus. First, we assume throughout this chapter that all forcing functions are piecewise continuous functions defined for $t > 0$ and that

$$f(0) = f(0^+) = \lim_{t \to 0^+} f(t) \tag{5.2.3}$$

That is, $f$ cannot be discontinuous at the origin itself, though it is allowed to have finitely many discontinuities for $t > 0$. Furthermore, we assume that the forcing function does not grow more rapidly than an exponential function. Formally, we will assume that $f(t)$ is of exponential order, which means that for sufficiently large $t$,

$$|f(t)| \le Me^{bt} \tag{5.2.4}$$

for positive constants $M$ and $b$. Functions that are piecewise continuous and meet conditions (5.2.3) and (5.2.4) are called acceptable. For example, polynomial functions, $\sin kt$, $e^{kt}$, and sums and products of these functions are acceptable, as are piecewise-defined functions with finitely many discontinuities whose pieces consist of these basic functions. In particular, linear combinations of acceptable functions are acceptable. Functions such as

$$e^{t^2}, \quad t^{-1/2}, \quad (t - 1)^{-1}$$

are not acceptable. The first grows too rapidly to be of exponential order, the second fails to meet the condition (5.2.3) that a limit exists from the right at the origin, and the third is not piecewise continuous on any interval containing $t = 1$.
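To see concretely why $e^{t^2}$ fails condition (5.2.4), note that $e^{t^2} > Me^{bt}$ exactly when $t^2 - bt - \ln M > 0$, a quadratic inequality in $t$. The sketch below computes the value of $t$ beyond which the bound is violated; the function names `crossover` and `log_ratio`, the sample pairs in `cases`, and the restriction to $M \ge 1$ (so that $\ln M \ge 0$) are assumptions made here for illustration.

```python
import math


def crossover(M, b):
    """Value t* with e^{t^2} > M e^{bt} for every t > t*, assuming M >= 1, b > 0.

    t* is the larger root of t^2 - b t - ln M = 0.
    """
    return (b + math.sqrt(b * b + 4.0 * math.log(M))) / 2.0


def log_ratio(t, M, b):
    """ln(e^{t^2} / (M e^{bt})), computed with logarithms to avoid overflow."""
    return t * t - b * t - math.log(M)


# Hypothetical choices of the constants M and b in (5.2.4).
cases = [(1.0, 1.0), (1e6, 10.0), (1e100, 50.0)]
for M, b in cases:
    t_star = crossover(M, b)
    print(M, b, t_star, log_ratio(t_star + 1.0, M, b))
```

However large $M$ and $b$ are chosen, a finite crossover value exists, so no single pair of constants can bound $e^{t^2}$ for all sufficiently large $t$.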


In addition, from calculus we recall the following important concepts:

• If $y' = f(t)$ and $y(0) = 0$, then $y = \int_0^t f(s)\,ds$.
• The improper integral $\int_0^\infty f(t)\,dt$ is said to converge whenever
$$\lim_{r \to \infty} \int_0^r f(t)\,dt$$
exists. If this limit fails to exist, we say the improper integral diverges.
• Given a function of two variables $K(s, t)$, if we integrate this function with respect to $t$ from $t = a$ to $t = b$, the result is a function of $s$. That is,
$$\int_a^b K(s, t)\,dt$$
is a function of $s$.

Recall our earlier note regarding the overall approach with Laplace transforms: in order to solve an initial-value problem, we integrate both sides of the differential equation after both sides have been multiplied by a more complicated function. The main idea is that we use the transformation given by

$$\int_0^\infty K(s, t) f(t)\,dt$$

Knowing the prominent role that the exponential function has played throughout our work with differential equations to date, it is not surprising that we choose to use $K(s, t) = e^{-st}$. Specifically, we make the following definition.

Definition 5.2.1 Let $f(t)$ be an acceptable function defined on the interval $[0, \infty)$. The Laplace transform of $f(t)$, denoted $\mathcal{L}[f]$, is the function defined by

$$\mathcal{L}[f] = \int_0^\infty e^{-st} f(t)\,dt \tag{5.2.5}$$

We note that because $\mathcal{L}[f]$ is a function of $s$, we often write $F(s)$ rather than the more explicit $\mathcal{L}[f(t)]$. We consider an example to see the Laplace transform at work.

Example 5.2.1 Compute the Laplace transform of $f(t) = t$.

Solution. By definition,

$$\mathcal{L}[t] = \int_0^\infty te^{-st}\,dt \tag{5.2.6}$$


Replacing the improper integral with a limit and integrating by parts, we observe that

$$\begin{aligned} \mathcal{L}[t] &= \lim_{r \to \infty} \int_0^r te^{-st}\,dt \\ &= \lim_{r \to \infty} \left[ -\frac{1}{s}\left(t + \frac{1}{s}\right)e^{-st} \right]_0^r \\ &= \lim_{r \to \infty} \left[ -\frac{1}{s}\left(r + \frac{1}{s}\right)e^{-sr} + \frac{1}{s}\left(0 + \frac{1}{s}\right)e^{0} \right] \\ &= \lim_{r \to \infty} \left[ -\frac{r}{s}e^{-sr} - \frac{1}{s^2}e^{-sr} + \frac{1}{s^2} \right] \end{aligned} \tag{5.2.7}$$

By L'Hôpital's Rule,¹ we know that $re^{-sr} \to 0$ as $r \to \infty$ for each $s > 0$. Combined with the fact that $e^{-sr} \to 0$ as $r \to \infty$, it follows from (5.2.7) that

$$\mathcal{L}[t] = F(s) = \frac{1}{s^2} \tag{5.2.8}$$

Soon we will apply the Laplace transform in order to solve initial-value problems. This process will require us to also use the inverse Laplace transform which asks, “given a function $F(s)$, what function $f(t)$ is such that $\mathcal{L}[f(t)] = F(s)$?” For instance, (5.2.8) tells us we may write

$$\mathcal{L}^{-1}\left[\frac{1}{s^2}\right] = t \tag{5.2.9}$$

Much more on inverse transforms will follow as we progress in our study.

It is not obvious that the Laplace transform of every acceptable function exists. While we omit the proof, it is possible to prove the following theorem by showing that not only does $f(t)$ being acceptable guarantee that $\mathcal{L}[f(t)] = F(s)$ exists, but that $F(s)$ is a function that must tend to 0 as $s \to \infty$.

Theorem 5.2.1 If $f(t)$ is acceptable, then the Laplace transform $F(s)$ of $f(t)$ exists. Moreover,
1. $sF(s)$ is bounded as $s \to \infty$, from which it follows that
2. $\lim_{s \to \infty} F(s) = 0$.

Although it is not necessary for a function to be acceptable in order to have a Laplace transform, our focus will be almost exclusively on acceptable functions. In addition, we note that not all elementary functions can be generated by taking the Laplace transform of an acceptable function. For instance, $F(s) = 1$ cannot be the Laplace transform of an acceptable function since both parts of theorem 5.2.1 are contradicted.

¹ $\displaystyle \lim_{r \to \infty} \frac{r}{e^{sr}} = \lim_{r \to \infty} \frac{1}{se^{sr}} = 0$.


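The next three examples further illustrate the definition and notational conventions we use with Laplace transforms.

Example 5.2.2 Compute the Laplace transform of $f(t) = 1$.

Solution. From the definition, we observe that

$$\mathcal{L}[1] = \int_0^\infty e^{-st}\,dt = \lim_{r \to \infty} \left[ -\frac{1}{s}e^{-st} \right]_0^r = \lim_{r \to \infty} \left[ -\frac{1}{s}e^{-sr} + \frac{1}{s} \right] = \frac{1}{s}$$

since $e^{-sr} \to 0$ as $r \to \infty$.

Example 5.2.3 Find the Laplace transform of $f(t) = e^{at}$.

Solution. We compute

$$\begin{aligned} \mathcal{L}[e^{at}] &= \int_0^\infty e^{at}e^{-st}\,dt = \int_0^\infty e^{(a-s)t}\,dt = \lim_{r \to \infty} \int_0^r e^{(a-s)t}\,dt \\ &= \lim_{r \to \infty} \left[ \frac{1}{a-s}e^{(a-s)t} \right]_0^r = \lim_{r \to \infty} \left[ \frac{1}{a-s}e^{(a-s)r} - \frac{1}{a-s} \right] = \frac{1}{s-a} \end{aligned}$$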

provided that $s > a$, for then $e^{(a-s)r} \to 0$ as $r \to \infty$.

At times, we will need to restrict the values of $s$ in order for the Laplace transform to exist. Above, we observed that $\mathcal{L}[e^{at}] = 1/(s - a)$, provided that $s > a$. Usually, we will suppress the discussion of the restriction on $s$-values and simply assume that the domain of the Laplace transform is as large as possible.

Example 5.2.4 Find $\mathcal{L}[\cos kt]$ and $\mathcal{L}[\sin kt]$.

Solution. By definition,

$$\mathcal{L}[\cos kt] = \int_0^\infty \cos kt\, e^{-st}\,dt$$

Integrating by parts twice or using a table of integrals,

$$\begin{aligned} \mathcal{L}[\cos kt] &= \lim_{r \to \infty} \left[ \frac{1}{s^2 + k^2}\left(k \sin kt - s \cos kt\right)e^{-st} \right]_0^r \\ &= \lim_{r \to \infty} \left[ \frac{1}{s^2 + k^2}\left(k \sin kr - s \cos kr\right)e^{-sr} - \frac{1}{s^2 + k^2}(0 - s) \right] \\ &= \lim_{r \to \infty} \left[ \frac{e^{-sr} k \sin kr}{s^2 + k^2} - \frac{e^{-sr} s \cos kr}{s^2 + k^2} + \frac{s}{s^2 + k^2} \right] \end{aligned} \tag{5.2.10}$$

Since $e^{-sr} \to 0$ as $r \to \infty$ and $|\sin kr|$ and $|\cos kr|$ are bounded by 1 as $r \to \infty$, it follows from (5.2.10) that

$$\mathcal{L}[\cos kt] = \frac{s}{s^2 + k^2}$$


Similar computations show

$$\mathcal{L}[\sin kt] = \frac{k}{s^2 + k^2}$$

Table 5.1 Laplace transforms of some basic functions

$f(t)$          $F(s) = \mathcal{L}[f(t)] = \int_0^\infty f(t)e^{-st}\,dt$
$1$             $1/s$
$t$             $1/s^2$
$t^2$           $2/s^3$
$e^{at}$        $1/(s - a)$
$\cos kt$       $s/(s^2 + k^2)$
$\sin kt$       $k/(s^2 + k^2)$
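The entries of table 5.1 can be spot-checked numerically by approximating the defining integral (5.2.5) on a long but finite interval. The sketch below is illustrative only: the function name `laplace_numeric`, the truncation point `T`, the number of subintervals `n`, and the sample values of $s$, $a$, and $k$ are all choices made here, and the truncation is reasonable only because each integrand decays exponentially for the chosen $s$.

```python
import math


def laplace_numeric(f, s, T=60.0, n=60000):
    """Approximate the integral of f(t) e^{-st} over [0, T] by Simpson's rule.

    n must be even; T is assumed large enough that the tail beyond T is negligible.
    """
    h = T / n
    total = f(0.0) + f(T) * math.exp(-s * T)
    for i in range(1, n):
        t = i * h
        weight = 4 if i % 2 else 2      # Simpson weights 4, 2, 4, ..., 4
        total += weight * f(t) * math.exp(-s * t)
    return total * h / 3.0


s, a, k = 2.0, 0.5, 3.0                 # sample values with s > a and s > 0
table = [
    ("1",      lambda t: 1.0,              1 / s),
    ("t",      lambda t: t,                1 / s**2),
    ("t^2",    lambda t: t * t,            2 / s**3),
    ("e^{at}", lambda t: math.exp(a * t),  1 / (s - a)),
    ("cos kt", lambda t: math.cos(k * t),  s / (s * s + k * k)),
    ("sin kt", lambda t: math.sin(k * t),  k / (s * s + k * k)),
]

errors = {name: abs(laplace_numeric(f, s) - exact) for name, f, exact in table}
for name, err in errors.items():
    print(name, err)
```

Agreement at one value of $s$ does not prove a table entry, but a mismatch would quickly expose an algebra slip in a hand computation.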

We close this section with table 5.1, which summarizes the Laplace transforms we have computed so far. Observe that each line in the table may also be written in inverse form. For example, $\mathcal{L}^{-1}[1/(s - a)] = e^{at}$. This will be particularly useful in the next section as we see the first example of how the transform and its inverse can be used to solve an initial-value problem. In order to apply the Laplace transform successfully, we need to develop a deeper understanding of its properties and explore the impact of the transform on a wide range of functions. The following exercises and our investigations in the next section continue our work to this end.

Exercises 5.2

In exercises 1–4, explain why the limit of each function $g(r)$ is 0 as $r \to \infty$. In each, assume $s > 0$.

1. $g(r) = re^{-sr}$
2. $g(r) = r^2 e^{-sr}$
3. $g(r) = r^n e^{-sr}$
4. $g(r) = e^{-sr} \sin kr$

In exercises 5–16, use the definition of the Laplace transform to compute $\mathcal{L}[f(t)]$. For each, state the domain of $s$-values on which $\mathcal{L}[f(t)] = F(s)$ is defined.

5. $f(t) = 2t$
6. $f(t) = t - 3$


7. $f(t) = 2 - t$
8. $f(t) = t^2$
9. $f(t) = t^2 - 3$
10. $f(t) = (t - 2)^2$
11. $f(t) = e^{3t}$
12. $f(t) = e^{2t - 3}$
13. $f(t) = e^{3t + 5}$
14. $f(t) = \cos 4t$
15. $f(t) = te^{at}$
16. $f(t) = t \sin 2t$

From examples 5.2.2 and 5.2.1, we know that

$$\mathcal{L}[1] = \frac{1}{s} \quad \text{and} \quad \mathcal{L}[t] = \frac{1}{s^2}$$

Use these facts to compute the Laplace transform of each of the functions in exercises 17–19 with as little computation as possible. What properties of integrals and limits are being used?

17. $f(t) = 1 + t$
18. $f(t) = 3t - 2$
19. $f(t) = c + kt$
20. Explain why the Laplace transform is a linear operator on the vector space of acceptable functions.² That is, explain why for any real numbers $a$ and $b$ and any acceptable functions $f$ and $g$,

$$\mathcal{L}[af(t) + bg(t)] = a\mathcal{L}[f(t)] + b\mathcal{L}[g(t)]$$

5.3 General properties of the Laplace transform

In many ways, the Laplace transform resembles the differentiation and integration operators from calculus. For example, given a function $f(t) = 3t^4 + 5t + 1$, taking the derivative results in a new function $f'(t)$. Using the alternate notation $D[f]$ for the derivative of $f$ with respect to $t$, we see that

$$D[3t^4 + 5t + 1] = 12t^3 + 5$$

² See appendix D for further discussion on linear transformations of vector spaces.


In particular, the “D” operator transforms one function into another. Likewise, if we consider the definite integral of $f(t) = t - 1$ from $t = 0$ to $t = x$, we find that

$$\int_0^x (t - 1)\,dt = \frac{1}{2}x^2 - x$$

Letting $I(f) = \int_0^x f(t)\,dt$, we see that $I$ transforms one function $f(t)$ into another function $F(x)$ by the process of integration. In the same way, as we have seen in examples 5.2.1–5.2.4, the Laplace transform takes an acceptable function $f(t)$ and transforms it into a new function $F(s)$ by a process slightly more complicated than standard integration.

From calculus and our preceding work with differential equations, we know that taking the derivative of a function is a linear process, as is calculating the definite integral. More specifically, for any constants $a$ and $b$ and functions $f(t)$ and $g(t)$ that are differentiable and integrable, we know that

$$D[af(t) + bg(t)] = aD[f(t)] + bD[g(t)]$$

and

$$\int_0^x [af(t) + bg(t)]\,dt = a\int_0^x f(t)\,dt + b\int_0^x g(t)\,dt$$

Similarly, because the Laplace transform's definition involves limits and integrals, it has the same properties of linearity as the derivative and integral operators. In particular, as was shown in exercise 20 of section 5.2, the following theorem holds.

Theorem 5.3.1 For every pair of scalars $a$ and $b$ and acceptable functions $f(t)$ and $g(t)$,

$$\mathcal{L}[af(t) + bg(t)] = a\mathcal{L}[f(t)] + b\mathcal{L}[g(t)] \tag{5.3.1}$$

Theorem 5.3.1 shows that the Laplace transform, like the differential and integral operators, is a linear transformation or linear operator. Formally, a linear transformation is a function $T$ that maps one vector space $V$ to another vector space $W$ where $T$ satisfies the property that for all constants $a$ and $b$ and all elements $u$ and $v$ in $V$, $T(au + bv) = aT(u) + bT(v)$. Appendix D provides further discussion on linear transformations of vector spaces.

In calculus, following the definitions of the derivative and the definite integral, we quickly discover more general properties that enable us to compute derivatives and integrals without using the definition directly. In the same way, while we have seen a few examples of how to use the definition to compute the Laplace transform of certain functions $f(t)$, we can use results such as theorem 5.3.1 to more easily determine the Laplace transform of more complicated functions. Two examples follow.

Example 5.3.1 Find the Laplace transform of $f(t) = 7 - 3e^{2t}$.


Solution. We know from examples 5.2.2 and 5.2.3 that $\mathcal{L}[1] = 1/s$ and $\mathcal{L}[e^{2t}] = 1/(s - 2)$. By theorem 5.3.1 it now follows that

$$\mathcal{L}[7 - 3e^{2t}] = 7\mathcal{L}[1] - 3\mathcal{L}[e^{2t}] = \frac{7}{s} - \frac{3}{s - 2}$$

We note that the individual Laplace transforms are defined on different domains: $7/s$ is valid for $s > 0$ while $3/(s - 2)$ is defined if $s > 2$. We usually suppress discussion of this issue and assume that $\mathcal{L}[f(t)]$ is defined on the largest interval possible. In example 5.3.1, this domain is $\{s \mid s > 2\}$.

Example 5.3.2 Find the Laplace transform of $\cosh kt$ and $\sinh kt$.

Solution. By definition, the hyperbolic cosine function is given by $\cosh kt = \frac{1}{2}e^{kt} + \frac{1}{2}e^{-kt}$. By the linearity of the Laplace transform, it follows that

$$\mathcal{L}[\cosh kt] = \frac{1}{2}\mathcal{L}[e^{kt}] + \frac{1}{2}\mathcal{L}[e^{-kt}] = \frac{1}{2}\left( \frac{1}{s - k} + \frac{1}{s + k} \right) = \frac{s}{s^2 - k^2}$$

Similarly,

$$\mathcal{L}[\sinh kt] = \mathcal{L}\left[ \frac{1}{2}\left(e^{kt} - e^{-kt}\right) \right] = \frac{1}{2}\left( \frac{1}{s - k} - \frac{1}{s + k} \right) = \frac{k}{s^2 - k^2}$$

In addition to taking linear combinations of functions, we often want to multiply a given function by $t$ or some power of $t$. For example, it is natural to wonder if we can use our work in preceding examples to compute $\mathcal{L}[te^{at}]$. If we first consider the Laplace transforms of the simple power functions $1$, $t$, $t^2$, and so on, we find evidence for a conjecture on how we might approach $\mathcal{L}[te^{at}]$. In particular, note that

$$\mathcal{L}[1] = \frac{1}{s}, \quad \mathcal{L}[t] = \frac{1}{s^2}, \quad \mathcal{L}[t^2] = \frac{2}{s^3} \tag{5.3.2}$$

The last result was shown in exercise 8 of section 5.2. In fact, we could go on to show that $\mathcal{L}[t^3] = 6/s^4$. This sequence of results reminds us of derivatives: in particular,

$$\frac{d}{ds}\left[\frac{1}{s}\right] = -\frac{1}{s^2}, \quad \frac{d}{ds}\left[\frac{1}{s^2}\right] = -\frac{2}{s^3}, \quad \frac{d}{ds}\left[\frac{2}{s^3}\right] = -\frac{6}{s^4} \tag{5.3.3}$$

From this sequence of examples, it appears that each time we take a given function $f(t) = t^n$ and multiply it by $t$, the impact on its Laplace transform is that the transform of the new function is the opposite of the derivative of the transform of the original. Using a result from multivariable calculus known as Leibniz's rule, a formal proof of this fact may be established, not only for power


functions, but also for all functions having Laplace transforms. We defer this work to exercise 25 and state the following theorem.

Theorem 5.3.2 If $\mathcal{L}[f(t)] = F(s)$, then

$$\mathcal{L}[tf(t)] = -F'(s) = -\frac{d}{ds}F(s) \tag{5.3.4}$$

Theorem 5.3.2 enables us to expand on our observations above regarding the Laplace transforms of the power functions $t$, $t^2$, $t^3$, and so on. In particular, replacing $F(s)$ with $\mathcal{L}[f(t)]$, we can take the perspective that (5.3.4) implies

$$\mathcal{L}[tf(t)] = -\frac{d}{ds}\mathcal{L}[f(t)] \tag{5.3.5}$$

This shows that, for example,

$$\mathcal{L}[t^4] = \mathcal{L}[t \cdot t^3] = -\frac{d}{ds}\mathcal{L}[t^3] = -\frac{d}{ds}\left[\frac{6}{s^4}\right] = \frac{24}{s^5}$$

In addition, a generalization of this reasoning can be used to show the following corollary to theorem 5.3.2. See exercise 26.

Corollary 5.3.3 For each positive integer $n$,

$$\mathcal{L}[t^n f(t)] = (-1)^n F^{(n)}(s) \tag{5.3.6}$$
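The pattern in (5.3.2) and (5.3.3) can be carried out mechanically: starting from $\mathcal{L}[1] = 1/s$, each multiplication by $t$ replaces a transform $c/s^k$ with its negated derivative $kc/s^{k+1}$. The sketch below applies theorem 5.3.2 repeatedly in exact arithmetic; the pair representation `(c, k)` for $c/s^k$ and the function names `times_t` and `transform_of_power` are conveniences assumed here, not notation from the text.

```python
from fractions import Fraction
from math import factorial


def times_t(term):
    """Apply L[t f(t)] = -F'(s) to a transform of the form c / s^k.

    A term is stored as the pair (c, k). Since d/ds (c s^{-k}) = -k c s^{-k-1},
    the negated derivative is (k c) / s^{k+1}.
    """
    c, k = term
    return (c * k, k + 1)


def transform_of_power(n):
    """Transform of t^n as a pair (c, k) meaning c / s^k, built from L[1] = 1/s."""
    term = (Fraction(1), 1)
    for _ in range(n):
        term = times_t(term)
    return term


for n in range(6):
    c, k = transform_of_power(n)
    print(f"L[t^{n}] = {c}/s^{k}")
```

The coefficients produced are $1, 1, 2, 6, 24, 120$, which suggests the general formula $\mathcal{L}[t^n] = n!/s^{n+1}$, consistent with the values $2/s^3$, $6/s^4$, and $24/s^5$ found above.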

We next consider two examples that show how we can use recent results to compute the Laplace transform of familiar functions multiplied by $t$.

Example 5.3.3 Find $\mathcal{L}[te^{at}]$ and $\mathcal{L}[t^2 e^{at}]$.

Solution. We know from earlier work that $\mathcal{L}[e^{at}] = 1/(s - a)$. It follows from theorem 5.3.2 that

$$\mathcal{L}[te^{at}] = -\frac{d}{ds}\mathcal{L}[e^{at}] = -\frac{d}{ds}\left[\frac{1}{s - a}\right] = \frac{1}{(s - a)^2}$$

Similarly,

$$\mathcal{L}[t^2 e^{at}] = -\frac{d}{ds}\mathcal{L}[te^{at}] = -\frac{d}{ds}\left[\frac{1}{(s - a)^2}\right] = \frac{2}{(s - a)^3}$$

In fact, as we will see in exercise 27, we can show in general that

$$\mathcal{L}[t^n e^{at}] = \frac{n!}{(s - a)^{n+1}} \tag{5.3.7}$$

Example 5.3.4 Find $\mathcal{L}[t \sin kt]$.

General properties of the Laplace transform

Solution.

341

In example 5.2.4, we showed that L[sin kt ] =

k s2 + k2

Applying theorem 5.3.2, we know that d k 2ks L[t sin kt ] = − = 2 ds s 2 + k 2 (s + k 2 )2 As we have noted, we are motivated to develop the Laplace transform by the need to solve initial-value problems that involve unusual forcing functions. For example, we will soon work to solve equations of the form y + a0 y = f (t )

(5.3.8)

where $f(t)$ is a step function or other piecewise defined function. We will use our understanding of the Laplace transform to solve these equations by taking the Laplace transform of each side of (5.3.8) to transform the differential equation (in $t$) into an algebraic equation (in $s$). Our hope is that upon doing so, we can solve the new algebraic equation in order to ultimately solve the differential one. To see how this process begins, we take the Laplace transform of both sides of (5.3.8) and apply the linearity property. Doing so results in the equation

$$\mathcal{L}[y'] + a_0\mathcal{L}[y] = \mathcal{L}[f(t)] \tag{5.3.9}$$

Here, we realize that while we can compute $\mathcal{L}[f(t)]$ using the definition or established results, it is unclear how to work with $\mathcal{L}[y']$ and $\mathcal{L}[y]$. Ideally, if we could understand how the Laplace transform $\mathcal{L}[y']$ of the derivative of an unknown function is related to the Laplace transform $\mathcal{L}[y]$ of the function itself, that would enable us to work with one unknown quantity. To this end, we return to the definition and show how $\mathcal{L}[y']$ depends on $\mathcal{L}[y]$. Let us suppose that $y$ and $y'$ are acceptable functions and that $y$ is continuous. By definition,

$$\mathcal{L}[y'(t)] = \int_0^\infty y'(t)e^{-st}\,dt = \lim_{r\to\infty}\int_0^r y'(t)e^{-st}\,dt \tag{5.3.10}$$

To evaluate $\int_0^r y'(t)e^{-st}\,dt$, we use integration by parts with $u = e^{-st}$ and $dv = y'(t)\,dt$. It follows that $du = -se^{-st}\,dt$ and $v = y(t)$. Integrating³ (5.3.10),

$$\begin{aligned}
\mathcal{L}[y'(t)] &= \lim_{r\to\infty}\left(\Big[y(t)e^{-st}\Big]_0^r + s\int_0^r y(t)e^{-st}\,dt\right) \\
&= \lim_{r\to\infty}\left(y(r)e^{-sr} - y(0) + s\int_0^r y(t)e^{-st}\,dt\right)
\end{aligned} \tag{5.3.11}$$

3 The integration by parts formula holds since y is continuous. If y has a jump discontinuity, then this part of the argument is more complicated.


Since $y$ is an acceptable function, it is of exponential order and $|y(t)| \le Me^{bt}$ for some positive constants $M$ and $b$. Assuming that $s > b$, it follows that $y(r)e^{-sr} \to 0$ as $r \to \infty$. In addition, in (5.3.11) we observe

$$\lim_{r\to\infty} s\int_0^r y(t)e^{-st}\,dt = s\int_0^\infty y(t)e^{-st}\,dt = s\mathcal{L}[y(t)]$$

by the definition of the Laplace transform. Hence, (5.3.11) implies

$$\mathcal{L}[y'(t)] = s\mathcal{L}[y(t)] - y(0) \tag{5.3.12}$$

Our work has proved the following theorem.

Theorem 5.3.4 Suppose $y(t)$ is continuous and $y(t)$ and $y'(t)$ are acceptable. Then

$$\mathcal{L}[y'(t)] = s\mathcal{L}[y(t)] - y(0) \tag{5.3.13}$$
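Theorem 5.3.4 can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: the test function $y(t) = \cos 2t$, the sample value $s = 1$, the helper names, and the finite truncation of the integral are not from the text. It computes $\mathcal{L}[y']$ and $s\mathcal{L}[y] - y(0)$ independently from the defining integral.

```python
import math

def simpson(g, a, b, n=20000):
    """Composite Simpson's rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def laplace(f, s, upper=60.0):
    """Approximate L[f](s); the improper integral is truncated at t = upper (an assumption)."""
    return simpson(lambda t: f(t) * math.exp(-s * t), 0.0, upper)

y = lambda t: math.cos(2 * t)          # sample function with y(0) = 1
dy = lambda t: -2 * math.sin(2 * t)    # its derivative, computed by hand

s = 1.0
lhs = laplace(dy, s)                   # L[y'] from the definition
rhs = s * laplace(y, s) - y(0.0)       # s L[y] - y(0), the right side of (5.3.13)

print(lhs, rhs)                        # both should be close to -4/(s^2 + 4)
```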

Note particularly the appearance of $y(0)$ in the conclusion of theorem 5.3.4. This foreshadows how we will use the Laplace transform to solve an initial-value problem directly without resorting to a general solution of the associated differential equation. To see further how we will use the Laplace transform, we consider the following example.

Example 5.3.5 Use the Laplace transform to solve the initial-value problem

$$y' + y = e^{-t}, \qquad y(0) = 0 \tag{5.3.14}$$

Solution. We begin by taking the Laplace transform of both sides of (5.3.14) to achieve

$$\mathcal{L}[y'] + \mathcal{L}[y] = \mathcal{L}[e^{-t}] \tag{5.3.15}$$

From example 5.2.3, we know that $\mathcal{L}[e^{-t}] = 1/(s+1)$. Furthermore, we just established

$$\mathcal{L}[y'] = s\mathcal{L}[y] - y(0) \tag{5.3.16}$$

Combining (5.3.15), (5.3.16), and the given fact that $y(0) = 0$, we have

$$s\mathcal{L}[y] + \mathcal{L}[y] = \frac{1}{s+1} \tag{5.3.17}$$

Letting $Y(s) = \mathcal{L}[y]$, factoring, and solving for $Y(s)$,

$$Y(s) = \frac{1}{(s+1)^2} \tag{5.3.18}$$

To solve the initial-value problem, it remains for us to determine the function $y(t)$ whose Laplace transform is $Y(s) = 1/(s+1)^2$. That is, we must find $\mathcal{L}^{-1}[Y(s)] = \mathcal{L}^{-1}[1/(s+1)^2]$. In example 5.3.3, we saw that $\mathcal{L}[te^{at}] = 1/(s-a)^2$. In particular,

$$\mathcal{L}[te^{-t}] = \frac{1}{(s+1)^2} \quad\text{or}\quad \mathcal{L}^{-1}\left[\frac{1}{(s+1)^2}\right] = te^{-t}$$

From (5.3.18), it now follows that

$$y(t) = te^{-t}$$

This is precisely the solution we would expect had we applied another method (such as using an integrating factor) to solve (5.3.14).

Note particularly that our work in (5.3.14)–(5.3.17) converted the given initial-value problem (5.3.14) involving $y'$ to an algebraic equation (5.3.17) involving $\mathcal{L}[y] = Y(s)$. We then had to use the inverse Laplace transform in order to determine $y(t)$. This process is typical for how the transform is used to solve IVPs; at this point, we largely need to gain experience with more complicated functions and situations in order to solve more advanced problems.

To help us solve higher order IVPs, we make note of one more result, which relates the Laplace transform of a higher order derivative to the transform of the original function. We then proceed to establish additional results on products of familiar functions and on piecewise-defined functions in order to more fully understand the workings of the Laplace transform.

Corollary 5.3.5 Suppose $y(t)$ and $y'(t)$ are continuous and $y(t)$, $y'(t)$, and $y''(t)$ are acceptable. Then

$$\mathcal{L}[y''(t)] = s^2\mathcal{L}[y(t)] - sy(0) - y'(0) \tag{5.3.19}$$

The proof of corollary 5.3.5 is straightforward by two applications of theorem 5.3.4; see exercise 28.

In theorem 5.3.2, we computed the Laplace transform of $tf(t)$ in terms of the Laplace transform of $f(t)$. In addition to multiplying by $t$ (or powers of $t$), another function that arises frequently in the study of differential equations is $e^{at}$. Hence we are naturally interested in how $\mathcal{L}[e^{at}f(t)]$ is related to $\mathcal{L}[f(t)]$. Letting $f(t)$ be an acceptable function and $\mathcal{L}[f(t)] = F(s)$, we have by definition that

$$F(s) = \int_0^\infty f(t)e^{-st}\,dt \tag{5.3.20}$$

For the Laplace transform of $e^{at}f(t)$, we note that $e^{at}f(t)$ is an acceptable function and, by definition,

$$\mathcal{L}[e^{at}f(t)] = \int_0^\infty e^{at}f(t)e^{-st}\,dt = \int_0^\infty f(t)e^{-(s-a)t}\,dt \tag{5.3.21}$$

From the right-hand sides of (5.3.20) and (5.3.21), we observe that the only difference is that $s$ has been replaced by $s - a$. In particular, $\mathcal{L}[e^{at}f(t)] = F(s-a)$, where $\mathcal{L}[f(t)] = F(s)$. We say that $F(s)$ has been shifted by multiplying $f(t)$ by $e^{at}$ and call the theorem we have just proved the first shifting property, which is stated as follows.

Theorem 5.3.6 (First Shifting Property). Let $f(t)$ be acceptable and $\mathcal{L}[f(t)] = F(s)$. For any real value of $a$,

$$\mathcal{L}[e^{at}f(t)] = F(s-a)$$

In the next example, we compute three Laplace transforms to show the straightforward application of theorem 5.3.6.

Example 5.3.6 Find $\mathcal{L}[e^{at}\cos kt]$, $\mathcal{L}[e^{at}\sin kt]$, and $\mathcal{L}[e^{at}t^2]$.

Solution. We have already established that

$$\mathcal{L}[\cos kt] = \frac{s}{s^2+k^2}$$

so by the first shifting property,

$$\mathcal{L}[e^{at}\cos kt] = \frac{s-a}{(s-a)^2+k^2}$$

Similarly, from the fact that

$$\mathcal{L}[\sin kt] = \frac{k}{s^2+k^2}$$

we observe

$$\mathcal{L}[e^{at}\sin kt] = \frac{k}{(s-a)^2+k^2}$$

Finally,

$$\mathcal{L}[t^2] = \frac{2}{s^3}$$

and theorem 5.3.6 together imply

$$\mathcal{L}[e^{at}t^2] = \frac{2}{(s-a)^3}$$
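The first shifting property lends itself to a quick numerical illustration. In the sketch below, the sample values of $a$, $k$, and $s$ (with $s > a$ so that the integral converges), the helper names, and the truncation point are assumptions made only for this check; the code compares $\mathcal{L}[e^{at}\cos kt]$, computed from the definition, with $F(s-a)$ for $F(s) = s/(s^2+k^2)$.

```python
import math

def simpson(g, a, b, n=20000):
    """Composite Simpson's rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def laplace(f, s, upper=60.0):
    """Approximate L[f](s); the improper integral is truncated at t = upper (an assumption)."""
    return simpson(lambda t: f(t) * math.exp(-s * t), 0.0, upper)

a, k, s = 0.5, 2.0, 2.0                  # sample values; s > a is required
F = lambda x: x / (x**2 + k**2)          # F(s) = L[cos kt]

direct = laplace(lambda t: math.exp(a * t) * math.cos(k * t), s)  # from the definition
shifted = F(s - a)                       # theorem 5.3.6: evaluate F at s - a

print(direct, shifted)                   # the two values should agree to several digits
```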

A summary of the results we established in this section follows in table 5.2.

Table 5.2 Summary of results on the Laplace transform from section 5.3

  $f(t)$               $F(s) = \mathcal{L}[f(t)] = \int_0^\infty f(t)e^{-st}\,dt$
  $af(t) + bg(t)$      $a\mathcal{L}[f(t)] + b\mathcal{L}[g(t)]$
  $tf(t)$              $-F'(s) = -\frac{d}{ds}\mathcal{L}[f(t)]$
  $t^nf(t)$            $(-1)^nF^{(n)}(s)$
  $f'(t)$              $s\mathcal{L}[f(t)] - f(0) = sF(s) - f(0)$
  $f''(t)$             $s^2\mathcal{L}[f(t)] - sf(0) - f'(0) = s^2F(s) - sf(0) - f'(0)$
  $e^{at}f(t)$         $F(s-a)$

Exercises 5.3

In exercises 1–5, use the linearity property and the transforms derived in the examples to find the Laplace transform of the given function.

1. $f(t) = 3 - e^{t}$
2. $f(t) = 4\cos t + 2\sin t$
3. $f(t) = 3e^{2t} - 3\sin 2t$
4. $f(t) = 2 + 5\sin 3t$
5. $f(t) = 4\cos 5t - 6e^{-2t}$

In exercises 6–11, use theorem 5.3.2 or corollary 5.3.3 and the transforms derived in the examples to find the Laplace transform of the given function.

6. $f(t) = 3te^{3t}$
7. $f(t) = t^2e^{-t}$
8. $f(t) = 3t\cos 4t$
9. $f(t) = t^3\sin t$
10. $f(t) = t^2\cos t$
11. $f(t) = 4\cos 5t - 6e^{-2t}$

In exercises 12–17, use the first shifting property and the transforms derived in the examples to find the Laplace transform of the given function.

12. $f(t) = 3te^{3t}$
13. $f(t) = t^2e^{-t}$
14. $f(t) = e^{-2t}\cos 4t$
15. $f(t) = e^{-t}\sin 2t$
16. $f(t) = e^{4t}\sinh 2t$
17. $f(t) = \cosh 2t\,\sin 3t$

In exercises 18–24, use established general properties and the transforms derived in the examples to find the Laplace transform of the given function.

18. $f(t) = 3te^{3t} - e^{2t}\cos t$
19. $f(t) = 4t^2e^{-t} + 7e^{-3t}\sin t$
20. $f(t) = e^{-2t}(t^2 + 4t + 5)$
21. $f(t) = (t^2 - t)\sin t$
22. $f(t) = t(\cos 4t - 2\sin 4t)$
23. $f(t) = te^{-t}\sin 2t$
24. $f(t) = t^2e^{-t}\sin 2t$

25. In multivariable calculus, students may have encountered Leibniz's rule, which allows differentiation across the integral sign. In particular, the rule states that under reasonable hypotheses on a function $K(s,t)$,

$$\frac{d}{ds}\int_{t=a}^{t=b} K(s,t)\,dt = \int_{t=a}^{t=b} \frac{\partial}{\partial s}\left[K(s,t)\right]dt$$

Use Leibniz's rule to explain why theorem 5.3.2 is true. In particular, show that if $F(s) = \mathcal{L}[f(t)]$, then $-F'(s) = \mathcal{L}[tf(t)]$.

26. Using the rule established in theorem 5.3.2, show why corollary 5.3.3 is true. Specifically, show that if $n$ is a positive integer, then

$$\mathcal{L}[t^nf(t)] = (-1)^nF^{(n)}(s)$$

(Hint: Apply the theorem to $\mathcal{L}[t \cdot t^{n-1}f(t)]$ to show that

$$\mathcal{L}[t^nf(t)] = -\frac{d}{ds}\left[\mathcal{L}[t^{n-1}f(t)]\right]$$

and then repeat this line of reasoning on the expression $\mathcal{L}[t^{n-1}f(t)]$.)

27. Use corollary 5.3.3 to show that

$$\mathcal{L}[t^ne^{at}] = \frac{n!}{(s-a)^{n+1}}$$

28. Apply theorem 5.3.4 twice to prove corollary 5.3.5.

29. Express $\mathcal{L}[f^{(4)}(t)]$ in terms of $\mathcal{L}[f(t)]$ and the first three derivatives of $f(t)$ at $t = 0$ by using theorem 5.3.4.

30. We have established that $\mathcal{L}[e^{at}] = 1/(s-a)$ for any real number $a$. Assume now that this formula holds for any complex number $a = \alpha + \beta i$, and hence compute the Laplace transform

$$\mathcal{L}[e^{(\alpha+\beta i)t}]$$

Use Euler's formula and properties of complex numbers to show that

$$\mathcal{L}[e^{\alpha t}(\cos\beta t + i\sin\beta t)] = \frac{s-\alpha}{(s-\alpha)^2+\beta^2} + i\,\frac{\beta}{(s-\alpha)^2+\beta^2}$$

Explain how equating real and imaginary parts produces an alternate derivation for the Laplace transforms of $e^{\alpha t}\cos\beta t$ and $e^{\alpha t}\sin\beta t$.

5.4 Piecewise continuous functions

In physical applications, we sometimes encounter step functions that represent some quantity being turned on or off, such as an electric switch. If a mass in a spring-mass system is struck with a hammer or a drug is delivered by muscle injection, impulse functions that involve forces acting over very short time periods play a key role. To help us address these and related situations, we study the application of the Laplace transform to two important functions: the Heaviside function and the Dirac delta function.

5.4.1 The Heaviside function

We define the Heaviside function, or unit step function, denoted $u(t)$, to be the function that is 0 for all $t < 0$ and 1 for all $t \ge 0$. That is,

$$u(t) = \begin{cases} 0, & \text{if } t < 0 \\ 1, & \text{if } t \ge 0 \end{cases} \tag{5.4.1}$$

Often, we will make use of a step function that turns on at $t = a$, rather than $t = 0$. Thus we employ the translated unit step function, $u(t-a)$, which by (5.4.1) is given by

$$u(t-a) = \begin{cases} 0, & \text{if } t < a \\ 1, & \text{if } t \ge a \end{cases} \tag{5.4.2}$$

A plot of the translated unit step function is given in figure 5.3.

[Figure 5.3: The translated unit step function $u(t-a)$.]

Step functions may be used to turn other functions on or off. For example, if we consider the function $f(t) = (4-t)u(t-4)$, we observe that since $u(t-4) = 0$ for $t < 4$ and $u(t-4) = 1$ for $t \ge 4$, it follows

$$f(t) = \begin{cases} 0, & \text{if } t < 4 \\ 4-t, & \text{if } t \ge 4 \end{cases} \tag{5.4.3}$$

From this perspective, we see that the function $(4-t)$ is off until $t = 4$, at which time it is turned on. To see how we can use step functions to turn another function both on and off at various times, we consider the function $g(t) = u(t-a) - u(t-b)$, where $a < b$. This difference of translated unit step functions turns on for $a \le t < b$ and turns off when $t \ge b$. More specifically, for $t < a$, both $u(t-a)$ and $u(t-b)$ are zero, so $g(t) = 0$. For $a \le t < b$, $u(t-a) = 1$ and $u(t-b) = 0$, thus $g(t) = 1$. And finally, once $t \ge b$, both $u(t-a) = 1$ and $u(t-b) = 1$, so that $g(t) = 0$. This can be written equivalently as

$$g(t) = \begin{cases} 0, & \text{if } t < a \\ 1, & \text{if } a \le t < b \\ 0, & \text{if } t \ge b \end{cases} \tag{5.4.4}$$

This property of the function $u(t-a) - u(t-b)$ enables us to write a single formula for any piecewise-defined function that arises, rather than the traditional cases format where we stipulate the different formulas on different intervals, as in (5.4.4). The next example demonstrates the role of $u(t-a) - u(t-b)$.

Example 5.4.1 Define the following piecewise function using unit step functions.

$$f(t) = \begin{cases} t, & \text{if } 0 \le t < 2 \\ 2, & \text{if } 2 \le t < 4 \\ 0, & \text{otherwise} \end{cases}$$

Solution. We use the fact that the function $u(t) - u(t-2)$ is 1 in the interval $0 \le t < 2$ and 0 otherwise, and $u(t-2) - u(t-4)$ is 1 on $2 \le t < 4$ and 0 otherwise. Thus, we turn on $t$ for $0 \le t < 2$ and turn on 2 for $2 \le t < 4$ by writing

$$\begin{aligned}
f(t) &= t[u(t) - u(t-2)] + 2[u(t-2) - u(t-4)] \\
&= tu(t) + (2-t)u(t-2) - 2u(t-4)
\end{aligned}$$
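The equivalence of the cases format and the single step-function formula in example 5.4.1 can be confirmed by direct evaluation. The following Python sketch is illustrative only; the function names `u`, `f_cases`, and `f_steps` and the sampling grid are assumptions, not notation from the text.

```python
def u(t):
    """Unit step (Heaviside) function as defined in (5.4.1)."""
    return 1.0 if t >= 0 else 0.0

def f_cases(t):
    """The piecewise definition from example 5.4.1."""
    if 0 <= t < 2:
        return float(t)
    if 2 <= t < 4:
        return 2.0
    return 0.0

def f_steps(t):
    """The single-formula version built from unit step functions."""
    return t * u(t) + (2 - t) * u(t - 2) - 2 * u(t - 4)

# Compare the two definitions on a grid that includes the switching times 0, 2, and 4.
grid = [-1 + 0.25 * i for i in range(29)]      # t = -1, -0.75, ..., 6
max_gap = max(abs(f_cases(t) - f_steps(t)) for t in grid)
print(max_gap)                                 # should be zero up to rounding
```

The same pattern, one bracket $u(t-a) - u(t-b)$ per interval, extends to any finite number of pieces.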

A plot of $f(t)$ is shown in figure 5.4.

[Figure 5.4: The function $f(t)$ in example 5.4.1.]

At this point, we should again not lose sight of our goal: we are interested in using Laplace transforms to solve initial-value problems such as

$$y'' + 2y' + 5y = u(t-2), \qquad y(0) = 1, \; y'(0) = 0$$

where the forcing function is turned on at time $t = 2$. Since we will solve such equations by taking the Laplace transform of both sides, we must understand the transform of basic step functions. In fact, since step functions will be used to turn other functions on and off, we are more generally interested in $\mathcal{L}[u(t-a)f(t)]$. We return to the definition to explore this situation further. Because we will employ a change of variables in our work, we begin by using $z$ as a different variable of integration than the usual $t$ in the definition. Specifically, from the definition of the Laplace transform we have

$$\mathcal{L}[u(t-a)f(t)] = \int_0^\infty u(z-a)f(z)e^{-sz}\,dz = \int_a^\infty f(z)e^{-sz}\,dz$$

The second equality follows from the fact that $u(z-a) = 0$ for all $z < a$ and $u(z-a) = 1$ for all $z \ge a$, which allows us to eliminate the presence of the unit step function. We now employ the substitution $z = t + a$ and note that $t = z - a$ and $dz = dt$. From this and our work above, we see

$$\begin{aligned}
\mathcal{L}[u(t-a)f(t)] &= \int_a^\infty f(z)e^{-sz}\,dz \\
&= \lim_{r\to\infty}\int_{z=a}^{z=r} f(z)e^{-sz}\,dz \\
&= \lim_{r\to\infty}\int_{t=0}^{t=r-a} f(t+a)e^{-s(t+a)}\,dt \\
&= \lim_{r\to\infty}\int_{t=0}^{t=r-a} f(t+a)e^{-st}e^{-as}\,dt
\end{aligned} \tag{5.4.5}$$

In (5.4.5), since $e^{-as}$ is constant with respect to $t$, we can remove it from the integral. Moreover, we can take the limit as $r \to \infty$ and note that $(r-a) \to \infty$ as well. From this, we now have

$$\mathcal{L}[u(t-a)f(t)] = e^{-as}\int_0^\infty f(t+a)e^{-st}\,dt$$

On the right, we observe that the Laplace transform of $f(t+a)$ has arisen, and therefore

$$\mathcal{L}[u(t-a)f(t)] = e^{-as}\mathcal{L}[f(t+a)]$$

We call this result the second shifting property and state it formally in the next theorem.

Theorem 5.4.1 (Second Shifting Property) If $f(t)$ has a Laplace transform, then

$$\mathcal{L}[u(t-a)f(t)] = e^{-as}\mathcal{L}[f(t+a)] \tag{5.4.6}$$

When working with inverse transforms, we'll often use the equivalent formulations of this result that

$$\mathcal{L}[u(t-a)f(t-a)] = e^{-as}\mathcal{L}[f(t)] \quad\text{or}\quad \mathcal{L}^{-1}[e^{-as}F(s)] = u(t-a)f(t-a) \tag{5.4.7}$$

which come from replacing $t$ with $t - a$ in the argument of $f$. To see how the second shifting property works and gain more experience with the roles played by unit step functions, we consider several examples.

Example 5.4.2 Determine the Laplace transform of the step function, $u(t-3)$.

Solution. We can view $u(t-3)$ as the function $u(t-3) \cdot 1$. Since we know that $\mathcal{L}[1] = 1/s$, by the second shifting property it follows that

$$\mathcal{L}[u(t-3)] = \mathcal{L}[u(t-3) \cdot 1] = e^{-3s}\mathcal{L}[1] = \frac{e^{-3s}}{s}$$

More generally, we can show that for any $a \ge 0$,

$$\mathcal{L}[u(t-a)] = \frac{e^{-as}}{s} \tag{5.4.8}$$

Example 5.4.3 Determine the Laplace transform of $f(t) = u(t-3)\,t^2$.

Solution. With $f(t) = t^2$, by the second shifting property we have

$$\begin{aligned}
\mathcal{L}[u(t-3)\,t^2] &= e^{-3s}\mathcal{L}[(t+3)^2] \\
&= e^{-3s}\mathcal{L}[t^2 + 6t + 9] \\
&= e^{-3s}\left(\frac{2}{s^3} + \frac{6}{s^2} + \frac{9}{s}\right)
\end{aligned}$$
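The result of example 5.4.3 can be checked against the defining integral. The sketch below is a rough numerical illustration under stated assumptions: the sample value $s = 1$, the helper name `simpson`, and the finite truncation of each improper integral are choices made only for this check. Because $u(t-3)\,t^2$ vanishes for $t < 3$, the left side is integrated starting at $t = 3$, which avoids sampling across the jump.

```python
import math

def simpson(g, a, b, n=20000):
    """Composite Simpson's rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

a, s = 3.0, 1.0        # switching time from example 5.4.3 and a sample value of s
span = 60.0            # truncation length for the improper integrals (an assumption)

# Left side of (5.4.6): L[u(t - a) t^2], integrating only where the step is on.
lhs = simpson(lambda t: t**2 * math.exp(-s * t), a, a + span)

# Right side of (5.4.6): e^{-as} L[f(t + a)] with f(t + a) = (t + a)^2.
rhs = math.exp(-a * s) * simpson(lambda t: (t + a)**2 * math.exp(-s * t), 0.0, span)

# Closed form obtained in example 5.4.3.
closed_form = math.exp(-a * s) * (2 / s**3 + 6 / s**2 + 9 / s)

print(lhs, rhs, closed_form)   # the three values should agree to several digits
```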

Example 5.4.4 Determine the Laplace transform of $f(t) = u(t-a) - u(t-b)$.

Solution. Because we know $\mathcal{L}[u(t-a)] = e^{-as}/s$, we can use the linearity of the Laplace transform to find

$$\mathcal{L}[u(t-a) - u(t-b)] = \frac{1}{s}e^{-as} - \frac{1}{s}e^{-bs} = \frac{1}{s}\left(e^{-as} - e^{-bs}\right)$$

With our understanding of the Laplace transform of step functions and the second shifting property, we are now prepared to compute transforms of a wide range of step functions.

Example 5.4.5 Find the Laplace transform of

$$f(t) = \begin{cases} 1, & \text{if } 0 \le t < 1 \\ t, & \text{if } 1 \le t < 2 \\ 2, & \text{if } 2 \le t \end{cases}$$

Solution. We first use step functions to write $f(t)$ with a single formula. Using $u(t) - u(t-1)$ to turn 1 on and off, and similar ideas for $t$ and 2, we have

$$\begin{aligned}
f(t) &= 1[u(t) - u(t-1)] + t[u(t-1) - u(t-2)] + 2u(t-2) \\
&= u(t) + (t-1)u(t-1) + (2-t)u(t-2)
\end{aligned}$$

Using the linearity of the Laplace transform, the second shifting property, and familiar transforms,

$$\begin{aligned}
\mathcal{L}[f(t)] &= \mathcal{L}[u(t)] + \mathcal{L}[(t-1)u(t-1)] + \mathcal{L}[(2-t)u(t-2)] \\
&= \frac{1}{s} + e^{-s}\mathcal{L}[(t+1) - 1] + e^{-2s}\mathcal{L}[2 - (t+2)] \\
&= \frac{1}{s} + e^{-s}\mathcal{L}[t] + e^{-2s}\mathcal{L}[-t] \\
&= \frac{1}{s} + \frac{1}{s^2}e^{-s} - \frac{1}{s^2}e^{-2s}
\end{aligned}$$

Example 5.4.6 Find the Laplace transform of $f(t)$, where $f(t)$ is the piecewise linear function shown in the following graph.

[Graph: the piecewise linear function $f(t)$, with $t$ on the horizontal axis and $y$ on the vertical axis.]

Solution. From the graph, we see that $f$ has slope 1 on $[0, 2)$ and slope $-2$ on $[2, 3)$. Therefore, $f$ can be defined piecewise by the rule

$$f(t) = \begin{cases} t, & \text{if } 0 \le t < 2 \\ 6 - 2t, & \text{if } 2 \le t < 3 \\ 0, & \text{if } 3 \le t \end{cases}$$

Using step functions, we can write $f$ according to the formula

$$\begin{aligned}
f(t) &= t[u(t) - u(t-2)] + (6-2t)[u(t-2) - u(t-3)] \\
&= tu(t) + (6-3t)u(t-2) - (6-2t)u(t-3)
\end{aligned}$$

Applying the second shifting property, linearity, and familiar transforms, we see that

$$\begin{aligned}
\mathcal{L}[f(t)] &= \mathcal{L}[tu(t)] + \mathcal{L}[(6-3t)u(t-2)] - \mathcal{L}[(6-2t)u(t-3)] \\
&= \mathcal{L}[t] + e^{-2s}\mathcal{L}[6 - 3(t+2)] - e^{-3s}\mathcal{L}[6 - 2(t+3)] \\
&= \mathcal{L}[t] + e^{-2s}\mathcal{L}[-3t] - e^{-3s}\mathcal{L}[-2t] \\
&= \frac{1}{s^2} - \frac{3}{s^2}e^{-2s} + \frac{2}{s^2}e^{-3s}
\end{aligned}$$
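Since the function in example 5.4.6 is nonzero only on $[0, 3)$, its transform is a proper integral and can be approximated piece by piece. The following Python sketch, with hypothetical helper names `transform_numeric` and `transform_formula` and arbitrarily chosen sample values of $s$, compares that integral with the closed form obtained above.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson's rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def transform_numeric(s):
    """Integrate f(t) e^{-st} over each linear piece separately (f = 0 for t >= 3)."""
    first = simpson(lambda t: t * math.exp(-s * t), 0.0, 2.0)
    second = simpson(lambda t: (6 - 2 * t) * math.exp(-s * t), 2.0, 3.0)
    return first + second

def transform_formula(s):
    """Closed form from example 5.4.6."""
    return 1 / s**2 - 3 * math.exp(-2 * s) / s**2 + 2 * math.exp(-3 * s) / s**2

for s in (0.5, 1.0, 2.0):              # sample values, chosen only for illustration
    print(s, transform_numeric(s), transform_formula(s))
```

Splitting the integral at $t = 2$ keeps each Simpson panel on a smooth piece, so the corner in $f$ does not degrade the accuracy.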

At this point, we have become familiar with piecewise-defined functions and how the Laplace transform may be applied to them. In the near future, we will be solving initial-value problems of the form

$$y' + 2y = 6 \cdot u(t-4), \qquad y(0) = 1$$

through the use of Laplace transforms. In order to assess our progress to date, we explore this approach briefly here. Taking the transform of both sides of the differential equation,

$$s\mathcal{L}[y] - 1 + 2\mathcal{L}[y] = \frac{6e^{-4s}}{s}$$

Letting $Y(s) = \mathcal{L}[y]$ and solving for $Y(s)$, it follows that

$$Y(s)(s+2) = 1 + \frac{6e^{-4s}}{s}$$

so

$$Y(s) = \frac{1}{s+2} + 6e^{-4s}\,\frac{1}{s(s+2)} \tag{5.4.9}$$

Here, it remains to determine the function $y(t)$ whose Laplace transform is $Y(s)$. That is, we must compute the inverse Laplace transform of the right-hand side of (5.4.9). Doing so involves using the inverse perspective on the second shifting property, as well as some algebraic work with the quantity $1/(s(s+2))$. We will pursue these and related ideas further in subsequent sections.

Next, however, we turn our attention to the study of impulse functions that can model phenomena such as the striking of a hammer.

5.4.2 The Dirac delta function

In physical situations where a large force is delivered over a very short time interval, unit step functions are no longer sufficient to model the forcing function. For example, if a hammer is used to strike a mass attached to a spring at a given time, it is not immediately clear how we should represent this forcing function. To address this situation, physicist Paul Dirac proposed what is today called the Dirac delta function, denoted $\delta(t)$. We seek to understand this function by first examining what happens when a force of constant magnitude acts over a smaller and smaller time interval.

Suppose that a force $F_h$ of constant magnitude acts on an object over the time interval $[a-h, a+h]$, where $a > 0$. Assume that the force is zero otherwise. The impulse (or amount of push) of the force is defined by

$$I = \int_{a-h}^{a+h} F_h\,dt \tag{5.4.10}$$

If we want this constant force $F_h$ to deliver a one-unit impulse, it follows that

$$F_h = \frac{1}{2h}$$

More specifically, if we wish to view the delivered force $F_h$ as being generated by a forcing function $F_h(t)$, we can use the unit step function to express $F_h(t)$ through the formula

$$F_h(t) = \frac{1}{2h}\left[u(t-(a-h)) - u(t-(a+h))\right] \tag{5.4.11}$$

A plot of $F_h(t)$ for several different values of $h$ is shown in figure 5.5; the vertical lines in each are technically not a part of the graph of $F_h(t)$, but are included to help contrast the different values of $h$.

[Figure 5.5: The forcing function $F_h(t)$ for $h = 0.2$, $h = 0.1$, and $h = 0.05$.]

Note particularly that $F_h(t)$ satisfies the property that

$$\int_{-\infty}^{\infty} F_h(t)\,dt = 1 \tag{5.4.12}$$

and that as $h \to 0$, the magnitude of the force grows without bound in order to maintain the same total amount of push being delivered.

For an actual impulse, such as when a hammer strikes a mass, we want the force to act instantaneously at time $t = a$, where $a > 0$. This instantaneous impulse function is known as the Dirac delta function,⁴ denoted $\delta(t-a)$, and is determined by letting $h \to 0$ in $F_h(t)$. In particular, we note two key properties of $\delta(t-a)$:

I. $\displaystyle \delta(t-a) = \lim_{h\to 0} F_h(t) = \lim_{h\to 0} \frac{1}{2h}\left[u(t-(a-h)) - u(t-(a+h))\right]$

II. $\displaystyle \int_{-\infty}^{\infty} \delta(t-a)\,dt = 1$

Property I is the definition of the Dirac delta function; Property II is a consequence of (5.4.12) and taking the limit as $h \to 0$. A good way to think of $\delta(t-a)$ is as a function that is zero everywhere except at $a$, but infinite right at $a$. Actually, $\delta(t-a)$ is a limit of step functions that are nonzero over shorter and shorter intervals, but that always enclose an area of one unit, thus having spikes that grow in magnitude as the interval width shrinks.

In situations such as a mass being struck with a hammer, we can now use the delta function to model the forcing function. For instance, if a hammer strikes the mass at $t = 3$, we can model the forcing function by $f(t) = \delta(t-3)$. In order to solve initial-value problems that involve the delta function, it will be essential to know the Laplace transform of $\delta(t-a)$. To find it, we first compute the transform of the step function $F_h(t)$. In particular, by familiar properties of the Laplace transform,

$$\begin{aligned}
\mathcal{L}[F_h(t)] &= \mathcal{L}\left[\frac{1}{2h}\left[u(t-(a-h)) - u(t-(a+h))\right]\right] \\
&= \frac{1}{2h}\mathcal{L}[u(t-(a-h))] - \frac{1}{2h}\mathcal{L}[u(t-(a+h))] \\
&= \frac{1}{2h}\left[\frac{1}{s}e^{-(a-h)s} - \frac{1}{s}e^{-(a+h)s}\right] \\
&= \frac{e^{-as}}{2hs}\left(e^{hs} - e^{-hs}\right)
\end{aligned} \tag{5.4.13}$$

4 Technically, the Dirac delta function is not a function, because it has the unusual property that it is zero everywhere but a, and inﬁnite at t = a. Ultimately, the Laplace transform is what enables us to make sense of this function.

Since $\delta(t-a)$ is defined as the limit of $F_h(t)$ as $h \to 0$, we naturally define the Laplace transform of $\delta(t-a)$ to be the limit of the Laplace transform of $F_h(t)$ as $h \to 0$. In particular, from (5.4.13), some algebraic rearrangement, and an application of L'Hôpital's Rule, we can state that

$$\begin{aligned}
\lim_{h\to 0}\mathcal{L}[F_h(t)] &= \lim_{h\to 0}\frac{e^{-as}}{2hs}\left(e^{hs} - e^{-hs}\right) \\
&= \frac{e^{-as}}{s}\lim_{h\to 0}\frac{e^{hs} - e^{-hs}}{2h} \\
&= \frac{e^{-as}}{s}\lim_{h\to 0}\frac{se^{hs} + se^{-hs}}{2} \\
&= \frac{e^{-as}}{s}\cdot s = e^{-as}
\end{aligned}$$
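The limit just computed can also be observed numerically by evaluating the right-hand side of (5.4.13) for shrinking values of $h$. In the sketch below, the function name `transform_of_Fh` and the sample values of $a$ and $s$ are assumptions chosen only for illustration.

```python
import math

def transform_of_Fh(s, a, h):
    """L[F_h(t)] from (5.4.13): e^{-as} (e^{hs} - e^{-hs}) / (2hs)."""
    return math.exp(-a * s) * (math.exp(h * s) - math.exp(-h * s)) / (2 * h * s)

a, s = 3.0, 2.0                      # sample values, chosen only for illustration
target = math.exp(-a * s)            # the claimed limit e^{-as}

# The gap between L[F_h] and e^{-as} should shrink as h decreases.
for h in (0.2, 0.1, 0.05, 0.001):
    value = transform_of_Fh(s, a, h)
    print(h, value, abs(value - target))
```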

We therefore define $\mathcal{L}[\delta(t-a)] = e^{-as}$. We close this section with an example that foreshadows the use of the delta function in a spring-mass system and the role of Laplace transforms in solving the corresponding IVP.

Example 5.4.7 Consider a spring-mass system where $m = 1$, $k = 13$, and $c = 4$. Assume that the mass is initially displaced 1 m and released. Finally, assume that at $t = 3$, the mass is struck with a hammer in the positive direction. Set up and solve an initial-value problem that describes this situation.

Solution. Using the delta function, the given problem is a standard damped harmonic oscillator equation with an impulse forcing function. In particular, the displacement $y$ of the mass satisfies the initial-value problem

$$y'' + 4y' + 13y = \delta(t-3), \qquad y(0) = 1, \; y'(0) = 0 \tag{5.4.14}$$

Before we solve the IVP, we can use our intuition as a guide: we expect the size of the oscillations of the mass to decrease in magnitude until $t = 3$, at which time we expect the problem to restart as the blow from the hammer will increase the displacement of the mass, from which oscillations should eventually decrease to zero.

We begin to solve (5.4.14) by using the Laplace transform in order to see how far our method enables us to progress. Taking the Laplace transform of both sides of (5.4.14),

$$\mathcal{L}[y''] + 4\mathcal{L}[y'] + 13\mathcal{L}[y] = \mathcal{L}[\delta(t-3)]$$

From theorem 5.3.4 and corollary 5.3.5, it follows that

$$s^2\mathcal{L}[y] - sy(0) - y'(0) + 4s\mathcal{L}[y] - 4y(0) + 13\mathcal{L}[y] = \mathcal{L}[\delta(t-3)]$$

Using the conditions $y(0) = 1$ and $y'(0) = 0$, as well as the fact that $\mathcal{L}[\delta(t-3)] = e^{-3s}$, we now have

$$s^2\mathcal{L}[y] - s + 4s\mathcal{L}[y] - 4 + 13\mathcal{L}[y] = e^{-3s}$$

Solving for $\mathcal{L}[y] = Y(s)$, we see that $Y(s)(s^2 + 4s + 13) = s + 4 + e^{-3s}$, or

$$Y(s) = \frac{s+4}{s^2+4s+13} + \frac{e^{-3s}}{s^2+4s+13} \tag{5.4.15}$$

It remains for us to learn how to compute the inverse Laplace transform of (5.4.15) in order to find the solution $y$ to the IVP. The following sections are devoted to these ideas. Upon further study, we will be able to show that the function $y(t)$ that satisfies (5.4.15) is

$$y = \frac{1}{3}e^{-2t}(3\cos 3t + 2\sin 3t) + \frac{1}{3}u(t-3)e^{-2(t-3)}\sin 3(t-3)$$

A plot of this solution is shown in figure 5.6, where $y(t)$ demonstrates precisely the type of behavior we expect.

[Figure 5.6: The solution to the IVP (5.4.14).]

The Laplace transform helps us make sense of the Dirac delta function in several ways. One is that we can imagine wanting to say that a hammer strikes a mass with different intensities. If, say, we want to compare the results of the initial-value problems where a hammer strikes a mass to deliver a given impulse versus what happens when the hammer strikes the mass three times as hard, this at first seems to be nonsense: $\delta(t-3)$ and $3\delta(t-3)$ are both zero everywhere and infinite at $t = 3$. But the power of the Laplace transform rescues us again. Since by linearity, $\mathcal{L}[3\delta(t-3)] = 3\mathcal{L}[\delta(t-3)] = 3e^{-3s}$, the transform detects the difference in the amount of push delivered by the hammer strike, and the results are shown accordingly in the solution to the initial-value problem. In addition, since $\mathcal{L}[\delta(t-a)] = e^{-as}$, we know that the presence of $e^{-as}$ in $Y(s)$ will lead to the presence of $u(t-a)$ in $y(t)$: here we see how the delta function leads to a restart at $t = a$ as the function $u(t-a)$ turns on at this time in the function $y(t)$.
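The solution stated in example 5.4.7 can be tested informally with finite differences. The sketch below makes several assumptions that are not in the text: the helper names `residual` and `velocity_jump`, the step sizes, and the sample times. It checks that $y'' + 4y' + 13y$ is approximately zero away from $t = 3$, that $y(0) = 1$, and that the velocity $y'$ jumps by about one unit at $t = 3$, which is what a unit impulse acting on a unit mass should produce.

```python
import math

def u(t):
    """Unit step function from (5.4.1)."""
    return 1.0 if t >= 0 else 0.0

def y(t):
    """Candidate solution of the IVP (5.4.14) as stated in example 5.4.7."""
    free = math.exp(-2 * t) * (3 * math.cos(3 * t) + 2 * math.sin(3 * t)) / 3
    kick = u(t - 3) * math.exp(-2 * (t - 3)) * math.sin(3 * (t - 3)) / 3
    return free + kick

def residual(t, h=1e-4):
    """Finite-difference estimate of y'' + 4y' + 13y at a time t away from 3."""
    d1 = (y(t + h) - y(t - h)) / (2 * h)
    d2 = (y(t + h) - 2 * y(t) + y(t - h)) / h**2
    return d2 + 4 * d1 + 13 * y(t)

def velocity_jump(t0=3.0, h=1e-6):
    """Difference between one-sided derivative estimates of y at t0."""
    right = (y(t0 + h) - y(t0)) / h
    left = (y(t0) - y(t0 - h)) / h
    return right - left

print(y(0.0))                        # initial displacement
print(residual(1.0), residual(5.0))  # both should be near zero
print(velocity_jump())               # should be near 1
```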

5.4.3 The Heaviside and Dirac functions in Maple

Both the Heaviside and Dirac functions belong to Maple's library of basic functions. The syntax for the Heaviside function is simply

> Heaviside(t);

Similarly, the Dirac function is given by

> Dirac(t);

For work with the Heaviside function, we often denote the function by $u(t)$. In Maple, this can be accomplished with the command

> u := t -> Heaviside(t);

Then, to enter and plot a piecewise-defined function such as $f(t) = t(u(t) - u(t-2)) + (6-2t)(u(t-2) - u(t-3))$, we may use the syntax

> f := t -> t*(u(t)-u(t-2)) + (6-2*t)*(u(t-2)-u(t-3));
> plot(f(t), t=-1..5, color=black, thickness=2);

to generate the plot shown in figure 5.7. More on both the Heaviside function and the Dirac function in Maple, particularly related to their roles in solving initial-value problems with Laplace transforms, can be found in section 5.6.1.

[Figure 5.7: The function $f(t) = t(u(t) - u(t-2)) + (6-2t)(u(t-2) - u(t-3))$.]

Exercises 5.4

In exercises 1–7, sketch a graph of each of the following functions and write each in terms of unit step functions.

1. $f(t) = \begin{cases} 0, & \text{if } 0 \le t < 1 \\ 1, & \text{if } 1 \le t < 2 \\ 0, & \text{if } 2 \le t \end{cases}$

2. $f(t) = \begin{cases} 1, & \text{if } 0 \le t < 4 \\ 2, & \text{if } 4 \le t \end{cases}$

3. $f(t) = \begin{cases} 0, & \text{if } 0 \le t < 1 \\ t, & \text{if } 1 \le t < 2 \\ t^2, & \text{if } 2 \le t \end{cases}$

4. $f(t) = \begin{cases} t, & \text{if } 0 \le t < 2 \\ 0, & \text{if } 2 \le t \end{cases}$

5. $f(t) = \begin{cases} \sin t, & \text{if } 0 \le t < 2\pi \\ 0, & \text{if } 2\pi \le t \end{cases}$

6. $f(t) = \begin{cases} \sin t, & \text{if } 0 \le t < 2\pi \\ \sin 2t, & \text{if } 2\pi \le t \end{cases}$

7. $f(t) = \begin{cases} t, & \text{if } 0 \le t < 2 \\ 2, & \text{if } 2 \le t < 4 \\ 4 - t, & \text{if } 4 \le t \end{cases}$

8. Determine the Laplace transform of the function $f(t)$ given in
(a) Exercise 1
(b) Exercise 2
(c) Exercise 3
(d) Exercise 4
(e) Exercise 5
(f) Exercise 6
(g) Exercise 7

In exercises 9–11, compute the Laplace transform of $f(t)$.

9. $f(t) = 2[u(t-1) - u(t-3)] + \delta(t-5)$
10. $f(t) = 2\sin 5t + \delta(t-3)$
11. $f(t) = 2e^{-3t}\sin 2t + \delta(t-8)$

12. Set up, but do not solve, an initial-value problem that represents a spring-mass system with $m = 4$ kg, spring constant $k = 10$, and damping constant $c = 2$, where a unit impulse is delivered by a hammer at $t = 6$. Assume the units on all quantities are consistent and that the mass is initially displaced 0.25 m and released.

13. Set up, but do not solve, an initial-value problem that represents a spring-mass system with $m = 4$ kg, spring constant $k = 10$, and damping constant $c = 2$, where a forcing function $f(t) = 3\sin 2t$ is turned on at $t = 4$ and an impulse of magnitude 4 is delivered by a hammer at $t = 10$. Assume the units on all quantities are consistent and that the mass is initially displaced 0.25 m and released.

5.5 Solving IVPs with the Laplace transform

As we have seen in examples 5.3.5 and 5.4.7, in order to solve initial-value problems using the Laplace transform, the final step in the process is to answer the question "what function $y(t)$ has Laplace transform $Y(s)$?" In this section, we will further study the inverse Laplace transform, the process that takes the Laplace transform of an unknown function back to the function itself. Throughout, we motivate our work through examples of solving initial-value problems to see some of the typical functions $Y(s)$ that arise in this approach and the steps necessary to determine $y(t) = \mathcal{L}^{-1}[Y(s)]$.

Example 5.5.1 Use Laplace transforms to solve the initial-value problem

$$y' - 2y = 5, \qquad y(0) = 4$$

Solution. We begin by taking the Laplace transform of both sides of the differential equation. Using the linearity of the transform,

$$\mathcal{L}[y'] - 2\mathcal{L}[y] = 5\mathcal{L}[1]$$

By theorem 5.3.4 and the familiar transform of the function $f(t) = 1$, it follows that

$$s\mathcal{L}[y] - y(0) - 2\mathcal{L}[y] = \frac{5}{s}$$

Using the given fact that $y(0) = 4$ and denoting $\mathcal{L}[y] = Y(s)$,

$$sY(s) - 2Y(s) = 4 + \frac{5}{s} \tag{5.5.1}$$

Note particularly that (5.5.1) is now an algebraic equation in the unknown function $Y(s)$. Solving for $Y(s)$, we find

$$Y(s) = \frac{4s+5}{s(s-2)}$$

At this point, we recall that $Y(s) = \mathcal{L}[y]$, where $y(t)$ is the original unknown function we seek as the solution to the stated IVP. Solving the IVP has now been reduced to finding the function $y(t)$ that has Laplace transform $Y(s)$. That is, we seek $y(t) = \mathcal{L}^{-1}[Y(s)]$. With a bit of algebraic rearrangement and insight, we can find the function $y(t)$. In particular, using a partial fraction decomposition, we can show that

$$Y(s) = \frac{4s+5}{s(s-2)} = -\frac{5/2}{s} + \frac{13/2}{s-2} \tag{5.5.2}$$
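The decomposition in (5.5.2) can be confirmed with exact rational arithmetic. The sketch below uses the cover-up (Heaviside) method for distinct linear factors, which is one standard way to find the coefficients and is named here as an assumption about technique rather than as the method the text uses; the names `A`, `B`, `Y`, and `decomposed` are likewise illustrative.

```python
from fractions import Fraction

def numerator(s):
    """Numerator of Y(s) = (4s + 5) / (s (s - 2))."""
    return 4 * s + 5

# Cover-up method: delete one linear factor of the denominator and
# evaluate what remains at the root of the deleted factor.
A = Fraction(numerator(0), 0 - 2)    # coefficient of 1/s, evaluated at s = 0
B = Fraction(numerator(2), 2)        # coefficient of 1/(s - 2), evaluated at s = 2

def Y(s):
    s = Fraction(s)
    return (4 * s + 5) / (s * (s - 2))

def decomposed(s):
    s = Fraction(s)
    return A / s + B / (s - 2)

# Two rational functions with the same denominator that agree at more points
# than the degree of their numerators must be identical.
samples = [1, 3, 5, Fraction(7, 3)]
print(A, B, all(Y(s) == decomposed(s) for s in samples))
```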

360

Laplace transforms

Recalling that $\mathcal{L}[1] = 1/s$ and $\mathcal{L}[e^{2t}] = 1/(s - 2)$, (5.5.2) implies

$$y(t) = -\frac{5}{2} + \frac{13}{2} e^{2t}$$

This is precisely the solution we would find to the IVP were we to use an integrating factor or separation of variables to solve the differential equation.

Whenever we use the Laplace transform to solve an IVP, we will employ a process similar to our work in example 5.5.1:
(1) Take the transform of both sides of the stated differential equation to transform the differential equation in $y(t)$ into an algebraic equation in $Y(s) = \mathcal{L}[y]$;
(2) Use algebra to solve for $Y(s)$;
(3) Determine which function $y(t)$ has the Laplace transform $Y(s)$.

As we have noted previously, given a function $F(s)$, a function $f(t)$ such that $\mathcal{L}[f(t)] = F(s)$ is called the inverse Laplace transform of $F$. We use the notation $\mathcal{L}^{-1}[F(s)] = f(t)$. For our purposes, a good way to view the operator $\mathcal{L}^{-1}$ is as one that reverses the work of the Laplace transform. A key step in working backward will be to decompose the function $F(s)$ into more manageable pieces, often through a partial fraction decomposition. A review of partial fractions can be found in appendix A; partial fractions are an algebraic technique that proves useful for more than just integration, as we will see throughout this section.

Once the pieces of $F(s)$ are in a recognizable form, we use standard rules we have developed for Laplace transforms to compute the inverse transform. For example, after using partial fractions to decompose $Y(s)$ in example 5.5.1, we showed that since $\mathcal{L}[e^{2t}] = 1/(s - 2)$, it follows that

$$\mathcal{L}^{-1}\left[\frac{1}{s - 2}\right] = e^{2t}$$

More generally, we can state that

$$\mathcal{L}^{-1}\left[\frac{1}{s - a}\right] = e^{at} \tag{5.5.3}$$
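The partial fraction step behind (5.5.2) can be sketched numerically. The fragment below is a minimal illustration, assuming a rational function whose denominator is a product of distinct linear factors; the helper name `cover_up` is introduced here only for illustration and does not come from the text.

```python
def cover_up(numerator, roots):
    """Coefficients c_i in numerator(s)/prod(s - r_i) = sum of c_i/(s - r_i).

    Valid only when the roots are distinct and the numerator has lower
    degree than the denominator (the "cover-up" rule for simple poles).
    """
    coefficients = []
    for i, r in enumerate(roots):
        others = 1.0
        for j, q in enumerate(roots):
            if j != i:
                others *= (r - q)
        coefficients.append(numerator(r) / others)
    return coefficients


# Y(s) = (4s + 5) / (s (s - 2)) has simple poles at s = 0 and s = 2.
A, B = cover_up(lambda s: 4 * s + 5, [0.0, 2.0])
print(A, B)  # coefficients of 1/s and 1/(s - 2)

# Spot-check the decomposition at a sample point away from the poles.
s = 3.0
lhs = (4 * s + 5) / (s * (s - 2))
rhs = A / s + B / (s - 2)
print(abs(lhs - rhs))
```

With the coefficients in hand, linearity together with (5.5.3) gives the inverse transform term by term.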

Indeed, we realize that we can turn around any known relationship generated by the Laplace transform in order to make a statement about the inverse transform. For example, the inverse transform satisfies the linearity property stated in the following theorem.

Theorem 5.5.1 For every pair of constants $a$ and $b$,

$$\mathcal{L}^{-1}[aF(s) + bG(s)] = a\,\mathcal{L}^{-1}[F(s)] + b\,\mathcal{L}^{-1}[G(s)]$$


Both shifting properties we have developed are regularly used in their inverse form. For the first shifting property, given $\mathcal{L}[f(t)] = F(s)$, we know that for any real value of $a$,

$$\mathcal{L}[e^{at} f(t)] = F(s - a)$$

Stated differently, this first shifting property implies

$$\mathcal{L}^{-1}[F(s - a)] = e^{at} f(t) \tag{5.5.4}$$

Likewise, from the slightly revised version of the second shifting property, we know that

$$\mathcal{L}[u(t - a) f(t - a)] = e^{-as} \mathcal{L}[f(t)] = e^{-as} F(s)$$

and therefore, stated in inverse form,

$$\mathcal{L}^{-1}[e^{-as} F(s)] = u(t - a) f(t - a) \tag{5.5.5}$$

In our next example, we see how several of these fundamental concepts are employed in practice, specifically when step functions are involved.

Example 5.5.2 Use Laplace transforms to solve the initial-value problem

$$y' + y = 5u(t - 1), \quad y(0) = 4$$

Solution. Taking the Laplace transform of both sides of the differential equation and applying the initial condition,

$$s\mathcal{L}[y] - 4 + \mathcal{L}[y] = 5\mathcal{L}[u(t - 1)]$$

Using the established fact that $\mathcal{L}[u(t - 1)] = e^{-s}/s$ and letting $Y(s) = \mathcal{L}[y]$,

$$sY(s) - 4 + Y(s) = \frac{5e^{-s}}{s}$$

Solving for $Y(s)$,

$$Y(s) = \frac{4}{s + 1} + 5e^{-s} \frac{1}{s(s + 1)} \tag{5.5.6}$$

At this point, we need to use the inverse transform to solve for $y(t)$. Finding $\mathcal{L}^{-1}[4/(s + 1)]$ is straightforward: by linearity and the first shifting property,⁵

$$\mathcal{L}^{-1}\left[\frac{4}{s + 1}\right] = 4e^{-t} \tag{5.5.7}$$

⁵ We know $\mathcal{L}^{-1}[1/s] = 1$, and thus the first shifting property implies $\mathcal{L}^{-1}[1/(s + 1)] = e^{-t} \cdot 1$.

To deal with the remaining term in (5.5.6), we note that with $e^{-s}$ present we will need to use the second shifting property (5.5.5) in reverse. For this, it will be most useful to have the function

$$F(s) = \frac{1}{s(s + 1)}$$

in a simpler form. Using its partial fraction decomposition, we observe that

$$F(s) = \frac{1}{s} - \frac{1}{s + 1}$$

By (5.5.5), it now follows that

$$\mathcal{L}^{-1}\left[5e^{-s}\left(\frac{1}{s} - \frac{1}{s + 1}\right)\right] = 5u(t - 1)\left[1 - e^{-(t - 1)}\right] \tag{5.5.8}$$

Combining our work at (5.5.7) and (5.5.8) to determine $y(t)$ from (5.5.6), we have shown that

$$y(t) = 4e^{-t} + 5u(t - 1) - 5u(t - 1)e^{-(t - 1)}$$

Figure 5.8 The solution to the IVP of example 5.5.2.

A plot of this solution curve is shown in figure 5.8, where we see qualitative behavior consistent with what we would expect from the forcing function in the IVP. In particular, the forcing function is $5u(t - 1)$, which behaves as if the constant function 5 is turned on at $t = 1$ in the initial-value problem. For $t = 0$ to $t = 1$, we see the standard exponential decay that we would expect for the homogeneous equation $y' + y = 0$. But at $t = 1$, the solution function turns and begins to approach the equilibrium solution $y = 5$ that we expect in the nonhomogeneous equation $y' + y = 5$. We note specifically that the Laplace transform has successfully handled all of this at once, including the role of the initial condition $y(0) = 4$ and the corner in the solution function $y(t)$ at $t = 1$.

We next solve a second-order initial-value problem that involves the unit step function. Here, we will see how the higher order of the equation introduces additional complexity in determining the inverse Laplace transform needed to solve the IVP.
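Before moving on, the closed-form solution of example 5.5.2 can be sanity-checked against a direct numerical integration of $y' + y = 5u(t - 1)$, $y(0) = 4$. The following is a minimal sketch using the classical fourth-order Runge-Kutta method; the function names (`u`, `y_exact`, `rk4`) and the step count are choices made for this illustration only, and the convention $u(0) = 1$ is an assumption that does not affect the comparison.

```python
import math


def u(t):
    """Unit step function: 0 for t < 0, 1 for t >= 0 (convention assumed here)."""
    return 1.0 if t >= 0 else 0.0


def y_exact(t):
    """Closed-form solution found in example 5.5.2."""
    return 4 * math.exp(-t) + 5 * u(t - 1) * (1 - math.exp(-(t - 1)))


def rk4(f, y0, t_end, n):
    """Integrate y' = f(t, y) from t = 0 to t_end using n classical RK4 steps."""
    h = t_end / n
    t, y = 0.0, y0
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y


def rhs(t, y):
    """Right-hand side of y' = -y + 5 u(t - 1)."""
    return -y + 5 * u(t - 1)


for t_end in (0.5, 2.0, 6.0):
    print(t_end, rk4(rhs, 4.0, t_end, 20000), y_exact(t_end))
```

The two columns should agree closely; the small discrepancy that remains comes from the single step that straddles the jump in the forcing function at $t = 1$.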


Example 5.5.3 Use the Laplace transform to solve the initial-value problem

$$y'' + 2y' + 5y = u(t - 2), \quad y(0) = 1, \; y'(0) = 0 \tag{5.5.9}$$

Solution. Taking the Laplace transform of both sides of (5.5.9) and writing $Y(s) = \mathcal{L}[y(t)]$, we observe that

$$s^2 Y(s) - sy(0) - y'(0) + 2(sY(s) - y(0)) + 5Y(s) = \frac{e^{-2s}}{s}$$

Substituting the given initial conditions and factoring on the left, we have

$$Y(s)(s^2 + 2s + 5) = s + 2 + \frac{e^{-2s}}{s}$$

Solving for $Y(s)$, we can write

$$Y(s) = Y_1(s) + Y_2(s) = \frac{s + 2}{s^2 + 2s + 5} + e^{-2s} \frac{1}{s(s^2 + 2s + 5)} \tag{5.5.10}$$

It remains for us to determine the function $y(t)$ whose transform is $Y(s)$. By linearity, it helps to break the function $Y(s)$ into the simplest pieces we can; we begin by determining the inverse transform of $Y_1(s)$. Because of the shifting properties of the transform (and because $s^2 + 2s + 5$ does not factor over the real numbers, so partial fractions cannot split it further), it is useful to complete the square in expressions such as $s^2 + 2s + 5$. We therefore write $s^2 + 2s + 5 = (s + 1)^2 + 4$, and seek to identify other parts of the expression that involve $(s + 1)$. Separating the numerator $(s + 2)$ into $(s + 1) + 1$, we can express the first term in (5.5.10) as

$$Y_1(s) = \frac{s + 2}{s^2 + 2s + 5} = \frac{s + 1}{(s + 1)^2 + 4} + \frac{1}{(s + 1)^2 + 4} \tag{5.5.11}$$

Recalling that $\mathcal{L}[\cos 2t] = s/(s^2 + 4)$ and $\mathcal{L}[\sin 2t] = 2/(s^2 + 4)$, we know

$$\mathcal{L}^{-1}\left[\frac{s}{s^2 + 4}\right] = \cos 2t \quad \text{and} \quad \mathcal{L}^{-1}\left[\frac{2}{s^2 + 4}\right] = \sin 2t$$

The inverse of the first shifting property, $\mathcal{L}^{-1}[F(s + 1)] = e^{-t} f(t)$, now implies that

$$\mathcal{L}^{-1}\left[\frac{s + 1}{(s + 1)^2 + 4} + \frac{1}{(s + 1)^2 + 4}\right] = e^{-t} \cos 2t + \frac{1}{2} e^{-t} \sin 2t \tag{5.5.12}$$

Hence, the first term $Y_1(s)$ in (5.5.10) comes from taking the Laplace transform of the function $y_1(t) = e^{-t} \cos 2t + \frac{1}{2} e^{-t} \sin 2t$. From (5.5.10), it remains for us to find the function $y_2(t)$ whose Laplace transform is

$$Y_2(s) = e^{-2s} \frac{1}{s(s^2 + 2s + 5)}$$

Using a partial fraction decomposition on the rational part of the function, we have

$$e^{-2s} \frac{1}{s(s^2 + 2s + 5)} = \frac{1}{5}\left[\frac{1}{s} e^{-2s} - \frac{s + 2}{s^2 + 2s + 5} e^{-2s}\right]$$

Figure 5.9 The solution $y(t)$ to the IVP in example 5.5.3.

Observe that we have already determined the inverse transform of the function $(s + 2)/(s^2 + 2s + 5)$ above at (5.5.12). Here, we must deal with the additional presence of the constant 1/5, the multiplier $e^{-2s}$, and the basic function $1/s$. Recalling the inverse second shifting property, $\mathcal{L}^{-1}[e^{-as} F(s)] = u(t - a) f(t - a)$, and (5.5.12), we observe that

$$\mathcal{L}^{-1}\left[e^{-2s}\left(\frac{1}{s} - \frac{s + 2}{s^2 + 2s + 5}\right)\right] = u(t - 2)\left[1 - e^{-(t - 2)}\left(\cos 2(t - 2) + \frac{1}{2} \sin 2(t - 2)\right)\right] \tag{5.5.13}$$

Combining (5.5.10), (5.5.12), and (5.5.13), we have shown that the solution $y(t)$ to the initial-value problem is

$$y(t) = e^{-t} \cos 2t + \frac{e^{-t}}{2} \sin 2t + \frac{1}{5} u(t - 2)\left[1 - e^{-(t - 2)} \cos 2(t - 2) - \frac{e^{-(t - 2)}}{2} \sin 2(t - 2)\right]$$

A plot of the function $y(t)$ is shown in figure 5.9. Here, we see evidence of the qualitative behavior we expect: until the unit step function turns on, the homogeneous equation should show damped oscillations so that $y(t) \to 0$. But once the step function turns on, the forcing function makes the equation nonhomogeneous with a constant forcing function, making $y = 1/5$ the stable equilibrium solution to which $y(t)$ tends.

To further explore the ideas that arise in computing inverse transforms, we next consider a slight modification of the preceding example, but in an applied setting where a more complicated forcing function is present. In particular, we examine a spring-mass system in which a periodic forcing function is introduced at $t = \pi$.
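Before turning to that example, one way to gain confidence in the solution of example 5.5.3 is to substitute it back into the differential equation. The sketch below estimates $y'' + 2y' + 5y - u(t - 2)$ with central differences at a few sample times away from $t = 2$; the names `y` and `residual` and the step size `h` are choices made for this illustration, not part of the text.

```python
import math


def u(t):
    """Unit step function (value at t = 0 taken to be 1, an assumption)."""
    return 1.0 if t >= 0 else 0.0


def y(t):
    """Closed-form solution from example 5.5.3."""
    free = math.exp(-t) * (math.cos(2 * t) + 0.5 * math.sin(2 * t))
    tau = t - 2
    forced = 0.2 * u(tau) * (
        1 - math.exp(-tau) * (math.cos(2 * tau) + 0.5 * math.sin(2 * tau))
    )
    return free + forced


def residual(t, h=1e-4):
    """Central-difference estimate of y'' + 2y' + 5y - u(t - 2) at time t."""
    d1 = (y(t + h) - y(t - h)) / (2 * h)
    d2 = (y(t + h) - 2 * y(t) + y(t - h)) / h**2
    return d2 + 2 * d1 + 5 * y(t) - u(t - 2)


# Sample times on both sides of t = 2 (but not within h of it).
worst = max(abs(residual(t)) for t in (0.5, 1.0, 1.5, 2.5, 4.0, 8.0))
print(worst)    # should be close to zero
print(y(0.0))   # initial displacement
print(y(40.0))  # long-run value, near the equilibrium 1/5
```

A residual near zero on both sides of $t = 2$, together with the correct initial value and long-run limit, is consistent with the formula derived above.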


Example 5.5.4 Consider a mass of 1 kg attached to a spring with spring constant $k = 13$ such that the system has damping constant $c = 4$. Assume that the mass is displaced 1 m from equilibrium and released at $t = 0$; furthermore, at time $t = \pi$ the forcing function $f(t) = 2 \sin 3t$ is applied. Assuming consistent units, set up an IVP that models this situation and solve the IVP using Laplace transforms.

Solution. From our work with spring-mass systems, we know that the displacement $y(t)$ of the mass from equilibrium must satisfy the initial-value problem

$$y'' + 4y' + 13y = 2u(t - \pi) \sin 3t, \quad y(0) = 1, \; y'(0) = 0$$

Taking Laplace transforms, it follows that

$$s^2 Y(s) - sy(0) - y'(0) + 4(sY(s) - y(0)) + 13Y(s) = 2\mathcal{L}[u(t - \pi) \sin 3t] \tag{5.5.14}$$

We know that $\mathcal{L}[\sin 3t] = 3/(s^2 + 9)$, and by the second shifting property

$$\mathcal{L}[u(t - \pi) \sin 3t] = e^{-\pi s} \mathcal{L}[\sin 3(t + \pi)] \tag{5.5.15}$$

At this point, we observe by basic trigonometry that $\sin(3t + 3\pi) = \sin 3t \cos 3\pi + \cos 3t \sin 3\pi = -\sin 3t$. Hence, from (5.5.15) we have

$$\mathcal{L}[u(t - \pi) \sin 3t] = e^{-\pi s} \mathcal{L}[-\sin 3t] = -e^{-\pi s} \frac{3}{s^2 + 9}$$

Returning to (5.5.14) and using the given initial conditions, it follows that

$$s^2 Y(s) - s + 4sY(s) - 4 + 13Y(s) = -2e^{-\pi s} \frac{3}{s^2 + 9}$$

Factoring,

$$Y(s)(s^2 + 4s + 13) = s + 4 - 2e^{-\pi s} \frac{3}{s^2 + 9}$$

Solving for $Y(s)$,

$$Y(s) = Y_1(s) + Y_2(s) = \frac{s + 4}{s^2 + 4s + 13} - 2e^{-\pi s} \frac{3}{(s^2 + 9)(s^2 + 4s + 13)} \tag{5.5.16}$$

It remains to find the inverse transform of $Y(s)$; we do so one piece at a time using the linearity of the inverse transform. In both $Y_1(s)$ and $Y_2(s)$, we will algebraically rearrange the expression in order to help us more easily determine the inverse Laplace transform, using an approach similar to our work in example 5.5.3.

Taking the first term in (5.5.16), we observe that since the denominator does not factor, we need to write it in a more familiar form. Completing the square and separating the numerator enables us to write

$$Y_1(s) = \frac{s + 4}{(s + 2)^2 + 9} = \frac{s + 2}{(s + 2)^2 + 9} + \frac{2}{(s + 2)^2 + 9}$$

and see the structure of Laplace transforms of basic functions. In particular, from the first shifting property and the known Laplace transforms of $\cos 3t$ and $\sin 3t$, it follows that

$$\mathcal{L}^{-1}[Y_1(s)] = \mathcal{L}^{-1}\left[\frac{s + 2}{(s + 2)^2 + 9} + \frac{2}{(s + 2)^2 + 9}\right] = e^{-2t} \cos 3t + \frac{2}{3} e^{-2t} \sin 3t \tag{5.5.17}$$

Next we find the inverse transform of the term $Y_2(s)$ in (5.5.16). That is, we must determine

$$\mathcal{L}^{-1}[Y_2(s)] = \mathcal{L}^{-1}\left[-6e^{-\pi s} \frac{1}{(s^2 + 9)(s^2 + 4s + 13)}\right] \tag{5.5.18}$$

From the presence of $e^{-\pi s}$, we know the second shifting property will be used; in addition, we must algebraically rearrange the remaining part of the expression in order to find the inverse transform. Computing the partial fraction decomposition of the rational function in (5.5.18), we equivalently seek

$$\mathcal{L}^{-1}[Y_2(s)] = \mathcal{L}^{-1}\left[\frac{6}{40} e^{-\pi s}\left(\frac{s - 1}{s^2 + 9} - \frac{s + 3}{s^2 + 4s + 13}\right)\right] \tag{5.5.19}$$

One additional rearrangement will enable us to find the desired inverse transform. Completing the square in the second fraction and separating the numerator in each enables us to rewrite (5.5.19) as

$$\mathcal{L}^{-1}[Y_2(s)] = \mathcal{L}^{-1}\left[\frac{6}{40} e^{-\pi s}\left(\frac{s}{s^2 + 9} - \frac{1}{s^2 + 9} - \frac{s + 2}{(s + 2)^2 + 9} - \frac{1}{(s + 2)^2 + 9}\right)\right]$$

Applying the inverse of the second shifting property to each of the terms in $\mathcal{L}^{-1}[Y_2(s)]$, it follows that

$$\mathcal{L}^{-1}[Y_2(s)] = \frac{6}{40} u(t - \pi)\left[\cos 3(t - \pi) - \frac{1}{3} \sin 3(t - \pi) - e^{-2(t - \pi)} \cos 3(t - \pi) - \frac{1}{3} e^{-2(t - \pi)} \sin 3(t - \pi)\right] \tag{5.5.20}$$

Noting that $\sin(3t - 3\pi) = -\sin 3t$ and $\cos(3t - 3\pi) = -\cos 3t$, we can simplify (5.5.20) to

$$\mathcal{L}^{-1}[Y_2(s)] = \frac{3}{20} u(t - \pi)\left[-\cos 3t + \frac{1}{3} \sin 3t + e^{-2(t - \pi)}\left(\cos 3t + \frac{1}{3} \sin 3t\right)\right]$$

Combining our work with $\mathcal{L}^{-1}[Y_1(s)]$ and $\mathcal{L}^{-1}[Y_2(s)]$, we have therefore shown that $y(t) = \mathcal{L}^{-1}[Y(s)]$ is the function

$$y(t) = e^{-2t}\left(\cos 3t + \frac{2}{3} \sin 3t\right) + \frac{3}{20} u(t - \pi)\left[-\cos 3t + \frac{1}{3} \sin 3t + e^{-2(t - \pi)}\left(\cos 3t + \frac{1}{3} \sin 3t\right)\right]$$
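Because the algebra in example 5.5.4 is long, an independent numerical check is worthwhile. The sketch below rewrites the IVP as a first-order system in $(y, y')$, integrates it with the classical Runge-Kutta method, and compares the result with the closed form written as $y(t) = e^{-2t}(\cos 3t + \frac{2}{3} \sin 3t) + \frac{3}{20} u(t - \pi)[-\cos 3t + \frac{1}{3} \sin 3t + e^{-2(t - \pi)}(\cos 3t + \frac{1}{3} \sin 3t)]$. All function names and the step count are assumptions made for this illustration.

```python
import math


def u(t):
    """Unit step function (value at t = 0 taken to be 1, an assumption)."""
    return 1.0 if t >= 0 else 0.0


def y_exact(t):
    """Closed-form displacement for example 5.5.4."""
    free = math.exp(-2 * t) * (math.cos(3 * t) + (2 / 3) * math.sin(3 * t))
    if t < math.pi:
        return free
    transient = math.exp(-2 * (t - math.pi)) * (math.cos(3 * t) + math.sin(3 * t) / 3)
    forced = 0.15 * (-math.cos(3 * t) + math.sin(3 * t) / 3 + transient)
    return free + forced


def rhs(t, state):
    """First-order form of y'' + 4y' + 13y = 2 u(t - pi) sin 3t."""
    y, v = state
    return [v, 2 * u(t - math.pi) * math.sin(3 * t) - 4 * v - 13 * y]


def rk4_system(f, state, t_end, n):
    """Integrate state' = f(t, state) from 0 to t_end with n RK4 steps."""
    h = t_end / n
    t = 0.0
    for _ in range(n):
        k1 = f(t, state)
        k2 = f(t + h / 2, [s + h / 2 * k for s, k in zip(state, k1)])
        k3 = f(t + h / 2, [s + h / 2 * k for s, k in zip(state, k2)])
        k4 = f(t + h, [s + h * k for s, k in zip(state, k3)])
        state = [
            s + h / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)
        ]
        t += h
    return state


for t_end in (2.0, 5.0, 9.0):
    numeric = rk4_system(rhs, [1.0, 0.0], t_end, 20000)[0]
    print(t_end, numeric, y_exact(t_end))
```

Agreement on both sides of $t = \pi$ indicates that the signs and coefficients in the step-function term are consistent with the differential equation.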

Figure 5.10 The solution to the IVP in example 5.5.4.

A plot of the function $y(t)$ is given in figure 5.10. Until the forcing function activates at $t = \pi$, we see the standard damped oscillations decaying to zero. When the periodic forcing function turns on, the system demonstrates the repeating oscillations generated by this function.

At this point in our work, we have been exposed to most of the main ideas necessary for using the Laplace transform to solve initial-value problems. In addition to knowing the standard properties of the transform and its effects on basic functions, we must understand how to compute the inverse transform and the algebraic rearrangements that such inversion entails. Specifically, we have seen in several examples the need to determine partial fraction decompositions, complete the square, and separate the numerator in fractions. For example, the key computations necessary to find the inverse transform of the function

$$F(s) = \frac{11}{s(s^2 + 6s + 11)}$$

are to first determine the partial fraction decomposition and write

$$F(s) = \frac{1}{s} - \frac{s + 6}{s^2 + 6s + 11}$$

The first term is straightforward to invert, but the second term requires further manipulation. Completing the square in the denominator, we see that $s^2 + 6s + 11 = (s + 3)^2 + 2$, and therefore it is convenient to write the numerator as $s + 6 = (s + 3) + 3$. Doing so,

$$F(s) = \frac{1}{s} - \frac{s + 3}{(s + 3)^2 + 2} - \frac{3}{(s + 3)^2 + 2}$$

It is at this point, together with the first shifting property, that we can finally compute $\mathcal{L}^{-1}[F(s)]$ and find

$$f(t) = \mathcal{L}^{-1}[F(s)] = 1 - e^{-3t} \cos \sqrt{2}\,t - \frac{3}{\sqrt{2}} e^{-3t} \sin \sqrt{2}\,t$$
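The "complete the square, then separate the numerator" routine used throughout this section is mechanical enough to sketch in a few lines. The helper below, whose name `invert_quadratic_term` and return convention are assumptions for this illustration, takes a term $(\alpha s + \beta)/(s^2 + bs + c)$ with an irreducible denominator and reports the decay rate, frequency, and cosine and sine coefficients of its inverse transform.

```python
import math


def invert_quadratic_term(alpha, beta, b, c):
    """Invert (alpha*s + beta)/(s^2 + b*s + c) when the denominator has no real roots.

    Returns (h, omega, A, B), meaning
        f(t) = exp(-h*t) * (A*cos(omega*t) + B*sin(omega*t)).
    """
    h = b / 2                      # s^2 + b*s + c = (s + h)^2 + omega^2
    omega_sq = c - h * h
    if omega_sq <= 0:
        raise ValueError("denominator factors over the reals; use partial fractions")
    omega = math.sqrt(omega_sq)
    # Separate the numerator: alpha*s + beta = alpha*(s + h) + (beta - alpha*h).
    return h, omega, alpha, (beta - alpha * h) / omega


# (s + 6)/(s^2 + 6s + 11): the second term of F(s) in the summary above.
print(invert_quadratic_term(1, 6, 6, 11))
# (s + 2)/(s^2 + 2s + 5) from example 5.5.3 and (s + 4)/(s^2 + 4s + 13) from example 5.5.4.
print(invert_quadratic_term(1, 2, 2, 5))
print(invert_quadratic_term(1, 4, 4, 13))
```

Each tuple can be read against the corresponding hand computation; for instance, the first corresponds to $e^{-3t}\cos\sqrt{2}\,t + \frac{3}{\sqrt{2}} e^{-3t} \sin \sqrt{2}\,t$.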


Finally, we have seen that the second shifting property also plays an important role. In the presence of the unit step function $u(t - a)$, the multiplier $e^{-as}$ will arise in $F(s)$. In that case, we must invert $e^{-as} F(s)$; doing so, we get $u(t - a) f(t - a)$, as opposed to simply $f(t)$.

In light of these overall comments, we see the need to practice the computation of inverse Laplace transforms so that we can use these concepts in the solution of initial-value problems. In the next section, we will summarize key properties of the inverse transform, consider a few additional examples of more complicated inverse transforms, demonstrate the role technology plays in computations, and provide exercises for additional practice. We close the current section with an example involving the Dirac delta function.

Example 5.5.5 Consider an undamped spring-mass system with spring constant $k = 4$. Suppose that the mass is displaced 1 unit from equilibrium and struck with a force to impart an initial velocity of $y'(0) = 1$. In addition, at times $t = 7$ and $t = 20$, a hammer delivers a one-unit impulse to the mass in the positive direction. Assuming consistent units, set up and solve an IVP that models this situation.

Solution. We use the Dirac delta function to represent the impulse forces delivered at times $t = 7$ and $t = 20$.
Coupled with the standard equation to represent the spring-mass system, we see that the displacement $y(t)$ of the mass at time $t$ satisfies the initial-value problem

$$y'' + 4y = \delta(t - 7) + \delta(t - 20), \quad y(0) = 1, \; y'(0) = 1$$

To solve the IVP, we begin by taking Laplace transforms and find that

$$s^2 Y(s) - sy(0) - y'(0) + 4Y(s) = \mathcal{L}[\delta(t - 7)] + \mathcal{L}[\delta(t - 20)]$$

Recalling that $\mathcal{L}[\delta(t - a)] = e^{-as}$ and using the given initial conditions, $Y(s)$ must satisfy the equation

$$s^2 Y(s) - s - 1 + 4Y(s) = e^{-7s} + e^{-20s}$$

Factoring,

$$Y(s)(s^2 + 4) = s + 1 + e^{-7s} + e^{-20s}$$

and therefore

$$Y(s) = \frac{s}{s^2 + 4} + \frac{1}{s^2 + 4} + e^{-7s} \frac{1}{s^2 + 4} + e^{-20s} \frac{1}{s^2 + 4}$$

Using the second shifting property to find the inverse of the last two terms on the right, we find

$$y(t) = \mathcal{L}^{-1}[Y(s)] = \cos 2t + \frac{1}{2} \sin 2t + \frac{1}{2} u(t - 7) \sin 2(t - 7) + \frac{1}{2} u(t - 20) \sin 2(t - 20)$$

Figure 5.11 The solution to the IVP of example 5.5.5.

A plot of the solution function $y(t)$ is shown in figure 5.11. We know that because the system is undamped, once it is set in motion it will oscillate at the same amplitude indefinitely in the absence of other forces. When the hammer blows are delivered at $t = 7$ and $t = 20$, the amplitude of oscillation changes. At first the observed behavior may seem counterintuitive, as the hammer strikes are diminishing the amount of oscillation. However, if we note that the impulses are delivered in the positive direction at a time when the mass is traveling in the negative direction, then, indeed, the resulting solution accurately models the physical situation.

It is interesting to explore how delivering the impulses at other times impacts the system. Note that our work with Laplace transforms in example 5.5.5 is essentially unchanged by the times the impulses occur. In particular, if the hammer strikes occur at $t = a$ and $t = b$, then the solution will be

$$y(t) = \cos 2t + \frac{1}{2} \sin 2t + \frac{1}{2} u(t - a) \sin 2(t - a) + \frac{1}{2} u(t - b) \sin 2(t - b)$$

If we choose $a = 9$ and $b = 18$, we see substantially different behavior in the solution function due to the fact that these impulses occur in the same direction as the motion at the time they are delivered. A plot of the solution $y(t)$ in this case is shown in figure 5.12.

Exercises 5.5

In exercises 1–20, solve the stated initial-value problem using Laplace transforms. In each case, sketch a plot of your solution.

1. $y' + 5y = 20$, $y(0) = 3$
2. $y' + 3y = e^{2t}$, $y(0) = -2$
3. $y' - 2y = e^{2t}$, $y(0) = 1$
4. $y' + 4y = \sin 3t$, $y(0) = 5$
5. $y' + y = te^{t}$, $y(0) = -1$
Figure 5.12 The solution to the IVP of example 5.5.5 where the impulses instead occur at $t = 9$ and $t = 18$.

6. $y' - 8y = u(t - 1)$, $y(0) = -4$
7. $y' - 8y = u(t - 3) \cdot t$, $y(0) = -4$
8. $y' - 8y = \delta(t - 1)$, $y(0) = -4$
9. $y'' + 9y = 0$, $y(0) = 0$, $y'(0) = 5$
10. $y'' - 9y = 0$, $y(0) = 2$, $y'(0) = 0$
11. $y'' + 9y = 2$, $y(0) = 0$, $y'(0) = 1$
12. $y'' + 9y = 5 \cos t$, $y(0) = 0$, $y'(0) = 0$
13. $y'' + 9y = 5 \cos 3t$, $y(0) = 0$, $y'(0) = 0$
14. $y'' + 7y' + 12y = 0$, $y(0) = 0$, $y'(0) = 3$
15. $y'' + 6y' + 9y = 0$, $y(0) = 2$, $y'(0) = 0$
16. $y'' + 2y' + y = 3t$, $y(0) = 0$, $y'(0) = 0$
17. $y'' + 2y' + 5y = u(t - 4)$, $y(0) = 1$, $y'(0) = 0$
18. $y'' - 2y' - 3y = u(t - 3)$, $y(0) = 2$, $y'(0) = 0$
19. $y'' - 2y' - 3y = u(t - 3)$, $y(0) = 2$, $y'(0) = 0$
20. $y'' + 2y' + 5y = \delta(t - 1)$, $y(0) = 0$, $y'(0) = 0$

For exercises 21–26, solve the stated initial-value problem from exercises 1–20 by standard means developed in preceding chapters (i.e., without using Laplace transforms).

21. $y' + 3y = e^{2t}$, $y(0) = -2$
22. $y' + 4y = \sin 3t$, $y(0) = 5$
23. $y' + y = te^{t}$, $y(0) = -1$
24. $y'' + 9y = 2$, $y(0) = 0$, $y'(0) = 1$
25. $y'' + 9y = 5 \cos 3t$, $y(0) = 0$, $y'(0) = 0$
26. $y'' + 2y' + y = 3t$, $y(0) = 0$, $y'(0) = 0$

In exercises 27–32, use Laplace transforms to determine the displacement $y(t)$ of the spring-mass system with spring constant $k = 72$ and mass $m = 2$ kg for the given forcing function $f(t)$. In each case, assume the system starts from rest; solve for $y(t)$ in the cases where the damping constant $c$ is (a) $c = 0$, (b) $c = 2$, (c) $c = 24$, and (d) $c = 40$, assuming consistent units. Sketch a plot of each solution.

27. $f(t) = 2$
28. $f(t) = 10 \sin 2t$
29. $f(t) = 10 \sin 6t$
30. $f(t) = 10[u(t) - u(t - 4\pi)]$
31. $f(t) = 10e^{-0.2t}$
32. $f(t) = 100\delta(t)$

In exercises 33–38, consider an RLC circuit for which an inductor of $L = 1$ H and capacitor $C = 0.01$ F are present. For each given forcing function $f(t)$, use Laplace transforms to determine the charge $Q(t)$ and current $I(t)$ in the circuit at time $t$ if initially $Q(0) = 0$ and $I(0) = 0$. Determine the charge and current in the cases where the resistance is (a) $R = 0\ \Omega$, (b) $R = 16\ \Omega$, (c) $R = 20\ \Omega$, and (d) $R = 25\ \Omega$, assuming consistent units. Sketch a plot of each solution.

33. $f(t) = 10$
34. $f(t) = 10 \sin 10t$
35. $f(t) = 5 \sin 10t$
36. $f(t) = 10[u(t) - u(t - 2\pi)]$
37. $f(t) = 10\delta(t)$
38. $f(t) = 20e^{-t}$

5.6 More on the inverse Laplace transform

In this section, we provide an overall summary of properties of the inverse transform and present some further practice with computations. We close with a discussion of how transforms and inverse transforms may be found using a computer algebra system. To begin, table 5.3 provides a list of familiar functions F (s) and their inverse transforms, as well as several key general properties of the inverse transform.


Table 5.3 Inverse Laplace transforms of some basic functions and other fundamental properties.

| $F(s)$ | $f(t) = \mathcal{L}^{-1}[F(s)]$ |
|---|---|
| $1/s^{n+1}$ | $t^n/n!$ |
| $1/(s - a)$ | $e^{at}$ |
| $s/(s^2 + k^2)$ | $\cos kt$ |
| $k/(s^2 + k^2)$ | $\sin kt$ |
| $s/(s^2 - k^2)$ | $\cosh kt$ |
| $k/(s^2 - k^2)$ | $\sinh kt$ |
| $aF(s) + bG(s)$ | $af(t) + bg(t)$ |
| $F(s - a)$ | $e^{at} f(t)$ |
| $e^{-as}$ | $\delta(t - a)$ |
| $e^{-as} F(s)$ | $u(t - a) f(t - a)$ |

Most of the lines in the table are derived from taking the inverse perspective on statements in tables 5.1 and 5.2. While full tables of Laplace transforms typically number many pages, we present only a small collection for use in standard problems involving spring-mass systems and RLC circuits, leaving other examples for exploration in other sources or computer algebra systems. The next several examples demonstrate standard techniques in the computation of inverse transforms.

Example 5.6.1 Determine $\mathcal{L}^{-1}[F(s)]$ for each of the following functions:

(a) $F(s) = \dfrac{e^{-2s}}{s(s + 1)^2}$  (b) $F(s) = \dfrac{2}{s^4 + 4s^2}$  (c) $F(s) = \dfrac{4se^{-2\pi s}}{(s^2 + 2s + 5)(s^2 + 9)}$

Solution. (a) Because of the presence of $e^{-2s}$ in $F(s)$, we will use the second shifting property. But first, we find the partial fraction decomposition

$$\frac{1}{s(s + 1)^2} = \frac{1}{s} - \frac{1}{s + 1} - \frac{1}{(s + 1)^2}$$

and note that

$$\mathcal{L}^{-1}\left[\frac{1}{s(s + 1)^2}\right] = \mathcal{L}^{-1}\left[\frac{1}{s} - \frac{1}{s + 1} - \frac{1}{(s + 1)^2}\right] = 1 - e^{-t} - te^{-t}$$


Now, in order to compute the inverse transform of the given function, we use the second shifting property to address the presence of $e^{-2s}$ in each term and thus find that

$$\mathcal{L}^{-1}\left[\frac{e^{-2s}}{s(s + 1)^2}\right] = u(t - 2)\left[1 - e^{-(t - 2)} - (t - 2)e^{-(t - 2)}\right]$$

(b) Partial fractions shows that

$$F(s) = \frac{2}{s^4 + 4s^2} = \frac{1}{2}\left[\frac{1}{s^2} - \frac{1}{s^2 + 4}\right]$$

Using the inverses of familiar transforms of $f(t) = t$ and $f(t) = \sin 2t$, we see

$$\mathcal{L}^{-1}[F(s)] = \frac{1}{2}\left[t - \frac{1}{2} \sin 2t\right]$$

(c) Given the function

$$F(s) = \frac{4se^{-2\pi s}}{(s^2 + 2s + 5)(s^2 + 9)}$$

we see that the presence of $e^{-2\pi s}$ implies the inverse of the second shifting property will be used. As is now customary, we first use partial fractions to break the rational part of $F(s)$ into a sum of simpler expressions. Doing so and completing the square to re-express $s^2 + 2s + 5$,

$$\frac{4s}{(s^2 + 2s + 5)(s^2 + 9)} = \frac{1}{13}\left[-\frac{4s - 18}{s^2 + 9} + \frac{4s - 10}{s^2 + 2s + 5}\right] = \frac{1}{13}\left[\frac{-4s}{s^2 + 9} + \frac{18}{s^2 + 9} + \frac{4(s + 1)}{(s + 1)^2 + 4} - \frac{14}{(s + 1)^2 + 4}\right]$$

Letting $G(s) = 4s/[(s^2 + 2s + 5)(s^2 + 9)]$, it now follows from familiar rules with inverse transforms and the first shifting property that

$$\mathcal{L}^{-1}[G(s)] = -\frac{4}{13} \cos 3t + \frac{18}{39} \sin 3t + \frac{4}{13} e^{-t} \cos 2t - \frac{7}{13} e^{-t} \sin 2t$$

Finally, since $F(s) = e^{-2\pi s} G(s)$, the second shifting property implies

$$\mathcal{L}^{-1}[F(s)] = u(t - 2\pi)\left[-\frac{4}{13} \cos 3(t - 2\pi) + \frac{6}{13} \sin 3(t - 2\pi)\right] + u(t - 2\pi)\left[\frac{4}{13} e^{-(t - 2\pi)} \cos 2(t - 2\pi) - \frac{7}{13} e^{-(t - 2\pi)} \sin 2(t - 2\pi)\right]$$


The $2\pi$ shift in each of the sine and cosine functions can be removed; for instance, $\cos 3(t - 2\pi) = \cos 3t$. Doing so throughout shows that

$$\mathcal{L}^{-1}[F(s)] = u(t - 2\pi)\left[-\frac{4}{13} \cos 3t + \frac{6}{13} \sin 3t + \frac{4}{13} e^{-(t - 2\pi)} \cos 2t - \frac{7}{13} e^{-(t - 2\pi)} \sin 2t\right]$$
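The partial fraction decomposition in part (c) is stated without its intermediate algebra. One way to obtain it, sketched here with undetermined coefficients $A$, $B$, $C$, $D$ (names introduced only for this illustration), is to clear denominators and match powers of $s$:

```latex
\frac{4s}{(s^2+2s+5)(s^2+9)} = \frac{As+B}{s^2+9} + \frac{Cs+D}{s^2+2s+5}
\quad\Longrightarrow\quad
4s = (As+B)(s^2+2s+5) + (Cs+D)(s^2+9)

\begin{aligned}
s^3 &: \; A + C = 0 \\
s^2 &: \; 2A + B + D = 0 \\
s^1 &: \; 5A + 2B + 9C = 4 \\
s^0 &: \; 5B + 9D = 0
\end{aligned}
\quad\Longrightarrow\quad
A = -\tfrac{4}{13}, \quad B = \tfrac{18}{13}, \quad C = \tfrac{4}{13}, \quad D = -\tfrac{10}{13}
```

These values reproduce the factor $1/13$ and the numerators $-(4s - 18)$ and $4s - 10$ used above.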

There are certainly other properties of the inverse Laplace transform that we could study. For example, theorem 5.3.4 in inverse form allows us to say that if $\mathcal{L}^{-1}[F(s)] = f(t)$ and $f(0) = 0$, then

$$\mathcal{L}^{-1}[sF(s)] = f'(t) \tag{5.6.1}$$

While results like this are theoretically interesting and can occasionally enable us to determine inverse transforms in alternate ways, they are less useful in pragmatic terms when we think of our overarching goal: using Laplace transforms to solve initial-value problems.

Indeed, our work throughout this chapter has given us a good overview of how Laplace transforms work, especially the role they play in solving initial-value problems. Of course, there are also many forcing functions we have not discussed for which Laplace transforms may be taken. There are books that contain lengthy tables of Laplace transforms and inverse transforms that we could, if necessary, consult. But because of the technology available to us, these tables have essentially been rendered obsolete. Most computer algebra systems are fully capable of computing Laplace transforms and their inverses, so we choose not to study methods for these more difficult calculations. The next example demonstrates one such function $F(s)$ that is beyond the methods we have developed but that can easily be handled by a computer algebra system.

Example 5.6.2 Find the inverse Laplace transform of

$$F(s) = \frac{9}{(s^2 + 1)^2 (s^2 + 4)^2}$$

Solution. The partial fraction decomposition of $F(s)$ is

$$F(s) = -\frac{2/3}{s^2 + 1} + \frac{1}{(s^2 + 1)^2} + \frac{2/3}{s^2 + 4} + \frac{1}{(s^2 + 4)^2} \tag{5.6.2}$$

Two of the terms in (5.6.2) are straightforward to invert, but the two involving squares of irreducible quadratic terms are not among familiar functions from our previous work. In the following subsection, we demonstrate how to use Maple to compute the inverse transform of such functions. These computations reveal that

$$\mathcal{L}^{-1}\left[\frac{1}{(s^2 + 1)^2}\right] = \frac{1}{2} \sin t - \frac{1}{2} t \cos t$$

and

$$\mathcal{L}^{-1}\left[\frac{1}{(s^2 + 4)^2}\right] = \frac{1}{16} \sin 2t - \frac{1}{8} t \cos 2t$$
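Since these two inverse transforms are quoted from a computer algebra system rather than derived, it can be reassuring to check them directly from the definition $\mathcal{L}[f](s) = \int_0^\infty f(t) e^{-st}\,dt$. The sketch below approximates the integral with composite Simpson's rule on a truncated interval; the function name `laplace_numeric`, the cutoff `t_max`, and the number of subintervals are assumptions made for this illustration.

```python
import math


def laplace_numeric(f, s, t_max=60.0, n=60000):
    """Approximate the Laplace transform of f at s with composite Simpson's rule.

    The integral is truncated at t_max, which is adequate when f(t)*exp(-s*t)
    is negligible beyond that point. n must be even.
    """
    h = t_max / n
    total = f(0.0) + f(t_max) * math.exp(-s * t_max)
    for i in range(1, n):
        t = i * h
        weight = 4 if i % 2 else 2
        total += weight * f(t) * math.exp(-s * t)
    return total * h / 3


def f1(t):
    """Claimed inverse transform of 1/(s^2 + 1)^2."""
    return 0.5 * math.sin(t) - 0.5 * t * math.cos(t)


def f2(t):
    """Claimed inverse transform of 1/(s^2 + 4)^2."""
    return math.sin(2 * t) / 16 - t * math.cos(2 * t) / 8


for s in (1.0, 2.0):
    print(s, laplace_numeric(f1, s), 1 / (s**2 + 1) ** 2)
    print(s, laplace_numeric(f2, s), 1 / (s**2 + 4) ** 2)
```

In each printed row the numerical transform and the rational function should agree to many digits, which supports the two formulas without relying on any particular software.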


From this work and (5.6.2), we find

$$\mathcal{L}^{-1}[F(s)] = -\frac{2}{3} \sin t + \frac{1}{2} \sin t - \frac{1}{2} t \cos t + \frac{1}{3} \sin 2t + \frac{1}{16} \sin 2t - \frac{1}{8} t \cos 2t = -\frac{1}{6} \sin t - \frac{1}{2} t \cos t + \frac{19}{48} \sin 2t - \frac{1}{8} t \cos 2t$$

Further discussion of how to use Maple to compute transforms and inverse transforms follows in the next subsection.

5.6.1 Laplace transforms and inverse transforms using Maple

As we have noted, while we have computed Laplace transforms for a range of functions, there are many more examples we have not considered. Moreover, even for familiar functions, certain combinations of them can lead to tedious, involved calculations. Computer algebra systems such as Maple are fully capable of computing Laplace transforms of functions, as well as inverse transforms. Here we demonstrate the syntax required in the solution of the initial-value problem from example 5.5.4:

$$y'' + 4y' + 13y = 2u(t - \pi) \sin 3t, \quad y(0) = 1, \; y'(0) = 0 \tag{5.6.3}$$

To begin, we load the inttrans package in Maple.

> with(inttrans);

If, for example, we desire to use Maple to compute the Laplace transform of $2u(t - \pi) \sin 3t$, we use the syntax

> laplace(2*Heaviside(t-Pi)*sin(3*t),t,s);

This command results in the output

$$-\frac{6e^{-s\pi}}{s^2 + 9}$$

which is precisely the transform we expect. After computing by hand the transform of the left-hand side of (5.6.3) and solving for $Y(s)$, as shown in detail in example 5.5.4, we have

$$Y(s) = \frac{s + 4}{s^2 + 4s + 13} - 2e^{-\pi s} \frac{3}{(s^2 + 9)(s^2 + 4s + 13)}$$

Here, we may use Maple's invlaplace command to determine $\mathcal{L}^{-1}[Y(s)]$. While we could choose to do so all at once, for simplicity of display we do so in two steps. First,

> invlaplace((s+4)/(s^2 + 4*s + 13),s,t);

results in the output

$$\frac{1}{3} e^{(-2t)} (3 \cos(3t) + 2 \sin(3t)) \tag{5.6.4}$$

Similarly, for the second term in $Y(s)$, we compute

> invlaplace(2*exp(-Pi*s)*3/((s^2 + 9)*(s^2 + 4*s + 13)),s,t);

Maple produces the output

$$\frac{1}{20}\,\mathrm{Heaviside}(t - \pi)\left(3 \cos(3t) - \sin(3t) - e^{(-2t + 2\pi)} (3 \cos(3t) + \sin(3t))\right) \tag{5.6.5}$$

which corresponds to our work in example 5.5.4. Because the command above was entered without the minus sign that precedes the second term of $Y(s)$, (5.6.5) is the inverse transform of $-Y_2(s)$; subtracting (5.6.5) from (5.6.4) therefore gives precisely the solution to the IVP. Note that in computing the inverse transform (5.6.5), Maple has implicitly executed the partial fraction decomposition of the expression

$$\frac{3}{(s^2 + 9)(s^2 + 4s + 13)}$$

If we wish to find this explicitly, we can use the command

> convert(3/((s^2 + 9)*(s^2 + 4*s + 13)), parfrac, s);

which produces the output

$$\frac{1}{40} \frac{3 - 3s}{s^2 + 9} + \frac{1}{40} \frac{9 + 3s}{s^2 + 4s + 13}$$

In general, we see that to compute the Laplace transform of $f(t)$ in Maple we use the syntax

> laplace(f(t),t,s);

whereas to compute the inverse transform of $F(s)$, we enter

> invlaplace(F(s),s,t);

Exercises 5.6

In exercises 1–9, find the inverse Laplace transform of the given function $F(s)$ using familiar techniques or a computer algebra system.

1. $F(s) = \dfrac{2s}{(s + 3)^2}$
2. $F(s) = \dfrac{4}{(s^2 - 4)^2}$
3. $F(s) = \dfrac{1}{s^2(s - 2)}$
4. $F(s) = \dfrac{2}{(s^2 - 1)^2(s^2 + 1)}$
5. $F(s) = \dfrac{s^2 + 1}{(s + 1)^2(s^2 + 4)}$
6. $F(s) = \dfrac{e^{-s}}{s^2(s - 2)}$
7. $F(s) = e^{-3s} \dfrac{2}{(s^2 - 1)^2(s^2 + 1)}$
8. $F(s) = \dfrac{5s^2 + 20}{s(s - 1)(s^2 - 5s + 4)}$
9. $F(s) = e^{-\pi s} \dfrac{5s^2 + 20}{s(s - 1)(s^2 - 5s + 4)}$

In exercises 10–22, solve the stated initial-value problem using Laplace transforms (using a computer algebra system as necessary). Sketch a plot of each solution. 10. y + y = e −t + te −t , 11. y + 4y = sin 2t ,

y(0) = 1 y (0) = 1

y(0) = 0,

12. y + 4y = sin 2t + δ (t − 6),

y (0) = 1

y(0) = 0,

13. y + 4y = sin 2t + δ (t − 6) + δ (t − 12),

y(0) = 0,

14. y + 9y = cos 3t + t cos 3t ,

y(0) = 0,

y (0) = 1

15. y + 2y + 5y = e −t sin 2t ,

y(0) = 0,

y (0) = 1

16. y + 2y + 5y = e −t sin 2t + te −t sin 2t ,

y(0) = 0,

17. y + 2y + 5y = e −t sin 2t + u(t − π )te −t sin 2t , 18. y + y − 2y = 4e t + 1,

20. y + y − 2y = 4e t + u(t − 3),

y(0) = 0,

y(0) = 1, y(0) = 1,

y (0) = 1

y(0) = 0,

y (0) = 0

y (0) = 0

21. y + 2y + 5y = e −t sin 2t + te −t sin 2t + δ (t − 5), 22. y + 2y + 5y = 13e t sin t ,

y (0) = 1

y (0) = 0

y(0) = 1,

19. y + y − 2y = 4e t + 1 + δ (t − 3),

y (0) = 1

y(0) = 0,

y (0) = 0

y (0) = 1


5.7 For further study

5.7.1 Laplace transforms of infinite series

If $f(t)$ is a function of exponential order that is analytic⁶ at $t = 0$ with an infinite radius of convergence, then $f(t)$ may be expressed as a power series and also has a Laplace transform. It therefore follows that if

$$f(t) = \sum_{n=0}^{\infty} a_n t^n$$

then its transform is

$$F(s) = \mathcal{L}[f(t)] = \sum_{n=0}^{\infty} a_n \mathcal{L}[t^n] = \sum_{n=0}^{\infty} n!\,a_n \frac{1}{s^{n+1}} \tag{5.7.1}$$

We begin by exploring the transforms of some familiar functions through the use of infinite series.

(a) Recall that $f(t) = e^t$ is analytic at $t = 0$ with series expansion

$$e^t = \sum_{n=0}^{\infty} \frac{t^n}{n!} = 1 + t + \frac{t^2}{2!} + \frac{t^3}{3!} + \cdots \tag{5.7.2}$$

By taking the Laplace transform of the series (5.7.2) term-wise,⁷ show that

$$\mathcal{L}[e^t] = \sum_{n=0}^{\infty} \frac{1}{s^{n+1}} \tag{5.7.3}$$

Then, recognize (5.7.3) as a geometric series to show that

$$\mathcal{L}[e^t] = \frac{1}{s - 1}$$

(b) Similarly, use the fact that $f(t) = \sin t$ has the series expansion

$$\sin t = t - \frac{t^3}{3!} + \frac{t^5}{5!} - \cdots$$

to show using infinite series that

$$\mathcal{L}[\sin t] = \frac{1}{s^2 + 1}$$

⁶ More on power series expansions of functions and the meaning of terms such as “analytic” may be found in section 8.2.

⁷ While the Laplace transform of a finite sum is the sum of the Laplace transforms of the individual terms, it is not obvious that this property holds for infinite sums. The formal justification that this is valid in what follows is beyond the scope of this text; the reader may assume that this step is valid, and proceed as directed.


In addition, develop the Laplace transform of $f(t) = \cos t$ using the series expansion $\cos t = 1 - t^2/2! + t^4/4! - \cdots$.

While power series expansions of such familiar functions as $e^t$, $\sin t$, and $\cos t$ are important and offer a different perspective on the development of the transforms of these functions, power series are even more useful for working with functions that are more complicated. For example, if we seek the transform of

$$f(t) = \frac{e^{-t} - 1}{t} \tag{5.7.4}$$

none of the methods we have previously discussed apply. However, standard techniques⁸ with infinite series may be used to address functions such as (5.7.4).

⁸ A review of the development of power series of functions can be found in section 8.2.

(c) Use the standard power series expansion for $e^t$ to show that $f(t) = (e^{-t} - 1)/t$ has the series expansion

$$\frac{e^{-t} - 1}{t} = -1 + \frac{t}{2!} - \frac{t^2}{3!} + \frac{t^3}{4!} - \cdots = \sum_{n=1}^{\infty} \frac{(-1)^n}{n!} t^{n-1}$$

Then, compute the Laplace transform of the series expression to show that

$$\mathcal{L}\left[\frac{e^{-t} - 1}{t}\right] = -\frac{1}{s} + \frac{1}{2s^2} - \frac{1}{3s^3} + \cdots \tag{5.7.5}$$

(d) Even though the Laplace transform of an analytic function will result in an infinite sum involving negative powers of $s$, sometimes we can recognize the transform as a familiar function. To see this in (5.7.5), use the known series expansion

$$\ln(1 + x) = x - \frac{1}{2} x^2 + \frac{1}{3} x^3 - \cdots$$

and the substitution $x = 1/s$ to show that

$$\mathcal{L}\left[\frac{e^{-t} - 1}{t}\right] = -\ln\left(1 + \frac{1}{s}\right)$$

(e) From the standard series expansion for the function $\sin t$, determine the Taylor series of

$$f(t) = \frac{\sin t}{t} \tag{5.7.6}$$

and hence compute the Laplace transform of (5.7.6). Then, use the expansion

$$\arctan x = x - \frac{1}{3} x^3 + \frac{1}{5} x^5 - \frac{1}{7} x^7 + \cdots$$

and an appropriate substitution to show that

$$\mathcal{L}\left[\frac{\sin t}{t}\right] = \arctan \frac{1}{s}$$

(f) Use series techniques to show that

$$\mathcal{L}\left[\frac{\cos t - 1}{t}\right] = -\frac{1}{2} \ln\left(1 + \frac{1}{s^2}\right)$$

5.7.2 Laplace transforms of periodic forcing functions

Nonhomogeneous differential equations often involve periodic forcing functions. In section 4.5, we considered the effects of the forcing function $f(t) = \sin \omega t$ in connection with the natural frequency of a system. More generally, here we examine periodic forcing functions that are piecewise continuous. Such functions satisfy the relationship that for some value of $a$,

$$f(t) = f(t + a) = f(t + 2a) = f(t + 3a) = \cdots = f(t + na) = \cdots \tag{5.7.7}$$

An example of such a function is shown in figure 5.13. Taking the Laplace transform of such a function $f$, we may write the transform as the infinite sum of integrals

$$\mathcal{L}[f(t)] = \int_0^{\infty} f(t) e^{-st}\,dt = \int_0^{a} f(t) e^{-st}\,dt + \int_a^{2a} f(t) e^{-st}\,dt + \int_{2a}^{3a} f(t) e^{-st}\,dt + \cdots \tag{5.7.8}$$

Figure 5.13 A periodic function with period $a$ that is piecewise continuous.


(a) Using the change of variables $t = \tau + a$ in the second integral, $t = \tau + 2a$ in the third, and so on, show that

$$\mathcal{L}[f(t)] = \int_0^{a} f(t)e^{-st}\,dt + \int_0^{a} f(\tau + a)e^{-s(\tau + a)}\,d\tau + \int_0^{a} f(\tau + 2a)e^{-s(\tau + 2a)}\,d\tau + \cdots \tag{5.7.9}$$

(b) By replacing the integration variable $\tau$ with $t$ in (5.7.9), and using the periodicity of $f$, show that

$$\mathcal{L}[f(t)] = \left[1 + e^{-as} + e^{-2as} + \cdots\right] \int_0^{a} f(t)e^{-st}\,dt \tag{5.7.10}$$

Then, use the fact that the infinite series in (5.7.10) is geometric in order to conclude

$$\mathcal{L}[f(t)] = \frac{1}{1 - e^{-as}} \int_0^{a} f(t)e^{-st}\,dt \tag{5.7.11}$$

(c) Use (5.7.11) to determine the Laplace transform of the square wave function shown in figure 5.14. (The vertical lines shown in the graph are not actually part of the function's graph; indeed, $f$ is piecewise constant with value 3 on $[0, 2)$ and value $-3$ on $[2, 4)$, and so on.) In particular, show that

$$\mathcal{L}[f(t)] = \frac{3}{s} \cdot \frac{1 - e^{-2s}}{1 + e^{-2s}}$$

where $f(t)$ is the function pictured in figure 5.14.

Figure 5.14 A square wave with amplitude 3 and period 4.


(d) Consider the periodic function with period $2\pi$ given by

$$f(t) = \begin{cases} \sin t, & \text{if } 0 < t < \pi \\ 0, & \text{if } \pi < t < 2\pi \end{cases}$$

This function is called the half-rectified sine wave since it consists only of the top half of the standard sine function. Sketch a graph of this function and show that its Laplace transform is

$$\mathcal{L}[f(t)] = \frac{1 + e^{-\pi s}}{(1 - e^{-2\pi s})(s^2 + 1)}$$

(e) Let a slightly damped spring-mass system be given with $m = 1$, $c = 0.02$, and $k = 25$, and be driven by a square-wave periodic forcing function $f(t)$ with amplitude 5 and period $2\pi$. We will use Laplace transforms to solve the initial-value problem that governs this system under the assumption that the system starts from rest.

(i) The stated problem is modeled by the initial-value problem

$$y'' + 0.02y' + 25y = f(t), \quad y(0) = 0, \quad y'(0) = 0$$

Take Laplace transforms to show that $Y(s) = \mathcal{L}[y(t)]$ must satisfy the equation

$$Y(s) = \frac{F(s)}{s^2 + 0.02s + 25} \tag{5.7.12}$$

where $F(s) = \mathcal{L}[f(t)]$.

(ii) While we have learned in (c) how to write the transform of a square wave function without using infinite series in its expression, it turns out for this problem that a series expansion is necessary for finding the inverse transform when solving the IVP. By writing the square wave function given in this problem in the form

$$f(t) = 5u(t) - 10u(t - \pi) + 10u(t - 2\pi) - 10u(t - 3\pi) + \cdots$$

show that

$$F(s) = \mathcal{L}[f(t)] = \frac{5}{s}\left[1 - 2e^{-\pi s} + 2e^{-2\pi s} - 2e^{-3\pi s} + \cdots\right] \tag{5.7.13}$$

(iii) Explain why

$$\frac{1}{s^2 + 0.02s + 25} \approx \frac{1}{(s + 0.01)^2 + 5^2} \tag{5.7.14}$$

(iv) Combine (5.7.12), (5.7.13), and (5.7.14) in order to conclude that

$$y(t) = \mathcal{L}^{-1}[Y(s)] = \mathcal{L}^{-1}\left[\frac{5}{s[(s + 0.01)^2 + 5^2]}\left[1 - 2e^{-\pi s} + 2e^{-2\pi s} - \cdots\right]\right] \tag{5.7.15}$$


Explain why we have to find the inverse transform in (5.7.15) term-by-term.

(v) Compute the inverse transform of the first term

$$y_1(t) = \mathcal{L}^{-1}\left[\frac{5}{s[(s + 0.01)^2 + 5^2]}\right]$$

in (5.7.15) given the partial fraction decomposition

$$\frac{5}{s[(s + 0.01)^2 + 5^2]} = \frac{0.2}{s} - \frac{0.2s + 0.004}{(s + 0.01)^2 + 5^2}$$

(Hint: $0.2s + 0.004 = 0.2(s + 0.01) + 0.002$.) Conclude that

$$y_1(t) = 0.2 - e^{-0.01t}(0.2\cos 5t + 0.0004\sin 5t) \tag{5.7.16}$$

(vi) Compute the inverse transform of the second term

$$y_2(t) = \mathcal{L}^{-1}\left[-2e^{-\pi s}\,\frac{5}{s[(s + 0.01)^2 + 5^2]}\right]$$

in (5.7.15) using (5.7.16) and the second shifting property. Using the fact that $\cos 5(t - \pi) = -\cos 5t$ and $\sin 5(t - \pi) = -\sin 5t$, conclude that

$$y_2(t) = -2u(t - \pi)\left\{0.2 + e^{-0.01(t - \pi)}(0.2\cos 5t + 0.0004\sin 5t)\right\} = -2u(t - \pi)\left\{0.2 + e^{0.01\pi}[0.2 - y_1(t)]\right\} \tag{5.7.17}$$

(vii) Compute the inverse transform of the third term

$$y_3(t) = \mathcal{L}^{-1}\left[2e^{-2\pi s}\,\frac{5}{s[(s + 0.01)^2 + 5^2]}\right]$$

in (5.7.15) using (5.7.16) and the second shifting property. Using the fact that $\cos 5(t - 2\pi) = \cos 5t$ and $\sin 5(t - 2\pi) = \sin 5t$, conclude that

$$y_3(t) = 2u(t - 2\pi)\left\{0.2 - e^{-0.01(t - 2\pi)}(0.2\cos 5t + 0.0004\sin 5t)\right\} = 2u(t - 2\pi)\left\{0.2 - e^{0.02\pi}[0.2 - y_1(t)]\right\} \tag{5.7.18}$$

(viii) So far, we have found the formula for $y(t)$ valid up to $t = 3\pi$. In fact,

$$y(t) = \begin{cases} y_1(t), & \text{if } 0 < t < \pi \\ y_1(t) + y_2(t), & \text{if } \pi < t < 2\pi \\ y_1(t) + y_2(t) + y_3(t), & \text{if } 2\pi < t < 3\pi \end{cases}$$

Using $y_1(t) = 0.2 - e^{0\pi}[0.2 - y_1(t)]$, together with (5.7.17) and (5.7.18), plus the fact that on $2\pi < t < 3\pi$ we know $u(t - \pi) = u(t - 2\pi) = 1$, show that on $2\pi < t < 3\pi$,

$$y(t) = 0.2 - [0.2 - y_1(t)]\left[1 + 2e^{0.01\pi} + 2e^{0.02\pi}\right]$$

(ix) Using the patterns established in (5.7.17) and (5.7.18), explain why

$$y(t) = y_1(t) + y_2(t) + \cdots + y_{n+1}(t) = (-1)^n\,0.2 - [0.2 - y_1(t)]\left[1 + 2e^{0.01\pi} + \cdots + 2e^{0.01n\pi}\right] \tag{5.7.19}$$

is valid for $n\pi < t < (n + 1)\pi$ for any positive integer $n$.

(x) Letting $z(t) = e^{-0.01t}(\cos 5t + 0.002\sin 5t)$, so that $0.2 - y_1(t) = \frac{1}{5}z(t)$, and using the fact that $(1 - x^{n+1})/(1 - x) = 1 + x + x^2 + \cdots + x^n$, show that on $n\pi < t < (n + 1)\pi$,

$$y(t) = (-1)^n\,\frac{1}{5} + \frac{1}{5}z(t) - \frac{2}{5(1 - e^{0.01\pi})}\,z(t) + \frac{2e^{(n+1)0.01\pi}}{5(1 - e^{0.01\pi})}\,z(t) \tag{5.7.20}$$

Explain why as $t \to \infty$, it follows that $y(t) \to \infty$. Using a computer algebra system, graph the solution function on several consecutive large intervals of width $\pi$, such as $[200\pi, 201\pi]$, $[201\pi, 202\pi]$, etc., and discuss the behavior of the system.
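The term-by-term construction in parts (v) through (x) can be cross-checked numerically. The following is a minimal Python sketch, not part of the original exercise, that assumes the approximation (5.7.14) and the step response $y_1(t) = 0.2 - 0.2z(t)$ from (5.7.16). The function names (`z`, `y1`, `y_superposition`, `y_closed_form`) are illustrative choices. It sums the shifted step responses directly and compares the result with the closed form obtained from (5.7.19) and the geometric series.

```python
import math


def z(t):
    """z(t) = e^{-0.01 t}(cos 5t + 0.002 sin 5t), as defined in part (x)."""
    return math.exp(-0.01 * t) * (math.cos(5 * t) + 0.002 * math.sin(5 * t))


def y1(t):
    """Response to the first step 5u(t); equals 0.2 - 0.2 z(t), cf. (5.7.16)."""
    return 0.2 - 0.2 * z(t)


def y_superposition(t):
    """Add up the shifted step responses that have switched on by time t.

    The k-th jump of the square wave contributes 2(-1)^k y1(t - k*pi)
    once t > k*pi (second shifting property).
    """
    n = int(t // math.pi)
    total = y1(t)
    for k in range(1, n + 1):
        total += 2 * (-1) ** k * y1(t - k * math.pi)
    return total


def y_closed_form(t):
    """Closed form on n*pi < t < (n+1)*pi from (5.7.19) and the geometric sum."""
    n = int(t // math.pi)
    x = math.exp(0.01 * math.pi)
    geometric = (1 - x ** (n + 1)) / (1 - x)  # 1 + x + ... + x^n
    # 1 + 2x + ... + 2x^n = 2(1 + x + ... + x^n) - 1
    return (-1) ** n * 0.2 - 0.2 * z(t) * (2 * geometric - 1)


if __name__ == "__main__":
    # Sample a late interval, as the exercise suggests doing with a CAS.
    for i in range(6):
        t = (200 + i / 5) * math.pi
        print(f"t = {t:9.3f}  sum = {y_superposition(t): .6f}  "
              f"closed = {y_closed_form(t): .6f}")
```

Tabulating both functions on intervals such as $[200\pi, 201\pi]$ gives a quick way to examine the long-term behavior before plotting.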

5.7.3 Laplace transforms of systems

Recall that the standard initial-value problem for a system of first-order DEs is given in matrix form by

$$\mathbf{x}' = A\mathbf{x} + \mathbf{f}(t), \quad \mathbf{x}(0) = \mathbf{b} \tag{5.7.21}$$

In the event that $\mathbf{f}$ is a continuous function, the variation of parameters technique applies. But, if $\mathbf{f}$ is a step function or otherwise piecewise defined, our earlier methods fail, and Laplace transforms may be used. Regardless, the Laplace transform can be a useful tool for systems for many of the same reasons it is for single DEs, such as the fact that it treats all linear systems in a uniform manner and incorporates the initial conditions immediately into the process of finding the solution.

Since each of the three terms in the equation in (5.7.21) is a vector, Laplace transforms may be applied component-wise. For example,

$$\mathcal{L}[\mathbf{x}'(t)] = \mathcal{L}\begin{bmatrix} x_1'(t) \\ x_2'(t) \end{bmatrix} = \begin{bmatrix} \mathcal{L}[x_1'(t)] \\ \mathcal{L}[x_2'(t)] \end{bmatrix} = \begin{bmatrix} sX_1(s) - x_1(0) \\ sX_2(s) - x_2(0) \end{bmatrix} = s\mathbf{X}(s) - \mathbf{x}(0)$$


where we let $\mathbf{X}(s)$ denote the Laplace transform of the vector function $\mathbf{x}(t)$. Letting $\mathbf{F}(s)$ be the transform of the vector $\mathbf{f}(t)$, we may deduce from (5.7.21) and theorem 5.3.4 that

$$s\mathbf{X}(s) - \mathbf{x}(0) = A\mathbf{X}(s) + \mathbf{F}(s) \tag{5.7.22}$$

(a) Solve (5.7.22) for $\mathbf{X}(s)$ to show that

$$\mathbf{X}(s) = Z(s)(\mathbf{F}(s) + \mathbf{b}) \tag{5.7.23}$$

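where $Z(s) = (sI - A)^{-1}$ and $\mathbf{b} = \mathbf{x}(0)$. Explain why we must assume that $s$ is not an eigenvalue of $A$ when we write $\mathbf{X}(s)$ in the form (5.7.23).

(b) Next we solve an example system in step-by-step fashion. Consider the IVP

$$\mathbf{x}' = \begin{bmatrix} 1 & 0 \\ -1 & 3 \end{bmatrix}\mathbf{x} + \begin{bmatrix} e^{2t} \\ 3 \end{bmatrix}, \quad \mathbf{x}(0) = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \tag{5.7.24}$$

(i) Compute $\mathbf{F}(s)$ and hence show that

$$\mathbf{F}(s) + \mathbf{x}(0) = \begin{bmatrix} \dfrac{1}{s - 2} + 1 \\[2mm] \dfrac{3}{s} \end{bmatrix}$$

(ii) Use the given coefficient matrix $A$ to compute $Z(s) = (sI - A)^{-1}$ and conclude (see footnote 9) that

$$Z(s) = \frac{1}{(s - 1)(s - 3)}\begin{bmatrix} s - 3 & 0 \\ -1 & s - 1 \end{bmatrix}$$

(iii) Compute $\mathbf{X}(s)$ using (5.7.23) to show that

$$\mathbf{X}(s) = \frac{1}{s(s - 2)}\begin{bmatrix} s \\ 2 \end{bmatrix}$$

(iv) Finally, use the inverse Laplace transform component-wise on $\mathbf{X}(s)$ (using standard inverse transform techniques) to find

$$\mathbf{x}(t) = \mathcal{L}^{-1}[\mathbf{X}(s)] = \begin{bmatrix} e^{2t} \\ e^{2t} - 1 \end{bmatrix}$$

Footnote 9: Recall the shortcut

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$$

(c) Use Laplace transforms and the solution technique outlined in (b) above to find the solution of each system of IVPs below.

1 1 cos t −1 (i) x = x+ x(0) = , 0 −1 1 − sin t 0 2 sin t 1 (ii) x = x+ x(0) = , 1 −1 sin t 0

(iii) $\mathbf{x}' = \begin{bmatrix} 2 & 1 \\ 3 & 0 \end{bmatrix}\mathbf{x} + \begin{bmatrix} t \\ t \end{bmatrix}, \quad \mathbf{x}(0) = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$

(iv) $\mathbf{x}' = \begin{bmatrix} 2 & 1 \\ 3 & 0 \end{bmatrix}\mathbf{x} + \begin{bmatrix} t \\ t \end{bmatrix}, \quad \mathbf{x}(0) = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$

(v) $\mathbf{x}' = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -1 \end{bmatrix}\mathbf{x} + \begin{bmatrix} e^{t} \\ 1 \\ 0 \end{bmatrix}, \quad \mathbf{x}(0) = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$

(vi) $\mathbf{x}' = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -1 \end{bmatrix}\mathbf{x} + \begin{bmatrix} e^{t} \\ 1 \\ 0 \end{bmatrix}, \quad \mathbf{x}(0) = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$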
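As a rough numerical cross-check of the worked example in part (b), the sketch below integrates the IVP (5.7.24) with a classical fourth-order Runge-Kutta step and compares the result with the solution $\mathbf{x}(t) = (e^{2t},\, e^{2t} - 1)$ obtained by Laplace transforms. This is an illustrative Python sketch. The helper names (`rhs`, `rk4`, `exact`) and the step count are assumptions, and Runge-Kutta is used here only as an independent check, not as a method from this section.

```python
import math


def rhs(t, x):
    """Right-hand side of (5.7.24): x' = A x + f(t), with
    A = [[1, 0], [-1, 3]] and f(t) = (e^{2t}, 3)."""
    x1, x2 = x
    return (x1 + math.exp(2 * t), -x1 + 3 * x2 + 3)


def rk4(f, x0, t_end, steps):
    """Integrate x' = f(t, x) from t = 0 to t_end with fixed-step RK4."""
    t, x = 0.0, tuple(x0)
    h = t_end / steps
    for _ in range(steps):
        k1 = f(t, x)
        k2 = f(t + h / 2, tuple(xi + h / 2 * k for xi, k in zip(x, k1)))
        k3 = f(t + h / 2, tuple(xi + h / 2 * k for xi, k in zip(x, k2)))
        k4 = f(t + h, tuple(xi + h * k for xi, k in zip(x, k3)))
        x = tuple(
            xi + h / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
        )
        t += h
    return x


def exact(t):
    """Solution found in part (b)(iv)."""
    return (math.exp(2 * t), math.exp(2 * t) - 1)


if __name__ == "__main__":
    print("numerical:", rk4(rhs, (1.0, 0.0), 1.0, 2000))
    print("exact:    ", exact(1.0))
```

The same harness can be pointed at the systems in part (c) by swapping in a different `rhs` and initial vector, which gives a quick sanity check on a hand-computed inverse transform.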

6 Nonlinear systems of differential equations

6.1 Motivating problems

In our studies so far, we have seen that a variety of interesting physical situations can be modeled by linear systems of differential equations. Moreover, nearly all linear systems may be solved explicitly. But, many important phenomena are nonlinear in nature; in order to motivate our upcoming work with such systems, we consider two applications where nonlinear systems of equations arise.

A pendulum is a mesmerizing phenomenon. Whether on a grandfather clock or in the hand of a hypnotist, there is something fascinating about its motion. It turns out that a nonlinear second-order differential equation (and hence a system of nonlinear first-order equations) models its behavior. To develop this differential equation, let a rigid arm of length $L$ be attached to a point from which it may swing freely. In this discussion, we will assume for simplicity that no damping is present. Similarly, to simplify the physics we assume that the arm itself has negligible mass. Finally, we attach a mass $m$ to the end of the rigid arm and set the pendulum in motion, as shown in figure 6.1.

We are interested in how the mass travels along a circular arc once the mass is set in motion. The quantities of interest to us are noted in figure 6.1; the variable $\theta$ represents the angle (in radians) the arm makes with the vertical axis and $s$ denotes the displacement of the center of the mass along the circular arc. Because the mass is traveling along a circular arc, it follows that $s = L\theta$. Noting that both $s$ and $\theta$ are implicit functions of $t$, we can differentiate with respect to $t$ and find $s'(t) = L\theta'(t)$ and $s''(t) = L\theta''(t)$. In particular, the velocity of the center of the mass along the arc is $s'(t)$ and its acceleration is $s''(t)$.

Figure 6.1 A simple pendulum.

Figure 6.2 Component of gravity’s force along the pendulum’s motion.

Since the acceleration $a(t)$ is given by $a(t) = s''(t)$, we have

$$a(t) = \frac{d^2 s}{dt^2} = L\frac{d^2\theta}{dt^2} \tag{6.1.1}$$

Since we have assumed that there is no damping present, once the mass is set in motion the only force acting on the pendulum is gravity. Because we are studying the displacement, velocity, and acceleration of the mass along its path, we must consider the magnitude of the weight $W = mg$ in the direction of motion. From figure 6.2, we see that gravity induces a force of magnitude $W\sin\theta$ along the circular arc. Note, too, that this force opposes the motion of the pendulum, assuming $s(t)$ is positive.


From Newton’s second law, $F = ma$, it now follows that $ma = -mg\sin\theta$, or

$$a(t) = -g\sin\theta(t) \tag{6.1.2}$$

Using the two equivalent expressions for acceleration in (6.1.1) and (6.1.2), it follows that

$$L\frac{d^2\theta}{dt^2} = -g\sin\theta \tag{6.1.3}$$

If we assume that an initial displacement angle $\theta(0) = \theta_0$ and initial angular velocity $\theta'(0) = \theta_0'$ are given, then after rearranging (6.1.3) it follows that $\theta$ satisfies the initial-value problem

$$\theta'' + \frac{g}{L}\sin\theta = 0, \quad \theta(0) = \theta_0, \quad \theta'(0) = \theta_0' \tag{6.1.4}$$

Because of the presence of $\sin\theta$ in this equation, this second-order differential equation is nonlinear, which means that none of our previous solution methods apply. If we use the substitution $x_1 = \theta$ and $x_2 = \theta'$ to recast (6.1.4) as a nonlinear system of first-order differential equations, then it turns out that the system has a natural graphical interpretation through its slope field, just as we saw with linear systems of differential equations. Using this substitution, we observe that the pendulum is governed by the system

$$x_1' = x_2$$
$$x_2' = -\frac{g}{L}\sin x_1$$

with initial conditions $x_1(0) = \theta_0$ and $x_2(0) = \theta_0'$. Besides studying the associated slope field, we will also learn that it is possible to approximate this nonlinear system at key points with a linear system to better understand its behavior, particularly at any equilibrium points it may have. In subsequent sections, we will explore these issues in greater detail and return to this example involving the pendulum several times, including an investigation of what happens when friction is present.

In addition to the pendulum, another system of nonlinear differential equations arises in the study of population dynamics. Let us consider a population $W(t)$ of wolves (in hundreds) that prey upon a population $M(t)$ of moose (in hundreds), where $t$ is time measured in years. A good example of such a situation, and one that biologists have studied in detail, occurs on Isle Royale in Lake Superior. On this remote island, wolves are the only predator of moose and moose are essentially the only prey of wolves. Suppose that in the absence of moose, the wolves would die off at a rate proportional to their own number according to a differential equation such as

$$\frac{dW}{dt} = -0.75W$$

In the presence of moose, however, we expect more of the wolves to be able to survive, and to do so at a rate proportional to the moose–wolf interactions since these can result in food for the wolves. The number of moose–wolf interactions can be modeled by taking the product of $M$ and $W$; only some fraction of such interactions will be beneficial to the wolves. Thus, the wolf population can be assumed to satisfy a differential equation of the form

$$\frac{dW}{dt} = -0.75W + 0.25MW \tag{6.1.5}$$

Likewise, in the absence of wolves, we would expect the number of moose to grow unencumbered (at least in the short term). We might, therefore, have a differential equation like

$$\frac{dM}{dt} = 0.5M$$

But with wolves around, some of the moose will die due to moose–wolf interactions, hence we assume the moose population satisfies an equation like

$$\frac{dM}{dt} = 0.5M - 0.1MW \tag{6.1.6}$$

Equations (6.1.5) and (6.1.6) lead to the system of nonlinear differential equations

$$\frac{dW}{dt} = -0.75W + 0.25MW$$
$$\frac{dM}{dt} = 0.5M - 0.1MW$$

Systems of this form (regardless of the values of the constants) are typically known as predator–prey or Lotka–Volterra equations. Factoring the right-hand side in each equation above, we see that the wolf and moose populations satisfy

$$\frac{dW}{dt} = W(-0.75 + 0.25M)$$
$$\frac{dM}{dt} = M(0.5 - 0.1W)$$

from which it is evident that the system of differential equations has not only the obvious equilibrium point at the origin, but also one at $(W, M) = (5, 3)$. What kind of behavior should we expect for the wolf and moose populations for initial conditions near $(5, 3)$? In particular, is this equilibrium point stable? Are there ways we can approximate this nonlinear system with a linear one? These questions and more are the focus of subsequent sections as we investigate nonlinear systems of DEs.

Our in-depth study of linear systems of differential equations in chapter 3 will prove useful in the study of nonlinear systems: as we see in section 6.2, we can study the graphical behavior of solutions to nonlinear systems in the phase plane by plotting a direction field, just as we did with linear systems.
Moreover, in section 6.3 we will study a process by which we can approximate the nonlinear system at a point by a linear system and use our understanding of the behavior of linear systems to make predictions about the nonlinear system.
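The equilibrium claim for the wolf–moose system can be spot-checked by evaluating the right-hand sides of (6.1.5) and (6.1.6) directly. The short Python sketch below is illustrative only. The function name `wolf_moose_rhs` is an assumption, and populations are in hundreds as in the text.

```python
def wolf_moose_rhs(W, M):
    """Right-hand sides of (6.1.5) and (6.1.6): returns (dW/dt, dM/dt)."""
    dW = -0.75 * W + 0.25 * M * W
    dM = 0.5 * M - 0.1 * M * W
    return dW, dM


if __name__ == "__main__":
    # Both rates vanish (up to rounding) at the two equilibrium points.
    print("at (W, M) = (0, 0):", wolf_moose_rhs(0.0, 0.0))
    print("at (W, M) = (5, 3):", wolf_moose_rhs(5.0, 3.0))
    # A nearby state: with slightly more wolves, the moose rate is negative.
    print("at (W, M) = (5.5, 3):", wolf_moose_rhs(5.5, 3.0))
```

Evaluating the rates at states close to $(5, 3)$ in this way gives a first hint of the direction of flow there, which the direction fields of section 6.2 make systematic.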


6.2 Graphical behavior of solutions for 2 × 2 nonlinear systems

In our study of single first-order initial-value problems in chapter 2, we learned that every IVP associated with a linear differential equation with sufficiently well-behaved coefficient functions has a unique solution; moreover, we can determine an explicit formula for the solution. As we learned in chapter 3, essentially the same situation holds for linear systems of differential equations; those with constant coefficients and their corresponding IVPs can always be solved. However, in the case when the governing differential equation or system of equations is nonlinear, we are not guaranteed that solutions to initial-value problems exist, nor that they are unique when they do exist. In addition, as we now study nonlinear systems, we will find that even when unique solutions exist, we are usually unable to determine explicit formulas for them. We therefore turn again to graphical and numerical investigations of the qualitative properties of solutions to nonlinear systems in order to understand their short- and long-term behavior.

To begin, let us choose an example through which we can develop intuition. We consider the system given by

$$x_1' = x_2 - x_1^3$$
$$x_2' = x_1 - x_2^3 \tag{6.2.1}$$

If we let

$$\mathbf{x}(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix}$$

and $\mathbf{F} : \mathbb{R}^2 \to \mathbb{R}^2$ be the function defined by $\mathbf{F}(\mathbf{x}) = \mathbf{F}(x_1, x_2) = (x_2 - x_1^3,\; x_1 - x_2^3)$, then it follows that we may view (6.2.1) as having the form

$$\mathbf{x}' = \mathbf{F}(\mathbf{x}) \tag{6.2.2}$$

This is analogous to our work with linear systems of differential equations that may be expressed in the form $\mathbf{x}' = A\mathbf{x}$, where $A$ is a matrix. In that setting, the right-hand side of the system is a linear function of $\mathbf{x}$, but in (6.2.2), $\mathbf{F}(\mathbf{x})$ is not linear. Nonetheless, a graphical interpretation of the system remains both possible and enlightening.

In section 3.4, we discussed the graphical behavior of a vector function. Here, we simply remind ourselves that for the system $\mathbf{x}' = \mathbf{F}(\mathbf{x})$ in (6.2.1), a solution $\mathbf{x}(t)$ is a vector function whose output lies in $\mathbb{R}^2$ and whose graph is the curve that is traced out by the vectors $\mathbf{x}(t)$ at various times $t$. Moreover, the derivative $\mathbf{x}'(t)$ of $\mathbf{x}(t)$ is itself a vector function that indicates the instantaneous velocity of a particle traveling along the curve traced out by $\mathbf{x}(t)$. In particular, scalar multiples of $\mathbf{x}'(t)$ tell us the direction of motion or flow along the solution curve as time increases.


We therefore turn again to direction fields to study the flow of the solution curves through the vector field generated by the system of differential equations. In particular, (6.2.2) indicates how, for any point $(x_1, x_2)$ in the plane, we can easily compute $\mathbf{x}' = \mathbf{F}(x_1, x_2)$ at that point, and hence know the direction of the flow of the solution curve that passes through that point. Using a computer algebra system to execute these computations repeatedly at points sampled throughout the plane, we can view the direction field for the nonlinear system, which is analogous to the direction field for a linear system. A direction field for (6.2.1) is shown in figure 6.3.

The $x_1$–$x_2$ plane is again called the phase plane; the independent variable $t$ remains implicit in the flow, while the behavior of the curve relative to the coordinate axes demonstrates the interrelationship among the components $x_1(t)$ and $x_2(t)$ of the solution $\mathbf{x}(t)$. Sample solution curves, such as those plotted in figure 6.4, are typically called trajectories. In section 6.4 we will learn how to construct trajectories for systems through numerical approximation techniques such as Euler’s method.

From figures 6.3 and 6.4, it appears that the system (6.2.1) has three equilibrium solutions. Specifically, the behavior of trajectories suggests the possibilities of equilibria at $(-1, -1)$, $(0, 0)$, and $(1, 1)$. We can confirm this algebraically by setting $\mathbf{x}' = \mathbf{0}$ and solving the resulting nonlinear system of equations

$$0 = x_2 - x_1^3 \tag{6.2.3}$$
$$0 = x_1 - x_2^3 \tag{6.2.4}$$

Figure 6.3 The direction field for the system $\mathbf{x}' = \mathbf{F}(\mathbf{x})$ given in (6.2.1).
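The repeated computation that a computer algebra system performs to draw a direction field can be sketched in a few lines. The following Python sketch is illustrative, not the book's Maple code. It evaluates $\mathbf{F}$ from (6.2.1) on a grid over $[-3, 3] \times [-3, 3]$ and normalizes each vector to unit length, so that only the direction of flow is recorded. The names `F` and `direction_field` and the grid size are assumptions.

```python
import math


def F(x1, x2):
    """Right-hand side of (6.2.1)."""
    return (x2 - x1 ** 3, x1 - x2 ** 3)


def direction_field(F, lo=-3.0, hi=3.0, n=7):
    """Return a list of (point, unit_direction) pairs on an n-by-n grid.

    Where F vanishes (an equilibrium), the direction is recorded as (0, 0).
    """
    field = []
    step = (hi - lo) / (n - 1)
    for i in range(n):
        for j in range(n):
            p = (lo + i * step, lo + j * step)
            v = F(*p)
            norm = math.hypot(v[0], v[1])
            unit = (v[0] / norm, v[1] / norm) if norm > 0 else (0.0, 0.0)
            field.append((p, unit))
    return field


if __name__ == "__main__":
    for p, u in direction_field(F):
        if u == (0.0, 0.0):
            print("no flow at grid point", p)
```

With the default grid, every sample point has integer coordinates, so the grid happens to pass through the three suspected equilibria. A plotting library could draw an arrow at each point from these pairs.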

Figure 6.4 The direction field for the system $\mathbf{x}' = \mathbf{F}(\mathbf{x})$ given in (6.2.1) with three trajectories.

Equation (6.2.3) implies that $x_2 = x_1^3$. Substituting this result in (6.2.4), it follows that

$$0 = x_1 - (x_1^3)^3$$

Factoring, we see

$$0 = x_1(1 - x_1^8) = x_1(1 - x_1^4)(1 + x_1^4) = x_1(1 - x_1^2)(1 + x_1^2)(1 + x_1^4)$$

from which we determine that $x_1 = 0$, $1$, or $-1$. Recalling that $x_2 = x_1^3$, the corresponding $x_2$-values are $x_2 = 0$, $1$, and $-1$, and we have found that the equilibrium points of the system (6.2.1) are indeed $(-1, -1)$, $(0, 0)$, and $(1, 1)$.

Here, we see another distinction between linear and nonlinear systems of differential equations. For a linear system $\mathbf{x}' = A\mathbf{x}$, the search for equilibrium solutions means we must solve $A\mathbf{x} = \mathbf{0}$, which we know has either a unique solution or infinitely many solutions. With nonlinear systems, it is possible that any number of equilibrium solutions exist (from none to infinitely many). Moreover, there are no guarantees that we can even expect to analytically solve the resulting system of nonlinear algebraic equations to find such equilibria.

When we do find equilibrium solutions to a system, it is natural to ask about their stability. For example, for the equilibrium solution $(0, 0)$ to (6.2.1), we might observe from figure 6.3 that the origin seems to exhibit behavior similar to a saddle point and therefore may be unstable. To investigate this further, one option is to see if there is a linear system of differential equations to which we can compare (6.2.1). For $x_1$ and $x_2$ near zero, observe that both $x_1^3$ and $x_2^3$ are extremely small, so that in this region close to the origin it is reasonable for us to say that

$$x_1' = x_2 - x_1^3 \approx x_2$$
$$x_2' = x_1 - x_2^3 \approx x_1 \tag{6.2.5}$$


In particular, note that the approximate system is linear, and we can write $\mathbf{x}' = A\mathbf{x}$, for $\mathbf{x}$ near $\mathbf{0}$, with

$$A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \tag{6.2.6}$$

The eigenvalues of the matrix $A$ are $\lambda_1 = -1$ and $\lambda_2 = 1$ with corresponding eigenvectors $\mathbf{v}_1 = [-1 \;\; 1]^T$ and $\mathbf{v}_2 = [1 \;\; 1]^T$. Due to the fact that the eigenvalues are real and of opposing signs, it follows that the origin is indeed a saddle point for this approximating linear system and is therefore unstable. The phase plane for the linear system corresponding to (6.2.6) near $\mathbf{0}$ is displayed in figure 6.5. This behavior is consistent with that observed near the origin in figure 6.3.

We will call the system $\mathbf{x}' = A\mathbf{x}$, where $A$ is given by (6.2.6), the linearization of (6.2.1) near $\mathbf{0}$. In section 6.3, we will study this approximation to a nonlinear system of differential equations near any particular point of interest to us. We close this section with two examples of nonlinear systems in which we determine all equilibrium solutions and examine the graphical behavior of solutions near the equilibria.

Figure 6.5 The direction field for the linear system $\mathbf{x}' = A\mathbf{x}$ given in (6.2.5).

Example 6.2.1

Consider the system of differential equations given by

$$x_1' = \sin x_2$$
$$x_2' = x_2 - x_1^2 \tag{6.2.7}$$

Determine all equilibrium solutions of the system, plot the direction field, and discuss the behavior of solutions near at least two of the equilibrium solutions.


Solution. To find the equilibrium solutions, we set $x_1' = x_2' = 0$ and solve the system of equations

$$0 = \sin x_2 \tag{6.2.8}$$
$$0 = x_2 - x_1^2 \tag{6.2.9}$$

Equation (6.2.8) implies that $x_2$ must be an integer multiple of $\pi$, while (6.2.9) shows that $x_1$ and $x_2$ must satisfy the relationship $x_1^2 = x_2$. This latter equation implies that $x_2$ must be non-negative, and therefore with $x_2 = k\pi$ for any nonnegative integer $k$, it follows that $x_1 = \pm\sqrt{k\pi}$ and we have equilibrium solutions of the form $(\sqrt{k\pi}, k\pi)$, $(-\sqrt{k\pi}, k\pi)$ for $k = 0, 1, 2, \ldots$. An appropriate window in which to plot the direction field for this system might therefore be $[-3, 3] \times [-1, 8]$, as this will include the five equilibrium solutions $(0, 0)$, $(-\sqrt{\pi}, \pi)$, $(\sqrt{\pi}, \pi)$, $(-\sqrt{2\pi}, 2\pi)$, and $(\sqrt{2\pi}, 2\pi)$.

Plotting the direction field, as shown in figure 6.6, we see that the system appears to demonstrate familiar behavior around the equilibrium solutions. For example, at the solutions $(\sqrt{\pi}, \pi)$ and $(-\sqrt{2\pi}, 2\pi)$, each seems to be a saddle point, based on the behavior of trajectories nearby. In addition, at the equilibrium points $(-\sqrt{\pi}, \pi)$ and $(\sqrt{2\pi}, 2\pi)$, the system appears to demonstrate spiraling behavior where the equilibria might act as stable centers or possibly as unstable spiral sources. Based on the periodicity of the sine function, we can reasonably expect that we would see similar behavior demonstrated at other equilibrium points of the form $(\pm\sqrt{k\pi}, k\pi)$, for $k = 3, 4, \ldots$. Note further that all equilibria lie along the parabola $x_2 = x_1^2$, as dictated by (6.2.9). Finally, it is evident that $(0, 0)$ is an unstable equilibrium, though the precise behavior of solutions nearby is not entirely clear from the plot.

Figure 6.6 The direction field for the system (6.2.7) with equilibrium points $(0, 0)$, $(-\sqrt{\pi}, \pi)$, $(\sqrt{\pi}, \pi)$, $(-\sqrt{2\pi}, 2\pi)$, and $(\sqrt{2\pi}, 2\pi)$.
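The family of equilibria found in the solution above is easy to enumerate and verify numerically. The following Python sketch is illustrative. It lists the points $(\pm\sqrt{k\pi}, k\pi)$ up to a chosen $k$ and evaluates the right-hand side of (6.2.7) at each one. The names `G` and `equilibria` are assumptions.

```python
import math


def G(x1, x2):
    """Right-hand side of (6.2.7)."""
    return (math.sin(x2), x2 - x1 ** 2)


def equilibria(k_max):
    """Equilibrium points (+/- sqrt(k*pi), k*pi) for k = 0, 1, ..., k_max."""
    points = [(0.0, 0.0)]
    for k in range(1, k_max + 1):
        r = math.sqrt(k * math.pi)
        points += [(-r, k * math.pi), (r, k * math.pi)]
    return points


if __name__ == "__main__":
    for p in equilibria(2):
        # Each residual should be zero up to floating-point rounding.
        print(p, "->", G(*p))
```

Listing the points this way also helps in choosing a plot window: for `k_max = 2`, all five points fall inside $[-3, 3] \times [-1, 8]$.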


Indeed, it is apparent that we desire more precision, and not just in the vicinity of $(0, 0)$; our study of the linearization of a system of nonlinear differential equations in the next section will enable a much more rigorous understanding of a system’s behavior near any equilibrium point.

Example 6.2.2

Consider the system of differential equations given by

$$x_1' = -x_1 + x_1 x_2^2$$
$$x_2' = -2x_2 + x_2 x_1 \tag{6.2.10}$$

Determine all equilibrium solutions of the system, plot the direction field, and discuss the behavior of solutions near at least two of the equilibrium solutions.

Solution. In the standard way, to find the equilibrium solutions we set $x_1' = x_2' = 0$ and solve the nonlinear system of equations

$$0 = -x_1 + x_1 x_2^2 = x_1(-1 + x_2^2) \tag{6.2.11}$$
$$0 = -2x_2 + x_2 x_1 = x_2(-2 + x_1) \tag{6.2.12}$$

From (6.2.12), we see that either $x_2 = 0$ or $x_1 = 2$. If $x_2 = 0$, substituting this value for $x_2$ in (6.2.11), it follows that $x_1 = 0$, so one equilibrium solution is $(0, 0)$. If $x_1 = 2$, then (6.2.11) implies that $-1 + x_2^2 = 0$, which in turn shows that $x_2 = \pm 1$. Thus, two additional equilibrium solutions have been found: $(2, 1)$ and $(2, -1)$. A reasonable window for plotting the direction field for this system is $[-2, 4] \times [-3, 3]$, since this will include the three equilibrium solutions we have found at $(0, 0)$, $(2, 1)$, and $(2, -1)$.

Figure 6.7 The direction field for the system (6.2.10) with equilibrium points $(0, 0)$, $(2, 1)$, and $(2, -1)$.

As we see in figure 6.7, it appears that $(0, 0)$ is a stable attracting fixed point and that both coordinate axes are straight-line solutions. This observation is not surprising if we also think about linear approximations: for $x_1$ and $x_2$ near zero, $x_1 x_2^2$ and $x_1 x_2$ will be extremely small, and thus for such values the nonlinear system (6.2.10) can be approximated by the linear system

$$x_1' = -x_1$$
$$x_2' = -2x_2 \tag{6.2.13}$$

The linear system (6.2.13) has the obvious solutions $x_1(t) = e^{-t}$ and $x_2(t) = e^{-2t}$, which lead to the observed behavior near $(0, 0)$ in the nonlinear system. From figure 6.7, it also appears that the equilibrium points $(2, 1)$ and $(2, -1)$ are saddle points.

From all of our work in this section, we see that equilibrium solutions remain a vital part of our understanding of any system, whether linear or not. In addition, the picture painted by the direction field is fundamental to understanding the behavior of solutions to a nonlinear system. And yet, we are left desiring more detail than the direction field can provide. In section 6.3 we will develop the concept of the linearization of a system in order to link our understanding of linear systems to the behavior of nonlinear systems near equilibrium points. Furthermore, in section 6.4, we will generalize Euler’s method for single differential equations in order to apply it to systems to generate approximate solutions.

6.2.1 Plotting direction fields of nonlinear systems using Maple

The Maple syntax used to generate the plots in this section is essentially identical to that discussed for direction fields for linear systems in section 3.4.1. As always, we use the DEtools package, and load it with the command

> with(DEtools):

To define the system of differential equations from example 6.2.1 in Maple, we use the command

> sys := diff(x[1](t),t) = sin(x[2](t)), diff(x[2](t),t) = x[2](t) - x[1](t)^2;

The system of differential equations of interest is now stored in “sys.” The direction field may now be generated by the command

> DEplot([sys], [x[1](t),x[2](t)], t=-1..1, x[1]=-3..3, x[2]=-1..8, arrows=large, color=gray);


In plots in section 6.2, we have also included the equilibrium points. These may be generated by the pointplot command, which requires us to load the plots package and to supply the points as a list of coordinate pairs. For example, the syntax

> with(plots):
> pointplot([[0,0], [sqrt(Pi),Pi], [-sqrt(Pi),Pi], [sqrt(2*Pi),2*Pi], [-sqrt(2*Pi),2*Pi]], symbol=circle, symbolsize=7);

will produce a plot of just these five points in the plane. To superimpose these points on the direction field, we can assign names to each plot and then display them together. Giving the respective plots the names DF and EQsol, we can use the display command as follows. Note the use of colons, rather than semicolons, to suppress output when we assign names to the plots.

> DF := DEplot([sys], [x[1](t),x[2](t)], t=-1..1, x[1]=-3..3, x[2]=-1..8, arrows=large, color=gray):
> EQsol := pointplot([[0,0], [sqrt(Pi),Pi], [-sqrt(Pi),Pi], [sqrt(2*Pi),2*Pi], [-sqrt(2*Pi),2*Pi]], symbol=circle, symbolsize=7):
> display(DF, EQsol);

This combination of commands results in the output shown at left in figure 6.8. If desired, we can now sketch trajectories by hand. Maple has the capacity to include such trajectories, given initial conditions. For example, if we are given the initial conditions $\mathbf{x}(0) = (2, 6)$ and $(-2, 6)$, we can modify the earlier DEplot command to

> DEplot([sys], [x[1](t),x[2](t)], t=-2..2, [[x[1](0)=2,x[2](0)=6], [x[1](0)=-2,x[2](0)=6]], x[1]=-3..3, x[2]=-1..8, arrows=medium, color=gray);

Figure 6.8 At left, the direction field for the system (6.2.7) with equilibrium points $(0, 0)$, $(-\sqrt{\pi}, \pi)$, $(\sqrt{\pi}, \pi)$, $(-\sqrt{2\pi}, 2\pi)$, and $(\sqrt{2\pi}, 2\pi)$. At the right, the same direction field with trajectories through $(2, 6)$ and $(-2, 6)$ is included.

This most recent command, when saved and displayed simultaneously with the above plot of equilibrium solutions, results in the right-hand plot in figure 6.8. As a reminder, we always expect to experiment some with the window in which the plot is displayed: the range of x- and y-values certainly affects how clearly the direction field is revealed, and the range of t-values impacts how much of each trajectory is plotted. As the most recent section shows, a study of a system’s equilibrium points is a helpful guide for choosing a window in which to display a plot.

Exercises 6.2

In exercises 1–7, (a) determine all equilibrium solutions, (b) use Maple to plot the direction field, and (c) from the direction field, visually estimate whether equilibrium solutions are stable or unstable and discuss the long-term behavior of solutions.

1. $x_1' = x_2 - 2x_1 x_2$, $\quad x_2' = 4x_1 x_2 - x_1$

2. $x_1' = 4 - x_2^2$, $\quad x_2' = 1 - x_1 + x_2$

3. $x_1' = \cos x_2$, $\quad x_2' = 1 - \sin x_1$

4. $x_1' = 2x_1 - x_2$, $\quad x_2' = -4x_1 + 2x_2$

5. $x_1' = e^{-x_2}$, $\quad x_2' = 1/(1 + x_1^2)$

6. $x_1' = \ln(2 + x_2)$, $\quad x_2' = x_1^2 + x_2$

7. $x_1' = x_2 - x_1^2$, $\quad x_2' = x_1 - 8x_2^2$

8. Recall from section 6.1 that the nonlinear system of differential equations

$$W' = -0.75W + 0.25MW$$
$$M' = 0.5M - 0.1MW$$

models the numbers of wolves and moose (each measured in hundreds) in a predator–prey situation. Determine all equilibrium solutions to this system, plot an appropriate direction field in a computer algebra system, and discuss the apparent long-term behavior of the wolf and moose populations.

9. Recall that if $x_1 = \theta$ is the angle that the arm of a pendulum forms with the vertical axis (as shown in figure 6.2) and $x_2 = x_1' = \theta'$, then $x_1$ and $x_2$ satisfy the nonlinear system of differential equations

$$x_1' = x_2$$
$$x_2' = -\frac{g}{L}\sin x_1$$

Let $g = 9.8$ m/s² and assume that the length of the arm is $L = 2$ m. Determine all equilibrium solutions to this system, plot an appropriate direction field in a computer algebra system, and discuss the long-term behavior of solutions to the system. Be sure to relate your answers directly to the behavior of the pendulum and corresponding initial conditions.

6.3 Linear approximations of nonlinear systems

In our first look at nonlinear systems in the preceding section, we considered the system
\[ x_1' = x_2 - x_1^3, \qquad x_2' = x_1 - x_2^3 \tag{6.3.1} \]
and observed informally that near the origin, where $\mathbf{x} \approx \mathbf{0}$, we can drop the $x_1^3$ and $x_2^3$ terms so that (6.3.1) can be approximated by the linear system $\mathbf{x}' = A\mathbf{x}$ where
\[ A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \tag{6.3.2} \]
In this section, we make this notion of linear approximation of nonlinear systems more precise and use this approach to classify the stability of equilibria of nonlinear systems.

An important idea in calculus is that all well-behaved functions are locally linear. That is, they appear linear when viewed up close; the line the function emulates is the tangent line to the curve at the point on which we focus. In particular, for a function $f(x)$ that is differentiable at the value $x = a$, $f(x) \approx L(x)$ for $x$ near $a$, where
\[ L(x) = f(a) + f'(a)(x - a) \tag{6.3.3} \]
The function $L(x)$ is usually called the tangent line approximation or linearization of $f$ at $x = a$.


We encounter the very same ideas in multivariable calculus. For a differentiable vector function $\mathbf{r} : \mathbb{R} \to \mathbb{R}^3$ given by
\[ \mathbf{r}(t) = \begin{bmatrix} f(t) \\ g(t) \\ h(t) \end{bmatrix} \]
for values of $t$ near some fixed value $a$, the curve in space that $\mathbf{r}(t)$ generates can be approximated by the tangent line to the curve. In particular, $\mathbf{r}(t) \approx \mathbf{L}(t)$ where
\[ \mathbf{L}(t) = \mathbf{r}(a) + \mathbf{r}'(a)(t - a) = \begin{bmatrix} f(a) + f'(a)(t - a) \\ g(a) + g'(a)(t - a) \\ h(a) + h'(a)(t - a) \end{bmatrix} \tag{6.3.4} \]
for $t$ near $a$. As in the case of the scalar function $f$, $\mathbf{L}$ is called the tangent line approximation or linearization of $\mathbf{r}$ at $t = a$.

Similarly, for a differentiable real-valued function of several variables $F : \mathbb{R}^2 \to \mathbb{R}$ given by $z = F(x, y)$, $F(x, y)$ can be approximated by its tangent plane for $(x, y)$ near some fixed point $(a, b)$. That is, we have the approximation $F(x, y) \approx L(x, y)$ where
\[ L(x, y) = F(a, b) + F_x(a, b)(x - a) + F_y(a, b)(y - b) \tag{6.3.5} \]
$L$ is called the tangent plane approximation or linearization of $F$ at $(a, b)$.

There is obviously a great deal of similarity in the algebraic forms of the linear approximations given in (6.3.3), (6.3.4), and (6.3.5). How can we apply these ideas to systems of nonlinear differential equations? The next example, in which we reconsider (6.3.1), suggests one approach. Because of the pending use of partial derivatives, we will temporarily use the notation $\mathbf{x} = [x_1 \ x_2]^T = [x \ y]^T$.

Example 6.3.1 Consider the system of differential equations
\[ x' = f(x, y) = y - x^3, \qquad y' = g(x, y) = x - y^3 \tag{6.3.6} \]
Determine linear approximations to both $f(x, y)$ and $g(x, y)$ at the point $(1, 1)$. Then explain how these linear approximations may be combined to form an overall linear approximation of (6.3.6) near $(1, 1)$.

Solution. In section 6.2, we considered this same system (using $x_1$ and $x_2$ for the functions, instead of $x$ and $y$) and learned that the equilibrium solutions to the system are $(-1, -1)$, $(0, 0)$, and $(1, 1)$. As noted at the start of this section, we have already considered a linear approximation of the system at $(0, 0)$. Here, we focus on the behavior of solutions near the equilibrium solution $(1, 1)$. To first approximate $x' = f(x, y) = y - x^3$ near $(1, 1)$, we use (6.3.5) to find the tangent plane approximation. Noting that $f_x(x, y) = -3x^2$ and $f_y(x, y) = 1$,


it follows that $f_x(1, 1) = -3$ and $f_y(1, 1) = 1$. Moreover, $f(1, 1) = 0$ since $(1, 1)$ is an equilibrium solution of the system. Now, it follows that for $(x, y)$ near $(1, 1)$,
\[ f(x, y) \approx f(1, 1) + f_x(1, 1)(x - 1) + f_y(1, 1)(y - 1) = 0 - 3(x - 1) + 1(y - 1) \tag{6.3.7} \]
Similar ideas applied to $y' = g(x, y) = x - y^3$ show that for $(x, y)$ near $(1, 1)$,
\[ g(x, y) \approx g(1, 1) + g_x(1, 1)(x - 1) + g_y(1, 1)(y - 1) = 0 + 1(x - 1) - 3(y - 1) \tag{6.3.8} \]
If we now consider the overall system (6.3.6), for $(x, y)$ near $(1, 1)$ we have the approximation
\[ \begin{aligned} x' &= f(x, y) \approx -3(x - 1) + 1(y - 1) \\ y' &= g(x, y) \approx 1(x - 1) - 3(y - 1) \end{aligned} \tag{6.3.9} \]
Using the fact that both equations in (6.3.9) are linear and writing this system in matrix form with $\mathbf{x} = [x \ y]^T$, we have
\[ \mathbf{x}' \approx \begin{bmatrix} -3 & 1 \\ 1 & -3 \end{bmatrix} \begin{bmatrix} x - 1 \\ y - 1 \end{bmatrix} = \begin{bmatrix} -3 & 1 \\ 1 & -3 \end{bmatrix} \mathbf{x} + \begin{bmatrix} -3 & 1 \\ 1 & -3 \end{bmatrix} \begin{bmatrix} -1 \\ -1 \end{bmatrix} = \begin{bmatrix} -3 & 1 \\ 1 & -3 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 2 \\ 2 \end{bmatrix} \tag{6.3.10} \]
Hence we have approximated the original nonlinear system with a linear one by writing it in the form $\mathbf{x}' \approx A(\mathbf{x} - \mathbf{a}) = A\mathbf{x} + \mathbf{b}$, where $\mathbf{b} = -A\mathbf{a}$, for $\mathbf{x}$ near $\mathbf{a}$.

Because we have found that we may approximate the system (6.3.6) with the linear system (6.3.10), we can now use our understanding of linear systems to determine the behavior of the nonlinear system near the chosen equilibrium point. Specifically, the fact that the eigenvalues of the matrix $A$ in (6.3.10) are $\lambda = -2$ and $\lambda = -4$ tells us that the equilibrium solution $(1, 1)$ of (6.3.1) is a stable, attracting node, as we initially conjectured graphically from figure 6.4.

Moreover, the approach we have taken in example 6.3.1 may certainly be generalized. Any nonlinear system of two differential equations may be written in the form
\[ \mathbf{x}' = \mathbf{F}(\mathbf{x}) \tag{6.3.11} \]
where $\mathbf{F}$ is a function of the form $\mathbf{F}(\mathbf{x}) = \mathbf{F}(x, y) = (f(x, y), g(x, y))$. Given an equilibrium solution of (6.3.11) at $\mathbf{a} = (a, b)$, notice that $\mathbf{F}(\mathbf{a}) = \mathbf{0}$; in particular, $f(a, b) = g(a, b) = 0$.


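If, as in example 6.3.1, we approximate $f$ and $g$ near $(a, b)$ with
\[ f(x, y) \approx f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b) = f_x(a, b)(x - a) + f_y(a, b)(y - b) \]
\[ g(x, y) \approx g(a, b) + g_x(a, b)(x - a) + g_y(a, b)(y - b) = g_x(a, b)(x - a) + g_y(a, b)(y - b) \]
we observe that in matrix form we have
\[ \mathbf{x}' = \mathbf{F}(\mathbf{x}) = \begin{bmatrix} f(x, y) \\ g(x, y) \end{bmatrix} \approx \begin{bmatrix} f_x(a, b)(x - a) + f_y(a, b)(y - b) \\ g_x(a, b)(x - a) + g_y(a, b)(y - b) \end{bmatrix} = \begin{bmatrix} f_x(a, b) & f_y(a, b) \\ g_x(a, b) & g_y(a, b) \end{bmatrix} \begin{bmatrix} x - a \\ y - b \end{bmatrix} \]
In matrix notation, we have written that $\mathbf{x}' = \mathbf{F}(\mathbf{x}) \approx J(\mathbf{a})(\mathbf{x} - \mathbf{a})$ for $\mathbf{x}$ near $\mathbf{a}$, where $\mathbf{a}$ is an equilibrium point of the original system and $J(\mathbf{a})$ is a matrix with constant entries. The matrix $J(\mathbf{a})$, which is defined by
\[ J(\mathbf{a}) = \begin{bmatrix} f_x(a, b) & f_y(a, b) \\ g_x(a, b) & g_y(a, b) \end{bmatrix} \tag{6.3.12} \]
is known as the Jacobian matrix of the function $\mathbf{F}$ evaluated at the point $(a, b)$. More generally, for any differentiable function $\mathbf{F} : \mathbb{R}^n \to \mathbb{R}^m$ given by $\mathbf{F}(\mathbf{x}) = \mathbf{F}(x_1, \ldots, x_n) = (f_1(x_1, \ldots, x_n), \ldots, f_m(x_1, \ldots, x_n))$, the Jacobian matrix $J(\mathbf{x})$ is given by
\[ J(\mathbf{x}) = \begin{bmatrix} \partial f_1/\partial x_1 & \partial f_1/\partial x_2 & \cdots & \partial f_1/\partial x_n \\ \partial f_2/\partial x_1 & \partial f_2/\partial x_2 & \cdots & \partial f_2/\partial x_n \\ \vdots & \vdots & \ddots & \vdots \\ \partial f_m/\partial x_1 & \partial f_m/\partial x_2 & \cdots & \partial f_m/\partial x_n \end{bmatrix} \tag{6.3.13} \]
The Jacobian enables us to write the linearization of any differentiable function $\mathbf{F}$ for $\mathbf{x}$ near a point $\mathbf{a}$ as
\[ \mathbf{F}(\mathbf{x}) \approx \mathbf{F}(\mathbf{a}) + J(\mathbf{a})(\mathbf{x} - \mathbf{a}) \tag{6.3.14} \]
which is remarkably similar to the tangent line approximation (6.3.3). Note that we must evaluate the Jacobian matrix at the point $\mathbf{a}$ of interest; moreover, if we are working with a nonlinear system of differential equations with equilibrium point $\mathbf{a}$, it follows that $\mathbf{F}(\mathbf{a}) = \mathbf{0}$, so that we have
\[ \mathbf{x}' = \mathbf{F}(\mathbf{x}) \approx J(\mathbf{a})(\mathbf{x} - \mathbf{a}) \tag{6.3.15} \]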
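The linearization (6.3.15) can also be checked numerically. The following is a minimal Python sketch, not part of the Maple or Excel workflow used in this text: it approximates the Jacobian of the system (6.3.6) at the equilibrium point $(1, 1)$ by central differences and then computes its eigenvalues with NumPy. The names `F` and `jacobian_fd` and the difference step `eps` are illustrative choices of ours.

```python
import numpy as np

def F(x):
    """Right-hand side of system (6.3.6): x' = y - x^3, y' = x - y^3."""
    return np.array([x[1] - x[0]**3, x[0] - x[1]**3])

def jacobian_fd(F, a, eps=1e-6):
    """Approximate the Jacobian of F at a by central differences.

    This is an assumed numerical stand-in for the exact partial
    derivatives that appear in (6.3.12).
    """
    a = np.asarray(a, dtype=float)
    n = a.size
    J = np.zeros((F(a).size, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        # Column j holds the partial derivatives with respect to x_j.
        J[:, j] = (F(a + step) - F(a - step)) / (2 * eps)
    return J

a = np.array([1.0, 1.0])            # equilibrium point of (6.3.6)
J = jacobian_fd(F, a)
eigs = np.sort(np.linalg.eigvals(J).real)
print(np.round(J, 6))               # approximately [[-3, 1], [1, -3]]
print(np.round(eigs, 6))            # approximately [-4, -2]
```

Both eigenvalues are negative, which agrees with the classification of $(1, 1)$ as a stable node found in example 6.3.1.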


This entire discussion of linearizing nonlinear systems is important for several reasons. One is that it demonstrates how we can take a problem we do not fully understand (the nonlinear system) and gain more knowledge of it by approximating the system near a point of interest with a simpler (linear) system that we do understand. Moreover, because we have completely classified the stability of equilibria of linear systems through the eigenvalues of the system's matrix, we classify the equilibria of nonlinear systems by doing so for the corresponding linearization. We will use the same terminology and classification scheme for equilibria of nonlinear systems that we established for linear ones in sections 3.4 and 3.5. Two examples now follow to demonstrate these ideas in greater detail.

Example 6.3.2 Given the system of differential equations
\[ x_1' = 9x_2 - x_2^2, \qquad x_2' = x_1 \]
determine all equilibrium points of the system, evaluate the Jacobian at each equilibrium point, and find a corresponding linearization of the system in order to analyze the behavior of trajectories near each equilibrium point and the stability of equilibria. Finally, plot the direction field of the given system to confirm the observations made.

Solution. First, we observe that $\mathbf{x}' = \mathbf{F}(\mathbf{x})$ for
\[ \mathbf{F}(\mathbf{x}) = \mathbf{F}(x_1, x_2) = (f(x_1, x_2), g(x_1, x_2)) = (9x_2 - x_2^2, \ x_1) \]
Setting $\mathbf{x}' = \mathbf{0}$, it follows that $x_1 = 0$ and $x_2(9 - x_2) = 0$, so that the equilibrium points of the system are $(0, 0)$ and $(0, 9)$. Taking the appropriate partial derivatives, the Jacobian of $\mathbf{F}$ is
\[ J(\mathbf{x}) = \begin{bmatrix} 0 & 9 - 2x_2 \\ 1 & 0 \end{bmatrix} \]
Therefore, for values of $x_1$ and $x_2$ near the equilibrium point $\mathbf{a} = (0, 0) = \mathbf{0}$, we have that $\mathbf{x}' = \mathbf{F}(\mathbf{x}) \approx J(\mathbf{0})(\mathbf{x} - \mathbf{0})$, or
\[ \mathbf{x}' \approx \begin{bmatrix} 0 & 9 \\ 1 & 0 \end{bmatrix} \mathbf{x} \]
For this linear system, the eigenvalues of the matrix $J(\mathbf{0})$ are $\lambda = 3$ and $\lambda = -3$, so the origin is a saddle point and therefore unstable. Moreover, we expect there to be two approximately straight-line solutions (along the respective eigenvectors of $J(\mathbf{0})$) that pass through the origin, along one of which the solution tends toward $(0, 0)$ while on the other the solution is repelled away from $(0, 0)$.

For $x_1$ and $x_2$ near the equilibrium point $\mathbf{a} = (0, 9)$, we have that $\mathbf{x}' = \mathbf{F}(\mathbf{x}) \approx J(\mathbf{a})(\mathbf{x} - \mathbf{a})$, or
\[ \mathbf{x}' \approx \begin{bmatrix} 0 & -9 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} x_1 - 0 \\ x_2 - 9 \end{bmatrix} = \begin{bmatrix} 0 & -9 \\ 1 & 0 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 81 \\ 0 \end{bmatrix} \]

Figure 6.9 The direction field for Example 6.3.2.

For this nonhomogeneous linear system, the eigenvalues of the matrix $J(0, 9)$ are $\lambda = 3i$ and $\lambda = -3i$. Because the eigenvalues are purely imaginary, it follows that the equilibrium point $(0, 9)$ is a stable center. Near this point, we expect to see trajectories orbit the point in approximately elliptical loops. All of our observations are confirmed by the graphical behavior evidenced in figure 6.9.
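The eigenvalue computations in example 6.3.2 are easy to confirm by machine. The sketch below is a hedged illustration in Python with NumPy (our choice of tool, not one used elsewhere in this text); it evaluates the Jacobian found in the example at each equilibrium point and also forms the constant vector $\mathbf{b} = -J(\mathbf{a})\mathbf{a}$ of the linearization at $(0, 9)$. The function name `jacobian` is ours.

```python
import numpy as np

def jacobian(x1, x2):
    """Jacobian of F(x1, x2) = (9*x2 - x2**2, x1) from example 6.3.2."""
    return np.array([[0.0, 9.0 - 2.0 * x2],
                     [1.0, 0.0]])

# Eigenvalues at the two equilibrium points (0, 0) and (0, 9).
eigs_origin = np.linalg.eigvals(jacobian(0.0, 0.0))
eigs_09 = np.linalg.eigvals(jacobian(0.0, 9.0))

# Constant term of the linearization x' ~ J(a) x + b at a = (0, 9).
a = np.array([0.0, 9.0])
b = -jacobian(*a) @ a

print(eigs_origin)   # real eigenvalues of opposite sign: a saddle
print(eigs_09)       # purely imaginary pair: a center for the linearization
print(b)             # the vector (81, 0)
```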

Example 6.3.3 For the system of differential equations
\[ x_1' = \sin x_2, \qquad x_2' = x_2 - x_1^2 \tag{6.3.16} \]
determine all equilibrium points of the system, evaluate the Jacobian at each equilibrium point, and find a corresponding linearization of the system in order to analyze the behavior of trajectories near each equilibrium point and the stability of equilibria. Finally, plot the direction field of the given system to confirm the observations made.

Solution. The given system is the same one that we studied in example 6.2.1 in the preceding section. There we discovered that for any equilibrium solution $\mathbf{x} = (x_1, x_2)$, $x_2$ must be an integer multiple of $\pi$ and $x_1^2 = x_2$, so that $x_2$ must be non-negative. Thus, the equilibrium solutions have the form $(\sqrt{k\pi}, k\pi)$, $(-\sqrt{k\pi}, k\pi)$ for $k = 0, 1, 2, \ldots$. Letting $\mathbf{x}' = \mathbf{F}(\mathbf{x}) = (\sin x_2, \ x_2 - x_1^2)$, it follows that the Jacobian of $\mathbf{F}$ is
\[ J(\mathbf{x}) = \begin{bmatrix} 0 & \cos x_2 \\ -2x_1 & 1 \end{bmatrix} \]


For values of $x_1$ and $x_2$ near the equilibrium point $\mathbf{a} = (0, 0) = \mathbf{0}$, we have that $\mathbf{x}' = \mathbf{F}(\mathbf{x}) \approx J(\mathbf{0})(\mathbf{x} - \mathbf{0})$, or
\[ \mathbf{x}' \approx \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix} \mathbf{x} \]
The eigenvalues of the matrix $J(\mathbf{0})$ are $\lambda = 0$ and $\lambda = 1$, so the origin is unstable, because the real eigenvalue $\lambda = 1 > 0$ will drive solutions away from the origin as $t \to \infty$. Moreover, because $\lambda = 0$ is an eigenvalue of $J(\mathbf{0})$, it also follows that all solutions near $\mathbf{0}$ are approximately straight-line solutions.

For $x_1$ and $x_2$ near the equilibrium point $\mathbf{a} = (\sqrt{\pi}, \pi)$, we have that $\mathbf{x}' = \mathbf{F}(\mathbf{x}) \approx J(\mathbf{a})(\mathbf{x} - \mathbf{a})$, or
\[ \mathbf{x}' \approx \begin{bmatrix} 0 & -1 \\ -2\sqrt{\pi} & 1 \end{bmatrix} \begin{bmatrix} x_1 - \sqrt{\pi} \\ x_2 - \pi \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ -2\sqrt{\pi} & 1 \end{bmatrix} \mathbf{x} + \begin{bmatrix} \pi \\ \pi \end{bmatrix} \]
The eigenvalues of the matrix $J(\sqrt{\pi}, \pi)$ are approximately $\lambda = 2.448$ and $\lambda = -1.448$, and so the equilibrium point $(\sqrt{\pi}, \pi)$ is a saddle point and unstable. However, if we consider the equilibrium point $\mathbf{a} = (-\sqrt{\pi}, \pi)$, we have that $\mathbf{x}' = \mathbf{F}(\mathbf{x}) \approx J(\mathbf{a})(\mathbf{x} - \mathbf{a})$, or
\[ \mathbf{x}' \approx \begin{bmatrix} 0 & -1 \\ 2\sqrt{\pi} & 1 \end{bmatrix} \begin{bmatrix} x_1 + \sqrt{\pi} \\ x_2 - \pi \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ 2\sqrt{\pi} & 1 \end{bmatrix} \mathbf{x} + \begin{bmatrix} \pi \\ \pi \end{bmatrix} \]
In this case, the eigenvalues of the matrix $J(-\sqrt{\pi}, \pi)$ are approximately $\lambda = 0.5 \pm 1.815i$. Because these complex eigenvalues have positive real parts, it follows that the equilibrium solution $(-\sqrt{\pi}, \pi)$ is a spiral source and is unstable.

If we continue exploring equilibrium points of the form $(\pm\sqrt{k\pi}, k\pi)$, we can show through the Jacobian that whenever $k$ is odd, the point $(\sqrt{k\pi}, k\pi)$ is a saddle point and the point $(-\sqrt{k\pi}, k\pi)$ is a spiral source. Conversely, whenever $k$ is positive and even, $(\sqrt{k\pi}, k\pi)$ is a spiral source and the point $(-\sqrt{k\pi}, k\pi)$ is a saddle. In particular, every equilibrium point of the system is unstable. These observations are all confirmed in the direction field shown in figure 6.10.

Through linear approximation, the tools we developed for linear systems enable us to understand and classify the stability of equilibria and behavior of solutions near equilibrium points for nonlinear systems. In the next section, we will explore how to actually compute approximate solutions via Euler's method for systems.
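The pattern among the equilibria of (6.3.16) can be explored with a short program. The following Python sketch is an illustration under our own assumptions: it classifies an equilibrium from the trace and determinant of the Jacobian $J = \begin{bmatrix} 0 & \cos x_2 \\ -2x_1 & 1 \end{bmatrix}$, using the fact that the eigenvalues satisfy $\lambda^2 - (\operatorname{tr} J)\lambda + \det J = 0$. The function name `classify` and the label strings are ours, and the branches rely on the trace being $1$ for this particular system.

```python
import math

def classify(x1, x2):
    """Classify an equilibrium (x1, x2) of system (6.3.16).

    Uses trace and determinant of J = [[0, cos x2], [-2 x1, 1]].
    The trace is always 1 here, so no equilibrium can be stable.
    """
    tr = 1.0
    det = 2.0 * x1 * math.cos(x2)      # 0*1 - cos(x2)*(-2*x1)
    disc = tr * tr - 4.0 * det         # discriminant of the characteristic polynomial
    if abs(det) < 1e-12:
        return "degenerate"            # one eigenvalue is 0, the other is 1
    if det < 0:
        return "saddle"                # real eigenvalues of opposite sign
    return "spiral source" if disc < 0 else "nodal source"

for k in range(0, 5):
    root = math.sqrt(k * math.pi)
    for x1 in sorted({root, -root}, reverse=True):
        print(k, round(x1, 4), classify(x1, k * math.pi))
```

For $k = 0$ the determinant vanishes, which corresponds to the zero eigenvalue found at the origin in the example.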
Figure 6.10 The direction field for the system (6.3.16) with equilibrium points $(0, 0)$, $(-\sqrt{\pi}, \pi)$, $(\sqrt{\pi}, \pi)$, $(-\sqrt{2\pi}, 2\pi)$, and $(\sqrt{2\pi}, 2\pi)$.

Exercises 6.3

In exercises 1–6, find the Jacobian of the given function, $\mathbf{F}$.

1. $\mathbf{F}(x_1, x_2) = (x_1^2 + x_2, \ x_1 - x_2^2)$

2. $\mathbf{F}(x_1, x_2) = (e^{2x_1}x_2, \ \cos x_1 + \sin x_2)$

3. $\mathbf{F}(x_1, x_2) = (x_2 - 2x_1x_2, \ 4x_1x_2 - x_1)$

4. $\mathbf{F}(x_1, x_2) = (4 - x_2^2, \ 1 - x_1^2)$

5. $\mathbf{F}(x_1, x_2, x_3) = (1/(1 + x_1^2 + x_2^2 + x_3^2), \ e^{-x_1^2 - x_2^2 - x_3^2}, \ 2x_1 - 3x_2^2 + x_3^4)$

6. $\mathbf{F}(x_1, x_2, x_3) = (3x_1 - x_2 + 4x_3, \ x_1 + x_2 - 2x_3, \ -2x_1 + 5x_2 - x_3)$

In exercises 7–10, find the linearization of the given function, $\mathbf{F}(x_1, x_2)$, at the given point $\mathbf{a}$.

7. $\mathbf{F}(x_1, x_2) = (x_1^2 + x_2, \ x_1 - x_2^2)$, $\mathbf{a} = (1, -1)$

8. $\mathbf{F}(x_1, x_2) = (x_2 e^{2x_1}, \ \cos x_1 + \sin x_2)$, $\mathbf{a} = (\pi/2, 0)$

9. $\mathbf{F}(x_1, x_2) = (x_2 - 2x_1x_2, \ 4x_1x_2 - x_1)$, $\mathbf{a} = (1/2, 1/4)$

10. $\mathbf{F}(x_1, x_2) = (4 - x_2^2, \ 1 - x_1^2)$, $\mathbf{a} = (-1, 2)$

In exercises 11–17, find all equilibrium points of the system, determine the linearization of the given system near each equilibrium point, classify the stability of each equilibrium point, and compare your work to a plot of the direction field for the system.¹

11. $x_1' = x_2 - 2x_1x_2$, $x_2' = 4x_1x_2 - x_1$

12. $x_1' = 4 - x_2^2$, $x_2' = 1 - x_1 + x_2$

13. $x_1' = \cos x_2$, $x_2' = 1 - \sin x_1$

14. $x_1' = 2x_1 - x_2$, $x_2' = -4x_1 + 2x_2$

15. $x_1' = e^{-x_2}$, $x_2' = 1/(1 + x_1^2)$

16. $x_1' = \ln(2 + x_2)$, $x_2' = x_1^2 + x_2$

17. $x_1' = x_2 - x_1^2$, $x_2' = x_1 - 8x_2^2$

18. Recall from section 6.1 that the nonlinear system of differential equations
\[ W' = -0.75W + 0.25MW, \qquad M' = 0.5M - 0.1MW \]
models the numbers of wolves and moose (each measured in hundreds) in a predator–prey situation. Determine the linearization of the system near the nonzero equilibrium solution, classify the stability of this equilibrium, and discuss the long-term behavior of the wolf and moose populations.²

19. Recall that if $x_1 = \theta$ is the angle that the arm of a pendulum forms with the positive $x$-axis (as shown in figure 6.2) and $x_2 = x_1' = \theta'$, then $x_1$ and $x_2$ satisfy the nonlinear system of differential equations
\[ x_1' = x_2, \qquad x_2' = -\frac{g}{L}\sin x_1 \]
Let $g = 9.8$ m/s$^2$ and $L = 2$ m. Determine the linearization of the system near the equilibrium solution at zero and at least one other equilibrium solution, classify the stability of these equilibria, and discuss the long-term behavior of the pendulum. Be sure to relate your answers directly to the behavior of the pendulum and corresponding initial conditions.

20. In example 6.2.2, we considered the system of differential equations given by
\[ x_1' = -x_1 + x_1x_2^2, \qquad x_2' = -2x_2 + x_2x_1 \]
Determine the linearization of the system near each equilibrium solution, classify the stability of each equilibrium point, and discuss the behavior of solutions nearby.

¹ Note that in the exercises of section 6.2, equilibrium solutions were found and direction fields were plotted in exercises 1–7, which correspond to the same systems of differential equations given here.

² In the exercises of section 6.2, equilibrium solutions were found and the direction field was plotted for this system in exercise 8; similarly, see the results of exercise 9 in section 6.2 for use in exercise 19.


6.4 Euler’s method for nonlinear systems

Just as we experienced with single nonlinear initial-value problems such as
\[ y' = ye^{-y} + 1, \qquad y(0) = 1 \tag{6.4.1} \]
or
\[ y' = t^2 + y^2 + 1, \qquad y(0) = -1 \tag{6.4.2} \]
that we could not solve explicitly, in the past two sections we have encountered systems of nonlinear differential equations for which solutions to corresponding initial-value problems cannot be determined analytically. We therefore desire to explore ways to estimate solutions to these problems.

For IVPs such as (6.4.1) and (6.4.2), we know that we may estimate a solution to the problem through Euler's method. Recall from section 2.6 that for any first-order IVP in the form $y' = f(t, y)$, $y(t_0) = y_0$, given a step-size $h$ we are able to generate the sequence of points $(t_1, y_1), \ldots, (t_n, y_n)$ such that
\[ t_{n+1} = t_n + h \quad \text{and} \quad y_{n+1} = y_n + hf(t_n, y_n), \quad \text{for } n \ge 0 \tag{6.4.3} \]
where $y_n \approx y(t_n)$. That is, $y_n$ approximates the solution $y$ to the initial-value problem at the point where $t = t_n$.

To explore how we can extend Euler's method to systems of differential equations, let us consider the initial-value problem given by
\[ \begin{aligned} x' &= 9y - y^2, & x(0) &= 1 \\ y' &= x, & y(0) &= 8 \end{aligned} \tag{6.4.4} \]
Here, we choose to use the notation $\mathbf{x} = [x \ y]^T$ rather than $[x_1 \ x_2]^T$ because we will be using subscripts to label approximations to the component solutions $x(t)$ and $y(t)$. Keeping in mind that $x$ and $y$ are each implicit functions of $t$, we can view (6.4.4) as being of the form
\[ \begin{aligned} x' &= f(t, x, y), & x(t_0) &= x_0 \\ y' &= g(t, x, y), & y(t_0) &= y_0 \end{aligned} \tag{6.4.5} \]
To see how to approximate solutions to this system of IVPs, let us reconsider our earlier studies of single differential equations. In section 2.6, we considered the equation $y' = f(t, y)$ in a first-order IVP and emphasized the fact that Euler's method relies on following the tangent line approximation to $y(t)$ at each step. In particular, if we have some approximation $y_n$ to the solution $y$ at the $t$-value $t_n$, then to move along the tangent line to the next approximation $(t_{n+1}, y_{n+1})$, it follows that
\[ y_{n+1} = y_n + \Delta y = y_n + \frac{\Delta y}{\Delta t} \cdot \Delta t = y_n + m \cdot \Delta t \tag{6.4.6} \]
where $m$ is the slope at each step of our approximation given by $m = y' = f(t, y)$ in the differential equation that we are attempting to solve. Specifically, given the approximation $y_n$ at time $t_n$, the slope of the tangent line to the solution curve at this point is $f(t_n, y_n)$. Therefore, using this value for $m$ in (6.4.6) and letting $h = \Delta t$ be the step size, we have
\[ y_{n+1} = y_n + hf(t_n, y_n) \tag{6.4.7} \]
An essentially identical approach will work for the system (6.4.5). In particular, given the initial condition $(x_0, y_0)$ and a step-size $h$, we can generate the approximate solution $(x(t_1), y(t_1)) \approx (x_1, y_1)$ by taking
\[ \begin{aligned} x_1 &= x_0 + h \cdot f(t_0, x_0, y_0) \\ y_1 &= y_0 + h \cdot g(t_0, x_0, y_0) \end{aligned} \tag{6.4.8} \]
The only difference between this approach and our experience with Euler's method for a single equation is that we have to update two approximations at once, as estimates of both $x(t_n)$ and $y(t_n)$ are needed to generate approximations of $x(t_{n+1})$ and $y(t_{n+1})$. We generalize our latest observation in (6.4.8) for a step from the approximation $(x_n, y_n)$ to the approximation $(x_{n+1}, y_{n+1})$ by
\[ \begin{aligned} x_{n+1} &= x_n + h \cdot f(t_n, x_n, y_n) \\ y_{n+1} &= y_n + h \cdot g(t_n, x_n, y_n) \end{aligned} \tag{6.4.9} \]
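The update rule (6.4.9) translates almost directly into a short program. The sketch below is written in Python purely for illustration (this text carries out the same computation in Excel); the function name `euler_system` and its argument order are our own choices. It is applied to the IVP (6.4.4) for three steps with $h = 0.1$.

```python
def euler_system(f, g, t0, x0, y0, h, n_steps):
    """Apply the update rule (6.4.9) n_steps times.

    Returns a list of (t, x, y) tuples, starting with the initial data.
    """
    t, x, y = t0, x0, y0
    points = [(t, x, y)]
    for _ in range(n_steps):
        # Evaluate both slopes at the current point before updating either variable.
        dx = f(t, x, y)
        dy = g(t, x, y)
        x, y = x + h * dx, y + h * dy
        t = t + h
        points.append((t, x, y))
    return points

# The IVP (6.4.4): x' = 9y - y^2, y' = x, with x(0) = 1, y(0) = 8.
f = lambda t, x, y: 9 * y - y**2
g = lambda t, x, y: x

steps = euler_system(f, g, 0.0, 1.0, 8.0, 0.1, 3)
for t, x, y in steps:
    print(f"{t:.1f}  {x:.5f}  {y:.5f}")
```

Note that both slopes are computed from $(x_n, y_n)$ before either variable is overwritten; updating $x$ first and then using the new $x$ to update $y$ would be a different method.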

At the end of this section, we will discuss the implementation of Euler's method for systems in Excel. For now, we simply report the results of such an implementation here to see the approximations generated. For the original system we considered above,
\[ \begin{aligned} x' &= 9y - y^2, & x(0) &= 1 \\ y' &= x, & y(0) &= 8 \end{aligned} \tag{6.4.10} \]
recall that this system was also studied in example 6.3.2 in section 6.3. There we observed that the equilibrium solution $(0, 9)$ is a stable center of the system and that we expect elliptical orbits nearby. If, for the IVP (6.4.10), we choose a step-size of $h = 0.1$ and take enough steps to complete the expected loop in the orbit, we see the abbreviated data in table 6.1. In particular, we notice that after taking a sufficient number of steps to loop back around to near the initial condition $(1, 8)$, we have not returned to this point; in fact, we have missed it appreciably, with the two nearest approximations being $(0.527, 6.259)$ and $(2.243, 6.312)$.

If we decrease the step size $h$ and take more steps, we can improve the accuracy of the approximation. Doing so with $h = 0.01$ results in the values in table 6.2. We see that the approximate trajectory has completed one full loop and has nearly returned to pass through the point $(1, 8)$ where the trajectory began. This behavior is more consistent with what we expected based on the classification of the equilibrium point $(0, 9)$ as a stable center through linearization in the preceding section.

Table 6.1 Euler's method applied to (6.4.10) with step-size h = 0.1

t_n     x_n             y_n
0       1               8
0.1     1.8             8.1
0.2     2.529           8.28
0.3     3.12516         8.5329
...     ...             ...
2       −1.146540202    6.373703158
2.1     0.527383445     6.259049138
2.2     2.242958058     6.311787483
2.3     3.93970067      6.536083289

Table 6.2 Euler's method applied to (6.4.10) with step-size h = 0.01

t_n     x_n             y_n
0       1               8
0.01    1.08            8.01
0.02    1.159299        8.0208
0.03    1.237838674     8.03239299
...     ...             ...
2.09    0.934286677     7.878994865
2.1     1.022610614     7.888337731
2.11    1.110302289     7.898563837
2.12    1.197299927     7.90966686
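The effect of the step size on the IVP (6.4.10) can be reproduced with a few lines of code. The Python sketch below is an illustration only; the helper names `euler_state` and `distance_to_start` are ours. It runs Euler's method to $t = 2.1$ with each step size and measures how far the resulting approximation lies from the starting point $(1, 8)$.

```python
import math

def euler_state(h, n_steps):
    """Euler's method (6.4.9) for x' = 9y - y^2, y' = x starting at (1, 8).

    Returns the approximation (x_n, y_n) after n_steps steps of size h.
    """
    x, y = 1.0, 8.0
    for _ in range(n_steps):
        # The right-hand side is evaluated with the old x and y for both updates.
        x, y = x + h * (9 * y - y * y), y + h * x
    return x, y

def distance_to_start(point):
    """Euclidean distance from a point to the initial condition (1, 8)."""
    return math.hypot(point[0] - 1.0, point[1] - 8.0)

coarse = euler_state(0.1, 21)     # approximation at t = 2.1 with h = 0.1
fine = euler_state(0.01, 210)     # approximation at t = 2.1 with h = 0.01
print(coarse, distance_to_start(coarse))
print(fine, distance_to_start(fine))
```

The smaller step size should land much closer to $(1, 8)$, in line with the comparison of the two tables.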

In the first example with Euler's method we just completed, we observe one of the major weaknesses of the method: when a large number of steps are needed and some of the changes in $x$ and $y$ are large, a substantial amount of error accumulates in the calculations, since the error introduced by each tangent-line step compounds with the errors from the steps before it. While more sophisticated numerical methods exist


(and are studied in chapter 7), for now we limit ourselves to Euler's method in order to first get an intuitive feel for the numerical behavior of approximate solutions. Another example follows.

Example 6.4.1 For the system of initial-value problems given by
\[ \begin{aligned} x' &= y - x^3, & x(0) &= 2 \\ y' &= x - y^3, & y(0) &= -1 \end{aligned} \tag{6.4.11} \]
estimate the solution to the IVP up to $t = 5$ using $h = 0.1$ and comment on the behavior of the trajectory.

Solution. In the given problem, if we take the perspective that $x' = f(t, x, y)$ and $y' = g(t, x, y)$, then it follows that $f(t, x, y) = y - x^3$ and $g(t, x, y) = x - y^3$. Applying (6.4.9) with $h = 0.1$, we have
\[ \begin{aligned} x_{n+1} &= x_n + 0.1 \cdot (y_n - x_n^3) \\ y_{n+1} &= y_n + 0.1 \cdot (x_n - y_n^3) \end{aligned} \]
Beginning this iteration with $x_0 = 2$ and $y_0 = -1$, we generate the following table.

t_n     x_n            y_n
0       2              −1
0.1     1.1            −0.7
0.2     0.8969         −0.5557
0.3     0.769180708    −0.448849846
...     ...            ...
4.7     0.994536765    0.994533281
4.8     0.995620126    0.995618024
4.9     0.996490144    0.996488877
5       0.997188297    0.997187534

In the table, we see behavior consistent with the fact that the equilibrium point $(1, 1)$ of the system is a stable attracting node. In addition, the numerical data is in agreement with the graphical behavior we expect based on the direction field in figure 6.4 where we first considered the given nonlinear system. This behavior is also seen in the plot in figure 6.11, which shows the $(x_n, y_n)$ data for $n = 0, \ldots, 50$ generated by Excel.

Figure 6.11 The trajectory for the IVP (6.4.11) generated by Euler's method with h = 0.1.

Example 6.4.1 shows that when small changes in $t$ lead to very small changes in $x(t)$ and $y(t)$, such as near a stable, attracting node, Euler's method produces reasonable approximations without having to resort to extremely small $h$-values. We also see the importance of having a theoretical understanding of the expected behavior in advance of executing computations in order to check the reasonableness of our results.

6.4.1 Implementing Euler's method for systems in Excel

Just as we did for single initial-value problems in section 2.6.1, we will use Excel to generate approximate solutions to system IVPs. In this setting, given an initial-value problem
\[ \begin{aligned} x' &= f(t, x, y), & x(t_0) &= x_0 \\ y' &= g(t, x, y), & y(t_0) &= y_0 \end{aligned} \tag{6.4.12} \]
we seek approximations $x_1, x_2, \ldots$ and $y_1, y_2, \ldots$ such that $(x_n, y_n) \approx (x(t_n), y(t_n))$, where $t_{n+1} = t_n + h$ for some chosen step-size $h$. In particular, we have shown that these approximations are generated using Euler's method by the rule
\[ \begin{aligned} x_{n+1} &= x_n + h \cdot f(t_n, x_n, y_n) \\ y_{n+1} &= y_n + h \cdot g(t_n, x_n, y_n) \end{aligned} \tag{6.4.13} \]
In a spreadsheet, we will view the following data: step number $n$, step-size $h$, $t_n$, $x_n$, $y_n$, $f(t_n, x_n, y_n)$, and $g(t_n, x_n, y_n)$, where $t_n$ is the value of the independent variable and $(x_n, y_n) \approx (x(t_n), y(t_n))$ is an estimate of the solution to the IVP at the value $t_n$. Each row of the spreadsheet contains all of these values for the corresponding $n$-value. From this, we naturally build subsequent approximations $(x_{n+1}, y_{n+1})$ based on the preceding row. We will demonstrate the development of such an Excel spreadsheet for the particular example
\[ \begin{aligned} x' &= y - x^3, & x(0) &= 2 \\ y' &= x - y^3, & y(0) &= -1 \end{aligned} \tag{6.4.14} \]
that we investigated in example 6.4.1. To begin, we establish names for the various columns, say in cells A1 through G1, and see on our screen in Excel the information below.

     A    B    C      D      E      F             G
1    n    h    t_n    x_n    y_n    f(x_n,y_n)    g(x_n,y_n)

In most of the examples we consider with Euler's method, the system will be autonomous (i.e., $t$ does not appear explicitly in the functions $f$ and $g$), and therefore we choose to omit $t$ from the column labels for $f(t_n, x_n, y_n)$ and $g(t_n, x_n, y_n)$. In the subsequent row 2, we now enter the given data at step zero. In particular, in cell A2 we enter the step number ("0"), in B2 the chosen step-size ("0.1"), in C2 the starting $t$-value ("0"), in D2 the starting $x$-value ("2"), and in E2 the starting $y$-value ("-1"). Next, in F2, we apply the function $f(t, x, y)$ to get the slope at the point at this step. That is, since in this IVP $f(t, x, y) = y - x^3$, we enter in F2 the command "= E2 - D2^3". Similarly, since $g(t, x, y) = x - y^3$, in G2 we enter "= D2 - E2^3". Now our spreadsheet appears as follows.

     A    B      C      D      E      F             G
1    n    h      t_n    x_n    y_n    f(x_n,y_n)    g(x_n,y_n)
2    0    0.1    0      2      -1     -9            3

In the next row, row 3, we may now build subsequent entries based on existing data. To increase the step number, in A3 we enter "= A2 + 1". Since the step-size stays constant throughout, in B3 we input "= B2". Since the next $t$-value will be the preceding $t$-value plus the step-size ($t_1 = t_0 + h$), we enter in C3 the command "= C2 + B2". To compute the next $x$-value in cell D3 from Euler's method, we know that $x_1 = x_0 + hf(t_0, x_0, y_0)$. Hence, in D3 we write "= D2 + B2*F2". Similarly, to compute $y_1 = y_0 + hg(t_0, x_0, y_0)$, in cell E3 we enter "= E2 + B2*G2". Finally, we also need values of $f(t_1, x_1, y_1)$ and $g(t_1, x_1, y_1)$ for use in the following step. This involves simply updating the functions $f(t, x, y)$ and $g(t, x, y)$ at the given $t$-, $x$-, and $y$-values, so we select cell F2, copy it, and paste it into cell F3. Equivalently, we can directly enter in F3 "= E3 - D3^3". We can similarly copy G2 into G3, or in G3 enter "= D3 - E3^3". Below is the current state of our spreadsheet.

     A    B      C      D      E       F             G
1    n    h      t_n    x_n    y_n     f(x_n,y_n)    g(x_n,y_n)
2    0    0.1    0      2      -1      -9            3
3    1    0.1    0.1    1.1    -0.7    -2.031        1.443

Now we can harness the power of Excel to compute as many subsequent steps as we like. By using the mouse to highlight the entries in row 3 (cells A3 through G3), and then placing the cursor on the bottom right corner of the last highlighted cell, G3, we can click and drag downward to fill subsequent rows with similar calculations. For example, doing so through row 7 yields the following.

     A    B      C      D            E             F             G
1    n    h      t_n    x_n          y_n           f(x_n,y_n)    g(x_n,y_n)
2    0    0.1    0      2            -1            -9            3
3    1    0.1    0.1    1.1          -0.7          -2.031        1.443
4    2    0.1    0.2    0.8969       -0.5557       -1.2771929    1.0685015
5    3    0.1    0.3    0.7691807    -0.4488498    -0.9039271    0.8596087
6    4    0.1    0.4    0.6787879    -0.3628889    -0.6756426    0.7265762
7    5    0.1    0.5    0.6112237    -0.2902313    -0.5185811    0.6356711
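For readers working outside a spreadsheet, the same row-by-row construction can be imitated in a short program. The following Python sketch is only an illustration of the worksheet logic; the function name `spreadsheet_rows` and the tuple layout `(n, h, t, x, y, f, g)` are our own choices, with comments indicating the Excel formula that each line mirrors.

```python
def spreadsheet_rows(h=0.1, n_rows=6):
    """Build rows (n, h, t, x, y, f, g) mirroring the worksheet for (6.4.14)."""
    f = lambda x, y: y - x**3      # column F: "= E - D^3"
    g = lambda x, y: x - y**3      # column G: "= D - E^3"
    n, t, x, y = 0, 0.0, 2.0, -1.0            # worksheet row 2: the initial data
    rows = [(n, h, t, x, y, f(x, y), g(x, y))]
    for _ in range(n_rows - 1):
        prev = rows[-1]
        n = prev[0] + 1                       # column A: "= A2 + 1"
        t = prev[2] + prev[1]                 # column C: "= C2 + B2"
        x = prev[3] + prev[1] * prev[5]       # column D: "= D2 + B2*F2"
        y = prev[4] + prev[1] * prev[6]       # column E: "= E2 + B2*G2"
        rows.append((n, h, t, x, y, f(x, y), g(x, y)))
    return rows

rows = spreadsheet_rows()
for r in rows:
    print("{:d}  {:.2f}  {:.1f}  {:.7f}  {:.7f}  {:.7f}  {:.7f}".format(*r))
```

Calling `spreadsheet_rows(h=0.1, n_rows=51)` would carry the computation to $t = 5$, and changing `h` plays the same role as editing cell B2.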

As we have noted previously, besides the relative simplicity of these computations, there are further advantages Excel offers. One is that changing one appropriately chosen cell will update all of our computations. For example, if we are interested in the change induced by a different step-size, say $h = 0.01$, all we need to do is enter "0.01" in cell B2, and every other cell will update accordingly.

In addition, if we desire to see the graphical results of our work, we can use Excel's Chart Wizard. To plot the trajectory generated by our approximations, we can simultaneously highlight the $x$ and $y$ columns in our spreadsheet above (cells D2 through D7 and E2 through E7), and then go to the Insert menu and select Chart (alternatively, we may click on the Chart Wizard icon on the toolbar). In the prompt window that arises, we choose "XY (Scatter)" and select one of the graph style options at the right. By clicking "Next" in a few subsequent windows (in which advanced users can avail themselves of more options), we eventually get to a final window where our graph appears along with the option to "Finish." After clicking "Finish," the graph will appear in the spreadsheet and may be moved around by clicking and dragging it accordingly. We see the resulting plot displayed as in figure 6.12.

Figure 6.12 An Excel plot of an approximate solution to the IVP (6.4.14).

Exercises 6.4

In exercises 1–7, use Euler's method with the stated $h$-value to estimate the solution of the given system of IVPs at the given $t$-value. Compare your work to a plot of the direction field for the system and the classification of any relevant equilibrium solutions.³

1. $x' = y - 2xy$, $x(0) = 0.75$; $y' = 4xy - x$, $y(0) = 0.5$; $t = 1$, $h = 0.1$

2. $x' = 4 - y^2$, $x(0) = -2$; $y' = 1 - x + y$, $y(0) = -1$; $t = 1$, $h = 0.05$

3. $x' = \cos y$, $x(0) = 2$; $y' = 1 - \sin x$, $y(0) = 3$; $t = 1$, $h = 0.1$

4. $x' = 2x - y$, $x(0) = 1$; $y' = -4x + 2y$, $y(0) = 1$; $t = 1$, $h = 0.1$

5. $x' = e^{-y}$, $x(0) = 0$; $y' = 1/(1 + x^2)$, $y(0) = 0$; $t = 1$, $h = 0.05$

6. $x' = \ln(2 + y)$, $x(0) = -1$; $y' = x^2 + y$, $y(0) = -0.5$; $t = 1$, $h = 0.1$

7. $x' = y - x^2$, $x(0) = 1$; $y' = x - 8y^2$, $y(0) = 0.75$; $t = 1$, $h = 0.05$

³ In the exercises of section 6.2, equilibrium solutions were found and direction fields were plotted in exercises 1–7, which correspond to the same systems of differential equations given here. Similarly, in section 6.3, equilibrium solutions were classified through linearization in exercises 11–17, which also correspond to these systems.

8. Recall from section 6.1 that the nonlinear system of differential equations
\[ W' = -0.75W + 0.25MW, \qquad M' = 0.5M - 0.1MW \]
models the numbers of wolves and moose (each measured in hundreds) in a predator–prey model where time is measured in years. Assume that at time $t = 0$ there are 250 moose and 550 wolves. Estimate the numbers of moose and wolves present at $t = 3$, $6$, and $9$ years using a step-size of (a) $h = 0.1$, and (b) $h = 0.01$. Discuss your findings and describe the behavior of the trajectory.⁴

⁴ In the exercises of section 6.2, equilibrium solutions were found and the direction field was plotted for this system in exercise 8.

6.5 For further study

6.5.1 The damped pendulum

In our development of the pendulum equation, we learned that for a pendulum with an arm of length $L$ and bob of mass $m$, the angle $\theta$ that the arm forms with the positive $x$-axis at time $t$ satisfies the IVP
\[ L\theta'' = -g\sin\theta, \qquad \theta(0) = \theta_0, \quad \theta'(0) = \theta_0' \tag{6.5.1} \]
provided that we assume no friction is present in the screw from which the pendulum hangs and there is no air drag on the bob. Here, we investigate the effects of such resistance on the pendulum's behavior.

(a) Under the natural assumption that the friction or damping that is present is directly proportional to the velocity of the bob along the arc of motion, explain why it follows that the pendulum is governed by the IVP
\[ L\theta'' = -g\sin\theta - c\theta', \qquad \theta(0) = \theta_0, \quad \theta'(0) = \theta_0' \tag{6.5.2} \]
where $c$ is the damping constant.

(b) Using the standard change of variables, convert the nonlinear second-order IVP (6.5.2) to a nonlinear system of first-order IVPs. Write the system in the form $\mathbf{x}' = \mathbf{F}(\mathbf{x})$ for an appropriate function $\mathbf{F}$.

(c) Determine all equilibrium solutions of the system in (b). Are the equilibria different from those of the undamped pendulum?

(d) Let a given pendulum have an arm of length $L = 1$ m, and recall that $g = 9.8$ m/s$^2$. For each of the $c$-values $c = 0.5$, $c = 1$, $c = 2$, and $c = 5$, plot the direction field for the system in (b) as well as trajectories that correspond to the stated initial conditions below. For each plot, discuss the behavior of the pendulum over time and how damping affects the observed behavior.
(i) $\theta(0) = 2$, $\theta'(0) = 0$
(ii) $\theta(0) = 4$, $\theta'(0) = 0$
(iii) $\theta(0) = 2$, $\theta'(0) = 10$
(iv) $\theta(0) = 2$, $\theta'(0) = -10$
In addition, be sure to discuss the physical interpretation of each set of initial conditions and how these conditions affect the trajectories.

(e) Using $c = 1$, find the linear approximation of the system in (b) at two different equilibrium points, one that is stable and another that is unstable. Discuss the graphical behavior of the two linear systems you find near the equilibrium points and how this compares to the plot of the corresponding direction field in (d).

(f) Again using $c = 1$ and $L = 1$, apply Euler's method with $h = 0.01$ to the system in (b) with the initial conditions $\theta(0) = 2$, $\theta'(0) = 10$. Experiment with how many steps are needed in order to have the approximations approach the stable equilibrium $(2\pi, 0)$, plot the approximations you compute, and compare the results to the appropriate direction field in (d).

6.5.2 Competitive species

In our development of the predator–prey equations, we used the fundamental assumption that the prey population would, in the absence of a predator, grow according to an exponential model, and similarly that the predator population would decay exponentially if no prey were available. These hypotheses led us to equations of the form

$$x' = ax - cxy, \qquad y' = -by + dxy \tag{6.5.3}$$

where $x$ is the prey population and $y$ represents the number of predators. Recall that the terms $-cxy$ and $dxy$ represent a fraction of the number of predator–prey interactions that are, respectively, harmful or beneficial to the two species.

In what follows, we consider a similar scenario where, instead of one species preying on the other, two species are competing for resources. In this setting, species interactions (modeled by "$xy$") are harmful to both species. In addition, rather than assuming exponential growth or decay for the individual populations, we explore the effects of the assumption that each population on its own grows logistically.

(a) Assume that in the absence of another species competing for resources, the population $x(t)$ grows according to the logistic model

$$x' = ax\left(1 - \frac{x}{A}\right)$$

where $a$ and $A$ are positive constants ($a$ is the population's growth constant and $A$ is its carrying capacity). Similarly, for a second population $y(t)$, assume that without another competing species present $y(t)$ is governed by the model

$$y' = by\left(1 - \frac{y}{B}\right)$$

where $b$ and $B$ are positive constants. By viewing a fraction of the interactions $xy$ as harmful, we can subtract from each of the above differential equations a term proportional to $xy$ (say $\alpha xy$ from $x'$ and $\beta xy$ from $y'$) to account for this competition. Do so, and show that the populations $x(t)$ and $y(t)$ satisfy the system of equations given by

$$x' = ax\left(1 - \frac{1}{A}x - \frac{\alpha}{a}y\right), \qquad y' = by\left(1 - \frac{1}{B}y - \frac{\beta}{b}x\right) \tag{6.5.4}$$

(b) Throughout the remaining questions, we assume that $x$ and $y$ represent populations measured in thousands. We explore the impact of different constants in the equations, as well as various initial conditions. In (6.5.4), let $a = 0.5$, $b = 0.25$, $A = 5$, $B = 2$, $\alpha = 0.04$, and $\beta = 0.02$. Find all equilibrium points of the system. (Hint: there are more than two equilibria.)

(c) At each of the equilibrium points determined in (b), compute the linearization of the system (6.5.4), and hence determine the stability of the equilibrium point.

(d) In an appropriate window, plot the direction field for the system (6.5.4) and discuss how the direction field supports your conclusions regarding the stability of various equilibrium points in (c). Discuss the long-term behavior of the two populations for several different initial conditions.

(e) With the initial conditions $x(0) = 2$, $y(0) = 2$, use Euler's method for systems to estimate the values of the populations at a range of time values. Use a step size of $h = 0.1$ and compare your results to the plot in (d).

(f) In (6.5.4), use the parameter values given in (b), except change the carrying capacity of the second population to $B = 15$. Respond to prompts (b), (c), (d), and (e) for this scenario and compare and contrast the updated system with the first one considered. In the new situation, which population will dominate in the long run? Why do you think this is the case?

(g) In (6.5.4), let $a = 0.5$, $b = 0.25$, $A = 5$, and $B = 2$, but now adjust the parameters $\alpha$ and $\beta$ to reflect greater competition for resources by setting $\alpha = 0.4$ and $\beta = 0.2$. Respond to prompts (b), (c), (d), and (e) for this scenario and compare and contrast the updated system with the first one considered. In the new situation, which population is more likely to dominate in the long run? For which initial conditions is the weaker population able to survive?

(h) Suppose there are three different species $x$, $y$, and $z$, all competing for resources. Under the assumption that population interactions $xy$ and $xz$ are harmful to $x$, and so on, what system of differential equations models the behavior of the three species?

7 Numerical methods for differential equations

7.1 Motivating problems

In previous chapters, we have learned to solve a wide range of differential equations. Primarily, our focus has been on linear differential equations: first-order linear equations, higher order linear equations with constant coefficients, and systems of linear equations with constant coefficients. Indeed, we have learned through a variety of techniques that under the proviso that a differential equation or system is linear, we can almost always find a solution.

The situation is much more complicated for nonlinear equations. For example, while we can use an integrating factor to solve the linear first-order differential equation $y' + y = t$, if we replace $y$ by $y^2$, the differential equation

$$y' + y^2 = t \tag{7.1.1}$$

is no longer linear. In addition, (7.1.1) is not separable, nor is it exact. With none of our established analytical methods available, it appears that we cannot solve this differential equation. If faced with the related initial-value problem

$$y' + y^2 = t, \qquad y(0) = 1 \tag{7.1.2}$$

we know that we can visually approximate a solution by plotting the direction field that corresponds to the differential equation. Moreover, we learned in section 2.6 that, writing the differential equation as $y' = f(t, y)$, we can generate a sequence of estimates of the values of the solution $y(t)$ at discrete $t$-values separated by a step-size $h$ according to the rule

$$t_{n+1} = t_n + h \quad \text{and} \quad y_{n+1} = y_n + hf(t_n, y_n), \quad \text{for } n \ge 0 \tag{7.1.3}$$

The algorithm that generates this sequence of approximations is called Euler's method.

We encounter the same difficulties with higher order differential equations. While we can solve almost any higher order linear equation with constant coefficients, such as

$$y'' + a_1 y' + a_0 y = f(t)$$

nonlinear equations are much more difficult. For instance, as discussed in section 6.1, a simple pendulum may be modeled by the nonlinear second-order initial-value problem

$$\theta'' + \frac{g}{L} \sin\theta = 0, \qquad \theta(0) = \theta_0, \quad \theta'(0) = \theta_1 \tag{7.1.4}$$

where $\theta(t)$ is the angle the arm of the pendulum forms with a vertical axis at time $t$. In chapter 6, we introduced several different approaches to approximate the solution to (7.1.4); each was based on converting the second-order equation to a system of first-order equations and approximating the solution to the resulting system.

Finally, nonlinear systems of differential equations are important in their own right. A prominent example is the predator–prey equations, discussed in detail in section 6.1, where two populations $M(t)$ and $W(t)$ (in hundreds) are modeled by the following system of nonlinear first-order initial-value problems:

$$\begin{aligned}
W' &= W(-0.75 + 0.25M), & W(0) &= 3 \\
M' &= M(0.5 - 0.1W), & M(0) &= 7
\end{aligned} \tag{7.1.5}$$

As with the pendulum, the nonlinearity of these equations makes determining an analytical solution (i.e., formulas for $W(t)$ and $M(t)$) impossible, and therefore we must instead be content to find approximate solutions. In section 6.4, we introduced an extension of Euler's method that can be used to produce some basic approximations to the solution of a system of nonlinear initial-value problems such as (7.1.5). But through a variety of examples considered in sections 2.6 and 6.4, we have seen that Euler's method has a big downside: each step produces significant error, and each step compounds the error from the preceding step. To get an accurate approximation using Euler's method, a very small step-size $h$ is usually needed.

With modern computing power so readily available, we might be tempted to simply take very small $h$-values in this approach and be content to do thousands of computations to get estimates of solutions. But taking smaller and smaller values of $h$ proves to be an unsatisfactory approach for many reasons, perhaps most significantly because as numbers get extremely small, computers have great difficulty distinguishing them from zero and major round-off errors can result.

Instead, we will seek to develop approaches in the spirit of Euler's method, but more sophisticated in that they naturally reduce the error that comes from using a step of $h = \Delta t$. Our goal is to develop numerical methods for initial-value problems (for first-order, higher order, and systems) that, given a step-size $h$, produce an accurate approximate solution to the initial-value problem. We desire that the methods give reasonably good approximations for small (but not too small) values of $h$, while at the same time not requiring too many calculations. In the upcoming sections, we will discuss problems of the nature of (7.1.2), (7.1.4), (7.1.5), and more, and develop and apply algorithms that produce acceptable approximations to solutions.
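To make the trade-off between step-size and effort concrete, the following is a minimal Python sketch of rule (7.1.3) applied to the IVP (7.1.2), rewritten as $y' = t - y^2$. The function name `euler` and the particular step-sizes tried are illustrative choices, not something prescribed by the text.

```python
def euler(f, t0, y0, h, n_steps):
    """Apply rule (7.1.3) n_steps times; return the lists of t- and y-values."""
    ts, ys = [t0], [y0]
    for _ in range(n_steps):
        # y_{n+1} = y_n + h f(t_n, y_n), computed before t is advanced
        ys.append(ys[-1] + h * f(ts[-1], ys[-1]))
        ts.append(ts[-1] + h)
    return ts, ys


def f(t, y):
    # IVP (7.1.2) written in the form y' = f(t, y)
    return t - y**2


# Estimate y(1) with successively smaller step-sizes (illustrative values).
for h, n in [(0.1, 10), (0.01, 100), (0.001, 1000)]:
    ts, ys = euler(f, 0.0, 1.0, h, n)
    print(f"h = {h}: {n} steps, y(1) is approximately {ys[-1]:.6f}")
```

Each tenfold reduction in $h$ costs ten times as many evaluations of $f$, which is the motivation for seeking methods that do better at a fixed step-size.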

7.2 Beyond Euler’s method

To approach an initial-value problem that we cannot solve by standard techniques, such as separation of variables or integrating factors, we have learned that one option is to use Euler's method. Given the IVP

$$y' = f(t, y), \qquad y(t_0) = y_0$$

this algorithm generates a sequence of points $(t_1, y_1), (t_2, y_2), \ldots, (t_n, y_n)$ according to the rule

$$y_{n+1} = y_n + hf(t_n, y_n) \quad \text{for } n \ge 0 \tag{7.2.1}$$

where $t_{n+1} = t_n + h$. Each $y_n$ is an approximation to the value of the actual solution $y$ at the value $t_n$. That is, $y(t_n) \approx y_n$.

Euler's method is developed by using the standard tangent line approximation in calculus. While this is instructive and intuitive, Euler's method is the least accurate of the many available methods. In this section, we begin to develop algorithms beyond Euler's method in an effort to increase the accuracy of our approximations while actually decreasing the number of computations we execute.

Before we develop new approaches, we first revisit some important concepts from numerical integration in calculus. These ideas not only remind us of key issues in approximation techniques, but also inform our efforts to approximate solutions to initial-value problems. Given a continuous function $f(t)$ on an interval $[t_0, t_0 + h]$, there are several basic approximations to $\int_{t_0}^{t_0+h} f(t)\,dt$. Specifically,

$$\begin{aligned}
\int_{t_0}^{t_0+h} f(t)\,dt &\approx h \cdot f(t_0) && \text{(left endpoint rule)} \\
\int_{t_0}^{t_0+h} f(t)\,dt &\approx h \cdot f(t_0 + h) && \text{(right endpoint rule)} \\
\int_{t_0}^{t_0+h} f(t)\,dt &\approx h \cdot \frac{f(t_0) + f(t_0 + h)}{2} && \text{(trapezoid rule)} \\
\int_{t_0}^{t_0+h} f(t)\,dt &\approx h \cdot f\left(t_0 + \frac{h}{2}\right) && \text{(midpoint rule)}
\end{aligned}$$

It is a standard exercise in calculus to show that the left and right endpoint rules are the least accurate approximations of the four, while the midpoint rule is the best. While one can make sophisticated arguments using Taylor series to justify claims about the size of the error in such an approximation, visual arguments are just as convincing: sampling $f$ at the midpoint of the interval usually balances the behavior of the function and leads to the best approximation of the integral of the four options above.

There is a direct link between the numerical approximation of definite integrals and numerical methods, such as Euler's method, that estimate solutions to initial-value problems. Given the IVP

$$y'(t) = f(t, y), \qquad y(t_0) = y_0$$

if we integrate both sides of the differential equation with respect to $t$ from $t = t_0$ to $t = t_0 + h$ for some $h > 0$, then

$$\int_{t_0}^{t_0+h} y'(t)\,dt = \int_{t_0}^{t_0+h} f(t, y(t))\,dt \tag{7.2.2}$$

Integrating the left side of (7.2.2), we have

$$y(t_0 + h) - y(t_0) = \int_{t_0}^{t_0+h} f(t, y(t))\,dt$$

or equivalently

$$y(t_0 + h) = y(t_0) + \int_{t_0}^{t_0+h} f(t, y(t))\,dt \tag{7.2.3}$$

Estimating the integral in (7.2.3) with the left endpoint rule,

$$y(t_0 + h) \approx y(t_0) + hf(t_0, y(t_0)) \tag{7.2.4}$$

Using the initial condition $y(t_0) = y_0$, it follows that

$$y(t_0 + h) \approx y_0 + hf(t_0, y_0) \tag{7.2.5}$$

which is precisely the first step in Euler's method. That is, we have shown that stepping from $t = t_0$ to $t = t_0 + h$ along the solution $y(t)$ can be equivalently achieved by estimating the value of a definite integral. Moreover, Euler's method can be viewed as arising naturally from estimating the required definite integral through a left endpoint rule. As such, it is not surprising that Euler's method is not an accurate approach, for neither is the left endpoint rule for approximating integrals. The availability of the trapezoid and midpoint rules as better approximations leads us to consider two improvements upon Euler's method.

7.2.1 Heun's method

To improve on Euler's method, we return to (7.2.3), and instead estimate the definite integral on the right-hand side with the trapezoid rule. Doing so, we find

$$y(t_0 + h) \approx y(t_0) + h \cdot \frac{f(t_0, y(t_0)) + f(t_0 + h, y(t_0 + h))}{2} \tag{7.2.6}$$

The difficulty in (7.2.6) is that the last term in the approximation on the right-hand side involves $y(t_0 + h)$, the very quantity we are trying to estimate. One way to view what is occurring in this approach is that we are trying to use not only the slope at $(t_0, y_0)$, computed as $f(t_0, y_0)$, but also the slope at $(t_0 + h, y(t_0 + h))$. While we do not know $y(t_0 + h)$ exactly, we can estimate this value using Euler's method. In particular, if we use the fact that $y(t_0) = y_0$ and employ the Euler approximation $y(t_0 + h) \approx y_0 + hf(t_0, y_0)$, then from (7.2.6) we find that

$$y(t_0 + h) \approx y_0 + h \cdot \frac{f(t_0, y_0) + f(t_0 + h, y_0 + hf(t_0, y_0))}{2} \tag{7.2.7}$$

Generalizing (7.2.7) to the situation where we are moving from the known approximation $y(t_n) \approx y_n$ at point $(t_n, y_n)$ to a new approximation $(t_{n+1}, y_{n+1})$ with $t_{n+1} = t_n + h$, we have developed Heun's method given by

$$y_{n+1} = y_n + h \cdot \frac{f(t_n, y_n) + f(t_{n+1}, y_n + hf(t_n, y_n))}{2} \tag{7.2.8}$$

Because this algorithm is more complicated than Euler's method, some additional notation can assist us in its implementation. We first let

$$a_n = f(t_n, y_n) \tag{7.2.9}$$

which is the slope of the solution curve at $(t_n, y_n)$ given by the IVP. We observe that the expression $a_n$ arises twice in (7.2.8), and that we also have to compute $f(t_{n+1}, y_n + ha_n)$. We therefore let

$$b_n = f(t_{n+1}, y_n + ha_n) \tag{7.2.10}$$

It follows that Heun's method is then executed by computing

$$y_{n+1} = y_n + h \cdot \frac{a_n + b_n}{2} \tag{7.2.11}$$

In this light, we see that Heun's method uses the average of two slopes (the slope at $(t_n, y_n)$ and the approximate slope at the Euler-predicted point $(t_{n+1}, y_n + ha_n)$) in order to predict the next value of the solution $y(t)$. We consider an example to demonstrate how Heun's method is implemented and to contrast its results with those from Euler's method.

Example 7.2.1 Execute ten steps of Heun's method with $h = 0.1$ to find an approximate solution of the initial-value problem

$$y' = 2t(2 - y), \qquad y(0) = 1$$

Compare the results to Euler's method as well as the exact solution of the IVP.

Solution. Note first that the given differential equation is both linear and separable. The exact solution of the IVP is $y(t) = 2 - e^{-t^2}$.

To apply Heun's method, we must compute $a_n$, $b_n$, and $y_n$ at each step. To begin, $a_0 = f(t_0, y_0)$. From the stated IVP, $f(t, y) = 2t(2 - y)$ and $(t_0, y_0) = (0, 1)$. Thus,

$$a_0 = 2 \cdot 0 \cdot (2 - 1) = 0$$

In addition, $b_0 = f(t_1, y_0 + ha_0)$, so

$$b_0 = 2 \cdot 0.1 \cdot (2 - (1 + 0.1 \cdot 0)) = 0.2$$

With both $a_0$ and $b_0$ calculated, we can now determine $y_1$ to be

$$y_1 = y_0 + \frac{h}{2}(a_0 + b_0) = 1 + \frac{0.1}{2}(0 + 0.2) = 1.01$$

Repeating these same steps to determine $y_2$, we find that

$$a_1 = f(t_1, y_1) = f(0.1, 1.01) = 2 \cdot 0.1 \cdot (2 - 1.01) = 0.198$$

and

$$b_1 = f(t_2, y_1 + ha_1) = f(0.2, 1.01 + 0.1 \cdot 0.198) = 2 \cdot 0.2 \cdot (2 - 1.0298) = 0.38808$$

so that

$$y_2 = y_1 + \frac{0.1}{2}(a_1 + b_1) = 1.01 + 0.05(0.198 + 0.38808) = 1.039304$$

Implementing the remaining computations in a program such as Excel, it follows that we can generate the values shown in table 7.1. Included in the table are the approximations generated by Euler's method, as well as the errors resulting from both methods, which are computed by comparison to the exact solution of the IVP. For simplicity, we report the results from every other step in each algorithm.

Table 7.1 Euler's method and Heun's method applied to the IVP $y' = 2t(2 - y)$, $y(0) = 1$, using $h = 0.1$

| $t_n$ | Euler $y_n$ | Heun $y_n$ | Solution $y(t_n)$ | Euler error $\lvert y(t_n) - y_n \rvert$ | Heun error $\lvert y(t_n) - y_n \rvert$ |
|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 0 | 0 |
| 0.2 | 1.02 | 1.039304 | 1.039210561 | 0.019210561 | 0.000093439 |
| 0.4 | 1.115648 | 1.147959794 | 1.147856211 | 0.032208211 | 0.000103583 |
| 0.6 | 1.267756544 | 1.302226785 | 1.302323674 | 0.034567130 | 0.000096889 |
| 0.8 | 1.445838152 | 1.472149858 | 1.472707576 | 0.026869424 | 0.000557718 |
| 1 | 1.618293319 | 1.630946606 | 1.632120559 | 0.013827240 | 0.001173953 |
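The example mentions carrying out the remaining steps in a program such as Excel. As an alternative, here is a minimal Python sketch of the same computation; the helper names `euler_step`, `heun_step`, and `run` are illustrative and not part of the text. The Heun update follows (7.2.9) through (7.2.11).

```python
import math


def f(t, y):
    # Right-hand side of the IVP in Example 7.2.1
    return 2 * t * (2 - y)


def exact(t):
    # Exact solution y(t) = 2 - e^{-t^2}
    return 2 - math.exp(-t**2)


def euler_step(f, t, y, h):
    return y + h * f(t, y)


def heun_step(f, t, y, h):
    a = f(t, y)                  # slope a_n from (7.2.9)
    b = f(t + h, y + h * a)      # slope b_n from (7.2.10)
    return y + h * (a + b) / 2   # update (7.2.11)


def run(step, f, t0, y0, h, n_steps):
    """Return [y_0, y_1, ..., y_n] produced by the given one-step rule."""
    ys = [y0]
    for n in range(n_steps):
        ys.append(step(f, t0 + n * h, ys[-1], h))
    return ys


h = 0.1
euler_ys = run(euler_step, f, 0.0, 1.0, h, 10)
heun_ys = run(heun_step, f, 0.0, 1.0, h, 10)

# Report every other step, together with the absolute errors.
for n in range(0, 11, 2):
    t = n * h
    print(f"{t:.1f}  {euler_ys[n]:.9f}  {heun_ys[n]:.9f}  {exact(t):.9f}  "
          f"{abs(exact(t) - euler_ys[n]):.9f}  {abs(exact(t) - heun_ys[n]):.9f}")
```

Note that each Heun step costs two evaluations of $f$, compared with one for Euler's method.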

Obviously, Heun's method is a major improvement over Euler's method. In fact, given that we use the Euler approximation at each step to help forecast the next slope encountered, it is somewhat remarkable how accurate Heun's method is. It can be shown rigorously that the error in Heun's method is a significant improvement over Euler's method by relating the error in the approximation to the step-size $h$; it turns out¹ that the error introduced at each step of Euler's method is proportional to $h^2$, while the error introduced at each step of Heun's method is proportional to $h^3$.

Finally, we might observe that it appears unusual that the error in Heun's method actually drops from $t_4 = 0.4$ to $t_6 = 0.6$, and that the growth in the error slows in Euler's method at the same stage. This is due to the fact that the solution function $y(t) = 2 - e^{-t^2}$ is an increasing function whose concavity changes (from concave up to concave down) at the point $t = 1/\sqrt{2}$; the change in concavity allows the linear approximations to temporarily catch up, instead of having the error continue to increase at an increasing rate.

We have seen that Heun's method is developed using an application of the trapezoid rule in numerical integration. We consider another similar method (based on the midpoint rule) before introducing more sophisticated techniques in section 7.3.

¹ A more formal analysis of errors that shows the dependence on powers of $h$ is discussed in section 7.3.

7.2.2 Modified Euler's method

The midpoint rule is normally more accurate than the trapezoid rule.² Given our experience with Heun's method and its connection to the trapezoid rule, it makes sense to see if we can develop a related method that uses the perspective of the midpoint rule. Recalling (7.2.3),

$$y(t_0 + h) = y(t_0) + \int_{t_0}^{t_0+h} f(t, y(t))\,dt$$

if we use the midpoint rule to estimate the integral, then we have to evaluate the integrand at the midpoint $t_0 + h/2$ of the interval $[t_0, t_0 + h]$. Doing so,

$$y(t_0 + h) \approx y(t_0) + hf\left(t_0 + \frac{h}{2},\, y\left(t_0 + \frac{h}{2}\right)\right) \tag{7.2.12}$$

As with Heun's method, in the context of trying to solve the IVP $y' = f(t, y)$, $y(t_0) = y_0$, only $y(t_0)$ is known. Thus, we do not know, and therefore have to estimate, the value of $y(t_0 + h/2)$ in (7.2.12). We again employ Euler's method and write

$$y\left(t_0 + \frac{h}{2}\right) \approx y(t_0) + \frac{h}{2} f[t_0, y(t_0)] \tag{7.2.13}$$

² On an interval where $f(x)$ has consistent concavity, the midpoint rule is approximately twice as accurate as the trapezoid rule.

Substituting (7.2.13) in (7.2.12) and replacing $y(t_0)$ with $y_0$,

$$y(t_0 + h) \approx y_0 + hf\left(t_0 + \frac{h}{2},\, y_0 + \frac{h}{2} f(t_0, y_0)\right) \tag{7.2.14}$$

Generalizing (7.2.14) to the situation where we are moving from a known approximation $y(t_n) \approx y_n$ at point $(t_n, y_n)$ to the next approximation at $(t_{n+1}, y_{n+1})$, we have developed the Modified Euler method given by

$$y_{n+1} = y_n + hf\left(t_n + \frac{h}{2},\, y_n + \frac{h}{2} f(t_n, y_n)\right) \tag{7.2.15}$$

As with Heun's method, some additional notation assists us in tracking our computations. Let $a_n = f(t_n, y_n)$ and

$$c_n = y_n + \frac{h}{2} a_n$$

so that

$$y_{n+1} = y_n + hf\left(t_n + \frac{h}{2},\, c_n\right) \tag{7.2.16}$$

We consider an example in order to see the implementation of the Modified Euler method and to compare its results to those of Heun's method. We again employ an IVP that we can solve exactly in order to compare the errors of the two methods.

Example 7.2.2 Consider the initial-value problem $y' = e^{2t} - y$, $y(0) = 1$. Apply the Modified Euler method to estimate the value of $y(1)$ using $h = 0.1$ and compare the results with Heun's method and the exact solution.

Solution. Since $y' = e^{2t} - y$ is a linear first-order differential equation, we can find the general solution $y(t) = Ce^{-t} + \frac{1}{3} e^{2t}$, and hence the exact solution to the IVP is

$$y(t) = \frac{2}{3} e^{-t} + \frac{1}{3} e^{2t}$$

To begin the Modified Euler method, we know from the given IVP that $f(t, y) = e^{2t} - y$ and that $(t_0, y_0) = (0, 1)$. Thus, $a_0 = f(t_0, y_0) = e^{2 \cdot 0} - 1 = 0$. Next, we observe that $c_0 = y_0 + \frac{h}{2} a_0 = 1 + 0.05 \cdot 0 = 1$. To compute $y_1$, by (7.2.16) we have

$$y_1 = y_0 + hf\left(t_0 + \frac{h}{2},\, c_0\right) = 1 + 0.1 \cdot (\exp(2(0 + 0.05)) - 1) = 1 + 0.1 \cdot 0.105170918 = 1.010517092$$

Continuing to the next step, $a_1 = f(t_1, y_1) = \exp(2 \cdot 0.1) - 1.010517092 = 0.210885666$. Next,

$$c_1 = y_1 + \frac{h}{2} a_1 = 1.010517092 + 0.05 \cdot 0.210885666 = 1.021061375$$

Table 7.2 Heun's method and Modified Euler's method (ME) applied to the IVP $y' = e^{2t} - y$, $y(0) = 1$ with $h = 0.1$

| $t_n$ | Heun $y_n$ | ME $y_n$ | Solution $y(t_n)$ | Heun error $\lvert y(t_n) - y_n \rvert$ | ME error $\lvert y(t_n) - y_n \rvert$ |
|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 0 | 0 |
| 0.2 | 1.044572834 | 1.043396835 | 1.043095401 | 0.001477433 | 0.000301434 |
| 0.4 | 1.192009094 | 1.189291538 | 1.188727007 | 0.003282087 | 0.000564531 |
| 0.6 | 1.478251184 | 1.473408204 | 1.472580065 | 0.005671119 | 0.000828139 |
| 0.8 | 1.959569856 | 1.951698881 | 1.950563451 | 0.009006405 | 0.00113543 |
| 1 | 2.722082435 | 2.70981115 | 2.70827166 | 0.013810775 | 0.001539489 |

Finally,

$$y_2 = y_1 + hf\left(t_1 + \frac{h}{2},\, c_1\right) = 1.010517092 + 0.1 \cdot (\exp(2(0.1 + 0.05)) - 1.021061375) = 1.010517092 + 0.1 \cdot 0.328797432 = 1.043396835$$
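The two hand-computed steps above can be reproduced, and extended to $t = 1$, with a short program. The following Python sketch is one possible implementation of (7.2.16); the function name `modified_euler_step` and the list `ys` are illustrative names, not notation from the text.

```python
import math


def f(t, y):
    # Right-hand side of the IVP in Example 7.2.2
    return math.exp(2 * t) - y


def exact(t):
    # Exact solution y(t) = (2/3) e^{-t} + (1/3) e^{2t}
    return (2 / 3) * math.exp(-t) + (1 / 3) * math.exp(2 * t)


def modified_euler_step(f, t, y, h):
    a = f(t, y)                    # a_n = f(t_n, y_n)
    c = y + (h / 2) * a            # c_n = y_n + (h/2) a_n
    return y + h * f(t + h / 2, c) # update (7.2.16)


h = 0.1
ys = [1.0]                         # y_0 = 1
for n in range(10):
    ys.append(modified_euler_step(f, n * h, ys[-1], h))

for n in range(0, 11, 2):
    t = n * h
    print(f"{t:.1f}  {ys[n]:.9f}  {exact(t):.9f}  {abs(exact(t) - ys[n]):.9f}")
```

Like Heun's method, each step uses two evaluations of $f$: one at $(t_n, y_n)$ and one at the estimated midpoint.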

Executing eight more steps using a computer, we find the results in table 7.2. We also show the results from Heun's method in order to make a comparison between the two approaches we have developed beyond Euler's method, again reporting the results from every other step. From the table, we see that the Modified Euler method is an improvement over Heun's method. This is not too surprising since the former stems from the midpoint rule for integration, while the latter stems from the trapezoid rule. In addition, if we plot the exact solution function, we see that the solution is always increasing and concave up over the interval of interest; in the presence of such consistent concavity in the solution function, the midpoint rule will generate noticeably more accurate approximations than will the trapezoid rule.

Obviously Heun's method and the Modified Euler method are substantial improvements over the standard Euler's method. Not only are their errors much smaller, but the errors grow less quickly. To better understand why this is so, observe that Euler's method relies solely on presently available data in generating its estimates. That is, the method takes an approach that relies on just one data point in order to proceed to the next approximation. Our two newest methods instead look into the future: rather than using the current point and the slope at that location, they use the current point and an estimate of the slope at a point that is ahead of our current location. We create these estimates using only the currently available data, but the approaches lead to a substantial increase in accuracy that makes us hopeful for significant improvements through other predictive approximation techniques that we are yet to investigate.

Exercises 7.2

In exercises 1–10, use (a) Euler's method, (b) Heun's method, and (c) the Modified Euler method to estimate $y(1)$ using $h = 0.1$, and compare the approximations generated by the three methods. In exercises 1–6, compare the approximations with the exact solution.

1. $y' + 2ty = 0$, 2.

y

y(0) = −2

= 2y − 1,

y(0) = 2

3. y − y = 0, 4.

(y )2 − 2y

y(0) = 2

= 0,

5. y − y 2 = 1, 6.

tyy

y(0) = 2 y(0) = 0

= −1 − y 2 ,

y(0) = 2

7. y + ty = t 2 , 8.

y + y2

y(0) = 1

= t,

y(0) = 1

9. y + sin y = 2e −t , √ 10. y = 2e t /2 sin y,

y(0) = 0 y(0) = 0

7.3 Higher order methods

In calculus, we learn that if $F(x)$ is a function with $n + 1$ derivatives in an interval surrounding a value $x = a$, then $F$ has a Taylor polynomial expansion that obeys the relationship

$$F(x) = F(a) + F'(a)(x - a) + \frac{F''(a)}{2!}(x - a)^2 + \cdots + \frac{F^{(n)}(a)}{n!}(x - a)^n + \frac{F^{(n+1)}(\zeta_x)}{(n + 1)!}(x - a)^{n+1} \tag{7.3.1}$$

which is valid for $x$-values in an interval surrounding $a$, where $\zeta_x$ is a number within that interval that depends on $x$. If we think of our interest in the solution $y(t)$ of an initial-value problem, assuming that $y$ is sufficiently differentiable, the Taylor series expansion of $y$ provides insight into errors that arise in approximation schemes. In (7.3.1), if we replace $F$ by $y$, $a$ by $t_0$, and $x$ by $t_0 + h$, noting that $x - a = h$, it follows that

$$y(t_0 + h) = y(t_0) + hy'(t_0) + \frac{h^2}{2!} y''(t_0) + \cdots + \frac{h^n}{n!} y^{(n)}(t_0) + O(h^{n+1}) \tag{7.3.2}$$

where by "$O(h^{n+1})$" we mean "of order $h^{n+1}$" or "proportional to $h^{n+1}$."

From (7.3.2), we can discern the so-called truncation error of certain methods. For example, if we use the approximation

$$y(t_0 + h) \approx y(t_0) + hy'(t_0) \tag{7.3.3}$$

which corresponds to Euler's method,³ we see that the truncation error is proportional to $h^2$ from the equation $y(t_0 + h) = y(t_0) + hy'(t_0) + O(h^2)$. We therefore say that Euler's method is first-order, in reference to the highest power of $h$ present in (7.3.3). Since we use a small step-size $h$, it is evident that higher order methods are superior: in the error due to truncation, higher powers of $h$ will approach zero faster. In what follows, we will investigate second-, third-, and fourth-order approaches. The first two arise through using the Taylor series expansion directly, and are therefore called Taylor methods.

7.3.1 Taylor methods

To employ a second-order Taylor method, from (7.3.2) we must be able to compute

$$y(t_0 + h) \approx y(t_0) + hy'(t_0) + \frac{h^2}{2} y''(t_0) \tag{7.3.4}$$

In a standard initial-value problem, we are given $y' = f(t, y)$ (plus an initial condition), so we can compute $y''$ from the form of the differential equation. In particular, since $y'(t) = f(t, y(t))$, the chain rule for functions of two variables⁴ implies that

$$\begin{aligned}
y''(t) &= \frac{d}{dt} f(t, y(t)) \\
&= f_t(t, y) \frac{d}{dt}[t] + f_y(t, y) \frac{d}{dt}[y] \\
&= f_t(t, y) + f_y(t, y)\,y' \\
&= f_t(t, y) + f_y(t, y) f(t, y)
\end{aligned} \tag{7.3.5}$$

Combining (7.3.5) with (7.3.4), we have developed the second-order Taylor method given by

$$y(t_0 + h) \approx y(t_0) + hf(t_0, y_0) + \frac{h^2}{2}\left[f_t(t_0, y_0) + f_y(t_0, y_0) f(t_0, y_0)\right] \tag{7.3.6}$$

Generalizing (7.3.6) to the step from $y_n$ to $y_{n+1}$, we find that

$$y_{n+1} = y_n + hf(t_n, y_n) + \frac{h^2}{2}\left[f_t(t_n, y_n) + f_y(t_n, y_n) f(t_n, y_n)\right] \tag{7.3.7}$$

where $y_n \approx y(t_n)$. We consider an example to demonstrate the implementation of this method and compare it to results previously considered.

³ Observe that we are writing $y'(t_0)$, which is given by $f(t_0, y_0)$ in Euler's method.

⁴ We are using the rule that if $f(x, y)$ is a differentiable function of $x$ and $y$, and $x$ and $y$ are each differentiable functions of $t$, then $d/dt[f(x, y)] = f_x(x, y)\,dx/dt + f_y(x, y)\,dy/dt$.

Example 7.3.1 Execute ten steps of the second-order Taylor series method with $h = 0.1$ to find an approximate solution of the initial-value problem

$$y' = e^{2t} - y, \qquad y(0) = 1$$

Compare the results to those of Heun's method and to the exact solution.

Solution. This is the same IVP that we considered in example 7.2.2 with Heun's method and the Modified Euler method. To employ (7.3.7), we first must compute $f_t(t, y)$ and $f_y(t, y)$. Since $f(t, y) = e^{2t} - y$, we know that $f_t(t, y) = 2e^{2t}$ and $f_y(t, y) = -1$. In addition, to simplify the implementation of the method, we use notation similar to Heun's method. We let $a_n = f(t_n, y_n)$, $r_n = f_t(t_n, y_n)$, and $s_n = f_y(t_n, y_n)$, so that

$$y_{n+1} = y_n + ha_n + \frac{h^2}{2}[r_n + s_n a_n]$$

Beginning with $t_0 = 0$ and $y_0 = 1$, observe that

$$\begin{aligned}
a_0 &= f(0, 1) = e^{2 \cdot 0} - 1 = 0 \\
r_0 &= f_t(0, 1) = 2e^{2 \cdot 0} = 2 \\
s_0 &= f_y(0, 1) = -1
\end{aligned}$$

We then have

$$y_1 = y_0 + ha_0 + \frac{h^2}{2}[r_0 + s_0 a_0] = 1 + 0.1 \cdot 0 + \frac{0.1^2}{2}[2 - 1 \cdot 0] = 1.01$$

Similarly, we can compute

$$\begin{aligned}
a_1 &= f(0.1, 1.01) = e^{2 \cdot 0.1} - 1.01 = 0.211402758 \\
r_1 &= f_t(0.1, 1.01) = 2e^{2 \cdot 0.1} = 2.442805516 \\
s_1 &= f_y(0.1, 1.01) = -1
\end{aligned}$$

and thus

$$y_2 = y_1 + ha_1 + \frac{h^2}{2}[r_1 + s_1 a_1] = 1.01 + 0.1 \cdot 0.211402758 + \frac{0.1^2}{2}[2.442805516 - 1 \cdot 0.211402758] = 1.04229729$$

Continuing these computations through ten steps, we find the results noted in table 7.3, which are listed for every other step. Note, too, that we have included the results of Heun's method from its application to the same IVP with the same step-size $h = 0.1$.

Table 7.3 Taylor's method and Heun's method applied to the IVP $y' = e^{2t} - y$, $y(0) = 1$ using $h = 0.1$

| $t_n$ | Taylor $y_n$ | Heun $y_n$ | Solution $y(t_n)$ | Taylor error $\lvert y(t_n) - y_n \rvert$ | Heun error $\lvert y(t_n) - y_n \rvert$ |
|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 0 | 0 |
| 0.2 | 1.04229729 | 1.044572834 | 1.043095401 | 0.000798112 | 0.001477433 |
| 0.4 | 1.186750654 | 1.192009094 | 1.188727007 | 0.001976353 | 0.003282087 |
| 0.6 | 1.468880073 | 1.478251184 | 1.472580065 | 0.003699992 | 0.005671119 |
| 0.8 | 1.944339609 | 1.959569856 | 1.950563451 | 0.006223842 | 0.009006405 |
| 1 | 2.698337638 | 2.722082435 | 2.70827166 | 0.009934023 | 0.013810775 |

From table 7.3, we can see that the errors in Heun's method and the second-order Taylor method are roughly proportional and seem to grow at the same rate. This suggests that Heun's method may also be a second-order method, an assertion that may be proved by studying related higher order methods. In particular, Heun's method can be viewed as one of a collection of algorithms known as Runge–Kutta methods, which we will consider after some additional work with Taylor methods.

Having shown that we can use the Taylor series (7.3.2) to motivate the development of the second-order method (7.3.7), it is natural to wonder if we could extend this work further to a third-order method. This is desirable since if the error in our method is proportional to $h^4$, then the method will be more accurate without having to use smaller values of $h$. It is indeed possible to develop a third-order method, provided that the function $f(t, y)$ from the given IVP is sufficiently differentiable. In particular, in order to write

$$y(t_0 + h) \approx y(t_0) + hy'(t_0) + \frac{h^2}{2} y''(t_0) + \frac{h^3}{3!} y'''(t_0) \tag{7.3.8}$$

we must compute the third derivative of $y$. From our earlier work (7.3.5), we know that

$$y'' = f_t(t, y) + f_y(t, y) f(t, y) \tag{7.3.9}$$

Applying the chain rule to the first term in (7.3.9), along with the fact that $y' = f(t, y)$,

$$\frac{d}{dt}\left[f_t(t, y)\right] = f_{tt}(t, y)\,\frac{d}{dt}[t] + f_{ty}(t, y)\,\frac{d}{dt}[y] = f_{tt}(t, y) + f_{ty}(t, y) f(t, y) \tag{7.3.10}$$

where the final step follows from using $y' = f(t, y)$. Using both the product rule and the chain rule on the second term in (7.3.9) and suppressing the “$(t, y)$” argument of each function present,

$$\frac{d}{dt}\left[f_y f\right] = f_y \frac{d}{dt}[f] + \frac{d}{dt}[f_y]\, f = f_y (f_t + f_y f) + (f_{yt} + f_{yy} f) f = f_y f_t + f_y^2 f + f_{yt} f + f_{yy} f^2 \tag{7.3.11}$$

Combining (7.3.10) and (7.3.11) and using the fact that $f_{ty} = f_{yt}$, we have shown that

$$y''' = f_{tt} + f_{ty} f + f_y f_t + f_y^2 f + f_{yt} f + f_{yy} f^2 = f_{tt} + 2 f_{ty} f + f_y f_t + f_y^2 f + f_{yy} f^2 \tag{7.3.12}$$

From (7.3.12), we understand why we normally do not use third-order Taylor methods in practice: the computations are extremely cumbersome. Were we to attempt to write

$$y(t_0 + h) \approx y(t_0) + h y'(t_0) + \frac{h^2}{2} y''(t_0) + \frac{h^3}{3!} y'''(t_0)$$

in terms of the function $f$ from the given IVP, we would have to compute

$$y(t_0 + h) \approx y_0 + h f + \frac{h^2}{2}(f_t + f_y f) + \frac{h^3}{3!}\left(f_{tt} + 2 f_{ty} f + f_y f_t + f_y^2 f + f_{yy} f^2\right)$$

where each appearance of the function $f$ or one of its partial derivatives is also being evaluated at the point $(t_0, y_0)$. This combination of the determination of a large number of functions and the evaluation of each at every stage of an algorithm makes Taylor methods of orders higher than two unreasonable to use. Hence, we next introduce Runge–Kutta methods, which are among the most popular and effective numerical methods for the solution of IVPs. They enable us to achieve higher order approximations without the difficulty of computing multiple partial derivatives and evaluating these functions repeatedly.
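The comparison in table 7.3 can be reproduced with a few lines of code. The following is a minimal sketch, not the authors’ implementation: the function names (`taylor2_step`, `heun_step`, `run`) are illustrative choices, and the second derivative is hard-coded for this particular $f(t, y) = e^{2t} - y$, where $f_t = 2e^{2t}$ and $f_y = -1$.

```python
import math

def f(t, y):
    """Right-hand side of the IVP y' = e^{2t} - y."""
    return math.exp(2 * t) - y

def y_second(t, y):
    """y'' = f_t + f_y * f, using f_t = 2 e^{2t} and f_y = -1 for this f."""
    return 2 * math.exp(2 * t) + (-1.0) * f(t, y)

def exact(t):
    """Exact solution y(t) = (2/3) e^{-t} + (1/3) e^{2t}."""
    return (2 / 3) * math.exp(-t) + (1 / 3) * math.exp(2 * t)

def taylor2_step(t, y, h):
    """One step of the second-order Taylor method."""
    return y + h * f(t, y) + (h ** 2 / 2) * y_second(t, y)

def heun_step(t, y, h):
    """One step of Heun's method: average the slopes at both endpoints."""
    a = f(t, y)
    b = f(t + h, y + h * a)
    return y + (h / 2) * (a + b)

def run(step, n_steps=10, h=0.1, t0=0.0, y0=1.0):
    """Apply a one-step method n_steps times; return [y_0, ..., y_n]."""
    ys = [y0]
    for n in range(n_steps):
        ys.append(step(t0 + n * h, ys[-1], h))
    return ys

if __name__ == "__main__":
    taylor = run(taylor2_step)
    heun = run(heun_step)
    for n in range(0, 11, 2):
        t = 0.1 * n
        print(f"{t:.1f}  {taylor[n]:.9f}  {heun[n]:.9f}  {exact(t):.9f}")
```

Note that the Taylor step needs a hand-derived formula for $y''$ that changes with every new $f$, while the Heun step only calls $f$ itself. That difference is the practical motivation for the methods introduced next.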

7.3.2 Runge–Kutta methods

Where higher order Taylor methods require finding partial derivatives of $y' = f(t, y)$ and evaluating these derivatives at each stage of the algorithm, Runge–Kutta methods seek to avoid using partial derivatives altogether, while still achieving the desired higher order accuracy. Instead, in Runge–Kutta methods the function $f$ is evaluated at a greater number of points, essentially seeking to compute the slope at the current and future points in an effort to make as accurate a prediction as possible.

Formally, Runge–Kutta methods can be viewed as a generalization of Heun’s method. Recall that in Heun’s method we write

$$y_{n+1} = y_n + \frac{h}{2}(a_n + b_n)$$

where $a_n = f(t_n, y_n)$ and $b_n = f(t_{n+1}, y_n + h a_n)$. Rather than prescribing that we compute or estimate slopes at the points $(t_n, y_n)$ and $(t_{n+1}, y_{n+1})$ and simply average them, a two-stage Runge–Kutta method takes an arbitrary combination of the function values $f(t_n, y_n)$ and $f(t_n + \alpha h, y_n + \beta h f(t_n, y_n))$. Specifically, we set

$$y_{n+1} = y_n + c_1 h f(t_n, y_n) + c_2 h f(t_n + \alpha h, y_n + \beta h f(t_n, y_n)) \tag{7.3.13}$$

and then determine conditions on $c_1$, $c_2$, $\alpha$, and $\beta$ that guarantee the approximation generated by (7.3.13) is second-order through a comparison to the Taylor expansion of $y(t_n + h)$. Matching terms through $h^2$ yields the conditions $c_1 + c_2 = 1$ and $c_2 \alpha = c_2 \beta = 1/2$. Among the infinitely many valid choices for $c_1$, $c_2$, $\alpha$, and $\beta$, taking $\alpha = \beta = 1$ and $c_1 = c_2 = 1/2$ results in Heun’s method, which justifies the fact that Heun’s method is second-order.

Heun’s method is an example of a two-stage Runge–Kutta method; two-stage refers to the fact that slopes are evaluated or estimated at two points. It is possible to achieve even higher order Runge–Kutta methods by generalizing the idea in (7.3.13). In particular, we can take arbitrary combinations of the values (or estimated values) of $f(t, y)$ at points in the interval $t_n \le t \le t_{n+1}$ and select the weights so that the approximation agrees with the Taylor series expansion for $y(t_n + h)$ up to, and including, the term involving $h^4$, $h^5$, or whatever accuracy we desire. The details of the rigorous development of such methods are complicated and unenlightening.
But a more intuitive approach can help us gain a better sense of why the Runge–Kutta method works so well and where the formulas used in the algorithm come from. If we recall our development of Heun’s method and the Modified Euler method, each was linked to the idea of numerically approximating a definite integral. Specifically, Heun’s method is analogous to the trapezoid rule, and the Modified Euler method corresponds to the midpoint rule. The trapezoid rule and midpoint rule both give the exact value of the definite integral of any linear function; in addition, when a function has consistent concavity over an interval, the midpoint rule is roughly twice as accurate as the trapezoid rule and the errors in the midpoint and trapezoid rules have opposite signs. As such, it makes sense to take a weighted average of the two rules in an effort to cancel out the error of each. Computing the weighted average

$$\frac{2 \cdot \text{MID} + \text{TRAP}}{3}$$

results in a new method known as Simpson’s rule that is a remarkably accurate approximation of the definite integral. In fact, it can be shown that Simpson’s rule is exact for every cubic polynomial.

This same increase in accuracy can be accomplished through similar ideas in the numerical approximation of solutions to initial-value problems. Recalling our work with Heun’s method (H) and the Modified Euler method (ME),

$$\text{H:} \quad y_{n+1} = y_n + h \cdot \frac{f(t_n, y_n) + f(t_{n+1}, y_n + h f(t_n, y_n))}{2} \tag{7.3.14}$$

$$\text{ME:} \quad y_{n+1} = y_n + h f\left(t_n + \frac{h}{2},\; y_n + \frac{h}{2} \cdot f(t_n, y_n)\right) \tag{7.3.15}$$

we note that each uses a different expression for $\Delta y$, the approximate change in $y(t)$ in moving from $t_n$ to $t_{n+1}$. If we let

$$\Delta y_H = \frac{h}{2}\left[f(t_n, y_n) + f(t_{n+1}, y_n + h f(t_n, y_n))\right]$$

and

$$\Delta y_{ME} = h f\left(t_n + \frac{h}{2},\; y_n + \frac{h}{2} \cdot f(t_n, y_n)\right)$$

then the analogy to Simpson’s rule for approximating the solution $y$ to the IVP $y' = f(t, y)$, $y(t_0) = y_0$ is given by

$$y_{n+1} = y_n + \frac{2\Delta y_{ME} + \Delta y_H}{3} \tag{7.3.16}$$

Using (7.3.14) and (7.3.15) and letting $a_n = f(t_n, y_n)$, we have the approximation rule given by $y_{n+1} = y_n + \Delta y_S$ where

$$\Delta y_S = \frac{2}{3}\, h f\left(t_n + \frac{h}{2},\; y_n + \frac{h}{2} a_n\right) + \frac{1}{3} \cdot \frac{h}{2}\left[a_n + f(t_{n+1}, y_n + h a_n)\right] = \frac{h}{6}\left[a_n + 4 f\left(t_n + \frac{h}{2},\; y_n + \frac{h}{2} a_n\right) + f(t_{n+1}, y_n + h a_n)\right] \tag{7.3.17}$$

If we slightly modify this expression for $\Delta y_S$ in recognition of the fact that as we proceed across the interval we have more and more information available (and hence a better approximation of the slope to use), the fourth-order Runge–Kutta rule emerges. In particular, rather than rely on the value $a_n$ at every stage in (7.3.17), we recognize that we are attempting to compute approximate slopes at not just the left endpoint, but also at the midpoint and right endpoint. It makes sense that we should use these approximations as they become available to us; for instance, when we compute the approximate slope at the right endpoint, we ought to use the approximate slope at the midpoint to do so. Furthermore, given that the midpoint slope is weighted at 4 and the others at 1 in the average given by (7.3.17), it is reasonable to invest additional effort ensuring that the midpoint slope is as accurate as possible.
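As a quick sanity check on rule (7.3.17), the sketch below implements one step of the weighted average $(2\Delta y_{ME} + \Delta y_H)/3$. The names `simpson_like_step` and `cubic_slope` are hypothetical. When $f$ does not depend on $y$, the step reduces to Simpson’s rule, so a single step should recover the integral of a cubic exactly (up to rounding).

```python
def simpson_like_step(f, t, y, h):
    """One step of rule (7.3.17): (2 * dy_ME + dy_H) / 3.

    dy_ME is the Modified Euler (midpoint) increment and dy_H is the
    Heun (trapezoid) increment, both built from the left-endpoint slope a.
    """
    a = f(t, y)
    dy_me = h * f(t + h / 2, y + (h / 2) * a)
    dy_h = (h / 2) * (a + f(t + h, y + h * a))
    return y + (2 * dy_me + dy_h) / 3

def cubic_slope(t, y):
    """y' = 4 t^3 with y(0) = 0 has exact solution y(t) = t^4."""
    return 4 * t ** 3

if __name__ == "__main__":
    # One step of size h = 1 from (0, 0); the exact value is y(1) = 1.
    print(simpson_like_step(cubic_slope, 0.0, 0.0, 1.0))
```

For a slope function that does depend on $y$, this rule is no longer exact in this sense, which is where the refinements described in the surrounding text come in.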

As in Heun’s method, the computations are easier to understand, track, and implement if we introduce some additional notation. In particular, letting

$$\begin{aligned}
a_n &= f(t_n, y_n) && \text{slope at left endpoint} \\
b_n &= f\left(t_n + \tfrac{1}{2}h,\; y_n + \tfrac{1}{2}h a_n\right) && \text{slope at midpoint} \\
c_n &= f\left(t_n + \tfrac{1}{2}h,\; y_n + \tfrac{1}{2}h b_n\right) && \text{updated slope at midpoint} \\
d_n &= f(t_n + h,\; y_n + h c_n) && \text{slope at right endpoint}
\end{aligned} \tag{7.3.18}$$

we can replace the expression $4f(t_n + h/2,\, y_n + (h/2)a_n)$ in (7.3.17) with the more accurate estimate $2b_n + 2c_n$, and replace $f(t_{n+1}, y_n + h a_n)$ with $f(t_{n+1}, y_n + h c_n)$; each of these updates takes advantage of the most recent calculation of the approximate slope at points nearby. We thus arrive at the fourth-order Runge–Kutta method by setting $y_{n+1} = y_n + \Delta y$ to find

$$y_{n+1} = y_n + \frac{h}{6}(a_n + 2b_n + 2c_n + d_n) \tag{7.3.19}$$

where $a_n$, $b_n$, $c_n$, and $d_n$ are defined in (7.3.18). Again, through a lengthy development involving complicated calculations, it can be established rigorously that (7.3.19) is a fourth-order approximation technique: the resulting truncation error in the approximation is proportional to $h^5$.

The next example demonstrates the remarkable accuracy of the Runge–Kutta method.

Example 7.3.2 Execute ten steps of the fourth-order Runge–Kutta method with $h = 0.1$ to find an approximate solution of the initial-value problem

$$y' = e^{2t} - y, \quad y(0) = 1$$

Compare the results to those of the second-order Taylor method.

Solution. This is the same IVP as we considered in example 7.3.1. Recall that the exact solution to the problem is $y(t) = \frac{2}{3}e^{-t} + \frac{1}{3}e^{2t}$. To implement the Runge–Kutta method, we use $f(t, y) = e^{2t} - y$ and compute $a_n$, $b_n$, $c_n$, and $d_n$ as given by (7.3.18). Using the initial condition $(t_0, y_0) = (0, 1)$, we compute

$$a_0 = f(t_0, y_0) = f(0, 1) = e^{2 \cdot 0} - 1 = 0$$

$$b_0 = f\left(t_0 + \frac{h}{2},\; y_0 + \frac{h a_0}{2}\right) = f(0.05,\, 1 + 0.05 \cdot 0) = f(0.05, 1) = e^{2 \cdot 0.05} - 1 = 0.105170918$$

$$c_0 = f\left(t_0 + \frac{h}{2},\; y_0 + \frac{h b_0}{2}\right) = f(0.05,\, 1 + 0.05 \cdot 0.105170918) = f(0.05, 1.005258546) = e^{2 \cdot 0.05} - 1.005258546 = 0.099912372$$

$$d_0 = f(t_1,\, y_0 + h c_0) = f(0.1,\, 1 + 0.1 \cdot 0.099912372) = f(0.1, 1.009991237) = e^{2 \cdot 0.1} - 1.009991237 = 0.211411521$$

Table 7.4 Fourth-order Runge–Kutta method and second-order Taylor’s method applied to the IVP $y' = e^{2t} - y$, $y(0) = 1$, using $h = 0.1$

t_n    Runge–Kutta (RK) y_n   Solution y(t_n)   RK error |y(t_n) − y_n|   Taylor error |y(t_n) − y_n|
0      1                      1                 0                         0
0.2    1.043096313            1.043095401       0.000000912               0.000798112
0.4    1.188729047            1.188727007       0.000002040               0.001976353
0.6    1.472583611            1.472580065       0.000003546               0.003699992
0.8    1.950569107            1.950563451       0.000005656               0.006223842
1      2.708280362            2.70827166        0.000008701               0.009934023

and therefore

$$y_1 = y_0 + \frac{h}{6}(a_0 + 2b_0 + 2c_0 + d_0) = 1 + \frac{0.1}{6}(0 + 0.210341836 + 0.199824744 + 0.211411521) = 1.010359635$$

Implementing these same calculations for subsequent steps, we can generate the output displayed in table 7.4, where again we report the results from every other step. The Taylor error column is repeated from table 7.3.

In table 7.4 we can see the exceptional accuracy of the fourth-order Runge–Kutta method. In one sense, this is not surprising. Since the method is fourth-order, we expect the error in the first step to be proportional to $h^5 = (0.1)^5 = 0.00001$, in contrast to the second-order Taylor’s method with error proportional to $h^3 = 0.001$. In each method, the errors are in fact much smaller; one reason why this is so can be understood by thinking about the coefficient $1/5! = 1/120$ that arises in the Taylor expansion of $y(t_0 + h)$ and multiplies $h^5$. What can be considered surprising about the Runge–Kutta method is that it generates such significant accuracy through a relatively limited number of computations and by only evaluating the function $f(t, y)$ from the IVP at a select number of points, without the need to compute higher order derivatives. Fundamentally, the method takes four actual or approximate slopes and computes a weighted average of them in order to predict the next value of the solution function $y(t)$.

This fourth-order Runge–Kutta method is so accurate that it is used as the standard plotting tool in Maple when using the DEplot command. In addition, if we command Maple to produce a numerical estimate to the solution of a stated IVP, the standard option in the dsolve command is a slightly more sophisticated algorithm known as the Runge–Kutta–Fehlberg method.

Exercises 7.3

In exercises 1–10, use (a) the second-order Taylor’s method and (b) the fourth-order Runge–Kutta method to estimate $y(1)$ using $h = 0.1$, and compare the approximations generated by the methods. In exercises 1–6, compare the approximations with the exact solution. Each IVP in exercises 1–10 is identical to those in exercises 1–10 in section 7.2.

1. y + 2ty = 0, 2.

y

= 2y − 1,

3. y − y = 0, 4.

(y )2 − 2y tyy

y(0) = 2 y(0) = 2 y(0) = 0

= −1 − y 2 ,

7. y + ty = t 2 , 8.

y(0) = 2

= 0,

5. y − y 2 = 1, 6.

y(0) = −2

y + y2

= t,

y(0) = 2

y(0) = 1 y(0) = 1

9. y + sin y = 2e −t , √ 10. y = 2e t /2 sin y,

y(0) = 0 y(0) = 0
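For checking hand computations in exercises such as these, the fourth-order Runge–Kutta rule (7.3.19) can be sketched in a few lines. This is an illustrative implementation under stated assumptions rather than a reference one: the names `rk4_step`, `rk4`, and `f_example` are hypothetical, and the test problem is the IVP $y' = e^{2t} - y$, $y(0) = 1$ from example 7.3.2.

```python
import math

def rk4_step(f, t, y, h):
    """One step of the fourth-order Runge-Kutta rule (7.3.19)."""
    a = f(t, y)                          # slope at left endpoint
    b = f(t + h / 2, y + (h / 2) * a)    # slope at midpoint
    c = f(t + h / 2, y + (h / 2) * b)    # updated slope at midpoint
    d = f(t + h, y + h * c)              # slope at right endpoint
    return y + (h / 6) * (a + 2 * b + 2 * c + d)

def rk4(f, t0, y0, h, n_steps):
    """Return the list [y_0, y_1, ..., y_n] of RK4 approximations."""
    ys = [y0]
    for n in range(n_steps):
        ys.append(rk4_step(f, t0 + n * h, ys[-1], h))
    return ys

def f_example(t, y):
    """Right-hand side from example 7.3.2."""
    return math.exp(2 * t) - y

def exact(t):
    """Exact solution of the example IVP."""
    return (2 / 3) * math.exp(-t) + (1 / 3) * math.exp(2 * t)

if __name__ == "__main__":
    ys = rk4(f_example, 0.0, 1.0, 0.1, 10)
    for n in range(0, 11, 2):
        t = 0.1 * n
        print(f"{t:.1f}  {ys[n]:.9f}  {abs(exact(t) - ys[n]):.9f}")
```

To apply the sketch to a different exercise, only the slope function and the initial condition need to change; no partial derivatives of $f$ are required.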

7.4 Methods for systems and higher order equations

In section 6.4, we introduced an extension of Euler’s method for estimating the solution to nonlinear IVPs such as

$$\begin{aligned}
x' &= 9y - y^2, & x(0) &= 1 \\
y' &= x, & y(0) &= 8
\end{aligned} \tag{7.4.1}$$

We again choose to use the notation $\mathbf{x} = [x \;\; y]^T$ rather than $[x_1 \;\; x_2]^T$ because we will be using subscripts to label approximations to the component solutions $x(t)$ and $y(t)$: for instance, $x_1 \approx x(t_1)$, where $t_1 = t_0 + h$. Recalling that $x$ and $y$ are each implicit functions of $t$, we can view (7.4.1) in the form

$$\begin{aligned}
x' &= f(t, x, y), & x(t_0) &= x_0 \\
y' &= g(t, x, y), & y(t_0) &= y_0
\end{aligned} \tag{7.4.2}$$

For a single initial-value problem $y' = f(t, y)$, $y(0) = y_0$, we have developed a variety of methods for estimating the solution, including Euler’s method, Heun’s method, and Runge–Kutta, in order of increasing accuracy. We will generalize each of these methods to the situation for systems, leaving it as an exercise for the reader to consider other alternatives, such as the Modified Euler method. Throughout, we keep in mind that for a single IVP, every method has the form $y_{n+1} = y_n + \Delta y$, where $\Delta y$ is an estimate that is obtained by taking the step-size $h$ times some approximation of the slope of the solution $y$ at or near $(t_n, y_n)$. Because Euler’s method is the simplest, we begin there.

7.4.1 Euler’s method for systems

Recall that for a single IVP $y' = f(t, y)$, $y(0) = y_0$, Euler’s method is given by the algorithm

$$y_{n+1} = y_n + h f(t_n, y_n) \tag{7.4.3}$$

where $t_{n+1} = t_n + h$, given a step-size $h$. As was shown in section 6.4, to implement Euler’s method for a system of two IVPs in the form (7.4.2), for the step from the approximation $(x_n, y_n)$ to the approximation $(x_{n+1}, y_{n+1})$, we compute

$$\begin{aligned}
x_{n+1} &= x_n + h \cdot f(t_n, x_n, y_n) \\
y_{n+1} &= y_n + h \cdot g(t_n, x_n, y_n)
\end{aligned} \tag{7.4.4}$$

Viewed from a vector perspective, if we let

$$\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix} \quad \text{and} \quad \mathbf{F}(t, \mathbf{x}) = \begin{bmatrix} f(t, x, y) \\ g(t, x, y) \end{bmatrix}$$

it follows that Euler’s method for systems is given by the rule

$$\mathbf{x}^{(n+1)} = \mathbf{x}^{(n)} + h\,\mathbf{F}(t_n, \mathbf{x}^{(n)}) \tag{7.4.5}$$

We use the superscript notation $\mathbf{x}^{(n)} \approx \mathbf{x}(t_n)$ to denote the approximation since subscripts on vectors often indicate particular entries in the vector. In section 6.4, we saw evidence that Euler’s method is not very effective because of the errors that arise. To demonstrate this further, we consider an example involving a linear system whose solution we know exactly.

Example 7.4.1 Use Euler’s method with $h = 0.1$ to estimate the solution $\mathbf{x}(1)$ to the initial-value problem

$$\mathbf{x}' = \begin{bmatrix} -1 & 2 \\ -2 & -1 \end{bmatrix} \mathbf{x}, \quad \mathbf{x}(0) = \begin{bmatrix} 2 \\ 0 \end{bmatrix}$$

Compare the results to the exact solution.

Solution. Using established methods from chapter 3, it is straightforward to show that the solution to the given IVP is

$$\mathbf{x}(t) = 2e^{-t} \begin{bmatrix} \cos 2t \\ -\sin 2t \end{bmatrix}$$

To estimate this solution via Euler’s method, we first observe that

$$\mathbf{x}' = \mathbf{F}(t, \mathbf{x}) = \begin{bmatrix} -1 & 2 \\ -2 & -1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -x + 2y \\ -2x - y \end{bmatrix}$$

To compute $\mathbf{x}^{(1)} \approx \mathbf{x}(t_1)$, we use (7.4.5) and write

$$\mathbf{x}^{(1)} = \mathbf{x}^{(0)} + h\,\mathbf{F}(0, \mathbf{x}^{(0)}) = \begin{bmatrix} 2 \\ 0 \end{bmatrix} + 0.1 \begin{bmatrix} -1 & 2 \\ -2 & -1 \end{bmatrix} \begin{bmatrix} 2 \\ 0 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \end{bmatrix} + 0.1 \begin{bmatrix} -2 \\ -4 \end{bmatrix} = \begin{bmatrix} 1.8 \\ -0.4 \end{bmatrix}$$

Continuing Euler’s method in this manner for the subsequent nine steps with $h = 0.1$ to estimate $\mathbf{x}(1)$, we find the results shown in table 7.5, where the values from every other step are reported.

Table 7.5 Euler’s method applied to the IVP in example 7.4.1 using $h = 0.1$

t_n    Euler’s method x^(n)            Exact solution x(t_n)           Euler error ||x(t_n) − x^(n)||
0      (2, 0)                          (2, 0)                          0.000000000
0.2    (1.54, −0.72)                   (1.508201923, −0.637657545)     0.088268894
0.4    (0.9266, −1.1088)               (0.934032947, −0.961716336)     0.147271358
0.6    (0.314314, −1.187352)           (0.397732304, −1.023027791)     0.184285265
0.8    (−0.18542494, −1.02741408)      (−0.026240382, −0.898274743)    0.204979735
1      (−0.512646273, −0.724355863)    (−0.306183731, −0.669023658)    0.213748529

The final column in table 7.5 merits some discussion. Since our exact solution is a vector function and the approximate solutions are also vectors, the error at each stage is given by the vector $\mathbf{e}^{(n)} = \mathbf{x}(t_n) - \mathbf{x}^{(n)}$. The size of a vector can be measured by a single number, its length (or magnitude or norm), which is computed by taking the square root of the sum of the squares of its entries. For a vector $\mathbf{x} \in \mathbb{R}^3$, its length is $\|\mathbf{x}\| = \sqrt{x_1^2 + x_2^2 + x_3^2}$, where $x_1$, $x_2$, and $x_3$ are the entries in $\mathbf{x}$. The entries in the final column in table 7.5 are computed by taking the length of the vector $\mathbf{e}^{(n)}$, which is the difference between the exact solution and the approximate solution at step $n$. For example, the error that is present at the second step is

$$\|\mathbf{e}^{(2)}\| = \left\| \begin{bmatrix} 1.5082 \\ -0.6376 \end{bmatrix} - \begin{bmatrix} 1.54 \\ -0.72 \end{bmatrix} \right\| = \left\| \begin{bmatrix} -0.03180 \\ 0.08234 \end{bmatrix} \right\| = \sqrt{(-0.03180)^2 + (0.08234)^2} = 0.08827$$

which is the second entry in the final column of table 7.5. Clearly, the errors in Euler’s method are significant. From our earlier work with Heun’s method and the Runge–Kutta method, we expect that we can attain much better approximations by using analogous approaches for systems. We consider Heun’s method next.

7.4.2 Heun’s method for systems

From our most recent work, we know that if we view a system of IVPs from the perspective of vector functions, we are trying to estimate the solution to

$$\mathbf{x}' = \mathbf{F}(t, \mathbf{x}), \quad \mathbf{x}(t_0) = \mathbf{x}_0$$

and that from this point of view, the vector version of Euler’s method is

$$\mathbf{x}^{(n+1)} = \mathbf{x}^{(n)} + h\,\mathbf{F}(t_n, \mathbf{x}^{(n)})$$

Recalling that Heun’s method for a single differential equation is given by the rule

$$y_{n+1} = y_n + \frac{h}{2}(a_n + b_n) \tag{7.4.6}$$

where $a_n = f(t_n, y_n)$ and $b_n = f(t_{n+1}, y_n + h a_n)$, we realize that the vector analog of (7.4.6) is

$$\mathbf{x}^{(n+1)} = \mathbf{x}^{(n)} + \frac{h}{2}\left(\mathbf{a}^{(n)} + \mathbf{b}^{(n)}\right) \tag{7.4.7}$$

where $\mathbf{a}^{(n)}$ and $\mathbf{b}^{(n)}$ are given by

$$\mathbf{a}^{(n)} = \mathbf{F}(t_n, \mathbf{x}^{(n)}) \quad \text{and} \quad \mathbf{b}^{(n)} = \mathbf{F}(t_{n+1}, \mathbf{x}^{(n)} + h\,\mathbf{a}^{(n)}) \tag{7.4.8}$$

In order to compare and contrast the vector version of Heun’s method with Euler’s method, we consider the following example, which builds upon example 7.4.1.

Example 7.4.2 Use Heun’s method with $h = 0.1$ to estimate the solution $\mathbf{x}(1)$ to the initial-value problem

$$\mathbf{x}' = \begin{bmatrix} -1 & 2 \\ -2 & -1 \end{bmatrix} \mathbf{x}, \quad \mathbf{x}(0) = \begin{bmatrix} 2 \\ 0 \end{bmatrix}$$

Compare the results to the exact solution and to those from Euler’s method in example 7.4.1.

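Solution. We are considering the IVP

$$\mathbf{x}' = \mathbf{F}(t, \mathbf{x}) = \begin{bmatrix} -1 & 2 \\ -2 & -1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -x + 2y \\ -2x - y \end{bmatrix}, \quad \mathbf{x}(0) = \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \end{bmatrix}$$

To compute $\mathbf{x}^{(1)} \approx \mathbf{x}(0.1)$ by Heun’s method, we first compute

$$\mathbf{a}^{(0)} = \mathbf{F}(t_0, \mathbf{x}^{(0)}) = \begin{bmatrix} -1 & 2 \\ -2 & -1 \end{bmatrix} \mathbf{x}^{(0)} = \begin{bmatrix} -1 & 2 \\ -2 & -1 \end{bmatrix} \begin{bmatrix} 2 \\ 0 \end{bmatrix} = \begin{bmatrix} -2 \\ -4 \end{bmatrix}$$

Next, to determine $\mathbf{b}^{(0)}$ we write

$$\mathbf{b}^{(0)} = \mathbf{F}(t_1, \mathbf{x}^{(0)} + h\,\mathbf{a}^{(0)}) = \begin{bmatrix} -1 & 2 \\ -2 & -1 \end{bmatrix} \left(\mathbf{x}^{(0)} + h\,\mathbf{a}^{(0)}\right) = \begin{bmatrix} -1 & 2 \\ -2 & -1 \end{bmatrix} \begin{bmatrix} 2 + 0.1 \cdot (-2) \\ 0 + 0.1 \cdot (-4) \end{bmatrix} = \begin{bmatrix} -2.6 \\ -3.2 \end{bmatrix}$$

Finally, we determine $\mathbf{x}^{(1)} = \mathbf{x}^{(0)} + \frac{h}{2}\left(\mathbf{a}^{(0)} + \mathbf{b}^{(0)}\right)$ to find

$$\mathbf{x}^{(1)} = \begin{bmatrix} 2 \\ 0 \end{bmatrix} + 0.05 \left( \begin{bmatrix} -2 \\ -4 \end{bmatrix} + \begin{bmatrix} -2.6 \\ -3.2 \end{bmatrix} \right) = \begin{bmatrix} 1.77 \\ -0.36 \end{bmatrix}$$

Updating our work and computing the subsequent approximations results in the values for $\mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(10)}$ shown in table 7.6, where we also display the errors computed in table 7.5 for Euler’s method applied to the same IVP.

It is apparent from table 7.6 that just as Heun’s method for a single IVP is a substantial improvement over Euler’s method, it is also better for systems. At the same time, knowing that even higher order methods such as Runge–Kutta are available, we aspire to develop even more accurate methods for systems by converting the Runge–Kutta method for a single DE to one for systems.

7.4.3 Runge–Kutta method for systems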

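Recall that for the single first-order IVP $y' = f(t, y)$, $y(t_0) = y_0$, the fourth-order Runge–Kutta method is given by

$$y_{n+1} = y_n + \frac{h}{6}(a_n + 2b_n + 2c_n + d_n) \tag{7.4.9}$$

where

$$\begin{aligned}
a_n &= f(t_n, y_n) \\
b_n &= f\left(t_n + \tfrac{1}{2}h,\; y_n + \tfrac{1}{2}h a_n\right) \\
c_n &= f\left(t_n + \tfrac{1}{2}h,\; y_n + \tfrac{1}{2}h b_n\right) \\
d_n &= f(t_n + h,\; y_n + h c_n)
\end{aligned} \tag{7.4.10}$$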

Table 7.6 Heun’s method applied to the IVP in example 7.4.2 using $h = 0.1$

t_n    Heun x^(n)                      Solution x(t_n)                 Heun error ||x(t_n) − x^(n)||   Euler error ||x(t_n) − x^(n)||
0      (2, 0)                          (2, 0)                          0                               0
0.2    (1.50165, −0.6372)              (1.508201923, −0.637657545)     0.006567879                     0.088268894
0.4    (0.924464441, −0.95685138)      (0.934032947, −0.961716336)     0.010734249                     0.147271358
0.6    (0.389258164, −1.012962308)     (0.397732304, −1.023027791)     0.013157697                     0.184285265
0.8    (−0.03046503, −0.884575076)     (−0.026240382, −0.898274743)    0.014336266                     0.204979735
1      (−0.304699526, −0.654454923)    (−0.306183731, −0.669023658)    0.014644143                     0.213748529
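The Euler and Heun columns of tables 7.5 and 7.6 can be reproduced with a short script. The sketch below is one possible arrangement, not the authors’ code: vectors are stored as plain tuples, and the helper names (`axpy`, `euler_step`, `heun_step`, `solve`, `error_norm`) are illustrative assumptions. It implements rules (7.4.5) and (7.4.7) for the system of examples 7.4.1 and 7.4.2.

```python
import math

def F(t, x):
    """Right-hand side of x' = [[-1, 2], [-2, -1]] x, written componentwise."""
    return (-x[0] + 2 * x[1], -2 * x[0] - x[1])

def exact(t):
    """Exact solution x(t) = 2 e^{-t} (cos 2t, -sin 2t)."""
    return (2 * math.exp(-t) * math.cos(2 * t),
            -2 * math.exp(-t) * math.sin(2 * t))

def axpy(x, s, v):
    """Return the vector x + s*v."""
    return tuple(xi + s * vi for xi, vi in zip(x, v))

def euler_step(F, t, x, h):
    """Rule (7.4.5): x_new = x + h F(t, x)."""
    return axpy(x, h, F(t, x))

def heun_step(F, t, x, h):
    """Rule (7.4.7): x_new = x + (h/2)(a + b)."""
    a = F(t, x)
    b = F(t + h, axpy(x, h, a))
    return axpy(x, h / 2, tuple(ai + bi for ai, bi in zip(a, b)))

def solve(step, F, x0, h, n_steps, t0=0.0):
    """Return the list [x^(0), ..., x^(n)] produced by a one-step method."""
    xs = [x0]
    for n in range(n_steps):
        xs.append(step(F, t0 + n * h, xs[-1], h))
    return xs

def error_norm(t, x):
    """Length of the error vector x(t) - x."""
    ex = exact(t)
    return math.hypot(ex[0] - x[0], ex[1] - x[1])

if __name__ == "__main__":
    euler = solve(euler_step, F, (2.0, 0.0), 0.1, 10)
    heun = solve(heun_step, F, (2.0, 0.0), 0.1, 10)
    for n in range(0, 11, 2):
        t = 0.1 * n
        print(f"{t:.1f}  {error_norm(t, heun[n]):.9f}  {error_norm(t, euler[n]):.9f}")
```

Because both step functions take the slope function `F` as an argument, the same code applies to any system written in the form $\mathbf{x}' = \mathbf{F}(t, \mathbf{x})$.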

Just as with Euler’s method and Heun’s method, we can develop the vector analog of the Runge–Kutta method. We do so by letting

$$\mathbf{x}^{(n+1)} = \mathbf{x}^{(n)} + \frac{h}{6}\left(\mathbf{a}^{(n)} + 2\mathbf{b}^{(n)} + 2\mathbf{c}^{(n)} + \mathbf{d}^{(n)}\right) \tag{7.4.11}$$

where

$$\begin{aligned}
\mathbf{a}^{(n)} &= \mathbf{F}\left(t_n,\; \mathbf{x}^{(n)}\right) \\
\mathbf{b}^{(n)} &= \mathbf{F}\left(t_n + \tfrac{1}{2}h,\; \mathbf{x}^{(n)} + \tfrac{1}{2}h\,\mathbf{a}^{(n)}\right) \\
\mathbf{c}^{(n)} &= \mathbf{F}\left(t_n + \tfrac{1}{2}h,\; \mathbf{x}^{(n)} + \tfrac{1}{2}h\,\mathbf{b}^{(n)}\right) \\
\mathbf{d}^{(n)} &= \mathbf{F}\left(t_n + h,\; \mathbf{x}^{(n)} + h\,\mathbf{c}^{(n)}\right)
\end{aligned} \tag{7.4.12}$$

The computations for the Runge–Kutta method for systems can be implemented in a way very similar to those for Heun’s method. Doing so and applying the Runge–Kutta method to the IVP stated in examples 7.4.1 and 7.4.2 results in the values shown in table 7.7; we also display the error from Heun’s method by way of contrast.

As with single IVPs, the results of the Runge–Kutta method for systems are impressive. This is again due to the fact that the Runge–Kutta method is fourth-order, while Heun’s method is only second-order. We close this section by recalling the important link between higher order differential equations and systems of first-order equations.

Table 7.7 Runge–Kutta method applied to the IVP in example 7.4.2 using $h = 0.1$

t_n    RK x^(n)                        Solution x(t_n)                 RK error ||x(t_n) − x^(n)||   Heun error ||x(t_n) − x^(n)||
0      (2, 0)                          (2, 0)                          0                             0
0.2    (1.508211151, −0.637671316)     (1.508201923, −0.637657545)     0.00001658                    0.006567879
0.4    (0.934038085, −0.96174299)      (0.934032947, −0.961716336)     0.00002714                    0.010734249
0.6    (0.397725368, −1.023060398)     (0.397732304, −1.023027791)     0.00003334                    0.013157697
0.8    (−0.026261217, −0.89830458)     (−0.026240382, −0.898274743)    0.00003639                    0.014336266
1      (−0.306215262, −0.66904348)     (−0.306183731, −0.669023658)    0.00003724                    0.014644143
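The vector rule (7.4.11) with stages (7.4.12) translates almost line for line into code. The following is a minimal sketch under stated assumptions (tuple-valued vectors, hypothetical helper names `combine`, `rk4_step`, and `rk4_system`), applied to the linear system from examples 7.4.1 and 7.4.2 so that the output can be compared against table 7.7.

```python
import math

def F(t, x):
    """Right-hand side of x' = [[-1, 2], [-2, -1]] x."""
    return (-x[0] + 2 * x[1], -2 * x[0] - x[1])

def exact(t):
    """Exact solution x(t) = 2 e^{-t} (cos 2t, -sin 2t)."""
    return (2 * math.exp(-t) * math.cos(2 * t),
            -2 * math.exp(-t) * math.sin(2 * t))

def combine(x, s, v):
    """Return the vector x + s*v."""
    return tuple(xi + s * vi for xi, vi in zip(x, v))

def rk4_step(F, t, x, h):
    """One step of the vector Runge-Kutta rule (7.4.11)-(7.4.12)."""
    a = F(t, x)
    b = F(t + h / 2, combine(x, h / 2, a))
    c = F(t + h / 2, combine(x, h / 2, b))
    d = F(t + h, combine(x, h, c))
    # Weighted average of the four slope vectors, entry by entry.
    slope = tuple((ai + 2 * bi + 2 * ci + di) / 6
                  for ai, bi, ci, di in zip(a, b, c, d))
    return combine(x, h, slope)

def rk4_system(F, x0, h, n_steps, t0=0.0):
    """Return the list [x^(0), ..., x^(n)] of RK4 approximations."""
    xs = [x0]
    for n in range(n_steps):
        xs.append(rk4_step(F, t0 + n * h, xs[-1], h))
    return xs

if __name__ == "__main__":
    xs = rk4_system(F, (2.0, 0.0), 0.1, 10)
    for n in range(0, 11, 2):
        t = 0.1 * n
        ex = exact(t)
        err = math.hypot(ex[0] - xs[n][0], ex[1] - xs[n][1])
        print(f"{t:.1f}  ({xs[n][0]:.9f}, {xs[n][1]:.9f})  {err:.8f}")
```

The only difference from the scalar method is that each stage is a vector, so the additions and scalar multiplications act on every component.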

7.4.4 Methods for higher order IVPs

We have repeatedly used the fact that any linear $n$th-order differential equation can be converted to a system of linear first-order equations. For example, given a second-order equation such as $y'' + 2y' - 3y = \sin t$, we know that with the substitution $x_1 = y$, $x_2 = y'$, it follows that $\mathbf{x} = [x_1 \;\; x_2]^T$ is a solution to the system of differential equations

$$\begin{aligned}
x_1' &= x_2 \\
x_2' &= 3x_1 - 2x_2 + \sin t
\end{aligned}$$

Given our current interest in approximating solutions to initial-value problems, we are particularly focused on nonlinear equations, including

$$\theta'' + \frac{g}{L} \sin\theta = 0, \quad \theta(0) = a, \quad \theta'(0) = b$$

which governs the motion of a simple undamped pendulum, as developed in section 6.1. In this setting, we are unable to determine an exact solution, and thus wish to generate an approximate one. More generally, we want to be able to develop an approximate solution to any nonlinear IVP. In the second-order case, we can view this problem as having the form

$$y'' = f(t, y, y'), \quad y(0) = a, \quad y'(0) = b \tag{7.4.13}$$

We introduce the substitution $z = y'$; then $z' = y'' = f(t, y, y') = f(t, y, z)$, so that (7.4.13) may be rewritten as the system of IVPs

$$\begin{aligned}
y' &= z, & y(0) &= a \\
z' &= f(t, y, z), & z(0) &= b
\end{aligned} \tag{7.4.14}$$

Letting $\mathbf{x} = [y \;\; z]^T$ and $\mathbf{F}(t, \mathbf{x}) = [z \;\; f(t, y, z)]^T$, we may rewrite (7.4.14) in the form

$$\mathbf{x}' = \mathbf{F}(t, \mathbf{x}), \quad \mathbf{x}(0) = \begin{bmatrix} a \\ b \end{bmatrix}$$

which is precisely the form we considered for Euler’s method, Heun’s method, and the Runge–Kutta method for systems. That is, once we have converted a higher order IVP to a system of first-order IVPs, we may choose from any of our existing approximation methods for systems of DEs. We demonstrate this for a particular example using Heun’s method.

Example 7.4.3 Use Heun’s method to estimate the solution $y(t)$ from $t = 0$ to $t = 1$ to the second-order IVP

$$y'' + 0.1y' + 4 \sin y = 0, \quad y(0) = 1, \quad y'(0) = 0$$

with step-size $h = 0.1$.

Solution. We begin by letting $z = y'$, so that $z' = y'' = -4 \sin y - 0.1y' = -4 \sin y - 0.1z$. Writing $\mathbf{x} = [y \;\; z]^T$, it follows that

$$\mathbf{x}' = \begin{bmatrix} z \\ -4 \sin y - 0.1z \end{bmatrix} = \mathbf{F}(t, \mathbf{x})$$

Recalling Heun’s method, we must compute

$$\mathbf{x}^{(n+1)} = \mathbf{x}^{(n)} + \frac{h}{2}\left(\mathbf{a}^{(n)} + \mathbf{b}^{(n)}\right)$$

where

$$\mathbf{a}^{(n)} = \mathbf{F}(t_n, \mathbf{x}^{(n)}) \quad \text{and} \quad \mathbf{b}^{(n)} = \mathbf{F}(t_{n+1}, \mathbf{x}^{(n)} + h\,\mathbf{a}^{(n)})$$

With the initial condition $\mathbf{x}^{(0)} = [1 \;\; 0]^T$, we first find that

$$\mathbf{a}^{(0)} = \begin{bmatrix} 0 \\ -4 \sin(1) - 0.1 \cdot 0 \end{bmatrix} = \begin{bmatrix} 0 \\ -3.366 \end{bmatrix}$$

from which it follows that

$$\mathbf{b}^{(0)} = \mathbf{F}(0.1, \mathbf{x}^{(0)} + h\,\mathbf{a}^{(0)}) = \begin{bmatrix} -0.3366 \\ -3.332 \end{bmatrix}$$

Therefore, $\mathbf{x}^{(1)}$ is given by

$$\mathbf{x}^{(1)} = \mathbf{x}^{(0)} + \frac{h}{2}\left(\mathbf{a}^{(0)} + \mathbf{b}^{(0)}\right) = \begin{bmatrix} 1 \\ 0 \end{bmatrix} + \frac{0.1}{2} \left( \begin{bmatrix} 0 \\ -3.366 \end{bmatrix} + \begin{bmatrix} -0.3366 \\ -3.332 \end{bmatrix} \right) = \begin{bmatrix} 0.98317 \\ -0.33490 \end{bmatrix}$$

Table 7.8 Heun’s method applied to the second-order IVP in example 7.4.3 using $h = 0.1$

n     x^(n)                           a^(n)                           b^(n)                           x^(n+1)
0     (1, 0)                          (0, −3.365883939)               (−0.336588394, −3.3322251)      (0.98317058, −0.334905452)
2     (0.933202302, −0.659006349)     (−0.659006349, −3.148220455)    (−0.973828394, −2.952961961)    (0.851560565, −0.96406547)
4     (0.740589862, −1.240510452)     (−1.240510452, −2.574842511)    (−1.497994703, −2.163059344)    (0.603664604, −1.477405545)
6     (0.445309489, −1.663161107)     (−1.663161107, −1.556632719)    (−1.818824379, −0.919669919)    (0.271210214, −1.786976239)
8     (0.088048126, −1.840715689)     (−1.840715689, −0.167666053)    (−1.857482294, 0.569252015)     (−0.096861773, −1.820636391)
10    (−0.276080886, −1.728307853)    (−1.728307853, 1.263178986)     (−1.601989955, 1.896140173)     (−0.442595776, −1.570341895)
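The conversion used in example 7.4.3 can be sketched in code as follows. This is an illustrative implementation, not the authors’ own: the state is stored as a tuple $(y, z)$ with $z = y'$, and the names `F`, `heun_step`, and `heun_system` are hypothetical. The slope function encodes $y'' + 0.1y' + 4 \sin y = 0$ as the first-order system $y' = z$, $z' = -4 \sin y - 0.1z$.

```python
import math

def F(t, x):
    """First-order system for y'' + 0.1 y' + 4 sin y = 0, with x = (y, z), z = y'."""
    y, z = x
    return (z, -4 * math.sin(y) - 0.1 * z)

def heun_step(F, t, x, h):
    """One step of Heun's method for systems, rule (7.4.7)."""
    a = F(t, x)
    predictor = tuple(xi + h * ai for xi, ai in zip(x, a))
    b = F(t + h, predictor)
    return tuple(xi + (h / 2) * (ai + bi) for xi, ai, bi in zip(x, a, b))

def heun_system(F, x0, h, n_steps, t0=0.0):
    """Return the list [x^(0), ..., x^(n)] of Heun approximations."""
    xs = [x0]
    for n in range(n_steps):
        xs.append(heun_step(F, t0 + n * h, xs[-1], h))
    return xs

if __name__ == "__main__":
    xs = heun_system(F, (1.0, 0.0), 0.1, 10)
    for n in range(0, 11, 2):
        print(f"{n:2d}  y = {xs[n][0]: .9f}  y' = {xs[n][1]: .9f}")
```

The first component of each state vector approximates $y(t_n)$ and the second approximates $y'(t_n)$, so the estimate of $y(1)$ is the first entry of the final state.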

Executing similar computations for the remaining nine steps to approximate $\mathbf{x}(1)$, we find the results shown in table 7.8. From the results of table 7.8, we see that

$$\mathbf{x}(1) \approx \mathbf{x}^{(10)} = \begin{bmatrix} -0.276080886 \\ -1.728307853 \end{bmatrix}$$

Recalling that $\mathbf{x}(t) = [y(t) \;\; z(t)]^T$ and that our ultimate goal is to estimate the solution $y(t)$ to the stated IVP, it follows that $y(1) \approx -0.2761$.

The approach in example 7.4.3 can be implemented for higher order initial-value problems through a substitution to convert a given higher order equation to a system of first-order ones. More accurate results may be obtained through applying the fourth-order Runge–Kutta method for systems. We note particularly that not only can we estimate solutions to nonlinear equations, but also those with non-constant coefficients. For example, solutions to IVPs like $y'' + ty = 10 \sin 2t$, $y(0) = y'(0) = 0$ can now be approximated.

Exercises 7.4

In exercises 1–6, (a) use Euler’s method for systems with $h = 0.1$ to estimate the solution $\mathbf{x}(1)$ to the initial-value problem, (b) use Heun’s method for systems with $h = 0.1$ to estimate the solution $\mathbf{x}(1)$ to the initial-value problem, and (c) if possible, compare the results to the exact solution.

1. $\mathbf{x}' = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \mathbf{x}$, $\mathbf{x}(0) = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$

2. $\mathbf{x}' = \begin{bmatrix} 1 & -1 \\ 3 & 1 \end{bmatrix} \mathbf{x}$, $\mathbf{x}(0) = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$

3. $\mathbf{x}' = \begin{bmatrix} 2 & 1 \\ 2 & -4 \end{bmatrix} \mathbf{x}$, $\mathbf{x}(0) = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$

4. $\mathbf{x}' = \begin{bmatrix} t & -1 \\ 1 & 0 \end{bmatrix} \mathbf{x}$, $\mathbf{x}(0) = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$

5. $\mathbf{x}' = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 1 \\ t \end{bmatrix}$, $\mathbf{x}(0) = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$

6. $\mathbf{x}' = \begin{bmatrix} 1 & -1 \\ t & 0 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, $\mathbf{x}(0) = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$

In exercises 7–13, (a) use Heun’s method and (b) use the Runge–Kutta method to estimate the solution of the system of IVPs at the given $t$-value using the stated $h$-value.

7. $x' = y - 2xy$, $x(0) = 0.75$; $y' = 4xy - x$, $y(0) = 0.5$; $t = 1$, $h = 0.1$

8. $x' = 4 - y^2$, $x(0) = -2$; $y' = 1 - x + y$, $y(0) = -1$; $t = 3$, $h = 0.05$

9. $x' = \cos y$, $x(0) = 2$; $y' = 1 - \sin x$, $y(0) = 3$; $t = 1.5$, $h = 0.1$

10. $x' = 2x - y$, $x(0) = 1$; $y' = -4x + 2y$, $y(0) = 1$; $t = 1.5$, $h = 0.1$

11. $x' = e^{-y}$, $x(0) = 0$; $y' = 1/(1 + x^2)$, $y(0) = 0$; $t = 2$, $h = 0.05$

12. $x' = \ln(2 + y)$, $x(0) = -1$; $y' = x^2 + y$, $y(0) = -0.5$; $t = 2$, $h = 0.1$

13. $x' = y - x^2$, $x(0) = 1$; $y' = x - 8y^2$, $y(0) = 0.75$; $t = 1$, $h = 0.05$

14. Recall from section 6.1 that the nonlinear system of differential equations

$$\begin{aligned}
W' &= -0.75W + 0.25MW \\
M' &= 0.5M - 0.1MW
\end{aligned}$$

models the numbers of wolves and moose (each measured in hundreds) in a predator–prey model, where time is measured in years. Assume that at time $t = 0$ there are 250 moose and 550 wolves present. Estimate the numbers of moose and wolves present at $t = 3$, 6, and 9 years using a step-size of (a) $h = 0.1$, and (b) $h = 0.05$ with both Euler’s method and Heun’s method.

In exercises 15–18, (a) convert the given second-order IVP to a system of first-order IVPs, (b) use Euler’s method for systems with $h = 0.1$ to estimate the solution $y(1)$ to the initial-value problem, (c) use Heun’s method for systems with $h = 0.1$ to estimate the solution $y(1)$ to the initial-value problem, and (d) if possible, compare the results to the exact solution.

15. $y'' + 16y = 2t + 1$, $y(0) = y'(0) = 0$

16. $y'' + 16y = 2 \sin 2t$, $y(0) = y'(0) = 0$

17. $y'' + 16y^2 = 2 \sin 2t$, $y(0) = y'(0) = 0$

18. $y'' + 0.2(y')^2 + 2y^2 = 4e^{-t} \sin t$, $y(0) = y'(0) = 0$

7.5 For further study

7.5.1 Predator–prey equations

Recall that a predator–prey scenario is modeled by the equations

$$\begin{aligned}
x' &= 0.6x - 0.3xy, & x(0) &= 2 \\
y' &= -0.9y + 0.6xy, & y(0) &= 3
\end{aligned} \tag{7.5.1}$$

(a) Determine the nontrivial equilibrium solution of (7.5.1) and use a computer algebra system to plot the direction field of the system in a suitable window containing the equilibrium solution and the given initial condition.

(b) Use a computer to implement Heun’s method to estimate the solution $(x(t), y(t))$ of (7.5.1) on the interval $0 \le t \le 20$ using $h = 0.1$.

(c) Use your data from (b) to generate two plots: one a parametric plot of the approximate curve $(x(t), y(t))$ and the other a simultaneous plot of the separate functions $x(t)$ and $y(t)$ on the same coordinate axes. Discuss the behavior of the populations $x(t)$ and $y(t)$ over time.

(d) Modify your calculations in (b) appropriately to investigate the impact of changing the parameter ‘0.3’ in the first equation to each of the values 0.1, 0.2, 0.4, 0.5, and 0.9. In each case, generate the same plots as instructed in (c). What impact does this have on the behavior of the populations?

(e) Modify your calculations in (b) in order to consider the following different initial conditions: $x(0) = 1.7$, $y(0) = 1.8$; $x(0) = 2.5$, $y(0) = 3.6$; $x(0) = 5$, $y(0) = 1$. In each case, generate the same plots as instructed in (c). What impact do the initial conditions have on the behavior of the populations?

7.5.2 Competitive species

In section 6.5.2, we developed the model

$$\begin{aligned}
x' &= ax\left(1 - \frac{1}{A}x - \frac{\alpha}{a}y\right) \\
y' &= by\left(1 - \frac{1}{B}y - \frac{\beta}{b}x\right)
\end{aligned} \tag{7.5.2}$$

where $a$, $A$, and $\alpha$ are positive constants ($a$ is the growth constant of population $x(t)$, $A$ its carrying capacity, and $\alpha$ a parameter that reflects the competition for resources from population $y(t)$). The constants $b$, $B$, and $\beta$ play the same roles for the second population.

(a) In (7.5.2), let $a = 0.5$, $b = 0.25$, $A = 5$, $B = 2$, $\alpha = 0.04$, and $\beta = 0.02$. Find all equilibrium points of the system and use a computer algebra system to plot a direction field for the system in a window that contains all the equilibrium solutions.

(b) Apply Heun’s method to estimate the solution $(x(t), y(t))$ of (7.5.2) on the interval $0 \le t \le 20$ using $h = 0.1$. Plot the trajectory of the approximate solution.

(c) Leaving all other parameters the same, change the value of $B$ to $B = 8$. Repeat questions (a) and (b) and discuss the differences between the results for the two $B$-values.

(d) Repeat question (c) with $B = 15$.

(e) What is the largest value of $B$ for which the two populations can coexist with a stable equilibrium in which each population tends to a nonzero value as $t \to \infty$? What value(s) of $B$ ensure that population $y(t)$ will dominate as $t \to \infty$ and force $x(t) \to 0$?

(f) For each of the three values of $B$ above, experiment with the impact of the following different sets of initial conditions: $x(0) = 1$, $y(0) = 1$; $x(0) = 5$, $y(0) = 1$; $x(0) = 1$, $y(0) = 5$; $x(0) = 5$, $y(0) = 5$. How do the different initial conditions impact the behaviors of the two populations?

7.5.3 The damped pendulum

In section 6.5.1, it was shown that for a pendulum with an arm of length $L$, bob of mass $m$, and damping constant $c$, the angle $\theta$ that the arm forms with the vertical axis at time $t$ satisfies the IVP

$$L\theta'' = -g \sin\theta - c\,\theta', \quad \theta(0) = \theta_0, \quad \theta'(0) = \theta'_0 \tag{7.5.3}$$

(a) Using the change of variables $x = \theta$, $y = x'$, show that the nonlinear second-order IVP (6.5.2) is equivalent to the system

$$\begin{aligned}
x' &= y \\
y' &= -\frac{g}{L} \sin x - \frac{c}{L} y
\end{aligned} \tag{7.5.4}$$

(b) Apply Heun’s method to estimate the solution $(x(t), y(t))$ of (7.5.4) with $g = 9.8$, $L = 1$, and $c = 1$ with initial conditions $x(0) = 2$, $y(0) = 2$ on the interval $0 \le t \le 10$ using $h = 0.1$. Plot the trajectory of the approximate solution.

(c) Repeat question (b) using $c = 0.1$ and $c = 5$. Discuss the differences in the results.

(d) Investigate the effects of changing the initial conditions to the following: $x(0) = 2$, $y(0) = 5$; $x(0) = 2$, $y(0) = 15$; $x(0) = 2$, $y(0) = -5$. Do so for each of the three $c$-values noted above and discuss the differences among the results and the physical interpretation that explains how the pendulum is behaving.

8 Series solutions for differential equations

8.1 Motivating problems

In more sophisticated courses in mathematical physics or special functions, a type of linear differential equation frequently arises that is different from those we have studied to date. From several perspectives, we have thoroughly analyzed the behavior of linear differential equations with constant coefficients of the form

\[
y'' + a_1 y' + a_0 y = f(t)
\]

But there are other important and well-known equations with nonconstant coefficients. We list some of these here in anticipation of more in-depth study in subsequent sections.

Airy's equation is a linear second-order equation that arises in physics in the study of light refraction. While it can be stated in a slightly more general form, a good example to begin with is

\[
y'' + ty = 0 \tag{8.1.1}
\]

The explicit presence of the coefficient "t" in (8.1.1) makes this equation substantially different from those (such as $y'' + y = 0$) we have already solved. If we recall the initial approach to solving $y'' + y = 0$, we can gain intuition for how to proceed with (8.1.1). We know that guessing $y = e^{rt}$ in $y'' + y = 0$ leads to the characteristic equation $r^2 + 1 = 0$, so that $y = e^{it}$ or $y = e^{-it}$. We then know from Euler's formula that both $y = \sin t$ and $y = \cos t$ arise as linearly independent solutions to $y'' + y = 0$. One key characteristic the exponential, sine, and cosine functions have in common is that they can be expressed as infinite power series; indeed, this fact was used to justify the validity of Euler's formula.

In particular, we can write

\[
e^t = 1 + t + \frac{t^2}{2!} + \frac{t^3}{3!} + \cdots + \frac{t^n}{n!} + \cdots \tag{8.1.2}
\]
\[
\sin t = t - \frac{t^3}{3!} + \frac{t^5}{5!} - \cdots + (-1)^{n}\frac{t^{2n+1}}{(2n+1)!} + \cdots \tag{8.1.3}
\]
\[
\cos t = 1 - \frac{t^2}{2!} + \frac{t^4}{4!} - \cdots + (-1)^{n}\frac{t^{2n}}{(2n)!} + \cdots \tag{8.1.4}
\]

Each of these expressions for $e^t$, $\sin t$, and $\cos t$ is of the form $\sum_{n=0}^{\infty} a_n t^n$ and is valid for every real number t. In this chapter, rather than making guesses of the form $y = e^{rt}$, we instead assume much more generally that y is a nice enough function to have a power series expansion of the form $y = \sum_{n=0}^{\infty} a_n t^n$, and then substitute this form of the potential solution function y into the differential equation in order to deduce the coefficients $a_n$.

Other well-known differential equations that we will consider include the Hermite equation

\[
y'' - 2ty' + 2qy = 0 \tag{8.1.5}
\]

where q is a constant, the Laguerre equation

\[
ty'' + (1 - t)y' + qy = 0 \tag{8.1.6}
\]

(again where q is constant), and the Bessel equation

\[
t^2 y'' + ty' + (t^2 - n^2)y = 0 \tag{8.1.7}
\]

where n is a constant. Again, in each of (8.1.5), (8.1.6), and (8.1.7), it is the presence of nonconstant coefficient(s) involving t that makes us seek new ways to find solutions.

Finally, recalling an elementary differential equation from calculus further motivates the importance of infinite series representations of functions. Among the simplest of all first-order differential equations are those of the form $y' = f(t)$; these can be solved (in theory) by integrating. But if we consider an example such as

\[
y' = e^{-t^2}
\]

we are immediately stuck since the function $e^{-t^2}$ lacks an elementary antiderivative. If we use (8.1.2) and replace t with $-t^2$, then we can write

\[
y' = e^{-t^2} = 1 - t^2 + \frac{t^4}{2!} - \frac{t^6}{3!} + \cdots + (-1)^{n}\frac{t^{2n}}{n!} + \cdots
\]

Integrating, it follows that

\[
y = C + t - \frac{t^3}{3} + \frac{t^5}{5\cdot 2!} - \frac{t^7}{7\cdot 3!} + \cdots + (-1)^{n}\frac{t^{2n+1}}{(2n+1)\cdot n!} + \cdots
\]

Hence we are able to determine the general solution function y, although we must be content to leave y in its series representation. Discovering solutions in this power series form will be typical of the results we obtain in our work in this chapter.
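The integrated series can be checked numerically. The sketch below sums the first terms of the series for y with C = 0 and compares the result with the integral of $e^{-s^2}$ from 0 to t, which can be written as $\frac{\sqrt{\pi}}{2}\,\mathrm{erf}(t)$. The error function is not used in the text; it is brought in here only as an independent reference value, and the function names are illustrative.

```python
import math

def series_antiderivative(t, terms=30):
    """Partial sum of sum_{n>=0} (-1)^n t^(2n+1) / ((2n+1) n!), taking C = 0."""
    return sum((-1) ** n * t ** (2 * n + 1) / ((2 * n + 1) * math.factorial(n))
               for n in range(terms))

def reference(t):
    """Integral of exp(-s^2) from 0 to t, expressed through the error function."""
    return 0.5 * math.sqrt(math.pi) * math.erf(t)

for t in (0.5, 1.0, 2.0):
    print(t, series_antiderivative(t), reference(t))
```

The two columns should agree to many digits for each t, which is consistent with the series being valid for every real number t.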

8.2 A review of Taylor and power series

From calculus, we know that if a function has a derivative at a given point $t = a$, then the function is approximately linear near $t = a$. Indeed, the existence of the first derivative ensures that the function is smooth: the function must be continuous at $a$ and its graph cannot have a corner there. Of course, if having one derivative is a good thing, having several derivatives is even better. The best possible scenario of all is that the function is infinitely differentiable at $t = a$. That is, $f^{(k)}(a)$ exists for every $k = 0, 1, 2, \ldots$. A function that is infinitely differentiable at $t = a$ and at all points in some small open interval containing $a$ is said to be analytic¹ at $t = a$. If a function fails to be analytic at a given point, we say that f is singular at that point. For example, the rational function

\[
f(t) = \frac{t}{(t^2 + 9)(t - 4)}
\]

is singular at $t = 4$ and $t = \pm 3i$ since it is undefined at these values (as are each of its derivatives). At every other value of t, f(t) is analytic.

Much of the theory of analytic functions is a natural extension of the ideas of Taylor polynomials and Taylor series from calculus. Here our intention is not to develop a complete theory of analytic functions, but rather to remind the reader of important results on Taylor series and extend this perspective slightly in order to suit our purposes. Most results will be stated without proof.

To begin, we assume that f is an analytic function at $a = 0$ and recall that the polynomial functions

\[
\begin{aligned}
P_0(t) &= f(0) \\
P_1(t) &= f(0) + f'(0)t \\
P_2(t) &= f(0) + f'(0)t + \frac{f''(0)}{2!}t^2 \\
&\;\;\vdots \\
P_k(t) &= f(0) + f'(0)t + \frac{f''(0)}{2!}t^2 + \cdots + \frac{f^{(k)}(0)}{k!}t^k
\end{aligned}
\tag{8.2.1}
\]

are called the Taylor polynomials of f at $a = 0$ and form the sequence of partial sums of the infinite series

\[
P(t) = f(0) + f'(0)t + \frac{f''(0)}{2!}t^2 + \cdots + \frac{f^{(k)}(0)}{k!}t^k + \cdots \tag{8.2.2}
\]

1 Usually when analytic functions are discussed, we allow the function to have complex inputs and consider a disk of a given radius around a complex point. For our purposes, a discussion restricted to real values is sufﬁcient.

In particular, the function $P_k(t)$ in (8.2.1) is the kth Taylor polynomial of f at $a = 0$, and the infinite series (8.2.2) is called the Taylor series of f centered at $a = 0$; the series in (8.2.2) converges if and only if the sequence of partial sums converges. That is, P(t) is defined if and only if

\[
\lim_{k \to \infty} P_k(t)
\]

exists. If this limit fails to exist, we say that the Taylor series diverges at this point. What is perhaps most remarkable is the fact that wherever the series (8.2.2) converges, it does so to the value of the given analytic function f; moreover, the Taylor series converges in an interval centered at $t = 0$ that extends to the nearest singular point. Formally, we have the following theorem.

Theorem 8.2.1 Suppose that f(t) is an analytic function at 0 and R is the distance from 0 to the nearest singular point of f(t). Then the Taylor series of f(t) centered at $t = 0$ converges to f(t) in the interval $|t| < R$ and diverges in the interval $|t| > R$.

The number R is called the radius of convergence of the Taylor series. We note, too, that it is possible for singular points to be complex, so R is not necessarily the distance from 0 to the nearest real singular point. We also observe specifically that for any t such that $|t| < R$, we know

\[
f(t) = f(0) + f'(0)t + \frac{f''(0)}{2!}t^2 + \cdots + \frac{f^{(k)}(0)}{k!}t^k + \cdots
\]

We consider an example to see many of these ideas at work.

Example 8.2.1 Find the Taylor series of $f(t) = \ln(1 + t)$ centered at $t = 0$ and determine the radius of convergence of the series.

Solution. We begin by taking the first several derivatives of f and evaluating them at 0:

\[
\begin{aligned}
f(t) &= \ln(1 + t) & f(0) &= \ln(1) = 0 \\
f'(t) &= (1 + t)^{-1} & f'(0) &= 1 \\
f''(t) &= (-1)(1 + t)^{-2} & f''(0) &= -1 \\
f'''(t) &= (-2)(-1)(1 + t)^{-3} & f'''(0) &= 2! \\
f^{(4)}(t) &= (-3)(-2)(-1)(1 + t)^{-4} & f^{(4)}(0) &= -3!
\end{aligned}
\]

From these calculations, we see that the fourth Taylor polynomial is

\[
\begin{aligned}
P_4(t) &= 0 + 1t - \frac{1}{2!}t^2 + \frac{2!}{3!}t^3 - \frac{3!}{4!}t^4 \\
&= t - \frac{1}{2}t^2 + \frac{1}{3}t^3 - \frac{1}{4}t^4
\end{aligned}
\]

The established pattern implies that the Taylor series of $f(t) = \ln(1 + t)$ is

\[
P(t) = t - \frac{1}{2}t^2 + \frac{1}{3}t^3 - \frac{1}{4}t^4 + \cdots = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}\,t^n
\]

From calculus, the standard way to test a power series for convergence is to use the Ratio Test. Doing so here with $a_n = (-1)^{n+1}(1/n)\,t^n$, we observe that

\[
\lim_{n \to \infty}\left|\frac{a_{n+1}}{a_n}\right|
= \lim_{n \to \infty}\left|\frac{(-1)^{n+2}\,\frac{1}{n+1}\,t^{n+1}}{(-1)^{n+1}\,\frac{1}{n}\,t^{n}}\right|
= \lim_{n \to \infty}\left|-1 \cdot \frac{n}{n+1} \cdot t\right|
= |t|
\]

The Ratio Test states that a given series converges if $\lim_{n\to\infty} |a_{n+1}/a_n| < 1$. Thus, if $|t| < 1$, it follows that

\[
\ln(1 + t) = t - \frac{1}{2}t^2 + \frac{1}{3}t^3 - \frac{1}{4}t^4 + \cdots = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}\,t^n \tag{8.2.3}
\]

converges.

The result of example 8.2.1 makes further sense in light of theorem 8.2.1 since we know that $f(t) = \ln(1 + t)$ has a singularity at $t = -1$. If we substitute $t = -1$ in (8.2.3), the negative of the harmonic series arises ($-1 - \frac{1}{2} - \frac{1}{3} - \frac{1}{4} - \cdots$), which diverges. However, it can be shown by the alternating series test that (8.2.3) does converge when $t = 1$; indeed, for any power series that converges for $|t| < R$, it is possible for the series to converge at both $t = \pm R$, neither, or just one of the points. While this is an interesting mathematical topic in its own right, it is largely irrelevant in our discussion of series solutions to differential equations.

We next state several prominent Taylor series expansions along with their respective radii of convergence and leave the development and testing of these series for convergence to the exercises at the end of this section.

\[
\begin{aligned}
e^t &= 1 + t + \frac{t^2}{2!} + \frac{t^3}{3!} + \cdots + \frac{t^n}{n!} + \cdots & R &= \infty \\
\sin t &= t - \frac{t^3}{3!} + \frac{t^5}{5!} - \cdots + (-1)^{n}\frac{t^{2n+1}}{(2n+1)!} + \cdots & R &= \infty \\
\cos t &= 1 - \frac{t^2}{2!} + \frac{t^4}{4!} - \cdots + (-1)^{n}\frac{t^{2n}}{(2n)!} + \cdots & R &= \infty \\
\frac{1}{1-t} &= 1 + t + t^2 + t^3 + \cdots + t^n + \cdots & R &= 1
\end{aligned}
\tag{8.2.4}
\]
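The role of the radius of convergence in example 8.2.1 can be seen numerically. The following sketch (the function name `log_series` is an illustrative choice) evaluates partial sums of the series (8.2.3) at a point inside the interval of convergence, at the endpoint t = 1, and at a point outside the interval.

```python
import math

def log_series(t, terms):
    """Partial sum of sum_{n=1}^{terms} (-1)^(n+1) t^n / n."""
    return sum((-1) ** (n + 1) * t ** n / n for n in range(1, terms + 1))

for t in (0.5, 1.0, 1.5):
    partial_sums = [log_series(t, N) for N in (10, 50, 200)]
    print(f"t = {t}: partial sums {partial_sums}, ln(1 + t) = {math.log(1 + t)}")
```

At t = 0.5 the partial sums settle quickly on ln(1.5); at t = 1 they approach ln 2, but slowly; at t = 1.5 they grow without bound, even though ln(2.5) itself is perfectly well defined.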

From these fundamental Taylor series, the series expansions of other related functions may often be easily found. The following example demonstrates one way in which this may be accomplished.

Example 8.2.2 Find the Taylor series expansion of

\[
f(t) = \frac{t}{1 + 4t^2}
\]

as well as its radius of convergence.

Solution. If we first omit the t in the numerator of f(t), we can use the final result from (8.2.4) and substitute $-4t^2$ for t, writing

\[
\begin{aligned}
\frac{1}{1 - (-4t^2)} &= 1 + (-4t^2) + (-4t^2)^2 + (-4t^2)^3 + \cdots + (-4t^2)^n + \cdots \\
&= 1 - 4t^2 + 16t^4 - 64t^6 + \cdots + (-4)^n t^{2n} + \cdots
\end{aligned}
\tag{8.2.5}
\]

To get the Taylor series of f(t), we now multiply both sides of (8.2.5) by t, and have

\[
f(t) = \frac{t}{1 + 4t^2} = t - 4t^3 + 16t^5 - 64t^7 + \cdots + (-4)^n t^{2n+1} + \cdots \tag{8.2.6}
\]

Since the original series from (8.2.4) converges for $|t| < 1$ and we replaced t with $-4t^2$, it follows that (8.2.5) converges for $|-4t^2| < 1$, or in other words for $|t| < 1/2$. Multiplying (8.2.5) by t has no effect on the radius of convergence of the series, and therefore (8.2.6) converges for $|t| < 1/2$. Note further that the denominator $1 + 4t^2$ of f(t) is zero at $t = \pm i/2$; each of these complex numbers lies a distance of 1/2 unit away from the origin and is a singular point of f. This observation is additional evidence that $R = 1/2$ is the radius of convergence of the series expansion of f(t).

Similar reasoning may be used to find expansions for such functions as $e^{-t^2}$, $t \sin 4t$, and $(\cos t - 1)/t^2$. In each case, the approach of example 8.2.2 is far simpler than using the definition of Taylor series directly and computing derivatives of the given function. One reason why the development of Taylor series for functions similar to those in (8.2.4) is so straightforward is the fact that Taylor series are unique. Said differently, if we can find a power series expression for a given function, it must be the Taylor series. This is stated formally in the following theorem.

Theorem 8.2.2 The series $\sum_{k=0}^{\infty} b_k t^k$ converges in the interval $|t| < R$ to the function f(t) if and only if f(t) is analytic for all t such that $|t| < R$ and

\[
b_k = \frac{1}{k!} f^{(k)}(0)
\]

An immediate consequence of theorem 8.2.2 is that if $\sum_{k=0}^{\infty} b_k t^k = 0$ for $|t| < R$, then $b_k = 0$ for all k. We will use this result frequently when we solve differential equations by equating like coefficients of two equal power series. If we cannot use substitution to find a Taylor series expansion (as we did in example 8.2.2), it may be possible to use differentiation or integration to do so. The following example introduces this approach.

Example 8.2.3 Find the Taylor series expansion and radius of convergence of $f(t) = \arctan t$.

Solution. If we were to attempt to find the series via the definition by taking derivatives, we would find that the process becomes laborious after computing $f'(t) = 1/(1 + t^2)$, since differentiating will involve both the chain and quotient rules. Instead, we observe that

\[
f'(t) = \frac{1}{1 + t^2}
\]

itself has a series expansion that is not difficult to find. Similar to our work in example 8.2.2, we use the final result in (8.2.4) and substitute $-t^2$ for t to write

\[
\begin{aligned}
f'(t) = \frac{1}{1 - (-t^2)} &= 1 + (-t^2) + (-t^2)^2 + (-t^2)^3 + \cdots + (-t^2)^n + \cdots \\
&= 1 - t^2 + t^4 - t^6 + \cdots + (-1)^n t^{2n} + \cdots
\end{aligned}
\tag{8.2.7}
\]

Because we now have a series expansion for $f'(t)$, it is natural to integrate both sides of (8.2.7) to find the series for f(t). Doing so, we see that

\[
f(t) = \arctan t = C + t - \frac{1}{3}t^3 + \frac{1}{5}t^5 - \frac{1}{7}t^7 + \cdots + \frac{(-1)^n}{2n+1}\,t^{2n+1} + \cdots \tag{8.2.8}
\]

It is a straightforward exercise to use the Ratio Test to show that (8.2.8) converges for all t such that $|t| < 1$. Moreover, since $\arctan(0) = 0$, it follows that $C = 0$.

While intuition guides our work in example 8.2.3, and we certainly know that we can integrate any finite polynomial, the one step that is perhaps questionable is when we say we will integrate both sides of (8.2.7) to find the series for f(t). That this step is legitimate (and that it preserves the radius of convergence) is the conclusion of our next formal result, the Taylor series Differentiation and Integration Theorem.

Theorem 8.2.3 If f(t) has the Taylor series expansion

\[
f(t) = \sum_{k=0}^{\infty} b_k t^k, \qquad |t| < R
\]

then its antiderivative $F(t) = \int_0^t f(x)\,dx$ and its derivative $f'(t)$ have the respective Taylor series expansions

\[
F(t) = \sum_{k=0}^{\infty} \int_0^t b_k x^k\,dx = \sum_{k=0}^{\infty} \frac{b_k}{k+1}\,t^{k+1}, \qquad |t| < R \tag{8.2.9}
\]
\[
f'(t) = \sum_{k=0}^{\infty} b_k \frac{d}{dt}\left[t^k\right] = \sum_{k=1}^{\infty} k\,b_k\,t^{k-1}, \qquad |t| < R \tag{8.2.10}
\]

That is, theorem 8.2.3 states that any power series may be differentiated or integrated term-wise and that doing so does not change the radius of convergence of the power series. This fact makes more reasonable our plan to solve differential equations by letting y be an unknown power series, taking its appropriate derivative(s), and substituting into the differential equation to determine the coefficients in the series.

Finally, it is not always possible to determine an explicit expression for the nth coefficient of the Taylor series expansion of a function in terms of n. In this situation, we must be content with knowing the values of the first few coefficients. For this type of computation, we sometimes abbreviate the tail end of a power series by writing

\[
O(t^n) = c_n t^n + c_{n+1} t^{n+1} + \cdots \tag{8.2.11}
\]

where we read the notation $O(t^n)$ as "order of $t^n$". For instance, we could write

\[
e^t = 1 + t + \frac{t^2}{2} + O(t^3)
\]

The next example emphasizes the fact that we cannot always explicitly determine a formula for the general nth term in the Taylor expansion of a function.

Example 8.2.4 Find the first four terms of the Taylor series expansion about $t = 0$ of the function

\[
f(t) = \frac{t}{e^t + 1}
\]

Solution. Because f is the quotient of two functions that are analytic everywhere and the denominator is never zero, it follows that f is analytic everywhere. In particular, f is analytic at $a = 0$ and, therefore, has a Taylor series expansion there of the form

\[
\frac{t}{e^t + 1} = b_0 + b_1 t + b_2 t^2 + b_3 t^3 + \cdots \tag{8.2.12}
\]

We know from the standard expansion of $e^t$ that

\[
e^t + 1 = 2 + t + \frac{t^2}{2!} + \frac{t^3}{3!} + \cdots
\]

Multiplying both sides of (8.2.12) by this expression for $e^t + 1$, we obtain the identity

\[
t = \left(2 + t + \frac{t^2}{2!} + \frac{t^3}{3!} + \cdots\right)\left(b_0 + b_1 t + b_2 t^2 + b_3 t^3 + \cdots\right)
\]

Distributing to multiply these two series, we find that

\[
t = 2b_0 + (2b_1 + b_0)t + \left(2b_2 + b_1 + \frac{b_0}{2}\right)t^2 + \left(2b_3 + b_2 + \frac{b_1}{2} + \frac{b_0}{6}\right)t^3 + \cdots
\]

In order for this identity to hold, the uniqueness of Taylor series expansions established in theorem 8.2.2 implies that all of the coefficients of powers of t on the left must equal the corresponding coefficients of powers of t on the right. In particular, it must be the case that

\[
\begin{aligned}
0 &= 2b_0 \\
1 &= 2b_1 + b_0 \\
0 &= 2b_2 + b_1 + \tfrac{1}{2}b_0 \\
0 &= 2b_3 + b_2 + \tfrac{1}{2}b_1 + \tfrac{1}{6}b_0
\end{aligned}
\]

From this sequence of equalities, it follows that $b_0 = 0$, $b_1 = 1/2$, $b_2 = -1/4$, and $b_3 = 0$, so that

\[
f(t) = \frac{t}{e^t + 1} = \frac{1}{2}t - \frac{1}{4}t^2 + 0t^3 + \cdots
\]

Exercises 8.2

In exercises 1–4, determine the radius of convergence of the stated power series.

1.

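$\displaystyle\sum_{n=1}^{\infty} \frac{t^n}{n}$

2. $\displaystyle\sum_{n=1}^{\infty} \frac{2^n t^n}{n!}$

3. $\displaystyle\sum_{n=1}^{\infty} \frac{n^2 (t - 2)^n}{5^n}$

4. $\displaystyle\sum_{n=1}^{\infty} \frac{(n!)^2 (t + 3)^n}{(2n)!}$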

In exercises 5–17, find the first four nonzero coefficients of the Taylor series expansion for each function f(t) about $a = 0$. In addition, state the radius of convergence of the series expansion. Wherever possible, use known expansions and the techniques of examples 8.2.2, 8.2.3, and 8.2.4.

5. $f(t) = \sqrt{t + 1}$
6. $f(t) = t^3 + 5t^2 - 3t + 8$
7. $f(t) = \dfrac{1}{1 + t^4}$
8. $f(t) = e^{-t^2}$
9. $f(t) = \dfrac{e^{2t} - 1}{2t}$
10. $f(t) = \dfrac{\sin t}{t}$
11. $f(t) = t^3 \sin t^2$
12. $f(t) = \cos t^3$
13. $f(t) = \cos t \sin t$
14. $f(t) = \cos^2(t)$
15. $f(t) = e^{-t} \sin t$
16. $f(t) = \dfrac{e^t}{1 + t}$
17. $f(t) = \arctan t^2$

In exercises 18–24, find the first four nonzero coefficients of the Taylor series expansion for each integral by first finding the expansion of the integrand and then integrating term by term.²

18. $\displaystyle\int_0^t \frac{1}{1 + s^4}\,ds$
19. $\displaystyle\int_0^t e^{-s^2}\,ds$
20. $\displaystyle\int_0^t \frac{e^{2s} - 1}{2s}\,ds$
21. $\displaystyle\int_0^t \frac{\sin s}{s}\,ds$
22. $\displaystyle\int_0^t s^3 \sin s^2\,ds$
23. $\displaystyle\int_0^t \cos s^3\,ds$
24. $\displaystyle\int_0^t \arctan s^2\,ds$

2 Your work in exercises 5–17 will be helpful.

8.3 Power series solutions of linear equations

In this section, we begin solving linear differential equations by assuming that the solution function may be expressed as a power series. To motivate our work, we revisit a familiar first-order equation (which we can solve easily by other means) to explore how series can be used in this way.

Example 8.3.1 By assuming that y has a power series expansion of the form $y(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + \cdots$, determine the solution to the initial-value problem

\[
y' = y, \qquad y(0) = 1
\]

Solution. Writing $y(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + \cdots$, we know

\[
y'(t) = a_1 + 2a_2 t + 3a_3 t^2 + 4a_4 t^3 + \cdots
\]

Equating y and $y'$, we observe that

\[
a_0 + a_1 t + a_2 t^2 + a_3 t^3 + \cdots = a_1 + 2a_2 t + 3a_3 t^2 + 4a_4 t^3 + \cdots \tag{8.3.1}
\]

Because of the uniqueness of Taylor series expansions (theorem 8.2.2), we may equate like coefficients of powers of t in (8.3.1), from which we deduce that the following recurrence relation among the coefficients $a_i$ must hold:

\[
\begin{aligned}
a_0 &= a_1 \\
a_1 &= 2a_2 \\
a_2 &= 3a_3 \\
&\;\;\vdots \\
a_n &= (n + 1)a_{n+1}
\end{aligned}
\]

Provided that we know $a_0$, we can find all of the remaining values of $a_i$. Clearly, $a_0 = y(0)$, so using the initial condition $y(0) = 1$,

\[
a_0 = 1, \quad a_1 = 1, \quad a_2 = \frac{1}{2}, \quad a_3 = \frac{1}{3}a_2 = \frac{1}{3 \cdot 2}, \quad \ldots
\]

From this sequence of coefficients and the general recurrence relation $a_{n+1} = \frac{1}{n+1}a_n$, we observe that $a_n = \frac{1}{n!}$, and therefore

\[
y(t) = 1 + t + \frac{1}{2!}t^2 + \frac{1}{3!}t^3 + \cdots + \frac{1}{n!}t^n + \cdots
\]

which we recognize as the familiar power series expansion of $y(t) = e^t$, the solution to the IVP $y' = y$, $y(0) = 1$.
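The recurrence in example 8.3.1 is easy to run by machine. The short sketch below (function and variable names are illustrative) generates the coefficients exactly with rational arithmetic and compares the resulting partial sum at t = 1 with the value of e.

```python
from fractions import Fraction
import math

def coefficients(a0, count):
    """Coefficients a_0, ..., a_{count-1} from the recurrence a_{n+1} = a_n / (n + 1)."""
    a = [Fraction(a0)]
    for n in range(count - 1):
        a.append(a[n] / (n + 1))
    return a

a = coefficients(1, 15)   # a_0 = y(0) = 1
t = 1.0
approx = sum(float(c) * t ** n for n, c in enumerate(a))
print(a[:5])
print(approx, math.exp(t))
```

Each generated coefficient equals 1/n!, in agreement with the pattern observed in the example.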

Obviously there is no need to use power series to solve the IVP given in example 8.3.1, as it is a standard linear first-order equation. However, given our desire to solve higher order equations that are linear, but for which we currently lack a method for obtaining an analytic solution, this example is important since we hope to generalize from the simpler first-order constant coefficient case to the more difficult second-order nonconstant coefficient one. For example, a linear second-order differential equation such as

\[
y'' - 2ty' + y = 0 \tag{8.3.2}
\]

in which the coefficients of y, $y'$, and $y''$ are not all constant is not among the collection of equations whose solutions we can currently determine. Equations such as (8.3.2) belong to a family of equations of the general form

\[
y'' + p(t)y' + q(t)y = f(t) \tag{8.3.3}
\]

that we now aspire to solve. Before we solve equations of form (8.3.3), we consider one more familiar example that introduces other critical ideas that arise when solving linear second-order equations through power series expansions. Because we already know the solution to the equation we consider, we will be able to check our work appropriately and better see the role that series expansions play.

Example 8.3.2 Solve the initial-value problem

\[
y'' + y = 0, \qquad y(0) = 1, \quad y'(0) = 1
\]

by assuming that y has a power series expansion $y(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4 + \cdots$.

Solution.

Since $y = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4 + \cdots$, it follows that

\[
y' = a_1 + 2a_2 t + 3a_3 t^2 + 4a_4 t^3 + \cdots
\]

and

\[
y'' = 2a_2 + 3 \cdot 2\,a_3 t + 4 \cdot 3\,a_4 t^2 + 5 \cdot 4\,a_5 t^3 + \cdots
\]

Substituting for y and $y''$ in the given equation $y'' + y = 0$, we have

\[
(a_0 + a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4 + \cdots) + (2a_2 + 6a_3 t + 12a_4 t^2 + 20a_5 t^3 + \cdots) = 0
\]

Gathering like powers of t,

\[
(a_0 + 2a_2) + (a_1 + 6a_3)t + (a_2 + 12a_4)t^2 + (a_3 + 20a_5)t^3 + \cdots = 0 \tag{8.3.4}
\]

Setting each coefficient of powers of t in (8.3.4) equal to zero implies that the following sequence of equalities holds:

\[
\begin{aligned}
a_0 &= -2a_2 & a_1 &= -6a_3 \\
a_2 &= -12a_4 & a_3 &= -20a_5 \\
a_4 &= -30a_6 & a_5 &= -42a_7 \\
&\;\;\vdots & &\;\;\vdots \\
a_{2n} &= -(2n + 2)(2n + 1)a_{2n+2} & a_{2n+1} &= -(2n + 3)(2n + 2)a_{2n+3}
\end{aligned}
\]

We group these equations into the two columns shown for the natural reason that the coefficients with even indices depend recursively on one another, as do the coefficients with odd indices. Furthermore, we see that if we can identify both $a_0$ and $a_1$ (which we can through the two stated initial conditions), then we can determine all of the remaining coefficients. Specifically, since $y(0) = 1$ and $a_0 = y(0)$, it follows that $a_0 = 1$. Similarly, with the given condition $y'(0) = 1$ and the fact that $a_1 = y'(0)$, we know $a_1 = 1$. Thus, from the sequence of equalities with even indices above,

\[
a_0 = 1, \quad a_2 = -\frac{1}{2}, \quad a_4 = -\frac{1}{4 \cdot 3}a_2 = \frac{1}{4 \cdot 3 \cdot 2} = \frac{1}{4!}, \quad \text{and} \quad a_6 = -\frac{1}{30}a_4 = -\frac{1}{6 \cdot 5 \cdot 4!} = -\frac{1}{6!}
\]

From this and the stated recurrence relation for $a_{2n}$ and $a_{2n+2}$, we observe that

\[
a_{2n} = (-1)^n \frac{1}{(2n)!}, \qquad n = 0, 1, 2, \ldots \tag{8.3.5}
\]

The formula (8.3.5) implies that the portion of the series expansion for y in which all of the powers of t are even will be

\[
y_1 = 1 - \frac{1}{2!}t^2 + \frac{1}{4!}t^4 - \frac{1}{6!}t^6 + \cdots \tag{8.3.6}
\]

which we recognize as the familiar series expansion for $\cos t$. Returning to the recurrence relation involving the coefficients with odd indices, nearly identical work to that with the even coefficients shows that

\[
a_1 = 1, \quad a_3 = -\frac{1}{3!}, \quad a_5 = -\frac{1}{5 \cdot 4}a_3 = \frac{1}{5!}, \quad \text{and} \quad a_7 = -\frac{1}{42}a_5 = -\frac{1}{7!}
\]

These observations imply that the part of the expansion of y involving odd powers of t has the form

\[
y_2 = t - \frac{1}{3!}t^3 + \frac{1}{5!}t^5 - \frac{1}{7!}t^7 + \cdots \tag{8.3.7}
\]

which is $\sin t$. Hence our work with series expansions at (8.3.6) and (8.3.7) has shown that

\[
\begin{aligned}
y &= 1 + t - \frac{1}{2!}t^2 - \frac{1}{3!}t^3 + \frac{1}{4!}t^4 + \frac{1}{5!}t^5 - \frac{1}{6!}t^6 - \frac{1}{7!}t^7 + \cdots \\
&= \left(1 - \frac{1}{2!}t^2 + \frac{1}{4!}t^4 - \frac{1}{6!}t^6 + \cdots\right) + \left(t - \frac{1}{3!}t^3 + \frac{1}{5!}t^5 - \frac{1}{7!}t^7 + \cdots\right) \\
&= \cos t + \sin t
\end{aligned}
\tag{8.3.8}
\]
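The two columns of equalities in example 8.3.2 amount to the single recurrence $a_{n+2} = -a_n / ((n+2)(n+1))$. The sketch below (names are illustrative) generates the coefficients from $a_0 = a_1 = 1$ and compares a partial sum with cos t + sin t at a sample point.

```python
from fractions import Fraction
import math

def coefficients(a0, a1, count):
    """Coefficients from a_{n+2} = -a_n / ((n + 2)(n + 1)), the recurrence behind (8.3.4)."""
    a = [Fraction(a0), Fraction(a1)]
    for n in range(count - 2):
        a.append(-a[n] / ((n + 2) * (n + 1)))
    return a

a = coefficients(1, 1, 20)   # a_0 = y(0) = 1, a_1 = y'(0) = 1
t = 0.7
approx = sum(float(c) * t ** n for n, c in enumerate(a))
print(a[:8])
print(approx, math.cos(t) + math.sin(t))
```

The even-indexed entries reproduce (8.3.5), and the odd-indexed entries follow the same pattern with (2n + 1)! in the denominator.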

Again, it is no surprise that $y = \cos t + \sin t$ is the solution to the IVP $y'' + y = 0$, $y(0) = 1$, $y'(0) = 1$. We know from our work in several different contexts that the general solution to this differential equation is $y = c_1 \cos t + c_2 \sin t$, and can easily see that the given two initial conditions lead to $c_1 = c_2 = 1$. Even without the initial conditions, we could have determined from our work in example 8.3.2 that $y = a_0 \cos t + a_1 \sin t$. Regardless, there is a great deal we can learn about series solutions to differential equations by thinking carefully about our work in this familiar example.

First, we saw that in order to get the recurrence relations started, we needed to know the values of $a_0$ and $a_1$. This reinforces the fact that the solution space to the second-order equation is two dimensional, and suggests that the power series expansion has the property that it detects the need for two linearly independent solutions. Next, we observe from our work in example 8.3.2 that two different unlinked series solutions arose in the solution; these turned out to be the expansions for the cosine and sine functions, respectively, each of which has an infinite radius of convergence. This led to the overall solution series being convergent for every value of t. Finally, we note that normally we will need to be content with expressions that state the first few nonzero terms of a power series expansion, as we cannot expect in general to be able to recognize familiar power series expansions within solutions, as we did at (8.3.8).

In general, we will be interested in linear differential equations of the form

\[
y'' + p(t)y' + q(t)y = 0 \tag{8.3.9}
\]

If p(t) and q(t) are both analytic functions at $t = a$ (that is, both have a Taylor expansion at a), then we call $t = a$ an ordinary point of the DE (8.3.9). Otherwise, $t = a$ is a singular point of (8.3.9). The following theorem tells us that if $t = 0$ is an ordinary point of (8.3.9), then there exist two linearly independent solutions to the DE that may be represented by Taylor series centered at $t = 0$.

Theorem 8.3.1 If $t = 0$ is an ordinary point of (8.3.9), then there exist two linearly independent solutions

\[
y_1(t) = \sum_{n=0}^{\infty} a_n t^n \quad \text{and} \quad y_2(t) = \sum_{n=0}^{\infty} b_n t^n \tag{8.3.10}
\]

Both series converge in a disk $|t| < R$, where R is at least as large as the distance from the origin to the nearest singular point of the functions p(t) and q(t).

In example 8.3.2, the coefficient functions of $y'$ and y in the DE were simply the constant functions 0 and 1, which are each analytic everywhere. Theorem 8.3.1 implies that the two series expansions we found (which were those of the cosine and sine functions) must therefore converge everywhere. We see from this result that anytime the coefficient functions p(t) and q(t) are constant, the series solutions that arise must converge everywhere. This is not surprising, given our experience that in the case of linear differential equations with constant coefficients, solutions essentially consist of the functions $e^{kt}$, $\sin kt$, and $\cos kt$. More generally, we can now state that if p(t) and q(t) are polynomial functions, which are also analytic everywhere, then the series in (8.3.10) must both converge everywhere.

We now consider an example involving a differential equation that we are unable to solve by other means in order to gain more understanding of the role played by infinite series in its solution.

Example 8.3.3 Consider the linear second-order differential equation

\[
y'' - 2ty' + y = 0 \tag{8.3.11}
\]

Determine two linearly independent series solutions to this equation. Then, solve the initial-value problem given by this DE along with the initial conditions $y(0) = 2$, $y'(0) = -1$.

Solution. We begin by assuming that $y = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + \cdots = \sum_{n=0}^{\infty} a_n t^n$. From this, it follows that

\[
\begin{aligned}
y' &= a_1 + 2a_2 t + 3a_3 t^2 + 4a_4 t^3 + \cdots = \sum_{n=1}^{\infty} n a_n t^{n-1} \\
-2ty' &= -2a_1 t - 4a_2 t^2 - 6a_3 t^3 - 8a_4 t^4 - \cdots = -\sum_{n=1}^{\infty} 2n a_n t^{n} \\
y'' &= 2a_2 + 6a_3 t + 12a_4 t^2 + 20a_5 t^3 + \cdots = \sum_{n=2}^{\infty} n(n-1) a_n t^{n-2}
\end{aligned}
\]

In many instances, it will be most convenient to work with power series represented in the shorthand sigma ($\Sigma$) notation, which is how we will proceed from here. Substituting in (8.3.11) with the series expressions for $y''$, $-2ty'$, and y, we find

\[
\sum_{n=2}^{\infty} n(n-1) a_n t^{n-2} - \sum_{n=1}^{\infty} 2n a_n t^{n} + \sum_{n=0}^{\infty} a_n t^{n} = 0 \tag{8.3.12}
\]

In order to equate the coefficients of like powers of t, it is helpful to write each series in (8.3.12) using the same indices for the sum. Replacing n with n + 2 allows us to write

\[
\sum_{n=2}^{\infty} n(n-1) a_n t^{n-2} = \sum_{n=0}^{\infty} (n+2)(n+1) a_{n+2} t^{n}
\]

In addition, observe that

\[
\sum_{n=1}^{\infty} 2n a_n t^{n} = \sum_{n=0}^{\infty} 2n a_n t^{n}
\]

because the term $2n a_n t^n$ vanishes when n = 0. Therefore we can revise (8.3.12) to have the form

\[
\sum_{n=0}^{\infty} (n+2)(n+1) a_{n+2} t^{n} + \sum_{n=0}^{\infty} (-2n a_n) t^{n} + \sum_{n=0}^{\infty} a_n t^{n} = 0 \tag{8.3.13}
\]

Now that each series is indexed from n = 0 with corresponding powers of t, we can combine the three sums into one and write

\[
\sum_{n=0}^{\infty} \left[(n+2)(n+1) a_{n+2} - 2n a_n + a_n\right] t^{n} = 0 \tag{8.3.14}
\]

Because (8.3.14) implies that every coefficient of the series must be zero, we see that the constants $a_n$ must satisfy the recurrence relation

\[
(n+2)(n+1) a_{n+2} - 2n a_n + a_n = 0
\]

or equivalently

\[
a_{n+2} = \frac{2n - 1}{(n+2)(n+1)}\,a_n, \qquad n = 0, 1, 2, \ldots \tag{8.3.15}
\]

Here it is essential to observe that since the subscripts differ by two in (8.3.15), we can obtain two distinct series solutions to the original equation (8.3.11), one involving all of the even terms and the other all of the odd ones. In particular, considering n = 0, 2, 4, ..., we have from (8.3.15) that

\[
a_2 = -\frac{1}{2}a_0, \quad a_4 = \frac{3}{3 \cdot 4}a_2 = \frac{-1 \cdot 3}{2 \cdot 3 \cdot 4}a_0, \quad \text{and} \quad a_6 = \frac{7}{6 \cdot 5}a_4 = \frac{-1 \cdot 3 \cdot 7}{2 \cdot 3 \cdot 4 \cdot 5 \cdot 6}a_0
\]

More generally, the pattern

\[
a_{2n} = \frac{-1 \cdot 3 \cdot 7 \cdots (4n - 5)}{(2n)!}\,a_0
\]

holds and therefore

\[
\begin{aligned}
y_1(t) &= a_0 - \frac{1}{2}a_0 t^2 - \frac{1}{8}a_0 t^4 - \frac{7}{240}a_0 t^6 - \cdots \\
&= a_0 - a_0 \sum_{n=1}^{\infty} \frac{1 \cdot 3 \cdot 7 \cdots (4n - 5)}{(2n)!}\,t^{2n}
\end{aligned}
\tag{8.3.16}
\]

(where the product in the numerator is understood to equal 1 when n = 1). Similarly, if we examine the odd terms for n = 1, 3, 5, ... in (8.3.15), we see

\[
a_3 = \frac{1}{2 \cdot 3}a_1, \quad a_5 = \frac{5}{4 \cdot 5}a_3 = \frac{1 \cdot 5}{2 \cdot 3 \cdot 4 \cdot 5}a_1, \quad \text{and} \quad a_7 = \frac{9}{6 \cdot 7}a_5 = \frac{1 \cdot 5 \cdot 9}{2 \cdot 3 \cdot 4 \cdot 5 \cdot 6 \cdot 7}a_1
\]

Thus, we find

\[
a_{2n+1} = \frac{1 \cdot 5 \cdot 9 \cdots (4n - 3)}{(2n + 1)!}\,a_1
\]

and therefore

\[
\begin{aligned}
y_2(t) &= a_1 t + \frac{1}{6}a_1 t^3 + \frac{1}{24}a_1 t^5 + \cdots \\
&= a_1 t + a_1 \sum_{n=1}^{\infty} \frac{1 \cdot 5 \cdot 9 \cdots (4n - 3)}{(2n + 1)!}\,t^{2n+1}
\end{aligned}
\tag{8.3.17}
\]

Because $y_1$ only involves even powers of t and $y_2$ only involves odd powers of t, it is obvious that $y_1$ and $y_2$ must be linearly independent functions: it is impossible for one to be a scalar multiple of the other. Hence we have found the two basic solutions to the given DE and the general solution is

\[
y = y_1 + y_2 = a_0\left[1 - \sum_{n=1}^{\infty} \frac{1 \cdot 3 \cdot 7 \cdots (4n - 5)}{(2n)!}\,t^{2n}\right] + a_1\left[t + \sum_{n=1}^{\infty} \frac{1 \cdot 5 \cdot 9 \cdots (4n - 3)}{(2n + 1)!}\,t^{2n+1}\right]
\]

Moreover, since $p(t) = -2t$ and $q(t) = 1$ are analytic everywhere, it follows from theorem 8.3.1 that both $y_1$ and $y_2$ converge for all values of t, as must the general solution. Finally, if we desire to solve the initial-value problem with $y(0) = 2$ and $y'(0) = -1$, we need only observe from our beginning assumption regarding the series expansion of y that $y(0) = a_0 = 2$ and $y'(0) = a_1 = -1$. Therefore, the solution to the IVP is

\[
y = 2\left[1 - \sum_{n=1}^{\infty} \frac{1 \cdot 3 \cdot 7 \cdots (4n - 5)}{(2n)!}\,t^{2n}\right] - \left[t + \sum_{n=1}^{\infty} \frac{1 \cdot 5 \cdot 9 \cdots (4n - 3)}{(2n + 1)!}\,t^{2n+1}\right]
\]
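Since the solution in example 8.3.3 is known only through its coefficients, it is reassuring to check it by machine. The following sketch (function names are illustrative) builds the coefficients from recurrence (8.3.15) with $a_0 = 2$, $a_1 = -1$ and then substitutes the truncated series back into $y'' - 2ty' + y$ at a sample point, using exact rational arithmetic.

```python
from fractions import Fraction

def coefficients(a0, a1, count):
    """Coefficients from a_{n+2} = (2n - 1) a_n / ((n + 2)(n + 1)), recurrence (8.3.15)."""
    a = [Fraction(a0), Fraction(a1)]
    for n in range(count - 2):
        a.append(Fraction(2 * n - 1, (n + 2) * (n + 1)) * a[n])
    return a

def residual(a, t):
    """Value of y'' - 2 t y' + y for the polynomial whose coefficients are a."""
    y = sum(c * t ** n for n, c in enumerate(a))
    dy = sum(n * c * t ** (n - 1) for n, c in enumerate(a) if n >= 1)
    d2y = sum(n * (n - 1) * c * t ** (n - 2) for n, c in enumerate(a) if n >= 2)
    return d2y - 2 * t * dy + y

a = coefficients(2, -1, 40)   # a_0 = y(0) = 2, a_1 = y'(0) = -1
print(a[:8])
print(float(residual(a, Fraction(1, 2))))
```

The residual is not exactly zero because the series is truncated after 40 terms, but only the two highest powers of t contribute to it, so it is extremely small at t = 1/2.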

In the recurrence relation that arises from assuming that $y = a_0 + a_1 t + a_2 t^2 + \cdots$, it is not always obvious that two linearly independent solutions to the original linear second-order equation arise. Often, we must content ourselves with finding the first several terms of the overall general solution and rely on theorem 8.3.1 to tell us that both have been found. We close this section with an example that demonstrates this fact through connections to earlier material we have studied.

Example 8.3.4 Use infinite series to determine the solution to the initial-value problem

\[
y'' - 2y' - 3y = 0, \qquad y(0) = 4, \quad y'(0) = 0 \tag{8.3.18}
\]

Compare your result to the known solution to this IVP, which can be found without using series.

Solution. Considering the series expansions for y, $y'$, and $y''$, we observe that

\[
\begin{aligned}
y &= a_0 + a_1 t + a_2 t^2 + a_3 t^3 + \cdots + a_n t^n + \cdots \\
y' &= a_1 + 2a_2 t + 3a_3 t^2 + 4a_4 t^3 + \cdots + (n+1)a_{n+1} t^n + \cdots \\
y'' &= 2a_2 + 6a_3 t + 12a_4 t^2 + 20a_5 t^3 + \cdots + (n+2)(n+1)a_{n+2} t^n + \cdots
\end{aligned}
\]

From the differential equation $y'' - 2y' - 3y = 0$, we know that $y'' = 2y' + 3y$. Equating like coefficients from the expressions for $y''$ and $2y' + 3y$, we find the recurrence relation

\[
\begin{aligned}
2a_2 &= 2a_1 + 3a_0 \\
6a_3 &= 4a_2 + 3a_1 \\
12a_4 &= 6a_3 + 3a_2 \\
20a_5 &= 8a_4 + 3a_3 \\
&\;\;\vdots
\end{aligned}
\]


More generally, we can state that for any $n \ge 2$,

\[
a_n = \frac{(2n-2)a_{n-1} + 3a_{n-2}}{n(n-1)}
\]

Using the given initial conditions, we find that $a_0 = y(0) = 4$ and $a_1 = y'(0) = 0$, and subsequently that

\[
\begin{aligned}
a_2 &= \frac{2a_1 + 3a_0}{2} = \frac{0 + 12}{2} = 6 \\
a_3 &= \frac{4a_2 + 3a_1}{6} = \frac{24 + 0}{6} = 4 \\
a_4 &= \frac{6a_3 + 3a_2}{12} = \frac{24 + 18}{12} = \frac{7}{2}
\end{aligned}
\]

and therefore the solution to the IVP is

\[
y = 4 + 6t^2 + 4t^3 + \frac{7}{2}t^4 + \cdots
\]

We can confirm that this is in fact the correct solution by solving the IVP through another approach and considering power series expansions of the basic solution functions. In particular, since the characteristic equation of (8.3.18) is $r^2 - 2r - 3 = 0$ with roots $r = 3$ and $r = -1$, the general solution of the DE is

\[
y = c_1 e^{3t} + c_2 e^{-t}
\]

It is a standard exercise to show that the values of the constants that satisfy the initial conditions are $c_1 = 1$ and $c_2 = 3$, so that

\[
y = e^{3t} + 3e^{-t}
\]

If we now employ the standard power series expansion for $e^t$ to write series expansions for the two solutions present in $y$, and then combine like terms, we observe that

\[
\begin{aligned}
y &= e^{3t} + 3e^{-t} \\
&= \left(1 + 3t + \frac{9t^2}{2!} + \frac{27t^3}{3!} + \frac{81t^4}{4!} + \cdots\right)
 + \left(3 - 3t + \frac{3t^2}{2!} - \frac{3t^3}{3!} + \frac{3t^4}{4!} - \cdots\right) \\
&= 4 + \frac{12t^2}{2!} + \frac{24t^3}{3!} + \frac{84t^4}{4!} + \cdots \\
&= 4 + 6t^2 + 4t^3 + \frac{7}{2}t^4 + \cdots
\end{aligned}
\]

which is precisely the power series expansion of the solution we found at the outset.
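The comparison in Example 8.3.4 can also be carried out mechanically. The following is a minimal sketch, not part of the original text, that iterates the recurrence $n(n-1)a_n = (2n-2)a_{n-1} + 3a_{n-2}$ in exact rational arithmetic and compares the result with the Taylor coefficients $(3^n + 3(-1)^n)/n!$ of $e^{3t} + 3e^{-t}$. The function names `series_coefficients` and `exact_coefficients` are illustrative choices only.

```python
from fractions import Fraction
from math import factorial


def series_coefficients(a0, a1, count):
    """Return a_0, ..., a_{count-1} for y'' - 2y' - 3y = 0.

    Uses the recurrence n(n-1) a_n = (2n-2) a_{n-1} + 3 a_{n-2}, n >= 2.
    """
    coeffs = [Fraction(a0), Fraction(a1)]
    for n in range(2, count):
        numerator = (2 * n - 2) * coeffs[n - 1] + 3 * coeffs[n - 2]
        coeffs.append(numerator / (n * (n - 1)))
    return coeffs[:count]


def exact_coefficients(count):
    """Taylor coefficients of e^{3t} + 3 e^{-t} about t = 0."""
    return [Fraction(3**n + 3 * (-1) ** n, factorial(n)) for n in range(count)]


if __name__ == "__main__":
    from_recurrence = series_coefficients(4, 0, 8)
    print([str(c) for c in from_recurrence])
    print("matches closed form:", from_recurrence == exact_coefficients(8))
```

Exact fractions are used so that agreement between the two lists is an equality test rather than a floating-point comparison.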


Example 8.3.4 demonstrates that although the series form of the solution can hide some of the inherent structure in the solution, this approach is nonetheless straightforward to apply and will effectively lead us to the power series expansion of the solution to a stated IVP. Exercises 8.3 In exercises 1–13, ﬁnd the ﬁrst four terms in the Taylor series representation of the general solution to the stated DE. 1. y + ty = 0 2. y + 4y = 0 3. y + 4y = 0 4. y + ty = 0 5. y + 6y + 5y = 0 6. y + y + 4y = 0 7. y − y − 6y = 0 8. y + t 2 y = 0 9. (1 − t )y + y = 0 10. (t 2 − 1)y − 4y = 0 11. y + 3ty + 3y = 0 12. (t 2 + 1)y − 2y = 0 13. (1 − t 2 )y − 12ty − 18y = 0 In exercises 14–17, ﬁnd the ﬁrst four nonzero coefﬁcients of the Taylor series expansion for the solution to the stated IVP. 14. (4 − t 2 )y + 2y = 0, 15. y + (1 − t )y = 0,

y(0) = 1,

16. y − t 2 y + y sin t = 0, 17. y + y sin t = 0,

y (0) = 1

y(0) = 0,

y (0) = 0

y(0) = 0,

y(0) = 1,

y (0) = 1

y (0) = 0

8.4 Legendre’s equation

A differential equation that arises naturally in physics, particularly when using spherical coordinates, is the Legendre equation,

\[
(1 - t^2)y'' - 2ty' + \lambda(\lambda + 1)y = 0 \tag{8.4.1}
\]

The parameter $\lambda$ is often a positive integer, though it is allowed to be any real, non-negative constant. If we divide both sides of (8.4.1) by $1 - t^2$ to write the equation in standard form $y'' + p(t)y' + q(t)y = 0$, we have

\[
y'' - \frac{2t}{1 - t^2}\,y' + \frac{\lambda(\lambda + 1)}{1 - t^2}\,y = 0 \tag{8.4.2}
\]

With

\[
p(t) = -\frac{2t}{1 - t^2} \quad\text{and}\quad q(t) = \frac{\lambda(\lambda + 1)}{1 - t^2}
\]

it follows that the origin is an ordinary point of Legendre's equation and the nearest singularities lie at $t = \pm 1$. We therefore expect that we can find Taylor series expansions about $t = 0$ for each of the two linearly independent solutions of (8.4.1), and the radius of convergence of each such series will be at least 1. To solve the Legendre equation, we assume that

\[
y(t) = \sum_{n=0}^{\infty} a_n t^n
\]

and consider the three terms present in the DE: $(1 - t^2)y''$, $-2ty'$, and $\lambda(\lambda + 1)y$. Letting $\alpha = \lambda(\lambda + 1)$ and writing each of these expressions in its series expansion, we have

\[
\begin{aligned}
(1 - t^2)y'' &= (1 - t^2)\sum_{n=2}^{\infty} n(n-1)a_n t^{n-2}
 = \sum_{n=2}^{\infty} n(n-1)a_n t^{n-2} - \sum_{n=2}^{\infty} n(n-1)a_n t^{n} \\
&= \sum_{n=0}^{\infty} (n+2)(n+1)a_{n+2} t^{n} - \sum_{n=0}^{\infty} n(n-1)a_n t^{n}
\end{aligned} \tag{8.4.3}
\]

\[
-2ty' = -2t\sum_{n=1}^{\infty} n a_n t^{n-1} = \sum_{n=1}^{\infty} -2n a_n t^{n} = \sum_{n=0}^{\infty} -2n a_n t^{n} \tag{8.4.4}
\]

\[
\alpha y = \sum_{n=0}^{\infty} \alpha a_n t^{n} \tag{8.4.5}
\]

To achieve the final expression for $(1 - t^2)y''$ in (8.4.3), we re-indexed the first sum by letting $n$ be replaced by $n + 2$ and lowering the starting index, and re-indexed the second sum by noting that when $n = 0$ and $n = 1$, the coefficient $n(n-1)$ vanishes, so starting at $n = 0$ is the same as starting at $n = 2$. Likewise, for the expression for $-2ty'$, the term $n a_n t^n$ is zero when $n = 0$, so we can start the sum at $n = 0$ instead of $n = 1$ in (8.4.4). Thus, all three series are written in terms of powers $t^n$ starting at $n = 0$. Next, to satisfy Legendre's equation (8.4.1), we take the series expressions in (8.4.3), (8.4.4), and (8.4.5) and set their collective sum to zero. Doing so,

\[
\begin{aligned}
0 &= (1 - t^2)y'' - 2ty' + \alpha y \\
&= \sum_{n=0}^{\infty} (n+2)(n+1)a_{n+2} t^{n} - \sum_{n=0}^{\infty} n(n-1)a_n t^{n} + \sum_{n=0}^{\infty} -2n a_n t^{n} + \sum_{n=0}^{\infty} \alpha a_n t^{n} \\
&= \sum_{n=0}^{\infty} \left[(n+2)(n+1)a_{n+2} - \bigl(n(n-1) + 2n - \alpha\bigr)a_n\right] t^{n} \\
&= \sum_{n=0}^{\infty} \left[(n+2)(n+1)a_{n+2} - (n^2 + n - \alpha)a_n\right] t^{n}
\end{aligned} \tag{8.4.6}
\]

We thus observe that (8.4.6) implies the recurrence relation

\[
(n+2)(n+1)a_{n+2} - (n^2 + n - \alpha)a_n = 0 \tag{8.4.7}
\]

Recalling that $\alpha = \lambda(\lambda + 1) = \lambda^2 + \lambda$, we may write

\[
n^2 + n - \alpha = n^2 + n - \lambda^2 - \lambda = (n - \lambda)(n + \lambda + 1) \tag{8.4.8}
\]

Hence, (8.4.7) and (8.4.8) together show

\[
a_{n+2} = \frac{(n - \lambda)(n + \lambda + 1)}{(n+2)(n+1)}\,a_n \tag{8.4.9}
\]

As we have seen in certain other DEs, the recurrence relation (8.4.9) makes all of the even coefficients in the expansion for $y$ depend on $a_0$, and all of the odd coefficients depend on $a_1$. Assuming that $a_0 = 1$ and computing the first few even coefficients, we find that

\[
a_0 = 1, \qquad a_2 = \frac{(-\lambda)(\lambda + 1)}{2\cdot 1}\,a_0, \qquad a_4 = \frac{(2 - \lambda)(3 + \lambda)}{4\cdot 3}\,a_2
\]

so that one solution to the Legendre equation is

\[
y_1(t) = 1 - \frac{1}{2!}\lambda(\lambda + 1)t^2 + \frac{1}{4!}\lambda(\lambda + 1)(\lambda - 2)(\lambda + 3)t^4 + \cdots \tag{8.4.10}
\]

Similar computations for the odd coefficients with $a_1 = 1$ result in the function

\[
y_2(t) = t - \frac{1}{3!}(\lambda - 1)(\lambda + 2)t^3 + \frac{1}{5!}(\lambda - 1)(\lambda - 3)(\lambda + 2)(\lambda + 4)t^5 + \cdots \tag{8.4.11}
\]

The solutions $y_1$ and $y_2$ are clearly linearly independent and therefore form a basis for the set of all solutions to the Legendre equation. Note particularly that each depends directly on the parameter $\lambda$, as the Legendre equation is actually a family of equations where each equation depends on $\lambda$. In our development of $y_1$ and $y_2$, note that we took $a_0 = 1$ (with $a_1 = 0$) for $y_1$ and $a_1 = 1$ (with $a_0 = 0$) for $y_2$, which is equivalent to requiring $y_1(0) = 1$, $y_1'(0) = 0$ and $y_2(0) = 0$, $y_2'(0) = 1$. The general solution of the Legendre equation is $y = a_0 y_1 + a_1 y_2$, where $y_1$ and $y_2$ are given by (8.4.10) and (8.4.11), respectively.

The case when $\lambda$ is a non-negative integer is particularly interesting. From the recurrence relation (8.4.9), whenever $\lambda = n$, it follows that $a_{n+2} = 0$ and hence $a_{n+4}, a_{n+6}, \ldots$ are all zero. Since this causes the series expansion of $y_1$ or $y_2$ to terminate, one of the resulting solutions to the differential equation is a polynomial. In particular, if $\lambda$ is an even integer, say $\lambda = 2m$, then $y_1(t)$ is a polynomial of degree $2m$. For example,

\[
\begin{aligned}
\lambda = 0:&\quad y_1(t) = 1 \\
\lambda = 2:&\quad y_1(t) = 1 - 3t^2 \\
\lambda = 4:&\quad y_1(t) = 1 - 10t^2 + \frac{35}{3}t^4
\end{aligned}
\]

Similarly, in the case where $\lambda = 2m + 1$ is an odd integer, $y_2(t)$ is a polynomial of degree $2m + 1$. The first few examples for small values of $\lambda$ are

\[
\begin{aligned}
\lambda = 1:&\quad y_2(t) = t \\
\lambda = 3:&\quad y_2(t) = t - \frac{5}{3}t^3 \\
\lambda = 5:&\quad y_2(t) = t - \frac{14}{3}t^3 + \frac{21}{5}t^5
\end{aligned}
\]

These polynomials demonstrate that when $\lambda$ is a non-negative integer, at least one basic solution of the Legendre equation is a polynomial function. Moreover, since the Legendre equation is linear, any scalar multiple of a solution is also a solution, so we can scale these polynomials however we like. Doing so to make the polynomial's value 1 when $t = 1$ results in the family of polynomials

\[
\begin{aligned}
P_0(t) &= 1 \\
P_1(t) &= t \\
P_2(t) &= \frac{3}{2}t^2 - \frac{1}{2} \\
P_3(t) &= \frac{5}{2}t^3 - \frac{3}{2}t \\
P_4(t) &= \frac{35}{8}t^4 - \frac{30}{8}t^2 + \frac{3}{8} \\
P_5(t) &= \frac{63}{8}t^5 - \frac{70}{8}t^3 + \frac{15}{8}t
\end{aligned}
\]

The polynomials $P_n(t)$, which can also be described through a recurrence relation linking $P_{n+2}$ to $P_{n+1}$ and $P_n$, are known as the Legendre polynomials and form a well-known class of so-called orthogonal polynomials. The Legendre polynomials have many interesting properties, including the fact that each $P_n$ has $n$ real, distinct roots that lie in the interval $(-1, 1)$ and demonstrates an oscillatory behavior similar to the graph of $P_{11}(t)$ shown in figure 8.1. The study of orthogonal polynomials has important ramifications in many areas of mathematics and physics, but lies beyond the scope of this text.

Figure 8.1 The degree 11 Legendre polynomial, $P_{11}(t)$.

Regardless of whether $\lambda$ is a non-negative integer or not, the two infinite series expansions for $y_1$ and $y_2$ in (8.4.10) and (8.4.11) are the two linearly independent solutions of the Legendre equation. In the case where $\lambda$ is a non-negative integer, we have shown that one of these two infinite series terminates to form a polynomial, one of the Legendre polynomials. The other solution turns out to have recognizable structure as well. For instance, when $\lambda = 0$, we know that one solution to the Legendre equation comes from $y_1(t) = 1 = P_0(t)$. Setting $\lambda = 0$ in $y_2(t)$, it follows that

\[
\begin{aligned}
y_2(t) &= t - \frac{-1\cdot 2}{3!}t^3 + \frac{-1\cdot(-3)\cdot 2\cdot 4}{5!}t^5 + \cdots \\
&= t + \frac{1}{3}t^3 + \frac{1}{5}t^5 + \cdots
\end{aligned}
\]

It can be shown from this expansion that

\[
y_2(t) = \frac{1}{2}\ln\left(\frac{1+t}{1-t}\right) \tag{8.4.12}
\]

Thus, when $\lambda = 0$, a second linearly independent solution is given by $Q_0(t) = \frac{1}{2}\ln\left(\frac{1+t}{1-t}\right)$ and we write $y = c_1 P_0 + c_2 Q_0$. More generally, it can be shown that for any non-negative integer $\lambda = n$, a related expression involving $Q_0$ exists for the second linearly independent solution $Q_n$ that is not a polynomial. These functions are known as Legendre functions of the second kind; the first several of them are given by

\[
\begin{aligned}
Q_0(t) &= \frac{1}{2}\ln\left(\frac{1+t}{1-t}\right) \\
Q_1(t) &= P_1(t)Q_0(t) - 1 \\
Q_2(t) &= P_2(t)Q_0(t) - \frac{3}{2}t \\
Q_3(t) &= P_3(t)Q_0(t) - \frac{5}{2}t^2 + \frac{2}{3} \\
Q_4(t) &= P_4(t)Q_0(t) - \frac{35}{8}t^3 + \frac{55}{24}t
\end{aligned}
\]

Note that the presence of $Q_0(t)$ in each solution highlights the fact that singularities are present in the Legendre equation at $t = \pm 1$. The functions $P_1(t), P_2(t), \ldots$ are the previously noted Legendre polynomials. Further, the general solution of the Legendre equation with $\lambda = n \ge 0$ is therefore

\[
y(t) = c_1 P_n(t) + c_2 Q_n(t) \tag{8.4.13}
\]
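The terminating series and the normalization $P_n(1) = 1$ described in this section can be reproduced by iterating the recurrence (8.4.9) directly. The sketch below is an illustration only, not the text's own procedure: the function name `legendre_polynomial` and the coefficient-list representation (entry $k$ is the coefficient of $t^k$) are assumptions made for the example.

```python
from fractions import Fraction


def legendre_polynomial(n):
    """Return [c_0, ..., c_n], the coefficients of P_n(t) in powers of t.

    Builds the terminating series from (8.4.9) with lambda = n, then
    rescales so that the polynomial equals 1 at t = 1.
    """
    coeffs = [Fraction(0)] * (n + 1)
    start = n % 2  # even n: y1 with a_0 = 1; odd n: y2 with a_1 = 1
    coeffs[start] = Fraction(1)
    for k in range(start, n - 1, 2):
        ratio = Fraction((k - n) * (k + n + 1), (k + 2) * (k + 1))
        coeffs[k + 2] = ratio * coeffs[k]
    value_at_one = sum(coeffs)  # value of the unscaled polynomial at t = 1
    return [c / value_at_one for c in coeffs]


if __name__ == "__main__":
    for degree in range(6):
        print(degree, [str(c) for c in legendre_polynomial(degree)])
```

For degrees 0 through 5 the printed coefficient lists can be compared term by term with the list of $P_0, \ldots, P_5$ given in this section.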

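We close this section with an example.

Example 8.4.1 Find the solution of the initial-value problem

\[
(1 - t^2)y'' - 2ty' + 12y = 0, \qquad y(0) = 1, \qquad y'(0) = 1
\]

Solution. First, observe that the given DE is Legendre's equation with $\lambda = 3$, since $3(3 + 1) = 12$. From our earlier work in this section, we know that the general solution is

\[
\begin{aligned}
y(t) &= c_1 P_3(t) + c_2 Q_3(t) \\
&= c_1 P_3(t) + c_2\left(P_3(t)Q_0(t) - \frac{5}{2}t^2 + \frac{2}{3}\right) \\
&= P_3(t)\bigl(c_1 + c_2 Q_0(t)\bigr) + c_2\left(-\frac{5}{2}t^2 + \frac{2}{3}\right) \\
&= \left(\frac{5}{2}t^3 - \frac{3}{2}t\right)\left(c_1 + \frac{c_2}{2}\ln\frac{1+t}{1-t}\right) + c_2\left(-\frac{5}{2}t^2 + \frac{2}{3}\right)
\end{aligned} \tag{8.4.14}
\]

Applying the initial conditions $y(0) = 1$ and $y'(0) = 1$ to (8.4.14), and using $P_3(0) = 0$, $P_3'(0) = -\frac{3}{2}$, and $Q_0(0) = 0$, we find $y(0) = \frac{2}{3}c_2$ and $y'(0) = -\frac{3}{2}c_1$. Hence $c_1 = -2/3$ and $c_2 = 3/2$, and thus

\[
y = \left(\frac{5}{2}t^3 - \frac{3}{2}t\right)\left(-\frac{2}{3} + \frac{3}{4}\ln\frac{1+t}{1-t}\right) - \frac{15}{4}t^2 + 1
\]

is the solution to the given IVP.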
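A quick numerical sanity check of Example 8.4.1 is possible without any symbolic algebra. The sketch below assumes the solution in the form $y = P_3(t)\left(-\tfrac{2}{3} + \tfrac{3}{4}\ln\tfrac{1+t}{1-t}\right) - \tfrac{15}{4}t^2 + 1$ and approximates $y'$ and $y''$ by central differences; the helper names `y` and `residual` and the step size `h` are choices made for this illustration, and the residual is only expected to be small, not exactly zero.

```python
import math


def y(t):
    """Candidate solution of (1 - t^2) y'' - 2 t y' + 12 y = 0 on (-1, 1)."""
    p3 = 2.5 * t**3 - 1.5 * t
    return p3 * (-2.0 / 3.0 + 0.75 * math.log((1 + t) / (1 - t))) - 3.75 * t**2 + 1.0


def residual(t, h=1e-4):
    """Left side of Legendre's equation (lambda = 3) using central differences."""
    first = (y(t + h) - y(t - h)) / (2 * h)
    second = (y(t + h) - 2 * y(t) + y(t - h)) / h**2
    return (1 - t**2) * second - 2 * t * first + 12 * y(t)


if __name__ == "__main__":
    print("y(0)  =", y(0.0))
    print("y'(0) ~", (y(1e-6) - y(-1e-6)) / 2e-6)
    for point in (-0.5, 0.3, 0.7):
        print("residual at", point, "=", residual(point))
```

Because finite differences carry truncation and rounding error, the residuals should be read as evidence of consistency rather than as a proof.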

Exercises 8.4

1. Verify by direct substitution that the Legendre equation is satisfied by the polynomials $P_2(t)$ and $P_3(t)$ when $\lambda = 2$ and $\lambda = 3$, respectively.
2. Verify by direct substitution that $Q_0(t) = \frac{1}{2}\ln\left(\frac{1+t}{1-t}\right)$ is a solution of Legendre's equation with $\lambda = 0$.
3. Determine the Taylor series expansion about $a = 0$ of $f(t) = \frac{1}{2}\ln\left(\frac{1+t}{1-t}\right)$ and confirm that this matches (8.4.12).
4. Determine expressions for $P_6(t)$ and $P_7(t)$.

In exercises 5–7, find the general solution of the stated differential equation in terms of $P_n(t)$ and $Q_n(t)$. (Hint: Use the method of undetermined coefficients in the standard way to find a particular solution of each equation.)

5. $(1 - t^2)y'' - 2ty' + 6y = 6$
6. $(1 - t^2)y'' - 2ty' + 20y = 36t$
7. $(1 - t^2)y'' - 2ty' + 30y = 12t^2$

In exercises 8–13, find the first four nonzero coefficients of the Taylor series expansion (about $t = 0$) for the solution to the stated IVP.

8. $(1 - t^2)y'' - 2ty' + 2y = 0$, $y(0) = 1$, $y'(0) = 0$
9. $(1 - t^2)y'' - 2ty' + 3y = 0$, $y(0) = 1$, $y'(0) = 0$
10. $(1 - t^2)y'' - 2ty' + 20y = 18t$, $y(0) = 0$, $y'(0) = 1$
11. $9(1 - t^2)y'' - 18ty' + 4y = 0$, $y(0) = 0$, $y'(0) = 1$
12. $(1 - t^2)y'' - 2ty' + 20y = 0$, $y(0) = 1$, $y'(0) = 1$
13. $(1 - t^2)y'' - 2ty' + 20y = 14t^2$, $y(0) = 3$, $y'(0) = 1$

8.5 Three important examples

In this penultimate section on series solutions to differential equations, we consider and discuss three examples that arise in applied physics.

8.5.1 The Hermite equation

The Hermite equation is the linear second-order differential equation given by

\[
y'' - 2ty' + 2qy = 0 \tag{8.5.1}
\]

where $q$ is a real constant. Using the Taylor series expansions for $y$, $y'$, and $y''$ in the usual way with $y = a_0 + a_1 t + a_2 t^2 + \cdots$, it can be shown that

\[
\sum_{n=0}^{\infty} \left[(n+2)(n+1)a_{n+2} - 2(n - q)a_n\right] t^n = 0 \tag{8.5.2}
\]

from which follows the recurrence relation

\[
a_{n+2} = \frac{2(n - q)}{(n+1)(n+2)}\,a_n, \qquad n = 0, 1, 2, \ldots \tag{8.5.3}
\]

As we have seen in previous examples, the even-subscripted coefficients depend on $y(0) = a_0$, and the odd-subscripted coefficients involve $y'(0) = a_1$.


To calculate the first few nonzero terms in the expansion for the solution $y_1(t)$ involving even powers of $t$, we observe that

\[
\begin{aligned}
a_2 &= \frac{2(0 - q)}{1\cdot 2}\,a_0 = -2\,\frac{q}{2!}\,a_0 \\
a_4 &= \frac{2(2 - q)}{3\cdot 4}\,a_2 = -2^2\,\frac{q(2 - q)}{4!}\,a_0 \\
a_6 &= \frac{2(4 - q)}{5\cdot 6}\,a_4 = -2^3\,\frac{q(2 - q)(4 - q)}{6!}\,a_0
\end{aligned}
\]

More generally, it follows that

\[
a_{2k} = -2^k\,\frac{q(2 - q)\cdots(2k - 2 - q)}{(2k)!}\,a_0 \tag{8.5.4}
\]

If we elect to use the initial conditions $y(0) = 1$ and $y'(0) = 0$, this implies that $a_0 = 1$ and $a_1 = 0$; the latter condition and the recurrence relation (8.5.3) imply that all odd-subscripted coefficients are zero, and hence one solution to the Hermite differential equation is

\[
\begin{aligned}
y_1(t) &= a_0 + a_1 t + a_2 t^2 + \cdots \\
&= 1 - \frac{2q}{2!}t^2 - \frac{2^2 q(2 - q)}{4!}t^4 - \cdots \\
&= 1 - \sum_{n=1}^{\infty} 2^n\,\frac{q(2 - q)\cdots(2n - 2 - q)}{(2n)!}\,t^{2n}
\end{aligned} \tag{8.5.5}
\]

Using similar reasoning with odd-subscripted coefficients, (8.5.3) implies

\[
\begin{aligned}
a_3 &= \frac{2(1 - q)}{2\cdot 3}\,a_1 \\
a_5 &= \frac{2(3 - q)}{4\cdot 5}\,a_3 = 2^2\,\frac{(1 - q)(3 - q)}{5!}\,a_1 \\
a_7 &= \frac{2(5 - q)}{6\cdot 7}\,a_5 = 2^3\,\frac{(1 - q)(3 - q)(5 - q)}{7!}\,a_1
\end{aligned}
\]

From this, we can deduce that the general odd coefficient is given by

\[
a_{2k+1} = 2^k\,\frac{(1 - q)(3 - q)\cdots(2k - 1 - q)}{(2k + 1)!}\,a_1 \tag{8.5.6}
\]

Using the initial conditions $y(0) = 0 = a_0$ and $y'(0) = 1 = a_1$, a second solution to the Hermite equation is

\[
y_2(t) = t + \sum_{n=1}^{\infty} 2^n\,\frac{(1 - q)(3 - q)\cdots(2n - 1 - q)}{(2n + 1)!}\,t^{2n+1} \tag{8.5.7}
\]


Since $y_1(t)$ and $y_2(t)$ are linearly independent, the general solution to the Hermite equation is

\[
\begin{aligned}
y &= c_1 y_1 + c_2 y_2 \\
&= c_1\left(1 - \sum_{n=1}^{\infty} 2^n\,\frac{q(2 - q)\cdots(2n - 2 - q)}{(2n)!}\,t^{2n}\right)
 + c_2\left(t + \sum_{n=1}^{\infty} 2^n\,\frac{(1 - q)(3 - q)\cdots(2n - 1 - q)}{(2n + 1)!}\,t^{2n+1}\right)
\end{aligned} \tag{8.5.8}
\]
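The recurrence (8.5.3) is simple to iterate by machine. The following minimal sketch, added here as an illustration, generates the leading coefficients of either series in (8.5.8) using exact fractions; the function name `hermite_series` is hypothetical, and `q` is assumed to be passed as an integer or a `Fraction`. Running it for integer values of $q$ shows one of the two series stopping after finitely many nonzero terms.

```python
from fractions import Fraction


def hermite_series(q, a0, a1, count):
    """Return a_0, ..., a_{count-1} for y'' - 2 t y' + 2 q y = 0.

    Iterates a_{n+2} = 2 (n - q) a_n / ((n + 1)(n + 2)), i.e. (8.5.3).
    """
    coeffs = [Fraction(a0), Fraction(a1)]
    for n in range(count - 2):
        ratio = Fraction(2 * (n - q), (n + 1) * (n + 2))
        coeffs.append(ratio * coeffs[n])
    return coeffs[:count]


if __name__ == "__main__":
    # y1 (a_0 = 1, a_1 = 0) with q = 4, and y2 (a_0 = 0, a_1 = 1) with q = 5
    print([str(c) for c in hermite_series(4, 1, 0, 8)])
    print([str(c) for c in hermite_series(5, 0, 1, 8)])
```

Using `Fraction` keeps coefficients such as $4/3$ and $4/15$ exact, which makes the termination of the series visible as exact zeros.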

Just as we experienced with Legendre's equation, there are values for the constant $q$ in the Hermite equation that lead to polynomial solutions. In particular, the presence of the factor $(2n - 2 - q)$ in $y_1(t)$ implies that whenever $q$ is an even, non-negative integer, then $y_1(t)$ is a polynomial. Specifically, from (8.5.5), when $q = 0$, $q = 2$, and $q = 4$, it follows that

\[
\begin{aligned}
q = 0:&\quad y_1(t) = 1 \\
q = 2:&\quad y_1(t) = 1 - 2t^2 \\
q = 4:&\quad y_1(t) = 1 - 4t^2 + \frac{4}{3}t^4
\end{aligned} \tag{8.5.9}
\]

Similarly, for $q = 1$, $q = 3$, and $q = 5$, the function $y_2(t)$ that is a solution to the Hermite equation is found to be

\[
\begin{aligned}
q = 1:&\quad y_2(t) = t \\
q = 3:&\quad y_2(t) = t - \frac{2}{3}t^3 \\
q = 5:&\quad y_2(t) = t - \frac{4}{3}t^3 + \frac{4}{15}t^5
\end{aligned} \tag{8.5.10}
\]

The polynomial solutions to Hermite's equation given in (8.5.9) and (8.5.10) are usually called the Hermite polynomials $H_n(t)$ when scaled such that the coefficient of the highest power of $t$ is $2^n$. The first four Hermite polynomials are

\[
\begin{aligned}
H_0(t) &= 1 \\
H_1(t) &= 2t \\
H_2(t) &= 4t^2 - 2 \\
H_3(t) &= 8t^3 - 12t
\end{aligned}
\]

The Hermite polynomials are another example of a family of orthogonal polynomials; Hermite polynomials are orthogonal on $(-\infty, \infty)$ with respect to the weighting function $w(t) = e^{-t^2}$. Like Legendre polynomials, they have a wide range of interesting properties and the possibilities they present for further study go well beyond the scope of this text. A plot of $H_{11}(t)$ is shown in figure 8.2. The Hermite polynomials have large oscillations; the degree 11 polynomial has two more zeros, located at approximately $\pm 3.7$, which are not shown in figure 8.2.

Figure 8.2 The degree 11 Hermite polynomial, $H_{11}(t)$, plotted on the interval $[-3, 3]$.

8.5.2 The Laguerre equation

The Laguerre equation is given by

\[
ty'' + (1 - t)y' + qy = 0 \tag{8.5.11}
\]

where $q$ is, once again, a real constant. If we divide through by $t$, Laguerre's equation is equivalently expressed as

\[
y'' + \frac{1 - t}{t}\,y' + \frac{q}{t}\,y = 0
\]

Since the coefficient functions $(1 - t)/t$ of $y'$ and $q/t$ of $y$ are each undefined at $t = 0$, the Laguerre equation has a singular point at the origin. Nonetheless, it turns out that we can find a series expansion for a solution at the origin. Letting $y = a_0 + a_1 t + a_2 t^2 + \cdots$ and substituting for $y$, $y'$, and $y''$ in (8.5.11), it can be shown that the coefficients $a_n$ must satisfy

\[
\sum_{n=0}^{\infty} \left[(n + 1)^2 a_{n+1} + (q - n)a_n\right] t^n = 0 \tag{8.5.12}
\]

It follows from (8.5.12) that $(n + 1)^2 a_{n+1} + (q - n)a_n = 0$ and therefore

\[
a_{n+1} = -\frac{q - n}{(n + 1)^2}\,a_n \tag{8.5.13}
\]


Note that this recurrence relation relies only on the value of $a_0$, and therefore only leads to one solution to the Laguerre equation.³ Applying (8.5.13), we see

\[
\begin{aligned}
a_1 &= -\frac{q}{1^2}\,a_0 \\
a_2 &= -\frac{q - 1}{2^2}\,a_1 = \frac{1}{(1\cdot 2)^2}(q - 1)q\,a_0 \\
a_3 &= -\frac{q - 2}{3^2}\,a_2 = -\frac{1}{(1\cdot 2\cdot 3)^2}(q - 2)(q - 1)q\,a_0
\end{aligned}
\]

More generally,

\[
a_n = -\frac{q - n + 1}{n^2}\,a_{n-1} = (-1)^n\,\frac{(q - n + 1)\cdots(q - 1)q}{(n!)^2}\,a_0
\]

Taking $a_0 = 1$, we have found that one solution to the Laguerre equation is

\[
y_1(t) = 1 + \sum_{n=1}^{\infty} (-1)^n\,\frac{(q - n + 1)\cdots(q - 1)q}{(n!)^2}\,t^n \tag{8.5.14}
\]

When $q$ is a non-negative integer, we see from (8.5.14) that $y_1(t)$ is a polynomial of degree $q$. Recalling the binomial coefficient $\binom{q}{n}$ given by

\[
\binom{q}{n} = \frac{q!}{n!\,(q - n)!} = \frac{q(q - 1)\cdots(q - n + 1)}{n!} \tag{8.5.15}
\]

we are able to find a relatively simple expression for these polynomial solutions. The Laguerre polynomial of degree $q$ is given by

\[
L_q(t) = 1 + \sum_{n=1}^{q} \frac{(-1)^n}{n!}\binom{q}{n}\,t^n \tag{8.5.16}
\]

and these functions turn out to be the only solutions (up to scalar multiples) of the Laguerre equation that are analytic at $t = 0$. The Laguerre polynomials are yet another family of orthogonal polynomials. The first few of these polynomials are given below, followed by a graph of $L_{11}(t)$ in figure 8.3.

\[
\begin{aligned}
L_1(t) &= 1 - t \\
L_2(t) &= 1 - 2t + \frac{1}{2}t^2 \\
L_3(t) &= 1 - 3t + \frac{3}{2}t^2 - \frac{1}{6}t^3 \\
L_4(t) &= 1 - 4t + 3t^2 - \frac{2}{3}t^3 + \frac{1}{24}t^4
\end{aligned}
\]

³ A second solution can be found by more sophisticated techniques that lie beyond the scope of this book.
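The agreement between the recurrence (8.5.13) and the closed form (8.5.16) can be checked directly for small non-negative integers $q$. The sketch below is illustrative only; the function names are hypothetical, coefficients are stored with entry $n$ holding the coefficient of $t^n$, and `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb, factorial


def laguerre_from_recurrence(q):
    """Coefficients of L_q(t) from a_{n+1} = -(q - n) a_n / (n + 1)^2, a_0 = 1."""
    coeffs = [Fraction(1)]
    for n in range(q):
        coeffs.append(-Fraction(q - n, (n + 1) ** 2) * coeffs[n])
    return coeffs


def laguerre_closed_form(q):
    """Coefficients of L_q(t) from (8.5.16): (-1)^n C(q, n) / n!."""
    return [Fraction((-1) ** n * comb(q, n), factorial(n)) for n in range(q + 1)]


if __name__ == "__main__":
    for q in range(1, 5):
        print(q, [str(c) for c in laguerre_from_recurrence(q)])
    print(all(laguerre_from_recurrence(q) == laguerre_closed_form(q) for q in range(10)))
```

The $n = 0$ term of the closed form is the leading 1 in (8.5.16), so both functions return lists of length $q + 1$.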

Figure 8.3 The degree 11 Laguerre polynomial $L_{11}(t)$ plotted on the interval $[0, 10]$.

8.5.3 The Bessel equation

The Bessel equation

\[
t^2 y'' + ty' + (t^2 - \lambda^2)y = 0 \tag{8.5.17}
\]

is a very important DE in mathematical physics. The properties of its solutions have been well studied; the equation often appears in the process of solving certain partial differential equations that appear when using cylindrical coordinates. The parameter $\lambda$ in (8.5.17) is a real constant. Like the Laguerre equation, the Bessel equation has a singular point at $t = 0$, so we cannot expect in general to find solutions to the equation with Taylor series centered at $a = 0$. Nonetheless, as we will show shortly, a solution analytic at $t = 0$ exists when $\lambda$ is a non-negative integer. While a second linearly independent solution to the Bessel equation can be found, the techniques required are beyond the scope of this text. Here we only explore the series solutions that do exist for the Bessel equation. Let $\lambda = m$ be a non-negative integer and assume that $y_1(t) = a_0 + a_1 t + a_2 t^2 + \cdots$. Substituting directly in (8.5.17) leads to

\[
-m^2 a_0 + (1 - m^2)a_1 t + \sum_{k=2}^{\infty} \left[(k^2 - m^2)a_k + a_{k-2}\right] t^k = 0 \tag{8.5.18}
\]

Since each coefficient of powers of $t$ in (8.5.18) must be zero, it follows that $m^2 a_0 = 0$, $(1 - m^2)a_1 = 0$, and

\[
(k^2 - m^2)a_k + a_{k-2} = 0, \qquad k \ge 2 \tag{8.5.19}
\]

If $k < m$, then it follows that $a_k = 0$ for each such $k$ by the three preceding equalities. When $k = m$, the coefficient $k^2 - m^2$ of $a_k$ vanishes and thus (8.5.19) becomes an identity, rendering the value of $a_m$ arbitrary. Note further that $a_{m+1} = a_{m+3} = \cdots = 0$ is another consequence of (8.5.19). Thus, $a_m$ can be any constant, and subsequent terms must satisfy the recurrence

\[
a_{k+2} = -\frac{1}{(k + 2)^2 - m^2}\,a_k = -\frac{1}{(k + 2 - m)(k + 2 + m)}\,a_k, \qquad k = m, m + 2, m + 4, \ldots \tag{8.5.20}
\]

Hence, given a non-negative integer $\lambda = m$ and a value for $a_m$, we can determine all of the coefficients of the Taylor expansion of an analytic solution to the Bessel equation. In particular, these coefficients $a_{m+2j}$ for $j \ge 0$ must satisfy the recurrence relation (8.5.20), from which, using $a_m = 1$, we find the closed formula

\[
a_{m+2j} = 2^{-2j}\,\frac{(-1)^j}{j!\,(m + 1)(m + 2)\cdots(m + j)} \tag{8.5.21}
\]

Hence, one solution of Bessel's equation (again, when $\lambda = m$ is a non-negative integer) is

\[
y_1(t) = \sum_{j=0}^{\infty} 2^{-2j}\,\frac{(-1)^j}{j!\,(m + 1)(m + 2)\cdots(m + j)}\,t^{m+2j} \tag{8.5.22}
\]

The Bessel function of the first kind of order $n$ (it is standard to use $n$ rather than $m$ for the order of the Bessel function) is the scalar multiple of $y_1(t)$ given by

\[
J_n(t) = \frac{2^{-n}}{n!}\,y_1(t) = \sum_{j=0}^{\infty} 2^{-2j-n}\,\frac{(-1)^j}{j!\,(n + j)!}\,t^{n+2j} \tag{8.5.23}
\]

For example, the first two Bessel functions are

\[
J_0(t) = \sum_{j=0}^{\infty} 2^{-2j}\,\frac{(-1)^j}{j!\,j!}\,t^{2j} \tag{8.5.24}
\]

and

\[
J_1(t) = \sum_{j=0}^{\infty} 2^{-2j-1}\,\frac{(-1)^j}{j!\,(j + 1)!}\,t^{2j+1} \tag{8.5.25}
\]
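Partial sums of the series (8.5.23) give usable numerical values of $J_n(t)$ for moderate $t$. The sketch below is an illustration under stated assumptions: the function name `bessel_j` and the default of 40 terms are arbitrary choices, the factor $2^{-2j-n}t^{n+2j}$ is written as $(t/2)^{n+2j}$, and for large $|t|$ the alternating terms grow before they decay, so more terms (and more care with rounding) would be needed.

```python
from math import factorial


def bessel_j(n, t, terms=40):
    """Partial sum of the series (8.5.23) for the Bessel function J_n(t)."""
    total = 0.0
    for j in range(terms):
        total += (-1) ** j * (t / 2) ** (n + 2 * j) / (factorial(j) * factorial(n + j))
    return total


if __name__ == "__main__":
    # Sample a few points to see the damped oscillation of J_0
    for t in (0.0, 1.0, 2.0, 5.0, 10.0):
        print(t, bessel_j(0, t), bessel_j(1, t))
```

Evaluating `bessel_j(0, t)` near $t \approx 2.405$ returns a value close to zero, consistent with the first positive zero of $J_0$.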

The graph of $J_0(t)$ in figure 8.4 shows that the Bessel function exhibits damped oscillation.

Figure 8.4 The Bessel function of the first kind, $J_0(t)$.

In this section, through the Hermite, Laguerre, and Bessel equations, we have encountered examples not only of three important DEs, but also of the various types of important functions that arise as solutions to these equations. Hermite polynomials, Laguerre polynomials, and Bessel functions are often studied in courses on special functions and demonstrate a wide range of interesting properties that mathematicians, engineers, and physicists have studied.

Exercises 8.5

1. Determine the degree 4 and 5 Hermite polynomials, $H_4(t)$ and $H_5(t)$.

In exercises 2–4, find the first three nonzero terms in the Taylor series representation of the general solution to the given Hermite equation.

2. $y'' - 2ty' + 6y = 0$
3. $y'' - 2ty' + 10y = 0$
4. $y'' - 2ty' + 4y = 0$

In exercises 5–7, find the first three nonzero terms in the Taylor series representation of the solution to the given IVP.

5. $y'' - 2ty' + 6y = 0$, $y(0) = 2$, $y'(0) = 10$
6. $y'' - 2ty' + 10y = 0$, $y(0) = 1$, $y'(0) = 0$
7. $y'' - 2ty' + 4y = 8t$, $y(0) = 1$, $y'(0) = 0$

8. Determine the degree 5 and 6 Laguerre polynomials, $L_5(t)$ and $L_6(t)$.

Given that a general solution of Laguerre's equation is $c_1 L_q(t) + c_2 u_2(t)$, where $u_2(t)$ is singular at the origin, in exercises 9–11, determine the solution to the given IVP.

9. $ty'' + (1 - t)y' + 3y = 0$, $y(0) = \text{finite}$, $y(1) = 1$
10. $ty'' + (1 - t)y' + 4y = 0$, $y(0) = \text{finite}$, $y(2) = 2$
11. $ty'' + (1 - t)y' + 4y = 3t$, $y(0) = \text{finite}$, $y(1) = 4$

12. Determine the first five nonzero terms in the series expansion of $J_2(t)$ about $t = 0$. In addition, state the form of $J_2(t)$ in sigma notation.

It can be shown that a second linearly independent solution to the Bessel equation when $\lambda = n$ (called the Bessel function of the second kind of order $n$) is given by

\[
Y_n(t) = \frac{2}{\pi}\,J_n(t)\left[\ln\frac{t}{2} + \gamma\right] + R(t) + u(t)
\]

where $R(t)$ is a rational function, $\gamma \approx 0.577215665$ is Euler's constant, and $u(t)$ is a power series convergent for all $t$. Note that $Y_n(t)$ is singular at the origin. In exercises 13–15, determine the general solution to the given equation.

13. $t^2 y'' + ty' + (t^2 - 4)y = 0$
14. $t^2 y'' + ty' + (t^2 - 9)y = 0$
15. $t^2 y'' + ty' + (t^2 - 16)y = 0$

In exercises 16–18, determine the solution to the given IVP.

16. $t^2 y'' + ty' + (t^2 - 4)y = 0$, $y(0) = \text{finite}$, $y(1) = 1$
17. $t^2 y'' + ty' + (t^2 - 9)y = 0$, $y(0) = \text{finite}$, $y(1) = -3$
18. $t^2 y'' + ty' + (t^2 - 16)y = 0$, $y(0) = \text{finite}$, $y(1) = 2$

8.6 The Method of Frobenius

Some second-order linear DEs that appear in physical applications do not have two linearly independent analytic solutions about $t = 0$. Perhaps the most important and well-studied example is the Bessel equation (8.5.17). A somewhat simpler example is

\[
t^2 y'' + \frac{3}{2}ty' - \frac{1}{2}y = 0 \tag{8.6.1}
\]

which is a Cauchy–Euler equation (on which more information can be found in section 4.7.3). It is a straightforward exercise to show that for all $t > 0$, $y_1(t) = t^{-1}$ and $y_2(t) = \sqrt{t}$ are linearly independent solutions of (8.6.1). Note that neither $y_1$ nor $y_2$ has a derivative at the origin, and therefore neither is analytic at $t = 0$; thus, each lacks a Taylor series expansion at the origin.

F. Georg Frobenius (1849–1917) showed that solutions of a certain class of linear second-order DEs with a singular point at the origin can be represented in series form by a slight generalization of a Taylor series. In particular, he showed that these series solutions have the form

\[
y = t^r \sum_{k=0}^{\infty} b_k t^k = \sum_{k=0}^{\infty} b_k t^{k+r} \tag{8.6.2}
\]

where $r$ is a real number and $\sum_{k=0}^{\infty} b_k t^k$ converges in some open interval containing the origin. The series (8.6.2) is called a Frobenius series, and the following method we will discuss for obtaining $r$ and the coefficients $b_k$ is known as the Method of Frobenius.


The Cauchy–Euler equation and the Bessel equation both belong to this class of equations that can be solved by the Method of Frobenius. In what follows, we focus particularly on equations of the form

\[
t^2 y'' + t\,p(t)\,y' + q(t)\,y = 0 \tag{8.6.3}
\]

where $p(t)$ and $q(t)$ are low-degree polynomials. Note that $p$ and $q$ are analytic at the origin, and therefore each has a convergent Taylor series there. Any linear second-order DE with this property is said to have a regular singular point at the origin. The Method of Frobenius applies to all such equations. Finally, observe that if $p(t)$ and $q(t)$ are constant polynomials, then (8.6.3) reduces to a Cauchy–Euler equation. To begin, we suppose that there is a solution of (8.6.3) that has a series expansion of the form

\[
y = \sum_{k=0}^{\infty} b_k t^{k+r} \tag{8.6.4}
\]

where $b_0 \neq 0$ and $\sum_{k=0}^{\infty} b_k t^k$ converges in $0 < |t| < R$. From this, it follows that

\[
y' = \sum_{k=0}^{\infty} (k + r)b_k t^{k+r-1} \tag{8.6.5}
\]

and

\[
y'' = \sum_{k=0}^{\infty} (k + r)(k + r - 1)b_k t^{k+r-2} \tag{8.6.6}
\]

Furthermore, we suppose that $p(t)$ and $q(t)$ have the expansions

\[
\begin{aligned}
p(t) &= p_0 + p_1 t + p_2 t^2 + \cdots + p_n t^n + \cdots \\
q(t) &= q_0 + q_1 t + q_2 t^2 + \cdots + q_n t^n + \cdots
\end{aligned}
\]

Substituting these expressions for $y$, $y'$, $y''$, $p$, and $q$ in (8.6.3) and gathering like terms, we find that

\[
\begin{aligned}
0 &= t^2 y'' + t\,p(t)\,y' + q(t)\,y \\
&= \sum_{k=0}^{\infty} (k + r)(k + r - 1)b_k t^{k+r}
 + (p_0 + p_1 t + p_2 t^2 + \cdots)\sum_{k=0}^{\infty} (k + r)b_k t^{k+r}
 + (q_0 + q_1 t + q_2 t^2 + \cdots)\sum_{k=0}^{\infty} b_k t^{k+r} \\
&= t^r\left[\bigl(r(r - 1) + p_0 r + q_0\bigr)b_0 + c_1 t + c_2 t^2 + \cdots\right]
\end{aligned} \tag{8.6.7}
\]

where, for each $n \ge 1$, the general term $c_n$ depends on $n$ and on the coefficients $b_0, b_1, \ldots, b_n$. A general formula for $c_n$ turns out to be complicated and not particularly useful for the examples we wish to study, so we choose not to derive such a formula.


The most important conclusion to draw from (8.6.7) comes from the fact that each coefficient of the general power series expansion must be zero, so that since $b_0 \neq 0$,

\[
r(r - 1) + p_0 r + q_0 = 0 \tag{8.6.8}
\]

Equation (8.6.8) is called the indicial equation for the Method of Frobenius. Note that this equation is quadratic in $r$; its two roots are the values of $r$ that are used in (8.6.2). At this point, it is useful for us to turn our attention to two specific examples of the Method of Frobenius at work.

Example 8.6.1 Find a Frobenius series solution for the Bessel–Clifford equation

\[
t^2 y'' + (1 - a)ty' + ty = 0 \tag{8.6.9}
\]

where $a$ is a constant.

Solution. With $a$ being a constant, we have $p(t) = 1 - a$, so in the series expansion for $p$, $p_0 = 1 - a$. Moreover, $q(t) = t$, so $q_0 = 0$. Thus, for the given DE the indicial equation is

\[
r(r - 1) + (1 - a)r = 0
\]

Rearranging, we see that $r(r - 1 + 1 - a) = r(r - a) = 0$, and thus the roots of the indicial equation are $r = 0$ and $r = a$. In the case that $r = 0$, the Method of Frobenius provides an analytic solution to (8.6.9) of the form

\[
y_1 = \sum_{k=0}^{\infty} b_k t^k
\]

Dividing both sides of (8.6.9) by $t$ and substituting this expression for $y$ using the standard series methods we have already discussed, it follows that

\[
\sum_{k=0}^{\infty} \left[(k + 1)(k + 1 - a)b_{k+1} + b_k\right] t^k = 0
\]

from which we obtain the recurrence relation

\[
b_{k+1} = \frac{-1}{(k + 1)(k + 1 - a)}\,b_k \tag{8.6.10}
\]

It follows from (8.6.10) that the closed form expression for $b_k$ is

\[
b_k = \frac{(-1)^k}{k!\,(1 - a)(2 - a)\cdots(k - a)}\,b_0, \qquad k \ge 1
\]

so we find that

\[
y_1(t) = b_0\left(1 + \sum_{k=1}^{\infty} \frac{(-1)^k}{k!\,(1 - a)(2 - a)\cdots(k - a)}\,t^k\right) \tag{8.6.11}
\]

which is valid for all $t$ provided that $a \neq 1, 2, \ldots$. Note that from this recurrence relation, every $b_n$ is a multiple of $b_0$, and thus there cannot be two linearly independent solutions to the Bessel–Clifford equation that are analytic at 0. Indeed, every solution linearly independent of $y_1(t)$ must be singular at 0. And while the equation has a singular point at the origin, there is an analytic solution there for every $a$ except when $a$ is a positive integer. We now turn to the other root of the indicial equation in search of a second solution to the Bessel–Clifford equation. Using $r = a$, we have

\[
\begin{aligned}
ty(t) &= \sum_{k=0}^{\infty} b_k t^{k+a+1} \\
(1 - a)ty'(t) &= \sum_{k=0}^{\infty} (1 - a)(k + a)b_k t^{k+a} \\
t^2 y''(t) &= \sum_{k=0}^{\infty} (k + a)(k + a - 1)b_k t^{k+a}
\end{aligned}
\]

Adding these equations forms the left side of the differential equation we aspire to solve; doing so and simplifying, we find that

\[
0 = t^2 y''(t) + (1 - a)ty'(t) + ty(t) = \sum_{k=0}^{\infty} k(k + a)b_k t^{k+a} + \sum_{k=0}^{\infty} b_k t^{k+a+1}
\]

Since the first term in the first sum is zero, if we adjust the index of the summation in the second sum and combine, we have

\[
\sum_{k=1}^{\infty} \left[k(k + a)b_k + b_{k-1}\right] t^{k+a} = 0
\]

from which it follows that

\[
k(k + a)b_k + b_{k-1} = 0, \qquad k \ge 1
\]

This standard recurrence relation can be solved to write every $b_k$ in terms of $b_0$. Indeed, we see

\[
b_k = \frac{(-1)^k}{k!\,(1 + a)(2 + a)\cdots(k + a)}\,b_0, \qquad k \ge 1
\]

so that the Frobenius series representation of the solution is

\[
y_2(t) = b_0 t^a\left(1 + \sum_{k=1}^{\infty} \frac{(-1)^k}{k!\,(1 + a)(2 + a)\cdots(k + a)}\,t^k\right) \tag{8.6.12}
\]
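The closed form for the coefficients in (8.6.12) can be checked against the recurrence $k(k + a)b_k + b_{k-1} = 0$ for a sample value of $a$. The sketch below is illustrative only: the function names are hypothetical, $b_0 = 1$ is assumed, and $a$ is assumed to be an `int` or `Fraction` that is not a negative integer, so that no factor $k + a$ vanishes.

```python
from fractions import Fraction
from math import factorial


def frobenius_coefficients(a, count):
    """b_0, ..., b_{count-1} for the root r = a, with b_0 = 1.

    Iterates k (k + a) b_k + b_{k-1} = 0 for k >= 1.
    """
    b = [Fraction(1)]
    for k in range(1, count):
        b.append(-b[k - 1] / (k * (k + a)))
    return b


def closed_form(a, k):
    """(-1)^k / (k! (1 + a)(2 + a)...(k + a)), the coefficient in (8.6.12)."""
    product = Fraction(1)
    for i in range(1, k + 1):
        product *= i + a
    return Fraction((-1) ** k) / (factorial(k) * product)


if __name__ == "__main__":
    a = Fraction(1, 2)  # sample non-integer value of the parameter a
    coeffs = frobenius_coefficients(a, 6)
    print([str(c) for c in coeffs])
    print(all(coeffs[k] == closed_form(a, k) for k in range(6)))
```

The same comparison with $a$ replaced by $-a$ corresponds to the coefficients of $y_1$ in (8.6.11), provided $a$ is not a positive integer.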

We close this example with a few important observations. First, if $a = 0$, then the Frobenius solution $y_2(t)$ is identical to the earlier obtained $y_1(t)$. Moreover, if $a$ is a non-negative integer, then the Method of Frobenius produces a Taylor series expansion that is analytic at $t = 0$. Thus, the cases for a valid analytic solution excluded by our approach in finding $y_1(t)$ are here reconciled. Finally, if $a$ is not an integer, then $y_2(t)$ is singular at $t = 0$ and, together with the analytic $y_1(t)$ given by (8.6.11), we have found a linearly independent set of solutions for the Bessel–Clifford equation valid for $t > 0$.

To complete this section, we consider a second example.

Example 8.6.2 Find a Frobenius series solution of Bessel's equation,

\[
t^2 y'' + ty' + (t^2 - \lambda^2)y = 0 \tag{8.6.13}
\]

Solution. In section 8.5.3, we derived a solution to (8.6.13) in the case where $\lambda$ is an integer. Thus, in what follows we assume that $\lambda > 0$ is not an integer. Since $p(t) = 1$ and $q(t) = -\lambda^2 + t^2$, we have $p_0 = 1$ and $q_0 = -\lambda^2$, which tells us that the indicial equation is

\[
r(r - 1) + r - \lambda^2 = r^2 - \lambda^2 = 0
\]

Thus, $r = \pm\lambda$. Choosing $r = \lambda$ and using (8.6.4), (8.6.5), and (8.6.6), we find that the three relevant series for the differential equation (8.6.13) are

\[
\begin{aligned}
(t^2 - \lambda^2)y(t) &= \sum_{k=0}^{\infty} b_k t^{k+\lambda+2} - \sum_{k=0}^{\infty} \lambda^2 b_k t^{k+\lambda} \\
ty'(t) &= \sum_{k=0}^{\infty} (k + \lambda)b_k t^{k+\lambda} \\
t^2 y''(t) &= \sum_{k=0}^{\infty} (k + \lambda)(k + \lambda - 1)b_k t^{k+\lambda}
\end{aligned}
\]

From the form of Bessel's equation, the sum of these three expressions vanishes; adding and simplifying with $(k + \lambda)(k + \lambda - 1) + (k + \lambda) - \lambda^2 = k(k + 2\lambda)$, we observe that

\[
\sum_{k=0}^{\infty} k(k + 2\lambda)b_k t^{k+\lambda} + \sum_{k=0}^{\infty} b_k t^{k+\lambda+2} = 0
\]

To combine the sums, we step up the index in the second summation by 2 and find

\[
(1 + 2\lambda)b_1 t^{1+\lambda} + \sum_{k=2}^{\infty} \left[k(k + 2\lambda)b_k + b_{k-2}\right] t^{k+\lambda} = 0
\]

So, $(1 + 2\lambda)b_1 = 0$, and

\[
k(k + 2\lambda)b_k + b_{k-2} = 0, \qquad k \ge 2 \tag{8.6.14}
\]

One solution to this recurrence relation is obtained by setting $b_0 = 1$ and $b_1 = 0$. Then, since we are assuming that $\lambda$ is not an integer and $b_1 = 0$, (8.6.14) implies that all odd-subscripted coefficients are zero and that
\[ b_k = \frac{-1}{k(2\lambda + k)}\, b_{k-2}, \quad k = 2, 4, \ldots \]
Therefore, it follows that in closed form we have
\[ b_{2k} = \frac{(-1)^k 2^{-2k}}{k!\,(1 + \lambda)(2 + \lambda) \cdots (k + \lambda)} \]
and thus a Frobenius solution to the Bessel equation is
\[ y(t) = t^{\lambda} + \sum_{k=1}^{\infty} \frac{(-1)^k 2^{-2k}}{k!\,(1 + \lambda)(2 + \lambda) \cdots (k + \lambda)}\, t^{2k+\lambda} \]
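The series can be checked numerically in a case where a closed form is known. For $\lambda = 1/2$, the Frobenius solution normalized by $b_0 = 1$ reduces to $\sin t / \sqrt{t}$; this identity is a standard fact that is not derived in the text, and the function name below is a hypothetical choice. The sketch builds the coefficients directly from the recurrence (8.6.14).

```python
import math


def frobenius_bessel(t, lam, n_terms=30):
    """Partial sum of y(t) = t^lam * sum_k b_{2k} t^{2k}, with b_0 = 1."""
    coeff = 1.0  # b_0
    total = 1.0  # running sum of b_{2k} t^{2k}
    for k in range(1, n_terms + 1):
        # (8.6.14) with index 2k: b_{2k} = -b_{2k-2} / (2k (2 lam + 2k))
        coeff *= -1.0 / (2 * k * (2 * lam + 2 * k))
        total += coeff * t ** (2 * k)
    return t ** lam * total


t = 1.3  # illustrative sample point (an assumption)
approx = frobenius_bessel(t, 0.5)
exact = math.sin(t) / math.sqrt(t)  # closed form assumed for lam = 1/2
print(abs(approx - exact) < 1e-12)
```

Because the coefficients decay factorially, a few dozen terms are far more than needed at moderate values of $t$.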

Note that since $\lambda > 0$, the ratio test can be applied to show that this series converges for all values of $t$. A more detailed study of the Method of Frobenius is beyond the scope of this text. (For further discussion, see Potter and Goldberg, Mathematical Methods, second edition, Great Lakes Press 1995.)

Exercises 8.6

In exercises 1–10, find the indicial equation and use the root that either is not an integer or that is the larger integer to find the first three nonzero coefficients in a Frobenius series solution to the given DE.

1. $2t^2 y'' - t y' + (1 + t) y = 0$
2. $2t y'' + y' + t y = 0$
3. $t y'' + (t - 2) y' + y = 0$
4. $2t y'' + (1 + 4t) y' + y = 0$
5. $t^2 y'' - t(t + 5) y' + (t + 5) y = 0$
6. $2t^2 y'' - t y' + (t - 5) y = 0$
7. $4t^2 y'' + 6t y' + (t - 2) y = 0$
8. $2t y'' + (1 - t) y' - y = 0$
9. $t^2 y'' + t y' + (t - 3) y = 0$
10. $3t^2 y'' - t y' - 4y = 0$
11. Find the indicial equation for the Cauchy–Euler equation $t^2 y'' + p t y' + q y = 0$.
12. Show that the roots of the indicial equation are equal for the Laguerre equation $t y'' + (1 - t) y' + q y = 0$.


8.7 For further study

8.7.1 Taylor series for first-order differential equations

Let $y(t) = \sum_{n=0}^{\infty} a_n t^n$ be the Taylor series of a solution of
\[ t y' + \lambda y = f(t) \tag{8.7.1} \]
where $\lambda$ is constant and $f(t) = \sum_{n=0}^{\infty} f_n t^n$.

(a) Show that
\[ y(t) = \sum_{n=0}^{\infty} \frac{f_n}{n + \lambda}\, t^n \]

(b) In terms of the infinite series derived in (a), what is the general solution to (8.7.1)?

(c) Using series expansions appropriately and your work in (a), determine the general solution to each of the following DEs.
(i) $t y' + 2y = e^t$
(ii) $t y' + 3y = \sin t$
(iii) $t y' + 4y = \arctan t$

(d) Show that
\[
\begin{aligned}
\sum_{n=0}^{\infty} \frac{f_n}{n + \lambda}\, t^n
&= t^{-\lambda} \sum_{n=0}^{\infty} \frac{f_n}{n + \lambda}\, t^{n+\lambda}
= t^{-\lambda} \sum_{n=0}^{\infty} f_n \int_0^t x^{n+\lambda-1}\, dx \\
&= t^{-\lambda} \int_0^t x^{\lambda-1} \sum_{n=0}^{\infty} f_n x^n\, dx
= t^{-\lambda} \int_0^t x^{\lambda-1} f(x)\, dx
\end{aligned}
\]

(e) Substitute directly in (8.7.1) to show that
\[ y(t) = t^{-\lambda} \int_0^t x^{\lambda-1} f(x)\, dx \]
is indeed a solution.

(f) Solve (8.7.1) by use of an integrating factor (see section 2.3) and compare your result to $y(t)$ as given in (e).

8.7.2 The Gamma function

The Gamma function $\Gamma(x)$, like Bessel functions and families of orthogonal polynomials, is a special function that plays an important role in many areas of mathematics. The Gamma function is defined by
\[ \Gamma(s + 1) = \int_0^{\infty} e^{-t} t^s\, dt, \quad s > -1 \tag{8.7.2} \]


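(a) Show that $\Gamma(1) = 1$.

(b) Use integration by parts to show that $\Gamma(s + 1) = s\,\Gamma(s)$.

(c) Show that if $s$ is a positive integer, then $\Gamma(s + 1) = s!$.

(d) Let $r > 0$ be given and recall that $\mathcal{L}[t^r] = \int_0^{\infty} e^{-st} t^r\, dt$. Hence show that
\[ \mathcal{L}[t^{r-1}] = \frac{\Gamma(r)}{s^r} \]

(e) Show that
\[ \Gamma\!\left(\tfrac{1}{2}\right) = 2 \int_0^{\infty} e^{-x^2}\, dx = \int_{-\infty}^{\infty} e^{-x^2}\, dx = \sqrt{\pi} \]

(f) Use (b) to show that
\[ h^n\, \frac{\Gamma(n + x/h)}{\Gamma(x/h)} = x(x + h)(x + 2h) \cdots (x + (n - 1)h) \]
Hence, show that
\[ 1 \cdot 3 \cdot 5 \cdots (2n - 1) = 2^n\, \frac{\Gamma(n + 1/2)}{\Gamma(1/2)} = \frac{2^n}{\sqrt{\pi}}\, \Gamma(n + 1/2) \]

(g) Finally, explain why $1 \cdot 3 \cdot 5 \cdots (2n - 1) = (2n)!/(2^n n!)$ and therefore show
\[ \Gamma\!\left(n + \tfrac{1}{2}\right) = \frac{(2n)!}{2^{2n}\, n!}\, \sqrt{\pi} \]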
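The identities in parts (f) and (g) can be spot-checked numerically before attempting the proofs. The sketch below relies on `math.gamma` from the Python standard library; the helper names are hypothetical, and the check is not a substitute for the arguments the exercise asks for.

```python
import math


def odd_product(n):
    """Return 1 * 3 * 5 * ... * (2n - 1); the empty product (n = 0) is 1."""
    result = 1
    for j in range(1, n + 1):
        result *= 2 * j - 1
    return result


def gamma_half_integer(n):
    """Right-hand side of part (g): (2n)! sqrt(pi) / (2^(2n) n!)."""
    return math.factorial(2 * n) * math.sqrt(math.pi) / (4 ** n * math.factorial(n))


for n in range(6):
    # Part (g): compare the library value of Gamma(n + 1/2) with the formula.
    gamma_ok = math.isclose(math.gamma(n + 0.5), gamma_half_integer(n), rel_tol=1e-9)
    # First claim in (g): exact integer identity for the product of odd numbers.
    product_ok = odd_product(n) == math.factorial(2 * n) // (2 ** n * math.factorial(n))
    print(n, gamma_ok, product_ok)
```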

A Review of integration techniques

Several standard solution techniques for differential equations require us to integrate functions. Here we briefly review some fundamentals from calculus.

u-substitution

For integrals of the form
\[ \int f(g(t))\, g'(t)\, dt \]
we can evaluate the integral by undoing the chain rule through a change of variables. Letting $u = g(t)$, it follows $du = g'(t)\, dt$, and thus
\[ \int f(g(t))\, g'(t)\, dt = \int f(u)\, du \]
If we can evaluate the new, simpler integral in $u$, all that remains is to substitute back to the variable $t$. For instance, to evaluate $\int t \sin t^2\, dt$ we let $u = t^2$ and $du = 2t\, dt$. We note that $t\, dt = \frac{1}{2}\, du$. Thus, substituting for $t^2$ and $t\, dt$, we find that the given integral is equivalently
\[ \int \frac{1}{2} \sin u\, du \]
Evaluating the integral in $u$ and substituting back to $t$,
\[ \int t \sin t^2\, dt = \int \frac{1}{2} \sin u\, du = -\frac{1}{2} \cos u + C = -\frac{1}{2} \cos t^2 + C \]
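One way to check an antiderivative found by substitution is to differentiate it numerically and compare with the integrand. The sketch below does this for $-\frac{1}{2}\cos t^2$ using a central difference; the step size and sample points are arbitrary assumptions.

```python
import math


def integrand(t):
    return t * math.sin(t ** 2)


def antiderivative(t):
    # Result of the u-substitution, with the constant C taken to be 0.
    return -0.5 * math.cos(t ** 2)


def central_difference(func, t, h=1e-5):
    """Approximate func'(t) by a symmetric difference quotient."""
    return (func(t + h) - func(t - h)) / (2 * h)


sample_points = [0.0, 0.5, 1.0, 1.7, 2.4]
max_error = max(
    abs(central_difference(antiderivative, t) - integrand(t))
    for t in sample_points
)
print(max_error < 1e-7)
```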


Overall, u-substitution is particularly relevant for working with composite functions. In attempting to use u-substitution, we should search the integrand for an inside function, and then hope that its derivative (up to a constant multiple) is present outside the composite function.

Examples for further practice:

1. $\int t e^{-t^2}\, dt$
2. $\int t^{21} (4t^{22} - 13)^{20}\, dt$
3. $\int 6 e^{1/t} \cdot t^{-2}\, dt$
4. $\int \dfrac{\sin t}{1 + \cos^2 t}\, dt$
5. $\int (\sin t)^3\, dt$ Hint: $\sin^2 t = 1 - \cos^2 t$.

Integration by parts

As u-substitution is used to undo the chain rule, integration by parts undoes the product rule. It is particularly applicable to integrals that involve products of basic functions such as $\int t e^t\, dt$. Recall that the product rule states
\[ \frac{d}{dt}\left[ u(t) v(t) \right] = u(t) v'(t) + v(t) u'(t) \tag{A.1} \]
Integrating both sides of (A.1), it follows that
\[ u(t) v(t) = \int u(t) v'(t)\, dt + \int v(t) u'(t)\, dt \tag{A.2} \]
Solving for $\int u(t) v'(t)\, dt$, we have
\[ \int u(t) v'(t)\, dt = u(t) v(t) - \int v(t) u'(t)\, dt \tag{A.3} \]
Writing $dv = v'(t)\, dt$ and $du = u'(t)\, dt$ and suppressing the presence of $t$, we see in (A.3) the standard statement of the integration by parts rule:
\[ \int u\, dv = uv - \int v\, du \tag{A.4} \]
For example, let's evaluate $\int t e^t\, dt$. Letting $u = t$ and $dv = e^t\, dt$, we observe that $du = dt$ and $v = e^t$. Thus, integrating by parts,
\[ \int t e^t\, dt = t e^t - \int e^t\, dt = t e^t - e^t + C \]
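The antiderivative $t e^t - e^t$ can be checked against a numerical quadrature on a definite integral. On $[0, 1]$ the formula gives $(e - e) - (0 - 1) = 1$. The sketch below uses composite Simpson's rule, which is a choice made here for illustration and is not a method discussed in this appendix.

```python
import math


def simpson(func, a, b, n=200):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    if n % 2 != 0:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        # Interior nodes alternate between weight 4 (odd i) and 2 (even i).
        total += (4 if i % 2 == 1 else 2) * func(a + i * h)
    return total * h / 3


def antiderivative(t):
    # Result of integrating by parts, with the constant C taken to be 0.
    return t * math.exp(t) - math.exp(t)


numeric = simpson(lambda t: t * math.exp(t), 0.0, 1.0)
by_parts = antiderivative(1.0) - antiderivative(0.0)
print(abs(numeric - by_parts) < 1e-9)
```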


A good way to think of integration by parts is to view it as integrating the product $u\, dv$ by trading $u$ for its derivative and trading $dv$ for its antiderivative. In particular, once we have decided to use integration by parts, we must make appropriate choices for $u$ and $dv$. One guideline is that $dv$ should be fairly easy to antidifferentiate; another is that the derivative of $u$ should not be significantly more complicated than $u$ itself. Overall, we generally want the integral of $v\, du$ to be somehow simpler (or at least not more complicated) than the integral of $u\, dv$.

Examples for further practice:

1. $\int t^4 \ln t\, dt$
2. $\int 5t \sin t\, dt$
3. $\int 3t e^{2t}\, dt$
4. $\int t \sqrt{7t + 5}\, dt$
5. $\int \ln t\, dt$ Hint: Try $dv = 1\, dt$.
6. $\int t^2 e^t\, dt$
7. $\int e^t \cos t\, dt$

Partial fractions

A remarkable fact is that any rational function (that is, any quotient of two polynomials) may be integrated. The standard method for approaching an integration problem of the form
\[ \int \frac{p(t)}{q(t)}\, dt \]
is the technique known as partial fractions. It is necessary to assume (or apply long division so) that the degree of $p$ is less than the degree of $q$. While partial fractions is an important technique for integration, it is also a useful tool in its own right. For example, we frequently use it when working with the Laplace transform; see sections 5.5 and 5.6. The method is best understood through a sequence of examples.

Example A.1 Evaluate the integral
\[ \int \frac{t}{t^2 + 5t + 6}\, dt \tag{A.5} \]


Solution. Factoring the denominator of the integrand, we can write
\[ \frac{t}{t^2 + 5t + 6} = \frac{t}{(t + 2)(t + 3)} \tag{A.6} \]
If we view the righthand side as the result of adding two simpler fractions, we can make the reasonable assumption that two fractions of the form $A/(t + 2)$ and $B/(t + 3)$ had to be combined by getting a common denominator to form (A.6). Thus we assume
\[ \frac{t}{(t + 2)(t + 3)} = \frac{A}{t + 2} + \frac{B}{t + 3} \tag{A.7} \]
and seek values of $A$ and $B$ which make this relationship hold for all values of $t$. Multiplying both sides of (A.7) by $(t + 2)(t + 3)$, we find
\[ t = A(t + 3) + B(t + 2) \tag{A.8} \]
Since (A.8) must be valid for every value of $t$, we can choose $t$-values that make it especially easy to identify $A$ and $B$. Choosing $t = -2$, we see that $-2 = A(-2 + 3) = A$. Choosing $t = -3$, it follows $-3 = B(-3 + 2)$, so $B = 3$. Thus, we have determined
\[ \frac{t}{(t + 2)(t + 3)} = -\frac{2}{t + 2} + \frac{3}{t + 3} \tag{A.9} \]
Having completed the partial fraction decomposition, we can now integrate. In particular,
\[ \int \frac{t}{t^2 + 5t + 6}\, dt = \int \left( -\frac{2}{t + 2} + \frac{3}{t + 3} \right) dt = -2 \ln|t + 2| + 3 \ln|t + 3| + C \]
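The choice of convenient $t$-values in Example A.1 can be automated when the denominator is a product of distinct linear factors: evaluating at a root $r_i$ kills every term except the one carrying $A_i$, so $A_i = p(r_i) / \prod_{j \ne i} (r_i - r_j)$. The sketch below implements that observation in exact arithmetic; the function name is hypothetical, and the formula is a restatement of the evaluation trick rather than something stated in the text.

```python
from fractions import Fraction


def distinct_linear_partial_fractions(numerator, roots):
    """Return [A_1, ..., A_n] with p(t)/prod(t - r_i) = sum A_i/(t - r_i).

    `numerator` is a callable p(t) of degree less than len(roots);
    `roots` are the distinct roots r_i of the monic denominator.
    """
    coefficients = []
    for i, r_i in enumerate(roots):
        # Product of (r_i - r_j) over all other roots r_j.
        denom = Fraction(1)
        for j, r_j in enumerate(roots):
            if j != i:
                denom *= (r_i - r_j)
        coefficients.append(Fraction(numerator(r_i)) / denom)
    return coefficients


# Example A.1: t / ((t + 2)(t + 3)) has denominator roots -2 and -3.
A, B = distinct_linear_partial_fractions(lambda t: t, [Fraction(-2), Fraction(-3)])
print(A, B)

# Spot-check the decomposition (A.9) at a point away from the roots.
t = Fraction(5)
print(t / ((t + 2) * (t + 3)) == A / (t + 2) + B / (t + 3))
```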

The approach of example A.1 works any time the denominator $q(t)$ can be written as a product of distinct linear terms. That is, if $q(t) = (t - r_1)(t - r_2) \cdots (t - r_n)$, then we can write
\[ \frac{p(t)}{q(t)} = \frac{A_1}{t - r_1} + \frac{A_2}{t - r_2} + \cdots + \frac{A_n}{t - r_n} \]
and use algebra similar to our work above to determine $A_1, \ldots, A_n$.

Example A.2 Evaluate the integral
\[ \int \frac{t^2 - 4}{t^3 + t^2}\, dt \]

Solution. Factoring the denominator of the integrand, we have
\[ \frac{t^2 - 4}{t^3 + t^2} = \frac{t^2 - 4}{t^2 (t + 1)} \]


If we think of the possible simpler fractions from which the given one can arise, we see that it is possible for terms of the form
\[ \frac{A}{t}, \quad \frac{B}{t^2}, \quad \text{and} \quad \frac{C}{t + 1} \]
to be present. In particular, we must include $A/t$ since this denominator is included in the necessary $B/t^2$. Thus we write
\[ \frac{t^2 - 4}{t^2 (t + 1)} = \frac{A}{t} + \frac{B}{t^2} + \frac{C}{t + 1} \tag{A.10} \]
Multiplying both sides of (A.10) by the least common denominator $t^2 (t + 1)$, we find
\[ t^2 - 4 = At(t + 1) + B(t + 1) + Ct^2 \]
Setting $t = 0$ implies $-4 = B$; using $t = -1$ shows $-3 = C$. To find $A$, we may use any other value of $t$, along with the established values of $B$ and $C$. With $t = 1$,
\[ -3 = A(1)(2) + (-4)(2) + (-3)(1)^2 \]
and therefore $A = 4$. We now apply the partial fractions decomposition and integrate:
\[ \int \frac{t^2 - 4}{t^3 + t^2}\, dt = \int \left( \frac{4}{t} - \frac{4}{t^2} - \frac{3}{t + 1} \right) dt = 4 \ln|t| + 4t^{-1} - 3 \ln|t + 1| + C \]
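The same three sample values $t = 0, -1, 1$ can be fed to a small linear solver, which treats the cleared-denominator identity as three equations in the unknowns $A$, $B$, $C$. The sketch below uses exact Gauss-Jordan elimination; the solver is a generic helper written for this illustration and assumes the sampled system is nonsingular.

```python
from fractions import Fraction


def solve_linear_system(matrix, rhs):
    """Solve a small square system exactly by Gauss-Jordan elimination.

    Assumes the system is nonsingular; a singular system raises StopIteration.
    """
    n = len(rhs)
    aug = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry and swap it up.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        # Clear the current column in every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


# t^2 - 4 = A t (t + 1) + B (t + 1) + C t^2, sampled at three t-values.
samples = [0, -1, 1]
matrix = [[t * (t + 1), t + 1, t * t] for t in samples]
rhs = [t * t - 4 for t in samples]
A, B, C = solve_linear_system(matrix, rhs)
print(A, B, C)
```

Any three distinct sample values that give a nonsingular system would serve; the values used in the example simply make two of the equations trivial.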

In any rational function where the denominator contains a repeated factor, we use a similar form of partial fraction decomposition. For instance,
\[ \frac{t^3 - 2t + 1}{(t + 4)^3 (t - 2)^2 (t - 5)} = \frac{A}{t + 4} + \frac{B}{(t + 4)^2} + \frac{C}{(t + 4)^3} + \frac{D}{t - 2} + \frac{E}{(t - 2)^2} + \frac{F}{t - 5} \]
so that each repeated factor is represented once for each possible order up to the highest power.

Example A.3 Evaluate the integral
\[ \int \frac{t - 5}{t^3 + t}\, dt \]

Solution. When we factor the integrand, we observe that a quadratic