22  Theorems & Applications

Highlights of this Chapter: we study the relationship between the behavior of a function and its derivative, proving several foundational results in the theory of differentiable functions:

  • Fermat’s Theorem: A differentiable function has derivative zero at an extremum.
  • Rolle’s Theorem: if a differentiable function takes equal values at two points, it must have zero derivative at some point in-between.
  • The Mean Value Theorem: the average slope of a differentiable function on an interval is realized as the instantaneous slope at some point inside that interval.

The Mean Value Theorem is really the star of the show, and we follow it with several important applications.

That the derivative (rate of change) should be able to detect local extrema is an old idea, even predating the calculus of Newton and Leibniz. Though certainly realized earlier in certain cases, it is Fermat who is credited with the first general theorem (so, the result below is often called Fermat’s theorem). We will have more to say about extrema later in the chapter, but this theorem is so useful we prove it first, so it’s available for our use throughout.

Theorem 22.1 (Finding Local Extrema (Fermat’s Theorem)) Let f be a function with a local extremum at m. Then if f is differentiable at m, we must have $f'(m)=0$.

Proof. Without loss of generality we will assume that m is the location of a local minimum (the same argument applies for local maxima, except the inequalities in the numerators reverse). As f is differentiable at m, we know that both the right and left hand limits of the difference quotient exist, and are equal.

First, some preliminaries that apply to both right and left limits. Since we know the limit exists, its value can be computed via any appropriate sequence $x_n\to m$. Choosing some such sequence we investigate the difference quotient

$$\frac{f(x_n)-f(m)}{x_n-m}$$

Because m is a local minimum, there is some interval (say, of radius $\epsilon$) about m where $f(x)\geq f(m)$. As $x_n\to m$, we know the sequence eventually enters this interval (by the definition of convergence), thus for all sufficiently large n we know $f(x_n)-f(m)\geq 0$.

Now, we separate out the limits from above and below, starting with $\lim_{x\to m^-}$. If $x_n\to m$ but $x_n<m$, then we know $x_n-m$ is negative for all n, and so

$$\frac{f(x_n)-f(m)}{x_n-m}=\frac{\mathrm{pos}}{\mathrm{neg}}=\mathrm{neg}$$

Thus, for all n the difference quotient is $\leq 0$, and so the limit must be as well! That is, $$\lim_{x\to m^-}\frac{f(x)-f(m)}{x-m}\leq 0$$

Performing the analogous investigation for the limit from above, we now have a sequence $x_n\to m$ with $x_n>m$. This changes the sign of the denominator, so

$$\frac{f(x_n)-f(m)}{x_n-m}=\frac{\mathrm{pos}}{\mathrm{pos}}=\mathrm{pos}$$

Again, since the difference quotient is $\geq 0$ for all n, we know the same is true of the limit.

$$\lim_{x\to m^+}\frac{f(x)-f(m)}{x-m}\geq 0$$

But, by our assumption that f is differentiable at m we know both of these must be equal! And if one is $\leq 0$ and the other $\geq 0$, the only possibility is that $f'(m)=0$.

22.1 Mean Values

One of the most important theorems relating f and $f'$ is the mean value theorem. This is an excellent example of a theorem that is intuitively obvious (from our experience with reasonable functions) but yet requires careful proof (as we know by now, many functions have non-intuitive behavior). Indeed, when I teach Calculus I, I often paraphrase the mean value theorem as follows:

If you drove 60 miles in one hour, then at some point you must have been driving 60 miles per hour

How can we write this mathematically? Say you drove D miles in T hours. If f(t) is your position as a function of time, and you were driving between $t=a$ and $t=b$ (where $b-a=T$), your average speed was

$$\frac{D}{T}=\frac{f(b)-f(a)}{b-a}$$

To then say *at some point you were going $D/T$ miles per hour* implies that there exists some t between a and b where the instantaneous rate of change - the derivative - is equal to this value. This is exactly the Mean Value Theorem:

Theorem 22.2 (The Mean Value Theorem) If f is a function which is continuous on the closed interval $[a,b]$ and differentiable on the open interval $(a,b)$, then there exists some $x\in(a,b)$ where $$f'(x)=\frac{f(b)-f(a)}{b-a}$$

Note: The reason we require differentiability only on the interior of the interval is that the two-sided limit defining the derivative may not exist at the endpoints (if, for example, the domain of f is only $[a,b]$).
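Before proving the theorem, it can be reassuring to locate an MVT point numerically. Here is a minimal sketch using an example function of our own choosing (not from the text), $f(x)=x^3$ on $[0,1]$: since $f'$ is continuous and $f'(x)-\frac{f(b)-f(a)}{b-a}$ changes sign on the interval, bisection finds a point where the instantaneous slope equals the average slope.

```python
# Locating a point guaranteed by the Mean Value Theorem, numerically.
# Example function (our choice, not from the text): f(x) = x^3 on [0, 1].

def f(x):
    return x**3

def fprime(x):
    return 3 * x**2

a, b = 0.0, 1.0
avg_slope = (f(b) - f(a)) / (b - a)  # average slope = 1.0

# fprime(x) - avg_slope is continuous and changes sign on (a, b),
# so bisection converges to a point where f'(x) equals the average slope.
lo, hi = a, b
for _ in range(60):
    mid = (lo + hi) / 2
    if fprime(mid) - avg_slope < 0:
        lo = mid
    else:
        hi = mid

print(mid)  # ≈ 0.57735, which is 1/sqrt(3), where 3x^2 = 1
```

Here the answer can be checked by hand: $3x^2=1$ at $x=1/\sqrt{3}$.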

In this section we will prove the mean value theorem. It’s simplest to break the proof into two steps: first the special case where $f(a)=f(b)$ (and so we are seeking a point with $f'(x)=0$), and then apply this to prove the general version. This special case is often useful in its own right and so has a name: Rolle’s Theorem.

Theorem 22.3 (Rolle’s Theorem) Let f be continuous on the closed interval $[a,b]$ and differentiable on $(a,b)$. Then if $f(b)=f(a)$, there exists some $x\in(a,b)$ where $f'(x)=0$.

Proof. Without loss of generality we may take $f(b)=f(a)=0$ (if their common value is k, consider instead the function $f(x)-k$, and use the linearity of differentiation to see this yields the same result).

There are two cases: (1) f is constant, and (2) f is not. In the first case, $f'(x)=0$ for all $x\in(a,b)$ so we may choose any such point. In the second case, since f is continuous, it achieves both a maximum and minimum value on $[a,b]$ by the extreme value theorem. Because f is nonconstant these values are distinct, so at least one of them is nonzero; since f vanishes at both endpoints, that extreme value must be attained at an interior point. Let $c\in(a,b)$ denote the location of either a (positive) absolute max or a (negative) absolute min.

Then, $c\in(a,b)$ and for all $x\in(a,b)$, $f(x)\geq f(c)$ if c is the absolute min, and $f(x)\leq f(c)$ if it’s the max. In both cases, c satisfies the definition of a local extremum. And, as f is differentiable on $(a,b)$, this implies $f'(c)=0$, as required.

Now, we return to the main theorem:

Proof (Of the Mean Value Theorem). Let f be a function satisfying the hypotheses of the mean value theorem, and L be the secant line connecting $(a,f(a))$ to $(b,f(b))$. Computing this line, $$L(x)=f(a)+\frac{f(b)-f(a)}{b-a}(x-a)$$

Now define the auxiliary function $g(x)=f(x)-L(x)$. Since $L(a)=f(a)$ and $L(b)=f(b)$, we see that g is zero at both endpoints. Further, since both L and f are continuous on $[a,b]$ and differentiable on $(a,b)$, so is g. Thus, g satisfies the hypotheses of Rolle’s theorem, and so there exists some $\star\in(a,b)$ with $g'(\star)=0$.

But differentiating g we find

$$0=g'(\star)=f'(\star)-L'(\star)=f'(\star)-\frac{f(b)-f(a)}{b-a}$$

Thus, at $\star$ we have $f'(\star)=\frac{f(b)-f(a)}{b-a}$ as claimed.

Exercise 22.1 Verify the mean value theorem holds for $f(x)=x^2+x-1$ on the interval $[4,7]$.

22.2 MVT and Function Behavior

Proposition 22.1 (Zero Derivative implies Constant) If f is a differentiable function where $f'(x)=0$ on an interval I, then f is constant on that interval.

Proof. Let $a,b$ be any two points in the interval: we will show that $f(a)=f(b)$, so f takes the same value at all points. If $a<b$ we can apply the mean value theorem to this pair, which furnishes a point $c\in(a,b)$ such that $$f'(c)=\frac{f(b)-f(a)}{b-a}$$ But, $f'(c)=0$ by assumption! Thus $f(b)-f(a)=0$, so $f(b)=f(a)$.

Corollary 22.1 (Functions with the Same Derivative) If $f,g$ are two functions which are differentiable on an interval I and $f'=g'$ on I, then there exists a $C\in\mathbb{R}$ with $f(x)=g(x)+C$.

Proof. Consider the function $h(x)=f(x)-g(x)$. Then by the differentiation laws, $h'(x)=f'(x)-g'(x)=0$, as we have assumed $f'=g'$. But now Proposition 22.1 implies that h is constant, so $h(x)=C$ for some C. Substituting this in yields $f(x)=g(x)+C$.

Definition 22.1 Let f be a function. If F is a differentiable function with the same domain such that $F'=f$, we say F is an antiderivative of f.

Corollary 22.2 (Antiderivatives differ by a Constant) Any two antiderivatives of a function f differ by a constant. Thus, the collection of all possible antiderivatives is described, after choosing any particular antiderivative F, as $$\{F(x)+C\mid C\in\mathbb{R}\}$$

This is the familiar +C from Calculus!

We can use the theory of derivatives to understand when a function is increasing/decreasing and convex/concave, which proves useful in classifying the extrema of functions, among other things.

Proposition 22.2 (Monotonicity and the Derivative) If f is continuous and differentiable on $[a,b]$, then $f(x)$ is monotone increasing on $[a,b]$ if and only if $f'(x)\geq 0$ for all $x\in[a,b]$.

As this is an if and only if statement, we prove the two claims separately. First, we assume that $f'\geq 0$ and show f is increasing:

Proof. Let $x<y$ be any two points in the interval $[a,b]$: we wish to show that $f(x)\leq f(y)$. By the Mean Value Theorem, we know there must be some point $\star\in(x,y)$ such that $$f'(\star)=\frac{f(y)-f(x)}{y-x}$$

But, we’ve assumed that $f'\geq 0$ on the entire interval, so $f'(\star)\geq 0$. Thus $\frac{f(y)-f(x)}{y-x}\geq 0$, and since $y-x$ is positive, this implies

$$f(y)-f(x)\geq 0$$

That is, $f(y)\geq f(x)$. Note that we can extract even more information here than claimed: if we know that $f'$ is strictly greater than 0, then following the argument we learn that $f(y)>f(x)$, so f is strictly monotone increasing.

Next, we assume f is increasing and show $f'\geq 0$:

Proof. Assume f is increasing on $[a,b]$, and let $x\in(a,b)$ be arbitrary. Because we have assumed f is differentiable, we know that the right and left limits both exist and are equal, and that either of them equals the value of the derivative. So, we consider the right limit $$f'(x)=\lim_{t\to x^+}\frac{f(t)-f(x)}{t-x}$$

For any $t>x$ we know $f(t)\geq f(x)$ by the increasing hypothesis, and we know that $t-x>0$ by definition. Thus, for all such t this difference quotient is nonnegative, and hence remains so in the limit:

$$f'(x)\geq 0$$
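This relationship between the sign of $f'$ and monotonicity is easy to probe numerically. The sketch below uses a sample function of our own choosing, $f(x)=x+\sin(x)$, whose derivative $1+\cos(x)$ is everywhere $\geq 0$, and checks both the hypothesis and the conclusion on a grid of points:

```python
# A numerical check of Proposition 22.2 on a sample function (our choice):
# f(x) = x + sin(x), with f'(x) = 1 + cos(x) >= 0 everywhere,
# so f should be monotone increasing.

import math

def f(x):
    return x + math.sin(x)

def fprime(x):
    return 1 + math.cos(x)

xs = [i / 100 for i in range(-500, 501)]  # sample points in [-5, 5]

# Hypothesis: the derivative is nonnegative at every sample point.
assert all(fprime(x) >= 0 for x in xs)
# Conclusion: consecutive function values never decrease.
assert all(f(x) <= f(y) for x, y in zip(xs, xs[1:]))
print("monotone check passed")
```

Of course a finite grid is no proof, but it is a useful sanity check when experimenting with other functions.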

Exercise 22.2 Prove the analogous statement for negative derivatives: $f'(x)\leq 0$ on $[a,b]$ if and only if $f(x)$ is monotone decreasing on $[a,b]$.

22.3 Classifying Extrema

We can leverage our understanding of function behavior to classify the maxima and minima of a differentiable function. By Fermat’s theorem we know that if the derivative exists at such points it must be zero, motivating the following definition:

Definition 22.2 (Critical Points) A critical point of a function f is a point where either (1) f is not differentiable, or (2) f is differentiable, and the derivative is zero.

Note that not all critical points are necessarily local extrema - Fermat’s theorem only claims that extrema are critical points - not the converse! There are many examples showing this is not an if and only if:

Example 22.1 The function f(x)=x3 has a critical point at x=0 (as the derivative is zero), but does not have a local extremum there. The function g(x)=2x+|x| has a critical point at 0 (because it is not differentiable there) but also does not have a local extremum.

If one is only interested in the absolute max and min of the function over its entire domain, this already provides a reasonable strategy, which is one of the early highlights of Calculus I.

Theorem 22.4 (Finding Global Extrema) Let f be a continuous function defined on a closed interval I with finitely many critical points. Then the absolute maximum and minimum values of f can be found explicitly via the following procedure:

  • Find the value of f at the endpoints of I
  • Find the value of f at the points of non-differentiability
  • Find the value of f at the points where $f'(x)=0$.

The absolute max of f is the largest of these values, and the absolute min is the smallest.
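The procedure above is mechanical enough to carry out in a few lines. Here is a sketch for a sample function of our own choosing, $f(x)=x^3-3x$ on $[-2,3]$, which is differentiable everywhere with $f'(x)=3x^2-3$ vanishing at $x=\pm 1$:

```python
# The closed-interval procedure of Theorem 22.4 on a sample function
# (our choice): f(x) = x^3 - 3x on [-2, 3].

def f(x):
    return x**3 - 3 * x

endpoints = [-2.0, 3.0]
critical = [-1.0, 1.0]   # solutions of f'(x) = 3x^2 - 3 = 0 in the interval
# (no points of non-differentiability for this polynomial)

candidates = endpoints + critical
values = {x: f(x) for x in candidates}

abs_max = max(values.values())  # 18.0, attained at x = 3
abs_min = min(values.values())  # -2.0, attained at x = -2 and x = 1
print(abs_max, abs_min)
```

Note the theorem guarantees the extremes appear in this short candidate list; no searching over the whole interval is needed.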

Proof. Because I is a closed interval and f is continuous, we are guaranteed by the extreme value theorem that f achieves both a maximum and minimum value. Let these be $\max,\min$ respectively, realized at points $M,m$ with $$f(M)=\max\qquad f(m)=\min$$

Without loss of generality, we will consider M (the same argument applies to m).

First, M could be at one of the endpoints of I. If it is not, then M lies in the interior of I, and there is some small interval $(a,b)$ containing M totally contained in the domain I. Since M is the location of the global max, we know for all $x\in I$, $f(x)\leq f(M)$. Thus, for all $x\in(a,b)$, $f(x)\leq f(M)$, so M is the location of a local max.

But if M is the location of a local maximum and f is differentiable there, then by Fermat’s theorem we know $f'(M)=0$. Thus, M must be a critical point of f (whether f is differentiable at M or not).

Thus, M occurs in the list of critical points and endpoints, which are the points we checked.

Oftentimes, however, one is concerned with the more fine-grained task of classifying specific extrema as (local) maxes or mins. This requires some additional investigation of the behavior of f near the critical point.

Proposition 22.3 (Distinguishing Maxes and Mins) Let f be a continuously differentiable function on $[a,b]$ and $c\in(a,b)$ be a critical point where $f'(x)<0$ for $x<c$ and $f'(x)>0$ for $x>c$, for all x in some small interval about c.

Then c is a local minimum of f.

Proof. By the above, we know that $f'(x)<0$ for $x<c$ implies that f is monotone decreasing for $x<c$: that is, $x<c\implies f(x)\geq f(c)$. Similarly, as $f'(x)>0$ for $x>c$, we have that f is increasing, and $c<x\implies f(c)\leq f(x)$.

Thus, for x on either side of c we have $f(x)\geq f(c)$, so c is the location of a local minimum.

This is even more simply phrased in terms of the second derivative, as is common in Calculus I.

Theorem 22.5 (The Second Derivative Test) Let f be a twice continuously differentiable function on $[a,b]$, and c a critical point. Then if $f''(c)>0$, the point c is the location of a local minimum, and if $f''(c)<0$ then c is the location of a local maximum.

Proof. We consider the case that $f''(c)>0$; the other is analogous. Since $f''$ is continuous and positive at c, we know that there exists a small interval $(c-\delta,c+\delta)$ about c where $f''$ is positive (as a continuous function which is positive at a point remains positive on some neighborhood of that point).

Thus, by Proposition 22.2, we know on this interval that $f'$ is an increasing function. Since $f'(c)=0$, this means that if $x<c$ we have $f'(x)<0$ and if $x>c$ we have $f'(x)>0$. That is, $f'$ changes from negative to positive at c, so c is the location of a local minimum by Proposition 22.3.

22.4 Contraction Maps

We can use what we’ve learned about derivatives and the mean value theorem to also produce a simple test for finding contraction maps.

Proposition 22.4 (Contraction Mappings) If f is continuously differentiable and $|f'|<1$ on a closed interval, then f is a contraction map on that interval.

Proof. Let f have a continuous derivative $f'$ which satisfies $|f'(x)|<1$ for all x in a closed interval I. Because $f'$ is continuous and $|x|$ is continuous, so is the composition $|f'|$, and thus it achieves a maximum value on I (by the extreme value theorem); call this maximum M, and note that $M<1$ by our assumption.

Now let $x<y$ in I be arbitrary. By the Mean Value Theorem there is some $c\in(x,y)$ such that $$f(y)-f(x)=f'(c)(y-x)$$

Taking absolute values and using that $|f'(c)|\leq M$, this implies $$|f(y)-f(x)|\leq M|y-x|$$

Since $x,y$ were arbitrary this holds for all such pairs, and so the distance between x and y decreases by a factor of at least M, which is strictly less than 1. Thus f is a contraction map!

We know contraction maps to be extremely useful, as they have a unique fixed point, and iterating from any starting value produces a sequence which rapidly converges to that fixed point. Using this differential condition it’s easy to check if a function is a contraction mapping, and thus easy to rigorously establish the existence of certain convergent sequences.

As a good example, we give a re-proof of the convergence of the Babylonian procedure to $\sqrt{2}$.

Example 22.2 The function $f(x)=\frac{x+\frac{2}{x}}{2}$ is a contraction map on the interval $[1,2]$. The fixed point of this map is $\sqrt{2}$, thus the sequence $1,\ f(1),\ f(f(1)),\ f(f(f(1))),\ldots$ converges to $\sqrt{2}$.

To prove this, note that if $x=\left(x+\frac{2}{x}\right)/2$ then $x^2=2$, whose only positive solution is $\sqrt{2}$; thus it remains only to check that f is a contraction. Computing its derivative, $$f'(x)=\frac{1}{2}-\frac{2}{2x^2}=\frac{1}{2}-\frac{1}{x^2}$$

On the interval $[1,2]$ the function $1/x^2$ lies in $[1/4,1]$, and so $f'$ lies in the interval $[-1/2,1/4]$; hence $|f'|$ is bounded above by $1/2$, and f is a contraction map!
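We can watch this contraction do its work. The sketch below iterates the Babylonian map from $x_0=1$ and checks that the distance to the fixed point $\sqrt{2}$ shrinks by at least the contraction factor $1/2$ at every step (in practice it shrinks far faster):

```python
# Iterating the Babylonian map f(x) = (x + 2/x)/2 from x0 = 1,
# as in Example 22.2. Errors should shrink by at least the
# contraction factor 1/2 on [1, 2] at each step.

def f(x):
    return (x + 2 / x) / 2

x = 1.0
iterates = [x]
for _ in range(6):
    x = f(x)
    iterates.append(x)

root2 = 2**0.5
errors = [abs(x - root2) for x in iterates]

# Each error is at most half the previous one (small tolerance for
# floating-point noise once we reach machine precision).
for e_prev, e_next in zip(errors, errors[1:]):
    assert e_next <= e_prev / 2 + 1e-15

print(iterates[-1])  # ≈ 1.41421356...
```

After only six iterations the result already agrees with $\sqrt{2}$ to machine precision.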

22.5 Newton’s Method

Newton’s method is a recipe for numerically finding zeroes of a function $f(x)$. It works iteratively, by taking one guess for a zero and producing a (hopefully) better one, using the geometry of the derivative and linear approximations. The procedure is simple to derive: given a point a we can calculate the tangent line to f at a

$$\ell(x)=f(a)+f'(a)(x-a)$$

and since this tangent line should be a good approximation of f near a, if a is near a zero of f, we can approximate this zero by solving not $f(x)=0$ (which is hard, if f is a complicated function) but $\ell(x)=0$ (which is easy, as $\ell$ is linear). Doing so gives

$$x=a-\frac{f(a)}{f'(a)}$$

Definition 22.3 (Newton Iteration) Let f be a differentiable function. Then Newton iteration is the recursive procedure $$x\mapsto x-\frac{f(x)}{f'(x)}$$

Starting from some $x_0$ this defines a recursive sequence $$x_{n+1}=x_n-\frac{f(x_n)}{f'(x_n)}$$

This is an extremely useful calculational procedure in practice, so long as you can cook up a function that is zero at whatever point you are interested in. To return to a familiar example, to calculate $\sqrt{a}$ one might consider the function $f(x)=x^2-a$, or to find a solution to $\cos(x)=x$, one may consider $g(x)=x-\cos(x)$.
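The second example above makes a nice demonstration. Here is a minimal sketch of Newton iteration applied to $g(x)=x-\cos(x)$, whose zero is the solution of $\cos(x)=x$ (the starting guess $x_0=1$ is our own choice):

```python
# Newton iteration (Definition 22.3) applied to g(x) = x - cos(x):
# its zero is the unique solution of cos(x) = x.

import math

def g(x):
    return x - math.cos(x)

def gprime(x):
    return 1 + math.sin(x)

x = 1.0  # starting guess (our choice)
for _ in range(10):
    x = x - g(x) / gprime(x)

print(x)  # ≈ 0.739085..., the solution of cos(x) = x
assert abs(math.cos(x) - x) < 1e-12
```

A handful of iterations already pins the answer down to machine precision, a speed we will quantify after proving the convergence theorem.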

Exercise 22.3 Show the sequence of approximates from Newton’s method for $\sqrt{2}$ starting at $x_0=2$ is precisely the Babylonian sequence.

We already have several proofs that this sequence for $\sqrt{2}$ converges, so we know that Newton’s method works as expected in at least one instance. But we need a general proof. Below we offer a proof of the special case of a simple zero: where $f(x)$ crosses the axis like x rather than running tangent to it like $x^2$:

Definition 22.4 (Simple Zero) A continuously differentiable function f has a simple zero at c if $f(c)=0$ but $f'(c)\neq 0$.

Theorem 22.6 (Newton’s Method) Let f be a twice continuously differentiable function with a simple zero at c. Then there is some $\delta$ such that applying Newton iteration to any starting point in $I=(c-\delta,c+\delta)$ results in a sequence that converges to c.

Proof. Our strategy is to show that there is an interval on which the Newton iteration $N(x)=x-\frac{f(x)}{f'(x)}$ is a contraction map.

Since c is a simple zero we know $f'(c)\neq 0$, and without loss of generality we take $f'(c)>0$. Since f is twice continuously differentiable, $f'$ is also continuous, meaning there is some $a>0$ where $f'$ is positive on the entire interval $(c-a,c+a)$. On this interval we may compute the derivative of the Newton map

$$N'(x)=1-\frac{f'f'-ff''}{(f')^2}=\frac{ff''}{(f')^2}$$

Since $f$, $f'$ and $f''$ are all continuous and $f'$ is nonzero on this interval, $N'$ is continuous. As $f(c)=0$ we see $N'(c)=0$, so using continuity, for any $\epsilon>0$ there is some $b>0$ where $x\in(c-b,c+b)$ implies $|N'(x)|<\epsilon$.

Thus, choosing any $\epsilon<1$ and taking $\delta=\min\{a,b\}$, we’ve found an interval $(c-\delta,c+\delta)$ where the derivative of N is strictly bounded below 1: thus by Proposition 22.4, N is a contraction map on this interval, and so iterating N from any starting point produces a sequence that converges to the unique fixed point $\star$ of N. This fixed point satisfies

$$N(\star)=\star-\frac{f(\star)}{f'(\star)}=\star$$ which after some algebra simplifies to $f(\star)=0$.

Since $f(c)=0$ and $f'(x)$ is positive on the entire interval by construction, f is increasing and so $f(x)<0$ for $x<c$ and $f(x)>0$ for $x>c$. That is, f has a unique zero on this interval, so $\star=c$ and our sequence of Newton iterates converges to c as desired.

The structure of this proof tells us that Newton’s method is actually quite efficient: a contraction map which contracts by $\epsilon<1$ creates a Cauchy sequence that converges exponentially fast (like $\epsilon^n$). And in our proof, we see continuity of $N'$ lets us set any $\epsilon<1$ and get an interval about c where convergence is exponential in $\epsilon$. These intervals are nested, and so as x gets closer and closer to c the convergence of Newton’s method gets better and better: it’s always exponentially fast, but the base of the exponential improves as we close in.
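This accelerating convergence is easy to see numerically. The sketch below runs Newton's method for $f(x)=x^2-2$ from $x_0=2$ and records the error at each step; the ratio of consecutive errors shrinks toward 0 as the iterates close in, rather than holding at a fixed contraction factor:

```python
# The accelerating convergence described above, seen numerically:
# Newton's method for f(x) = x^2 - 2 starting at x0 = 2.

x = 2.0
root2 = 2**0.5
errors = []
for _ in range(5):
    errors.append(abs(x - root2))
    x = x - (x**2 - 2) / (2 * x)  # Newton step: x - f(x)/f'(x)

print(errors)

# The contraction ratio error_{n+1}/error_n itself shrinks each step,
# reflecting the improving base of the exponential.
ratios = [b / a for a, b in zip(errors, errors[1:]) if a > 0]
assert all(r2 < r1 for r1, r2 in zip(ratios, ratios[1:]))
```

In fact the error roughly squares at each step here, which is the quadratic convergence usually attributed to Newton's method at a simple zero.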

Exercise 22.4 Provide an alternative proof of Newton’s method when f is convex: if c is a simple zero and $x_0>c$, show the sequence of Newton iterates is a monotone decreasing sequence which is bounded below, and converges to c via Monotone Convergence.

22.6 L’Hospital’s Rule

L’Hospital’s rule is a very convenient trick for computing tricky limits in calculus: it tells us that when we are trying to evaluate the limit of a quotient of continuous functions and ‘plugging in’ yields the undefined expression 0/0 we can attempt to find the limit’s value by differentiating the numerator and denominator, and trying again. Precisely:

Theorem 22.7 (L’Hospital’s Rule) Let f and g be continuous functions on an interval containing a, and assume that both f and g are differentiable on this interval, with the possible exception of the point a.

Then if $f(a)=g(a)=0$ and $g'(x)\neq 0$ for all $x\neq a$, $$\lim_{x\to a}\frac{f'(x)}{g'(x)}=L\quad\implies\quad\lim_{x\to a}\frac{f(x)}{g(x)}=L$$
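Before the proof, a quick numerical illustration on a standard example of our own choosing: $f(x)=\sin(x)$, $g(x)=x$ at $a=0$, where both vanish. The rule predicts $\lim_{x\to 0}\frac{\sin x}{x}=\lim_{x\to 0}\frac{\cos x}{1}=1$, and both quotients indeed approach 1 together:

```python
# Illustrating L'Hospital's rule on f(x) = sin(x), g(x) = x at a = 0
# (a standard example, our choice). Both f/g and f'/g' = cos(x)/1
# approach the same limit L = 1 as x -> 0.

import math

for x in [0.1, 0.01, 0.001]:
    quotient = math.sin(x) / x        # f(x)/g(x), the limit we want
    deriv_quotient = math.cos(x)      # f'(x)/g'(x) = cos(x)/1
    print(x, quotient, deriv_quotient)

# Both quotients are within about x^2/2 of the limit L = 1.
assert abs(math.sin(1e-6) / 1e-6 - 1) < 1e-9
```

This is only an illustration, not a proof; the proof sketch below shows how the MVT ties the two quotients together.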

Proof (Sketch).

  • Show that for any x, we have $$\frac{f(x)}{g(x)}=\frac{f(x)-f(a)}{g(x)-g(a)}=\frac{\ \frac{f(x)-f(a)}{x-a}\ }{\ \frac{g(x)-g(a)}{x-a}\ }$$
  • For any x, use the MVT to get points $c,k$ such that $f'(c)=\frac{f(x)-f(a)}{x-a}$ and $g'(k)=\frac{g(x)-g(a)}{x-a}$.
  • Choose a sequence $x_n\to a$: for each $x_n$, the above furnishes points $c_n,k_n$; show these sequences converge to a by squeezing.
  • Use this to show that the sequence $s_n=\frac{f'(c_n)}{g'(k_n)}$ converges to L, using our assumption $\lim_{x\to a}\frac{f'}{g'}=L$.
  • Conclude that the sequence $\frac{f(x_n)}{g(x_n)}\to L$, and that $\lim_{x\to a}\frac{f(x)}{g(x)}=L$ as claimed.

Hint: Use the $\epsilon$-$\delta$ definition of a functional limit and our assumption $\lim_{x\to a}\frac{f'(x)}{g'(x)}=L$ to help: for any $\epsilon$, there’s a $\delta$ where $|x-a|<\delta$ implies this quotient is within $\epsilon$ of L. Since $c_n,k_n\to a$, can you find an N beyond which $f'(c_n)/g'(k_n)$ is always within $\epsilon$ of L?

Exercise 22.5 Fill in the details of the above proof sketch.