25 Applications
25.1 Information Loss
Proposition 25.1 (Zero Derivative implies Constant) If \(f\) is a differentiable function where \(f^\prime(x)=0\) on an interval \(I\), then \(f\) is constant on that interval.
Proof. Let \(a,b\) be any two points in the interval: we will show that \(f(a)=f(b)\), so \(f\) takes the same value at all points. Without loss of generality \(a<b\), so we can apply the mean value theorem to this pair, which furnishes a point \(c\in(a,b)\) such that \[f^\prime(c)=\frac{f(b)-f(a)}{b-a}\] But \(f^\prime(c)=0\) by assumption! Thus \(f(b)-f(a)=0\), so \(f(b)=f(a)\).
Corollary 25.1 (Functions with the Same Derivative) If \(f,g\) are two functions which are differentiable on an interval \(I\) and \(f^\prime=g^\prime\) on \(I\), then there exists a \(C\in\RR\) with \[f(x)=g(x)+C\]
Proof. Consider the function \(h(x)=f(x)-g(x)\). Then by the differentiation laws, \[h^\prime(x)=f^\prime(x)-g^\prime(x)=0\] as we have assumed \(f^\prime=g^\prime\). But now Proposition 25.1 implies that \(h\) is constant, so \(h(x)=C\) for some \(C\). Substituting this in yields \[f(x)=g(x)+C\]
Definition 25.1 Let \(f\) be a function. If \(F\) is a differentiable function with the same domain such that \(F^\prime = f\), we say \(F\) is an antiderivative of \(f\).
Corollary 25.2 (Antiderivatives differ by a Constant) Any two antiderivatives of a function \(f\) on an interval differ by a constant. Thus, after choosing any particular antiderivative \(F\), the collection of all possible antiderivatives is described as \[\{F(x)+C\mid C\in\RR\}\]
This is the familiar \(+C\) from Calculus!
25.2 Function Behavior
We can use the theory of derivatives to understand when a function is increasing or decreasing and convex or concave, properties which prove useful in classifying the extrema of functions, among other things.
Proposition 25.2 (Monotonicity and the Derivative) If \(f\) is continuous and differentiable on \([a,b]\), then \(f(x)\) is monotone increasing on \([a,b]\) if and only if \(f^\prime(x)\geq 0\) for all \(x\in [a,b]\).
As this is an if and only if statement, we prove the two claims separately. First, we assume that \(f^\prime\geq 0\) and show \(f\) is increasing:
Proof. Let \(x<y\) be any two points in the interval \([a,b]\): we wish to show that \(f(x)\leq f(y)\). By the Mean Value Theorem, we know there must be some point \(\star\in (x,y)\) such that \[f^\prime(\star)=\frac{f(y)-f(x)}{y-x}\]
But we’ve assumed that \(f^\prime\geq 0\) on the entire interval, so \(f^\prime(\star)\geq 0\). Thus \(\frac{f(y)-f(x)}{y-x}\geq 0\), and since \(y-x\) is positive, this implies
\[f(y)-f(x)\geq 0\]
That is, \(f(y)\geq f(x)\). Note that we can extract even more information here than claimed: if \(f^\prime\) is strictly greater than 0, then the same argument gives \(f(y)>f(x)\), so \(f\) is strictly monotone increasing.
Next, we assume \(f\) is increasing and show \(f^\prime\geq 0\):
Proof. Assume \(f\) is increasing on \([a,b]\), and let \(x\in(a,b)\) be arbitrary. Because we have assumed \(f\) is differentiable, we know that the right and left limits both exist and are equal, and that either of them equals the value of the derivative. So, we consider the right limit \[f^\prime(x)=\lim_{t\to x^+}\frac{f(t)-f(x)}{t-x}\]
For any \(t>x\) we know \(f(t)\geq f(x)\) by the increasing hypothesis, and we know that \(t-x>0\) by definition. Thus, for all such \(t\) this difference quotient is nonnegative, and hence remains so in the limit:
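\[f^\prime(x)\geq 0\]
The same argument handles the endpoints, using the one-sided limit available there: at \(x=a\) the right limit above applies verbatim, while at \(x=b\) we use the left limit, where \(t<b\) gives \(f(t)\leq f(b)\) and \(t-b<0\), so the difference quotient is again nonnegative.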
Exercise 25.1 Prove the analogous statement for negative derivatives: \(f^\prime(x)\leq 0\) on \([a,b]\) if and only if \(f(x)\) is monotone decreasing on \([a,b]\).
25.2.1 \(\blacklozenge\) Convexity and the Second Derivative
Recall that in the very introduction to functions we defined the property of convexity: a function is convex if the secant line \(L\) connecting any two points of its graph lies strictly above the graph of \(f\) between those points, that is, \(L(x)-f(x)>0\) for every \(x\) strictly between them.
It’s good to have a quick review: if \(a,b\) are two points in the domain, the secant line connecting \((a,f(a))\) to \((b,f(b))\) is familiar from our proof of the Mean Value Theorem:
\[L_{a,b}(x)=f(a)+\frac{f(b)-f(a)}{b-a}(x-a)\]
Working a little harder, we can come up with a rather simple-looking condition that is equivalent to \(f\) lying below its secant line \(L_{a,b}\) for all \(x\in(a,b)\). This is still purely algebraic manipulation, encapsulated in the lemma below.
Lemma 25.1 (An Inequality for Convexity)
If \(f\) is a function defined on \([a,b]\), then \(f\) lies strictly below its secant line \(L_{a,b}(x)\) at every point of \((a,b)\) if and only if \[\frac{f(b)-f(x)}{b-x}-\frac{f(x)-f(a)}{x-a}>0\] for all \(x\in(a,b)\).
Proof. Because \(1=\frac{b-x}{b-a}+\frac{x-a}{b-a}\), multiplying through by \(f(x)\) yields the identity
\[f(x)=f(x)\frac{b-x}{b-a}+f(x)\frac{x-a}{b-a}\]
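Substituting this into the secant line, rewritten as \(L_{a,b}(x)=f(a)\frac{b-x}{b-a}+f(b)\frac{x-a}{b-a}\) (expand the formula above and use \(1-\frac{x-a}{b-a}=\frac{b-x}{b-a}\)), we can collect like terms and see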
\[\begin{align*}L_{a,b}(x)-f(x)&=\left[f(a)-f(x)\right]\frac{b-x}{b-a}+\left[f(b)-f(x)\right]\frac{x-a}{b-a}\\ &= \frac{x-a}{b-a}\left[f(b)-f(x)\right]-\frac{b-x}{b-a}\left[f(x)-f(a)\right] \end{align*}\]
We are trying to set ourselves up to use the Mean Value Theorem, so there’s one more algebraic trick we can employ: we can multiply and divide the first term by \(b-x\), and multiply and divide the second term by \(x-a\). This gives
\[\begin{align*}L_{a,b}(x)-f(x)&= \frac{b-x}{b-x}\frac{x-a}{b-a}\left[f(b)-f(x)\right]-\frac{x-a}{x-a}\frac{b-x}{b-a}\left[f(x)-f(a)\right]\\ &=\frac{(b-x)(x-a)}{b-a}\frac{f(b)-f(x)}{b-x}-\frac{(b-x)(x-a)}{b-a}\frac{f(x)-f(a)}{x-a} \end{align*}\]
Note that each of these terms has the factor \(\frac{(b-x)(x-a)}{b-a}\) in common, and that this factor is positive (as \(x\in(a,b)\) implies \(b-x>0\) and \(x-a>0\)). Thus, we can factor it out and see that \(L_{a,b}(x)-f(x)\) is positive if and only if the remaining factor is positive: that is, if and only if \[\frac{f(b)-f(x)}{b-x}-\frac{f(x)-f(a)}{x-a}>0\] as claimed.
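As a quick numerical sanity check of the algebra (in particular the signs), one can compare \(L_{a,b}(x)-f(x)\) directly with the factored form \(\frac{(b-x)(x-a)}{b-a}\left[\frac{f(b)-f(x)}{b-x}-\frac{f(x)-f(a)}{x-a}\right]\) obtained in the proof. This is a minimal Python sketch, not a proof; the choice \(f(x)=e^x\), the endpoints \(a=-1\), \(b=2\), the sample points, and the helper names `secant` and `factored_gap` are illustrative assumptions.

```python
import math

def secant(f, a, b, x):
    # L_{a,b}(x) = f(a) + (f(b) - f(a)) / (b - a) * (x - a)
    return f(a) + (f(b) - f(a)) / (b - a) * (x - a)

def factored_gap(f, a, b, x):
    # (b-x)(x-a)/(b-a) * [ (f(b)-f(x))/(b-x) - (f(x)-f(a))/(x-a) ]
    weight = (b - x) * (x - a) / (b - a)   # positive for x in (a, b)
    slope_gap = (f(b) - f(x)) / (b - x) - (f(x) - f(a)) / (x - a)
    return weight * slope_gap

f = math.exp            # illustrative choice with f'' > 0
a, b = -1.0, 2.0
for x in [-0.5, 0.0, 0.7, 1.9]:
    direct = secant(f, a, b, x) - f(x)
    # the two columns should agree, and both should be positive here
    print(x, direct, factored_gap(f, a, b, x))
```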
Now, our goal is to use the Mean Value Theorem to relate this expression (which is a property of \(f\)) to a property of one of its derivatives (here \(f^{\prime\prime}\)).
Exercise 25.2 If \(f^{\prime\prime}>0\) on the interval \([a,b]\) prove that \(f\) lies below its secant line \(L_{a,b}\).
Hint: Here’s a sketch of how to proceed
- For \(x\in(a,b)\), start with the expression \(\frac{f(b)-f(x)}{b-x}-\frac{f(x)-f(a)}{x-a}\), which you eventually want to show is positive.
- Apply the MVT for \(f\) to find points \(c_1\in(a,x)\) and \(c_2\in(x,b)\) where \(f^\prime(c_i)\) equals the respective average slopes.
- Using this, show that your original expression is equivalent to \((c_2-c_1)\frac{f^\prime(c_2)-f^\prime(c_1)}{c_2-c_1}\), and argue that it is sufficient to show that \(\frac{f^\prime(c_2)-f^\prime(c_1)}{c_2-c_1}\) is positive.
- Can you apply the MVT again (this time to \(f^\prime\)) and use our assumption on the second derivative to finish the argument?
Using this, we can quickly prove the main claimed result:
Theorem 25.1 (Convexity and the Second Derivative) If \(f\) is twice differentiable on an interval and \(f^{\prime\prime}>0\) on that interval, then \(f\) is convex on the interval.
Proof. Let \(I\) be the interval in question, and let \(a<b\) be any two points in \(I\). Restricting our function to the interval \([a,b]\) we have \(f^{\prime\prime}(x)>0\) for all \(x\in[a,b]\) by hypothesis; so Exercise 25.2 implies that the secant line lies strictly above the graph. Since the interval \([a,b]\) was arbitrary, this holds for any two such points, which is the definition of convexity.
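Though we will not need it, a converse holds in the non-strict setting: a twice differentiable function on an interval satisfies \(L(x)-f(x)\geq 0\) between any two points exactly when \(f^{\prime\prime}\geq 0\) on the interval, a much easier condition to check. The strict version is not an if and only if: \(f(x)=x^4\) has every secant line lying strictly above its graph, yet \(f^{\prime\prime}(0)=0\).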
25.2.2 Contraction Maps
We can use what we’ve learned about derivatives and the mean value theorem to also produce a simple test for finding contraction maps.
Proposition 25.3 (Contraction Mappings) If \(f\) is continuously differentiable and \(|f^\prime|<1\) on a closed interval \(I\), then \(f\) is a contraction map on \(I\).
Proof. Let \(f\) have a continuous derivative \(f^\prime\) which satisfies \(|f^\prime(x)|<1\) for all \(x\) in a closed interval \(I\). Because \(f^\prime\) is continuous and \(|x|\) is continuous, so is the composition \(|f^\prime|\), and thus it achieves a maximum value on \(I\) (Theorem 19.2); call this maximum \(M\). Since \(M=|f^\prime(x_0)|\) for some \(x_0\in I\), our assumption gives \(M<1\).
Now let \(x<y\) be arbitrary points of \(I\). By the Mean Value Theorem there is some \(c\in (x,y)\) such that \[f(y)-f(x)= f^\prime(c)(y-x)\]
Taking absolute values and using that \(|f^\prime(c)|\leq M\) this implies \[|f(y)-f(x)|\leq M|y-x|\]
Since \(x,y\) were arbitrary, this holds for all such pairs: applying \(f\) shrinks the distance between any two points by a factor of at most \(M\), which is strictly less than 1. Thus \(f\) is a contraction map!
We know contraction maps to be extremely useful, as they have a unique fixed point, and iterating from any starting value produces a sequence which rapidly converges to that fixed point. Using this differential condition it’s easy to check whether a function is a contraction mapping, and thus easy to rigorously establish the existence of certain convergent sequences.
As a good example, we give a new proof of the convergence of the Babylonian procedure to \(\sqrt{2}\).
Example 25.1 The function \(f(x)=\frac{x+\frac{2}{x}}{2}\) is a contraction map on the interval \([1,2]\). The fixed point of this map is \(\sqrt{2}\), thus the sequence \(1,f(1),ff(1), fff(1),\ldots\) converges to \(\sqrt{2}\).
To prove this, note that if \(x=(x+\tfrac{2}{x})/2\) then \(x^2=2\), whose only positive solution is \(\sqrt{2}\); thus it remains only to check that \(f\) is a contraction. Computing its derivative: \[f^\prime(x)=\frac{1-\frac{2}{x^2}}{2}=\frac{1}{2}-\frac{1}{x^2}\]
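On the interval \([1,2]\) the function \(1/x^2\) lies in \([1/4,1]\), so \(f^\prime\) lies in the interval \([-1/2,1/4]\) and \(|f^\prime|\) lies in \([0,1/2]\). Thus \(|f^\prime|\leq 1/2<1\) on \([1,2]\), and \(f\) is a contraction map by Proposition 25.3! (Note also that \(f\) maps \([1,2]\) into itself: \(f(1)=f(2)=3/2\), and the minimum of \(f\) on the interval is \(f(\sqrt{2})=\sqrt{2}\).)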
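The contraction can also be watched numerically. In this minimal Python sketch, the helper names, the number of iterations, and the grid of sample points are arbitrary choices; the grid computation only spot-checks the bound \(|f(x)-f(y)|\leq\tfrac12|x-y|\) on finitely many pairs and is no substitute for the derivative argument.

```python
def babylonian(x):
    # The map f(x) = (x + 2/x) / 2 from Example 25.1
    return (x + 2 / x) / 2

def iterate(f, x0, n):
    # Return [x0, f(x0), f(f(x0)), ...] using n applications of f
    xs = [x0]
    for _ in range(n):
        xs.append(f(xs[-1]))
    return xs

xs = iterate(babylonian, 1.0, 6)
print(xs)

# Spot check of |f(x) - f(y)| <= (1/2)|x - y| on a grid of points in [1, 2]
grid = [1 + k / 50 for k in range(51)]
worst = max(abs(babylonian(x) - babylonian(y)) / abs(x - y)
            for x in grid for y in grid if x != y)
print(worst)   # expected to be at most 1/2
```

For this particular map one can check by hand that \(f(x)-f(y)=(x-y)\left(\tfrac12-\tfrac{1}{xy}\right)\), which explains why the sampled ratios stay below \(1/2\) on \([1,2]\).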
25.3 Classifying Extrema
We can leverage our understanding of function behavior to classify the maxima and minima of a differentiable function. By Fermat’s theorem we know that if the derivative exists at such points it must be zero, motivating the following definition:
Definition 25.2 (Critical Points) A critical point of a function \(f\) is a point where either (1) \(f\) is not differentiable, or (2) \(f\) is differentiable, and the derivative is zero.
Note that not all critical points are local extrema: Fermat’s theorem only claims that extrema are critical points, not the converse! There are many examples showing this is not an if and only if:
Example 25.2 The function \(f(x)=x^3\) has a critical point at \(x=0\) (as the derivative is zero), but does not have a local extremum there. The function \(g(x)=2x+|x|\) has a critical point at \(0\) (because it is not differentiable there) but also does not have a local extremum.
If one is only interested in the absolute max and min of the function over its entire domain, this already provides a reasonable strategy, which is one of the early highlights of Calculus I.
Theorem 25.2 (Finding Global Extrema) Let \(f\) be a continuous function defined on a closed interval \(I\) with finitely many critical points. Then the absolute maximum and minimum value of \(f\) are explicitly findable via the following procedure:
- Find the value of \(f\) at the endpoints of \(I\)
- Find the value of \(f\) at the points of non-differentiability
- Find the value of \(f\) at the points where \(f^\prime(x)=0\).
The absolute max of \(f\) is the largest of these values, and the absolute min is the smallest.
Proof. Because \(I\) is a closed interval and \(f\) is continuous, we are guaranteed by the extreme value theorem that \(f\) achieves both a maximum and minimum value. Let these be \(\max,\min\) respectively, realized at points \(M,m\) with \[f(M)=\max\hspace{1cm}f(m)=\min\]
Without loss of generality, we will consider \(M\) (the same argument applies to \(m\)).
First, \(M\) could be at one of the endpoints of \(I\). If it is not, then \(M\) lies in the interior of \(I\), and there is some small interval \((a,b)\) containing \(M\) totally contained in the domain \(I\). Since \(M\) is the location of the global max, we know for all \(x\in I\), \(f(x)\leq f(M)\). Thus, for all \(x\in(a,b)\), \(f(x)\leq f(M)\), so \(M\) is the location of a local max.
Since \(M\) is the location of a local maximum, either \(f\) is not differentiable at \(M\), or it is and Fermat’s theorem gives \(f^\prime(M)=0\). In either case, \(M\) is a critical point of \(f\).
Thus, \(M\) occurs in the list of critical points and endpoints, which are the points we checked.
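The procedure of Theorem 25.2 is mechanical once the critical points are known. A minimal Python sketch, assuming the critical points have already been found by hand (the code does not locate them); the function \(f(x)=x^3-3x\) on \([-2,3]\), with \(f^\prime(x)=3x^2-3\) vanishing at \(x=\pm1\), and the helper name `global_extrema` are illustrative choices.

```python
def global_extrema(f, endpoints, critical_points):
    # Theorem 25.2: evaluate f at the endpoints and at every critical point,
    # then take the largest and smallest of these values.
    candidates = list(endpoints) + list(critical_points)
    values = {x: f(x) for x in candidates}
    x_max = max(values, key=values.get)   # ties go to the first candidate listed
    x_min = min(values, key=values.get)
    return (x_max, values[x_max]), (x_min, values[x_min])

def f(x):
    # Illustrative example: f'(x) = 3x^2 - 3 is zero exactly at x = -1, 1
    return x**3 - 3 * x

print(global_extrema(f, endpoints=(-2, 3), critical_points=(-1, 1)))
# ((3, 18), (-2, -2))
```

Here the minimum value \(-2\) is attained both at the endpoint \(x=-2\) and at the critical point \(x=1\); the sketch just reports the first candidate it finds.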
Often, however, one wants the more fine-grained information of classifying specific critical points as local maxes or mins. This requires some additional investigation of the behavior of \(f\) near the critical point.
Proposition 25.4 (Distinguishing Maxes and Mins) Let \(f\) be a continuously differentiable function on \([a,b]\) and \(c\in(a,b)\) a critical point such that, for all \(x\) in some small interval about \(c\), \(f^\prime(x)<0\) when \(x<c\) and \(f^\prime(x)>0\) when \(x>c\).
Then \(c\) is a local minimum of \(f\).
Proof. By Exercise 25.1, \(f^\prime(x)<0\) for \(x<c\) implies that \(f\) is monotone decreasing to the left of \(c\): that is, \(x< c\implies f(x)\geq f(c)\). Similarly, as \(f^\prime(x)>0\) for \(x>c\), Proposition 25.2 shows that \(f\) is increasing to the right of \(c\), so \(c< x\implies f(c)\leq f(x)\).
Thus, for \(x\) on either side of \(c\) we have \(f(x)\geq f(c)\), so \(c\) is the location of a local minimum.
This is even more simply phrased in terms of the second derivative, as is common in Calculus I.
Theorem 25.3 (The Second Derivative Test) Let \(f\) be a twice continuously differentiable function on \([a,b]\), and \(c\in(a,b)\) a critical point. Then if \(f^{\prime\prime}(c)>0\), the point \(c\) is the location of a local minimum, and if \(f^{\prime\prime}(c)<0\) then \(c\) is the location of a local maximum.
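Proof. We consider the case that \(f^{\prime\prime}(c)>0\); the other is analogous. Since \(f^{\prime\prime}\) is continuous and positive at \(c\), we know that there exists a small interval \((c-\delta,c+\delta)\) about \(c\) where \(f^{\prime\prime}\) is positive (taking \(\epsilon=f^{\prime\prime}(c)\) in the definition of continuity gives a \(\delta\) with \(|f^{\prime\prime}(x)-f^{\prime\prime}(c)|<f^{\prime\prime}(c)\), and hence \(f^{\prime\prime}(x)>0\), whenever \(|x-c|<\delta\)).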
Thus, by Proposition 25.2 applied to \(f^\prime\) (in the strict form noted at the end of its proof), \(f^\prime\) is strictly increasing on this interval. Since \(f^\prime(c)=0\), this means that if \(x<c\) we have \(f^\prime(x)<0\) and if \(x>c\) we have \(f^\prime(x)>0\). That is, \(f^\prime\) changes from negative to positive at \(c\), so \(c\) is the location of a local minimum by Proposition 25.4.
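As a small illustration, the test can be phrased as a function of the second derivative alone. A minimal Python sketch; the name `classify` is hypothetical, the example \(f(x)=x^3-3x\) (critical points \(\pm1\), \(f^{\prime\prime}(x)=6x\)) is an illustrative choice, and returning `None` when \(f^{\prime\prime}(c)=0\) reflects that Theorem 25.3 says nothing in that case (compare \(x^3\) from Example 25.2).

```python
def classify(second_derivative, c):
    # Second derivative test (Theorem 25.3) at a critical point c with f'(c) = 0.
    # Returns None when f''(c) = 0, where the test is inconclusive.
    value = second_derivative(c)
    if value > 0:
        return "local min"
    if value < 0:
        return "local max"
    return None

def f2(x):
    # f''(x) for the illustrative f(x) = x^3 - 3x
    return 6 * x

print([classify(f2, c) for c in (-1, 1)])   # ['local max', 'local min']
```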
25.4 \(\blacklozenge\) L’Hospital’s Rule
L’Hospital’s rule is a very convenient trick for computing tricky limits in calculus: it tells us that when we are trying to evaluate the limit of a quotient of continuous functions and ‘plugging in’ yields the undefined expression \(0/0\) we can attempt to find the limit’s value by differentiating the numerator and denominator, and trying again. Precisely:
Theorem 25.4 (L’Hospital’s Rule) Let \(f\) and \(g\) be continuous functions on an interval containing \(a\), and assume that both \(f\) and \(g\) are differentiable on this interval, with the possible exception of the point \(a\).
Then if \(f(a)=g(a)=0\) and \(g^\prime(x)\neq 0\) for all \(x\neq a\), \[\lim_{x\to a}\frac{f^\prime(x)}{g^\prime(x)}=L\hspace{1cm}\textrm{implies}\hspace{1cm}\lim_{x\to a}\frac{f(x)}{g(x)}=L\]
Proof (Sketch).
- Show that \(g(x)\neq 0\) for every \(x\neq a\) in the interval (if \(g(x)=0=g(a)\), Rolle’s theorem would produce a zero of \(g^\prime\) between \(a\) and \(x\)), so that for any such \(x\) \[\frac{f(x)}{g(x)}=\frac{f(x)-f(a)}{g(x)-g(a)}\]
- For any \(x\neq a\), apply Rolle’s theorem to \(h(t)=f(t)g(x)-g(t)f(x)\) on the interval between \(a\) and \(x\) (note \(h(a)=h(x)=0\)) to get a point \(c\) strictly between \(a\) and \(x\) with \(f^\prime(c)g(x)=g^\prime(c)f(x)\), that is, \(\frac{f(x)}{g(x)}=\frac{f^\prime(c)}{g^\prime(c)}\). It matters that the same point \(c\) appears in numerator and denominator: our hypothesis only controls \(f^\prime/g^\prime\) evaluated at a single point.
- Choose a sequence \(x_n\to a\) with \(x_n\neq a\): for each \(x_n\), the above furnishes a point \(c_n\) between \(a\) and \(x_n\). Show \(c_n\to a\) by squeezing.
- Use this to show that the sequence \(s_n = \frac{f^\prime(c_n)}{g^\prime(c_n)}\) converges to \(L\), using our assumption \(\lim_{x\to a}\frac{f^\prime}{g^\prime}=L\).
- Conclude that \(\frac{f(x_n)}{g(x_n)}=s_n\to L\); since the sequence \(x_n\to a\) was arbitrary, \(\lim_{x\to a}\frac{f(x)}{g(x)}=L\) as claimed.
Hint: Use the \(\epsilon\)-\(\delta\) definition of a functional limit and our assumption \(\lim_{x\to a}\frac{f^\prime(x)}{g^\prime(x)}=L\): for any \(\epsilon\), there’s a \(\delta\) where \(0<|x-a|<\delta\) implies this quotient is within \(\epsilon\) of \(L\). Since \(c_n\to a\) and \(c_n\neq a\), can you find an \(N\) beyond which \(f^\prime(c_n)/g^\prime(c_n)\) is always within \(\epsilon\) of \(L\)?
Exercise 25.3 Fill in the details of the above proof sketch.
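A numerical illustration (not a proof) of the conclusion: for \(f(x)=1-\cos x\) and \(g(x)=x^2\) at \(a=0\), both \(f/g\) and \(f^\prime/g^\prime=\sin x/(2x)\) approach \(1/2\). This minimal Python sketch uses these functions as illustrative choices; the sample points stop at \(10^{-3}\) because for much smaller \(x\) computing \(1-\cos x\) in floating point loses precision to cancellation.

```python
import math

def f(x):
    return 1 - math.cos(x)   # f(0) = 0

def g(x):
    return x ** 2            # g(0) = 0

def df(x):
    return math.sin(x)

def dg(x):
    return 2 * x             # nonzero for x != 0

for k in range(1, 4):
    x = 10.0 ** (-k)
    # both quotients should approach 1/2 as x -> 0
    print(x, f(x) / g(x), df(x) / dg(x))
```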
25.5 \(\bigstar\) Newton’s Method
Newton’s method is a recipe for numerically finding zeroes of a function \(f(x)\). It works iteratively, taking one guess for a zero and producing a (hopefully) better one, using the geometry of the derivative and linear approximations. The procedure is simple to derive: given a point \(a\), we can calculate the tangent line to \(f\) at \(a\)
\[\ell(x)=f(a)+f^\prime(a)(x-a)\]
and since this tangent line should be a good approximation of \(f\) near \(a\), if \(a\) is near a zero of \(f\) we can approximate this zero by solving not \(f(x)=0\) (which is hard, if \(f\) is a complicated function) but \(\ell(x)=0\) (which is easy, as \(\ell\) is linear). Doing so gives
\[x = a-\frac{f(a)}{f^\prime(a)}\] (assuming \(f^\prime(a)\neq 0\)).
Definition 25.3 (Newton Iteration) Let \(f\) be a differentiable function. Then Newton iteration is the recursive procedure \[x\mapsto x-\frac{f(x)}{f^\prime(x)}\]
Starting from some \(x_0\), this defines a recursive sequence \(x_{n+1}= x_n-f(x_n)/f^\prime(x_n)\), provided \(f^\prime(x_n)\neq 0\) at each step.
This is an extremely useful calculational procedure in practice, so long as you can cook up a function that is zero at whatever point you are interested in. To return to a familiar example, to calculate \(\sqrt{a}\) one might consider the function \(f(x)=x^2-a\), or to find a solution to \(\cos(x)=x\), one may consider \(g(x)=x-\cos(x)\).
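Newton iteration is short to implement. A minimal Python sketch, assuming the derivative is supplied by hand; the name `newton`, the step cap `steps`, and the stopping tolerance `tol` are illustrative assumptions, not part of Definition 25.3. It is applied to the two examples above, \(x^2-2\) and \(x-\cos x\) (whose derivative is \(1+\sin x\)).

```python
import math

def newton(f, df, x0, steps=20, tol=1e-15):
    # Newton iteration x -> x - f(x)/f'(x), stopping early once a step is tiny.
    x = x0
    for _ in range(steps):
        d = df(x)
        if d == 0:
            raise ZeroDivisionError("f'(x) = 0: Newton step undefined")
        step = f(x) / d
        x -= step
        if abs(step) < tol:
            break
    return x

# sqrt(2) as the zero of x^2 - 2
root2 = newton(lambda x: x**2 - 2, lambda x: 2 * x, x0=2.0)
# the solution of cos(x) = x as the zero of g(x) = x - cos(x)
cos_fixed_point = newton(lambda x: x - math.cos(x), lambda x: 1 + math.sin(x), x0=1.0)
print(root2, cos_fixed_point)
```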
Exercise 25.4 Show that the sequence of approximations from Newton’s method for \(\sqrt{2}\) starting at \(x_0=2\) is precisely the Babylonian sequence.
We already have several proofs that this sequence for \(\sqrt{2}\) converges, so we know that Newton’s method works as expected in at least one instance. But we need a general proof. Below we offer a proof for the special case of a simple zero: where the graph of \(f\) crosses the axis like that of \(x\), rather than touching it tangentially like that of \(x^2\).
Definition 25.4 (Simple Zero) A continuously differentiable function \(f\) has a simple zero at \(c\) if \(f(c)=0\) but \(f^\prime(c)\neq 0\).
Theorem 25.5 (Newton’s Method) Let \(f\) be a continuously twice-differentiable function with a simple zero at \(c\). Then there is some \(\delta\) such that applying Newton iteration to any starting point in \(I=(c-\delta,c+\delta)\) results in a sequence that converges to \(c\).
Proof. Our strategy is to show that there is an interval on which the Newton iteration \(N(x)=x-f(x)/f^\prime(x)\) is a contraction map.
Since \(c\) is a simple zero we know \(f^\prime(c)\neq 0\) and without loss of generality we take \(f^\prime(c)>0\). Since \(f\) is continuously twice differentiable \(f^\prime\) is also continuous, meaning there is some \(a>0\) where \(f^\prime\) is positive on the entire interval \((c-a,c+a)\). On this interval we may compute the derivative of the Newton map
\[N^\prime(x)=1-\frac{f^\prime f^\prime - f^{\prime\prime}f}{(f^\prime)^2}=\frac{ff^{\prime\prime}}{(f^\prime)^2}\]
Since \(f,f^\prime\) and \(f^{\prime\prime}\) are all continuous and \(f^\prime\) is nonzero on this interval, \(N^\prime\) is continuous. As \(f(c)=0\) we see \(N^\prime(c)=0\), so using continuity for any \(\epsilon>0\) there is some \(b>0\) where \(x\in(c-b,c+b)\) implies \(|N^\prime(x)|<\epsilon\).
Thus, choosing any \(\epsilon<1\) (with its corresponding \(b\)) and any positive \(\delta<\min\{a,b\}\), we’ve found a closed interval \([c-\delta,c+\delta]\) on which \(|N^\prime|<\epsilon<1\), so by Proposition 25.3 \(N\) is a contraction map on this interval. Moreover, \(N\) maps this interval into itself: since \(N(c)=c\), for any \(x\) in it we have \(|N(x)-c|=|N(x)-N(c)|\leq\epsilon|x-c|\leq\delta\). Hence iterating \(N\) from any starting point produces a sequence that converges to the unique fixed point of \(N\) (Theorem 10.2). This fixed point \(\star\) satisfies
\[N(\star)=\star-\frac{f(\star)}{f^\prime(\star)}=\star\] which after some algebra simplifies to \[f(\star)=0\]
Since \(f(c)=0\) and \(f^\prime(x)\) is positive on the entire interval by construction, \(f\) is increasing and so \(f(x)<0\) for \(x<c\) and \(f(x)>0\) for \(x>c\). That is, \(f\) has a unique zero on this interval, so \(\star =c\) and our sequence of Newton iterates converges to \(c\) as desired.
The structure of this proof tells us that Newton’s method is actually quite efficient: a contraction map with contraction factor \(\epsilon<1\) produces a Cauchy sequence that converges exponentially fast (with error shrinking like \(\epsilon^n\)). In our proof, continuity of \(N^\prime\) lets us choose any \(\epsilon<1\) and find an interval about \(c\) on which convergence is exponential with base \(\epsilon\). These intervals are nested, and so as the iterates get closer and closer to \(c\) the convergence of Newton’s method gets better and better: it is always exponentially fast, but the base of the exponential improves as we close in.
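To see the improving base numerically, one can track the errors \(|x_n-\sqrt{2}|\) of the Newton iterates for \(f(x)=x^2-2\) starting at \(x_0=2\), together with the ratios of consecutive errors. A minimal Python sketch; the helper name `newton_errors` and the number of steps are illustrative choices, and we stop after four steps because the error then approaches floating-point precision.

```python
import math

def newton_errors(x0, n):
    # Newton iterates for f(x) = x^2 - 2, i.e. x -> x - (x^2 - 2)/(2x),
    # returning the errors |x_k - sqrt(2)| for k = 0, ..., n.
    x, errors = x0, []
    for _ in range(n + 1):
        errors.append(abs(x - math.sqrt(2)))
        x = x - (x * x - 2) / (2 * x)
    return errors

errs = newton_errors(2.0, 4)
ratios = [errs[k + 1] / errs[k] for k in range(len(errs) - 1)]
print(errs)
print(ratios)   # the contraction factor shrinks as the iterates approach sqrt(2)
```

The ratios shrink rapidly (roughly \(0.15\), \(0.03\), \(10^{-3}\), \(10^{-6}\)), matching the picture of a contraction factor that improves as the iterates close in on \(\sqrt{2}\).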
Exercise 25.5 Provide an alternative proof of Newton’s method when \(f\) is convex: if \(c\) is a simple zero and \(x_0>c\), show that the sequence of Newton iterates is a monotone decreasing sequence which is bounded below, and converges to \(c\) via Monotone Convergence.