<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>wj32</title>
	<atom:link href="http://wj32.org/wp/feed/" rel="self" type="application/rss+xml" />
	<link>http://wj32.org/wp</link>
	<description>information when you need it</description>
	<lastBuildDate>Thu, 23 May 2013 09:19:19 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>First-order ODEs, matrix exponentials, and det(exp)</title>
		<link>http://wj32.org/wp/2013/03/11/first-order-odes-matrix-exponentials-and-detexp/</link>
		<comments>http://wj32.org/wp/2013/03/11/first-order-odes-matrix-exponentials-and-detexp/#comments</comments>
		<pubDate>Mon, 11 Mar 2013 03:18:16 +0000</pubDate>
		<dc:creator>wj32</dc:creator>
				<category><![CDATA[Mathematics]]></category>

		<guid isPermaLink="false">http://wj32.org/wp/?p=1055</guid>
		<description><![CDATA[Last time we derived a formula for the derivative of the matrix exponential. Here we will be focusing instead on the expression $$D\exp(x)u=\exp(x)u=u\exp(x),$$ which holds whenever \(u\) commutes with \(x\). In this post, \(E\) denotes a real Banach space and &#8230; <a href="http://wj32.org/wp/2013/03/11/first-order-odes-matrix-exponentials-and-detexp/">Continue reading <span class="meta-nav">&#8594;</span></a><div class="crp_related"><h3>Related Posts:</h3><ul><li><a href="http://wj32.org/wp/2013/02/26/convex-functions-second-derivatives-and-hessian-matrices/"     class="crp_title">Convex functions, second derivatives and Hessian matrices</a></li><li><a href="http://wj32.org/wp/2013/02/21/differentiation-done-correctly-1-the-derivative/"     class="crp_title">Differentiation done correctly: 1. The derivative</a></li><li><a href="http://wj32.org/wp/2013/02/24/differentiation-done-correctly-4-inverse-and-implicit-functions/"     class="crp_title">Differentiation done correctly: 4. Inverse and implicit&hellip;</a></li><li><a href="http://wj32.org/wp/2013/02/23/differentiation-done-correctly-3-partial-derivatives/"     class="crp_title">Differentiation done correctly: 3. Partial derivatives</a></li><li><a href="http://wj32.org/wp/2013/02/28/frechet-derivative-of-the-matrix-exponential-function/"     class="crp_title">Fréchet derivative of the (matrix) exponential function</a></li></ul></div>]]></description>
				<content:encoded><![CDATA[<p>Last time we <a href="http://wj32.org/wp/2013/02/28/frechet-derivative-of-the-matrix-exponential-function/">derived a formula</a> for the derivative of the matrix exponential. Here we will be focusing instead on the expression $$D\exp(x)u=\exp(x)u=u\exp(x),$$ which holds whenever \(u\) commutes with \(x\). In this post, \(E\) denotes a real Banach space and \(L(E)\) denotes the space of linear operators on \(E\).</p>
<h2>ODEs</h2>
<p>The (matrix) exponential can be used to solve certain first-order linear systems of ordinary differential equations, including some with non-constant coefficients: not only can we solve \begin{align}x&#8217;(t)&#038;=x(t)+y(t) \\ y&#8217;(t)&#038;=x(t)+y(t),\end{align} but we can also solve \begin{align}x&#8217;(t)&#038;=tx(t)+y(t) \\ y&#8217;(t)&#038;=x(t)+ty(t).\end{align}</p>
<p><span id="more-1055"></span></p>
<p><strong>Theorem 1.</strong> <em>Let \(I\) be a connected open subset of \(\mathbb{R}\) (that is, an open interval), let \(f:I\to E\) be differentiable and suppose that \(f&#8217;(t)=A(t)f(t)+b(t)\) for all \(t\in I\), where \(A:I\to L(E)\) and \(b:I\to E\) are continuous. Assume that \(A(s)A(t)=A(t)A(s)\) for all \(s,t\in I\). Choose any \(a\in I\). Then there exists some \(c\in E\) (namely \(c=f(a)\)) such that $$<br />
f(t)=e^{\widehat{A}(t)} \left( c+\int_a^t e^{-\widehat{A}(s)}b(s)\,ds \right),<br />
$$ where $$<br />
\widehat{A}(t)=\int_a^t A(s)\,ds.<br />
$$</em></p>
<p><em>Proof.</em> With \(a\) as in the statement, let $$<br />
g(t)=e^{-\widehat{A}(t)}f(t)-\int_a^t e^{-\widehat{A}(s)}b(s)\,ds.<br />
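$$ Since \(A\) is continuous, \(\widehat{A}&#8217;(t)=A(t)\), and \(A(t)\) commutes with \(\widehat{A}(t)\) because it commutes with every \(A(s)\). Hence, by the chain rule and the formula at the start of this post, the derivative of \(t\mapsto e^{-\widehat{A}(t)}\) is \(e^{-\widehat{A}(t)}(-\widehat{A}&#8217;(t))\), and so \begin{align}<br />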
g&#8217;(t) &#038;= e^{-\widehat{A}(t)}(-\widehat{A}&#8217;(t))f(t)+e^{-\widehat{A}(t)}f&#8217;(t)-e^{-\widehat{A}(t)}b(t) \\<br />
&#038;= -e^{-\widehat{A}(t)}A(t)f(t)+e^{-\widehat{A}(t)}(A(t)f(t)+b(t))-e^{-\widehat{A}(t)}b(t) \\<br />
&#038;= 0<br />
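\end{align} Therefore \(g\) is constant, since \(I\) is connected (see <a href="http://wj32.org/wp/2013/02/21/differentiation-done-correctly-1-the-derivative/#id-17">this theorem</a>). Writing \(c\) for its value, so that \(c=g(a)=f(a)\), and multiplying both sides of \(g(t)=c\) by \(e^{\widehat{A}(t)}=(e^{-\widehat{A}(t)})^{-1}\) gives the stated formula. \(\square\)</p>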
<p>Suppose we have a (non-homogeneous) linear system of ODEs with <em>constant</em> coefficients. Then \(A(t)=A\) is constant and, taking \(a=0\) (assuming \(0\in I\)), \(\widehat{A}(t)=tA\), so $$<br />
f(t)=e^{tA} \left( c+\int_0^t e^{-sA}b(s)\,ds \right).<br />
$$ If the system is homogeneous, then \(b=0\) and we simply have $$<br />
f(t)=e^{tA}c.<br />
$$ On the other hand, if \(E=\mathbb{R}\), then Theorem 1 (with \(A=-P\) and \(b=Q\)) reduces to the &#8220;integrating factor&#8221; method for solving ODEs of the form $$<br />
x&#8217;(t)+P(t)x(t)=Q(t).<br />
$$</p>
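<p>As a quick numerical sanity check of the scalar case, here is a minimal pure-Python sketch (the helper names <code>simpson</code>, <code>integrating_factor_solution</code> and <code>rk4</code> are illustrative, not from any library). It evaluates the formula of Theorem 1 with \(A=-P\), \(b=Q\) and \(a=0\) by Simpson&#8217;s rule and compares it with a Runge-Kutta integration, for the example \(x&#8217;(t)+2tx(t)=t\), \(x(0)=2\), whose exact solution is \(x(t)=\tfrac12+\tfrac32e^{-t^2}\).</p>

```python
import math


def simpson(g, a, b, n=200):
    """Composite Simpson's rule for the integral of g over [a, b]; n is made even."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def integrating_factor_solution(P, Q, c, t, a=0.0):
    """Scalar case of Theorem 1 for x'(t) + P(t)x(t) = Q(t), i.e. A = -P and b = Q.

    Returns e^{Ahat(t)} * (c + int_a^t e^{-Ahat(s)} Q(s) ds), where Ahat(t) = -int_a^t P.
    """
    def Ahat(s):
        return -simpson(P, a, s)

    def integrand(s):
        return math.exp(-Ahat(s)) * Q(s)

    return math.exp(Ahat(t)) * (c + simpson(integrand, a, t))


def rk4(F, x0, t0, t1, steps=1000):
    """Classical fourth-order Runge-Kutta integration of x' = F(t, x) from t0 to t1."""
    h = (t1 - t0) / steps
    x, t = x0, t0
    for _ in range(steps):
        k1 = F(t, x)
        k2 = F(t + h / 2, x + h / 2 * k1)
        k3 = F(t + h / 2, x + h / 2 * k2)
        k4 = F(t + h, x + h * k3)
        x += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return x


def P(t):
    return 2 * t


def Q(t):
    return t


# Example: x' + 2t x = t with x(0) = 2, evaluated at t = 1.5.
formula = integrating_factor_solution(P, Q, 2.0, 1.5)
numeric = rk4(lambda t, x: Q(t) - P(t) * x, 2.0, 0.0, 1.5)
exact = 0.5 + 1.5 * math.exp(-1.5 ** 2)
print(formula, numeric, exact)
```

<p>The three printed values should agree to several decimal places. The nested Simpson sums are slow, but they keep the sketch free of dependencies.</p>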
<p><strong>Example.</strong> Consider the system \begin{align}<br />
x&#8217;(t) &#038;= 2tx(t)+y(t) \\<br />
y&#8217;(t) &#038;= x(t)+2ty(t),<br />
\end{align} which can be written as $$<br />
\begin{bmatrix}x&#8217;(t) \\ y&#8217;(t)\end{bmatrix} = \begin{bmatrix}2t &#038; 1 \\ 1 &#038; 2t\end{bmatrix}\begin{bmatrix}x(t) \\ y(t)\end{bmatrix} = A(t) \begin{bmatrix}x(t) \\ y(t)\end{bmatrix}.<br />
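$$ Since \(A(t)=2tI+J\), where \(J=\begin{bmatrix}0 &#038; 1 \\ 1 &#038; 0\end{bmatrix}\), we have \(A(t)A(s)=A(s)A(t)\) for all \(s,t\in\mathbb{R}\). Taking \(a=0\), we get $$\widehat{A}(t) = \begin{bmatrix}t^2 &#038; t \\ t &#038; t^2\end{bmatrix} = t^2I+tJ,$$ and since \(J^2=I\), we have \(e^{\widehat{A}(t)}=e^{t^2}(\cosh t\,I+\sinh t\,J)\). Therefore the general solution is \begin{align}<br />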
\begin{bmatrix}x(t) \\ y(t)\end{bmatrix} &#038;= e^{\widehat{A}(t)}\begin{bmatrix}C_1 \\ C_2\end{bmatrix} \\<br />
&#038;= \begin{bmatrix}<br />
\frac{1}{2}C_{1}(e^{(t+1)t}+e^{(t-1)t})+\frac{1}{2}C_{2}(e^{(t+1)t}-e^{(t-1)t}) \\<br />
\frac{1}{2}C_{1}(e^{(t+1)t}-e^{(t-1)t})+\frac{1}{2}C_{2}(e^{(t+1)t}+e^{(t-1)t})<br />
\end{bmatrix}.<br />
\end{align}</p>
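<p>To double-check the closed form, the following pure-Python sketch (function names are illustrative) evaluates it, compares a central-difference derivative with the right-hand side of the system, and confirms that at \(t=0\) the solution reduces to \((C_1,C_2)\).</p>

```python
import math


def closed_form(t, C1, C2):
    """General solution of the example system, as computed from e^{Ahat(t)}."""
    ep = math.exp((t + 1) * t)
    em = math.exp((t - 1) * t)
    x = 0.5 * C1 * (ep + em) + 0.5 * C2 * (ep - em)
    y = 0.5 * C1 * (ep - em) + 0.5 * C2 * (ep + em)
    return x, y


def rhs(t, x, y):
    """Right-hand side of the system x' = 2t x + y, y' = x + 2t y."""
    return 2 * t * x + y, x + 2 * t * y


def residual(t, C1, C2, h=1e-5):
    """Largest gap between a central-difference derivative of the solution and the RHS."""
    xp, yp = closed_form(t + h, C1, C2)
    xm, ym = closed_form(t - h, C1, C2)
    dx, dy = (xp - xm) / (2 * h), (yp - ym) / (2 * h)
    fx, fy = rhs(t, *closed_form(t, C1, C2))
    return max(abs(dx - fx), abs(dy - fy))


for t in (-1.0, -0.3, 0.0, 0.5, 1.0):
    print(t, residual(t, 1.0, -2.0))
```

<p>The residuals are limited only by the finite-difference error, which is consistent with the closed form solving the system exactly.</p>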
<h2>Determinant of the exponential</h2>
<p>We now assume that \(V\) is a finite-dimensional real vector space.</p>
<p><strong>Lemma 1.</strong> <em>Let \(U\) be the open set of invertible operators in \(L(V)\). Then \(\det:U\to\mathbb{R}\) is differentiable, and $$<br />
D\det(\tau)u=\det(\tau)\operatorname{tr}(\tau^{-1}u).<br />
$$</em></p>
<p><em>Proof.</em> Let \(\iota\) be the identity map on \(V\). After choosing a basis for \(V\), \(\det\) becomes a polynomial in the matrix entries, so it is differentiable. Let \(f(s)=\det(\tau+su)=\det(\tau)\det(\iota+s\tau^{-1}u)\). Then $$<br />
D\det(\tau)u=f&#8217;(0)=\det(\tau)\operatorname{tr}(\tau^{-1}u)<br />
$$ since \(\det(\iota+s\tau^{-1}u)\) is a polynomial in \(s\) whose coefficient of \(s\) is \(\operatorname{tr}(\tau^{-1}u)\). \(\square\)</p>
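<p>Lemma 1 can be checked numerically by comparing a central difference quotient of \(s\mapsto\det(\tau+su)\) at \(s=0\) with \(\det(\tau)\operatorname{tr}(\tau^{-1}u)\). The sketch below does this for one arbitrarily chosen invertible \(3\times 3\) matrix, using hand-written Gaussian elimination so that only the Python standard library is needed; all helper names are illustrative.</p>

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in m]
    n = len(a)
    d = 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(a[r][i]))
        if a[p][i] == 0.0:
            return 0.0
        if p != i:
            a[i], a[p] = a[p], a[i]
            d = -d  # a row swap flips the sign of the determinant
        d *= a[i][i]
        for r in range(i + 1, n):
            factor = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= factor * a[i][c]
    return d


def inverse(m):
    """Inverse by Gauss-Jordan elimination (assumes m is invertible)."""
    n = len(m)
    a = [list(map(float, row)) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(a[r][i]))
        a[i], a[p] = a[p], a[i]
        pivot = a[i][i]
        a[i] = [v / pivot for v in a[i]]
        for r in range(n):
            if r != i:
                factor = a[r][i]
                a[r] = [vr - factor * vi for vr, vi in zip(a[r], a[i])]
    return [row[n:] for row in a]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def trace(m):
    return sum(m[i][i] for i in range(len(m)))


def ddet_formula(tau, u):
    """Right-hand side of Lemma 1: det(tau) tr(tau^{-1} u)."""
    return det(tau) * trace(matmul(inverse(tau), u))


def ddet_numeric(tau, u, h=1e-6):
    """Central difference (det(tau + h u) - det(tau - h u)) / (2h)."""
    plus = [[t + h * v for t, v in zip(rt, ru)] for rt, ru in zip(tau, u)]
    minus = [[t - h * v for t, v in zip(rt, ru)] for rt, ru in zip(tau, u)]
    return (det(plus) - det(minus)) / (2 * h)


tau = [[2.0, 1.0, 0.0], [0.5, 3.0, 1.0], [1.0, 0.0, 1.5]]
u = [[0.3, -1.0, 2.0], [1.0, 0.0, 0.5], [-0.7, 0.2, 1.0]]
print(ddet_formula(tau, u), ddet_numeric(tau, u))
```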
<p>As a simple application of Theorem 1, we prove a well-known formula:</p>
<p><strong>Theorem 2.</strong> <em>For all \(\tau\in L(V)\), we have $$<br />
\det(\exp(\tau))=\exp(\operatorname{tr}(\tau)).<br />
$$</em></p>
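<p><em>Proof.</em> Since \(\exp(\tau)\exp(-\tau)=\iota\), the exponential function is a map \(\exp:L(V)\to U\). Define \(\gamma:\mathbb{R}\to L(V)\) by \(\gamma(s)=s\tau\). Since \(\tau\) commutes with \(s\tau\), the chain rule, the formula at the start of this post and Lemma 1 give \begin{align}<br />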
(\det\circ\exp\circ\gamma)&#8217;(s) &#038;= D\det(\exp(s\tau))(D\exp(s\tau)\tau) \\<br />
&#038;= \det(\exp(s\tau))\operatorname{tr}(\exp(s\tau)^{-1}\exp(s\tau)\tau) \\<br />
&#038;= (\det\circ\exp\circ\gamma)(s)\operatorname{tr}(\tau).<br />
\end{align} Therefore $$<br />
(\det\circ\exp\circ\gamma)(s)=\exp(s\operatorname{tr}(\tau))<br />
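$$ by Theorem 1, applied with \(E=\mathbb{R}\), \(A(s)=\operatorname{tr}(\tau)\), \(b=0\) and \(a=0\), since \((\det\circ\exp\circ\gamma)(0)=\det(\iota)=1\). Setting \(s=1\) gives the result. \(\square\)</p>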
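<p>A small numerical illustration of Theorem 2, assuming that a truncated power series is an adequate approximation of \(\exp\) for a matrix of small norm (a production implementation would typically use scaling and squaring instead). The helper names are illustrative.</p>

```python
import math


def matmul(x, y):
    """Product of two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def expm(m, terms=60):
    """exp(m) by the truncated power series sum_k m^k / k! (adequate for small ||m||)."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # holds m^k / k!, starting with k = 0
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, m)]
        result = [[r + s for r, s in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def trace(m):
    return sum(m[i][i] for i in range(len(m)))


tau = [[0.5, -1.0, 0.3], [0.2, -0.4, 1.1], [-0.6, 0.8, 0.1]]
print(det3(expm(tau)), math.exp(trace(tau)))  # the two numbers should agree
```

<p>For a nilpotent matrix such as \(\begin{bmatrix}0 &#038; 1 &#038; 0 \\ 0 &#038; 0 &#038; 1 \\ 0 &#038; 0 &#038; 0\end{bmatrix}\), the trace is \(0\), and indeed \(\det(\exp(\tau))=1\).</p>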
]]></content:encoded>
			<wfw:commentRss>http://wj32.org/wp/2013/03/11/first-order-odes-matrix-exponentials-and-detexp/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Fréchet derivative of the (matrix) exponential function</title>
		<link>http://wj32.org/wp/2013/02/28/frechet-derivative-of-the-matrix-exponential-function/</link>
		<comments>http://wj32.org/wp/2013/02/28/frechet-derivative-of-the-matrix-exponential-function/#comments</comments>
		<pubDate>Thu, 28 Feb 2013 01:20:33 +0000</pubDate>
		<dc:creator>wj32</dc:creator>
				<category><![CDATA[Mathematics]]></category>

		<guid isPermaLink="false">http://wj32.org/wp/?p=1025</guid>
		<description><![CDATA[$$ D\exp(x)u = \int_0^1 e^{sx}ue^{(1-s)x}\,ds. $$ This intriguing formula expresses the derivative of the exponential map on a Banach algebra as an integral. In particular, using &#8220;matrix calculus&#8221; notation we have the formula $$ d\exp(X)= \int_0^1 e^{sX}(dX)e^{(1-s)X}\,ds $$ when \(X\) &#8230; <a href="http://wj32.org/wp/2013/02/28/frechet-derivative-of-the-matrix-exponential-function/">Continue reading <span class="meta-nav">&#8594;</span></a><div class="crp_related"><h3>Related Posts:</h3><ul><li><a href="http://wj32.org/wp/2011/10/30/power-series-of-tanx-cotx-cscx/"     class="crp_title">Power series of tan(x), cot(x), csc(x)</a></li><li><a href="http://wj32.org/wp/2013/01/26/some-series-convergence-problems/"     class="crp_title">Some series convergence problems</a></li><li><a href="http://wj32.org/wp/2012/12/15/formula-for-the-circumference-of-an-ellipse/"     class="crp_title">Formula for the circumference of an ellipse</a></li><li><a href="http://wj32.org/wp/2013/02/22/differentiation-done-correctly-2-higher-derivatives/"     class="crp_title">Differentiation done correctly: 2. Higher derivatives</a></li><li><a href="http://wj32.org/wp/2013/02/23/differentiation-done-correctly-3-partial-derivatives/"     class="crp_title">Differentiation done correctly: 3. Partial derivatives</a></li></ul></div>]]></description>
				<content:encoded><![CDATA[<p>$$<br />
D\exp(x)u = \int_0^1 e^{sx}ue^{(1-s)x}\,ds.<br />
$$ This intriguing formula expresses the derivative of the exponential map on a Banach algebra as an integral. In particular, using &#8220;matrix calculus&#8221; notation we have the formula $$<br />
d\exp(X)= \int_0^1 e^{sX}(dX)e^{(1-s)X}\,ds<br />
$$ when \(X\) is a square matrix. As we&#8217;ll see, this is not too hard to prove.</p>
<p><span id="more-1025"></span></p>
<p>We will assume that all Banach algebras are unital.</p>
<p><strong>Definition.</strong> Let \(E\) be a Banach algebra. If \(x\in E\), the <strong>exponential</strong> of \(x\) is $$<br />
\exp(x)=e^x=\sum_{n=0}^\infty \frac{x^n}{n!},<br />
$$ which converges absolutely for all \(x\). Thus we have a map \(\exp:E\to E\), called the <strong>exponential function</strong>.</p>
<p>The usual rules for power series apply. In particular, we can differentiate term by term inside the radius of convergence, which is infinite for the exponential function. Before doing this, we need a lemma (the proof is at the end of the post).</p>
<p><strong>Lemma 1</strong> (Power rule). <em>Let \(E\) be a Banach algebra, let \(n\ge 0\), and let \(p_n:E\to E\) be the map defined by \(p_n(x)=x^n\). Then \(Dp_n(x)\) is the linear map given by $$<br />
Dp_n(x)u=\sum_{k=0}^{n-1} x^kux^{n-k-1}.<br />
$$ In particular, if \(E\) is commutative then \(Dp_n(x)\) is given by $$<br />
Dp_n(x)u=nx^{n-1}u.<br />
$$</em></p>
<p>Applying this lemma to the power series for \(\exp\) gives $$<br />
D\exp(x)u=\sum_{n=1}^\infty \frac{1}{n!} \sum_{k=0}^{n-1}x^kux^{n-k-1}.\tag{*}<br />
$$ Notice that when \(u\) commutes with \(x\), the inner sum in (*) equals \(nx^{n-1}u\), so \(D\exp(x)u=\exp(x)u=u\exp(x)\). We also need another lemma (again, the proof is at the end of the post):</p>
<p><strong>Lemma 2.</strong> <em>For \(m,n\ge 0\), we have $$\int_0^1 s^m(1-s)^n\,ds=\frac{m!n!}{(m+n+1)!}.$$</em></p>
<h2>Evaluation</h2>
<p>Now we can evaluate the integral given at the beginning of the post. We have \begin{align}<br />
\int_0^1 e^{sx}ue^{(1-s)x}\,ds &#038;= \int_0^1 \sum_{m=0}^\infty\frac{s^m x^m}{m!}u\sum_{n=0}^\infty\frac{(1-s)^n x^n}{n!}\,ds \\<br />
&#038;= \sum_{m=0}^\infty\sum_{n=0}^\infty\frac{x^m u x^n}{m!n!}\int_0^1 s^m(1-s)^n\,ds \\<br />
&#038;= \sum_{m=0}^\infty\sum_{n=0}^\infty\frac{x^m u x^n}{(m+n+1)!},<br />
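\end{align} where the last step uses Lemma 2. Grouping the terms with \(m+n+1=N\) and writing \(k=m\) (so that \(n=N-k-1\)) shows that this is exactly (*). (The interchange of sum and integral and the rearrangements are valid because the series converge absolutely, and uniformly for \(s\in[0,1]\).) This proves our formula!</p>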
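<p>The formula can also be checked numerically for a pair of non-commuting \(2\times 2\) matrices. The following pure-Python sketch (illustrative names; \(\exp\) approximated by a truncated power series, which is adequate for matrices of small norm) compares a central difference quotient of \(\exp\) in the direction \(u\) with Simpson&#8217;s rule applied to \(\int_0^1 e^{sx}ue^{(1-s)x}\,ds\).</p>

```python
def matmul(x, y):
    """Product of square matrices stored as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def lincomb(a, b, beta):
    """Entrywise a + beta * b."""
    return [[p + beta * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, s):
    """Entrywise s * a."""
    return [[s * v for v in row] for row in a]


def expm(m, terms=40):
    """exp(m) by the truncated power series sum_k m^k / k! (adequate for small ||m||)."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # holds m^k / k!, starting with k = 0
    for k in range(1, terms):
        term = scale(matmul(term, m), 1.0 / k)
        result = lincomb(result, term, 1.0)
    return result


def dexp_finite_difference(x, u, h=1e-6):
    """Central difference (exp(x + h u) - exp(x - h u)) / (2h), approximating Dexp(x)u."""
    plus, minus = expm(lincomb(x, u, h)), expm(lincomb(x, u, -h))
    return scale(lincomb(plus, minus, -1.0), 1.0 / (2 * h))


def dexp_integral(x, u, n=100):
    """Composite Simpson's rule (n even) for int_0^1 e^{sx} u e^{(1-s)x} ds."""
    def integrand(s):
        return matmul(matmul(expm(scale(x, s)), u), expm(scale(x, 1.0 - s)))

    h = 1.0 / n
    total = lincomb(integrand(0.0), integrand(1.0), 1.0)
    for i in range(1, n):
        total = lincomb(total, integrand(i * h), 4.0 if i % 2 else 2.0)
    return scale(total, h / 3)


x = [[0.3, 1.0], [-0.5, 0.2]]
u = [[0.0, 1.0], [2.0, -1.0]]  # does not commute with x
print(dexp_finite_difference(x, u))
print(dexp_integral(x, u))
```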
<h2>Proofs</h2>
<p><em>Proof of Lemma 1.</em> We use induction on \(n\). The case \(n=0\) is clear, so suppose that the result holds for \(n-1\). Since \(p_n(x)=xp_{n-1}(x)\), the <a href="http://wj32.org/wp/2013/02/21/differentiation-done-correctly-1-the-derivative/#id-5">product rule</a> shows that \(Dp_n(x)\) maps \(u\) to \begin{align}<br />
up_{n-1}(x)+xDp_{n-1}(x)u &#038;= ux^{n-1}+x\sum_{k=0}^{n-2}x^kux^{n-k-2} \\<br />
&#038;= ux^{n-1}+\sum_{k=1}^{n-1}x^kux^{n-k-1} \\<br />
&#038;= \sum_{k=0}^{n-1} x^kux^{n-k-1}.<br />
\end{align} \(\square\)</p>
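<p>Lemma 1 is easy to test on matrices, where \(x\) and \(u\) typically do not commute. This sketch (illustrative helper names, standard library only) compares the sum in Lemma 1 with a central difference quotient of \(p_n\), and also checks that the sum collapses to \(nx^n\) when \(u=x\).</p>

```python
def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def identity(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]


def matpow(x, n):
    """x^n for n >= 0 (x^0 is the identity, as in a unital algebra)."""
    result = identity(len(x))
    for _ in range(n):
        result = matmul(result, x)
    return result


def lincomb(a, b, beta):
    """Entrywise a + beta * b."""
    return [[p + beta * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def dpow_formula(x, u, n):
    """Lemma 1: Dp_n(x)u = sum_{k=0}^{n-1} x^k u x^{n-k-1} (the zero matrix if n = 0)."""
    total = [[0.0] * len(x) for _ in x]
    for k in range(n):
        total = lincomb(total, matmul(matmul(matpow(x, k), u), matpow(x, n - k - 1)), 1.0)
    return total


def dpow_finite_difference(x, u, n, h=1e-6):
    """Central difference (p_n(x + h u) - p_n(x - h u)) / (2h)."""
    plus, minus = matpow(lincomb(x, u, h), n), matpow(lincomb(x, u, -h), n)
    return [[(p - m) / (2 * h) for p, m in zip(rp, rm)] for rp, rm in zip(plus, minus)]


x = [[0.3, 1.0], [-0.5, 0.2]]
u = [[0.0, 1.0], [2.0, -1.0]]
print(dpow_formula(x, u, 5))
print(dpow_finite_difference(x, u, 5))
```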
<p><em>Proof of Lemma 2.</em> We use induction on \(n\). The case \(n=0\) is obvious, since \(\int_0^1 s^m\,ds=\frac{1}{m+1}\). Suppose the formula holds for \(n-1\) and all \(m\ge 0\). We have \begin{align}<br />
\int_0^1 s^m(1-s)^n\,ds &#038;= \int_0^1 s^m(1-s)^{n-1}(1-s)\,ds \\<br />
&#038;= \int_0^1 s^m(1-s)^{n-1}\,ds- \int_0^1 s^{m+1}(1-s)^{n-1}\,ds \\<br />
&#038;= \frac{m!(n-1)!}{(m+n)!} - \frac{(m+1)!(n-1)!}{(m+n+1)!} \\<br />
&#038;= \frac{m!(n-1)!(m+n+1)-(m+1)!(n-1)!}{(m+n+1)!} \\<br />
&#038;= \frac{m!n!}{(m+n+1)!}.<br />
\end{align} \(\square\)</p>
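<p>Lemma 2 can be verified exactly for small \(m,n\) with rational arithmetic: expanding \((1-s)^n\) by the binomial theorem gives \(\int_0^1 s^m(1-s)^n\,ds=\sum_{j=0}^n \binom{n}{j}\frac{(-1)^j}{m+j+1}\). A short Python sketch using <code>fractions.Fraction</code> compares this sum with \(m!n!/(m+n+1)!\); the function names are illustrative.</p>

```python
from fractions import Fraction
from math import comb, factorial


def beta_integral_exact(m, n):
    """Exact value of int_0^1 s^m (1-s)^n ds, found by expanding (1-s)^n.

    (1-s)^n = sum_j C(n, j) (-1)^j s^j, and int_0^1 s^(m+j) ds = 1/(m+j+1).
    """
    return sum(Fraction((-1) ** j * comb(n, j), m + j + 1) for j in range(n + 1))


def lemma2_value(m, n):
    """Right-hand side of Lemma 2: m! n! / (m+n+1)!."""
    return Fraction(factorial(m) * factorial(n), factorial(m + n + 1))


for m in range(4):
    print([str(beta_integral_exact(m, n)) for n in range(4)])
```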
]]></content:encoded>
			<wfw:commentRss>http://wj32.org/wp/2013/02/28/frechet-derivative-of-the-matrix-exponential-function/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Convex functions, second derivatives and Hessian matrices</title>
		<link>http://wj32.org/wp/2013/02/26/convex-functions-second-derivatives-and-hessian-matrices/</link>
		<comments>http://wj32.org/wp/2013/02/26/convex-functions-second-derivatives-and-hessian-matrices/#comments</comments>
		<pubDate>Mon, 25 Feb 2013 22:45:21 +0000</pubDate>
		<dc:creator>wj32</dc:creator>
				<category><![CDATA[Mathematics]]></category>

		<guid isPermaLink="false">http://wj32.org/wp/?p=988</guid>
		<description><![CDATA[In single variable calculus, a twice differentiable function \(f:(a,b)\to\mathbb{R}\) is convex if and only if \(f^{\prime\prime}(x)\ge 0\) for all \(x\in(a,b)\). It is not too hard to extend this result to functions defined on more general spaces: Theorem. Let \(A\subseteq\mathbb{R}^n\) be &#8230; <a href="http://wj32.org/wp/2013/02/26/convex-functions-second-derivatives-and-hessian-matrices/">Continue reading <span class="meta-nav">&#8594;</span></a><div class="crp_related"><h3>Related Posts:</h3><ul><li><a href="http://wj32.org/wp/2013/02/25/differentiation-done-correctly-5-maxima-and-minima/"     class="crp_title">Differentiation done correctly: 5. Maxima and minima</a></li><li><a href="http://wj32.org/wp/2013/02/22/differentiation-done-correctly-2-higher-derivatives/"     class="crp_title">Differentiation done correctly: 2. Higher derivatives</a></li><li><a href="http://wj32.org/wp/2013/02/21/differentiation-done-correctly-1-the-derivative/"     class="crp_title">Differentiation done correctly: 1. The derivative</a></li><li><a href="http://wj32.org/wp/2013/03/11/first-order-odes-matrix-exponentials-and-detexp/"     class="crp_title">First-order ODEs, matrix exponentials, and det(exp)</a></li><li><a href="http://wj32.org/wp/2013/02/23/differentiation-done-correctly-3-partial-derivatives/"     class="crp_title">Differentiation done correctly: 3. Partial derivatives</a></li></ul></div>]]></description>
				<content:encoded><![CDATA[<p>In single variable calculus, a twice differentiable function \(f:(a,b)\to\mathbb{R}\) is convex if and only if \(f^{\prime\prime}(x)\ge 0\) for all \(x\in(a,b)\). It is not too hard to extend this result to functions defined on more general spaces:</p>
<p><span id="more-988"></span></p>
<p><strong>Theorem.</strong> Let \(A\subseteq\mathbb{R}^n\) be a convex open set and let \(f:A\to\mathbb{R}\). Suppose that \(f^{\prime\prime}\) exists on \(A\). Then \(f\) is convex if and only if \(f^{\prime\prime}(x)\) is positive semidefinite for all \(x\in A\).</p>
<h2>Hessian matrices</h2>
<p>Combining the previous theorem with the <a href="http://wj32.org/wp/2013/02/25/differentiation-done-correctly-5-maxima-and-minima/#id-41">higher derivative test</a> for Hessian matrices gives us the following result for functions defined on convex open subsets of \(\mathbb{R}^n\):</p>
<p>Let \(A\subseteq\mathbb{R}^n\) be a convex open set and let \(f:A\to\mathbb{R}\) be twice differentiable. Write \(H(x)\) for the Hessian matrix of \(f\) at \(x\in A\).</p>
<ol>
<li>If \(f&#8217;(x)=0\) and \(H(x)\) is positive definite, then \(f\) has a strict local minimum at \(x\).</li>
<li>If \(f&#8217;(x)=0\) and \(H(x)\) is negative definite, then \(f\) has a strict local maximum at \(x\).</li>
<li>If \(f&#8217;(x)=0\) and \(H(x)\) has both positive and negative eigenvalues, then \(f\) does not have a local minimum or a local maximum at \(x\). That is, \(f\) has a saddle point at \(x\).</li>
<li>If \(H(x)\) is positive semidefinite for <em>all</em> \(x\in A\), then \(f\) is convex and has a strict <em>global</em> minimum at any \(x\) for which \(f&#8217;(x)=0\) and \(H(x)\) is positive <em>definite</em>.</li>
<li>If \(H(x)\) is negative semidefinite for <em>all</em> \(x\in A\), then \(f\) is concave and has a strict <em>global</em> maximum at any \(x\) for which \(f&#8217;(x)=0\) and \(H(x)\) is negative <em>definite</em>.</li>
</ol>
<p>Since the determinant of a matrix is the product of its eigenvalues, we also have this special case:</p>
<p>Let \(A\subseteq\mathbb{R}^2\) be a convex open set and let \(f:A\to\mathbb{R}\) be twice differentiable. Write \(H(x)\) for the Hessian matrix of \(f\) at \(x\in A\): $$H(x)=\begin{bmatrix} a &#038; b \\ b &#038; d\end{bmatrix}.$$ (Note that \(a,b,d\) are functions of \(x\).)</p>
<ol>
<li>If \(f&#8217;(x)=0\), \(ad-b^2 > 0\) and \(a > 0\), then \(f\) has a strict local minimum at \(x\).</li>
<li>If \(f&#8217;(x)=0\), \(ad-b^2 > 0\) and \(a < 0\), then \(f\) has a strict local maximum at \(x\).</li>
<li>If \(f&#8217;(x)=0\) and \(ad-b^2 < 0\), then \(f\) has a saddle point at \(x\).</li>
<li>If \(ad-b^2 \ge 0\) and \(a,d \ge 0\) for <em>all</em> \(x\in A\), then \(f\) is convex and has a strict <em>global</em> minimum at any \(x\) for which \(f&#8217;(x)=0\), \(ad-b^2 > 0\) and \(a > 0\).</li>
<li>If \(ad-b^2 \ge 0\) and \(a,d \le 0\) for <em>all</em> \(x\in A\), then \(f\) is concave and has a strict <em>global</em> maximum at any \(x\) for which \(f&#8217;(x)=0\), \(ad-b^2 > 0\) and \(a < 0\).</li>
</ol>
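<p>The \(2\times 2\) tests above are simple enough to code directly. Here is a minimal sketch (the function names are illustrative) that takes the entries \(a,b,d\) of \(H(x)\) at a point where \(f&#8217;(x)=0\) and applies tests 1 to 3, together with the positive semidefiniteness criterion used in test 4.</p>

```python
def classify_critical_point(a, b, d):
    """Tests 1-3 at a point with f'(x) = 0, where H(x) has entries a, b (twice), d."""
    det = a * d - b * b
    if det > 0:
        # Both eigenvalues have the same sign, and then a (which is nonzero) shares it.
        return "strict local minimum" if a > 0 else "strict local maximum"
    if det < 0:
        return "saddle point"  # eigenvalues of opposite signs
    return "inconclusive"  # ad - b^2 = 0: these tests say nothing


def is_positive_semidefinite(a, b, d):
    """Criterion from test 4: ad - b^2 >= 0 and a, d >= 0."""
    return a * d - b * b >= 0 and a >= 0 and d >= 0


print(classify_critical_point(2, 0, -2))  # Hessian of x^2 - y^2 at the origin
```

<p>For example, \(f(x,y)=x^2-y^2\) has \(H=\begin{bmatrix}2 &#038; 0 \\ 0 &#038; -2\end{bmatrix}\) at the origin, a saddle point, while \(f(x,y)=x^4+y^4\) has \(H=0\) there, so these tests are inconclusive even though the origin is a strict global minimum.</p>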
<h2>Proof of the theorem</h2>
<p>We will prove a slightly more general statement.</p>
<p>Let \(E\) be a Banach space, let \(A\subseteq E\) be a convex open set, and let \(f:A\to\mathbb{R}\). We say that \(f\) is <strong>convex</strong> if $$<br />
f(x+\lambda(y-x)) \le f(x)+\lambda(f(y)-f(x))<br />
$$ for all \(x,y\in A\) and \(\lambda\in(0,1)\).</p>
<p><strong>Theorem.</strong> Let \(A\subseteq E\) be a convex open set and let \(f:A\to\mathbb{R}\). Suppose that \(f^{\prime\prime}\) exists on \(A\). Then \(f\) is convex if and only if \(f^{\prime\prime}(x)\) is positive semidefinite for all \(x\in A\).</p>
<p>Here, \(f^{\prime\prime}(x):E\times E\to\mathbb{R}\) <a href="http://wj32.org/wp/2013/02/22/differentiation-done-correctly-2-higher-derivatives/">is a bilinear form</a> and \(f^{\prime\prime}(x)\) is said to be <strong>positive semidefinite</strong> if \(f^{\prime\prime}(x)(h,h)\ge 0\) for all \(h\in E\).</p>
<p>The idea of the proof is quite simple: restrict \(f\) to line segments that lie in \(A\) and use the single variable case mentioned at the start of this post. In order to do this, we need a formula for the second derivative of a composition of maps between Banach spaces: \begin{align}<br />
D^2(g\circ f)(x)(u,v) &#038;= D^2 g(f(x))(Df(x)(u),Df(x)(v)) \\<br />
&#038;\qquad + Dg(f(x))(D^2 f(x)(u,v)).<br />
\end{align} The proof, which is just a long computation, is included at the end of this post. Note that when \(f,g\) are maps from \(\mathbb{R}\) to \(\mathbb{R}\), we recover the formula $$<br />
(g\circ f)^{\prime\prime}(x)=g^{\prime\prime}(f(x))[f'(x)]^2+g&#8217;(f(x))f^{\prime\prime}(x).<br />
$$</p>
<p><em>Proof.</em> Let \(x,y\in A\) and define \(\gamma:(-\varepsilon,1+\varepsilon)\to A\) by \(\gamma(\lambda)=x+\lambda(y-x)\), where \(\varepsilon > 0\) is small enough that \(\gamma\) takes values in \(A\) (possible because \(A\) is open and convex). Then \(D\gamma(\lambda)(1)=y-x\) and \(D^2\gamma(\lambda)(1,1)=0\). Using the above formula, we get \begin{align}<br />
(f\circ\gamma)^{\prime\prime}(\lambda) &#038;= D^2(f\circ\gamma)(\lambda)(1,1) \\<br />
&#038;= f^{\prime\prime}(\gamma(\lambda))(y-x,y-x).\tag{*}<br />
\end{align} If \(f^{\prime\prime}(z)\) is positive semidefinite for all \(z\in A\) then \((f\circ\gamma)^{\prime\prime}(\lambda)\ge 0\) for all \(\lambda\), so \(f\circ\gamma\) is convex. Then for all \(\lambda\in(0,1)\), $$<br />
(f\circ\gamma)(0+\lambda(1-0)) \le (f\circ\gamma)(0)+\lambda[(f\circ\gamma)(1)-(f\circ\gamma)(0)]<br />
$$ and $$<br />
f(x+\lambda(y-x)) \le f(x)+\lambda(f(y)-f(x)),<br />
$$ which proves that \(f\) is convex. Conversely, suppose that \(f\) is convex. Let \(z\in A\) and choose \(r > 0\) so that the open ball of radius \(r\) around \(z\) is contained in \(A\). Let \(|u| < r\) and set \(x=z-u/2\) and \(y=z+u/2\) in (*) so that \begin{align}<br />
f^{\prime\prime}(z)(u,u) &#038;= f^{\prime\prime}(\gamma(1/2))(y-x,y-x) \\<br />
&#038;= (f\circ\gamma)^{\prime\prime}(1/2) \\<br />
&#038;\ge 0<br />
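\end{align} since \(f\circ\gamma\) is convex. Because \(f^{\prime\prime}(z)(tu,tu)=t^2f^{\prime\prime}(z)(u,u)\) for all \(t\in\mathbb{R}\), the inequality extends from \(|u| < r\) to all \(u\in E\). This shows that \(f^{\prime\prime}(z)\) is positive semidefinite for all \(z\in A\). \(\square\)</p>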
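<p>Formula (*) from the proof can be illustrated numerically. The sketch below assumes the convex log-sum-exp function \(f(p_1,p_2)=\log(e^{p_1}+e^{p_2})\), whose second derivative is \(f^{\prime\prime}(p)(h,h)=w_1w_2(h_1-h_2)^2\) with \(w_i=e^{p_i}/(e^{p_1}+e^{p_2})\). It compares a second difference quotient of \(f\circ\gamma\) with \(f^{\prime\prime}(\gamma(\lambda))(y-x,y-x)\); the names are illustrative.</p>

```python
import math


def f(p):
    """Log-sum-exp f(p1, p2) = log(e^p1 + e^p2), a convex function on R^2."""
    return math.log(math.exp(p[0]) + math.exp(p[1]))


def hessian_form(p, h):
    """f''(p)(h, h) = w1 w2 (h1 - h2)^2, where (w1, w2) is the softmax of p."""
    w1 = math.exp(p[0]) / (math.exp(p[0]) + math.exp(p[1]))
    w2 = 1.0 - w1
    return w1 * w2 * (h[0] - h[1]) ** 2


def gamma(x, y, lam):
    """The segment map lambda -> x + lambda (y - x) from the proof."""
    return (x[0] + lam * (y[0] - x[0]), x[1] + lam * (y[1] - x[1]))


def second_difference(x, y, lam, step=1e-4):
    """Central second difference of f o gamma at lam, approximating (f o gamma)''(lam)."""
    def g(t):
        return f(gamma(x, y, t))

    return (g(lam + step) - 2 * g(lam) + g(lam - step)) / step ** 2


x, y = (0.3, -1.2), (2.0, -0.5)
direction = (y[0] - x[0], y[1] - x[1])
for lam in (0.0, 0.25, 0.5, 1.0):
    print(lam, second_difference(x, y, lam), hessian_form(gamma(x, y, lam), direction))
```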
<h2>The second derivative formula</h2>
<p><a name="lemma"></a><br />
<strong>Lemma</strong> (Chain rule for the second derivative). <em>Let \(E,F,G\) be Banach spaces. Let \(A\subseteq E\) and \(B\subseteq F\) be open sets. Let \(f:A\to F\) and \(g:B\to G\) with \(f(A)\subseteq B\). If \(D^2 f\) exists on \(A\) and \(D^2 g\) exists on \(B\), then \(D^2(g\circ f)\) exists on \(A\) and \begin{align}<br />
D^2(g\circ f)(x)(u,v) &#038;= D^2 g(f(x))(Df(x)(u),Df(x)(v)) \\<br />
&#038;\qquad + Dg(f(x))(D^2 f(x)(u,v)).<br />
\end{align}</em></p>
<p><em>Proof.</em> We can write \(D(g\circ f)=c\circ d\circ e\), where<br />
\begin{align}<br />
c:L(F,G)\times L(E,F) &#038; \to L(E,G)\\<br />
(\lambda,\mu) &#038; \mapsto\lambda\circ\mu;\\<br />
d:A\times A &#038; \to L(F,G)\times L(E,F)\\<br />
(x,y) &#038; \mapsto((Dg\circ f)(x),Df(y));\\<br />
e:A &#038; \to A\times A\\<br />
x &#038; \mapsto(x,x).<br />
\end{align} Note that \(c\) is a continuous bilinear map and \(e\) is the restriction of the continuous linear map \(u\mapsto(u,u)\) on \(E\), so \(De(x)u=(u,u)\) for every \(x\in A\). We compute \begin{align}<br />
D^{2}(g\circ f)(x) &#038; =Dc((d\circ e)(x))\circ Dd(e(x))\circ De(x)\\<br />
 &#038; =Dc(Dg(f(x)),Df(x))\circ Dd(x,x)\circ e.<br />
\end{align} Now $$<br />
Dd(x,y)(u,v)=(D^{2}g(f(x))(Df(x)(u)),D^{2}f(y)(v)),<br />
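$$ so, since the derivative of the continuous bilinear map \(c\) is given by \(Dc(\lambda,\mu)(\alpha,\beta)=\alpha\circ\mu+\lambda\circ\beta\),<br />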
\begin{align}<br />
D^{2}(g\circ f)(x)(u) &#038; =Dc(Dg(f(x)),Df(x))(Dd(x,x)(u,u))\\<br />
 &#038; =Dc(Dg(f(x)),Df(x))(D^{2}g(f(x))(Df(x)(u)),D^{2}f(x)(u))\\<br />
 &#038; =D^{2}g(f(x))(Df(x)(u))\circ Df(x)\\<br />
 &#038; \qquad+Dg(f(x))\circ D^{2}f(x)(u)<br />
\end{align} and \begin{align}<br />
D^{2}(g\circ f)(x)(u)(v) &#038; =D^{2}g(f(x))(Df(x)(u))(Df(x)(v))\\<br />
 &#038; \qquad+Dg(f(x))(D^{2}f(x)(u)(v)).<br />
\end{align} \(\square\)</p>
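<p>The lemma can be sanity-checked in finite dimensions. In this sketch, \(f(x_1,x_2)=(x_1x_2,\,x_1+x_2^2)\) and \(g(y_1,y_2)=\sin y_1+y_1y_2\) are arbitrary illustrative choices with hand-computed first and second derivatives; the right-hand side of the lemma is compared with a mixed central difference of \(g\circ f\) in the directions \(u\) and \(v\).</p>

```python
import math


def f(x):
    return (x[0] * x[1], x[0] + x[1] ** 2)


def Df(x, u):
    return (x[1] * u[0] + x[0] * u[1], u[0] + 2 * x[1] * u[1])


def D2f(x, u, v):
    return (u[0] * v[1] + u[1] * v[0], 2 * u[1] * v[1])


def g(y):
    return math.sin(y[0]) + y[0] * y[1]


def Dg(y, w):
    return (math.cos(y[0]) + y[1]) * w[0] + y[0] * w[1]


def D2g(y, w, z):
    return -math.sin(y[0]) * w[0] * z[0] + w[0] * z[1] + w[1] * z[0]


def chain_rule_second(x, u, v):
    """D^2 g(f(x))(Df(x)u, Df(x)v) + Dg(f(x))(D^2 f(x)(u, v)), as in the lemma."""
    y = f(x)
    return D2g(y, Df(x, u), Df(x, v)) + Dg(y, D2f(x, u, v))


def mixed_difference(x, u, v, h=1e-4):
    """Finite-difference estimate of D^2(g o f)(x)(u, v)."""
    def phi(a, b):
        return g(f((x[0] + a * u[0] + b * v[0], x[1] + a * u[1] + b * v[1])))

    return (phi(h, h) - phi(h, -h) - phi(-h, h) + phi(-h, -h)) / (4 * h * h)


x, u, v = (0.7, -0.4), (1.0, 0.5), (-0.3, 2.0)
print(chain_rule_second(x, u, v), mixed_difference(x, u, v))
```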
]]></content:encoded>
			<wfw:commentRss>http://wj32.org/wp/2013/02/26/convex-functions-second-derivatives-and-hessian-matrices/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Differentiation done correctly: 5. Maxima and minima</title>
		<link>http://wj32.org/wp/2013/02/25/differentiation-done-correctly-5-maxima-and-minima/</link>
		<comments>http://wj32.org/wp/2013/02/25/differentiation-done-correctly-5-maxima-and-minima/#comments</comments>
		<pubDate>Mon, 25 Feb 2013 09:35:41 +0000</pubDate>
		<dc:creator>wj32</dc:creator>
				<category><![CDATA[Mathematics]]></category>

		<guid isPermaLink="false">http://wj32.org/wp/?p=966</guid>
		<description><![CDATA[Navigation: 1. The derivative &#124; 2. Higher derivatives &#124; 3. Partial derivatives &#124; 4. Inverse and implicit functions &#124; 5. Maxima and minima In this final post, we are going to look at some applications of differentiation to locating maxima &#8230; <a href="http://wj32.org/wp/2013/02/25/differentiation-done-correctly-5-maxima-and-minima/">Continue reading <span class="meta-nav">&#8594;</span></a><div class="crp_related"><h3>Related Posts:</h3><ul><li><a href="http://wj32.org/wp/2013/02/26/convex-functions-second-derivatives-and-hessian-matrices/"     class="crp_title">Convex functions, second derivatives and Hessian matrices</a></li><li><a href="http://wj32.org/wp/2013/02/22/differentiation-done-correctly-2-higher-derivatives/"     class="crp_title">Differentiation done correctly: 2. Higher derivatives</a></li><li><a href="http://wj32.org/wp/2013/02/24/differentiation-done-correctly-4-inverse-and-implicit-functions/"     class="crp_title">Differentiation done correctly: 4. Inverse and implicit&hellip;</a></li><li><a href="http://wj32.org/wp/2013/02/23/differentiation-done-correctly-3-partial-derivatives/"     class="crp_title">Differentiation done correctly: 3. Partial derivatives</a></li><li><a href="http://wj32.org/wp/2013/02/21/differentiation-done-correctly-1-the-derivative/"     class="crp_title">Differentiation done correctly: 1. The derivative</a></li></ul></div>]]></description>
				<content:encoded><![CDATA[<p>Navigation: <a href="http://wj32.org/wp/2013/02/21/differentiation-done-correctly-1-the-derivative/">1. The derivative</a> | <a href="http://wj32.org/wp/2013/02/22/differentiation-done-correctly-2-higher-derivatives/">2. Higher derivatives</a> | <a href="http://wj32.org/wp/2013/02/23/differentiation-done-correctly-3-partial-derivatives/">3. Partial derivatives</a> | <a href="http://wj32.org/wp/2013/02/24/differentiation-done-correctly-4-inverse-and-implicit-functions/">4. Inverse and implicit functions</a> | <strong>5. Maxima and minima</strong></p>
<p>In this final post, we are going to look at some applications of differentiation to locating maxima and minima of real valued functions. In order to do this, we will be using Taylor&#8217;s theorem (covered in part 2) to prove the higher derivative test for functions on Banach spaces, and the implicit function theorem (covered in part 4) to prove a special case of the method of Lagrange multipliers.</p>
<p><span id="more-966"></span></p>
<h2>Consequences of Taylor&#8217;s theorem</h2>
<p><strong>Definition 35.</strong> Let \(f:X\to\mathbb{R}\) be a map defined on a topological space \(X\). If there is a neighborhood \(U\) of \(x\in X\) such that \(f(t)\le f(x)\) for all \(t\in U\), then we say that \(f\) has a <strong>local maximum</strong> at \(x\). Similarly, if \(f(t)\ge f(x)\) for all \(t\in U\) then we say that \(f\) has a <strong>local minimum</strong> at \(x\). If \(f\) has a local maximum or local minimum at \(x\), then we say that \(f\) has an <strong>extreme value</strong> at \(x\). If the inequality is strict for all \(t\in U\setminus\{x\}\), then we say that \(f\) has a <strong>strict</strong> local maximum or minimum.</p>
<p>In single variable calculus, a differentiable function \(f:\mathbb{R}\to\mathbb{R}\) has a local maximum or minimum at a point \(x\in\mathbb{R}\) only if \(f&#8217;(x)=0\). It is easy to extend this result to maps defined on Banach spaces.</p>
<p><strong>Theorem 36.</strong> <em>Let \(A\subseteq E\) be an open set and let \(f:A\to\mathbb{R}\). If \(f\) is differentiable at \(x\in A\) and has an extreme value at \(x\), then \(f&#8217;(x)=0\).</em></p>
<p><em>Proof.</em> Let \(v\in E\) and let \(g(t)=x+tv\). Then \(f\circ g\) has an extreme value at \(0\), so \(0=(f\circ g)&#8217;(0)=f&#8217;(g(0))g&#8217;(0)=f&#8217;(x)v\). Therefore \(f&#8217;(x)=0\). \(\square\)</p>
<p>Also recall that if \(f:\mathbb{R}\to\mathbb{R}\) is of class \(C^1\), \(f^{\prime\prime}(x)\) exists and \(f&#8217;(x)=0\), then \(f\) has a strict local minimum at \(x\) if \(f^{\prime\prime}(x) > 0\) and a strict local maximum at \(x\) if \(f^{\prime\prime}(x) < 0\). There is a similar test for higher derivatives that follows from Taylor's theorem, and again we can prove analogous statements for maps defined on Banach spaces.</p>
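<p>In one variable, the higher derivative test can be carried out exactly for polynomials: the first nonzero Taylor coefficient at \(x_0\) (beyond the constant term) determines the order \(p\) and the sign of \(f^{(p)}(x_0)\). The following sketch (illustrative names, exact rational arithmetic) implements this one-variable case; when \(p=1\), Theorem 36 already rules out an extreme value.</p>

```python
from fractions import Fraction
from math import comb


def taylor_coefficients(coeffs, x0):
    """Coefficients t[j] of f(x0 + h) = sum_j t[j] h^j for f(x) = sum_k coeffs[k] x^k.

    Here t[j] = f^{(j)}(x0) / j!, so t[j] has the same sign as f^{(j)}(x0).
    """
    x0 = Fraction(x0)
    t = [Fraction(0)] * len(coeffs)
    for k, ck in enumerate(coeffs):
        for j in range(k + 1):  # binomial expansion of ck * (x0 + h)^k
            t[j] += Fraction(ck) * comb(k, j) * x0 ** (k - j)
    return t


def higher_derivative_test(coeffs, x0):
    """One-variable case of the higher derivative test for a polynomial f at x0."""
    t = taylor_coefficients(coeffs, x0)
    p = next((j for j in range(1, len(t)) if t[j] != 0), None)
    if p is None:
        return None, "test does not apply"  # f is constant
    if p == 1:
        return p, "not a critical point"
    if p % 2 == 1:
        return p, "no extreme value"
    return p, "strict local minimum" if t[p] > 0 else "strict local maximum"


print(higher_derivative_test([0, 0, 0, 0, 1], 0))  # f(x) = x^4 at 0
```

<p>For instance, \(x^4\) has a strict local minimum at \(0\) (\(p=4\)), while \(x^3\) has no extreme value there (\(p=3\)).</p>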
<p>If \(q\in L(E,\dots,E;\mathbb{R})\) is a multilinear map from \(E^p\) to \(\mathbb{R}\), then we say that \(q\) is a <strong>multilinear form</strong>.</p>
<p><strong>Definition 37.</strong> Write \(h^{(p)}\) for the \(p\)-tuple \((h,\dots,h)\). We say that a form \(q\) is <strong>positive semidefinite</strong> if \(qh^{(p)} \ge 0\) for all \(h\) and <strong>positive definite</strong> if \(qh^{(p)} > 0\) for all \(h \ne 0\). The terms <strong>negative semidefinite</strong> and <strong>negative definite</strong> are defined similarly. If \(qh^{(p)}\) takes on both positive and negative values, then we say that \(q\) is <strong>indefinite</strong>.</p>
<p><strong>Theorem 38</strong> (Higher derivative test). <em>Let \(A\subseteq E\) be an open set and let \(f:A\to\mathbb{R}\). Assume that \(f\) is \((p-1)\) times continuously differentiable and that \(D^p f(x)\) exists for some \(p\ge 2\) and \(x\in A\). Also assume that \(f&#8217;(x)=\dots=f^{(p-1)}(x)=0\) and \(f^{(p)}(x)\ne 0\).</em></p>
<ol>
<li><em>If \(f\) has an extreme value at \(x\), then \(p\) is even and the form \(f^{(p)}(x)h^{(p)}\) is semidefinite.</em></li>
<li><em>If there is a constant \(c\) such that \(f^{(p)}(x)h^{(p)}\ge c > 0\) for all \(|h|=1\), then \(f\) has a strict local minimum at \(x\); in particular, \(p\) is even by (1).</em></li>
<li><em>If there is a constant \(c\) such that \(f^{(p)}(x)h^{(p)}\le c < 0\) for all \(|h|=1\), then \(f\) has a strict local maximum at \(x\); in particular, \(p\) is even by (1).</em></li>
</ol>
<p><em>Proof.</em> By <a href="http://wj32.org/wp/2013/02/22/differentiation-done-correctly-2-higher-derivatives/#id-24">Corollary 24</a> and the given assumptions, we can write $$<br />
f(x+h)-f(x)=\frac{1}{p!}f^{(p)}(x)h^{(p)}+\theta(h)|h|^p<br />
$$ where \(\theta(h)\to 0\) as \(h\to 0\). First assume that \(f\) has an extreme value at \(x\). Since \(f^{(p)}(x)\) is a nonzero symmetric form, polarization gives a vector \(h_0\ne 0\) such that \(f^{(p)}(x)h_0^{(p)}\ne 0\). Then for all sufficiently small \(t\ne 0\) we have both $$<br />
f(x+th_{0})-f(x)=\left(\frac{1}{p!}f^{(p)}(x)h_{0}^{(p)}\pm\theta(th_{0})\left|h_{0}\right|^{p}\right)t^{p}\tag{*}<br />
$$ and $$<br />
\left|\theta(th_{0})\right|\left|h_{0}\right|^{p} < \frac{1}{p!}\left|f^{(p)}(x)h_{0}^{(p)}\right|.<br />
$$ For these \(t\), the bracketed factor in (*) has the same sign as \(f^{(p)}(x)h_0^{(p)}\), so (*) is nonzero and its sign is that of \(t^p f^{(p)}(x)h_0^{(p)}\). Since \(f\) has an extreme value at \(x\), the sign of (*) must be the same for all small \(t\ne 0\); if \(p\) were odd, \(t^p\) would change sign with \(t\), so \(p\) is even. Similarly, if the form \(f^{(p)}(x)h^{(p)}\) is not semidefinite, then there is some vector \(h_1\ne 0\) such that \(f^{(p)}(x)h_1^{(p)}\) and \(f^{(p)}(x)h_0^{(p)}\) have opposite signs. Applying the same argument to \(h_1\) shows that \(f(x+th_1)-f(x)\) and \(f(x+th_0)-f(x)\) have opposite signs for small \(t\ne 0\), which contradicts the assumption that \(f\) has an extreme value at \(x\).</p>
<p>Now suppose that the condition in (2) holds. Then \begin{align}<br />
f(x+h)-f(x) &#038;= \frac{1}{p!}f^{(p)}(x)h^{(p)}+\theta(h)\left|h\right|^{p} \\<br />
&#038;= \left[\frac{1}{p!}f^{(p)}(x)\left(\frac{h}{\left|h\right|}\right)^{(p)}+\theta(h)\right]\left|h\right|^{p} \\<br />
&#038;\ge \left[\frac{c}{p!}+\theta(h)\right]\left|h\right|^{p}.<br />
\end{align} Since \(\theta(h)\to 0\) as \(h\to 0\), the bracketed factor in the last line is positive for all sufficiently small \(h\ne 0\). For these \(h\) we have \(f(x+h) > f(x)\), so \(f\) has a strict local minimum at \(x\). The proof of (3) is similar (or apply (2) to \(-f\)). \(\square\)</p>
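<p>As a quick illustration (an example chosen here, not part of the original argument), take \(E=\mathbb{R}^2\) with the Euclidean norm and \(f(h_1,h_2)=h_1^4+h_2^4\) at \(x=0\). Since \(f(th)=t^4 f(h)\), the derivatives of order \(1\), \(2\) and \(3\) vanish at \(0\), and \(f^{(4)}(0)h^{(4)}=\frac{d^4}{dt^4}f(th)\big|_{t=0}=24f(h)\). For \(|h|=1\) we have \(h_1^4+h_2^4\ge\frac{1}{2}(h_1^2+h_2^2)^2=\frac{1}{2}\), so $$<br />
f^{(4)}(0)h^{(4)}=24\left(h_1^4+h_2^4\right)\ge 12 > 0\quad\text{for all }|h|=1.<br />
$$ Part (2) of Theorem 38 therefore applies with \(p=4\) and \(c=12\): \(f\) has a strict local minimum at \(0\), even though \(f^{\prime\prime}(0)=0\) and the second derivative test gives no information.</p>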
<p><strong>Corollary 39</strong> (Higher derivative test, finite-dimensional case). <em>In Theorem 38, further assume that \(E\) is finite-dimensional. Then \(h\mapsto f^{(p)}(x)h^{(p)}\) attains both a minimum and a maximum on the set \(\{h\in E:|h|=1\}\), and:</em></p>
<ol>
<li><em>If the form \(f^{(p)}(x)h^{(p)}\) is indefinite, then \(f\) does not have an extreme value at \(x\).</em></li>
<li><em>If the form \(f^{(p)}(x)h^{(p)}\) is positive definite, then \(f\) has a strict local minimum at \(x\).</em></li>
<li><em>If the form \(f^{(p)}(x)h^{(p)}\) is negative definite, then \(f\) has a strict local maximum at \(x\).</em></li>
</ol>
<p><em>Proof.</em> Since \(E\) is finite-dimensional, the set \(S=\{h\in E:|h|=1\}\) is compact. Therefore the continuous map \(h\mapsto f^{(p)}(x)h^{(p)}\) attains a minimum \(c\) and a maximum \(C\) on \(S\). Part (1) follows directly from part (1) of Theorem 38. If \(f^{(p)}(x)h^{(p)}\) is positive definite then \(c > 0\), so part (2) of Theorem 38 applies. If \(f^{(p)}(x)h^{(p)}\) is negative definite then \(C < 0\), so part (3) of Theorem 38 applies. \(\square\)</p>
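<p>The semidefinite case, which Corollary 39 leaves open, is genuinely inconclusive. For example (chosen here for illustration), on \(\mathbb{R}^2\) the maps \(f(h_1,h_2)=h_1^2+h_2^4\) and \(g(h_1,h_2)=h_1^2-h_2^4\) satisfy $$<br />
f'(0)=g'(0)=0,\qquad f^{\prime\prime}(0)h^{(2)}=g^{\prime\prime}(0)h^{(2)}=2h_1^2,<br />
$$ and this form is positive semidefinite but not positive definite. Yet \(f\) has a strict local minimum at \(0\), while \(g\) has no extreme value there, since \(g(t,0)=t^2 > 0\) and \(g(0,t)=-t^4 < 0\) for \(t\ne 0\).</p>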
<p>The simplest form of Corollary 39 occurs when \(p=2\). Let \(E\) be an \(n\)-dimensional real Banach space, let \(A\subseteq E\) be an open set, and let \(f:A\to\mathbb{R}\) be a map of class \(C^1\). Suppose that \(f^{\prime\prime}(x)\) exists at some \(x\in A\). Fix a basis \(\{e_1,\dots,e_n\}\) of \(E\). This identifies \(E\) with \(E_1\times\cdots\times E_n\), where \(E_i\) is the one-dimensional subspace spanned by \(e_i\).</p>
<p><strong>Definition 40.</strong> The <strong>Hessian</strong> matrix of \(f\) at \(x\) is the real matrix $$<br />
\begin{bmatrix}<br />
D_1 D_1 f(x) &#038; \cdots &#038; D_1 D_n f(x) \\<br />
\vdots &#038; \ddots &#038; \vdots \\<br />
D_n D_1 f(x) &#038; \cdots &#038; D_n D_n f(x)<br />
\end{bmatrix},<br />
$$ where each element $$<br />
D_i D_j f(x) \in L(E_i,L(E_j,\mathbb{R}))<br />
$$ is identified with \(D_i D_j f(x)(e_i,e_j)\in\mathbb{R}\).</p>
<p><a href="http://wj32.org/wp/2013/02/23/differentiation-done-correctly-3-partial-derivatives/#id-29">Theorem 29</a> shows that this matrix is symmetric, at least when \(f\) is of class \(C^2\). We can restate Corollary 39 in terms of the Hessian matrix.</p>
<p><a name="id-41"></a><br />
<strong>Corollary 41.</strong> <em>Suppose that \(f&#8217;(x)=0\) and \(f^{\prime\prime}(x)\) exists. Let \(H\) be the Hessian matrix of \(f\) at \(x\).</em></p>
<ol>
<li><em>If \(H\) has both positive and negative eigenvalues, then \(f\) does not have an extreme value at \(x\).</em></li>
<li><em>If \(H\) is positive definite, then \(f\) has a strict local minimum at \(x\).</em></li>
<li><em>If \(H\) is negative definite, then \(f\) has a strict local maximum at \(x\).</em></li>
</ol>
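<p><em>Proof.</em> Write \(\widetilde{h}\) for the column vector of coordinates of \(h\) with respect to \(\{e_1,\dots,e_n\}\). By bilinearity, \(f^{\prime\prime}(x)(h,h)=\widetilde{h}^T H \widetilde{h}\), so the form \(f^{\prime\prime}(x)h^{(2)}\) is positive definite, negative definite or indefinite exactly when \(H\) is. Since \(H\) is symmetric, it is indefinite if and only if it has both positive and negative eigenvalues. The result now follows from Corollary 39 with \(p=2\). \(\square\)</p>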
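<p>For a worked example (our own, for illustration), let \(f(x_1,x_2)=x_1^3-3x_1+x_2^2\) on \(\mathbb{R}^2\) with the standard basis. Then \(\partial f/\partial x_1=3x_1^2-3\) and \(\partial f/\partial x_2=2x_2\), so \(f'(x)=0\) exactly at \(x=(\pm 1,0)\), and $$<br />
H(x)=\begin{bmatrix}<br />
6x_1 &#038; 0 \\<br />
0 &#038; 2<br />
\end{bmatrix}.<br />
$$ At \((1,0)\) the eigenvalues are \(6\) and \(2\), so \(H\) is positive definite and \(f\) has a strict local minimum there. At \((-1,0)\) the eigenvalues are \(-6\) and \(2\), so \(f\) has no extreme value there.</p>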
<h2>Lagrange multipliers</h2>
<p>The method of Lagrange multipliers provides a necessary condition for a function \(f:A\to\mathbb{R}\) to be maximized or minimized subject to a constraint of the form \(g=0\), where \(g:A\to\mathbb{R}\). We first need an elementary result from linear algebra.</p>
<p><strong>Lemma 42.</strong> <em>Let \(f,g:E\to\mathbb{R}\) be linear functionals with \(g\ne 0\). If \(\ker g\subseteq \ker f\), then \(f=\lambda g\) for some \(\lambda\in\mathbb{R}\).</em></p>
<p><em>Proof.</em> Choose \(v\notin \ker g\) and let \(\lambda=f(v)/g(v)\). Every \(x\in E\) can be written as $$<br />
x=\left(x-\frac{g(x)}{g(v)}v\right)+\frac{g(x)}{g(v)}v,<br />
$$ where the first term lies in \(\ker g\subseteq\ker f\). Applying \(f\) gives \(f(x)=\frac{g(x)}{g(v)}f(v)=\lambda g(x)\). Therefore \(f=\lambda g\) on \(E\). \(\square\)</p>
<p><strong>Theorem 43</strong> (Method of Lagrange multipliers, single constraint). <em>Let \(A\subseteq E\) be an open set. Let \(f:A\to\mathbb{R}\) and \(g:A\to\mathbb{R}\) be of class \(C^1\), and let \(S=g^{-1}(\{0\})\). If \(f|_S\) has an extreme value at \(x\in S\) and \(g&#8217;(x)\ne 0\), then there is a number \(\lambda\in\mathbb{R}\) such that \(f&#8217;(x)=\lambda g&#8217;(x)\).</em></p>
<p><em>Proof.</em> By Lemma 42 it suffices to prove that \(\ker g'(x)\subseteq\ker f'(x)\). Let \(F=\ker g'(x)\), choose some \(w\notin F\), and let \(G=\langle w \rangle\). Since \(g'(x)\) is a nonzero continuous linear functional, \(F\) is a closed subspace of codimension one and \(E=F\oplus G\), so we may identify \(E\) with \(F\times G\). Write \(x=(x_1,x_2)\) with \(x_1\in F\) and \(x_2\in G\). Since \(A\) is open, it contains a product \(B\times C\) of open sets \(B\subseteq F\) and \(C\subseteq G\) with \(x_1\in B\) and \(x_2\in C\). The partial derivative \(D_2 g(x):G\to\mathbb{R}\) is invertible because \(G\) is one-dimensional and \(D_2 g(x)w=g'(x)w\ne 0\); we also have \(g(x_1,x_2)=0\). By the implicit function theorem, there exist an open neighborhood \(U\subseteq B\) of \(x_1\) and a \(C^1\) map \(h:U\to C\) such that \(h(x_1)=x_2\) and \(g(t,h(t))=0\) for all \(t\in U\). Moreover, \(D_1 g(x)=g'(x)|_F=0\), so \(h'(x_1)=-[D_2 g(x)]^{-1}\circ D_1 g(x)=0\). Let \(\widetilde{h}:U\to A\) be given by \(t\mapsto(t,h(t))\). Then \(\widetilde{h}(U)\subseteq S\) and \(\widetilde{h}'(x_1)v=(v,h'(x_1)v)=v\) for all \(v\in F\). Since \(f|_S\) has an extreme value at \(x=\widetilde{h}(x_1)\), the map \(f\circ\widetilde{h}\) has an extreme value at \(x_1\), so \((f\circ\widetilde{h})'(x_1)=0\) by Theorem 36 and hence \(f'(x)\circ \widetilde{h}'(x_1)=0\) by the chain rule. In particular, if \(v\in\ker g'(x)=F\) then $$<br />
0=[f'(x)\circ\widetilde{h}'(x_1)](v)=f'(x)v,<br />
$$ so \(v\in\ker f'(x)\). \(\square\)</p>
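<p>For example (chosen here for illustration), consider \(f(x,y)=x+y\) on the unit circle \(S=g^{-1}(\{0\})\), where \(g(x,y)=x^2+y^2-1\). Here \(g'(x,y)=\begin{bmatrix}2x &#038; 2y\end{bmatrix}\) is nonzero on \(S\) and \(f'(x,y)=\begin{bmatrix}1 &#038; 1\end{bmatrix}\), so Theorem 43 requires $$<br />
1=2\lambda x,\qquad 1=2\lambda y,\qquad x^2+y^2=1.<br />
$$ Hence \(x=y=\pm 1/\sqrt{2}\) and \(\lambda=\pm 1/\sqrt{2}\). Since \(S\) is compact, \(f|_S\) attains a maximum and a minimum, and these must occur at the two candidates: the maximum is \(f(1/\sqrt{2},1/\sqrt{2})=\sqrt{2}\) and the minimum is \(-\sqrt{2}\).</p>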
<p>There is also a more general version in which the constraint function takes values in an arbitrary Banach space \(F\), possibly infinite-dimensional. We omit the proof because it requires a few theorems from functional analysis.</p>
<p><strong>Theorem 44</strong> (Method of Lagrange multipliers, multiple constraints). <em>Let \(A\subseteq E\) be an open set. Let \(f:A\to\mathbb{R}\) and \(g:A\to F\) be of class \(C^1\), and let \(S=g^{-1}(\{0\})\). If \(f|_S\) has an extreme value at \(x\in S\) and \(g&#8217;(x)\) is surjective, then there is a continuous linear map \(\lambda:F\to\mathbb{R}\) such that \(f&#8217;(x)=\lambda\circ g&#8217;(x)\).</em></p>
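<p>As a worked special case, take \(F=\mathbb{R}^k\) and write \(g=(g_1,\dots,g_k)\). Every linear map \(\lambda:\mathbb{R}^k\to\mathbb{R}\) has the form \(\lambda(v)=\sum_{i=1}^k\lambda_i v_i\), and \(g'(x)\) is surjective exactly when \(g_1'(x),\dots,g_k'(x)\) are linearly independent. The conclusion of Theorem 44 then takes the familiar form $$<br />
f'(x)=\lambda\circ g'(x)=\sum_{i=1}^k \lambda_i\, g_i'(x),<br />
$$ with one multiplier \(\lambda_i\in\mathbb{R}\) for each constraint \(g_i=0\).</p>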
<h2>Conclusion</h2>
<p>We have seen how the theorems of multivariable calculus in \(\mathbb{R}^n\) generalize easily to arbitrary Banach spaces. Because we can work coordinate-free, the proofs are often easier to understand than their \(\mathbb{R}^n\) counterparts. Defining the derivative on Banach spaces gives us a powerful tool that makes both computations and proofs much easier than before.</p>
<p>Navigation: <a href="http://wj32.org/wp/2013/02/21/differentiation-done-correctly-1-the-derivative/">1. The derivative</a> | <a href="http://wj32.org/wp/2013/02/22/differentiation-done-correctly-2-higher-derivatives/">2. Higher derivatives</a> | <a href="http://wj32.org/wp/2013/02/23/differentiation-done-correctly-3-partial-derivatives/">3. Partial derivatives</a> | <a href="http://wj32.org/wp/2013/02/24/differentiation-done-correctly-4-inverse-and-implicit-functions/">4. Inverse and implicit functions</a> | <strong>5. Maxima and minima</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://wj32.org/wp/2013/02/25/differentiation-done-correctly-5-maxima-and-minima/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Differentiation done correctly: 4. Inverse and implicit functions</title>
		<link>http://wj32.org/wp/2013/02/24/differentiation-done-correctly-4-inverse-and-implicit-functions/</link>
		<comments>http://wj32.org/wp/2013/02/24/differentiation-done-correctly-4-inverse-and-implicit-functions/#comments</comments>
		<pubDate>Sun, 24 Feb 2013 07:30:03 +0000</pubDate>
		<dc:creator>wj32</dc:creator>
				<category><![CDATA[Mathematics]]></category>

		<guid isPermaLink="false">http://wj32.org/wp/?p=956</guid>
		<description><![CDATA[Navigation: 1. The derivative &#124; 2. Higher derivatives &#124; 3. Partial derivatives &#124; 4. Inverse and implicit functions &#124; 5. Maxima and minima Now we&#8217;re going to prove the inverse function and implicit function theorems for Banach spaces. Theorem 32 &#8230; <a href="http://wj32.org/wp/2013/02/24/differentiation-done-correctly-4-inverse-and-implicit-functions/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Navigation: <a href="http://wj32.org/wp/2013/02/21/differentiation-done-correctly-1-the-derivative/">1. The derivative</a> | <a href="http://wj32.org/wp/2013/02/22/differentiation-done-correctly-2-higher-derivatives/">2. Higher derivatives</a> | <a href="http://wj32.org/wp/2013/02/23/differentiation-done-correctly-3-partial-derivatives/">3. Partial derivatives</a> | <strong>4. Inverse and implicit functions</strong> | <a href="http://wj32.org/wp/2013/02/25/differentiation-done-correctly-5-maxima-and-minima/">5. Maxima and minima</a></p>
<p>Now we&#8217;re going to prove the inverse function and implicit function theorems for Banach spaces.</p>
<p><span id="more-956"></span></p>
<p><strong>Theorem 32</strong> (Contraction principle). <em>Let \((X,d)\) be a complete metric space and let \(\varphi:X\to X\) be a map satisfying $$<br />
d(\varphi(x),\varphi(y)) \le cd(x,y)<br />
$$ for all \(x,y\in X\) and some constant \(c < 1\). Then there is exactly one \(x\in X\) for which \(\varphi(x)=x\).</em></p>
<p><em>Proof.</em> Choose any \(x_0\in X\) and define \(x_{n+1}=\varphi(x_n)\). For all \(n\ge 1\) we have $$<br />
d(x_{n+1},x_n)=d(\varphi(x_n),\varphi(x_{n-1}))\le cd(x_n,x_{n-1}),<br />
$$ so \(d(x_{n+1},x_n)\le c^n d(x_1,x_0)\) by induction. For all \(m > n\), \begin{align}<br />
d(x_n,x_m) &#038;\le d(x_n,x_{n+1})+\cdots+d(x_{m-1},x_m) \\<br />
&#038;\le (c^n+\cdots+c^{m-1})d(x_1,x_0) \\<br />
&#038;\le c^n(1-c)^{-1}d(x_1,x_0),<br />
\end{align} which shows that \(\{x_n\}\) is a Cauchy sequence. Since \(X\) is complete, \(x_n\to x\) for some \(x\in X\). Furthermore, $$<br />
x=\lim_{n\to\infty}x_{n+1}=\lim_{n\to\infty}\varphi(x_n)=\varphi(x)<br />
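$$ since \(\varphi\) is continuous (indeed Lipschitz). For uniqueness, if \(\varphi(x)=x\) and \(\varphi(y)=y\), then \(d(x,y)=d(\varphi(x),\varphi(y))\le c\,d(x,y)\), which forces \(d(x,y)=0\) because \(c < 1\). \(\square\)</p>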
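<p>Letting \(m\to\infty\) in the estimate above gives the a priori error bound \(d(x_n,x)\le c^n(1-c)^{-1}d(x_1,x_0)\). As an illustration (chosen here, not from the original series), let \(X=[0,1]\) with the usual metric and \(\varphi=\cos\). Then \(\varphi(X)=[\cos 1,1]\subseteq X\), and by the mean value theorem $$<br />
|\cos x-\cos y|\le (\sin 1)\,|x-y|\qquad\text{for all }x,y\in[0,1],<br />
$$ with \(c=\sin 1 < 1\). Theorem 32 therefore shows that the equation \(\cos x=x\) has exactly one solution in \([0,1]\) (approximately \(0.739085\)), and the iteration \(x_{n+1}=\cos x_n\) converges to it from any starting point in \([0,1]\).</p>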
<p><strong>Theorem 33</strong> (Inverse function theorem). <em>Let \(A\subseteq E\) be an open set and let \(f:A\to F\) be of class \(C^p\) (with \(p\ge 1\)). Suppose that \(f'(x_0)\) is invertible for some \(x_0\in A\). Then there is a neighborhood \(U\subseteq A\) of \(x_0\) such that \(f(U)\) is open and \(f|_U:U\to f(U)\) is a \(C^p\) diffeomorphism.</em></p>
<p><em>Proof.</em> Let \(\iota:E\to E\) be the identity map. By replacing \(f\) with \(f'(x_0)^{-1}\circ f\), we may assume that \(E=F\) and \(f'(x_0)=\iota\). Since \(f'\) is continuous at \(x_0\), there exists an open ball \(U\subseteq A\) around \(x_0\) such that \(|f'(x)-\iota| < \frac{1}{2}\) for all \(x\in U\). For \(y\in E\), define the map \(\varphi_y:U\to E\) by \(\varphi_y(x)=x-f(x)+y\). Note that \(x\) is a fixed point of \(\varphi_y\) if and only if \(f(x)=y\). For every \(y\in E\) we have \(|\varphi'_y(x)|=|\iota-f'(x)| < \frac{1}{2}\) for all \(x\in U\), so, since the ball \(U\) is convex, <a href="http://wj32.org/wp/2013/02/21/differentiation-done-correctly-1-the-derivative/#id-16">Corollary 16</a> gives $$<br />
|\varphi_y(x_1)-\varphi_y(x_2)| \le \frac{1}{2}|x_1-x_2|\tag{*}<br />
$$ for all \(x_1,x_2\in U\). In particular, if \(f(x_1)=f(x_2)=y\) for some \(x_1,x_2\in U\), then \(x_1\) and \(x_2\) are both fixed points of \(\varphi_y\), and (*) gives \(|x_1-x_2|\le\frac{1}{2}|x_1-x_2|\), so \(x_1=x_2\). This is the uniqueness argument from Theorem 32, and it shows that \(f|_U:U\to f(U)\) is a bijection.</p>
<p>Now let \(b\in f(U)\) so that \(b=f(a)\) for some \(a\in U\). Let \(B\) be an open ball with radius \(r\) around \(a\) such that \(\overline{B}\subseteq U\), and let \(B&#8217;\) be an open ball of radius \(r/2\) around \(b\). We want to show that \(B&#8217;\subseteq f(U)\), thus proving that \(f(U)\) is open. Let \(y\in B&#8217;\). If \(x\in\overline{B}\) then \begin{align}<br />
|\varphi_y(x)-a| &#038;\le |\varphi_y(x)-\varphi_y(a)|+|\varphi_y(a)-a| \\<br />
&#038;\le \frac{1}{2}|x-a|+|y-b| \\<br />
&#038;< \frac{r}{2}+\frac{r}{2}=r,<br />
\end{align} so \(\varphi_y(x)\in B\). This together with (*) shows that \(\varphi_y|_{\overline{B}}:\overline{B}\to\overline{B}\) is a contraction mapping. Since \(\overline{B}\) is a closed subset of the complete space \(E\), it is complete, so Theorem 32 gives a fixed point \(x\in\overline{B}\) of \(\varphi_y|_{\overline{B}}\). Then \(f(x)=y\), so \(y\in f(U)\).</p>
<p>For the last part of the proof, we denote \(f|_U\) by \(f\) and \((f|_U)^{-1}\) by \(f^{-1}\) for convenience. Let \(y\in f(U)\) and \(y+k\in f(U)\) with \(k\ne 0\); there exist \(x\in U\) and \(x+h\in U\) with \(y=f(x)\) and \(y+k=f(x+h)\), noting that \(h\ne 0\). In fact we have \begin{align}<br />
|h-k| &#038;= |h-f(x+h)+f(x)| \\<br />
&#038;= |\varphi_y(x+h)-\varphi_y(x)| \\<br />
&#038;\le \frac{1}{2}|h|<br />
\end{align} from (*), so \(|h|\le 2|k|\). Then \(h\to 0\) as \(k\to 0\) and \begin{align}<br />
\frac{|f^{-1}(y+k)-f^{-1}(y)-f'(x)^{-1}k|}{|k|} &#038;= \frac{|f'(x)^{-1}(f(x+h)-f(x))-h|}{|k|} \\<br />
&#038;\le |f'(x)^{-1}|\frac{|f(x+h)-f(x)-f'(x)h|}{|k|} \\<br />
&#038;\le 2|f'(x)^{-1}|\frac{|f(x+h)-f(x)-f'(x)h|}{|h|} \\<br />
&#038;\to 0<br />
\end{align} as \(h\to 0\). (Note that \(f'(x)\) is invertible since \(|f'(x)-\iota|<\frac{1}{2}\).) This proves that $$<br />
(f^{-1})'(y)=f'(x)^{-1}=f'(f^{-1}(y))^{-1},\tag{**}<br />
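$$ so \(f^{-1}\) is differentiable on \(f(U)\); it is also continuous, indeed Lipschitz, since \(|h|\le 2|k|\). Finally, (**) expresses \((f^{-1})'\) as the composite of \(f^{-1}\), \(f'\) (which is of class \(C^{p-1}\)) and the operator inversion map \(\lambda\mapsto\lambda^{-1}\) (which is of class \(C^\infty\)). If \(f^{-1}\) is of class \(C^k\) for some \(k < p\), then this composite is of class \(C^k\), so \(f^{-1}\) is of class \(C^{k+1}\). Starting from \(k=0\), induction shows that \(f^{-1}\) is of class \(C^p\). \(\square\)</p>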
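<p>For a concrete instance (an example chosen here for illustration), let \(f:(0,\infty)\times\mathbb{R}\to\mathbb{R}^2\) be the polar coordinate map \(f(r,\theta)=(r\cos\theta,r\sin\theta)\). Then $$<br />
f'(r,\theta)=\begin{bmatrix}<br />
\cos\theta &#038; -r\sin\theta \\<br />
\sin\theta &#038; r\cos\theta<br />
\end{bmatrix},\qquad \det f'(r,\theta)=r\ne 0,<br />
$$ so Theorem 33 shows that, for every \(p\), \(f\) restricts to a \(C^p\) diffeomorphism near every point, and (**) gives $$<br />
(f^{-1})'(f(r,\theta))=f'(r,\theta)^{-1}=\begin{bmatrix}<br />
\cos\theta &#038; \sin\theta \\<br />
-\sin\theta/r &#038; \cos\theta/r<br />
\end{bmatrix}.<br />
$$ The theorem is genuinely local here: \(f(r,\theta+2\pi)=f(r,\theta)\), so \(f\) is not injective on its whole domain.</p>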
<p><strong>Theorem 34</strong> (Implicit function theorem). <em>Let \(A\subseteq E\) and \(B\subseteq F\) be open sets and let \(f:A\times B\to G\) be of class \(C^p\) (with \(p\ge 1\)). Suppose that \((a,b)\in A\times B\) satisfies \(f(a,b)=0\) and that \(D_2 f(a,b):F\to G\) is invertible. Then there exist a neighborhood \(U\subseteq A\) of \(a\) and a \(C^p\) map \(g:U\to B\) with the following properties:</em></p>
<ol>
<li><em>\(g(a)=b\).</em></li>
<li><em>\(f(x,g(x))=0\) for all \(x\in U\).</em></li>
<li><em>\(g&#8217;(a)=-[D_2 f(a,b)]^{-1}\circ D_1 f(a,b)\).</em></li>
</ol>
<p><em>Proof.</em> Let \(\iota:E\to E\) be the identity map. Define \begin{align}<br />
\widetilde{f}:A\times B &#038;\to E\times G \\<br />
(x,y) &#038;\mapsto (x,f(x,y))<br />
\end{align} and compute $$<br />
\widetilde{f}&#8217;(a,b) = \begin{bmatrix}<br />
\iota &#038; 0 \\<br />
D_1 f(a,b) &#038; D_2 f(a,b)<br />
\end{bmatrix}.<br />
$$ Then \(\widetilde{f}&#8217;(a,b)\) is invertible, with $$<br />
\widetilde{f}&#8217;(a,b)^{-1} = \begin{bmatrix}<br />
\iota &#038; 0 \\<br />
-[D_2 f(a,b)]^{-1}\circ D_1 f(a,b) &#038; [D_2 f(a,b)]^{-1}<br />
\end{bmatrix}.\tag{*}<br />
$$ By the inverse function theorem, there exist neighborhoods \(V\subseteq A\times B\) of \((a,b)\) and \(W\subseteq E\times G\) of \((a,0)\) such that \(\widetilde{f}|_V:V\to W\) is a \(C^p\) diffeomorphism. Let \(U=\{x\in E:(x,0)\in W\}\); since \(W\) is open and \(x\mapsto(x,0)\) is continuous, \(U\) is a neighborhood of \(a\). Define \(g:U\to B\) by \(g=\pi\circ(\widetilde{f}|_V)^{-1}\circ i\), where \(\pi:A\times B\to B\) is the canonical projection and \(i:U\to W\) is given by \(i(x)=(x,0)\). To complete the proof, we check the three required properties. Firstly, $$<br />
g(a)=\pi((\widetilde{f}|_V)^{-1}(a,0))=\pi(a,b)=b<br />
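$$ since \(\widetilde{f}(a,b)=(a,0)\). If \(x\in U\) then \((x,0)\in W\), so there is a unique \((x',y)\in V\) with \(\widetilde{f}(x',y)=(x,0)\). Comparing first components gives \(x'=x\), so \(f(x,y)=0\) and \((\widetilde{f}|_V)^{-1}(x,0)=(x,y)\). Therefore $$<br />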
f(x,g(x))=f(x,\pi((\widetilde{f}|_V)^{-1}(x,0)))=f(x,\pi(x,y))=f(x,y)=0.<br />
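$$ Lastly, \(g\) is of class \(C^p\) as a composite of \(C^p\) maps. Since \(i'(a)\) is the injection \(u\mapsto(u,0)\), \(\pi'\) is the projection \((u,v)\mapsto v\), and \(((\widetilde{f}|_V)^{-1})'(a,0)=\widetilde{f}'(a,b)^{-1}\), the chain rule shows that \(g'(a)\) is the bottom left entry of (*), namely \(-[D_2 f(a,b)]^{-1}\circ D_1 f(a,b)\). \(\square\)</p>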
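<p>As a simple check (an example chosen here for illustration), take \(E=F=G=\mathbb{R}\), \(f(x,y)=x^2+y^2-1\), and a point \((a,b)\) on the unit circle with \(b > 0\). Then \(D_2 f(a,b)=2b\) is invertible, and part (3) gives $$<br />
g'(a)=-[D_2 f(a,b)]^{-1}\circ D_1 f(a,b)=-\frac{2a}{2b}=-\frac{a}{b}.<br />
$$ This agrees with differentiating the explicit solution \(g(x)=\sqrt{1-x^2}\), which gives \(g'(a)=-a/\sqrt{1-a^2}=-a/b\).</p>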
<p>In the next and final post, we will look at some applications of Taylor&#8217;s theorem and the implicit function theorem to finding minima and maxima of maps from Banach spaces to \(\mathbb{R}\).</p>
<p>Navigation: <a href="http://wj32.org/wp/2013/02/21/differentiation-done-correctly-1-the-derivative/">1. The derivative</a> | <a href="http://wj32.org/wp/2013/02/22/differentiation-done-correctly-2-higher-derivatives/">2. Higher derivatives</a> | <a href="http://wj32.org/wp/2013/02/23/differentiation-done-correctly-3-partial-derivatives/">3. Partial derivatives</a> | <strong>4. Inverse and implicit functions</strong> | <a href="http://wj32.org/wp/2013/02/25/differentiation-done-correctly-5-maxima-and-minima/">5. Maxima and minima</a></p>
]]></content:encoded>
			<wfw:commentRss>http://wj32.org/wp/2013/02/24/differentiation-done-correctly-4-inverse-and-implicit-functions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Differentiation done correctly: 3. Partial derivatives</title>
		<link>http://wj32.org/wp/2013/02/23/differentiation-done-correctly-3-partial-derivatives/</link>
		<comments>http://wj32.org/wp/2013/02/23/differentiation-done-correctly-3-partial-derivatives/#comments</comments>
		<pubDate>Sat, 23 Feb 2013 03:00:09 +0000</pubDate>
		<dc:creator>wj32</dc:creator>
				<category><![CDATA[Mathematics]]></category>

		<guid isPermaLink="false">http://wj32.org/wp/?p=935</guid>
		<description><![CDATA[Navigation: 1. The derivative &#124; 2. Higher derivatives &#124; 3. Partial derivatives &#124; 4. Inverse and implicit functions &#124; 5. Maxima and minima While we saw that differentiable maps may be naturally split into component functions when the codomain is &#8230; <a href="http://wj32.org/wp/2013/02/23/differentiation-done-correctly-3-partial-derivatives/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Navigation: <a href="http://wj32.org/wp/2013/02/21/differentiation-done-correctly-1-the-derivative/">1. The derivative</a> | <a href="http://wj32.org/wp/2013/02/22/differentiation-done-correctly-2-higher-derivatives/">2. Higher derivatives</a> | <strong>3. Partial derivatives</strong> | <a href="http://wj32.org/wp/2013/02/24/differentiation-done-correctly-4-inverse-and-implicit-functions/">4. Inverse and implicit functions</a> | <a href="http://wj32.org/wp/2013/02/25/differentiation-done-correctly-5-maxima-and-minima/">5. Maxima and minima</a></p>
<p>While we saw that differentiable maps may be naturally split into component functions when the codomain is a product of Banach spaces, the situation for the domain is more complicated. (This is partly because, unlike the situation for the codomain, a map defined on a product of Banach spaces is not determined by its restrictions to the individual factors.) In this post, we will look at how the existence of partial derivatives relates to differentiability, how the symmetry of higher derivatives (covered in part 2) affects mixed partial derivatives, and finally a short proof of <a href="http://en.wikipedia.org/wiki/Differentiation_under_the_integral_sign">differentiation under the integral sign</a>.</p>
<p><span id="more-935"></span></p>
<p>Let \(E_1,\dots,E_m\) be Banach spaces and let \(A_1\times\cdots\times A_m\subseteq E_1\times\cdots\times E_m\) be an open set where each \(A_j\) is open in \(E_j\). If \(f:A_1\times\cdots\times A_m\to F\) is any map and \(x=(x_1,\dots,x_m)\in A_1\times\cdots\times A_m\), we can consider the map $$t \mapsto f(x_1,\dots,t,\dots,x_m),$$ which can also be written as \(f\circ\iota\) where \(\iota:A_j\to A_1\times\cdots\times A_m\) is given by \(t\mapsto(x_1,\dots,t,\dots,x_m)\). If this map is differentiable at \(x_j\), we call its derivative the <strong>\(j\)th partial derivative of \(f\) at \(x\)</strong> and denote it by \(D_j f(x)\). Looking again at the definition of the derivative, we see that \(D_j f(x):E_j\to F\) is the unique continuous linear map such that $$<br />
\lim_{h\to 0}\frac{f(x_1,\dots,x_j+h,\dots,x_m)-f(x_1,\dots,x_m)-D_j f(x)h}{|h|}=0.<br />
$$ In practice, if we are working with functions defined on \(\mathbb{R}^m\) then we take \(E_j=\mathbb{R}\) for \(j=1,\dots,m\) so that we have a decomposition of \(\mathbb{R}^m\) into \(\mathbb{R}\times\cdots\times\mathbb{R}\). In this situation we often see the notation $$<br />
\frac{\partial f}{\partial x_j}(x) = D_j f(x)(1),<br />
$$ where we identify the linear map \(D_j f(x):\mathbb{R}\to F\) with the value \(D_j f(x)(1)\in F\).</p>
<p>It is not hard to see that all partial derivatives exist at \(x\) if \(f\) is differentiable at \(x\).</p>
<p><strong>Theorem 25.</strong> <em>Let \(A_1\times\cdots\times A_m\subseteq E_1\times\cdots\times E_m\) where each \(A_j\) is open in \(E_j\) and let \(f:A_1\times\cdots\times A_m\to F\). If \(f\) is differentiable at \(x=(x_1,\dots,x_m)\in A_1\times\cdots\times A_m\), then every \(D_j f(x)\) exists and we have \(D_j f(x)=Df(x)\circ\iota_j\) where \(\iota_j:E_j\to E_1\times\cdots\times E_m\) is the canonical injection, i.e. $$<br />
Df(x)=\begin{bmatrix}D_1 f(x) &#038; \cdots &#038; D_m f(x)\end{bmatrix}.<br />
$$</em></p>
<p><em>Proof.</em> Apply the chain rule to \(f\circ\iota\) where \(\iota:A_j\to A_1\times\cdots\times A_m\) is given by \(t\mapsto(x_1,\dots,t,\dots,x_m)\). Alternatively, restrict \(h\) to elements of \(E_j\) in the definition of the derivative \(Df(x)\). \(\square\)</p>
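<p>For example (chosen here for illustration), let \(f:\mathbb{R}\times\mathbb{R}\to\mathbb{R}\) be given by \(f(x_1,x_2)=x_1^2x_2\). Then \(D_1 f(x)s=2x_1x_2 s\) and \(D_2 f(x)s=x_1^2 s\) for \(s\in\mathbb{R}\), and Theorem 25 gives $$<br />
Df(x)(h_1,h_2)=D_1 f(x)h_1+D_2 f(x)h_2=2x_1x_2h_1+x_1^2h_2.<br />
$$</p>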
<p><strong>Definition 26.</strong> Let \(E_1,\dots,E_m\) and \(F_1,\dots,F_n\) be Banach spaces. Let \(A=A_1\times\cdots\times A_m\subseteq E_1\times\cdots\times E_m\), where each \(A_j\) is open in \(E_j\), and let \(f:A\to F_1\times\cdots\times F_n\). The matrix $$<br />
\begin{bmatrix}<br />
D_1 f_1(x) &#038; \cdots &#038; D_m f_1(x) \\<br />
\vdots &#038; \ddots &#038; \vdots \\<br />
D_1 f_n(x) &#038; \cdots &#038; D_m f_n(x)<br />
\end{bmatrix}<br />
$$ is called the <strong>Jacobian</strong> matrix of \(f\) at \(x\in A\), where \(f_i=\pi_i\circ f\) and \(\pi_i:F_1\times\cdots\times F_n\to F_i\) is the canonical projection.</p>
<p><strong>Theorem 27.</strong> <em>If \(f:A_1\times\cdots\times A_m\to F_1\times\cdots\times F_n\) is differentiable at \(x\in A\), then the Jacobian matrix of \(f\) at \(x\) exists and represents \(Df(x)\).</em></p>
<p><em>Proof.</em> Apply <a href="http://wj32.org/wp/2013/02/21/differentiation-done-correctly-1-the-derivative/#id-8">Theorem 8</a> followed by Theorem 25. \(\square\)</p>
<p>As in \(\mathbb{R}^n\), it may be the case that every partial derivative \(D_j f_i(x)\) exists but \(f\) is not differentiable at \(x\). The differentiability of \(f\) implies the existence of the Jacobian matrix, but the converse is not true. Thus we do not have a true analog of Theorem 8 for partial derivatives. We do however have an analog of <a href="http://wj32.org/wp/2013/02/22/differentiation-done-correctly-2-higher-derivatives/#id-18">Theorem 18</a>.</p>
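<p>A standard example (included here for illustration) is \(f:\mathbb{R}^2\to\mathbb{R}\) given by $$<br />
f(x_1,x_2)=\begin{cases}<br />
\dfrac{x_1x_2}{x_1^2+x_2^2} &#038; (x_1,x_2)\ne(0,0), \\<br />
0 &#038; (x_1,x_2)=(0,0).<br />
\end{cases}<br />
$$ Since \(f(t,0)=f(0,t)=0\) for all \(t\), both partial derivatives exist at \(0\) and vanish there, so the Jacobian matrix at \(0\) is \(\begin{bmatrix}0 &#038; 0\end{bmatrix}\). But \(f(t,t)=\frac{1}{2}\) for all \(t\ne 0\), so \(f\) is not even continuous at \(0\), and in particular it is not differentiable there.</p>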
<p><strong>Theorem 28.</strong> <em>Let \(A_1\times\cdots\times A_m\subseteq E_1\times\cdots\times E_m\) where each \(A_j\) is open in \(E_j\) and let \(f:A_1\times\cdots\times A_m\to F\). Then \(f\) is of class \(C^p\) (with \(p\ge 1\)) if and only if every partial derivative $$<br />
D_j f:A_1\times\cdots\times A_m\to L(E_j,F)<br />
$$ is of class \(C^{p-1}\). In that case we have \(D_j f(x)=Df(x)\circ\iota_j\) where \(\iota_j:E_j\to E_1\times\cdots\times E_m\) is the canonical injection, i.e. $$<br />
Df(x)=\begin{bmatrix}D_1 f(x) &#038; \cdots &#038; D_m f(x)\end{bmatrix}.<br />
$$</em></p>
<p><em>Proof.</em> It is clear from the proof of Theorem 25 that every partial derivative is of class \(C^{p-1}\) if \(f\) is of class \(C^p\). For the converse, we only need to prove that \(Df\) exists on \(A_1\times\cdots\times A_m\) since $$<br />
Df(x)=\sum_{j=1}^m D_j f(x)\circ\pi_j<br />
$$ implies that \(Df\) is of class \(C^{p-1}\) if every \(D_j f\) is of class \(C^{p-1}\), where \(\pi_j:E_1\times\cdots\times E_m\to E_j\) is the canonical projection. Let \(x\in A_1\times\cdots\times A_m\) and let \(\varepsilon > 0\). Since every \(D_j f\) is continuous, there exists a \(\delta > 0\) such that $$<br />
|D_j f(y)-D_j f(x)| < \frac{\varepsilon}{m}<br />
$$ for all \(j=1,\dots,m\) and \(y\in B_\delta(x)\subseteq A_1\times\cdots\times A_m\) where \(B_\delta(x)\) is the open ball of radius \(\delta\) around \(x\). Let \(h=(h_1,\dots,h_m)\in E_1\times\cdots\times E_m\) with \(|h|<\delta\). For \(j=0,\dots,m\), let \(p_j=h_1+\cdots+h_j\) so that \(p_0=0\), \(p_m=h\), and $$<br />
f(x+h)-f(x)=\sum_{j=1}^m [f(x+p_j)-f(x+p_{j-1})].<br />
$$ For each \(j=1,\dots,m\) the line segment from \(x+p_{j-1}\) to \(x+p_j=x+p_{j-1}+h_j\) is contained in \(B_\delta(x)\), so we have $$<br />
f(x+p_j)-f(x+p_{j-1}) = \int_0^1 D_j f(x+p_{j-1}+th_j)h_j\,dt<br />
$$ by the <a href="http://wj32.org/wp/2013/02/21/differentiation-done-correctly-1-the-derivative/#id-14">mean value theorem</a>. Then \begin{align}<br />
&#038; \left\vert f(x+h)-f(x)-\sum_{j=1}^m D_j f(x)h_j \right\vert \\<br />
&#038;\le \left\vert \sum_{j=1}^m \left[ \int_0^1 D_j f(x+p_{j-1}+th_j)h_j\,dt - \int_0^1 D_j f(x)h_j\,dt \right] \right\vert \\<br />
&#038;\le \sum_{j=1}^m |h_j| \int_0^1 |D_j f(x+p_{j-1}+th_j)-D_j f(x)|\,dt \\<br />
&#038;\le \sum_{j=1}^m |h_j| \frac{\varepsilon}{m} \\<br />
&#038;\le |h|\varepsilon<br />
\end{align} for all \(|h|<\delta\), which shows that $$<br />
Df(x)=\sum_{j=1}^m D_j f(x)\circ\pi_j.<br />
$$ \(\square\)</p>
<p>As with the ordinary derivative \(Df\), we may iterate partial derivatives: $$<br />
D_{j_1}\cdots D_{j_r} f:A_1\times\cdots\times A_m\to L(E_{j_1},\dots,L(E_{j_r},F)\dots).<br />
$$ These are sometimes known as <strong>mixed partial derivatives</strong>. <a href="http://wj32.org/wp/2013/02/22/differentiation-done-correctly-2-higher-derivatives/#id-21">Theorem 21</a> has an important interpretation in terms of the mixed partial derivatives of \(f\).</p>
<p><a name="id-29"></a><br />
<strong>Theorem 29</strong> (Equality of mixed partial derivatives). <em>Let \(A_1\times\cdots\times A_m\subseteq E_1\times\cdots\times E_m\) where each \(A_j\) is open in \(E_j\) and let \(f:A_1\times\cdots\times A_m\to F\) be of class \(C^2\). Then $$<br />
D_j D_k f(x)(u)(v) = D_k D_j f(x)(v)(u)<br />
$$ for all \(1 \le j,k \le m\), \(x\in A_1\times\cdots\times A_m\), \(u\in E_j\) and \(v\in E_k\).</em></p>
<p><em>Proof.</em> For \(j=1,\dots,m\), let \(\iota_j:E_j\to E_1\times\cdots\times E_m\) be the canonical injection. We have that \(D_k f(x)=Df(x)\circ \iota_k\), so \(D_k f=c\circ Df\) where \begin{align}<br />
c:L(E_1\times\cdots\times E_m,F) &#038;\to L(E_k,F) \\<br />
\lambda &#038;\mapsto \lambda \circ \iota_k.<br />
\end{align} Similarly, \(D_j D_k f = d\circ D(D_k f)\) where \begin{align}<br />
d:L(E_1\times\cdots\times E_m,L(E_k,F)) &#038;\to L(E_j,L(E_k,F)) \\<br />
\lambda &#038;\mapsto \lambda \circ \iota_j.<br />
\end{align} Note that \(c\) and \(d\) are both linear maps. Therefore \begin{align}<br />
D_j D_k f(x) &#038;= [d\circ D(D_k f)](x) \\<br />
&#038;= d(D(c\circ Df)(x)) \\<br />
&#038;= d(Dc(Df(x))\circ D^2 f(x)) \\<br />
&#038;= d(c\circ D^2 f(x)) \\<br />
&#038;= c\circ D^2 f(x)\circ\iota_j<br />
\end{align} and \begin{align}<br />
D_j D_k f(x)(u)(v) &#038;= (c\circ D^2 f(x)\circ\iota_j)(u)(v) \\<br />
&#038;= c(D^2 f(x)(\iota_j(u)))(v) \\<br />
&#038;= D^2 f(x)(\iota_j(u))(\iota_k(v)).<br />
\end{align} A similar calculation shows that \(D_k D_j f(x)(v)(u) = D^2 f(x)(\iota_k(v))(\iota_j(u))\). But $$<br />
D^2 f(x)(\iota_k(v))(\iota_j(u)) = D^2 f(x)(\iota_j(u))(\iota_k(v))<br />
$$ by <a href="http://wj32.org/wp/2013/02/22/differentiation-done-correctly-2-higher-derivatives/#id-21">Theorem 21</a>, so the result follows. \(\square\)</p>
<p>In the special case \(E_j=\mathbb{R}\) for \(j=1,\dots,m\) and \(f:\mathbb{R}^m\to F\), we have $$<br />
\frac{\partial^2 f}{\partial x_j \partial x_k}(x) = D_j D_k f(x)(1)(1) = D_k D_j f(x)(1)(1) = \frac{\partial^2 f}{\partial x_k \partial x_j}(x),<br />
$$ which is sometimes known as the <a href="http://en.wikipedia.org/wiki/Symmetry_of_second_derivatives">symmetry of second derivatives</a>, or <a href="http://en.wikipedia.org/wiki/Symmetry_of_second_derivatives#Clairaut.27s_theorem">Clairaut&#8217;s theorem</a>.</p>
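<p>The symmetry can also be seen numerically. The sketch below takes the hand-computed \(x_2\)-partial and differentiates it in \(x_1\), then takes the \(x_1\)-partial and differentiates it in \(x_2\), both by central differences. The two results should agree with each other and with the exact mixed partial, up to discretization error. The function and the evaluation point are hypothetical. Central differences only approximate the derivatives, so this illustrates the theorem but does not prove it.</p>

```python
import math

# Hypothetical C^2 example f(x1, x2) = exp(x1 x2) + x1^3 x2^2; partials by hand.
def D1f(x1, x2):  # partial derivative in x1
    return x2 * math.exp(x1 * x2) + 3 * x1**2 * x2**2

def D2f(x1, x2):  # partial derivative in x2
    return x1 * math.exp(x1 * x2) + 2 * x1**3 * x2

def mixed_exact(x1, x2):  # common value of both mixed partials, by hand
    return (1 + x1 * x2) * math.exp(x1 * x2) + 6 * x1**2 * x2

def d1_of_d2(x1, x2, h=1e-4):
    """Central difference in x1 of the x2-partial."""
    return (D2f(x1 + h, x2) - D2f(x1 - h, x2)) / (2 * h)

def d2_of_d1(x1, x2, h=1e-4):
    """Central difference in x2 of the x1-partial."""
    return (D1f(x1, x2 + h) - D1f(x1, x2 - h)) / (2 * h)

x = (0.7, -0.4)
a, b, c = d1_of_d2(*x), d2_of_d1(*x), mixed_exact(*x)
print(a, b, c)
```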
<p>Often, \(p\) times continuous differentiability is defined in terms of the mixed partial derivatives of \(f\). The following theorem shows that this definition is equivalent to ours.</p>
<p><strong>Theorem 30.</strong> <em>Let \(A_1\times\cdots\times A_m\subseteq E_1\times\cdots\times E_m\) where each \(A_j\) is open in \(E_j\) and let \(f:A_1\times\cdots\times A_m\to F\). Then \(f\) is of class \(C^p\) (with \(p\ge 1\)) if and only if the partial derivative $$<br />
D_{\tau(1)}\cdots D_{\tau(k)}f<br />
$$ exists on \(A_1\times\cdots\times A_m\) and is continuous, for every \(k=1,\dots,p\) and every map \(\tau\) from \(\{1,\dots,k\}\) to \(\{1,\dots,m\}\). Furthermore, $$<br />
D_{\tau(\sigma(1))}\cdots D_{\tau(\sigma(k))}f(x)(v_{\sigma(1)},\dots,v_{\sigma(k)}) = D_{\tau(1)}\cdots D_{\tau(k)}f(x)(v_1,\dots,v_k)<br />
$$ for all \(x\in A_1\times\cdots\times A_m\), all \(v_i\in E_{\tau(i)}\) (\(i=1,\dots,k\)) and any permutation \(\sigma\) of \(\{1,\dots,k\}\).</em></p>
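<p><em>Proof.</em> This follows from Theorem 28 by induction on \(p\), together with the argument of Theorem 29, using <a href="http://wj32.org/wp/2013/02/22/differentiation-done-correctly-2-higher-derivatives/#id-22">Theorem 22</a> in place of Theorem 21 when \(k>2\). \(\square\)</p>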
<h2>Differentiation under the integral sign</h2>
<p><strong>Theorem 31</strong> (Differentiation under the integral sign). <em>Let \(A\subseteq E\) be an open set and let \([a,b]\) be a closed interval with \(a < b\). Let \(f:[a,b]\times A\to F\) be a continuous map such that \(D_2 f\) exists on \([a,b]\times A\) and is continuous. Let \(g:A\to F\) be given by $$<br />
g(x)=\int_a^b f(t,x)\,dt.<br />
$$ Then \(g\) is differentiable on \(A\) and $$<br />
Dg(x)=\int_a^b D_2 f(t,x)\,dt.<br />
$$</em></p>
<p><em>Proof.</em> Let \(x\in A\). Let $$<br />
\lambda = \int_a^b D_2 f(t,x)\,dt.<br />
$$ For sufficiently small \(h\) we have \begin{align}<br />
g(x+h)-g(x)-\lambda h &#038;= \int_a^b [f(t,x+h)-f(t,x)-D_2 f(t,x)h]\,dt \\<br />
&#038;= \int_a^b \left[ \int_0^1 D_2 f(t,x+sh)h\,ds-D_2 f(t,x)h \right]\,dt \\<br />
&#038;= \int_a^b \int_0^1 [D_2 f(t,x+sh)-D_2 f(t,x)]h\,ds\,dt<br />
\end{align} so that \begin{align}<br />
\frac{|g(x+h)-g(x)-\lambda h|}{|h|} &#038;\le \int_a^b \int_0^1 |D_2 f(t,x+sh)-D_2 f(t,x)|\,ds\,dt \\<br />
&#038;\le (b-a)\sup_{s,t} |D_2 f(t,x+sh)-D_2 f(t,x)|<br />
\end{align} where the \(\sup\) is taken over all \(0\le s\le 1\) and \(a\le t\le b\).</p>
<p>Let \(\varepsilon > 0\). For each \(t\in[a,b]\), continuity of \(D_2 f\) at \((t,x)\) gives open balls \(B_t\) around \(t\) and \(U_t\) around \(x\) such that \(|D_2 f(u,y)-D_2 f(t,x)|<\varepsilon\) whenever \((u,y)\in B_t\times U_t\). Since \([a,b]\) is compact, finitely many of these balls \(B_{t_1},\dots,B_{t_n}\) cover \([a,b]\). Now let \(h\) be small enough that \(x+h\in\bigcap_{k=1}^n U_{t_k}\). Each \(U_{t_k}\) is a ball around \(x\), so \(x+sh\in U_{t_k}\) for all \(0\le s\le 1\) as well. Given \(a\le t\le b\), choose \(k\) with \(t\in B_{t_k}\). Then \begin{align}<br />
|D_2 f(t,x+sh)-D_2 f(t,x)| &#038;\le |D_2 f(t,x+sh)-D_2 f(t_k,x)| \\<br />
&#038;\qquad + |D_2 f(t_k,x)-D_2 f(t,x)| \\<br />
&#038;< 2\varepsilon,<br />
\end{align} so \(|g(x+h)-g(x)-\lambda h|\le 2(b-a)\varepsilon|h|\) for all sufficiently small \(h\). Hence \(Dg(x)=\lambda\). \(\square\)</p>
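<p>Here is a small numerical illustration of Theorem 31. It uses the example \(f(t,x)=\sin(tx)\) on \([0,1]\times\mathbb{R}\), which is an assumption made for illustration, so that \(D_2 f(t,x)=t\cos(tx)\). Both integrals are approximated with composite Simpson's rule, which is our choice of quadrature. The derivative computed under the integral sign is compared with a central difference of \(g\). It is also compared with the closed form \(g'(x)=(x\sin x-1+\cos x)/x^2\), obtained by differentiating \(g(x)=(1-\cos x)/x\).</p>

```python
import math

def simpson(func, a, b, n=200):
    """Composite Simpson's rule with an even number n of subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

# Hypothetical example: f(t, x) = sin(t x), so D_2 f(t, x) = t cos(t x).
def g(x):
    return simpson(lambda t: math.sin(t * x), 0.0, 1.0)

def dg_under_integral(x):
    """int_0^1 D_2 f(t, x) dt, as in Theorem 31."""
    return simpson(lambda t: t * math.cos(t * x), 0.0, 1.0)

def dg_finite_difference(x, h=1e-5):
    return (g(x + h) - g(x - h)) / (2 * h)

x = 1.3
exact = (x * math.sin(x) - 1 + math.cos(x)) / x**2
print(dg_under_integral(x), dg_finite_difference(x), exact)
```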
<p>The next post will prove exactly three things: the Banach fixed-point theorem, the inverse function theorem, and the implicit function theorem.</p>
<p>Navigation: <a href="http://wj32.org/wp/2013/02/21/differentiation-done-correctly-1-the-derivative/">1. The derivative</a> | <a href="http://wj32.org/wp/2013/02/22/differentiation-done-correctly-2-higher-derivatives/">2. Higher derivatives</a> | <strong>3. Partial derivatives</strong> | <a href="http://wj32.org/wp/2013/02/24/differentiation-done-correctly-4-inverse-and-implicit-functions/">4. Inverse and implicit functions</a> | <a href="http://wj32.org/wp/2013/02/25/differentiation-done-correctly-5-maxima-and-minima/">5. Maxima and minima</a></p>
]]></content:encoded>
			<wfw:commentRss>http://wj32.org/wp/2013/02/23/differentiation-done-correctly-3-partial-derivatives/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Differentiation done correctly: 2. Higher derivatives</title>
		<link>http://wj32.org/wp/2013/02/22/differentiation-done-correctly-2-higher-derivatives/</link>
		<comments>http://wj32.org/wp/2013/02/22/differentiation-done-correctly-2-higher-derivatives/#comments</comments>
		<pubDate>Fri, 22 Feb 2013 04:58:31 +0000</pubDate>
		<dc:creator>wj32</dc:creator>
				<category><![CDATA[Mathematics]]></category>

		<guid isPermaLink="false">http://wj32.org/wp/?p=905</guid>
		<description><![CDATA[Navigation: 1. The derivative &#124; 2. Higher derivatives &#124; 3. Partial derivatives &#124; 4. Inverse and implicit functions &#124; 5. Maxima and minima Last time, we covered the definition of the derivative and its basic properties, which all turn out &#8230; <a href="http://wj32.org/wp/2013/02/22/differentiation-done-correctly-2-higher-derivatives/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Navigation: <a href="http://wj32.org/wp/2013/02/21/differentiation-done-correctly-1-the-derivative/">1. The derivative</a> | <strong>2. Higher derivatives</strong> | <a href="http://wj32.org/wp/2013/02/23/differentiation-done-correctly-3-partial-derivatives/">3. Partial derivatives</a> | <a href="http://wj32.org/wp/2013/02/24/differentiation-done-correctly-4-inverse-and-implicit-functions/">4. Inverse and implicit functions</a> | <a href="http://wj32.org/wp/2013/02/25/differentiation-done-correctly-5-maxima-and-minima/">5. Maxima and minima</a></p>
<p>Last time, we covered the definition of the derivative and its basic properties, which all turn out to be quite similar to their single variable counterparts. Now we are going to explore higher derivatives. In traditional multivariable calculus, true higher derivatives do not exist (<a href="http://en.wikipedia.org/wiki/Hessian_matrix">except in a specific situation</a> which will be discussed in part 5). Of course, we have so-called &#8220;mixed/higher partial derivatives&#8221;, which are coordinate-dependent and notationally tricky to work with. As a consequence, the usual statement of Taylor&#8217;s theorem in \(\mathbb{R}^n\) ends up being ugly and hard to remember. In reality, Taylor&#8217;s theorem for Banach spaces looks almost exactly the same as the single variable Taylor&#8217;s theorem!</p>
<p><span id="more-905"></span></p>
<p>Recall that for any Banach space \(F\), the space of continuous linear maps \(L(E,F)\) is also a Banach space. If \(A\subseteq E\) is an open set and \(f:A\to F\) is differentiable, then \(Df=f':A\to L(E,F)\) is a map between Banach spaces. Therefore we may consider the second derivative $$D^2 f=f^{\prime\prime}:A\to L(E,L(E,F))$$ obtained by differentiating \(f'\). Continuing the process, we have higher order derivatives $$D^p f = f^{(p)} : A \to L^p(E,F),$$ where \(L^p(E,F)=L(E,L(E,\dots,L(E,F)\dots))\). It is clear that \(D^p(f+g)=D^p f + D^p g\) and \(D^p(cf)=cD^p f\) for all scalars \(c\). We say that \(f\) is <strong>of class \(C^p\)</strong> or is <strong>\(p\) times continuously differentiable</strong> if \(D^p f(x)\) exists for all \(x\in A\) and \(D^p f\) is continuous on \(A\). If \(f\) is of class \(C^p\), then \(D^k f\) is automatically continuous for all \(0 \le k < p\) as well, because each such \(D^k f\) is differentiable and a differentiable map is continuous. We identify \(L^p(E,F)\) with the space of continuous multilinear maps \(L(E,\dots,E;F)\), and write \(D^p f(x)(v_1,\dots,v_p)\) for \(D^p f(x)(v_1)\cdots(v_p)\).</p>
<p>Our definition of \(p\) times continuous differentiability is not the usual one that is stated in terms of mixed partial derivatives, but part 3 will make it clear that these definitions are equivalent.</p>
<p>We have the following extension of <a href="http://wj32.org/wp/2013/02/21/differentiation-done-correctly-1-the-derivative/#id-8">Theorem 8</a>, which follows easily by induction.</p>
<p><a name="id-18"></a><br />
<strong>Theorem 18.</strong> <em>Let \(A\subseteq E\) be an open set, let \(F_1,\dots,F_n\) be Banach spaces, let \(f:A\to F_1\times\cdots\times F_n\) and let \(f_i=\pi_i\circ f\) be the component functions of \(f\), where \(\pi_i:F_1\times\cdots\times F_n\to F_i\) is the canonical projection. Then \(f\) is of class \(C^p\) if and only if every \(f_i\) is of class \(C^p\). In that case we have \(D^p f_i(x)=\pi_i\circ D^p f(x)\), i.e. $$<br />
D^p f(x) = \begin{bmatrix}D^p f_1(x) \\ \vdots \\ D^p f_n(x)\end{bmatrix}.<br />
$$</em></p>
<p><strong>Theorem 19.</strong> <em>Let \(A\subseteq E\) and \(B\subseteq F\) be open sets. Let \(f:A\to F\) and \(g:B\to G\) be class \(C^p\) maps with \(f(A)\subseteq B\). Then \(g\circ f\) is of class \(C^p\).</em></p>
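<p><em>Proof.</em> We use induction on \(p\). The case \(p=0\) holds because a composition of continuous maps is continuous. Assume that the result holds for \(p-1\) and suppose \(f\) and \(g\) are of class \(C^p\). By the chain rule we have $$D(g\circ f)(x)=Dg(f(x))\circ Df(x).$$ As a function of \(x\), the right hand side is obtained by applying the continuous bilinear map \((\alpha,\beta)\mapsto\alpha\circ\beta\) from \(L(F,G)\times L(E,F)\) to \(L(E,G)\), which is of class \(C^\infty\), to the pair \((Dg\circ f,Df)\). Here \(Df\) is of class \(C^{p-1}\), and \(Dg\circ f\) is of class \(C^{p-1}\) by the induction hypothesis, so the pair is of class \(C^{p-1}\) by Theorem 18. A further application of the induction hypothesis shows that \(D(g\circ f)\) is of class \(C^{p-1}\), and therefore \(g\circ f\) is of class \(C^p\). \(\square\)</p>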
<h2>Symmetry</h2>
<p>An important fact is that \(D^p f(x)\) is always symmetric (as a multilinear map) if \(f\) is of class \(C^p\). In part 3, we will show that the well-known <a href="http://en.wikipedia.org/wiki/Symmetry_of_second_derivatives">equality of mixed partial derivatives</a> is a special case of this. To prove this result, we start with the case \(p=2\).</p>
<p><strong>Lemma 20.</strong> <em>Let \(\varphi:E\times E\to F\) be a bilinear map. If there is a real-valued function \(\psi\) defined for sufficiently small \((v,w)\in E\times E\) such that $$<br />
\lim_{(v,w)\to (0,0)} \psi(v,w) = 0<br />
$$ and $$<br />
|\varphi(v,w)| \le |\psi(v,w)||v||w|,<br />
$$ then \(\varphi=0\).</em></p>
<p><em>Proof.</em> Let \(v,w\in E\). For sufficiently small \(s>0\) we have $$<br />
|\varphi(sv,sw)| \le |\psi(sv,sw)||sv||sw|,<br />
$$ so $$<br />
s^2|\varphi(v,w)| \le s^2|\psi(sv,sw)||v||w|.<br />
$$ Dividing by \(s^2\) and taking \(s\to 0\) proves the result. \(\square\)</p>
<p><a name="id-21"></a><br />
<strong>Theorem 21.</strong> <em>Let \(A\subseteq E\) be an open set and let \(f:A\to F\) be a class \(C^2\) map. Then for every \(x\in A\), the bilinear map \(D^2 f(x)\) is symmetric. That is, $$<br />
D^2 f(x)(v,w)=D^2 f(x)(w,v)<br />
$$ for all \(v,w\in E\).</em></p>
<p><em>Proof.</em> Let \(x\in A\) and choose \(r > 0\) so that the open ball of radius \(r\) around \(x\) is contained in \(A\). Let \(v,w\in E\) with \(|v|,|w| < r/2\), and define \(g(y)=f(y+v)-f(y)\) for \(|y-x|<r/2\). The <a href="http://wj32.org/wp/2013/02/21/differentiation-done-correctly-1-the-derivative/#id-14">mean value theorem</a> then gives \begin{align}<br />
g(x+w)-g(x) &#038;= \int_0^1 g'(x+tw)w\,dt \\<br />
&#038;= \int_0^1 [Df(x+v+tw)-Df(x+tw)]w\,dt \\<br />
&#038;= \int_0^1 \left( \int_0^1 D^2 f(x+sv+tw)v\,ds \right) w\,dt \\<br />
&#038;= \int_0^1\int_0^1 D^2 f(x)(v,w)\,ds\,dt + \int_0^1\int_0^1 \psi(sv,tw)(v,w)\,ds\,dt \\<br />
&#038;= D^2 f(x)(v,w)+\varphi(v,w)<br />
\end{align} where \(\psi(\alpha,\beta)=D^2 f(x+\alpha+\beta)-D^2 f(x)\) and $$<br />
\varphi(v,w)=\int_0^1 \int_0^1 \psi(sv,tw)(v,w)\,ds\,dt.<br />
$$ If we repeat the above process with \(g_1(y)=f(y+w)-f(y)\) in place of \(g\), we obtain $$<br />
g_1(x+v)-g_1(x)=D^2 f(x)(w,v)+\varphi(w,v).<br />
$$ Since \(g(x+w)-g(x)=g_1(x+v)-g_1(x)\), we have $$<br />
D^2 f(x)(w,v)-D^2 f(x)(v,w)=\varphi(v,w)-\varphi(w,v),<br />
$$ and from the definitions of \(\varphi\) and \(\psi\) we see that $$<br />
|D^2 f(x)(w,v)-D^2 f(x)(v,w)| \le 2 \sup_{0 \le s,t \le 1} |\psi(sv,tw)||v||w|.<br />
$$ Since \(D^2 f\) is continuous at \(x\), we have \(2\sup_{0\le s,t\le 1}|\psi(sv,tw)|\to 0\) as \((v,w)\to(0,0)\), so we can apply Lemma 20 to the bilinear map $$<br />
(v,w) \mapsto D^2 f(x)(w,v)-D^2 f(x)(v,w)<br />
$$ to obtain the result. \(\square\)</p>
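<p>The proof rests on the second difference \(g(x+w)-g(x)=f(x+v+w)-f(x+w)-f(x+v)+f(x)\), which is visibly symmetric in \(v\) and \(w\). The numerical sketch below uses a hypothetical \(f:\mathbb{R}^2\to\mathbb{R}\) whose Hessian \(H\) is computed by hand, so that \(D^2 f(x)(v,w)=v^{\mathsf T}Hw\). It shows the scaled difference \((f(x+sv+sw)-f(x+sw)-f(x+sv)+f(x))/s^2\) approaching \(D^2 f(x)(v,w)\) as \(s\to 0\), with the same value whichever order \(v,w\) are taken in.</p>

```python
import math

# Hypothetical C^3 example f: R^2 -> R, with its Hessian computed by hand.
def f(p):
    x1, x2 = p
    return x1**2 * x2 + math.sin(x1 * x2)

def hessian(p):
    x1, x2 = p
    s, c = math.sin(x1 * x2), math.cos(x1 * x2)
    f11 = 2 * x2 - x2**2 * s
    f12 = 2 * x1 + c - x1 * x2 * s
    f22 = -x1**2 * s
    return [[f11, f12], [f12, f22]]

def D2f(p, v, w):
    """D^2 f(p)(v, w) = v^T H w."""
    H = hessian(p)
    return sum(v[i] * H[i][j] * w[j] for i in range(2) for j in range(2))

def shift(q, u, t):
    """The point q + t u."""
    return (q[0] + t * u[0], q[1] + t * u[1])

def second_difference(p, v, w, s):
    """(f(p+sv+sw) - f(p+sw) - f(p+sv) + f(p)) / s^2, i.e. g(p+sw) - g(p) scaled."""
    delta = (f(shift(shift(p, v, s), w, s)) - f(shift(p, w, s))
             - f(shift(p, v, s)) + f(p))
    return delta / s**2

x, v, w = (0.5, 1.0), (1.0, 0.5), (-0.3, 2.0)
for s in (1e-1, 1e-2, 1e-3, 1e-4):
    print(s, second_difference(x, v, w, s), second_difference(x, w, v, s))
print(D2f(x, v, w), D2f(x, w, v))
```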
<p><a name="id-22"></a><br />
<strong>Theorem 22.</strong> <em>Let \(A\subseteq E\) be an open set and let \(f:A\to F\) be a class \(C^p\) map. Then for every \(x\in A\), the multilinear map \(D^p f(x)\) is symmetric.</em></p>
<p><em>Proof.</em> We use induction on \(p\), with Theorem 21 proving the case \(p=2\). Suppose the result holds for \(2,\dots,p-1\). If \(v_1,\dots,v_p\in E\), then \begin{align}<br />
D^p f(x)(v_1,\dots,v_p) &#038;= D^2 D^{p-2} f(x)(v_1,v_2)(v_3,\dots,v_p) \\<br />
&#038;= D^2 D^{p-2} f(x)(v_2,v_1)(v_3,\dots,v_p) \\<br />
&#038;= D^p f(x)(v_2,v_1,v_3,\dots,v_p)\tag{*}<br />
\end{align} by applying Theorem 21 to the \(C^2\) map \(D^{p-2}f\). Also, the induction hypothesis shows that $$<br />
D^{p-1}f(x)(v_{\sigma(2)},\dots,v_{\sigma(p)})=D^{p-1}f(x)(v_2,\dots,v_p)<br />
$$ for any permutation \(\sigma\) of \(\{2,\dots,p\}\). If \(\varphi_\sigma:L^{p-1}(E,F)\to F\) is the linear map given by \(\lambda\mapsto\lambda(v_{\sigma(2)},\dots,v_{\sigma(p)})\) then \begin{align}<br />
D^p f(x)(v_1,v_{\sigma(2)},\dots,v_{\sigma(p)}) &#038;= \varphi_\sigma(D^p f(x)(v_1)) \\<br />
&#038;= D(\varphi_\sigma\circ D^{p-1}f)(x)(v_1) \\<br />
&#038;= D(\varphi_e\circ D^{p-1}f)(x)(v_1) \\<br />
&#038;= \varphi_e(D^p f(x)(v_1)) \\<br />
&#038;= D^p f(x)(v_1,\dots,v_p)\tag{**}<br />
\end{align} where \(e\) is the identity permutation. Since any permutation of \(\{1,\dots,p\}\) can be expressed as a composition of the permutations considered in (*) and (**), \(D^p f(x)\) is symmetric. \(\square\)</p>
<h2>Taylor&#8217;s theorem</h2>
<p>Taylor&#8217;s theorem is an important tool for approximating a function based on its derivatives. It will be important in part 5, where we look at necessary and sufficient conditions for a point to be a local minimum or local maximum.</p>
<p><strong>Theorem 23</strong> (Taylor&#8217;s theorem). <em>Let \(A\subseteq E\) be an open set and let \(f:A\to F\) be a class \(C^p\) map. Let \(x\in A\) and let \(v\in E\). Assume that the line segment \(x+tv\) with \(0\le t\le1\) is contained in \(A\). Write \(v^{(k)}\) for the \(k\)-tuple \((v,\dots,v)\). Then $$<br />
f(x+v)=\sum_{k=0}^{p-1}\frac{D^k f(x)v^{(k)}}{k!} + R_p,<br />
$$ where $$<br />
R_p = \int_0^1 \frac{(1-t)^{p-1}}{(p-1)!} D^p f(x+tv)v^{(p)}\,dt.<br />
$$</em></p>
<p><em>Proof.</em> We use induction on \(p\), with <a href="http://wj32.org/wp/2013/02/21/differentiation-done-correctly-1-the-derivative/#id-14">the mean value theorem</a> proving the case \(p=1\). Assume that the result holds for \(p-1\). Let $$<br />
g(t)=\frac{(1-t)^{p-1}}{(p-1)!} \quad\mathrm{and}\quad h(t)=D^{p-1}f(x+tv)v^{(p-1)}<br />
$$ so that $$<br />
g'(t)=\frac{-(1-t)^{p-2}}{(p-2)!} \quad\mathrm{and}\quad h'(t)=D^p f(x+tv)v^{(p)}.<br />
$$ (Note that for convenience, we are again identifying \(h'(t)\) with an element of \(F\).) Applying <a href="http://wj32.org/wp/2013/02/21/differentiation-done-correctly-1-the-derivative/#id-12">integration by parts</a> with the vector space product \(\mathbb{R}\times F\to F\) gives \begin{align}<br />
&#038; \int_0^1 \frac{-(1-t)^{p-2}}{(p-2)!} D^{p-1}f(x+tv)v^{(p-1)}\,dt + \int_0^1 \frac{(1-t)^{p-1}}{(p-1)!} D^p f(x+tv)v^{(p)}\,dt \\<br />
&#038;= -\frac{1}{(p-1)!} D^{p-1}f(x)v^{(p-1)},<br />
\end{align} and the result follows. \(\square\)</p>
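<p>For \(p=2\) the theorem reads \(f(x+v)=f(x)+Df(x)v+\int_0^1(1-t)D^2 f(x+tv)(v,v)\,dt\). The sketch below checks this identity numerically for the hypothetical example \(f(x_1,x_2)=\sin(x_1)x_2^2\). The derivatives are computed by hand, and the remainder integral is approximated by composite Simpson's rule. The two sides should agree up to quadrature and rounding error.</p>

```python
import math

# Hypothetical C^2 example f(x1, x2) = sin(x1) * x2^2; derivatives computed by hand.
def f(p):
    return math.sin(p[0]) * p[1]**2

def Df(p, v):
    """Df(p)v."""
    return math.cos(p[0]) * p[1]**2 * v[0] + 2 * p[1] * math.sin(p[0]) * v[1]

def D2f(p, v, w):
    """D^2 f(p)(v, w)."""
    h11 = -math.sin(p[0]) * p[1]**2
    h12 = 2 * p[1] * math.cos(p[0])
    h22 = 2 * math.sin(p[0])
    return h11 * v[0] * w[0] + h12 * (v[0] * w[1] + v[1] * w[0]) + h22 * v[1] * w[1]

def simpson(func, a, b, n=400):
    """Composite Simpson's rule with an even number n of subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def R2(x, v):
    """Integral remainder for p = 2: int_0^1 (1 - t) D^2 f(x + t v)(v, v) dt."""
    def integrand(t):
        pt = (x[0] + t * v[0], x[1] + t * v[1])
        return (1 - t) * D2f(pt, v, v)
    return simpson(integrand, 0.0, 1.0)

x, v = (0.4, -1.1), (0.9, 0.6)
lhs = f((x[0] + v[0], x[1] + v[1]))
rhs = f(x) + Df(x, v) + R2(x, v)
print(lhs, rhs)
```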
<p><a name="id-24"></a><br />
<strong>Corollary 24</strong> (Taylor&#8217;s theorem with estimate). <em>In Theorem 23, we also have $$<br />
f(x+v)=\sum_{k=0}^p\frac{D^k f(x)v^{(k)}}{k!} + \theta(v)<br />
$$ where $$<br />
|\theta(v)| \le \sup_{0\le t\le 1} \frac{|D^p f(x+tv)-D^p f(x)|}{p!} |v|^p<br />
$$ and $$<br />
\lim_{v\to 0}\frac{\theta(v)}{|v|^p} = 0.<br />
$$</em></p>
<p><em>Proof.</em> Let \(\psi(\alpha) = D^p f(x+\alpha)-D^p f(x)\). We can write \(R_p\) as $$<br />
\int_0^1 \frac{(1-t)^{p-1}}{(p-1)!} D^p f(x)v^{(p)}\,dt + \int_0^1 \frac{(1-t)^{p-1}}{(p-1)!} \psi(tv)v^{(p)}\,dt.<br />
$$ The first integral equals \(D^p f(x)v^{(p)}/p!\), which is the \(k=p\) term of the sum. The second integral is bounded by $$<br />
\sup_{0\le t\le 1} |\psi(tv)||v|^p \int_0^1 \frac{(1-t)^{p-1}}{(p-1)!}\,dt = \frac{1}{p!} \sup_{0\le t\le 1} |\psi(tv)||v|^p.<br />
$$ The result follows from the continuity of \(D^p f\) at \(x\). \(\square\)</p>
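<p>The estimate says that \(\theta(v)=o(|v|^p)\). The numerical sketch below checks this for \(p=2\), using the hypothetical example \(f(x_1,x_2)=e^{x_1}\cos x_2\) with a hand-computed gradient and Hessian. Along \(v=su\) with \(s\to 0\), the ratio \(|\theta(v)|/|v|^2\) should shrink. For this smooth \(f\), the next Taylor term dominates the remainder, so the ratio should shrink roughly in proportion to \(s\).</p>

```python
import math

# Hypothetical example f(x1, x2) = exp(x1) cos(x2); derivatives computed by hand.
def f(x1, x2):
    return math.exp(x1) * math.cos(x2)

def taylor2(x, v):
    """sum_{k=0}^{2} D^k f(x) v^(k) / k!."""
    e, c, s = math.exp(x[0]), math.cos(x[1]), math.sin(x[1])
    grad = (e * c, -e * s)
    hess = ((e * c, -e * s), (-e * s, -e * c))
    linear = grad[0] * v[0] + grad[1] * v[1]
    quadratic = sum(v[i] * hess[i][j] * v[j] for i in range(2) for j in range(2))
    return f(*x) + linear + quadratic / 2

def theta_ratio(x, v):
    """|theta(v)| / |v|^2 for p = 2, using the max norm on R^2."""
    theta = f(x[0] + v[0], x[1] + v[1]) - taylor2(x, v)
    return abs(theta) / max(abs(v[0]), abs(v[1]))**2

x, u = (0.2, 0.3), (1.0, -2.0)
ratios = [theta_ratio(x, (s * u[0], s * u[1])) for s in (1e-1, 1e-2, 1e-3)]
print(ratios)
```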
<p>Next time, we will examine partial derivatives and how they relate to higher derivatives.</p>
<p>Navigation: <a href="http://wj32.org/wp/2013/02/21/differentiation-done-correctly-1-the-derivative/">1. The derivative</a> | <strong>2. Higher derivatives</strong> | <a href="http://wj32.org/wp/2013/02/23/differentiation-done-correctly-3-partial-derivatives/">3. Partial derivatives</a> | <a href="http://wj32.org/wp/2013/02/24/differentiation-done-correctly-4-inverse-and-implicit-functions/">4. Inverse and implicit functions</a> | <a href="http://wj32.org/wp/2013/02/25/differentiation-done-correctly-5-maxima-and-minima/">5. Maxima and minima</a></p>
]]></content:encoded>
			<wfw:commentRss>http://wj32.org/wp/2013/02/22/differentiation-done-correctly-2-higher-derivatives/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Differentiation done correctly: 1. The derivative</title>
		<link>http://wj32.org/wp/2013/02/21/differentiation-done-correctly-1-the-derivative/</link>
		<comments>http://wj32.org/wp/2013/02/21/differentiation-done-correctly-1-the-derivative/#comments</comments>
		<pubDate>Thu, 21 Feb 2013 03:20:12 +0000</pubDate>
		<dc:creator>wj32</dc:creator>
				<category><![CDATA[Mathematics]]></category>

		<guid isPermaLink="false">http://wj32.org/wp/?p=860</guid>
		<description><![CDATA[Navigation: 1. The derivative &#124; 2. Higher derivatives &#124; 3. Partial derivatives &#124; 4. Inverse and implicit functions &#124; 5. Maxima and minima In multivariable calculus, one often encounters nonsensical equations such as the following: (Chain rule.) If \(F=F(x,y)\) and &#8230; <a href="http://wj32.org/wp/2013/02/21/differentiation-done-correctly-1-the-derivative/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Navigation: <strong>1. The derivative</strong> | <a href="http://wj32.org/wp/2013/02/22/differentiation-done-correctly-2-higher-derivatives/">2. Higher derivatives</a> | <a href="http://wj32.org/wp/2013/02/23/differentiation-done-correctly-3-partial-derivatives/">3. Partial derivatives</a> | <a href="http://wj32.org/wp/2013/02/24/differentiation-done-correctly-4-inverse-and-implicit-functions/">4. Inverse and implicit functions</a> | <a href="http://wj32.org/wp/2013/02/25/differentiation-done-correctly-5-maxima-and-minima/">5. Maxima and minima</a></p>
<p>In multivariable calculus, one often encounters nonsensical equations such as the following:</p>
<p><span id="more-860"></span></p>
<blockquote><p>(Chain rule.) If \(F=F(x,y)\) and \(x=x(t)\), \(y=y(t)\) then $$\frac{dF}{dt}=\frac{\partial F}{\partial x}\frac{dx}{dt} + \frac{\partial F}{\partial y}\frac{dy}{dt}.$$</p></blockquote>
<p>(What is &#8220;\(F\)&#8221; on the left, and what is &#8220;\(F\)&#8221; on the right?) A partial derivative of a function \(f:\mathbb{R}^2\to\mathbb{R}\) is denoted by an assortment of vague expressions like $$<br />
\frac{\partial f}{\partial x} \quad\mathrm{or}\quad \frac{\partial f(x,y)}{\partial x} \quad\mathrm{or}\quad \frac{\partial f}{\partial x}(x,y) \quad\mathrm{or}\quad f_x,<br />
$$ which makes it hard to distinguish between the partial derivative as a function and the partial derivative at a point \((u,v)\):<br />
$$<br />
\frac{\partial f}{\partial x}(u,v) \quad\mathrm{or}\quad \left.\frac{\partial f(x,y)}{\partial x}\right|_{(x,y)=(u,v)} \quad\mathrm{or}\quad f_x(u,v) \quad\mathrm{?}<br />
$$<br />
Furthermore, it isn&#8217;t even clear what the difference between $$<br />
\frac{df}{dx} \quad\mathrm{and}\quad \frac{\partial f}{\partial x}<br />
$$ is!</p>
<p>Partial derivatives are usually introduced first because they provide an extension of differentiation in one variable that is convenient for computation. The so-called Jacobian matrix is relegated to a section on the change of variables theorem, which, as usual, is presented as a magic formula. Overall, one gets the impression that multivariable differentiation is far more complicated than single variable differentiation, and that the usual formulas do not apply.</p>
<p>This is all wrong. With the right definitions, the same old theorems (and even proofs) from single variable calculus still apply, with very few changes. I recently learned about this material from Lang&#8217;s <em><a href="http://www.amazon.com/Functional-Analysis-Graduate-Texts-Mathematics/dp/0387940014">Real and Functional Analysis</a></em>, which has an elegantly written chapter on differentiation. I will be writing 5 posts covering the basics of differentiation, starting with this one.</p>
<h2>Basic properties</h2>
<p><em>Note: if you do not know what a Banach space or a Banach algebra is, replace &#8220;Banach space&#8221; with &#8220;something like \(\mathbb{R}^n\) or \(\operatorname{Mat}_{n,m}(\mathbb{R})\) but might be infinite-dimensional&#8221;, and replace &#8220;Banach algebra&#8221; with &#8220;Banach space where you can multiply vectors&#8221;.</em></p>
<p>Here, \(E\), \(F\) and \(G\) will denote real Banach spaces. We write \(L(E,F)\) for the space of continuous linear maps from \(E\) to \(F\). All Banach algebras are assumed to be unital. As usual, if \(f:E\to F\) is a linear map then we often write \(fx\) instead of \(f(x)\). Although we treat the general case here, I recommend visualizing \(E,F\) as \(\mathbb{R}^n\) for concreteness.</p>
<p><strong>Definition 1.</strong> Let \(A \subseteq E\) be an open set and let \(f:A \to F\). A continuous linear map \(\lambda : E \to F\) is said to be the <strong>Fréchet derivative</strong> or just the <strong>derivative</strong> of \(f\) at a point \(x \in A\) if $$<br />
\lim_{h \to 0} \frac{f(x+h)-f(x)- \lambda h}{|h|} = 0,<br />
$$ and we write \(f'(x)=\lambda\) or \(Df(x)=\lambda\). If \(f\) has a derivative at a point \(x\), we say that \(f\) is <strong>differentiable at \(x\)</strong>. If \(f\) is differentiable at all \(x \in A\), then we say that \(f\) is <strong>differentiable on \(A\)</strong> or simply <strong>differentiable</strong>.</p>
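<p>To see the definition at work in a space where derivatives are not just numbers, consider the hypothetical example of squaring, \(f(X)=X^2\), on real \(2\times 2\) matrices. Expanding \((X+H)^2\) suggests the candidate \(\lambda H=XH+HX\). With this choice, \(f(X+H)-f(X)-\lambda H=H^2\), whose norm is at most a constant times \(|H|^2\). The sketch below checks numerically that the quotient in Definition 1 goes to \(0\). The matrices are arbitrary, and the max-entry norm is used for simplicity.</p>

```python
# Hypothetical example: f(X) = X X on real 2x2 matrices, with candidate
# derivative lambda(H) = X H + H X at the point X.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(A, B, c=1.0):
    """Entrywise A + c*B."""
    return [[A[i][j] + c * B[i][j] for j in range(2)] for i in range(2)]

def norm(A):
    """Max-entry norm (all norms on a finite-dimensional space are equivalent)."""
    return max(abs(a) for row in A for a in row)

def f(X):
    return matmul(X, X)

def lam(X, H):
    return add(matmul(X, H), matmul(H, X))

X = [[1.0, 2.0], [-0.5, 3.0]]
D = [[0.4, -1.0], [0.7, 0.2]]   # direction of approach, H = t D
quotients = []
for t in (1e-1, 1e-2, 1e-3, 1e-4):
    H = [[t * d for d in row] for row in D]
    remainder = add(add(f(add(X, H)), f(X), -1.0), lam(X, H), -1.0)  # equals H H
    quotients.append(norm(remainder) / norm(H))
print(quotients)
```

<p>Since the remainder is exactly \(H^2=t^2D^2\), each quotient equals \(t\,|D^2|/|D|\) up to rounding, which is \(0.66\,t\) for this \(D\).</p>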
<p>Obviously there is the question of uniqueness.</p>
<p><strong>Theorem 2.</strong> <em>Let \(A \subseteq E\) be an open set. The derivative of a function \(f:A\to F\) at a point \(x \in A\), if it exists, is unique.</em></p>
<p><em>Proof.</em> Suppose \(\lambda_1\) and \(\lambda_2\) are both derivatives of \(f\) at \(x\). Subtracting gives $$<br />
\lim_{h\to 0} \frac{(\lambda_2-\lambda_1)h}{|h|} = 0.<br />
$$ For a fixed nonzero \(u \in E\) we have $$<br />
\lim_{t\to 0^+} \frac{(\lambda_2-\lambda_1)tu}{|tu|} = 0,<br />
$$ and since the quotient inside the limit equals \((\lambda_2-\lambda_1)u/|u|\) for every \(t>0\), we have \((\lambda_2-\lambda_1)u=0\) for all \(u \in E\). Therefore \(\lambda_1=\lambda_2\). \(\square\)</p>
<p><strong>Theorem 3.</strong> <em>Let \(A \subseteq E\) be an open set. If \(f:A\to F\) is differentiable at \(x \in A\) then \(f\) is continuous at \(x\).</em></p>
<p><em>Proof.</em> Let \(\lambda=f'(x)\). For small nonzero \(h\) we can write $$<br />
f(x+h)-f(x)=|h|\left(\frac{f(x+h)-f(x)-\lambda h}{|h|}\right)+\lambda h.<br />
$$ The right hand side tends to \(0\) as \(h \to 0\), so \(f\) is continuous at \(x\). \(\square\)</p>
<p>Suppose \(f : A \to F\) is differentiable on \(A\). We have a map \(f':A\to L(E,F)\) that sends each point in \(A\) to the derivative of \(f\) at that point. If \(f'\) is continuous then we say that \(f\) is <strong>continuously differentiable</strong> or <strong>of class \(C^1\)</strong>. We will have more to say about this in a later post.</p>
<p><strong>Theorem 4</strong> (Chain rule). <em>Let \(A\subseteq E\) and \(B\subseteq F\) be open sets. Let \(f:A\to F\) and \(g:B\to G\) with \(f(A)\subseteq B\). If \(f\) is differentiable at \(x\) and \(g\) is differentiable at \(f(x)\), then \(g\circ f\) is differentiable at \(x\) and $$(g\circ f)'(x)=g'(f(x))\circ f'(x).$$</em></p>
<p><em>Proof.</em> To save space, let \(y=f(x)\) and define \begin{align}<br />
\phi(s) &#038;= f(x+s)-f(x)-f'(x)s, \\<br />
\psi(t) &#038;= g(y+t)-g(y)-g'(y)t, \\<br />
\rho(h) &#038;= g(f(x+h))-g(y)-g'(y)f'(x)h<br />
\end{align} so that $$<br />
\lim_{s\to 0}\frac{\phi(s)}{|s|} = \lim_{t\to 0}\frac{\psi(t)}{|t|}=0\tag{*}<br />
$$ since \(f\) is differentiable at \(x\) and \(g\) is differentiable at \(y\). We want to show that $$<br />
\lim_{h\to 0}\frac{\rho(h)}{|h|}=0.<br />
$$ For all sufficiently small \(h\), \begin{align}<br />
g(f(x+h))-g(y)&#038;=g(y+f'(x)h+\phi(h))-g(y) \\<br />
&#038;=g'(y)(f'(x)h+\phi(h))+\psi(f'(x)h+\phi(h))<br />
\end{align} and $$<br />
\rho(h)=g'(y)\phi(h)+\psi(f'(x)h+\phi(h)).<br />
$$ Since \(g'(y)\) is continuous, $$<br />
\lim_{h\to 0}\frac{g'(y)\phi(h)}{|h|} = g'(y)\left[\lim_{h\to 0}\frac{\phi(h)}{|h|}\right]=0,<br />
$$ and it remains to show that $$<br />
\lim_{h\to 0}\frac{\psi(f'(x)h+\phi(h))}{|h|}=0.<br />
$$ Let \(\varepsilon>0\) and let \(M=|f'(x)|\). If \(M=0\), replace \(M\) by \(1\) in what follows. By (*) there exist \(\delta,\gamma,\beta>0\) such that \(|\phi(s)|\le M|s|\) whenever \(|s|<\delta\), \(|\psi(t)|\le\varepsilon(2M)^{-1}|t|\) whenever \(|t|<\gamma\), and \(|f'(x)h+\phi(h)|<\gamma\) whenever \(|h|<\beta\). Then for all \(0<|h|<\min(\beta,\delta)\), \begin{align}<br />
\frac{|\psi(f'(x)h+\phi(h))|}{|h|} &#038;\le \frac{\varepsilon}{2M} \left(\frac{|f'(x)h|}{|h|}+\frac{|\phi(h)|}{|h|}\right) \\<br />
&#038;\le \varepsilon.<br />
\end{align} \(\square\)</p>
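<p>In coordinates, the chain rule makes the formula from the introduction precise. For \(f:\mathbb{R}\to\mathbb{R}^2\) and \(g:\mathbb{R}^2\to\mathbb{R}\), \((g\circ f)'(t)\) is the row vector \(g'(f(t))\) applied to the column vector \(f'(t)\). The following sketch compares this with a central difference quotient. The maps \(f(t)=(\cos t,t^2)\) and \(g(u_1,u_2)=u_1e^{u_2}\) are hypothetical examples, and their derivatives are computed by hand.</p>

```python
import math

# Hypothetical maps f: R -> R^2 and g: R^2 -> R, with derivatives by hand.
def f(t):
    return (math.cos(t), t**2)

def Df(t):
    """f'(t) as a linear map R -> R^2, stored as the image of 1."""
    return (-math.sin(t), 2 * t)

def g(u):
    return u[0] * math.exp(u[1])

def Dg(u):
    """g'(u) as a linear map R^2 -> R, stored as a row vector."""
    return (math.exp(u[1]), u[0] * math.exp(u[1]))

def chain_rule(t):
    """(g o f)'(t) = g'(f(t)) o f'(t), evaluated on 1."""
    row, col = Dg(f(t)), Df(t)
    return row[0] * col[0] + row[1] * col[1]

def central_difference(t, h=1e-6):
    return (g(f(t + h)) - g(f(t - h))) / (2 * h)

t = 0.8
print(chain_rule(t), central_difference(t))
```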
<p><a name="id-5"></a><br />
<strong>Theorem 5.</strong> <em>Let \(F_1,F_2\) be Banach spaces, let \(A\subseteq E\) be an open set and let \(f:A\to F_1\) and \(g:A\to F_2\) be differentiable at \(x\in A\).</em></p>
<ol>
<li><em>If \(f\) is constant then \(f'(x)=0\).</em></li>
<li><em>If \(f(y)=\lambda y\) for all \(y\in A\), where \(\lambda:E\to F_1\) is a continuous linear map, then \(f'(x)=\lambda\).</em></li>
<li><em>If \(F_1=F_2\) then \((f+g)'(x)=f'(x)+g'(x)\).</em></li>
<li><em>\((cf)'(x)=cf'(x)\) for all scalars \(c\).</em></li>
<li>(Product rule). <em>Suppose there is a continuous bilinear map \(\cdot:F_1\times F_2\to G\). Then $$<br />
(fg)'(x)=f'(x)g(x)+f(x)g'(x),<br />
$$ where \(f'(x)g(x)\) is the linear map that takes \(u\) to \(f'(x)u\cdot g(x)\), and \(f(x)g'(x)\) is the linear map that takes \(u\) to \(f(x)\cdot g'(x)u\).</em></li>
</ol>
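<p><em>Proof.</em> The first 4 parts are obvious, so we only prove (5). Since \(g\) is continuous at \(x\) (Theorem 3), \(g(x+h)\) stays bounded as \(h\to 0\). Because \(\cdot\) is continuous, it follows that \begin{align}<br />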
0 &#038;= \lim_{h\to 0}\frac{[f(x+h)-f(x)-f'(x)h]g(x+h)+f(x)[g(x+h)-g(x)-g'(x)h]}{|h|} \\<br />
&#038;= \lim_{h\to 0}\frac{(fg)(x+h)-(fg)(x)-[f'(x)g(x+h)+f(x)g'(x)]h}{|h|}.\tag{*}<br />
\end{align} Now $$<br />
\frac{|f&#8217;(x)h[g(x+h)-g(x)]|}{|h|} \le |f&#8217;(x)||g(x+h)-g(x)| \to 0<br />
$$ as \(h\to 0\) since \(\cdot\) is continuous and \(g\) is continuous at \(x\), so $$<br />
\lim_{h\to 0}\frac{f&#8217;(x)h[g(x+h)-g(x)]}{|h|} = 0.<br />
$$ Adding this to (*) gives $$<br />
\lim_{h\to 0}\frac{(fg)(x+h)-(fg)(x)-[f'(x)g(x)+f(x)g'(x)]h}{|h|}=0.<br />
$$ \(\square\)</p>
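<p>A quick numerical sanity check of the product rule, offered as a sketch only: here \(E=\mathbb{R}\), \(F_1=F_2=G=\mathbb{R}^3\), and the continuous bilinear map is the cross product. The cross product is not commutative, so the order of the factors in \(f'(x)g(x)+f(x)g'(x)\) matters. The curves <code>f</code> and <code>g</code> are arbitrary illustrative choices (not from the text), and derivatives are identified with vectors as usual for functions of one real variable.</p>

```python
import math


def cross(a, b):
    """Cross product on R^3, a continuous bilinear map R^3 x R^3 -> R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


# Hypothetical example curves f, g: R -> R^3 and their exact derivatives.
def f(t):
    return (math.cos(t), math.sin(t), t)


def df(t):
    return (-math.sin(t), math.cos(t), 1.0)


def g(t):
    return (t * t, 1.0, math.exp(t))


def dg(t):
    return (2.0 * t, 0.0, math.exp(t))


def product_rule(t):
    """f'(t) x g(t) + f(t) x g'(t), keeping each factor in its own slot."""
    left = cross(df(t), g(t))
    right = cross(f(t), dg(t))
    return tuple(p + q for p, q in zip(left, right))


def central_difference(t, h=1e-5):
    """Symmetric difference quotient of t -> f(t) x g(t)."""
    ahead = cross(f(t + h), g(t + h))
    behind = cross(f(t - h), g(t - h))
    return tuple((p - q) / (2 * h) for p, q in zip(ahead, behind))


t0 = 0.7
print(product_rule(t0))
print(central_difference(t0))
```

<p>Swapping the factors in either term (for example using <code>cross(g(t), df(t))</code>) flips the sign of that term, so the two printed vectors would in general no longer agree.</p>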
<p><strong>Theorem 6.</strong> <em>Let \(E\) be a Banach algebra with unit \(e\) and let \(U\) be the open set of its invertible elements. Then the map \(x\mapsto x^{-1}\) is differentiable on \(U\), and its derivative at a point \(x\) is given by $$u \mapsto -x^{-1}ux^{-1}.$$</em></p>
<p><em>Proof.</em> We have \begin{align}<br />
(x+h)^{-1}-x^{-1}+x^{-1}hx^{-1} &#038;= (x(e+x^{-1}h))^{-1}-x^{-1}+x^{-1}hx^{-1} \\<br />
&#038;= (e+x^{-1}h)^{-1}x^{-1}-x^{-1}+x^{-1}hx^{-1} \\<br />
&#038;= [(e+x^{-1}h)^{-1}-(e-x^{-1}h)]x^{-1}.\tag{*}<br />
\end{align} For sufficiently small \(h\) we have \(|e-(e+x^{-1}h)|=|x^{-1}h|\le|x^{-1}||h|<1/2\), so \(e+x^{-1}h\) is invertible with inverse given by the Neumann series, and \begin{align}<br />
|(e+x^{-1}h)^{-1}-(e-x^{-1}h)| &#038;= \left\vert \sum_{k=0}^\infty (-x^{-1}h)^k - (e-x^{-1}h) \right\vert \\<br />
&#038;= \left\vert \sum_{k=2}^\infty (-x^{-1}h)^k \right\vert \\<br />
&#038;\le \frac{|x^{-1}h|^2}{1-|x^{-1}h|} \\<br />
&#038;\le 2|x^{-1}h|^2 \le 2|x^{-1}|^2|h|^2,<br />
\end{align} since \(1-|x^{-1}h|>1/2\). Combining this with (*) gives \(|(x+h)^{-1}-x^{-1}+x^{-1}hx^{-1}|\le 2|x^{-1}|^3|h|^2\), so $$<br />
\frac{(x+h)^{-1}-x^{-1}+x^{-1}hx^{-1}}{|h|} \to 0<br />
$$ as \(h\to 0\). \(\square\)</p>
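<p>The quadratic size of the remainder can be observed numerically. The sketch below takes \(E\) to be the Banach algebra of real \(2\times 2\) matrices with the maximum-row-sum norm (an operator norm, so \(|e|=1\) and the norm is submultiplicative). The series estimate \(|(e+x^{-1}h)^{-1}-(e-x^{-1}h)|\le|x^{-1}h|^2/(1-|x^{-1}h|)\) from the proof, together with \(|x^{-1}h|\le 1/2\), bounds the remainder \((x+h)^{-1}-x^{-1}+x^{-1}hx^{-1}\) by \(2|x^{-1}|^3|h|^2\), and the code checks this for shrinking \(h\). The matrices <code>x</code> and <code>direction</code> are arbitrary illustrative choices.</p>

```python
def mat_mul(a, b):
    """Product of 2x2 matrices stored as tuples of rows."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2))
                       for j in range(2))
                 for i in range(2))


def mat_add(a, b, s=1.0):
    """Return a + s*b."""
    return tuple(tuple(a[i][j] + s * b[i][j] for j in range(2))
                 for i in range(2))


def scale(c, a):
    return tuple(tuple(c * a[i][j] for j in range(2)) for i in range(2))


def mat_inv(a):
    """Inverse of an invertible 2x2 matrix via the adjugate formula."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return ((a[1][1] / det, -a[0][1] / det),
            (-a[1][0] / det, a[0][0] / det))


def norm(a):
    """Maximum absolute row sum: the operator norm for the max norm on R^2."""
    return max(abs(a[i][0]) + abs(a[i][1]) for i in range(2))


def remainder(x, h):
    """(x+h)^{-1} - x^{-1} + x^{-1} h x^{-1}."""
    x_inv = mat_inv(x)
    r = mat_add(mat_inv(mat_add(x, h)), x_inv, -1.0)
    return mat_add(r, mat_mul(mat_mul(x_inv, h), x_inv))


x = ((2.0, 1.0), (0.5, 3.0))           # illustrative invertible matrix
direction = ((1.0, -2.0), (0.3, 0.7))  # illustrative direction for h
bound_const = 2 * norm(mat_inv(x)) ** 3
for k in range(1, 6):
    h = scale(10.0 ** -k, direction)
    r = norm(remainder(x, h))
    # |r|/|h| should shrink roughly tenfold per step, and r should stay
    # below 2|x^{-1}|^3 |h|^2 (here |x^{-1} h| <= 1/2 for every k).
    print(norm(h), r / norm(h), r <= bound_const * norm(h) ** 2)
```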
<p><strong>Corollary 7</strong> (Quotient rule). <em>Let \(F_1\) and \(G\) be Banach spaces, let \(F_2\) be a Banach algebra, and let \(U\) be the open set of the invertible elements in \(F_2\). Let \(A\subseteq E\) be an open set and let \(f:A\to F_1\) and \(g:A\to U\) be differentiable at \(x\in A\). Suppose there is a continuous bilinear map \(\cdot:F_1\times F_2\to G\). Write \(fg^{-1}\) for the map \((fg^{-1})(t)=f(t)g(t)^{-1}\). Then \((fg^{-1})'(x)\) is given by $$<br />
u \mapsto [f'(x)u]g(x)^{-1}-f(x)g(x)^{-1}[g'(x)u]g(x)^{-1}.<br />
$$ In particular, if \(F_2\) is commutative then $$<br />
(f/g)'(x)=[f'(x)g(x)-f(x)g'(x)]g(x)^{-2}.<br />
$$</em></p>
<p><em>Proof.</em> By the chain rule and Theorem 6, \(t\mapsto g(t)^{-1}\) is differentiable at \(x\) with derivative \(u\mapsto -g(x)^{-1}[g'(x)u]g(x)^{-1}\). Now apply the product rule (Theorem 5) to \(f\) and \(g^{-1}\). \(\square\)</p>
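<p>A numerical sketch of the noncommutative quotient rule, under illustrative assumptions: \(E=\mathbb{R}\), \(F_1=F_2=G\) is the algebra of real \(2\times 2\) matrices, and \(\cdot\) is matrix multiplication. The curves <code>f</code> and <code>g</code> below are hypothetical choices; the formula for \((fg^{-1})'(t)\) (evaluated at \(u=1\)) is compared with a symmetric difference quotient of \(t\mapsto f(t)g(t)^{-1}\).</p>

```python
import math


def mul(a, b):
    """Product of 2x2 matrices stored as tuples of rows."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))


def sub(a, b):
    return tuple(tuple(a[i][j] - b[i][j] for j in range(2)) for i in range(2))


def inv(a):
    """Inverse of an invertible 2x2 matrix via the adjugate formula."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return ((a[1][1] / det, -a[0][1] / det), (-a[1][0] / det, a[0][0] / det))


# Hypothetical matrix-valued curves and their exact derivatives.
def f(t):
    return ((math.sin(t), t), (1.0, t * t))


def df(t):
    return ((math.cos(t), 1.0), (0.0, 2 * t))


def g(t):
    return ((2.0 + t, math.cos(t)), (t, 3.0))


def dg(t):
    return ((1.0, -math.sin(t)), (1.0, 0.0))


def quotient_rule(t):
    """f'(t) g(t)^{-1} - f(t) g(t)^{-1} g'(t) g(t)^{-1}, i.e. the formula at u = 1."""
    gi = inv(g(t))
    return sub(mul(df(t), gi), mul(mul(mul(f(t), gi), dg(t)), gi))


def central_difference(t, h=1e-5):
    """Symmetric difference quotient of t -> f(t) g(t)^{-1}."""
    ahead = mul(f(t + h), inv(g(t + h)))
    behind = mul(f(t - h), inv(g(t - h)))
    return tuple(tuple((ahead[i][j] - behind[i][j]) / (2 * h) for j in range(2))
                 for i in range(2))


t0 = 0.4
print(quotient_rule(t0))
print(central_difference(t0))
```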
<h2>Linear maps and direct sums</h2>
<p>Before continuing with the properties of the derivative, we describe a generalization of the concept of block matrices. Suppose \(E_1,\dots,E_m\) and \(F_1,\dots,F_n\) are vector spaces, and \(\lambda:E_1\times\cdots\times E_m\to F_1\times\cdots\times F_n\) is a linear map. We note that in the case of finitely many vector spaces, the notions of direct sum (coproduct) and direct product (product) coincide. Therefore, we have unique linear maps \(\lambda_{i,j}:E_j\to F_i\) such that \(\lambda_{i,j}=\pi_i\circ\lambda\circ\iota_j\), where \(\pi_i:F_1\times\cdots\times F_n\to F_i\) is the canonical projection and \(\iota_j:E_j\to E_1\times\cdots\times E_m\) is the canonical injection. In this case we write $$<br />
\lambda=\begin{bmatrix}<br />
\lambda_{1,1} &#038; \cdots &#038; \lambda_{1,m} \\<br />
\vdots &#038; \ddots &#038; \vdots \\<br />
\lambda_{n,1} &#038; \cdots &#038; \lambda_{n,m}<br />
\end{bmatrix}$$ and say that the matrix <strong>represents</strong> \(\lambda\), and sometimes that the maps \(\lambda_{i,j}\) are the <strong>components</strong> of \(\lambda\). If \(\tau:F_1\times\cdots\times F_n\to G_1\times\cdots\times G_p\) is another linear map, then it is easy to verify that \(\tau\circ\lambda\) is represented by the product of the matrix representing \(\tau\) with the matrix representing \(\lambda\), with the usual matrix multiplication formula. On the other hand, if we start with linear maps \(\lambda_{i,j}\), then there is a unique linear map \(\lambda\) that has the components \(\lambda_{i,j}\). The situation is entirely analogous to block matrices in \(\mathbb{R}\) or \(\mathbb{C}\), so we do not go any further.</p>
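<p>For finite-dimensional spaces \(E_j=\mathbb{R}^{m_j}\) and \(F_i=\mathbb{R}^{n_i}\), the components \(\lambda_{i,j}=\pi_i\circ\lambda\circ\iota_j\) are exactly the blocks of an ordinary matrix. The sketch below (plain Python with matrices as lists of rows; the helper names and example matrices are illustrative, not from the text) builds \(\pi_i\) and \(\iota_j\) explicitly and checks that the components of \(\tau\circ\lambda\) obey the block multiplication formula \((\tau\circ\lambda)_{i,j}=\sum_k\tau_{i,k}\circ\lambda_{k,j}\).</p>

```python
def matmul(a, b):
    """Matrix product; a is n x m, b is m x p, both stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def matadd(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def injection(dims, j):
    """iota_j : R^{dims[j]} -> R^{dims[0]} x ... x R^{dims[-1]}."""
    offset = sum(dims[:j])
    return [[1 if r == offset + c else 0 for c in range(dims[j])]
            for r in range(sum(dims))]


def projection(dims, i):
    """pi_i : R^{dims[0]} x ... x R^{dims[-1]} -> R^{dims[i]}."""
    offset = sum(dims[:i])
    return [[1 if c == offset + r else 0 for c in range(sum(dims))]
            for r in range(dims[i])]


def component(lam, out_dims, in_dims, i, j):
    """lambda_{i,j} = pi_i o lambda o iota_j."""
    return matmul(matmul(projection(out_dims, i), lam), injection(in_dims, j))


def blocks_compose(tau, lam, out_dims, mid_dims, in_dims):
    """Check (tau o lam)_{i,j} == sum_k tau_{i,k} o lam_{k,j} for all i, j."""
    composite = matmul(tau, lam)
    for i in range(len(out_dims)):
        for j in range(len(in_dims)):
            total = None
            for k in range(len(mid_dims)):
                term = matmul(component(tau, out_dims, mid_dims, i, k),
                              component(lam, mid_dims, in_dims, k, j))
                total = term if total is None else matadd(total, term)
            if total != component(composite, out_dims, in_dims, i, j):
                return False
    return True


# Illustrative example: E_1 x E_2 = R^2 x R^1, F_1 x F_2 = R^1 x R^2, G_1 = R^2.
lam = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
tau = [[1, 0, -1], [2, 1, 0]]
print(component(lam, [1, 2], [2, 1], 0, 0))           # [[1, 2]]
print(blocks_compose(tau, lam, [2], [1, 2], [2, 1]))  # True
```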
<p><a name="id-8"></a><br />
<strong>Theorem 8.</strong> <em>Let \(A\subseteq E\) be an open set, let \(F_1,\dots,F_n\) be Banach spaces, let \(f:A\to F_1\times\cdots\times F_n\) and let \(f_i=\pi_i\circ f\) be the component functions of \(f\), where \(\pi_i:F_1\times\cdots\times F_n\to F_i\) is the canonical projection. Then \(f\) is differentiable at a point \(x\in A\) if and only if every \(f_i\) is differentiable at \(x\). In that case we have \(f'_i(x)=\pi_i\circ f'(x)\), i.e. $$<br />
f'(x) = \begin{bmatrix}f'_1(x) \\ \vdots \\ f'_n(x)\end{bmatrix}.<br />
$$</em></p>
<p><em>Proof.</em> If \(\lambda\) is a linear map then the \(i\)th entry (with respect to the direct sum decomposition) of $$<br />
T(h)=\frac{f(x+h)-f(x)-\lambda h}{|h|}<br />
$$ is simply $$<br />
T_i(h)=\frac{f_i(x+h)-f_i(x)-\pi_i\lambda h}{|h|}.<br />
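$$ Therefore \(T(h)\) approaches \(0\) as \(h\to 0\) if and only if every \(T_i(h)\) approaches \(0\) as \(h\to 0\). If \(f\) is differentiable at \(x\), taking \(\lambda=f'(x)\) shows that each \(f_i\) is differentiable at \(x\) with \(f'_i(x)=\pi_i\circ f'(x)\). Conversely, if every \(f_i\) is differentiable at \(x\), taking \(\lambda\) to be the linear map with components \(f'_1(x),\dots,f'_n(x)\) shows that \(f\) is differentiable at \(x\). \(\square\)</p>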
<h2>The fundamental theorem of calculus</h2>
<p>The definition of the derivative extends naturally to closed intervals (with more than one point) in \(\mathbb{R}\), using one-sided limits at the endpoints. If \(f:[a,b]\to E\) is differentiable and \(x\in[a,b]\), then \(f'(x)\) is a linear map from \(\mathbb{R}\) to \(E\), so it is determined by the vector \(f'(x)(1)\in E\). For convenience, we identify \(f'(x)\) with this vector. This coincides with the elementary definition of the derivative as $$<br />
f'(x)=\lim_{h\to 0}\frac{f(x+h)-f(x)}{h},<br />
$$ where \(h\) ranges over nonzero numbers with \(x+h\in[a,b]\).</p>
<p><strong>Lemma 9.</strong> <em>Let \(f:[a,b]\to E\) be differentiable. If \(f'(x)=0\) for all \(x\in[a,b]\), then \(f\) is constant.</em></p>
<p><em>Proof.</em> Suppose \(f(t)\ne f(a)\) for some \(t\in[a,b]\), and use the Hahn-Banach theorem to choose a continuous linear functional \(\lambda:E\to\mathbb{R}\) such that \(\lambda(f(t))\ne \lambda(f(a))\). Then \(\lambda\circ f:[a,b]\to\mathbb{R}\) is differentiable and \((\lambda\circ f)'(x)=\lambda(f'(x))=0\) for all \(x\in[a,b]\), so \(\lambda\circ f\) is constant by the mean value theorem for real functions. This is a contradiction. \(\square\)</p>
<p>Integration is an essential tool in the study of differentiation. Here we assume the existence of an integral that can integrate Banach space valued continuous functions defined on closed intervals, e.g. the <a href="http://en.wikipedia.org/wiki/Bochner_integral">Bochner integral</a> or the <a href="http://en.wikipedia.org/wiki/Regulated_integral">regulated integral</a>. However, we only use the most basic properties of the integral, such as linearity and the norm estimate \(|\int_a^b f|\le (b-a)\sup_{t\in[a,b]}|f(t)|\).</p>
<p><strong>Theorem 10</strong> (Fundamental theorem of calculus). <em>Let \(f:[a,b]\to E\) be an integrable function, and suppose that \(f\) is continuous at \(x\in[a,b]\). Then the map $$t\mapsto\int_a^t f$$ is differentiable at \(x\) and its derivative is \(f(x)\).</em></p>
<p><em>Proof.</em> We have \begin{align}<br />
\frac{1}{|h|}\left\vert \int_a^{x+h} f - \int_a^x f - f(x)h \right\vert &#038;= \frac{1}{|h|}\left\vert \int_x^{x+h} [f(t)-f(x)]\,dt \right\vert \\<br />
&#038;\le \sup_t |f(t)-f(x)| \\<br />
&#038;\to 0<br />
\end{align} as \(h\to 0\), where the \(\sup\) is taken over all \(t\) between \(x\) and \(x+h\) where \(f(t)\) is defined. \(\square\)</p>
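<p>The estimate in the proof says that the average of \(f\) over the interval between \(x\) and \(x+h\) differs from \(f(x)\) by at most \(\sup_t|f(t)-f(x)|\). The sketch below illustrates this for an \(\mathbb{R}^2\)-valued function (with the maximum norm) that is continuous but not differentiable at \(x=0.5\), so only continuity at \(x\) is used. The integral is approximated by a midpoint rule, and the function, node count and step sizes are illustrative assumptions.</p>

```python
import math


def f(t):
    """Illustrative R^2-valued integrand; continuous, with a kink at t = 0.5."""
    return (math.cos(t), abs(t - 0.5))


def sup_norm(v):
    return max(abs(c) for c in v)


def diff(u, v):
    return tuple(p - q for p, q in zip(u, v))


def quotient_error(x, h, n=1000):
    """Return (|(1/h) * int_x^{x+h} f - f(x)|, max |f(t) - f(x)| over the nodes).

    The integral is approximated by the midpoint rule with n nodes, so
    (1/h) * integral becomes the plain average of f over the nodes; this
    also works for negative h.
    """
    nodes = [x + (k + 0.5) * h / n for k in range(n)]
    values = [f(t) for t in nodes]
    average = tuple(sum(v[i] for v in values) / n for i in range(2))
    error = sup_norm(diff(average, f(x)))
    sup_bound = max(sup_norm(diff(v, f(x))) for v in values)
    return error, sup_bound


for h in (0.1, -0.1, 0.01, -0.01, 0.001):
    print(h, quotient_error(0.5, h))
```

<p>The error shrinks with \(|h|\) and always stays below the sampled supremum, mirroring the inequality in the proof.</p>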
<p><strong>Corollary 11.</strong> <em>Let \(f:[a,b]\to E\) be continuous, let \(F:[a,b]\to E\), and suppose that \(F'(x)=f(x)\) for all \(x\in[a,b]\). Then $$<br />
\int_a^b f = F(b)-F(a).<br />
$$</em></p>
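<p><em>Proof.</em> By Theorem 10, the map $$x \mapsto F(x) - \int_a^x f$$ has derivative \(F'(x)-f(x)=0\) at every \(x\in[a,b]\), so it is constant by Lemma 9. Comparing its values at \(a\) and \(b\) gives the result. \(\square\)</p>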
<p><a name="id-12"></a><br />
<strong>Corollary 12</strong> (Integration by parts). <em>Let \(E_1,E_2,F\) be Banach spaces and suppose there is a continuous bilinear map \(\cdot:E_1\times E_2\to F\). Let \(f:[a,b]\to E_1\) and \(g:[a,b]\to E_2\) be continuously differentiable functions. Then $$\int_a^b f'g + \int_a^b fg' = f(b)g(b)-f(a)g(a).$$</em></p>
<p><em>Proof.</em> We have \((fg)'=f'g+fg'\) by the product rule, and \(f'g+fg'\) is continuous because \(f\), \(g\), \(f'\) and \(g'\) are continuous. Integrating both sides from \(a\) to \(b\) and applying Corollary 11 produces the result. \(\square\)</p>
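<p>As a quick check under illustrative assumptions, take \(E_1=E_2=F=\mathbb{R}^3\) with the cross product as the continuous bilinear map, pick two smooth curves on \([0,1]\) (hypothetical choices), and compare both sides of the integration by parts formula, evaluating the integrals with the composite Simpson rule.</p>

```python
import math


def cross(a, b):
    """Cross product on R^3, a continuous bilinear map."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def add(a, b):
    return tuple(p + q for p, q in zip(a, b))


# Hypothetical C^1 curves f, g: [0, 1] -> R^3 with their derivatives.
def f(t):
    return (math.cos(t), math.sin(t), t)


def df(t):
    return (-math.sin(t), math.cos(t), 1.0)


def g(t):
    return (t * t, 1.0 - t, math.exp(t))


def dg(t):
    return (2.0 * t, -1.0, math.exp(t))


def simpson(h, a, b, n=200):
    """Composite Simpson rule for an R^3-valued function h on [a, b]; n even."""
    step = (b - a) / n
    total = (0.0, 0.0, 0.0)
    for k in range(n + 1):
        w = 1 if k in (0, n) else (4 if k % 2 else 2)
        total = add(total, tuple(w * c for c in h(a + k * step)))
    return tuple(c * step / 3 for c in total)


a, b = 0.0, 1.0
lhs = add(simpson(lambda t: cross(df(t), g(t)), a, b),
          simpson(lambda t: cross(f(t), dg(t)), a, b))
rhs = tuple(p - q for p, q in zip(cross(f(b), g(b)), cross(f(a), g(a))))
print(lhs)
print(rhs)
```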
<h2>Mean value inequalities</h2>
<p>Let \(\alpha:[a,b]\to L(E,F)\) be a continuous map into the Banach space \(L(E,F)\) of continuous linear maps from \(E\) to \(F\). If \(x\in[a,b]\) and \(y\in E\) then we write \(\alpha(x)y\) for the element \(\alpha(x)(y)\in F\).</p>
<p><strong>Lemma 13.</strong> <em>Let \(\alpha:[a,b]\to L(E,F)\) be a continuous map and let \(y\in E\). Then $$<br />
\int_a^b \alpha(t)y\,dt = \left(\int_a^b \alpha(t)\,dt\right)y.<br />
$$</em></p>
<p><em>Proof.</em> The map \(\lambda\mapsto\lambda(y)\) is a continuous linear map from \(L(E,F)\) to \(F\), so the result follows from the general fact that $$\varphi \int_X f = \int_X \varphi \circ f$$ whenever \(\varphi\) is a continuous linear map between Banach spaces. \(\square\)</p>
<p><a name="id-14"></a><br />
<strong>Theorem 14</strong> (Mean value theorem). <em>Let \(A\subseteq E\) be an open set, let \(f:A\to F\) be continuously differentiable, let \(x\in A\), and let \(v\in E\). If the line segment \(x+tv\) with \(0\le t\le 1\) is contained in \(A\), then $$<br />
f(x+v)-f(x)=\int_0^1 f'(x+tv)v\,dt=\left(\int_0^1 f'(x+tv)\,dt\right)v.$$</em></p>
<p><em>Proof.</em> Let \(g(t)=f(x+tv)\), so that \(g'(t)=f'(x+tv)v\) by the chain rule; in particular \(g'\) is continuous. By the fundamental theorem of calculus (Corollary 11), we have $$<br />
g(1)-g(0)=\int_0^1 g'.<br />
$$ Since \(g(0)=f(x)\) and \(g(1)=f(x+v)\), this is the first equality, and the second equality follows from Lemma 13. \(\square\)</p>
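<p>Both forms of the identity can be checked numerically. The sketch below uses an illustrative map \(f:\mathbb{R}^2\to\mathbb{R}^2\) with an explicitly coded Jacobian, an arbitrary base point and increment, and the composite Simpson rule on \([0,1]\). It integrates \(t\mapsto f'(x+tv)v\) directly, and separately integrates the matrix \(t\mapsto f'(x+tv)\) before applying it to \(v\), as in Lemma 13.</p>

```python
import math


def f(p):
    """Illustrative C^1 map f: R^2 -> R^2 (an assumption, not from the text)."""
    x, y = p
    return (x * x * y, math.sin(x) + y)


def jacobian(p):
    """f'(p) as a 2x2 matrix (tuple of rows)."""
    x, y = p
    return ((2 * x * y, x * x), (math.cos(x), 1.0))


def apply(m, v):
    return tuple(m[i][0] * v[0] + m[i][1] * v[1] for i in range(2))


def simpson(g, n=200):
    """Composite Simpson rule on [0, 1] for a function returning a flat tuple; n even."""
    step = 1.0 / n
    acc = None
    for k in range(n + 1):
        w = 1 if k in (0, n) else (4 if k % 2 else 2)
        val = g(k * step)
        if acc is None:
            acc = [w * c for c in val]
        else:
            acc = [a + w * c for a, c in zip(acc, val)]
    return tuple(a * step / 3 for a in acc)


p = (0.3, -1.2)   # illustrative base point x
v = (0.8, 0.5)    # illustrative increment v


def point(t):
    """The point x + tv on the segment."""
    return (p[0] + t * v[0], p[1] + t * v[1])


lhs = tuple(b - a for a, b in zip(f(p), f(point(1.0))))
# Integrate t -> f'(x + tv)v directly ...
rhs_vector = simpson(lambda t: apply(jacobian(point(t)), v))
# ... or integrate t -> f'(x + tv) entrywise first, then apply it to v (Lemma 13).
flat = simpson(lambda t: tuple(c for row in jacobian(point(t)) for c in row))
rhs_matrix = apply(((flat[0], flat[1]), (flat[2], flat[3])), v)
print(lhs, rhs_vector, rhs_matrix)
```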
<p>The mean value theorem shows that the change in \(f\) is controlled by the size of its derivative along the segment and by the change in its argument. The corollary below makes this precise.</p>
<p><a name="id-15"></a><br />
<strong>Corollary 15</strong> (Mean value inequality). <em>Let \(A\subseteq E\) be an open set, let \(f:A\to F\) be continuously differentiable, and let \(x,y\in A\). If the line segment between \(x\) and \(y\) is contained in \(A\), then $$<br />
|f(y)-f(x)| \le |y-x| \sup_u |f'(u)|,<br />
$$ where the \(\sup\) is taken over all \(u\) in the line segment. If \(z\in A\), then $$<br />
|f(y)-f(x)-f'(z)(y-x)| \le |y-x| \sup_u |f'(u)-f'(z)|,<br />
$$ with the \(\sup\) as above.</em></p>
<p><em>Proof.</em> We have \begin{align}<br />
|f(y)-f(x)| &#038;= \left\vert \int_0^1 f'(x+t(y-x))(y-x)\,dt \right\vert \\<br />
&#038;\le |y-x|(1-0)\sup_{t\in[0,1]} |f'(x+t(y-x))|,<br />
\end{align} which proves the first statement. For the second, apply this result to the map \(g(v)=f(v)-f'(z)v\), which satisfies \(g'(u)=f'(u)-f'(z)\) and \(g(y)-g(x)=f(y)-f(x)-f'(z)(y-x)\). \(\square\)</p>
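<p>Both inequalities can be spot-checked numerically. The sketch below uses an illustrative map \(f:\mathbb{R}^2\to\mathbb{R}^2\) with the maximum norm on \(\mathbb{R}^2\), so that \(|f'(u)|\) is the largest absolute row sum of the Jacobian matrix. The supremum over the segment is approximated by a maximum over finitely many sample points; this can only make the right-hand sides smaller, so a passing check is meaningful, but it is of course not a proof.</p>

```python
import math


def f(p):
    """Illustrative C^1 map f: R^2 -> R^2 (an assumption, not from the text)."""
    x, y = p
    return (x * x * y, math.sin(x) + y)


def jacobian(p):
    """f'(p) as a 2x2 matrix (tuple of rows)."""
    x, y = p
    return ((2 * x * y, x * x), (math.cos(x), 1.0))


def apply(m, v):
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])


def mat_sub(a, b):
    return tuple(tuple(p - q for p, q in zip(ra, rb)) for ra, rb in zip(a, b))


def vec_norm(v):
    """Maximum norm on R^2."""
    return max(abs(v[0]), abs(v[1]))


def op_norm(m):
    """Operator norm induced by the maximum norm: the largest absolute row sum."""
    return max(abs(m[0][0]) + abs(m[0][1]), abs(m[1][0]) + abs(m[1][1]))


def mean_value_check(x, y, z, samples=1001):
    """Return (lhs1, rhs1, lhs2, rhs2) for the two inequalities of Corollary 15.

    The sup over the segment is replaced by a max over equally spaced samples,
    which can only make the right-hand sides smaller.
    """
    d = (y[0] - x[0], y[1] - x[1])
    segment = []
    for k in range(samples):
        s = k / (samples - 1)
        segment.append((x[0] + s * d[0], x[1] + s * d[1]))
    jz = jacobian(z)
    sup1 = max(op_norm(jacobian(u)) for u in segment)
    sup2 = max(op_norm(mat_sub(jacobian(u), jz)) for u in segment)
    change = (f(y)[0] - f(x)[0], f(y)[1] - f(x)[1])
    jz_d = apply(jz, d)
    lhs2 = vec_norm((change[0] - jz_d[0], change[1] - jz_d[1]))
    return vec_norm(change), vec_norm(d) * sup1, lhs2, vec_norm(d) * sup2


print(mean_value_check((0.3, -1.2), (1.1, -0.7), (0.3, -1.2)))
```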
<p><a name="id-16"></a><br />
<strong>Corollary 16</strong> (Lipschitz estimate for \(C^1\) maps). <em>Let \(A\subseteq E\) be a convex open set and let \(f:A\to F\) be continuously differentiable. If there is a constant \(M\) such that \(|f'(x)| \le M\) for all \(x\in A\), then $$<br />
|f(x)-f(y)| \le M|x-y|<br />
$$ for all \(x,y\in A\).</em></p>
<p>This allows us to generalize Lemma 9.</p>
<p><a name="id-17"></a><br />
<strong>Corollary 17.</strong> <em>Let \(A\subseteq E\) be a connected open set and suppose that the derivative of \(f:A\to F\) is zero on \(A\). Then \(f\) is constant.</em></p>
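<p><em>Proof.</em> If \(x\in A\) and \(B_r(x)\) is any open ball around \(x\) contained in \(A\), then Corollary 16 (applied with \(M=0\); note that \(f\) is continuously differentiable because \(f'=0\)) shows that \(f\) is constant on \(B_r(x)\). Now fix \(x_0\in A\). The set \(\{x\in A: f(x)=f(x_0)\}\) is nonempty, open by what we have just shown, and closed in \(A\) because \(f\) is continuous. Since \(A\) is connected, this set is all of \(A\), so \(f\) is constant on \(A\). \(\square\)</p>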
<p>Next time, we will look at higher derivatives and the symmetries that arise.</p>
<p>Navigation: <strong>1. The derivative</strong> | <a href="http://wj32.org/wp/2013/02/22/differentiation-done-correctly-2-higher-derivatives/">2. Higher derivatives</a> | <a href="http://wj32.org/wp/2013/02/23/differentiation-done-correctly-3-partial-derivatives/">3. Partial derivatives</a> | <a href="http://wj32.org/wp/2013/02/24/differentiation-done-correctly-4-inverse-and-implicit-functions/">4. Inverse and implicit functions</a> | <a href="http://wj32.org/wp/2013/02/25/differentiation-done-correctly-5-maxima-and-minima/">5. Maxima and minima</a></p>
]]></content:encoded>
			<wfw:commentRss>http://wj32.org/wp/2013/02/21/differentiation-done-correctly-1-the-derivative/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Coursera first impressions</title>
		<link>http://wj32.org/wp/2013/01/28/coursera-first-impressions/</link>
		<comments>http://wj32.org/wp/2013/01/28/coursera-first-impressions/#comments</comments>
		<pubDate>Mon, 28 Jan 2013 09:50:39 +0000</pubDate>
		<dc:creator>wj32</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://wj32.org/wp/?p=851</guid>
		<description><![CDATA[I recently signed up for a few Coursera courses, and overall it&#8217;s been a fun experience. Here are some of my comments. Game Theory So far the instructors have explained normal form games, Nash equilibrium, Pareto optimality, mixed strategies and &#8230; <a href="http://wj32.org/wp/2013/01/28/coursera-first-impressions/">Continue reading <span class="meta-nav">&#8594;</span></a><div class="crp_related"><h3>Related Posts:</h3><ul><li><a href="http://wj32.org/wp/software/wjs-backup/"     class="crp_title">WJ&#8217;s Backup</a></li><li><a href="http://wj32.org/wp/mathematics/"     class="crp_title">Mathematics</a></li><li><a href="http://wj32.org/wp/2012/04/09/triangles-in-a-triangle/"     class="crp_title">(How many) triangles in a triangle?</a></li><li><a href="http://wj32.org/wp/2012/12/21/the-both-open-and-closed-trick-for-connected-spaces/"     class="crp_title">The &#8220;both open and closed&#8221; trick for connected&hellip;</a></li><li><a href="http://wj32.org/wp/2012/12/14/adsense-on-process-hackers-website/"     class="crp_title">AdSense on Process Hacker&#8217;s website</a></li></ul></div>]]></description>
				<content:encoded><![CDATA[<p>I recently signed up for a few Coursera courses, and overall it&#8217;s been a fun experience. Here are some of my comments.</p>
<h2><a href="https://www.coursera.org/course/gametheory">Game Theory</a></h2>
<p>So far the instructors have explained normal form games, Nash equilibrium, Pareto optimality, mixed strategies and maxmin strategies. The definitions are pretty clear and plenty of examples follow. The graded problem sets are usually fairly easy and sometimes include interesting examples not found in the videos. However, there are no proofs of any theorems or results. In fact, the theorems themselves are not even written down in any precise way &#8211; the instructors seem to be intent on avoiding mathematical notation here for some reason. Verdict: <strong>Enrol.</strong></p>
<h2><a href="https://www.coursera.org/course/modernworld">The Modern World: Global History since 1760</a></h2>
<p>Starting near the end of the Commercial Revolution, the instructor explores global history all the way up to the present. There are usually one or two questions at the end of each video to determine whether you have been paying attention. The instructor (Philip Zelikow from the University of Virginia) really shows his enthusiasm for the subject in the videos. Verdict: <strong>Enrol!</strong></p>
<h2><a href="https://www.coursera.org/course/images">Image and video processing: From Mars to Hollywood with a stop at the hospital</a></h2>
<p>The instructor begins by giving many applications of image processing, and then gets straight into topics like Huffman coding and the discrete cosine transform (no proofs). I haven&#8217;t watched many of the videos yet. Verdict: <strong>???</strong></p>
<h2><a href="https://www.coursera.org/course/introfinance">Introduction to Finance</a></h2>
<p>The instructor is clearly excited about the subject, but keeps going off on tangents. Furthermore, he takes around an hour to explain the compound interest formula \(P(1+r/n)^{nt}\), something that should take 5 minutes at most. Use Khan Academy instead. Verdict: <strong>Avoid.</strong></p>
<h2><a href="https://www.coursera.org/course/introACpartI">Analytic Combinatorics, Part I</a></h2>
<p>Finally, a course in which the forums aren&#8217;t full of complaints about &#8220;too much math&#8221;. Sedgewick&#8217;s voice can be a little boring at times, but the content should be extremely interesting to anyone who knows about generating functions. Verdict: <strong>Enrol!</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://wj32.org/wp/2013/01/28/coursera-first-impressions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Some series convergence problems</title>
		<link>http://wj32.org/wp/2013/01/26/some-series-convergence-problems/</link>
		<comments>http://wj32.org/wp/2013/01/26/some-series-convergence-problems/#comments</comments>
		<pubDate>Sat, 26 Jan 2013 11:30:54 +0000</pubDate>
		<dc:creator>wj32</dc:creator>
				<category><![CDATA[Mathematics]]></category>

		<guid isPermaLink="false">http://wj32.org/wp/?p=842</guid>
		<description><![CDATA[Here are some series convergence problems that I gathered quite a while ago. A few of them are a bit tricky. Determine convergence/divergence for the nonnegative series below. Prove your answers. \(\displaystyle\sum_{n=2}^\infty\frac{1}{(\log n)^p}\) for any real \(p\) \(\displaystyle\sum_{n=2}^\infty\frac{1}{(\log n)^n}\) \(\displaystyle\sum_{n=2}^\infty\frac{1}{(\log &#8230; <a href="http://wj32.org/wp/2013/01/26/some-series-convergence-problems/">Continue reading <span class="meta-nav">&#8594;</span></a><div class="crp_related"><h3>Related Posts:</h3><ul><li><a href="http://wj32.org/wp/2011/10/30/power-series-of-tanx-cotx-cscx/"     class="crp_title">Power series of tan(x), cot(x), csc(x)</a></li><li><a href="http://wj32.org/wp/2012/12/15/formula-for-the-circumference-of-an-ellipse/"     class="crp_title">Formula for the circumference of an ellipse</a></li><li><a href="http://wj32.org/wp/2012/04/09/triangles-in-a-triangle/"     class="crp_title">(How many) triangles in a triangle?</a></li><li><a href="http://wj32.org/wp/2013/02/28/frechet-derivative-of-the-matrix-exponential-function/"     class="crp_title">Fréchet derivative of the (matrix) exponential function</a></li><li><a href="http://wj32.org/wp/2012/10/10/strictly-positive-extensions-of-linear-functionals/"     class="crp_title">Strictly positive extensions of linear functionals</a></li></ul></div>]]></description>
				<content:encoded><![CDATA[<p>Here are some series convergence problems that I gathered quite a while ago. A few of them are a bit tricky.</p>
<p><span id="more-842"></span></p>
<p>Determine convergence/divergence for the nonnegative series below. Prove your answers.</p>
<ol>
<li>\(\displaystyle\sum_{n=2}^\infty\frac{1}{(\log n)^p}\) for any real \(p\)</li>
<li>\(\displaystyle\sum_{n=2}^\infty\frac{1}{(\log n)^n}\)</li>
<li>\(\displaystyle\sum_{n=2}^\infty\frac{1}{(\log n)^{\log n}}\)</li>
<li>\(\displaystyle\sum_{n=3}^\infty\frac{1}{(\log n)^{\log\log n}}\)</li>
<li>\(\displaystyle\sum_{n=2}^\infty\frac{1}{n \log n}\)</li>
<li>\(\displaystyle\sum_{n=3}^\infty\frac{1}{n (\log n) (\log\log n)^2}\)</li>
<li>\(\displaystyle\sum_{n=2}^\infty\frac{\log n}{e^{\sqrt{n}}}\)</li>
<li>\(\displaystyle\sum_{n=0}^\infty\frac{n!}{e^{n^2}}\)</li>
<li>\(\displaystyle\sum_{n=1}^\infty\log\left(1+\frac{1}{\sqrt{n}}\right)\)</li>
<li>\(\displaystyle\sum_{n=1}^\infty(\sqrt[n]{n}-1)^n\)</li>
<li>\(\displaystyle\sum_{n=1}^\infty(\sqrt[n]{n}-1)\)</li>
</ol>
<h2>Answers/Hints</h2>
<ol>
<li>No, compare with harmonic series.</li>
<li>Yes, compare with geometric series or use ratio/root test.</li>
<li>Yes, compare with \(\sum 1/n^2\).</li>
<li>No, compare with harmonic series.</li>
<li>No, use integral test.</li>
<li>Yes, use integral test.</li>
<li>Yes, compare with \(\sum 1/n^{3/2}\).</li>
<li>Yes, compare with geometric series.</li>
<li>No, compare with \(\log(1+1/n)\) and consider partial sums.</li>
<li>Yes, use root test.</li>
<li>No, compare with harmonic series. It may help to show that \((1+1/n)^n \le 3\) for all \(n\).</li>
</ol>
<h2>Very Big Hints</h2>
<ol>
<li>Obvious for \(p \le 0\). For \(p > 0\), we have \(\log n < n^{1/p}\) for sufficiently large \(n\).</li>
<li>\((\log n)^n > 2^n\) for sufficiently large \(n\).</li>
<li>\((\log n)^{\log n} = e^{(\log n)(\log \log n)} = n^{\log\log n} > n^2\) for sufficiently large \(n\).</li>
<li>\((\log n)^{\log\log n} = e^{(\log\log n)^2} < e^{(\sqrt{\log n})^2} = n\) for sufficiently large \(n\).</li>
<li>Integrate by taking \(u = \log x\).</li>
<li>Integrate by taking \(u = \log\log x\).</li>
<li>For sufficiently large \(n\) we have \(\log n < n^{1/4}\) and \(e^{\sqrt{n}} > e^{(\log n)^2} = n^{\log n} > n^2\).</li>
<li>\(n!/e^{n^2} < n^n/e^{n^2} = (n/e^n)^n < (1/2)^n\) for sufficiently large \(n\).</li>
<li>\(\log(1+1/\sqrt{n}) \ge \log(1+1/n) = \log(n+1) - \log n\), and \(\sum_{n=1}^N \log(1+1/n) = \log(N+1)\).</li>
<li>Use the root test since \(\sqrt[n]{n}-1 \rightarrow 0\). Alternatively, for \(n \ge 2\) we have $$<br />
n = (1+(\sqrt[n]{n}-1))^n \ge \frac{n(n-1)}{2} (\sqrt[n]{n}-1)^2<br />
$$ by the binomial theorem, so $$<br />
\sqrt[n]{n}-1 \le \sqrt{\frac{2}{n-1}}.<br />
$$ Now $$<br />
(\sqrt[n]{n}-1)^n \le \left(\frac{2}{n-1}\right)^{n/2},<br />
$$ and the comparison test can be applied.</li>
<li>For all \(n\) we have $$<br />
\left(1+\frac{1}{n}\right)^n = \sum_{k=0}^n \binom{n}{k} \frac{1}{n^k} = \sum_{k=0}^n \left(\frac{1}{k!}\right) \frac{n(n-1) \cdots (n-k+1)}{n^k} \le \sum_{k=0}^n \frac{1}{k!} \le 3,<br />
$$ so \((1+1/n)^n \le 3 \le n\) for all \(n \ge 3\). Therefore \(1+1/n \le \sqrt[n]{n}\), and the comparison test can be applied.</li>
</ol>
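<p>The elementary inequalities in hints 10 and 11 are easy to spot-check by machine. The sketch below tests them in floating point for \(2\le n\le 10000\); a finite check like this is only a sanity test and proves nothing about the series, and the cutoff <code>n_max</code> is an arbitrary choice.</p>

```python
import math


def check_hints(n_max=10000):
    """Spot-check the inequalities from hints 10 and 11 for 2 <= n <= n_max."""
    for n in range(2, n_max + 1):
        excess = n ** (1.0 / n) - 1          # n^(1/n) - 1
        # Hint 11: (1 + 1/n)^n <= 3.
        if (1 + 1 / n) ** n > 3:
            return False
        # Hint 10: n^(1/n) - 1 <= sqrt(2 / (n - 1)).
        if excess > math.sqrt(2 / (n - 1)):
            return False
        # Hint 11: n^(1/n) - 1 >= 1/n once n >= 3, so the terms dominate 1/n.
        if n >= 3 and excess < 1 / n:
            return False
    return True


print(check_hints())  # True
```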
]]></content:encoded>
			<wfw:commentRss>http://wj32.org/wp/2013/01/26/some-series-convergence-problems/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
