Convex functions, second derivatives and Hessian matrices

In single variable calculus, a twice differentiable function \(f:(a,b)\to\mathbb{R}\) is convex if and only if \(f^{\prime\prime}(x)\ge 0\) for all \(x\in(a,b)\). It is not too hard to extend this result to functions defined on more general spaces:

Theorem. Let \(A\subseteq\mathbb{R}^n\) be a convex open set and let \(f:A\to\mathbb{R}\). Suppose that \(f^{\prime\prime}\) exists on \(A\). Then \(f\) is convex if and only if \(f^{\prime\prime}(x)\) is positive semidefinite for all \(x\in A\).
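As a quick numerical illustration of the theorem (a minimal sketch using NumPy; the function \(f(x,y)=x^4+y^2\) and the sample grid are my own choices, not from the theorem), one can check that the Hessian of a convex function is positive semidefinite at sample points by inspecting its eigenvalues:

```python
import numpy as np

def hessian_f(x, y):
    """Hessian of f(x, y) = x**4 + y**2, an example convex function."""
    return np.array([[12 * x**2, 0.0],
                     [0.0,       2.0]])

# Positive semidefiniteness check: all eigenvalues of the symmetric
# Hessian should be >= 0 (up to floating-point tolerance).
for x in np.linspace(-2, 2, 9):
    for y in np.linspace(-2, 2, 9):
        eigenvalues = np.linalg.eigvalsh(hessian_f(x, y))
        assert np.all(eigenvalues >= -1e-12)
print("Hessian positive semidefinite at all sample points")
```

Of course, sampling points only illustrates the condition; the theorem requires positive semidefiniteness at every point of \(A\).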

Hessian matrices

Combining the previous theorem with the higher derivative test for Hessian matrices gives us the following result for functions defined on convex open subsets of \(\mathbb{R}^n\):

Let \(A\subseteq\mathbb{R}^n\) be a convex open set and let \(f:A\to\mathbb{R}\) be twice differentiable. Write \(H(x)\) for the Hessian matrix of \(f\) at \(x\in A\).

  1. If \(f'(x)=0\) and \(H(x)\) is positive definite, then \(f\) has a strict local minimum at \(x\).
  2. If \(f'(x)=0\) and \(H(x)\) is negative definite, then \(f\) has a strict local maximum at \(x\).
  3. If \(f'(x)=0\) and \(H(x)\) has both positive and negative eigenvalues, then \(f\) does not have a local minimum or a local maximum at \(x\). That is, \(f\) has a saddle point at \(x\).
  4. If \(H(x)\) is positive semidefinite for all \(x\in A\), then \(f\) is convex and has a strict global minimum at any \(x\) for which \(f'(x)=0\) and \(H(x)\) is positive definite.
  5. If \(H(x)\) is negative semidefinite for all \(x\in A\), then \(f\) is concave and has a strict global maximum at any \(x\) for which \(f'(x)=0\) and \(H(x)\) is negative definite.
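Cases 1–3 above can be packaged as a small classifier on the Hessian's eigenvalues (a sketch; `classify_critical_point` and its tolerance are hypothetical names of my own, and the function assumes \(f'(x)=0\) has already been checked):

```python
import numpy as np

def classify_critical_point(H, tol=1e-10):
    """Classify a critical point from its symmetric Hessian H,
    following cases 1-3 above.  Assumes the gradient vanishes."""
    eigenvalues = np.linalg.eigvalsh(H)
    if np.all(eigenvalues > tol):
        return "strict local minimum"
    if np.all(eigenvalues < -tol):
        return "strict local maximum"
    if np.any(eigenvalues > tol) and np.any(eigenvalues < -tol):
        return "saddle point"
    return "inconclusive"  # some eigenvalue is (numerically) zero

# f(x, y) = x**2 - y**2 has a critical point at the origin with
# Hessian diag(2, -2): one positive and one negative eigenvalue.
print(classify_critical_point(np.diag([2.0, -2.0])))  # saddle point
```

Note the "inconclusive" branch: when some eigenvalue is zero and none is negative, the Hessian at that point alone does not decide, which is why cases 4 and 5 impose a condition on all of \(A\).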

Since the determinant of a matrix is the product of its eigenvalues, we also have this special case:

Let \(A\subseteq\mathbb{R}^2\) be a convex open set and let \(f:A\to\mathbb{R}\) be twice differentiable. Write \(H(x)\) for the Hessian matrix of \(f\) at \(x\in A\): $$H(x)=\begin{bmatrix} a & b \\ b & d\end{bmatrix}.$$ (Note that \(a,b,d\) are functions of \(x\).)

  1. If \(f'(x)=0\), \(ad-b^2 > 0\) and \(a > 0\), then \(f\) has a strict local minimum at \(x\).
  2. If \(f'(x)=0\), \(ad-b^2 > 0\) and \(a < 0\), then \(f\) has a strict local maximum at \(x\).
  3. If \(f'(x)=0\) and \(ad-b^2 < 0\), then \(f\) has a saddle point at \(x\).
  4. If \(ad-b^2 \ge 0\) and \(a,d \ge 0\) for all \(x\in A\), then \(f\) is convex and has a strict global minimum at any \(x\) for which \(f'(x)=0\), \(ad-b^2 > 0\) and \(a > 0\).
  5. If \(ad-b^2 \ge 0\) and \(a,d \le 0\) for all \(x\in A\), then \(f\) is concave and has a strict global maximum at any \(x\) for which \(f'(x)=0\), \(ad-b^2 > 0\) and \(a < 0\).
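For a concrete instance of the two-variable test (my own example, not from the text above): \(f(x,y)=x^2+xy+y^2\) has the constant Hessian entries \(a=2\), \(b=1\), \(d=2\), so the determinant condition can be checked directly:

```python
# Second-derivative test for f(x, y) = x**2 + x*y + y**2.
# The Hessian is constant here: a = 2, b = 1, d = 2.
a, b, d = 2.0, 1.0, 2.0
det = a * d - b**2   # = 3 > 0

# Case 1: with f'(0, 0) = 0, det > 0 and a > 0 give a strict local
# minimum at the origin.  Since det >= 0 and a, d >= 0 hold at every
# point, case 4 upgrades this to a strict global minimum.
assert det > 0 and a > 0
print("ad - b^2 =", det)
```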

Proof of the theorem

We will prove a slightly more general statement.

Let \(E\) be a Banach space, let \(A\subseteq E\) be a convex open set, and let \(f:A\to\mathbb{R}\). We say that \(f\) is convex if $$
f(x+\lambda(y-x)) \le f(x)+\lambda(f(y)-f(x))
$$ for all \(x,y\in A\) and \(\lambda\in(0,1)\).

Theorem. Let \(A\subseteq E\) be a convex open set and let \(f:A\to\mathbb{R}\). Suppose that \(f^{\prime\prime}\) exists on \(A\). Then \(f\) is convex if and only if \(f^{\prime\prime}(x)\) is positive semidefinite for all \(x\in A\).

Here, \(f^{\prime\prime}(x):E\times E\to\mathbb{R}\) is a bilinear form and \(f^{\prime\prime}(x)\) is said to be positive semidefinite if \(f^{\prime\prime}(x)(h,h)\ge 0\) for all \(h\in E\).

The idea of the proof is quite simple: restrict \(f\) to line segments that lie in \(A\) and use the single variable case mentioned at the start of this post. In order to do this, we need a formula for the second derivative of a composition of maps between Banach spaces: \begin{align}
D^2(g\circ f)(x)(u,v) &= D^2 g(f(x))(Df(x)(u),Df(x)(v)) \\
&\qquad + Dg(f(x))(D^2 f(x)(u,v)).
\end{align} The proof, which is just a long computation, is included at the end of this post. Note that when \(f,g\) are maps from \(\mathbb{R}\) to \(\mathbb{R}\), we recover the formula $$
(g\circ f)^{\prime\prime}(x)=g^{\prime\prime}(f(x))[f'(x)]^2+g'(f(x))f^{\prime\prime}(x).
$$
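The single-variable formula can be sanity-checked numerically with central finite differences (a sketch; the test functions \(f=\sin\), \(g=\exp\), the point \(x=0.7\), and the step size are my own choices):

```python
import math

# Verify (g o f)''(x) = g''(f(x)) f'(x)**2 + g'(f(x)) f''(x)
# for f(x) = sin(x) and g(y) = exp(y).
f, df, d2f = math.sin, math.cos, lambda x: -math.sin(x)
g, dg, d2g = math.exp, math.exp, math.exp

def second_derivative(h, x, eps=1e-5):
    """Central finite-difference approximation to h''(x)."""
    return (h(x + eps) - 2 * h(x) + h(x - eps)) / eps**2

x = 0.7
lhs = second_derivative(lambda t: g(f(t)), x)
rhs = d2g(f(x)) * df(x)**2 + dg(f(x)) * d2f(x)
assert abs(lhs - rhs) < 1e-4
print("chain-rule formula for the second derivative checks out")
```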

Proof. Let \(x,y\in A\) and define \(\gamma:(-\varepsilon,1+\varepsilon)\to A\) by \(\gamma(\lambda)=x+\lambda(y-x)\), where \(\varepsilon > 0\) is chosen small enough that \(\gamma\) maps into \(A\); this is possible because \(A\) is open and convex. Then \(D\gamma(\lambda)(1)=y-x\) and \(D^2\gamma(\lambda)(1,1)=0\). Using the above formula, we get \begin{align}
(f\circ\gamma)^{\prime\prime}(\lambda) &= D^2(f\circ\gamma)(\lambda)(1,1) \\
&= f^{\prime\prime}(\gamma(\lambda))(y-x,y-x).\tag{*}
\end{align} If \(f^{\prime\prime}(z)\) is positive semidefinite for all \(z\in A\) then \((f\circ\gamma)^{\prime\prime}(\lambda)\ge 0\) for all \(\lambda\), so \(f\circ\gamma\) is convex. Then for all \(\lambda\in(0,1)\), $$
(f\circ\gamma)(0+\lambda(1-0)) \le (f\circ\gamma)(0)+\lambda[(f\circ\gamma)(1)-(f\circ\gamma)(0)]
$$ and $$
f(x+\lambda(y-x)) \le f(x)+\lambda(f(y)-f(x)),
$$ which proves that \(f\) is convex. Conversely, suppose that \(f\) is convex. Let \(z\in A\) and choose \(r > 0\) so that the open ball of radius \(r\) around \(z\) is contained in \(A\). Let \(\|u\| < r\) and set \(x=z-u/2\) and \(y=z+u/2\), so that \(\gamma(1/2)=z\) and \(y-x=u\). Then (*) gives \begin{align} f^{\prime\prime}(z)(u,u) &= f^{\prime\prime}(\gamma(1/2))(y-x,y-x) \\ &= (f\circ\gamma)^{\prime\prime}(1/2) \\ &\ge 0 \end{align} since \(f\circ\gamma\) is convex. This shows that \(f^{\prime\prime}(z)\) is positive semidefinite for all \(z\in A\). \(\square\)

The second derivative formula


Lemma (Chain rule for the second derivative). Let \(E,F,G\) be Banach spaces. Let \(A\subseteq E\) and \(B\subseteq F\) be open sets. Let \(f:A\to F\) and \(g:B\to G\) with \(f(A)\subseteq B\). If \(D^2 f\) exists on \(A\) and \(D^2 g\) exists on \(B\), then \(D^2(g\circ f)\) exists on \(A\) and \begin{align}
D^2(g\circ f)(x)(u,v) &= D^2 g(f(x))(Df(x)(u),Df(x)(v)) \\
&\qquad + Dg(f(x))(D^2 f(x)(u,v)).
\end{align}

Proof. We can write \(D(g\circ f)=c\circ d\circ e\), where
\begin{align}
c:L(F,G)\times L(E,F) & \to L(E,G)\\
(\lambda,\mu) & \mapsto\lambda\circ\mu;\\
d:E\times E & \to L(F,G)\times L(E,F)\\
(x,y) & \mapsto((Dg\circ f)(x),Df(y));\\
e:E & \to E\times E\\
x & \mapsto(x,x).
\end{align} Note that \(c\) is a continuous bilinear map and \(e\) is a continuous
linear map. We compute \begin{align}
D^{2}(g\circ f)(x) & =Dc((d\circ e)(x))\circ Dd(e(x))\circ De(x)\\
& =Dc(Dg(f(x)),Df(x))\circ Dd(x,x)\circ e.
\end{align} Now $$
Dd(x,y)(u,v)=(D^{2}g(f(x))(Df(x)(u)),D^{2}f(y)(v)),
$$ so
\begin{align}
D^{2}(g\circ f)(x)(u) & =Dc(Dg(f(x)),Df(x))(Dd(x,x)(u,u))\\
& =Dc(Dg(f(x)),Df(x))(D^{2}g(f(x))(Df(x)(u)),D^{2}f(x)(u))\\
& =D^{2}g(f(x))(Df(x)(u))\circ Df(x)\\
& \qquad+Dg(f(x))\circ D^{2}f(x)(u)
\end{align} and \begin{align}
D^{2}(g\circ f)(x)(u)(v) & =D^{2}g(f(x))(Df(x)(u))(Df(x)(v))\\
& \qquad+Dg(f(x))(D^{2}f(x)(u)(v)).
\end{align} \(\square\)
