In single variable calculus, a twice differentiable function \(f:(a,b)\to\mathbb{R}\) is convex if and only if \(f^{\prime\prime}(x)\ge 0\) for all \(x\in(a,b)\). It is not too hard to extend this result to functions defined on more general spaces:

**Theorem.** Let \(A\subseteq\mathbb{R}^n\) be a convex open set and let \(f:A\to\mathbb{R}\). Suppose that \(f^{\prime\prime}\) exists on \(A\). Then \(f\) is convex if and only if \(f^{\prime\prime}(x)\) is positive semidefinite for all \(x\in A\).

## Hessian matrices

Combining the previous theorem with the higher derivative test for Hessian matrices gives us the following result for functions defined on convex open subsets of \(\mathbb{R}^n\):

Let \(A\subseteq\mathbb{R}^n\) be a convex open set and let \(f:A\to\mathbb{R}\) be twice differentiable. Write \(H(x)\) for the Hessian matrix of \(A\) at \(x\in A\).

- If \(f'(x)=0\) and \(H(x)\) is positive definite, then \(f\) has a strict local minimum at \(x\).
- If \(f'(x)=0\) and \(H(x)\) is negative definite, then \(f\) has a strict local maximum at \(x\).
- If \(f'(x)=0\) and \(H(x)\) has both positive and negative eigenvalues, then \(f\) does not have a local minimum or a local maximum at \(x\). That is, \(f\) has a saddle point at \(x\).
- If \(H(x)\) is positive semidefinite for
*all*\(x\in A\), then \(f\) is convex and has a strict*global*minimum at any \(x\) for which \(f'(x)=0\) and \(H(x)\) is positive*definite*. - If \(H(x)\) is negative semidefinite for
*all*\(x\in A\), then \(f\) is concave and has a strict*global*maximum at any \(x\) for which \(f'(x)=0\) and \(H(x)\) is negative*definite*.

Since the determinant of a matrix is the product of its eigenvalues, we also have this special case:

Let \(A\subseteq\mathbb{R}^2\) be a convex open set and let \(f:A\to\mathbb{R}\) be twice differentiable. Write \(H(x)\) for the Hessian matrix of \(A\) at \(x\in A\): $$H(x)=\begin{bmatrix} a & b \\ b & d\end{bmatrix}.$$ (Note that \(a,b,d\) are functions of \(x\).)

- If \(f'(x)=0\), \(ad-b^2 > 0\) and \(a > 0\), then \(f\) has a strict local minimum at \(x\).
- If \(f'(x)=0\), \(ad-b^2 > 0\) and \(a < 0\), then \(f\) has a strict local maximum at \(x\).
- If \(f'(x)=0\) and \(ad-b^2 < 0\), then \(f\) has a saddle point at \(x\).
- If \(ad-b^2 \ge 0\) and \(a,d \ge 0\) for
*all*\(x\in A\), then \(f\) is convex and has a strict*global*minimum at any \(x\) for which \(f'(x)=0\), \(ad-b^2 > 0\) and \(a > 0\). - If \(ad-b^2 \ge 0\) and \(a,d \le 0\) for
*all*\(x\in A\), then \(f\) is concave and has a strict*global*maximum at any \(x\) for which \(f'(x)=0\), \(ad-b^2 > 0\) and \(a < 0\).

## Proof of the theorem

We will prove a slightly more general statement.

Let \(E\) be a Banach space, let \(A\subseteq E\) be a convex open set, and let \(f:A\to\mathbb{R}\). We say that \(f\) is **convex** if $$

f(x+\lambda(y-x)) \le f(x)+\lambda(f(y)-f(x))

$$ for all \(x,y\in A\) and \(\lambda\in(0,1)\).

**Theorem.** Let \(A\subseteq E\) be a convex open set and let \(f:A\to\mathbb{R}\). Suppose that \(f^{\prime\prime}\) exists on \(A\). Then \(f\) is convex if and only if \(f^{\prime\prime}(x)\) is positive semidefinite for all \(x\in A\).

Here, \(f^{\prime\prime}(x):E\times E\to\mathbb{R}\) is a bilinear form and \(f^{\prime\prime}(x)\) is said to be **positive semidefinite** if \(f^{\prime\prime}(x)(h,h)\ge 0\) for all \(h\in E\).

The idea of the proof is quite simple: restrict \(f\) to line segments that lie in \(A\) and use the single variable case mentioned at the start of this post. In order to do this, we need a formula for the second derivative of a composition of maps between Banach spaces: \begin{align}

D^2(g\circ f)(x)(u,v) &= D^2 g(f(x))(Df(x)(u),Df(x)(v)) \\

&\qquad + Dg(f(x))(D^2 f(x)(u,v)).

\end{align} The proof, which is just a long computation, is included at the end of this post. Note that when \(f,g\) are maps from \(\mathbb{R}\) to \(\mathbb{R}\), we recover the formula $$

(g\circ f)^{\prime\prime}(x)=g^{\prime\prime}(f(x))[f'(x)]^2+g'(f(x))f^{\prime\prime}(x).

$$

*Proof.* Let \(x,y\in A\) and define \(\gamma:(-\varepsilon,1+\varepsilon)\to A\) by \(\gamma(\lambda)=x+\lambda(y-x)\), where \(\varepsilon > 0\) is chosen to be small enough. Then \(D\gamma(\lambda)(1)=y-x\) and \(D^2\gamma(\lambda)(1,1)=0\). Using the above formula, we get \begin{align}

(f\circ\gamma)^{\prime\prime}(\lambda) &= D^2(f\circ\gamma)(\lambda)(1,1) \\

&= f^{\prime\prime}(\gamma(\lambda))(y-x,y-x).\tag{*}

\end{align} If \(f^{\prime\prime}(z)\) is positive semidefinite for all \(z\in A\) then \((f\circ\gamma)^{\prime\prime}(\lambda)\ge 0\) for all \(\lambda\), so \(f\circ\gamma\) is convex. Then for all \(\lambda\in(0,1)\), $$

(f\circ\gamma)(0+\lambda(1-0)) \le (f\circ\gamma)(0)+\lambda[(f\circ\gamma)(1)-(f\circ\gamma)(0)]

$$ and $$

f(x+\lambda(y-x)) \le f(x)+\lambda(f(y)-f(x)),

$$ which proves that \(f\) is convex. Conversely, suppose that \(f\) is convex. Let \(z\in A\) and choose \(r > 0\) so that the open ball of radius \(r\) around \(z\) is contained in \(A\). Let \(|u| < r\) and set \(x=z-u/2\) and \(y=z+u/2\) in (*) so that \begin{align}
f^{\prime\prime}(z)(u,u) &= f^{\prime\prime}(\gamma(1/2))(y-x,y-x) \\
&= (f\circ\gamma)^{\prime\prime}(1/2) \\
&\ge 0
\end{align} since \(f\circ\gamma\) is convex. This shows that \(f^{\prime\prime}(z)\) is positive semidefinite for all \(z\in A\). \(\square\)

## The second derivative formula

**Lemma** (Chain rule for the second derivative). *Let \(E,F\) be Banach spaces. Let \(A\subseteq E\) and \(B\subseteq F\) be open sets. Let \(f:A\to F\) and \(g:B\to G\) with \(f(A)\subseteq B\). If \(D^2 f\) exists on \(A\) and \(D^2 g\) exists on \(B\), then \(D^2(g\circ f)\) exists on \(A\) and \begin{align}
D^2(g\circ f)(x)(u,v) &= D^2 g(f(x))(Df(x)(u),Df(x)(v)) \\
&\qquad + Dg(f(x))(D^2 f(x)(u,v)).
\end{align}*

*Proof.* We can write \(D(g\circ f)=c\circ d\circ e\), where

\begin{align}

c:L(F,G)\times L(E,F) & \to L(E,G)\\

(\lambda,\mu) & \mapsto\lambda\circ\mu;\\

d:E\times E & \to L(F,G)\times L(E,F)\\

(x,y) & \mapsto((Dg\circ f)(x),Df(y));\\

e:E & \to E\times E\\

x & \mapsto(x,x).

\end{align} Note that \(c\) is a continuous bilinear map and \(e\) is a continuous

linear map. We compute \begin{align}

D^{2}(g\circ f)(x) & =Dc((d\circ e)(x))\circ Dd(e(x))\circ De(x)\\

& =Dc(Dg(f(x)),Df(x))\circ Dd(x,x)\circ e.

\end{align} Now $$

Dd(x,y)(u,v)=(D^{2}g(f(x))(Df(x)(u)),D^{2}f(y)(v)),

$$ so

\begin{align}

D^{2}(g\circ f)(x)(u) & =Dc(Dg(f(x)),Df(x))(Dd(x,x)(u,u))\\

& =Dc(Dg(f(x)),Df(x))(D^{2}g(f(x))(Df(x)(u)),D^{2}f(x)(u))\\

& =D^{2}g(f(x))(Df(x)(u))\circ Df(x)\\

& \qquad+Dg(f(x))\circ D^{2}f(x)(u)

\end{align} and \begin{align}

D^{2}(g\circ f)(x)(u)(v) & =D^{2}g(f(x))(Df(x)(u))(Df(x)(v))\\

& \qquad+Dg(f(x))(D^{2}f(x)(u)(v)).

\end{align} \(\square\)