# Convex functions, second derivatives and Hessian matrices

In single variable calculus, a twice differentiable function $$f:(a,b)\to\mathbb{R}$$ is convex if and only if $$f^{\prime\prime}(x)\ge 0$$ for all $$x\in(a,b)$$. It is not too hard to extend this result to functions defined on more general spaces:
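As a quick sanity check, the defining inequality of convexity can be tested numerically. The following sketch (an illustrative example, not part of the argument; it assumes NumPy is available) verifies the inequality at random points for $$f(x)=x^4$$, which is convex since $$f^{\prime\prime}(x)=12x^2\ge 0$$:

```python
import numpy as np

# Illustrative example: f(x) = x**4 has f''(x) = 12*x**2 >= 0,
# so the single variable result predicts f is convex.
f = lambda x: x**4

rng = np.random.default_rng(0)
xs = rng.uniform(-5, 5, size=1000)
ys = rng.uniform(-5, 5, size=1000)
lams = rng.uniform(0, 1, size=1000)

# Convexity: f(x + lam*(y - x)) <= f(x) + lam*(f(y) - f(x)).
lhs = f(xs + lams * (ys - xs))
rhs = f(xs) + lams * (f(ys) - f(xs))
assert np.all(lhs <= rhs + 1e-8)  # tolerance for floating point rounding
```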

Theorem. Let $$A\subseteq\mathbb{R}^n$$ be a convex open set and let $$f:A\to\mathbb{R}$$. Suppose that $$f^{\prime\prime}$$ exists on $$A$$. Then $$f$$ is convex if and only if $$f^{\prime\prime}(x)$$ is positive semidefinite for all $$x\in A$$.

## Hessian matrices

Combining the previous theorem with the second derivative test for Hessian matrices gives us the following result for functions defined on convex open subsets of $$\mathbb{R}^n$$:

Let $$A\subseteq\mathbb{R}^n$$ be a convex open set and let $$f:A\to\mathbb{R}$$ be twice differentiable. Write $$H(x)$$ for the Hessian matrix of $$f$$ at $$x\in A$$.

1. If $$f'(x)=0$$ and $$H(x)$$ is positive definite, then $$f$$ has a strict local minimum at $$x$$.
2. If $$f'(x)=0$$ and $$H(x)$$ is negative definite, then $$f$$ has a strict local maximum at $$x$$.
3. If $$f'(x)=0$$ and $$H(x)$$ has both positive and negative eigenvalues, then $$f$$ does not have a local minimum or a local maximum at $$x$$. That is, $$f$$ has a saddle point at $$x$$.
4. If $$H(x)$$ is positive semidefinite for all $$x\in A$$, then $$f$$ is convex and has a strict global minimum at any $$x$$ for which $$f'(x)=0$$ and $$H(x)$$ is positive definite.
5. If $$H(x)$$ is negative semidefinite for all $$x\in A$$, then $$f$$ is concave and has a strict global maximum at any $$x$$ for which $$f'(x)=0$$ and $$H(x)$$ is negative definite.
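To see the test in action, here is a sketch using SymPy. The function $$f(x,y,z)=x^2+y^2+z^2+xy$$ is a hypothetical example chosen for illustration, not one from the argument above:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
# Illustrative example: f has a critical point at the origin.
f = x**2 + y**2 + z**2 + x*y

vars_ = [x, y, z]
grad = [sp.diff(f, v) for v in vars_]
H = sp.hessian(f, vars_)

# The gradient (2x + y, x + 2y, 2z) vanishes at the origin...
crit = {x: 0, y: 0, z: 0}
assert all(g.subs(crit) == 0 for g in grad)

# ...and the Hessian there has eigenvalues 1, 2, 3 (all positive),
# so by item 1 the origin is a strict local minimum. In fact H(x) is
# positive semidefinite everywhere (f is quadratic), so by item 4 the
# origin is a strict global minimum.
eigs = H.subs(crit).eigenvals()  # {eigenvalue: multiplicity}
print(sorted(eigs))  # [1, 2, 3]
```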

Since the determinant of a matrix is the product of its eigenvalues, we also have this special case:

Let $$A\subseteq\mathbb{R}^2$$ be a convex open set and let $$f:A\to\mathbb{R}$$ be twice differentiable. Write $$H(x)$$ for the Hessian matrix of $$f$$ at $$x\in A$$: $$H(x)=\begin{bmatrix} a & b \\ b & d\end{bmatrix}.$$ (Note that $$a,b,d$$ are functions of $$x$$.)

1. If $$f'(x)=0$$, $$ad-b^2 > 0$$ and $$a > 0$$, then $$f$$ has a strict local minimum at $$x$$.
2. If $$f'(x)=0$$, $$ad-b^2 > 0$$ and $$a < 0$$, then $$f$$ has a strict local maximum at $$x$$.
3. If $$f'(x)=0$$ and $$ad-b^2 < 0$$, then $$f$$ has a saddle point at $$x$$.
4. If $$ad-b^2 \ge 0$$ and $$a,d \ge 0$$ for all $$x\in A$$, then $$f$$ is convex and has a strict global minimum at any $$x$$ for which $$f'(x)=0$$, $$ad-b^2 > 0$$ and $$a > 0$$.
5. If $$ad-b^2 \ge 0$$ and $$a,d \le 0$$ for all $$x\in A$$, then $$f$$ is concave and has a strict global maximum at any $$x$$ for which $$f'(x)=0$$, $$ad-b^2 > 0$$ and $$a < 0$$.
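The two-variable criterion can likewise be sketched in SymPy. The function $$f(x,y)=x^3-3x+y^2$$ below is a hypothetical example, chosen because it has one critical point of each kind:

```python
import sympy as sp

x, y = sp.symbols('x y')
# Illustrative example: f(x, y) = x**3 - 3*x + y**2.
f = x**3 - 3*x + y**2

a = sp.diff(f, x, 2)  # 6*x
b = sp.diff(f, x, y)  # 0
d = sp.diff(f, y, 2)  # 2

# The gradient (3*x**2 - 3, 2*y) vanishes at (1, 0) and (-1, 0).
det = sp.expand(a*d - b**2)  # 12*x

# At (1, 0): ad - b**2 = 12 > 0 and a = 6 > 0, a strict local minimum (item 1).
assert det.subs({x: 1, y: 0}) > 0 and a.subs({x: 1, y: 0}) > 0
# At (-1, 0): ad - b**2 = -12 < 0, a saddle point (item 3).
assert det.subs({x: -1, y: 0}) < 0
```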

## Proof of the theorem

We will prove a slightly more general statement.

Let $$E$$ be a Banach space, let $$A\subseteq E$$ be a convex open set, and let $$f:A\to\mathbb{R}$$. We say that $$f$$ is convex if $$f(x+\lambda(y-x)) \le f(x)+\lambda(f(y)-f(x))$$ for all $$x,y\in A$$ and $$\lambda\in(0,1)$$.

Theorem. Let $$A\subseteq E$$ be a convex open set and let $$f:A\to\mathbb{R}$$. Suppose that $$f^{\prime\prime}$$ exists on $$A$$. Then $$f$$ is convex if and only if $$f^{\prime\prime}(x)$$ is positive semidefinite for all $$x\in A$$.

Here, $$f^{\prime\prime}(x):E\times E\to\mathbb{R}$$ is a bilinear form and $$f^{\prime\prime}(x)$$ is said to be positive semidefinite if $$f^{\prime\prime}(x)(h,h)\ge 0$$ for all $$h\in E$$.

The idea of the proof is quite simple: restrict $$f$$ to line segments that lie in $$A$$ and use the single variable case mentioned at the start of this post. In order to do this, we need a formula for the second derivative of a composition of maps between Banach spaces: \begin{align}
D^2(g\circ f)(x)(u,v) &= D^2 g(f(x))(Df(x)(u),Df(x)(v)) + Dg(f(x))(D^2 f(x)(u,v))
\end{align} The proof, which is just a long computation, is included at the end of this post. Note that when $$f,g$$ are maps from $$\mathbb{R}$$ to $$\mathbb{R}$$, we recover the formula $$(g\circ f)^{\prime\prime}(x)=g^{\prime\prime}(f(x))[f'(x)]^2+g'(f(x))f^{\prime\prime}(x).$$
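The single variable formula is easy to check symbolically. The following sketch uses SymPy with the hypothetical choices $$f=\sin$$ and $$g=\exp$$ (any twice differentiable pair would do):

```python
import sympy as sp

x = sp.symbols('x')
# Illustrative check of (g o f)'' = g''(f) * (f')**2 + g'(f) * f''.
f = sp.sin(x)
g = sp.exp(x)

lhs = sp.diff(g.subs(x, f), x, 2)
rhs = sp.diff(g, x, 2).subs(x, f) * sp.diff(f, x)**2 \
    + sp.diff(g, x).subs(x, f) * sp.diff(f, x, 2)

assert sp.simplify(lhs - rhs) == 0  # the two sides agree identically
```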

Proof. Let $$x,y\in A$$ and define $$\gamma:(-\varepsilon,1+\varepsilon)\to A$$ by $$\gamma(\lambda)=x+\lambda(y-x)$$, where $$\varepsilon > 0$$ is chosen small enough that the image of $$\gamma$$ lies in $$A$$ (this is possible because $$A$$ is open and convex). Then $$D\gamma(\lambda)(1)=y-x$$ and $$D^2\gamma(\lambda)(1,1)=0$$. Using the above formula, we get \begin{align}
(f\circ\gamma)^{\prime\prime}(\lambda) &= D^2(f\circ\gamma)(\lambda)(1,1) \\
&= f^{\prime\prime}(\gamma(\lambda))(y-x,y-x).\tag{*}
\end{align} If $$f^{\prime\prime}(z)$$ is positive semidefinite for all $$z\in A$$ then $$(f\circ\gamma)^{\prime\prime}(\lambda)\ge 0$$ for all $$\lambda$$, so $$f\circ\gamma$$ is convex by the single variable result. Then for all $$\lambda\in(0,1)$$, $$(f\circ\gamma)(0+\lambda(1-0)) \le (f\circ\gamma)(0)+\lambda[(f\circ\gamma)(1)-(f\circ\gamma)(0)]$$ and $$f(x+\lambda(y-x)) \le f(x)+\lambda(f(y)-f(x)),$$ which proves that $$f$$ is convex. Conversely, suppose that $$f$$ is convex. Let $$z\in A$$ and choose $$r > 0$$ so that the open ball of radius $$r$$ around $$z$$ is contained in $$A$$. Let $$|u| < r$$ and set $$x=z-u/2$$ and $$y=z+u/2$$ in (*) so that \begin{align} f^{\prime\prime}(z)(u,u) &= f^{\prime\prime}(\gamma(1/2))(y-x,y-x) \\ &= (f\circ\gamma)^{\prime\prime}(1/2) \\ &\ge 0, \end{align} where the last inequality holds because $$f\circ\gamma$$ is convex ($$\gamma$$ is affine and $$f$$ is convex) and twice differentiable. For an arbitrary $$u\in E$$, choose $$t > 0$$ with $$|tu| < r$$; then $$f^{\prime\prime}(z)(u,u)=t^{-2}f^{\prime\prime}(z)(tu,tu)\ge 0$$. This shows that $$f^{\prime\prime}(z)$$ is positive semidefinite for all $$z\in A$$. $$\square$$
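Formula (*) itself can be sanity-checked numerically: it is just the chain rule, so it holds whether or not $$f$$ is convex. The sketch below (an illustrative example with a hypothetical cubic $$f$$ on $$\mathbb{R}^2$$) compares a central finite difference of $$(f\circ\gamma)^{\prime\prime}$$ with the Hessian quadratic form:

```python
import numpy as np

# Illustrative example: f(u, v) = u**2 * v + v**2 on R^2 (not convex,
# but formula (*) is just the chain rule and holds regardless).
def f(p):
    u, v = p
    return u**2 * v + v**2

def hessian(p):
    u, v = p
    # Exact Hessian of f, rows/columns ordered (u, v).
    return np.array([[2*v, 2*u],
                     [2*u, 2.0]])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
gamma = lambda lam: x + lam * (y - x)

lam, h = 0.3, 1e-4
# Central finite difference approximation of (f o gamma)''(lam).
fd = (f(gamma(lam + h)) - 2*f(gamma(lam)) + f(gamma(lam - h))) / h**2
# Right-hand side of (*): f''(gamma(lam))(y - x, y - x).
quad = (y - x) @ hessian(gamma(lam)) @ (y - x)
assert abs(fd - quad) < 1e-3
```

Since $$f\circ\gamma$$ is a cubic polynomial in $$\lambda$$, the central difference is exact up to rounding, so the two quantities agree closely.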

## The second derivative formula

Lemma (Chain rule for the second derivative). Let $$E,F,G$$ be Banach spaces. Let $$A\subseteq E$$ and $$B\subseteq F$$ be open sets. Let $$f:A\to F$$ and $$g:B\to G$$ with $$f(A)\subseteq B$$. If $$D^2 f$$ exists on $$A$$ and $$D^2 g$$ exists on $$B$$, then $$D^2(g\circ f)$$ exists on $$A$$ and \begin{align}
D^2(g\circ f)(x)(u,v) &= D^2 g(f(x))(Df(x)(u),Df(x)(v)) + Dg(f(x))(D^2 f(x)(u,v)).
\end{align}

Proof. We can write $$D(g\circ f)=c\circ d\circ e$$, where
\begin{align}
c:L(F,G)\times L(E,F) & \to L(E,G)\\
(\lambda,\mu) & \mapsto\lambda\circ\mu;\\
d:E\times E & \to L(F,G)\times L(E,F)\\
(x,y) & \mapsto((Dg\circ f)(x),Df(y));\\
e:E & \to E\times E\\
x & \mapsto(x,x).
\end{align} Note that $$c$$ is a continuous bilinear map, so $$Dc(\lambda,\mu)(\lambda^{\prime},\mu^{\prime})=\lambda^{\prime}\circ\mu+\lambda\circ\mu^{\prime}$$, and $$e$$ is a continuous linear map, so $$De(x)=e$$. We compute \begin{align}
D^{2}(g\circ f)(x) & =Dc((d\circ e)(x))\circ Dd(e(x))\circ De(x)\\
& =Dc(Dg(f(x)),Df(x))\circ Dd(x,x)\circ e.
\end{align} Now $$Dd(x,y)(u,v)=(D^{2}g(f(x))(Df(x)(u)),D^{2}f(y)(v)),$$ so
\begin{align}
D^{2}(g\circ f)(x)(u) & =Dc(Dg(f(x)),Df(x))(Dd(x,x)(u,u))\\
 & =Dc(Dg(f(x)),Df(x))(D^{2}g(f(x))(Df(x)(u)),D^{2}f(x)(u))\\
 & =D^{2}g(f(x))(Df(x)(u))\circ Df(x)+Dg(f(x))\circ D^{2}f(x)(u).
\end{align} Applying both sides to $$v$$ gives the stated formula. $$\square$$