Horn’s inequality for singular values via exterior algebra

\)Horn’s inequality states that for any two compact operators \(\sigma,\tau\) on a Hilbert space \(E\), $$\prod_{k=1}^n s_k(\sigma\tau) \le \prod_{k=1}^n s_k(\sigma)s_k(\tau)$$ where \(s_1(\tau),s_2(\tau),\dots\) are the singular values of \(\tau\) arranged in descending order. Alfred Horn’s original 1950 paper provides a short proof that relies on the following result:

Theorem. If \(H\) is a positive, symmetric, completely continuous operator whose first \(n\) eigen-values are \(\lambda_1,\dots,\lambda_n\), then $$\det[(Hy_i,y_j)] \le \lambda_1\cdots\lambda_n\det[(y_i,y_j)]$$ for any elements \(y_1,\dots,y_n\).

It seems that back in 1950, they used the term “completely continuous” instead of the more modern “compact”. Horn also states:

Weyl’s elegant proof uses an appeal to the theory of \(n\)-tensors. A straightforward proof may be given by using the relation \((Hy_i,y_j)=\sum_k\lambda_k(y_i,x_k)(x_k,y_j)\), where the \(x_k\) form a complete ortho-normal set.

I couldn’t really understand Weyl’s paper, and if you try to expand \(Hy_i\) as suggested, things get messy very quickly. (You will probably get there eventually!) In the process of trying to come up with a proof, I discovered that the theorem is actually a very intuitive and cleverly disguised statement about the \(n\)th exterior power of \(E\).

The \(k\)th exterior power of a Hilbert space

I’m going to assume that you know what the terms “exterior algebra” and “\(k\)th exterior power” mean. Let \(E\) be a Hilbert space over \(K\), where \(K=\mathbb{R}\) or \(K=\mathbb{C}\). We will denote the \(k\)th exterior power of \(E\) by \(\Lambda^k E\). The basic idea is that the element \(v_1\wedge\cdots\wedge v_k \in \Lambda^k E\) represents an oriented \(k\)-parallelotope spanned by \(v_1,\dots,v_k \in E\). The vector space \(\Lambda^k E\) consists of all possible linear combinations of such elements. Since \(E\) has an inner product, we can give \(\Lambda^k E\) an inner product that satisfies \begin{align}
\langle u_1\wedge\cdots\wedge u_k, v_1\wedge\cdots\wedge v_k\rangle
&= \det((\langle u_i,v_j \rangle)_{i,j}) \\
&= \det\begin{bmatrix}
\langle u_1,v_1 \rangle & \cdots & \langle u_1,v_k \rangle \\
\vdots & \ddots & \vdots \\
\langle u_k,v_1 \rangle & \cdots & \langle u_k,v_k \rangle

To show that such an inner product exists, use the fact that the determinant of a matrix is alternating and multilinear in the rows and columns, and apply the universal property of the \(k\)th exterior power twice. The next theorem proves that this inner product is in fact positive definite.

When \(u_i=v_i\) for \(i=1,\dots,k\), the matrix on the right hand side above is called the Gram matrix of \(v_1,\dots,v_k\). Geometrically, \(|v_1\wedge\cdots\wedge v_k|\) is the volume of the \(k\)-parallelotope spanned by \(v_1,\dots,v_k\). Note that \(|v_1\wedge\cdots\wedge v_k|=|v_1|\cdots|v_k|\) whenever \(v_1,\dots,v_k\) are orthogonal.

Theorem 1. If \(v_1,\dots,v_k \in V\) then the Gram matrix $$G(v_1,\dots,v_k)=(\langle v_i,v_j \rangle)_{i,j}$$ is positive semidefinite. It is invertible if and only if \(v_1 \wedge\cdots\wedge v_k \ne 0\).

Proof. For all \(x=(x_1,\dots,x_k)\in K^k\), \begin{align}
\langle G(v_1,\dots,v_k)x,x \rangle
&= \sum_{i=1}^k\sum_{j=1}^k\langle v_i,v_j \rangle x_i\overline{x_j} = \sum_{i=1}^k\sum_{j=1}^k\langle xv_i,xv_j \rangle \\
&= \left\vert\sum_{i=1}^k x_i v_i\right\vert^2 \ge 0. \tag{*}
\end{align} It is a well-known result that \(v_1\wedge\cdots\wedge v_k=0\) if and only if \(v_1,\dots,v_k\) are linearly dependent. If \(v_1,\dots,v_k\) are linearly dependent then \(G(v_1,\dots,v_k)\) is not invertible because its columns are linearly dependent. Conversely, if \(G(v_1,\dots,v_k)\) is not invertible then \(G(v_1,\dots,v_k)x=0\) for some \(x\ne 0\). (*) shows that \(\sum_{i=1}^k x_i v_i = 0\), so \(v_1,\dots,v_k\) are linearly dependent. \(\square\)

Theorem 2 (Hadamard’s inequality). For all \(v_1,\dots,v_k\in V\) we have $$|v_1\wedge\cdots\wedge v_k|\le|v_1|\cdots|v_k|,$$ with equality if and only if \(v_1,\dots,v_k\) are orthogonal or \(v_i=0\) for some \(i\).

Proof. We can assume that \(v_1,\dots,v_k\ne 0\). Let \(u_i=v_i/|v_i|\) for \(i=1,\dots,k\). Let \(\lambda_1,\dots,\lambda_k\ge 0\) be the eigenvalues of \(G(u_1,\dots,u_k)\), with repetitions. Using the AM-GM inequality, \begin{align}
|u_1\wedge\cdots\wedge u_k|
&= \det(G(u_1,\dots,u_k)) = \lambda_1\cdots\lambda_k \\
&\le \left(\frac{\lambda_1+\cdots+\lambda_k}{k}\right)^k = \left(\frac{\tr G(u_1,\dots,u_k)}{k}\right)^k \\
&= \left(\frac{|u_1|^2+\cdots+|u_k|^2}{k}\right)^k = 1,
\end{align} with equality if and only if \(\lambda_1=\cdots=\lambda_k\). This is true if and only if \(G(u_1,\dots,u_k)\) is the identity matrix, \(\{u_1,\dots,u_k\}\) is an orthonormal set. \(\square\)

An important consequence that we will be using is:

Corollary 3. The alternating multilinear map \((v_1,\dots,v_k)\mapsto v_1\wedge\cdots\wedge v_k\) is continuous.

We have turned \(\Lambda^k E\) into an inner product space, but in general \(\Lambda^k E\) is not complete (when \(E\) is infinite-dimensional). From now on, we will be working with its completion \(\overline{\Lambda^k}E\), which is a Hilbert space by definition.

The expansion formula

The formula below is the key to understanding the proof of the theorem stated at the beginning of this post. We denote the set of continuous linear operators on \(E\) by \(L(E)\).

Theorem 4. Let \(E\) be a Hilbert space and let \(\{e\}_{\alpha\in A}\) be an ordered Hilbert basis for \(E\). If \(f\in L(E)\) and \(fe_\alpha=\lambda_\alpha e_\alpha\) for all \(\alpha\in A\), then for all \(v_1,\dots,v_k\in E\), $$
fv_1\wedge\cdots\wedge fv_k = \sum_{\alpha_1<\cdots<\alpha_k} \lambda_{\alpha_1}\cdots\lambda_{\alpha_k} \langle v_1\wedge\cdots\wedge v_k,e_{\alpha_1}\wedge\cdots\wedge e_{\alpha_k} \rangle e_{\alpha_1}\wedge\cdots\wedge e_{\alpha_k}.$$

Proof. Due to Corollary 3, we can apply the usual Fourier expansion formula to get \begin{align}
& fv_1\wedge\cdots\wedge fv_k \\
&= \sum_{\alpha_1\in A} \langle v_1,e_{\alpha_1}\rangle fe_{\alpha_1} \wedge\cdots\wedge \sum_{\alpha_k\in A} \langle v_k,e_{\alpha_k}\rangle fe_{\alpha_k} \\
&= \sum_{\alpha_1\in A}\cdots\sum_{\alpha_k\in A} \lambda_{\alpha_1}\cdots\lambda_{\alpha_k} \langle v_1,e_{\alpha_1}\rangle\cdots\langle v_k,e_{\alpha_k}\rangle e_{\alpha_1}\wedge\cdots\wedge e_{\alpha_k} \\
&= \sum_{\alpha_1<\cdots<\alpha_k} \lambda_{\alpha_1}\cdots\lambda_{\alpha_k} \sum_{\sigma\in S_k} \langle v_1,e_{\alpha_{\sigma(1)}}\rangle\cdots\langle v_k,e_{\alpha_{\sigma(k)}}\rangle e_{\alpha_{\sigma(1)}}\wedge\cdots\wedge e_{\alpha_{\sigma(k)}} \\ &= \sum_{\alpha_1<\cdots<\alpha_k} \lambda_{\alpha_1}\cdots\lambda_{\alpha_k} \sum_{\sigma\in S_k} \sgn(\sigma) \langle v_1,e_{\alpha_{\sigma(1)}}\rangle\cdots\langle v_k,e_{\alpha_{\sigma(k)}}\rangle e_{\alpha_1}\wedge\cdots\wedge e_{\alpha_k} \\ &= \sum_{\alpha_1<\cdots<\alpha_k} \lambda_{\alpha_1}\cdots\lambda_{\alpha_k} \det((\langle v_i,e_{\alpha_j} \rangle)_{i,j}) e_{\alpha_1}\wedge\cdots\wedge e_{\alpha_k} \\ &= \sum_{\alpha_1<\cdots<\alpha_k} \lambda_{\alpha_1}\cdots\lambda_{\alpha_k} \langle v_1\wedge\cdots\wedge v_k,e_{\alpha_1}\wedge\cdots\wedge e_{\alpha_k} \rangle e_{\alpha_1}\wedge\cdots\wedge e_{\alpha_k}. \end{align} \(\square\)

If we apply the theorem to \(f=\Id_E\), then we have the formula $$
v_1\wedge\cdots\wedge v_k = \sum_{\alpha_1<\cdots<\alpha_k} \langle v_1\wedge\cdots\wedge v_k,e_{\alpha_1}\wedge\cdots\wedge e_{\alpha_k} \rangle e_{\alpha_1}\wedge\cdots\wedge e_{\alpha_k}.$$ This shows that the set $$\mathcal{B}=\{e_{\alpha_1}\wedge\cdots\wedge e_{\alpha_k}:\alpha_1<\cdots<\alpha_k\}$$ is a Hilbert basis for \(\overline{\Lambda^k}E\), where \({\alpha_1,\dots,\alpha_k}\) ranges over the subsets of \(A\) with \(k\) elements. Also note that $$|v_1\wedge\cdots\wedge v_k|^2 = \sum_{\alpha_1<\cdots<\alpha_k} |\langle v_1\wedge\cdots\wedge v_k,e_{\alpha_1}\wedge\cdots\wedge e_{\alpha_k} \rangle|^2.$$ Theorem 4 states that if \(f\in L(E)\) is diagonalizable with respect to \(\{e\}_{\alpha\in A}\), then the induced map \(\Lambda^k f\in L(\overline{\Lambda^k}E)\) (satisfying \((\Lambda^k f) v_1\wedge\cdots\wedge v_k = fv_1\wedge\cdots\wedge fv_k\)) is diagonalizable with respect to \(\mathcal{B}\). Furthermore, the eigenvalue corresponding to \(e_{\alpha_1}\wedge\cdots\wedge e_{\alpha_k}\) is \(\lambda_{\alpha_1}\cdots\lambda_{\alpha_k}\).

Proof of Horn’s inequality

We now return to the theorem stated at the beginning of this post, using our new notation.

Theorem 5. Let \(\rho\in L(E)\) be a positive compact operator and let \(\lambda_1\ge\lambda_2\ge\cdots\) be the eigenvalues of \(\rho\) (with repetition). If \(v_1,\dots,v_n\in E\) then $$\langle \rho v_1\wedge\cdots\wedge\rho v_n,v_1\wedge\cdots\wedge v_n\rangle \le \lambda_1\cdots\lambda_n|v_1\wedge\cdots\wedge v_n|^2.$$

Proof. Write \(\rho=\sum_{k=1}^\infty \lambda_k e_k e_k^*\) for some countable orthonormal set \(\{e_k\}\) in \(E\), and choose an ordered Hilbert basis \(\{e_\alpha\}_{\alpha\in A}\) for \(E\) that contains \(\{e_k\}\). Since \(\rho e_\alpha=0\) whenever \(e_\alpha\notin\{e_k\}\), Theorem 4 shows that \begin{align}
& \rho v_1\wedge\cdots\wedge\rho v_n \\
&= \sum_{k_1 < \cdots < k_n} \lambda_{k_1}\cdots\lambda_{k_n} \langle v_1\wedge\cdots\wedge v_n,e_{k_1}\wedge\cdots\wedge e_{k_n} \rangle e_{k_1}\wedge\cdots\wedge e_{k_n}. \end{align} Therefore \begin{align} & \langle \rho v_1\wedge\cdots\wedge\rho v_n,v_1\wedge\cdots\wedge v_n\rangle \\ &= \sum_{k_1 < \cdots < k_n} \lambda_{k_1}\cdots\lambda_{k_n} |\langle v_1\wedge\cdots\wedge v_n,e_{k_1}\wedge\cdots\wedge e_{k_n} \rangle|^2 \\ &\le \lambda_1\cdots\lambda_n \sum_{k_1 < \cdots < k_n} |\langle v_1\wedge\cdots\wedge v_n,e_{k_1}\wedge\cdots\wedge e_{k_n} \rangle|^2 \\ &\le \lambda_1\cdots\lambda_n \sum_{\alpha_1<\cdots<\alpha_n} |\langle v_1\wedge\cdots\wedge v_n,e_{\alpha_1}\wedge\cdots\wedge e_{\alpha_n} \rangle|^2 \\ &= \lambda_1\cdots\lambda_n|v_1\wedge\cdots\wedge v_n|^2. \end{align} \(\square\)

Lemma 6. If \(f\in L(E)\), \(u_1,\dots,u_k\in E\), and \(v_1,\dots,v_k\in E\), $$\langle fu_1\wedge\cdots\wedge fu_k, v_1\wedge\cdots\wedge v_k\rangle = \langle u_1\wedge\cdots\wedge u_k, f^* v_1\wedge\cdots\wedge f^* v_k\rangle.$$

Proof. This follows directly from the fact that \(\det((\langle fu_i,v_j \rangle)_{i,j}) = \det((\langle u_i,f^* v_j \rangle)_{i,j})\). \(\square\)

Finally, we can prove Horn’s inequality.

Theorem 7 (Horn’s inequality). Let \(\sigma,\tau\in L(E)\) be compact. For all \(n\), $$\prod_{k=1}^n s_k(\sigma\tau) \le \prod_{k=1}^n s_k(\sigma)s_k(\tau).$$

Proof. Let \(\sigma\tau = \sum_{k=1}^\infty s_k(\sigma\tau) v_k u_k^*\) be a singular value decomposition of \(\sigma\tau\). Clearly $$|\sigma\tau u_1\wedge\cdots\wedge\sigma\tau u_n| = \prod_{k=1}^n s_k(\sigma\tau).$$ By Theorem 5 and Lemma 6, \begin{align}
\prod_{k=1}^n s_k(\sigma\tau)^2
&= |\sigma\tau u_1\wedge\cdots\wedge\sigma\tau u_n|^2 \\
&= \langle (\sigma^*\sigma)\tau u_1\wedge\cdots\wedge (\sigma^*\sigma)\tau u_n, \tau u_1\wedge\cdots\wedge \tau u_n\rangle \\
&\le \left(\prod_{k=1}^n s_k(\sigma)^2\right) |\tau u_1\wedge\cdots\wedge \tau u_n|^2 \\
&= \left(\prod_{k=1}^n s_k(\sigma)^2\right) \langle (\tau^*\tau) u_1\wedge\cdots\wedge (\tau^*\tau) u_n, u_1\wedge\cdots\wedge u_n\rangle \\
&\le \left(\prod_{k=1}^n s_k(\sigma)^2\right) \left(\prod_{k=1}^n s_k(\tau)^2\right).
\end{align} \(\square\)

Leave a Reply