Tuesday, 15 May 2018

Quantum machine learning algorithm for supervised cluster assignment

Machine learning refers to various methods for deriving patterns from data that can be used to interpret new inputs. This is what is meant by computers that can "learn" without being explicitly programmed. Machine learning techniques are often associated with the development of artificial intelligence, but they are also prevalent in applications that involve large amounts of data, such as in image and speech recognition, and in risk assessment for business and finance.

Broadly speaking, there are three forms of machine learning: (i) supervised, in which a machine infers a function from a sample of input-output relations (called training data); (ii) unsupervised, in which a machine tries to uncover a hidden structure from assorted data;  and (iii) reinforced, in which a machine attempts to develop a strategy for winning a game given a set of rules and objectives.

Formally, machine learning techniques involve data represented as vectors in a high-dimensional space. As it turns out, quantum information has taught us that quantum computers are good at manipulating high-dimensional vectors for certain tasks. The general aim of quantum machine learning is to develop quantum algorithms that exhibit significant speedup over classical machine learning techniques.

In many cases, the quantum learning approach seems to amount to running a classical learning problem on a quantum computer, hoping to find a speedup. But in other cases such as neural networks or Bayesian decision theory, the approach is to translate stochastic methods into the language of quantum information, hoping to develop purely quantum methods with no real classical counterpart.

Here we will describe a quantum algorithm by Seth Lloyd, Masoud Mohseni, and Patrick Rebentrost for evaluating distances and inner products between vectors that has exponential speedup over known classical methods. This was chosen because it is one of the earliest known applications of quantum information to machine learning, and mostly because it is simple enough to be described in detail.

In the problem of supervised cluster assignment, we are given an unassigned input $\vec{u}$ and a sample of $M$ vectors $\vec{v}_{j}$ and $\vec{w}_{k}$ for two clusters $V$ and $W$. The task is to assign $u$ to either $V$ or $W$. In this case, the typical approach is to compute the mean or centroid of the samples for $V$ and $W$ and find which of these two centroids is closer to $\vec{u}$. If we consider vectors of real numbers then it is sufficient to use the Euclidean distance

$D(\vec{a},\vec{b}) = | \vec{a} - \vec{b}| = \sqrt{| \vec{a} |^2 + | \vec{b} |^2 - 2 \vec{a} \cdot \vec{b}}$

as a measure of distance between any two vectors $\vec{a}$ and $\vec{b}$.

 A diagram for the supervised cluster assignment. Given a sample of vectors from $V$ (squares) and $W$ (triangles) from previously cataloged data and a new input vector $\vec{u}$ (blue dot), the task is to assign $\vec{u}$ to one of the clusters. The standard approach here is to compute the mean or centroid of each sample (green and red crosses), and find which centroid is nearer to $\vec{u}$.

We will use a quantum algorithm to compute the distance between any pair of vectors. To do this efficiently, we assume that our data for $\vec{u}$, $\vec{v}_{k}$, and $\vec{w}_k$ are stored in some quantum memory, which a quantum computer can access in order to create quantum states of the form

$|a\rangle = \dfrac{ \vec{a} }{ \sqrt{| \vec{a}|} }$,

that is, the data vectors are directly encoded into quantum states where the data vector lengths are stored in the norms, i.e.,  $\langle a | a \rangle = | \vec{a} |$.

The quantum algorithm computes the distance between vectors $\vec{a}$ and $\vec{b}$ by estimating the inner product between the quantum states

$| \psi \rangle = \dfrac{1}{\sqrt{2}} \left( \dfrac{|0\rangle |a\rangle}{\sqrt{|\vec{a}|}} + \dfrac{|1\rangle |b\rangle}{\sqrt{ |\vec{b}|}} \right)$ and $|\phi \rangle = \dfrac{1}{\sqrt{Z}} \left( |\vec{a}| |0\rangle - |\vec{b}| |1\rangle \right)$

where $Z = |\vec{a}|^{2} + |\vec{b}|^{2}$. These two states look complicated but they can easy enough to construct if the data for $\vec{a}$ and $\vec{b}$ are available in a quantum memory. We construct $|\psi\rangle$ and $|\phi\rangle$ because it can be shown that

$\langle \phi | \psi \rangle = \dfrac{1}{\sqrt{2Z}} ( |\vec{a}| \langle 0| - |\vec{b}| \langle 1|) \left( \dfrac{|0\rangle |a\rangle}{\sqrt{|\vec{a}|}} + \dfrac{|1\rangle |b\rangle}{\sqrt{ |\vec{b}|}} \right) = \dfrac{1}{\sqrt{2Z}} \left( \sqrt{|\vec{a}|} |a\rangle - \sqrt{|\vec{b}|} |b\rangle \right)$,

and so

$|\langle \phi | \psi \rangle|^2 = \dfrac{1}{2Z} \left\lVert \sqrt{|\vec{a}|} |a\rangle - \sqrt{|\vec{b}|} |b\rangle \right\rVert^2 = \dfrac{1}{2Z} \left( |\vec{a}| \langle a | a \rangle + |\vec{ab}| \langle b | b \rangle - \sqrt{ |\vec{a}| |\vec{b}| } \langle a | b \rangle - \sqrt{ |\vec{b}| |\vec{a}|} \langle b | a \rangle \right) = \dfrac{1}{2Z} \left( |\vec{a}|^2 + |\vec{b}|^2 - 2 \sqrt{ |\vec{a}| |\vec{b}|} \langle a | b \rangle \right) = \dfrac{D(\vec{a},\vec{b})^2}{2Z}.$

To estimate $| \langle \phi | \psi \rangle |^2$, we will use a swap test between $|\phi\rangle$ and the qubit part of $|\psi\rangle$.

A quantum swap test is often used to test if two states $|\alpha\rangle$ and $|\beta\rangle$ are the same or not. It involves the following circuit:

 Quantum swap test for  states $|\alpha\rangle$ and $|\beta\rangle$. We apply a Hadamard gate to an ancillary qubit and use this as the control qubit of a controlled-swap operation between $|\alpha\rangle$ and $|\beta\rangle$. After that, we again apply a Hadamard gate to the ancillary qubit and finally measure that qubit in the standard qubit. If we run this circuit many times, we can estimate the probability of measuring $|0\rangle$ for the ancillary qubit.

Going over how the state changes in the circuit shown above:

$|0\rangle |\alpha\rangle |\beta\rangle \mapsto \dfrac{1}{\sqrt{2}} \left (|0\rangle + |1\rangle) |\alpha\rangle |\beta\rangle \right) \mapsto \dfrac{1}{\sqrt{2}} \left (|0\rangle |\alpha\rangle |\beta\rangle + |1\rangle |\beta\rangle |\alpha\rangle \right) \mapsto \dfrac{1}{2} \left [ |0\rangle \left( |\alpha\rangle |\beta\rangle + |\beta\rangle |\alpha\rangle \right) + |1\rangle \left( |\alpha\rangle |\beta\rangle - |\beta\rangle |\alpha\rangle \right) \right]$

From the last step, it can be seen that if $|\alpha\rangle = |\beta\rangle$ then measuring the ancillary qubit yields $|0\rangle$ with probability 1. More generally,

$\Pr[ \text{obtain } |0\rangle] = \dfrac{1}{4} \lVert |\alpha\rangle |\beta\rangle + |\beta\rangle |\alpha\rangle \rVert^{2} = \dfrac{1 + | \langle \alpha | \beta \rangle |^2}{2}$.

When we run the swap test for $|\phi \rangle$ and the ancillary qubit of $|\psi \rangle$, the state changes in the circuit as follows:

$|0\rangle |\phi\rangle |\psi \rangle \mapsto \left( \dfrac{ |0\rangle + | 1\rangle}{2} \right) |\phi\rangle \left( \dfrac{|0\rangle |a\rangle}{\sqrt{|\vec{a}|}} + \dfrac{|1\rangle |b\rangle}{\sqrt{ |\vec{b}|}} \right) \mapsto \dfrac{1}{2} \left[ |0\rangle \left( \dfrac{|\phi\rangle |0\rangle |a\rangle}{\sqrt{|\vec{a}|}} + \dfrac{ |\phi\rangle |1\rangle |b\rangle}{\sqrt{ |\vec{b}|}} \right) + |1\rangle \left( \dfrac{|0\rangle |\phi\rangle |a\rangle}{\sqrt{|\vec{a}|}} + \dfrac{|1\rangle |\phi\rangle |b\rangle}{\sqrt{ |\vec{b}|}} \right)\right] \mapsto \dfrac{ |0\rangle}{2\sqrt{2}} \left[ \left( |\phi\rangle |0\rangle + |0\rangle |\phi\rangle \right) \dfrac{|a\rangle}{\sqrt{|\vec{a}|}} + \left( |\phi\rangle |1\rangle + |1\rangle |\phi\rangle \right) \dfrac{|b\rangle}{\sqrt{|\vec{b}|}} \right] + \dfrac{ |1\rangle}{2\sqrt{2}} \left[ \left( |\phi\rangle |0\rangle - |0\rangle |\phi\rangle \right) \dfrac{|a\rangle}{\sqrt{|\vec{a}|}} + \left( |\phi\rangle |1\rangle - |1\rangle |\phi\rangle \right) \dfrac{|b\rangle}{\sqrt{|\vec{b}|}} \right]$.

The probability of obtaining $|0\rangle$ when the first qubit is measured is given by the norm of the post-measurement state.

Let $| \Omega_{j}\rangle = |\phi\rangle |j\rangle + |j\rangle |\phi\rangle$ for $j = 0,1$. Using the definition of $\phi$, it is a bit tedious but straightforward to verify that

$\langle \Omega_0 |\Omega_0\rangle = \dfrac{4 |\vec{a}|^2 + 2 |\vec{b}|^2}{Z}, \quad \langle \Omega_1 |\Omega_1\rangle = \dfrac{4 |\vec{b}|^2 + 2 |\vec{a}|^2}{Z}, \quad \langle \Omega_0 |\Omega_1\rangle = \langle \Omega_0 |\Omega_0\rangle = \dfrac{-2 |\vec{a}| |\vec{b}|}{Z}$

Then we can easily compute

$\Pr[ \text{obtain } |0\rangle] = \dfrac{1}{8} \left\lVert \dfrac{ |\Omega_{0}\rangle |a\rangle}{\sqrt{|\vec{a}|}} + \dfrac{ |\Omega_{1}\rangle |b\rangle}{\sqrt{|\vec{b}|}} \right\rVert^2 = \dfrac{1}{8} \left( \dfrac{ \langle \Omega_0 |\Omega_0\rangle \langle a |a \rangle}{|\vec{a}|} + \dfrac{ \langle \Omega_1 |\Omega_1\rangle \langle b | b \rangle}{|\vec{b}|} + \dfrac{ \langle \Omega_0 |\Omega_1\rangle \langle a |b \rangle}{\sqrt{|\vec{a}| |\vec{b}|}} + \dfrac{ \langle \Omega_1 |\Omega_0\rangle \langle b |a \rangle}{\sqrt{|\vec{b}| |\vec{a}|}} \right) = \dfrac{1}{8Z} \left[ 4 ( |\vec{a}|^2 + |\vec{b}|^2) + 2( |\vec{a}|^2 + |\vec{b}|^2 - 2 \vec{a}\cdot \vec{b} \right] = \dfrac{1}{8Z} \left( 4Z + 2 D(\vec{a},\vec{b})^2 \right)$

so an estimate for this probability yields an estimate for the distance between $\vec{a}$ and $\vec{b}$. We just need to make several copies of the states and run the swap test a few times, and estimate the probability from the fraction of outcomes where $|0\rangle$ was obtained.

Finally, we apply this to the problem of cluster assignment by setting $\vec{a} = \vec{u}$ and computing the distance for $\vec{b}_0 = \frac{1}{M} \sum_j \vec{v}_j$ and $\vec{b}_1 = \frac{1}{M} \sum_k \vec{w}_k$, and assign $\vec{u}$ to $V$ if it is closer to $\vec{b}_0$, or to $W$ if it is closer to $\vec{b}_1$.

What we have seen above is just one example of a problem that involves taking the vector dot product in a high-dimensional real vector space, where typical classical methods take $O(N)$ steps to compute, while the quantum version takes $O(\log N)$ steps. However, it is important to note that this speedup comes with some caveats, such as the need for a quantum memory in order to prepare the necessary states in the above method, and such requirements are not necessarily easy to achieve in practice.

In the same paper, Lloyd, Mohseni, and Rebentrost used the quantum algorithm above as a subroutine in an adiabatic quantum algorithm for solving the $k$-means problem. The $k$-means problem is an unsupervised learning problem of assigning $M$ vectors to $k$ clusters in such a way that the average distance to the centroid of the cluster is minimized. The adiabatic algorithm they describe is based on the standard classical method used for solving the $k$-means problem, but with the optimization being run on an adiabatic quantum computer.

Reference:

S. Lloyd, M. Mohseni, P. Rebentrost, "Quantum algorithms for supervised and unsupervised machine learning", arXiv:1307.0411v2.