# The folly of fidelity and how I learned to love randomness

Suppose a quantum device would ideally prepare a state $\psi$, but instead we prepare some other, approximate state $\rho$. How should the error be quantified?

For many people, the gold standard is to use the fidelity. This is defined as

$F = \langle \psi \vert \rho \vert \psi \rangle$

One of the main goals of this blog post is to explain why the fidelity is a terrible, terrible error metric. You should be using the “trace norm distance”. I will not appeal to mathematical beauty (though trace norm wins on these grounds also) but simply to what is experimentally and observationally meaningful. What do I mean by this? We never observe quantum states. We observe measurement results. Ultimately, we care that the observed statistics of experiments are a good approximation of the exact, target statistics. In particular, every measurement outcome is governed by a projector $\Pi$ and if we have two quantum states $\psi$ and $\rho$ then the difference in outcome probabilities is

$\epsilon_\Pi = | \mathrm{Tr} [ \Pi \rho ] - \langle \psi \vert \Pi \vert \psi \rangle |$

Error in outcome probabilities is the only physically meaningful kind of error. Anything else is an abstraction. The above error depends on the measurement we perform, so to get a single number we need to consider either the average or maximum over all possible projectors.

The second goal of this post is to convince you that randomness is really, super useful! This will link into the fidelity vs trace norm story. Several of my papers use randomness in a subtle way to outperform deterministic protocols (see here, here and most recently here). I’ve also written another blog post on the subject (here). Nevertheless, I still find many people are wary of randomness. Consider a protocol for approximating $\psi$ that is probabilistic and prepares states $\phi_k$ with probability $p_k$. What we will see is that the error (when correctly quantified) of a probabilistic protocol can be considerably less than that of any of the pure states $\phi_k$ in the ensemble. Strange but true. Remember, I only care about measurement probabilities. And you should only care about measurement probabilities! Given a random protocol, the probability of a certain outcome will be
$\sum_k p_k \langle \phi_k \vert \Pi \vert \phi_k \rangle = \mathrm{Tr}[ \Pi \rho ]$
where
$\rho = \sum_k p_k \vert \phi_k \rangle \langle \phi_k \vert$
so our measure of error must depend on the averaged density matrix. Even though we might know after the event which pure state we prepared, the measurement statistics are entirely governed by the density matrix that averages over the ensemble of pure states. When I toss a coin, I know after the event with probability 100% whether it is heads or tails, but this does not prevent it from being an excellent 50/50 (pseudo) random number generator. Say I want to build a random number generator with a 25/75 split of different outcomes. I could toss a coin once: if I get heads then I output heads; otherwise I toss a second time and output the second toss result. This clearly does the job. We do not hear people object, “ah ha but after the first toss, it is now more likely to give tails so your algorithm is broken”. Similarly, quantum computers are random number generators and are allowed a “first coin toss” that determines what they do next.
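To make the point about ensembles concrete, here is a minimal NumPy sketch. The three-state ensemble, its probabilities and the measurement direction are made-up illustrative values, not taken from any of the papers mentioned. It computes an outcome probability from the ensemble, from the averaged density matrix, and by simulating the "first coin toss" followed by the measurement.

```python
import numpy as np

rng = np.random.default_rng(0)

def ket(angle):
    """Real qubit state cos(angle/2)|0> + sin(angle/2)|1>."""
    return np.array([np.cos(angle / 2), np.sin(angle / 2)])

# Hypothetical ensemble: pure states phi_k prepared with probability p_k.
angles = [0.3, -0.5, 1.1]
probs = np.array([0.5, 0.3, 0.2])
states = [ket(a) for a in angles]

# Averaged density matrix rho = sum_k p_k |phi_k><phi_k|.
rho = sum(p * np.outer(s, s) for p, s in zip(probs, states))

# Projector onto an arbitrary (illustrative) measurement direction.
m = ket(0.8)
proj = np.outer(m, m)

# The same outcome probability, computed two ways.
p_ensemble = sum(p * (s @ proj @ s) for p, s in zip(probs, states))
p_density = np.trace(proj @ rho)

# Monte Carlo: first pick k with probability p_k, then sample the outcome.
shots = 200_000
ks = rng.choice(len(states), size=shots, p=probs)
click_prob = np.array([s @ proj @ s for s in states])[ks]
p_sampled = np.mean(rng.random(shots) < click_prob)

print(p_ensemble, p_density, p_sampled)
```

The first two numbers agree to machine precision, and the sampled frequency should agree up to statistical noise, even though each individual shot used a definite pure state.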

Let’s think about the simplest possible example: a single qubit. Imagine the ideal, or target state, is $\vert \psi \rangle = \vert 0 \rangle$. We consider states of the form $\vert \theta \rangle = \cos( \theta / 2 ) \vert 0 \rangle + \sin( \theta / 2 ) \vert 1 \rangle$. The states $\vert \theta \rangle$ and $\vert -\theta \rangle$ both have the same fidelity with respect to the target state. So too does the mixed state
$\rho(\theta) = \frac{1}{2} ( \vert \theta \rangle \langle \theta \vert + \vert -\theta \rangle \langle -\theta \vert )$.
Performing a measurement in the X-Z plane, we get the following measurement statistics.

We have three different lines for three states that all have the same fidelity, but the measurement errors behave very differently. Two things jump out.

Firstly, the fidelity is not a reliable measure of measurement error. For the pure states, in the worst case, the measurement error is 20 times higher than the error as quantified by fidelity (that is, the infidelity $1 - F$). Actually, for almost all measurement angles the infidelity is considerably less than the measurement error for the pure states. Just about the only thing fidelity tells you is that there is one measurement (e.g. with $\varphi=0$) for which the infidelity equals the measurement error. But it is not generally the case that the quantum computer will perform precisely this measurement. This closes the case on fidelity.

Secondly, the plot highlights how the mixed state performs considerably better than the pure states with the same fidelity! Clearly, we should choose the preparation of the mixed state over either of the pure states. For each pure state, there are very small windows where the measurement error is less than for the mixed state. But when you perform a quantum computation you will not have prior access to a plot like this to help you decide what states to prepare. Rather, you need to design your computation to provide a promise that if your error is $\delta$ (correctly quantified) then all probabilities should be correct up to, and not exceeding, $\delta$. The mixed state gives the best promise of this kind.
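For readers who want to play with the numbers, here is a rough sketch of this single-qubit comparison. It rests on two assumptions of mine: the measurement is a projector onto $\cos(\varphi/2) \vert 0 \rangle + \sin(\varphi/2) \vert 1 \rangle$, and $\theta = 0.1$ is just an illustrative value, not necessarily the one used for the plot. With this parametrisation the worst-case error works out to $\sin(\theta/2)$ for either pure state and $\sin^2(\theta/2)$ for the mixed state, the latter being exactly the infidelity.

```python
import numpy as np

def ket(angle):
    """Real qubit state cos(angle/2)|0> + sin(angle/2)|1>."""
    return np.array([np.cos(angle / 2), np.sin(angle / 2)])

def dm(angle):
    """Density matrix of the pure state ket(angle)."""
    v = ket(angle)
    return np.outer(v, v)

theta = 0.1  # hypothetical rotation error, chosen only for illustration
target = dm(0.0)  # the ideal state |0><0|
rho_plus, rho_minus = dm(theta), dm(-theta)
rho_mixed = 0.5 * (rho_plus + rho_minus)

# Fidelity with |0> is simply the top-left matrix element.
fidelities = [r[0, 0] for r in (rho_plus, rho_minus, rho_mixed)]

# Sweep the (assumed) measurement angle varphi around the X-Z plane.
varphis = np.linspace(0, 2 * np.pi, 2001)

def errors(rho):
    """|Tr[Pi rho] - <psi|Pi|psi>| for every measurement angle."""
    diff = rho - target
    return np.array([abs(np.trace(dm(a) @ diff)) for a in varphis])

infidelity = 1 - fidelities[2]
worst_pure = errors(rho_plus).max()
worst_mixed = errors(rho_mixed).max()

print("infidelity       :", infidelity)
print("worst pure error :", worst_pure)
print("worst mixed error:", worst_mixed)
```

For small $\theta$ the gap between the pure and mixed worst cases is a factor of roughly $1/\sin(\theta/2)$, so it grows as the states get closer to the target.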

Now we can see that the “right” metric for measuring errors should be something like:
$\delta = \mathrm{max}_{\Pi} | \mathrm{Tr} [ \Pi \rho ] - \langle \psi \vert \Pi \vert \psi \rangle |$
Nothing could be more grounded in experiments. This is precisely the trace norm (also called the 1-norm) measure of error, though it needs a bit of manipulation to get it into its most recognisable form. One might initially worry that the optimisation over projections is tricky to compute, but it is not! We rearrange as follows

$\delta = \mathrm{max}_{\Pi} | \mathrm{Tr} [ \Pi (\rho- \vert \psi \rangle\langle \psi \vert) ] |$

let

$M = \rho- \vert \psi \rangle\langle \psi \vert = \sum_j \lambda_j \vert \Phi_j \rangle\langle \Phi_j \vert$

where we have given the eigenvalue/vector decomposition of this operator. The eigenvalues will be real numbers because $M$ is Hermitian.

It takes a little bit of thinking to realise that the maximum is achieved by a projection onto the subspace with positive eigenvalues
$\delta = \mathrm{max}_{\Pi} | \mathrm{Tr} [ \Pi M ] | = \sum_{j : \lambda_j > 0} \lambda_j$
Because $M$ is traceless, the eigenvalues sum to zero, so the positive eigenvalues carry exactly half of the total absolute weight and we can simplify this as follows
$\delta = \frac{1}{2}\sum_{j } | \lambda_j |$
If we calculated this $\delta$ quantity and added it to the plots above, we would indeed see that it gives a rigorous upper bound on the measurement error for all measurements.
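The recipe above is only a few lines of NumPy. The sketch below is my own illustration (the function names and the random test state are assumptions, not from any library): it computes $\delta$ from the eigenvalues of $M$, builds the projector onto the positive-eigenvalue subspace, and checks that randomly drawn projectors never do better.

```python
import numpy as np

def trace_distance(rho, sigma):
    """Half the sum of absolute eigenvalues of the Hermitian matrix rho - sigma."""
    eigvals = np.linalg.eigvalsh(rho - sigma)
    return 0.5 * np.abs(eigvals).sum()

def optimal_projector(rho, sigma):
    """Projector onto the positive-eigenvalue subspace of rho - sigma."""
    eigvals, eigvecs = np.linalg.eigh(rho - sigma)
    pos = eigvecs[:, eigvals > 0]
    return pos @ pos.conj().T

def random_density_matrix(dim, rng):
    """A random full-rank density matrix (illustrative test data only)."""
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real

def random_projector(dim, rank, rng):
    """Projector onto a random rank-dimensional subspace."""
    a = rng.normal(size=(dim, rank)) + 1j * rng.normal(size=(dim, rank))
    q, _ = np.linalg.qr(a)  # columns of q are orthonormal
    return q @ q.conj().T

rng = np.random.default_rng(1)
dim = 4
rho = random_density_matrix(dim, rng)
psi = np.zeros(dim, dtype=complex)
psi[0] = 1.0
target = np.outer(psi, psi.conj())

delta = trace_distance(rho, target)
achieved = abs(np.trace(optimal_projector(rho, target) @ (rho - target)))

# No randomly sampled projector should exceed delta.
sampled = max(
    abs(np.trace(random_projector(dim, rng.integers(1, dim), rng) @ (rho - target)))
    for _ in range(1000)
)

print(delta, achieved, sampled)
```

Random sampling is of course not a proof, just a sanity check that the eigenvalue formula behaves like the maximum it is supposed to be.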

We have just been talking about measurement errors, but have actually derived the trace norm error. Let us connect this to the usual mathematical way of introducing the trace norm. For an operator $M$, we write $|| M ||_1$ or $|| M ||_{\mathrm{tr}}$ for the trace norm, which for Hermitian $M$ simply means the sum of the absolute values of the eigenvalues of $M$. Given two states, the trace norm distance is defined as
$\frac{1}{2} || \rho -\sigma ||_{\mathrm{tr}}$
Hopefully, the reader can see that this is exactly what we have found above. In other words, we have the equivalence
$\frac{1}{2} || \rho -\sigma ||_{\mathrm{tr}} = \mathrm{max}_{\Pi} | \mathrm{Tr} [ \Pi ( \rho -\sigma) ] |$

Note that we need the density matrix to evaluate this error metric. If we have a simulation that works with pure states, and we perform statistical sampling over some probability distribution to find the average trace norm error
$\sum_{k} p_k \frac{1}{2} \big|\big| \vert \phi_k \rangle \langle \phi_k \vert - \vert \psi \rangle \langle \psi \vert \big|\big|_{\mathrm{tr}}$
then this can massively overestimate the true error of the density matrix
$\rho = \sum_{k} p_k \vert \phi_k \rangle \langle \phi_k \vert$
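Here is a small sketch of that overestimate, reusing the single-qubit ensemble $\{ \vert \theta \rangle, \vert -\theta \rangle \}$ with equal weights and an illustrative $\theta = 0.1$ (my choice of value). For this ensemble the average of the pure-state trace distances is $\sin(\theta/2)$, whereas the trace distance of the averaged density matrix is only $\sin^2(\theta/2)$.

```python
import numpy as np

def dm(angle):
    """Density matrix of cos(angle/2)|0> + sin(angle/2)|1>."""
    v = np.array([np.cos(angle / 2), np.sin(angle / 2)])
    return np.outer(v, v)

def trace_distance(rho, sigma):
    """Half the sum of absolute eigenvalues of rho - sigma."""
    return 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()

theta = 0.1  # hypothetical value, for illustration only
target = dm(0.0)
ensemble = [dm(theta), dm(-theta)]
probs = [0.5, 0.5]

# Wrong way: average the trace distance of each pure state.
avg_of_distances = sum(
    p * trace_distance(r, target) for p, r in zip(probs, ensemble)
)

# Right way: trace distance of the averaged density matrix.
rho = sum(p * r for p, r in zip(probs, ensemble))
distance_of_avg = trace_distance(rho, target)

print("average of pure-state distances:", avg_of_distances)
print("distance of density matrix     :", distance_of_avg)
print("overestimate factor            :", avg_of_distances / distance_of_avg)
```

By convexity of the trace norm the first number can never be smaller than the second, so a pure-state average is at best a pessimistic bound.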

I hope this convinces you all to use the trace norm in future error quantification.

I want to finish by tying this back into my recent preprint:
arXiv:1811.08017
That preprint looks not at state preparation but at the synthesis of a unitary. There I make use of the diamond norm distance for quantum channels. The reason for using this is essentially the same as above. If we have a channel that has error $\delta$ in diamond norm distance, it ensures that for any state preparation routine the final state will have no more than error $\delta$ in trace norm distance. And therefore, it correctly upper bounds the error in measurement statistics.