Extra qubits, please!

I’m working in hybrid quantum computing — that is, in this context, the instances where you combine classical computing power with quantum computing power. There are plenty of interesting questions in this space, in part because addressing “hybrid computing” is fairly vague; my particular approach has been to look at what you can do with more or less computing time on the quantum side, but other authors have, for example, examined the classical overhead necessary to expand the working space of your quantum computer by a little bit: in this post, I’ll outline some of the ideas put forward by Bravyi, Smith and Smolin in 2016 about this.

Before we get into it, though, a forenote about quantum computing and what to make of it if you are unfamiliar with the topic: “quantum computing” has the misfortune of putting together two cool and sufficiently vague terms (i.e., “quantum” and “computing") to give rise to all sorts of related nonsense and misunderstanding². On the other hand, and as with (I expect) any scientific area, there are many ways of “looking” at quantum computing. Personally, I am a big fan of Scott Aaronson’s "Quantum Computing Since Democritus", as it provides a lovely bird’s-eye view of the field, as described by someone who is both a great communicator and an important subject in the field.³ ⁴

One of the reasons why we should look at quantum computing is that it is probably more powerful than the classical model of computation. Part of this can be understood to have to do with how many numbers you need to keep track of to fully describe a quantum circuit: whereas to describe a classical circuit of \(n\) wires you need to keep track of \(n\) bits, to describe an \(n\) wire quantum circuit you need generally⁵ to keep track of \(2^n\) values. However, by the end of the circuit, you want to produce a classical output that you can read — say, a single bit — and so you convolve these \(2^n\) numbers in some particular way to fold them into that output.⁶

Ok, so now suppose that your quantum computer is very crummy. In particular, suppose that your quantum computer does, at most, \(1000\) qubits.⁷ Say also that you’re very interested in a procedure that requires \(1001\) qubits. Surely, you can do better than simulating those \(1001\) qubits classically!⁸ Turns out a positive answer to this statement (when asked in its full generality) is not obvious.

Bravyi et al. address a restricted version of this question: provided that each qubit (which is to say, each line in the circuit) participates in at most \(d\) two-qubit gates⁹, what can you do about a circuit with \(n+k\) lines, given access to a circuit with \(n\) lines? Thus, they first give a name to such circuits:

And then, per the discussion above, we’re not so much interested (necessarily) in these circuits themselves, but rather in the whole procedure that performs a measurement at the end of the circuit and distils the result into a single (classical) bit. So, Bravyi, Smith and Smolin give also a name to the full procedure:

Note how the overhead goes with \(2^k\), exponentially the amount of “extra” qubits we don’t have! That’s intuitively satisfying, I think. (Also, there’s an exponentially growing overhead, which is also intuitive, otherwise simulating an SQC shouldn’t be so hard after all.) But how might they prove this?

First, an assertion about operations on joint blocks of \(k\) and \(n\) qubits, respectively, is made:

Why is this assertion true? Essentially, it is a fundamental result¹¹ that two-qubit gates can be described as a sum of \(2^4\) circuits, where each circuit is composed of two single-qubit gates. On the other hand, we are dealing with a \(d\)-sparse circuit; therefore, at worst, there are \(d\) gates connecting a qubit in block \(A\) to block \(B\). If block \(A\) has \(k\) qubits (and block \(B\) has \(n\) qubits), there are therefore at most \(kd\) gates connecting the two blocks. By decomposing each of them into a sum of \(2^4\) actions, and as sequential actions are multiplicative, we get a total of \((2^4)^{kd}=2^{4kd}\) circuits.

Now, to move on, we need some symbolic description of a SQC. I’m just going to provide it to you now, and ask that you don’t panic:

What’s going on here? Broadly speaking, the capital \(\Pi\) operation represents the different measurements we may obtain (thus every bit string from \(0\) to \(2^{n+k}-1\)), but filtered according to our classical process (if \(f(x)=0\), that bit string is ignored). \(U\vert{0^{n+k}}\rangle\) is the quantum state our circuit produces, and “sandwiching” \(\Pi\) with the state gives us the expected value of \(\Pi\) over the probability distribution of the state: this is called Born’s rule.

Now, we can take \(U\), which, recall, represents a \(d\)-sparse circuit, and apply our previous statement. This will give us a sum of many such “sandwiches” of states of only \(n\) qubits:

\[ \pi = \sum_{y \in \{0,1\}^k} \sum_{\alpha,\beta = 1}^{2^{4kd}} (\cdots) \langle \phi_\alpha \vert \Lambda(y) \vert \phi_\beta \rangle \] \[ \Lambda(y) = \sum_{z \in \{0,1\}^n \vert f(y \oplus z)=1} \vert z \rangle\langle z \vert \] \[ \vert \phi_\alpha \rangle = W_\alpha \vert 0^n \rangle \]

(\({\scriptstyle(\cdots)}\) is a numeric factor we can calculate. Notice how in the definition of \(\Lambda(y)\) we write \(f(y \oplus z)\), where \(\oplus\) denotes concatenation. It’s as though we were running many circuits, and assuming for each one that the measurement of the \(k\) qubits took a particular value.)

This puts us nearly there, because we have a sum of \(2^{4kd + k}\) terms, where each term is a sandwich of \(n\)-qubit states. We only need a prescription for an \(n\)-qubit circuit to calculate those sandwiches. Fortunately, calculating such a sandwich is a fairly standard problem in quantum computing, and can be achieved with a so-called Hadamard test. For completeness, here are the circuits that do that:

Now, are these circuits sparse? Yes, they are \(dk\)-sparse! Why? Because, again due to the statement that gives us \(W_\alpha\), we know that \(W_\alpha\) and \(W_\beta\) differ by at most \(kd\) single qubit gates. This means that those controls need to affect these gates only; and so we have at most \(kd\) two-qubit gates between the first qubit and the rest of them. At the same time, \(W_\alpha\) and \(W_\beta\) are \(d\)-sparse, so all qubits except the control qubit participate in at most \(d\) two qubit gates. There is, thus, a sort of special structure, where most of the circuit is much more sparse than the circuit is a whole; BS&S tell us that by shuffling qubits around we can take advantage of this (provided \(n\) is sufficiently large, namely \(n\geq kd+1\)), and create an equivalent circuit that is \((d+3)\)-sparse.

Since calculating these sandwiches is a well-studied task, we also know how many times we need to run the circuits: about \(2^{4kd+k}\).¹²

Thus, we have a procedure that involves \((d+3)\)-sparse circuits of \(n+1\) qubits, requiring about \(2^{8kd+2k} = 2^{O(kd)}\) samples, with polynomial time classical calculation for each sample, meaning \(2^{O(kd)}\mathrm{poly}(n)\)-time classical calculation to add everything up. We can easily relable \(k = k'+1\) and \(n = n'-1\) to get the statement originally claimed! Well done.

As a concluding remark, note how deep a use we made of the fact that the original circuit was \(d\)-sparse. It shouldn’t be surprising, thus, that making a similar statement about the generality of quantum circuits (which is to say, the generality of procedures) is non-obvious!

Why “measure \(u\) times”? Cf. footnote 6; when you measure a quantum circuit you get a sample from a probability distribution, not the parameters of the distribution itself. So if you need to have an idea of the probability distribution — because, say, the relevant quantity is the mean of the distribution, which happens often — you need to measure multiple times. ↩
If you want to spot a fraudster, look out for promises of efficiently solving an NP-complete problem, like the travelling salesman problem. Or, for a more complete experience, try my QC nonsense bingo. ↩
Mr. Aaronson has a markedly computer-scientist approach to the field, which is also my favourite — and I’m supposedly a physicist! — but it must be acknowledged that it is not the only one. ↩
If, however, you’re looking for an even faster introduction to QC, here’s what I can give you: QC is essentially the computing model you get when you consider probability amplitudes, rather than probabilities, i.e., numbers that are in the range \(\vert z\vert < 1\) and that map to probabilities as \(\vert z\vert^2\). (It turns out such quantities are physical.) This is different from considering probabilities, because two objects with the same probability distribution may have different underlying probability amplitudes, such that they are mapped differently under the same operation. This has some strange implications for join probabilities. Got all that? ↩
As usual reality is much more interesting. ↩
Specifically how you convolve these numbers has to do with what each of the \(2^n\) values is: namely, each value is a probability amplitude for each bit string in \(0, \ldots, 2^n-1\). The most natural way to produce an output, then, is to sample a bit string with corresponding probability and perform some polynomial-time classical processing of it that results in a single bit. ↩
I wish! Google currently holds the record (at the time of writing), with about \(50\) qubits. ↩
You’d better hope so, as (for strong simulation) you’d need to keep track of a cool \(21430172143725346418968500981200036211228096234110672148875007767407021022498722449863967576313917162551893458351062936503742905713846280871969155149397149607869135549648461970842149210124742283755908364306092949967163882534797535118331087892154125829142392955373084335320859663305248773674411336138752\) values. ↩
You can break down anything into single-qubit and two-qubit gates, which is why this requirement makes sense. ↩
I’ll be referring to the three authors by their initials, as is also tradition. I think Bourbaki had the right idea. ↩
The Pauli matrices set tensor itself is a basis for \(SU(4)\). ↩
I’m trying to keep this light, but if you want to check it for yourself, you’ll need to see that the variance of the estimator you produce with the circuits is \(\leq 2^{k+1}\), which isn’t hard, and then use Hoeffding’s inequality. However, at that point I recommend just following the original publication :) ↩