Why I am optimistic about the silicon-photonic route to quantum computing

This is a short overview explaining how building a large-scale, silicon-photonic quantum computer has been reduced to the creation of good sources of 3-photon entangled states (and may simplify further). Given such sources, each photon needs to pass through a small, constant number of components, interfering with at most 2 other spatially nearby photons, and current photonics engineering has already demonstrated the manufacture of thousands of components on two-dimensional semiconductor chips with performance that, once scaled up, allows the creation of tens of thousands of photons entangled in a state universal for quantum computation. At present the fully integrated, silicon-photonic architecture we envisage involves creating the required entangled states by starting with single photons produced non-deterministically by pumping silicon waveguides (or cavities), combined with on-chip filters and nanowire superconducting detectors to herald that a photon has been produced. These sources are multiplexed into being near-deterministic, and the single photons are then passed through an interferometer to non-deterministically produce small entangled states—necessarily multiplexed to near-determinism again. This is followed by a “ballistic” scattering of the small-scale entangled photons through an interferometer such that some photons are detected, leaving the remainder in a large-scale entangled state which is provably universal for quantum computing implemented by single-photon measurements. There are a large number of questions regarding the optimum ways to make and use the final cluster state, dealing with static imperfections, constructing the initial entangled photon sources, and so on, that need to be investigated before we can aim for millions of qubits capable of billions of computational time steps. The focus in this article is on the theoretical side of such questions.

I. INTRODUCTION

Photonics is the “ugly duckling” of approaches to quantum computing. There are two overarching reasons to try and nurture it into a swan.

The first reason is that in photonic integrated circuits (PICs) we believe that we can reduce stochastic noise levels several orders of magnitude below even optimistic estimates of such noise for matter-based approaches. For instance, raw stochastic error rates of 10⁻⁶ would, I believe, be a pessimistic estimate (see Section VI for an argument why current experiments indicate they are no bigger than 10⁻⁵ and why I would expect at least 10⁻⁸ is achievable with current devices). However, in photonics we have an inescapable (constant) overhead, due to the fact we use non-deterministic linear optical gates, of around 100 physical photons to end up with a single “raw computational” photon that is the equivalent of one normal qubit. Are we crazy? From this perspective no one has ever built a single “functional” photonic qubit yet! The reason we should not be committed to an asylum (yet) is that the overhead which photonics assumes will be subsumed by the standard error correction required. For example, in Ref. 1, the authors calculated that it requires 2 × 10⁵ matter qubits with a raw error rate of 10⁻³ (as well as many thousands of gates and measurements, etc.) to achieve a magic state2 with a stochastic error rate of 3 × 10⁻⁶.

The second reason is that PICs are being vigorously pursued for classical computing purposes, and the core components necessary for the quantum architecture are already under investigation and optimization, in multiple variations, by classical photonics engineers. Moreover, the vast majority of this work is fully compatible with present-day fabrication techniques and standards (i.e., CMOS technology), so that while current classical PICs only contain several thousands of components (in a few hundred square microns),3 it is reasonable to expect PICs containing many hundreds of thousands of components soon. This has become even more likely with recent breakthroughs4,5 that prove PICs can not only be CMOS compatible but can also be built with “zero-change” of current CMOS fabrication facilities. As such, contemplating PICs of many millions of photonic components is brave but not unreasonable. One may wonder if millions of components are really necessary. Earl Campbell and Joe O’Gorman have recently improved magic-state type protocols (which are some of the most resistant to stochastic noise) and have done careful estimates of the resources required for running Shor’s algorithm to factorize numbers beyond those which are classically achievable. Their numbers indicate that we need to be able to swallow resource numbers qubits × time steps ≈ 10¹⁵–10¹⁷. CMOS technology is the only method known for manufacturing dynamical devices on this scale.

Typically matter-based approaches to quantum computing begin by building and controlling small numbers of qubits; the challenge is then scaling up. For the photonic architecture I will overview here, things are somewhat inverted: the primary challenge is efficiently producing small (3-photon) entangled states on a PIC. The large scale architecture becomes relatively trivial—it is just an interferometer built from components that are already able to be fabricated to high precision, sufficient to generate large amounts of entanglement in a cluster state6 universal for quantum computing. Given such sources, our overhead will then be less than 20 physical photons per final qubit in the cluster state, only two photons would undergo any kind of potentially noisy active element, no proper quantum memory is required, and all photons go through a constant number of photonic elements regardless of the computation size. Note that there are two “only a constant” overheads here that are both crucial to the viability of this architecture. The first is that every logical qubit in the final architecture comprises a constant number of photons, and the second is that each of these photons will itself only encounter a constant number of optical elements, which means the amount of loss per component will not need to decrease as we scale.

Because the architecture is based on a highly modular production of the cluster state, once we do have a single functional photonic qubit we not only have arbitrary numbers of such qubits, we also have the ability to perform arbitrary gates between them—another way in which this architecture is very different from matter-based approaches for which there is a big difference between single-qubit gates, two-qubit gates, and multi-qubit gates between distant qubits.

None of the preceding should be taken as advocating the abandonment of matter-based approaches (see the  Appendix for a rant about this). Rather I want here to emphasize the considerable amount of “photonics specific” theory that remains to be done. Matter-based approaches benefit from many theoretical results that can abstract away the underlying physics of any particular implementation. However, most of this work is of marginal relevance to the key challenges of building a photonic quantum computer.

II. OPTICAL FREQUENCY PHOTONS AS QUBITS

For most physical realizations of a qubit, the natural environment in which it is immersed is a source of stochastic noise. By stochastic noise I will mean any random evolution of the qubit, the unknown specific realization of which differs from one computational time step to another. For some matter-based realizations, it is now possible to keep the qubit fairly well isolated until the time a gate needs to be performed on it, and the majority of stochastic errors arise when it is manipulated. Almost all theory of quantum error correction and fault tolerance is based around combating the most common such errors.

For optical frequency photons manipulated by passive interferometers built from essentially perfect linear dielectric—such as waveguides etched into a semiconductor chip7—environmental stochastic noise is presently unmeasurable and expected to be very small. Essentially the energy scales (optical for the photon versus microwave for even room temperature surroundings) are too disparate for this to form any kind of worry until very large scale photonic devices are in hand. I am fond of glibly pointing out that photonic quantum information has already been proven stable against thermal decoherence for 11.5 × 10⁹ yr, which is the time light emitted from the Lyman-α blob LAB-1 retains its polarization on its journey to earth. This is close to the theoretical maximum achievable lifetime for any qubit of 13.7 × 10⁹ yr.

This is not to say there will be no stochastic noise on photonic qubits. The most relevant sources of such noise are the two active elements of the architecture, which are adjustable phase shifters and switches. Essentially they are a source of stochastic noise because we cannot expect them to operate identically from one use to the next. The imperfection is primarily that of the imprecision of the voltage control. In Section VI I will discuss switches and phase shifters in more detail. The over-arching challenge of the architecture discussed here is to minimize the use of active components.

A. Photons need to be indistinguishable from their neighbors

In addition to switches and phase shifters, our photon sources need considerable improvement. They are discussed in detail in Section VIII, but at a high level it is worth noting that the architecture we are proposing requires every photon to interfere with at most 3 other photons, spatially nearby, on the PIC. For the interference to work, ideally the photons would have the same wavepacket.8 It is not important per se whether they are a single frequency, only that their wavepackets are the same, but one way to ensure that they are close to identical is to filter the photons as close to a single frequency mode as possible. For this we can use integrated ring resonators.9,10 Filtering increases purity at the expense of increasing loss, but it is better to have loss, which is directly detectable, than undetectable stochastic errors. Impurity of single-mode entangled states can be distilled out extremely efficiently, limited only by the negligible thermal stochastic noise of the interferometers.11 

To enhance interference, it is also possible to locally “trim” the components of the PIC9,12–14 using lasers, and this allows extremely precise manipulation of the final single photon state. (Trimming is also used extensively on classical integrated circuits.) Such trimming is “set and forget,” i.e., once it is done it does not require maintenance or consume power, but it changes the static properties of the device.

B. Static imperfection is not stochastic noise

Manufacturing imperfections on PICs are inescapable, but they are static: they do not drift with time. If the details of static imperfections (into which we can subsume other static errors like imperfections of the photon wavepacket and inaccuracy of the voltage controls) are unknown, then this leads to systematic errors. However, photonics is unique (I think) in that we can use a classical field (laser light) to determine the interferometer’s parameters (the mode transformation it effects) to very high accuracy15–20 and post-production tuning can be used such that current fabrication tolerances allow for nearly perfect gate operations.21 These are the same parameters as those that govern the 2-photon interference crucial to the whole computer. Such a characterization can also be achieved with the repeated use of single photons instead, if such sources are more readily at hand on the chip. So, one way or the other, we actually expect to have very well known systematic imperfection to deal with.

Unknown systematic error is typically more benign than stochastic noise, and can (for example) be dealt with algorithmically,22 amongst other ways. Known systematic imperfection is much more benign again, although we have done little theory to really explore the implications. Some general things can be said however. Computational universality in the measurement-based setting we envision does not require a cluster state. Having a known state “close to” the cluster state will be sufficient,23 and rather than trying to somehow correct the state and then use it we likely would do better to simply use the ideas of Ref. 24 (and generalizations thereof) to perform computation with it directly in a way that effectively corrects the imperfection via the side classical computation that chooses measurement bases.

When I say a known state “close to” the cluster state suffices, this does not mean we need to produce a state with near unit fidelity (overlap) with the ideal cluster state. In fact it could have a vanishing overlap with it (because local imperfections can multiply to drive the state far from the ideal one) and still work fine. For example, while we do not yet have a complete picture of exactly how interferometer imperfections translate into logical imperfection of the large scale cluster state discussed in Section III, in simple examples the final N-qubit state takes the form

\[
\prod_{i\sim j}\widetilde{CZ}_{ij}\,\bigotimes_{i=1}^{N}|\widetilde{+}\rangle_i ,
\]

where $\widetilde{CZ}_{ij}$ and $|\widetilde{+}\rangle$ are “close to” the ideal controlled-Z between qubits $i$ and $j$ and the +1 eigenstate of Pauli X. Consider the simple case that $\widetilde{CZ}_{ij}=CZ_{ij}$, while $|\widetilde{+}\rangle_i$ varies (but is known well) over each qubit. Then there are simple and highly precise filters (beamsplitter followed by detection), which we can apply to each photon individually, that with success probability dependent on $|\langle +|\widetilde{+}\rangle_i|^2$ either turn this into the ideal cluster state25 or remove the qubit in the standard way (i.e., via an effective computational basis measurement).

The reason I emphasize this over-simple example is that it demonstrates that although there may be a finite systematic imperfection on every qubit which varies over the whole device, the state still has infinite localizable entanglement (and is computationally universal) as long as $|\langle +|\widetilde{+}\rangle_i|^2$ is close enough to 1. (How close depends on details of the cluster state created. For the bond-percolated lattice discussed in Sec. III, $|\langle +|\widetilde{+}\rangle_i|^2 \sim 0.96$ would suffice.) To reiterate, the final state might have a vanishing overlap with the ideal state—we just need to know what the actual state is as precisely as possible. Ideally we would rather avoid a two-stage process of “state correction” followed by computation and rather just compute directly on the imperfect state.
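One way to see why such a filter exists is the following minimal single-qubit sketch (my own illustrative construction, writing the known imperfect state as $|\widetilde{+}\rangle = c|0\rangle + d|1\rangle$ with $|c|\ge|d|$):

\[
A=\begin{pmatrix} d/c & 0\\ 0 & 1\end{pmatrix},
\qquad
A\,|\widetilde{+}\rangle = d\,\big(|0\rangle+|1\rangle\big)=\sqrt{2}\,d\,|+\rangle .
\]

Because $A$ is diagonal in the computational basis it commutes with every CZ bond, so the successful outcome (probability $2|d|^2$, close to 1 for a small imperfection) leaves the ideal $|+\rangle$ in place, while the failure operator $\sqrt{I-A^\dagger A}\propto|0\rangle\langle 0|$ is precisely a computational basis projection that removes the qubit. In dual rail, $A$ amounts to weakly tapping one of the qubit’s two modes with a beamsplitter and heralding on the absence of a click at the tap.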

Here are some of the theoretical questions which remain largely unexplored on this topic:

  1. Is there a simple model that captures how manufacturing imperfections translate into logical imperfection of the large scale entangled state we produce?

  2. Can we apply the wealth of existing abstract theory regarding states universal for measurement-based quantum computing to determine how best to use these states?

III. LOGICAL OVERVIEW OF THE ARCHITECTURE

At the logical level, the current architecture we are proposing creates a large entangled cluster state,6 with several layers of the cluster “alive” at the same time. The algorithm implemented is determined by the choice of phase shifts (as computed by an efficient classical side-computation) that some of the photons experience before the measurement; all the rest of the architecture is fully modular. At this logical level, the cluster state being produced is a bond-percolated 3d lattice, as depicted in Fig. 1. Lattices where each vertex only has degree 4, such as the diamond lattice or the “surface code” lattice of Ref. 26, seem best, although they are not the only ones achievable from the 3-photon GHZ states which will be our fundamental resource.27

FIG. 1.


An example of a bond-percolated surface code cluster state that can be generated on a 2d photonic integrated circuit. Only a constant number of layers of the cluster in the z-direction will need to be present at any particular time. The red/black coloring of qubits is primarily to aid the eye, though for this lattice it corresponds to the primal/dual qubits of the scheme in Refs. 26 and 28. Even for only a very small thickness—here 6 qubits—in the y-direction, the percolated lattice will extend O(10⁴) qubits in the x and z directions.

In Fig. 1, the z-dimension of the cluster state shown should be thought of as time. Thankfully, not all the photons depicted would need to be “alive” simultaneously; in fact, in the cluster state computation, in principle one needs only to have the qubits within two bonds of the qubit currently being measured actually alive. However, due to the percolated nature of the lattice, the choice of measurements to make on the leading layer of photons depends on features of the lattice behind that layer. There is therefore a certain “computational window” of qubits which will necessarily need to be kept alive within the computer while choices of measurement basis are made.

There are two ways in which we can use such a lattice in a computation. In method 1, which was the original proposal,29 we simply carve through paths that encode single qubits as per standard cluster state computation. Preliminary simulations show that we can keep such paths going essentially indefinitely with a computational window of about 10-15 (this means we need a delay line of about 10-15 clock cycles, see Section VII). We also find that, although we have not yet incorporated any coding against loss, there is actually some serendipitous loss tolerance, on the order of a few percent. Specifically, on detecting a lost photon we can attempt to logically punch it out using computational basis measurements. This works because our lattice has percolated quite far above the percolation threshold. Note that this recovery method is almost certainly not optimal even for method 1—various stabilizers can be measured to remove lost qubits as well.

Method 2 is to actually use the surface code from which this lattice originates, as per the ideas in Ref. 28. Given that this method automatically provides some tolerance to stochastic noise, it is likely preferable. However, unlike the lattice assumed there, here we are missing bonds (the specific locations of which we know in advance of performing the computational measurement). We also will have some photon loss that we only know has occurred at the point of detection. The computational method of Ref. 28 involves carving out deformable “tubes” measured in the computational basis (essentially the equivalent of the single qubit lines in method 1) with in-between (“vacuum”) qubits measured in the |±⟩ eigenbasis of Pauli-X. Both the missing bonds and loss can potentially be tackled via the method of “super-stabilizers,” where we combine (in a side classical computation) stabilizers into larger ones and deform the tube surfaces suitably to swallow errors into tube interiors. This technique makes regular surface code computation highly resilient to loss,30,31 despite the fact it is, of course, a code designed to deal with stochastic errors. So our focus at present is trying to understand if method 2 really will be better than method 1 (a combined option in which we first “repair” the surface code lattice similar to method 1 also exists), but a full simulation and comparison have not yet been done.

Note that once we have built the cluster state, there is no difference between performing a logical two-qubit gate versus a one-qubit gate. That difference is one only of the patterns of single photon measurements. So the natural progression for photonics is very different to the conventional progression from single to two-qubit gates, each based on very different physics.

In Fig. 1 the depth in the y-direction is 6, and yet this lattice will, with high probability, have paths extending O(10⁴) qubits in the x and z directions. We see therefore that the percolation is remarkably efficient—so far we have been unable to find a 2d lattice that we can percolate with such small initial resources as 3-photon GHZ states,45 and yet we need to go only a very small way into the third dimension to get large scale entanglement.
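As a rough numerical illustration of how forgiving this regime is, the following Python sketch runs bond percolation on a simple cubic lattice (a stand-in for the degree-4 lattices actually discussed here, so the numbers are only indicative) with an assumed bond probability of 0.75 and a y-thickness of 6, and checks for a cluster spanning the z-direction:

import random
from collections import deque

def spans_z(nx, ny, nz, p, rng):
    """True if retained bonds connect the z = 0 face to the z = nz - 1 face."""
    idx = lambda x, y, z: (x * ny + y) * nz + z
    adj = [[] for _ in range(nx * ny * nz)]
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                # Keep each nearest-neighbour bond independently with probability p.
                for dx, dy, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
                    X, Y, Z = x + dx, y + dy, z + dz
                    if X < nx and Y < ny and Z < nz and rng.random() < p:
                        a, b = idx(x, y, z), idx(X, Y, Z)
                        adj[a].append(b)
                        adj[b].append(a)
    # Breadth-first search from the entire z = 0 face.
    frontier = deque(idx(x, y, 0) for x in range(nx) for y in range(ny))
    seen = set(frontier)
    while frontier:
        v = frontier.popleft()
        if v % nz == nz - 1:  # reached the far face in z
            return True
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                frontier.append(w)
    return False

rng = random.Random(0)
trials = 20
hits = sum(spans_z(nx=20, ny=6, nz=20, p=0.75, rng=rng) for _ in range(trials))
print(f"z-spanning cluster found in {hits}/{trials} trials")

With a bond probability this far above the cubic-lattice bond-percolation threshold (about 0.249), spanning clusters appear in essentially every trial, even for a thin slab.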

The x-direction is the one across which our logical qubits will be defined. Because the amount of 3-dimensionality required purely for the purposes of percolation is very small, if the goal is to use method 1, as per the original ideas of producing percolated cluster states,29 then this suggests that a “squashing down” of the third dimension onto a 2d chip would not cause too many problems. That is, it will only require a fixed number (about 6) of crossings for photons to reach the partner they need to interfere with, and exchanging photonic qubits for these sorts of distances is not an issue. (It could be argued that photonics should be looking beyond local, lattice-based codes because there are not the same spatial locality constraints as for matter qubits—for example, there are interesting finite-rate codes32 unavailable to “locality-constrained” qubits). This architecture would consist of sources at the one end of the chip, interferometers in the middle, and detectors at the far end. I will call this a 1 + 1 (one space, one time) dimensional architecture. However, if, as seems likely, we would rather follow method 2, then a 1 + 1 dimensional architecture is more problematic. Fault tolerance is achieved in the surface code lattice by having a large number of qubits in the y-direction. Doing the same kind of lattice squashing would require the non-locality of the photonic interactions to grow as the size of the computation grows.

It turns out that it is possible33 to engineer a 2-dimensional photonic chip to allow for arbitrarily large 3-dimensional logical cluster states, in such a way that the number of photon crossings required is at most 1. It is this architecture that will be overviewed in Sec. IV.

Here are some of the theoretical questions which remain largely unexplored on this particular topic:

  1. What are the good protocols for exploiting serendipitous loss tolerance?

  2. Ultimately should we pursue method 1 or method 2 or some other variant/combination? What are the best “path finding” algorithms for each case? What are minimal computational windows required in each case?

  3. How local can the classical side-computation of the measurement basis be? In method 1 we believe it can be highly local, as for the standard cluster state computation on a non-percolated lattice, but for method 2 much less is known.

  4. Recently it was shown that almost all the well-studied quantum error correcting codes can be “clusterized” into a 3d lattice34 very similar to the surface code approach. Should we be using one of these variants, particularly as we are more concerned about loss tolerance than stochastic noise anyway? Should we give up on spatially local codes altogether?

  5. Photons need not be qubits; they can readily encode higher dimensional systems, which, at least abstractly, can be advantageous.35,36 Can this be translated into a practical advantage?

IV. PHYSICAL OVERVIEW OF THE ARCHITECTURE

The big-picture view of the 2 + 1 dimensional architecture (see Refs. 33 and 37 for many more details) is depicted in Fig. 2. We imagine a large semiconductor wafer, probably silicon, divided into a square lattice. The two dimensions x and y of the wafer correspond to the x and y dimensions of the logical cluster state depicted in Fig. 1.

FIG. 2.


Unit cell of the wafer which extends in the x and y directions of the logical lattice of Fig. 1. Circles labelled “II” are boosted type-II fusion gates. Black squares are sources, each emitting a 3-photon GHZ state. Two of the 18 emitted photons need to be delayed before the measurement (the time delay corresponding to the z-direction of Fig. 1); these are computational photons and are the only ones undergoing an active phase shifter once the photons are emitted from the source. Note that the only crossing here is that of the two photons at the centre of the unit cell.

Each unit cell of the wafer contains 6 sources capable of producing 3-photon GHZ states (equivalent to a 3-photon linear cluster state) with high efficiency. Of these eighteen photons, two will be computational qubits in the final logical lattice. One of these will be ahead of the other, i.e., in Fig. 1 they will lie along the line of qubits that extend into the z-direction emanating from each x,y location (for aficionados of a surface code cluster state—one qubit is primal and one is dual). A delay (around 10-15 clock cycles due to the computational window mentioned above) is necessary before we know which basis these photons will be measured in. This is depicted as a winding waveguide delay (see Sec. VII) before the detector. Of the eighteen emitted photons it is only these two which pass through an active element (a phase shifter to choose the measurement basis). We call this architecture “ballistic” for hopefully obvious reasons.

Four of the sixteen non-computational photons are used to connect the unit cell to adjacent cells, by entering a “boosted type-II” fusion interferometer at the cell edge. Type-II fusion38 is an operation that with probability 1/2 joins up (fuses) the cluster state; the boosted form37 exploits the ideas of Refs. 39 and 40 to increase the success probability to 3/4. The boosting process itself consumes either four single photons or a Bell pair, so within each such “II” box indicated in the figure more sources are implicitly necessarily integrated on the chip. The remaining twelve non-computational photons all undergo fusion gates with each other within the cell.

Note that each photon’s world line in this architecture is really short. Even including the non-deterministic circuits to convert single photons into the 3-GHZ states, the “optical depth” of any given photon (the number of elements it passes from birth to measurement) is on the order of 10, and the number of other photons it ever needs to interfere with is at most 3. Contrast this with boson sampling41 which requires optical depths of several hundred elements and interference with many tens of other photons. Perhaps the easiest way to do boson sampling (though why would you) is to build a photonic quantum computer!

In terms of the initial 3-photon GHZ generation, coupling in photons from off-chip sources via fiber is possible, but this is clunky and ultimately hopefully avoidable. The sources will need to operate on some clock cycle that is at least in the megahertz but hopefully gigahertz range. I should say that although I lean towards being “as monolithic as possible,” excellent photonics engineers have expressed the sentiment to me that they envision an architecture like this as consisting of a large number of separate modular chips talking via low-loss interconnects. Either way, what is important is that this architecture avoids multiple stages of massive, technologically demanding active switching and routing networks throughout the computation to construct complex resource states—if it did not, the resource scalings would be prohibitive.42,43

The last thing to point out about the architectural schematic envisaged here is that a 2 + 1 configuration will likely require pump lasers distributed around the chip and will also have some regions of the wafer devoted to classical electronics for the computations that determine the measurement bases. These may also be implemented in different layers.44 As mentioned already, ideally such a computation will use simple logic and be local. But to some extent the simplicity of this whole architecture arises because we have shunted into a classical computation much of what we used to need to deal with actively in a tricky quantum-coherent way. So perhaps I am just being greedy to want the classical computation to also be simple and local, and yet I am reasonably sure that such is feasible.

To summarize, if we had highly pure sources of 3-photon entanglement, compatible with CMOS fabrication, then our photonic architecture becomes incredibly simple: our overhead would be 16 physical photons per final qubit in the cluster state and only two photons would undergo any kind of potentially noisy active element. The rest of the computer is just an interferometer built from components that are already able to be fabricated to tolerances sufficient for us to generate large amounts of entanglement.

Here are some of the theoretical questions which remain largely unexplored on this topic:

  1. Our current proposal is to use interferometers that are loss tolerant, in that we only herald off registered clicks (another improvement over the original percolation proposal29). However, we know we can give up such intrinsic loss tolerance for higher success probabilities (e.g., type-I fusion38), which leads to more connected lattices, etc. Is this beneficial?

  2. We have no proof that 3-GHZ initial states are necessary. Perhaps ballistic protocols with Bell pairs or even single photons are possible, particularly if we take into account options, as per the previous question, which do not destroy two photons at a time. As far as I know only one, not-particularly smart, theorist has devoted about two days to this question.89 

  3. Is it possible to boost fusion to higher probabilities (with ancillas that are no more entangled than 3-GHZ states)? We have no methods (even numerical) for verifying the optimality of any of the current interferometric gates we use! Forget playing Go,46 can we not get some artificial intelligence onto this problem?

V. DETECTORS

Ten years ago, poor detectors would have sat alongside sources as the primary obstacle to photonic quantum computing. With the advent of fully integrable superconducting detectors47,48 that can be parallelized to make them number resolving,49,50 the problem is well on the way to being fully solved.

Direct measurement of detector efficiencies is dominated by other circuit loss,51 but efficiency percentage in the high 90’s, jitter in the low 10’s of picoseconds, reset times on the order of 10’s of nanoseconds, and vanishing rates of dark counts are considered achievable. It should be emphasized, however, that even very small improvements in detectors can lead to large savings downstream in terms of the overall scale of the computer. This is particularly true if our sources are of a multiplexed variety.

There does not seem to be much interesting a theorist can investigate about detectors per se, so we just encourage our experimental colleagues to make them better. Oh—and getting them to work at room temperature would help make lab tours for theorists more interesting.

VI. SWITCHES/PHASE SHIFTERS

A phase shifter in a Mach-Zehnder configuration can act as a switch. Phase shifters can be built on a thermo-electric effect, but these are slow and in general we expect they will need to be built on some variation of an electro-optical effect, or, if speed of operation is not crucial, on an electro-mechanical effect. Nanoscale electro-mechanical devices are slow (though megahertz is achievable) but do have the advantage that they are extremely low loss when in the “open” position, unlike those based on electro-optical effects.

There are a very wide variety of such devices being investigated (within classical photonics engineering), and I am not well-versed enough to summarize them all. There seem to be no good physical reasons that these types of switches cannot have large performance increases in the near future. There will, however, always be tradeoffs: at present, in silicon, they can be very fast (e.g., gigahertz52,53) but then are lossy, or they can be slow (e.g., megahertz54) and very low loss, with a wide variety of performance characteristics in between.

As switches and phase shifters are the primary source of stochastic noise, it is worth understanding some of the limiting factors in switch performance. Static limitations such as fabrication tolerances, accuracy of the voltage source, polarization/spectral-width/other wavepacket issues, and so on lead to imperfections, as discussed in Section II, which can be well characterized and dealt with much more easily than stochastic noise. The dominant source of stochastic noise is expected to be electrical noise (thermal, shot noise, 1/f, etc.) on the voltage source, which at cryogenic temperatures drowns out contributions from induced refractive index changes due to thermodynamic fluctuations/vibrations, etc.

It should be pointed out that if we do have a source of high quality GHZ states then the photons will (at most) have to pass through one switch/phase shifter before the measurement. In particular, computational photons will need to be measured in either the Pauli-X or Z bases if we take a surface code + magic states approach. The former amounts to interfering the two modes that comprise the qubit on a 50:50 beamsplitter and the latter to just measuring the two modes directly with no interference. One option is to send the two modes through a Mach-Zehnder interferometer and adjust the relative phase in the paths to 0 or π/2.
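For concreteness, the dual-rail arithmetic behind that statement is a standard calculation (with one common beamsplitter convention assumed): writing the qubit as $|\psi\rangle=\alpha\,a^\dagger|0\rangle+\beta\,b^\dagger|0\rangle$,

\[
a^\dagger\to\frac{c^\dagger+d^\dagger}{\sqrt{2}},\quad
b^\dagger\to\frac{c^\dagger-d^\dagger}{\sqrt{2}}
\;\;\Longrightarrow\;\;
|\psi\rangle\to\frac{\alpha+\beta}{\sqrt{2}}\,c^\dagger|0\rangle+\frac{\alpha-\beta}{\sqrt{2}}\,d^\dagger|0\rangle ,
\]

so detecting modes a, b directly gives the Z-basis probabilities $|\alpha|^2,|\beta|^2$, while detecting after the 50:50 beamsplitter gives $|\alpha\pm\beta|^2/2=|\langle\pm|\psi\rangle|^2$, i.e., an X-basis measurement.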

An alternative is to use a switch to direct the two photonic qubit modes toward some static measurement setups, i.e., either direct them straight toward some detectors or direct them toward a static 50:50 beamsplitter followed by detectors. The latter method has a possible advantage in that we can characterize and then tune the static imperfections of the effective X, Z measurements very well. This “set-and-forget” possibility has already been demonstrated to work with incredible accuracy.9,55 Also, since X-measurements are more sensitive, perhaps we make them the default and switch only to a Z-measurement. More interestingly, from a theory perspective, things can be arranged so that if the switch does not operate perfectly, in as much as the photon goes the wrong way, all that happens is that we obtain a measurement in the undesired basis. But it is presumably better (and algorithmically correctable) to do a good clean measurement in the wrong basis than measure in the right basis but get the wrong outcome because of stochastic errors. So if, as seems likely, demanding less perfect performance from our switches can reduce stochastic errors then this is worth investigating.

The upshot is that the stochastic noise in the switches is probably at worst going to affect the relative phase between the two paths being directed towards the 50:50 beamsplitter for the X-measurement. The fact that this error is primarily of Pauli-Z type is also of relevance once we consider the quantum error correction and fault tolerance more abstractly. There may well be fancy engineering tricks to minimize this. For example, perhaps we can drive both switches from the same voltage in such a way that first order fluctuations cause opposite sign phase shifts (that is just a vague theorists’ idea!) or use a nano electro-mechanical switch (NEMS) that has this measurement as its open-position static default. But it would be nice to get an estimate of how low we can make this noise with current technology before worrying about future possibilities.

Despite persistent nagging on my part, very little experimental effort has been made to try and obtain an accurate direct measurement of how low the stochastic noise can be made in photonics. The numbers are small, so it is difficult, but presumably by suitable cascading of devices (and perhaps incorporating randomized benchmarking techniques56) the effect can be amplified. We can, however, obtain an indirect estimate.58 It is currently possible to build a Mach-Zehnder type switch57 with an extinction ratio of about −50 dB. If the imperfectness of the extinction were completely due to a random phase shift Δφ in the arms of a perfectly balanced interferometer (it most certainly is not, it is mainly due to static imperfection!) then we would say sin²Δφ ≈ (Δφ)² ≈ 10⁻⁵. Such a phase fluctuation in the X measurement is equivalent to having a Pauli-Z error rate of 10⁻⁵. But it would be surprising if we cannot get down to around 10⁻⁸, which is roughly what shot noise at the 1 V, 1 mA range would imply. Of course I am then told that sub-shot-noise operation is possible, but I have no idea at present how low that might push these numbers.90
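As a back-of-envelope restatement of that estimate (using only the numbers quoted above):

\[
10^{-50/10}=10^{-5}\approx\sin^2\!\Delta\varphi\approx(\Delta\varphi)^2
\;\;\Longrightarrow\;\;
\Delta\varphi\approx 3\times 10^{-3}\ \mathrm{rad},
\]

so even attributing the entire −50 dB extinction to phase fluctuations (which, as noted, it is not) caps the equivalent Pauli-Z error rate at around 10⁻⁵, while shot-noise-limited voltage control at the 1 V, 1 mA level points to something nearer 10⁻⁸.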

Here are some of the theoretical questions which remain largely unexplored on this topic:

  1. Are there advantages to using different speed switches for different roles within the full architecture?

  2. Is there really an advantage of “set-and-forget” over Mach-Zehnder type choices of the measurement basis?

  3. Should we aim for lower performance switches to reduce stochastic errors and try to algorithmically correct incorrect measurements being performed?

VII. DELAYS

Photonic quantum memory is a delicate technology (see, e.g., Ref. 59), requiring controlled interaction of light with matter. It is crucial to the viability of photonic quantum computing that this ballistic architecture does not require such memory because at present their performance is very far from what we require, and they are not integrable. Also, by involving matter we would need to be concerned about the addition of the stochastic error within the memory. Of course if high-performance memory does become available (and integrable) it would likely be extremely useful.

All we need for this architecture are fixed time delays. Delaying a photon a fixed number of clock cycles involves only a long, passive waveguide. On-chip delays that basically wind a waveguide into a compact spiral have already been constructed60,61 with lengths up to 27 m! If we have 100 MHz switches, this is already longer than we require. Note, however, that these delays are considerably larger than any other part of the architecture, and the lowest loss integrated versions (that remarkably potentially achieve losses of less than 0.01 dB/m60) are for materials other than purely silicon. As such it may well be that the architecture involves delay lines in a different layer of the chip, and the issue of coupling between these layers becomes important.

VIII. SOURCES, MULTIPLEXING, AND ALL THAT

Methods for producing single photons have been under active investigation in a wide variety of systems. Given 6 single photons, it is possible to produce a 3-photon GHZ state, but only with probability 1/32.62 In a perfect world, we would be able to produce 3-photon GHZ states directly. This certainly can be done; for example, the photon machine gun63 has recently been built using a self-assembled quantum dot by the Gershoni group.64 Many other options for building the photon machine gun exist—NV centers in diamond, trapped atoms, and ions are all technologies for which the current method of producing single photons is more or less amenable to producing 3-photon GHZ states instead (though admittedly it is more complicated). The extent to which any such method is compatible with being implemented directly on chip varies greatly, as do achievable repetition rates, as does the indistinguishability of the photons produced from different sources. The photons could be produced off-chip and coupled in by fiber; present state of the art for such coupling is an efficiency of −0.36 dB.65 Most concerning, however, is that these methods are vulnerable to stochastic noise being transferred into the photons from the matter responsible for producing them.

A technology we have considerable experience with that can produce highly pure, single photons is spontaneous parametric downconversion (SPDC), where a single photon is heralded via the detection of its partner. Separate SPDC sources can produce highly indistinguishable photons.66 The states from a single SPDC source are highly pure, it should be said, except for the time-bin degree of freedom—the emission time is random. Roughly speaking, such a source can produce single photons at 10’s of gigahertz repetition rates, with a (controllable) probability of any particular time bin containing a photon ranging from about 0.1% to 20% (we assume efficient detectors herald any multi-photon emissions, which are then ignored). Particularly relevant for our purposes is that the spontaneous four-wave mixing (SFWM) variant of such nonlinear optical photon production is fully compatible, in fact already manufacturable, within an integrated architecture—it does not even require any new type of component to be introduced to the fabrication process. This is because we can use the small nonlinearity of the semiconductor itself67 and obtain heralded single photons by the pulsed pumping of long waveguides or of waveguide cavities.4,68–72

The key issue for such sources is to produce photons that are highly indistinguishable. Typically the photon pairs produced (which may be quite different frequencies) will be entangled, in which case we need to measure the heralding photon to collapse its partner onto a pure state. Doing this measurement requires strongly filtering the heralding photon to ensure high purity of the partner, and although extremely good filters exist, such filtering decreases the probability per pulse of the photons. Alternatively, sources which pump high-quality waveguide cavities can directly produce the photon pairs in a product state. Typically indistinguishability is measured in terms of the visibility of a Hong-Ou-Mandel (HOM) test.73 Although extremely high visibilities of the photon pairs produced in a single nonlinear process are possible (showing how strong the correlation/entanglement is), we cannot expect nearly as high visibilities from heralded photons produced from two different such processes. My suspicion is that non-unit visibility arising from two photons being in slightly different pure states is much less detrimental than that caused by the photons being slightly mixed, but a detailed investigation needs to be done. Regardless of how pure we produce these heralded photons, they are still produced by a spontaneous process and so emerge in random time bins.
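For reference, the standard expression relating HOM visibility to the (reduced spectral/temporal) states $\rho_1,\rho_2$ of two independently heralded photons, under the usual idealizations of a perfect 50:50 beamsplitter, exactly one photon per input and no loss, is

\[
V=\operatorname{Tr}(\rho_1\rho_2)\;\le\;\sqrt{\operatorname{Tr}\rho_1^2\,\operatorname{Tr}\rho_2^2},
\qquad
V=|\langle\psi_1|\psi_2\rangle|^2\ \text{for pure states},
\]

so a visibility deficit can come either from non-identical pure wavepackets or from each photon individually being mixed, which is precisely the distinction drawn above.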

A nice idea74 to remove the time-bin randomness is that of multiplexing. This can take either a spatial form—place many sources in parallel and use a switching network to switch out a photon from a source you know has fired—or a temporal form—use adjustable delays on a single stream of photons to shuffle them backwards into desired time bins. Both methods are achievable with integrated photonics,75–78 but both require the photons to traverse switches, which are both lossy and our only significant source of stochastic noise, so minimizing switching is the key concern.

Most of what I will say here is generic, but I will use temporal multiplexing for definiteness. We have a stream of photons where each time bin contains a photon with probability p. Using S + 1 switches, a photon can be delayed by any amount of time between 0 and 2^S − 1 bins, as depicted in Fig. 3. What I will refer to as standard multiplexing divides the stream of random photons into blocks of length 2^S. If a block contains a photon, it shuffles that photon to the back of (i.e., the last time bin in) the block. If there is more than one photon in the block, then the extra photons are ignored/discarded. The final stream of photons can then be viewed as having a new time bin period of 2^S, with a new probability of a photon in each bin of

\[
p' = 1 - (1-p)^{2^S} . \tag{1}
\]

If the switches were lossless, this would let us readily turn our non-deterministic source into an effectively deterministic one.
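A minimal numerical sketch of Eq. (1) in Python is below, together with an illustrative (and purely assumed) loss adjustment in which each multiplexed photon is taken to traverse all S + 1 switches with transmission η per switch; real networks will differ in detail:

def standard_mux(p, S, eta=1.0):
    # Eq. (1): probability a block of 2**S bins delivers a photon,
    # discounted by an assumed eta**(S + 1) switch transmission.
    return (1.0 - (1.0 - p) ** (2 ** S)) * eta ** (S + 1)

p = 0.05  # assumed heralding probability per time bin
print(" S   lossless    eta = 0.98 per switch")
for S in range(1, 9):
    print(f"{S:2d}   {standard_mux(p, S):8.4f}    {standard_mux(p, S, 0.98):8.4f}")

The lossless column climbs quickly towards 1, while even a modest per-switch loss caps how much determinism deep multiplexing can buy, which is why switch loss dominates the engineering trade-offs here.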

FIG. 3.


Cascading delays of length 1, 2, 4, 8, … allows S + 1 switches to delay a photon by any time between 0 and 2^S − 1 bins. The figure makes it look as if the delays are in fiber, but in reality they would be waveguide “spirals” as in Section VII.

This would not, however, be the end of the need for multiplexing. We need to turn these deterministic single photons into 3-GHZ states or Bell pairs. The methods we use for doing so are probabilistic.33,62,79 As such we need to multiplex again after the probabilistic production of the desired states.

A. Asynchronous computation and relative-time multiplexing

It turns out that large improvements are possible to standard multiplexing. The big-picture realization is that there is no reason the photonic quantum computer need be run perfectly synchronously—an asynchronous operation will do. The whole reason we are trying to reduce the indeterminism in the source is that any particular photon is going to need to arrive at some optical element at the same time as some other photon with which it will interfere. But it certainly need not arrive at the same time as all other photons in the quantum computer. This leads to the concept of relative time multiplexing (RMUX).

Full RMUX is a bit complicated,80 so first consider a simpler “sliding window” method that captures some of the idea. Imagine we want to impinge two photons on a beamsplitter. We have two streams of photons in (known but) random time bins and each photon can be delayed up to 2^S − 1 bins. We consider the first incoming photon from either stream, and ask whether there is a photon from the other stream that is no more than 2^S − 1 bins behind the first. If so then we delay the first photon appropriately to meet it, and repeat. If not we discard the first photon and look to the second incoming photon from either stream. The expected number of “matched photon pairs” can increase significantly because we are not forcing the photons into a particular clock cycle, rather we are only adjusting indeterminism in their relative time separation.

In practice we would use the generalization of the sliding window idea to either four streams of photons to produce Bell pairs79 or to six streams62 to directly produce the 3-GHZs we so desire. We also will likely only put delays on one of the photon streams to reduce the number of switches. But the sliding window technique still discards many potentially useful photons because only one photon per stream within the current window is used. At low values of p, this is not significant, but at higher values we are failing to use many photons. Therefore it makes sense to look at more sophisticated matching algorithms (well studied in graph theory), and these do significantly better again. The simplest comparison for the simple case of two photon streams is shown in Fig. 4. There is a subtlety, however: when the density of photons is high there is the potential for “collision” of photons within the delay network, and this must be accounted for.
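The following Python sketch is a rough Monte Carlo comparison of the two strategies for two streams, in the spirit of Fig. 4 (illustrative assumptions throughout: yield is counted as matched pairs per original time bin, the greedy sliding-window rule above is used rather than an optimal graph matching, and switch loss and delay-line collisions are ignored):

import random

def make_stream(T, p, rng):
    return [t for t in range(T) if rng.random() < p]

def sliding_window_pairs(A, B, window):
    # Greedy match: delay the earlier photon by at most `window` bins.
    i = j = pairs = 0
    while i < len(A) and j < len(B):
        if abs(A[i] - B[j]) <= window:
            pairs += 1
            i += 1
            j += 1
        elif A[i] < B[j]:
            i += 1  # discard the unmatched earlier photon
        else:
            j += 1
    return pairs

def standard_mux_pairs(A, B, block):
    # One photon kept per block per stream; a pair forms only if both streams fired.
    return len({t // block for t in A} & {t // block for t in B})

rng = random.Random(1)
T, p = 100_000, 0.2
A, B = make_stream(T, p, rng), make_stream(T, p, rng)
for S in range(1, 8):
    W = 2 ** S
    print(f"S={S}  standard={standard_mux_pairs(A, B, W) / T:.4f}"
          f"  sliding window={sliding_window_pairs(A, B, W - 1) / T:.4f}")

Even this crude version shows the qualitative behavior described in the caption of Fig. 4: the relative-time yield keeps rising with window size, whereas the standard-multiplexing yield (per original time bin) peaks and then falls because most photons in a large block go unused.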

FIG. 4.


Comparison between relative time (purple) and standard (blue) multiplexing for two streams of photons, each with 20% probability of a photon per pulse, where photons from one stream are delayed in order to “partner up” with photons from another. The yield is the final percentage of pulses that end up with two photons impinging on a beamsplitter. The yield in standard multiplexing eventually drops because only one of the many photons produced in a large standard multiplexer actually gets used.

Where RMUX really comes into its own is when we have the streams of 3-GHZs that we are now fusing to produce our final logical lattice, as overviewed above (details of the fusing configurations are in Ref. 37). Non-computational photons within each unit cell (i.e., the majority), which effectively live only to produce all the bonds of the final logical lattice, need not synchronize with any photon other than the one with which they undergo fusion (there also being photons within the boosted fusion gate to consider). This saves considerably on the switching and delays required. The whole computer operates asynchronously—but it is one of the wonders of quantum entanglement that the ordering of events on different members of an entangled state is irrelevant to the quantum correlations (despite quantum nonlocality as it were), and we can utilize this to great advantage. Moreover, different photons (typically the computational qubits) can be judiciously chosen to end up having to go through many fewer switches than others. The whole procedure is quite complicated and still undergoing considerable optimization.

Here are some of the theoretical questions which remain largely unexplored on this topic:

  1. Are there better delay networks, particularly ones designed to avoid collisions?

  2. What are the best combinations of matching and multiplexing procedures, given we do not need synchronous GHZ states?

  3. How much advantage can we gain from asynchronous operation, taking into account the fact that we cannot have arbitrarily long delays before measuring computational photons?

  4. Not all photons have to go through the same number of switches, so which apportionment of switching asymmetry is best?

B. Dump the pump?

One of the reasons our switch technology needs to be so good is that our photonic qubits are interacting directly with the switch, and qubits are delicate creatures. Interestingly this is not, in fact, necessary. A different way to multiplex heralded photons is to turn off the pump instead.

The idea is easiest explained in a bulk-optics SPDC schematic although this is not, of course, how we would implement it on-chip. Recall the famous two-crystal experiment of Zou, Wang, and Mandel81 where two nonlinear crystals were aligned and pumped in such a way that it was impossible to tell if a detected photon had originated in the first crystal or the second. The idea of dump-the-pump is to cascade such a process, but rather than do some fancy interference experiment with the idler photons as Zou et al. did, we simply use them to herald when a photon has been produced (see Fig. 5). If a photon is detected, then we turn off the pump to all the remaining crystals. We view the experiment of Zou et al. as proving the important point that this process can produce indistinguishable photons.

FIG. 5.


Dump-the-pump: A way to avoid switching of our photonic qubits.

The nice thing about dump-the-pump is that the potentially noisy active element is operating on the pump. Not only is the frequency of the pump very different to that of the photonic qubits, but it is also only necessary to extinguish the pump, not to coherently switch it per se. This opens up quite different switching devices to those currently considered for multiplexing. The disadvantage is that once a photon is produced it needs to pass through all the crystals, a potentially lossy process. The procedure is also “linear depth”: you do not get the compactness of a logarithmic depth switching network as in Fig. 3.

At a high level, one can think of dump-the-pump as effectively taking a long nonlinear crystal and slicing it, so that we remove the uncertainty as to where exactly the photon was produced. From one strongly pumped SPDC source, we could at most hope for a probability of single photon emission around 1/5. I suspect dump-the-pump could be most useful to raise this probability to around 2/3–3/4 (requiring 5 or 6 cascaded downconversions), at which point multiplexing the single photons as discussed above becomes highly efficient and possibly the better way to progress.
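A quick sanity check of those cascade numbers (a trivial calculation assuming each crystal independently yields a heralded photon with probability 0.2, the pump is extinguished after the first success, and transmission losses through the later crystals are ignored):

p_single = 0.2
for n_crystals in (5, 6):
    print(n_crystals, "cascaded crystals:", 1 - (1 - p_single) ** n_crystals)
# 5 -> ~0.67, 6 -> ~0.74, consistent with the 2/3-3/4 range quoted above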

Here are some of the theoretical questions which remain largely unexplored on this topic:

  1. SPDC can be used to produce heralded Bell pairs and GHZ states not just single photons; is there an advantage to using dump-the-pump (or regular multiplexing for that matter) on those setups?

  2. How efficiently and how fast can we extinguish a pump? How do we optimize the likely combination of dump-the-pump and RMUX, given estimates of switch performance parameters attainable in the near future?

  3. We tend to focus on single photon encodings of our quantum information. But good number-resolving detectors mean we can have heralded sources of higher number Fock states, possibly with higher efficiency than single photons (imagine dump-the-pump without switching off the pump). Two such modes containing N photons in total are equivalent to a spin-N/2 particle (an (N + 1)-dimensional system), which can be easily evolved under the corresponding representation of SU(2). Little is known about the computational power of such systems; is “photonic universality” easily achieved in such encodings?

IX. LOSS TOLERANCE

The most important error mechanism to affect photonic quantum computing is photon loss, which can arise at any point during the photon’s journey. Unlike stochastic errors, however, a loss error is known to have occurred when a particular detector fails to register a photon. Crucially, by making use of variations of the type-II fusion gates only,38 it is possible to design the architecture to only make use of events that trigger off photon clicks. Depending on the details, these clicks need not even come from number-resolving detectors.

Two mechanisms for dealing with losses have been mentioned so far: punching out lost qubits by measurement and logical renormalization to super-stabilizers. Both are fortunate byproducts of some unrelated architectural flexibility, but in reality we should aim to build in loss tolerance explicitly. Understanding how to do this within a ballistic architecture such as the one discussed here is ongoing work. Although I cannot yet provide a mechanism for very high loss tolerance in this particular architecture, in this section I want to overview why I am optimistic about loss in general being an extremely benign error mechanism. In fact I will argue that in the absence of stochastic noise anything less than 100% loss is in principle tolerable!

It was shown in Refs. 82 and 83 that there exist cluster states (“tree codes”) for which the overall loss probability per photon can be as high as 50%. There are several disadvantages to this scheme. It requires quantum memory (itself lossy) in order to build the trees using probabilistic gates. Also, the trees themselves involve very large numbers of qubits when the loss rate is high. To some extent these disadvantages are reduced by magic state injection methods: the tree codes were designed to work in the presence of both high loss and arbitrary non-Clifford measurements, but with magic state injection techniques2 the measurements need only be Pauli-X and Z basis measurements, and the whole tree structure is designed to allow inference of these outcomes counterfactually when a photon is lost. Thus the restriction to X, Z measurements can reduce the size of the trees required significantly. It forms the basis of a recent quantum repeater proposal.84

If we want to protect against loss there is a different method, particularly suited to magic state type protocols. For the basic idea, consider teleporting an arbitrary state |ψ⟩ along a linear cluster state “wire” as in Fig. 6(a) via X measurements. Losing even one qubit causes the propagation to fail. However if, as in Fig. 6(b), we replace every qubit of the original wire by L qubits, completely connected to its neighbors as shown, then as long as at least one qubit per column survives, the teleportation will succeed.

FIG. 6.


Teleportation along a linear cluster state (a) and its loss tolerant “crazy graph” encoding (b) where each qubit of the linear cluster is replaced by a column of L qubits.

For any loss probability per qubit ϵ < 1 and wire of length N, we can choose “polynomially large” L such that (1 − ϵ^L)^N ≈ 1. So in principle we can tolerate 100% loss for this process! Of course as L increases the number of bonds in the graph (that we call the “crazy graph”) also increases (polynomially). Therefore unless we are able to perform extremely low stochastic error gates to build this cluster state (as we do anticipate in photonics) we will run into problems. To some extent, this overstates the problem however. First, any Y or Z Pauli error on a qubit will flip the value of the X-measurement outcome on that qubit. However, in the absence of noise all the qubits in the column would register the same value outcome for the X-measurement. Thus a simple majority vote amongst those qubits in a column which are not lost will reveal what the correct measurement outcome should be. Second, building a highly connected graph does not automatically involve using lots of quantum gates; tricks to do with graph operations like local complementation can be employed. An example of a building block for the crazy graph, built by measurement on a low vertex degree graph, is shown in Fig. 7.
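A minimal numerical sketch of that scaling in Python, using a simple union bound (a column fails only if all L of its photons are lost) and an assumed illustrative target failure probability δ for the whole wire:

import math

def column_height(eps, N, delta=1e-3):
    # Smallest L with N * eps**L <= delta, so that (1 - eps**L)**N >~ 1 - delta.
    return math.ceil(math.log(N / delta) / math.log(1.0 / eps))

for eps in (0.1, 0.5, 0.9):
    for N in (10**3, 10**6):
        L = column_height(eps, N)
        print(f"loss={eps:.1f}  N={N:>7}  L={L:4d}  success ~ {(1 - eps ** L) ** N:.4f}")

Even at 50% loss per photon, columns of a few tens of qubits suffice for million-step wires under this bound; the required L grows only slowly (logarithmically here) with N, though it of course blows up as the loss approaches 100%.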

FIG. 7.


Highly connected graphs do not necessarily involve as many CZ gates as it may at first seem. In the left-hand graph each qubit has bounded degree. Measuring the qubits labelled X in the X-eigenbasis and applying a unitary Hadamard transformation to those labelled H result in the right-hand piece of the crazy graph. This generalizes—increasing the number of qubits in the rings on the left increases the number of qubits in each column on the right.

The loss tolerance for the wire can be extended to any Clifford gate circuit we want to perform via cluster state methods. We imagine building up a crazy graph cluster state “on the fly” by attaching in chunks of the pre-built cluster state just ahead of where the measurement based computation currently has reached. Hadamard gates are performed by simply changing whether the number of qubits in the underlying linear cluster is odd or even. An S-gate can be implemented by attaching on a graph state pre-prepared as illustrated in Fig. 8(b). Note that the central qubit measured in the Y-basis is not loss protected; however, this piece of the graph state will have been prepared in advance, and so cases where that qubit was missing will have been post-selected out. (Or, more naturally, we will have just built the graph that results when the Y-measurement is performed.)

FIG. 8.


A piece of the cluster state that will implement the Clifford S gate. The central Pauli-Y measurement would be attempted before attachment to the computational cluster state, and the piece only retained if the qubit was not lost.

Similar pieces of the cluster state can be built to do CZ gates between qubits and so on.

Here are some of the theoretical questions which remain largely unexplored on this topic:

  1. Is there a way to integrate highly loss tolerant code ideas into a ballistic architecture, without using quantum memory or extremely long delays?

  2. Crazy graph is tolerant to stochastic Pauli noise once it has been built, but are there mechanisms for creating the graph that do not amplify such stochastic noise too drastically?

  3. Can a crazy graph improve quantum repeaters, along the lines exploited for tree codes?

X. CONCLUSIONS

Since the seminal paper by Knill, Laflamme, and Milburn,85 huge progress has been made towards an all-photonic quantum computer. On the experimental side, we now have integrated photonics, including the detectors of our dreams ten years ago. On the theory side, gone is the circuit model and its gates that approached unit efficiency only when using arbitrarily large numbers of photons, gone are the giant ancillary resource states requiring huge quantum memory to prepare offline, gone is the need for highly nonlocal operations, etc. In short, gone are the eye-watering overheads which would forever have destined linear optical quantum computing to the fate of a mere intellectual curiosity, irrespective of the progress on the experimental front. Despite all this progress, it is clear that there are still a large number of potentially highly significant simplifications to the architecture for theorists to tackle.

I have focussed here on a purely photonic implementation. However the architecture has obvious potential for hybrid light/matter approaches, as well as networks of PICs with lossy interconnects,86,87 and these should be investigated further.

Finally, given that many of the performance characteristics required can also already be met in optical fiber technology, and switches there can be significantly of lower loss,88 there is some temptation to imagine an all-fiber photonic quantum computer being built in a disused aircraft hangar, with huge spools of fiber delays hanging off walls and ceilings—a quantum ENIAC as it were. As with the original ENIAC, I suspect this possibility will only be explored seriously by the military research establishment. If so, I guess I will never know. Apparently the US government thinks I am not to be trusted.
