On the capacities of bipartite Hamiltonians and unitary gates

We consider interactions as bidirectional channels. We investigate the capacities for interaction Hamiltonians and nonlocal unitary gates to generate entanglement and transmit classical information. We give analytic expressions for the entanglement generating capacity and entanglement-assisted one-way classical communication capacity of interactions, and show that these quantities are additive, so that the asymptotic capacities equal the corresponding 1-shot capacities. We give general bounds on other capacities, discuss some examples, and conclude with some open questions.

communication are strictly incomparable resources: no one of them can be generated even from an infinite supply of the other two. Thus the capacity of a given interaction to create each of the three resources cannot exceed the amount used to perform the interaction. For example, the cnot can be simulated using 1 ebit (footnote 1), 1 forward classical bit, and 1 backward classical bit [10], so that the entanglement capacity and both forward and backward classical capacities are upper bounded by 1. Second, the efficiency for one interaction to simulate another provides bounds on the relative efficiency for the interactions to generate resources. For example, any capacity of swap is at most 3 times that of cnot since the swap can be written as 3 cnots.
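The statement that the swap can be written as 3 cnots can be checked on computational basis states. The sketch below is only an illustration; the helper names `cnot` and `compose` are hypothetical and not from the paper. Because both circuits permute the basis states without introducing phases, agreement on all four basis labels is enough to establish equality of the two gates.

```python
def cnot(control, target):
    """Return a function applying a cnot to a 2-bit basis label (a, b).

    `control` and `target` are qubit positions (0 for Alice, 1 for Bob).
    """
    def gate(bits):
        bits = list(bits)
        bits[target] ^= bits[control]  # flip the target iff the control is 1
        return tuple(bits)
    return gate


def compose(*gates):
    """Apply the given gates in order, left to right."""
    def circuit(bits):
        for g in gates:
            bits = g(bits)
        return bits
    return circuit


# Three cnots with alternating control: A->B, B->A, A->B.
swap_from_cnots = compose(cnot(0, 1), cnot(1, 0), cnot(0, 1))

basis = [(a, b) for a in (0, 1) for b in (0, 1)]
swap_table = {bits: swap_from_cnots(bits) for bits in basis}

for bits, image in swap_table.items():
    print(bits, "->", image)
```

Each label (a, b) is mapped to (b, a), which is the action of swap.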
In this paper, we focus on tasks (2) and (3), and investigate the capacities of a unitary interaction to generate entanglement and perform classical communication. The unitary interaction can be a nonlocal Hamiltonian or gate. We are primarily concerned with the asymptotic limit, when many uses of the gate (or a long duration of the Hamiltonian) are given. We consider an interaction on two d-dimensional systems, allow unlimited local operations, local ancillas of arbitrarily large dimensions, and arbitrary input state. We give expressions for the entanglement generating capacity and the entanglement-assisted one-way classical capacity [11,12,13,14] of an interaction. We show that these quantities are additive in the sense that the amount of entanglement or classical communication generated by n uses of a gate is n times the amount generated by one use.

Motivation II: interactions as bidirectional channels
The capacities of generating entanglement and communication are well studied in the context of a noiseless or noisy channel connecting a sender (Alice) to a receiver (Bob). In this usual model of a quantum channel, a quantum system is physically transported from Alice to Bob, with possible changes (noise) caused by a quantum operation [15] (i.e. a trace-preserving, completely positive, or TCP, linear map).
This model of a channel is unidirectional: Bob cannot send information to Alice. However, such unidirectional interactions are a special case of quantum interactions, and in general, a quantum system cannot affect another without being changed itself. For example, the cnot (defined in the computational basis) operates in the reverse direction in the conjugate basis, and transmits an equivalent amount of information in either direction when used in conjunction with other local gates.
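The reversal of the cnot in the conjugate basis can be verified numerically: conjugating a cnot controlled by Alice with a Hadamard on each qubit gives a cnot controlled by Bob. The following is a minimal sketch in plain Python; the matrix helpers and variable names are illustrative assumptions, and basis states |ab> are indexed as 2a + b with Alice holding the first qubit.

```python
import math


def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def kron(a, b):
    """Kronecker (tensor) product of two matrices given as nested lists."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


s = 1.0 / math.sqrt(2.0)
H = [[s, s], [s, -s]]   # Hadamard: maps the computational to the conjugate basis
HH = kron(H, H)         # Hadamard on both qubits

# cnot with Alice as control: |a, b> -> |a, a XOR b>
CNOT_A_CONTROLS = [[1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [0, 0, 0, 1],
                   [0, 0, 1, 0]]

# cnot with Bob as control: |a, b> -> |a XOR b, b>
CNOT_B_CONTROLS = [[1, 0, 0, 0],
                   [0, 0, 0, 1],
                   [0, 0, 1, 0],
                   [0, 1, 0, 0]]

conjugated = matmul(matmul(HH, CNOT_A_CONTROLS), HH)
max_deviation = max(abs(conjugated[i][j] - CNOT_B_CONTROLS[i][j])
                    for i in range(4) for j in range(4))
print("largest entrywise deviation:", max_deviation)
```

The deviation is zero up to floating-point rounding, so the same gate lets either party act as the control once local basis changes are allowed.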
In view of this, we generalize the usual model of a quantum channel to take into account the bidirectional nature of a quantum interaction. We define a "bidirectional channel" as a bipartite quantum operation.
Alice and Bob each input a state to the "bidirectional channel" and receive an output. This work can be viewed as studying the entanglement and classical capacities of bidirectional channels, restricted to the unitary case.
Throughout this paper a protocol means a procedure that uses a nonlocal gate one or more times, or a nonlocal Hamiltonian for some total amount of time, possibly also consuming and/or producing various amounts of other standard resources, such as entanglement and classical communication in each direction.
We always allow unlimited local operations, and we are interested in a protocol's net yield (production minus consumption) of standard resources per gate use or unit interaction time. The protocol can be written as a quantum circuit, and the net effect can be described as a bipartite quantum operation, with bipartite input and output. We call this quantum operation the protocol as well. In general there is a tradeoff among the yields of various resources when the protocol is varied. For example, cnot can transmit a classical bit in the forward or backward direction, but not both. As back communication is generic in a bidirectional channel, a protocol using it is generically interactive.
Footnote 1: The unit ebit is defined to be the amount of entanglement in the EPR state $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$.
In the next two subsections, we provide more detailed introductions to the two tasks studied in this paper and discuss closely related work.

Entanglement generating capacity of bidirectional channels
In Ref. [16], the quantum communication capacity of a channel is shown to be equal to its capacity for generating pure entanglement; a greater quantum capacity typically results if two-way classical communication is allowed. Likewise, a bidirectional channel (bipartite quantum operation) can be used to generate entanglement. Simple examples are considered in Refs. [1,2,3,4]. Reference [6] considers the average amount of entanglement created by one use of a nonlocal operation on a distribution of product states. Reference [7] classifies the type of entanglement (bound or distillable) that can be created from product states. Reference [8] considers the optimal 1-shot rate of creating entanglement using an arbitrary 2-qubit Hamiltonian on possibly entangled pure input states without local ancillas. Reference [9] considers the optimal amount of entanglement created by one use of an arbitrary 2-qubit gate on pure product input states without ancillas. References [8,9] also exhibit examples in which local ancillas increase the amount of entanglement created.
In this paper, we follow the philosophy of Ref. [16] and investigate the asymptotic entanglement generating capacity of a bidirectional channel acting on two d-dimensional systems. In contrast to previous work [6,7,8,9], we do not restrict ourselves to qubit systems, we allow arbitrary local ancillas and input states (including entangled or mixed states), and we consider the most general asymptotic protocols. We also consider the effect of many auxiliary resources, including classical communication. We restrict our attention to unitary bidirectional channels. We derive the expression for the capacity, show that it is additive, and discuss the optimal protocol.
Leifer, Henderson, and Linden [17] have independently shown, by similar arguments, that the asymptotic entanglement generating capacity on pure input states is an optimization over a 1-shot expression. They also investigate the capacities for many 2-qubit gates with low dimension ancillas both analytically and numerically.

Classical communication capacities of bidirectional channels
The classical capacity of an ordinary (unidirectional) quantum channel is in general affected by the availability of auxiliary resources, such as entanglement [18] and back communication. For a general noisy quantum channel, the capacity without auxiliary resources is found in Refs. [11,12], and that with unlimited supply of pure entanglement is found in Refs. [13,14]. The capacity for a noiseless quantum channel with unlimited supply of a certain noisy entangled state is found in Refs. [19,20,21,22].
In treating bidirectional channels, we again follow the philosophy of Refs. [11,12,13,14] and consider various asymptotic classical capacities of unitary bidirectional channels of arbitrary dimensions. We allow unlimited local resources, including free instantaneous local operations, and the freedom for Alice and Bob to attach and remove local ancillas. Shared randomness is also given as a resource. Our philosophy is also similar to Shannon's study of the classical capacities of classical two-way communication channels [23].
A new ingredient in the case of bidirectional channels is the simultaneous forward and backward communication, resulting in a pair of achievable rates. One can define many classical capacities other than the forward and the backward capacities. Generally, there is a tradeoff between the forward and backward rates.
Our long term goals are to obtain expressions for these capacities, understand the tradeoff between forward and backward communication, and relate the quantities to other capacities such as the entanglement generating capacity. In this paper, we define various asymptotic capacities of bidirectional channels. We obtain an expression for the one-way (forward or backward) entanglement-assisted classical capacity for any arbitrary nonlocal gate or Hamiltonian, and the protocol achieving it. The asymptotic capacity is achieved by a 1-shot expression, as an optimization over input ensembles for one use of the gate.
We remark that other independent investigations on optimal methods to perform classical communication in low dimensions without entanglement assistance are being conducted [24,25,26].

Structure and assumptions of the paper
In the next section, we discuss in detail the problem of entanglement generation, and derive the ex- [...]
Local ancillas of arbitrarily large but finite dimensions and unlimited local operations.
We do not consider ancillas of infinite dimensions and do not know if they can be more useful.
Though we have motivated the discussion with both Hamiltonians and gates, we now argue that it is sufficient to focus on gates only. This is because Hamiltonian capacities are simply gate capacities in the limit of infinitesimal gates, so that any Hamiltonian capacity can be obtained from the corresponding gate capacity. A protocol using a Hamiltonian is similar to one using a gate, with additional freedom on how long each free Hamiltonian evolution can last before being interspersed with local operations.
However, different durations of evolution are simply concatenations of different numbers of infinitesimal ones. Thus any Hamiltonian capacity $G_H$ can be expressed in terms of the corresponding gate capacity $G_U$:
$$G_H = \lim_{s \to 0} \frac{1}{s}\, G_{e^{-iHs}}.$$

Entanglement capacity of bidirectional channels

Main idea
Before a formal treatment of the entanglement capacity, we first illustrate our central idea with the following example. Let $E_e$ be the entropy of entanglement [33]. Suppose the goal is to increase $E_e$ as much as possible. Different uses of $U$ can be applied sequentially or in parallel, and be interspersed with LOCC (local operations and classical communication). We allow an arbitrary pure input state with ancillas, possibly entangled over different uses of $U$. What is the optimal strategy? The answer turns out to be very simple. Consider the quantity
$$\Delta E_U = \sup_{|\psi\rangle_{ABA'B'}} \left[ E_e(U|\psi\rangle_{ABA'B'}) - E_e(|\psi\rangle_{ABA'B'}) \right], \qquad (1)$$
which represents the entanglement generated by optimizing the input state for one use of $U$. Let $|\psi^*\rangle_{ABA'B'}$ attain the supremum. Then applying individual uses of $U$ to copies of $|\psi^*\rangle_{ABA'B'}$ is asymptotically optimal. This is because the total increase in $E_e$ in any asymptotic protocol is at most the sum of the increases due to each use of $U$, each of which is no greater than $\Delta E_U$.
In the following, we will develop this idea rigorously in the most general setting. We consider mixed input states and different entanglement measures, and analyze the roles of various auxiliary resources.
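To make the single-use quantity concrete, the sketch below evaluates the change in entropy of entanglement when a cnot acts on one particular product input, |+>|0>, without ancillas. This is an illustration only: it gives a lower bound on the supremum over inputs rather than the supremum itself, and the function names and the restriction to two qubits are assumptions made for the example.

```python
import math


def entropy_of_entanglement(c):
    """Entropy of entanglement (in ebits) of a two-qubit pure state.

    c = [c00, c01, c10, c11] are the amplitudes of |ab>, with Alice holding
    the first qubit. Alice's reduced state has unit trace and determinant
    |c00*c11 - c01*c10|^2, which fixes its two eigenvalues.
    """
    det = abs(c[0] * c[3] - c[1] * c[2]) ** 2
    gap = math.sqrt(max(0.0, 1.0 - 4.0 * det))
    eigenvalues = [(1.0 + gap) / 2.0, (1.0 - gap) / 2.0]
    return -sum(x * math.log2(x) for x in eigenvalues if x > 1e-15)


def apply_cnot(c):
    """cnot with Alice's qubit as control: |a, b> -> |a, a XOR b>."""
    out = [0.0] * 4
    for a in (0, 1):
        for b in (0, 1):
            out[2 * a + (a ^ b)] = c[2 * a + b]
    return out


s = 1.0 / math.sqrt(2.0)
product_input = [s, 0.0, s, 0.0]      # |+>|0>, a product state
output = apply_cnot(product_input)    # (|00> + |11>)/sqrt(2)

gain = entropy_of_entanglement(output) - entropy_of_entanglement(product_input)
print("entanglement gained by one use of cnot:", gain, "ebit")
```

The gain of 1 ebit matches the upper bound of 1 on the entanglement capacity of the cnot quoted in the introduction, so for this gate a product input already saturates the bound.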

Definitions and summary of results
The entanglement capacity of a gate U can only be defined when the entanglement measures for the input and output and the available auxiliary resources are specified.
Traditionally, entanglement is a qualitative phenomenon. The theory of quantifying entanglement is not complete, though much progress has been made [27] (Refs. [28,29] give informative reviews). Based on the transformation properties of entangled states, measures of entanglement are defined which are very different in the asymptotic and nonasymptotic regimes. Different measures in the same regime can also be inequivalent.
The entanglement generated by a protocol $P$ on an input $\rho$ is intuitively
$$E_{\rm out}(P(\rho)) - E_{\rm in}(\rho),$$
where $E_{\rm in}$, $E_{\rm out}$ are the input and output entanglement measures specified in the problem. We can now define the entanglement capacity of $U$: The $t$-shot entanglement capacity of $U$ is the maximum amount of entanglement generated per use of $U$ by any protocol $P_t$ that uses $U$ $t$ times, auxiliary resources labelled by $r$, and local resources specified in Sec. 1.5 (ancillas of arbitrary but finite dimensions and unlimited local operations). We consider two possible $t$-shot capacities, depending on the allowed input state:
1. when the input is restricted to be a product state (without loss of generality $|00\rangle$, since Alice and Bob can locally transform any product state into any other product state):
$$E^{(t,\emptyset,r)}_{{\rm in}\to{\rm out},U} = \frac{1}{t} \sup_{P_t} \left[ E_{\rm out}(P_t(|00\rangle\langle 00|)) - E_{\rm in}(|00\rangle\langle 00|) \right],$$
where the superscript $\emptyset$ denotes "starting from nothing", and
2. when there is no restriction on the input state:
$$E^{(t,*,r)}_{{\rm in}\to{\rm out},U} = \frac{1}{t} \sup_{P_t} \sup_{\rho} \left[ E_{\rm out}(P_t(\rho)) - E_{\rm in}(\rho) \right],$$
where the superscript $*$ denotes an optimization over all possible input states.
The corresponding asymptotic capacities are:
$$E^{(\emptyset,r)}_{{\rm in}\to{\rm out},U} = \lim_{t\to\infty} E^{(t,\emptyset,r)}_{{\rm in}\to{\rm out},U}, \qquad E^{(*,r)}_{{\rm in}\to{\rm out},U} = \lim_{t\to\infty} E^{(t,*,r)}_{{\rm in}\to{\rm out},U}.$$
In our notation, we assume that an entanglement measure is written as $E_x$ where the subscript $x$ labels the measure. For example, in $E_{\rm in}$, $E_{\rm out}$, and the notation for the above capacities, the "in" and "out" are placeholders for the entanglement measures being referred to. Whenever $E_{\rm in} = E_{\rm out} = E_x$ we simplify the notation of the capacity to $E^{(\cdots)}_{x,U}$. Finally, an arbitrary entanglement measure is written as $E$ without the subscript, and the capacity is written as $E^{(\cdots)}_U$.
A capacity with the superscript $*$ has the operational meaning that a supply of the initial state $\rho$ is available at a price $E_{\rm in}(\rho)$. This is a resource, because the ability to create $\rho$ with an average cost $E_{\rm in}(\rho)$ is generally not guaranteed (unless $E_{\rm in}$ is the entanglement cost [30]). We refer to this as "the resource $*$" throughout the paper. In contrast, no such resource is assumed in the capacities with the superscript $\emptyset$.
Since we are interested in asymptotic capacities, we are primarily concerned with asymptotic measures.
These include the entanglement cost $E_c$ [30] and the distillable entanglement $E_d$ [16]. We also study the entanglement of formation $E_f$ [16], which is closely related to $E_c$. All of these measures coincide with the entropy of entanglement $E_e$ on pure states. As our results apply to more general measures, and may be useful in other contexts, we follow an abstract approach [28,29], which requires more technicalities in our arguments. However, the essence can be made clear by relating to our simplified example, and we leave this step as an exercise to the reader.
The auxiliary resources can be divided into three types according to their quantities. The first type is given in an amount that is negligible or can be recovered at the end of the protocol. For example, sublinear (in the number of uses of U ) amount of resources in the asymptotic case are negligible, and catalytic resources in the 1-shot case are used and regenerated (for example, see Ref. [31]). However, we need not consider these resources. In the 1-shot case, catalytic resources are a subset of the resource * and need not be treated separately. In the asymptotic case, sublinear amount of any resource can be produced at a vanishing average cost and does not affect the asymptotic capacity. This is because any nonlocal gate has nonzero capacity to create pure entanglement and to perform classical communication (see Sec. 6) from which any other resource can be produced. The second type of resources are at least linear in the number of uses of U . To consider these resources is an important open question, but it is out of scope of the present paper. The third type of resources are unlimited and free. In the context of generating entanglement, we focus on the auxiliary resource of unlimited 2-way classical communication, labelled by "cc".
Our results can be summarized in terms of the entanglement capacities just defined.
Thus given the resource $*$, the 1-shot capacity is no less than the asymptotic capacity. We give a sufficient condition for additivity, E [...]
• In Sec. 2.5 we consider the maximum gain of pure entanglement. This is given by $E^{(t,*,cc)}_{c\to d,U}$, and we show that it is equal to $E^{(t,*,cc)}_{c,U}$. Thus, the optimal protocol in Sec. 2.4 applies, without the need of resources $*$ or cc.

Expression for $E^{(*,cc)}_U$ when $E_{\rm in} = E_{\rm out} = E$
Throughout this subsection, $E_{\rm in} = E_{\rm out} = E$ and both resources $*$, cc are available.
Let $|\Phi_n\rangle = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} |i\rangle_A |i\rangle_B$ be the $n \times n$ maximally entangled state shared between Alice and Bob.
Unless otherwise stated, $E$ satisfies the following assumptions, but is otherwise arbitrary:
A1. $E = 0$ for product states.
A2. $E$ is invariant under local unitaries.
A1-3 are basic axioms for entanglement measures, while A4 is needed to define the "net" amount of entanglement generated by a protocol. Generally, we do not assume $E$ is normalized ($\forall n,\ E(|\Phi_n\rangle) = \log n$) and will state the assumption explicitly when it is needed.
We first state a lemma based on the following simple observation [3,4]. Alice and Bob can implement $U$ if Alice teleports her input to Bob, who applies $U$ locally in his own laboratory and teleports her output back to her. This consumes two copies of $|\Phi_d\rangle$ and $2 \log d$ bits of classical communication in each direction.
Proof: For any protocol with $t$ uses of $U$ and LOCC, modify it by replacing each use of $U$ with its double teleportation implementation. Let $\rho_{\rm in}$ and $\rho_{\rm out}$ be the input and output of the original protocol.
The modified protocol uses only LOCC and has input $\rho_{\rm in} \otimes (|\Phi_d\rangle\langle\Phi_d|)^{\otimes 2t}$ and output $\rho_{\rm out}$. Applying A3-4 to the modified protocol, which cannot increase entanglement, $2tE(|\Phi_d\rangle) \ge E(\rho_{\rm out}) - E(\rho_{\rm in})$. ✷
We now proceed to prove Theorem 1, which says that the asymptotic capacity is equal to the 1-shot capacity given the resources $*$ and cc. This is done by proving two separate inequalities, each of which is referred to as one half of the Theorem.
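Proof: Since LOCC operations cannot increase entanglement, the best 1-use protocol has the form:
[Fig. (6): circuit for the 1-use protocol, in which U is applied once to the input state.]
The only optimization is over the initial state, and thus
$$E^{(1,*,cc)}_U = \sup_{\rho_{AA'BB'}} \left[ E(U_{AB}\, \rho_{AA'BB'}\, U_{AB}^\dagger) - E(\rho_{AA'BB'}) \right]. \qquad (7)$$
In Eq. (7) and throughout the paper, the subscripts of an operator denote the systems being acted on. As an aside, E [...]
We say that $E$ is weakly additive if $E(\rho^{\otimes n}) = nE(\rho)$ for all $n$, and strongly additive if $E(\rho_1 \otimes \rho_2) = E(\rho_1) + E(\rho_2)$. Weak and strong subadditivity and superadditivity are defined by replacing the equality in the corresponding additivity definitions by the inequalities $\le$ and $\ge$ respectively.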
Theorem 1 (2nd half): If $E$ is weakly additive or subadditive on the optimal input in Eq. (7), and is weakly additive or superadditive on the optimal output, E [...]
Proof: Consider the $t$-use protocol that repeats the 1-use protocol in Fig. (6) $t$ times, each on a separate copy of the optimal input. The entanglement generated is at least $tE^{(1,*,cc)}_U$. [...] and $E_e$ as special cases). It is an open problem whether $E_f$ is weakly additive; however, we will prove that the second half of Theorem 1 still holds for $E = E_f$.
In general, we say that "Theorem 1 holds" whenever both halves of Theorem 1 hold. Eq. (7)  . We discuss how to obtain this supply of optimal input in Sec. 2.4.

In the expression for $E^{(1,*,cc)}_U$ in Eq. (7), the supremum is taken over finite but arbitrarily large dimensional ancillas $A'B'$. This can also be viewed as a limiting quantity as the ancilla dimensions increase.
Lemma 2: Suppose we restrict to $n$-dimensional $A'$, $B'$ in Eq. (7), and denote the subsequent maximization by $e_n$. Then, $\lim_{n\to\infty} e_n = E^{(1,*,cc)}_U$.
Proof: Let $\rho_{AA'BB'}$ be a state attaining the supremum in Eq. (7) to within $\epsilon$. We omit the system label when it is $AA'BB'$ in Lemmas 3-4. Let $\rho = \sum_i \lambda_i |\psi_i\rangle\langle\psi_i|$ be an optimal decomposition so that $E_f(\rho) = \sum_i \lambda_i E_f(|\psi_i\rangle)$. Then, [...] The second inequality is obtained by applying convexity of $E_f$ to the first term, and the definition of the optimal decomposition in the second term. Thus, E [...]
Proof: Let $\rho$ attain the supremum in Eq. (7) up to $\epsilon/2$. That is, [...] For any [...] [30]. Substitute this into Eq. (9) with $\delta = \epsilon/2$, [...] Using weak additivity of $E_c$ and the fact $E_f \ge E_c$, the first term in the RHS of Eq. (10) can be rewritten: [...] But the expression in the bracket represents the entanglement of formation generated by a certain $m$-use protocol, and is no greater than $mE$ [...] by the 1st half of Theorem 1. Together with Lemma 3, [...] Finally, we replace $E_f$ by $E_c$ on the RHS since they coincide on pure states, [...] $-\,\epsilon$ can be attained on a pure state.
Convexity of $E_f$ and $E_c$ is required in our proofs of Lemmas 3 and 4, but unlike $E_f$ and $E_c$, $E_d$ may not be convex [32].
Note that Theorem 1 is concerned with weak additivity of the entanglement capacity of bidirectional channels, i.e., the protocol uses only one type of nonlocal gate. We can consider strong additivity when different types of nonlocal gates are available:
Theorem 1S (1st half): For a protocol with $n_i$ uses of the gate $U_i$, the maximum amount of entanglement generated (given $*$, cc) is no more than $\sum_i n_i E^{(1,*,cc)}_{U_i}$.
Theorem 1S (2nd half): If $E$ is strongly additive or subadditive on the optimal input and strongly additive or superadditive on the optimal output for each $U_i$, then repeating $n_i$ times the 1-shot protocol for $U_i$ for each $i$ generates an amount of entanglement no less than $\sum_i n_i E^{(1,*,cc)}_{U_i}$.
In particular, Theorem 1S holds for $E = E_f$ and $E = E_c$, and the entanglement capacities are strongly additive given $*$ and cc.

Auxiliary resources are unnecessary when $E_{\rm in} = E_{\rm out} = E_{c,f}$
In this subsection, we show that the resource $*$ is unnecessary in the optimal asymptotic protocol in the previous subsection (repeating the optimal 1-shot protocol) for the specific measures $E_{\rm in} = E_{\rm out} = E_c$ or $E_f$.
By Lemma 4, when $E = E_c$, the optimal input and output of the 1-shot protocol are pure. The amount [...] $-\,\epsilon$ can be generated by adapting an argument in [8]. The protocol first creates $m$ copies of the pure optimal input $|\psi\rangle^{\otimes m}_{AA'BB'}$ (inefficiently), and then repeats the cycle: (1) apply $U^{\otimes m}$, (2) "concentrate" [33] the outputs to EPR pairs and (3) [...] inefficiently is also negligible when the cycle is repeated sufficiently many times. The same argument [...]
The asymptotic entanglement capacity for $E = E_c$ or $E_f$ under the most general setting in Sec. 2.2 can be generated with no initial entanglement and without $*$ or cc. The core part of the optimal protocol is basically 1-shot: a tensor product of the optimal 1-shot protocol. The only collective steps, entanglement concentration and dilution, are auxiliary.
Since no initial entanglement is required for the optimal asymptotic generation of $E_f$ and $E_c$, one can [...] [15,38], denoted as $\mathrm{Sch}(\cdot)$, is the unique number of terms in the above "Schmidt decomposition." The $\lambda_i$ are called the Schmidt coefficients. We will repeatedly use the fact that the Schmidt number of a state is nonincreasing under LOCC and that $\mathrm{Sch}(U|\psi\rangle) \le \mathrm{Sch}(U)\,\mathrm{Sch}(|\psi\rangle)$ (see Ch. 6.4.2 of [38]).
[...] can be achieved without initial entanglement, the initial state has Schmidt number 1, and the final state of a $t$-use protocol has Schmidt number $\le \mathrm{Sch}(U)^t$. Hence, the output entropy of entanglement is $\le t \log \mathrm{Sch}(U)$, and E [...]
Proof: This is the entanglement generated when $|\psi$ [...]
Interested readers can repeat the above analysis for other measures. It holds for $E = E_d$ if the optimal input $\rho$ is pure or if $\rho$ satisfies $E_c(\rho) = E_d(\rho)$ (by replacing concentration with distillation of the optimal output and replenishing the optimal input $\rho$ using $E_d(\rho)$ EPR pairs per copy of $\rho$ and classical communication (see Appendix A)).
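As a worked illustration of the Schmidt-number argument above (at most $\log \mathrm{Sch}(U)$ of entropy of entanglement per use when starting from a product state), the following operator-Schmidt decompositions for two-qubit gates may help. They are standard identities rather than results of this paper; logarithms are taken base 2 so that the bound is expressed in ebits.

```latex
\begin{align*}
\mathrm{CNOT} &= |0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes X
  &&\Rightarrow\ \mathrm{Sch}(\mathrm{CNOT}) = 2,
  \quad \log_2 \mathrm{Sch}(\mathrm{CNOT}) = 1 \text{ ebit per use},\\
\mathrm{SWAP} &= \tfrac{1}{2}\left( I \otimes I + X \otimes X + Y \otimes Y + Z \otimes Z \right)
  &&\Rightarrow\ \mathrm{Sch}(\mathrm{SWAP}) = 4,
  \quad \log_2 \mathrm{Sch}(\mathrm{SWAP}) = 2 \text{ ebits per use}.
\end{align*}
```

In each decomposition the operators on either side are mutually orthogonal under the Hilbert-Schmidt inner product, so the number of terms is the Schmidt number. Both bounds can be met: the cnot maps $|+\rangle|0\rangle$ to an EPR pair, and the swap exchanges halves of two EPR pairs that Alice and Bob each prepared locally with an ancilla, leaving 2 ebits shared between them.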

Different input and output entanglement measures
Each choice of entanglement measures for the input and output can be given an operational meaning.
We consider the important example of creating EPR pairs in this subsection, which requires different entanglement measures for the input and output. Alice and Bob fabricate the possibly mixed optimal input state and distill entanglement from the output. Thus, the appropriate choices for the input and output entanglement measures are the entanglement cost $E_c$ and the distillable entanglement $E_d$. [...] Let $P_t$ denote an optimal $t$-shot protocol and the corresponding quantum operation, and let $\rho$ be the optimal input to within $\epsilon$ (again we omit the system label $AA'BB'$). Then [...] where we have used Corollary 4 in Sec. 2.3 to obtain Eq. (14). This means that the asymptotic capacity to create EPR pairs is [...] and the protocol in Sec. 2.4 is optimal for creating EPR pairs even in the most general setting described in Sec. 2.2.
Furthermore, since the optimal output is pure, and E d , Ec are strongly additive on pure states strong ( * ,cc) c→d,U .
In particular, when $E_{\rm in} = E_{\rm out} = E_f$, or when $E_{\rm in} = E_c$ and $E_{\rm out} = E_c$ or $E_d$, the asymptotic capacities become independent of the availability of $*$ and cc, and they are all equal to $E^{(1,*)}_{e,U} = \Delta E_U$ in Eq. (1). The only capacity mentioned above that is different from $\Delta E_U$ is $E^{(1,\emptyset)}_{{\rm in}\to{\rm out},U}$. We will study these two capacities in Sec. 6 and Sec. 8 in more detail.
As an aside, when [...] $_{c,U}$ for all finite $t$. This is because $tE$ [...]
Footnote 2: Note that $t$ is finite, but we have chosen the asymptotic measures $E_c$ and $E_d$. We are mainly interested in protocols with large $t$, with the understanding that the 1-shot capacity $E^{(1,*)}_{c\to d,U}$ is achieved with collective pre- and post-processing.

Classical capacities of bidirectional channels
If Alice and Bob have access to a nonlocal gate U to couple their systems, then the classical communication capacity of U is the maximum asymptotic number of classical bits that can be reliably transmitted per use of U . Communication can be achieved simultaneously in both directions, with possible tradeoffs.
Free local resources as stated in Sec. 1.5 and shared classical randomness are always allowed.
In the context of classical communication, the most important auxiliary resource is free entanglement.
Communication is called "assisted" when this resource is available and "unassisted" when it is not.
The most general protocol (see Sec. 1.2) with $t$ uses of $U$ can be represented as:
[Fig. (16): circuit for the most general protocol with t uses of U.]
In Fig. (16), [...]
Definition 2: A pair of rates $(R_\rightarrow, R_\leftarrow)$ is said to be achievable by a gate $U$ if it is possible to intersperse $t$ uses of $U$ with local unitaries $A_j \otimes B_j$, such that an $n_1$-bit message $M_1$ from Alice to Bob and an $n_2$-bit message $M_2$ from Bob to Alice are communicated with high fidelity, and [...]
In the above definition, the fidelity $F$ between two states $|\psi\rangle\langle\psi|$ and $\rho$ is given by $\langle\psi|\rho|\psi\rangle$ (this is a simplified expression when one of the states is pure).
We first discuss unassisted capacities, and the assisted capacities are defined in exactly the same way.
Each gate $U$ defines a region of achievable unassisted rate-pairs $(R_\rightarrow, R_\leftarrow)$. The region is convex by using mixed strategies. Furthermore, if $(R_\rightarrow, R_\leftarrow)$ is achievable, so is any $(R'_\rightarrow, R'_\leftarrow)$ where $R'_\rightarrow \le R_\rightarrow$ and $R'_\leftarrow \le R_\leftarrow$. In particular, the boundary of the achievable region never has positive slope (see Fig. (18)). Thus, the forward and backward capacities can always be achieved at the boundary points, and can be defined respectively as
$$C_{\rightarrow,U} = \sup\{R : (R, 0) \text{ is achievable by } U\},$$
$$C_{\leftarrow,U} = \sup\{R : (0, R) \text{ is achievable by } U\}.$$
We can also define various bidirectional capacities, for example, the duplex and the total capacities:
$$C_{\leftrightarrow,U} = \sup\{R : (R, R) \text{ is achievable by } U\},$$
$$C_{+,U} = \sup\{R_\rightarrow + R_\leftarrow : (R_\rightarrow, R_\leftarrow) \text{ is achievable by } U\}.$$
We omit the subscript $U$ when the notation is too cumbersome. The following is a schematic diagram for the achievable region and the definitions of the various capacities. We present all the known properties and intentionally show the features that are not ruled out, such as the asymmetry of the region, and the nonzero curvature of the boundary.
[Fig. (18): schematic diagram of the achievable region of rate pairs and the various capacities.]
In general, little is known about the unassisted achievable region of $(R_\rightarrow, R_\leftarrow)$ besides the convexity and the monotonicity of its boundary. The most perplexing question is perhaps whether the region has reflective symmetry about $R_\rightarrow = R_\leftarrow$, which implies $C_\rightarrow = C_\leftarrow$ and $C_+ = 2C_\leftrightarrow$. Refs. [8,9] show that any two-qubit gate or Hamiltonian is locally equivalent to one with Alice and Bob interchanged, so that the achievable region is indeed symmetric. This implies the conjecture in [3] that the one-shot forward and backward unassisted capacities are equal. In higher dimensions, [40] shows that there are Hamiltonians (and so unitary gates) that are intrinsically asymmetric. However, it remains open whether the achievable rate pairs are symmetric, or more weakly, whether $C_\rightarrow = C_\leftarrow$ or $C_+ = 2C_\leftrightarrow$.
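The relationship between the various capacities and the shape of the achievable region can be explored with a toy calculation. The sketch below assumes that the duplex capacity is the largest R for which (R, R) is achievable and that the total capacity is the largest achievable sum of forward and backward rates. The three rate pairs are purely hypothetical numbers chosen to produce an asymmetric region; they do not describe any actual gate. The region is taken to be the convex hull of the listed pairs together with every pair that is componentwise smaller, reflecting the convexity and monotonicity properties stated above.

```python
from itertools import combinations


def capacities(vertices):
    """Capacities of the region generated by the given rate pairs.

    `vertices` is a list of hypothetical (forward, backward) rate pairs.
    The region is their convex hull plus all componentwise smaller pairs.
    """
    c_forward = max(f for f, _ in vertices)      # (R, 0) is reachable by lowering the backward rate
    c_backward = max(b for _, b in vertices)
    c_total = max(f + b for f, b in vertices)    # a linear objective peaks at a vertex

    # Duplex: maximise min(forward, backward) over the hull. The optimum is
    # either a vertex or a point where a segment between two vertices
    # crosses the diagonal forward = backward.
    c_duplex = max(min(f, b) for f, b in vertices)
    for (f1, b1), (f2, b2) in combinations(vertices, 2):
        d1, d2 = f1 - b1, f2 - b2
        if d1 * d2 < 0:                          # the segment crosses the diagonal
            w = d1 / (d1 - d2)                   # weight on the second vertex at the crossing
            c_duplex = max(c_duplex, f1 + w * (f2 - f1))

    return {"forward": c_forward, "backward": c_backward,
            "duplex": c_duplex, "total": c_total}


# Hypothetical, deliberately asymmetric rate pairs (illustration only).
example_vertices = [(1.0, 0.0), (0.0, 0.8), (0.7, 0.5)]
result = capacities(example_vertices)
print(result)
```

For these made-up numbers the forward and backward capacities differ and the total capacity exceeds twice the duplex capacity, which is the kind of asymmetric behaviour that the known properties do not rule out. A region symmetric under exchange of the two rates would instead give equal one-way capacities and a total capacity equal to twice the duplex capacity.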
Assisted capacities $C^E_{\rightarrow,U}$, $C^E_{\leftarrow,U}$, $C^E_{\leftrightarrow,U}$, $C^E_{+,U}$ can be defined in exactly the same manner, except that the ancilla $|\psi_{\rm anc}\rangle_{A'B'}$ is now maximally entangled instead of being $|0r\rangle_{A'}|0r\rangle_{B'}$ in the definition of the achievable rate.
Entanglement assistance greatly simplifies the analysis of the classical capacities $C_E$ of the usual (unidirectional) quantum channels. An expression for $C_E$ has been found and proved to be strongly additive [13,14]. The study of $C_E$ also provides useful upper bounds for the unassisted capacities and insights into the classification of channels [41]. In the next section, we derive a simple expression for $C^E_{\rightarrow,U}$ and $C^E_{\leftarrow,U}$, the 1-way (forward or backward) entanglement-assisted capacity of any bidirectional channel. Surprisingly, this capacity is also strongly additive, as in the unidirectional case! A comparison of the two problems of generating entanglement and classical communication will be given in Sec. 5, and the two resulting capacities are related in Sec. 6.

Preliminaries and definitions
In this section we derive expressions for $C^E_{\rightarrow,U}$ and $C^E_{\leftarrow,U}$, as defined in Eq. (17) with $|\psi_{\rm anc}\rangle$ being a maximally entangled state. Without loss of generality, we focus on $C^E_{\rightarrow,U}$. It can be evaluated using the general framework of 1-way classical communication with quantum resources [42,11,12]. In this framework, suppose classical messages $i$, occurring with probabilities $p_i$, are encoded in the "signal states" $\eta_i$ received by Bob, forming an ensemble $\mathcal{E} = \{p_i, \eta_i\}$. The information on $i$ obtained by measuring a signal state is upper bounded by the Holevo information $\chi$ for the ensemble $\mathcal{E}$, defined as
$$\chi(\mathcal{E}) = S\Big(\sum_i p_i \eta_i\Big) - \sum_i p_i S(\eta_i),$$
where $S$ denotes the von Neumann entropy. The Holevo-Schumacher-Westmoreland (HSW) Theorem states that this amount of mutual information per signal state is achievable given the ability to transmit an asymptotically large number of signal states. (See [11,12] and Ch. 12.3.2 of [15].)
We will see that the optimal methods to generate EPR pairs (see Secs. 2.4-2.5) and entanglement-assisted classical communication have many similarities. The respective goals are to maximize the increase in entanglement and the Holevo information. The optimal asymptotic strategies in both cases are to repeat the 1-shot protocol, with an optimal input state in the former and with an optimal input ensemble in the latter. In the case of entanglement generation, allowing the most general 1-shot optimal input with arbitrary ancillas and initial entanglement makes the 1-shot capacity equal to the asymptotic one.
Likewise, we will allow the most general 1-shot input ensemble for assisted classical communication, and will show that the resulting 1-shot capacity is equal to the asymptotic capacities by establishing a method to "replenish" the optimal input ensemble (analogous to concentration and dilution in entanglement generation).
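The Holevo information that this framework optimizes, the entropy of the average signal state minus the average entropy of the signal states, is easy to evaluate for small ensembles. The following minimal sketch handles ensembles of single-qubit states only; the function names and the two example ensembles are illustrative assumptions, not taken from the paper.

```python
import math


def entropy_2x2(rho):
    """Von Neumann entropy (in bits) of a 2x2 density matrix (nested lists)."""
    trace = (rho[0][0] + rho[1][1]).real
    det = (rho[0][0] * rho[1][1]).real - abs(rho[0][1]) ** 2
    gap = math.sqrt(max(0.0, trace * trace - 4.0 * det))
    eigenvalues = [(trace + gap) / 2.0, (trace - gap) / 2.0]
    return -sum(x * math.log2(x) for x in eigenvalues if x > 1e-15)


def holevo(ensemble):
    """Holevo information of an ensemble given as (probability, density matrix) pairs."""
    average = [[sum(p * rho[i][j] for p, rho in ensemble) for j in range(2)]
               for i in range(2)]
    return entropy_2x2(average) - sum(p * entropy_2x2(rho) for p, rho in ensemble)


def projector(a, b):
    """Density matrix |v><v| for the qubit state v = a|0> + b|1>."""
    return [[a * a.conjugate(), a * b.conjugate()],
            [b * a.conjugate(), b * b.conjugate()]]


s = 1.0 / math.sqrt(2.0)
zero, one, plus = projector(1.0, 0.0), projector(0.0, 1.0), projector(s, s)

chi_orthogonal = holevo([(0.5, zero), (0.5, one)])      # perfectly distinguishable signals
chi_nonorthogonal = holevo([(0.5, zero), (0.5, plus)])  # overlapping signals
print("chi for {|0>, |1>}:", chi_orthogonal)
print("chi for {|0>, |+>}:", chi_nonorthogonal)
```

Orthogonal signal states give one full bit, while the overlapping pair gives strictly less, which is why the choice of input ensemble matters in the optimization.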
Let $\mathcal{E} = \{p_i, |\psi_i\rangle_{AA'BB'}\}$ be an ensemble of bipartite states. A trace-preserving operation acts on $\mathcal{E}$ by acting on each component state (preserving its probability). For example, we will write $U\mathcal{E} = \{p_i, U|\psi_i\rangle_{AA'BB'}\}$. We have the following definitions analogous to those in Sec. 2.2:

Definition 3: The $t$-shot Holevo information capacity of $U$ is the maximum increase in Holevo information per use of $U$ due to any protocol $P_t$ that uses $U$ $t$ times, the auxiliary resources labelled $r$, and the local resources specified in Sec. 1.5. There are two possible $t$-shot capacities, depending on the allowed input ensembles:
1. when the input ensemble $\mathcal{E}_0$ is restricted to satisfy $\chi(\mathrm{tr}_{AA'}\mathcal{E}_0) = 0$:
$$\Delta\chi^{(t,\emptyset,r)}_{\rightarrow,U} = \frac{1}{t} \sup_{P_t} \sup_{\mathcal{E}_0} \chi(\mathrm{tr}_{AA'} P_t\,\mathcal{E}_0),$$
2. when the input ensemble $\mathcal{E}$ is unrestricted:
$$\Delta\chi^{(t,*,r)}_{\rightarrow,U} = \frac{1}{t} \sup_{P_t} \sup_{\mathcal{E}} \left[ \chi(\mathrm{tr}_{AA'} P_t\,\mathcal{E}) - \chi(\mathrm{tr}_{AA'}\mathcal{E}) \right].$$
Since we always assume free entanglement as an auxiliary resource, and we always focus on forward capacity, we omit $r = E$ and $\rightarrow$ in the above notation, writing $\Delta\chi^{(t,\emptyset)}_U$ and $\Delta\chi^{(t,*)}_U$. We have [...]
Note that it is unnecessary to consider mixed state ensembles in Eq. (20) and Eq. (21): we can replace a mixed state $\rho_{AA'BB'}$ by its purification $|\psi\rangle_{AA'A''BB'}$, where $A''$ is the purifying system, without affecting [...]
In the next two subsections, we will prove $C^E_{\rightarrow,U} = \Delta\chi^{(1,*)}_U$. We first prove that $C^E_{\rightarrow,U} \le \Delta\chi^{(1,*)}_U$, and then we describe a protocol to achieve the upper bound, thereby proving additivity and providing an optimal asymptotic strategy.

An additive upper bound
We first prove an analogue of Lemma 1:
Lemma 5: $C^E_{\to,U} \le 2\log d$ and $C^E_{\leftarrow,U} \le 2\log d$.
Proof: Consider a t-use protocol. Replace each use of U by double teleportation (see Lemma 1). If the original protocol consumes $C_{in}$ and produces $C_{out}$ bits of forward communication, the modified protocol consumes $C_{in} + 2t\log d$ and produces $C_{out}$ bits of forward communication. By causality [43] of the modified protocol, $2t\log d \ge C_{out} - C_{in}$. Hence $C^E_{\to,U} \le 2\log d$. Similarly $C^E_{\leftarrow,U} \le 2\log d$. (Note that the above proof is stronger than we need, since we have allowed $C_{in} \ne 0$.) ✷
Consider the best 1-shot protocol to increase the Holevo information. Since local operations do not increase mutual information, the optimal 1-shot protocol is to just apply U, as in Fig. (6). Thus
$$\Delta\chi^{(1,*)}_U = \sup_{\mathcal{E}} \left[ \chi(\mathrm{tr}_{AA'} U\mathcal{E}) - \chi(\mathrm{tr}_{AA'} \mathcal{E}) \right] \qquad (24)$$
where the supremum is over the most general bipartite pure state ensemble $\mathcal{E} = \{p_i, |\psi_i\rangle_{AA'BB'}\}$.
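As a numerical illustration of the 1-shot quantity, the following minimal sketch evaluates the increase in Bob's Holevo information for one use of the cnot on a simple two-state ensemble without ancillas. The helper names (`entropy`, `bob_state`, `holevo_bob`) and the choice of ensemble are assumptions made for this example only; since the ensemble is not optimized and no ancillas are used, the result is only a lower bound on the supremum in Eq. (24).

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits of a density matrix."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]          # drop numerically zero eigenvalues
    return float(-np.sum(evals * np.log2(evals)))

def bob_state(psi, dA, dB):
    """Bob's reduced density matrix for a pure state on A (dim dA) x B (dim dB)."""
    m = psi.reshape(dA, dB)               # m[a, b] = amplitude of |a>_A |b>_B
    return m.T @ m.conj()                 # rho_B[b, b'] = sum_a m[a, b] m*[a, b']

def holevo_bob(probs, states, dA, dB):
    """Holevo information of the ensemble of Bob's reduced states."""
    rhos = [bob_state(s, dA, dB) for s in states]
    avg = sum(p * r for p, r in zip(probs, rhos))
    return entropy(avg) - sum(p * entropy(r) for p, r in zip(probs, rhos))

# cnot with control A and target B, in the basis |ab> -> index 2a + b
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

def ket(a, b):
    v = np.zeros(4, dtype=complex)
    v[2 * a + b] = 1.0
    return v

# Illustrative ensemble (an assumption): Alice encodes a bit in her qubit,
# Bob's qubit starts in |0>, so Bob initially has no information.
probs = [0.5, 0.5]
ensemble = [ket(0, 0), ket(1, 0)]

chi_in = holevo_bob(probs, ensemble, 2, 2)
chi_out = holevo_bob(probs, [CNOT @ s for s in ensemble], 2, 2)
delta_chi = chi_out - chi_in
print(chi_in, chi_out, delta_chi)
```

For this ensemble the increase is one bit, which matches the upper bound of 1 that follows from simulating the cnot with 1 ebit and one classical bit in each direction.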
We now consider the asymptotic problem. Using the same idea that proves Theorem 1 (1st half), we obtain the following analogue.
Theorem 2 (1st half): $\Delta\chi^{(t,*)}_U \le \Delta\chi^{(1,*)}_U$.
Proof: Consider the most general protocol $P_t$ with t uses of U (such as depicted in Fig. (16)). Let $\mathcal{E}$ be an arbitrary bipartite input ensemble. Then, the total increase in χ is upper bounded by the sum of the stepwise increases. Since local operations cannot increase χ, and the increase in χ by each use of U is bounded by Eq. (24), $\chi(\mathrm{tr}_{AA'} P_t\mathcal{E}) - \chi(\mathrm{tr}_{AA'}\mathcal{E}) \le t\,\Delta\chi^{(1,*)}_U$. ✷
In optimal asymptotic entanglement generation, the following basic cycle is repeated: (1) convert EPR pairs into n copies of the optimal input state, (2) apply the gate to each, (3) convert the n copies of the optimal output state into EPR pairs.
More EPR pairs are obtained in (3) than are used in (1); the excess is the entanglement generated.
In entanglement-assisted classical communication, we want a similar basic cycle: (1) convert classical communication to create n states drawn from the optimal input ensemble, (2) apply the gate to each state, (3) convert the states from the optimal output ensemble into classical communication.
Step (1) is called remote state preparation [35,44,45] (RSP), a procedure whereby Alice helps Bob to construct quantum states of her choice in his laboratory using entanglement and classical communication.
In RSP, Alice performs a measurement on her half of the shared entangled state and sends the outcome to Bob, who completes the RSP by applying to his half an operation conditioned on that outcome. It is known [46] how to approximately prepare n pure bipartite states from an ensemble $\mathcal{E}$ with free entanglement and $n\chi(\mathrm{tr}_{AA'}\mathcal{E}) + o(n)$ bits of classical communication.
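Step (3) relies on the HSW theorem, which the protocol below uses to decode messages from the output ensemble.
When describing and analyzing the protocol, we loosely call $\mathcal{E}$ the optimal ensemble achieving the supremum in Eq. (24). For arbitrarily small ε, $\mathcal{E}$ is chosen so that $\chi(\mathrm{tr}_{AA'} U\mathcal{E}) - \chi(\mathrm{tr}_{AA'}\mathcal{E}) \ge \Delta\chi^{(1,*)}_U - \epsilon$. Since how ε enters the following analysis is obvious, and the analysis is independent of the choice of ε and $\mathcal{E}$, ε is omitted for simplicity.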
Protocol that achieves $C^E_{\to,U} = \Delta\chi^{(1,*)}_U$
Alice's i-th transmission $N_i$ consists of a message part $M_i$ and an RSP instruction $R_i$ of length $n\chi(\mathrm{tr}_{AA'}\mathcal{E})$ for Bob to create a state $|\phi_{i+1}\rangle \in \mathcal{E}^{\otimes n}$ such that $U^{\otimes n}|\phi_{i+1}\rangle \in (U\mathcal{E})^{\otimes n}$ encodes $N_{i+1}$ (by the HSW Thm). In order to generate $R_i$ for $N_i$, Alice needs to determine $|\phi_{i+1}\rangle$ and to perform her measurement for the RSP of $|\phi_{i+1}\rangle$. This in turn requires knowledge of $N_{i+1}$. So Alice first computes the last message $N_k$ (in which $M_k$ is known and $R_k$ is irrelevant), classically calculates $|\phi_k\rangle$, and performs the measurement for the RSP of $|\phi_k\rangle$ to find $R_{k-1}$ in $N_{k-1}$. She then works her way backwards through $N_{k-1}, \dots, N_1$, determining $|\phi_i\rangle$ from $N_i$ and performing the RSP measurement for $|\phi_i\rangle$ for decreasing i.
• Quantum protocol: Alice uses the given initial classical communication to create $|\phi_1\rangle$, which she shares with Bob. Then $U^{\otimes n}$ is applied to convert it to $U^{\otimes n}|\phi_1\rangle$, from which Bob reads off the message $N_1$. This contains $R_1$, which instructs him to complete the RSP of $|\phi_2\rangle$, and so on.
The protocol is summarized in Fig. (26).
In Fig. (26), RSPiA denotes Alice's RSP measurement to obtain the instruction Ri for Bob to prepare |φi . RSPiB denotes Bob's conditional operation to complete the preparation of |φi .
The initial amount of classical communication can be created by Alice and Bob using cn uses of U inefficiently, for some constant c. The communication rate therefore approaches $\Delta\chi^{(1,*)}_U$ once k is large enough for this initial cost to be negligible.
We have not yet discussed small inaccuracies and inefficiencies in the protocol. The asymptotic correctness of this protocol comes from the asymptotic reliability of its component pieces: RSP and the HSW Thm. However, since errors and inefficiencies accumulate over many rounds, we need to choose the rates of increase of n and k slightly more carefully.
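Suppose that preparing a member of $\mathcal{E}^{\otimes n}$ with RSP requires $n(\chi(\mathrm{tr}_{AA'}\mathcal{E}) + \delta^{rsp}_n)$ bits of communication and has error $\epsilon^{rsp}_n$, where $\delta^{rsp}_n, \epsilon^{rsp}_n \to 0$ as $n \to \infty$. Similarly, a state in $(U\mathcal{E})^{\otimes n}$ provides $n(\chi(\mathrm{tr}_{AA'}U\mathcal{E}) - \delta^{hsw}_n)$ bits of information with error $\epsilon^{hsw}_n$, where again $\delta^{hsw}_n, \epsilon^{hsw}_n \to 0$ as $n \to \infty$. Combining these into $\delta_n = \delta^{rsp}_n + \delta^{hsw}_n$ and $\epsilon_n = \epsilon^{rsp}_n + \epsilon^{hsw}_n$, each round nets $n(\Delta\chi^{(1,*)}_U - \delta_n)$ bits beyond the next RSP instruction, so over the $n(k + c)$ uses of U the communication rate is at least
$$\frac{k}{k+c}\left(\Delta\chi^{(1,*)}_U - \delta_n\right) \qquad (28)$$
and the total error is $k\epsilon_n$. This vanishes if one chooses k first, and then chooses n such that $k\epsilon_n$ is small (n thus depends on k).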
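The bookkeeping of the rate and error can be sketched numerically. The block below is a minimal sketch under stated assumptions: the rate is taken to be $k(\Delta\chi - \delta_n)/(k + c)$, that is, k rounds of n gate uses each netting $n(\Delta\chi - \delta_n)$ bits plus cn start-up uses, and the decay laws `eps_n` and `delta_n` are hypothetical placeholders, since the text only requires that both vanish as n grows. All function names are illustrative.

```python
import math

def eps_n(n):
    """Hypothetical per-round error; only eps_n -> 0 is assumed in the text."""
    return 1.0 / n

def delta_n(n):
    """Hypothetical per-use inefficiency; only delta_n -> 0 is assumed."""
    return 1.0 / math.sqrt(n)

def rate(delta_chi, k, c, n):
    """Assumed rate: k rounds netting n*(delta_chi - delta_n) bits over n*(k + c) uses."""
    return k * (delta_chi - delta_n(n)) / (k + c)

def total_error(k, n):
    """Errors accumulate over the k rounds."""
    return k * eps_n(n)

def choose_parameters(delta_chi, c, tol):
    """Pick k first (start-up cost below tol), then n depending on k."""
    k = math.ceil(c * delta_chi / tol)
    n = 1
    while total_error(k, n) > tol or delta_n(n) > tol:
        n *= 2
    return k, n

k, n = choose_parameters(delta_chi=1.0, c=3.0, tol=0.01)
print(k, n, rate(1.0, k, 3.0, n), total_error(k, n))
```

The point of the sketch is the order of the choices: n is selected only after k is fixed, because the accumulated error $k\epsilon_n$ depends on both.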
We summarize the order of the limits. First, choose the optimal ensemble $\mathcal{E}$ to approximate $\Delta\chi^{(1,*)}_U$. Second, choose k large to make c/k negligible (to overcome the initial cost). Finally, choose n large to make both $k\epsilon_n$ and $\delta_n$ vanish. As this protocol does not require initial mutual information, it follows that $C^E_{\to,U} \ge \Delta\chi^{(1,*)}_U$. Putting this together with the upper bound $C^E_{\to,U} \le \Delta\chi^{(1,*)}_U$ gives $C^E_{\to,U} = \Delta\chi^{(1,*)}_U$. Thus initial mutual information does not increase the asymptotic capacity, analogous to entanglement generation.
Finally, we generalize Theorem 2 to prove strong additivity:
Theorem 2S: The classical communication achievable by $n_i$ uses of $U_i$ is asymptotically $\sum_i n_i \Delta\chi^{(1,*)}_{U_i}$.
Proof: The argument that proves Theorem 2 (1st half) can be applied to prove that the amount of communication generated is no more than $\sum_i n_i \Delta\chi^{(1,*)}_{U_i}$, which is achieved by applying the optimal protocol for each $U_i$ separately. ✷

Additivity
We conclude this section with two observations about additivity:
• We emphasize that in Theorem 2 (1st half), the Holevo bound is applied to the output of a general protocol $P_t$ with possibly entangled inputs to different uses of U. Thus the 1-way entanglement-assisted capacity for unitary bidirectional channels is strongly additive independent of whether the Holevo information χ is additive or superadditive.
• In the optimal asymptotic protocol, the n copies of U are applied to n states each chosen from the optimal input ensemble. Thus, entangling the inputs to different uses of U does not improve C E →,U .

Discussion
Despite the many similarities between generating entanglement and entanglement-assisted classical communication, there is an important difference. Communication cannot be stored and used later. In particular, Alice needs to work backwards in our optimal entanglement-assisted communication protocol, so the classical messages need to be known at the beginning of the protocol in order to share the initial cost.
In contrast, entanglement can be stored. The optimal entanglement generation protocol can be stopped and resumed at arbitrary times.
We can generalize the first half of Theorems 1 and 2 to any other quantity which is monotonic under the given resources, as long as a sufficiently general input (e.g. state or ensemble) is allowed for the 1-shot capacity. In particular, the input should possess all the properties the output may possess. If in addition the quantity is weakly additive or subadditive on the optimal input and weakly additive or superadditive on the optimal output, repeating the optimal 1-shot protocol allows the upper bound to be attained asymptotically, and additivity holds.
We end this section with a discussion on the parallel versus sequential applications of bidirectional channels in a protocol. Note that there is no such distinction for unidirectional channels (in the absence of back channels), as the output state of a given application of the channel is with the receiver and can never be used as an input for later uses. For bidirectional channels, there are sequential schemes that cannot be made parallel. For example, the protocol for entanglement-assisted 1-way classical communication in Sec. 4.3 cannot be made parallel. Sequential schemes are always at least as powerful as parallel ones.
The opposite is true in the asymptotic regime, in which case any capacity of $U^{\otimes n}$ (i.e. one must apply n copies of U in parallel) is equal to n times the capacity of U. The proof is simple: let $P_t$ be any protocol that uses U sequentially. A particular t-use protocol for $U^{\otimes n}$ is to run n copies of $P_t$ in parallel. Thus the t-shot capacity of $U^{\otimes n}$ is no worse than n times that of U, and equality holds because one use of $U^{\otimes n}$ is itself a particular way of using U n times.

Other general bounds
We have proved a few simple general bounds: E We now derive other general bounds that hold for all U . We focus on the entropy of entanglement Ee, and on the two capacities E  Alice can send a noisy bit to Bob with the following t-use protocol. Bob inputs |Φ d ⊗t BB ′ to all t uses of U . To send "0" Alice inputs |Φ d ⊗t AA ′ to share tE0 ebit with Bob. To send "1", Alice inputs |0 A to the first use of U , takes the output and uses it as the input to the second use, and so on, so that their final entanglement is no more than log d. Thus different messages from Alice result in very different amount of entanglement at the end of the protocol. Using Fannes' inequality [47,48] where ρi is the reduced density matrices of Bob when Alice sends i. For any ǫ > 0, ∃ t such that E0 − ǫ ≤ log d tr|ρ0 − ρ1| and Bob can distinguish ρ0 from ρ1 with nonzero advantage. It means that the t-use protocol then simulates a noisy classical channel with nonzero capacity and C→,U > 0. Obviously Proof: Suppose a t-use protocol Pt transmits na bits from Alice to Bob and n b bits from Bob to Alice with fidelity 1−ǫ. Recall from Sec. 3 that Pt can be assumed unitary with the ancillas starting in the state |0r A ′ |0r B ′ where r is a share random variable. Let |x A|y B carry the messages to be communicated, where x and y are na-and n b -bit strings. Then, by definition (Eq. (17)), the state change is given by: By Uhlmann's Theorem [49], there are normalized states |cxy A ′ B ′ and |exy ABA ′ B ′ such that and tr A ′ B ′ |exy exy| ABA ′ B ′ has support orthogonal to the span of |y A|x B .
To prove E (∅) U ≥ C+,U , we simply change the inputs to the protocol so that it creates entanglement.
Alice's input system A is now in a maximally entangled state with another ancilla A ′′ , each with 2 na dimensions, and similarly for Bob. Thus the input state is given by where x and y are summed over their possible values. The output is given by To calculate E(|η (ǫ) ), we first calculate E(|η (0) ) for E(|η (0) ) is simply the entropy of Alice's reduced density matrix, which can be found by the "Joint Entropy Theorem" (Eq. (1.58) in [15]).
From Eq. (34) and Eq. (37) we can lower bound the entanglement generated per use of U : As t increases, ǫ can be made arbitrarily small and 1 t (na + n b ) → C+,U . Furthermore, 2 log d + (na + n b )/t → 2 log d + C+,U ≤ 4 log d is well bounded. The above equation then implies E (∅) U ≥ C+,U . ✷ Remark: In the proof above, it is crucial to bound Sch(|η (ǫ) ) and Sch(|η (0) ) as functions of t, and our bound is based on having a product initial state. Furthermore, the two limits t → ∞ and ǫ → 0 are dependent. Thus one cannot assume ǫ → 0 as a separate premise in the above proof and extra care is needed in how the limits are taken.
After this paper was first posted, Berry and Sanders [50] proved that if the capacity is achievable by an To adapt the proof of bound 2 for C E U † in the general case when ǫ > 0 will require an explicit bound on Sch(|η (ǫ) ) and Sch(|η (0) ) and knowledge of how various inaccuracies vanish asymptotically, so as to specify how various dependent limits should be taken. So far, we do not see how this can be done.
In the following, we prove a weaker bound U † for ǫ > 0, by adapting the proof of bound 2 and an idea from [50], as well as using details on the optimal protocol for achieving C E →,U and an improved method for RSP of bipartite pure entangled state that uses less entanglement than the method in Ref. [46].
Before we present the proof, we give an interpretation of E (∅) since U † creates as much entanglement on the input |ψ as U can destroy on U † |ψ . Note that to disentangle a state unitarily is a nonlocal task. We now turn to our proof.
Proof: We omit details already given in the proof of bound 2. Let Pt be a unitary protocol transmitting na bits from Alice to Bob with fidelity 1 − ǫ. The ancillas are initially in the maximally entangled state where log M is the amount of initial entanglement required to assist the communication. Let |x A carry the na-bit message of Alice. The state change is given by: where tr A ′ B ′ |ex ex| ABA ′ B ′ has support orthogonal to |0 A|x B .
In the entanglement generation protocol, Alice inputs half of $|\Phi_{2^{n_a}}\rangle_{AA''}$ while Bob still inputs $|0\rangle_B$. The input and output states are given by For Applying the definition of the entanglement destroying capacity to Eq. (40), Hence, Using Thus and
In particular, consider the entanglement-assisted communication protocol in Sec. 4.3. For any $\delta_E > 0$, ∃E Following Eq. (28), t = cn + nk for some constant c and the rate is na where $\delta_n \to 0$ as $n \to \infty$. The total error is $\epsilon = k\epsilon_n$, where $\lim_{n\to\infty} \epsilon_n = 0$. The RSP method in [46] can be improved [51] to prepare n states from an ensemble $\mathcal{E}$ with $n\chi(\mathrm{tr}_{AA'}\mathcal{E}) + o(n)$ cbits and $nS(\rho_B) + o(n) \le n\log M_0$ ebits, where $\rho_B$ is the average reduced density matrix of the ensemble as seen from Bob, so that $M \le M_0^t$. Putting all these parameters into Eq. (48), For any δ > 0, choose
1. $\mathcal{E}$ such that $\delta_E < \delta/5$,
2. k such that $\frac{c}{k} C^E_{\to,U} < \delta/5$ and $2/(e(c + k)) \le \delta/5$, so that $2/(en(c + k)) \le \delta/5$,
3. n such that $\delta_n < \delta/5$ and $\epsilon = k\epsilon_n$ small enough for $2\sqrt{\epsilon}\,(3\log M_0 + 6\log d + n_a/t) < \delta/5$. ✷
To summarize, for all U, we have
We now return briefly to Hamiltonian capacities. Recall from Sec. 1.5 that any Hamiltonian capacity can be expressed in terms of the corresponding gate capacity, $G_H = \lim_{s\to 0} \frac{1}{s} G_{U=e^{-iHs}}$. The finiteness of $G_H$ is not immediate from the above definition. Even though $G_{U=e^{-iHs}} \to 0$ when $s \to 0$ due to continuity, it is not guaranteed that $G_{U=e^{-iHs}} \le O(s)$. One may argue that physically, the rate should be finite, but the availability of unlimited local resources complicates the argument. We now provide a proof of the finiteness of the Hamiltonian capacities [52].
We have assumed U acting on a d × d bipartite system. We note here that all the results discussed hold for a nonlocal gate (or Hamiltonian) acting on a $d_1 \times d_2$ system (without loss of generality, $d_1 \le d_2$).
The interested reader can easily verify that all the arguments hold in this case, because the fact d1 = d2 is never used in the proofs. We also note a subtle observation, that the d1 × d2 case is not described by embedding the operation in a d2 × d2 system by taking the direct sum with a d2 − d1 dimensional identity matrix acting on the side of lower dimension.

Open questions and examples
We have found expressions for the entanglement capacity and the entanglement-assisted classical capacity

Open questions
• How large do the ancillas A ′ B ′ need to be in the optimal input for entanglement generation? How large do A ′ B ′ need to be, and how many states are needed in the optimal ensemble for entanglement-assisted classical communication? These are important for numerical studies of the capacities.
• Will infinite dimensional ancillas improve the entanglement capacity and the entanglement-assisted 1-way classical capacity? Will an ensemble with an infinite number of members improve the latter?
• How do the forward and backward rates trade off with each other (in either the unassisted or assisted case)?
• Are forward and backward classical capacities always equal (in either the unassisted or assisted case)?
• Is there a gate U with C+,U < E (∅) e,U a strict inequality?
• Is E e,U * (U * is the complex conjugate of U ). This generalizes the proof in Ref. [50] for 2-qubit gates since U = U T for all 2-qubit gates in their normal form [9]. Numerical work suggests that the equality does not hold for some U in higher dimensions [54].
• When can a gate be simulated efficiently, i.e., by an amount of some resource equal to the capacity?
• How do auxiliary resources of quantities linear in the number of uses affect the capacity?

Examples
Example 1: Let U = cnot. It can be simulated using 1 ebit and 1 bit of classical communication in each direction [10]. Thus $E^{(\emptyset)}_{cnot} \le 1$, $C^E_{\to,cnot} \le 1$, $C^E_{\leftarrow,cnot} \le 1$, and bound 2 further implies $C_{+,cnot} \le 1$. These are all achievable with obvious methods, without the need of entanglement assistance in $C^E_{\to,cnot}$ and $C^E_{\leftarrow,cnot}$ and without the need of initial entanglement in $E_{cnot}$. The rate pairs in the triangle with vertices (0, 0), (0, 1), (1, 0) are achievable without entanglement assistance, and convexity implies no other pair is achievable. We also have $C^E_{+,cnot} \ge 2$ due to the following protocol. Starting with the EPR state $|\Phi_2\rangle_{AB}$, Alice applies $\sigma_x^a$ and Bob applies $\sigma_z^b$ if their respective input bits are a and b. The cnot is then applied, converting the state to $(-1)^{ab}\,\frac{1}{\sqrt{2}}(|0\rangle + (-1)^b|1\rangle)_A\,|a\rangle_B$, from which Bob reads a in the computational basis and Alice reads b in the basis $\frac{1}{\sqrt{2}}(|0\rangle \pm |1\rangle)$.
Example 2: Let U = swap on two qubits. An entanglement capacity of 2 ebits is achieved on the input $|\Phi_2\rangle_{AA'}|\Phi_2\rangle_{BB'}$. To achieve the forward assisted classical capacity, Alice and Bob start with the state $|\Phi_2\rangle_{AB''}|\Phi_2\rangle_{BB'}$ and Alice applies $\sigma_{xA}^{a_1}\sigma_{zA}^{a_2}$ when her 2-bit message is $a_1a_2$. Then swap is applied. In other words, superdense coding [18] is performed, consuming an existing EPR pair on AB′′, while a new EPR pair is created on AB′ simultaneously. Thus the unassisted and assisted one-way classical capacities are both 2. $C_{+,swap} = 2$ is achieved in the obvious way. Superdense coding in both the forward and backward directions implies $C^E_{+,swap} \ge 4$, and by the monotonicity of the achievable region of assisted rate pairs, $C^E_{+,swap} = 4$. Therefore, any rate pair inside the triangle with vertices (0, 0), (0, 2), (2, 0) can be achieved without entanglement assistance, and any rate pair inside the square with vertices (0, 0), (0, 2), (2, 0), (2, 2) can be achieved with entanglement assistance.
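The entanglement-assisted two-way protocol for the cnot in Example 1 can be checked with a small state-vector simulation. This is a minimal sketch, assuming the cnot has Alice's qubit as control and Bob's as target, and that Alice reads her outcome in the $\sigma_x$ eigenbasis (implemented here by a Hadamard followed by a computational-basis readout). The function name `run_protocol` is illustrative.

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
I2 = np.eye(2)
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)

# cnot with control A (first register) and target B (second register)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

def run_protocol(a, b):
    """Alice encodes bit a with sigma_x, Bob encodes bit b with sigma_z,
    then one cnot is applied. Returns (Alice's outcome, Bob's outcome, amplitude)."""
    phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)          # EPR state on AB
    encode = np.kron(np.linalg.matrix_power(X, a),
                     np.linalg.matrix_power(Z, b))
    psi = CNOT @ encode @ phi
    psi = np.kron(H, I2) @ psi            # rotate Alice's X basis to the Z basis
    idx = int(np.argmax(np.abs(psi)))     # the state is a single basis vector
    return idx >> 1, idx & 1, float(psi[idx])

for a in (0, 1):
    for b in (0, 1):
        alice_out, bob_out, amp = run_protocol(a, b)
        print(a, b, "->", "Alice reads", alice_out, "Bob reads", bob_out, "amplitude", amp)
```

In every case Alice's outcome equals Bob's bit b and Bob's outcome equals Alice's bit a, so one use of the cnot together with one shared ebit carries one bit in each direction.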
The cnot and swap are very simple. We now turn to more intriguing examples.
Example 3: The gate j acts as where the first and second registers are A and B (same throughout the examples). Without ancillas, j creates 1 ebit but seems to create less than 1 cbit in 1-shot, but [26] presents a product 2-qubit input that communicates 1 cbit from Alice to Bob.
Numerical optimization of the generated entanglement with 2-dimensional A ′ and B ′ in Eq. (7) is 1.83186 ebit, and the optimal input has 0.055338 ebit. As a comparison, only 1.8113 ebit is generated by inputting Starting from |Φ2 AB = 1 √ 2 (|00 + |11 ), Alice and Bob can communicate one bit to each other, by applying σ a z and σ b z if their respective messages are a and b. The j gate further converts the state to |x A|x B where x = a + b mod 2, from which they learn each other's input.
We suspect C +,j < E (∅) j . For instance, the best total rate we found requires creating 1 ebit with 1 use of j followed by assisted two-way communication in the second use of j. Asymptotically, 2.83186 uses of j can create at least 1.83186 × 2 bits of communication, so that C +,j ≥ 1.2938, which is much less than 1.83186. For d = 3, we have also studied forward communication without ancillas. It is impossible to transmit 1 bit from Alice to Bob by one use of U , but it is possible asymptotically, so that C →,cp = 1. ae = log d. C →,ae = log d is achievable in the obvious manner. Thus C +,ae = log d. We can prove that one use of ae can communicate strictly less than log d cbit from Bob to Alice starting from product states but allowing ancillas. However, we suspect C ←,ae < C →,ae = log d.
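The lower bound on $C_{+,j}$ quoted above is simple arithmetic, reproduced here as a sketch. It assumes, following the text, that one use of j generates 1.83186 ebits (the quoted numerical optimum) and that each assisted use of j consumes one ebit while sending one bit in each direction. The variable names are illustrative.

```python
# Quoted numerical optimum: ebits generated by one use of j (2-dimensional ancillas)
ebits_per_generating_use = 1.83186

# Each assisted use consumes 1 ebit and sends one bit each way (2 bits in total)
bits_per_assisted_use = 2

# One generating use funds 1.83186 assisted uses on average
total_uses = 1 + ebits_per_generating_use
total_bits = bits_per_assisted_use * ebits_per_generating_use

rate = total_bits / total_uses
print(round(rate, 4))
```

The resulting rate of about 1.2938 bits per use is the stated lower bound, well below the 1.83186 ebits that a single use can generate.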

Acknowledgements
We thank M. Leifer, L. Henderson, and N. Linden for discussions and for kindly sharing their results on entanglement capacity prior to publication. We also thank the above, as well as L. Spector and H. Bernstein, and K. Hammerer, G. Vidal, and J. I. Cirac for communicating their results on classical communications with bidirectional channels.
We are indebted to many colleagues for their inputs to our work. We thank P. Shor for communicating his RSP results which are crucial to our results. We thank A. Childs and H.-K. Lo for their critical reading of the manuscript and for many constructive suggestions, part of which motivated a more precise version of Theorem 1 and the problem on d1 × d2 systems. The finiteness of the Hamiltonian capacities was questioned by G. Vidal, who also provided the proof for the finiteness of entanglement capacity. We thank M. Nielsen for his upper bound on the entanglement capacity in terms of the Schmidt number.
We thank I. Devetak for important input in proving bound 2. We thank D. DiVincenzo, J. Dodd, A. Kitaev, B. Terhal, and other members of the IQI at Caltech for additional helpful discussions.
Since this paper was first posted, other related results have been posted [50,53,55,56].
This work is supported in part by the NSA under the US Army Research Office (ARO), grant numbers DAAG55-98-C-0041 and DAAD19-01-1-06.

A Linear bound in communication cost for distillation
In this appendix, we obtain a bound on the communication cost in distillation using [22] which derives the enhancement factor of the capacity of a noiseless quantum channel assisted by noisy entanglement, i.e. unlimited supply of the mixed state ρ.
Suppose that, given $\rho^{\otimes qn}$, cn classical bits (in either direction) are sufficient to distill n ebits ($q \ge 1/D(\rho)$). Here, we do not require maximum yield of entanglement, so that the classical communication cost is upper bounded by that required in the more difficult job of distillation.
Then, the following is a noisy superdense coding strategy for Alice and Bob: first distill, and then perform noiseless superdense coding:
cn cbits + $\rho^{\otimes qn}$ → n ebits
n ebits + n qubits → 2n cbits
Together, the enhancement factor is equal to $\frac{2n - cn}{n} = 2 - c$, which cannot exceed the optimal value [22]
$$C_{sd} = 1 + \sup_{n,\,\Lambda_A} \frac{nS(\mathrm{tr}_A(\rho)) - S(\Lambda_A(\rho^{\otimes n}))}{S(\Lambda_A(\mathrm{tr}_B(\rho^{\otimes n})))}$$
where the supremum is taken over all trace-preserving completely positive maps $\Lambda_A$ on Alice's half of $\rho^{\otimes n}$. Hence $c \ge 2 - C_{sd} \equiv \Delta$. Even though it is not known how to calculate Δ for an arbitrary ρ, it is unlikely to be zero for all ρ. If ∃ρ for which Δ > 0, then distillation would take at least linear classical communication.