A Consistent Reduced-Speed-of-Light Formulation of Cosmic Ray Transport Valid in Weak and Strong-Scattering Regimes

We derive a consistent set of moments equations for CR-magnetohydrodynamics, assuming a gyrotropic distribution function (DF). Unlike previous efforts we derive a closure, akin to the M1 closure in radiation hydrodynamics (RHD), that is valid in both the nearly-isotropic-DF and/or strong-scattering regimes, and the arbitrarily-anisotropic DF or free-streaming regimes, as well as allowing for anisotropic scattering and transport/magnetic field structure. We present the appropriate two-moment closure and equations for various choices of evolved variables, including the CR phase space distribution function, number density, total energy, kinetic energy, and their fluxes or higher moments, and the appropriate coupling terms to the gas. We show that this naturally includes and generalizes a variety of terms including convection/fluid motion, anisotropic CR pressure, streaming, diffusion, gyro-resonant/streaming losses, and re-acceleration. We discuss how this extends previous treatments of CR transport including diffusion and moments methods and popular forms of the Fokker-Planck equation, as well as how this differs from the analogous M1-RHD equations. We also present two different methods for incorporating a reduced speed of light (RSOL) to reduce timestep limitations: in both we carefully address where the RSOL (versus true c) must appear for the correct behavior to be recovered in all interesting limits, and show how current implementations of CRs with a RSOL neglect some additional terms.


INTRODUCTION
Cosmic rays (CRs) could play a potentially crucial role in the interstellar and circum-galactic medium, star and galaxy formation, and our understanding of high-energy astro-particle and plasma physics. In recent years, there has been a surge of interest in attempts to model CR dynamics explicitly in star, planet, and galaxy simulations -i.e. following the transport and matter interactions of CRs alongside the magnetohydrodynamics (MHD), gravity, and other plasma physics effects in these systems (see e.g. Uhlig et al. 2012;Wiener et al. 2013b;Salem & Bryan 2014;Simpson et al. 2016;Pakmor et al. 2016;Salem et al. 2016;Ruszkowski et al. 2017;Zweibel 2017;Mao & Ostriker 2018;Girichidis et al. 2018;Chan et al. 2019;Butsky & Quinn 2018;Su et al. 2020;Hopkins et al. 2020d;Ji et al. 2020b). Simultaneously, work has continued on more traditional CR propagation methods that trace CR trajectories as "tracer particles" across static analytic galaxy models in order to to understand solar system observables (e.g. Guo et al. 2016;Jóhannesson et al. 2016;Cummings et al. 2016;Korsmeier & Cuoco 2016;Evoli et al. 2017;Amato & Blasi 2018). Ideally, one would simply solve the full Vlasov equation for CRs as a function of position x and momentum p for each CR species, but the high dimensionality of this equation is prohibitive. Moreover, in planet/star/galaxy formation models the resolution scales are vastly larger than CR gyro-radii for CRs with energies TeV (which contain most of the energy/pressure, and dominate the interactions with the non-relativistic matter). As such, these applications have generally relied on moment-based approaches, where one begins by assuming that the CR distribution function (DF) f is gyrotropic (symmetric around the magnetic-field direction), averages over the micro-scale Lorentz forces and scattering processes, then considers moments of the distribution function in terms of the remaining momentum direction, the pitch angle µ.
The simplest of these -"zeroth moment methods" -correspond to pure diffusion models. These involve either assuming nearly-isotropic behavior and solving an isotropic Fokker Plank equation for f , or solving a diffusion-like equation, ∂t q = ∇ · (κ · ∇q) + ..., for some integrated "macroscopic" CR property q (e.g., energy density; the diffusion tensor κ should be anisotropic on scales much larger than the gyro radius, κ = κ bb ). But it is wellknown that this approximation cannot accurately represent many regimes of interest: the free-streaming or weak-scattering regimes, significantly-anisotropic f (µ), the trans-Alfvénic CR "streaming" limit, and others. Moreover, it can produce highly un-physical behavior (e.g. super-luminal CR transport), and imposes a severe timestep (and therefore CPU cost) penalty in numerical simulations that explicitly integrate the CRs. Motivated by this, recently Jiang & Oh (2018); Chan et al. (2019); Thomas & Pfrommer (2019) proposed two-moment schemes, effectively evolving not just the isotropic part of f but its first moment as well (or equivalently, evolving both CR energy and its flux), which resolve many of these problems. The formulations in Jiang & Oh (2018) and Chan et al. (2019) were heuristically motivated by the analogous popular moments methods for radiation hydrodynamics (RHD), but they did not attempt to link these to the actual equations of motion for a gyrotropic CR distribution. Thomas & Pfrommer (2019) did make such a link and developed a formalism for further expanding on this; however, their formulation makes some restricting assumptions, e.g. that the CRs are ultra-relativistic and that the DF f (µ) is always nearly-isotropic. Moreover, although all of these works have suggested and adopted the use of a "reduced speed of light" (RSOL) as a method to prevent extremely small numerical timesteps when CRs are free-streaming (again, analogous to the procedure common in RHD), none have attempted to verify that the RSOL formulation is consistent in all relevant limits of their equations to guarantee accurate steady-state solutions.
In this manuscript we therefore expand upon this previous work to develop more general forms of the CR-MHD equations. In application, this work is intended primarily for numerical models of planet, star, and galaxy formation, or the interstellar or circum/intergalactic medium, where one desires to evolve CR populations explicitly. We make two fundamental assumptions throughout, appropriate for these applications: (1) that the background MHD fluid velocities u are non-relativistic (so we can expand to leading-order in e.g. O(u/c)), and (2) that the CRs have a gyrotropic DF with gyro radii/timescales much smaller than the macroscopically resolved scales in the calculation. Importantly, however, we do not assume True (c) & "reduced" (c) speed-of-light (RSOL) β, γ CR velocity/Lorentz factors β = v/c, γ = 1/ 1 − β 2 u, β u Gas velocity u, with β u ≡ u/c Dt Conservative comoving derivative Dt X ≡ ∂t X + ∇ · (u X) q, Fq Moments of the DF & associated fluxes (Eqs. 2-3) n, e, CR number n, energy e, kinetic energy densities n , e , Differential n ≡ dn/d p = 4π p 2f 0 , etc. fn Pitch-angle moments of the DF:fn ≡ µ n f µ (Eq. 6) µ n f DF-weighted pitch-angle moment µ n f ≡fn/f 0 (Eq. 7) ν Pitch-angle averaged scattering rateν ≡ν + +ν − (Eq. 24) Dµµ,Dpp Averaged scattering coefficients Dµµ, etc. (Eq. 24) Derivative operator G(X) ≡b · [∇ · (D X)] (Eq. 25) P, D CR pressure tensor P & Eddington-type tensor D (Eq. 26) χ Second-moment function χ ≡ (1 − µ 2 f )/2 (Eq. 27) M 2 Closure function µ 2 f ≈ M 2 ( µ 1 f ) (Eq. 28) that e.g. the CR scattering mean-free-paths are short -akin to e.g. kinetic MHD (Kulsrud 1983), we will show that the small-gyroradius assumption is sufficient for a "fluid-like" expansion of the Vlasov equation, provided appropriate closure relations are adopted to truncate the moments expansion.
In § 2 we present various assumptions and definitions, and in § 3 use this to derive the appropriate two-moment equations ( § 3.4) and closures governing the CR distribution function ( § 3.4.1) or its integrals (CR number or energy density; § 3.4.2), as well as the corresponding couplings to the gas equations ( § 3.5). In § 4 we alternatively present expressions appropriate for methods that attempt to explicitly evolve the CR pitch-angle distribution directly ( § 4.1). In § 5 we consider a number of test problems to compare various closure assumptions and "zeroth moment methods" to exact solutions, summarized in § 5.6. In § 6, we discuss how the formulations here extend previous moments equations in the literature ( § 6.1) and popular forms of the Fokker-Planck equation ( § 6.2), and relate to analogous RHD expressions ( § 6.3). We discuss the reducedspeed-of-light (RSOL) approximation in § 7 and present two possible implementations ( § 7.1), deriving correction terms needed in various limits to ensure reasonable behavior ( § 7.2) and reviewing the (dis)advantages of each ( § 7.3). We summarize in § 8.
For ease of reference, we define variables in Table 1 and collect many of the most important derived equations in Appendix A.

ASSUMPTIONS & DEFINITIONS
Our starting point is the general focused CR transport equation (see e.g. Skilling 1971Skilling , 1975Isenberg 1997;le Roux et al. 2001le Roux et al. , 2005Zank 2014;le Roux et al. 2015) as written in polar momentum coordinates: 1 This describes the evolution of a gyrotropic CR distribution function (DF) f , defined in the co-moving frame (with fluid velocity u), valid to second order in O(u/c) (where c denotes the true speed of light throughout). We will consider the CR equations as a continuous function of momentum p or Lorentz factor γ for a given CR species s -i.e. it should be understood here that some quantity ψ is actually ψγ,s(x, t, p, s, ms, ...) for species s with mass ms, etc., but we will not write this out for the sake of compact notation. In Eq. 1, µ is the CR pitch angle,b ≡ B/|B| is the unit magnetic field vector, β ≡ |v|/c = v/c is the speed of the CRs, β u ≡ u/c the speed of the fluid, a ≡ du/dt ≡ ∂u/∂t + (u · ∇) u the fluid acceleration, A : B ≡ Tr[A · B] denotes the double dot product, Dt X ≡ ∂t X + ∇ · (u X) ≡ ρ dt (X/ρ) is the conservative comoving derivative, ρ the fluid density, dt X ≡ ∂t X + (u · ∇)X, ∂t X ≡ ∂X/∂t, and ∂t f |coll denotes the scattering+collisional terms and other loss/injection processes.
We define various integrals of the DF as, where φ is the phase angle, x is the spatial coordinate, and ψq corresponds to each q. So for e.g. the volumetric number density n, total energy e, or kinetic energy , we have q = (n, e, ) with ψq = (1, E(p), T (p)), respectively, where E(p) = γ m c 2 and T (p) = E(p) − m c 2 refer to the total and kinetic energy of an individual CR particle of rest mass m. We will consider a single CR species: we can later reconstruct the total DF by summing over different species. The corresponding fluxes are, where the alignment with ±b follows immediately from our assumed gyrotropic DF. The CR pressure tensor P is defined as, where P0 ≡ 4π p 2 d p (p v/3) f is a scalar pressure and D is an Eddington-type tensor of trace unity (specified below). We also define the pitch-angle-averaging operations, pitch-angle moments of f , and DF-weighted pitch angle moments: fn ≡ µ n f µ,

DERIVATION OF THE CR TRANSPORT MOMENTS EQUATIONS
3.1 Ordering in O(u/c)

General Moments Equations
Let us first discuss the general case before considering to the specific isotropic and anisotropic limits. Begin from Eq. 1, and take the "0th moment" equation (average Eq. 1 over µ). Integrating by parts, O(kf0) ∼ O(kf2) (at least in the isotropic limit), we cannot drop one of these relative to the other. Next we have a large number of "adiabatic terms" O(∇βf1) ∼ O(kf1 u/c); but these are always O(u/c) smaller than the flux/focusing terms O(kf0), both in the free-streaming limit (where O(f1) ∼ O(f0)) by O(u/c), and in the isotropic limit by O[(f1/f0) (u/c)] O(u/c). 2 Next a similar set of terms appears in O(∇βf3), but since O(f3) O(f1) (or more formally sincef3 is bounded likef1 with |f3| ≤ |f0|) and we dropped the terms in O(∇βf1), we should drop the O(∇βf3) terms as well. Finally we have the acceleration terms O(f0 |a|/c 2 ); given the order of |a| noted above, we immediately see this is O(u 2 /c 2 ) smaller than the leading terms.
We can also obtain this hierarchy from the various integral equations. Multiplying Eq. 8 by 4π p 2 d p E(p) and Eq. 9 by 1 Note, even in the strong-coupling limit, if CR pressure dominates the forces on the gas, so P eff, gas → Pcr ∼ ecr ∼ γ ncr m c 2 , we have O(∇P eff, gas /ρ c 2 ) ∼ O(k Pcr/ρ c 2 ) ∼ O(k ncr/ngas), i.e. this scales as the ratio of the number of CRs to non-relativistic particles, which is also extremely small for any limits we consider where we could treat the gas in the MHD limit. 2 Like the analogous radiation-hydrodynamics case, it is important here that we began from the co-moving focused transport equation, sof 1 is comoving, and the dropped O(∇β u ) terms in the flux equations are those outside the operator Dt . Iff 1 were the "lab-frame" moment, leading-order O(∇β uf1 ) terms in Eq. 9 would appear outside the Eulerian derivatives ∂tf 1 . 4π p 2 d p E(p) v and integrating, we obtain the CR total energy and energy flux equations: These are directly analogous to the comoving equations of radiation hydrodynamics (RHD; Mihalas & Mihalas 1984, Eqs. 95.87-95.88), with each featuring the co-moving time-derivative term (Dt ), flux term (∇·(bF) orb·∇P), velocity-gradient terms (∝ ∇β u ), acceleration term (∝ a), and collisional/scattering terms. In RHD, it is wellestablished that in any relevant limit (free-streaming/unconfined, with ν → 0; or static/dynamical diffusion or strong-scattering, with Fe ∼ (c 2 /ν) ∇e; or advection, with Fe ∼ vstream e; whether the gas or relativistic particle pressure dominates a): (1) the acceleration terms are always smaller by O(u 2 /c 2 ) compared to the dominant terms; and (2) the velocity gradient ∇β u terms in the flux (f1) equation are smaller by O(u/c), but must be retained in the energy (f0) equation to recover the correct behavior in the strong-scattering limit.
If we now return to Eq. 8 and keep only leading-order terms in in O(u/c), we have (after some algebra to simplify):

Scattering Terms
Enormous controversy still surrounds the behavior of the CR scattering terms, and this is the focus of much of the CR literature (see e.g. Chandran 2000; Yan & Lazarian 2002;Yan & Lazarian 2004, 2008Zweibel 2013Zweibel , 2017Zank 2014;Bai et al. 2015Bai et al. , 2019Lazarian 2016;Holcomb & Spitkovsky 2019;van Marle et al. 2019). Our derivation here, on the other hand, is almost entirely focused on the collisionless CR transport terms (those outside ∂t f |coll). However to write down a sensible galactic CR transport equation, we must make some assumption about scattering. So we will briefly consider these, in an intentionally simplified manner. We begin from the usual quasi-linear theory (QLT) slab scalings (Schlickeiser 1989): where vA is the appropriate Alfvén speed and ν±(µ) are the scattering rates from forward and backward-propagating waves (Skilling 1975). Taking the appropriate moments and assuming O(vA) ∼ O(u) gives: Note we have definedν andvA for convenience, withν± representing the appropriate µ-averages. For completeness, the ∂t f |coll term should also include a term p −2 ∂p [p 2 Sf0] representing continuous external momentum loss/gain processes (e.g. radiative losses), and some j representing injection or catastrophic losses.

Focused Transport Equation to Leading Order
With § 3.1.1 in mind, we now return to the focused transport Eq. 1 to obtain a simplified form valid to O(u/c). First dropping just the (always higher-order) acceleration terms, after some tedious algebra we can write Eq. 1 as: Based on the arguments above in § 3.
where χ = (1 − µ 2 )/2, and ν±(µ) are a function of µ. We note that all expansions and discussion used to derive Eq. 16 rely only on our O(u/c) expansion, and the derivation can, if desired, be carried out without needing to first follow the moments expansion in our § 3.1.1.

The Close-to-Isotropic-DF Case
We now consider an example of a specific form for the CR DF that is nearly-isotropic in µ. The derivation here will closely follow Thomas & Pfrommer (2019), to whom we refer for more details.
By assumption, if f is close-to-isotropic in µ, it can be expanded in pitch angle moments as f (µ) ≈f0 + 3 µf1 + O(| f1| 2 /| f0| 2 1), which impliesf2 ≈f0/3 or µ 2 f = 1/3 (andf3 ≈ 3f1/5). With this assumption the pressure tensor becomes isotropic: P = P0 I (i.e. D = I/3) where P0 = d 3 p f p v/3 (= β 2 e/3 integrated in a narrow interval of p). Either directly using this form for f and taking the zeroth and first µ moment-averages of Eq. 1, or simply inserting the above for µ 2 f in Eqs. 8-9, we can immediately verify that these give consistent expressions, and the ordering in O(u/c) is the same as § 3.1.1. For the leading-order terms, we have: where ... denotes the dropped terms, and we write out µ 2 f (instead of inserting 1/3) for reference below. For the scattering terms, we obtain to leading order in O(u/c):Dµµ,µ =ν,Dµp = (pvA/v)ν,

The Maximally-Anisotropic-DF Case
Next, consider the opposite limit of the maximally anisotropic DF f (µ) =f0 δ(µ − µ0) -i.e. all CRs at a given (x, p, s, ...) have identical pitch angle, andfn = µ n f f 0 = µ n 0f0 . Our ordering above in O(u/v) is not sensitive to this, so keeping only the terms to leading order, the moments of Eq. 1 become, and (again being careful regarding µ commutation), where ... denotes the dropped terms of sub-leading order in O(u/c). If µ0 is independent of p (or we integrate over a narrow range of p), the pressure tensor is Pcr = 3 P0 D = Piso + Paniso with Defining the mean scattering coefficients so thatν± = ν±(µ = µ0) because f ∝ δ(µ − µ0), we obtain to leading O(u/c):Dµµ,µ =ν, Written this way, we verify an important connection to Eqs. 17-18: at this order, the equations differ only in the addition of terms with the pre-factor (3 µ 2 f − 1), which vanish identically with the nearly-isotropic-DF closure µ 2 f = 1/3. Likewise, the pressure tensor and these expressions for theD coefficients reduce to exactly their near-isotropic-DF values when µ 2 f = 1/3. Thus Eq. 11 or Eqs. 19-21 are valid in both the nearly-isotropic-DF and maximallyanisotropic-DF cases, for appropriate choice of µ 2 f .

General Expressions & Closure Relation
After some re-arrangement we can now write a series of expressions valid in both the nearly-isotropic-DF and maximally-anisotropic-DF limits: We have added the terms S, which represents continuous (e.g. radiative) losses, and j, which represents injection or catastrophic losses. We also define the operator G(q) and Eddington tensor D in terms of the variable χ: Various other choices are reviewed in Murchikova et al. (2017). We stress that while the closure relation Eq. 28 (or Eq. 29) is an approximation, Eqs. 22-33 are exact (to lowest order in u/c) for any DF, provided the "correct" µ 2 f andν. So one can easily imagine constructing more complicated or exact closure relations, analogous to "variable Eddington tensor" methods in RHD, to assign the correct values of µ 2 f .

CR Number & Energy Equations
We can now obtain equations for (q, Fq) by multiplying Eqs. 22-23 by 4π p 2 ψq d p and integrating. First, it is helpful to consider the equations integrated over an infinitesimal range of p, e.g. ∆n ≡ (dn/d p) ∆p. This gives: Dt F n + c 2 G β 2 n = −ν F n − 3 χvA n + S Fn , where n ≡ dn/d p = 4π p 2f 0, F n ≡ dFn/d p = 4π p 2 vf1, S n ≡ 4π p 2 j0, S Fn ≡ 4π p 2 v j1. For total energy e we have: Dt F e + c 2 G β 2 e = −ν F e − 3 χvA (e + P 0 ) + S Fe , Then for kinetic energy we obtain: It is useful to note the relations: i.e. the "effective adiabatic index" relating CR pressure and kinetic energy density is γeos = (4 + γ −1 )/3 at a given Lorentz factor γ.
One uses µ 1 f = Fq/q v to determine the closure values of µ 2 f or χ.
Note every term in the "macroscopic" equations for q has a simple interpretation and correspondence with a term in Eqs. 22-23 for f . The Dtf0,1 → Dt (q, Fq) term is the comoving conservative derivative; ∇ · (βf1b) → ∇ · (Fq) is the normal flux; D : ∇β u → P : ∇u is the "adiabatic" term (for µ 2 f = 1/3, P : ∇u → P0∇ · u) related in detail to the non-inertial frame (akin to the analogous RHD term); S and j represent loss/gain processes in number and momentum space (e.g. radiative/catastrophic losses, injection); βG(f0) → G(β 2 q) is the "flux of flux" (flux source) term; Dµµf1 →ν F the scattering term in the flux equation; Dµp ∂pf0 → χvA (q + ...) is the "streaming" term if the scattering is asymmetric; and the Dpµ and Dpp terms give rise to the gyro-resonant loss or diffusive reacceleration termsSsc (discussed below).

Spectrally-Integrated Expressions
Integrating Eqs. 30-33 over all CR momenta gives equations for the spectrally-integrated CR number and energy, for example: Although d p q = q is trivial, this immediately introduces practical difficulties in terms like d p G(β 2 q ) and d pν [F q − 3 χvA (q + ...)] in the flux, and d p [S sc − P : ∇u] in the energy equations. The issue is that even if we know the form ofν±(p), we cannot write these equations in terms of a single "effective" χ, ν,vA, β, γ, etc, because the "weights" (combination of p-dependent factors in the integrals) in each part of each term are different. Moreover, even if we specified an initial spectral shape (f0(p) andf1(p)) to calculate some effective values, the p-dependence would immediately alter the spectrum and change those values.
If one wishes to adopt the spectrally-integrated equations in practical applications, therefore, one must impose a universal (fixed) spectral shape. In that limit, the CR total energy is the meaningful quantity to evolve, since a "fixed-spectrum" CR number equation will not conserve energy or momentum. We can further simplify by noting that most of the total CR energy is in particles with β ≈ 1 and E(p) ∼ T (p), giving: Here with χe,v e A , andνe understood to be the appropriate "spectrallyaveraged" values. 4

The Gas Equations & Conservation
As discussed in Zweibel (2013Zweibel ( , 2017 and Thomas & Pfrommer (2019), the CRs can exchange momentum with the (non-relativistic) gas and magnetic fields 5 primarily via two effects: (1) scattering, and (2) Lorentz forces. If we note that the CR momentum density is d 3 p p f = (1/c 2 ) Fe (using p = E(p) v/c 2 ), then it is immediately clear how to account for (1): we simply add an equal-and-opposite momentum flux to the gas momentum equation to match the scattering (ν) term in Eq. 31, i.e. Dt (ρ u) + ..
Deriving the Lorentz term (2) requires re-visiting the CR momentum equation before gyro-averaging. In generality (making no assumption about the form of f ) for a non-relativistic background, the comoving Vlasov equation where ∇x,p denote gradients in position and momentum space, respectively, and F is the external force term.
in this frame. 6 Now, take the momentum density by multiplying by 4 For completeness, we note that the "0th moment" spectrally-integrated CR energy equation arises from Eq. 38 taking the strong-scattering (isotropic-DF, µ 2 f → 1/3), flux-steady-state (Dt Fe → 0) limit, so Fe → v e A (e + P 0 ) − (c 2 /νe)b · ∇P 0 . 5 Since we are working in the limit where the CR gyro radii are small, and obviously the non-relativistic ion+electron gyro radii are much smaller still, the MHD assumption that the non-relativistic ion gyro radii are vanishingly small compared to resolved scales is reasonable. 6 We neglect other exchange terms such as e.g. the gravity of the CRs, secondary transfer of momentum from scattering of beamed CR radiation, etc, as these are several orders-of-magnitude smaller.
p and integrating over d 3 p. Integrating by parts and using various identities, note: Now separate this into parallel and perpendicular components by projecting withbb and (I −bb), respectively. Becauseb · FLorentz = 0, the parallel equation becomes our Eq. 31 for (1/c 2 ) Dt Fe, multiplied byb. Since the terms on the left-hand side of this parallel equation represent free transport and relativistic corrections (coordinate-transformation terms), with no F term appearing, the scattering term represents the only parallel momentum exchange with the gas -i.e. we have re-derived the scattering term (1), which was derived more heuristically above from momentum-conservation arguments. Now consider the perpendicular component. Averaged over the "macroscopic" spatial/time scales ( macro, tmacro) much larger than the gyro radius/time (rg, Ωg), the first term Dt F e, ⊥ = d 3 p (I − bb) p f Ω must vanish, because there can be no coherent flux of CRs perpendicular to the field (more precisely, this term must be smaller than the dominant terms by O(rg/ macro)). The second term (the ∇x term) does not vanish, but gives: Lorentz represents the total Lorentz force per unit volume on CRs f cr Lorentz . The scattering term in the perpendicular direction The Lorentz force on CRs redirecting v requires an equal-and-opposite force on gas, 9 giving Dt (ρ u) This has a simple interpretation: spatial differences in the collisionless CR pressure tensor (non-zero ∇·P) source a net CR current (mean v or net flux Fe). The parallel momentum current isb Fe, which is resisted only by scattering (exchanging momentum with gas). The perpendicular current, on the other hand, is immediately redirected by Lorentz forces, exerting an equal-and-opposite force on the gas. The gas momentum equation becomes: 7 In this last step, we have used the fact that F ≈ F Lorentz can be written as 8 We define the parallel and perpendicular tensor divergence as ∇ · P ≡ bb · (∇ · P) and ∇ ⊥ · P ≡ (I −bb) · (∇ · P). 9 Equivalently, we can insert jcr in Ampere's law to obtain ∇ × B = (jgas + jcr)/c, and use this to calculate the "back-reaction" force = −f cr Lorentz on gas. 10 It may appear inconsistent with our assumption of a gyrotropic CR distribution elsewhere to show jcr × B /c ≈ ∇ ⊥ · P = 0, since for a perfectly gyrotropic distribution jcr × B = 0 exactly. Physically, one can think of this as the perpendicular CR pressure gradient inducing a very small non-gyrotropic perturbation to compensate. The fractional deviation from perfectly-gyrotropic orbits can be estimated as . So in all other expressions derived in this paper, this correction is sub-dominant by O(rg/ macro) and can be safely neglected. However in the back-reaction force on the gas, this term remains finite and leading order even as (rg/ macro) → 0. or where the ... refers to all the non-CR terms, and the sum and integral refer to the summation over all CR species & integration over all momenta. Notingb G(β 2 e ) = ∇ · P , it is often convenient to rewrite this as: This has the form of a hyperbolic pressure gradient term ∇ · P that can be included in a Riemann solver, plus a "source term" (the right hand side) which vanishes identically when the energy flux equation is in local steady-state.
In the total gas+radiation energy equation, the behavior is straightforward: the kinetic energy terms simply follow the momentum equation: Dt egas = ...u · Dt (ρ u) |cr (where Dt (ρ u) |cr collects the terms on the right-hand side of Eq. 40), and the ther-mal+magnetic+radiation terms see the source terms Dt egas+rad (42) Physically, the source/sink S e term corresponds to either energy lost to CR acceleration at injection, or thermalized or radiated away from various loss processes (thus determining how much goes into thermal vs. radiation energy). The kinetic terms reflect work done and, in flux steady state, behave like an adiabatic "PdV" term balancing the P : ∇u term in the CR energy equation. The scattering termSsc corresponds to energy loss/gain from scattering with micro-scale (gyro-resonant) magnetic fluctuations. By definition for the applications of interest, these are unresolved, and have rapid thermalization times, so this can be treated as part of the gas thermal/internal energy budget, although one could also evolve them explicitly as in e.g. Zweibel 2013; Thomas & Pfrommer 2019. As discussed at length in Mihalas & Mihalas (1984) in the RHD context and Thomas & Pfrommer (2019) for the CR limit, there are subtle ambiguities related to exact, separate energy and momentum conservation if we include the CR inertia at this order in O(u/c). These are related to the definition of frame, the consistency of other terms of higher O(u/c), and the fact that nonrelativistic MHD drops terms of higher order in O(u/c). For example, including the CR inertia, the momentum change includes terms Dt Fe/c 2 =b Dt Fe/c 2 + (Fe/c 2 ) Dtb, where the latter term be- smaller than all the retained terms in the flux equation. These could be added to maintain manifest conservation if desired, but are not well-posed, as they relate to higher-order terms dropped in both the CR and MHD equations. However, one can immediately verify that in the flux-steady-state or Newtonian (c → ∞) limits, as assumed in MHD, manifest conservation in the lab and comoving frames is recovered.

DF Equation in Finite-Volume Form
Although we have focused on developing the µ-moments equations, there may be occasions where one wishes to directly evolve the pitch-angle distribution, as in our exact solution cases below in § 5.1. This can be done explicitly by integrating on a phase-space grid that includes the µ dimension explicitly, similar to e.g. direct ray integration methods for RHD like those in e.g. Jiang et al. 2014. This is actually simpler for CRs as compared to RHD, because we retain the gyrotropic assumption so can still integrate out the φ dimension. For these applications, it is useful to take the focused transport equation in Eq. 16, which incorporates the scattering terms (Eq. 12) and carefully retains only leading-order terms in O(u/c). This can be conveniently written as: There is a one-to-one correspondence between each term in Eq. 43 and their pitch-angle-averaged equivalents inf0,f1 .
Eq. 43 is straightforward to implement numerically using standard finite-volume methods: the time-evolution Dt f of the comoving f can be operator split into three terms representing (1) translation/flux in position-space (the ∇ · (...) advection term) at fixed µ and p; (2) translation/flux in pitch-angle space (the ∂µ(...) terms) at fixed x and p; (3) translation/flux in rigidity/energy space (the ∂p(...) terms) at fixed x and µ. Each reduces to a finite-volume problem in the x, µ, p space, and (2)-(3) being local in position space allows them to be integrated efficiently; the major overhead is the higher dimensionality of the problem causing (potentially excessive) computation. For an example where e.g. the p terms are integrated in a finite-volume fashion in p-space, see Girichidis et al. (2020).

Equations for the Mean Evolution of a CR "Group"
It is instructive to consider the gyro-averaged evolution equations for the mean state of a CR "wave packet" or "group" with instantaneous state , t) in the general DF Eq. 43, and then multiplying Eq. 43 by U and integrating over x, µ, p, s to obtaiṅ U , the rate-of-change of the state vector along the path of the group. The "species equation" for s trivially evaluates to˙ s = 0, since we have not included explicit spallation or other specieschanging processes. The position equation is simply˙ x = u + µ v b , i.e. translation with the gas velocity and along the field. The pitch angle and momentum equations are non-trivial, however. For µ, we havė 11 It is also often useful to write Eq. 43 in terms of the one-dimensional DF such that dn = d p dµ f 1D as opposed to dn = d 3 p f = p 2 d p dµ dφ f defined above. This gives where δν ≡ ν+ − ν−, ν = ν( U ), and the ≈ makes the grey approximation for ν (which slightly changes the pre-factors but none of the behaviors). We can understand the physics of each term in Eq. 45: (1) The term ∝ v ∇ ·b is the "focusing" term, corresponding to the ∇ ·b terms in G(q) in the flux equations; (2) Dµµ ∂µ f → ν ∂µ f → ν µ is the normal scattering term (∼ ν Fq in the flux equations), which acts like a "drag" term on the mean µ -but note, because this an equation just for µ , the diffusive behavior (which would increase µ 2 if we started from a δ-function DF) does not appear here; gives rise to trans-Alfvénic CR streaming, appearing as the χvA q terms in the flux equations, and giving a mean µ →vA/ v , i.e. streaming at ∼vA, in the strong-scattering (ν → ∞) limit. For the momentum equation: where again ≈ indicates the grey approximation. Again, the terms can be understood as follows: (1) D : ∇u is the "adiabatic" term (immediately analogous to the term in the energy equations); (2) Dpµ → ν χvA ∂µ f → νvA µ is the streaming/gyroresonant loss term (∝ νvA Fe inSsc in the energy equations); If desired, these equations can be directly integrated as well, in Monte Carlo-type methods where each explicitly-evolved CR "super-particle" represents the gyro-averaged behavior of an ensemble of CRs with a δ-function DF, but this would require adding some stochastic scattering terms to capture the diffusive/secondderivative behavior (i.e. the change in µ 2 or non-δ-function behavior of f as it evolves away from an initial δ-function).

Setup & Closures Considered
We now consider some extremely-simplified test problems to illustrate how solutions of the CR transport equations differ depending on the closure. In that spirit, we take ultra-relativistic CRs (β → 1) in a gas medium with negligible fluid motion (u → 0),b =b(x) and ν =ν(x) independent of time, uniform ρ, with symmetric scattering and weak fields (vA → 0, vA → 0) no sources/sinks/other losses, and sufficiently low CR density such that the CR forces on gas are negligible (i.e. "pure CR transport"). We will make the problem dimensionless by defining f → f / fi (e → e/ei), τ →ν0 t, x → xν0/c, for some reference fi (or ei) andν0, and define the path-length integrated along a field line = x f x 0 dx·b (sob·∇X → ∂ X). With these simplifications, the equations are effectively one-dimensional in and are identical for any moments pair (q, Fq) = (f0,f1), (n, Fn), (e, Fe), etc: ∂τ q = −∇ · (Fqb) and ∂τ Fq + G(q) = −ν Fq.
We will compare the following closure assumptions. Except for the 0th-Moment/Diffusion and Exact Solution cases, all adopt the two-moment expansion, but make different assumptions about the closure assumption for µ 2 f orf2.
(iv) Maximal-Anisotropy: Assume the DF corresponds to a δfunction with the given µ 1 from Levermore (1984), which interpolates between the isotropic-DF and anisotropic-DF limits and represents the exact closure for any DF which can be made isotropic under some Lorentz transformation. Minerbo (1978), which similarly interpolates between limits but is exact for a DF satisfying a classical maximum-entropy principle.
(vii) Interpolated µ 2 f : Wilson: Wilson et al. (1975), which is realizable but represents an ad-hoc interpolation function between isotropic and anisotropic limits.
(viii) Exact Solution: We compare these to the results of directly integrating the focused CR transport equation for f (µ) explicitly as a function of µ and x per Eq. 16 ( § 4), using a grid of ∼ 1000 elements in the µ dimension at each spatial position. For the isotropic IC we initialize an isotropic DF f (µ), for the streaming IC we initialize f (µ) ∝ δ(µ−1), and for simplicity we assume isotropic scattering ν =ν.

1D Pure-Propagation in A Homogenous Medium
Takeb =ẑ = constant,ν =ν0 = constant, so the transport equations simplify to ∂τ q = −∂ Fq and ∂τ Fq + ∂ ( µ 2 f q) = −Fq. The problem is one-dimensional and the solutions depend only on the ICs and closure µ 2 f , which we vary and compare in Fig. 1. First (top-left panel), consider a case which is well-described by the isotropic, diffusive limit: ICs with σq = 10, Fq(τ = 0) = 0, evolved to τ = 200. The CRs begin isotropic, and, recalling that = 1 corresponds in these units to the scattering mean-free-path (MFP) = c/ν, all of the gradient length and time scales even in the ICs are much larger than the CR scattering MFP. Indeed, the 0th Moment (i), Isotropic-DF (ii), and all the interpolated closures (v)-(vii) give nearly identical results here in excellent agreement with the Exact solution (viii), as they should. The Maximal-Anisotropy closure (iv) fails catastrophically: it assumes an initial µ 1 f = 0 corresponds to a pitch angle distribution with all CRs at µ = 0, so no flux can ever develop. The Maximal-Streaming closure (iii) fails as well: although the flux equation approaches steady-state, the assumed µ 2 f = 1 means that the effective diffusion coefficient is 3× larger than the correct value.
Second (top-center panel), consider a case which is close to free-streaming, a "streaming" IC with σq = 0.02, Fq(τ = 0) = q, evolved to τ = 0.02, so the CRs are initially free-streaming and all scales are much shorter than the MFP. Now, the Maximally- Parameters : Figure 1. Idealized test problems from § 5 comparing different closure assumptions ( § 5.1) for the Boltzmann/Vlasov moments hierarchy vs. exact solutions. We simplify to "pure transport" problems in a stationary background where the ICs are specified by the initial pitch-angle DF ( µ 1 f = 0 corresponding to an isotropic DF f =f 0 , µ 1 f = 1 to a free-streaming DF with f =f 0 δ(µ − 1)), width of the (initially-Gaussian) CR number orf 0 ∝ exp {−( − 0 ) 2 /2 σ 2 q }, scattering coefficientν( ) and field divergence ∇ ·b. We plot the value of the µ-integrated DFf 0 or its moments (n, e) versus spatial coordinate along a field line , in units of scattering time 1/ν 0 and length c/ν 0 , at plotted time τ =ν 0 t. Exact solutions evolve the entire pitch-angle-resolved DF f (µ) explicitly. The "interpolated" closures evolve the first two CR µ-moments equations, differing in the exact form of µ 2 f = M 2 ( µ 1 f ) used to close the moments hierarchy; they give very similar results and qualitatively reproduce the exact solution behavior (albeit imperfectly) in all problems while retaining positive-definitef 0 . The "isotropic-DF," "maximal-streaming" and "maximal-anisotropy" closures adopt µ 2 f = 1/3, = 1, = µ 1 f 2 (appropriate for isotropic or free-streaming or δ-function DFs) respectively; these can give qualitatively incorrect behavior and produce solutions with negativef 0 (negative energy/particle number) in some circumstances. "0th-Moment/Diffusion" refers to the common diffusion closure at 0th order by assuming flux-steady-state and strong-scattering; this preserves positive-definite behavior but produces qualitatively wrong behaviors and super-luminal CR transport in many problems.

Anisotropic (iv), Maximal-Streaming (iii), and interpolated (v)-(vii) closures are very similar to the Exact solution (viii). 0th
Moment/Diffusion (i) fails catastrophically as expected, since the system is not in the diffusive limit. The Isotropic-DF closure (ii) underestimates the correct speed of propagation of the "pulse," as expected, 12 but more problematically we see that q (e.g.f0 or e or n) has become negative in some places. This is the formally correct solution if we impose µ 2 f = 1/3 -the issue stems from the fact that this closure violates the realizability constraint from § 3.4.1: there 12 Taking the derivative of the q equation in § 5.2 to combine it with the Fq equation, we have ∂ 2 τ Fq + ∂τ Fq = ∂ ( µ 2 f ∂ Fq). If we enforce the isotropic-DF µ 2 f = 1/3, then we see immediately that this reduces the maximum free-streaming speed from c to c/ √ 3.
exists no positive-definite DF with µ 1 f = 1 (imposed by the ICs) and µ 2 f = 1/3 everywhere. Third, consider two intermediate cases. For an isotropic IC with σq = 0.15 evolved to τ = 2 (top-right panel), the exact solution (for isotropic scattering; (viii)) is a symmetric flat-topped "shelf" moving outwards at speed intermediate between the isotropic and free-streaming cases, with diffusive "tails." 13 None of the closures perfectly reproduces this, but the interpolated closures (v)-(vii) are much closer to the exact solution and behave qualitatively similar to one another (and also rapidly converge to the exact solution as we evolve further in time). Maximal-Anisotropy (iv) again fails catastrophically as it cannot propagate starting from µ 1 f = 0. Despite the IC being isotropic, the 0th Moment/Diffusion approximation (i) also performs poorly (producing excessive "tails" and an incorrectly-peaked shape), as the strong-scattering/flux-steady-state assumption does not apply. Both the Isotropic-DF (ii) and Maximal-Streaming (iii), or any other closure with µ 2 f = constant, produce two spurious "peaks" which propagate outwards with a low central density in between.
For a streaming IC with σq = 0.1 evolved to τ = 1 (middleleft panel), the interpolated closures (v)-(vii) all resemble the exact solution (viii) (the peak propagates at the correct speed, with just a slightly modified shape). As expected the 0th-order/Diffusive closure (i) fails totally. The Isotropic-DF closure (ii) again produces an unphysically negativef0, and under-estimates the pulse speed. The Maximal-Streaming (iii) closure over-estimates the front speed but also produces an artifact of a "shelf" extending to < 0. Unlike the previous streaming IC, the Maximal-Anisotropy (iv) closure now also under-estimates the propagation speed, as assuming µ 2 f = µ 1 f 2 suppresses the flux source term too rapidly when µ 1 f is not very close to ±1.

1D Propagation With Variable Scattering
Now consider a spatially-variableν =ν0 g( ) (dimensionless equa- 2 /(2 σ 2 g )} with σg ∼ 0.1 − 10, qualitatively akin to analytic models for Galactic CR transport with representing the height in the Galactic disk/halo, with both an isotropic (middlecenter panel) and streaming (middle-right panel) IC. The effect here is primarily to exaggerate the differences already seen in § 5.2. Most notably, the 0th Moment/Diffusion approximation fails much more dramatically here, because ν → 0, causing the diffusivity κ → ∞ at | − 0| σg. This leads to the PDF becoming almost perfectly flat and the diffusive "tails" travelling at v c (e.g. at the times plotted, we obtain fronts moving at 10 6 c).
Next, consider g = exp {−2 ( − 0)} (bottom-left panel), where there is an asymmetric gradient across the injection region (akin to injection in any off-center location in a disk or galaxy). With the streaming IC (not shown) the differences between closures are similar to the case above. With an isotropic IC (bottomleft panel), the broken symmetry is important: at τ = 1, the exact solution predicts an asymmetric shelf from −0.5 − 0 0.8, with slightly higher density f at < 0 (as CRs are being scattered more rapidly at < 0). The constant-µ 2 f closures (ii), (iii) fail to capture this: they again produce two peaks but these move with nearly-symmetric speed, and actually predict much larger amplitude of the peak in the > 0 direction (the opposite of the correct behavior). The 0th Moment (i) case predicts essentially infinite transport speeds in the + direction. Interestingly, of the interpolated closures here the Wilson closure (vii) best captures the correct asymmetry, suggesting this test can distinguish between more subtle variations.

Propagation With Bent Fields in A Simple Geometry
Now consider a variant of the "diffusing ring" in a cylindrical field geometry, withν = constant andb =φ purely azimuthal about exact solution in this limit, and even a 0th-order closures can capture the relevant behavior provided careful numerical treatment (Sharma et al. 2010). f , µ 1 f (the "closure relation") is seemingly identical if we equateb with the specific intensity directionn, Lorentz forces confining CRs give rise to fundamentally different anisotropic transport confined to fields. The figure illustrates this in a problem with purely cylindrical fields (arrows show the local directionb, with an initial narrow Gaussian distribution of f (magenta circle) injected at some position (black circle shows the closed field line along which this appears), and distribution at a later time in blue. In the isotropic-DF strong-scattering limit (left) the CR equations here reduce to spatially-anisotropic diffusion (despite the DF being isotropic in µ) along the field line in both directions; in the RHD closure they reduce to globallyisotropic multi-dimensional diffusion. In the anisotropic-DF free-streaming limit ( µ 1 f = 1, initially; right) the CR closure reduces to free-streaming "around" the field lines, while the RHD closure produces straight-line trajectories. some axis. This is a useful problem to illustrate the differences between the closure relation (even for "pure transport" in the ultrarelativistic limit) for CRs, derived here, and the analogous M1 closure relation for photons (RHD), as discussed in § 6.3.

Comparison to the M1 RHD Closure
To illustrate the key behaviors, here we explore mathematically the intuitive idea that CR streaming and diffusion is confined along field lines (unlike RHD). This is also sketched in Fig. 2. Take the Newtonian limit (c → ∞) or flux steady-state Dt F → 0, so we have ∂τ q = −∇ · Fq with Fq = −K gcr where K ∼ c 2 /ν is some effective diffusivity and gcr ≡b G(q) =bb · [∇ · (D q)]. For q = e, this becomes gcr =bb · (∇ · P). Compare this to the RHD M1 closure, where the flux equation has the form Dt Frad + ∇ · Prad with Prad = Drad erad, where Drad = χrad I + (1 − 3 χrad)nn with χrad ≡ (1 − µ 2 rad )/2, identical to our definition for CRs if we identifyb =n (the radiation flux direction). In flux steady-state, this gives Frad = −Krad grad with grad = ∇ · Prad.
Thus, even in flux-steady-state with identical effective diffusivities, we see that although the anisotropic P and Prad are similar, Fcr fundamentally differs from Frad in that Fcr is projected alongb. This leads to major qualitative differences in behaviors in both isotropic-DF and streaming limits. First take the isotropic-DF ( µ 2 f → 1/3) case: gcr → (1/3)bb · ∇e = (1/3)φφ · ∇e and grad → (1/3) ∇e. So for CRs, even if the pitch-angle distribution is isotropic, we still have anisotropic diffusion with only parallel diffusion along the field lines allowed, owing to our assumption of a gyrotropic DF with small gyro radii. For RHD we obtain isotropic diffusion, and all information about the field lines is lost, because photons are not "confined" to field lines. Now consider the free-streaming limit: for CRs gCR →b {∇ · (eb)} while for RHD grad → ∇ · (enn) = ∇ · (ebb). Now the difference is less obvious, as the RHD case is still anisotropic. But the ordering here produces totally different behavior: gcr →b∇ · (eb) =φφ · ∇e corresponds again to transport around an azimuthal ring (followingb), 14 while grad → ∇ · (ebb) = (φ · ∇) (eφ) =φφ · ∇e + e (φ · ∇)φ = φφ · ∇e − (e/r)r produces a radially-propagating flux. Notably, while gCR saturates once e → e(r) becomes azimuthally-symmetric, the RHD solution in this limit actually corresponds to a ring which expands outwards at speed ∼ K/r (see e.g. Hopkins 2017), because in the free-streaming limit there is nothing to "bend" the photon trajectories.

Behavior of the CR Closures
Returning to the two-moment CR equations, noting forb =φ that ∇ ·φ = 0, so ∇ · (qb) =φ · ∇q = ∂ q, we can write ∂τ q = −∂ Fq, ∂τ Fq + ∂ ( µ 2 f q) = −Fq. But this is exactly identical to the equations withb = constant in § 5.2, written in terms of the distance along the field line (so we have already shown the effects of different closures in Fig. 1). The only difference is (1) that this line is globally curved, but that can simply be considered an embedding/coordinate transformation; and (2) the circular nature ofφ means that the boundaries for e, f are periodic, whereas in § 5.2 we implicitly considered open boundaries. In these simplified cases with u = 0, time-invariant background, ∇ ·b = 0, etc., any field geometry can be transformed into an equivalent 1D problem since CRs are confined alongb. The physical assumption that drives this behavior, fundamentally, is that the gyro radii of the CRs are much smaller than the radius of curvature ofb smoothed on the scales of interest.
With an isotropic IC, we see that the isotropic-DF (ii) and Maximal-Anisotropy (iv) cases fail completely to capture the correct anisotropy: in the Fq equation, an isotropic-DF closure exactly eliminates the focusing term, and the maximal-anisotropy case produces propagation opposite the exact solution (viii). Meanwhile, Maximal-Streaming (iii) strongly over-estimates the anisotropy. The interpolated closures (v)-(vii) at least capture the key qualitative behaviors.
With the streaming IC, the interpolated closures (v)-(vii) are nearly identical and all behave qualitatively akin to the exact solution (viii). Both constant-µ 2 f (isotropic or anisotropic) (ii), (iii), and the Maximally-Anisotropic (iv) cases produce negative DFs. 15 We also see the 0th Moment (i) closure fail in a new manner: this closure cannot correctly treat the focusing term. For anisotropic diffusion Fq ∝ −bb · ∇q, as required for realistic CR dynamics, a nonzero ∇ ·b still appears as a source term in the q equation, but the flux closure assumption (i) means the focusing term in the flux is not included. The result is that the front for (i) actually propagates in the opposite direction to that of the correct solution.

Summary
Just like the analogous RHD case, no two-moment closure can capture the exact behavior of full phase-space solutions for f (µ). However, the interpolated closures (v)-(vii) at least capture the qualitative behaviors of all terms in all test problems considered here. Constant-µ 2 f closures like assuming a near-isotropic-DF (ii) or a free-streaming-DF (iii) or a maximally-anisotropic (δ-function) DF (iv) fail catastrophically on some problems and, most crucially, fail to ensure non-negative solutions for f orf0 (e.g. CR number and energy density). While taking the 0th-Moment/Diffusion limit (i) does ensure positive-definite solutions, it fails catastrophically in other ways: it drives CR transport in the incorrect direction in situations with strong focusing, streaming, or scattering-rate-gradients, and it produces super-luminal transport.
Among the interpolated closures, the Levermore and Minerbo closures (v)-(vi) produce very similar results (not surprising since they give nearly-identical µ 2 f ( µ 1 f ) functions). The Wilson closure (vii) performs slightly more accurately with isotropic ICs, though it sometimes slightly under-estimates peak-amplitude in free-streaming ICs, which is expected as it gives µ 2 f slightly closer to the isotropic-DF µ 2 f = 1/3 at intermediate µ 1 f . Of course, real problems will be vastly more complex, with advection velocities u comparable to CR transport speeds, spatialand-time variable versions of all quantities above,ν dependent on µ as well as space and time, etc. We emphasize that many of the most important consequences of the proposed closures may only be evident in those scenarios. For example, if the "adiabatic" terms ∝ (χ I + [1 − 3 χ]bb) : ∇u, gyro-resonant losses ∝vA Fe, diffusive re-acceleration gains ∝ 3 χ v 2 A (e + P0), trans-Alfvénic or CR "streaming" speed ∝ 3 χvA are important, these depend quite strongly on χ and therefore on the closure (with re-acceleration and Alfvénic streaming behaviors vanishing entirely in the anisotropic limit). Likewise, simulations where the CR forces on gas are important will be sensitive to the closure relation because the shape and anisotropic form of P depend explicitly on the closure relation.  Jiang & Oh (2018) were heuristically motivated by two-moment treatments of RHD but the authors did not attempt to derive a set of equations consistent with the actual DF equation for CRs (nor appropriate closure, etc).
Thomas & Pfrommer (2019) (here TP) did attempt such a derivation for the nearly-isotropic-DF case, and indeed § 3.2 mostly follows their more detailed and comprehensive discussion. It is therefore worth noting how the work here extends their formulation. The major differences here are: (1) We derive moments equations for the DF f itself as well as integrals like CR number/total energy/kinetic energy n, e, , while TP primarily focused on just e.
(3) We develop the equations for the entire CR spectrum f (p) or e (p), while TP focused on the spectrally-integrated expressions. (4) Our equations are agnostic to the specific scattering model (this physics is not our focus), while TP focused in detail on deriving specific expressions forν± due to CR scattering from Alfvén waves within the context of CR self-confinement scenarios. (5) Most importantly, TP focused exclusively on the nearly-isotropic-DF case and enforced the strongscattering closure µ 2 f = 1/3; we derive a more general set of expressions that allow for anisotropic DFs and CR pressure, and can approximately capture the CR free-streaming limit.
Most earlier CR transport models in galaxy simulations adopted a "zeroth-moment" or pure-diffusion approximation, evolving e.g. the spectrally-integrated e with Fe = κ ∇P0. The anisotropic version of this, with κ = κ bb , of course arises if we take the isotropic-DF, strong-scattering, Newtonian (c → ∞, so flux-steady-state always applies) limit. Although simpler, this can give a number of unphysical behaviors, as discussed above. This can be mitigated by adopting a flux-limited-diffusion-type approximation, replacing κ bb · ∇e → φlim κ bb · ∇e with φlim ≡ MIN[1, β e c/|κ bb · ∇e|], but as we have shown, there are qualitative phenomena this closure still fails to capture.

Relation to the Isotropic FP Equation
By far the most popular form of the CR transport equations adopted in Galactic models of CR transport that do not attempt to explicitly follow galactic dynamics -e.g. GALPROP (Strong & Moskalenko 2001) or DRAGON (Evoli et al. 2017) -is the isotropic Fokker-Planck equation: If fluid velocities are included (these are often dropped), they are taken to add the terms −u · ∇ f + (1/3) (∇ · u) p ∂p f to the righthand side of Eq. 47. This equation arises from our Eqs. 22-23, if we make the following assumptions: (1) assume an isotropic-DF closure, so µ 2 f → 1/3, D → I/3, G(q) →b · ∇q/3, etc.; (2) assume the Newtonian limit (c → ∞) or the infinite-strong-scattering (ν → ∞) limit in the CR flux or first µ-momentf1 equation (Eq. 23), so that the CR flux reaches its local equilibrium value instantaneously, with Dtf1 → 0; (3) assume that the scattering is also exactly isotropic with respect to pitch angle, so thatν+ =ν− (to O(u/c)) andvA → 0; this causes the Dµp and Dpµ terms to vanish; (4) take the resulting anisotropic spatial diffusion term ∇ · (βbf1) → ∇ · (D bb · ∇ f0) withD = (β c) 2 /(3ν), and assume that the magnetic field directionb is isotropically random or "tangled" on scales of the mean free path (below some averaging scale), allowing it to be approximated as an isotropic diffusion ∇ · (Dxx ∇f0) withDxx ≡D /3 (which produces the commonly-assumed relation for this limit Dxx Dpp = p 2 v 2 A /9); and (5) drop the terms involving the fluid velocities u (sometimes called "convective" terms).
The major limitations of Eq. 47 are therefore that it cannot capture anisotropy in the DF f (µ), anisotropy in the scattering rates ν±(µ), or anisotropy in the field geometryb (each of which is independent). It also cannot correctly describe the free-streaming/weak-scattering or out-of-flux-equilibrium limit (e.g. Dt F = 0, relevant just after injection, or whenb changes direction rapidly, or when ν varies spatially or temporally). Finally, depending on the form adopted, it ignores or treats less accurately the fluid velocity and comoving-vs-inertial frame terms.

Relation to the M1 RHD Equations
Our derivation of the CR moment equations & closure from the focused transport equation closely parallels the derivation of the radiation moments and M1 closure from the specific intensity equation in e.g. Levermore (1984); Mihalas & Mihalas (1984) and others, and indeed there are many similarities. However there are some important differences. The physics, of course, is completely distinct, and the detailed form of the scattering and collisional/loss terms totally different. Most obviously, radiation is always in the ultrarelativistic limit, so properties like β → 1 and → e are always satisfied in RHD. Nonetheless, even for "free" transport of ultrarelativistic CRs, important differences arise from two key effects: (1) the CRs are gyrotropic and feel Lorentz forces, and there is a scale hierarchy imposed by the assumption that the gyro radius is much smaller than resolved scales; (2) the "preferred direction" isb (not the solid angle vectorn in RHD), which can change direction and responds to the gas physics.
As a result, a number of terms appear which do not have an RHD analog, including (1) theSsc terms andvA terms that introduce the Alfvén frame; (2) the perpendicular pressure forces in the gas hydro equation (which relate to Lorentz forces and therefore do not vanish even with weak parallel scattering), and (3) various geometric terms that alter the directions of key transport behaviors. For the latter, mathematically we see that the non-commutation ofb and ∇ results in the flux equation having the formb Dt F instead of Dt F. Terms such as G(q) = (1 − 2 χ)b · ∇q + (1 − 3 χ) q ∇ ·b have fundamentally non-hyperbolic components and do not have the same form as their RHD analog, which can be written Dt F = −∇ · P + .... We could only do this ifb and χ were uniform everywhere. The consequences of this are plainly illustrated in § 5.4.1 -it produces qualitatively different behaviors.
Like the M1 case in RHD, there are still cases where our "interpolated" closure (Eq. 28) fails. For example, it cannot capture the "intersecting rays" problem, where µ 1 f = 0 not because of an isotropic distribution (as the proposed closure in Eq. 28 assumes), but because f (µ) = (1/2)f0 (δ(µ − 1) + δ(µ + 1)). Ifν → 0, the closures predict that two free-streaming rays will "collide" and then diffuse out, rather than pass one another truly collisionlessly. More complicated closure schemes for µ 2 f can be devised to address this. It is less clear, however, whether this is as much a problem for CRs as for radiation, since the CRs are not truly collisionless "test particles" as they stream, in the way photons are. In fact, in this particular situation the CRs would be unstable to two-stream instabilities, so "collide then diffuse" may indeed be a more accurate description of their true dynamics. Fully kinetic CR models that do not assume even CR gyrotropy (as assumed from the start of our derivations) are needed to properly address such physics.
Related to this, an important physical difference is that the M1-RHD closure imposes the assumption that the DF is symmetric about the flux directionF ad-hoc, without any particular physical motivation. This can be violated rather severely on all spatial scales, e.g. if rays intersect at oblique angles. Here, the gyrotropic CR assumption is much more well-motivated, and has a well-defined scale length (the gyro scale) providing a formal scale-separation hierarchy.

Hybrid Schemes & a Note on the "Gyro-Resonant Loss" & "Re-Acceleration" Terms
Recently hybrid schemes have been proposed that evolve f (x, p) in large-scale simulations by directly evolvingf0(p | x) in momentumspace at each cell position x, while using a zeroth or first-moment expansion scheme for the spatial terms (e.g. Girichidis et al. 2020).
These are straightforward to generalize to the methods here, by evolvingf0,f1 according to Eqs. 22-23. In these approaches, the equations for q orf0 can be operator-split into a hyperbolic spatial transport step Dtf0 + ∇ · (vf1b) = 0 and a momentum-space step where all the source and sink terms (including e.g. the "adiabatic" term P : ∇u,Ssc, and Sq) are evolved following Eq. 22. In this spirit, recall from § 4.2 that we can derive from the momentum-space translation/diffusion terms (including the adiabatic and p −2 ∂p p 2 (Dpµf1 +Dpp∂pf0) terms) a mean rate-of-changė p of the CR momentum or energy (of a CR "group" with the same initial p; see Eq. 46). Pitch-angle averaging Eq. 46,ing If we take q = e , and use various identities in § 3.4.2 to replace β, we can rewrite this as: The first (adiabatic) term immediately reduces to the familiar˙ p = −(1/3) (∇ · u) p expression if we assume an isotropic-DF closure.
The second (scattering) term closely resemblesS sc , and indeed in the ultra-relativistic limit where E ∝ p (and P 0 = e /3) it becomes exactlyS sc /e (i.e. the rate of change of energy and momentum become identical). In this term the first (∝vA F) part stems from Dpµ, while the second (∝ v 2 A e) stems from Dpp. The "..." term refers to other collisional terms (e.g. radiative losses).
In any case, the preceding discussion makes it clear that our derived scalings include both the "gyro-resonant" or "streaming" 16 As discussed in Hopkins et al. (2020b), if one somehow did haveν ∼ν − on micro-scales, the timescale for theν ± to come into the equilibrium state withv A → v AF e ·b is much smaller than resolved timescales in galaxy-scale simulations. 17 In these studies the CRs were taken to be ultra-relativistic so the gyroresonant losses simply become −v A |b · ∇P 0 |/3 P 0 . loss and "turbulent/diffusive reacceleration" terms, in a more general form.

Where and When Are These Differences Most
Important?
It is helpful to ask "under what conditions will the predictions from the more accurate expressions herein differ most dramatically from the predictions of simpler, less-accurate (e.g. isotropic Fokker-Plank, zeroth-moment/diffusion, or isotropic-DF) CR transport expressions?" Examination of the relevant equations and our tests in Fig. 1 suggest this will typically be most important when the CR scattering mean free time (∼ν −1 ) or path ( MFP ∼ c/ν, since we must consider the full range of µ) become larger than some other scales of interest or relevance for CR transport (e.g. the gradient scale-lengths forb, b ≡ |b|/|∇ ·b|, orν, ν ≡ν/|∇ν|, or background quantities such as the gas density or pressure if CR-gas interactions are of interest). As shown in Fig. 1, this is true even if the CR DF is close-to-isotropic. And although the scattering timē ν −1 is generally short, the scattering length can be quite large: if we take state-of-the-art empirical estimates ofν in the Solar neighborhood/LISM (e.g. Evoli et al. 2017;Amato & Blasi 2018;Chan et al. 2019;Hopkins et al. 2020b;de la Torre Luque et al. 2021, convert-ing from an isotropic diffusivity toν), we obtain MFP ∼ 10 pc R 0.5 GV , where RGV is the CR rigidity in GV.
In phenomenological models whereν is constant, ν → ∞ by definition, so the effects of the expressions here will generally be more modest. However, for ∼ 1 − 10 GV CRs, b (essentially the Alfvén scale of ISM turbulence) can be comparable to MFP, and for 10 GV CRs, MFP can begin to exceed the Galactic disk scale-height. So propagation models over these scales, especially for high-energy CRs and/or models where the CR-gas coupling is important (e.g. models of CR-driven winds where the "launching" occurs from the disk) could be sensitive to the more detailed CR transport expressions here.
Much more dramatically, in physically-motivated models where the scattering rates ν are set by some competition between damping and driving either by gyro-resonant instabilities (selfconfinement models) or extrinsic turbulence, thenν can be a strong function of quantities such as the neutral fraction or gas temperature or local Mach numbers (see e.g. Yan & Lazarian 2004;Zweibel 2017, or the review in Hopkins et al. 2020b, which can vary on vastly smaller scales (the skin depth of phase transitions or shock widths, orders-of-magnitude smaller than MFP). These rapid changes can be tightly associated with phenomena such as CR "bottlenecks" (as CRs propagate across phase transitions) or the CR "staircase" which arises in self-confinement models of CR-driven outflows, all of which have been the subject of considerable recent study using variations of the simpler CR transport expressions that may not accurately represent the exact solutions in this regime (e.g. Bustard & Zweibel 2020;Winner et al. 2020;Huang & Davis 2021;Quataert et al. 2021;Hin Navin Tsung et al. 2021). In these regimes, the bulk CR behavior could differ substantially with the more accurate expressions proposed herein ( § 3.4.1).
Finally, if ν(µ) itself is strongly anisotropic, then an approach which evolves the pitch-angle DF, as in § 4.1, becomes crucial to obtaining accurate results.

THE REDUCED-SPEED-OF-LIGHT (RSOL) APPROXIMATION
Explicitly integrating Eqs. 22-33 imposes a Courant-type timestep limiter ∆t ≤ C ∆x/c in Lagrangian codes (or ∆t ≤ C ∆x/(c + u) in Eulerian codes). While this is generally less onerous at high resolution than the quadratic condition imposed by "pure diffusion" or "zeroth moment" schemes (where ∂t f ∝ κ ∇ 2 f , imposing ∆t ≤ C ∆x 2 /κ), it is still often numerically prohibitive because c is much faster than any other signal speed in the problem. By analogy to RHD, we can therefore adopt a "reduced speed of light" or RSOL approximation, as in many previous CR studies (Jiang & Oh 2018;Su et al. 2019Su et al. , 2020Ji et al. 2020b;Chan et al. 2019;Hopkins et al. 2020d,a,b,c;Buck et al. 2020). However, in those studies, the CR transport equations were developed ad-hoc, as described above.
Here we develop two viable RSOL formulations, and describe the terms where additional corrections are needed.

Alternative (Viable) Formulations
Per the preceding derivations, we can generically write the spatial transport terms in the CR moment equations for (f0,f1) 18 or (q, Fq) with q = (n, e, ) for some species and energy interval as: (we collect all of the non-transport terms such as scattering and sources/sinks in S eff ). When using the RSOL approximation, it is important to be careful which values of c are replaced with the RSOLc. We wrote these equations in the form c −1 Dt q = ... because then (just like in radiation hydrodynamics; see Skinner & Ostriker 2013, and references therein) the RSOL replaces only the value[s] of c associated with the Dt term. 19 There are then two choices of viable scheme, first: or alternatively The formulation in Eq. 50 is exactly equivalent to replacing c −1 Dt f →c −1 Dt f in the original focused transport Eq. 1, 20 then following our derivations identically. It is also the more common scheme in RHD. The formulation in Eq. 51 associatesc only with the flux equation, instead, and introduces the function Ψ ≡ MIN[1, |Fq|/Ftrue] with Ftrue ≈ MIN[q β c, |Fq(c → ∞)|], as justified below. 21 These share the most important features: (1) the maximum signal speed for free-streaming is reduced to βc, meaning that the stable Courant timestep condition becomes ∆t ∝ ∆x/(βc), allowing much larger timesteps (the reason to introduce the RSOL); (2) both exactly recover the true Eq. 49 asc → c; (3) both converge exactly 18 Note Eqs. 22-23 can be written c −1 Dtf 0 + ∇ · (βf 1b ) = (...), c −1 Dt (βf 1 )+β 2 G(f 0 ) = β (...), matching the form in Eq. 49 for (q, Fq) = (f 0 , vf 1 ). 19 Because our moments are defined in the comoving frame, we associatẽ c with Dt , as opposed to ∂t , which is more appropriate when the salient quantities are defined in the lab frame. 20 Consider the free-streaming limit of the focused transport Eq. 1, with negligible scattering in a homogeneous medium: c −1 Dt f + ∇ · (µ β fb) = 0. This is pure advection with v = β µ c; taking c →c correspondingly reduces the maximum bulk/free-streaming advection speed from β c to βc. to the true (c = c) solutions for q, Fq, S eff q , in local steady-state (when Dt → 0).

Out of Equilibrium Behaviors and Timescales
The differences between the schemes come whenc c out of steady-state. Define Γ ≡ c/c and consider some key timescales: the flux-convergence timescale ∆tF, the loss/injection timescale ∆t in/loss , and the CR transport/escape timescale ∆tesc. First assume S eff F is dominated by a scattering term ∼ −ν F/c 2 : with Γ = 1 (Eq. 49), the flux equation should converge to steady state (Dt → 0) on a scattering time ∆t true F ∼ ν −1 . For Eq. 50, ∆t F ∼ Γ 2 ν −1 ∼ Γ 2 ∆t true F . Now assume in the number/energy equation S eff q ∼ ±q/(c τ ), for some loss or production/injection processes. These processes reach equilibrium in ∆t true in/loss ∼ τ for Eq. 49. For Eq. 50,∆t (50) in/loss ∼ Γ ∆t true in/loss , and for Eq. 51 ∆t (51) in/loss ∼ Ψ −1 ∆t true in/loss . The CR transport/escape time ∆tesc ∼ L/veff to some distance L is given by the effective transport speed veff (writing Dt q + ∇(veff q) = ...): for Eq. 50, ∆t  (51) esc ∼ L/c ∼ Γ ∆t true esc . The quantities of interest in CR models -e.g. CR number densities of a given species at a given energy, primary-to-secondary or radioactive-to-stable ratios, etc. -are set by the appropriate ratios of injection/loss/escape timescales (for a given galactic background). Since injection and non-transport (e.g. collisional) losses scale together in ∆t in/loss in both Eq. 50 & Eq. 51, their ratio (and therefore scalings that depend on balancing injection and non-escape losses) is insensitive toc. For Eq. 50, in all limits, the ratio ∆t in/loss /∆tesc is also equal to its "true" (c = c) value, as both scale identically with Γ. For Eq. 51, however, this is only true if Ψ → 1 in case (a) and Ψ → Γ −1 (or more generically Ψ → F/F true ) in case (b).

(Dis)Advantages of Each Formulation
This leads us to the major (dis)advantages of each method. The formulation of Eq. 50 "uniformly" slows down CR transport: it is essentially equivalent to a uniform rescaling of time, as seen by the CRs, by a factorc/c. This has the advantage that although the time ∆t to reach equilibrium in q and Fq is increased, in both the free-streaming and confined limits (equivalent to the optically thin and thick limits in the RHD literature where these were first derived), the system reaches the "correct" number/energy density and losses/production at the same distance ∆x from any source. Also, the flux equation converges more rapidly than Eq. 51 (∆t transport can potentially become so long, for computationally tractable RSOL valuesc, that the system never actually reaches that ∆x or steady-state. This is most acute in the circum/inter-galactic medium (CGM/IGM) around galaxies, where many have argued CRs may be most important (Booth et al. 2013;Wiener et al. 2013a;Butsky & Quinn 2018;Butsky et al. 2020;Hopkins et al. 2020a;Ji et al. 2020b,a). Consider that even for rapid diffusion (diffusivity κ ∼ κ30 10 30 cm 2 s −1 ), at L ∼ L30 30 kpc from a galaxy ∆t (50) esc ∼ Γ L 2 /κ ∼ 100 Gyr (c/1000 km s −1 ) −1 L 2 30 κ −1 30 . In other words, we requirec 10 4 km s −1 for the CRs to "reach" the CGM in less than a Hubble time in the formulation of Eq. 50. Similarly, we need very largec to ensure ∆t (50) in/loss is not much longer than galaxy dynamical times (which would risk converging to the wrong equilibrium).
The formulation of Eq. 51 avoids this, by converging in the number/energy (loss and transport) equations much more rapidly (on the "correct" timescale, independent ofc, on large scales). It converges in the flux equation more slowly, but this is still rapid in absolute terms, as e.g. ∆t (51) F ∼ 3 Myr κ30 (c/1000 cm 2 s −1 ) −2 . The problem with Eq. 51 is that we can find ourselves in case (b), and potentially in the sub-case where ∆t (51) F is larger than one of ∆t true in/loss or ∆t true esc -the limit where capturing the correct behavior withc c requires including the Ψ term with Ψ → F/F true . Motivated by the above and treatments of the flux-limiter in fluxlimited RHD with an RSOL, we therefore suggest the interpolation function Ψ = MIN[1, |Fq|/Ftrue], where Ftrue = MIN[e β c, |vA (e + P 0 ) + κ ∇ e |] for q = e (or Ftrue = MIN[n β c, |vA n + κ ∇ n |] for q = n , etc.) is given by the value the flux would have in local steady-state (Dt Fq → 0) forc = c at the given energy. This ensures the correct behavior in both asymptotic limits discussed in § 7.2.
With this definition, one can verify that both formulations in Eq. 50 & 51 converge to identical solutions asc increases. One would expect from the above that in the dense ISM, the formulation of Eq. 50 converges somewhat faster with respect toc/c (i.e. one can obtain converged solutions with lowerc, hence lower computational expense). But for the reasons above, in the CGM, the formulation of Eq. 51 converges at much lower values ofc. Eq. 51 therefore has advantages for applications in, e.g. cosmological galaxy formation simulations, while the formulation in Eq. 50 potentially advantageous for transport around sources or in the ISM within galaxies.

Which Speed of Light Enters the Closure Relation?
Recall that for the closure relation Eq. 28 that we proposed to estimate µ 2 f , we used µ 1 f =f1/f0 = Fq/(β q c). For the formulation in Eq. 50, the "actual" flux of q is (c/c) Fq, so Fq retains its usual meaning -free streaming will still have Fq = β q c, so we can use this relation in unmodified form, µ 1 f =f1/f0 = Fq/(β q c) (provided we follow all the definitions above). For the formulation in Eq. 51, we need to be more careful: Fq saturates at ∼ qc, but this can occur even if the system approaches a near-isotropic DF, for sufficiently-large diffusivity. So in the closure relation, we require a function similar to the Ψ term above; for example, taking Fq/(β q c) → Fq/MAX[β qc, |Fq(c → ∞)|].

Rigidity-Dependent RSOL
Finally, we note that although the arguments above assumec is constant in space and time, they do not requirec be the same for different CR species or energies. In calculations that evolve a set of CR species of energies binned in rigidity, for example, one can adopt ã c that increases for the highest-rigidity CRs (for example, asc =c0 for R < 1 GV, andc0 (R/GV) at larger values). Larger-rigidity CRs have larger κ (e.g. larger ∆tF) so require largerc to converge. By sub-cycling the CR equations for the highest-rigidity values, faster convergence may be possible.

Appearance in the Gas+Radiation (Momentum+Energy) Equations & Conservation
Just like with RHD (see e.g. Skinner & Ostriker 2013), it is important that the RSOL appear only in the dynamical equations for the CRs, not in the terms that couple to the gas that are written in terms of physical quantities. Otherwise certain terms, like the parallel forces or CR thermal heating rates, would not, in fact, converge to equilibrium when Dt → 0 and would be severely incorrect. Thus, for example, the form of the gas momentum Eq. 40 as written remains identical. Likewise the gas heating terms have their "normal" values with respect to e, etc. One consequence of this, again identical to RHD, is that the formally conserved quantities with an RSOL are not total energy (Eother + Ecr) and momentum (Pother + c −2 Fcr). Instead, for the formulation in Eq. 50, they are (Eother + (c/c) Ecr) and (Pother + (cc) −1 Fcr), while for the formulation in Eq. 51, they are (Eother + Ecr) and (Pother +c −2 Fcr). This is important to note but introduces no conceptual difficulty, provided the definitions above are used.

SUMMARY
Beginning from the focused CR transport equation allowing for an arbitrary pitch-angle distribution, we have derived and tested a consistent set of moments equations for CR-MHD applications, analogous to widely used closures for RHD. We present equations for either e.g. the first two pitch-angle moments of the DF f ( f µ, µ f µ), or corresponding integrated pairs like CR number density and its flux (n, Fn), total CR energy and flux (e, Fe), or CR kinetic energy and its flux ( , F ). We present two different schemes to integrate these explicitly in simulations with a RSOL approximation, discuss their relative convergence properties and merits, and note some important terms missing from previous CR-RSOL implementations. The derived equations are summarized in Appendix A.
Our equations are valid for all relevant CR β = v/c (not just the ultra-relativistic limit), and do not impose any assumption about the slope or form of f (p). Unlike the Fokker-Planck or pure diffusion+streaming (zeroth-moment) formulations of the CR transport equations, the expressions here can handle both free-streaming/weak-coupling (arbitrarily large mean-freepath) and strong-scattering (static or dynamic diffusion or advective) limits, for both near-isotropic and arbitrarily anisotropic DFs, anisotropic forward/backward scattering, and anisotropic magnetic fields/global transport. The expressions are accurate to leading order in O(u/c) in all limits. The key assumptions are: (1) that the background fluid is non-relativistic, |u| c; and (2) the CRs have a gyrotropic DF, with gyro radii much smaller than resolved scales.
It is easy to imagine extending this even further to include more complicated "variable Eddington tensor" formulations akin to RHD (representing arbitrary CR DFs), although the gyrotropic nature of CRs removes some of the ambiguities associated with RHD formulations. In this spirit we also present the relevant gyro-averaged equations for direct finite-volume phase-space integration of the pitch-angle distribution (following f (x, p, µ, t, ...) explicitly on a grid of x, µ, p), as there may be cases where the different formulations are beneficial.
Finally, it is worth commenting on a major practical difference between RHD and CR-MHD applications: in many astrophysical RHD applications, the collisional/scattering terms (absorption and scattering coefficients) are reasonably well understood, and much of the debate in the literature has centered on methods to accurately handle the actual radiation transport. In contrast, in CR-MHD, the scattering terms -and, as a consequence, the diffusion/streaming coefficients -are enormously uncertain. This is true even of their qualitative form and dimensional scalings. Different state-of-the-art models for CR scattering rates ν differ by several orders of magnitude and often predict opposite dependence on properties like magnetic field or turbulence strength (see the review in Hopkins et al. 2020b). Real progress in predictions will require a better understanding of the form of the CR scattering rates, their dependence on pitch angle and local plasma/ISM properties, and developing new diagnostics to compare models to observations. Nonetheless, the hope is that the calculations in this paper can aid in reducing some of the better-understood uncertainties in CR transport. And we argue in § 6.5 that there are many physically important situations, especially those which involve rapidly-varying CR scattering rates and/or CR "bottlenecks," where the more accurate form of the equations herein may predict significantly different behaviors compared to more simplified and less-accurate expressions. Further, in numerical applications where an RSOL is adopted, it is crucial to adopt treatments that can correctly interpolate between different limits. Finally, the basic principles of the closure structure proposed here can be used to include additional information about scattering coefficients in the CR-moment framework. For example, if one wished to model a scattering rateν =ν[ f (µ)] ≈ν( µ 1 f , µ 2 f , ...) that is a function of the CR pitch-angle distribution, the structure herein provides a well-defined way to retain and estimate some (though certainly not all) of this physics without having to evolve the entire pitch angle distribution function at each momentum and position.
ACKNOWLEDGMENTS Support for PFH was provided by NSF Research Grants 1911233 & 20009234, NSF CAREER grant 1455342, NASA grants 80NSSC18K0562, HST-AR-15800.001-A. Numerical calculations were run on the Caltech compute cluster "Wheeler," allocations FTA-Hopkins supported by the NSF and TACC, and NASA HEC SMD-16-7592. Support for JS was provided by Rutherford Discovery Fellowship RDF-U001804 and Marsden Fund grant UOO1727, which are managed through the Royal Society Te Apārangi.

DATA AVAILABILITY STATEMENT
The data supporting this article are available on reasonable request to the corresponding author.