Gravitational wave radiometry: Mapping a stochastic gravitational wave background

The problem of the detection and mapping of a stochastic gravitational wave background (SGWB), either of cosmological or astrophysical origin, bears a strong semblance to the analysis of CMB anisotropy and polarization. The basic statistic we use is the cross-correlation between the data from a pair of detectors. In order to `point' the pair of detectors at different locations one must suitably delay the signal by the amount it takes for the gravitational waves (GW) to travel to both detectors corresponding to a source direction. Then the raw (observed) sky map of the SGWB is the signal convolved with a beam response function that varies with location in the sky. We first present a thorough analytic understanding of the structure of the beam response function using an analytic approach employing the stationary phase approximation. The true sky map is obtained by numerically deconvolving the beam function in the integral (convolution) equation. We adopt the maximum likelihood framework to estimate the true sky map that has been successfully used in the broadly similar, well-studied CMB map making problem. We numerically implement and demonstrate the method on simulated (unpolarized) SGWB for the radiometer consisting of the LIGO pair of detectors at Hanford and Livingston. We include `realistic' additive Gaussian noise in each data stream based on the LIGO-I noise power spectral density. The extension of the method to multiple baselines and polarized GWB is outlined. In the near future the network of GW detectors, including the Advanced LIGO and Virgo detectors that will be sensitive to sources within a thousand times larger spatial volume, could provide promising data sets for GW radiometry.


I. INTRODUCTION
The existence of gravitational waves (GW), has long been verified 'indirectly' through the observations of Hulse and Taylor [1]. However, direct observation of such waves with manmade gravitational wave detectors has been lacking. At present the laser interferometric detectors have achieved sensitivities close to that required for detecting such waves [2]. The space mission LISA [3] is also planned by the ESA and NASA to detect low frequency GW. The significance of the direct detection of GW lies, not only in the opening of an entirely new window into observational astronomy by probing phenomena in the regime of strong gravity; it further promises to test our present theories of gravitation. Different types of GW sources have been predicted and may be directly observed by Earth-based detectors in the near future (see [4,5,6,7,8,9,10] and references therein for recent reviews): (i) Transient sources -such as binary systems of neutron stars (NS) and/or black holes (BH) in their in-spiral phase, BH/BH and/or BH/NS mergers, and supernovae explosions, whose signals last for a time much shorter, typically between a few milli-seconds and a few minutes, than the planned observational time; (ii) Continuous wave (CW) sources -e.g. rapidly rotating neutron stars, where a weak deterministic signal is continuously emitted, and (iii) Stochastic backgrounds of radiation, either of primordial or astrophysical origin.
In this paper we will address the problem of a spatially resolved search of the gravitational wave stochastic background. This approach was advocated in the LIGO technical note [11] and the basic analysis was recently implemented on the fourth science run data from the LIGO interferometers to prepare an upper limit map [12]. Our main focus will be on a stochastic astrophysical GW background (AGWB), which might arise from a superposition of a large number of independent and unresolved GW sources. The gravitational wave background can arise from a variety of sources: supernovae with asym-metric core collapse, binary black hole (BBH) mergers, GWs from low-mass X-ray binaries (LMXBs) and hydrodynamical instabilities in neutron stars (r-modes), or even GWs from astrophysical objects that we never knew existed. When a collection of any subset of these sources is unresolvable, it can appear as a stochastic GW background (SGWB) of a variable duration in our detectors of interest. While an astrophysical background will provide information about our immediate neighborhood, cosmological GW backgrounds (CGWB) could probe the physics of the early universe. There exist cosmological scenarios (e.g., cosmic strings and super-string models) which predict CGWB that should be detectable by Advanced LIGO [13].
We propose and develop a data analysis method that measures and maps the power in the SGWB from a specific location in the sky -GW radiometry using a network of detectors. We find that the angular resolution essentially depends on the effective GW bandwidth and the linear size of the network. In this paper we will restrict ourselves to the network of the two 4km LIGO detectors at Hanford (LHO) and Livingston (LLO). For the purposes of our analysis, we take their noise curves to be identical with the LIGO-I design power spectral density [14]. Our future plan is to include VIRGO and other detectors around the world in the numerical implementation of this analysis.
The basic statistic is the cross-correlation between the data from a pair of detectors. In order to 'point' the pair of detectors at different locations one must suitably delay the signal by the amount it takes for the GW to travel to both detectors corresponding to the source direction. This delay will be a function of the source position and will vary as the Earth rotates. Using the delay allows the detectors to sample the same wavefront from the source. The cross-spectrum formulation has been carried out in [15,16]. Methods for searching for isotropic backgrounds [17] using the cross-correlation and for anisotropies using spherical harmonic decomposition [18] have been devised. Efforts have also been made to devise methods to measure the spherical harmonic moments of the SGWB anisotropy using a network of ground or space based detectors [19,20]. Here we focus on a spatially resolved search and the final goal is to make a map of the true SGWB sky. We achieve this goal by pixelizing the sky, that is, we use a pixel basis.
The advantage of a spatially resolved search is seen immediately if we examine the so called overlap reduction factor, which partially determines the fractional power of source spectrum the search filter will receive at different frequency bands. The overlap reduction factor, normally denoted by γ(f ) in the literature [17] for the isotropic unpolarized background, becomes a time-dependent factor γ(Ω, f, t) for the spatially resolved search. For the LIGO detectors, γ(f ) quickly reduces to zero beyond few tens of Hz, while γ(Ω, f, t) has infinite bandwidth. So the bandwidth of the spatially resolved search is essentially detector bandwidth limited. This is typically valid for a network of detectors and therefore important from the point of view of the sensitivity regime of GW detectors which lies in this region.
As in radio-interferometry, the correlation statistic so constructed produces a 'dirty' map where a point source does not produce a point image, but one that is smeared by a beam response function (beam, for brevity). The 'cleaned' GW sky map is obtained from the measured cross-correlation statistic by deconvolving the beam. In other words, to obtain the GW power from each direction in the sky one needs to solve an integral equation where the measured power (data) is a convolution of the actual power with a kernel (beam). In order to understand the structure of the beam we carry out a numerical and an analytical study using the stationary phase approximation (SPA). We find that at low declinations (latitudes) of a point source, the kernel essentially has the shape of a figure '8' with a bright spot at the intersection. The bright spot is at the location of the point source. The figure of '8' continuously changes and bifurcates into a 'tear drop' as the point source moves to higher declinations. The declination at which this bifurcation occurs is determined by the half-angle of the cone traced out by the vector joining the two detectors. For the LIGO detectors this declination is about 26 • . The size of the bright spot or effective sky patch, defined by a certain percentage of reduction in the beam response function, say 50%, is determined by the inverse of the band-width divided by the light (GW) travel time between the detectors. Considering a broadband source and LIGO detectors having kHz bandwidth with 10 ms light travel distance between them, the angular size is about 5 • in radius. We find that these results agree very well with those obtained by applying singular value decomposition to the kernel matrix; the eigenvalues fall off steeply after a certain point which determines essentially the number of 'degrees of freedom' of the kernel matrix and thereby the size of the sky patch.
We employ the maximum likelihood (ML) approach for deconvolving the sky map. The integral equation for a discrete pixelised sky leads to a set of linear algebraic equations. Several deconvolution algorithms exist in literature for solving such a problem. However, because of the broad similarity of our present problem with the cosmic microwave background (CMB) analysis, we have opted for techniques that have been successfully applied to deconvolve CMB sky maps. Moreover, the ML approach provides a framework to study the SGWB anisotropy in other basis of interest, e.g., the spherical harmonic basis. We find the ML estimate by employing the conjugate gradient method. To verify our method, we apply the analysis on simulated sky maps mapped by a GW radiometer consisting of LIGO pair of detectors LHO and LLO. We generate 'realistic' colored Gaussian noise corresponding to LIGO-I design sensitivity curve [14] detector noise and embed various simulated sky maps of the GW stochastic background into the noise. We demonstrate that the true sky maps can be recovered satisfac-torily.
The paper is organized as follows: In section II we briefly review the GW radiometer concepts and obtain an expression for the GW radiometric cross-correlation signal, which is then optimized for the maximum signalto-noise (SNR). In section III, we set up the integral equation that must be solved in order to obtain the true sky map from the data. A time-frequency analysis is performed and the directed optimal filter is derived for anisotropic searches. Moreover, a stationary phase analysis is presented which provides us with the understanding of the kernel or the point spread function. In section IV, we describe the maximum likelihood approach and the conjugate gradient method and apply it to simulated data to test the efficacy of this method. Later subsections outline the extension of the GW radiometer analysis to incorporate multiple baselines obtained with a network of detectors and the extension of the GW radiometer to search for polarized SGWB. The numerical implementation of the method is described in section V. We conclude in section VI.

II. GW RADIOMETER EMPLOYING EARTH ROTATION APERTURE SYNTHESIS
A. The principle of a radiometer Radiometry or aperture synthesis is a well known technique in radio astronomy and CMB experiments. The idea is to point a pair of detectors separated by a baseline to a desired direction in the sky by introducing an appropriate time-delay between their data-streams. This delay corresponds to the difference between the times of arrival of a GW signal if it were to arrive at those two detector sites from that direction. For a given source in the sky, this delay will change as the baseline orientation changes due to the rotation of the earth. The cross-correlation of the data from the two detectors, appropriately timedelayed, would cause potential GW signals arriving from the chosen direction to interfere constructively. Whereas signals from other directions will tend to cancel out because of destructive interference. This principle of Earth rotation aperture synthesis, which is well-known in radio astronomy, could very well be used in GW astronomy using pairs of GW antennae. Figure 1 illustrates the principle on which the GW radiometer works.
We consider the Celestial Equatorial frame whose origin coincides with the centre of the Earth. The axes are defined as follows: For a fixed but arbitrarily chosen time t = 0, the x-axis is directed towards the intersection of the equator and the longitude φ = 0, which can be taken as the Greenwich meridian; the z-axis is directed towards the North Celestial Pole and the y-axis is chosen orthogonal to the previous two axes forming a right-handed triad. The Earth rotates in this frame with the angular velocity ω E = 2π/(1 sidereal day) ∼ 7.3 × 10 −5 radians/sec oriented along the z-axis. The two detectors are 1: Geometry of an elementary radiometer. Above, ∆x(t) is the separation or baseline vector between the two detectors; as the Earth rotates, its direction changes, but its magnitude remains fixed. The direction to the sourceΩ is also fixed in the barycentric frame. The phase difference between signals arriving at two detector sites from the same direction is also shown.
at locations x I (where I = 1, 2), and the baseline vector joining the two sites is ∆x := x 2 − x 1 .Ω is the unit vector in the direction to the source, which is fixed in the barycentric frame. The baseline ∆x rotates with the Earth, but its magnitude |∆x|, which is the distance between the detectors, remains constant. The map of the SGWB can be constructed by performing this synthesis for each location in the sky, patch by patch. An approximate size of the patch or resolution of the GW radiometer and number of patches required to cover the full sky can be estimated from the following simple argument. This simple model produces a fringe pattern with resolution given by the standard formula for the central width ∆θ ∼ λ GW /(|∆x| sin θ) where λ GW is the GW wavelength and θ is the angle of incidence. This is, however, a naive estimate. A better estimate of the resolution would follow from considerations involving the pixel-to-pixel Fisher information matrix. In which case the solid angle resolution will scale inversely proportional to the square of the SNR.
In this paper we will consider a GW radiometer made using the two 4 km LIGO detectors LHO and LLO in the United States. However, this analysis is equally well applicable to any other baseline or a network of baselines involving other detectors such as VIRGO, TAMA, GEO etc. The light travel time between the LHO and LLO is ∼ 10 ms, which at a bandwidth of 1 kHz gives a resolution ∆θ of few degrees. This resolution implies that a few thousand patches or pixels are required to completely cover the full sky. This estimate, in turn, will be useful for assessing the computational cost and numerical complications in handling large matrices for obtaining the full skymap. We make these estimates more robust below.

B. Basic Framework and the Statistical Properties of the Data
We first set up the notation and framework needed for investigating the problem.
In the transverse traceless (TT) gauge the metric perturbation h ij (t, x) in the equatorial frame can be expanded in terms of plane waves of the two polarizations +, × in the following way: 1) where a tilde on a variable denotes its Fourier transform; the complex Fourier amplitudes satisfy the relation Here A = {+, ×} is the polarization index and summation over the repeated index A is implied. In terms of the spherical polar coordinates (θ, φ), the source direction is given bŷ Ω = cos φ sin θx + sin φ sin θŷ + cos θẑ (2.2) and, hence, the infinitesimal solid angle along the direc-tionΩ is dΩ ≡ sin θ dθ dφ. Note that the wave propagation direction is −Ω. The polarization tensors e A (Ω) are defined by the following expressions: whereê θ = cos φ cos θx + sin φ cos θŷ − sin θẑ , e φ = − sin φx + cos φŷ , are the two orthonormal basis vectors on a two-sphere.
The statistics of the GW signal can be best described in the Fourier domain: If we assume the signal to be stochastic and uncorrelated in the two polarizations [50], different frequencies and different directions, then the Fourier components of the GW strain obey, where P A (Ω) is proportional to the strength of the SGWB in the directionΩ and H(f ) is the two sided GW source Power Spectral Density (PSD). The interpretation of the quantity P A (Ω) can be made apparent by relating it to the specific intensity or brightness [21] of GW I GW (f,Ω). Specific intensity is defined as (c times) the incident energy density per unit frequency interval per unit solid angle, or, equivalently, specific intensity is the (normally incident) flux per unit solid angle. Following the convention commonly used in SGWB analysis, if the incident GW energy density is expressed in the units of critical energy density for a flat universe ρ c := 3c 2 H 2 0 /(8πG), where H 0 is the Hubble constant at the current epoch and G is the Newton's gravitational constant, it can be shown that [22]: Therefore, P A (Ω) can be interpreted as the specific intensity of SGWB for the corresponding polarization (up to a certain proportionality constant).
In general, we cannot separate H(f ) from P A (Ω), because the frequency power spectrum H(f ) could also depend on the directionΩ. A more general treatment would use a quantity like P A (Ω, f ) which describes both frequency and angular distribution of SGWB power together. But for a small enough bandwidth the assumption may be justified, as the signal is expected to have a smooth profile of the power spectra. Further, it should be noted that P A is actually a second rank tensor and should be represented by two indices as P AA , but because we assume that the two polarizations are uncorrelated the quantity P +× is zero and thus absent. To avoid unnecessary indices and with a slight abuse of notation, we therefore write P A with a single index instead of P AA .
We consider two GW detectors located at x I (t), I = 1, 2. The detector frames are denoted by the coordinates (X I , Y I , Z I ) where the arms of the respective detectors lie along the (X I , Y I ) axes. Then the detector tensors d I are given by: The factor 1 2 is inherited from the geodesic deviation equation. Owing to the Earth's rotation, the detector coordinates and d I are functions of t. Thus in matrix form: where R(t) is the rotation matrix describing a rotation of angle ω E t around the Earth's rotation axis, namely, the z-axis.
Here ω E is the Earth's sidereal angular velocity ≈ 7.3 × 10 −5 radians/sec. The strain h I (t) in the I th detector is given by: We define the antenna pattern functions as for a wave incident from the directionΩ. Contracting Eq. (2.1) with the detector tensors d I , the signal amplitudes can be expressed in terms of the antenna pattern functions as: The baseline ∆x at any time t (in matrix form) is given by: The detector output s I (t) is a sum of the GW signal h I (t) and noise n I (t) s I (t) = h I (t) + n I (t) . (2.13) Statistically, the gravitational-wave strains h I (t) are uncorrelated with the detector noise n J (t); that is, the following four correlations, in the time domain are zero: where t, t are any two time instants. We also assume that the noise in different detectors is uncorrelated; that is, n 1 (t) n 2 (t ) = 0. This assumption is not unreasonable when the detectors are widely separated. Thus the only possible correlation is between h 1 (t) and h 2 (t ). The statistic that we construct in the next subsection is based on this fact.

C. The Cross-correlation Statistic for the Directed Search
Stochastic GW signals inherently can arrive from any direction with any amplitude. Moreover, they are characterized by the statistical expectation values of energy density. The noise in the two detectors are assumed to be essentially independent. In this situation the crosscorrelation of the data from the two detectors is an appropriate statistic for detecting and observing stochastic GW. In order to optimize the SNR, the cross-correlation statistic for the directed search involves a directiondependent filter function Q(t,Ω; t , t ), which connects sidereal time t of one detector's data to t of the other detector's data to match the phases of the GW strains in the detectors. The filter does a inverse noise weighting using the PSDs of the two detectors in order to suppress noisy frequency bands and enhance the SNR by assigning relatively large weight to the sensitive regimes of the detectors and the bands where the source is expected to emit more power. (In general, Q will depend on P A (Ω). However, for the directed search that we envisage here, as we will see later, the P A (Ω) are delta functions and, therefore, the filter Q only depends onΩ.) The sampling interval with which the data are sampled is determined by the Nyquist frequency of the stochastic signals of interest, and can be well below a millisecond, corresponding to several kilohertz. While it is possible to compute the filter function on a time segment of this sampling interval, it is erroneous to do so for the physical problem at hand. This is because the signals at the two detectors will be incoherent on time-scales smaller than the light-travel time, t d , of that baseline, which is at most a few tens of milli-seconds for ground-based detectors.
Thus t d sets the lower limit on the duration of the time segment on which the filter function should be computed. The upper limit, τ , is set by the smaller of the timescales, on which the data are stationary and the timescale on which the detector orientation changes appreciably. We thus divide the data into time segments, ∆t, such that t d ∆t τ . The time segments used currently in LSC data analysis vary from 32 to 192 seconds, and are consistent with these limits.
The final statistic for the full observation time T is obtained by linearly combining the cross-correlations over the smaller time-intervals as a weighted sum. The filter is optimized for each time segment, I k = [t k − ∆t/2, t k + ∆t/2], at sidereal time t k and the statistic for the k th segment is given by: The final cross-correlation statistic S for all the n = T /∆t sidereal time bins can then be obtained by combining the ∆S k as a weighted sum as follows: where, the n quantities w k (Ω) are to be chosen so that the SNR for the statistic S(Ω) is maximized. We denote the SNR of S by ρ S = µ S /σ S , where µ S and σ S are the mean and standard deviation of S respectively. We use normalized weights n k=1 w k = 1. The µ S , σ S and the SNR ρ S are given in terms of the means µ k = ∆S k and variances σ 2 k = ∆S 2 k − ∆S k 2 of the individual mutually uncorrelated time segments as follows: The maximization is achieved elegantly by invoking the Schwarz inequality. We consider an n-dimensional Euclidean space equipped with the usual scalar product in which w is a vector with components w k . We define κ and ρ having the components κ k = w k σ k and ρ k = µ k /σ k respectively. Then, we may write ρ S ≡κ · ρ where the vectorκ is a unit vector in the direction of κ. ρ S is maximized when κ points in the direction of ρ. Thus, w k ∝ µ k /σ 2 k , the proportionality factor being n k=1 µ k σ −2 k −1 . We have the results: Further, we are free to choose the normalization of the filter Q. We choose the normalization such that all µ k are made identical, equal to µ, say. This simplifies the statistic S and its SNR ρ S : which are their final forms. Normally the GW signal is expected to be weak so that even after its integration over ∆t it is still smaller than the noise, that is, |µ k | σ k . Therefore the variance in each segment obeys σ 2 k ≈ ∆S 2 k . The signal however builds up when we integrate over all the time segments during the observation time. Secondly, from Eq. (2.20), it is evident that noisy time intervals contribute less to the statistic S leading to its optimal character.

III. THE CONVOLUTION EQUATION FOR THE SKY MAP
Since the GW power spectra and the detector noise are modeled in the frequency domain, it is convenient to for-mulate the whole analysis in that domain. However, since the detector output is a time-series, it is pertinent to ask over what time-duration must one compute their Fourier transforms. As discussed in the last section, typical acceptable segment sizes are a few tens to a few hundreds of seconds. Fourier transforms computed over such "small" (visà vis the total observation) time scales are termed as short-term Fourier transforms (SFTs). Each SFT becomes a function of time t as well because t is essentially the identifier of the segment. Thus we have a timefrequency representation of the data. As we shall see, this representation is most suitable for further analysis and has also been used in previous literature [17,18,19].

A. Time-Frequency Analysis of the Signal and Noise
The (approximate) SFT of a segment of detector output can be defined as [18], We retain here the convention of using 'tilde' over a symbol to denote Fourier transform -the distinction should be evident from the context. Most importantly, by taking the inverse Fourier transform, the exact time series segment s I (t) can be recovered. The same notation will be used for several other quantities as well in this analysis. While constructing the statistic, it was assumed that the true GW strains h I (t) in the detectors are correlated, but that the noise streams are uncorrelated. The expression for the correlation between the GW strains in two detectors will be derived here, which is necessary for the derivation of the optimal filter in the next subsection. The detector parameters can be approximated to be stationary over the period of the time segment. Hence we may consider the quantities d I and x I to be nearly constant over a time segment and regard them as functions of the time t labeling the time segment. The SFT of the GW strain in detector I over a time segment is given by: where summation over A is implied. The δ ∆t (f ) is the finite time delta function (sinc function) defined by, The finite time delta function δ ∆t (f ) behaves as the Dirac delta function δ(f ) in the limit ∆t → ∞, but has the property δ ∆t (0) = ∆t. Hence for a large time segment ∆t the SFT from detector I takes the simple form: The important result of this subsection is the expectation of cross-correlation between the SFTs of time segments of detector outputs at time t from the two detectors 1 and 2, which can be obtained from Eq. (3.3) and Eq. (2.5) as: The general overlap reduction function defined by Eq. (3.7) is a generalization of the usual overlap reduction function for the isotropic SGWB case, first constructed by Nelson Christensen [15] and formally written in a closed form by Flanagan [16]. In this case it is a more complex object. Besides the frequency, it is also a function of the segment time t. It also depends on P A (Ω) which is integrated over the full sky S 2 . Thus it is a functional of P A (Ω). In general, when we construct our directed filters, the integral will be restricted to a small patch of the sky, ∆Ω ⊂ S 2 . (In principle, ∆Ω can be any (measurable) subset of S 2 .) Accordingly, P A (Ω) plays the part of a weight function over the sky and we may define the measures: Thus, the γ in general becomes a functional of the SGWB power in both polarizations coming from the patch ∆Ω. We therefore separate the function arguments, t and f from the non-function arguments, ∆Ω and P A , by a semicolon. Finally, the exponential term in Eq. (3.6) before the integral is just the time-shift term in the Fourier transform of the segment at time t.
In the limit of a large time segment, Eq. (3.6) takes the simple form: can be readily realized if we put f = f . In this case, the correlation given by Eq. (3.9) diverges in the limit ∆t → ∞. But, in practice, ∆t is finite, and hence, we expect a finite value for the correlation. Eq. (3.6) lets us compute that finite value of h * f ) at f = f . We use the large ∆t limit and replace one of the finite time delta functions δ ∆t (f − f ) in the integrand of Eq. (3.6) by the Dirac delta function δ(f − f ), while treating the other δ ∆t (f − f ) as a normal function and put δ ∆t (0) = ∆t. We get, This formula was derived by following the same procedure as prescribed in [17], so, not surprisingly, for isotropic backgrounds our result matches the formula obtained in [17]. This result is important for injecting test signals in the detector output [23].
Next we describe the properties of detector noise in a finite time segment.
The noise in the segment labeled by t is a time series n I (t) in a detector I. Its SFT is given by: We take the mean to be zero: n I (t) = n I (t; f ) = 0. Since n I (t) is real, its SFT obeys the reality condition, n * I (t; f ) = n I (t; −f ). The noise in a detector is uncorrelated with the noise in another detector and with the GW signal, i.e., n 1 (t) n 2 (t ) = h 1 (t) n 2 (t ) = n 1 (t) h 2 (t ) = 0 [see Eq. (2.14)]. These relations also hold for their corresponding SFTs. The length of the time segment is usually kept a few tens of seconds long, over which the detector noise can be regarded as stationary. Thus n I (t ) n I (t ) is a function of t − t , provided both t , t are in the same segment centered at time t. Then, using the fact that n I (t) is real, we have, (3.12) where P I (t; f ) is the one-sided noise PSD. This noise PSD is also a function of time t as detector noise is nonstationary. The correlation between the corresponding SFTs can be easily obtained from the above relations: In the limit of large length of the time segment, we arrive at the usual formula Again, the advantage of using Eq. (3.13) in expressing n I (t; f ) n I (t; f ) becomes evident when we set f = f . The usual formula, Eq. (3.14), involving Dirac delta function diverges, whereas, in practice, that expression is actually finite because of a finite observation time. However, in Eq. (3.13), if we replace one finite time delta function by the Dirac delta function and treat the other as a normal function we obtain a finite result, This is used in our work when generating simulated noise for our test analyses.

B. Optimal Filters for Anisotropic Searches and the Directed Search
The aim of this subsection is to construct an optimal filter to maximize the SNR of the cross-correlation statistic over the small time-segments. We essentially generalize the analysis presented in [17] for isotropic background search to the case of anisotropic background search.
The optimal filter depends on the theoretical model of the SGWB, that is, on P A (Ω) and H(f ). First we will derive the filter for the general case of the anisotropic search and then specialize to the directed search. Our first goal is to compute the SNR ρ(t) of the statistic ∆S(t) over the time segments of length ∆t at time t.
The equation for the general case is the generalization of Eq. (2.15): Here we have suppressed the model dependence of Q. Assuming that the noise in both detectors and the earth are stationary within the duration of each time segment, we may write Q(t; t , t ) = Q(t; t − t ), which allows us to expand the filter in terms of its SFT Q(t, f ) as, Thus in terms of SFTs the statistic can be expressed as, The mean of the statistic ∆S(t) is: Here the H and P A are the actual quantities pertaining to the SGWB source. The corresponding quantities of the theoretical model are hidden inside the filter Q.
The variance after a routine but fairly involved calculation is obtained as: From these results the SNR ρ(t) for the segment at time t can be computed. However, we need to maximize the SNR over each time segment. Here again we follow the prescription presented in [17] -we invoke the Schwarz inequality. To this end it is convenient to define a scalar product of two functions A and B on each time segment, labeled by t, as, Then the mean and variance are given in terms of the scalar product as follows: The SNR ρ(t) is just the ratio µ(t)/σ(t). The SNR is maximized when the "signal" and the "filter" vectors are parallel, which happens when, where λ(t) is a (real) proportionality constant for the time segment. It is a function of the segment time t. This function will be chosen so that the SNR of S for the full observation is maximized. For the optimal filter, the expression for the mean given by Eq. (3.20) simplifies to: We exploit the freedom of choosing λ(t) by setting µ(t) = 1 for each time segment. This immediately gives: We further require to find λ(t) explicitly. This is done by computing Q 2 . We find Q 2 = λ 2 P 2 N W where, The integral on the right-hand side of the above equation is positive definite: Its integrand contains the fourth power of the GW amplitude. Therefore, it may be denoted by the square of a real quantity P N W , which is determined by the GW power accessible to the network of the two detectors. The above equation gives the normalizations: The optimal statistic is then easily obtained as in Eq. (2.20). We replace the sum in that equation by an integral over the segment time t: where the integration is over the observation time (which could consist of disconnected time intervals). The filter given by Eq. (3.25) is the general optimal filter to search for any anisotropic SGWB, which, unfortunately, requires the knowledge of H(f ) and P A (Ω). In practice, we do not have an exact a priori model for H(f ) and P A (Ω), which anyway we are trying to measure! We then must use models for those quantities, H (f ) and P A (Ω), to search for different anisotropic backgrounds. These models will be used to construct the modeldependent overlap reduction function γ(t, f ; ∆Ω, dP A ) and the filter Q.
The angular power distribution for only one unit point source on the sky in the directionΩ with equal power in both the polarizations can be expressed as: This immediately simplifies the expression for the overlap reduction function because now the integral in Eq. (3.7) simplifies owing to the delta-function, and γ becomes a function ofΩ as well. Therefore, we have: Unlike the time-independent overlap reduction function of the isotropic SGWB case, the direction-dependent overlap reduction function, γ(Ω, t, f ), accepts power from all the frequencies and in fact has infinite bandwidth in the limit of vanishing pixel area. So the bandwidth of the filter Q in this case would only be limited by the bandwidth of the detectors through the coefficients P I (t; f ). If, instead, the source has a finite spatial extent, the bandwidth would be limited, because the integral in Eq. (3.7) would have to be performed over the solid angle ∆Ω subtended by the source. If one takes a small patch of the sky of size (∆θ, ∆φ) around some fixed (source) directionΩ 0 := (θ 0 , φ 0 ), it is easy to show that γ is essentially a product of sinc functions in (∆θ, ∆φ). In fact: where |∆Ω| = sin θ∆θ∆φ is the solid angle subtended by the patch ∆Ω. In the integral, the factor Γ(Ω, t) remains nearly constant. The sinc functions go to zero when their arguments reach π radians. Taking this definition as the bandwidth and taking |∆x|/c ∼ 10ms for the two LIGO detectors, the bandwidth is about 750 Hz for a square source of side 10 • on the sky. If there were known models for the anisotropic SGWB, the optimal filter for the general anisotropic case would have included P A (Ω) and we would perform a full sky search for an anisotropic background. However, no reasonable model for the anisotropic SGWB sky exists in literature and, so, blind estimations are currently the only possible alternatives.
Directed search is one blind estimation approach, where the strength of each point (pixel) of the sky is 'observed' using a direction dependent filter, assuming that the other points on the sky do not contribute towards the observed value. So, in the directed case Q becomes a function ofΩ: where λ(Ω, t) is the normalization constant, which now varies from pixel to pixel. It is given by Eq. (3.30), but with the γ in the expression for P 2 N W in Eq. (3.29) replaced by the simpler Γ of (Eq. (3.34)). The directed filter given in Eq. (3.36) is an optimal filter if there is a single point source in the directionΩ and no sources elsewhere in the sky. If there are other sources in the sky, as in a general anisotropic background, the filter becomes suboptimal as it stands. However, we use the above filter to make a "dirty" map of the sky, which is a convolution of the the actual anisotropic background with the beam function and contains additive noise. We intend to extract information about the true background by the process of deconvolution. The convolution equation will be derived in the next subsection.
The working principle of the above filter is evidently similar to the earth rotation aperture synthesis often used in CMB and radio astronomy to make map of a certain portion or the whole sky. The phase lag between two detectors, separated by a distance ∆x(t), in receiving a plane wavefront from a certain directionΩ, as shown in Figure 1, is compensated in the filter via the phase factor exp[2πifΩ · ∆x(t)/c]. As the earth rotates this factor adjusts, so that, waves from the given direction are coherently added, while the waves from other directions tend to cancel out. Note that, we did not introduce the phase factor by hand, it appeared automatically through the process of the maximization of the SNR. Though the whole radiometer analysis is based on this principle, the idea is clearly realized in the directed search analysis.
where the beam functions B A (Ω,Ω ) are defined as: The function H (f ) is the model power spectrum of the source we insert in the kernel and ∆Ω =Ω −Ω . We measure S(Ω) and from the kernels B A (Ω,Ω ), we propose to solve the integral equation for P A (Ω). Physically, we may expect the power in both polarizations to be the same, that is, P + (Ω) = P × (Ω) = P(Ω), say, the kernel then is just the sum of the two individual kernels of each polarization, Our numerical deconvolution strategy is described in the next section. But before we do that we examine the kernel in Eq. (3.40) and try to understand it from a physical point of view. This will afford us some insight into the beam patterns associated with directed filters. It is worth noting that the beam function B(Ω,Ω ) is not symmetric only due to the leading normalization factor Λ(Ω), which comes from the normalization of the statistic S(Ω). We make use of this observation to introduce a symmetric kernel in section V, which is advantageous for numerical deconvolution.

D. The Stationary Phase Approximation of the Kernel and its Singular Value Decomposition
The GW radiometer beams are not pointed but have a spread out profile, which varies with sky position. Thus in order to make progress towards deconvolving the GW sky map we try to understand the beam pattern. We find that the Stationary Phase Approximation (SPA) of B(Ω,Ω ), given in Eq. (3.40), yields useful results. It is essentially the exponential term exp[−2πif ∆Ω · ∆x(t)/c] containing the phase that determines the integral -the integrand constructively contributes when the phase in the integral in Eq. (3.40) is stationary. We also use the fact that rest of the functions in the integrand vary slowly with time, so that they effectively behave as constants as far as the integral is concerned.
We obtain the beam function for a unit point source at Ω 0 ≡ (θ 0 , φ 0 ). We write ∆Ω :=Ω −Ω 0 . Note that ∆Ω may not be necessarily small; the pointsΩ,Ω 0 can lie anywhere on the unit sphere. By performing a numerical computation for an observation time of one sidereal day, we find that for low declinations, the beam is shaped like the figure of "8", as shown in Figure 2(a), while as one goes higher in declination, the "8" smoothly turns into a "tear drop".
With the application of the SPA we can explain the shape of the beam. The integrand in the kernel usually oscillates rapidly, because of the exponential phase term, except when the phase is stationary. The kernel is a double integral over f and t and therefore the SPA must be carried out in two dimensions. Setting the first derivative of the phase with respect to both variables f and t equal to zero, we obtain: where ∆ẋ(t) := d∆x(t)/dt. The detector separation vector ∆x(t) rotates about the earth's rotation axis (zaxis in our coordinate system) with the angular velocity ω E . Geometrically ∆x(t) traces out a right circular cone with z-axis as its symmetry axis [see Figure 3]. It is explicitly given by: where ∆R = |∆x(t)| is the constant distance between the detectors. As ω E t ranges from 0 to 2π, ∆x(t) traces out a cone with half angle Θ. The half-angle Θ of the cone, 0 ≤ Θ ≤ π, is given by, the SPA condition is satisfied when: where ∆Ω = |∆Ω| can take both positive or negative values. Since bothΩ 0 andΩ(t) =Ω 0 + ∆Ω(t) are constrained to lie on the unit sphere and thus both have unit norm, it follows from Eq.(3.46) that, the SPA solution Ω(t) is a curve on the unit sphere given by [22], The trajectory has been parameterized in terms of the sidereal time t. One can even obtain an approximate analytical expression for the beam function along the SPA trajectory using standard SPA techniques as [22]:

.(3.48)
As t is varied over a full sidereal day, the shaded figure of "8" is generated through Eq. (3.47) and Eq. (3.48) as shown in Figure 2(b). Clearly, SPA results match very well with the numerical beam pattern shown in Figure 2(a).
The case where Eq. (3.48) does not apply (though the analysis still remains valid) is when the detectors are at the same latitude, as the normal to the baseline cone, n cone (t), always remains parallel to theẑ axis, causing the denominator of Eq. (3.48) to vanish. In this case the whole SPA trajectory shrinks to a point, which is the image of the pointing direction about the equatorial plane,Ω =Ω 0 − 2[Ω 0 ·ẑ]ẑ. The value of the beam function at the image point, is also quite easy to compute. Therefore, a skymap produced by such a baseline will be a superposition of the (blurred) true sky and its (differently blurred) reflection about the equatorial plane. In practice, a situation like this arises for the LHO-Virgo pair, as their latitudes are quite close, 46 • 27 N and 43 • 37 N respectively.
The SPA solution given in Eq. (3.48) does not remain finite very close to the pointing direction, as the denominator vanishes. However, close to the pointing direction a better approximation is obtained by expanding the the phase term exp[−2πif ∆Ω · ∆x(t)/c] up to the second order, which can also be used for more accurate modeling of the core of the beam.
The stationary phase analysis also indicates an approximate resolving power of the radiometer. Since near the maximum of the beam function, the phase must not vary too much over the bandwidth, say no more than a radian, this implies that the resolving power is determined by |∆Ω| ∼ λ/|∆x|, where the λ corresponds to the bandwidth f = ∆f of the detectors. For the LIGO detectors, |∆x| ∼ 3000km. If we take the bandwidth to be ∼ 1kHz, then λ ∼ 300km, so the radiometer resolution is ∼ 0.1 radians, that is, ∼ 6 degrees, which we find is consistent with the numerically obtained beam profiles. A contour plot of the core of the beam of the radiometer formed by the LIGO pairs (with white noise and upper cut-off frequency of 1024 Hz) is shown in Figure 4. The plot confirms that the beam size in this case is 6 • ∼ 0.1 rad. The width of the beam estimated above seen to be consistent with the 'number of degrees of freedom' present in the kernel (beam). A widely used method that identifies the linearly independent modes in a linear transformation is the singular value decomposition (SVD) [25]. The decomposition identifies linear combinations of modes that have almost zero eigenvalues -the null subspace. It then projects out the solution orthogonal to the null subspace which spans the true degrees of freedom. The singular values of the kernel for the LIGO detectors at Hanford and Livingston for white noise with upper cut-off frequency f u = 1024 Hz are plotted in Figure 5 (solid line). The figure shows that the eigenvalues become essentially negligible after ∼ 1000 implying that this is the number of degrees of freedom in the kernel. The numerical and the SPA analysis shows that the size of the central spot or the resolution is ∼ 0.1 radian, which means that there are 4π/(0.1) 2 ∼ 1000 independent patches in the sky. So the SVD results are consistent with the size of the beam obtained by numerical and theoretical methods.
In practice, however, the higher frequency response of the detectors is not as good, hence the achievable angular sensitivity of the radiometer becomes relatively poor. The plot of singular values for the same detector pair both having LIGO-I goal noise PSD with an upper cutoff frequency of f u = 512 Hz is overlaid (dashed line) on Figure 5. Clearly, the number of degrees of freedom, which represents the amount of information content in a map, is less in this case.

A. Unpolarised Background and Single Baseline
We first consider the simpler case of detecting and deconvolving the signal from an unpolarised SGWB using one pair of detectors. We later indicate in the subsections that follow how this method can be extended to the more general cases of SGWBs and detector baselines.
The observed data construct, S(Ω), consists of a signal and additive noise, namely, S(Ω) = s(Ω) + n(Ω) . (4.1) The first term on the rhs, the expectation of the dirty map given in Eq. (3.37), is a convolution of the true power in SGWB P A (Ω) in the two polarizations arriving from a directionΩ in the sky with corresponding beam response functions B A (Ω,Ω ). Note that, though the definitions of the above quantities involve complex Fourier transforms, these quantities are all real owing to the fact that they were originally derived from real time series, so that, s * I (t; −f ) = s I (t; f ), γ * (Ω, t, −f ) = γ(Ω, t, f ) and so on.
In this section, we construct the Maximum Likelihood (ML) estimator for the angular power distribution P A (Ω) given the measured data S(Ω). For simplicity and clarity of presentation, we first limit the analysis to the simple case where both the polarizations follow the same angular power distribution P(Ω). This simplifies the form of the construct to S(Ω) = S 2 dΩ B + (Ω,Ω )+B × (Ω,Ω ) P(Ω ) + n(Ω).

(4.2)
In practice, the sky is divided into finite number of pixels. Then, the observed data vector is denoted by S, whose component S i := S(Ω i ) is the signal measured at the i th pixel. We similarly define the vectors P and n, with components P(Ω i ) and n(Ω i ), respectively. In this notation the convolution leads to a set of linear algebraic equations, where B is the known [51] beam matrix, expressed as In the weak-signal approximation, the variances of the signal-noise cross terms h * 1,2 (t; f ) n 2,1 (t; f ) are much smaller than the variance of the noise-noise cross term n * 1 (t; f ) n 2 (t; f ). So the observed pixel noise is strongly dominated by the noisenoise term and can be written as, the cross-terms h * 1,2 (t; f ) n 2,1 (t; f ) have been dropped. The pixel noise is a sum of a large number of zero mean random numbers, where none of the addend strongly dominate (statistically) over the others. Hence, following the generalized central limit theorem [24], one can argue that the pixel noise tends to be zero mean Gaussian. If this argument is used to calculate the variance of n(Ω), we consistently get the same result as expected from Eq. (2.17). After a routine but fairly involved algebra, the pixel-to-pixel noise covariance matrix, N ≡ N ij of the dirty map turns out to be: The deconvolution problem is, of course, a very standard problem in many areas of science. In particular, Eq. (4.3) is identical in structure to the set of equations that arise in the map-making stage of CMB experiments. The temperature anisotropy ∆T i in a direction is inferred from time-stream data, d t using a linear model d t = i A ti ∆T i + n t . The convolution kernel A ti that relates the time domain to the pixel domain is determined by the pointing or scan strategy as well as the beam response function of the antenna. The noise n t is (assumed to be) Gaussian and described by the noise covariance matrix N tt . As described below, a ML solution for the sky map ∆T i is readily obtained in this linear model under the assumption that the noise is Gaussian. We adapt this technique to solve our problem since it has been applied with great success in the CMB field and there exists an extensive literature [26] and also public domain package for implementing it numerically [27,28]. However, it should be noted that the problem differs in two important aspects. First, in our case there is the simplicity that the kernel connects two vectors which are both in sky pixel space. This implies the kernel is a square matrix for the case of single baseline and unpolarized background. Second, in our the case the statistics of the noise is potentially non-trivial; the noise in this situation is a complex object built by integrating the product of two random variables corresponding to the noise streams in each detector. The Gaussianity of the noise has to arise from the generalized central limit theorem [24].
We proceed assuming that the joint probability distribution of the elements of n is a multivariate Gaussian distribution [29] given by the probability distribution function: 6) where N pix is the total number of pixels and N := nn T is the known noise covariance matrix [see Eq. (4.5)]. Thus, given a signal vectorP, the probability of observing radiometer output S is [27], Solving for ∂P(S|P)/∂P = 0, using the fact that N is symmetric and positive definite, it is straightforward to show that the above probability is maximum when which is, therefore, the well-known result for the desired ML estimator of the true sky map of SGWB anisotropy.
The deconvolved map will also have pixel noise, given by and the pixel to pixel noise covariance matrix of the ML map is obtained as Therefore, to obtain the ML map estimate one has to first compute the inverse of the pixel to pixel noise covariance matrix Σ −1 = B T N −1 B. The ML map is then obtained as a solution to the linear algebraic equations, In solving this equation, we have a choice of either using one of a number of direct methods or one of the iterative methods. In the low resolution regime, the matrix inversion looks feasible with reasonable accuracy. However, in general, direct methods are computationally more expensive and iterative methods are preferred, provided their convergence to the solution is rapid. For sparse matrices the iterative Conjugate Gradient (CG) method is well suited for this inversion [25,30] problem. The CG method solves linear systems with symmetric positive definite matrix and have been found to be more advantageous compared to other iterative methods such as the Jacobi method [26,31]. Starting with a guess solution, the convergence of the method can be often greatly improved by 'preconditioning' the system of equations. i.e., multiplying both sides with a suitable matrix (say, the inverse of diagonal elements of the matrix.). Our choice is also motivated by the fact that the CG method has been successfully implemented for map making in CMB experiments [26,27,32,33,34].
The above method can be extended to include multiple baselines and also to estimate power in each polarisation component. We discuss these extensions in the following subsections.

B. Multiple Baselines
The above analysis can be extended to a set of N b GW radiometer baselines. Let S (i) be the observed map by the i th baseline with beam matrix B (i) and observed noise n (i) . Then Eq. (4.3) can still be written as: with X representing the matrices S, B, and n. Note that S, n are now 1 × N pix N b vectors and B is a N pix × N pix N b matrix, while P (the true SGWB sky) remains unchanged. This is similar to CMB experiments where each pixel is visited by the detector several times. In the multi-baseline GW radiometer case each pixel is visited by different baselines, and, unlike a CMB experiment, each pixel is visited an equal number of times. In this case too the Maximum Likelihood estimate[52] and the pixel to pixel noise covariance matrix of the ML map are given by: but the noise covariance matrix N := nn T of the raw sky map has to be modified. Let n i (Ω) be the pixel noise from the radiometer baseline i with detectors I and I and we follow the same convention for λ i and Q i . Then, (4.14) If i and j denote the same baseline we get back the previous result [Eq. (4.5)]. However, if i and j denote different baselines, at least one of the two detector pairs will be different (i.e. either I = J or I = J ), so in that case n i (Ω 1 )n j (Ω 2 ) = 0. Hence, one can write The above algebra suggests that it is fairly straight forward to combine the observations made by multiple baselines for the estimation of the true SGWB angular power distribution. In [17,35] it was shown that the optimal way to search for a isotropic SGWB using a network of detectors (with uncorrelated noise) is to linearly combine correlations from pairs of detectors, instead of computing higher order correlations using data from more than two detectors. We can extrapolate the same logic to the directed search and argue that the procedure described above to combine data from a network of detectors is also optimal.
The search for a SGWB using a network of detectors is becoming progressively relevant as other kilometer scale detectors, namely Virgo, GEO and LCGT, are expected to reach their initial goal sensitivity in the next few years. A network of detectors can enhance the directed search in many ways. The resolution of a radiometer is proportional to the length of the baseline. Inclusion of a detector at a distance like Virgo, which is further away of the LIGO detectors than the mutual separation of the LIGO sites, will clearly increase the highest baseline separation and hence the resolution of the baseline. However, more important enhancement would be realized due to better coverage of the sky. An analogy with radio astronomy using an array of antennas may be appropriate to mention in this context [53]. As the earth rotates, the projections of the radio antenna baselines on the plane perpendicular to the source direction sample different points on the two dimensional Fourier plane (commonly known as the u-v plane). The sampled Fourier plane is then inverse transformed to generate the image. While the highest resolution of the network is limited by the projection of maximum antenna separation, addition of more antennas to the network ensures better sampling of the u-v plane reducing the side lobes, thereby producing a more faithful image of the sky. Detailed introduction to the basic principles of earth rotation aperture synthesis can be found in most of the standard texts on radio astronomy, e.g., [36]. In GW radiometry with a network of detectors we expect that a similar scenario will arisebetter coverage of the sky should be possible due to different orientations of the baselines with respect to the source. Moreover, since the true power distribution will be estimated from an over-constrained set of equations, the error in the estimated quantities will be reduced. In addition, a radiometer search can benefit from certain technical advantages that a network of detectors can provide: A detector at a third site joining the LIGO detectors will boost the "single-baseline integration time", i.e., the single-baseline duty-cycle in a three-site network will be at least as good as, but will likely be better than, that in a two-site network. Also, owing to common instrumental noise sources in the LIGO detectors, certain frequency bands are currently notch-filtered in computing the cross-correlation statistic. Some of these noise sources are known not to affect Virgo and, therefore, the LIGO-Virgo cross-correlation statistics. For example, the power-line noise affects the LIGO detectors at the harmonics of 60Hz, whereas it affects the Virgo detector at the harmonics of 50Hz. Therefore, a radiometer search that benefits from the LIGO-Virgo baseline's contribution will probe the presence of astrophysical signals over a larger set of frequencies than one limited to the baseline consisting of the LIGO pair of detectors.

C. Polarization Map
We may also choose to extract power from different polarizations separately. The discrete convolution equation, S = B + · P + + B × · P × + n , (4.16) can also be expressed by Eq.
This case can also be generalized for a network of detectors by retaining the same definition of P [Eq. (4.17)], but redefining the beam matrix as: and, again, using the same ML estimation formula given in Eq. (4.13).

V. IMPLEMENTATION AND NUMERICAL RESULTS
In this work we have numerically implemented the maximum likelihood estimation algorithm on simulated data using the MATLAB R software package [37] to estimate the "true" (unpolarized) SGWB sky observed with a single baseline ground based GW radiometer. The details of the computation scheme, the numerical deconvolution algorithm, the simulated data and the deconvolved maps are presented in this section.

A. Preparation of Simulated Dirty Maps
The data are simulated in the frequency domain for each time segment. Since the noise of ground based interferometric detectors is very high at frequencies greater than few 100 Hz and the computation cost increases with the number of frequency bins, we use an upper cut-off frequency of f u = 512 Hz and bin width of ∆f = 2 Hz for testing of the algorithm. The length of each time segment is chosen as ∆t = 192 sec and the total integration time is T = 86400 sec [54]. The sky is pixelized using the Hierarchical Equal Area isoLatitude Pixelization (HEALPix)[55] scheme, which divides the 2-sphere (S 2 ) in 12 n 2 side pixels, where n side is an integer power of 2. Since the radiometer beam-width is greater than ∼ 6 • , we chose n side = 16, which corresponds to a pixel width of ∼ 3 • and a total of 3072 pixels. The HEALPix scheme also allows fast spherical harmonic transform on a sphere, which may become useful for more advanced analysis in future. Note that, the algorithm is independent of the pixelization scheme, other equal area pixelization schemes can also be used in the analysis.
We generate the detector noise n I (t; f ) using a Gaussian pseudo random number generator for each time segment. The noise is colored using the (one sided) noise PSD P I (t; f ) of the corresponding detector according to Eq. (3.15). MATLAB R software's pseudo random number generator randn can generate very long sequences of random numbers, so we relied on that routine for simulating detector noise. For each of T /∆t = 86400/192 = 450 time segments, we generated a complex random sequence (that is, two real random sequences) of f u /∆f = 512/2 = 256 real numbers. The total number of random numbers, 2(T /∆t)(f u /∆f ) = 225, 000, is much less than the period of randn, which is 2 1492 > ∼ 10 449 [38]. Signal is also generated directly in the frequency domain. However, the GW strain in each detector, h I (t; f ), is not generated independently; rather the product of the strains in the detectors, h * 1 (t; f ) h 2 (t; f ), is generated directly using the statistical properties of the strain correlation described in subsection III A, in particular, Eq. (3.10). We may write the product of the strains as a sum of its expectation value and statistical fluctuations: (5.1) Since our main aim is to generate, 2) and since statistically the variation in the signal terms are much weaker than the zero mean uncorrelated detector noise terms, we may simply drop the signal "fluctuations" term from Eq. (5.1) -that is, we may approximate the product of the detector outputs using the formula[56]: 3) For all the cases considered in this paper we have used flat source PSDs, i.e., H(f ) = constant.
In this analysis we assume the sky to be a collection of uncorrelated point sources of different strengths placed at every pixel. Moreover, the numerical analysis has been restricted to the case of equal power in each polarization. So the (injected) true sky is constructed by putting P true (Ω) = k P k δ(Ω −Ω k ), (5.4) where P k is the strength of the point source placed at pixel k, located in the directionΩ k (in order to inject only one point source at pixel k 0 , we make all the P k = 0 except for k = k 0 ). In this set up, the expression for the overlap reduction function [Eq. (3.7)] for the true SGWB strain becomes 5) where we use our usual notation for Γ(Ω, t) defined by Eq. (3.34).
We substitute the above in Eq. (3.10) and inject that simulated signal in noise using Eq. (5.3) to generate products of outputs from two detectors. In order to preserve the reality of time series data, the products of signals are generated only for positive frequencies and setting the negative frequencies equal to the complex conjugates of their positive frequency counterparts, that is, s I (t, −f ) = s * I (t, f ). The radiometer analysis is then run on the simulated data by placing filters Q(Ω k , t, f ; H) at each pixel k to generate the dirty maps.

B. Deconvolution: Clean Maps
Any deconvolution routine requires the beam function at all the points of interest on the sky. For a GW radiometer, the filters (and hence the beam functions) are dependent on the data set itself. So, if the beam function is calculated for each sky pixel, apparently the computational cost should go up by a factor of number of pixels (N pix ∼ 3000) times the cost to make one sky map of beam for a given pointing direction. However, we can use algebraic tricks to make this method computationally viable. A possible way to implement this method for the simple case of one baseline and equal power in each polarization is demonstrated below.
The beam and noise covariance matrices are given by: (5.8) Now one can see that the beam matrix is a summation of parts which depend on eitherΩ i orΩ j . So, it is possible to precompute the arrays λ(Ω, t), Γ(Ω, t), G(t, f ) and ∆x(t), only once, and then use them efficiently to evaluate the whole B matrix. Since each of these arrays are functions of any two of the three variablesΩ,t,f , the memory size required for each two dimensional array will not be too large. Once the arrays are computed, the next step is to obtain each element of the beam matrix separately. Each element requires the evaluation of two sum loops (like matrix multiplication) involving one exponential. The symmetry of the beam matrix (without the normalization constant) can be utilized here to reduce the computation cost by a factor close to 2. To summarize, a significant amount of CPU time required to make a skymap of the beam for one pointing direction could be utilized to make the beam maps for all other pointing directions. This is true for the noise covariance matrix as well, which, in this case, is proportional to the (unnormalized) beam matrix. A Fast Fourier Transform (FFT) with interpolation trick [42] and assumption of stationarity of noise for deconvolution (not for making the dirty map, where non-stationarity will be accounted for) can also be incorporated in future to reduce the CPU time.
In this simple case, since the beam matrix is a square matrix, so that, Even for the general cases of multiple baselines and polarized background, the estimation equation takes the above simple form [see section IV]. The first task was to compute the beam matrix, which happens to be the computationally most intensive task. A typical beam matrix for the LIGO baseline using 192 HEALPix pixels is shown in Figure 6. By construction B kk = 1 and |B kk | < 1 for k = k , hence the matrix is diagonal dominated. The "stripes" in the matrix are related to the pixelization scheme. The beam is stronger if the pixels are closer to the pointing direction and it weakens as the distance between the pixels and pointing direction increases. In other words, the pixels closer to a point source will have stronger contamination from the point source. However, since we have used a isoLatitude pixelization scheme, the indices of two neighboring pixels at different latitudes differ by the total number of pixels on that latitude. This fact is reflected in the plot of the beam matrix -the matrix is sparse with certain "periodic" behavior which produces the stripes in the plot. The matrix becomes even more sparse for finer resolutions as greater number of isoLatitude pixel rings pass through the core of the beam. Making a legible plot of the beam matrix for higher resolution is difficult, so the plot presented here is restricted to lower resolution -192 pixels instead of 3072. Each row of the matrix is the beam response function for the pointing direction that corresponds to the row index. The matrix is diagonal dominated as the radiometer receives maximum contribution from the pointing direction. The stripes are related to the isoLatitude pixelization scheme -the indices of the neighboring pixels at different latitudes differ by the total number of pixels on that latitude. The possibility of making the beam matrix more diagonal dominated using a nested pixelization scheme, where the indices of the neighboring pixels are close, is being explored.
Since the sparseness of the beam matrix depends on the pixelization scheme, it may be possible to make the beam matrix significantly diagonal by using a nested pixeliza-tion scheme, where the indices of the neighboring pixels are close. This possibility is being explored.
Sparse matrices are computationally easier to invert; however, the stability of inversion of such matrices is a numerical challenge. Therefore, as mentioned in section IV, instead of evaluatingP = B −1 · S [Eq. (5.9)], we choose to algebraically solve forP from the set of linear equations with the same number of unknowns as the number of equations (i.e., the system is not under or over constrained). The above equation can be cast into a linear system with symmetric kernel, b ·P = s, (5.11) by introducing two new quantities, (5.12) Comparing with Eq. (5.6) one can clearly see that b is a symmetric matrix. A symmetric kernel is always preferred by most of the algorithms for solving numerical linear equations. Moreover, it is possible to show from the algebra presented in subsection IV B that, to incorporate observations from multiple baselines, one can simply replace s and b by the sum of those respective quantities over all the baselines and solve Eq. (5.11) forP (which is, of course, independent of any baseline). Thus the extension of this analysis to incorporate a network of detectors becomes straight forward with these new quantities.
Following the discussion presented in section IV, we use a Conjugate Gradient (CG) iterative technique to solve the above set of linear equations. CG algorithm solves a set of linear equations A · x = b, where A is a square matrix and x, b are vectors, by minimizing the quadratic form 1 2 x · A · x − b · x. We use the minimum residual method, which efficiently utilizes the fact that b is symmetric and does not require b to be positive definite. The minimum residual method aims to minimize the residual |A · x − b| 2 itself, instead of the quadratic form 1 2 x · A · x − b · x. Further details on CG methods can be found in standard literature, e. g., [39].
The clean maps also contain pixel noise -partly due to the random noise present in the data and partly due to the numerical errors introduced at each stage of the pipeline, mainly during the process of deconvolution. There are pixels in the deconvolved map, which have negative values, even though the injected map is positive. To reduce the noise in the clean maps, we introduce an additional step: We compute the root-mean-square (RMS) noise "σ" in a map when there is no injected source. Then in the clean map (with source) we mask, that is, set to zero, all the pixels that have values less than a threshold of few σ. The number of iterations for deconvolution and the threshold for masking can be adjusted according to the tolerable levels of false alarm and false dismissal probabilities.
The above can be easily extended to handle real data where we have no control on the injections. One can calculate the equivalent of RMS noise for no injection by shifting the data streams from different detectors by a large time lag (much smaller than the segment duration), say, 1 sec, that corresponds to distances much greater than the earthly distances, so that, true GW signals are not added coherently. To be more careful, one can perform this exercise for a few large time shifts and confirm that the noise levels are not significantly different for different shifts.
It is, however, not so straight forward to measure the quality of deconvolution. The signal-to-noise ratio (SNR) of individual pixels do not carry enough significance, as the neighboring pixels are highly correlated. It is also difficult to define a quantity that can take into account the pixel-to-pixel noise covariance due to the difficulty in inverting the beam matrix. In this paper, we use a rather simplistic measure to quantify the quality of deconvolution, which is often used in image processing to measure the reconstruction error. We use a quantity known as the "Normalized Mean Square Error" (NMSE) [40], expressed in terms of the injected P and the estimatedP maps as Obviously, lower the NMSE, better the reconstruction. The whole analysis was tested for different kinds of injected maps consisting of localized sources and diffuse sources. In all these cases each pixel k of the injected map was assigned a value P k between 0 and 1 with a source PSD H(f ) = 5 × 10 −47 /Hz. This means that, if a pixel of a test map has strength 1, the standard deviation of the Fourier transform of stochastic GW coming from that pixel is H(f ) ∼ 7 × 10 −24 / √ Hz. This standard deviation is about one third of the standard deviation of Fourier transform of noise at the most sensitive frequency band of the LIGO-I detectors which is about 2 − 3 × 10 −23 / √ Hz. To our knowledge, the strength of anisotropic astrophysical GW background has not so far been predicted theoretically. However, if we try to extend the results from the all-sky averaged (isotropic) astrophysical background [41] to have a crude estimate of the strength of the anisotropic background, it turns out that, the PSD of the anisotropic astrophysical background in the universe is weaker than the H(f ) we have injected by roughly a few orders of magnitude. In the present work we have used an observation time of one day to demonstrate the method. With longer observation times and employing several baselines comprising of the upcoming advanced (more sensitive) detectors, the difference between the expected background and the detectable background would diminish or altogether disappear. We first inject a 4-pixel wide localized source near the Virgo cluster, a potential point source of SGWB, as illustrated in Figure 7. Figure 7(a) shows the injected map. Figure 7(b) shows the dirty map made from simulated data. Figure 7(c) shows the clean map, obtained by deconvolving the above dirty map with the beam using 15 conjugate gradient iterations and Figure 7(d) shows the same clean map masked using a 5σ threshold. It is evident that deconvolution has successfully localized the source in a relatively smaller area as compared to the dirty map. Still, one should note that, the deconvolution routines do not perform well when the injected source is like a delta function. This causes high NMSE, 1.22 for the unmasked and 0.64 for the masked clean maps, and significant loss of the peak strength of the reconstructed point source, as indicated in Figure 7. Moreover, increasing the number of iterations beyond a certain level actually deteriorates the quality of deconvolution due to noise amplification, and this level is dependent on the kind of source one is searching for. In this basic analysis we have used 15 iterations to search for localized sources and 40 iterations to search for broad sources, which offer reasonably clean deconvolution and comparatively low NMSE. Introduction of a minimum error criterion [43] to terminate the iteration process is being considered. Several other deconvolution algorithms are being explored in order to identify the one which is best suited for the GW radiometer analysis.
Next, we inject two kinds of diffuse sources, viz., one that is nearly equatorial and another that is distributed across multiple declinations, as illustrated in Figure 8. We injected modified (using FTOOLS[57]) galactic foreground seen in CMB temperature anisotropy measurements as our test patterns for the diffuse SGWB sources. The left panels of Figure 8 correspond to a modified form of the temperature anisotropy map measured by the WMAP satellite [44]. We emphasise that the sky looks different in barycentric coordinates, similar to what is shown in the top-right panel of Figure 8. We omit the coordinate transformation step intentionally in order to get a diffuse equatorial source. The right panels of Figure 8 correspond to a modified form of the temperature anisotropy map in the barycentric coordinates generated by the Planck Simulator [46]. One of the main modifications applied to both of these maps was to mask the brightest part of the galaxy. This step reveals more structures in the maps, which is useful for testing a deconvolution algorithm. Figure 8(a) shows the injected toy maps. Figure 8(b) shows the dirty maps obtained by the radiometer analysis. One can see that the dirty maps have lost all the fine structures present in the injected maps. Furthermore, they show certain features that were not even present in the injected map. Also, the pixel values in the dirty maps are spread over a range consisting of large positive and almost equally negative values. Figure 8(c) shows the clean maps recovered by 40 CG iterations. Clearly, many of the features of the injected maps have been recovered in the clean maps, which is also evident from the lower values of NMSE, 0.33 and 0.22 respectively. Also, the positivity of the estimated map has been vastly improved -the pixel values of the clean maps lie mostly on the positive side, as one should expect. Finally, Figure 8 (d) shows the masked clean maps obtained by using a 4σ threshold. Though the masked maps give better visual impression, masking can actually discard several pixels which have weak sources, thereby increasing the NMSE. In Figure 8(d), for example, masking increases NMSE to 0.36 and 0.33 respectively, though the masked maps look more similar to the injected maps shown in Figure 8(a), than the unmasked clean maps in Figure 8 (c).

VI. CONCLUSION
The stochastic astrophysical GW background is likely to be dominated by sources in the nearby anisotropic universe, so the detection of localized sources is more favorable than the all-sky-averaged search. Making a skymap of the SGWB sky has been a long standing ambition of stochastic GW research. Different analysis methods have been proposed to create skymaps by measuring the first few spherical harmonic multipoles of the sky. Here we have presented a direct approach of directed GW radiometer analysis. In this approach, the whole sky is decomposed in a discrete set of pixels and the contribution from each pixel is measured separately by correlating phase shifted detector outputs to generate the whole skymap, which is a clear application of earth rotation aperture synthesis. Specifically, for the AGWB detection statistic, we have defined a correlation statistic with a directed optimal filter that targets a fixed point in the sky by adjusting the time-delay across a baseline to track its rotation with the Earth. This statistic however provides us with a dirty map of the sky which we numerically deconvolve to obtain the true skymap. For this purpose, we have employed the conjugate gradient method. We numerically implement the deconvolution on simulated unpolarised GW sky maps obtained with the LIGO detector baseline. The success of this method is demonstrated by the recovery of simulated source distributions, namely, (i) of a point source, (ii) of a diffuse source in the equatorial plane, and (iii) of a diffuse source off the equatorial plane.
This work needs to be implemented on other baselines of the upcoming/future network of detectors such as LIGO-VIRGO, LIGO-LCGT, LISA etc. The outline of the analysis has been presented here. However, further detailed analysis and the implementation is a future goal. Even for the single baseline, SPA analysis shows that, perhaps a more efficient method of deconvolution, yielding better accuracy and convergence, lower computational costs and convenience of application may be possible using more sophisticated analysis, for example, one involving basis functions. The Maximum Likelihood framework presented here is, in fact, independent of any particular choice of basis. So once a suitable basis is chosen, rest of the analysis can be applied without requiring any major change. It may also be possible to deconvolve only a patch of the sky using a similar method [45].
The work presented in this paper should also benefit two other searches. First, since the long-duration integration of the data will essentially comprise a sum over short stretches, a large signal in a short stretch will constitute a candidate for a transient or burst (short-duration) event. Unlike the coincidence search being currently conducted for such events, our work will combine coherently the outputs of several detectors and, thus, improve their detectability. Second, the long-duration integration of the data should be able to find gravitational wave signals from modelled sources, such as pulsars. Although our method is optimal for searching unmodelled sources, it is not so for pulsars, the signals from which can be matched filtered. The problem with the latter method is that owing to the very large parameter space volume, an all-sky, all-frequency search for pulsars with matched filtering is not computationally viable. Our proposed method is not handicapped by this problem since it does not use the intrinsic source parameters for the search; rather it uses the data from one detector to 'filter' that from others in the network, after appropriately time-shifting them and weighting them with the respective antenna patterns.
SGWB anisotropy for each possible spectral index.
[52] Of course, one could combine the maps from different baselines with suitable pixel dependent weight factors wi, which may also reduce the noise by a factor of ∼ √ N b . But, it would have issues of combining data from baselines with different beam functions. Further analysis might be required to assess which method would be more advantageous.
[53] Notably, aperture synthesis technique using earth rotation was first introduced by Martin Ryle for radio observation of cosmic sources.
[54] For convenience we have used 86400 sec for one sidereal day, instead of 86164 sec. It hardly affects the accuracy of the results presented in this paper, as the same duration has been used for both injecting signals and analyzing data. Even for analyzing real data the same segments can be used, however, earth's rotation frequency should be accurately supplied in order to establish correct correspondence between time and celestial coordinates.
[55] http://healpix.jpl.nasa.gov/ [56] Note that, it is also possible to generate the correlated detector strains hI (t) independently and construct e sI (t; f ) = e hI (t; f ) + e nI (t; f ) for each detector I separately using simulated colored noise e nI (t; f ) to test the analysis. However, we chose the above method in order to reduce complications in the primary testing of the analysis presented in this paper.