Best chirplet chain: near-optimal detection of gravitational wave chirps

The list of putative sources of gravitational waves possibly detected by the ongoing worldwide network of large scale interferometers has been continuously growing in the last years. For some of them, the detection is made difficult by the lack of a complete information about the expected signal. We concentrate on the case where the expected GW is a quasi-periodic frequency modulated signal i.e., a chirp. In this article, we address the question of detecting an a priori unknown GW chirp. We introduce a general chirp model and claim that it includes all physically realistic GW chirps. We produce a finite grid of template waveforms which samples the resulting set of possible chirps. If we follow the classical approach (used for the detection of inspiralling binary chirps, for instance), we would build a bank of quadrature matched filters comparing the data to each of the templates of this grid. The detection would then be achieved by thresholding the output, the maximum giving the individual which best fits the data. In the present case, this exhaustive search is not tractable because of the very large number of templates in the grid. We show that the exhaustive search can be reformulated (using approximations) as a pattern search in the time-frequency plane. This motivates an approximate but feasible alternative solution which is clearly linked to the optimal one. [abridged version of the abstract]


I. INTRODUCTION
The worldwide network [1] of large scale interferometric gravitational wave (GW) detectors have started to take data. The network includes the detectors GEO600, LIGO and TAMA. It will be completed soon by the upcoming Virgo. The overall sensitivity of these detectors is continuously improving. Interesting upper-limits for the amplitude of GWs are being set and the first detection is hopefully not too far.
A large variety of astrophysical sources are expected to emit GWs in the observational frequency bandwidth of these detectors. From the data analysis viewpoint, the detection methodology for these sources depends on the availability of a reliable and complete model of the GW.
Generally speaking, the oscillations of the GWs are related to the orbital, rotational bulk motion of the constituents of the emitting system. Since the system loses energy by radiation, or because of some other physical process involved, its orbital period, and consequently the GW frequency can vary with time. In such case, the emit-ted GW is a frequency modulated signal i.e., a chirp. A detailed knowledge of the dynamics of the system is required to describe precisely the characteristics of the GW chirps, in particular the phase evolution. This may not be always possible as described in the following examples.
The GW emitted by a coalescing binary of compact objects can be divided into three phases (inspiral, merger and ringing). Although the GW can be obtained accurately in the inspiral phase when the bodies are well separated [2] (using post-Newtonian expansions) and in the ringing phase after they have merged [3] (using perturbative methods), the in-between merger phase still defeats both the numerical and analytical efforts [4] for modelling its highly non-linear regime. For large mass binaries, the merger phase contributes to a dominant fraction of the signal-to-noise ratio (SNR) [5]. In this case, the search method has to accommodate the significant lack of signal information.
Kerr black holes accreting matter from a surrounding magnetized torus are putative sources of the long gammaray bursts (GRBs) [6]. It is claimed that, the black hole spin energy is radiated away through GWs along with the GRB. The precise shape of the emitted waveform would need accurate hydro-dynamical numerical simulations.
A third example is the GW emitted in the form of the quasi-normal modes [7] by a newly born hot neutron star (during the cooling phase which follows the core collapse). Here, the characteristics of the GW depend on the equation of state of the proto-neutron star and various physical processes (like neutrino diffusion, thermalization and cooling) which are currently not known with accuracy.
All these three examples are expected to emit GW as an unmodelled chirp, the phase information being not (perfectly) known. Its typical duration in the detector bandwidth is of the order of a few seconds.
While matched filtering is a well-known and efficient detection technique when a precise waveform model is available, the lack of waveform information prevents us from using the same approach. It is thus natural to advocate for exploratory searches (based on partial information or "good sense" models) as opposed to targeted ones (relying on a precise model).
Various strategies [8,9,10,11,12,13,14] have been designed following this viewpoint, for the detection of transients of short duration (tenth to hundredth of milliseconds) or GW burst. Such transients are typically from supernovae core collapses. The notion of a varying frequency is not adequate for such a small number of cycles. It is thus not meaningful to describe such transients as chirps. Their detection is a different issue than the one considered here.
Here, we are interested in exploratory search specifically for unmodelled chirps. In the past, this question has already been investigated yielding a detection method, the Signal Track Search [13] (STS). The STS relies on the observation that, in a time-frequency (TF) representation, a chirp appears as a filiform pattern and this discriminatory signature can be searched for. A satisfactory implementation of this phenomenological argument calls for a proper TF representation (TFR) and pattern search algorithm. The STS results from "ad-hoc" choices for the above mentioned points.
In this paper, we propose a new method for the detection of unmodelled chirps. It is based on the same general principles (pattern search in a TFR) as the STS. Its originality resides in the clear link we establish between the method (i.e., the choices of TFR and pattern search) and an optimality criterion.
The paper is organized as follows. We state the detection problem in Sec. II. We introduce the general chirp model referred to as smooth chirp and we assume that most physically realistic GW chirps can be described by this model. The phase of a smooth chirp is an arbitrary continuous and differentiable function with bounded first and second derivatives. In Sec. III, we derive the optimal statistic for detecting a given smooth chirp in noise, which is usually referred to as quadrature matched filtering. The idea is then to apply this statistic for any smooth chirp, and select the maximum which is associated to the individual that best fits the data. This maximization has to be done numerically. To do so, the set of smooth chirp being a continuous set, has to be discretized. In Sec. IV, we show that grids of templates can be constructed for smooth chirps using chains of small chirps, we call chirplet chains (CC). We further prove that the grid is tight i.e., any smooth chirp can be closely approximated by a chirplet chain. The maximization of the statistic over the set of smooth chirp can be reliably replaced by a maximization of the set of CCs. However, the number of CCs being very large, the computation of the quadrature matched filter for all CCs is not tractable. In Sec. V, we propose a feasible (TF based) procedure for finding the best CC. We show that the quadrature matched filter can be reformulated approximately as a path integral computed in the TF representation given by the discrete Wigner-Ville (WV) distribution. As a result, the maximization of the statistic over the CCs amounts to obtaining the TF path of largest integral. We demonstrate that this kind of problem can be solved efficiently with dynamic programming. We detail our path search algorithm and we evaluate its computational cost. Finally, we compare the resulting algorithm with other methods in Sec. VI. Receiver operating characteristics obtained in several realistic situations demonstrate the superiority of the proposed approach.

II. SMOOTH CHIRPS IN GAUSSIAN NOISE
We introduce a general chirp model which we refer to as smooth chirp, and s(t) = 0 outside this interval. A smooth chirp is characterized by the amplitude A, the initial phase ϕ 0 and a smooth phase evolution φ(t) (without loss of generality, we assume φ(t) = 0 at the arrival time t = t 0 ). We define the term smooth as follows. A phase φ(t) is smooth if this function and its first three derivatives are continuous and we have for all t and where f (t) ≡ (2π) −1 dφ/dt is the instantaneous frequency. The chirping rate limitsḞ andF are chosen based on the allowed upper bounds obtained from general astrophysical arguments on the GW source of interest. The chirp model thus includes four parameters p ≡ {A, ϕ 0 , t 0 , φ(·)} which are not known a priori and need to be determined from the data. Let the signal be correctly sampled at the Nyquist rate f s ≡ 1/t s and let assume that we acquire the data x k by blocks of N samples. The signal is denoted by s k ≡ s(kt s ) for k = 0, . . . , N − 1 with the duration T = t s N . The noise n k is assumed to be additive white and Gaussian with zero mean and unit variance. Since the noise of GW detectors is colored, this noise model implies that a whitening procedure has been already applied to the data. (Therefore, the signal s k in Eq. (3) is a "whitened" version of the actual GW signal).
In this initial work, we restrict the smooth chirp model to have a constant envelope, although GW chirps are generally amplitude modulated. The constant envelope thus limits the descriptive power of the model. However, we argue that the model is still reasonable for many cases 1 and that the phase information plays a major role for detection of chirps. We leave the problem of detecting amplitude modulated chirps for subsequent work.

III. OPTIMAL DETECTION OF A SMOOTH CHIRP
For each block of N data samples, the signal detection problem is to decide which statistical hypothesis suits best to the data among the following two : In practice, this requires thresholding a functional of the data, commonly referred to as statistic. If the statistic crosses the threshold, H 1 is chosen as opposed to H 0 and vis a versa.
Due to the presence of random noise, this decision may not be always the right one. There are two types of errors associated to this: false alarms (decide H 1 while H 0 is present) and false dismissals (the opposite). The probabilities of occurrence of these two errors fully quantify the performance of a given statistic. This information can be used to rank the large number of possible statistics and to identify the best one. This is the approach followed by the Neyman-Pearson (NP) criterion [15]: the NP-optimal statistic minimizes one error probability, while keeping the other fixed to a given value. To be precise, in the present case, it minimizes the false dismissal probability for a fixed false alarm probability.
For simple detection problems, it can be shown that the likelihood ratio (LR) defined by λ ≡ P({x k }|H 1 )/P({x k }|H 0 ) is NP-optimal [15]. For smooth chirp detection problem described in Eq. (3), the LR can be easily obtained if we assume that the chirp parameters p are known in advance. When the parameters are not known a priori (which is the situation here), the ideal would be to have a statistic which is NP-optimal for all values of the parameters. This statistic is usually referred to as uniformly most powerful. However, it is not guaranteed that such statistic always exists, and even if it does, it is generally difficult to obtain.
A sensible solution consists in getting some kind of estimates for the unknown parameters and then use the LR assuming that the estimated value is the actual value. If we use maximum likelihood (ML) estimators of the unknown parameters, the resulting statistic is referred to as generalized likelihood ratio test (GLRT) [15] (or maximum likelihood test in the statistical community).
The GLRT can be shown to be uniformly most powerful in certain cases [15]. For our problem, up to our knowledge, this is an open question. Strictly speaking, it is thus not correct to qualify the GLRT as "optimal" (as is often done in the literature on GW data analysis). Nevertheless, we continue this misuse of language since the GLRT has proven to perform reasonably well and no better alternative appears to be available.
In the following subsections, we give the derivation of the GLRT statistic. We proceed with the maximization of likelihood ratio with respect to the parameters. Following [16], we note that out of the four parameters, A, ϕ 0 and t 0 are extrinsic parameters (known as kinematical or dynamical parameters) whereas φ(·) is an intrinsic parameter (which determines the shape of the chirp waveform). On the basis of this distinction, the maximization over the extrinsic parameters can be treated in a simple manner whereas the computation of the ML estimate of the intrinsic parameter requires a more sophisticated numerical treatment.
A. Maximize the likelihood ratio: A and ϕ0 In this subsection, we maximize the LR with respect to A and ϕ 0 . In case of Gaussian noise, it is more convenient to use log-likelihood ratio (LLR) which is expressed by We introduces k ≡ cos(φ k + ϕ 0 ) (such that s k = As k ) with the norm N ≡ N −1 k=0s 2 k . The maximization of the LLR Λ(x; p) over A is straightforward and gives the expression of the ML estimate of the amplitude, namelyÂ = N −1 k=0 x ksk /N . Inserting this expression into Eq. (4), we obtain The analytical maximization of the LLR over ϕ 0 deserves a little more attention. The same calculation has been performed for the detection of chirps from inspiralling binaries [17,18] but it is based on the assumption that N is independent of ϕ 0 which is not valid in the context of arbitrary chirps. In Appendix A, we detail this calculation and discuss the validity of this assumption.
We express the resulting statistic ℓ(x; t 0 , φ) ≡ Λ(x; {Â,φ 0 , t 0 , φ(·)}) using the following notations for the cross-correlation of the data with the two quadrature waveforms, and for the norms and cross-products of cos φ k and sin φ k , We distinguish two cases. In the degenerate case where the two quadrature waveforms are linearly dependent (φ k is a constant), O ≡ n c n s − n 2 x vanishes and we have Otherwise O > 0, the optimal statistic is (9) and is commonly referred to as quadrature matched filtering, (see Appendix A).
B. Maximize the likelihood ratio: φ and t0 The statistic ℓ results from a quadratic combining of the cross-correlations defined in Eq. (6). It can be seen as a "generalized dot-product" and can be related to a "distance" measuring the discrepancy between the data and template waveforms (or, in short, templates) defined by the phase φ (see Eq. (A8)). Maximizing ℓ over φ is equivalent to minimizing this distance.
The expression in Eq. (9) is for a given known phase φ. If the phase is unknown but belongs to the set of smooth chirps, then we need to minimize the distance within this feasible set. In other words, we need to find that smooth chirp which best fits the data i.e., find This maximization is difficult to tackle analytically and has to be done numerically. The set of smooth chirps is a continuous set and hence not easy to manipulate numerically without discretizing it. For this purpose, we introduce chirplet chains, which we discuss in the next section.
As described earlier, we process the data stream blockwise. We compute the statistic independently for each block. The maximization over t 0 is obtained by comparing ℓ max for neighbouring blocks and selecting the maximum. The ML estimate of t 0 is then given by the starting time of the corresponding block. The period separating two successive starting times thus defines the resolution of the estimate. If required, this resolution can be improved by increasing the overlap between two neighbouring blocks.
We now concentrate on the maximization of ℓ(x; φ) over φ in a given block. In the following, we remove t 0 from the arguments of ℓ to keep the notations simple.

IV. CHIRPLET CHAINS: A TIGHT TEMPLATE GRID FOR SMOOTH CHIRPS
In this section, we show that chirplet chains (CCs) can be used to construct template grids for smooth chirps. CCs are based on the simple geometrical observation: broken lines give good approximations of smooth curves. CCs are signals whose (instantaneous) frequency is a broken line. We verify that they are good approximation of the frequency curve of an arbitrary smooth chirp. We obtain the conditions ensuring that, for any smooth chirp, there always exists a sufficiently close CC. If these conditions are satisfied, the set of the CCs forms a tight template grid which can be used to search for an unknown smooth chirp. Finally, we examine the implementation of such grid for the toy (but realistic) model given by the inspiralling binary chirp.
. . N f } be a regular TF grid led on D by splitting the time axis into N t intervals of size δ t ≡ T /N t , and the frequency axis into N f bins of size In the following, the subscripts j and m designate the index of the time interval and the frequency bin of the grid, respectively. The index k ∈ {0, . . . , N − 1} denotes the time index of a sample.
A chirplet is a short piece of signal whose frequency varies linearly between two successive nodes of the grid. In the time interval j, we denote the time and frequency coordinates of the chirplet extreme points by (j, m j ) and (j + 1, m j+1 ). In the TF plane, it is thus represented by a line joining the grid nodes (t j , f mj ) and (t j+1 , f mj+1 ) (see Fig. 1). Concretely, this means that the phase φ k = φ(t s k) of a chirplet is a quadratic function of time, as follows, for t j ≤ kt s < t j+1 where We build the chirplet chain (CC) by enforcing chaining rules. The frequency and phase of this chain are continuous. Clearly, the continuity of the frequency is ensured by construction, while the phase continuity requires that for j ≥ 1, and fixing θ −1 = 0. The slope of the chirplet frequency as well as the difference between the slopes of the frequencies of two consecutive chirplets are bounded absolutely. These bounds are given by the A chirplet is a short piece of signal whose frequency varies linearly between two successive nodes of the grid. It is thus represented by a line joining the grid nodes. The slope of the chirplet frequency is limited (triangular region in light gray, here N ′ r = 1). We chain the chirplets, imposing the continuity of the chain and limiting the difference between the slopes of two consecutive chirplets (triangular region with dark gray stripes, here N ′′ r = 1). Admissible chirplets in time interval j belong to the intersection of these two regions associated to the regularity constraints. Clearly, a chirplet chain is represented by a broken line in the TF plane.
two parameters N ′ r and N ′′ r respectively such that (i) Clearly, a CC is represented by a broken line in D. The two parameters N ′ r and N ′′ r control the regularity of this line. Consistently, we will refer to (i) and (ii) as regularity constraints.
The instantaneous frequency of a smooth chirp is associated to a smooth curve in D. In the same manner that broken lines are good approximations of smooth curves, CCs are good approximations of smooth chirps. Since CCs form a finite discrete set, they sample 2 the set of smooth chirps. In other words, they form a template grid of this set.
It is important to know whether this template grid is sufficiently tight i.e., whether for any smooth chirp, there always exists a sufficiently close CC. The template grid tightness is controlled by the choice of the four parameters defining the set of CCs, namely the TF grid parameters N t , N f and the regularity parameters N ′ r and N ′′ r . The first and preliminary step to address the tightness question is to define a distance measuring the "similarity" (or ambiguity) between two different chirps.

B. Distance in the set of smooth chirps
We follow the approach suggested in [18] and assume that we "receive" a chirp whose phase φ is different than the template phase φ * . We set x k= s k = A cos(φ k + θ), and consider Clearly, L measures the reduction factor of the "detection peak" due to the mismatch between the chirp present in the data and the chosen template. It is a relative measurement done with respect to the ideal case where the template matches exactly the considered chirp. In this sense, it can be interpreted as a SNR loss.
Since L ≥ 0 and equals 0 when φ = φ * , it can be interpreted as a distance between the chirps. Note that L does not depend upon A. It depends only on the phases φ and φ * , but this dependency is difficult to perceive intuitively from its definition in Eq. (13).
An approximated but much simpler expression can be obtained for nearby chirp and template retaining the leading terms of a Taylor expansion for small ∆ k ≡ φ * k − φ k . The approximation is detailed in Appendix B and leads to the following expression with ∆ = 1/N N −1 k=0 ∆ k . Interestingly, we recognize in this expression the empirical estimate of the variance of the phase difference ∆ k . With this definition of the distance, two chirps are "identical" (their distance measured by L is zero) if and only if they have the same phase evolution up to an additive offset.

C. Is the CC grid tight ?
In this section, we address the grid tightness problem and find the regularity and TF grid parameters which yield a tight template grid of CCs. We proceed as follows: we first consider an arbitrary smooth chirp of phase φ. Then we construct a CC "geometrically" close to this chirp. We check if this CC is admissible i.e., if it satisfies the regularity constraints. This imposes two conditions on the parameters. Finally, we check whether it is effectively close to the chirp (as measured by the metric). This yields the loss due to the approximation of the chirp by a CC in the worst case. For tight CC grid, this loss (homogeneous to a SNR loss) has to be small which imposes one more condition on the parameters. 1. "Geometrically" close CC Let φ(t) and f (t) be the phase and frequency of an arbitrary smooth chirp. The frequency evolution appears as a smooth curve in the TF plane.
We construct a CC "geometrically" close to this chirp as follows: for each j = 0, . . . , N t , we choose the node (t j , f * j ) of the j-th column of the TF grid defined in Sec. IV A which is nearest to the point (t j , f (t j )). We draw the broken line by joining these nodes (see Fig. 2). The associated CC is the "geometrically" close CC to the chirp under consideration and we denote its phase φ * .

Admissibility of the "geometrically" close CC
For a given chirping rate limitsḞ andF , the "geometrically" close CC may or may not satisfy the regularity constraints. This depends on the regularity and TF grid parameters. Below, we investigate this question.
a. 1 st order regularity -Let us consider the chirplet of the interval j, we have Using the mean value theorem 3 (see e.g., [19]), we get Thus, the geometrically close CC satisfies the regular- We rewrite this condition in the following form where N ′ ≡Ḟ T 2 is an adimensional quantity which depends only on the fundamental characteristics of the smooth chirp model. b. 2 nd order regularity -We consider two successive chirplets in intervals j − 1 and j. Using a similar method, the difference of their slopes can be bounded by Two consecutive applications of the mean value theorem to a function f (·) which satisfies Eq. (2) for all s ∈ [r, t] with 0 ≤ r < t ≤ T yield the following result Therefore, the geometrically close CC satisfies the reg- We rewrite the above condition as where N ′′ ≡ 3F T 3 is an adimensional quantity which depends only on the fundamental characteristics of the smooth chirp model, and thus from the astrophysical input. It is related to the maximum overall curvature of the chirp frequency or more precisely to the largest number F T 3 of Fourier bins that the chirp frequency can sweep, the linear trend being removed.
3. Is the "geometrically" close CC effectively close?
We obtain a worst case estimate on the distance between the smooth chirp and its "geometrically" close CC by bounding the variations of their phase difference. We begin with bounding the frequency discrepancy.
The starting point is the following lemma (inspired from [20], p. 23) obtained from the application of the mean value theorem and some algebraic manipulations. If f (·) satisfies Eq. (2) for all s ∈ [r, t) then However, this upper-bound can be slightly improved: the term (t − r) 2 over-estimates the more precise bound (Note that g(t) = 0 and g(r) = 0 as expected.) In the worst case, we have s = (t+r)/2 and g(s) = 3/8(t−r) 2 as opposed to (t−r) 2 . We include this gain in the following.
We apply the lemma in Eq. (24) with t = t j+1 and r = t j . The term inside the square brackets in the left hand side of Eq. (24) is equal to the frequency at time s of the chirplet obtained by joining f (t j−1 ) to f (t j ) (see the dot-dash line in Fig. 2). We denote this frequencỹ f (s), and then obtain Since Integrating both sides of the above inequality between two successive points r = t s (k − 1) and which constraints the variations of the phase difference ∆ k = φ * − φ * k . We prove in Appendix C that the approximated distance L(φ, φ * ) as shown in Eq. (14) is maximum under this constraint, when ∆ k = ±2π∆ f t s k. In this case, the maximum is We can finally state the following tight template grid theorem: for all smooth chirps of phase φ, there exists a CC of phase φ * such that where µ ′ = π 2 T 2 (3F δ 2 t /4 + δ f ) 2 /12 is the maximum (i.e., in the worst case) energy SNR loss due to the mismatch between the smooth chirp and the chosen template given by a close CC. The corresponding maximum amplitude We express the maximum SNR loss µ in terms of the CC parameters as In principle, this loss can be made arbitrarily small by choosing N t and N f adequately. Therefore, the grid of CC can sample the set of smooth chirps tightly.
It is evident that two types of losses contribute to µ. The first one is related to the geometrical error due to the fact that the model is a broken line: within the time intervals of the TF grid, the model is a straight line which cannot perfectly follow the curvature of the smooth chirp frequency. The finer the grid along the time axis, the smaller the time interval, the better the line fits the smooth chirp frequency, thus reducing this error. The other is related to the quantization error as we require the node of best broken line to belong to the TF grid: there is a difference between the best broken line we can possibly draw and the closest (quantized) one with vertices belonging to the grid. The finer the grid along the frequency axis, the closer the quantized line from the original, thus reducing this error. The maximum SNR loss is the function of these two independent parameters.
When N t = N ′′ and N f = 2N , the maximum SNR loss is of order ∼ π 2 /96 ≈ 10% and the two types of errors contribute equally. The same maximum SNR loss can be achieved with other choices for N t and N f . In the next section, we propose a criterion to solve this indetermination.

D. Smallest tight CC grid
As elaborated in the previous section, the tight template grid theorem gives the condition on the TF grid parameters N t and N f which ensures that the template set (the CCs) covers all the feasible set (the smooth chirps) with a given accuracy specified by the maximum SNR loss. The same accuracy can be achieved with several pairs of parameters, leading to a parameter indetermination.
For a small maximum SNR loss, the maximization of the LLR in Eq. (10) performed over the set of smooth chirps can be safely replaced by a maximization over the set of CCs, i.e., The statistic ℓ max in Eq. (30) results from the CC which maximizes the statistic or in other words, from the waveform of the template set which best fits the data. Generally speaking, when the data is noise only, the larger the number of (reasonably different) waveforms in the template set, the larger the risk that one of the waveforms fits the noise and consequently, the larger the false alarm rate.
The TF grid parameters N t and N f influence very differently the number of CCs. The above argument suggests to select the parameters which minimize the numbers of CCs, for a given specified maximum SNR loss. We refer to the smallest tight CC grid as the set of CCs which results from this constrained optimization.
Let us first estimate the number of CCs. According to the regularity conditions, each of the number N c ∼ N f (2N ′ r + 1) of possible chirplets in a given time interval can be chained to (at most) 2N ′′ r + 1 chirplets in the next time interval. Counting CCs is then a combinatorial problem. We have N c chirplets in the first time interval, and 2N ′′ r + 1 possible choices for the N t − 1 successive time intervals. Neglecting what happens at the lower and upper boundaries of the frequency axis (i.e., near DC and Nyquist), we obtain an upper-bound on (the logarithm of) the number N cc of CCs as In practice, we have N t ≫ 1. The second term largely dominates the right hand side and the first term can be neglected. We thus have ln N cc ∼ N t ln(2N ′′ r + 1). At this point, it is convenient to introduce u ≡ N ′′ /N t and v ≡ 2N/N f and express the smallest tight CC grid problem with these variables. From the regularity constraints, we have N ′′ r = 4u 2 /(3v) + 2. We want to minimize the number of CCs subject to a given maximum SNR loss i.e., Combining the derivatives of the objective dg = ∂ u gdu + ∂ v gdv and of the constraint dv = −2udu, we obtain the equation giving the admissible point where the derivative dg/du vanishes, viz.
where we defined y ≡ 8u 2 /(3v) + 5. This equation can be solved numerically and gives y ≈ 8.95. Let α ≡ u 2 /v = 3(y − 5)/8 be the ratio between the two errors contributing to µ. We obtain the smallest tight template grid when this ratio is α ∼ 1.48. For a required µ, we get the parameters of the resulting grid as follows.
The parameters of the smallest tight template grid may not be always suitable in practise (see the later discussion on the implementation and numerical contingencies in Sec. V B 5) but they give interesting indications.
At this point, it is useful to see with an example if the proposed model and template grid sound tractable in a realistic case.

E. Toy model and CC parameters
We use the inspiralling binary chirps as a toy model to check whether the various parameters have reasonable order of magnitudes in this physically realistic situation.
We consider the Newtonian approximation of the chirp whose frequency evolution is given by [17] where t 0 denotes the arrival time. In practice, the arrival time corresponds to the time at which the chirp enters the detector's bandwidth i.e., when its frequency reaches the low frequency (seismic) cut-off (denoted f 0 ) of the interferometric detectors. The T defines the chirp duration, i.e. the time taken by the chirp from the arrival time till the binary coalescence. The chirp duration can thus be estimated by where M is the total mass (objects of equal masses).
In this calculation, we assume the seismic cut-off frequency 4 of 20 Hz.
We fixḞ andF to the corresponding values of the first and second derivatives of the chirp frequency, pertaining to the last stable circular orbit (LSCO 5 ) viz., We note that T, f LSCO ,Ḟ andF decrease with an increasing mass. When M increases, the chirp is thus shorter, less steep and curved, and it reaches only the lower part of the frequency band. From the above equations, we deduce that The sampling frequency f s is fixed by the width of the observational band of the GW detector, namely f s = 2048 Hz. We thus have Following Sec. IV D and fixing µ = 10%, the smallest tight CC grid has the following parameters for the TF and for the regularity, we have The orders of magnitude for the various parameters appear to be reasonable. Since these parameters don't increase with M , the template grid defined with the above values remains acceptable and tight for higher masses M ≥ 50M ⊙ .

V. FIND THE BEST CHIRPLET CHAIN
In Sec. IV, we have shown that, the SNR loss due to the use of a CC instead of the ideal template can be made small with an appropriate choice of parameters i.e., by making the CC grid tight. In other words, the problem of detecting a smooth chirp is equivalent to the one of detecting a CC as stated by Eq. (30). The maximization over the set of CCs -involved in the latter case -has the great advantage that it can be resolved numerically.

A. The exhaustive search is not feasible
Since CCs are in finite number, an obvious maximization procedure is to try them all and select the one which gives the maximum. To understand whether this solution is tractable, we need to know how many CCs are there. We consider that the search parameters N t , N f , N ′ r and N ′′ r are known and can be obtained from the physical and grid tightness requirements as discussed earlier.
We already presented an estimate of the number of CCs in Eq. (31) and saw that it grows exponentially with the number of time intervals of the TF grid. This estimate computed for the toy model example presented in the previous section gives log 10 N cc ≈ 1400. Clearly, this number is too large for an exhaustive search (i.e., computing ℓ for all possible CCs) to be carried out in real time on existing computers. Generally speaking, since the number of CCs increases exponentially with N t , the cost of an exhaustive search scales exponentially with N t and thus with the problem size N .
In the next section, we propose an algorithm which gives a good estimate for the optimal CC instead of the exact solution of the maximization problem described in Eq. (30). However, as opposed to the exhaustive search, the computational cost of this algorithm scales as a polynomial of the problem size N .

B. Near optimal search
The maximization of ℓ(x; φ) in Eq. (30) is a combinatorial maximization problem. The existence of an efficient solving algorithm for such problem is related to the structural properties of the "objective" function to be maximized, that is, ℓ in the present case. In this Section, we show that ℓ can be reasonably approximated by a path integral computed over a time-frequency representation (TFR) of the data. The structure of the approximated statistic allows us to perform its maximization efficiently with dynamic programming. The approximation goes through two stages with an intermediate step for the reformulation of the statistic in the TF plane. As shown in Eq. (A8), the statistic ℓ can be expressed as where the templatesc k ands k are the orthonormalized counterparts of the waveforms in quadrature cos φ k and sin φ k obtained from the Gram-Schmidt procedure as given below Let {cos φ k } and {sin φ k } be the vectors in R N associated to the quadrature waveforms. As it appears in the above expressions, these vectors are generally not orthonormal. Their deviation from orthonormality can be quantified with two parameters, defined by which are related to their vector lengths n c , n s and their scalar product n x . The parameter δ measures the relative difference in the vector lengths while ǫ measures the angle between them. The vectors are orthonormal if and only if both δ and ǫ are zero. Intuitively, if the quadrature waveforms oscillate sufficiently, they should be close to orthonormality and δ and ǫ are expected to be small. This intuition is examined in details in Appendix D, in which we exploit the fact that φ is not arbitrary but it is the phase of a CC. We show that if φ is the phase of a CC whose node frequencies are in the bandwidth f l ≤ f mj ≤ f s /2 − f l for all j with then |δ| η and |ǫ| η.
In the following, we assume that this condition is satisfied. This imposes the CC frequency not to approach arbitrarily close to the DC nor to the Nyquist frequencies. Since the amplitude of the instrumental noise of GW interferometers diverges rapidly when going close to DC, it is not expected to detect GWs at low frequencies. Therefore the reduction of the bandwidth in the low frequency region should not be a problem as long as f l remains small. We will check later with examples that the reduction of the useful bandwidth is indeed sufficiently small.
Provided a good choice of η (and checking the consequences on f l ), this approximation error can be made small. We can safely replace ℓ byl which we express as the following complex sum : 2. Go to time-frequency: Moyal The expression ofl in Eq. (54) computes the canonical hermitian scalar product between the data and a complex template waveform. While Parseval's formula allows an equivalent formulation of this scalar product in the frequency domain, Moyal's formula does the same in the TF domain, provided the use of a unitary TFR. One such TFR is the discrete Wigner-Ville (WV) distribution defined in [21] and given by with k n ≡ min{2n, 2N − 1 − 2n}, p n,k ≡ ⌊n + k/2⌋ and q n,k ≡ ⌊n − k/2⌋ where ⌊·⌋ gives the integer part. The arguments of w x are the time index n and the frequency index m which correspond in physical units, to the time t n = t s n and the frequency is Using Eqs. (54) and (56), we rewritel as the innerproduct of two TFRs namely, the WV of the data w x and the template WV w e which is the WV of complex template waveform e k ≡ exp iφ k , Chirp signals are easily modelled and described in the TF plane. Qualitatively, we expect that the TFR of a chirp signal have large values essentially in the vicinity of a curve corresponding to their instantaneous frequency and vanishes elsewhere. The template WV w e being the TFR of a chirp, it shares these characteristics. In the following section, we make use of this feature to simplify the statistic.

Approximation 2: the WV of a CC is almost Dirac
With continuous time and frequency variables, it is well-known that ( [22], p. 130 and also 217) the WV of a linear chirp (i.e., a chirp whose frequency is a linear function of time) is a Dirac distribution along the TF line associated to the chirp frequency.
We assume that this remains reasonably true for discrete time and frequency and when the chirp is non-linear (and in particular, when it is a CC). More precisely, we consider that we have where m n = [2T f n ] where [·] denotes the nearest integer.
Here, f n is the instantaneous frequency of the (possibly non-linear) chirp. Eq. (58) dissembles two approximations which we explain now.
For discrete time and frequency, the discrete WV of a linear chirp can be calculated analytically [21]. For the positive frequencies i.e., for 0 ≤ m ≤ N , the model in Eq. (58) is an acceptable approximation of the exact result, as illustrated in Fig. 3. However, there is a significant difference in the negative frequencies i.e., for N + 1 ≤ m ≤ 2N − 1. In this region, the discrete WV exhibits aliasing terms (clearly seen in the left panel of Fig. 3) which are closely related to the unitarity property of the WV. In [21], the aliasing terms are shown to be oscillating terms (switching signs) with smaller amplitude than the preponderant terms modelled by Eq. (58). We can then expect their contribution to the summation in Eq. (57) to be negligible.
It is well known [22] that interference terms appear when computing (both continuous and discrete) WVs of non-linear chirps. They can be related to the quadratic nature of this distribution (see [22] for a detailed analysis of the nature and geometry of these interferences). Interference terms change sign rapidly (see Fig. 3, right panel) and can be neglected for the same argument invoked for aliasing terms.
Inserting Eq. (58) into Eq. (57), we get the following approximation ofl : We see that this statistic results from the integral of the WV of the data along the TF path determined by the CC frequency f n . In other words, this integral is the area under this TF path. We refer to this quantity as the path length 6 .
With this approximation, the maximization of the statistic in Eq. (30) amounts to finding the path giving the largest integral, or the longest path. Efficient methods exist for longest path problems [23]. These methods exploit the structural properties of path length (or integral) measurement, in particular, additivity. The length of this entire path can be measured by splitting the path and summing the length of its constituent parts. Thanks to this property, the maximization problem can be decomposed into a recursive series of small problems, each of them being solvable in polynomial time. This is the main principle of dynamic programming (DP), which we describe in the next Section.
We remind the reader that, contrarily to the new statisticl, the exact statistic ℓ is not additive. DP cannot be applied to maximize ℓ. For the negative frequencies, the WV distribution exhibits aliasing terms (we highlight them by a background in light gray) which we neglect in the simplified model in Eq. (58). right: when the chirp frequency is not linear (here, it is a parabolic chirp), interference terms appear (evidenced by a dark gray background). Their contribution are also disregarded in the simplified model. Note that the WV of the non-linear chirp chosen for this illustration do present aliasing terms (with a light gray background), but they have a smaller amplitude than in the linear case.

Maximization with dynamic programming
DP is a classical method [23] for solving combinatorial optimization problems. As explained in the previous section, the idea is to decompose the problem into smaller ones that can easily be solved. In our context, the natural decomposition is given by the tiling of the time axis into chirplet intervals i.e., t j ≤ t s n < t j+1 or equivalently jb ≤ n ≤ (j + 1)b − 1 where b ≡ δ t /t s is the number of samples in an interval. The overall path integral is equal to the sum of the integrals computed in each chirplet interval marked with an superscript index as follows where m j n = [2N f j n ] and the frequency f j n follows the line joining the grid points (t j , f mj ) and (t j+1 , f mj+1 ). We also denote with a subscript index j, the path integral up to interval j, viz.
DP relies on the principle of optimality. We elaborate this principle with the help of Fig. 4. We consider the chirplet in time interval j. In a chain passing through this chirplet, the regularity constraints limit the choice of preceding chirplets in the time interval j − 1. We suppose that there are only three such chirplets; namely α, β and γ. Now, consider the time interval j − 1. We assume that we know the chain passing through the chirplet z (z being either α, β or γ) and giving the largest path integral summed up to the interval j −1. We denote this quantity byl (z) j−1 . (In this discussion, the chirplet and its associated CC are designated by the same label).
We compute the path integral contribution in j-th interval for the considered chirplet, and add the result tõ ℓ (z) j−1 to obtainl (z) j for all the three paths z = α, β and γ. We mark with (⋆) the optimal chain associated to the global maximum ofl (i.e., summing from interval 0 to N t − 1) which we denotel (⋆) . We further assume that this optimal chain (⋆) follows (α) up to interval j − 1, continues following the considered chirplet in interval j and proceeds to the last interval j = N t − 1 with some chain (δ), hencel (⋆) =l (α) j +l (δ) wherel (δ) denotes the contribution of the chain (δ).
The principle of optimality states that the optimal chain (⋆) has the largest path integrall (⋆) j−1 at interval j − 1 as compared to all the other chains passing by the same chirplet in interval j. In particular, this means that ℓ Proof by contradiction: Let us assume thatl We construct the chain (△) formed by (β), the considered chirplet in interval j and the chain (δ). This CC is admissible. By construction, its path integrall (△) =l (β) j +l (δ) is larger thanl (⋆) . Therefore, the chain (⋆) is not optimal which contradicts our hypothesis -QED.
We apply this principle recursively starting from interval j = 0 and incrementing. For each chirplet interval and for all N c chirplets of interval j, we keep only the CC maximizing the path integral up to this point and we discard the others. This procedure "prunes the combinatorial tree" and avoids to consider useless candidates before going to the next interval.
When the recursion reaches the last interval N t − 1, we end up with a number N c of CCs ending with a different last chirplet and having the maximum path integral among all chains with the same last chirplet. Finally, within these "short-listed" candidates, we select the chain with the largestl which is the global maximum.

Numerical contingencies and computational cost
To summarize, we started with the initial problem in Eq. (30) of finding the CC with the largest statistic. We rephrased this problem (using approximations) into a longest path problem in the TF plane. Here, path refers to the TF curve followed by the frequency of the CC, and the length is given by the integral of the WV of the data along the path. The maximization of the path length The definition of the CCs does not comprehend the fact that we only have access to discretized versions of the data and of their associated TF domain, denoted D in Sec. IV A. We begin this section by a discussion on these aspects.
a. Discretization issues -On one hand, the definition of the set of CCs relies on a TF grid sampling the continuous TF domain D. Theoretically, this grid can be refined arbitrarily. On the other hand, the search operates effectively using the discretized version of D, resulting from the sampling associated to the WV. This fixes a maximum TF resolution which cannot be surpassed.
It is useless to increase the resolution of the TF grid used for defining CCs beyond the one defined by WV. The WV divides the time axis into N intervals and the frequency axis into N bins 7 . Consequently, we have the following limitations, N t ≤ N and N f ≤ N . Furthermore, in order to have time intervals (resp. frequency bins) of equal size , the TF grid parameters N t (resp. N f ) must be divisors of N .
All these requirements limit the choice of N t and N f . It may happen that the parameters of smallest tight CC grid are not suitable because of that. Note that in the case we consider, we are generally led to adopt the finest resolution for the frequency axis i.e., N f = N .
b. Estimate of the computational cost -We estimate of the computational cost by counting the floating point operations for all the primary subparts in the course of the procedure. The computation of the WV of the data involves N FFTs with time base 2N f [21], such that the cost of this part is about 5N N f log 2 N f (assuming a standard implementation with RADIX-2).
The number of operations required by DP is better estimated by grouping them by types, rather than by a sequential assessment. The path integrall j in Eq. (61) is computed (with b additions) only once for each N c chirplets of all N t intervals, with a corresponding cost equal to N c N .
For each of the N c chirplets in each interval, the algorithm selects among the (at most) 2N ′′ r + 1 possibly connected paths. This procedure is repeated N t − 1 times, and thus requires ∼ N t N c (2N ′′ r + 1) operations. Knowing that the number of chirplets is N c ≈ (2N ′ r + 1)N f , the overall cost C thus scales with which is a polynomial of the problem size.

VI. APPLICATIONS
In this section, the proposed method is evaluated with several numerical tests and compared with two other TF based algorithms for the detection of unmodelled chirps, namely the Signal Track Search (STS) [12] and Time-Frequency Clusters (TFC) [14]. The simulation code 8 of these tests uses the implementation of these algorithms provided by [24]. We first give a brief presentation of STS and TFC.
A. Existing algorithms

Signal Track Search
We have seen earlier that the TFR of a chirp signal can be essentially described in the TF plane as a regular alignment of large values forming "ridges" along the instantaneous frequency evolution. The STS uses this observation as a heuristic basis: detecting chirps amounts to finding ridges in a TFR.
In practise, the algorithm extracts the ridges from the WV distribution 9 of the data. Because of the presence of noise, image processing techniques are required to get a good ridge extraction. The authors chose an algorithm which is normally used for road extraction from aerial images. This algorithm is based on the fact that a ridge is a locus of points having a maximum curvature (as measured by the second derivative) in the transverse direction and a small gradient along the longitudinal direction. A hysteresis thresholding procedure is applied over the second derivative of the WV (smoothed by a low pass filter) 8 Freely distributed scripts are available at http://www.obs-nice.fr/ecm for reproducing all the illustrations presented here. 9 In [12], the authors use the standard definition of the discrete WV originally proposed by Claasen-Mecklenbräuker (see [21] for a definition and a detailed discussion). This definition differs from the one presented in Sec. V B 3. In particular, it does not satisfy unitarity.
to detect TF points which suffice the above condition, and to grow iteratively chains of TF points from these ridge precursors. In [12], the ridge length (number of TF points in a ridge) is then employed as the detection statistic. However, we don't use this definition here, but we rather consider the one given by the largest path integral computed along the detected ridges. We observed that this variation outperforms the original definition of STS.

TFClusters
TFClusters is initially thought to detect short oscillatory transients (and not specifically chirps). The TFR of such transient is sparse i.e., the TF contents is essentially described by few components of large amplitude. The basic idea of TFClusters is that, for reasonable SNR, the amplitude of the transient components is larger than the noise.
This motivates the thresholding of the TFR of the data, given by the spectrogram (modulus square of shorttime Fourier transform), to retain the TF points with the largest values. A clustering algorithm is used to group the selected points. "Significant" clusters are chosen whose cardinals are greater than a threshold. "Insignificant" clusters are merged iteratively if they are sufficiently close to eventually form significant clusters. The statistic is then chosen to be the maximum sum of the TF powers over the clusters in the resulting list of significant clusters.

Discussion
It is important to stress a major difference between STS and TFClusters and the proposed method. By construction, the formers work well provided that the signal "stands above" the noise somewhere in the TF plane. If we define a local SNR in the TF plane (by computing at a given TF point, the squared difference of the mean values of the TFR under the hypotheses H 0 and H 1 divided by its variance under H 0 ), then this is equivalent to say that the local SNR has to be large at least for some TF points. However, just like the standard matched filtering, a detection with the best CC algorithm requires the global SNR (obtained by summing the local SNRs for all TF points) to be large. Clearly, this is a less stringent condition.
TF path integration is a central ingredient of the best CC search. This idea is also used for other methods developed for the detection of other GW sources, for instance, for inspiralling binaries, we can cite [25,26] and for the periodic GW sources, the Hough transform [27] and the stack slide searches [28].
Several distinctions must be stressed. First, the TF representation we use here (discrete WV) satisfies a specific and crucial property, namely unitarity. This allows us to link the final statistic to the quadratic matched filtering. TF representations based on short-time Fourier or wavelet bases used by the above methods are not unitary. Second, other methods require a precise model of the TF path (relying on the astrophysical source modelling) as opposed to our method. For the problem adressed here i.e., the detection of unmodeled chirps, we have shown that CCs can be treated as an effective finite template grid. We could then imagine to apply one of the above methods and integrate along the entire set of TF paths associated to CCs. This is however computationally impossible because of the too large number of CCs, as already discussed in Sec. V A.

B. Newtonian chirps: illustrations and benchmark
For the illustration of the best CC search, we use the Newtonian chirp signal introduced in Sec. IV E. We recall that the frequency of such chirp is a power law given by Eq. (35). Normally, the Newtonian chirp also includes a prescribed evolution of the chirp amplitude. However, for simplicity and better match with our initial model, we decide not to take this into account and keep the chirp envelope to a constant.
The Newtonian chirp is completely defined by the total mass M of the binary (if we assume that the objects have equal masses) and its initial frequency f 0 . Fig. 5 presents an example of a typical Newtonian chirp signal, where we set M = 7.3M ⊙ and f 0 = 96 Hz. The chirp duration is T = 0.5 s. We fix the sampling frequency to f s = 2048 Hz (therefore, the number of samples is N = T f s = 1024). A white Gaussian noise of unit variance is added to the signal.
Within the GW literature, it is customary to define the SNR through matched filtering (assuming the initial phase is known a priori). We follow this definition which gives in the present case, We note that, with this definition we have ρ 2 = 2ℓ(s; φ) (the factor of 2 accounts for the unknown initial phase).
We choose to scale the chirp amplitude to a SNR ρ = 20.
We apply the best CC search to this signal with the following search parameters. We arbitrarily fix the chirping rate limits to beḞ = 8192 Hz/s andF = 917.5 kHz/s 2 . These values are quite smaller than the ones expected at the LSCO (see Sec. IV E) but the time instant when theses limits are reached is close (few tenths of milliseconds before) to the LSCO. In Fig. 5, the time instants when the chirp (the solid line on the right panel) reaches the chirping rate limits (with dotted vertical lines) and when the binary system reaches LSCO (with dashed-dotted horizontal line) are indicated. We fix the frequency axis sampling to the finest accessible resolution i.e., N f = N = 1024. Similarly, we choose the smallest possible chirplet size with N t = N/2. The rest of the parameters are derived from the regularity constraints. In this respect, it is useful to calculate the adimensional characteristics of the problem i.e., N ′ = 2048 and N ′′ = 586.6. The resulting parameters are N ′ r = 9 and N ′′ r = 3, which gives a maximum SNR loss µ ≈ .28. We recall that the best CC search relies on the approximation of the optimal statistic by a complex sum presented in Sec. V B 1. The parameter η controls the relative precision of this approximation. From the results of Sec. V B 1 and the above general chirp specifications, the approximation holds with a precision η = 0.14 in a frequency bandwidth [f l , f s /2 − f l ] with f l = 96 Hz which coincides (at least for the low frequencies, which are most important) with the frequency support of the present chirp. Fig. 5 presents the result of the best CC search with the above choice of parameters. The best CC closely matches the actual instantaneous frequency in the region where the regularity constraints are satisfied.
An example is obviously not sufficient to evaluate the method thoroughly. Receiver operating characteristics (ROC) gives a systematic assessment of the performance. The ROC of a given statistic l is the diagram giving the detection probability P d (l 0 ) ≡ P(l ≥ l 0 |H 1 ) versus the false alarm probability P f a (l 0 ) ≡ P(l ≥ l 0 |H 0 ) at a given SNR and for all thresholds l 0 .
For this exercise, due to computing limitations, we prefer short signals with a small number of samples N . We choose a Newtonian chirp with total mass M = 11M ⊙ and initial frequency f 0 = 96 Hz which has a short duration T ∼ 250 ms. Choosing the sampling frequency f s = 1024 Hz, we have N = 256 samples. White Gaus-sian noise is added to the signal and the amplitude is scaled such that the SNR is ρ = 10.
We fix the chirping rate limits toḞ = 8.192 kHz/s andF = 1.05 MHz/s 2 . Like the above example, these limits are reached at a time instant close to the LSCO. We choose the finest TF grid parameters N t = 128 and N f = 256, and the regularity parameters N ′ r = 9, N ′′ r = 4. The resulting CC grid is tight with µ ≈ 0.4.
Concerning STS 10 and TFClusters 11 , we set their free parameters empirically using the recommendations available in the references, without a precise fine-tuning. Fig.  6 displays a single trial and Fig. 7 presents the ROCs of the three methods presented previously. We see that the best CC search outperforms the two others as expected.
Here, we wish to add few remarks regarding the comparison between the best CC search and STS. The improvement in the ROC of the best CC with respect to STS has two origins. First, the use of a unitary discrete WV instead of the standard WV helps in increasing the detection probability by few percent. The unitarity preserves the power in TF plane and hence improves the efficiency. Second, the major part of the improvement comes from the TF pattern search procedure. As explained in Sec. VI A, the use of a global search criterion instead of a local one is a crucial ingredient.
It is interesting to compare these ROCs with what could optimally achieve an imaginary observer which knows in advance the targeted chirp. Since this clairvoyant observer knows the chirp phase exactly, he can apply the optimal statistic i.e., the quadrature matched filter obtained in Appendix A. The ROCs of the quadrature matched filter can be obtained analytically (under Gaussian noise hypotheses). The false alarm and detection probabilities are given respectively by [29]: This ROC depends only on one parameter, namely the SNR ρ c . The ROC curve of clairvoyant statistic with ρ c = ρ provides an absolute upper bound on the detection probability. Obviously, having in hand all the information makes a very large difference with respect to the case 10 Following the notations of [12], the size of the Gaussian kernel of the pre-smoothing filter is fixed to σ = 2. The low and high thresholds of the hysteresis are set to 3.3/pixel 2 and 10/pixel 2 resp. 11 The TFR is given by the short-time Fourier transform computed over non-overlapping blocks of 16 samples (i.e., intervals of ≈ 7.8 ms). The frequency axis is tiled into 32 bins (i.e., a resolution of 32 Hz). We use the nominal values given in [14] for the rest of the parameters namely p = 0.1, σ = 5, δ = [0, 0, 0, 0, 0, 0, 2, 3, 4, 4] and α = 0.25. where we only know that the incoming GW is a smooth chirp. The detection probability of the clairvoyant statistic is very close to 1 over the entire range of value chosen for the false alarm rate. This is why we don't show this curve. It is more interesting to compare the performances of the various methods with the ones of the clairvoyant observer for SNRs ρ c < ρ. More precisely, we adjust ρ c in such a way that the resulting curve matches reasonably well the ROC of the best CC search in the region of interest i.e., for false alarm probabilities in the range 10 −5 to 10 −4 . Since the SNR is inversely proportional to the distance of the GW source, the ratio of the actual SNR to the best-fit value ρ/ρ c gives the reduction factor of the sight distance with respect to the ideal (and non accessible) situation where we have at our disposal all the information about the chirp we want to detect. We include the fitted clairvoyant ROC in Fig. 7. The ratio in the sight distance can be estimated ∼ 10/6.15 ≈ 1.6.

C. Random CCs and robustness
While benchmarks based on Newtonian chirps are satisfactory for a comparison of several detection methods in a nominal situation, they don't provide a test for the robustness i.e., a measurement of their ability to detect reliably a large class of different chirps.
In this section, we present ROC curves computed using random CCs. Random CCs are generated by chaining chirplets randomly chosen in a range specified by regularity constraints. Therefore, the frequency of a random CC follows a kind of random walk in the TF plane. We generate a new random CC for each trial made to estimate the detection probability.
The detection of random CCs is obviously much more difficult than the detection of a single chirp. It is an effective test of the method robustness. No classical approaches (e.g., based on banks of quadrature matched filters as for inspiralling binary chirps) can be applied successfully in this case. We assume the same general characteristics of the Newtonian chirp used in the first example in the previous section, namely T = 0.5 s, f s = 2048 Hz, thus N = 1024 samples,Ḟ = 8192 Hz/s andF = 917.5 kHz/s 2 . We already computed satisfactory search parameters for this set-up. Therefore, they remain unchanged (N t = 512, N f = 1024, N ′ r = 9 and N ′′ r = 3). The random CCs are generated on the same basis, but with a time interval slightly larger, the regularity parameters being increased accordingly i.e., N t = 64, N f = 1024, N ′ r = 65 and N ′′ r = 57. We use an additive white Gaussian noise. Fig. 8 presents an example of such signal (with SNR ρ = 20) and the result of the application of the best CC search. Fig. 9 displays the ROC curve of the best CC search (with SNR ρ = 12) along with the one of the clairvoyant quadrature matched filter adjusted to an adequate SNR. We estimate a loss in the sight distance with respect to the clairvoyant case to be a factor of ∼ 2.6. Best CC search "sees" to distances comparable to (in the sense, with a reduction factor less than one order of magnitude) what classical methods achieve in other GW detection problems.
The computational cost of this search as estimated by Eq. (63) is about 142 millions of floating points operations for one block of duration T = 0.5s. Assuming 10% overlap between successive blocks, real-time processing can be achieved with a computing power of 2.8 Gflops which is less than what a single standard workstation can handle today.  Fig. 5). bottomright: actual chirp frequency in solid/green and best CC in dashed/red. It is worthwhile to note that, although the best CC can lose track for some time because of noise fluctuations, it is able to recover the exact TF path.

VII. CONCLUDING REMARKS
Smooth chirps define a general model of "nearly physical" GW chirps. Chirplet chains -chains of linear chirplets -allow the design of tight template grids for the detection of smooth chirps. The optimal detection requires these grids to be searched thoroughly to find the template which best matches with the data. Although the shear large number of templates prevents the use of an exhaustive search, near-optimal detection can be performed with the time-frequency based procedure presented here. Its originality lies in the clear link established between the optimal statistic and the proposed search algorithm as opposed to with other approaches. In particular, it justifies the choice of a specific timefrequency representation (the unitary discrete WV) and pattern search algorithm (dynamic programming). We have evaluated that best CC search is computationally tractable for detection of typical GW chirps.
It is important to emphasize several features which makes the proposed method attractive in practise. First, the free parameters (the chirp duration T and the chirping rate limitsḞ andF ) are few and directly related to physical characteristics. Second, the principle "He who can do more can do less" applies here: smooth chirps is a very general class of chirps. This model, and thus the search algorithm can be easily modified and adapted to incorporate additional astrophysical information. For instance, it is easy to search only chirps with an increasing (or decreasing) frequency. One may also want a more stringent constraint on the chirping rate at low frequencies than at high frequencies. The inclusion in the algorithm of a dependency of the chirping rate limit upon the frequency is straightforward. This leaves the possibility of a compromise between efficiency (since the restriction of the set of admissible waveforms due to additional constraints reduces the false alarm rate) and robustness, depending on the quantity and reliability of the information available on a specific GW source. Third and finally, it is simple to restrict the search to chirps starting and/or finishing at given time-frequency location. This feature could be used for partially known chirps whose waveforms is known only on a part of the total duration. Those signals could be detected with a hybrid approach combining a standard matched filtering where the waveform model is available, and best chirplet chain search for the rest.
In the literature concerning the detection of inspiralling binaries of compact objects [17,18], this maximization is usually performed assuming that N is independent of ϕ 0 . This assumption is correct when the two quadratures cos φ k and sin φ k , viewed as vectors of R N , are orthonormal (i.e., orthogonal and of same norms). In this case, we have n c = n s = N/2 and n x = 0 where n c , n s and n x are the norms and cross-products of the quadratures as defined in Eqs. (7). Inserting this into N = n c cos 2 ϕ 0 − n x sin(2ϕ 0 ) + n s sin 2 ϕ 0 , we conclude that N = N/2 is a constant. However, for general phase evolution, the quadrature waveforms are not necessarily orthonormal. This is approximately true when the chirp oscillates sufficiently rapidly during many cycles (e.g., for inspiralling binaries of a small mass). Since we are considering chirps with an arbitrary phase and of relatively short duration, such assumption is not realistic and we opt for the general case keeping the dependency of N upon ϕ 0 .
To proceed with the maximization, we first examine the special case where the quadratic waveforms are linearly dependent i.e., cos φ k ∝ sin φ k for all k. This implies that we are in the degenerate case where φ k = ϕ 0 is constant. Introducing the two angles ϕ = arg(x c + ix s ) and η = arg( √ n c + i √ n s ), we can rewrite Eq. (A3) as Λ(x; ϕ 0 ) = 1 2 The proportionality of the quadrature waveforms implies that √ n c x s ± √ n s x c = 0 which gives sin(η − ϕ) = 0, and hence η = ϕ + πZ. We conclude that Λ(x; ϕ 0 ) remains constant for all ϕ 0 and is equal to the statistic given by In the non degenerate case, we compute the derivative of the statistic as given in Eq. (A3) w.r.t. ϕ 0 . Its numerator turns out to be a second order polynomial of tan ϕ 0 . 12 Here, we adopt the precedence rule ∂ k ab = (∂ k a)b.
Similarly to the first derivative, we insert the expression of the signal s k = A cos(φ k + ϕ 0 ) and evaluate each of the component terms of the above expressions. We have to distinguish two cases i.e., the non-diagonal cross terms of the Hessian matrix when k = l and the diagonal ones when k = l.

b. Auto terms, k = l
We consider the case where k = l. Now, the second order derivatives do not vanish. In fact, we have The consequence is an additional term D kl to the second order derivative of the statistic ∂ 2 kl ℓ| φ * =φ = X kl + D kl . With a direct calculation, we obtain its exact expression (no approximation needed): where the Kronecker symbol is δ kl = 0 for k = l and 1 for k = l.

Approximated distance
From Eqs. (B16), (B21) and assuming that |ǫ|, |δ| ≪ 1, we have ℓ(s; φ) ≈ A 2 N/4. The distance defined in Eq. (13) can thus be written as Considering that ∆ k and ∆ 2 k are slowly varying with respect toĉ k andŝ k , we argue that, similarly to what is discussed in Appendix D, the positive and negative terms compensate when making the following sums kĉ k ∆ k , kŝ k ∆ k and kĉ k ∆ 2 k . We neglect the small residual, which leads to the final approximation of the distance in Eq. (14).

APPENDIX C: CONSTRAINED MAXIMIZATION OF THE DISTANCE
We rewrite the constrained maximization problem described in Sec. IV C 3 of the distance in Eq. (14) under the constraint in Eq. (27) with simpler notations. We relate them to the initial problem at the end of this Appendix.
Let {r k } a series of N real numbers. We want to maximize the empirical variance V (r) expressed by under the constraint that the increments u k ≡ r k − r k−1 are absolutely bounded by some constant U > 0 i.e., |u k | ≤ U for k > 0. The empirical variance V (r) is invariant by the addition of an arbitrary constant C: let r k = y k + C, for all k, then V (r) = V (y). We can thus assume with no loss of generality that r 0 = 0 (i.e., choose C = −y 0 ). Therefore, we have r k = k j=1 u j for k > 0. We want to maximize the convex function V in the set of feasible solution described by {r k } which is a polyhedron of R N . From a classical theorem of convex analysis (see [31], p. 187), we conclude that V reaches its maximum at one of the extreme points of this polyhedron. The extreme points are the points where the increments are either u k = +U or u k = −U . There are 2 N −1 extreme points and we need to identify the one which maximizes the convex function.
Let us rewrite the empirical variance V (r) as a function of u k . We leave the "auto-terms" u 2 k aside (for all extreme points, the auto-terms are equal to U 2 independently of the sign of u k . Their contribution is thus unimportant for the identification of the maximum) and concentrate on "cross-terms" (i.e., terms in u j u k ). A direct calculation leads to where c jk = 2j(N − k)/N 2 and V a is the contribution due to the auto-terms. Since all c jk > 0, the maximum of V is reached when all u k have the same signs, that is when u k are all identically +U or −U . Therefore, the empirical variance is maximum when r k = ±kU and in this case V (r) = U 2 (N 2 − 1)/12.
We recall that the distance between the smooth chirps is well approximated by the empirical variance of the phase discrepancy [see Eq.14]. We apply this result to the original maximization problem by setting r k= ∆ k and U=2π∆ f t s as given in Eq. (27). Noting that their norms and scalar-product are respectively given by n c = c, c , n s = s, s and n x = c, s as defined in Eq. (7), the departure from "orthonormality" of c and s can be quantified by the two parameters δ = n c − n s n c + n s ǫ = 2n x n c + n s .
The parameter δ measures the relative difference of the lengths of c and s while ǫ is related to the cosine of the angle between the two vectors.
When the vectors c and s are orthonormal i.e., orthogonal and of same lengths, both δ and ǫ are zero. By continuity, for nearly orthonormal vectors, δ and ǫ are then expected to be small. Intuitively, this should be true for vectors with oscillating components like c and s. Indeed, ǫ and δ can be rewritten in the form of oscillating sums, namely The positive and negative contributions cancel in the summation, and thus leaves a small residual. In this appendix, we go beyond this intuitive rationale when the phase φ is a CC as defined in Eq. (11) and give a systematic investigation of the maximum value taken by δ and ǫ.
Eq. (D2) motivates us to combine δ and ǫ is the following complex sum S Bounding the modulus of S is equivalent to bounding δ and ǫ. Analytic number theory provides a large number of results concerning exponential sums like S, for improving upon the trivial bound |S| ≤ 1. We use one of these, namely the Kuzmin-Landau lemma, see [32] p. 7. We present a proof of this lemma pertaining to the present case where the phase φ is a CC.
The proof can be summarized as follows. A change of variables is introduced which allows us to put a bound on the modulus of S by a sum of the finite difference of complex variables. These new variables appear to be collinear in the complex plane. The sum of the modulus of their difference is thus equal to the distance between the extremes. The final bound on |S| is then obtained by combining this property with the explicit expression of the phase of the CC, provided a constraint on the lower and higher frequencies reached by the CC.
Let us define for 1 ≤ k ≤ N − 1, the following variables We perform the above change of variables in the sum S using the relation and we get By taking the modulus on both side, we obtain the following bound, where we split the sum in Eq. (D6) into smaller ones calculated over chirplet intervals i.e., t j ≤ t s k < t j+1 or equivalently jb ≤ k ≤ (j + 1)b − 1 with b = δ t /t s , the number of samples in a chirplet interval. In the last sum, we separate the terms corresponding to the transition between two consecutive chirplets from the terms corresponding to the individual chirplets.
We now obtain a bound on each term of RHS of Eq. (D7), starting with the first sum. Eq. (D4) can be rewritten as and combining with we get the result Finally, from Eq. (D8), we have the following inequalities which, when applied with k = 1 and k = N − 1, set an upper limit to the remaining terms in the RHS of Eq. (D7), noting that |1 − ζ N −1 | = |ζ N −1 |. Combining this result with Eqs (D15) and (D18), we get (D20) The number of samples in a chirplet interval being an integer b ≥ 1, and selecting the dominating contribution, we conclude that |S| η with This bound is obtained from a worst case estimate. Generally, δ and ǫ are smaller than this value. With the choice of a small c, a more realistic estimate rather than a strict bound can be obtained replacing the inequality 2c ≤ sin(πc) by the first order Taylor approximation πc ∼ sin(πc) in the proof above, yielding the following estimate η = N ′ r N t πc 2 N 2 . (D22) Summarizing, we obtained an upper bound η on |S| by restricting the frequency of the CC in a bandwidth defined by c. We rather use the reciprocal i.e., we get the limits of the frequency bandwidth from an acceptable value for η. If we assume that N ′ r ≈ 4(N ′ /N t )(N f /(2N )) as given by the regularity condition, the frequency bandwidth is [f l , f s /2 − f l ] with where the leading constant is obtained from 20/π ≈ 2.5. We use this result in Sec. V B 1.