In reflection seismology, full-waveform inversion (FWI) can generate high-wavenumber subsurface velocity models but often suffers from an objective function with local minima caused mainly by the absence of low frequencies in seismograms. These local minima cause cycle skipping when the low-wavenumber component in the initial velocity model for FWI is far from the true model. To avoid cycle skipping, we discovered a new wave-equation reflection traveltime inversion (WERTI) to update the low-wavenumber component of the velocity model, while using FWI to only update high-wavenumber details of the model. We implemented the low- and high-wavenumber inversions in an alternating way. In WERTI, we used dynamic image warping (DIW) to estimate the time shifts between recorded data and synthetic data. When compared with correlation-based techniques often used in traveltime estimation, DIW can avoid cycle skipping and estimate the time shifts accurately, even when shifts vary rapidly. Hence, by minimizing traveltime shifts estimated by dynamic warping, WERTI reduces errors in reflection traveltime inversion. Then, conventional FWI uses the low-wavenumber component estimated by WERTI as a new initial model and thereby refines the model with high-wavenumber details. The alternating combination of WERTI and FWI mitigates the velocity-depth ambiguity and can recover subsurface velocities using only high-frequency reflection data.
Traveltime inversion (Bishop et al., 1985; Stork, 1988; Luo and Schuster, 1991) and full-waveform inversion (FWI) (Tarantola, 1984; Pratt, 1999) are two different methods that geophysicists use to estimate subsurface velocity models from reflection seismic data. Both methods have advantages and disadvantages in estimating a subsurface model , a function of spatial coordinates. In this study, we consider that the model is composed of a smooth low-wavenumber background and a rough high-wavenumber component (Clément et al., 2001): (1)The smooth background corresponds to the kinematics of wave propagation : (2)and corresponds to the reflection wavefield : (3)where and are squared slownesses, denotes a source wavelet, and is the source location. We can in fact obtain the reflection wavefield by subtracting the background wavefield from the full wavefield , which is the solution to (4)For the purpose of this paper, we choose a constant-density acoustic wave equation.
Traveltime inversion minimizes only traveltime differences between observed (recorded) reflection data and computed (synthetic) reflection data , where denotes the receiver location. Because the traveltime shift, compared to the data difference, is more sensitive and more linearly related to the long-wavelength model perturbation (Cara and Lévêque, 1987; Fichtner et al., 2008), traveltime inversion better resolves the low-wavenumber background model , which can be subsequently used as an initial model in FWI for the high-wavenumber details .
Traveltime inversion has succeeded in applications to crosswell data (Luo and Schuster, 1991; van Leeuwen and Mulder, 2010) and refraction data (Zelt and Barton, 1998), where commonly only first arrivals are used to measure traveltime shifts. However, reflection traveltime inversion (RTI) is more difficult. One difficulty is that traveltime shifts vary with time. Manually picking arrivals in seismograms is the most reliable way to estimate these shifts, but it is time consuming. More automatic approaches compute crosscorrelations of seismograms to estimate time shifts (Luo and Schuster, 1991; Hale, 2009; van Leeuwen and Mulder, 2010). Unfortunately, when time shifts change rapidly (Hale, 2013), correlation-based methods still suffer from the cycle-skipping problem. A second difficulty in RTI is the ambiguity between the low-wavenumber background velocities and the depths of high-wavenumber reflectors (Stork and Clayton, 1986; Bube et al., 2002). This ambiguity can be resolved by the migration/demigration approach in tomography techniques (Lambaré, 2008; Woodward et al., 2008), which minimizes residual moveout (RMO) in migrated gathers. RMO-based tomography is not the focus of this paper, and we concentrate on only traveltime inversion and FWI, two wave-equation-based methods with objective functions defined in the domain of unmigrated data.
Conventional FWI does not differentiate and and aims to estimate the subsurface model by minimizing differences between the recorded data and the synthetic data, including differences in traveltimes, amplitudes, converted waves, multiples, etc. Gradient-based methods are often used to solve the FWI optimization problem. The gradient of the objective function in FWI can be computed by the adjoint-state method (Plessix, 2006). In general, if the data lack low frequencies, the gradient in conventional FWI lacks low wavenumbers. Therefore, conventional FWI with reflection data can update and may recover the high-wavenumber component of the model , but its capability to accurately recover depends highly on the accuracy of a background model . Unfortunately, without long offsets and low frequencies, conventional FWI cannot estimate the low-wavenumber background model (Snieder et al., 1989). And even where long offsets are used to record refraction energy, conventional FWI can accurately update only the shallow parts of the low-wavenumber background model.
Difficulties in conventional reflection FWI are mainly due to local minima, which often arise when the initial background model is far away from the true model and the recorded data do not contain sufficient low frequencies. As a result, the recovery of high wavenumbers suffers from cycle skipping (Virieux and Operto, 2009), one primary cause of local minima. The other important reason for local minima is the nonlinear relationship between data and the model (Snieder, 1998). In the presence of cycle skipping, the high-wavenumber details estimated with FWI are mispositioned. Various methods have been proposed to solve the cycle-skipping problem. Multiscale approaches (Bunks, 1995; Sirgue and Pratt, 2004) recursively add higher wavenumber details to models first computed from lower frequency data, but they are useful only if sufficient low frequencies are available. Snieder et al. (1989) and Clément et al. (2001) reparameterize as a function of vertical traveltime instead of depth to mitigate cycle skipping. Sparse-model inversions (Ma et al., 2012; Ma and Hale, 2012) mitigate the lack of low frequencies by generating so-called image-guided low-wavenumber gradients.
We propose a new wave-equation reflection traveltime inversion (WERTI) to update the low-wavenumber , while using conventional FWI to generate the high-wavenumber . As in Snieder et al. (1989), Clément et al. (2001), and Xu et al. (2012), WERTI and FWI are performed in an alternating fashion, which separates the contributions from and . That is, during application of WERTI, we update only and the high-wavenumber model does not change; likewise, when performing FWI, we update only and stays the same. In WERTI, we use dynamic image warping (DIW) (Hale, 2013) to automatically estimate the time-varying traveltime shifts. DIW avoids the cycle skipping inherent in crosscorrelation methods when traveltime shifts change rapidly.
In the following sections, we first introduce DIW to estimate traveltime shifts. Then, we analyze and compute the gradient in WERTI using the adjoint-state method (Plessix, 2006). We then test our method for a high-velocity-anomaly model and a part of the Sigsbee2A model (Paffenholz et al., 2002). Results demonstrate that by combining WERTI and FWI, we can use only high-frequency reflection data to recover low- and high-wavenumber components of velocity models.
Common methods for estimating time shifts are based on crosscorrelation (Luo and Schuster, 1991; van Leeuwen and Mulder, 2010). Crosscorrelation-based methods, however, may suffer from cycle skipping when shifts change rapidly. In this paper, we choose DIW to overcome the cycle-skipping problem in estimating time shifts. After posing an inverse problem for WERTI, we investigate the gradient for WERTI to update the low-wavenumber component of velocity models and propose an alternating inversion strategy.
Traveltime shifts from dynamic warping
DIW (Hale, 2013) is an efficient extension to a classic dynamic warping algorithm for speech recognition (Sakoe and Chiba, 1978), and it can be considered a nonrigid image registration (Glasbey and Mardia, 1998; Zitová and Flusser, 2003). DIW obtains the traveltime shifts between the recorded data and the synthetic data by finding a shift field that is the solution to the following constrained optimization problem: (5)subject to (6)where measures an L2 distance between synthetic and recorded data and is a functional, a function of the function , defined by (7)Hale (2013) uses a dynamic programming method to find the minimizing path that minimizes the distance and then solve this DIW optimization problem.
Constraints in equation 6 control the rates at which shifts vary in directions , , and , respectively. These constraints ensure that the time shifts neither decrease nor increase too rapidly in time or space. In this paper, we choose , , and , where , , and are intervals in time, source, and receiver, respectively. Then, units of , , and are , , and , respectively. The infinite bound means that we do not constrain the shifts in the direction of ; in other words, we estimate these time shifts for each shot independently. If necessary, we could instead specify a finite to constrain changes in time shifts with respect to shot coordinates.
Figure 1 shows an example of dynamic warping applied to estimate shifts between two synthetic seismograms, (Figure 1a) and its warped version (Figure 1b). For this synthetic example, the known shifts between and correspond to a sine function, indicated by the dotted line in Figure 1c. Dynamic warping correctly detects the shifts (solid line); meanwhile, crosscorrelation fails (dashed line), especially when shifts change rapidly. In addition, dynamic warping is robust to noise. In Figure 2a, the synthetic seismogram is plus band-limited random noise. The rms signal-to-noise ratio (S/N) in is as low as 1, and therefore not every event in can match an event in . Despite the strong noise, dynamic warping estimates the shifts accurately, as shown in Figure 2b. Compared to dynamic warping, crosscorrelation is more sensitive to noise.
After we obtain the traveltime misfit between the recorded data and the synthetic data , we can use this misfit in WERTI. To do this, we must formulate an objective function that is minimized (or maximized) when the misfit is zero, indicating that the reflections in the recorded data and in the synthetic data are well-aligned. The most straightforward way of using the misfit is analogous to conventional traveltime tomography, in which we aim to solve a least-squares inverse problem: (8)
Various methods, including conjugate-gradient methods and quasi-Newton methods, can be used to solve the inverse problem implied by equation 8. All of the methods require computing the gradient of the objective function with respect to the model parameters.
We expand equation 8 about some starting models and , keeping only the linear terms: (9)where and are gradients of the objective function with respect to and , measured at and , respectively. Directly computing the gradients involves calculating Fréchet derivatives and , which can be obtained efficiently by using the concept of a connective function (Luo and Schuster, 1991) or a transfer function (Kennett and Fichtner, 2012).
In this paper, we instead use the adjoint-state method (Plessix, 2006) to derive the gradients (10)and (11)where and are adjoint (backward propagating) wavefields, which are the solutions to two adjoint-state wave equations (equations A-4 and A-5), respectively. The adjoint wavefields and are solved in a time-reversed fashion with final conditions in time. Appendix A provides the details of the adjoint-state method for WERTI, including the adjoint sources for the adjoint-state wave equations.
The first term on the right side of equation 10 represents the zero-lag crosscorrelation of a forward source-side wavefield and a backward wavefield ; likewise, the second term is the crosscorrelation of a forward wavefield and a backward receiver-side wavefield . Figure 3 illustrates schematically how we compute the gradients in WERTI. In Figure 3, the high-wavenumber component scatters the forward-propagating source wavefield and the back-propagating receiver wavefield . In other words, every point in can be considered a virtual source that contributes to the scattered wavefields and .
With these virtual sources, the calculation of is therefore analogous to that in a transmission-type experiment, such as the crosswell examples in Luo and Schuster (1991) and van Leeuwen and Mulder (2010). We expect low wavenumbers from because it is distributed along the wave path between the source and the virtual source and also along the wave path between the virtual source and the receiver. In contrast, is analogous to reverse time migration (RTM), thereby supplying mainly high wavenumbers near the reflector . Of course, the wavenumber in also depends on the reflection angle (Sirgue and Pratt, 2004).
Comparison to FWI
FWI uses a different objective function that includes (directly) amplitude differences and (indirectly) traveltime differences: (12)where .
As in the derivation above of gradients in WERTI, we can also get two gradients for FWI, with respect to and , respectively: (13)and (14)where the adjoint wavefields and are again solutions to equations A-4 and A-5, with a substitution of the right-hand side of equation A-4 with .
Meanwhile, in the context of conventional FWI, we aim to minimize the same data-difference objective function (equation 12). However, we do not distinguish gradients and ; instead, we use the following gradient: (15)where the adjoint wavefield is the solution to equation 4, which is solved with as the source, again in a time-reversed fashion. Scattered wavefields and are normally one order of magnitude smaller than background wavefields and , which dominate and , respectively. That is, in equation 15 can approximate in equation 14. When contains only a smooth background, is the same as .
The previous comparison highlights that WERTI and FWI can provide gradients with respect to the low-wavenumber background model and the high-wavenumber details . These gradients share the same recipe with only differences in the objective functions and the consequent adjoint sources. However, WERTI and FWI in fact show different sensitivities to and , and this fact thereby determines our inversion strategy in the next section.
On the one hand, WERTI measures the kinematic difference that is more sensitive to than to . Hence, we choose WERTI and use its low-wavenumber gradient to estimate . On the other hand, FWI predominantly measures the amplitude difference and therefore is more sensitive to . Then, we choose conventional FWI and use to estimate the high-wavenumber .
Although, as implied by equation 13, one can obtain for FWI a low-wavenumber gradient , which can also provide some low-wavenumber updates. However, even with , FWI still suffers from local minima inherent in the simple data-difference objective function, in which the nature of nonlinearity and cycle skipping do not vanish.
Our inversion methodology is the following two-step combination of WERTI and FWI, both of which use a conjugate-gradient method:
|begin with a smooth background,|
|1) for until converges, do|
|FWI (1 iteration):|
|with , compute gradient|
|WERTI (1 iteration):|
|with and , compute gradient|
|use from step 1 as a new initial model,|
|2) for until converges, do|
|iterate conventional FWI:|
In practice, inversion often begins with a smooth model that would create no reflections in synthetic seismograms. However, WERTI requires reflections in synthetic data, so that DIW can estimate time shifts between these reflections and those in recorded data. Therefore, before applying WERTI, we use one iteration of FWI in step 1 to obtain the rough-model perturbation , which we use to synthesize reflections in seismograms. Then, WERTI in step 1 extracts traveltime shifts from the synthetic seismic reflections and updates the low-wavenumber component .
In the next iteration of step 1, we apply again one iteration of FWI after WERTI to obtain a new rough-model perturbation , which we use to replace and to generate new seismic reflections; this is necessary because reflectors in will move after WERTI updates the low-wavenumber component . Through this alternation in step 1, we can eliminate the ambiguity between the low-wavenumber velocities and the depths of reflectors. This alternation is similar to the migration and demigration approach in migration-based traveltime tomography (MBTT) (Clément et al., 2001). After we resolve the low-wavenumber component in step 1, we use it as a new initial model for conventional FWI in step 2 to iteratively refine the model with high-wavenumber details.
True amplitude migration or least-squares migration can be used in step 1 to replace FWI and to approximate . We choose FWI in this step because FWI can invert for the impedance contrast that more accurately describes the subsurface than does a migration image. Multiple iterations of FWI can reduce amplitude differences between synthetic data and recorded data, thereby improving the estimation of traveltime shifts with dynamic warping. Fortunately, we do not necessarily need multiple iterations of FWI in this step because before applying dynamic warping, we can use an amplitude-matching filter to bring synthetic and recorded data to a similar amplitude level.
When performing WERTI in step 1, we apply a spatial low-pass filter to the gradient , so that we can fix the rough-model perturbation and update only the low-wavenumber component . Likewise, also in step 1, when performing FWI, we apply a spatial high-pass filter to the gradient to obtain only the rough-model perturbation .
To demonstrate the application of WERTI, we test the algorithm using a simple Gaussian high-velocity anomaly model and a part of the Sigsbee2A model. In both tests, we use only reflection data; we mute any direct and refracted arrivals.
A simple velocity anomaly
We refer to the model in Figure 4a that contains a Gaussian high-velocity anomaly as the true model. Figure 4b displays the initial constant model from which we start the inversion. In this example, we use 24 evenly distributed shots on the surface, and a Ricker wavelet with a peak frequency of 15 Hz is used as the source for simulating wavefields. The source and receiver intervals are 0.16 and 0.016 km, respectively. In this example, the maximum time is 2 s and the maximum offset is 4 km.
Figure 5a and 5b shows the FWI gradient and the updated model (), respectively, for only one iteration of conventional FWI. As expected, conventional FWI mispositions the reflector and fails to recover the Gaussian anomaly above the reflector. However, beginning with this misplaced reflector, WERTI can recover the anomaly.
WERTI first synthesizes reflection data using the model shown in Figure 5b, and one of the common-shot gathers is displayed in Figure 6b. Figure 6c illustrates the difference between the true data (Figure 6a) and the synthetic data (Figure 6b) and highlights the cycle skipping, especially for offsets .
More compelling evidence of cycle skipping is in the time shifts estimated from the recorded and synthetic data. Figure 7a displays the time shifts estimated with dynamic warping, and these shifts are normalized by the dominant period of the data. In Figure 7a, shifts greater than half a cycle indicate the presence of cycle skipping.
After obtaining the time-varying time shifts , we calculate the objective function and compute the gradient . The adjoint-state method is an efficient way to compute the gradient, and it uses an adjoint source defined as in Appendix A. Figure 7b shows an example of the adjoint source for one shot. Figure 8a depicts the gradient in the first iteration of WERTI, clearly showing a low-wavenumber feature near the high-velocity anomaly. After step 2, the estimated model in Figure 8b recovers the anomaly (especially the top and flanks) reasonably well and, moreover, locates the reflector in the correct position.
We can modify step 1 by replacing WERTI with FWI (equation 12) to minimize the data-difference objective function and then use the new low-wavenumber gradient to estimate the high-velocity anomaly. Then, this approach becomes similar to MBTT. Figure 9a displays the gradient in the first iteration of such a modified FWI, and Figure 9b shows the updated model after step 2. In general, this modified FWI approach changes the model in the right direction. However, due to cycle skipping at far offsets (Figure 6c) and the resulting strong side lobes in the gradient, convergence for this method is significantly slower than that for WERTI.
Figure 10a displays the true velocity model that is part of the Sigsbee2A model created by SMAART JV (Paffenholz et al., 2002); Figure 10b shows an initial velocity, which increases as a function of depth. Except for the water layer, which is the same as in the true model, the initial model in Figure 10b is a scaled (by 0.95) and smoothed version of the horizontal average of the true model. In this example, except for the maximum offset (3 km), other survey parameters are the same as in the previous example. Figure 11 displays two common-shot gathers simulated in the true model with a 15-Hz Ricker wavelet as the source, and only reflection data will be used for inversion.
Figure 12a is the RTM image migrated with the true model; accordingly, Figure 12b is the migration image for the initial model. Because the initial model significantly deviates from the true model, all the reflectors in Figure 12b, except for the water bottom, are located at inaccurate positions, and some deep reflectors are misplaced by more than the dominant wavelength. In addition, two major faults and two diffractors are poorly imaged in the RTM with the incorrect smooth initial model.
In this example, conventional FWI can only add some details in the scale of one wavelength or less, as indicated by the updated model shown in Figure 13a, which is the inversion result after 10 iterations of conventional FWI. Because of the lack of low wavenumbers in the model (Figure 13a), the corresponding RTM image shown in Figure 13b does not provide any noticeable improvement over the initial result in Figure 12b. Figure 14a and 14b show two shot gathers simulated in the updated model shown in Figure 13a. Compared with the true gathers in Figure 11, we observe that the synthetic data match the true data well, but only for small offsets. As illustrated in Figure 14c and 14d, traveltime shifts estimated by dynamic warping between the true data and the synthetic data indicate large misalignments at far offsets. Without low-wavenumber updates, conventional FWI cannot avoid cycle skipping, even after 10 iterations.
However, after one iteration FWI adds reflectors to the initial model at possibly incorrect positions, WERTI begins to update the low-wavenumber background model and to correct the reflector positions. These two parts take place in the same iteration (step 1) but in an alternating fashion. To ensure that WERTI updates only the background model, we apply a spatial low-pass filter to the gradient to remove any components or artifacts that are close to or less than the dominant wavelength.
Figure 15a shows the updated low-wavenumber background model after 10 iterations of WERTI. With this background model, we can significantly improve the RTM image quality, as illustrated by Figure 15b. Compared to the initial migration image in Figure 12b, Figure 15b shows that two major faults and diffractors are better resolved, and reflectors are now close to the true positions shown in Figure 12a. We plot the convergence of WERTI in this example in Figure 16. The traveltime misfit function (Figure 16a) decreases rapidly to nearly zero after WERTI updates the low-wavenumber background and corrects reflector positions. Because this is a synthetic example, we are able to also plot a model misfit curve (Figure 16b), which also decreases significantly with low-wavenumber updates.
In step 2, we can further improve the WERTI result in Figure 15a by using it as a new initial model and begin iterations of conventional FWI. In this way, we can refine the low-wavenumber WERTI model with high-wavenumber details. Figure 17 shows the final model after 10 iterations of FWI. With the addition of high-wavenumber details, the final model represents most of the major features including faults and diffractors. We once again simulate synthetic data in the final model and show two gathers in Figure 18a and 18b. Compared with the true data in Figure 11a and 11b, seismograms are well aligned at the near and far offsets. The small traveltime shifts displayed in Figure 18c and 18d illustrate this alignment in a quantitatively compelling way. Finally, we compare the convergence of FWI after WERTI to conventional FWI in Figure 19. Conventional FWI can decrease the data misfit function, but due to cycle skipping, the model misfit function in fact does not decrease. On the other hand, FWI after WERTI decreases the data misfit and model misfit curves more rapidly.
Conventional FWI with reflection data requires a good initial model to avoid cycle skipping. In contrast, the initial model in WERTI can be far away from the true one, so that cycle skipping occurs in high-frequency reflection data. As shown in previous examples, WERTI with dynamic warping can minimize traveltime shifts between synthetic and recorded reflections, even though they initially deviate from each other by more than half a cycle.
The dynamic warping optimization problem implied by equation 7 is similar to the FWI problem in equation 12. Dynamic warping finds globally optimal time shifts, instead of velocities in FWI, to solve a nonlinear least-squares problem in the objective function (equation 7). Because dynamic warping is a global optimization, it is less sensitive to noise and phase errors in wavelet estimation than correlation-based methods. Figure 2 and more examples shown by Hale (2013) demonstrate the robustness of DIW when used to estimate time shifts from data with other differences unrelated to time shifts, such as noise and waveform differences. Still, further testing is necessary to assess the ability of DIW to estimate time shifts between synthetic and recorded data in the context of WERTI, especially when WERTI is applied to real data. In this paper, WERTI minimizes only time shifts between synthetic and recorded data. Future extension of WERTI to minimize spatial shifts along the source and receiver coordinates of data may also be necessary.
Each WERTI iteration is more expensive than each FWI iteration; the additional cost is primarily due to the new gradient , which requires computing four wavefields (, , , and ). Therefore, WERTI is twice as expensive as conventional FWI in a single iteration. Relative to the cost of computing the wavefields required by WERTI and FWI, the cost of DIW is negligible.
In this paper, we proposed WERTI to estimate the low-wavenumber components of subsurface velocity models using seismic reflections. WERTI requires no picking, and it uses dynamic warping to detect time-varying time shifts, without suffering from cycle skipping problems that often plague correlation-based methods. By minimizing only traveltime errors, WERTI can overcome local minima that cannot be avoided in reflection FWI with insufficient low frequencies in recorded data.
We design an inversion process by combining WERTI and FWI to estimate the low- and high-wavenumber components of the velocity model alternately. Tests on two synthetic models demonstrate that this process can effectively recover subsurface velocity models using only high-frequency reflection data. This alternation strategy can be extended to invert for the low-wavenumber background velocity and the high-wavenumber impedance contrast, which in fact characterize the subsurface more accurately and, of course, requires density in the state and adjoint-state wave equations.
We are grateful to Y. Luo (Saudi Aramco) for insightful discussions and suggestions. We thank the associate editor S. Operto, R.-É. Plessix, G. Lambaré, and one anonymous reviewer for their thorough review and many constructive suggestions that led to significant improvements of the paper. This work is sponsored by a research agreement between ConocoPhillips and the Colorado School of Mines (SST-20090254-SRA).
APPENDIX A ADJOINT-STATE METHOD FOR WERTI
In this appendix, we follow the adjoint-state approach described by Plessix (2006) to derive the gradients and in WERTI, where we have a total of three state equations and two of them are equations 2 and 3. The third state equation is from the DIW optimization problem.
Recall that DIW aims to find to minimize the objective function (equation 7), which implies (A-1)where . This partial derivative must be zero because if it were not zero, then we could decrease by changing slightly, which contradicts the fact that is minimized in DIW. For more details, we refer readers to Hale (2013), where plots of can be found.
Equations 2, 3, and A-1 are the three state equations in WERTI. We define an augmented functional (A-2)where , , and are state variables and , , and are adjoint-state variables, respectively. and are also called back-propagating wavefields.
Adjoint-state variables , , and can be obtained by zeroing derivatives of with respect to state variables , , and , respectively. This provides us (A-3)(A-4)and (A-5)where (A-6)and (A-7)Equations A-4 and A-5 are adjoint-state equations, corresponding to state equations 2 and 3, respectively. denotes the adjoint source for the adjoint-state equation A-4; for equation A-5, the adjoint source is .
- Received January 4, 2013.
- Revision received June 13, 2013.
- Accepted June 30, 2013.