- Society of Exploration Geophysicists
ABSTRACT

Spectral decomposition methods help illuminate lateral changes in porosity and thin-bed thickness. For broadband data, an interpreter might generate 80 or more somewhat redundant amplitude and phase spectral components spanning the usable seismic bandwidth at regular intervals. Large numbers of components can overload not only the interpreter but also the display hardware. We have used principal component analysis to reduce the multiplicity of spectral data and enhance the most energetic trends within the data. Each principal component spectrum is mathematically orthogonal to the other spectra, with the importance of each spectrum being proportional to the size of its corresponding eigenvalue. Principal components are ideally suited to identify geologic features that give rise to anomalous moderate- to high-amplitude spectra. Unlike the input spectral magnitude and phase components, the principal component spectra are not direct indicators of bed thickness. By combining the variability of multiple components, principal component spectra highlight stratigraphic features that can be interpreted using a seismic geomorphology workflow. By mapping the three largest principal components using the three primary colors of red, green, and blue, we can represent more than 80% of the spectral variance with a single image. We have applied and validated this workflow using a broadband data volume containing channels draining an unconformity, acquired over the Central Basin Platform, Texas, U.S.A. Principal component analysis reveals the channel system using only a few output data volumes. The same process provides the interpreter with flexibility to remove unwanted high-amplitude geologic trends or random noise from the original spectral components by eliminating, during the reconstruction process, those principal components that do not aid in the delineation of prospective features.
INTRODUCTION

Spectral decomposition of seismic data is a recently introduced interpretation tool that aids in the identification of hydrocarbons, classification of facies, and calibration of thin-bed thickness. Whether based on the discrete Fourier transform (Partyka et al., 1999), wavelet transform (Castagna et al., 2003), or S-transform (Matos et al., 2005), spectral decomposition typically generates significantly more output data than input data, presenting challenges in conveying the meaning of these data in a concise and interpreter-friendly form. Typically, an interpreter might generate 80 or more spectral amplitude and phase components spanning the usable seismic bandwidth at regular intervals. With so much data available, the key issue for interpretation is to develop an effective means of data representation and reduction.
The most common means of displaying these components is simply to scroll through them to determine manually which single frequency best delineates an anomaly of interest. We illustrate this process for the phantom horizon slice through the seismic amplitude volume (zero-phase reflectivity) above the Atoka unconformity from a survey acquired over the Central Basin Platform, Texas, U.S.A. (Figure 1). We compute 86 spectral components ranging from 5 to 90 Hz using a matched-pursuit technique described by Liu and Marfurt (2007a). Figure 2 shows five representative phantom horizon slices at frequencies spanning this range (Figure 2a through Figure 2e).
By observing how bright and dim areas of the response move laterally with increasing frequency, a skilled interpreter can determine whether a channel or other stratigraphic feature of interest is thickening or thinning. For example, the thinner upstream portions of the channel indicated by the magenta arrows in Figure 1 are better delineated on the higher-frequency components displayed in Figure 2c, whereas the thicker downstream portions of the channel indicated by the yellow arrows in Figure 1 are better delineated by the lower-frequency components displayed in Figure 2b. After initial analysis, we might be able to limit the display to only those frequency components most important to the task at hand (Fahmy et al., 2005). For example, we might choose a single component to map the channel if we observe that most channel beds have their maximum constructive interference at that frequency. However, in terms of conveying the spectral variation present in the data, such a display shows only a small portion of that variation (one of a possible 86 spectral components).
Furthermore, although scanning works well for choosing the best frequency for a given horizon, it can become impractical when trying to determine the best frequency for multiple spectral-component volumes. The spectral decomposition algorithm also outputs the phase spectrum corresponding to each amplitude spectrum. Figure 3 shows the phase spectra corresponding to the amplitude spectra shown in Figure 2. The phase images in Figure 3 are not easy to interpret compared with the amplitude slices in Figure 2, and thus the interpreter might choose to ignore the phase-spectrum information during interpretation.
We can reduce the number of images to be scanned by using color stacking (e.g., Liu and Marfurt, 2007b; Stark, 2005; Theophanis and Queen, 2000). In this workflow, we plot one component against red, a second against green, and a third against blue, and we scale each so that the color-value range (0–255) of each primary color represents 95% of the corresponding component's data values. Figure 4 shows red, green, and blue (RGB) images of different frequency combinations. Figure 4a–c is designed to accentuate the spectral variation in the low-, intermediate-, and high-frequency ranges, respectively. Figure 4d–f shows three possible frequency combinations spanning the global frequency range.
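This color-stacking step can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the authors' implementation; the function name, the percentile-based clipping, and the symmetric treatment of the excluded 5% tails are our own assumptions:

```python
import numpy as np

def rgb_color_stack(comp_r, comp_g, comp_b, clip_pct=95.0):
    """Map three spectral-component maps to a single 8-bit RGB image.

    Each map is scaled so that the central `clip_pct` percent of its
    values spans the full 0-255 color range; values outside that range
    are clipped to 0 or 255.  (A sketch of the workflow in the text,
    with illustrative names and clipping choices.)
    """
    def scale(band):
        tail = (100.0 - clip_pct) / 2.0
        lo, hi = np.percentile(band, [tail, 100.0 - tail])
        scaled = np.clip((band - lo) / (hi - lo), 0.0, 1.0)
        return np.round(255.0 * scaled).astype(np.uint8)

    # Stack the three scaled components into the last (color) axis.
    return np.stack([scale(comp_r), scale(comp_g), scale(comp_b)], axis=-1)
```

The result can be passed directly to an image-display routine; choosing three widely spaced frequencies for the three inputs reproduces the kind of display shown in Figure 4.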
Compared with the images in Figure 2, the RGB images in Figure 4 show greater detail by combining multiple images. Simply stated, color stacking can triple the information conveyed by a single mono-frequency display. Considering the complexity of the spectra in the target horizon and the limited number of color channels for display, each combination will highlight only certain parts of the whole spectrum. Without further data reduction, representation of the original spectral content reaches its limits (with 86 components, there are 86 × 85 × 84/6 = 102,340 possible three-frequency combinations). This situation motivates more efficient data-reduction techniques.
One method of data reduction is first to generate synthetic attributes by correlating the data against predefined spectra (or basis functions), and then to plot the correlation coefficients, instead of the spectral components, against color. In the Central Basin Platform example, we note that the combination of 30, 60, and 90 Hz is not strikingly different from the combination of 20, 50, and 80 Hz, implying that there are no significant spectral changes over a 10-Hz range. Taking advantage of this redundancy, or correlation, in the original spectra, Stark (2005) defines three spectral basis functions that can be interpreted as simple averages and plots the low-frequency average against red, the intermediate-frequency average against green, and the highest-frequency average against blue.
Liu and Marfurt (2007b) provide a moderate improvement to Stark's (2005) gate, or "boxcar," spectral basis functions by applying raised cosines over user-defined spectral ranges to generate low-, mid-, and high-frequency color-stack images. Although in some cases the raised-cosine function better approximates the cosine-like thin-bed tuning response, such approximations remain suboptimal; we might lose a considerable amount of the frequency variability by using either Stark's (2005) or Liu and Marfurt's (2007b) three predefined basis functions. Furthermore, there is no simple measure of how much of the spectrum we do not represent.
Instead of using predefined basis functions or spectra, we propose using principal component analysis to determine mathematically those frequency spectra that best represent the data variability, thereby segregating noise components and reducing the dimensionality of spectral components. We begin with a review of principal component analysis applied to real spectral amplitudes, and then show how it is readily generalized to represent complex spectra consisting of spectral magnitudes and phases. We apply the method to a land seismic data volume from the Central Basin Platform and show how we can represent 80% of the data variation using a color stack. We conclude by showing how we can filter out selected spectral variations and thereby enhance individual spectral images by eliminating interpreter-chosen components during the reconstruction.
THEORY AND METHOD
Principal component analysis finds a new set of orthogonal axes that have their origin at the data mean and that are rotated so that the data variance is maximized (Figure 5). The orthogonal axes are called eigenvectors and represent spectra in the original frequency domain. The projections of the original spectra onto these axes are called principal component (PC) bands. The amount of the total variance that each PC band can represent is quantified by its corresponding eigenvalue. Unlike the original frequency components, the PC bands are mutually orthogonal (and thus uncorrelated) linear combinations of the original spectra. In theory, we calculate the same number of output PC bands as there are input spectral components. The first PC band represents the largest percentage of data variance, the second PC band represents the second-largest data variance, and so on. The last PC bands represent the uncorrelated part of the original spectral data, which includes random noise.
In the seismic spectral volumes we have analyzed, we find that the first 15 PC bands represent more than 95% of the total variance of the original data. The first three PC bands represent as much as 80% of the total variance of the data. Furthermore, we can reconstruct “filtered” spectra by using a subset of the interpreter-chosen PC spectra, thereby providing a means of rejecting random noise and components that interpreters consider as uninteresting background. We show a quantitative example of this in the next section.
Principal component analysis of more than 100 spectral components is well established in remote-sensing interpretation software and workflows (Rodarmel and Shan, 2002). Principal component analysis of seismic attributes also is well established, particularly with respect to seismic shape analysis (Coléou et al., 2003). Mathematically, principal component analysis is an effective statistical data-reduction method for data spaces in which the dimension is very large and the data axes are not orthogonal (that is, somewhat redundant) to each other. From an interpreter's point of view, principal component analysis applied to spectral components begins by determining which frequency spectrum (the first principal component) best represents the entire data volume. The second principal component is that spectrum that best represents the part of the data not represented by the first principal component. The third principal component is the spectrum that best represents that part of the data not represented by the first two principal components, and so on.
If normalized by the total sum of all eigenvalues, the eigenvalue associated with each eigenvector represents the percentage of the data variance that can be represented by its corresponding principal component. In general, the first principal component is a spectrum that represents more than 50% of the data variance, the second principal component represents another 15% to 25% of the data variance, and the third principal component represents about 5% of the data variance. Together, these three components usually represent about 80% of the frequency variance seen in the data along a horizon slice.
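This normalization is a one-line computation. The sketch below uses NumPy; the function name and the illustrative eigenvalue spectrum (chosen so that the first three bands together account for about 80% of the variance, as in the text) are our own:

```python
import numpy as np

def explained_variance_pct(eigenvalues):
    """Percentage of total data variance represented by each PC band.

    `eigenvalues` are assumed already sorted in descending order.  The
    percentages quoted in the text (roughly 50%, 15-25%, and 5% for the
    first three bands) are survey dependent; the numbers below are
    illustrative only.
    """
    lam = np.asarray(eigenvalues, dtype=float)
    return 100.0 * lam / lam.sum()

# Illustrative eigenvalue spectrum: rapid falloff after the first few bands.
pct = explained_variance_pct([62.0, 16.0, 5.0, 3.0, 2.0, 1.0] + [0.5] * 22)
```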
Principal component analysis of spectral components consists of three steps. The first step is to assemble the covariance matrix by crosscorrelating every frequency component with itself and with all other frequency components (Figure 6), resulting in a square, J-by-J symmetrical covariance matrix C:

C_{jk} = \frac{1}{NM} \sum_{n=1}^{N} \sum_{m=1}^{M} a_{jnm} \, a_{knm} ,    (1)

where C_{jk} is the (j, k)th element of the covariance matrix C; N is the number of seismic lines in the survey; M is the number of seismic crosslines in the survey; and a_{jnm} and a_{knm} are spectral magnitudes of the jth and kth frequencies at line n and crossline m.
The second step is to decompose the covariance matrix into J scalar eigenvalues \lambda_j and J unit-length eigenvectors v^{(j)} by solving the equation

C \, v^{(j)} = \lambda_j \, v^{(j)} .    (2)
Almost all numerical solutions of equation 2 (we use the LAPACK program ssyevx) sort the eigenvectors in either ascending or descending order according to their corresponding eigenvalues. The third step is to project the spectrum at each trace onto each eigenvector to obtain a map of coefficients that measure how much of each spectrum is represented by a given eigenvector:

p^{(i)}_{nm} = \sum_{j=1}^{J} v^{(i)}_{j} \, a_{jnm} ,    (3)

where the index j indicates the frequency. The output is a series of PC bands sorted in descending order of their statistical significance: the percentage of the original data variance observable in the particular PC band.
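The three steps above can be sketched with NumPy. This is a minimal illustration under the stated definitions, not the authors' code; we remove the data mean explicitly (consistent with the PCA axes originating at the data mean) and use `np.linalg.eigh` in place of the LAPACK driver ssyevx:

```python
import numpy as np

def pc_spectral_analysis(spectra):
    """Principal component analysis of spectral components (equations 1-3).

    `spectra` holds spectral magnitudes indexed [line n, crossline m,
    frequency j], i.e. shape (N, M, J).  Returns the eigenvalues in
    descending order, the eigenvectors (one PC spectrum per column), and
    the PC bands: maps of the projection of each trace's spectrum onto
    each eigenvector.  A sketch of the workflow, not the authors' code.
    """
    N, M, J = spectra.shape
    a = spectra.reshape(N * M, J)
    a = a - a.mean(axis=0)               # PCA axes originate at the data mean
    cov = (a.T @ a) / (N * M)            # equation 1: J x J covariance matrix
    lam, v = np.linalg.eigh(cov)         # equation 2: symmetric eigenproblem
    order = np.argsort(lam)[::-1]        # sort by descending eigenvalue
    lam, v = lam[order], v[:, order]
    pc_bands = (a @ v).reshape(N, M, J)  # equation 3: project each trace
    return lam, v, pc_bands
```

Because the eigenvectors are orthonormal, the variance of the ith PC band equals its eigenvalue, which is the property that lets the eigenvalues be read directly as variance percentages after normalization.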
To illustrate the effectiveness of this technique, we examine the phantom horizon slice above the Pennsylvanian-age Atoka unconformity from a survey acquired over the Central Basin Platform, shown in Figure 1. We compute 86 spectral components ranging from 5 to 90 Hz at 1-Hz intervals using a matched-pursuit technique described by Liu and Marfurt (2007a). Next we perform principal component analysis on the 86 spectral components: we form an 86 by 86 covariance matrix using equation 1, decompose it into 86 eigenvalue-eigenvector pairs using equation 2, and project the original spectra at each trace onto each eigenvector using equation 3. Figure 7a displays the percentage of the data variance represented by each of the principal component spectra (or eigenvectors) displayed in Figure 7b. The first three principal components account for most of the spectral variance seen along this horizon; the remaining components account for only about 17% of the data variance.
Although the input amplitude volume has a nonwhite amplitude spectrum, the spectral components were statistically balanced as part of Liu and Marfurt's (2007a) matched-pursuit spectral decomposition algorithm. For this reason, the first eigenvector is approximately flat, and PC band 1 represents 62% of the total variance in the original data. The second eigenvector is monotonically decreasing, with the high frequencies contributing less than the low frequencies. We interpret this trend as indicating that the low frequencies probably are in phase with each other, whereas the higher frequencies might have greater variance and need to be represented by more than one eigenvector.
Although it is tempting to assign physical significance to these spectra (with eigenvector 3 perhaps representing thin-bed tuning), we need to remember that they reside in mathematical rather than geologic space. Whereas the first eigenvector best represents the data, all subsequent eigenvectors are constructed mathematically to be orthogonal to the previous ones. The PC bands are simply weighted sums of the original spectral components, as seen in equation 3.
In Figure 8, we plot the four largest principal components of the data, as well as the 71st component. It is important to remember that if a given event has very high reflectivity, it will have high amplitude in its spectral components as well. For this reason, we see channels tuning in and out in PC 1. We note that components 2, 3, and 4 display more anomalous behavior in the sense of spectral-shape changes relative to the total of all components. PC 71 is one example of the "noisy" PC bands beginning at about number 20; we interpret PC 71 as random noise.
By mapping the three largest principal components against red, green, and blue, we can represent 83% of the spectral information with a single colored image (Figure 9), in which each component is rescaled to span the range 0–255. The conspicuous red features delineate the wider channels seen on the erosional high in the central and upper region of the horizon time map. The green features delineate narrower channels that are upstream from the previous red channels. Note that the narrow channels indicated by green arrows are quite difficult to see in Figures 2, 3, and 4. Although they are best delineated by the higher-frequency components, these components also are contaminated by noise. Based on our earlier discussion, note that coherent spectra will be represented by the first few principal components, and incoherent spectra corresponding to random noise will be represented by a linear combination of later principal components (such as PC 71, shown in Figure 8e). Selecting the PC bands with large eigenvalues implicitly increases the signal-to-noise ratio by rejecting noisy PC bands such as PC 71.
PRINCIPAL COMPONENTS OF COMPLEX SPECTRA
The previous analysis was performed only on the magnitude of the spectra, represented by the images displayed in Figure 2. The phase spectra displayed in Figure 3 provide additional information. For example, in Figure 3c the channels marked by the yellow arrows show a phase shift relative to the background, and these channels are more chaotic in the higher-frequency phase image shown in Figure 3e. Note the spectral variation of the northwest features marked by the white arrows in Figure 3a–e. Several geologic depositional environments might be represented by the same amplitude spectrum. In an extremely simple example, consider four 2-term time series related by time reversal and polarity reversal, such as (1, r), (r, 1), (−1, −r), and (−r, −1) for some reflection coefficient 0 < r < 1. Each of these four series has the same amplitude spectrum but a different phase spectrum. This ambiguity could create a false sense of continuity, whereas in reality we change from an upward-coarsening to an upward-fining sequence.
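This ambiguity is easy to verify numerically. In the sketch below, the value r = 0.5 is illustrative (the exact series in the text are not reproduced here); time reversal and polarity reversal both leave the amplitude spectrum unchanged but alter the phase spectrum:

```python
import numpy as np

# Four two-term time series related by time reversal and polarity reversal.
# r = 0.5 is an illustrative reflection coefficient, not the text's value.
r = 0.5
series = [(1, r), (r, 1), (-1, -r), (-r, -1)]

# Discrete Fourier transform of each series.
spectra = [np.fft.fft(s) for s in series]
amplitudes = [np.abs(S) for S in spectra]   # identical for all four series
phases = [np.angle(S) for S in spectra]     # differ between the series
```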
A formal generalization of equations 1–3 would begin by computing a J by J complex Hermitian-symmetrical covariance matrix from the amplitude and phase of the complex spectra. This complex Hermitian covariance matrix then is decomposed into J real eigenvalue-complex eigenvector pairs. Crosscorrelating the complex conjugate of the complex eigenvectors with the complex spectra provides complex crosscorrelation coefficients or principal component maps. We have implemented this approach and find it unsatisfactory for two reasons. First, it is unclear how to generalize our RGB multiattribute display to represent multiple complex maps. Second, and more important, the phase of the eigenvectors provides an extra degree of freedom to fit the complex spectra. This extra degree of freedom results in the channels being blurred instead of appearing as sharp discontinuities.
Marfurt (2006) recognized this same limitation in principal-component coherence computations using the analytic (complex) rather than the original (real) trace. He devised an alternative means of computing the statistics of the original (real) and Hilbert-transform (imaginary) data, simply treating the Hilbert transform of the data as if it were additional real samples. For our complex spectral-analysis problem, we therefore simply add the covariance matrix computed from the real components, a_{jnm} \cos\phi_{jnm}, to the covariance matrix computed from the imaginary components, a_{jnm} \sin\phi_{jnm}, where a_{jnm} is the magnitude and \phi_{jnm} is the phase of the complex spectrum at the jth frequency at line n and crossline m. If our survey has NM seismic traces, this process is equivalent to considering the survey as having 2NM traces. Because the real and imaginary components of the complex spectra are, in general, independent, we will need more principal components to reconstruct the data than if we used the real data (or, alternatively, the magnitude data) alone.
Generalizing equation 1, we obtain a J-by-J real symmetrical covariance matrix:

C_{jk} = \frac{1}{NM} \sum_{n=1}^{N} \sum_{m=1}^{M} \left( a_{jnm} \cos\phi_{jnm} \, a_{knm} \cos\phi_{knm} + a_{jnm} \sin\phi_{jnm} \, a_{knm} \sin\phi_{knm} \right) .    (4)

We then reapply equations 2 and 3 and note that the phase between the real and imaginary parts of the complex spectrum is "locked" and cannot be rotated when crosscorrelated with the real eigenvectors.
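Equation 4 amounts to stacking the real and imaginary parts as additional traces before forming the covariance matrix. It might be sketched as follows (an illustration under our assumed array layout and function name, not the authors' implementation):

```python
import numpy as np

def complex_spectral_covariance(magnitude, phase):
    """Real symmetric covariance matrix for complex spectra (equation 4).

    Treats the imaginary parts a*sin(phi) as additional real samples and
    adds their covariance to that of the real parts a*cos(phi), which
    doubles the effective number of traces.  Inputs are indexed
    [line, crossline, frequency]; a sketch, not the authors' code.
    """
    N, M, J = magnitude.shape
    re = (magnitude * np.cos(phase)).reshape(N * M, J)  # real components
    im = (magnitude * np.sin(phase)).reshape(N * M, J)  # imaginary components
    return (re.T @ re + im.T @ im) / (N * M)
```

The result equals the real part of the complex Hermitian covariance matrix, which is why the relative phase between the real and imaginary parts stays "locked" when the matrix is decomposed into real eigenvectors.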
In Figure 10, we display maps of the complex spectra projected onto the first four principal components. The first PC band is strikingly similar to its counterpart in Figure 8. For the second PC band, the channels stand out with excellent contrast in Figure 10 compared with Figure 8.
PC ANALYSIS AS A FILTER
Because principal component analysis assigns the most coherent spectra to the first eigenvalues and the incoherent or random components of the spectra to the later eigenvalues, we can use PC analysis as a filter. For these data, the first five PC bands account for more than 85% of the variance of the original data. The eigenvalues corresponding to PC bands beyond the fifth drop to a negligible level, so that spectra crosscorrelated against eigenvectors beyond the fifth appear quite random, representing very little of the variance in the data spectra. Plotting the first three PC bands against red, green, and blue, we generate the RGB color-stack image in Figure 9, thereby implicitly suppressing noise and increasing the signal-to-noise ratio. A more dramatic example can be found in Guo et al. (2006), in which the horizon was contaminated by bad picks.
We can use the first five PC bands to reconstruct most of the original data. In Figure 11a, we redisplay the spectral magnitude component shown originally in Figure 2e. In Figure 11b, we reconstruct this spectral component from the first five PC bands. The reconstructed data are very similar to the original data, demonstrating the effectiveness of the reconstruction. In Figure 11c, we use PC bands 1, 3, 4, and 5 to reconstruct the spectral magnitude. Note how the narrow channels are delineated more easily after rejection of PC band 2 shown in Figure 8b.
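The reconstruction step can be sketched as follows. This is a minimal NumPy illustration of PC-band filtering under our assumed conventions; the function name and the 1-based `keep` list are our own:

```python
import numpy as np

def pc_filter(spectra, keep):
    """Reconstruct spectra from an interpreter-chosen subset of PC bands.

    `spectra` is indexed [line, crossline, frequency]; `keep` lists the
    PC-band numbers to retain (1-based, as in the text, so keep=[1, 3, 4, 5]
    rejects PC band 2).  A sketch of the filtering workflow described in
    the text, not the authors' code.
    """
    N, M, J = spectra.shape
    a = spectra.reshape(N * M, J)
    mean = a.mean(axis=0)
    a0 = a - mean
    cov = (a0.T @ a0) / (N * M)
    lam, v = np.linalg.eigh(cov)
    v = v[:, np.argsort(lam)[::-1]]        # eigenvectors, descending eigenvalue
    vk = v[:, [k - 1 for k in keep]]       # interpreter-chosen PC spectra
    filtered = (a0 @ vk) @ vk.T + mean     # project onto subset, then rebuild
    return filtered.reshape(N, M, J)
```

Keeping all J bands reproduces the input exactly; dropping a band removes that band's contribution (such as the strong tuning imprint carried by PC band 2 in the example) from every spectral component at once.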
Principal component filtering provides interpreters with flexibility to remove unwanted trends from the original data by interactively rejecting, during data reconstruction, those principal components that are not correlated with features of interpretation interest. Such exploratory data analysis is a well-accepted workflow in attribute analysis. In some cases, the acquisition footprint might appear strongly in a given principal component spectrum and thus can be suppressed in subsequent reconstruction. In other cases, a strong thin-bed tuning imprint corresponding to the background matrix might show up in a given component and can be rejected during reconstruction, thereby enhancing more subtle lateral changes in spectral components. We interpret PC band 2 in Figure 8b to be such a tuning effect.
Unfortunately, the principal components themselves do not have a fixed relation to reflector thickness; instead, this relation varies from horizon to horizon and survey to survey. Thus, we object to predicting reservoir thickness from principal components through geostatistics or neural nets, although we suspect this might work in some cases. Prediction of thickness from physical principles requires use of the original spectral components, or of original components reconstructed from the major PC bands. On the other hand, as with principal components applied to seismic trace-shape analysis (Coléou et al., 2003), we suspect that principal components will be an excellent tool for anomaly mapping using self-organized maps, because the first few principal components preserve most of the original data variance.
Display of the major principal components might overlook subtle features with little reflectivity. By construction, principal components are ordered, with the first principal component representing the greatest variance in the data. For this reason, high-amplitude background reflectors (the rock matrix through which the channel was cut, in our example) will show up strongly in the first few components. However, not all features of exploration interest have strong reflectivity. A significant challenge in spectral decomposition is the illumination of "invisible channels" (Suarez et al., 2008), channels whose reflectivity is nearly indistinguishable from that of the surrounding matrix. Wallet and Marfurt (2008) discuss an alternative, more exhaustive search based on the "grand tour" method. Because the grand tour is a projection method, they find that the first few principal components (computed using the method described here) form an optimal, compact set of basis functions that can be projected either as a movie or interactively.
As mathematical combinations of different spectral components, principal component spectra lack the direct relation to porosity thickness provided by the original spectral components themselves. Instead, the images need to be interpreted in a more qualitative manner, using principles of seismic geomorphology within a given depositional, erosional, or diagenetic framework. Calibration of these geomorphological features needs to be done the hard way: through comparison with the original seismic data and with well and production data. Direct correlation of spectra to reservoir properties also needs to be calibrated. Fortunately, such workflows are well established in seismic waveform analysis (e.g., Coléou et al., 2003).
CONCLUSIONS

We have shown that principal component analysis can reduce redundant spectral components to significantly fewer, more manageable bands that capture most of the statistical variance of the original spectral response. By mapping the three largest principal components using an RGB color stack, we can represent most of the spectral variance with a single image, which in our example provided an excellent delineation of channels.
Whereas the first PC bands, having the largest eigenvalues, represent most of the signal, the later PC bands, having smaller eigenvalues, represent random noise. This noise might be associated with input data contaminated by ground roll, acquisition footprint, and/or bad picks. Reconstructing the spectral components from a subset of PC bands provides a filtering tool that allows us to reject noise or to enhance the spectral behavior that best delineates the features of interest. Mathematically, the first three PC bands represent most of the variance of the original data. However, there is no definite physical significance to the exact shape of the PC bands. Thus, although such images are excellent for highlighting lateral changes in data that can fit into a seismic geomorphology model, they are not easily usable for quantitative inversion, such as predicting porosity thickness. Nevertheless, they could be useful input attributes to a supervised multiattribute prediction.
ACKNOWLEDGMENTS

We thank Burlington Resources for permission to use their data in this research. We also thank the sponsors of the Allied Geophysical Laboratories (AGL) industrial consortium on mapping of subtle structure and stratigraphic features using modern geometric attributes. We thank associate editor Dengliang Gao and three anonymous reviewers for their help in generating a significantly improved paper.
- Received June 25, 2008.
- Revision received September 17, 2008.