# Geophysics

- Society of Exploration Geophysicists

## Abstract

Spectral decomposition methods help illuminate lateral changes in porosity and thin-bed thickness. For broadband data, an interpreter might generate 80 or more somewhat redundant amplitude and phase spectral components spanning the usable seismic bandwidth at

## INTRODUCTION

Spectral decomposition of seismic data is a recently introduced interpretation tool that aids in the identification of hydrocarbons, classification of facies, and calibration of thin-bed thickness. Whether based on the discrete Fourier transform (Partyka et al., 1999), wavelet transform (Castagna et al., 2003), or S-transform (Matos et al., 2005), spectral decomposition typically generates significantly more output data than input data, presenting challenges in conveying the meaning of these data in a concise and interpreter-friendly form. Typically, an interpreter might generate 80 or more spectral amplitude and phase components spanning the usable seismic bandwidth at

The most common means of displaying these components is simply by scrolling through them to determine manually which single frequency best delineates an anomaly of interest. We illustrate this process for the phantom horizon slice through the seismic amplitude volume (zero-phase reflectivity)

By observing how bright and dim areas of the response move laterally with increasing frequency, a skilled interpreter can determine whether a channel or other stratigraphic feature of interest is thickening or thinning. For example, the thinner upstream portions of the channel indicated by the magenta arrows in Figure 1 are better delineated on the

Furthermore, although scanning works well for choosing the best frequency for a given horizon, it can become impractical when one tries to determine the best frequency across multiple spectral-component volumes. The spectral decomposition algorithm also outputs the phase spectrum corresponding to each amplitude spectrum; Figure 3 shows the phase spectra corresponding to the amplitude spectra shown in Figure 2. The phase images in Figure 3 are harder to interpret than the amplitude slices in Figure 2, so the interpreter might choose to ignore the phase-spectrum information during interpretation.

We can reduce the number of images to be scanned by using color stacking (e.g., Liu and Marfurt, 2007b; Stark, 2005; Theophanis and Queen, 2000). In this workflow, we plot one component against red, a second against green, and a third against blue, and scale each so that the color-value range (0–255) of its primary color spans 95% of the corresponding component's data values. Figure 4 shows red, green, and blue (RGB) images of different frequency combinations. Figure 4a–c is designed to accentuate the spectral variation in the low-, intermediate-, and high-frequency ranges, respectively. Figure 4d–f shows three possible frequency combinations over the global frequency range.
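As a concrete illustration, the color-stacking step can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation; the function name and the use of percentile clipping to approximate the 95% scaling described above are our own choices.

```python
import numpy as np

def rgb_color_stack(low, mid, high, clip_pct=95.0):
    """Stack three spectral-amplitude slices into one RGB image.

    Each slice is clipped at the chosen percentile of its own values
    and rescaled so that the range 0..clip value maps onto 0..255.
    """
    channels = []
    for band in (low, mid, high):
        vmax = np.percentile(np.abs(band), clip_pct)
        scaled = np.clip(np.abs(band) / vmax, 0.0, 1.0) * 255.0
        channels.append(scaled.astype(np.uint8))
    # Last axis holds the (R, G, B) triple for each map location.
    return np.stack(channels, axis=-1)
```

For example, `rgb_color_stack(a_low, a_mid, a_high)` would combine three single-frequency amplitude slices (2D arrays over the survey) into one displayable image.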

Compared with the images in Figure 2, the RGB images in Figure 4 show greater detail by combining multiple images. Simply stated, color stacking can triple the information that is conveyed by a single mono-frequency display. Considering the complexity of the spectra in the target horizon and the limit of color channels for display, each combination will highlight only certain parts of the whole spectrum. Without further data reduction, representation of the original spectral content reaches its limits (for example,

One method of data reduction is to generate synthetic attributes first by correlating the data against predefined spectra (or basis functions) and plotting the correlation coefficients, instead of the spectral components, against color. In the Central Basin Platform example, we note that the combination of 30, 60,

Liu and Marfurt (2007b) provide a moderate improvement to Stark's (2005) gate, or “boxcar,” spectral basis function by applying raised cosines over user-defined spectral ranges to generate low-, mid-, and high-frequency color-stack images. Although in some cases the raised-cosine function better approximates the cosine-like thin-bed tuning response, such approximations remain suboptimal: with either Stark's (2005) or Liu and Marfurt's (2007b) three predefined basis functions, we might lose a considerable amount of the frequency variability in the data. Furthermore, there is no simple measure of how much of the spectrum we fail to represent.

Instead of using predefined basis functions or spectra, we propose using principal component analysis to determine mathematically those frequency spectra that best represent the data variability, thereby segregating noise components and reducing the dimensionality of spectral components. We begin with a review of principal component analysis applied to real spectral amplitudes, and then show how it is readily generalized to represent complex spectra consisting of spectral magnitudes and phases. We apply the method to a land seismic data volume from the Central Basin Platform and show how we can represent 80% of the data variation using a color stack. We conclude by showing how we can filter out selected spectral variations and thereby enhance individual spectral images by eliminating interpreter-chosen components during the reconstruction.

## THEORY AND METHOD

Principal component analysis finds a new set of orthogonal axes that have their origin at the data mean and that are rotated so that the data variance is maximized (Figure 5). The orthogonal axes are called eigenvectors and represent spectra in the original frequency domain. The projections of the original spectra onto these axes are called principal component (PC) bands. The amount of the total variance that each PC band can represent is quantified by its corresponding eigenvalue. Compared with the original frequency components, PC bands are orthogonal (and thus uncorrelated) linear combinations of the original spectra. In theory, we calculate the same number of output PC bands as the input spectral components. The first PC band represents the largest percentage of data variance, the second PC band represents the second-largest data variance, and so on. The last PC bands represent the uncorrelated part of the original spectral data, which includes random noise.

In the seismic spectral volumes we have analyzed, we find that the first 15 PC bands represent more than 95% of the total variance of the original data. The first three PC bands represent as much as 80% of the total variance of the data. Furthermore, we can reconstruct “filtered” spectra by using a subset of the interpreter-chosen PC spectra, thereby providing a means of rejecting random noise and components that interpreters consider as uninteresting background. We show a quantitative example of this in the next section.

Principal component analysis of more than 100 spectral components is well established in remote-sensing interpretation software and workflows (Rodarmel and Shan, 2002). Principal component analysis of seismic attributes also is well established, particularly with respect to seismic shape analysis (Coléou et al., 2003). Mathematically, principal component analysis is an effective statistical data-reduction method for data spaces in which the dimension is very large and the data axes are not orthogonal (that is, somewhat redundant) to each other. From an interpreter's point of view, principal component analysis applied to spectral components begins by determining which frequency spectrum (the first principal component) best represents the entire data volume. The second principal component is that spectrum that best represents the part of the data not represented by the first principal component. The third principal component is the spectrum that best represents that part of the data not represented by the first two principal components, and so on.

If normalized by the total sum of all eigenvalues, the eigenvalue associated with each eigenvector represents the percentage of the data variance that can be represented by its corresponding principal component. In general, the first principal component is a spectrum that represents more than 50% of the data variance, the second represents another 15% to 25%, and the third represents about 5%. Together, these three components usually represent about 80% of the frequency variance seen in the data along this horizon slice.
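The normalization described above amounts to dividing each eigenvalue by the sum of all eigenvalues. A small sketch, using hypothetical eigenvalues chosen only to mirror the typical percentages quoted above:

```python
import numpy as np

def variance_fractions(eigvals):
    """Fraction of total data variance represented by each PC band:
    each eigenvalue divided by the sum of all eigenvalues."""
    eigvals = np.asarray(eigvals, dtype=float)
    return eigvals / eigvals.sum()

# Hypothetical eigenvalues mimicking the pattern described in the text:
fracs = variance_fractions([50.0, 20.0, 10.0, 8.0, 6.0, 6.0])
```

With these illustrative values, `fracs[:3].sum()` is 0.8, i.e., the first three PC bands together represent 80% of the variance.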

Principal component analysis on spectral components consists of three steps. The first step is to assemble the covariance matrix by crosscorrelating every pair of frequency components, giving the *J* by *J* symmetrical covariance matrix **C**:

$$C_{jk} = \frac{1}{NM}\sum_{n=1}^{N}\sum_{m=1}^{M} a_j(n,m)\,a_k(n,m), \qquad (1)$$

where **C** is the covariance matrix; *N* is the number of seismic lines in the survey; *M* is the number of seismic crosslines in the survey; and $a_j(n,m)$ is the *j*th spectral component at line *n* and crossline *m*.

The second step is to decompose the covariance matrix into *J* scalar eigenvalues $\lambda^{(j)}$ and *J* unit-length eigenvectors $\mathbf{v}^{(j)}$:

$$\mathbf{C}\,\mathbf{v}^{(j)} = \lambda^{(j)}\mathbf{v}^{(j)}. \qquad (2)$$

Almost all numerical solutions of equation 2 (we use LAPACK [2007] program **ssyevx**) sort the eigenvectors by the size of their corresponding eigenvalues, so that the index *j* indicates the relative amount of data variance represented by the *j*th eigenvector. The third step is to crosscorrelate each eigenvector with the original spectra, yielding the principal component maps:

$$p^{(j)}(n,m) = \sum_{k=1}^{J} v_k^{(j)}\,a_k(n,m). \qquad (3)$$
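The covariance-assembly and eigendecomposition steps described in this section can be sketched with NumPy rather than a direct LAPACK call. Note that `numpy.linalg.eigh` (which wraps LAPACK's symmetric eigensolvers) returns eigenvalues in ascending order, so they must be re-sorted so that the first band represents the largest variance. The array layout and function name here are our own illustrative choices, not the authors' code.

```python
import numpy as np

def pc_bands(spectra):
    """Principal component analysis of spectral components.

    spectra : array of shape (n_traces, J), one J-point amplitude
              spectrum per trace (n_traces = N lines * M crosslines).
    Returns (eigvals, eigvecs, pc) with eigenvalues sorted in
    descending order; pc[:, j] is the j-th principal-component band.
    """
    mean = spectra.mean(axis=0)
    centered = spectra - mean                       # axes originate at the data mean
    cov = centered.T @ centered / centered.shape[0] # J x J covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending order from eigh
    order = np.argsort(eigvals)[::-1]               # re-sort: largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    pc = centered @ eigvecs                         # project spectra onto eigenvectors
    return eigvals, eigvecs, pc
```

Dividing `eigvals` by `eigvals.sum()` then gives the fraction of total variance represented by each PC band.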

## EXAMPLE

To illustrate the effectiveness of this technique, we examine the phantom horizon slice

Although the input amplitude volume has nonwhite amplitude spectra, the spectral components were statistically balanced as part of Liu and Marfurt's (2007a) matched-pursuit spectral decomposition algorithm. For this reason, the first eigenvector (PC band 1) is approximately flat and represents 62% of the total variance in the original data. The second eigenvector is monotonically decreasing, with the high frequencies contributing less than the low frequencies. We interpret this trend as indicating that the low frequencies probably are in phase with one another, whereas the higher frequencies might have greater variance and need to be represented by more than one eigenvector.

Although it is tempting to assign physical significance to these spectra (with eigenvector 3 perhaps representing thin-bed tuning at about

In Figure 8, we plot the four largest principal components of the data, as well as the 71st component. It is important to remember that if a given event has very high reflectivity, it will have high amplitude in its components as well. For this reason, we see channels tuning in and out in PC 1. We note that components 2, 3, and 4 display more anomalous behavior, in the sense of spectral-shape changes relative to the total of all components. PC 71 is one example of the “noisy” PC bands, which begin at about band 20; we interpret PC 71 as random noise.

By mapping the three largest principal components against red, green, and blue, we can represent 83% of the spectral information with a single colored image (Figure 9) in which each component is rescaled to span the range 0–255. The conspicuous red features delineate the wider channels seen on the erosional high in the central and upper region in the horizon time map. The green features delineate narrower channels that are upstream from the red channels. Note that the narrow channels indicated by green arrows are quite difficult to see in Figures 2, 3, and 4. Although they are best delineated by the higher-frequency components, these components also are contaminated by noise. Based on our earlier discussion, note that coherent spectra will be represented by the first few principal components, and incoherent spectra corresponding to random noise will be represented by a linear combination of later principal components (such as PC 71 shown in Figure 8e). Selection of the PC bands with large eigenvalues implicitly increases the signal-to-noise ratio by rejecting noisy PC bands such as 71.

## PRINCIPAL COMPONENTS OF COMPLEX SPECTRA

The previous analysis was performed only on the magnitude of the spectra, represented by the images displayed in Figure 2. The phase spectra displayed in Figure 3 provide additional information. For example, in Figure 3c the channels marked by the yellow arrows show a

A formal generalization of equations 1–3 would begin by computing a *J* by *J* complex Hermitian-symmetrical covariance matrix from the amplitude and phase of the complex spectra. This complex Hermitian covariance matrix then is decomposed into *J* real eigenvalue-complex eigenvector pairs. Crosscorrelating the complex conjugate of the complex eigenvectors with the complex spectra provides complex crosscorrelation coefficients or principal component maps. We have implemented this approach and find it unsatisfactory for two reasons. First, it is unclear how to generalize our RGB multiattribute display to represent multiple complex maps. Second, and more important, the phase of the eigenvectors provides an extra degree of freedom to fit the complex spectra. This extra degree of freedom results in the channels being blurred instead of appearing as sharp discontinuities.

Marfurt (2006) recognized this same limitation in principal-component coherence computations using the analytic (complex) rather than the original (real) trace. He devised an alternative means of computing the statistics of the original (real) and Hilbert transform (imaginary) data, and simply treated the Hilbert transform of the data as if they were additional real samples. For our complex spectral analysis problem, we therefore simply add the covariance matrices computed from the real and imaginary components of the spectra at line *n* and crossline *m*. If our survey has
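Under this approach, the complex case reduces to a single real symmetric covariance matrix: the covariance of the real parts and the covariance of the imaginary parts are simply summed. A minimal sketch (array shapes, mean handling, and the function name are our assumptions):

```python
import numpy as np

def real_covariance_of_complex(spectra):
    """Real J x J covariance matrix for complex spectra.

    The imaginary parts are treated as additional real samples:
    the covariance of the real parts and the covariance of the
    imaginary parts are summed, yielding one real symmetric matrix
    that an ordinary symmetric eigensolver can decompose.

    spectra : complex array of shape (n_traces, J).
    """
    re = spectra.real - spectra.real.mean(axis=0)
    im = spectra.imag - spectra.imag.mean(axis=0)
    n = spectra.shape[0]
    return (re.T @ re + im.T @ im) / n
```

The resulting matrix can be passed directly to a real symmetric eigensolver such as `numpy.linalg.eigh`, avoiding the complex Hermitian decomposition and its extra phase degree of freedom.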

Generalizing equation 1, we obtain a *J* by *J* real symmetrical covariance matrix:

$$C_{jk} = \frac{1}{NM}\sum_{n=1}^{N}\sum_{m=1}^{M}\left[a_j^{R}(n,m)\,a_k^{R}(n,m) + a_j^{I}(n,m)\,a_k^{I}(n,m)\right],$$

where the superscripts *R* and *I* denote the real and imaginary parts of the complex spectral components.

In Figure 10, we display maps of the complex spectra projected onto the first four principal components. The first PC band is strikingly similar to the counterpart in Figure 8. For the second PC band, the channels stand out in excellent contrast in Figure 10 compared with Figure 8.

## PC ANALYSIS AS A FILTER

Because principal component analysis assigns the most coherent spectra to the first eigenvalues and the incoherent, or random, components of the spectra to the later eigenvalues, we can use PC analysis as a filter. For these data, the first five PC bands account for more than 85% of the variance of the original data. The eigenvalues corresponding to PC bands beyond five drop to a negligible level, so that spectra crosscorrelated against eigenvectors with indices greater than five appear quite random, representing very little of the variance in the data spectra. Plotting the first three PC bands, we generate the RGB color-stack image in Figure 9, thereby implicitly suppressing noise and increasing the signal-to-noise ratio. A more dramatic example can be found in Guo et al. (2006), in which the horizon was contaminated by bad picks.
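The filtering workflow just described (decompose, keep the interpreter-chosen bands, reconstruct) can be sketched as follows; the function name and interface are hypothetical:

```python
import numpy as np

def pc_filter(spectra, keep):
    """Reconstruct spectra from an interpreter-chosen subset of PC bands.

    spectra : (n_traces, J) array of spectral components.
    keep    : iterable of PC-band indices (0 = largest eigenvalue).
    Bands not listed are rejected, which suppresses the random noise
    carried by the small-eigenvalue bands.
    """
    mean = spectra.mean(axis=0)
    centered = spectra - mean
    cov = centered.T @ centered / centered.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # largest variance first
    v = eigvecs[:, order][:, list(keep)]       # eigenvectors to retain
    return centered @ v @ v.T + mean           # project, then back-project
```

Keeping all *J* bands reproduces the input exactly; keeping only the first few bands yields a noise-filtered version of the original spectra.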

We can use the first five PC bands to reconstruct most of the original data. In Figure 11a, we redisplay the

Principal component filtering provides interpreters with flexibility to remove any unwanted trends from the original data by interactively rejecting, during data reconstruction, those principal components that are not correlated with features of interpretation interest. Such exploratory data analysis is a well-accepted workflow in attribute analysis. In some cases, the acquisition footprint might appear strongly in a given principal component spectrum and thus can be suppressed in subsequent reconstruction. In other cases, a strong thin-bed tuning imprint corresponding to the background matrix might show up in a given component and can be rejected during reconstruction, thereby enhancing more subtle lateral changes in spectral components. We interpret PC band 2 in Figure 8b to be such a tuning effect.

Unfortunately, the principal components themselves do not have a fixed relation to reflector thickness; the relation varies from horizon to horizon and from survey to survey. Thus, we object to predicting reservoir thickness from principal components through geostatistics or neural nets, although we suspect this might work in some cases. Prediction of thickness from physical principles requires use of the original spectral components, or of components reconstructed from the major PC bands. On the other hand, as with principal components applied to seismic trace-shape analysis (Coléou et al., 2003), we suspect that principal components will be an excellent tool for anomaly mapping using self-organized maps, because the first few principal components preserve most of the original data variance.

## LIMITATIONS

Display of major principal components might overlook subtle features with little reflectivity. By construction, principal components are ordered, with the first principal component representing the greatest variance in the data. For this reason, high-amplitude background reflectors (the rock matrix through which the channel was cut in our example) will show up strongly in the first few components. However, not all features of exploration interest have strong reflectivity. A significant challenge in spectral decomposition is the illumination of “invisible channels” (Suarez et al., 2008), channels whose reflectivity is nearly indistinguishable from that of the surrounding matrix. Wallet and Marfurt (2008) discuss an alternative, more exhaustive search based on the “grand tour” method. Because the grand tour is a projection method, they find that the first few principal components (computed using the method described here) form optimal, compact basis functions that can be projected either as a movie or interactively.

As mathematical combinations of different spectral components, principal component spectra have little direct relation to the porosity thickness provided by the original spectral components themselves. Instead, the images need to be interpreted in a more qualitative manner, using principles of seismic geomorphology within a given depositional, erosional, or diagenetic framework. Calibration of these geomorphological features needs to be done the hard way — through comparison to the original seismic data and to well and production data. Direct correlation of spectra to reservoir properties also needs to be calibrated. Fortunately, such workflows are well established in seismic waveform analysis (e.g., Coléou et al., 2003).

## CONCLUSIONS

We have shown that principal component analysis can reduce the redundant spectral components into significantly fewer, more manageable bands that capture most of the statistical variance of the original spectral response. By mapping the three largest principal components using an RGB color stack, we can represent most of the spectral variance with a single image, which in our example provided an excellent delineation of channels.

Whereas the first PC bands, which have the largest eigenvalues, represent most of the signal, the later PC bands, which have smaller eigenvalues, represent random noise. This noise might be associated with input data contaminated by ground roll, acquisition footprint, and/or bad picks. Reconstructing the spectral components from a subset of PC bands provides a filtering tool that allows us to reject noise or to enhance the spectral behavior that best delineates the features of interest. Mathematically, the first three PC bands represent most of the variance of the original data. However, it is important to note that there is no definite physical significance to the exact shape of the PC bands. Thus, although such images are excellent for highlighting lateral changes in data that fit a seismic-geomorphology model, they are not easily usable for quantitative inversion, such as predicting porosity thickness. They could, however, be useful input attributes to a supervised multiattribute prediction.

## ACKNOWLEDGMENTS

We thank Burlington Resources for permission to use their data in this research. We also thank the sponsors of the Allied Geophysical Laboratories (AGL) industrial consortium on mapping of subtle structure and stratigraphic features using modern geometric attributes. We thank associate editor Dengliang Gao and three anonymous reviewers for their help in generating a significantly improved paper.

- Received June 25, 2008.
- Revision received September 17, 2008.