The soil particle-size fractions (PSFs) are one of the most important attributes to influence soil physical (e.g., soil hydraulic properties) and chemical (e.g., cation exchange) processes. There is an increasing need, therefore, for high-resolution digital prediction of PSFs to improve our ability to manage agricultural land. Consequently, use of ancillary data to make cheaper high-resolution predictions of soil properties is becoming popular. This approach is known as “digital soil mapping.” However, most commonly employed techniques (e.g., multiple linear regression or MLR) do not consider the special requirements of a regionalized composition, namely PSF; (1) should be nonnegative (2) should sum to a constant at each location, and (3) estimation should be constrained to produce an unbiased estimation, to avoid false interpretation. Previous studies have shown that the use of the additive log-ratio transformation (ALR) is an appropriate technique to meet the requirements of a composition. In this study, we investigated the use of ancillary data (i.e., electromagnetic (EM), gamma-ray spectrometry, Landsat TM, and a digital elevation model to predict soil PSF using MLR and generalized additive models (GAM) in a standard form and with an ALR transformation applied to the optimal method (GAM-ALR). The results show that the use of ancillary data improved prediction precision by around 30% for clay, 30% for sand, and 7% for silt for all techniques (MLR, GAM, and GAM-ALR) when compared to ordinary kriging. However, the ALR technique had the advantage of adhering to the special requirements of a composition, with all predicted values nonnegative and PSFs summing to unity at each prediction point and giving more accurate textural prediction.
Soil texture is determined by its constituent particle-size fractions (PSFs); e.g., sand, silt, and clay. In agriculture, understanding soil texture is crucial for economic and environmental reasons. This is because PSFs play a central role in defining key soil properties such as hydraulic conductivity, water holding capacity, and nutrient status. The use of PSF data for agricultural management is increasing as pedotransfer functions (Wosten et al., 2001; McBratney et al., 2002) become more widely accepted and used to predict soil physical processes. However, laboratory-based analysis of PSFs is time-consuming and expensive. It is therefore inevitable that the ever-increasingly available remote-sensing and ground-based ancillary data should be used to create digital soil maps of PSFs data at resolutions and scales previously not practicable.
Several types of ancillary data have been used for the prediction of a single component of soil PSFs. For example, electromagnetic (EM) induction was used to identify variations in topsoil clay content (Cockx et al., 2009), depth to clay (Saey et al., 2011), and highlight areas of sandy soil (Cook et al., 1992). Other examples include the use of airborne gamma-ray spectrometry data for mapping sand (Cook et al., 1996), digital elevation models (DEMs) for predicting sand, silt, and clay (Gobin et al., 2001; Odeh et al., 1994), and Landsat TM data for modeling sand (Levin et al., 2004) and clay (Dematte and Nanni, 2003). All these were based on empirical or deterministic relationships between a given PSF and the ancillary variables. Previously, various geostatistical models of kriging were applied to spatially predict soil PSFs (e.g., Burgess and Webster, 1980). Additionally, techniques such as cokriging and regression kriging (Odeh et al., 1995 and Triantafilis et al., 2001) that take advantage of the ancillary data at fine-resolution, have also been applied. However, these nonspatial deterministic or empirical models and/or geostatistical models are not amenable to compositional data because they do not produce estimates that sum to a constant — a strict requirement of compositions. De Gruijter et al. (1997) point out other requirements: (1) each component of the composition must be nonnegative; and (2) estimates should be unbiased.
To meet these requirements, a few authors have used data transformation prior to interpolation. For example, McBratney et al. (1992) applied a semisymmetric log-ratio transformation, prior to spatial prediction of fuzzy soil classes. Later, De Gruijter et al. (1997) developed compositional kriging, which allowed interpolation of fuzzy membership grades, while simultaneously honoring the constant sum requirement. More recently (Odeh et al., 2003) used additive log-ratio transformation (ALR) prior to mapping of PSFs, while Lark and Bishop (2007) proposed a similar approach, but cokriged the ALR transformed data. While the latter two studies were focused on PSFs as compositional data, little has been done to take advantage of the increasingly rich and widely available ancillary data. The objectives of this paper are therefore to (1) determine the optimal combinations of ancillary variables EM, gamma-ray spectrometry, DEM, and Landsat TM for predicting soil PSFs; (2) determine the best performer among various techniques (multiple linear regression [MLR], generalized additive models [GAM], and ordinary kriging [OK]); and, (3) assess the efficacy of using an ALR transformation prior to modeling, which would honor requirements of compositional data.
MATERIALS AND METHODS
The study area is the Bourke Irrigation District (BID) in the lower Darling River valley in northern New South Wales of Australia (Longitude 145° 56′E, Latitude 30° 06′S). The BID is generally characterized by flat unconsolidated tertiary alluvium with a gentle increase in elevation away from the Darling River. Northcote (1966) identified four major soil map units (Figure 1a), which include: uniform textured-deep yellow gray self-mulching clays (II1) and gray self-mulching cracking clays (CC19), gradational profiles with neutral reaction trend (My1), and duplex profiles with red clayey subsoil (Nb4). The yellow gray self-mulching cracking clays (II1) are found predominantly on the lower lying alluvial floodplains ( Australian Height Datum) of the Darling River (Figure 1b). The slightly gilgaied gray self-mulching clays (CC19) are associated with drainage-ways and swampy Playa basins. The red earths, which are characterized by a neutral reaction trend (My1), occur on a gently sloping and undulating tableland and its remnants. The alkaline red earths (Mx1) and the duplex red earths (Nb4), which are associated with Aeolian dunes and low stony ridges, are lighter in texture and are generally not suitable for agriculture.
Acquisition of ancillary data
Initially, we carried out an EM survey involving 1,236 locations; whereby, EM34 readings at intercoil spacing of 10 (EM34-10), 20 (EM34-20), and 40 m (EM34-40) in the horizontal mode of operation (Triantafilis and Buchanan, 2010) were taken. These readings provide bulk apparent soil electrical conductivity values () to depths of 7, 15, and 30 m, respectively. Measurements were also made with a Geonics EM38 in the horizontal (EM38-h) and vertical (EM38-v) dipole giving bulk values to a depth of 0.75 and 1.5 m, respectively (McNeill, 1990). In irrigated and dryland areas, readings were taken on an approximate 0.5 and a 1 km grid, respectively.
Gamma-ray spectrometry detects radioactive emissions from the natural decay of uranium (U), thorium (Th), and potassium (K) from rock and soil. The relative abundance of these elements provides information on the upper 0.30 m (Minty, 1997). Potassium is measured directly using decay of at 1.46 MeV, while U (1.76 MeV) and Th (2.62 MeV) abundances are inferred from the daughter products of bismuth and () and thalium (), respectively; and are therefore referred to as “equivalent” uranium (U) and thorium (Th). The spectrometry data for this study was obtained from the New South Wales Department of Mineral Resources acquired through an aerial geophysical survey (July 1995). Line spacing is 250 m with a sampling interval of 60 m.
Soil development often occurs in response to the way water moves across a landscape. Consequently, terrain analysis provides useful data for predicting soil properties where topographic shape relates to pedogenesis (McKenzie et al., 2000). Over the last two decades, DEMs have been used as ancillary data to improve prediction of soil properties at farm (Pachepsky et al., 2001), subcatchment (Odeh et al., 1992), and regional scales (Gobin et al., 2001). The DEM was collected using an 11-channel rms GR33A flying at an altitude of 60 m, concurrent with the gamma-ray survey.
As satellite images become increasingly available at little or no expense, data in the visible and near infrared wavelengths acquired by sensors such as Landsat TM, are widely being used for the prediction of soil properties (Mattikalli, 1997). This is because different soil types variably absorb and/or reflect EM radiation at different wavelengths (Andronikov and Dorbrolv'skiy, 1991). In general, fine soil particles can cause increased surface albedo and reduce the spectral contrast between absorption features (Scull et al., 2003). In this study, seven bands of a Landsat TM image (acquired on May 3 1999), in combination with the other ancillary data, were used as predictors of PSFs.
Soil sampling and ancillary data preparation
During the field sampling, 125 locations were visited and sampled at 0–0.1 m depth (Figure 1c). The site selection was based on a combination of EM34 and EM38 readings (Triantafilis and Buchanan, 2010) and Landsat TM data (Odeh and Onus, 2008). Variation in EM data, from small to large , ensures sampling covering a wide range of soil characteristics (e.g., clay content); whereas, sampling according to variation in Landsat TM allowed collection of samples to represent irrigated and noncultivated fields, as well as areas of remnant vegetation. Samples were analyzed for clay (), silt (20–2 µm) and sand (20–2000 µm) fractions using the pipette technique (Coventry and Fett, 1979). Due to differing resolution of ancillary data, it is necessary to resample each to a common grid of cells. The EM34 and EM38 data were interpolated by local kriging algorithm using the VESPER software program (Minasny et al., 1999). The gamma-ray spectrometry, DEM, and Landsat TM, were resampled by averaging values of the eight adjacent pixels using IMAGINE software (ERDAS, 2003). The same resampling or interpolation techniques were performed to estimate the ancillary variables onto the 125 soil sample locations.
MLR and GAM
To make use of ancillary data that is well correlated with PSFs, a linear regression relationship is often used. In our case, involving multiple independent variables, we applied an MLR model defined as(1)where represents the value of the dependent variable (in our case, one of the PSFs) at the th site, , ,…, represent the ancillary data acquired at this site, , ,…, represent unknown regression parameters, and represents the random error component, which is assumed to exhibit some type of spatial dependence.
GAMs extend the linear model by flexibly modeling the relationships between the predictors and the dependent variable. The additive model generalized the liner model by modeling the response function as(2)where are smooth functions and . The GAM further extends the linear model by allowing a link between and the expected value of , which allows for an alternative distribution for the underlying random variation beside just the normal distribution.
Additive log-ratio transforms (ALR)
Traditional multivariate geostatistical techniques assume an unconstrained sample space and may fail to reflect the real spatial covariance structure of compositions (Odeh et al., 2003). As such, predicted values may not meet requirements of compositional data (De Gruijter et al., 1997):
Each of the components of the composition must be nonnegative(3)where is the estimate of a compositional regionalized variable, of th component at th location.
At each location, the components must sum to a constant and (4)
Estimates of the composition should be unbiased (5)
Crucial to the interpretation of regionalized compositions is understanding that the sample space is a positive simplex () and not a multidimensional space () (Aitchison 1990). A d-part simplex is(6)where are considered row vectors of d-part compositions; is a constant, the sum of vectorial compositions (e.g., 100 if composition is a percentage). The additive log-ratio (ALR), which allows transformation of the simplex to the real space , is defined as(7)where is the log ratio transformation of .
The inverse transformation of equation 7 is expressed as(8)
The effects of this transformation are twofold: (i) closure effect is removed, and (ii) through perturbation, the transformed data may fit a normal distribution, making the data suited to classical statistical analysis such as MLR (Odeh et al., 2003).
Prediction precision and bias
The prediction of PSFs, based on the correlations of the untransformed fractions with the ancillary variables, is undertaken using MLR and a GAM. To determine a suitable method to proceed, in terms of an ALR transformation of the PSF data, several criteria for comparing prediction performance of the statistical techniques were used. The mean error (ME) (Odeh et al., 1995), which measures bias of prediction, should be close to 0. This is because this value represents an unbiased technique. It is defined as(9)where and are the observed and predicted values, respectively. On the other hand, the root mean square error (RMSE), which is defined as(10)provides a measure of the accuracy of the prediction technique, which means that a precise set of predictions would have a value close to zero for a given method. To determine the improvement in prediction, the 125 data points alone were interpolated using OK, with the subsequent ME and RMSE compared to the results achieved using MLR and GAM. In addition, and because OK does not use ancillary data for prediction, any improvement in ME and RMSE will be attributable to the ancillary data. To calculate ME and RMSE, a modified version of the jackknife procedure (Good, 1999) is employed. This involved randomly selecting th of the total number of sites to be used as validation points. This process is undertaken 20 times to ensure each point is removed and used to independently to validate each of the prediction techniques.
To highlight which technique best met the conditions to spatially predict a regionalized composition, the summation of PSFs at each prediction point is analyzed. Further to this, an analysis of which technique most accurately classified textural class at each calibration point is undertaken. The optimal method is then repeated using a priori additive log ratio transformation of PSFs data. The VESPER software package is employed to carry out kriging with local variograms used.
RESULTS AND DISCUSSION
Exploratory analysis of soil PSF
Table 1 shows the summary statistics for the measured and log-transformed clay, silt, and sand fractions at each of the 125 calibration sites. The frequency distribution of clay is slightly negatively skewed, with the largest range (5.30–70.15%) and mean value (46.73%). The large mean clay content is a reflection of the predominantly clay alluvial nature of the landscape. Conversely, silt and sand are skewed slightly positive. Sand (stan. dev. = 17.29%) is most variable, followed by clay (stan. dev. = 15.23%) and silt (stan. dev. = 7.98%). The relatively large standard deviation for clay, and in particular sand, is probably due to the geomorphologically disparate units which these PSFs represent (i.e., alluvial clay and Aeolian dunes). The additive log-transformation of the PSFs reduced the skewness for sand (i.e., from to ); however, it slightly increased the skewness of the silt (i.e., from 0.51 to ) and clay (i.e., from -1.04 to ) fractions.
Spatial distribution of ancillary data
Figure 2a shows the spatial distribution of EM38 . The smallest values () are generally associated with the Aeolian dunes and the current floodplain south of the Darling River. The intermediate values are associated with parts of the alluvial floodplain with the intermediate-large () values characterizing some of the irrigated areas. In these locations, the largest values () may be a function of point-source soil salinity adjacent to some of the larger water reservoirs (e.g., north of the circular reservoir). Figure 2b shows the spatial distribution of TC (counts per second — CPS). There are many similarities with . For example, the Aeolian dunes are similarly characterized by intermediate-low values (i.e., 1,000–1,400 CPS). In addition, the alluvial plains, particularly where irrigation development has been extensive, are characterized by intermediate TC. Where TC and differ, is in the area south of the current Darling River and in parallel with the Bourke-Louth Road. Here, is intermediate-small, whereas TC is largest (). K and Th were also large (data not shown). This is consistent with Bierwirth (1997), who noted that areas high in Th can represent active floodplains containing K-feldspar and illite clay minerals, which characterize well-drained silty soil types.
Figure 1c shows the spatial distribution of DEM. The areas of greatest elevation are the Aeolian dunes (). The parts of the landscape of intermediate elevation are the current Darling River and various prior stream channels to the south (101–103 m). The area of least relief and lowest elevation are the alluvial plains north of the Darling River. Figure 2c shows the spatial distribution of B4 (red), B6 (green), and B7 (blue). The largest values for B4 coincide for the most part with the Aeolian dunes, as well as the present-day floodplain. The largest values for B6 and B7 coincide with the irrigated areas and some of the vegetated areas associated with the Aeolian dunes.
Correlation between ancillary data and PSFs
The Pearson correlation coefficients between the ancillary data and each of the raw and log-transformed PSFs are shown in Table 2. In general, the raw and log-transformed data showed similar coefficients. We address the PSFs in turn, from smallest to largest. With respect to clay, the strongest correlations were obtained with TC () followed closely by the EM38-h and EM38-v (). Smaller correlations were achieved with the deeper sensing EM34-10 (), EM34-20 (), and EM34-40 () followed by moderately strong correlations with radioelement K () and Th (). With regard to the use of DEM data, the level of correlation is surprisingly strong given that relief throughout the BID is only 20 m. DEM is negatively correlated with clay (). Landsat TM showed the smallest correlations overall. The best correlations were achieved with Bands 1, 2, and 6 (respectively, 0.40, 0.33, and 0.41).
Of all the PSFs, silt had the smallest correlation with ancillary data. These results are consistent with those reported by Sorensen and Dalsgaard (2005). In their study, they concluded that the mixture of minerals in the silts’ size fraction (i.e., the highly weathered sand minerals, such as quartz, and primary clay minerals such as feldspar and mica) resulted in less-defined correlations with near infrared spectroscopy data. Here, the correlation between silt and K is largest (), with the next best correlation achieved with the EM38-h (). Correlations between sand and the ancillary data were generally the inverse of clay. Gamma-ray spectrometry data showed the largest coefficients with sand, particularly TC () and K ().
Selection of ancillary data for MLR and GAM
The ancillary data for MLR and GAM were selected in a forward stepwise manner whereby data were introduced and removed from a model. The combined ancillary data that minimized the Akaike information criteria (Akaike, 1973) were selected for prediction (results not shown). This is because a minimization of Akaike information criteria ensures there is a balance between quality of fit and number of variables used in prediction (Triantafilis et al., 2004). Table 3 shows the ancillary data selected for each method to predict the PSF. With regard to the development of an MLR and GAM, clay is best predicted by EM38-v, EM34-20, TC, and Landsat B4, B6, and B7. With regard to developing a MLR or GAM for prediction of silt, the EM38-v, EM38-h, radioelement K, and B1, B5, and B7 were optimal. The results for sand were equivalent to clay in terms of optimal ancillary data, except that the EM34-10 and DEM were shown to be as useful as the EM38-v, EM34-20, and TC, along with B6 and B7, with B2 of more value than B4.
Prediction precision of PSF
Table 4 shows the RMSE and ME achieved using each of the prediction techniques and ancillary data shown in Table 3. As a benchmark, the RMSE and ME produced through OK of the 125 soil sample sites is shown. With respect to accuracy, the inclusion of ancillary data and development of either an MLR or GAM led to a considerable improvement in RMSE of clay and sand in comparison to OK. With respect to prediction of clay, the MLR () and GAM () show similar performance and represent an improvement in precision of 33% and 28% over OK (), respectively. With regard to silt, neither the MLR (7.39) nor GAM (7.39) produced an improvement in precision over OK (7.97). For sand, the GAM () is most accurate, showing an improvement in precision over OK and performing slightly better than the MLR ().
With regard to bias of prediction, the ME values for MLR and GAM showed that the GAM is least biased. That is, for clay (), silt (0.04), and sand (0.08) the predictions overall were close to zero, as compared with the MLR and OK. The reason for the slightly better results of the GAM over the MLR is attributable to the flexibility in the modeling between the PSF data and ancillary data. Given these results, we pursued the development of a GAM using ALR-transformed PSF data. Table 3 shows that the best ancillary data set for clay is identical to that for the GAM. In terms of silt, the GAM-ALR model is better served with the inclusion of EM34-40 instead of EM34-20, and B5 instead of B6 and B7. Sand differed whereby the GAM-ALR required B3 and not B1 for GAM.
Meeting the compositional requirements of PSF
Table 5 shows that only the GAM-ALR honors the requirement of a composition by summing to unity (100%) and having nonnegative predictions. This is not the case with respect to the predictions in clay achieved using an MLR or GAM. Using either method, although the mean clay is roughly equal to that of the sampled clay, the minimum values predicted by these techniques are both negative (i.e., 11.4% and -13.4%, respectively). This is also the case for silt (respectively, % and ) and sand (% and 14.1%). These negative values are due to some of the prediction points being outside the calibration range and therefore not being constrained to a positive simplex. These results show unequivocally the superiority of the ALR technique.
Spatial distribution of PSF
Figure 3 shows the spatial distribution of the various PSFs using GAM-ALR. It can be seen from Figure 3a that the GAM-ALR technique shows a sharp change in clay associated with the alluvial plain and the Aeolian dunes. The juxtaposed nature of these units is most evident south of the Wanaaring Road. It is also evident between the circular and large dual-cell reservoir in the central western part of the area. In these and many other locations, the change in clay is 30%, which equates to about half the range in clay (i.e., 5.3% and 70.15%), and occurs over a distance of less than 500 m. The ancillary data which allows the GAM-ALR to capture and represent this abrupt change is the gamma-ray spectrometry data (i.e., TC) shown in Figure 2b. The reason for this is the higher spatial resolution of this ancillary data; because it is collected every 60 m and on 250 m transects. This is similarly the case with regard to the DEM. To some extent, the Landsat TM data is similarly useful in discerning the sharp boundary between the alluvial and Aeolian landscapes. With regard to the EM data, which broadly reflected these two landscapes, some of the subtleties in the variation of clay on the alluvial plain are better reflected by changes in EM38-v. This is shown in Figure 3a, and in particular, between the four large water reservoirs north of the Darling River. Here, it is evident that the larger clay content (i.e., ) is consistent with the areas where EM38-v is largest (i.e., ).
Overall, the use of the GAM-ALR and the various ancillary data sets clearly identifies the alluvial plain from the Aeolian dunes. In terms of a qualitative validation, the contiguous band of large clay evident north of the Darling River, which extends from Ford’s Bridge Road and runs in parallel with Janbeth Road, is consistent with the extent of the alluvial floodplain, which is often subject to inundation. In addition, the area of larger clay content in the northernmost part of the study area, where clay is greater than 50% and in some areas is greater than 60%, coincides with the location of the playa basin.
The largest values of predicted silt (i.e., ) are associated with the sediments south of Bourke and north of the Bourke-Louth Road (Figure 3b). The main reason is the larger radioelement counts, in particular K, which is most useful in silt prediction. The reason for this is that the silt size grains are less weathered than the older clays associated with the alluvial plain, and as such have larger amounts of K retained in the mineral structure in these locations near the River. It is also worth noting that, of the three PSFs, only the prediction of silt is optimal in terms of including both EM38 readings. Given that the Landsat TM Bands 1, 5, and 7 were the most useful as ancillary data for prediction, this suggests that the spatial distribution of silt is a function of the most recent geomorphological processes, because the most-used ancillary data provide information about the soil surface, topsoil, and subsoil. Consequently, the predictions of silt might be influenced by the anthropogenic activities. This is evident in many of the irrigated cotton fields, where silt prediction appears to be much higher in some fields as compared with fields directly adjacent; for example, west of the circular water reservoir. Figure 3c shows the spatial distribution of predicted sand content, which is essentially the inverse of clay.
Spatial distribution of soil texture classes
From the end-user point of view, PSFs are better interpreted as soil texture classes. Figure 4a and 4b shows, respectively, the spatial distribution of predicted soil texture using GAM and GAM-ALR. The Australian soil texture triangle is used to classify the texture class (see Figure 5). In both maps, the juxtaposition of the predominantly alluvial plain and the Aeolian dunes are evident. The alluvial floodplain is characterized by clay and silty clay textures. The former is the largest mapped, and accounts for PSFs where the clay is generally larger than 35%, silt is between 0% and 20%, and sand is of generally no more than 35%. The circular playa basin, and to a lesser extent the plains landscape south of the Darling River, are characterized by these two texture classes.
With regard to the GAM, the silty clay texture class takes up almost half the area. In some areas, the silty clay is mapped as narrow blocks or lines interspersed in areas predominantly mapped as clay. These artifacts of prediction were probably caused by farm access roads, which change the spectral reflectance of the Landsat TM bands. They are also due to the effect of differential land management regimes used in fields within the irrigated regions. The reason that GAM and GAM-ALR produced such disparity in textural classifications is a function of two factors. The first is that a classification boundary problem exists, and is a function of assigning hard classification to continuous data (0%–100%). That is, assigning the three individual components (i.e., sand, silt, and clay) into hard classes (i.e., Australian soil texture classification). For example, where components of two different samples of PSF data that in reality differ by 1% or 2% in clay for example, gives rise to the appearance of a larger difference in soil texture because it falls within another class. The second factor is due to the fact that the GAM does not meet all of the requirements of a composition. That is, each PSF’s prediction should sum to a constant at each location (i.e., 100%). In this study, the average summation to the constant at each PSFs prediction point with respect to the GAM is 97%. To account for this discrepancy, and to assign a location to a soil texture, the remaining 3% on average is attributed to silt. This inherently leads to lighter soil texture prediction, overall. The GAM-ALR, which is constrained by the constant-sum of a simplex, gives values that minimize change to any particular PSF. The result is an increase in the accuracy of texture class allocation, albeit minor, from 57% to 59% with GAM and GAM-ALR, respectively.
The improved soil-texture classification of the GAM-ALR, as compared with the GAM, can be attributed to the “lost information” in using silt fraction to top-up the PSFs to unity for the non-ALR techniques. It has been argued by Woronow (1997) that log-ratio analysis cannot add any information, and therefore it cannot expand the scope of questions that can be resolved with compositional data. While this may be true, it can be argued that log-ratio analysis does not lead to loss of information either, because there is a one-to-one correspondence between any D-part composition and its log-ratio vector (Aitchison, 1999). The GAM-ALR technique is less affected by the ancillary data points that fall outside the calibration range. This appears to be a benefit above and beyond constraining data to the simplex.
It should also be noted that, through prediction of all components of the composition, large errors are minimized as compared to making a noncompositional prediction of each PSF. This is best illustrated by considering the ternary plots of the predictions made with GAM and GAM-ALR shown in Figure 5. It is obvious that the GAM-ALR produced results that better reflect the distribution of the PSFs of the 125 samples (marked as black dots) than is the case for GAM (Figure 5a). Additionally, many of the GAM predictions approach unrealistically large values (e.g., 100% clay) and small values (e.g. 0% silt). However, the GAM-ALR achieved more robust predictions in the various PSFs than GAM alone because it constrains the values to more closely match the sampled distribution.
The Aeolian dunes are also well-discerned. The largest is located north of the Wanaaring Road. In addition, another large Aeolian dune is evident at the junction of this road and the Bourke-Louth Road, with a third contiguous area discerned in the approximate shape of a semicircle, which is evident in the southwest corner of the study area. The Aeolian dunes are characterized predominantly by the texture classes of either clay loam or loam. However, smaller areas are characterized by loamy sand and sandy loams. Interestingly, these smaller pockets are usually situated in the western parts. This internal variation of soil texture is best illustrated on the semicircular shaped Aeolian dune evident in Figure 4b. The variation in texture shows a progressive transition from a coarse-textured (sandy loam) along the southwestern edge, to finer textures (loam to clay loam) toward the northeast. The orientation of this transition in soil texture suggests the direction of the prevailing wind which deposited these Aeolian sediments is from the southwest. The reason for the fining of the soil texture is a function of the winnowing of the finer particles from the front edge of the Aeolian dune further to the northeast.
Various sources of ancillary data were collected to assist in developing digital soil maps of PSFs, including clay, silt, and sand. To determine a suitable method of interpolation, we developed relationships between the ancillary data and the PSFs using MLR and GAM. Owing to the flexibility in the modeling between the PSF data and ancillary data, we pursued the development of a GAM using ALR transformed PSF data. The benefit of using the GAM-ALR is clearly demonstrated in the significant improvement in precision and bias of prediction for each of the PSF data as compared with the OK, MLR, and GAM approaches. The GAM-ALR also honors the requirement of a composition by summing to unity (100%) and having nonnegative predictions. Furthermore, the use of the GAM-ALR resulted in a larger percentage of correct predictions of texture class.
To value add to the results achieved here, the method of GAM-ALR could be applied to a 3D stratigraphic models of true electrical conductivity recently developed for the study area (see Triantafilis and Monteiro Santos, 2011). This is because the calibration of the data to PSF provides additional information which can be used to model water flow, using pedotransfer functions, through the soil and which can potentially recharge a saline shallow water table (see Buchanan and Triantafilis, 2009). In addition, and to determine where errors in prediction of each of the PSFs occur, an error budget procedure, as developed by Nelson et al. (2011), could be used to quantify the relative contributions that positional, analytical, covariate, and model error make to the predictions.
The Australian Cotton Research and Development Corporation provided the senior author with a Ph.D. scholarship. We acknowledge the Australian Cotton Cooperative Research Center, who provided the core funding. We thank the Bourke Irrigators Association for supporting an application and administering funds obtained from the Australian Federal Government’s Natural Heritage Trust. We are grateful to the numerous landholders who allowed us unrestricted access to their farms. We acknowledge the contribution of Michael Short, Mathew McRrae, and Andrew Huckel who assisted in carrying out the EM surveys and soil sampling. We thank Davood Moghadas (Forschungszentrum Jülich), Timothy de Smet (Texas A&M University), and an anonymous reviewer for their robust and constructive criticisms of the manuscript, which materially improved the readability and scientific quality of the paper.
- Received February 14, 2012.
- Accepted March 28, 2012.