Fig 1.
Standard soil depths following the GlobalSoilMap.net specifications and example of numerical integration following the trapezoidal rule.
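The trapezoidal-rule integration mentioned in this caption can be illustrated with a minimal sketch. It assumes the seven standard prediction depths are 0, 5, 15, 30, 60, 100 and 200 cm; the function name `depth_interval_mean` and the pH values are hypothetical and serve only as an illustration, not as the authors' implementation.

```python
# Standard depths in cm (assumption: the seven SoilGrids prediction depths).
STANDARD_DEPTHS_CM = [0, 5, 15, 30, 60, 100, 200]


def depth_interval_mean(depths, values, upper, lower):
    """Average a soil property over [upper, lower] cm with the trapezoidal rule.

    Both bounds must coincide with entries of `depths` (hypothetical helper).
    """
    i, j = depths.index(upper), depths.index(lower)
    if j <= i:
        raise ValueError("lower bound must be deeper than upper bound")
    area = 0.0
    for k in range(i, j):
        thickness = depths[k + 1] - depths[k]
        # Area of one trapezoid: layer thickness times the mean of its two edge values.
        area += thickness * (values[k] + values[k + 1]) / 2.0
    return area / (lower - upper)


# Illustrative (made-up) pH predictions at the seven standard depths.
ph = [5.8, 5.9, 6.1, 6.4, 6.6, 6.7, 6.8]
ph_0_30 = depth_interval_mean(STANDARD_DEPTHS_CM, ph, 0, 30)
print(round(ph_0_30, 3))
```

Dividing the integrated area by the interval thickness gives a depth-weighted mean, so thicker layers contribute proportionally more than thin surface layers.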
Fig 2.
Example of soil variable-depth curves: Original sampled soil profiles (black rectangles) vs predicted SoilGrids values at seven standard depths (broken red line), and predicted soil organic carbon stock for depth intervals 0–100 and 100–200 cm.
Locations of points from the USDA National Cooperative Soil Survey Soil Characterization database: a mineral soil profile S1991CA055001 (122.37°W, 38.25°N) and an organic soil profile S2012CA067002 (121.62°W, 38.13°N).
Fig 3.
Input profile data: World distribution of soil profiles used for model fitting (about 150,000 points shown on the map; see acknowledgments for a complete list of data sets used).
Yellow points indicate pseudo-observations. For the majority of points shown on this map, laboratory data can be accessed from ISRIC’s World Soil Information Service (WoSIS) at http://wfs.isric.org/geoserver/wosis/wfs.
Fig 4.
Examples of covariates used to generate SoilGrids: TWI is the Topographic Wetness Index (values multiplied by 100), EVI is the MODIS Enhanced Vegetation Index (values multiplied by 10,000), and s.d. LST is the long-term standard deviation of MODIS Land Surface Temperatures (values in degrees Celsius).
Location: San Francisco bay area, California. Size of the bounding box is 300 by 300 km.
Fig 5.
The (data-driven) statistical framework used for generating SoilGrids.
SoilGrids are primarily based on publicly released soil profile compilations, NASA’s MODIS and SRTM data products and Open Source software compiled with the ATLAS library: R (including contributed packages), and Open Source Geospatial Foundation (OSGeo) supported software tools.
Fig 6.
Fitted variable importance plots for target variables.
Generated as an average of predictions using the ranger and xgboost packages (for soil types, results are based on the ranger model only). DEPTH.f is depth from the soil surface, T**MOD3 and N**MOD3 are mean monthly daytime and nighttime temperatures (red color), TWI, DEM, VBF and VDP are DEM parameters (bisque color), M**MOD4 are mean monthly MODIS NIR band reflectances (cyan color), P**MRG3 are mean monthly precipitation (blue color), E**MOD5 are mean monthly EVI derivatives (dark green color), VW*MOD1 are monthly MODIS Precipitable Water Vapor images (orange color), C**GLC5 are land cover classes (light green color), and ASSDAC3 is the average soil and sedimentary-deposit thickness (brown color).
Fig 7.
Examples of relationships between target variables and the most important covariates: (top row) bulk density in kg m−3, (middle row) soil pH, and (bottom row) soil organic carbon in per mille (on log scale).
Plots show target variables and the top three most important covariates as reported by the random forest model. DEPTH.f is the observed depth from soil surface, T09MOD3 is mean monthly temperature for September, TMDMOD3 is mean annual temperature, PRSMRG3 is total annual precipitation, M04MOD4 is mean monthly MODIS NIR band reflectance for April, P07MRG3 is mean monthly precipitation for July, T01MOD3 is mean monthly temperature for January, and T02MOD3 is mean monthly temperature for February.
Table 1.
SoilGrids average prediction error for key soil properties based on 10–fold cross-validation.
N = “Number of samples used for training”, ME = “Mean Error”, MAE = “Mean Absolute Error”, RMSE = “Root Mean Squared Error” and R-square = “Coefficient of determination” (amount of variation explained by the model). For variables with a skewed distribution, such as organic carbon, coarse fragments and CEC, the accuracy statistics are also provided on log-scale⊗.
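The accuracy statistics listed above can be computed as in the following sketch. Several choices here are assumptions not stated in the caption: the error is taken as predicted minus observed, R-square is computed as 1 − SSE/SST, and the log-scale variant uses a log(x + 1) transform. The function name `accuracy_stats` and the sample values are hypothetical.

```python
import math


def accuracy_stats(observed, predicted, log_scale=False):
    """Return ME, MAE, RMSE and R-square for paired observations and predictions."""
    if len(observed) == 0 or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if log_scale:
        # Assumed transform for skewed variables: log(x + 1).
        observed = [math.log1p(o) for o in observed]
        predicted = [math.log1p(p) for p in predicted]
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]  # assumed sign convention
    sse = sum(e * e for e in errors)
    mean_obs = sum(observed) / n
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "N": n,
        "ME": sum(errors) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sse / n),
        "R2": 1.0 - sse / sst,  # share of variation explained
    }


# Made-up cross-validation pairs, for illustration only.
stats = accuracy_stats([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.5])
print(stats)
```

In a 10-fold cross-validation setting, `predicted` would hold the out-of-fold predictions collected across all folds, so that every observation is predicted by a model that did not see it during training.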
Table 2.
Mapping performance of SoilGrids250m compared to summary results for SoilGrids1km [9].
Amount of variation explained by the models (Eq 4) and prediction accuracy for soil types were determined using 10–fold cross-validation. GSIF = “Global Soil Information Facilities”.
Fig 8.
Correlation (density) plots produced as a result of 10–fold cross-validation.
See also Table 1 for more details.
Fig 9.
Maps of scaled Shannon Entropy index (Eq 5) for USDA and WRB soil classification maps.
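As a rough illustration of the index mapped here, the sketch below computes a scaled Shannon entropy from a vector of predicted class probabilities. It assumes the scaling is done by using the number of classes as the logarithm base, so that the index ranges from 0 (one class certain) to 1 (all classes equally probable); the function name and the probabilities are hypothetical, and the exact form of Eq 5 should be checked against the paper.

```python
import math


def scaled_shannon_entropy(probs):
    """Scaled Shannon entropy of class probabilities, with log base = number of classes."""
    b = len(probs)
    if b < 2:
        raise ValueError("at least two classes are required")
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-6):
        raise ValueError("probabilities must sum to 1")
    # Terms with p = 0 contribute nothing (the limit of p*log(p) as p -> 0 is 0).
    return -sum(p * math.log(p, b) for p in probs if p > 0)


# Made-up probabilities for four soil classes at a single pixel.
confident = scaled_shannon_entropy([0.97, 0.01, 0.01, 0.01])
uncertain = scaled_shannon_entropy([0.25, 0.25, 0.25, 0.25])
print(round(confident, 3), round(uncertain, 3))
```

High values flag pixels where the model spreads probability across many classes, which is how such a map can serve as a measure of classification uncertainty.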
Fig 10.
Example of scaled Shannon Entropy index for USDA and WRB soil classification maps, zoomed in on the US state of Illinois near the city of Chicago.
This figure uses the same legend as used in Fig 9.
Table 3.
Classification accuracy for predicted USDA class probabilities based on 10–fold cross-validation, ordered according to number of occurrences.
ME = “Mean Error”, TPR = “True Positive Rate”, AUC = “Area Under Curve”, N = “Number of occurrences”, USDA = “United States Department of Agriculture” soil classification system. The 1st and 2nd most probable classes are taken from the confusion matrix.
Table 4.
Classification accuracy for predicted WRB class probabilities based on 10–fold cross-validation, ordered according to number of occurrences.
ME = “Mean Error”, TPR = “True Positive Rate”, AUC = “Area Under Curve”, N = “Number of occurrences”, WRB = “World Reference Base” soil classification system. The 1st and 2nd most probable classes are taken from the confusion matrix.
Fig 11.
List of some remote sensing data of relevance for global soil mapping projects (i.e. with a near to global coverage and with remote sensing technology of interest to soil mapping).
Landsat 8 is part of the Landsat Data Continuity Mission (LDCM) maintained by NASA and the United States Geological Survey (USGS). ALOS Global Digital Surface Model is a product of the Japan Aerospace Exploration Agency (JAXA). Sentinel–1 and Sentinel–2 are Earth observation missions developed by the European Space Agency as part of the Copernicus Programme. WorldDEM™ is a commercial product distributed by Airbus Defence and Space.
Fig 12.
Comparison between predicted soil pH: (above) SoilGrids (our predictions) for part of California and predictions based on the SSURGO data set (for 0–200 cm depth interval) developed by the National Cooperative Soil Survey, (below) SoilGrids (our predictions) for Tasmania and predictions based on the Soil and Landscape Grid of Australia [76] (for 0–5 cm depth interval).
The correlation coefficients between the two data sources are 0.79 and 0.71, respectively. Crosses on the map indicate soil profiles used for generating SoilGrids.
Fig 13.
SoilGrids can be considered the ‘coarsest’ component of the global soil variation ‘signal’ curve.
Other components, e.g. finer products based on local / more detailed 250–100 m resolution imagery, could be added to produce a merged product.
Fig 14.
Basic design and functionality of SoilGrids.org: Soil web-mapping browser that provides interactive viewing of 3D soil layers.
Reference administrative data, basic functionality and output data license of SoilGrids.org are primarily based on OpenStreetMap.