Data

Report Dataset

Predictions of PFAS Occurrence in Groundwater at the Depth of Drinking Water Supplies in the Conterminous United States: Data and Model Archive

Tokranov, A.K., Bexfield, L.M., Ransom, K.M., Kingsbury, J.A., Fram, M.S., Lindsey, B.D., Watson, E., Dupuy, D.I., Voss, S.A., Jurgens, B.C., Stackelberg, P.E., Beaty, D.A., Smalling, K.L., and Bradley, P.M., 2024, U.S. Geological Survey data release

An extreme gradient boosting ensemble tree model was developed to predict the occurrence of per- and polyfluoroalkyl substances (PFAS) in groundwater at the depths typical of the bottom of public and domestic drinking water supplies across the conterminous United States. PFAS data used to train the model were collected between 2019 and 2022 by the U.S. Geological Survey (USGS) National Water Quality Network, Groundwater and the California Groundwater Ambient Monitoring and Assessment Program – Priority Basin Project. This dataset contains concentrations of PFAS, volatile organic compounds (VOCs), pharmaceuticals, and tritium in groundwater, along with associated quality assurance and quality control data. Concentrations of VOCs and pharmaceuticals were measured at the USGS National Water Quality Laboratory in Lakewood, Colorado. Concentrations of PFAS were measured at SGS North America Inc. - Environment Health & Safety in Orlando, Florida. Concentrations of tritium were measured at the USGS Tritium Laboratory in Menlo Park, California. For modeling, PFAS concentrations were converted to a binary variable (detected or not detected). The predictor variables used to train the final model were urban land use, well depth, average soil clay content, nitrogen loading from septic systems, recharge, population density, depth to water, and distance to the nearest potential PFAS source (fire training areas, airports, waste facilities, etc.). Model predictions were generated at 1x1 kilometer resolution. All model inputs (the model training dataset and the rasters used for prediction), the model object, the model outputs, and the source code used to generate the final model predictions are provided. The model outputs are rasters of the predicted occurrence of PFAS at the depths typical of (1) public drinking water supplies and (2) domestic (private well) drinking water supplies.
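The conversion of PFAS concentrations to a binary detected/not-detected variable, as described above, might be sketched in Python as follows. This is only an illustration under stated assumptions and is not the code shipped in the archive. The record layout (`PfasResult`), the use of a "<" remark code to mark non-detects, the rule that a sample counts as detected when any one compound is detected, and the predictor names in `PREDICTORS` are all hypothetical and may not match the archive's files.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# Predictor names paraphrased from the abstract; the actual column
# names in the archive's training dataset are an assumption here.
PREDICTORS = [
    "urban_land_use",
    "well_depth",
    "soil_clay_content",
    "septic_nitrogen_loading",
    "recharge",
    "population_density",
    "depth_to_water",
    "distance_to_pfas_source",
]


@dataclass
class PfasResult:
    """One laboratory result for one compound in one sample (hypothetical layout)."""
    sample_id: str
    compound: str
    concentration_ng_per_l: Optional[float]
    remark: str  # assumption: "<" marks a result below the reporting level


def is_detected(result: PfasResult) -> bool:
    """Treat a result as a detection only if it has a value and is not censored."""
    return result.remark != "<" and result.concentration_ng_per_l is not None


def binary_detection_by_sample(results: Iterable[PfasResult]) -> Dict[str, int]:
    """Collapse per-compound results to one 0/1 label per sample.

    Assumption: a sample is labeled 1 if any PFAS compound was detected.
    """
    detected: Dict[str, int] = {}
    for r in results:
        detected[r.sample_id] = detected.get(r.sample_id, 0) | int(is_detected(r))
    return detected
```

Labels produced this way could then be paired with the eight predictor values for each well to form a training table for a gradient boosting classifier.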