Satellite Monitoring of Terrestrial Plastic Waste

Plastic waste is a significant environmental pollutant that is difficult to monitor. We created a system of neural networks to analyze spectral, spatial, and temporal components of Sentinel-2 satellite data to identify terrestrial aggregations of waste. The system works at continental scale. We evaluated performance in Indonesia and detected 374 waste aggregations, more than double the number of sites found in public databases. The same system deployed across twelve countries in Southeast Asia identifies 996 subsequently confirmed waste sites. For each detected site, we algorithmically monitor waste site footprints through time and cross-reference other datasets to generate physical and social metadata. 19% of detected waste sites are located within 200 m of a waterway. Numerous sites sit directly on riverbanks, with high risk of ocean leakage.


Introduction
Plastics are a major pollutant impacting our planet. They are integrated into nearly all aspects of our daily life and are leaking into the environment via pathways that are not fully understood. Plastics in the environment are now ubiquitous: they have been found at the world's highest points, in the deepest parts of the ocean, in seafloor sediment cores, from populated areas to remote islands, and at both poles [1] [2] [3] [4] [5] [6]. On reaching the ocean, plastics persist for decades as an insidious pollutant [7] [8]. Plastics have been found to harm hundreds of species, including all sea turtle species and almost half of cetacean and marine bird species, and to damage coral reefs and other ecosystems [9] [10] [11]. With an estimated 11 million metric tons of plastic waste currently entering the ocean each year, a rate that is expected to nearly triple by 2040 [12], it is more urgent than ever to address the plastic pollution issue further upstream.
Previous research has shown that plastic in the environment and ocean is influenced by mismanaged waste on land [13] [14] [15]. It is estimated that 70-80% of plastic pollution comes from land-based sources and that 91% of ocean plastic pollution occurs via watersheds [16] [14]. Additionally, while there are an estimated 100 rivers transporting plastic waste to the ocean, the top ten are located in South or Southeast Asia where dumpsites are still commonly used for disposal [17][18] [19].
Given that plastics constitute more than 12 percent of global waste [19], the development of new and effective plastic management strategies requires an understanding of waste aggregations and flows (particularly for litter hotspots, illegal dumping, and related high-risk leakage sites). Due to lack of resources to scale, government reporting can be scarce, out of date, and often doesn't account for informal waste management practices. This work seeks to leverage remote sensing data to fill this information gap, tracing waste aggregation locations in service of reducing the impacts of plastic pollution. With remote detection of waste aggregations, one can measure rather than model waste distributions and monitor waste site development through time, eventually within a globally consistent and comprehensive dataset.
To our knowledge, no terrestrial global and operational monitoring system for plastic waste exists. For ocean plastics, conceptual [20] and small-scale [21] studies have shown that the spectral signature of floating plastic debris is likely characterizable. Both Biermann et al. [22] and Themistocleous et al. [23] demonstrated that indices derived from multispectral Sentinel-2 data are sufficient to identify floating debris in a marine environment. On land, the spectral diversity of waste and land cover makes it challenging to devise spectral indices that can effectively discriminate waste. More recently, Gill et al. [24] developed a method for detecting large, managed landfills in Kuwait using Landsat-derived land surface temperature increases and Page et al. [25] developed a classification method for both tire and plastic waste in Scotland using Sentinel-1 and Sentinel-2 data.
arXiv:2204.01485v1 [cs.CY] 24 Mar 2022
Growth of computational infrastructure, architectural innovations, and new training techniques have established neural networks as preeminent systems for image classification. Recent work has shown that neural network systems have the ability to produce global datasets from Earth observation data, classifying and monitoring features at a greater level of specificity, robustness, and scale than ever before [16] [26][27] [28]. We build on this work, creating a novel pipeline of neural networks that parse spectral, structural, and temporal information from Sentinel-2 satellite data to identify plastic waste aggregations on land throughout Southeast Asia.
The computational engine consists of two convolutional neural networks that analyze and combine spectral, spatial, and temporal signals. The two networks work in tandem, with candidate regions generated by the first that are then cross-validated by the second. The first ingests per-pixel spectrograms, twelve-band Sentinel-2 spectra concatenated across two time steps. This data structure leverages both spectral and temporal patterns when assessing the likelihood that a given pixel contains plastic waste. Candidate waste sites are then generated via blob detection on the raster of positively-classified pixels. A second neural network validates the candidate sites. This secondary network operates on patches of raster data rather than individual pixels, incorporating spatial context in its evaluations along with the spectral and temporal data.
In training the models, we began with only ten known waste sites on the island of Bali. We bootstrapped on early model outputs and incorporated unlabeled data through semi-supervised distillation until the models achieved sufficient expressivity to operate over the whole of Southeast Asia. Deployed on the Descartes Labs geospatial analytics platform, the system returns a threefold increase in validated waste site detections over those documented on OpenStreetMap. The approach allows for a repeatable, scalable, cost-effective, and operational monitoring capability for plastic waste on land. This work is in direct support of the global observation system for marine debris as proposed by Martínez-Vicente et al. [29].

Waste Site Detection and System Performance
Indonesia We evaluated every 10 × 10 m Sentinel-2 pixel captured in Indonesia (1.81 × 10^6 km^2) at nine time steps between January 2019 and March 2021. This produced 163 billion predictions at the pixel level and 623 million classifications of patches. To reduce variance, we averaged the time-step outputs to arrive at a final assessment of the presence of waste.
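As a rough consistency check, the pixel-level prediction count follows from the evaluated area, the 10 m pixel size, and the nine time steps. The short sketch below reproduces that arithmetic. The variable names are ours, and the calculation assumes that every 10 × 10 m cell in the stated area is evaluated exactly once per time step.

```python
# Back-of-envelope check of the pixel-level prediction count.
AREA_KM2 = 1.81e6      # evaluated area of Indonesia, from the text
PIXEL_SIDE_M = 10      # Sentinel-2 pixel edge length, from the text
TIME_STEPS = 9         # number of time steps evaluated, from the text

pixels_per_km2 = (1000 // PIXEL_SIDE_M) ** 2        # 100 x 100 = 10,000
pixels_per_time_step = AREA_KM2 * pixels_per_km2    # about 1.81e10
total_predictions = pixels_per_time_step * TIME_STEPS

print(f"{total_predictions / 1e9:.0f} billion pixel predictions")
```

The result, roughly 1.63 × 10^11, agrees with the 163 billion pixel-level predictions reported.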
In total, the model detected 374 plastic aggregation sites across Indonesia that we were subsequently able to validate (Fig. 1). This is more than double the number of cataloged waste sites in known databases. The nature of detected sites varies, though identifications are predominantly formal government-run open waste sites and small-scale informal dumpsites. Using data from the Indonesian Ministry of Public Works and Public Housing [30] and OpenStreetMap, we compiled a list of 184 waste sites operating across Indonesia. Though a complete set of waste locations is not known, we use this subset of sites to evaluate the false negative rate for the model. The system had a recall rate of 80% in a high sensitivity configuration, and a 40% recall rate in low and medium sensitivity modes (Table 1). The system detects about three previously unknown waste sites for every site it misses.

Southeast Asia
We also ran the system across all countries in Southeast Asia. Because the model received no tuning or additional training data to expand beyond Indonesia, we ran the pipeline in a low sensitivity configuration. We detected and confirmed a total of 996 plastic aggregation sites in Southeast Asia (Fig. 2). This is a nearly three-fold increase over the number of recorded waste sites listed in these countries on OpenStreetMap. 53% of candidate locations produced by the pixel classifier and confirmed by the patch classifier were validated by human evaluators as waste aggregations (Table 2).

Site-Specific Metrics
Site Proximity to Waterways We find that the centers of 19% of waste sites in Southeast Asia are located within 200 m of a waterway or waterbody listed on OpenStreetMap, and more than half are within 750 m (Fig. 3). For sites that are located within 5 km of a waterbody, the median distance is 706 m.
We also identify a number of waste sites situated directly on the banks of rivers. Referencing high-resolution satellite imagery, we can observe waste overflowing retaining structures and spilling directly into the waterways.
Footprint Monitoring Using the pixel classifier, we generate monthly site boundaries. It is difficult to quantitatively assess their accuracy and precision in the absence of ground truth data. Visual inspection of imagery does not suffice, because human labelers cannot reliably delineate boundaries between bare earth and waste in high-resolution imagery. Qualitatively, waste site boundaries frequently match historical imagery on visual comparison (Fig. 6). They also exhibit the misclassification modes of the pixel classifier. In terms of the mean footprint area across time, 38% of detected waste sites are smaller than 0.1 ha, equivalent to a square area about 30 meters on a side, and 82% of sites are smaller than 0.5 ha (Fig. 4). The mean area of a single site in the region is 0.47 ha (SE 0.044). As a reference for dumping ground size, the footprints of known formal waste sites in Bali range from 0.51 to 4.5 ha. Given that managed waste sites empirically tend to be larger, the number of identified sites smaller than 0.5 ha may indicate that the majority of detections in this work are informal dumping grounds.

Individual Model Accuracy
In evaluating on a fully-withheld test dataset, the component single-pixel and patch-based neural networks prove to be highly performant, with f1 scores over 90% (Table 3). Performance persists across individual land-cover classes (forest, farm, etc.) within the negative-class test data, with city and bare earth emerging as relatively challenging cover types. Details on the construction of the test dataset used in these evaluations are given in Methods (Sec. 4.1).
To identify the importance of input data features, we compared the performance of the 12-band spectrogram pixel classifier against models with selectively reduced inputs (Table 3).
Including temporal information through a spectrogram input improved the true positive rate from 66.64% to 71.99%, and increased the true negative rate from 99.65% to 99.97%. Despite hiding in the decimal-percentage places, this increase in true negative rate cuts the false positive rate from 0.35% to 0.03%, an order of magnitude improvement in false positive suppression. The domain of operation for the system is highly class imbalanced, with approximately ten million true negative pixels for each single true positive pixel. These seemingly fine numerical margins of improvement make the difference between a practically useful system and one where true positives are lost in the noise of false detections.
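The effect of the true negative rate under heavy class imbalance can be made concrete with a few lines of arithmetic. The sketch below is illustrative only: it applies the test-set rates quoted above to the approximate ten-million-to-one imbalance, which assumes the test-set rates carry over to the operating domain. Variable names are ours.

```python
tnr_single = 0.9965    # true negative rate without temporal input (from the text)
tnr_spectro = 0.9997   # true negative rate with spectrogram input (from the text)

fpr_single = 1 - tnr_single     # share of negative pixels flagged as waste
fpr_spectro = 1 - tnr_spectro
reduction = fpr_single / fpr_spectro

NEGATIVES_PER_POSITIVE = 10_000_000   # approximate imbalance quoted in the text

# Expected false-positive pixels for every true waste pixel present in the scene.
fp_single = fpr_single * NEGATIVES_PER_POSITIVE
fp_spectro = fpr_spectro * NEGATIVES_PER_POSITIVE

print(f"false positive rate reduced by a factor of about {reduction:.1f}")
print(f"false positives per waste pixel: {fp_single:,.0f} vs {fp_spectro:,.0f}")
```

Under these assumptions the pixel classifier alone would still emit thousands of false-positive pixels for each waste pixel, which is consistent with the need for blob detection and second-stage patch validation.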
Broad spectral coverage is also seen to be essential for waste identification. Reduced-spectrum networks that took only RGB or RGB+NIR bands as input showed only a minimal capacity to identify waste, as evidenced by true positive rates under 30%.
Finally, the neural network demonstrated a greater capacity for classification compared to a random forest trained on the per-pixel data, with an unweighted f1 score of 90.46% vs. 77.39%.
Though the patch classifier functions only to cross-validate the pixel classifier candidates, it too has a high level of classification accuracy. We evaluated the performance of this semi-supervised single network versus its teacher ensemble of 32 supervised networks, and found that the student network had an f1 score of 97.56% vs. the ensemble's 95.31%. Of course, the single network also offers the benefit of more efficient inference as compared to the ensemble (Table 3).
[...]; Indonesia, where a waste site has collapsed into the river (6.591 S, 107.741 E); Indonesia, with waste spilling into the water (6.141 S, 106.616 E); Indonesia, where the side stream is eroding the mounded waste (6.206 S, 107.034 E); and Sri Lanka bay shore (7.771 N, 81.601 E). Many sites that leak into water later show signs of remediation, in the form of retaining walls, rebuilt banks, or waste burial, suggesting that they are in fact environmental hazards and recognized as such by local authorities. All images are © Maxar / Google, from Google Earth, except the Sri Lanka scene which is © Mapbox aerial imagery.

Discussion
This research establishes the first comprehensive dataset on the distribution and characteristics of plastic aggregation and waste sites within Southeast Asia by remote sensing. It also demonstrates an architecture for using neural network-based systems to consistently and extensibly detect and monitor plastic and waste aggregations in Sentinel-2 data. We found 277 waste sites that do not exist in public databases, more than doubling the count of known sites in Indonesia, and then expanded the model to identify and validate nearly a thousand plastic and waste aggregation sites across Southeast Asia. The results of this work are presented in an open data portal at https://plastic.watch.earthrise.media, providing information to the public, governments, non-government organizations, and multiple industries about the spatial and temporal characteristics of waste aggregation sites. This data can be used to inform upstream interventions and to prevent further plastic pollution, as well as inform other mitigation efforts, waste management strategies, and cleanup campaigns. This open data platform offers both assessment and monitoring of terrestrial plastic pollution at a scale that has not been realized previously.
This work may serve as a template for utilizing deep learning for environmental monitoring and detection systems. In particular, we demonstrate model and system architectures for incorporating multiple dimensions of remotely sensed information. Convolving learnable filters across spatial, spectral, and temporal dimensions combines these signals in ways that would be nearly impossible to envision for handcrafted algorithms and greatly boosts system performance. The neural networks also show a capacity for geographic robustness, functioning in unseen countries across the South Asian biome without additional tuning.
Labeled data for environmental monitoring is often scarce, and this work demonstrates strategies for training neural networks in data-poor conditions. The work began with only 10 known waste locations, which would typically be considered an insufficient dataset to train heavily parameterized models like neural networks. We amplified the amount of training data by sampling at multiple time points and continuously adding to the dataset as new detections were validated and new failure modes were identified. This data engine enhanced the quantity and diversity of training data and continually improved the training set through time. Then, the data engine is self-reinforcing as the geographic scope expands and new waste locations are identified.
Beginning with a per-pixel classifier architecture facilitated early progress. The classifier is forced to learn spectral and temporal patterns, minimizing bias and overfitting towards waste site structure that would likely have been seen in a spatial classifier. The second-stage temporal patch classifier is able to rule out candidates that spectrally match waste site profiles but are structurally different from waste sites. For example, the single pixel classifier initially identified plastic greenhouses as a plastic aggregation site, but second-stage spatial classification was able to distinguish the difference such that only plastic aggregation sites were identified. Finally, the use of a semi-supervised noisy student distillation process improved the quality of the patch classifier. While distillation is often used to shrink the size of a model, our work showed that it also improved model performance and robustness. Using this technique in cases with limited training data may be useful for application in other earth observation tasks.
This work has implications for the integration of science into decision-making for plastic pollution and waste management. For example, the data illustrates that waste aggregations are often nearby or adjacent to waterways. More than half of sites are within 750 m, and 19% are within 200 m of a waterway. This highlights the role of these areas as a potential link between terrestrial waste aggregations and aquatic plastic pollution. Communities are burdened with waste management. After disposal, waste loses traceability and transparency. This observation system can help communities further understand plastic pollution pathways. Researchers may also be able to use this data as a complement to other data being collected (e.g., litter data), and/or to validate or improve waste generation and management models, thereby improving estimates.
This data may also be used to prioritize remediation of high-risk waste sites. The data illustrates where waste aggregation is already occurring. In many of these cases in South and Southeast Asia, an informal waste management system already exists. Instead of closure, these areas could be targeted for inclusive infrastructure development since there is existing informal collection, aggregation, and management occurring. Informal workers are knowledge holders that would contribute to both the development of, and participation in, a waste management system that is more protective of their health and the environment. Because the data allows for near real-time monitoring of waste site presence and boundaries, the effectiveness of management interventions can be measured and monitored.
With all data open and available, non-governmental organizations, community leaders, and members of the public will be able to use it to advocate for changes to policies and practices in their communities. However, engagement of local government and key stakeholders is absolutely critical to the use of the data for context-sensitive interventions. The intention of this work is to expand it to the global scale, as plastic pollution knows no boundaries, and we are working to both refine detection of smaller waste aggregations and improve recall in new geographies. Although this work is groundbreaking in the scale of its open-access assessment and monitoring, an Earth observation system is only one piece of an integrated approach to addressing plastic pollution. Partnering this data with a more holistic approach, including upstream interventions, is essential to effectively serve communities and reduce plastic entering our oceans.
Figure 7: The major stages of the methodological pipeline. The components are modular, with products flowing from one stage as input to the next. There are four functional modules (data generation, network training, inference, and site monitoring). Major subcomponents of each are shown and labeled for reference in the methods section. A diagram with additional detail is included in the supplemental materials.

Methods
We developed a system of neural networks to analyze spectral, spatial, and temporal characteristics of Sentinel-2 satellite data to identify sites with aggregations of waste. Supposing sufficient quantities of data and that the signal from waste is unique, one should be able to train a single convolutional neural network for the task. However, we began with only a handful of known waste site examples and found that the spectral signal of waste is subtle and noisy. Many of the resulting methods and system design features can be understood as flowing from constraints on the data, as methods to introduce additional streams of information to the neural networks.
We built the first stage of classification to operate on a per-pixel basis in order to amplify the amount of data extracted from each known site, and to limit spatial overfitting that would be seen in a classifier that incorporates spatial information. Adding a temporal component to the spectral information serves to suppress some backgrounds which share characteristics with waste site fill. For example, the turned earth of farmed fields or senescence of seasonal vegetation [31] can appear spectrally similar to a waste site, but exhibits more seasonal variation. We incorporate spatial information through a secondary patch-based classifier that validates candidates surfaced in pixel classification. Pairing these neural networks compensates for each other's biases. We augment the training dataset continuously, incorporating prior true and false positive detections and leveraging unlabeled data through semi-supervised distillation [32]. The major stages of the methodological pipeline are laid out in figure 7 and explored in more detail through the component sections of the methods.

Data
Data Sources The Copernicus Sentinel-2 program of the European Space Agency provides a globally comprehensive, open-access dataset of satellite-based Earth observations, with moderately high spatial resolution (10, 20, or 60 meters / pixel depending on the band), broad multi-spectral range (12 bands between 442 nm and 2186 nm), and frequent temporal revisit rate (5 days). Sentinel-2 data has been collected continuously since late 2015. High resolution basemap data (Google Earth, Bing, 30-50 cm / pixel) has proved valuable for site validation, but using the underlying proprietary imagery for detection would involve significant tradeoffs in cost, spectral range, revisit rate, and data standardization and accessibility (Fig. 8).
[...] Earth. At right, a red, green, and blue band composite of Sentinel-2 data. As seen here, site identification at 10-meter resolution is challenging for even human evaluators.
The public data portal includes site metadata queried from other publicly available datasets. The parameters include soil type information (clay and sand percentage, soil bulk density, and soil great group identity from OpenLandMap [33]), site elevation and slope (SRTM [34]), landform type (Global ALOS Landforms [35]), distance to nearest water bodies (OpenStreetMap [36]), and nearby population (WorldPop [37]).

Data Labeling
We began with a set of ten known waste sites in Bali, Indonesia, with hand-drawn boundaries. We select negative-class sites to capture the distribution of terrain in the target domain, while biasing toward features closer in spectra to waste than the dominant land cover type. In Bali, tropical forest dominates outside urban areas.
After trained models are run on a region, we add confirmed positive sites to the training set, and we evaluate dominant failure modes to select new negative-class sampling locations. In this way, we create a data sampling system that continually incorporates new information as the geographic scope increases. We refer to this generative process as the "data engine."
Training Data Generation (Stage 1) For each labeled site location, we extract all 2019 Sentinel-2 L1C (top-of-atmosphere) data on a square patch around the site centroid, with Descartes Labs cloud and cloud-shadow masking (stage 1.1). The masks are broadly effective but leave behind residual cloud edges and haze. Haze in particular proved to be a consistent source of false-positive detections for early models. We experimented with various data compositing techniques. Because clouds and haze are bright, we were able to eliminate most wispy clouds and haze that escaped the cloud masks by taking a minimum instead of a median composite. In the final reckoning, the data input to the neural networks is Sentinel-2 L1C data with cloud and cloud-shadow masks, composited across a three-month window by selecting the minimum unmasked value for each pixel. To form a spectrogram, a three-month composite is paired with another composite at the same location, offset by six months. We then normalize the data per-spectral-channel across the training dataset. In total, the labeled patch dataset is composed of 1,770 positive samples and 3,104 negative samples.
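The compositing and pairing steps described for stage 1.1 can be sketched in a few lines of NumPy. This is a minimal illustration rather than the production pipeline: the function names, array layout, and toy data are assumptions, the minimum is taken independently per band (the text does not say whether it is per band or per observation), and the per-channel normalization step is omitted.

```python
import numpy as np

def min_composite(scenes, cloud_masks):
    """Per-pixel, per-band minimum over unmasked observations.

    scenes:      float array, shape (time, height, width, bands)
    cloud_masks: bool array, shape (time, height, width); True = cloud or shadow
    Returns an array of shape (height, width, bands). Pixels that are masked
    at every time step come back as NaN.
    """
    # Masked observations become +inf so they can never win the minimum.
    filled = np.where(cloud_masks[..., None], np.inf, scenes)
    composite = filled.min(axis=0)
    composite[np.isinf(composite)] = np.nan
    return composite

def pixel_spectrograms(composite_a, composite_b):
    """Pair two composites into spectrograms of shape (height, width, 2, bands)."""
    return np.stack([composite_a, composite_b], axis=-2)

# Toy data: six scenes of a 4 x 4 pixel patch with 12 bands.
rng = np.random.default_rng(0)
scenes = rng.uniform(0.05, 0.30, size=(6, 4, 4, 12))
cloud_masks = np.zeros((6, 4, 4), dtype=bool)
scenes[2, 1, 1] = 0.9         # bright haze that escaped the cloud mask
scenes[0, 0, 0] = 0.0         # spurious dark value under a masked observation ...
cloud_masks[0, 0, 0] = True   # ... which the mask excludes from the composite

# Hypothetical split: scenes 0-2 stand in for one three-month window,
# scenes 3-5 for the window offset by six months.
first = min_composite(scenes[:3], cloud_masks[:3])
second = min_composite(scenes[3:], cloud_masks[3:])
spectrograms = pixel_spectrograms(first, second)
print(spectrograms.shape)  # (4, 4, 2, 12)
```

Because the hazy observation is brighter than the clear ones, the minimum composite discards it without any explicit haze detection, whereas a median composite would retain it whenever haze affects more than half of the window.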
The unlabeled dataset (stage 1.2) is constructed similarly but consists of randomly sampled patches from 10x10 km regions that are themselves selected for broad geographic diversity.
Pixel data (stage 1.3) is derived from the labeled patch dataset. For the positive-class data, pixels are drawn from within the hand-drawn waste-site boundaries. The variable nature of the sites can still cause some negative-class data to be sampled. In particular, sites may have dormant periods where vegetation grows. To minimize class confusion, we delete any positive-class spectrogram with a normalized difference vegetation index (NDVI) greater than 0.4 from the training data. Each pixel accompanies its temporal pair, and the two spectral profiles are concatenated into a single spectrogram of shape (2,12). After multiple iterations of the data engine, the pixel classifier dataset contains 200,663 and 3,687,725 positive and negative class pixel spectrograms, respectively.
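The NDVI filter on positive-class spectrograms might look like the sketch below. Several details are assumptions on our part: the band-axis positions of the red (B4) and near-infrared (B8) bands, the choice to drop a spectrogram if either of its two time steps exceeds the threshold, and the use of unnormalized reflectance. Zero-reflectance pixels, which would divide by zero, are not handled.

```python
import numpy as np

RED, NIR = 3, 7  # assumed positions of Sentinel-2 bands B4 and B8 on the band axis

def ndvi(spectra):
    """NDVI = (NIR - red) / (NIR + red), computed on unnormalized reflectance."""
    red, nir = spectra[..., RED], spectra[..., NIR]
    return (nir - red) / (nir + red)

def drop_vegetated(spectrograms, threshold=0.4):
    """Remove spectrograms whose NDVI exceeds the threshold at either time step.

    spectrograms: array of shape (n, 2, 12), one (time, band) block per pixel.
    """
    keep = (ndvi(spectrograms) <= threshold).all(axis=1)
    return spectrograms[keep]

# Three toy positive-class spectrograms with flat reflectance of 0.2 (NDVI = 0).
samples = np.full((3, 2, 12), 0.2)
samples[1, 0, RED], samples[1, 0, NIR] = 0.1, 0.5  # vegetated at time step 0, NDVI ~ 0.67
samples[2, 1, RED], samples[2, 1, NIR] = 0.2, 0.3  # sparse cover at time step 1, NDVI = 0.2

kept = drop_vegetated(samples)
print(kept.shape)  # (2, 2, 12)
```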
For the test dataset, positive-class data is sampled from within boundaries drawn around 50 known waste sites in Indonesia, between June, 2019, and June, 2021. Here too, the resulting data is likely contaminated with some vegetated and bare earth pixels, from times when formal waste site operators bury waste or shift active waste operations. We sampled negative test data from a range of land-cover classes in Indonesia, oversampling challenging modes such as cities and bare earth. The test dataset contains 18,473 and 312,557 positive and negative class pixel spectrograms, respectively. These are sampled from 259 positively-labeled patches and 274 negative-class patches.

Model Architectures and Training (Stage 2)
Pixel Spectrogram Classifier (Stage 2.1) The pixel spectrogram classifier is a small convolutional neural network (CNN) with fully-connected layers following the convolutional block, as detailed in figure 9.
The convolutional block generates features across band combinations at a single time point as well as differences in spectra across the points in the spectral time series. These features can then be synthesized in the fully-connected layers. The number of free parameters in the architecture is kept small in order to reduce the risk of overfitting to the relatively uniform training dataset.
Through parameter sweeps and model comparisons we set the default training to use the Adam optimizer with a learning rate of 0.001, a batch size of 128, and initialized layer weights using a Glorot uniform initializer. We did not observe a strong influence on model performance from training hyperparameters.

Patch Classifier (Stages 2.2-2.4)
To enrich the training, the patch classifier is trained using a semi-supervised distillation process. An ensemble of classifiers is first trained on labeled data (stage 2.2). These classifiers make predictions on unlabeled data (stage 2.3) that are then combined and used as soft targets to train the final patch classifier (stage 2.4). The workflow for the distillation is drawn in Fig. 10.
Figure 9: The architecture of the pixel spectrogram classifier neural network. Each block represents a layer or stage within the network. Dropout layers are only applied during training.
Figure 10: The semi-supervised training process for the patch classification network. Labels are generated using inputs from a variety of models, which are then combined into a single soft target that is used in combination with supervised data to train the final model.
Strong Labelers (Stage 2.2) An ensemble of 32 neural networks is trained on the labeled multispectral patch data. In contrast to the pixel classifier, the two temporal frames are concatenated along the spectral channel axis, so that on a patch 28 pixels square, the input tensor has shape (28,28,24). The models in the ensemble are trained with the same hyperparameters and on the same data, but the weights for each network are initialized with different random seeds in order to encourage model diversity.
The patch network is again a CNN, wider and deeper than the pixel classifier. The convolutional head contains three rounds of three convolutional layers followed by max pooling. These convolutional features are processed by a dense block. Architecture details are shown in Fig. 12. During training, we augment input data with reflections and rotations and apply batch normalization and dropout. During inference, these components are inactive. Aside from a scheduled learning rate, we train with the same hyperparameters noted for the pixel classifier.
Soft Labels (Stage 2.3) Though support vector machines (SVM) are not often optimal for this form of image classification, we chose to incorporate predictions from an SVM to increase the diversity of outputs used to generate the soft labels. We train a radial basis function kernel support vector machine on a flattened representation of the labeled patch data.
The neural network ensemble, the SVM, and the pixel classifier each generate predictions for all patches in the unlabeled dataset. Given that the patch classifier ensemble and the pixel classifier generate more than a single prediction, the outputs of these model types are processed into a single value through a series of heuristics.
The predictions of the neural network ensemble are combined by first converting them to binary values at a threshold of 0.5, and then selecting the mode of this set. This single value is multiplied by a metric of disagreement, formulated as (1−2σ), where σ is the standard deviation of the binary outputs. This allows a single label to represent richer information from the ensemble of networks.
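The ensemble-combination heuristic can be written out directly. The sketch below is our reading of the description: votes are binarized with a strict greater-than comparison, σ is taken as the population standard deviation of the 0/1 votes, and the function name is hypothetical. Since the standard deviation of binary values is at most 0.5, the factor (1 − 2σ) lies between 0 (an even split) and 1 (unanimity).

```python
from statistics import pstdev

def ensemble_soft_label(probabilities, threshold=0.5):
    """Combine per-network probabilities into one disagreement-weighted label."""
    votes = [1 if p > threshold else 0 for p in probabilities]
    # Mode of the binary votes; an exact tie maps to 0, where the weight is 0 anyway.
    majority = 1 if 2 * sum(votes) > len(votes) else 0
    sigma = pstdev(votes)  # population standard deviation, in [0, 0.5]
    return majority * (1 - 2 * sigma)

unanimous = ensemble_soft_label([0.9] * 32)           # all 32 networks agree on waste
split = ensemble_soft_label([0.9] * 24 + [0.1] * 8)   # 24 of 32 networks vote waste
negative = ensemble_soft_label([0.1] * 32)            # all 32 networks vote no waste
print(unanimous, round(split, 3), negative)
```

One consequence of the formula as stated is that any negative majority yields a label of exactly 0 regardless of how strongly the networks disagree, so the disagreement term only tempers positive labels.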
The pixel classifier produces an individual prediction for each pixel in the patch. The patch is assigned a binary class if the mean value of all predictions within the patch surpasses a threshold value of 0.02. This threshold was determined empirically on the labeled test set.
These individual model predictions are then unified into a single soft target through a Bayesian process. The neural ensemble serves as the first prior, which is then updated sequentially using predictions and training statistics from the SVM and the pixel classifier. The order in which our hypothesis is modified is arbitrary, since Bayesian updates are commutative. These soft targets are generated for every unlabeled patch and then used for training the student patch classifier.
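A sequential update of this kind can be illustrated as follows. The exact likelihoods used are not specified in the text, so this sketch makes a loud assumption: each classifier's binary vote is treated as conditionally independent evidence, with likelihoods given by that classifier's true positive and false positive rates. The function name and all numeric rates are hypothetical placeholders, not values from this work.

```python
def bayes_update(prior, predicted_positive, tpr, fpr):
    """Posterior P(waste | vote) from a prior and one classifier's binary vote.

    tpr and fpr are the classifier's true and false positive rates
    (assumed here to be the "training statistics" mentioned in the text).
    """
    if predicted_positive:
        like_waste, like_clean = tpr, fpr
    else:
        like_waste, like_clean = 1 - tpr, 1 - fpr
    evidence = like_waste * prior + like_clean * (1 - prior)
    return like_waste * prior / evidence

prior = 0.6                    # hypothetical ensemble-derived prior
svm = (True, 0.80, 0.10)       # hypothetical SVM vote, TPR, FPR
pixel = (False, 0.70, 0.01)    # hypothetical pixel-classifier vote, TPR, FPR

svm_then_pixel = bayes_update(bayes_update(prior, *svm), *pixel)
pixel_then_svm = bayes_update(bayes_update(prior, *pixel), *svm)
print(abs(svm_then_pixel - pixel_then_svm) < 1e-12)  # order does not matter
```

A prior of exactly 0 or 1 cannot be moved by any evidence under this rule, so in practice the ensemble label would presumably need to be clipped away from those extremes before updating; the text does not state how this is handled.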
Student Training (stage 2.4) A single student network is then trained on a combination of these soft-labeled data and hard targets from the human-labeled dataset. The network architecture and training strategy are the same as for the ensemble of neural networks described previously (Fig. 12).
Figure 12: Architecture diagram of the patch classification neural networks. The convolutional block is repeated three times, followed by two repetitions of the dense block. Dropout and batch normalization are only applied during training.

Waste Site Identification (Stage 3)
Model Inference (Stage 3.1) The Descartes Labs engine breaks a geographic region of interest into sub-tiles for parallel processing on a cluster of machines. The pixel classifier evaluates every pixel in the sub-tile, creating a heatmap of predicted waste locations. At the same time, the patch classifier is convolved across the scenes with a stride width of 8 pixels. This generates a set of patch-based predictions for the presence of waste (Fig. 11).
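To illustrate the strided evaluation, the sketch below enumerates the upper-left corners of the patches visited on a single sub-tile. The 28-pixel patch size is taken from the training description, the 512-pixel tile size is a placeholder, and the treatment of tile edges is not stated in the text, so windows that would overrun the tile are simply dropped here.

```python
def patch_origins(height, width, patch=28, stride=8):
    """Upper-left (row, col) corners of every patch that fits inside the tile."""
    rows = range(0, height - patch + 1, stride)
    cols = range(0, width - patch + 1, stride)
    return [(r, c) for r in rows for c in cols]

origins = patch_origins(512, 512)  # hypothetical 512 x 512 pixel sub-tile
print(len(origins))                # 61 positions per axis, 3721 patches in total
```

With a stride of 8 and a 28-pixel window, an interior pixel falls inside three or four windows along each axis, so most locations receive several overlapping patch predictions.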

Pixel Classifier Candidate Detection (Stage 3.2)
To identify candidates from the pixel classifier heatmap, we mask any prediction below a threshold of 0.6, and detect connected clusters ("blobs") of pixels with high-valued predictions using the scikit-image determinant-of-Hessian blob detection function [38]. This eliminates single-pixel noise that may be present in the outputs and also produces a single coordinate for each candidate waste site. The sensitivity of the candidate detection stage can be tuned by controlling the minimum required blob size (min_sigma) and setting the prediction value threshold (Table 4, Fig. 13).
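The thresholding and clustering logic can be illustrated with a dependency-free sketch. Note that this swaps in plain 4-connected component labeling for the determinant-of-Hessian blob detector used in this work, so it is a simplified stand-in and not the same algorithm. The `min_pixels` parameter is a hypothetical analogue of the minimum blob size control, and the toy heatmap is invented.

```python
from collections import deque

def detect_candidates(heatmap, threshold=0.6, min_pixels=2):
    """Return (row, col) centroids of connected clusters of pixels >= threshold."""
    height, width = len(heatmap), len(heatmap[0])
    seen = [[False] * width for _ in range(height)]
    centroids = []
    for r0 in range(height):
        for c0 in range(width):
            if seen[r0][c0] or heatmap[r0][c0] < threshold:
                continue
            # Breadth-first flood fill over 4-connected neighbors.
            seen[r0][c0] = True
            queue, cluster = deque([(r0, c0)]), []
            while queue:
                r, c = queue.popleft()
                cluster.append((r, c))
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < height and 0 <= nc < width
                            and not seen[nr][nc] and heatmap[nr][nc] >= threshold):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            # Discard clusters that are too small, e.g. single-pixel noise.
            if len(cluster) >= min_pixels:
                rows, cols = zip(*cluster)
                centroids.append((sum(rows) / len(rows), sum(cols) / len(cols)))
    return centroids

heatmap = [
    [0.0, 0.0, 0.0, 0.0, 0.9],   # isolated high pixel: treated as noise
    [0.0, 0.8, 0.7, 0.0, 0.0],
    [0.0, 0.9, 0.8, 0.0, 0.0],   # 2 x 2 cluster: one candidate site
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
print(detect_candidates(heatmap))  # [(1.5, 1.5)]
```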

Patch Classifier Candidate Validation (Stage 3.3)
Because the pixel classifier has no ability to incorporate spatial information, it is liable to misclassify objects that share a spectral profile with waste. For example, we have seen the pixel classifier flag the plastic roofs of greenhouses, whose spectral profile is similar to that of plastic waste in dump sites. For this reason, each pixel classifier candidate is checked against the patch classifier predictions for that location. If any of the patch classifier predictions is greater than a threshold, the candidate site is accepted. Again, sensitivity can be tuned by adjusting this patch classifier threshold (Table 4).

Manual Site Verification (Stage 3.4)
A curator evaluates each candidate site using publicly available high-resolution satellite data, Google Street View imagery where available, and/or PlanetScope data.
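The patch classifier validation rule (Stage 3.3) can be sketched as below. The sketch assumes that "predictions for that location" means every strided patch whose window contains the candidate pixel; that interpretation, the patch size, the threshold value, and the function name are assumptions for illustration.

```python
import numpy as np


def validate_candidate(row, col, patch_scores, patch_size, stride, threshold):
    """Accept a candidate if any patch covering (row, col) exceeds threshold.

    Patch (i, j) is assumed to cover pixel rows [i*stride, i*stride + patch_size)
    and the equivalent column range.
    """
    n_rows, n_cols = patch_scores.shape
    i_min = max(0, (row - patch_size) // stride + 1)
    i_max = min(n_rows - 1, row // stride)
    j_min = max(0, (col - patch_size) // stride + 1)
    j_max = min(n_cols - 1, col // stride)
    window = patch_scores[i_min:i_max + 1, j_min:j_max + 1]
    return window.size > 0 and bool((window > threshold).any())


# Toy grid of patch predictions for a 64 x 64 scene (patch size 32, stride 8).
patch_scores = np.zeros((5, 5))
patch_scores[2, 2] = 0.9  # this patch covers rows and columns 16..47
accepted = validate_candidate(20, 20, patch_scores, 32, 8, threshold=0.5)
rejected = validate_candidate(60, 60, patch_scores, 32, 8, threshold=0.5)
```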

Site Footprint Monitoring (Stage 4)
For each confirmed waste aggregation we compute the footprint of the site and how it changes through time. To do so, we extract composited mosaic pairs as previously described (Stage 1.1). The composites are extracted every month for the full extent of the Sentinel-2 catalog, reaching from mid-2017 through January 2021. A pixel-classifier prediction is computed for every pixel within the patch, which produces a set of heatmap predictions of waste locations for each time point in the dataset.
Generating site boundaries is an unforgiving task. Misclassification of a single 10-meter Sentinel-2 pixel may represent a substantial fluctuation in the total site area, and classification at single time points is prone to noisier predictions than we achieve after time averaging in the detection stage. Thus, we generate and apply a rolling prediction mask to minimize the influence of outliers that are more often present when evaluating at a single time point. This mask is computed as a thresholded rolling median of the 8 following predictions and is applied to the current frame. Applying a mask utilizes the information present in the time series prediction in order to generate a region of interest and filters outliers, while still allowing the outputs at the current time step to establish the current waste location.
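The rolling mask can be sketched as follows. The window of 8 following predictions is taken from the text; the threshold value, the handling of the final frames (where fewer than 8 following predictions exist), and the function names are assumptions made for this illustration.

```python
import numpy as np


def rolling_median_mask(preds, t, window=8, threshold=0.5):
    """Boolean mask from the median of the `window` frames after frame t.

    preds has shape (time, height, width). Near the end of the series the
    window is truncated; if no later frame exists, the current frame is
    used instead (an assumption, not specified in the text).
    """
    following = preds[t + 1:t + 1 + window]
    if following.shape[0] == 0:
        following = preds[t:t + 1]
    return np.median(following, axis=0) > threshold


def masked_frame(preds, t, window=8, threshold=0.5):
    """Keep the current prediction only where the rolling mask is set."""
    mask = rolling_median_mask(preds, t, window, threshold)
    return np.where(mask, preds[t], 0.0)


# Toy series: a persistent waste pixel and a one-frame outlier.
preds = np.zeros((10, 4, 4))
preds[:, 1, 1] = 0.9   # present at every time point
preds[0, 2, 2] = 0.9   # appears only in the first frame
frame0 = masked_frame(preds, t=0)
```

In this toy series the persistent pixel survives masking while the single-frame outlier is suppressed.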
This masked prediction frame is thresholded, and contours surrounding the binary output are generated [39]. These contour boundaries establish the waste site footprint at monthly intervals. Because the pixel classifier takes temporal inputs that are offset by a six-month period, contours tend to represent the locations of waste that is present at both time points. This process is repeated for each image in the dataset to create a record of site footprints through time.
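A minimal sketch of the thresholding and contouring step is given below. It uses `skimage.measure.find_contours` as a stand-in, since the specific contouring routine cited as [39] is not named here; the threshold value and function name are likewise assumptions. The area estimate relies only on the 10-meter Sentinel-2 pixel size mentioned above.

```python
import numpy as np
from skimage.measure import find_contours

PIXEL_SIZE_M = 10.0  # Sentinel-2 pixel edge length in meters


def site_footprint(masked_pred, threshold=0.5):
    """Threshold a masked prediction frame and trace its boundaries.

    Returns (contours, area_m2), where each contour is an array of
    (row, col) coordinates in pixel units.
    """
    binary = masked_pred > threshold
    contours = find_contours(binary.astype(float), 0.5)
    area_m2 = float(binary.sum()) * PIXEL_SIZE_M ** 2
    return contours, area_m2


# Toy frame: a 3 x 3 block of confident waste predictions.
frame = np.zeros((8, 8))
frame[2:5, 2:5] = 0.9
contours, area_m2 = site_footprint(frame)
```

Repeating this for each monthly frame yields a time series of footprints and areas for the site.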

Data Availability
The dataset produced in this work is open and can be explored at https://plastic.watch.earthrise.media/. Site location data can be accessed in the GitHub repository at https://github.com/earthrise-media/plastics, or via an API at https://api.plastic.watch.earthrise.media/sites. Given that this work will expand in sensitivity and scope, only the "GPW v1.0" branch on the repository will remain limited to the sites published in this work. Site boundaries can be accessed programmatically using the site ID at the contour API endpoint (e.g. https://api.plastic.watch.earthrise.media/sites/8f416806d8a2ad9/contours).
Sentinel-2 L1C data is freely available through a variety of sources. In this work, we access this data through the Descartes Labs platform, which is not publicly accessible. All geophysical site metadata is sourced from Google Earth Engine [40], which is free for research and educational use. Population data is openly accessible via the WorldPop API (https://worldpop.org/), and known waterways can be accessed from OpenStreetMap using the Overpass API (https://wiki.openstreetmap.org/wiki/Overpass_API).

Code Availability
The code and models used for this work are available in a public GitHub repository at https://github.com/earthrise-media/plastics. Because the code in this repository remains under active development, the results in this paper should be reproduced using the release titled "GPW v1.0". Access to the Descartes Labs geospatial analytics platform is required to access the Sentinel-2 data for model runs and training data generation.

Acknowledgements
This work was supported and funded by The Minderoo Foundation. It would not have been possible without the support of the full Earthrise team. Particular thanks to Glynis Lough for structuring thought and action, Daniel Israel for deploying the system and validating candidates, Stephen Downs for creative direction of the data exploration platform, and Tom Ingold and Tom MacWright for building the infrastructure and site.
Figure 14: Diagram showing the methodology pipeline in greater detail. Elements are colored according to type. Processing stages are shown in gray, processing configuration parameters in red, and outputs in blue. Major pipeline components are grouped and contained within dashed outlines.