Human degradation of tropical moist forests is greater than previously estimated

In this study, we use the spaceborne GEDI⁶ from the National Aeronautics and Space Administration (NASA) to analyse the extent of forest degradation on canopy structure at pantropical scale, but its short lifetime limits long-term monitoring. To overcome this limitation, we combine GEDI data with long-term information on forest dynamics from Landsat using a space-time substitution strategy. While this approach has been used in previous studies¹⁵, it assumes that differences in neighbouring land characteristics can be used as a proxy for changes over time and that climate and vegetation remain relatively constant over the 20- to 30-year analysis period. For example, when studying forest recovery, we assume that different height metrics from GEDI represent different ages since the last disturbance.

Preparation of input datasets

TMF datasets

We use JRC-TMF, which provides information on changes in humid forest cover from 1990 to 2022 derived from the Landsat archive collection (more details on the methodology and accuracy assessment in Vancutsem et al.⁴). Mangrove forests were excluded from the analysis as periodical tidal floods affect the consistent estimation of canopy height over time. Bamboo-dominated forests were also excluded, as the dynamics of forest structure are related to seasonal or occasional defoliation rather than anthropogenic disturbances. We used the JRC Transition Map and the Annual Change Collection that capture the TMF extent and the related disturbances on an annual basis to derive the following classes.

Intact forests

Undisturbed forest (forest without any disturbance observed over the Landsat time series) located at more than 120 m from degraded forests and more than 3,000 m from the forest/non-forest edge.

Degraded forests

Closed evergreen or semi-evergreen forests that have been temporarily disturbed for a period of a maximum of 2.5 years by selective logging, fire, or unusual weather events. We derived the year since the last degradation from the JRC-TMF dataset used as a proxy for forest recovery. To attribute forest degradation to its direct driver, we first used the global forest cover loss due to fire dataset (GFC-Fire) from 2001 to 2021 from The Global Land Analysis and Discovery (GLAD) laboratory⁵⁶. All certainties of forest fires were considered. Regarding degradation due to selective logging, we performed an extensive visual interpretation and delineation of selective logging operations based on their specific spatial features visible on the JRC-TMF Transition Map. The selected degraded forest pixels correspond to temporary logging roads, logging felling gaps, decks, and skid trails. This dataset covers Brazil, French Guiana, Guyana, Cameroon, Central African Republic, Gabon, Congo, the Democratic Republic of Congo, Indonesia, Malaysia and Papua New Guinea. The managed forest concessions dataset from the World Resource Institute was used to guide the collection of polygons in central Africa and southeast Asia while the delineation in the Amazon was generated from previous scientific experience^57,58. An independent visual interpretation of selective logging was performed in order to analyse how the delineation influenced our results. This sensitivity analysis showed small differences in the magnitude and trends of logging impacts on forest structure without altering the subsequent analysis and conclusions (Supplementary Fig. 8). It also proved to be unbiased and robust when comparing changes in forest height in the vicinity of forest degraded by selective logging (Supplementary Fig. 9). We created a buffer of 300 m radius (10 Landsat pixels) around fire pixels to avoid an overlap between the two causes when analysing impacts from selective logging alone. When looking at forest degradation alone, we excluded pixels within the edge forest defined with a conservative value of 120 m from the edge.

Forest edge

To compute forest edges, we considered undisturbed or degraded forest pixels from the JRC-TMF Annual Change Collection dataset for years spanning from 1989 to 2022. We applied a 5 × 5 pixel moving window for all annual forest maps to remove isolated pixels for both forest and non-forest classes using the sieve algorithm and replace them with the value of the most frequent class within the moving window. For the analysis of forest edge effect penetration, we used the extent of forest cover in 2022 to derive undisturbed forest edges using edge widths varying from 60 m to 10,200 m at different intervals (0–60 m, 60–120 m, 120–240 m, 240–420 m, 420–720 m, 720–1,020 m, 1,020–1,500 m, 1,500–2,040 m, 2,040–2,580 m, 2,580–3,120 m, 3,120–4,020 m, 4,020–5,100 m, 5,100–6,000 m, 6,000–7,200 m, 7,200–8,100 m, 8,100–9,000 m, 9,000–10,200 m) using the Euclidean distance calculated from the non-forest class. These distances were selected based on previous studies reporting on the scale at which edge effects operate and affect microclimate⁵⁹ (up to 400 m), canopy moisture levels²⁹ (up to 2.7 km), phenology⁶⁰ (up to 5–10km) and forest biomass²² (up to 1.5 km). The first 6 intervals of distances are centred on the most recent and accurate evaluation of the extent of the edge effect^17,18,32,61 (~100–200 m). To focus on the scale of edge effects due to deforestation, we discarded grid cells of 1.5° showing a value of accumulated deforestation (1991–2022) compared to forest area in 1990 of less or equal than 2% (estimates derived from JRC-TMF). To mitigate the effects of canopy disturbance interactions between degraded and undisturbed forests, we eliminated areas of transition using a buffer of 120 m around degraded forests. This distance corresponds to the area initially affected by the felling of individual trees in selective logging operations⁶² where localized edge effects are the highest²⁸. To calculate the age of forest edges, we adopted a 120 m edge width, which constitutes the threshold of significant AGBD changes observed in the tropics^7,17,18. We produced forest edges from 1989 to 2022, masked natural edges (transitions forest/water and forest/savannah), derived the year of forest edge creation and computed the age of all edges classified as forest in 2022. We separated forest edges into undisturbed forest edges, burned forest edges (where a fire from the GLAD dataset occurred after the year of forest edge creation) and logged forest edges (all other types of degradation occurring after the year of forest edge creation).

Forest regrowth or secondary forest

Forest regrowth or secondary forest refers to a two-phase transition from moist forests to deforested land to vegetative regrowth. A minimum duration of 3 years (2020–2022) of permanent presence of moist forest cover is needed to classify a pixel as forest regrowth to avoid confusion with other land uses. Using the JRC-TMF Annual Change Collection, we calculated the age of secondary forests (from 1 to 32 years old), which may have an uncertainty of 1 year, depending on whether a deforestation event was detected at the end of a year or at the beginning of the next year. In case of late detection, the area will be classified as regrowing one year later (if it does not show signs of permanent deforestation).

GEDI dataset

The GEDI mission uses a LiDAR deployed on the International Space Station from April 2019 until March 2023. One of its primary scientific objectives is to map forest structural properties and understand the effects of vegetation structure on biodiversity. It provides sparse measurements (hereafter sample plots or shots) of vegetation structure, including forest canopy height⁶ with a vertical accuracy of about 50 cm, over an area defined by a sampling footprint of ~25 m width. For our analysis, we used GEDI L2A⁶³ Elevation and Height Metrics (version 2) and GEDI L4A⁶⁴ Above Ground Biomass (version 2.1) which represent returned laser energy metrics on canopy height and estimated AGBD for each 25 m diameter GEDI footprint. The footprint data are geolocated and have an expected positional error⁶ (that is, horizontal geolocation accuracy) of 11 m. For each footprint, we extracted a set of relative height (RH) metrics, the AGBD and the associated prediction standard error (AGBD_SE). AGBD are reported as weighted averages, using the AGBD_SE as weight. Note that the estimation of AGBD based on RH metrics from GEDI L2A varied considerably in performance across the TMF domain, having a determination coefficient (R²) of 0.66 (mean residual error (MRE) of 10.4 Mg ha⁻¹), 0.64 (MRE of 15.32 Mg ha⁻¹), 0.36 (MRE of 121.15 Mg ha⁻¹) and 0.61 (MRE of 8.17 Mg ha⁻¹) for South America, Africa, Asia and Oceania, respectively (further details on the validation of the GEDI L4A are in ref. ²⁵).

RH metrics represent the height (in metres) at which a percentile of the laser energy is returned relative to the ground. RH98 corresponds to the maximum canopy height (hereafter ‘canopy height’), which is a more stable height metric than RH100. RH50 (also known as ‘height of median energy’ (HOME)²⁴) is the median height at which the 50th percentile of the cumulative waveform energy returned relative to the ground and has been identified as one of the LiDAR metrics with the greatest potential for estimating structural characteristics in tropical forests²⁴. When validated against ground-based data, RH50 generally exhibits a strong correlation with key structural variables, including AGBD, stem diameter, and basal area⁶⁵. Due to its strong dependence on the vertical distribution of canopy elements and gaps within the canopies and canopy cover, RH50 serves as a highly complementary metric to RH98 for characterizing changes in canopy structure from degradation⁶⁶ (see also Extended Data Fig. 7).

We selected GEDI data acquired from 1 January 2019 to 31 December 2022. To select the highest quality data, we filtered the GEDI data (both GEDI L2A and L4A) by selecting only the observations collected in power beam mode and labelled them as good quality (quality flag equals 1), thus avoiding risks of having degraded geolocation under suboptimal operating conditions (degrade flag equals 0). Additionally, we filtered GEDI 2A data using only night acquisitions to limit the background noise effects of reflected solar radiation. We used the Shuttle Radar Topography Mission (SRTM) information to exclude GEDI footprints above 20° slopes to avoid errors in vegetation height. Steep slopes might lead to erroneous relative height metrics (especially over sparsely vegetated areas), so applying our threshold of 20° is a conservative approach⁶⁷. Additionally, we filtered out GEDI footprints classified as water in the Global Land Analysis and Discovery Landsat Analysis Ready Data quality layer (ARD; https://glad.umd.edu/ard/home) or when a GEDI footprint was located within an urban area defined by the Global Urban Dataset of Florczyk et al.⁶⁸. Finally, we excluded GEDI footprints with RH98 values below 5 m to be compliant with the Food and Agriculture Organization (FAO) definition of forest.

Further, we used the beam sensitivity information from GEDI L2A as a proxy for signal-to-noise ratio and the ability of GEDI to penetrate the highest canopy cover. For the intact and undisturbed forest classes, we considered only shots with a beam sensitivity greater than 0.98, while for the other classes (for example, degraded, edge and regrowth forests), we used a beam sensitivity greater than 0.95, as previously recommended^67,69.

Combining datasets

On the temporal scale, we used separately yearly GEDI data to estimate as accurately as possible the year since the last disturbance (that is, degradation, forest edge creation or deforestation). All degraded and edge forests were masked out if the date of disturbance or the year of edge creation occurred during the GEDI acquisition period. A similar step was performed for secondary forests when the year of regrowth overlapped with the GEDI acquisition period. On the spatial scale, to reduce the noise caused by GEDI geolocation errors, we applied a morphological (circular shape) filter of 35 m to the forest cover change class of interest (intact, degraded, edge or regrowth), which resulted in the removal of single- small-patches of pixels. We thus ensured that GEDI samples fell within the class of interest and avoided any partial overlap. The extent of mapped forest change areas in the JRC-TMF dataset was used to target the sampling of GEDI footprints and quantify forest edge effects or canopy disturbance contagiousness between degraded and undisturbed forests on forest structure still classified as ‘undisturbed forest’.

To ensure robust and comparable observations of forest structure metrics across the multiple classes of forest cover change, we considered a minimum of 600 GEDI samples for each 1.5° grid cell (~167 km at the Equator; around a given point) and a minimum of 7 grid cells per continent to derive continent-level statistics of forest RHs and AGBD. When analysing the time series (Fig. 3 and Supplementary Fig. 5), a minimum threshold of 30 GEDI samples for each time step of the trajectory—and a minimum of 600 GEDI samples for the sum of all the time steps—within each grid cell was required. Note that the time step does not refer to the GEDI date but to the JRC-TMF dataset where the timing of degradation, regrowth etc. is assessed. Similarly, for edge effect penetration, a minimum of 30 GEDI samples for each distance to the edge within each grid cell—and a minimum of 600 GEDI samples for the sum of all the distances—was required (see Fig. 2 and Supplementary Fig. 3). Metadata on the number of GEDI samples for aggregated classes of forest cover change is provided in Supplementary Fig. 2. Wall-to-wall information of relative heights with high spatial resolution on large scales, such as those produced by Lang⁷⁰ for canopy height only, will increase in the future the quantity of data, thus improving the quality and the robustness of the analysis.

The computation of canopy heights for intact, degraded, edge, and regrowing TMFs at the 1.5° grid cell level may vary due to local environmental and anthropogenic factors (for example, soil and forest types), leading to potential high variability in the reported canopy height statistics. In order to reduce sampling bias in the structural variable dataset, we randomly resampled GEDI observations 500 times within each 1.5° × 1.5° grid cell. We then summarized the random samples by calculating the mean and standard deviation of each structural variable, for each grid cell. Using this random sampling procedure based on the iteration (500 times) of sampling 300 GEDI observations for each grid cell, we found that the intra-grid variability of canopy heights was not significant. The results of the random sampling procedure show the low standard deviation for each class of RH98 distribution and forest considered (that is, intact, degraded, edge and regrowth) (Supplementary Fig. 10).

Intact forest landscape assessment and comparison with Potapov’s data product

We selected undisturbed forests in 2020 free from any disturbances located at: (1) a distance higher than the scale of the forest/non-forest edge effect identified at the grid cell level; and (2) more than 120 m distance from degraded forests from the JRC-TMF dataset (identified scale of the forest/degraded forest edge effect). Potapov’s map of 2020²⁷ was constrained to the extent of TMFs (excluding mangroves and bamboo-dominated forests). We resampled our JRC-TMF-derived intact forest landscape (IFL) map from 30 m to 1 km. We computed the number of connected pixels (where each pixel contains the number of 4-connected neighbours) and then restricted them to values greater or equal to 500 to obtain an approximation of forest patch area greater than 500 km² (to match the definition of IFL of Potapov, with a minimum area of 500 km²). Other criteria in Potapov on minimum IFL patch width (10 km) or minimum corridor width (2 km) were not implemented in our approach.

Statistical tests

We performed a series of one-way ANOVAs to test for differences in the impacts of edge effects at different distances and times on the long-term recovery of the relative heights and biomass variables. ANOVAs were performed separately for each continent. For the height variables (RH50 and RH98), a series of standard one-way ANOVAs were used. In the analyses involving AGBD, we used a modified approach to propagate the prediction standard error associated with the AGBD dataset values which involved using a Monte Carlo approach (n = 500). In brief, we generated random noise that was added to the AGBD data. For each iteration i we generated a noise term, noise_i_j, by drawing a random value from a normal distribution with mean μ of 0 and s.d. equal to the prediction standard error of the AGBD (σ_j) for each GEDI footprint. The noise can be represented as: ${{\rm{n}}{\rm{o}}{\rm{i}}{\rm{s}}{\rm{e}}}_{ij}\sim N(\mu ,{\sigma }_{j}^{2})$. We then perturbed the AGBD values by adding the generated noise to the original dataset (biomass_original,j) for the ith iteration (biomass_{perturbed,i,j}). We then performed an ANOVA for each iteration using the perturbed dataset and recorded the results. We subsequently examined the distribution density of the F values. The results showed minimal variability suggesting that observed differences are robust to uncertainty associated with the AGBD values (Supplementary Fig. 11). For each ANOVA, we conducted a series of Tukey honest significant difference post hoc tests to assess significant differences between distance classes or time steps. The significance level was set to P < 0.05.

Modelling deforestation risk

We assessed whether changes in RH50, RH98 and AGBD due to the occurrence of forest degradation and the distance to the edge represent an early warning signal of future deforestation. We retrieved GEDI footprints of 2019 sampled in forest degraded before 2018, followed or not by deforestation (2020–2022), together with GEDI footprints of 2020 sampled in forest degraded before 2019, followed or not by deforestation (2021–2022) and, footprints of 2021 sampled in forest degraded before 2020 followed or not by deforestation (2022). We then separated all samples based on their location within the first 120 m to the edge or beyond. The probability of deforestation in degraded forests was modelled using a generalized linear modelling approach. We fitted two models. One included only a single predictor, so that the percentage of intact forest height was the only predictor (Supplementary Fig. 12). The second model included two predictors—that is, the percentage of intact forest height and the distance to the edge. The error structure associated with the models was assumed to be binomial with a logit link function. A given model takes the general form:

$${Y}_{i} \sim B({\pi }_{i},{n}_{i})$$

(1)

$$E({Y}_{i}) \sim {n}_{i}\times {\pi }_{i}\,\,{\rm{and}}\;{\rm{var}}({Y}_{i}) \sim {n}_{i}\times {\pi }_{i}\times (1{-\pi }_{i})$$

(2)

$${\rm{logit}}({\pi }_{i})={\eta }_{i}$$

(3)

$${\eta }_{i}=\alpha +\beta {X}_{i}$$

(4)

where Y_i is the ith observation corresponding to the occurrence of a deforestation event and βX_i is a matrix of regression coefficients.

Models were fitted within a Bayesian framework. We fitted the models using the programming language Stan via the brms package in the R software for statistical computing⁷¹. Models were run using 4 chains of 4,000 iterations each, with a warm-up of 1,000. We used the brms default priors for our model parameters. Convergence was visually assessed using trace plots (Supplementary Fig. 13) and the Rhat values (that is, the ratio of the effective sample size to the overall number of iterations, with values close to one indicating convergence). Markov chain Monte Carlo diagnostics showed a good convergence of the four chains, while the posterior distributions are centred around one peak value. The discriminatory ability of the models—that is, their ability to successfully predict a deforestation event—was assessed using the receiver operating characteristic (ROC) curve. We calculated the area under the curve (AUC) and compared the values with the guidelines provided by Swets⁷².

Cloud computing platform

All data extraction for this study was performed in Google Earth Engine⁷³, which provides the ability to compute GEDI footprint statistics and analyse the entire data records with high computational efficiency. The GEE data catalogue contains processed L2A and L4A GEDI data products—that is, the rasterized versions of the original GEDI products, with each GEDI shot footprint represented by a 25 m pixel.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Source link