
Brazil Forest Imagery Dataset (G-BFID v1.0)

Introduction

In support of the Brazilian government's efforts to implement the Brazil Forest Code and protect forests on private lands, Google has produced two basemaps derived from SPOT satellite observations: a Visual Basemap and an Analytic Basemap (G-BFID v1.0).

These mosaicked image products establish a baseline for July 22, 2008, a critical date under the Brazil Forest Code for identifying "consolidated areas"—regions with pre-existing human occupation or agricultural use. Derived from the SPOT satellite archive, these mosaic datasets provide a higher-resolution alternative to the 30-meter Landsat data traditionally used to establish this baseline.

To synthesize thousands of SPOT images into unified basemaps, the imagery underwent a processing routine consisting of:

Edge refinement to remove compression artifacts.
Radiometric normalization against a Landsat baseline.
Cloud masking (conservative, manual).
Misregistration correction (coregistration to a Landsat composite).

For the final compositing, a deterministic mosaic method was employed rather than statistical reduction. Pixels were layered according to spatial resolution and satellite mission chronology, with priority given to observations from newer satellites. This layering hierarchy was combined with nearest neighbor resampling to establish strict data provenance. Consequently, every pixel in the final basemap retained its discrete origin and remained directly traceable to a specific source observation and its metadata.

**Figure 1.** Full-extent representation of the G-BFID v1.0 Visual Basemap in pseudo-natural color (left) and the Analytic Basemap displayed as a near-infrared false-color (right).

Source data

Satellite and sensor specifications

The G-BFID v1.0 basemaps are derived from SPOT (Satellite pour l’Observation de la Terre) image data. The SPOT missions, operated by CNES and distributed by Airbus, provide high-resolution optical imagery with a 60 km swath. This project utilizes an archive of imagery from three specific satellites to establish the 2008 baseline:

SPOT 2 & 4: Equipped with the HRV/HRVIR sensors, providing multispectral data at 20-meter resolution and panchromatic data at 10-meter resolution.
SPOT 5: Equipped with the HRG sensor, offering significantly improved spatial resolution with multispectral data at 10-meter resolution and panchromatic data at 5-meter resolution.

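| Specification | SPOT 2 | SPOT 4 | SPOT 5 |
| --- | --- | --- | --- |
| Multispectral resolution | 20 m | 20 m | 10 m |
| Panchromatic resolution | 10 m | 10 m | 5 m |
| Spectral bands | Green, Red, NIR | Green, Red, NIR, SWIR | Green, Red, NIR, SWIR |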

Table 1. Technical specifications for SPOT 2, 4, and 5 missions.

Note: Airbus produces a "supermode" panchromatic product that can be used to pan-sharpen SPOT 5 images to 2.5-meter resolution. Availability of these data is very limited for the specifications of this project, so they were not included.

Input Earth Engine collections

Three distinct SPOT data products were ingested and processed to generate the final basemaps. These collections span the period from January 9, 2007, to November 26, 2009, providing the necessary temporal depth to overcome cloud cover constraints while maintaining a high-resolution baseline.

Scene selection

To support forest protection efforts on private lands registered in the Rural Environmental Registry (CAR), geographic coverage was prioritized for the Amazon biome and the five "Arc of Deforestation" states: Maranhão, Mato Grosso, Pará, Rondônia, and Tocantins.

To align with the requirements of the Brazil Forest Code, a mid-2008 temporal window was targeted. Based on these temporal and cloud-cover criteria (<50%), a total of 10,072 images from SPOT 2, 4, and 5 were identified and ingested into the Earth Engine Data Catalog for processing.
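The selection criteria above can be sketched as a simple filter. This is a minimal illustration, not the project's actual tooling: the record fields (`acquired`, `cloud_pct`) and the exact window bounds are assumptions chosen to match the 2007–2009 range and the <50% cloud-cover criterion stated in the text.

```python
from datetime import date

# Hypothetical scene records; the field names are illustrative only.
scenes = [
    {"id": "a", "acquired": date(2008, 7, 10), "cloud_pct": 12.0},
    {"id": "b", "acquired": date(2008, 6, 2), "cloud_pct": 63.0},
    {"id": "c", "acquired": date(2010, 1, 15), "cloud_pct": 5.0},
    {"id": "d", "acquired": date(2007, 9, 30), "cloud_pct": 49.9},
]

# Assumed window bounds covering the 2007-2009 period named in the text.
WINDOW_START = date(2007, 1, 1)
WINDOW_END = date(2009, 12, 31)
MAX_CLOUD_PCT = 50.0  # strict "<50%" criterion from the text


def select_scenes(records):
    """Keep scenes inside the window whose cloud cover is below the threshold."""
    return [
        r for r in records
        if WINDOW_START <= r["acquired"] <= WINDOW_END
        and r["cloud_pct"] < MAX_CLOUD_PCT
    ]


selected_ids = [r["id"] for r in select_scenes(scenes)]
print(selected_ids)
```

Scene `b` is dropped for cloud cover and scene `c` for falling outside the window.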

Note on Spectral Composition: SPOT 2, 4, and 5 sensors do not natively capture a blue spectral band. For the natural color products, a synthesized blue band—derived from existing spectral bands to approximate a true-color representation—is provided by the data distributor.

1. Pansharpened multispectral natural color

Earth Engine Asset: AIRBUS/SPOT_2_4_5/BRAZIL/2007_2009/PMS_NC/V1
Number of Images: 2977
Date Range: March 1, 2007 – November 26, 2009
Description: This product merges the higher-resolution panchromatic band with the multispectral bands to produce a sharp, 3-band RGB image. This is the preferred source for the Visual Basemap.

**Figure 2.** Histogram of images by year-month for the pansharpened multispectral natural color image collection.

2. Multispectral natural color

Earth Engine Asset: AIRBUS/SPOT_2_4_5/BRAZIL/2007_2009/MS_NC/V1
Number of Images: 3536
Date Range: January 9, 2007 – November 26, 2009
Description: A 3-band pseudo-natural color product (simulated RGB) at native multispectral resolution. This collection is used in the Visual Basemap as a secondary source where pansharpened data may be unavailable.

**Figure 3.** Histogram of images by year-month for the multispectral natural color image collection.

3. Multispectral

Earth Engine Asset: AIRBUS/SPOT_2_4_5/BRAZIL/2007_2009/MS/V1
Number of Images: 3559
Date Range: January 9, 2007 – November 26, 2009
Description: The source for the Analytic Basemap. This collection contains the original spectral bands (including near-infrared and short-wave infrared where available) necessary for calculating vegetation indices (e.g., NDVI) and performing cover classification.

**Figure 4.** Histogram of images by year-month for the multispectral image collection.

Geographic coverage and gaps

The primary objective of the G-BFID v1.0 basemaps is to establish a high-resolution, near-2008 baseline for the administrative territory of Brazil. However, maintaining a strict temporal window (2007–2009) while filtering for high-quality, low-cloud images resulted in spatial gaps, particularly in regions with persistent cloud cover.

Scene selection and quality filtering

To ensure data integrity, a candidate pool was identified from 2007–2009 with <50% initial cloud cover. From this pool, 10,072 images were ingested and further filtered to exclude those with significant quality issues:

Discontinuity check: Images with manually identified telemetry or geometric artifacts, including discontinuous scenes bundled together by the data provider, were excluded.
Cloud QA: Scenes rejected by professional image analysts, either for being entirely cloud-covered or for other reasons, were excluded.
Data density: Only images that retained more than 10% valid pixels after cloud masking were kept for the Visual Basemap; the threshold for the Analytic Basemap was 5%.
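The data density check can be illustrated with a short sketch. The thresholds come from the text; the mask representation (a 2D list of 0/1 flags) and the function names are assumptions made for illustration.

```python
# Minimum fraction of valid pixels after cloud masking, per product (from the text).
MIN_VALID_FRACTION = {"visual": 0.10, "analytic": 0.05}


def valid_fraction(mask):
    """Fraction of pixels flagged valid (truthy) in a 2D mask."""
    total = sum(len(row) for row in mask)
    valid = sum(1 for row in mask for px in row if px)
    return valid / total if total else 0.0


def passes_density_check(mask, product):
    """True when the masked image keeps more than the product's threshold."""
    return valid_fraction(mask) > MIN_VALID_FRACTION[product]


# Toy 5x5 mask with 2 valid pixels out of 25 (8% valid).
mask = [
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

print(passes_density_check(mask, "visual"))    # 8% is not above 10%
print(passes_density_check(mask, "analytic"))  # 8% is above 5%
```

The same image can therefore be retained for one product and dropped from the other.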

Spatial distribution

As shown in Figure 5, the mosaics achieve the highest density in the "Arc of Deforestation" and the north/central-west regions. Transparent areas represent regions where no SPOT 2, 4, or 5 imagery passed the quality filters listed above within the target timeframe, or where the available pixels were cloud masked.

**Figure 5.** Spatial extent of the G-BFID v1.0 Visual Basemap. The internal gaps and incomplete coverage in the south and east are a result of prioritizing the 2008 temporal baseline, quality filtering, and cloud masking. While the Analytic Basemap is not shown, its spatial extent is nearly identical to the Visual Basemap at this scale.

Coverage by state

Figure 6 provides a state-by-state breakdown of valid pixel coverage. Coverage is near-complete (>95%) for states such as Rondônia and Mato Grosso, while states in the south and parts of the northeast show lower density due to the constraints of the archive and cloud masking.

The chart also illustrates that the Visual Basemap consistently provides slightly higher valid pixel coverage than the Analytic Basemap across nearly all states. This difference exists because the Visual product can utilize both the pansharpened and multispectral natural color collections, whereas the Analytic product is derived solely from the multispectral collection.

**Figure 6.** Valid pixel coverage for Visual and Analytic basemaps by Brazilian state. Coverage is highest in the "Arc of Deforestation" states and lowest in regions with persistent cloud cover, limited SPOT acquisition, and cloud masking.

Processing methodology

The processing methodology for the G-BFID v1.0 basemaps prioritizes data provenance and radiometric integrity to support Forest Code compliance assessments. This section first defines the overarching mosaic architecture and layering logic used to preserve this integrity, followed by the chronological preprocessing and normalization steps applied to individual source images prior to final assembly.

Mosaic method

The final basemaps were created using a mosaic method rather than statistical reducers (such as mean or median composites). This approach ensures the final products retain the original spectral values and spatial textures of the source imagery. By avoiding the averaging of multiple observations, the mosaic preserves the discrete provenance of each pixel.

Layering logic

The input imagery was layered using a hierarchy that favors higher spatial resolution and later satellite missions:

Visual Basemap: Imagery was sorted by pixel size (prioritizing the highest resolution data available) and then by satellite mission (prioritizing data from newer SPOT satellites).
Analytic Basemap: Imagery was layered by satellite mission, prioritizing observations from later SPOT missions to target the best available data.
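The layering logic above can be sketched in plain Python. This is a toy model under stated assumptions: each "image" is a short list of pixel values with `None` for masked pixels, the record fields are hypothetical, and the first valid pixel in priority order wins. It is not the Earth Engine implementation used to build the basemaps.

```python
# Hypothetical per-image records: pixel size in meters, SPOT mission number,
# and a tiny 1D "raster" where None marks a masked (no-data) pixel.
images = [
    {"id": "spot2", "pixel_size": 20, "mission": 2, "pixels": [21, 22, 23, 24]},
    {"id": "spot5", "pixel_size": 10, "mission": 5, "pixels": [None, 52, None, None]},
    {"id": "spot4", "pixel_size": 10, "mission": 4, "pixels": [41, 42, 43, None]},
]


def visual_priority(img):
    """Smaller pixel size first, then newer mission."""
    return (img["pixel_size"], -img["mission"])


def analytic_priority(img):
    """Newer mission first."""
    return -img["mission"]


def mosaic(records, priority):
    """Take, per pixel, the value from the highest-priority image that has data."""
    ordered = sorted(records, key=priority)
    width = len(ordered[0]["pixels"])
    values, sources = [], []
    for i in range(width):
        for img in ordered:
            if img["pixels"][i] is not None:
                values.append(img["pixels"][i])
                sources.append(img["id"])
                break
        else:
            # No image has data here: the pixel stays empty (transparent).
            values.append(None)
            sources.append(None)
    return values, sources


visual_values, visual_sources = mosaic(images, visual_priority)
print(visual_values)
print(visual_sources)
```

Because no values are averaged, each output pixel maps back to exactly one source image, which is what makes the per-pixel provenance possible.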

Integrity and traceability

Resampling: All internal reprojections used nearest neighbor resampling. This avoids the interpolative smoothing effects of other methods, ensuring that the original radiometric and spatial properties of the source pixels are not influenced by their neighbors.
Data traceability: A per-pixel date metadata band is included in both products. This allows users to identify the exact date of observation for any given location, ensuring full transparency for Forest Code compliance assessments.
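As a small illustration of why nearest neighbor resampling preserves source values, the sketch below upsamples a coarse grid by an integer factor by replicating pixels. The function name and the integer-factor simplification are assumptions; real reprojection involves coordinate transforms that are omitted here.

```python
def resample_nearest(grid, factor):
    """Upsample a 2D grid by an integer factor by replicating each source pixel."""
    out = []
    for row in grid:
        new_row = [value for value in row for _ in range(factor)]
        out.extend([list(new_row) for _ in range(factor)])
    return out


coarse = [
    [10, 20],
    [30, 40],
]
fine = resample_nearest(coarse, 2)
for row in fine:
    print(row)

# Every output value already existed in the input; nothing is interpolated.
unique_out = {value for row in fine for value in row}
```

An interpolating method would instead introduce intermediate values (for example 15 between 10 and 20) that no sensor ever recorded.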

Image edge refinement

To ensure clean boundaries between overlapping scenes, an edge refinement process was applied to remove artifacts present in the source imagery. These artifacts, appearing as speckled or "noisy" pixels along image borders, were a characteristic of the lossy compression in the data delivered by the provider. To address this, a 2.5-pixel focal minimum erosion was applied to the image masks, removing the low-quality edge artifacts and ensuring that only valid data is used in the final mosaics (Figure 7).

**Figure 7.** Comparison of an image border before (left) and after (right) the edge refinement process. The 2.5-pixel erosion removes the "noisy" artifacts caused by lossy compression, leading to cleaner transitions between overlapping scenes.
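The edge refinement described above amounts to a morphological erosion of the validity mask. The sketch below is a plain-Python approximation under two assumptions that the text does not state: the kernel is circular, and pixels outside the image footprint count as invalid.

```python
RADIUS = 2.5  # erosion radius in pixels, from the text


def circular_offsets(radius):
    """Integer (dy, dx) offsets that fall inside a circle of the given radius."""
    r = int(radius)
    return [
        (dy, dx)
        for dy in range(-r, r + 1)
        for dx in range(-r, r + 1)
        if dy * dy + dx * dx <= radius * radius
    ]


def erode_mask(mask, radius=RADIUS):
    """Focal minimum: a pixel stays valid only if every neighbor in the kernel is valid."""
    height, width = len(mask), len(mask[0])
    offsets = circular_offsets(radius)
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            keep = 1
            for dy, dx in offsets:
                yy, xx = y + dy, x + dx
                outside = not (0 <= yy < height and 0 <= xx < width)
                if outside or not mask[yy][xx]:
                    keep = 0
                    break
            out[y][x] = keep
    return out


# A fully valid 9x9 footprint shrinks to its 5x5 interior.
mask = [[1] * 9 for _ in range(9)]
eroded = erode_mask(mask)
for row in eroded:
    print(row)
```

With a radius of 2.5 pixels, a two-pixel border is trimmed from every edge of the footprint.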

Cloud masking

To ensure the highest possible data integrity for G-BFID v1.0, a manual masking procedure was implemented across the imagery archive. This approach was chosen over automated methods to allow for a more conservative exclusion of pixels affected by clouds or other atmospheric interference.

Masking procedure

Trained analysts identified areas containing clouds and related atmospheric artifacts. To ensure these problematic pixels were fully captured, a conservative masking strategy was employed using simplified, coarse geometries. Rather than precisely tracing individual cloud perimeters, larger rectangular areas were masked to ensure the resulting mosaic remained as clear as possible.

While this approach is intentionally aggressive—often removing valid pixels adjacent to clouds—it was deemed necessary to provide a high-integrity product for the 2008 baseline.

Masking policy and integration

Manual masks were produced exclusively for the multispectral natural color and pansharpened natural color image collections. Because the multispectral natural color product is derived from the multispectral data, these masks were then propagated to the coincident multispectral images during processing.

Any multispectral image that did not have a matching, manually-masked natural color counterpart was excluded from the final mosaic. This ensures that every pixel included in the G-BFID v1.0 suite—regardless of the product type—has had clouds removed.
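The propagation and exclusion rule can be expressed as a lookup. In this sketch, matching images by a shared scene identifier is an assumption, as are the identifiers and mask labels themselves.

```python
# Hypothetical manual masks, keyed by the scene ID of a natural color image.
natural_color_masks = {"scene_001": "mask_A", "scene_002": "mask_B"}

# Hypothetical multispectral scene IDs awaiting a mask.
multispectral_scenes = ["scene_001", "scene_002", "scene_003"]


def propagate_masks(ms_ids, nc_masks):
    """Assign each multispectral scene its counterpart's mask; drop unmatched scenes."""
    kept = {sid: nc_masks[sid] for sid in ms_ids if sid in nc_masks}
    excluded = [sid for sid in ms_ids if sid not in nc_masks]
    return kept, excluded


kept, excluded = propagate_masks(multispectral_scenes, natural_color_masks)
print(kept)
print(excluded)
```

Here `scene_003` has no manually masked counterpart, so it would be left out of the mosaic.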

Interpretation of data gaps

Transparent gaps in the final mosaics represent areas where no valid, high-quality data is available within the 2007–2009 temporal window. These gaps result from a combination of the conservative manual masking described above, the initial rejection of scenes with high cloud cover (>50%), or a total absence of available source imagery from the provider for a specific region. While larger regional gaps typically reflect a lack of available 2008-era images, the distinct "blocky" patterns shown in Figure 9 are a consequence of cloud masking.

**Figure 9.** Examples of the manual masking applied to the Visual Basemap. The rectangular black polygons illustrate the conservative approach used to exclude clouds and other atmospheric artifacts from the final products. Masked pixels (no data) in the data products are transparent.

Misregistration correction

An automated coregistration workflow was implemented to mitigate significant misregistration in the SPOT source imagery relative to a geographic baseline derived from Landsat Collection 2 data.

Reference baseline

A cloud-free Landsat Collection 2 reference mosaic was generated to serve as a coregistration reference image. This reference image was constructed using a median reducer on Landsat 7 and 8 images intersecting Brazil for the period 2006–2010. The Landsat red band was selected as the primary registration target to match the SPOT red band.

Displacement estimation

The ee.Image.displacement algorithm was used to calculate the pixel-level offset between the SPOT source data and the Landsat reference.

Search parameters: A maximum offset of 500 m and a stiffness parameter of 5 were applied to the displacement model.
Statistical aggregation: The delta x ($dx$), delta y ($dy$), and confidence values were aggregated across the entire image area using a mean reducer.
Magnitude calculation: From these aggregated statistics, a total magnitude offset estimate $M = \sqrt{dx^2 + dy^2}$ was calculated to represent the average displacement of the scene.
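The aggregation and magnitude steps can be worked through numerically. The sketch below assumes the per-pixel displacement and confidence values have already been extracted into lists; the sample numbers are invented for illustration.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def scene_displacement(dx_values, dy_values, confidence_values):
    """Aggregate per-pixel offsets with a mean, then compute M = sqrt(dx^2 + dy^2)."""
    dx = mean(dx_values)
    dy = mean(dy_values)
    confidence = mean(confidence_values)
    magnitude = math.hypot(dx, dy)
    return dx, dy, magnitude, confidence


# Invented per-pixel offsets in meters and confidence scores.
dx, dy, magnitude, confidence = scene_displacement(
    [28.0, 32.0, 30.0],
    [38.0, 42.0, 40.0],
    [0.5, 0.7, 0.6],
)
print(dx, dy, magnitude)
```

Note that the magnitude is computed from the scene-level means, so offsets pointing in opposite directions within a scene partially cancel before $M$ is calculated.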

Correction policy

Scenes were categorized and corrected based on the calculated displacement statistics to prioritize significant improvements while avoiding the introduction of new artifacts:

Automatic correction: Images exhibiting a displacement magnitude $M > 30$ m with a confidence score $C > 0.3$ were automatically shifted using the estimated $dx$ and $dy$ values.
Manual assessment: For scenes with very high estimated displacement ($M > 100$ m) but low confidence ($C \le 0.3$), manual review was performed. Corrections were accepted only if the resulting coregistration showed a definitive improvement over the original placement.
Exclusion: Scenes that remained significantly misregistered after attempted correction, or those with insufficient features for reliable matching, were excluded from the mosaic.
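The thresholds in the correction policy can be summarized as a small decision function. The category labels are hypothetical, and the fall-through case (scenes meeting neither rule are left as delivered) is an assumption, since the text does not describe it explicitly.

```python
AUTO_MIN_MAGNITUDE_M = 30.0     # M > 30 m triggers automatic correction
MANUAL_MIN_MAGNITUDE_M = 100.0  # M > 100 m with low confidence triggers review
CONFIDENCE_THRESHOLD = 0.3


def classify_scene(magnitude_m, confidence):
    """Map scene-level displacement statistics to a handling category."""
    if magnitude_m > AUTO_MIN_MAGNITUDE_M and confidence > CONFIDENCE_THRESHOLD:
        return "auto_correct"
    if magnitude_m > MANUAL_MIN_MAGNITUDE_M and confidence <= CONFIDENCE_THRESHOLD:
        return "manual_review"
    # Assumption: anything else keeps its original delivered position.
    return "leave_as_delivered"


print(classify_scene(50.0, 0.6))
print(classify_scene(150.0, 0.2))
print(classify_scene(20.0, 0.9))
```

The exclusion step is not modeled here because it depends on the outcome of the attempted correction rather than on the initial statistics.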

Implementation and quality control

Corrected images were re-projected using nearest neighbor resampling to preserve the original radiometric values and avoid the smoothing effects of bilinear or cubic interpolation.

To ensure spatial traceability, a boolean coregistered band was appended to each image and preserved in the final mosaic. This metadata allows users to distinguish between pixels that underwent spatial adjustment and those retained in their original delivered position.

Radiometric normalization

To account for varying atmospheric conditions and sensor differences across the SPOT source collections, radiometric normalization was applied to images comprising the mosaic products. Both the Visual and Analytic basemaps utilize histogram matching against a consistent 2008 Landsat target baseline.

Landsat was selected as the reference over coarser alternatives like MODIS because its 30-meter resolution more closely aligns with the 5- to 20-meter SPOT data. Because the two sensors sample the landscape at similar scales, their spectral histograms represent comparable mixtures of surface types, allowing for a more accurate radiometric transfer during the matching process. The specific methodology for this harmonization differs slightly based on whether the final use case is visual or analytic.

Visual Basemap

To minimize radiometric discontinuities between adjacent images and ensure a near-seamless appearance, a color-balancing workflow was implemented using histogram matching. Pixel values were adjusted to match a consistent 2008 Landsat mosaic target baseline.

The process follows these steps:

Analysis masking: To ensure stable statistics during histogram matching, a temporary analysis mask is generated to exclude areas that could skew the data. This masking targets two primary features:
- High-change areas: Pixels exceeding the 95th percentile of absolute difference between the SPOT image and the Landsat reference mosaic are excluded.
- Water bodies: High reflectance variability over water is excluded using the JRC Yearly Water Classification History dataset. The dataset is filtered to the year 2008, and an inverse mask is applied to ensure only non-water classes are retained for the statistical analysis.
Lookup table (LUT) generation: Using the masked data, cumulative histograms are calculated for both the source SPOT bands and the target Landsat bands.
Interpolation: Source pixel values are remapped to target values using the generated LUT, aligning the radiometric profile of the SPOT data with the 2008 Landsat reference.
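The LUT generation and remapping steps can be sketched for a single band as follows. This is a simplified stand-in, not the project's code: it works on short lists of integer values, assumes the analysis mask has already been applied, and uses a step lookup (the first target level whose cumulative fraction reaches the source's) rather than interpolating between LUT entries.

```python
from bisect import bisect_left


def cumulative_histogram(values):
    """Return the sorted unique levels and the cumulative fraction at each level."""
    ordered = sorted(values)
    n = len(ordered)
    levels, cdf = [], []
    for i, value in enumerate(ordered):
        if levels and levels[-1] == value:
            cdf[-1] = (i + 1) / n
        else:
            levels.append(value)
            cdf.append((i + 1) / n)
    return levels, cdf


def build_lut(source_values, target_values):
    """Map each source level to the target level at the same cumulative fraction."""
    src_levels, src_cdf = cumulative_histogram(source_values)
    tgt_levels, tgt_cdf = cumulative_histogram(target_values)
    lut = {}
    for level, fraction in zip(src_levels, src_cdf):
        idx = min(bisect_left(tgt_cdf, fraction), len(tgt_levels) - 1)
        lut[level] = tgt_levels[idx]
    return lut


# Invented sample values for one SPOT band and the matching Landsat band.
source = [10, 10, 20, 30]
target = [100, 110, 120, 130]

lut = build_lut(source, target)
matched = [lut[value] for value in source]
print(lut)
print(matched)
```

In practice, source values that were masked out of the statistics still need remapping, which is where interpolation between LUT entries becomes necessary.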

**Figure 8.** Radiometric normalization of the Visual Basemap. The original images (left) exhibit significant radiometric variation. After histogram matching (right), the transitions between scenes are near seamless, providing a consistent visual representation across the mosaic. Black background pixels are regions of no data.

Analytic Basemap

Processing for the Analytic Basemap mirrors that of the Visual Basemap but first converts digital number (DN) values to top-of-atmosphere reflectance:

1. Top-of-atmosphere (TOA) reflectance conversion

The SPOT raw digital numbers (DN) are converted to TOA reflectance to account for physical sensor properties and solar geometry:

Radiance calculation: Application of the provider's band-specific physical gain and bias metadata.
Reflectance normalization: Radiance is normalized by solar irradiance, the cosine of the sun zenith angle, and the Earth-Sun distance for the specific day of acquisition.
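The two conversion steps can be sketched with the standard TOA reflectance relation, $\rho = \pi L d^2 / (E_{sun} \cos\theta_z)$. Two points are assumptions here: the radiance convention `L = DN / gain + bias` (the form commonly documented for SPOT metadata, not confirmed by this text), and all numeric inputs, which are invented rather than real calibration values.

```python
import math


def dn_to_radiance(dn, gain, bias):
    """Assumed convention: radiance = DN / gain + bias."""
    return dn / gain + bias


def radiance_to_toa_reflectance(radiance, solar_irradiance, sun_zenith_deg,
                                earth_sun_distance_au):
    """Normalize radiance by solar irradiance, sun zenith angle, and Earth-Sun distance."""
    cos_zenith = math.cos(math.radians(sun_zenith_deg))
    return (math.pi * radiance * earth_sun_distance_au ** 2
            / (solar_irradiance * cos_zenith))


# Invented inputs for illustration only.
radiance = dn_to_radiance(dn=120, gain=1.5, bias=0.0)
reflectance = radiance_to_toa_reflectance(
    radiance,
    solar_irradiance=1600.0,
    sun_zenith_deg=60.0,
    earth_sun_distance_au=1.0,
)
print(radiance, round(reflectance, 4))
```

The Earth-Sun distance is passed in directly; in a full workflow it would be derived from the acquisition day of year.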

2. Histogram matching

To further minimize radiometric differences between disparate SPOT images, a histogram matching workflow is applied:

Analysis masking: This step uses the exact same analysis masking described in the Visual Basemap section above: excluding >95th percentile change pixels and filtering out water pixels via the inverted 2008 JRC classification mask.
Harmonization: Similar to the Visual Basemap, pixel values are remapped via a lookup table to align the radiometric profile with a Landsat TOA reference. This establishes mosaic-wide radiometric consistency for training broad-scale machine learning models and running reliable inferences.
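The analysis masking shared by both basemaps can be sketched for a flattened list of pixels. The nearest-rank percentile definition and the boolean water flags are assumptions made for illustration; the text specifies only the 95th percentile cutoff and the exclusion of water.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def analysis_mask(spot, reference, is_water, pct=95):
    """True where a pixel is stable (difference within the percentile) and not water."""
    diffs = [abs(s - r) for s, r in zip(spot, reference)]
    threshold = percentile(diffs, pct)
    return [d <= threshold and not w for d, w in zip(diffs, is_water)]


# Invented data: 20 pixels whose absolute differences run from 1 to 20.
reference = [100] * 20
spot = [100 + i for i in range(1, 21)]
is_water = [True] + [False] * 19

mask = analysis_mask(spot, reference, is_water)
print(sum(mask))
```

The mask is temporary: it only selects which pixels contribute to the histogram statistics and does not remove pixels from the final mosaic.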

Limitations and known issues

While G-BFID v1.0 provides a high-resolution 2008-era baseline, users should be aware of several limitations inherent to the historical SPOT archive and the processing methodologies employed.

Spatial completeness and gaps

The basemaps don't provide 100% geographic coverage of Brazil. Gaps exist where no imagery met the project’s strict 2008-era temporal window (2007–2009), cloud cover thresholds, or quality standards. These gaps are most frequent in regions with persistent cloud cover or lower historical satellite acquisition frequency. See the Geographic Coverage and Gaps section for details.

Variable native resolution

While the final products are delivered at a nominal pixel size of 5 meters (Visual) and 10 meters (Analytic), the source imagery consists of a mix of 5 m, 10 m, and 20 m native pixels. Because nearest neighbor resampling was used to preserve spectral integrity, the boundaries between different resolutions may be visible.

**Figure 10.** Example of varying native source resolutions across a scene boundary. The shift in "grain" or sharpness reflects the transition between different SPOT sensor resolutions preserved in the final product.

Residual misregistration

Despite the automated coregistration workflow, spatial shifts may persist in some areas. Residual misregistration is most likely to occur in regions with extreme terrain relief or in dense, homogeneous forest areas where the algorithm lacked sufficient landmarks to calculate high-confidence displacement vectors.

Atmospheric and cloud artifacts

The manual cloud masking process was intentionally conservative, erring toward removing too much rather than too little; however, it is not exhaustive. Users may encounter occasional residual artifacts, such as very thin cirrus haze or small cloud shadows.

**Figure 11.** Example of residual thin hazy clouds remaining after the manual cloud masking process.

Radiometric inconsistency and ML performance

While histogram matching was used to minimize radiometric discontinuities, residual spectral variation remains between adjacent images. For automated land cover classification or machine learning applications, this variability increases the spectral variance for a given land cover type across the mosaic. This broader data range can reduce the precision of class separation during inference, potentially leading to higher rates of error.

**Figure 12.** Residual spectral differences shown at the intersection of three images. These discontinuities are visible as variations in brightness and color between overlapping images and may impact the performance of machine learning models.

Spectral saturation

In areas of extreme brightness—such as highly reflective urban surfaces, specific soil types, or bright sand—pixels may reach the sensor's maximum detectable limit. This "saturation" results in a loss of texture and detail in those specific locations.

Altered band ratios and vegetation indices

To achieve a near-seamless mosaic, histogram matching was applied to each spectral band individually, which inherently alters the original physical ratios between bands. As a result, calculating common indices like NDVI or other band-ratio metrics will produce values that differ from unaltered source datasets. While these derived indices may still capture relative spatial patterns across the mosaic, they should not be used for absolute quantitative measurements, cross-sensor comparisons, or analyses that rely on rigid index thresholds.
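A small numeric example shows the effect. Here a separate linear adjustment per band stands in for the per-band histogram matching; the coefficients and reflectance values are invented, and real LUT remapping is generally nonlinear.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)


def adjust_nir(value):
    """Hypothetical per-band remap for the NIR band."""
    return value * 0.9 + 0.02


def adjust_red(value):
    """Hypothetical per-band remap for the red band."""
    return value * 1.2 + 0.01


# Invented reflectances for a densely vegetated and a sparsely vegetated pixel.
dense_nir, dense_red = 0.40, 0.10
sparse_nir, sparse_red = 0.25, 0.20

original_dense = ndvi(dense_nir, dense_red)
original_sparse = ndvi(sparse_nir, sparse_red)
adjusted_dense = ndvi(adjust_nir(dense_nir), adjust_red(dense_red))
adjusted_sparse = ndvi(adjust_nir(sparse_nir), adjust_red(sparse_red))

print(round(original_dense, 3), round(adjusted_dense, 3))
print(round(original_sparse, 3), round(adjusted_sparse, 3))
```

The dense pixel still ranks above the sparse one after adjustment, so the relative pattern survives, but the sparse pixel moves from a positive to a negative NDVI, which is why fixed thresholds carried over from other datasets are unreliable.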
