AI-generated Key Takeaways
- 
          This tutorial introduces cloud and cloud shadow masking in Sentinel-2 surface reflectance data using Earth Engine, leveraging the s2cloudless dataset and near-infrared reflectance. 
- 
          The process involves filtering and joining Sentinel-2 surface reflectance and cloud probability collections based on area of interest and date. 
- 
          Functions are defined to add cloud and cloud shadow component layers, including cloud probability, cloud masks, dark pixels, and cloud projection, to each image. 
- 
          A final cloud-shadow mask is assembled by combining the cloud and shadow components, followed by morphological operations like focal min and focal max. 
- 
          The tutorial demonstrates how to visualize and evaluate the mask components using Folium and provides a function to apply the final cloud-shadow mask to generate a cloud-free composite. 
This tutorial is an introduction to masking clouds and cloud shadows in Sentinel-2 (S2) surface reflectance (SR) data using Earth Engine. Clouds are identified from the S2 cloud probability dataset (s2cloudless) and shadows are defined by cloud projection intersection with low-reflectance near-infrared (NIR) pixels.
For a similar JavaScript API script, see this Code Editor example.
Run me first
Run the following cell to initialize the Earth Engine API. The output will contain instructions on how to grant this notebook access to Earth Engine using your account.
import ee
# Trigger the authentication flow.
ee.Authenticate()
# Initialize the library.
ee.Initialize(project='my-project')
Assemble cloud mask components
This section builds an S2 SR collection and defines functions to add cloud and cloud shadow component layers to each image in the collection.
Define collection filter and cloud mask parameters
Define parameters that are used to filter the S2 image collection and determine cloud and cloud shadow identification.
| Parameter | Type | Description | 
|---|---|---|
| AOI | ee.Geometry | Area of interest | 
| START_DATE | string | Image collection start date (inclusive) | 
| END_DATE | string | Image collection end date (exclusive) | 
| CLOUD_FILTER | integer | Maximum image cloud cover percent allowed in image collection | 
| CLD_PRB_THRESH | integer | Cloud probability (%); values greater than are considered cloud | 
| NIR_DRK_THRESH | float | Near-infrared reflectance; values less than are considered potential cloud shadow | 
| CLD_PRJ_DIST | float | Maximum distance (km) to search for cloud shadows from cloud edges | 
| BUFFER | integer | Distance (m) to dilate the edge of cloud-identified objects | 
The values currently set for AOI, START_DATE, END_DATE, and CLOUD_FILTER are intended to build a collection for a single S2 overpass of a region near Portland, Oregon, USA. When parameterizing and evaluating cloud masks for a new area, it is good practice to identify a single overpass date and limit the regional extent to minimize processing requirements. If you want to work with a different example, use this Earth Engine App to identify an image that includes some clouds, then replace the relevant parameter values below with those provided in the app.
AOI = ee.Geometry.Point(-122.269, 45.701)
START_DATE = '2020-06-01'
END_DATE = '2020-06-02'
CLOUD_FILTER = 60
CLD_PRB_THRESH = 50
NIR_DRK_THRESH = 0.15
CLD_PRJ_DIST = 1
BUFFER = 50
Build a Sentinel-2 collection
Sentinel-2 surface reflectance and Sentinel-2 cloud probability are two different image collections. Each collection must be filtered similarly (e.g., by date and bounds) and then the two filtered collections must be joined.
Define a function to filter the SR and s2cloudless collections according to area of interest and date parameters, then join them on the system:index property. The result is a copy of the SR collection where each image has a new 's2cloudless' property whose value is the corresponding s2cloudless image.
def get_s2_sr_cld_col(aoi, start_date, end_date):
    # Import and filter S2 SR.
    s2_sr_col = (ee.ImageCollection('COPERNICUS/S2_SR')
        .filterBounds(aoi)
        .filterDate(start_date, end_date)
        .filter(ee.Filter.lte('CLOUDY_PIXEL_PERCENTAGE', CLOUD_FILTER)))
    # Import and filter s2cloudless.
    s2_cloudless_col = (ee.ImageCollection('COPERNICUS/S2_CLOUD_PROBABILITY')
        .filterBounds(aoi)
        .filterDate(start_date, end_date))
    # Join the filtered s2cloudless collection to the SR collection by the 'system:index' property.
    return ee.ImageCollection(ee.Join.saveFirst('s2cloudless').apply(**{
        'primary': s2_sr_col,
        'secondary': s2_cloudless_col,
        'condition': ee.Filter.equals(**{
            'leftField': 'system:index',
            'rightField': 'system:index'
        })
    }))
Apply the get_s2_sr_cld_col function to build a collection according to the parameters defined above.
s2_sr_cld_col_eval = get_s2_sr_cld_col(AOI, START_DATE, END_DATE)
/tmpfs/src/tf_docs_env/lib/python3.9/site-packages/ee/deprecation.py:209: DeprecationWarning: Attention required for COPERNICUS/S2_SR! You are using a deprecated asset. To make sure your code keeps working, please update it. Learn more: https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2_SR warnings.warn(warning, category=DeprecationWarning)
Define cloud mask component functions
Cloud components
Define a function to add the s2cloudless probability layer and derived cloud mask as bands to an S2 SR image input.
def add_cloud_bands(img):
    # Get s2cloudless image, subset the probability band.
    cld_prb = ee.Image(img.get('s2cloudless')).select('probability')
    # Condition s2cloudless by the probability threshold value.
    is_cloud = cld_prb.gt(CLD_PRB_THRESH).rename('clouds')
    # Add the cloud probability layer and cloud mask as image bands.
    return img.addBands(ee.Image([cld_prb, is_cloud]))
Cloud shadow components
Define a function to add dark pixels, cloud projection, and identified shadows as bands to an S2 SR image input. Note that the image input needs to be the result of the above add_cloud_bands function because it relies on knowing which pixels are considered cloudy ('clouds' band).
def add_shadow_bands(img):
    # Identify water pixels from the SCL band.
    not_water = img.select('SCL').neq(6)
    # Identify dark NIR pixels that are not water (potential cloud shadow pixels).
    SR_BAND_SCALE = 1e4
    dark_pixels = img.select('B8').lt(NIR_DRK_THRESH*SR_BAND_SCALE).multiply(not_water).rename('dark_pixels')
    # Determine the direction to project cloud shadow from clouds (assumes UTM projection).
    shadow_azimuth = ee.Number(90).subtract(ee.Number(img.get('MEAN_SOLAR_AZIMUTH_ANGLE')));
    # Project shadows from clouds for the distance specified by the CLD_PRJ_DIST input.
    cld_proj = (img.select('clouds').directionalDistanceTransform(shadow_azimuth, CLD_PRJ_DIST*10)
        .reproject(**{'crs': img.select(0).projection(), 'scale': 100})
        .select('distance')
        .mask()
        .rename('cloud_transform'))
    # Identify the intersection of dark pixels with cloud shadow projection.
    shadows = cld_proj.multiply(dark_pixels).rename('shadows')
    # Add dark pixels, cloud projection, and identified shadows as image bands.
    return img.addBands(ee.Image([dark_pixels, cld_proj, shadows]))
Final cloud-shadow mask
Define a function to assemble all of the cloud and cloud shadow components and produce the final mask.
def add_cld_shdw_mask(img):
    # Add cloud component bands.
    img_cloud = add_cloud_bands(img)
    # Add cloud shadow component bands.
    img_cloud_shadow = add_shadow_bands(img_cloud)
    # Combine cloud and shadow mask, set cloud and shadow as value 1, else 0.
    is_cld_shdw = img_cloud_shadow.select('clouds').add(img_cloud_shadow.select('shadows')).gt(0)
    # Remove small cloud-shadow patches and dilate remaining pixels by BUFFER input.
    # 20 m scale is for speed, and assumes clouds don't require 10 m precision.
    is_cld_shdw = (is_cld_shdw.focalMin(2).focalMax(BUFFER*2/20)
        .reproject(**{'crs': img.select([0]).projection(), 'scale': 20})
        .rename('cloudmask'))
    # Add the final cloud-shadow mask to the image.
    return img_cloud_shadow.addBands(is_cld_shdw)
Visualize and evaluate cloud mask components
This section provides functions for displaying the cloud and cloud shadow components. In most cases, adding all components to images and viewing them is unnecessary. This section is included to illustrate how the cloud/cloud shadow mask is developed and demonstrate how to test and evaluate various parameters, which is helpful when defining masking variables for an unfamiliar region or time of year.
In applications outside of this tutorial, if you prefer to include only the final cloud/cloud shadow mask along with the original image bands, replace:
return img_cloud_shadow.addBands(is_cld_shdw)
with
return img.addBands(is_cld_shdw)
in the above add_cld_shdw_mask function.
Define functions to display image and mask component layers.
Folium will be used to display map layers. Import folium and define a method to display Earth Engine image tiles.
# Import the folium library.
import folium
# Define a method for displaying Earth Engine image tiles to a folium map.
def add_ee_layer(self, ee_image_object, vis_params, name, show=True, opacity=1, min_zoom=0):
    map_id_dict = ee.Image(ee_image_object).getMapId(vis_params)
    folium.raster_layers.TileLayer(
        tiles=map_id_dict['tile_fetcher'].url_format,
        attr='Map Data © <a href="https://earthengine.google.com/">Google Earth Engine</a>',
        name=name,
        show=show,
        opacity=opacity,
        min_zoom=min_zoom,
        overlay=True,
        control=True
        ).add_to(self)
# Add the Earth Engine layer method to folium.
folium.Map.add_ee_layer = add_ee_layer
Define a function to display all of the cloud and cloud shadow components to an interactive Folium map. The input is an image collection where each image is the result of the add_cld_shdw_mask function defined previously.
def display_cloud_layers(col):
    # Mosaic the image collection.
    img = col.mosaic()
    # Subset layers and prepare them for display.
    clouds = img.select('clouds').selfMask()
    shadows = img.select('shadows').selfMask()
    dark_pixels = img.select('dark_pixels').selfMask()
    probability = img.select('probability')
    cloudmask = img.select('cloudmask').selfMask()
    cloud_transform = img.select('cloud_transform')
    # Create a folium map object.
    center = AOI.centroid(10).coordinates().reverse().getInfo()
    m = folium.Map(location=center, zoom_start=12)
    # Add layers to the folium map.
    m.add_ee_layer(img,
                   {'bands': ['B4', 'B3', 'B2'], 'min': 0, 'max': 2500, 'gamma': 1.1},
                   'S2 image', True, 1, 9)
    m.add_ee_layer(probability,
                   {'min': 0, 'max': 100},
                   'probability (cloud)', False, 1, 9)
    m.add_ee_layer(clouds,
                   {'palette': 'e056fd'},
                   'clouds', False, 1, 9)
    m.add_ee_layer(cloud_transform,
                   {'min': 0, 'max': 1, 'palette': ['white', 'black']},
                   'cloud_transform', False, 1, 9)
    m.add_ee_layer(dark_pixels,
                   {'palette': 'orange'},
                   'dark_pixels', False, 1, 9)
    m.add_ee_layer(shadows, {'palette': 'yellow'},
                   'shadows', False, 1, 9)
    m.add_ee_layer(cloudmask, {'palette': 'orange'},
                   'cloudmask', True, 0.5, 9)
    # Add a layer control panel to the map.
    m.add_child(folium.LayerControl())
    # Display the map.
    display(m)
Display mask component layers
Map the add_cld_shdw_mask function over the collection to add mask component bands to each image, then display the results.
Give the system some time to render everything, it should take less than a minute.
s2_sr_cld_col_eval_disp = s2_sr_cld_col_eval.map(add_cld_shdw_mask)
display_cloud_layers(s2_sr_cld_col_eval_disp)

Evaluate mask component layers
In the above map, use the layer control panel in the upper right corner to toggle layers on and off; layer names are the same as band names, for easy code referral. Note that the layers have a minimum zoom level of 9 to avoid resource issues that can occur when visualizing layers that depend on the ee.Image.reproject function (used during cloud shadow project and mask dilation).
Try changing the above CLD_PRB_THRESH, NIR_DRK_THRESH, CLD_PRJ_DIST, and BUFFER input variables and rerunning the previous cell to see how the results change. Find a good set of values for a given overpass and then try the procedure with a new overpass with different cloud conditions (this S2 SR image browser app is handy for quickly identifying images and determining image collection filter criteria). Try to identify a set of parameter values that balances cloud/cloud shadow commission and omission error for a range of cloud types. In the next section, we'll use the values to actually apply the mask to generate a cloud-free composite for 2020.
Apply cloud and cloud shadow mask
In this section we'll generate a cloud-free composite for the same region as above that represents mean reflectance for July and August, 2020.
Define collection filter and cloud mask parameters
We'll redefine the parameters to be a little more aggressive, i.e. decrease the cloud probability threshold, increase the cloud projection distance, and increase the buffer. These changes will increase cloud commission error (mask out some clear pixels), but since we will be compositing images from three months, there should be plenty of observations to complete the mosaic.
AOI = ee.Geometry.Point(-122.269, 45.701)
START_DATE = '2020-06-01'
END_DATE = '2020-09-01'
CLOUD_FILTER = 60
CLD_PRB_THRESH = 40
NIR_DRK_THRESH = 0.15
CLD_PRJ_DIST = 2
BUFFER = 100
Build a Sentinel-2 collection
Reassemble the S2-cloudless collection since the collection filter parameters have changed.
s2_sr_cld_col = get_s2_sr_cld_col(AOI, START_DATE, END_DATE)
Define cloud mask application function
Define a function to apply the cloud mask to each image in the collection.
def apply_cld_shdw_mask(img):
    # Subset the cloudmask band and invert it so clouds/shadow are 0, else 1.
    not_cld_shdw = img.select('cloudmask').Not()
    # Subset reflectance bands and update their masks, return the result.
    return img.select('B.*').updateMask(not_cld_shdw)
Process the collection
Add cloud and cloud shadow component bands to each image and then apply the mask to each image. Reduce the collection by median (in your application, you might consider using medoid reduction to build a composite from actual data values, instead of per-band statistics).
s2_sr_median = (s2_sr_cld_col.map(add_cld_shdw_mask)
                             .map(apply_cld_shdw_mask)
                             .median())
Display the cloud-free composite
Display the results. Be patient while the map renders, it may take a minute; ee.Image.reproject is forcing computations to happen at 100 and 20 m scales (i.e. it is not relying on appropriate pyramid level scales for analysis). The issues with ee.Image.reproject being resource-intensive in this case are mostly confined to interactive map viewing. Batch image exports and table reduction exports where the scale parameter is set to typical Sentinel-2 scales (10-60 m) are less affected.
# Create a folium map object.
center = AOI.centroid(10).coordinates().reverse().getInfo()
m = folium.Map(location=center, zoom_start=12)
# Add layers to the folium map.
m.add_ee_layer(s2_sr_median,
                {'bands': ['B4', 'B3', 'B2'], 'min': 0, 'max': 2500, 'gamma': 1.1},
                'S2 cloud-free mosaic', True, 1, 9)
# Add a layer control panel to the map.
m.add_child(folium.LayerControl())
# Display the map.
display(m)

Hopefully you now have a good sense for Sentinel-2 cloud masking in the cloud 😉 with Earth Engine.
 Run in Google Colab
 Run in Google Colab
       View source on GitHub
 View source on GitHub