Getting Started • ERA5Flux

Intro

This package was designed to make it easier to work with both ERA5 and AmeriFlux data, hence the name “ERA5Flux”. There may be instances where the AmeriFlux data contain gaps that you want to fill with ERA5 data to create a single time series for your analysis.

Here is a demonstration of the workflow needed to achieve that result.

Workflow

Step 0: Get AmeriFlux data and site metadata

First, we will need to download AmeriFlux data. Navigate to AmeriFlux and log in to your account. Once you’re logged in, you can download data from the Download page.

You will be presented with options on the types of data products you would like to download. Select AmeriFlux BASE then select CC-BY-4.0 as the data use policy that you will follow. At the next step, pick the sites that you would like, then write a short description of your intended use for this data and agree to the license.

Finally, you will be presented with a data download page that has download links for the requested sites. Click on those download links to get a zip file for each site. You will also need to click on the “Requested_Files” download link to get the metadata text file for your request. This will download as “requested_files_manifest_YYYYMMDD.txt”.

After you’ve downloaded everything, unzip the site zip files. To ensure that the rest of the workflow runs smoothly, we recommend that you arrange your files like so:

your-working-directory/
├── unzipped_AmeriFlux_data/          
│   ├── unzipped_site1_folder/  
│   │   ├── *.csv
│   │   └── *.xlsx
│   │
│   ├── unzipped_site2_folder/
│   │   ├── *.csv
│   │   └── *.xlsx
│   │
│   ├── unzipped_site3_folder/
│   │   ├── *.csv
│   │   └── *.xlsx
│   │
│   └── requested_files_manifest_YYYYMMDD.txt
│
├── site1_data.zip  
├── site2_data.zip 
├── site3_data.zip 
│
└── more stuff/               
    ├── ...   
    └── ...
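
If you prefer to arrange the files from R rather than by hand, the layout above can be produced with a few base R calls. This is a minimal sketch, not part of ERA5Flux: the helper name arrange_ameriflux_files() is hypothetical, and it assumes the site zip files and the manifest sit directly in your working directory.

```r
# Hypothetical helper (not part of ERA5Flux): unzip every site archive found in
# `working_dir` into its own subfolder of `working_dir/unzipped_AmeriFlux_data`
# and move the requested-files manifest next to those subfolders.
arrange_ameriflux_files <- function(working_dir,
                                    target = "unzipped_AmeriFlux_data") {
  out_dir <- file.path(working_dir, target)
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

  # One subfolder per zip file, named after the zip without its extension
  zips <- list.files(working_dir, pattern = "\\.zip$", full.names = TRUE)
  for (z in zips) {
    site_dir <- file.path(out_dir, tools::file_path_sans_ext(basename(z)))
    utils::unzip(z, exdir = site_dir)
  }

  # Move the manifest(s) into the same folder as the unzipped site folders
  manifests <- list.files(working_dir,
                          pattern = "^requested_files_manifest_.*\\.txt$",
                          full.names = TRUE)
  if (length(manifests) > 0) {
    file.rename(manifests, file.path(out_dir, basename(manifests)))
  }

  invisible(out_dir)
}

# Example call (adjust the path to your own working directory):
# arrange_ameriflux_files("your-working-directory")
```

The original zip files are left in place, which matches the tree shown above.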

After arranging the files, you can now generate the AmeriFlux site metadata. In this example, the folder called unzipped_AmeriFlux_data contains the unzipped site folders and requested files manifest.

# Load the package
library(ERA5Flux)

# Specify the ERA5 variables you want to get
my_variables <- c("surface_solar_radiation_downwards")

# Point to the folder containing the unzipped site folders and requested files manifest
my_folder <- system.file("extdata", "unzipped_AmeriFlux_data", package = "ERA5Flux")

# Generate the AmeriFlux site metadata file
my_site_metadata <- get_site_metadata(folder = my_folder,
                                      selected_variables = my_variables)
#> selected variables: surface_solar_radiation_downwards
#> Now checking: US-GL2
#> Rows: 7248 Columns: 1
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> dbl (1): TIMESTAMP_START
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

my_site_metadata
#>   site_codes     lat      lon   startdate      enddate
#> 1     US-GL2 46.7167 -87.4000 2.02501e+11 202505312330
#>                           variables
#> 1 surface_solar_radiation_downwards

The metadata data frame will have six columns: site_codes, lat, lon, startdate, enddate, and variables. We will need this metadata in order to build our ERA5 data requests.
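
In the printed output, startdate appears in scientific notation (2.02501e+11), which hides the underlying timestamp. The sketch below shows one way to display and parse such values with base R. The data frame is an illustrative stand-in for the result of get_site_metadata(); in particular, the exact startdate value is assumed, since the printed output does not show all of its digits.

```r
# Illustrative stand-in for the data frame returned by get_site_metadata().
# The startdate value is an assumption made for this example.
meta <- data.frame(
  site_codes = "US-GL2",
  lat        = 46.7167,
  lon        = -87.4,
  startdate  = 202501010000,
  enddate    = 202505312330,
  variables  = "surface_solar_radiation_downwards"
)

# Show the timestamps as plain digits instead of scientific notation
start_chr <- sprintf("%.0f", meta$startdate)
end_chr   <- sprintf("%.0f", meta$enddate)

# Parse the yyyyMMddHHmm strings into date-times.
# tz = "UTC" is used here only as a neutral label for parsing.
start_time <- as.POSIXct(start_chr, format = "%Y%m%d%H%M", tz = "UTC")
end_time   <- as.POSIXct(end_chr,   format = "%Y%m%d%H%M", tz = "UTC")

# Length of the requested period in days
period_days <- as.numeric(difftime(end_time, start_time, units = "days"))
```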

Step 1: Download ERA5 Data

With the AmeriFlux site metadata in hand, you can download ERA5 data, but a few preparatory steps come first.

First, create an account at the Copernicus Climate Data Store if you haven’t done so already. You may need to accept the data license agreement before you can download the data. Visit the Climate Data Store User Profile page to accept the appropriate license(s).

Next, download_ERA5() requires a land-sea mask, so download one first with get_land_sea_mask(), which fetches the mask file for you.

# Download the land-sea mask if you haven't done so already
get_land_sea_mask()

Now you are ready to download ERA5 data. Paste your Climate Data Store API token into the my_token argument, pass the AmeriFlux site metadata to site_metadata, and set mask to the file path of your land-sea mask. Finally, set download_path to the folder where the ERA5 data should be saved.

# Paste your token 
my_token <- "my_ECMWF_token"

# Set the path to your land-sea mask
path_to_mask <- "lsm_1279l4_0.1x0.1.grb_v4_unpack.nc"

# Point to the folder where the ERA5 data should be saved
my_download_path <- "path_to_ERA5_download_folder"

# Download the ERA5 data
download_ERA5(my_token = my_token,
              site_metadata = my_site_metadata,
              mask = path_to_mask,
              download_path = my_download_path)
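
Pasting a token into a script makes it easy to share by accident. As an optional alternative, the sketch below reads the token from an environment variable and checks the other inputs before a long download starts. The helper prepare_era5_inputs() and the variable name CDS_API_TOKEN are hypothetical and not part of ERA5Flux; only base R functions are used.

```r
# Hypothetical helper (not part of ERA5Flux): gather and validate the inputs
# that download_ERA5() expects before starting the download.
prepare_era5_inputs <- function(mask_path, download_path,
                                token_env = "CDS_API_TOKEN") {
  # Read the token from an environment variable (the name is an assumption)
  token <- Sys.getenv(token_env, unset = "")
  if (!nzchar(token)) {
    stop("Set the ", token_env, " environment variable to your API token.")
  }

  # The land-sea mask must already exist on disk
  if (!file.exists(mask_path)) {
    stop("Land-sea mask not found: ", mask_path)
  }

  # Create the download folder if it does not exist yet
  dir.create(download_path, showWarnings = FALSE, recursive = TRUE)

  list(my_token = token, mask = mask_path, download_path = download_path)
}

# Example use with the objects defined above:
# inputs <- prepare_era5_inputs(path_to_mask, my_download_path)
# download_ERA5(my_token = inputs$my_token,
#               site_metadata = my_site_metadata,
#               mask = inputs$mask,
#               download_path = inputs$download_path)
```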

Step 2: Data Processing

After downloading the ERA5 .nc files, we convert them into CSV files formatted to match AmeriFlux standards, enabling easy merging with existing datasets. This conversion is handled by netcdf_df_formatter(), which adjusts ERA5 timestamps from UTC to local time (yyyyMMddHHmm) based on each site’s coordinates and converts solar radiation (ssrd), air temperature (t2m), and total precipitation (tp) to AmeriFlux units. The netcdf_to_csv() function then calls netcdf_df_formatter() for each site folder, merging all relevant data by site and year so that each output CSV contains one site’s data for one year. If full_year = TRUE, the function will only return full years that begin with the first hour of the year and end with the last. For the purposes of this example, set full_year = FALSE.

# Point to a folder containing ERA5 .nc files
site_folder <- system.file("extdata", "path_to_ERA5_download_folder", package = "ERA5Flux")

# Specify a site name
site_name <- "US_GL2"

# Create a temporary directory to export our output to
output_filepath <- tempdir()

# Convert NetCDF data to a CSV file
netcdf_to_csv(site_folder, output_filepath, site_name, full_year = FALSE)
#> Saved: US_GL2_2024_2025_ssrd.csv

# Read the CSV back in
data <- read.csv(list.files(output_filepath, pattern = "US_GL2", full.names = TRUE))

head(data)
#>           time     ssrd
#> 1 202412311900 663.4275
#> 2 202412312000   0.0000
#> 3 202412312100   0.0000
#> 4 202412312200   0.0000
#> 5 202412312300   0.0000
#> 6 202501010000   0.0000
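
To make the conversions described in this step concrete, here is a minimal sketch of the kind of arithmetic involved. It is not the code inside netcdf_df_formatter(): the function name is hypothetical, the UTC offset is passed in by hand rather than derived from the site coordinates, and ssrd is assumed to be an hourly accumulation in J m-2 (products that accumulate over a longer window would need differencing first).

```r
# Hypothetical sketch (not the package's implementation) of converting ERA5
# values and timestamps to AmeriFlux-style units and local time.
era5_to_ameriflux_sketch <- function(time_utc, utc_offset_hours,
                                     ssrd_j_m2 = NULL, t2m_k = NULL,
                                     tp_m = NULL) {
  # Shift UTC to local standard time with a fixed offset (assumed known)
  local_time <- time_utc + utc_offset_hours * 3600
  out <- data.frame(
    time = format(local_time, "%Y%m%d%H%M", tz = "UTC"),
    stringsAsFactors = FALSE
  )

  # J m-2 accumulated over one hour -> mean W m-2 for that hour
  if (!is.null(ssrd_j_m2)) out$ssrd <- ssrd_j_m2 / 3600
  # Kelvin -> degrees Celsius
  if (!is.null(t2m_k)) out$t2m <- t2m_k - 273.15
  # metres of water -> millimetres
  if (!is.null(tp_m)) out$tp <- tp_m * 1000

  out
}

# One ERA5 time step at midnight UTC, shifted by an assumed offset of UTC-5
example <- era5_to_ameriflux_sketch(
  time_utc = as.POSIXct("2025-01-01 00:00", tz = "UTC"),
  utc_offset_hours = -5,
  ssrd_j_m2 = 3600000,
  t2m_k = 273.15,
  tp_m = 0.001
)
```

With an offset of UTC-5, midnight UTC on 1 January 2025 becomes 202412311900, which is the first timestamp in the output above.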

Step 3: Merging and Blending AmeriFlux with ERA5 Data

After you have processed the ERA5 data, you can merge it with the AmeriFlux data.

The merge_ERA5_Flux() function synchronizes AmeriFlux tower observations with ERA5 reanalysis data, ensuring both datasets share consistent timestamps and comparable variable names. It first reads AmeriFlux BASE data, replaces missing values with NA, and imports the processed ERA5 CSV file generated from netcdf_to_csv(). Because ERA5 data are typically hourly and AmeriFlux data are often half-hourly, the function linearly interpolates ERA5 variables to match the AmeriFlux time step. It then merges the datasets based on the aligned time column, adding corresponding AmeriFlux and ERA5 variables side by side. The resulting data frame provides a harmonized, site-specific time series that researchers can use to compare reanalysis and in situ flux measurements, fill short data gaps, and prepare blended climate–flux inputs for ecosystem or hydrological modeling.

# Point to AmeriFlux data
ameriflux_file <- system.file("extdata", "AMF_US-GL2_BASE-BADM_1-5.zip", package = "ERA5Flux")
# Point to ERA5 data
era5_file <- list.files(output_filepath, pattern = "US_GL2", full.names = TRUE)

# List AmeriFlux variable(s) to be merged 
ameriflux_var <- c("SW_IN")
# List ERA5 variable(s) to be merged 
era5_var <- c("ssrd")

# Merge them together
merged_data <- merge_ERA5_Flux(filename_FLUX = ameriflux_file,
                               filename_ERA5 = era5_file,
                               varname_FLUX = ameriflux_var,
                               varname_ERA5 = era5_var)

merged_data[1:20,]
#>                   time     ssrd SW_IN
#> 1  2024-12-31 19:00:00 663.4275    NA
#> 2  2024-12-31 19:30:00 331.7138    NA
#> 3  2024-12-31 20:00:00   0.0000    NA
#> 4  2024-12-31 20:30:00   0.0000    NA
#> 5  2024-12-31 21:00:00   0.0000    NA
#> 6  2024-12-31 21:30:00   0.0000    NA
#> 7  2024-12-31 22:00:00   0.0000    NA
#> 8  2024-12-31 22:30:00   0.0000    NA
#> 9  2024-12-31 23:00:00   0.0000    NA
#> 10 2024-12-31 23:30:00   0.0000    NA
#> 11 2025-01-01 00:00:00   0.0000 0.483
#> 12 2025-01-01 00:30:00   0.0000 0.518
#> 13 2025-01-01 01:00:00   0.0000 0.541
#> 14 2025-01-01 01:30:00   0.0000 0.476
#> 15 2025-01-01 02:00:00   0.0000 0.442
#> 16 2025-01-01 02:30:00   0.0000 0.438
#> 17 2025-01-01 03:00:00   0.0000 0.500
#> 18 2025-01-01 03:30:00   0.0000 0.449
#> 19 2025-01-01 04:00:00   0.0000 0.502
#> 20 2025-01-01 04:30:00   0.0000 0.478

As shown above, the merged data contains ERA5 data in ssrd, as well as AmeriFlux data in SW_IN.
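
The interpolation step explains why row 2 of the output holds 331.7138: it is the midpoint between the hourly ERA5 values 663.4275 and 0. The sketch below reproduces that behaviour with base R on toy data. It is an illustration of the idea, not the code inside merge_ERA5_Flux(); the SW_IN values are made up, and -9999 is used as the missing-value code.

```r
# Toy hourly ERA5 series (ssrd values taken from the first rows shown above).
# tz = "UTC" is only a neutral label so that the arithmetic is unambiguous.
era5 <- data.frame(
  time = as.POSIXct(c("2024-12-31 19:00", "2024-12-31 20:00",
                      "2024-12-31 21:00"), tz = "UTC"),
  ssrd = c(663.4275, 0, 0)
)

# Toy half-hourly AmeriFlux series with made-up SW_IN values
flux <- data.frame(
  time = seq(as.POSIXct("2024-12-31 19:00", tz = "UTC"),
             by = "30 min", length.out = 5),
  SW_IN = c(-9999, -9999, 0.5, -9999, 0.4)
)

# Replace the missing-value code with NA
flux$SW_IN[flux$SW_IN %in% -9999] <- NA

# Linearly interpolate the hourly ERA5 values onto the half-hourly timestamps
flux$ssrd <- approx(x = as.numeric(era5$time), y = era5$ssrd,
                    xout = as.numeric(flux$time))$y

# Same column order as the merged output above
merged_sketch <- flux[, c("time", "ssrd", "SW_IN")]
```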

Finally, you can blend the merged data.

The blend_ERA5_Flux() function combines AmeriFlux and ERA5 datasets into a single, gap-filled time series by applying flexible blending rules that determine how missing or incomplete AmeriFlux data should be replaced or adjusted using ERA5 values. Building on the merged output from merge_ERA5_Flux(), this function allows users to select among four blending approaches: simple replacement, linear regression with or without an intercept, or an automatic rule that adapts based on data completeness. For example, when more than half of the AmeriFlux values are available, the function uses a regression-based correction to preserve site-specific patterns while filling gaps; otherwise, it substitutes ERA5 data directly. The blended output adds new columns (e.g., SW_IN_f, TA_f) that contain the harmonized variables, producing a continuous, high-quality dataset suitable for modeling, data assimilation, or long-term flux-climate analysis.

The final result will look something like this:

# Specify the blending rule(s)
# If you have multiple variables, specify a rule for each variable
blending_rule <- c("replace")

# Blend them together
blended_data <- blend_ERA5_Flux(merged_data = merged_data,
                                varname_FLUX = ameriflux_var,
                                varname_ERA5 = era5_var,
                                blending_rule = blending_rule)
#> Processing: SW_IN using rule: replace

blended_data[1:20,]
#>                   time     ssrd SW_IN  SW_IN_f
#> 1  2024-12-31 19:00:00 663.4275    NA 663.4275
#> 2  2024-12-31 19:30:00 331.7138    NA 331.7138
#> 3  2024-12-31 20:00:00   0.0000    NA   0.0000
#> 4  2024-12-31 20:30:00   0.0000    NA   0.0000
#> 5  2024-12-31 21:00:00   0.0000    NA   0.0000
#> 6  2024-12-31 21:30:00   0.0000    NA   0.0000
#> 7  2024-12-31 22:00:00   0.0000    NA   0.0000
#> 8  2024-12-31 22:30:00   0.0000    NA   0.0000
#> 9  2024-12-31 23:00:00   0.0000    NA   0.0000
#> 10 2024-12-31 23:30:00   0.0000    NA   0.0000
#> 11 2025-01-01 00:00:00   0.0000 0.483   0.0000
#> 12 2025-01-01 00:30:00   0.0000 0.518   0.0000
#> 13 2025-01-01 01:00:00   0.0000 0.541   0.0000
#> 14 2025-01-01 01:30:00   0.0000 0.476   0.0000
#> 15 2025-01-01 02:00:00   0.0000 0.442   0.0000
#> 16 2025-01-01 02:30:00   0.0000 0.438   0.0000
#> 17 2025-01-01 03:00:00   0.0000 0.500   0.0000
#> 18 2025-01-01 03:30:00   0.0000 0.449   0.0000
#> 19 2025-01-01 04:00:00   0.0000 0.502   0.0000
#> 20 2025-01-01 04:30:00   0.0000 0.478   0.0000

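In this example, the simple replacement blending rule was used, so a new SW_IN_f column was created with values from the ERA5 column, ssrd. In the rows shown, SW_IN_f matches ssrd in every row, including those where SW_IN has an observation.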
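
To illustrate how the four blending approaches described above differ, here is a small sketch on plain numeric vectors. It is one plausible reading of the description, not the code inside blend_ERA5_Flux(). Only the rule name "replace" appears in this guide; the names "lm", "lm_no_intercept", and "auto" are hypothetical. The regression rules are assumed to fill gaps only, and "replace" copies the ERA5 values, as in the output above.

```r
# Hypothetical sketch (not the package's implementation) of four blending rules.
# flux: AmeriFlux values with NA for gaps; era5: ERA5 values on the same steps.
blend_sketch <- function(flux, era5,
                         rule = c("replace", "lm", "lm_no_intercept", "auto")) {
  rule <- match.arg(rule)

  # Automatic rule: regression when more than half of the flux values exist,
  # otherwise fall back to simple replacement
  if (rule == "auto") {
    rule <- if (mean(!is.na(flux)) > 0.5) "lm" else "replace"
  }

  # Simple replacement: use the ERA5 values as they are
  if (rule == "replace") {
    return(era5)
  }

  # Regression of flux on ERA5, with or without an intercept
  # (rows with NA are dropped by lm() by default)
  fit <- if (rule == "lm") lm(flux ~ era5) else lm(flux ~ 0 + era5)
  predicted <- as.numeric(predict(fit, newdata = data.frame(era5 = era5)))

  # Keep observed values and fill gaps with the regression prediction
  ifelse(is.na(flux), predicted, flux)
}

# Toy data: flux is ERA5 plus 10, with one gap
era5_toy <- c(0, 100, 200, 300, 400)
flux_toy <- c(10, NA, 210, 310, 410)

filled_lm      <- blend_sketch(flux_toy, era5_toy, rule = "lm")
filled_replace <- blend_sketch(flux_toy, era5_toy, rule = "replace")
filled_auto    <- blend_sketch(flux_toy, era5_toy, rule = "auto")
```

Here four of the five toy flux values are available, so the automatic rule takes the regression branch and gives the same result as "lm".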