Intro
This package was designed to make it easier to work with both ERA5 and AmeriFlux data, hence the name “ERA5Flux”. There may be instances where the AmeriFlux data contain gaps that you want to fill with ERA5 data to create a single time series for your analysis.
Here is a demonstration of the workflow needed to achieve that result.
Workflow
Step 0: Get AmeriFlux data and site metadata
First we will need to download AmeriFlux data. Navigate to AmeriFlux and log in to your account. Once you’re logged in, you can download data from the Download page.
You will be presented with options for the types of data products you would like to download. Select AmeriFlux BASE, then select CC-BY-4.0 as the data use policy that you will follow. At the next step, pick the sites that you would like, then write a short description of your intended use for these data and agree to the license.
Finally, you will be presented with a data download page that has download links for the requested sites. Click on those download links to get a zip file for each site. You will also need to click on the “Requested_Files” download link to get the metadata text file for your request. This will download as “requested_files_manifest_YYYYMMDD.txt”.
After you’ve downloaded everything, unzip the site zip files. To ensure that the rest of the workflow runs smoothly, we recommend that you arrange your files like so:
your-working-directory/
├── unzipped_AmeriFlux_data/
│   ├── unzipped_site1_folder/
│   │   ├── *.csv
│   │   └── *.xlsx
│   │
│   ├── unzipped_site2_folder/
│   │   ├── *.csv
│   │   └── *.xlsx
│   │
│   ├── unzipped_site3_folder/
│   │   ├── *.csv
│   │   └── *.xlsx
│   │
│   └── requested_files_manifest_YYYYMMDD.txt
│
├── site1_data.zip
├── site2_data.zip
├── site3_data.zip
│
└── more stuff/
    ├── ...
    └── ...
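If you have several site archives, the unzipping step can be scripted. The following is a minimal sketch using base R only; the helper name unzip_sites() and the folder names are illustrative assumptions and are not part of ERA5Flux.

```r
# Hypothetical helper (not part of ERA5Flux): unzip every .zip archive found
# in `zip_dir` into its own subfolder of `out_dir`.
unzip_sites <- function(zip_dir, out_dir) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  zips <- list.files(zip_dir, pattern = "\\.zip$", full.names = TRUE)
  for (z in zips) {
    # Name each subfolder after its archive, e.g. site1_data.zip -> site1_data/
    site_dir <- file.path(out_dir, tools::file_path_sans_ext(basename(z)))
    utils::unzip(z, exdir = site_dir)
  }
  # Return the folders that now exist directly under `out_dir`
  invisible(list.dirs(out_dir, recursive = FALSE))
}

# Example call, assuming the zip files sit in the working directory:
# unzip_sites(".", "unzipped_AmeriFlux_data")
```

The requested files manifest still needs to be moved into the same folder by hand (or with file.copy()) so that it sits next to the unzipped site folders.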
After arranging the files, you can now generate the AmeriFlux site
metadata. In this example, the folder called
unzipped_AmeriFlux_data contains the unzipped site folders
and requested files manifest.
# Load the package
library(ERA5Flux)
# Specify the ERA5 variables you want to get
my_variables <- c("surface_solar_radiation_downwards")
# Point to the folder containing the unzipped site folders and requested files manifest
my_folder <- system.file("extdata", "unzipped_AmeriFlux_data", package = "ERA5Flux")
# Generate the AmeriFlux site metadata file
my_site_metadata <- get_site_metadata(folder = my_folder,
                                      selected_variables = my_variables)
#> selected variables: surface_solar_radiation_downwards
#> Now checking: US-GL2
#> Rows: 7248 Columns: 1
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> dbl (1): TIMESTAMP_START
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
my_site_metadata
#> site_codes lat lon startdate enddate
#> 1 US-GL2 46.7167 -87.4000 2.02501e+11 202505312330
#> variables
#> 1 surface_solar_radiation_downwards
The metadata data frame will have six columns: site_codes, lat, lon, startdate, enddate, and variables. We will need this metadata to build our ERA5 data requests.
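In the printed output, startdate appears in scientific notation because it is stored as a number. The sketch below shows one way to view such values as full 12-digit timestamps and parse them into date-times. The data frame meta_example is a hypothetical stand-in: its enddate is copied from the output above, while the startdate value is an assumption made for illustration.

```r
# Illustrative stand-in for two of the metadata columns.
# startdate is an assumed value; enddate is copied from the printed output.
meta_example <- data.frame(
  startdate = 202501010000,
  enddate   = 202505312330
)

# Show the full 12-digit timestamps instead of scientific notation
start_chr <- format(meta_example$startdate, scientific = FALSE)
end_chr   <- format(meta_example$enddate, scientific = FALSE)

# Parse yyyyMMddHHmm strings into date-times
# (tz = "UTC" is used here only to get a fixed, reproducible display)
start_time <- as.POSIXct(start_chr, format = "%Y%m%d%H%M", tz = "UTC")
end_time   <- as.POSIXct(end_chr, format = "%Y%m%d%H%M", tz = "UTC")

start_chr
end_time
```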
Step 1: Download ERA5 Data
Once you have the AmeriFlux site metadata, you can download ERA5 data.
Before doing so, there are a few steps to complete.
First, make sure you create an account at the Copernicus Climate Data Store if you haven’t done so already. You may need to accept the data license agreement before you can download the data. Visit the Climate Data Store User Profile page to accept the appropriate license(s).
Next, download_ERA5() requires a land-sea mask, so you
must download that first with get_land_sea_mask(), which
fetches the mask file for you.
# Download the land-sea mask if you haven't done so already
get_land_sea_mask()
Now you are ready to download ERA5 data. Copy and paste your Climate
Data Store API token into the my_token argument. Provide
the AmeriFlux site metadata in site_metadata, and set
mask to the file path of your land-sea mask. Finally, set download_path
to the folder where you want the ERA5 data to be saved.
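If you would rather not paste the token into a script that might be shared, one option is to read it from an environment variable. This is a minimal sketch under stated assumptions: the helper get_token() and the variable name CDS_API_TOKEN are hypothetical and are not defined by ERA5Flux or the Climate Data Store.

```r
# Hypothetical helper (not part of ERA5Flux): read an API token from an
# environment variable and fail with a clear message if it is not set.
get_token <- function(var = "CDS_API_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop("Environment variable '", var, "' is not set.", call. = FALSE)
  }
  token
}

# Example: set the variable for the current session, then read it back.
# In practice the variable could be defined in your .Renviron file instead.
Sys.setenv(CDS_API_TOKEN = "my_ECMWF_token")
my_token_from_env <- get_token()
```

The resulting string can then be passed to the my_token argument in the same way as a pasted token.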
# Paste your token
my_token <- "my_ECMWF_token"
# Set the path to your land-sea mask
path_to_mask <- "lsm_1279l4_0.1x0.1.grb_v4_unpack.nc"
# Point to the folder where you want the ERA5 data to download to
my_download_path <- "path_to_ERA5_download_folder"
# Download the ERA5 data
download_ERA5(my_token = my_token,
              site_metadata = my_site_metadata,
              mask = path_to_mask,
              download_path = my_download_path)
Step 2: Data Processing
After downloading the ERA5 .nc files, we convert them into CSV files
formatted to match AmeriFlux standards, enabling easy merging with
existing datasets. This conversion is handled by
netcdf_df_formatter(), which adjusts ERA5 timestamps from
UTC to local time (yyyyMMddHHmm) based on each site’s coordinates and
converts solar radiation (ssrd), air temperature (t2m), and total
precipitation (tp) to AmeriFlux units. The netcdf_to_csv()
function then calls netcdf_df_formatter() for each site
folder, merging all relevant data by site and year so that each output
CSV contains one site’s data for one year. If
full_year = TRUE, the function will only return full years
that begin with the first hour of the year and end with the last. For
the purposes of this example, set full_year = FALSE.
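To make the timestamp and unit handling described above more concrete, here is a minimal sketch in base R. It is not the code inside netcdf_df_formatter(). The function names are hypothetical, the UTC offset is passed in by hand rather than derived from coordinates, and the unit conversions assume that ssrd is an hourly accumulation in J m-2, t2m is in K, and tp is in m.

```r
# Shift UTC times by a fixed offset (in hours) and format as yyyyMMddHHmm.
# The caller supplies the offset; deriving it from site coordinates is
# outside the scope of this sketch.
utc_to_local_stamp <- function(time_utc, utc_offset_hours) {
  shifted <- time_utc + utc_offset_hours * 3600
  format(shifted, format = "%Y%m%d%H%M", tz = "UTC")
}

# Unit conversions under the assumptions stated above
ssrd_to_w_m2 <- function(ssrd_j_m2) ssrd_j_m2 / 3600  # J m-2 per hour -> W m-2
t2m_to_deg_c <- function(t2m_k) t2m_k - 273.15        # K -> deg C
tp_to_mm     <- function(tp_m) tp_m * 1000            # m -> mm

# Example: midnight UTC on 1 January 2025 with an assumed offset of UTC-5
t0 <- as.POSIXct("2025-01-01 00:00:00", tz = "UTC")
utc_to_local_stamp(t0, utc_offset_hours = -5)
#> [1] "202412311900"
```

If your ERA5 product accumulates radiation or precipitation over a period other than one hour, the divisor for ssrd would need to change accordingly.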
# Point to a folder containing ERA5 .nc files
site_folder <- system.file("extdata", "path_to_ERA5_download_folder", package = "ERA5Flux")
# Specify a site name
site_name <- "US_GL2"
# Create a temporary directory to export our output to
output_filepath <- tempdir()
# Convert NetCDF data to a CSV file
netcdf_to_csv(site_folder, output_filepath, site_name, full_year = FALSE)
#> Saved: US_GL2_2024_2025_ssrd.csv
# Read the CSV back in
data <- read.csv(list.files(output_filepath, pattern = "US_GL2", full.names = TRUE))
head(data)
#> time ssrd
#> 1 202412311900 663.4275
#> 2 202412312000 0.0000
#> 3 202412312100 0.0000
#> 4 202412312200 0.0000
#> 5 202412312300 0.0000
#> 6 202501010000 0.0000
Step 3: Merging and Blending AmeriFlux with ERA5 Data
After you have processed the ERA5 data, you can merge it with the AmeriFlux data.
The merge_ERA5_Flux() function synchronizes AmeriFlux
tower observations with ERA5 reanalysis data, ensuring both datasets
share consistent timestamps and comparable variable names. It first
reads AmeriFlux BASE data, replaces missing values with NA,
and imports the processed ERA5 CSV file generated from
netcdf_to_csv(). Because ERA5 data are typically hourly and
AmeriFlux data are often half-hourly, the function linearly interpolates
ERA5 variables to match the AmeriFlux time step. It then merges the
datasets based on the aligned time column, adding corresponding
AmeriFlux and ERA5 variables side by side. The resulting data frame
provides a harmonized, site-specific time series that researchers can
use to compare reanalysis and in situ flux measurements, fill short data
gaps, and prepare blended climate–flux inputs for ecosystem or
hydrological modeling.
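The interpolation and merge steps described above can be sketched with base R's approx() and merge(). This is an illustration of the general technique, not the internals of merge_ERA5_Flux(). The object names are hypothetical and the small data frames are made-up stand-ins.

```r
# Hourly ERA5-like series (illustrative values)
hourly_time <- as.POSIXct(
  c("2024-12-31 19:00:00", "2024-12-31 20:00:00", "2024-12-31 21:00:00"),
  tz = "UTC"
)
hourly_ssrd <- c(663.4275, 0, 0)

# Target half-hourly time axis spanning the same period
half_hourly_time <- seq(hourly_time[1], hourly_time[3], by = "30 min")

# Linear interpolation from the hourly to the half-hourly time step
era5_half <- data.frame(
  time = half_hourly_time,
  ssrd = approx(x = as.numeric(hourly_time),
                y = hourly_ssrd,
                xout = as.numeric(half_hourly_time))$y
)

# Half-hourly tower-like series covering only part of the period
flux_half <- data.frame(
  time  = half_hourly_time[3:5],
  SW_IN = c(0.483, 0.518, 0.541)
)

# Keep every ERA5 time step; tower values are NA where no observation exists
merged_example <- merge(era5_half, flux_half, by = "time", all.x = TRUE)
merged_example
```

Using all.x = TRUE keeps time steps that have ERA5 values but no tower observation, which is what leaves NA entries in the tower column for later gap filling.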
# Point to AmeriFlux data
ameriflux_file <- system.file("extdata", "AMF_US-GL2_BASE-BADM_1-5.zip", package = "ERA5Flux")
# Point to ERA5 data
era5_file <- list.files(output_filepath, pattern = "US_GL2", full.names = TRUE)
# List AmeriFlux variable(s) to be merged
ameriflux_var <- c("SW_IN")
# List ERA5 variable(s) to be merged
era5_var <- c("ssrd")
# Merge them together
merged_data <- merge_ERA5_Flux(filename_FLUX = ameriflux_file,
                               filename_ERA5 = era5_file,
                               varname_FLUX = ameriflux_var,
                               varname_ERA5 = era5_var)
merged_data[1:20,]
#> time ssrd SW_IN
#> 1 2024-12-31 19:00:00 663.4275 NA
#> 2 2024-12-31 19:30:00 331.7138 NA
#> 3 2024-12-31 20:00:00 0.0000 NA
#> 4 2024-12-31 20:30:00 0.0000 NA
#> 5 2024-12-31 21:00:00 0.0000 NA
#> 6 2024-12-31 21:30:00 0.0000 NA
#> 7 2024-12-31 22:00:00 0.0000 NA
#> 8 2024-12-31 22:30:00 0.0000 NA
#> 9 2024-12-31 23:00:00 0.0000 NA
#> 10 2024-12-31 23:30:00 0.0000 NA
#> 11 2025-01-01 00:00:00 0.0000 0.483
#> 12 2025-01-01 00:30:00 0.0000 0.518
#> 13 2025-01-01 01:00:00 0.0000 0.541
#> 14 2025-01-01 01:30:00 0.0000 0.476
#> 15 2025-01-01 02:00:00 0.0000 0.442
#> 16 2025-01-01 02:30:00 0.0000 0.438
#> 17 2025-01-01 03:00:00 0.0000 0.500
#> 18 2025-01-01 03:30:00 0.0000 0.449
#> 19 2025-01-01 04:00:00 0.0000 0.502
#> 20 2025-01-01 04:30:00 0.0000 0.478
As shown above, the merged data contain ERA5 data in
ssrd, as well as AmeriFlux data in SW_IN.
Finally, you can blend the merged data.
The blend_ERA5_Flux() function combines AmeriFlux and
ERA5 datasets into a single, gap-filled time series by applying flexible
blending rules that determine how missing or incomplete AmeriFlux data
should be replaced or adjusted using ERA5 values. Building on the merged
output from merge_ERA5_Flux(), this function allows users
to select among four blending approaches: simple replacement, linear
regression with or without an intercept, or an automatic rule that
adapts based on data completeness. Under the automatic rule, when more than half of
the AmeriFlux values are available, the function uses a regression-based
correction to preserve site-specific patterns while filling gaps;
otherwise, it substitutes ERA5 data directly. The blended output adds
new columns (e.g., SW_IN_f, TA_f) that contain
the harmonized variables, producing a continuous, high-quality dataset
suitable for modeling, data assimilation, or long-term flux-climate
analysis.
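The four blending approaches can be illustrated with a short base R sketch. This is not the implementation of blend_ERA5_Flux(). The function blend_sketch() is hypothetical, the rule names other than "replace" are made up for this example, and keeping observed tower values while filling only the gaps in the regression case is an assumption about how such a rule might behave.

```r
# Hypothetical sketch of four blending rules (not the ERA5Flux implementation)
blend_sketch <- function(flux, era5,
                         rule = c("replace", "lm", "lm_no_intercept", "auto")) {
  rule <- match.arg(rule)
  if (rule == "auto") {
    # Use regression when more than half of the tower values are present
    rule <- if (mean(!is.na(flux)) > 0.5) "lm" else "replace"
  }
  if (rule == "replace") {
    # Simple replacement: the blended series is the ERA5 series
    return(era5)
  }
  d <- data.frame(flux = flux, era5 = era5)
  # lm() drops rows with missing tower values when fitting
  fit <- if (rule == "lm") {
    lm(flux ~ era5, data = d)
  } else {
    lm(flux ~ 0 + era5, data = d)
  }
  predicted <- as.numeric(predict(fit, newdata = d))
  # Assumption: keep observed values and fill only the gaps
  ifelse(is.na(flux), predicted, flux)
}

# Made-up example in which the tower values equal 2 * era5 + 1
era5_example <- c(10, 20, 30, 40, 50)
flux_example <- c(21, NA, 61, 81, NA)
blend_sketch(flux_example, era5_example, rule = "lm")
```

In this made-up example the regression recovers the linear relationship, so the two gaps are filled with values on the same line as the observations.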
The final result will look something like this:
# Specify the blending rule(s)
# If you have multiple variables, specify a rule for each variable
blending_rule <- c("replace")
# Blend them together
blended_data <- blend_ERA5_Flux(merged_data = merged_data,
                                varname_FLUX = ameriflux_var,
                                varname_ERA5 = era5_var,
                                blending_rule = blending_rule)
#> Processing: SW_IN using rule: replace
blended_data[1:20,]
#> time ssrd SW_IN SW_IN_f
#> 1 2024-12-31 19:00:00 663.4275 NA 663.4275
#> 2 2024-12-31 19:30:00 331.7138 NA 331.7138
#> 3 2024-12-31 20:00:00 0.0000 NA 0.0000
#> 4 2024-12-31 20:30:00 0.0000 NA 0.0000
#> 5 2024-12-31 21:00:00 0.0000 NA 0.0000
#> 6 2024-12-31 21:30:00 0.0000 NA 0.0000
#> 7 2024-12-31 22:00:00 0.0000 NA 0.0000
#> 8 2024-12-31 22:30:00 0.0000 NA 0.0000
#> 9 2024-12-31 23:00:00 0.0000 NA 0.0000
#> 10 2024-12-31 23:30:00 0.0000 NA 0.0000
#> 11 2025-01-01 00:00:00 0.0000 0.483 0.0000
#> 12 2025-01-01 00:30:00 0.0000 0.518 0.0000
#> 13 2025-01-01 01:00:00 0.0000 0.541 0.0000
#> 14 2025-01-01 01:30:00 0.0000 0.476 0.0000
#> 15 2025-01-01 02:00:00 0.0000 0.442 0.0000
#> 16 2025-01-01 02:30:00 0.0000 0.438 0.0000
#> 17 2025-01-01 03:00:00 0.0000 0.500 0.0000
#> 18 2025-01-01 03:30:00 0.0000 0.449 0.0000
#> 19 2025-01-01 04:00:00 0.0000 0.502 0.0000
#> 20 2025-01-01 04:30:00 0.0000 0.478 0.0000
In this example, the simple replacement blending rule was used, so a
new SW_IN_f column was created with values from the ERA5
column, ssrd.
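After blending, it can be useful to check how many gaps were filled. The sketch below does this on a small stand-in data frame whose values are copied from rows 9 to 12 of the output above; the object names are hypothetical, and the same expressions could be applied to blended_data.

```r
# Stand-in for a few rows of the blended output (values copied from above)
blended_example <- data.frame(
  ssrd    = c(0, 0, 0, 0),
  SW_IN   = c(NA, NA, 0.483, 0.518),
  SW_IN_f = c(0, 0, 0, 0)
)

# Missing values before and after blending
n_gaps_before <- sum(is.na(blended_example$SW_IN))
n_gaps_after  <- sum(is.na(blended_example$SW_IN_f))

# Time steps where a missing tower value now has a blended value
n_filled <- sum(is.na(blended_example$SW_IN) & !is.na(blended_example$SW_IN_f))

c(before = n_gaps_before, after = n_gaps_after, filled = n_filled)
```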