library(brandr)
library(fs)
library(geodata)
library(ggplot2)
library(groomr) # github.com/danielvartan/groomr
library(here)
library(httr2)
library(magrittr)
library(ncdf4)
library(orbis) # github.com/danielvartan/orbis
library(osfr)
library(readr)
library(stringr)
library(terra)
library(tidyterra)
library(zip)
A Reproducible Pipeline for Processing the Global Dataset of Historical Yields (1981-2016) by Iizumi et al.
Overview
This report contains a reproducible pipeline, written in R, for processing the Global Dataset of Historical Yields (Iizumi, 2019; Iizumi & Sakai, 2020).
Data Availability
The processed data are available in .tif (GeoTIFF) format via a dedicated repository on the Open Science Framework (OSF), accessible here. You can also access these files directly from R using the osfr package.
Methods
Source of Data
The data used in this analysis come from the following source:
- PANGAEA: A data publisher for earth and environmental sciences, which hosts the Global Dataset of Historical Yields (Iizumi, 2019; Iizumi & Sakai, 2020).
Data Munging
The data munging followed the data science workflow outlined by Wickham et al. (2023), as illustrated in Figure 1. All processing was carried out using the Quarto publishing system (Allaire et al., n.d.), the R programming language (R Core Team, n.d.), and several R packages.
Spatial data analysis was performed with the terra R package. For data manipulation and workflow, packages from the tidyverse and rOpenSci ecosystems, which adhere to the tidy tools manifesto (Wickham, 2023), were prioritized. All steps were designed to ensure transparency and reproducibility of results.
Figure 1: Data science workflow. Source: Reproduced from Wickham et al. (2023).
Code Style
The Tidyverse code style guide and design principles were followed to ensure consistency and enhance readability.
Reproducibility
The pipeline is fully reproducible and can be run again at any time. See the README file in the code repository to learn how to run it.
Set the Environment
Set the Initial Variables
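# Crops available in the dataset (one subdirectory each)
crop_options <- c(
  "maize", "maize_major", "maize_second", "rice", "rice_major", "rice_second",
  "soybean", "wheat", "wheat_spring", "wheat_winter"
)

# Crop to process in this run
crop <- "rice"
stopifnot(crop %in% crop_options)

# Input and output directories (relative to the project root)
raw_data_dir <- here("data-raw")
valid_data_dir <- here("data")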
# Create the data directories if they do not exist yet
dirs <- c(raw_data_dir, valid_data_dir)

for (i in dirs) {
  if (!dir_exists(i)) {
    dir_create(i, recurse = TRUE)
  }
}
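Because fs::dir_create() is vectorised and leaves existing directories untouched, the loop above could probably be replaced by a single call. The sketch below checks this on throwaway paths under tempdir(); the directory names are placeholders, not the pipeline's own.

```r
library(fs)

# Placeholder stand-ins for raw_data_dir and valid_data_dir
base_dir <- path(tempdir(), "pipeline-demo")
dirs <- path(base_dir, c("data-raw", "data"))

# One call creates every missing directory, including parents
dir_create(dirs)

# A second call does not fail: existing directories are left as they are
dir_create(dirs)

print(dir_exists(dirs))
```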
Download the Data
# Download the zipped dataset from PANGAEA into the raw data directory
raw_file <- here(raw_data_dir, "raw.zip")

paste0(
  "https://store.pangaea.de/Publications/",
  "IizumiT_2019/",
  "global-historical-yield_v1.2_v1.3_20190128.zip"
) |>
  request() |>
  req_progress() |>
  req_perform(path = raw_file)
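Each run of the pipeline downloads the full archive again. A small guard, sketched below on the assumption that a file already present at the target path is a complete earlier download, skips the request in that case. download_if_missing() is a hypothetical helper, not part of httr2; only request(), req_progress(), and req_perform() come from the package.

```r
library(httr2)

# Hypothetical helper: download `url` to `path` only if `path` does not exist
download_if_missing <- function(url, path) {
  if (file.exists(path)) {
    message("Skipping download; file already exists: ", path)
    return(invisible(path))
  }

  url |>
    request() |>
    req_progress() |>
    req_perform(path = path)

  invisible(path)
}
```

In the pipeline this would be called with the PANGAEA URL and raw_file. Since raw.zip is deleted right after extraction, the guard only pays off if that deletion is skipped or the check is pointed at the extracted crop folder instead.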
Unzip the Data
# Extract the archive (zip::unzip) and remove the zip file
raw_file |> zip::unzip(exdir = raw_data_dir, overwrite = TRUE)
file_delete(raw_file)
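Before extracting a large archive it can help to look at its layout, for instance to confirm the name of each crop folder. zip::zip_list() reads the archive's table of contents without extracting anything. The sketch builds a tiny placeholder archive so it runs offline; the folder and file names are invented.

```r
library(zip)

# Build a small placeholder archive (stand-in for raw.zip)
src_dir <- file.path(tempdir(), "zip-demo-src")
dir.create(file.path(src_dir, "rice"), recursive = TRUE, showWarnings = FALSE)
writeLines("placeholder", file.path(src_dir, "rice", "example_1981.nc4"))

archive <- tempfile(fileext = ".zip")
zip::zip(archive, files = "rice", root = src_dir)

# Inspect the contents without extracting anything
contents <- zip_list(archive)
print(contents$filename)

# Extract into a separate directory, overwriting files that already exist
out_dir <- file.path(tempdir(), "zip-demo-out")
zip::unzip(archive, exdir = out_dir, overwrite = TRUE)
```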
Read the Data
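# List the netCDF (.nc4) files for the selected crop
dir <- here(raw_data_dir, crop)
files <- dir |> dir_ls(type = "file", regexp = "\\.nc4$")

# Read all files into a single multi-layer SpatRaster
data <- files |> rast()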
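Given a character vector of file names, terra::rast() returns one SpatRaster holding the layers of all files, provided they share the same extent and resolution. The sketch writes three small GeoTIFFs as stand-ins for the yearly .nc4 files (to avoid needing a netCDF writer) and then selects and stacks them the same way; file names and values are placeholders.

```r
library(fs)
library(terra)

# Write three single-layer rasters as stand-ins for the yearly files
demo_dir <- path(tempdir(), "rast-demo")
dir_create(demo_dir)

for (year in 1981:1983) {
  r <- rast(nrows = 2, ncols = 4, xmin = 0, xmax = 360, ymin = -90, ymax = 90)
  values(r) <- rep(year, ncell(r))
  writeRaster(r, path(demo_dir, paste0("example_", year, ".tif")), overwrite = TRUE)
}

# Select files by extension, as the pipeline does with "\\.nc4$"
files <- dir_ls(demo_dir, type = "file", regexp = "\\.tif$")

# One SpatRaster with one layer per input file
stacked <- rast(files)
print(nlyr(stacked))
```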
Tidy the Data
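# Extract the year from each file name (not the full path, which may contain
# other four-digit runs)
years <- files |> path_file() |> str_extract("\\d{4}")

# Shift longitudes by 180 degrees (orbis helper), presumably moving the grid
# from 0 to 360 to -180 to 180
data <- data |> shift_and_rotate(dx = 180)

# Name each layer after its year
names(data) <- years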
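Two details of this step can be checked in isolation. First, str_extract() returns the first four-digit run in a string, so applying it to full paths can pick up digits from a parent directory; applying it to path_file() avoids that. Second, a 180-degree shift suggests the source grid spans 0 to 360 degrees of longitude. The sketch uses terra::rotate(), terra's own function for that conversion, as a stand-in; it is not necessarily what orbis::shift_and_rotate() does internally. Paths and values are placeholders.

```r
library(fs)
library(stringr)
library(terra)

# Placeholder paths: the user name contains four digits too
files <- c(
  "/home/user2024/data-raw/rice/example_1981.nc4",
  "/home/user2024/data-raw/rice/example_1982.nc4"
)

# Searching the full path would match "2024"; search the file name only
years <- files |> path_file() |> str_extract("\\d{4}")
print(years)

# Toy two-layer global grid with longitudes from 0 to 360 degrees
r <- rast(
  nrows = 2, ncols = 4, nlyrs = 2,
  xmin = 0, xmax = 360, ymin = -90, ymax = 90
)
values(r) <- 1:16

# rotate() moves the 180 to 360 half to the west, giving -180 to 180
r_rotated <- rotate(r)
names(r_rotated) <- years
print(ext(r_rotated))
```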
Data Dictionary
The data dictionary for the Global Dataset of Historical Yields is available here. For detailed information, see Iizumi (2019) and Iizumi & Sakai (2020).
Save the Data
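# Output path (file name assumed here; adjust as needed)
valid_file <- here(valid_data_dir, paste0(crop, ".tif"))

# Write all layers to a single GeoTIFF
data |> terra::writeRaster(valid_file, overwrite = TRUE)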
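Writing the raster and reading it back is a cheap way to confirm that the saved GeoTIFF holds every layer with the expected values. The sketch uses a toy raster and a temporary file rather than the pipeline's output; the layer names and values are placeholders.

```r
library(terra)

# Toy two-layer raster standing in for the processed yields (t/ha)
r <- rast(
  nrows = 2, ncols = 4, nlyrs = 2,
  xmin = -180, xmax = 180, ymin = -90, ymax = 90
)
values(r) <- (1:16) / 4
names(r) <- c("1981", "1982")

# Write all layers to one multi-band GeoTIFF
out_file <- tempfile(fileext = ".tif")
writeRaster(r, out_file, overwrite = TRUE)

# Read it back and compare layer count and cell values
r_back <- rast(out_file)
print(nlyr(r_back))
print(names(r_back))
```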
Visualize the Data
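# Country borders from geodata::world(), cached in the raw data directory
world_shape <- world(path = raw_data_dir)

# Upper limit of the color scale: highest yield across all layers,
# rounded up to a multiple of 5 so it coincides with a break
max_value <- global(data, "max", na.rm = TRUE)$max |> max(na.rm = TRUE)
max_value <- ceiling(max_value / 5) * 5

# One figure per group of layers (cut_vector() from groomr)
for (i in cut_vector(names(data), n = 9)) {
  i_data <- data |> select(all_of(i))

  i_plot <-
    ggplot() +
    geom_spatvector(data = world_shape) +
    geom_spatraster(data = i_data) +
    facet_wrap(~lyr, ncol = 2) +
    scale_fill_brand_b(
      direction = -1,
      limits = c(0, max_value),
      breaks = seq(0, max_value, 5) |> remove_caps()
    ) +
    labs(fill = "Yield (t/ha)")

  print(i_plot)
}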
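The plotting loop depends on two derived values: the layer groups for each figure and the upper limit of the color scale. Below is a base R sketch under stated assumptions: that cut_vector(x, n = 9) yields consecutive chunks of at most 9 elements, and that remove_caps() drops the first and last breaks. The layer names and maximum yield are placeholders; in the pipeline the maximum could come from terra::global(data, "max", na.rm = TRUE).

```r
# Placeholder layer names (one per year) and maximum yield in t/ha
layer_names <- as.character(1981:2016)
max_yield <- 13.2

# Consecutive chunks of at most 9 layers, one figure per chunk
groups <- split(layer_names, ceiling(seq_along(layer_names) / 9))

# Round the upper limit up to a multiple of 5 so it coincides with a break
max_value <- ceiling(max_yield / 5) * 5
breaks <- seq(0, max_value, by = 5)

# Drop the outermost breaks, which sit on the scale limits
inner_breaks <- breaks[-c(1, length(breaks))]

print(lengths(groups))
print(inner_breaks)
```

If cut_vector() instead splits its input into n groups, split(x, cut(seq_along(x), 9, labels = FALSE)) would be the closer equivalent.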
How to Cite
When using these data, you must also cite the original data sources.
To cite this work, please use the following format:
Vartanian, D., & Carvalho, A. M. (2025). A reproducible pipeline for processing the Global Dataset of Historical Yields (1981–2016) by Iizumi et al. [Computer software]. Sustentarea Research and Extension Group of the University of São Paulo. https://sustentarea.github.io/global-historical-yield
A BibTeX entry for LaTeX users is
@misc{vartanian2025,
title = {A reproducible pipeline for processing the Global Dataset of Historical Yields (1981–2016) by Iizumi et al.},
author = {{Daniel Vartanian} and {Aline Martins de Carvalho}},
year = {2025},
address = {São Paulo},
institution = {Sustentarea Research and Extension Group of the University of São Paulo},
langid = {en},
url = {https://sustentarea.github.io/global-historical-yield}
}
License
The original data sources may have their own license terms and conditions.
The code in this report is licensed under the GNU General Public License Version 3, while the report is available under the Creative Commons CC0 License.
Copyright (C) 2025 Daniel Vartanian
The code in this report is free software: you can redistribute it and/or
modify it under the terms of the GNU General Public License as published by the
Free Software Foundation, either version 3 of the License, or (at your option)
any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program. If not, see <https://www.gnu.org/licenses/>.
Acknowledgments
This work is part of the Sustentarea Research and Extension Group project: Global syndemic: The impact of anthropogenic climate change on the health and nutrition of children under five years old attended by Brazil’s public health system (SUS).
This work was supported by the Department of Science and Technology of the Secretariat of Science, Technology, and Innovation and of the Health Economic-Industrial Complex (SECTICS) of the Ministry of Health of Brazil, and the National Council for Scientific and Technological Development (CNPq) (grant no. 444588/2023-0).