Dataset-Element.Rmd
The following libraries are needed to create a working EML document.
This document explains how to add multiple datasets to one EML document and how to add non-tabular data to an EML document. It does not explain how to create a complete, valid EML document to upload to the EDI repository. Use this article only for the dataset section of the EML document, and refer to the EML template helper to generate the entire document.
In this example we include two datasets by creating a tibble in which each row describes one dataset. To follow along with this example, download the files for the multi-dataset-example.
The tibble has four required columns and up to two optional ones:

- `datatable`: the file paths to the data tables
- `datatable_name`: the names of the data tables
- `datatable_description`: a brief description of each data table
- `attribute_info`: the file path to the metadata describing each data table (should be a metadata.xlsx file)
- `dataset_methods` (optional): the file path to dataset-specific methods
- `additional_info` (optional): additional information pertinent to the entry, given as a string description

In this example there are no dataset-specific methods or additional information, so those columns are not included.

To add more than two datasets, follow the same structure and add one row per dataset.
```r
dataset_files <- dplyr::tibble(
  datatable = c("multi-dataset-example/enclosure-study-gut-contents-data.csv",
                "multi-dataset-example/enclosure-study-growth-rates-data.csv"),
  datatable_name = c("enclosure-study-gut-contents-data.csv",
                     "enclosure-study-growth-rates-data.csv"),
  attribute_info = c("multi-dataset-example/enclosure-study-gut-contents-metadata.xlsx",
                     "multi-dataset-example/enclosure-study-growth-rates-example-metadata.xlsx"),
  datatable_description = c("Hannon Salmonid Dataset Enclosure Study Gut Contents Data",
                            "Hannon Salmonid Dataset Enclosure Study Growth Rates Data")
)
```
Next, add each data table. A data table element includes the data table name (`entityName`), a description (`entityDescription`), a `physical` section, and an `attributeList`. The `physical` and attribute sections are lists that must be created first and then added to a named `dataTable` list.

The code below creates a `data_tables` list with one entry per data table. The helper functions `code_helper()` and `attributes_and_codes()` build the `attribute_list` with the `add_attribute()` function, and the `add_physical()` function creates the `physical` element.
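Before building the data tables, it can help to confirm that every path in `dataset_files` points to an existing file, since a wrong path otherwise surfaces later as a less obvious error from `readxl` or `add_physical()`. The sketch below assumes the column names used in the tibble above; `check_dataset_files()` is a hypothetical helper written in base R, not part of any package.

```r
# Hypothetical helper: stop early if any data file or metadata workbook is missing
check_dataset_files <- function(dataset_files) {
  paths <- c(as.character(dataset_files$datatable),
             as.character(dataset_files$attribute_info))
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("These files could not be found: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}
```

Calling `check_dataset_files(dataset_files)` from the folder that contains `multi-dataset-example/` should return silently when all four files are present.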
Make sure you know which type of attribute you are describing and which inputs it requires. Enter these values in the “attribute” tab of the “example-metadata.xlsx” Excel file. If an attribute is “nominal” or “ordinal” with an “enumerated” domain (its values are codes with specific definitions), also fill in the “code_definitions” tab: list each unique code and its definition, with an `attribute_name` that matches the one in the “attribute” tab. The example file already includes such an attribute to show how this works. Every column in the data table must have a described attribute in order to pass the EDI congruence checker.
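To show how the “code_definitions” tab is used, the sketch below mimics what `attributes_and_codes()` does for an enumerated attribute: it keeps the rows whose `attribute_name` matches, then turns each remaining row into a `code`/`definition` pair. It assumes the tab's columns are `attribute_name`, `code`, and `definitions` (matching the `code_helper()` arguments); the species and sex codes are made-up example values, and base R `Map()` stands in for `purrr::pmap()` so the sketch runs without extra packages.

```r
# Illustrative stand-in for the "code_definitions" tab (made-up values)
codes <- data.frame(
  attribute_name = c("species", "species", "sex"),
  code = c("CHN", "STH", "F"),
  definitions = c("Chinook salmon", "Steelhead", "Female"),
  stringsAsFactors = FALSE
)

# Build the codeDefinition list for one enumerated attribute
build_code_definitions <- function(codes, attribute_name) {
  # Keep only the codes that belong to this attribute
  current_codes <- codes[codes$attribute_name == attribute_name, ]
  definition <- list()
  # One list(code = ..., definition = ...) per row, as code_helper() returns;
  # unname() drops the names Map() takes from the code column
  definition$codeDefinition <- unname(Map(
    function(code, definitions) list(code = code, definition = definitions),
    current_codes$code, current_codes$definitions
  ))
  definition
}

species_definition <- build_code_definitions(codes, "species")
str(species_definition)
```

Only the two `species` rows end up in the result; the `sex` row belongs to a different attribute and is filtered out, which is why each code's `attribute_name` must match the “attribute” tab exactly.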
```r
# add_attribute() and add_physical() are not defined here; they must already
# be loaded (see the EML template helper) before running this code
adds_datatable <- function(datatable, datatable_name, attribute_info,
                           datatable_description, dataset_methods = NULL,
                           additional_info = NULL) {
  # Read the attribute descriptions and code definitions from the metadata workbook
  attribute_table <- readxl::read_xlsx(attribute_info, sheet = "attribute")
  codes <- readxl::read_xlsx(attribute_info, sheet = "code_definitions")
  attribute_list <- list()

  # Code helper: one row of the code_definitions sheet -> one codeDefinition
  code_helper <- function(code, definitions) {
    list(code = code, definition = definitions)
  }

  # Attribute helper for pmap(): called once per row of the "attribute" sheet
  attributes_and_codes <- function(attribute_name, attribute_definition, storage_type,
                                   measurement_scale, domain, type, units, unit_precision,
                                   number_type, date_time_format, date_time_precision,
                                   minimum, maximum, attribute_label) {
    if (domain %in% "enumerated") {
      # Enumerated attributes: attach every code defined for this attribute
      definition <- list()
      current_codes <- codes[codes$attribute_name == attribute_name, ]
      definition$codeDefinition <- purrr::pmap(dplyr::select(current_codes, -attribute_name),
                                               code_helper)
    } else {
      definition <- attribute_definition
    }
    add_attribute(attribute_name = attribute_name, attribute_definition = attribute_definition,
                  storage_type = storage_type, measurement_scale = measurement_scale,
                  domain = domain, definition = definition, type = type, units = units,
                  unit_precision = unit_precision, number_type = number_type,
                  date_time_format = date_time_format, date_time_precision = date_time_precision,
                  minimum = minimum, maximum = maximum, attribute_label = attribute_label)
  }

  attribute_list$attribute <- purrr::pmap(attribute_table, attributes_and_codes)
  physical <- add_physical(file_path = datatable)

  # dataset_methods and additional_info are accepted so that pmap() can pass
  # those optional columns, but this function does not add them to the output
  list(entityName = datatable_name,
       entityDescription = datatable_description,
       physical = physical,
       attributeList = attribute_list)
}

# One dataTable entry per row of dataset_files
data_tables <- purrr::pmap(dataset_files, adds_datatable)
```
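Because `purrr::pmap()` passes each tibble column to the function as a named argument, a column whose name does not match an argument of `adds_datatable()` (for example `additional_information` instead of `additional_info`) causes an “unused argument” error. The hypothetical base R helper below lists such columns so they can be renamed before `pmap()` is called.

```r
# Hypothetical helper: tibble columns that the function has no argument for
unmatched_columns <- function(dataset_files, fun) {
  setdiff(names(dataset_files), names(formals(fun)))
}
```

For the tibble above, `unmatched_columns(dataset_files, adds_datatable)` should return `character(0)`, since all four columns are arguments of `adds_datatable()`.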
The `data_tables` list can now be added to the dataset list as `dataTable = data_tables`.
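As a rough sketch of where `data_tables` ends up: `purrr::pmap()` returns an unnamed list with one element per row of `dataset_files`, and an unnamed list under `dataTable` is how the EML R package represents repeated `dataTable` elements. The values below are placeholders (the entity names come from the example tibble, while `physical` and `attributeList` are left out); a real dataset list also needs elements such as `title`, `creator`, and `contact`, which the EML template helper covers.

```r
# Placeholder data tables standing in for the output of purrr::pmap();
# physical and attributeList are omitted to keep the sketch short
data_tables <- list(
  list(entityName = "enclosure-study-gut-contents-data.csv",
       entityDescription = "Hannon Salmonid Dataset Enclosure Study Gut Contents Data"),
  list(entityName = "enclosure-study-growth-rates-data.csv",
       entityDescription = "Hannon Salmonid Dataset Enclosure Study Growth Rates Data")
)

# Only the part of the dataset list discussed here; other required
# elements (creator, contact, ...) are omitted
dataset <- list(title = "Example title (placeholder)",
                dataTable = data_tables)

# Each data table is one unnamed element under dataTable
vapply(dataset$dataTable, function(x) x$entityName, character(1))
```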
If you have vector data, add the data elements to a section of the dataset named `spatialVector`. In addition to the components required for tabular data, you must include a geometry for your dataset. Enter it in the “dataset” tab of the “example-metadata.xlsx” Excel file.

To learn how to create the other dataset elements, `physical` and `attribute_list`, see the EML template helper.
```r
data_tables <- list(spatialVector = list(entityName = dataset_file,
                                         entityDescription = metadata$dataset$name,
                                         physical = physical,
                                         attributeList = attribute_list,
                                         geometry = metadata$dataset$geometry))
```
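The `spatialVector` element of this list belongs directly in the dataset list, alongside `dataTable` rather than inside it; for example, add it as `spatialVector = data_tables$spatialVector`.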
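The `geometry` value must be one of the geometry types that the EML schema allows for `spatialVector`. The hypothetical helper below checks the value read from the spreadsheet before it goes into the list; the allowed values are copied from the EML `GeometryType` enumeration, so it is worth confirming them against the EML version you are targeting.

```r
# Geometry types enumerated for spatialVector in the EML schema
eml_geometry_types <- c("Point", "LineString", "LinearRing", "Polygon",
                        "MultiPoint", "MultiLineString", "MultiPolygon",
                        "MultiGeometry")

# Hypothetical helper: fail early on a misspelled or unsupported geometry
check_geometry <- function(geometry) {
  if (length(geometry) != 1 || !geometry %in% eml_geometry_types) {
    stop("'", paste(geometry, collapse = ", "),
         "' is not a valid EML geometry; use one of: ",
         paste(eml_geometry_types, collapse = ", "))
  }
  geometry
}
```

In the list above this could be used as `geometry = check_geometry(metadata$dataset$geometry)`. The check is case-sensitive, so `"polygon"` is rejected.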
If you have raster data, add the data elements to a section of the dataset named `spatialRaster`. In addition to the components required for tabular data, you must include:

- `spatial_reference`
- `horizontal_accuracy`
- `vertical_accuracy`
- `cell_size_x`
- `cell_size_y`
- `number_of_bands`
- `raster_origin`
- `rows`
- `columns`
- `verticals`
- `cell_geometry`

All of these required fields must be passed as inputs to the `add_raster()` function.
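Because `add_raster()` needs every field listed above, it can help to gather them in a named list and confirm nothing is missing before calling it. The helper and the inputs below are hypothetical: the field names follow the list above, but check `?add_raster` for the argument names it actually expects, and the values are placeholders only.

```r
# Required raster fields, as listed above
required_raster_fields <- c("spatial_reference", "horizontal_accuracy",
                            "vertical_accuracy", "cell_size_x", "cell_size_y",
                            "number_of_bands", "raster_origin", "rows",
                            "columns", "verticals", "cell_geometry")

# Hypothetical helper: required fields that are absent or NULL
missing_raster_fields <- function(raster_inputs) {
  present <- names(Filter(Negate(is.null), raster_inputs))
  setdiff(required_raster_fields, present)
}

# Placeholder inputs with the two accuracy fields left out
raster_inputs <- list(spatial_reference = "NAD_1983_UTM_Zone_10N",
                      cell_size_x = 30, cell_size_y = 30,
                      number_of_bands = 1, raster_origin = "Upper Left",
                      rows = 100, columns = 100, verticals = 1,
                      cell_geometry = "pixel")
missing_raster_fields(raster_inputs)
```

Here the call reports `horizontal_accuracy` and `vertical_accuracy`, in the order they appear in the required list.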
```r
# Supply the required fields listed above as arguments to add_raster()
data_tables <- list(spatialRaster = list(add_raster()))
```
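The `spatialRaster` element of this list belongs directly in the dataset list, alongside `dataTable` rather than inside it; for example, add it as `spatialRaster = data_tables$spatialRaster`.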