The following libraries are needed to create a working EML document.

library(EMLaide)
library(tidyverse)
library(readxl)
library(EML)

Handle Special Case Datasets

This document explains how to add multiple datasets to one EML document and how to add non-tabular data to an EML document. It does not explain how to create a complete, valid EML document to upload to the EDI repository. Please use this article only for the dataset section of the EML document and refer to the EML template helper to generate the entire document.

Multiple datasets for one data package

In this example we included two datasets by creating a tibble where each row gives information on one dataset. To follow along with this example please download the files for the multi-dataset-example.

The tibble has 4-6 columns:

A datatable column which contains the file paths to the datatables
A datatable_name column which contains the names of the datatables
A datatable_description column which contains a brief description of data
An attribute_info column which contains the file path to the metadata describing the dataset (should be a metadata.xlsx)
(Optional) A dataset_methods column which contains the file path to dataset-specific methods
(Optional) An additional_info column which contains a string with any additional information pertinent to the dataset. The column name must match the additional_info argument of the adds_datatable function defined below.

In this example there are no dataset-specific methods or additional information, so those columns are not included.

To add more than two datasets, follow the same structure and add one additional row for each dataset.

dataset_files <- dplyr::tibble(datatable = c("multi-dataset-example/enclosure-study-gut-contents-data.csv",
                                            "multi-dataset-example/enclosure-study-growth-rates-data.csv"),
                               datatable_name = c("enclosure-study-gut-contents-data.csv", 
                                                  "enclosure-study-growth-rates-data.csv"),
                               attribute_info = c("multi-dataset-example/enclosure-study-gut-contents-metadata.xlsx",
                                                  "multi-dataset-example/enclosure-study-growth-rates-example-metadata.xlsx"),
                               datatable_description = c("Hannon Salmonid Dataset Enclosure Study Gut Contents Data",
                                                         "Hannon Salmonid Dataset Enclosure Study Growth Rates Data"))

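As a minimal sketch of adding a third dataset, the tibble can be extended with one more row. The third dataset's file names and description here are hypothetical placeholders, not files shipped with the multi-dataset-example.

```r
library(dplyr)

# Two-row tibble in the same shape as dataset_files above
dataset_files <- tibble(
  datatable = c("multi-dataset-example/enclosure-study-gut-contents-data.csv",
                "multi-dataset-example/enclosure-study-growth-rates-data.csv"),
  datatable_name = c("enclosure-study-gut-contents-data.csv",
                     "enclosure-study-growth-rates-data.csv"),
  attribute_info = c("multi-dataset-example/enclosure-study-gut-contents-metadata.xlsx",
                     "multi-dataset-example/enclosure-study-growth-rates-example-metadata.xlsx"),
  datatable_description = c("Hannon Salmonid Dataset Enclosure Study Gut Contents Data",
                            "Hannon Salmonid Dataset Enclosure Study Growth Rates Data")
)

# Hypothetical third dataset: every value below is a placeholder
dataset_files <- add_row(
  dataset_files,
  datatable = "multi-dataset-example/third-dataset-data.csv",
  datatable_name = "third-dataset-data.csv",
  attribute_info = "multi-dataset-example/third-dataset-metadata.xlsx",
  datatable_description = "Placeholder description of the third dataset"
)

# One row per dataset
nrow(dataset_files)
```

Because each row is later passed to a function by column name, the new row needs to use the same column names as the existing ones.
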
Data Tables

We need to add each data table. The data table element includes the datatable name, a physical section, and the attribute_list. These sections are all lists which must be created first then added to a named dataTable list.

The code below creates a data_tables list with one entry per data table. Inside adds_datatable, the code_helper and attributes_and_codes helper functions build the attribute_list using the add_attribute function, and the physical element is created with the add_physical function.

Please review what type of attribute you are providing and which inputs it requires. Enter these values in the “attribute” tab of the “example-metadata.xlsx” Excel file. If an attribute is “nominal” or “ordinal” and its domain is “enumerated” (it has a specific set of coded values), please also fill in the “code_definitions” tab: provide each unique code and its definition, with an attribute_name that matches the one in the “attribute” tab. The file already contains an example to illustrate this. Every column in the dataTable must have a described attribute to pass the EDI congruence checker.
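
To make the “code_definitions” tab more concrete, here is a small sketch of how its rows can be turned into code and definition pairs for one enumerated attribute. The table contents are invented for illustration. The column names code and definitions are assumed from the arguments of the code_helper function used in this article.

```r
library(purrr)
library(tibble)

# Hypothetical stand-in for the "code_definitions" tab
codes <- tibble(
  attribute_name = c("species", "species", "site"),
  code           = c("CHN", "STH", "A"),
  definitions    = c("Chinook salmon", "Steelhead", "Upstream enclosure")
)

# Builds one code/definition pair
code_helper <- function(code, definitions) {
  list(code = code, definition = definitions)
}

# Keep only the rows for one attribute and drop the attribute_name column,
# so the remaining column names match the arguments of code_helper
current_codes <- codes[codes$attribute_name == "species", c("code", "definitions")]

# pmap calls code_helper once per row, matching columns to arguments by name
species_codes <- pmap(current_codes, code_helper)
str(species_codes)
```

Each enumerated attribute ends up with one list entry per code, which is why the attribute_name values in the two tabs have to agree.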

adds_datatable <- function(datatable, datatable_name, attribute_info, datatable_description, dataset_methods = NULL, additional_info = NULL){

  attribute_table <- readxl::read_xlsx(attribute_info, sheet = "attribute")
  codes <- readxl::read_xlsx(attribute_info, sheet = "code_definitions")
  attribute_list <- list()
  attribute_names <- unique(codes$attribute_name)
  
  # Code helper function: builds one code/definition pair
  code_helper <- function(code, definitions) {
    list(code = code, definition = definitions)
  }
  # Attribute helper function to input into pmap
  attributes_and_codes <- function(attribute_name, attribute_definition, storage_type, 
                                   measurement_scale, domain, type, units, unit_precision, 
                                   number_type, date_time_format, date_time_precision, minimum, maximum, 
                                   attribute_label){
    if (domain %in% "enumerated") { 
      definition <- list()
      current_codes <- codes[codes$attribute_name == attribute_name, ]
      definition$codeDefinition <- purrr::pmap(current_codes %>% select(-attribute_name), code_helper) 
    } else {
      definition <- attribute_definition
    }
    new_attribute <- add_attribute(attribute_name = attribute_name, attribute_definition = attribute_definition,
                                   storage_type = storage_type, measurement_scale = measurement_scale, 
                                   domain = domain, definition = definition, type = type, units = units, 
                                   unit_precision = unit_precision, number_type = number_type, 
                                   date_time_format = date_time_format, date_time_precision = date_time_precision, 
                                   minimum = minimum, maximum = maximum, attribute_label = attribute_label)
  }
  attribute_list$attribute <- purrr::pmap(attribute_table, attributes_and_codes)
  
  physical <- add_physical(file_path = datatable)
  dataTable <- list(entityName = datatable_name,
                    entityDescription = datatable_description,
                    physical = physical,
                    attributeList = attribute_list)
}
data_tables <- purrr::pmap(dataset_files, adds_datatable)
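
Since every column needs a described attribute, it can help to compare the column names of a data file with the attribute names before uploading. The sketch below uses a toy CSV file and a toy attribute table with hypothetical column names. In practice the attribute table would come from the “attribute” tab via readxl::read_xlsx().

```r
# Toy data file written to a temporary location (hypothetical columns)
data_path <- tempfile(fileext = ".csv")
write.csv(data.frame(fish_id = 1:3,
                     length_mm = c(41, 38, 45),
                     weight_g = c(0.9, 0.7, 1.1)),
          data_path, row.names = FALSE)

# Stand-in for the "attribute" tab
attribute_table <- data.frame(attribute_name = c("fish_id", "length_mm"))

# Reading a single row is enough to get the column names
data_columns <- names(read.csv(data_path, nrows = 1))

# Columns in the data with no described attribute
undescribed_columns <- setdiff(data_columns, attribute_table$attribute_name)

# Attributes that do not correspond to any column
unused_attributes <- setdiff(attribute_table$attribute_name, data_columns)

undescribed_columns
unused_attributes
```

Both results should be empty before the data table is treated as ready. In this toy case weight_g is reported as undescribed.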

This data_tables list can now be added to the dataset list as dataTable = data_tables.
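
As a rough sketch of that last step, the list can be placed in a dataset list under the name dataTable. The data_tables object here is a simplified placeholder for the output of purrr::pmap(dataset_files, adds_datatable), and the dataset list is deliberately incomplete. The remaining dataset elements are covered in the EML template helper.

```r
# Simplified placeholder for the list returned by
# purrr::pmap(dataset_files, adds_datatable)
data_tables <- list(
  list(entityName = "enclosure-study-gut-contents-data.csv"),
  list(entityName = "enclosure-study-growth-rates-data.csv")
)

# Hypothetical, incomplete dataset list: the title is a placeholder
# and the other dataset elements are omitted
dataset <- list(
  title = "Placeholder title",
  dataTable = data_tables
)

# Each entry of dataTable corresponds to one row of dataset_files
vapply(dataset$dataTable, function(x) x$entityName, character(1))
```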

Different types of datasets

Vector Data

If you have vector data, please append the data elements in a section of the dataset named spatialVector. In addition to the components that are required for tabular data, you must include a geometry for your dataset. This should be added in the “dataset” tab of the “example-metadata.xlsx” Excel file.

To learn how to create the other dataset elements, physical and attribute_list, please see the EML template helper.

data_tables <- list(spatialVector = list(entityName = dataset_file,
                                         entityDescription = metadata$dataset$name,
                                         physical = physical,
                                         attributeList = attribute_list,
                                         geometry = metadata$dataset$geometry))

The spatialVector element of this data_tables list can now be added to the dataset list.

Raster Data

If you have raster data please append the data elements in a spatialRaster named section of the dataset. In addition to the components that are required for tabular data you must include the:

spatial_reference
horizontal_accuracy
vertical_accuracy
cell_size_x
cell_size_y
number_of_bands
raster_origin
rows
columns
verticals
cell_geometry

All of these required fields must be added as inputs to the add_raster() function.

data_tables <- list(spatialRaster = list(add_raster()))
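
Because the raster element needs many inputs, one option is to collect them in a named list and check that none is missing before building the element. The sketch below uses the field names from the list above. The values are hypothetical placeholders and the list is deliberately incomplete to show what the check reports.

```r
# Field names taken from the list of required raster inputs
required_fields <- c("spatial_reference", "horizontal_accuracy", "vertical_accuracy",
                     "cell_size_x", "cell_size_y", "number_of_bands", "raster_origin",
                     "rows", "columns", "verticals", "cell_geometry")

# Hypothetical, deliberately incomplete inputs (placeholder values)
raster_inputs <- list(cell_size_x = 30, cell_size_y = 30,
                      number_of_bands = 1, rows = 100, columns = 200)

# Report any required field that has not been supplied yet
missing_fields <- setdiff(required_fields, names(raster_inputs))
if (length(missing_fields) > 0) {
  message("Still missing: ", paste(missing_fields, collapse = ", "))
}

# Once nothing is missing, the inputs could be passed on with
# do.call(add_raster, raster_inputs), assuming add_raster() accepts
# these fields as named arguments.
```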

The spatialRaster element of this data_tables list can now be added to the dataset list.