template-doc.Rmd
This document provides guidance for completing each element within the metadata template directory. Within the top level of the directory you will find the following assets for recording your project's metadata:
The metadata template directory also contains a subdirectory with completed examples.
If you have any questions regarding metadata, please contact the acting CVPIA data managers at ecain@flowwest.com or erodriguez@flowwest.com.
In “method.docx”, describe the methods followed in creating the dataset, including field, laboratory, and processing steps; sampling methods and units; and quality control procedures. What procedures were actually used to create or subsequently process the dataset? Also describe any processes used to define or improve the quality of a data file, or to identify potential problems with it.
Please be specific: include instrument descriptions or point to a protocol online. If this is a data compilation, please specify the datasets used, preferably with their DOI or URL plus general citation information.
If the project includes more than one dataset, make a section in the document for each dataset and the method used.
In “abstract.docx” give a brief overview of the data resource, dataset, and/or project. The abstract will be used for full-text searches, and it should be rich with descriptive text. In particular, descriptions should include information that does not fit into structured metadata, and focus on the “what”, “when”, and “where” information, general taxonomic information, as well as whether the dataset is ongoing or completed. Some general methods description is appropriate, and broad classes of measured parameters should also be included. For a large number of parameters, use categories instead of listing all parameters (e.g. use the term “nutrients” instead of nitrate, phosphate, calcium, etc.), in combination with the parameters that seem most relevant for searches.
The abstract can be at the project level or you can provide an abstract for each dataset associated with the project.
All other information is entered in project-metadata.xlsx (for project-level information, such as funding and personnel) and datatable-metadata.xlsx (for each data table). The sections below are split into those falling under Project Metadata and those falling under Data Table Metadata, and each section corresponds to a sheet of the xlsx document. Gray columns are optional and blue columns are conditionally required; every other column must be filled out. Some columns contain dropdown menus of acceptable values. In these columns you must enter one of the predefined options listed in the dropdown menu.
If multiple data tables are associated with your project, fill out a different datatable-metadata.xlsx for each data table and a single project-metadata.xlsx. If methodology or abstract differ between the data tables, you can add subsections with headers labeled for each data table to the abstract and methods documents.
In the dataset sheet you will provide information on the name and type of dataset. If the type of your dataset is not tabular or vector, contact the acting CVPIA data managers and we will help you prepare the metadata. Columns that are white are required and columns that are blue are conditionally required.
For a dataset that contains vector data, please fill out the metadata, abstract, and methods docs following the above instructions. Make sure to indicate the type as vector in the dataset tab and to include a geometry in the geometry tab. The EML schema accepts eight geometry values: Point, LineString, LinearRing, Polygon, MultiPoint, MultiLineString, MultiPolygon, MultiGeometry.
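As a quick pre-check before filling in the geometry tab, the eight accepted values can be held in a set and compared against what you plan to enter. This is only a sketch: the function name `is_valid_geometry` is hypothetical, and the case-insensitive comparison is an assumption made for convenience (the template dropdown may require the exact spelling).

```python
# The eight geometry values listed above, stored lowercased for comparison.
EML_GEOMETRIES = {
    name.lower()
    for name in (
        "Point", "LineString", "LinearRing", "Polygon",
        "MultiPoint", "MultiLineString", "MultiPolygon", "MultiGeometry",
    )
}


def is_valid_geometry(value: str) -> bool:
    """Return True if value matches one of the accepted geometry names.

    Assumption: the comparison ignores case and surrounding whitespace.
    """
    return value.strip().lower() in EML_GEOMETRIES


print(is_valid_geometry("Polygon"))
print(is_valid_geometry("Circle"))
```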
For a dataset that contains raster data please contact the acting CVPIA data managers at ecain@flowwest.com, erodriguez@flowwest.com, or sgill@flowwest.com. They will provide you with more information on how to format the metadata for raster data.
In the personnel sheet you will provide information on the creator and the associated parties to the dataset. The creator is any person or organization responsible for the creation of the data; the creator will also be the contact for this dataset. You must have one person with the role “creator”. Columns that are gray are optional but every other column must be filled out.
The title should be fairly descriptive and between 7 and 20 words long. The short name must contain fewer words than the title and is your opportunity to give viewers a more accessible name for the dataset.
The keyword set should include a list of keywords and the name of the controlled vocabulary registry (keyword thesaurus) they belong to. Keywords help users identify your dataset, and using controlled vocabulary sets helps keep keywords consistent across many different research efforts. If you choose not to use one of the controlled vocabularies linked to below, leave keywordThesaurus blank.
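The two word-count rules for the title and short name can be checked with a few lines of code before you enter them. The sketch below assumes words are separated by whitespace; the function name `check_title` and the example title are made up for illustration.

```python
def check_title(title: str, short_name: str) -> list:
    """Return a list of problems found; an empty list means both rules pass."""
    problems = []
    title_words = len(title.split())
    short_words = len(short_name.split())
    # Rule 1: the title should be between 7 and 20 words long.
    if not 7 <= title_words <= 20:
        problems.append(f"title has {title_words} words; expected 7 to 20")
    # Rule 2: the short name must contain fewer words than the title.
    if short_words >= title_words:
        problems.append("short name must have fewer words than the title")
    return problems


# Hypothetical example values, not taken from a real dataset.
title = "Juvenile Chinook salmon rotary screw trap catch on Example Creek 2010 to 2020"
short_name = "Example Creek screw trap catch"
print(check_title(title, short_name))
```

An empty list means the title and short name satisfy both rules.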
In order to promote consistency, please search the following resources for keywords:
This sheet records the intellectual rights information associated with the dataset. For projects funded under CVPIA authority, we have preselected two licenses, CC0 and CC BY. You can select either of these by entering “CC0” or “CCBY” in the default_license column and leaving the rest of the columns blank.
CC0 - The most permissive license, appropriate for data in the public domain.
CC BY - Attribution required
If neither of these two licenses fits the intellectual rights associated with the dataset, you must provide information for all of the conditionally required columns:
Example:
Use this sheet if you wish to name specific project personnel. If this sheet is left blank, we will automatically assign the previously defined personnel with the role of creator as the project personnel. To name different project personnel, please fill out all sections of this sheet. Columns that are blue are conditionally required and columns that are gray are optional.
This sheet records the funding information of the dataset. Use it to describe the funding awarded to the project. We have provided a set of CVPIA default funders: "USBR", "CDFW", "CDWR", and "USFWS". If you enter one of these in the funder_name column, you must still provide an award_title but you may leave the remaining columns blank.
If the default funding options are not applicable to your project you must fill out all required and conditionally required columns. Columns that are blue are conditionally required and columns that are gray are optional but every other column must be filled out.
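The conditional rule for the funding sheet can be sketched as a small check on one row. Only `funder_name` and `award_title` are column names taken from the text above; the function name, the `other_required` argument, and the column name `hypothetical_column` are placeholders, so substitute the actual required and conditionally required columns from the template.

```python
DEFAULT_FUNDERS = {"USBR", "CDFW", "CDWR", "USFWS"}


def missing_funding_fields(row: dict, other_required: list) -> list:
    """Return the names of columns that still need a value for this funding row."""
    def blank(col):
        return not str(row.get(col) or "").strip()

    # funder_name and award_title are always needed.
    required = ["funder_name", "award_title"]
    # For a non-default funder, the remaining columns are needed too.
    if row.get("funder_name") not in DEFAULT_FUNDERS:
        required = required + list(other_required)
    return [col for col in required if blank(col)]


# "hypothetical_column" stands in for the template's other columns.
others = ["hypothetical_column"]
print(missing_funding_fields({"funder_name": "USBR", "award_title": "Example award"}, others))
print(missing_funding_fields({"funder_name": "Other Funder", "award_title": "Example award"}, others))
```

The first row passes with only two columns filled in because it uses a default funder; the second row reports the placeholder column as missing.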
Example:
This sheet records the maintenance information of the dataset. Use it to describe the status of the data collection as well as the frequency at which you plan to update the data. All columns are required.
Example (ongoing maintenance):
This sheet records the geographic and temporal coverage associated with the dataset. All columns are required. Geographic and temporal coverage allows users to quickly search for your dataset based on location and time.
Example:
This sheet records the taxonomic coverage for the dataset and is optional. We have provided a set of CVPIA common taxa: "chinook", "delta_smelt", "white_sturgeon", "green_sturgeon", and "steelhead". If you select one of these in the CVPIA_common_species column, you can leave the remaining columns blank. If you are adding a new taxon, you must fill out all of the other columns. When adding a new taxon, please use https://www.itis.gov/ for full taxonomic coverage. Contact the CVPIA data managers if you wish to add a set of taxa to the CVPIA common taxa list. All blue columns are conditionally required and only need to be filled out if you do not select a CVPIA_common_species.
Example:
For each column in your dataset, you need to describe in detail the type of information encoded in that column. In EML, columns are “attributes”. The type of information needed for an attribute varies based on its measurement scale. This document explains the different types of measurement scales, how to select the appropriate one given the data within a column, and which columns within the ‘attribute’ tab of the metadata Excel workbook are required given the measurement scale.
The instructions included below detail how to complete all the columns. Columns that are gray are optional and columns that are blue are conditionally required depending on the type of measurement scale:
The ‘attributes’ tab goes furthest into the details of the metadata standard. We can provide technical assistance with completing this tab.
There are five types of measurement scales to choose from. Based on the selected measurement scale, additional columns must be filled out to produce a valid EML document. Below, each measurement scale is defined and guidance on the additional required columns is provided.
Used to define categorical scale attributes. Nominal is used when numbers have only been assigned to a variable for the purpose of categorizing the variable. An example of a nominal scale is assigning the number 1 for male and 2 for female.
Used to define ordered scale attributes. Ordinal is used when the categories have a logical or ordered relationship to each other. These types of scale allow one to distinguish the order of values, but not the magnitude of the difference between values. An example of an ordinal scale is a categorical survey where you rank a variable 1=good, 2=fair, 3=poor.
Used to define interval scale attributes. Intervals define data that consist of equidistant points on a scale, for example temperature data, mark grading, or IQ scale. Intervals can be negative while ratios cannot.
Used to define ratio scale attributes. Ratios define data that not only consist of equidistant points but also have a meaningful zero point, which allows the ratio to have meaning. Examples include measured heights, flow rates, weight, and length.
Used to define date and time attributes. DateTime is used when the values fall on the Gregorian calendar system. DateTime values are special because they have properties of interval values (most of the time it is legitimate to treat them as interval values by converting them to a duration from a fixed point) but they sometimes only behave as ordinals (because the calendar is not predetermined, for some dateTime values one can only find out the order of the points and not the magnitude of the duration between those points). The most encompassing format is: YYYY-MM-DDThh:mm:ss.
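To illustrate the format and the interval-like behaviour described above, the sketch below parses two made-up timestamps written as YYYY-MM-DDThh:mm:ss and converts the difference between them to a duration. The `strptime` pattern is the Python equivalent of that format; the function name `parse_timestamp` and the timestamp values are hypothetical.

```python
from datetime import datetime

# strptime pattern equivalent to YYYY-MM-DDThh:mm:ss
FORMAT = "%Y-%m-%dT%H:%M:%S"


def parse_timestamp(text: str) -> datetime:
    """Parse a timestamp in the YYYY-MM-DDThh:mm:ss format.

    Raises ValueError if the text does not match the format.
    """
    return datetime.strptime(text, FORMAT)


# Made-up example values.
start = parse_timestamp("2020-03-01T08:30:00")
end = parse_timestamp("2020-03-02T09:00:00")

# Treating dateTime values as interval values: the duration between two points.
elapsed = end - start
print(elapsed.total_seconds())
```

A value in another layout, such as "03/01/2020", raises a `ValueError`, which makes this a simple way to spot inconsistently formatted dates in a column before documenting its format.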
The code definitions sheet is where you will define the “enumerated” variable types specified in the attributes sheet. Here you list every code used by an enumerated variable together with its definition.
All columns of this section are blue because they are conditionally required and only need to be filled out if there are attributes that are “enumerated”.
Example:
code | definition | attribute_name |
---|---|---|
1 | Clear weather | weather |
2 | Cloudy weather | weather |
3 | Rain | weather |
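One way to confirm that the code definitions sheet is complete is to compare the codes that actually occur in a data column against the codes defined for that attribute. The sketch below uses the example rows above; the function name `undefined_codes` and the observed values are made up for illustration, and codes are compared as strings, which is an assumption about how they are stored.

```python
# Rows from the example code definitions table above.
code_definitions = [
    {"code": "1", "definition": "Clear weather", "attribute_name": "weather"},
    {"code": "2", "definition": "Cloudy weather", "attribute_name": "weather"},
    {"code": "3", "definition": "Rain", "attribute_name": "weather"},
]


def undefined_codes(values, attribute_name, definitions):
    """Return the sorted codes found in values that lack a definition."""
    defined = {
        d["code"] for d in definitions if d["attribute_name"] == attribute_name
    }
    # Compare as strings so that 1 and "1" are treated as the same code.
    return sorted({str(v) for v in values} - defined)


# Made-up values from a hypothetical "weather" column.
observed = [1, 2, 2, 3, 4]
print(undefined_codes(observed, "weather", code_definitions))  # ['4']
```

Any code returned by the check needs its own row in the code definitions sheet.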