Documenting your data as fully as possible will maximise its value and provide rich context, both for yourself and for future researchers. Be sure to capture all of the relevant facts about your data, and check that they are accurate. This will act as a reminder of what you did. If you publish your data or contribute it to a data repository or archive, good documentation will improve the findability and reusability of your research.
Each research project and dataset is different, so there is no single standard for documenting your data; approaches vary from discipline to discipline.
The descriptive elements that document your data may be referred to as metadata.
Secure storage of documentation and metadata is as important as the storage of the research data, as the metadata provides a descriptive context to raw research data. Researchers are encouraged to use the same guidelines to store all documentation and metadata as those used for research data storage.
What is metadata?
Metadata is the detailed description that accompanies research data. It includes the location of the data, so that the data can be retrieved and reused efficiently throughout its life span. It is succinctly expressed as "data about data".
Metadata is critical for effective research data management and should include file management standards that are used by all researchers working on a project.
Why is metadata important?
Metadata:
What metadata elements should you include?
You may consider the following elements for describing and documenting the data you are collecting:
Identifier | Unique alpha-numeric identifier used to identify the data (such as DOI or Handle) |
Date | Any key dates associated with the data, including project start and end dates |
Title | Name of the project or dataset |
Version | Information on the relevant version(s) of the dataset |
Creator(s) | Names, contact details and identifiers (such as ORCID) for all organisations and/or persons who collected and created the data |
Source | Citations for any data obtained or derived from other sources, including the creator, the year, the title of the dataset, identifier and access information |
Location | Relevant geographic information, including cities, regions, states, countries or coordinates |
Keywords | Keywords or phrases describing the data; these could also include relevant Field of Research codes |
Methodology | Information on how the data was created, including specific software or equipment (with model or version numbers), formulae, algorithms or methodologies |
Processing | Information on how the data has been transformed, altered or processed |
Technical details | All relevant technical information including a list of all the files that make up the dataset with extensions and relevant file formats and structures, an explanation of any codes or abbreviations used in the file names, a list of all variables in the data files, as well as the names and version numbers of all software packages required to use, view, or analyse the data |
Rights | Any known intellectual property rights, statutory rights, licences, or restrictions on use of the data |
Access | How and where the data can be accessed |
Decide whether each of these elements is relevant to your research data or useful to future researchers.
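As a rough illustration, the elements in the table above could be captured in a small machine-readable record stored alongside the data. The sketch below is only one possible layout, not a metadata standard: the field names mirror the table, while the `DatasetMetadata` class, the `missing_elements` helper and all sample values are hypothetical placeholders.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class DatasetMetadata:
    """Hypothetical record whose fields mirror the elements in the table above."""
    identifier: str = ""
    title: str = ""
    version: str = ""
    dates: dict = field(default_factory=dict)
    creators: list = field(default_factory=list)
    source: list = field(default_factory=list)
    location: str = ""
    keywords: list = field(default_factory=list)
    methodology: str = ""
    processing: str = ""
    technical_details: dict = field(default_factory=dict)
    rights: str = ""
    access: str = ""


def missing_elements(record):
    """Return the names of elements that are still empty."""
    return [name for name, value in asdict(record).items() if not value]


# Placeholder values for illustration only.
record = DatasetMetadata(
    title="Example survey dataset",
    version="1.0",
    dates={"project_start": "2023-01-01", "project_end": "2023-12-31"},
    keywords=["example", "survey"],
)

# Serialise the record so it can be saved next to the data files.
metadata_json = json.dumps(asdict(record), indent=2)
print(metadata_json)
print("Still to document:", missing_elements(record))
```

Listing the empty elements is one way to prompt the decision described above: each remaining element can either be filled in or deliberately left out if it is not relevant to the dataset.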
Are there metadata standards?
There are metadata standards, including some for specific disciplines:
General | Arts | Humanities | Social Sciences | Science |
Astronomy Visualization Metadata (AVM) |
Identifiers are persistent, unique alpha-numeric labels given to datasets. Unique identifiers are critical for the documentation, identification, citation and retrieval of research data and should be included in the metadata.
Identifiers create a connection between the research data and related resources. These connections provide important provenance information for the data and the related research.
Some appropriate identifiers include:
When data is published, researchers often provide a readme.txt file with the dataset to help ensure data can be correctly interpreted and reanalysed by others, and to ensure effective usability of the data.
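A readme file can be drafted partly by script. The following is a minimal sketch, assuming the dataset lives in a single directory: the `build_readme` function, its parameters and the layout of the generated text are hypothetical, and the description of the data still has to be written by the researcher.

```python
from pathlib import Path


def build_readme(data_dir, title, description):
    """Return readme text with a title, a description and a file listing."""
    data_dir = Path(data_dir)
    lines = [title, "=" * len(title), "", description, "", "Files in this dataset:"]
    # List every file under the dataset directory, in a stable order.
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        if path.name == "readme.txt":
            continue  # do not list the readme inside itself
        size = path.stat().st_size
        lines.append(f"- {path.relative_to(data_dir).as_posix()} ({size} bytes)")
    return "\n".join(lines) + "\n"
```

The returned text could then be saved as readme.txt in the dataset directory and extended by hand with the methodology, variable definitions and access conditions.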
Codebooks are documents that interpret any abbreviations, codes or variables used when entering data into a dataset. They describe which values you should expect within a field and what those values correspond to.
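A codebook can also be kept in a machine-readable form so that data values can be checked against it. The sketch below is illustrative only: the `CODEBOOK` structure, the variable names, the codes and both helper functions are invented for the example.

```python
# Hypothetical codebook: variable name -> description and allowed codes.
CODEBOOK = {
    "sex": {
        "description": "Self-reported sex of participant",
        "values": {1: "Female", 2: "Male", 9: "Not stated"},
    },
    "consent": {
        "description": "Consent to data reuse",
        "values": {0: "No", 1: "Yes"},
    },
}


def decode(variable, code):
    """Translate a stored code into its documented meaning."""
    return CODEBOOK[variable]["values"][code]


def undocumented_values(rows):
    """List (row index, variable, value) for values the codebook does not define."""
    problems = []
    for index, row in enumerate(rows):
        for variable, value in row.items():
            entry = CODEBOOK.get(variable)
            if entry is None or value not in entry["values"]:
                problems.append((index, variable, value))
    return problems


rows = [
    {"sex": 1, "consent": 1},
    {"sex": 3, "consent": 0},  # 3 is not defined in the codebook
]

print(decode("sex", 1))
print(undocumented_values(rows))
```

Checking the data against the codebook in this way flags values that a future user would have no means of interpreting.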
Readme files and codebooks also help to describe: