
Research Data Management - Research Guide

Documentation

Documenting and Describing Data

Documenting your data as fully as possible maximises its value and gives it a rich context, both for yourself and for future researchers. Record all relevant facts about your data accurately; this will also act as a reminder of what you did. If you publish your data or contribute it to a data repository or archive, good documentation will improve the findability and reusability of your research.

Each research project and dataset is different, so there is no single standard for documenting your data - approaches vary from discipline to discipline.

The descriptive elements that document your data may be referred to as metadata.

Secure storage of documentation and metadata is as important as the storage of the research data, as the metadata provides a descriptive context to raw research data. Researchers are encouraged to use the same guidelines to store all documentation and metadata as those used for research data storage.

Metadata

What is metadata?

Metadata is the detailed description that accompanies research data, including the data's location, so that the data can be retrieved and reused efficiently throughout its life span. It is often summarised as "data about data".

Metadata is critical for effective research data management and should include file management standards that are used by all researchers working on a project.

Why is metadata important?

Metadata:

  • enables effective organisation of research data
  • improves findability
  • facilitates research data sharing
  • provides digital identifiers for the research data
  • supports archiving and preservation.

What metadata elements should you include?

You may consider the following elements for describing and documenting the data you are collecting:

  • Identifier: Unique alpha-numeric identifier used to identify the data (such as a DOI or Handle)
  • Date: Any key dates associated with the data, including project start and end dates
  • Title: Name of the project or dataset
  • Version: Information on the relevant version(s) of the dataset
  • Creator(s): Names, contact details and identifiers (such as ORCID) for all organisations and/or persons who collected and created the data
  • Source: Citations for any data obtained or derived from other sources, including the creator, the year, the title of the dataset, identifier and access information
  • Location: Relevant geographic information, including cities, regions, states, countries or coordinates
  • Keywords: Keywords or phrases describing the data; these could also include relevant Field of Research codes
  • Methodology: Information on how the data was created, including specific software or equipment (with model or version numbers), formulae, algorithms or methodologies
  • Processing: Information on how the data has been transformed, altered or processed
  • Technical details: All relevant technical information, including a list of all the files that make up the dataset (with extensions, file formats and structures), an explanation of any codes or abbreviations used in the file names, a list of all variables in the data files, and the names and version numbers of all software packages required to use, view or analyse the data
  • Rights: Any known intellectual property rights, statutory rights, licenses, or restrictions on use of the data
  • Access: How and where the data can be accessed

 

Decide whether each of these elements is relevant to your research data or useful to future researchers.
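As a rough illustration of how some of the elements above might be recorded alongside a dataset, the following Python sketch stores them in a dictionary, checks for missing elements and serialises the record to JSON. The field names, the list of required elements and all values (including the DOI and ORCID) are hypothetical placeholders, not part of any standard.

```python
import json

# Hypothetical example values; none of these describe a real dataset.
record = {
    "identifier": "10.1234/example-dataset",  # placeholder DOI
    "title": "Example stream temperature survey",
    "version": "1.0",
    "dates": {"project_start": "2023-01-01", "project_end": "2023-12-31"},
    "creators": [{"name": "A. Researcher", "orcid": "0000-0000-0000-0000"}],
    "keywords": ["stream temperature", "hydrology"],
    "rights": "CC BY 4.0",
    "access": "Available from the hosting repository",
}

# Assumption: the elements a project has decided it must always record.
REQUIRED = ["identifier", "title", "version", "creators", "rights", "access"]


def missing_elements(rec, required=REQUIRED):
    """Return the required elements that are absent or empty in rec."""
    return [key for key in required if not rec.get(key)]


def to_json(rec):
    """Serialise the record as indented JSON for storage next to the data."""
    return json.dumps(rec, indent=2, sort_keys=True)


print(missing_elements(record))
print(to_json(record))
```

Which elements count as required is a project decision; the list here is only one possible choice.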

Are there metadata standards?

There are metadata standards, both general-purpose and discipline-specific (arts, humanities, social sciences and science):

  • Dublin Core
  • Metadata Encoding and Transmission Standard (METS)
  • Metadata Object Description Schema (MODS)
  • Categories for the Description of Works of Art (CDWA)
  • Visual Resources Association (VRA Core)
  • Functional Requirements for Bibliographic Records (FRBR)
  • Text Encoding Initiative (TEI)
  • ANZLIC
  • Content Standard for Digital Geospatial Metadata (CSDGM)
  • Data Documentation Initiative (DDI)
  • Astronomy Visualization Metadata (AVM)
  • CSMD-CCLRC Core Scientific Metadata Model
  • Darwin Core
  • Ecological Metadata Language (EML)
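To give a sense of what a record in one of these standards can look like, here is a minimal Python sketch that writes a few Dublin Core elements (title, creator, date, identifier, rights) as XML. The namespace is the Dublin Core elements namespace; the wrapping `metadata` element, the function name and all values are assumptions made for illustration, and a real repository may expect a different container or profile.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_xml(fields):
    """Build a simple XML record with one dc: element per field."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder values only.
xml_text = dublin_core_xml({
    "title": "Example dataset",
    "creator": "A. Researcher",
    "date": "2023-12-31",
    "identifier": "10.1234/example-dataset",
    "rights": "CC BY 4.0",
})
print(xml_text)
```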

Identifiers for Data

Identifiers are persistent, unique alpha-numeric labels given to datasets. Unique identifiers are critical for the documentation, identification, citation and retrieval of research data and should be included in the metadata.

Identifiers create a connection between the research data and related resources. These connections provide important provenance information about the data and the related research.

Some appropriate identifiers include:

  • Digital Object Identifiers (DOIs) uniquely identify datasets and provide a persistent link to the location of the data on the internet. Contributing your dataset to a data repository or archive will allow you to generate a DOI for your data
  • The Handle system creates persistent identifiers when a DOI is not appropriate or cannot be created. Use the identifier tree to determine which identifier is right for your needs. The Handle service is available without charge for Australian organisations and individuals creating, using or curating publicly available research data
  • Research Activity Identifier (RAiD) is an identifier for research projects and activities. A RAiD has two parts: the RAiD handle and the RAiD Data Management Record (DMR). The handle is a unique string of numbers that identifies the research project. The DMR is attached to the handle to store identifiers and metadata for datasets associated with the project or activity
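When recording a DOI in your metadata, it can help to store it in one consistent form and derive the persistent link from it. The sketch below is a simple illustration in Python: the regular expression covers the common shape of DOIs (`10.`, a numeric registrant code, `/`, a suffix) and is a heuristic rather than a full validator, and the function names and sample DOI are hypothetical.

```python
import re

# Heuristic pattern for common DOIs; it is an assumption, not a complete rule.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def normalise_doi(text):
    """Strip common prefixes and return the bare DOI, or None if it looks invalid."""
    candidate = text.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if candidate.lower().startswith(prefix):
            candidate = candidate[len(prefix):]
            break
    return candidate if DOI_PATTERN.match(candidate) else None


def doi_url(doi):
    """Return the resolver link for a bare DOI."""
    return f"https://doi.org/{doi}"


doi = normalise_doi("doi:10.1234/example-dataset")  # placeholder DOI
print(doi, doi_url(doi))
```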

Readme files and Codebooks

When data is published, researchers often provide a readme.txt file with the dataset so that the data can be correctly interpreted, reanalysed and reused by others.

Codebooks are documents that interpret any abbreviations, codes or variables used when entering data into a dataset. They describe which values you should expect within a field and what those values correspond to.
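A codebook can also be kept in a machine-readable form so that data values can be decoded and checked against it. The following Python sketch shows one possible layout; the variables, codes, labels and function names are invented for illustration.

```python
# Hypothetical codebook: each variable maps its codes to their meanings.
CODEBOOK = {
    "sex": {
        "description": "Self-reported sex of participant",
        "values": {"1": "Female", "2": "Male", "9": "Not stated"},
    },
    "smoker": {
        "description": "Current smoking status",
        "values": {"0": "No", "1": "Yes", "-9": "Missing"},
    },
}


def decode(variable, code, codebook=CODEBOOK):
    """Translate a stored code into its documented meaning."""
    return codebook[variable]["values"].get(str(code), "UNDOCUMENTED CODE")


def undocumented_codes(variable, observed, codebook=CODEBOOK):
    """List observed values that the codebook does not describe."""
    allowed = set(codebook[variable]["values"])
    return sorted({str(v) for v in observed} - allowed)


print(decode("sex", 2))
print(undocumented_codes("sex", [1, 2, 3, 9, 7]))
```

Checking a data file for undocumented codes before publishing is one way to find gaps in the codebook early.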

Readme files and codebooks also help to describe:

  • The research methodology employed to create the data
  • Any data processing steps, particularly if not described in the publication, that may affect interpretations of results
  • For tabular data: definitions of column headings and row labels, data codes (including missing data/null values), and measurement units
  • What associated datasets are stored elsewhere, if applicable
  • Details of software or tools used and notes on any specific equipment setup needed to interpret, process, or replicate the data
  • Whom to contact with questions
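The points above can be turned into a simple readme template. The Python sketch below assembles a plain-text readme from named sections; the section headings follow the list above, while the function name and all section contents are placeholders to be replaced with details of your own project.

```python
def build_readme(sections):
    """Join headings and bodies into plain text, one block per section."""
    lines = []
    for heading, body in sections.items():
        lines.append(heading.upper())
        lines.append(body.strip())
        lines.append("")  # blank line between sections
    return "\n".join(lines).rstrip() + "\n"


# Placeholder content only.
sections = {
    "Methodology": "How the data was collected or generated.",
    "Processing": "Cleaning and transformation steps applied to the raw data.",
    "Column definitions": "site_id: site code; temp_c: temperature in degrees Celsius; -99 = missing.",
    "Related datasets": "Where associated datasets are stored, if applicable.",
    "Software": "Tools and versions needed to open or reanalyse the files.",
    "Contact": "Name and email address of the person to contact.",
}

readme_text = build_readme(sections)
print(readme_text)
```

The resulting text could then be saved next to the data files, for example with `pathlib.Path("readme.txt").write_text(readme_text, encoding="utf-8")`.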