
Research Data Management - Research Guide

Sharing and Reuse

Open Data

Open data is data that anyone can freely use, re-use and redistribute, subject at most to two requirements: giving appropriate attribution, and releasing any redistributed copies under the same or a similar licence as the original dataset. Open data must also be publicly available on a public server, without password or firewall restrictions. Finally, the data must be technically open: datasets must be published in machine-readable, non-proprietary electronic formats, so that anyone can access and use them with common, freely available software tools.

To determine whether data is open, check the publisher's website for its terms and conditions of data use. The Open Knowledge Foundation lists licences that conform with the principles of open data sharing. If the licensing conditions are unclear, contact the publisher.

Open data repositories are included in the list of repositories on the Data Repositories and Archives page.

Data-focused Journals

To make research data easier to find and share, academic publishers now publish a range of data-focused journals, including:

Title                                            Publisher                            Access
Big Earth Data                                   Taylor & Francis                     Open access
Biodiversity Data Journal                        Pensoft Publishers                   Open access
Chemical Data Collections                        Elsevier                             Subscription
Data in Brief                                    Elsevier                             Open access
Earth System Science Data                        Copernicus Publications              Open access
Geoscience Data Journal                          Wiley                                Open access
Journal of Chemical & Engineering Data           American Chemical Society (ACS)      Subscription
Journal of Open Psychology Data                  Ubiquity Press                       Open access
Journal of Physical and Chemical Reference Data  American Institute of Physics (AIP)  Mixed model
Scientific Data                                  Nature                               Open access

Reusing Externally Sourced Datasets

What are the benefits of using externally sourced data?

Benefits of building your own research on existing data collected by other researchers include:

  • the availability of background information
  • less duplication of data collection
  • the potential savings in time and costs
  • the potential for collaboration opportunities
  • access to data whose validity and reliability may already have been established

What should be considered before using externally sourced data?

Considering these questions will help you determine if the data is suitable for you to reuse:

  • Is there enough description of the content of the data?
  • Is the context of the research relevant?
  • Is the source reliable?
  • How long will the data be stored and made available?
  • Are there any restrictions or specifications for data re-use?
  • What will be the impact of these restrictions or specifications on your research?
  • What is the relationship between the externally sourced data and the data you are collecting?
  • How will the data be integrated?
  • How will any differences in format be managed?

Data Citation

What is data citation?

Data citation refers to the practice of providing a reference to data in the same way that researchers routinely provide a reference to journal articles or other publications. When you reuse other researchers' data, you must give proper attribution to the original creator of the dataset.

Why is data citation important?

Citing data is important because it:

  • Acknowledges data as a research output and facilitates reproducible and transparent research
  • Acknowledges and provides credit to the original creator of the data
  • Allows others to verify or replicate results based on the data, supporting assessment of their reliability and validity
  • Allows published datasets to be listed as citable outputs in a researcher's resume
  • Can increase the citation rate of publications associated with the data
  • Enables the collection of citation statistics to measure the impact of the data (data citation metrics)

How should data be cited?

Published research data should be cited in the same way as other scholarly outputs. Citation styles and formats for data vary, just as referencing styles and formats for articles do. Regardless of referencing style or publisher and repository guidelines, the important elements in a data citation include:

  • Who produced the dataset (creator or author)
  • The title of the dataset
  • The year the dataset was published and its version number, if it has one
  • The source of the dataset (repository or journal)
  • The unique identifier of the dataset, preferably a Digital Object Identifier (DOI) or a persistent URL

Standard citations for datasets often list these elements in the following order:

Creator(s). Publication Year. Title. Version. Source. Identifier.
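The element order above can be sketched as a small Python helper that assembles a citation from its parts. This is a minimal sketch under stated assumptions, not a referencing style: the DatasetRecord field names, the "; " separator between creators, and the rule of writing a bare DOI as a https://doi.org/ link are choices made for illustration, and the example dataset and DOI are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Citation elements for one dataset (field names are illustrative, not a standard)."""
    creators: List[str]            # who produced the dataset
    year: int                      # year the dataset was published
    title: str                     # title of the dataset
    source: str                    # repository or journal
    identifier: str                # DOI or persistent URL
    version: Optional[str] = None  # only if the dataset has a version number


def doi_to_url(identifier: str) -> str:
    """Assumption: write a bare DOI (starting "10.") as a https://doi.org/ link."""
    if identifier.startswith("10."):
        return "https://doi.org/" + identifier
    return identifier


def format_data_citation(rec: DatasetRecord) -> str:
    """Order: Creator(s). Publication Year. Title. Version. Source. Identifier."""
    if not rec.creators:
        raise ValueError("a dataset citation needs at least one creator")
    parts = ["; ".join(rec.creators), str(rec.year), rec.title]
    if rec.version:  # skip the version element when the dataset has none
        parts.append(rec.version)
    parts.extend([rec.source, doi_to_url(rec.identifier)])
    # Drop any existing trailing full stop so "K." does not become "K.."
    return " ".join(p.rstrip(".") + "." for p in parts)


# Hypothetical dataset and DOI, for illustration only
example = DatasetRecord(
    creators=["Smith, J.", "Lee, K."],
    year=2021,
    title="Hypothetical river temperature measurements",
    source="Example Data Repository",
    identifier="10.5072/example",
    version="Version 2",
)
print(format_data_citation(example))
# Smith, J.; Lee, K. 2021. Hypothetical river temperature measurements. Version 2. Example Data Repository. https://doi.org/10.5072/example.
```

The helper only joins strings in the order given, so the punctuation and creator formatting it produces may not match the style you are required to use.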

When citing data, follow the guidelines provided by your referencing style, your editor or publisher, or the data source (the dataset creator or repository). Referencing software such as EndNote does provide a template for datasets, but you may need to edit the generated references to meet these other requirements.