Help and Support: Research Data Management : Sharing and Reuse

Open Data

Open data is data that anyone can freely use, re-use and redistribute, subject at most to requirements to attribute the source and to release any derived data under the same or a similar licence as the original dataset. Open data must also be publicly available on a public server, without password or firewall restrictions. The data must be technically open, which means the datasets must be published in electronic formats that are machine readable and non-proprietary, so that anyone can access and use the data with common, freely available software tools.

To determine whether data is open, check the publisher's website for the terms and conditions of data usage. The Open Knowledge Foundation lists licences that conform to the principles of open data sharing. If the licensing conditions are unclear, contact the publisher.

Open data repositories are included in the list of repositories on the Data Repositories and Archives page.

Data-focused Journals

To facilitate the findability and sharing of research data, academic publishers are now publishing a range of data-focused journals including:

Title	Publisher	Access
Big Earth Data	Taylor & Francis	Open access
Biodiversity Data Journal	Pensoft Publishers	Open access
Chemical Data Collections	Elsevier	Subscription
Data in Brief	Elsevier	Open access
Earth System Science Data	Copernicus Publications	Open access
Geoscience Data Journal	Wiley	Open access
Journal of Chemical & Engineering Data	American Chemical Society (ACS)	Subscription
Journal of Open Psychology Data	Ubiquity Press	Open access
Journal of Physical and Chemical Reference Data	American Institute of Physics (AIP)	Mixed model
Scientific Data	Nature	Open access

Reusing Externally Sourced Datasets

What are the benefits of using externally sourced data?

Some benefits of starting your own research from the existing data collected by other researchers include:

the availability of background information
the reduction of duplicated data collection
the potential savings in time and costs
the potential for collaboration opportunities
the established validity and reliability of the data

What should be considered before using externally sourced data?

Considering these questions will help you determine if the data is suitable for you to reuse:

Is there enough description of the content of the data?
Is the context of the research relevant?
Is the source reliable?
How long will the data be stored and made available?
Are there any restrictions or specifications for data re-use?
What will be the impact of these restrictions or specifications on your research?
What is the relationship between the externally sourced data and the data you are collecting?
How will the data be integrated?
How will any differences in format be managed?

Data Citation

What is data citation?

Data citation refers to the practice of providing a reference to data in the same way that researchers routinely provide a reference to journal articles or other publications. When reusing the data of others, proper attribution must be given to the work of the original creator of the dataset.

Why is data citation important?

Citing data is important because it:

Acknowledges data as a research output and facilitates reproducible and transparent research
Acknowledges and provides credit to the original creator of the data
Allows replication or verification of the data, improving its reliability and validity
Allows for citations for published data to be included in a researcher's resume
Increases the citation rate of related publications in which the data is cited
Enables the collection of citation statistics to measure the impact of the data (data citation metrics)

How should data be cited?

Published research data should be cited in the same way as other scholarly outputs. Styles and formats for data citations vary in the same way that article referencing styles and formats vary. Important elements in citing data, regardless of referencing style, publisher or repository guidelines, include:

Who produced the dataset (creator or author)
The title of the dataset
The year the dataset was published and its version number, if it has one
The source of the dataset (repository or journal)
The unique identifier of the dataset, preferably a Digital Object Identifier (DOI) or a persistent URL

Standard citations for datasets often list these elements in the following order:

Creator(s). Publication Year. Title. Version. Source. Identifier.
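
The element order above can be sketched in a few lines of Python. This is only an illustration, not a prescribed tool: the DatasetCitation class, its field names and the sample dataset, repository and DOI are all hypothetical, and a real referencing style may require different punctuation or ordering.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DatasetCitation:
    """Hypothetical container for the citation elements listed above."""
    creators: Sequence[str]
    year: int
    title: str
    source: str
    identifier: str
    version: Optional[str] = None  # not every dataset has a version number

    def render(self) -> str:
        # Order: Creator(s). Publication Year. Title. Version. Source. Identifier.
        parts = ["; ".join(self.creators), str(self.year), self.title]
        if self.version:
            parts.append(f"Version {self.version}")
        parts.extend([self.source, self.identifier])
        # End each element with exactly one full stop.
        return " ".join(part.rstrip(".") + "." for part in parts)


# Made-up example values; none of these refer to a real dataset.
citation = DatasetCitation(
    creators=["Example, A.", "Sample, B."],
    year=2020,
    title="Hypothetical Survey Dataset",
    version="2.1",
    source="Example Data Repository",
    identifier="https://doi.org/10.0000/example",
)
print(citation.render())
```

The version element is simply skipped when a dataset has none, which matches the "if it has one" qualification in the list of elements.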

When citing data, follow the guidelines provided by your referencing style, your editor or publisher, or the data source (the dataset creator or repository). Referencing software, such as EndNote, does provide a template for datasets, but other requirements may mean the generated references need to be modified.