Open data is data that can be freely used, re-used and redistributed, subject at most to requirements to attribute the source and to release any derived data under the same or a similar licence as the original dataset. Open data must also be publicly available on a public server, without password or firewall restrictions. Finally, the data must be technically open: the datasets must be published in electronic formats that are machine readable and non-proprietary, so that anyone can access and use them with common, freely available software tools.
To determine if data is open, check the publisher's website for their terms and conditions for data usage. The Open Knowledge Foundation lists licences that conform to the principles of open data sharing. If the licensing conditions are unclear, you will need to contact the publisher.
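The checks described above (licence, public access, open format) can be sketched as a small helper. This is only an illustration: the function name `openness_problems`, the short licence list (written as SPDX identifiers) and the list of file extensions are assumptions made for the example, not an authoritative or complete register. Always confirm against the publisher's terms and the Open Knowledge Foundation's list.

```python
from pathlib import PurePosixPath
from typing import List, Optional

# Illustrative subset only (an assumption for this sketch), given as SPDX identifiers.
OPEN_LICENCES = {"CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0",
                 "PDDL-1.0", "ODC-By-1.0", "ODbL-1.0"}

# Illustrative machine-readable, non-proprietary formats (also an assumption).
OPEN_FORMATS = {".csv", ".tsv", ".json", ".xml", ".txt"}


def openness_problems(licence_id: Optional[str],
                      filename: str,
                      requires_login: bool) -> List[str]:
    """Return a list of reasons a dataset may not qualify as open data."""
    problems = []
    if licence_id is None:
        # Unclear licensing conditions: the publisher has to be contacted.
        problems.append("licence unclear: contact the publisher")
    elif licence_id not in OPEN_LICENCES:
        problems.append(f"licence {licence_id} is not in the illustrative open list")
    if requires_login:
        problems.append("access is restricted by a password or firewall")
    if PurePosixPath(filename).suffix.lower() not in OPEN_FORMATS:
        problems.append("file format may be proprietary or not machine readable")
    return problems


if __name__ == "__main__":
    print(openness_problems("CC-BY-4.0", "rainfall.csv", False))
    print(openness_problems(None, "survey.xlsx", True))
```

An empty list means none of the three checks raised a concern. It does not replace reading the actual terms and conditions.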
Open data repositories are included in the list of repositories on the Data Repositories and Archives page.
To facilitate the findability and sharing of research data, academic publishers are now publishing a range of data-focused journals including:
Title | Publisher | Access |
Big Earth Data | Taylor & Francis | Open access |
Biodiversity Data Journal | Pensoft Publishers | Open access |
Chemical Data Collections | Elsevier | Subscription |
Data in Brief | Elsevier | Open access |
Earth System Science Data | Copernicus Publications | Open access |
Geoscience Data Journal | Wiley | Open access |
Journal of Chemical & Engineering Data | American Chemical Society (ACS) | Subscription |
Journal of Open Psychology Data | Ubiquity Press | Open access |
Journal of Physical and Chemical Reference Data | American Institute of Physics (AIP) | Mixed model |
Scientific Data | Nature | Open access |
What are the benefits of using externally sourced data?
Some benefits of starting your own research from the existing data collected by other researchers include:
What should be considered before using externally sourced data?
Considering these questions will help you determine if the data is suitable for you to reuse:
What is data citation?
Data citation refers to the practice of providing a reference to data in the same way that researchers routinely provide a reference to journal articles or other publications. When reusing the data of others, proper attribution must be given to the work of the original creator of the dataset.
Why is data citation important?
Citing data is important because it:
How should data be cited?
Published research data should be cited in the same way as other scholarly outputs. Citation styles and formats for data vary, just as article referencing styles and formats do. Important elements in citing data, regardless of referencing style, publisher or repository guidelines, include:
Standard citations for datasets often list these elements in the following order:
Creator(s). Publication Year. Title. Version. Source. Identifier.
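As a minimal sketch, the element order above can be assembled programmatically. The class name `DatasetCitation`, the "; " separator between creators, the "Version" prefix and the sample values are all assumptions made for illustration. They do not represent any particular referencing style, and the dataset and DOI shown are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetCitation:
    """Holds the citation elements in the order listed above."""
    creators: List[str]
    year: int
    title: str
    source: str
    identifier: str
    version: Optional[str] = None  # not every dataset is versioned

    def format(self) -> str:
        # Creator(s). Publication Year. Title. Version. Source. Identifier.
        parts = ["; ".join(self.creators), str(self.year), self.title]
        if self.version:
            parts.append(f"Version {self.version}")
        parts.extend([self.source, self.identifier])
        # Strip trailing full stops so elements are not doubly punctuated.
        return ". ".join(p.rstrip(".") for p in parts) + "."


if __name__ == "__main__":
    # Hypothetical dataset, used only to show the output shape.
    citation = DatasetCitation(
        creators=["Example, A.", "Sample, B."],
        year=2020,
        title="Hypothetical Rainfall Dataset",
        version="1.2",
        source="Example Data Repository",
        identifier="https://doi.org/10.0000/example",
    )
    print(citation.format())
```

A generated string like this is a starting point only. It may still need adjusting to the referencing style or publisher guidelines that apply.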
When citing data, follow the guidelines provided by your referencing style, your editor or publisher, or the data source (the dataset creator or repository). Referencing software, such as EndNote, does provide a template for datasets, but other requirements may mean the generated references need to be modified.