
Research Data Management: Share Your Data

Share Your Data

What is a data repository?

  • an archival service providing reliable long-term care for digital objects with research value
  • repositories preserve, manage, and provide access to many types of digital materials in a variety of formats
  • materials in online repositories are curated to enable search, discovery, and reuse
  • there must be sufficient control for the digital material to be authentic, reliable, accessible and usable on a continuing basis

(from Research Data Canada, Original RDC Glossary)

Why deposit your research data in a data repository? Repositories can help with:

  • managing data
  • supplying a persistent identifier so that you and others can cite your data
  • facilitating discovery of your data
  • preserving your data for the long term

Self-Deposit Repositories

FRDR: Federated Research Data Repository

  • any researcher affiliated with a Canadian institution can deposit data in FRDR at no direct cost
  • FRDR can move data sets of any size
  • preservation and archiving are handled automatically
  • research librarians from the Canadian Association of Research Libraries (CARL) curate and approve deposited items

Zenodo

  • a multidisciplinary platform hosted by CERN
  • accepts research outputs from all fields of research
  • uploaded data receives a Digital Object Identifier (DOI) to make the data uniquely identifiable and easily cited
  • research output is stored safely for the future in the same infrastructure as CERN's own Large Hadron Collider research data
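A DOI such as the one Zenodo assigns becomes a working link when it is appended to the https://doi.org/ resolver. The short Python sketch below normalizes a DOI that may have been pasted with a "doi:" prefix or as a resolver URL into that canonical link form. The function name doi_to_url is hypothetical, and the validation pattern is a simplifying assumption that accepts the common "10.registrant/suffix" shape; it is not the full DOI syntax.

```python
import re

# Simplified DOI shape (an assumption for illustration, not the full DOI syntax):
# "10." + registrant code (optionally subdivided) + "/" + suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}(?:\.\d+)*/\S+$")

# Prefixes people often paste along with the DOI itself.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Return the https://doi.org/ link for a DOI (hypothetical helper)."""
    cleaned = doi.strip()
    for prefix in KNOWN_PREFIXES:
        # Compare case-insensitively so "DOI:" and "doi:" are both accepted.
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):].strip()
            break
    if not DOI_PATTERN.match(cleaned):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    return "https://doi.org/" + cleaned


# The DOI below is fictional.
print(doi_to_url("doi:10.1234/example-dataset"))  # https://doi.org/10.1234/example-dataset
```

Citing the https://doi.org/ form means readers always reach the current landing page, even if the repository later moves the data.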

Discipline-Specific Repositories

Qualitative Data Repository (QDR) *NEW* for social scientists

QDR curates, stores, preserves, publishes, and enables the download of digital data generated through qualitative and multi-method research in the social sciences (from homepage). Data depositors may be charged a fee, which helps cover the costs of curation and preservation.

IEEE DataPort *NEW* for engineering

IEEE DataPort accepts data sets of up to 2 TB and is designed to perform four functions:

  1. Enable individuals and institutions to indefinitely store and make datasets easily accessible to a broad set of researchers, engineers and industry;
  2. Enable researchers, engineers and industry to gain access to datasets that can be analyzed to advance technology;
  3. Facilitate data analysis by enabling access to data in the AWS Cloud and by enabling the downloading of datasets; and
  4. Support reproducible research. (from homepage)

Discipline-specific repositories are also known as subject-specific or domain-specific repositories. Many areas of research are supported by discipline-specific repositories hosted by a variety of international groups.

re3data.org

To find a repository specific to your research area, check out re3data.org, a searchable "global registry of research data repositories from a diverse range of academic disciplines. It provides information on repositories for the permanent storage and access of data sets to researchers, funding bodies, publishers and scholarly institutions."

Note that some domain-specific repositories allow self-deposit while others require submission through a larger body, and that levels of curation vary from repository to repository.

**NEW** Repository Options in Canada: A Portage Guide (available in English and French)

Some scholarly journals require authors to share research data as a condition of publication. Data sharing policies can often be found in a journal's “Instructions for Authors” or “Author Guidelines” section. Here are examples of author guidelines with a data sharing requirement:

GigaScience

"GigaScience requires authors to deposit the data set(s) supporting the results reported in submitted manuscripts in a publicly-accessible data repository. ... This section should be included when supporting data are available and must include the name of the repository and the permanent identifier or accession number and persistent hyperlinks for the data sets (if appropriate). The following format is recommended:

'The data set(s) supporting the results of this article is(are) available in the [repository name] repository, [cite unique persistent identifier].'" (from GigaScience Instruction for Authors).

Canadian Journal of Fisheries and Aquatic Sciences

"For primary biodiversity data authors are strongly encouraged to place all species distribution records in a publicly accessible database such as the national Global Biodiversity Information Facility (GBIF) nodes (www.gbif.org) or data centres endorsed by GBIF, including BioFresh (www2.freshwaterbiodiversity.eu) for freshwater data and the Ocean Biogeographic Information System (OBIS, http://www.iobis.org/) for marine biodiversity data, which also holds supporting measurements taken alongside the species occurrence data.

Alternatively, authors who are interested in depositing their underlying data in a repository are referred to Dryad Digital Repository at http://datadryad.org/. 'The Dryad Digital Repository is a curated resource that makes the data underlying scientific and medical publications discoverable, freely reusable, and citable.'" (from Canadian Journal of Fisheries and Aquatic Sciences Scope of the Journal and Guidelines for Papers).

 

When sharing data from research involving human subjects, protecting confidentiality is essential.

  • Researchers who plan to deposit research data collected from human subjects must ensure that those plans are included in their research ethics application.
  • Researchers collecting sensitive data, as in some health research, must ensure that privacy considerations are respected and that confidentiality protections extend to data deposit.
  • Informed consent documents should include a provision for data sharing.
  • Before beginning the research, determine whether the data will need to be de-identified or anonymized. This task can be time-consuming and may affect the project's timeline and budget.
  • Choosing a repository for the data should be informed by the type of confidentiality review and access control options offered by the repositories under consideration.

Data citation is a standardized way for secondary users to cite data. Researchers are encouraged both to cite other researchers' data and to create citations for their own datasets, which increases the likelihood that their own data will be cited.

The benefits of data citation include:

  • the acceptance of research data as a contribution to the scientific record
  • the ability to reference data as a standalone research product
  • acknowledgement for archiving and sharing data
  • the verification of results and the re-purposing of data for future study
  • the ability to track the usage and impact of one's data

The Joint Declaration of Data Citation Principles has established eight principles to guide the development of data citations. These principles are: importance, credit and attribution, evidence, unique identification, access, persistence, specificity and verifiability, and interoperability and flexibility.

There are various ways to cite data. Different repositories and organizations use different styles (e.g., figshare, Dryad, DataCite, and ICPSR). Traditional citation styles are also beginning to include standards for citing data. For example, APA provides one example for citing published data and another for citing unpublished raw data.

Two basic formats for data citations are:

  • Creator (PublicationYear): Title. Publisher. Identifier
  • Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier

A fictional example of a data citation:

Smith, Madeleine Emily. (2016). 2016 Healthcare Survey Data from Belize. figshare. Data file and code book. DOI xxxx-3333-333-3333 
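The two templates above can be assembled mechanically from a handful of fields. The Python sketch below does so; the class name DataCitation, its method render, and its field names are illustrative assumptions modelled on the template labels, not part of any citation tool. Because it follows the templates literally, its output punctuates the creator and year slightly differently from the fictional example, and the placeholder identifier is kept as-is.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Fields named after the template labels; all names are illustrative."""
    creator: str
    publication_year: int
    title: str
    publisher: str
    identifier: str
    version: Optional[str] = None
    resource_type: Optional[str] = None

    def render(self) -> str:
        """Creator (PublicationYear): Title. [Version.] Publisher. [ResourceType.] Identifier"""
        parts = [self.title]
        if self.version:
            parts.append(self.version)
        parts.append(self.publisher)
        if self.resource_type:
            parts.append(self.resource_type)
        # Drop trailing periods so a title ending in "." does not produce "..".
        body = ". ".join(p.rstrip(".") for p in parts)
        return f"{self.creator} ({self.publication_year}): {body}. {self.identifier}"


# The fictional dataset from above, with the same placeholder identifier.
example = DataCitation(
    creator="Smith, Madeleine Emily",
    publication_year=2016,
    title="2016 Healthcare Survey Data from Belize",
    publisher="figshare",
    identifier="DOI xxxx-3333-333-3333",
    resource_type="Data file and code book",
)
print(example.render())
# Smith, Madeleine Emily (2016): 2016 Healthcare Survey Data from Belize. figshare. Data file and code book. DOI xxxx-3333-333-3333
```

Leaving out version and resource_type yields the first, shorter template.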

Metadata is used to describe data so that other researchers can find it and use it appropriately. There are different metadata standards to choose from depending on your area of research. Some examples of metadata used for a research dataset include:

  • the title of the dataset
  • the creator
  • the date
  • the method used to generate the data
  • the source of the data
  • terms to describe the content
  • technical descriptions including file names, formats, versions, etc.
  • access information

Your chosen repository's support services should be able to help you determine what metadata to record. Two of the most common metadata standards used for data management are Dublin Core and the DDI (Data Documentation Initiative).
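To make the Dublin Core option concrete, the Python sketch below serializes a small record using the standard library's xml.etree.ElementTree and the Dublin Core element namespace http://purl.org/dc/elements/1.1/. The mapping of the metadata list above onto Dublin Core elements (for example, method to description, technical details to format, access information to rights) is one reasonable choice, not a prescribed one. The function name dublin_core_record and the <metadata> wrapper element are hypothetical; repositories typically define their own record envelope.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Emit the "dc:" prefix instead of an auto-generated one such as "ns0:".
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict[str, list[str]]) -> str:
    """Serialize {element: [values, ...]} as a simple Dublin Core XML record.

    The <metadata> root is a hypothetical wrapper for illustration.
    Dublin Core elements are repeatable, so each value gets its own element.
    """
    root = ET.Element("metadata")
    for element, values in fields.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element!r}")
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Fictional record for the fictional dataset used in the data citation example.
record = dublin_core_record({
    "title": ["2016 Healthcare Survey Data from Belize"],
    "creator": ["Smith, Madeleine Emily"],
    "date": ["2016"],
    "description": ["Method: household survey (fictional)"],
    "subject": ["health care", "surveys", "Belize"],
    "format": ["text/csv"],
    "rights": ["Open access"],
})
print(record)
```

Rejecting unknown element names keeps the record within the standard's vocabulary; a discipline-specific schema would need its own element list.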

Some disciplines have their own metadata schemas, each with its own elements and structure. The Metadata Standards Catalog is a collaborative, open directory of metadata standards applicable to research data.

Here are some examples from Curtin University Library, listed by discipline:

  • General: Dublin Core (DC); Metadata Object Description Schema (MODS); Metadata Encoding and Transmission Standard (METS)
  • Arts: Categories for the Description of Works of Art (CDWA); Visual Resources Association (VRA Core)
  • Astronomy: Astronomy Visualization Metadata (AVM)
  • Biology: Darwin Core
  • Ecology: Ecological Metadata Language (EML)
  • Geographic: Content Standard for Digital Geospatial Metadata (CSDGM)
  • Social sciences: Data Documentation Initiative (DDI)