It is important to cite any data that you use in your research. Citing data gives credit to its author(s), lends credibility to your work, and supports further research by allowing people to identify and locate the data you used.
We've created this guide as an online handout covering some of the data sources we discussed in class.
There are two main strategies for finding research datasets:
1. Search for a journal article on the topic of interest (through literature databases like Web of Science). Go through the results until you come across one that suits your needs AND that provides a link to the dataset underlying the article.
2. Go directly to a known data repository and search it for a dataset of interest. If you do not know of a suitable repository, see the resources listed below. You can find one by searching a directory of research data repositories (such as re3data), or you can search across many repositories at once (using DataCite).
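The second strategy can also be scripted. The following is a minimal Python sketch, not an official client, that sends a keyword search to the public DataCite REST API and limits results to records typed as datasets. The function names build_search_url and search_datasets and the keywords "soil carbon" are made up for this handout. The endpoint and its query, resource-type-id, and page[size] parameters are assumed to behave as described in the DataCite API documentation.

```python
import json
import urllib.parse
import urllib.request

# Public DataCite REST API endpoint for DOI records.
DATACITE_API = "https://api.datacite.org/dois"


def build_search_url(keywords, page_size=5):
    """Build a DataCite search URL restricted to records typed as datasets.

    Hypothetical helper written for this handout.
    """
    params = {
        "query": keywords,
        "resource-type-id": "dataset",
        "page[size]": page_size,
    }
    return DATACITE_API + "?" + urllib.parse.urlencode(params)


def search_datasets(keywords, page_size=5):
    """Return (DOI, first title) pairs for matching records. Needs network access."""
    url = build_search_url(keywords, page_size)
    with urllib.request.urlopen(url, timeout=30) as resp:
        payload = json.load(resp)
    results = []
    for record in payload.get("data", []):
        attrs = record.get("attributes", {})
        # A record may carry several titles; keep the first if there is one.
        titles = attrs.get("titles") or [{}]
        results.append((attrs.get("doi"), titles[0].get("title")))
    return results
```

Calling search_datasets("soil carbon") returns a short list of DOI and title pairs to follow up in the hosting repository. Check each record's landing page for licence and citation details before reusing the data.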
There are now journals that publish only datasets! We have included a brief list of some of the best-known ones below, and you can search the web for "data journals" to see more examples. Other journals may publish datasets as well as articles.
Scientific Data: a multidisciplinary data journal published by Nature. From their About page: "Scientific Data is a peer-reviewed, open-access journal for descriptions of scientifically valuable datasets, and research that advances the sharing and reuse of scientific data."
Research funders are beginning to require that datasets from funded research be made openly available in a data repository. Many traditional journals are also beginning to require that authors make the data underlying their published articles openly available, and they will often link to the datasets from the article. You can also search for datasets directly in a data repository.
There are many, many data repositories out there. It might be best to start with a directory of data repositories.
An international data repository supporting ecological, environmental, and earth science research. The data originate from a highly distributed set of field stations, laboratories, research sites, and individual researchers.
Dataverse is software that institutions use to archive and preserve the research datasets of their faculty researchers. Harvard's Dataverse also provides a search across other Dataverse instances. Primarily social sciences.
Rather than searching each repository individually, it is usually more efficient to search across repositories using one of these tools:
Data Observation Network for Earth (DataONE) is a platform for environmental and ecological science that provides access to Earth observational data. This tool allows searching across all member repositories.
This is a discovery tool ("database"), not a repository itself. It is unique in that it also gives an indication of how much a dataset has been re-used (citation data). Multidisciplinary, coverage back to 1900. Subscription resource.
License Information: There are no restrictions to the number of simultaneous users. Access is restricted to current students, faculty, and staff of the University of Saskatchewan, and to "walk-in" users of the University of Saskatchewan Library for educational, research, and non-commercial personal use. It is accessible in the library, on campus, and remotely. Systematic copying or downloading of electronic resource content is not permitted by Canadian and International Copyright law.