SEARCHABLE LISTS OF DATABASES
Registry of Research Data Repositories (re3data.org)
A global registry of research data repositories from different academic disciplines.
A curated, annotated list of databases, with data policies and metadata standards.
Nature Journals recommended data repositories list
A curated list that includes both generalist repositories and specialized, discipline-specific repositories. The listed repositories meet the Nature Journals requirements for data access, preservation and stability.
Examples of generalist data repositories
Other data repositories
Inter-University Consortium for Political and Social Research (ICPSR)
(Note: Bucknell is an ICPSR member institution. Link to Bucknell when creating your user account to access membership benefits.)
"Moving Beyond the Title: Evaluating The Data You Find": An ICPSR video tutorial on finding and evaluating datasets to answer a research question (July 2020).
U.S. government's open data
Home of the U.S. government's open data: https://www.data.gov/
Examples of qualitative and text-based data repositories
The Corpus of Contemporary American English (COCA)
Museum collections
In 2017, the Metropolitan Museum of Art (The Met) made all images of public-domain works in its collection available under the Creative Commons Zero (CC0) license, which allows unrestricted use, sharing, and remixing. The change reflects The Met's commitment to increasing access to the collection in a digital age.
Open Access at the Cleveland Museum of Art
The Cleveland Museum of Art became an Open Access institution in 2019. All the images of public-domain works in the collection are available under the Creative Commons Zero (CC0) license, and can be used, shared, and remixed without restrictions. In addition, portions of collections information (metadata) for more than 61,000 artworks, both in the public domain and those works with copyright or other restrictions, are now available.
Bucknell Digital Commons, a service of Bucknell University Libraries, is an institutional repository that brings together all of Bucknell University's research and scholarship under one umbrella, with the aim of preserving and providing access to that research and scholarship.
The research and scholarly output included in Bucknell Digital Commons is selected and deposited by the individual university departments and centers on campus. The repository is an excellent vehicle for working papers or copies of published articles and conference papers, as well as presentations, senior theses, and other works not published elsewhere.
Submit your research to Bucknell Digital Commons
Most research can be submitted electronically. Click on the link above to submit your research. Some publications do not allow authors to submit directly. In these cases, you will be provided with a mail form to contact the appropriate administrator for further instruction.
Open data and content can be freely used, modified, and shared by anyone for any purpose.
The Open Definition (a project of the Open Knowledge Foundation) defines in detail the meaning of “open” with respect to knowledge, promoting a robust commons in which anyone may participate, and interoperability is maximized.
1. Open Works
An open work must satisfy the following requirements in its distribution:
Open License or Status: The work must be in the public domain or provided under an open license
Access: The work must be provided as a whole and at no more than a reasonable one-time reproduction cost, and should be downloadable via the Internet without charge.
Machine Readability: The work must be provided in a form readily processable by a computer and where the individual elements of the work can be easily accessed and modified.
Open Format: The work must be provided in an open format. An open format is one which places no restrictions, monetary or otherwise, upon its use and can be fully processed with at least one free/libre/open-source software tool.
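As a rough illustration of the Machine Readability and Open Format requirements, the sketch below stores a small table as CSV, a plain-text format that free software can read, and then accesses individual values by column name. The sample rows and column names are invented for illustration, and CSV is only one of many formats that could satisfy these requirements.

```python
import csv
import io

# Hypothetical sample data, invented for this illustration.
rows = [
    {"station": "A", "year": "2019", "rainfall_mm": "812"},
    {"station": "B", "year": "2019", "rainfall_mm": "640"},
]


def to_csv(records):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def from_csv(text):
    """Parse CSV text back into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


csv_text = to_csv(rows)
parsed = from_csv(csv_text)

# Individual elements can be read and recombined without special software.
total_rainfall = sum(int(record["rainfall_mm"]) for record in parsed)
print(total_rainfall)
```

Because the file is plain text with a header row, the same data could be opened in a spreadsheet, a statistics package, or a text editor.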
2. Open Licenses
A license is open if its terms satisfy the following conditions:
Required Permissions: The license must irrevocably allow use, redistribution, modification, and compilation for any purpose. The license must not restrict anyone from making use of the work in a specific field of endeavor. The license must not impose any fee arrangement, royalty, or other compensation or monetary remuneration as part of its conditions.
Acceptable Conditions: The license may require distributions of the work to: include attribution of contributors, rights holders, sponsors, and creators as long as any such prescriptions are not onerous; and to remain under the same license or a similar license; among other conditions.
Open Definition 2.1, Open Knowledge Foundation, http://opendefinition.org/od/2.1/en/, accessed on August 12, 2019.
The Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0) is a Creative Commons license commonly used for scientific data.
Under this license, you are free to:
Share — copy and redistribute the material in any medium or format
Adapt — remix, transform, and build upon the material
The licensor cannot revoke these freedoms as long as you follow these license terms:
Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
NonCommercial — You may not use the material for commercial purposes.
No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
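One way to keep track of the Attribution term is to assemble the credit line from a few fields. The sketch below is a minimal, unofficial helper: the function name, its parameters, and the example dataset are hypothetical, while the license URL is the standard Creative Commons address for CC BY-NC 4.0.

```python
CC_BY_NC_4_URL = "https://creativecommons.org/licenses/by-nc/4.0/"


def build_attribution(title, creator, source_url, changes=None):
    """Return a credit line naming the work, creator, source, and license."""
    parts = [
        f'"{title}" by {creator}',
        f"Source: {source_url}",
        f"License: CC BY-NC 4.0 ({CC_BY_NC_4_URL})",
    ]
    # The license asks reusers to indicate if changes were made.
    if changes:
        parts.append(f"Changes: {changes}")
    return ". ".join(parts) + "."


# Hypothetical dataset, used only to show the output format.
credit = build_attribution(
    title="Stream Temperature Readings",
    creator="J. Doe",
    source_url="https://example.org/dataset/123",
    changes="converted to CSV",
)
print(credit)
```

The exact wording of a credit line is flexible; the license only requires that it be given in a reasonable manner.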
Further reading: "Creative Commons Licenses: How to Choose the Best CC License" blog post (June 2021).
The FAIR Data Principles are intended to make digital data more Findable, Accessible, Interoperable, and Reusable, including by computational systems (since we increasingly rely on computational support to work with data).
Findable
The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers. Machine-readable metadata are essential for automatic discovery of datasets and services.
F1. (Meta)data are assigned a globally unique and persistent identifier
F2. Data are described with rich metadata (defined by R1 below)
F3. Metadata clearly and explicitly include the identifier of the data they describe
F4. (Meta)data are registered or indexed in a searchable resource
Accessible
Once the user finds the required data, they need to know how the data can be accessed, possibly including authentication and authorisation.
A1. (Meta)data are retrievable by their identifier using a standardised communications protocol
A1.1 The protocol is open, free, and universally implementable
A1.2 The protocol allows for an authentication and authorisation procedure, where necessary
A2. Metadata are accessible, even when the data are no longer available
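A DOI is one common example of a persistent identifier that can be retrieved over a standard, open protocol (HTTPS). The sketch below builds a request for the DOI of the Wilkinson et al. paper cited under Sources. The function names are hypothetical, and the `Accept` media type assumes the DOI's registration agency supports content negotiation for citation metadata, which may not hold for every DOI.

```python
import json
import urllib.request

DOI_RESOLVER = "https://doi.org/"


def doi_url(doi):
    """Turn a bare DOI into a resolvable HTTPS URL."""
    return DOI_RESOLVER + doi.strip()


def metadata_request(doi):
    """Build (but do not send) a request for metadata rather than the landing page."""
    return urllib.request.Request(
        doi_url(doi),
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
    )


def fetch_metadata(doi):
    """Send the request and parse the JSON response (requires network access)."""
    with urllib.request.urlopen(metadata_request(doi)) as response:
        return json.load(response)


request = metadata_request("10.1038/sdata.2016.18")
print(request.full_url)
```

Calling `fetch_metadata` is left to the reader, since it depends on network access and on the resolver's response.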
Interoperable
The data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing.
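I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
I2. (Meta)data use vocabularies that follow FAIR principles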
I3. (Meta)data include qualified references to other (meta)data
Reusable
The ultimate goal of FAIR is to optimise the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings.
R1. (Meta)data are richly described with a plurality of accurate and relevant attributes
R1.1. (Meta)data are released with a clear and accessible data usage license
R1.2. (Meta)data are associated with detailed provenance
R1.3. (Meta)data meet domain-relevant community standards
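Some of the principles above can be spot-checked before depositing a dataset. The sketch below is a simplified, unofficial check: the field names and their mapping to principles are assumptions made for illustration, not part of the FAIR specification, and the example record simply describes the paper cited under Sources.

```python
# Hypothetical field names mapped to the FAIR principle each one supports.
REQUIRED_FIELDS = {
    "identifier": "F1",    # globally unique, persistent identifier
    "description": "F2",   # rich descriptive metadata
    "license": "R1.1",     # clear data usage license
    "provenance": "R1.2",  # how the data were produced
}


def is_filled(value):
    """Treat only non-blank strings as a filled-in field."""
    return isinstance(value, str) and bool(value.strip())


def missing_fair_fields(record):
    """Return the principles whose supporting field is absent or blank."""
    return [
        principle
        for field, principle in REQUIRED_FIELDS.items()
        if not is_filled(record.get(field))
    ]


record = {
    "identifier": "https://doi.org/10.1038/sdata.2016.18",
    "description": "FAIR Guiding Principles for scientific data management and stewardship",
    "license": "",
}
print(missing_fair_fields(record))  # ['R1.1', 'R1.2']
```

A check like this only confirms that fields are present; judging whether the metadata are rich, accurate, and aligned with community standards still requires human review.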
Sources:
FAIR Principles. GO Fair website. Retrieved from https://www.go-fair.org/fair-principles/ on July 31, 2020.
Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18.
Kristin Briney argues for publishing both the research findings and the research data, in order to improve the quality, reproducibility, and impact of research. "Data, or it didn't happen!" TED x UW-Milwaukee, 2015. Length: 15 minutes.
Open Data: Unleashing Hidden Value (LinkedIn Learning)
An online course on LinkedIn Learning with transcripts, exercise files, and self-assessment quizzes. Governments around the world are discovering the value and responsibility in making the data they collect and store easily available to anyone who wants to access it. Making the decision to open up data sets is a strategic choice that requires detailed tactics. There are processes and technologies to make data accessible while minimizing risk. If you want to start opening up your organization's data to enable transparency and catalyze innovation, or use open data to drive analysis and make more informed decisions, this course is for you. The course introduces real-world use cases for open data, the steps needed to develop and operationalize an open data program, and ways to measure the value of open data.
Length: 1 hour, 10 minutes. (Free access with Bucknell login.)