Cloud Dataverse: A Data Repository Platform for an OpenStack Cloud

Presentation Date: 

Monday, May 8, 2017

Location: 

OpenStack Summit, Boston

Over the last 10 years, the Dataverse project has been a leader in open-source repository software for sharing and archiving research data. Dataverse has an active, growing community of developers and users, with 22 installations of the software around the world. The Harvard Dataverse repository alone hosts 70,000 datasets and 330,000 data files, with contributions from more than 500 institutions.

Cloud Dataverse combines Dataverse and OpenStack by storing datasets in OpenStack's Swift object storage and replicating datasets from Dataverse repositories worldwide to the cloud(s), which benefits both the Dataverse and OpenStack communities. It gives Dataverse users the ability to host larger datasets and to compute efficiently on data from around the world using OpenStack's compute services. It gives OpenStack users a repository system that is much richer than Amazon's Public Datasets service.