Cloud Dataverse - A Faster Way to Process Data in the Cloud

Presentation Date: 

Wednesday, October 26, 2016


OpenStack Summit, Barcelona

Presentation Slides: 

A presentation at the OpenStack Summit by Piyanai Saowarattitada, Mercè Crosas, and Orran Krieger.

Abstract: Cloud Dataverse is a new service for accessing and processing public data sets in an OpenStack cloud. It is based on Dataverse, a popular framework for sharing, preserving, and analyzing research data. Cloud Dataverse extends Dataverse to replicate datasets from per-institution repositories to a cloud-based repository and store their data files in Swift, making data processing faster for in-situ applications running in the cloud. Cloud Dataverse is a collaborative effort between two open source projects: Massachusetts Open Cloud (MOC) and Dataverse. Dataverse is being developed at Harvard's Institute for Quantitative Social Science (IQSS) with contributors worldwide, who provide 19 Dataverse repositories hosting 60,000 datasets from 300 institutions, deposited by 10,000 data authors. The MOC is a collaboration between higher education (BU, NEU, Harvard, MIT, and UMass), government, and industry. Its mission is to create a self-sustaining, at-scale public cloud based on the Open Cloud eXchange model.