I’m a data technologist and researcher, currently holding two roles at Harvard University: University Research Data Management Officer with Harvard University Information Technology (HUIT), and Chief Data Science and Technology Officer at Harvard's Institute for Quantitative Social Science. My career has included research in astrophysics, design and implementation of software for astronomical observations, development of learning and data management systems for education and biotechnology, and now leading software platforms and tools for research data sharing and analysis across all research fields.

What am I interested in? Open science that facilitates access to and reuse of research data and code while preserving privacy, building software that enhances the quality and productivity of scientific outcomes, improving research data management, and establishing data-centric multidisciplinary collaborations with the aid of technology and a human touch.

Recent Publications

Qualitative data sharing and synthesis for sustainability science
Alexander S, Jones K, Bennet N, Buden A, Cox M, Crosas M, Game E, Geary J, Hardy D, Johnson J, et al. Qualitative data sharing and synthesis for sustainability science. Nature Sustainability [Internet]. 2020;3:81-88.
Socio–environmental synthesis as a research approach contributes to broader sustainability policy and practice by reusing data from disparate disciplines in innovative ways. Synthesizing diverse data sources and types of evidence can help to better conceptualize, investigate and address increasingly complex socio–environmental problems. However, sharing qualitative data for re-use remains uncommon when compared to sharing quantitative data. We argue that qualitative data present untapped opportunities for sustainability science, and discuss practical pathways to facilitate and realize the benefits from sharing and reusing qualitative data. However, these opportunities and benefits are also hindered by practical, ethical and epistemological challenges. To address these challenges and accelerate qualitative data sharing, we outline enabling conditions and suggest actions for researchers, institutions, funders, data repository managers and publishers.
Evaluating FAIR maturity through a scalable, automated, community-governed framework
Wilkinson MD, Dumontier M, Sansone S-A, Olavo L, Prieto M, Batista D, McQuilton P, Kuhn T, Rocca-Serra P, Crosas M, et al. Evaluating FAIR maturity through a scalable, automated, community-governed framework. Nature-Springer Scientific Data [Internet]. 2019;6 (174).
Transparent evaluations of FAIRness are increasingly required by a wide range of stakeholders, from scientists to publishers, funding agencies and policy makers. We propose a scalable, automatable framework to evaluate digital resources that encompasses measurable indicators, open source tools, and participation guidelines, which come together to accommodate domain relevant community-defined FAIR assessments. The components of the framework are: (1) Maturity Indicators – community-authored specifications that delimit a specific automatically-measurable FAIR behavior; (2) Compliance Tests – small Web apps that test digital resources against individual Maturity Indicators; and (3) the Evaluator, a Web application that registers, assembles, and applies community-relevant sets of Compliance Tests against a digital resource, and provides a detailed report about what a machine “sees” when it visits that resource. We discuss the technical and social considerations of FAIR assessments, and how this translates to our community-driven infrastructure. We then illustrate how the output of the Evaluator tool can serve as a roadmap to assist data stewards to incrementally and realistically improve the FAIRness of their resources.
A Data Citation Roadmap for Scholarly Data Repositories
Fenner M, Crosas M, Grethe J, Kennedy D, Hermjakob H, Rocca-Serra P, Durand G, Berjon R, Karcher S, Martone M, et al. A Data Citation Roadmap for Scholarly Data Repositories. Nature-Springer Scientific Data [Internet]. 2019;6 (28).
This article presents a practical roadmap for scholarly data repositories to implement data citation in accordance with the Joint Declaration of Data Citation Principles, a synopsis and harmonization of the recommendations of major science policy bodies. The roadmap was developed by the Repositories Expert Group, as part of the Data Citation Implementation Pilot (DCIP) project, an initiative of FORCE11.org and the NIH-funded BioCADDIE (https://biocaddie.org) project. The roadmap makes 11 specific recommendations, grouped into three phases of implementation: a) required steps needed to support the Joint Declaration of Data Citation Principles, b) recommended steps that facilitate article/data publication workflows, and c) optional steps that further improve data citation support provided by data repositories. We describe the early adoption of these recommendations 18 months after they have first been published, looking specifically at implementations of machine-readable metadata on dataset landing pages.

Recent Presentations

Dataverse and OpenDP: Tools for Privacy-Protective Analysis in the Cloud, at Red Hat Research Day, Tuesday, September 22, 2020:

This presentation is part of the Dataverse and OpenDP session at the Red Hat Research Day on Privacy: https://research.redhat.com/research-days-us-2020/

When big data intersects with highly sensitive data, both opportunity to society and risks abound. Traditional approaches for sharing sensitive data are known to be ineffective in protecting privacy. Differential Privacy, deriving from roots in cryptography, is a strong mathematical criterion for privacy preservation that also allows for rich statistical analysis of sensitive data....
