
I’m a data technologist and researcher, currently holding two roles: University Research Data Officer with Harvard University Information Technology (HUIT), and Chief Data Science and Technology Officer at Harvard's Institute for Quantitative Social Science. My career journey has included research in astrophysics, the design and implementation of software for astronomical observations, the development of learning and data management systems for education and biotechnology, and now the leadership of software platforms and tools for research data sharing and analysis across all research fields.

What am I interested in? Advancing open science to facilitate access to and reuse of research data and code while preserving privacy; building software that enhances the quality and productivity of scientific outcomes; improving research data management; and establishing data-centric, multidisciplinary collaborations with the aid of technology and a human touch.

Recent Publications

Wilkinson MD, Dumontier M, Sansone S-A, Olavo L, Prieto M, Batista D, McQuilton P, Kuhn T, Rocca-Serra P, Crosas M, et al. Evaluating FAIR maturity through a scalable, automated, community-governed framework. Scientific Data. 2019;6(174).
Transparent evaluations of FAIRness are increasingly required by a wide range of stakeholders, from scientists to publishers, funding agencies and policy makers. We propose a scalable, automatable framework to evaluate digital resources that encompasses measurable indicators, open source tools, and participation guidelines, which come together to accommodate domain relevant community-defined FAIR assessments. The components of the framework are: (1) Maturity Indicators – community-authored specifications that delimit a specific automatically-measurable FAIR behavior; (2) Compliance Tests – small Web apps that test digital resources against individual Maturity Indicators; and (3) the Evaluator, a Web application that registers, assembles, and applies community-relevant sets of Compliance Tests against a digital resource, and provides a detailed report about what a machine “sees” when it visits that resource. We discuss the technical and social considerations of FAIR assessments, and how this translates to our community-driven infrastructure. We then illustrate how the output of the Evaluator tool can serve as a roadmap to assist data stewards to incrementally and realistically improve the FAIRness of their resources.
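To make the framework concrete, the sketch below shows what a single Compliance Test might look like in Python. This is an illustrative assumption, not the project's actual code: the function name, the JSON-LD extraction heuristic, and the shape of the report are all hypothetical. It tests one plausible Maturity Indicator, namely that a resource's landing page exposes a persistent identifier in machine-readable metadata.

    import json
    import urllib.request

    def check_unique_identifier(landing_page_url: str) -> dict:
        """Hypothetical Compliance Test: does the landing page embed
        machine-readable metadata that carries a persistent identifier?"""
        with urllib.request.urlopen(landing_page_url) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        # Look for a schema.org JSON-LD block, one common way repositories
        # expose metadata that a machine can "see".
        marker = '<script type="application/ld+json">'
        start = html.find(marker)
        if start == -1:
            return {"indicator": "unique-identifier", "pass": False,
                    "comment": "no JSON-LD metadata found"}
        start += len(marker)
        end = html.index("</script>", start)
        metadata = json.loads(html[start:end])
        has_doi = any(str(metadata.get(key, "")).startswith(("https://doi.org/", "doi:"))
                      for key in ("@id", "identifier"))
        return {"indicator": "unique-identifier", "pass": has_doi,
                "comment": "DOI found" if has_doi else "no DOI in metadata"}

In the framework described above, an Evaluator registers many such tests and applies a community-chosen set of them to a resource, aggregating the individual reports into an overall assessment.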
Fenner M, Crosas M, Grethe J, Kennedy D, Hermjakob H, Rocca-Serra P, Durand G, Berjon R, Karcher S, Martone M, et al. A Data Citation Roadmap for Scholarly Data Repositories. Scientific Data. 2019;6(28).
This article presents a practical roadmap for scholarly data repositories to implement data citation in accordance with the Joint Declaration of Data Citation Principles, a synopsis and harmonization of the recommendations of major science policy bodies. The roadmap was developed by the Repositories Expert Group, as part of the Data Citation Implementation Pilot (DCIP) project, an initiative of FORCE11.org and the NIH-funded BioCADDIE (https://biocaddie.org) project. The roadmap makes 11 specific recommendations, grouped into three phases of implementation: a) required steps needed to support the Joint Declaration of Data Citation Principles, b) recommended steps that facilitate article/data publication workflows, and c) optional steps that further improve data citation support provided by data repositories. We describe the early adoption of these recommendations 18 months after they were first published, looking specifically at implementations of machine-readable metadata on dataset landing pages.
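As an illustration of the machine-readable metadata the roadmap calls for, the sketch below emits a schema.org Dataset description as a JSON-LD block of the kind a repository might embed in a dataset landing page. All values (the DOI, names, and dates) are placeholders, not taken from any real repository.

    import json

    # Placeholder landing-page metadata using the schema.org Dataset type,
    # one common way to make a dataset discoverable and citable by machines.
    metadata = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "@id": "https://doi.org/10.5072/FK2/EXAMPLE",   # placeholder DOI
        "identifier": "https://doi.org/10.5072/FK2/EXAMPLE",
        "name": "Example Survey Data, 2019",
        "author": [{"@type": "Person", "name": "Jane Researcher"}],
        "publisher": {"@type": "Organization", "name": "Example Repository"},
        "datePublished": "2019-01-15",
    }

    # Repositories embed this block in the landing-page HTML so crawlers
    # and citation tools can read it without scraping the visible page.
    print('<script type="application/ld+json">')
    print(json.dumps(metadata, indent=2))
    print("</script>")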
Crosas M, Gautier J, Karcher S, Kirilova D, Otalora G, Schwartz A. Data policies of highly-ranked social science journals. SocArXiv. March 2018.

By encouraging and requiring that authors share their data in order to publish articles, scholarly journals have become an important actor in the movement to improve the openness of data and the reproducibility of research. But how many social science journals encourage or mandate that authors share the data supporting their research findings? How does the share of journal data policies vary by discipline? What influences these journals’ decisions to adopt such policies and instructions? And what do those policies and instructions look like? We discuss the results of our analysis of the instructions and policies of 291 highly-ranked journals publishing social science research, where we studied the contents of journal data policies and instructions across 14 variables, such as when and how authors are asked to share their data, and what role journal ranking and age play in the existence and quality of data policies and instructions. We also compare our results to the results of other studies that have analyzed the policies of social science journals, although differences in the journals chosen and how each study defines what constitutes a data policy limit this comparison. We conclude that a little more than half of the journals in our study have data policies. A greater share of the economics journals have data policies and mandate sharing, followed by political science/international relations and psychology journals. Finally, we use our findings to make several recommendations: Policies should include the terms “data,” “dataset” or more specific terms that make it clear what to make available; policies should include the benefits of data sharing; journals, publishers, and associations need to collaborate more to clarify data policies; and policies should explicitly ask for qualitative data.

This paper won the IASSIST & Carto 2018 Best Paper award.

Pasquier T, Lau MK, Han X, Fong E, Lerner BS, Boose E, Crosas M, Ellison A, Seltzer M. Sharing and Preserving Computational Analyses for Posterity with Encapsulator. Computing in Science & Engineering (CiSE). May 2018.
Open data and open-source software may be part of the solution to science's reproducibility crisis, but they are insufficient to guarantee reproducibility. Requiring minimal end-user expertise, Encapsulator creates a “time capsule” with reproducible code in a self-contained computational environment. Encapsulator provides end-users with a fully-featured desktop environment for reproducible research.
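The “time capsule” idea can be illustrated with a short sketch: pin the software environment next to the analysis so the computation can be replayed later. This is a simplification of the general approach, not Encapsulator's actual interface or mechanism; the helper function and file layout below are hypothetical.

    import subprocess
    import sys
    from pathlib import Path

    def make_time_capsule(script: str, capsule_dir: str = "capsule") -> None:
        """Illustrative 'time capsule': bundle an analysis script with a
        pinned environment spec and a container recipe so the analysis can
        be re-run later in a self-contained environment.
        (Hypothetical helper, not Encapsulator's real code.)"""
        out = Path(capsule_dir)
        out.mkdir(exist_ok=True)

        # 1. Copy the analysis code into the capsule.
        (out / "analysis.py").write_text(Path(script).read_text())

        # 2. Pin the exact package versions in use right now.
        frozen = subprocess.run([sys.executable, "-m", "pip", "freeze"],
                                capture_output=True, text=True, check=True).stdout
        (out / "requirements.txt").write_text(frozen)

        # 3. Emit a container recipe that replays the computation.
        (out / "Dockerfile").write_text(
            f"FROM python:{sys.version_info.major}.{sys.version_info.minor}\n"
            "COPY requirements.txt analysis.py ./\n"
            "RUN pip install -r requirements.txt\n"
            'CMD ["python", "analysis.py"]\n'
        )

    if __name__ == "__main__":
        make_time_capsule("my_analysis.py")  # placeholder script name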

Recent Presentations

Data and Code Sharing for Open Science, at the Computer Science Department Seminar, EPFL, Switzerland, Wednesday, October 16, 2019:
A critical element of Open Science is open access to research data and code. Successful implementation of widely used solutions for sharing open data must combine technology, standards, and incentives. For the last 15 years, the Dataverse project has focused on these three aspects to enhance research data sharing and enable access to tens of thousands of research datasets around the world. Currently, the Dataverse project is working to add two crucial components to continue supporting open science: 1) sharing code associated with the data and...
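As one concrete way these datasets are made accessible, Dataverse installations expose a public Search API. The sketch below queries it for datasets; the base URL and search term are placeholders (any installation exposing the API would work), and the fields printed are those commonly returned for dataset records.

    import json
    import urllib.parse
    import urllib.request

    # Query a Dataverse installation's public Search API for datasets.
    BASE = "https://demo.dataverse.org"   # placeholder installation
    params = urllib.parse.urlencode(
        {"q": "climate", "type": "dataset", "per_page": 5})

    with urllib.request.urlopen(f"{BASE}/api/search?{params}") as resp:
        results = json.load(resp)

    for item in results["data"]["items"]:
        # Dataset records carry a citable persistent identifier.
        print(item.get("global_id", "?"), "-", item.get("name", "?"))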
