I’m the University's Research Data Officer, with Harvard University Information Technology (HUIT), and the Chief Data Science and Technology Officer at Harvard's Institute for Quantitative Social Science. My career journey has included research in astrophysics, design and implementation of software for astronomical observations, development of learning and data management systems for education and biotechnologies, and now leading software platforms and tools for research data sharing and analysis, applied to all research fields. 

What am I currently interested in? Advancing open science to facilitate access to and reuse of research data and code while preserving privacy, building software to enhance the quality and productivity of scientific outcomes, improving research data management, and establishing data-centric multidisciplinary collaborations with the aid of technology and a human touch.

Recent Publications

Fenner M, Crosas M, Grethe J, Kennedy D, Hermjakob H, Rocca-Serra P, Durand G, Berjon R, Karcher S, Martone M, et al. A Data Citation Roadmap for Scholarly Data Repositories. Nature-Springer Scientific Data [Internet]. 2019;6 (28).
This article presents a practical roadmap for scholarly data repositories to implement data citation in accordance with the Joint Declaration of Data Citation Principles, a synopsis and harmonization of the recommendations of major science policy bodies. The roadmap was developed by the Repositories Expert Group, as part of the Data Citation Implementation Pilot (DCIP) project, an initiative of FORCE11.org and the NIH-funded BioCADDIE (https://biocaddie.org) project. The roadmap makes 11 specific recommendations, grouped into three phases of implementation: a) required steps needed to support the Joint Declaration of Data Citation Principles, b) recommended steps that facilitate article/data publication workflows, and c) optional steps that further improve data citation support provided by data repositories. We describe the early adoption of these recommendations 18 months after they have first been published, looking specifically at implementations of machine-readable metadata on dataset landing pages.
Crosas M, Gautier J, Karcher S, Kirilova D, Otalora G, Schwartz A. Data policies of highly-ranked social science journals. SocArXiv. 2018;March.

By encouraging and requiring that authors share their data in order to publish articles, scholarly journals have become an important actor in the movement to improve the openness of data and the reproducibility of research. But how many social science journals encourage or mandate that authors share the data supporting their research findings? How does the share of journal data policies vary by discipline? What influences these journals’ decisions to adopt such policies and instructions? And what do those policies and instructions look like? We discuss the results of our analysis of the instructions and policies of 291 highly-ranked journals publishing social science research, where we studied the contents of journal data policies and instructions across 14 variables, such as when and how authors are asked to share their data, and what role journal ranking and age play in the existence and quality of data policies and instructions. We also compare our results to the results of other studies that have analyzed the policies of social science journals, although differences in the journals chosen and how each study defines what constitutes a data policy limit this comparison. We conclude that a little more than half of the journals in our study have data policies. A greater share of the economics journals have data policies and mandate sharing, followed by political science/international relations and psychology journals. Finally, we use our findings to make several recommendations: Policies should include the terms “data,” “dataset” or more specific terms that make it clear what to make available; policies should include the benefits of data sharing; journals, publishers, and associations need to collaborate more to clarify data policies; and policies should explicitly ask for qualitative data.

This paper has won the IASSIST & Carto 2018 Best Paper award.

Pasquier T, Lau MK, Han X, Fong E, Lerner BS, Boose E, Crosas M, Ellison A, Seltzer M. Sharing and Preserving Computational Analyses for Posterity with Encapsulator. Computing in Science and Engineering (CiSE), IEEE [Internet]. 2018;May.
Open data and open-source software may be part of the solution to science's reproducibility crisis, but they are insufficient to guarantee reproducibility. Requiring minimal end-user expertise, encapsulator creates a “time capsule” with reproducible code in a self-contained computational environment. encapsulator provides end-users with a fully-featured desktop environment for reproducible research.
Pasquier T, Lau M, Trisovic A, Boose E, Couturierer B, Crosas M, Ellison A, Gibson V, Jones C, Seltzer M. If These Data Could Talk. Nature Scientific Data [Internet]. 2017.
In the last few decades, data-driven methods have come to dominate many fields of scientific inquiry. Open data and open-source software have enabled the rapid implementation of novel methods to manage and analyze the growing flood of data. However, it has become apparent that many scientific fields exhibit distressingly low rates of reproducibility. Although there are many dimensions to this issue, we believe that there is a lack of formalism used when describing end-to-end published results, from the data source to the analysis to the final published results. Even when authors do their best to make their research and data accessible, this lack of formalism reduces the clarity and efficiency of reporting, which contributes to issues of reproducibility. Data provenance aids reproducibility through systematic and formal records of the relationships among data sources, processes, datasets, publications and researchers.

Recent Presentations

Data and Code Sharing for Open Science, at Computer Science Department Seminar, EPFL, Switzerland, Wednesday, October 16, 2019:
A critical element of Open Science is open access to research data and code. Successful implementation of widely used solutions for sharing open data must combine technology, standards, and incentives. For the last 15 years, the Dataverse project has focused on these three aspects to enhance research data sharing and enable access to tens of thousands of research datasets around the world. Currently, the Dataverse project is working to add two crucial components to continue supporting...
OECD Workshop on the Revision of the Recommendation concerning access to research data from public funding, at OECD conference center, Paris, Tuesday, October 15, 2019:
Two presentations at the OECD workshop on the Revision of the Recommendation concerning access to research data from public funding, as part of the following two panels: 1) Use cases of enhanced access to software, algorithms, and workflows; 2) Use cases of access to sensitive data for research purposes.
