
I’m a data technologist and researcher, currently holding two roles at Harvard University: University Research Data Management Officer with Harvard University Information Technology (HUIT), and Chief Data Science and Technology Officer at Harvard's Institute for Quantitative Social Science. My career has included research in astrophysics, the design and implementation of software for astronomical observations, and the development of learning and data management systems for education and biotechnology. I now lead software platforms and tools for research data sharing and analysis that serve all research fields.

What am I interested in? Open science that facilitates access to and reuse of research data and code while preserving privacy; building software that enhances the quality and productivity of scientific outcomes; improving research data management; and establishing data-centric multidisciplinary collaborations with the aid of technology and a human touch.

Recent Publications

Trisovic A, Lau MK, Pasquier T, Crosas M. A large-scale study on research code quality and execution. arXiv [Internet]. 2021.
This article presents a study on the quality and execution of research code from publicly available replication datasets at the Harvard Dataverse repository. Research code is typically created by a group of scientists and published together with academic papers to facilitate research transparency and reproducibility. For this study, we define ten questions to address aspects impacting research reproducibility and reuse. First, we retrieve and analyze more than 2000 replication datasets with over 9000 unique R files published from 2010 to 2020. Second, we execute the code in a clean runtime environment to assess its ease of reuse. Common coding errors were identified, and some of them were solved with automatic code cleaning to aid code execution. We find that 74% of R files crashed in the initial execution, while 56% crashed when code cleaning was applied, showing that many errors can be prevented with good coding practices. We also analyze the replication datasets from journals' collections and discuss the impact of the journal policy strictness on the code re-execution rate. Finally, based on our results, we propose a set of recommendations for code dissemination aimed at researchers, journals, and repositories.
Trisovic A, Mika K, Boyd C, Feger S, Crosas M. Repository Approaches to Improving the Quality of Shared Data and Code. MDPI Data [Internet]. 2021.
Sharing data and code for reuse has become increasingly important in scientific work over the past decade. However, in practice, shared data and code may be unusable, or published results obtained from them may be irreproducible. Data repository features and services contribute significantly to the quality, longevity, and reusability of datasets. This paper presents a combination of original and secondary data analysis studies focusing on computational reproducibility, data curation, and gamified design elements that can be employed to indicate and improve the quality of shared data and code. The findings of these studies are sorted into three approaches that can be valuable to data repositories, archives, and other research dissemination platforms.
Alexander S, Jones K, Bennet N, Buden A, Cox M, Crosas M, Game E, Geary J, Hardy D, Johnson J, et al. Qualitative data sharing and synthesis for sustainability science. Nature Sustainability [Internet]. 2020;3:81-88.
Socio–environmental synthesis as a research approach contributes to broader sustainability policy and practice by reusing data from disparate disciplines in innovative ways. Synthesizing diverse data sources and types of evidence can help to better conceptualize, investigate and address increasingly complex socio–environmental problems. However, sharing qualitative data for re-use remains uncommon when compared to sharing quantitative data. We argue that qualitative data present untapped opportunities for sustainability science, and discuss practical pathways to facilitate and realize the benefits from sharing and reusing qualitative data. However, these opportunities and benefits are also hindered by practical, ethical and epistemological challenges. To address these challenges and accelerate qualitative data sharing, we outline enabling conditions and suggest actions for researchers, institutions, funders, data repository managers and publishers.

