Publish your research outputs: Code, data, documents, ...
E-mail course: 6 steps towards reproducible research (step 6)
Wow friend, I can’t believe we’ve made it this far! 💪🥳
This is already the last episode of this e-mail course, but no worries: the newsletter will continue, packed with info on all things open and reproducible data science 🤓.
Now, let’s dive into today’s topic: publishing research outputs.
First, let’s talk about what it is not. When people write the following:
The data/code will be made available upon request.
This usually means:
Once the PhD student who wrote this paper leaves their position, the data/code will be lost in space.
If you try to access the data/code, you will feel like Mickey Mouse in the GIF below 🤷
But how can you do better? How can you make your research outputs available?
Publish in a repository
Publish your research outputs in a repository. You basically have three options here:
A general purpose service (e.g. Zenodo or Open Science Framework),
The service of your institution (e.g. Open Data LMU or ETH Zurich’s Research Collection),
A field- or project-specific service (e.g. a dedicated repository for high-throughput sequencing data, or CRAN for R packages).
Please make sure to use a trustworthy service. How can you check whether a service is trustworthy? My rule of thumb is that services with investor backing (e.g. Figshare) are less trustworthy than services backed by the research community (e.g. Zenodo, which is developed by OpenAIRE and CERN). Why? Well, I think an Open Science service should not be driven primarily by commercial goals. At some point commercial services will need to make money from you, whether by selling your data, by locking your uploaded material behind a paywall, or in some other way.
Publish with the paper
Some journals offer to publish your research outputs with your paper. I will be honest, I have mixed feelings about this. Not all journals that offer this really have the expertise to do so, and they can’t necessarily store data long term. For one of my papers we uploaded the material with the journal, but the link to the material keeps vanishing and I keep getting confused emails from interested readers. So, make sure the journal you upload your material to ensures long-term storage and availability 🧐
If your research outputs cannot be shared openly
What should you do if you cannot publish your research outputs openly?
I’ll dive deeper into that in an upcoming newsletter, but for now: if you have sensitive data and no consent, do not publish the data! There are other options for you.
If for any reason you cannot share your research outputs, think about how you can still ensure that others can trust the reproducibility of your research.
You cannot publish the data? Can you maybe publish the metadata and the code? Can you publish a synthetic version of your data? Can you share the data with specific people (e.g. researchers in the same field)? Brainstorm with your peers, librarians, or IT support. There are always solutions that are better than publishing nothing.
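To make the "publish the metadata" idea a bit more concrete, here is a minimal sketch in Python of a small data dictionary: it summarises a CSV file's columns (guessed type, number of missing values) without including any of the actual values. The function names `describe_csv` and `_guess_type`, the column names, and the example rows are all made up for illustration, and the type guessing is deliberately naive. Keep in mind that column names alone can sometimes be sensitive too, so check before you publish.

```python
import csv
import io
import json


def _guess_type(values):
    """Very rough type guess: 'number' if every value parses as a float."""
    if not values:
        return "unknown"
    try:
        for value in values:
            float(value)
    except ValueError:
        return "text"
    return "number"


def describe_csv(text):
    """Build column-level metadata from CSV text, without any raw values."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = {}
    for name in reader.fieldnames or []:
        values = [row[name] for row in rows]
        # Empty strings and None (short rows) both count as missing.
        present = [v for v in values if v not in ("", None)]
        columns[name] = {
            "type": _guess_type(present),
            "missing": len(values) - len(present),
        }
    return {"n_rows": len(rows), "columns": columns}


# Hypothetical example data, invented for this sketch.
example = "participant_id,age,diagnosis\n1,34,A\n2,,B\n3,51,A\n"
metadata = describe_csv(example)
print(json.dumps(metadata, indent=2))
```

The resulting JSON could be published next to your code, so readers at least know what the data looks like.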
Your task
Discuss with your research team: what are good places to publish your data, code and other research outputs in your field? Put it on the agenda of the next team meeting 🗓
Do you know what you are allowed to do with your code and data? If not, discuss it with the people who might know 🙌
Check out Open Science Framework (OSF) or Zenodo. Try uploading something simple (e.g. slides of your last presentation) ✅
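Uploading through the website is the easiest way to try this. If you would rather script it, the sketch below prepares (but does not send) a request that would create a draft upload via Zenodo's REST API. It is based on my understanding of the Zenodo API documentation, so please check the current docs before relying on it. The sandbox URL is meant for test uploads; the function name `build_deposition_request`, the token placeholder, the title, the description, and the author are all placeholders.

```python
import json
import urllib.request

# Assumption: Zenodo's sandbox instance, which is intended for test uploads.
SANDBOX_URL = "https://sandbox.zenodo.org/api/deposit/depositions"


def build_deposition_request(token, title, author):
    """Prepare (but do not send) a request that creates a draft upload."""
    payload = {
        "metadata": {
            "title": title,
            "upload_type": "presentation",
            "description": "Slides from my last talk.",
            "creators": [{"name": author}],
        }
    }
    return urllib.request.Request(
        SANDBOX_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; replace them with your own token and details.
request = build_deposition_request("YOUR-TOKEN", "My slides", "Doe, Jane")

# To actually create the draft you would send the request, for example:
# with urllib.request.urlopen(request) as response:
#     draft = json.load(response)
```

Nothing is sent until you uncomment the last lines, so you can inspect the request first. Files are attached to the draft in a separate step, which is not shown here.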
Further reading
My favorite helpers for choosing licenses:
Anything: Choose a Creative Commons license
Thanks for your interest in making your work more reproducible. Let’s improve the way we do science! 👏💪
All the best,
Heidi
P.S. Are you also noticing that researchers are moving from Twitter to Mastodon? If you want to find interesting Open Science people on Mastodon check out this page (thanks to the German Reproducibility Network for organising 👏). You can find me under @HeidiSeibold@fosstodon.org.
Great, as always! Thank you for sharing.
It would be cool to see more for-profits in this field that are smart enough not to defraud their own customers. For-profits may find more opportunities to do good than a donation powered community. To me, most small donation based services are like pet lion cubs! They are cute, but very limited in what they can do and very costly to scale! And when they do scale, they become massive liabilities. So much so that their owners have to shift gears from growth to keeping up with current expenses (Wikipedia or an instance of Mastodon might be good examples).
We just need more people who genuinely care about science and integrity to be at the core of businesses. I see so much room for technological improvement that may significantly simplify these 6 steps. I just started a relevant discussion on GitHub, but unfortunately my account is currently flagged as a potential spam bot! Maybe I shouldn't have crammed for that Turing test! 😆