Recently, I came across an interesting study from 2009, the content of which is not relevant here. However, I was not convinced by some of the statistical analyses and asked the authors whether they could share the data with me. They replied that they might have lost the data and would have to search for it. The permanent loss of data is always annoying, but for this study it would be particularly harmful, because the study is based on an original survey that would be difficult to replicate (quite apart from the possibility that an analysis of newly collected data might fail to replicate results from roughly 10-15 years ago because of time-varying effects).
This is not an unfamiliar problem, and it made me think, as many people have before, about how one could strengthen the incentives for archiving data in repositories such as the Dataverse Network. Since positive incentives such as data citations do not seem to achieve this goal, two negative incentives might be more effective. If authors have not deposited their data in a repository and, when asked to share it, cannot deliver for whatever reason, one could take two courses of action.
First, the article for which no data are available for reproduction could be retracted. This might seem a harsh response, as it would put such articles on the same level as work involving plagiarism or data fabrication. However, we should ask ourselves: who bears the burden of proof that an empirical analysis was actually carried out, and who must ensure that it can be reproduced? The burden should rest with the authors, and if they cannot meet it by sharing data and code, the study looks much like one based on fabricated results with no empirical work behind them. I am not saying that all studies for which no reproduction data are available have been fabricated, but, empirically, we cannot tell the difference. So one might wonder why they should be treated differently.
A second, less far-reaching measure would be to flag such publications with a notice like “Data not available. Study cannot be reproduced.” This stamp should appear on the publisher’s website and on the publication itself (e.g., on every page of the PDF). The publication would not be retracted, but the reader would be given the important auxiliary information that the results she is looking at cannot be reproduced. No single study delivers conclusive results, and all should be interpreted with caution; this is all the more true when the underlying data no longer exist.
The best option, of course, remains archiving the data immediately after a study is published, and, hopefully, more and more journals and book publishers will make the online deposit of data and code a requirement for acceptance. Until we reach that point, negative incentives might be useful complementary instruments. Both proposals evidently require the cooperation of publishers, and the retraction of non-reproducible articles might be difficult to achieve. Flagging publications as lacking reproduction data, however, might be feasible once the loss of data is made public and brought to the editors’ attention; they would, hopefully, feel compelled to take at least some form of action.