Open Data Spotlight for researchers

Digital Science recently hosted the first in a series of Open Data Spotlight events. Here, we find out what open data means for researchers. By Nicko Goncharoff, director of publisher relations and head of knowledge discovery at Digital Science.

At Digital Science our aim is to help researchers work in the most effective way possible, overcome the myriad challenges they face, and maximise the value of their efforts. As part of our outreach efforts we recently announced our new Spotlight series of community-based events, themed around some of the pain-points researchers experience and the ways they can be addressed.

On 26 February we launched the series with our first event, ‘Open Data For Researchers: the obstacles and the opportunities’. Funder mandates, the open research movement and new technological innovations have all led to open data becoming a topic of considerable importance within academia.

But what does it all mean for researchers? Our event explored the ‘what’, ‘why’ and ‘how’ of open data for researchers, helping them to understand the benefits as well as examining concerns about potential risks.

The event started with a panel of open data specialists from research and publishing (the latter largely former researchers as well). We then opened the floor to the audience where a lively discussion and debate ensued.

Between presentations by our panelists and the Q&A session that followed, several key themes emerged:

  • Easy access to data is important to the advancement of science;
  • That said, researchers are worried about being scooped if they make their data available before they’ve published in a scholarly journal;
  • Data published in data journals or publicly posted is not accorded the impact of a journal article, even though many feel it should be;
  • As a result, researchers are concerned they won’t get credit for making their data available or reproducible;
  • Even if one does make data reproducible, will anyone do anything with it? How many grants have been awarded to reproduce someone else’s experiment? and
  • Some data, such as clinical trial data, other medical records, or competitively sensitive outputs, should not be made open.

So while the concepts of data sharing and reproducibility were generally seen as good, there was a sense that the effort requires leadership to ensure the widespread adoption needed to fulfil the goals of the open data movement.

We were delighted by the quality of presenters at the event. Following is a short summary of the panelists’ presentations. Ross Mounce, postdoc from the University of Bath, opened the proceedings with an introductory overview of open data, providing a succinct and clear definition of what it really means.

'Open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness),' he said. Mounce explained his view that opening up data is about making the most of its potential. It is often the case that the best thing to do with your data will be thought of by someone else. It has also been shown that papers with open data get cited more.

Andrew Hufton, managing editor at Scientific Data, Nature Publishing Group’s new open-access publication for descriptions of datasets, gave a brief introduction to data journals and what, in his view, they can offer of value to researchers. He presented three principles which he argued should form the basis of a data journal:

  • Data must be well described before others can use it and benefit from it;
  • Scientists who share data in a reusable manner deserve credit through citable publication; and
  • Quality of data is important.

Hufton summed up his talk with a call-to-arms, encouraging researchers to preserve their data, to encourage its reuse and to get credit for it.

Amye Kenall, journal development manager for open data at BioMed Central, then spoke about GigaScience, BioMed Central’s online open-access open-data journal for very large datasets. Her main focus, however, was on a new initiative to bring open contributorship badges to science.

She argued that we need to re-imagine the way we value different research outputs and research skills. As things stand, the article is still seen as the most valuable output. This must change in order to encourage data sharing, and the way to do so is to ensure researchers get credit for sharing data. Kenall explained how badges are used to classify and recognise different skills within Stack Overflow’s online community and argued that academia badly needs a similar scheme.

Michael Markie, associate publisher at F1000 Research, gave a talk titled ‘Getting the Most Out of Research Data’. He explained some ways to help make data usable and reproducible. He stressed the importance of usable, non-proprietary formats, as well as detailed specifications of the methods, software and software parameters needed in order to generate and analyse the data. In essence, the more information about a dataset the better. Markie concluded his talk by making the argument that the article as we know it needs to change. In his view the article of the future should be designed to fit how research is actually done, not the other way around.
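Markie's points about non-proprietary formats and recording methods, software and parameters can be illustrated with a small sketch. The example below is not from his talk; it is a minimal illustration in Python, and the helper name `save_with_metadata`, the file names and the metadata fields are all hypothetical choices made for this example. It writes the data as plain CSV and stores a JSON description next to it.

```python
import csv
import json
import platform
import sys
from pathlib import Path


def save_with_metadata(rows, fieldnames, out_dir, method, parameters):
    """Write rows to a CSV file plus a JSON sidecar describing how they were made.

    Hypothetical helper for illustration only; the file names and metadata
    fields are assumptions, not a standard.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Plain CSV: a non-proprietary format readable by almost any tool.
    data_path = out_dir / "data.csv"
    with data_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # Sidecar file recording the method, software and parameters used.
    metadata = {
        "data_file": data_path.name,
        "columns": list(fieldnames),
        "method": method,
        "software": {
            "python": platform.python_version(),
            "platform": sys.platform,
        },
        "parameters": parameters,
    }
    meta_path = out_dir / "data.metadata.json"
    meta_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return data_path, meta_path
```

A call such as `save_with_metadata(rows, ["sample", "value"], "output", "Describe the protocol here", {"threshold": 0.5})` would leave two plain-text files in `output`, which someone else could open and interpret without the original software.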

Alan Hyndman, from Figshare, spoke on ‘The Unforeseeable Benefits of Sharing Data’. Hyndman briefly gave the Figshare backstory, explaining how the company’s founder Mark Hahnel was frustrated at not being able to publish the videos generated as outputs of his research. He wanted to be able to share all of this data, so he created Figshare. Hyndman shared several impressive examples of how researchers shared data that ended up being used in ways they would never have predicted. For example, files containing 3D scans of the world’s largest dinosaur were uploaded to Figshare, and were seen by people all across the world, used with 3D printers and turned into full CGI animations!

Finally we heard from Tom Pollard, PhD student at University College London, who spoke about the needs of the research community, from his perspective, highlighting some of the big challenges around the sharing of data, especially from a medical perspective. He explained how valuable clinical data is often neglected, to the point that it often no longer exists. Even if it is archived, it’s not at all easy to find and reuse. This is a real problem because it’s a barrier to medical progress.

Another key challenge that he discussed was credit, something touched on by several of the speakers. At the moment researchers get practically no credit at all for investing effort and time in making data clear and reusable. As many of the speakers argued, this is something that fundamentally needs to change. Pollard explained that the current pressure to publish, with a lack of credit for sharing data and code, leads to many talented people leaving academia.

After the evening’s talks there was a question and answer session with the audience and the panel. Many interesting topics came up in the discussion, perhaps the biggest of which was the issue of ‘scooping’. Many researchers worry that sharing their data before they’ve published articles featuring that data could lead to other people taking the credit for their work.

Overall, the evening was a great example of what we’re trying to encourage and support at Digital Science. Different stakeholders in the research community were able to come together and share their perspectives on an important challenge that researchers are facing. This was the first event in our Spotlight series and we would welcome any feedback from those who attended. We look forward to the next one, so keep an eye out for it.