This summer I went to a Research Data Alliance meeting in London. The meeting was about persistent identifiers, and my goal was to meet like-minded people who care about data. During the meeting I got talking to Sarah Callaghan. Sarah described Patterns, the new data science journal she was setting up. After the meeting our discussion continued via email and Skype, and Sarah asked me to become a member of the academic advisory board.
Data science is an emerging discipline that is gaining more and more attention in both academia and industry. It is a multi-disciplinary field and it is not limited to data analysis. It also includes topics such as data cleaning, computational infrastructure, and the legal and policy aspects of data. This can present problems for academic researchers. How do you publish, and get credit for, work that you have done on developing data science infrastructure or policy?
In this interview Sarah describes how Cell Press are creating a new journal, called Patterns, to try to help alleviate this and other problems with knowledge sharing in data science. Sarah has a 20-year career in creating, managing, and analysing scientific data and she is Patterns’ Editor-in-Chief.
Tjelvar Olsson: What prompted the creation of Patterns?
Sarah Callaghan: Data are everywhere, and we’re producing more and more of it as time goes by. A lot of the time that data creation is intentional, like when a researcher designs and runs an experiment and collects the data. Sometimes that data creation is unintentional, like when a supermarket customer buys one brand instead of another. But regardless of how the data was created, there are uses for it, whether that’s in developing new science, or in figuring out how to market a new type of toothpaste.
One common trend across all the domains that create and manage data is this: everyone has common problems in dealing with data. Everyone, whether they’re an astronomer or a zoologist, has problems with data collection, cleaning, sharing with other researchers, understanding the legal and policy aspects of data, analysing it, and publishing it. And researchers in different domains have come up with solutions to those problems that work in their particular domain, but could also be usefully shared across domains.
That cross-disciplinary knowledge sharing is growing, but it’s not quite there yet. Patterns is all about providing a forum for researchers to share their data-related solutions, tools, methods and analyses across multiple domains. There is a lot of really exciting and innovative work out there that has not gotten out to the wider world – Patterns is here to help change that!
TO: That sounds fantastic. I certainly know the feeling of struggling to find out how others have tackled problems that I’m facing working with scientific infrastructure and data management. It would be great to be able to read about solutions and lessons learnt by others.
I’ve never met anyone who has set up a new journal before. What are the challenges involved in this?
SC: Lots! First and foremost, the main challenge is getting the word out. People can’t submit their articles to a journal that they don’t know exists. So (at the risk of going all marketing-speak) building a journal brand is important. That includes setting a scope and aims that will suit the journal audience, and recruiting an advisory board who will promote the journal in their own networks.
Getting people enthused and interested in the journal is also vital, especially in the case of Patterns, where we’re bringing together different communities into a new, more-inclusive group. Data are fundamental to research, regardless of what your domain is, so Patterns is bringing together computer scientists, data stewards and engineers, and researchers in data-intensive domains in order to share solutions and knowledge.
Commissioning papers for a new journal is also a challenge. Because Patterns is new, there can be a bit of convincing required to get authors to submit their articles to my journal, rather than a more established one. This is where Patterns’ cross-disciplinary focus and open access nature have added value – they allow researchers to reach readers outside their usual domains.
From a personal point of view, setting up a new journal means a lot of travelling to conferences and meetings, and even more talking to people about their research in order to commission papers (which, to be fair, I do enjoy). And email. Lots of email!
TO: It sounds like you get to talk to lots of people about data science. I think this means that you have your finger on the pulse in this field. How do you think data science and management will develop over the next ten years?
SC: I think there’ll be a lot more of it, and there will be different variations in the roles and job titles associated with data. At the moment, a “data scientist” role can cover a wide range of skills and talents, and as a title, it means different things to different people.
I also think that we’re on the cusp of a change in the way that data is produced and dealt with. The closest analogy I can think of is the industrial revolution, where goods moved from being produced as piecework, done by individuals, to being produced in factories. Historically, with data, datasets have been hand created in their own formats by individuals or small groups. The landscape has moved to large scale data creation, and to deal with the issues that come out of that, you need things like infrastructure and standards to drive tools and services.
Academics aren’t the only ones doing research into data science – there is a lot of very interesting and exciting work being done in the business domain. I expect, in the next decade or so, that we will see more of the innovations developed by business to work with data rolled out more widely across research. This is already happening with advances in computer vision, for example. And it’s only a little stretch to see how the same artificial intelligence network that can count people in a crowd could be repurposed to count antelope in a herd.
TO: What types of manuscripts should people be submitting to Patterns?
SC: I am always looking out for exciting, innovative original research where a data science solution has been applied to a problem in a research domain, and that solution has the potential to be applied to different domains too. The solution doesn’t have to be complete; in fact, Patterns has developed a Data Science Maturity Level scale in order to help readers understand what stage the research is at.
Patterns also publishes descriptor articles – which are papers that describe a data science resource, whether that’s a dataset, piece of software, infrastructure, workflow, algorithm, even a piece of hardware. As long as the resource can be uniquely and unambiguously identified and is useful to the wider community, then an article about it is in scope. This allows the researchers who spend their time building, for example, infrastructures, to gain academic credit for their work.
I am also interested in opinion pieces and reviews on topics of interest to the community. Reviews can be on the literature around a certain topic in data science (e.g. GANs, blockchain, knowledge graphs, etc.) or can be on types of software and tools, highlighting their strengths, weaknesses and uses for the community.
Fundamentally, I want to publish interesting, exciting and innovative work that people from a wide range of domains want to read!
TO: Where can people find out more about Patterns and how to submit manuscripts to it?
SC: We have a very pretty and informative website at http://www.cell.com/patterns where you can find all the information needed by authors to write and submit their article. This includes details of the article types, and the aims and scope of Patterns. There’s also a link on that page to the system where you can submit your manuscript, and another link so that you can get the journal e-table of contents delivered free to your inbox when each issue is released.
We’re on Twitter too (@Patterns_CP) where we’ll be promoting our content and also sharing cool and interesting data science things (and pretty pictures of interesting patterns I come across when out and about).
And of course, if the readers of this interview have any other questions, or want to discuss whether or not their research is suitable for Patterns, then I’d be very glad to hear from you! My email is email@example.com
I’d just like to finish up by saying that the future for data science is bright – let’s make it together!