November 27, 2017

I’ve Seen the Future of STM Publishing

When we think about the future of publishing, we tend to focus on one development at a time, each affecting one aspect of publishing. Each of those developments is indeed important, and many of them have transformed how we do the particular part of publishing that each of them is about.

Here are a few examples, all so obvious in hindsight that I don’t need to explain them; I can just list them:

• PostScript, and then PDF
• SGML, and then XML
• The Web
• DOIs and Crossref
• Open Access

I could go on, but you get the point.

Yet despite all that change—each of those was a watershed in its way, and all of them are now taken for granted—we still pretty much publish the way we did in the first place.

Take scholarly journal articles, and especially STM journal articles. Teams of scientists get grant money to fund research; that research leads to one or more papers documenting the research and its results; they submit those papers to publishers; publishers vet the articles via peer review; accepted articles are polished for publication; publishers distribute them—generally in pages that pretty much all look alike—to libraries or directly to readers and researchers.

Before you jump on me about all the things that are different—sure, now many of those articles are distributed via Open Access, which means their costs of publication are paid for up front rather than via subscription revenue, and sure, we mostly get them article-by-article now rather than packaged in issues, and sure, we don’t get them in paper much anymore, we get them over the Web. But until very recently, the only way we got most of them over the Web was as a PDF that we printed out (which is still how most articles are obtained). People like pages.

So I would argue that they still aren’t all that different than when I started my career back in the 1970s, or even a generation before that; they are not even that different from articles in Philosophical Transactions back in the 17th century. They still basically represent a report of one main thing the researcher wants to report about the research (almost always positive results, because negative results don’t get published much) on a few pages that follow a pretty consistent form, because we’re still stuck on PDFs, which are just pages without paper until we print them out.


It’s comforting, isn’t it? We’re just doing the same old thing, just in better and better ways. No need to panic.

Well, I think all of that is in the midst of a real sea change. In fact I would characterize it as a sort of phase change (you remember that from high school physics, right?). We are on the verge of fundamentally publishing the research, not just articles about it. Publishing in a whole new way, thanks to a number of trends that are all taking shape at the same time.

Here are some key ones, in summary form. I realize I’m sticking my neck out, but I want you to see the big picture first. I’ll write other blogs about some of these topics in more detail, with concrete examples. But here are the trends that I think, together, are fundamentally transforming STM publishing.

The blurring lines between funders and journals

The funders of research want their money to go far and fast. Researchers need publication: it’s how science works, since all research is based on other research. They both need attention: that’s why the prestige of the journal in which articles are published is so important. If nobody sees the article, the funder has wasted its money and the researcher’s career makes no progress.

There are already examples, especially in biomedical fields, where funders are actually becoming publishers themselves. An interesting recent example is Gates Open Research, from the Bill and Melinda Gates Foundation in partnership with F1000, Altmetric, and others. I will get into more detail in another blog, but what this promises to do is to publish fast—all articles published within a week—with transparent Open Peer Review, with all outputs from the research available (data, software, etc.), and with everything citable.

This is an extreme example, of course. And to wipe the sweat off the brows and diminish the lumps in the throats of the publishers who see this as the beginning of the end, I would stress that this is not cause for panic. Why? Many reasons. It’s just one funder, publishing a tiny portion of the literature. Not everybody likes transparent Open Peer Review. Researchers still aim for Nature and Science and Cell (and all the other prestige-granting journals that get them tenure—and future funding).

Most importantly, I don’t believe funders really want to be publishers. They just want to get the results of the research they funded out there fast—all the results. And publishers, frankly, don’t really relish all the mechanics of publishing; that’s not what they’re about. So one aspect of this next generation of STM publishing, I submit, will be a more productive partnership between funders and journals to optimize the process.

Stay tuned for a blog that will go into more depth about this.

An open, modular, interoperable publishing ecosystem

The tools and technologies we use in the publishing workflow are still only loosely joined together—and sometimes not really joined at all. As an article moves from authoring to submission to peer review to editing to production to publication, it often gets handed off from one proprietary system to another.

Lots gets lost along the way. I deal with this in my consulting work. There’s a lot of doing, undoing, and redoing in most workflows. Every handoff—between systems, between authors and publishing staff and freelancers and vendors—creates friction, and is a source of mistakes, and is a cause for what I fondly call “loopy QC.”

There have been many attempts to create “end-to-end” workflow management systems, and some of them really do a good job. But monolithic systems are difficult to acquire, difficult to implement, and difficult to maintain.

What we’re seeing now is a much more streamlined, modular architecture based on open source software and web standards. Instead of one company building the system-to-end-all-systems (or trying to), developers are using open source software and the Open Web Platform to address a part of the workflow that they have particular expertise in, or have come up with a particularly good solution for. They create a module that just does that thing really well—and in a way that plugs into the workflow because the other components, developed by others expert in those other things, have developed the other pieces based on the same open, web-based architecture.

It’s all about interoperability. The result is an organically evolving technical infrastructure that innovates faster and that costs less to implement than the traditional model. The most prominent advocate of this approach that I know of is the Collaborative Knowledge Foundation; they’ll be the topic of another future blog.
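
To make the modular idea concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular platform is built: the module names (ingest, assign_reviewers, typeset) and the plain dictionary standing in for a shared, open document format are all assumptions of mine. The point is just that when every module accepts and returns the same open representation, modules from different developers can be chained or swapped freely.

```python
from typing import Callable

# Assumption: a shared, open document representation is what makes modules
# interchangeable. A plain dict stands in for that format here.
Document = dict
Module = Callable[[Document], Document]


def ingest(doc: Document) -> Document:
    """Hypothetical ingestion module: normalizes whitespace in the title."""
    return {**doc, "title": " ".join(doc["title"].split())}


def assign_reviewers(doc: Document) -> Document:
    """Hypothetical peer-review module: records who will review."""
    return {**doc, "reviewers": ["reviewer-1", "reviewer-2"]}


def typeset(doc: Document) -> Document:
    """Hypothetical production module: renders a trivial HTML version."""
    return {**doc, "html": f"<h1>{doc['title']}</h1>"}


def run_workflow(doc: Document, modules: list[Module]) -> Document:
    """Pass the document through each module in order."""
    for module in modules:
        doc = module(doc)
    return doc


published = run_workflow(
    {"title": "  A   study of  something "},
    [ingest, assign_reviewers, typeset],
)
print(published["html"])
```

Replacing typeset with a different developer's production module requires no change to the other pieces, as long as it honors the same input and output format. That is the interoperability argument in miniature.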

Don’t get me wrong; we are in the very early stages of this. This is not how things are mostly done today. But enough has happened within the past couple of years that it has become clear to me, at least, that this is the future.

Publishing all of the results

I’ve mentioned many times in this blog that a journal article is typically about one thing, usually a hypothesis that a researcher set out to test and which the research indicated is the case. This is the tip of a very big iceberg. Under the waterline are all the results the researcher chose not to report on, all the data generated by the research, all the software and other tools used to conduct the research, and all the work of a whole team of people (many of whom are credited as contributors, though the PI—Principal Investigator—is who gets most of the glory).

What’s starting to happen is a trend toward publishing all of that. Making the data collected in the course of the research—all of the data—openly available. Making the software used to generate and analyze that data openly available too. This can lead to some wonderful results.

First of all, it helps make the research reproducible. It has long been a secret shame of STM research, and recently a more openly acknowledged crisis, that a shockingly low proportion of experimental results (perhaps only about a third) can be replicated. Does this mean the research is bogus and the results are wrong? Of course not. But it’s pretty queasy-making, isn’t it? Making the data and software available is a step in the right direction. (We still have to work on the incentive to replicate research: publishing still privileges novelty, though megajournals like PLOS ONE are helping to remedy that.)

Another big benefit of publishing the research data is that new research can be done based on it. The New England Journal of Medicine recently ran a “challenge” to come up with a novel finding from a particular set of research data they made available from a recent article. The results are astonishing: really useful results were generated that were not at all what the original research was about, but which were enabled by the data that the original research generated.

Yep, I’ll do another blog on this.

Connecting everything and everybody to everything and everybody

This is the holy grail: linked open science based on the semantic web. Creating what is called a “knowledge graph” based on semantic technologies like RDF triples, using open and referenceable taxonomies, with persistent identifiers for people, places, things, which enables not just the point-to-point linking of this-known-thing to that-known-thing, but inference. Making connections that were not hard-wired.
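
For readers who want to see what "inference" over triples means in practice, here is a toy sketch in Python. Everything in it is hypothetical: the identifiers, the predicate names, and the two rules are made up for illustration, and a real knowledge graph would use full URIs and an RDF toolkit rather than a set of tuples. It shows only the core idea, that new links can be derived from existing ones rather than hard-wired.

```python
# Each fact is a (subject, predicate, object) triple. All identifiers and
# predicates below are hypothetical placeholders, not a real vocabulary.
Triple = tuple[str, str, str]

facts: set[Triple] = {
    ("article:A", "reports_on", "protein:P1"),
    ("protein:P1", "associated_with", "disease:X"),
    ("article:A", "has_author", "person:alice"),
    ("dataset:D", "supports", "article:A"),
}


def infer(triples: set[Triple]) -> set[Triple]:
    """Apply two toy rules repeatedly until no new triples appear."""
    known = set(triples)
    while True:
        new: set[Triple] = set()
        for s, p, o in known:
            for s2, p2, o2 in known:
                # Rule 1: an article reporting on a thing is relevant to
                # whatever that thing is associated with.
                if p == "reports_on" and p2 == "associated_with" and o == s2:
                    new.add((s, "relevant_to", o2))
                # Rule 2: a dataset supporting an article is relevant to
                # whatever the article is relevant to.
                if p == "supports" and p2 == "relevant_to" and o == s2:
                    new.add((s, "relevant_to", o2))
        if new <= known:
            return known
        known |= new


inferred = infer(facts) - facts
for triple in sorted(inferred):
    print(triple)
```

Nobody stated that dataset:D is relevant to disease:X; that connection falls out of the other facts. Scale the same idea up to millions of triples with shared identifiers and taxonomies, and you get the kind of discovery the paragraph above describes.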

This is the basis for artificial intelligence and machine learning. We use it every day now, every time we Google something, or see an ad on Facebook, or look for something on Amazon. Many of the biggest companies today are AI-based.

The great thing about this technology is that it gets better and more powerful the more content it has to work with. One implication for STM publishing is that when all that data and software and reporting of results lives “in the cloud,” semantic technologies, metadata, and persistent identifiers will enable it to be dramatically more discoverable, visible, and usable than it is today. It will no longer be trapped in the publication silos we have today.

In the STM publishing sphere, I’d point to a couple of examples: Yewno, which is essentially an inference engine, and SciGraph, which is a giant knowledge graph of scientific and scholarly information. Yet another blog to come on this.

The future I see for STM publishing

So here’s what I think the future of STM publishing is going to look like:

• Funders and journals will become true working partners in the publishing enterprise.
• Studies, and the protocols they use, will be publicly registered at the outset.
• The participants, their institutions, and all the “stuff” will have persistent identifiers (ORCIDs, ISNIs, DOIs, etc.).
• The processes used will be open.
• The data generated from the study will be open.
• The results will be open.

Marcus Munafò, a researcher who spoke at the recent ALPSP Annual Meeting, articulated much of that in the answer to a question in the Q&A after his talk, and I loved the way he put it: “Wrap a DOI around all of that; that’s a publication.”
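
One way to picture that "wrapped" publication is as a single record that ties the pieces in the list above together. The Python sketch below is my own illustration, not a real metadata schema: the class names, the field names, the set of required output kinds, and the placeholder DOI and ORCID values are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchOutput:
    kind: str          # e.g. "protocol", "dataset", "software", "results"
    identifier: str    # persistent identifier (placeholder values below)
    is_open: bool = True


@dataclass
class Publication:
    doi: str                                                # the wrapper
    contributors: list[str] = field(default_factory=list)  # ORCID iDs
    outputs: list[ResearchOutput] = field(default_factory=list)

    def missing_kinds(self) -> set[str]:
        """Return required output kinds that are not yet openly available.

        The required set is an assumption made for this illustration.
        """
        required = {"protocol", "dataset", "results"}
        return required - {o.kind for o in self.outputs if o.is_open}


# Placeholder identifiers only; none of these resolve to anything real.
study = Publication(
    doi="10.0000/example.study",
    contributors=["0000-0000-0000-0000"],
    outputs=[
        ResearchOutput("protocol", "10.0000/example.protocol"),
        ResearchOutput("dataset", "10.0000/example.dataset"),
        ResearchOutput("software", "10.0000/example.software"),
    ],
)
print(study.missing_kinds())
```

In this example the protocol, data, and software are registered but the results are not yet, so the record reports that one piece is still missing. The publication is the whole bundle, not just the article.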

I know, I know, there are lots of details to be worked out and a lot of transition to accomplish. But I’m convinced that this is what STM publishing will be in the decades ahead. In any case, I sure hope it is.

A shout-out to the 2017 ALPSP Annual Meeting, where many of the sessions, taken together, helped me crystallize my thinking on this. I will cite specific sessions in the follow-up blogs I’ve promised to write.