January 8, 2018

Questions about publishing workflow and automation answered

We at Apex spent the better part of a quarter conducting a survey of the publishing world to understand where the industry stands with regard to production processes, automation, XML workflows, and general technology adoption. The results were enlightening!

The results of the 2017 survey were shared in a webinar with Bill Kasdorf and Greg Suprock and also in a report you can download here. Attendees asked many great questions during the webinar; we have answered a few of particular interest in this blog for additional insight.

As always, if this strikes a chord, we are happy to have a conversation. Contact us.

Q&A from Webinar: Survey Results: Publishers’ progress toward workflow & automation

Q: How was XML-first workflow defined? Editing in XML or starting production with XML?

A (Greg): XML-first workflow was defined as getting to a structured document at the start of production. It may or may not mean that the manuscript is transformed to XML. A Word manuscript's underlying structure is already XML, so with proper care in using Word, combined with a proper styling template and document clean-up tools, it is possible to continue editing in Word while having XML under the hood.

We’ve seen relatively few publishers take the step to edit directly in XML. Those that do tend to produce either standards documents or technical manuals.

Q: How big an advantage will a SAAS-based system provide to publishers in terms of workflow automation?

A (Greg): SAAS-based systems are often attractive to smaller publishers or a publisher who wants to try things out without a big investment of time or money.

Depending on the pricing model and the needs of the publisher, a SAAS-based system can, in fact, be viable long-term. But typically that only works for a publisher whose needs are pretty much just what the SAAS-based system is designed to do.

More often, where there is greater volume, greater diversity, or a greater need for specialization, bringing the automation in-house becomes more attractive.

One other point, which I only mentioned in the last minute of the webinar: because the ecosystem is becoming ever more standards-based and ever more modular, this is not a black-and-white issue. It is sometimes appealing to use a SAAS-based system for one aspect of a workflow even if other aspects are handled by internal systems.

Q: Do you see a future where the publisher assigns a single identifier (say, an ISBN) to the XML file and the conversion to the end-user format is handled by the intermediary (e.g., the publisher sends XML to ProQuest and ProQuest delivers it as EPUB to the end user)? Or will publishers still need to convert to assorted formats, assign ISBNs, and deliver multiple file types to intermediaries?

A (Bill): The toolset and expertise for technical conversion are not typically the same as those of aggregators and distributors. It’s true that ProQuest and their peers do have a lot of technical expertise, so the scenario you raise is possible, though I don’t think it’s the likely future state in general.

The closest example to what you’re asking is probably Ingram’s CoreSource. But they don’t actually do the conversion; they hand that off to conversion houses.

And of course that conversion is rarely, if ever, from an XML source, though yes, I would like to see that happen. But because XML can take almost any form, there’s a lot of potential variation. Even on the STM side, where JATS/BITS is so pervasive, JATS files from different publishers vary a great deal. And your organization, though it is moving toward an XHTML5-based infrastructure, still has to generate JATS XML from that for journals (not books, of course). That’s not to say your move to XHTML5 isn’t a good one. It’s excellent, and it will definitely reduce the need for transforms and make the transforms you still need more straightforward.

In the STM world, it’s really the hosting services that are best positioned to do what you’re suggesting. In fact, the upcoming release from one of them will provide EPUBs for journal articles from source XML conforming to their specs, along with HTML for online, and the others may not be far behind. So it wouldn’t be much of a stretch for those hosts to create the variants that need to be managed. Initially, though, we’re talking about journal articles, because they’re much more consistent across journals and publishers; books are much messier and more complicated.

Two more comments:

What we’d really like to see is not having to create all those variants at all. Lisa McCloy-Kelley at PRH tells me that they do indeed send the same EPUB to all retailers. And as you know, the whole idea behind Web Publications is to eliminate the need for one file in a packaged format like EPUB and a separate one for online. Ideally, that development will reduce or eliminate the need to create multiple formats in the first place.

And remember that ISBN is a product identifier. So if you consider that initial XML a product, yes, it could have an ISBN; but the different formats generated from it still need their own ISBNs, whether assigned by the publisher or by the service provider on the publisher’s behalf. More likely, in a scenario like this the XML would get a DOI, and it would probably not be delivered as such to an end user.

Q: Do these automation tools for editorial work on top of MS Word or do they require a different CMS?

A (Greg): The tools that I described are not embedded in a CMS. They are standalone tools that either sit on top of Word or manipulate Word documents to facilitate editing. For example, Apex CoVantage’s NuanSCE is a tool designed to improve the consistency and speed of copyediting. Learn more here.