September 19, 2017

Publishing@W3C: The convergence is well underway

I’ve written and spoken a lot over the past few years about the coming convergence of EPUB and the Web. This is finally happening, and it’s probably the most important development in the publishing ecosystem since the Web itself was created.

It will affect all publishers—in a good way. Here’s the story.

Why we have to publish in multiple formats today

Currently, most of our publications are not actually Web publications—unless you consider simple ones like blog posts and news articles as publications.

We find publications over the Web; we obtain publications over the Web; we use Web technologies to create publications; but the publications themselves are in formats like EPUBs or apps or, of course, print. This includes PDFs. Yes, you can click on a link in a website and open up a PDF, but that’s because the browser fires up a PDF reader to render it for you. PDF is its own binary format: it is not native HTML, and it is not rendered via CSS. HTML and CSS are Web technologies. PDF is not.

We may consume parts of a publication on the Web—an article in a magazine, for example, or a chunk of a textbook—but the publication as a whole is not a single “thing” on the Web. Those publications are typically on a platform that uses the Web for delivery, but those platforms are not the Web.

Our publications today are not Web publications—we just get them via the Web.

This means publishers today have to create publications in a variety of formats. We make PDFs to send to printers and to deliver to users in a pre-paginated form that works kind of okay on laptops and tablets (if they’re simple, and if you don’t mind the annoyance of panning and zooming) but that fails miserably on phones—and that is not fully accessible (often hardly accessible at all). We make EPUBs that work much better on mobile technology and e-readers and assistive technology for accessibility, but which require special applications to be read. We make platforms that can deliver HTML versions, rendered in browsers or apps via CSS—which can use pretty much the same HTML files as those in the EPUBs, but which can’t use EPUB’s XML-specific features (more on that below) and which are not bundled together in a package the way EPUBs are. It’s a pain, creating all these separate formats for the same publication.

EPUB is webby, but not webby enough

Especially when EPUB 3 came along, EPUB was very firmly committed to being based on Web technologies. It no longer used a special profile of CSS; it just used CSS as defined by the W3C. Its content documents were all based on Web technologies, including using HTML5 as defined by the W3C. Video and audio were as defined by HTML5. Scripting was JavaScript. And instead of freezing those specs at a point in time, EPUB was explicitly designed to stay in synch as the W3C evolved those underlying specifications.

Nevertheless, EPUB is still non-Webby in lots of ways. Those HTML files? They have to be expressed as XML (it’s called XHTML, the XML “serialization” of HTML), whereas the Web overwhelmingly uses content expressed as non-XML HTML (the HTML serialization). So although most of the content in most EPUB content documents will render (with an appropriate CSS) in a browser, there are aspects of EPUB content documents that are foreign to browserland: techy things like namespaces and the epub:type attribute that adds important semantics. Plus EPUB requires a bunch of other things like a package file with a lot of EPUB-specific metadata, and a manifest, and a spine. That is because although an EPUB is a single .zip file, inside that file are a whole bunch of documents and images and fonts and scripts and metadata and other things. Those things are all separate things on the Web; EPUB bundles them together to make them one thing: a publication.
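To make the package file, manifest, and spine a little more concrete, here is a minimal Python sketch that builds a tiny EPUB-like .zip in memory and then reads the reading order back out of it. The file names, identifier, and title are made up for illustration, and the sample is deliberately incomplete (it contains no actual chapters and would not pass EPUB validation); the function names `build_sample_epub` and `reading_order` are hypothetical helpers, not part of any EPUB library.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the container file and the package document.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# container.xml tells a reading system where the package file lives.
CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/package.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

# The package file: metadata, a manifest of every resource, and a spine
# giving the default reading order. All values here are illustrative.
PACKAGE_OPF = """<?xml version="1.0"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Sample Book</dc:title>
    <dc:language>en</dc:language>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>
    <item id="css" href="style.css" media-type="text/css"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>"""


def build_sample_epub() -> bytes:
    """Bundle the sample files into one in-memory .zip (hypothetical helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry comes first and is stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/package.opf", PACKAGE_OPF)
    return buf.getvalue()


def reading_order(epub_bytes: bytes) -> list[str]:
    """Return the spine as a list of hrefs (relative to the package file)."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        package = ET.fromstring(zf.read(rootfile.get("full-path")))
    # The manifest maps ids to files; the spine lists ids in reading order.
    manifest = {item.get("id"): item.get("href")
                for item in package.findall("opf:manifest/opf:item", OPF_NS)}
    return [manifest[ref.get("idref")]
            for ref in package.findall("opf:spine/opf:itemref", OPF_NS)]


if __name__ == "__main__":
    print(reading_order(build_sample_epub()))
```

Notice how much of this machinery (the container file, the package file, the id lookups) exists only to say "these files belong together, in this order." None of it is something a browser understands on its own.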

Averting a potentially catastrophic collision

As EPUB 3 evolved in the EPUB 3 Working Group (the EPUB WG) of the IDPF (the International Digital Publishing Forum, the organization that governed the EPUB standard), steps were taken to make it ever more Webby, and to move away, wherever possible, from things that were obstacles to making EPUB purely a Web technology. Why should publishers have to create two separate formats, one for online and one for EPUB? The ultimate vision was to get to a future version of EPUB that just worked on the Web. The same EPUB file, online or offline.

At the same time, the W3C (the World Wide Web Consortium, which governs most of the standards of the Open Web Platform like HTML, CSS, and many others) realized that its standards didn’t really accommodate the complexities and nuances required by publications. It surprises most people in publishing that the Web wasn’t really paying all that much attention to the publishing industry. It was focused on communication and commerce; it really didn’t have (in fact still doesn’t have) the ability to create a single entity as complex as most publications. All those components of a publication (chapters in a book, images, videos, etc.) are separate things on the Web. They may link to each other, but they are not gathered together to constitute a whole publication.

So the W3C created the Digital Publishing Interest Group (referred to as the DPUB IG), which was a group of technical folks working to create an official W3C standard (called a Recommendation) for a Web Publication, including a packaged version: a publication that would work online or offline.

Can you see what a disastrous collision we would have had if the IDPF came up with a future EPUB that worked online and offline, and the W3C came up with a specification for a Web Publication that worked online and offline, and they weren’t the same?

Fortunately, many of the people working on both the EPUB WG in the IDPF and the DPUB IG in the W3C were the same people. Those groups worked very closely together to try to avert this collision.

A brief history of the convergence

The formal collaboration began with a paper delivered at the 2014 Books in Browsers conference, co-authored by Markus Gylling (then chair of the EPUB WG) and Ivan Herman (W3C staff liaison for the DPUB IG), called “Bridging the Web and Digital Publishing.” That document evolved throughout 2015 and 2016, resulting in a W3C Editor’s Draft in November 2016 called “Web Publications for the OWP.” (OWP is the Open Web Platform.)

At the same time, the leadership of both the W3C and the IDPF were working closely together to align the two organizations to make sure the Web world and the publishing world would stay in synch. This resulted in the IDPF becoming part of the W3C in February 2017, and the creation within the W3C of a major new initiative called Publishing@W3C.

This is a really major commitment on the part of the W3C to bring the publishing industry formally into the development of the Web and to make EPUB a first-class citizen of the Web. As Jeff Jaffe, W3C’s CEO, said at the time, “W3C is thrilled to gain the expertise of the publishing industry, with its rich tradition of excellence in developing many forms of content for books, magazines, journals, educational materials, and scholarly publications.” I was involved in all this, and I can attest to the sincerity of that statement: the W3C really appreciates the publishing industry and genuinely needs its involvement to advance the Web.

This is based on the realization that a publication on the Web has to be more than just a document, or a collection of linked documents. It’s an arbitrarily extensive and complex collection of resources on the Web (Web pages, CSS, fonts, images, media, scripts, etc.) that has an identity, and that can be referenced and cited and interchanged as a coherent entity. It needs to work both online and offline, whether cached or formally packaged like EPUB. Not separate things, not separate formats. The same thing, working in all those different states.

This is also done with full appreciation of how essential EPUB has become to the publishing ecosystem. EPUB is the basis for virtually all e-readers; it’s the proper format for the interchange of accessible content; educational platforms are built on it; it’s used for all kinds of publications; and it’s widely adopted all over the world.

So how is this working?

To ensure that the publishing industry is brought into the development of the Web, and to ensure that EPUB is maintained and evolves with the development of Web Publications, the W3C has created three new units, grouped under the “Publishing@W3C” umbrella.

  • The Publishing Business Group is responsible for strategic direction. It’s composed mainly of people focused on the business issues (hence the name), initially mainly but not exclusively folks that had been involved in the IDPF. (The W3C is working diligently to get more people from more sectors—education, scholarly publishing, magazines—involved.) It not only provides direction to the other two new groups, it also represents the needs of the publishing industry to all groups within the W3C as a whole.
  • The Publishing Working Group is a technical group, currently composed mainly of the people who were previously on the IDPF EPUB WG and the W3C DPUB IG. It’s chartered to produce, within three years, three specifications: for Web Publications, for Packaged Web Publications, and for EPUB 4. The idea is that all of these will be fundamentally the same, but with Web Publications being the most general; a Packaged Web Publication being a special kind of Web Publication; and EPUB being a special type of Packaged Web Publication, with certain aspects required for a stable publishing supply chain (for example, not packaged in just any possible way, but in a specific way).
  • The EPUB 3 Community Group is charged with maintaining EPUB 3. Among other things, that means working with other groups within the W3C to address specific near-term needs, such as improvements to CSS to make it possible to render publications with the sophistication we enjoy in print, and looking after issues like accessibility and educational publishing.
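The "special kind of" relationship among the three planned specifications can be pictured as a simple type hierarchy. This is only an illustrative sketch in Python, not anything drawn from the specifications themselves: the class names, fields, and default values below are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class WebPublication:
    """The most general case: a set of Web resources with one identity."""
    identifier: str
    resources: list[str] = field(default_factory=list)


@dataclass
class PackagedWebPublication(WebPublication):
    """A Web Publication that has also been bundled into a package."""
    packaging: str = "unspecified"  # hypothetical field: any packaging allowed


@dataclass
class Epub4(PackagedWebPublication):
    """A Packaged Web Publication packaged in one specific, agreed way."""
    packaging: str = "epub-specific"  # hypothetical value


if __name__ == "__main__":
    book = Epub4("urn:example:book", ["chapter1.html", "style.css"])
    # Every EPUB 4 is a Packaged Web Publication, and every Packaged
    # Web Publication is a Web Publication, but not the other way around.
    print(isinstance(book, PackagedWebPublication), isinstance(book, WebPublication))
```

The point of the sketch is the direction of the relationship: anything that can handle a Web Publication in general should, in principle, be able to handle an EPUB, while the supply chain can still count on the stricter guarantees of the most specific type.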

While W3C membership is required to belong to the Publishing Business Group and the Publishing Working Group (including a special two-year membership category created for the former members of the IDPF), membership in the EPUB 3 Community Group is free to all.

I have the privilege of participating in all three of these groups, including the Steering Committee of the Publishing Business Group, which gives me a first-hand view of this work as it evolves. I can tell you, most emphatically, that it’s working. This is a very exciting time for publishing. The convergence of EPUB and the Web is well underway!

You can see a presentation I gave on Publishing@W3C here.

About Bill Kasdorf

Bill Kasdorf is VP and Principal Consultant of Apex Content and Media Solutions. Past President of SSP, he is a recipient of SSP’s Distinguished Service Award, the IDEAlliance/DEER Luminaire Award, and the BISG Industry Champion Award. Bill serves on the Steering Committee of the W3C Publishing Business Group, on the W3C Publishing Working Group developing the next generation of Web Publications and EPUB, and on the International Press Telecommunications Council; is Chair of the BISG Content Structure Committee; and is an active member of the Accessible Books Consortium (ABC), the EDUPUB Alliance, and the IDEAlliance Tech Council. Bill has spoken at many industry events, such as SSP, STM, AAUP, DBW, O’Reilly TOC, NISO, BISG, IDPF, IPTC, Seybold Seminars, and the Library of Congress. He serves on the editorial boards of Learned Publishing and the Journal of Electronic Publishing. In his consulting practice, Bill has served publishers such as Pearson, Wolters Kluwer, Kaplan, Sage, Harvard, Toronto, Taylor & Francis, Cambridge, ASME, and IEEE, and organizations such as the World Bank, the British Library, OCLC, and the European Union.