July 26, 2017

Why markup matters

I’ve been so involved with markup for so many years (decades!)—as a designer, as a copyeditor, as a typesetter, as an XML advocate and educator, and now as a consultant and a contributor to standards development—that I take the value of good, consistent, standards-based markup as self-evident. What could be more obvious?

So I was a bit startled to be asked to put together a session for the recent AAUP Annual Meeting (Association of American University Presses) to explain what the point of it is.

I have spoken on markup often at AAUP meetings over the years, but those sessions were mainly teaching sessions: teaching people about XML, HTML, EPUB, all the pointy-bracket stuff.

This session was to have no codes, no tags, no actual markup. (What?!?! Impossible!)

What, exactly, is the point of markup?

The session was not for the production and digital folks who take markup for granted as much as I do. Instead, it was for all the other folks who have been told that markup is important—and more recently that XML, especially the somewhat squishy concept of “XML First,” is essential to the future of the enterprise. The folks who think, “Okay, but why, exactly? We get Word files from authors; we typeset them with InDesign; we make nice PDFs; why do we need markup?”

I’m a detail person. I actually love the complexity, the ability to make ever more granular and semantic distinctions, involved in content modeling and markup. But I do have a tendency to get into the weeds. Here was an opportunity to stay weed-free.

So here’s the basic point:

Markup is fundamentally just identifying and naming the components of your content.

What most people don’t stop to think about is that design and typography are really implied markup. The big number at the top of a new page in a book is a chapter number, right? And the big type under or beside that is the chapter title, and the smaller phrase in italic right under it is a subtitle. The separate blocks of text—separated by spaces, or indicated by indents—are paragraphs.

Those components (titles, paragraphs, subheads, footnotes, and all the rest) need to look different so we can tell them apart and understand their relationships. All the paragraphs probably look the same, but the ones following a small italic subhead are probably a subsection of the section that began with a bigger, bolder subhead. Voilà, we’ve got structure!


Typographic conventions are what enable us to do that. They’re just obvious, right? We don’t need markup to tell us what the components are because we’ve been trained by years of reading to know ’em when we see ’em. (Well, obvious to those of us who can see them. Once again I need to mention accessibility. . . .)

Markup is what enables a computer to tell the components apart, and understand their relationships.

But don’t the styles in a Word file or InDesign file do that? Well, yes, they do, or at least they can. Taken in the most general sense, they are a form of markup. But the ones in the Word file are only understood by Word, and the ones in the InDesign file only by InDesign. And they can be different from book to book: designers make them up, typesetters make them up.

So what we really want to do is to identify and name the components of our content consistently, in both a human- and machine-readable way. Why is this important? Because it reduces or eliminates ambiguity: everybody in the workflow can know what they mean—and so can the computers!
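For readers who do want to see a few pointy brackets, here is a minimal sketch of what "identifying and naming the components" can look like to a computer. The element names (`chapter`, `title`, `subtitle`, `section`, `heading`, `p`) and the sample text are invented for illustration and do not come from any standard vocabulary; the `outline` helper is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the element names are made up for this example.
SAMPLE = """
<chapter number="1">
  <title>Why Markup Matters</title>
  <subtitle>A weed-free introduction</subtitle>
  <section>
    <heading>Implied markup</heading>
    <p>Design and typography imply structure.</p>
    <section>
      <heading>Paragraphs</heading>
      <p>Blocks of text separated by space or indents.</p>
    </section>
  </section>
</chapter>
"""

def outline(element, depth=0):
    """Return (depth, name, text) for each named component, in document order."""
    text = (element.text or "").strip()
    rows = [(depth, element.tag, text)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows

root = ET.fromstring(SAMPLE)
components = outline(root)

# Print an indented outline: names identify components, nesting shows relationships.
for depth, tag, text in components:
    print("  " * depth + f"{tag}: {text}")
```

Nothing here depends on how the chapter looks on the page. The names say what each component is, and the nesting says that the second section is a subsection of the first.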

What makes markup good markup?

But let’s take that definition a step further:

Good markup is identifying and naming the components of your content consistently, in both a human- and machine-readable way, according to a standard vocabulary and syntax.

Why is that so important? Because it enables other parties to already know what the components are and how they relate, using tools that may be different from the ones you use.

It also means that you can avoid reinventing the wheel. Your content speaks the language of a community, which enables interoperability.

And this enables what’s known as semantic markup: not just identifying what the components are, but what they mean, what they’re there for.

Why Web-based markup is so important

Now the final step, admittedly putting my advocacy hat back on:

Good markup is identifying and naming the components of your content consistently, in both a human- and machine-readable way, according to a standard vocabulary and syntax, especially XML, HTML, and EPUB.

These have become fundamental to the publishing ecosystem. The Web and Web technologies have become essential to content, commerce, and communication. Anything viewed in a Web browser. Blogging platforms. Tons of other Web-based technologies. EPUB 3—the content documents of which are HTML5 vocabulary in XML syntax—is now the basis for accessibility.
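As a rough illustration of "HTML5 vocabulary in XML syntax," the sketch below parses a tiny XHTML fragment with a general-purpose XML parser that knows nothing about publishing. The sample document is invented for this example; the two namespace URIs are the standard XHTML and EPUB ones, and `epub:type="chapter"` is used here simply as one plausible semantic label.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
EPUB = "http://www.idpf.org/2007/ops"

# Invented sample content document, for illustration only.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
  <head><title>Sample</title></head>
  <body>
    <section epub:type="chapter">
      <h1>Why markup matters</h1>
      <p>Markup names the components of content.</p>
    </section>
  </body>
</html>"""

root = ET.fromstring(DOC)
ns = {"h": XHTML}

# Find components by their standard names, not by how they look.
section = root.find(".//h:section", ns)
section_type = section.get(f"{{{EPUB}}}type")  # the semantic label on the section
heading = root.find(".//h:h1", ns).text

print(section_type, "|", heading)
```

Because the vocabulary and syntax are shared standards, a tool that has never seen this file before can still find the heading and learn that the section is meant to be a chapter.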


So does this mean if we have HTML5, XHTML5, and EPUB 3, we’ve guaranteed interoperability and accessibility?

Not quite. What they provide are interoperabilityability and accessibilityability. The ability to be interoperable and accessible. You have to use them well.

Markup matters!

About Bill Kasdorf

Bill Kasdorf is VP and Principal Consultant of Apex Content and Media Solutions. Past President of SSP, he is a recipient of SSP’s Distinguished Service Award, the IDEAlliance/DEER Luminaire Award, and the BISG Industry Champion Award. Bill serves on the Steering Committee of the W3C Publishing Business Group and the W3C Publishing Working Group developing the next generation of Web Publications and EPUB; the International Press Telecommunications Council; is Chair of the BISG Content Structure Committee; and is an active member of ABC, the Accessible Books Consortium, the EDUPUB Alliance, and the IDEAlliance Tech Council. Bill has spoken at many industry events, such as SSP, STM, AAUP, DBW, O’Reilly TOC, NISO, BISG, IDPF, IPTC, Seybold Seminars, and the Library of Congress. He serves on the editorial boards of Learned Publishing and the Journal of Electronic Publishing. In his consulting practice, Bill has served publishers such as Pearson, Wolters Kluwer, Kaplan, Sage, Harvard, Toronto, Taylor & Francis, Cambridge, ASME, and IEEE, and organizations such as the World Bank, the British Library, OCLC, and the European Union.
