March 27, 2019

Five common ebook conversion challenges and how to solve them

Converting backlist titles to ebook formats is easier than ever, but problems do arise. Here are five common complications, and how to solve them.

The ebook conversion process has come a long way. It is more cost-effective and straightforward than it ever used to be, which is why more publishers are updating backlist titles. Most modern publications (1970-1990) created on word processors are fairly straightforward to convert. OCR technology alone can achieve high accuracy levels, especially if the source content is of high quality.

However, as with most things, complications do come up. For instance, what happens when the source content isn’t high quality? Or when it contains complicated equations?

The typical challenges that come up in ebook conversion can all be overcome. It just requires a little more thought and effort.

Here is an introduction to the typical challenges for converting print-only or legacy PDF texts to ebook file formats:

1. Source content quality

The rub in ebook conversion most often comes from variations in source content.

If the source is faded, uses intricate fonts, mixes multiple languages, or contains warped images, it becomes more difficult, but not impossible, to deliver a high-quality product.

Content from older sources may need special scanning equipment and expertise to create high-quality digital files. Books that do not lie flat when opened, have brittle pages, or have tight bindings that hide content in the gutter require a collaborative approach between the librarian, publisher, preservationist, and conversion vendor to achieve good results.

Books that do not lie flat, damaged or missing text, and bleed-through are all very common problems when scanning printed material. They can all be overcome; they just require a different resolution path.


2. Scale and variety of material

Converting a large quantity of varied content is more difficult than converting a large quantity of similar content. This does not come as a surprise.

For every specification, technology needs to be configured, workflows may vary, and people need to be trained. When all of the source material is relatively similar, it becomes easier and faster to convert over time. Processes get more efficient and the people working on them become familiar with the material.

There is not a magic solution for this one. Prioritizing content at the outset of a project is key, as well as working with a partner who is flexible and creative to help find efficiencies within the process.

3. Handwritten material

No matter how good OCR has gotten, handwritten material still causes challenges. Some script is even hard for the human eye to decipher.

While some OCR programs claim to convert handwritten text, the level of accuracy achieved is still questionable. Older documents with intricate or damaged text, and even more recent but untidy script, can be very difficult for OCR to decipher.

Today, keying and proofreading methodologies are the best means to accurately convert handwritten text to digital. It takes a little more energy but delivers much higher accuracy levels and a better user experience.
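To make "accuracy" concrete, here is a minimal sketch of one common way to score an OCR result against a keyed and proofread reference: character accuracy based on edit distance. The function names and the choice of metric are assumptions for illustration, not a description of any particular vendor's process.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Share of reference characters the OCR got right (hypothetical metric).

    `reference` is the keyed, proofread text; `ocr_output` is the raw OCR text.
    """
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = levenshtein(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__":
    keyed = "Dear Sir, I write to you in haste."
    scanned = "Dear Sir, l wrlte to yon in haste."
    print(f"Character accuracy: {character_accuracy(keyed, scanned):.1%}")
```

Scoring a small keyed sample this way before committing to a full run can show whether OCR alone is good enough for a given batch or whether keying is the safer route.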

Check back in a few years and we’ll see if technology has solved this problem yet.

4. Accessibility

With the Section 508 refresh and other recent international regulations, accessibility is a new favorite topic for publishers. Making ebooks accessible as they are created eliminates the need for costly rework down the line. In addition, it makes a more useful product for all consumers.

Publishers and service providers already use HTML coding capable of enabling accessibility; they just have to use it correctly. A level of expertise is required to make sure Web PDF, EPUB, MOBI, and other files have all the right accessibility features.

This involves supplementing HTML with the detailed structural semantics in the WAI-ARIA specification (Web Accessibility Initiative-Accessible Rich Internet Applications). Fortunately, this does not necessarily mean a ton of extra work, especially if you are already using an XML-first workflow.
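As a rough illustration of what supplementing the markup can look like, the sketch below walks an EPUB XHTML content document, adds a digital-publishing ARIA role wherever an `epub:type` value has an obvious counterpart, and lists images that have no `alt` attribute. The mapping table is deliberately partial, and the function name and sample markup are hypothetical; a production workflow would rely on a complete mapping and a dedicated accessibility checker.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
EPUB_NS = "http://www.idpf.org/2007/ops"

# Partial, illustrative mapping from epub:type values to DPUB-ARIA roles.
EPUB_TYPE_TO_ROLE = {
    "chapter": "doc-chapter",
    "footnote": "doc-footnote",
    "glossary": "doc-glossary",
}


def add_aria_roles(xhtml: str):
    """Return (root element, list of image sources lacking an alt attribute)."""
    root = ET.fromstring(xhtml)
    missing_alt = []
    for element in root.iter():
        # epub:type may hold several space-separated values.
        for token in (element.get(f"{{{EPUB_NS}}}type") or "").split():
            role = EPUB_TYPE_TO_ROLE.get(token)
            if role and element.get("role") is None:
                element.set("role", role)
                break
        if element.tag == f"{{{XHTML_NS}}}img" and element.get("alt") is None:
            missing_alt.append(element.get("src", "(no src)"))
    return root, missing_alt


SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
  <body>
    <section epub:type="chapter"><h1>One</h1><img src="map.png"/></section>
    <img src="cover.jpg" alt="Cover"/>
  </body>
</html>"""

if __name__ == "__main__":
    tree_root, images = add_aria_roles(SAMPLE)
    print("Images missing alt text:", images)
```

Because the roles are derived from semantics already present in the markup, a pass like this fits naturally into an XML-first workflow. Writing meaningful alt text still needs a person.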

5. MathML

Equations and math can be challenging. Right now, it is not so much a coding issue (most math is in some form of MathML); the real difficulty is on the distribution side.

Different distribution platforms render MathML differently. The result is not always pretty. For this reason, MathML often gets left out of the final publication, even if it existed in earlier versions. This is something the industry as a whole is working to solve, especially as publishers grapple with accessibility. Proper MathML is a requirement for accessible publications.
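For readers who have not seen MathML, the sketch below builds the markup for a simple expression (a variable squared) and attaches a plain-text description through the `alttext` attribute of the `math` element. The helper name is hypothetical, and whether a given reading system or assistive technology actually uses `alttext` is an assumption to verify per platform.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Serialize MathML elements with a default namespace instead of a prefix.
ET.register_namespace("", MATHML_NS)


def squared(variable: str, alttext: str) -> str:
    """Return MathML markup for `variable` squared, with a text description."""
    math = ET.Element(f"{{{MATHML_NS}}}math", {"alttext": alttext})
    msup = ET.SubElement(math, f"{{{MATHML_NS}}}msup")  # base + superscript
    ET.SubElement(msup, f"{{{MATHML_NS}}}mi").text = variable  # identifier
    ET.SubElement(msup, f"{{{MATHML_NS}}}mn").text = "2"       # number
    return ET.tostring(math, encoding="unicode")


if __name__ == "__main__":
    print(squared("x", "x squared"))
```

Keeping the equation as structured markup, rather than flattening it to an image, is what lets the same source serve both visual rendering and assistive technology.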

Again, while some common challenges come up with large-scale ebook conversion, none of them is anything that cannot be overcome with some creativity and experience.

If you are taking on ebook conversion initiatives, get in touch with the experts of Apex. We have converted over a million ebooks (both print and PDF) for publishers around the world and would love the opportunity to support your conversion efforts, too.

