January 2, 2019

Give new life to old texts through ebook conversion

Converting backlist titles to ebook formats is a growing trend, and for good reason

There is incredible value in converting backlist titles to ebook formats. With the amount of digital content growing every day, it is easy to forget that publishers have only been producing “born-digital” content for about twenty years.

There is incredible value – both for revenue and academic purposes – in the hundreds of years of backlist, print-only titles sitting on publishers’ shelves. Similarly, early digital publications created in now-obsolete software have a chance at a second life through ebook conversion.

Thanks to years of experience and ever-advancing technology, converting backlist titles to ebook formats is easier than ever, and publishers are reaping the benefits of providing old content through new distribution channels.

Ebook conversion has gotten much easier

Since “digital preservation” first became feasible in the last two decades of the 20th century, the ebook conversion process has become much smoother. OCR technology has come a long way and can achieve very high accuracy, provided the source content is of high quality.

While older, handwritten texts still present some serious challenges, typeset or word processor files created with modern fonts (anything created after 1970) are much easier to convert, and publishers have had huge success in making frontlist and backlist titles available in ebook formats.

Capitalize on ebook distribution platforms with XML

By “ebook,” we really mean any of the wide variety of ways a text can show up digitally – as a WebPDF, a set of HTML files on a website, a Kindle MOBI file, an EPUB, or, what we see most often, an XML file.

The many distribution channels publishers use today, like Amazon, Apple, OverDrive, VitalSource, and Google Scholar, all require different file types.

Publishers need a suite of files – different versions of the exact same publication (EPUB, MOBI, XML, WebPDF) – that will “just work” anywhere.

XML is the best way to achieve that goal: when every output format is generated from one XML source, the underlying content stays consistent and correct across all of them.
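As a rough illustration of the single-source idea, the sketch below renders one small XML document into two different outputs (an HTML fragment and plain text) using Python's standard library. The element names (book, chapter, heading, para) and the sample text are hypothetical and chosen only for this example; they are not a real publishing schema, and a production workflow would involve far more than this.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source markup; the element names are illustrative only.
SOURCE_XML = """<book>
  <title>Collected Essays</title>
  <chapter>
    <heading>On Reading</heading>
    <para>Books outlast their bindings.</para>
    <para>Ink &amp; paper fade; text need not.</para>
  </chapter>
</book>"""


def to_html(book):
    """Render the parsed book as a simple HTML fragment."""
    parts = [f"<h1>{escape(book.findtext('title'))}</h1>"]
    for chapter in book.iter("chapter"):
        parts.append(f"<h2>{escape(chapter.findtext('heading'))}</h2>")
        for para in chapter.iter("para"):
            parts.append(f"<p>{escape(para.text)}</p>")
    return "\n".join(parts)


def to_text(book):
    """Render the same parsed book as plain text."""
    lines = [book.findtext("title").upper(), ""]
    for chapter in book.iter("chapter"):
        lines.append(chapter.findtext("heading"))
        for para in chapter.iter("para"):
            lines.append(para.text)
        lines.append("")
    return "\n".join(lines)


# Parse once, then produce every output from the same tree.
book = ET.fromstring(SOURCE_XML)
html_output = to_html(book)
text_output = to_text(book)

print(html_output)
print(text_output)
```

Because both renderers read from the same parsed tree, a correction made once in the XML source shows up in every output the next time the files are generated.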

While XML-first workflows have gained large-scale adoption in the last decade, there are still plenty of publishers who do not use an XML workflow or create “born-digital” publications today. That doesn’t mean ebook conversion work cannot still take place.

Thank you, STM Journals

We really have the STM community to thank, particularly the publisher initiatives that worked with the National Library of Medicine. They helped lay the foundation for coding full-text content in a consistent manner, and for uniquely identifying content hosted online so that it could be found in web searches. The rise of CrossRef, the mandate of PubMed Central, and accessibility initiatives promoted by DAISY pushed forward core technologies and delivery platforms that help drive ebook conversion work today.

Frontlist vs. backlist conversion

The ebook conversion process has only improved from there for both frontlist and backlist content. In the past ten years, we have seen a large uptick in publishers approaching ebook conversion projects. Even for backlist work, some form of digital copy usually exists (an old PDF or image file); it just needs to be processed to meet today’s standards, which, again, can be a straightforward process. Most of the backlist titles Apex converts today come to us either in old PDF formats or as images. Projects that require scanning first do come up, but they are usually in the library space.

The biggest difference between frontlist and backlist ebook creation is rooted in source type and access to authors for fine-tuning. The more modern and consistent the source type, the easier the conversion process. Whether your source content is 10 or 100 years old, there are solutions; it is just a matter of finding the best path forward to achieve your desired result.

Libraries and cultural heritage preservation

As I mentioned above, the majority of source material we receive from publishers already comes in a digital format. Libraries are a different story. Many libraries working to preserve cultural heritage materials require digitization or scanning first to support their goals of preserving content and reducing physical storage.

Quality of source content plays a hugely important role here. OCR can provide high degrees of accuracy, but when source content is old, damaged, missing text, or unable to lie flat for scanning, it presents challenges that require creative solutions to produce a high-quality digital version that is accessible to users online.

Hard copies degrade over time, no matter how slowly. Following a digitization path allows unique, or even irreplaceable, content to be properly curated and protected while still providing access to the information inside. Digitizing newspapers from microfilm is also a common project for libraries that are the keepers of these important and historic collections.
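To make “accuracy” concrete, one common way to score OCR output is character-level accuracy: compare the OCR text against a hand-checked transcription and count the single-character edits needed to turn one into the other. The sketch below is a minimal illustration of that measure, not a description of any particular OCR tool or of Apex's process; the sample strings and function names are invented for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference, ocr_output):
    """Share of reference characters recovered correctly, from 0.0 to 1.0."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1 - errors / len(reference))


# Invented sample: a hand-checked line and a typical OCR misreading of it.
reference = "The quick brown fox"
ocr_output = "Tbe quick brovvn fox"

print(edit_distance(reference, ocr_output))
print(f"{character_accuracy(reference, ocr_output):.1%}")
```

Here the misread “h” and the “w” read as “vv” cost three edits against a 19-character reference. Damaged or warped pages tend to push that error count up quickly, which is why source quality matters so much.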

Conversion to XML and ebooks helps libraries meet several requirements: preservation of historic material, making content available broadly for end-users, and meeting accessibility requirements for people with disabilities.

Keep up the good (conversion) work

Whatever you are converting, the steady evolution of markup languages from SGML to HTML and now XML has made ebook conversion both a competitive business strategy and a practical method for opening information to a wider audience online.

There is a lot of content out there. Print, digital, print and digital – and it is a wonderful thing. The access we have today to research, history and entertainment is driving medical breakthroughs that save lives, bridging gaps across cultures, and bringing joy and excitement to millions of people around the world.

If you are working on ebook conversion or other digitization projects, we would love to talk. Apex has converted over a million documents for publishers and libraries around the world. Contact us to start a conversation.

About Greg Suprock

Greg is Head of Solutions Architecture at Apex. He has over 20 years of experience in XML workflows, content management, web application development, and prepress. Greg excels at collaborative efforts to achieve project and business goals. He has developed XML workflows for the Public Library of Science, HighWire Press, The Library of Congress, and many more.
