February 23, 2023

Newspaper Digitization: Nuts & Bolts


For the last two decades, newspaper digitization and conversion have been prominent themes for libraries, archives, genealogy companies, and content aggregators. Each has sought to establish means to create internet-ready searchable newspaper content for wide access and use.

The introduction of key technologies and new product innovations has opened up opportunities for digitization at scale, resulting in high-fidelity page images, well-structured XML, and digital file sets for hundreds of millions of pages of invaluable historic newspaper collections.

 

Key Technology

At the turn of the century, Apex’s technology team developed IZAAC, a technology platform stack for content conversion that firmly established Apex at the forefront of the content services industry. Paired with Apex’s talented newspaper subject matter experts and solutions architects, IZAAC made it possible to deliver newspaper digitization at scale. Apex’s newspaper conversion services provide deep value because we combine powerful technologies with people intervening at specific steps in the process, creating the very best possible digital newspapers at a rate that leaves room in the project budget.

 

Formats

Apex has enjoyed working with National Endowment for the Humanities (NEH) state grant award recipients across the U.S. participating in the National Digital Newspaper Program (NDNP). The NDNP METS/ALTO XML specifications have provided an excellent foundation for institutions to digitize newspapers (both in and outside of the program). The program's Chronicling America platform is the de facto content repository for historic U.S. newspapers.
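For readers new to the ALTO half of METS/ALTO, here is a minimal sketch of what working with one of these files can look like. It reads the recognized words out of a page and rebuilds each printed line. The element and attribute names (TextBlock, TextLine, String, CONTENT) come from the ALTO schema; the sample page, its namespace version, and the helper names local_name and alto_lines are made up for illustration, and this is not a description of Apex's own tooling or of any specific NDNP deliverable.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed ALTO page used only for illustration.
# Real files carry many more attributes and a profile-specific namespace.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Local" HPOS="100" VPOS="200" WIDTH="80" HEIGHT="20"/>
            <SP/>
            <String CONTENT="News" HPOS="190" VPOS="200" WIDTH="70" HEIGHT="20"/>
          </TextLine>
          <TextLine>
            <String CONTENT="Council" HPOS="100" VPOS="230" WIDTH="110" HEIGHT="20"/>
            <SP/>
            <String CONTENT="meets" HPOS="220" VPOS="230" WIDTH="80" HEIGHT="20"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Drop the '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text):
    """Return the text of every TextLine, with its words joined by spaces."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        # Each String child holds one OCR'd word in its CONTENT attribute.
        words = [
            child.get("CONTENT", "")
            for child in elem
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(words))
    return lines


if __name__ == "__main__":
    for line in alto_lines(SAMPLE_ALTO):
        print(line)
```

The position attributes on each String (HPOS, VPOS, WIDTH, HEIGHT) are what allow a viewer to highlight a search hit on the page image; the sketch ignores them to stay short.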

In addition to supporting dozens of NDNP awardees, Apex has provided newspaper services to university libraries, historical societies, and public libraries utilizing the NDNP standard for digitization and conversion deliverables. As a recognized format and ‘best practice’, the NDNP format provides a fantastic path forward for anyone looking for a north star in their approach to creating digital newspapers.

Beyond NDNP, Apex has also seen an uptick in libraries taking a more granular approach to newspaper digitization and seeking article-level segmentation in their collections. Because users can search and browse individual articles rather than whole pages, this approach yields the very best user experience for large collections. Apex has digitized over 25 million pages of newspapers with article-level METS/ALTO XML segmentation for libraries around the world.

 

There Has Never Been a Better Time for Newspaper Digitization!

If you have newspaper collections that are candidates for digitization and conversion, now is the time to get started on your project. Industry-wide, the cost per page of newspaper projects has dropped by as much as 60-70%, and rates have stabilized over the past 2-3 years, thanks in part to breakthroughs in optical character recognition (OCR) technologies, new OCR language support, and established process workflows.

Over the past twenty years, we have had the privilege of partnering with hundreds of national, state, and local libraries, as well as content aggregators, across the globe to create hundreds of millions of digital newspaper pages.

Apex provides the following services in support of newspaper projects:

  • Microfilm scanning
  • Microfilm duplication
  • NDNP conversion
  • Language support for hundreds of languages
  • METS/ALTO article-level conversion
  • Cloud bucket delivery workflows

Today, we are grateful to be known as the ‘go-to’ company for newspaper digitization and conversion services. If we can help, we would love to speak with you about your next newspaper project. Please reach out!