August 25, 2017

ORCID is great. Why isn’t it used much for books?

Almost a decade ago, I did a consulting engagement for IEEE to help them with name disambiguation. You probably know that IEEE is a huge, international organization. They have over 400,000 members. When you add to that all the people who are contributors to articles in their 200-some journals or who present papers at one of their 1,500 conferences every year, we are talking a seriously large number of folks.

How can you be sure this Mary Smith is the right Mary Smith?

The issue: how the heck can you know for sure whether the Mary Smith who just sent in a conference proposal, or who is listed as one of the 20-some contributors to an article, has contributed anything else to an IEEE journal or conference? There are scores of Mary Smiths who have done so, several of whom are in this Mary Smith’s field. Plus, is she a member? If she is, which of the who-knows-how-many Mary Smiths in our records is she? If not, let’s try to recruit her!

This problem was not unique to IEEE of course. And to their credit, IEEE had licensed some software to help address the issue. The problem was, it wasn’t working.

As the scholarly ecosystem became digital and online, the issue of name disambiguation became a huge problem.

In that consulting engagement, I researched how other similar organizations were dealing with this issue (answer: lots of different ways, most of them no more successfully than IEEE’s), and how the various proprietary ID services available then—Elsevier’s Scopus Author Identifier, ISI’s ResearcherID, ProQuest’s Community of Science, Collexis BioMedExperts, etc.—were accomplishing name disambiguation (same answer: lots of different ways, none completely successfully).

There were lots of judgment calls involved. When trying to ascertain which papers were actually contributed to by this Mary Smith, was it better, in the interest of accuracy, to omit some papers that actually were hers, but which you couldn’t be sure of, or, in the interest of completeness, to include papers you were pretty sure she’d contributed to, thus scooping up some she actually hadn’t?
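To make that tradeoff concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the `Candidate` records, the match scores, and the two thresholds are invented, and it does not represent how any of the services named above actually worked. It simply shows that a strict cutoff misses papers that really are Mary's, while a loose cutoff scoops up one that isn't.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A paper that might belong to our Mary Smith (hypothetical data)."""
    title: str
    score: float       # invented match confidence between 0.0 and 1.0
    truly_hers: bool   # ground truth, which a real system would not know


def attribute(candidates, threshold):
    """Return the papers whose match score meets the threshold."""
    return [c for c in candidates if c.score >= threshold]


def summarize(candidates, threshold):
    """Count attributed papers, missed papers, and wrong attributions."""
    chosen = attribute(candidates, threshold)
    missed = sum(1 for c in candidates if c.truly_hers and c not in chosen)
    wrong = sum(1 for c in chosen if not c.truly_hers)
    return {"attributed": len(chosen), "missed": missed, "wrong": wrong}


CANDIDATES = [
    Candidate("Paper A", 0.95, True),
    Candidate("Paper B", 0.80, True),
    Candidate("Paper C", 0.60, True),
    Candidate("Paper D", 0.65, False),  # a different Mary Smith, same field
    Candidate("Paper E", 0.30, False),
]

# Tight parameters favor accuracy; loose parameters favor completeness.
strict = summarize(CANDIDATES, threshold=0.9)
loose = summarize(CANDIDATES, threshold=0.5)
print("strict:", strict)
print("loose:", loose)
```

No threshold in this made-up data set gets both counts to zero, because a paper that isn't hers scores higher than one that is. That is the bind any inference-based approach ends up in.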

Very messy situation. This is what led to the development of ORCID, the Open Researcher and Contributor ID.

The solution: The ORCID iD

ORCID is a nonprofit organization that issues an open, persistent, nonproprietary identifier to a given researcher and enables her to record, in the ORCID database, tons of relevant professional information about herself that serves to richly identify her, and that she cares about having right: education history, professional history, articles and books authored or contributed to, organization memberships, grants received, other IDs (like those just mentioned above), etc.

Best of all, it is the researcher who maintains the record: either the researcher herself or, more commonly now, an ORCID member organization (a publisher, a society, or Crossref) that the researcher has given permission to update the record on her behalf. This is important because these “trusted parties” typically automate the process, for example adding a publication to the contributor’s record at the time of publication.

So not only can the researcher make sure her record is accurate and complete, she controls who can update it. She can also decide which information can be made public and which stays private, part of her profile for disambiguation purposes but not given out.

This is really key. It was obvious to me a decade ago, in that work for IEEE, that any system based on inferring identity would be inherently inaccurate, to one degree or another. (See above re: tight parameters, thus missing attributions, vs. looser parameters, thus including incorrect attributions.) Only when the researcher him- or herself maintains the record can the record truly be trusted.

Plus, ORCID is working hard to ensure that data is connected to ORCID records by machine wherever possible, with the source of each item clearly visible. The combination of researcher control and authoritative source is ideal in providing a trusted service.

The key to accuracy and completeness of the individual’s identity record: the researcher personally maintains it.

This ORCID iD (yes, officially it uses a lowercase “i”) is now widely used in the scholarly ecosystem. It’s part of most article submission and peer review systems; funders use it extensively; editorial and production systems (and even JATS, the XML model on which most of this work is based) accommodate it; and many scholarly journals provide the ORCID iD of each contributor who has one (some are required to have one) whenever an article is published.

This not only disambiguates which Mary Smith contributed to that particular article, it also enables a publisher or a funder to see Mary’s credentials (what else she’s published, what other grants she’s received, where she studied, where she works) to the extent Mary has made that information available. And most researchers do make that information available because it is so important to their professional standing. They have an incentive not just to make sure it’s correct and complete but also to make it available.

Why isn’t the ORCID iD used more for books?

You may have noticed, though, that I very sneakily moved from talking about scholarly publications in general to talking about articles. That’s because although ORCID has become widely adopted in journals, it has hardly been adopted at all for books.

ORCID engaged me recently to work with them to analyze why this is the case and what to do about it. The reasons come down mainly to the significant differences between current book and journal workflows.

Book workflows are typically much less automated than journal workflows, and the component systems are much less integrated. Most journals use article submission and review systems that capture a lot of useful metadata and can pass it along to production. So when that ORCID iD is provided up front, it can pretty easily make its way all the way through to print and online. Not so much for books. Manuscript review is handled by individual editors, often managed with a spreadsheet; editing is often just done on isolated Word files, which are handed off to vendors for typesetting; and EPUBs are still too often an afterthought. Good luck getting a useful piece of metadata like an ORCID iD from one end to the other.

Books are far more important than journals in humanities and social science fields, and their workflows are much less automated than journal workflows are.

There are discipline distinctions as well. STM publishing is dominated by journals, and a researcher’s professional advancement is dependent largely on the number, quality, and prestige of his journal articles or conference papers. And while both STM and HSS (Humanities and Social Science) fields have both journals and books, books are far more important to the HSS scholar. Tenure is often very dependent on book publication. While a given STM researcher publishes many papers a year, an HSS scholar typically takes years to complete a book. The result is that there is simply less demand for ORCID, and less recognition of its value, in the book-dominated HSS disciplines than there is in the journal-dominated STM disciplines.

ORCID is working on it. Please help!

The ORCID iD would be extremely valuable for books, and for Humanities and Social Science scholars and scholarship, if it could just get critical mass. ORCID is working on a number of initiatives to get this to happen. Some examples:

  • Working with the vendors of the title management systems typically used by book publishers to get ORCID built into them. My research showed that it would be easy to do this; the vendors just haven’t done it because their customers haven’t been asking for it.
  • Working with organizations like the Modern Language Association that provide places where their members already manage their identities, like the MLA Commons.
  • Working with organizations in fields where conference papers are the key to professional advancement. Conference proceedings, though published like books, are really created by journal-like workflows.
  • Collaborating with library organizations like OCLC and JSTOR who have a strong interest in—and experience with—“name authorities” and identity management of the authors across the scholarly book and journal literature.

As more publishers are now integrating what used to be separate book and journal platforms, and as book publishers are beginning to see the value of the kinds of automation and integration that have become common for journals, it has become vital to enable the unambiguous management of contributor data across book and journal content. The time is right for the scholarly book ecosystem to become as ORCID-enabled as journals are.

Everybody I talked to in my ORCID project agreed that this is important to do, and they virtually all recognized that it shouldn’t be that hard. Let’s get on it, people!

You can see my full report to ORCID here.

