Old OpenAlex API documentation

Author disambiguation


Last updated 1 year ago

Our information about authors comes from MAG, Crossref, PubMed, ORCID, and publisher websites. We use an algorithm to disambiguate authors; it draws on an author’s name, their publication record, their citation patterns, and (where available) their ORCID.

So for example, if J. Schmidt and John Jacob Jingleheimer Schmidt both write about 19th-century ketchup production, we’ll treat them as one author, but we won’t include the JJJ Schmidt who writes about weasel migration (even though his name is their name, too).

Our methods, code, and models are all, of course, fully open. You can find technical documentation on the author disambiguation model on GitHub. You will also find code and links to training data there.

In late July 2023, we switched to a new, more accurate author disambiguation system, with a better machine-learning model to identify authors, a smarter strategy for assigning authors to new works, and much better integration with ORCID data when it is available. As part of that switch, we deprecated all of the old OpenAlex Author IDs and assigned new Author IDs to all authors. You can find the old Author IDs, along with their associated works, as a data dump. The numeric component of every new Author ID is greater than 5000000000. The new Author IDs have been used in the API since late July 2023, and in the data snapshots starting in August 2023.
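The numeric threshold above gives a quick way to tell old and new Author IDs apart. The following is a minimal sketch, not part of any OpenAlex client library: the function names and the sample IDs are illustrative assumptions, and the check relies only on the stated rule that new IDs have a numeric component greater than 5000000000.

```python
import re

# Threshold stated in the text: new Author IDs are numerically above this value.
NEW_ID_THRESHOLD = 5_000_000_000


def author_id_number(openalex_id: str) -> int:
    """Return the numeric component of an Author ID.

    Accepts either the short form ("A5023888391") or the full URL form
    ("https://openalex.org/A5023888391").
    """
    match = re.search(r"A(\d+)$", openalex_id.strip())
    if match is None:
        raise ValueError(f"Not an OpenAlex Author ID: {openalex_id!r}")
    return int(match.group(1))


def is_new_author_id(openalex_id: str) -> bool:
    """True if the ID falls in the range used since the July 2023 switch."""
    return author_id_number(openalex_id) > NEW_ID_THRESHOLD


# Illustrative IDs only; they are not looked up against the API.
print(is_new_author_id("https://openalex.org/A5023888391"))
print(is_new_author_id("A2208157607"))
```

Note that the "null" Author ID A9999999999 also lies above the threshold, so a check like this does not by itself distinguish it from a real author.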

The "null" Author ID

You may come across an OpenAlex Author with ID A9999999999, particularly if you are using the data snapshot. We use this author ID internally within the disambiguation system as our "null author". It is assigned to all authorships that do not go through disambiguation. Usually, this is because we did not receive an author name for that authorship, the name was too short to disambiguate, or it was a phrase we have specifically called out to ignore in our disambiguation process (for example, "Unknown Unknown" or "Unknown Author").
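When processing works, you may want to skip authorships assigned to the null author. The sketch below assumes each work is a dict with an `authorships` list whose items carry an `author` object with an `id` field; the helper names and the sample record are hypothetical and only illustrate the filtering step.

```python
from typing import Optional

# The "null author" ID described above.
NULL_AUTHOR_ID = "A9999999999"


def is_null_author(author_id: Optional[str]) -> bool:
    """True if the ID (short or full URL form) is the null Author ID."""
    if not author_id:
        return False
    short_id = author_id.rstrip("/").rsplit("/", 1)[-1]
    return short_id.upper() == NULL_AUTHOR_ID


def disambiguated_authorships(work: dict) -> list:
    """Return the work's authorships, dropping those assigned to the null author."""
    return [
        authorship
        for authorship in work.get("authorships", [])
        if not is_null_author((authorship.get("author") or {}).get("id"))
    ]


# Hypothetical record, shaped like the assumption stated above.
sample_work = {
    "authorships": [
        {"author": {"id": "https://openalex.org/A5023888391"}},
        {"author": {"id": "https://openalex.org/A9999999999"}},
    ]
}
kept = disambiguated_authorships(sample_work)
print(len(kept))  # 1
```

Dropping these authorships rather than counting them avoids treating every unnamed or ignored author as a single, extremely prolific person.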
