Affiliations

On 15 January 2020 the ADS team publicly announced an affiliations feature, describing how a curated list of institutional affiliations is built and maintained and how it can be incorporated into literature searches. A follow-up blog post on 15 April 2021 describes the progress made on this feature.

There is good motivation for this feature. The affiliation strings of documents in ADS frequently include typographical errors (as provided by users), and different users often refer to the same institution with very different affiliation strings. A single institution might have dozens of variations of its affiliation string that appear regularly.

The ads Python package uses the affiliation identifiers (ads.Document.aff_id) introduced in the two blog posts linked above, and includes a data model ads.Affiliation. When a document is returned by ADS, any affiliation identifiers are represented in the ads.Document.affiliation attribute as ads.Affiliation objects. This allows for complex search queries using affiliations, countries, and relationships between affiliations.

You will need the following imports in order to execute all code blocks on this page:

from ads import Affiliation, Document
from ads.utils import flatten

The ads.Affiliation object

The ads.Affiliation data model contains the following fields:

ads.Affiliation.id

The unique (child) identifier for the affiliation.

ads.Affiliation.parent

A foreign key field to any parent identifiers for this affiliation.

ads.Affiliation.abbreviation

The abbreviated affiliation name.

ads.Affiliation.canonical_name

The full affiliation name.

ads.Affiliation.country

The name of the country that the affiliation is located in.

Selecting affiliations

You can select a single Affiliation object with the ads.Affiliation.get() method, or select multiple records using the ads.Affiliation.select() method. Here are a few examples:

# Retrieve a single affiliation based on an exact expression.
mit = Affiliation.get(abbreviation="MIT")

# Get a single affiliation, but we don't care which one.
affiliation = Affiliation.get()

# Select 10 affiliations with "observatory" (case-insensitive) in the name,
# ordered by canonical name.
observatories = Affiliation.select()\
                           .where(Affiliation.canonical_name.contains("observatory"))\
                           .order_by(Affiliation.canonical_name.asc())\
                           .limit(10)
for observatory in observatories:
    print(f"> {observatory.id} {observatory.canonical_name} ({observatory.abbreviation})")
> A11178 AK Volcano Observatory, Fairbanks (AVO)
> A05329 Aalto University, Metsahovi Radio Observatory (Metsahovi Rad Obs)
> A05229 Abastumani Astrophysical Observatory (Abast Ast Obs)
> A05229 Abastumani Astrophysical Observatory (Abast Ast Obs)
> A11178 Alaska Volcano Observatory, Fairbanks (AVO)
> A11489 Archenhold Observatory, Berlin, Germany (Archenhold Obs)
> A01804 Arecibo Observatory (Arecibo)
> A01804 Arecibo Observatory (Arecibo)
> A01804 Arecibo Observatory (Arecibo)
> A11739 Armagh Observatory, Ireland (Armagh Obs)

Here we can see a few repeated records:

  • two for the Abastumani Astrophysical Observatory (Abast Ast Obs),

  • two for the Alaska Volcano Observatory (AVO), with slightly different names, and

  • three for Arecibo Observatory.

We received multiple records for the same affiliation because some affiliations have child/parent relationships to other affiliations, and an affiliation that is a child of several parents has one record per parent.
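As a rough illustration of how the repeats could be collapsed, the sketch below groups the rows printed above by identifier using plain Python. It does not use the ads package; the tuples are copied from the output above, and the helper name group_by_id is hypothetical.

```python
# Rows copied from the output above: (id, canonical_name, abbreviation).
rows = [
    ("A11178", "AK Volcano Observatory, Fairbanks", "AVO"),
    ("A05329", "Aalto University, Metsahovi Radio Observatory", "Metsahovi Rad Obs"),
    ("A05229", "Abastumani Astrophysical Observatory", "Abast Ast Obs"),
    ("A05229", "Abastumani Astrophysical Observatory", "Abast Ast Obs"),
    ("A11178", "Alaska Volcano Observatory, Fairbanks", "AVO"),
    ("A11489", "Archenhold Observatory, Berlin, Germany", "Archenhold Obs"),
    ("A01804", "Arecibo Observatory", "Arecibo"),
    ("A01804", "Arecibo Observatory", "Arecibo"),
    ("A01804", "Arecibo Observatory", "Arecibo"),
    ("A11739", "Armagh Observatory, Ireland", "Armagh Obs"),
]


def group_by_id(records):
    """Group records by identifier, keeping first-seen order (hypothetical helper)."""
    grouped = {}
    for identifier, name, abbreviation in records:
        grouped.setdefault(identifier, []).append((name, abbreviation))
    return grouped


grouped = group_by_id(rows)
for identifier, entries in grouped.items():
    # Show how many records share each identifier, with the first name seen.
    print(f"> {identifier}: {len(entries)} record(s), e.g. {entries[0][0]}")
```

Grouping on the identifier rather than the name keeps the two differently named AVO records together, since they share the same identifier.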

Parent and child affiliation references

Some affiliations have relationships to other affiliations. Currently, ADS only supports parent/child references between affiliations. An example might be departments (children) within a university (parent), where both the department and the university have their own recognised affiliation identifier. Another example might be a research organisation that is spread across geographical areas:

# The Center for Excellence for All Sky Astrophysics (CAASTRO)
# was an Australian Research Council-funded project that included
# research institutions across Australia and elsewhere.
caastro = Affiliation.get(abbreviation="CAASTRO")
print(f"# {caastro.id}: {caastro.abbreviation} - {caastro.canonical_name}")
# A11661: CAASTRO - Center for Excellence for All Sky Astrophysics

parent = caastro.parent
print(f"# Parent: {parent.id}: {parent.abbreviation} - {parent.canonical_name}")
# Parent: A00172: Curtin U - Curtin University, Australia

In many cases there is only a single parent reference. In this case, however, there are multiple records for CAASTRO, because it is a child of multiple parents (universities). We can use the ads.Affiliation.parents property, which runs a self-join on the ads.Affiliation table, to find all possible parents of this affiliation.

for parent in caastro.parents:
    print(f"# {parent.id}: {parent.canonical_name}")
# A00172: Curtin University, Australia
# A00254: University of Queensland, Australia
# A00339: Australian National University, Canberra
# A00361: University of Melbourne, Australia
# A00446: University of Western Australia
# A00650: Swinburne University of Technology, Australia
# A00732: University of Sydney, Australia
# A04927: Australian Research Council
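To make the idea of a self-join concrete, here is a minimal sketch using the standard-library sqlite3 module rather than the ads package. The table layout (one row per child/parent pair, with a parent_id column and no parent for top-level records) and the helper find_parents are assumptions made for illustration; they are not necessarily how ads stores or queries affiliations. Only three of the parents listed above are included to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout: the same id may appear in several rows, one per parent.
conn.execute(
    "CREATE TABLE affiliation (id TEXT, parent_id TEXT, canonical_name TEXT)"
)

caastro_name = "Center for Excellence for All Sky Astrophysics"
rows = [
    # Records treated as top-level for this sketch (assumption: no parent).
    ("A00172", None, "Curtin University, Australia"),
    ("A00254", None, "University of Queensland, Australia"),
    ("A04927", None, "Australian Research Council"),
    # One row per (child, parent) pair for CAASTRO.
    ("A11661", "A00172", caastro_name),
    ("A11661", "A00254", caastro_name),
    ("A11661", "A04927", caastro_name),
]
conn.executemany("INSERT INTO affiliation VALUES (?, ?, ?)", rows)


def find_parents(connection, child_id):
    """Join the table to itself: child rows on one side, parent rows on the other."""
    query = """
        SELECT DISTINCT parent.id, parent.canonical_name
        FROM affiliation AS child
        JOIN affiliation AS parent ON parent.id = child.parent_id
        WHERE child.id = ?
        ORDER BY parent.id
    """
    return connection.execute(query, (child_id,)).fetchall()


parents = find_parents(conn, "A11661")
for parent_id, name in parents:
    print(f"# {parent_id}: {name}")
```

The same table appears twice in the query under two aliases, which is what makes it a self-join: each child row is matched to the row whose id equals its parent_id.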

To find all the children referenced by a parent, we can use the ads.Affiliation.children back-reference accessor:

arc = Affiliation.get(canonical_name="Australian Research Council")
for node in arc.children.order_by(Affiliation.canonical_name.asc()):
    print(f"# {node.id}: {node.canonical_name}")
# A11660: Antarctic Climate and Ecosystems Cooperative Research Center
# A11661: Center for Excellence for All Sky Astrophysics
# A11720: Center for Quantum Computation and Communication Technology
# A11705: Center of Excellence for Climate System Science
# A11662: Center of Excellence for Core to Crust Fluid Systems
# A11659: Center of Excellence in Ore Deposits
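Conceptually, a back-reference is the child-to-parent relationship read in the opposite direction. The sketch below shows that inversion in plain Python, using the identifiers and names from the output above. It does not use the ads package, and the names edges and build_children_index are hypothetical.

```python
from collections import defaultdict

# Canonical names copied from the output above.
names = {
    "A11660": "Antarctic Climate and Ecosystems Cooperative Research Center",
    "A11661": "Center for Excellence for All Sky Astrophysics",
    "A11720": "Center for Quantum Computation and Communication Technology",
    "A11705": "Center of Excellence for Climate System Science",
    "A11662": "Center of Excellence for Core to Crust Fluid Systems",
    "A11659": "Center of Excellence in Ore Deposits",
}

# (child id, parent id) pairs; A04927 is the Australian Research Council
# and A00172 is Curtin University.
edges = [
    ("A11659", "A04927"),
    ("A11660", "A04927"),
    ("A11661", "A04927"),
    ("A11661", "A00172"),
    ("A11662", "A04927"),
    ("A11705", "A04927"),
    ("A11720", "A04927"),
]


def build_children_index(pairs):
    """Invert child -> parent pairs into a parent -> [children] mapping."""
    index = defaultdict(list)
    for child_id, parent_id in pairs:
        index[parent_id].append(child_id)
    return index


children = build_children_index(edges)
# Order the children by canonical name, as the query above does.
arc_children = sorted(children["A04927"], key=lambda child_id: names[child_id])
for child_id in arc_children:
    print(f"# {child_id}: {names[child_id]}")
```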