Methods for updating taxon names in APCalign

Aligning taxon names with taxon concepts/names in APC and APNI

The following table gives the rules for each of the 54 separate algorithms that are applied sequentially to align each submitted name to a taxon concept in APC or a scientific name in APNI.


| alignment_code | search algorithm | original name variant matched to | match type | taxonomic dataset aligned to | taxon_rank of alignment | notes about sequence |
| --- | --- | --- | --- | --- | --- | --- |
| match_01a | Detect scientific names, including authorship | original_name | exact | APC accepted taxon concepts | species/infraspecific | Check if strings are full scientific names, including authorship. |
| match_01b | Detect scientific names, including authorship | original_name | exact | other APC taxon concepts | species/infraspecific | NA |
| match_01c | Detect canonical names, lacking authorship | cleaned_name | exact | APC accepted taxon concepts | species/infraspecific | Check if strings are taxon names, lacking authorship. |
| match_01d | Detect canonical names, lacking authorship | cleaned_name | exact | other APC taxon concepts | species/infraspecific | NA |
| match_02a | Detect genus sp., genus ssp. and genus spp. | first word (“genus”) | exact | APC accepted taxon concepts, other APC taxon concepts, APNI | genus | First, align two-word strings that indicate an unknown species within a genus (or family). |
| match_02b | Detect genus sp., genus ssp. and genus spp. | first word (“genus”) | fuzzy | APC accepted taxon concepts | genus | NA |
| match_02c | Detect genus sp., genus ssp. and genus spp. | first word (“genus”) | fuzzy | other APC taxon concepts | genus | NA |
| match_02d | Detect family sp., family ssp. and family spp. | first word (“genus”) | exact | APC accepted taxon concepts | family | NA |
| match_03a | Detect --, -- (intergrade taxa) and align to genus | first word (“genus”) | exact | APC accepted taxon concepts, other APC taxon concepts, APNI | genus | Next, find strings indicating that a name reflects an intergrade between two taxa. These names can only be aligned to a genus. |
| match_03b | Detect --, -- (intergrade taxa) and align to genus | first word (“genus”) | fuzzy | APC accepted taxon concepts | genus | NA |
| match_03c | Detect --, -- (intergrade taxa) and align to genus | first word (“genus”) | fuzzy | other APC taxon concepts | genus | NA |
| match_03d | Detect --, -- (intergrade taxa) and align to genus | first word (“genus”) | fuzzy | APNI | genus | NA |
| match_03e | Detect --, -- (intergrade taxa), but fail to align to genus | NA | no match | NA | NA | NA |
| match_04a | Detect \ (indecision between taxa) and align to genus | first word (“genus”) | exact | APC accepted taxon concepts, other APC taxon concepts, APNI | genus | Next, find strings indicating a data collector’s indecision about which of two (or more) taxa is the appropriate taxon. These names can only be aligned to a genus. |
| match_04b | Detect \ (indecision between taxa) and align to genus | first word (“genus”) | fuzzy | APC accepted taxon concepts | genus | NA |
| match_04c | Detect \ (indecision between taxa) and align to genus | first word (“genus”) | fuzzy | other APC taxon concepts | genus | NA |
| match_04d | Detect \ (indecision between taxa) and align to genus | first word (“genus”) | fuzzy | APNI | genus | NA |
| match_04e | Detect \ (indecision between taxa), but fail to align to genus | NA | no match | NA | NA | NA |
| match_05a | Detect canonical names, lacking authorship | stripped_name | fuzzy | APC accepted taxon concepts | species/infraspecific | NA |
| match_05b | Detect canonical names, lacking authorship | stripped_name | fuzzy | other APC taxon concepts | species/infraspecific | NA |
| match_05c | Detect canonical names, lacking authorship | cleaned_name | exact | APNI | species/infraspecific | NA |
| match_06a | Detect aff, affinis (affinity to) and align to genus | first word (“genus”) | exact | APC accepted taxon concepts, other APC taxon concepts, APNI | genus | Find strings indicating that a name has an affinity to a specific taxon but is not itself that taxon. Such names, unless documented in APC (i.e. matches 6, 7 above), can only be aligned to genus. |
| match_06b | Detect aff, affinis (affinity to) and align to genus | first word (“genus”) | fuzzy | APC accepted taxon concepts | genus | NA |
| match_06c | Detect aff, affinis (affinity to) and align to genus | first word (“genus”) | fuzzy | other APC taxon concepts | genus | NA |
| match_06d | Detect aff, affinis (affinity to) and align to genus | first word (“genus”) | fuzzy | APNI | genus | NA |
| match_06e | Detect aff, affinis (affinity to), but fail to align to genus | NA | no match | NA | NA | NA |
| match_07a | Detect canonical names, lacking authorship | stripped_name | imprecise fuzzy | APC accepted taxon concepts | species/infraspecific | Further check if strings are taxon names, lacking authorship, now with imprecise fuzzy matching. |
| match_07b | Detect canonical names, lacking authorship | stripped_name | imprecise fuzzy | other APC taxon concepts | species/infraspecific | NA |
| match_08a | Detect x (hybrid taxon) and align to genus | first word (“genus”) | exact | APC accepted taxon concepts, other APC taxon concepts, APNI | genus | Find strings indicating that a name is a hybrid between two taxa. Such names, unless documented in APC (i.e. matches 6, 7 above), can only be aligned to genus. |
| match_08b | Detect x (hybrid taxon) and align to genus | first word (“genus”) | fuzzy | APC accepted taxon concepts | genus | NA |
| match_08c | Detect x (hybrid taxon) and align to genus | first word (“genus”) | fuzzy | other APC taxon concepts | genus | NA |
| match_08d | Detect x (hybrid taxon) and align to genus | first word (“genus”) | fuzzy | APNI | genus | NA |
| match_08e | Detect x (hybrid taxon), but fail to align to genus | NA | no match | NA | NA | NA |
| match_09a | Detect canonical names, by checking first three words in string | three words (from stripped_name_2) | exact | APC accepted taxon concepts | species/infraspecific | Check if the first three words in the name string match a taxon name, allowing notes to be discarded. Also useful for aligning phrase names. |
| match_09b | Detect canonical names, by checking first three words in string | three words (from stripped_name_2) | exact | other APC taxon concepts | species/infraspecific | NA |
| match_09c | Detect canonical names, by checking first three words in string | three words (from stripped_name_2) | fuzzy | APC accepted taxon concepts | species/infraspecific | NA |
| match_09d | Detect canonical names, by checking first three words in string | three words (from stripped_name_2) | fuzzy | other APC taxon concepts | species/infraspecific | NA |
| match_10a | Detect canonical names, by checking first two words in string | two words (from stripped_name_2) | exact | APC accepted taxon concepts | species/infraspecific | Check if the first two words in the name string match a taxon name, allowing notes and invalid infraspecific names to be discarded. Also useful for aligning phrase names. |
| match_10b | Detect canonical names, by checking first two words in string | two words (from stripped_name_2) | exact | other APC taxon concepts | species/infraspecific | NA |
| match_10c | Detect canonical names, by checking first two words in string | two words (from stripped_name_2) | fuzzy | APC accepted taxon concepts | species/infraspecific | NA |
| match_10d | Detect canonical names, by checking first two words in string | two words (from stripped_name_2) | fuzzy | other APC taxon concepts | species/infraspecific | NA |
| match_11a | Detect canonical names, lacking authorship | stripped_name | fuzzy | APNI | species/infraspecific | Further check if strings are APNI taxon names, lacking authorship, now with fuzzy matching or considering just the first three or two words in the string. |
| match_11b | Detect canonical names, lacking authorship | stripped_name | imprecise fuzzy | APNI | species/infraspecific | NA |
| match_11c | Detect canonical names, by checking first three words in string | three words (from stripped_name_2) | exact | APNI | species/infraspecific | NA |
| match_11d | Detect canonical names, by checking first two words in string | two words (from stripped_name_2) | exact | APNI | species/infraspecific | NA |
| match_12a | Detect genus, by checking the first word in the string | first word (“genus”) | exact | APC accepted taxon concepts | genus | Check if the first word in the name string matches a genus or family name, allowing an alignment at genus or family level. |
| match_12b | Detect genus, by checking the first word in the string | first word (“genus”) | exact | other APC taxon concepts | genus | NA |
| match_12c | Detect genus, by checking the first word in the string | first word (“genus”) | exact | APNI | genus | NA |
| match_12d | Detect family, by checking the first word in the string | first word (“genus”) | exact | APC accepted taxon concepts | family | NA |
| match_12e | Detect family, by checking the first word in the string | first word (“genus”) | exact | other APC taxon concepts | family | NA |
| match_12f | Detect genus, by checking the first word in the string | first word (“genus”) | fuzzy | APC accepted taxon concepts | genus | NA |
| match_12g | Detect genus, by checking the first word in the string | first word (“genus”) | fuzzy | other APC taxon concepts | genus | NA |
| match_12h | Detect family, by checking the first word in the string | first word (“genus”) | fuzzy | APC accepted taxon concepts | family | NA |
| match_12i | Detect family, by checking the first word in the string | first word (“genus”) | fuzzy | other APC taxon concepts | family | NA |
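
The sequential logic of the table can be illustrated with a small sketch. This is not the package's implementation: it is written in Python rather than R, the reference lists are toy stand-ins for the APC and APNI tables, the similarity cutoff in `fuzzy` is an arbitrary assumption, and only a handful of the alignment codes are imitated (in simplified form) to show that the first rule to succeed determines the alignment code.

```python
import re
from difflib import get_close_matches

# Toy reference lists (hypothetical stand-ins for the APC and APNI tables).
APC_ACCEPTED = ["Acacia aneura", "Banksia serrata", "Eucalyptus regnans"]
APC_OTHER = ["Dryandra serra"]
GENERA = ["Acacia", "Banksia", "Eucalyptus"]


def exact(name, candidates):
    """Return the name if it is in the reference list, else None."""
    return name if name in candidates else None


def fuzzy(name, candidates, cutoff=0.85):
    """Return the closest candidate above the cutoff, else None.

    difflib's similarity ratio is an assumption made for this sketch;
    it is not the fuzzy matcher used by the package.
    """
    hits = get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return hits[0] if hits else None


def align(original_name):
    """Apply the rules in order and return (alignment_code, aligned_name)."""
    cleaned = " ".join(original_name.split())  # collapse stray whitespace
    if not cleaned:
        return None, None
    first_word = cleaned.split(" ")[0]
    # True for strings such as "Banksia sp." or "Acacia spp"
    unknown_species = re.search(r" (sp|ssp|spp)\.?$", cleaned) is not None

    # Ordered rules; the codes are borrowed from the table as labels only.
    rules = [
        ("match_01c", lambda: exact(cleaned, APC_ACCEPTED)),
        ("match_01d", lambda: exact(cleaned, APC_OTHER)),
        ("match_02a", lambda: exact(first_word, GENERA) if unknown_species else None),
        ("match_05a", lambda: fuzzy(cleaned, APC_ACCEPTED)),
        ("match_05b", lambda: fuzzy(cleaned, APC_OTHER)),
        ("match_12a", lambda: exact(first_word, GENERA)),
    ]
    for code, rule in rules:
        aligned = rule()
        if aligned is not None:
            return code, aligned  # first successful rule wins
    return None, None  # name could not be aligned


if __name__ == "__main__":
    for name in ["Acacia aneura", "Banksia sp.", "Eucalyptus regnens", "Acacia unknownus"]:
        print(name, "->", align(name))
```

Ordering the rules from strictest to loosest means a name that matches exactly is never handed to a fuzzy or genus-only rule, which is the reason the table's sequence matters.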

Updating taxonomy

The following table lists the separate functions used to update aligned names to accepted or suggested names. Different functions are used depending on the taxon rank of the aligned name and the taxonomic dataset to which the name was aligned (APC vs APNI).

The "taxonomic dataset" and "taxon rank" columns give the categories of aligned names processed; the last three columns indicate which output columns are filled in.

| function name | taxonomic dataset | taxon rank | updates to aligned name | format of suggested_name | accepted name (& taxon_ID) | genus (& taxon_ID_genus) | scientific_name_ID |
| --- | --- | --- | --- | --- | --- | --- | --- |
| update_taxonomy_APC_genus | APC | genus | to APC accepted genus | genus sp. [notes]* | no | yes | no |
| update_taxonomy_APNI_genus | APNI | genus | none | genus sp. [notes] | no | no | no |
| update_taxonomy_APC_family | APC | family | none | family sp. [notes] | no | no | no |
| update_taxonomy_APC_species_and_infraspecific_taxa | APC | species & infraspecific | NA | APC accepted species** name | yes | yes | yes |
| – taxonomic_splits = “most_likely_species” | NA | NA | to APC accepted taxon concept | most likely APC accepted species** name [alternative possible names] | yes | yes | yes |
| – taxonomic_splits = “return_all” | NA | NA | to APC accepted taxon concept | all possible APC accepted species** name (extra rows added) | yes | yes | yes |
| – taxonomic_splits = “collapse_to_higher_taxon” | NA | NA | collapsed to APC accepted genus | genus sp. [collapsed names] | no | yes | no |
| update_taxonomy_APNI_species_and_infraspecific_taxa | APNI | species & infraspecific | none to species name; genus to APC accepted genus if possible | APNI listed species** name* | no | sometimes | yes |
| (names not aligned) | (not aligned) | (not aligned) | none | original name | no | no | no |

* genus updated to APC accepted genus if possible; ** species or infraspecific taxon name
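
The three taxonomic_splits options can be pictured with a short sketch. It is an illustration in Python, not the package's R code: the function `resolve_split`, the made-up names "Aus bus" and "Aus cus", the " | " separator and the exact bracket wording are all assumptions chosen to mirror the "format of suggested_name" column above. The sketch also assumes the candidate names are ordered most likely first and share one genus.

```python
def resolve_split(aligned_name, candidates, taxonomic_splits="most_likely_species"):
    """Return output rows for an aligned name with several possible accepted names.

    candidates: possible accepted names, most likely first (hypothetical input).
    """
    if not candidates:
        raise ValueError("at least one candidate name is required")

    if len(candidates) == 1:
        # No split to resolve: a single accepted name is returned unchanged.
        only = candidates[0]
        return [{"aligned_name": aligned_name, "accepted_name": only,
                 "suggested_name": only}]

    if taxonomic_splits == "most_likely_species":
        best, rest = candidates[0], candidates[1:]
        note = f" [alternative possible names: {' | '.join(rest)}]"
        return [{"aligned_name": aligned_name, "accepted_name": best,
                 "suggested_name": best + note}]

    if taxonomic_splits == "return_all":
        # One extra row per possible accepted name.
        return [{"aligned_name": aligned_name, "accepted_name": name,
                 "suggested_name": name} for name in candidates]

    if taxonomic_splits == "collapse_to_higher_taxon":
        genus = candidates[0].split(" ")[0]  # assumes a shared genus
        note = f" sp. [collapsed names: {' | '.join(candidates)}]"
        return [{"aligned_name": aligned_name, "accepted_name": None,
                 "suggested_name": genus + note,
                 "number_of_collapsed_taxa": len(candidates)}]

    raise ValueError(f"unknown taxonomic_splits option: {taxonomic_splits}")


if __name__ == "__main__":
    names = ["Aus bus", "Aus cus"]  # made-up taxon names
    for option in ("most_likely_species", "return_all", "collapse_to_higher_taxon"):
        print(option, resolve_split("Aus bus", names, option))
```

Only "return_all" changes the number of rows; the other two options keep one row per input name and record the ambiguity inside suggested_name.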

Outputs of APCalign

The following columns are output by the core function create_taxonomic_update_lookup and the two component functions align_taxa and update_taxonomy.

| variable | returned by | description |
| --- | --- | --- |
| original_name | default | The original plant name. |
| aligned_name | default | The input plant name that has been aligned to a taxon name in the APC or APNI by the align_taxa function. |
| accepted_name | default | The APC-accepted plant name when available. |
| suggested_name | default | The suggested plant name to use. Identical to the accepted_name when an accepted_name exists; otherwise the suggested_name is the aligned_name or the aligned name with an outdated genus updated. |
| genus | default | The genus of the accepted (or suggested) name; only APC-accepted genus names are filled in. |
| family | full | The family of the accepted (or suggested) name; only APC-accepted family names are filled in. |
| taxon_rank | default | The taxonomic rank of the suggested (and accepted) name. |
| taxonomic_dataset | default | The source of the suggested (and accepted) names (APC or APNI). |
| taxonomic_status | full | The taxonomic status of the suggested (and accepted) name. |
| aligned_reason | default | The explanation of a specific taxon name alignment (from an original name to an aligned name). |
| update_reason | default | The explanation of a specific taxon name update (from an aligned name to an accepted or suggested name). |
| subclass | full | The subclass of the accepted name. |
| taxon_distribution | full | The distribution of the accepted name; only filled in if an APC accepted_name is available. |
| scientific_name_authorship | default | The authorship information for the accepted (or synonymous) name; available for both APC and APNI names. |
| taxon_ID | full | The unique taxon concept identifier for the accepted_name; only filled in if an APC accepted_name is available. |
| taxon_ID_genus | full | An identifier for the genus; only filled in if an APC-accepted genus name is available. |
| scientific_name_ID | full | An identifier for the nomenclatural (not taxonomic) details of a scientific name; available for both APC and APNI names. |
| taxonomic_status_aligned | full | The taxonomic status of the aligned name before any taxonomic updates have been applied. |
| row_number | full | The row number of a specific original_name in the input. |
| number_of_collapsed_taxa | default | The number of possible taxon names that have been collapsed when taxonomic_splits == “collapse_to_higher_taxon”. |
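
Two rules from this table lend themselves to a short sketch: how suggested_name is derived, and which columns appear in the default versus the full output. The Python below is illustrative only. The function names, the `aligned_name_with_updated_genus` argument and the `full` flag are hypothetical, and the default column list is simply read off the "returned by" column above.

```python
# Columns marked "default" in the table above; all others appear only in the full output.
DEFAULT_COLUMNS = [
    "original_name", "aligned_name", "accepted_name", "suggested_name",
    "genus", "taxon_rank", "taxonomic_dataset", "aligned_reason",
    "update_reason", "scientific_name_authorship", "number_of_collapsed_taxa",
]


def suggested_name(accepted_name, aligned_name, aligned_name_with_updated_genus=None):
    """Prefer the accepted name; otherwise fall back to the aligned name,
    using the version with an updated genus when one is available."""
    if accepted_name:
        return accepted_name
    if aligned_name_with_updated_genus:
        return aligned_name_with_updated_genus
    return aligned_name


def select_columns(row, full=False):
    """Return every column when full is True, else only the default columns."""
    if full:
        return dict(row)
    return {key: value for key, value in row.items() if key in DEFAULT_COLUMNS}


if __name__ == "__main__":
    row = {
        "original_name": "Aus bus",  # made-up name
        "aligned_name": "Aus bus",
        "accepted_name": None,
        "family": "Ausaceae",        # made-up family, a full-only column
    }
    row["suggested_name"] = suggested_name(row["accepted_name"], row["aligned_name"])
    print(select_columns(row))
    print(select_columns(row, full=True))
```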