rDNA ITS based identification of Eukaryotes and their communication via DOIs
Frequently Asked Questions
Species Hypothesis (SH) – any species-level group of individuals that share a given set of observed characters. In this case, the individuals are sequences that share a given level of similarity – that is, operational taxonomic units (OTUs). However, since we’d like to bridge the gap between OTUs and species, we’d like to define our OTUs in such a way that they correspond to the species level as currently practiced in mycology. This is a non-trivial exercise and the results will not always be perfect, hence the suffix "hypothesis".
Remark: This is a practical definition of the SH, which is a major building block of the new UNITE system.
Accession code of the SH - each SH has a unique accession code - e.g. SH037970.06FU - where SH translates into Species Hypothesis and is followed by a unique number, full stop, version number, and taxon acronym (FU for Fungi).
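The structure of the accession code can be illustrated with a small parser. This is only a sketch: the regular expression is inferred from the single example above (SH037970.06FU), and the names SH_CODE, SHCode, and parse_sh_code are hypothetical, not part of any UNITE tool.

```python
import re
from typing import NamedTuple

# Pattern inferred from the example "SH037970.06FU" (an assumption, not an
# official specification): "SH", digits, full stop, version digits, taxon acronym.
SH_CODE = re.compile(r"^SH(\d+)\.(\d+)([A-Z]+)$")


class SHCode(NamedTuple):
    number: str   # unique number, kept as text to preserve leading zeros
    version: str  # version number, e.g. "06"
    taxon: str    # taxon acronym, e.g. "FU" for Fungi


def parse_sh_code(code: str) -> SHCode:
    """Split an SH accession code into its three parts."""
    match = SH_CODE.match(code)
    if match is None:
        raise ValueError(f"not a recognised SH accession code: {code!r}")
    return SHCode(*match.groups())


parsed = parse_sh_code("SH037970.06FU")
print(parsed.number, parsed.version, parsed.taxon)
```

The parts are kept as strings so that the code can be reassembled exactly as it was written.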
Full name of the SH – the accession code of an SH – such as SH037970.06FU – is enough for unique reference (and is, in fact, the only unique reference available for the countless species known only from "Uncultured fungus" sequences). Nevertheless, it is not overly informative for humans. The full name of that particular SH is "Boletus satanas | SH037970.06FU | 97 | UDB000418". These items are the Latin (or other) name given to the sequence | the SH accession code | the similarity level at which the SH was designated | the INSDC/UNITE accession number of the representative/reference sequence. The URL of this particular SH is https://unite.ut.ee/sh/SH037970.06FU.
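As an illustration, the four items of a full name can be separated at the "|" characters, and the URL can be built from the accession code. This is a minimal sketch: the function names and dictionary keys are hypothetical, and the URL pattern is assumed from the single example given above.

```python
def parse_full_name(full_name: str) -> dict:
    """Split a full SH name into its four '|'-separated items."""
    parts = [part.strip() for part in full_name.split("|")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 items, got {len(parts)}: {full_name!r}")
    taxon_name, sh_code, similarity, sequence_accession = parts
    return {
        "taxon_name": taxon_name,                  # Latin (or other) name
        "sh_code": sh_code,                        # SH accession code
        "similarity_percent": float(similarity),   # level at which the SH was designated
        "sequence_accession": sequence_accession,  # INSDC/UNITE accession number
    }


def sh_url(sh_code: str) -> str:
    """Build the SH page URL, following the pattern of the example above."""
    return "https://unite.ut.ee/sh/" + sh_code


record = parse_full_name("Boletus satanas | SH037970.06FU | 97 | UDB000418")
print(record["taxon_name"], sh_url(record["sh_code"]))
```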
Reference sequence (RefS) – serves as a name anchor for the Species Hypothesis and is chosen manually by an expert to define the SH. It may originate from any biological sample, viz. herbarium specimen, living culture, soil, water, air, tissue of another organism, etc. The RefS forms part of the full name of the SH and is thus used for scientific communication.
Representative sequence (RepS) – when a RefS is not available (that is, none has been designated manually), a representative sequence (RepS) is chosen automatically from the most common sequence type in the SH. All SHs have a RepS. For SHs for which a RefS has been designated, the RefS takes precedence over the RepS. RepS are shown in green font on the SH pages (e.g., https://unite.ut.ee/sh/SH037970.06FU).
All UNITE SHs are assigned a unique, stable DOI (e.g. doi:10.15156/BIO/SH001616.07FU). Using these DOIs in publications would make results reproducible, link scientific results together across datasets and publications, and allow data assembly over time. Kõljalg et al. (2016) showed that this DOI-based system allows discovery and communication of fungal species even in the absence of formal names. DOIs are always connected to the fungal classification, and when species names become available, then UNITE will connect them to the DOIs post-factum.
SHs are versioned and linked across versions by following 1) their reference and representative sequences (the preferred way) or 2) their SH composition. Previous versions carry a link to the newer version(s), with extra information on splits and merges (see SH004673.06FU for an example). The current version shows link(s) to the previous version(s).
Currently, two major versions (6 and 7) and two minor versions (7.0 and 7.1) are published and available for searching and browsing online. In a major version, the sequences from the previous version and any new sequences are reclustered and assigned new accession codes (the version number is reflected in the SH code, see Accession code of the SH) and new representative sequences. Reference sequences are carried over from the previous version to the new one. If SHs are merged in the new version (e.g. two distinct SHs in version 6 form one SH in version 7), the reference sequence that was set earlier takes priority in the new version. In a minor version, existing SHs are complemented with new sequences, and new SHs may be added that consist of new sequences only, but there are no splits or merges.
NB! Only version 7.0 SHs are published as DOIs. The current version is 7.1.
Each threshold value gives the minimum distance between two sister Species Hypotheses (SHs). A threshold value of 1.0% means that two sequences belong to different SHs if their ITS sequences differ by more than 1%. However, they fall into a single SH at the 1% threshold if a third sequence that is 0.5% different from both becomes available. In other words, two sister SHs merge into a single SH at a given threshold value if the distance between the two most similar sequences, one from each sister SH, does not exceed that threshold.
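The effect of a bridging third sequence can be sketched with a small single-linkage grouping. This is an illustration only, not UNITE's clustering pipeline: the function name single_linkage, the sequence labels A, B, and C, and the distance values are all hypothetical, and the rule that a pair is joined when its distance is less than or equal to the threshold is an assumption.

```python
from itertools import combinations


def single_linkage(names, dist, threshold):
    """Group names so that any pair at distance <= threshold shares a cluster."""
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the cluster root, shortening the path on the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if dist[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)  # join the two clusters

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(sorted(cluster) for cluster in clusters.values())


# Illustrative pairwise distances in percent (made-up values).
dist = {
    frozenset(("A", "B")): 1.2,
    frozenset(("A", "C")): 0.8,
    frozenset(("B", "C")): 0.8,
}

without_c = single_linkage(["A", "B"], dist, threshold=1.0)
with_c = single_linkage(["A", "B", "C"], dist, threshold=1.0)
print(without_c)  # A and B stay in separate groups
print(with_c)     # C links A and B into one group
```

Sequences A and B are more than 1% apart and stay separate on their own, but once C is within the threshold of both, all three end up in one group.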
SH taxon names for online resources (SH pages, search results) are updated regularly (twice per month) to take into account new identifications and reference sequences (RefS) set since the last update. SHs with a RefS carry the name of the RefS. For SHs without a RefS, the identifications of all sequences falling into the SH are considered as follows:
- The lowest common ancestor (taxon name at kingdom, phylum, class, order, family, or genus level) is found by considering the identifications of all sequences in the SH (unidentified ranks excluded);
- Unless there are conflicts below the lowest common ancestor (e.g. 2 sequences are assigned to 2 different genera), if any species-level name is available in the set of SH sequences, it will be selected to represent the SH;
- In case of conflicts, the conflicting sequences and taxon names are tracked (and presented when working with SHs in PlutoF), and the SH will be assigned the taxon name of the lowest common ancestor.
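The naming rules above for SHs without a RefS can be sketched as follows. This is one possible reading, not the actual UNITE or PlutoF implementation: the function name name_sh, the dictionary layout, and the example identifications are hypothetical, and treating two different species names as a conflict is an assumption.

```python
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]


def name_sh(identifications):
    """Return (taxon_name, conflicting_names) for an SH without a RefS.

    Each identification maps rank names to taxon names; a missing or empty
    rank counts as unidentified and is ignored.
    """
    lca = None
    for rank in RANKS:
        names = {ident[rank] for ident in identifications if ident.get(rank)}
        if len(names) > 1:
            # Conflict at this rank: keep the lowest common ancestor found so far.
            return lca, sorted(names)
        if len(names) == 1:
            lca = names.pop()

    species = {ident["species"] for ident in identifications if ident.get("species")}
    if len(species) == 1:
        return species.pop(), []
    if len(species) > 1:
        # Assumption: differing species names are treated as a conflict.
        return lca, sorted(species)
    return lca, []


# Illustrative identifications (not taken from a real SH).
seq_a = {"kingdom": "Fungi", "order": "Boletales", "family": "Boletaceae",
         "genus": "Boletus", "species": "Boletus satanas"}
seq_b = {"kingdom": "Fungi", "order": "Boletales"}  # unidentified below order
seq_c = {"kingdom": "Fungi", "order": "Boletales", "family": "Suillaceae",
         "genus": "Suillus"}

no_conflict = name_sh([seq_a, seq_b])
conflict = name_sh([seq_a, seq_b, seq_c])
print(no_conflict)
print(conflict)
```

In the first call the only species-level name represents the SH. In the second, the two families disagree, so the name falls back to the order and the conflicting names are returned for tracking.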
Anyone with documented taxonomic expertise in any given lineage of fungi can join the effort. Experts are asked only to work with the species they know very well. Instructions on how to register are available HERE. Instructions on how to annotate and work with SHs are available HERE.
Nilsson et al. (2014) and Nilsson et al. (2016) list several reasons why we need to facilitate the study of environmental communities of fungi for the general scientific community. In short, when researchers sequence fungal communities, they need – and deserve – correct names of species or higher-level groups, unlike the "Uncultured fungus" classification they get today. Failure to meet this demand will make mycology look bad, which is not a good thing in today's competitive scientific environment. Also, knowledge is likely to accumulate faster for fungal lineages where order prevails; if all researchers are able to communicate and relate their findings to well-defined species (and/or higher levels), then we won't have to deal with data associated with misidentified or unidentified species of fungi. Finally, going through the SHs for the species within one's core expertise is a great way to stay abreast of developments and to explore the available data for overlooked patterns and research hypotheses.