Databases in BIBI IV Automated ProKaryotes Phylogeny

Databases

Database sources

leBIBI now uses the rDNA database of riboDB release 17.0.

rDNA sequences are extracted as follows:
From the genomes of Bacteria present in RefSeq, plus GenBank genomes of species absent from the RefSeq DB
From the genomes of Archaea present in RefSeq and GenBank
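The genome-source rule above can be sketched as a filter. This is a minimal illustration under stated assumptions, not the riboDB code: the `Genome` record and `select_genomes` function are hypothetical, and the sketch ignores the fact that many RefSeq assemblies also have a GenBank counterpart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Genome:
    # Hypothetical record; the field names are illustrative only.
    accession: str
    domain: str   # "Bacteria" or "Archaea"
    source: str   # "RefSeq" or "GenBank"
    species: str


def select_genomes(genomes: list[Genome]) -> list[Genome]:
    """Keep genomes according to the source rules described above (sketch)."""
    refseq_bacteria_species = {
        g.species for g in genomes
        if g.domain == "Bacteria" and g.source == "RefSeq"
    }
    selected = []
    for g in genomes:
        if g.domain == "Archaea":
            # Archaea: genomes from both RefSeq and GenBank are kept.
            selected.append(g)
        elif g.source == "RefSeq":
            # Bacteria: every RefSeq genome is kept.
            selected.append(g)
        elif g.species not in refseq_bacteria_species:
            # Bacteria: GenBank genomes only fill species missing from RefSeq.
            selected.append(g)
    return selected
```

The asymmetry is the point of the rule: for Bacteria, GenBank is only a fallback for species that RefSeq does not cover, while for Archaea both sources are pooled.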

For 16S rDNA, the DB contains 138,286 Bacteria genomes and 4,779 Archaea genomes, representing 22,781 species names.
In the common case of multiple rRNA operons, only one rDNA copy is retained, chosen on the basis of its centrality.
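The page does not define "centrality" precisely. One common reading is the medoid: the copy whose summed distance to all other copies of the same genome is smallest. The sketch below assumes that reading and uses `difflib.SequenceMatcher` as a stand-in dissimilarity; a real pipeline would more likely rely on alignment-based identity. `most_central_copy` is a hypothetical name.

```python
from difflib import SequenceMatcher


def distance(a: str, b: str) -> float:
    # Stand-in dissimilarity: 1 - difflib similarity ratio (assumption,
    # not the metric used by leBIBI/riboDB).
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def most_central_copy(copies: list[str]) -> str:
    """Return the medoid: the rDNA copy with the smallest summed distance to the others."""
    if not copies:
        raise ValueError("no rDNA copies given")
    return min(copies, key=lambda c: sum(distance(c, other) for other in copies))
```

For example, among the toy copies `"AAAA"`, `"AAAT"` and `"AATT"`, the middle one, `"AAAT"`, is returned because it is closest to both of the others. Choosing a medoid rather than, say, the first operon avoids retaining an atypical copy.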

Taxonomy DB

EMBL-ENA taxonomy (XML format). A dictionary written in Julia is built from it and is used to access the normalized taxonomy/nomenclature hierarchy.
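The dictionary approach can be illustrated as a map from taxon ID to a node that stores its name, rank and parent, so a full lineage is recovered by walking parent links up to the root. leBIBI's own dictionary is written in Julia; the Python sketch below only illustrates the idea, and `TaxonNode`, `lineage` and the "root points to itself" convention are assumptions.

```python
from typing import NamedTuple


class TaxonNode(NamedTuple):
    name: str
    rank: str
    parent: int  # taxon ID of the parent; by assumption the root is its own parent


def lineage(taxonomy: dict[int, TaxonNode], taxid: int) -> list[tuple[str, str]]:
    """Walk from a taxon up to the root and return (rank, name) pairs, root first."""
    path = []
    while True:
        node = taxonomy[taxid]
        path.append((node.rank, node.name))
        if node.parent == taxid:  # reached the root
            break
        taxid = node.parent
    return list(reversed(path))
```

Once the XML has been parsed into such a dictionary, each lookup is a constant-time access, so normalizing the hierarchy of a hit costs only as many steps as the lineage has levels.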

Type Strains DB

Type strain sequence information now comes from RefSeq.

Available DB

Archaea and Bacteria 16S stringent DB

This is the default DB. It contains NCBI reference genomes (R) plus type-strain genomes (T).
If a species is missing, we try to find one or more of its genomes in Ensembl Bacteria (E).
If no T, R or E genome is found, we select a genome by assembly completeness (Complete better than Scaffold better than Unassembled), then by the longest 16S rRNA. This is the TRECS_16SrRNA.fst DB, also named the prototype low-redundancy DB.
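The per-species selection cascade above can be sketched as follows. This is a minimal illustration, not the leBIBI implementation: the `Candidate` record, its field names and `select_for_species` are hypothetical, and the three assembly levels are taken verbatim from the text.

```python
from dataclasses import dataclass

# Assembly levels ranked best-first, as stated in the text.
ASSEMBLY_RANK = {"Complete": 0, "Scaffold": 1, "Unassembled": 2}


@dataclass
class Candidate:
    # Hypothetical per-genome record for one species.
    accession: str
    is_reference: bool    # R
    is_type_strain: bool  # T
    in_ensembl: bool      # E
    assembly_level: str   # "Complete", "Scaffold" or "Unassembled"
    rrna_length: int


def select_for_species(cands: list[Candidate]) -> list[Candidate]:
    """Apply the T/R, then E, then completeness/length cascade (sketch)."""
    tr = [c for c in cands if c.is_reference or c.is_type_strain]
    if tr:
        return tr  # all reference and type-strain genomes are kept
    ens = [c for c in cands if c.in_ensembl]
    if ens:
        return ens  # fall back to Ensembl Bacteria genomes
    if not cands:
        return []
    # Last resort: best assembly level first, then the longest rRNA.
    best = min(cands, key=lambda c: (ASSEMBLY_RANK[c.assembly_level], -c.rrna_length))
    return [best]
```

Sorting on the tuple `(assembly rank, -length)` makes completeness the primary criterion and rRNA length only a tie-breaker within the same assembly level.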

Archaea and Bacteria 16S relaxed DB

This is the "all named sequences" DB, available in the "expert" version. It is the complete riboDB release 17.0 (here named BiBi_16SrRNA.fst).

Bacteria DNA-directed RNA polymerase subunit beta and beta/beta' DB

This DB is built during the riboDB release 17.0 construction process and follows the same stringency rules as the 16S stringent DB.
It is also extended to some taxa in which the rpoB and rpoC genes are fused.

Mycobacteriales Chaperonin GroEL

Built in the same way as the preceding DB, it contains both paralogs, groEL-1 and groEL-2. Because the process includes a BLAST search, mixing the two is not a problem: the closest sequences in terms of similarity will be found.
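Why mixing paralogs is harmless can be illustrated with a best-hit search: a query is compared with every record, and the most similar one wins regardless of which paralog it belongs to. The sketch below is a deliberate simplification, not BLAST: it scores pre-aligned, equal-length sequences by plain percent identity, and the `species|paralog` record IDs are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_hit(query: str, db: dict[str, str]) -> tuple[str, float]:
    """Return (record ID, identity) of the most similar record in a mixed-paralog DB."""
    return max(
        ((rid, identity(query, seq)) for rid, seq in db.items()),
        key=lambda hit: hit[1],
    )
```

With a toy DB holding `spA|groEL-1`, `spA|groEL-2` and `spB|groEL-1`, a groEL-1-like query lands on a groEL-1 record, because copies of the other paralog are less similar to it.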
