leBIBI IV 16S/23S Automated ProKaryotes Phylogeny

Some useful information

Please read the information on the "One click mode process". Most of the options are self-explanatory, so only a few specific points need to be explained here.

Mitigation of low quality positions of the query

Conditions of use

This option is useful ONLY when submitting a low-quality sequence, and it is especially important when that sequence is short.

When the sequences are of good quality, the process still runs but makes no correction, so there is no need to disable it.

Mechanism

After the alignment, a light correction of the query replaces with gaps some positions that appear to be sequencing errors. For example, a position is replaced if it is an "A" in the query BUT no "A" is present in the corresponding column of the alignment. The same applies to positions containing non-cardinal letters (anything other than A, T, C or G; FastTree does the same at the beginning of its work).
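The rule described above can be illustrated with a minimal sketch. It is not the leBIBI implementation: the function name `mask_query` is hypothetical, and the sketch assumes that the query and the reference rows are already aligned and have the same length.

```python
def mask_query(query: str, references: list[str]) -> str:
    """Replace suspicious positions of an aligned query with gaps.

    A position is masked when its letter is not A, C, G or T, or when
    no reference sequence carries the same letter in that column.
    Assumption: all rows are aligned and have equal length.
    """
    cardinal = set("ACGT")
    masked = []
    for i, base in enumerate(query.upper()):
        if base == "-":
            # Existing gaps are kept unchanged.
            masked.append("-")
            continue
        # Letters observed in this column among the reference rows.
        column = {ref[i].upper() for ref in references}
        if base not in cardinal or base not in column:
            masked.append("-")
        else:
            masked.append(base)
    return "".join(masked)


references = ["ACGTAC", "ACGTTC", "ACGAAC"]
print(mask_query("ACNTGC", references))  # AC-T-C
```

In this toy example the "N" is masked because it is not a cardinal letter, and the "G" is masked because no reference row has a "G" in that column.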

The columns with more than the selected ratio of gaps are then deleted.
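As an illustration of this column filter, here is a short sketch under stated assumptions: the function name `drop_gappy_columns` and the parameter `max_gap_ratio` are hypothetical, the alignment is a list of equal-length strings, and "-" is the only gap symbol.

```python
def drop_gappy_columns(alignment: list[str], max_gap_ratio: float) -> list[str]:
    """Delete every column whose gap ratio exceeds max_gap_ratio.

    Assumption: rows have equal length and "-" is the only gap symbol.
    """
    n_rows = len(alignment)
    # Indices of columns whose gap ratio is at or below the threshold.
    keep = [
        i
        for i in range(len(alignment[0]))
        if sum(seq[i] == "-" for seq in alignment) / n_rows <= max_gap_ratio
    ]
    return ["".join(seq[i] for i in keep) for seq in alignment]


alignment = ["AC-GT", "A--GT", "AC-G-", "ACTGT"]
print(drop_gappy_columns(alignment, 0.5))  # ['ACGT', 'A-GT', 'ACG-', 'ACGT']
```

Only the third column (3 gaps out of 4 rows, a ratio of 0.75) exceeds the chosen ratio of 0.5, so it is the only one removed.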

Small sets of bases in the query that lie at an unexpected distance from the core of the alignment are also replaced by gaps (this occurs with short, poor-quality sequences, which induce a strong over-alignment).

When the query is short relative to the alignment, the phylogenetic reconstruction is affected by the number of undetermined positions, because FastTree, or the model it uses, does not tolerate too many of them. This is especially the case when some sequences are 100% identical in the region covered by the query. The query may then be badly misplaced because of this reconstruction artifact, so the length of the alignment has to be adapted to the query.

The major cause is the position of the query on the SSU rDNA gene when, for some species, genera or families, the gene has very low variability in this region. A mitigation has been set up: the length of the alignment is driven by the length of the query as soon as the query is shorter than 90% of the alignment length. This 90% threshold is set to prevent reconstruction errors while retaining some additional information. The alignment length is thus adapted to the query length, corrected by the number of gapped and undetermined positions, with 5% of the query length added, if possible, at both ends.
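One possible reading of this length adaptation is sketched below. It is an interpretation, not the actual leBIBI code: the function name `trim_window` and its parameters are hypothetical, the window is assumed to run from the first to the last non-gap column of the query (so internal gaps are included), and the 5% padding is rounded to whole columns.

```python
def trim_window(query_row: str, threshold: float = 0.90,
                pad_fraction: float = 0.05) -> tuple[int, int]:
    """Return the (start, end) alignment columns to keep, end exclusive.

    Assumption: query_row is the aligned query, with "-" for gaps.
    """
    aln_len = len(query_row)
    covered = [i for i, c in enumerate(query_row) if c != "-"]
    if not covered:
        return 0, aln_len
    query_len = len(covered)
    # A query covering at least 90% of the alignment leaves it untouched.
    if query_len >= threshold * aln_len:
        return 0, aln_len
    # Pad by 5% of the query length at each end, where columns exist.
    pad = round(pad_fraction * query_len)
    start = max(0, covered[0] - pad)
    end = min(aln_len, covered[-1] + 1 + pad)
    return start, end


row = "-" * 30 + "ACGT" * 10 + "-" * 30   # 40 bases in a 100-column alignment
print(trim_window(row))  # (28, 72)
```

Every row of the alignment would then be cut to the returned window, for example with `seq[start:end]`. When the query sits at one edge of the alignment, the padding is applied only on the side where columns are available.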



LABORATORY OF BIOMETRY
AND
EVOLUTIONARY BIOLOGY
