More documentation can be found on the ARB website.
Last update on 08. Apr 2009.

Estimate Parameters from Column Statistics

OCCURRENCE

ARB_DIST

 

DESCRIPTION

In a standard RNA, base frequencies are not equally distributed. Especially in the archaea subclass we find extremely G+C-rich sequences. This has led to a number of new rate corrections, algorithms and programs which:

  • calculate the average G+C content of all/two sequences
  • correct the distance.
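As an illustration of the first step, the following minimal sketch computes the G+C content of a single sequence and the average over several sequences. The function names are hypothetical and this is not the ARB implementation; it assumes that gaps and ambiguity codes are simply ignored.

```python
def gc_content(sequence):
    """Fraction of G and C among the unambiguous bases of one sequence.

    Assumption: gaps ('-', '.') and ambiguity codes are ignored.
    Returns None if the sequence contains no countable base.
    """
    bases = [b for b in sequence.upper() if b in "ACGTU"]
    if not bases:
        return None
    return sum(b in "GC" for b in bases) / len(bases)


def average_gc_content(sequences):
    """Mean G+C content over all sequences that contain at least one base."""
    values = [gc for gc in map(gc_content, sequences) if gc is not None]
    return sum(values) / len(values) if values else None
```

Calling average_gc_content with exactly two sequences gives the pairwise variant mentioned above. How the distance is then corrected is not specified here, so it is left out of the sketch.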

Further research showed, however, that G+C frequencies are not equally distributed within a sequence. In particular, helical parts have a significantly higher G+C content than non-helical parts. One straightforward algorithm would calculate each frequency independently for each column. For small datasets in particular, the resulting frequencies would look like random data, because too few examples are analyzed.
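The independent per-column approach could look roughly like the sketch below. It is only an illustration with a hypothetical function name, assuming an alignment given as equal-length strings in which gaps are skipped. With only a handful of sequences, each column value rests on very few observations, which is why the result looks noisy.

```python
def column_gc_frequencies(alignment):
    """G+C fraction for every column of an alignment.

    Assumption: 'alignment' is a list of equal-length strings; gaps and
    ambiguity codes are skipped. A column without any countable base
    yields None.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    frequencies = []
    for column in zip(*alignment):  # iterate over columns, not rows
        bases = [b.upper() for b in column if b.upper() in "ACGTU"]
        if bases:
            frequencies.append(sum(b in "GC" for b in bases) / len(bases))
        else:
            frequencies.append(None)
    return frequencies
```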

In ARB we implemented a combination of the two approaches. Let's say we want to estimate a parameter 'P' with a maximum variance 'maxvar', so we need a minimum number of samples 'minsap'.

  • All sequence positions are clustered according to
    • helical/non-helical region
    • variability

  • The size of the cluster is chosen with respect to the variability of the sequences to get a minimum of independent events.
  • The final parameter estimate for a column is a weighted sum of the estimate for the cluster and the estimate for the single position.
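How 'minsap' follows from 'maxvar' is not spelled out here. As one possible illustration, assume the parameter is a frequency (a proportion) estimated from n independent observations. Its variance is then p(1-p)/n, which is at most 1/(4n), so n >= 1/(4*maxvar) observations are sufficient. The function name below is hypothetical and the binomial model is an assumption, not a description of the ARB code.

```python
import math


def min_samples(maxvar):
    """Smallest n that guarantees Var(p_hat) <= maxvar for a proportion.

    Assumption: binomial model, Var(p_hat) = p*(1-p)/n <= 0.25/n (worst case
    at p = 0.5).
    """
    if maxvar <= 0:
        raise ValueError("maxvar must be positive")
    return math.ceil(0.25 / maxvar)
```

Under this assumption, a cluster would have to grow until it contains at least min_samples(maxvar) independent events.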

You can give your favourite method a higher weight by controlling the smoothing parameter:

  • Less smoothing -> independent parameter estimates
  • Much smoothing -> clustered parameter estimates
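The effect of the smoothing parameter can be sketched as a simple linear blend. The exact weighting used by ARB is not given here, so the formula, the function name and the 0 to 1 range of 'smoothing' are assumptions made for illustration only.

```python
def smoothed_estimate(position_estimate, cluster_estimate, smoothing):
    """Weighted sum of the single-position and the cluster estimate.

    Assumption: 'smoothing' lies in [0, 1];
    0 -> purely independent (per-position) estimate,
    1 -> purely clustered estimate.
    """
    if not 0.0 <= smoothing <= 1.0:
        raise ValueError("smoothing must be between 0 and 1")
    return smoothing * cluster_estimate + (1.0 - smoothing) * position_estimate
```

For example, with a position estimate of 0.8 and a cluster estimate of 0.6, a smoothing value of 0.0 returns the position estimate unchanged, and 1.0 returns the cluster estimate.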

To get a good tree, we recommend that you try all selections.

 

NOTES

To get parameters from a column statistic, you first have to create one. Do this with <ARB_NT/SAI/Positional Variability (Parsimony M.)>.

 

WARNINGS

Problems may occur when

  1. independent parameter estimates is selected and
  2. your dataset is quite small (<100 sequences) and
  3. one sequence is bad or badly aligned

or

  1. Much smoothing of parameters is selected and
  2. you are analyzing ribosomal RNA and
  3. 'Use Helix Information' is turned off

 

BUGS

No bugs known