Sabine MERCIER
Tel.: (33)[0]2 35 14 71 34
Fax: (33)[0]2 32 10 37 94
Email: Sabine.Mercier@univ-rouen.fr
Thesis defended on 16 December 1999
under the supervision of C. Dellacherie, Research Director at the CNRS,
and D. Cellier, Associate Professor (Maître de Conférences) at the Université de Rouen,
with the highest distinction (mention très honorable, with the congratulations of the jury)
Discipline: Applied Mathematics
Specialty: Probability and Statistics
Jury composition:

| Role | Name | Position | Institution |
| --- | --- | --- | --- |
| Reviewers | Régnier, M. | Research Director | INRIA Rocquencourt |
| | Prum, B. | Professor | Université d'Évry |
| Thesis advisors | Cellier, D. | Associate Professor (Maître de Conférences) | Université de Rouen |
| | Dellacherie, C. | Research Director | CNRS-Université de Rouen |
| Examiners | Charlot, F. | Associate Professor (Maître de Conférences) | Université de Rouen |
| | Lecroq, T. | Associate Professor (Maître de Conférences) | Université de Rouen |
| | Risler, J.L. | Research Director | CNRS-Université de Versailles |
| | Robert, C. | Professor | Université de Rouen |

Abstract
The comparison of two biological sequences is an important
tool for analyzing molecular biology data. To make such
comparisons, numerical values, called scores, are assigned
to the different pairs of sequence components (nucleotides
or amino acids), and we search for the regions that achieve
the maximal score, called the local score.
The statistical problem is to test whether the computed score
is significant, in order to reveal a biological
link between the sequences. The goal of this thesis is
to study the distribution of the local score.
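
As a minimal illustration of the local score (a sketch, not code from the thesis), assume the comparison has already been reduced to a sequence of integer scores, one per compared pair of components. The hypothetical helper `local_score` below returns the largest sum over any contiguous region, with the empty region counting as 0. It does so by tracking the best sum of a region ending at the current position and resetting that sum to 0 whenever it becomes negative.

```python
def local_score(scores):
    """Return the maximal sum over contiguous segments of `scores`.

    Assumption: `scores` is an iterable of integers, one score per
    compared pair of sequence components. The empty segment is
    allowed, so the result is never negative.
    """
    best = 0      # best segment sum seen so far
    current = 0   # best sum of a segment ending at the current position
    for x in scores:
        # Either extend the running segment or restart from zero.
        current = max(current + x, 0)
        best = max(best, current)
    return best


if __name__ == "__main__":
    print(local_score([1, -2, 3, 1, -1, 2, -4]))
```

The example call scans a short, made-up score sequence; the best region there is `3, 1, -1, 2`.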
To this end, the sequences are modeled by independent,
identically distributed, integer-valued random variables.
We first assume that the expected score is non-positive.
The work of Karlin et al., implemented in B.L.A.S.T. for
sequence alignment, is given a more precise proof and is improved.
Using random walk theory, we then establish the distribution
of the maximum of the partial sums. This distribution
is the unique invariant probability distribution of a Markov
chain. This result yields a new asymptotic
(for long sequences) approximation of the local score distribution
that improves on the one given by Karlin et al.
We then obtain the exact distribution of the local score
using Markov chain theory. This result holds for negative,
positive, or zero expected score; it expresses the distribution
as a power of a certain matrix and is well suited to short
sequences.
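
The matrix-power idea can be sketched as follows. This is one standard construction under the i.i.d. integer-score model, and the abstract does not say which matrix the thesis uses, so it should be read as an assumption. The running sum, reset to 0 when negative, is treated as a Markov chain on the states 0 to `a`, where the state `a` is made absorbing and means "threshold reached". The probability that the local score of a sequence of length `n` is at least `a` is then the entry (0, `a`) of the `n`-th power of the transition matrix. The names `local_score_tail`, `score_probs`, `a`, and `n` are hypothetical.

```python
def local_score_tail(score_probs, a, n):
    """Return P(local score of n i.i.d. scores >= a).

    Assumptions: `score_probs` maps each integer score to its
    probability (summing to 1), `a` is a non-negative integer
    threshold, and `n` is the sequence length.
    """
    size = a + 1
    # Transition matrix of the running sum, floored at 0 and capped at a.
    P = [[0.0] * size for _ in range(size)]
    for u in range(a):
        for x, p in score_probs.items():
            v = min(max(u + x, 0), a)
            P[u][v] += p
    P[a][a] = 1.0  # once the threshold is reached, stay there

    # Row 0 of P**n, obtained by n successive vector-matrix products.
    dist = [1.0] + [0.0] * a
    for _ in range(n):
        dist = [sum(dist[u] * P[u][v] for u in range(size))
                for v in range(size)]
    return dist[a]


if __name__ == "__main__":
    # Scores +1 and -1 with equal probability, threshold 2, length 10.
    print(local_score_tail({1: 0.5, -1: 0.5}, 2, 10))
```

Because the matrix has only `a + 1` rows and the cost grows linearly with `n`, this kind of computation is practical mainly for short sequences and moderate thresholds, which matches the use case described above.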
The two approaches studied in the thesis are different
and independent of each other, and also of the one used by
Karlin et al. The results can easily be generalized
to sequences with Markovian dependence.
Keywords:
Global score, local score, P-value, Markov chains,
i.i.d. Bernoulli sequence, random walks, statistical significance,
alignment.