## The Statistics of Gapped Sequence Alignments

- Nov. 16, 2007
- 3:30 p.m.
- LeConte 401

## Abstract

Sequence comparison is indispensable to modern molecular biology. For example, biologists run the BLAST program over the web more than once a second to compare their query sequences against databases. If a query matches a database sequence of known function with a small p-value, the biological function of the query can be inferred. Presently, no on-line method can compute p-values to the accuracy the BLAST program requires, so searches are restricted to a fixed set of pre-computed statistical parameters, to the detriment of certain types of database retrieval applications. Over the past two years, we have reduced the simulation time required to estimate BLAST statistical parameters from about two days to less than one second, and we have prototype code for on-line estimation. Our mathematical methods entail many interesting speculations.

This is joint work with Yonil Park and Sergey Sheetlin.