Ph.D. Theses

Mathematical Models for Mycobacterium Tuberculosis Complex Genotyping and Patient Data

By Inna Vitol
Advisor: Kristin P. Bennett
April 21, 2006

The goal of this project was to develop biologically appropriate mathematical models for genotyping and patient data and to use them to analyze and exploit the information in heterogeneous genotyping and epidemiological databases. These databases can be used to address fundamental questions in public health, particularly the dynamics of emerging infectious diseases. This work focuses on Mycobacterium tuberculosis complex (MTC) because tuberculosis (TB) is a serious reemerging health threat worldwide; even the most optimistic scenarios predict in excess of 80 million new cases and 20 million deaths in the coming decade. Moreover, mycobacteria are one of the most widely sequenced pathogenic groups, and global TB databases already exist.

Advances in molecular methods contribute significantly to our understanding of the spread of TB. Differentiating between patient isolates and using the resulting data to guide the efforts of TB control programs are major applications of MTC genotyping. Our research develops mathematical models for spacer oligonucleotide typing (spoligotyping) and demographic data on TB patients. The spoligotyping method exploits polymorphism in the direct repeat locus of the chromosome of MTC bacteria. Spoligotyping produces a simple binary pattern for each TB isolate and is widely used for MTC strain discrimination.
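To make the "simple binary pattern" concrete, the following minimal sketch stores a spoligotype as a sequence of presence/absence bits and lists the contiguous runs of absent spacers. It assumes the standard 43-spacer assay, which the abstract does not state, and the function names and the example pattern are hypothetical illustrations rather than anything taken from the thesis.

```python
from itertools import groupby
from typing import List, Tuple

# Assumption: the standard spoligotyping assay reports 43 spacers.
NUM_SPACERS = 43


def parse_spoligotype(pattern: str) -> Tuple[int, ...]:
    """Convert a string of '0'/'1' characters into a tuple of bits."""
    if len(pattern) != NUM_SPACERS or set(pattern) - {"0", "1"}:
        raise ValueError("expected a 43-character string of 0s and 1s")
    return tuple(int(c) for c in pattern)


def absent_blocks(bits: Tuple[int, ...]) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) ranges of contiguous absent spacers."""
    blocks = []
    position = 1
    for value, run in groupby(bits):
        length = len(list(run))
        if value == 0:
            blocks.append((position, position + length - 1))
        position += length
    return blocks


def hamming_distance(a: Tuple[int, ...], b: Tuple[int, ...]) -> int:
    """Count the spacers at which two patterns differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical pattern for illustration only: spacers 34-37 absent.
example = parse_spoligotype("1" * 33 + "0" * 4 + "1" * 6)
```

Representing absences as contiguous blocks rather than individual bits is a design choice that matches how spacers are typically described as being lost in runs; a plain bit-by-bit distance such as `hamming_distance` ignores that structure.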

We present SPOTCLUST, a novel mixture modeling approach to advance global studies of MTC genotyping data. SPOTCLUST incorporates biological information on spoligotype evolution without attempting to derive the full phylogeny of MTC. The algorithm is applied to spoligotyping data from strains isolated between 1996 and 2004, primarily from New York State TB patients. Our results both confirm previously defined families of MTC strains and suggest certain new families. Using New York City demographic data, we demonstrate how the resulting models can potentially form the basis of genotyping-based TB control tools. Several alternative methods for analyzing MTC genotype and patient data are explored, and improvements to the current method are suggested. Future work will concentrate on developing methods for merging probabilistic models for spoligotypes and results from other TB genotyping methods with traditional epidemiological data.
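The general idea of mixture modeling on binary patterns can be illustrated with a generic Bernoulli mixture fitted by expectation-maximization. This is not the SPOTCLUST model itself: the abstract does not give its details, and this sketch includes none of the biological information on spoligotype evolution that SPOTCLUST is said to incorporate. The function name, the initialization scheme, and the smoothing constants are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def fit_bernoulli_mixture(
    data: Sequence[Sequence[int]], k: int, iters: int = 50
) -> Tuple[List[float], List[List[float]], List[List[float]]]:
    """Fit a k-component Bernoulli mixture to binary patterns by EM.

    Returns (weights, probs, resp): mixing weights, per-component
    probabilities that each bit is 1, and per-pattern responsibilities.
    """
    n, d = len(data), len(data[0])
    weights = [1.0 / k] * k
    # Assumption: seed each component from an evenly spaced data row,
    # softened to 0.75/0.25 so that no probability is exactly 0 or 1.
    probs = [
        [0.75 if bit else 0.25 for bit in data[j * n // k]] for j in range(k)
    ]
    resp: List[List[float]] = []
    for _ in range(iters):
        # E-step: posterior probability of each component for each pattern.
        resp = []
        for x in data:
            log_p = [
                math.log(weights[j])
                + sum(
                    math.log(probs[j][i] if x[i] else 1.0 - probs[j][i])
                    for i in range(d)
                )
                for j in range(k)
            ]
            top = max(log_p)
            unnorm = [math.exp(v - top) for v in log_p]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate weights and bit probabilities, with light
        # smoothing so that logarithms stay finite.
        for j in range(k):
            n_j = sum(r[j] for r in resp)
            weights[j] = (n_j + 1e-9) / (n + k * 1e-9)
            for i in range(d):
                ones = sum(r[j] * x[i] for r, x in zip(resp, data))
                probs[j][i] = (ones + 1.0) / (n_j + 2.0)
    return weights, probs, resp
```

Each pattern can then be assigned to the component with the largest responsibility, which plays the role of a candidate strain family in this simplified setting. Because the initialization depends on row order, a practical implementation would likely use several restarts.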
