Detection of genomic changes in Mycobacterium tuberculosis
Resistance development in Mycobacterium tuberculosis
Tuberculosis (TB) remains the leading cause of death due to
bacterial infections worldwide. About 8 million new cases of active
TB arise each year resulting in 3 million annual deaths. Roughly 1
billion individuals are believed to harbour latent tuberculosis.
Primary infections with M. tuberculosis are generally
asymptomatic but will in some cases remain as a latent infection.
Tuberculosis is a secondary disease caused by reactivation of the
M. tuberculosis bacteria not fully eliminated after the
primary infection.
Sequencing and sequence analysis of bacterial and fungal genomes and
proteins/peptides have led to a better general understanding of the
pathogenesis of bacterial and fungal infections. Future
understanding of the regulatory events at the molecular level will
increase and be accelerated by using a variety of new technologies
and technology platforms within microarrays, protein chips and
sequence analysis tools (CLC bio workbenches). The aim is to develop
more specific and effective drugs much faster to target e.g. the
expanding multi resistance Mycobacterium tuberculosis strains
in human populations all over the world.
Multidrug resistant bacterial strains arise by sequential
accumulation of resistance mutations for individual drugs. A diverse
array of strategies is available to assist in rapid detection of
drug resistance-associated gene mutations.
Bioinformatics (CLC bio workbenches) together with functional
genomics and functional proteomics have been used to identify
expression pattern (signature) changes in multi resistant M.
tuberculosis strains.
Mycobacterium tuberculosis genome
The
Mycobacterium tuberculosis genome was sequenced in 1998
[
Cole et al., 1998] (see figure
1). Today
many strains of
M. tuberculosis have been sequenced. In pest
control laboratories, sequencing is routinely performed to monitor
strain genomic changes.
Presently with the latest re-annotation of the M.
tuberculosis genome, it is possible to assign a function to 2058
protein coding sequences (52% of the 3995 proteins predicted). Only
376 putative proteins share no homology with known proteins and thus
could be unique to M. tuberculosis.
Drug resistance in Mycobacterium tuberculosis
Today eight agents are used in the treatment of TB. Multi drug
resistant
M. tuberculosis strains are resistant both to
isoniazid (INH) and rifampicin (RFP). These agents have the most
effective bactericidal activity towards
M. tuberculosis.
Nearly 95% of the RFP resistant strains possess a mutation in the
rpoB gene encoding a DNA-dependent RNA polymerase. Approximately
90% of INH resistant strains have a mutation in the inhA, katG, and
ahpG genes encoding enzymes related to a mycolic acid synthesis of
cell wall. Pyrazinamide (PZA) resistant strains have a mutation in
the pncA gene encoding a pyrazinamidase which degrades PZA to the
bactericidal substance, pyrazinoic acid. Streptomycin resistant
strains have a mutation in the rrs and rpsL gene encoding the 16S
rRNA and the 12S ribosomal subunit protein, respectively. Kanamycin
resistance is due to nucleotide substitutions in the rrs gene
encoding 16S rRNA. Ethambutol resistant strains have a mutation in
the embB gene encoding a arabinosyl transferase which catalyzes cell
wall synthesis. Ethambutol resistance is in approximately 60% of
organisms due to amino acid replacements at position 306 of an
arabinosyltransferase encoded by the embB gene.
CLC bio provide some important bioinformatics solutions for
detection and identification of mutations in selected genes of new
strains of M. tuberculosis and other important pathogenic
microbes. An example of this is described below.
Detection of mutations in Mycobacterium tuberculosis embB, rrs and pncA genes
The work flow for detecting and identifying mutations using the
CLC Combined Workbench is summarized in figure
2.
Figure 2: Illustration of work flow.
The multi resistance M. tuberculosis strains are identified
and the genomic DNA extracted. The encoding region of the embB, rrs
and pncA genes are marked with primers using the graphical primer
design. The relevant regions are sequenced automatically and
assembled into contigs. BLAST searches in selected databases (both
local and NCBI databases) are performed. Subsequently homologous
sequences are aligned and annotations are transferred to the contig
sequence.
In the next two sections, two of the steps in the work flow are
described in further detail.
Zooming in on Primer design
Using the integrated GenBank search function, the annotated
M.
tuberculosis is downloaded. In a few clicks, the relevant genes
(embB, rrs and pncA) are extracted. Guided by the CDS annotations, a
set of primers are calculated for each gene (see
figure
3). Using the interactive primer design
functionality of the
CLC Combined Workbench, the primers can
be designed to match very specific needs.
Figure 3: Primer design guided by CDS annotations (the yellow boxes).
Zooming in on Assembly
The sequencing data is imported into the CLC Workbench, and during
the assembly, the genes are automatically divided into separate
contigs.
The inconsistencies that exist between different reads are inspected
using the variance table (see figure 4).
Figure 4: Assembling of raw sequence data. The upper image is the contig variance table, showing where there are conflicts/variation between the reads. When selecting a row in the table, the bottom view automatically selects this position in the contig, and it can be inspected in more detail.
The reads are both forward and reverse, which are automatically
detected during the assembly. The orientation is reflected in the
color of the reads (in figure
4, the read at the top
is reversed which is indicated by the red color of the residues).
Quality scores are assigned in order to trim low-quality trace data,
so that reads align properly. The quality scores can also be shown
graphically below the sequence.
The result of the assembly is a contig sequence which is used in the
following steps for BLAST searches and alignments.
The description above is only a brief example of how bioinformatics
software, such as the CLC Workbenches, can be used in clinical
microbiology. Although remarkable advances have been made, much
remains to be learned about the molecular genetic basis of drug
resistance development including M. tuberculosis. New therapeutics
will be developed based on improved bioinformatic data-mining of
results from bacterial drug resistance studies.