Year : 2017  |  Volume : 6  |  Issue : 4  |  Page : 365-378

Sequence homology and expression profile of genes associated with DNA repair pathways in Mycobacterium leprae

1 Department of Biotechnology, Indian Institute of Technology, Hyderabad, Telangana, India
2 Schieffelin Institute of Health Research and Leprosy Center, Vellore, Tamil Nadu, India; Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
3 Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK

Date of Web Publication: 17-Nov-2017

Correspondence Address:
Madhusmita Das
Molecular Biology Laboratory, Schieffelin Institute of Health Research and Leprosy Center, Karigiri, Vellore - 632 106, Tamil Nadu

Source of Support: None, Conflict of Interest: None

DOI: 10.4103/ijmy.ijmy_111_17


Background: Survival of Mycobacterium leprae, the causative bacterium of leprosy, in the human host depends to an extent on the ways in which its genome integrity is retained. DNA repair mechanisms protect bacterial DNA from damage induced by various stress factors. The current study is aimed at understanding the sequence and functional annotation of DNA repair genes in M. leprae. Methods: The genome of M. leprae was annotated using sequence alignment tools to identify DNA repair genes that have homologs in Mycobacterium tuberculosis and Escherichia coli. A set of 96 genes known to be involved in DNA repair mechanisms in E. coli and Mycobacteriaceae were chosen as a reference. Among these, 61 were identified in M. leprae based on sequence similarity and domain architecture. The 61 were classified into 36 characterized gene products (59%), 11 hypothetical proteins (18%), and 14 pseudogenes (23%). All these genes have homologs in M. tuberculosis and 49 (80.32%) in E. coli. A set of 12 genes which are absent in E. coli were present in M. leprae and in Mycobacteriaceae. These 61 genes were further investigated for their expression profiles in the whole transcriptome microarray data of M. leprae, which was obtained from the signal intensities of 60 bp probes tiling the entire genome with 10 bp overlaps. Results: Transcripts corresponding to all the 61 genes were identified in the transcriptome data, with expression levels ranging from 0.18 to 2.47 fold (normalized with 16SrRNA). The mRNA expression levels of a representative set of seven genes (four annotated and three hypothetical protein-coding genes) were analyzed using quantitative polymerase chain reaction (qPCR) assays with RNA extracted from skin biopsies of 10 newly diagnosed, untreated leprosy cases. RNA expression levels were higher for genes involved in homologous recombination, whereas the genes with a low level of expression are involved in the direct repair pathway. Conclusion: This study provided preliminary information on the potential DNA repair pathways that are extant in M. leprae and the associated genes.

Keywords: DNA repair, gene expression, homology, Mycobacterium leprae, phylogeny, transcriptome

How to cite this article:
Sharma M, Vedithi SC, Das M, Roy A, Ebenezer M. Sequence homology and expression profile of genes associated with DNA repair pathways in Mycobacterium leprae. Int J Mycobacteriol 2017;6:365-78

How to cite this URL:
Sharma M, Vedithi SC, Das M, Roy A, Ebenezer M. Sequence homology and expression profile of genes associated with DNA repair pathways in Mycobacterium leprae. Int J Mycobacteriol [serial online] 2017 [cited 2022 Aug 11];6:365-78. Available from: https://www.ijmyco.org/text.asp?2017/6/4/365/218618

Introduction

Stability and integrity of genetic information are crucial to cell survival and multiplication. Both prokaryotes and eukaryotes contain a repertoire of DNA repair pathways that protect the DNA from a myriad of damaging lesions, which can be caused by various external and intracellular factors. Environmental agents such as chemicals, ultraviolet light and ionizing radiation, as well as errors in DNA metabolism, challenge the chemical structure and stability of the genome. These etiological factors lead to a variety of alterations in the normal DNA structure such as single- and double-strand breaks, chemically modified bases, abasic sites, inter- and intra-strand cross-links, and base-pairing mismatches. Given this diversity of threats and their effects, it is not surprising that there is a corresponding diversity in DNA repair pathways.[1] The diversity in functions and complexity of DNA repair pathways is better understood by comparing the mechanisms of action of each of the pathways. Most of what is known about bacterial DNA repair mechanisms is derived from research in Escherichia coli (E. coli). However, genome sequencing has revealed many genes with unknown functions, and clear variations raise questions about the ubiquity of similar DNA repair pathways in the bacterial kingdom. For instance, many species of bacteria, including E. coli, lack an end-joining pathway and depend on homologous recombination to repair double-stranded breaks, whereas others can alternatively use non-homologous end joining (NHEJ) mechanisms.[2] Proteins associated with NHEJ were identified in a number of bacteria, including Bacillus subtilis, Mycobacterium tuberculosis, Mycobacterium smegmatis,[3],[4],[5],[6] and Mycobacterium marinum.[7] Bacteria utilize a remarkably compact version of NHEJ wherein all the required activities are contained in only two proteins: a Ku homodimer and a multifunctional ligase/polymerase/nuclease, LigD.[8]

Belonging to the family Mycobacteriaceae, the genus Mycobacterium includes pathogens known to cause serious diseases in humans, including tuberculosis and leprosy. The etiological agent of leprosy is Mycobacterium leprae. This bacterium has never been successfully grown on an artificial cell culture medium.[9] Instead, it has been grown in mouse foot pads and in armadillos. Armadillos develop infection and manifest disease. M. leprae also has the longest doubling time, 14 days.[10] Due to the absence of an axenic culture medium for propagation, studying cellular processes, especially those belonging to DNA repair pathways, is often challenging. In general, the genes involved in DNA repair mechanisms are a part of the core metabolism and possess similarity with those of E. coli and other Mycobacterial genomes; however, intriguing minor differences suggest biological diversity in bacterial responses to DNA damage.

In this study, the genes in M. leprae that possess a probable role in DNA repair pathways were identified and annotated using computational and laboratory tools. Initially, a bioinformatics approach was employed to analyze and describe the open reading frames (ORFs) in the genome of M. leprae that are potentially related to DNA repair mechanisms. M. leprae-specific homologs and orthologs of genes corresponding to DNA repair pathways in E. coli and M. tuberculosis were identified from the public databases. Most of the genes indicated a range of similarity and identity with orthologs in the genome of M. tuberculosis. However, M. leprae does not possess genes of the typical mismatch repair (MMR) system that are found in most other bacteria. Although M. leprae and E. coli belong to separate phylogenetic groups, many of their DNA repair genes possess substantial similarity. However, some of the vital DNA repair genes that are present in E. coli are absent in M. leprae.[11] Conversely, some of the functionally related genes that are present in M. leprae are absent in E. coli.

Methods

Sequence annotation to identify DNA repair genes in M. leprae genome

The putative ORFs of M. leprae were compared with known DNA repair-related genes obtained from public databases using “BlastP” and DELTA-Blast searches over the Genbank non-redundant (nr) protein database. In a few specific cases, potential DNA repair genes in the M. leprae genome were identified both by sequence similarity searches (using seed sequence orthologs from other organisms) and by keyword searches. The candidate genes associated with DNA repair pathways were then confirmed by sequence similarity searches and by domain analysis using CDD Blast on the Conserved Domain Database (National Center for Biotechnology Information [NCBI]).

Sequence phylogeny analysis

Sequence similarities and evolutionary relatedness of all the probable DNA repair genes in M. leprae identified by the above methods were further analyzed by searching for orthologous and paralogous sequences in the KEGG SSDB database using Smith–Waterman (SW) scoring.[12] Phylogenetic trees were generated for a group of hypothetical protein orthologs and paralogs present in the Mycobacteriaceae family. Protein sequences were aligned using “MUSCLE” (multiple sequence alignment program)[13] and manually adjusted with “Bio-Edit” (http://www.mbio.ncsu.edu/bioedit/bioedit.html). Maximum likelihood phylogenies with 100 bootstrap replicates were computed with PhyML [14] using “Phylogeny.fr.”[15]
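The Smith–Waterman scoring mentioned above can be illustrated with a minimal sketch. It computes only the best local alignment score with a linear gap penalty; the match, mismatch, and gap values are illustrative assumptions and are not the protein substitution matrix or gap scheme used by KEGG SSDB.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b.

    Scoring values are hypothetical defaults chosen for illustration only.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the diagonal with a match or mismatch score
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Because every cell is floored at zero, the score reflects the best-matching local region rather than the full length of either sequence, which is why this family of scores suits ortholog searches across proteins of different lengths.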

Identification of ribosome binding sites and promoters

Nucleotide sequences of putative promoter regions for selected hypothetical proteins were obtained from publicly available databases. For all open-reading frames, 200 nucleotides upstream of the translation initiation site were considered while mapping promoters. Ribosome binding sites (RBS) and promoter sequences were predicted for a common motif by DNA alignments using MUSCLE.[13]
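The extraction of the 200-nucleotide upstream window can be sketched as follows. This is a minimal illustration, assuming the genome is available as a plain string and that gene coordinates are 0-based, half-open, and given on the forward strand; the function names are hypothetical and are not taken from the study.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(genome: str, start: int, end: int,
                    strand: str, length: int = 200) -> str:
    """Return up to `length` nt upstream of the translation start.

    The result is written 5'->3' on the coding strand, so it can be
    aligned directly against upstream regions of homologous genes.
    """
    if strand == "+":
        return genome[max(0, start - length):start]
    if strand == "-":
        # For a minus-strand gene the start codon lies at `end`, so the
        # upstream window is the forward-strand segment that follows it.
        return reverse_complement(genome[end:end + length])
    raise ValueError("strand must be '+' or '-'")
```

Windows near the ends of the sequence are simply truncated rather than padded.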

Insights from whole transcriptome microarray experiments

To determine the activity of the DNA repair genes, their expression levels were analyzed in the transcriptome of M. leprae (whole RNA extracted from human skin biopsies of newly diagnosed untreated leprosy cases) using unpublished data from whole transcriptome experiments conducted by Chaitanya et al. (Schieffelin Institute of Health Research and Leprosy Center, Karigiri) (GEO dataset: GSE85948 private series). Differential gene expression, in terms of signal intensities of the DNA repair genes in the microarray experiment, was normalized with that of 16SrRNA, which is the most commonly used housekeeping gene for measuring the basal level of mRNA expression in prokaryotes.[16],[17] The median intensity value of 16SrRNA noted from the experiments is 8.051386, and this value was used to calculate the expression folds.
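A minimal sketch of this normalization is given below. The reference value is the 16SrRNA median intensity stated above; the direction of the ratio (gene intensity over reference) and the base-2 logarithm are assumptions made for illustration, since the text does not fix either.

```python
import math

# Median 16S rRNA signal intensity reported in the text
REF_16S_MEDIAN = 8.051386


def fold_vs_16s(gene_intensity: float,
                ref_intensity: float = REF_16S_MEDIAN) -> float:
    """Expression fold of a gene relative to the 16S rRNA reference.

    Assumption: fold = gene intensity / reference intensity.
    """
    if gene_intensity <= 0 or ref_intensity <= 0:
        raise ValueError("intensities must be positive")
    return gene_intensity / ref_intensity


def log2_fold(gene_intensity: float,
              ref_intensity: float = REF_16S_MEDIAN) -> float:
    """Log-transformed fold (base 2 is an assumption, not from the text)."""
    return math.log2(fold_vs_16s(gene_intensity, ref_intensity))
```

Under these assumptions, a gene whose intensity equals the reference has a fold of 1 and a log fold of 0.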

Quantitative polymerase chain reaction (qPCR) experiments

Source of Mycobacterium leprae RNA

M. leprae RNA was obtained from the skin biopsies of active leprosy patients. A total of 10 newly diagnosed untreated leprosy cases from the Dermatology Outpatient Department of “Schieffelin Institute of Health–Research and Leprosy Centre”, Karigiri, Tamil Nadu, India, were enrolled in the study following the institutional ethical guidelines. Informed written consent for participation was obtained from all the subjects before enrollment, following the ethical guidelines laid down by the Indian Council of Medical Research. All the procedures conducted in the study were in accordance with the guidelines of the institutional ethical committee and with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards. The excisional skin biopsy samples were collected in RNAlater (Catalog No: R0901, Sigma-Aldrich) under aseptic conditions by a clinician and were sent to the Molecular Biology laboratory for RNA extraction and quantitative polymerase chain reaction (qPCR) experiments.

RNA extraction

RNA extraction was performed using the RNeasy Blood and Tissue Kit (Catalog No: 74104; Qiagen Inc., USA) according to the manufacturer's protocol. Aseptically, skin tissue pieces of 2 mm × 2 mm were cut from the biopsy sample and were minced and ground thoroughly using a manual glass homogenizer. Alternatively, the tissues (up to 30 mg) were disrupted in Buffer RLT and homogenized using a Tissuelyser LT (Catalog No.: 69980, Qiagen Inc., USA). Ethanol was added to the lysate to promote selective binding of RNA to the RNeasy membranes. The sample was then applied to the RNeasy Mini spin column. Contaminants were removed with two washes, and high-quality RNA was eluted in RNase-free water. Genomic DNA contamination was removed by DNase treatment (Catalog No.: EN0521, Thermo Fisher Scientific). To rule out the presence of DNA contamination in the RNA samples, a PCR for the 16SrRNA gene of M. leprae was set up directly from the RNA samples without a reverse transcription reaction. P2 and P3 primers as reported earlier [18] were used in the PCR amplifications. Complementary DNA (cDNA) was constructed from 1 μg of total RNA from each sample using the High-Capacity cDNA Reverse Transcription Kit (Catalog No.: 4368814, Applied Biosystems).

Quantitative polymerase chain reaction

Based on the expression levels of the DNA repair genes identified from the transcriptome data, genes corresponding to a set of four highly expressed annotated proteins and three highly expressed hypothetical proteins were selected for qPCR experiments to confirm the expression levels. cDNA corresponding to these seven transcripts was amplified on a Rotor-Gene Q qPCR machine (Qiagen Inc., USA, Serial Number: R0414139) using the respective primers [Table 1] and the following reaction conditions. A 20 μl reaction mix containing 10 μl of QuantiNova SYBR Green PCR Master Mix (Qiagen, Cat No: 208054), 0.25 μM (0.5 μl) of each of the forward and reverse primers for the respective genes, 7 μl of nuclease-free distilled water, and 2 μl of cDNA (containing approximately 200 ng) was cycled in the Rotor-Gene Q. Cycling conditions included one hold at 95°C for 2 min (initial denaturation and activation of the enzyme), followed by 40 cycles of denaturation at 95°C for 10 s, annealing at 60°C for 15 s, and elongation at 72°C for 20 s. Fluorescence was acquired on the green channel during the annealing step. This was followed by a melting step involving an increase in temperature from 72°C to 95°C at a rate of 1°C/s. Melting curve analysis was performed to determine the integrity of the amplification and to rule out primer-dimer formation.
Table 1: Primer sequences for the seven DNA repair genes chosen for gene expression analysis
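The reaction composition described in this subsection can be checked with a short arithmetic sketch. The per-reaction volumes are those stated in the text; the overage fraction, the function names, and the decision to scale the template together with the other components are illustrative assumptions.

```python
# Per-reaction volumes in microlitres, as stated in the text
PER_REACTION_UL = {
    "SYBR Green master mix": 10.0,
    "forward primer": 0.5,
    "reverse primer": 0.5,
    "nuclease-free water": 7.0,
    "cDNA": 2.0,
}


def total_volume(mix: dict) -> float:
    """Sum of all component volumes for one reaction."""
    return sum(mix.values())


def scale_mix(mix: dict, n_reactions: int, overage: float = 0.10) -> dict:
    """Scale each component for n reactions plus a pipetting overage.

    The 10% default overage is a hypothetical convenience value.
    """
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 3) for name, vol in mix.items()}


def implied_stock_um(final_um: float, reaction_ul: float,
                     added_ul: float) -> float:
    """Primer stock concentration implied by C1 * V1 = C2 * V2."""
    return final_um * reaction_ul / added_ul
```

The components sum to the stated 20 μl, and a final primer concentration of 0.25 μM from 0.5 μl in 20 μl corresponds to a 10 μM working stock.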


Analysis of quantitative polymerase chain reaction data

The mRNA expression levels were normalized using 16SrRNA as a reference: the threshold cycle (Ct) values of each target were normalized to the Ct values of 16SrRNA. The mRNA expression levels were calculated after determining the primer efficiency for all the targets using the Pfaffl method [16] with a standard curve built from a seven-point two-fold serial dilution of M. leprae DNA, from 500 pg/reaction to 7.813 pg/reaction. Melting curve analysis was performed to determine the integrity of the amplification and to rule out primer-dimer formation. PCR for 16SrRNA was performed as reported earlier.[17]
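The efficiency and ratio calculations referred to here can be sketched as follows. The stated range, 500 pg to 7.813 pg per reaction, is consistent with seven points of two-fold dilution, which is an inference from the numbers. Efficiency is taken as 10^(-1/slope) of Ct against log10 of input quantity, and the ratio follows the standard form of the Pfaffl equation; the function names are hypothetical, and no Ct values from the study are used.

```python
import math


def twofold_series(start: float, points: int) -> list:
    """Quantities of a two-fold serial dilution, highest first."""
    return [start / 2 ** i for i in range(points)]


def fit_slope(xs: list, ys: list) -> float:
    """Least-squares slope of ys against xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def amplification_efficiency(quantities: list, cts: list) -> float:
    """Per-cycle amplification factor E = 10 ** (-1 / slope).

    E = 2.0 corresponds to perfect doubling in every cycle.
    """
    slope = fit_slope([math.log10(q) for q in quantities], cts)
    return 10 ** (-1.0 / slope)


def pfaffl_ratio(e_target: float, dct_target: float,
                 e_ref: float, dct_ref: float) -> float:
    """Relative expression ratio.

    Each dct is Ct(control) - Ct(sample) for that gene.
    """
    return (e_target ** dct_target) / (e_ref ** dct_ref)
```

With ideal amplification, each two-fold dilution step shifts Ct by one cycle, so a standard curve slope close to -3.32 per log10 unit indicates an efficiency near 2.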

Results

Genomic sequence annotations

A set of 96 DNA repair genes in the genomes of E. coli and M. tuberculosis were considered as a reference and searched for their presence in M. leprae [Table 2, supplementary data]. This approach was adopted to identify the conserved nature of the DNA repair genes in Mycobacteriaceae and, conversely, to identify the unique DNA repair genes in M. leprae. A BLAST search on the protein database revealed the presence of 61 genes in the genome of M. leprae whose products match orthologous DNA repair genes in E. coli and M. tuberculosis. Genbank annotations of the 61 genes identify 36 as characterized gene products (59%), 11 as hypothetical proteins (18%), and 14 as pseudogenes (23%). All these genes have orthologs in M. tuberculosis and 49 (80.32%) in E. coli. A set of 12 genes which are absent in E. coli are present in M. leprae and Mycobacteriaceae. These include DNA ligases, DNA helicase II (uvrD), DNA helicase ERCC3, error-prone DNA polymerase DnaE2, DNA MMR protein mutT, and uracil DNA glycosylases. Functional annotation of all these proteins in DNA repair mechanisms is presented in [Table 2].
Table 2: Comparison of DNA repair genes of Mycobacterium leprae with Escherichia coli and Mycobacteriaceae family with focus on Mycobacterium tuberculosis


Sequence comparison and phylogenetic analysis

A set of 11 hypothetical genes identified in the above approach, namely ML1105, ML1889, ML0202, ML0190, ML0603, ML2157, ML1351, ML1682, ML2698, ML1683, and ML1175, were further searched for homologs across the prokaryotic databases using the KEGG SSDB search with SW scoring.[12] This was performed to identify the functional characteristics of the hypothetical proteins in relation to DNA repair and to decipher their evolutionary relatedness with homologs in other bacteria. Multiple sequence alignment of these proteins with MUSCLE indicated that many members of the Mycobacteriaceae family contain the conserved residues. All the close homologs that had high sequence identities are hypothetical proteins themselves and belong to the Mycobacteriaceae family. These were selected to build phylogenetic trees. The phylogenetic profiles were bootstrapped 100 times before constructing the trees. All the phylogenetic trees confirmed a close relationship between the 11 hypothetical proteins and proteins from the Mycobacteriaceae family. Hence, these hypothetical proteins are well conserved and might possess a functional role. The closely related species matched include M. haemophilum, M. tuberculosis, M. marinum, and M. kansasii.

Annotation of ribosome binding sites and promoters

To identify the expression characteristics of the 11 hypothetical protein-coding genes mentioned above, the presence of RBS and promoter-like sequences in the 5' UTR was determined by multiple sequence alignment with promoter-like regions of other Mycobacterial homologs. For a representative set of two hypothetical proteins, the transcription initiation sites, Shine-Dalgarno (SD) sequences, and translational start points were aligned to those of their homologs in Mycobacteria [Figure 1]. Some of the hypothetical proteins show low similarity with their Mycobacterial counterparts. Although Mycobacterial promoters, for the most part, share some elements with established bacterial promoters and occur upstream of, or between, the coding regions of two adjoining genes, considerably more diverse promoter sequences also exist and direct transcription in M. leprae. To check whether these hypothetical protein-coding genes are expressed in M. leprae despite lacking canonical promoter regions, a set of three hypothetical proteins that showed low similarity with their homologs in other mycobacteria was chosen, and qPCR was performed to assess gene expression.
Figure 1: Promoter-like sequences upstream of the transcribed Mycobacterium leprae hypothetical proteins ML1683 and ML0190. The figure shows representative alignments of promoter-like sequences for Mycobacterium leprae genes and their mycobacterial homologs, located within 200 nt upstream of the translational start point. Panels A and B represent the ML0190 and ML1683 upstream promoter-like regions, respectively, containing the -35 and -10 regions and the initiation site (i) in relation to their ribosomal binding sites and translational start codons (Start)


Gene expression profiles from the Mycobacterium leprae whole transcriptome microarray

Transcriptome data were analyzed for the 61 genes identified from the sequence-based homology searches above, and transcripts corresponding to all the 61 genes were detected. A set of 60 nt probes tiling every 10 nt and complementary to the transcripts of each of the 61 DNA repair genes in M. leprae (with a mean signal-to-noise ratio cut-off value of ≥2) were analyzed. The signal intensity of each transcript was normalized with that of 16SrRNA, whose median signal intensity was 8.051386. The fold-change in average gene expression levels was obtained by taking the ratio between the 16SrRNA signal intensity value and that of each expressed DNA repair gene, followed by logarithmic transformation. ML1335c demonstrated the highest signal intensity; it is annotated as a pseudogene in M. leprae and has seven stop codons. These observations correlate with earlier findings on higher expression of pseudogenes and their implications in M. leprae.[19] RecN, which is primarily involved in the homologous recombination process, was overexpressed in the current experimental sample. However, the other genes contributing to this pathway are moderately expressed. The least expressed gene is RuvA, which has a signal intensity nearly equal to that of 16SrRNA. A heatmap indicating expression levels of all the 61 DNA repair genes is presented in [Figure 2].
Figure 2: Heat-map of significant expression level changes in genes associated with DNA-repair


Determination of gene expressions of a representative set of seven DNA repair genes by quantitative polymerase chain reaction

The gene expression profiles of three hypothetical protein-coding genes (ML1105, ML0202, and ML0603) and four regular DNA repair genes (RecN, DNAJ1, RuvA, and RecA) in samples from untreated patients were analyzed using qPCR. The qPCR assays were based on target-specific primers and a master mix containing SYBR Green I, a fluorescent dye that intercalates with the double-stranded DNA (dsDNA/cDNA) generated during each successive PCR cycle and emits a fluorescence signal, which is measured quantitatively to track the amplification of cDNA. There is a quantitative relationship between the amount of starting template and the PCR product during the exponential phase of the PCR.[8]

Standard curves to determine the amplification efficiency of the selected genes in quantitative polymerase chain reaction

Before testing on clinical samples, pure stocks of bacterial reference DNA of M. leprae (Br4923 strain) were used to construct standard graphs. These graphs were developed to validate the assays, identify the lower detection limit, and determine error rates in the qPCR experiments. Standard curves were constructed by estimating threshold cycle values for seven two-fold serial dilutions of purified M. leprae DNA ranging from 0.5 ng to 7.8 pg for each qPCR assay [Table 3] and [Figure 3]. The fluorescence threshold was chosen following the common practice of positioning it in the lower half of the fluorescence accumulation curves of the dilution series, and the same threshold was used to calculate the Ct values both for standard curve fitting and for all 10 clinical samples in the study.
Table 3: Standard curves parameters and results for quantitative polymerase chain reaction assays of Mycobacterium leprae DNA

Figure 3: Standard graph of 16SrRNA gene of Mycobacterium leprae


Relative abundance of DNA repair gene transcripts in M. leprae RNA from clinical isolates

qPCR of 16SrRNA served as a positive control, imparting greater sensitivity than assays based on the detection of single or multiple copies of genomic sequences, since each cell contains 1000–10,000 copies of rRNA. Real-time PCR was performed in duplicate for each of the 10 skin biopsies. The mRNA expression levels of all seven genes in clinical isolates from newly diagnosed untreated leprosy cases showed a range of threshold cycle values. The average Ct values across the 10 samples for each gene are presented in [Figure 4].
Figure 4: Mean Ct values of 4 DNA repair genes and 3 hypothetical protein coding genes along with 16SrRNA


Comparative analysis of the expression levels of all seven genes using qPCR and microarray data suggested that RecA, ML0202, and ML0603 showed substantial correlation. The rest of the genes in the analysis showed a poor correlation with observations from the microarray data [Figure 5]. RuvA showed increased expression in qPCR and low intensities in the microarray data. One possible reason for this observation is the selection of leprosy cases that are all highly bacillated, providing high quantities of bacterial RNA. RecA, ML0202, and ML0603 showed similar expression in both qPCR and microarray data, which suggests that ML0202 and ML0603 may have a significant functional role in the DNA repair pathways. The mean Ct values of each gene along with the normalized (delta Ct) values are presented in [Table 4], and the microarray fold changes for the same set of genes are presented in [Table 5].
Figure 5: Comparison of gene expression fold difference between qPCR and microarrays. Genes are indicated by name whereas hypothetical proteins are indicated by their M. leprae accession numbers.

Table 4: Summary of the qPCR results for selected DNA repair genes

Table 5: Summary of the gene expressions from microarrays


Discussion

This comparative analysis provides a basis for investigating the putative genes and pathways detected in the genome of M. leprae. The presence and absence of DNA repair genes are discussed, and predictions are made considering the particular aspects of M. leprae relative to other known DNA repair pathways. Sequence annotations of DNA repair genes in M. leprae, with insights from their orthologs in E. coli and M. tuberculosis, enabled identification of potential DNA repair pathways. DNA repair genes were stratified based on their function in the following mechanisms: base excision repair (BER), nucleotide excision repair (NER), MMR, recombination repair, NHEJ, translesion synthesis (TLS), direct reversal, nucleotide pool, regulatory, and other related processes.

Base excision repair

One of the primary mechanisms for the repair of alkylated bases is BER, which is initiated by one of the 3-methyladenine DNA glycosylases, tagA or alkA. A homolog of the tagA gene is present in M. leprae; it includes 10 stop codons, which split the corresponding locus into many reading frames, and has been annotated as a pseudogene (ML0190). A gene encoding “3-methyladenine DNA glycosylase” is also present in Mycobacteria and possesses conserved regions throughout the Mycobacterial species. In M. leprae, it has been annotated as a hypothetical protein, ML1351. Although no functional studies have been reported, the conservation of this gene across various species suggests an indispensable role. One of the most common and stable oxidation products in DNA is 8-oxo-7,8-dihydroguanine (8-oxo-G),[20] which has a propensity to mispair with adenine. Both modified bases act as substrates for the formamidopyrimidine-DNA glycosylase, known as fpg or mutM.[21] The fpg gene has been shown to be involved in the repair of DNA lesions induced by hydrogen peroxide in E. coli.[22] M. tuberculosis (H37Rv) has four genes of the fpg/nei family of DNA glycosylases: Rv2924c annotated as fpg (ML1658 in M. leprae), Rv3297 annotated as nei, Rv0944 (ML0148 in M. leprae) annotated as a possible fpg, and Rv2464c (ML1483 in M. leprae) annotated as a possible DNA glycosylase. Homologs of all four of these genes are found in the other Mycobacterial genomes; in M. leprae, the loci corresponding to Rv0944 and Rv2464c contain pseudogenes, whereas there is no equivalent of Rv3297. Endonuclease III (Nth) excises oxidized pyrimidines. A homolog of Nth is present in both mycobacterial genomes and is annotated as ML2301c in M. leprae.

Adenine can be incorporated rather than the cognate cytosine opposite 8-oxo-G during DNA replication, leading to G.C to T.A transversions. To counteract this, the adenine DNA glycosylase (mutY) excises the mismatched pair, which also includes nucleotides on the complementary strand. The mutY gene in M. leprae is ML1920, which has homologs in other Mycobacterial genomes, as noted in earlier studies.[23] Uracil can also be found in DNA because of either misincorporation or deamination of cytosine. The archetypal family-1 uracil DNA glycosylases (ung) are specific to uracil in DNA and excise it from both double-stranded (ds) and single-stranded (ss) substrates.[24] The homologs of udgB from E. coli and M. tuberculosis are present in M. leprae as ung and ML1105. The second step in BER is the cleavage of the sugar-phosphate backbone by an apurinic/apyrimidinic endonuclease. In E. coli, endonuclease IV (Nfo) and exonuclease III (XthA) produce a single-strand (ss) break at abasic sites by attacking the phosphodiester bond 5' to the site of base loss, leaving 3'OH groups. Homologs of Nfo have been identified in many Mycobacterial species; in M. leprae, it is annotated as a hypothetical protein (ML1889). Similarly, XthA is present in all Mycobacterial species except M. leprae, where a corresponding pseudogene (ML1931) is found.
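Several loci in this section are described as pseudogenes because stop codons interrupt the reading frame. A minimal sketch of how such in-frame stops could be located is shown below; it assumes a single forward reading frame starting at the first nucleotide and the standard stop codons, and it is not the procedure behind the Genbank annotations cited in the text.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stop_positions(cds: str) -> List[int]:
    """Return 0-based offsets of in-frame stop codons in frame 0.

    A stop codon that terminates the sequence is treated as the normal
    end of the reading frame and is not reported.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    stops = [i for i in range(0, usable, 3) if seq[i:i + 3] in STOP_CODONS]
    if stops and stops[-1] == len(seq) - 3:
        stops.pop()  # drop the terminal stop codon
    return stops
```

An intact coding sequence returns an empty list, whereas each premature stop adds one offset, which mirrors the stop codon counts quoted for the pseudogenes.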

Nucleotide excision repair

This system recognizes the distortion in the double helix caused by lesions and can therefore recognize a larger variety of base modifications. Removal of lesions in the form of intact oligonucleotides is facilitated by the sequential action of nucleases and helicases, followed by DNA polymerization and ligation by DNA ligase.[25] It includes the proteins uvrA and uvrB, the nuclease uvrC, the helicase uvrD, and the dsDNA translocase Mfd. Homologs of uvrA, uvrB, and uvrC are present in all the Mycobacterial genomes including M. leprae, suggesting that this pathway of DNA repair is important to Mycobacteria. In addition to the canonical uvr genes, another protein involved in the incision step of NER has been identified in E. coli; termed cho, it has sequence similarity with the N-terminal portion of uvrC and contains the domain for the 3' incision. The sequence of this protein is conserved throughout the Mycobacterial species except M. leprae, where the corresponding locus is a pseudogene (ML0884c). Transcription-coupled repair is a sub-pathway of NER that selectively removes lesions from the transcribed strands, mediated by the transcription-repair coupling factor (mfd). A homolog of mfd has been identified in M. leprae (ML0252); however, its actual function is yet to be deciphered. In M. leprae, there are two homologs of uvrD, annotated as uvrD1 and uvrD2. While their role has not been experimentally determined, their orthologs in M. tuberculosis interact with Ku, a component of the NHEJ pathway of DNA repair, which stimulates the helicase activity. Thus, uvrD1 may be involved in multiple DNA repair pathways in Mycobacteria. While most of the Mycobacterial genomes have homologs of superfamily II helicases known in eukaryotes, the M. leprae gene ML2157 encodes ERCC3, a 3'-5' helicase, and is reported as the first example of this gene in prokaryotes.[26]

Mismatch repair

The mutS/mutL complex recognizes DNA replication errors or misalignments and excises the section containing the mismatch.[27] M. leprae lacks a system for MMR, as neither mutS, mutL, and mutH nor their homologs could be identified. The exonucleases recJ and exoI (encoded by sbcB or xonA) are also absent in M. leprae. This indicates that Mycobacteria may possess alternative control over homologous recombination, possibly involving a recA-mediated strand transfer. E. coli and related enteric bacteria also possess a system known as very short patch repair, which targets mismatched T.G base pairs arising from deamination of 5-methylcytosine, especially within motifs recognized by DNA cytosine methyltransferase. Repair is initiated by the Vsr protein, which nicks the DNA immediately upstream of the mismatched pair, followed by synthesis of a short stretch (<10 nucleotides) of DNA by DNA polymerase I and ligation.[28] Both these genes are absent in M. leprae.

Homologous recombination

Recombination repair maintains genome integrity. In E. coli, two pathways, RecBCD and RecFOR, recruit RecA to single-stranded DNA and promote the repair of double-stranded breaks or of postreplication daughter-strand gaps, respectively.[29] RecA plays a central role in recombination repair and homologous recombination by promoting homologous pairing and DNA strand exchange using ATP, involving the formation of a nucleoprotein filament.[30] In some Mycobacteria, such as M. tuberculosis, recA is encoded by an elongated gene containing an intein, and the protein is made active by protein splicing;[31],[32],[33] similar observations were noted in M. leprae. The M. leprae recA intein binds to cognate DNA and displays endonuclease activity in the presence of alternative divalent cations such as Mg2+ or Mn2+.[34] In E. coli, several pathways exist for the initial processing of dsDNA breaks to single-stranded substrates for recombination, each featuring the action of exonucleases and helicases. M. leprae possesses neither of these systems, but it does possess homologs of an archaeal exonuclease (ML1155) and helicase (ML1312) belonging to the recB family of exonucleases/helicases,[34] in addition to ML2157 and exonucleases (sbcD [ML1119], xseAB), which can perform the break-processing function. RuvABC and RecG complete the process of recombination initiated by RecA. The RuvAB complex or the helicase RecG catalyzes branch migration of Holliday junctions formed by the crossing over of strands from two DNA duplexes, and RuvC resolves this structure to allow separation of the DNA helices.[35] Homologs of RuvA, RuvB, RuvC, and RecG are all present in M. leprae.

The functions of RecN and RecX have not been elucidated to a substantial level in Mycobacteria; hence, their role in repairing double-stranded breaks in M. leprae is unknown. M. leprae does not possess homologs of the RecE and RecT genes. Homologs of RadA are present in many Mycobacterial species, with the exception of M. ulcerans; in M. leprae, it is present as a pseudogene (ML0318c).

Non-homologous end-joining

NHEJ also operates in some prokaryotes, including Mycobacteria,[36] but only Ku and ligase proteins are required.[8] Ku homologs are present in all the Mycobacterial species, with the single exception of M. leprae, where Ku is present as a pseudogene (ML2092). Many Mycobacteria encode at least three different ATP-dependent ligases, known as LigB, LigC and LigD. M. leprae is an exception: LigB and LigD are annotated as pseudogenes (ML1747 and ML2090, respectively), and LigC is absent.
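
The two-component requirement described above can be written as a simple check. The following is a minimal sketch and not part of the original analysis: the annotation states are transcribed from this section, whereas the dictionary layout, the "functional" label, and the function name nhej_is_intact are illustrative assumptions.

```python
# Annotation status of NHEJ components in M. leprae, as stated in the text.
# The data layout and the function name are illustrative, not from the study.
NHEJ_M_LEPRAE = {
    "Ku":   {"status": "pseudogene", "locus": "ML2092"},
    "LigB": {"status": "pseudogene", "locus": "ML1747"},
    "LigC": {"status": "absent",     "locus": None},
    "LigD": {"status": "pseudogene", "locus": "ML2090"},
}


def nhej_is_intact(annotation):
    """Return True only if Ku and at least one ATP-dependent ligase are functional.

    Simplifying assumption: any one of LigB, LigC or LigD can serve as the ligase.
    """
    def functional(name):
        return annotation.get(name, {}).get("status") == "functional"

    has_ku = functional("Ku")
    has_ligase = any(functional(lig) for lig in ("LigB", "LigC", "LigD"))
    return has_ku and has_ligase


print(nhej_is_intact(NHEJ_M_LEPRAE))  # False
```

Under this simplified rule, the pathway is scored as incomplete for M. leprae because neither Ku nor any of the three ligases is annotated as a functional gene.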

Translesion synthesis

In M. leprae, genes related to TLS are present as pseudogenes. The genes coding for DinB, DinP and DnaE2 are annotated as pseudogenes (ML1197, ML1739, and ML0416), whereas the other genes, umuC, umuD, and polB, are absent.

SOS repair systems

The products of the genes umuC and umuD form the complex UmuC/UmuD2, known as DNA polymerase V,[37] which is responsible for induced mutagenesis through SOS repair in E. coli. However, these polymerases are absent in M. leprae. The SOS-inducible and error-prone DNA polymerase IV (dinB) is involved in TLS in E. coli,[38] and is thought to perform the same function in M. leprae. SOS-induced mutagenesis in M. leprae has been reported to be promoted by enzymes encoded by an operon that includes a second subunit of DnaE (the catalytic subunit of DNA polymerase III), called DnaE2.

The principal motivation for this study was to identify all the DNA repair genes present in the M. leprae genome, assess their expression from available microarray data and validate a representative set (especially the hypothetical proteins) using qPCR assays. Overall, 100% of the DNA repair genes were found to be transcribed, as noted in the microarrays. Different DNA repair pathways of M. leprae exhibited different levels of RNA expression. RNA expression was relatively higher for genes involved in homologous recombination, whereas the genes with a low level of expression were involved in the direct repair pathway. There were some differences in the levels of RNA expression detected by microarray and qPCR. The level of expression of hypothetical proteins involved in the direct repair pathway detected by microarray was higher than the level detected for the same genes by qPCR, when compared to 16S rRNA expression. This discrepancy might reflect the difference in target length between the two methods as well as the difference in the length of the transcribed RNA.
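
To make the comparison against 16S rRNA concrete, the sketch below computes expression relative to a reference gene for both data types. It is illustrative only: the numeric inputs are invented, the function names are hypothetical, and the qPCR formula assumes a fixed amplification efficiency per cycle (2.0 means perfect doubling), which may differ from the calculation used in this study.

```python
def microarray_fold(target_signal, ref_signal):
    """Signal intensity of a gene expressed as a fold of the reference signal."""
    if ref_signal <= 0:
        raise ValueError("reference signal must be positive")
    return target_signal / ref_signal


def qpcr_fold(ct_target, ct_ref, eff_target=2.0, eff_ref=2.0):
    """Relative quantity of target versus reference from threshold cycles (Ct).

    Assumes the starting template amount is proportional to efficiency ** (-Ct).
    """
    return (eff_ref ** ct_ref) / (eff_target ** ct_target)


# Hypothetical values, not taken from the study.
print(microarray_fold(1200.0, 1000.0))         # 1.2
print(qpcr_fold(ct_target=22.0, ct_ref=20.0))  # 0.25
```

In this toy case the same gene appears above the reference level by microarray but below it by qPCR, which is the kind of discrepancy described above.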

The presence of promoter-like sequences in the 5'UTR of transcribed M. leprae hypothetical genes with translational start codons was investigated by aligning the promoter-like regions with those of Mycobacterial homologs. These promoters aligned very well with those of other Mycobacterial homologs and showed a relationship in their -35 and -10 boxes, initiation site, RBS, and translational start codon. Although the results of this study indicate that some hypothetical proteins (supplementary data) have weak RBS sequences, some of the hypothetical genes, such as ML0190 and ML1683, have intact ribosome-binding sequences of similar strength to those of their orthologs in Mycobacteriaceae. In addition, phylogenetic analysis revealed that these hypothetical proteins from M. leprae are well conserved and might possess a functional role.
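
One way to screen for a ribosome-binding site of the kind discussed above is to score the region just upstream of the start codon against a Shine-Dalgarno-like consensus. The sketch below is an illustration under stated assumptions rather than the method used in this study: the consensus AGGAGG, the 20-nucleotide window, the function name, and the example sequence are all hypothetical choices.

```python
def best_rbs_match(upstream, consensus="AGGAGG", window=20):
    """Return (matches, offset) for the best ungapped match to the consensus.

    `upstream` is the sequence immediately 5' of the start codon; only its
    last `window` nucleotides are scanned. `offset` is the number of
    nucleotides between the end of the matched site and the start codon.
    """
    region = upstream.upper()[-window:]
    k = len(consensus)
    best = (0, None)
    for i in range(len(region) - k + 1):
        site = region[i:i + k]
        matches = sum(a == b for a, b in zip(site, consensus))
        if matches > best[0]:
            best = (matches, len(region) - (i + k))
    return best


# Hypothetical upstream sequence, not a real M. leprae locus.
example = "TTGACGCCGTATAATCGCAGGAGGTCACCG"
print(best_rbs_match(example))  # (6, 6)
```

A low match count in this scoring would correspond to what the text calls a weak RBS; any threshold separating weak from intact sites would need to be chosen and justified separately.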

Functional annotation of most of the above-mentioned gene products using experimental approaches is vital to elucidate the DNA repair mechanisms in M. leprae. Understanding and targeting the DNA repair processes in M. leprae could be an important strategy for the development of future therapeutics for leprosy, as these processes are essential for survival at different stages of infection. During leprosy infection, different sets of genes play a vital role in maintaining the stability of the Mycobacterial genome; therefore, an improved understanding of the role of DNA repair in the pathogenesis of Mycobacteria may open up possibilities for effective treatment of leprosy. Although the majority of the in silico work should be confirmed experimentally, this work provides a profile of the genes responsible for the maintenance of genome stability, contributing to the understanding of the mechanisms of genome protection and mutagenesis in M. leprae. It also provides a useful framework for further investigations into the functions of these genes, with confirmation of their presence in microarray and qPCR experiments.


The authors would like to thank the scientific staff and students of the Department of Biotechnology, Indian Institute of Technology Hyderabad, who contributed to the bioinformatics analysis. Our special thanks go to all the research staff of the branch of laboratories and the directorate of SIH-R&LC Karigiri for providing access to microarray data and the infrastructure to conduct all the scientific experiments.

Financial support and sponsorship


Conflicts of interest

There are no conflicts of interest.

References

1. Eisen JA, Hanawalt PC. A phylogenomic study of DNA repair genes, proteins, and processes. Mutat Res 1999;435:171-213.
2. Chayot R, Montagne B, Mazel D, Ricchetti M. An end-joining repair mechanism in Escherichia coli. Proc Natl Acad Sci U S A 2010;107:2141-6.
3. Shuman S, Glickman MS. Bacterial DNA repair by non-homologous end joining. Nat Rev Microbiol 2007;5:852-61.
4. Gong C, Martins A, Bongiorno P, Glickman M, Shuman S. Biochemical and genetic analysis of the four DNA ligases of mycobacteria. J Biol Chem 2004;279:20594-606.
5. Gong C, Bongiorno P, Martins A, Stephanou NC, Zhu H, Shuman S, et al. Mechanism of nonhomologous end-joining in mycobacteria: A low-fidelity repair system driven by Ku, ligase D and ligase C. Nat Struct Mol Biol 2005;12:304-12.
6. Aravind L, Koonin EV. Prokaryotic homologs of the eukaryotic DNA-end-binding protein Ku, novel domains in the Ku protein and prediction of a prokaryotic double-strand break repair system. Genome Res 2001;11:1365-74.
7. Wright DG, Castore R, Shi R, Mallick A, Ennis DG, Harrison L, et al. Mycobacterium tuberculosis and Mycobacterium marinum non-homologous end-joining proteins can function together to join DNA ends in Escherichia coli. Mutagenesis 2017;32:245-56.
8. Della M, Palmbos PL, Tseng HM, Tonkin LM, Daley JM, Topper LM, et al. Mycobacterial Ku and ligase proteins constitute a two-component NHEJ repair machine. Science 2004;306:683-5.
9. McMurray DN. Mycobacteria and Nocardia. In: Baron S, editor. Medical Microbiology. 4th edition. Galveston (TX): University of Texas Medical Branch at Galveston; 1996. Chapter 33. Available from: https://www.ncbi.nlm.nih.gov/books/NBK7812/.
10. Shepard CC. The first decade in experimental leprosy. Bull World Health Organ 1971;44:821-7.
11. Vissa VD, Brennan PJ. The genome of Mycobacterium leprae: A minimal mycobacterial gene set. Genome Biol 2001;2:reviews1023.1-reviews1023.8.
12. Smith TF, Waterman MS. Identification of common molecular subsequences. J Mol Biol 1981;147:195-7.
13. Edgar RC. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 2004;32:1792-7.
14. Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O, et al. New algorithms and methods to estimate maximum-likelihood phylogenies: Assessing the performance of PhyML 3.0. Syst Biol 2010;59:307-21.
15. Dereeper A, Guignon V, Blanc G, Audic S, Buffet S, Chevenet F, et al. Phylogeny.fr: Robust phylogenetic analysis for the non-specialist. Nucleic Acids Res 2008;36:W465-9.
16. Pfaffl MW. A new mathematical model for relative quantification in real-time RT-PCR. Nucleic Acids Res 2001;29:e45.
17. Cox RA, Kempsell K, Fairclough L, Colston MJ. The 16S ribosomal RNA of Mycobacterium leprae contains a unique sequence which can be used for identification by the polymerase chain reaction. J Med Microbiol 1991;35:284-90.
18. Phetsuksiri B, Rudeeaneksin J, Supapkul P, Wachapong S, Mahotarn K, Brennan PJ, et al. A simplified reverse transcriptase PCR for rapid detection of Mycobacterium leprae in skin specimens. FEMS Immunol Med Microbiol 2006;48:319-28.
19. Williams DL, Slayden RA, Amin A, Martinez AN, Pittman TL, Mira A, et al. Implications of high level pseudogene transcription in Mycobacterium leprae. BMC Genomics 2009;10:397.
20. Demple B, Harrison L. Repair of oxidative damage to DNA: Enzymology and biology. Annu Rev Biochem 1994;63:915-48.
21. Gros L, Saparbaev MK, Laval J. Enzymology of the repair of free radicals-induced DNA damage. Oncogene 2002;21:8905-25.
22. Asad NR, de Almeida CE, Asad LM, Felzenszwalb I, Leitão AC. Fpg and uvrA proteins participate in the repair of DNA lesions induced by hydrogen peroxide in low iron level in Escherichia coli. Biochimie 1995;77:262-4.
23. Kurthkoti K, Varshney U. Base excision and nucleotide excision repair pathways in mycobacteria. Tuberculosis (Edinb) 2011;91:533-43.
24. Pearl LH. Structure and function in the uracil-DNA glycosylase superfamily. Mutat Res 2000;460:165-81.
25. Truglio JJ, Croteau DL, Van Houten B, Kisker C. Prokaryotic nucleotide excision repair: The UvrABC system. Chem Rev 2006;106:233-52.
26. Poterszman A, Lamour V, Egly JM, Moras D, Thierry JC, Poch O, et al. A eukaryotic XPB/ERCC3-like helicase in Mycobacterium leprae? Trends Biochem Sci 1997;22:418-9.
27. Balasingham SV, Zegeye ED, Homberset H, Rossi ML, Laerdahl JK, Bohr VA, et al. Enzymatic activities and DNA substrate specificity of Mycobacterium tuberculosis DNA helicase XPB. PLoS One 2012;7:e36960.
28. Bhagwat AS, Lieb M. Cooperation and competition in mismatch repair: Very short-patch repair and methyl-directed mismatch repair in Escherichia coli. Mol Microbiol 2002;44:1421-8.
29. Morimatsu K, Kowalczykowski SC. RecFOR proteins load RecA protein onto gapped DNA to accelerate DNA strand exchange: A universal step of recombinational repair. Mol Cell 2003;11:1337-47.
30. Kowalczykowski SC, Dixon DA, Eggleston AK, Lauder SD, Rehrauer WM. Biochemistry of homologous recombination in Escherichia coli. Microbiol Rev 1994;58:401-65.
31. Saves I, Lanéelle MA, Daffé M, Masson JM. Inteins invading mycobacterial RecA proteins. FEBS Lett 2000;480:221-5.
32. Davis EO, Thangaraj HS, Brooks PC, Colston MJ. Evidence of selection for protein introns in the recAs of pathogenic mycobacteria. EMBO J 1994;13:699-703.
33. Davis EO, Jenner PJ, Brooks PC, Colston MJ, Sedgwick SG. Protein splicing in the maturation of M. tuberculosis RecA protein: A mechanism for tolerating a novel class of intervening sequence. Cell 1992;71:201-10.
34. Singh P, Tripathi P, Silva GH, Pingoud A, Muniyappa K. Characterization of Mycobacterium leprae RecA intein, a LAGLIDADG homing endonuclease, reveals a unique mode of DNA binding, helical distortion, and cleavage compared with a canonical LAGLIDADG homing endonuclease. J Biol Chem 2009;284:25912-28.
35. McGlynn P, Lloyd RG. Recombinational repair and restart of damaged replication forks. Nat Rev Mol Cell Biol 2002;3:859-70.
36. De Mot R, Schoofs G, Vanderleyden J. A putative regulatory gene downstream of RecA is conserved in gram-negative and gram-positive bacteria. Nucleic Acids Res 1994;22:1313-4.
37. Fuchs RP, Fujii S, Wagner J. Properties and functions of Escherichia coli: Pol IV and Pol V. Adv Protein Chem 2004;69:229-64.
38. Strauss BS, Roberts R, Francis L, Pouryazdanparast P. Role of the dinB gene product in spontaneous mutation in Escherichia coli with an impaired replicative polymerase. J Bacteriol 2000;182:6742-50.


  [Figure 1], [Figure 2], [Figure 3], [Figure 4], [Figure 5]

  [Table 1], [Table 2], [Table 3], [Table 4], [Table 5]


