With a comprehensive view of a pathogen’s genome, researchers are gaining new insights for managing infectious disease outbreaks
By Jonas Korlach, PhD, and Christian Olsen
Rapid identification of pathogens and their transmission paths is critical to preventing or shortening infectious disease outbreaks. It is also vital for choosing the right treatment for each patient, especially with the increasing threat of antibiotic resistance. Unfortunately, microbial identification is often performed using technologies that only produce short reads of DNA sequence, which can lead to inaccurate results. These technologies have been shown to misidentify microbial strains and drug-resistance mechanisms, putting physicians at risk of giving inaccurate prognoses and choosing ineffective treatments.
Recently, leading infectious disease scientists have turned to single-molecule, real-time (SMRT) sequencing to identify and characterize microorganisms. This approach uses the highly accurate and extraordinarily long reads produced by SMRT sequencing—in the tens of kilobases (kb)—to generate complete genome assemblies, fully representing plasmids and other accessory elements where drug-resistance mechanisms are found. With such comprehensive information, microbial identification is definitive; the data can also properly determine infection transmission paths and guide the selection of drug therapies.
Unlike other microbial sequencing methods, SMRT sequencing also generates genomewide methylation data. New studies have shown that methylation is used more frequently and with greater complexity in prokaryotes than previously understood. This information can play an important role in determining the virulence of pathogens.
With SMRT sequencing it is now possible to produce ‘reference-grade’ de novo sequence assemblies, with closed genomes generated in a fully automated fashion for most microbes. This capability has contributed to robust studies of infectious disease outbreaks and to evaluations of antibiotic resistance and hospital surveillance programs, among other applications.
Traditionally, bacterial identification has been performed using the 16S rRNA gene as a kind of unique fingerprint. However, closely related species can be difficult to distinguish with this method and, since only bacteria carry the gene, this technique is not applicable for identifying all pathogens.
Nanopore-based technology is an alternative approach, but it struggles to provide high consensus accuracy because of its systematic sequencing bias and high error rates. This limitation necessitates using short reads to polish nanopore-generated sequences.
More recently, scientists have used short-read sequencing for microbial analysis. But short reads, which typically max out at about 300 base pairs (bp), are not long enough to span repetitive regions in these organisms. These short fragments of repetitive DNA are often misassembled during alignment, leading to fragmented assemblies that are particularly prone to errors near insertion sites and in low-complexity regions of sequence. Also, short-read sequencers rely on amplification by polymerase chain reaction (PCR), which is notorious for introducing bias into sequencing data. For microbes with GC-rich genomes, a group that includes many clinically important species, the coverage bias introduced by sequencing technologies that rely on amplification has been shown to produce assemblies with significant gaps.
SMRT sequencing overcomes many of these limitations. By not requiring PCR amplification, this technology eliminates a major source of error in microbial assemblies, and accurately resolves both GC- and AT-rich regions. The long read lengths obtained with SMRT sequencing can capture repeat regions with lengths in the tens of kilobases in individual reads, allowing them to be represented directly without relying on alignment algorithms.
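As a rough illustration of why read length matters for repeats, the sketch below checks which reads are long enough to cover an entire repeat plus some unique flanking sequence on each side. The function names, the 500 bp anchor requirement, and the example read lengths are assumptions chosen for illustration; they are not thresholds or data from any of the studies cited here.

```python
def can_span_repeat(read_length, repeat_length, min_anchor=500):
    """Return True if a read could cover the whole repeat plus
    min_anchor bases of unique sequence on each side.

    min_anchor is an illustrative assumption, not a published threshold.
    """
    return read_length >= repeat_length + 2 * min_anchor


def fraction_spanning(read_lengths, repeat_length, min_anchor=500):
    """Fraction of reads long enough to span the repeat."""
    if not read_lengths:
        return 0.0
    spanning = sum(
        can_span_repeat(length, repeat_length, min_anchor)
        for length in read_lengths
    )
    return spanning / len(read_lengths)


# Hypothetical read sets and a hypothetical 5 kb repeat
short_reads = [150, 250, 300]
long_reads = [8_000, 12_000, 20_000, 4_000]

print(fraction_spanning(short_reads, 5_000))  # 0.0
print(fraction_spanning(long_reads, 5_000))   # 0.75
```

In this toy example no short read can bridge the repeat, whereas three of the four long reads can, which is the situation in which a repeat can be placed directly rather than inferred.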
Importantly, errors in SMRT sequencing are randomly distributed rather than systematic, so they rarely recur at the same position across reads and are readily outvoted when a consensus is built from multiple reads. Finally, plasmids, which are often indistinguishable from a microbe’s core genome in short-read data, are thoroughly characterized and represented as accessory elements by long-read sequencing.
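The benefit of a random error mode can be illustrated with a deliberately simplified model: if each read is wrong at a given position independently with the same probability, the chance that a majority of reads are wrong shrinks quickly as coverage grows. The binomial model, the function name, and the 10% per-read error rate below are assumptions for illustration only; production consensus callers are considerably more sophisticated.

```python
from math import comb


def majority_error_probability(per_read_error, coverage):
    """Probability that more than half of `coverage` independent reads
    are wrong at one position (simplified binomial model).

    Assumes errors are independent between reads and that any wrong
    majority yields a wrong consensus call.
    """
    threshold = coverage // 2 + 1  # strict majority of reads in error
    return sum(
        comb(coverage, k)
        * per_read_error**k
        * (1 - per_read_error) ** (coverage - k)
        for k in range(threshold, coverage + 1)
    )


# Hypothetical 10% per-read error rate at increasing (odd) coverage
for coverage in (1, 5, 15, 31):
    print(coverage, majority_error_probability(0.10, coverage))
```

By contrast, a systematic error that recurs at the same position in every read would not be removed by adding coverage, which is why the distinction between random and systematic error matters for consensus accuracy.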
Scientists at the National Biodefense Analysis and Countermeasures Center published an analysis of SMRT sequencing for microbial genomes, deeming it an important tool for generating high-quality assemblies with minimal human intervention. After assessing the repeat complexity of more than 2,200 microbes, the scientists suggested that SMRT sequencing paired with assembly pipelines “could automatically close >70% of the complete bacteria and archaea in GenBank.”1
SMRT sequencing has quickly established itself as the new gold standard for microbial sequencing and meets the rigorous standards for clinical sequencing applications.
The Whole Picture
Recent studies underscore the importance of fully resolving the genome, epigenome, and accessory elements for a clearer understanding of a microbe’s function, virulence, and ability to resist therapeutics.
In a recent publication, scientists from San Diego State University reported that an assembly of Mycobacterium tuberculosis generated with SMRT sequencing revealed a number of errors in older assemblies of the same organism.2 The team found that the number of variants previously associated with virulence in the microbe, identified by comparing Sanger-based assemblies of a virulent and a nonvirulent strain of M. tuberculosis, was significantly overstated (see Figure 1, “A Visualization of the Reduced Set of H37Ra-specific Variants”). “Our assembly reveals that the number of H37Ra-specific variants is less than half of what the Sanger-based H37Ra reference sequence indicates, undermining and, in some cases, invalidating the conclusions of several studies,” the scientists reported. The overestimate of genetic differences among the strains was attributed to sequencing and assembly errors, mostly associated with GC bias or repetitive DNA. Because SMRT sequencing resolved the genome into a single and complete contig, it became obvious that older assemblies were much less accurate than expected.
A separate study from scientists at Griffith University, Ohio State University, and other institutions highlights the importance of methylation for a complete picture of a microbe’s function. The effort focused on nontypeable Haemophilus influenzae, a common cause of middle ear infections in children.3 By analyzing genomewide methylation patterns produced with SMRT sequencing, the team discovered an epigenetic ‘switch’ responsible for phase-variable expression of a DNA methyltransferase. The protein regulated by this ‘switch’ is included in vaccine candidates, so understanding the mechanism offered immediate clinical utility. Additional investigation revealed that the epigenetic ‘switch’ has key implications for biofilm formation, ability to evade a host immune system, and antibiotic resistance.
In an earlier example, SMRT sequencing was used during the dangerous 2011 outbreak of Escherichia coli in Germany, which caused hemolytic-uremic syndrome in many patients. While some previous sequencing data suggested a different strain classification, SMRT sequencing data correctly distinguished it as enteroaggregative E. coli producing Shiga toxin, most likely through horizontal genetic exchange from enterohemorrhagic E. coli.4 The study also determined that activity of the Shiga toxin-producing gene was increased by specific antibiotics, providing instrumental information to help physicians select an appropriate treatment.
Scientists have now demonstrated that it is possible to pool several microbes for multiplexed SMRT sequencing, which increases the throughput and lowers the cost of assembling each individual strain. By employing this strategy, scientists have the choice to focus on multiple samples at once, or multiple targets within a sample, or a combination of both.
Getting a more comprehensive view of a microbe’s genome has proven important for more than just biological research. In numerous studies, experts have deployed SMRT sequencing for a deeper understanding of the transmission paths of hospital-associated infections as well as the mechanisms of antibiotic resistance.
Tracking hospital-associated infections is a major challenge for clinical teams today. Methicillin-resistant Staphylococcus aureus (MRSA) can persist in improperly sanitized hospital rooms, ready to strike the next patient; but it can also enter through the emergency room door with a patient who was infected in the community. Distinguishing between these origins can help hospital teams trace the source of an infection and, in the case of a MRSA nosocomial infection, take appropriate safeguards to stop its spread.
This approach was demonstrated nicely in an analysis of carbapenem-resistant Enterobacteriaceae, including strains of Klebsiella pneumoniae, collected during a 2011 outbreak at the clinical center of the National Institutes of Health (NIH).5 NIH scientists found that short-read sequencing and strain-typing technologies were unable to provide enough accurate information for a detailed analysis of these samples. They applied SMRT sequencing to generate complete assemblies of 20 Enterobacteriaceae isolates, including the outbreak samples plus several samples collected from routine postoutbreak patient and environmental surveillance.
In the paper reporting this work, the scientists presented data from complete genome assemblies of all 20 isolates, validated through orthogonal technologies as being better than 99.9999% accurate. The information made it possible to reconstruct the transmission path of the Klebsiella outbreak, leading to a surprising conclusion: “only 1 of the 10 cases represented nosocomial spread,” the scientists reported. Indeed, their study underscored how many patients were coming into the hospital already colonized with infections traditionally associated with hospitals. These results argue for better patient screening and surveillance protocols.
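To put the 99.9999% figure in more familiar terms, the short calculation below converts an accuracy into a Phred-scaled quality value, Q = -10 log10(error rate), and into an expected number of erroneous bases. The 5 Mb genome length is a hypothetical round number chosen for illustration, not a value reported in the NIH study, and the function names are likewise illustrative.

```python
from math import log10


def phred_quality(accuracy):
    """Convert a per-base accuracy (between 0 and 1) to a Phred-scaled quality value."""
    error_rate = 1.0 - accuracy
    return -10 * log10(error_rate)


def expected_errors(accuracy, genome_length):
    """Expected number of erroneous bases in an assembly of genome_length bases."""
    return (1.0 - accuracy) * genome_length


accuracy = 0.999999          # 99.9999%, the accuracy quoted above
genome_length = 5_000_000    # hypothetical 5 Mb genome (assumption)

print(round(phred_quality(accuracy)))                    # 60
print(round(expected_errors(accuracy, genome_length)))   # 5
```

Because the reported accuracy was "better than" 99.9999%, these values are bounds: at least Q60, and no more than about five erroneous bases in a genome of the assumed size.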
Among the compelling details of the NIH study was the importance of plasmids. These highly repetitive elements harbor the carbapenem-resistance mechanisms for Enterobacteriaceae but have been overlooked or considered intractable in many previous attempts to characterize these microbes. In a perspective piece published contemporaneously with the NIH study, scientists wrote that “plasmids may be viewed as the ‘dark matter’ of short-read bacterial genome assemblies, with many large-scale genomic studies conspicuously avoiding the complexities of plasmid structure.”6 Noting that Sanger sequencing is also suboptimal for resolving plasmids, the scientists added, “long-read genome assembly offers clear advantages for the resolution of complete plasmid sequences that can discriminate plasmid diversity, antimicrobial-resistance gene context, and multiplicity.” Indeed, NIH scientists fully assembled all plasmids in the 20 isolates, the longest of which was 379 kb. This effort led to the identification of a novel plasmid encoding carbapenemase, with likely consequences for drug resistance.
A follow-up study based on the NIH data focused on mobile elements to elucidate how drug-resistance plasmids were transmitted from one species to another during a hospital outbreak (see Figure 2, “Mechanisms of Evolution in High-Consequence Drug-Resistance Plasmids”). In their report on the “mobilome,” or full complement of mobile elements, the scientists used SMRT sequencing data and identified two mobile elements that were likely responsible for the evolution of drug resistance.7 “We are able to propose the exact historical molecular events underlying plasmid rearrangements which provide a basis for understanding how antibiotic-resistant strains change over time, with significant implications for combating plasmid-mediated antimicrobial resistance,” they reported.
Understanding how organisms acquire drug resistance mechanisms is growing increasingly important as clinicians attempt to treat patients with superbugs resistant to most or all classes of drugs. An eye-opening study from scientists at Houston Methodist Research Institute and collaborating institutions found that virulent strains of K. pneumoniae are much more widespread than expected, and that antibiotic resistance in this organism is becoming more common.8
The effort involved sequencing 1,777 isolates collected during a 4-year period at the Houston hospital, with in-depth analysis of five noteworthy strains using long-read sequencing (see Figure 3, “Spatial Relationships of ESBL-Producing Klebsiella pneumoniae Strains”). The dominant strain, representing more than 35% of samples, was found to be as virulent as strains associated with K. pneumoniae pandemics, with inherent drug resistance. The strain had been abundant in the region for years, but had not been appreciated as a public health threat.
Investigations of antibiotic resistance have also uncovered nightmare-scenario cases: infections that are resistant to all known treatments. In a 2015 paper, researchers described the analysis of a K. pneumoniae isolate collected from a hospital patient in the United Arab Emirates.9 The strain “was found to be non-susceptible to all antibiotics tested, which includes cephalosporins, penicillins, carbapenems, aztreonam, aminoglycosides, ciprofloxacin, colistin, tetracyclines, tigecycline, chloramphenicol, trimethoprim-sulfamethoxazole, and fosfomycin,” the scientists reported. Long-read sequencing generated a complete assembly of the 5.5 Mb microbe, as well as five circular plasmids and a linear plasmid prophage.
The scientists reviewed these data to determine the genetic basis for the microbe’s pan-resistance, finding several acquired antibiotic resistance genes, mobile elements associated with resistance to the last-resort treatment colistin, and even a new variant of a gene that likely provides increased resistance to carbapenem. This was the first discovery of a carbapenem-resistant mobile element conferring resistance to colistin. Their results “reveal the critical role of mobile resistance elements in accelerating the emergence of resistance to other last resort antibiotics,” the scientists reported.
The studies cited here were typically performed months or even years after a case or outbreak was detected, but their results suggest that the insights provided by long-read sequencing could be instrumental in preventing or shortening future outbreaks.
Many hospitals and clinical labs already perform routine microbial surveillance, both of their patients and of the treatment facility itself. Adding SMRT sequencing to these efforts would offer more comprehensive views of drug resistance, worrisome new strains, and transmission paths, at a time when these data could inform treatment, quarantine protocols, and other decisions that are essential for optimal patient care.
Recent advances in long-read sequencing technology, such as microbial multiplexing, have made it higher-throughput and more cost-effective for use in a clinical setting. As sequencing throughput continues to increase, costs to interrogate each isolate will continue to fall (Figure 4). According to the NIH scientists who conducted the Enterobacteriaceae study, “The cost of whole-genome sequencing is dwarfed by [other] costs associated with outbreaks and their investigations, including the human and financial toll and the loss of patient confidence in the healthcare facility.”
Ultimately, SMRT sequencing could enable hospitals to spot infectious disease threats early enough to protect other patients, and to identify antibiotic resistance mechanisms quickly enough to guide selection of therapies for the best outcome. This would be a major step forward in global efforts to mitigate the rapid spread of dangerous, antimicrobial-resistant pathogens.
Jonas Korlach, PhD, is chief scientific officer, and Christian Olsen is a senior scientist, at PacBio. For further information, contact CLP chief editor Steve Halasey.
1. Koren S, Harhay G, Smith T, et al. Reducing assembly complexity of microbial genomes with single-molecule sequencing. Genome Biol. 2013;14(9):R101; doi: 10.1186/gb-2013-14-9-r101.
2. Elghraoui A, Modlin S, Valafar F. SMRT genome assembly corrects reference errors, resolving the genetic basis of virulence in Mycobacterium tuberculosis. BMC Genomics. 2017;18(1):302; doi: 10.1186/s12864-017-3687-5.
3. Atack JM, Srikhanta YN, Fox KL, et al. A biphasic epigenetic switch controls immunoevasion, virulence, and niche adaptation in non-typeable Haemophilus influenzae. Nat Commun. 2015;6:7828; doi: 10.1038/ncomms8828.
4. Rasko DA, Webster DR, Sahl JW, et al. Origins of the E. coli strain causing an outbreak of hemolytic–uremic syndrome in Germany. N Engl J Med. 2011;365(8):709–717; doi: 10.1056/nejmoa1106920.
5. Conlan S, Thomas PJ, Deming C, et al. Single-molecule sequencing to track plasmid diversity of hospital-associated carbapenemase-producing Enterobacteriaceae. Sci Transl Med. 2014;6(254):254ra126; doi: 10.1126/scitranslmed.3009845.
6. Beatson SA, Walker MJ. Tracking antibiotic resistance. Science. 2014;345(6203):1454–1455; doi: 10.1126/science.1260471.
7. He S, Chandler M, Varani AM, Hickman AB, Dekker JP, Dyda F. Mechanisms of evolution in high-consequence drug resistance plasmids. mBio. 2016;7(6):e01987-16; doi: 10.1128/mbio.01987-16.
8. Long SW, Olsen RJ, Eagar TN, et al. Population genomic analysis of 1,777 extended-spectrum beta-lactamase-producing Klebsiella pneumoniae isolates, Houston, Texas: unexpected abundance of clonal group 307. mBio. 2017;8(3):e00489-17; doi: 10.1128/mbio.00489-17.
9. Zowawi HM, Forde BM, Alfaresi M, et al. Stepwise evolution of pandrug-resistance in Klebsiella pneumoniae. Sci Rep. 2015;5:15082; doi: 10.1038/srep15082.