Big data laboratorians are patient care’s wave of the future 

By Daniel Meyer 

We have entered a new age of information in medicine, one in which the scale and complexity of data, as well as the pace at which data are being generated, limit the ability of any individual to manually interpret them and draw relevant conclusions about the optimal course of therapy for a particular patient.

Together with this challenge comes a great opportunity to improve patient care. Electronic medical records (EMRs), advanced imaging, and next-generation DNA sequencing (NGS) are increasing the information we have available on each patient, and research is identifying factors that can be used to help direct therapy. By applying advanced data analytics technologies to the deluge of biomedical data in healthcare, we have the opportunity to dramatically improve patient care. Pathologists will play a central role in how these technologies improve the lives of patients.

In recent years, the industry has started to apply new terms to the practice of accurate diagnosis and management of patients. These include “precision medicine,” “personalized medicine,” “systems medicine,” and “genomic medicine.” Central to each of these is the concept of determining the right diagnosis and treatment plan for each individual patient based on all available information. “All available information” includes a variety of demographic, phenotypic, genomic, treatment, and outcomes data. The key challenge is how best to harness advances in computational resources to make effective use of the growing body of complex data to improve patient care.


Healthcare is awash in data from myriad sources. One reason for the dramatic increase in data is the emergence of new diagnostic platforms. Perhaps the most notable example is the sequencing of a human genome, where the cost has fallen more than five orders of magnitude since the draft sequence was released 15 years ago, and the time required to generate the data has dropped from months to days. This dramatic change in sequencing has made the generation of diagnostic sequence data feasible, and consequently, it is rapidly becoming part of the overall approach to care—particularly for diseases refractory to standard of care.

Not all of the data flows from new platforms. Increased access to information from clinical records provides a trove of demographic, treatment, and outcomes information. Technologies such as improved natural language processing (NLP) tools provide access to information that is otherwise trapped in narratives and other unstructured forms. The establishment of information exchanges has led to increased flows of data in standard formats.

An increased desire by leadership at provider sites to benchmark clinical and operational metrics provides a similar push for normalized information. Many of these trends have been driven by government incentives such as those intended to increase the meaningful use of EMR systems.


Recent years have seen an expansion of resources available for data analytics across industries, including new, cost-effective tools for data transmission, storage, security, interoperability, computation, analysis, and visualization. Pathologists and clinical laboratories are uniquely positioned to apply advances in data analytics to improve patient care, particularly in growth areas such as molecular profiling. Each of these advances is part of a larger, more general approach that has spawned the field of “Big Data,” and one that precision medicine is particularly well positioned to exploit. The field tends to be characterized by five “V’s”: volume, velocity, variety, veracity, and value.

Precision medicine in pathology requires a variety of data analysis capabilities.  An example workflow for molecular pathology in oncology includes (a) sourcing tissue and clinical information; (b) performing clinical-grade analysis across multiple assay types; (c) interpreting results based on public, licensed, and proprietary reference content sources; and (d) reporting to the patient and referring clinician.

Volume refers to how much data is being analyzed, interpreted, and visualized. There are more than 300,000 medical laboratory professionals in the United States. Every year, US laboratories perform and interpret more than 10 billion tests.1 The average size of data generated from those tests is increasing, notably through the increasing use of molecular data. Today, for example, relapsed and refractory oncology patients are routinely profiled with NGS panels that include tens or hundreds of genes. In the coming year, clinical labs are beginning to make whole-exome sequencing routine for clinical use, with multiple initiatives under way to sequence the entire genomes of 100,000 or more individuals.

Velocity is the speed at which data is generated. In the US alone, each day there are more than three million patient visits to physician offices, hospital outpatient facilities, and emergency departments.2 Laboratories and pathologists perform and interpret an average of 27 million tests per day. These visits and tests result in an average of more than 10 million prescriptions filled every day—representing but one of many potential treatments.3 These and other events result in a high velocity of data in healthcare, a majority of which is somehow associated with data from a lab.
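The volume and velocity figures can be cross-checked with simple arithmetic. The short sketch below converts the annual figure of 10 billion tests into average daily and per-second rates; the function names are illustrative, and treating "more than 10 billion" as exactly 10 billion spread evenly over 365 days is a simplifying assumption.

```python
# Annual figure quoted in the article; treated as exactly 10 billion (assumption).
TESTS_PER_YEAR = 10_000_000_000
DAYS_PER_YEAR = 365
SECONDS_PER_DAY = 24 * 60 * 60


def per_day(annual_count: float, days: int = DAYS_PER_YEAR) -> float:
    """Convert an annual count into an average daily count."""
    return annual_count / days


def per_second(daily_count: float) -> float:
    """Convert a daily count into an average per-second rate."""
    return daily_count / SECONDS_PER_DAY


tests_per_day = per_day(TESTS_PER_YEAR)
print(f"{tests_per_day / 1e6:.1f} million tests per day")   # 27.4 million tests per day
print(f"{per_second(tests_per_day):.0f} tests per second")  # 317 tests per second
```

The daily result agrees with the 27 million tests per day cited above, which suggests the two figures come from the same underlying estimate.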

Variety refers to the heterogeneity of data, in terms of type and source. Pathologists have access to and an understanding of a wide range of results from a variety of testing modalities. They are also provided clinical information from ordering physicians and are in a position to integrate data with the clinical record from client EMRs. As a result, pathologists and their clients collectively have access to a large variety of information, including demographic, EMR-based, pathology (clinical, anatomic, and molecular), and reimbursement data—as well as reference content from professional associations, government entities, and the like.

Veracity refers to the quality and cleanliness of data and provides a measure of how readily understandable and usable data are. In order to perform population analytics on data from the clinical record without direct access to information technology systems integrated with pathology, data scientists often have to process results from pathology reports. Whereas existing tools such as natural language processing technologies can render this activity straightforward in certain cases, in others the processing is more complicated. And in certain cases, there is information that simply does not make it onto the pathology report due to a lack of clinical relevance. Instead of processing lab data after it reaches the EMR, we can make tools available to the pathologists and medical directors in the labs themselves.
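To illustrate why recovering structured results from report text is fragile, here is a minimal sketch that uses plain pattern matching as a stand-in for a real NLP tool. The pattern, the marker list, and the sample sentences are all assumptions made for illustration; they are not drawn from any actual report format or product.

```python
import re

# Illustrative pattern (an assumption): a marker name followed, within the
# same clause, by the word "positive" or "negative".
RESULT_PATTERN = re.compile(
    r"\b(?P<marker>ER|PR|HER2)\b[^.;]*?\b(?P<status>positive|negative)\b",
    re.IGNORECASE,
)


def extract_marker_status(report_text: str) -> dict:
    """Return {marker: status} for each marker mention found in the text."""
    findings = {}
    for match in RESULT_PATTERN.finditer(report_text):
        findings[match.group("marker").upper()] = match.group("status").lower()
    return findings


# A cleanly worded narrative is handled correctly.
clean = "ER is positive in 90% of tumor cells; PR negative. HER2 is negative by IHC."
print(extract_marker_status(clean))
# {'ER': 'positive', 'PR': 'negative', 'HER2': 'negative'}

# A hedged sentence is misread, because the pattern ignores qualifiers.
hedged = "ER staining is equivocal and cannot be called positive."
print(extract_marker_status(hedged))
# {'ER': 'positive'}
```

The second example shows the veracity problem in miniature: the structured value recovered downstream does not match what the pathologist wrote. Capturing the result as structured data in the lab avoids this kind of reinterpretation.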

Most importantly, value refers to how data impacts the particular use case. In healthcare and life sciences, value is measured in terms of impact on patient care. Certain measures are economic in nature and evaluate how much care we can provide for a given dollar spent, or how we can improve the bottom line of a lab to ensure financial viability. Some measures more directly evaluate patient care, such as achieving the right diagnosis and treatment for an individual patient earlier in the care process. Another set of measures evaluates our ability to support clinical development of new diagnostics and therapies that will advance medicine for years to come.


There are great opportunities to improve outcomes and patient care through advanced use of data analytics. These include opportunities to improve the everyday care of individual patients, both now and in the future, and to make the medical system more efficient so that care can be delivered across large populations.

As the name of an emerging field of healthcare research, “big data” typically suggests a range of large-scale activities related to the development of new drugs or population-based healthcare policies. But it can also suggest activities to develop more personalized therapeutic approaches based on the analysis of large and varied bodies of healthcare data. Pathologists and clinical laboratories are uniquely positioned to apply such advances in data analytics to improve patient care, particularly in growth areas such as molecular profiling. The data under review tend to be characterized by new metrics that describe both their scale and their utility.

Five V’s in Big Data

Volume. How much data is being analyzed, interpreted, and visualized.

Velocity. Speed at which the data is being generated.

Variety. Heterogeneity of the data, in terms of both type and source.

Veracity. Quality and cleanliness of the data, indicating how readily understandable and usable it is.

Value. Impact on patient care (direct), or on the viability of institutions that contribute to patient care (indirect).

Taking full advantage of these opportunities will require that the healthcare industry broadly adopt advanced informatics approaches. The information technology industry has provided valuable resources for dealing with the challenges of big data in medicine. Our challenge is in harnessing these tools and tailoring them for biomedical data and use in clinical practice.

New database technology allows for the integration of diverse data types that are sourced from a variety of information platforms. Programming interfaces allow for the interrogation of reference content so that pathologists and other stakeholders can interpret larger result sets efficiently. Natural language processing tools can extract unstructured data and provide meaning based on context. Visualization tools allow for more user-friendly interpretation of complex analyses.

By becoming educated consumers of these and other informatics technologies, pathologists have the ability to impact care in concrete ways today. Informatics platforms can integrate results from different clinical, anatomic, and molecular diagnostic platforms. Molecular pathologists can allow systems to automatically compare complex genomic results to dozens of reference content sites and semi-automate the interpretation of long lists of somatic variants in oncology and other areas. Institutions can combine lab results with structured data extracted from EMRs by NLP technologies. The analysis of the resulting data can be presented in statistical form to researchers and in visual form to other users.
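The semi-automated interpretation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any vendor's system: the reference sources are small in-memory dictionaries, and every gene name, variant, source name, and note is a placeholder invented for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Variant:
    gene: str    # placeholder gene symbol
    change: str  # placeholder protein change


@dataclass
class Annotation:
    variant: Variant
    notes: dict = field(default_factory=dict)  # source name -> note text

    @property
    def needs_review(self) -> bool:
        # No source recognized the variant, so a pathologist must assess it.
        return not self.notes


# Hypothetical reference content, keyed by (gene, change).
REFERENCE_SOURCES = {
    "source_a": {("GENE1", "p.A10B"): "placeholder note on therapy response"},
    "source_b": {
        ("GENE1", "p.A10B"): "placeholder guideline mention",
        ("GENE2", "p.C20D"): "placeholder note: reported benign",
    },
}


def annotate(variants, sources):
    """Look up each variant in every source and collect the matching notes."""
    annotations = []
    for variant in variants:
        key = (variant.gene, variant.change)
        notes = {name: content[key] for name, content in sources.items() if key in content}
        annotations.append(Annotation(variant, notes))
    return annotations


variants = [
    Variant("GENE1", "p.A10B"),
    Variant("GENE2", "p.C20D"),
    Variant("GENE3", "p.E30F"),
]
for annotation in annotate(variants, REFERENCE_SOURCES):
    label = "REVIEW" if annotation.needs_review else ", ".join(sorted(annotation.notes))
    print(annotation.variant.gene, annotation.variant.change, "->", label)
```

The design point is the split of labor: the lookup handles variants that reference sources already describe, and the unmatched remainder is flagged for the pathologist rather than silently dropped.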


Although pioneered in large academic centers and major information technology companies, advanced data science strategies can also be implemented in regional laboratories and community practices in a variety of clinical settings.

GenoSpace has been working with PathGroup for the past year to support its molecular pathology offering. PathGroup is a physician-owned pathology group and clinical laboratory based in Nashville, Tenn. The organization serves major medical centers, large clinical practices, and community oncology practices. The team at PathGroup has launched a comprehensive platform for providing molecular profiling in oncology.

PathGroup’s SmartGenomics offering incorporates a variety of advanced capabilities:

  • Integration of NGS, aCGH, FISH, and flow cytometry results from different instruments and associated information systems.
  • Interrogation of a variety of reference content sources, including Thomson Reuters Life Sciences, the National Comprehensive Cancer Network, and others.
  • Web-based portals that allow for cohort identification, analysis, and visualization across molecular profiling results and other biomedical data.

An overwhelming majority of oncology patients are treated in community oncology practices. PathGroup’s commitment to delivering leading molecular profiling offerings based on advanced analytics tools is an important example of the pathology community’s ability to impact patient care across the country. At GenoSpace, we are excited to be working with stakeholders across research, clinical development, lab medicine, and clinical care—and view pathology as perhaps the most exciting opportunity to apply information technologies to improve outcomes for all stakeholders. We believe our tools and information management systems provide the enabling technology necessary to operationalize precision medicine.

Daniel Meyer is CFO and head of corporate development at GenoSpace, Cambridge, Mass. For further information, contact CLP chief editor Steve Halasey.


1. ASCP Celebrates Your Contributions to Patient Care. Accessed June 11, 2014.

2. Ambulatory Care Use and Physician Visits. Accessed June 11, 2014.

3. Total Number of Retail Prescription Drugs Filled at Pharmacies. State Health Facts. Accessed June 11, 2014.