Emory University | Woodruff Health Sciences Center

Embracing Big Data

The Center for Data Science gives nurses the analytical tools to optimize health care delivery and outcomes

By Sylvia Wrobel


Based in the School of Nursing, the Center for Data Science serves as a resource for Emory researchers and affiliates. The staff who designed and lead it include Rebecca Mitchell (left), Mengtian Jin, Roy Simpson, Andrea Plotsky, and Vicki Hertzberg.

Two years ago, Dean Linda McCauley asked biostatistician Vicki Hertzberg PhD FASA to move next door, from the Rollins School of Public Health to the School of Nursing, to help nursing faculty and students take advantage of the tsunami of data related to health, disease, and patient care.

An expert in statistical methods for collaborative research on epidemiological and clinical issues, Hertzberg knew little about nursing except that highly prepared nurses are caring clinicians and administrators.

But the more she learned about nursing's multiple roles and unique perspectives—especially its holistic approach to patients and families—the more she was convinced that McCauley was right. To empower nurses—and accelerate advances in health care—nursing and big data needed each other.

The first step in capitalizing on that prospect was creating the Center for Data Science (CDS), led by Hertzberg and Elizabeth Corwin PhD RN FAAN, associate dean for research. Its overarching goal is to use the power of data-driven thinking to help solve some of nursing's (and health care's) most challenging problems through better clinical decision support, disease surveillance, and population health management.

Housed in the School of Nursing, the center serves as a hub for the growing number of data science resources across the university, including the schools of medicine and public health, Emory Healthcare, and affiliates such as the Georgia Institute of Technology and Children's Healthcare of Atlanta, all of which helped plan CDS.

Hertzberg, "the big picture person," has recruited key staff and faculty, including visiting assistant professor Rebecca Mitchell, whom she describes as "a DVM/PhD with serious bioinformatics chops." CDS has sent a dozen nursing research faculty to data analysis workshops. It also has established partnerships with Emory colleagues in business, mathematics, and computer science.

Corwin also formed a consortium with five other nursing schools committed to using common data elements for nursing research—work that Roy Simpson DNP RN DPNAP FAAN, assistant dean for technology management at the School of Nursing, helped pioneer nationally for implementation in the electronic medical record (EMR).

Working with Andrea Plotsky MSPH, an informatics specialist whom Hertzberg recruited from Rollins, consortium members agreed on the use of common terms, coding, and research instruments and measures across institutions. Thus, researchers can combine databases to increase sample sizes and maximize the ability to identify true differences.
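To see why pooling matters, consider a rough sketch of statistical power, the probability of detecting a difference that really exists. The numbers here are hypothetical, not figures from the consortium: a modest true effect (0.3 standard deviations), 50 participants per group at one school, and six schools pooling comparable data. The calculation uses a standard normal approximation for a two-group comparison and is meant only as an illustration.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group z-test.

    Normal approximation that ignores the negligible chance of
    rejecting in the wrong direction.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    return NormalDist().cdf(shift - z_crit)

# Hypothetical values chosen for illustration only.
EFFECT = 0.3          # true difference, in standard deviations
N_ONE_SCHOOL = 50     # participants per group at a single school
N_SCHOOLS = 6         # schools pooling data coded the same way

single = approx_power(EFFECT, N_ONE_SCHOOL)
pooled = approx_power(EFFECT, N_ONE_SCHOOL * N_SCHOOLS)
print(f"one school: {single:.2f}, pooled: {pooled:.2f}")
```

Under these assumptions, a single school would detect the difference roughly a third of the time, while the pooled sample would detect it in well over nine cases out of ten. Pooling only works this way when every site records the same variables in the same way, which is the point of common data elements.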

Drilling down into EMR data

The new Center for Nursing Data Health Electronic Record database (CeNDHeR, pronounced Send-Her) is part of CDS's plan to provide real-life data for teaching purposes. Created by Plotsky and Mengtian Jin, an Emory business major and mathematics and computer science whiz, CeNDHeR consists of 100,000 randomly selected Emory Healthcare patient records. The records have been scrubbed of any identifying information. What remains is information gold. Each record may include hundreds of patient encounters (i.e., hospital, clinic, diagnostics, procedures, and medications), and each encounter may contain up to 160,000 fields of data.

Plotsky and Jin used hypothetical research scenarios proposed by students in the Doctor of Nursing Practice (DNP) program to create a subset of 100 variables that can be combined to ask questions, look for relationships, and explore hypotheses. Current usable variables include demographic factors; where each encounter took place and with what type of health professional; DRGs, the diagnostic codes used for billing (so that students can examine costs); results of laboratory tests and medical procedures; and medication history. With time, the number of variables will expand, enabling more complex queries.

DNP and PhD students are using CeNDHeR for queries such as determining the number and type of congestive heart failure patients readmitted within 48 hours after discharge. Such data will help them identify and understand reasons why patients are discharged too early.
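A query of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the CeNDHeR interface: the records, field layout, and the function name `readmitted_within` are all made up for the example, and the real database is explored through its own drag-and-click tools.

```python
from datetime import datetime, timedelta

# Hypothetical, made-up rows: (patient_id, diagnosis, admitted, discharged).
ENCOUNTERS = [
    ("P1", "CHF", "2016-01-01 08:00", "2016-01-05 10:00"),
    ("P1", "CHF", "2016-01-06 09:00", "2016-01-09 12:00"),   # back after 23 hours
    ("P2", "CHF", "2016-02-01 08:00", "2016-02-03 08:00"),
    ("P2", "CHF", "2016-02-10 08:00", "2016-02-12 08:00"),   # back after 7 days
    ("P3", "COPD", "2016-03-01 08:00", "2016-03-02 08:00"),
    ("P3", "COPD", "2016-03-03 07:00", "2016-03-04 08:00"),  # different diagnosis
]

FMT = "%Y-%m-%d %H:%M"

def readmitted_within(encounters, diagnosis, window):
    """Return sorted IDs of patients with `diagnosis` who were
    readmitted within `window` of a previous discharge."""
    stays_by_patient = {}
    for pid, dx, admitted, discharged in encounters:
        if dx == diagnosis:
            stays_by_patient.setdefault(pid, []).append(
                (datetime.strptime(admitted, FMT),
                 datetime.strptime(discharged, FMT))
            )
    flagged = set()
    for pid, stays in stays_by_patient.items():
        stays.sort()  # chronological order by admission time
        # Compare each discharge with the next admission for the same patient.
        for (_, prev_discharge), (next_admit, _) in zip(stays, stays[1:]):
            gap = next_admit - prev_discharge
            if timedelta(0) <= gap <= window:
                flagged.add(pid)
    return sorted(flagged)

CHF_48H = readmitted_within(ENCOUNTERS, "CHF", timedelta(hours=48))
print(CHF_48H)
```

In this toy data set only patient P1 is flagged. Widening the window or adding variables such as age or medication history would follow the same pattern: filter the encounters, then compare dates within each patient's record.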

Amazon Web Services provided support to move the database to the cloud, making access and analysis easier, faster, and less expensive. Cloud computing, Hertzberg and Plotsky note, has leveled the playing field, enabling newcomers to jump into big data science without the expense of building complex physical systems of their own.

As data use becomes more integral to nursing practice, CeNDHeR will become integral to MSN and even BSN nursing education. "Don't worry," Hertzberg says. "No one is trying to turn nurses into programmers." Thanks to Plotsky and Jin, nurses don't have to be. The simple drag-and-click design of the program interface will enable students to explore the database with relative ease.

Using big data for research

Big data gets much of its strength from its ability to combine information from multiple and vastly different databases. Take, for example, the rapidly burgeoning field of "omics"—genomics, transcriptomics, proteomics, and metabolomics, to name a few. Emory is awash in such data, which nursing and other scientists are eager to share and explore.

One example can be found in Moultrie, Georgia, where nursing students spend long summer days providing care to migrant farm workers in rural Colquitt County. The students often are the only health care providers many of these men, women, and children see all year.

Valerie Mac 07N 15MN 16PhD, assistant research professor at the School of Nursing, studies the body's responses to pesticide and other hazardous exposures. In 2016, she and Hertzberg went to Moultrie for a pilot study involving 38 male and female migrant workers, who provided samples of urine, blood, and stool (to measure gut microbiome). Samples were brought back to the HERCULES Exposome Research Center at Rollins, where they were analyzed for evidence of organophosphates (commonly found in pesticides) and changes associated with inflammation and heat-related illnesses.

Still more samples went to Emory's Clinical Biomarkers Lab, where high-throughput mass spectrometry measured the chemical fingerprints left behind by specific genetic, metabolic, and cellular processes. Based on this data, Hertzberg is determining how heat and dehydration affect the microbiome, metabolomics, and, ultimately, health. She and Mac plan to expand the study.

The ultimate goal of nursing's embrace of big data is to make a positive difference in the lives of patients. Nurses who understand how big data work can help the health care team record patient notes properly for consistent, accurate data retrieval and analysis. As personalized health care (biological information that predicts disease and treatment response) becomes a greater reality, nurses will play a critical role in using collective data to build health profiles for predictive models. Big data will also provide the evidence that nurses need to shape health policy.

"Big data is the future—a future that requires critical thinking and the ability to evaluate data for good decision-making," says Simpson. "Nurses must be part of this increasingly interdisciplinary process, focusing on data through the distinct lens of nursing knowledge and education."
