Last month’s column discussed how data analytics is helping medical researchers find successful treatments, and how Medicare and Medicaid auditors can sift through data to find fraudulent claims and reduce the number of improper ones.
Analytics, however, can also produce more immediate results on the ground, by helping epidemiologists track disease outbreaks as they happen, much more quickly than was possible in the past. Some exciting developments in this area involve both Big Data and not-so-big data, and they point to a future in which epidemics can be stopped before they reach worldwide proportions.
One of the more widely publicized uses of text mining originated in 2005, when Google searches for flu-related terms enabled researchers to track an influenza outbreak. That part is well known – but what actually happened is more complicated.
As reported by Canadian physician-researcher Gunther Eysenbach in the AMIA Annual Symposium Proceedings, published by the American Medical Informatics Association, it was Google AdSense that provided the information. Google itself does not like to release search data – even when subpoenaed by the federal government – but by creating an ad campaign with AdSense, Eysenbach was able to count the Canadian searchers who entered “flu” or “flu symptoms” into Google and then clicked on his ad, week by week. He then compared his statistics with those of the Public Health Agency of Canada, which uses more traditional metrics based on reports from physicians and from labs testing for influenza. Eysenbach’s click counts for each week turned out to be a better predictor of the following week’s rate of influenza than the traditional physician reports.
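To make the “this week’s clicks predict next week’s flu” idea concrete, here is a minimal sketch of a lagged correlation in Python. It is not Eysenbach’s actual analysis: the function names and the weekly numbers are invented for illustration, and the case counts are deliberately constructed to trail the clicks by exactly one week.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def lagged_correlation(signal, target, lag=1):
    """Correlate signal[week] with target[week + lag]."""
    if len(signal) != len(target) or not 0 <= lag < len(signal) - 1:
        raise ValueError("series must match in length and be longer than the lag")
    if lag == 0:
        return pearson(signal, target)
    return pearson(signal[:-lag], target[lag:])

# Hypothetical weekly figures (not real data): the case counts are
# simply the click counts shifted one week later.
clicks = [10, 20, 40, 80, 60, 30, 15]
cases = [5, 10, 20, 40, 80, 60, 30]

print("same week:", round(lagged_correlation(clicks, cases, lag=0), 3))
print("one week ahead:", round(lagged_correlation(clicks, cases, lag=1), 3))
```

With this made-up data the one-week-ahead correlation is essentially perfect, while the same-week correlation is weaker. Real surveillance data would be far noisier, and a fair comparison would run the same calculation on the physician reports.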
Meanwhile, Boston Children’s Hospital’s Computational Epidemiology Group has developed an information site called HealthMap, which shows disease reports for any location in the US or elsewhere in the world, either graphically on a map or in a list. The site aggregates information from an array of publicly available sources, including Google News, Baidu News (China), more specialized health news sources like ProMED Mail (a service of the International Society for Infectious Diseases), and numerous other sources of information on both human and animal health and disease. Crowdsourcing also plays a role: HealthMap includes a mobile app that provides local information on any disease outbreaks occurring in the area and allows users to submit reports of outbreaks not yet shown on the map.
Data generated from public sources can provide an accurate, timely picture of the progress of seasonal or other disease outbreaks. Source: healthmap.org
This summer’s devastating outbreak of Ebola fever in West Africa has caused widespread alarm, both in the countries hit by the outbreak and elsewhere. With high-volume international air travel, an infectious virus can spread across the world in a matter of hours or days. But epidemiologists have new tools at their disposal for tracking the progress of such an epidemic, so that public health authorities can allocate resources effectively.
Medical workers can deduce how and where an outbreak is likely to spread if they know where individuals are going, whom they’re interacting with, what they’re doing, what their demographic traits are, and how much they know about disease prevention. Identifying and following up with the people an infected person has been in contact with, a technique known as “contact tracing,” is among the most important means of stopping a disease outbreak. By aggregating survey data gathered from individuals, medical organizations can anticipate where an epidemic is likely to hit next, so they can get their resources there in time to make a difference.
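At its core, contact tracing is a walk through a network of who met whom. The sketch below is a simplified illustration, not the method used by any particular agency or software package: it assumes the survey data has already been boiled down to a hypothetical dictionary mapping each person to the people they reported contact with, and it ignores travel, timing, and demographics entirely.

```python
from collections import deque

def trace_contacts(contacts, index_case, max_generations=2):
    """Breadth-first walk outward from the index case.

    Returns {person: generation}, where generation 1 means direct
    contact with the index case, 2 means a contact of a contact, etc.
    """
    generation = {index_case: 0}
    queue = deque([index_case])
    while queue:
        person = queue.popleft()
        if generation[person] == max_generations:
            continue  # do not look beyond the requested depth
        for other in contacts.get(person, []):
            if other not in generation:
                generation[other] = generation[person] + 1
                queue.append(other)
    del generation[index_case]  # report only the contacts
    return generation

# Hypothetical survey results: each person lists reported contacts.
reported = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
}

to_follow_up = trace_contacts(reported, "A", max_generations=2)
print(to_follow_up)
```

Here person F is left out because F is three steps removed from the index case A. In practice the cutoff would depend on the disease and on how many people field workers can realistically follow up with.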
A software package called Epi Info, developed by the Centers for Disease Control and Prevention (CDC), uses contact tracing to map the spread of an epidemic. Epi Info, a set of public domain programs for designing surveys and gathering data in the field, was originally produced in 1985 and has undergone several phases of enhancement since then. It supports the development of specialized applications tailored to the unique characteristics of a particular disease outbreak. The application being used to track this year’s Ebola outbreak is so new that the CDC sent a programmer to Africa to continue working on the code while it’s already in use.
Epi Info combines commonly used statistical techniques – linear regression, t-tests, analysis of variance, and so on – with more specialized epidemiological analyses such as Mantel-Haenszel analysis (a way of measuring the association between a suspected risk factor and a disease while controlling for a confounding variable such as age), and Kaplan-Meier and Cox proportional hazards survival analysis (methods for estimating survival rates of a treated population over time). The CDC developed Epi Info so that the agency’s Epidemic Intelligence Service (EIS) could track disease outbreaks in West Africa, but it has become widely used in other areas as well.
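For readers curious about what a Mantel-Haenszel analysis computes, the following Python sketch shows the standard pooled odds ratio across strata (age groups, for example). This is not Epi Info code; the function name and the counts are made up for illustration. Each stratum is a 2x2 table given as (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases).

```python
def mantel_haenszel_odds_ratio(strata):
    """Pooled odds ratio: sum(a*d/n) / sum(b*c/n) over all strata.

    Each stratum is a tuple (a, b, c, d):
      a = exposed cases,    b = exposed non-cases,
      c = unexposed cases,  d = unexposed non-cases.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d  # stratum size
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator

# Hypothetical counts for two age groups (not real data).
age_strata = [
    (10, 20, 5, 40),  # younger group
    (8, 12, 4, 24),   # older group
]

pooled = mantel_haenszel_odds_ratio(age_strata)
print("pooled odds ratio:", round(pooled, 2))
```

In this invented example each age group on its own has an odds ratio of 4, so the pooled estimate is also 4. The method earns its keep when the strata differ in size or in their individual odds ratios, because it weights each stratum rather than lumping everyone into a single table.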
A Big Data truism holds that it’s all about volume, variety, velocity, and veracity. But data analytics breakthroughs don’t always have to involve huge volumes of data, as this case illustrates. By capturing the right data, doing the right analysis, and getting the results out into the hands of front-line health workers immediately, data analytics can save lives.
By John Kafalas
This monthly column covers Business Intelligence and data analytics issues. If you have questions, comments, or topic suggestions, please contact the author.
Copyright © 2014 4Sight Technologies, Inc. All rights reserved.