Big Data and COVID-19: Proceed with Caution

February 29, 2020, was the date of the first reported COVID-19 death in the United States. It was the day the entire government system began to change the way it reacted to this disease, with a focus on preventing as many deaths as possible.

Since that point in time, every single state has enacted measures to slow the rate of spread and “flatten the curve.” It’s a term that will likely go down in history as the pop-culture reference of 2020, perhaps the first time ever such a term has been so heavily influenced by big data.

In the ten weeks since then, the US count has climbed to over 1.6 million cases and 92,000 deaths. Worldwide, this pandemic has led to over 4.9 million confirmed infections and over 320,000 deaths. The numbers are devastating, yet they’re significantly lower than initial models suggested might be the case if people were allowed to continue about their daily lives.

The use of big data to make policy decisions may very well have resulted in hundreds of thousands of lives being saved. But where does that data come from? And how much weight should we be giving to it as more decisions need to be made in the weeks and months ahead?

Understanding Big Data

Unless you work in the field of data science, you’ve probably never heard of the term “big data” before. Unlike “big pharma,” it doesn’t have quite the same buzzworthy appeal. But it turns out in times of global crisis, big data may be one of the most influential tools in our toolkit for predicting potential outcomes and saving lives.

In the simplest terms possible, big data is data that is massive in volume and still growing exponentially over time.

As it applies to COVID-19, big data is the accumulation of all the data points related to this disease being received from around the world. Mathematical models have used that data to identify geographical hotspots, create death prediction models, provide estimates regarding testing and the need for testing supplies, and guide decision-making among policymakers, health care providers, and other key stakeholders.

But it’s important to understand that while big data gives us insight we otherwise wouldn’t have, it doesn’t always manage to account for all the variables necessary to make an accurate call.

Predicting the future is impossible (just ask every data scientist who tried to predict the outcome of the 2016 election). The information we have is valuable, and big data has the potential to save lives. But it’s not infallible, and it loses value when we fail to combine what that data tells us with science and with what is actually happening on the ground.

Big Data and COVID-19

The fight against COVID-19 is far from over—the international health community now believes the crisis will likely carry on until early 2021, at which point a vaccine will hopefully become available.

But there is no doubt that harnessing the computational power provided by the field of big data has played a key role in our collective, and largely successful, efforts to mitigate and contain the spread of this disease.

Real-time tracking of cases has been instrumental not only in guiding policy but also in helping keep people informed around the globe. World leaders have been able to visualize the pandemic, locating hotspots and using predictive modeling to decide where to act. This real-time tracking has also provided insight into those diagnosed with and recovering from COVID-19. That information will be used to guide efforts to reopen sectors of the economy, assuring leaders of an overall low level of new cases or a high number of recovered people who may have immunity to COVID-19.

Models now suggest that the disease curve has flattened, meaning that in many places around the world participating in social distancing and stay-at-home efforts, the number of new cases has not gone up at the drastic and exponential rate initially feared.
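To make "flattening the curve" concrete, here is a minimal sketch of a textbook SIR (susceptible, infected, recovered) simulation. It is not one of the models policymakers actually used; the function name `simulate_sir`, the population size, and the `beta` and `gamma` values are illustrative assumptions chosen only to show how a lower contact rate lowers and delays the peak.

```python
def simulate_sir(beta, gamma=0.1, population=1_000_000,
                 initial_infected=10, days=365):
    """Discrete-time SIR sketch; returns the number of people infected on each day.

    beta  -- assumed average number of infectious contacts per person per day
    gamma -- assumed daily recovery rate (0.1 means roughly 10 days infectious)
    """
    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    history = []
    for _ in range(days):
        new_infections = beta * susceptible * infected / population
        new_recoveries = gamma * infected
        susceptible -= new_infections
        infected += new_infections - new_recoveries
        history.append(infected)
    return history


# Hypothetical scenarios: the same disease with and without reduced contact.
baseline = simulate_sir(beta=0.30)   # life as usual (assumed value)
distanced = simulate_sir(beta=0.15)  # contacts cut in half (assumed value)

print("Peak infected, baseline: ", round(max(baseline)))
print("Peak infected, distanced:", round(max(distanced)))
print("Peak day, baseline: ", baseline.index(max(baseline)))
print("Peak day, distanced:", distanced.index(max(distanced)))
```

In this toy setup, halving the contact rate produces a peak that is both much lower and much later, which is the shape people mean by a flattened curve. Real epidemic models account for far more than these two parameters.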

The next step would be improved contact tracing, which essentially involves connecting the dots of who an infected person may have been in contact with prior to diagnosis. While contact tracing at this point in the epidemic can be hard to implement on a large scale, the facts show that contact tracing works, and big data provides the opportunity for accurate contact tracing based on so much more than just an individual’s memory of whom they’ve spent time with. According to the Harvard Business Review, contact tracing helped curb the spread of COVID-19 in East Asia, and it could absolutely do so for the rest of the world.
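"Connecting the dots" can be pictured as a search over a graph of who met whom. The sketch below is only an illustration of that idea, not a description of any deployed tracing system: the `trace_contacts` function, the `max_hops` cutoff, and the names in the contact list are all made up for the example.

```python
from collections import deque


def trace_contacts(contacts, index_case, max_hops=2):
    """Breadth-first search over a contact list.

    contacts   -- iterable of (person_a, person_b) pairs who were in contact
    index_case -- the person who tested positive
    max_hops   -- how many links away from the index case to follow (assumed cutoff)
    Returns a dict mapping each exposed person to their distance from the index case.
    """
    # Build an undirected adjacency map from the contact pairs.
    adjacency = {}
    for a, b in contacts:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    hops = {index_case: 0}
    queue = deque([index_case])
    while queue:
        person = queue.popleft()
        if hops[person] == max_hops:
            continue  # do not follow contacts beyond the cutoff
        for other in adjacency.get(person, ()):
            if other not in hops:
                hops[other] = hops[person] + 1
                queue.append(other)

    del hops[index_case]  # the index case is not their own contact
    return hops


# Hypothetical contact log; Eve and Fay never met anyone linked to Ana.
contacts = [("Ana", "Ben"), ("Ben", "Cy"), ("Cy", "Dee"), ("Eve", "Fay")]
exposed = trace_contacts(contacts, "Ana")
print(exposed)
```

Here Ben is a direct contact and Cy is a contact of a contact, while Dee falls outside the two-hop cutoff. In practice the contact list would come from sources such as location data, which is exactly where the privacy questions raised later in this article come in.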

Indeed, big data and data analysis have been instrumental in curtailing the COVID-19 pandemic. But this data doesn’t exist in a bubble. There are caveats we need to pay attention to, lest we risk derailing the international community’s efforts to mitigate the spread of this outbreak.

Caveats of Big Data Use in COVID-19

While big data has been a tremendous help in the fight against COVID-19, some drawbacks exist:

• Interpretation: Policymakers and others who wish to harness the power of big data must look at the data in context. The so-called “curve” has been squashed only because of society’s collective efforts to practice social distancing and handwashing and, in many places, to adhere to stay-at-home measures. For this reason, the current decrease in the rate of new cases should not be seen as a reason to give up all of the efforts we have put into play. Without the proper interpretation of the current data and trends related to COVID-19, the results could be disastrous. Harnessing the full potential of the data won’t be a solo effort by the data science community. We need to work as a team with healthcare experts to draw conclusions and make decisions that can benefit society.

• Privacy issues: Companies and governments have been working together to obtain location data for users of mobile phones and the internet to develop contact tracing methods. These efforts largely involve analyzing vast datasets in order to reveal patterns in movements and behavior that can be used to implement safety measures to prevent the spread of the epidemic. While there is no question that contact tracing methods are effective at reducing the spread of COVID-19, there are certainly privacy issues to consider.

• Lack of real-world context: Big data initially told us that the vast majority of people impacted by COVID-19 were elderly or had underlying health conditions. While it is true these are the groups most at risk, we have since learned that COVID-19 can, and does, lead to serious complications for those outside those groups as well. Doctors and healthcare workers play an important role in data interpretation. We cannot make predictions and draw conclusions without the healthcare and scientific context.

Proceeding with Caution

Big data is powerful. But with great power comes great responsibility. While we have access to this huge database of information, there will always be some blind spots and missing variables that prevent us from having the full picture.

In the proper context, big data can be incredibly useful. But in order to harness its full power, we need people who are familiar with data models and people who understand the epidemiology and the medical implications of the virus to work together.

It is important to keep in mind that existing big data models may be incomplete because some variables (such as population density) remain unaccounted for, either due to statistical or methodological limitations, or because the relevant variables have not yet been identified. This means that despite our best efforts, we may still be missing part of the picture on COVID-19.

The truth is, predictive analytics are just that: predictive, not fortune-telling. But the more information we put into existing models, the more accurate our interpretations will be. That will require healthcare workers, policymakers, data scientists and epidemiologists all working together to combine what they know and provide the most accurate picture possible.