As public health officials around the world contend with the latest surge of the COVID-19 pandemic, researchers at Drexel University have created a computer model that could help them be better prepared for the next one. Using machine learning algorithms trained to identify correlations between changes in the genetic sequence of the COVID-19 virus and upticks in transmission, hospitalizations and deaths, the model can provide an early warning about the severity of new variants.
More than two years into the pandemic, scientists and public health officials are doing their best to predict how mutations of the SARS-CoV-2 virus are likely to make it more transmissible, better at evading the immune system and more likely to cause severe infections. But collecting and analyzing the genetic data to identify new variants, and linking those variants to the specific patients they have sickened, is still an arduous process.
Because of this, most public health projections about new “variants of concern” — as the World Health Organization categorizes them — are based on surveillance testing and observation of the regions where they are already spreading.
“The speed with which new variants, like Omicron, have made their way around the globe means that by the time public health officials have a good handle on how vulnerable their population might be, the virus has already arrived,” said Bahrad A. Sokhansanj, PhD, an assistant research professor in Drexel’s College of Engineering who led development of the computer model.
The Drexel model, which was recently published in the journal Computers in Biology and Medicine, is driven by a targeted analysis of the genetic sequence of the virus’s spike protein, combined with a mixed effects machine learning analysis of factors such as the age, sex and geographic location of COVID patients. The spike protein is the part of the virus that allows it to evade the immune system and infect healthy cells; it is also the part known to have mutated most frequently throughout the pandemic.
Learning to Find Patterns
The research team used a newly developed machine learning algorithm, called GPBoost, based on methods commonly used by large companies to analyze sales data. Via a textual analysis, the program can quickly home in on the areas of the genetic sequence that are most likely to be linked to changes in the severity of the variant.
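One common way to treat a protein sequence as text is to split it into short overlapping "words" (k-mers) and compare how often each word appears. The sketch below illustrates that general idea only. It is not the Drexel team's pipeline, and the function names, the choice of k = 3 and the toy sequence fragments are all assumptions made for illustration.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int = 3) -> Counter:
    """Count overlapping k-mers ("words") in an amino acid sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def differing_kmers(reference: str, variant: str, k: int = 3) -> dict:
    """Return k-mers whose counts differ between two sequences.

    Positive values mean the k-mer is more frequent in the variant.
    """
    ref, var = kmer_counts(reference, k), kmer_counts(variant, k)
    return {
        kmer: var[kmer] - ref[kmer]
        for kmer in sorted(set(ref) | set(var))
        if var[kmer] != ref[kmer]
    }


# Toy fragments invented for illustration (not real spike sequences).
reference_fragment = "NLVKNKCVNF"
variant_fragment = "NLVKNKCVKF"  # one substitution: N -> K at position 9
print(differing_kmers(reference_fragment, variant_fragment))
```

A single substitution changes only the few k-mers that overlap it, which is why word-level features of this kind can point to the region of a sequence where a variant differs from a reference.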
It layers these patterns with those that it gleans from a separate perusal of patient metadata (age and sex) and medical outcomes (mild cases, hospitalizations, deaths). The algorithm also accounts for, and attempts to remove, biases due to how different countries collect data. This training process not only allows the program to validate the predictions it has already made about existing variants, but it also prepares the model to make projections when it comes across new mutations in the spike protein. It shows these projections as a range of severity, from mild cases to hospitalizations and deaths, depending on the age or sex of a patient.
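In a mixed effects model, country-level reporting differences are typically handled by giving each country its own offset (a "random intercept") that is shrunk toward zero when a country contributes little data. The sketch below shows that shrinkage idea in plain Python. It is a simplified stand-in, not the GPBoost library and not the published model; the function name, the `variance_ratio` parameter and the toy records are assumptions.

```python
from collections import defaultdict


def country_offsets(records, variance_ratio=1.0):
    """Estimate a shrunken random intercept for each country.

    records: iterable of (country, observed, predicted) tuples, where
        `predicted` comes from a model that ignores country.
    variance_ratio: assumed ratio of residual variance to between-country
        variance; larger values shrink offsets harder toward zero.
    """
    totals = defaultdict(float)  # sum of residuals per country
    counts = defaultdict(int)    # number of records per country
    for country, observed, predicted in records:
        totals[country] += observed - predicted
        counts[country] += 1
    # Equivalent to n / (n + ratio) * mean_residual for each country.
    return {
        country: totals[country] / (counts[country] + variance_ratio)
        for country in totals
    }


# Toy data invented for illustration: (country, observed severity, prediction).
records = [
    ("A", 1.0, 0.5),
    ("A", 1.0, 0.5),
    ("A", 1.0, 0.5),
    ("B", 0.0, 0.5),
]
offsets = country_offsets(records)
print(offsets)
```

Subtracting each country's offset from its observations leaves a signal that is less tied to how that country collects and reports data. Country B, with a single record, is pulled further toward zero than country A, which has three.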
Keeping Up with COVID
Drexel’s targeted approach to predictive modeling of COVID-19 is a crucial development because the massive amount of genetic sequencing data being collected has strained the ability of standard analysis methods to extract useful information quickly enough to keep up with the virus’s new mutations.
Rosen’s lab has been at the forefront of using algorithms to cut through the noise of genetic sequencing data and identify patterns that are likely to be significant. Early in the pandemic the group was able to track the geographic evolution of new SARS-CoV-2 variants by developing a method for quickly identifying and labeling their mutations. Her team has continued to leverage this process to better understand the patterns of the pandemic.
A Better View
The team notes that advances like this underscore the need to provide more public health resources to vulnerable areas of the world — not only for treatment and vaccination, but also for collecting public health data, including sequencing emerging variants.
The researchers are currently using the model to more rigorously analyze the current group of emerging variants that will become dominant after Omicron BA.4 and BA.5. (NS/Newswise)