Researchers have amassed vast single-cell gene expression databases to understand how the smallest details impact human biology. However, current analysis methods struggle with the large volume of data and, as a result, produce biased and contradictory findings. Scientists at St. Jude Children’s Research Hospital created a machine-learning algorithm capable of scaling with these single-cell data repositories to deliver more accurate results. The new method was published today in Cell Genomics.
Before single-cell analysis, bulk gene expression data gave high-level but unrefined results for many diseases. Single-cell analysis enables researchers to look at individual cells of interest, a difference akin to looking at an individual corn kernel instead of a field. These detailed insights have already led to breakthroughs in understanding some diseases and treatments, but the difficulty of replicating and scaling analyses as datasets keep growing has stymied progress.
“Our scalable toolset addresses the exponential growth in single-cell RNA sequencing data, enabling accurate analysis within a practical timeframe,” said Dr. Paul Geeleher, St. Jude Department of Computational Biology.
All techniques for studying single-cell gene expression create large amounts of data. When scientists test millions of cells simultaneously, the amount of computer memory and processing power needed to handle the data is enormous. Geeleher’s team turned to a different kind of hardware to help solve the problem.
“We developed a method leveraging GPUs, providing the processing power needed to handle the computational load in a scalable and efficient way,” said Dr. Xueying Liu, St. Jude Department of Computational Biology.
With standard methods, the volume of data often forces researchers to make concessions and assumptions that introduce bias into their analyses. The St. Jude scientists used an artificial intelligence approach that removes the bias from these choices.
“Our method employs unsupervised machine learning to automatically identify robust, less arbitrary parameters, grouping cells based on their active biological processes or cell type identities,” said Dr. Xueying Liu, St. Jude Department of Computational Biology.
“Our tool is broadly applicable for studying any disease through single-cell RNA analysis and has outperformed existing methods. We hope other scientists use it to maximize the value of their single-cell data,” said Dr. Paul Geeleher, St. Jude Department of Computational Biology.
Because the algorithm learns from and bases its analysis on the data it is given, researchers could use it on any sizeable single-cell RNA sequencing dataset. Because it investigates each new large dataset individually and draws its conclusions only from the gene expression program clues found in that data, the researchers called the approach the Consensus and Scalable Inference of Gene Expression Programs (CSI-GEP). When applied to the largest single-cell RNA databases, CSI-GEP produced better results than every other method. Most impressively, the algorithm could identify cell types and the activity of biological processes that other methods missed.
CSI-GEP is freely available at https://github.com/geeleherlab/CSI-GEP.
(Newswise/HR)