CAMBRIDGE, Mass. — An explosion of biological data will lead to the first drug candidates discovered by an artificial intelligence system within the next few years, a feat that promises to change the way pharmaceutical companies conduct research.
That was the consensus that emerged among machine learning experts who spoke Thursday at the STAT Summit about the use of AI in the pharmaceutical industry.
“I would predict that within the next two to three years we’ll have examples of [drug] targets … discovered through machine learning methods that unravel biology that wasn’t appreciated before,” said Dr. Hal Barron, chief scientific officer of GlaxoSmithKline (GSK).
While it will take quite a bit longer for these new targets to yield approved medicines, Barron added that the research will in short order produce promising drug candidates — and put them in the approval pipeline of the Food and Drug Administration.
The progress in machine learning research is being fueled by growing volumes of genetic data and other biological information that can be mined for patterns humans can't see. Those patterns, the experts said, may cause drug researchers to change course and find new approaches to vexing problems in conditions ranging from cancer to neurodegenerative diseases like Alzheimer's.
“One of the things we’ve seen time and time again in machine learning is that, if you are willing to let go of your biases in looking at large datasets, you will find you’ve been looking in the wrong place,” said Daphne Koller, chief executive of Insitro, an AI startup that partners with pharmaceutical companies.
She added: “Letting the machine figure out what really is the differentiator between sick and healthy states will let us find targets that we would not otherwise be able to find.”
During the past few years, pharmaceutical companies have significantly ramped up their investments in machine learning research. At the same time, however, many onlookers in the industry have questioned whether those expenditures are driven by hype or by meaningful scientific progress.
The experts who spoke Thursday acknowledged that hype propagated by some purveyors is undermining real progress made by many others.
“People are throwing the term [machine learning] around to refer to anything that vaguely resembles multivariate statistics,” Koller said. “Things that people have been doing for 50 years are now called machine learning.”
It’s gotten to the point, she added, that “when you go into a venture capitalist’s office and on slide five say that you are doing deep learning, your valuation doubles. My husband, who’s an investor, calls it the machine learning pixie dust.”
What defines modern machine learning is its ability to pick out insights from massive data sets that humans will never be able to see, no matter how long they examine the data. “And from that specific definition, I think it’s being underhyped,” said Barron.
Still, he cautioned that the use cases for machine learning are limited: “You have to ask, is there a possibility of collecting enough data, with enough dimensionality, where machine learning is going to be useful? And the vast majority of the time, the answer is going to be no.”
Aviv Regev, a computational biologist at the Broad Institute, said the application of machine learning is already dramatically changing approaches to biological research. In prior decades, large amounts of messy information about biological systems from multiple sources were a problem, forcing scientists to constrain their analyses to distinct subsets of data. Machine learning, however, can handle such complexity more fluidly, even when there are wrinkles in the data, as long as the datasets are large enough.
“Now you can handle all sorts of things that we as humans see as imperfections,” she said. “That changes what [data] we should be collecting if we want to do drug discovery. We should not just think about doing things we’ve done 100 or a thousand times before, but doing things very differently.”