For years, DNA-encoded library (DEL) technology has been a workhorse for drug discovery, allowing researchers to screen billions of compounds against biological targets efficiently.

But a new paradigm is emerging, one that reframes DEL not simply as a screening platform, but as a massive data generation engine capable of fueling artificial intelligence (AI) and reshaping how we understand protein structure and function.
In this Xtalks Spotlight feature, experts from X-Chem, Matt Clark, President and Chief Scientific Officer (CSO), and Erin Davis, Chief Technology Officer (CTO), discuss how advances in data scale, computational methods and library diversity are converging to unlock new possibilities in early-stage drug discovery.
From Big Data to Omics-Scale Chemistry
Traditional “big data” in chemistry typically involves a few million data points drawn from disparate sources that are not directly interoperable, but recent developments are pushing that boundary much further.
X-Chem’s DEL platform is now generating hundreds of millions of usable data points, offering unprecedented opportunities to study structure-function relationships.
Davis explained that while collections of directly comparable datasets are generally much smaller across the industry, X-Chem’s breakthrough is twofold: the volume of data being generated is orders of magnitude greater, and because it’s produced from a single experimental platform, the results are inherently comparable across readouts. With access to hundreds of millions of directly comparable data points, researchers are now exploring new ways to analyze protein structure and function and apply those insights to drug discovery.
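As a rough, hypothetical illustration of why single-platform comparability matters, the Python sketch below stacks two screens’ readouts directly into one compounds-by-targets matrix. The schema, targets and values are invented for illustration and do not reflect X-Chem’s actual data model; data from disparate sources would first require cross-source normalization before any such pooling.

```python
import pandas as pd

# Hypothetical per-screen results from a single experimental platform.
# Because the schema and units are identical, readouts are directly
# comparable and can be stacked without cross-source normalization.
screen_a = pd.DataFrame({"compound_id": [101, 102, 103],
                         "target": "kinase_A",
                         "enrichment": [5.2, 0.9, 12.4]})
screen_b = pd.DataFrame({"compound_id": [101, 102, 103],
                         "target": "protease_B",
                         "enrichment": [0.4, 7.7, 1.1]})

# Pivot into a compounds-by-targets matrix: the kind of structure-function
# table that becomes feasible once hundreds of millions of readouts share
# one platform.
matrix = (pd.concat([screen_a, screen_b])
            .pivot(index="compound_id", columns="target", values="enrichment"))
print(matrix)
```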
“Chemomics is fundamentally now working with chemistry data at an omic scale,” said Davis.
The field has made significant strides in handling large chemistry datasets, but the current scale represents a step change, she said.
AI and Chemomics: A Perfect Partnership
As AI methods continue to evolve, their performance increasingly depends on the quality and scale of underlying data. In drug discovery, omics-scale chemistry datasets provide exactly that foundation.
As the world continues to invest in developing AI methods, chemomics is creating the data that feeds them at an unprecedented scale, making the two perfect partners.
However, while the algorithms are “phenomenal,” they “can’t do anything unless you have solid data,” Davis noted.
This alignment is pushing the field toward more data-driven discovery, where vast libraries and sophisticated models work hand in hand to uncover new insights that were previously unreachable with smaller datasets.
From Pioneering to Tech Acceleration
Clark outlined X-Chem’s position as a pioneer in DEL technology.
“We’ve been operating it for 15 years. I think we’ve done more DEL screens and built more DEL libraries than any other team in the world. We’ve learned a lot of things about how to design libraries and how to run the screens in the most effective way. We’ve also learned a lot of things about how to analyze and annotate the data that comes off of the DEL.”
In addition to its long-standing expertise in DEL technology, the company has made recent investments in computational infrastructure and brought in Davis as CTO to build a powerful computational suite capable of handling and leveraging the vast datasets produced by the platform. These moves now allow X-Chem to fully harness the technology’s potential.
A Mindset Shift in How Data Is Used
The technological leap also brings a conceptual shift: from viewing DEL as a way to generate shortlists of “hits” to recognizing its value as a rich data resource in its own right.
“The mindset change really is transforming DEL from a screening platform to a data generation platform,” explained Clark.
Traditionally, outputs of DEL screens are condensed into hit lists of a few dozen compounds for further development. But this approach leaves the majority of the data, tens of millions of measurements, largely untapped.
“It almost seems like a disservice to take tens of millions of data points and distill them down to 40 compounds,” said Clark.
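As a minimal sketch of the two mindsets, the hypothetical Python below contrasts distilling a screen down to a 40-compound hit list with retaining every measurement as labeled training data. The schema, scale and scores are invented for illustration and are not X-Chem’s actual pipeline.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
n = 1_000_000  # stand-in for the tens of millions of measurements in a real screen

# Hypothetical screen output: one enrichment score per library member.
screen = pd.DataFrame({
    "compound_id": np.arange(n),
    "enrichment": rng.lognormal(mean=0.0, sigma=1.0, size=n),
})

# Screening-platform mindset: condense everything into a short hit list.
hit_list = screen.nlargest(40, "enrichment")

# Data-generation mindset: every row is a labeled example for modeling,
# not just the top 40.
training_set = screen
print(f"hit list: {len(hit_list)} rows; training set: {len(training_set):,} rows")
```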
Davis added that this shift requires rethinking how to handle chemistry data at this unprecedented scale.
A major challenge in working with chemistry data has traditionally been the need to piece together information from multiple, disparate sources. For decades, people have been “scraping” together whatever they could to reach meaningful scale, explained Clark.
Now, there are orders of magnitude more data points from the same experiment, on the same target, under the same conditions, and the challenge has shifted.
Rather than scarcity, researchers are facing an abundance of high-quality data. “It’s almost the burden of luxury now and the burden of choice,” said Clark.
Why This Transformation Is Happening Now
While DEL technology has been around for years, the current transformation has been made possible by the accumulation of highly diverse libraries over time, combined with recent advances in computational capacity.
It has taken years to build the critical mass of diversity, quality and scale needed to turn these datasets into a data generation resource that is truly valuable for modeling and AI.
“The libraries accumulate over the years, in terms of schemes, vectors, building blocks. This diversity is key. We started seeing signals that our data could drive computational modeling a few years ago,” Clark noted. “We continue to see that as time goes on, as we build the libraries even further, that signal just gets more and more powerful.”
The convergence of omics-scale chemistry data and AI is catalyzing a fundamental shift in drug discovery. Rather than focusing solely on hit identification, researchers are beginning to explore how massive, high-quality chemistry datasets can deepen our understanding of biological systems, drive better modeling and ultimately lead to more effective therapeutics.
This is not just an evolution of DEL technology; it’s a redefinition of how data itself is generated, interpreted and applied across the discovery pipeline.
This article was created in collaboration with the sponsoring company and the Xtalks editorial team.