What Matters More in DNA-Encoded Libraries: Size, Quality or Chemical Diversity?

DNA-encoded libraries (DELs) are a lab-based screening approach used in early drug discovery to test large numbers of chemical compounds against disease-related biological targets. Because DELs can evaluate billions of compounds at once, they are often discussed in terms of scale.

Ying Zhang, PhD
Vice President of Discovery
X-Chem

According to Dr. Ying Zhang, Vice President of Discovery at X-Chem, however, size alone is a poor indicator of success. In an Xtalks Spotlight conversation, Dr. Zhang, who has led discovery and library strategy at the company for more than a decade, explained why library quality matters.

Designing a DEL the right way can influence the reliability of screening data, guide early discovery decisions and affect whether promising drug candidates move forward.

How DELs Became Associated with Scale

“DELs can grow very large because of how they are synthesized,” said Dr. Zhang.

“The libraries are chemically synthesized in a split-and-pool fashion in which the building block sets are introduced in sequential reaction-encoding cycles,” she explained. “So higher numbers of cycles effectively increase the numerical size of the library.”

However, she noted that adding more cycles does not come without tradeoffs.

“When you add cycles of chemistry, it tends to increase the molecular weight of the compounds you produce in the library,” she said. Additional cycles can potentially reduce synthetic efficiency and increase side products.

As a result, very large libraries can drift away from lead-like or drug-like chemical space.
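
The tradeoff can be illustrated with a rough calculation. The sketch below assumes that a split-and-pool library's size is the product of the building-block counts used in each cycle, and that each cycle adds a fixed average mass. The figures of 500 building blocks and 150 daltons per cycle are hypothetical values chosen for illustration, not X-Chem data.

```python
from math import prod

# Hypothetical numbers, chosen only for illustration (not X-Chem data).
BUILDING_BLOCKS_PER_CYCLE = 500   # distinct building blocks added in each cycle
AVG_MASS_PER_CYCLE_DA = 150.0     # assumed average mass added per cycle, in daltons


def library_size(blocks_per_cycle):
    """Number of encoded compounds: product of building-block counts across cycles."""
    return prod(blocks_per_cycle)


def approx_mean_weight(n_cycles, avg_mass_per_cycle=AVG_MASS_PER_CYCLE_DA):
    """Crude mean molecular weight estimate: cycles times average added mass."""
    return n_cycles * avg_mass_per_cycle


for n_cycles in (2, 3, 4):
    size = library_size([BUILDING_BLOCKS_PER_CYCLE] * n_cycles)
    weight = approx_mean_weight(n_cycles)
    print(f"{n_cycles} cycles: {size:,} compounds, ~{weight:.0f} Da")
```

Under these assumed numbers, a fourth cycle multiplies the library size 500-fold, from 125 million to 62.5 billion compounds, while pushing the estimated mean weight to about 600 daltons, past the 500-dalton limit in the Rule of Five.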

“What we want is high-quality library compounds in lead-like, drug-like chemical space,” Dr. Zhang said. “And that’s why at X-Chem, instead of numerical size, we have been focusing on making DNA-encoded libraries with fewer cycles of chemistry while maintaining chemical diversity.”

Fewer Cycles, Stronger Starting Points

Dr. Zhang emphasized that library quality, particularly chemical diversity and physicochemical property profiles, has a direct impact on discovery timelines.

“We firmly believe that the library quality, including diversity and property profiles, are the most important factors in accelerating drug discovery,” she said.

She explained that, aside from macrocyclic libraries designed to intentionally explore beyond Rule-of-Five space, most newly designed libraries at X-Chem contain only two to three cycles of chemistry, with average molecular weights around 350 to 450 daltons.

This strategy, she said, has translated into real-world outcomes across partnering discovery programs.

“We analyzed over 700 licensed compound families, and all clinical candidates are from those libraries with fewer than a billion molecules,” Dr. Zhang said. “That correlates to typical reaction schemes with two to three cycles of chemistry.”

What Chemical Diversity Really Means

For Dr. Zhang, chemical diversity is not simply about having different compounds.

“When we consider library diversity, we definitely think beyond numerical size,” she said. “True chemical diversity comes from the library schemes, vector geometry, the three-dimensional shape of the molecules, the connection chemistry that links the building blocks at each cycle and even the diversity of the building blocks themselves.”

She stressed that diversity decisions are made intentionally at the design stage.

“At the library design stage, we do not want to limit ourselves to what can be made,” Dr. Zhang said. “Rather, our library designs are guided by what should be made in order to deliver a successful discovery outcome.”

She described adapting atom-efficient reactions on DNA to generate compounds with high three-dimensional character and unique topological diversity, which support downstream hit-to-lead and lead-to-candidate efforts.

Why Library Quality Shapes DEL Data

Dr. Zhang said library quality ultimately determines data quality. “A high-quality DNA-encoded library supports discovery programs of different modalities, helps generate reliable data streams for machine learning and pharmacophore paneling and ultimately provides a rich source to elicit structure-function relationships.”

She described DEL as a union of chemistry and biology that generates data at an omics-like scale, where consistency and diversity are critical, especially for machine learning applications.

Dr. Zhang pointed to a 2020 DEL machine learning paper published in collaboration with Google, which evaluated DEL data generated from X-Chem libraries as a source of machine learning data. She also referenced subsequent collaborative work with the Structural Genomics Consortium (SGC), where machine learning applied to DEL data alone was used to identify hits and tool compounds across 18 targets to evaluate the ligandability of the WDR protein family.

“It isn’t a numbers game,” Dr. Zhang added. “The quality of the inputs really determines the quality of the outputs.”

Library Design, Novelty and Discovery Risk

Library quality, Dr. Zhang said, also influences early discovery decisions and overall program risk.

“Over the years, we’ve received lots of feedback from our discovery partners that our DEL hits bind to cryptic pockets, identify allosteric binding sites or induce unprecedented conformational changes of the protein,” she said.

She hypothesized that this reflects an alignment of chemical and biological diversity.

“A truly diverse chemical library should be able to engage more targets and do so in novel ways,” Dr. Zhang said.

She cited a 2024 perspective article featuring industry opinion leaders that examined library quality and chemical diversity from a structural biology perspective, linking these factors to discovery outcomes such as hit identification and desired modes of action.

How to Evaluate DEL Partners

Asked what kinds of questions companies should pose when assessing DEL partners, Dr. Zhang suggested metrics beyond size.

“You should ask how diversity is defined, how it is measured. This should give a glimpse of whether the library is designed with the downstream work in mind,” said Dr. Zhang, adding that companies should also check how often DEL results lead to follow-up programs.
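
As one example of what a measurable definition might look like, the sketch below computes the mean pairwise Tanimoto distance over a set of molecular fingerprints, a widely used cheminformatics diversity score. It is not presented as X-Chem's method. The feature sets are hypothetical stand-ins for real fingerprints, and a 2D comparison of this kind does not capture the vector geometry or three-dimensional shape Dr. Zhang describes.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint features."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_distance(fingerprints):
    """Average of (1 - Tanimoto similarity) over all pairs; higher means more diverse."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - tanimoto(a, b) for a, b in pairs) / len(pairs)


# Hypothetical feature sets standing in for real molecular fingerprints.
similar_set = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 4, 5}]
varied_set = [{1, 2, 3, 4}, {5, 6, 7, 8}, {1, 9, 10, 11}]

print(mean_pairwise_distance(similar_set))
print(mean_pairwise_distance(varied_set))
```

The closely related set scores about 0.4 and the varied set about 0.95. A partner who can report a score like this, and explain which molecular features feed into it, is giving a concrete answer to the question of how diversity is measured.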

“Look at the compounds they publish,” she said, noting that published compounds can often reflect library novelty, design philosophy and physicochemical property profiles.

Dr. Zhang added that analyses of X-Chem publications since 2020 show that the company’s reported compounds have average molecular weights just above 400 daltons and contain fewer than one amide bond on average, a profile associated with desirable physicochemical properties. According to her, X-Chem’s collaboration partners have been able to advance DEL-derived hits into clinical candidates, with the most advanced reaching Phase III.

“These questions focus on whether the data supports confident decision-making,” she said. “It all boils down to how they support the desired outcome of your discovery programs.”

As the conversation wrapped up, Dr. Zhang returned to a central theme.

“The quality of the DNA-encoded libraries, such as chemical diversity and compound properties, really impacts the outcome of small-molecule drug discovery.”


This article was created in collaboration with the sponsoring company and the Xtalks editorial team.