DNA Barcodes in Drug Discovery Screens May Be Missing Promising Treatments

New research suggests that drug discovery programs relying heavily on DNA-encoded chemical libraries may be missing many viable drug candidates. These libraries tag each molecule with a unique DNA sequence, much like a barcode, which lets researchers screen vast numbers of compounds at once. The resulting large datasets are often used to train machine learning models that search for potential drug candidates.

A team led by Raphael Franzini of the University of Utah set out to assess how reliable the data from DNA-encoded chemical libraries are. The researchers analyzed a library of more than 58,000 compounds screened against enzymes involved in DNA repair and cancer. After synthesizing and testing 33 molecules that the screens had passed over, they found that many were just as effective as the compounds the screens had flagged as promising. Strikingly, several screens nearly overlooked compounds structurally similar to the approved cancer drug olaparib.

“We discovered that DNA-encoded library data frequently categorizes effective molecules as ineffective ones,” says Franzini. The problem appears to stem from the DNA barcodes themselves: molecules tested with the tags attached showed reduced activity, particularly against targets they were not originally designed to bind.

Laura Guasch, a computational chemist at Roche in Switzerland, calls the work a “highly relevant contribution” that highlights how false negatives can contaminate datasets and undermine the machine learning models used in the field. “False negatives introduce considerable noise and bias into training datasets, causing machine learning models to detect misleading patterns or disregard valid chemotypes,” says Srinivas Chamakuri, an assistant professor at Baylor College of Medicine.

Franzini’s team also showed that machine learning models that appeared to perform well were simply recognizing recurring structural patterns rather than showing genuine predictive power. “A key implication of this study is the notable risk that existing drug discovery initiatives might be overlooking potential drug candidates because of high rates of false negatives,” says Guasch.

The researchers found that removing unreliable data from training sets and focusing on verified active compounds substantially improved the models’ ability to identify promising drug candidates. This suggests that machine learning approaches to drug discovery may need significant changes to account for the biases inherent in screening data.