BioPredict, Inc.             


Datamining Results of Virtual Screens

Although successes are now common in the literature, docking-based virtual screens continue to suffer from high false-positive rates, high false-negative rates, and incorrect assignment of docked poses [1]. The primary sources of these difficulties include inadequacies in energetic scoring functions and uncertainties in protein coordinates arising from limited experimental resolution, protein motion, or errors introduced when building homology models. Why, then, are docking-based virtual screens used? Although the highest-scoring compound is not always active, actives are often found among the top-scoring few percent of compounds. And although the highest-scoring ligand pose is not always correct, the correct pose is often found among the high-scoring poses.

How can one pick out the active compounds and correct poses from those generated in a docking-based virtual screen? The approach described here is to retain multiple docked conformations (poses) for every compound in a virtual screen and then to apply data-mining techniques to enrich the selection of actives and of correctly docked poses. Since a virtual screen docks many compounds to a single protein, the logical point of view for data mining is the fixed point of view of the protein, not the ligand. We therefore start by generating a compact description called an "interaction footprint" for every retained pose of every docked compound. For each atom in the protein active site, the interaction footprint describes the interaction of the ligand with that atom.
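To make the idea of a fixed-size, protein-centered descriptor concrete, here is a minimal sketch in Python. The text does not say how the per-atom interaction is quantified, so this example assumes a simple contact count: for each active-site atom, the number of ligand atoms within a distance cutoff. The function name `interaction_footprint`, the 4.0 angstrom cutoff, and the toy coordinates are illustrative assumptions, not the method used in the work described here.

```python
import numpy as np

def interaction_footprint(site_xyz, ligand_xyz, cutoff=4.0):
    """Return one value per active-site atom.

    ASSUMPTION: the "interaction" is approximated as the number of ligand
    atoms within `cutoff` angstroms of each site atom. A real footprint
    could instead store per-atom energy terms from a scoring function.
    """
    site = np.asarray(site_xyz, dtype=float)     # shape (n_site, 3)
    lig = np.asarray(ligand_xyz, dtype=float)    # shape (n_ligand, 3)
    # Pairwise distances via broadcasting, shape (n_site, n_ligand)
    dist = np.linalg.norm(site[:, None, :] - lig[None, :, :], axis=-1)
    # Count ligand atoms in contact with each site atom
    return (dist <= cutoff).sum(axis=1).astype(float)

# Toy data: three active-site atoms and one three-atom ligand pose
site = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 10.0, 0.0]]
pose = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [9.0, 0.0, 0.0]]
footprint = interaction_footprint(site, pose)
print(footprint.tolist())  # [2.0, 1.0, 0.0]
```

Because the vector length depends only on the number of active-site atoms, every pose of every compound yields a footprint of the same size, whatever the size or chemistry of the ligand.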

Since interaction footprints are of a fixed size, they are easily compared and can be used as input to data-mining operations. For example, when crystallized ligand-protein complexes are available for high-affinity ligands, footprints can be generated to describe their observed modes of binding. These footprints can then be used as a filter to identify compounds from a virtual screening experiment that exhibit a similar mode of binding. Compounds identified in this way have a high likelihood of binding, yet they are not restricted to the same chemical class as the compounds used to generate the footprint filter. When active compounds are known from the medicinal chemistry literature but have not been co-crystallized, they can be docked and their footprints clustered to identify common features, or "binding motifs". The resulting binding motifs can then be imposed as filters to select compounds from the screen. The derivation of binding motifs can be further sharpened by contrasting the footprints of actives and inactives using learning methods such as recursive partitioning and/or kernel-based methods.
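The footprint-filter step can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used here: the text does not name a similarity measure, so cosine similarity is assumed, and the function names, the 0.8 threshold, the compound identifiers, and the toy footprints are all hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity of two footprints; 0.0 if either is all zeros."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def filter_by_footprint(pose_footprints, reference, threshold=0.8):
    """Keep compounds with at least one pose resembling the reference.

    pose_footprints: dict of compound id -> list of footprints, one per
                     retained pose (all the same length as `reference`).
    Returns a dict of compound id -> (best pose index, similarity).
    ASSUMPTION: similarity measure and threshold are illustrative choices.
    """
    hits = {}
    for compound_id, footprints in pose_footprints.items():
        if not footprints:
            continue  # no retained poses for this compound
        sims = [cosine_similarity(fp, reference) for fp in footprints]
        best = int(np.argmax(sims))
        if sims[best] >= threshold:
            hits[compound_id] = (best, sims[best])
    return hits

# Reference footprint, e.g. from a crystallized high-affinity ligand
reference = [2.0, 1.0, 0.0]
screen = {
    "cmpd_A": [[0.0, 0.0, 3.0], [2.0, 1.0, 0.0]],  # second pose matches
    "cmpd_B": [[0.0, 1.0, 2.0]],                   # different binding mode
}
hits = filter_by_footprint(screen, reference)
print(sorted(hits))  # ['cmpd_A']
```

Retaining several poses per compound matters in this sketch: `cmpd_A` passes the filter only because its second pose, not its first, reproduces the reference binding mode. The same footprint vectors could also be passed to a clustering or classification routine to derive binding motifs.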

Application of these techniques significantly improves results from a docking-based virtual screen by reducing false positives and false negatives and by increasing the detection of correctly docked poses. Interaction footprint filters and binding motifs can be thought of as hypotheses whose imposition enhances detection when the underlying energetic methods are imperfect. Hypotheses can be imposed post-docking, as described here, or incorporated as constraints applied directly during docking, as described in the section on "hypothesis-driven docking".



[1] Peter Kirkpatrick, "Docking on Trial," Nature Reviews Drug Discovery 4: 813 (2005).