successes are now common in the literature, docking-based virtual
screens continue to suffer from high false-positive rates, high
false-negative rates, and incorrect assignment of docked poses [1].
The primary sources of these difficulties include inadequacies in
energetic scoring functions and uncertainties in protein coordinates due
to experimental resolution, protein motion, or errors introduced when
building homology models. Why, then, are docking-based virtual screens used at all?
While it is true that the highest-scoring compound is not always active,
actives are often found among the highest few percent of predicted
actives. And while it is true that the highest-scoring ligand pose is
not always correct, the correct pose is often found among the high-scoring
poses. How, then, can one pick out active compounds and correct poses from
those generated in a docking-based virtual screen? The approach described
here is to retain multiple conformations for all compounds docked in a
virtual screen and then to apply data-mining techniques to enrich the
selection of actives and of correctly docked poses. Since a virtual
screen docks many compounds to a single protein, the logical point of
view for data mining is that of the fixed protein, not that of the
ligand. We therefore start by generating a compact description
called an "interaction footprint" for every retained pose of every docked
compound. For each atom in the protein active site, the footprint records
the interaction of the ligand with that atom.
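To make the idea of a fixed-size footprint concrete, the sketch below computes one under a clearly hypothetical interaction term: for each active-site atom it counts the ligand atoms within a 4.0 Å cutoff. The source does not specify the interaction term; per-atom van der Waals or electrostatic energies would be natural alternatives. The names `interaction_footprint` and `footprint_similarity`, the cutoff, the toy coordinates, and the choice of cosine similarity are all illustrative assumptions.

```python
import math

# Hypothetical interaction term: count ligand atoms within CUTOFF (angstroms)
# of each active-site atom. Per-atom interaction energies could be used instead.
CUTOFF = 4.0

def interaction_footprint(site_atoms, ligand_atoms, cutoff=CUTOFF):
    """Return one value per active-site atom for a single docked pose.

    site_atoms, ligand_atoms: sequences of (x, y, z) coordinates.
    The result always has len(site_atoms) entries, whatever the ligand size.
    """
    return [
        float(sum(1 for q in ligand_atoms if math.dist(p, q) <= cutoff))
        for p in site_atoms
    ]

def footprint_similarity(a, b):
    """Cosine similarity of two footprints from the same active site.

    Returns 1.0 when the patterns are proportional, and 0.0 when the poses
    touch disjoint sets of active-site atoms (or a footprint is all zeros).
    """
    if len(a) != len(b):
        raise ValueError("footprints must describe the same active site")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Toy active site of three atoms spaced 10 angstroms apart (made-up coordinates).
site = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
small = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]               # 2 atoms near site atom 0
large = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0),
         (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]             # 5 atoms near site atom 0
fp_small = interaction_footprint(site, small)    # [2.0, 0.0, 0.0]
fp_large = interaction_footprint(site, large)    # [5.0, 0.0, 0.0]
print(footprint_similarity(fp_small, fp_large))  # 1.0
```

Because the vector has one entry per active-site atom, a two-atom and a five-atom ligand docked into the same site yield footprints of the same length. Cosine similarity ignores overall magnitude, so a larger ligand making more contacts with the same atoms still matches; a Euclidean distance would instead penalize that difference.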
Because interaction footprints are of fixed size, they are easily compared
and can be used as input to data-mining operations. For example, when
crystallized ligand-protein complexes are available for high-affinity
ligands, footprints can be generated to describe their observed
modes of binding. These footprints can then be used as a filter
to identify compounds from a virtual screening experiment that exhibit a
similar mode of binding. Compounds identified in this way have a high
likelihood of binding, yet they are not restricted to the same chemical
class as the compounds used to generate the footprint filter. When
actives that have not been co-crystallized are available from the
medicinal chemistry literature, they can be docked and their footprints
clustered to identify common features, or "binding motifs". The resulting
binding motifs can then be imposed as filters to select compounds from
the screen. The derivation of binding motifs can be further sharpened
by contrasting footprints of actives and inactives using learning
methods such as recursive partitioning and/or kernel-based methods.
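The source names neither a clustering algorithm nor a similarity measure, so the sketch below swaps in simple choices: cosine similarity between footprints and greedy leader clustering, with a binding motif taken as the mean footprint of a cluster. The same `footprint_filter` applies unchanged whether the references are footprints of crystallized complexes or motifs derived from docked literature actives. All function names, thresholds, and data here are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length footprints."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def leader_cluster(footprints, threshold=0.8):
    """Greedy leader clustering: a footprint joins the first cluster whose
    leader (first member) it resembles at >= threshold, else starts a new one."""
    clusters = []
    for fp in footprints:
        for members in clusters:
            if cosine(fp, members[0]) >= threshold:
                members.append(fp)
                break
        else:
            clusters.append([fp])
    return clusters

def binding_motifs(active_footprints, threshold=0.8, min_members=2):
    """Average each cluster with at least min_members footprints into a motif."""
    return [
        [sum(column) / len(members) for column in zip(*members)]
        for members in leader_cluster(active_footprints, threshold)
        if len(members) >= min_members
    ]

def footprint_filter(screen, references, min_similarity=0.8):
    """Keep IDs of screened poses whose footprint resembles any reference
    footprint (from a crystal structure or a derived binding motif)."""
    return [
        cid for cid, fp in screen.items()
        if any(cosine(fp, ref) >= min_similarity for ref in references)
    ]

# Made-up 4-atom footprints of docked literature actives.
actives = [[1, 1, 0, 0], [1, 0.9, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0.9], [0, 1, 1, 0]]
motifs = binding_motifs(actives)  # two motifs; the lone [0, 1, 1, 0] is dropped
screen = {"cpd_A": [2, 2, 0, 0], "cpd_B": [0, 0, 3, 3], "cpd_C": [1, 0, 0, 1]}
print(footprint_filter(screen, motifs))  # ['cpd_A', 'cpd_B']
```

Leader clustering depends on input order and is used here only because it fits in a few lines; hierarchical or k-means clustering would be more robust choices for real data.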
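Recursive partitioning builds a decision tree by repeatedly splitting the training set on whichever footprint entry best separates actives from inactives. The minimal CART-style sketch below uses Gini impurity and assumes labels of 1 (active) and 0 (inactive); the function names, depth limit, and toy data are hypothetical, and kernel-based methods are not shown. In practice a library implementation such as scikit-learn's `DecisionTreeClassifier` would normally be used.

```python
def gini(labels):
    """Gini impurity of 0/1 labels (1 = active, 0 = inactive)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(X, y):
    """Return (weighted impurity, feature index, threshold) of the best split,
    or None if no split leaves both sides non-empty."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best

def grow_tree(X, y, depth=0, max_depth=3, min_size=2):
    """Recursively partition footprints X (lists of numbers) with labels y.
    Leaves store the fraction of actives among their training examples."""
    leaf = {"p_active": sum(y) / len(y)}
    if depth >= max_depth or len(y) < min_size or gini(y) == 0.0:
        return leaf
    split = best_split(X, y)
    if split is None or split[0] >= gini(y):
        return leaf  # no split reduces impurity
    _, j, t = split
    go_left = [row[j] <= t for row in X]
    node = {"feature": j, "threshold": t}
    for side, keep in (("left", True), ("right", False)):
        Xs = [row for row, g in zip(X, go_left) if g == keep]
        ys = [lab for lab, g in zip(y, go_left) if g == keep]
        node[side] = grow_tree(Xs, ys, depth + 1, max_depth, min_size)
    return node

def predict(tree, footprint):
    """Walk from the root to a leaf and return that leaf's active fraction."""
    while "feature" in tree:
        side = "left" if footprint[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["p_active"]

# Made-up 3-atom footprints: actives contact site atom 0 strongly, inactives do not.
X = [[3, 0, 1], [4, 0, 0], [5, 1, 1], [0, 2, 1], [1, 3, 0], [0, 4, 1]]
y = [1, 1, 1, 0, 0, 0]
tree = grow_tree(X, y)
print(tree["feature"], tree["threshold"])  # 0 1
print(predict(tree, [4, 0, 0]), predict(tree, [0, 5, 0]))  # 1.0 0.0
```

Each internal node names an active-site atom (a footprint index) and an interaction threshold, so the path to a leaf with a high `p_active` reads directly as a sharpened binding motif.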
Applying these techniques significantly improves the results of a
docking-based virtual screen, lowering both false positives and false
negatives and increasing the detection of correctly docked poses.
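Improvements of this kind, like the earlier observation that actives tend to appear near the top of the ranking, are commonly quantified with an enrichment factor: the hit rate within the top fraction of the ranked list divided by the hit rate of the whole library. The function below is a minimal sketch of that standard metric, not a procedure taken from this work; it assumes higher scores are better and that actives are known, and its name and test data are hypothetical.

```python
def enrichment_factor(scores, is_active, top_fraction=0.02):
    """Enrichment factor of a ranked screen at the top `top_fraction`.

    scores: one score per compound, assumed higher = better.
    is_active: booleans of the same length marking known actives.
    EF = (actives in top slice / size of slice) / (all actives / all compounds).
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_total = len(scores)
    n_actives = sum(is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")
    # Size of the top slice, rounded to whole compounds and at least one.
    n_top = max(1, round(top_fraction * n_total))
    ranked = sorted(range(n_total), key=lambda i: scores[i], reverse=True)
    hits = sum(1 for i in ranked[:n_top] if is_active[i])
    # Same ratio as the docstring formula, rearranged to stay in integers.
    return (hits * n_total) / (n_top * n_actives)

# 100 made-up compounds scored 0..99; 5 known actives.
scores = list(range(100))
is_active = [i in {99, 95, 90, 50, 10} for i in range(100)]
print(enrichment_factor(scores, is_active, top_fraction=0.10))  # 6.0
```

An enrichment factor of 1 corresponds to random selection; comparing values for rankings produced with and without footprint-based filtering is one way to check a claimed reduction in false positives.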
Interaction-footprint filters and binding motifs can be thought of as
hypotheses; imposing them improves the detection of actives and correct
poses even though the underlying energetic scoring is imperfect.
Hypotheses can be imposed after docking, as described here, or applied as
constraints directly during docking, as described in the section on
'hypothesis-driven docking'.
[1] P. Kirkpatrick, "Docking on Trial," Nature Reviews Drug Discovery 4: 813 (2005).