BioPredict, Inc.             


Pharmacophore Match for Docking-Based Virtual Screening

A key technology developed at BioPredict is a computational environment for virtual screening and for ab-initio structure-based ligand design. Its core is a novel probabilistic computational engine based on Markov Random Fields (MRFs).

Many problems in computational chemistry and biology can be reduced to finding a match of chemical features in three dimensions. Problems that can be tackled in this way include three-dimensional superposition of small molecules and of proteins, identification of pharmacophores in compounds sharing biological activity, screening of compound libraries to identify compounds exhibiting specific pharmacophores, and docking of ligands into the active site of a protein. Here we apply the MRF engine to the last of these problems, ligand docking.

Ligand docking is performed as a graph match between an abstracted description of the ligand fragments or ligand conformations and an abstracted description of the active site. The abstracted descriptions are graphs whose nodes are chemical entities (hydrogen-bond acceptors and donors, hydrophobic centers, formal charges, etc.) and whose edges carry the associated distance constraints. The weighted graph-matching problem is expressed as an MRF model whose solution minimizes the associated free energy function. Fast, convergent message-passing schemes, namely Belief Propagation (BP) and an approximation to Mean Field (MF) for product graphs, are used to solve the MRF. The resulting solution is a maximum a posteriori (MAP) probability distribution that statistically describes the optimal (in the MAP sense) placements of the fragment in the active site. Individual low-energy placements of the fragment are obtained by marginalizing the MAP distribution.

MRFs can be combined, allowing simultaneous probabilistic docking into multiple models (ensembles) of protein active sites in order to represent protein families and/or to account for protein flexibility. Ligand poses obtained by marginalizing such multi-model probability distributions are positioned in a newly defined combined protein model that mixes features of the constituent models.

An MRF can also incorporate an extended set of chemical substructures for matching at its nodes, as well as sets of probabilistic beliefs expressed as prior distributions. These priors can be used to bias matches according to known actives (focused docking). The method used for ligand docking also extends to the computationally related application of ab-initio ligand design.
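The graph-matching formulation can be illustrated with a toy example. The sketch below is not the BioPredict engine: it is a plain mean-field update on a tiny hand-made problem, and the function names, feature labels, coordinates, and the `type_penalty` and `dist_weight` parameters are all assumptions chosen for illustration. Each ligand feature holds a probability distribution over site features; the energy penalizes mismatched chemical types and disagreement between ligand and site distances.

```python
import math

def distance_matrix(points):
    """All pairwise Euclidean distances between 3D points."""
    return [[math.dist(p, q) for q in points] for p in points]

def mean_field_match(lig_types, lig_dist, site_types, site_dist,
                     type_penalty=5.0, dist_weight=1.0, n_sweeps=20):
    """Naive mean-field solution of a feature-matching MRF (illustrative only).

    q[i][a] approximates the probability that ligand feature i sits on
    site feature a.
    """
    n, m = len(lig_types), len(site_types)
    q = [[1.0 / m] * m for _ in range(n)]  # start from uniform beliefs
    for _ in range(n_sweeps):
        for i in range(n):
            field = []
            for a in range(m):
                # Node term: chemical types must agree.
                e = 0.0 if lig_types[i] == site_types[a] else type_penalty
                # Edge terms: expected squared distance mismatch under the
                # current beliefs of the other ligand features.
                for j in range(n):
                    if j == i:
                        continue
                    for b in range(m):
                        mismatch = lig_dist[i][j] - site_dist[a][b]
                        e += q[j][b] * dist_weight * mismatch ** 2
                field.append(e)
            lowest = min(field)  # subtract the minimum for numerical stability
            weights = [math.exp(-(e - lowest)) for e in field]
            total = sum(weights)
            q[i] = [w / total for w in weights]
    return q

# Hypothetical three-feature fragment and four-feature active site.
ligand_types = ["donor", "acceptor", "hydrophobe"]
ligand_xyz = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
site_types = ["donor", "acceptor", "hydrophobe", "acceptor"]
site_xyz = [(10.0, 0.0, 0.0), (13.0, 0.0, 0.0), (10.0, 4.0, 0.0), (10.0, 0.0, 6.0)]

beliefs = mean_field_match(ligand_types, distance_matrix(ligand_xyz),
                           site_types, distance_matrix(site_xyz))
# Read one placement off the per-feature marginals.
placement = [row.index(max(row)) for row in beliefs]
print(placement)
```

In this toy case the second acceptor in the site is a decoy: its chemical type matches, but its distances to the other site features do not fit the fragment, so the marginals concentrate on the geometrically consistent acceptor.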

A key advantage of MRF probability distributions is that they can be combined. When applied to docking-based virtual screening, these combined MRFs can be used to describe an ensemble of alternative conformations of a protein, providing an efficient and compact means to account for protein flexibility. Ligand poses obtained by marginalizing the ensemble MRF are optimized against an internally defined probabilistic protein model that mixes features of the original constituent protein models. The use of ensemble MRFs offers an attractive alternative to docking into the potentially enormous number of individual models that the combined features represent. Another application of ensemble MRFs is docking into multiple active sites within protein families, a problem that arises in the development of family-focused combinatorial libraries and in identifying compounds with selectivity for a particular family member.
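The text does not specify how the MRFs are combined. One common choice, assumed here purely for illustration, is log-linear pooling: averaging the energy functions of the individual models, which is equivalent to taking a normalized weighted product of their distributions. The function name and the example numbers below are hypothetical.

```python
import math

def pool_log_linear(distributions, weights=None):
    """Combine discrete distributions by a normalized weighted product.

    Equivalent to averaging the underlying energies (negative log
    probabilities). This is an assumed combination rule, not one
    confirmed by the text.
    """
    k = len(distributions)
    weights = weights or [1.0 / k] * k
    m = len(distributions[0])
    log_p = [
        sum(w * math.log(max(d[a], 1e-12)) for d, w in zip(distributions, weights))
        for a in range(m)
    ]
    top = max(log_p)  # subtract the maximum for numerical stability
    unnorm = [math.exp(v - top) for v in log_p]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Hypothetical beliefs for one ligand feature over three site positions,
# computed against two conformations of the same protein.
conformation_a = [0.70, 0.20, 0.10]
conformation_b = [0.40, 0.50, 0.10]
pooled = pool_log_linear([conformation_a, conformation_b])
print([round(p, 3) for p in pooled])
```

Under this rule a position scores well in the pooled distribution only if it is reasonably supported by every constituent model, which is one way to read "a model that mixes features" of the ensemble members.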

An outstanding problem with current docking-based virtual screens is the quality of the scoring functions used to rank docked compounds. While scoring functions continue to improve, we believe that for some time they will remain unable to rank compounds and compound conformations well enough to identify active molecules without excessive false positives. An alternative to using energetic criteria alone is to incorporate additional knowledge of the binding modes of known actives, to the target or to related targets, directly into the docking paradigm. This knowledge can be expressed directly in the MRF as Bayesian priors, augmenting the normal steric, geometrical, graphical, and chemical criteria used to define matches of ligands to the active site. While this approach does not directly improve scoring functions, it can significantly enrich the number of actives identified in a virtual screen, which is our ultimate goal.
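The effect of such a prior can be sketched with Bayes' rule on a single ligand feature. In the hypothetical example below, the energies, the prior values, and the function name are all assumptions: the posterior over candidate site positions is taken to be proportional to the prior times a Boltzmann-like factor of the match energy.

```python
import math

def posterior_with_prior(energies, prior):
    """Posterior over placements: prior[a] * exp(-energy[a]), normalized."""
    unnorm = [p * math.exp(-e) for e, p in zip(energies, prior)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Hypothetical match energies of one ligand feature at three site positions.
energies = [1.0, 1.2, 3.0]

# Without prior knowledge every position is equally likely a priori.
flat_prior = [1.0 / 3.0] * 3
# Assumed prior: known actives mostly place this feature at position 1.
active_prior = [0.1, 0.8, 0.1]

unbiased = posterior_with_prior(energies, flat_prior)
focused = posterior_with_prior(energies, active_prior)

best_unbiased = unbiased.index(max(unbiased))
best_focused = focused.index(max(focused))
print(best_unbiased, best_focused)
```

Here the energies alone slightly favor position 0, while the prior derived from known actives shifts the preferred placement to position 1 without any change to the energy terms themselves.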

Fragment-based approaches to virtual screening have recently become more popular, especially in the early stages of the drug discovery process. Our method is ideally suited to this approach, in which small molecules called fragments are first docked and then linked together using a library of linkers to suggest novel high-affinity ligands. The approach significantly extends the scope of virtual screening, since the ligands that can potentially be suggested represent a very large virtual library. We have assembled a database of purchasable fragments and linkers that, while small enough to be easily manageable, represents roughly a billion virtual compounds.
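The combinatorial growth behind this figure is easy to see with a short calculation. The counts below are hypothetical, since the actual sizes of the fragment and linker databases are not stated here, and the sketch assumes each virtual compound is two fragments joined by one symmetric linker.

```python
def virtual_library_size(n_fragments, n_linkers):
    """Count fragment-linker-fragment compounds.

    Assumes symmetric linkers, so the two fragments form an unordered
    pair (a fragment may also be paired with itself).
    """
    unordered_pairs = n_fragments * (n_fragments + 1) // 2
    return unordered_pairs * n_linkers

# Hypothetical database sizes, chosen only to show the scale.
size = virtual_library_size(n_fragments=10_000, n_linkers=20)
print(f"{size:,}")
```

With these assumed counts, about ten thousand stored fragments and twenty linkers already span a virtual library on the order of a billion compounds.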

The MRF approach to all of the problems given in the introduction is new. BioPredict actively seeks collaborators with whom to explore specific applications.