BioPredict, Inc.             


Our Internal Focus: The GHKL Superfamily of Proteins

Current early-stage drug discovery is driven by high-throughput screening of compound libraries, aided by computational analysis of a target and/or lead compounds. BioPredict, Inc. has developed a suite of computational methods to support this process (see Technologies) that:

  • analyzes and integrates the available information on protein families or superfamilies;
  • uses this information to design a focused virtual screen of purchasable compounds for testing;
  • docks and performs energy minimization of small molecules in protein active sites and builds molecular fragments into larger molecules;
  • analyzes hits resulting from these focused screens, and
  • generates follow-on hit-to-lead focused screens and combinatorial chemistry lead-optimization libraries.
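The last two steps in the list above (analyzing hits and generating follow-on screens) can be illustrated with a minimal "analog-by-catalog" sketch in Python. It is not BioPredict's method: it assumes each compound is represented as a set of substructure-bit indices compared by Tanimoto similarity, a common but swapped-in choice, and the function name `follow_on_screen`, its `min_similarity` cutoff, and the toy data are hypothetical.

```python
# Hypothetical sketch of a follow-on ("analog-by-catalog") screen: pick untested,
# purchasable compounds that are structurally similar to confirmed hits.
# Fingerprints are modeled as sets of "on" bit indices; compound IDs, bit values
# and the 0.6 similarity cutoff are illustrative assumptions.
from typing import Dict, FrozenSet, List, Set, Tuple

Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two bit sets: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def follow_on_screen(
    hits: Dict[str, Fingerprint],
    catalog: Dict[str, Fingerprint],
    tested: Set[str],
    min_similarity: float = 0.6,
    max_compounds: int = 100,
) -> List[str]:
    """Return untested catalog IDs whose best similarity to any hit meets the
    cutoff, most similar first (ties broken by ID for reproducibility)."""
    scored: List[Tuple[float, str]] = []
    for cid, fp in catalog.items():
        if cid in tested:
            continue  # already screened; no need to purchase it again
        best = max((tanimoto(fp, hit_fp) for hit_fp in hits.values()), default=0.0)
        if best >= min_similarity:
            scored.append((best, cid))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [cid for _, cid in scored[:max_compounds]]


# Toy example data (hypothetical).
HITS = {"hit1": frozenset({1, 2, 3, 4, 5})}
CATALOG = {
    "hit1": frozenset({1, 2, 3, 4, 5}),     # the hit itself, already tested
    "cat1": frozenset({1, 2, 3, 4, 6}),     # similarity 4/6
    "cat2": frozenset({1, 2, 3, 4, 5, 7}),  # similarity 5/6
    "cat3": frozenset({8, 9, 10}),          # unrelated
}

if __name__ == "__main__":
    print(follow_on_screen(HITS, CATALOG, tested={"hit1"}))
```

Ranking by best similarity to any hit keeps the purchase list focused on close analogs; raising `min_similarity` trades breadth of the follow-on screen for closeness to the original hits.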

Our approach makes use not only of available information on a specific target of interest, but also of information on related targets within the same protein family. We have honed this approach through collaborations with biotech and pharmaceutical companies to identify lead compounds, increase their potency, increase their specificity within target families, and design family-focused combinatorial libraries. Targets have included nuclear receptors, protein kinases and phosphatases, nucleotide synthases, and serine and other proteases, as well as other, more novel targets.

An advantage of our family-based, multi-target approach is that, at the first stage of hit elucidation and structural hypothesis verification, we screen for ligands that bind to a structural motif conserved across the protein family. We complement this with an approach that detects activity against multiple members of the family, which helps validate our structural binding-mode hypotheses.
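As a rough illustration of checking activity against multiple family members, the Python sketch below counts, for each compound, how many targets it inhibits at or below an assumed IC50 cutoff. The data layout, the 10 µM threshold, the target names and the helper names are hypothetical. A compound active against several members is only consistent with, not proof of, binding at the shared motif.

```python
# Hypothetical sketch: count how many members of a protein family each
# compound is active against. Target names, compound IDs, IC50 values and
# the 10 uM activity cutoff are illustrative assumptions.
from typing import Dict

ACTIVE_CUTOFF_UM = 10.0  # assumed potency threshold for calling a compound "active"


def family_activity_counts(
    ic50_um: Dict[str, Dict[str, float]], cutoff_um: float = ACTIVE_CUTOFF_UM
) -> Dict[str, int]:
    """Map each compound to the number of family members with IC50 <= cutoff."""
    return {
        compound: sum(1 for value in by_target.values() if value <= cutoff_um)
        for compound, by_target in ic50_um.items()
    }


def consistent_with_shared_mode(
    counts: Dict[str, int], min_targets: int = 2
) -> Dict[str, bool]:
    """Flag compounds active against at least `min_targets` family members.

    Such cross-family activity is consistent with (but does not prove)
    binding to the conserved motif the members share.
    """
    return {compound: n >= min_targets for compound, n in counts.items()}


# Toy IC50 matrix in micromolar (hypothetical values).
IC50_UM = {
    "cpd1": {"TargetA": 2.5, "TargetB": 8.0, "TargetC": 50.0},
    "cpd2": {"TargetA": 30.0, "TargetB": 100.0, "TargetC": 75.0},
    "cpd3": {"TargetA": 0.9, "TargetB": 1.2, "TargetC": 4.0},
}

if __name__ == "__main__":
    counts = family_activity_counts(IC50_UM)
    print(counts)
    print(consistent_with_shared_mode(counts))
```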

Advantages of our approach over other structure-based virtual screening methods

Many scoring functions used to dock compounds and evaluate docked poses are still imperfect and will probably remain so for some time. As a result, the highest-scoring compound is not always active, and the highest-scoring conformation of an active compound does not always match its actual binding mode. Our information-driven approach circumvents this problem by augmenting energy scores with hypothesis-driven methods that select compounds whose predicted binding modes resemble those already established, either against the specific target or against homologous targets. These methods rest on the premise that compounds that bind like known actives are likely to be active themselves. In our experience, applying these methods yields roughly a ten-fold enrichment in identified actives compared with selecting compounds by docking score alone. Even with this enrichment, we generally include up to ~1000 compounds in our first screen, selected using multiple hypotheses, to increase the number of hits obtained.

A further improvement is iterative screening. Once hits are found, testing purchasable compounds related to them is an attractive, efficient, and inexpensive way to expand the number of hits considered within a given lead series before committing to chemical synthesis.

A final and perhaps defining advantage of our methods is that they work with multiple related targets at the same time. Careful analysis of active-site similarities across multiple targets enables the identification of common focused libraries for testing. If the active sites are sufficiently similar, the same methods can lead to multiple-target inhibitors. When targeting infectious disease organisms, a compound that hits more than one target in a sensitive pathway has a higher likelihood of avoiding drug resistance, since resistance would then require mutations in more than one target (we are pursuing this approach in kinases with a client company).
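The hypothesis-driven selection described above can be sketched as a filter on binding-mode similarity followed by ranking on docking score, with the enrichment factor, EF = (actives found / compounds selected) / (all actives / library size), used to compare it against score-only selection. This is a generic illustration under stated assumptions, not BioPredict's method: binding modes are modeled as simple interaction fingerprints (sets of residue/interaction-type pairs), and all names, scores, cutoffs and data are hypothetical.

```python
# Hypothetical sketch of binding-mode-aware selection compared with score-only
# selection, plus the enrichment factor used to compare them. Contacts are
# modeled as (residue, interaction type) pairs, i.e. a simple interaction
# fingerprint; this representation, the residue labels, scores, the 0.5
# similarity cutoff and the toy data are all assumptions for illustration.
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Sequence, Set, Tuple

Contact = Tuple[str, str]  # (residue label, interaction type)


@dataclass(frozen=True)
class DockedPose:
    compound_id: str
    docking_score: float  # assumed convention: more negative = more favorable
    contacts: FrozenSet[Contact]


def tanimoto(a: FrozenSet[Contact], b: FrozenSet[Contact]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_by_score(poses: Sequence[DockedPose], top_n: int) -> List[str]:
    """Baseline: take the top_n best-scoring compounds."""
    ranked = sorted(poses, key=lambda p: p.docking_score)
    return [p.compound_id for p in ranked[:top_n]]


def select_by_binding_mode(
    poses: Sequence[DockedPose],
    reference_modes: Iterable[FrozenSet[Contact]],
    top_n: int,
    min_similarity: float = 0.5,
) -> List[str]:
    """Keep poses resembling any known binding mode, then rank them by score."""
    refs = list(reference_modes)
    kept = [
        p for p in poses
        if max((tanimoto(p.contacts, r) for r in refs), default=0.0) >= min_similarity
    ]
    kept.sort(key=lambda p: p.docking_score)
    return [p.compound_id for p in kept[:top_n]]


def enrichment_factor(selected: Sequence[str], actives: Set[str], library_size: int) -> float:
    """EF = (actives found / compounds selected) / (all actives / library size)."""
    if not selected or not actives:
        return 0.0
    found = sum(1 for c in selected if c in actives)
    return (found / len(selected)) / (len(actives) / library_size)


# Toy data (hypothetical): one reference binding mode from a known active.
REFERENCE_MODES = [frozenset({("Res1", "hbond"), ("Res2", "hydrophobic"), ("Res3", "hbond")})]
POSES = [
    DockedPose("c1", -9.5, frozenset({("Res4", "hydrophobic"), ("Res5", "hbond")})),
    DockedPose("c2", -8.0, frozenset({("Res1", "hbond"), ("Res2", "hydrophobic"), ("Res3", "hbond")})),
    DockedPose("c3", -7.5, frozenset({("Res1", "hbond"), ("Res2", "hydrophobic")})),
    DockedPose("c4", -9.0, frozenset({("Res6", "hbond")})),
]
KNOWN_ACTIVES = {"c2", "c3"}  # pretend assay results

if __name__ == "__main__":
    for name, picks in [
        ("score only", select_by_score(POSES, top_n=2)),
        ("binding mode", select_by_binding_mode(POSES, REFERENCE_MODES, top_n=2)),
    ]:
        print(name, picks, enrichment_factor(picks, KNOWN_ACTIVES, len(POSES)))
```

In this toy data the two best-scoring compounds make contacts unlike the reference mode, so score-only selection finds no actives while the binding-mode filter recovers both; the numbers illustrate the mechanics only and are not a measured enrichment.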

The GHKL Superfamily

This superfamily, named for gyrase, Hsp90, histidine kinase and MutL, includes such diverse protein families as DNA topoisomerase II, the molecular chaperone Hsp90, the DNA mismatch repair enzyme MutL, and histidine kinases. The superfamily is rich in established and proposed drug targets in both cancer and bacterial infectious disease. We are currently applying a structure-based, computationally driven approach to identify novel compounds and to design combinatorial chemistry libraries that are active against multiple members of the superfamily. Our goal is to generate high-affinity, specific inhibitors for several GHKL targets, together with broad-spectrum libraries active against multiple GHKL targets that establish a superfamily-based structure-activity relationship (SAR).
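Building broad-spectrum libraries for several GHKL targets depends on the active-site comparison mentioned earlier. A minimal Python sketch, assuming binding-site residues have already been extracted from a structure-based alignment, computes pairwise identity over aligned positions and groups targets by single linkage. The target names, residue strings, and the 0.6 identity cutoff are hypothetical and are not real GHKL sequences.

```python
# Hypothetical sketch: group targets whose binding sites are similar enough to
# share a focused library. Each site is a string of one-letter residue codes at
# the same aligned binding-site positions ('-' = gap). Target names, sequences
# and the 0.6 identity cutoff are illustrative assumptions.
from itertools import combinations
from typing import Dict, List, Set


def site_identity(a: str, b: str) -> float:
    """Fraction of aligned positions, ignoring gaps, where both residues match."""
    if len(a) != len(b):
        raise ValueError("binding-site strings must come from the same alignment")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return sum(x == y for x, y in compared) / len(compared)


def similar_site_groups(sites: Dict[str, str], min_identity: float = 0.6) -> List[Set[str]]:
    """Single-linkage groups: targets connected by pairwise identity >= min_identity."""
    neighbors: Dict[str, Set[str]] = {t: set() for t in sites}
    for t1, t2 in combinations(sites, 2):
        if site_identity(sites[t1], sites[t2]) >= min_identity:
            neighbors[t1].add(t2)
            neighbors[t2].add(t1)
    groups: List[Set[str]] = []
    seen: Set[str] = set()
    for start in sites:
        if start in seen:
            continue
        group: Set[str] = set()
        stack = [start]
        while stack:  # depth-first walk over the similarity graph
            t = stack.pop()
            if t not in group:
                group.add(t)
                stack.extend(neighbors[t] - group)
        seen |= group
        groups.append(group)
    return groups


# Toy aligned binding sites (hypothetical, not real GHKL sequences).
SITES = {
    "Target1": "ACDEFGH",
    "Target2": "ACDEFGK",
    "Target3": "LMNPQRS",
}

if __name__ == "__main__":
    print(similar_site_groups(SITES))
```

Groups with more than one member are candidates for a shared focused library. Sequence identity at aligned positions is only a crude proxy for pocket similarity, so a threshold like this would serve as a first filter rather than a final answer.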