Computational Techniques in the Drug Design Process

David Young
Cytoclonal Pharmaceutics Inc.

The purpose of this document is to outline the drug design process and specifically the role of computational modeling techniques. This is not meant to be a comprehensive review. It is meant to list the most important techniques currently in use.

The process of designing a new drug and bringing it to market is very complex. According to a 1997 government report, it takes 12 years and 350 million dollars for the average new drug to go from the research laboratory to patient use. Pieces of this process are often repeated to create successively better drugs for the same condition. In the case of antibiotics, drugs lose effectiveness as bacteria build up resistance, leading to a continuing "arms race". The major steps in the drug design process "from scratch" are:


    Find out all that is known about the disease and existing or traditional remedies. It is also important to look at very similar afflictions and their known treatments.


    Develop an assay technique to test drug effectiveness. An ideal assay is one in which a compound can be added to tissue samples or micro-organism colonies and there will be a visible indication of an effective treatment. At worst, there must be a way to test the drug on a laboratory animal that is susceptible to the disease. If the only way to test the effectiveness of a trial compound is to inject an untested compound into a human subject then there is no way to proceed in finding a pharmaceutical treatment.


    The next step is to make a financial decision about whether to proceed with the development process. The assay technique will determine the cost of testing compounds. If there are existing chemical treatments, it will be a refinement effort which saves the expense of finding lead compounds. All drugs must go through extensive testing so this is a fairly fixed cost. There may be governmental grants or tax incentives associated with certain diseases. The number of patients requiring treatment and merits of existing treatments will determine the long term profitability of producing a drug.

  4. Steps 4 and 5 of this procedure are often performed simultaneously.


    Lead compounds are compounds that have some activity against a disease. These may be only marginally useful and may have severe side effects. However, the lead compounds provide a starting point for refinement of the chemical structures. Lead compounds may come from many sources, including

    1. The isolation of active compounds from traditional remedies.

    2. The testing of natural materials followed by an isolation effort.

    3. Drugs effective against similar diseases.

    4. Use of combinatorial chemistry techniques which produce large numbers of related chemical compounds. This allows testing a large number of compounds at once. When a mixture that is useful is found, a separation must be done to determine which of the related structures has some drug activity. This has been one of the most promising and rapidly growing techniques in recent years.

    5. Searching chemical databases to find compounds similar to those found by the above means. This is the only part of the lead finding process that is considered to be a computational technique. There are many different measures of molecular similarity and ways of efficiently handling large databases, so this is not yet a trivial step.
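
    As an illustration of the database search described in the last item, the sketch below ranks database compounds by the Tanimoto coefficient, which is one commonly used similarity measure among the many available. It assumes each compound has already been reduced to a "fingerprint", here a set of integers standing for structural features. The compound names, the fingerprints and the function names are all hypothetical, chosen only for the example.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient: shared features divided by all distinct features."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_by_similarity(query: frozenset, db: dict) -> list:
    """Return (name, score) pairs sorted from most to least similar."""
    scores = [(name, tanimoto(query, fp)) for name, fp in db.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints: each integer stands for one structural feature.
lead = frozenset({1, 4, 7, 9, 12})
database = {
    "compound_A": frozenset({1, 4, 7, 9, 13}),
    "compound_B": frozenset({2, 3, 5}),
    "compound_C": frozenset({1, 4, 7, 9, 12, 15}),
}

ranked = rank_by_similarity(lead, database)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

    A real search would use fingerprints generated by a cheminformatics package and an index that avoids comparing the query against every entry. The choice of similarity measure and of fingerprint both affect which compounds are retrieved.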


    If it is known that a drug must bind to a particular spot on a particular protein or nucleotide, then a drug can be tailor-made to bind at that site. This is often modeled computationally using any of several different techniques. Traditionally, the primary way of determining what compounds would be tested computationally was provided by the researchers' understanding of molecular interactions. A second method is the brute-force testing of large numbers of compounds from a database of available structures.

    More recently, a set of techniques called rational drug design or De Novo techniques has come into use. These techniques attempt to build the researchers' understanding of how to choose likely compounds into a software package that is capable of modeling a very large number of compounds in an automated way. Many different algorithms have been used for this type of testing, many of which were adapted from artificial intelligence applications. No clear standard has yet emerged in this area, so it is impossible to say which technique is best at this time.

    These techniques have seen quite a bit of active development in recent years. Unfortunately, the complexity of biological systems makes it very difficult to determine the structures of large biomolecules. Ideally an X-ray crystallography structure is desired, but biomolecules are very difficult to crystallize. Another very useful technique, called "distance geometry", is to find some of the internuclear distances using NMR Nuclear Overhauser Effect experiments and then find molecular geometries that have these distances. If only a protein sequence is known, there are many techniques for predicting how that protein will fold, but none has yet been shown to be 100% reliable. Even once a structure has been determined, identifying the site where a drug must bind is not a trivial task.

    Because these geometries are so difficult to find, first generation drugs are often brought to market by refinement of lead compounds without the target site for the drug in the body ever being known. As such, these techniques are being used primarily for designing improved treatments for diseases that have already been characterized extensively.
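
    The distance geometry idea mentioned above can be illustrated with a much simplified sketch. It does not build a geometry from the measured distances, as a real distance geometry program does. It only scores candidate geometries by how badly they violate a set of distance restraints, treating each restraint as an upper bound on an atom-pair distance (an assumption made here for simplicity). The atom labels, coordinates and bounds are hypothetical.

```python
import math


def restraint_violation(coords: dict, restraints: dict) -> float:
    """Sum of squared amounts by which pair distances exceed their upper bounds."""
    total = 0.0
    for (atom_i, atom_j), upper_bound in restraints.items():
        distance = math.dist(coords[atom_i], coords[atom_j])
        excess = max(0.0, distance - upper_bound)
        total += excess ** 2
    return total


# Hypothetical upper bounds (in angstroms) between pairs of hydrogen atoms.
restraints = {("H1", "H2"): 3.0, ("H2", "H3"): 4.0}

# Two hypothetical candidate geometries (x, y, z coordinates in angstroms).
compact = {"H1": (0.0, 0.0, 0.0), "H2": (2.5, 0.0, 0.0), "H3": (2.5, 3.0, 0.0)}
extended = {"H1": (0.0, 0.0, 0.0), "H2": (5.0, 0.0, 0.0), "H3": (10.0, 0.0, 0.0)}

candidates = {"compact": compact, "extended": extended}
best = min(candidates, key=lambda name: restraint_violation(candidates[name], restraints))
print(best)
```

    A geometry with a violation of zero is consistent with every restraint. In practice there are far more atoms than measured distances, so many different geometries may satisfy the same data.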


    Once a number of lead compounds have been found, computational and laboratory techniques have been very successful in refining the molecular structures to give a greater drug activity and fewer side effects. This is done both in the laboratory and computationally by examining the molecular structures to determine which aspects are responsible for both the drug activity and the side effects.

    Synthetically, functional groups are removed in order to find out which must be present to give a useful drug and which are not necessary. The backbone of the structure is made more flexible or more rigid. A rigid backbone may hold the functional groups in the exact alignment necessary for the drug to bind. A flexible backbone may be necessary to allow the drug to get into the binding site. Adding bulky groups at other points on the molecule is often done in the hopes that these new groups may hinder the molecule from binding at unwanted sites which are responsible for the side effects.

    Computationally, the technique used is known as QSAR (Quantitative Structure-Activity Relationships). It consists of computing every available numerical descriptor of a molecule and then doing an enormous curve fit to find out which aspects of the molecule correlate well with the drug activity or side effect severity. This information can then be used to suggest new chemical modifications for synthesis and testing.
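
    The simplest piece of such an analysis can be sketched as follows: compute the correlation of each descriptor with the measured activity and rank the descriptors by the strength of that correlation. This is only a one-descriptor-at-a-time screen, not the multivariate fit that a real QSAR study would perform. The descriptor names, the descriptor values and the activity values are invented for illustration.

```python
import math


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y) if spread_x and spread_y else 0.0


# Hypothetical descriptors for five related compounds (one value per compound).
descriptors = {
    "logP": [1.0, 2.0, 3.0, 4.0, 5.0],
    "molecular_weight": [310.0, 295.0, 330.0, 300.0, 320.0],
    "h_bond_donors": [3.0, 1.0, 2.0, 3.0, 1.0],
}
# Hypothetical measured activity for the same five compounds.
activity = [2.1, 3.9, 6.2, 7.8, 10.1]

# Rank descriptors by the absolute value of their correlation with activity.
ranked = sorted(
    ((name, pearson(values, activity)) for name, values in descriptors.items()),
    key=lambda pair: abs(pair[1]),
    reverse=True,
)
for name, r in ranked:
    print(f"{name}: r = {r:+.2f}")
```

    With this invented data the first descriptor tracks activity closely while the other two do not. A strong correlation in a small data set can still be coincidental, so suggested modifications must be confirmed by synthesis and testing.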

    Another important aspect of the molecular structure is its solubility. Whether the molecule is water soluble or readily soluble in fatty tissue will affect what part of the body it becomes concentrated in. The ability to get a drug to the correct part of the body is an important factor in its potency.

    Ideally there is a continual exchange of information between the researchers doing QSAR studies, synthesis and testing. These techniques are frequently used and often very successful since they do not rely on knowing the biological basis of the disease, which can be very difficult to determine.


    Once a drug has been shown to be effective by an initial assay technique, much more testing must be done before it can be given to human patients. Animal testing is the primary type of testing at this stage. The scientists doing the testing must be particularly observant of many little details since this is where unexpected side effects can be found. Another question to be answered is whether the drug will work well or poorly with other drugs. This is also where initial data necessary to determine correct dosages is obtained.

    Eventually, the compounds which are deemed suitable at this stage are sent on to clinical trials. In the clinical trials, additional side effects may be found and human dosages are determined. The typical testing process is as follows:

    1. Preclinical testing in animals and test tubes. This takes an average of 6.5 years. Only one compound in 1000 is sent on to clinical testing.
    2. Phase I clinical trials in a few human volunteers. This typically takes a year and a half. Seventy percent of the compounds are sent on to the next step. This is primarily a safety test.
    3. Phase II clinical trials in a few hundred patients. This takes two years and a third of the compounds are passed on to the next step. This is further safety testing and an initial examination of the ability of the drug to have the intended effect in humans.
    4. Phase III clinical trials in a few thousand patients. This step collects more data on safety, dosage, drug activity and side effects. About a quarter of the compounds pass this phase.
    5. An advisory panel of doctors reviews the data and makes recommendations to the FDA.
    6. FDA approval or rejection.
    7. The FDA continues to monitor drug performance long after approval has been given.
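
    The pass rates quoted in the list above can be combined to estimate how few compounds survive the whole sequence. The sketch below simply multiplies the quoted fractions, which assumes each fraction applies to the compounds that passed the previous stage. It also adds up the durations that the list states (no duration is given for Phase III). The variable and function names are chosen only for this example.

```python
from fractions import Fraction

# Fraction of compounds passed on at each stage, as quoted in the list above.
stages = [
    ("preclinical testing", Fraction(1, 1000)),
    ("Phase I", Fraction(70, 100)),
    ("Phase II", Fraction(1, 3)),
    ("Phase III", Fraction(1, 4)),
]


def cumulative_pass_fraction(stage_list: list) -> Fraction:
    """Multiply the per-stage pass fractions together."""
    total = Fraction(1)
    for _name, fraction in stage_list:
        total *= fraction
    return total


overall = cumulative_pass_fraction(stages)
compounds_per_survivor = 1 / overall

# Durations stated in the list, in years: preclinical, Phase I, Phase II.
years_stated = 6.5 + 1.5 + 2.0

print(f"fraction passing Phase III: {float(overall):.2e}")
print(f"about 1 compound in {round(float(compounds_per_survivor))}")
print(f"years accounted for by stated durations: {years_stated}")
```

    By these figures, roughly one compound in 17,000 entering preclinical testing passes Phase III, and the stated durations account for 10 years.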


    Before a drug can be produced, there must be a means to administer it. Ideally, a tasteless or bland tablet can be created. Alternatively, an oral liquid, intravenous injection or directly applied cream may be created.

    Tablets are created by adding other compounds to minimize stomach upset and control timed release of the drug. A tablet may also contain a matrix compound that helps it hold its shape without crumbling into a powder.

    Oral liquids are often combined with strong flavors and alcohol to mask the taste of the drug and prevent throat irritation.

    A cream may have to be thickened or have a component that the skin will absorb readily.


    The large scale production of complex molecules can be very difficult. Compounds originally isolated from natural products may continue to be harvested. Often natural products are found in nature only in extremely small quantities necessitating a complex synthesis. One route that has been under development more recently is to have compounds produced by genetically engineered micro-organisms or plants.

    Drugs have a high value per gram. As such, production techniques can be viable even though they are far less efficient than those used by bulk chemical producers. Often all possible production techniques are researched even though only one will be put into practice. This is done so that there are no openings for competing corporations to get around a manufacturer's patents by using a different technique.

    Manufacturing regulations have become much more stringent in recent years. It is now also important to determine what by-products will result from production and what environmental impact there will be. It is possible to have a case in which a less efficient manufacturing process is more profitable due to the value of side products and reduced waste disposal costs.


    If there is only one available treatment for a disease, it is only necessary to see that physicians know about it. If there are several competing treatments, there may be quite a bit of marketing done so that physicians will understand the relative merits of each.


    After a large amount of experience under a physician's supervision, a drug may be approved for over-the-counter sales. This is often the most profitable end of the pharmaceutical industry.


    Once the chemical patents have expired, a drug can be produced by any manufacturer. Generic drugs are often less expensive for the consumer and yield a low profit margin for the producer. The production of generic drugs favors the most cost-effective production process.


A good book overall, and chapter 7 in particular, is
G. L. Patrick "An Introduction to Medicinal Chemistry" Oxford (1995)

A recent review is
L. M. Balbes, S. W. Mascarella and D. B. Boyd, in "Reviews in Computational Chemistry, Vol. 5" K. B. Lipkowitz, D. B. Boyd, Eds., VCH, 337 (1994)

An introduction to computational techniques is
G. H. Grant, W. G. Richards "Computational Chemistry" Oxford (1995)

A more detailed description of computational techniques is
A. R. Leach "Molecular Modelling Principles and Applications" Longman (1996)

L. Balbes' "Guide to Rational (Computer-aided) Drug Design" is at

There are many links on Soaring Bear's web page at

An introduction to structure-based techniques is
I. D. Kuntz, E. C. Meng, B. K. Shoichet Acc. Chem. Res. 27 (5), 117 (1994)

An introduction to De Novo techniques is
S. Borman Chemical and Engineering News 70 (12), 18 (1992)

There is more information about clinical testing at

An expanded version of this article will be published in "Computational Chemistry: A Practical Guide for Applying Techniques to Real World Problems" by David Young, which will be available from John Wiley & Sons in the spring of 2001.
