Paul E. Gurba (a), Marc E. Parham (b), Joseph R. Votano (c)

Comparison of QSAR Models Developed for Acute Oral Toxicity (LD50) by Regression and Neural Network Techniques

(a) Lockheed-Martin Skunk Works, 1011 Lockheed Way, Palmdale, CA 93599-3738, USA. E-mail:
(b) 6 Ruben Duren Way, Bedford, MA 01730, USA. E-mail:
(c) SciVision, 128 Spring Street, Lexington, MA 02173, USA. E-mail:

Chemical safety evaluations are often made with limited information on basic toxicity, usually because there is not enough time or funding to develop reliable toxicity data through typical laboratory studies. To help fill this information gap, we initiated development of a quantitative structure-activity relationship (QSAR) model for acute oral toxicity (LD50) in rats based on molecular structure. Today, structure-based representations of molecules can be obtained easily with a sketching program such as Alchemy 2000 or ChemDraw. Data were collected from readily available sources (Sigma-Aldrich MSDSs on CD-ROM and RTECS) for 90 aliphatic and aromatic amines of interest to our group. Molecular descriptors for the quantum and molecular properties of these compounds were calculated with Alchemy 2000 and MolCon (Windows, V1.0) from chemical structures already in HIN format. A total of 79 input parameters were generated for each compound in the database. The set of 90 compounds was randomly divided into a training set of 65 compounds and a test set of 25 compounds. Three separate models were developed to determine the best approach to modeling LD50 with QSAR, each using the same training and test sets. Model accuracy is reported as the fit of predicted vs. observed LD50 values for the training and test sets. Each model was evaluated against the identical test set to estimate how reliable its predictions would be for new compounds. Model #1 was generated with a traditional statistical modeling package, SAS (NT 3.51, Ver 6.09), using forward stepwise multiple linear regression. It used 14 parameters derived from quantum and molecular mechanics (qm/mm) calculations. For the training set, Model #1 gave a line fit of R² = 0.20 (14 parameters, n = 65 compounds).
Model #2 was similar to the first model, but additional summation parameters (15 total, including kappa, chi and other Hall indices) were evaluated, again with stepwise multiple linear regression (SAS), using the training set to generate the model and the test set to estimate its accuracy on unseen data. With the Hall parameters included, the line fit of predicted vs. observed values improved to R² = 0.27. Model #3 was developed with a neural network program, which generated a relationship from a selection of both qm/mm and Hall parameters. It required only 6 of the original 79 parameters to produce a predictive model. The neural network model gave a line fit of predicted vs. observed LD50 values of R² = 0.94 for the same set of test compounds. This exercise illustrates the value of neural network analysis in QSAR development with readily available, public information. We are currently attempting to expand the rat acute oral toxicity model to a larger number of compounds from diverse chemical classes, and to determine whether a model of similarly high predictive accuracy can be obtained for other toxicity endpoints.
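
The evaluation procedure described above (a random 65/25 split of 90 compounds, a model fitted on the training set only, and a line fit of predicted vs. observed values) can be sketched roughly as follows. This is an illustrative sketch, not the authors' SAS or neural network workflow: the descriptor matrix and response values are synthetic placeholders, the function names `split_train_test` and `line_fit_r2` are hypothetical, and an ordinary least-squares fit stands in for the stepwise regression and neural network models.

```python
import numpy as np


def split_train_test(n_compounds=90, n_train=65, seed=0):
    """Randomly partition compound indices into training and test sets."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_compounds)
    return shuffled[:n_train], shuffled[n_train:]


def line_fit_r2(observed, predicted):
    """R^2 of a least-squares line through (observed, predicted) points.

    For a one-variable linear fit this equals the squared Pearson correlation.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    r = np.corrcoef(observed, predicted)[0, 1]
    return float(r ** 2)


# Synthetic placeholder data (NOT the study's data): 90 "compounds",
# 6 "descriptors", and a noisy linear response standing in for LD50.
rng = np.random.default_rng(1)
X = rng.normal(size=(90, 6))
y = X @ rng.normal(size=6) + rng.normal(scale=0.5, size=90)

train_idx, test_idx = split_train_test()

# Ordinary least squares with an intercept, fitted on the training set only.
A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)

# Score both sets; the held-out test set estimates accuracy on unseen compounds.
A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
r2_train = line_fit_r2(y[train_idx], A_train @ coef)
r2_test = line_fit_r2(y[test_idx], A_test @ coef)
print(f"train R^2 = {r2_train:.2f}, test R^2 = {r2_test:.2f}")
```

Scoring every model against the same held-out indices is what makes the three R² values comparable; a different split per model would confound model quality with split difficulty.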
