__William C. Herndon__, Hung-Ta Chen, Gabrielle Rum, and
Yumei Zhang

*QSAR Case Studies Using a Generalized Model for Molecular Similarity
Analysis: (1) Antimalarial Activities of Phenanthrene-(Alkylamino)Carbinols,
(2) Carcinogenic Activities of Polycyclic Aromatic Hydrocarbons*

Department of Chemistry, The University of Texas at El Paso, El Paso,
Texas 79968

Several QSAR methodologies have been developed that use hierarchical
sets of molecular descriptors and multilinear regression analysis of
physical or biological properties. Our procedures advance through
enumerations of types of atoms and bonds (level 1), rings and functional
groups (level 2), and larger structural fragments and steric interactions
(level 3), and end by testing the addition of level 4 descriptors based on
the results of semiempirical or *ab initio* molecular orbital calculations.
Experimental properties (e.g., logP or boiling points) are an additional
possible source of descriptors but are not tested in the present work. In
general, the hierarchical levels of structural descriptors are added and
tested sequentially to determine the lowest level of description needed
for a statistically significant correlation of a particular property (the
dependent variable). The resulting high-quality structure/property and
structure/activity relationships normally include significant terms from
several descriptor levels [1-5].
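The sequential testing of descriptor levels can be sketched as follows. This is a minimal illustration, assuming ordinary least squares with the adjusted R² as the criterion for whether a newly added level improves the model; the abstract does not state which statistic was used, and the function names and synthetic data are hypothetical.

```python
import numpy as np


def adjusted_r2(X, y):
    """Fit y = b0 + X b by ordinary least squares and return the adjusted R^2."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])          # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares coefficients
    resid = y - A @ coef
    r2 = 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def add_levels_sequentially(levels, y):
    """Append descriptor blocks level by level; record adjusted R^2 after each."""
    X = np.empty((len(y), 0))
    history = []
    for name, block in levels:
        X = np.column_stack([X, block])
        history.append((name, adjusted_r2(X, y)))
    return history


# Synthetic (hypothetical) data: the property depends on level-1 and level-2
# descriptors only; the level-3 block is pure noise.
rng = np.random.default_rng(0)
n = 40
level1 = rng.normal(size=(n, 2))
level2 = rng.normal(size=(n, 2))
level3 = rng.normal(size=(n, 2))
y = level1 @ np.array([1.0, -0.5]) + 2.0 * level2[:, 0] + 0.1 * rng.normal(size=n)

history = add_levels_sequentially(
    [("level 1", level1), ("level 2", level2), ("level 3", level3)], y
)
for name, score in history:
    print(f"{name}: adjusted R^2 = {score:.3f}")
```

For brevity the sketch adds whole blocks of descriptors; a fuller version would also test individual terms within each level and keep only the significant ones.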

In previous work we have also shown how various types of molecular structure codes or molecular descriptors can be used to calculate measures of molecular similarity [6-9]. In this presentation we describe a simpler, more general protocol that can be used to obtain molecular similarity measures for an arbitrary set of compounds, either globally or at any chosen level of molecular structure description. The starting point for the analysis is the usual N-by-M data matrix, in which the N rows are compounds and the M columns contain numerical values of the descriptors. The Pearson correlation matrix of this data table is an M-by-M square matrix that describes the linear correlations of the descriptors with each other over the set of N compounds. In many previous applications, the Pearson correlation matrix has been used to select subsets of descriptors as trial independent variables in QSAR multilinear regression studies.
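The M-by-M descriptor correlation matrix, and one common way of using it to screen out redundant descriptors, can be sketched in NumPy. The greedy screen and the 0.9 cutoff are illustrative assumptions, not a procedure taken from the cited work; the function names and data are hypothetical.

```python
import numpy as np


def descriptor_correlation(X):
    """M-by-M Pearson correlation matrix of the descriptor columns of an N-by-M matrix."""
    return np.corrcoef(X, rowvar=False)  # rowvar=False: columns are the variables


def weakly_correlated_subset(X, threshold=0.9):
    """Greedy screen: keep descriptor j only if |r| with every kept descriptor < threshold."""
    R = descriptor_correlation(X)
    kept = []
    for j in range(R.shape[0]):
        if all(abs(R[j, k]) < threshold for k in kept):
            kept.append(j)
    return kept


# Hypothetical 5-compound x 3-descriptor table; descriptor 1 is nearly 2 x descriptor 0.
X = np.array([
    [1.0, 2.1, 5.0],
    [2.0, 3.9, 3.0],
    [3.0, 6.2, 4.0],
    [4.0, 7.8, 1.0],
    [5.0, 10.1, 2.0],
])
R = descriptor_correlation(X)
kept = weakly_correlated_subset(X)
print(R.round(3))
print("kept descriptor columns:", kept)
```

Here descriptor 1 is dropped because it is almost perfectly correlated with descriptor 0, while descriptor 2 (r = -0.8 with descriptor 0) is kept.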

The Pearson correlation matrix methodology can also be used to define a molecular similarity matrix for the set of N compounds, as follows. First, the descriptor data matrix is standardized by subtracting the mean of each descriptor and dividing by its standard deviation. This puts all descriptors on a common scale, so that descriptors with large numerical values do not exert undue influence. Then an N-by-N similarity matrix is defined as the Pearson correlation matrix of the transpose of the standardized N-by-M descriptor matrix; that is, each pair of compounds is compared by correlating their standardized values across all M descriptors. Each column of this similarity matrix contains the similarities (positive values) or dissimilarities (negative values) of every compound to a single compound. Multilinear regression analysis is then used to identify statistically significant similarities and dissimilarities to a small set of reference molecules, which provide the independent variables for a QSAR model equation [8,9].
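The two steps described above (standardize the descriptors, then correlate compounds across descriptors) can be sketched in a few lines of NumPy. This is a minimal sketch assuming that no descriptor is constant; the function name and data are hypothetical. Sample standard deviations (`ddof=1`) are used, although the choice does not change the result, since it rescales every column by the same factor and Pearson correlation is invariant to that.

```python
import numpy as np


def similarity_matrix(X):
    """N-by-N compound similarity matrix from an N-by-M descriptor matrix X.

    Assumes no descriptor column is constant (its standard deviation would be 0).
    """
    X = np.asarray(X, dtype=float)
    # Step 1: standardize each descriptor (column): subtract mean, divide by std dev.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Step 2: Pearson correlation matrix of the transpose Z.T (M-by-N). With
    # rowvar=False its N columns (compounds) are the variables, so each entry
    # correlates two compounds across their M standardized descriptor values.
    return np.corrcoef(Z.T, rowvar=False)


# Hypothetical 4-compound x 3-descriptor table; compounds 0 and 2 are identical.
X = [
    [1.0, 5.0, 2.0],
    [3.0, 1.0, 4.0],
    [1.0, 5.0, 2.0],
    [2.0, 2.0, 6.0],
]
S = similarity_matrix(X)
print(S.round(3))  # column j: similarity (+) or dissimilarity (-) to compound j
```

Identical compounds receive a similarity of exactly 1, and the matrix is symmetric with a unit diagonal.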

These concepts are illustrated with two examples. The first data set consists of the antimalarial activities of 208 phenanthrene derivatives containing a variety of substituent groups [10]. The first three levels of hierarchical descriptors are determined directly from molecular structure drawings. The fourth level consists of quantum mechanical descriptors derived from AM1 calculations using the QSAR module of the SPARTAN software. The molecular similarity matrix is generated as outlined above, and similarities to particular molecules are selected as independent variables in a QSAR equation by stepwise regression analysis. Cross-validated predictions of the antimalarial activities, obtained with the leave-one-out method, are comparable or superior to those from previous studies [4]. The numerical similarities and dissimilarities of new compounds to the reference structures selected by this procedure can be used to predict their antimalarial activities.
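The selection of reference compounds and the leave-one-out validation can be sketched as follows. Greedy forward selection on the residual sum of squares, with a fixed number of terms, is used here as a simplified stand-in for stepwise regression, whose entry and exit criteria the abstract does not give. The random matrix standing in for a similarity matrix, and all function names, are hypothetical.

```python
import numpy as np


def ols_fit(A, y):
    """Least-squares coefficients [b0, b1, ...] for y = b0 + A @ b."""
    A1 = np.column_stack([np.ones(len(y)), A])
    coef, *_ = np.linalg.lstsq(A1, y, rcond=None)
    return coef


def ols_predict(coef, A):
    """Predictions b0 + A @ b for the rows of A."""
    return coef[0] + A @ coef[1:]


def forward_select(S, y, n_terms):
    """Greedy forward selection of similarity columns (i.e., reference compounds):
    at each step add the column that most reduces the residual sum of squares."""
    chosen = []
    for _ in range(n_terms):
        best_j, best_sse = None, np.inf
        for j in range(S.shape[1]):
            if j in chosen:
                continue
            cols = chosen + [j]
            coef = ols_fit(S[:, cols], y)
            sse = float(((y - ols_predict(coef, S[:, cols])) ** 2).sum())
            if sse < best_sse:
                best_j, best_sse = j, sse
        chosen.append(best_j)
    return chosen


def loo_q2(A, y):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / total sum of squares."""
    n = len(y)
    press = 0.0
    for i in range(n):
        train = np.arange(n) != i                # drop compound i
        coef = ols_fit(A[train], y[train])       # refit without it
        pred = ols_predict(coef, A[i:i + 1])[0]  # predict the held-out compound
        press += (y[i] - pred) ** 2
    return 1.0 - press / float(((y - y.mean()) ** 2).sum())


# Hypothetical data: a random stand-in for a 30 x 30 similarity matrix, with an
# activity that depends on similarity to compounds 3 and 7 plus a little noise.
rng = np.random.default_rng(1)
N = 30
S = rng.uniform(-1.0, 1.0, size=(N, N))
y = 1.5 * S[:, 3] - 0.8 * S[:, 7] + 0.05 * rng.normal(size=N)

chosen = forward_select(S, y, n_terms=2)
q2 = loo_q2(S[:, chosen], y)
print("reference compounds:", chosen, " LOO q^2 =", round(q2, 3))
```

For brevity, `loo_q2` refits only the coefficients of the already-selected columns; a stricter validation would repeat the selection inside each leave-one-out fold. To predict a new compound, its similarities to the selected reference compounds would presumably be computed using the training-set descriptor means and standard deviations, then passed to `ols_predict`.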

This protocol is also tested with biological data from animal studies of the carcinogenic activities of polycyclic aromatic hydrocarbons (PAHs) bearing a large variety of alkyl substituents [11]. A detailed review of the animal assay data available through 1991 (210 active compounds of 312 tested) was undertaken [5], and an index of carcinogenicity was assigned to every compound for which latent periods were measured (90 compounds). The carcinogenicity index is defined analogously to the Iball index: it is proportional to the percentage of animals developing cancer and inversely proportional to the latent period, except that experiments with promoters are weighted by a factor of 0.5. As before, the first three levels of descriptors are derived from molecular structure drawings, and the fourth level consists of quantum mechanical descriptors derived from AM1 calculations. The entire set of descriptors is used to calculate a similarity matrix for the 90-compound set, which provides the independent variables (similarities to particular compounds) for the final QSAR model equation. Cross-validated correlations of the carcinogenicity data are very good, especially for the more active compounds, although some very weakly active compounds are predicted to be inactive by this procedure [5].
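The abstract states the carcinogenicity index only as proportionalities. The sketch below assumes the classical Iball form, 100 × (percentage of animals with tumors) / (mean latent period in days), together with the 0.5 promoter weight given in the text. The function name is hypothetical, and because the abstract does not say how results from several experiments on one compound are combined, only a single experiment is handled.

```python
def carcinogenicity_index(percent_with_tumors, mean_latent_period_days, with_promoter=False):
    """Iball-style carcinogenicity index for a single experiment.

    Assumed form (classical Iball index): 100 * (% animals with tumors) / (mean latent
    period in days). Experiments that used a promoter are weighted by 0.5.
    """
    if mean_latent_period_days <= 0:
        raise ValueError("latent period must be positive")
    index = 100.0 * percent_with_tumors / mean_latent_period_days
    return 0.5 * index if with_promoter else index


# Hypothetical experiment: 60% of animals developed tumors, mean latent period 200 days.
print(carcinogenicity_index(60.0, 200.0))                      # 30.0
print(carcinogenicity_index(60.0, 200.0, with_promoter=True))  # 15.0
```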

[1] M. Garbalena and W. C. Herndon, "Graph Theoretical Models for
Enthalpic Properties of Alkanes." *J. Chem. Inf. Comp. Sci.*, **32**,
37-42 (1992).

[2] W. C. Herndon and S. L. Knott, "Structure/Enthalpy Relationships for
Hydrocarbons Containing Benzene Rings." *Polycycl. Arom. Compds.*,
**11**, 229-236 (1996).

[3] U. J. Urquidi, "Structure/Property and Structure/Activity Analyses of PCBs, PCDDs, and PCDFs." M. S. Thesis (Univ. of Texas at El Paso, Dec., 1994).

[4] H.-T. Chen, "Structure/Activity Analyses of Antimalarial Compounds." M. S. Thesis (Univ. of Texas at El Paso, Dec., 1995).

[5] Y. Zhang, "Studies of Aromatic Hydrocarbon Carcinogenicity." M. S. Thesis (Univ. of Texas at El Paso, Dec., 1996).

[6] W. C. Herndon and S. H. Bertz, "Linear Notations and Molecular
Graph Similarity." *J. Comp. Chem.*, **8**, 367-374 (1987).

[7] A. J. Bruce, "Benzenoid Carcinogenicity and Abstract Definitions of Molecular Similarity." B. S. Honors Thesis (Univ. of Texas at El Paso, Aug., 1990).

[8] G. Rum and W. C. Herndon, "Molecular Similarity Concepts 5. Analysis
of Steroid-Protein Binding Constants." *J. Am. Chem. Soc.*, **113**,
9055-9060 (1991).

[9] W. C. Herndon and G. Rum, "Three-Dimensional Topological Descriptors
and Similarity of Molecular Structures: Binding Affinities of
Corticosteroids." In *QSAR and Molecular Modeling*, Prous Science
Publishers, Madrid, 1996, pp. 380-384.

[10] K. H. Kim, C. Hansch, J. Y. Fukunaga, E. S. Steller, P. Y. C. Jow,
P. N. Craig, and J. Page, "Quantitative Structure-Activity Relationships
in 1-Aryl-2-(alkylamino)ethanol Antimalarials." *J. Med. Chem.*,
**22**, 366-391 (1979).

[11] "Survey of Compounds Which Have Been Tested For Carcinogenic Activity." Public Health Service Publication No. 149. 15 volumes and two supplements, 1951-1992.
