RE: QSAR - How to statistically determine when variables are "working well together" from Lennart Eriksson on 2003-10-24 (QSAR and MS List)

From: Lennart Eriksson <Lennart.eriksson>
Date: Fri, 24 Oct 2003 13:31:33 +0200

Dear Jim Metz

This question is far from trivial. It and other related issues (variable selection, etc.) of interest to QSAR modellers and risk assessors in the environmental sciences are discussed in the following publication:

Methods for Reliability and Uncertainty Assessment and for Applicability Evaluations of Classification- and Regression-Based QSARs

Lennart Eriksson, Joanna Jaworska, Andrew P. Worth, Mark T.D. Cronin, Robert M. McDowell, and Paola Gramatica

http://ehpnet1.niehs.nih.gov/docs/2003/5758/abstract.html

All the best

Lennart Eriksson

Lennart Eriksson, Ph.D., Docent
Senior Lecturer and Consultant
Enterprise Platforms
Umetrics AB, Box 7960, SE-907 19 Umeå, Sweden
Phone: +46 90 184852
Mobile: +46 73 682 4852
Fax: +46 90 184899
Mailto:lennart.eriksson(0)umetrics.com
Visit http://www.umetrics.com

        -----Original Message-----
        From: qsar_society-admin++accelrys.com [mailto:qsar_society-admin*|*accelrys.com] On Behalf Of james.metz#%#abbott.com
        Sent: 23 October 2003 18:34
        To: qsar_society()accelrys.com
        Cc: james.metz-$-abbott.com
        Subject: QSAR - How to statistically determine when variables are "working well together"

        QSAR Society,

                Is anyone aware of any publications, "white" papers, or presentations which discuss the concept of how to
        judge when molecular descriptors are "working well together" in a QSAR equation, to reduce errors and especially
        improve predictive power for external prediction sets?

                For example, I am well aware of the more trivial case of building QSAR equations with, say, 2 terms, then 3
        terms, then 4 terms, then 5 terms, etc., and then monitoring the R^2, Q^2, etc. We then say that if the R^2, Q^2, or
        perhaps the F statistic has "improved significantly", this justifies the use of a higher-order equation. We may then use
        an Occam's Razor argument to prefer the simpler model when a model with fewer terms has about the same predictive power
        as one with more terms.
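The term-by-term procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are synthetic, the model is ordinary least squares, Q^2 is computed by leave-one-out cross-validation using one common definition (1 - PRESS/SS_tot), and the helper names (fit_ols, predict, r2_q2) are hypothetical, not taken from any QSAR package.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + columns of X."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Predictions from coefficients returned by fit_ols."""
    A = np.column_stack([np.ones(X.shape[0]), X])
    return A @ coef

def r2_q2(X, y):
    """Return (R^2, leave-one-out Q^2) for an OLS model on the columns of X."""
    ss_tot = np.sum((y - y.mean()) ** 2)
    rss = np.sum((y - predict(fit_ols(X, y), X)) ** 2)
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i          # leave compound i out
        coef = fit_ols(X[keep], y[keep])
        press += (y[i] - predict(coef, X[i:i + 1])[0]) ** 2
    return 1 - rss / ss_tot, 1 - press / ss_tot

# Synthetic example (assumption): 40 "compounds", 5 "descriptors",
# of which only the first two actually carry signal.
rng = np.random.default_rng(0)
n = 40
X = rng.normal(size=(n, 5))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Build models with 1, 2, ..., 5 terms and monitor R^2 and Q^2.
results = [(k, *r2_q2(X[:, :k], y)) for k in range(1, 6)]
for k, r2, q2 in results:
    print(f"{k} terms: R^2 = {r2:.3f}, Q^2 = {q2:.3f}")
```

In a sketch like this, R^2 can never decrease as terms are added to nested models, so it is the behaviour of Q^2 (levelling off or dropping) that would motivate the Occam's Razor choice of the smaller model.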

                However, I am looking for (perhaps) something more sophisticated and thoughtful than this approach! Is there something
        like examining the synergism between variables in QSAR equations that reduces errors in a way that suggests that the
        variables "work well together" e.g., some kind of cancellation of errors? Is there a mathematical formalism for this?

                Thoughts, ideas, leads, comments, etc. are much appreciated here.

                Regards,
                Jim Metz



        James T. Metz, Ph.D.
        Research Investigator Chemist

        GPRD R46Y AP10-2
        Abbott Laboratories
        100 Abbott Park Road
        Abbott Park, IL 60064-6100
        U.S.A.

        Office (847) 936 - 0441
        FAX (847) 935 - 0548

        james.metz=-=abbott.com

Received on 2003-10-24 - 08:32 GMT