QSAR - How to statistically determine when variables are "working well together"

From: james.metz-$-abbott.com
Date: Thu, 23 Oct 2003 11:33:55 -0500

QSAR Society,

        Is anyone aware of any publications, white papers, or
presentations that discuss how to judge when molecular descriptors are
"working well together" in a QSAR equation, so as to reduce errors and,
especially, improve predictive power for external prediction sets?

        For example, I am well aware of the more trivial approach of
building QSAR equations with, say, 2 terms, then 3 terms, then 4 terms,
then 5 terms, etc., and monitoring R^2, Q^2, etc. at each step. If the
R^2, the Q^2, or perhaps the F statistic has "improved significantly",
we say this justifies the use of a higher-order equation. Conversely, we
may use an Occam's Razor argument to prefer the simpler model when a
model with fewer terms has about the same predictive power as one with
more terms.
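
        To make the stepwise procedure above concrete, here is a minimal
sketch in Python with NumPy. It is an illustration only, under these
assumptions: the models are ordinary least-squares fits with an
intercept, Q^2 is the leave-one-out value 1 - PRESS/SS_tot computed
against the overall mean of y, and the descriptors and activities are
synthetic random numbers. The names r2_and_q2, X, y, and results are
hypothetical.

```python
import numpy as np

def r2_and_q2(X, y):
    """Return (R^2, leave-one-out Q^2) for an OLS fit of y on X with intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])          # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares coefficients
    resid = y - A @ coef                          # fitted residuals
    ss_tot = np.sum((y - y.mean()) ** 2)          # total sum of squares
    r2 = 1.0 - np.sum(resid ** 2) / ss_tot
    # For OLS, the leave-one-out residual is e_i / (1 - h_ii),
    # where h_ii is the i-th diagonal element of the hat matrix.
    H = A @ np.linalg.pinv(A)
    loo_resid = resid / (1.0 - np.diag(H))
    q2 = 1.0 - np.sum(loo_resid ** 2) / ss_tot    # 1 - PRESS / SS_tot
    return r2, q2

# Synthetic data (assumption): 40 "compounds", 5 candidate descriptors,
# of which only the first two actually drive the activity y.
rng = np.random.default_rng(0)
n = 40
X = rng.normal(size=(n, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Build nested models with 1, 2, ..., 5 terms and monitor R^2 and Q^2.
results = []
for k in range(1, X.shape[1] + 1):
    r2, q2 = r2_and_q2(X[:, :k], y)
    results.append((k, r2, q2))
    print(f"{k} terms: R^2 = {r2:.3f}, Q^2 = {q2:.3f}")
```

        For nested least-squares models, R^2 can only stay the same or
rise as terms are added, whereas Q^2 may level off or fall. That is why
Q^2, rather than R^2 alone, is usually the quantity watched when
deciding whether an extra term is justified.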

        However, I am looking for (perhaps) something more sophisticated
and thoughtful than this approach! Is there some way of examining the
synergism between variables in a QSAR equation, i.e., whether they
reduce errors in a way that suggests the variables "work well together",
e.g., through some kind of cancellation of errors? Is there a
mathematical formalism for this?

        Thoughts, ideas, leads, comments, etc. are much appreciated here.

        Regards,
        Jim Metz

James T. Metz, Ph.D.
Research Investigator Chemist

GPRD R46Y AP10-2
Abbott Laboratories
100 Abbott Park Road
Abbott Park, IL 60064-6100
U.S.A.

Office (847) 936 - 0441
FAX (847) 935 - 0548

james.metz[a]abbott.com
Received on 2003-10-23 - 13:42 GMT
