Dear Jim Metz,
This question is far from trivial. Your question and related issues (variable selection, etc.) of interest to QSAR modellers and risk assessors in the environmental sciences are discussed in the following publication:
Methods for Reliability and Uncertainty Assessment and for Applicability Evaluations of Classification- and Regression-Based QSARs
Lennart Eriksson, Joanna Jaworska, Andrew P. Worth, Mark T.D. Cronin, Robert M. McDowell, and Paola Gramatica
http://ehpnet1.niehs.nih.gov/docs/2003/5758/abstract.html
All the best
Lennart Eriksson
Lennart Eriksson, Ph.D., Docent
Senior Lecturer and Consultant
Enterprise Platforms
Umetrics AB, Box 7960, SE-907 19 Umeå, Sweden
Phone: +46 90 184852
Mobile: +46 73 682 4852
Fax: +46 90 184899
Mailto: lennart.eriksson(at)umetrics.com
Visit http://www.umetrics.com
-----Original Message-----
From: qsar_society-admin(at)accelrys.com On Behalf Of james.metz(at)abbott.com
Sent: 23 October 2003 18:34
To: qsar_society(at)accelrys.com
Cc: james.metz(at)abbott.com
Subject: QSAR - How to statistically determine when variables are "working well together"
QSAR Society,
Is anyone aware of any publications, "white" papers, or presentations which discuss the concept of how to
judge when molecular descriptors are "working well together" in a QSAR equation, to reduce errors and especially
improve predictive power for external prediction sets?
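For the external-set case mentioned above, one widely used yardstick is a predictive Q^2 computed on compounds that played no part in model building. The sketch below is an illustration only: it assumes the training-set mean as the reference value (some authors use the external-set mean instead), and the function names `q2_external` and `rmsep` and the numbers in the example are hypothetical.

```python
def q2_external(y_ext, y_pred, y_train_mean):
    """Predictive Q^2 on an external set: 1 - PRESS / SS.

    Assumption: SS is taken around the *training-set* mean; other
    conventions use the external-set mean instead.
    """
    press = sum((y - p) ** 2 for y, p in zip(y_ext, y_pred))
    ss = sum((y - y_train_mean) ** 2 for y in y_ext)
    return 1.0 - press / ss


def rmsep(y_ext, y_pred):
    """Root-mean-square error of prediction on the external set."""
    n = len(y_ext)
    return (sum((y - p) ** 2 for y, p in zip(y_ext, y_pred)) / n) ** 0.5


# Hypothetical external set: observed activities and model predictions.
y_obs = [1.0, 2.0, 3.0]
y_hat = [1.1, 1.9, 3.2]
print(q2_external(y_obs, y_hat, y_train_mean=2.0))  # about 0.97
print(rmsep(y_obs, y_hat))
```

Computing these two numbers for, say, a 3-term and a 4-term equation on the same external set gives a direct (if data-hungry) check on whether an extra descriptor improves prediction rather than just fit.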
For example, I am well aware of the more trivial case of building QSAR equations with, say, 2 terms, then 3
terms, then 4 terms, then 5 terms, etc., and then monitoring R^2, Q^2, etc. We then say that if R^2, Q^2, or
perhaps the F statistic has "improved significantly", this justifies the use of a higher-order equation. We may then use
an Occam's Razor argument to prefer the simpler model when a model with fewer terms has about the same predictive power
as an equation with more terms.
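The stepwise procedure described above (fit equations with increasing numbers of terms, compare R^2 and Q^2, and test whether the improvement is significant) can be sketched in plain Python. This is a minimal illustration under stated assumptions: ordinary least squares with an intercept, leave-one-out cross-validation for Q^2, and the standard partial (extra-sum-of-squares) F test for nested models. All function names and the small data set are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit(X, y):
    """OLS with intercept via the normal equations; X is a list of descriptor rows."""
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    xtx = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def tss(y):
    ybar = sum(y) / len(y)
    return sum((yi - ybar) ** 2 for yi in y)


def rss(X, y):
    coef = fit(X, y)
    return sum((yi - predict(coef, row)) ** 2 for row, yi in zip(X, y))


def r2(X, y):
    return 1.0 - rss(X, y) / tss(y)


def q2_loo(X, y):
    """Leave-one-out Q^2 = 1 - PRESS / TSS (each compound refitted out)."""
    press = 0.0
    for i in range(len(y)):
        coef = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    return 1.0 - press / tss(y)


def partial_f(X_small, X_big, y):
    """F = ((RSS_small - RSS_big) / extra terms) / (RSS_big / (n - p_big - 1))."""
    n, ps, pb = len(y), len(X_small[0]), len(X_big[0])
    rs, rb = rss(X_small, y), rss(X_big, y)
    return ((rs - rb) / (pb - ps)) / (rb / (n - pb - 1))


# Hypothetical data: two descriptors plus a little fixed "noise".
x1 = [0.1, 0.4, 0.9, 1.3, 1.8, 2.2, 2.7, 3.1, 3.5, 4.0]
x2 = [1.0, 0.2, 1.5, 0.7, 2.0, 0.3, 1.8, 0.9, 2.4, 1.1]
noise = [0.05, -0.03, 0.02, -0.06, 0.04, 0.01, -0.02, 0.05, -0.04, 0.03]
y = [1.0 + 2.0 * a + 0.5 * b + e for a, b, e in zip(x1, x2, noise)]
X1 = [[a] for a in x1]
X12 = [[a, b] for a, b in zip(x1, x2)]
for name, X in (("x1", X1), ("x1 + x2", X12)):
    print(name, "R2 =", round(r2(X, y), 4), "Q2_LOO =", round(q2_loo(X, y), 4))
print("partial F for adding x2:", round(partial_f(X1, X12, y), 1))
```

The partial F value would then be compared with the critical value of the F distribution with (extra terms, n - p_big - 1) degrees of freedom. Note that R^2 can never decrease when a term is added to a nested OLS model, which is why Q^2 and the F test carry most of the weight in this kind of comparison.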
However, I am looking for (perhaps) something more sophisticated and thoughtful than this approach! Is there something
like examining the synergism between variables in QSAR equations that reduces errors in a way that suggests that the
variables "work well together", e.g., some kind of cancellation of errors? Is there a mathematical formalism for this?
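One possible formalism for the "working well together" question (offered here as an illustration, not as the method of the publication cited in the reply) compares the R^2 of a two-descriptor OLS model with the sum of the two single-descriptor R^2 values. For two predictors the joint R^2 can be written purely in terms of Pearson correlations, so the comparison needs no model refitting. The function names and the small example data are hypothetical, and the closed form assumes exactly two descriptors.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def r2_two(ry1, ry2, r12):
    """R^2 of y regressed (OLS, with intercept) on two descriptors,
    from the three pairwise correlations:
        R^2 = (ry1^2 + ry2^2 - 2*ry1*ry2*r12) / (1 - r12^2)
    """
    if abs(r12) >= 1.0:
        raise ValueError("descriptors are perfectly collinear")
    return (ry1 ** 2 + ry2 ** 2 - 2.0 * ry1 * ry2 * r12) / (1.0 - r12 ** 2)


def synergy(y, d1, d2):
    """Joint R^2 minus the sum of the single-descriptor R^2 values.

    > 0: enhancement (the pair explains more than the parts, e.g. a
         suppressor descriptor that cancels noise in the other one)
    < 0: redundancy (the descriptors carry overlapping information)
    = 0: e.g. when the two descriptors are uncorrelated with each other
    """
    ry1, ry2, r12 = pearson(y, d1), pearson(y, d2), pearson(d1, d2)
    return r2_two(ry1, ry2, r12) - (ry1 ** 2 + ry2 ** 2)


# Hypothetical suppressor example: d1 = activity signal + nuisance,
# d2 = the nuisance alone. d2 is uncorrelated with y on its own,
# yet y = d1 - d2 exactly, so together they give R^2 = 1.
y = [1.0, -1.0, 1.0, -1.0]
d2 = [1.0, 1.0, -1.0, -1.0]
d1 = [a + b for a, b in zip(y, d2)]
print(round(synergy(y, d1, d2), 6))  # 0.5
```

In the example, d2 on its own explains none of the variance in y, yet adding it to d1 raises R^2 from 0.5 to 1.0. This is the classic suppressor effect, which is perhaps the closest standard notion to the "cancellation of errors" described in the question. With more than two descriptors the idea generalizes (compare the joint R^2, or better Q^2, with those of the subsets), but the closed-form expression above no longer applies.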
Thoughts, ideas, leads, comments, etc. are much appreciated here.
Regards,
Jim Metz
James T. Metz, Ph.D.
Research Investigator Chemist
GPRD R46Y AP10-2
Abbott Laboratories
100 Abbott Park Road
Abbott Park, IL 60064-6100
U.S.A.
Office (847) 936-0441
FAX (847) 935-0548
james.metz(at)abbott.com
Received on 2003-10-24 - 08:32 GMT