RE: QSAR - How to statistically determine when variables are "working well together"

From: Stefan Dove <stefan.dove*chemie.uni-regensburg.de>
Date: Tue, 28 Oct 2003 11:42:14 +0100

Dear all,

let me add the following comment in reply to Hugo Kubinyi:

How can we derive models that are as simple as possible (and as complex as
necessary)? Selecting e x c l u s i v e l y those variables that explain
the data well as single variables is usually not appropriate, but every
variable selection must include these variables. Checking all
three-variable combinations, as recommended by Hugo, will commonly retain
such critical descriptors. The cited example, however, refers to just the
case where selecting only X-4 is at least not "nonsense". Trivially, if a
single variable like X-4 explains m u c h of the data, combining it
w i t h other variables will not improve the fit or the prediction; in
the case of multicollinearity, however, a combination o f other variables
may reproduce the effect of the single variable (Table 7 in the cited
article is a nice example of the influence of this rule in PLS).
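
To make the two points above concrete, here is a minimal sketch in Python
(NumPy only). Everything in it is an assumption made for illustration: the
data are synthetic, the descriptor names X-1 to X-5 are hypothetical (only
"X-4" echoes the cited example), X-4 is constructed as nearly X-1 + X-2 to
mimic multicollinearity, and plain R^2 from ordinary least squares stands in
for whatever fit or prediction criterion one would really use. It is not the
procedure of the cited article.

```python
from itertools import combinations

import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()


def rank_subsets(X, y, names, size):
    """Fit every descriptor subset of the given size; return (R^2, names) sorted best first."""
    results = []
    for idx in combinations(range(X.shape[1]), size):
        cols = list(idx)
        results.append((r_squared(X[:, cols], y), tuple(names[i] for i in cols)))
    return sorted(results, reverse=True)


# Synthetic data (an assumption, not data from the thread).
rng = np.random.default_rng(0)
n = 40
x1, x2, x3, x5 = rng.standard_normal((4, n))
x4 = x1 + x2 + 0.1 * rng.standard_normal(n)   # X-4 is almost collinear with X-1 + X-2
y = 2.0 * x4 + 0.3 * rng.standard_normal(n)   # the activity depends on X-4 only

names = ["X-1", "X-2", "X-3", "X-4", "X-5"]
X = np.column_stack([x1, x2, x3, x4, x5])

singles = rank_subsets(X, y, names, 1)
triples = rank_subsets(X, y, names, 3)
# Best three-variable model that does NOT contain X-4.
without_x4 = next(r for r in triples if "X-4" not in r[1])

print("best single variable:", singles[0])
print("best triple overall: ", triples[0])
print("best triple w/o X-4: ", without_x4)
```

With data built this way, the single variable X-4 already explains most of
the variance, adding further variables changes R^2 very little, and a
triple containing X-1 and X-2 fits almost as well without X-4 at all. Note
that plain R^2 can never decrease when variables are added, so a real
comparison would use a cross-validated or otherwise penalized criterion.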

It may be that I always overemphasize the goal of obtaining more
transparent results with interpretable models and therefore favor strict
variable selection. As a referee I have often had to deal with manuscripts
investigating the correlation of a huge number of topological
descriptors with biological or chemical data, yet in some cases a
simple inspection of a table showed that a single variable explained
most of the SAR by discriminating between discrete structural features.
Today, with the Internet, fast computers and easily available software,
it is far too easy even for inexperienced users to obtain large
descriptor sets. Therefore we cannot call often enough for chemical
and pharmacological transparency of our results.

Best regards,

Stefan.

  
Prof. Dr. Stefan Dove
Univ. Regensburg
Inst. Pharmazie
93040 Regensburg
Germany
Tel. +49 941/943/4673
FAX +49 941/943/4820
EMail: Stefan.Dove^^chemie.uni-regensburg.de
Received on 2003-10-28 - 08:08 GMT