Re: QSAR - Rank order of activity data

From: Bob Clark <bclark++tripos.com>
Date: Fri, 20 Jun 2003 08:39:26 -0500

Jim,

The "artificial" differences in rank are, by definition, random perturbations to
your data and are allowed for to some degree in most non-parametric analysis
methods. You can take further advantage of this "problem", however, by creating
several response rank vectors, generating a model for each, and then combining
models so as to get a perturbation analysis. To get the perturbed response
vectors, run a standard analysis to find the least significant difference and
then randomly swap ranks for some of the pairs that are closer together than
that (you can use the number of such semi-ties as a guide to how many to swap
within each vector*). Then the variance of estimates from the family of models
you generate will be an estimate of the variance of prediction.
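
A minimal sketch of the rank-swapping step described above, in Python. It
assumes the least significant difference (lsd) has already been obtained from
a standard analysis. The function names, the swap_fraction parameter, and the
choice to swap each compound at most once per vector are illustrative
assumptions, not part of the original procedure.

```python
import random
import statistics
from itertools import combinations

def naive_ranks(activity):
    """Rank 0 = lowest value; exact ties are broken by input order."""
    order = sorted(range(len(activity)), key=lambda k: activity[k])
    ranks = [0] * len(activity)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks

def perturbed_rank_vectors(activity, lsd, n_vectors=20, swap_fraction=0.5, seed=0):
    """Return the naive ranks plus n_vectors perturbed rank vectors.

    lsd is assumed to come from a prior standard analysis. swap_fraction
    (hypothetical knob) sets how many semi-tied pairs are swapped per vector.
    """
    rng = random.Random(seed)
    base = naive_ranks(activity)
    # Semi-ties: pairs of compounds closer together than the LSD.
    semi_ties = [(a, b) for a, b in combinations(range(len(activity)), 2)
                 if abs(activity[a] - activity[b]) < lsd]
    n_swaps = round(swap_fraction * len(semi_ties))
    vectors = []
    for _ in range(n_vectors):
        ranks = list(base)
        used = set()   # each compound is swapped at most once per vector
        done = 0
        for a, b in rng.sample(semi_ties, len(semi_ties)):
            if done >= n_swaps:
                break
            if a in used or b in used:
                continue
            ranks[a], ranks[b] = ranks[b], ranks[a]
            used.update((a, b))
            done += 1
        vectors.append(ranks)
    return base, vectors

def prediction_variance(predictions_by_model):
    """Per-compound sample variance across the family of models.

    predictions_by_model: one list of predictions per fitted model.
    """
    return [statistics.variance(col) for col in zip(*predictions_by_model)]
```

Each perturbed vector would then be used as the response for one model, with
the model-fitting step itself left to whatever method is already in use;
prediction_variance takes the resulting predictions, one row per model.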

This would constitute a nice variation on the progressive scrambling approach I
have been developing over the last several years.

Bob Clark
Tripos, Inc.

*As a sanity check: if the predictions and statistics from the naive
(unperturbed) ranks fall at the extreme of the distributions of predictions and
statistics from the perturbed ones, you have scrambled too much, thereby
introducing too much noise.
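
One way to make this check concrete, sketched in Python: locate the naive
value within the distribution obtained from the perturbed models. The function
name and the example numbers in the comments are hypothetical, and any
threshold for "extreme" is a judgment call.

```python
def tail_position(naive_value, perturbed_values):
    """Fraction of perturbed values lying beyond the naive one, on the
    thinner side. A result at or near 0 means the naive value sits at an
    extreme of the perturbed distribution.
    """
    n = len(perturbed_values)
    below = sum(v < naive_value for v in perturbed_values)
    above = sum(v > naive_value for v in perturbed_values)
    return min(below, above) / n

# Hypothetical statistics (e.g. a fit statistic) from five perturbed models:
perturbed_stats = [0.50, 0.55, 0.60, 0.52, 0.58]
inside = tail_position(0.56, perturbed_stats)   # naive value within the spread
outside = tail_position(0.90, perturbed_stats)  # naive value beyond every perturbed one
```

The same function can be applied to a single compound's prediction as well as
to a model statistic.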

james.metz%x%abbott.com wrote:
>
> QSAR Society Colleagues,
>
> I would like to convert biological activity or molecular property data
> to a rank order e.g., convert IC50 or p(IC50) data to a
> statistically meaningful rank order.
>
> This may sound like a problem with a simple answer, but apparently is
> not for several reasons:
>
> 1) The trivial approach of simply sorting compounds by p(IC50) and
> then assigning a rank leads to artificial rank distinctions among
> compounds that, given the inherent experimental variability of the data, are
> probably not justified.
>
> 2) If one had replicates for every IC50, one could attempt to do
> T-tests between every IC50, and assign a new rank if the statistic exceeds the
> 95% confidence level. However, I do not have replicates for every measured
> IC50 value.
>
> 3) If one attempts to normalize the IC50 distribution and then assign
> ranks, perhaps, on the basis of mean +/- n*sigma values, this
> creates "somewhat" artificial ranks where some compounds may have been right
> at the border of the mean + sigma, or mean + 2*sigma, etc.
>
> If anyone has ideas, experience, or suggestions on this topic, I
> welcome your comments !
>
> Regards,
> Jim Metz
>
> James T. Metz, Ph.D.
> Research Investigator Chemist
>
> GPRD R46Y AP10-2
> Abbott Laboratories
> 100 Abbott Park Road
> Abbott Park, IL 60064-6100
> U.S.A.
>
> Office (847) 936 - 0441
> FAX (847) 935 - 0548
>
> james.metz-.-abbott.com


Received on 2003-06-20 - 03:50 GMT
