RE: QSAR - Sparse or Uneven Biological Data - What to do?

From: Mark Earll <mark.earll]^[umetrics.co.uk>
Date: Mon, 8 Mar 2004 12:20:00 -0000

James and John, and QSAR members,
 
This is a very common problem in drug design: no one wants to make or test
compounds likely to have lower primary activity, even though those compounds
may have better ADMET properties! Statistical Molecular Design (SMD), where
small candidate sets are chosen with maximum chemical diversity in mind, is
one way to ensure there is enough variation in the data to build reliable
models. It does, however, take a lot of courage to suggest! The benefit of
this approach is the ability to optimise not only the primary activity but
also a range of other properties that may affect the drug-likeness of the
molecules.
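
To make the "maximum chemical diversity" idea concrete, here is a minimal
sketch of a greedy MaxMin selection. This is a simple stand-in for
illustration, not the SMD procedure itself; the descriptor values and the
function name maxmin_select are hypothetical, and in practice one would
probably work with scaled descriptors or principal component scores.

```python
import math

def maxmin_select(descriptors, k, start=0):
    """Greedy MaxMin pick: repeatedly add the candidate that is farthest
    from its nearest already-chosen compound (Euclidean distance)."""
    chosen = [start]
    # Distance from every candidate to the nearest chosen compound so far.
    nearest = [math.dist(d, descriptors[start]) for d in descriptors]
    while len(chosen) < min(k, len(descriptors)):
        nxt = max(range(len(descriptors)), key=lambda i: nearest[i])
        if nearest[nxt] == 0.0:
            break  # only duplicates of chosen compounds remain
        chosen.append(nxt)
        nearest = [min(nearest[i], math.dist(descriptors[i], descriptors[nxt]))
                   for i in range(len(descriptors))]
    return chosen

# Hypothetical candidates described by two made-up descriptors.
candidates = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
              (5.0, 5.0), (5.0, 0.0), (0.0, 5.0)]
picked = maxmin_select(candidates, 3)
print(picked)
```

The near-duplicates of the starting compound are skipped in favour of
compounds in unexplored regions of descriptor space, which is the variation
a model needs.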
 
More information about SMD may be found at the "homepage of chemometrics"
editorials Nov 2002-April 2003 starting at
http://www.acc.umu.se/%7Etnkjtg/chemometrics/editorial/nov2002.html
 
Best Regards,
 
Mark
-- --------------------------------------------------------------------
Mark Earll CChem MRSC Umetrics
Senior Consultant (Scientific Data Analysis)
Umetrics UK Ltd
Woodside House, Woodside Road,
Winkfield, Windsor, SL4 2DX

Phone: 01344 885615 Mobile: 07765 402673
Email: mark.earll##umetrics.co.uk
Fax: 01344 885410
Web: http://www.umetrics.com
----------------------------------------------------------------------------

 

-----Original Message-----
From: Dearden, John [mailto:J.C.Dearden/a\livjm.ac.uk]
Sent: 01 March 2004 09:27
To: qsar_society(-)accelrys.com
Subject: RE: QSAR - Sparse or Uneven Biological Data - What to do?

James:
 
There is what seems to me, as an academic, a relatively cheap answer to the
problem of unevenly distributed data in drug design, namely to persuade your
organic chemists to make a few compounds that are predicted from your
initial QSAR (on your poorly distributed data) to have intermediate
activity. I'm sure, however, that many of you in industry will have reasons
why this is not easy or relatively cheap, such as: why should we make a
compound that is predicted not to have sufficiently high activity to be a
candidate drug?
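
One way to shortlist such compounds is sketched below, under the assumption
that the initial QSAR already gives a predicted pIC50 for each candidate.
The function name fill_activity_gap, the candidate labels and all numbers
are hypothetical. The sketch finds the widest empty interval in the measured
activities and returns the candidates predicted to fall nearest its middle.

```python
def fill_activity_gap(observed, predicted, n_pick):
    """Return up to n_pick candidate names whose predicted activity lies
    inside the widest gap between consecutive observed activities."""
    obs = sorted(observed)
    # Widest empty interval between neighbouring observed values.
    lo, hi = max(zip(obs, obs[1:]), key=lambda pair: pair[1] - pair[0])
    mid = (lo + hi) / 2
    inside = [(name, p) for name, p in predicted.items() if lo < p < hi]
    # Prefer candidates predicted closest to the middle of the gap.
    inside.sort(key=lambda item: abs(item[1] - mid))
    return [name for name, _ in inside[:n_pick]]

# Hypothetical measured pIC50 values: mostly inactive, two actives.
observed_pic50 = [4.0, 4.0, 4.1, 4.0, 6.2, 6.5]
# Hypothetical predictions from an initial QSAR model.
predicted_pic50 = {"cand_A": 4.05, "cand_B": 5.1, "cand_C": 5.6, "cand_D": 6.8}
suggested = fill_activity_gap(observed_pic50, predicted_pic50, 2)
print(suggested)
```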
 
John Dearden

-----Original Message-----
From: qsar_society-admin-,-accelrys.com
[mailto:qsar_society-admin=accelrys.com]On Behalf Of james.metz-$-abbott.com
Sent: 27 February 2004 17:30
To: qsar_society(-)accelrys.com
Cc: james.metz(0)abbott.com
Subject: QSAR - Sparse or Uneven Biological Data - What to do?

QSAR Society Colleagues,

        I have a general question concerning the unfortunate, yet common,
problem of sparse, unevenly distributed biological data that one often
obtains, especially during the early phases of discovery research programs
in the pharmaceutical industry. In the later (or end) phases of a program
there is typically a lot of data, and one may be fortunate enough to have
nice data sets in which compound activities are spread out over (hopefully)
a few orders of magnitude. But if one builds (predictive) models near the
end of the program, there is very little chance of having a significant
impact in terms of suggesting compounds the chemists should make or warning
against compounds they should not. Of course, the analysis may contribute to
a nice after-the-fact publication in J. Med. Chem., but .... ahem .... who
cares? (other than adding another publication to my CV).

        For example, perhaps 100 compounds are tested in an assay, of which
perhaps 95 are "dead", meaning high IC50 values (maybe >100 uM or so), and
only 5 have "interesting" activities, with IC50s in the 1-10 uM range or
perhaps < 1 uM.

        To clarify the problem further, let us also assume that one does NOT
have X-ray structures of the 5 compounds bound to a target, so this is NOT
simply a matter of figuring out which pocket or region of an active site or
receptor the chemists have not exploited very well.

        In other words... this is more of a ligand-based structure-activity
problem.

        OK, so now what do you decide to do?

        Quit and move on to the next project, stick your neck out and try to
build models, or ....

        One idea that has been kicked around goes something like this:
"Since I really only care about the active compounds, why not
pay MORE attention to them?"

        Translated into statistical terms, this might mean giving my 5 most
active compounds a higher weight instead of weighting all compounds evenly.
Or it might mean throwing out some compounds near the mean value (high IC50
in this case), since they (seemingly?) are not contributing much
"information."
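
As a rough illustration of what re-weighting does, the sketch below fits a
weighted least-squares line to a single made-up descriptor. The data, the
weight of 5 and the function name weighted_line_fit are assumptions chosen
for illustration only; a real QSAR model would use many descriptors, and
throwing a compound out corresponds to giving it a weight of zero.

```python
def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = slope * x + intercept,
    using the closed form for a single descriptor."""
    sw = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

# Hypothetical data: four "dead" compounds and one active one.
descriptor = [0.0, 1.0, 2.0, 3.0, 4.0]
pic50 = [4.0, 4.0, 4.0, 4.0, 6.0]

equal = weighted_line_fit(descriptor, pic50, [1, 1, 1, 1, 1])
upweighted = weighted_line_fit(descriptor, pic50, [1, 1, 1, 1, 5])
print("equal weights:     slope=%.3f intercept=%.3f" % equal)
print("active upweighted: slope=%.3f intercept=%.3f" % upweighted)
```

Up-weighting the active compound pulls the fitted line toward it, but it
adds no new measurements, so the model still rests on a single active data
point.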

        I can see pros and cons with butchering or artificially modifying
the data set, hence I do not see a clear answer.

        So.... Does anyone have any thoughts on this approach, or perhaps
other ideas about dealing with this general problem of poorly
distributed biological data?

        
        Best Regards,
        Jim Metz

James T. Metz, Ph.D.
Research Investigator Chemist

GPRD R46Y AP10-2
Abbott Laboratories
100 Abbott Park Road
Abbott Park, IL 60064-6100
U.S.A.

Office (847) 936 - 0441
FAX (847) 935 - 0548

james.metz:abbott.com
Received on 2004-03-08 - 09:20 GMT
