RE: QSAR - Modeling biological data

From: Mark Earll <mark.earll#umetrics.co.uk>
Date: Mon, 17 Feb 2003 10:57:57 -0000

Dear Suzanne,

This is a commonly encountered problem in QSAR. I would agree with Jarmo
that the most reliable models will come from exact values; however, this
often results in small datasets. The way we have handled this type of
information when building multivariate models is to set all values beyond
a detection limit to an arbitrary low or high value: usually something like
half the lower detection limit for values below it, or twice the upper
limit for values above it. Alternatively, if you are using PLS and you
don't have many of these values, you can set them to 'missing'.
Obviously the value of the models produced must then be tested by external
validation (not always easy with QSAR data).
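
As a rough illustration of the two options above, here is a minimal Python
sketch. The function name encode_activity, the strategy labels and the
example values are hypothetical, chosen only for illustration; the factors
of two and one half are the arbitrary choices mentioned above, not fixed
rules.

```python
import math


def encode_activity(raw, strategy="substitute"):
    """Turn a reported value such as '64', '>128' or '<2' into a float.

    strategy="substitute": twice an upper limit, half a lower limit
                           (arbitrary factors, as discussed in the text).
    strategy="missing":    return NaN so the value is treated as missing.
    """
    text = raw.strip()
    if text.startswith(">") or text.startswith("<"):
        limit = float(text[1:])
        if strategy == "missing":
            return math.nan
        if strategy == "substitute":
            # '>' means above the upper limit, '<' below the lower limit
            return 2.0 * limit if text[0] == ">" else 0.5 * limit
        raise ValueError("unknown strategy: " + strategy)
    # Exact measurement: use it unchanged
    return float(text)


# Hypothetical reported activities in mg/ml
reported = ["2", "16", "64", ">128"]

substituted = [encode_activity(v) for v in reported]
as_missing = [encode_activity(v, strategy="missing") for v in reported]
```

With the "missing" option the NaN entries still have to be handled by the
modelling software; whether and how a given PLS implementation accepts
missing values needs to be checked for that package.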

Mark

--
----------------------------------------------------------------------------
Mark Earll CChem MRSC 	       Umetrics 
Senior Consultant	         (Scientific Data Analysis)
Umetrics UK Ltd                    
Woodside House, Woodside Road, 
Winkfield, Windsor, SL4 2DX
Phone:  01344 885615         Mobile: 07765 402673
Email:	 mark.earll#,#umetrics.co.uk  
Fax:       01344 885410     
Web:	 http://www.umetrics.com
----------------------------------------------------------------------------
-----Original Message-----
From: jjhuusko.:.mappi.helsinki.fi [mailto:jjhuusko^mappi.helsinki.fi]
Sent: 08 February 2003 11:13
To: qsar_society|accelrys.com
Subject: Re: QSAR - Modeling biological data
---
Dear Suzanne,
Your question is justified. It seems that experimental values over
128 mg/ml also indicate a low activity response? In QSARs only exact values
for activity should be used, hence I prefer to exclude the compounds
which do not have these values. Of course, some discussion is needed of the
reasons why there are compounds which show low activity (in this case
there might also be some analytical problems, like low solubility, etc.).
With all the best,
Jarmo
> Hello,
> I would like to develop a QSAR model using calculated descriptors as
> well as experimental values. The problem with the experimental values is
> the range. They are from 2 to 128 mg/ml. But some could not be
> determined and they are said to be > 128 mg/ml. How can I take into
> account in my training set the data that are greater than this limit?
> 
> Thank you for your help
> 
> 
> Suzanne Sirois, Ph.D
> Cheminformatics, Computational Chemistry
> 
> _______________________________________________
> qsar_society mailing list
> qsar_society_-_accelrys.com
> http://ftp2.accelrys.com/mailman/listinfo/qsar_society
> 
Received on 2003-02-17 - 06:35 GMT

This archive was generated by hypermail 2.2.0 : 2005-11-24 - 10:21 GMT