Dear Suzanne,
This is a commonly encountered problem in QSAR. I agree with Jarmo
that the most reliable models will come from exact values; however, this often
results in small datasets. The way we have handled this type of information
when building multivariate models is to set all values above or below the
detection limits to an arbitrary high or low value, usually something like
half the lower detection limit or twice the upper. Alternatively, if you are
using PLS and don't have many of these values, you can set them to 'missing'.
Obviously, the value of the models produced must then be tested by external
validation (not always easy with QSAR data).
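
As a rough illustration of the two options above, here is a minimal Python sketch. The function name encode_censored, the "> 128" / "< 2" string format, and the use of NaN to stand for 'missing' are assumptions made for illustration only; the limits of 2 and 128 come from the range quoted in the original question, and the factors of one half and two are just the arbitrary choices mentioned above.

```python
import math


def encode_censored(raw, lower_limit, upper_limit, mode="substitute"):
    """Convert one reported activity string into a number for modelling.

    raw          -- e.g. "16", "> 128" or "< 2" (hypothetical input format)
    mode         -- "substitute": half the lower limit / twice the upper limit
                    "missing":    censored values become NaN
    """
    if mode not in ("substitute", "missing"):
        raise ValueError("mode must be 'substitute' or 'missing'")

    text = raw.strip()
    if text.startswith(">") or text.startswith("<"):
        # Censored value: only a bound is known, not the exact number.
        if mode == "missing":
            return math.nan
        if text[0] == ">":
            return upper_limit * 2.0
        return lower_limit / 2.0

    # Exact value: use it as reported.
    return float(text)


# Illustrative (made-up) activities in mg/ml.
reported = ["2", "16", "> 128", "64", "< 2"]

substituted = [encode_censored(v, 2.0, 128.0) for v in reported]
as_missing = [encode_censored(v, 2.0, 128.0, mode="missing") for v in reported]
```

Whether NaN is actually treated as a missing value depends on the PLS software being used, so check this before relying on the second option.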
Mark
--
----------------------------------------------------------------------------
Mark Earll CChem MRSC
Umetrics Senior Consultant (Scientific Data Analysis)
Umetrics UK Ltd
Woodside House, Woodside Road, Winkfield, Windsor, SL4 2DX
Phone: 01344 885615
Mobile: 07765 402673
Email: mark.earll#,#umetrics.co.uk
Fax: 01344 885410
Web: http://www.umetrics.com
----------------------------------------------------------------------------

-----Original Message-----
From: jjhuusko.:.mappi.helsinki.fi [mailto:jjhuusko^mappi.helsinki.fi]
Sent: 08 February 2003 11:13
To: qsar_society|accelrys.com
Subject: Re: QSAR - Modeling biological data

NB: Unless you reset the To: line, your reply goes to the entire list
---
Dear Suzanne,

Your question is justified. It seems that experimental values over
128 mg/ml also indicate a low activity response? In QSARs only exact values
for activity should be used, hence I prefer to exclude the compounds which
do not have these values. Of course, some discussion is needed of the
reasons why there are compounds which show low activity (in this case there
might also be some analytical problems, like low solubility etc).

With all the best,
Jarmo

> Hello,
> I would like to develop a QSAR model using calculated descriptors as
> well as experimental values. The problem with the experimental values is
> the range. They are from 2 to 128 mg/ml. But some could not be
> determined and they are said to be > 128 mg/ml. How can I take into
> account, in my training set, data that are reported only as "greater than"?
>
> Thank you for your help
>
>
> Suzanne Sirois, Ph.D
> Cheminformatics, Computational Chemistry
>
> _______________________________________________
> qsar_society mailing list
> qsar_society_-_accelrys.com
> http://ftp2.accelrys.com/mailman/listinfo/qsar_society

Received on 2003-02-17 - 06:35 GMT
This archive was generated by hypermail 2.2.0 : 2005-11-24 - 10:21 GMT