CCL: What do cheminformaticists do with inconsistently measured data?

David, I can see two approaches. The first is to treat density as a function of temperature and determine the constants of that function as part of your fit. This lets you verify the fitted temperature dependence afterwards against known cases, but you could run into problems. For example, for some solvents (e.g., water) density does not vary linearly with temperature.
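A minimal sketch of the first approach, assuming a linear temperature term with one slope shared by all compounds and a single structural descriptor (carbon count). The data are synthetic, generated from made-up coefficients only so the fit can be checked; the names design_matrix, TRUE_COEF and density_at_25 are hypothetical.

```python
import numpy as np

# Synthetic illustration only: these are NOT measured densities.
n_carbons = np.array([5, 6, 7, 8, 5, 6], dtype=float)
temp_c = np.array([20, 25, 20, 25, 25, 20], dtype=float)

# Made-up coefficients: intercept, per-carbon term, temperature slope.
TRUE_COEF = np.array([0.60, 0.010, -0.0009])


def design_matrix(n_carbons, temp_c, t_ref=25.0):
    """Columns: constant, descriptor, temperature offset from t_ref."""
    return np.column_stack(
        [np.ones_like(n_carbons), n_carbons, temp_c - t_ref]
    )


X = design_matrix(n_carbons, temp_c)
density = X @ TRUE_COEF  # noise-free synthetic "measurements"

# The temperature slope is fitted together with the structural term.
coef, *_ = np.linalg.lstsq(X, density, rcond=None)

# Post-fit check: refer every value to 25 °C using the fitted slope.
density_at_25 = density - coef[2] * (temp_c - 25.0)
```

With real data the fitted slope can be compared against compounds measured at both temperatures. The linear term is the assumption to watch; it would need replacing for liquids with a nonlinear density curve.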

The second approach is to fit not the densities themselves but density differences measured under the same conditions. This lets you divide the data into independent sets that are fit simultaneously.
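One way to read the second approach, as a rough sketch: form pairwise differences only between compounds measured at the same temperature, pool the differences from all temperature groups, and fit the structural term on the pooled set. This assumes the temperature effect is the same additive offset for every compound in a group, so it cancels in each difference. The data are again synthetic, and same_condition_differences is a hypothetical helper.

```python
from itertools import combinations

import numpy as np


def same_condition_differences(descriptor, density, temp_c):
    """Pairwise differences within each temperature group."""
    d_desc, d_rho = [], []
    for t in np.unique(temp_c):
        idx = np.flatnonzero(temp_c == t)
        for i, j in combinations(idx, 2):
            d_desc.append(descriptor[i] - descriptor[j])
            d_rho.append(density[i] - density[j])
    return np.array(d_desc), np.array(d_rho)


# Synthetic illustration only: these are NOT measured densities.
n_carbons = np.array([5, 6, 7, 8, 5, 6], dtype=float)
temp_c = np.array([20, 25, 20, 25, 25, 20], dtype=float)
density = 0.60 + 0.010 * n_carbons - 0.0009 * (temp_c - 25.0)

d_n, d_rho = same_condition_differences(n_carbons, density, temp_c)

# No intercept: a difference of zero in the descriptor implies a
# difference of zero in density under this model.
slope, *_ = np.linalg.lstsq(d_n[:, None], d_rho, rcond=None)
```

The fit never sees the temperature term, so no correction constant is needed. The price is that absolute densities are only recovered up to one offset per temperature group.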



On 22 Apr 2017, at 02:14, David Shobe <owner-chemistry[ AT ]> wrote:

Please excuse crossposting.

For example, if one is doing a QSPR (quantitative structure-property relationship) study of densities of alkanes, and encounters the problem that some densities are measured at 20°C and others at 25°C, how should one handle the inconsistency of measurement conditions? Note that the difference in density for the same alkane between 20°C and 25°C might be significant in comparison to the difference in density between two isomeric alkanes at the same temperature. Is it legitimate to try to correct/standardize the 20°C densities to 25°C densities by subtracting some constant from the 20°C densities, or dividing them by some constant? And if so, how does one determine that constant? Are there other approaches one can use?

--David Shobe
