CCL: Original reference for Tanimoto similarity



 Sent to CCL by: "Andreas Bender, PhD" [Andreas.Bender,+,cantab.net]
 Hi Willem,
 As far as I am aware (I was revising the area quite a lot for my
 recent PhD thesis), what we now call the 'Tanimoto' coefficient was
 originally published by Paul Jaccard in 1901 with the intention of
 comparing ecosystems: more precisely, the types of flowers present in
 the basin of a certain river with those present in
 adjacent regions [1]. Accordingly, in some publications the coefficient
 is referred to as the 'Jaccard-Tanimoto' coefficient (or simply the
 Jaccard coefficient), and this is the earliest reference I could find
 that employs what we now call the 'Tanimoto' coefficient.
 As for the first reference by Taffee Tanimoto himself, usually an
 internal report at IBM is given [2] (such as in the overview of
 similarity coefficients at Daylight at
 http://www.daylight.com/dayhtml/doc/theory/theory.finger.html),
 but I couldn't get hold of this paper yet.
 Closely related, and I found that really interesting to read, was a
 paper by Amos Tversky on the human perception of similarities based on
 individual features [3]. In this work Tversky also provides a
 theoretical foundation for more unusual similarity coefficients such
 as asymmetrical coefficients, all based on human psychology but also
 with implications for molecular similarity searching. Since molecular
 similarity (as well as any other similarity) is a highly subjective
 area, the Tversky work is of course not able to "solve" problems
 associated with feature-based similarities, but it outlines the
 implicit assumptions behind similarity measured via features and
 coefficients in a very clear manner, so I can recommend that work to
 anyone interested in similarity measures. If you can't get hold of the
 journal, please just contact me and I would be very happy to
 provide you with a reprint.
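 To make the relationship between the symmetric and asymmetric
 coefficients concrete, here is a minimal sketch in Python. It is an
 illustration only, not code from any of the cited papers. It assumes
 each object is described by a plain set of features; the function
 names, the example feature sets and the value returned for two empty
 sets are all assumptions made for this example. The Tversky index is
 written in its usual ratio form with weights alpha and beta; with
 alpha = beta = 1 it reduces to the Jaccard/Tanimoto coefficient.

```python
def tanimoto(a: set, b: set) -> float:
    """Jaccard/Tanimoto coefficient: |A & B| / |A | B|."""
    union = len(a | b)
    if union == 0:
        return 1.0  # assumption: two empty feature sets count as identical
    return len(a & b) / union


def tversky(a: set, b: set, alpha: float, beta: float) -> float:
    """Tversky index: c / (c + alpha * |A - B| + beta * |B - A|), c = |A & B|."""
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    if denom == 0:
        return 1.0  # assumption: same convention as in tanimoto()
    return common / denom


# Hypothetical feature sets (e.g. fragment labels of two molecules).
query = {"a", "b", "c"}
target = {"b", "c", "d", "e"}

print(tanimoto(query, target))            # symmetric
print(tversky(query, target, 1.0, 1.0))   # same value as Tanimoto
print(tversky(query, target, 1.0, 0.0))   # penalises only features unique to query
print(tversky(target, query, 1.0, 0.0))   # swapping the arguments changes the result
```

 With alpha = 1 and beta = 0 only the features unique to the first
 argument are penalised, so the score depends on which object is
 treated as the reference. That is the asymmetry referred to above.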
 All the best,
 Andreas
 [1] P. Jaccard, 1901, Distribution de la flore alpine dans le bassin
 des Dranses et dans quelques régions voisines. Bulletin de la Société
 Vaudoise des Sciences Naturelles 37, 241-272.
 [2] T.T. Tanimoto, 1957, IBM Internal Report 17th Nov.
 [3] A. Tversky, 1977, Features of similarity. Psychological Review,
 84(4), 327-352
 --
 Andreas Bender, PhD, Assistant Professor for Cheminformatics
 Leiden / Amsterdam Center for Drug Research
 Pharma-IT Platform: http://www.pharma-it.net
 Division of Medicinal Chemistry: http://www.medchem.leidenuniv.nl
 Personal Homepage: http://www.andreasbender.de
 On Sat, Apr 5, 2008 at 11:58 PM, Willem van Hoorn
 willem.van.hoorn^pfizer.com <owner-chemistry=-Ìl.net> wrote:
 >
 >  Sent to CCL by: "Willem van Hoorn" [willem.van.hoorn~!~pfizer.com]
 >  Hello,
 >
 >  The Tanimoto coefficient is bread and butter for 2D chemical similarity
 >  searches, but does anyone know the original publication? The reference below is
 >  as close as I could get.
 >
 >  A Computer Program for Classifying Plants
 >  David J. Rogers and Taffee T. Tanimoto
 >  http://www.sciencemag.org/cgi/content/citation/132/3434/1115
 >
 >  Thanks,
 >
 >  Willem van Hoorn