CCL: Original reference for Tanimoto similarity
- From: "Andreas Bender, PhD" <Andreas.Bender===cantab.net>
- Subject: CCL: Original reference for Tanimoto similarity
- Date: Sun, 6 Apr 2008 09:33:41 +0200
Sent to CCL by: "Andreas Bender, PhD" [Andreas.Bender,+,cantab.net]
Hi Willem,
As far as I am aware (I reviewed this area extensively for my
recent PhD thesis), what we now call the 'Tanimoto' coefficient was
originally published by Paul Jaccard in 1901 to compare ecosystems;
more precisely, the types of flowers present in the basin of a
certain river with those present in adjacent regions [1].
Accordingly, some publications refer to the coefficient as the
'Jaccard-Tanimoto' coefficient (or simply the Jaccard coefficient).
This is the earliest reference I could find that employs what we
now call the 'Tanimoto' coefficient.
As for the first reference by Taffee Tanimoto himself, an internal
IBM report is usually cited [2] (for example, in the overview of
similarity coefficients at Daylight,
http://www.daylight.com/dayhtml/doc/theory/theory.finger.html),
but I have not been able to get hold of this report yet.
Closely related, and really interesting to read, is a paper by Amos
Tversky on the human perception of similarity based on individual
features [3]. In this work Tversky also provides a theoretical
foundation for more unusual similarity coefficients, such as
asymmetric coefficients, all grounded in human psychology but with
implications for molecular similarity searching. Since molecular
similarity (like any other similarity) is a highly subjective area,
Tversky's work cannot "solve" the problems associated with
feature-based similarities, but it outlines the implicit assumptions
behind similarity measured via features and coefficients very
clearly, so I can recommend it to anyone interested in similarity
measures. If you can't get hold of the journal, please just contact
me and I would be very happy to provide you with a reprint.
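To make the relation between the two coefficients concrete, here is a
minimal sketch in Python, assuming each object is described by a plain
set of features. The function names and the feature sets are
hypothetical and chosen only for illustration. The Tversky function
follows the ratio model of [3]; the value returned for two empty sets
is an assumed convention.

```python
def tanimoto(a: set, b: set) -> float:
    """Jaccard/Tanimoto coefficient: shared features over all features."""
    union = len(a | b)
    # Assumed convention: two empty feature sets count as identical.
    return len(a & b) / union if union else 1.0


def tversky(a: set, b: set, alpha: float, beta: float) -> float:
    """Tversky ratio-model index; asymmetric whenever alpha != beta."""
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    # Same assumed convention for a zero denominator.
    return common / denom if denom else 1.0


# Hypothetical feature sets, e.g. substructure keys found in two molecules.
query = {"ring", "OH", "NH2"}
target = {"ring", "OH", "NH2", "Cl", "COOH"}

print(tanimoto(query, target))          # symmetric
print(tversky(query, target, 1, 1))     # alpha = beta = 1 reduces to Tanimoto
print(tversky(query, target, 1, 0))     # penalises only features unique to query
print(tversky(target, query, 1, 0))     # swapping the arguments changes the score
```

With alpha = 1 and beta = 0 the score asks "how much of the first
object is contained in the second", which is why swapping the
arguments gives a different value.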
All the best,
Andreas
[1] P. Jaccard, 1901, Distribution de la flore alpine dans le bassin
des Dranses et dans quelques régions voisines. Bulletin de la Société
Vaudoise des Sciences Naturelles, 37, 241-272.
[2] T.T. Tanimoto, 1957, IBM Internal Report 17th Nov.
[3] A. Tversky, 1977, Features of similarity. Psychological Review,
84(4), 327-352.
--
Andreas Bender, PhD, Assistant Professor for Cheminformatics
Leiden / Amsterdam Center for Drug Research
Pharma-IT Platform: http://www.pharma-it.net
Division of Medicinal Chemistry: http://www.medchem.leidenuniv.nl
Personal Homepage: http://www.andreasbender.de
On Sat, Apr 5, 2008 at 11:58 PM, Willem van Hoorn
willem.van.hoorn^pfizer.com <owner-chemistry=-=ccl.net> wrote:
>
> Sent to CCL by: "Willem van Hoorn" [willem.van.hoorn~!~pfizer.com]
> Hello,
>
> The Tanimoto coefficient is bread and butter for 2D chemical similarity
> searches but does anyone know the original publication? The reference
> below is as close as I could get.
>
> A Computer Program for Classifying Plants
> David J. Rogers and Taffee T. Tanimoto
> http://www.sciencemag.org/cgi/content/citation/132/3434/1115
>
> Thanks,
>
> Willem van Hoorn