# Re: CCL:Clarification on trivia question

```
On Apr 24,  3:37pm, margot wrote:
> Subject: CCL:Clarification on trivia question
> According to "The Name Game" by A. Nickon and E. F. Silversmith,
> Pergamon 1987, p. 149
>
> a compound containing 10 elements: C60H78Br2CdCl2I2N16O2P2W2
...
> 	Have fun,
> 			margot
OK, Margot, you're on.  Here's my idea of fun; YMMV:
Let's see;  the cited compound has 167 atoms divided among 10 atom types.
Certainly this would not be as impressive as a molecule that has 10
atoms, all of different types.  But is it as impressive as, say, formamide,
which packs four atom types into a molecule with only six atoms?
How might one devise an "impressiveness metric" for this problem?
Let's define N as the number of atoms and k as the number of atom types.
Let us denote by "Metric 0" k itself.
One might be tempted to try to use k/N as a metric;  this has a
maximum of 1 for any molecule that has as many types as it has atoms,
but does not accord with intuition, since any monatomic species exhibits
this maximum, and we don't regard these as "impressive."  Anyway, call
k/N Metric 1.
A second thought might be to use the product of k and k/N
(i.e., k^2/N) as an impressiveness metric.  Call this Metric 2.  Metric 2
has no theoretical maximum, which is good, since how can one put a
maximum on how impressed one can be?  After all, if someone comes
along tomorrow with a molecule that has 11 types crammed into 167 atoms,
we ought to be even more impressed than by Margot's example.
How do several molecules compare according to these metrics?

           Ar   CO   margot    formamide   2*formamide
N           1    2   167         6          12
k           1    2    10         4           4
Metric 0    1    2    10         4           4
Metric 1    1    1     0.060     0.67        0.33
Metric 2    1    2     0.60      2.67        1.33

(N.B.  2*formamide is formamide dimer)
Note that using Metric 2, formamide is quite a bit more impressive
than Margot's example.  Note also that both Metric 1 and Metric 2 consider
formamide dimer to be less impressive than formamide itself.
In Margot's example, however, 70% of the
atom types (Br, Cd, Cl, I, O, P, W) occur only one or two times.  In
fact, the remaining 30% of the types take up 92% of the atoms.  It
scarcely seems fair to ascribe full importance to types which appear
with very low frequency.
We would like to derive a k-value adjusted for frequency of appearance.
The information-theoretical (or statistical) entropy provides a way of
doing this.  First one calculates an entropy as follows:
S = - Sum_over_k_types{ p[i] log_B p[i] }
where B is the base of the logarithm used and p[i] is n[i]/N, where
n[i] is the number of times that type i appears;  thus, for C in
formamide, p[i] is 1/6.
We then calculate the "effective number of types" as:
k* = B^S.
Thus, if the natural log is used in the calculation of S, k* = exp( S ).
(We belabor this point because in information theory, it is common to
use B=2, giving S in bits.  Then k*=2^S.  But k* will always come out
the same regardless of the base of the log.)
k* achieves its theoretical maximum of k when all the types appear
with equal frequency, but is lower than k when the frequencies of type
appearance are unequal.  Thus, for CO, k=k*=2, but for Margot's
example we have k=10, but k*=3.657.  Note that this accords with our
notion that only about 3 types account for most of the atoms in this
molecule.  k* has an interpretation similar to that of the numerical
value of a partition function:  it is approximately equal to the number
of types "occupied" in the molecule;  types with small fractional
populations don't count for very much.
So I define Metric 3 to be simply k*.  The results using it are
interesting, but share some of the difficulties of Metric 0;  namely,
if the molecule grows with the same distribution of atoms, as by
dimerization, k* doesn't change;  we would be more impressed if we
could "fill up" the available atoms with types, and k* doesn't reflect
this.  Nevertheless, note that just using k*, formamide is nearly as
impressive as Margot's example.
My final proposal, Metric 4, is just like Metric 2, but
replacing k with k*;  i.e., it is equal to k*^2/N.  Results for
the five species shown above are as follows:

           Ar   CO   margot    formamide   2*formamide
N           1    2   167         6          12
k           1    2    10         4           4
k*          1    2     3.657     3.464       3.464
Metric 0    1    2    10         4           4
Metric 1    1    1     0.060     0.67        0.33
Metric 2    1    2     0.60      2.67        1.33
Metric 3    1    2     3.657     3.464       3.464
Metric 4    1    2     0.080     2.000       1.000

Note that by most reasonable criteria (including Metric 4, the most
reasonable, IMHO), formamide is far more impressive than Margot's
example.
The grand challenge:  What is the most impressive molecule you can think
of, using Metric 4?  Basically, having a lot of types crammed into
a small number of atoms will win.  For example, CFClBrI has k=k*=5,
and Metric 4 = 5.  Note that if N=k (all types are different), Metric 4
is equal to k, since in this situation k*=k (all types occur with
equal frequency).
It probably would be fairly easy for inorganic chemists to come up
with compounds for which Metric 4 is 9 or 10.
-P.
Literature references for k*:
'Information-theoretical Entropy as a Measure of Sequence
Variability', Peter S. Shenkin, Batu Erman and Lucy D. Mastrandrea,
PROTEINS: Structure, Function and Genetics, 11, 297-313 (1991)
Rosemary Swanson also published on the same concept in J. Chem. Ed.
She gave the name "optiony" to what I call k*.  I'm rather partial
to k*;  I especially like her duets with Tennessee Ernie Ford. :-)
--
************************ The secret of life: *************************
*Peter S. Shenkin, Box 768 Havemeyer Hall, Chemistry, Columbia Univ.,*
* New York, NY  10027;     shenkin (- at -) columbia.edu;     (212) 854-5143  *
************* If you find a loose thread, don't pull it. *************
```
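
For anyone who wants to try the grand challenge on their own candidates, here is a minimal Python sketch of the metrics defined in the post. It is not part of the original message: the function names (`effective_types`, `metrics`) and the representation of a formula as a dict of element counts are assumptions made for illustration. It computes S = -Sum p[i] log_B p[i], then k* = B^S, then Metrics 0 through 4 exactly as described above.

```python
import math


def effective_types(counts, base=math.e):
    """Return k* = base**S, where S = -sum(p_i * log_base(p_i)).

    `counts` maps each atom type to the number of times it appears.
    (Hypothetical helper; the name is not from the original post.)
    """
    n_atoms = sum(counts.values())
    entropy = -sum(
        (n / n_atoms) * math.log(n / n_atoms, base) for n in counts.values()
    )
    return base ** entropy


def metrics(counts):
    """Compute N, k, k* and Metrics 0-4 as defined in the post."""
    n_atoms = sum(counts.values())
    k = len(counts)
    k_star = effective_types(counts)
    return {
        "N": n_atoms,
        "k": k,
        "k*": k_star,
        "metric0": k,                      # k itself
        "metric1": k / n_atoms,            # k/N
        "metric2": k ** 2 / n_atoms,       # k^2/N
        "metric3": k_star,                 # k*
        "metric4": k_star ** 2 / n_atoms,  # k*^2/N
    }


# Element counts taken from the formulas quoted in the post.
MOLECULES = {
    "Ar": {"Ar": 1},
    "CO": {"C": 1, "O": 1},
    "margot": {"C": 60, "H": 78, "Br": 2, "Cd": 1, "Cl": 2,
               "I": 2, "N": 16, "O": 2, "P": 2, "W": 2},
    "formamide": {"C": 1, "H": 3, "N": 1, "O": 1},
    "2*formamide": {"C": 2, "H": 6, "N": 2, "O": 2},
    "CFClBrI": {"C": 1, "F": 1, "Cl": 1, "Br": 1, "I": 1},
}

if __name__ == "__main__":
    for name, counts in MOLECULES.items():
        m = metrics(counts)
        print(f"{name:12s} N={m['N']:3d} k={m['k']:2d} "
              f"k*={m['k*']:.3f} Metric4={m['metric4']:.3f}")
```

Run as a script, this should reproduce the k* and Metric 4 rows of the table, to rounding. Passing `base=2` to `effective_types` gives the same k*, which illustrates the point that the choice of logarithm base does not matter.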