Re: CCL:Clarification on trivia question

 On Apr 24,  3:37pm, margot wrote:
 > Subject: CCL:Clarification on trivia question
 > According to "The Name Game" by A. Nickon and E. F. Silversmith,
 > 1987, p. 149
 > a compound containing 10 elements: C60H78Br2CdCl2I2N16O2P2W2
 > 	Have fun,
 > 			margot
 OK, Margot, you're on.  Here's my idea of fun; YMMV:
 Let's see;  the cited compound has 167 atoms divided among 10 atom types.
 Certainly this would not be as impressive as a molecule that has 10
 atoms, all of different types.  But is it as impressive as, say, formamide,
 which packs four atom types into a molecule with only six atoms?
 How might one devise an "impressiveness metric" for this problem?
 Define N as the number of atoms and k as the number of atom types.
 Let us denote by "Metric 0" k itself.
 One might be tempted to try to use k/N as a metric;  this has a
 maximum of 1 for any molecule that has as many types as it has atoms,
 but does not accord with intuition, since any monatomic species exhibits
 this maximum, and we don't regard these as "impressive."  Anyway, call
 this Metric 1.
 A second thought might be to use the product of k and k/N
 (i.e., k^2/N) as an impressiveness metric.  Call this Metric 2.  Metric 2
 has no theoretical maximum, which is good, since how can one put a
 maximum on how impressed one can be?  After all, if someone comes
 along tomorrow with a molecule that has 11 types crammed into 167 atoms,
 we ought to be even more impressed than by Margot's example.
 How do several molecules compare according to these metrics?
             Ar    CO    margot    formamide    2*formamide
 N           1     2     167       6            12
 k           1     2     10        4            4
 Metric 0    1     2     10        4            4
 Metric 1    1     1     0.060     0.67         0.33
 Metric 2    1     2     0.60      2.67         1.33
 (N.B.  2*formamide is formamide dimer.)
 Note that using Metric 2, formamide is quite a bit more impressive
 than Margot's example.  Note also that both Metric 1 and Metric 2 consider
 formamide dimer to be less impressive than formamide itself.
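
For anyone who wants to check the table, here is a minimal Python sketch of Metrics 0 through 2.  The dictionary MOLECULES and the function name simple_metrics are illustrative choices, not part of the original post; the atom counts are read off the formulas quoted above (formamide taken as CH3NO).

```python
# Atom counts per type, read off the formulas discussed in the post.
# The names MOLECULES and simple_metrics are illustrative assumptions.
MOLECULES = {
    "Ar": {"Ar": 1},
    "CO": {"C": 1, "O": 1},
    "margot": {"C": 60, "H": 78, "Br": 2, "Cd": 1, "Cl": 2,
               "I": 2, "N": 16, "O": 2, "P": 2, "W": 2},
    "formamide": {"C": 1, "H": 3, "N": 1, "O": 1},
    "2*formamide": {"C": 2, "H": 6, "N": 2, "O": 2},
}

def simple_metrics(counts):
    """Return (N, k, Metric 1, Metric 2) for a dict of atom counts."""
    n_atoms = sum(counts.values())   # N: total number of atoms
    k = len(counts)                  # k: number of atom types (also Metric 0)
    return n_atoms, k, k / n_atoms, k * k / n_atoms

for name, counts in MOLECULES.items():
    n_atoms, k, m1, m2 = simple_metrics(counts)
    print(f"{name:12s} N={n_atoms:3d} k={k:2d} M1={m1:.3f} M2={m2:.3f}")
```

Doubling every count, as in the dimer, leaves k alone but doubles N, which is why both Metric 1 and Metric 2 are halved for 2*formamide.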
 Let's think about this a bit further, now.  In Margot's example, several
 atom types (Br, Cd, Cl, I, O, P, W) occur only one or two times.  In
 fact, the remaining 30% of the types take up 92% of the atoms.  It
 scarcely seems fair to ascribe full importance to types which appear
 with very low frequency.
 We would like to derive a k-value adjusted for frequency of appearance.
 The information-theoretical (or statistical) entropy provides a way of
 doing this.  First one calculates an entropy as follows:
 	S = - Sum_over_k_types{ p[i] log_B p[i] }
 where B is the base of the logarithm used and p[i] is n[i]/N, where
 n[i] is the number of times type i appears;  thus, for C in
 formamide, p[i] is 1/6.
 We then calculate the "effective number of types" as:
 	k* = B^S.
 Thus, if the natural log is used in the calculation of S, k* = exp( S ).
 (We belabor this point because in information theory, it is common to
 use B=2, giving S in bits.  Then k*=2^S.  But k* will always come out
 the same regardless of the base of the log.)
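
The base independence is easy to check numerically.  The following is only a sketch: the function name effective_types and its list-of-counts argument are assumptions made for illustration, not anything defined in the post or the cited paper.

```python
import math

def effective_types(counts, base=math.e):
    """k* = B**S, with S = -sum(p[i] * log_B(p[i])) over the atom types."""
    total = sum(counts)
    entropy = -sum((n / total) * math.log(n / total, base) for n in counts)
    return base ** entropy

formamide = [1, 3, 1, 1]   # C, H, N, O counts, assuming formamide is CH3NO
for base in (2, math.e, 10):
    print(base, round(effective_types(formamide, base), 3))
```

All three bases should give the same k*, about 3.464 for formamide, because changing B rescales S by a constant factor that the exponentiation B^S undoes.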
 k* achieves its theoretical maximum of k when all the types appear with
 equal frequency, but is lower than k when the frequencies of type
 appearance are unequal.  Thus, for CO, k=k*=2, but for Margot's
 example we have k=10, but k*=3.657.  Note that this accords with our
 notion that only about 3 types account for most of the atoms in this
 molecule.  k* has an interpretation similar to that of the numerical
 value of a partition function:  it is approximately equal to the number
 of types "occupied" in the molecule;  types with small fractional
 populations don't count for very much.
 So I define Metric 3 to be simply k*.  The results using it are
 interesting, but share some of the difficulties of Metric 0;  namely,
 if the molecule grows with the same distribution of atoms, as by
 dimerization, k* doesn't change;  we would be more impressed if we
 could "fill up" the available atoms with types, and k* doesn't reflect
 this.  Nevertheless, note that just using k*, formamide is nearly as
 impressive as Margot's example.
 My final proposal, Metric 4, is just like Metric 2, but
 replacing k with k*;  i.e., it is equal to k*^2/N.  Results for
 the five species shown above are as follows:
             Ar    CO    margot    formamide    2*formamide
 N           1     2     167       6            12
 k           1     2     10        4            4
 k*          1     2     3.657     3.464        3.464
 Metric 0    1     2     10        4            4
 Metric 1    1     1     0.060     0.67         0.33
 Metric 2    1     2     0.60      2.67         1.33
 Metric 3    1     2     3.657     3.464        3.464
 Metric 4    1     2     0.080     2.000        1.000
 Note that by most reasonable criteria (including Metric 4, the most
 reasonable, IMHO), formamide is far more impressive than Margot's
 example.
 The grand challenge:  What is the most impressive molecule you can think
 of, using Metric 4?  Basically, having a lot of types crammed into
 a small number of atoms will win.  For example, CFClBrI has k=k*=5,
 and Metric 4 = 5.  Note that if N=k (every atom is of a different type),
 Metric 4 is equal to k, since in this situation k*=k (all types occur
 with equal frequency).
 It probably would be fairly easy for inorganic chemists to come up
 with compounds for which Metric 4 is 9 or 10.
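
For those taking up the challenge, here is a rough Python sketch that scores a candidate directly from its formula string.  The helper names parse_formula and metric4 are hypothetical, and the parser is an assumption of convenience: it only handles flat formulas (element symbols followed by optional counts, no parentheses or charges).

```python
import math
import re

def parse_formula(formula):
    """Count atoms in a flat formula such as 'CFClBrI' (no parentheses)."""
    counts = {}
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + int(digits or 1)
    return counts

def metric4(formula):
    """Metric 4 from the post: k*^2 / N, with k* = exp(S)."""
    counts = parse_formula(formula)
    total = sum(counts.values())                      # N
    entropy = -sum((n / total) * math.log(n / total)  # S, natural log
                   for n in counts.values())
    k_star = math.exp(entropy)                        # effective number of types
    return k_star ** 2 / total

for formula in ("CFClBrI", "CH3NO", "C60H78Br2CdCl2I2N16O2P2W2"):
    print(formula, round(metric4(formula), 3))
```

Up to floating-point rounding, this should give about 5 for CFClBrI, 2 for formamide and 0.080 for Margot's example, in line with the values quoted above.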
 Literature references for k*:
   'Information-theoretical Entropy as a Measure of Sequence
   Variability', Peter S. Shenkin, Batu Erman and Lucy D. Mastrandrea,
   PROTEINS: Structure, Function and Genetics, 11, 297-313 (1991)
  Rosemary Swanson also published on the same concept in J. Chem. Ed.
  She gave the name "optiony" to what I call k*.  I'm rather partial
  to k*;  I especially like her duets with Tennessee Ernie Ford. :-)
 ************************ The secret of life: *************************
 *Peter S. Shenkin, Box 768 Havemeyer Hall, Chemistry, Columbia Univ.,*
 * New York, NY  10027;     shenkin (- at -);     (212) 854-5143  *
 ************* If you find a loose thread, don't pull it. *************