*From*: "Peter Shenkin" <shenkin (- at -) still3.chem.columbia.edu>
*Subject*: Re: CCL:Clarification on trivia question
*Date*: Mon, 24 Apr 1995 14:23:02 -0400

On Apr 24, 3:37pm, margot wrote:
> Subject: CCL:Clarification on trivia question
> According to "The Name Game" by A. Nickon and E. F. Silversmith, Pergamon
> 1987, p. 149
>
> a compound containing 10 elements: C60H78Br2CdCl2I2N16O2P2W2
...
> Have fun,
> margot

OK, Margot, you're on. Here's my idea of fun; YMMV:

Let's see; the cited compound has 167 atoms divided among 10 atom types. Certainly this would not be as impressive as a molecule that has 10 atoms, all of different types. But is it as impressive as, say, formamide, which packs four atom types into a molecule with only six atoms?

How might one devise an "impressiveness metric" for this problem?

Let's define N as the number of atoms and k as the number of atom types. Let us denote by "Metric 0" k itself.

One might be tempted to use k/N as a metric; this has a maximum of 1 for any molecule that has as many types as it has atoms, but it does not accord with intuition, since any monatomic species exhibits this maximum, and we don't regard these as "impressive." Anyway, call k/N Metric 1.

A second thought might be to use the product of k and k/N (i.e., k^2/N) as an impressiveness metric. Call this Metric 2. Metric 2 has no theoretical maximum, which is good, since how can one put a maximum on how impressed one can be? After all, if someone comes along tomorrow with a molecule that has 11 types crammed into 167 atoms, we ought to be even more impressed than by Margot's example.

How do several molecules compare according to these metrics?

              Ar    CO    margot   formamide   2*formamide
    N          1     2     167       6           12
    k          1     2      10       4            4
    Metric 0   1     2      10       4            4
    Metric 1   1     1     0.060     0.67         0.33
    Metric 2   1     2     0.60      2.67         1.33

(N.B. 2*formamide is formamide dimer)

Note that using Metric 2, formamide is quite a bit more impressive than Margot's example. Note also that both Metric 1 and Metric 2 consider formamide dimer to be less impressive than formamide itself.

Let's think about this a bit further, now.
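Before going further, the table above can be checked with a few lines of Python. This is only a sketch: the function name simple_metrics and the dictionary layout of atom counts are assumptions made for illustration, and the formulas are taken directly from the definitions of Metrics 0, 1 and 2 given above.

```python
def simple_metrics(counts):
    """Return (N, k, Metric 1, Metric 2) for a dict mapping element -> atom count.

    Hypothetical helper: the name and argument layout are illustrative only.
    """
    n_atoms = sum(counts.values())  # N: total number of atoms
    k = len(counts)                 # k: number of atom types (this is Metric 0)
    return n_atoms, k, k / n_atoms, k * k / n_atoms


# Atom counts for the species in the table above.
molecules = {
    "Ar": {"Ar": 1},
    "CO": {"C": 1, "O": 1},
    "margot": {"C": 60, "H": 78, "Br": 2, "Cd": 1, "Cl": 2,
               "I": 2, "N": 16, "O": 2, "P": 2, "W": 2},
    "formamide": {"C": 1, "H": 3, "N": 1, "O": 1},
    "2*formamide": {"C": 2, "H": 6, "N": 2, "O": 2},
}

for name, counts in molecules.items():
    n, k, m1, m2 = simple_metrics(counts)
    print(f"{name:12s} N={n:3d}  k={k:2d}  Metric 1={m1:.3f}  Metric 2={m2:.2f}")
```

Doubling every count, as in the dimer, leaves k fixed while N doubles, which is why both Metric 1 and Metric 2 are halved.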
In Margot's example, several atom types (Br, Cd, Cl, I, O, P, W) occur only one or two times. In fact, the remaining 30% of the types take up 92% of the atoms. It scarcely seems fair to ascribe full importance to types which appear with very low frequency.

We would like to derive a k-value adjusted for frequency of appearance. The information-theoretical (or statistical) entropy provides a way of doing this. First one calculates an entropy as follows:

    S = - Sum_over_k_types{ p[i] log_B p[i] }

where B is the base of the logarithm used and p[i] is n[i]/N, where n[i] is the number of times type i appears; thus, for C in formamide, p[i] is 1/6. We then calculate the "effective number of types" as:

    k* = B^S.

Thus, if the natural log is used in the calculation of S, k* = exp( S ). (We belabor this point because in information theory, it is common to use B=2, giving S in bits. Then k*=2^S. But k* will always come out the same regardless of the base of the log.)

k* achieves its theoretical maximum of k when all the types appear with equal frequency, but is lower than k when the frequencies of type appearance are unequal. Thus, for CO, k=k*=2, but for Margot's example we have k=10, but k*=3.657. Note that this accords with our notion that only about 3 types account for most of the atoms in this molecule. k* has an interpretation similar to that of the numerical value of a partition function: it is approximately equal to the number of types "occupied" in the molecule; types with small fractional populations don't count for very much.

So I define Metric 3 to be simply k*. The results using it are interesting, but share some of the difficulties of Metric 0; namely, if the molecule grows with the same distribution of atoms, as by dimerization, k* doesn't change; we would be more impressed if we could "fill up" the available atoms with types, and k* doesn't reflect this. Nevertheless, note that just using k*, formamide is nearly as impressive as Margot's example.
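The k* calculation can be sketched in Python as follows. The function name effective_types and its base argument are assumptions made for illustration; the arithmetic follows the definitions S = -Sum p[i] log_B p[i] and k* = B^S given above.

```python
import math


def effective_types(counts, base=math.e):
    """Return k* = base**S for a dict mapping element -> atom count.

    Hypothetical helper: S is the entropy of the type frequencies
    p[i] = n[i]/N, computed with logarithms to the given base.
    """
    n_atoms = sum(counts.values())
    entropy = -sum((n / n_atoms) * math.log(n / n_atoms, base)
                   for n in counts.values())
    return base ** entropy


margot = {"C": 60, "H": 78, "Br": 2, "Cd": 1, "Cl": 2,
          "I": 2, "N": 16, "O": 2, "P": 2, "W": 2}
formamide = {"C": 1, "H": 3, "N": 1, "O": 1}

for name, counts in (("margot", margot), ("formamide", formamide)):
    # Natural log and base 2 should give the same k*, up to rounding.
    print(name, round(effective_types(counts), 3),
          round(effective_types(counts, base=2), 3))
```

For formamide the frequencies are 1/6, 1/2, 1/6 and 1/6, so S = (1/2) ln 12 and k* = sqrt(12), about 3.464; the result does not depend on the base passed in, as stated above.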
My final proposal, Metric 4, is just like Metric 2, but replacing k with k*; i.e., it is equal to k*^2/N. Results for the five species shown above are as follows:

              Ar    CO    margot   formamide   2*formamide
    N          1     2     167       6           12
    k          1     2      10       4            4
    k*         1     2     3.657     3.464        3.464
    Metric 0   1     2      10       4            4
    Metric 1   1     1     0.060     0.67         0.33
    Metric 2   1     2     0.60      2.67         1.33
    Metric 3   1     2     3.657     3.464        3.464
    Metric 4   1     2     0.080     2.000        1.000

Note that by most reasonable criteria (including Metric 4, the most reasonable, IMHO), formamide is far more impressive than Margot's example.

The grand challenge: What is the most impressive molecule you can think of, using Metric 4? Basically, having a lot of types crammed into a small number of atoms will win. For example, CFClBrI has k=k*=5, and Metric 4 = 5. Note that if N=k (all types are different), Metric 4 is equal to k, since in this situation k*=k (all types occur with equal frequency). It probably would be fairly easy for inorganic chemists to come up with compounds for which Metric 4 is 9 or 10.

-P.

Literature references for k*:

'Information-theoretical Entropy as a Measure of Sequence Variability', Peter S. Shenkin, Batu Erman and Lucy D. Mastrandrea, PROTEINS: Structure, Function and Genetics, 11, 297-313 (1991)

Rosemary Swanson also published on the same concept in J. Chem. Ed. She gave the name "optiony" to what I call k*. I'm rather partial to k*; I especially like her duets with Tennessee Ernie Ford. :-)

--
************************ The secret of life: *************************
*Peter S. Shenkin, Box 768 Havemeyer Hall, Chemistry, Columbia Univ.,*
* New York, NY 10027; shenkin (- at -) columbia.edu; (212) 854-5143 *
************* If you find a loose thread, don't pull it. *************