|CCL Message Filter|
Filtering your CCL Mail
If you are a subscriber of CCL, you can filter CCL messages before the CCL server delivers them to your address. Filtering requires that you fill out a complicated Web form after you have carefully analyzed which messages you want to get from CCL. The content of your filter will not be knowingly released to anyone. We will not tell anyone what you like and what you hate. However, CCL administrators may want to share some of your filtering prescriptions (anonymously) for the common good.
Before you get anywhere with this, please read this manual carefully. Please help me make it easier to read; I would appreciate your comments. Then you need to learn Perl regular expressions (unless you are an expert, you can read my notes at http://www.ccl.net/chemistry/resources/tips/regular_expressions.shtml). Then you will need to fill out a Web form. Finally, you need to monitor the messages for a while and compare them with the archive of all messages at http://www.ccl.net/chemistry/resources/messages/index.shtml to see if your filter does what it is supposed to do. You can always go back to the filter setup form and tune it. If you are too busy or too lazy to do this, do not even start. At the same time, for what it is worth, learning Perl regular expressions will make you more productive, and your Return On Learning Investment (ROLI -- a four letter word) will pay off many times over in all aspects of your computational work.
Before you are allowed to set up the filter, you will
need to authenticate -- have a recent CCL message handy
and look at its header (namely, the
To: and Message-Id: header lines).
Also, open the http://www.ccl.net/cgi-bin/ccl/regexp/test_re.pl regular
expression testing form and test your regular expressions before you enter
them into the filter. It will save you a lot of frustration.
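If you would also like to try expressions offline, here is a minimal Python sketch. It is an assumption on my part, not part of the CCL software: it translates a Perl-style /pattern/flags string (only the i, m, s and x modifiers) into Python's re module and reports which sample lines match. Python's syntax is close to Perl's for simple expressions like the ones in this manual, but not identical, so the test_re.pl form remains the real check. The function names compile_perl_style and try_regex are hypothetical.

```python
import re

# Perl modifiers this sketch understands, mapped to Python's re flags.
PERL_FLAGS = {"i": re.IGNORECASE, "m": re.MULTILINE,
              "s": re.DOTALL, "x": re.VERBOSE}


def compile_perl_style(spec):
    """Compile a Perl-style '/pattern/flags' string such as '/spin/i'."""
    end = spec.rfind("/")
    if not spec.startswith("/") or end == 0:
        raise ValueError("expected /pattern/flags, got %r" % spec)
    pattern, modifiers = spec[1:end], spec[end + 1:]
    flags = 0
    for m in modifiers:
        if m not in PERL_FLAGS:
            raise ValueError("unsupported modifier %r" % m)
        flags |= PERL_FLAGS[m]
    return re.compile(pattern, flags)


def try_regex(spec, samples):
    """Print MATCH or no for each sample line and return the results."""
    rx = compile_perl_style(spec)
    results = []
    for line in samples:
        hit = rx.search(line) is not None
        print("MATCH" if hit else "no   ", "|", line)
        results.append(hit)
    return results


# Usage: the header line should match, the unrelated one should not.
try_regex(r"/smith\@example\.com/i", [
    'From: "John M. Smith" <John.M.Smith@example.com>',
    "To: someone@example.org",
])
```

Dropping the trailing i from the expression above makes the first line stop matching, which is a quick way to see how much the modifiers matter.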
The filter setup Web form asks you to specify regular
expressions and assign a numeric value (a positive or negative score)
to each of them. You also need to specify which part of the message
should be matched (header, body, or both). Before a message is sent
to you, the CCL server checks whether you have created a filter.
If so, the software retrieves your filter specs and matches your regular
expressions against the message. The numeric values assigned
to the regular expressions that matched are added together.
If the sum is greater than or equal to zero,
then the message is delivered to you. If the sum is negative,
you will not get the message. To avoid rounding errors,
the sum that falls within
From firstname.lastname@example.org Thu Nov 3 12:01:18 2005
Received: from server.example.com (server.example.com [192.168.1.23]) by server.ccl.net (8.13.1/8.13.1) with ESMTP id jA3H1GJv008592 for <email@example.com>; Thu, 3 Nov 2005 12:01:16 -0500
Received: from localhost.localdomain (server.example.com [192.168.1.23]) by server.example.com (8.13.0/8.13.0) with ESMTP id jA3H1C4W003410 for <firstname.lastname@example.org>; Thu, 3 Nov 2005 12:01:13 -0500
Message-Id: <200511031701.jA3H1C4W003410@server.example.com>
Content-Type: text/plain
Content-Disposition: inline
Content-Transfer-Encoding: binary
Date: Thu, 03 Nov 2005 12:01:12 -0500
MIME-Version: 1.0
Reply-To: "John M. Smith" <John.M.Smith@example.com>
Organization: Example.Com Corporation
Subject: Spin contamination papers?
From: "John M. Smith" <John.M.Smith@example.com>
To: email@example.com

I am looking for papers on spin contamination that present results of calculations on polyradicals with unrestricted KS as implemented in QuickMol.
John M. Smith
This is the simplest possible text message. Some messages will have encoded parts and attachments. Currently we do not address these complications and treat all messages as plain text. In most cases this will not matter much, but on rare occasions, because of character encoding, you may get a message that you did not want or lose a message on your favorite topic. For your matching convenience, each header line is converted to a single line (continuation lines are joined with the first line). Using the message above as an example, the filter
Regular expression            Message Match Scope      Score
/spin/i                       [body]                    10.0
/quickmol/i                   [header+body]             -5.0
/smith\@example\.com/i        [header]                 -10.0
/johns\@other\.com/i          [header]                 -10.0
/jerk\@hootmail\.com/i        [header]                -100.0
/^Subject:\s.*Lottery.*$/im   [header]               -1000.0
will stop the example message (the first three expressions match and the sum is -5.0), while the filter:
Regular expression            Message Match Scope      Score
/(spin|radical|esr)/i         [body]                    10.0
/unrestricted|uhf|uks/i       [body]                    10.0
/quickmol/i                   [header+body]             -5.0
/smith\@example\.com/         [header]                 -10.0
/johns\@other\.com/           [header]                 -10.0
/jerk\@hootmail\.com/         [header]                 -10.0
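will pass the message, since the sum of the scores of the matching expressions is positive. (Because the address expressions in this filter lack the /i modifier, /smith\@example\.com/ does not match the capitalized John.M.Smith@example.com in the example headers, so the sum is +15.0; had it matched, the sum would be +5.0.)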
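To check the arithmetic of the two filters above, the scoring rule can be sketched in Python. This is a hypothetical reimplementation, not the CCL server code: the helper names (split_message, score, delivered) are made up, Python's re stands in for Perl's regular expressions, [header+body] is assumed to mean matching the header and body joined together, and the rounding tolerance mentioned earlier is not modeled. Header continuation lines are joined as described above, and the message is abridged from the example, with one Received: header deliberately folded.

```python
import re

I, M = re.IGNORECASE, re.MULTILINE

# (compiled expression, match scope, score): the two filters from the text.
FILTER_1 = [
    (re.compile(r"spin", I), "body", 10.0),
    (re.compile(r"quickmol", I), "header+body", -5.0),
    (re.compile(r"smith\@example\.com", I), "header", -10.0),
    (re.compile(r"johns\@other\.com", I), "header", -10.0),
    (re.compile(r"jerk\@hootmail\.com", I), "header", -100.0),
    (re.compile(r"^Subject:\s.*Lottery.*$", I | M), "header", -1000.0),
]
FILTER_2 = [
    (re.compile(r"(spin|radical|esr)", I), "body", 10.0),
    (re.compile(r"unrestricted|uhf|uks", I), "body", 10.0),
    (re.compile(r"quickmol", I), "header+body", -5.0),
    (re.compile(r"smith\@example\.com"), "header", -10.0),  # no /i here
    (re.compile(r"johns\@other\.com"), "header", -10.0),
    (re.compile(r"jerk\@hootmail\.com"), "header", -10.0),
]

# Abridged example message; the Received: header is folded on purpose.
RAW = """Received: from server.example.com (server.example.com [192.168.1.23])
\tby server.ccl.net (8.13.1/8.13.1) with ESMTP id jA3H1GJv008592
Reply-To: "John M. Smith" <John.M.Smith@example.com>
Subject: Spin contamination papers?
From: "John M. Smith" <John.M.Smith@example.com>
To: email@example.com

I am looking for papers on spin contamination that present results of
calculations on polyradicals with unrestricted KS as implemented in
QuickMol.
John M. Smith
"""


def split_message(raw):
    """Return (header, body), joining header continuation lines."""
    head, _, body = raw.partition("\n\n")
    lines = []
    for line in head.split("\n"):
        if line[:1] in (" ", "\t") and lines:
            lines[-1] += " " + line.strip()  # continuation of previous line
        else:
            lines.append(line)
    return "\n".join(lines), body


def score(rules, header, body):
    """Add up the scores of the rules whose expression matches its scope."""
    scopes = {"header": header, "body": body,
              "header+body": header + "\n\n" + body}  # assumed joining
    return sum(value for rx, scope, value in rules
               if rx.search(scopes[scope]))


def delivered(total):
    """Deliver when the sum is >= 0 (rounding tolerance not modeled)."""
    return total >= 0


header, body = split_message(RAW)
for name, rules in (("filter 1", FILTER_1), ("filter 2", FILTER_2)):
    total = score(rules, header, body)
    print(name, total, "deliver" if delivered(total) else "reject")
```

Run as-is, it prints "filter 1 -5.0 reject" and "filter 2 15.0 deliver". The second total is 15.0 because the case-sensitive smith expression does not match "John.M.Smith".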
This is a new feature, and there is no experience yet with using it in practice. My view is that you should initially focus on rejecting messages that you do not want rather than on prioritizing messages by interest. Assuming (an abstract case, since I have to be politically correct) that you hate mail coming from a country called Buenita (ISO country code: BU), which is a source of messages promising fraudulent business deals, and also mail from one CCL subscriber (jerk@jerks.example.com) whom you despise personally, you can make a simple filter like:
Regular expression            Message Match Scope      Score
/\.bu\W/i                     [header]                -100.0
/jerk\@jerks\.example\.com/i  [header]                -100.0
and rest assured that CCL will not send you these messages. You can also use filters to protect your mailbox from overflowing when we have a "flame war" on CCL, and then remove the brakes when it is over. To stop all messages from CCL without unsubscribing, you could use:
Regular expression            Message Match Scope      Score
/./                           [header]                -100.0
but make sure that you keep at least one recent message that you received from CCL so that you can remove this brake later, since the filter setup Web page requires your CCL Message-Id and/or your CCL subscriber Id for authentication. It is probably easier, though, to unsubscribe temporarily using the appropriate Web form available from the CCL Web page. If you FUBAR or FUMTU, please contact me, and I will fix it for you. Now you are ready... Go to the page: http://www.ccl.net/cgi-bin/ccl/enter_preferences and set your CCL mail preferences.
Maybe, at some point, I will build upon
your experience and come up with some typical prescriptions. In the long run,
it would be much better to come up with a set of categories/topics
to which each message can be assigned (say: quantum chemistry, drug
design, molecular dynamics, spectroscopy, molecular graphics, etc.)
and let subscribers choose one or more topics. This is an interesting
(and far from trivial) research problem, and there are numerous
ways of attacking it. One possible approach would be
to create a file for each topic with text expressions (abbreviations,
keywords, names, text snippets, etc.), then calculate some text similarity
index between each file and the message (e.g., a normalized overlapping
n-gram count) and assign the message to one or more topics depending on the
degree of similarity. But again... this is a good topic for a Ph.D.
thesis, or something useful to do after retirement. Since the
income that CCL brings is negligible, I am not sure when, or if, I will
approach this issue.
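The topic-assignment idea above can be sketched in a few lines of Python. Everything here is illustrative and hypothetical: the two one-line "topic files", the choice of character trigrams, the normalization (shared n-grams divided by the message's n-gram count) and the 0.2 threshold are my assumptions, not a tested design.

```python
import re
from collections import Counter


def ngrams(text, n=3):
    """Multiset of overlapping character n-grams of each lower-cased word."""
    grams = Counter()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = " " + word + " "  # mark word boundaries
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


def similarity(topic_text, message, n=3):
    """Shared n-grams divided by the message's n-grams (0.0 to 1.0)."""
    topic, msg = ngrams(topic_text, n), ngrams(message, n)
    total = sum(msg.values())
    if total == 0:
        return 0.0
    shared = sum((topic & msg).values())  # & keeps the minimum count
    return shared / total


def assign_topics(topics, message, threshold=0.2):
    """Return (sorted matching topic names, {topic: similarity})."""
    scores = {name: similarity(text, message) for name, text in topics.items()}
    chosen = sorted(name for name, s in scores.items() if s >= threshold)
    return chosen, scores


# Hypothetical topic files, shortened to one line of keywords each.
TOPICS = {
    "quantum chemistry": "ab initio DFT Kohn-Sham unrestricted UHF UKS "
                         "spin contamination basis set radical",
    "molecular dynamics": "force field trajectory ensemble thermostat "
                          "periodic boundary conditions water box",
}

MESSAGE = ("I am looking for papers on spin contamination that present "
           "results of calculations on polyradicals with unrestricted KS "
           "as implemented in QuickMol.")

chosen, scores = assign_topics(TOPICS, MESSAGE)
print(chosen, scores)
```

Character trigrams were chosen here because they tolerate inflections (radical vs. polyradicals); realistic topic files would be far larger, and the threshold would have to be tuned against the message archive.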
|Modified: Mon Nov 28 00:01:45 2005 GMT|