From: "Jose R. Valverde jrvalverde]![cnb.csic.es"
To: CCL
Subject: CCL: Science code manifesto
Date: Fri, 4 Nov 2011 19:02:36 +0100
Sent to CCL by: "Jose R. Valverde" [jrvalverde%a%cnb.csic.es]

I came late to this whole discussion after some traveling, and have only now read all of it. Let me take a stab at it:

1) If your science is funded by others, then whatever you do, no matter how much effort it required, does not belong to you but to whoever paid for it: the public or your boss. So I see good reason for anybody in the public to request access to works developed with their money, just as many other public and administrative documents are public or get declassified after a sensible time. Refusing amounts to stealing their hard-earned tax dollars.

That leaves the issue of closed developments, which raises more complex questions. Granted. We should find a way to coexist. But one thing is undeniable: accepting closed results is an act of blind faith. Accepting open results is only an act of faith that can always be verified if needed. It's like the difference between religious beliefs (accept my results because, backed by G., I say so) and scientific beliefs (accept my results because I say so, but you can verify them).

2) Everybody seems very concerned about reviewers. But only journal reviewers. C'mon! Anybody who uses any tool is a reviewer of it. Of course there are those who won't fix a flat tire on their car or bike, but there are lots who will. Most of them cannot build a car, but may be able to fix one. The whole point of openness is that anybody can review the results and the method used. Not that everybody actually does.
But there are many who will at some point, so any lie is bound to surface and errors are easier to spot. Closed source is the untouchable 'holy IP' that anathematizes questioners.

3) Literacy. Most of us may be illiterate in one kind of code or another. But we are not complete dummies. Neither are we absolutely perfect. Thus any code will have bugs, and as the saying goes, 'you can fool everybody some of the time, and somebody all of the time, but you can't fool MOM'. Some bugs may require experts to fix, others can be fixed by anyone, and the author always has the last word. The corollary is simple: whenever you use any complex code, you'll run into bugs. With closed source you're stuck. With open source you can always give it a try, and sometimes you or an expert friend or student will be able to fix it.

4) Reproducibility: having access to an executable does not explain how it works. Certainly there is complex equipment no one would tamper with. But, hey, we scientists often build our own equipment, and we do science because we like to tamper with complex things. When results are not reproducible, closed tools do not let you decide which result to believe, be it spectrometry or calculations. You must believe the tool is right because the author (invested by the Grace of G.) says s/he cannot be wrong.

5) Competition: if you give others access to your code, they will be able to use it. Yeah. So what? That is what science is about. Yes, and so is getting a tenure-track position. Now, who is more likely to get it? An obscure guy who published a work using his own code that no one else has been able, or cared, to reproduce? Or an obscure guy whose code is used by many famous scientists who have reviewed it and mention it in their high-impact publications? If your code is useful, publish it. By all means also publish the results you computed with it. But you are more likely to get citations for your code, if it is worth using by others, than for your results.

6) Equilibrium. This is the crux of the problem.
Open source imposes a new equilibrium. With closed source you avoid a lot of trouble and criticism: nobody will see whether you are a sloppy programmer. Nobody will know if your program does not work correctly (everybody is used to new versions giving new results, especially if they include a new algorithm), so hiding mistakes becomes trivial. With open source you trade the comfort of obscurantism for the benefits of light. If your shared code is good, it may become popular; if it does, fixes will come for free. You remove any justification for others to compete by doing the same thing (and to get grants to repeat your work: bye-bye, competitors). And you save on research costs, so you can do more... and take on more responsibility. It's a new equilibrium.

7) Knowledge: closed tools and data let no one else learn from them, so competitors have a harder time reaching your level of knowledge (if you think that is your way onto the tenure track). Think twice: they'll have to learn the hard way. But they will, and in doing so they will waste money you could be using. Sharing everything makes it easier for them to learn and to cite you, and discourages them from reproducing your work, since doing so is trivial and won't earn them any points. So why have so many companies shared information on their drugs in ZINC? Because there is an opportunity for them to benefit from the work of others: you find a use for their drug and they sell it.

8) Equilibrium: that brings us back to square one. Is there a benefit for a company in sharing or opening its code? I'd bet that in most cases there is. Especially under the GPL: that way competitors may use it, but improvements will flow back, and no one else will be able to sell it at premium prices. The author can do as s/he wishes. And clients will, if you are good, prefer to contract the author for maintenance, extensions, etc. rather than less knowledgeable third parties. Of course there are always exceptions. My bet is that there should be just that: exceptions.
And yet nobody prevents you from publishing your irreproducible results in the "Gullible" Chemical Journal. There is space for all. But certainly, even the faster-than-light result has been "stopped" until it can be reproduced, and I can hardly think of anything more difficult (or expensive) to reproduce.

Nevertheless, I can see I'm not going to convince everybody. Dealing with special cases will require a lot of politics; we are human after all. But personally, I see nothing wrong with defining a sensible baseline for a new equilibrium that brings more openness and less obscurantism (and that saves everybody's dollars in a time of crisis). It's not only climate science: bioinformatics, cryptography, physics, statistics and many other fields realized long ago that this was the best route for them. I'm not saying it is for computational chemistry, but it is certainly time to ponder it.

j

--
EMBnet/CNB Scientific Computing Service
Solving all your computer needs for Scientific Research.
http://bioportal.cnb.csic.es
http://www.es.embnet.org