CCL: Science code manifesto



 Sent to CCL by: "Jose R. Valverde" [jrvalverde%a%cnb.csic.es]
 Regarding this whole issue, I came late after some traveling and have only now
 read it all.
 Let me take a stab at it:
 1) If your science is funded by others, then whatever you do, no matter how
 much effort it required, does not belong to you but to whoever paid for it:
 the public or your boss.
 	So, I see a reason for anybody in the public requesting access to
 works developed with their money, just as many other public and administrative
 documents are public or get declassified after a sensible time. Refusal is
 burglary of their hard-earned tax dollars.
 	That leaves the issue of closed developments. And brings more complex
 issues. Granted. We should find a way to coexist. But one thing is undeniable:
 accepting closed results is an act of blind faith. Accepting open results is
 only an act of faith that can always be verified if needed. It's like the
 difference between religious beliefs (accept my results because -backed by G.-
 I say so) vs. scientific beliefs (accept my results because I say so, but you
 can verify them).
 2) Everybody seems very concerned about reviewers. But only journal reviewers.
 C'mon! Anybody who uses any tool is a reviewer of it. And of course there are
 those who won't fix a flat in their car or bike, but there are lots who will.
 Most of them cannot build a car, but may be able to fix it.
 	The whole point of openness is that anybody can review the results
 and the method used. Not that everybody actually does. But many will at some
 point, so any lie is bound to surface and errors are easier to spot. Closed
 source is the untouchable 'holy IP' that anathematizes questioners.
 3) Literacy. Most of us may be X-code illiterate. But we are not totally dumb.
 Neither are we absolutely perfect. Thus, any code will have bugs and, as the
 saying goes, 'you can fool everybody sometimes, somebody all the time, but
 you can't fool MOM'. Some bugs may require experts to fix, others can be fixed
 by anyone, and the author always has the last word.
 	The corollary is simple: whenever you use any complex code you'll face
 bugs. With closed source you're stuck. With open source you can always give it
 a try and sometimes you or an expert friend/student will be able to fix it.
 4) Reproducibility: having access to an executable does not explain how it
 works. Certainly there is complex equipment no one would tamper with. But,
 hey, we scientists do build our own equipment often, and do science because
 we like to tamper with complex things.
 	When things are not reproducible, closed tools do not allow you to
 decide which result to believe. Be it spectrometry or calculations. You
 must believe in the tool being right because the author -invested by the
 Grace of G.- says s/he cannot be wrong.
 5) Competition: if you give access to your code to others they will be able
 to use it. Yeah. So what? That is what science is about. Yes, and getting
 a tenure track position. Now, who is more likely to get it? One obscure
 guy who published a work using his own code that no one else is able or cares
 to reproduce? Or one obscure guy whose code is being used by many famous
 scientists who reviewed it and mention it in their high-impact publications?
 	If your code is useful, publish it. By all means do also publish
 your results computed with it. But you are more likely to get citations from
 your code -if others find it worth using- than from your results.
 6) Equilibrium. This is the crux of the problem. Open source imposes a new
 equilibrium. With closed source you avoid a lot of trouble and criticism:
 nobody will see if you are a sloppy programmer, and nobody will know if your
 program does not work correctly (everybody is used to new versions giving
 new results, especially if they include a new algorithm), so hiding mistakes
 becomes trivial.
 	With open source you trade the comfort of obscurantism for the
 benefits of light. Your code, once shared, may become popular if it is good.
 If it does, fixes will come for free, you remove any justification for others
 to compete by doing the same thing (and to get grants for repeating your
 work: bye bye, competitors), and you save on research costs, so you can do
 more... and take on more responsibility. It's a new equilibrium.
 7) Knowledge: closed tools/data allow no one else to learn from them, so
 competitors have a harder time reaching your level of knowledge (and you may
 think that is your road to tenure). Think twice: they'll have to learn the
 hard way. But they will, and in doing so they will waste money you could be
 using. Sharing everything makes it easier for them to learn and to cite you,
 and discourages them from reproducing your work, since that becomes trivial
 to do and won't help them win points.
 	So, why have so many companies shared the information on drugs in ZINC?
 Because there is an opportunity for them to benefit from the work of others:
 you find a use for their drug and they sell it.
 8) Equilibrium: that brings us back to square one. Is there a benefit for a
 company to share/open their code? I'd bet in most cases yes, especially under
 the GPL: that way competitors may use it, but improvements will flow back, and
 no one else will be able to sell it at premium prices. The author can do as
 s/he wishes. And clients will -if you are good- prefer to contract the
 author for maintenance, extensions, etc. rather than less knowledgeable third
 parties.
 	Of course there are always exceptions. My bet is there should be just
 that, exceptions. And yet, nobody prevents you from publishing your
 irreproducible results in the "Gullible" Chemical Journal. There is
 space for all.
 But certainly, even the faster-than-light result has been "stopped" until it
 can be reproduced, and I can hardly think of something more difficult (or
 expensive) to reproduce.
 	Nevertheless, I can see I'm not going to convince everybody. Dealing
 with special cases will require a lot of politics, we are human after all.
 But, personally, I see nothing wrong with defining a sensible baseline for a
 new equilibrium that brings more openness and less obscurantism (and that saves
 everybody's dollars in a time of crisis).
 It's not only Climate: Bioinformatics, Cryptography, Physics, Statistics and
 many other fields realized long ago that this was the best route for them. I'm
 not saying it is for Computational Chemistry, but it certainly is time to
 ponder it.
 				j
 --
 			EMBnet/CNB
 		Scientific Computing Service
 	Solving all your computer needs for Scientific
 			Research.
 		http://bioportal.cnb.csic.es
 		  http://www.es.embnet.org