CCL: Science code manifesto
- From: "Jose R. Valverde"
- Subject: CCL: Science code manifesto
- Date: Fri, 4 Nov 2011 19:02:36 +0100
Sent to CCL by: "Jose R. Valverde" [jrvalverde%a%cnb.csic.es]
Regarding this whole issue: I arrived late after some traveling and have only now read it all.
Let me take a stab at it:
1) If your science is funded by others, then whatever you do, no matter how
much effort it required, does not belong to you but to whoever paid for it:
the public or your boss.
So I see every reason for any member of the public to request access to
works developed with their money, just as many other public and administrative
documents are public or get declassified after a reasonable time. Refusal is
burglary of their hard-earned tax dollars.
That leaves the issue of closed developments, which raises more complex
issues. Granted. We should find a way to coexist. But one thing is undeniable:
accepting closed results is an act of blind faith. Accepting open results is
only an act of faith that can always be verified if needed. It's like the
difference between religious beliefs (accept my results because I, backed by G.,
say so) and scientific beliefs (accept my results because I say so, but you
can verify them).
2) Everybody seems very concerned about reviewers. But only journal reviewers.
C'mon! Anybody who uses any tool is a reviewer of it. Of course there are
those who won't fix a flat tire on their car or bike, but there are lots who will.
Most of them could not build a car, but they may be able to fix one.
The whole point of openness is that anybody can review the results
and the method used. Not that everybody actually does, but many will at some
point, so any lie is bound to surface and errors are easier to spot. Closed
source is the untouchable 'holy IP' that anathematizes anyone who questions it.
3) Literacy. Most of us may be X-code illiterate, but we are not total dummies.
Nor is anyone absolutely perfect. Thus, any code will have bugs and, as the
saying goes, 'you can fool everybody some of the time, somebody all of the time,
but you can't fool MOM'. Some bugs may require experts to fix, others can be
fixed by anyone, and the author always has the last word.
The corollary is simple: whenever you use any complex code, you'll face
bugs. With closed source you're stuck. With open source you can always give it
a try, and sometimes you or an expert friend/student will be able to fix it.
4) Reproducibility: having access to an executable does not explain how it
works. Certainly there is complex equipment no one would tamper with. But,
hey, we scientists often build our own equipment, and we do science because
we like to tamper with complex things.
When results are not reproducible, closed tools do not let you
decide which result to believe, be it spectrometry or calculations. You
must believe the tool is right because the author, invested by the
Grace of G., says s/he cannot be wrong.
5) Competition: if you give others access to your code, they will be able
to use it. Yeah. So what? That is what science is about. Yes, and about getting
a tenure-track position. Now, who is more likely to get it? An obscure
guy who published a work using his own code that no one else is able, or cares,
to reproduce? Or an obscure guy whose code is being used by many famous
scientists who have reviewed it and mention it in their high-impact publications?
If your code is useful, publish it. By all means, also publish
the results you computed with it. But you are more likely to get citations from
your code (if it is worth others using) than from your results.
6) Equilibrium. This is the crux of the problem. Open source imposes a new
equilibrium: with closed source you avoid a lot of trouble and criticism, and
nobody will see if you are a sloppy programmer. Nobody will know if your
program does not work correctly (everybody is used to new versions giving
new results, especially if they include a new algorithm), so hiding mistakes
is easy.
With open source you trade that comfort of obscurantism for the
benefits of light: your code, once shared, may become popular if it is good;
if it does, fixes will come for free; you remove the justification for others
to compete by doing the same thing (and for them to get grants to repeat
your work: bye-bye competitors); and you save on research costs, so
you can do more... and take on more responsibility. It's a new equilibrium.
7) Knowledge: closed tools/data allow no one else to learn from them, so
others find it harder to reach your level of knowledge (you may think this
secures your tenure track). Think twice: they'll have to learn the hard way. But
they will, and in doing so they will waste money you could be using. Sharing
everything makes it easier for them to learn and to cite you, and discourages
them from reproducing your work, since doing so is trivial and won't help them
win points.
So, why have so many companies shared information on their drugs in ZINC?
Because there is an opportunity for them to benefit from the work of others:
you find a use for their drug, and they sell it.
8) Equilibrium: that brings us back to square one. Is there a benefit for a
company in sharing/opening its code? I'd bet that in most cases there is,
especially under the GPL: that way competitors may use it, but improvements will
flow back. And no one else will be able to sell it at premium prices. The author
can do as s/he wishes. And clients will, if you are good, prefer to contract the
author for maintenance, extensions, etc. rather than less knowledgeable third parties.
Of course there are always exceptions. My bet is that they should be just
that: exceptions. And nobody prevents you from publishing your irreproducible
results in the "Gullible" Chemical Journal. There is room for all.
But certainly, even the faster-than-light result, which has been "stopped",
can be reproduced, and I can hardly think of anything more difficult (or
expensive) to reproduce.
Nevertheless, I can see I'm not going to convince everybody. Dealing
with special cases will require a lot of politics; we are human after all.
But, personally, I see nothing wrong with defining a sensible baseline for a
new equilibrium that brings more openness and less obscurantism (and that saves
everybody's dollars in a time of crisis).
It's not only Climate: Bioinformatics, Cryptography, Physics, Statistics and
many other fields realized long ago that this was the best route for them. I'm
not saying it is for Computational Chemistry, but it certainly is time to
Scientific Computing Service