From owner-chemistry*- at -*ccl.net Mon Nov 7 16:41:00 2011
From: "David A Mannock dmannock*ualberta.ca"
To: CCL
Subject: CCL: Science code manifesto
Message-Id: <-45836-111107161004-15816-xwMcvbc3ko8VbWvc/WvZtQ%x%server.ccl.net>
X-Original-From: David A Mannock
Date: Mon, 7 Nov 2011 14:09:53 -0700

Sent to CCL by: David A Mannock [dmannock#,#ualberta.ca]

Jose, good summary of the issues, which presents good arguments for
explanation and publication of computer chemistry codes. If only the
commercial interests could see the benefits of that approach, we would all
be further forward. I am guessing that they have beta testers in
independent labs providing that feedback for them though, even if that
scrutiny is not as comprehensive as releasing open source software. I guess
it's the difference between making lots of money from specialized users and
institutions for closed source code vs a little bit of money/copy for open
source code from the great unwashed, with the added benefit of lots of
feedback.

Dave

On Fri, Nov 4, 2011 at 12:02 PM, Jose R. Valverde jrvalverde]![cnb.csic.es
<owner-chemistry-$-ccl.net> wrote:

> Sent to CCL by: "Jose R. Valverde" [jrvalverde%a%cnb.csic.es]
> Regarding this whole issue, I came late after some traveling and read it
> all now.
>
> Let me take a stab at it:
>
> 1) If your science is funded by others, then whatever you do, no matter how
> much effort it required, does not belong to you but to whoever paid for it:
> the public or your boss.
>
>        So, I see a reason for anybody in the public to request access to
> works developed with their money, just as many other public and
> administrative documents are public or get declassified after a sensible
> time. Refusal is burglary of their hard-earned tax dollars.
>
>        That leaves the issue of closed developments.
> And brings more complex issues. Granted. We should find a way to coexist.
> But one thing is undeniable: accepting closed results is an act of blind
> faith. Accepting open results is only an act of faith that can always be
> verified if needed. It's like the difference between religious beliefs
> (accept my results because -backed by G.- I say so) vs. scientific beliefs
> (accept my results because I say so, but you can verify them).
>
> 2) Everybody seems very concerned about reviewers. But only journal
> reviewers. C'mon! Anybody who uses any tool is a reviewer of it. And of
> course there are those who won't fix a flat on their car or bike, but
> there are lots who will. Most of them cannot build a car, but may be able
> to fix it.
>
>        The whole point of openness is that anybody can review the results
> and the method used. Not that everybody actually does. But there are many
> who will at some point, so any lie is bound to surface and errors are
> easier to spot. Closed source is the untouchable 'holy IP' that
> anathematizes questioners.
>
> 3) Literacy. Most of us may be X-code illiterate. But we are not totally
> dumb. Neither are we absolutely perfect. Thus, any code will have bugs and,
> as the saying goes, 'you can fool everybody some times, somebody all
> times, but you can't fool MOM'. Some bugs may require experts to fix,
> others can be fixed by anyone, and the author always has the last word.
>
>        The corollary is simple: whenever you use any complex code you'll
> face bugs. With closed source you're stuck. With open source you can always
> give it a try and sometimes you or an expert friend/student will be able to
> fix it.
>
> 4) Reproducibility: having access to an executable does not explain how it
> works. Certainly there is complex equipment no one would tamper with. But,
> hey, we scientists do build our own equipment often, and do science because
> we like to tamper with complex things.
>
>        When things are not reproducible, closed tools do not allow you to
> decide which result to believe. Be it spectrometry or calculations. You
> must believe in the tool being right because the author -invested by the
> Grace of G.- says s/he cannot be wrong.
>
> 5) Competition: if you give others access to your code they will be able
> to use it. Yeah. So what? That is what science is about. Yes, and getting
> a tenure track position. Now, who is more likely to get it? One obscure
> guy who published a work using his own code that no one else is able/cared
> to reproduce? Or one obscure guy whose code is being used by many famous
> scientists who reviewed it and mention it in their high-impact
> publications?
>
>        If your code is useful, publish it. By all means do also publish
> your results computed with it. But you are more likely to get citations
> from your code -if it was worth using by others- than from your results.
>
> 6) Equilibrium. This is the crux of the problem. Open source imposes a new
> equilibrium: with closed source you avoid a lot of trouble and criticism;
> nobody will see if you are a sloppy programmer. Nobody will know if your
> program does not work correctly (everybody is used to new versions giving
> new results, especially if they include a new algorithm), so hiding
> mistakes becomes trivial.
>
>        With open source you trade that comfort in obscurantism for the
> benefits of light: your code, shared, may become popular if it is good;
> if it does, fixes will come for free, you remove the justification for
> others to compete doing the same thing (and for them getting grants to
> repeat your work, bye bye competitors), and you save on costs for your
> research, so you can do more... and get more responsibility. It's a new
> equilibrium.
>
> 7) Knowledge: closed tools/data allow no one else to learn from them, so
> competitors find it harder to reach your level of knowledge (think tenure
> track is your way).
> Think twice: they'll have to learn the hard way. But they will, and to do
> so they will waste money you could be using. Sharing everything makes it
> easier for them to learn and to cite you, and discourages them from
> reproducing your work, as it is trivial to do and won't help them win
> points.
>
>        So, why have so many companies shared the information on drugs in
> Zinc? There is an opportunity for them to benefit from the work of others.
> You find a use for their drug and they sell it.
>
> 8) Equilibrium: that brings us back to square one. Is there a benefit for a
> company to share/open their code? I'd bet in most cases yes. Especially
> under GPL: that way competitors may use it but improvements will flow
> back. And no one else will be able to sell it at premium prices. The
> author can do as s/he wishes. And clients will -if you are good- prefer to
> contract the author for maintenance, extensions, etc. rather than third,
> less knowledgeable parties.
>
>        Of course there are always exceptions. My bet is there should be
> just that, exceptions. And yet, nothing prevents you from publishing your
> irreproducible results in the "Gullible" Chemical Journal. There is space
> for all. But certainly, even the faster-than-light result has been
> "stopped" until it can be reproduced, and I can hardly think of something
> more difficult (or expensive) to reproduce.
>
>        Nevertheless, I can see I'm not going to convince everybody. Dealing
> with special cases will require a lot of politics; we are human after all.
>
> But, personally, I see nothing wrong with defining a sensible baseline for
> a new equilibrium that brings more openness and less obscurantism (and that
> saves everybody's dollars in a time of crisis).
>
> It's not only Climate: Bioinformatics, Cryptography, Physics, Statistics
> and many other fields realized long ago that this was the best route for
> them.
> I'm not saying it is for Computational Chemistry, but it certainly is time
> to ponder it.
>
>                                 j
> --
>                         EMBnet/CNB
>                 Scientific Computing Service
>         Solving all your computer needs for Scientific
>                         Research.
>
>                 http://bioportal.cnb.csic.es
>                 http://www.es.embnet.org
