CCL: Where can you publish articles on software?
- From: "Warren DeLano" <warren||delsci.com>
- Subject: CCL: Where can you publish articles on software?
- Date: Sat, 15 Oct 2005 12:09:52 -0700
Sent to CCL by: "Warren DeLano" [warren() delsci.com]
Dear Cory,
Thank you for that response, which includes an excellent discussion of the
"valid personal, economic, political, legal, practical, and institutional
reasons for not disclosing source code" to which I referred. Releasing
source code can very well make it harder for the involved parties to do
*more science*, due to distraction, loss of control, lack of proper
credit, loss of funding, risk to tenure, and so forth.
But also understand that by not releasing source code, you absolutely
deny the rest of the scientific community the opportunity of doing *more
science* using your source code. Perhaps your TCA paper would now have
1,000 citations had the code been open from the start. Who can say? It
depends.
Regardless, both sides of the above argument are pragmatic in nature and
do not address the question I again pose:
If we solely consider the standards of disclosure, verifiability, and
reproducibility inherent in the scientific method, is it not the case
that sharing source code comes much closer to meeting that ideal than
does not sharing source code?
And if so, then as good scientists, shouldn't we seek opportunities to
advance science through open source scientific code, whenever possible,
while fully taking into account pragmatic concerns like those you
described?
Cheers,
Warren
PS. Aside...
> Source code is "not" mathematical proof; just ask anyone who
> had to use a buggy Fortran compiler, say, when the runtime
> "check array bounds" debugging option didn't work with a
> character array.
Source code is a symbolic representation of an information process that
has a precise mathematical meaning, subject to the nature of the
implementation language. One can indeed prove mathematical equivalence
or non-equivalence between an abstract mathematical expression and the
mathematical meaning of a given source code implementation.
For example, take the expression "F=ma" and the source code
implementation "def F(m,a): return m*a". In Python, this
implementation can be proven correct to within the precision limits of
the double-precision floating-point representation. In contrast,
"def F(m,a): return m+a" can be proven non-equivalent.
In other words, having a broken "+" key on a calculator does not
invalidate Addition any more than having a broken Fortran compiler
undermines the mathematical validity of the Fortran program you
compile.
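The F=ma example can be made concrete with a short Python sketch. It assumes Python floats are IEEE 754 double-precision numbers, and it uses the standard `fractions` module as a stand-in for the abstract expression, since `Fraction` arithmetic on the exact values of the inputs involves no rounding. The helper names (`F_mul`, `F_add`, `exact_F`, `within_double_precision`, `counterexample`) are illustrative only. Note what the checks do and do not establish: one counterexample is enough to prove that `m+a` is non-equivalent, while agreement on a handful of samples is only empirical evidence. The correctness claim for `m*a` rests on the IEEE 754 guarantee that multiplication is correctly rounded, not on the samples. The input (2, 2) shows why testing alone can mislead: both implementations return 4.

```python
from fractions import Fraction

# IEEE 754 double precision: a correctly rounded operation has a relative
# error of at most 2**-53 (the unit roundoff) for results in the normal range.
UNIT_ROUNDOFF = Fraction(1, 2**53)

def F_mul(m, a):
    """Implementation from the message: F = m * a."""
    return m * a

def F_add(m, a):
    """Faulty implementation from the message: F = m + a."""
    return m + a

def exact_F(m, a):
    """Illustrative reference for F = ma, using exact rational arithmetic."""
    return Fraction(m) * Fraction(a)

def within_double_precision(impl, m, a):
    """True if impl(m, a) matches the exact m*a to within the unit roundoff."""
    exact = exact_F(m, a)
    approx = Fraction(impl(float(m), float(a)))  # exact value of the float result
    return abs(approx - exact) <= UNIT_ROUNDOFF * abs(exact)

def counterexample(impl, candidates):
    """Return the first input on which impl disagrees with F = ma, else None."""
    for m, a in candidates:
        if not within_double_precision(impl, m, a):
            return (m, a)
    return None

samples = [(2, 2), (2, 3), (0.1, 9.81), (70.0, -9.81)]

print(counterexample(F_mul, samples))  # None
print(counterexample(F_add, samples))  # (2, 3)
print(F_add(2, 2), F_mul(2, 2))        # 4 4: this one test cannot tell them apart
```

A single failing input disproves `F_add` outright; no finite list of passing inputs could prove `F_mul` correct by itself.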
--
Warren L. DeLano, Ph.D.
Principal Scientist
. DeLano Scientific LLC
. 400 Oyster Point Blvd., Suite 213
. South San Francisco, CA 94080 USA
. Biz:(650)-872-0942 Tech:(650)-872-0834
. Fax:(650)-872-0273 Cell:(650)-346-1154
. mailto:warren__delsci.com
> -----Original Message-----
> From: owner-chemistry__ccl.net [mailto:owner-chemistry__ccl.net]
> Sent: Saturday, October 15, 2005 10:36 AM
> To: Warren DeLano
> Subject: CCL: Where can you publish articles on software?
>
>
> Sent to CCL by: Cory Pye [cpye[-]crux.smu.ca]
> On Fri, 14 Oct 2005, Warren DeLano warren],[delsci.com wrote:
>
> > No -- let me clarify -- I do not imply that closed-source is
> > tantamount to deception. It is simply non-disclosure -- a willful
> > holding back of pertinent helpful information. It is tantamount to
> > saying "trust me" -- I have correctly applied chemistry, physics,
> > math, and computer science to create a working solution to
> > your problem.
>
> Every time somebody publishes a paper, there is a matter of
> trust inherent between the readers, journal, and author. This
> trust entails, for example, the author declaring "I did not
> fabricate my data", "I have not already published this work
> elsewhere", and the editor declaring "This manuscript has
> undergone a rigorous peer-review process", "I did not let any
> personal connection with the author influence my views", etc.
> There is a certain amount of trust inherent in scientific
> publishing. A (complicated) program is the author's "baby",
> and it is up to the author whether he or she wishes to
> make it publicly available. I don't think that the trust
> argument is applicable here, as such trust is an inevitable fact.
>
> One difficulty with making a program publicly available is
> that it can be mutated into a non-viable form by an inexperienced
> programmer, and if the mutant happens to be widely
> circulated, then much of the blame ends up on the original
> programmer, who has essentially given up his "baby" for
> adoption, instead of on the modifier. One can easily waste weeks
> addressing irate "customers" because of someone else's goof-up.
>
> >
> > Thorough testing of closed-source code can of course lay an empirical
> > foundation for extending such trust, and testing is equally necessary
> > with open-source code. But testing alone is not the same as
> > disclosing an implementation that can itself be subjected to direct
> > intellectual scrutiny.
> >
> > While there are valid personal, economic, political, legal, practical,
> > and institutional reasons for not disclosing source code, I challenge
> > anyone to come up with a compelling scientific reason for why source
> > code should not be disclosed -- when possible -- to enable
> > understanding, reproduction, verification, and extension of
> > computational advances.
>
> Suppose a junior faculty member publishes a paper and
> publicly releases some code with his first graduate student.
> The idea becomes so popular that a senior researcher or
> company takes that open-source code, with acknowledgements,
> and incorporates it into a popular commercial program, with
> some modifications and writes 2 or 3 papers describing the
> advances. Science is advanced. Lots of people use it, but
> cite only the paper listed in the senior researcher's manual.
>
> Now suppose that the advances were supposed to be the rest of
> the graduate student's Ph.D. work. This student cannot
> publish this work because the remaining ideas have already been
> published by the borrower. The junior faculty member, for
> lack of sufficient publications, loses his grant, and is
> denied tenure.
> The student's thesis, being mostly unpublishable, is not accepted.
> Has science advanced at the expense of the careers of the two
> individuals?
> Are computational chemists a species known for eating their
> young? :-) Had the junior faculty member not disclosed his
> source, it would have been more difficult for this
> hypothetical travesty to happen.
>
> The sad reality is that things like this can and do happen
> and a little paranoia goes a long way. It would have been far
> better if the company had approached the junior faculty member and
> come to an agreement.
>
> In order to be able to do your own science, pragmatics such
> as having tenure, a grant, etc. have to be looked at first.
> It is a lot less stressful (and there is less temptation to cut
> corners, e.g. on thorough testing) to implement a new code, building
> on some older code, when you don't have someone breathing
> down your neck trying to outdo you. Writing and debugging
> code is a lot of effort, and you want to be rewarded for your
> efforts, either through publications, citations, or financial
> remuneration.
>
> I would like to say for the record that my experience coding
> the COSMO routine within the ADF package (1997-99) has been
> absolutely wonderful, as I have found SCM to be very
> cooperative in pointing out my errors and vice versa. I am
> also grateful to the many users who have been patient when
> they hit the occasional snag, esp. Heiko Jacobsen and Michael
> Atanasov. Through dialogue the resulting code was much
> better. I will certainly celebrate with a bottle of good
> scotch when I hit 100 citations on the TCA paper describing
> our implementation either late this year or early next year.
>
> >
> > Is there ever a legitimate *purely scientific* reason for settling
> > for empirical evidence alone (just test results) when mathematical
> > proof is itself attainable (via inspection of source code)? I cannot
> > think of any.
> >
>
> Source code is "not" mathematical proof; just ask anyone who
> had to use a buggy Fortran compiler, say, when the runtime
> "check array bounds" debugging option didn't work with a
> character array.
>
> > Or are we all agreed that making source code available is the
> > *scientific* ideal to which we should all aspire?
> >
> > If so, then when we do not make source available, we should certainly
> > have some compelling non-scientific reason for holding it back, and as
> > honest scientists, we must realize that doing so will have the effect
> > of limiting the value and impact of our work -- at least from a
> > scientific standpoint. Intellectual advances are either shared or
> > lost, and software implementations are no exception to this.
> >
> > Cheers,
> > Warren
> >
> > [stuff deleted]
>
> ************* ! Dr. Cory C. Pye
> ***************** ! Associate Professor
> *** ** ** ** ! Theoretical and Computational Chemistry
> ** * **** ! Department of Chemistry, Saint Mary's University
> ** * * ! 923 Robie Street, Halifax, NS B3H 3C3
> ** * * ! cpye .. crux.stmarys.ca http://apwww.stmarys.ca/~cpye
> *** * * ** ! Ph: (902)-420-5654 FAX:(902)-496-8104
> ***************** !
> ************* ! Les Hartree-Focks (Apologies to Montreal Canadiens Fans)