From: Roberto Gomperts <roberto.,at,.medusa.boston.sgi.com>
Date: Wed, 04 Nov 92 19:01:17 EST
Subject: Re: Musings about parallelism...
Your message dated: Tue, 03 Nov 92 21:29:35 EST
> The recent flurry of posts about parallelism prompts my $0.02. If
> any of this has been said here before, I'm sorry. (I wasn't a subscriber
> when parallelism was a topic last year!)
>
I guess it is time to add my own $0.02 to this interesting thread.
> Parallelism is not *that* new for computational chemistry, though I agree
> that it is newer than vectorization.
Yes, parallelism came after vectorization although vectorization
could easily be viewed as a special case of parallelism. It is
just a matter of how you define it.
I have often compared the difficulties in gaining broad acceptance of
parallelism with the early days of vectorization: initially the
conversion of code to take effective advantage of a particular
hardware architecture can be seen as an insurmountable obstacle.
And, of course, you have the naive minds that think that a
particular implementation of an algorithm will run well on any
kind of machine. This always leads to frustration and the
dismissal of interesting and good opportunities. These
situations occurred before vector machines were popular, and we
have seen them again as parallel machines evolve. But, in the same way
that software developers and other users got used to vector codes
(either by converting existing code or by writing from scratch), we are
already seeing more and more parallel codes. This very
discussion thread is another indication of the growing
acceptance and popularity of parallelism.
It is remarkable that most of the original (shared memory) parallel
computers had/have vector CPUs (Alliant, Convex, Cray). Even the loosely
coupled model in Enrico's lab (which Graham describes below)
had fast "pipe-lined" processors (again something close to a
vector machine).
> The first use of parallelism for quantum chemistry that I'm aware of was
> in Enrico Clementi's lab at IBM in Kingston NY in the mid 80s (see
> IJQC Symp 18, 601 (1984) and JPC 89, 4426 (85)). When I started a postdoc
> there in Jan 86, both IBMOL (later KGNMOL) and HONDO ran in parallel on the
> LCAP systems. Each LCAP had a serial IBM "master" and 10 FPS array processor
> "slaves" that acted in a distributed memory fashion, though later
> developments added shared memory. The parallel HONDO 8 referred to in
> an earlier post here probably descends from that version, parallelised by
> Michel Dupuis. Incidentally this is where Roberto Gomperts (hi!) first
> learned about parallelism when developing KGNMOL. Many other comp chem
> programs were parallelized for LCAP in this lab too.
>
LCAP was a very interesting architecture. It was never meant to
be a "true" MPP (i.e. 100's or 1000's of processors) and it did
not have shared memory. The idea was to have a few reasonably
powerful processors. Enrico used to say something like "it is
better to have a cart pulled by 10 strong horses than by 1000
chickens".
It turns out that for many Monte Carlo and Ab-Initio programs
this model is very appropriate. It is not my intention to get into
or start a "religious war" between the MIMD and SIMD sects.
Given the right program and the right problem both architectures
can show their strengths!
> In Jan 88 I joined Hypercube (developers of HyperChem), which had been
> founded by Neil Ostlund to write computational chemistry software for
> distributed memory MIMD computers. Neil's philosophy was (and still is
> I think) that "dusty deck" FORTRAN codes do not parallelize well, and
> he sought to start from scratch with distributed memory MIMD parallelism
> as one of the design criteria. At that time he already had ab initio
> and semi-empirical prototype codes running on the Intel iPSC. I developed
> a parallel implementation of the AMBER molecular mechanics potential on
> the Intel iPSC/2 (written in C) and later in 1988 ported to a ring of
> transputers. These semi-empirical and molecular mechanics codes designed
> for distributed memory MIMD live on as parts of HyperChem! Once you've
> written for a parallel machine it's easy to run on a serial machine like
> the PC - just set the number of nodes to 1! For the SGI version of
> HyperChem, parallelism is exploited by simulating the message passing
> of distributed memory MIMD on multi-processor Irises. This may be the
> only parallel SGI comp chem code *not* parallelized by Roberto! ;-)
>
I think that, philosophical opinions aside, the
practical way to bring parallelism "to the masses" is, in an
initial stage, to convert existing (serial) programs to
run in parallel with reasonable efficiency.
This approach has several advantages, among others:
1. Usually it is not too hard to do.
2. As has been pointed out, users are often confronted with
   the choice of speed vs. throughput. In this context it is
   imperative that:
   a. running on 1 processor is as simple as Graham pointed
      out above: "just set the number of nodes to 1!"
   b. there is no significant loss in efficiency for the
      parallel code running on 1 processor with respect to the
      serial code.
I am not implying at all that new parallel algorithms should not be
developed and implemented. I am just saying that while that is
happening, and while there is no consensus on what the "standard"
or "converged" parallel architecture of the future is going to
be, it would be a pity not to be able to take advantage of
parallelism TODAY.
I am sorry if what follows sounds like advertising; it is only
intended as illustration. At SGI we are committed to doing just
that: making parallelism available TODAY and NOW, and in different
flavors and forms, trying to stay away from what I called before
"religious wars": use the correct approach for the correct
algorithm applied to the correct problem. To truly bring this "to
the masses" we work in collaboration with the commercial and
academic software vendors.
> BTW HyperChem's implementation of the MOPAC methods *is* parallel for
> distributed memory MIMD computers, but we haven't yet convinced Autodesk
> to market such a version. :-(
>
I should add that SGI's implementation of Mopac (obtainable via
QCPE) is also parallel. I must confess that it is not one of the
best examples of an efficient parallel implementation starting
from an existing serial code. But I think that any researcher
would be more than happy to obtain a result more than
2 times faster when using 3 processors than when using 1.
> It's nice to see the growing interest in and acceptance of parallelism,
> but somewhat frustrating that we've had to wait so long! In the meantime
> we had to make a serial PC version of our software to pay the rent! ;-)
>
Why did it take so long? Well, I guess that this is where the
accusing finger points to hardware vendors and to some system
software developers. The development of tools to either convert
serial codes to run in parallel or to develop parallel
algorithms from scratch has been lagging behind. Again, I am not
saying that there are no tools out there (SGI certainly has
a very neat and useful environment for parallel development),
but that tool development has not kept pace with the developments
in hardware, both SIMD and MIMD. It has been my experience in
different hardware companies that manufacture parallel computers
that the system software developers in these companies tend to
target the naive user, i.e. the person who will just use this
"wonderful and magic" compiler that will take your dusty deck and
make it run N times faster on N processors!!! (Obviously marketing
hype.) While these compilers/preprocessors will do a good job on
"well behaved" loops (I am talking here clearly about shared memory
machines), they have a long way to go before they can efficiently
and correctly tackle "real world" codes. My contention is that
the focus of the tools developers should be the applications
software developers. We need tools for expert or semi-expert
users. I think that this is the right way to bring parallelism
"to the masses" TODAY. And really, if you look at it, many of
the users of the programs are not the ones who developed them,
and while they might (and should) have a basic understanding of the
theoretical foundation of an algorithm or method, they have neither
the interest nor the time to get involved in the details of its
implementation. Mind you, I am not talking about using a program for
scientific research as a black box; but in practice people do
not care how a program is vectorized as long as it doesn't throw
their "CRAY money" away, or how it runs in parallel as long as it
performs well when using more than 1 processor.
> Someone (sorry I didn't keep the post) commended CDAN for its recent
> articles on parallelism - in the late 80's they declined to have Neil write
> an article on parallelism in computational chemistry because they said no
> one was interested in parallelism!
>
> Should you worry about porting or redesigning for distributed memory
> MIMD? Only if you:
> (a) want a single calculation done faster
> or
> (b) want to tackle a larger calculation.
> For throughput you're better off running n serial jobs on n nodes (provided
> the jobs fit!). You can do (a) for at least smaller numbers of nodes by
> porting a serial code, but for a large number of nodes or (b) you probably
> need to redesign to partition your data and hopefully keep data transfers
> minimized, to/from near nodes, and overlapped with calculation.
>
I would make the question more general and not restrict it to
MIMD machines. As (I think it was) Joe Leonard pointed out in
one of the first mailings of this thread, there are quite a few
programs out there that are running in parallel on shared memory
machines (and more are forthcoming!). In my opinion, multiprocessor
shared memory machines offer a unique development environment
for exploiting the appropriate level of parallelism in the right
place. Take for example the case of Gaussian 92. There a mixed
model of parallelism was used: a distributed memory model via the
"fork()" system call, combined with the allocation of
shared memory regions to avoid all the intricacies of message
passing algorithms. Fine grain parallelism was also exploited at
the loop level (the "magic" compiler) and via calls to
(shared memory) parallel routines for linear algebra operations
like matrix multiplies.
In other cases, given the underlying algorithms of the currently
available commercial MM and MD programs like Charmm, Discover,
Sybyl, etc., the best parallel implementation is a shared memory
one (sorry Graham!!). That is not to say that future
developments will not make distributed memory MIMD implementations
of MM and MD codes efficient.
> Exploiting parallelism with networked computers is a good idea that
> was first demonstrated in the 80s. Bob Whiteside, now at Hypercube,
> gained some acclaim by beating a Cray with a bunch of otherwise-idle
> networked Suns while he was at Sandia. As well as accomplishing (a),
> networked computers can be used effectively for (b), though most people
> seem more excited by the potential for speedup.
>
Here too I would generalize and not restrict this to networks of
distributed memory machines.
> Cheers,
>
> Graham
> ------------
> Graham Hurst
> Hypercube Inc, 7-419 Phillip St, Waterloo, Ont, Canada N2L 3X2 (519)725-4040
> internet: hurst "at@at" hyper.com
>
>
-- Roberto
Roberto Gomperts
roberto-0at0-sgi.com
phone: (508) 562 4800
Fax: (508) 562 4755