
From:  Roberto Gomperts <roberto@medusa.boston.sgi.com>
Date:  Wed, 04 Nov 92 19:01:17 EST
Subject:  Re: Musings about parallelism...


Your message dated: Tue, 03 Nov 92 21:29:35 EST
 > The recent flurry of posts about parallelism prompts my $0.02.  If
 > any of this has been said here before, I'm sorry.  (I wasn't a subscriber
 > when parallelism was a topic last year!)
 >
	I guess it is time to add my own $0.02 to this
	interesting thread.
	
 > Parallelism is not *that* new for computational chemistry, though I agree
 > that it is newer than vectorization.

	Yes, parallelism came after vectorization, although vectorization
	could easily be viewed as a special case of parallelism; it is
	just a matter of how you define it.
	I have often compared the difficulties of gaining broad acceptance
	for parallelism with the early days of vectorization: initially,
	the conversion of code to take effective advantage of a particular
	hardware architecture can be seen as an insurmountable obstacle.
	And, of course, you have the naive minds who think that a
	particular implementation of an algorithm will run well on any
	kind of machine. This always leads to frustration and the
	dismissal of interesting and good opportunities. These
	situations occurred before vector machines were popular, and we
	have seen them again as parallel machines evolve. But, in the same
	way that software developers and other users got used to vector
	codes (either by converting existing ones or by writing them from
	scratch), we are already seeing more and more parallel codes. This
	very discussion thread is another indication of the growing
	acceptance and popularity of parallelism.
	It is remarkable that most of the original (shared memory) parallel
	computers had/have vector CPUs (Alliant, Convex, Cray). Even the
	loosely coupled model in Enrico's lab (which Graham describes
	below) had fast "pipe-lined" processors (again, something close
	to a vector machine).

 > The first use of parallelism for quantum chemistry that I'm aware of was
 > in Enrico Clementi's lab at IBM in Kingston NY in the mid 80s (see
 > IJQC Symp 18, 601 (1984) and JPC 89, 4426 (85)).  When I started a postdoc
 > there in Jan 86, both IBMOL (later KGNMOL) and HONDO ran in parallel on the
 > LCAP systems.  Each LCAP had a serial IBM "master" and 10 FPS array processor
 > "slaves" that acted in a distributed memory fashion, though later
 > developments added shared memory.  The parallel HONDO 8 referred to in
 > an earlier post here probably descends from that version, parallelised by
 > Michel Dupuis.  Incidentally this is where Roberto Gomperts (hi!) first
 > learned about parallelism when developing KGNMOL.  Many other comp chem
 > programs were parallelized for LCAP in this lab too.
 >
 
	LCAP was a very interesting architecture. It was never meant to
	be a "true" MPP (i.e., 100's or 1000's of processors) and it did
	not have shared memory. The idea was to have a few reasonably
	powerful processors. Enrico used to say something like "it is
	better to have a cart pulled by 10 strong horses than by 1000
	chickens".
	It turns out that for many Monte Carlo and ab initio programs
	this model is very appropriate. It is not my intention to get into
	or start a "religious war" between the MIMD and SIMD sects.
	Given the right program and the right problem, both architectures
	can show their strengths!
	
 > In Jan 88 I joined Hypercube (developers of HyperChem), which had been
 > founded by Neil Ostlund to write computational chemistry software for
 > distributed memory MIMD computers.  Neil's philosophy was (and still is
 > I think) that "dusty deck" FORTRAN codes do not parallelize well, and
 > he sought to start from scratch with distributed memory MIMD parallelism
 > as one of the design criteria.  At that time he already had ab initio
 > and semi-empirical prototype codes running on the Intel iPSC.  I developed
 > a parallel implementation of the AMBER molecular mechanics potential on
 > the Intel iPSC/2 (written in C) and later in 1988 ported to a ring of
 > transputers.  These semi-empirical and molecular mechanics codes designed
 > for distributed memory MIMD live on as parts of HyperChem!  Once you've
 > written for a parallel machine it's easy to run on a serial machine like
 > the PC - just set the number of nodes to 1!  For the SGI version of
 > HyperChem, parallelism is exploited by simulating the message passing
 > of distributed memory MIMD on multi-processor Irises.  This may be the
 > only parallel SGI comp chem code *not* parallelized by Roberto! ;-)
 >
	I think that, philosophical opinions aside, the practical way
	to bring parallelism "to the masses" is, in an initial stage, to
	convert existing (serial) programs to run in parallel with
	reasonable efficiency.
	This approach has several advantages, among others:
	  1. Usually it is not too hard to do.
	  2. As has been pointed out, users are often confronted with
	  the choice of speed vs. throughput. In this context it is
	  imperative that:
	     a. running on 1 processor is as simple as Graham pointed
	     out above: "just set the number of nodes to 1!"
	     b. there is no significant loss in efficiency for the
	     parallel code running on 1 processor with respect to the
	     serial code (see the sketch below).
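
	A minimal sketch of this "nodes = 1" property (hypothetical
	names, plain C, not taken from any of the codes mentioned): each
	node sums its own contiguous block of an array, so with nnodes = 1
	the "parallel" code reduces exactly to the serial loop.

#include <stdio.h>

/* Each node computes the sum over its own contiguous block of x.
 * With nnodes == 1 and mynode == 0 this is the plain serial loop,
 * so nothing is lost in the 1-processor case. */
double partial_sum(const double *x, int n, int mynode, int nnodes)
{
    int lo = (int)((long)n * mynode / nnodes);        /* block start */
    int hi = (int)((long)n * (mynode + 1) / nnodes);  /* block end   */
    double s = 0.0;
    for (int i = lo; i < hi; i++)
        s += x[i];
    return s;  /* partial results would be combined across nodes */
}

int main(void)
{
    double x[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    printf("%g\n", partial_sum(x, 8, 0, 1));  /* one "node": prints 36 */
    return 0;
}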
	     
	I am not implying at all that new parallel algorithms should not be
	developed and implemented. I am just saying that, while that is
	happening and while there is no consensus on what the "standard"
	or "converged" parallel architecture of the future is going to
	be, it would be a pity not to be able to take advantage of
	parallelism TODAY.
	
	I am sorry if what follows sounds like advertising; it is only
	intended as illustration. At SGI we are committed to doing just
	that: making parallelism available TODAY and NOW, in different
	flavors and forms, trying to stay away from what I called above
	"religious wars": use the correct approach for the correct
	algorithm applied to the correct problem. To truly bring this "to
	the masses" we work in collaboration with commercial and
	academic software vendors.
	  
 > BTW HyperChem's implementation of the MOPAC methods *is* parallel for
 > distributed memory MIMD computers, but we haven't yet convinced Autodesk
 > to market such a version. :-(
 >
	I should add that SGI's implementation of MOPAC (obtainable via
	QCPE) is also parallel. I must confess that it is not one of the
	best examples of an efficient parallel implementation derived
	from an existing code. But I think that any researcher
	would be more than happy if he/she can obtain a result more than
	2 times faster when using 3 processors than when using 1.
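
	As a side note (my arithmetic, via the standard Amdahl's law,
	which the post itself does not invoke): with speedup
	S(N) = 1 / ((1 - p) + p/N), a speedup of 2 on N = 3 processors
	corresponds to a parallel fraction p = 0.75 of the runtime.

#include <stdio.h>

/* Amdahl's law: speedup on nprocs processors when a fraction p of
 * the serial runtime can be run in parallel. */
double amdahl_speedup(double p, int nprocs)
{
    return 1.0 / ((1.0 - p) + p / nprocs);
}

int main(void)
{
    printf("S(3) = %.2f\n", amdahl_speedup(0.75, 3));  /* 2.00 */
    printf("S(1) = %.2f\n", amdahl_speedup(0.75, 1));  /* 1.00 */
    return 0;
}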
	
 > It's nice to see the growing interest in and acceptance of parallelism,
 > but somewhat frustrating that we've had to wait so long! In the meantime
 > we had to make a serial PC version of our software to pay the rent! ;-)
 >
	Why did it take so long? Well, I guess this is where the
	accusing finger points at hardware vendors and at some system
	software developers. The development of tools to either convert
	serial codes to run in parallel or to develop parallel
	algorithms from scratch has been lagging. Again, I am not
	saying that there are no tools out there (SGI certainly has
	a very neat and useful environment for parallel development),
	but they have not kept pace with the developments in hardware,
	both SIMD and MIMD. It has been my experience in different
	hardware companies that manufacture parallel computers that the
	system software developers in these companies tend to target the
	naive user, i.e., the person who will just use this "wonderful and
	magic" compiler that will take your dusty deck and make it run N
	times faster on N processors!!! (Obviously marketing hype.)
	While these compilers/preprocessors will do a good job on "well
	behaved" loops (I am talking here clearly about shared memory
	machines), they have a long way to go before they can efficiently
	and correctly tackle "real world" codes.
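
	To make "well behaved" concrete, here is the kind of loop such
	tools handle: one with no cross-iteration dependencies. The sketch
	uses an OpenMP pragma as a present-day stand-in for the vendor
	loop directives of the time (e.g. SGI's DOACROSS); it is an
	illustration, not any vendor's actual tool output.

#include <stdio.h>

#define N 1000000

/* A "well behaved" loop: every iteration is independent, so a
 * compiler/preprocessor can safely split it across processors. */
int main(void)
{
    static double a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { b[i] = i; c[i] = 2.0 * i; }

    #pragma omp parallel for   /* modern analogue of a loop directive */
    for (int i = 0; i < N; i++)
        a[i] = b[i] + c[i];

    printf("a[N-1] = %g\n", a[N - 1]);  /* 3 * (N - 1) */
    return 0;
}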
	My contention is that the focus of the tools developers should be
	the applications software developers. We need tools for expert or
	semi-expert users. I think that this is the right way to bring
	parallelism "to the masses" TODAY. And really, if you look at it,
	many of the users of the programs are not the ones who developed
	them, and while they might (and should) have a basic understanding
	of the theoretical foundation of an algorithm or method, they have
	neither the interest nor the time to get involved in the details
	of its implementation. Mind you, I am not talking about using a
	program for scientific research as a black box; but in practice
	people do not care how a program is vectorized as long as it
	doesn't throw their "CRAY money" away, or how it runs in parallel
	as long as it performs well when using more than 1 processor.
	
 > Someone (sorry I didn't keep the post) commended CDAN for its recent
 > articles on parallelism - in the late 80's they declined to have Neil write
 > an article on parallelism in computational chemistry because they said no
 > one was interested in parallelism!
 >
 > Should you worry about porting or redesigning for distributed memory
 > MIMD? Only if you:
 >     (a) want a single calculation done faster
 > or
 >     (b) want to tackle a larger calculation.
 > For throughput you're better off running n serial jobs on n nodes (provided
 > the jobs fit!).  You can do (a) for at least smaller numbers of nodes by
 > porting a serial code, but for a large number of nodes or (b) you probably
 > need to redesign to partition your data and hopefully keep data transfers
 > minimized, to/from near nodes, and overlapped with calculation.
 >
	I would make the question more general and not restrict it to
	MIMD machines. As (I think it was) Joe Leonard pointed out in
	one of the first mailings of this thread, there are quite a few
	programs out there that run in parallel on shared memory
	machines (and more are forthcoming!). In my opinion, multiprocessor
	shared memory machines offer a unique development environment
	for exploiting the appropriate level of parallelism in the right
	place. Take, for example, the case of Gaussian 92. There a mixed
	model of parallelism was used: a distributed memory model via the
	"fork()" system call, combined with the allocation of shared
	memory regions to avoid all the intricacies of message passing
	algorithms. Fine grain parallelism was also exploited at the
	loop level (the "magic" compiler) and via calls to (shared
	memory) parallel routines for linear algebra operations such as
	matrix multiplies (a sketch of this mixed model follows below).
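
	A minimal sketch of that mixed model (my illustration, not
	Gaussian's actual code; error handling omitted): fork() creates
	the workers, and an mmap'ed shared region lets them deposit
	results without any message passing.

#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define NWORKERS 4

int main(void)
{
    /* Anonymous shared mapping, visible to parent and children. */
    double *sum = mmap(NULL, NWORKERS * sizeof(double),
                       PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    for (int w = 0; w < NWORKERS; w++) {
        if (fork() == 0) {            /* child: one slice of the work */
            double s = 0.0;
            for (int i = w; i < 1000; i += NWORKERS)
                s += i;
            sum[w] = s;               /* write result into shared memory */
            _exit(0);
        }
    }
    while (wait(NULL) > 0)            /* reap all children */
        ;

    double total = 0.0;
    for (int w = 0; w < NWORKERS; w++)
        total += sum[w];
    printf("total = %g\n", total);    /* 0 + 1 + ... + 999 = 499500 */
    return 0;
}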
	In other cases, given the underlying algorithms of the currently
	available commercial MM and MD programs like CHARMm, Discover,
	Sybyl, etc., the best parallel implementation is a shared memory
	one (sorry Graham!!). That is not to say that future
	developments could not make distributed memory MIMD
	implementations of MM and MD codes efficient.
	
 > Exploiting parallelism with networked computers is a good idea that
 > was first demonstrated in the 80s.  Bob Whiteside, now at Hypercube,
 > gained some acclaim by beating a Cray with a bunch of otherwise-idle
 > networked Suns while he was at Sandia.  As well as accomplishing (a),
 > networked computers can be used effectively for (b), though most people
 > seem more excited by the potential for speedup.
 >
 	I would generalize this point as well, and not restrict it to
	networked MIMD machines.
 > Cheers,
 >
 > Graham
 > ------------
 > Graham Hurst
 > Hypercube Inc, 7-419 Phillip St, Waterloo, Ont, Canada N2L 3X2 (519)725-4040
 > internet: hurst@hyper.com
 >
 >

				-- Roberto


						Roberto Gomperts
						roberto@sgi.com
						phone: (508) 562 4800
						Fax:   (508) 562 4755





