SUMMARY of SGI hardware question
- From: qfsaulo #*at*# usc.es (Saulo Vazquez Rodriguez)
- Subject: SUMMARY of SGI hardware question
- Date: Tue, 18 Jul 1995 19:38:31 +0200
Dear netters:
Here is the collection of responses for "SGI hardware question".
Thanks for your help.
My original question was:
We are going to buy a computer to run GAUSSIAN94. We will probably
choose one of the two following options from Silicon Graphics:
(1) A two-processor (R8000's) PowerChallenge workstation with 128MB RAM
(in total) and 6 GB hard disk.
(2) Two single-processor PowerChallenge workstations, with 128MB RAM
and 4GB hard disk each.
Does anybody know which of the two options performs better
for GAUSSIAN calculations?
Thanks in advance.
Saulo A. Vazquez (qfsaulo #*at*# usc.es)
Response 1:
Hi,
If you have enough jobs to keep both processors constantly busy, it
doesn't matter so much. The two processor machine is nice since it makes
scheduling much easier; you can either queue a third job and it will start
when either one of the other jobs finishes, or you can use npri to put it
at a low priority and again, it will finish when either of the others
is done.
Alternatively you can just run all 3 together and each will run at 66%
speed and again, when any one of them finishes, the other 2 will run at 100%
speed.
If you only have one job to run (especially a big one) you can set it
to run in parallel mode and use both processors that way. Because of the
extra overhead of parallel execution, however, run in parallel only if you
have just 1 job to run (i.e. 2 jobs each running serially on its own CPU will
finish sooner than the same 2 jobs each run in parallel mode). If you only
have one job, however, use the extra CPU!
Bottom line, the 2 processor machine is a LOT more flexible, as long as
you can afford the price difference.
I hope this helps!
Dan
--
Dr. Daniel L. Severance dan #*at*# sage.syntex.com
Staff Researcher Work phone: (415) 354-7509
Syntex Discovery Research Home phone: (415) 969-5818
R6W-002 Fax (Work): (415) 354-7363
3401 Hillview Ave
Palo Alto, CA 94303
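The scheduling trade-off in Response 1 can be sketched with a small model. This is only an illustration under stated assumptions, none of which come from the respondent: the scheduler shares CPUs evenly among runnable jobs, both jobs need the same serial time, and a job run in parallel mode on 2 CPUs gets the 1.76x speedup quoted for 2*R8000 in Response 2. All function names are hypothetical.

```python
def per_job_speed(n_jobs, n_cpus):
    """Fraction of one CPU each serial job gets under even time-sharing."""
    if n_jobs <= 0 or n_cpus <= 0:
        raise ValueError("n_jobs and n_cpus must be positive")
    return min(1.0, n_cpus / n_jobs)


def time_two_jobs_serial(t_serial):
    """Two serial jobs side by side, one per CPU: both end after t_serial."""
    return t_serial


def time_two_jobs_parallel(t_serial, speedup_2cpu):
    """Two jobs run back to back, each using both CPUs in parallel mode."""
    if speedup_2cpu <= 0:
        raise ValueError("speedup_2cpu must be positive")
    return 2 * t_serial / speedup_2cpu


if __name__ == "__main__":
    # Three serial jobs on two CPUs: each runs at about two thirds speed.
    print(f"3 jobs on 2 CPUs: {per_job_speed(3, 2):.2f} of full speed each")

    t = 15.7  # minutes; single-R8000 time from Response 2, used as an example
    print(f"2 jobs, serial, side by side : {time_two_jobs_serial(t):.1f} min")
    print(f"2 jobs, parallel, back to back: "
          f"{time_two_jobs_parallel(t, 1.76):.1f} min")
```

Under these assumptions, any 2-CPU speedup below 2.0 makes the side-by-side serial runs finish the pair of jobs first, which is the point made above about parallel overhead.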
Response 2:
Which option is the best depends on what you are most eager to calculate: many
smaller jobs or fewer but bigger calculations.
For the bigger jobs the Power Challenge can use several processors in parallel.
This works very well in Gaussian 92 as you can see from the enclosed numbers.
There is also the possibility to install additional processors later on. The
parallel performance is probably even better in Gaussian 94. In the latter
there is also network parallelism through the Linda parallel environment, but
then you need additional software in order to run parallel computations. I
have no idea about the performance of this option.
..............................................................................
Gaussian 92 Test Job 178: TATB rhf/6-31g**//hf/6-31g**, 300 basis functions:
System                                Time (min)   Speedup vs 1*R8000
SGI Indigo^2, R4000                     113.4
SGI Challenge, 1*R4400                   74.3
SGI Power Challenge, 1*R8000             15.7      1.0
Cray Y-MP, 1 processor                   13.2
SGI Power Challenge, 2*R8000              8.9      1.76
SGI Power Challenge, 4*R8000              5.5      2.85
-------------------------------------------------------
Values obtained from other sources:
Cray C90 8/256, 1 processor               4.25
Cray C90 8/256 (incore), 1 processor      1.5
IBM 590/pwr2                             17
IBM 390/pwr2                             41
90 MHz Pentium                          600
..............................................................................
Best Regards
/Johan Landin
___________________________________________________________________
Johan Landin Tel: +46 31 773 3767
Dept. of Medical Biochemistry Fax: +46 31 41 6108
Medicinaregatan 9 Home: +46 31 14 7554
S-413 90 Goteborg, Sweden Email: landin #*at*# mednet.gu.se
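The Power Challenge timings in Response 2 can be turned into speedup and parallel-efficiency figures with a few lines of arithmetic. The sketch below also backs out the parallel fraction implied by Amdahl's law; applying Amdahl's law here is an assumption of this note, not something the respondent claimed, and the function names are illustrative.

```python
def speedup(t_one_cpu, t_n_cpus):
    """Speedup relative to the single-processor run."""
    return t_one_cpu / t_n_cpus


def efficiency(t_one_cpu, t_n_cpus, n_cpus):
    """Speedup divided by the number of processors used."""
    return speedup(t_one_cpu, t_n_cpus) / n_cpus


def amdahl_parallel_fraction(s, n_cpus):
    """Parallel fraction p solving s = 1 / ((1 - p) + p / n_cpus)."""
    if n_cpus < 2:
        raise ValueError("need at least 2 CPUs to infer a parallel fraction")
    return (1.0 - 1.0 / s) / (1.0 - 1.0 / n_cpus)


# Wall times in minutes for Test Job 178 on R8000 CPUs, from the table above.
TIMINGS_MIN = {1: 15.7, 2: 8.9, 4: 5.5}

if __name__ == "__main__":
    t1 = TIMINGS_MIN[1]
    for n in (2, 4):
        s = speedup(t1, TIMINGS_MIN[n])
        e = efficiency(t1, TIMINGS_MIN[n], n)
        p = amdahl_parallel_fraction(s, n)
        print(f"{n} CPUs: speedup {s:.2f}, efficiency {e:.2f}, "
              f"implied parallel fraction {p:.3f}")
```

Under this model the 2-CPU and 4-CPU timings imply the same parallel fraction, roughly 0.87 for this particular test job.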
Response 3:
If one ignores cost, then the next question may well be how fast do you wish to
get a given job done. On a 2 processor machine the effective speed-up is about
1.85 or so. You might contact Roberto Gomperts at SGI in Boston, the "keeper"
of G94 on SGI hardware. He knows more than most about your question.
Regards,
John
--
John M. McKelvey email: mckelvey #*at*# Kodak.COM
Computational Science Laboratory phone: (716) 477-3335
2nd Floor, Bldg 83, RL
Eastman Kodak Company
Rochester, NY 14650-2216
--
Response 4:
I am afraid I cannot speak from experience of GAUSSIAN. However I would
go for the two separate workstations. This would give you two screens, more
core memory, more disk space and two separate processors anyway. If by
chance one of the machines has a fault you would most likely have the
other one working.
Yours sincerely
Peter Bladon.
Response 5:
The answer will depend on how you use Gaussian. If you are going to only
run one job at any given time, then the single machine with 2 CPUs might
give you better performance, since you can do some of the work in parallel.
If you are going to be running lots of separate Gaussian runs, then you can
run two separate jobs on the two machines at the same time. Each individual
run takes longer, but a collection takes about the same time (maybe even
less since you don't have to pay for the overhead of parallelization).
Most of the people here tend to run a "family" of Gaussian jobs at a time
(same molecule, with different basis sets, or different constraints, etc).
I would tend to favor the second option, especially if the two options are
about equal in cost.
a) I seem to recall that Gaussian is another one of those memory hogs, so
"more memory is better". (More disk is better too)
b) Two separate machines are better if (when) one dies.
c) Sharing two machines is easier.
The things that I can come up with that favor the single machine are related
to networking and administration. Essentially, it is easier to only have one
machine to troubleshoot, upgrade, and administer. If you already have other
machines that you plan to network with this (these) new machine(s), then
the extra work for the new machine(s) sort of blends in, because "it is
always hardest the first time".
-------------------------------------------------------------
("`-/")_.-'"``-._
Wendy W. Richardson, Ph.D. (. . `) -._ )-;-,_()
Sr. Research Investigator (v_,)' _ )`-.\ ``-'
Searle _;- _,-_/ / ((,'
4901 Searle Parkway ((,.-' ((,/
Skokie, IL 60077 wwrich #*at*# ddpi7.monsanto.com
Response 6:
Of the two hardware platforms you describe I would lean to the
PowerChallenge with two processors over two single processor machines.
The memory bandwidth is better than the PowerIndigo2 and G94 automatically
compiles to run in parallel, if desired, on a PowerChallenge. Gaussian 94
defaults to 32MB of memory per process which is sufficient for the majority
of calculations and so even with 2 processors 128 MB is sufficient. Also
G94 can now use your full disk by splitting its scratch files into 2GB
chunks until SGI upgrades IRIX 6 to support files larger than 2GB.
Gaussian 94 uses a shared memory parallel model on the PowerChallenge
and HF and DFT energies, gradients and frequencies run in parallel. Post-HF
calculations take some advantage of parallelism, but the gains are less than
spectacular at this time. There is no additional cost for this capability.
Let us know if you have additional questions.
Doug Fox
help #*at*# gaussian.com
Response 7:
It looks as though (for the performance of ONE G94 task) you will get
higher performance on a 2-processor system than on 2 systems connected
in one cluster, provided the memory requirements of your task are no
higher than what you have on the 2-processor system. This is because
parallelization is worse in a cluster than on a 2-processor system.
For a mix of G94 jobs you should get higher
throughput on 2 independent 1-processor systems, owing to the larger
total memory and the absence of bus contention.
In terms of price/performance the 2-processor system should be
more attractive.
Dr.Mikhail Kuzminsky,
N.D.Zelinsky Institute of Organic Chemistry,
Moscow
Response 8:
We have a Power Challenge L and 2 Power Indigo2 workstations:
Power Challenge L   4xR8000, 512 MB, 2x8GB Gaussian scratch
                    directories (we implemented switching in a script)
                    striped across 2 Fast-Wide Differential SCSI-2
                    channels each with 2 4 GB disks (aggregate 40 MB/s
                    and we sustain about 36 MB/s).
                    Runs 2 Gaussian jobs simultaneously, each with
                    MEMORY=30 (240 MB).
Power Indigo2       96/128 MB, 2/4 GB dedicated Gaussian scratch space
                    on a single Fast-Wide SCSI-2 disk (10 MB/s and we
                    sustain 6-8 MB/s).
                    These systems are constrained to running only 1
                    Gaussian job at a time with "MEMORY=8" (64 MB).
All Gaussian scratch disks are Seagate Barracuda drives.
As far as raw CPU speed, the Power Indigo2 is about 0.8-0.9x the speed of a
single processor on the PowerChallenge because of smaller cache and more
limited bus bandwidth. Parallel performance is very good over 2-4
processors for HF and DFT calculations (degree of parallelization is about
0.95-0.97). Parallel performance is also good for MP2 calculations but not
as good as HF or DFT (degree of parallelization is about 0.90). However you
will find I/O to be a bigger issue particularly with Gaussian codes even
for DIRECT calculations. Minimizing I/O will be a major concern and will
dramatically impact your throughput. If you can, use striped file systems
for your Gaussian scratch directory; and it would be best to stripe across
disks on multiple SCSI channels as we do on the Power Challenge. On the
Power Indigo 2, I/O is typically 20-50% of the total job time while on the
Power Challenge it's 5-15%. Note that striping across a single SCSI
channel will only improve I/O by about 20%.
If you can...
Power Indigo 2:
Stripe across an internal (bus 0) and external (bus 1) disk on the
Power Indigo2 or use a bus extender to use external devices on both
channels. This will effectively double your I/O.
Power Challenge:
You can stripe across both internal SCSI channels but one is
differential (20 MB/s) and the other isn't (10 MB/s) so you'd be limited
to aggregate transfer rates of 30 MB/s. Alternatively, you could
stripe across the internal differential bus and the external bus which
may be set to differential to get 40 MB/s.
We chose to take another route which offers future expandability. We
installed an additional HIO card and bus extenders which provide three
additional external differential SCSI2 channels; cost about $3,000.
We then used two of these together with external differential 4 GB
disks to give us a striped array which sustains 36 MB/s. We plan to
upgrade to 6 processors and stripe across 4 fast-wide SCSI-2 channels
as soon as possible.
Note that as your processors get faster I/O will become a larger fraction
of the total job time. Therefore you'll have to look to balance your
system performance to maximize throughput.
Good luck!
_______________________________________________________________________
/ \
| Comments are those of the author and not Unilever Research U. S. |
| |
| Karl F. Moschner, Ph. D. |
| |
| Unilever Research U. S. e-mail: Karl.F.Moschner #*at*# urlus.sprint.com |
| 45 River Road Phone: (201) 943-7100 x2629 |
| Edgewater, NJ 07020 FAX: (201) 943-5653 |
\_______________________________________________________________________/
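Two of the quantitative points in Response 8 can be explored with a short sketch. It rests on assumptions that are not stated in the response: the "degree of parallelization" is read as the parallel fraction in Amdahl's law, and a job's time is modeled as CPU time plus I/O time, with the I/O time unchanged when the CPU gets faster. Function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction, n_cpus):
    """Amdahl's law: speedup on n_cpus for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cpus)


def io_fraction_after_cpu_speedup(io_fraction, cpu_factor):
    """I/O share of job time once the CPU part runs cpu_factor times faster.

    Assumes I/O time itself stays the same.
    """
    if not 0.0 <= io_fraction <= 1.0:
        raise ValueError("io_fraction must be between 0 and 1")
    if cpu_factor <= 0:
        raise ValueError("cpu_factor must be positive")
    cpu_time = (1.0 - io_fraction) / cpu_factor
    return io_fraction / (io_fraction + cpu_time)


if __name__ == "__main__":
    # Degrees of parallelization quoted above: 0.95-0.97 (HF/DFT), 0.90 (MP2).
    for label, p in (("HF/DFT low", 0.95), ("HF/DFT high", 0.97), ("MP2", 0.90)):
        row = ", ".join(f"{n} CPUs: {amdahl_speedup(p, n):.2f}x" for n in (2, 4, 6))
        print(f"{label:12s} p={p:.2f}  {row}")

    # I/O shares quoted above: 5-15% (Power Challenge), 20-50% (Power Indigo2).
    for io in (0.05, 0.15, 0.20, 0.50):
        new = io_fraction_after_cpu_speedup(io, 2.0)
        print(f"I/O share {io:.0%} -> {new:.0%} if the CPU part gets 2x faster")
```

The second function illustrates the closing remark of the response: with I/O time fixed, any gain in processor speed raises the I/O share of the total job time.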
Response 9:
It depends a bit on the type of jobs you are going to run. Option 1
would allow parallelization, which is done reasonably well for a
number of runtypes (in particular HF and DFT). This would mean one job
at a time.
If you want to run one job per processor, then go for option 2: it has
more RAM available for each calculation and I/O will not interfere.
Regardless of which configuration you choose: make sure that the scratch
space is spread over at least two disks, in the form of a striped device.
This greatly improves the I/O performance. For example, in the case of
option 2 this would mean two 2 GB disks, with 1 GB of each combined into a
striped scratch partition of 2 GB total. With a single 4 GB disk, I/O
performance is significantly worse.
Best wishes,
Nico van Eikema Hommes
--
Dr. N.J.R. van Eikema Hommes Computer-Chemie-Centrum
hommes #*at*# ccc.uni-erlangen.de Universitaet Erlangen-Nuernberg
Phone: +49-(0)9131-856532 Naegelsbachstr. 25
FAX: +49-(0)9131-856566 D-91052 Erlangen, Germany