CCL:G: Share of experience, software and hardware



Dear Ibrahim,
 This page may help: www.pqs-chem.com.
 Best Regards.
  
 Olasunkanmi Lukman Olawale
 ________________________________
 Current Address:
 Department of Chemistry,
 Obafemi Awolowo University,
 Ile-Ife, Osun State.
 Nigeria.
 +234-0-80-52401564 Or +234-0-80-67161091
 ________________________________
 On Monday, 9 December 2013, 15:55, Daniel Jana dfjana . gmail.com
 <owner-chemistry_+_ccl.net> wrote:
 Sent to CCL by: Daniel Jana [dfjana_-_gmail.com]
 Hello,
 Strategy 1 - I see no problem with having multiple users using the
 same computer. Of course, physically it's hard... you tend to only
 have one keyboard and screen after all. However, once you have the
 computers on the network, nothing prevents other users from connecting
 (I assume we are talking about Linux workstations) via SSH and
 launching calculations. Through a combination of job priority and
 appropriate choice of the number of cores, the machine can be used for
 regular work (checking journals on the web, reading/writing papers,
 the occasional video on YouTube because no one works 100% of the time)
 while running jobs.
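 A minimal sketch of the priority/cores idea above, in Python. It keeps a
 couple of cores free for whoever sits at the desk and prefixes the job with
 `nice` so interactive work stays responsive. The function names, the number
 of reserved cores and the command `my_calc input.inp` are assumptions made
 for illustration only.

```python
import os


def cores_for_jobs(total_cores, reserved_for_desktop=2):
    """Cores to give to calculations, keeping some free for interactive use."""
    return max(1, total_cores - reserved_for_desktop)


def niced_command(command, niceness=19):
    """Prefix a command with `nice` so it runs at low CPU priority."""
    return ["nice", "-n", str(niceness)] + list(command)


if __name__ == "__main__":
    total = os.cpu_count() or 1
    print("cores available for jobs:", cores_for_jobs(total))
    # Hypothetical job; replace with the real program and input file.
    print("would run:", " ".join(niced_command(["my_calc", "input.inp"])))
```

 The resulting list can be passed to `subprocess.Popen` on the workstation,
 for instance from an SSH session.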
 The easiest way to manage the software is to have an NFS-shared
 partition with all the software installed. This means you only install
 the software in one place, rather than locally on every machine.
 In this scenario, users typically check for available workstations and
 run their jobs directly on the machines. You can always go the
 extra mile and install a scheduler, so they can submit jobs to a
 queue and each job runs on the first available machine. But that may be too
 much to learn in the beginning.
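 Checking for a free workstation by hand can be scripted. The sketch below
 assumes you collect the text printed by `uptime` on each machine (for
 example via `ssh HOST uptime`) and pick the host with the lowest 1-minute
 load. The host names `ws1` and `ws2` and the sample lines are hypothetical.

```python
import re


def one_minute_load(uptime_output):
    """Extract the 1-minute load average from the text printed by `uptime`."""
    match = re.search(r"load average:\s*([0-9.]+)", uptime_output)
    if match is None:
        raise ValueError("no load average found")
    return float(match.group(1))


def least_loaded(loads_by_host):
    """Return the name of the host with the smallest load."""
    return min(loads_by_host, key=loads_by_host.get)


if __name__ == "__main__":
    # Sample text in the usual Linux `uptime` layout (made-up values).
    samples = {
        "ws1": " 15:55:01 up 10 days,  3:12,  2 users,  load average: 3.90, 3.70, 3.50",
        "ws2": " 15:55:02 up  4 days,  1:05,  1 user,  load average: 0.15, 0.30, 0.25",
    }
    loads = {host: one_minute_load(text) for host, text in samples.items()}
    print("least loaded host:", least_loaded(loads))
```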
 Strategy 2 - Obviously having a cluster is the ideal solution, but I'm
 not sure you'll get far with that budget (perhaps you could buy computers
 from a regular shop rather than rack-ready hardware, making it a bit
 cheaper). You will still have a lot of computers in the same room
 producing heat. You have to at least consider the possibility that
 part of that money will go to buying and installing AC. If you go with
 strategy 1 you probably will have the students spread over several
 rooms so the problem becomes less obvious. And buying a rack-ready
 cluster will also mean buying a rack. With such a small budget, the rack
 may end up being a non-negligible part of it.
 Concerning Amber/Gaussian... those two codes have different
 capabilities when it comes to scaling to many cores. My personal
 feeling is Gaussian scales poorly beyond 8 cores and poorly over more
 than one machine. Amber, on the other hand, should be more or less
 linearly scaling for at least a few hundred cores. This means that if
 you plan to have a cluster for Gaussian there's not much of a need for
 Infiniband, while for Amber it does make sense (because running it
 over Ethernet does impact the performance substantially). Of course,
 with a total budget of 40 kUSD, talking about Infiniband is probably a
 bit stupid.
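 One common way to reason about this kind of scaling is Amdahl's law: if
 only a fraction p of the run time can be parallelised, the ideal speedup on
 N cores is 1 / ((1 - p) + p / N). The sketch below uses made-up values of p
 purely for illustration; they are not measurements of Gaussian or Amber.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when only `parallel_fraction` of the run time parallelises."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical parallel fractions: one modest, one very high.
    for fraction in (0.90, 0.999):
        for cores in (1, 8, 64):
            speedup = amdahl_speedup(fraction, cores)
            print(f"p={fraction}  cores={cores:3d}  speedup={speedup:6.2f}")
```

 A code with a modest parallel fraction gains little from extra cores, which
 is why a fast interconnect only pays off for codes that scale well.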
 I'd say in the beginning strategy 1 makes more sense. You still need
 computers for the students anyway... no point having a cluster if
 users can't connect to it. You can also slowly start learning the
 tools needed to run a cluster (e.g. learning NFS in the beginning to
 share the software; later on installing NIS to manage the users
 centrally, rather than having to install all users on all
 workstations; later on installing a scheduler so that jobs can be
 automatically submitted to remote machines). It's true that a cluster
 is not trivial to manage, so taking baby steps is probably the best way
 to go about it. Once you feel more comfortable, perhaps with one or two
 students able to handle all the tools that make up a cluster, you could
 think about acquiring one, potentially together with a few other groups
 so that the joint investment gives you a cluster with 30 or 40 nodes.
 When it comes to software: I would avoid purchasing non-scientific
 software as much as possible. Why spend money on an OS, when Linux
 (provided you and your students have either the skills or the time to
 learn them) costs nothing and is probably the best solution? Once your
 students are accustomed to the shell, they can start working on
 scripts that make their life easier (e.g. by parsing the output files
 and extracting only the relevant bits, rather than having to do it all
 by hand). Linux and related tools will cover most of your needs (even
 if you go for a cluster, NFS, DHCP, SSH, NIS, are all readily
 available and there's plenty of information on how to get them to
 work). And if you are considering a cluster, chances are you'll
 need to learn Linux anyway. At least for the cluster, you need a
 scheduler to manage the jobs of the users (although, as I mentioned
 earlier, it may even make sense with the workstations). Lately I've
 been inclined to use SLURM. Torque feels a bit abandoned, SGE split
 into so many things after Sun got bought by Oracle that I don't even
 know which version to install. I could name a few others, but some of
 the ones I've tried just felt too bad to be put into production. SLURM
 is a young project, it has some quirks, but it seems a good bet for
 the near future. Compilers: in the beginning you can certainly work
 with GNU compilers (gcc, gfortran, ...), coming with Linux. Most of
 the codes you need to compile will work with those. You'll definitely
 need to install BLAS and LAPACK. Perhaps they will be available from
 the Linux distribution you choose. But it would be best to compile
 them locally, for optimal performance. FFTW 2 and 3 will also be
 important, but you'll figure that out quickly. However, in the long
 run, consider purchasing Intel compilers and MKL. The codes compiled
 with those are often faster than those compiled with GNU compilers.
 With a limited number of machines, efficiency may be the best upgrade
 you can get.
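 As an example of the output-parsing scripts mentioned above, here is a
 minimal Python sketch that collects the SCF energies from a Gaussian log.
 It assumes the usual `SCF Done:` lines; the function name and the sample
 line are for illustration, so check the pattern against your own outputs.

```python
import re

# Matches lines such as " SCF Done:  E(RHF) =  -76.0098706218     A.U. after   10 cycles"
SCF_LINE = re.compile(r"SCF Done:\s+E\([^)]+\)\s+=\s+(-?\d+\.\d+)")


def scf_energies(log_text):
    """Return the SCF energies (in hartree) found in the log text, in order."""
    return [float(match.group(1)) for match in SCF_LINE.finditer(log_text)]


if __name__ == "__main__":
    sample = " SCF Done:  E(RHF) =  -76.0098706218     A.U. after   10 cycles\n"
    print(scf_energies(sample))
```

 For a real file, read it with `open(path).read()` and pass the text in; the
 last element of the list is the energy of the final geometry.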
 As for vendors, I feel I cannot give you a good answer. Certainly the
 best vendor in Egypt is not the same as here (read best in whatever
 way you want, from cheapest to the one giving the best customer
 service).
 I hope this helps,
 Daniel
 PS - Please consider a backup solution. You may go with strategy 1 for
 now, but it serves no purpose to have all those computers and risk
 losing months of work because a hard drive died. Consider buying a
 machine, with several disks and several times the capacity of the
 individual computers and automating backups of the workstations. It
 can even be the machine where you install the software, to reduce
 costs. Bonus points if you manage to have it in a separate location
 (e.g. a server room on the other side of the campus). This way you
 avoid losing the backups and the workstations when a fire burns your
 lab or when someone steals some computers overnight. It may seem that
 you can think about this later, but from personal experience and
 anecdotal evidence, people only think about backups when it's too
 late, when you already need them.
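 Automating the backups can start very small. The sketch below only builds
 one `rsync -a` command per workstation, pulling a directory onto the backup
 machine; the host names, the `/home` source and the `/backup` destination
 are assumptions for illustration, and it presumes SSH access from the
 backup machine to each workstation.

```python
def rsync_backup_command(host, source_dir, backup_root):
    """Build an rsync command copying source_dir on host into backup_root/host/."""
    source = f"{host}:{source_dir.rstrip('/')}/"
    destination = f"{backup_root.rstrip('/')}/{host}/"
    # -a (archive) preserves permissions, times and symlinks.
    return ["rsync", "-a", source, destination]


if __name__ == "__main__":
    # Hypothetical workstation names.
    for host in ("ws1", "ws2"):
        print(" ".join(rsync_backup_command(host, "/home", "/backup")))
```

 Running such a script nightly from cron on the backup machine is one simple
 way to make sure it actually happens.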
 On 8 December 2013 20:55, Mahmoud A. A. Ibrahim
 m.ibrahim[A]compchem.net <owner-chemistry%a%ccl.net> wrote:
 >
 > Sent to CCL by: "Mahmoud A. A. Ibrahim" [m.ibrahim^compchem.net]
 > Dear Colleagues
 > We ask you kindly to share your experience with us.
 > Nowadays, we are establishing a new computational chemistry lab and aiming
 > to purchase some hardware.
 > The budget is not high. It is around 40,000$.
 > We have two strategies:
 > 1- Purchase good workstations with the available budget. The problem is
 > that only one user will use the workstation, i.e. we need a workstation per
 > each student. If there is any way to make many users to use the same
 > workstation at the same time, please share your knowledge with us and let
 > us know.
 > 2- Purchase a small HPC which can be upgraded in the near future (just add
 > more processors and storage disks). I prefer this strategy which makes us
 > able to increase our facilities in the future very easily without getting
 > rid of the old ones. But, we don't have professional technicians here at
 > the current time, and our colleagues say that it is not easy to manage a
 > small HPC to handle your jobs.
 > We need your experience and let us know if you were us which one you would
 > purchase (workstations or small HPC).
 > It would be nice from you if you let us know what all hardware and software
 > you need to purchase starting from operating system up to the software
 > responsible for handling the jobs and compilers. As well in case of
 > purchase HPC/workstations, which company you would recommend.
 > For your information, we are aiming to run Gaussian calculations and AMBER
 > simulations at the current time.
 > Finally, we thank you deeply in advance for your support.
 > Sincerely;
 > M. Ibrahim
 > P.S. we read many posts on CCL regarding the hardware but because of the
 > fast growing up of technology we are afraid we missed something around. We
 > do apologize for any inconvenience caused.
 > --
 > Mahmoud A. A. Ibrahim
 > Editor, Journal of Organic and Biomolecular Simulations (JOBS), Science
 > Publications
 > Group Leader, CompChem Lab, Chemistry Department,
 > Faculty of Science, Minia University, Minia 61519, Egypt.
 > Email: m.ibrahim()compchem.net
 >             m.ibrahim()mu.edu.eg
 > Website: www.compchem.net
 >