From: "Jozsef Csontos jcsontos.lists .. gmail.com"
To: CCL
Subject: CCL: mpirun failure
Date: Tue, 18 Nov 2008 16:40:20 +0100

Hi John,

I believe you are using ssh for communication and don't have a passwordless connection from your node (CPU0) to your node (CPU1) on your SMP machine. Below I have pasted one of my earlier replies to a similar problem.

> You should work on this issue. (For example: generate a key pair with
> "ssh-keygen -t rsa" on the master node, then put the content of the
> generated public key file into the authorized_keys file on the working nodes:
> cat ~/.ssh/id_rsa.pub | ssh hostname_of_your_working_node "cat - >> ~/.ssh/authorized_keys")
> However, this procedure strongly depends on your cluster configuration.
>
> I hope it helps,
>
> Jozsef

In your case the above means:

A, ssh-keygen -t rsa (just press Enter 3 times)
B, cp .ssh/id_rsa.pub .ssh/authorized_keys (if authorized_keys already exists, append instead: cat .ssh/id_rsa.pub >> .ssh/authorized_keys)
C, try it (ssh localhost; the first time you will be asked to confirm the host key, then you're done)

Or you can try Google:
http://www.google.com/search?hl=en&q=passwordless+ssh&btnG=Google+Search&aq=1&oq=passwordl

Good luck,
Jozsef

John McKelvey jmmckel^^gmail.com wrote:
> Folks,
>
> This pgm fpi [included in the mpich-1.2.7p1 tarball, computes pi] runs
> fine on 1 processor on an SMP box, but for 2 processors I get the output
> below. I know that 127.0.0.1 means "looping back". Any hints on how to
> make this work with mpich-1.2.7p1 would be most appreciated [I have to
> use this version of mpich.]
>
> Many thanks!
>
> John McKelvey
>
> $mpirun -np 2 fpi
> connect to address 127.0.0.1 : Connection refused
> Trying krb4 rsh...
> connect to address 127.0.0.1 : Connection refused
> trying normal rsh (/usr/bin/rsh)
> localhost.localdomain: Connection refused
>
> p0_4885: p4_error: Child process exited while making connection to remote process on localhost.localdomain: 0
> Interrupt
> p0_4885: (33.019531) net_send: could not write to fd=4, errno = 32
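
Step B of the reply (installing the public key into authorized_keys) can be sketched as a small shell function. This is only a minimal sketch, not part of the original message: the function name install_key is hypothetical, both file paths are passed as arguments so it can be tried on scratch files first, and it assumes the public key file holds a single line. Unlike a plain cp, it appends and skips keys that are already present.

```shell
#!/bin/sh
# install_key PUBKEY_FILE AUTHORIZED_KEYS_FILE
# Hypothetical helper: append a one-line public key to an authorized_keys
# file unless that exact line is already there, then tighten permissions.
install_key() {
    pub_key_file=$1
    auth_file=$2
    auth_dir=$(dirname "$auth_file")

    # Create the directory and file if they do not exist yet.
    mkdir -p "$auth_dir"
    touch "$auth_file"

    # Assumption: the public key file contains exactly one line.
    key=$(cat "$pub_key_file")

    # -F: fixed string, -x: match the whole line, -q: no output.
    if ! grep -qxF -e "$key" "$auth_file"; then
        printf '%s\n' "$key" >> "$auth_file"
    fi

    # sshd commonly ignores key files with loose permissions.
    chmod 700 "$auth_dir"
    chmod 600 "$auth_file"
}
```

After step A, it could be called as install_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys, followed by step C (ssh localhost) to check that no password is requested.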