Opened 11 years ago

Last modified 11 years ago

#87 reopened

Issuing reboot on GPO and Utah IG nodes results in node not available

Reported by: lnevers@bbn.com
Owned by: somebody
Priority: minor
Milestone:
Component: AM
Version: SPIRAL5
Keywords: experimenter
Cc:
Dependencies:

Description

In several scenarios, at both Utah and GPO InstaGENI, issuing a reboot from inside a node causes the node not to come back up:

[lnevers@top ~]$ sudo shutdown -r now
Failed to talk to init daemon.
[lnevers@top ~]$ Connection to pc2.instageni.gpolab.bbn.com closed by remote host.
Connection to pc2.instageni.gpolab.bbn.com closed.
lnevers@arendia:~/gcf-2.2-rc4$ sleep 360
lnevers@arendia:~/gcf-2.2-rc4$ ssh -p 30780  lnevers@pc2.instageni.gpolab.bbn.com
ssh: connect to host pc2.instageni.gpolab.bbn.com port 30780: Connection refused

Four nodes in the slice IG-EXP-7 were restarted and, 30 minutes later, are still not accepting ssh connections:

  • pc2.instageni.gpolab.bbn.com port 30780
  • pc2.instageni.gpolab.bbn.com port 30779
  • pc3.utah.geniracks.net port 32060
  • pc3.utah.geniracks.net port 32058
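The manual check in the transcript above (wait, then retry ssh) can be approximated with a small script. This is only a sketch: the helper names `port_open` and `unreachable` are hypothetical, and a successful TCP connect shows only that something is listening on the port, not that an ssh login would succeed.

```python
import socket

# Endpoints copied from the list above.
ENDPOINTS = [
    ("pc2.instageni.gpolab.bbn.com", 30780),
    ("pc2.instageni.gpolab.bbn.com", 30779),
    ("pc3.utah.geniracks.net", 32060),
    ("pc3.utah.geniracks.net", 32058),
]


def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


def unreachable(endpoints, check=port_open):
    """Return the (host, port) pairs that do not accept a connection."""
    return [(host, port) for host, port in endpoints if not check(host, port)]
```

Calling `unreachable(ENDPOINTS)` returns whichever of the four endpoints still refuse connections at that moment.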

After the restart, each node's am_status changes from "ready" to "notready".

Change History (2)

comment:1 Changed 11 years ago by lnevers@bbn.com

Resolution: fixed
Status: new → closed

Jon, thanks for the detailed explanation. Since the nodes can be restarted with the AM API V3 geni_restart (or geni_stop + geni_start), I believe this is low priority. Capturing the email exchange here for tracking.

On 1/15/13 12:05 PM, Jonathon Duerig wrote:

This is a known issue. The client-side on the shared nodes does not automatically 
restart stopped vms. When rebooting nodes, you will need to use the AM or CM 
interfaces. 

On Tue, 15 Jan 2013, Luisa Nevers wrote:

Could you please elaborate on what you mean by "use the AM or CM interfaces"?

On 1/15/13 1:28 PM, Jonathon Duerig wrote:

When you shut down a vm from within the node, there is nothing in the root 
context that will restart it. There doesn't seem to be an easy way to distinguish 
between when a 'shutdown -r' and a 'shutdown' command was issued inside the vm.

Solving this will require writing a new daemon that monitors vm state, knows which
nodes should be booted, and restarts them if they shut down. Thus far, this hasn't 
been a high priority.
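The core of such a daemon might look like the sketch below. No such daemon exists in the source, so everything here is an assumption: `reconcile`, the `"running"` state string, and the `start_vm` callback are hypothetical names. The sketch sidesteps the 'shutdown -r' versus 'shutdown' ambiguity by consulting a record of which vms should be up, which also means a vm halted on purpose from inside would be restarted unless that record is updated.

```python
from typing import Callable, Dict, Iterable, List


def reconcile(should_be_running: Iterable[str],
              observed_state: Dict[str, str],
              start_vm: Callable[[str], None]) -> List[str]:
    """Start every vm that should be running but is not.

    should_be_running: vm names the manager expects to be booted (assumed input).
    observed_state:    vm name -> current state string (assumed format).
    start_vm:          callback that boots one vm (hypothetical).
    Returns the names of the vms that were started, in order.
    """
    restarted = []
    for vm in should_be_running:
        # A vm missing from observed_state is treated as not running.
        if observed_state.get(vm) != "running":
            start_vm(vm)
            restarted.append(vm)
    return restarted
```

A real daemon would call something like this periodically from the root context.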

However, you can still start/restart nodes by talking to the aggregate manager 
directly. The CMv2 interface has commands for booting/rebooting nodes.

So does the AMv3 interface. Although there are some calls in AMv3 that I know 
don't work properly at the moment, changing the operational state does work, AFAIK. To
restart a node (or start one after having shut it down from inside), you need to contact
the aggregate manager and send it a geni_restart (or geni_start) command. 
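As a rough illustration of the call described above, the sketch below sends an AM API v3 PerformOperationalAction request over XML-RPC with a client certificate. The helper names, the aggregate URL, and the certificate paths are hypothetical and not taken from this ticket; credentials are assumed to be the v3 structs (`geni_type`, `geni_version`, `geni_value`). An aggregate whose server certificate is not signed by a trusted CA would need extra SSL configuration that is not shown.

```python
import ssl
import xmlrpc.client

# The three standard AM API v3 operational actions.
ACTIONS = ("geni_start", "geni_stop", "geni_restart")


def build_poa_args(urns, credentials, action, options=None):
    """Build the positional arguments for PerformOperationalAction."""
    if action not in ACTIONS:
        raise ValueError("unsupported action: %r" % (action,))
    return (list(urns), list(credentials), action, dict(options or {}))


def perform_operational_action(am_url, cert_file, key_file,
                               urns, credentials, action):
    """Send the action to the aggregate manager and return its reply struct."""
    ctx = ssl.create_default_context()
    ctx.load_cert_chain(cert_file, key_file)  # experimenter cert and key
    proxy = xmlrpc.client.ServerProxy(am_url, context=ctx)
    return proxy.PerformOperationalAction(
        *build_poa_args(urns, credentials, action))
```

For a node that was shut down from inside, the action would be `geni_start`; for a running node, `geni_restart`.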

comment:2 Changed 11 years ago by lnevers@bbn.com

Priority: major → minor
Resolution: fixed
Status: closed → reopened

Did not intend to close ticket. Re-opening and lowering priority to minor to track resolution described by Jon.
