Cluster reboot

This morning I asked colleagues to move their data off /nfs2 onto other disks with redundancy protection, and that same afternoon the host “nfs2.aqua-core.tll” crashed. I suspect the cause was the large volume of data being migrated over NFS. NFS on OS X has always given us problems.

Anyway, I have recorded here every step I took to “restart” the cluster remotely, without resorting to pushing the power button on every machine. I also finally decided to remove my RemoteMount startup script from each host and to mount all the NFS shares only after the entire cluster is fully up. The current situation is:

  • nfs.aqua-core.tll hangs and reboots itself.
  • All NFS clients of nfs2 hang, and my previous workaround of applying “unmount” at the program level does not help.
  • Any access to /nfs2 hangs and cannot be interrupted.

Below is what I did in this situation (all shell sessions run under the adm_lsf account):

  1. Log in to nfs2 and check the file systems and the host with df and uptime
  2. Since most of the machines need to be rebooted, reboot the LSF head first
  3. Log in to libra and check whether there are any existing connections
  4. Arrange for clients to disconnect from libra
  5. Issue sudo reboot on libra
  6. Log in to libra again and check LSF with bqueues and bhosts
  7. Restart the nfs1 server with dsh -w nfs1 "sudo reboot"
  8. Once nfs1 is up, check it
  9. On libra, issue dsh -N dmz "sudo reboot" to reboot all the servers in DMZ
  10. On libra, issue dsh -N farm "sudo reboot" to reboot all the LSF nodes
  11. On libra, run bin/nfs_mounts to mount all the NFS shares for the cluster
  12. On aries, start the MySQL server with "cd /opt/mysql4; sudo mysqld_safe --user=mysql&"
  13. On taurus, start the optional web servers in /opt/ensembl15 and “/opt/vega”
  14. On taurus, start proftpd with ~adm_taur/bin/start_ftp
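
Steps 7 to 11 all run from libra, so they could be collected into one script. The following is only a rough sketch, not something I actually ran: the helpers run, wait_for_host and reboot_cluster and the DRY_RUN switch are made-up names, the 60-second pause is an arbitrary guess, and only the dsh and bin/nfs_mounts commands come from the list above. By default it just prints the commands.

```shell
#!/bin/sh
# Sketch of steps 7-11, meant to be run on libra as adm_lsf.
# Hypothetical helpers: run, wait_for_host, reboot_cluster, DRY_RUN.
# Only the dsh and bin/nfs_mounts commands are taken from the steps above.

DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = really execute them

run() {
    # Print the command (quoting is not preserved in the printed form);
    # execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

wait_for_host() {
    # Poll with ping until the host answers, at most $2 tries (default 60),
    # sleeping 5 seconds between tries.
    host=$1
    tries=${2:-60}
    while [ "$tries" -gt 0 ]; do
        if ping -c 1 "$host" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 5
    done
    echo "gave up waiting for $host" >&2
    return 1
}

reboot_cluster() {
    run dsh -w nfs1 "sudo reboot"            # step 7
    if [ "$DRY_RUN" = "0" ]; then
        sleep 60                             # arbitrary pause so nfs1 really goes down first
        wait_for_host nfs1 || return 1       # step 8: nfs1 answers ping again
    fi
    run dsh -N dmz "sudo reboot"             # step 9
    run dsh -N farm "sudo reboot"            # step 10
    if [ "$DRY_RUN" = "0" ]; then
        # Mount only after the whole cluster is up; confirmed by hand here.
        printf 'Press Enter once all nodes are back up: '
        read -r answer
    fi
    run bin/nfs_mounts                       # step 11
}
```

After sourcing the file, calling reboot_cluster prints the four commands in order; setting DRY_RUN=0 first would execute them for real. A ping reply only shows that nfs1 is on the network again, so the manual check in step 8 is still needed.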

