
Migrate the stat* mount from dataset1001 to labstore1006/7
Closed, Resolved (Public)

Description

Task for the actual migration. Needs coordination with Analytics/Ezachte.

Event Timeline

madhuvishy triaged this task as Medium priority. (Mar 1 2018, 6:07 PM)
madhuvishy created this task.

Change 420083 had a related patch set uploaded (by Madhuvishy; owner: Madhuvishy):
[operations/puppet@production] [WIP] statistics: Migrate dumps mount to labstore1006|7

https://gerrit.wikimedia.org/r/420083

Change 422892 had a related patch set uploaded (by Madhuvishy; owner: Madhuvishy):
[operations/puppet@production] statistics: Absent existing dumps mount at /mnt/data

https://gerrit.wikimedia.org/r/422892

Change 422896 had a related patch set uploaded (by Madhuvishy; owner: Madhuvishy):
[operations/puppet@production] statistics: Symlink /mnt/data to nfs share from active server

https://gerrit.wikimedia.org/r/422896

Change 420083 merged by Madhuvishy:
[operations/puppet@production] statistics: Mount dumps share from labstore1006|7 on stat1005|6

https://gerrit.wikimedia.org/r/420083

Change 423515 had a related patch set uploaded (by Madhuvishy; owner: Madhuvishy):
[operations/puppet@production] statistics: Add missed README for dumps nfs mount

https://gerrit.wikimedia.org/r/423515

Change 423515 merged by Madhuvishy:
[operations/puppet@production] statistics: Add missed README for dumps nfs mount

https://gerrit.wikimedia.org/r/423515

Change 422892 merged by Madhuvishy:
[operations/puppet@production] statistics: Absent existing dumps mount at /mnt/data

https://gerrit.wikimedia.org/r/422892

Change 422896 merged by Madhuvishy:
[operations/puppet@production] statistics: Symlink /mnt/data to nfs share from active server

https://gerrit.wikimedia.org/r/422896
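
For illustration only: the change above is applied through Puppet, but a manual equivalent of pointing /mnt/data at the share from the active server could look like the sketch below. The function name point_link_at is hypothetical, and which labstore host is active is an assumption the caller has to supply.

```shell
#!/bin/sh
# Sketch: repoint a symlink at the NFS share served by the active server.
# Assumptions (not from the task): the helper name, and that the caller
# already knows which share (e.g. /mnt/nfs/labstore1006-dumps) is active.

# point_link_at TARGET LINK
# Creates or replaces the symlink LINK so that it points at TARGET.
# Fails without touching LINK if TARGET is not an existing directory.
# Note: if LINK is a real directory rather than a symlink, ln would create
# the link inside it, so the old mount point has to be removed first.
point_link_at() {
    target=$1
    link=$2
    [ -d "$target" ] || return 1
    # -s symbolic, -f replace an existing link, -n do not follow LINK
    ln -sfn -- "$target" "$link"
}

# Example (not executed here):
#   point_link_at /mnt/nfs/labstore1006-dumps /mnt/data
```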

Notes from migration plan:

  • Announcements
    • [DONE] Last minute analytics mailing list update
  • Silence monitoring -- need to check if there's anything set up on stat*
    • [DONE] Downtimed puppet run alerts for stat1005|6
  • [DONE] Mount NFS shares from labstore1006 & 7 on stat1005|6 at /mnt/nfs/labstore1006-dumps & /mnt/nfs/labstore1007-dumps
  • [NONE RUNNING] Kill any processes actively accessing /mnt/data
    • find with lsof +f -- /mnt/data
  • [DONE] Absent NFS mount at /mnt/data (served from dataset1001)
    • Disable puppet on stat1005|6
    • Merge puppet patch - https://gerrit.wikimedia.org/r/#/c/422892/
    • Apply patch on stat* and test
      • Puppet refused to remove the directory /mnt/data without --force, so it was removed manually -- madhuvishy@stat1005:/mnt$ sudo rm -r data
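
The "check for processes, then remove" steps above could be wrapped roughly as follows. This is a sketch, not what was run during the migration: the function names are hypothetical, and it assumes lsof is installed and exits non-zero when it finds no open files.

```shell
#!/bin/sh
# Sketch: only treat a mount point as removable when nothing has it open.
# Caveat: lsof also exits non-zero on errors, so a failure of lsof itself
# is indistinguishable here from "nothing open". Check its output by hand
# before removing anything important.

# is_in_use PATH: succeeds if lsof reports open files on the file system at PATH.
is_in_use() {
    lsof +f -- "$1" >/dev/null 2>&1
}

# safe_to_remove PATH
# Returns 0 if PATH exists and lsof reports nothing open on it,
#         1 if something still has files open there,
#         2 if PATH does not exist.
safe_to_remove() {
    [ -e "$1" ] || return 2
    if is_in_use "$1"; then
        return 1
    fi
    return 0
}

# Example (not executed here):
#   safe_to_remove /mnt/data && echo "no open files on /mnt/data"
```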

Success Criteria:

  • [SUCCESS] stat1005 & 6 can successfully read from /mnt/data
    • head -n 1 /mnt/data/xmldatadumps/public/liwiki/latest/liwiki-latest-pages-articles.xml.bz2-rss.xml
  • Labstore1007 - load on the server is normal (monitor for at least a couple of hours) -- All good right now, though not much is going on
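
The read check above can be repeated on each stat host with a small helper. A minimal sketch: check_readable is a hypothetical name, and it only confirms that the first line of a file can be read and is non-empty, nothing more.

```shell
#!/bin/sh
# Sketch: confirm that a file under the new mount can actually be read.

# check_readable FILE
# Succeeds if `head -n 1` can read FILE and the first line is non-empty.
check_readable() {
    first_line=$(head -n 1 -- "$1" 2>/dev/null) || return 1
    [ -n "$first_line" ]
}

# Example (not executed here), using the file from the success criteria:
#   check_readable /mnt/data/xmldatadumps/public/liwiki/latest/liwiki-latest-pages-articles.xml.bz2-rss.xml \
#       && echo "read OK"
```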

Post (if success):

Rollback

  • Kill any processes actively accessing /mnt/data
    • find with lsof +f -- /mnt/data
  • [POSSIBLY UNDO] Mount NFS shares from labstore1006 & 7 on stat1005|6 at /mnt/nfs/labstore1006-dumps & /mnt/nfs/labstore1007-dumps

This went well. Clean up task pending: Remove the dumps NFS export from dataset1001

Change 423733 had a related patch set uploaded (by Madhuvishy; owner: Madhuvishy):
[operations/puppet@production] dumps: Remove stat1005|6 from nfs clients for dataset1001

https://gerrit.wikimedia.org/r/423733

Patch up - https://gerrit.wikimedia.org/r/423733

Change 423733 merged by Madhuvishy:
[operations/puppet@production] dumps: Remove stat1005|6 from nfs clients for dataset1001

https://gerrit.wikimedia.org/r/423733

madhuvishy closed this task as Resolved. (Apr 11 2018, 6:07 PM)
bd808 moved this task from Inbox to Done on the cloud-services-team (Kanban) board. (May 6 2018, 6:48 PM)