
automate data syncing between phabricator servers
Closed, ResolvedPublic

Description

Currently a Phabricator server has 3 rsync modules to sync data, one for each of these paths:

/srv/dump

/srv/homes

/srv/repos

https://gerrit.wikimedia.org/r/c/operations/puppet/+/984811/2/modules/profile/manifests/phabricator/main.pp#600

But we only use rsync::module, not rsync::quickdatacopy or a timer, so the sync is not automated.

This allows manual syncing during server migrations, but nothing syncs continuously in the background.

Should we add timers to sync continuously? If so, should we convert to rsync::quickdatacopy?

Or are we ok with just Bacula for backups?

Event Timeline

Change 988107 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] phabricator: move data syncing related code to separate profile

https://gerrit.wikimedia.org/r/988107

Change 988111 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] phabricator: use quickdatacopy for automatic home dir sync

https://gerrit.wikimedia.org/r/988111

Change 988112 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] phabricator: avoid duplicate list of server names in Hiera

https://gerrit.wikimedia.org/r/988112

Change 988113 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] phabricator: add key for secondary server and create combined list

https://gerrit.wikimedia.org/r/988113

Jelto triaged this task as Medium priority.

Change 988107 merged by Dzahn:

[operations/puppet@production] phabricator: move data syncing related code to separate profile

https://gerrit.wikimedia.org/r/988107

Change 989240 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] phabricator: avoid duplicate list of servers in Hiera

https://gerrit.wikimedia.org/r/989240

Dzahn renamed this task from automate data syncing between phabricator servers? to automate data syncing between phabricator servers. Jan 9 2024, 8:04 PM

Change 988113 abandoned by Dzahn:

[operations/puppet@production] phabricator: add key for secondary server and create combined list

Reason:

replaced by https://gerrit.wikimedia.org/r/c/operations/puppet/+/989240

https://gerrit.wikimedia.org/r/988113

Dzahn changed the task status from Open to In Progress. Jan 9 2024, 9:24 PM
Dzahn raised the priority of this task from Medium to High.

The phab server switchover is going to be scheduled for Jan 20th, and this should be done well before then.

Change 989540 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] phabricator: avoid duplicate lists of servers in migration class

https://gerrit.wikimedia.org/r/989540

Change 989240 merged by Dzahn:

[operations/puppet@production] phabricator: avoid duplicate lists of servers in Hiera

https://gerrit.wikimedia.org/r/989240

Change 989540 merged by Dzahn:

[operations/puppet@production] phabricator: avoid duplicate lists of servers in migration class

https://gerrit.wikimedia.org/r/989540

Change 988112 merged by Dzahn:

[operations/puppet@production] phabricator: avoid duplicate list of servers for dumps

https://gerrit.wikimedia.org/r/988112

Change 988111 merged by Dzahn:

[operations/puppet@production] phabricator: use quickdatacopy for automatic home dir sync

https://gerrit.wikimedia.org/r/988111

Change 990247 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] phabricator: auto-sync /srv/repos between servers

https://gerrit.wikimedia.org/r/990247

Change 990250 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] phabricator: add script/timer to create tarballs of home dirs

https://gerrit.wikimedia.org/r/990250

Change 990247 merged by Dzahn:

[operations/puppet@production] phabricator: auto-sync /srv/repos between servers

https://gerrit.wikimedia.org/r/990247

Mentioned in SAL (#wikimedia-operations) [2024-01-16T18:42:48Z] <mutante> phab2002 - pulling repo data from phab1004 by running sync script created by rsync::quickdatacopy after gerrit:990247 T354221

Change 990250 merged by Dzahn:

[operations/puppet@production] phabricator: add script/timer to create tarballs of home dirs

https://gerrit.wikimedia.org/r/990250

Currently a Phabricator server has 3 rsync modules to sync data, one for each of these paths:

  • /srv/dump - not converted to rsync::quickdatacopy, because this module exists for the dumps servers pulling from us. The config was simplified and the duplicate list of phab servers removed.
  • /srv/homes - converted to rsync::quickdatacopy, using its auto sync feature to create a unit and timer that automate the sync. Tested. The duplicate list of phab servers in Hiera was also removed.
  • /srv/repos - converted to rsync::quickdatacopy, using its auto sync feature to create a unit and timer that automate the sync. Tested. The duplicate list of phab servers in Hiera was also removed.

cc: @eoghan @brennen @Jelto

We now have this new setup for Phabricator servers:

On the active host, rsyncd listens:

[phab1004:~] $ sudo grep -E '(path|hosts)' /etc/rsyncd.conf 
path            = /srv/homes
hosts allow = phab2002.codfw.wmnet localhost
path            = /srv/repos
hosts allow = phab2002.codfw.wmnet localhost
path            = /srv/dumps
hosts allow = phab1004.eqiad.wmnet phab2002.codfw.wmnet clouddumps1001.wikimedia.org clouddumps1002.wikimedia.org localhost

On the passive host, timers wait and pull from the active host:

[phab2002:~] $ sudo systemctl list-units | grep rsync
  rsync-phabricator-repos.service                                                          loaded activating start     start Transfer data periodically between hosts
  rsync.service                                                                            loaded active     running         fast remote file copy program daemon
  rsync-phabricator-home-dirs.timer                                                        loaded active     waiting         Periodic execution of rsync-phabricator-home-dirs.service
  rsync-phabricator-repos.timer                                                            loaded active     running         Periodic execution of rsync-phabricator-repos.service

Syncing the home dirs is a 2-step process.

First, a script creates a tarball of each directory under /home and writes it to /srv/homes. It runs only on the active server, as a service and timer called backup-home-dirs.service.

[phab1004:~] $ grep Exec /lib/systemd/system/backup-home-dirs.service 
ExecStart=/usr/local/bin/backup-home-dirs

[phab1004:~] $ cat /usr/local/bin/backup-home-dirs 
#!/bin/bash
for user in $(ls /home); do
tar czfv /srv/homes/${user}-$(hostname -s).tar.gz /home/${user}; done
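
A more defensive variant of the script above could look like the following. This is a minimal sketch, not the deployed code: it globs directories instead of parsing `ls` output, so stray files under the home root and names with unusual characters do not break the loop. The function name `backup_home_dirs` and its parameters are hypothetical, the verbose flag is dropped, and unlike the original it stores paths relative to the home root (via `tar -C`).

```shell
#!/bin/bash
# Sketch only: backup_home_dirs, home_root, dest_dir and host are
# hypothetical names, not part of the deployed script.

# Usage: backup_home_dirs HOME_ROOT DEST_DIR [HOST]
# Writes one DEST_DIR/<user>-<host>.tar.gz per directory under HOME_ROOT.
backup_home_dirs() {
    local home_root="$1"
    local dest_dir="$2"
    # Default to the short hostname, as the original script does.
    local host="${3:-$(hostname -s)}"
    local user_dir user

    for user_dir in "${home_root}"/*/; do
        # An unmatched glob stays literal; skip anything that is not a directory.
        [ -d "${user_dir}" ] || continue
        user="$(basename "${user_dir}")"
        # -C makes the archive paths relative to the home root.
        tar -czf "${dest_dir}/${user}-${host}.tar.gz" -C "${home_root}" "${user}" || return 1
    done
}
```

On the active server this would be called as `backup_home_dirs /home /srv/homes`.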

Then the rsync-phabricator-home-dirs timer/service on the passive server pulls /srv/homes from the active server.

[phab2002:~] $ cat /usr/local/sbin/sync-phabricator-home-dirs 
#!/bin/sh
/usr/bin/rsync  --delete -a    rsync://phab1004.eqiad.wmnet/phabricator-home-dirs /srv/homes/

We also now have only 1 central place where Phabricator servers are defined, instead of multiple lists:

~/repos/puppet$ grep phabricator_ hieradata/common.yaml 

phabricator_active_server: phab1004.eqiad.wmnet
phabricator_passive_server: phab2002.codfw.wmnet

Lists such as "all phab servers", and the choice of rsync source and destination, are now built from these two values alone.

At the same time, "active_server" still decides where the phd service runs, which firewall rules are applied, which MySQL host is used, etc., as it did before.
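
The idea of deriving everything from the two Hiera values can be sketched in shell. This is an illustration only: the real logic lives in the Puppet code, and the function names `all_phab_servers` and `rsync_role` are hypothetical.

```shell
#!/bin/bash
# Illustration only; the actual derivation is done in Puppet.
# The two values mirror hieradata/common.yaml.
active_server="phab1004.eqiad.wmnet"
passive_server="phab2002.codfw.wmnet"

# Print the "all phab servers" list, one host per line.
all_phab_servers() {
    printf '%s\n' "$1" "$2"
}

# Print the rsync role of HOST: the active server is the rsync source
# (rsyncd listens there), the passive server is the destination (it pulls).
# Usage: rsync_role HOST ACTIVE PASSIVE
rsync_role() {
    local host="$1" active="$2" passive="$3"
    case "${host}" in
        "${active}")  echo "source" ;;
        "${passive}") echo "destination" ;;
        *)            echo "none" ;;
    esac
}
```

Swapping the two values would flip both roles at once, which is the point of keeping a single definition.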

Change 991651 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] dumps: replace hardcoded phab server name with a lookup

https://gerrit.wikimedia.org/r/991651

Change 991651 merged by Dzahn:

[operations/puppet@production] dumps: replace hardcoded phab server name with a lookup

https://gerrit.wikimedia.org/r/991651