Switch phabricator production to codfw
Closed, Invalid (Public)

Description

To finish up with T152129: reinstall iridium (phabricator) as phab1001 with jessie, @Dzahn and I came up with this plan:

  • Switch Phabricator production to phab2001 (in codfw)
    • verify that git-ssh is working on phab2001
    • shut down phd on phab1001
    • manually rsync the git data from phab1001 to phab2001 (this is already rsync'd periodically, just need a one-time refresh)
    • verify that phd works on phab2001
    • switch phabricator_active_server to phab2001 in hiera, role/common/phabricator/main.yaml
    • test phabricator's web interface by locally overriding dns records to point to phab2001
    • update dns / redirect traffic to phab2001
  • Make phab1001 a warm standby for phab2001
    • verify that everything is installed correctly
    • manually rsync the git repositories
    • make sure the rsync cron job is set up correctly to sync from phab2001
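
The "locally overriding dns records" test step in the plan above can also be approximated without touching /etc/hosts: connect to the new frontend's IP directly while presenting the production hostname for TLS SNI and the Host header. A minimal Python sketch follows. The IP, hostname, and port are assumptions (none of them are given in this task), and it assumes the target terminates TLS itself.

```python
import socket
import ssl


def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request carrying the given Host header."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def fetch_status_line(ip: str, host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Connect to `ip`, but present `host` for SNI and in the Host header.

    Returns the first line of the HTTP response (the status line).
    The certificate is validated against `host`, not against `ip`.
    """
    context = ssl.create_default_context()
    with socket.create_connection((ip, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(build_request(host))
            response = b""
            # Read only until the end of the first response line.
            while b"\r\n" not in response:
                chunk = tls.recv(4096)
                if not chunk:
                    break
                response += chunk
    return response.split(b"\r\n", 1)[0].decode("latin-1")
```

Calling something like `fetch_status_line("192.0.2.10", "phabricator.example.org")` (both values are placeholders) would exercise the new site before any DNS change is made.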

Event Timeline

So we need a proxy set up for phab2001-vcs; I'm not sure how to test it currently.

@Dzahn tells me that we can set up temporary DNS records like phabricator-new.wikimedia.org in order to test the varnish config in codfw.

I saw we already have git-ssh in both eqiad and codfw, like so:

git-ssh.codfw.wikimedia.org has address 208.80.153.250
git-ssh.codfw.wikimedia.org has IPv6 address 2620:0:860:ed1a::3:fa

git-ssh.eqiad.wikimedia.org has address 208.80.154.250
git-ssh.eqiad.wikimedia.org has IPv6 address 2620:0:861:ed1a::3:16


git-ssh.wikimedia.org has address 208.80.154.250
git-ssh.wikimedia.org has IPv6 address 2620:0:861:ed1a::3:16

Since git-ssh without a suffix resolves to the same addresses as git-ssh.eqiad, and git-ssh.codfw is not used yet but already exists in wikimedia.org, we can simply use that name and don't need a temporary -new name (which is what we did for Gerrit once).
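
The comparison above can be sketched in Python. The addresses in `RECORDS` are copied from the lookup output quoted in this comment; the function names and the `resolve` helper (which needs network access and is not exercised here) are illustrative assumptions, not existing tooling.

```python
import socket

# Addresses copied from the lookup output quoted above.
RECORDS = {
    "git-ssh.codfw.wikimedia.org": {"208.80.153.250", "2620:0:860:ed1a::3:fa"},
    "git-ssh.eqiad.wikimedia.org": {"208.80.154.250", "2620:0:861:ed1a::3:16"},
    "git-ssh.wikimedia.org": {"208.80.154.250", "2620:0:861:ed1a::3:16"},
}


def matching_sites(records, service_name):
    """Return the other names whose address set equals that of service_name."""
    target = records[service_name]
    return sorted(
        name
        for name, addrs in records.items()
        if name != service_name and addrs == target
    )


def resolve(name, port=22):
    """Look up the live addresses for `name` (requires network access)."""
    infos = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


print(matching_sites(RECORDS, "git-ssh.wikimedia.org"))
```

With the records as quoted, the unsuffixed name matches only the eqiad entry, which is why the codfw name is free to use for testing.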

Change 355869 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] LVS/phabricator: add git-ssh in codfw

https://gerrit.wikimedia.org/r/355869

mmodell changed the task status from Open to Stalled (Aug 28 2017, 7:12 AM).

Change 355869 abandoned by Dzahn:
LVS/phabricator: add git-ssh in codfw

https://gerrit.wikimedia.org/r/355869

Change 389869 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/dns@master] git-ssh.wm.o: reduce to 10m TTL for failover

https://gerrit.wikimedia.org/r/389869

Change 389871 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] LVS/phabricator: add git-ssh in codfw

https://gerrit.wikimedia.org/r/389871

Change 389869 merged by BBlack:
[operations/dns@master] git-ssh.wm.o: reduce to 10m TTL for failover

https://gerrit.wikimedia.org/r/389869

Change 389871 merged by BBlack:
[operations/puppet@production] LVS/phabricator: add git-ssh in codfw

https://gerrit.wikimedia.org/r/389871

Change 389968 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] phab@codfw - add git-ssh public IPs to vcs config

https://gerrit.wikimedia.org/r/389968

Change 389968 merged by BBlack:
[operations/puppet@production] phab@codfw - add git-ssh public IPs to vcs config

https://gerrit.wikimedia.org/r/389968

mmodell lowered the priority of this task from Medium to Low (Nov 27 2017, 5:57 PM).

This is probably going on the back burner while I work on Scap (Tech Debt Sprint FY201718-Q2) for the remainder of the quarter.

Moved to represent that this is still happening but not currently a priority.

Paladox raised the priority of this task from Low to Medium (Mar 23 2018, 8:26 PM).

Changing priority to match the other tasks

@Marostegui: Can you comment on how we should handle cross-dc queries for phabricator? More specifically, will there be problems when we switch phabricator to run on phab2001.codfw.wmnet with the masters running in eqiad? Is this simply transparent to the application, or do we need to make configuration changes to tell phab to use different proxies?

> how we should handle cross-dc queries for phabricator

You don't want to do cross-dc queries in a normal state, only in an emergency. Phabricator does connection pooling, so it would be possible, but you need to configure the client driver to use TLS (something that has been available on misc hosts for a few weeks now).

Other than that, if you want active-passive, the idea is to switch the traffic to the right datacenter (the one active for phabricator) and connect only to the local mysql instance. Make sure only one database is in read-write mode (the "active master"), and make sure replication is flowing in the right direction. How to do that depends on how much you can trust the application: for example, you can have a master-master setup with everything in write mode, in which everything is automatic, but if the application writes to the wrong master, you have a split brain.
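
The "only one database in read-write mode" rule can be expressed as a small pre-switch check. This is a minimal sketch, not existing tooling: it assumes the `read_only` flag of each instance has already been collected into a mapping, and the hostnames in the usage line are hypothetical.

```python
def find_active_master(read_only_by_host):
    """Return the single writable host, or raise if there is not exactly one.

    `read_only_by_host` maps a host name to its read_only flag
    (True means the instance rejects writes).
    """
    writable = sorted(
        host for host, read_only in read_only_by_host.items() if not read_only
    )
    if not writable:
        raise ValueError("no writable master: every instance is read-only")
    if len(writable) > 1:
        # Two writable masters is the split-brain risk described above.
        raise ValueError(f"more than one writable master: {writable}")
    return writable[0]


# Hypothetical host names, for illustration only.
print(find_active_master({"db-eqiad.example": False, "db-codfw.example": True}))
```

Running this before and after a switch would flag the case where both sides accept writes at the same time.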

If you want an active-active scenario, in which you connect to a single database server from both datacenters, the application should be ok with that (multiple app servers, distributed locking, etc.), and you need TLS. There is some discussion to be had, for example, about whether we want to add m3-master to the certificates so that the domain is validated to avoid a man-in-the-middle attack, but that is a conversation we can have when we agree on the details. Talk to me on IRC (if possible, in April) and you can decide what is best once you have the whole picture of possibilities.

One last thing: like gerrit, we are lacking several proxies in codfw right now (we have the eqiad ones only); those should be bought between now and the end of the calendar year. We also have data redundancy but not a 100% redundant database system yet: in this last case, we have only 1 server per misc host, and we would like to have at least 2 per datacenter. We should have the hardware for that already, but it is not yet in a production state.

@mmodell To clarify, this is blocked on a decision about what you want to do architecture-wise, and I think the best way to move forward is for us to meet at some point and discuss options (it is actually very simple but text is too verbose :-))

Dzahn changed the task status from Open to Stalled (Sep 13 2019, 7:11 PM).
LSobanski subscribed.

Removing the DBA tag and subscribing myself instead. Once there are specific actions for DBA please re-add us and/or @mention me.

brennen subscribed.

Closing as 6 years out of date with reality.

Regardless, the plan still seems roughly accurate and we still want to be able to switch DCs if we have to.

Are we saying we never want to do a switch?