
dbtree: make wasat a working backend and become active-active
Closed, Declined (Public)

Description

terbium is a maintenance script server and a backend for dbtree.

wasat is supposed to be the codfw equivalent of terbium, but it is not yet a working backend for dbtree.

It was removed as a backend in T162976#3186233 (https://gerrit.wikimedia.org/r/#/c/348456/1/hieradata/role/common/cache/misc.yaml), after having been added when dbtree was made active/active in https://gerrit.wikimedia.org/r/#/c/346572/.

Make wasat a working backend and add it back to the cache config linked above once we can.

Event Timeline

Not sure if this should be tagged as traffic or not; please feel free to remove it. It was auto-added because Phabricator copies tags when you create something as a subtask.

BBlack subscribed.

Yeah, leave the traffic tag as we'll want to basically revert https://gerrit.wikimedia.org/r/#/c/348456/ once dbtree is ready for it.

status update:

Nowadays terbium and wasat use the same role and profile in site.pp:

# mediawiki maintenance servers (https://wikitech.wikimedia.org/wiki/Terbium)
node 'terbium.eqiad.wmnet', 'wasat.codfw.wmnet' {
    role(mediawiki_maintenance)

So both are getting the dbtree Apache site installed. Both have /etc/apache2/sites-enabled/50-dbtree-wikimedia-org.conf.

Yet only terbium actually works:

[bast1001:~] $ curl -v --silent -H "Host:dbtree.wikimedia.org" terbium.eqiad.wmnet 2>&1| grep "<title>"
    <title>Core Databases</title>
[bast1001:~] $ curl -v --silent -H "Host:dbtree.wikimedia.org" wasat.codfw.wmnet 2>&1| grep "<title>"
[bast1001:~] $
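The manual curl comparison above can be scripted so that both backends are checked the same way. This is a minimal sketch in Python, assuming the backends answer plain HTTP on port 80 and that a working dbtree page contains `<title>Core Databases</title>` (as terbium's output shows). The function names and the `BACKENDS` list are hypothetical and are not part of dbtree or Puppet.

```python
import re
import urllib.request
from typing import Dict, List, Optional

# Title seen in terbium's working response above (assumed stable).
EXPECTED_TITLE = "Core Databases"
BACKENDS = ["terbium.eqiad.wmnet", "wasat.codfw.wmnet"]


def extract_title(html: str) -> Optional[str]:
    """Return the text of the first <title> element, or None if absent."""
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None


def backend_ok(html: str) -> bool:
    """Treat a backend as working only if it renders the expected title."""
    return extract_title(html) == EXPECTED_TITLE


def fetch_dbtree(backend: str, timeout: float = 10.0) -> str:
    """Fetch the page from one backend, overriding the Host header the way
    `curl -H "Host:dbtree.wikimedia.org"` does."""
    req = urllib.request.Request(
        f"http://{backend}/", headers={"Host": "dbtree.wikimedia.org"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def check_backends(backends: List[str]) -> Dict[str, bool]:
    """Map each backend to True/False; network errors count as failures."""
    results = {}
    for host in backends:
        try:
            results[host] = backend_ok(fetch_dbtree(host))
        except OSError:  # URLError, HTTPError and timeouts are OSErrors
            results[host] = False
    return results
```

Run from a bastion, `check_backends(BACKENDS)` would, given the curl output above, be expected to report terbium as working and wasat as not.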

This is what needs to be debugged next.

reason: database connection to tendril on tendril-backend.eqiad.wmnet failed

^ It should probably not connect to anything in .eqiad; it needs "tendril-backend.codfw.wmnet", right @jynus?

Mentioned in SAL (#wikimedia-operations) [2017-05-18T20:47:03Z] <mutante> wasat - git pull - bring to latest, the last change had never been deployed here as it had been on terbium, but wasat is also not a backend for dbtree yet (T163141)

We should not enable active-active on dbtree (or enable it in a failing state, as is currently the case). The dbtree database backend is db1011, which exists only in eqiad. This year the plan is to set up a second node in codfw, which would allow local-only queries.

Meanwhile, enabling it would mean cross-DC traffic, which we should not allow for privacy reasons.

Once a database backend exists locally in both datacenters, we can make both dbmonitor2 and wasat active-active (with no privacy or performance concerns). Alternatively, if we are in a hurry (I don't think we are), we could enable TLS, which we should probably do anyway.
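The rule described here (let a datacenter serve dbtree only if its frontend reaches a tendril database in the same datacenter, or reaches a remote one over TLS) can be written as a small eligibility check. This is a minimal sketch in Python. The `Site` structure, its fields, and the `tendril-backend.<dc>.wmnet` naming are hypothetical: the codfw name is only the one suggested earlier in this task, and no codfw database backend exists yet.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Site:
    """Hypothetical description of one dbtree frontend and its DB backend."""
    frontend: str            # e.g. "wasat.codfw.wmnet"
    db_backend: str          # e.g. "tendril-backend.eqiad.wmnet"
    tls_to_db: bool = False  # whether the DB connection is encrypted


def datacenter(fqdn: str) -> str:
    """Extract the datacenter label from an internal .wmnet FQDN,
    e.g. "wasat.codfw.wmnet" -> "codfw"."""
    parts = fqdn.split(".")
    if len(parts) < 3 or parts[-1] != "wmnet":
        raise ValueError(f"not an internal .wmnet hostname: {fqdn}")
    return parts[-2]


def can_pool(site: Site) -> bool:
    """Pool a frontend only if its DB backend is in the same datacenter,
    or, as the fallback mentioned above, if the cross-DC link uses TLS."""
    same_dc = datacenter(site.frontend) == datacenter(site.db_backend)
    return same_dc or site.tls_to_db


def active_frontends(sites: List[Site]) -> List[str]:
    """Frontends allowed to serve traffic; all of them means active-active."""
    return [s.frontend for s in sites if can_pool(s)]
```

With today's setup (both frontends pointing at eqiad), only terbium qualifies; once a codfw backend exists, or TLS is enabled, wasat qualifies too and dbtree becomes active-active.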

jcrespo changed the task status from Open to Stalled. (Jun 15 2018, 5:57 PM)

This is stalled because tendril cannot work with multiple database backends. Supporting that would require setting up a different backend, which we wanted to do anyway (even if we reuse much of the existing schema), but it is not trivial.
Once replication works, we could make it active-active, as it is read-only functionality.

Marostegui subscribed.

Closing this, as we won't really be working on this anymore; the focus is now on deprecating tendril in favour of something else.
Replication wouldn't work anyway: tendril has too many writes per second for the slave in codfw to keep up.
We do have a host in codfw as a backup in case the eqiad one breaks; if that happens, we'd need to manually switch dbmonitor to point to the codfw DB.