
dbtree: make wasat a working backend and become active-active
Open, Stalled, Medium, Public

Description

terbium is a maintenance script server and a backend for dbtree.

wasat is meant to be the codfw equivalent of terbium, but it is not yet a working backend for dbtree.

It was removed as a backend in T162976#3186233 (https://gerrit.wikimedia.org/r/#/c/348456/1/hieradata/role/common/cache/misc.yaml),

after having been added when dbtree was made active/active (https://gerrit.wikimedia.org/r/#/c/346572/).

Make wasat a working backend, then add it back to the cache configuration linked above once we can.

Event Timeline

Dzahn created this task. Apr 17 2017, 7:56 PM
Restricted Application added a subscriber: Aklapper. Apr 17 2017, 7:56 PM

Not sure whether this should be tagged as Traffic or not; please feel free to remove the tag. It was added automatically because Phabricator copies tags when you create a subtask.

BBlack moved this task from Triage to Caching on the Traffic board. Apr 18 2017, 6:14 PM
BBlack added a subscriber: BBlack.

Yeah, leave the Traffic tag, as we'll basically want to revert https://gerrit.wikimedia.org/r/#/c/348456/ once dbtree is ready for it.

Dzahn added a comment. May 11 2017, 9:19 PM

Status update:

Nowadays, terbium and wasat use the same role and profile in site.pp:

# mediawiki maintenance servers (https://wikitech.wikimedia.org/wiki/Terbium)
node 'terbium.eqiad.wmnet', 'wasat.codfw.wmnet' {
    role(mediawiki_maintenance)
}

So both get the dbtree Apache site installed, and both have /etc/apache2/sites-enabled/50-dbtree-wikimedia-org.conf.

Yet only terbium actually serves it:

[bast1001:~] $ curl -v --silent -H "Host:dbtree.wikimedia.org" terbium.eqiad.wmnet 2>&1| grep "<title>"
    <title>Core Databases</title>
[bast1001:~] $ curl -v --silent -H "Host:dbtree.wikimedia.org" wasat.codfw.wmnet 2>&1| grep "<title>"
[bast1001:~] $

This is what needs to be debugged next.
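A minimal sketch for repeating the comparison above against any number of backends, assuming the dbtree page still carries the `<title>Core Databases</title>` seen in the terbium output. The script and its `has_dbtree_title` / `check_backend` helpers are hypothetical, not existing tooling:

```shell
#!/bin/bash
# Sketch: report which hosts answer correctly for the dbtree vhost.
# Assumption: a working dbtree page contains "<title>Core Databases</title>".

# Succeeds if the HTML read from stdin contains the dbtree page title.
has_dbtree_title() {
    grep -q '<title>Core Databases</title>'
}

# Fetch the dbtree vhost from one backend and print OK or FAIL for it.
check_backend() {
    local host="$1"
    if curl --silent --max-time 10 -H 'Host: dbtree.wikimedia.org' "http://${host}/" | has_dbtree_title; then
        echo "OK   ${host}"
    else
        echo "FAIL ${host}"
        return 1
    fi
}

# Hosts come from the command line; with no arguments nothing is queried.
for host in "$@"; do
    check_backend "$host"
done
```

For example, `bash check-dbtree.sh terbium.eqiad.wmnet wasat.codfw.wmnet` would print one OK/FAIL line per host, matching the manual curl comparison.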

Dzahn added a comment. (Edited) May 11 2017, 9:22 PM

Reason: the database connection to tendril on tendril-backend.eqiad.wmnet failed.

^ It should probably not connect to a host in .eqiad; it needs "tendril-backend.codfw.wmnet", right, @jynus?
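A minimal sketch of the idea in the comment above: derive the datacenter from the host's own FQDN instead of hard-coding eqiad. The `tendril-backend.<dc>.wmnet` pattern for codfw is an assumption taken from the comment (no codfw name of that form is confirmed to exist), the helper names are hypothetical, and this task does not show where dbtree actually reads its database host from:

```shell
#!/bin/bash
# Sketch only: build a tendril backend name in the host's own datacenter.
# Hypothetical: "tendril-backend.<dc>.wmnet" for any dc other than eqiad.

# Extract the datacenter label from an FQDN such as wasat.codfw.wmnet.
dc_from_fqdn() {
    local fqdn="$1"
    local rest="${fqdn#*.}"   # drop the short hostname -> codfw.wmnet
    echo "${rest%%.*}"        # keep the first remaining label -> codfw
}

# Build the (hypothetical) local tendril backend hostname for a given FQDN.
tendril_backend_for() {
    echo "tendril-backend.$(dc_from_fqdn "$1").wmnet"
}
```

On wasat, `tendril_backend_for "$(hostname -f)"` would be expected to yield `tendril-backend.codfw.wmnet`, assuming `hostname -f` returns `wasat.codfw.wmnet`.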

MoritzMuehlenhoff triaged this task as Medium priority. May 12 2017, 12:01 PM
Dzahn claimed this task. May 12 2017, 7:59 PM

Mentioned in SAL (#wikimedia-operations) [2017-05-18T20:47:03Z] <mutante> wasat - git pull - bring to latest, the last change had never been deployed here like on terbium, but it's also not a backend for dbtree yet (T163141)

We should not enable active-active on dbtree (or enable it in a failing state, as is currently the case). The dbtree database backend is db1011, which exists only in eqiad. The plan this year is to set up a second node in codfw, so that queries could stay within each datacenter.

Until then, enabling it would mean cross-DC traffic, which we should not allow for privacy reasons.

Once a backend exists locally in both datacenters, we can make dbmonitor2 and wasat active-active without privacy or performance concerns. Alternatively, if we are in a hurry (I don't think we are), we can enable TLS for the cross-DC connection (which we should probably do anyway).
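The rule described above can be sketched as a pooling check: a host may serve dbtree only if a tendril database backend exists in its own datacenter, so no queries cross DCs. The function name and the convention of passing backend datacenters as arguments are hypothetical illustrations, not part of the actual puppet or cache configuration:

```shell
#!/bin/bash
# Sketch: decide whether a host may be pooled as a dbtree backend.
# Usage: can_pool_dbtree <host_dc> <dc with a tendril backend>...
# Succeeds only if one of the backend datacenters equals the host's own,
# i.e. serving dbtree from this host would not require cross-DC queries.
can_pool_dbtree() {
    local host_dc="$1"
    shift
    local backend_dc
    for backend_dc in "$@"; do
        [ "$backend_dc" = "$host_dc" ] && return 0
    done
    return 1
}
```

With only db1011 in eqiad, `can_pool_dbtree codfw eqiad` fails, so wasat would stay out of the cache backends; once a codfw node exists, `can_pool_dbtree codfw eqiad codfw` succeeds.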

1978Gage2001 moved this task from Triage to In progress on the DBA board. Dec 11 2017, 9:45 AM
Marostegui moved this task from In progress to Triage on the DBA board. Dec 11 2017, 11:06 AM
jcrespo changed the task status from Open to Stalled. Jun 15 2018, 5:57 PM

This is stalled because tendril cannot work with multiple database backends. Supporting it would require setting up a different backend, which we wanted to do anyway (even if we reuse much of the existing schema), but that is not trivial.
Once replication is working, we could make it active-active, since it is read-only functionality.

jcrespo moved this task from Triage to Backlog on the DBA board. Jun 15 2018, 5:58 PM
Krinkle updated the task description. Jun 15 2018, 6:50 PM
Krinkle added a project: Availability.