noc.wm.o/db.php: remove hosts information, or fetch it from etcd somehow
Closed, Resolved, Public

Description

Now that dbctl is fully deployed, and sectionLoads/groupLoadsBySection information canonically lives there, it's time to remove those keys from the traditional location of db-$DC.php, as they are no longer being maintained by DBAs, and the information there will quickly grow stale and inaccurate.

However, this means that we'll either have to accept that https://noc.wikimedia.org/db.php will no longer display which hosts have what loads in each section, or find a way for db.php to read the data from etcd.

First question is: does db.php have known users that require this information?

Event Timeline

Unless you, a DBA, or someone else in SRE answers yes, I don't think it should matter.

I do look at it from time to time as a shortcut for looking at the PHP files directly. Viewing the PHP files is about as easy for the relevant audience as viewing this page. However, now that the data is no longer in the PHP files either, that potentially changes things. For one, I genuinely don't know offhand how to see this information now. So if anything, I'd recommend placing a link (for at least a few months) to where the data can be viewed now, or an explanation of how any developer can query it.

I do think the rest of the page (which this task is not about) is useful to others, especially the API links that show replag information from MediaWiki's perspective, which can help when investigating an active incident. So I'd suggest keeping at least that part of it.

OK, thanks! This makes sense and I agree.

Right now there's no easy way to view this data without shell access to e.g. cumin1001 (and once there, invoking dbctl config get). That should be fixed; one likely mechanism is reusing https://config-master.wikimedia.org/

When such a thing exists, I will certainly add comments to db-{eqiad,codfw}.php referencing the new location, and modify noc.wm.o/db.php to link there too.
I should be able to get to this next week.

+1 on keeping the rest of the page. FWIW the replag info exported by the MW API is still accurate, of course, and can be used to determine the set of hosts involved and the master (but not the weights).
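A minimal sketch of pulling that replag data from the MediaWiki Action API with `action=query&meta=siteinfo&siprop=dbrepllag&sishowalldb=1`, which lists every DB host the wiki's load balancer knows about along with its lag. The wiki URL (enwiki, an s1 wiki), the 5-second threshold, and the function names are illustrative assumptions; as noted above, this gives hosts and lag but not weights.

```python
import json
import urllib.parse
import urllib.request

# Assumption: any wiki in the section of interest works; enwiki lives in s1.
API = "https://en.wikipedia.org/w/api.php"

def repllag_url(api: str = API) -> str:
    """Build a siteinfo query that reports replication lag for every DB host."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "dbrepllag",
        "sishowalldb": "1",  # list all hosts, not only the most-lagged one
        "format": "json",
        "formatversion": "2",
    }
    return api + "?" + urllib.parse.urlencode(params)

def parse_repllag(payload: str, threshold: float = 5.0):
    """Return (hosts in API order, hosts lagging more than `threshold` seconds)."""
    entries = json.loads(payload)["query"]["dbrepllag"]
    hosts = [e["host"] for e in entries]
    lagging = [e["host"] for e in entries if e["lag"] > threshold]
    return hosts, lagging

def fetch_repllag(api: str = API, threshold: float = 5.0):
    """Network call: fetch and parse the live replag report."""
    req = urllib.request.Request(
        repllag_url(api), headers={"User-Agent": "repllag-sketch/0.1"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return parse_repllag(resp.read().decode("utf-8"), threshold)
```

Keeping parsing separate from fetching means the same function can be reused on a response saved from a browser during an incident.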

From the DBAs' point of view, we do use db-eqiad.php (or db.php) to quickly check from a browser what is and isn't pooled, as we already discussed during the offsite.
Personally, I don't mind whether it is db-eqiad.php, db.php or something else, but I think having a way to check this without using the CLI is very useful, almost a must.
Another use case is outside office hours or on weekends: if something pages, we can quickly check a host's section, group, etc. from a phone, using just the browser.

CDanis moved this task from Backlog to in progress on the conftool board.

Change 528527 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/puppet@production] noc: fetch dbconfig from etcd to local disk

https://gerrit.wikimedia.org/r/528527

Change 528527 merged by CDanis:
[operations/puppet@production] noc: fetch dbconfig from etcd to local disk

https://gerrit.wikimedia.org/r/528527
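The patch title only says the config is fetched from etcd to local disk; how the job writes the file is not described here. One common pattern for such a mirror, sketched below under that assumption, is to write to a temporary file in the same directory and atomically rename it over the target, so a reader such as db.php never sees a half-written file. The function name, path, and config contents are hypothetical, and the etcd fetch itself is left out.

```python
import json
import os
import tempfile

def write_atomically(path: str, config: dict) -> None:
    """Write `config` as JSON to `path`; readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem for os.replace to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".dbconfig-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(config, tmp, indent=2, sort_keys=True)
            tmp.flush()
            os.fsync(tmp.fileno())  # ensure the bytes are on disk before renaming
        # mkstemp creates 0600 files; relax so a web server can read the mirror.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)  # atomic rename over the previous copy
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A per-minute job would then call something like `write_atomically("/srv/dbconfig/eqiad.json", config)` (hypothetical path) after each successful fetch, leaving the previous copy in place if a fetch fails.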

Change 528829 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/puppet@production] noc fetch dbconfig: fix logging snafu

https://gerrit.wikimedia.org/r/528829

Change 528829 merged by CDanis:
[operations/puppet@production] noc fetch dbconfig: fix logging snafu

https://gerrit.wikimedia.org/r/528829

@Marostegui as of now, there are https://noc.wikimedia.org/dbconfig/eqiad.json and https://noc.wikimedia.org/dbconfig/codfw.json, which are updated every minute.

I'll work on getting db.php to display the information nicely as well.
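As a sketch of consuming these files without shell access, the code below downloads one and answers "which sections and groups is this host in, and with what weight?". It assumes the JSON has top-level `sectionLoads` and `groupLoadsBySection` keys (the names from the task description); because the exact layout is not stated here, each load entry is accepted either as a single `{host: load}` mapping or as a list of such mappings. Function names and sample host names are hypothetical.

```python
import json
import urllib.request

NOC = "https://noc.wikimedia.org/dbconfig/{dc}.json"

def merge_loads(value) -> dict:
    """Flatten a {host: load} mapping, or a list of such mappings, into one dict."""
    if isinstance(value, dict):
        return dict(value)
    merged = {}
    for part in value:
        merged.update(part)
    return merged

def find_host(config: dict, host: str) -> dict:
    """Return {"sections": {section: load}, "groups": {(section, group): load}}."""
    sections = {}
    for section, loads in config.get("sectionLoads", {}).items():
        flat = merge_loads(loads)
        if host in flat:
            sections[section] = flat[host]
    groups = {}
    for section, by_group in config.get("groupLoadsBySection", {}).items():
        for group, loads in by_group.items():
            flat = merge_loads(loads)
            if host in flat:
                groups[(section, group)] = flat[host]
    return {"sections": sections, "groups": groups}

def fetch_config(dc: str = "eqiad") -> dict:
    """Network call: download the dbctl JSON for one datacenter."""
    with urllib.request.urlopen(NOC.format(dc=dc), timeout=10) as resp:
        return json.load(resp)
```

For example, `find_host(fetch_config("eqiad"), "db-example")` would report an empty result for a host that is not pooled anywhere.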

Change 528938 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/mediawiki-config@master] noc: read dbctl JSON from local disk mirror of etcd

https://gerrit.wikimedia.org/r/528938
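Reading from a local mirror instead of querying etcd directly means the page could silently show old data if the fetch job stops. A minimal sketch of a reader that surfaces this, assuming the mirror is a JSON file refreshed roughly every minute: it returns the parsed config, the file's age, and a staleness flag. It is written in Python rather than the PHP of db.php, and the path and 5-minute threshold are hypothetical.

```python
import json
import os
import time
from typing import Optional, Tuple

# Hypothetical threshold: roughly five missed one-minute refreshes.
STALE_AFTER_SECONDS = 300

def load_mirror(path: str, now: Optional[float] = None) -> Tuple[dict, float, bool]:
    """Return (config, age_seconds, is_stale) for a locally mirrored JSON file."""
    if now is None:
        now = time.time()
    # The file's mtime records when the fetch job last replaced it.
    age = now - os.path.getmtime(path)
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    return config, age, age > STALE_AFTER_SECONDS
```

A page built on this could, for instance, print a warning banner whenever the staleness flag is true rather than presenting old weights as current.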

Nice!
This is already helpful! :)
Thank you!

Change 528938 merged by jenkins-bot:
[operations/mediawiki-config@master] noc: read dbctl JSON from local disk mirror of etcd

https://gerrit.wikimedia.org/r/528938

Mentioned in SAL (#wikimedia-operations) [2019-08-20T19:00:29Z] <cdanis@deploy1001> Synchronized docroot/noc/db.php: 80a6743dd noc: read dbctl JSON T229631 (duration: 00m 58s)

https://noc.wikimedia.org/db.php will now stay up-to-date with dbctl changes.

Still need to clean up the old, stale data remaining in db-eqiad.php and db-codfw.php.