Prepare Quarry for multiinstance wiki replicas
Closed, ResolvedPublic

Description

We plan to move the wiki replicas to a multi-instance model. This requires some client-side changes, especially in Quarry, which currently connects to a single database hostname and then lets you switch databases from there.

The requirements here are:

  • Set up a widget that lets you enter or select the database you wish to query.
  • Make whatever backend changes are needed to connect to the right domain name for the database you have selected (e.g. frwiki.db.svc.analytics.eqiad.wmflabs).
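
As a rough illustration of the second requirement, the sketch below derives a per-database hostname from whatever name was entered in the widget. The function name `replica_host`, the stripping of a trailing `_p`, and the default domain (copied from the example hostname above) are assumptions for illustration, not Quarry's actual code.

```python
def replica_host(dbname: str, domain: str = "db.svc.analytics.eqiad.wmflabs") -> str:
    """Return the hostname to connect to for the selected database.

    Hypothetical helper: the default domain is taken from the example
    hostname in the task description and may differ in production.
    """
    name = dbname.strip().lower()
    # Assumption: users may type "frwiki" or "frwiki_p"; the hostname
    # uses the bare wiki name without the "_p" suffix.
    if name.endswith("_p"):
        name = name[:-2]
    if not name:
        raise ValueError("a database name is required")
    return f"{name}.{domain}"


print(replica_host("frwiki_p"))
```

Keeping the domain as a parameter means a test environment could point at a different replica cluster without touching the routing logic.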

Event Timeline

Bstorm moved this task from Backlog to Planning on the Quarry board.
Bstorm moved this task from Inbox to Doing on the cloud-services-team (Kanban) board.

So far, I've discovered that the local Vagrant dev environment has a different DB encoding than the live DB, which is interesting. I believe I need to add a column to the query_revision table in order to record the database a query runs against. Locally, I've got a nice little field for you to type in the DB, but I still need to fix the model and column and then make that value actually drive the DB connection. If it goes well enough, I'll put up a patch.

Change 632804 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[analytics/quarry/web@master] multiinstance: Attempt to make quarry work with multiinstance replicas

https://gerrit.wikimedia.org/r/632804

Minimum working patch up. Hopefully review will improve it 😝

I think that to move this forward, we need to deploy the patch on a testing server and run it in parallel. It might also be smart to branch the repo.

Change 632804 merged by jenkins-bot:
[analytics/quarry/web@master] multiinstance: Attempt to make quarry work with multiinstance replicas

https://gerrit.wikimedia.org/r/632804

Mentioned in SAL (#wikimedia-cloud) [2021-03-23T18:51:51Z] <bstorm> running the multiinstance migration script T264254

Mentioned in SAL (#wikimedia-cloud) [2021-03-23T19:17:11Z] <bstorm> finished updating quarry for multiinstance replicas T264254

@bd808 noticed that querying meta_p is currently broken by the handling of that URL in the backend. That needs patching.
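
A minimal sketch of the kind of special-casing this may need: if the backend builds the hostname from the database name, then databases that are not wikis (meta_p here, and centralauth, which the follow-up patch title also names) need an explicit mapping. The names `SPECIAL_HOSTS` and `host_label`, and the `s7` section value, are assumptions for illustration only; the merged patch may do this differently.

```python
# Hypothetical special cases: databases that are not wikis and so cannot
# be routed by their own name. The "s7" section label is an assumption.
SPECIAL_HOSTS = {
    "meta": "s7",
    "centralauth": "s7",
}


def host_label(dbname: str) -> str:
    """Return the first label of the replica hostname for a database name."""
    name = dbname.strip().lower()
    if name.endswith("_p"):
        name = name[:-2]
    # Fall back to the bare database name for ordinary wikis.
    return SPECIAL_HOSTS.get(name, name)


print(host_label("meta_p"), host_label("enwiki_p"))
```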

Change 674427 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[analytics/quarry/web@master] multiinstance support: fix the meta_p query logic

https://gerrit.wikimedia.org/r/674427

Change 674427 merged by jenkins-bot:
[analytics/quarry/web@master] multiinstance support: fix the meta_p and centralauth query logic

https://gerrit.wikimedia.org/r/674427

Mentioned in SAL (#wikimedia-cloud) [2021-03-23T21:45:50Z] <bstorm> restarting quarry services for the meta_p and centralauth issue T264254

Ok, it's looking pretty ok now, except that I wish I had a nice big popup for when someone forgets the new "database" field.

When I modify one of my existing queries and click 'Submit', nothing happens. When after a few minutes I press F5, I see that the changes have not been saved. In the recent queries I can see activity of others. What could I be doing wrong?

No matter which DB you enter into the DB Name Field - nothing happens, query never gets executed. Absolutely no-go right now!

Hello @Bstorm,
Please, I am having issues with Quarry; when I try to fork an existing quarry and click on "submit query", it pops up a message: "quarry.wmflabs.org says bad database name".

No matter which DB you enter into the DB Name Field - nothing happens, query never gets executed. Absolutely no-go right now!

Can you point me at an example query? I am unable to reproduce this.

When I modify one of my existing queries and click 'Submit', nothing happens. When after a few minutes I press F5, I see that the changes have not been saved. In the recent queries I can see activity of others. What could I be doing wrong?

Please point me at a query so I can try to sort out what's happening there.

Hello @Bstorm,
Please, I am having issues with Quarry; when I try to fork an existing quarry and click on "submit query", it pops up a message: "quarry.wmflabs.org says bad database name".

In this case, you need to use the new database box in the upper left to enter the name of the database you will be connecting to in your query so that Quarry can route to the right replica server.

I'm seeing that the helper text does not show up when you fork an existing query with no database; instead the box says "None", which is unhelpful. That was the result of an attempt to backfill the application's database that did not work out so well.

I can try to sort out some better result than filling in "None".
On a "new query" that box looks like:

Screen Shot 2021-03-24 at 11.26.02 AM.png (190×662 px, 44 KB)

For an old query, it may just say "None".
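
The "None" most likely comes from formatting a NULL column value directly into the form. A minimal sketch of the intended behaviour, assuming a hypothetical helper `database_field_value` (not a function in Quarry's code base):

```python
from typing import Optional


def database_field_value(stored: Optional[str]) -> str:
    """Value to pre-fill in the database box (hypothetical helper)."""
    # str(None) is "None", which is what leaks into the form when a NULL
    # column value is rendered as-is. Returning an empty string instead
    # lets the field's helper text show.
    return stored if stored is not None else ""


print(repr(database_field_value(None)))
```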

Hi Bstorm, it is now working, no idea why, the whole morning it did not work.

When I modify one of my existing queries and click 'Submit', nothing happens. When after a few minutes I press F5, I see that the changes have not been saved. In the recent queries I can see activity of others. What could I be doing wrong?

Please point me at a query so I can try to sort out what's happening there.

See query/47567. I've tried both with and without the _p suffix.

Change 674697 had a related patch set uploaded (by Bstorm; author: Bstorm):
[analytics/quarry/web@master] database field: if the field is NULL, don't fill it in with None

https://gerrit.wikimedia.org/r/674697

Change 674710 had a related patch set uploaded (by Bstorm; author: Bstorm):
[operations/puppet@production] quarry: remove the querykiller

https://gerrit.wikimedia.org/r/674710

Change 674723 had a related patch set uploaded (by Bstorm; author: Bstorm):
[analytics/quarry/web@master] worker: update some worker code for errors and connections

https://gerrit.wikimedia.org/r/674723

Hi Bstorm, it is now working, no idea why, the whole morning it did not work.

I have a theory. I saw that the OOM killer had killed off several workers on both nodes. I think some very large queries caused issues at some point. There might be something we can do in the future to prevent that...but not on this ticket :)

When I modify one of my existing queries and click 'Submit', nothing happens. When after a few minutes I press F5, I see that the changes have not been saved. In the recent queries I can see activity of others. What could I be doing wrong?

Please point me at a query so I can try to sort out what's happening there.

See query/47567. I've tried both with and without the _p suffix.

Works for me https://quarry.wmflabs.org/query/53590

That is one of the ones where the database box ends up pre-filled with "None". You'll notice in my fork, I wrote in the database in that box. If that's what's wrong then https://gerrit.wikimedia.org/r/674697 should help clear up confusion.

I did exactly the same as you and still it didn't work, when using Opera on Windows and on Linux (which I've used with the 'old' Quarry). With Vivaldi on Linux it works, however (which I've never used with Quarry). Could it be browser related, or cookie related?

Edit: now it works for me again in Opera.

Change 674697 merged by jenkins-bot:
[analytics/quarry/web@master] database field: if the field is NULL, don't fill it in with None

https://gerrit.wikimedia.org/r/674697

Change 674723 merged by jenkins-bot:
[analytics/quarry/web@master] worker: update some worker code for errors and connections

https://gerrit.wikimedia.org/r/674723

Mentioned in SAL (#wikimedia-cloud) [2021-03-25T22:01:01Z] <bstorm> restarting web interface for a small fix for the database field display T264254

Mentioned in SAL (#wikimedia-cloud) [2021-03-25T22:02:59Z] <bstorm> restarting celery worker processes to fix connection cleanup T264254

Change 674710 merged by Bstorm:
[operations/puppet@production] quarry: remove the querykiller

https://gerrit.wikimedia.org/r/674710

Mentioned in SAL (#wikimedia-cloud) [2021-03-25T22:15:30Z] <bstorm> removing the querykiller role T264254

Change 674979 had a related patch set uploaded (by Bstorm; author: Bstorm):
[operations/puppet@production] quarry: remove the querykiller profile

https://gerrit.wikimedia.org/r/674979

Queries of the last 4 hours are not being executed.

https://quarry.wmflabs.org/query/53621 - no results displayed after running almost for an hour ....

Queries of the last 4 hours are not being executed.

Reported at T278544.

Change 674979 merged by Bstorm:
[operations/puppet@production] quarry: remove the querykiller profile

https://gerrit.wikimedia.org/r/674979

@CommanderWaterford: Hi, this task is about preparing for multi-instance replicas. For separate / other problems please report separate issues. Thanks a lot! :)

I do know that very well, and there has been no stability since you started preparing Quarry for this... Quarry has not been running for almost 12 hours now: no info, no feedback, nothing. Are you guys doing some kind of beta test before those modifications!? Does not seem so.

@CommanderWaterford: If you know that very well, then please do not intentionally ignore it. Also see https://www.mediawiki.org/wiki/Bug_management/Phabricator_etiquette for info on where to bring up general questions about testing - thanks a lot.

Shoot, I suspect I might have caused that. Fixing.

Yup. A bug that I surfaced yesterday by improving the cleanup code caused workers to die under some conditions (on top of the problems they have had for years). I fixed it now, so there's better cleanup and fewer crashing workers under load.
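
For illustration only, one common shape for this kind of defensive cleanup is sketched below: the connection is always closed, and an error during close is logged rather than allowed to take the worker down. `run_with_cleanup`, `connect` and `work` are hypothetical names; this is not the code from the worker patch.

```python
import logging

log = logging.getLogger("worker-sketch")


def run_with_cleanup(connect, work):
    """Run work(conn) and always try to close the connection afterwards."""
    conn = None
    try:
        conn = connect()
        return work(conn)
    finally:
        if conn is not None:
            try:
                conn.close()
            except Exception:
                # A failed close is logged and swallowed so that it does
                # not crash the worker process.
                log.exception("error while closing replica connection")
```

Errors raised by `work` itself still propagate to the caller; only failures during the close step are suppressed.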

Created a ticket (T278583) to finally make the web interface say so when it cannot connect to a worker. I presume that will be non-trivial since I merged several quite ancient tickets into it. Right now it just shows "running" forever and people have previously switched the status in the database manually, which is dreadful. I don't know when that will get done, but it would be nice.

Hi @Bstorm

I have tried entering databases (Wikibase, MediaWiki) and I keep getting an error message. Kindly find attached screenshots

image.png (1×1 px, 322 KB)

image.png (1×1 px, 346 KB)

Hi, those aren't database names, you can use https://db-names.toolforge.org/ to find out the database name for a wiki.

Note that currently not all available databases are accessible in Quarry, see T278715.

nskaggs subscribed.

As the old clusters are now completely offline, this can be considered closed.