
Run mediawiki::maintenance scripts in Beta Cluster
Open, High, Public

Description

I was chatting with some people on -discovery about what the various servers do, and I realised that beta has no terbium equivalent, and nothing is running the maintenance scripts automatically. Is this something we should be testing?

There are a variety of maintenance scripts in mediawiki::maintenance::* in operations/puppet that do not appear to be running in the beta cluster. In prod, these are enabled by applying the mediawiki::maintenance role to a specific host in manifests/site.pp. I'm not sure of the right way to go about applying this. I could add an appropriate node clause to site.pp, but we are not using that for any other deployment-prep machines.
This is needed because one of the new features in Discovery rebuilds the autocomplete indices from a cronjob and without it the indices will grow stale.

Event Timeline

Krenair created this task. Feb 5 2016, 2:04 PM
Krenair set the priority of this task to Needs Triage.
Krenair updated the task description.
Krenair added a subscriber: Krenair.
Restricted Application added subscribers: StudiesWorld, Aklapper. Feb 5 2016, 2:04 PM

@EBernhardson: I don't know if @Krenair was talking with you or not, but I decided to ping you anyway. Do you have any thoughts here re Discovery-specific maintenance scripts?

The conversation wasn't really Discovery-specific; it just happened to take place in that channel.

Discovery happens to be relevant here because the completion suggester that we are pushing into production this quarter is not updated in real time (a limitation of the data structure). Once this moves beyond a Beta Feature and into production, that basically means autocomplete will not auto-update in beta unless the scripts are run.

hashar updated the task description.
hashar added subscribers: hashar, thcipriani.

Unfortunately, I think some of these crons try to do things like use all.dblist instead of all-labs.dblist, use s*.dblist, etc.
I wonder if we should just use an entirely separate dblists directory.
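Here is a minimal sketch of the "separate dblists per realm" idea. It is written in Python only to show the lookup logic; the real crons are Puppet-managed commands. Only all.dblist and all-labs.dblist come from the comment above. The realm names and the mapping table are hypothetical. A combination with no Beta counterpart, such as a section list (s1), raises an error instead of silently falling back to a production list.

```python
# Hypothetical realm -> dblist mapping. Only all.dblist and all-labs.dblist
# come from the discussion; the realm keys are assumed names.
DBLIST_BY_REALM = {
    "production": {"all": "all.dblist"},
    "labs": {"all": "all-labs.dblist"},
}


def dblist_for(realm: str, name: str) -> str:
    """Return the dblist file a maintenance cron should read in `realm`.

    Raises ValueError instead of falling back to a production list, so a
    cron that asks for e.g. a section list (s1) in Beta fails loudly.
    """
    try:
        return DBLIST_BY_REALM[realm][name]
    except KeyError:
        raise ValueError(f"no {name!r} dblist defined for realm {realm!r}") from None


print(dblist_for("labs", "all"))  # all-labs.dblist
```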

greg triaged this task as Normal priority. Mar 17 2016, 1:43 PM

As a result of not having this instance in deployment-prep, Wikidata dispatching doesn't actually run regularly on the Beta Cluster.
Problems with Wikidata dispatching have recently blocked the train (T171370, T172394).

greg added a comment. Aug 12 2017, 4:01 PM

@Addshore added mediawiki::maintenance::wikidata to deployment-tin last night: https://tools.wmflabs.org/sal/log/AV3S-ZGGwg13V6285ZLD as a "do the minimum to fix the issue at hand" step.

I think we should still apply all of mediawiki::maintenance to deployment-tin (I guess we don't need a new worker VPS?) and hope that any scripts misconfigured for the Beta Cluster won't break too badly/negatively :)

I think the problem will just be that some beta-specific wikis won't be covered and you'll get a ton of error messages from wikis that don't exist in beta. It won't be perfect without changing the maintenance scripts, which you'd likely have a lot of problems trying to fix in puppet.git.

Krinkle renamed this task from "Run mediawiki::maintenance scripts?" to "Run mediawiki::maintenance scripts in Beta Cluster". Aug 18 2017, 6:52 PM

Does this require a non-trivial amount of work? Otherwise we can list on a page somewhere on Wikitech which scripts should be run regularly, and run them manually when needed. Note that CU and AF no longer store data there, so purge_(checkuser|abusefilter).pp can be left out of the list. Thanks.

Dzahn added a subscriber: Dzahn. Apr 10 2018, 8:59 PM

I suggest creating a fresh instance (one that is not named after a hostname in prod but has a generic name) and applying role(mediawiki_maintenance) to it. Then you will see which errors you actually get (or not). The ones that just work you can keep, and the ones that break you can disable in Hiera (the Puppet class makes this easy; it already disables all the crons on the inactive maintenance server, currently codfw). I don't think that "keep a list of what should be run manually" is going to work that well.

@Dzahn Thanks for your explanation. I agree with the naming, etc. As for "see which errors you actually get", I'm afraid I'd not be able to do that, nor disable things in Hiera, as I am not a project admin for the deployment-prep project. Should we involve Release-Engineering-Team here as the primary maintainers of the site? Thanks.

MarcoAurelio updated the task description. Apr 10 2018, 9:24 PM

Also, what about deployment-maintenance with role::mediawiki_maintenance? (Sorry if the role:: naming is wrong; Puppet naming is still confusing to me.)

Joe added a subscriber: Joe. May 2 2018, 12:15 PM

Is anyone working on this? If not, I guess this should be expedited to enable us to test running the maintenance scripts on PHP 7 in production as well, as HHVM is dog slow at running CLI scripts and I see this as a priority.

@Joe I don't think anyone is working on this atm. Anyone should feel free to take on this one.

Addshore raised the priority of this task from Normal to High. Aug 30 2018, 8:12 AM

It looks like the fix for running Wikidata dispatching is no more, since we have new deploy servers for beta and it looks like the Wikidata maintenance role was not carried over to them?

Pinging @Krenair as he created the instances.

Either we should just go ahead and make a maint server for beta now, or let's add the Wikidata maint role (added in T125976#3520785) back to one of the servers.

Setting to high as we really want dispatching running on beta, as do the WMF media info team.

Addshore moved this task from incoming to monitoring on the Wikidata board.

Change 462019 had a related patch set uploaded (by Thcipriani; owner: Thcipriani):
[operations/puppet@production] Beta: maintenance: skip mediawiki::state function

https://gerrit.wikimedia.org/r/462019

Change 462020 had a related patch set uploaded (by Thcipriani; owner: Thcipriani):
[operations/puppet@production] Beta: maintenance: no openldap management

https://gerrit.wikimedia.org/r/462020

I've got a stretch instance called deployment-mwmaint01 running in beta with role::mediawiki_maintenance. I made a couple of patches to make this happen: one because we don't have conftool in beta, and another because we don't have the ldap-admins group in beta (and openldap::maintenance isn't probably needed on this machine).

Note that not all scripts running in production can be run on beta (for example, the lack of the CheckUser extension on Beta will either make the script fail or make running it pointless). I suggest we be allowed to choose which scripts to run, and with which parameters, if that's not going to cause undue complications. That said, Puppet is a strange and very complicated land for me to understand, so apologies if this is nonsensical. Regards.

Currently there is just one general switch to either enable all crons (scripts) or disable them all, and it's based on which DC is active. It would be possible to have a separate switch for each script, but that would be quite some overhead.
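To make the trade-off concrete, here is a minimal Python sketch of what a per-script switch could look like. Each cron takes the global DC-based value unless an override map says otherwise, so behaviour is unchanged when the map is empty. The override map is hypothetical; in Puppet it would presumably come from Hiera. The script names are taken from this thread.

```python
from typing import Mapping

VALID_STATES = ("present", "absent")


def cron_ensure(global_ensure: str, overrides: Mapping[str, str], script: str) -> str:
    """Resolve one cron's state: an explicit per-script override wins,
    otherwise the single DC-based switch applies unchanged."""
    state = overrides.get(script, global_ensure)
    if state not in VALID_STATES:
        raise ValueError(f"invalid ensure {state!r} for {script}")
    return state


# Hypothetical example: Beta keeps the global switch "present" but turns
# off one cron whose extension is not deployed there.
beta_overrides = {"purge_checkuser": "absent"}
for script in ("pagetriage", "purge_checkuser"):
    print(script, cron_ensure("present", beta_overrides, script))
```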

It seems like adding the missing extension on Beta would be the better solution.

> and openldap::maintenance isn't probably needed on this machine

It should probably be moved to a different machine in prod too but that's a matter for a different ticket I suppose.

> It seems like adding the missing extension on Beta would be the better solution.

I think it actually used to be there but @hashar got rid of it over 6 years ago for undefined reasons (https://gerrit.wikimedia.org/r/9796 - the exact code that does it got moved around later) - comment there refers to @Reedy? I wonder if we should put a commit up for review to re-enable it.

Reedy added a comment. Sep 22 2018, 5:12 PM

CheckUser stores PI in the form of (at least) IP addresses. And as basically anyone can get an account, anyone can look at the database and see the information.

As there's no way to neuter CheckUser to not store this, easiest answer was to just undeploy it

If someone wants to add some config so it doesn't always store that information, maybe we can redeploy it. But it feels kinda hacky.

> CheckUser stores PI in the form of (at least) IP addresses. And as basically anyone can get an account, anyone can look at the database and see the information.

Anyone that can look at the DB (anyone) can deploy code that does the same thing.

Reedy added a comment. Sep 22 2018, 5:15 PM

Sure, but it's more effort to do so. Plus, if someone then stored it somewhere, the chances of it not being noticed by someone else are slim...

Maybe it's worth a discussion with Legal about it, to see how they view it.

It doesn't particularly matter how much effort it takes, it is possible.

It's a cost-benefit analysis. Which is easier/quicker/whatever? Patching out the core functionality of the extension in PHP? Or patching Puppet to add a config flag controlling whether the CheckUser cronjob is enabled...

That being said... We have a hook for the cu_changes table

		Hooks::run( 'CheckUserInsertForRecentChange', [ $rc, &$rcRow ] );

Use that to override the sensitive columns to '' in CommonSettings-labs.php. That seems more sensible than a config variable to make CheckUser stop doing what it's basically supposed to do, which would have limited usage elsewhere...

Not sure if we need to bother about cu_log.
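Reedy's suggestion amounts to blanking the sensitive fields of each row before CheckUser writes it. The real handler would be PHP, registered for CheckUserInsertForRecentChange in CommonSettings-labs.php. The Python below only sketches the row transformation. The column names are assumptions about the cu_changes schema and should be checked against the extension's tables before relying on them.

```python
# Assumed sensitive cu_changes columns (verify against CheckUser's schema).
SENSITIVE_CU_COLUMNS = ("cuc_ip", "cuc_ip_hex", "cuc_xff", "cuc_xff_hex", "cuc_agent")


def blank_sensitive_columns(row: dict) -> None:
    """Overwrite sensitive values with '' in place, mirroring how the PHP
    hook receives &$rcRow by reference. Other columns are left untouched."""
    for column in SENSITIVE_CU_COLUMNS:
        if column in row:
            row[column] = ""


row = {"cuc_ip": "192.0.2.10", "cuc_agent": "Mozilla/5.0", "cuc_namespace": 0}
blank_sensitive_columns(row)
print(row)  # {'cuc_ip': '', 'cuc_agent': '', 'cuc_namespace': 0}
```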

On the other hand, if purge_checkuser detects CheckUser is not installed it will just print that the CheckUser extension is not installed and will move along. It's just a bit of logspam instead of potential privacy issues.

Reedy added a comment. Sep 23 2018, 4:53 PM

Make it do a file-existence check && run the script.
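Reedy's "file existence && run script" idea is essentially a one-line guard in the cron command, roughly `test -d <extension dir> && <run script>`. The same logic is sketched below in Python. The directory and command in the usage line are placeholders, not the real Beta paths or script names.

```python
import os
import subprocess
from typing import Sequence


def run_if_installed(extension_dir: str, command: Sequence[str]) -> bool:
    """Run `command` only if `extension_dir` exists.

    Returns False (and runs nothing) when the extension is absent, which
    turns the "extension is not installed" logspam into a silent skip.
    Raises CalledProcessError if the script itself fails.
    """
    if not os.path.isdir(extension_dir):
        return False
    subprocess.run(command, check=True)
    return True


# Placeholder path and command (hypothetical). With the extension absent,
# this returns False without running anything.
print(run_if_installed("/path/to/extensions/CheckUser", ["php", "hypothetical_purge_script.php"]))
```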

We can try, but this is puppet.git, and we may just get a CR-2.

Change 476980 had a related patch set uploaded (by Thcipriani; owner: Thcipriani):
[operations/puppet@production] Beta: add mwmaint01 to mediawiki-installation

https://gerrit.wikimedia.org/r/476980

Change 476980 merged by Dzahn:
[operations/puppet@production] Beta: add mwmaint01 to mediawiki-installation

https://gerrit.wikimedia.org/r/476980

@Dzahn With the patch merged above, I assume that we now have a deployment-mwmaint01 server on which to run maintenance scripts. But I assume the maintenance scripts that run in production are not yet running on beta automatically, right?

Dzahn added a comment. Dec 14 2018, 7:31 PM

@MarcoAurelio The patch means more specifically just that a host deployment-mwmaint01.deployment-prep.eqiad.wmflabs is receiving mediawiki deployments when/if scap is running in deployment-prep.

But yea, on https://tools.wmflabs.org/openstack-browser/server/deployment-mwmaint01.deployment-prep.eqiad.wmflabs we can see that host exists and is active.

And it also tells us under Puppet classes that it is using "role::mediawiki_maintenance" among other things. (BTW, I want to rename that to mediawiki::maintenance to follow the other mediawiki:: structure -> https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/479131/.)

The role includes ::profile::mediawiki::maintenance, and inside that there is this code:

    $ensure = mediawiki::state('primary_dc') ? {
        $::site => 'present',
        default => 'absent',
    }

This $ensure value is then used with all the cron jobs to decide if they should be running or not.

    # Mediawiki maintenance scripts (cron jobs)
    class { 'mediawiki::maintenance::pagetriage': ensure => $ensure }
    class { 'mediawiki::maintenance::translationnotifications': ensure => $ensure }
    class { 'mediawiki::maintenance::updatetranslationstats': ensure => $ensure }
    ...

So for production it makes sense, since automatically the crons are either stopped or running based on what the current active_dc is.
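The Puppet selector quoted above makes the whole set of crons all-or-nothing per host. Below is a Python rendering of that logic. The function names are illustrative; the class names are the three quoted above. It makes it easy to see what a host gets for a given answer to mediawiki::state('primary_dc').

```python
MAINTENANCE_CLASSES = (
    "mediawiki::maintenance::pagetriage",
    "mediawiki::maintenance::translationnotifications",
    "mediawiki::maintenance::updatetranslationstats",
)


def ensure_for(primary_dc: str, site: str) -> str:
    """Mirror of the Puppet selector: 'present' only on the primary DC."""
    return "present" if site == primary_dc else "absent"


def cron_states(primary_dc: str, site: str) -> dict:
    """Every maintenance class receives the same value: all or none."""
    ensure = ensure_for(primary_dc, site)
    return {cls: ensure for cls in MAINTENANCE_CLASSES}


# deployment-mwmaint01 lives in eqiad (per its hostname); what primary_dc
# resolves to in deployment-prep is the open question.
print(set(cron_states("eqiad", "eqiad").values()))  # {'present'}
print(set(cron_states("codfw", "eqiad").values()))  # {'absent'}
```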

So the question is really "what is mediawiki::state('primary_dc') in deployment-prep?".

A separate question is whether each individual cron could also run in deployment-prep, and I don't know the answer. The way the code is written means that, so far, we can only have all or none running.