Lag in updating Special Pages?
Closed, Resolved | Public | BUG REPORT

Description

Not sure if this is a bug report but the Special Pages lists I use typically update on the 1st of the month and every 3 days after that. Well, we're almost into May 2nd (UTC) and they still haven't updated so I thought I'd post an inquiry. It's my first time on Phab so sorry if this isn't how things are done here.

I'd love it if these lists could be updated daily but I've been told that is unlikely. But I thought I'd throw that suggestion out there.

That's my only query....thanks for any help you can provide.

What should have happened instead?:
The lists on the pages should have updated on May 1st.

Software version (if not a Wikimedia wiki), browser information, screenshots, other information, etc.:

Event Timeline

Liz reopened this task as Open. May 28 2022, 8:11 PM

Looks like this Special Pages posting lag is happening again. On En.Wiki, special pages involving categories have typically, for years, updated around 13:00-15:00 UTC every 3 days, but on May 25th they updated at 19:00 UTC and today they haven't updated at all. I'd prefer that they "catch up" and post sometime today rather than skipping a date and just updating again on June 1st, as they did at the beginning of May when this lag was first reported. A late update is better than omitting an update!

Also, I noticed in the last update for unused categories that the system omitted some empty maintenance categories that are helpful to see, in advance. I'm not sure why that was done. Thanks!

This happened again today. The special pages involving categories eventually updated at around 00:30 UTC on May 29th but so far, no updates today on May 31st and they usually also update on the 1st of every month so the reports should have been issued both today and also tomorrow....so we'll see.

If anyone who has control over this procedure is paying attention to this ticket: the 13:00-15:00 UTC time slot in which the reports used to be issued was very convenient for the people on en.wiki who work with the reports, so it would be awesome if, whenever this problem with Special Pages is fixed, they returned to being generated at that time of day. Thank you.

Update: Okay, so minutes after I posted this, someone flipped a switch or ran a program and the reports were issued. So maybe they don't happen automatically and they need a human being to request them. So, thanks for the reports today, late is better than not at all, but it would be ideal if we could go back to the regular schedule which until recently, was very reliable.

Maybe calling for these reports was a duty of a staff person who has moved on to a different job or other responsibilities. If someone could add it to their morning to-do list every 3 days (could one ever hope for daily reports?!) that would be superb! Thanks again.

Just a reminder that today is June 1st and the Special Pages, at least the ones we utilize on En.Wiki, typically get generated on the 1st of the month...and then the 4th, 7th, 10th, 13th, 16th, 19th, 22nd, 25th, 28th and sometimes the 31st. If they could get back to being generated on their old, regular schedule, that would be great!

Just thought I'd add a little reminder to this Phab ticket. Many thanks!

taavi subscribed.

The update process is fully automatic. It's configured to start on the 1st, 4th, 7th, ... of every month at 5AM UTC, provided the previous run has finished by that time (a skipped start mostly happens on the first of the month if the previous month had 31 days). The process takes a while for all 923 wikis. For example, the run today (2022-06-01) was skipped since the one started on 2022-05-31 is still working (it's processing sawikisource as I write this). There's no way to guarantee that the update for a specific wiki happens at a specific time, since various factors like system load and database maintenance can affect the runtime of the entire process by as much as several hours. There haven't been any recent changes to this setup as far as I can see.
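As a rough illustration of the schedule described above, here is a small Python sketch that lists the trigger times and skips a start whenever the previous run is still going. The function names and the 30-hour run duration are assumptions made up for the example; the real runtime varies and is not stated in this task.

```python
import calendar
from datetime import datetime, timedelta


def trigger_times(year, month, hour=5):
    """Trigger times on days 1, 4, 7, ... of the given month at the given hour (UTC)."""
    last_day = calendar.monthrange(year, month)[1]
    return [datetime(year, month, day, hour) for day in range(1, last_day + 1, 3)]


def simulate(triggers, run_duration):
    """Start a run at each trigger unless the previous run is still going."""
    started, skipped = [], []
    busy_until = None
    for t in sorted(triggers):
        if busy_until is not None and t < busy_until:
            skipped.append(t)  # previous run has not finished yet
        else:
            started.append(t)
            busy_until = t + run_duration
    return started, skipped


# May has 31 days, so the triggers on May 31st and June 1st are only one day apart.
triggers = trigger_times(2022, 5) + trigger_times(2022, 6)
# 30 hours is a hypothetical duration, chosen only to show the effect.
started, skipped = simulate(triggers, timedelta(hours=30))
print("skipped:", [t.isoformat() for t in skipped])
```

With any run longer than 24 hours but shorter than 72 hours, only the start on the 1st after a 31-day month is skipped, which matches the behaviour reported for 2022-06-01.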

Increasing the update frequency by changing it to process one wiki from each section at the same time doesn't seem to be an option either, since there are special pages related to Commons and Wikidata that would then risk overloading those individual wikis.

Removing WMF-JobQueue since that's a fully separate system. Closing as 'invalid' since the system is working as intended and no actual change was made to resolve this task.

In T307314#7974768, @Majavah wrote:

Increasing the update frequency by changing it to process one wiki from each section at the same time doesn't seem to be an option either, since there are special pages related to Commons and Wikidata that would then risk overloading those individual wikis.

If it's not reliably finishing in time, I think we should be aiming to parallelize it by shard. Maybe for the main jobs we can exclude the Commons/Wikidata special pages, and then run those as part of the s4/s8 jobs?

mlegoktm@mwmaint1002:~$ mwscript updateSpecialPages.php --wiki=enwiki --list
Statistics [callback]
ValidationStatistics [callback]
Ancientpages [QueryPage]
BrokenRedirects [QueryPage]
Deadendpages [QueryPage]
DoubleRedirects [QueryPage]
ListDuplicatedFiles [QueryPage]
LinkSearch [QueryPage]
Listredirects [QueryPage]
Lonelypages [QueryPage]
Longpages [QueryPage]
MediaStatistics [QueryPage]
MIMEsearch [QueryPage]
Mostcategories [QueryPage]
Mostimages [QueryPage]
Mostinterwikis [QueryPage]
Mostlinkedcategories [QueryPage]
Mostlinkedtemplates [QueryPage]
Mostlinked [QueryPage]
Mostrevisions [QueryPage]
Fewestrevisions [QueryPage]
Shortpages [QueryPage]
Uncategorizedcategories [QueryPage]
Uncategorizedpages [QueryPage]
Uncategorizedimages [QueryPage]
Uncategorizedtemplates [QueryPage]
Unusedcategories [QueryPage]
Unusedimages [QueryPage]
Wantedcategories [QueryPage]
Wantedfiles [QueryPage]
Wantedpages [QueryPage]
Wantedtemplates [QueryPage]
Unwatchedpages [QueryPage]
Unusedtemplates [QueryPage]
Withoutinterwiki [QueryPage]
MostGloballyLinkedFiles [QueryPage]
GloballyWantedFiles [QueryPage]
DisambiguationPages [QueryPage]
DisambiguationPageLinks [QueryPage]
GadgetUsage [QueryPage]
OrphanedTimedText [QueryPage]
UnconnectedPages [QueryPage]

Out of those, the Commons ones are MostGloballyLinkedFiles and GloballyWantedFiles, but those are specially written to only execute on Commons, so they get skipped on all other wikis. Then there's UnconnectedPages for Wikidata, except it is uncached so it doesn't get executed either. So I think we can safely parallelize this by shard.
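To make the idea of parallelizing by shard concrete, here is a minimal Python sketch: shards run concurrently, while the wikis inside one shard are still processed one after another. It is only an illustration, not the actual job setup. `update_special_pages` is a stand-in for running updateSpecialPages.php, the s1/s4/s8 entries follow what is said in this task, and the s3 entries are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

# enwiki/s1, commonswiki/s4 and wikidatawiki/s8 follow the discussion in this task;
# the s3 wiki names are hypothetical placeholders.
SHARDS = {
    "s1": ["enwiki"],
    "s3": ["examplewiki_a", "examplewiki_b"],
    "s4": ["commonswiki"],
    "s8": ["wikidatawiki"],
}


def update_special_pages(wiki):
    """Stand-in for running updateSpecialPages.php against one wiki."""
    return f"updated {wiki}"


def run_shard(shard, wikis):
    """Process the wikis of one shard sequentially."""
    return shard, [update_special_pages(wiki) for wiki in wikis]


def run_all(shards):
    """Run all shards concurrently; each shard keeps its own sequential order."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        futures = [pool.submit(run_shard, shard, wikis) for shard, wikis in shards.items()]
        return dict(future.result() for future in futures)


results = run_all(SHARDS)
print(results["s1"])
```

Because each shard only ever has one wiki being processed at a time, no single database section sees more than one of these expensive jobs at once.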

Change 804788 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/puppet@production] mediawiki: Split updateSpecialPages.php job to be per-shard

https://gerrit.wikimedia.org/r/804788

With very aggressive rounding up to nearest minute, the enwiki run takes 42 minutes. commonswiki is 8.8 hours, wikidatawiki is ~50 minutes. I'm tagging DBA for a heads-up and review, since this is mostly going to have an impact on databases.
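A quick back-of-the-envelope check of what those three numbers mean for wall-clock time, as a Python sketch. It assumes the three wikis sit on different shards (s1, s4, s8) and that concurrent jobs do not slow each other down. It covers only these three wikis, not the full set.

```python
# Per-wiki runtimes in minutes, taken from the figures quoted above
# (8.8 hours = 528 minutes; "~50 minutes" treated as 50).
runtimes = {"enwiki": 42, "commonswiki": 528, "wikidatawiki": 50}

sequential = sum(runtimes.values())  # one job processes the wikis back to back
parallel = max(runtimes.values())    # one job per shard, all started together

print(f"sequential: {sequential / 60:.1f} h, parallel: {parallel / 60:.1f} h")
```

Under these assumptions the slowest shard bounds the whole run, and enwiki's own update is done after about 42 minutes regardless of how long commonswiki takes.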

If this per-shard split goes well, I'd like to increase the frequency from 3 days to 2 days and if that goes well, to daily.

@Legoktm these queries would go to the replicas on the slow section for MW, right?

Marostegui triaged this task as Medium priority. Jun 13 2022, 5:16 AM
Marostegui moved this task from Triage to In progress on the DBA board.

@Legoktm these queries would go to the replicas on the slow section for MW, right?

Yes. Specifically,

	/**
	 * Get a DB connection to be used for slow recache queries
	 * @stable to override
	 * @return IDatabase
	 */
	protected function getRecacheDB() {
		return $this->getDBLoadBalancer()
			->getConnectionRef( ILoadBalancer::DB_REPLICA, [ $this->getName(), 'QueryPage::recache', 'vslow' ] );
	}

(And the one place I saw it overridden in production-deployed code correctly passed 'vslow' as well.)

That's great then. I have lost track of which scripts were changed to reload the config (T298485). Is this one of the ones that got that implemented?

My quick skim of that ticket is that reloading config hasn't been implemented in MW yet, unless it was in a patch not linked on that ticket.

Maybe we can create a specific task for that implementation so we don't hijack this one :)

I have been told, probably repeatedly, that Special Pages are generated automatically every 3 days, but the ones I regularly use, https://en.wikipedia.org/wiki/Special:UnusedCategories and https://en.wikipedia.org/wiki/Special:WantedCategories, were supposed to update some time on August 22nd and didn't.

I'm not sure where to report this so I came back here. Maybe someone can tell me a reason why this happens and when this problem might be resolved by the system.

The automatic update is a systemd timer (formerly 'cron job') that is running on the mw maintenance server.

mediawiki_job_update_special_pages.service is triggered by mediawiki_job_update_special_pages.timer

Checking the status of the service I see this:

Active: active (running) since Fri 2022-08-19 05:00:00 UTC; 4 days ago

And in the process list I can see multiple processes with "updateSpecialPages" in their names. Those are still running since they started on the 19th.


The config for _when_ it runs is *-1/3 05:00 which translates to:

Normalized form: *-*-01/3 05:00:00
    Next elapse: Thu 2022-08-25 05:00:00 UTC
       From now: 1 day 9h left
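For readers unfamiliar with the calendar syntax, the normalized form `*-*-01/3 05:00:00` means "every day of the month starting at the 1st in steps of 3, at 05:00". The Python sketch below reproduces the "Next elapse" calculation for that one pattern only. `next_elapse` is a hypothetical helper, not a systemd API, and the value of `now` is an assumption inferred from "1 day 9h left".

```python
from datetime import datetime, timedelta


def next_elapse(now, hour=5):
    """Next time strictly after `now` on day 1, 4, 7, ... of a month at `hour`:00 UTC."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    # Step forward day by day until the day of month is 1, 4, 7, ...
    while (candidate.day - 1) % 3 != 0:
        candidate += timedelta(days=1)
    return candidate


now = datetime(2022, 8, 23, 20, 0)  # assumed "now", 1 day 9h before the next elapse
nxt = next_elapse(now)
print(nxt, "in", nxt - now)  # 2022-08-25 05:00:00 in 1 day, 9:00:00
```

The day counter restarts at the 1st of each month, which is why a trigger on the 31st is followed by another one the very next day.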

The underlying problem is that these scripts run extremely expensive queries when updating, like full table scans on tables with billions of rows. It will be slightly better after normalization of these tables, but clearly MySQL is not built to handle queries like this. The reporting needs to come from Hadoop, which is built and architected for such large-scale querying. I already created a ticket for it, T309738: Move MediaWiki QueryPages computation to Hadoop, and if people get enough resources for building the API from AQS or anything like that, I'll do the MediaWiki part. Afterwards we can even increase the frequency of the updates, and it would remove a massive burden on the core databases.

Change 804788 merged by Dzahn:

[operations/puppet@production] mediawiki: Split updateSpecialPages.php job to be per-shard

https://gerrit.wikimedia.org/r/804788

@Liz the existing job is super close to being finished. It's currently at zhwikisource, going alphabetically. Also, we are about to switch this to multiple separate jobs, one per shard.

deployed first https://gerrit.wikimedia.org/r/804788 and then https://gerrit.wikimedia.org/r/c/operations/puppet/+/804800 on mwmaint2002/1002 by @Legoktm

from puppet point of view they are both identical and new services/timers have been created. the old timer has been removed by puppet:

which DC actually runs them is decided in conftool.

[mwmaint1002:~] $ file /lib/systemd/system/mediawiki_job_update_special_pages_*
/lib/systemd/system/mediawiki_job_update_special_pages_s11.service: ASCII text
/lib/systemd/system/mediawiki_job_update_special_pages_s11.timer:   ASCII text
/lib/systemd/system/mediawiki_job_update_special_pages_s1.service:  ASCII text
/lib/systemd/system/mediawiki_job_update_special_pages_s1.timer:    ASCII text
/lib/systemd/system/mediawiki_job_update_special_pages_s2.service:  ASCII text
/lib/systemd/system/mediawiki_job_update_special_pages_s2.timer:    ASCII text
/lib/systemd/system/mediawiki_job_update_special_pages_s3.service:  ASCII text
/lib/systemd/system/mediawiki_job_update_special_pages_s3.timer:    ASCII text
/lib/systemd/system/mediawiki_job_update_special_pages_s4.service:  ASCII text
/lib/systemd/system/mediawiki_job_update_special_pages_s4.timer:    ASCII text
/lib/systemd/system/mediawiki_job_update_special_pages_s5.service:  ASCII text
/lib/systemd/system/mediawiki_job_update_special_pages_s5.timer:    ASCII text
/lib/systemd/system/mediawiki_job_update_special_pages_s6.service:  ASCII text
/lib/systemd/system/mediawiki_job_update_special_pages_s6.timer:    ASCII text
/lib/systemd/system/mediawiki_job_update_special_pages_s7.service:  ASCII text
/lib/systemd/system/mediawiki_job_update_special_pages_s7.timer:    ASCII text
/lib/systemd/system/mediawiki_job_update_special_pages_s8.service:  ASCII text
/lib/systemd/system/mediawiki_job_update_special_pages_s8.timer:    ASCII text

@Liz and all:

We now have one job per shard; the command lines are:

[mwmaint1002:~] $ for shard in 1 2 3 4 5 6 7 8 11; do grep Exec /lib/systemd/system/mediawiki_job_update_special_pages_s${shard}.service; done
ExecStart=/usr/local/bin/mw-cli-wrapper /usr/local/bin/mwscriptwikiset updateSpecialPages.php s1.dblist
ExecStart=/usr/local/bin/mw-cli-wrapper /usr/local/bin/mwscriptwikiset updateSpecialPages.php s2.dblist
ExecStart=/usr/local/bin/mw-cli-wrapper /usr/local/bin/mwscriptwikiset updateSpecialPages.php s3.dblist
ExecStart=/usr/local/bin/mw-cli-wrapper /usr/local/bin/mwscriptwikiset updateSpecialPages.php s4.dblist
ExecStart=/usr/local/bin/mw-cli-wrapper /usr/local/bin/mwscriptwikiset updateSpecialPages.php s5.dblist
ExecStart=/usr/local/bin/mw-cli-wrapper /usr/local/bin/mwscriptwikiset updateSpecialPages.php s6.dblist
ExecStart=/usr/local/bin/mw-cli-wrapper /usr/local/bin/mwscriptwikiset updateSpecialPages.php s7.dblist
ExecStart=/usr/local/bin/mw-cli-wrapper /usr/local/bin/mwscriptwikiset updateSpecialPages.php s8.dblist
ExecStart=/usr/local/bin/mw-cli-wrapper /usr/local/bin/mwscriptwikiset updateSpecialPages.php s11.dblist

They all run at the same time, on the old schedule (*-1/3 05:00), and that's ok because gerrit:804800 made it possible to parallelize them. Thanks to @Legoktm.

And just to spell it out, enwiki is the only wiki in s1, so it should never miss a run because some other wiki has delayed it. Thank you @Dzahn for pushing this forward :)

Old (flock) processes killed. Now we are just waiting for the new jobs to run. The scheduled time for s1 (enwiki) is: Trigger: Thu 2022-08-25 05:00:00 UTC; 1 day 6h left

What's the status of this? Is this fixed?

Ladsgroup moved this task from In progress to Done on the DBA board.

With splitting based on sharding and some other fixes (T322849), this has been properly mitigated. To avoid lingering tasks, I'm closing this. If people need more fixes, we can take a look at T309738 (or simply create a new task).