Page MenuHomePhabricator

Gadget dependencies sometimes don't update
Closed, ResolvedPublic

Description

Dependencies on MediaWiki:Gadgets-definition sometimes don't update properly. I've encountered this at least twice. When this happens, completely removing and adding back the definition works. Here are the two examples:

On 14 December 2016 00:05, I removed jquery.ui.dialog as a dependency (https://sv.wikipedia.org/w/index.php?title=MediaWiki:Gadgets-definition&diff=38413998) but the gadget was still using it half an hour later and I removed the definition entirely 00:37 (https://sv.wikipedia.org/w/index.php?title=MediaWiki:Gadgets-definition&diff=38414035). When I added it back everything worked as expected.

Same thing happened today. I replaced mediawiki.api with mediawiki.api.edit (https://sv.wikipedia.org/w/index.php?title=MediaWiki:Gadgets-definition&diff=38808577) and half an hour later it still hasn't updated, so I remove the definition entirely (https://sv.wikipedia.org/w/index.php?title=MediaWiki:Gadgets-definition&diff=38808599). When I added it back everything worked as expected.

Details

Related Gerrit Patches:
mediawiki/extensions/Gadgets : masterAccount for DB lag in loadGadgets() caching
mediawiki/extensions/Gadgets : masterAvoid wfMessage() call in Gadget cache updates

Event Timeline

Nirmos created this task.Feb 5 2017, 6:21 AM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptFeb 5 2017, 6:21 AM
Krinkle added a subscriber: Krinkle.Feb 6 2017, 6:59 PM

I'm fairly certain ResourceLoader can be ruled out as cause as it doesn't perform caching of any kind with regards to dependencies (aside from the HTTP response containing the startup manifest output - which rolls over every 5 minute and we know that part is working).

For Gadgets in particular, there is an abstraction layer between the module class in PHP and the definition on the wiki page - there is a definition parser that caches the definition in memcached. Presumably this cache store is either experiencing a race condition (an old request overwriting your new one), or it may be experiencing data loss (the update or purge being lost).

Relevant code is in the MediaWikiGadgetsDefinitionRepo class and its use of WANObjectCache.
https://github.com/wikimedia/mediawiki-extensions-Gadgets/blob/f1ad65c724b66ba32582cdb93bd9dc01be735cf3/includes/MediaWikiGadgetsDefinitionRepo.php#L35-L84

Krinkle added a subscriber: aaron.
Krinkle assigned this task to aaron.Feb 17 2017, 10:52 PM
aaron triaged this task as Medium priority.Feb 22 2017, 7:59 PM
aaron moved this task from Inbox to Next In This Quarter / Jan-Mar 2020 on the Performance-Team board.

Change 341944 had a related patch set uploaded (by Aaron Schulz):
[mediawiki/extensions/Gadgets] Avoid wfMessage() call in Gadget cache updates

https://gerrit.wikimedia.org/r/341944

Change 341944 merged by jenkins-bot:
[mediawiki/extensions/Gadgets] Avoid wfMessage() call in Gadget cache updates

https://gerrit.wikimedia.org/r/341944

Krinkle closed this task as Resolved.Mar 18 2017, 3:20 AM

At https://wikitech.wikimedia.org/wiki/How_to_deploy_code#A_note_on_JavaScript_and_CSS it says

Occasionally ResourceLoader trips up and does not re-cache files correctly. The symptom of this is typically that stale minified files are served, but if you add ?debug=true to the URL RL serves the new content.

Should this be removed now that this task is solved, or does it refer to something else?

At https://wikitech.wikimedia.org/wiki/How_to_deploy_code#A_note_on_JavaScript_and_CSS it says

Occasionally ResourceLoader trips up and does not re-cache files correctly. The symptom of this is typically that stale minified files are served, but if you add ?debug=true to the URL RL serves the new content.

Should this be removed now that this task is solved, or does it refer to something else?

It refers to something else. It refers to the caching of files stored on the disk, e.g. those deployed from MediaWiki core and extensions that are maintained in a Git repository. Due to each web server having its own copy of these files, it is possible for there to be the following race conditions:

  • Git commit changing a file from A to B progressively being deployed to 100 servers in a cluster.
  • A user asks ResourceLoader on server 20 (has the new code) the version of the file, it gets back version B.
  • The user now asks ResourceLoader for file?v=B, this request happens to be routed to server 60 (still running the old version), it response with version A.
  • The Varnish caches in surrounding our web servers will now remember response to url "file?v=B" with the content of version A.

Related tasks:

Changes to gadgets however are instantaneous because they are made in a single place (the master database of the current wiki), not on 100 separate web servers. So this bug doesn't apply there.

Unrelated to this task, however, I do notice that the above scenario about git commit is also T47877. While the race condition can still happen, ResourceLoader now informs Varnish that when it responds with a different version, that it should not be remembered.

Thanks @Nirmos, I've updated the documentation.

Nirmos reopened this task as Open.May 13 2017, 5:17 AM

I have reason to believe that this is not fixed after all. In https://meta.wikimedia.org/w/index.php?diff=16751393, Jalexander puts meta under script error (Uncaught Error: Unknown dependency: mw.geoIP). He reverted this edit in https://meta.wikimedia.org/w/index.php?diff=16752348

However, I can still see the error over an hour later and I have screenshots to prove it if you don't believe me.


This screenshot shows me getting the error one hour and 16 minutes after Jalexander undid the faulty edit.

For what it's worth, now, an hour and 36 minutes later, the error is gone, in case that helps.

Aklapper closed this task as Resolved.May 13 2017, 8:28 PM

Reclosing task as some lag can happen when it comes to caching...

Krinkle reopened this task as Open.Aug 25 2017, 11:29 PM

Re-opening per T174124, although that problem too went away by itself after a few hours. Until we find one that we can investigate there's not much we can do.

Change 375088 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/extensions/Gadgets@master] Account for DB lag in loadGadgets() caching

https://gerrit.wikimedia.org/r/375088

Change 375088 merged by jenkins-bot:
[mediawiki/extensions/Gadgets@master] Account for DB lag in loadGadgets() caching

https://gerrit.wikimedia.org/r/375088

Krinkle closed this task as Resolved.Sep 4 2017, 8:23 PM
Krinkle removed a project: Patch-For-Review.
Krinkle moved this task from Backlog: Small & Maintenance to Doing on the Performance-Team board.