Page MenuHomePhabricator

Lua module Module:Internet_Archive is causing parser cache pollution - ext.math.styles is sometimes loaded
Closed, ResolvedPublic

Assigned To
Authored By
bmansurov
Aug 23 2017, 3:28 PM
Referenced Files
F25068929: Screen Shot 2018-08-17 at 11.37.05 AM.png
Aug 17 2018, 6:42 PM
F24643823: mathzonk.png
Aug 7 2018, 3:55 PM
F22351177: Screen Shot 2018-06-18 at 3.30.47 PM.png
Jun 18 2018, 10:31 PM
F22263104: cap2-mobile.png
Jun 15 2018, 5:04 PM
F22263103: cap3-mobile.png
Jun 15 2018, 5:04 PM
F22263105: cap5-mobile.png
Jun 15 2018, 5:04 PM
F22262922: cap1-desktop.png
Jun 15 2018, 5:04 PM

Description

The "CSS Size (Barack Obama - en.m)" graph shows that the CSS size for this page has been fluctuating has been increasing and decreasing cyclically by a consistent amount of bytes for almost a year.

6 month period:

Screen Shot 2018-06-14 at 3.43.34 PM.png (759×1 px, 112 KB)

Compare:
http://wpt.wmftest.org/result/180614_EZ_GR/1/details/#waterfall_view_step1 (makes ext.math.styles request
with http://wpt.wmftest.org/result/180612_QJ_GW/1/details/#waterfall_view_step1 (no math request)

The Barack Obama page does not have any Math equations.

I am concerned that user specific preferences are causing parser cache pollution and this problem may be a larger performance problem then we are aware of.

After some investigation (see detailed developer notes) I have managed to narrow this down to a Lua module that reads Wikidata data, which in turn leads to the Math module unexpectedly being added.

Developer notes

Looking at the underlying code there is a troublesome note "This affects the page caching behavior."
https://github.com/wikimedia/mediawiki-extensions-Math/blob/df1689c7177d5b7ef946ae2ff525594bd377da3f/Math.hooks.php#L198

I see the page rendering hash is linked to the user option but not the current user:
https://github.com/wikimedia/mediawiki-extensions-Math/blob/df1689c7177d5b7ef946ae2ff525594bd377da3f/Math.hooks.php#L121

Given the regularity of the CSS change, it seems this isn't related to edit activity so I would suspect there is something funky with the caching here and the style is being added where it shouldn't be.

The style should only be added if the chem or math parser function is used. The only other place ext.math.styles is added relates to Wikidata: Wikidata hook

Replication steps

Screen Shot 2018-06-18 at 3.30.47 PM.png (671×631 px, 77 KB)

Developer notes

This only seems to happen when a page is associated with a Wikidata entry. When I remove the wikilink on https://www.wikidata.org/wiki/Q4115189 the problem magically goes away

It also seems to go away when the template "Internet Archive author" has been removed.

The code inside https://en.wikipedia.org/wiki/Module:Internet_Archive does seem to make use of wikibase - making a call to mw.wikibase.getEntityObject().

I connected https://wikidata.beta.wmflabs.org/wiki/Q361812 to https://en.wikipedia.beta.wmflabs.org/wiki/T173949 and re-ran the experiment but it failed. I updated the module, still no luck, however when I added an inception (start date) field to the Wikidata item the math module started to get pulled in.

Wed 20th June 2018

I've narrowed it down ever further https://en.m.wikipedia.beta.wmflabs.org/wiki/T173949-2 with a simplified module https://en.wikipedia.beta.wmflabs.org/wiki/Module:T173949

All this module does is call the method formatPropertyValues

This method seems to trigger the load of ext.math.styles in the mobile view.

Next step would be to replicate this on an environment where I can see exactly what's going on.

Thur 1st June 2018

I set it up on http://en-reading-web-staging.wmflabs.org/w/index.php?title=T173949 so I could actually debug what's going on.

Discovered the math style was added inside the Math hook. I looked at what introduced this code. It was this patch introduced by T67397.

The onWikibaseClientDataTypes hook is entered in both desktop and mobile for this article but the $dataTypeDefinitions['PT:math'] formatter-factory-callback is only entered on mobile. Why? I'm still investigating. I did a var_dump inside the format-factory-callback for some clues...

object(ValueFormatters\FormatterOptions)[868]
  protected 'options' => 
    array (size=11)
      'languages' => 
        object(Wikibase\LanguageFallbackChain)[834]
          private 'chain' => 
            array (size=1)
              ...
      'lang' => string 'en' (length=2)
      'on-error' => string 'warn' (length=4)
      'unDeserializableMessage' => string 'wikibase-undeserializable-value' (length=31)
      'geoformat' => string 'dms' (length=3)
      'spacing' => 
        array (size=1)
          0 => string 'latlong' (length=7)
      'directional' => boolean true
      'forceSign' => boolean false
      'showQuantityUncertaintyMargin' => boolean true
      'applyRounding' => boolean true
      'applyUnit' => boolean true
/vagrant/mediawiki/extensions/Math/src/MathWikidataHook.php:76:string 'text/plain' (length=10)

Looking into this some more - this hook does seem to run on mobile AND desktop and styles are only added on the first run, however only sometimes does it get cached.

Experimenting with several pages, adding styles and scripts at this point in the code seems incorrect and inappropriate given the global usage of wgOut given it doesn't relate to rendering.

Summary

If an article page is connected to a Wikidata entry which has an inception field AND entity:formatPropertyValues( 'P569' ) is invoked inside a Lua module the ext.math.styles module is unexpectedly pulled into the mobile view.

Acceptance criteria

  • ext.math.styles should not be loaded on mobile for a page which does not have Math

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Jdlrobson renamed this task from CSS size in production fluctuates to ext.math.styles is sometimes loaded when not needed - causing CSS size in production to fluctuate.Aug 29 2017, 3:51 PM
Jdlrobson updated the task description. (Show Details)
Jdlrobson edited projects, added Web-Team-Backlog (Tracking); removed Web-Team-Backlog.
Jdlrobson updated the task description. (Show Details)
Jdlrobson added a project: MediaWiki-Parser.

I'm a little concerned that this might be due to a parser cache getting dirty - so tagging MediaWiki:Parser given this stylesheet can be added via parser hook.

Jdlrobson renamed this task from ext.math.styles is sometimes loaded when not needed - causing CSS size in production to fluctuate to ext.math.styles is sometimes loaded when not needed - causing CSS size in production to fluctuate - parser cache pollution?.Jun 14 2018, 11:09 PM
Jdlrobson raised the priority of this task from High to Unbreak Now!.
Jdlrobson updated the task description. (Show Details)
Jdlrobson updated the task description. (Show Details)

This is still happening leading to unnecessary render blocking styles.
I've added a few tags and raised priority to get more visibility for this issue as it's been a problem for nearly a year now.
I think it is pretty serious and worthy of investigation to rule out any issues with parser cache pollution. We can drop the priority once we have a better idea of the seriousness of this.

greg lowered the priority of this task from Unbreak Now! to High.Jun 14 2018, 11:35 PM
greg subscribed.

I've added a few tags and raised priority to get more visibility for this issue as it's been a problem for nearly a year now.

UBN! is just for things which are breaking production in drastic ways and not a general means of getting wider attention (though this did get mine because I watch all UBN!s).

Who, exactly, needs to see this and take action?

I feel that it is a serious problem in production as it might be a symptom of a much larger problem but it's not clear who should look into this. I'd appreciate someone who understands the parser to check if there's anything obviously wrong in the Math extension or whether this is a problem in our caching strategy. @brion / @Krinkle do you know who might be the right person to talk to about this?

@Jdlrobson I am maintaining the code in the Math extension. It there anything with the code within the math extension which is unclear?

@Physikerwelt it's how the hooks work that confuse me. The Barack Obama article on enwiki uses neither the chem or the math parser functions but the ext.math.styles appear to be leaking into the page some how. How that's happening, I have no idea. Do you have any theories?

Another potential source of caching problems is the setting dependent cache key https://github.com/wikimedia/mediawiki-extensions-Math/blob/df1689c7177d5b7ef946ae2ff525594bd377da3f/Math.hooks.php#L133-L136 depending on the user setting for math rendering different versions of the page are served. I am happy about any suggestions, how the code could be improved.

@Jdlrobson from my understanding what happens during hook registration code is that this code is only called if the parser sees a math,ce, or chem tag in the template expanded wikitext. I could imagine that during template expansion math tags might appear under certain circumstances. But this is really a wild guess.

@Jdlrobson It does indeed appear to be a bug somewhere relating to cache pollution, in particular it seems a cache key somewhere is being shared between mobile and desktop that shouldn't be.

I can reproduce the following in Incognito mode (with disabled cache):

  1. (Desktop) View https://en.wikipedia.org/wiki/Barack_Obama, check Dev Tools/Network/CSS. Observe main stylesheet does not include ext.math.styles.
  2. (Mobile) View https://en.m.wikipedia.org/wiki/Barack_Obama, observe the main stylesheet includes ext.math.styles.
  3. (Mobile purge) https://en.m.wikipedia.org/wiki/Barack_Obama?action=purge, confirm. Then reload https://en.m.wikipedia.org/wiki/Barack_Obama, observe the main stylesheet still includes ext.math.styles.
  4. (Desktop) View https://en.wikipedia.org/wiki/Barack_Obama, observe the main stylesheet now also includes ext.math.styles.
  5. (Desktop purge) View https://en.wikipedia.org/wiki/Barack_Obama?action=purge, confirm. Reload and confirm it is no longer included on desktop. (Mobile) View https://en.m.wikipedia.org/wiki/Barack_Obama, observe the main stylesheet does not include ext.math.styles.
Point 1Point 2Point 3
cap1-desktop.png (1×2 px, 786 KB)
cap3-mobile.png (1×2 px, 384 KB)
cap2-mobile.png (1×2 px, 388 KB)
Point 4Point 5
cap5-mobile.png (1×2 px, 392 KB)
cap4-desktop.png (1×2 px, 725 KB)

So it seems that whenever the last purge came from mobile, the module is included, whenever the last purge came from desktop, it is (correctly) not included.

I'd recommend trying to reproduce it locally, and on Beta Cluster, and on different production wikis. That would help narrow it down somewhat.

Jdlrobson renamed this task from ext.math.styles is sometimes loaded when not needed - causing CSS size in production to fluctuate - parser cache pollution? to Lua module Module:Internet_Archive is causing parser cache pollution - ext.math.styles is sometimes loaded.Jun 18 2018, 10:26 PM
Jdlrobson updated the task description. (Show Details)
Jdlrobson added a project: wikiba.se website.

Apologies for all the notes/spam, but after a few hours looking at this I've managed to narrow this down to an[[ https://en.m.wikipedia.beta.wmflabs.org/wiki/Module:Internet_Archive | Internet Archive Lua module ]] which is hitting Wikidata

Can someone (maybe @Krinkle ) confirm they can replicate this on https://en.m.wikipedia.beta.wmflabs.org/wiki/T173949 per my instructions and that they are seeing the same problem?

I suspect something is wrong in the Wikidata layer or the Math layer. I'm hoping with the detailed instructions I've added we'll be able to work out what's happening here. It worries me a Lua template can cause cache pollution. Thus I'm guessing the Wikibase team would be most appropriate to look into this - since they should have a good handle on how this can happen.

@Physikerwelt any idea why a call to formatPropertyValues in a Lua template would load the Maths modules as a side effect?

Change 441508 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/extensions/Math@master] formatter-factory-callback should not load math styles

https://gerrit.wikimedia.org/r/441508

^ @Physikerwelt can you take a look at this patch? This should remedy the problem with Math styles showing up where they shouldn't be. It looks like this impacts any page using the scribunto formatPropertyValues function. I'm pretty sure we can also remove the addModuleStyles calls above as well, but this seems to be the bare minimum. Let me know if you have any concerns and I'll investigate other alternatives, but I'd like to get this out on the next train given it seems to be impacting pages across the site.

A short term fix would also be to set $wgMathEnableWikibaseDataType = false on mobile.

@Jdlrobson The patch you are proposing would prevent math from being properly displayed if its used in a lua module. See https://en.wikipedia.org/wiki/Module:ShowMath for an example. I would not consider this as a bug. I currently don't understand why it's only sometimes (for instance on mobile) loaded and sometimes not (for instance on desktop).

@Physikerwelt can you explain to me why https://en.wikipedia.beta.wmflabs.org/wiki/Module:T173949 causes the loading of unnecessary Math styles/scripts on desktop/mobileeven though they are not needed to render the page?

The fact desktop and mobile parser output is not matching to me suggests there is probably parser cache pollution leaking between the two and some codepath is disabled in mobile that shouldn't be. Using globals in MediaWiki inside hooks always seems to cause this problem so is best avoided.

@Jdlrobson I agree that global should be avoided. Any other solution is highly appreciated, I am just unaware of one. However, whenever math is displayed the associated CSS and js classes should be loaded. I think a good solution is that the math extension follows the standard procedure for all tag extensions that require custom CSS and js for their content. Is there a good example of such an extension?

So from what i understand https://github.com/wikimedia/mediawiki-extensions-Math/blob/master/src/MathHooks.php#L202 uses the parser hook but there are other ways to add math to a page? If that's understood I think I've got this.

@Jdlrobson yes. The (only) other way I am aware of is using properties of type mathematical formula from wikidata.

I have a patch up. With a few minor adjustments this should get fixed.

Change 442725 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/extensions/Math@master] Remove unnecessary addModule and addModuleStyle calls

https://gerrit.wikimedia.org/r/442725

Change 442738 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[operations/mediawiki-config@master] Limit wgMathEnableWikibaseDataType to wikidata

https://gerrit.wikimedia.org/r/442738

I think a good solution is that the Math extension follows the standard procedure for all tag extensions that require custom CSS and js for their content. Is there a good example of such an extension?

Yes.

And also, Math itself: MathHooks.php#mathTagHook.

And also, Math itself: MathHooks.php#mathTagHook.

This is why I think https://gerrit.wikimedia.org/r/#/c/442725/is the right solution here. Please please please review? :)

@Jdlrobson I am currently traveling until Wednesday.... From my understanding the problem is not wikidata but wikibase. Thus the styles need to be added somewhere using a wikibase hook.

Thanks @Physikerwelt!

From my understanding the problem is not wikidata but wikibase

If this is the case i suspect we can disable the config flag on wikipedia's:
https://gerrit.wikimedia.org/r/442738
I'll check in with you Wednesday if there is any issues with swatting that change.

Change 443941 had a related patch set uploaded (by Physikerwelt; owner: Physikerwelt):
[mediawiki/extensions/Math@master] Add math related styles and script via appropriate hook

https://gerrit.wikimedia.org/r/443941

Change 447621 had a related patch set uploaded (by Physikerwelt; owner: Physikerwelt):
[mediawiki/extensions/Wikibase@master] Load MathDataUpdater to format math

https://gerrit.wikimedia.org/r/447621

Change 443941 merged by jenkins-bot:
[mediawiki/extensions/Math@master] Add math related styles and script via appropriate hook

https://gerrit.wikimedia.org/r/443941

Change 447621 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Load MathDataUpdater to format math

https://gerrit.wikimedia.org/r/447621

Change 442725 abandoned by Physikerwelt:
Remove unnecessary addModule and addModuleStyle calls

Reason:
this commit is a subset of I0e24bbb53e6e01d549f534744780ca1afc49fdd7

https://gerrit.wikimedia.org/r/442725

Change 450978 had a related patch set uploaded (by WMDE-leszek; owner: WMDE-leszek):
[mediawiki/extensions/Wikibase@master] Fixed the use of MathDataUpdater in parser output generation

https://gerrit.wikimedia.org/r/450978

Merging https://gerrit.wikimedia.org/r/447621 was apparently a mistake, as it caused errors on each item page, when Math extension is enabled, like to be noticed on beta (see below)

mathzonk.png (711×1 px, 134 KB)

https://gerrit.wikimedia.org/r/450978 fixes it

Change 450978 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Fixed the use of MathDataUpdater in parser output generation

https://gerrit.wikimedia.org/r/450978

@WMDE-leszek thanks for fixing. However, I wonder how this could be merged without failing tests. Right after the change was merged, all subsequent unit tests failed. Why did I not see the error message when I tested the change on Jul 24th?

However, I wonder how this could be merged without failing tests.

That's the very good point, and I was wondering the same. It seems that Wikibase quibble/phpunit jobs do not pull in Math extension, so that code path was never reached. Will try to get this fixed (although it might, or might not be as easy to fix as it sounds).
Should I had run tests locally _with_ Math extension enabled, and not assumed CI does it (without double checking), I've had spotted the issue initially.

Why did I not see the error message when I tested the change on Jul 24th?

This I sadly cannot explain. Changes in Wikibase that resulted in the new code being incorrect are actually relatively recent, namely have been merged on July 2nd. Maybe you had a bit dated Wikibase copy or something like this?

see T201496 about adjusting Wikibase CI to include Math extension somehow

@WMDE-leszek thank you. Yes. My local wikibase copy was not up to date when I created the change (cf. Patch set 1).

Change 442738 abandoned by Jdlrobson:
Limit wgMathEnableWikibaseDataType to wikidata

Reason:
Doesn't look like this will be needed anymore

https://gerrit.wikimedia.org/r/442738

Change 441508 abandoned by Jdlrobson:
Add Math styles and scripts in BeforePageDisplay hook

https://gerrit.wikimedia.org/r/441508

I can't seem to replicate the issue anymore on the beta cluster. Hopefully graphs will stabilise from a week tomorrow (16th)! Yay! Thanks all!

The latest Barack Obama CSS size graph:

grafana.wikimedia.org_dashboard_db_mobile-2g_panelId=90&fullscreen&orgId=1.png (1×2 px, 409 KB)

Jdlrobson claimed this task.

Thanks all for looking into this. Looking at last 90 days, I'm seeing a smooth line from the 10th (before that the lines were extremely bumpy and inconsistent).
Given your changes went out on the 7th they would have hit enwiki on the 10th (UTC)
So it looks like your patches were responsible and you've resolved this issue. This means clients will be caching CSS more consistently and hopefully rendering faster in some occasions as a result. Thanks a bunch! 👏👏👏

Screen Shot 2018-08-17 at 11.37.05 AM.png (355×1 px, 86 KB)