Page MenuHomePhabricator

Curation toolbar fails to load occasionally for pages in the PageTriage queue
Closed, ResolvedPublicPRODUCTION ERROR

Description

Steps to replicate the issue (include links if applicable):

What happens?:
The curation toolbar fails to appear, whether on the right side of the window or in the "Tools" portlet (minimized). The JavaScript console says Exception in module-execute in module ext.pageTriage.views.toolbar: TypeError: mw.pageTriage.Article is not a constructor.

What should have happened instead?:
The curation toolbar appears. No console error.

Other information (browser name/version, screenshots, etc.):

  • Reported by @Nardog
  • If this is a regression, This patchset may be to blame. This is where I first spotted the bug.
  • Race condition?
    • It seems to occur less often if you're visiting the page for the first time than after reloading. This is further given credence by the fact disabling cache also seems to decrease the likelihood of the bug.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

I don't see any entries for this in Logstash (searching for +"mw.pageTriage.Article is not a constructor" in the client errors dashboard). If someone has reliable steps to reproduce, please document them here.

Idea from @Nardog:

I wonder if taking the code in modules/ext.pageTriage.util/models/ext.pageTriage.article.js (or all scripts in that dir for that matter) out of $(...) changes anything

Nardog renamed this task from TypeError: mw.pageTriage.Article is not a constructor to Curation toolbar fails to load in User namespace.Dec 13 2022, 6:38 PM
Nardog updated the task description. (Show Details)

Observations:

  • It can only mean that ext.pageTriage.views.toolbar/ToolbarView.js is run before ext.pageTriage.util/models/ext.pageTriage.article.js as those are the only construction of and assignment to mw.pageTriage.Article, respectively. I thought the fact the assignment is in $(...) (waiting for page load) might have to do with it (as quoted above), but the module ext.pageTriage.views.toolbar seems to get loaded only from ext.pageTriage.toolbarStartup/ext.pageTriage.toolbarStartup.js, which is also already in $(...), so I'm not so sure about this theory (unless the toolbar module is somehow loaded before toolbarStartup).
  • Even if the curation toolbar shows up, reloading the page almost always reproduces the bug. But if I disable cache, it shows up again! Which clearly points to a race condition concerning $.ready/DOMContentLoaded.
  • Why only in the User namespace? And only on unpatrolled pages?

I don't see any entries for this in Logstash (searching for +"mw.pageTriage.Article is not a constructor" in the client errors dashboard). If someone has reliable steps to reproduce, please document them here.

The error is caught by ResourceLoader, so it's not technically a client error. Does that still crop up in logstash?

MPGuy2824 renamed this task from Curation toolbar fails to load in User namespace to Curation toolbar fails to load occasionally for pages in the PageTriage queue.EditedDec 19 2022, 8:11 AM
MPGuy2824 subscribed.

I've recently got this error for a mainspace article as well. A page refresh took care of the problem. Given this, I've changed the title of the ticket.

@MPGuy2824 Do you have a link to the article? I can't seem to replicate it in mainspace.

@MPGuy2824 Do you have a link to the article? I can't seem to replicate it in mainspace.

Sorry, I don't recall the article, but i feel the issue occurs when there is a bandwidth-crunch on my end.

I don't see any entries for this in Logstash (searching for +"mw.pageTriage.Article is not a constructor" in the client errors dashboard). If someone has reliable steps to reproduce, please document them here.

The error is caught by ResourceLoader, so it's not technically a client error. Does that still crop up in logstash?

My understanding is that if an exception is thrown while executing JS, we are sending the error details to Logstash (cc @Jdlrobson).

Perhaps someone with logstash access should do a quick test. If some of our front end exceptions aren't making it into logstash due to a poorly placed try {} catch {} somewhere, that is not ideal and would be worth fixing.

I'm not seeing anything in the logs and the replication steps in the description don't work for me. The error should be logged provided the error is not being caught and handled in the code. If it's not logging feel free to open a bug against Instrument-ClientError

@MPGuy2824 Do you have a link to the article? I can't seem to replicate it in mainspace.

Sorry, I don't recall the article, but i feel the issue occurs when there is a bandwidth-crunch on my end.

Just got this right now for https://en.wikipedia.org/wiki/57_Tauri The error went away (and the toolbar loaded) after two refreshes.

@Chlod suggests this might be a ResourceLoader issue:

as for the other issue you mentioned (mw.pageTriage.Article is not a constructor), this might be a ResourceLoader dependency issue
though I haven't really gone deep into how ResourceLoader works and i might be wrong on that

Suggested fix:

check if the toolbar's entry on extension.json includes the Article class as a dependency
(as for why the tool works most of the time in spite of that, probably just another race condition)

ext.pageTriage.util (which ext.pageTriage.util/models/ext.pageTriage.article.js is in) is required by ext.pageTriage.views.toolbar, which is defined in Hooks.php, not in extension.json. Perhaps ext.pageTriage.toolbarStartup should require it too?

Novem_Linguae changed the subtype of this task from "Bug Report" to "Production Error".Mar 1 2023, 8:06 AM

The code looks fine to me.

ext.pageTriage gets defined in ext.pageTriage.init.

mw.pageTriage.Article is defined inside ext.pageTriage.article.js which is part of ext.pageTriage.util. This means:

  • anything using mw.pageTriage.Article should list ext.pageTriage.util as a dependency. ✔️
  • ext.pageTriage.util therefore should have a dependency on ext.pageTriage.init, as otherwise ext.pageTriage doesn't exist. ✔️

ext.pageTriage.views.toolbar lists which is causing the error lists ext.pageTriage.util as a dependency. So ext.pageTriage.Article is safer to use there. ✔️

ext.pageTriage.toolbarStartup references mw.pageTriage. That gets defined in ext.pageTriage.init. So dependencies also fine there ✔️

All the above breaks down if something else e.g. a gadget modifies ext.pageTriage. I couldn't find any evidence of that: https://global-search.toolforge.org/?q=mw%5C.pageTriage&regex=1&namespaces=2%2C8&title=

The fact it's not showing up in logstash is super suspicious. This might be an example of cache poisoning (that hopefully would be uncovered in T321394).

Mentioned again today at https://en.wikipedia.org/wiki/Wikipedia_talk:New_pages_patrol/Reviewers#NPP_toolbox

Appears to not be limited to namespace 2 (userspace).

May only happen with redirects.

@jsn.sherman, are there any server side logs we can check to debug this further? If not, maybe we could add some additional debugging code to the front end to give more verbose logging?

Change 917870 had a related patch set uploaded (by Chlod Alejandro; author: Chlod Alejandro):

[mediawiki/extensions/PageTriage@master] Add .util as dependency of .toolbarStartup

https://gerrit.wikimedia.org/r/917870

@Novem_Linguae, @Nardog, and @Jdlrobson: Thank you very much for the information provided here. They helped immensely in trying to hammer out the issue. The stack trace posted by @Skarmory also saved quite a lot of time.

Long story short: This was a race condition introduced by the paired usage of ResourceLoader and $(). The related patch filed above fixes the dependency issues without removing the $(...) calls (which would otherwise be useful in determining if the page is ready or not, which will save eventual headaches regarding page state when the code runs).

ResourceLoader loads sequentially unless dependencies have to be met. In that case, dependencies are loaded first. Since toolbarStartup doesn't depend on util, it loads first and doesn't wait for .util. In turn, it's $(document).ready(...) call (i.e., its shorthand, $(...)) gets registered first. This means, when the document is ready, ext.pageTriage.views.toolbar is loaded asynchronously. While this wouldn't be an issue if it did have to load, this is not the case if the request is cached. In this case, the code in ext.pageTriage.views.toolbar is executed before the $(...) of util has a chance to declare Article.

In summary:

  • Why isn't util loaded with ext.pageTriage.views.toolbar?
    • ResourceLoader thinks it's already ready, so it doesn't think it needs reloading.
  • Why is the $(...) of ext.pageTriage.toolbarStartup loaded before ext.pageTriage.util's?
    • Because toolbarStartup gets loaded by ResourceLoader first.
  • Why does this only sometimes happen?
    • When the request for ext.pageTriage.views.toolbar is cached, the response time is essentially 0ms and JavaScript will continue running the code. It might not (haven't confirmed this, as I don't want to dig into the internals of V8 or Gecko) yield to the event loop, and thus, the contents of ext.pageTriage.views.toolbar (and our problem child ext.pageTriage.views.toolbar/ToolbarView.js) runs immediately before the $(...) of ext.pageTriage.util has a chance to run (because these are run by order of registration).
    • This matches with the "if I disable cache, it shows up again!" Nardog said earlier, but doesn't match up with the refresh behavior. This is so far one of the only loose ends I have.

P47993 has more details. Lines 235 to 248 seems to detail the exact states that lead to replication. Hope this helps. I need a drink. 🤣

Change 917967 had a related patch set uploaded (by Jdlrobson; author: Jdlrobson):

[mediawiki/extensions/PageTriage@master] Use packageFiles

https://gerrit.wikimedia.org/r/917967

I'm not 100% convinced by the proposed solution and need to think about it some more.

I think the core issue here is this uses an outdated pattern for registering and sharing code between modules. ext.pageTriage.init defines mw.pageTriage as an empty object and it gets populated in other modules, meaning modules need to be wary of the order they are loaded and be careful not to undo modifications in other modules. It seems like the only public API for page triage is mw.pageTriage.actionQueue so there might be some benefit in restructuring this in a more logical way as in the patch https://gerrit.wikimedia.org/r/917967.

I'd definitely be much happier if we moved over to using require for everything (heck, the current structure was a nightmare to debug), but this seems like a bigger effort that would take some time to test to make sure we didn't break anything. If there's nothing wrong from the analysis I posted above, given how unstable the toolbar currently is (considering the amount of people this bug has appeared for currently — and that's just the reported ones since we don't seem to have anything in Logstash), I think a temporary solution just to have this fixed immediately would be beneficial for NPP editors. Working towards a structure like in 917967 can follow immediately after.

Thought, ultimately, I don't think it's my call since I'm not a patroller on enwiki. If a solution can wait for something like 917967 to be accomplished, then we can definitely work on that and I'll abandon my patch in favor of that one.

Thought, ultimately, I don't think it's my call since I'm not a patroller on enwiki. If a solution can wait for something like 917967 to be accomplished, then we can definitely work on that and I'll abandon my patch in favor of that one.

I agree with Chlod. If the smaller patch is ok, let's accept it first. At least it would be live by the Thursday of next week then. Since, jdl's patch is large, it might take time to get it to pass jenkins' tests and code reviewers.

I tested some happy paths in Chlod's patch to make sure it doesn't break anything, looks good. I couldn't duplicate the bug in the master branch by clicking around, but Chlod's notes in P47993 look spot on. He appears to have duplicated the bug by inserting a setTimeout(), made this patch, then verified that the bug is fixed. I note that there is a longer-term solution suggested in [[gerrit:917967]], but that it is not ready yet. I am not sure what made this bug suddenly reappear this week, but we've received 5 complaints about it on the [[WT:NPPR]] page, so I think it makes sense to merge this patch to fix the immediate problem, then continue refactoring PageTriage via [[gerrit:917967]]. Thanks to all for your work on this.

@TheresNoTime , are you doing any backports this week? Think that https://gerrit.wikimedia.org/r/917870 might be a good patch to backport? It's a bug fix for a bug affecting at least 5 patrollers. If not no worries.

Change 917870 merged by jenkins-bot:

[mediawiki/extensions/PageTriage@master] Add .util as dependency of .toolbarStartup

https://gerrit.wikimedia.org/r/917870

wikitech:Deployments shows a few windows for today and tomorrow. The afternoon backport window is already underway; best not to interfere with that. The next window is in around 7 hours, but I'll be asleep by then. In any case, if deployment hits an issue, a revert should suffice, should you wish to ride that window. There's another window tomorrow at 07:00 (UTC), in case you need me online at the time of deployment.

Due to how intermittent this bug is, it'd be hard to test this on mwdebug100x, unless you somehow manually cache the request which pulls the ext.pageTriage.views.toolbar module (which should trigger the bug, but it's coerced rather than natural). I'm wondering if the procedure in this case would be to proceed anyway or if we really had to do that part.

Change 918508 had a related patch set uploaded (by Jdlrobson; author: Jdlrobson):

[mediawiki/extensions/PageTriage@master] Use packageFiles

https://gerrit.wikimedia.org/r/918508

Sorry I think you misunderstood me here. I wasn't suggesting porting everything to packageFiles.

The problem with the patch that's just been merged, is you are changing the load order for no good reason and also loading additional code that's not needed by the library, and there's a small risk that could cause problems elsewhere, and I honestly don't think that's a good idea.

https://gerrit.wikimedia.org/r/c/mediawiki/extensions/PageTriage/+/917967 is the patch that will fix this properly and it's extremely small and far less risky IMO.

Change 918516 had a related patch set uploaded (by Jdlrobson; author: Jdlrobson):

[mediawiki/extensions/PageTriage@master] Remove unnecessary jQuery closure

https://gerrit.wikimedia.org/r/918516

The other approach to fixing this is to just remove the jQuery closures. There's absolutely zero reason to have them and it's likely a performance problem anyhow - as it's adding extra processing: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/PageTriage/+/918516

Change 918516 merged by jenkins-bot:

[mediawiki/extensions/PageTriage@master] Remove unnecessary jQuery closure

https://gerrit.wikimedia.org/r/918516

Apologies for the quick merge. Wish you would have dropped a minus one, I didn't realize the patch was risky. I guess never mind about the backport, let's slow things down and do it the right way.

Change 918531 had a related patch set uploaded (by Jdlrobson; author: Jdlrobson):

[mediawiki/extensions/PageTriage@wmf/1.41.0-wmf.8] Remove unnecessary jQuery closure

https://gerrit.wikimedia.org/r/918531

Yeh that's my bad. No need for apologies!

Change 918531 merged by jenkins-bot:

[mediawiki/extensions/PageTriage@wmf/1.41.0-wmf.8] Remove unnecessary jQuery closure

https://gerrit.wikimedia.org/r/918531

Mentioned in SAL (#wikimedia-operations) [2023-05-10T20:12:38Z] <cjming@deploy1002> Started scap: Backport for [[gerrit:918531|Remove unnecessary jQuery closure (T324913)]]

Mentioned in SAL (#wikimedia-operations) [2023-05-10T20:14:12Z] <cjming@deploy1002> cjming and jdlrobson: Backport for [[gerrit:918531|Remove unnecessary jQuery closure (T324913)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-05-10T20:21:40Z] <cjming@deploy1002> Finished scap: Backport for [[gerrit:918531|Remove unnecessary jQuery closure (T324913)]] (duration: 09m 02s)

Verified on https://test.wikipedia.org/wiki/Main_Page by running the following code mw.loader.using('ext.pageTriage.views.toolbar') and seeing the toolbar present. The fix will roll out to all wikis with the train (hopefully tomorrow). Thanks @Chlod for working out what was going on here.

@Jdlrobson No problem! Happy to have helped. Thanks for the better patch!

Change 917967 merged by jenkins-bot:

[mediawiki/extensions/PageTriage@master] Use packageFiles in ext.pageTriage.toolbarStartup

https://gerrit.wikimedia.org/r/917967

Jdlrobson claimed this task.

This should now be fixed in production everywhere. Please let me know if it's still happening.