MWException: Error contacting the Parsoid/RESTBase server (HTTP 404) from DiscussionTools (on open wikis) – permalinks unavailable for some edits
Closed, Resolved (Public)

Description

Continuing from T315383, where we fixed a much more common error similar to this one.

https://logstash.wikimedia.org/goto/060f147c37afdacfb93f7e709df06fbe

I don't know what's up with this, but it is certainly a different issue. Maybe we should think some more about reading from outdated replicas (my first debugging idea from yesterday, T315383#8159951), or maybe the job tried updating a page that was deleted in the meantime. Either way, in my opinion this doesn't block the train.

Event Timeline

Many of the errors mention Special:Undelete (on enwiki), where a RevisionArchiveRecord is passed to our code; perhaps something in that code path is wrong.

Others are just page saves and jobs for various pages on various wikis, with no pattern that I can see.

Krinkle renamed this task from MWException: Error contacting the Parsoid/RESTBase server (HTTP 404) from DiscussionTools (on non-closed wikis) to MWException: Error contacting the Parsoid/RESTBase server (HTTP 404) from DiscussionTools (on open wikis).Aug 31 2022, 11:20 PM
Krinkle triaged this task as High priority.
Krinkle added a subscriber: Krinkle.

Raising priority, as this is a recent appearance and occurs in relatively high volume compared to other prod errors. Had it been found earlier, it would likely have been proposed as a train blocker. Please feel free to reach out to different teams for help as needed.

In rEVEDf2df5dc7b98a (Improve error messages for RESTBase errors) I added more details to these errors. We have three different kinds so far:

Error contacting the Parsoid/RESTBase server (HTTP 404): Page was deleted
Error contacting the Parsoid/RESTBase server (HTTP 404): Requested resource is not found.
Error contacting the Parsoid/RESTBase server (HTTP 404): Requested page does not exist.

I'll read some code to figure out what they really mean, and then either reach out to Content-Transform-Team with a more specific problem, or (if it looks beyond repair) add ParsoidOutputAccess as a fallback to RESTBase and see if that helps. I was going to use ParsoidOutputAccess for this code in the first place, before learning that it has some unspecified caching deficiencies and is not suitable for production use. However, this error only occurs ~50 times per day, so caching shouldn't be needed for the fallback.

Error contacting the Parsoid/RESTBase server (HTTP 404): Page was deleted

I think this error is wrong, and the page was not in fact deleted. RESTBase will return this when the REST API [1] reports that the page exists, and the action API [2] reports that it doesn't. This can also occur when the action API reads from a stale replica, and that's probably what is happening. I stopped short of trying to figure out why it's calling two APIs.

[1] https://en.wikipedia.org/api/rest_v1/page/title/The_Fighting_Temeraire
[2] https://en.wikipedia.org/w/api.php?format=json&action=query&prop=info%7Crevisions&continue=&rvprop=ids%7Ctimestamp%7Cuser%7Cuserid%7Csize%7Csha1%7Ccomment%7Ctags&titles=The%20Fighting%20Temeraire

Ref: https://github.com/wikimedia/restbase/blob/ecef17bda6f4efc0d6e187fb05b1eeb389bf7120/lib/mwUtil.js#L224
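To make the suspected failure mode concrete, here is a minimal Python sketch of how two lookups that disagree could produce a spurious "Page was deleted". All names here (`Store`, `check_page`) are hypothetical and this is not RESTBase's actual code (that is JavaScript, see the Ref above). The sketch only assumes that the first lookup sees fresh data while the second reads from a replica that hasn't caught up yet.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Store:
    """Hypothetical stand-in for one view of the data: title -> latest revision ID."""
    pages: Dict[str, int] = field(default_factory=dict)

    def lookup(self, title: str) -> Optional[int]:
        return self.pages.get(title)


def check_page(title: str, first: Store, second: Store) -> str:
    """Two-step existence check where each step may read a different view."""
    if first.lookup(title) is None:
        return "not found"
    # The second lookup may hit a replica that lags behind the first source.
    if second.lookup(title) is None:
        # The sources disagree, so the page is reported as deleted,
        # even though it may simply not have replicated yet.
        return "Page was deleted"
    return "ok"


fresh = Store({"Draft talk:Example": 1001})   # made-up title and revision ID
stale_replica = Store()                       # has not received the new page yet
caught_up_replica = Store({"Draft talk:Example": 1001})

print(check_page("Draft talk:Example", fresh, stale_replica))
print(check_page("Draft talk:Example", fresh, caught_up_replica))
```

With the stale replica the check reports "Page was deleted" for a page that exists; once the replica has caught up, the same call returns "ok".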

Error contacting the Parsoid/RESTBase server (HTTP 404): Requested resource is not found.

RESTBase will return this when the action API reports that the page doesn't exist. Probably reading from a stale replica.

Ref: https://github.com/wikimedia/restbase/blob/ecef17bda6f4efc0d6e187fb05b1eeb389bf7120/sys/action.js#L148

Error contacting the Parsoid/RESTBase server (HTTP 404): Requested page does not exist.

RESTBase will return this when… its normalized title is not exactly the same as MediaWiki's normalized title? I'm as surprised as you are. The error message makes no sense, but I guess we should add logging to find out what those titles are; this could be a problem for other tools…

Ref: https://github.com/wikimedia/restbase/blob/ecef17bda6f4efc0d6e187fb05b1eeb389bf7120/sys/page_revisions.js#L396

matmarex renamed this task from MWException: Error contacting the Parsoid/RESTBase server (HTTP 404) from DiscussionTools (on open wikis) to MWException: Error contacting the Parsoid/RESTBase server (HTTP 404) from DiscussionTools (on open wikis) – permalinks unavailable for some edits.Sep 3 2022, 2:20 AM

Change 829255 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/DiscussionTools@master] Log page/revision IDs when the page/revision seems to be missing

https://gerrit.wikimedia.org/r/829255

Change 829255 merged by jenkins-bot:

[mediawiki/extensions/DiscussionTools@master] Log page/revision IDs when the page/revision seems to be missing

https://gerrit.wikimedia.org/r/829255

Change 843550 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/DiscussionTools@wmf/1.40.0-wmf.5] Log page/revision IDs when the page/revision seems to be missing

https://gerrit.wikimedia.org/r/843550

Change 843550 merged by jenkins-bot:

[mediawiki/extensions/DiscussionTools@wmf/1.40.0-wmf.5] Log page/revision IDs when the page/revision seems to be missing

https://gerrit.wikimedia.org/r/843550

Mentioned in SAL (#wikimedia-operations) [2022-10-17T20:18:11Z] <urbanecm@deploy1002> Finished scap: 6762292a4: e320d48c8: 6762292a4: DicsussionTools/WikimediaEvents backports (T315688, T315689, T320938) (duration: 04m 35s)

A few errors have been recorded by now with more detailed logging, here are the results:

| Error | Site | Link | Page title | Revision status |
| --- | --- | --- | --- | --- |
| Page was deleted - page 69448378, rev 1058978757 | en.wikipedia.org | link | Draft talk:Gavin P. Winston | exists |
| Page was deleted - page 68756907, rev 1086763317 | en.wikipedia.org | link | Draft talk:List of tropical cyclones with auxiliary names | exists |
| Page was deleted - page 70576634, rev 1083627975 | en.wikipedia.org | link | Draft talk:Hasnat Azam | exists |
| Page was deleted - page 61477668, rev 924911274 | en.wikipedia.org | link | Draft talk:Mikey Tanha | exists |
| Page was deleted - page 66680312, rev 1010450811 | en.wikipedia.org | link | Draft talk:Éditions Syllepse | exists |
| Page was deleted - page 66632383, rev 1009624166 | en.wikipedia.org | link | Draft talk:Jean-Pierre Galland | exists |
| Requested page does not exist. - page 37014257, rev 1117035655 | en.wikipedia.org | link | Talk:Marley Rose (Glee) | exists |
| Page was deleted - page 36084406, rev 798771841 | en.wikipedia.org | link | Template talk:Major League Baseball Umpires navbox | exists |
| Requested page does not exist. - page 72049485, rev 1117016508 | en.wikipedia.org | link | Bad title | MISSING |
| Requested resource is not found. - page 71732317, rev 1117006934 | en.wikipedia.org | link | Talk:2022 Men's South American Cricket Championship | exists |
| Page was deleted - page 72015943, rev 1116304199 | en.wikipedia.org | link | Talk:You Energy Volley | exists |
| Page was deleted - page 71230848, rev 1097258004 | en.wikipedia.org | link | Talk:Abdallah Abu Sheikh | exists |
| (no message) - page 949, rev 8663 | bn.wikiquote.org | link | উইকিউক্তি:নারীবাণী | exists |
| Requested page does not exist. - page 59958918, rev 1116976305 | en.wikipedia.org | link | Talk:Prespa agreement | exists |
| Page was deleted - page 66931834, rev 1017449165 | en.wikipedia.org | link | Draft talk:Gujarati Asmita | exists |
| Page was deleted - page 60593940, rev 979055373 | en.wikipedia.org | link | Draft talk:Maloney Properties | exists |
| Page was deleted - page 67937724, rev 1028366970 | en.wikipedia.org | link | Draft talk:CN Logistics | exists |
| Page was deleted - page 5234380, rev 55763445 | ar.wikipedia.org | link | نقاش:باب اللوق | exists |
| Page was deleted - page 51772242, rev 741477183 | en.wikipedia.org | link | Bad title | MISSING |
| Page was deleted - page 51772401, rev 741478152 | en.wikipedia.org | link | Bad title | MISSING |
| Page was deleted - page 54855886, rev 795062007 | en.wikipedia.org | link | Bad title | MISSING |
| Page was deleted - page 70145684, rev 1073657433 | en.wikipedia.org | link | Draft talk:The Whips | exists |
| Page was deleted - page 1598661, rev 102853541 | de.wikipedia.org | link | Benutzer Diskussion:OCAD Team/OCAD (Software) | exists |
| Requested resource is not found. - page 4460519, rev 1116878719 | en.wikipedia.org | link | Wikipedia:Articles for deletion/Log/2006 March 21 | exists |
| Page was deleted - page 69624053, rev 1068334904 | en.wikipedia.org | link | Draft talk:List of south Asian converts to Islam | exists |
| Requested resource is not found. - page 72025082, rev 1116865602 | en.wikipedia.org | link | Wikipedia:Articles for deletion/Loyola Hall (Seattle University) | exists |
| Page was deleted - page 69409699, rev 1065080097 | en.wikipedia.org | link | Draft talk:Involve.me | exists |
| Page was deleted - page 70561864, rev 1083242804 | en.wikipedia.org | link | Draft talk:David T. Warner | exists |
| Page was deleted - page 27591247, rev 687875274 | en.wikipedia.org | link | Category talk:Apple Inc. articles needing attention only to referencing and citation | exists |
| Page was deleted - page 1215271, rev 1116757195 | en.wikipedia.org | link | For What It's Worth | exists |
| Requested page does not exist. - page 48004281, rev 1116774516 | en.wikipedia.org | link | Talk:On Down the Line (album) | exists |
| Page was deleted - page 70529570, rev 1082448442 | en.wikipedia.org | link | Draft talk:Kanazawa Marathon | exists |
| Requested resource is not found. - page 19758235, rev 69204434 | vi.wikipedia.org | link | Thảo luận Wikipedia:Bạn có biết/2022/Tuần 42 | exists |
| Requested resource is not found. - page 72025096, rev 1116755844 | en.wikipedia.org | link | Talk:List of notable English palindromic phrases | exists |
| Page was deleted - page 69474350, rev 1116734999 | en.wikipedia.org | link | User talk:Anamul Haque Nayeem | exists |
| Requested resource is not found. - page 60797419, rev 1116747284 | en.wikipedia.org | link | User talk:Khonda8 | exists |
| Page was deleted - page 70548443, rev 1082919659 | en.wikipedia.org | link | Draft talk:Future tennessee titans stadium | exists |
| Requested resource is not found. - page 17930914, rev 1116715570 | en.wikipedia.org | link | Wikipedia:Articles for deletion/Log/2008 June 14 | exists |
| Requested resource is not found. - page 11022716, rev 1116713326 | en.wikipedia.org | link | Wikipedia:Usernames for administrator attention | exists |
| Requested resource is not found. - page 13147785, rev 1116712628 | en.wikipedia.org | link | Wikipedia:Articles for deletion/Log/2007 September 8 | exists |
| Page was deleted - page 71801047, rev 1111613061 | en.wikipedia.org | link | Talk:Namrita Malla | exists |
| Page was deleted - page 70624289, rev 1084576037 | en.wikipedia.org | link | Talk:2022-23 NCAA Division I men's basketball rankings | exists |
| Requested resource is not found. - page 68538791, rev 1116700321 | en.wikipedia.org | link | Wikipedia:Contributor copyright investigations/HinduKshatrana | exists |
| Requested page does not exist. - page 69879720, rev 1116693567 | en.wikipedia.org | link | Talk:Batman and Superman: Battle of the Super Sons | exists |

Reviewing this list:

  • Almost all occurrences are obviously bogus, as the revision and page exist. It seems that all of these pages were recently created, moved or undeleted. This is consistent with my guess that something is just reading stale data.
  • The 4 revisions that don't exist were (in 3 cases) undeleted and then deleted again, or (in 1 case) deleted while reverting a page move. They existed at the time of the error.
  • The "Requested page does not exist" errors, which I thought were caused by inconsistent title normalization, occurred on pages that were recently moved, so it looks like this was also something reading stale data and comparing two different titles.
  • There's one title on the list in the main namespace ("For What It's Worth"), because the page was subjected to a (reverted) page move to the talk namespace.
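
The "comparing two different titles" case from the list above can be sketched as follows. This is only an illustration under assumptions: `normalize` and `check_title` are hypothetical helpers with a deliberately simplified normalization, the titles are made up, and none of this is RESTBase's real logic.

```python
def normalize(title: str) -> str:
    """Simplified, hypothetical normalization: underscores to spaces, trimmed."""
    return title.replace("_", " ").strip()


def check_title(requested: str, title_on_record: str) -> str:
    """Compare the requested title with the title that another (possibly
    stale) source has on record for the same page."""
    if normalize(requested) != normalize(title_on_record):
        return "Requested page does not exist."
    return "ok"


# Made-up example: the page was just moved from "Talk:Old name" to "Talk:New name".
fresh_title = "Talk:New_name"
stale_title = "Talk:Old name"  # what a lagging source still reports

print(check_title(fresh_title, stale_title))      # mismatch caused by staleness
print(check_title(fresh_title, "Talk:New name"))  # both sources agree
```

The point is that the normalization itself can be perfectly consistent and the comparison still fails, because the two sides are looking at the page before and after the move.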

I think the solution is to just retry without RESTBase when the error occurs, like in T315689. The issue only occurs a couple dozen times per day, so this should be okay to do.
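
Roughly, the retry idea looks like the Python sketch below. It is not the actual DiscussionTools patch; `PageNotFound`, `fetch_with_fallback` and the two fetcher callables are hypothetical names, and the sketch assumes the fallback path reads data that is fresh enough to find the revision.

```python
from typing import Callable


class PageNotFound(Exception):
    """Hypothetical error standing in for an HTTP 404 from the primary fetcher."""


def fetch_with_fallback(
    rev_id: int,
    fetch_primary: Callable[[int], str],
    fetch_fallback: Callable[[int], str],
    log: Callable[[str], None] = print,
) -> str:
    """Try the primary source first; on a 'missing' error, retry once via the fallback."""
    try:
        return fetch_primary(rev_id)
    except PageNotFound as e:
        # Log enough detail to investigate later, then take the slower path.
        log(f"primary fetch failed for rev {rev_id}: {e}; retrying via fallback")
        return fetch_fallback(rev_id)


def flaky_primary(rev_id: int) -> str:
    # Simulates the spurious 404 described in this task.
    raise PageNotFound("Page was deleted")


def direct_parse(rev_id: int) -> str:
    # Simulates parsing the revision directly, without the cache layer.
    return f"<html for rev {rev_id}>"


print(fetch_with_fallback(12345, flaky_primary, direct_parse))
```

Since the error is rare, the extra cost of the uncached fallback only applies to a handful of requests, and only errors of the "seems to be missing" kind trigger it; anything else still propagates.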

Change 844574 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/DiscussionTools@master] Retry without RESTBase when the page/revision seems to be missing

https://gerrit.wikimedia.org/r/844574

Change 844574 merged by jenkins-bot:

[mediawiki/extensions/DiscussionTools@master] Retry without RESTBase when the page/revision seems to be missing

https://gerrit.wikimedia.org/r/844574

Change 848391 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/DiscussionTools@wmf/1.40.0-wmf.6] Retry without RESTBase when the page/revision seems to be missing

https://gerrit.wikimedia.org/r/848391

Change 848391 merged by jenkins-bot:

[mediawiki/extensions/DiscussionTools@wmf/1.40.0-wmf.6] Retry without RESTBase when the page/revision seems to be missing

https://gerrit.wikimedia.org/r/848391

Mentioned in SAL (#wikimedia-operations) [2022-10-24T20:21:05Z] <urbanecm@deploy1002> Started scap: Backport for [[gerrit:848390|Allow 'nofollow' on external links in Parsoid output (T321437)]], [[gerrit:848391|Retry without RESTBase when the page/revision seems to be missing (T315688)]]

Mentioned in SAL (#wikimedia-operations) [2022-10-24T20:21:24Z] <urbanecm@deploy1002> urbanecm and matmarex: Backport for [[gerrit:848390|Allow 'nofollow' on external links in Parsoid output (T321437)]], [[gerrit:848391|Retry without RESTBase when the page/revision seems to be missing (T315688)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2022-10-24T20:27:44Z] <urbanecm@deploy1002> Finished scap: Backport for [[gerrit:848390|Allow 'nofollow' on external links in Parsoid output (T321437)]], [[gerrit:848391|Retry without RESTBase when the page/revision seems to be missing (T315688)]] (duration: 06m 38s)