Content service incorrectly reports article as "deleted"
Closed, ResolvedPublic

Description

For the article on Harold Harefoot, the content service is returning an error, saying that the page was deleted, while the page does in fact exist, and there's no indication that anything happened to it recently.

https://en.wikipedia.org/api/rest_v1/page/mobile-sections-lead/Harold_Harefoot
https://en.wikipedia.org/api/rest_v1/page/summary/Harold_Harefoot

Event Timeline

Dbrant created this task.Nov 29 2017, 3:25 PM
Restricted Application added a subscriber: Aklapper.
Mholloway added a comment.EditedDec 8 2017, 4:24 PM

The linked responses look OK now. Did you happen to save a copy of the error JSON somewhere?

I guess it's possible that the service could have been temporarily affected by some of the Cassandra reshaping work being undertaken by the Services team, though I would guess we'd have heard about it if service disruptions were possible.

Dbrant closed this task as Resolved.Dec 12 2017, 6:02 PM
Dbrant claimed this task.

Sounds good; will continue to keep an eye out.

Dbrant reopened this task as Open.Feb 17 2018, 12:12 AM

Received another report of a page being incorrectly shown as "deleted":
https://en.wikipedia.org/api/rest_v1/page/mobile-sections/Public_choice

bearND edited projects, added Parsoid, Services; removed Page Content Service.EditedFeb 17 2018, 1:39 AM
bearND added a subscriber: bearND.

That's coming from the Parsoid response served by RESTBase: https://en.wikipedia.org/api/rest_v1/page/html/Public_choice

{
  "type": "https://mediawiki.org/wiki/HyperSwitch/errors/not_found#page_revisions",
  "title": "Not found.",
  "method": "get",
  "detail": "Page was deleted",
  "uri": "/en.wikipedia.org/v1/page/html/Public_choice"
}
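
A client that wants to tell this specific failure apart from other 404s could inspect the error body. The following is only a minimal sketch: the function name `is_page_deleted_error` is made up for illustration, and it assumes the `type` and `detail` fields keep the values shown in the response above.

```python
import json

# Error body copied from the response quoted above.
ERROR_BODY = """
{
  "type": "https://mediawiki.org/wiki/HyperSwitch/errors/not_found#page_revisions",
  "title": "Not found.",
  "method": "get",
  "detail": "Page was deleted",
  "uri": "/en.wikipedia.org/v1/page/html/Public_choice"
}
"""


def is_page_deleted_error(body: str) -> bool:
    """Return True if `body` is a not_found error that reports a deleted page.

    Hypothetical helper: it only matches the two fields seen in this task.
    """
    try:
        err = json.loads(body)
    except json.JSONDecodeError:
        # Not JSON at all (for example, a normal HTML response).
        return False
    if not isinstance(err, dict):
        return False
    error_type = str(err.get("type", ""))
    return (
        error_type.endswith("errors/not_found#page_revisions")
        and err.get("detail") == "Page was deleted"
    )
```

Matching on both fields keeps unrelated 404s (for example, a title that never existed) from being counted as this issue.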

Ah, fun issue. Thanks Dmitry.

That is strange. I can parse the page locally on my laptop and also on the eqiad cluster

ssastry@wtp1025:/srv/deployment/parsoid/deploy/src$ bin/parse.js --pageName 'Public choice' < /dev/null |& head -1
<!DOCTYPE html>

@Pchelolo @mobrovac do you know what is going on? I even purged the page and still get the same error from restbase.

Pchelolo edited projects, added Services (doing); removed Services.Feb 20 2018, 3:27 PM

This is indeed a RESTBase issue. A bit of background context:

For performance reasons, to avoid checking with the MediaWiki API on every request whether the page exists or whether the revision has been restricted, we maintain a local table where we store the state of the page. That table is updated when the page's Parsoid HTML needs to be rerendered. If, on a rerender request, the MW API reports that the page does not exist, we record that the page was deleted starting from a particular revision. When the page is undeleted, a new rerender request comes in, and if the MW API reports that the page does exist, we record that as well, removing the page_deleted metadata property.

What happens here is that if, on a page edit, we hit a lagging MySQL slave for the info about the new page revision, the slave reports that the specified revision does not exist and we wrongly record that the page was deleted. ChangeProp does retry page edits on 404 with a significant delay, so the lagging slaves should catch up and the situation should normalize. However, a no-cache request for a "deleted" page no longer triggers a rerender and just returns a 404 right away. That is a RESTBase bug that should be fixed.
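
To illustrate the failure mode, here is a toy model in Python. It is not RESTBase code: the classes `LaggingReplicaApi` and `PageStateTable` and their methods are hypothetical, and the table is reduced to a dict from title to the revision at which the page was marked deleted.

```python
class LaggingReplicaApi:
    """Hypothetical stand-in for the MW API answering from a lagging replica."""

    def __init__(self) -> None:
        # (title, revision) pairs the replica already knows about.
        self.visible_revisions = set()

    def revision_exists(self, title: str, rev: int) -> bool:
        return (title, rev) in self.visible_revisions


class PageStateTable:
    """Simplified local state table: title -> revision marked as deleted."""

    def __init__(self, api: LaggingReplicaApi) -> None:
        self.api = api
        self.page_deleted = {}

    def rerender(self, title: str, rev: int) -> None:
        if self.api.revision_exists(title, rev):
            # Page exists: clear any stored deletion marker.
            self.page_deleted.pop(title, None)
        else:
            # The API says the revision is missing: record a deletion.
            self.page_deleted[title] = rev

    def get_html_status(self, title: str) -> int:
        # Reads are answered from the stored state, without asking the API.
        return 404 if title in self.page_deleted else 200


api = LaggingReplicaApi()
api.visible_revisions.add(("Public_choice", 1))
table = PageStateTable(api)

table.rerender("Public_choice", 1)
status_before_edit = table.get_html_status("Public_choice")

# An edit creates revision 2, but the replica has not caught up yet,
# so the rerender wrongly records the page as deleted.
table.rerender("Public_choice", 2)
status_after_lagging_edit = table.get_html_status("Public_choice")
```

In this model the page goes from 200 to 404 even though it was never deleted, which matches the symptom reported in this task.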

Pchelolo added a comment.EditedFeb 21 2018, 7:45 PM

This should fix it: https://github.com/wikimedia/restbase/pull/958

With this, change-prop will fire the retries after a delay as no-cache requests for the same page, which will eventually fix the incorrect deleted status of the page, compensating for the replication lag.
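
The effect of the change can be sketched with a small simulation. This is an assumption-laden model, not the code from the PR: `StubApi`, `PageState`, and the `check_deletion_on_update` flag are invented here to contrast the behaviour before and after the change.

```python
class StubApi:
    """Hypothetical MW API stub; `caught_up` models the replica state."""

    def __init__(self) -> None:
        self.caught_up = False

    def revision_exists(self, title: str, rev: int) -> bool:
        return self.caught_up


class PageState:
    def __init__(self, api: StubApi, check_deletion_on_update: bool) -> None:
        self.api = api
        # True models the old behaviour, False the behaviour after the fix.
        self.check_deletion_on_update = check_deletion_on_update
        self.page_deleted = {}

    def get_html(self, title: str, rev: int, no_cache: bool = False) -> int:
        is_update = no_cache
        if title in self.page_deleted and (
            self.check_deletion_on_update or not is_update
        ):
            return 404  # answered from the stored "deleted" state
        if not is_update:
            return 200  # normal read served from storage
        # Update request: ask the API again and refresh the stored state.
        if self.api.revision_exists(title, rev):
            self.page_deleted.pop(title, None)
            return 200
        self.page_deleted[title] = rev
        return 404


def retry_after_lag(check_deletion_on_update: bool) -> int:
    """Status of a normal read after a lagged update and one delayed retry."""
    api = StubApi()
    state = PageState(api, check_deletion_on_update)
    state.get_html("Public_choice", 2, no_cache=True)  # replica is lagging
    api.caught_up = True
    state.get_html("Public_choice", 2, no_cache=True)  # delayed retry
    return state.get_html("Public_choice", 2)
```

With `check_deletion_on_update=True` the retry is stopped by the stored marker and the page stays at 404; with `False` the retry reaches the API and clears the marker.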

Mentioned in SAL (#wikimedia-operations) [2018-02-21T21:14:34Z] <ppchelko@tin> Started deploy [restbase/deploy@56fffcf]: Do not check for article deletion for update requests T181636

Pchelolo closed this task as Resolved.Feb 21 2018, 9:25 PM
Pchelolo edited projects, added Services (done); removed Services (doing).

The above PR has been deployed, and I've tested that no-cache requests for HTML now update the deleted state of the page. I've checked the cases listed here; the rest will gradually get cleaned up as the articles are edited and rerendered. Resolving.

Mentioned in SAL (#wikimedia-operations) [2018-02-21T21:30:32Z] <ppchelko@tin> Finished deploy [restbase/deploy@56fffcf]: Do not check for article deletion for update requests T181636 (duration: 15m 59s)

@Pchelolo Do you think this could be the case of T184556, as well?

I couldn't reproduce that particular one, and it included a move, so it may have been different, but I could imagine something similar happening there. Since I couldn't reproduce it, I can't be sure.