Description
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | ssastry | T229015 Tracking: Direct live production traffic at Parsoid/PHP
Resolved | | ssastry | T235902 Tracking: Shadow Parsoid/PHP deployment to production cluster to handle mirrored reparse traffic
Resolved | | Pchelolo | T236836 RESTBase warning missing Content-Language or Vary header in pb.body.html.headers
Event Timeline
Hmm, this was implemented in https://github.com/wikimedia/parsoid/commit/96ae83f3806b85e42d235777cba3e21947c14406
https://github.com/wikimedia/parsoid/blob/master/src/Parsoid.php#L121-L134
https://github.com/wikimedia/parsoid/blob/master/src/PageBundle.php#L73-L77
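For reference, a minimal sketch of the kind of behaviour the linked lines suggest (hypothetical names, not the actual Parsoid code): make sure the pagebundle's HTML headers always carry content-language and vary, so RESTBase has nothing to warn about.

```php
<?php
// Hypothetical sketch, not the actual Parsoid implementation: ensure the
// pagebundle HTML headers include content-language and vary before the
// bundle is handed back to RESTBase.
function ensurePageBundleHeaders( array $headers, string $contentLanguage ): array {
	if ( !isset( $headers['content-language'] ) ) {
		$headers['content-language'] = $contentLanguage;
	}
	if ( !isset( $headers['vary'] ) ) {
		$headers['vary'] = 'Accept';
	}
	return $headers;
}
```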
Maybe it has something to do with the hack added in T236382#5603851?
Looking at the log, there are some oddities:
api_path /bcl.wikipedia.org/v1/page/html/{title}
but
root_req.uri /en.wikipedia.org/v1/page/html/User%3ABSitzmann_%28WMF%29%2FMCS%2FTest%2FFrankenstein
I imagine this would hit this validation exception:
https://github.com/wikimedia/parsoid/blob/master/extension/src/Rest/Handler/ParsoidHandler.php#L114-L141
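Roughly, the mismatch I have in mind would trip a check along these lines (a sketch only; the function name and message are assumptions, not the actual ParsoidHandler code): a request whose path domain doesn't match the wiki the handler is serving gets rejected before rendering.

```php
<?php
// Sketch of a domain-mismatch validation (assumed names/messages, not the
// real ParsoidHandler code): reject requests whose path domain does not
// match the wiki the handler is configured for.
function assertRequestDomain( string $pathDomain, string $wikiDomain ): void {
	if ( $pathDomain !== $wikiDomain ) {
		// In the real handler this would surface as an HTTP 4xx response.
		throw new InvalidArgumentException(
			"Invalid domain: got '$pathDomain', expected '$wikiDomain'"
		);
	}
}

// e.g. assertRequestDomain( 'bcl.wikipedia.org', 'en.wikipedia.org' ) would throw.
```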
The user agent is:
root_req.headers.user-agent ServiceChecker-WMF/0.1.2
Not sure what that's doing.
From scandium, if I run
curl -H "Host: en.wikipedia.org" https://parsoid-php.discovery.wmnet/w/rest.php/en.wikipedia.org/v3/page/pagebundle/User%3ABSitzmann_%28WMF%29%2FMCS%2FTest%2FFrankenstein/860183819
I get:
{"contentmodel":"","html":{"headers":{"content-type":"text/html; charset=utf-8; profile=\"https://www.mediawiki.org/wiki/Specs/HTML/2.1.0\"","content-language":"en","vary":"Accept"},"body":"<!DOCTYPE html>\n<html prefix=\"dc: http://purl.org/dc/terms/ mw: http://mediawiki.org/rdf/\" about=\"https://en.wikipedia.org/wiki/Special:Redirect/revision/860183819\"> ...
The api_path property in the log is probably incorrect; it's a long-standing RESTBase bug. I will have a look at this from the RESTBase perspective tomorrow.
Hm... Somehow the errors were mostly coming from service-checker requests, and rerendering the page has completely fixed the problem. Maybe some bogus data was stored during the deploy while the cluster was in a transitional configuration. I'll close this for now and reopen if it happens again.