meta property="dc:modified" may be absent
Closed, Resolved, Public

Description

lastmodified is a required page lead property. However, not all pages have the dc:modified metadata header that we rely on for this, which is causing 500s in production.

This seems to be a new class of error since yesterday's Parsoid deployment (the header was apparently so reliably present that we weren't even bothering to check for its absence), but it might just be an artifact of everything being regenerated in RESTBase.

Questions:

  1. Does this indicate an upstream problem?
  2. What's the correct way to handle the absence of dc:modified? Should the lastmodified property be made optional, or should we introduce something like a new unknown value?

Example:

https://en.wikipedia.org/api/rest_v1/page/html/Haptic_suit

[2017-12-13T13:45:38.902Z] ERROR: mobileapps/152 on scb2001: 500: internal_error (message="500: internal_error", status=500, type=internal_error, detail="Cannot read property 'getAttribute' of undefined", levelPath=error/500, request_id=4380cc84-dfed-11e7-b043-499e7786d95b)
  stack: TypeError: Cannot read property 'getAttribute' of undefined
      at getModified (/srv/deployment/mobileapps/deploy-cache/revs/5832a8cbd5e65e241117376e75b6fff978bf5448/src/lib/parsoid-access.js:79:68)
      at Request.getParsoidHtml.then (/srv/deployment/mobileapps/deploy-cache/revs/5832a8cbd5e65e241117376e75b6fff978bf5448/src/lib/parsoid-access.js:121:33)
      at Request.tryCatcher (/srv/deployment/mobileapps/deploy-cache/revs/5832a8cbd5e65e241117376e75b6fff978bf5448/node_modules/bluebird/js/release/util.js:16:23)
      at Promise._settlePromiseFromHandler (/srv/deployment/mobileapps/deploy-cache/revs/5832a8cbd5e65e241117376e75b6fff978bf5448/node_modules/bluebird/js/release/promise.js:512:31)
      at Promise._settlePromise (/srv/deployment/mobileapps/deploy-cache/revs/5832a8cbd5e65e241117376e75b6fff978bf5448/node_modules/bluebird/js/release/promise.js:569:18)
      at Promise._settlePromise0 (/srv/deployment/mobileapps/deploy-cache/revs/5832a8cbd5e65e241117376e75b6fff978bf5448/node_modules/bluebird/js/release/promise.js:614:10)
      at Promise._settlePromises (/srv/deployment/mobileapps/deploy-cache/revs/5832a8cbd5e65e241117376e75b6fff978bf5448/node_modules/bluebird/js/release/promise.js:693:18)
      at Promise._fulfill (/srv/deployment/mobileapps/deploy-cache/revs/5832a8cbd5e65e241117376e75b6fff978bf5448/node_modules/bluebird/js/release/promise.js:638:18)
      at Promise._resolveCallback (/srv/deployment/mobileapps/deploy-cache/revs/5832a8cbd5e65e241117376e75b6fff978bf5448/node_modules/bluebird/js/release/promise.js:432:57)
      at Promise._settlePromiseFromHandler (/srv/deployment/mobileapps/deploy-cache/revs/5832a8cbd5e65e241117376e75b6fff978bf5448/node_modules/bluebird/js/release/promise.js:524:17)
      at Promise._settlePromise (/srv/deployment/mobileapps/deploy-cache/revs/5832a8cbd5e65e241117376e75b6fff978bf5448/node_modules/bluebird/js/release/promise.js:569:18)
      at Promise._settlePromise0 (/srv/deployment/mobileapps/deploy-cache/revs/5832a8cbd5e65e241117376e75b6fff978bf5448/node_modules/bluebird/js/release/promise.js:614:10)
      at Promise._settlePromises (/srv/deployment/mobileapps/deploy-cache/revs/5832a8cbd5e65e241117376e75b6fff978bf5448/node_modules/bluebird/js/release/promise.js:693:18)
      at Promise._fulfill (/srv/deployment/mobileapps/deploy-cache/revs/5832a8cbd5e65e241117376e75b6fff978bf5448/node_modules/bluebird/js/release/promise.js:638:18)
      at Request._callback (/srv/deployment/mobileapps/deploy-cache/revs/5832a8cbd5e65e241117376e75b6fff978bf5448/node_modules/bluebird/js/release/nodeback.js:45:21)
      at Request.self.callback (/srv/deployment/mobileapps/deploy-cache/revs/5832a8cbd5e65e241117376e75b6fff978bf5448/node_modules/request/request.js:186:22)
  --
  request: {
    "url": "/en.wikipedia.org/v1/page/mobile-sections/Haptic_vest",
    "headers": {
      "x-request-id": "4380cc84-dfed-11e7-b043-499e7786d95b",
      "content-length": "0"
    },
    "method": "GET",
    "params": {
      "0": "/en.wikipedia.org/v1/page/mobile-sections/Haptic_vest"
    },
    "query": {},
    "remoteAddress": "10.192.16.185",
    "remotePort": 49560
  }
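
For illustration, a minimal sketch of a guard against the missing element. The TypeError above is consistent with calling getAttribute on the first match of an empty result. The selector and the choice to return undefined are assumptions, not the actual code in parsoid-access.js, and `doc` is assumed to be any DOM Document that exposes querySelector().

```javascript
'use strict';

/**
 * Hypothetical defensive variant of getModified (not the MCS implementation).
 * Returns the dc:modified value, or undefined when the meta element is absent.
 * @param {Document} doc any DOM document exposing querySelector()
 */
function getModified(doc) {
    // Assumed selector: Parsoid emits <meta property="dc:modified" content="..."/> in <head>.
    const meta = doc.querySelector('head > meta[property="dc:modified"]');
    // Check the element before touching it, so an absent tag no longer throws.
    const value = meta ? meta.getAttribute('content') : null;
    return value || undefined;
}
```

Whether the caller should then omit lastmodified or substitute an "unknown" value is the open question 2 above; this sketch only avoids the 500.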

Event Timeline

Question for Parsoid: is there any reason we might be seeing a significant number of pages without <meta property="dc:modified"/> tags after yesterday's deployment?

Mholloway renamed this task from Lastmodified may be unavailable to meta property="dc:modified" may be absent.Dec 13 2017, 2:21 PM

Actually, there seems to be a bigger problem here. /page/html is returning only sections of the desired page.

https://en.wikipedia.org/api/rest_v1/page/html/Haptic_suit (returning only section 'Teslasuit (2015)')

vs

https://en.wikipedia.org/wiki/Haptic_suit

The pages whose /page/html output is truncated like this seem to be the same ones that are missing dc:modified.

I added a few example page titles from the logs to the paste P6456.

Briefly, Parsoid only has that meta info for the head when it fetches the page source itself to parse. This seems to indicate that requests posting wikitext are being stored as the latest render.

requests posting wikitext are being stored as the latest render.

Stored or cached. Refining this based on:

https://en.wikipedia.org/api/rest_v1/page/html/Leonard_Cohen
https://en.wikipedia.org/api/rest_v1/page/html/Leonard_Cohen/

It's starting to look like whatever was most recently parsed for a title is what gets returned when the page is requested without specifying a revision.

So, I guess one theory is:
(a) someone (likely google-crawler) posts wikitext against a title -- but this wikitext is partial fragments of the page. RB takes the parsed HTML and stores it against the title maybe?
(b) when HTML is requested without posting a revision, cached / stored content is returned.
(c) but when HTML is requested with a specific revision, that revision is parsed if content for it doesn't exist and because of (a), the default content for the title is also updated

See https://gerrit.wikimedia.org/r/#/c/398122/. Turns out the value was actually not used by MCS later. So, we just acted as a canary for RB issues here.

So, we just acted as a canary for RB issues here.

Good thing! :)

Mentioned in SAL (#wikimedia-operations) [2017-12-13T21:20:08Z] <ppchelko@tin> Started deploy [restbase/deploy@a993556]: Do not fallback if the revision is not specified T182770

Mentioned in SAL (#wikimedia-operations) [2017-12-13T21:24:11Z] <ppchelko@tin> Finished deploy [restbase/deploy@a993556]: Do not fallback if the revision is not specified T182770 (duration: 04m 04s)

mobrovac edited projects, added Services (done); removed Services.
mobrovac subscribed.

It turns out that this was a bug in RESTBase whereby we would serve stashed content to clients if the revision asked for was not in current storage, but stashed content was. PR #932 fixes this by requiring the revision to be specified in order to use the fall-back mechanism. No new reports or log entries have emerged pertaining to this, so resolving.
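
As a rough illustration of the behaviour described above, here is a toy in-memory model. It is not RESTBase's code: the `stored` and `stash` maps, the key scheme and the `allowFallbackWithoutRevision` flag are all hypothetical names introduced only to show the rule "fall back to the stash only when a revision is specified".

```javascript
'use strict';

/**
 * Toy lookup over two hypothetical maps: `stored` (regular renders) and
 * `stash` (transient renders, e.g. of posted wikitext).
 * @param {{stored: Map<string, string>, stash: Map<string, string>}} store
 * @param {string} title
 * @param {number|undefined} revision
 * @param {boolean} allowFallbackWithoutRevision true models the old behaviour
 * @return {string|undefined} HTML, or undefined when a fresh render is needed
 */
function getHtml(store, title, revision, allowFallbackWithoutRevision) {
    // Assumed key scheme: "Title" for the latest render, "Title/rev" for a specific one.
    const key = revision === undefined ? title : `${title}/${revision}`;
    if (store.stored.has(key)) {
        return store.stored.get(key);
    }
    // The fix as described: only consult the stash when a revision was given.
    const mayFallBack = revision !== undefined || allowFallbackWithoutRevision;
    if (mayFallBack && store.stash.has(key)) {
        return store.stash.get(key);
    }
    return undefined;
}

// Usage: a stashed fragment exists for the title, but no stored render does.
const demo = {
    stored: new Map(),
    stash: new Map([['Haptic_suit', '<section>Teslasuit (2015)</section>']])
};
const before = getHtml(demo, 'Haptic_suit', undefined, true);  // stashed fragment leaks out
const after = getHtml(demo, 'Haptic_suit', undefined, false);  // no fallback, re-render instead
```

In this model the old behaviour hands the stashed fragment to a client that asked for the page without a revision, which matches the partial /page/html responses reported earlier in the task.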

This has been fixed on RESTBase side. @bearND unless you want to make some fixes on MCS side as well, I think we can close this.

mobrovac assigned this task to Pchelolo.

Eh, I did say in my previous post:

[...] No new reports or log entries have emerged pertaining to this, so resolving.

But forgot to actually resolve it. Doing so now.