
page_props quickly gets out of sync with parser output
Open, High, Public

Description

When a graph uses an external data request whose URL is built with time-based wiki markup, the graph works for a while but later stops loading.

During page parse, the parser generates the graph JSON, which includes an external URL with dates such as 2016-03-21-05 -- 2016-04-20-05:

wikirest://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia.org/all-access/user/All%20Writs%20Act/daily/2016032105/2016042005

The hash of that graph is recorded in the HTML output, and the graph spec itself is stored in the page_props table.

Later, one of two things happens: either something regenerates the page, creates a new hash, and starts serving it without updating the page_props table, or the page_props table is updated without the cache changing. When I looked at the broken graph URL, the hash it was using corresponded to the date range 2016-03-21-09 -- 2016-04-20-09, but page_props contained the graph for 2016-03-21-05 -- 2016-04-20-05. So it seems page_props was stored with the earlier-generated value, and the Varnish/parser cache was generated with the later one (4 hours later).
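
A minimal sketch of the suspected mismatch, to make the failure mode concrete. It is not the Graph extension's actual code: the spec layout, the use of SHA-1 over sorted JSON, and modelling page_props as a dict keyed by hash are all assumptions made for illustration. Only the URL and the two date ranges come from this report.

```python
import hashlib
import json

# URL prefix taken from the report; the date suffix varies per parse.
BASE_URL = (
    "wikirest://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
    "en.wikipedia.org/all-access/user/All%20Writs%20Act/daily/"
)


def graph_spec(start: str, end: str) -> dict:
    """Hypothetical, heavily simplified graph spec with a time-based URL."""
    return {"data": [{"url": f"{BASE_URL}{start}/{end}"}]}


def spec_hash(spec: dict) -> str:
    """Assumed hashing scheme (SHA-1 of canonical JSON); the real one may differ."""
    canonical = json.dumps(spec, sort_keys=True).encode("utf-8")
    return hashlib.sha1(canonical).hexdigest()


# First parse (hour 05): the spec is stored in page_props.
first = graph_spec("2016032105", "2016042005")
page_props = {spec_hash(first): first}

# Later re-parse (hour 09): only the cached HTML picks up the new hash.
second = graph_spec("2016032109", "2016042009")
html_hash = spec_hash(second)

# The hash embedded in the served HTML no longer matches anything stored.
lookup = page_props.get(html_hash)
print("spec found for served hash:", lookup is not None)
```

Under these assumptions, any re-parse that crosses an hour boundary without a matching page_props update leaves the served hash pointing at nothing.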

Event Timeline

Yurik renamed this task from Graph's time-based external data quickly becomes invalid (e.g. pageviews graph) to page_props quickly gets out of sync with parser output. Apr 24 2016, 2:25 PM
Yurik added a subscriber: aaron.
Yurik triaged this task as High priority. Apr 24 2016, 11:11 PM

?action=purge will cause the page to be re-parsed and parser cache updated. However, unless &forcelinkupdate=1 is also used, the various links tables and page_props are not updated.

I've never really been a fan of using page props to store the graph info because, IIRC, it means viewing old revisions doesn't work.
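
For reference, the purge behaviour described above could be exercised through the action API roughly as follows. This is a sketch that only builds the request body and sends nothing. The endpoint and page title are assumptions based on the URL in the description, and `purge_params` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed endpoint for the wiki in question; adjust for other wikis.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def purge_params(title: str, force_link_update: bool) -> dict:
    """Hypothetical helper: build parameters for an action=purge request."""
    params = {"action": "purge", "titles": title, "format": "json"}
    if force_link_update:
        # Without this flag, links tables and page_props are left untouched.
        params["forcelinkupdate"] = "1"
    return params


# Body for a POST to API_ENDPOINT that also refreshes page_props.
body = urlencode(purge_params("All Writs Act", force_link_update=True))
print(body)
```

A plain purge (with `force_link_update=False`) would refresh only the parser cache, which is the situation that leaves page_props stale.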