Page MenuHomePhabricator

graphoid fails if page_props is out of sync with parser cache, or on old revisions of a page
Open, LowPublic

Description

I just tried to put a template on [[Commons:VP]] (via a template). Graphoid responds with 400 bad request for the url used ( https://graphoid.wikimedia.org/commons.wikimedia.org/v1/png/Commons%3AVillage_pump/160551952/72802411fc5e55de5868dc35e32bf32e733732bf.png )

Event Timeline

Bawolff created this task.May 13 2015, 5:51 AM
Bawolff raised the priority of this task from to Needs Triage.
Bawolff updated the task description. (Show Details)
Bawolff added a project: Graphoid.
Bawolff added subscribers: Bawolff, Yurik.
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptMay 13 2015, 5:51 AM

Hmm, null edit actually fixed it.

Bawolff renamed this task from Graphoid giving 400 bad request to Graphs invoked by template make graphoid give 400 bad request if template edited (until links update?).May 13 2015, 5:56 AM
Bawolff set Security to None.

For example https://commons.wikimedia.org/w/index.php?title=Commons:Sandbox&oldid=160554557 has a graph on it that has a hash that will change on every render. So everytime the parser cache is updated, the hash changes, but the page_props table only is updated on links update, which will result in invalid images most of the time.

This may be a bit of a contrived example, but I imagine there are other situations (especially with templates), where the requirement of having page_props will fail a small percentage of the time.

I'm also concerned a little bit about old page versions. In order for the current scheme to work, every graph must be viewed on a page when it is the current revision, and the image must never be thrown away. Once the revision is no longer current, graphoid can no longer regenerate the image for the graph on that revision.

Bawolff renamed this task from Graphs invoked by template make graphoid give 400 bad request if template edited (until links update?) to graphoid fails if page_props is out of sync with parser cache, or on old revisions of a page.May 13 2015, 6:06 AM

Strange bug - trying to repo it locally. In theory, it should not fail because the generation of hash and storage of the {hash->definition} mapping in page props happens at the same time, so it really shouldn't matter if the definition would expand to something else afterwards - it will be stored as expanded in pageprops under the same id as in image URL.

It might be an API call that is being made by something right before saving:

action=stashedit & format=json & title=GT & section= & sectiontitle= & text={{#tag:graph|\r\n{\r\n  "name": "arc",... & contentmodel=wikitext & contentformat=text/x-wiki & baserevid=139 & token=4fff98xxxxxxxxx5335bf+\

Which seems to go through all the steps of generating pageprops. Right afterwards, another call is made, this time from index.php. I suspect that pageprops DB table is update by the first API call, and later by the second, non-api call. In my local vagrant, I see the URL generated with the second hash, but graphoid fails to pull the data because the pageprops does not contain requested hash yet. Thus, showing broken image icon. Afterwards, if i manually force-refresh the icon image, it works.

The solution would be to figure out who is making action=stashedit and either handle it properly (detect in graph ext and avoid saving to db), or not call the api.

The callstacks for both calls:
api.php call - hash = 6b3401628c0503262a8bfc675522e8abf7517ea9:

#0 /vagrant/mediawiki/extensions/Graph/Graph.body.php(29): Graph\Singleton::finalizeParserOutput(Object(ParserOutput), false)
#1 [internal function]: Graph\Singleton::onParserAfterParse(Object(Parser), 'UNIQefce86db58...', Object(StripState))
#2 /vagrant/mediawiki/includes/Hooks.php(209): call_user_func_array('Graph\Singleton...', Array)
#3 /vagrant/mediawiki/includes/parser/Parser.php(435): Hooks::run('ParserAfterPars...', Array)
#4 /vagrant/mediawiki/includes/content/WikitextContent.php(333): Parser->parse('{{#tag:graph|?{...', Object(Title), Object(ParserOptions), true, true, NULL)
#5 /vagrant/mediawiki/includes/content/AbstractContent.php(497): WikitextContent->fillParserOutput(Object(Title), NULL, Object(ParserOptions), true, Object(ParserOutput))
#6 /vagrant/mediawiki/includes/page/WikiPage.php(2133): AbstractContent->getParserOutput(Object(Title), NULL, Object(ParserOptions))
#7 /vagrant/mediawiki/includes/api/ApiStashEdit.php(138): WikiPage->prepareContentForEdit(Object(WikitextContent), NULL, Object(User), 'text/x-wiki', false)
#8 /vagrant/mediawiki/includes/api/ApiStashEdit.php(119): ApiStashEdit::parseAndStash(Object(WikiPage), Object(WikitextContent), Object(User))
#9 /vagrant/mediawiki/includes/api/ApiMain.php(1098): ApiStashEdit->execute()
#10 /vagrant/mediawiki/includes/api/ApiMain.php(436): ApiMain->executeAction()
#11 /vagrant/mediawiki/includes/api/ApiMain.php(409): ApiMain->executeActionWithErrorHandling()
#12 /vagrant/mediawiki/api.php(90): ApiMain->execute()
#13 /var/www/w/api.php(5): require('/vagrant/mediaw...')
#14 {main}

index.php call - hash = 90d320cc3de48beb8b71fa1bd6e24e2566bde4ea (this hash is what shows up in the resulting HTML)

#0 /vagrant/mediawiki/extensions/Graph/Graph.body.php(29): Graph\Singleton::finalizeParserOutput(Object(ParserOutput), false)
#1 [internal function]: Graph\Singleton::onParserAfterParse(Object(Parser), 'UNIQ66024ffa4f...', Object(StripState))
#2 /vagrant/mediawiki/includes/Hooks.php(209): call_user_func_array('Graph\Singleton...', Array)
#3 /vagrant/mediawiki/includes/parser/Parser.php(435): Hooks::run('ParserAfterPars...', Array)
#4 /vagrant/mediawiki/includes/content/WikitextContent.php(333): Parser->parse('{{#tag:graph|?{...', Object(Title), Object(ParserOptions), true, true, 140)
#5 /vagrant/mediawiki/includes/content/AbstractContent.php(497): WikitextContent->fillParserOutput(Object(Title), 140, Object(ParserOptions), true, Object(ParserOutput))
#6 /vagrant/mediawiki/includes/poolcounter/PoolWorkArticleView.php(140): AbstractContent->getParserOutput(Object(Title), 140, Object(ParserOptions))
#7 /vagrant/mediawiki/includes/poolcounter/PoolCounterWork.php(123): PoolWorkArticleView->doWork()
#8 /vagrant/mediawiki/includes/page/Article.php(674): PoolCounterWork->execute()
#9 /vagrant/mediawiki/includes/actions/ViewAction.php(44): Article->view()
#10 /vagrant/mediawiki/includes/MediaWiki.php(413): ViewAction->show()
#11 /vagrant/mediawiki/includes/MediaWiki.php(291): MediaWiki->performAction(Object(Article), Object(Title))
#12 /vagrant/mediawiki/includes/MediaWiki.php(583): MediaWiki->performRequest()
#13 /vagrant/mediawiki/includes/MediaWiki.php(432): MediaWiki->main()
#14 /vagrant/mediawiki/index.php(46): MediaWiki->run()
#15 /var/www/w/index.php(5): require('/vagrant/mediaw...')
#16 {main}
Yurik triaged this task as Low priority.May 13 2015, 6:24 PM
Yurik added a subscriber: Anomie.

Spoke with @Anomie about this, apparently its a known issue of consistency, similar to T20478, T59256, and T86492

Spoke with @Anomie about this, apparently its a known issue of consistency, similar to T20478, T59256, and T86492

Further to that, the solution I suggested is to not be using page_props for this, since its behavior doesn't match with the needs of graphoid. Hash => definition mappings should be stored in some other way that can be updated immediately as the hash is generated and expired on its own schedule, instead of being tied to LinksUpdate jobs run on current revisions of saved pages.

Full discussion can be found in http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-dev/20150513.txt from about 13:06:09 to 14:33:04.

Change 530767 had a related patch set uploaded (by Catrope; owner: Catrope):
[mediawiki/extensions/Graph@master] Stop storing gzipped JSON blobs in page_props

https://gerrit.wikimedia.org/r/530767

This patch doesn't fully address this issue, but it makes the API module pull the graph data from the parser cache (where it was already being stored) rather than from the page_props table. However, this still has the API module using the ParserCache entry for the canonical ParserOptions, and it's possible to construct pages (most of them pathological examples, but some real-world ones) that have graph JSON that vary on the ParserOptions. So we might still want a way to access hash->definition mappings for hashes that appear in a non-canonical rendering of the latest revision of a page. Perhaps we could encode the ParserOptions in the Graphoid URL, or have some other way of identifying which parser cache entry to use.

It would also be nice to be able to use the FlaggedRevs-specific parser cache entries for stable versions, to solve the equivalent of this issue when viewing stable versions (see also T192695).

Yurik added a comment.Aug 18 2019, 5:12 PM

@Catrope thanks for tackling it! I always thought parser cache is non-persisted, so if a page does not get any edits in 2 months, the relevant data might not be there?

@Catrope thanks for tackling it! I always thought parser cache is non-persisted, so if a page does not get any edits in 2 months, the relevant data might not be there?

This is true in the technical sense that if there's a cache miss during an API request for graph data, the page will be reparsed. However, that seems pretty unlikely, because Graphoid only sends these API requests in order to graph requests that come from parsed HTML, so if we're getting an API request we probably recently displayed the page and the parser cache should still be warm.

daniel added a subscriber: daniel.

tagging CPT for code review

Yurik added a comment.Aug 21 2019, 4:34 AM

However, that seems pretty unlikely, because Graphoid only sends these API requests in order to graph requests that come from parsed HTML, so if we're getting an API request we probably recently displayed the page and the parser cache should still be warm.

Would it be possible for the Varnish cache to still have the html page and the parser cache loose corresponding object?

Would it be possible for the Varnish cache to still have the html page and the parser cache loose corresponding object?

The parser cache losing the data is unlikely, but the parser cache having a new version and varnish having an old version will always happen right after an edit. Anon readers will hit the old version in varnish for a while. This is the case for the HTML output as well - in fact, it should be precisely in sync.

Change 530767 merged by jenkins-bot:
[mediawiki/extensions/Graph@master] Stop storing gzipped JSON blobs in page_props

https://gerrit.wikimedia.org/r/530767

WDoranWMF added a subscriber: WDoranWMF.

Untagging CPT as CR was completed by another team.