
Disable parsercache during roundtrip testing
Closed, ResolvedPublic

Description

Presumably since https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/804300, the results of parse requests can be returned from the parser cache.

But scandium is used for roundtrip (and other) testing of Parsoid, and we definitely want to bypass the cache on all requests for the results to be meaningful.

Event Timeline

Arlolra triaged this task as High priority (May 16 2023, 5:51 PM).
Arlolra moved this task from Needs Triage to Testing on the Parsoid board.

We could add a purge flag to the endpoint so that it behaves similarly to action=purge: it forces a fresh render, writes it to the cache, and returns it.
We'd probably want it to be rate limited in the same way as regular purging is. That should be simple enough.
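A minimal sketch of how such a purge flag could behave, assuming a hypothetical handler shape: `handleParse`, `FixedWindowLimiter`, the `purge` option and the `render` callback are illustrative names, not MediaWiki or Parsoid APIs, and the fixed-window limiter only stands in for whatever limiter regular purges actually use.

```typescript
// Hypothetical sketch: none of these names exist in MediaWiki or Parsoid.

// Fixed-window limiter: at most `limit` purges per user per `windowMs`.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  tryAcquire(user: string, now: number): boolean {
    let w = this.windows.get(user);
    if (w === undefined || now - w.start >= this.windowMs) {
      // First purge for this user, or the previous window has expired.
      w = { start: now, count: 0 };
      this.windows.set(user, w);
    }
    if (w.count >= this.limit) return false;
    w.count++;
    return true;
  }
}

type ParseResult =
  | { status: "ok"; html: string; fromCache: boolean }
  | { status: "rate-limited" };

// Like action=purge: a permitted purge forces a fresh render, writes it to
// the cache and returns it; otherwise a cached render is served if present.
function handleParse(
  title: string,
  opts: { purge: boolean; user: string; now: number },
  cache: Map<string, string>,
  limiter: FixedWindowLimiter,
  render: (title: string) => string,
): ParseResult {
  if (opts.purge) {
    if (!limiter.tryAcquire(opts.user, opts.now)) {
      return { status: "rate-limited" };
    }
  } else {
    const cached = cache.get(title);
    if (cached !== undefined) {
      return { status: "ok", html: cached, fromCache: true };
    }
  }
  const html = render(title);
  cache.set(title, html);
  return { status: "ok", html, fromCache: false };
}
```

Returning an explicit "rate-limited" result, rather than silently falling back to the cached render, means a test harness would notice when it did not actually get a fresh parse.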

Alternatively, we could use the transform endpoint to perform the rendering.

Could we set $wgParserCacheType = CACHE_NONE; on scandium?

I'm sure it could be hacked into the config, but I don't know if it would be a good idea.

So, it looks like we have the following options here:
(a) use the transform endpoint to force parsing,
(b) issue an ?action=purge before starting a test (and hope that we don't hit rate limits), or
(c) implement an internal-use-only purge header / flag for the endpoint.

(a) will require us to fetch the wikitext before the test, which adds to the test latency, but that is not the end of the world. It may add another hour or two to the test run.
(b) is going to be unreliable.
(c) is also something we could consider, but it requires new code to be written and deployed to production.
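Strategy (a) could look roughly like the following sketch, which only builds the two requests and performs no I/O. The raw-wikitext URL uses the standard `index.php?action=raw` interface; the transform route `/w/rest.php/{domain}/v3/transform/wikitext/to/html/{title}` and the `wikitext` body field are assumptions about the Parsoid REST API that should be checked against the routes actually deployed on scandium, and `RequestSpec`, `wikitextRequest` and `transformRequest` are hypothetical helpers.

```typescript
// Hypothetical sketch of strategy (a): fetch the wikitext, then have the
// transform endpoint render it, so the parser cache is never consulted.

interface RequestSpec {
  method: "GET" | "POST";
  url: string;
  headers?: Record<string, string>;
  body?: string;
}

// MediaWiki-style title for URLs: spaces become underscores, then encoded.
function urlTitle(title: string): string {
  return encodeURIComponent(title.replace(/ /g, "_"));
}

// Step 1: the page's current wikitext via index.php?action=raw.
function wikitextRequest(server: string, title: string): RequestSpec {
  return {
    method: "GET",
    url: `${server}/w/index.php?title=${urlTitle(title)}&action=raw`,
  };
}

// Step 2: POST that wikitext for rendering. The route and the `wikitext`
// body field are assumptions to check against the deployed REST routes.
// No oldid is sent, so the render is not tied to a stored revision.
function transformRequest(
  server: string,
  domain: string,
  title: string,
  wikitext: string,
): RequestSpec {
  return {
    method: "POST",
    url: `${server}/w/rest.php/${domain}/v3/transform/wikitext/to/html/${urlTitle(title)}`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ wikitext }),
  };
}
```

The extra GET per title in step 1 is where the added test latency of (a) comes from.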

Should we explore strategy (a) here?

Hmm ... that looks like a potential bug! We should test that locally.

As far as RT testing is concerned, the oldid is not set when making the wt2html request, so that's not an issue there:
https://github.com/wikimedia/mediawiki-services-parsoid/blob/master/bin/roundtrip-test.js#L825-L831

As a more general bug, though, if wikitext is provided, setContentSource is called on the HtmlOutputRendererHelper,
https://github.com/wikimedia/mediawiki/blob/master/includes/Rest/Handler/ParsoidHandler.php#L358-L360

which resets the revision to another MutableRevisionRecord, this time with an id of 0,
https://github.com/wikimedia/mediawiki/blob/master/includes/Rest/Handler/Helper/HtmlOutputRendererHelper.php#L316-L322

and that's what gets checked for caching purposes. That feels a little fragile. It seems like we should stop setting the id when creating a page config from supplied wikitext. The id is probably there to support the {{REVISIONID}} variable, but it seems fine to say that unless we're fetching the wikitext from the database ourselves, it isn't a specific revision. Oldid confusion has already come up in T333402#8804435.
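To make the fragility concrete, here is a deliberately simplified model, not MediaWiki's actual code: `ParseRequest`, `pageConfigRevId`, `helperRevId`, `usesParserCache` and `revIdMismatch` are hypothetical names, and the rule that the parser cache is used only for a nonzero revision id is an assumption standing in for the real checks. It contrasts the page config keeping the passed-in oldid (before the fix) with ignoring it when wikitext is supplied (after).

```typescript
// Hypothetical model: names and rules are illustrative, not MediaWiki code.

interface ParseRequest {
  oldid?: number; // revision id sent by the client, if any
  wikitext?: string; // wikitext sent in the request body, if any
}

// Revision id given to the page config. Before the fix the passed-in oldid
// was used even alongside wikitext; after it, supplied wikitext gets id 0.
function pageConfigRevId(req: ParseRequest, withFix: boolean): number {
  if (withFix && req.wikitext !== undefined) return 0;
  return req.oldid ?? 0;
}

// Revision id seen by the renderer helper: supplying wikitext triggers
// setContentSource, which swaps in a fresh revision record with id 0.
function helperRevId(req: ParseRequest): number {
  return req.wikitext !== undefined ? 0 : req.oldid ?? 0;
}

// Assumed rule: the parser cache is used only for a nonzero revision id,
// and it is the helper's revision that gets checked.
function usesParserCache(req: ParseRequest): boolean {
  return helperRevId(req) !== 0;
}

// True when the two layers disagree about which revision is being rendered.
function revIdMismatch(req: ParseRequest, withFix: boolean): boolean {
  return pageConfigRevId(req, withFix) !== helperRevId(req);
}
```

In this model the fix does not change whether the cache is used for supplied wikitext; it removes the mismatch, so cache avoidance no longer rests solely on the later reset in the helper.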

Change 966929 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/core@master] Don't set revision id if we're provided wikitext

https://gerrit.wikimedia.org/r/966929

Based on the past few comments, it's probably safe to say we aren't getting cached responses, so there's nothing to do here.

Change 966929 merged by jenkins-bot:

[mediawiki/core@master] Don't set passed in revid if we're provided wikitext

https://gerrit.wikimedia.org/r/966929