I noticed that the ApiParse module sets a number of ParserOptions options which are not represented in ParserOptions::optionsHash(), specifically enableLimitReport(), setIsPreview() and setIsSectionPreview(). It calls WikiPage::getParserOutput() with these options, which will cause the parser cache to be polluted with the specified options.
In https://gerrit.wikimedia.org/r/#/c/233344 , in addition to adding setTidy() to the list of modified options, I proposed rectifying the parser cache pollution by having ApiParse skip the parser cache when the relevant API module parameters are specified.
In code review, Brad pointed out that the pollution issue may wider than just ApiParse, and suggested having WikiPage::getParserOutput() validate the ParserOptions object. So I did a caller survey, which I will post below. It shows two callers that are polluting the parser cache: ApiParse and GeoData.
GeoData uses the ParserOptions constructor with no parameters. The ParserOptions constructor is intended for message parsing and the like and so does not call setTidy() and enableLimitReport(), and these options are not in optionsHash(). So GeoData pollutes the parser cache with untidied output.
Design notes and open questions:
- ParserOptions has a lot of options (~35), and more are frequently added. So validation of the ParserOptions object may need a more abstract concept of an "option" and the ability to iterate over them.
- When adding new options, adding validation should be simple and obvious, so that developers do not omit it.
- Is skipping the cache an appropriate response to GeoData's incorrect construction of ParserOptions? We could throw an exception instead. We could provide a separate validation interface, and callers could opt in to skipping the cache.
- Wikidata uses ParserOptions::addExtraKey() to avoid polluting the cache. How can this be reflected in validation? Should addExtraKey() be replaced with a method which tells us which options are to be excluded from validation?