Page MenuHomePhabricator

Wikibase\Repo\Content\EntityHandler should not override makeParserOptions()
Closed, ResolvedPublic

Description

While working on MCR, we noticed that having Content-specific ParserOptions doesn't make much sense there. We'd like to remove that.

Wikibase's EntityHandler/EntityContent seems to be the only model actually doing this, and the way it's doing it is 2/3 based on mistaken assumptions.

EntityHandler's current implementation of the method is

public function makeParserOptions( $context ) {
    if ( $context === 'canonical' ) {
        // There are no "canonical" ParserOptions for Wikibase,
        // as everything is User-language dependent
        $context = RequestContext::getMain();
    }

    $options = parent::makeParserOptions( $context );

    // The html representation of entities depends on the user language, so we
    // have to call ParserOptions::getUserLangObj to split the cache by user language.
    $options->getUserLangObj();

    // bump PARSER VERSION when making breaking changes to parser output (e.g. entity view).
    $options->addExtraKey( 'wb' . self::PARSER_VERSION );

    return $options;
}

This is making three major changes to what the base ContentHandler class does.

Overrides 'canonical' to be non-canonical

This is claimed to be because "There are no 'canonical' ParserOptions for Wikibase, as everything is User-language dependent". That's mistaking the purpose of the 'canonical' behavior.

Anonymous users, in the absence of a uselang URL parameter, Accept-Language headers, or the like, will see pages in the site language and will use default values for the numberheadings, thumbsize, and stubthreshold user preferences, and it's generally a good assumption that most logged-in users are also using the site-default language and preferences. Yes, that's less true on multilingual wikis like Commons and Wikidata, but at the moment only 2.42% of users on Wikidata have their language preference set to non-English and only 2.43% have any of these four options set to non-default values.

So, since it's so likely that a view is going to need the default values for these options, when we save a page after an edit we also populate the parser cache with the version of the page parsed with canonical options. But Wikibase is breaking that for its entity pages, instead populating the parser cache with the version of the page for whichever user made the edit and leaving the canonical version to be parsed later when an IP visits it.

Conclusion: All this overriding does is slow down the page view for subsequent anonymous visitors or visitors who have default values for four user preferences. The override should be removed.

Attempts to force a cache split on the user's language

Again, since Wikibase's output is intrinsically dependent on the user's UI language, the override tries to force a cache split based on the user's language by accessing that language in the ParserOptions object it creates. However, this code in fact has no effect.

ParserOptions has a method, registerWatcher(), that allows registering of one callback to be called when cache-splitting options are used, and getUserLangObj() is indeed one of those options. But at the point EntityHandler's makeParserOptions() calls that method, no watcher has been registered so the notification disappears into the void.

So how then does Wikibase manage to successfully split the cache by language? That's done when EntityContent::getParserOutputFromEntityView() or ::getParserOutputForRedirect() calls $output->recordOption( 'userlang' ).

Conclusion: This has no effect and may be removed.

Adds an extra Entity-specific version to the cache key

And finally, EntityHandler's code calls addExtraKey() on the ParserOptions object to add an additional cache-splitting key that allows the whole cache to be invalidated if Wikibase changes the format of its HTML in an incompatible manner. This is potentially useful, but it can be done without using makeParserOptions().

First, Wikibase should register a lazy "option" that reflects its version, using the ParserOptionsRegister hook:

public static function onParserOptionsRegister( &$defaults, &$inCacheKey, &$lazyOptions ) {
    $defaults['wb'] = null;
    $inCacheKey['wb'] = true;
    $lazyOptions['wb'] = function () {
        return EntityHandler::PARSER_VERSION;
    };
}

Then, in EntityContent::getParserOutputFromEntityView() and ::getParserOutputForRedirect() where it's already calling $output->recordOption( 'userlang' ), add $output->recordOption( 'wb' ) too (and merge https://gerrit.wikimedia.org/r/421964).

Conclusion: There's a better way to handle this.

Related Objects

View Standalone Graph
This task is connected to more than 200 other tasks. Only direct parents and subtasks are shown here. Use View Standalone Graph to show more of the graph.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

While working on MCR, we really noticed that having Content-specific ParserOptions doesn't make much sense anywhere, and we want to remove that option. If I understand correctly, @Anomie is pointing out a better way to achieve what we wanted: splitting the parser cache on the user language.

Change 444950 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/Wikibase@master] Stop overriding ContentHandler::makeParserOptions()

https://gerrit.wikimedia.org/r/444950

Change 444950 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Stop overriding ContentHandler::makeParserOptions()

https://gerrit.wikimedia.org/r/444950

Anomie claimed this task.