
Discuss: multi-lingual label usage tracking
Closed, Resolved (Public)

Description

There are several alternatives for implementing usage tracking for multi-lingual labels:

  • different mode for multi-lingual sites? (different interpretation of L aspect)
  • different usage aspect for "labels in other languages"?
  • separate usage aspect for each language? (needs schema change!)

The result of the discussion should be a clear plan for the implementation. The plan should explain the expected interaction with the parser cache.

Event Timeline

daniel raised the priority of this task to High.
daniel updated the task description.

We discussed this.
@daniel can you post outcome here and close?

The outcome of this discussion was:

  • introduce an additional column into the wbc_entity_usage table to track usage of different renderings of the page separately.
  • the value of that column should at least contain the target language for which the page was rendered. It may contain the full parser cache key.
  • the effect should be that we only purge the parser cache for a page if a cached rendering uses the modified data, instead of always purging when any language is affected.
  • separate purging of individual renderings is currently not supported by the parser cache, but may be in the future.

This needs a schema change.

daniel claimed this task.

Discussion closed.

Created T92288 for the implementation.

Revisited after a first exploratory coding session showed the proposed solution to be problematic. An ad-hoc discussion with Thiemo and Jan resulted in going back to the one-aspect-per-language solution. Key points:

  • The intent of tracking usage by aspect is to reduce the number of pages to purge when a change notification for an entity is received. Note that purging a page purges all renderings/variants in the cache.
  • adding a render_key column greatly increases the size of the table
    • the number of aspects (per item/page combination) is multiplied by the number of render keys.
    • Example: let's say 200,000 image description pages on Commons use Q183 as a "tag", and use the label and local page title (L and T aspects), resulting in 400k rows in the database. If on average each page is viewed in 2 languages, this would result in 800k rows: not only would the rows for the L usage be doubled, but also the rows for the T usage, even though that kind of usage does not depend on language.
  • adding a render_key does not provide any substantial advantage over using one aspect per language
    • the expected advantage was to cover cases in which some conditional on the page would result in different items and aspects being used when rendering the page for different users.
    • however, this is only possible (and sensible) if the conditional depends on a feature that also causes a parser cache split.
    • Besides user language, that could be things like the page being editable, or the thumbnail size, numbering of headings, date format, etc.
    • Apart from the user language, these settings are mostly inaccessible to conditionals in wikitext/Lua, and where they are accessible, they are very unlikely to be used.
    • When receiving a change notification, the associated diff is used to determine which aspect of the entity changed, and thus, which usages are affected by the change.
    • The features available from the diff for this decision are the "section" (terms, sitelinks, statements, etc.), the language (for labels, descriptions and aliases), and the site id (for sitelinks).
    • Only the features available from the diff can be used to determine the affected aspects. So if we tracked different usages per page depending on the user's thumbnail size, this would not help limit the number of pages to purge, since the diff contains no feature against which the thumbnail size in the render_key could be filtered.
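
As a rough check of the row-count example above, here is a small sketch comparing the two schemes. It assumes that with one aspect per language only the L aspect is split by language, while T stays a single row per page. The function names are made up for illustration, and the 600k figure follows from that assumption rather than from the discussion itself.

```python
def rows_with_render_key(pages, aspects, render_keys_per_page):
    """Row count when every aspect row is repeated once per render key."""
    return pages * len(aspects) * render_keys_per_page


def rows_with_language_aspects(pages, aspects, languages_per_page,
                               language_dependent=("L",)):
    """Row count when only language-dependent aspects are split by language.

    Assumption: "L" is the only language-dependent aspect in this example.
    """
    dependent = [a for a in aspects if a in language_dependent]
    independent = [a for a in aspects if a not in language_dependent]
    return pages * (len(dependent) * languages_per_page + len(independent))


pages = 200_000          # image description pages using Q183
aspects = ["L", "T"]     # label and local page title

print(rows_with_render_key(pages, aspects, 1))        # 400000 (single rendering)
print(rows_with_render_key(pages, aspects, 2))        # 800000 (L and T both doubled)
print(rows_with_language_aspects(pages, aspects, 2))  # 600000 (only L doubled)
```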

Caveat affecting both options (render_key column, or "L/de"-style aspects): updating the table is difficult.

  • when the page is edited, all tracking rows referring to it (with any render_key / language) should be removed/invalidated.
  • when a page is rendered, only rows referring to the current render_key/language should be added/updated/removed.
  • It's unclear whether there is any guarantee about the order in which hooks fire when a page is edited.
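
To make the update rules above more concrete, here is a minimal in-memory sketch using "L/de"-style aspects. It is not the actual Wikibase code: the UsageTable class, its method names and the row layout are all hypothetical. It also assumes that a rendering only replaces rows qualified with its own language, and that unqualified rows (such as T) are added on render but only removed on edit. The hook-ordering question is not addressed here.

```python
from dataclasses import dataclass, field


def aspect_language(aspect):
    """Return the language of an "L/de"-style aspect, or None if unqualified."""
    _, sep, lang = aspect.partition("/")
    return lang if sep else None


@dataclass
class UsageTable:
    # Hypothetical row layout: (page, entity_id, aspect), e.g. ("File:A.jpg", "Q183", "L/de")
    rows: set = field(default_factory=set)

    def on_page_edit(self, page):
        """An edit invalidates all renderings, so drop every row for the page."""
        self.rows = {r for r in self.rows if r[0] != page}

    def on_page_render(self, page, language, usages):
        """Replace only the rows qualified with the rendering's language.

        Assumption: unqualified rows are left in place and only added to.
        """
        stale = {r for r in self.rows
                 if r[0] == page and aspect_language(r[2]) == language}
        self.rows -= stale
        self.rows |= {(page, entity, aspect) for entity, aspect in usages}

    def pages_to_purge(self, entity, changed_aspects):
        """Pages with at least one tracked usage matching an aspect from the diff."""
        return {p for p, e, a in self.rows
                if e == entity and a in changed_aspects}


table = UsageTable()
table.on_page_render("File:A.jpg", "de", {("Q183", "L/de"), ("Q183", "T")})
table.on_page_render("File:A.jpg", "en", {("Q183", "L/en"), ("Q183", "T")})

print(len(table.rows))                          # 3: T is stored once, L once per language
print(table.pages_to_purge("Q183", {"L/fr"}))   # set(): a French label change purges nothing
print(table.pages_to_purge("Q183", {"L/de"}))   # {'File:A.jpg'}
```

Because the diff tells us which label language changed, the "L/de" qualifier can be matched directly against it, which is what a thumbnail size in a render_key could not offer.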