
Migrate Wikibase to use comment_data field instead of SummaryFormatter
Open, Needs Triage, Public

Description

Problem:
Currently, Wikibase stores semi-structured edit summaries in the text of a revision comment, and then disassembles and properly formats them on display. This conversion is inherently lossy, leading to problems like T186035.

Now that we have a JSON blob per comment available (the comment_data field in the comment table, thanks to T166732), we should use it to store the information in a structured way. According to the documentation in the SQL file, the comment_text should then be that information rendered in the content language (i.e. English on Wikidata).

Example:
Currently, you may have a Summary with members like the following:

  • module name: wbsetlabel
  • action name: add
  • language code: en-gb
  • comment arguments: (empty list)
  • summary arguments: “test label”
  • user summary: “user-specified summary (API parameter)”

(This corresponds to a FormatableSummary with the same fields, except that the module and action name are collapsed into one field, the message key wbsetlabel-add.)

These are then combined by the SummaryFormatter into a single comment_text like the following:

/* wbsetlabel-add:1|en-gb */ test label, user-specified summary (API parameter)

AutoCommentFormatter then picks this apart again and formats it in the user language into a message like the following:

Added [en-gb] label: test label, user-specified summary (API parameter)

(The autocomment prefix, “Added [en-gb] label:”, is visually muted in reality; I can’t replicate that in Remarkup.)
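The round trip described above can be sketched roughly as follows. This is an illustration in Python, not the actual Wikibase (PHP) code: the names Summary, to_comment_text and from_comment_text are hypothetical, and the serialization rules are inferred only from the example in this task. It shows one way such a round trip can lose information, which is not necessarily the exact issue in T186035.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Summary:
    # Hypothetical stand-in for the Wikibase Summary fields listed above.
    module_name: str
    action_name: str
    language_code: str
    comment_args: list = field(default_factory=list)
    summary_args: list = field(default_factory=list)
    user_summary: str = ""


def to_comment_text(s):
    """Flatten a Summary into a single string (format inferred from the example)."""
    key = f"{s.module_name}-{s.action_name}"
    # Assumed argument layout: number of summary args, language code, comment args.
    args = [str(len(s.summary_args)), s.language_code, *s.comment_args]
    auto_comment = f"{key}:{'|'.join(args)}"
    parts = [", ".join(s.summary_args), s.user_summary]
    tail = ", ".join(p for p in parts if p)
    return f"/* {auto_comment} */ {tail}"


AUTO_COMMENT = re.compile(
    r"^/\* (?P<key>[^:]+):(?P<args>.*?) \*/ ?(?P<tail>.*)$", re.S
)


def from_comment_text(text):
    """Pick the flattened string apart again; the tail cannot be split reliably."""
    m = AUTO_COMMENT.match(text)
    if m is None:
        return None
    return {
        "message_key": m["key"],
        "args": m["args"].split("|"),
        "tail": m["tail"],
    }


a = Summary("wbsetlabel", "add", "en-gb",
            summary_args=["test label"],
            user_summary="user-specified summary (API parameter)")
# A different edit: the label itself contains a comma, and there is no user summary.
b = Summary("wbsetlabel", "add", "en-gb",
            summary_args=["test label, user-specified summary (API parameter)"])

print(to_comment_text(a))
print(to_comment_text(a) == to_comment_text(b))
```

Under these assumptions, both summaries flatten to the same string, so no parser can tell afterwards where the label ends and the user summary begins.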

In the JSON blob, we could instead store these as separate fields.
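As a rough sketch of what that could look like, the example summary might be serialized as below. The key names (module, action, language, commentArgs, summaryArgs, userSummary) are purely hypothetical, since this task does not propose a schema, and the code is Python for illustration rather than Wikibase PHP.

```python
import json


def to_comment_data(module, action, language, comment_args, summary_args, user_summary):
    """Serialize the summary fields as a JSON blob (hypothetical key names)."""
    return json.dumps(
        {
            "module": module,
            "action": action,
            "language": language,
            "commentArgs": comment_args,
            "summaryArgs": summary_args,
            "userSummary": user_summary,
        },
        ensure_ascii=False,
        sort_keys=True,
    )


blob = to_comment_data(
    "wbsetlabel", "add", "en-gb",
    [], ["test label"], "user-specified summary (API parameter)",
)
data = json.loads(blob)
print(data["summaryArgs"], data["userSummary"])
```

Because the summary arguments and the user summary live in separate fields, a label that happens to contain a comma no longer collides with a user-supplied summary.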

Open questions:

  • What implications does this have for external tools? Some of them also format the comment text like Wikibase does (e.g. WDVD, which queries the wiki replicas directly, or EditGroups, which listens to the [recentchange EventStream](https://stream.wikimedia.org/?doc)). If we change the comment text to be the rendered comment in the content language, they will break.
  • Can we make the structured comment data available to AbuseFilters? This could solve e.g. T47252; see also T205254 for more general information.
  • Do we want to migrate old comments to comment_data, or should we leave them alone and keep AutoCommentFormatter so we can still render them?