
[Epic] Get rid of Cite backlink formatting i18n messages that are not actually localized
Open, Medium, Public

Description

This epic addresses only the backlink messages. It is independent of T383763 (Deprecate and remove cite_reference_link footnote mark i18n message that is not actually localized), which addresses the footnote markers in article content.

All changes recommended in this epic must be coordinated with work in T383036 ([Epic] Replace i18n written out backlink marker lists with a configurable autogeneration) and T383041 (Improve display of long lists of reused reference backlinks).

A number of i18n messages in the Cite extension don't seem to be actually localized; see below. It seems better to eliminate these messages; in particular, I don't see a reason to let HTML structures be localized. In Parsoid land, we rely on CSS to get different formatting and localization of citations. Eliminating unused customizations could, in some cases, simplify the CSS rules that need to be written and maintained.

cite_references_link_one

Default message: <li id=\"$1\"$4><span class=\"mw-cite-backlink\">[[#$2|↑]]</span> $3</li>
Purpose: renders a single backlink marker in the reference list. Also includes the footnote body text.
Known customizations: 54 sites customize this message in order to:

  • Change the "↑" character to "^" or "▲"
  • Apply bold
  • Add horizontal whitespace
  • Suppress in print media
  • Remove backlink markers entirely

In addition, the optional direction parameter $4 is sometimes omitted, probably by mistake.

Recommendation: styling can already be applied to .mw-cite-backlink. Backlinks should be suppressed in print media by default. Arrow character replacement is not easy at the moment; it could be supported with an additional span, or by shifting the existing span so that it applies only to the arrow. The footnote body should be independent of this message.

cite_references_link_many

Default message: <li id=\"$1\"$4><span class=\"mw-cite-backlink\">↑ $2</span> $3</li>
Purpose: renders a list of more than one backlink marker. Includes footnote body.
Known customizations: 52 sites customize this message in order to:

  • Apply the same customizations as for cite_references_link_one above
  • Sometimes surround the list of backlinks with square brackets, e.g. "[a b c]"

Recommendation: directly reuse any customization mechanisms for cite_references_link_one, so that the arrow is customized the same way in both cases.
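
As a rough illustration of the two recommendations above, the following Python sketch fills the `$1` to `$4` placeholders of the two default messages and takes the arrow from one shared value, so that both variants are customized the same way. The function names, the `arrow` parameter and the sample ids are hypothetical; this is a simplified stand-in, not how Cite itself renders references.

```python
import re

# Default messages copied from this task (JSON escaping removed).
LINK_ONE = '<li id="$1"$4><span class="mw-cite-backlink">[[#$2|↑]]</span> $3</li>'
LINK_MANY = '<li id="$1"$4><span class="mw-cite-backlink">↑ $2</span> $3</li>'


def fill(message, *params):
    """Replace $1, $2, ... with positional parameters in a single pass."""
    def repl(match):
        index = int(match.group(1)) - 1
        # Leave unknown placeholders untouched instead of failing.
        return params[index] if 0 <= index < len(params) else match.group(0)
    return re.sub(r"\$(\d+)", repl, message)


def render_one(note_id, ref_id, body, extra_attr="", arrow="↑"):
    """Single backlink: $2 is the anchor id of the footnote marker."""
    # The arrow is swapped in the template, before the body is inserted,
    # so an arrow inside the footnote body is left alone.
    return fill(LINK_ONE.replace("↑", arrow), note_id, ref_id, body, extra_attr)


def render_many(note_id, backlink_list, body, extra_attr="", arrow="↑"):
    """Several backlinks: $2 is the already rendered list of markers."""
    return fill(LINK_MANY.replace("↑", arrow), note_id, backlink_list, body, extra_attr)
```

For example, `render_one("cite_note-1", "cite_ref-1", "Body", arrow="^")` gives `<li id="cite_note-1"><span class="mw-cite-backlink">[[#cite_ref-1|^]]</span> Body</li>`.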

cite_references_link_many_format

Default message: <sup>[[#$1|$2]]</sup>
Purpose: renders individual backlink markers when more than one appears on a reference.
Known customizations: 99 sites customize this message in order to:

  • Switch to the "alternate" alphabetical backlink labels
  • Style with bold and italics
  • Style horizontal whitespace

Recommendation: Make it easier to select individual backlink elements semantically, e.g. by adding a span around each one.
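
A minimal sketch of what "a span around each one" could look like, assuming a hypothetical class name `mw-cite-backlink-item` and a hypothetical `alternate` switch between the numeric label (`$2`) and the alternate alphabetical label (`$3`). None of these names come from Cite.

```python
import re

MANY_FORMAT = "<sup>[[#$1|$2]]</sup>"      # default message from this task
MANY_FORMAT_ALT = "<sup>[[#$1|$3]]</sup>"  # customization that picks the alternate label


def render_backlink(ref_id, numeric_label, alternate_label, alternate=False):
    """Render one backlink marker, wrapped in a span that CSS can select."""
    template = MANY_FORMAT_ALT if alternate else MANY_FORMAT
    params = (ref_id, numeric_label, alternate_label)
    # Both templates only use $1..$3, so a direct index lookup is enough here.
    inner = re.sub(r"\$([1-3])", lambda m: params[int(m.group(1)) - 1], template)
    # Hypothetical wrapper class; bold, italics and spacing could then be
    # applied per marker in CSS instead of in the message.
    return f'<span class="mw-cite-backlink-item">{inner}</span>'
```

With such a wrapper, the bold, italic and whitespace customizations listed above could in principle move into site CSS.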

cite_references_link_many_format_backlink_labels

This is the focus of T383036: [Epic] Replace i18n written out backlink marker lists with a configurable autogeneration.

cite_references_link_many_sep and cite_references_link_many_and

Default message: &#32;
Known customizations: 4 sites and 3 sites, respectively

  • Replace with whitespace or comma list elements ("," plus "and")

Recommendation: add a boolean configuration option to use the existing l10n commaList.
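
The boolean switch could behave roughly as sketched below. The `use_comma_list` flag and the `comma` and `and_word` defaults are hypothetical stand-ins for localized separators; MediaWiki's own list helpers are not modelled here.

```python
def join_backlinks(markers, use_comma_list=False, comma=", ", and_word=" and "):
    """Join already rendered backlink markers into one string."""
    if not use_comma_list:
        # Default behaviour: both separator messages are a single space (&#32;).
        return " ".join(markers)
    if len(markers) <= 1:
        return "".join(markers)
    # Comma-separated list with a final "and", as some wikis customize it today.
    return comma.join(markers[:-1]) + and_word + markers[-1]
```

For example, `join_backlinks(["a", "b", "c"])` returns `a b c`, while `join_backlinks(["a", "b", "c"], use_comma_list=True)` returns `a, b and c`.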

Event Timeline

ssastry triaged this task as Medium priority.

@Jdforrester-WMF @Aaharoni-WMF @awight @thiemowmde @Izno Can any of you think of a reason why we shouldn't do this? If we agree this is a good change, we might have to send out an email on wikitech-l / mediawiki-l for any 3rd parties that might be using these. This is coming up as part of the Parsoid read views project; it is one of those last-bit incompatibility things that I am trying to work through and address. /cc @cscott

I've heard that there are some wikis/users who prefer to have class names and the like in their local language because people don't like it otherwise. (Total hearsay on my part.) I don't think it's worth paying that set of people any mind, otherwise there would be many more complaints about the software....

I don't see any strong reason for any of these messages to be translated.

Yeah, this is a long-standing issue.

The Cite extension massively abuses the message localisation system to inject raw HTML fragments into the DOM. This is (a) bad but (b) vital to several of the ways the extension has been used for ~15 years now to support some languages. ~8 years ago the Editing and Parsoid teams (as they were then called), led by Marc, were working on replacing the need for this with CSS selectors (I believe that https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/Cite/+/refs/heads/master/modules/ext.cite.style.fa.css was the exemplar done for this).

This was announced but I think it got lost over the years.

I believe the next steps would be T156351 and then T156350, after which point this task could be "drop old message-based display configuration of Cite"?

Right, I am picking this up now, and those 2 tasks are on my radar. In this task, though, I am focusing on messages that aren't even localized by anyone. But, based on what was written here, I clearly missed the fact that these messages could have been localized with on-wiki versions. I should figure out if that is the case before we remove them.

So, you are right, even for these seemingly unused messages, it might be better to get those other tasks done before getting to this.

It looks like cite_references_link_one has localized versions on several wikis, mostly, it appears, to pick the caret/arrow and to bold/italicize it. As we've already done for visual diff testing, those can easily be done with CSS. That apart, it looks likely that we could remove some of those messages now, once we confirm there are no localized versions of them on wikis.

And, here are the full results of which messages have localized versions on wikis (thanks to Reedy for running the db query for me across all wikis on the cluster):

[subbu@earth:~/work/wmf/core/extensions/Cite] grep Cite i18n.messages.usage.log | sed 's/.*Cite/Cite/g;s/\/.*//g' | sort | uniq -c| sort -nr
     93 Cite_references_link_many_format
     57 Cite_references_link_many_format_backlink_labels
     54 Cite_references_link_one
     53 Cite_references_link_many
     15 Cite_reference_link
      5 Cite_references_link_many_sep
      4 Cite_references_link_prefix
      3 Cite_references_link_many_and
      2 Cite_references_link_suffix
      2 Cite_reference_link_suffix
      2 Cite_reference_link_prefix

It is a little irritating that there are a handful of wikis there that customize the link_suffix and link_prefix messages. Anyway, to be continued.

Looks like it is zhwiktionary which does most of the prefix / suffix customizations. kawikibooks and kawikiquote customize the Cite_references_link_prefix message. Not sure why they do this.

And, looks like kawikibooks and kawikiquote don't actually override Cite_references_link_prefix.

So, that leaves us with zhwiktionary, which overrides the prefixes (not the suffixes). Instead of cite_ref- and cite_note-, they use _ref- and _note-.

We should figure out if we can just get rid of that localized message on zhwiktionary. If so, we can get rid of these 5 Cite messages right away: cite_reference_link_key_with_num, cite_reference_link_prefix, cite_reference_link_suffix, cite_references_link_prefix, cite_references_link_suffix.

Change 892491 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/extensions/Cite@master] [WIP] Remove unused "HTML message" cite_references_no_link

https://gerrit.wikimedia.org/r/892491

Change 940163 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/extensions/Cite@master] No expensive transformations on prefix/suffix messages

https://gerrit.wikimedia.org/r/940163

Change 940163 merged by jenkins-bot:

[mediawiki/extensions/Cite@master] No expensive transformations on prefix/suffix messages

https://gerrit.wikimedia.org/r/940163

Change 892491 merged by jenkins-bot:

[mediawiki/extensions/Cite@master] Remove unused "HTML message" cite_references_no_link

https://gerrit.wikimedia.org/r/892491

Change 977756 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/extensions/Cite@master] Drop unused …_suffix and …_key_with_num messages

https://gerrit.wikimedia.org/r/977756

Change 977756 merged by jenkins-bot:

[mediawiki/extensions/Cite@master] Drop unused …_suffix and …_key_with_num messages

https://gerrit.wikimedia.org/r/977756

Change 987766 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/extensions/Cite@master] Drop unused cite_reference(s)_link_prefix messages

https://gerrit.wikimedia.org/r/987766

Change 987766 merged by jenkins-bot:

[mediawiki/extensions/Cite@master] Drop unused cite_reference(s)_link_prefix messages

https://gerrit.wikimedia.org/r/987766

Change 998778 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/extensions/Cite@master] [POC] Call Parser::recursiveTagParse as early as possible

https://gerrit.wikimedia.org/r/998778

awight renamed this task from Get rid of Cite formatting i18n messages that are not actually localized to [Epic] Get rid of Cite formatting i18n messages that are not actually localized. Jan 14 2025, 3:31 PM
awight renamed this task from [Epic] Get rid of Cite formatting i18n messages that are not actually localized to [Epic] Get rid of Cite backlink formatting i18n messages that are not actually localized. Jan 15 2025, 11:00 AM
awight updated the task description.

cite_references_link_one

[…] Arrow character replacement is not easy at the moment and could be supported with an additional span or shifting the existing span to only apply to the arrow.

Would it be a huge tech debt if the arrow character itself was kept as a MediaWiki message? Any CSS solutions are likely to cause problems when CSS is not available (text-only browsers, content reusers with specific requirements etc.). Including full markup is problematic for Parsoid as it’s much harder to recognize footnotes as such, but that single character shouldn’t cause such problems.

My idea is creating a MediaWiki message named, for example, cite-back-text with the default content of ↑, and using it both in the “one” and in the “many” variants (i.e. cite_references_link_one becomes practically <li id="$1"$4><span class="mw-cite-backlink">[[#$2|{{MediaWiki:cite-back-text}}]]</span> $3</li> and cite_references_link_many becomes <li id="$1"$4><span class="mw-cite-backlink">{{MediaWiki:cite-back-text}} $2</span> $3</li> – of course, with neither of these cite_references_link_* messages being actual MediaWiki messages anymore).

Or make both the arrow and the choice between $2 and $3 in cite_references_link_many_format community configurable (T377918)? That would avoid having too many messages, the configuration would be easier to understand (compare cite_references_link_many_format = <sup>[[#$1|$3]]</sup> with "manyFormat": "alternate"), and it also has a schema, so people can’t save invalid choices.

My idea is creating a MediaWiki message […] with the default content of ↑

This message exists: cite_reference_backlink_symbol. It was introduced as part of T339973 but is not used at the moment for performance reasons.

Personally I think we should turn this into CommunityConfiguration where communities can pick one of the 3 symbols that are currently in use (↑, ^, and ▲, see T335129#9999251 for full analysis). I don't think it makes a lot of sense to let them enter totally arbitrary characters – and especially not any wikitext or HTML sequences.
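
To make the "pick one of the 3 symbols" idea concrete, here is a minimal validation sketch in Python. The key names `backlinkSymbol` and `manyFormat` and the value `"numeric"` are hypothetical ("alternate" is the value quoted earlier in this discussion); this is not the CommunityConfiguration schema format, only an illustration of constraining choices to a fixed set.

```python
ALLOWED_SYMBOLS = ("↑", "^", "▲")                # the three symbols currently in use
ALLOWED_MANY_FORMATS = ("numeric", "alternate")  # "numeric" is a made-up name

DEFAULTS = {"backlinkSymbol": "↑", "manyFormat": "numeric"}


def validate_config(config):
    """Return the config merged with defaults, or raise ValueError."""
    unknown = set(config) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    merged = {**DEFAULTS, **config}
    if merged["backlinkSymbol"] not in ALLOWED_SYMBOLS:
        raise ValueError("backlinkSymbol must be one of " + " ".join(ALLOWED_SYMBOLS))
    if merged["manyFormat"] not in ALLOWED_MANY_FORMATS:
        raise ValueError("manyFormat must be one of " + " ".join(ALLOWED_MANY_FORMATS))
    return merged
```

With a closed set like this, wikitext or HTML sequences cannot be saved at all, which is the constraint argued for above.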

If Cite has community configuration, then yes, my thinking was also that this should be part of it (community configuration simply came to mind fifteen minutes later), since

  • it provides a central place to configure things, with in-context documentation and validation if needed (in fact, all messages with {{notranslate}} should be there, as their sole purpose is configuration);
  • community configuration is loaded only once per request (I guess), not once per footnote, so it improves performance compared to MediaWiki messages a lot.

However, I’m not sure if it’s necessary to constrain what can be used; if the default doesn’t use any wikitext/HTML markup, I’d trust communities to only use markup if absolutely necessary (if the default is full of markup, less experienced admins just judging by the default can easily screw things up; but if the default contains no markup, less experienced admins probably won’t introduce it). Or at least allow arbitrary characters and prevent only markup (e.g. by sending it through htmlspecialchars()) if markup would make the Parsoid/VE implementation much more difficult.
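
The "allow arbitrary characters, prevent only markup" option could look roughly like this. The sketch uses Python's `html.escape` as an analogue of PHP's `htmlspecialchars()`; the function name and the fallback behaviour are assumptions made for illustration.

```python
import html


def sanitize_symbol(value, fallback="↑"):
    """Accept any characters for the backlink symbol, but neutralize markup."""
    text = value.strip()
    if not text:
        # Hypothetical choice: an empty setting falls back to the default arrow.
        return fallback
    # Escapes &, <, > and quotes, so HTML entered here is shown as plain text.
    return html.escape(text, quote=True)
```

For example, `sanitize_symbol("▲")` is returned unchanged, while `sanitize_symbol("<b>^</b>")` becomes `&lt;b&gt;^&lt;/b&gt;`. Wikitext syntax such as `'''` would need separate handling, which this sketch does not attempt.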

Change #998778 abandoned by Thiemo Kreuz (WMDE):

[mediawiki/extensions/Cite@master] [POC] Call Parser::recursiveTagParse as early as possible

https://gerrit.wikimedia.org/r/998778