Avoid manual review of translations that include HTML tags with `style=""` attributes
Open, Needs Triage | Public | BUG REPORT

Description

Translations of messages are imported from translatewiki.net into Gerrit mostly automatically, but translations that include something that looks like HTML tags are checked. Some HTML tags and attributes are automatically allowed, but for all others, manual review is required. See the parent task.

Currently, the CI tools don't allow the HTML attribute "style", so every translation of messages with it has to be manually reviewed by translatewiki maintainers.
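To illustrate the kind of rule described above, here is a minimal sketch of an attribute allowlist check. It is not the actual CI code: the function names and the contents of `ALLOWED_ATTRIBUTES` are assumptions made for illustration. The only facts taken from this task are that `class` is allowed and `style` is not.

```python
from html.parser import HTMLParser

# Hypothetical allowlist. This task only confirms that class= is allowed and
# style= is not; the other entries are illustrative assumptions.
ALLOWED_ATTRIBUTES = {"class", "href", "title"}


class _AttributeCollector(HTMLParser):
    """Collects the name of every attribute seen on any start tag."""

    def __init__(self):
        super().__init__()
        self.attributes = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names before passing them here.
        self.attributes.update(name for name, _value in attrs)


def disallowed_attributes(translation, allowed=ALLOWED_ATTRIBUTES):
    """Return the attribute names in the translation that are not allowlisted."""
    collector = _AttributeCollector()
    collector.feed(translation)
    collector.close()
    return collector.attributes - set(allowed)


def needs_manual_review(translation):
    """A translation needs manual review if it uses any non-allowlisted attribute."""
    return bool(disallowed_attributes(translation))


print(needs_manual_review('Test <span style="color:red">failed</span>.'))
print(needs_manual_review('Test <span class="ext-math-test-fail">failed</span>.'))
```

With this sketch, a translation that carries `style=` is flagged for review, while the same translation using `class=` passes automatically.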

One path towards fixing this is to allow the attribute. This is very easy to do, but has a couple of issues:

  1. It's possible only if security experts are sure that it's not a security threat. (Tagging Security for that. Once that is reviewed, the project tag can be removed.)
  2. There are no good reasons to use this attribute in translations. Translations should be about human language, and they should only have minimal essential markup. The style attribute is not essential in messages.

The other path, which seems better, is to remove it from all the English messages in which it currently appears, so that translators won't have a reason to use it. Luckily, there are very few such messages. In the extensions used by Wikimedia, there are two in the Math extension and one in WikimediaMessages, which overrides a message from the CheckUser extension. (There are a few more such messages in extensions that are stored on Wikimedia Gerrit, but not deployed on Wikimedia sites.)
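A rough way to find such English messages is to scan an extension's message file for tags that carry a `style` attribute. The sketch below is only an assumption about how one might do that. The helper name `messages_with_style` and the `example-plain` key are hypothetical, while the two Math messages are the ones quoted later in this task.

```python
import re

# Matches an HTML-like start tag that carries a style attribute,
# e.g. <span style="color:red">.
STYLE_TAG = re.compile(r"<[a-zA-Z][^<>]*\sstyle\s*=", re.IGNORECASE)


def messages_with_style(messages):
    """Return the sorted keys of messages whose text contains a styled tag."""
    return sorted(
        key
        for key, text in messages.items()
        # Skip non-string entries such as the "@metadata" block.
        if isinstance(text, str) and STYLE_TAG.search(text)
    )


english = {
    "@metadata": {"authors": []},
    "math-test-fail": "Test ''$1'' <span style=\"color:red\">failed</span>.",
    "math-test-success": "Test ''$1'' <span style=\"color:green\">succeeded</span>.",
    "example-plain": "No markup here.",  # hypothetical key, for contrast
}

print(messages_with_style(english))  # ['math-test-fail', 'math-test-success']
```

In practice the dictionary would come from `json.load()` on the extension's English message file rather than being written inline.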

Used by Wikimedia:

  • Math (patch; fixed by removing the HTML from the messages entirely; other styling may be added later, see the discussion in the comments)
  • WikimediaMessages (CheckUser) (patch)

Not used by Wikimedia (possibly an incomplete list; if you find more, please add):

  • PrivateDomains
  • SocialProfile
  • UserGroups
  • Video

Event Timeline

For PrivateDomains, SocialProfile and Video extension, as the maintainer I'd be more than happy to allow the tag, so please let me know what needs to be done for those repositories.

In general, I'm inclined to agree with the overall proposal of this task, but admittedly reality is a bit...different. Writing down my quick observations/notes here for future reference:

  • PrivateDomains - The only impacted message is privatedomains-instructions. This is probably the easiest of the bunch, could move the styles to a RL module and load it on the Special:PrivateDomains page but...why?
  • SocialProfile - Used by only one message (yay!) but it happens to be level-advanced-to (boo!) which is basically the top of a tech debt iceberg. It's a nightmare and it hasn't been enough years that I'd be willing to touch that voluntarily.
  • Video extension - Like with PrivateDomains, the only offender is a "this is how you use this special page" type of message, video-addvideo-instructions, used on Special:AddVideo. It could likely be split into multiple messages (suggestions/patches on how to do that are more than welcome!), but I always figured that'd be more disruptive than anything else. In any case, the message and some of the PHP code it references could likely use another look: it's 2024, YouTube hasn't been providing Flash-only embed codes in years. (But even if and when updating the message, we do likely wanna keep some of the distinct styling in regardless.)

For CheckUser, there is no need for the style attribute and so it can be removed.

Change #1013431 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/WikimediaMessages@master] Remove style attribute from wikimedia-checkuser-toollinks

https://gerrit.wikimedia.org/r/1013431

Thanks for the quick reply! The parent task T358384 gives more details, but basically, the current CI configuration in Gerrit doesn't allow automatic merging of translation patches that add or update translations containing something that looks like HTML tags with the style attribute. Since they can't be merged automatically, they have to be reviewed manually. It's easy, but it adds up, and the people who maintain the translation imports could do something more useful with their time. In addition, this HTML markup is (probably) the same in all the languages, so it's just copied and not translated. Though copying it isn't very hard, it does take a few seconds, and translators could do something better with their time, too :)

So if it can be completely taken out of all those English messages, it would be super-nice.

For Math, there are only two such messages:

	"math-test-fail": "Test ''$1'' <span style=\"color:red\">failed</span>.",
	"math-test-success": "Test ''$1'' <span style=\"color:green\">succeeded</span>.",

I suggest removing the color. Would that be ok?

That's certainly a possibility, but you'll lose the colors :)

It's probably not a big deal, because that page is used quite rarely.

Unlike style=, the class= attribute is allowed, so you can add classes for this to a css file, load it on that special page as a ResourceLoader module, and replace style="color:green" in the messages with something like class="ext-math-test-success". That's probably the easiest and cleanest way to keep the colors.
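As a sketch of the message side of that change, the snippet below rewrites an inline `style` attribute into a `class` attribute. It is only an illustration. The class name `ext-math-test-success` comes from the comment above, while `ext-math-test-fail`, the `STYLE_TO_CLASS` mapping, and the `style_to_class` helper are assumptions.

```python
import re

# Mapping from inline style values to CSS class names.
# "ext-math-test-success" is the name suggested in this task;
# "ext-math-test-fail" is an assumed counterpart.
STYLE_TO_CLASS = {
    "color:green": "ext-math-test-success",
    "color:red": "ext-math-test-fail",
}

STYLE_ATTR = re.compile(r'style="([^"]*)"')


def style_to_class(message, mapping=STYLE_TO_CLASS):
    """Replace known style="..." attributes with the mapped class="..."."""

    def replace(match):
        # Normalize "color: green;" to "color:green" before the lookup.
        style = match.group(1).replace(" ", "").rstrip(";")
        css_class = mapping.get(style)
        if css_class is None:
            # Unknown styles are left untouched for a human to handle.
            return match.group(0)
        return f'class="{css_class}"'

    return STYLE_ATTR.sub(replace, message)


original = "Test ''$1'' <span style=\"color:green\">succeeded</span>."
print(style_to_class(original))
```

The colors themselves would then live in the stylesheet loaded by the ResourceLoader module, for example a rule such as `.ext-math-test-success { color: green; }`.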

Change #1013586 had a related patch set uploaded (by Physikerwelt; author: Physikerwelt):

[mediawiki/extensions/Math@master] Remove color from test result messages

https://gerrit.wikimedia.org/r/1013586

As some people can't differentiate green and red, it might be even better to add icons... That can be done in a separate commit.

Change #1013586 merged by jenkins-bot:

[mediawiki/extensions/Math@master] Remove color from test result messages

https://gerrit.wikimedia.org/r/1013586

Change #1013431 merged by jenkins-bot:

[mediawiki/extensions/WikimediaMessages@master] Remove style attribute from wikimedia-checkuser-toollinks

https://gerrit.wikimedia.org/r/1013431