
Enforce maximum length of a translatable page title
Closed, Resolved | Public | 4 Estimated Story Points

Description

Currently there are no checks. Titles that are too long fail in various ways, for example:

Error 1406: Data too long for column 'tmd_group' at row 1
Function: TranslateMetadata::set
Query: REPLACE INTO `translate_metadata` (tmd_group,tmd_key,tmd_value) VALUES ('page-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA','maxid',1)
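The error only appears once the database rejects the write. Below is a minimal pre-write guard, written in Python rather than the extension's PHP. It assumes the group id is `page-` followed by the page title, as the failing query suggests. The column limit is a parameter because the width of `tmd_group` is not stated here. `page_group_id` and `fits_column` are hypothetical names.

```python
def page_group_id(page_title: str) -> str:
    # The failing query stores translatable page groups as "page-<title>".
    return "page-" + page_title


def fits_column(value: str, max_bytes: int) -> bool:
    # Byte-limited columns count UTF-8 bytes, not characters.
    # max_bytes is supplied by the caller (assumption: it matches the real column).
    return len(value.encode("utf-8")) <= max_bytes


group_id = page_group_id("A" * 300)
print(len(group_id.encode("utf-8")))         # 305
print(fits_column(group_id, max_bytes=200))  # False (200 is a placeholder limit)
```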

Event Timeline

Tacsipacsi renamed this task from "Enforce maximum length of a translatable page" to "Enforce maximum length of a translatable page title" (Apr 30 2023, 9:44 AM).
Tacsipacsi subscribed.
Nikerabbit set the point value for this task to 2 (Aug 1 2023, 11:34 AM).

We're also having issues when using the DatabaseMessageIndex. The tmi_key column is VARBINARY(255) and it cannot store long titles.
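Because `tmi_key` is a VARBINARY column, its limit of 255 is measured in bytes. Titles in non-Latin scripts therefore reach it with far fewer characters. Here is a small Python illustration, using a hypothetical helper `utf8_length`:

```python
def utf8_length(text: str) -> int:
    # VARBINARY columns count bytes, so measure the UTF-8 encoding, not len(text).
    return len(text.encode("utf-8"))


COLUMN_BYTES = 255  # tmi_key is VARBINARY(255)

latin = "A" * 200
arabic = "ب" * 200  # each Arabic letter takes 2 bytes in UTF-8

for title in (latin, arabic):
    print(len(title), utf8_length(title), utf8_length(title) <= COLUMN_BYTES)
# 200 200 True
# 200 400 False
```

With 2-byte characters, 128 characters (256 bytes) are already enough to overflow the column.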

Note from Niklas:

MediaWiki page titles are limited to 255 bytes, but I think we prefix the namespace ID, so keys could be slightly longer. Any keys longer than that coming from external sources need truncating (I think the FFS classes already handle this).
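When a key from an external source is truncated, the cut has to fall on a UTF-8 boundary, or the stored value ends in a broken character. The Python sketch below assumes a hypothetical key format of `<namespace id>:<title>`; the real DatabaseMessageIndex key layout may differ. `truncate_utf8` and `index_key` are illustrative names.

```python
MAX_KEY_BYTES = 255  # tmi_key is VARBINARY(255)


def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut text to at most max_bytes of UTF-8 without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Slicing bytes can leave an incomplete multi-byte sequence at the end;
    # errors="ignore" drops that partial tail.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


def index_key(namespace_id: int, title: str) -> str:
    # Hypothetical key format: namespace id prefix, then the title.
    return truncate_utf8(f"{namespace_id}:{title}", MAX_KEY_BYTES)


key = index_key(100, "ب" * 200)
print(len(key.encode("utf-8")))  # 254: keeping byte 255 would split a character
```

Truncation can map two different long titles to the same key. It is only safe where such collisions are acceptable or detected.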

Change 955329 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/Translate@master] WIP: Validate that translation unit title is valid MW title

https://gerrit.wikimedia.org/r/955329

I've added code to validate the combination of translatable page title and unit name, to ensure that the translation unit page has a valid MediaWiki title. This enforces the 255-byte title length limit and performs other validations to ensure that the unit ID would make a valid title.
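The check described above can be approximated as follows. This Python sketch is not the code in change 955329, and the function names are hypothetical. It assumes translation unit pages are named `<page title>/<unit id>/<language code>` and applies the 255-byte limit to that string. It also checks only a subset of the characters MediaWiki forbids in titles.

```python
MAX_TITLE_BYTES = 255
# A subset of characters MediaWiki never allows in titles (not exhaustive).
ILLEGAL_TITLE_CHARS = set("#<>[]|{}")


def unit_page_title(page: str, unit_id: str, lang: str) -> str:
    # Assumed naming scheme for translation unit pages.
    return f"{page}/{unit_id}/{lang}"


def validate_unit_title(page: str, unit_id: str, lang: str) -> list[str]:
    """Return problems found; an empty list means the title looks valid."""
    title = unit_page_title(page, unit_id, lang).replace(" ", "_")
    problems = []
    bad = sorted(set(title) & ILLEGAL_TITLE_CHARS)
    if bad:
        problems.append("illegal characters: " + "".join(bad))
    size = len(title.encode("utf-8"))
    if size > MAX_TITLE_BYTES:
        problems.append(f"too long: {size} bytes > {MAX_TITLE_BYTES}")
    return problems


print(validate_unit_title("Help", "intro[1]", "fi"))  # ['illegal characters: []']
print(validate_unit_title("A" * 250, "1", "fi"))      # [] (exactly 255 bytes)
print(validate_unit_title("A" * 250, "12", "fi"))     # ['too long: 256 bytes > 255']
```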

Invalid character in unit name:

[Screenshot]

Very long unit name:

[Screenshot]

Change 956393 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/Translate@master] PageTranslation: Validate display title unit id only if translatable

https://gerrit.wikimedia.org/r/956393

Change 956401 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/Translate@master] MessageGroupReview: Shorten group id

https://gerrit.wikimedia.org/r/956401

abi_ changed the point value for this task from 2 to 4 (Sep 11 2023, 12:30 PM).

Change 956401 abandoned by Abijeet Patro:

[mediawiki/extensions/Translate@master] MessageGroupReview: Shorten group id

Reason:

Need to consolidate the usage of translate_groupreviews into the MessageGroupReview class first.

https://gerrit.wikimedia.org/r/956401

Change 957263 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/Translate@master] MessageGroupReview: Add getGroupPriorities method

https://gerrit.wikimedia.org/r/957263

Change 957264 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/Translate@master] Move usage of translate_groupreviews table into MessageGroupReview

https://gerrit.wikimedia.org/r/957264

Change 957265 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/Translate@master] MessageGroupReview: Shorten group id when storing in database

https://gerrit.wikimedia.org/r/957265

I was playing around with the validation patches, @abi_:

Downloading refs/changes/93/956393/10 from gerrit

Observations:

  • Long names overflow in various places and cause a horizontal scrollbar (not visible in the screenshot; I will submit a patch to improve that)
  • Unrelated: "Translation unit X may not contain underscore or slash" shown for a name without underscores in it.
  • Inconsistent bolding of translation unit names in the errors.
  • Double escaping of translation unit names (it seems my suggestion to use wfEscapeWikitext was incorrect)
  • (Not visible in the screenshot): I don't see an error for page display title on the first load. It does appear on the second load.
  • Error about length is not given if unit name is found to be invalid in other ways.

[Screenshot]

> Unrelated: "Translation unit X may not contain underscore or slash" shown for a name without underscores in it.

Can you share the unit name that was causing this issue?

> Inconsistent bolding of translation unit names in the errors.

Fixed in the patch.

> Double escaping of translation unit names (it seems my suggestion to use wfEscapeWikitext was incorrect)

Removed the calls to wfEscapeWikitext.

> (Not visible in the screenshot): I don't see an error for page display title on the first load. It does appear on the second load.

Submitted a fix for this in this patch: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Translate/+/956393/10..11

> Error about length is not given if unit name is found to be invalid in other ways.

I'm not able to reproduce this. See the image below, taken with this patch checked out:

[Screenshot]

> > Unrelated: "Translation unit X may not contain underscore or slash" shown for a name without underscores in it.
>
> Can you share the unit name that was causing this issue?

This error message is added if any of the characters in MediaWiki\Extension\Translate\PageTranslation\TranslationUnit::UNIT_MARKER_INVALID_CHARS are found in the unit name. In addition to underscore and slash, this includes newlines and less-than and greater-than characters. On the other hand, newlines, less-than and greater-than should already be caught by the TitleParser, so maybe check $ic only if the title parse succeeded?
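The suggested ordering can be sketched as follows, in Python rather than PHP. `title_parse_error` is a toy stand-in for TitleParser that rejects only a subset of what the real parser rejects. The character set copies only the characters named above; the real constant may list more. All function names are hypothetical.

```python
from typing import Optional

# Characters named in the comment above; the real constant may list more.
UNIT_MARKER_INVALID_CHARS = set("_/\n<>")


def title_parse_error(title: str) -> Optional[str]:
    """Toy stand-in for TitleParser: return the first problem, or None."""
    if set(title) & set("#<>[]|{}\n"):  # subset of what TitleParser rejects
        return "title contains invalid characters"
    if len(title.encode("utf-8")) > 255:
        return "title is too long"
    return None


def unit_name_errors(page: str, unit_id: str) -> list[str]:
    error = title_parse_error(f"{page}/{unit_id}")
    if error is not None:
        # The parser already rejected the title, so skip the overlapping
        # unit-marker check instead of adding a misleading second message.
        return [error]
    # Only underscore and slash can reach this point, so the message is accurate.
    if set(unit_id) & UNIT_MARKER_INVALID_CHARS:
        return ["unit name may not contain underscore or slash"]
    return []


print(unit_name_errors("Help", "a<b"))  # ['title contains invalid characters']
print(unit_name_errors("Help", "a_b"))  # ['unit name may not contain underscore or slash']
```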

> > Error about length is not given if unit name is found to be invalid in other ways.
>
> I'm not able to reproduce this. See image below with this patch checked out:

The error is not given if TitleParser finds the title invalid in other ways: it throws only one exception, upon the first failing check, and the length check is one of the last ones. I don't think we can work around this in Translate, but it doesn't seem to be a common issue in practice anyway (why would anyone include "[" characters in translation unit names?).
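The following Python toy model shows why the length error can be masked. It is not TitleParser's code, only two made-up checks run in the order described above. An exception-style validator stops at the first failure, while a collect-all validator reports both problems.

```python
from typing import Callable, Optional

Check = Callable[[str], Optional[str]]


def has_bracket(title: str) -> Optional[str]:
    return "invalid characters" if "[" in title or "]" in title else None


def too_long(title: str) -> Optional[str]:
    return "too long" if len(title.encode("utf-8")) > 255 else None


# Length is checked last, mirroring the ordering described above.
CHECKS: list[Check] = [has_bracket, too_long]


def first_failure(title: str) -> list[str]:
    # Like an exception-based parser: stop at the first failing check.
    for check in CHECKS:
        error = check(title)
        if error:
            return [error]
    return []


def all_failures(title: str) -> list[str]:
    # Run every check and report everything that fails.
    return [e for e in (check(title) for check in CHECKS) if e]


title = "[" + "A" * 300
print(first_failure(title))  # ['invalid characters']
print(all_failures(title))   # ['invalid characters', 'too long']
```

Since Translate calls the parser rather than owning these checks, it only sees the first result. That matches the comment above.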

Change 955329 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] Ensure translation unit title is a valid MediaWiki title

https://gerrit.wikimedia.org/r/955329

Change 956393 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] PageTranslation: Validate display title unit id only if translatable

https://gerrit.wikimedia.org/r/956393

Change 957263 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] MessageGroupReview: Move group priorities related method

https://gerrit.wikimedia.org/r/957263

Change 957264 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] MessageGroupReview: Move group states related methods into class

https://gerrit.wikimedia.org/r/957264

Change 957265 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] MessageGroupReview: Shorten group id when storing in database

https://gerrit.wikimedia.org/r/957265
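The change title says the group id is shortened before storage but not how. One generic technique, not necessarily what change 957265 does, is to keep ids that fit and replace longer ones with a deterministic hash. The Python sketch below uses a hypothetical `h-` prefix and function name.

```python
import hashlib

HASH_PREFIX = "h-"  # hypothetical marker for shortened ids


def shorten_group_id(group_id: str, max_bytes: int) -> str:
    """Return group_id if it fits in max_bytes, else a 34-character hashed form."""
    if len(group_id.encode("utf-8")) <= max_bytes:
        return group_id
    # MD5 serves only as a compact, deterministic fingerprint here, not for security.
    return HASH_PREFIX + hashlib.md5(group_id.encode("utf-8")).hexdigest()


print(shorten_group_id("page-Foo", 200))                # page-Foo
print(len(shorten_group_id("page-" + "A" * 300, 200)))  # 34
```

Because the result is deterministic, code that writes the id and code that reads it compute the same stored value. The original id cannot be recovered from the hash, so it must be available elsewhere for display.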

I created a sample page with a very long title on translatewiki.net, and then tried to mark it for translation:

https://translatewiki.net/w/i.php?title=Special:PageTranslation&target=User%3AAbijeet+Patro%2F2023%3AProgram%2FSubmissions%2F%D8%A7%D9%84%D8%AA%D8%AD%D9%82%D9%8A%D9%82%2B%D8%A7%D9%84%D8%AA%D8%A7%D8%B1%D9%8A%D8%AE%D9%8A%2B%D9%84%D9%83%D9%8A%2B%D9%8A%D8%AA%D9%85%2B%D9%85%D8%B9%D8%B1%D9%81%2B%D8%A7%D9%84%D9%87%D8%AF%D9%81%2B%D9%85%D9%86%2B%D8%A7%D9%84%D9%85%D8%B9%D9%84%D9%88%D9%85%D8%A7%D8%AA%2B%D8%A7%D9%84%D9%85%D9%86%D8%B4%D9%88%D8%B1%D9%87%2B%D9%84%D9%83%D9%8A%2B%D9%84%D8%A7%2B%D8%AA%D8%B3%D8%AA%D8%AE%D8%AF%D9%85%2B%D9%87%D8%B0%D9%87%2B%D8%A7%D9%84%D9%85%D9%86%D8%B5%D9%87%2B%D9%84%D9%86%D8%B4%D8%B1%2B%D8%A7%D9%84%D9%85%D8%B9%D9%84%D9%88%D9%85%D8%A7%D8%AA%2B%D8%A7%D9%84%D9%85%D8%B8%D9%84%D9%84%D9%87%2B%D8%A7%D9%8A%2B-%2BJPLNWY&do=mark

[Screenshot]

I renamed this page and then tried marking it for translation again:

https://translatewiki.net/w/i.php?title=Special:PageTranslation&target=User%3AAbijeet+Patro%2F2023%3AProgram%2FSubmissions%2F%D8%A7%D9%84%D8%AA%D8%AD%D9%82%D9%8A%D9%82%2B%D8%A7%D9%84%D8%AA%D8%A7%D8%B1%D9%8A%D8%AE%D9%8A%2B%D9%84%D9%83%D9%8A%2B%D9%8A%D8%AA%D9%85%2B%D9%85%D8%B9%D8%B1%D9%81%2B%D8%A7%D9%84%D9%87%D8%AF%D9%81%2B%D9%85%D9%86%2B%D8%A7%D9%84%D9%85%D8%B9%D9%84%D9%88%D9%85%D8%A7%D8%AA%2B%D8%A7%D9%84%D9%85%D9%86%D8%B4%D9%88%D8%B1%D9%87%2B%D9%84%D9%83%D9%8A%2B%D9%84%D8%A7%2B%D8%AA%D8%B3%D8%AA%D8%AE%D8%AF%D9%85%2B%D9%87%D8%B0%D9%87%2B%D8%A7%D9%84%D9%85%D9%86%D8%B5%D9%87%2B%D9%84%D9%86%D8%B4%D8%B1%2B%D8%A7%D9%84%D9%85%D8%B9%D9%84%D9%88%D9%85%D8%A7%D8%AA%2B%D8%A7%D9%84%D9%85%D8%B8%D9%84&do=mark

[Screenshot]

If I uncheck "Allow translation of page title", I'm able to mark the page for translation.
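A plausible explanation is that the display title unit has a much longer id than the numbered units, so it leaves less room for the page title. The Python arithmetic below assumes two things. First, unit pages are named `<full page title>/<unit id>/<language code>` under the 255-byte limit. Second, the display title unit id is `Page display title`. `page_title_budget` is a hypothetical helper.

```python
MAX_TITLE_BYTES = 255


def page_title_budget(unit_id: str, lang: str) -> int:
    """Bytes left for the page title in "<page>/<unit id>/<lang>"."""
    suffix = f"/{unit_id}/{lang}"
    return MAX_TITLE_BYTES - len(suffix.encode("utf-8"))


print(page_title_budget("1", "hi"))                   # 250
print(page_title_budget("Page display title", "hi"))  # 233
```

Under these assumptions, a page title of 234 to 250 bytes fits every numbered unit but not the display title unit. Longer language codes shrink both budgets further.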

I translated the page into Hindi and can see the group stats for this page updating properly.

I did a quick check on Meta-Wiki, and the new code works as expected.