
Translate tag doesn't get parsed on action=parse
Closed, Resolved, Public

Description

Visit:
https://meta.wikimedia.org/wiki/Special:ApiSandbox#action=parse&format=json&page=Community_Engagement&prop=text&wrapoutputclass=&section=0

The response contains

<translate>

Why?

The same happens for tvar tags.

Acceptance criteria

  • <translate> is not present in the output of the API request
  • <tvar| is not present in the output of the API request
  • After deployment to the test wiki, visiting testwiki does not show the translate and tvar tags
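The first two criteria can be checked mechanically. The sketch below is a minimal Python illustration, not part of the task: build_parse_url and leftover_tags are hypothetical helper names, and the /w/api.php endpoint path is assumed to be the standard one for meta.wikimedia.org. Because the parser may HTML-escape a tag it does not recognise, the check looks for both the raw prefix and the &lt;-escaped form.

```python
from urllib.parse import urlencode

# Assumed endpoint: the standard API path on meta.wikimedia.org.
API_URL = "https://meta.wikimedia.org/w/api.php"

# Tag prefixes the acceptance criteria say must not survive parsing.
# Prefixes (not full tags) so that "<translate nowrap>" also counts.
FORBIDDEN_PREFIXES = ("translate", "tvar|")


def build_parse_url(page: str, section: int) -> str:
    """Hypothetical helper: the action=parse request from the task description."""
    params = {
        "action": "parse",
        "format": "json",
        "page": page,
        "prop": "text",
        "section": section,
    }
    return API_URL + "?" + urlencode(params)


def leftover_tags(html_text: str) -> list:
    """Hypothetical helper: return forbidden prefixes found raw or HTML-escaped."""
    found = []
    for prefix in FORBIDDEN_PREFIXES:
        if "<" + prefix in html_text or "&lt;" + prefix in html_text:
            found.append(prefix)
    return found


print(build_parse_url("Community_Engagement", 0))
print(leftover_tags("<p>&lt;translate&gt;\nCommunity Engagement</p>"))  # ['translate']
```

Against the live wiki, the HTML to pass to leftover_tags sits under parse.text["*"] in the default (formatversion=1) JSON response; an empty list means both criteria hold for that request.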

Event Timeline

This is mentioned in https://www.mediawiki.org/wiki/Help:Extension:Translate/Page_translation_administration/en#Segmentation: the tags are handled by Translate via the ParserBeforeStrip hook. The parser never sees these tags in normal usage, so in theory there are no issues with improper nesting.

@Nikerabbit I'm not sure I understand your comment. What is your recommendation to fix this? Can you clarify what you think should happen here?

IMO the current output doesn't seem correct.

The parser never sees these tags in normal usage, so in theory there are no issues with improper nesting.

For a bit more context: this is breaking page previews (T167852), which now have to be aware of how translate tags work in order to provide relevant extracts. This is why it is a problem:

[Screenshot: Screen Shot 2017-06-27 at 11.17.41 AM.png]

I notice this seems to happen when a <translate> tag is opened but not closed, so it could be related to unbalanced templates.

One solution might be to replace

<translate>

with

<span class="mw-translate">

It could then benefit from wgTidy, which would close any span left unclosed.
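A rough Python illustration of that proposal, operating on wikitext before parsing. The function name and regular expressions are hypothetical, and this is only the proposal as stated, not the change that was eventually merged. A <translate> left open by a section-0 slice becomes an unclosed span, which Tidy could then close instead of the tag leaking into the output as text.

```python
import re

# Opening tag, optionally with attributes such as "nowrap".
OPEN_RE = re.compile(r"<translate(?:\s[^>]*)?>")
CLOSE_RE = re.compile(r"</translate\s*>")


def translate_to_span(wikitext: str) -> str:
    """Hypothetical rewrite of <translate>...</translate> into a span."""
    text = OPEN_RE.sub('<span class="mw-translate">', wikitext)
    return CLOSE_RE.sub("</span>", text)


# A section-0 slice that opens <translate> but never closes it:
print(translate_to_span("<translate>\nWelcome to Community Engagement"))
```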


It's also not clear how to use the parse API to return translated text. For example, if I wanted the French text, how could I get it?

There are at least two options:

  • Strip the <translate> tags yourself at some point; they don't really add anything useful in previews.
  • Check whether the page is a translatable page source page; if so, fetch the preview from one of the translation pages instead. If and only if the page is marked for translation, pagename/page-language-code is guaranteed to exist and to contain the same content, without any <translate> tags, as of the time the page was last marked for translation. Similarly, there are other language-code subpages for the available translations.
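Both options can be sketched in a few lines of Python. The helper names are hypothetical, the regexes assume the older <tvar|name>value</> variable syntax that the acceptance criteria refer to, and this is only a client-side approximation of what Translate does in PHP via the ParserBeforeStrip hook. For the French question, the second helper gives the page name to pass as page= to action=parse, assuming the page has been marked for translation.

```python
import re

# <translate>, </translate> and variants with attributes (e.g. "nowrap").
TRANSLATE_TAG_RE = re.compile(r"</?translate(?:\s[^>]*)?>")
# Old-style translation variable: <tvar|name>value</>  ->  value
TVAR_RE = re.compile(r"<tvar\|[^>]*>(.*?)</>", re.DOTALL)


def strip_translate_markup(wikitext: str) -> str:
    """Option 1 (hypothetical helper): keep the text, drop the markup."""
    text = TVAR_RE.sub(r"\1", wikitext)
    return TRANSLATE_TAG_RE.sub("", text)


def translation_page_title(source_title: str, lang_code: str) -> str:
    """Option 2 (hypothetical helper): title of a translation subpage.
    Only guaranteed to exist once the source page is marked for translation."""
    return f"{source_title}/{lang_code}"


print(strip_translate_markup("<translate>See <tvar|link>[[Meta]]</> for details.</translate>"))
print(translation_page_title("Community_Engagement", "fr"))
```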

The glossary might help if you are not familiar with the concepts.

Unfortunately, both of these options require you to handle translatable pages specially. Changing Translate to hide unbalanced <translate> tags is something I could consider, if it doesn't make finding and fixing unbalanced tags harder. We do already prevent saving a page where the <translate> tags are not balanced. I think the same issue is actually happening sometimes when previewing section-level edits.

Changing Translate to hide unbalanced <translate> tags is something I could consider, if it doesn't make finding and fixing unbalanced tags harder.

That would be useful, as it would let us defer the problem of making sure these are translated, which is not a top priority right now.

I think the same issue is actually happening sometimes when previewing section-level edits.

Yes, this is where I'm seeing it, and it is also what TextExtracts does internally. It would be good to at least guard against this situation.
It appears in the preview of the editor for https://www.mediawiki.org/w/index.php?title=Manual:System_administration&section=0&action=edit

Do you happen to know whether there is a flag in the parser that I can use to check whether we are previewing the whole page or only part of it (and which also works for TextExtracts)? That would make the change I proposed uncontroversial to me.
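The guard being discussed can be sketched as follows. This is a Python illustration, not the code from Gerrit change 361847: section_preview is a hypothetical flag standing in for whatever the parser exposes (TextExtracts passes the sectionpreview parameter to the parse API, per change 361892), and the rule of dropping only unmatched opening or closing tags is an assumption based on that change's title.

```python
import re

# Matches <translate ...> (group 1 empty) and </translate> (group 1 == "/").
TAG_RE = re.compile(r"<(/?)translate(?:\s[^>]*)?>")


def remove_unbalanced_translate_tags(text: str) -> str:
    """Drop <translate> tags that have no partner; keep balanced pairs."""
    tags = list(TAG_RE.finditer(text))
    open_stack = []  # indexes (into tags) of openers not yet closed
    unmatched = set()
    for i, match in enumerate(tags):
        if match.group(1):  # closing tag
            if open_stack:
                open_stack.pop()
            else:
                unmatched.add(i)  # closer with no opener
        else:
            open_stack.append(i)
    unmatched.update(open_stack)  # openers that were never closed

    # Rebuild the text, skipping the unmatched tags.
    pieces, pos = [], 0
    for i, match in enumerate(tags):
        if i in unmatched:
            pieces.append(text[pos:match.start()])
            pos = match.end()
    pieces.append(text[pos:])
    return "".join(pieces)


def preprocess(text: str, section_preview: bool) -> str:
    """Clean up only in section preview, so full-page edits still expose
    unbalanced tags to the editor."""
    return remove_unbalanced_translate_tags(text) if section_preview else text
```

Gating on the flag keeps the existing behaviour for whole-page previews, where a visible stray tag is what lets editors find and fix the imbalance.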

Change 361847 had a related patch set uploaded (by Nikerabbit; owner: Nikerabbit):
[mediawiki/extensions/Translate@master] Clean-up any unbalanced <translate> tags during section preview

https://gerrit.wikimedia.org/r/361847

Change 361892 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/extensions/TextExtracts@master] Send sectionpreview parameter on TextExtract parse

https://gerrit.wikimedia.org/r/361892

I will take a look at this today.

Change 361971 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/extensions/Translate@master] Collapse tvar tags in section preview

https://gerrit.wikimedia.org/r/361971

Change 361847 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] Clean-up any unbalanced <translate> tags during section preview

https://gerrit.wikimedia.org/r/361847

Change 361892 merged by jenkins-bot:
[mediawiki/extensions/TextExtracts@master] Send sectionpreview parameter on TextExtract parse

https://gerrit.wikimedia.org/r/361892

All patches are fixed. I'll make sure to verify the fix and sign it off.

Change 361971 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] Collapse tvar tags in section preview

https://gerrit.wikimedia.org/r/361971

All patches are fixed. I'll make sure to verify the fix and sign it off.

Should someone else sign off on this as you submitted/reviewed changes that fixed the issue?

I can, but not until the next deploy.
There's also no train next week... :/

It's possible to sign this off now by following steps in acceptance criteria.

phuedx updated the task description.

I've dropped the "This fix/new behaviour is documented in the extension's docs." acceptance criterion because I think it's already covered in https://www.mediawiki.org/wiki/Help:Extension:Translate/Page_translation_administration/en#Segmentation.