
Tool to convert Wikitext into translatable wiki-text
Open, Needs Triage, Public

Description

Challenge

Currently, adding translate tags to a page is a manual process that involves a small learning curve and can get tedious, for example when a page contains several tables and lists.

Proposed solution

Create a tool into which a user pastes wikitext, and which then adds translate tags according to the guidelines. Pages can be very complex, but v1 can support basic pages and be expanded later.

Add support for the below items:

Event Timeline

Hi @KCVelaga, is this something you are expecting the tool to perform?

[Attached screenshot: image.png]

@Gopavasanth Thank you, it is pretty much what we want. I have also started working on the initial backend to fetch the wikitext from the API, convert it, and publish it (I haven't fully tested it, though). Please publish this, and we can create issues and improve it. We can name the tool translatable-wikitext-converter. Also, a quick note: we should only add <translate> tags; the indices will be added by the Translate extension once the page is marked for translation.
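To illustrate the note about indices, here is a minimal sketch of how a converter could drop unit markers from pasted wikitext so that its output contains only bare translate tags. It assumes the markers have the usual `<!--T:1-->` form; the function name `strip_unit_markers` is hypothetical and not taken from the tool.

```python
import re

# Assumption: unit markers look like <!--T:1-->, optionally followed by
# spaces/tabs and a single newline.
UNIT_MARKER = re.compile(r"<!--T:[^>]*?-->[ \t]*\n?")


def strip_unit_markers(wikitext):
    """Remove unit markers so the extension can assign them itself."""
    return UNIT_MARKER.sub("", wikitext)


if __name__ == "__main__":
    sample = "<translate>\n<!--T:1-->\nHello\n</translate>"
    print(strip_unit_markers(sample))
```

The markers would then be assigned again when the page is marked for translation.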

Cool. We can plan the next steps based on that. I have also started working on some backend code related to T374784: Add support for Wikitext Tables. I will make an MR next week.

For https://phabricator.wikimedia.org/tag/tool-translatetagger/
Can we make the name more specific, something like translatable-wikitext-converter?

Also, a note for later: we can consider using Wikimedia Codex: https://doc.wikimedia.org/codex/main/

@KCVelaga, I guess translatetagger is a better name than translatable-wikitext-converter.
Another suggestion: tag-translatable-wikitext

I have been looking into this since my conversation with Gopavasanth at the Wikimedia Technology Summit in Hyderabad. It is a good start. Has there been any improvement since then?

We made some good improvements today, during the first day of the Indic MediaWiki Hackathon in Bhubaneswar. For the sake of transparency, I have asked for ideas and suggestions on the TA noticeboards on Wikimedia Commons, Meta-Wiki and Wikidata.

Such a nice revival of T348497: Special:PagePreparation improvements [FY2023/24-Q2]. 🙂 I wrote there that I believe a set of heuristics can add proper markup in most cases:

  • For lists, the tool should wrap each item in its own translate tag pair.
  • For headings, the tool should ensure there is an empty line just before and just after it.
  • For tables, the tool should wrap each cell in its own translate tag pair. If the cell contains a new line, it should ensure there is an empty line just after the first line of the cell (so the next paragraph of the cell is isolated in its own unit).
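As a rough illustration of these three heuristics, here is a minimal line-based sketch. It is not the tool's actual implementation. The function name `add_translate_tags` and the regular expressions are assumptions made for this example. It only handles single-line list items and single-line table cells, and it does not cover cell attributes, inline `||` cells, multi-line cells, templates, or the wrapping of paragraphs and headings themselves.

```python
import re

# Assumed, simplified patterns for this sketch only.
HEADING = re.compile(r"^(={1,6})\s*.+?\s*\1\s*$")   # == Title ==
LIST_ITEM = re.compile(r"^([*#]+)\s*(.+)$")         # * item / # item
CELL = re.compile(r"^([|!])(?![-+}])\s*(.+)$")      # | cell / ! header


def wrap(text):
    """Wrap a piece of text in its own translate tag pair."""
    return "<translate>" + text + "</translate>"


def add_translate_tags(wikitext):
    out = []
    after_heading = False
    for line in wikitext.splitlines():
        # Ensure an empty line just after a heading.
        if after_heading and line.strip():
            out.append("")
        after_heading = False

        if HEADING.match(line):
            # Ensure an empty line just before a heading.
            if out and out[-1].strip():
                out.append("")
            out.append(line)
            after_heading = True
        elif (m := LIST_ITEM.match(line)) or (m := CELL.match(line)):
            # Each list item or table cell gets its own tag pair.
            out.append(m.group(1) + " " + wrap(m.group(2)))
        else:
            # Table structure ({|, |-, |}) and other lines pass through.
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    sample = "Intro\n== Title ==\n* One\n{|\n|-\n! Head\n| Cell\n|}"
    print(add_translate_tags(sample))
```

A line-based pass keeps the sketch short. A real converter would more likely need a proper wikitext parser to tell a table cell apart from, say, a template parameter line that also starts with `|`.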

I think there are two exceptions:

  • Images: translators may want to fully replace some textual images, whereas for pictures we only want the captions to be translated.
  • Some template parameters are string constants (no localization expected), whereas others are literal content which should be translated. We could use TemplateData (string vs. content parameter types), but it is rarely well provided.

For images and template calls, the early version of the tool should probably not treat them specially and should make them fully translatable like any other text. A more advanced version of the tool would include a prompt allowing the user to choose between options:

  • Do you want to allow translators to replace this image with another one (i.e. a localized version of the image)?
  • Do you want to allow translators to change the value of this template parameter (i.e. translating the text)?

Thanks for the comment. I believe the tool currently handles tables and lists well, as well as file captions. The latest updates to the tool also add Translation magic words for categories, and add Special:MyLanguage wherever necessary. Handling template calls and file calls in exceptional cases (wherever localization is needed) is a pending task, which we are aiming to get done soon.
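For readers unfamiliar with these two conventions, here is a minimal sketch of what such rewrites could look like. It assumes the magic word in question is `{{#translation:}}` appended to category names, and that internal links are rewritten to `[[Special:MyLanguage/Target|Label]]`. The function names and the list of skipped prefixes are hypothetical, and interwiki links are not detected.

```python
import re

# Assumed, simplified patterns for this sketch only.
CATEGORY = re.compile(r"\[\[(Category:[^\]|{]+)((?:\|[^\]]*)?)\]\]")
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")
SKIP_PREFIXES = ("file:", "image:", "category:", "special:", "#")


def localize_categories(wikitext):
    """Append {{#translation:}} to category names that lack it."""
    return CATEGORY.sub(
        lambda m: "[[" + m.group(1) + "{{#translation:}}" + m.group(2) + "]]",
        wikitext,
    )


def localize_links(wikitext):
    """Point plain internal links at Special:MyLanguage."""
    def repl(m):
        target = m.group(1).strip()
        label = m.group(2) if m.group(2) is not None else target
        if target.lower().startswith(SKIP_PREFIXES):
            return m.group(0)  # leave files, categories, etc. untouched
        return "[[Special:MyLanguage/" + target + "|" + label + "]]"
    return LINK.sub(repl, wikitext)


def localize(wikitext):
    return localize_links(localize_categories(wikitext))


if __name__ == "__main__":
    print(localize("See [[Main Page|home]]. [[Category:Help]]"))
```

Both functions leave already-converted input unchanged, so running the sketch twice on the same text should not stack prefixes or magic words.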