Load a single section in Content translation's editor
Open, High, Public

Description

As part of Section Translation (T243495), we want to support expanding existing articles by translating new sections. Although the designs have not been finalized yet, for the step where users do the actual translation it makes sense to reuse the translation editor that Content translation provides, at least on desktop. This would extend the Content translation editor with a mode similar to the "edit section" capability of the wikitext and visual editors.

Currently, the Content translation editor loads a complete article. This ticket proposes extending it so that it can load a single section instead. For example, based on a URL parameter, it should be possible to load the History section of the Ukulele article.

Design details

Expanding Content translation editor with a "section" mode requires some considerations:

  • The translation title. In this case both the article title (non-editable) and the section title (editable in the translation) will be shown.
  • Publishing behaviour. Publishing will add the new section to the target article at the end of the document. This will be refined in follow-up tickets (adjusting the action and messaging to the circumstances).
    • Content published through section translation will include the "section-translation" tag in addition to the usual "content-translation" one.
  • Publish settings. We may need to initially remove the option to customize the target namespace when translating sections.
  • Access through the URL. As a first step, the section mode will be accessible through a URL parameter. Once the overall workflow for section translation is specified, the UI supporting other steps (e.g., letting the user pick a section to translate) will connect to the current step without the need for manually creating a URL.
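A minimal Python sketch of the URL-based entry point, only to illustrate the decision logic (Content translation itself is JavaScript/PHP). It assumes query parameters named `page`, `from`, `to` and `section`, where a non-empty `section` selects section mode; these names and the `TranslationRequest`/`parse_cx_url` helpers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs, urlparse


@dataclass
class TranslationRequest:
    page: str
    source_lang: str
    target_lang: str
    section: Optional[str]  # None means full-article mode

    @property
    def mode(self) -> str:
        return "section" if self.section else "article"


def parse_cx_url(url: str) -> TranslationRequest:
    """Parse a hypothetical CX URL; a non-empty `section` parameter selects section mode."""
    # parse_qs drops blank values by default, so "section=" behaves like no section.
    query = parse_qs(urlparse(url).query)

    def first(name: str) -> Optional[str]:
        values = query.get(name)
        return values[0] if values else None

    page = first("page")
    if page is None:
        raise ValueError("missing 'page' parameter")
    return TranslationRequest(
        page=page,
        source_lang=first("from") or "",
        target_lang=first("to") or "",
        section=first("section"),
    )
```

For example, a URL ending in `?page=Ukulele&from=en&to=tl&section=History` yields section mode for the History section, while the same URL without `section` stays in article mode.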

Apart from the differences noted, the translation workflow should work in the same way it does when a full article is translated.


Since some limitations of the current database schema may apply, it may be good to keep the following tickets in mind:

More efficient loading of a single section may require support from the Parsing team, and will be explored in T237614: Explore ways to avoid loading the whole article when showing only one section

Details

Related Gerrit Patches:
operations/deployment-charts (master): Update cxserver to 2020-02-05-051751-production
mediawiki/services/cxserver (master): Add mw-section-number data attribute to distinct sections in an article
mediawiki/extensions/ContentTranslation (master): Split CX into section translation and article translation
mediawiki/extensions/ContentTranslation (master): Allow to load only a single section
mediawiki/services/cxserver (master): Extract section titles
mediawiki/extensions/ContentTranslation (master): Add link to target article if it exists
mediawiki/extensions/ContentTranslation (master): Allow publishing of section translation
mediawiki/extensions/ContentTranslation (master): Replace article title with section title
mediawiki/extensions/ContentTranslation (master): Section translation
mediawiki/services/cxserver (master): Section translation test

Event Timeline

Restricted Application added a subscriber: Aklapper. Oct 1 2019, 11:54 AM
Pginer-WMF triaged this task as Medium priority. Oct 1 2019, 11:55 AM
Pginer-WMF updated the task description. Oct 9 2019, 10:23 AM
Pginer-WMF raised the priority of this task from Medium to High. Oct 9 2019, 11:56 AM

@Pginer-WMF have you considered allowing users to choose where their published section translations should end up? In your mock-ups, there was a way to choose which section of the source to translate. Similarly, we could let users choose where in the target article the newly translated section is inserted.

Yes, there are some possible approaches I considered:

  • When the user selects the section, a preview is shown for the place where the published contents will be added. We can add an option to change the destination.
  • When publishing we can add an extra step for the user to select the destination.
  • After publishing, we can provide an option to move the section (only requiring additional effort when the default placement is wrong).

My initial thinking is that adding it at the end by default may be enough for the initial version, assuming that the default position (with some considerations, such as always keeping the references section at the end) will often be the intended place. In other words, it may not be worth spending time on a destination placement selector until we observe that it is really needed. Based on user research, we'll explore the different approaches to check which works best.

Technical implementation related notes:

  1. Even though we want to show only one section of the source article to users, in the background we need to load the full article so that we can resolve inter-content references.
    • Show the full source article and highlight only the selected section. For the target article, either (a) show the full existing article or (b) show just a placeholder for the selected section. Option (a) has the problem of aligning target content against source content, which is not easy to resolve (maybe it can be solved if the target article is not aligned at all). Pros: seeing the whole context of source and target will help translators and may even allow them to select where the published section goes.
    • Hide everything except the selected section. Cons: translators miss the context of the article.
  2. Since we need to load the full source article, there is no need for any new API in cxserver.
  3. The publishing API would require changes so that we don't overwrite the existing article with a single section.
    • We need to find the section offset at which to insert the new content. How exactly can this be done? (a) If we load the full target article in the UI, we can do the insertion in the browser. (b) If we do this in the publish API (PHP), we need to fetch the existing target article content there and parse it.
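For option (b), finding the insertion offset could look roughly like this Python sketch over the target wikitext. It assumes level-2 `== Heading ==` headings and a hypothetical list of trailing section titles that should stay last (in line with keeping the references section at the end). Real wikis use localized titles, and Parsoid HTML would likely be a more robust input than regular expressions over wikitext.

```python
import re
from typing import Set

# Hypothetical titles that should remain at the bottom of an article.
TRAILING_TITLES = {"References", "Notes", "See also", "External links"}

# Level-2 headings only: "== Title ==" but not "=== Title ===".
H2 = re.compile(r"^==\s*([^=].*?)\s*==\s*$", re.MULTILINE)


def insertion_offset(wikitext: str, trailing: Set[str] = TRAILING_TITLES) -> int:
    """Offset for a new section: before the first trailing section, else the end."""
    for match in H2.finditer(wikitext):
        if match.group(1) in trailing:
            return match.start()
    return len(wikitext)


def insert_section(wikitext: str, title: str, body: str) -> str:
    """Return the wikitext with a new level-2 section inserted at the default place."""
    offset = insertion_offset(wikitext)
    before, after = wikitext[:offset], wikitext[offset:]
    if before and not before.endswith("\n"):
        before += "\n"
    return before + f"== {title} ==\n{body.strip()}\n\n" + after
```

Inserting a "Construction" section into an article that ends with "== References ==" places it just above that heading, while an article without trailing sections simply gets it appended.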

Looks mostly OK.

Just a few things:

One:

Publishing will add the new section to the target article at the end of the document. This will be refined in follow-up tickets (adjusting the action and messaging to the circumstances).

Does this mean that there is a plan that in the future there will be a way to publish somewhere other than the end of the article?

Two:
In the current image, the name of the section has the same appearance as the name of the article in the full-article mode. The name of the article is a new element, shown in a small font at the top. I suspect that this may be confusing. It makes more sense to me to show the article name and the section name identically to how they are shown in the full-article mode.

Three:
It's not exactly about this visual design, but generally about section translation: How will section translation actions be counted in CXStats? This feature will need some metrics.

Technical implementation related notes:

Thanks for putting this together, Santhosh.

  1. Even though we want to show only one section of the source article to users, in the background we need to load the full article so that we can resolve inter-content references.

Ok, that seems a safe approach, and I'm fine going in that direction as an initial step. However, I think it would be worth exploring the possibility of properly loading only the specific section. The mobile Visual Editor recently added support for section loading, and may have had to deal with similar challenges. Are they loading the whole article behind the scenes? Is there an alternative way to load references from other paragraphs?

  • Show the full source article and highlight only the selected section. For the target article, either (a) show the full existing article or (b) show just a placeholder for the selected section. Option (a) has the problem of aligning target content against source content, which is not easy to resolve (maybe it can be solved if the target article is not aligned at all).

Pros: Seeing the whole context of source and target will help translators and may even allow them to select where the published section goes.

  • Hide everything except the selected section. Cons: translators miss the context of the article.

The purpose of Section translation is to focus on a particular section. So I'm inclined to hide all the unrelated sections in both the source and the target documents. Other steps in the workflow would help to provide context when needed.

I made a quick test on this translation by applying the following CSS to hide translation units before the "History" section:

#cxSourceSection0, #cxTargetSection0,
#cxSourceSection1, #cxTargetSection1,
#cxSourceSection2, #cxTargetSection2,
#cxSourceSection3, #cxTargetSection3,
#cxSourceSection4, #cxTargetSection4
{display:none}
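The CSS above can be generated for any selected range. A small Python sketch, assuming the `#cxSourceSectionN` / `#cxTargetSectionN` ids used in this test are numbered contiguously from 0 and that the total number of sections is known; `hide_sections_css` is a hypothetical helper. With the kept range starting at index 5, it reproduces the rule above exactly.

```python
def hide_sections_css(total: int, keep: range) -> str:
    """Build a CSS rule hiding every source/target section pair outside `keep`."""
    selectors = [
        f"#cxSourceSection{i}, #cxTargetSection{i}"
        for i in range(total)
        if i not in keep
    ]
    if not selectors:
        return ""  # nothing to hide
    return ",\n".join(selectors) + "\n{display:none}"
```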

With this CSS applied, paragraph alignment works (adjusting when the translation becomes longer), and references that are created in other paragraphs (like the last one) seem to be present.

  1. Since we need to load the full source article, there is no need for any new API in cxserver.
  2. The publishing API would require changes so that we don't overwrite the existing article with a single section.
    • We need to find the section offset at which to insert the new content. How exactly can this be done? (a) If we load the full target article in the UI, we can do the insertion in the browser. (b) If we do this in the publish API (PHP), we need to fetch the existing target article content there and parse it.

Regarding publishing it would be great to make it resistant to changes that may happen in the article while the user is translating (especially if those happen in other sections the user is not working on).

Loading the target article seems to add unnecessary complexity by processing content that is not going to be modified. The classic wikitext editor has supported section editing for a long time, so maybe we can check how it supports appending new content to a particular section.
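For reference, MediaWiki's `action=edit` API can already append a section without sending the rest of the page: `section=new` together with `sectiontitle` adds the text as a new section at the end of the current revision, and `tags` applies change tags. The Python sketch below only builds the request parameters (no network call, no token retrieval). The tag names are the ones in the task description and may differ from the tags actually registered on the wikis; `append_section_params` is a hypothetical helper. Note that `section=new` always appends at the very end, so on its own it would not keep a references section last.

```python
from typing import Dict, Sequence


def append_section_params(
    title: str,
    section_title: str,
    section_wikitext: str,
    csrf_token: str,
    tags: Sequence[str] = ("content-translation", "section-translation"),
) -> Dict[str, str]:
    """Build action=edit parameters that append one new section to an existing page.

    With section=new the server appends to the latest revision, so the client
    never re-sends (and cannot overwrite) the other sections of the article.
    """
    return {
        "action": "edit",
        "format": "json",
        "title": title,
        "section": "new",
        "sectiontitle": section_title,
        "text": section_wikitext,
        "tags": "|".join(tags),  # assumed tag names; they must be defined on the wiki
        "nocreate": "1",  # expanding an existing article: fail rather than create
        "token": csrf_token,
    }
```

These parameters would be POSTed to the wiki's `api.php`, for example encoded with `urllib.parse.urlencode`.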

Another concern about loading too much content: would this path allow multiple users to translate different sections and publish them without stepping on each other's toes? That is, one user starts translating the "History" section of the Ukulele article into Tagalog while another user starts translating the "Tuning" section, and both can publish their sections in any order without destroying each other's work.
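A toy Python model of this scenario (pages as plain strings, no real API; `overwrite`, `append` and `two_translators` are hypothetical names) shows why appending to the latest revision is the safer publishing strategy: saving a locally loaded copy of the whole article silently drops the other translator's section, while appending only the new section keeps both, in either order. In practice MediaWiki's edit-conflict detection may catch the overwrite case, but appending avoids depending on sections the user never touched.

```python
from typing import Callable

# (copy loaded at start, latest revision, new section) -> saved page text
Publish = Callable[[str, str, str], str]


def overwrite(loaded: str, latest: str, new_section: str) -> str:
    """Save the copy loaded when translation started, plus the new section."""
    return loaded + new_section  # `latest` is ignored, so other users' edits are lost


def append(loaded: str, latest: str, new_section: str) -> str:
    """Append only the new section to the revision current at publish time."""
    return latest + new_section  # `loaded` is ignored: nothing else is re-sent


def two_translators(publish: Publish) -> str:
    """User B translates "Tuning", user A translates "History"; both load the same revision."""
    base = "Intro.\n"
    loaded_a = loaded_b = base
    page = base
    page = publish(loaded_b, page, "== Tuning ==\nB's text.\n")  # B publishes first
    page = publish(loaded_a, page, "== History ==\nA's text.\n")  # then A
    return page
```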

Notes on how section translation differs from full article translation

  1. A section translation is not equivalent to full article translation in terms of CX Statistics.
  2. A translator cannot say I created this article using CX.
  3. So this concept requires differentiating between full translation and section translation in
    • database,
    • apis,
    • corpora relations,
    • statistics - APIs and CXStats page
    • CX dashboard.
  4. When we introduce section translation, more than one translator can work on the same sourcearticle-sourcelanguage-targetlanguage-targettitle combination. We will need to rethink and improve our database schema and data access classes to get this right.
  5. While we are at it, this gets close to the idea of multiple translators doing a full translation of the same article at the same time (TODO: link ticket for that)
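One way to picture the schema change, as a Python sketch rather than an actual table definition: extend the translation key with an optional section, so several translators can work on the same article tuple as long as each row is distinguished by section and translator. `TranslationKey`, `TranslationStore`, the `"draft"` status and all field names are hypothetical and do not mirror the real Content translation tables.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class TranslationKey:
    source_title: str
    source_lang: str
    target_lang: str
    target_title: str
    section: Optional[str] = None  # None means a full article translation

    def article(self) -> Tuple[str, str, str, str]:
        return (self.source_title, self.source_lang, self.target_lang, self.target_title)


class TranslationStore:
    """In-memory stand-in for a translations table: one row per (key, translator)."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[TranslationKey, str], str] = {}

    def start(self, key: TranslationKey, translator: str) -> None:
        # Starting again with the same key and translator does not add a row.
        self._rows.setdefault((key, translator), "draft")

    def translators_of(self, key: TranslationKey) -> List[str]:
        """Everyone working on the same article tuple, regardless of section."""
        return sorted({t for (k, t) in self._rows if k.article() == key.article()})

    def full_translations(self) -> List[TranslationKey]:
        """Only full-article translations, e.g. for "articles created with CX" stats."""
        return [k for (k, _) in self._rows if k.section is None]
```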

Looks mostly OK.
Just a few things:
One:

Publishing will add the new section to the target article at the end of the document. This will be refined in follow-up tickets (adjusting the action and messaging to the circumstances).

Does this mean that there is a plan that in the future there will be a way to publish somewhere other than the end of the article?

Yes, but there are many possibilities. We need to observe users to learn whether the default placement is right most of the time. Depending on how frequent changing the default position is needed, we can provide a more or less prominent way to change the default. Some possibilities:

  • If users mostly translate sections in the order they expect them to appear, a follow-up option to move the section after it is published may be enough to correct the few cases where the default is not ideal.
  • If users mostly need a section to go in a different, specific place, an additional step to select the destination may be convenient.

So for the first iteration, I think it makes sense to start with a basic default.

Two:
In the current image, the name of the section has the same appearance as the name of the article in the full-article mode. The name of the article is a new element, shown in a small font at the top. I suspect that this may be confusing. It makes more sense to me to show the article name and the section name identically to how they are shown in the full-article mode.

Good point. Here I was trying to emphasize the main element the user is working on (the section), keeping the article as a secondary contextual element. However, it is true that this contradicts the usual document hierarchy, and we need to check how much distraction/confusion this may generate. In any case, I think that both pieces of information are needed and adjusting the style in one direction or another does not seem to be a blocker for the technical exploration.

Three:
It's not exactly about this visual design, but generally about section translation: How will section translation actions be counted in CXStats? This feature will need some metrics.

I'd expect section translation to be reflected as an edit with a special tag ("section-translation"). The metrics defined for the current fiscal year (T226171) are defined to allow obtaining the number of articles translated (with the current Content translation workflow), sections translated (with the future section translation workflow), or both.

Note that with this approach the articles translated with the classic Content translation workflow may contain several sections, but those are not counted as independent section translations.
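A Python sketch of tag-based counting, assuming each published edit is available together with its set of change tags (how those are queried is out of scope). Tag names follow the task description and may differ from the registered ones; `translation_metrics` is a hypothetical helper. Since section translations carry both tags, the section tag is checked first, and an article translated with the classic workflow counts once no matter how many sections it contains.

```python
from typing import Dict, Iterable, Set

CX_TAG = "content-translation"  # tag names as written in the task; real ones may differ
SECTION_TAG = "section-translation"


def translation_metrics(edit_tags: Iterable[Set[str]]) -> Dict[str, int]:
    """Count published translations by change tag: articles, sections, and both."""
    counts = {"articles": 0, "sections": 0}
    for tags in edit_tags:
        if SECTION_TAG in tags:  # section translations also carry CX_TAG, so check first
            counts["sections"] += 1
        elif CX_TAG in tags:
            counts["articles"] += 1
    counts["total"] = counts["articles"] + counts["sections"]
    return counts
```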

Pginer-WMF updated the task description. Oct 15 2019, 11:46 AM

This topic was discussed in detail, and the current understanding is as follows:

  1. The section translation workflow within CX, from start to publish, will be a minimal CX both in terms of technology and user experience. CX will have a newly defined mode; we can name it properly later, for now "minimal".
  2. Minimal mode will be used for section translation. It will NOT have the following features:
    • Autosave or any other kind of saving. Start, edit and publish in one go; no entries are made in the CX central database.
    • Category display and actions to add or remove categories
    • Progress calculation
    • Translation-progress-based validation or abuse filter checks, and hence no error cards at all
    • Namespace selection, since the article already exists
  3. The CX dashboard does not show any ongoing section translation or any statistics. No changes there.
  4. CX Stats does not show anything about section translation.
  5. Published translations can have an edit tag.
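The minimal mode can be thought of as a set of feature switches. The Python sketch below is hypothetical (`CXFeatures`, `features_for` and the flag names are not Content translation code) and assumes the mode is selected by the presence of a `section` URL parameter. Keeping each feature as its own flag means an individual check, such as the MT abuse checks, could later be re-enabled for section translation without a rewrite.

```python
from dataclasses import dataclass, replace
from typing import Mapping


@dataclass(frozen=True)
class CXFeatures:
    autosave: bool = True
    categories: bool = True
    progress: bool = True
    mt_abuse_checks: bool = True
    namespace_selection: bool = True
    dashboard_and_stats: bool = True


FULL = CXFeatures()

# "Minimal" mode as listed above: every one of these features switched off.
MINIMAL = replace(
    FULL,
    autosave=False,
    categories=False,
    progress=False,
    mt_abuse_checks=False,
    namespace_selection=False,
    dashboard_and_stats=False,
)


def features_for(query: Mapping[str, str]) -> CXFeatures:
    """Hypothetical switch: a non-empty `section` parameter selects the minimal mode."""
    return MINIMAL if query.get("section") else FULL
```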

Change 547708 had a related patch set uploaded (by Petar.petkovic; owner: Petar.petkovic):
[mediawiki/services/cxserver@master] Section translation test

https://gerrit.wikimedia.org/r/547708

Change 547709 had a related patch set uploaded (by Petar.petkovic; owner: Petar.petkovic):
[mediawiki/extensions/ContentTranslation@master] Section translation

https://gerrit.wikimedia.org/r/547709

@Petar.petkovic Can you clarify why we need to fetch a single section from cxserver? Don't we need to fetch full article as I explained in above comment?

  1. Even though we want to show only one section of the source article to users, in the background we need to load the full article so that we can resolve inter-content references.
    • Show the full source article and highlight only the selected section. For the target article, either (a) show the full existing article or (b) show just a placeholder for the selected section. Option (a) has the problem of aligning target content against source content, which is not easy to resolve (maybe it can be solved if the target article is not aligned at all). Pros: seeing the whole context of source and target will help translators and may even allow them to select where the published section goes.
    • Hide everything except the selected section. Cons: translators miss the context of the article.
  2. Since we need to load the full source article, there is no need for any new API in cxserver.

Change 548589 had a related patch set uploaded (by Petar.petkovic; owner: Petar.petkovic):
[mediawiki/services/cxserver@master] Add classes to distinct sections in an article

https://gerrit.wikimedia.org/r/548589

Change 548590 had a related patch set uploaded (by Petar.petkovic; owner: Petar.petkovic):
[mediawiki/extensions/ContentTranslation@master] Allow to load only a single section

https://gerrit.wikimedia.org/r/548590

Change 547708 abandoned by Petar.petkovic:
Section translation test

Reason:
Different approach taken

https://gerrit.wikimedia.org/r/547708

Change 547709 abandoned by Petar.petkovic:
Section translation

Reason:
Different approach taken

https://gerrit.wikimedia.org/r/547709

@Petar.petkovic Can you clarify why we need to fetch a single section from cxserver? Don't we need to fetch full article as I explained in above comment?

We don't necessarily need to load the full article. There is still no REST API that returns only one section using Parsoid, and maybe that team had a valid reason not to include such an option.
There is a simpler version catered to mobile devices, which I wanted to try out. Loading the whole article seems lazy to me. There are some challenges with cross-section content, such as named references, but those can be expanded to their full definitions within that one section.
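To illustrate expanding named references for a single section, here is a simplified Python sketch over wikitext: it collects `<ref name="...">...</ref>` definitions from the whole article and replaces reuse-only `<ref name="..." />` tags in the section with the full definition. It assumes double-quoted names and no nested tags, and `expand_named_refs` is a hypothetical helper. CX works with Parsoid HTML, where the equivalent step would operate on the DOM, so treat this only as an illustration of the idea.

```python
import re

# Definitions: <ref name="x">...</ref>  (simplified: double-quoted names, no nesting)
REF_DEF = re.compile(r'<ref\s+name="([^"]+)"\s*>(.*?)</ref>', re.DOTALL)
# Reuses: <ref name="x" />
REF_USE = re.compile(r'<ref\s+name="([^"]+)"\s*/>')


def expand_named_refs(section: str, full_article: str) -> str:
    """Replace reuse-only named refs in `section` with their full definitions."""
    definitions = {name: body for name, body in REF_DEF.findall(full_article)}

    def expand(match: re.Match) -> str:
        name = match.group(1)
        if name in definitions:
            return f'<ref name="{name}">{definitions[name]}</ref>'
        return match.group(0)  # unknown name: leave the tag untouched

    return REF_USE.sub(expand, section)
```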

To get this feature out sooner, I loaded the full article and hid the unnecessary sections. My understanding is that we still don't have a concrete plan for how section translation will work and many questions remain open, so, as a start, we can use the simpler approach you proposed.

The section translation workflow within CX - from start to publish will be a minimal CX both in terms of technology and user experience. CX will have a newly defined mode - we can name it properly, for now "minimal"

This will most likely be a URL flag that enables or disables certain features. In the future, section translation is planned to become more complex, and we may want a smaller set of ResourceLoader (RL) modules to load for such a limited UX. However, RL penalizes the creation of new modules, since their names are loaded on every page view, as discussed many times in the past.

Thanks, @Petar.petkovic for all the details. I captured the idea of avoiding loading extra contents for future iterations, once the initial version is completed: T237614: Explore ways to avoid loading the whole article when showing only one section

Pginer-WMF updated the task description. Nov 8 2019, 12:19 PM

VE added support for section editing earlier this year (T76541). In your current approach, you are still building the full CE tree, and having it laid out. Setting attachedRoot to the required <section> node in ve.dm.Surface will be much faster :)

We don't necessarily need to load the full article.

You will need to, in order to support references defined elsewhere on the page. Also, a change to one section can affect other sections because of references (e.g. deleting a reference can result in its contents being moved to another part of the page).

That said, server-side section editing will give you smaller performance gains than you might think (if you are using attachedRoot instead): T206228#5330185. Parsoid HTML download is usually not a bottleneck; more time is spent building and rendering the CE tree than the DM.

That graph applies to ArticleTarget, so it's not certain the same applies to CX, but if you have performance issues, I would suggest investigating things like loading the page content earlier in the application cycle, in parallel with the editor initialising.

Change 550050 had a related patch set uploaded (by Petar.petkovic; owner: Petar.petkovic):
[mediawiki/extensions/ContentTranslation@master] Split CX into minimal and full version

https://gerrit.wikimedia.org/r/550050

server-side section editing will give you smaller performance gains than you might think (if you are using attachedRoot instead): T206228#5330185. Parsoid HTML download is usually not a bottleneck; more time is spent building and rendering the CE tree than the DM.

Does attachedRoot allow using multiple adjacent <section>s, or do we need to wrap those that define a range between two <h2> headers?

We still haven't decided on a level of granularity we want for section translation, but for initial exploration, I went with a larger set.

Currently attachedRoot only allows a single <section> which Parsoid adds as defined here: https://www.mediawiki.org/wiki/Parsing/Notes/Section_Wrapping

It may be possible to change the code to attach multiple siblings, but it would be easier to just pre-modify the DOM to wrap all the content you want to display in a new <section> tag.
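Pre-wrapping the wanted range could look roughly like this Python sketch using `xml.etree.ElementTree` on a simplified, well-formed body whose children are Parsoid-style `<section data-mw-section-id="N">` elements. In the browser this would be done with DOM APIs before building the surface; `wrap_range` and the `data-cx-wrapper` marker attribute are hypothetical.

```python
import xml.etree.ElementTree as ET


def wrap_range(parent: ET.Element, start: int, end: int) -> ET.Element:
    """Move children start..end (inclusive) of `parent` into one new <section>.

    The returned wrapper could then be used as the single attached root.
    """
    children = list(parent)
    if not 0 <= start <= end < len(children):
        raise ValueError("invalid child range")
    wrapper = ET.Element("section", {"data-cx-wrapper": "true"})  # hypothetical marker
    for node in children[start:end + 1]:
        parent.remove(node)
        wrapper.append(node)  # keeps the original order inside the wrapper
    parent.insert(start, wrapper)  # earlier siblings are untouched, so `start` is still valid
    return wrapper
```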

Let me write a roundup of current state of changes made for section translation support and ask questions coming out of it. Work is not merged, so this is preliminary.

  • The goal is to load articles that already exist in the target wiki, so that we can expand them with one section. The entry point for starting section translation is the presence of the section param in the URL. This means section translation could also be loaded for articles that don't have a translation in the target wiki. What should happen in such a case?
  • The current working version removes all issues shown inside the issue card. In full article translation, we show an error when the user is not allowed to publish in the main namespace of the wiki. We show this only on English Wikipedia, because that community defined an Abuse Filter rule to combat article publishing via Content Translation. That exact Abuse Filter rule would not catch sections published with section translation, because it is based on the edit summary we use for articles published with Content Translation. However, the community might implement a similar solution against section translation, and without the issue tracking system there would be no upfront warning for users. I was looking at the warnings displayed in issue cards and this was one of them. But there are more important ones that we might not want to miss, which leads me to the next point.
  • Do we really want to get rid of MT abuse checks? While section translation is in the exploration phase, it would be nice to anticipate some basic needs we will have in the future and avoid big rewrites that remove parts of the code only to bring them back later. The reasons we have MT abuse checks for article translation apply to section translation as well.
  • The mock-ups in the description (F30517880) show how the title of the section being translated should be edited. @Pginer-WMF and I discussed this during the Language team offsite last week. Article titles have strict rules about which characters they allow, and wiki syntax is not permitted. Section titles are different and could include references, for example. The text area element that contains the article title in full article translation cannot be used for this rich wikitext editing experience. Also, a text area does not make it possible to use MT to aid translation of the section title. Therefore, I ask to revisit the design with this in mind. Below is a screenshot of the current state, where the section title is displayed as one of the paragraphs of the translation. Some tweaks would be needed: the target article title should not be editable, and we would need to make sure the section title gets translated, similar to what we had in CX1.

Let me write a roundup of current state of changes made for section translation support and ask questions coming out of it.

Thanks for the initial work in this front and surfacing these questions. Some comments below:

...section translation could be loaded for articles which don't have translation in target wiki. What should happen in such case?

For this particular case, I think the expected behavior would be to create a new article consisting of the selected section.

Most of the UI workflows for section translation would limit the user choice to sections of articles that already exist in the target wiki. However, for mobile we also plan to support the creation of new articles by using Section translation (not only expanding existing ones). The idea is to treat the lead section as a section that users can translate to start an article. So the proposed behavior would be consistent with that (and also allow for starting with a different section than the lead one). In addition, the proposed behavior can be useful for other tools that may integrate by setting the URL parameters.
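The proposed behaviour can be sketched as a small decision function in Python. `publish_action` is hypothetical; the set of existing titles and the redirect map would come from the target wiki, and only a single redirect hop is followed (if X redirects to Y, Y is expanded and X is left alone).

```python
from typing import Dict, Set, Tuple


def publish_action(
    target_title: str, existing: Set[str], redirects: Dict[str, str]
) -> Tuple[str, str]:
    """Decide whether publishing a section expands an article or creates a new one.

    `redirects` maps a redirect title to its destination; one hop is followed.
    """
    destination = redirects.get(target_title, target_title)
    if destination in existing:
        return ("expand", destination)
    return ("create", destination)
```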

Some considerations (we can create separate tickets for these):

  • Show a message to let the user know that a new article will be created.
  • Make sure redirects are handled properly. If the target page X is a redirect to Y, we should treat Y as the destination. That is, expand Y with the new section (not overwrite X).

Do we really want to get rid of MT abuse checks?

The proposal to skip some checks was for simplification purposes. I think it makes sense to support translating the contents of a section in the same way we support it when translating as part of a larger article. If there are checks that are safe to apply to a fragment of the content, it is great to support them.

My current thinking:

  • If supporting the checks requires additional effort, I'd recommend creating a follow-up ticket.
  • If supporting the checks does not require any effort, verify and document that they work at the section level.

I ask to revisit the design with this in mind. Below is a screenshot of the current state, where the section title is displayed as one of the paragraphs of the translation. Some tweaks would be needed: the target article title should not be editable, and we would need to make sure the section title gets translated, similar to what we had in CX1.

Please let me know whether the following is correct:

  • It is possible to make the page title non-editable and adjust its styling to present it differently (e.g., in a smaller size as in the proposed design)
  • It is hard for the section title to be presented above the language indicators (i.e., "English - view page. Tagalog" line)

So the new design you are asking for can adjust the style but not change the placement of elements. Is that correct? Is that a hard limitation or something that can be refined as part of follow-up tickets?

Making the target title read-only is possible; we already do that for the source title. The text can be styled to be smaller as well.

On the second point: the article title is presented in a separate HTML element, outside the VE editing surface. The section title is inside the VE surface, among many other sibling section nodes, which means we cannot move it above the language indicators. If we want full editing capabilities for the section title, we would need a separate VE surface.

Change 554971 had a related patch set uploaded (by Petar.petkovic; owner: Petar.petkovic):
[mediawiki/extensions/ContentTranslation@master] Add link to target article if it exists

https://gerrit.wikimedia.org/r/554971

Change 555680 had a related patch set uploaded (by Petar.petkovic; owner: Petar.petkovic):
[mediawiki/services/cxserver@master] Extract section titles

https://gerrit.wikimedia.org/r/555680

Change 555681 had a related patch set uploaded (by Petar.petkovic; owner: Petar.petkovic):
[mediawiki/extensions/ContentTranslation@master] Replace article title with section title

https://gerrit.wikimedia.org/r/555681

Change 555756 had a related patch set uploaded (by Petar.petkovic; owner: Petar.petkovic):
[mediawiki/extensions/ContentTranslation@master] Allow publishing of section translation

https://gerrit.wikimedia.org/r/555756

Change 548589 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Add mw-section-number data attribute to distinct sections in an article

https://gerrit.wikimedia.org/r/548589

Change 548590 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] Allow to load only a single section

https://gerrit.wikimedia.org/r/548590

Pginer-WMF updated the task description. Jan 23 2020, 10:29 AM

Change 570515 had a related patch set uploaded (by KartikMistry; owner: KartikMistry):
[operations/deployment-charts@master] Update cxserver to 2020-02-05-051751-production

https://gerrit.wikimedia.org/r/570515

Change 570515 merged by jenkins-bot:
[operations/deployment-charts@master] Update cxserver to 2020-02-05-051751-production

https://gerrit.wikimedia.org/r/570515

Mentioned in SAL (#wikimedia-operations) [2020-02-06T11:38:32Z] <kart_> Updated cxserver to 2020-02-05-051751-production (T244230, T234323)