Research support for cross-wiki content propagation
Open, Needs Triage, Public

Description

During the 2019-20 fiscal year, the Language team plans to support cross-wiki propagation of content, that is, helping users translate relevant pieces of content that are missing from the existing article in their local language (on desktop and mobile). For example, we want to make it easy for a user to translate the "Construction sequence" information from the English version of the "Suspension bridge" article and transfer it into the Portuguese version of the article, where it is missing.

This ticket provides an overview of current and future research work that could help in this context. The areas below describe the support needed, in which ways such support would help, and fallback approaches that can be applied while the necessary capabilities are not yet available.

Template parameter mappings

Templates capture relevant content such as references and infoboxes. Transferring such content across languages is challenging because templates are defined independently in each wiki. Without support for this, translations end up incomplete and editors have to do additional work.

Research in this area can help to map template parameters automatically. Automating this mapping makes it easy to transfer content structured in templates across languages. For example, when a user translates a new paragraph about the latest scientific discovery, the reference is kept in the translation with all its details (book title, page number, etc.), resulting in higher-quality content.
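To make the idea concrete, here is a minimal sketch of what applying a parameter mapping could look like once such a mapping exists. The mapping table and parameter names are purely illustrative assumptions, not taken from any real template or from the work in T221211; parameters without a known mapping are returned separately so that a fallback can deal with them.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from source-wiki parameter names to target-wiki names.
# Illustrative only; real templates may use different parameters.
PARAM_MAP: Dict[str, str] = {
    "title": "título",
    "page": "página",
    "year": "ano",
}


def map_template_params(
    source_params: Dict[str, str], param_map: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Return (mapped parameters, source parameter names with no known mapping)."""
    mapped: Dict[str, str] = {}
    unmapped: List[str] = []
    for name, value in source_params.items():
        target_name = param_map.get(name)
        if target_name is None:
            # No mapping known: leave it for a fallback strategy.
            unmapped.append(name)
        else:
            mapped[target_name] = value
    return mapped, unmapped


mapped, unmapped = map_template_params(
    {"title": "On Bridges", "page": "42", "publisher": "Example Press"},
    PARAM_MAP,
)
```

In this sketch the values are copied unchanged; translating parameter values (for example dates or language names) would be a separate step.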

Status and fallbacks

Work has already started to explore how to automatically map template parameters for popular templates using machine learning (T221211: Parameters matching on Templates: ML Exploration).

For templates that cannot be supported with this approach, we can consider the following fallback approaches:

  • Ignore unsupported templates. Remove the unsupported templates when presenting the content to translate to the user.
  • Highlight the content that could not be transferred, so that the user can add it manually later.
  • Prevent suggestions that include content that cannot be translated. When suggesting content to translate, avoid surfacing content that contains unsupported templates.

These fallback strategies seem acceptable since users don't need to translate every single piece of content, and the translation can still be a useful contribution even if it does not include all of the source content.
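The three fallback strategies above can be sketched as a single preparation step. This is only an illustration under simplifying assumptions: content is modeled as a flat list of items, each either plain text or a single template invocation, and all class and function names here are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Set


class Fallback(Enum):
    IGNORE = "ignore"        # drop unsupported templates from the content shown
    HIGHLIGHT = "highlight"  # leave a marker for the user to fill in later
    SKIP = "skip"            # do not suggest the content at all


@dataclass
class Item:
    text: str
    template: Optional[str] = None  # None means plain text


def prepare_for_translation(
    items: List[Item], supported: Set[str], strategy: Fallback
) -> Optional[List[str]]:
    """Return the pieces to present for translation, or None if not suggested."""

    def is_unsupported(item: Item) -> bool:
        return item.template is not None and item.template not in supported

    if strategy is Fallback.SKIP and any(is_unsupported(i) for i in items):
        return None
    pieces: List[str] = []
    for item in items:
        if not is_unsupported(item):
            pieces.append(item.text)
        elif strategy is Fallback.HIGHLIGHT:
            pieces.append(f"[not transferred: {item.template}]")
        # Fallback.IGNORE: the unsupported template is silently dropped.
    return pieces


items = [
    Item("Intro text."),
    Item("{{Infobox}}", template="Infobox"),
    Item("{{Cite}}", template="Cite"),
]
```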

Section mappings

Sections of an article represent relevant aspects of a topic. Sections are useful as content units to work with. When comparing articles in two different languages, they make it possible to see which aspects have been covered and which ones may be missing.

Research in this area can help to identify sections that are available in one language and missing in another, in order to surface opportunities for the user to contribute. In addition, it can be useful to identify which of those potential contributions are more relevant (in general or for the current user). For example, a German user adds a new section about the latest space mission, and a multilingual user interested in the topic receives a suggestion to translate it into Korean. That user, who speaks both German and Korean, then checks which other sections are still missing in the Korean version and considers adding them from the German one.
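Assuming a mapping between equivalent section titles in two languages is available (producing that mapping is the research problem itself), finding missing sections reduces to a lookup. The sketch below is only an illustration; the function name and the Portuguese titles are assumptions made for the example.

```python
from typing import Dict, List, Tuple


def find_missing_sections(
    source_sections: List[str],
    target_sections: List[str],
    title_map: Dict[str, str],
) -> Tuple[List[str], List[str]]:
    """Return (missing, unknown) source section titles.

    missing: sections whose mapped title is absent from the target article.
    unknown: sections with no entry in title_map, so nothing can be concluded.
    """
    target = set(target_sections)
    missing: List[str] = []
    unknown: List[str] = []
    for title in source_sections:
        equivalent = title_map.get(title)
        if equivalent is None:
            unknown.append(title)
        elif equivalent not in target:
            missing.append(title)
    return missing, unknown


# Illustrative data only; the titles are assumptions for this example.
title_map = {
    "History": "História",
    "Construction sequence": "Sequência de construção",
}
missing, unknown = find_missing_sections(
    ["History", "Construction sequence", "Notable examples"],
    ["História"],
    title_map,
)
```

Keeping the "unknown" sections separate avoids suggesting a section that may already exist under a title the mapping does not cover.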

Status and fallbacks

Research work has already been done to identify relevant missing sections that users can add to an article, based on the sections present in other languages. Initial discussions suggest that this work can be repurposed to identify sections that exist in one language and are missing in another.

Until this approach is available, simpler approaches can be used to prevent this advanced section mapping from becoming a blocker:

  • Focus on articles with no sections at all, so that any section present in another language is guaranteed to be missing from them.
  • Focus on articles that were created with Content translation where no additional sections were added after they were published. In this way the section mapping is already available.
  • Let the users check (and report) if the page contains a given section.
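The first two fallbacks in the list above amount to a simple filter over candidate articles. A minimal sketch follows, assuming the relevant facts about each article are already known; the record fields are hypothetical and do not correspond to any existing API.

```python
from dataclasses import dataclass


@dataclass
class ArticleInfo:
    # Hypothetical fields, assumed to be available from elsewhere.
    title: str
    section_count: int
    created_with_content_translation: bool
    sections_added_since_publication: int


def needs_no_section_mapping(article: ArticleInfo) -> bool:
    """True if the article can be handled without advanced section mapping."""
    if article.section_count == 0:
        # Any section present in another language is missing here.
        return True
    # The mapping from the original translation is still valid.
    return (
        article.created_with_content_translation
        and article.sections_added_since_publication == 0
    )
```

Articles that fail this check would fall under the third fallback, where users check and report whether a given section is present.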

Identifying meaningful facts and updates

During this fiscal year we'll focus on sections, since there are tools available to deal with those more easily, but there are other relevant content updates that users may be interested in transferring across languages. For example, a couple of sentences may capture a new fact about a recent scientific discovery. Transferring this fact to as many languages as possible enables more people to access this knowledge.

Currently our systems understand modifications in terms of characters and edits, but that is not enough to understand what constitutes a meaningful unit of knowledge increment. For example, adding a single numeric value such as the death year may represent a meaningful change about the topic, while a paragraph rewrite that adds no new information may be irrelevant for propagation despite consisting of more edited text.

Research can help to identify meaningful content changes that are worth propagating across languages.
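The gap between edit size and significance can be illustrated with a character-level diff. The sketch below uses Python's standard `difflib` module to count changed characters. The example sentences are invented, and the metric only shows why size alone is a poor signal; it is not a proposal for measuring meaningful changes.

```python
import difflib


def changed_chars(old: str, new: str) -> int:
    """Count characters inserted, deleted, or replaced between old and new."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    total = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # For a replacement, count the longer side of the change.
            total += max(i2 - i1, j2 - j1)
    return total


# A small edit that adds a new fact (a death year).
fact_edit = changed_chars(
    "Born in 1900.",
    "Born in 1900, died in 1980.",
)

# A rewrite that rephrases the sentence without adding information.
rewrite_edit = changed_chars(
    "The span was completed ahead of schedule and opened to traffic.",
    "Builders finished the crossing early; vehicles could then use it.",
)
```

Here the rewrite touches more characters than the added death year, even though only the latter adds new knowledge worth propagating.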

Status and fallbacks

This is not a priority yet, but it will expand the use cases of cross-wiki propagation beyond article sections.

Until this is available, translation will focus on sections, as described above.

Suggestions for specific topic areas

When suggesting content to translate (whether articles, sections, or smaller units), users may be more motivated to work on topics in their areas of interest. Allowing users to select general knowledge areas such as "science" or "art" will help them easily discover content they are motivated to translate.

Research can help to integrate topic maps and recommendation systems in a way that makes it possible to get suggestions for specific topic areas.

Status and fallbacks

Article recommendation mechanisms are currently available. They can be customized by providing an example article (an "article seed") to get suggestions similar to that article. Some work has been done with topic maps, but the two seem to be disconnected for now.

As a fallback approach, the user's recent edits can be used as article seeds to find relevant articles, which can then be proposed for expansion with missing sections, paragraphs, etc.
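The seed-based fallback could be sketched as follows. The `recommend` callable is a hypothetical stand-in for whatever seed-based recommendation mechanism is used; its name and signature are assumptions made for this example, not an existing API.

```python
from typing import Callable, List


def suggestions_from_recent_edits(
    recent_edits: List[str],
    recommend: Callable[[str], List[str]],
    limit: int = 5,
) -> List[str]:
    """Collect up to `limit` suggestions, using recently edited articles as seeds."""
    # Do not suggest articles the user has just edited, and avoid duplicates.
    seen = set(recent_edits)
    suggestions: List[str] = []
    for seed in recent_edits:
        for candidate in recommend(seed):
            if candidate in seen:
                continue
            seen.add(candidate)
            suggestions.append(candidate)
            if len(suggestions) == limit:
                return suggestions
    return suggestions


# Stand-in recommender backed by a fixed table, for illustration only.
fake_similar = {"A": ["B", "C"], "D": ["C", "E", "A"]}


def fake_recommend(seed: str) -> List[str]:
    return fake_similar.get(seed, [])
```

Each suggested article would still need to be checked for missing sections or paragraphs before being proposed to the user.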

Event Timeline

Restricted Application added a subscriber: revi. · May 23 2019, 4:05 PM
Isaac added a subscriber: Isaac. · Jun 4 2019, 8:19 PM
SBisson added a subscriber: SBisson. · Jul 5 2019, 1:46 AM
Miriam added a subscriber: diego. · Jul 11 2019, 3:24 PM