Migrated from: https://wikimedia.mingle.thoughtworks.com/projects/language_engineering/cards/4184
Context
Wikidata provides equivalences for Wikipedia links (e.g., Cheese - Formaggio), so it is possible to guess when the user is typing the translation of a link and to suggest creating that link.
This will not be perfect, since the translated word may differ from the Wikidata label (requiring the user to modify the inserted word or ignore the suggestion altogether, depending on the case), but it will speed up the process in many cases. Detecting word boundaries may also cause problems (e.g., if the user types "a ", will we be able to suggest "a day in the life" if that is one of the suggestions from the source links?).
Narrative
As a user, I can get suggestions for creating links based on the source text, so that I can add links just by typing without extra selection.
Acceptance Criteria
- Given a link in the source English text ("Cheese"), when the user types "for" in the Italian translation, a suggested text (in grey) is shown for "formaggio".
- If the user accepts the suggestion, a link pointing to the corresponding article (based on Wikidata) is created.
- Suggestions for insertion are shown below the current cursor position.
- To avoid frequent false positives, suggestions may be shown only once the user has typed the first 2-3 characters of a candidate word.
- A link from the source is not suggested if it is already present in the target (i.e., only the first "formaggio" will be linked, since it would be annoying to show the suggestion every time the user writes "formaggio" again).
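The criteria above could be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the function name `suggest_link`, the `candidates` mapping (target-language label to target article title, assumed to have been looked up from Wikidata beforehand), and the threshold of 3 characters are all assumptions made for the example.

```python
# Hypothetical sketch of the acceptance criteria; names and threshold are assumptions.
MIN_PREFIX_LEN = 3  # the card only says "2-3 characters"


def suggest_link(typed_prefix, candidates, already_linked, min_len=MIN_PREFIX_LEN):
    """Return (label, target_title) for the first matching candidate, or None.

    candidates: dict mapping a target-language label to the target article title
                (assumed to come from the Wikidata equivalents of the source links).
    already_linked: set of target titles already linked in the translation.
    """
    # Too few characters typed: stay quiet to avoid false positives.
    if len(typed_prefix) < min_len:
        return None
    prefix = typed_prefix.lower()
    for label, title in candidates.items():
        # Do not suggest a link that is already present in the target.
        if title in already_linked:
            continue
        if label.lower().startswith(prefix):
            return label, title
    return None


candidates = {"formaggio": "Formaggio", "latte": "Latte"}
print(suggest_link("for", candidates, set()))          # ('formaggio', 'Formaggio')
print(suggest_link("fo", candidates, set()))           # None (below the threshold)
print(suggest_link("for", candidates, {"Formaggio"}))  # None (already linked)
```

How the suggestion is rendered (grey text, position relative to the cursor) is an editor concern and is left out of this sketch.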
A number of linguistic features are approximated with "quick fixes":
- Word boundary detection (simple non-internationalized regex-based approach, e.g. /[A-Za-z_]/)
- Word stem matching in link target set (match "word" prefixes)
- Multi-word phrase matching (match on first "word" only)
- Likely match detection (match the first n characters in the editor after the last "word boundary")
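As a rough illustration of how these quick fixes might fit together, here is a small sketch. Only the character class `[A-Za-z_]` comes from the card; the helper names `current_word`, `first_word` and `likely_matches`, and the default `n=3`, are hypothetical and chosen for the example.

```python
import re

# Character class taken from the card; deliberately not internationalized.
TRAILING_WORD = re.compile(r"[A-Za-z_]+$")
LEADING_WORD = re.compile(r"[A-Za-z_]+")


def current_word(text_before_cursor):
    """Characters typed since the last "word boundary" (empty string if none)."""
    m = TRAILING_WORD.search(text_before_cursor)
    return m.group(0) if m else ""


def first_word(label):
    """First "word" of a link label; multi-word phrases match on this only."""
    m = LEADING_WORD.match(label)
    return m.group(0) if m else ""


def likely_matches(text_before_cursor, link_labels, n=3):
    """Labels whose first word starts with the word being typed (at least n chars)."""
    typed = current_word(text_before_cursor).lower()
    if len(typed) < n:
        return []
    return [label for label in link_labels
            if first_word(label).lower().startswith(typed)]


labels = ["formaggio", "a day in the life"]
print(likely_matches("Mi piace il for", labels))  # ['formaggio']
print(likely_matches("a", labels, n=1))           # ['a day in the life']
print(likely_matches("a d", labels, n=1))         # [] (only the first word is matched)
```

The sketch also shows the limits of these approximations: once the user types past the first word of a phrase the match is lost, and because the character class is ASCII-only, `current_word("perché")` returns an empty string.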