Error translating court cases out of English
Closed, Resolved · Public · BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • try to translate a page that mentions an American court case into Dutch (e.g. Critical race theory)
  • notice that a link like Brown v. Board of Education splits into two parts: "Brown v." and "Board of Education"
  • correcting this error by hand is possible, but a hassle

What happens?:
a single link is split into two links

What should have happened instead?:
a single link remains a single link and it links to its equivalent in the target language.

Software version (skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):
Firefox 113.0.1 (64-bit)
"Hudson v.Leake County School Board -zaak" shows it too

Event Timeline

Nikerabbit subscribed.

This sounds like an issue with the sentence splitting algorithm.

I can confirm the bug in sentence splitting for highlighting

image.png (209×687 px, 66 KB)

And when translated the link for Brown v. Board of Education becomes two links, in two sentences.
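To illustrate the symptom described above, here is a minimal sketch of how one link can turn into two when a sentence boundary lands inside its text. It is not cxserver's code: the function name `split_link_at_boundaries`, the character-offset representation of links, and the sample sentence are all assumptions made for the example.

```python
def split_link_at_boundaries(link_start, link_end, boundaries):
    """Split one link span (character offsets, end exclusive) at every
    sentence boundary that falls strictly inside it.

    Hypothetical helper for illustration; not taken from cxserver.
    """
    cuts = [b for b in sorted(boundaries) if link_start < b < link_end]
    edges = [link_start, *cuts, link_end]
    return list(zip(edges[:-1], edges[1:]))


text = "The ruling in Brown v. Board of Education was unanimous."
label = "Brown v. Board of Education"
start = text.index(label)
end = start + len(label)

# A boundary wrongly placed right after "v. " (treating "v." as a sentence end).
bad_boundary = text.index("v. ") + len("v. ")

pieces = [text[a:b] for a, b in split_link_at_boundaries(start, end, [bad_boundary])]
print(pieces)  # ['Brown v. ', 'Board of Education']

# With no boundary inside the link, the span stays whole.
print(split_link_at_boundaries(start, end, []) == [(start, end)])  # True
```

Under these assumptions, the split link is a downstream effect: once the boundary inside "Brown v. Board of Education" goes away, the link span is no longer cut.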

image.png (209×687 px, 71 KB)

This is a bug in cxserver's sentence splitting algorithm. We are working on a more advanced sentence splitting algorithm for MinT here: https://github.com/santhoshtr/sentencesegmenter. However, that is a Python library. We need the same in JavaScript.
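As a rough illustration of the kind of failure involved, the sketch below contrasts a naive splitter that breaks after every period with one that consults a small abbreviation list. This is not the cxserver, sentencesegmenter, or sentencex implementation; the function names and the `ABBREVIATIONS` set are assumptions for the example, and a real segmenter would rely on per-language data.

```python
import re

# Illustrative abbreviation list (an assumption, not real segmenter data).
ABBREVIATIONS = {"v.", "vs.", "mr.", "dr.", "e.g.", "i.e."}


def naive_split(text):
    """Split after every '.', '!' or '?' that is followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def abbreviation_aware_split(text):
    """Like naive_split, but rejoin a piece with the previous one when the
    previous piece ends in a known abbreviation.

    Rejoining uses a single space, so original whitespace is not preserved.
    """
    sentences = []
    for piece in naive_split(text):
        if sentences and sentences[-1].split()[-1].lower() in ABBREVIATIONS:
            sentences[-1] += " " + piece
        else:
            sentences.append(piece)
    return sentences


text = "The court cited Brown v. Board of Education. It was decided in 1954."

print(naive_split(text))
# ['The court cited Brown v.', 'Board of Education.', 'It was decided in 1954.']

print(abbreviation_aware_split(text))
# ['The court cited Brown v. Board of Education.', 'It was decided in 1954.']
```

The naive version reproduces the reported behaviour, with "Brown v." ending one sentence and "Board of Education" starting the next. A fixed abbreviation list is only a stopgap, which is presumably why a dedicated segmentation library is preferable.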

Change 961080 had a related patch set uploaded (by Santhosh; author: Santhosh):

[mediawiki/services/cxserver@master] Use sentencex library for sentence segmentation

https://gerrit.wikimedia.org/r/961080

Change 961080 merged by jenkins-bot:

[mediawiki/services/cxserver@master] Use sentencex library for sentence segmentation

https://gerrit.wikimedia.org/r/961080

Change 961979 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update cxserver to 2023-09-28-043003-production

https://gerrit.wikimedia.org/r/961979

Change 961979 merged by jenkins-bot:

[operations/deployment-charts@master] Update cxserver to 2023-09-28-043003-production

https://gerrit.wikimedia.org/r/961979

I tested this using the same example from T338689#9155967, which is based on the translation of a section of the Critical race theory article. The previous issues no longer occur:

The whole sentence is highlighted:

Screenshot 2023-10-13 at 10.26.37 2.png (432×1 px, 166 KB)

The link to "Brown v. Board of Education" is kept as a single link (not split into two):

Screenshot 2023-10-13 at 10.28.58 2.png (368×1 px, 141 KB)

The above example uses Google Translate. With MinT, the link is not added to the translation, which is an issue that may belong in a separate ticket: T348612: References moved to the end of the sentence and links disappear when translated with MinT

Screenshot 2023-10-13 at 10.32.45 2.png (390×1 px, 128 KB)

@BartTerpstra, the specific issues reported are fixed. I'm marking this as resolved, but feel free to reopen if new issues appear. Thanks for your feedback!