
CX2 shows </img> instead of math formulas
Closed, ResolvedPublic

Authored By: He7d3r, Jan 20 2019, 9:56 AM
Referenced Files
F30001292: image.png
Aug 8 2019, 10:12 PM
F30001284: image.png
Aug 8 2019, 10:12 PM
F29930826: image.png
Aug 1 2019, 12:42 PM
F29900401: image.png
Jul 29 2019, 8:55 PM
F29899197: image.png
Jul 29 2019, 6:11 PM
F29899134: image.png
Jul 29 2019, 5:57 PM
F29600184: T137803_test-results_2-07-19 (1).pdf
Jun 18 2019, 7:50 AM
F27958906: Captura de tela de 2019-01-20 07-52-45.png
Jan 20 2019, 9:56 AM

Description

Starting around this week, ContentTranslation v2 (CX2) sometimes inserts </img> into the translation in place of mathematical formulas, as shown in the following screenshot:

Captura de tela de 2019-01-20 07-52-45.png (129 KB)

(from https://pt.wikipedia.org/wiki/Special:ContentTranslation?title=Special:ContentTranslation&campaign=contributionsmenu&to=pt&page=Tensor+product&from=en&targettitle=Produto+tensorial&version=2)

Event Timeline

Using en:Grüneisen_parameter to translate into Serbian, in production:

  • With Google Translate as the default option, translate a math formula. "</img>" is displayed, as in the description.
  • Switch to "Copy original content": the section becomes empty.
  • Make "Copy original content" the default option.
  • Translate another math formula. Result: the formula looks fine.

The behavior differs depending on which MT option is chosen, and even on the order in which the options are changed. This could be useful for debugging.

I've just managed to reproduce it.

I translated https://en.wikipedia.org/wiki/User:Amire80/math to https://ru.wikipedia.org/wiki/User:Amire80/math . In the CX2 interface, I saw </img>, and the output came out totally garbled.

Google Translate was enabled.

A bit more detail:

The source math was:

<math>\ddot{a}+\bar{a}</math>

The output was:

<math xmlns="http://www.w3.org/1998/Math/MathML"><mrow class="MJX-TeXAtom-ORD"><mstyle displaystyle="true" scriptlevel="0"><mrow class="MJX-TeXAtom-ORD"><mrow class="MJX-TeXAtom-ORD"><mover><mi> </mi><mo> <math>\ddot{a}+\bar{a}</math> </mo></mover></mrow></mrow><mo> <math>\ddot{a}+\bar{a}</math> </mo><mrow class="MJX-TeXAtom-ORD"><mrow class="MJX-TeXAtom-ORD"><mover><mi> </mi><mo stretchy="false"> <math>\ddot{a}+\bar{a}</math> </mo></mover></mrow></mrow></mstyle></mrow> </math><math>\ddot{a}+\bar{a}</math>  <math>\ddot{a}+\bar{a}</math> </img> <span></span>

You can see that <math>\ddot{a}+\bar{a}</math> appears five times in the output.
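A quick way to confirm that count, and to check other garbled outputs the same way, is to search the output for the source TeX. This is an illustrative JavaScript sketch, not part of CX: countOccurrences is a hypothetical helper, and the output string is the one reported above, split across lines only for readability.

```javascript
// Source TeX and the garbled output, both copied from the comment above.
const source = '<math>\\ddot{a}+\\bar{a}</math>';
const output =
  '<math xmlns="http://www.w3.org/1998/Math/MathML"><mrow class="MJX-TeXAtom-ORD">' +
  '<mstyle displaystyle="true" scriptlevel="0"><mrow class="MJX-TeXAtom-ORD">' +
  '<mrow class="MJX-TeXAtom-ORD"><mover><mi> </mi><mo> <math>\\ddot{a}+\\bar{a}</math> </mo></mover>' +
  '</mrow></mrow><mo> <math>\\ddot{a}+\\bar{a}</math> </mo>' +
  '<mrow class="MJX-TeXAtom-ORD"><mrow class="MJX-TeXAtom-ORD"><mover><mi> </mi>' +
  '<mo stretchy="false"> <math>\\ddot{a}+\\bar{a}</math> </mo></mover></mrow></mrow>' +
  '</mstyle></mrow> </math><math>\\ddot{a}+\\bar{a}</math>  <math>\\ddot{a}+\\bar{a}</math> </img> <span></span>';

// Count non-overlapping occurrences of needle in haystack (hypothetical helper).
function countOccurrences(haystack, needle) {
  if (needle === '') {
    return 0; // an empty needle would otherwise loop forever
  }
  let count = 0;
  let index = haystack.indexOf(needle);
  while (index !== -1) {
    count += 1;
    index = haystack.indexOf(needle, index + needle.length);
  }
  return count;
}

console.log(countOccurrences(output, source)); // 5
```

Note that the opening <math xmlns="..."> tag is not counted, because the search string starts with <math>\ rather than <math.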

Just a reminder that some comparative testing was done by @Barbvd in T137803. The PDF with the test results (which includes this </img> issue) is attached: T137803_test-results_2-07-19 (1).pdf

Change 525264 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] WIP: Do not pass Math content to MT engines

https://gerrit.wikimedia.org/r/525264

Change 525264 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Do not pass Math content to MT engines

https://gerrit.wikimedia.org/r/525264
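A minimal sketch of the idea behind this change (mask the math, translate, restore), assuming math arrives as literal <math>…</math> fragments in an HTML string. In real Parsoid HTML the formula is typically wrapped in an element with typeof="mw:Extension/math", and cxserver works on parsed documents rather than regular expressions, so protectMath, restoreMath, the data-cx-math attribute, and fakeTranslate are all hypothetical names used only for illustration.

```javascript
// Hypothetical helpers: keep math markup away from an HTML MT engine.
// Matches a whole <math ...>...</math> fragment (non-greedy, across lines).
const MATH_PATTERN = /<math\b[^>]*>[\s\S]*?<\/math>/g;
const PLACEHOLDER_PATTERN = /<span data-cx-math="(\d+)"><\/span>/g;

// Replace each math fragment with an empty placeholder element and remember it.
function protectMath(html) {
  const saved = [];
  const masked = html.replace(MATH_PATTERN, (fragment) => {
    saved.push(fragment);
    return `<span data-cx-math="${saved.length - 1}"></span>`;
  });
  return { masked, saved };
}

// Put the original math back wherever a placeholder survived MT intact;
// placeholders with an unknown index are left untouched.
function restoreMath(translatedHtml, saved) {
  return translatedHtml.replace(PLACEHOLDER_PATTERN, (placeholder, index) => {
    const fragment = saved[Number(index)];
    return fragment !== undefined ? fragment : placeholder;
  });
}

// Stand-in for a real MT engine: upper-cases the text between tags only.
function fakeTranslate(html) {
  return html.replace(/>([^<]*)</g, (match, text) => `>${text.toUpperCase()}<`);
}

const source = '<p>Let <math>\\ddot{a}+\\bar{a}</math> be given.</p>';
const { masked, saved } = protectMath(source);
// masked === '<p>Let <span data-cx-math="0"></span> be given.</p>'
const translated = restoreMath(fakeTranslate(masked), saved);
console.log(translated); // <p>LET <math>\ddot{a}+\bar{a}</math> BE GIVEN.</p>
```

In this sketch the engine never sees the MathML at all, so it has nothing to reorder or duplicate; the trade-off is that a placeholder the engine drops takes its formula with it.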

Not sure this is related to this ticket, but I have never seen a math formula translated like this. It behaves the same way in production. Waiting for @santhosh's feedback to see whether it is related.

image.png (145 KB)

Update: this might not be related, since the example in my last screenshot is a template, not a math formula.

image.png (144×568 px, 11 KB)

@Pginer-WMF, is this a different issue?
Is there already a ticket for this?

image.png (132 KB)

Change 526311 had a related patch set uploaded (by KartikMistry; owner: KartikMistry):
[operations/deployment-charts@master] Update cxserver to 2019-07-29-154005-production

https://gerrit.wikimedia.org/r/526311

> not sure this is related to this ticket, but I never saw a math formula being translated like this.
> it is like this in production as well.
> waiting for @santhosh feedback to see if it is related or not.

Please include the language pair, article title, and MT engine along with screenshots.
In this case I assume an es->ca translation, title: Cálculo tensorial, MT service: Apertium. Correct me if I am wrong.

> update: this might not be related since the example on my last screenshot is a template and not a math formula
>
> image.png (144×568 px, 11 KB)

You are right. This is not a math formula but https://es.wikipedia.org/wiki/Plantilla:Ecuaci%C3%B3n, which is incorrectly linked to https://ca.wikipedia.org/wiki/Plantilla:Equaci%C3%B3/%C3%BAs, the documentation page of a template on the Catalan Wikipedia. I tried a quick fix in Wikidata (https://www.wikidata.org/wiki/Q14339252), but it seems a bit complex, since the Catalan template has its own Wikidata item (https://www.wikidata.org/wiki/Q25740790). It would be better for a Catalan community member to look into this. @Pginer-WMF, please note.

> @Pginer-WMF is this a different issue?
> Is there a ticket already for this?
>
> image.png (132 KB)

This one is definitely a bug. It happens with Apertium. Checking.

Change 526311 merged by KartikMistry:
[operations/deployment-charts@master] Update cxserver to 2019-07-29-154005-production

https://gerrit.wikimedia.org/r/526311

> This one is definitely a bug. Happening with Apertium. Checking.

@santhosh, are you going to fix this issue in this task or in a different one?

> @santhosh are you going to be fixing this issue on this task or a different one?

This ticket is sufficient.

Change 526644 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] Support inline maths for plain text MT services

https://gerrit.wikimedia.org/r/526644

Change 527081 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] Do not send math content to HTML MT services

https://gerrit.wikimedia.org/r/527081

QA NOTES (dev can ignore):
I found some weird behaviour in production with https://es.wikipedia.org/w/index.php?title=Especial:Traducci%C3%B3n_de_contenidos&campaign=contributions-page&page=Gr%C3%BCneisen+parameter&from=en&to=es&targettitle=Gr%C3%BCneisen+parameter

image.png (546 KB)

MT stops translating after a math formula.
Check this case once this task is in cx2-testing.

Change 526644 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Support inline maths for plain text MT services

https://gerrit.wikimedia.org/r/526644
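Plain-text MT services such as Apertium receive text rather than HTML, so an inline formula inside a sentence needs a text-only stand-in. A minimal sketch under stated assumptions: the sentence is modelled as an array of strings and { math } objects, MATHTOKEN<n>X is a hypothetical token format assumed to pass through the engine untranslated, and mtOutput is a hard-coded stand-in rather than a real Apertium call. None of these names come from cxserver.

```javascript
// Hypothetical sketch: inline math for a plain-text MT engine.
const TOKEN_PATTERN = /MATHTOKEN(\d+)X/g;

// Flatten a sentence into plain text, swapping each math segment for a token.
function maskInlineMath(segments) {
  const maths = [];
  const text = segments
    .map((segment) => {
      if (typeof segment === 'string') {
        return segment;
      }
      maths.push(segment.math);
      return `MATHTOKEN${maths.length - 1}X`;
    })
    .join('');
  return { text, maths };
}

// Swap tokens back for the math markup and report any token the engine lost.
function unmaskInlineMath(translatedText, maths) {
  const restored = new Set();
  const html = translatedText.replace(TOKEN_PATTERN, (token, index) => {
    const i = Number(index);
    if (i >= maths.length) {
      return token; // unknown token: leave it visible rather than guess
    }
    restored.add(i);
    return maths[i];
  });
  const missing = maths.map((_, i) => i).filter((i) => !restored.has(i));
  return { html, missing };
}

// es->ca example; mtOutput stands in for what the engine might return.
const segments = ['Sea ', { math: '<math>\\ddot{a}</math>' }, ' una variable.'];
const { text, maths } = maskInlineMath(segments);
// text === 'Sea MATHTOKEN0X una variable.'
const mtOutput = 'Sigui MATHTOKEN0X una variable.';
const result = unmaskInlineMath(mtOutput, maths);
console.log(result.html); // Sigui <math>\ddot{a}</math> una variable.
console.log(result.missing); // []
```

Returning missing instead of silently dropping a formula is a design choice of this sketch; it makes a lost token visible to the caller.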

Change 527081 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Do not send math content to HTML MT services

https://gerrit.wikimedia.org/r/527081

MT stops translating "randomly".
Steps:

  1. Start translating the article from the description.
  2. Using Google as the MT engine, translate the sections one by one from the top of the article.
  3. As seen in the screenshot, the third section does not get translated. This happens again if you continue translating.

An error appears in the console, although I'm not sure it is related:

image.png (513 KB)

image.png (482×798 px, 134 KB)

mw.cx.TranslationTracker.prototype.isExcludedFromValidation = function ( sectionModel ) {
    // Section node types that are skipped by translation validation
    var excludedTypes = [ 'cxBlockImage', 'mwBlockImage', 'cxTransclusionBlock', 'mwTransclusionBlock', 'mwReferencesList', 'mwMath', 'mwTable', 'list', 'mwHeading' ],
        childType = sectionModel.getChildNodeName();
    // ... (rest of the function body is not visible in the screenshot)
};

Created a new ticket for this issue: T230195