TitleError: title-invalid-characters
Closed, Resolved, Public

Description

In CX2, Apertium MT often fails when running cxserver locally with:

TitleError: title-invalid-characters
    at _checkLegalTitleCharacters (/home/nike/stuff/mw/cxserver/node_modules/mediawiki-title/lib/index.js:253:15)
    at Function.Title.newFromText (/home/nike/stuff/mw/cxserver/node_modules/mediawiki-title/lib/index.js:402:5)
    at TitlePairRequest.<anonymous> (/home/nike/stuff/mw/cxserver/lib/mw/ApiRequest.js:263:19)
    at Generator.next (<anonymous>)
    at resume (/home/nike/stuff/mw/cxserver/lib/util.js:277:21)
    at resumeNext (/home/nike/stuff/mw/cxserver/lib/util.js:287:26)
    at <anonymous>

Here is one example of request data that fails:

html:<section id="cxTargetSection6"><p id="mwHQ"><span data-segmentid="31" class="cx-segment">The first fatal aviation accident was the crash of a <a href="./Rozière_balloon" rel="mw:WikiLink" data-linkid="32" class="cx-link" id="mwHg" title="Rozière balloon">Rozière balloon</a> near <a href="./Wimereux" rel="mw:WikiLink" data-linkid="33" class="cx-link" id="mwHw" title="Wimereux">Wimereux</a>, France, on June 15, 1785, killing its inventor <a href="./Jean-François_Pilâtre_de_Rozier" rel="mw:WikiLink" data-linkid="34" class="cx-link" id="mwIA" title="Jean-François Pilâtre de Rozier">Jean-François Pilâtre de Rozier</a> as well as the other occupant, Pierre Romain.<sup about="#mwt13" class="mw-ref" data-mw="{&quot;name&quot;:&quot;ref&quot;,&quot;body&quot;:{&quot;id&quot;:&quot;mw-reference-text-cite_note-3&quot;},&quot;attrs&quot;:{}}" id="cite_ref-3" rel="dc:references" typeof="mw:Extension/ref"><a href="Aviation%20accidents%20and%20incidents#cite%20note-3" rel="mw:WikiLink" style="counter-reset: mw-Ref 3;"><span class="mw-reflink-text">[3]</span></a></sup> </span><span data-segmentid="35" class="cx-segment">The first involving a powered aircraft was the crash of a <a href="./Wright_Model_A" rel="mw:WikiLink" data-linkid="36" class="cx-link" id="mwIQ" title="Wright Model A">Wright Model A</a> aircraft at <a href="./Fort_Myer" rel="mw:WikiLink" data-linkid="37" class="cx-link" id="mwIg" title="Fort Myer">Fort Myer, Virginia</a>, in the United States on September 17, 1908, injuring its co-inventor and pilot, <a href="./Orville_Wright" rel="mw:WikiLink" data-linkid="38" class="cx-link" id="mwIw" title="Orville Wright">Orville Wright</a>, and killing the passenger, Signal Corps Lieutenant <a href="./Thomas_Selfridge" rel="mw:WikiLink" data-linkid="39" class="cx-link" id="mwJA" title="Thomas Selfridge">Thomas Selfridge</a>.<sup about="#mwt90" class="mw-ref" 
data-mw="{&quot;name&quot;:&quot;ref&quot;,&quot;body&quot;:{&quot;id&quot;:&quot;mw-reference-text-cite_note-4&quot;},&quot;attrs&quot;:{}}" id="cite_ref-4" rel="dc:references" typeof="mw:Extension/ref"><a href="Aviation%20accidents%20and%20incidents#cite%20note-4" rel="mw:WikiLink" style="counter-reset: mw-Ref 4;"><span class="mw-reflink-text">[4]</span></a></sup></span></p></section>

Event Timeline

Pginer-WMF triaged this task as Medium priority. Mar 12 2018, 8:44 AM
Pginer-WMF moved this task from Backlog to Priority backlog on the Language-2018-Jan-Mar board.

Works for me with source=en, target=fi in CX2. What was your language pair?

en->es was my language pair

I'm not able to reproduce it with that language pair either.

I cannot reproduce currently either. Let's close this for now. I'll reopen if I see it again.

Petar.petkovic subscribed.

There are lots of title-invalid-characters errors again. The problem can be reproduced with almost any page and language pair. Here is the payload sent when translating the second paragraph of "Aviation accidents and incidents" from English to Spanish using Apertium.

html: <section id="cxTargetSection4" data-mw-cx-source="undefined"><p id="mwHQ"><span data-segmentid="20" class="cx-segment">The first fatal aviation accident was the crash of a <a href="./Rozière_balloon" rel="mw:WikiLink" data-linkid="21" class="cx-link" id="mwHg" title="Rozière balloon">Rozière balloon</a> near <a href="./Wimereux" rel="mw:WikiLink" data-linkid="22" class="cx-link" id="mwHw" title="Wimereux">Wimereux</a>, France, on June 15, 1785, killing its inventor <a href="./Jean-François_Pilâtre_de_Rozier" rel="mw:WikiLink" data-linkid="23" class="cx-link" id="mwIA" title="Jean-François Pilâtre de Rozier">Jean-François Pilâtre de Rozier</a> as well as the other occupant, Pierre Romain.<sup about="#mwt13" class="mw-ref" data-mw="{&quot;name&quot;:&quot;ref&quot;,&quot;body&quot;:{&quot;id&quot;:&quot;mw-reference-text-cite_note-3&quot;},&quot;attrs&quot;:{}}" id="cite_ref-3" rel="dc:references" typeof="mw:Extension/ref"><a href="Aviation%20accidents%20and%20incidents#cite%20note-3" rel="mw:WikiLink" style="counter-reset: mw-Ref 3;"><span class="mw-reflink-text">[3]</span></a></sup> </span><span data-segmentid="24" class="cx-segment">The first involving a powered aircraft was the crash of a <a href="./Wright_Model_A" rel="mw:WikiLink" data-linkid="25" class="cx-link" id="mwIQ" title="Wright Model A">Wright Model A</a> aircraft at <a href="./Fort_Myer" rel="mw:WikiLink" data-linkid="26" class="cx-link" id="mwIg" title="Fort Myer">Fort Myer, Virginia</a>, in the United States on September 17, 1908, injuring its co-inventor and pilot, <a href="./Orville_Wright" rel="mw:WikiLink" data-linkid="27" class="cx-link" id="mwIw" title="Orville Wright">Orville Wright</a>, and killing the passenger, Signal Corps Lieutenant <a href="./Thomas_Selfridge" rel="mw:WikiLink" data-linkid="28" class="cx-link" id="mwJA" title="Thomas Selfridge">Thomas Selfridge</a>.<sup about="#mwt119" class="mw-ref" 
data-mw="{&quot;name&quot;:&quot;ref&quot;,&quot;body&quot;:{&quot;id&quot;:&quot;mw-reference-text-cite_note-4&quot;},&quot;attrs&quot;:{}}" id="cite_ref-4" rel="dc:references" typeof="mw:Extension/ref"><a href="Aviation%20accidents%20and%20incidents#cite%20note-4" rel="mw:WikiLink" style="counter-reset: mw-Ref 4;"><span class="mw-reflink-text">[4]</span></a></sup></span></p></section>

I'm not running Apertium locally, unlike what the description states, but https://cxserver.wikimedia.org/v2/translate/en/es/Apertium responds to my POST request with HTTP status 500 (internal server error).
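For anyone trying to reproduce this, here is a minimal sketch of how such a POST request could be assembled. The endpoint pattern is taken from the URL above; the form-encoded `html` field is an assumption based on the `html:` prefix shown in the payloads, and `buildTranslateRequest` is a hypothetical helper, not part of cxserver.

```javascript
// Hypothetical helper: builds (but does not send) a translate request.
// Assumption: the service accepts a form-encoded body with an "html" field.
function buildTranslateRequest(from, to, provider, html) {
  const url = `https://cxserver.wikimedia.org/v2/translate/${from}/${to}/${provider}`;
  const body = new URLSearchParams({ html }).toString();
  return {
    url,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
      body
    }
  };
}

// Placeholder markup; substitute the failing payload from this task.
const request = buildTranslateRequest('en', 'es', 'Apertium', '<p>Hello</p>');
console.log(request.url);
```

The returned `url` and `options` can be passed to any HTTP client, for example `fetch(request.url, request.options)` where `fetch` is available.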

<a href="Aviation%20accidents%20and%20incidents#cite%20note-4" rel="mw:WikiLink" style="counter-reset: mw-Ref 4;"><span class="mw-reflink-text">[4]</span></a>

In this markup, rel="mw:WikiLink" is strange: the anchor is a reference link, not a wiki link.
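To illustrate why this anchor probably trips the title check while the ordinary links do not: MediaWiki title rules reject percent-encoded sequences such as `%20`, and this href contains several. The sketch below is a simplified stand-in for that rule, not the mediawiki-title source; `titleFromHref` and `hasPercentEncoding` are hypothetical helpers.

```javascript
// Simplified stand-in for one MediaWiki title rule (NOT the mediawiki-title
// source): a title may not contain a percent-encoded sequence like "%20".
const PERCENT_ENCODED = /%[0-9A-Fa-f]{2}/;

function hasPercentEncoding(title) {
  return PERCENT_ENCODED.test(title);
}

// Hypothetical helper: drop a leading "./" and any "#fragment" from an href.
function titleFromHref(href) {
  return href.replace(/^\.\//, '').split('#')[0];
}

const good = titleFromHref('./Wright_Model_A');
const bad = titleFromHref('Aviation%20accidents%20and%20incidents#cite%20note-4');

console.log(good, hasPercentEncoding(good)); // Wright_Model_A false
console.log(bad, hasPercentEncoding(bad));   // Aviation%20accidents%20and%20incidents true
```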

Is this the same issue as T198303 or related in some way?

Is this the same issue as T198303 or related in some way?

Yes, this issue prevents the article from loading in T198303.

Thanks @Petar.petkovic. Since both tickets were created by @Nikerabbit, I'll check with him before merging them.

Change 455130 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] Prevent exception in MWApiRequest#normalizeTitle for invalid titles

https://gerrit.wikimedia.org/r/455130

Change 455130 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Prevent exception in MWApiRequest#normalizeTitle for invalid titles

https://gerrit.wikimedia.org/r/455130
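The patch title suggests that title normalization no longer lets the exception propagate. A minimal sketch of that kind of guard follows; it is not the merged change. `parseTitle` and `TitleError` are hypothetical stand-ins for the real title parser, and returning `null` for an invalid title is an assumption.

```javascript
// Hypothetical stand-in for the error a title parser throws on illegal input.
class TitleError extends Error {
  constructor(type) {
    super(type);
    this.name = 'TitleError';
    this.type = type;
  }
}

// Hypothetical parser: rejects percent-encoded sequences, otherwise
// converts underscores to spaces.
function parseTitle(text) {
  if (/%[0-9A-Fa-f]{2}/.test(text)) {
    throw new TitleError('title-invalid-characters');
  }
  return text.replace(/_/g, ' ');
}

// Guarded normalization: an invalid title yields null instead of an
// exception, so one bad link cannot fail the whole translation request.
function normalizeTitleSafely(text) {
  try {
    return parseTitle(text);
  } catch (err) {
    if (err instanceof TitleError) {
      return null;
    }
    throw err; // unrelated errors still surface
  }
}

console.log(normalizeTitleSafely('Wright_Model_A'));
console.log(normalizeTitleSafely('Aviation%20accidents%20and%20incidents'));
```

Catching only `TitleError` keeps genuine programming errors visible while tolerating malformed link targets.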

Mentioned in SAL (#wikimedia-operations) [2018-09-05T07:27:38Z] <kartik@deploy1001> Finished deploy [cxserver/deploy@f341eec]: Update cxserver to 81d1a97 (T202933, T202283, T189438) (duration: 04m 03s)