
CX2: Red links appear in the source article
Closed, Resolved · Public

Description

On some articles, the first paragraph is not processed correctly. For example, the article "Paris" shows red links in the first paragraph of the source text. These links exist in the original article and should be shown as regular (blue) links instead.

I found the issue when translating "Paris" from English to Catalan.

Details

Related Gerrit Patches:
mediawiki/services/cxserver : master · Do not miss to segment if the textblock has inline non-segmentable
mediawiki/services/cxserver : master · Decouple segmentation and setting link ids

Event Timeline

Restricted Application added a subscriber: Aklapper. · May 28 2018, 1:09 PM
Pginer-WMF triaged this task as Medium priority. · May 28 2018, 1:10 PM
Pginer-WMF moved this task from Needs Triage to CX2 on the ContentTranslation board.
Pginer-WMF moved this task from Backlog to Priority backlog on the Language-2018-Apr-June board.
santhosh claimed this task. · Jun 6 2018, 5:58 AM

I extracted a minimal HTML snippet that reproduces the issue:

<body id="mwAA" lang="en" class="mw-content-ltr sitedir-ltr ltr mw-body-content parsoid-body mediawiki mw-parser-output"
    dir="ltr">
    <section data-mw-section-id="0" id="mwAQ">
        <p id="mwEA">
            <b id="mwEQ">Paris</b> >is the
            <a rel="mw:WikiLink" href="./Capital_city" title="Capital city" id="mwFA">capital</a> most populous city of
            <a rel="mw:WikiLink" href="./France" title="France" id="mwFg">France</a>, with an area of
            <span about="#mwt16" typeof="mw:Transclusion" data-mw='{"parts":[{"template":{"target":{"wt":"convert","href":"./Template:Convert"},"params":{"1":{"wt":"105"},"2":{"wt":"km2"},"abbr":{"wt":"off"}},"i":0}}]}'
                id="mwFw">105 square kilometres (41 square miles)</span>
             and a population of 2,206,488.
        </p>
    </section>
</body>

When this is segmented by cxserver, it returns the following output:

<body class="mw-content-ltr sitedir-ltr ltr mw-body-content parsoid-body mediawiki mw-parser-output" dir="ltr" id="mwAA"
  lang="en">
  <section id="cxSourceSection0" rel="cx:Section">
    <p id="mwEA">
      <b id="mwEQ">Paris</b> is the
      <a href="./Capital_city" id="mwFA" rel="mw:WikiLink" title="Capital city">capital</a> and
      <a href="./List_of_communes_in_France_with_over_20,000_inhabitants" id="mwFQ" rel="mw:WikiLink" title="List of communes in France with over 20,000 inhabitants">most populous city</a> of
      <a href="./France" id="mwFg" rel="mw:WikiLink" title="France">France</a>, with an area of
      <span about="#mwt16" data-mw="{&#34;parts&#34;:[{&#34;template&#34;:{&#34;target&#34;:{&#34;wt&#34;:&#34;convert&#34;,&#34;href&#34;:&#34;./Template:Convert&#34;},&#34;params&#34;:{&#34;1&#34;:{&#34;wt&#34;:&#34;105&#34;},&#34;2&#34;:{&#34;wt&#34;:&#34;km2&#34;},&#34;abbr&#34;:{&#34;wt&#34;:&#34;off&#34;}},&#34;i&#34;:0}}]}"
        id="mwFw" typeof="mw:Transclusion">105 square kilometres (41 square miles)</span> and a population of 2,206,488. </p>
  </section>
</body>

Note that segmentation did not happen, and the links did not get the data-linkid attribute or the cx-link class.
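
A check along these lines could confirm the symptom on any cxserver response. This is only a minimal sketch in Python using the standard library; `LinkAuditor` and `audit` are hypothetical names made up for illustration and are not part of cxserver. It counts cx-segment spans and lists the mw:WikiLink anchors that lack the cx-link class or the data-linkid attribute.

```python
from html.parser import HTMLParser


class LinkAuditor(HTMLParser):
    """Hypothetical helper: finds mw:WikiLink anchors without CX markers."""

    def __init__(self):
        super().__init__()
        self.unadapted = []  # href values of links missing cx-link / data-linkid
        self.segments = 0    # number of cx-segment spans seen

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "span" and "cx-segment" in classes:
            self.segments += 1
        if tag == "a" and attributes.get("rel") == "mw:WikiLink":
            if "cx-link" not in classes or "data-linkid" not in attributes:
                self.unadapted.append(attributes.get("href"))


def audit(html):
    """Return (segment count, hrefs of links missing CX markers)."""
    auditor = LinkAuditor()
    auditor.feed(html)
    return auditor.segments, auditor.unadapted


# Shortened versions of the outputs shown in this task.
broken = ('<p id="mwEA"><a href="./France" id="mwFg" rel="mw:WikiLink" '
          'title="France">France</a></p>')
fixed = ('<p id="mwEA"><span class="cx-segment" data-segmentid="0">'
         '<a class="cx-link" data-linkid="2" href="./France" id="mwFg" '
         'rel="mw:WikiLink">France</a></span></p>')

print(audit(broken))  # (0, ['./France'])
print(audit(fixed))   # (1, [])
```

A non-empty second element means the links in that response would be treated as ordinary links on the client.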

If we remove the convert template, we get the correct output:

<body class="mw-content-ltr sitedir-ltr ltr mw-body-content parsoid-body mediawiki mw-parser-output" dir="ltr" id="mwAA"
  lang="en">
  <section id="cxSourceSection0" rel="cx:Section">
    <p id="mwEA">
      <span class="cx-segment" data-segmentid="0">
        <b id="mwEQ">Paris</b> &#62;is the
        <a class="cx-link" data-linkid="1" href="./Capital_city" id="mwFA" rel="mw:WikiLink" title="Capital city">capital</a> most populous city of
        <a class="cx-link" data-linkid="2" href="./France" id="mwFg" rel="mw:WikiLink"
          title="France">France</a>, with an area of and a population of 2,206,488. </span>
    </p>
  </section>
</body>

Since the links do not have the cx-link class, VE treats them as ordinary VE mw links, and the linkcache customization does not work.

santhosh added a comment. · Edited · Jun 6 2018, 11:13 AM

I identified 3 bugs to fix.

  • Wrong segmentation, as shown in the sample above.
  • Even if segmentation goes wrong, it should never affect setting the link id and the cx-link class, because that is crucial for link adaptation and VE tools integration.
  • When the page is loaded, there is an API call to the target wiki to get link info for the source titles. It should go to the source wiki.
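
The second point can be illustrated with a separate pass that marks links without consulting the segmenter at all. The following is a rough sketch in Python, not the actual cxserver change: `set_link_ids` is a hypothetical name, the input is assumed to be a well-formed fragment, and the id numbering (starting at 0) is an assumption for illustration only.

```python
import xml.etree.ElementTree as ET


def set_link_ids(root, start=0):
    """Mark every mw:WikiLink anchor with cx-link and data-linkid.

    Hypothetical sketch: walks the whole tree, so the result does not
    depend on whether any cx-segment wrappers were produced.
    Returns the next unused id.
    """
    next_id = start
    for anchor in root.iter("a"):
        if anchor.get("rel") != "mw:WikiLink":
            continue
        classes = (anchor.get("class") or "").split()
        if "cx-link" not in classes:
            classes.append("cx-link")
        anchor.set("class", " ".join(classes))
        anchor.set("data-linkid", str(next_id))
        next_id += 1
    return next_id


# A paragraph with no cx-segment wrappers, as in the failing output.
unsegmented = ET.fromstring(
    '<p id="mwEA"><b id="mwEQ">Paris</b> is the '
    '<a rel="mw:WikiLink" href="./Capital_city" id="mwFA">capital</a> of '
    '<a rel="mw:WikiLink" href="./France" id="mwFg">France</a>.</p>'
)
set_link_ids(unsegmented)
for anchor in unsegmented.iter("a"):
    print(anchor.get("href"), anchor.get("class"), anchor.get("data-linkid"))
```

Because the pass only looks at anchors, a paragraph that the segmenter skipped still ends up with links the client can recognize.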

Change 437728 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] Decouple segmentation and setting link ids

https://gerrit.wikimedia.org/r/437728

Change 437742 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] Do not miss to segment if the textblock has inline non-segmentable

https://gerrit.wikimedia.org/r/437742

Change 437728 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Decouple segmentation and setting link ids

https://gerrit.wikimedia.org/r/437728

Change 437742 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Do not miss to segment if the textblock has inline non-segmentable

https://gerrit.wikimedia.org/r/437742

Petar.petkovic removed a project: Patch-For-Review.
Petar.petkovic removed a subscriber: gerritbot.

Mentioned in SAL (#wikimedia-operations) [2018-06-25T12:26:09Z] <kartik@deploy1001> Started deploy [cxserver/deploy@cc6dc61]: Update cxserver to ece5e7a (T191874, T196354, T195768, T195768)

Mentioned in SAL (#wikimedia-operations) [2018-06-25T12:29:42Z] <kartik@deploy1001> Finished deploy [cxserver/deploy@cc6dc61]: Update cxserver to ece5e7a (T191874, T196354, T195768, T195768) (duration: 03m 33s)

Etonkovidova closed this task as Resolved. · Jun 26 2018, 9:04 PM
Etonkovidova added a subscriber: Etonkovidova.

Checked in cx2-testing: the majority of links are displayed correctly.

There are still some cases where red links unexpectedly appear in source articles.

Arrbee moved this task from QA to Done on the Language-2018-Apr-June board. · Jun 28 2018, 8:11 AM
Vvjjkkii renamed this task from CX2: Red links appear in the source article to b5baaaaaaa. · Jul 1 2018, 1:07 AM
Vvjjkkii reopened this task as Open.
Vvjjkkii removed santhosh as the assignee of this task.
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description.
Vvjjkkii edited subscribers, added: santhosh; removed: Aklapper.
CommunityTechBot renamed this task from b5baaaaaaa to CX2: Red links appear in the source article. · Jul 2 2018, 3:28 AM
CommunityTechBot closed this task as Resolved.
CommunityTechBot assigned this task to santhosh.
CommunityTechBot lowered the priority of this task from High to Medium.
CommunityTechBot updated the task description.
CommunityTechBot edited subscribers, added: Aklapper; removed: santhosh.