Page MenuHomePhabricator

List of issues of 1.3.0 summary endpoint results
Closed, ResolvedPublic

Description

Here's a list of issues found after comparing summary extract_html fields from 1.2.0 to 1.3.0 (MCS commit b0be98c). So far the wikis ar through es on the comparison report have been checked.

Issues

Should be fixed on the Parsoid level or onwiki

Recommend to be fixed onwiki

Moved most to T188134.

(I think the lead paragraph in this article is too long, and the two bullet items too large to fit into a summary. I don't have the Catalan skills to rewrite this so I'm going to punt on this.)

Minor issues, some of which should probably be fixed in MCS

Most of these could also be fixed onwiki, see T188134.

Related Objects

Event Timeline

bearND created this task.Feb 21 2018, 7:57 PM
bearND removed the point value for this task.
bearND updated the task description. (Show Details)Feb 21 2018, 8:14 PM
bearND updated the task description. (Show Details)Feb 22 2018, 3:38 AM

Another couple of things:

  1. do we know why some articles are appearing without a summary? (eswiki: http://wpsummary.surge.sh/1.2.0-b0be98c/html/es.html#27, http://wpsummary.surge.sh/1.2.0-b0be98c/html/es.html#3, cswiki: http://wpsummary.surge.sh/1.2.0-b0be98c/html/cs.html#22)
  2. incorrect word stress (bgwiki: http://wpsummary.surge.sh/1.2.0-b0be98c/html/bg.html#5, Репу̀блика Бълга̀рия now reads Репу̀блика Бълга̀рия, http://wpsummary.surge.sh/1.2.0-b0be98c/html/bg.html#17, Со̀фия and Со̀фия (seeing this on a lot of articles)

Another couple of things:

  1. do we know why some articles are appearing without a summary? (eswiki: http://wpsummary.surge.sh/1.2.0-b0be98c/html/es.html#27, http://wpsummary.surge.sh/1.2.0-b0be98c/html/es.html#3, cswiki: http://wpsummary.surge.sh/1.2.0-b0be98c/html/cs.html#22)
  1. incorrect word stress (bgwiki: http://wpsummary.surge.sh/1.2.0-b0be98c/html/bg.html#5, Репу̀блика Бълга̀рия now reads Репу̀блика Бълга̀рия, http://wpsummary.surge.sh/1.2.0-b0be98c/html/bg.html#17, Со̀фия and Со̀фия (seeing this on a lot of articles)

Those look like real issues to me. It looks to me that Parsoid creates extra spans for these entity encodings. The shows the same as the summary.
Wikitext:

'''Репу̀блика Бълга̀рия'''

Desktop HTML:

<b>Репу̀блика Бълга̀рия</b>

Parsoid HTML:

<b id="mwBQ">Репу<span typeof="mw:Entity" id="mwBg">̀</span>блика Бълга<span typeof="mw:Entity" id="mwBw">̀</span>рия</b>.

Summary HTML:

<b>Репу<span>̀</span>блика Бълга<span>̀</span>рия</b>

It probably should be fixed on the Parsoid side as well. On the MCS side we could consider flattening entity spans. I would love to hear from Parsoid devs why they are introduced.

It probably should be fixed on the Parsoid side as well. On the MCS side we could consider flattening entity spans. I would love to hear from Parsoid devs why they are introduced.

This won't be fixed in Parsoid.

Parsoid deliberately wraps them in spans to distinguish between the input forms (& vs &amp;, etc.). Look at the source of this page vs PHP parser rendering. You see that the output doesn't distinguish between the two input forms. They render identically. So, editing clients need to know what they are editing. And for roundtripping and even for newly created content in editing clients, Parsoid needs to know what form of wikitext to emit.

The spans don't get in the way of rendering. See Parsoid rendering of the same page in the browser.

bearND added a comment.EditedFeb 22 2018, 4:55 PM

So, for reading purposes we should be able to get rid of them and flatten them completely.

@ssastry The problem arises when Unicode characters are wrapped which are supposed to be combined with other characters. That way they get unnecessarily separated. I just updated your test page to demonstrate that. Compare the Parsoid output with the Desktop output.
Anyways, I think if we make the change at the MCS level that would be fine for us at the moment.

So, for reading purposes we should be able to get rid of them and flatten them completely.

@ssastry The problem arises when Unicode characters are wrapped which are supposed to be combined with other characters. That way they get unnecessarily separated. I just updated your test page to demonstrate that. Compare the Parsoid output with the Desktop output.

It renders *exactly the same* to me in Firefox as well as Chrome. The text in Olga's comment above as well as the linked pages from that comment all look identical in these browsers.

bearND added a comment.EditedFeb 22 2018, 5:17 PM

@ssastry To see the difference you may need to look closely. I've increased the font size so it's easier to see. This is with Mac Chrome v64. Note the difference in spacing where the diacritics are:

Here's Desktop version: characters around diacritics close together)

Parsoid version:

ssastry added a subscriber: Catrope.EditedFeb 22 2018, 5:21 PM

@ssastry To see the difference you may need to look closely. I've increased the font size so it's easier to see. This is with Chrome. Note the difference in spacing where the diacritics are:

Here's Desktop version: characters around diacritics close together)

Parsoid version:

I don't see this at all on Chrome and Firefox on Linux. This may potentially be a browser bug which is worth investigating separately. VE has filed bugs against some browsers in the past (@Catrope can probably tell you more :-)). But, in any case, for now, stripping the spans might be a good solution for you.

I don't see this at all on Chrome and Firefox on Linux. This may potentially be a browser bug which is worth investigating separately. VE has filed bugs against some browsers in the past (@Catrope can probably tell you more :-)). But, in any case, for now, stripping the spans might be a good solution for you.

This looks specific to Chrome on macOS. In Firefox it renders correctly. (I haven't tested any other browsers yet.)

Same on Safari on macOS.

bearND updated the task description. (Show Details)Feb 23 2018, 7:17 AM
bearND removed a project: Parsoid.EditedFeb 23 2018, 6:40 PM

Let's move the discussion about the diacritics spacing issue on some macOS browsers to T188127 so people not interested in the summary endpoint specifics can unsubscribe here.

bearND updated the task description. (Show Details)Feb 23 2018, 7:48 PM

Change 414620 had a related patch set uploaded (by BearND; owner: BearND):
[mediawiki/services/mobileapps@master] Summary: get rid of ALL double spaces

https://gerrit.wikimedia.org/r/414620

Change 414620 merged by Mholloway:
[mediawiki/services/mobileapps@master] Summary: get rid of ALL double spaces

https://gerrit.wikimedia.org/r/414620

Change 414737 had a related patch set uploaded (by BearND; owner: BearND):
[mediawiki/services/mobileapps@master] Summary: bump version to 1.3.2

https://gerrit.wikimedia.org/r/414737

bearND updated the task description. (Show Details)Feb 26 2018, 6:39 PM

Change 414737 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Summary: bump version to 1.3.2

https://gerrit.wikimedia.org/r/414737

bearND updated the task description. (Show Details)Feb 26 2018, 6:56 PM

@bearND - also pointed this out in T182319#4010523 - we're seeing a couple of cases (on every wiki tested so far) where math expressions are not rendering. An example:

  1. Go to https://en.wikipedia.org/wiki/Invertible_matrix
  2. Hover over Linear Algebra

Any ideas what might be happening here? I couldn't notice any issues with the markup.

See my response on the linked task.

bearND updated the task description. (Show Details)Mar 13 2018, 5:21 PM
Krinkle moved this task from Backlog to Summary on the Page Content Service board.Jun 6 2018, 2:11 AM

@bearND Is this fixed, should we resolve?

Jhernandez triaged this task as Low priority.
bearND removed bearND as the assignee of this task.Mar 15 2019, 4:10 PM
LGoto assigned this task to bearND.Mar 27 2019, 3:52 PM
LGoto added a subscriber: LGoto.

Hi @bearND can you look into this and resolve if needed? Thanks!

bearND updated the task description. (Show Details)Apr 5 2019, 11:18 PM
Restricted Application added a subscriber: Liuxinyu970226. · View Herald TranscriptApr 5 2019, 11:18 PM
bearND updated the task description. (Show Details)Apr 5 2019, 11:25 PM
bearND closed this task as Resolved.

Moved the remaining MCS issues to separate tasks.