Page MenuHomePhabricator

Links aren't merged aggressively enough
Closed, ResolvedPublic8 Story Points

Description

  1. Ensure a page called "Foo" exists
  2. On a different page, add [[foo]]
  3. Edit that page in VE, remove the f, and add F instead
  4. The F is not linked, so select it, press Ctrl+K, and link it to Foo using the search suggestions
  5. You now have two links (one covering F and one covering oo) in the CE rendering, and the wikitext serialization looks like [[Foo|f]][[foo|oo]]

Adjacent links to the same target should be merged, even if they use different capitalizations, and even if one of them comes from Parsoid and the other from VE.

Event Timeline

Catrope created this task.Apr 3 2015, 6:26 PM
Catrope assigned this task to Esanders.
Catrope raised the priority of this task from to Needs Triage.
Catrope updated the task description. (Show Details)
Catrope added a subscriber: Catrope.
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptApr 3 2015, 6:26 PM
Catrope set Security to None.Apr 3 2015, 6:26 PM
Catrope edited a custom field.
Catrope added a subscriber: Elitre.

I think this should be handled by Parsoid. The following HTML:

<p><a href="./Foo" rel="mw:WikiLink">F</a><a href="./Foo" rel="mw:WikiLink">oo</a></p>
<p><b>F</b><b>oo</b></p>

Serialises to:
[[Foo|F]][[Foo|oo]]
'''Foo'''

The reason VE doesn't merge the A tags is because they contain different HTML attributes. In these cases they are meaningless, but generally they may not be. Parsoid should notice the adjacent annotations are comparable and merge them.

The interesting part about this is that @cscott and I argued against @GWicke in https://phabricator.wikimedia.org/T44803 when I ripped out my extremely generic, but what I thought in retrospect was unnecessary, tag minimization algorithm (but which broke at some point) in favour of focusing only on i/b tags and letting other annotations be as is.

That algorithm (after fixing its brokenness / simplifying it) would have been able to tackle <b><a>x</a></b><a>y</a> and rewrite it to <a><b>x</b>y</a>. I am not convinced we should re-implement such a generic annotation rewriting algorithm in Parsoid.

A different strategy is beginning to emerge in Parsoid land .. which is to set up DOM normalization pass that can be enabled with an api parameter ... but, there is a consideration of possibly spinning out parts of it into a library so that it is usable client-side as well, if found useful.

FYI The reason they don't always get merged in VE is because we compare DOM attributes. Links created afresh in VE will have none, where links loaded from Parsoid will. In order to compare these we'd need to filter out the DOM attributes which don't affect the meaning of the link, but that is not something VE should know about. Nor can we blindly throw away all DOM attributes, as sometimes they represent a meaningful difference (e.g. custom styles on <b> or <i> tags).

Change 201896 had a related patch set uploaded (by Esanders):
Merge MW internal link annotations if they have the same target

https://gerrit.wikimedia.org/r/201896

I did a quick experiment .. and I have A-merging code in place (that can handle simple scenarios). Will push a patch by recasting the existing quote minimization code into a normalizeDOM pass. Needs more testing as well.

Change 201896 merged by jenkins-bot:
Merge MW internal link annotations if they have the same target

https://gerrit.wikimedia.org/r/201896

Catrope closed this task as Resolved.Apr 7 2015, 12:01 AM

Checked the fix in beta.

Jdforrester-WMF triaged this task as Normal priority.Apr 8 2015, 8:19 PM
Jdforrester-WMF removed a project: Patch-For-Review.

There's a Parsoid patch which is under discussion: https://gerrit.wikimedia.org/r/201926

As presently consituted (PS5), we can't always guarantee that the <a> tags will be merged, for example if two adjacent pre-existing <a> tags are edited to have the same href, we won't merge them.

I don't think this sort of incomplete normalization is worth doing in Parsoid, because it introduces corner cases that will eventually breed bugs. Since Parsoid can't always normalize <a> tags in edited content, I think @Etonkovidova's (already merged) patch to fix this on the VE side is the superior solution.

The fix is in test2(1.26wmf2 (71e4dd1).
The production still displays split up & doubled link -[[Foo|f]][[foo|oo]]