Page MenuHomePhabricator

Links: Add support for self-links to Parsoid
Closed, ResolvedPublic

Description

See https://en.wikipedia.org/wiki/User:Ssastry/VE_Test#Self-link_test and see Parsoid's rendering of the same.

https://meta.wikimedia.org/wiki/Help:Self_link is the documentation.

Causes rendering diffs in https://en.wikipedia.org/wiki/Prairie_Home_Companion -- found via looking at a visual diff.


Version: unspecified
Severity: normal

Details

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 3:31 AM
bzimport set Reference to bz67486.

One issue to consider with self-links is that the page name might not be constant. Flow posts for example are displayed in several contexts including timelines. Users might expect the 'self' to follow the actual page URL.

Originally we considered styling self links with CSS (attribute selectors), similar to red links.

Solution using CSS:

<html><style>

    a[href="test.html"] {
        font-weight: bold;
        color: inherit;
        text-decoration: inherit;
        pointer-events: none;
        cursor: text;
    }
</style>
<body>
    Some link to <a href="test.html">test.html</a> 
</body>

</html>

One issue to consider with self-links is that the page name might not be constant. Flow posts for example are displayed in several contexts including timelines. Users might expect the 'self' to follow the actual page URL.

We're consistently rendering against the topic now (except board header, which is rendered against the board).

Originally we considered styling self links with CSS (attribute selectors), similar to red links.

On a single board, topic A might link to itself. Then, topic B might link to topic A. Only the self-link should be bold. Topic B's link to Topic A should appear as a normal link.

Thus, to do it with CSS/LESS we would have to post-process. E.g. convert the above into the following LESS:

#flow-topic-se7l8dzgtdcd1f47 {
    a[href="/wiki/Topic:Se7l8dzgtdcd1f47"] {
        font-weight: bold;
        color: inherit;
        text-decoration: inherit;
        pointer-events: none;
        cursor: text;
    }
}

(the href differs from what Parsoid would give because of other post-processing).

It might end up simpler for Flow to do a DOM transformation (which we already do for other purposes, e.g. red links).

Maybe it should be marked as a self-link with data-mw; not sure.

@GWicke's idea about putting the "document identity" in the CSS is interesting, so that a link could be styled as a self-link (or not) depending on the CSS that is applied to it.

I think that's probably overkill though -- the existing parser has a number of ways that "the current document title" is part of the parsing context (though not ParserOptions, oddly, it's referenced through the Parser which is one reason a lot of parser hooks explicitly pass a Parser object...). {{PAGENAME}} etc. Self-links are (I expect) pretty rare, so they the fact that they could split a possible fragment cache based on 'current document title' isn't much of a concern right now.

I think the editing-uniformity issue is the main thing. We'd like the self link to have as much commonality with a standard link as possible, so that they are "obviously links" to editing tools and gadgets and don't require crazy special casing.

From the HTML spec, "just" changing the href attribute on the <a> tag to mw-href or something would be sufficient -- https://www.w3.org/TR/CSS22/selector.html#x26 notes that HTML defines :link as equivalent to a[href].

But I think @GWicke identified an even better method, which doesn't require any markup changes at all, other than adding a "link color" (MW has an extensible link color system, primarily for the Disambiguator extension but used in other extensions as well):

a.mw-selflink {
        font-weight: bold;
        color: inherit;
        text-decoration: inherit;
        pointer-events: none;
        cursor: text;
        user-select: text;
    }

The key looks to be the "pointer-events: none; cursor: text; user-select: text" aspect, which makes it non-clickable (and keeps the mouse cursor from making it appear clickable) but still allows text to be selected. There might need to be some minor fine tuning necessary for mobile touch events, but I bet that would work as-is (I think touch generally respects pointer-events). See https://developer.mozilla.org/en-US/docs/Web/CSS/pointer-events and https://developer.mozilla.org/en-US/docs/Web/CSS/user-select

Finally, note that properly adding the 'mw-selflink' color is not *quite* as simple as it may seem. In addition to the normal Title canonicalization stuff, LanguageConverter throws in some extra wrinkles to make sure that titles which are equivalent to each other except for variant (that is, if the page title "in the database" is in (say) Latin script, but you write the title in the article body in Cyrillic script, it still properly resolves as a self-link). Brief thoughts:

  1. We may decide to do the title variant normalization at the Parsoid level, and have the actual href be canonical (that is, guaranteed to be the direct "in database" link). This requires Parsoid to query the database for the presence of each title, though -- in a similar way to redlinks -- and of course that means the Parsoid href doesn't actually directly correspond to the wikitext editor's text *and* that the Parsoid output ought to change if an article of a specific name was ever added to the DB (again, like redlink marking). But if we did canonicalization at the href level (during language conversion?), then indeed the actual self-link code could just compare hrefs. In fact, it might make sense to put self-link code in the same pass as redlinks, and simply rerun that pass after language conversion.
  2. Alternatively, we could consider making self-link marking an extension, like the Disambiguator extension. No strong compelling reason here, other than to exercise the existing color hook and keep the Parsoid core as small as possible. While we're at it, we could probably make redlink marking an extension as well! That could reduce the size of Parsoid's data access API, and it would provide a good case study/example for a simple "post processing pass extension".

EDIT: just to spell it out, there's also title-remapping done by MediaWiki/Language Converter using HTTP redirects. That does the well-known space-to-underscore and first-capital-letter normalization *and also* if the supplied title appears to be missing, converts it to different variants and redirects if they exist. So you don't *need* for your <a> tag href attributes to be canonical -- they'll redirect dynamically if followed. Attempting to canonicalize them means that they will 'break' if new articles are created; that is, we might link directly to article-title-in-cyrillic even after article-title-in-latin is created, and the latter *should* be the target of wikilink-written-in-latin. But these invalidations are again similar to redlink marking, so they are worth keeping in mind but are not blockers.

I believe we currently use the title attribute of the <a> tag to hold the "title as it was written in wikitext" and already normalize the href. We just need *some* attribute of <a> (including data-mw) to reflect "the title as the author write it" in order to support round-trips and editing, we don't actually need that to be the href.

Not necessarily going to work on this immediately (I've got higher-priority parser tests tasks) but since I added the GetLinkColors hook to core/Parsoid I'll provisionally claim this task.

Jdforrester-WMF renamed this task from LInks: Add support for self-links to Links: Add support for self-links to Parsoid.Jan 8 2021, 6:11 PM

Self-links are (I expect) pretty rare, so they the fact that they could split a possible fragment cache based on 'current document title' isn't much of a concern right now.

I think most inclusions of navboxes contain a self-link. Not sure this is enough for for being a concern, but I wouldn’t classify it as “pretty rare”.

Self-links are (I expect) pretty rare, so they the fact that they could split a possible fragment cache based on 'current document title' isn't much of a concern right now.

I think most inclusions of navboxes contain a self-link. Not sure this is enough for for being a concern, but I wouldn’t classify it as “pretty rare”.

That is a little annoying, since it means that we can't reuse the expansion of the navbox on other pages -- but I bet we could work around that just by ensuring that link coloring is done after substitution. Redlink coloring is already done as a postprocessing step, so as long as self-link coloring is done in the same phase it should be fine.

Self-links also happen whenever you sign a comment with ~~~~ on your own talk page. (en.wp has a hack to prevent this with the default signature, but it will happen on most other projects and with most custom signatures.)

Are self-links with fragments going to be handled any differently to how they are now? We're running into this in WS Export where we're having to rewrite links because the HTML is being used in a document with a different title. For example, in the old parser [[#Foo|]] is rendered with href="#Foo" but it's now href="./Lorem#Foo" and so breaks if the page is saved in lorem.html. Even a way to select self-links would be good; at the moment they're just given rel="mw:WikiLink" along with most other links (some cases, such as references, are possible to work around, but not all – e.g. {{ref}}).

If we mark self-links at some point, the DiscussionTools team requests that we *preserve* the href in the HTML. It ought to be possible to suppress the link underline and link behavior via CSS. It complicates their code to have to deal with two types of links in their comment parser, some of which don't actually have href attributes.

So if we do end up implementing self-links, it ought to be via adding a link class and not otherwise changing the markup.

(This needs to be a post-processing pass in Parsoid, because pages can be transcluded or language-converted, both of which alter what is considered a 'self link'. We'd want to reuse the core parse of the page in other contexts with different definitions for self-links, and having self-link classes present in the core HTML for the page would prevent doing that.)

  1. Alternatively, we could consider making self-link marking an extension, like the Disambiguator extension. No strong compelling reason here, other than to exercise the existing color hook and keep the Parsoid core as small as possible. While we're at it, we could probably make redlink marking an extension as well! That could reduce the size of Parsoid's data access API, and it would provide a good case study/example for a simple "post processing pass extension".

Marking red and self-links seems to be a pretty core thing of MediaWiki, so please don’t let system administrators have to enable them explicitly. If you think of a piece of code within the MediaWiki core repository that behaves somewhat like an extension but is loaded implicitly (and preferably doesn’t show up on Special:Version), it’s okay, but it should remain part of core from system administrators’ point of view.

I think current plan is to do this in the redlink-marking pass (or in a new similar postprocessing pass) so that we can reuse core parsoid html as fragments in other pages, but the selflinks are replaced "at the last minute".

I'm pretty sure we can *just* add the mw-selflink class and then do the rest with CSS. It looks like there's already CSS in mediawiki-core:resources/src/mediawiki.skinning/content.links.less which does this in fact.

Note that proper self-link marking has a dependency on language converter, since (using pig-latin as the pretend variant) if we're on [[This Page]] then both [[This Page]] and [[Histay Agepay]] should be marked as self-links. That can probably be safely split off as a separate task once the "simple case" of self link marking is done.

(In theory you could handle self links without changing the HTML at all, just by adding a CSS rule to each page that matched a[href="./This_Page"] and in theory that could be done via ParserOutput::addHeadItem(). That would also cleanly push the responsibility for self link marking in the presence of language variants over to the language converter code, which would then just ensure that href targets are properly language converted. But that's probably too clever by half.)

Change 991827 had a related patch set uploaded (by Isabelle Hurbain-Palatin; author: Isabelle Hurbain-Palatin):

[mediawiki/services/parsoid@master] Add self-link processing

https://gerrit.wikimedia.org/r/991827

Change 991827 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Add self-link processing

https://gerrit.wikimedia.org/r/991827

Created T358193 as follow-up for possible language conversion issues, but the core behaviour is fixed.