
URL shall be terminated by %2E or _ if page name ends with dot
Closed, Duplicate
Public

Assigned To
None
Authored By
PerfektesChaos
Jul 14 2020, 8:11 PM
Referenced Files
F31936073: image.png
Jul 15 2020, 1:53 PM
F31936075: image.png
Jul 15 2020, 1:53 PM
Tokens
"Like" token, awarded by MGChecker.

Description

Objective: Various text-processing software automatically creates a link when a URL is encountered in user text. Office programs, mail clients, blogs, chats and messengers all do that.

  • If a URL is followed by a dot, and perhaps whitespace, the dot is not included in the generated link target. It is interpreted as terminating the sentence, and the link is broken.
  • Wiki servers have behaved this way, e.g. for ,.;? since 2001.
  • Phabricator does the same, as seen here: https://en.wikipedia.org/wiki/Hell,_etc.
  • enwiki now has a smart mechanism that asks "Did you mean".
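The effect described in the list above can be reproduced with a small sketch. The function autoLinkTarget is hypothetical and only imitates what many auto-linkers appear to do; it is not the code of any particular mail client, wiki parser or Phabricator.

```javascript
// Hypothetical auto-linker: finds the first URL in free text and, like many
// mail clients and chat tools, drops trailing sentence punctuation.
function autoLinkTarget(text) {
  const match = text.match(/https?:\/\/\S+/);
  if (match === null) {
    return null;
  }
  // Strip any run of trailing , . ; ? that is taken as sentence punctuation.
  return match[0].replace(/[,.;?]+$/, "");
}

// The trailing dot of the page name is lost, so the link target is wrong.
const brokenLink = autoLinkTarget(
  "See https://en.wikipedia.org/wiki/Hell,_etc. for details."
);

// With %2E or a trailing _ the last character is no longer punctuation.
const encodedLink = autoLinkTarget(
  "See https://en.wikipedia.org/wiki/Hell,_etc%2E for details."
);
const underscoreLink = autoLinkTarget(
  "See https://en.wikipedia.org/wiki/Hell,_etc._ for details."
);
```

Under these assumptions only the first call loses the dot; the other two targets survive unchanged.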

Proposal: If the last character of a URL would be a dot (U+002E), it should be followed by _ in the address bar and represented by the %2E encoding in any generated <a> link.
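As a rough sketch of the proposal, the two rewrites could look like this. Both function names are made up for illustration and are not MediaWiki API.

```javascript
// Hypothetical helpers, not part of MediaWiki.

// For generated <a href="..."> links: replace a trailing dot with %2E.
function hrefWithEncodedTrailingDot(url) {
  return url.endsWith(".") ? url.slice(0, -1) + "%2E" : url;
}

// For the address bar: append "_" so the dot is no longer the last character.
function addressBarUrlWithUnderscore(url) {
  return url.endsWith(".") ? url + "_" : url;
}

const proposalPageUrl = "https://en.wikipedia.org/wiki/Hell,_etc.";
const proposalHref = hrefWithEncodedTrailingDot(proposalPageUrl);
const proposalShown = addressBarUrlWithUnderscore(proposalPageUrl);
```

URLs that do not end with a dot pass through both helpers unchanged.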

We already do the same for ?, as in https://es.wikipedia.org/wiki/What%27s_Up,_Doc%3F, although there the escaping is needed to keep the ? from starting the query part of the URL.

  • Comma and semicolon may turn out to be affected as well, but the number of wiki page names ending with them is rather limited.

Event Timeline

I think this is impossible. Unlike the question mark ? etc., the dot . is not a reserved character. Even if MediaWiki percent-encoded the dot as you suggest, the browser is going to decode it when displaying the URL.
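The difference between the two characters can be seen with the standard encodeURIComponent and decodeURIComponent functions. This is only an illustration of reserved versus unreserved characters, not of what MediaWiki or any browser does internally; the variable names are made up.

```javascript
// The question mark is reserved, so the standard encoder escapes it.
const encodedQuestionMark = encodeURIComponent("Doc?");

// The dot is unreserved, so the standard encoder leaves it alone.
const encodedDot = encodeURIComponent("etc.");

// "%2E" and "." are equivalent: decoding gives back the plain dot.
const decodedDot = decodeURIComponent("etc%2E");
```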

I actually wasn't sure what was going to happen (I was half-expecting a bug like this: T106793), so I tested it. I'll upload that test in a moment for you to see.

Change 612866 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[mediawiki/core@master] [DO NOT MERGE] Percent-encode trailing dot in page titles

https://gerrit.wikimedia.org/r/612866

Change 612866 abandoned by Bartosz Dziewoński:
[mediawiki/core@master] [DO NOT MERGE] Percent-encode trailing dot in page titles

Reason:
Doesn't work, see task

https://gerrit.wikimedia.org/r/612866

Here's a demo site running that patch: http://patchdemo.wmflabs.org/wikis/32cd13c9d1a64f0f3af3937f4be4f667/w/index.php/Hell,_etc.

You can confirm that the patch works by looking at the href on the "Read" tab link, where the dot is encoded as %2E:

image.png (2×3 px, 603 KB)

But if you click that link, the browser requests the URL with . instead (it's not even a redirect, it just changes the URL), and it's also displayed as a plain dot on the address bar:

image.png (2×3 px, 470 KB)

I see the same behavior in Chrome, Firefox and Edge.

(Feel free to experiment with the demo site.)

Okay, thanks for your efforts.

While this seems to be knocked out by the browser's refining and polishing of the address bar, I dreamt of another issue last night. (I should ponder some JavaScript that attempts to set document.url, but that usually triggers a page reload; however, there are some strange history manipulation methods I used some years ago.)

What about grabbing the URL via the context menu? If the wikitext contains [[Hell, etc.]] and the generated link Hell, etc. is delivered with %2E rather than ., I could at least copy that URL into my text environment.

I will check that window.history.replaceState() business anyway. Running some JavaScript after the page has loaded, just to manipulate the address bar URL on a wiki, does not promise good performance, though.

Basically, it would be a matter for the major browsers not to rewrite a dot (comma, semicolon, brackets), and even to encode them when copying (they display most characters decoded, but serve an encoded version or the original string on copy). All users of those browsers, on any domain, would benefit when pasting URLs into text input anywhere.

The failure first.

Steps to reproduce:

  1. Open https://en.wikipedia.org/wiki/Hell,_etc%2E
  2. Open the JavaScript console.
  3. Enter the following statements.

Watch the address bar and try to copy and paste:

window.history.pushState( {wiki:"textInput"}, "no dot", "Hell,_etc%2E" )
window.history.pushState( {wiki:"textInput"}, "no dot", "Heaven_and_below" )

However, that brought me to a different approach to solve the initial problem. What about:

I am afraid that some smart browsers will swallow the #, but the _ will protect the period from occurring as the last character.

PerfektesChaos renamed this task from URL shall be terminated by %2E if page name ends with dot to URL shall be terminated by %2E or _ if page name ends with dot. Jul 17 2020, 10:43 AM
PerfektesChaos updated the task description.
  • For generating any URL, internal or external, %2E in <a href="...%2E"> would be sufficient. It is copied as-is by context menu mechanisms and hides the dot from any mail client or office program that might otherwise mutilate the URL. However, that is the same URL as before.
  • Our pages themselves should declare their own URL with an appended _, which is kept in the browser's address input field and can be copied from there. If the wiki server is called with a superfluous _, it is removed anyway.
  • In footers, in Special:CiteThisPage etc., the URL of this page, if any, shall be propagated terminated by %2E; currently, however, those URLs end with digits anyway.
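To illustrate why all three spellings would reach the same page, here is a simplified sketch of title normalization. The function normalizeTitle is hypothetical and much cruder than MediaWiki's real title handling; it only assumes, as stated above, that a superfluous trailing _ is removed by the server.

```javascript
// Hypothetical, simplified normalization of the title part of a wiki URL.
function normalizeTitle(pathTitle) {
  // Percent-decoding turns %2E back into a plain dot.
  const decoded = decodeURIComponent(pathTitle);
  // Drop superfluous trailing underscores, then map the rest to spaces.
  return decoded.replace(/_+$/, "").replace(/_/g, " ");
}

const fromPlainDot = normalizeTitle("Hell,_etc.");
const fromEncodedDot = normalizeTitle("Hell,_etc%2E");
const fromUnderscore = normalizeTitle("Hell,_etc._");
```

Under these assumptions all three inputs yield the title "Hell, etc.", so neither variant would need a new page or a redirect.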

Hmm, the idea with _ is clever and I kind of like it.