
Special:Linksearch should de/encode internationalized domain names
Closed, Resolved · Public · 8 Estimated Story Points

Description

http://xn--kbenhavn-54a.eu is the Punycode-encoded (ASCII) form of the internationalized domain name http://københavn.eu. Steps to reproduce:

  1. Place a link to either of the above domains on any wiki page.
  2. Search for the other form in Special:Linksearch.

The link created in Step 1 is not returned. Special:Linksearch should find links for all combinations of a de/encoded search string and a de/encoded link.
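
The expected behaviour can be sketched as follows. This is only an illustration in Python, not MediaWiki code: the helper names (`to_ascii_host`, `to_unicode_host`, `same_host`) are hypothetical, and it relies on Python's built-in `idna` codec (IDNA 2003), which may differ in edge cases from whatever the wiki software uses.

```python
def to_ascii_host(host: str) -> str:
    """Return the Punycode (xn--) form of a hostname (hypothetical helper)."""
    # The built-in "idna" codec encodes each non-ASCII label and leaves
    # labels that are already ASCII untouched.
    return host.lower().encode("idna").decode("ascii")


def to_unicode_host(host: str) -> str:
    """Return the Unicode form of a hostname (hypothetical helper)."""
    # Go through the ASCII form first so that either input form is accepted,
    # then decode any xn-- labels back to Unicode.
    return host.lower().encode("idna").decode("idna")


def same_host(search: str, link: str) -> bool:
    """True if a search string and a stored link name the same domain."""
    # Comparing a single canonical form covers all four combinations of
    # encoded/decoded search string and encoded/decoded link.
    return to_unicode_host(search) == to_unicode_host(link)


if __name__ == "__main__":
    print(to_ascii_host("københavn.eu"))
    print(to_unicode_host("xn--kbenhavn-54a.eu"))
    print(same_host("xn--kbenhavn-54a.eu", "københavn.eu"))
```

Normalizing both the search string and the stored link to one form, rather than trying both forms at query time, keeps the comparison to a single lookup.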

Version: 1.27.0-wmf.17 (rMWd511973bb2ba) 03:00, 18 March 2016

Event Timeline

MER-C created this task. Mar 20 2016, 7:01 AM
Restricted Application added a subscriber: Aklapper. Mar 20 2016, 7:01 AM

Change 320721 had a related patch set uploaded (by Legoktm):
Parser: normalize internationalized domain names

https://gerrit.wikimedia.org/r/320721

My patch normalizes the externallinks table to the Unicode version, but requires that you use the Unicode version on Special:LinkSearch. Is that acceptable?

MER-C added a comment. Edited Nov 10 2016, 12:05 PM
  • It's necessary, but not sufficient: one could imagine someone placing a link like xn--kbenhavn-54a.eu (e.g. https://en.wikipedia.org/?diff=748790713). When one copies that link into Special:Linksearch, which is the usual workflow, getting no results violates the principle of least astonishment. Those not technically inclined wouldn't know what to search for, and those who are would have to convert the domain name to its Unicode form using some external tool.
  • Are there enough links of the form xn--kbenhavn-54a.eu existing in our databases to justify a maintenance script? (My gut feeling is no.)
  • Does this patch fix T130483 as a side effect?
  • It's necessary, but not sufficient: one could imagine someone placing a link like xn--kbenhavn-54a.eu (e.g. https://en.wikipedia.org/?diff=748790713). When one copies that link into Special:Linksearch, which is the usual workflow, getting no results violates the principle of least astonishment. Those not technically inclined wouldn't know what to search for, and those who are would have to convert the domain name to its Unicode form using some external tool.

Alright, I'll work on that then.

  • Are there enough links of the form xn--kbenhavn-54a.eu existing in our databases to justify a maintenance script? (My gut feeling is no.)

No idea, but we can either wait for pages to be naturally purged or use refreshLinks.php. But since you don't think so, we can just let it happen naturally.

  • Does this patch fix T130483 as a side effect?

Kind of. It normalizes everything to the Unicode form, so regexes that are in Unicode will get matched against both decoded and encoded domains.
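
To illustrate why normalizing to one form lets a single Unicode regex cover both spellings, here is a rough Python sketch. It is not the patch itself: `normalize_url` is a hypothetical helper, the pattern is made up for the example, and the built-in `idna` codec (IDNA 2003) stands in for whatever conversion the parser would use.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Rewrite a URL so that its host is in Unicode form (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Round-trip through the ASCII form so both input spellings are accepted.
    unicode_host = host.encode("idna").decode("idna")
    # Keep an explicit port if present; userinfo is dropped in this sketch.
    netloc = unicode_host if parts.port is None else f"{unicode_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# A pattern written in Unicode, as a filter rule might be (assumed example).
pattern = re.compile(r"københavn\.eu")

if __name__ == "__main__":
    for link in ("http://xn--kbenhavn-54a.eu/kort", "http://københavn.eu/kort"):
        raw_hit = pattern.search(link) is not None
        normalized_hit = pattern.search(normalize_url(link)) is not None
        print(link, raw_hit, normalized_hit)
```

Without normalization the pattern only matches the link written in Unicode; after normalization it matches both. A pattern written in the xn-- form would stop matching, which is why the behaviour would need to be documented.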

TheDJ added a subscriber: TheDJ. Nov 11 2016, 11:28 AM
  • Does this patch fix T130483 as a side effect?

Kind of. It normalizes everything to the Unicode form, so regexes that are in Unicode will get matched against both decoded and encoded domains.

I would consider that sufficient, as long as it is documented somewhere.

Anomie added a subscriber: Anomie. Nov 21 2016, 7:48 PM

Change 322729 had a related patch set uploaded (by Anomie):
Use new externallinks.el_index_60 field

https://gerrit.wikimedia.org/r/322729

MaxSem claimed this task. Jun 5 2017, 10:28 PM
MaxSem added a project: Community-Tech-Sprint.
kaldari set the point value for this task to 8. Jun 5 2017, 10:34 PM
Anomie added a comment. Jun 6 2017, 2:23 PM

@MaxSem: Note that https://gerrit.wikimedia.org/r/322729 would solve this task. Its current status: the schema change in T153182 needs to be done first, then https://gerrit.wikimedia.org/r/#/c/322728/ needs to be merged, and then a maintenance script needs to be run.

MaxSem removed MaxSem as the assignee of this task. Jun 6 2017, 7:50 PM
MaxSem removed a project: Community-Tech-Sprint.
MaxSem added a subscriber: MaxSem.

Okay, leaving this to avoid stepping on your toes.

Change 320721 abandoned by Legoktm:
Parser: normalize internationalized domain names

Reason:
Anomie's patch will take care of this once the schema change is done.

https://gerrit.wikimedia.org/r/320721

Change 322729 merged by jenkins-bot:
[mediawiki/core@master] Use new externallinks.el_index_60 field

https://gerrit.wikimedia.org/r/322729

Newly-created IDN links will now be translated.

Existing IDN links may not be found correctly for all searches until T209373: Run maintenance/refreshExternallinksIndex.php on all wikis is complete.

Anomie closed this task as Resolved. Nov 16 2018, 3:07 PM
Anomie claimed this task.