
Special:LinkSearch cannot search with a port in the url
Closed, Resolved; Public

Description

Entering a URL that contains a port in the search form of Special:LinkSearch does not find the URL.

A user on de.wp was searching for:

http://www.gencat.net:8000/osial/owa/p01.dad_ens?via=0&cod=0800180001

This results in a query (on my localhost, git master):
SELECT page_namespace AS namespace, page_title AS title, el_index AS value, el_to AS url
FROM page, externallinks
FORCE INDEX (el_index)
WHERE (page_id = el_from)
AND (el_index LIKE 'http://net:8000.gencat.www./osial/owa/p01.dad\_ens?via=0&cod=0800180001%')

But el_index in the database is:
http://net.gencat.www.:8000/osial/owa/p01.dad_ens?via=0&cod=0800180001

These cannot match: the database stores the port after the complete reversed host name, whereas the special page leaves the port attached to the TLD, so it ends up in the middle of the reversed host name.
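To make the mismatch concrete, here is a minimal Python sketch (not the MediaWiki code, which is PHP). The function names `reverse_host`, `index_key` and `buggy_search_key` are hypothetical, and the key layout is only inferred from the two strings quoted above. It reverses the host labels once with the port split off first, and once with the port left attached to the last label.

```python
from urllib.parse import urlsplit


def reverse_host(host: str) -> str:
    """Reverse dot-separated labels and add a trailing dot: www.a.net -> net.a.www."""
    return ".".join(reversed(host.split("."))) + "."


def _tail(parts) -> str:
    """Path plus query string, unchanged."""
    return parts.path + ("?" + parts.query if parts.query else "")


def index_key(url: str) -> str:
    """Assumed el_index layout: port goes AFTER the fully reversed host."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError("URL has no host: %r" % url)
    key = parts.scheme + "://" + reverse_host(parts.hostname)
    if parts.port is not None:
        key += ":" + str(parts.port)
    return key + _tail(parts)


def buggy_search_key(url: str) -> str:
    """Reproduces the reported behaviour: 'host:port' is reversed as one string,
    so the port stays glued to the TLD label."""
    parts = urlsplit(url)
    return parts.scheme + "://" + reverse_host(parts.netloc) + _tail(parts)
```

With the URL from this report, `index_key` gives the value stored in the database and `buggy_search_key` gives the prefix used in the LIKE clause (before LIKE escaping), which is why the search finds nothing.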


Version: 1.20.x
Severity: normal

Details

Reference
bz40588

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 12:47 AM
bzimport set Reference to bz40588.

There are a few problems with the linksearch:

  • When no protocol is specified, the search defaults to http://, except when a port is specified: in that case the search uses no protocol at all and finds nothing.
  • When both a protocol and a port are specified, the generated LIKE clause is wrong, because the port is treated as part of the domain and is moved to the front with it.
  • It does not handle URLs with user names and passwords correctly.
  • The standard text on the page says you should not include the protocol in the search. This is not correct: to find links for any protocol other than http://, the protocol must be specified.
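One way to avoid the first three problems is to split the search string into its parts before any host reversal happens. The following Python sketch is only an illustration under stated assumptions: `SearchTarget` and `split_search` are hypothetical names, http:// is taken as the default protocol as described above, and IPv6 literals are not handled.

```python
import re
from typing import NamedTuple, Optional

# A protocol is only recognised at the very start of the pattern.
_PROTOCOL = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*://")


class SearchTarget(NamedTuple):
    protocol: str
    userinfo: Optional[str]
    host: str
    port: Optional[str]
    rest: str


def split_search(pattern: str, default_protocol: str = "http://") -> SearchTarget:
    """Split a search string into protocol, userinfo, host, port and remainder."""
    match = _PROTOCOL.match(pattern)
    if match:
        protocol, remainder = match.group(0), pattern[match.end():]
    else:
        # No protocol given: fall back to the default, even if a port is present.
        protocol, remainder = default_protocol, pattern

    # The authority part ends at the first '/', '?' or '#'.
    end = len(remainder)
    for sep in "/?#":
        pos = remainder.find(sep)
        if pos != -1:
            end = min(end, pos)
    authority, rest = remainder[:end], remainder[end:]

    # Split off "user:password@" so it is never mistaken for host or port.
    userinfo = None
    if "@" in authority:
        userinfo, authority = authority.rsplit("@", 1)

    # Split off a numeric port so only the bare host gets reversed later.
    host, port = authority, None
    head, sep, tail = authority.rpartition(":")
    if sep and tail.isdigit():
        host, port = head, tail

    return SearchTarget(protocol, userinfo, host, port, rest)
```

For example, `split_search("www.gencat.net:8000/osial")` keeps the http:// default and yields host `www.gencat.net` with port `8000`, so the port can be appended after the reversed host instead of travelling with the TLD.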

I have to put this cookie back. Fixing this is not as easy as it looks. I keep bumping into special cases needing extra checks and workarounds. Maybe the linksearch functionality should be rewritten from scratch, including support for IPv4 addresses (currently done ad hoc by LinkSearchPage::mungeQuery()) and more general wildcard support (like "ftp://*" to get all ftp links, etc).
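As a rough illustration of what general wildcard support could look like at the LIKE level, here is a small Python sketch. The helper name `like_prefix_pattern` is hypothetical and backslash is assumed as the LIKE escape character, matching the `\_` seen in the generated query. It escapes literal `%` and `_`, maps `*` to `%`, and makes the result a prefix match.

```python
def like_prefix_pattern(search: str, escape: str = "\\") -> str:
    """Turn a search key into a SQL LIKE prefix pattern.

    '*' becomes the LIKE wildcard '%'; literal '%', '_' and the escape
    character are escaped; a trailing '%' is added unless the pattern
    already ends with a wildcard.
    """
    out = []
    ends_with_wildcard = False
    for ch in search:
        if ch == "*":
            out.append("%")
            ends_with_wildcard = True
        elif ch in ("%", "_", escape):
            out.append(escape + ch)
            ends_with_wildcard = False
        else:
            out.append(ch)
            ends_with_wildcard = False
    if not ends_with_wildcard:
        out.append("%")
    return "".join(out)
```

Under these assumptions, `"ftp://*"` becomes `ftp://%`, which would match every ftp link, while an underscore in a path such as `p01.dad_ens` is escaped so it only matches itself.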

(In reply to comment #3)

I have to put this cookie back. Fixing this is not as easy as it looks. I keep bumping into special cases needing extra checks and workarounds. Maybe the linksearch functionality should be rewritten from scratch, including support for ipv4 addresses (currently done ad hoc by LinkSearchPage::mungeQuery()) and more general wildcard support (like "ftp://*" to get all ftp links, etc).

The current changeset fixes this problem and adds general wildcard support, so I think it is perfectly eligible for the 1.21 release.

Of course, IPv4 support can still be improved, but that's not related to this bug ;)

Reminder: As this has the 1.21.0 target milestone set, this patch needs more reviews and must be merged in the coming weeks to be included in 1.21.0.

Change 28908 merged by Brian Wolff:
(bug 40588) LinkSearch cannot search with a port in the url

https://gerrit.wikimedia.org/r/28908