
Duplicate rows in externallinks table
Open, Low, Public

Description

Author: broken.arrow

Description:
The externallinks table may contain duplicate rows, even if the link is present
only once in the page text. Editing the page does not remove the stale entries
on the live site; running refreshLinks.php on a local copy does.

The page in the URL field is only one of several examples. Affected pages on
it.wp include Acanthocalycium, Fegato, and Elezione_incondizionata.


Version: 1.20.x
Severity: normal
URL: http://it.wikipedia.org/w/index.php?title=Speciale%3ALinksearch&target=http%3A%2F%2Fwww.billie-joe.com

Details

Reference
bz9900

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 9:40 PM
bzimport added a project: Wikimedia-Rdbms.
bzimport set Reference to bz9900.
bzimport added a subscriber: Unknown Object (MLST).

Is this still a problem?

(In reply to comment #1)
> Is this still a problem?

It seems there are still a bunch.

$ echo 'select el_from, el_to, count(*) c from externallinks group by el_from, el_to having c > 1;' | sql itwiki_p > bug9900

http://toolserver.org/~liangent/-/dbq/bug9900

545907 rows in set (2 min 45.47 sec)
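
For anyone who wants to try the duplicate check without access to the replica, here is a minimal sketch of the same GROUP BY / HAVING query run against an in-memory SQLite table. The table is a simplified, hypothetical stand-in for externallinks (three columns only, made-up sample rows), not the real MediaWiki schema.

```python
import sqlite3

# Simplified stand-in for externallinks; the columns and sample rows are
# assumptions for illustration only, not the production schema or data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE externallinks (el_from INTEGER, el_to TEXT, el_index TEXT)"
)
rows = [
    (1, "http://example.org/a", "http://org.example./a"),
    (1, "http://example.org/a", "http://org.example./a"),  # duplicate of the row above
    (1, "http://example.org/b", "http://org.example./b"),
    (2, "http://example.org/a", "http://org.example./a"),  # same URL, different page
]
conn.executemany("INSERT INTO externallinks VALUES (?, ?, ?)", rows)


def find_duplicates(connection):
    """Return (el_from, el_to, count) for every link stored more than once."""
    return connection.execute(
        "SELECT el_from, el_to, COUNT(*) AS c "
        "FROM externallinks "
        "GROUP BY el_from, el_to "
        "HAVING COUNT(*) > 1 "
        "ORDER BY el_from, el_to"
    ).fetchall()


duplicates = find_duplicates(conn)
print(duplicates)
```

Only the (page, URL) pair that occurs twice is reported; the same URL linked from a different page is not counted as a duplicate.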

Looking at tables.sql on Gerrit (https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/core.git;a=blob;f=maintenance/tables.sql;h=a848bf5eb469ce63b2693b4a392241c5eab76dd1;hb=HEAD), we can see that the pagelinks, templatelinks, categorylinks, imagelinks, langlinks, and iwlinks tables all have a unique index. externallinks, however, has the following indices, none of which is unique:


CREATE INDEX /*i*/el_from ON /*_*/externallinks (el_from, el_to(40));
CREATE INDEX /*i*/el_to ON /*_*/externallinks (el_to(60), el_from);

CREATE INDEX /*i*/el_index ON /*_*/externallinks (el_index(60));
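
To illustrate why the missing UNIQUE matters, the sketch below inserts the same row twice into two hypothetical tables, one with a plain index and one with a unique index. It uses SQLite through Python, so the prefix lengths such as el_to(40) are dropped (SQLite has no prefix indexes); the table and index names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table with a plain (non-unique) index, like externallinks.
conn.execute("CREATE TABLE el_plain (el_from INTEGER, el_to TEXT)")
conn.execute("CREATE INDEX el_plain_from ON el_plain (el_from, el_to)")

# Hypothetical table with a unique index, like the other *links tables.
conn.execute("CREATE TABLE el_unique (el_from INTEGER, el_to TEXT)")
conn.execute("CREATE UNIQUE INDEX el_unique_from ON el_unique (el_from, el_to)")

link = (1, "http://example.org/a")


def insert_twice(table):
    """Insert the same link twice; return (stored row count, whether one insert was rejected)."""
    rejected = False
    for _ in range(2):
        try:
            conn.execute(f"INSERT INTO {table} VALUES (?, ?)", link)
        except sqlite3.IntegrityError:
            # Raised when the unique index refuses the second copy.
            rejected = True
    count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    return count, rejected


plain_result = insert_twice("el_plain")
unique_result = insert_twice("el_unique")
print(plain_result, unique_result)
```

With the plain index both copies are stored silently, while the unique index rejects the second insert. This only shows the general behaviour of the two index types; it says nothing about how the duplicates were actually written on the live site.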