Duplicate rows in externallinks table
Open, Low, Public


Author: broken.arrow

The externallinks table may contain duplicate rows, even if the link is present
only once in the page text. Editing the page does not remove the stale entries
on the live site; running refreshLinks.php on a local copy does.

The page above is only one of several examples. Some of the affected pages on
it.wp include Acanthocalycium, Fegato, Elezione_incondizionata, etc.

Version: 1.20.x
Severity: normal



Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 9:40 PM
bzimport added a project: Wikimedia-Rdbms.
bzimport set Reference to bz9900.
bzimport added a subscriber: Unknown Object (MLST).

(In reply to comment #1)

Is this still a problem?

It seems there are still quite a few.

$ echo 'select el_from, el_to, count(*) c from externallinks group by el_from, el_to having c > 1;' | sql itwiki_p > bug9900

545907 rows in set (2 min 45.47 sec)
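
For anyone who wants to try the duplicate check without access to the replica, here is a minimal sketch that runs the same kind of GROUP BY / HAVING query against an in-memory SQLite table. The table layout is reduced to three columns and the sample rows are made up for illustration; `find_duplicates` is a hypothetical helper, not something from MediaWiki.

```python
import sqlite3

# Same idea as the query above: count rows per (el_from, el_to) pair and
# keep only the pairs that occur more than once.
DUPLICATE_QUERY = """
    SELECT el_from, el_to, COUNT(*) AS c
    FROM externallinks
    GROUP BY el_from, el_to
    HAVING COUNT(*) > 1
    ORDER BY el_from, el_to
"""


def find_duplicates(rows):
    """Load rows into a throwaway table and return the duplicated pairs."""
    conn = sqlite3.connect(":memory:")
    # Simplified stand-in for the real externallinks schema (assumption).
    conn.execute(
        "CREATE TABLE externallinks (el_from INTEGER, el_to TEXT, el_index TEXT)"
    )
    conn.executemany("INSERT INTO externallinks VALUES (?, ?, ?)", rows)
    result = conn.execute(DUPLICATE_QUERY).fetchall()
    conn.close()
    return result


# Made-up sample data: page 1 links to example.org twice.
sample_rows = [
    (1, "http://example.org/", "http://org.example./"),
    (1, "http://example.org/", "http://org.example./"),
    (1, "http://example.com/", "http://com.example./"),
    (2, "http://example.org/", "http://org.example./"),
]

duplicates = find_duplicates(sample_rows)
print(duplicates)
```

Note that this groups on el_from and el_to only, so rows that differ in any other column are still counted as duplicates.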

Looking at tables.sql on Gerrit (;a=blob;f=maintenance/tables.sql;h=a848bf5eb469ce63b2693b4a392241c5eab76dd1;hb=HEAD), we can see that the pagelinks, templatelinks, categorylinks, imagelinks, langlinks, and iwlinks tables all have a unique index. externallinks, however, has only the following non-unique indices:

CREATE INDEX /*i*/el_from ON /*_*/externallinks (el_from, el_to(40));
CREATE INDEX /*i*/el_to ON /*_*/externallinks (el_to(60), el_from);

CREATE INDEX /*i*/el_index ON /*_*/externallinks (el_index(60));
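
To illustrate why the missing unique index matters, here is a small sketch using in-memory SQLite. The two `*_demo` tables and their columns are simplified stand-ins invented for this example, not the real MediaWiki schema, and SQLite has no prefix indexes, so the `(40)`/`(60)` lengths are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for a links table that has a unique index (like pagelinks).
conn.execute("CREATE TABLE pagelinks_demo (pl_from INTEGER, pl_title TEXT)")
conn.execute(
    "CREATE UNIQUE INDEX pl_from_idx ON pagelinks_demo (pl_from, pl_title)"
)

# Stand-in for externallinks: the index is a plain, non-unique one.
conn.execute("CREATE TABLE externallinks_demo (el_from INTEGER, el_to TEXT)")
conn.execute(
    "CREATE INDEX el_from_idx ON externallinks_demo (el_from, el_to)"
)


def insert_twice(table, row):
    """Insert the same row twice and return how many copies were stored."""
    for _ in range(2):
        try:
            conn.execute(f"INSERT INTO {table} VALUES (?, ?)", row)
        except sqlite3.IntegrityError:
            # The unique index rejected the second copy.
            pass
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


unique_count = insert_twice("pagelinks_demo", (1, "Fegato"))
plain_count = insert_twice("externallinks_demo", (1, "http://example.org/"))
print(unique_count, plain_count)
```

With the unique index the database itself refuses the second copy; with a plain index nothing at the schema level prevents identical rows.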

Try the query

select * from externallinks where el_to like "//%" limit 10;

and you will see the so-called "duplicates". el_to is duplicated, but el_index (and el_index_60) differs between the rows.
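
A rough sketch of that situation, again with in-memory SQLite. It assumes that a protocol-relative link is stored once per scheme (http and https) with a different el_index each time; `make_index_rows` is a hypothetical, heavily simplified helper written for this example and is not the actual MediaWiki indexing code.

```python
import sqlite3
from urllib.parse import urlsplit


def make_index_rows(page_id, url):
    """Build (el_from, el_to, el_index) rows for one link (simplified)."""
    parts = urlsplit(url)
    # Assumption: a protocol-relative URL gets one row per scheme.
    schemes = ["http", "https"] if url.startswith("//") else [parts.scheme]
    # Index form used here: scheme, reversed host with trailing dot, path.
    reversed_host = ".".join(reversed(parts.hostname.split("."))) + "."
    path = parts.path or "/"
    return [(page_id, url, f"{s}://{reversed_host}{path}") for s in schemes]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE externallinks (el_from INTEGER, el_to TEXT, el_index TEXT)"
)
rows = make_index_rows(7, "//example.org/page") + make_index_rows(
    7, "http://example.com/"
)
conn.executemany("INSERT INTO externallinks VALUES (?, ?, ?)", rows)

# Grouping on el_to alone reports the protocol-relative link as a duplicate.
apparent = conn.execute(
    "SELECT el_from, el_to, COUNT(*) FROM externallinks "
    "GROUP BY el_from, el_to HAVING COUNT(*) > 1"
).fetchall()

# Adding el_index to the grouping shows the rows are not identical.
exact = conn.execute(
    "SELECT el_from, el_to, el_index, COUNT(*) FROM externallinks "
    "GROUP BY el_from, el_to, el_index HAVING COUNT(*) > 1"
).fetchall()

print(apparent, exact)
```

Under these assumptions, a count that groups only on el_from and el_to overstates the number of true duplicates; including el_index in the GROUP BY separates the two cases.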

I think this can be closed.