
[Regression] Many url values have a malformed subdomain in wiki meta_p database table
Closed, Resolved · Public

Description

Recently, the urls for about 60 wikis have become broken. It seems something truncated the subdomain part to 2 characters, which is arbitrary and wrong: it breaks the urls and/or makes them point to unrelated wikis.

Example:

cewiki ce.wikipedia.org
cebwiki ce.wikipedia.org
..
chwiki ch.wikipedia.org
chrwiki ch.wikipedia.org
chywiki ch.wikipedia.org
..
ganwiki ga.wikipedia.org
gagwiki ga.wikipedia.org
gawiki ga.wikipedia.org

Affected tools end up either linking to broken domains or linking to the same domain twice, and some compute an incorrect result.

See also:

-- MariaDB [meta_p]>
SELECT url, COUNT(*) c FROM wiki GROUP BY url HAVING c>1;
-- 66 rows in set (0.04 sec)

This query should return 0 rows.
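The duplicate check above can be reproduced locally as a rough sketch. It loads the sample rows from this description into an in-memory SQLite table and runs an equivalent query. The two-column table layout, the `dbname` column name and the `duplicate_urls` helper are assumptions made for illustration; the real meta_p.wiki table lives on MariaDB and has more columns.

```python
import sqlite3

# Sample (dbname, url) rows copied from the task description.
# The column name "dbname" is an assumption for this sketch.
ROWS = [
    ("cewiki", "ce.wikipedia.org"),
    ("cebwiki", "ce.wikipedia.org"),
    ("chwiki", "ch.wikipedia.org"),
    ("chrwiki", "ch.wikipedia.org"),
    ("chywiki", "ch.wikipedia.org"),
    ("ganwiki", "ga.wikipedia.org"),
    ("gagwiki", "ga.wikipedia.org"),
    ("gawiki", "ga.wikipedia.org"),
]


def duplicate_urls(rows):
    """Return (url, count) pairs for every url shared by more than one wiki."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE wiki (dbname TEXT, url TEXT)")
    conn.executemany("INSERT INTO wiki VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT url, COUNT(*) AS c FROM wiki "
        "GROUP BY url HAVING COUNT(*) > 1 ORDER BY url"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for url, count in duplicate_urls(ROWS):
        print(url, count)
```

With a healthy table, where every wiki has its own url, `duplicate_urls` returns an empty list.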

Event Timeline

Krinkle created this task. Dec 22 2016, 11:36 PM
Restricted Application added a subscriber: Aklapper. Dec 22 2016, 11:36 PM
Krinkle triaged this task as Unbreak Now! priority. Dec 22 2016, 11:36 PM
Restricted Application added subscribers: Jay8g, TerraCodes. Dec 22 2016, 11:36 PM
Krinkle updated the task description. Dec 22 2016, 11:38 PM

This also affects GUC, as it may now produce "diff" urls to a revision that doesn't exist on the target domain (or exists but is unrelated).

This would be due to https://gerrit.wikimedia.org/r/#/c/325949/7/modules/role/files/labs/db/views/maintain-meta_p.py, where matches = re.match("^(.*)(wik[it].*)", db) and lang = matches.group(1) were changed to lang = db[:2].
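To illustrate the difference, here is a minimal sketch comparing the two ways of deriving the language code from a database name. The regex and the slice are quoted from the comment above; the wrapper functions `lang_from_regex` and `lang_from_slice` are hypothetical names introduced for this example and are not taken from the script.

```python
import re


def lang_from_regex(db):
    """Original approach: everything before the 'wiki'/'wikt' part."""
    matches = re.match("^(.*)(wik[it].*)", db)
    return matches.group(1) if matches else None


def lang_from_slice(db):
    """Regressed approach: always the first two characters."""
    return db[:2]


if __name__ == "__main__":
    for db in ("cewiki", "cebwiki", "chrwiki", "ganwiki"):
        print(db, lang_from_regex(db), lang_from_slice(db))
```

For any language code longer than two characters, the slice collapses it onto a different, shorter code. That is why cebwiki ends up pointing at ce.wikipedia.org.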

It was self-merged by @chasemp

chasemp added a subscriber: madhuvishy.

@madhuvishy it seems we missed this; can you dig in and figure out the right fix? Running maintain_meta-p on labsdb10[9|10|11] should be fine for evaluating. The live values are still on 1001 and 1003, for when there is a fix.

Wargo added a subscriber: Wargo. Dec 23 2016, 5:58 PM

Change 328929 had a related patch set uploaded (by Madhuvishy):
labsdb: Fix wiki url construction in maintain_meta-p

https://gerrit.wikimedia.org/r/328929

Change 328929 merged by Madhuvishy:
labsdb: Fix wiki url construction in maintain_meta-p

https://gerrit.wikimedia.org/r/328929

Change 328931 had a related patch set uploaded (by Madhuvishy):
labsdb: Fix maintain-meta_p to insert correct url into wiki db

https://gerrit.wikimedia.org/r/328931

Change 328931 merged by Madhuvishy:
labsdb: Fix maintain-meta_p to insert correct url into wiki db

https://gerrit.wikimedia.org/r/328931

I reverted to the regex @Krenair pointed out for parsing the language. It should now be fixed everywhere. The script was also inserting underscores in the urls instead of hyphens - that is fixed now as well.
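As a hedged sketch of the two fixes combined (not the merged patch itself), the hostname can be built by parsing the language with the regex and then replacing underscores with hyphens, since database names use underscores where hostnames use hyphens. The function name `host_for_db` and the fixed `domain` default are assumptions for illustration; the real script chooses the domain for each wiki itself.

```python
import re


def host_for_db(db, domain="wikipedia.org"):
    """Build a hostname from a database name (illustrative helper only)."""
    matches = re.match("^(.*)(wik[it].*)", db)
    if matches is None:
        raise ValueError("cannot parse language from %r" % db)
    # Database names use underscores; hostnames need hyphens.
    lang = matches.group(1).replace("_", "-")
    return "{}.{}".format(lang, domain)


if __name__ == "__main__":
    print(host_for_db("cebwiki"))         # ceb.wikipedia.org
    print(host_for_db("zh_min_nanwiki"))  # zh-min-nan.wikipedia.org
```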

madhuvishy closed this task as Resolved. Dec 23 2016, 8:31 PM

Closing the task now, please reopen if this is still an issue. Thanks!