
namespaceDupes not handling deleted namespace redirects as desired
Closed, Resolved, Public

Event Timeline

He7d3r raised the priority of this task to High.
He7d3r updated the task description.
He7d3r added a subscriber: He7d3r.
He7d3r renamed this task from "Execute namespaceDupes on Porguguese Wikipedia" to "Execute namespaceDupes on Portuguese Wikipedia" (Mar 3 2015, 3:35 PM).
He7d3r set Security to None.
Krenair renamed this task from "Execute namespaceDupes on Portuguese Wikipedia" to "namespaceDupes not handling deleted namespace redirects as desired" (Mar 3 2015, 4:12 PM).
He7d3r raised the priority of this task from High to Unbreak Now! (Mar 4 2015, 7:33 PM).
He7d3r updated the task description.

@He7d3r: Why is this "Unbreak now" priority?
Fixing this in MediaWiki's codebase won't have any direct influence on what's happening on Wikimedia sites like T75164.

Well, a fixed version of the script needs to be executed for ptwiki, so that things work as expected again.
We don't even have a way to find the pages which contain the (now) red links to "Anexo:Something" because
https://pt.wikipedia.org/wiki/Special:WhatLinksHere/Anexo:Biografias
doesn't show them (there is at least one other page linking to that title), there are no deletion logs and
https://pt.wikipedia.org/wiki/Special:WantedPages
should probably have some entries there too. As far as the users can see, the pages simply disappeared.

CC'ing @greg to evaluate how urgent this is.

In the description, @He7d3r wrote:

The script namespaceDupes.php should fix that and hundreds (possibly thousands) of other similar cases, such as the ones reported in this discussion:

And in a later comment:

Well, a fixed version of the script needs to be executed for ptwiki, so that things work as expected again.

Which is it? :) If all we need to do is simply run that maint script, then this can be done quickly. If there are fixes needed to that script to do this then that'll take longer. Also, what are those fixes to the script if you know them?

@greg: I don't know what needs to be fixed, but @demon executed its current version (see T75164#1082029) and it didn't restore the previous behavior of the redirects.

Change 194570 had a related patch set uploaded (by Greg Grossmeier):
WIP: namespaceDupes: find and orphaned namespaces

https://gerrit.wikimedia.org/r/194570

@demon: Do you plan to improve the patch in Gerrit?

Isn't it possible to fix the current problem directly in the database, so that ptwiki users are no longer affected, and then lower this bug's priority (and have it block any future namespace removal request)? Or is fixing it in the database as difficult as rewriting the script to do it?

The problem now is that deleted edits cannot be accessed, and we have people asking for deleted articles to be restored. In these cases we are completely clueless, because it's not even possible to tell whether a page was deleted in the past or who deleted it.

"Isn't it possible to fix the current problem directly in the database, so that ptwiki users are no longer affected, and then lower this bug's priority (and have it block any future namespace removal request)? Or is fixing it in the database as difficult as rewriting the script to do it?"

Yes. Let me look at that.

Something like:

update page set page_title = concat("Anexo:",page_title), page_namespace = 0 where page_namespace = 102;

maybe?
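One thing a one-shot UPDATE like the one above does not check is whether the target title already exists. MediaWiki's page table has a unique index on (page_namespace, page_title), so if both `102:Foo` and `0:Anexo:Foo` exist, the statement aborts with a duplicate-key error. That is roughly the kind of conflict namespaceDupes.php is meant to look for. Below is a minimal sketch of a pre-flight check. It assumes an in-memory SQLite table that models only the three page columns used in this thread (SQLite spells `concat()` as `||`). The function names and sample rows are hypothetical.

```python
import sqlite3


def find_collisions(conn, old_ns, prefix):
    """List pages in old_ns whose prefixed title already exists in namespace 0."""
    return conn.execute(
        """
        SELECT old.page_id, old.page_title
        FROM page AS old
        JOIN page AS main
          ON main.page_namespace = 0
         AND main.page_title = ? || old.page_title
        WHERE old.page_namespace = ?
        ORDER BY old.page_id
        """,
        (prefix, old_ns),  # positional: prefix (ON clause) first, then old_ns (WHERE)
    ).fetchall()


def demo_collisions():
    conn = sqlite3.connect(":memory:")
    # Simplified page table with the same unique (namespace, title) constraint.
    conn.execute(
        "CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER NOT NULL, "
        "page_title TEXT NOT NULL, UNIQUE (page_namespace, page_title))"
    )
    conn.executemany(
        "INSERT INTO page VALUES (?, ?, ?)",
        [
            (1, 102, "Biografias"),
            (2, 102, "Dizeres_populares"),
            (3, 0, "Anexo:Dizeres_populares"),  # hypothetical pre-existing duplicate
        ],
    )
    collisions = find_collisions(conn, 102, "Anexo:")
    # The blanket update from the thread, translated to SQLite syntax.
    try:
        conn.execute(
            "UPDATE page SET page_title = 'Anexo:' || page_title, page_namespace = 0 "
            "WHERE page_namespace = 102"
        )
        blanket_ok = True
    except sqlite3.IntegrityError:
        blanket_ok = False  # the whole statement is rolled back
    return collisions, blanket_ok


print(demo_collisions())  # ([(2, 'Dizeres_populares')], False)
```

Any rows this check returns would need a decision (rename, merge, or delete) before the bulk move can succeed.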

Maybe a few runs of:

update page set page_title = concat("Anexo:",page_title), page_namespace = 0 where page_namespace = 102 order by page_title LIMIT 2000;

...waiting a bit in between?

Ran:

update page set page_title = concat("Anexo:",page_title), page_namespace = 0 where page_namespace = 102;
update page set page_title = concat("Anexo_Discussão:",page_title), page_namespace = 0 where page_namespace = 103;

Against ptwiki. Entries look OK. We tried the ORDER BY ... LIMIT ... variant, but hit a known replication bug about unsafe statements.
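The LIMIT variant was dropped because of the replication issue. One common workaround, assumed here rather than taken from this task, splits each batch into two steps. First, a read-only SELECT finds the next batch's `page_id` bounds. Then an UPDATE touches only that key range, so the write itself carries no LIMIT clause and selects the same rows wherever it is replayed. Below is a minimal SQLite sketch of that loop. `move_namespace_in_batches`, `batch_size` and `pause` are hypothetical names, and `pause` only stands in for "waiting a bit in between".

```python
import sqlite3
import time


def move_namespace_in_batches(conn, old_ns, prefix, batch_size=2000, pause=0.0):
    """Move pages from old_ns into namespace 0 with a title prefix, one page_id range at a time."""
    moved = 0
    last_id = 0
    while True:
        # Read-only step: find the next batch of ids still in old_ns.
        ids = [row[0] for row in conn.execute(
            "SELECT page_id FROM page WHERE page_namespace = ? AND page_id > ? "
            "ORDER BY page_id LIMIT ?",
            (old_ns, last_id, batch_size),
        )]
        if not ids:
            return moved
        low, high = ids[0], ids[-1]
        # Write step: a plain key range, no LIMIT clause.
        cur = conn.execute(
            "UPDATE page SET page_title = ? || page_title, page_namespace = 0 "
            "WHERE page_namespace = ? AND page_id BETWEEN ? AND ?",
            (prefix, old_ns, low, high),
        )
        conn.commit()
        moved += cur.rowcount
        last_id = high
        time.sleep(pause)  # stand-in for letting replicas catch up between batches


def demo_batches():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER NOT NULL, "
        "page_title TEXT NOT NULL, UNIQUE (page_namespace, page_title))"
    )
    # Hypothetical rows: five pages in namespace 102 and one talk page in 103.
    rows = [(i, 102, f"Lista_{i}") for i in range(1, 6)] + [(6, 103, "Lista_1")]
    conn.executemany("INSERT INTO page VALUES (?, ?, ?)", rows)
    conn.commit()
    moved = move_namespace_in_batches(conn, 102, "Anexo:", batch_size=2)
    titles = [t for (t,) in conn.execute(
        "SELECT page_title FROM page WHERE page_namespace = 0 ORDER BY page_id")]
    untouched = conn.execute("SELECT COUNT(*) FROM page WHERE page_namespace = 103").fetchone()[0]
    return moved, titles, untouched
```

The same loop shape would plausibly fit the 248,661 `logging` rows for namespace 102 mentioned below, keyed on `log_id` instead of `page_id`.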

The redirect in the report and a few others listed at
https://pt.wikipedia.org/w/index.php?oldid=38478871#Particularmente_implicado
are now working (thanks!), but the log still doesn't seem to show deleted pages from that namespace. E.g.:
https://pt.wikipedia.org/wiki/Special:Log/delete?page=Anexo%3ADizeres+populares

Also, pages containing red links to existing pages called "Anexo:..." still require a purge in order to show blue links.

Thank you, guys! Much better now! As He7d3r already said, there are a few things left to do, but I think compliments are in order!

Logging table fixed for Anexo_Discussão.

mysql:wikiadmin@db1024 [ptwiki]> update logging set log_title = concat("Anexo_Discussão:",log_title), log_namespace = 0 where log_namespace = 103;
Query OK, 15201 rows affected (11.76 sec)
Rows matched: 15201  Changed: 15201  Warnings: 0

Can't quite do Anexo yet, need to batch it:

mysql:wikiadmin@db1024 [ptwiki]> select count(*) from logging where log_namespace = 102;
+----------+
| count(*) |
+----------+
|   248661 |
+----------+
1 row in set (0.11 sec)

@demon: Any progress on the "live hack" for this?

Logging table is all cleaned up. Any other weirdness people are spotting?

demon lowered the priority of this task from Unbreak Now! to High (Apr 17 2015, 4:41 PM).

Change 194570 abandoned by Chad:
WIP: namespaceDupes: find and orphaned namespaces

https://gerrit.wikimedia.org/r/194570

Yeah, we are still waiting for the red links to existing pages to be purged. E.g.:
https://pt.wikipedia.org/wiki/Lu%C3%ADs_II
has a red link [[Anexo:Lista de condes e duques do Maine|Conde]] and
https://pt.wikipedia.org/w/index.php?title=Anexo:Lista_de_condes_e_duques_do_Maine&redirect=no
is an existing page.

Running refreshLinks for all pages on ptwiki.

demon removed demon as the assignee of this task (Apr 29 2015, 4:47 PM).

Whoops, didn't mean to unassign.

Also: refreshLinks has since finished and the given redlinks are looking better. Logging and page tables are all fixed. Recentchanges is self-correcting as each new entry is added and old (bad) ones fall off.

@FcoLeonSaudanha: the problem you reported on

is the same that @Jbribeiro1 reported above (see T91401#1148380): a page such as
https://pt.wikipedia.org/wiki/Special:Undelete/Anexo:Lista_de_escritores_fora_da_Academia_Brasileira_de_Letras?uselang=en
shows:

Deletion log
(change visibility) 23:32, 8 January 2009 Yanguas (Talk | contribs | block) deleted page Anexo:Lista de escritores fora da Academia Brasileira de Letras (: 20 a 10: ELIMINADA) (view/restore)
Page history
There is no edit history for this page.

which means we (sysop users) won't be able to undelete these edits unless the underlying problem is fixed in MediaWiki (or the data in the database is fixed manually, if that is even possible).

@He7d3r, I understand. But is there a way to recover the history of such deleted pages?

@FcoLeonSaudanha: I don't know. Only people with access to the database could check if the missing page histories still exist somewhere.

@He7d3r Hmm, this case reminds me of the tragic software change in December 2003 that lost much of the existing page history, so then...

They're there, I just missed updating the archive table before. Looking at that now.

Archive tables fixed:

mysql:wikiadmin@db1024 [ptwiki]> select count(*) from archive where ar_namespace in (102,103);
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

Anything still looking out of place?
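The count above checks only `archive`. The cleanup in this task touched `page`, `logging` and `archive` with the same namespace filter. Below is a small sketch that repeats the leftover count across all three tables. It assumes SQLite tables holding just the namespace columns named in this thread, plus a generic `title` column as a simplification. `leftover_rows`, `NAMESPACE_COLUMNS` and the sample rows are hypothetical.

```python
import sqlite3

# Namespace column per table, as used in the queries in this task.
NAMESPACE_COLUMNS = {
    "page": "page_namespace",
    "logging": "log_namespace",
    "archive": "ar_namespace",
}


def leftover_rows(conn, namespaces=(102, 103)):
    """Count rows per table that still sit in any of the removed namespaces."""
    placeholders = ", ".join("?" for _ in namespaces)
    counts = {}
    for table, column in NAMESPACE_COLUMNS.items():
        # Identifiers come from the fixed dict above; only values are bound as parameters.
        sql = f"SELECT COUNT(*) FROM {table} WHERE {column} IN ({placeholders})"
        counts[table] = conn.execute(sql, tuple(namespaces)).fetchone()[0]
    return counts


def demo_leftovers():
    conn = sqlite3.connect(":memory:")
    for table, column in NAMESPACE_COLUMNS.items():
        # Simplified schema: the namespace column plus a generic title column.
        conn.execute(f"CREATE TABLE {table} ({column} INTEGER NOT NULL, title TEXT NOT NULL)")
    conn.execute("INSERT INTO page VALUES (0, 'Anexo:Biografias')")
    conn.execute("INSERT INTO logging VALUES (0, 'Anexo:Dizeres_populares')")
    # Hypothetical row that an earlier pass forgot to move.
    conn.execute("INSERT INTO archive VALUES (102, 'Lista_de_escritores')")
    return leftover_rows(conn)


print(demo_leftovers())  # {'page': 0, 'logging': 0, 'archive': 1}
```

A nonzero entry points at the table that still needs the prefix-and-move treatment, which is the kind of gap that left Special:Undelete showing no page history until the archive table was fixed.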

@demon I think not; feel free to close this task.

Does this mean that if a wiki deletes a custom namespace in the future, the problems we had on ptwiki will not happen again? (i.e., was namespaceDupes actually fixed?)

"Does this mean that if a wiki deletes a custom namespace in the future, the problems we had on ptwiki will not happen again? (i.e., was namespaceDupes actually fixed?)"

No, namespaceDupes was not fixed, and I never found an easy fix. I think I did it wrong: you're probably supposed to run namespaceDupes before actually deleting the namespace.