Run maintenance/cleanupUsersWithNoId.php on all wikis
Open, Normal, Public

Description

This should only be done after 1.31.0-wmf.11 is deployed.

To fully resolve T9240 (Usernames in history of imported pages should refer to original wiki) and to prepare for T167246 (Refactor "user" & "user_text" fields into "actor" reference table), this maintenance script needs to be run to clean up existing imported rows and CentralAuth global blocks.

The script needs to be run twice for each wiki.

  1. With --table ipblocks --prefix meta to adjust CentralAuth global blocks. If for some reason $wgCentralAuthGlobalBlockInterwikiPrefix is changed, adjust the prefix accordingly.
  2. With --assign --prefix imported --force to clean up old imports. Or, if someone wants to give me a list of wiki language codes and short prefixes that mean more or less "imported", I could probably use that.
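Roughly, both passes apply the same per-row "prefix-or-assign" decision. The PHP below is a minimal sketch of that decision under stated assumptions, not the script's actual code: cleanupRowSketch() and isUsableNameSketch() are hypothetical names, the name check is a crude stand-in for User::isUsableName(), and the local-user lookup is passed in as a callable (with CentralAuth, the real lookup can presumably end up auto-creating a local account, which would fit the SUL account creations reported later on this task).

```php
<?php
// Hypothetical, simplified stand-in for User::isUsableName(); the real check
// is far more thorough. It only models the cases discussed on this task.
function isUsableNameSketch(string $name): bool
{
    // Empty names and IP addresses (genuine anonymous edits) are not usable.
    if ($name === '' || filter_var($name, FILTER_VALIDATE_IP) !== false) {
        return false;
    }
    // Underscores (T2323) and characters not allowed in titles, including
    // '>', so an already-prefixed name such as "imported>Foo" is skipped.
    if (strpbrk($name, '_<>[]{}|#') !== false) {
        return false;
    }
    // Leading, trailing or doubled spaces (T2323).
    if ($name !== trim($name) || strpos($name, '  ') !== false) {
        return false;
    }
    // Initial lower-case letter (T2323).
    return $name === ucfirst($name);
}

/**
 * Hypothetical sketch of the prefix-or-assign decision for one row.
 * $lookupLocalId returns a local user ID, or 0 if there is no such user.
 * Returns the new [user ID, user name] pair for the row.
 */
function cleanupRowSketch(
    int $userId,
    string $userText,
    string $prefix,
    bool $assign,
    callable $lookupLocalId
): array {
    // Rows that already have an ID, or whose name is not usable, are left alone.
    if ($userId !== 0 || !isUsableNameSketch($userText)) {
        return [$userId, $userText];
    }
    // With --assign, attribute the row to an existing local user if there is one.
    if ($assign) {
        $localId = $lookupLocalId($userText);
        if ($localId > 0) {
            return [$localId, $userText];
        }
    }
    // Otherwise mark the name as external: "imported>Name", "meta>Name", ...
    return [0, $prefix . '>' . $userText];
}

// Made-up data: only "Example" exists locally, with ID 42.
$lookup = fn(string $name): int => $name === 'Example' ? 42 : 0;
echo json_encode(cleanupRowSketch(0, 'Example', 'imported', true, $lookup)), "\n"; // [42,"Example"]
echo json_encode(cleanupRowSketch(0, 'Someone', 'imported', true, $lookup)), "\n"; // [0,"imported>Someone"]
echo json_encode(cleanupRowSketch(0, 'Example', 'meta', false, $lookup)), "\n";    // [0,"meta>Example"]
```

In this model, the ipblocks pass corresponds to $prefix = 'meta' with $assign = false, and the imports pass to $prefix = 'imported' with $assign = true. Because '>' makes a name unusable, re-running over rows that were already prefixed leaves them alone.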


Anomie created this task. Nov 30 2017, 4:29 PM
Restricted Application added a subscriber: Aklapper. Nov 30 2017, 4:29 PM
Anomie updated the task description. Nov 30 2017, 4:57 PM

Mentioned in SAL (#wikimedia-releng) [2017-11-30T16:58:36Z] <anomie> Running cleanupUsersWithNoId.php on Beta Cluster, see T181731

Mentioned in SAL (#wikimedia-cloud) [2017-11-30T16:58:36Z] <anomie> Running cleanupUsersWithNoId.php on Beta Cluster, see T181731

Anomie updated the task description. Nov 30 2017, 5:00 PM

Mentioned in SAL (#wikimedia-releng) [2017-11-30T17:49:42Z] <anomie> Finished running cleanupUsersWithNoId.php on Beta Cluster for T181731

Mentioned in SAL (#wikimedia-cloud) [2017-11-30T17:49:42Z] <anomie> Finished running cleanupUsersWithNoId.php on Beta Cluster for T181731

Nirmos added a subscriber: Nirmos. Nov 30 2017, 6:15 PM

Mentioned in SAL (#wikimedia-cloud) [2017-11-30T18:59:57Z] <bd808> Testing stashbot fix for double phab logging (T181731)

Mentioned in SAL (#wikimedia-operations) [2017-12-06T16:03:11Z] <anomie@terbium> Running cleanupUsersWithNoId.php for testwiki, see T181731

Mentioned in SAL (#wikimedia-operations) [2017-12-06T16:14:38Z] <anomie@terbium> Running cleanupUsersWithNoId.php for test2wiki, see T181731

Mentioned in SAL (#wikimedia-operations) [2017-12-06T16:27:00Z] <anomie@terbium> Running cleanupUsersWithNoId.php for testwikidatawiki, see T181731

Mentioned in SAL (#wikimedia-operations) [2017-12-06T16:29:37Z] <anomie@terbium> Running cleanupUsersWithNoId.php for mediawikiwiki, see T181731

Graham87 added a subscriber: Graham87. Edited Dec 8 2017, 12:48 AM

Another complication relating to this script would be T2323, involving usernames stored with underlines, extra spaces and initial lower-case letters. Quite a few edits affected by this bug also have a rev_user of 0; they can probably be found in all the tables on this page except the "/Positive rev_user" one: https://en.wikipedia.org/wiki/User:Nemo_bis/Bug_323_revisions

Anomie added a comment. Dec 8 2017, 5:53 PM

> Another complication relating to this script would be T2323, involving usernames stored with underlines, extra spaces and initial lower-case letters.

Such entries won't be touched by this script, since they cause User::isUsableName() to return false, and will eventually be copied as-is into the actor table.

If someone decides to resolve that bug in the future, they would need to implement similar prefix-or-assign logic as is being used here.

Mentioned in SAL (#wikimedia-operations) [2017-12-12T19:35:26Z] <anomie> Running cleanupUsersWithNoId.php on all wikis (this will take a while), see T181731

It appears that this script is causing SUL accounts to be created at wikis where pages have been imported – not unreasonable, but the comments at https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Local_accounts_attached_without_a_visit_(and_welcomed_without_an_edit) indicate that it's surprising a few people.

This broke wikidata and dewiki during the night, causing MediaWiki exceptions and (surprisingly, only) timeouts on page views/edits, and freezing the recentchanges and watchlist functionality. Apparently, the archive table differs between servers due to an old MediaWiki bug that inserted into archive with INSERT...SELECT, and this script touches many old archive rows, breaking replication on half of the servers.

@Anomie were you by any chance running unattended long-running scripts without screen? Maybe I was wrong, but it confused me a lot when I was trying to kill your processes.

Sigh, I guess this task is another one that's blocked on s5 DB weirdness being fixed. Let me know when that happens please.

I don't know much of anything about screen. Are there instructions somewhere for how to run maintenance scripts in it correctly? I don't see any mention of it on https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment in the sections about running maintenance scripts.

Graham87 added a comment. Edited Dec 14 2017, 7:28 AM

Would we be able to undo the results of this script, or reconfigure it, for the Nostalgia Wikipedia? One of the good things about that site (a copy of the Wikipedia database from 20 December 2001) is that it made it fairly easy to compare edits from 2001 between enwiki and the Nostalgia Wikipedia database. Now that's impossible:
https://en.wikipedia.org/w/index.php?title=Astronomy_and_Astrophysics/History&action=history

I'm 99% sure we won't have these problems with pre-MediaWiki edits on enwiki, because I created all the old account names on that site; the exception might be editors who got renamed to a ~enwiki prefix while their edits got left behind (see my earlier link to the village pump thread). I'm not so sure about other old Wikipedias, though ...
*Edit:* it's not really a problem in this case, e.g.
https://en.wikipedia.org/wiki/Special:CentralAuth/Jmccann

> Sigh, I guess this task is another one that's blocked on s5 DB weirdness being fixed

@Anomie- actually it is blocked on core taking ownership and followup of the problems generated by bad archive queries. Those queries broke other wikis too; it just complains loudly on s5. If they hadn't broken dewiki, the s5 weirdness would not have affected those wikis.

@Anomie As you can see in T163190, for example, s7 has the same problems with archive (and also with tag-related tables, but those are not part of core and are a lesser issue). The archive data loss and non-deterministic query problems were brought up as early as 2015 in T112637 (and even before that, informally).

jcrespo added a comment. Edited Dec 14 2017, 10:21 AM

There is a good introduction to screen at https://wikitech.wikimedia.org/wiki/Screen. I do not think there should be any guidelines about it for deployments (as you said in the past, let's not add red tape unnecessarily), but it is a vital tool for managing tasks on a server. With screen I can deploy code from a train or a plane and not worry about connection interruptions. I thought about what you said about "being limited by buffer", and I wonder whether you knew that, in a screen session, you can press:

Ctrl+a, Esc (technically '[', but Esc is easier on my keyboard) to enter copy mode, then scroll up and down. I think by default it keeps around 1000 lines of buffer; you can increase that to match your terminal, like I did: https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/admin/files/home/jynus/.screenrc

Nobody expects you to know all this, but it certainly makes collaborating with others much easier: if I see a SCREEN process owned by anomie, I know some long-running process is ongoing there. You can connect to it from any client, share it with other people, etc. Like the vim editor, it takes some getting used to, but later you cannot live without it.

Trizek-WMF added a subscriber: Trizek-WMF.

The script has caused SUL accounts to be created, and people have noticed, as Sherry said. I have also seen some reports on fr.wp. Explaining it in Tech News would be a good thing IMO.

Johan added a subscriber: Johan. Dec 14 2017, 11:32 AM

@Trizek-WMF Something like this for Tech News?

When you import a page from another wiki the usernames of the users who edited the article on the wiki you imported it from are shown in the article history. This should link to the users on the original wiki. A script to fix this caused problems for Wikidata and German Wikipedia. It also created a large number of [[<tvar|sul>m:Special:MyLanguage/Help:Unified login|SUL accounts]] on wikis where editors had never edited.

> Sigh, I guess this task is another one that's blocked on s5 DB weirdness being fixed

> @Anomie- actually it is blocked on core taking ownership and followup of the problems generated by bad archive queries.

Is there a task somewhere that says specifically what needs to be done? Not a huge generic RFC like T112637, a task with a checklist of actual work needed.

I note that if problems of this sort are already in the database, that's probably outside the scope of "core".

> @Trizek-WMF Something like this for Tech News?

> When you import a page from another wiki the usernames of the users who edited the article on the wiki you imported it from are shown in the article history. This should link to the users on the original wiki. A script to fix this caused problems for Wikidata and German Wikipedia. It also created a large number of [[<tvar|sul>m:Special:MyLanguage/Help:Unified login|SUL accounts]] on wikis where editors had never edited.

@Anomie, can you review that sentence?

That text seems appropriate to me.

jhsoby added a subscriber: jhsoby. Dec 15 2017, 12:31 AM

> @Trizek-WMF Something like this for Tech News?

> When you import a page from another wiki the usernames of the users who edited the article on the wiki you imported it from are shown in the article history. This should link to the users on the original wiki. A script to fix this caused problems for Wikidata and German Wikipedia. It also created a large number of [[<tvar|sul>m:Special:MyLanguage/Help:Unified login|SUL accounts]] on wikis where editors had never edited.

Actually, it's not just imports. My work account got registered in plenty of wikis, even though it hasn't edited anything that would be imported to many wikis. EBernhardson on IRC figured out that it was because of Wikidata changes being reflected in projects' Recent changes and Watchlist feeds. In my case, this edit to an item for a template used in many different projects probably triggered most of those account creations.

Johan added a comment. Dec 15 2017, 5:39 PM

@jhsoby Noted, but the text doesn't say it's because of the imports, but because of a script that tried to fix an issue with the imports.

> EBernhardson on IRC figured out that it was because of Wikidata changes being reflected in projects' Recent changes and Watchlist feeds.

Hmm. That probably means I'll need to figure out what code in Wikidata is doing this, and then re-run the script over the recentchanges tables. Thanks for pointing that out.

Nemo_bis added a subscriber: Nemo_bis. Edited Fri, Jan 5, 8:54 AM

> EBernhardson on IRC figured out that it was because of Wikidata changes being reflected in projects' Recent changes and Watchlist feeds.

> Hmm. That probably means I'll need to figure out what code in Wikidata is doing this, and then re-run the script over the recentchanges tables. Thanks for pointing that out.

I see that the script works on the recentchanges table too (which I didn't expect, and which maybe should be optional: it's not very useful when you throw away the RC in a few weeks, IMHO). In that case, the query needs an rc_type = 0 condition, or at any rate rc_type < 5.

Current:

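		$this->cleanup(
			// table, primary key, user ID field, user name field
			'recentchanges', 'rc_id', 'rc_user', 'rc_user_text',
			// no extra WHERE conditions (so every rc_type is included); ordered by rc_id
			[], [ 'rc_id' ]
		);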
Anomie added a comment. Fri, Jan 5, 2:23 PM

> I see that the script works on the recentchanges table too (which I didn't expect, and maybe should be optional: it's not very useful when you throw away the RC in few weeks, IMHO).

It's required for the actor table migration that all tables involved, including recentchanges, are properly cleaned up.

> Then the query needs a rc_type = 0 condition, or at any rate rc_type < 5.

All rows have to be cleaned up for the actor table migration, regardless of rc_type.

Anomie added a comment. Tue, Jan 9, 2:51 PM

@jcrespo: Is this still blocked for dewiki (s5) and wikidatawiki (now s8)? Or did the issues blocking it get fixed with the resolution of T161294?

You can run it, but please add it to "week of" on the deployment page. Check with @Marostegui as it may or may not interfere with the comment refactoring schema change.

I am currently running the comment refactoring schema change on s5. Once done, I will go for s8.