Delete all IP talk pages created by TuanminhBot in viwiki
Closed, Resolved (Public)

Description

There are around 10M IP talk pages automatically created by TuanminhBot in viwiki. They add around 50M lint errors, and many tables in that wiki have grown overly large. The community wants to get rid of those talk pages (if they were created and only edited by the bot), but deleting 10M pages is complicated. I can do the same thing I did on ruwikinews (T403397).

Community discussion: https://vi.wikipedia.org/wiki/Wikipedia:Tin_nh%E1%BA%AFn_cho_b%E1%BA%A3o_qu%E1%BA%A3n_vi%C3%AAn#c-Ladsgroup-20251110212900-Fixing_around_20M_lint_errors

Event Timeline

Ladsgroup triaged this task as Medium priority. Nov 11 2025, 1:19 PM
Ladsgroup moved this task from Triage to In progress on the DBA board.
Ladsgroup updated the task description.

@Ladsgroup Thanks a lot for your help on this. Might I ask that the bot be sped up? The consensus is that we want the task to be done as soon as possible.

At the current speed of about 43 or 44 pages per minute, it will take 164 days to delete all 10.1 million pages. For reference, @HideonRosie's and @DreamRimmer's bots ran at several hundred pages per minute.
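For anyone who wants to redo this estimate at other speeds, here is a minimal sketch of the arithmetic. The function name `eta_days` is made up for illustration, and it assumes the bot runs continuously at a constant rate, which the later comments show is not guaranteed.

```python
def eta_days(pages_remaining: int, pages_per_minute: float) -> float:
    """Days needed to delete the remaining pages at a constant rate."""
    if pages_per_minute <= 0:
        raise ValueError("pages_per_minute must be positive")
    minutes = pages_remaining / pages_per_minute
    return minutes / (60 * 24)  # 1440 minutes per day


if __name__ == "__main__":
    # Figures quoted in the thread: 10.1 million pages, 43 to 44 pages per minute.
    for rate in (43, 44, 300):
        print(f"{rate} pages/min -> {eta_days(10_100_000, rate):.1f} days")
```

At 43 to 44 pages per minute this gives roughly 160 days, in line with the figure above. A rate of 300 pages per minute (an illustrative value for "several hundred") would bring it down to a few weeks.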

Okay. I added a second runner. I think doing more than that could put too much pressure on the server. If things go well, I'll add a third runner.

@Ladsgroup Now it's running at just 18 pages per minute. Did something happen?

Pywikibot automatically reduced the speed in response to the load on the database. I'm going to be a bit pushier with it now.
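To illustrate the kind of behaviour described here, the following is a rough sketch of a load-aware throttle. It is not Pywikibot's actual implementation. The function `next_delay`, its parameters, and the proportional back-off rule are all assumptions chosen for illustration.

```python
def next_delay(base_delay: float, lag_seconds: float, max_lag: float = 5.0) -> float:
    """Seconds to wait before the next write (hypothetical back-off rule).

    base_delay:  normal pause between writes when the database is healthy
    lag_seconds: database replication lag reported by the server
    max_lag:     lag threshold above which the client slows down
    """
    if lag_seconds <= max_lag:
        return base_delay
    # Slow down in proportion to how far the lag exceeds the threshold.
    return base_delay * (lag_seconds / max_lag)


if __name__ == "__main__":
    for lag in (1.0, 5.0, 10.0, 25.0):
        print(f"lag={lag:>4}s -> wait {next_delay(1.0, lag):.1f}s")
```

Under a rule like this, being "pushier" would correspond to raising `max_lag` or lowering `base_delay`, at the cost of more pressure on the database.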

I thought the consensus was limited to deleting only the pages returned by the query that HoR shared, which is why I stopped my bot. Thanks to Ladsgroup for taking care of the rest of the cleanup, and please feel free to ping me anytime if you need my help.

To be clear: My bot deletes user talk pages created by TuanminhBot if all of these conditions are met:

  • The page has fewer than ten edits.
  • All edits were made by TuanminhBot.
  • The username is a valid IPv4 or IPv6 address.
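The three conditions above can be sketched as a single check. This is not the bot's real code. The names `is_ip_username` and `is_deletable` are hypothetical, and the page history is assumed to be available as a plain list of the usernames that made each edit.

```python
import ipaddress
from typing import Sequence

BOT_NAME = "TuanminhBot"
MAX_EDITS = 10  # the page must have fewer than this many edits


def is_ip_username(username: str) -> bool:
    """True if the username parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(username)
        return True
    except ValueError:
        return False


def is_deletable(username: str, revision_authors: Sequence[str]) -> bool:
    """Apply the three conditions to one user talk page.

    revision_authors holds the author of every edit to the page.
    An empty history is rejected (an assumption; a real page has at least one edit).
    """
    return (
        0 < len(revision_authors) < MAX_EDITS
        and all(author == BOT_NAME for author in revision_authors)
        and is_ip_username(username)
    )


if __name__ == "__main__":
    print(is_deletable("192.0.2.1", ["TuanminhBot"]))              # bot-only IP talk page
    print(is_deletable("192.0.2.1", ["TuanminhBot", "SomeUser"]))  # a human also edited it
    print(is_deletable("SomeUser", ["TuanminhBot"]))               # not an IP username
```

Checking the whole edit history, and not just the page creator, is what keeps pages with any human edits out of the deletion list.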

To let it delete faster, I added some randomness to how it picks what to check and what to delete, so the deletions happen in no particular order.

Any ETA for this? Thanks!

Around 5M have been deleted already and 5M are left to go, so I'd say one to two months.

This should be done today or tomorrow.

This is now mostly done. Only around 300K pages are left, but most of them shouldn't be deleted. I'm doing one last round before closing this ticket. The deletions will be slower than normal, though.

This is done now. There are around 300K pages left, but most of them seem to be needed. I'll leave it to the community to decide what to do with them.