
Deleting pages on the English Wikipedia is very slow
Closed, ResolvedPublic

Description

On the English Wikipedia, deleting pages has been taking an unusually long time.

Steps to reproduce:

  1. Find a page to delete
  2. Click the delete tab
  3. Click the Delete button
  4. Watch as the website hangs for 30 seconds (normally this takes ~3 seconds)

Could someone please look into this?

Event Timeline

AIUI the method we requested in the task above was to be applied when a page had a high number of revisions, not for all page deletions. If batched deletion is happening for all pages, it could certainly slow maintenance significantly. CC @BPirkle and @tstarling.

It is unexpected that the website would appear to hang, whether or not the deletion is batched.

The batch deletion threshold is controlled by $wgDeleteRevisionsBatchSize, which defaults to 1000.
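Since the threshold is an ordinary configuration variable, a wiki affected by this can tune when batching kicks in. A minimal LocalSettings.php sketch; the value 5000 is purely illustrative, not a recommendation:

```php
<?php
// LocalSettings.php (fragment)
// Pages with more archived revisions than this are deleted in batches
// via the job queue; the MediaWiki default is 1000.
$wgDeleteRevisionsBatchSize = 5000;
```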

The change also included an "immediate mode" that prevents using batched deletion in situations where it would be problematic. For example, deleteBatch.php uses this mode so that deletions from the maintenance script actually occur during execution of the script, rather than having to wait on the job queue.

Speaking directly to the issue raised in the description, I do not have an immediate explanation why normal deletions triggered from the user interface would be slow. I will investigate.

jijiki triaged this task as Medium priority. Oct 23 2018, 3:43 PM
jijiki added a subscriber: Krinkle.

I logged a deletion on en.wikipedia.org using X-Wikimedia-Debug; you can see it in mwlog1001.eqiad.wmnet:/srv/mw-log/XWikimediaDebug.log . The row count query was indeed very slow. The query was:

SELECT  COUNT(*) AS `rowcount`  FROM (SELECT  1  FROM `archive`    WHERE ar_page_id = '47773335' ...

Thank you. I came to the same conclusion a bit ago, and am now pondering what to do about it.

The offending code is:

			$archivedRevisionCount = $dbw->selectRowCount(
				'archive', '1', [ 'ar_page_id' => $id ], __METHOD__
			);

And the associated comment is:

			// Get archivedRevisionCount by db query, because there's no better alternative.
			// Jobs cannot pass a count of archived revisions to the next job, because additional
			// deletion operations can be started while the first is running.  Jobs from each
			// operation gracefully interleave, but would not know about each other's count.  Deduplication
			// in the job queue to avoid simultaneous deletion operations would add overhead.
			// Number of archived revisions cannot be known beforehand, because edits can be made
			// while deletion operations are being processed, changing the number of archivals.

tl;dr: we can't just count revisions as we archive them anymore, so I tried to count them after the fact. But that turned out to be slow. :(

$archivedRevisionCount is used (only) as a parameter to the ArticleDeleteComplete hook. A quick look suggests that value may not be frequently used, but I have not looked exhaustively, and I suspect it is there for a reason.
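For context, a hook consumer would receive the count roughly like this hypothetical handler (the handler body and log channel name are made up for illustration; only the hook name and parameter order come from MediaWiki):

```php
<?php
// Hypothetical LocalSettings.php / extension snippet: the seventh hook
// parameter is the $archivedRevisionCount discussed above.
$wgHooks['ArticleDeleteComplete'][] = function (
	$wikiPage, $user, $reason, $id, $content, $logEntry, $archivedRevisionCount
) {
	// Log how many revisions were archived for the deleted page.
	wfDebugLog( 'pagedeletion',
		"Page $id deleted; $archivedRevisionCount revisions archived" );
	return true;
};
```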

There's no index on ar_page_id; the query needs to select by namespace and title instead.
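A sketch of the indexed alternative, assuming the standard archive table schema where the name_title index covers (ar_namespace, ar_title); the actual fix is in the Gerrit change below, and the literal values here are placeholders:

```sql
-- Count archived revisions using indexed columns instead of the
-- unindexed ar_page_id.
SELECT COUNT(*) AS rowcount
FROM archive
WHERE ar_namespace = 0
  AND ar_title = 'Example_page';
```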

Change 469354 had a related patch set uploaded (by Tim Starling; owner: Tim Starling):
[mediawiki/core@master] Fix slow query counting by unindexed field ar_page_id

https://gerrit.wikimedia.org/r/469354

Change 469354 merged by Tim Starling:
[mediawiki/core@master] Fix slow query counting by unindexed field ar_page_id

https://gerrit.wikimedia.org/r/469354

Change 469358 had a related patch set uploaded (by Tim Starling; owner: Tim Starling):
[mediawiki/core@wmf/1.32.0-wmf.26] [1.32.0-wmf.26] Fix slow query counting by unindexed field ar_page_id

https://gerrit.wikimedia.org/r/469358

Change 469359 had a related patch set uploaded (by Tim Starling; owner: Tim Starling):
[mediawiki/core@wmf/1.33.0-wmf.1] [1.33.0-wmf.1] Fix slow query counting by unindexed field ar_page_id

https://gerrit.wikimedia.org/r/469359

Change 469360 had a related patch set uploaded (by Paladox; owner: Tim Starling):
[mediawiki/core@REL1_32] Fix slow query counting by unindexed field ar_page_id

https://gerrit.wikimedia.org/r/469360

Change 469358 merged by Tim Starling:
[mediawiki/core@wmf/1.32.0-wmf.26] [1.32.0-wmf.26] Fix slow query counting by unindexed field ar_page_id

https://gerrit.wikimedia.org/r/469358

Mentioned in SAL (#wikimedia-operations) [2018-10-24T01:46:55Z] <tstarling@deploy1001> Synchronized php-1.32.0-wmf.26/includes/page/WikiPage.php: fix deletion performance regression T207530 (duration: 00m 55s)

Change 469359 merged by Tim Starling:
[mediawiki/core@wmf/1.33.0-wmf.1] [1.33.0-wmf.1] Fix slow query counting by unindexed field ar_page_id

https://gerrit.wikimedia.org/r/469359

Mentioned in SAL (#wikimedia-operations) [2018-10-24T01:56:54Z] <tstarling@deploy1001> Synchronized php-1.33.0-wmf.1/includes/page/WikiPage.php: T207530 (duration: 00m 53s)

Change 469360 merged by jenkins-bot:
[mediawiki/core@REL1_32] Fix slow query counting by unindexed field ar_page_id

https://gerrit.wikimedia.org/r/469360

tstarling claimed this task.

This should be fixed now.

I'm not sure if it is related or not, but I'm still experiencing problems deleting things.

For example, https://www.mediawiki.org/w/index.php?title=Thread:Project:Support_desk/no_respond_to_any_clicks/actions.&action=history is a page with 1 revision, but I've been unable to delete it.

To avoid creating high replication lag, this transaction was aborted because the write duration (5.2101356983185) exceeded the 3 second limit. If you are changing many items at once, try doing multiple smaller operations instead.
[W9TSdgpAIEIAABVmXt4AAAAU] exception of type "Wikimedia\Rdbms\DBTransactionSizeError"

I've tried three times and it keeps failing (once at 3.7 seconds, once at 4.2 seconds).
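If the "3 second limit" in the error above is what I think it is, it comes from a configurable guard rather than a hard-coded value. A hedged sketch, assuming the limit is the $wgMaxUserDBWriteDuration setting; raising it would only mask the slow write, not fix it:

```php
<?php
// LocalSettings.php sketch (assumption: this setting is the source of
// the "3 second limit" in the DBTransactionSizeError above).
// Raising it hides the symptom; the underlying slow write still needs fixing.
$wgMaxUserDBWriteDuration = 5;
```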

Krinkle edited projects, added Performance-Team; removed Performance Issue.

Re-opening for now, but if it turns out to be a separate issue we should re-close and file anew.

The COUNT(*) is scary; maybe the API should change to not return an exact count for very high numbers, or apply some other mitigation I can't think of right now.

@Krinkle , have you seen evidence of an ongoing problem, or can we resolve this task?