Page MenuHomePhabricator

populateRevisionLength.php displays "Content of archive unavailable" error when updating ar_len field with text stored in archive table
Closed, ResolvedPublic

Description

When I run update.php while updating from 1.22 to 1.23, amongst other things the script maintenance/populateRevisionLength.php is executed to populate the ar_len column.

However, this script does not always work correctly so that the following error occurs (code from populateRevisionLength.php):

  1. Go through and update ar_len from these rows.

foreach ( $res as $row ) {

		$rev = Revision::newFromArchiveRow( $row );
		$content = $rev->getContent();
		if ( !$content ) {
			# This should not happen, but sometimes does (bug 20757)
			$this->output( "Content of archive {$row->ar_id} unavailable!\n" );

In my case this is _not_ caused by bug 20757. Instead, this is caused by the fact that for old archived revisions, the rows in the archive table do _not_ contain a pointer to the text in the text table, but that the text is stored in the archive table directly. Obviously the above code cannot handle this situation properly.

The line

$rev = Revision::newFromArchiveRow( $row );

does not work correctly in this case. It is unable to return information from inside an archive row.


Version: 1.23.0
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=24538

Details

Reference
bz65765

Event Timeline

bzimport raised the priority of this task from to High.
bzimport set Reference to bz65765.
bzimport added a subscriber: Unknown Object (MLST).
This comment was removed by Joergi123.

I'm trying to reproduce this and having trouble. It doesn't appear to happen from a straight 1.22.0 -> 1.23.0-rc.2 update.

It would be good to have a test case that reproduces your problem:

this is caused by the fact that for old archived revisions, the rows in the
archive table do _not_ contain a pointer to the text in the text table, but
that the text is stored in the archive table directly. Obviously the above
code cannot handle this situation properly.

I am upgrading from 1.22.6 to 1.23.0-rc.2.

The database looks pretty normal to me: It has rows in the archive table from the time before the text/revision tables and I get the error message for each of these rows.

I checked the call Revision::newFromArchiveRow( $row ):

I checked var_dump($row) when the error happens; in my example for the first row in the archive table.

I noticed that at this point $row->ar_text is _NOT_ set. I think this is the root of the problem. After that, processing in Revision.php cannot recognize the old row correctly:

if ( isset( $row->ar_text ) && !$row->ar_text_id ) {

		//We don't get here as $row->ar_text incorrectly is NULL.
		echo "pre-1.5: " . $row->ar_text;die();
		// Pre-1.5 ar_text row; get revision text.

The root of the problem is in Revision::selectArchiveFields():

Before all the things I wrote above happen, populateRevisionLength.php gets information about the archive rows with

$ar = $this->doLenUpdates( 'archive', 'ar_id', 'ar', Revision::selectArchiveFields() );

In this call Revision::selectArchiveFields() is used to return the relevant fields from the archive table. Revision::selectArchiveFields() in fact also contains a list of fields to return, but this list does NOT include 'ar_text'. Without this field being returned, the condition

if ( isset( $row->ar_text ) && !$row->ar_text_id ) {
// Pre-1.5 ar_text row; get revision text.

in Revision::newFromArchiveRow() will always return FALSE, so that for old revisions the actual text is never retrieved.

Add 'ar_text' to the list of fields in Revision::selectArchiveFields() and the problem is solved.

Thanks for the debugging help. I recommend that you get commit access, but I think I can solve this with what you've given me.

Since I can't test this, could you try the following one-line change to includes/Revision.php and tell me if the problem disappears?

diff --git a/includes/Revision.php b/includes/Revision.php
index 1ad0b4a..b88bb2e 100644

  • a/includes/Revision.php

+++ b/includes/Revision.php
@@ -444,6 +444,7 @@ class Revision implements IDBAccessObject {

			'ar_id',
			'ar_page_id',
			'ar_rev_id',

+ 'ar_text',

			'ar_text_id',
			'ar_timestamp',
			'ar_comment',

Change 135567 had a related patch set uploaded by MarkAHershberger:
Add ar_text to the list from Revision::selectArchiveFields(). It is checked later.

https://gerrit.wikimedia.org/r/135567

Please confirm that this patch fixes the problem you found so we can include this in the 1.23 release.

Hi Mark,

I hereby confirm that the patch fixes the problem.

Would be great, if you could still merge this into REL1_23 before release, so that 1.23.0 does not suffer from this bug.

Change 135578 had a related patch set uploaded by MarkAHershberger:
Add ar_text to the list from Revision::selectArchiveFields(). It is checked later.

https://gerrit.wikimedia.org/r/135578

Change 135567 merged by jenkins-bot:
Add ar_text to the list from Revision::selectArchiveFields(). It is checked later.

https://gerrit.wikimedia.org/r/135567

Change 135578 merged by jenkins-bot:
Add ar_text to the list from Revision::selectArchiveFields(). It is checked later.

https://gerrit.wikimedia.org/r/135578

committed on master and REL1_23

Joergi123 updated the task description. (Show Details)Apr 7 2017, 3:12 PM
Joergi123 added a subscriber: Joergi123.