
Type error on parent id revisions
Closed, Resolved | Public | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):
Manual:Grabbers indicates the scripts are tested up to MediaWiki 1.38.

  • Install MediaWiki 1.37.6 to match the remote wiki
  • Follow the manual by truncating all relevant tables to allow a clean import from the remote wiki
  • Run grabLogs.php --db="$dbName" --dbpass="$dbPass" --dbuser="$dbUser" --url="$URL"
  • Run grabText.php --db="$dbName" --dbpass="$dbPass" --dbuser="$dbUser" --url="$URL" --namespaces="0|1|2|3|4|5|6|7|8|9|10|11|14|15"

What happens?:
Impacted Namespaces: 0 [Main], 8 [MediaWiki], and 10 [Template].
The script that retrieves revisions exits with the following error:

Title: Hydra.css in namespace 8
Setting page_restrictions on page_id 32545.
Inserting revision 118273
PHP Notice:  Undefined index: parentid in /var/www/html/w/grabbers/includes/TextGrabber.php on line 161
[ee39c25a28cd0cbf5acbdb7c] [no req]   TypeError: Argument 1 passed to MediaWiki\Revision\MutableRevisionRecord::setParentId() must be of the type int, null given, called in /var/www/html/w/grabbers/includes/TextGrabber.php on line 161
Backtrace:
from /var/www/html/w/includes/Revision/MutableRevisionRecord.php(122)
#0 /var/www/html/w/grabbers/includes/TextGrabber.php(161): MediaWiki\Revision\MutableRevisionRecord->setParentId()
#1 /var/www/html/w/grabbers/grabText.php(285): TextGrabber->processRevision()
#2 /var/www/html/w/grabbers/grabText.php(131): GrabText->processPage()
#3 /var/www/html/w/grabbers/grabText.php(94): GrabText->processPagesFromNamespace()
#4 /var/www/html/w/maintenance/doMaintenance.php(108): GrabText->execute()
#5 /var/www/html/w/grabbers/grabText.php(349): require_once(string)
#6 {main}

What should have happened instead?:

  • No errors
  • All revisions should be copied from the remote wiki into the local wiki

Software version (skip for WMF-hosted wikis like Wikipedia):
MediaWiki 1.37.6
grabber-tools master branch

Other information (browser name/version, screenshots, etc.):

The current workaround is to modify line 161 of Grabbers/includes/TextGrabber.php as shown here, then rerun grabText.php:

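function processRevision( $revision, $page_id, $title ) {
	// Excerpt only: the code that builds $rev is unchanged.
	// Workaround: set the parent ID and insert the revision only when
	// parentid is present and non-zero; otherwise skip the revision.
	if ( $revision['parentid'] ) {
		$rev->setParentId( $revision['parentid'] );
		$this->revisionStore->insertRevisionOn( $rev, $this->dbw );
	}
}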

This avoids the error, and grabText.php continues inserting revisions from the remote wiki.

  • A side effect is that pages in otherwise unaffected namespaces may throw an error about revision id 0 and refuse deletion, because their initial revision is missing. Running grabText.php again on the impacted namespaces without the above modification resolves this, although the revision id 0 error may still occur in those namespaces.

Event Timeline

That's strange. The scripts haven't changed much since they were tested on 1.37, and in fact I successfully imported a wiki to 1.39 very recently.

I've done a quick test against a 1.38 remote wiki, and the API always returns a parentid, which is 0 for the first (oldest) revision of the page.

Example on mediawiki.org: https://www.mediawiki.org/wiki/Special:ApiSandbox#action=query&format=json&prop=revisions&pageids=120818&formatversion=2&rvdir=newer

It would be nice to see the API response for the particular page that's giving this error. It would be even more helpful if you could provide the URL to request that problematic page, or at least tell us which MediaWiki version the remote wiki is running. It seems to be a problem with that specific page, since you said pages from other namespaces were imported without error.
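
To illustrate why a parentid of 0 matters for the workaround in the description, here is a small standalone sketch. The revision arrays are made up and only mimic the shape described above (they are not real API output), and the variable names are hypothetical. It compares a truthiness check with an isset() check:

```php
<?php
// Made-up revisions shaped like the API output described above.
$revisions = [
	[ 'revid' => 10, 'parentid' => 0 ],  // first revision of a page
	[ 'revid' => 11, 'parentid' => 10 ], // later revision
	[ 'revid' => 12 ],                   // parentid missing, as in the reported error
];

$skippedByTruthiness = [];
$skippedByIsset = [];
foreach ( $revisions as $revision ) {
	// Truthiness check: both 0 and a missing key count as "empty".
	if ( empty( $revision['parentid'] ) ) {
		$skippedByTruthiness[] = $revision['revid'];
	}
	// isset() check: only a missing (or null) parentid fails.
	if ( !isset( $revision['parentid'] ) ) {
		$skippedByIsset[] = $revision['revid'];
	}
}

echo 'Truthiness check skips: ' . implode( ', ', $skippedByTruthiness ) . "\n";
echo 'isset() check skips: ' . implode( ', ', $skippedByIsset ) . "\n";
```

If the remote wiki behaves like the 1.38 wiki tested above, a truthiness check would also skip the first revision of every page, which may be related to the missing initial revisions mentioned in the description.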

The MediaWiki version of the remote target wiki is 1.37.6.

The failing page in namespace 8 [MediaWiki] was Hydra.css. I don't know whether any other pages were skipped due to my workaround.
I don't recall which page threw the error in namespace 0 [Main].

Another workaround that probably works better is the following, instead of skipping all revisions with an invalid or null parent id, which creates error pages:

function processRevision( $revision, $page_id, $title ) {
	// bad_index is a placeholder as posted, not a defined constant
	if ( $page_id !== bad_index ) {
		$rev->setPageId( $page_id );
		$rev->setParentId( $revision['parentid'] );
		$this->revisionStore->insertRevisionOn( $rev, $this->dbw );
	}
}

In any case, the fix would be to only skip the setParentId, not the revision insertion (if that doesn't cause any other issue when inserting the revision).
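
A minimal sketch of that idea, not the actual patch: FakeRevisionRecord is a hypothetical stand-in for MutableRevisionRecord (only its strict int parameter is modelled), applyParentId() is a made-up helper name, and the parent id 118000 is invented for the example.

```php
<?php
declare( strict_types=1 );

// Hypothetical stand-in for MediaWiki\Revision\MutableRevisionRecord;
// only the strict int signature of setParentId() is modelled here.
class FakeRevisionRecord {
	public ?int $parentId = null;

	public function setParentId( int $parentId ): void {
		$this->parentId = $parentId;
	}
}

// Made-up helper: set the parent id only when the API response contains one.
// A parentid of 0 (first revision of a page) is still passed through.
function applyParentId( FakeRevisionRecord $rev, array $revision ): void {
	if ( isset( $revision['parentid'] ) ) {
		$rev->setParentId( (int)$revision['parentid'] );
	}
	// The caller would insert the revision in either case.
}

$withParent = new FakeRevisionRecord();
applyParentId( $withParent, [ 'revid' => 118273, 'parentid' => 118000 ] );

$withoutParent = new FakeRevisionRecord();
applyParentId( $withoutParent, [ 'revid' => 118273 ] ); // no TypeError

$firstRevision = new FakeRevisionRecord();
applyParentId( $firstRevision, [ 'revid' => 1, 'parentid' => 0 ] );
```

With this shape the revision is always handed on for insertion, and only the setParentId() call is conditional.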

There was a maintenance script to populate the parent revision id: populateParentId.php. However, it was removed in 1.36.

Is there a possibility to share the URL of the wiki where this problem happens? I'm interested in knowing the root cause for a revision to not have a parent revision ID.

Change 891852 had a related patch set uploaded (by Martineznovo; author: Martineznovo):

[mediawiki/tools/grabbers@master] TextGrabber: guard against null parent revision id

https://gerrit.wikimedia.org/r/891852

I've uploaded a fix. If you can test it and run the script again against the namespaces that failed, it should fix the problem. Existing revisions should be skipped.

OK, running the script again. Will update with the result.

@Ciencia_Al_Poder No errors, though I did have to forcefully skip revision 95686 because of a duplicate entry in the slots table. Thanks for all the help.

Title: Fandom_wiki_staff_list/doc in namespace 10
Setting page_restrictions on page_id 28806.
Inserting revision 95690
Wikimedia\Rdbms\DBQueryError from line 1809 of /var/www/html/w/includes/libs/rdbms/database/Database.php: Error 1062: Duplicate entry '95690-1' for key 'PRIMARY' (127.0.0.1)
Function: MediaWiki\Revision\RevisionStore::insertSlotRowOn
Query: INSERT INTO `slots` (slot_revision_id,slot_role_id,slot_content_id,slot_origin) VALUES (95690,1,322676,95690)

#0 /var/www/html/w/includes/libs/rdbms/database/Database.php(1793): Wikimedia\Rdbms\Database->getQueryException()
#1 /var/www/html/w/includes/libs/rdbms/database/Database.php(1768): Wikimedia\Rdbms\Database->getQueryExceptionAndLog()
#2 /var/www/html/w/includes/libs/rdbms/database/Database.php(1327): Wikimedia\Rdbms\Database->reportQueryError()
#3 /var/www/html/w/includes/libs/rdbms/database/Database.php(2540): Wikimedia\Rdbms\Database->query()
#4 /var/www/html/w/includes/libs/rdbms/database/Database.php(2520): Wikimedia\Rdbms\Database->doInsert()
#5 /var/www/html/w/includes/Revision/RevisionStore.php(953): Wikimedia\Rdbms\Database->insert()
#6 /var/www/html/w/includes/Revision/RevisionStore.php(721): MediaWiki\Revision\RevisionStore->insertSlotRowOn()
#7 /var/www/html/w/includes/Revision/RevisionStore.php(674): MediaWiki\Revision\RevisionStore->insertSlotOn()
#8 /var/www/html/w/includes/Revision/RevisionStore.php(486): MediaWiki\Revision\RevisionStore->insertRevisionInternal()
#9 /var/www/html/w/includes/libs/rdbms/database/Database.php(4782): MediaWiki\Revision\RevisionStore->MediaWiki\Revision\{closure}()
#10 /var/www/html/w/includes/libs/rdbms/database/DBConnRef.php(68): Wikimedia\Rdbms\Database->doAtomicSection()
#11 /var/www/html/w/includes/libs/rdbms/database/DBConnRef.php(668): Wikimedia\Rdbms\DBConnRef->__call()
#12 /var/www/html/w/includes/Revision/RevisionStore.php(494): Wikimedia\Rdbms\DBConnRef->doAtomicSection()
#13 /var/www/html/w/grabbers/includes/TextGrabber.php(166): MediaWiki\Revision\RevisionStore->insertRevisionOn()
#14 /var/www/html/w/grabbers/grabText.php(285): TextGrabber->processRevision()
#15 /var/www/html/w/grabbers/grabText.php(131): GrabText->processPage()
#16 /var/www/html/w/grabbers/grabText.php(94): GrabText->processPagesFromNamespace()
#17 /var/www/html/w/maintenance/doMaintenance.php(108): GrabText->execute()
#18 /var/www/html/w/grabbers/grabText.php(349): require_once('/var/www/html/w...')
#19 {main}
[e7b2c6bf9dc6811078419126] [no req]   Wikimedia\Rdbms\DBTransactionError: GrabText: Commit failed on server(s) 127.0.0.1: Cannot execute query from GrabText while transaction status is ERROR
Backtrace:
from /var/www/html/w/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1850)
#0 /var/www/html/w/includes/libs/rdbms/lbfactory/LBFactory.php(249): Wikimedia\Rdbms\LoadBalancer->commitPrimaryChanges()
#1 /var/www/html/w/includes/libs/rdbms/lbfactory/LBFactorySimple.php(145): Wikimedia\Rdbms\LBFactory::Wikimedia\Rdbms\{closure}()
#2 /var/www/html/w/includes/libs/rdbms/lbfactory/LBFactory.php(251): Wikimedia\Rdbms\LBFactorySimple->forEachLB()
#3 /var/www/html/w/includes/libs/rdbms/lbfactory/LBFactory.php(319): Wikimedia\Rdbms\LBFactory->forEachLBCallMethod()
#4 /var/www/html/w/maintenance/includes/Maintenance.php(1243): Wikimedia\Rdbms\LBFactory->commitPrimaryChanges()
#5 /var/www/html/w/maintenance/doMaintenance.php(130): Maintenance->shutdown()
#6 /var/www/html/w/grabbers/grabText.php(349): require_once(string)
#7 {main}

Change 891852 merged by Martineznovo:

[mediawiki/tools/grabbers@master] TextGrabber: guard against null parent revision id

https://gerrit.wikimedia.org/r/891852