
Clean up Bad Blobs
Closed, Resolved, Public

Description

I am cleaning up 4 bad blobs to get familiar with dumps. I won't necessarily be doing this in the future; I just wanted to see what was involved. This follows up on the dumps error scanner finding problems with getText on azwiki and hrwiki. Details follow in the comments.

Event Timeline

Milimetric claimed this task.

For azwiki

mwscript maintenance/findBadBlobs.php --wiki azwiki --revisions 413206,413238,413328

Scanning 3 ids
	! Found bad blob on revision 413206 from 20090309224348 (main slot): content_id=306505, address=<tt:406916>, error='Bad data in text row 406916. Use findBadBlobs.php to remedy.', type='MediaWiki\Storage\BlobAccessException'. ID:	413206
	! Found bad blob on revision 413238 from 20090309232015 (main slot): content_id=306517, address=<tt:406942>, error='Bad data in text row 406942. Use findBadBlobs.php to remedy.', type='MediaWiki\Storage\BlobAccessException'. ID:	413238
	! Found bad blob on revision 413328 from 20090310002010 (main slot): content_id=306575, address=<tt:407020>, error='Bad data in text row 407020. Use findBadBlobs.php to remedy.', type='MediaWiki\Storage\BlobAccessException'. ID:	413328
	- Scanned a batch of 3 revisions
Found 3 bad revisions.
On a unix/linux environment, you can use grep and cut to list of IDs
that can then be used with the --revisions option. E.g.
  grep '! Found bad blob' | cut -s -f 3

mwscript maintenance/findBadBlobs.php --wiki azwiki --revisions 413206,413238,413328 --mark T346969

Scanning 3 ids
	! Found bad blob on revision 413206 from 20090309224348 (main slot): content_id=306505, address=<tt:406916>, error='Bad data in text row 406916. Use findBadBlobs.php to remedy.', type='MediaWiki\Storage\BlobAccessException'. ID:	413206
	Changed address to <bad:tt%3A406916?reason=T346969&error=Bad+data+in+text+row+406916.+Use+findBadBlobs.php+to+remedy.>
	! Found bad blob on revision 413238 from 20090309232015 (main slot): content_id=306517, address=<tt:406942>, error='Bad data in text row 406942. Use findBadBlobs.php to remedy.', type='MediaWiki\Storage\BlobAccessException'. ID:	413238
	Changed address to <bad:tt%3A406942?reason=T346969&error=Bad+data+in+text+row+406942.+Use+findBadBlobs.php+to+remedy.>
	! Found bad blob on revision 413328 from 20090310002010 (main slot): content_id=306575, address=<tt:407020>, error='Bad data in text row 407020. Use findBadBlobs.php to remedy.', type='MediaWiki\Storage\BlobAccessException'. ID:	413328
	Changed address to <bad:tt%3A407020?reason=T346969&error=Bad+data+in+text+row+407020.+Use+findBadBlobs.php+to+remedy.>
	- Scanned a batch of 3 revisions
Marked 3 bad revisions.
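
The grep/cut hint printed by the scan can be extended to build the comma-separated value that --revisions expects. This is a minimal sketch, assuming the output was captured with its tabs intact (a leading tab on each line and a tab before the trailing ID). The sample log lines are abbreviated stand-ins for the real output, the `log` and `ids` variable names are illustrative, and the `paste` step is an addition that is not part of the script's own hint.

```shell
# Abbreviated, hypothetical stand-in for the tab-delimited scan output.
# Real lines carry the full error message between "revision N" and "ID:".
log=$(printf '\t! Found bad blob on revision 413206 (...). ID:\t413206\n\t! Found bad blob on revision 413238 (...). ID:\t413238\n\t! Found bad blob on revision 413328 (...). ID:\t413328\n\t- Scanned a batch of 3 revisions\n')

# Keep only the "Found bad blob" lines, take the third tab-separated field
# (the revision ID; field 1 is empty because of the leading tab), and join
# the IDs with commas.
ids=$(printf '%s\n' "$log" | grep '! Found bad blob' | cut -s -f 3 | paste -sd, -)

echo "$ids"
```

The resulting string could then be passed to --revisions in the follow-up run with --mark.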

For hrwiki

mwscript maintenance/findBadBlobs.php --wiki hrwiki --revisions 1705637

Scanning 1 ids
	! Found bad blob on revision 1705637 from 20090309211443 (main slot): content_id=1558975, address=<tt:1677927>, error='Bad data in text row 1677927. Use findBadBlobs.php to remedy.', type='MediaWiki\Storage\BlobAccessException'. ID:	1705637
	- Scanned a batch of 1 revisions
Found 1 bad revisions.
On a unix/linux environment, you can use grep and cut to list of IDs
that can then be used with the --revisions option. E.g.
  grep '! Found bad blob' | cut -s -f 3

mwscript maintenance/findBadBlobs.php --wiki hrwiki --revisions 1705637 --mark T346969

Scanning 1 ids
	! Found bad blob on revision 1705637 from 20090309211443 (main slot): content_id=1558975, address=<tt:1677927>, error='Bad data in text row 1677927. Use findBadBlobs.php to remedy.', type='MediaWiki\Storage\BlobAccessException'. ID:	1705637
	Changed address to <bad:tt%3A1677927?reason=T346969&error=Bad+data+in+text+row+1677927.+Use+findBadBlobs.php+to+remedy.>
	- Scanned a batch of 1 revisions
Marked 1 bad revisions.
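
The "Changed address to" lines appear to wrap the original address, percent-encoded, in a bad: scheme, followed by query-style parameters holding the --mark value and the error text. The sketch below pulls the original address and the reason back out of one such line. It assumes that layout (inferred from the output above, not from the script's documentation) and decodes only the %3A escape; the `addr`, `orig`, and `reason` variable names are illustrative.

```shell
# Address copied from the hrwiki run.
addr='<bad:tt%3A1677927?reason=T346969&error=Bad+data+in+text+row+1677927.+Use+findBadBlobs.php+to+remedy.>'

# The part between "<bad:" and "?" is the original address, percent-encoded.
# Only %3A (":") is decoded here; a general decoder would handle every %XX.
orig=$(printf '%s\n' "$addr" | sed -E 's/^<bad:([^?]*)\?.*$/\1/; s/%3A/:/g')

# The reason parameter carries the value that was passed to --mark.
reason=$(printf '%s\n' "$addr" | sed -E 's/^.*[?&]reason=([^&>]*).*$/\1/')

echo "original address: $orig"
echo "reason: $reason"
```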