
getRedirectTarget should not automatically load revision content in all cases
Closed, Resolved | Public | 0 Estimated Story Points | PRODUCTION ERROR

Description

While checking logstash errors about failed content retrieval today, I noticed that XML stubs dumps, which should not load revision content and previously did not, were producing some of the errors in T203075.

Stack trace:

#0 [internal function]: MWExceptionHandler::handleError(integer, string, string, integer, array)
#1 /srv/mediawiki/php-1.33.0-wmf.23/includes/debug/MWDebug.php(309): trigger_error(string, integer)
#2 /srv/mediawiki/php-1.33.0-wmf.23/includes/debug/MWDebug.php(164): MWDebug::sendMessage(string, array, string, integer)
#3 /srv/mediawiki/php-1.33.0-wmf.23/includes/GlobalFunctions.php(1106): MWDebug::warning(string, integer, integer, string)
#4 /srv/mediawiki/php-1.33.0-wmf.23/includes/Storage/SqlBlobStore.php(353): wfLogWarning(string)
#5 /srv/mediawiki/php-1.33.0-wmf.23/includes/Storage/SqlBlobStore.php(278): MediaWiki\Storage\SqlBlobStore->fetchBlob(string, integer)
#6 /srv/mediawiki/php-1.33.0-wmf.23/includes/libs/objectcache/WANObjectCache.php(1396): MediaWiki\Storage\SqlBlobStore->MediaWiki\Storage\{closure}(boolean, integer, array, NULL)
#7 /srv/mediawiki/php-1.33.0-wmf.23/includes/libs/objectcache/WANObjectCache.php(1257): WANObjectCache->doGetWithSetCallback(string, integer, Closure, array)
#8 /srv/mediawiki/php-1.33.0-wmf.23/includes/Storage/SqlBlobStore.php(280): WANObjectCache->getWithSetCallback(string, integer, Closure, array)
#9 /srv/mediawiki/php-1.33.0-wmf.23/includes/Revision/RevisionStore.php(1461): MediaWiki\Storage\SqlBlobStore->getBlob(string, integer)
#10 /srv/mediawiki/php-1.33.0-wmf.23/includes/Revision/RevisionStore.php(1634): MediaWiki\Revision\RevisionStore->loadSlotContent(MediaWiki\Revision\SlotRecord, NULL, NULL, NULL, integer)
#11 [internal function]: MediaWiki\Revision\RevisionStore->MediaWiki\Revision\{closure}(MediaWiki\Revision\SlotRecord)
#12 /srv/mediawiki/php-1.33.0-wmf.23/includes/Revision/SlotRecord.php(307): call_user_func(Closure, MediaWiki\Revision\SlotRecord)
#13 /srv/mediawiki/php-1.33.0-wmf.23/includes/Revision/RevisionRecord.php(175): MediaWiki\Revision\SlotRecord->getContent()
#14 /srv/mediawiki/php-1.33.0-wmf.23/includes/Revision.php(923): MediaWiki\Revision\RevisionRecord->getContent(string, integer, NULL)
#15 /srv/mediawiki/php-1.33.0-wmf.23/includes/page/WikiPage.php(819): Revision->getContent(integer, NULL)
#16 /srv/mediawiki/php-1.33.0-wmf.23/includes/page/WikiPage.php(1043): WikiPage->getContent()
#17 /srv/mediawiki/php-1.33.0-wmf.23/includes/page/WikiPage.php(1030): WikiPage->insertRedirect()
#18 /srv/mediawiki/php-1.33.0-wmf.23/includes/export/XmlDumpWriter.php(185): WikiPage->getRedirectTarget()
#19 /srv/mediawiki/php-1.33.0-wmf.23/includes/export/WikiExporter.php(482): XmlDumpWriter->openPage(stdClass)
#20 /srv/mediawiki/php-1.33.0-wmf.23/includes/export/WikiExporter.php(445): WikiExporter->outputPageStreamBatch(Wikimedia\Rdbms\ResultWrapper, stdClass)
#21 /srv/mediawiki/php-1.33.0-wmf.23/includes/export/WikiExporter.php(269): WikiExporter->dumpPages(string, boolean)
#22 /srv/mediawiki/php-1.33.0-wmf.23/includes/export/WikiExporter.php(154): WikiExporter->dumpFrom(string, boolean)
#23 /srv/mediawiki/php-1.33.0-wmf.23/maintenance/includes/BackupDumper.php(288): WikiExporter->pagesByRange(integer, integer, boolean)
#24 /srv/mediawiki/php-1.33.0-wmf.23/maintenance/dumpBackup.php(81): BackupDumper->dump(integer, integer)
#25 /srv/mediawiki/php-1.33.0-wmf.23/maintenance/doMaintenance.php(94): DumpBackup->execute()
#26 /srv/mediawiki/php-1.33.0-wmf.23/maintenance/dumpBackup.php(138): require_once(string)
#27 /srv/mediawiki/multiversion/MWScript.php(100): require_once(string)

The trace makes it plain that content is being retrieved. This is new behavior since the March 20th stubs run, so it is a regression of some kind.

Event Timeline

ArielGlenn triaged this task as Medium priority. Apr 4 2019, 11:23 PM
ArielGlenn created this task.
ArielGlenn moved this task from Backlog to Active on the Dumps-Generation board.

I can't guarantee that stubs dumps never loaded revision content for redirects previously; I can guarantee that I didn't see these error messages for bad data for stubs previously, though I did see them for dumpTextPass.php (which dumps revision content).

It seems very fishy that an insert into the redirect table would need to be made from here. I'll look at some specific cases tomorrow (but anyone else with a clue please feel free to get there first).

Here's a specific sample.

ErrorException from line 309 of /srv/mediawiki/php-1.33.0-wmf.23/includes/debug/MWDebug.php: PHP Warning: MediaWiki\Storage\SqlBlobStore::fetchBlob: Bad data in text row 1677927. [Called from MediaWiki\Storage\SqlBlobStore::fetchBlob in /srv/mediawiki/php-1.33.0-wmf.23/includes/Storage/SqlBlobStore.php at line 353]

MySQL shows:

wikiadmin@10.64.16.191(hrwiki)> select * from text where old_id = 1677927;
+---------+---------------+-----------+------------------+-------------+----------+---------------+---------------+----------------+---------------------+-------------------+
| old_id  | old_namespace | old_title | old_text         | old_comment | old_user | old_user_text | old_timestamp | old_minor_edit | old_flags           | inverse_timestamp |
+---------+---------------+-----------+------------------+-------------+----------+---------------+---------------+----------------+---------------------+-------------------+
| 1677927 |               |           | DB://cluster20/0 |             |          |               |               |                | utf-8,gzip,external |                   |
+---------+---------------+-----------+------------------+-------------+----------+---------------+---------------+----------------+---------------------+-------------------+
1 row in set (0.00 sec)

wikiadmin@10.64.16.191(hrwiki)> select * from revision where rev_id = 1705637;
+---------+----------+-------------+-------------+----------+---------------+----------------+----------------+-------------+---------+---------------+----------+-------------------+--------------------+
| rev_id  | rev_page | rev_text_id | rev_comment | rev_user | rev_user_text | rev_timestamp  | rev_minor_edit | rev_deleted | rev_len | rev_parent_id | rev_sha1 | rev_content_model | rev_content_format |
+---------+----------+-------------+-------------+----------+---------------+----------------+----------------+-------------+---------+---------------+----------+-------------------+--------------------+
| 1705637 |   192386 |     1677927 |             |    17775 | Fhms          | 20090309211443 |              0 |           0 |      81 |             0 |          | NULL              | NULL               |
+---------+----------+-------------+-------------+----------+---------------+----------------+----------------+-------------+---------+---------------+----------+-------------------+--------------------+
1 row in set (0.00 sec)

I was able to check the previous run's stubs to get the revision information for the text row. This might be useful to folks looking at T203075.
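
For anyone who wants to repeat that lookup, here is a minimal sketch of scanning a stubs file for the revision that points at a given text row. It assumes the stubs XML carries a `<text id="..." bytes="..." />` element as a direct child of each `<revision>`; the function name, the sample document and its placeholder title are hypothetical, and the real export schema may differ between versions.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace, e.g. '{http://...}revision' -> 'revision'."""
    return tag.rsplit("}", 1)[-1]


def find_revisions_for_text_id(stream, text_id):
    """Return the IDs of all revisions whose <text> stub has the given id.

    Assumption: <id> and <text id="..."> are direct children of <revision>.
    """
    matches = []
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "revision":
            continue
        rev_id = None
        found = False
        for child in elem:
            name = local(child.tag)
            if name == "id":
                rev_id = int(child.text)
            elif name == "text" and child.get("id") == str(text_id):
                found = True
        if found:
            matches.append(rev_id)
        elem.clear()  # keep memory use flat on large dumps
    return matches


# Hand-written sample using the IDs from this task; the title is a placeholder.
SAMPLE_STUB = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <ns>0</ns>
    <id>192386</id>
    <revision>
      <id>1705637</id>
      <timestamp>2009-03-09T21:14:43Z</timestamp>
      <contributor><username>Fhms</username><id>17775</id></contributor>
      <text id="1677927" bytes="81" />
    </revision>
  </page>
</mediawiki>"""

print(find_revisions_for_text_id(io.BytesIO(SAMPLE_STUB), 1677927))
```

The revision ID found this way can then be used in the `revision` table query shown above.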

ArielGlenn renamed this task from "getRedirectTarget should not automatically load revision content" to "getRedirectTarget should not automatically load revision content in all cases". Apr 5 2019, 12:03 AM

Page info for the above example:

wikiadmin@10.64.0.205(hrwiki)> select * from page where page_id = 192386;
+---------+----------------+-----------------------------------------------------------+-------------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+-----------+
| page_id | page_namespace | page_title                                                | page_restrictions | page_is_redirect | page_is_new | page_random    | page_touched   | page_links_updated | page_latest | page_len | page_content_model | page_lang |
+---------+----------------+-----------------------------------------------------------+-------------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+-----------+
|  192386 |              0 | Franjevački_samostan_Sv._Antuna_Padovanskog_u_Koprivnici  |                   |                1 |           1 | 0.222437823375 | 20160904152621 | NULL               |     1705637 |       81 | wikitext           | NULL      |
+---------+----------------+-----------------------------------------------------------+-------------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+-----------+

Entry in redirect table:

wikiadmin@10.64.0.205(hrwiki)> select * from redirect where rd_from = 192386;
+---------+--------------+-------------------------------------------------------------------+--------------+-------------+
| rd_from | rd_namespace | rd_title                                                          | rd_interwiki | rd_fragment |
+---------+--------------+-------------------------------------------------------------------+--------------+-------------+
|  192386 |            0 | Franjevački_samostan_i_crkva_sv._Antuna_Padovanskog_u_Koprivnici  | NULL         | NULL        |
+---------+--------------+-------------------------------------------------------------------+--------------+-------------+

I can reproduce the exception and the call to insertRedirect() with the following command on snapshot1007 as the dumpsgen user:
/usr/bin/php7.2 /srv/mediawiki/multiversion/MWScript.php dumpBackup.php --wiki=hrwiki --full --stub --report=1 --output=file:/mnt/dumpsdata/temp/dumpsgen/hrwiki-badstubs-history.xml --skip-header --start=192386 --skip-footer --end 192388

Logstash for the above: https://logstash.wikimedia.org/goto/d58def2efbc943e8f13d05ff0ddf59e9

The redirect page (192386) is not working:

[XK@TigpAMFcAAHuTZe0AAACN] 2019-04-11 19:20:42: Fatal exception of type "MediaWiki\Revision\RevisionAccessException"

https://hr.wikipedia.org/wiki/Franjeva%C4%8Dki_samostan_Sv._Antuna_Padovanskog_u_Koprivnici
https://hr.wikipedia.org/wiki/Franjeva%C4%8Dki_samostan_Sv._Antuna_Padovanskog_u_Koprivnici?action=history
https://hr.wikipedia.org/wiki/Franjeva%C4%8Dki_samostan_Sv._Antuna_Padovanskog_u_Koprivnici?action=info

The redirect table is queried (on a replica) before insertRedirect() is called. When rd_interwiki and rd_fragment are NULL, insertRedirect() is called to populate these fields, but that fails because loading the content throws an exception.

Without the content, WikiPage::getRedirectTarget cannot work properly.
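
The control flow described above can be modelled roughly as follows. This is a simplified Python sketch, not MediaWiki's PHP code: `RedirectRow`, `ContentLoadError`, `get_redirect_target` and the loader callback are hypothetical stand-ins, and the redirect parsing is reduced to a bare `#REDIRECT [[...]]` match.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class RedirectRow:
    """Stand-in for one row of the redirect table."""
    rd_namespace: int
    rd_title: str
    rd_interwiki: Optional[str]
    rd_fragment: Optional[str]


class ContentLoadError(Exception):
    """Stand-in for RevisionAccessException."""


def get_redirect_target(
    page_id: int,
    redirect_table: Dict[int, RedirectRow],
    load_content: Callable[[int], str],
) -> Optional[Tuple[int, str]]:
    row = redirect_table.get(page_id)
    # Fast path: a complete row answers the question without loading content.
    if row is not None and row.rd_interwiki is not None and row.rd_fragment is not None:
        return (row.rd_namespace, row.rd_title)

    # Slow path (the insertRedirect() case): the row is missing or has NULL
    # rd_interwiki / rd_fragment, so the target is re-derived from the content.
    text = load_content(page_id)  # raises if the blob is bad
    match = re.match(r"#REDIRECT\s*\[\[([^\]|#]+)", text, re.IGNORECASE)
    if match is None:
        return None
    title = match.group(1).strip().replace(" ", "_")
    redirect_table[page_id] = RedirectRow(0, title, "", "")
    return (0, title)


def broken_loader(page_id: int) -> str:
    raise ContentLoadError("Bad data in text row 1677927")


# The row from this task: target known, but rd_interwiki/rd_fragment are NULL.
table = {
    192386: RedirectRow(
        0, "Franjevački_samostan_i_crkva_sv._Antuna_Padovanskog_u_Koprivnici", None, None
    )
}
try:
    get_redirect_target(192386, table, broken_loader)
except ContentLoadError as err:
    print("content load failed:", err)
```

In this model the stubs dump only hits the blob store for redirect rows with NULL fields, which would explain why the errors show up for some redirects and not others.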

mmodell changed the subtype of this task from "Task" to "Production Error". Aug 28 2019, 11:07 PM
Krinkle subscribed.

The redirect page (192386) is not working

[XK@TigpAMFcAAHuTZe0AAACN] 2019-04-11 19:20:42: Fatal error of type "MediaWiki\Revision\RevisionAccessException"

https://hr.wikipedia.org/wiki/Franjeva%C4%8Dki_samostan_Sv._Antuna_Padovanskog_u_Koprivnici

This page is still inaccessible; it yields an HTTP 500 Internal Server Error:

RevisionAccessException from line 1442 of /srv/mediawiki/php-1.35.0-wmf.4/includes/Revision/RevisionStore.php: Failed to load data blob from tt:1677927: Bad data in text row 1677927.

However, this issue is already tracked at T203075: Warning: MediaWiki\Storage\SqlBlobStore::fetchBlob: Bad data in text row

Trying to edit it also fails:
https://hr.wikipedia.org/w/index.php?title=Franjeva%C4%8Dki_samostan_Sv._Antuna_Padovanskog_u_Koprivnici&action=edit

Error from line 1147 of /srv/mediawiki/php-1.35.0-wmf.4/includes/EditPage.php: Call to a member function getModel() on null

I've reported this as a new issue at T237570: EditPage.php: Call to a member function getModel() on null.

I'm closing this task as the immediate issue that was affecting dumps has been resolved with a workaround in the relevant maintenance script. The underlying issue (T203075) has its own task.

The underlying issue could potentially be prevented from affecting dumps by not making use of blob fetching when exporting revision metadata. This seems sensible but out of scope for the production error task. If this is still desired, I recommend filing a new task under MediaWiki-Core-Revision-backend for that.