
getRedirectTarget should not automatically load revision content in all cases
Open, Normal, Public, 0 Story Points

Description

While checking logstash errors for content retrieval today, I noticed that XML stubs dumps (which should not load revision content, and previously did not) were producing some of the errors in T203075.

Stack trace:

#0 [internal function]: MWExceptionHandler::handleError(integer, string, string, integer, array)
#1 /srv/mediawiki/php-1.33.0-wmf.23/includes/debug/MWDebug.php(309): trigger_error(string, integer)
#2 /srv/mediawiki/php-1.33.0-wmf.23/includes/debug/MWDebug.php(164): MWDebug::sendMessage(string, array, string, integer)
#3 /srv/mediawiki/php-1.33.0-wmf.23/includes/GlobalFunctions.php(1106): MWDebug::warning(string, integer, integer, string)
#4 /srv/mediawiki/php-1.33.0-wmf.23/includes/Storage/SqlBlobStore.php(353): wfLogWarning(string)
#5 /srv/mediawiki/php-1.33.0-wmf.23/includes/Storage/SqlBlobStore.php(278): MediaWiki\Storage\SqlBlobStore->fetchBlob(string, integer)
#6 /srv/mediawiki/php-1.33.0-wmf.23/includes/libs/objectcache/WANObjectCache.php(1396): MediaWiki\Storage\SqlBlobStore->MediaWiki\Storage\{closure}(boolean, integer, array, NULL)
#7 /srv/mediawiki/php-1.33.0-wmf.23/includes/libs/objectcache/WANObjectCache.php(1257): WANObjectCache->doGetWithSetCallback(string, integer, Closure, array)
#8 /srv/mediawiki/php-1.33.0-wmf.23/includes/Storage/SqlBlobStore.php(280): WANObjectCache->getWithSetCallback(string, integer, Closure, array)
#9 /srv/mediawiki/php-1.33.0-wmf.23/includes/Revision/RevisionStore.php(1461): MediaWiki\Storage\SqlBlobStore->getBlob(string, integer)
#10 /srv/mediawiki/php-1.33.0-wmf.23/includes/Revision/RevisionStore.php(1634): MediaWiki\Revision\RevisionStore->loadSlotContent(MediaWiki\Revision\SlotRecord, NULL, NULL, NULL, integer)
#11 [internal function]: MediaWiki\Revision\RevisionStore->MediaWiki\Revision\{closure}(MediaWiki\Revision\SlotRecord)
#12 /srv/mediawiki/php-1.33.0-wmf.23/includes/Revision/SlotRecord.php(307): call_user_func(Closure, MediaWiki\Revision\SlotRecord)
#13 /srv/mediawiki/php-1.33.0-wmf.23/includes/Revision/RevisionRecord.php(175): MediaWiki\Revision\SlotRecord->getContent()
#14 /srv/mediawiki/php-1.33.0-wmf.23/includes/Revision.php(923): MediaWiki\Revision\RevisionRecord->getContent(string, integer, NULL)
#15 /srv/mediawiki/php-1.33.0-wmf.23/includes/page/WikiPage.php(819): Revision->getContent(integer, NULL)
#16 /srv/mediawiki/php-1.33.0-wmf.23/includes/page/WikiPage.php(1043): WikiPage->getContent()
#17 /srv/mediawiki/php-1.33.0-wmf.23/includes/page/WikiPage.php(1030): WikiPage->insertRedirect()
#18 /srv/mediawiki/php-1.33.0-wmf.23/includes/export/XmlDumpWriter.php(185): WikiPage->getRedirectTarget()
#19 /srv/mediawiki/php-1.33.0-wmf.23/includes/export/WikiExporter.php(482): XmlDumpWriter->openPage(stdClass)
#20 /srv/mediawiki/php-1.33.0-wmf.23/includes/export/WikiExporter.php(445): WikiExporter->outputPageStreamBatch(Wikimedia\Rdbms\ResultWrapper, stdClass)
#21 /srv/mediawiki/php-1.33.0-wmf.23/includes/export/WikiExporter.php(269): WikiExporter->dumpPages(string, boolean)
#22 /srv/mediawiki/php-1.33.0-wmf.23/includes/export/WikiExporter.php(154): WikiExporter->dumpFrom(string, boolean)
#23 /srv/mediawiki/php-1.33.0-wmf.23/maintenance/includes/BackupDumper.php(288): WikiExporter->pagesByRange(integer, integer, boolean)
#24 /srv/mediawiki/php-1.33.0-wmf.23/maintenance/dumpBackup.php(81): BackupDumper->dump(integer, integer)
#25 /srv/mediawiki/php-1.33.0-wmf.23/maintenance/doMaintenance.php(94): DumpBackup->execute()
#26 /srv/mediawiki/php-1.33.0-wmf.23/maintenance/dumpBackup.php(138): require_once(string)
#27 /srv/mediawiki/multiversion/MWScript.php(100): require_once(string)

It's plain that content is being retrieved; this is new behavior since the March 20th stubs run and so a regression of some kind.

Event Timeline

ArielGlenn triaged this task as Normal priority. Apr 4 2019, 11:23 PM
ArielGlenn created this task.
ArielGlenn moved this task from Backlog to Active on the Dumps-Generation board.

I can't guarantee that stubs dumps never loaded revision content for redirects previously; I can guarantee that I didn't see these error messages for bad data for stubs previously, though I did see them for dumpTextPass.php (which dumps revision content).

It seems very fishy that an insert into the redirect table would need to be made from here. I'll look at some specific cases tomorrow (but anyone else with a clue please feel free to get there first).

Here's a specific sample.

ErrorException from line 309 of /srv/mediawiki/php-1.33.0-wmf.23/includes/debug/MWDebug.php: PHP Warning: MediaWiki\Storage\SqlBlobStore::fetchBlob: Bad data in text row 1677927. [Called from MediaWiki\Storage\SqlBlobStore::fetchBlob in /srv/mediawiki/php-1.33.0-wmf.23/includes/Storage/SqlBlobStore.php at line 353]

Mysql shows:

wikiadmin@10.64.16.191(hrwiki)> select * from text where old_id = 1677927;
+---------+---------------+-----------+------------------+-------------+----------+---------------+---------------+----------------+---------------------+-------------------+
| old_id  | old_namespace | old_title | old_text         | old_comment | old_user | old_user_text | old_timestamp | old_minor_edit | old_flags           | inverse_timestamp |
+---------+---------------+-----------+------------------+-------------+----------+---------------+---------------+----------------+---------------------+-------------------+
| 1677927 |               |           | DB://cluster20/0 |             |          |               |               |                | utf-8,gzip,external |                   |
+---------+---------------+-----------+------------------+-------------+----------+---------------+---------------+----------------+---------------------+-------------------+
1 row in set (0.00 sec)

wikiadmin@10.64.16.191(hrwiki)> select * from revision where rev_id = 1705637;
+---------+----------+-------------+-------------+----------+---------------+----------------+----------------+-------------+---------+---------------+----------+-------------------+--------------------+
| rev_id  | rev_page | rev_text_id | rev_comment | rev_user | rev_user_text | rev_timestamp  | rev_minor_edit | rev_deleted | rev_len | rev_parent_id | rev_sha1 | rev_content_model | rev_content_format |
+---------+----------+-------------+-------------+----------+---------------+----------------+----------------+-------------+---------+---------------+----------+-------------------+--------------------+
| 1705637 |   192386 |     1677927 |             |    17775 | Fhms          | 20090309211443 |              0 |           0 |      81 |             0 |          | NULL              | NULL               |
+---------+----------+-------------+-------------+----------+---------------+----------------+----------------+-------------+---------+---------------+----------+-------------------+--------------------+
1 row in set (0.00 sec)

I was able to check the previous run's stubs to get the revision information for the text row. This might be useful to folks looking at T203075.

ArielGlenn renamed this task from "getRedirectTarget should not automatically load revision content" to "getRedirectTarget should not automatically load revision content in all cases". Apr 5 2019, 12:03 AM

Page info for the above example:

wikiadmin@10.64.0.205(hrwiki)> select * from page where page_id = 192386;
+---------+----------------+-----------------------------------------------------------+-------------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+-----------+
| page_id | page_namespace | page_title                                                | page_restrictions | page_is_redirect | page_is_new | page_random    | page_touched   | page_links_updated | page_latest | page_len | page_content_model | page_lang |
+---------+----------------+-----------------------------------------------------------+-------------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+-----------+
|  192386 |              0 | Franjevački_samostan_Sv._Antuna_Padovanskog_u_Koprivnici  |                   |                1 |           1 | 0.222437823375 | 20160904152621 | NULL               |     1705637 |       81 | wikitext           | NULL      |
+---------+----------------+-----------------------------------------------------------+-------------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+-----------+

Entry in redirect table:

wikiadmin@10.64.0.205(hrwiki)> select * from redirect where rd_from = 192386;
+---------+--------------+-------------------------------------------------------------------+--------------+-------------+
| rd_from | rd_namespace | rd_title                                                          | rd_interwiki | rd_fragment |
+---------+--------------+-------------------------------------------------------------------+--------------+-------------+
|  192386 |            0 | Franjevački_samostan_i_crkva_sv._Antuna_Padovanskog_u_Koprivnici  | NULL         | NULL        |
+---------+--------------+-------------------------------------------------------------------+--------------+-------------+
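
The four lookups above form a chain from the bad text row to the redirect entry. Below is a minimal sketch of that chain as a single join. It uses an in-memory SQLite database seeded with only the columns and values shown in the query output above; the helper name `trace_text_row` is hypothetical, and this is an illustration, not a query that was run against production.

```python
import sqlite3

# In-memory stand-in for the hrwiki tables, reduced to the columns that matter
# here. Values are copied from the query output above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "text" (old_id INTEGER PRIMARY KEY, old_text TEXT, old_flags TEXT);
CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_text_id INTEGER);
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT,
                   page_is_redirect INTEGER, page_latest INTEGER);
CREATE TABLE redirect (rd_from INTEGER PRIMARY KEY, rd_title TEXT,
                       rd_interwiki TEXT, rd_fragment TEXT);
INSERT INTO "text" VALUES (1677927, 'DB://cluster20/0', 'utf-8,gzip,external');
INSERT INTO revision VALUES (1705637, 192386, 1677927);
INSERT INTO page VALUES
    (192386, 'Franjevački_samostan_Sv._Antuna_Padovanskog_u_Koprivnici', 1, 1705637);
INSERT INTO redirect VALUES
    (192386, 'Franjevački_samostan_i_crkva_sv._Antuna_Padovanskog_u_Koprivnici', NULL, NULL);
""")


def trace_text_row(conn, old_id):
    """Follow a text row to its revision, page, and redirect entry (if any)."""
    return conn.execute(
        """
        SELECT r.rev_id, p.page_id, p.page_is_redirect,
               rd.rd_interwiki, rd.rd_fragment
        FROM "text" t
        JOIN revision r ON r.rev_text_id = t.old_id
        JOIN page p ON p.page_id = r.rev_page
        LEFT JOIN redirect rd ON rd.rd_from = p.page_id
        WHERE t.old_id = ?
        """,
        (old_id,),
    ).fetchone()


print(trace_text_row(conn, 1677927))
```

The `LEFT JOIN` keeps the page in the result even when it has no redirect row at all, so a missing entry and an entry with NULL `rd_interwiki` and `rd_fragment` both come back as NULLs in the last two columns.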

I can reproduce the exception and the call to insertRedirect() with the following command on snapshot1007 as the dumpsgen user:
/usr/bin/php7.2 /srv/mediawiki/multiversion/MWScript.php dumpBackup.php --wiki=hrwiki --full --stub --report=1 --output=file:/mnt/dumpsdata/temp/dumpsgen/hrwiki-badstubs-history.xml --skip-header --start=192386 --skip-footer --end 192388

Logstash for the above: https://logstash.wikimedia.org/goto/d58def2efbc943e8f13d05ff0ddf59e9

The redirect page (192386) is not working:

[XK@TigpAMFcAAHuTZe0AAACN] 2019-04-11 19:20:42: Fatal exception of type "MediaWiki\Revision\RevisionAccessException"

https://hr.wikipedia.org/wiki/Franjeva%C4%8Dki_samostan_Sv._Antuna_Padovanskog_u_Koprivnici
https://hr.wikipedia.org/wiki/Franjeva%C4%8Dki_samostan_Sv._Antuna_Padovanskog_u_Koprivnici?action=history
https://hr.wikipedia.org/wiki/Franjeva%C4%8Dki_samostan_Sv._Antuna_Padovanskog_u_Koprivnici?action=info

The redirect table is queried (on a replica) before insertRedirect() is called. When rd_interwiki and rd_fragment are NULL, insertRedirect() is called to populate these fields, but that fails because loading the content throws an exception.

Without the content, WikiPage::getRedirectTarget cannot work properly.
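
The control flow described in this comment can be modelled in a few lines. The sketch below is a simplified Python model, not MediaWiki's PHP code: `RedirectRow`, `get_redirect_target`, `load_content`, and `RevisionAccessError` are hypothetical stand-ins, and the redirect parsing is reduced to a plain `#REDIRECT [[...]]` pattern in namespace 0.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


class RevisionAccessError(Exception):
    """Stand-in for MediaWiki\\Revision\\RevisionAccessException."""


@dataclass
class RedirectRow:
    """The redirect-table columns relevant to the fallback decision."""
    rd_namespace: int
    rd_title: str
    rd_interwiki: Optional[str]
    rd_fragment: Optional[str]


def get_redirect_target(
    row: Optional[RedirectRow],
    load_content: Callable[[], str],
) -> Optional[Tuple[int, str]]:
    # A complete row is answered from the redirect table alone; no content load.
    if row is not None and row.rd_interwiki is not None and row.rd_fragment is not None:
        return (row.rd_namespace, row.rd_title)

    # Missing row, or NULL rd_interwiki / rd_fragment: fall back to re-deriving
    # the target from the page text (the insertRedirect() path in the trace).
    # Any failure to load the content propagates to the caller from here.
    text = load_content()
    match = re.match(r"#REDIRECT\s*\[\[([^\]|#]+)", text, re.IGNORECASE)
    if match is None:
        return None
    return (0, match.group(1).strip().replace(" ", "_"))


def broken_loader() -> str:
    # Models the unreadable blob behind text row 1677927.
    raise RevisionAccessError("Bad data in text row 1677927")


legacy_row = RedirectRow(
    0, "Franjevački_samostan_i_crkva_sv._Antuna_Padovanskog_u_Koprivnici", None, None
)
try:
    get_redirect_target(legacy_row, broken_loader)
except RevisionAccessError as err:
    print("content load failed:", err)
```

In this model a row with non-NULL `rd_interwiki` and `rd_fragment` (even empty strings) never touches the loader, which is why only redirect rows with NULLs in those columns would hit the bad blob during a stubs dump.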

mmodell changed the subtype of this task from "Task" to "Production Error". Aug 28 2019, 11:07 PM