addWiki.php broken creating ES tables
Closed, Resolved (Public)

Authored By: Reedy (Jan 3 2019, 7:39 PM)

Description

Initialising external storage cluster24...
[6f30182d25a2d432f75d027c] [no req]   Wikimedia\Rdbms\DBReadOnlyError from line 1163 of /srv/mediawiki/php-1.33.0-wmf.9/includes/libs/rdbms/database/Database.php: Database is read-only: The database has been automatically locked until the replica database servers become available
Backtrace:
#0 /srv/mediawiki/php-1.33.0-wmf.9/includes/libs/rdbms/database/Database.php(4354): Wikimedia\Rdbms\Database->query(string, string)
#1 /srv/mediawiki/php-1.33.0-wmf.9/extensions/WikimediaMaintenance/addWiki.php(229): Wikimedia\Rdbms\Database->sourceStream(resource)
#2 /srv/mediawiki/php-1.33.0-wmf.9/maintenance/doMaintenance.php(94): AddWiki->execute()
#3 /srv/mediawiki/php-1.33.0-wmf.9/extensions/WikimediaMaintenance/addWiki.php(399): include(string)
#4 /srv/mediawiki/multiversion/MWScript.php(100): include(string)
#5 {main}

While doing T197616: Create a production test wiki in group0 to parallel Wikimedia Commons
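
For context, the step that fails is roughly the following (a paraphrase of frames #0 and #1 in the backtrace above, not the literal addWiki.php code; the identifiers and the schema path are illustrative): for each configured external storage cluster, the script takes that cluster's master connection and sources the blobs table schema into it, and that sourceStream() call is what throws DBReadOnlyError when the ES master has locked itself.

// Rough paraphrase of the failing step; names and the schema path are assumptions.
$esFactory = MediaWiki\MediaWikiServices::getInstance()->getExternalStoreFactory();
foreach ( [ 'DB://cluster24', 'DB://cluster25' ] as $url ) {
	$cluster = $esFactory->getStoreLocationFromUrl( $url );           // e.g. "cluster24"
	$dbw = $esFactory->getStoreForUrl( $url )->getMaster( $cluster ); // that cluster's master
	// Source the blobs table schema; this throws DBReadOnlyError when the
	// ES master has automatically locked itself because replicas are unavailable.
	$dbw->sourceStream( fopen( "$IP/maintenance/storage/blobs.sql", 'r' ) );
}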

Event Timeline

How much of this is unique from T205936 ?

I might be wrong, but it seems they are different issues: this one says the ES DB is read-only and nothing can be changed there, while the mentioned bug (which is really bad too) can't even read the data from ES. It might be that they have the same cause: ES going read-only causes writes to get corrupted, due to the lack of ACID guarantees in our code handling ES storage, and then the data can't be read back because it is corrupt (just thinking out loud, can't say for sure).

Is https://phabricator.wikimedia.org/T212881#5195101 the error that still happens or is it the read-only one too?

I didn't get that error in production when trying to make hywwiki. Maybe it's a different error.

A couple of related and more overarching proposals are T158730: Automate WMF wiki creation and T228745: Allow creating an independent "incubator wiki" instead of hosting all new wikis in one Incubator wiki with prefixes. They overlap closely, but T158730 is described from the internal, technical side, whereas T228745 is more functional and user-level, so they should be kept distinct and inform each other (I hope that makes sense).

I'm not suggesting we abandon this task and wait until those are resolved; there are problems to solve now, for actual wiki communities that are trying to work and are being held back. I am mentioning them because I recently had a conversation with @greg, @mark, and @thcipriani about this, and they were asking for examples of current issues that make wiki creation hard and unstable.

We also discussed the frequency with which new wikis are created. It's true that just a handful of new wikis are created each year, and this may make people think that these issues are not a high priority. But it's the other way around: dozens, if not hundreds, of wikis should be created each year, and the lack of a stable, automatic wiki-creation process is one of the reasons they are created so rarely.

Change 526572 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/WikimediaMaintenance@master] Catch, re-check, and retry in case of DB error in creating ES tables

https://gerrit.wikimedia.org/r/526572
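
For the record, the idea of that change (sketched here from its title and the backtrace in the task description, not from the actual diff; $dbw and $blobsSqlPath are stand-ins) is to catch the read-only error around the sourceStream() call and retry after a pause instead of aborting the whole wiki creation:

// Sketch of the catch/re-check/retry idea; not the real patch.
$attempts = 0;
while ( true ) {
	try {
		$dbw->sourceStream( fopen( $blobsSqlPath, 'r' ) );
		break; // blobs table created successfully
	} catch ( Wikimedia\Rdbms\DBReadOnlyError $e ) {
		if ( ++$attempts >= 5 ) {
			throw $e; // still read-only after several tries; give up
		}
		sleep( 5 ); // wait for the replicas to come back and the lock to clear
	}
}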

Anyone up for merging this, so I can give it a try later?

While in general I agree with this (automating wiki creation), I want to mention that even our SWAT deployments aren't automatic, let alone complex things like the train. I would be more than happy to split up and break down addWiki.php and automate parts of it, but I don't think automating the whole process is going to be doable any time soon.

Sure, I know it won't be very quick. But it has to start somewhere :)

Mentioned in SAL (#wikimedia-operations) [2019-08-14T12:24:49Z] <James_F|Away> We're going to try making a new wiki. T212881

@aaron here's the state of play:

Ok, so we just created napwikisource. addWiki.php executed without any visible errors:

reedy@mwmaint1002:/srv/mediawiki$ mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=aawiki nap wikisource napwikisource nap.wikisource.org
Creating database napwikisource for nap.wikisource (Napulitano)
Initialising tables
Initialising external storage cluster24...
Initialising external storage cluster25...
Writing main page to Paggena_prencepale
Writing sidebar donate link to MediaWiki:Sitesupport-url
indexing namespaces...
	Indexing namespaces...done
content index...
	Fetching Elasticsearch version...6.5.4...ok
	Scanning available plugins...
		analysis-hebrew, analysis-icu, analysis-nori, analysis-smartcn, analysis-stconvert
		analysis-stempel, analysis-ukrainian, experimental-highlighter, extra, extra-analysis-esperanto
		extra-analysis-serbian, extra-analysis-slovak, ltr
	Picking analyzer...default
	Inferring index identifier...napwikisource_content_first
	Creating index...ok
		Validating number of shards...ok
		Validating replica range...ok
		Validating shard allocation settings...done
		Validating max shards per node...ok
	Validating analyzers...ok
	Validating mappings...
		Validating mapping...different...corrected
	Validating aliases...
		Validating napwikisource_content alias...alias is free...corrected
		Validating napwikisource alias...alias not already assigned to this index...corrected
		Updating tracking indexes...done
general index...
	Fetching Elasticsearch version...6.5.4...ok
	Scanning available plugins...
		analysis-hebrew, analysis-icu, analysis-nori, analysis-smartcn, analysis-stconvert
		analysis-stempel, analysis-ukrainian, experimental-highlighter, extra, extra-analysis-esperanto
		extra-analysis-serbian, extra-analysis-slovak, ltr
	Picking analyzer...default
	Inferring index identifier...napwikisource_general_first
	Creating index...ok
		Validating number of shards...ok
		Validating replica range...ok
		Validating shard allocation settings...done
		Validating max shards per node...ok
	Validating analyzers...ok
	Validating mappings...
		Validating mapping...different...corrected
	Validating aliases...
		Validating napwikisource_general alias...alias is free...corrected
		Validating napwikisource alias...alias not already assigned to this index...corrected
		Updating tracking indexes...done
archive index...
	Fetching Elasticsearch version...6.5.4...ok
	Scanning available plugins...
		analysis-hebrew, analysis-icu, analysis-nori, analysis-smartcn, analysis-stconvert
		analysis-stempel, analysis-ukrainian, experimental-highlighter, extra, extra-analysis-esperanto
		extra-analysis-serbian, extra-analysis-slovak, ltr
	Picking analyzer...default
	Inferring index identifier...napwikisource_archive_first
	Creating index...ok
		Validating number of shards...ok
		Validating replica range...ok
		Validating shard allocation settings...done
		Validating max shards per node...ok
	Validating analyzers...ok
	Validating mappings...
		Validating mapping...different...corrected
	Validating aliases...
		Validating napwikisource_archive alias...alias is free...corrected
		Updating tracking indexes...done
	indexing namespaces...
		Indexing namespaces...done
	content index...
		Fetching Elasticsearch version...6.5.4...ok
		Scanning available plugins...
			analysis-hebrew, analysis-icu, analysis-nori, analysis-smartcn, analysis-stconvert
			analysis-stempel, analysis-ukrainian, experimental-highlighter, extra, extra-analysis-esperanto
			extra-analysis-serbian, extra-analysis-slovak, ltr
		Picking analyzer...default
		Inferring index identifier...napwikisource_content_first
		Creating index...ok
			Validating number of shards...ok
			Validating replica range...ok
			Validating shard allocation settings...done
			Validating max shards per node...ok
		Validating analyzers...ok
		Validating mappings...
			Validating mapping...different...corrected
		Validating aliases...
			Validating napwikisource_content alias...alias is free...corrected
			Validating napwikisource alias...alias not already assigned to this index...corrected
			Updating tracking indexes...done
	general index...
		Fetching Elasticsearch version...6.5.4...ok
		Scanning available plugins...
			analysis-hebrew, analysis-icu, analysis-nori, analysis-smartcn, analysis-stconvert
			analysis-stempel, analysis-ukrainian, experimental-highlighter, extra, extra-analysis-esperanto
			extra-analysis-serbian, extra-analysis-slovak, ltr
		Picking analyzer...default
		Inferring index identifier...napwikisource_general_first
		Creating index...ok
			Validating number of shards...ok
			Validating replica range...ok
			Validating shard allocation settings...done
			Validating max shards per node...ok
		Validating analyzers...ok
		Validating mappings...
			Validating mapping...different...corrected
		Validating aliases...
			Validating napwikisource_general alias...alias is free...corrected
			Validating napwikisource alias...alias not already assigned to this index...corrected
			Updating tracking indexes...done
	archive index...
		Fetching Elasticsearch version...6.5.4...ok
		Scanning available plugins...
			analysis-hebrew, analysis-icu, analysis-nori, analysis-smartcn, analysis-stconvert
			analysis-stempel, analysis-ukrainian, experimental-highlighter, extra, extra-analysis-esperanto
			extra-analysis-serbian, extra-analysis-slovak, ltr
		Picking analyzer...default
		Inferring index identifier...napwikisource_archive_first
		Creating index...ok
			Validating number of shards...ok
			Validating replica range...ok
			Validating shard allocation settings...done
			Validating max shards per node...ok
		Validating analyzers...ok
		Validating mappings...
			Validating mapping...different...corrected
		Validating aliases...
			Validating napwikisource_archive alias...alias is free...corrected
			Updating tracking indexes...done
		indexing namespaces...
			Indexing namespaces...done
		content index...
			Fetching Elasticsearch version...6.5.4...ok
			Scanning available plugins...
				analysis-hebrew, analysis-icu, analysis-nori, analysis-smartcn, analysis-stconvert
				analysis-stempel, analysis-ukrainian, experimental-highlighter, extra, extra-analysis-esperanto
				extra-analysis-serbian, extra-analysis-slovak, ltr
			Picking analyzer...default
			Inferring index identifier...napwikisource_content_first
			Creating index...ok
				Validating number of shards...ok
				Validating replica range...ok
				Validating shard allocation settings...done
				Validating max shards per node...ok
			Validating analyzers...ok
			Validating mappings...
				Validating mapping...different...corrected
			Validating aliases...
				Validating napwikisource_content alias...alias is free...corrected
				Validating napwikisource alias...alias not already assigned to this index...corrected
				Updating tracking indexes...done
		general index...
			Fetching Elasticsearch version...6.5.4...ok
			Scanning available plugins...
				analysis-hebrew, analysis-icu, analysis-nori, analysis-smartcn, analysis-stconvert
				analysis-stempel, analysis-ukrainian, experimental-highlighter, extra, extra-analysis-esperanto
				extra-analysis-serbian, extra-analysis-slovak, ltr
			Picking analyzer...default
			Inferring index identifier...napwikisource_general_first
			Creating index...ok
				Validating number of shards...ok
				Validating replica range...ok
				Validating shard allocation settings...done
				Validating max shards per node...ok
			Validating analyzers...ok
			Validating mappings...
				Validating mapping...different...corrected
			Validating aliases...
				Validating napwikisource_general alias...alias is free...corrected
				Validating napwikisource alias...alias not already assigned to this index...corrected
				Updating tracking indexes...done
done.
Making sure mwstore://local-multiwrite/local-public exists...making 'mwstore://local-multiwrite/local-public' public...done.
Making sure mwstore://local-multiwrite/local-thumb exists...making 'mwstore://local-multiwrite/local-thumb' public...done.
Making sure mwstore://local-multiwrite/local-transcoded exists...making 'mwstore://local-multiwrite/local-transcoded' public...done.
Making sure mwstore://local-multiwrite/local-temp exists...making 'mwstore://local-multiwrite/local-temp' private...done.
Making sure mwstore://local-multiwrite/local-deleted exists...making 'mwstore://local-multiwrite/local-deleted' private...done.
Making sure mwstore://local-multiwrite/timeline-render exists...making 'mwstore://local-multiwrite/timeline-render' public...done.
Done. sync the config as in https://wikitech.wikimedia.org/wiki/Add_a_wiki#MediaWiki_configuration

However, visiting the wiki...

[XVQCfgpAAC4AAAdwgk8AAAAB] 2019-08-14 12:45:50: Errore irreversibbele 'e tipo "MediaWiki\Revision\RevisionAccessException"

2019-08-14 12:45:50 [XVQCfgpAAC4AAAdwgk8AAAAB] mwdebug1002 napwikisource 1.34.0-wmf.17 exception ERROR: [XVQCfgpAAC4AAAdwgk8AAAAB] /wiki/Paggena_prencepale   MediaWiki\Revision\RevisionAccessException from line 1448 of /srv/mediawiki/php-1.34.0-wmf.17/includes/Revision/RevisionStore.php: Failed to load data blob from tt:1: Failed to load blob from address tt:1 {"exception_id":"XVQCfgpAAC4AAAdwgk8AAAAB","exception_url":"/wiki/Paggena_prencepale","caught_by":"mwe_handler"} 
[Exception MediaWiki\Revision\RevisionAccessException] (/srv/mediawiki/php-1.34.0-wmf.17/includes/Revision/RevisionStore.php:1448) Failed to load data blob from tt:1: Failed to load blob from address tt:1
  #0 /srv/mediawiki/php-1.34.0-wmf.17/includes/Revision/RevisionStore.php(1653): MediaWiki\Revision\RevisionStore->loadSlotContent(MediaWiki\Revision\SlotRecord, NULL, NULL, NULL, integer)
  #1 [internal function]: Closure$MediaWiki\Revision\RevisionStore::constructSlotRecords(MediaWiki\Revision\SlotRecord)
  #2 /srv/mediawiki/php-1.34.0-wmf.17/includes/Revision/SlotRecord.php(307): call_user_func(Closure$MediaWiki\Revision\RevisionStore::constructSlotRecords;4585, MediaWiki\Revision\SlotRecord)
  #3 /srv/mediawiki/php-1.34.0-wmf.17/includes/Revision/RevisionRecord.php(175): MediaWiki\Revision\SlotRecord->getContent()
  #4 /srv/mediawiki/php-1.34.0-wmf.17/includes/Revision/RenderedRevision.php(226): MediaWiki\Revision\RevisionRecord->getContent(string, integer, NULL)
  #5 /srv/mediawiki/php-1.34.0-wmf.17/includes/Revision/RevisionRenderer.php(222): MediaWiki\Revision\RenderedRevision->getSlotParserOutput(string)
  #6 /srv/mediawiki/php-1.34.0-wmf.17/includes/Revision/RevisionRenderer.php(151): MediaWiki\Revision\RevisionRenderer->combineSlotOutput(MediaWiki\Revision\RenderedRevision, array)
  #7 [internal function]: Closure$MediaWiki\Revision\RevisionRenderer::getRenderedRevision#3(MediaWiki\Revision\RenderedRevision, array)
  #8 /srv/mediawiki/php-1.34.0-wmf.17/includes/Revision/RenderedRevision.php(197): call_user_func(Closure$MediaWiki\Revision\RevisionRenderer::getRenderedRevision#3;5923, MediaWiki\Revision\RenderedRevision, array)
  #9 /srv/mediawiki/php-1.34.0-wmf.17/includes/poolcounter/PoolWorkArticleView.php(196): MediaWiki\Revision\RenderedRevision->getRevisionParserOutput()
  #10 /srv/mediawiki/php-1.34.0-wmf.17/includes/poolcounter/PoolCounterWork.php(125): PoolWorkArticleView->doWork()
  #11 /srv/mediawiki/php-1.34.0-wmf.17/includes/page/Article.php(776): PoolCounterWork->execute()
  #12 /srv/mediawiki/php-1.34.0-wmf.17/includes/actions/ViewAction.php(63): Article->view()
  #13 /srv/mediawiki/php-1.34.0-wmf.17/includes/MediaWiki.php(507): ViewAction->show()
  #14 /srv/mediawiki/php-1.34.0-wmf.17/includes/MediaWiki.php(302): MediaWiki->performAction(Article, Title)
  #15 /srv/mediawiki/php-1.34.0-wmf.17/includes/MediaWiki.php(892): MediaWiki->performRequest()
  #16 /srv/mediawiki/php-1.34.0-wmf.17/includes/MediaWiki.php(523): MediaWiki->main()
  #17 /srv/mediawiki/php-1.34.0-wmf.17/index.php(42): MediaWiki->run()
  #18 /srv/mediawiki/w/index.php(3): include(string)
  #19 {main}
Caused by: [Exception MediaWiki\Storage\BlobAccessException] (/srv/mediawiki/php-1.34.0-wmf.17/includes/Storage/SqlBlobStore.php:292) Failed to load blob from address tt:1
  #0 /srv/mediawiki/php-1.34.0-wmf.17/includes/Revision/RevisionStore.php(1444): MediaWiki\Storage\SqlBlobStore->getBlob(string, integer)
  #1 /srv/mediawiki/php-1.34.0-wmf.17/includes/Revision/RevisionStore.php(1653): MediaWiki\Revision\RevisionStore->loadSlotContent(MediaWiki\Revision\SlotRecord, NULL, NULL, NULL, integer)
  #2 [internal function]: Closure$MediaWiki\Revision\RevisionStore::constructSlotRecords(MediaWiki\Revision\SlotRecord)
  #3 /srv/mediawiki/php-1.34.0-wmf.17/includes/Revision/SlotRecord.php(307): call_user_func(Closure$MediaWiki\Revision\RevisionStore::constructSlotRecords;4585, MediaWiki\Revision\SlotRecord)
  #4 /srv/mediawiki/php-1.34.0-wmf.17/includes/Revision/RevisionRecord.php(175): MediaWiki\Revision\SlotRecord->getContent()
  #5 /srv/mediawiki/php-1.34.0-wmf.17/includes/Revision/RenderedRevision.php(226): MediaWiki\Revision\RevisionRecord->getContent(string, integer, NULL)
  #6 /srv/mediawiki/php-1.34.0-wmf.17/includes/Revision/RevisionRenderer.php(222): MediaWiki\Revision\RenderedRevision->getSlotParserOutput(string)
  #7 /srv/mediawiki/php-1.34.0-wmf.17/includes/Revision/RevisionRenderer.php(151): MediaWiki\Revision\RevisionRenderer->combineSlotOutput(MediaWiki\Revision\RenderedRevision, array)
  #8 [internal function]: Closure$MediaWiki\Revision\RevisionRenderer::getRenderedRevision#3(MediaWiki\Revision\RenderedRevision, array)
  #9 /srv/mediawiki/php-1.34.0-wmf.17/includes/Revision/RenderedRevision.php(197): call_user_func(Closure$MediaWiki\Revision\RevisionRenderer::getRenderedRevision#3;5923, MediaWiki\Revision\RenderedRevision, array)
  #10 /srv/mediawiki/php-1.34.0-wmf.17/includes/poolcounter/PoolWorkArticleView.php(196): MediaWiki\Revision\RenderedRevision->getRevisionParserOutput()
  #11 /srv/mediawiki/php-1.34.0-wmf.17/includes/poolcounter/PoolCounterWork.php(125): PoolWorkArticleView->doWork()
  #12 /srv/mediawiki/php-1.34.0-wmf.17/includes/page/Article.php(776): PoolCounterWork->execute()
  #13 /srv/mediawiki/php-1.34.0-wmf.17/includes/actions/ViewAction.php(63): Article->view()
  #14 /srv/mediawiki/php-1.34.0-wmf.17/includes/MediaWiki.php(507): ViewAction->show()
  #15 /srv/mediawiki/php-1.34.0-wmf.17/includes/MediaWiki.php(302): MediaWiki->performAction(Article, Title)
  #16 /srv/mediawiki/php-1.34.0-wmf.17/includes/MediaWiki.php(892): MediaWiki->performRequest()
  #17 /srv/mediawiki/php-1.34.0-wmf.17/includes/MediaWiki.php(523): MediaWiki->main()
  #18 /srv/mediawiki/php-1.34.0-wmf.17/index.php(42): MediaWiki->run()
  #19 /srv/mediawiki/w/index.php(3): include(string)
  #20 {main}

Which looks like (though it's not fully confirmed) T205936: Unable to view some pages due to fatal RevisionAccessException: "Failed to load data blob from tt"

If we look at text on napwikisource

wikiadmin@10.64.48.35(napwikisource)> select * from text;
+--------+--------------------+---------------------+
| old_id | old_text           | old_flags           |
+--------+--------------------+---------------------+
|      1 | DB://cluster25/256 | utf-8,gzip,external |
|      2 | DB://cluster25/257 | utf-8,gzip,external |
+--------+--------------------+---------------------+
2 rows in set (0.00 sec)

Why is it using ids of 256 and 257 for the first revision?

If we look at aawiki text table, this looks fishy...

wikiadmin@10.64.48.35(aawiki)> select * from text ORDER BY old_id DESC LIMIT 1;
+--------+---------------+-----------+--------------------+-------------+----------+---------------+---------------+----------------+---------------------+-------------------+
| old_id | old_namespace | old_title | old_text           | old_comment | old_user | old_user_text | old_timestamp | old_minor_edit | old_flags           | inverse_timestamp |
+--------+---------------+-----------+--------------------+-------------+----------+---------------+---------------+----------------+---------------------+-------------------+
|  10817 |               |           | DB://cluster25/255 |             |          |               |               |                | utf-8,gzip,external |                   |
+--------+---------------+-----------+--------------------+-------------+----------+---------------+---------------+----------------+---------------------+-------------------+
1 row in set (0.00 sec)

If we create a new page (for example "Sandbox" on napwikisource)... It works fine

wikiadmin@10.64.48.35(napwikisource)> select * from page;
+---------+----------------+--------------------+-------------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+-----------+
| page_id | page_namespace | page_title         | page_restrictions | page_is_redirect | page_is_new | page_random    | page_touched   | page_links_updated | page_latest | page_len | page_content_model | page_lang |
+---------+----------------+--------------------+-------------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+-----------+
|       1 |              0 | Paggena_prencepale |                   |                0 |           1 | 0.722331198926 | 20190814124100 | 20190814124100     |           1 |     1345 | wikitext           | NULL      |
|       2 |              8 | Sitesupport-url    |                   |                0 |           1 |  0.36662219631 | 20190814124100 | 20190814124100     |           2 |      110 | wikitext           | NULL      |
|       3 |              0 | Sandbox            |                   |                0 |           1 |  0.31009594043 | 20190814125456 | 20190814125456     |           3 |       10 | wikitext           | NULL      |
+---------+----------------+--------------------+-------------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+-----------+
3 rows in set (0.00 sec)

wikiadmin@10.64.48.35(napwikisource)> select * from revision;
+--------+----------+-------------+-------------+----------+---------------+----------------+----------------+-------------+---------+---------------+---------------------------------+-------------------+--------------------+
| rev_id | rev_page | rev_text_id | rev_comment | rev_user | rev_user_text | rev_timestamp  | rev_minor_edit | rev_deleted | rev_len | rev_parent_id | rev_sha1                        | rev_content_model | rev_content_format |
+--------+----------+-------------+-------------+----------+---------------+----------------+----------------+-------------+---------+---------------+---------------------------------+-------------------+--------------------+
|      1 |        1 |           1 |             |        0 |               | 20190814124100 |              0 |           0 |    1345 |             0 | kzxgim5xw3lahegt3rq4f4wg0q428wo | NULL              | NULL               |
|      2 |        2 |           2 |             |        0 |               | 20190814124100 |              0 |           0 |     110 |             0 | e83nrakkhufq82zfs2bacwjs3m45wa3 | NULL              | NULL               |
|      3 |        3 |           3 |             |        0 |               | 20190814125456 |              0 |           0 |      10 |             0 | jrhglkly97hqxkhim2h9zmyde0vhtpo | NULL              | NULL               |
+--------+----------+-------------+-------------+----------+---------------+----------------+----------------+-------------+---------+---------------+---------------------------------+-------------------+--------------------+
3 rows in set (0.00 sec)

wikiadmin@10.64.48.35(napwikisource)> select * from text;
+--------+--------------------+---------------------+
| old_id | old_text           | old_flags           |
+--------+--------------------+---------------------+
|      1 | DB://cluster25/256 | utf-8,gzip,external |
|      2 | DB://cluster25/257 | utf-8,gzip,external |
|      3 | DB://cluster25/1   | utf-8,gzip,external |
+--------+--------------------+---------------------+
3 rows in set (0.00 sec)

wikiadmin@10.64.48.35(napwikisource)>

The IDs look weird. It is using the wrong first two IDs... possibly something to do with aawiki? Why does it choose 256 and 257 for napwikisource's first text rows?

wikiadmin@10.64.48.35(aawiki)> select * from text ORDER BY old_id DESC LIMIT 10;
+--------+---------------+-----------+--------------------+-------------+----------+---------------+---------------+----------------+---------------------+-------------------+
| old_id | old_namespace | old_title | old_text           | old_comment | old_user | old_user_text | old_timestamp | old_minor_edit | old_flags           | inverse_timestamp |
+--------+---------------+-----------+--------------------+-------------+----------+---------------+---------------+----------------+---------------------+-------------------+
|  10817 |               |           | DB://cluster25/255 |             |          |               |               |                | utf-8,gzip,external |                   |
|  10816 |               |           | DB://cluster24/268 |             |          |               |               |                | utf-8,gzip,external |                   |
|  10815 |               |           | DB://cluster24/267 |             |          |               |               |                | utf-8,gzip,external |                   |
|  10814 |               |           | DB://cluster25/254 |             |          |               |               |                | utf-8,gzip,external |                   |
|  10813 |               |           | DB://cluster25/253 |             |          |               |               |                | utf-8,gzip,external |                   |
|  10812 |               |           | DB://cluster25/252 |             |          |               |               |                | utf-8,gzip,external |                   |
|  10811 |               |           | DB://cluster25/251 |             |          |               |               |                | utf-8,gzip,external |                   |
|  10810 |               |           | DB://cluster24/266 |             |          |               |               |                | utf-8,gzip,external |                   |
|  10809 |               |           | DB://cluster24/265 |             |          |               |               |                | utf-8,gzip,external |                   |
|  10808 |               |           | DB://cluster24/264 |             |          |               |               |                | utf-8,gzip,external |                   |
+--------+---------------+-----------+--------------------+-------------+----------+---------------+---------------+----------------+---------------------+-------------------+
10 rows in set (0.00 sec)

Need to look at ES...

This ticket was identified as high priority at Wikimania - @kchapman is this something your team could take on?

kchapman raised the priority of this task from Medium to High.

@Tnegrin I'm updating the priority to make it higher, and it will be looked at through the Core Platform Team clinic duty.

...

  • This is not an issue in MediaWiki core; it's an issue in wiring our MediaWiki code to our infrastructure. The problem is that it only happens in production, and only while you are doing the operation of making a new wiki; there's no other way to reproduce it (except on the beta cluster, but that is also a lot of hassle). So it's really, really hard to find out what's wrong: you can't "try" it, change something, and try again to see if it works. This makes fixing the issue quite complicated and time-consuming.

This is the problem I'm running into when first looking into this. How do I even test it to try to see what's going on?

However,

If we look at text on napwikisource

wikiadmin@10.64.48.35(napwikisource)> select * from text;
+--------+--------------------+---------------------+
| old_id | old_text           | old_flags           |
+--------+--------------------+---------------------+
|      1 | DB://cluster25/256 | utf-8,gzip,external |
|      2 | DB://cluster25/257 | utf-8,gzip,external |
+--------+--------------------+---------------------+
2 rows in set (0.00 sec)

Why is it using ids of 256 and 257 for the first revision?

If we look at aawiki text table, this looks fishy...

[...]

Need to look at ES...

I did that. It turns out the two rows were inserted into ES for aawiki, rather than ES for napwikisource. That explains the weird IDs, and why napwikisource couldn't find them later.

anomie@mwmaint1002:~$ sql --wiki=aawiki --cluster cluster25 -- -s
wikiadmin@10.64.16.187(aawiki)> select blob_id, to_base64(blob_text) from blobs_cluster25 where blob_id in (256,257);
blob_id	to_base64(blob_text)
256	dVTBbtswDL3nK4hesg2ZezfqHlZgWA9dB3RAD4YPtM0kWmzRk+QaAfzxI+XYSYPuZlKP5OPTk7Ps\n99548H1Zc4vGggSOPLk3qmHLDsKeoHKEwbAF3gJCng/mYFqqDabPvYPO8R+qgh9fJe25dxUVBUir\n9Xot2JRs+hO7vjEBLY/nz6IQADRodz3uKMtWqy/wqyH0pJU1g+UAPqALILOCsTvFB6VraQBvAiUw\nsZdP2KMXcoF8mBmBMFb6eW5s1ZcY2KWRZOQOj3NS2H6STRd0SQF1xTdy0vmYjt8kAa/njBRc4rmp\nh2X1dHxuariU4jOgrcEEGEzTQElg2o5dEH0Dw54cJbr54xaO3MPAvZQ35kDTadNBcGh9gypAHGhs\nILfFKiKiHLOGG+gm/U7izZUUx2yg7APsWMvyfDlT6un4LkwshaLYiJy+o8pgA5qd7PA/MnoVKDvi\nEUhkOrIV36CFXuiYeBUxPXXq/Vyf5+2Qjh5bAs/bMKATxaIg32WasTKznbyHJQv9PQ/KXw0RZVVO\nLI0c7MiSE6qq2QY8xd7pD4nSBxaWVhyqkVJZTLD28CSX+1UT061GQh9XPWmFIhN2u8gyy+DFeFFg\neQQgPr7znSxeNeh9dtM18qoaYw/+5n6V394Ow5CoBp0200aRTIwKGC8QujS64ww5he8xpmQ++KVL\njK4Q8lTOAA2uzv/2LO9oBsToCjH5eIGcbP0ec3oYC2h+KFcoPopLz6AYnjDRaeST5d9yHjidXPWq\nMZwhGpzOK25bttd9HqZssbq71cu5X61exCCXNlCzjJdeWH5+zK42dvkBSk4sPd938g8=
257	JYxBDoAgDAR/4xHuJsSnmAoNNkohtI3fN8B1dnZu1Sa796kyKLqPHiqYCFzt2R+m5ZRqPWJYwjbI\nEKwEoYQX9IkilAaUOTC0GVmvUdlM8AXOY/oB

Un-base64ing and gzinflating those results produces the expected contents of the two napwikisource revisions.
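
(For anyone repeating that check, something along these lines is enough; the "gzip" entry in old_flags corresponds to PHP's gzdeflate/gzinflate.)

// Decode one blob value fetched with to_base64(blob_text) as above.
$b64 = '...paste the base64 output here...';
echo gzinflate( base64_decode( $b64 ) ), "\n";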

I suppose that does give me a starting point for some static analysis and simulation. I doubt I'll be able to fully test a fix, though.

Sending this back to the Performance-Team Inbox so @aaron can resolve it, as the issue appears to be in the load balancer code.

Based on T212881#5413761, there seem to be bad DB records inserted on the ES hosts; once the root problem is fixed, let me know so we can do any necessary cleanup (you don't need a sysadmin for that, but I am offering my help to do it safely, as with any database delete).

Poking around using shell.php and tracing through the addWiki.php code.

anomie@mwmaint1002:~$ mwscript shell.php --wiki=aawiki
Psy Shell v0.9.9 (PHP 7.2.8-1+0~20180725124257.2+stretch~1.gbp571e56 — cli) by Justin Hileman
>>> $wgDBname;
=> "aawiki"
>>> $lbFactory = MediaWiki\MediaWikiServices::getInstance()->getDBLoadBalancerFactory();
=> Wikimedia\Rdbms\LBFactoryMulti {#209}
>>> $localLb = $lbFactory->getMainLB();
=> Wikimedia\Rdbms\LoadBalancer {#268}
>>> $localLb->getLocalDomainID();
=> "aawiki"
>>> $lbFactory->redefineLocalDomain( 'napwikisource' );
=> null
>>> $localLb->getLocalDomainID();
=> "napwikisource"

Checks out so far. Let's dig into what happens when it tries to create the page. WikiPage calls PageUpdater which eventually calls RevisionStore which eventually calls SqlBlobStore. The SqlBlobStore comes from MediaWikiServices, so

>>> $blobStore = Wikimedia\TestingAccessWrapper::newFromObject( MediaWiki\MediaWikiServices::getInstance()->getBlobStoreFactory()->newSqlBlobStore( false ) );
=> Wikimedia\TestingAccessWrapper {#2699
     +object: MediaWiki\Storage\SqlBlobStore {#2698},
   }
>>> $extStoreAccess = Wikimedia\TestingAccessWrapper::newFromObject( $blobStore->extStoreAccess );
=> Wikimedia\TestingAccessWrapper {#2727
     +object: ExternalStoreAccess {#2716},
   }
>>> $blobStore->dbDomain
=> false
>>> $extStoreAccess->storeFactory->getWriteBaseUrls()
=> [
     "DB://cluster24",
     "DB://cluster25",
   ]
>>> $store = Wikimedia\TestingAccessWrapper::newFromObject( $extStoreAccess->storeFactory->getStoreForUrl( 'DB://cluster25', [ 'domain' => false ] ) );
=> Wikimedia\TestingAccessWrapper {#2692
     +object: ExternalStoreDB {#2739},
   }
>>> $extStoreAccess->storeFactory->getStoreLocationFromUrl( 'DB://cluster25' )
=> "cluster25"
>>> $store->getMaster( 'cluster25' )->getDomainID();
=> "aawiki"

There's the problem: something used by ExternalStoreDB isn't affected by the redefineLocalDomain() call. Now where does $store->getMaster() wind up getting its info from?

>>> $lb = $store->getLoadBalancer( 'cluster25' )
=> Wikimedia\Rdbms\LoadBalancer {#483}
>>> $lb->getLocalDomainID()
=> "napwikisource"
>>> $store->getDomainId( $lb->getServerInfo( $lb->getWriterIndex() ) )
=> "aawiki"
>>> $lb->getServerInfo( $lb->getWriterIndex() )
=> [
     "blobs table" => "blobs_cluster25",
     "flags" => 0,
     "dbname" => "aawiki",
     "user" => "wikiadmin",
     "password" => [REDACTED],
     "type" => "mysql",
     "max lag" => 6,
     "variables" => [
       "innodb_lock_wait_timeout" => 15,
     ],
     "master" => true,
     "host" => "10.64.32.65",
     "hostName" => "10.64.32.65",
     "load" => 0,
   ]

Ah ha. The ->redefineLocalDomain() call doesn't change the actual server configuration in the LoadBalancer and LBFactory; it only overrides the "domain" passed to the Database instance. But ExternalStoreDB checks the original configuration to fetch the domain when it has false as its configured domain. A comment in ExternalStoreDB references T200471: LBFactorySimple breaks ExternalStorage, trying to connect to external server with local database name as the reason for that.

Since LoadBalancer stuff is mostly handled by @aaron, and in particular he made the change that caused T200471 as well as the change that fixed it, I'd ask him to look at this. A minimal reproduction seems to be

if ( $wgDBname !== 'aawiki' ) {
    echo "Run with --wiki=aawiki\n";
    exit( 1 );
}

$services = MediaWiki\MediaWikiServices::getInstance();
$services->getDBLoadBalancerFactory()->redefineLocalDomain( 'napwikisource' );
echo $services->getExternalStoreFactory()->getStoreForUrl( 'DB://cluster25', [ 'domain' => false ] )->getMaster( 'cluster25' )->getDomainID() . "\n";
// Should be napwikisource, not aawiki

I hate to butt in here (OK, not really), but could the problems being discussed here possibly explain (and maybe ultimately lead to a fix of) the weirdness that is being seen in the content-page count of napwikisource, as discussed at T231770?

(Following the September 1st and 15th regularly-scheduled recounts, the wiki has been reporting a content-page count over twice as high as the highest value the correct number could possibly be, given the way the wiki is supposed to be configured.)

Or is this (probably) just a totally separate issue?

In short, no.

The problem here is a problem with text storage, in what we call "external storage".

All page counting is done based on the page table.
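
(To illustrate the separation: a content-page recount only looks at page-level tables, roughly as in the simplified sketch below, which ignores the $wgArticleCountMethod details; nothing in it ever touches the text table or external storage.)

// Simplified shape of a content-page count; the real recount also honours
// $wgArticleCountMethod, but either way it never reads text/ES blobs.
$dbr = wfGetDB( DB_REPLICA );
$contentPages = $dbr->selectField(
	'page',
	'COUNT(*)',
	[ 'page_namespace' => NS_MAIN, 'page_is_redirect' => 0 ],
	__METHOD__
);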

Change 532552 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/extensions/WikimediaMaintenance@master] Use new ExternalStoreDB::initializeTable() method

https://gerrit.wikimedia.org/r/532552
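
In other words (a sketch of the intent based on the change title and the APIs shown earlier in this task, not the literal patch; the initializeTable() parameter is an assumption): instead of addWiki.php sourcing the schema over a connection whose domain may still point at the source wiki, each ExternalStoreDB instance sets up its own blobs table on the right master.

// Sketch only: addWiki.php delegating ES table creation to ExternalStoreDB.
$esFactory = MediaWiki\MediaWikiServices::getInstance()->getExternalStoreFactory();
foreach ( (array)$wgDefaultExternalStore as $url ) {            // e.g. "DB://cluster24"
	$store = $esFactory->getStoreForUrl( $url );
	if ( $store instanceof ExternalStoreDB ) {
		$cluster = $esFactory->getStoreLocationFromUrl( $url ); // "cluster24"
		$store->initializeTable( $cluster );                    // creates blobs_<cluster>
	}
}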

Change 532552 merged by jenkins-bot:
[mediawiki/extensions/WikimediaMaintenance@master] Use new ExternalStoreDB::initializeTable() method

https://gerrit.wikimedia.org/r/532552

Change 537565 had a related patch set uploaded (by Ladsgroup; owner: Aaron Schulz):
[mediawiki/extensions/WikimediaMaintenance@wmf/1.34.0-wmf.23] Use new ExternalStoreDB::initializeTable() method

https://gerrit.wikimedia.org/r/537565

Change 537566 had a related patch set uploaded (by Ladsgroup; owner: Aaron Schulz):
[mediawiki/extensions/WikimediaMaintenance@wmf/1.34.0-wmf.22] Use new ExternalStoreDB::initializeTable() method

https://gerrit.wikimedia.org/r/537566

So it looks like this is potentially fixed? @kchapman -- not sure about the workflow here...does your team want to check to see if Aaron's patch works? Thanks!

Hey, I reviewed it and, since it seems to fix the issue, I merged it. In less than an hour, I'm going to create Hindi Wikisource and check whether the merged patch fixes the issue or more work is needed. If everything goes fine, I will close this ticket and call it done.

Thanks @Ladsgroup ! Our emerging language communities will be thrilled :)

Change 537566 merged by jenkins-bot:
[mediawiki/extensions/WikimediaMaintenance@wmf/1.34.0-wmf.22] Use new ExternalStoreDB::initializeTable() method

https://gerrit.wikimedia.org/r/537566

Change 537565 merged by jenkins-bot:
[mediawiki/extensions/WikimediaMaintenance@wmf/1.34.0-wmf.23] Use new ExternalStoreDB::initializeTable() method

https://gerrit.wikimedia.org/r/537565

The domain appears to be created, but I see this on the main page:

[XYJq7wpAMFEAABkf6JgAAAAO] 2019-09-18 17:35:43: प्रकार की गंभीर अपवाद "MediaWiki\Revision\RevisionAccessException"

Yes, the fix didn't work as expected. @Ladsgroup fixed the wiki manually, so it should work now.

Yup. We fixed it manually, but that means nothing has changed since Wikimania. This is the logstash record of the error: https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-2019.09.18/mediawiki?id=AW1FbDfpZKA7RpirHDzm&_g=h@44136fa

This is the content of the text table before I manually edited it:

+--------+--------------------+---------------------+
| old_id | old_text           | old_flags           |
+--------+--------------------+---------------------+
|      1 | DB://cluster24/269 | utf-8,gzip,external |
|      2 | DB://cluster25/258 | utf-8,gzip,external |
|      3 | DB://cluster25/1   | utf-8,gzip,external |
|      4 | DB://cluster24/1   | utf-8,gzip,external |
+--------+--------------------+---------------------+

But everything for hiwikisource should be fine now.

It's good that the problem is getting a lot of attention, even if things aren't quite fixed yet. Thank you. There's one more project creation currently in Phabricator that you can try to work with: T230359. Additionally, there are four more currently under language verification. I'm under a fair amount of pressure from the emerging language community to start moving others along and I intend to do that. (It will still be at least a couple of weeks before anything more than the five described above could be in the queue.)

As I watched @Ladsgroup creating the new wiki, it seems to me that it is now (relatively) easy to create a new wiki, but still not ideal. Feel free to create new tickets; it (among other things) helps us get an impression of urgency. Thanks!

The process to create a wiki hasn't really changed on the whole in years. It's always been relatively "easy"

The difficult part is when things go wrong. And more often than not, *something* does go wrong

Well, I assumed it would fail similarly, but maybe that's a wrong assumption ;).

Very much so. Between Krenair and me... we've seen a lot of different breakages, in all parts of the code.

I think getting this script working without manual intervention would be a good first step. @aaron -- I think this is back to you...

For sure. There's definitely no point running it again if we know it's broken and there has been no attempt to fix it in the meantime :)

Change 538344 had a related patch set uploaded (by TK-999; owner: TK-999):
[mediawiki/core@master] rdbms: Update server array entries in redefineLocalDomain()

https://gerrit.wikimedia.org/r/538344

Change 538345 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/WikimediaMaintenance@master] addWiki: Redefine database again after creating ES storage

https://gerrit.wikimedia.org/r/538345

So @TK-999 and I spent this evening debugging this, and we believe we found the issue. This is basically a code snippet similar to what addWiki.php does. I ran it on the beta cluster:

$services = MediaWiki\MediaWikiServices::getInstance();
$services->getDBLoadBalancerFactory()->getExternalLB( 'cluster1' );
$services->getDBLoadBalancerFactory()->redefineLocalDomain( 'dewiki' );
echo $services->getExternalStoreFactory()->getStoreForUrl( 'DB://cluster1', [ 'domain' => false ] )->getMaster( 'cluster1' )->getDomainID() . "\n";

$title = Title::newFromText( 'FooO' );
$article = WikiPage::factory( $title );
$article->doEditContent(
	ContentHandler::makeContent( 'BAAAAr', $title ),
	'',
	EDIT_NEW | EDIT_AUTOSUMMARY
);

This basically redefines the local domain and tries to create an article in the wiki. The result is the same as the production issue: https://de.wikipedia.beta.wmflabs.org/wiki/FooO

image.png (678×1 px, 168 KB)

(It has more info because the error stack trace is enabled on the beta cluster.)

But if we make redefineLocalDomain also redefine the db names in $this->servers (and also make sure that the LB for ES is already instantiated, because otherwise it will just use the 'serverTemplate' from db-eqiad.php, which uses $wgDBname), then it works without any issues:
https://de.wikipedia.beta.wmflabs.org/wiki/FooOo

image.png (159×478 px, 7 KB)
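
(For the record, the core patch above, "Update server array entries in redefineLocalDomain()", amounts very roughly to the following inside LoadBalancer; this is sketched from the change title and the comment above, not copied from the diff.)

// Sketch of the (later abandoned) core approach: besides repointing already
// open connections, also rewrite each configured server's 'dbname', so that
// connections opened later (e.g. by ExternalStoreDB) target the new wiki
// instead of the $wgDBname the process started with. $domain stands for the
// new DatabaseDomain passed to redefineLocalDomain().
foreach ( $this->servers as $i => $server ) {
	$this->servers[$i]['dbname'] = $domain->getDatabase();
}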

Awesome -- do you feel confident enough to try with N'ko from T230359?

The core patch in particular needs code review by the people who understand it. Timo has expressed some concerns.

To recap: the problem is known now, there are several ways to fix it (some dirtier than others), and we are discussing which technical solution to use. Once a solution is decided, I'll go forward with T230359: Create N'Ko Wikipedia and see whether it works or not.

Change 538455 had a related patch set uploaded (by TK-999; owner: TK-999):
[mediawiki/extensions/WikimediaMaintenance@master] addWiki.php: Redefine RevisionStore service for the wiki being created

https://gerrit.wikimedia.org/r/538455
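
(Roughly, and as a sketch of the intent rather than the actual change: the service name comes from the change title, while the exact redefine call used here is an assumption. The script rebuilds the RevisionStore service after switching the local domain, so the SqlBlobStore it carries no longer addresses blobs against the source wiki.)

// Sketch only: rebuild RevisionStore for the wiki being created.
$services = MediaWiki\MediaWikiServices::getInstance();
$services->getDBLoadBalancerFactory()->redefineLocalDomain( $dbName );
$services->redefineService( 'RevisionStore', function ( $services ) use ( $dbName ) {
	// Assumed shape: a RevisionStore bound explicitly to the new wiki's domain.
	return $services->getRevisionStoreFactory()->getRevisionStore( $dbName );
} );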

Change 538344 abandoned by TK-999:
rdbms: Update server array entries in redefineLocalDomain()

Reason:
Abandoning in favor of I46a3f7cca33a542e0d959aa64b5e21033d8db7ca which provides a cleaner solution.

https://gerrit.wikimedia.org/r/538344

Change 538345 abandoned by Ladsgroup:
addWiki: Redefine database again after creating ES storage

Reason:
In favor of I46a3f7cca33

https://gerrit.wikimedia.org/r/538345

This revision definitely looks cleaner! Are we ready to try with N'Ko in production?

Will try it later today, in a couple of hours.

Change 538455 merged by jenkins-bot:
[mediawiki/extensions/WikimediaMaintenance@master] addWiki.php: Redefine RevisionStore service for the wiki being created

https://gerrit.wikimedia.org/r/538455

Change 539147 had a related patch set uploaded (by Ladsgroup; owner: TK-999):
[mediawiki/extensions/WikimediaMaintenance@wmf/1.34.0-wmf.24] addWiki.php: Redefine RevisionStore service for the wiki being created

https://gerrit.wikimedia.org/r/539147

Change 539148 had a related patch set uploaded (by Ladsgroup; owner: TK-999):
[mediawiki/extensions/WikimediaMaintenance@wmf/1.34.0-wmf.23] addWiki.php: Redefine RevisionStore service for the wiki being created

https://gerrit.wikimedia.org/r/539148

Change 539147 merged by jenkins-bot:
[mediawiki/extensions/WikimediaMaintenance@wmf/1.34.0-wmf.24] addWiki.php: Redefine RevisionStore service for the wiki being created

https://gerrit.wikimedia.org/r/539147

Mentioned in SAL (#wikimedia-operations) [2019-09-25T17:08:56Z] <ladsgroup@deploy1001> Synchronized php-1.34.0-wmf.24/extensions/WikimediaMaintenance/addWiki.php: Redefine RevisionStore service for the wiki being created (T212881) (duration: 01m 04s)

Change 539148 merged by jenkins-bot:
[mediawiki/extensions/WikimediaMaintenance@wmf/1.34.0-wmf.23] addWiki.php: Redefine RevisionStore service for the wiki being created

https://gerrit.wikimedia.org/r/539148

Mentioned in SAL (#wikimedia-operations) [2019-09-25T17:11:17Z] <ladsgroup@deploy1001> Synchronized php-1.34.0-wmf.23/extensions/WikimediaMaintenance/addWiki.php: Redefine RevisionStore service for the wiki being created (T212881) (duration: 01m 05s)

Ladsgroup reassigned this task from aaron to TK-999.

Creating nqowiki (T230359: Create N'Ko Wikipedia) was done without any issues related to this ticket. (There were some issues because we didn't put the wiki into the list of RTL wikis, which is not documented, but cleaning that up was pretty easy; compared to any other wiki creation I have ever done, this was uneventful.) Thanks Mate for fixing this, and thanks to everyone who helped (Anomie, Aaron, Urbanecm, Daniel Zahn, and so many others).

And the wikis rejoiced! Thanks everybody!

Change 526572 abandoned by Urbanecm:
[mediawiki/extensions/WikimediaMaintenance@master] Catch, re-check, and retry in case of DB error in creating ES tables

Reason:
the linked task is closed as resolved

https://gerrit.wikimedia.org/r/526572

Change 525999 abandoned by Ladsgroup:

[mediawiki/extensions/WikimediaMaintenance@master] Make AddWiki::execute() smaller

Reason:

We are migrating addWiki.php to core. This is not needed

https://gerrit.wikimedia.org/r/525999