⚓ T224949 Failed to access name from slot

Subject	Repo	Branch	Lines +/-
NameTableStore: ensure consistency upon rollback.	mediawiki/core	REL1_33	+313 -23
NameTableStore: ensure consistency upon rollback.	mediawiki/core	REL1_32	+313 -23
NameTableStore: ensure consistency upon rollback.	mediawiki/core	REL1_34	+313 -23
NameTableStore: ensure consistency upon rollback.	mediawiki/core	master	+313 -23
Protected populateContentTables.php against leaked transactions.	mediawiki/core	master	+24 -3

In T224949#5233839, @Legoktm wrote:

How wrong would it be to explicitly insert the "main" row into slot_roles during update.php rather than relying on NameTableStore?

As far as I can see, whatever makes the insert inside NameTableStore fail will also make the direct insert fail.

I enabled database query logging during a 1.31 -> 1.32 web upgrade. It shows the database apparently being in autocommit mode when the row is inserted. However, I wasn't able to reproduce the issue.

A taxonomy of failure modes that occur to me:

1. The installer does not run the update
- 1.1 The update process is aborted before reaching populateContentTables
- 1.2 $wgMultiContentRevisionSchemaMigrationStage & SCHEMA_COMPAT_WRITE_NEW is false
- 1.3 The updatelog row exists
2. The installer runs the update but does not insert the row
- 2.1 $wgMultiContentRevisionSchemaMigrationStage & SCHEMA_COMPAT_WRITE_NEW is zero
- 2.2 The row is found in the cache
  - 2.2.1 While navigating the wiki with 1.31 tables but 1.32 code, the row was inserted, loaded into the cache in the same transaction, and then rolled back. A subsequent update sees the populated cache.
- 2.3 The row is inserted but later rolled back
  - 2.3.1 The DB is not in autocommit mode due to a transaction leak in a prior update action. An exception aborts the update.
3. The row is inserted but later deleted or ignored

None of these stand out as really likely. The problem with 2.2.1 is that you would have to find a way to call acquireId() on a wiki with 1.31 tables. In my testing, saving revisions on such a broken wiki fails with an exception in RevisionStore::loadSlotRecords() long before it would call acquireId(), due to the slots table being empty.

If you're affected by this issue or can reproduce it, can you please say whether the upgrade was done by CLI or the web UI. If possible, please paste the output from the upgrade.

In T224949#5235469, @tstarling wrote:

If you're affected by this issue or can reproduce it, can you please say whether the upgrade was done by CLI or the web UI. If possible, please paste the output from the upgrade.

@freephile and @Xcompass are the two reports (via IRC) that I've seen. Can either of you provide logs, and also confirm how you upgraded? :)

Obviously the process isn't completely broken, but something is flaky/there's definitely some edge case that is causing it to break on a few wikis

In the scenario of 1.32 code with a non-upgraded 1.31 wiki, I found that page creation actually succeeds. However, it does not write slot_roles to memcached and the DB in the same request. It inserts the slot_roles row during the save transaction, but the cache set is ignored due to the 'pending' flag, which exists to avoid this exact problem. The row is successfully inserted and committed, then on the subsequent GET request, it is loaded from the DB and saved to the cache.

Change 514666 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] Protected populateContentTables.php against leaked transactions.

https://gerrit.wikimedia.org/r/514666

Change 514245 abandoned by Daniel Kinzler:
NameTableStore: ensure consistency upon rollback.

Reason:
See I570378b659 instead

https://gerrit.wikimedia.org/r/514245

Thank you for investigating, Tim!

In T224949#5235469, @tstarling wrote:

1. The installer does not run the update

1.1 The update process is aborted before reaching populateContentTables

1.2 $wgMultiContentRevisionSchemaMigrationStage & SCHEMA_COMPAT_WRITE_NEW is false

1.3 The updatelog row exists

2. The installer runs the update but does not insert the row

2.1 $wgMultiContentRevisionSchemaMigrationStage & SCHEMA_COMPAT_WRITE_NEW is zero

2.2 The row is found in the cache

2.2.1 While navigating the wiki with 1.31 tables but 1.32 code, the row was inserted, loaded into the cache in the same transaction, and then rolled back. A subsequent update sees the populated cache.

2.3 The row is inserted but later rolled back

2.3.1 The DB is not in autocommit mode due to a transaction leak in a prior update action. An exception aborts the update.

3. The row is inserted but later deleted or ignored

The patch I wrote was intended to protect against 2.3. My suspicion is that the actual cause is among the one's you listed as 1.x.

I'm curious whether the wikis that have this problem are part of a wiki farm. One thing that might cause this problem is two wikis sharing the same WAN cache key for NameTableStore. But I'd expect that kind of misconfiguration to cause trouble across the board, not just with the NameTableStore. NameTableStore might also be looking at the wrong database - but again, that seems very unlikely, and the misconfiguration necessary to trigger that would likely be catastrophic anyway. Both these options are inconsistent with the report that running populateContentTables.php manually fixed the issue.

It seems like there isn't much we can do as long as we don't have enough information to reproduce this.

• WDoranWMF edited projects, added Platform Team Workboards (Team 3); removed Platform Team Workboards (Waiting for Review).Jun 10 2019, 2:43 PM

• WDoranWMF moved this task from Backlog to Ready on the Platform Team Workboards (Team 3) board.

CCicalese_WMF moved this task from Ready to Doing on the Platform Team Workboards (Team 3) board.Jul 1 2019, 2:46 PM

Marked as stalled per my previous comment:

It seems like there isn't much we can do as long as we don't have enough information to reproduce this.

Please reopen when new information becomes available.

CCicalese_WMF moved this task from Doing to Backlog on the Platform Team Workboards (Team 3) board.Jul 1 2019, 3:07 PM

This missed the boat for the MW 1.33 release. Provisionally re-tagging to 1.34's release as well.

reducing to normal prio, since it seems to be rare.

I have the same exception after upgrade from 1.31.0 to 1.33.0. There is message in update.php output:

...slot_roles table already exists.

The slot_roles table was empty before and is empty after upgrade.
What should its contents be?

The table was added in 1.31, so the message that it exists is expected

MariaDB [wikidb]> select * from mw_slot_roles;
+---------+-----------+
| role_id | role_name |
+---------+-----------+
|       1 | main      |
+---------+-----------+
1 row in set (0.001 sec)

@Absorbcium is there anything in the log about the PopulateContentTables script being run? Try running it manually if in doubt.

In T224949#5314988, @daniel wrote:

@Absorbcium is there anything in the log about the PopulateContentTables script being run? Try running it manually if in doubt.

Result of PopulateContentTables script:

Populating revision...
... revision processed up to revision id 500 of 2794 (0 rows in 0.13800811767578 seconds)
... revision processed up to revision id 1000 of 2794 (0 rows in 0.17201018333435 seconds)
... revision processed up to revision id 1500 of 2794 (0 rows in 0.20401096343994 seconds)
... revision processed up to revision id 2000 of 2794 (0 rows in 0.2440140247345 seconds)
... revision processed up to revision id 2500 of 2794 (0 rows in 0.26101517677307 seconds)
... revision processed up to revision id 2794 of 2794 (0 rows in 0.270015001297 seconds)
Done populating revision table. Processed 0 rows in 0.270015001297 seconds
Populating archive...
... archive processed up to revision id 582 of 1614 (0 rows in 0.0090010166168213 seconds)
... archive processed up to revision id 1082 of 1614 (0 rows in 0.013000965118408 seconds)
... archive processed up to revision id 1582 of 1614 (0 rows in 0.015001058578491 seconds)
... archive processed up to revision id 1614 of 1614 (0 rows in 0.017001152038574 seconds)
Done populating archive table. Processed 0 rows in 0.017001152038574 seconds
Done. Processed 0 rows in 0.40702319145203 seconds

After adding role_id=1 with "main" role name manually, other exception appeared:

Failed to access name from content_models using id = 1

The content_models table was and is empty before and after upgrade, and after PopulateContentTables script too. ~~What should contents of content_models be?~~ I got its content from fresh 1.33.0 installation

Also it seems it's too related to cache, for example: now "Failed to access name from content_models using id = 1" error appears.
When I delete role_id=1 from slot_roles table, previous "Failed to access name from slot_roles using id = 1" error message doesn't appear but returned back again after clearing cache.

Should cache be disabled during upgrade? The update.php creates keys with both mentioned above tables:

>redis-cli keys *
1) "WANCache:t:the_wiki:messages:en"
2) "WANCache:v:the_wiki:messages:en:hash:v1"
3) "the_wiki:messages:en"
4) "WANCache:v:global:NameTableSqlStore:content_models:the_wiki"
5) "WANCache:t:the_wiki:MessageBlobStore"
6) "WANCache:v:global:NameTableSqlStore:slot_roles:the_wiki"

@Reedy, @daniel thank you for help, after manually filling slot_roles and content_models tables 1.33.0 worked after upgrade from 1.31.0.

Content of content_models table:

mysql> SELECT * FROM content_models;
+----------+------------+
| model_id | model_name |
+----------+------------+
|        1 | wikitext   |
+----------+------------+
1 row in set (0.00 sec)

Caching should not be the cause of any problems during upgrade. Unless the desired is found in the cache but isn't actually in the table, in which case it wouldn't get created in the table. But how would it get into the cache if it's not in the table? The cache is populated by loading from the database.

I still have no clue how this can happen, besides the upgrade is interrupted somehow, by an error, or ctrl-c, system failure, etc.

In T224949#5317435, @daniel wrote:

Caching should not be the cause of any problems during upgrade. Unless the desired is found in the cache but isn't actually in the table, in which case it wouldn't get created in the table. But how would it get into the cache if it's not in the table? The cache is populated by loading from the database.

I still have no clue how this can happen, besides the upgrade is interrupted somehow, by an error, or ctrl-c, system failure, etc.

I don't know this too, perhaps empty tables is result of previous upgrade but cache was also filled with data of previous version.

Before upgrade the wiki was 1.31.0, cache was filled with its data and everything worked after upgrade except phpBB user integration - no new user added to wiki "actor" table by integration code ("actor" was empty before upgrade and was filled with update.php. Also "actor" was LEFT JOIN-ed to "user" table and everything worked, old users could edit articles, etc). When I fixed the user-to-actor relation for new users, there left new latest added users with the old still wrong relation in cache. In this point I cleared cache and only here faced with this topic "Failed to access name from slot_roles using id = 1" error.
Maybe this information will be useful to someone else.

Thank you for help, @daniel, now everything looks working on my side.

Before upgrade the wiki was 1.31.0, cache was filled with its data and everything worked after upgrade

This seems strange - this sounds like the cache actually contained the correct mapping for slot_roles, which means it was loaded from the database. But then it was removed again from the database? This is very odd indeed, since as far as I know, there is no code deleting from that table, ever. The only explanation I have is that some error caused a database transaction to be rolled back, so the data what had already been written to the database and loaded into the cache then vanished from the database. This seems unlikely, but not impossible.

I worked on a patch that would address this situation , but abandoned that work because it seemed to unlikely that this would ever happen. I still don't see how it can, especially without any errors being reported.

The whole actor/use thing seems unrelated - unless it was the cause of an error that triggered a database rollback somehow.

In T224949#5317435, @daniel wrote:

Caching should not be the cause of any problems during upgrade. Unless the desired is found in the cache but isn't actually in the table, in which case it wouldn't get created in the table. But how would it get into the cache if it's not in the table? The cache is populated by loading from the database.

I still have no clue how this can happen, besides the upgrade is interrupted somehow, by an error, or ctrl-c, system failure, etc.

My best guess is something like this:

upgrade.php runs PopulateContentTables
- The initial insertions into slot_rolws and content_models somehow get rolled back, so they're in the cache but not in the DB. Probably the script run itself fails at this point, and maybe the whole run of upgrade.php too.
upgrade.php is run again, which runs PopulateContentTables again
- This time the data that's supposed to be in sites and content_models is in the cache so it doesn't get reinserted. And this time any remaining rows get processed.
Eventually the cache expires, and things start failing due to missing slot_roles and content_models.

Just re-running PopulateContentTables isn't likely to help, as it doesn't try to populate rows that already have slots (of which there apparently aren't any). And even if it did, it'd probably insert a row with a new ID rather than an ID matching the existing broken rows.

A few things you might try:

Truncate content and slots to get rid of the broken rows. Then clear caches and run PopulateContentTables again.
Manually add the slot_roles row, as you did, and also populate content_models. If you're lucky, SELECT DISTINCT content_model AS model_id, rev_content_model AS model_name FROM revision JOIN slots ON(rev_id=slot_revision_id) JOIN content ON(slot_content_id=content_id) UNION SELECT DISTINCT content_model AS model_id, ar_content_model AS model_name FROM archive JOIN slots ON(ar_rev_id=slot_revision_id) JOIN content ON(slot_content_id=content_id); will give you the needed rows; if not, some of the rows will have NULL for model_name and you'll have to do more work to figure out what they're supposed to be.

@Anomie time to revive https://gerrit.wikimedia.org/r/c/mediawiki/core/+/514245?

I'd say so.

Change 514245 restored by Daniel Kinzler:
NameTableStore: ensure consistency upon rollback.

https://gerrit.wikimedia.org/r/514245

• WDoranWMF edited projects, added Platform Team Workboards (Clinic Duty Team); removed Platform Team Workboards (Team 3).Jul 17 2019, 4:45 PM

• WDoranWMF moved this task from MCR to mop on the Platform Engineering board.Jul 26 2019, 6:38 PM

• WDoranWMF edited projects, added Core Platform Team Initiatives (MCR); removed Platform Engineering (MCR).

Hey there, should this be moved to 1.35? The cut is a couple of weeks away. If it needs to go out in 1.34, is there anything I can do to help get it out of the door?

@Jdforrester-WMF I'd prefer for this to make it into 1.34. The error is rare, but if it hits during an update.php run, it results in an unusable wiki requiring manual surgery in the database.

Bumping prio to high, let's see if we can make it. Afterall, there is already a patch up.

Just ran into this on https://asyncwiki-moon.wmflabs.org/demo/index.php/Main_Page ... looking at the patch.

https://asyncwiki-moon.wmflabs.org/demo/index.php/Main_Page stack trace:

[bfe433cada9a7c738a0e109e] /demo/index.php/Main_Page MediaWiki\Storage\NameTableAccessException from line 42 of /opt/htdocs/mediawiki/includes/Storage/NameTableAccessException.php: Failed to access name from slot_roles using id = 2

Backtrace:

#0 /opt/htdocs/mediawiki/includes/Storage/NameTableStore.php(308): MediaWiki\Storage\NameTableAccessException::newFromDetails(string, string, integer)
#1 /opt/htdocs/mediawiki/includes/Revision/RevisionStore.php(1624): MediaWiki\Storage\NameTableStore->getName(integer)
#2 /opt/htdocs/mediawiki/includes/Revision/RevisionStore.php(1671): MediaWiki\Revision\RevisionStore->loadSlotRecords(string, integer)
#3 [internal function]: MediaWiki\Revision\RevisionStore->MediaWiki\Revision\{closure}()
#4 /opt/htdocs/mediawiki/includes/Revision/RevisionSlots.php(165): call_user_func(Closure)
#5 /opt/htdocs/mediawiki/includes/Revision/RevisionSlots.php(107): MediaWiki\Revision\RevisionSlots->getSlots()
#6 /opt/htdocs/mediawiki/includes/Revision/RevisionRecord.php(192): MediaWiki\Revision\RevisionSlots->getSlot(string)
#7 /opt/htdocs/mediawiki/includes/Revision/RevisionRecord.php(175): MediaWiki\Revision\RevisionRecord->getSlot(string, integer, User)
#8 /opt/htdocs/mediawiki/includes/Revision.php(932): MediaWiki\Revision\RevisionRecord->getContent(string, integer, User)
#9 /opt/htdocs/mediawiki/includes/page/Article.php(493): Revision->getContent(integer, User)
#10 /opt/htdocs/mediawiki/includes/page/Article.php(714): Article->fetchRevisionRecord()
#11 /opt/htdocs/mediawiki/includes/actions/ViewAction.php(68): Article->view()
#12 /opt/htdocs/mediawiki/includes/MediaWiki.php(501): ViewAction->show()
#13 /opt/htdocs/mediawiki/includes/MediaWiki.php(294): MediaWiki->performAction(Article, Title)
#14 /opt/htdocs/mediawiki/includes/MediaWiki.php(860): MediaWiki->performRequest()
#15 /opt/htdocs/mediawiki/includes/MediaWiki.php(517): MediaWiki->main()
#16 /opt/htdocs/mediawiki/index.php(42): MediaWiki->run()
#17 {main}

@MarkAHershberger the exception says id = 2, not 1; are you sure this is the same issue?

No, I'm not sure. I just found this bug and hoped it was the same.

> select * from slot_roles;
stdClass Object
(
    [role_id] => 1
    [role_name] => main
)

fixed by adding a row with role_id=2 and role_name='aux'

Now...

[51fbb8ea5f47e21ee0143081] [no req]   MediaWiki\Revision\RevisionAccessException from line 1635 of /opt/htdocs/mediawiki/includes/Revision/RevisionStore.php: Main slot of revision 934 not found in database!
Backtrace:
#0 /opt/htdocs/mediawiki/includes/Revision/RevisionStore.php(1671): MediaWiki\Revision\RevisionStore->loadSlotRecords(string, integer)
#1 [internal function]: MediaWiki\Revision\RevisionStore->MediaWiki\Revision\{closure}()
#2 /opt/htdocs/mediawiki/includes/Revision/RevisionSlots.php(165): call_user_func(Closure)
#3 /opt/htdocs/mediawiki/includes/Revision/RevisionRenderer.php(169): MediaWiki\Revision\RevisionSlots->getSlots()
#4 /opt/htdocs/mediawiki/includes/Revision/RevisionRenderer.php(128): MediaWiki\Revision\RevisionRenderer->combineSlotOutput(MediaWiki\Revision\RenderedRevision, array)
#5 [internal function]: MediaWiki\Revision\RevisionRenderer->MediaWiki\Revision\{closure}(MediaWiki\Revision\RenderedRevision, array)
#6 /opt/htdocs/mediawiki/includes/Revision/RenderedRevision.php(175): call_user_func(Closure, MediaWiki\Revision\RenderedRevision, array)
#7 /opt/htdocs/mediawiki/includes/Storage/DerivedPageDataUpdater.php(1266): MediaWiki\Revision\RenderedRevision->getRevisionParserOutput()
#8 /opt/htdocs/mediawiki/includes/Storage/DerivedPageDataUpdater.php(1288): MediaWiki\Storage\DerivedPageDataUpdater->getCanonicalParserOutput()
#9 /opt/htdocs/mediawiki/includes/Storage/DerivedPageDataUpdater.php(1560): MediaWiki\Storage\DerivedPageDataUpdater->getSecondaryDataUpdates(boolean)
#10 /opt/htdocs/mediawiki/includes/page/WikiPage.php(2106): MediaWiki\Storage\DerivedPageDataUpdater->doSecondaryDataUpdates(array)
#11 /opt/htdocs/mediawiki/maintenance/refreshLinks.php(275): WikiPage->doSecondaryDataUpdates(array)
#12 /opt/htdocs/mediawiki/maintenance/refreshLinks.php(198): RefreshLinks::fixLinksFromArticle(integer, boolean)
#13 /opt/htdocs/mediawiki/maintenance/refreshLinks.php(83): RefreshLinks->doRefreshLinks(integer, boolean, string, boolean, boolean)
#14 /opt/htdocs/mediawiki/maintenance/rebuildall.php(60): RefreshLinks->execute()
#15 /opt/htdocs/mediawiki/maintenance/doMaintenance.php(94): RebuildAll->execute()
#16 /opt/htdocs/mediawiki/maintenance/rebuildall.php(67): require_once(string)
#17 {main}

In T224949#5489415, @MarkAHershberger wrote:

fixed by adding a row with role_id=2 and role_name='aux'

Role name "aux" only exists in tests - if you are hitting this in your "real" database, this would indicate that some test is leaking into the real world. That's NOT good...

If this is the cause, it seems to be different from the one described by this ticket.

Change 514666 abandoned by Daniel Kinzler:
Protected populateContentTables.php against leaked transactions.

https://gerrit.wikimedia.org/r/514666

Could this error be related?

https://lists.wikimedia.org/pipermail/wikitech-l/2019-September/092551.html

In T224949#5503324, @CCicalese_WMF wrote:

Could this error be related?

https://lists.wikimedia.org/pipermail/wikitech-l/2019-September/092551.html

Yes, sounds like it.

RhinosF1 subscribed.Sep 18 2019, 10:04 PM

Paladox subscribed.Sep 18 2019, 10:06 PM

daniel claimed this task.Oct 8 2019, 9:42 AM

daniel moved this task from Waiting for Review to Doing(WIP:5) on the Platform Team Workboards (Clinic Duty Team) board.

Change 514245 merged by jenkins-bot:
[mediawiki/core@master] NameTableStore: ensure consistency upon rollback.

https://gerrit.wikimedia.org/r/514245

ReleaseTaggerBot added a project: MW-1.35-notes (1.35.0-wmf.2; 2019-10-15).Oct 10 2019, 4:00 PM

Change 542224 had a related patch set uploaded (by Reedy; owner: Daniel Kinzler):
[mediawiki/core@REL1_34] NameTableStore: ensure consistency upon rollback.

https://gerrit.wikimedia.org/r/542224

Change 542234 had a related patch set uploaded (by Reedy; owner: Daniel Kinzler):
[mediawiki/core@REL1_33] NameTableStore: ensure consistency upon rollback.

https://gerrit.wikimedia.org/r/542234

Change 542239 had a related patch set uploaded (by Reedy; owner: Daniel Kinzler):
[mediawiki/core@REL1_32] NameTableStore: ensure consistency upon rollback.

https://gerrit.wikimedia.org/r/542239

Change 542224 merged by jenkins-bot:
[mediawiki/core@REL1_34] NameTableStore: ensure consistency upon rollback.

https://gerrit.wikimedia.org/r/542224

ReleaseTaggerBot added a project: MW-1.34-notes.Oct 11 2019, 12:00 AM

Change 542239 abandoned by Reedy:
NameTableStore: ensure consistency upon rollback.

https://gerrit.wikimedia.org/r/542239

Change 542234 abandoned by Reedy:
NameTableStore: ensure consistency upon rollback.

https://gerrit.wikimedia.org/r/542234

I got bored of just kicking the failing test can down the road after 5 supporting backports to REL1_32 and REL1_33 to help this change

In T224949#5565266, @Reedy wrote:

I got bored of just kicking the failing test can down the road after 5 supporting backports to REL1_32 and REL1_33 to help this change

So when someone runs into that problem while installing 1.32 or 1.33, we just tell them to install 1.34 instead? Seems acceptable, since this seems to be rare enough.

In T224949#5566036, @daniel wrote:

In T224949#5565266, @Reedy wrote:

I got bored of just kicking the failing test can down the road after 5 supporting backports to REL1_32 and REL1_33 to help this change

So when someone runs into that problem while installing 1.32 or 1.33, we just tell them to install 1.34 instead? Seems acceptable, since this seems to be rare enough.

I think so, yeah. It was basically diminishing returns. I spent a few hours playing and stacking with commits to deal with refactoring, code being moved about etc to try and get to a solution. At some point, it's not worth any more effort

If this problem was also in 1.31, I think we'd have a different argument as it's an LTS (Supported until June 2021), and probably the sort of bug that should be fixed in it. Whereas 1.32 goes EOL in January 2020 and 1.33 in June 2020, we generally wouldn't suggesting anyone upgrades to 1.32 now (unless as a stepping stone for whatever reason). Then of course, 1.34 is due out next month, so only for them is the current situation a little awkward.

If someone else wants to give it a go, they're more than welcome to restore my stack of patches and try and work out what else is missing

REL1_33:
https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/542234/
https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/542280/1
https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/542248/8
https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/542241/1
https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/542233/1
https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/542232/1 (I actually merged this one because it seemed to have some value)

REL1_32:
https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/542239/
https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/542249/3
https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/542242/2
https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/542238/3
https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/542237/3 (merged as above)

And then https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/542280/1 at least needs putting into the REL1_32 stack

I assume this is resolved now. Please re-open as needed.

• WDoranWMF moved this task from Doing(WIP:5) to Done on the Platform Team Workboards (Clinic Duty Team) board.Oct 16 2019, 3:29 PM

Looks like I ran into this upgrading from 1.31 to 1.34. From https://asyncwiki-earth.wmflabs.org/demo/index.php/Main_Page:

MediaWiki internal error.

Original exception: [e909bd7e200cd969395fb8f6] /demo/index.php/Main_Page MediaWiki\Storage\NameTableAccessException from line 42 of /opt/htdocs/mediawiki/includes/Storage/NameTableAccessException.php: Failed to access name from slot_roles using id = 2
Backtrace:
#0 /opt/htdocs/mediawiki/includes/Storage/NameTableStore.php(327): MediaWiki\Storage\NameTableAccessException::newFromDetails(string, string, integer)
#1 /opt/htdocs/mediawiki/includes/Revision/RevisionStore.php(1637): MediaWiki\Storage\NameTableStore->getName(integer)
#2 /opt/htdocs/mediawiki/includes/Revision/RevisionStore.php(1608): MediaWiki\Revision\RevisionStore->constructSlotRecords(string, Wikimedia\Rdbms\ResultWrapper, integer, Title)
#3 /opt/htdocs/mediawiki/includes/Revision/RevisionStore.php(1719): MediaWiki\Revision\RevisionStore->loadSlotRecords(string, integer, Title)
#4 [internal function]: MediaWiki\Revision\RevisionStore->MediaWiki\Revision\{closure}()
#5 /opt/htdocs/mediawiki/includes/Revision/RevisionSlots.php(165): call_user_func(Closure)
#6 /opt/htdocs/mediawiki/includes/Revision/RevisionSlots.php(107): MediaWiki\Revision\RevisionSlots->getSlots()
#7 /opt/htdocs/mediawiki/includes/Revision/RevisionRecord.php(192): MediaWiki\Revision\RevisionSlots->getSlot(string)
#8 /opt/htdocs/mediawiki/includes/Revision.php(493): MediaWiki\Revision\RevisionRecord->getSlot(string, integer)
#9 /opt/htdocs/mediawiki/includes/Revision.php(760): Revision->getMainSlotRaw()
#10 /opt/htdocs/mediawiki/includes/page/WikiPage.php(663): Revision->getContentModel()
#11 /opt/htdocs/mediawiki/includes/libs/objectcache/wancache/WANObjectCache.php(1424): WikiPage->{closure}(boolean, integer, array, NULL)
#12 /opt/htdocs/mediawiki/includes/libs/objectcache/wancache/WANObjectCache.php(1278): WANObjectCache->fetchOrRegenerate(string, integer, Closure, array)
#13 /opt/htdocs/mediawiki/includes/page/WikiPage.php(669): WANObjectCache->getWithSetCallback(string, integer, Closure)
#14 /opt/htdocs/mediawiki/includes/page/WikiPage.php(291): WikiPage->getContentModel()
#15 /opt/htdocs/mediawiki/includes/page/WikiPage.php(278): WikiPage->getContentHandler()
#16 /opt/htdocs/mediawiki/includes/actions/Action.php(98): WikiPage->getActionOverrides()
#17 /opt/htdocs/mediawiki/includes/actions/Action.php(155): Action::factory(string, WikiPage, RequestContext)
#18 /opt/htdocs/mediawiki/includes/MediaWiki.php(155): Action::getActionName(RequestContext)
#19 /opt/htdocs/mediawiki/includes/MediaWiki.php(817): MediaWiki->getAction()
#20 /opt/htdocs/mediawiki/includes/MediaWiki.php(527): MediaWiki->main()
#21 /opt/htdocs/mediawiki/index.php(44): MediaWiki->run()
#22 {main}

Exception caught inside exception handler: [e909bd7e200cd969395fb8f6] /demo/index.php/Main_Page MediaWiki\Storage\NameTableAccessException from line 42 of /opt/htdocs/mediawiki/includes/Storage/NameTableAccessException.php: Failed to access name from slot_roles using id = 2
Backtrace:
#0 /opt/htdocs/mediawiki/includes/Storage/NameTableStore.php(327): MediaWiki\Storage\NameTableAccessException::newFromDetails(string, string, integer)
#1 /opt/htdocs/mediawiki/includes/Revision/RevisionStore.php(1637): MediaWiki\Storage\NameTableStore->getName(integer)
#2 /opt/htdocs/mediawiki/includes/Revision/RevisionStore.php(1608): MediaWiki\Revision\RevisionStore->constructSlotRecords(string, Wikimedia\Rdbms\ResultWrapper, integer, Title)
#3 /opt/htdocs/mediawiki/includes/Revision/RevisionStore.php(1719): MediaWiki\Revision\RevisionStore->loadSlotRecords(string, integer, Title)
#4 [internal function]: MediaWiki\Revision\RevisionStore->MediaWiki\Revision\{closure}()
#5 /opt/htdocs/mediawiki/includes/Revision/RevisionSlots.php(165): call_user_func(Closure)
#6 /opt/htdocs/mediawiki/includes/Revision/RevisionSlots.php(107): MediaWiki\Revision\RevisionSlots->getSlots()
#7 /opt/htdocs/mediawiki/includes/Revision/RevisionRecord.php(192): MediaWiki\Revision\RevisionSlots->getSlot(string)
#8 /opt/htdocs/mediawiki/includes/Revision.php(493): MediaWiki\Revision\RevisionRecord->getSlot(string, integer)
#9 /opt/htdocs/mediawiki/includes/Revision.php(760): Revision->getMainSlotRaw()
#10 /opt/htdocs/mediawiki/includes/page/WikiPage.php(663): Revision->getContentModel()
#11 /opt/htdocs/mediawiki/includes/libs/objectcache/wancache/WANObjectCache.php(1424): WikiPage->{closure}(boolean, integer, array, NULL)
#12 /opt/htdocs/mediawiki/includes/libs/objectcache/wancache/WANObjectCache.php(1278): WANObjectCache->fetchOrRegenerate(string, integer, Closure, array)
#13 /opt/htdocs/mediawiki/includes/page/WikiPage.php(669): WANObjectCache->getWithSetCallback(string, integer, Closure)
#14 /opt/htdocs/mediawiki/includes/page/WikiPage.php(291): WikiPage->getContentModel()
#15 /opt/htdocs/mediawiki/includes/page/WikiPage.php(278): WikiPage->getContentHandler()
#16 /opt/htdocs/mediawiki/includes/actions/Action.php(98): WikiPage->getActionOverrides()
#17 /opt/htdocs/mediawiki/includes/actions/Action.php(155): Action::factory(string, WikiPage, OutputPage)
#18 /opt/htdocs/mediawiki/extensions/CharInsert/includes/CharInsert.php(21): Action::getActionName(OutputPage)
#19 /opt/htdocs/mediawiki/includes/Hooks.php(174): CharInsert::onBeforePageDisplay(OutputPage, SkinVector)
#20 /opt/htdocs/mediawiki/includes/Hooks.php(234): Hooks::callHook(string, array, array, NULL, string)
#21 /opt/htdocs/mediawiki/includes/OutputPage.php(2571): Hooks::runWithoutAbort(string, array)
#22 /opt/htdocs/mediawiki/includes/exception/MWExceptionRenderer.php(134): OutputPage->output()
#23 /opt/htdocs/mediawiki/includes/exception/MWExceptionRenderer.php(53): MWExceptionRenderer::reportHTML(MediaWiki\Storage\NameTableAccessException)
#24 /opt/htdocs/mediawiki/includes/exception/MWExceptionHandler.php(121): MWExceptionRenderer::output(MediaWiki\Storage\NameTableAccessException, integer)
#25 /opt/htdocs/mediawiki/includes/exception/MWExceptionHandler.php(195): MWExceptionHandler::report(MediaWiki\Storage\NameTableAccessException)
#26 /opt/htdocs/mediawiki/includes/MediaWiki.php(558): MWExceptionHandler::handleException(MediaWiki\Storage\NameTableAccessException)
#27 /opt/htdocs/mediawiki/index.php(44): MediaWiki->run()
#28 {main}

@MarkAHershberger Does this happen while upgrading, or after? Where there any errors reported during the upgrade?

Does running maintenance/populateContentTables.php fix it?

@daniel: it happened before and after. I upgraded to see if the fixes here would fix this issue.

Note that this happened first on the -moon wiki and more recently on the -earth wiki. Both are loaded from the same SimpleWiki dump.

populateContentTables.php appears to have already been run. Here is the tail end of the output:

... revision processed up to revision id 3820500 of 3822236 (0 rows in 3.5399911403656 seconds)
... revision processed up to revision id 3821000 of 3822236 (0 rows in 3.5405840873718 seconds)
... revision processed up to revision id 3821500 of 3822236 (0 rows in 3.5413990020752 seconds)
... revision processed up to revision id 3822000 of 3822236 (0 rows in 3.5420250892639 seconds)
... revision processed up to revision id 3822236 of 3822236 (0 rows in 3.542594909668 seconds)
Done populating revision table. Processed 0 rows in 3.5426099300385 seconds
Populating archive...
Done populating archive table. Processed 0 rows in 0.00056695938110352 seconds
Done. Processed 0 rows in 3.5733768939972 seconds

In T224949#5629556, @MarkAHershberger wrote:

@daniel: it happened before and after. I upgraded to see if the fixes here would fix this issue.

So, this started happening for you at some point using 1.33?

Note that this happened first on the -moon wiki and more recently on the -earth wiki. Both are loaded from the same SimpleWiki dump.

How did you import the dumps? Did the error start to occur soon after a (re)import? Is it possible that an import was aborted/failed, and was then restarted?

Failed to access name from slot_roles using id = 2

This is surprising - to my knowledge, the only thing that currently uses an extra slot is WikibaseMediaInfo. Are you using that?

Note by the way that the error itself is caused by data being lost from a normalization table (presumably due to a rollback). The fix we have in place in 1.34 should prevent this from happening, but if the data goes missing in 1.33, upgrading to 1.34 is not going to help. If the data isn't there, the only fix i can think of is to manually insert it into the table.

daniel moved this task from Done to Inbox on the Platform Team Workboards (Clinic Duty Team) board.Nov 4 2019, 9:12 AM

In T224949#5630202, @daniel wrote:

So, this started happening for you at some point using 1.33?

It happened in 1.31, I believe. I *think* that is what moon wiki was on before all this. @Darenwelsh set up both moon and earth wikis for me using meza, which, I think, uses MW 1.31. I loaded them using a dump from simplewiki. I used the ueland/mwdumper docker image to convert the dump to sql and then loaded the tables.sql followed by the generated SQL.

How did you import the dumps? Did the error start to occur soon after a (re)import? Is it possible that an import was aborted/failed, and was then restarted?

You can see the exact process I used here. I used these tools to create an SQL dump that I then loaded with mysql's cli.

This is surprising - to my knowledge, the only thing that currently uses an extra slot is WikibaseMediaInfo. Are you using that?

No. Could the use of mwdumper be a factor?

Note by the way that the error itself is caused by data being lost from a normalization table (presumably due to a rollback). The fix we have in place in 1.34 should prevent this from happening, but if the data goes missing in 1.33, upgrading to 1.34 is not going to help. If the data isn't there, the only fix i can think of is to manually insert it into the table.

k, well, I'm manually inserting the lines I mentioned above.

Untagged from past releases (which it can't block as they already happened), and untagging from 1.34 as it isn't a new regression. Updated task description to document that it affects 1.32+ so that once a fix is available, it can be backported to relevant branches.

• WDoranWMF moved this task from Inbox to Backlog on the Platform Team Workboards (Clinic Duty Team) board.Nov 7 2019, 8:20 PM

Note that during the upgrade I also ran into the same thing that @Absorbcium mentions above: an empty content_models table.

In T224949#5637909, @MarkAHershberger wrote:

You can see the exact process I used here. I used these tools to create an SQL dump that I then loaded with mysql's cli.

Oh, that sounds risky.

This is surprising - to my knowledge, the only thing that currently uses an extra slot is WikibaseMediaInfo. Are you using that?

No. Could the use of mwdumper be a factor?

Seems likely, but I have not investigated.

In T224949#5650149, @daniel wrote:

In T224949#5637909, @MarkAHershberger wrote:

You can see the exact process I used here. I used these tools to create an SQL dump that I then loaded with mysql's cli.

Oh, that sounds risky.

Could you clarify what sounds risky? Is it the use of mwdumper? Or my automation?

In T224949#5653825, @MarkAHershberger wrote:

Could you clarify what sounds risky? Is it the use of mwdumper? Or my automation?

Using mwdumper. mdumper re-implements the MediaWiki database schema, so it would need to be kept up to date with all the schema changes. But it's not:

MWDumper can read MediaWiki XML export dumps (version 0.3, minus uploads), perform optional filtering, and output back to XML or to SQL statements to add things directly to a database in 1.4 or 1.5 schema.

We are on schema version 0.11, and MW version 1.34!

Beware that MWDumper has not been actively maintained since the mid-2000s, and may or may not work with current deployments. Apparently, it can't be used to import into MediaWiki 1.31 or later.

Indeed: with the changes we made for MCR in 1.31, the way revisions are represented in the database has changed significantly. With 1.34, the compat mode for that schema will not longer be the default, and with 1.35, support for the old schema will be dropped. Even since 1.32, it would probably be advisable to always run populateContentTables.php after importing via MWDumper.

Can you use the importDump.php script instead?

daniel moved this task from Backlog to Discussing on the Platform Team Workboards (Clinic Duty Team) board.Nov 11 2019, 8:11 PM

In T224949#5654605, @daniel wrote:

In T224949#5653825, @MarkAHershberger wrote:

Could you clarify what sounds risky? Is it the use of mwdumper? Or my automation?

Using mwdumper. mdumper re-implements the MediaWiki database schema, so it would need to be kept up to date with all the schema changes. But it's not:

Right. I was aware of that when I first set this up, but I think I got it working with 1.27. "Got it working" might not be enough for the upgrade, I admit.

Note that I was testing against 1.32 this afternoon (hoping to get around all the above problems) and ended up in the same exact place.

Can you use the importDump.php script instead?

Since it looks like that's where I am now, I guess so!

So far, so good.

Ran the import with REL1_32 and then ran update.php for REL1_34. No problem except a note:

User name "Გამაგ" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

So, I ran

php maintenance/cleanupUsersWithNoId.php --force -p earthwiki

and ended up with:

... rev_timestamp=20191112022707 rev_id=1
Completed cleanup, assigned 0 and prefixed 1 row(s)
Beginning cleanup of archive
Wikimedia\Rdbms\DBQueryError from line 1603 of /opt/htdocs/mediawiki/includes/libs/rdbms/database/Database.php: A database query error has occurred. Did you forget to run your application's database schema updater after upgrading? 
Query: SELECT  ar_id,ar_user,ar_user_text,ar_id  FROM `archive`    WHERE (1=1)  ORDER BY ar_id LIMIT 100  
Function: CleanupUsersWithNoId::cleanup
Error: 1054 Unknown column 'ar_user' in 'field list' (localhost)

#0 /opt/htdocs/mediawiki/includes/libs/rdbms/database/Database.php(1574): Wikimedia\Rdbms\Database->getQueryExceptionAndLog('Unknown column ...', 1054, 'SELECT  ar_id,a...', 'CleanupUsersWit...')
#1 /opt/htdocs/mediawiki/includes/libs/rdbms/database/Database.php(1152): Wikimedia\Rdbms\Database->reportQueryError('Unknown column ...', 1054, 'SELECT  ar_id,a...', 'CleanupUsersWit...', false)
#2 /opt/htdocs/mediawiki/includes/libs/rdbms/database/Database.php(1807): Wikimedia\Rdbms\Database->query('SELECT  ar_id,a...', 'CleanupUsersWit...')
#3 /opt/htdocs/mediawiki/includes/libs/rdbms/database/DBConnRef.php(68): Wikimedia\Rdbms\Database->select('archive', Array, Array, 'CleanupUsersWit...', Array)
#4 /opt/htdocs/mediawiki/includes/libs/rdbms/database/DBConnRef.php(315): Wikimedia\Rdbms\DBConnRef->__call('select', Array)
#5 /opt/htdocs/mediawiki/maintenance/cleanupUsersWithNoId.php(153): Wikimedia\Rdbms\DBConnRef->select('archive', Array, Array, 'CleanupUsersWit...', Array)
#6 /opt/htdocs/mediawiki/maintenance/cleanupUsersWithNoId.php(63): CleanupUsersWithNoId->cleanup('archive', Array, 'ar_user', 'ar_user_text', Array, Array)
#7 /opt/htdocs/mediawiki/maintenance/Maintenance.php(1746): CleanupUsersWithNoId->doDBUpdates()
#8 /opt/htdocs/mediawiki/maintenance/doMaintenance.php(99): LoggedUpdateMaintenance->execute()
#9 /opt/htdocs/mediawiki/maintenance/cleanupUsersWithNoId.php(212): require_once('/opt/htdocs/med...')
#10 {main}

for which I will open a new task (so this problem should not keep this task open).

I'll re-close this and assume that the flaws previously reported were the fault of mwdumper.

MarkAHershberger mentioned this in T238043: cleanupUsersWithNoId.php should not depend on ar_user column.Nov 12 2019, 6:23 AM

Kghbln mentioned this in T241977: NameTableAccessException.php: Failed to access name from content_models using id = 2.Jan 6 2020, 9:48 AM

Change 542234 restored by Reedy:
NameTableStore: ensure consistency upon rollback.

https://gerrit.wikimedia.org/r/542234

Change 542234 merged by jenkins-bot:
[mediawiki/core@REL1_33] NameTableStore: ensure consistency upon rollback.

https://gerrit.wikimedia.org/r/542234

ReleaseTaggerBot added a project: MW-1.33-notes.May 28 2020, 4:00 PM

Aklapper removed a subscriber: Anomie.Oct 16 2020, 5:40 PM

daniel mentioned this in T277795: User not found by actor ID: [id].Mar 22 2021, 8:44 PM

Failed to access name from slot_roles using id = 1
Closed, ResolvedPublic
Actions

Description

Details

Related Objects

Event Timeline

	Reedy
	Jun 4 2019, 12:09 AM

Failed to access name from slot_roles using id = 1Closed, ResolvedPublicActions

Description

Details

Related Objects

Event Timeline

Failed to access name from slot_roles using id = 1
Closed, ResolvedPublic
Actions