Error 1146: Table 'mediawikiwiki.translate_cache' doesn't exist
Closed, Resolved; Public; PRODUCTION ERROR

Description

Error message
[YAdSVQpAAEEAAJNrP40AAAAA] /rpc/RunSingleJob.php   Wikimedia\Rdbms\DBQueryError: Error 1146: Table 'mediawikiwiki.translate_cache' doesn't exist (10.64.32.136)
Function: MediaWiki\Extension\Translate\Cache\PersistentDatabaseCache::has
Query: SELECT  tc_key  FROM `translate_cache`    WHERE tc_key = 'a367c7111ceede55784f2b74e0072819249aa50f_page-Manual:Database access_Translations:Manual:Database_access/Page_display_title/en'  LIMIT 1
Stack Trace
from /srv/mediawiki/php-1.36.0-wmf.27/includes/libs/rdbms/database/Database.php(1702)
#0 /srv/mediawiki/php-1.36.0-wmf.27/includes/libs/rdbms/database/Database.php(1686): Wikimedia\Rdbms\Database->getQueryException(string, integer, string, string)
#1 /srv/mediawiki/php-1.36.0-wmf.27/includes/libs/rdbms/database/Database.php(1661): Wikimedia\Rdbms\Database->getQueryExceptionAndLog(string, integer, string, string)
#2 /srv/mediawiki/php-1.36.0-wmf.27/includes/libs/rdbms/database/Database.php(1230): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
#3 /srv/mediawiki/php-1.36.0-wmf.27/includes/libs/rdbms/database/Database.php(1910): Wikimedia\Rdbms\Database->query(string, string, integer)
#4 /srv/mediawiki/php-1.36.0-wmf.27/includes/libs/rdbms/database/Database.php(2010): Wikimedia\Rdbms\Database->select(string, string, array, string, array, array)
#5 /srv/mediawiki/php-1.36.0-wmf.27/includes/libs/rdbms/database/DBConnRef.php(68): Wikimedia\Rdbms\Database->selectRow(string, string, array, string)
#6 /srv/mediawiki/php-1.36.0-wmf.27/includes/libs/rdbms/database/DBConnRef.php(331): Wikimedia\Rdbms\DBConnRef->__call(string, array)
#7 /srv/mediawiki/php-1.36.0-wmf.27/extensions/Translate/src/Cache/PersistentDatabaseCache.php(79): Wikimedia\Rdbms\DBConnRef->selectRow(string, string, array, string)
#8 /srv/mediawiki/php-1.36.0-wmf.27/extensions/Translate/src/Synchronization/GroupSynchronizationCache.php(150): MediaWiki\Extension\Translate\Cache\PersistentDatabaseCache->has(string)
#9 /srv/mediawiki/php-1.36.0-wmf.27/extensions/Translate/utils/MessageUpdateJob.php(311): MediaWiki\Extension\Translate\Synchronization\GroupSynchronizationCache->isMessageBeingProcessed(string, string)
#10 /srv/mediawiki/php-1.36.0-wmf.27/extensions/Translate/utils/MessageUpdateJob.php(136): MessageUpdateJob->removeFromCache(Title)
#11 /srv/mediawiki/php-1.36.0-wmf.27/extensions/Translate/tag/TranslationsUpdateJob.php(65): MessageUpdateJob->run()
#12 /srv/mediawiki/php-1.36.0-wmf.27/extensions/EventBus/includes/JobExecutor.php(79): TranslationsUpdateJob->run()
#13 /srv/mediawiki/rpc/RunSingleJob.php(76): MediaWiki\Extension\EventBus\JobExecutor->execute(array)
#14 {main}
Notes

Observed 14 of these in 1.36.0-wmf.27. Didn't find anything in phab so rolled back, marking as a blocker for now.

It is related to T182433: Implement a stronger synchronization in RepoNG and Translate. Strong synchronization is a feature used for synchronizing messages from the file system. This feature uses a new database table.

File based message groups are disabled on all Wikimedia wikis. This is why we (Language team) did not request creation of new tables in production. We assumed that since this feature is unused, we won't be hitting any code paths that use the new table. However, we missed that one code path related to it was executed for all types of message groups.

We will submit a patch that disables this code path by default to unblock the train. Creation of the tables will be handled through the normal process at a later point.
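As a rough illustration of the failure and of the "disabled by default" fix described above, here is a minimal Python sketch. It is not the Translate extension's code: it uses an in-memory SQLite database as a stand-in for a wiki database without the table, and the flag and function names (USE_GROUP_SYNC_CACHE, is_message_being_processed, unguarded_lookup_fails) are hypothetical. Only the table name translate_cache and the column tc_key come from the error message.

```python
import sqlite3

# Hypothetical stand-in for a wiki database that never got the new table.
conn = sqlite3.connect(":memory:")

# Hypothetical feature flag, off by default, mirroring the idea of the fix.
USE_GROUP_SYNC_CACHE = False


def is_message_being_processed(conn, key, enabled=USE_GROUP_SYNC_CACHE):
    """Return True if `key` is in the cache table; skip the lookup when disabled."""
    if not enabled:
        # Feature is off: never touch translate_cache, so a missing table is harmless.
        return False
    row = conn.execute(
        "SELECT tc_key FROM translate_cache WHERE tc_key = ? LIMIT 1", (key,)
    ).fetchone()
    return row is not None


def unguarded_lookup_fails(conn, key):
    """Reproduce the original failure mode: querying a table that does not exist."""
    try:
        is_message_being_processed(conn, key, enabled=True)
    except sqlite3.OperationalError as exc:
        return "no such table" in str(exc)
    return False
```

With the flag off, the shared code path returns early and never issues the query, so wikis without the table are unaffected. Wikis that need the feature would create the table first and then turn the flag on.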

Event Timeline

brennen triaged this task as Unbreak Now! priority. Jan 19 2021, 10:00 PM
brennen moved this task from Backlog to Logs/Train on the User-brennen board.

Code defining and using this table was added for T182433: Implement a stronger synchronization in RepoNG and Translate in https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Translate/+/628631 which would have first rolled to production in wmf.25; not sure if there's a config flag preventing code attempting to read from the DB before it's created, but looks like not. Nothing in the SAL about creating the table; should be trivial, but oops.

I think https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Translate/+/606424/54/utils/MessageUpdateJob.php activated the code. Someone probably should have checked the schema changes were made, oops like James says.

Aha, yeah, that's it.

Creating the table on all wikis is trivial, but I'd like the Language team to decide what to do here in case this wasn't meant to go live yet.

And a reminder, to stop this regressing in future: sql/translate_cache.sql should be added to createExtensionTables.php in MediaWiki-extensions-WikimediaMaintenance so the table is there on any new wikis where Translate gets enabled.
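One way to catch this class of problem before a rollout is to compare the tables the code expects against what the database actually has. The sketch below is only an illustration under stated assumptions: it uses SQLite's sqlite_master catalogue as a stand-in (on MariaDB one would consult information_schema instead), and missing_tables and some_existing_table are hypothetical names, not part of MediaWiki or WikimediaMaintenance.

```python
import sqlite3


def missing_tables(conn, expected):
    """Return the names in `expected` that are absent from the database, sorted."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    present = {name for (name,) in rows}
    return sorted(set(expected) - present)


# Hypothetical database that has an older table but not the new cache table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_existing_table (id INTEGER)")

# Tables the deployed code is assumed to query.
expected = ["some_existing_table", "translate_cache"]
```

Calling missing_tables(conn, expected) on this database reports translate_cache as missing, which is the signal that the schema step was skipped.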

Urbanecm subscribed.

IMHO the patch should be reverted. Creating a table is out of scope for fixing a train blocker, that should be coordinated by Language team later.

We're coming up on 15:00 Pacific. Per policy and discussion in #wikimedia-operations, we'll call the train stopped here for the day and await Language team input.

And new tables in production should be coordinated with DBAs before rolling out.

Please, before creating the table, let us know whether it can be replicated entirely to our wikireplicas or needs to be filtered because it might contain sensitive information.

Let me summarise:

  1. We added a new table but did not update https://phabricator.wikimedia.org/T272428#6759854
  2. The table was added a while back, but the code to use it was not being triggered from any codepath.
  3. Recently we added code that triggers that code path and hence we are seeing that issue - https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Translate/+/606424/54/utils/MessageUpdateJob.php
  4. The table is fairly generic and can be used as a cache for anything, but currently is being used for one specific feature (T182433) that we need on Translatewiki
  5. The flag I'm planning to add should avoid running the code path on MediaWiki instances, fixing the issue for now.
  6. We should still create the table on MediaWiki instances because we may use the table for some other purpose in the future.

Will discuss with Niklas about his thoughts on 5) when he is online but I will go ahead and submit a WIP patch for now.

Change 657229 had a related patch set uploaded (by Abijeet Patro; owner: Abijeet Patro):
[mediawiki/extensions/Translate@master] Add flag to toggle the usage of the group synchronization cache

https://gerrit.wikimedia.org/r/657229

@abi_ is the data on that table public?

The data in the table is available on a page that is currently disabled on MediaWiki and I believe other Wikimedia sites (https://commons.wikimedia.org/wiki/Special:ManageMessageGroups). As of now the table does not store any sensitive information but that cannot be guaranteed for the future, and will have to be verified by code reviews if the table is used for caching other things.

@abi_ thanks for the information. Then we need someone to be in charge of that decision/process, as if we expose the table as it is now, it might be ok, but if in the future that changes, then we'd need to work that out before it happens.
So I would like someone to make that call, either we filter it now and do not replicate it or we get someone to be officially in charge of that, as otherwise it might mean we can have data leaks.

If the table doesn't really contain useful/relevant information for the users (cached stuff normally isn't too useful {{citation needed}}) I would recommend filtering it now before it gets deployed.

Can someone make that call? Maybe someone from the Security-Team?

We can unblock the train without creating this table, so I suggest that we discuss the details through the normal DBA process, without time pressure.

@Marostegui The patch that fixes this particular issue: 657229: Add flag to toggle the usage of the group synchronization cache will remove the need for this table to be present on Wikimedia wikis.

Quoting Niklas:

Creation of the tables will be handled through the normal process at a later point.

Sure, that's fine by me - but I will leave that to the train conductor :)

That works for me - keep in mind that DBAs do not create the tables; we just need to know whether the table is public or not. Table creation should be done through the normal deployment process, see: https://wikitech.wikimedia.org/wiki/Schema_changes#What_is_not_a_schema_change

Thanks!

Change 657229 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] Add flag to toggle the usage of the group synchronization cache

https://gerrit.wikimedia.org/r/657229

Change 657306 had a related patch set uploaded (by Nikerabbit; owner: Abijeet Patro):
[mediawiki/extensions/Translate@wmf/1.36.0-wmf.27] Add flag to toggle the usage of the group synchronization cache

https://gerrit.wikimedia.org/r/657306

Change 657306 merged by jenkins-bot:
[mediawiki/extensions/Translate@wmf/1.36.0-wmf.27] Add flag to toggle the usage of the group synchronization cache

https://gerrit.wikimedia.org/r/657306

Nikerabbit lowered the priority of this task from Unbreak Now! to High. Jan 20 2021, 10:22 AM

To train conductor: A fix has been backported to 1.36.0-wmf.27 and merged, but not deployed.

Please, to avoid confusing whoever uses deployment host next, do not +2 stuff in wmf/* branches unless you're going to personally deploy it. To propose a backport, creating the patch is enough. +2 in wmf/* branches indicates "I'm deploying this".

Yes. To quote https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Problem:_undeployed_code

The problem is that sometimes, people merge things into a deployment branch and then don't deploy them. This is a terrible habit that should be squashed. If you merge something into a wmf branch, you have a responsibility to either deploy it yourself very soon, make sure that someone deploys it very soon, or revert it if you can't make those things happen. The deployment branch should reflect the current state of the cluster, except during those brief moments where something is about to be deployed or in the process of being deployed.

Is this not the second time in 2 weeks this has happened with WMF branches?

My apologies. I thought it was not a problem as long as 1.36.0-wmf.27 was not deployed outside the test wikis. Will not do it again.

Mentioned in SAL (#wikimedia-operations) [2021-01-20T13:20:45Z] <urbanecm@deploy1001> Synchronized php-1.36.0-wmf.27/extensions/Translate/: 20decbd5cc3de0af655b9419cf69fc442ab056a4: Add flag to toggle the usage of the group synchronization cache (T272428; T182433) (duration: 01m 10s)

Change 657337 had a related patch set uploaded (by Urbanecm; owner: Urbanecm):
[operations/mediawiki-config@master] Set wgTranslateGroupSynchronizationCache to false explicitly

https://gerrit.wikimedia.org/r/657337

Deployed and tested on test.wikipedia.org with help of Urbanecm.

Thanks all for your assistance. Will roll the train forward to group0 shortly.

Thanks

sbassett added subscribers: JFishback_WMF, sbassett.

Out of an abundance of caution, it's probably the lowest risk to have the table not be replicated for now, until it undergoes some kind of privacy/legal review (cc: @JFishback_WMF).

Thanks @sbassett - we can definitely filter the table, but I would appreciate it if we could come to a conclusion on whether it is likely to hold private data or not, as re-adding the table can be a bit of an overhead: we'd need to re-add it for more than 900 wikis, manually.
Not sure how much of a rush we are in to get this table created - but if we can take some time to analyze the data that is likely to be there and whether it might be useful in the future to expose it on the wiki replicas, it can help us avoid doing the work twice, especially such tedious work.

FWIW, Translate is only used on circa 40 wikis.

Ah, that's good to know - thanks! Still, let's try to come up with a decision that isn't likely to be changed in the short/medium term.

And TBH, if it's just a cache, there is probably little to no value replicating it anyway...

We don't replicate other numerous cache tables which are otherwise exposed quite widely through special pages and the API - https://github.com/wikimedia/puppet/blob/8a94fe4b4ae79386f92083bc413b60c76c52ce5d/manifests/realm.pp#L223-L226

Agreed! As I mentioned at T272428#6760729 there's probably not useful data there, so we can probably filter it and forget about it - but not my call :-)

From a privacy perspective, if this table is merely cached data, I would not recommend replicating it, in accordance with the principle of data minimization. If there is some compelling reason for replicating it, please feel free to ping me and I can conduct a privacy risk analysis, but for now I'm going to mark as completed for Privacy Engineering.

@abi_ let's make it private then?

I agree. The purpose of the table is to act as persistent temporary storage. The data in there is not (currently) private, but it isn't useful either.

Change #657337 abandoned by Urbanecm:

[operations/mediawiki-config@master] Set wgTranslateGroupSynchronizationCache to false explicitly

Reason:

https://gerrit.wikimedia.org/r/657337