Page MenuHomePhabricator

Create new translate_cache table on Wikimedia wikis with the Translate extension installed
Closed, ResolvedPublic4 Estimated Story Points

Description

Follow-on for the non-UBN part of T370219: DBQueryError marking page for translation: Table 'mediawikiwiki.translate_cache' doesn't exist.

Table information

Added in 628631: Add persistent translate cache | https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Translate/+/628631 to meet the needs of T182433: Implement a stronger synchronization in RepoNG and Translate

Why? To keep track of the message groups and messages currently being processed, we needed a persistent cache. For this we looked at existing caching solutions provided by MediaWiki (specifically ObjectCache::getInstance(CACHE_DB)), but these could not be used as they were not persistent.

This was something that is needed only on translatewiki.net, and hence this table wasn't created on Wikimedia wikis.

We're now also storing translation state for pages that translators wish to mark for translation. See: T360409: Page status tri-state for pages not marked for translation

This is a fairly generic table, and in the future, we might store more information in this table as needed.

Table structure

CREATE TABLE /*_*/translate_cache (
  tc_key VARBINARY(255) NOT NULL,
  tc_value MEDIUMBLOB DEFAULT NULL,
  tc_exptime VARBINARY(14) DEFAULT NULL,
  tc_tag VARBINARY(255) DEFAULT NULL,
  INDEX tc_tag (tc_tag),
  PRIMARY KEY(tc_key)
) /*$wgDBTableOptions*/;

More information

Should this table be replicated to wiki replicas (does it not contain private data)?
Yes, we need to create the table on all the wikis where the Translate extension is enabled. It contains information about the translation state of pages: whether they should be marked for translation or not.

It does not contain private data currently.

Will you be doing cross-joins with the wiki metadata?
Currently we're not doing any joins with wiki metadata, and I don't foresee doing so in the future either.

Size of the table (number of rows expected).
With the current usage of the table, there will be at most:

1 record per pages with content model wikitext - (Number of pages marked for translation)

The actual number will be much smaller. This table only contains records for pages that have a translation state set.

Expected growth per year (number of rows).

The worst case scenario for growth will be equal to number of pages created with wikitext content model

Expected amount of queries, both writes and reads (per minute, per hour...per day, any of those are ok)
Rather difficult to predict. On wikis running the translate extension 1 query will be run every time a user views a non translatable wikitext page.

-- Fetch row by key
SELECT tc_key, tc_value, tc_exptime, tc_tag FROM translate_cache WHERE tc_key = "xyz";

Examples of queries that will be using the table
All database operations are carried out in the class: https://github.com/wikimedia/mediawiki-extensions-Translate/blob/master/src/Cache/PersistentDatabaseCache.php

Table is not specific for this use case, so in the future there is a small possibility that the queries might change.

-- Fetch row by key
SELECT tc_key, tc_value, tc_exptime, tc_tag FROM translate_cache WHERE tc_key = "xyz";

-- Fetch rows by tag
SELECT tc_key, tc_value, tc_exptime, tc_tag FROM translate_cache WHERE tc_tag = "tag";

-- Insert values into the table
INSERT INTO users (tc_key, tc_value, tc_exptime, tc_tag)
VALUES ('key', 'value', null, 'tag')
ON DUPLICATE KEY UPDATE tc_key = 'key';

-- Update expiry time based on key
UPDATE translate_cache SET tc_exptime = 'exp' WHERE tc_key = "key";

-- Delete entry with key
DELETE FROM translate_cache WHERE tc_key = "key";

-- Delete entries with tag
DELETE FROM translate_cache WHERE tc_tag= "tag";

See https://wikitech.wikimedia.org/wiki/Creating_new_tables for more.

Event Timeline

T272957: Mark mediawikiwiki.translate_cache as private so it doesn't replicate to wiki replicas makes it sound like the plan was to add this table after T272428: Error 1146: Table 'mediawikiwiki.translate_cache' doesn't exist happened in 2021, but I guess that was not followed up on at the time.

And a reminder to stop this regressing again in future; sql/translate_cache.sql should be added to createExtensionTables.php in MediaWiki-extensions-WikimediaMaintenance so the table is there on any new wikis where Translate gets enaled

Thanks @Jdforrester-WMF for creating the task

Till date that table was not needed on any of the Wikimedia production wikis but we started using the table very recently and I forgot that the table hasn't been created yet. Since its a generic table, it should have been created back then even though we would not have used it for anything.

Is this table really needed or is it a code problem? There're more tickets about the reported error the mentioned T370219: DBQueryError marking page for translation: Table 'mediawikiwiki.translate_cache' doesn't exist and an older one T272428: Error 1146: Table 'mediawikiwiki.translate_cache' doesn't exist

We'll now have to create the table as we're using it to for a feature that we plan to roll out on production wikis.

You still need to get the design and schema approved before creating it in production: https://wikitech.wikimedia.org/wiki/Creating_new_tables

ABran-WMF changed the task status from Open to In Progress.Jul 18 2024, 6:15 AM
ABran-WMF triaged this task as Medium priority.
ABran-WMF moved this task from Triage to Blocked on the DBA board.
abi_ updated the task description. (Show Details)

My review of the design:

  • I need to understand why we are going to have a cache table. Why memcached or something else is not working here?
  • "1 query will be run every time a user views a non translatable wikitext page." is a decent chunk, can you cache the value in memcached for shorter period of time on top of the reads? (using page_touched as key to make invalidation work out of the box?)
  • This should be in x1, it can be per-wiki. That's fine (we have a database for each wiki in x1). You can use virtual domains to make this work easily.

I need to understand why we are going to have a cache table. Why memcached or something else is not working here?

Following reasons:

  1. We initially planned to use the object cache for group synchronization cache (T182433, but noticed that it is cleared when update.php is run.
  2. Its convenient to have the ability to run SQL queries against the data to find what we need.
  3. We need some values stored in the table to remain forever and only be expunged manually.

"1 query will be run every time a user views a non translatable wikitext page." is a decent chunk, can you cache the value in memcached for shorter period of time on top of the reads? (using page_touched as key to make invalidation work out of the box?)

Let me expand on this a bit more:

1 query will be run every time a named user views a non translatable wikitext page.

We first run a query to check if the user is a recent editor (edited the page in the last 3 days) of the page, if so then another query is run to check if a value is already set for the page. The second query is done on a primary key column.

Relevant code is here: https://gerrit.wikimedia.org/g/mediawiki/extensions/Translate/+/90eb9da4f0e8417657f6e28fab9882d47a67fd5a/src/PageTranslation/TranslatablePageView.php#49

This should be in x1, it can be per-wiki. That's fine (we have a database for each wiki in x1). You can use virtual domains to make this work easily.

This https://www.mediawiki.org/wiki/Manual:$wgVirtualDomainsMapping right?

"1 query will be run every time a user views a non translatable wikitext page." is a decent chunk, can you cache the value in memcached for shorter period of time on top of the reads? (using page_touched as key to make invalidation work out of the box?)

Let me expand on this a bit more:

1 query will be run every time a named user views a non translatable wikitext page.

We first run a query to check if the user is a recent editor (edited the page in the last 3 days) of the page, if so then another query is run to check if a value is already set for the page. The second query is done on a primary key column.

I’ve proposed previously in some code review (I’m too lazy to look it up now 😛) to expand this to all non-IP users, i.e. to include temp users. I guess I should stop pursuing it for performance reasons? Or temp+named users is still okay, as long as IPs are short-circuited based on the user name?

named users are bulk of page views in any wiki anyway. Logged out users get the CDN cache hit most of the time. Temp users will be a little bump so it doesn't matter in terms of load. If you check for recent edits beforehand, that should reduce the load dramatically. So caching is not needed but it should be in x1

This https://www.mediawiki.org/wiki/Manual:$wgVirtualDomainsMapping right?

Yes.

Temp users will be a little bump so it doesn't matter in terms of load.

Okay, so I don’t have to give up on this idea yet. Thanks for confirming!

Change #1059104 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/Translate@master] PersistentDatabaseCache: Use IConnectionProvider

https://gerrit.wikimedia.org/r/1059104

Change #1059104 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] PersistentDatabaseCache: Use IConnectionProvider

https://gerrit.wikimedia.org/r/1059104

Change #1059105 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/Translate@master] PersistentDatabaseCache: Use virtual domain with translate_cache table

https://gerrit.wikimedia.org/r/1059105

abi_ set the point value for this task to 4.

Change #1059105 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] PersistentDatabaseCache: Use virtual domain with translate_cache table

https://gerrit.wikimedia.org/r/1059105

Change #1070219 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/Translate@master] TranslatablePageView: Use recentchanges to check for recent edits

https://gerrit.wikimedia.org/r/1070219

Change #1070219 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] TranslatablePageView: Use recentchanges to check for recent edits

https://gerrit.wikimedia.org/r/1070219

Created the table on testwiki

mwscript sql.php --wiki=testwiki --cluster=extension1 /srv/mediawiki/php-1.43.0-wmf.24/extensions/Translate/sql/mysql/translate_cache.sql

Checked if the table is created

mwscript mysql.php --wiki testwiki --cluster extension1

Table description:

wikiadmin2023@10.64.16.208(testwiki)> DESCRIBE translate_cache;
+------------+----------------+------+-----+---------+-------+
| Field      | Type           | Null | Key | Default | Extra |
+------------+----------------+------+-----+---------+-------+
| tc_key     | varbinary(255) | NO   | PRI | NULL    |       |
| tc_value   | mediumblob     | YES  |     | NULL    |       |
| tc_exptime | varbinary(14)  | YES  |     | NULL    |       |
| tc_tag     | varbinary(255) | YES  | MUL | NULL    |       |
+------------+----------------+------+-----+---------+-------+

I'll enable this feature on a few namespaces on testwiki to start with.

List of wikis to create the table in:

  • advisorswiki
  • aewikimedia
  • amwikimedia
  • azwikimedia
  • bdwikimedia
  • betawikiversity
  • bewikimedia
  • brwikimedia
  • cawikimedia
  • collabwiki
  • commonswiki
  • foundationwiki
  • frwiktionary
  • gewikimedia
  • grwikimedia
  • hiwikimedia
  • idwikimedia
  • incubatorwiki
  • legalteamwiki
  • maiwikimedia
  • mediawikiwiki
  • metawiki
  • nowikimedia
  • otrs_wikiwiki
  • outreachwiki
  • plwikimedia
  • ptwikisource
  • punjabiwikimedia
  • ruwikimedia
  • sewikimedia
  • sourceswiki
  • specieswiki
  • testcommonswiki
  • testwiki
  • testwikidatawiki
  • u4cwiki
  • uawikimedia
  • vewikimedia
  • wbwikimedia
  • wikidatawiki
  • wikifunctionswiki
  • wikimania2012wiki
  • wikimania2013wiki
  • wikimania2014wiki
  • wikimania2015wiki
  • wikimania2016wiki
  • wikimania2017wiki
  • wikimania2018wiki
  • wikimaniawiki

These are wikis where translate extension is enabled.

Command to run:

mwscript sql.php --wiki=<wiki-id> --cluster=extension1 /srv/mediawiki/php-1.43.0-wmf.28/extensions/Translate/sql/mysql/translate_cache.sql

We have a translate.dblist so you can use ‘foreachwikiindblist’

Change #1082546 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[operations/puppet@production] tables-catalog: Add translate_cache table

https://gerrit.wikimedia.org/r/1082546

Change #1082602 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[operations/puppet@production] createExtensionTables: Add translate_cache, translate_message_group_subscriptions tables

https://gerrit.wikimedia.org/r/1082602

Change #1082602 abandoned by Abijeet Patro:

[operations/puppet@production] createExtensionTables: Add translate_cache, translate_message_group_subscriptions tables

Reason:

Incorrectly submitted

https://gerrit.wikimedia.org/r/1082602

Change #1082603 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/WikimediaMaintenance@master] createExtensionTables: Add tables for translate extension

https://gerrit.wikimedia.org/r/1082603

Change #1082603 merged by jenkins-bot:

[mediawiki/extensions/WikimediaMaintenance@master] createExtensionTables: Add tables for translate extension

https://gerrit.wikimedia.org/r/1082603

Change #1082546 merged by Ladsgroup:

[operations/puppet@production] tables-catalog: Add translate_cache table

https://gerrit.wikimedia.org/r/1082546

Table has been created on all wikis were Translate extension is enabled even if features that use the table haven't been enabled. We also updated the scripts that run when a new wiki is created in order to automatically create the table.