
Translate should have a way to configure readable and writable ttm services separately
Closed, Resolved | Public | 8 Estimated Story Points

Description

Outcome

We've added a new configuration parameter to define services that are writable. For example:

<?php
$wgTranslateTranslationServices['TTMServer'] = [
	'type' => 'ttmserver',
	'class' => 'ElasticSearchTTMServer',
	'cutoff' => 0.75,
	'use_wikimedia_extra' => true,
	'public' => false,
	'writable' => true // New configuration
];

The following rules are enforced:

  • If writable is specified, services marked as writable are considered write only and others are considered read only.
  • If no service is specified as writable then services are considered both readable and writable.
  • The default service must always be readable.

If a service is marked as writable, the mirrors configuration will not be allowed.

Possible example configurations can be found here: https://phabricator.wikimedia.org/T322284#8729298

Documentation has been updated: https://www.mediawiki.org/w/index.php?title=Help%3AExtension%3ATranslate%2FTranslation_memories&diff=5938992&oldid=5931722


As a maintainer of the WMF search infrastructure, I want the Translate extension to be able to configure readable and writable ttm backends separately, so that I can rapidly switch the read traffic to a particular elasticsearch cluster using etcd and DNS discovery without shipping a patch to the mediawiki-config (cf. T143553).

Currently the Translate extension does allow writing to multiple datacenters using the mirrors config on the ttm service definition. Sadly, this assumes that the mirrored service is writable.
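
As a rough illustration of the limitation, a mirrored setup might look like the sketch below. The service names `primary` and `secondary` are placeholders rather than the WMF production values, and the shape of the `mirrors` value (a list of other service names) is an assumption that should be checked against the Translate documentation.

```php
<?php
// Hypothetical pre-change setup; names are placeholders for illustration only.
$wgTranslateTranslationServices['primary'] = [
	'type' => 'ttmserver',
	'class' => 'ElasticSearchTTMServer',
	// Assumed shape: names of other configured services that receive the same writes.
	'mirrors' => [ 'secondary' ],
];

$wgTranslateTranslationServices['secondary'] = [
	'type' => 'ttmserver',
	'class' => 'ElasticSearchTTMServer',
];

// The default service serves reads and is also written to, together with its mirrors.
$wgTranslateTranslationDefaultService = 'primary';
```

Because the default service is both read from and written to in this arrangement, it cannot simply be pointed at a read-only endpoint.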

A solution could be to introduce a new config var named $wgTranslateTranslationWritableServices holding an array of services to update; if it is set, $wgTranslateTranslationDefaultService would just act as the default service for read operations. If $wgTranslateTranslationWritableServices is null, then Translate should behave the same way as before.

Open question: what to do if $wgTranslateTranslationWritableServices is an empty array?

AC:

  • Translate is updated to support a new $wgTranslateTranslationWritableServices config entry.
  • The mediawiki config is updated to have 3 elastic ttm servers: default (read-only), eqiad and codfw as write only with $wgTranslateTranslationWritableServices = [ 'eqiad', 'codfw' ];

Event Timeline

@Nikerabbit we were wondering if this task could be added to one of your sprints early next year? It's not yet urgent on our side but is blocking T143553.

I added this to our Localisation Infrastructure planning document to be considered for future sprints.

To paraphrase your request to ensure I understand it:

  • It should be possible to define multiple TTM services
  • It should be possible to define which ones are writable
  • It should be possible to define which one is used for read queries (only one?)

There are also the API-based readable TTM services, which we can mostly ignore for this discussion as they are not used currently.

Some questions:

  • Should we remove the mirrors option in favor of this more flexible system, or is it needed for something else?
  • How does etcd work and what are its limitations? E.g. can it be used to change config variables? Is it limited to only string values?
  • We have the Readable/Writable/Searchable interfaces. It feels that these should be removed and converted to instance methods?

> I added this to our Localisation Infrastructure planning document to be considered for future sprints.

Thanks!

> To paraphrase your request to ensure I understand it:
>
>   • It should be possible to define multiple TTM services
>   • It should be possible to define which ones are writable
>   • It should be possible to define which one is used for read queries (only one?)

Yes, all this is correct. I think only one service for read operations is necessary.

> There are also the API-based readable TTM services, which we can mostly ignore for this discussion as they are not used currently.
>
> Some questions:
>
>   • Should we remove the mirrors option in favor of this more flexible system, or is it needed for something else?

Indeed, mirroring becomes obsolete once we have a way to configure multiple writable TTM servers, so I believe it might be worthwhile to deprecate this feature in favor of this new setting.

>   • How does etcd work and what are its limitations? E.g. can it be used to change config variables? Is it limited to only string values?

In fact, etcd will only be used to switch a DNS discovery record; I don't think it can be used to change anything directly in the mediawiki-config yet.
My current understanding is that we will have 3 search endpoints (using fake names here):

  • elasticsearch-ro.discovery.wmnet, which may point to either elasticsearch.eqiad.wmnet or elasticsearch.codfw.wmnet. During normal operations it will be active/active: a MW app server running in codfw will hit elasticsearch.codfw.wmnet when targeting elasticsearch-ro.discovery.wmnet.
  • elasticsearch.eqiad.wmnet
  • elasticsearch.codfw.wmnet

The mediawiki config would have:

$wgTranslateTranslationDefaultService = 'elasticsearch-ro.discovery.wmnet';
$wgTranslateTranslationWritableServices = [ 'elasticsearch.eqiad.wmnet', 'elasticsearch.codfw.wmnet' ];

The mediawiki config does in fact remain untouched when changing the route from elasticsearch-ro.discovery.wmnet to either elasticsearch.eqiad.wmnet or elasticsearch.codfw.wmnet. I mentioned etcd here because I believe it is used to store this information but I don't know all the details.

>   • We have the Readable/Writable/Searchable interfaces. It feels that these should be removed and converted to instance methods?

I agree. If we leave these interfaces around, we might have to instantiate an ElasticSearchTTMServer for the read-only service elasticsearch-ro.discovery.wmnet only for its search behavior (and we really don't want to write to it). I believe it might work, but it is not ideal. Removing these interfaces in favor of instance methods might be more flexible and could allow reusing ElasticSearchTTMServer as a read-only service. Another approach (less nice imo) would be to split ElasticSearchTTMServer into two services, a separate read-only and a writable implementation, and keep these interfaces around.

Nikerabbit moved this task from Backlog to TTMServer on the MediaWiki-extensions-Translate board.
Nikerabbit set the point value for this task to 8.

Thanks for the clarification. So essentially we need to remove the assumption that the default TTM server is writable and allow independent configuration of writable services.

You have proposed a new configuration variable for that. Another option to consider would be having a writable boolean property in the service config itself. Overall, this new system seems both simpler and more flexible than the mirrors property. I do not know if anyone outside of WMF is using the mirrors property, but I think we want to support that simultaneously with the new method to ensure a smooth migration.
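
A minimal sketch of how the per-service writable property might look when applied to the three endpoints discussed earlier. The service names (`ttm-read`, `ttm-eqiad`, `ttm-codfw`) are invented for illustration, the host names are the fake ones used above, and the `config`/`servers` layout for the Elasticsearch client settings is an assumption rather than something confirmed in this task.

```php
<?php
// Read-only entry point; DNS discovery decides which cluster answers.
$wgTranslateTranslationServices['ttm-read'] = [
	'type' => 'ttmserver',
	'class' => 'ElasticSearchTTMServer',
	// Assumed client settings layout; the host is a placeholder from this thread.
	'config' => [ 'servers' => [ [ 'host' => 'elasticsearch-ro.discovery.wmnet' ] ] ],
];

// One write-only service per datacenter (hypothetical names).
$wgTranslateTranslationServices['ttm-eqiad'] = [
	'type' => 'ttmserver',
	'class' => 'ElasticSearchTTMServer',
	'config' => [ 'servers' => [ [ 'host' => 'elasticsearch.eqiad.wmnet' ] ] ],
	'writable' => true,
];

$wgTranslateTranslationServices['ttm-codfw'] = [
	'type' => 'ttmserver',
	'class' => 'ElasticSearchTTMServer',
	'config' => [ 'servers' => [ [ 'host' => 'elasticsearch.codfw.wmnet' ] ] ],
	'writable' => true,
];

// The default service is only used for reads in this arrangement.
$wgTranslateTranslationDefaultService = 'ttm-read';
```

Compared with a separate list variable, this keeps each service's role next to its definition, so adding or removing a write target touches a single array.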

Change 887969 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/Translate@master] TtmServer: Add support for writable key in configuration

https://gerrit.wikimedia.org/r/887969

Change 888010 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/Translate@master] TTMServerMessageUpdateJob: Add support for writable TtmServers

https://gerrit.wikimedia.org/r/888010

Change 890443 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/Translate@master] TTMServerAid: Remove writable services from queryable services

https://gerrit.wikimedia.org/r/890443

Change 891747 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/Translate@master] Deprecate TTMServer::getMirror in favor of writable services

https://gerrit.wikimedia.org/r/891747

We've added a new configuration parameter to define services that are writable. For example:

<?php
$wgTranslateTranslationServices['TTMServer'] = [
	'type' => 'ttmserver',
	'class' => 'ElasticSearchTTMServer',
	'cutoff' => 0.75,
	'use_wikimedia_extra' => true,
	'public' => false,
	'writable' => true // New configuration
];

The following rules are enforced:

  • If writable is specified, services marked as writable are considered write only and others are considered read only.
  • If no service is specified as writable then services are considered both readable and writable.
  • The default service must always be readable.

If a service is marked as writable, the mirrors configuration will not be allowed.

Some example configurations:

Case 1 - Default service is writable and other readable services configured

<?php

$wgTranslateTranslationServices['ttm0'] = [
	'type' => 'ttmserver',
	'class' => 'ElasticSearchTTMServer',
	'writable' => true,
	// ...
];

$wgTranslateTranslationServices['ttm1'] = [
	'type' => 'ttmserver',
	'class' => 'ElasticSearchTTMServer',
	'writable' => true,
	// ...
];

$wgTranslateTranslationServices['ttm2'] = [
	'type' => 'ttmserver',
	'class' => 'ElasticSearchTTMServer',
	// ...
];

$wgTranslateTranslationDefaultService = 'ttm0';

This configuration is not allowed as the default service is marked as writable. The default service must always be readable.

Case 2 - Readable default service and other writable services configured

<?php

$wgTranslateTranslationServices['ttm0'] = [
	'type' => 'ttmserver',
	'class' => 'ElasticSearchTTMServer',
	// ...
];

$wgTranslateTranslationServices['ttm1'] = [
	'type' => 'ttmserver',
	'class' => 'ElasticSearchTTMServer',
	'writable' => true,
	// ...
];

$wgTranslateTranslationServices['ttm2'] = [
	'type' => 'ttmserver',
	'class' => 'ElasticSearchTTMServer',
	// ...
];

$wgTranslateTranslationDefaultService = 'ttm0';

For writes, ttm1 will be used. For reads, ttm0 and ttm2 can be used.

Case 3 - No explicitly writable service configured

<?php

$wgTranslateTranslationServices['ttm0'] = [
	'type' => 'ttmserver',
	'class' => 'ElasticSearchTTMServer',
	// ...
];

$wgTranslateTranslationServices['ttm2'] = [
	'type' => 'ttmserver',
	'class' => 'ElasticSearchTTMServer',
	// ...
];

$wgTranslateTranslationDefaultService = 'ttm0';

For writes, the default translation service (ttm0) will be used even though it's not explicitly marked as writable, provided it implements the WritableTtmServer interface.
For reads, ttm0 and ttm2 will be used.

Case 4 - No readable service configured

<?php

$wgTranslateTranslationServices['ttm0'] = [
	'type' => 'ttmserver',
	'class' => 'ElasticSearchTTMServer',
	'writable' => true,
	// ...
];

$wgTranslateTranslationServices['ttm1'] = [
	'type' => 'ttmserver',
	'class' => 'ElasticSearchTTMServer',
	'writable' => true,
	// ...
];

No readable service configured and no default service specified. For writes, ttm0 and ttm1 will be used but translation memory will not be available since there are no readable services.

Case 5 - Single service - read and write

<?php

$wgTranslateTranslationServices['ttm0'] = [
	'type' => 'ttmserver',
	'class' => 'ElasticSearchTTMServer',
	// ...
];

$wgTranslateTranslationDefaultService = 'ttm0';

Same service ttm0 used for reading and writing.

Case 6 - Multiple services with single readable service

<?php
$wgTranslateTranslationServices['dc0'] = [
	'type' => 'ttmserver',
	'class' => 'ElasticSearchTTMServer',
	// ...
];

$wgTranslateTranslationServices['dc1'] = [
	'type' => 'ttmserver',
	'class' => 'ElasticSearchTTMServer',
	'writable' => true,
	// ...
];

$wgTranslateTranslationServices['dc2'] = [
	'type' => 'ttmserver',
	'class' => 'ElasticSearchTTMServer',
	'writable' => true,
	// ...
];

$wgTranslateTranslationDefaultService = 'dc0';

In this case dc0 will be the readable server, and the others (dc1, dc2) will be write only.

I think you are missing examples for the two cases that I think are going to be most common:

  1. Only one service configured (read + write)
  2. 3+ services configured, where default service is read-only with write-only services for each dc

Change 903688 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/Translate@master] TtmServerFactory: Rename getDefault to getDefaultForRead

https://gerrit.wikimedia.org/r/903688

Change 887969 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] TtmServer: Add support for writable key in configuration

https://gerrit.wikimedia.org/r/887969

Change 888010 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] TTMServerMessageUpdateJob: Add support for writable TtmServers

https://gerrit.wikimedia.org/r/888010

Change 903688 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] TtmServerFactory: Rename getDefault to getDefaultForRead

https://gerrit.wikimedia.org/r/903688

Change 890443 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] TTMServerAid: Remove writable services from queryable services

https://gerrit.wikimedia.org/r/890443

Change 891747 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] Deprecate TTMServer::getMirror in favor of writable services

https://gerrit.wikimedia.org/r/891747

Change 911758 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/Translate@master] DatatabaseTtmServer: Use MessageHandle in batchInsertDefinitions

https://gerrit.wikimedia.org/r/911758

Change 911758 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] DatatabaseTtmServer: Use MessageHandle in batchInsertDefinitions

https://gerrit.wikimedia.org/r/911758

The patches for this change have been deployed on Meta-Wiki and other Wikimedia sites with wmf/1.41.0-wmf.6.

I see that the suggestions from translation memory appear, and Special:SearchTranslation is continuing to work.

Pending work on this:

  • Update documentation
  • Test by configuring readable and writable ttm services separately

@dcausse We've added a new writable configuration for translation memory. The translation memory services marked as writable will be write only. You can find more details in the documentation here: https://www.mediawiki.org/wiki/Help:Extension:Translate/Translation_memories#Configuration (search for the writable keyword). Additionally, this comment on Phabricator has more examples.

Let me know if you have any questions.

We did a lot of refactoring along the way while working on this feature. These changes have been on Wikimedia sites for a while, and the existing functionality is not affected. But we have not tested the new writable configuration on production yet.

@abi_ thanks for all the work on this!

We'll get a config patch up shortly to enable this new config system.

Change 922481 had a related patch set uploaded (by DCausse; author: DCausse):

[operations/mediawiki-config@master] ttm: use new config option to separate readable and writable services

https://gerrit.wikimedia.org/r/922481

Change 922481 merged by jenkins-bot:

[operations/mediawiki-config@master] ttm: use new config option to separate readable and writable services

https://gerrit.wikimedia.org/r/922481

Mentioned in SAL (#wikimedia-operations) [2023-06-06T07:27:03Z] <dcausse@deploy1002> Started scap: Backport for [[gerrit:922481|ttm: use new config option to separate readable and writable services (T322284)]]

Mentioned in SAL (#wikimedia-operations) [2023-06-06T07:28:24Z] <dcausse@deploy1002> dcausse: Backport for [[gerrit:922481|ttm: use new config option to separate readable and writable services (T322284)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-06-06T07:42:24Z] <dcausse@deploy1002> Finished scap: Backport for [[gerrit:922481|ttm: use new config option to separate readable and writable services (T322284)]] (duration: 15m 20s)

The configuration changes to use the new writable option have been deployed on production. We tested the following:

  • SearchTranslations page
  • Translation memory appears on the tux window
  • When using a suggested translation, the number of uses of that translation increases in the translation memory suggestions

Moving this task to the done column, but leaving it open for a few days to monitor.

Logs look clean, and no errors on Logstash.