
TTMServer performance and coverage issues
Open, MediumPublic1 Estimated Story Points

Description

The latest fixes to TTMServer, made some months ago, are not enough. During a translation rally at translatewiki.net, the translation memory used too much CPU time. In addition, there have been reports and observations that suggestions are not found, for example when translating Tech News, which has many repeating parts.

During the Lyon hackathon I spoke with David Chan, who suggested replacing the current FuzzyLikeThis query with matching some n-grams from the beginning and end of the strings. Those would need to be stored separately at indexing time unless there is a way to instruct Elasticsearch to do it for us. In any case, short strings of one to three words need special attention.

It seems that the current performance bottleneck is fetching too many string contents for comparison and scoring, not the scoring itself.
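The edge n-gram idea could look something like the following sketch (this is illustrative only, not actual TTMServer code): character n-grams are taken from the beginning and end of each string at indexing time, and n-gram overlap serves as a cheap pre-filter so that expensive edit-distance scoring only runs on a handful of candidates.

```python
# Sketch of the proposed approach (assumption: not actual TTMServer code).
# Store character n-grams from the beginning and end of each string at
# indexing time, then use n-gram overlap as a cheap pre-filter so that
# expensive edit-distance scoring only needs to run on a few candidates.

def edge_ngrams(text, n=3, edge=15):
    """Character n-grams taken from the first and last `edge` characters."""
    text = text.lower()
    if len(text) <= n:
        # Short one-to-three word strings need special attention:
        # fall back to the whole string as a single "gram".
        return {text}
    head, tail = text[:edge], text[-edge:]
    grams = set()
    for chunk in (head, tail):
        for i in range(len(chunk) - n + 1):
            grams.add(chunk[i:i + n])
    return grams

def candidates(query, index, min_overlap=0.3):
    """Return indexed strings whose edge n-grams overlap enough with the query."""
    q = edge_ngrams(query)
    out = []
    for doc, grams in index.items():
        # Jaccard similarity of the two n-gram sets.
        overlap = len(q & grams) / max(len(q | grams), 1)
        if overlap >= min_overlap:
            out.append(doc)
    return out

# Build a toy index and pre-filter before any expensive scoring.
docs = ["Latest tech news from the Wikimedia community",
        "You can read translations",
        "Completely unrelated sentence"]
index = {d: edge_ngrams(d) for d in docs}
print(candidates("Latest tech news from the technical community", index))
```

Only the surviving candidates would then be fetched and scored with an edit-distance measure, which matches the observation that fetching too many strings, not scoring, is the bottleneck.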

Related Objects

Event Timeline


We should probably let users know that starting next week, thanks to @Phoenix303, they should get faster translation suggestions and that they should report any weirdness.

Arrbee removed a project: LE-Sprint-88.

I do not experience a notable speedup with TM suggestions.

At least when translating the weekly tech newsletter in MW: invariant or next-to-invariant strings of more than, say, 5 characters in length are never found in the TM. Maybe this is another issue that has to be investigated separately.

Nemo_bis added a subscriber: dcausse.

I think one of the problems with this function is that it uses very slow Elasticsearch functionality:

  • fuzzy like this: deprecated and will be removed in Elasticsearch 2.0 due to performance issues
  • function score on all docs: the Levenshtein distance is applied to all docs returned by the fuzzy like this. We could optimize this part by running the function score inside the rescore phase, which would allow us to compute the distance on a limited number of docs thanks to the rescore window.

Concerning the fuzzy like this, I would suggest investigating another approach based on character n-grams.
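The rescore idea above can be sketched as an Elasticsearch query body (shown here as a Python dict; the field name `content` and the scoring script are placeholders, not the actual TTMServer query): a cheap recall-oriented first phase selects candidates, and the expensive scoring is confined to the top `window_size` hits via the rescore phase.

```python
# Sketch of the rescore-phase idea (assumption: field names and the scoring
# script are illustrative placeholders, not the real TTMServer query).
query_body = {
    "query": {
        # Cheap first phase: a plain full-text match instead of the
        # deprecated fuzzy_like_this query.
        "match": {"content": "Latest tech news from the Wikimedia community"}
    },
    "rescore": {
        # Expensive scoring runs only on the top 50 hits per shard,
        # instead of on every document the first phase returns.
        "window_size": 50,
        "query": {
            "rescore_query": {
                "function_score": {
                    "script_score": {
                        # Placeholder for a Levenshtein-style scoring script.
                        "script": {"source": "/* edit-distance scoring here */ 1.0"}
                    }
                }
            },
            # Discard the first-phase score; keep only the rescore result.
            "query_weight": 0,
            "rescore_query_weight": 1
        }
    }
}
```

The key design point is that the Levenshtein-style computation no longer touches every matching document, only the rescore window.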

I'm having problems with the December issue of the VE newsletter. I was able to get the translation memory just once, but after fixing something in the source code and going back to translate, the TM isn't coming back for me :/ Anything I could do?


On what kind of translation units were you looking for suggestions? I only see very long units which change every time, or very short recurring items (mostly the headers), for which the TM currently works for me.

Even the headers were failing for me.

Translations:VisualEditor/Newsletter/2016/February/38/it AFAICT is the same as in the previous newsletter, and it isn't suggested by the system ATM.

Yes, it's identical: https://meta.wikimedia.org/?diff=15373013&oldid=15166609
The unit is rather long and the "Loading..." for translation memory suggestions systematically times out after 10 seconds for me.

For clarification, the normal priority is relative to other tasks in Language-Engineering April-June 2016 and does not mean this task would suddenly not be important.

Deskana added a subscriber: Deskana.

This doesn't contain actionable work for Discovery-Search so I have removed that tag; let us know if you need input on this task, and we will happily provide it.

I think this is a bigger issue for e.g. Tech News than one would first assume.

We have a couple of items that are specifically designed to be exactly the same every week, to make it easier for the translators. They are more complicated the first time you translate them, because you might need to figure out how to best represent dates in your language, but this should save time and effort in the long run. Or so goes the theory. But if the suggestions don't appear, then you have complicated items where you either have to go back to another issue and copy and paste, remember what to do with the code, or be familiar enough with what it's doing that you realize you can simply remove it and exchange it for dates in normal text.

Examples:

<translate><!--T:20-->
You can join the next meeting with the VisualEditor team. During the meeting, you can tell developers which bugs you think are the most important. The meeting will be on [<tvar|time>http://www.timeanddate.com/worldclock/fixedtime.html?hour=20&min=00&sec=0&day=14&month=02&year=2017</> {{#time:<tvar|defaultformat>j xg</>|<tvar|date4>2017-02-14</>|<tvar|format_language_code>{{CURRENTCONTENTLANGUAGE}}</>}} at 20:00 (UTC)]. See [[<tvar|link>mw:VisualEditor/Weekly triage meetings</>|how to join]].</translate>
<translate><!--T:41-->
The [[<tvar|version>mw:MediaWiki 1.29/wmf.12</>|new version]] of MediaWiki will be on test wikis and MediaWiki.org from {{#time:<tvar|defaultformat>j xg</>|<tvar|date1>2017-02-14</>|<tvar|format_language_code>{{CURRENTCONTENTLANGUAGE}}</>}}. It will be on non-Wikipedia wikis and some Wikipedias from {{#time:<tvar|defaultformat>j xg</>|<tvar|date2>2017-02-15</>|<tvar|format_language_code>{{CURRENTCONTENTLANGUAGE}}</>}}. It will be on all wikis from {{#time:<tvar|defaultformat>j xg</>|<tvar|date3>2017-02-16</>|<tvar|format_language_code>{{CURRENTCONTENTLANGUAGE}}</>}} ([[<tvar|calendar>mw:MediaWiki 1.29/Roadmap</>|calendar]]).</translate>

I translate the tech news almost every week, and it's quite frustrating when you can't get the correct suggestions from the translation memory. Instead you have to go to the previous tech news and copy-paste the correct content from there. In my opinion this should get higher priority, as it affects many users globally and makes translating time consuming.

There are a bunch of reports again that TTMServer doesn't work. From the API response for a frequently appearing long paragraph, I can see it is spending a lot of time in TTMServer (21.37 seconds) without returning any results. Assuming nothing changed in the Elasticsearch cluster, it looks like we have crossed some kind of threshold.

@Nikerabbit What are the long-term implications of this, if we have indeed crossed said threshold? What can we expect?

(If it's possible to say, I mean; I do understand that "some kind of threshold" isn't very exact.)

So, the only recent change in Translate is c55a8ee – I don't see how it could cause this, but maybe @dcausse would know that, or whether there have been any changes in the Elasticsearch cluster that could cause TTMServer to work poorly.

If this is not caused by any external changes, it means that the algorithm has stopped working (what I called the threshold) for some reason such as:

  1. The amount of data has increased to the extent that the search is now taking too long and timing out.
  2. The amount of data has increased to the extent that the algorithm, which only loads a subset of it, incorrectly guesses based on the first subset to not load a second, larger subset.
  3. The amount of data has increased to the extent that the algorithm, even when loading more data (but still only a subset), fails to find matches because it loads a poorly selected and/or too small subset of the data.

I won't be able to debug this extensively until August. Once we understand the issue, we can attempt some small tweaks, if possible. If larger changes are required, I expect those could be started in FYQ2 at the earliest by me/the Language team, maybe earlier if we get help and/or TTMServer stops working completely and this task gets re-prioritized.

I pulled latency numbers from the apiaction logs. Overall it doesn't look like performance has changed in any noticeable way in the last 90 days on the wmf prod cluster. The y axis here is milliseconds for all lines except n_requests, where it is an absolute count per day:

[Chart: translation aids latency percentiles, Apr 6 – Jul 5 2018]

Generated using the following HQL. This is the time for all translationaids requests, but at a glance it seems the time is spent almost entirely in ttmserver.

SELECT date(concat_ws('-', YEAR, MONTH, DAY)) AS date,
       count(1) AS n_requests,
       percentile_approx(timespentbackend, array(0.5, 0.75, 0.95, 0.99, 0.999)) AS percentiles
FROM wmf_raw.apiaction
WHERE YEAR = '2018'
  AND params['action'] = 'translationaids'
  AND (params['prop'] IS NULL
       OR params['prop'] LIKE '%ttmserver%')
GROUP BY YEAR,
         MONTH,
         DAY

I'm noting for documentation purposes that multiple people are again complaining about translation memory not working when translating tech news.

This task is part of our annual goals. The plan is that we will take a deep look at how this could either be improved incrementally or re-architected to solve the performance issues.

Regarding the Tech News items that are designed to be exactly the same every week: I made https://meta.wikimedia.org/wiki/Template:SALT to simplify this. You enter {{subst:SALT}} as the translation and it'll substitute the contents of that section from the issue of the previous week, or the week before that, or fallback content from the subpages of the template.

It's a workaround, but strangely faster than using the translation memory even when the translation memory DOES work. With the translation memory, you:

  1. Have to wait for suggestions (no waiting for SALT)
  2. Check if the top suggestion isn't a moronic one because the suggestions are served in random order (SALT will never insert outdated crap)
  3. Go over to the right, click or tap the suggestion and be frustrated that the suggestion isn't loaded if you only managed to hit the box but not the text (SALT requires no selection of anything)
  4. Go back to the left to publish the translation (never had to leave the left for SALT!)

This would be even better if a single button gadget could insert "{{subst:SALT}}", but I can't seem to figure this one out because the Translate extension seems to generate the form entirely with JS.


https://gerrit.wikimedia.org/g/mediawiki/extensions/Translate/+/335a7dc01152d96627b3dce1d2104f129b82825d/hooks.txt#120 may help.

For the first time in years, I got a memory suggestion for the "Latest tech news from…" unit. Has any work been done on this recently?
Maybe T308676: Elasticsearch 7.10.2 rollout plan?


Indeed we did upgrade the cluster recently, but we did not expect any changes to translation memory. We haven't seen a noticeable perf improvement overall, but perhaps this upgrade had a very positive impact on translation memory performance (maybe WAND?).

An interesting feature request from the last merged task:

Suggestions should not be weighted only by number of times used but also by date; recent translations should be given more weight.
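That request could be sketched as follows (a hypothetical scoring function, not anything TTMServer implements; the one-year half-life is an arbitrary assumption): the usage count is decayed exponentially by the age of the translation, so a recent translation with modest usage can outrank a heavily used but stale one.

```python
# Hypothetical scoring sketch for the feature request above: weight a
# suggestion by how often it was used, decayed by how old it is.
# The one-year half-life is an assumption, not a TTMServer setting.
def suggestion_score(usage_count, age_days, half_life_days=365):
    """Usage count decayed exponentially: a year-old use counts half."""
    return usage_count * 0.5 ** (age_days / half_life_days)

# A recent translation used 10 times outranks a four-year-old
# translation used 100 times.
recent = suggestion_score(10, age_days=0)        # 10.0
stale = suggestion_score(100, age_days=4 * 365)  # 100 * 0.5**4 = 6.25
print(recent > stale)
```

The half-life parameter controls the trade-off between popularity and freshness; a longer half-life favors frequently used suggestions, a shorter one favors recent ones.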