When translating on Meta, I noticed that the translation memory does not always show the suggestions I expect. Furthermore, I sometimes get a different set of suggestions for the same string.
The reason is that a TTMServer translation memory query consists of two parts: first, find all matching source texts and order them by edit distance; then, fetch the translations of those strings, if present. To simplify a bit, let's assume the following:
- the number of results for the first query is large, say N(all) > 1000
- the number of perfect matches is also large, N(perfect) > 200
- the number of results for the second query is small, M < 30, even if we inspected all N results
Let each message be identified as Nₙ; if it has a corresponding translation in the target language, I will call that translation Mₙ. The actual code fetches 50 results, but to keep it simple let's fetch 5. Let's search for the string “user” and assume that N₁...N₂₀ are perfect matches. Since we order only by score, there is no guarantee that we will always get N₁...N₅ as the result. Say we get [N₆, N₃, N₈, N₁, N₂]. Then, if only N₇ has a translation M₇, this query returns no suggestions at all, even though a perfect match with a translation exists.
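The following Python sketch models the example above. It is not TTMServer's code: the function `two_phase_lookup` and its parameters are hypothetical, and the backend's arbitrary ordering of equally scored results is simulated by shuffling before a stable sort. Repeating the same query shows that M₇ is suggested only when N₇ happens to land among the top 5.

```python
import random

def two_phase_lookup(matches, translations, limit, rng):
    """Hypothetical model of the current two-phase query.

    matches:      list of (message_id, score) for every matching source text
    translations: dict of message_id -> translation in the target language
    limit:        how many source matches the first query returns
    """
    # Phase 1: order by score only. Ties are broken arbitrarily, which is
    # modelled here by shuffling before a stable sort on score.
    candidates = list(matches)
    rng.shuffle(candidates)
    candidates.sort(key=lambda m: m[1], reverse=True)
    first = candidates[:limit]
    # Phase 2: keep only the candidates that have a translation.
    return [(mid, translations[mid]) for mid, _ in first if mid in translations]

# Worked example: N1..N20 are perfect matches, N21..N1200 are weaker matches.
matches = [(f"N{i}", 1.0) for i in range(1, 21)]
matches += [(f"N{i}", 0.5) for i in range(21, 1201)]
translations = {"N7": "M7"}  # only N7 has a translation

# Run the identical query 100 times with different tie-breaking.
results = [two_phase_lookup(matches, translations, 5, random.Random(seed))
           for seed in range(100)]
empty = sum(1 for r in results if not r)
print(f"{empty} of 100 identical queries returned no suggestions")
```

With 20 tied perfect matches and a limit of 5, N₇ makes the cut only about a quarter of the time, so most runs return nothing and the output differs between runs.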
A simple solution is to increase the number of results we fetch in the first query, but this only pushes the problem further away. A better solution is to fetch more results iteratively until some condition holds. The least efficient condition would be to keep fetching until the score drops below a given threshold. A slightly more intelligent condition is to fetch at least all results whose score is greater than or equal to the lowest score among the results of the first query. This ensures that, if N stays unchanged, we will always get all the suggestions we can currently get. To do this iteratively, the results must be sorted consistently, so we need a secondary sort key besides the score.
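A minimal Python sketch of the proposed fix, assuming the first query can be paged with an offset. `fetch_page`, `iterative_lookup`, and the use of the message id as the secondary sort key are illustrative choices, not TTMServer's API. The threshold is the lowest score on the first page, and fetching continues while the previous page still ended at that score.

```python
def fetch_page(matches, offset, size):
    """Hypothetical paged first query: score descending, then message id as
    a secondary key, so the order is identical on every call."""
    ordered = sorted(matches, key=lambda m: (-m[1], m[0]))
    return ordered[offset:offset + size]

def iterative_lookup(matches, translations, page_size):
    page = fetch_page(matches, 0, page_size)
    if not page:
        return []
    # Pages are sorted by descending score, so the last item holds the
    # lowest score of the first page; use it as the threshold.
    threshold = page[-1][1]
    collected = list(page)
    offset = page_size
    # A full page ending at the threshold means more ties may follow.
    while len(page) == page_size and page[-1][1] >= threshold:
        page = fetch_page(matches, offset, page_size)
        collected.extend(m for m in page if m[1] >= threshold)
        offset += page_size
    # Second query: keep only the collected matches that have a translation.
    return [(mid, translations[mid]) for mid, _ in collected if mid in translations]

matches = [(f"N{i}", 1.0) for i in range(1, 21)]
matches += [(f"N{i}", 0.5) for i in range(21, 1201)]
translations = {"N7": "M7"}  # only N7 has a translation
print(iterative_lookup(matches, translations, page_size=5))  # [('N7', 'M7')]
```

Because the secondary key makes the order total, consecutive pages never skip or repeat a tied result, and all 20 perfect matches are inspected every time, so the query deterministically returns M₇.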
As a translator, I can rely on translation memory suggestions, so that I can translate faster and more consistently, and I can be sure that a lack of suggestions means there are no similar translations.
- All translation suggestions that could currently be shown are shown
- Assuming N does not change, we always show the same results