Translation memory lost
Closed, ResolvedPublic

Description

Meta Wiki currently seems to have completely lost its translation memory, which was only partial to begin with.

https://meta.wikimedia.org/w/index.php?title=Special:Translate&group=page-Tech%2FNews%2F2015%2F40&action=page&filter=&optional=1 does not show a single suggestion, although in the past at least the short recurring strings of all older newsletters were present.

Recurring or similar messages of more than 3 or 4 words are not suggested, and never were.

Event Timeline

Purodha raised the priority of this task from to Needs Triage.
Purodha updated the task description.
Purodha subscribed.

@dcausse Could this be related to your change to use the plugin? Translation search is working as expected.

I can see in the logs that we are receiving queries with the old inline groovy script:

Caused by: org.elasticsearch.search.SearchParseException: [ttmserver][0]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query":{"filtered":{"query":{"function_score":{"boost_mode":"replace","filter":{"query":{"fuzzy_like_this":{"fields":["content"],"boost":1,"min_similarity":0.5,"like_text":"Except as explained below, this Privacy Policy applies to our collection and handling of information about you that we receive as a result of your use of any of the Wikimedia Sites. This Policy also applies to information that we receive from our partners or other third parties. To understand more about what this Privacy Policy covers, please see below.","prefix_length":0,"ignore_tf":false,"max_query_terms":25}}},"functions":[{"script_score":{"script":"import org.apache.lucene.search.spell.*\nnew LevensteinDistance().getDistance(srctxt, _source['content'])","params":{"srctxt":"Except as explained below, this Privacy Policy applies to our collection and handling of information about you that we receive as a result of your use of any of the Wikimedia Sites. This Policy also applies to information that we receive from our partners or other third parties. To understand more about what this Privacy Policy covers, please see below."},"lang":"groovy"}}]}},"filter":{"term":{"language":"en"}}}},"from":0,"size":100,"_source":["content"],"min_score":0.65,"sort":["_score","_uid"]}]]
        at org.elasticsearch.search.SearchService.parseSource(SearchService.java:747)
        at org.elasticsearch.search.SearchService.createContext(SearchService.java:572)
        at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:544)
        at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:385)
        at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryFetchTransportHandler.messageReceived(SearchServiceTransportAction.java:833)
        at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryFetchTransportHandler.messageReceived(SearchServiceTransportAction.java:824)
        at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.query.QueryParsingException: [ttmserver] script_score the script could not be loaded
        at org.elasticsearch.index.query.functionscore.script.ScriptScoreFunctionParser.parse(ScriptScoreFunctionParser.java:93)
        at org.elasticsearch.index.query.functionscore.FunctionScoreQueryParser.parseFiltersAndFunctions(FunctionScoreQueryParser.java:217)
        at org.elasticsearch.index.query.functionscore.FunctionScoreQueryParser.parse(FunctionScoreQueryParser.java:122)
        at org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:305)
        at org.elasticsearch.index.query.FilteredQueryParser.parse(FilteredQueryParser.java:71)
        at org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:305)
        at org.elasticsearch.index.query.IndexQueryParserService.innerParse(IndexQueryParserService.java:382)
        at org.elasticsearch.index.query.IndexQueryParserService.parse(IndexQueryParserService.java:281)
        at org.elasticsearch.index.query.IndexQueryParserService.parse(IndexQueryParserService.java:276)
        at org.elasticsearch.search.query.QueryParseElement.parse(QueryParseElement.java:33)
        at org.elasticsearch.search.SearchService.parseSource(SearchService.java:731)
        ... 10 more
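
For readers unfamiliar with the script in the failing query above: it calls Lucene's `LevensteinDistance.getDistance`, which, as far as I know, returns a normalized similarity between 0 and 1 rather than a raw edit count. The query then drops anything below `min_score` 0.65. Below is a minimal Python sketch of that scoring under this assumption. The function names are illustrative only and are not part of Translate, Elasticsearch or the extra plugin.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance normalized by the longer string: 1.0 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein_distance(a, b) / longest


MIN_SCORE = 0.65  # threshold taken from the query in the log above

if __name__ == "__main__":
    for candidate in ("Read more", "Read mor", "Something else entirely"):
        score = similarity("Read more", candidate)
        print(f"{candidate!r}: {score:.3f} kept={score >= MIN_SCORE}")
```

Because the distance is computed over the full text of both strings, the cost grows with the product of their lengths, which would be consistent with long messages being the slow ones.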

We merged this patch https://gerrit.wikimedia.org/r/#/c/238446/1/wmf-config/CommonSettings.php to activate the plugin...
I don't know why the config change was not taken into account, maybe it's wrong or there's another ttmserver config elsewhere?

A quick check on terbium shows that the config is properly deployed, but the 'use_wikimedia_extra' option should not be under 'config'.

Change 241019 had a related patch set uploaded (by Nikerabbit):
Fix TTMServer config to use the extra plugin

https://gerrit.wikimedia.org/r/241019

Nikerabbit triaged this task as High priority.

I will schedule this for SWAT on Monday if nobody else does.

revi renamed this task from Translation memory lost in Meta Wiki to Translation memory lost. Sep 28 2015, 11:17 AM
revi added subscribers: revi, Jalexander, Varnent.

I can reproduce this on TWN and Commons, so this is not meta-specific.

TM is working fine on translatewiki.net.

Sorry, I forgot to add this to morning SWAT today. If nobody adds this to evening SWAT today, I will do it in morning SWAT tomorrow.

Change 241019 merged by jenkins-bot:
Fix TTMServer config to use the extra plugin

https://gerrit.wikimedia.org/r/241019

They initially take many seconds to load, but I see suggestions.

Appears to be working for me again as well. Thank you!

revi removed a project: Patch-For-Review.
revi added subscribers: Stryn, Lsanabria, Johan.

Looks like it's happening again?

I can't see any errors (elasticsearch), ttmserver config seems to be good.
It's slow but I can see suggestions on https://meta.wikimedia.org/w/index.php?title=Special:Translate&group=page-Tech%2FNews%2F2015%2F40&action=page&filter=&optional=1&language=fr .

Do you have a link where it does not work so I can check server logs?

It takes about 10 seconds to load any translation aid on longer messages but I do see translation memory suggestions for shorter ones, both in the original URL and in the one Purodha provided.

Maybe it's a performance issue? Unfortunately I've never used ttmserver before we implemented the plugin function (to remove dynamic scripting from the eqiad cluster) so it's hard for me to evaluate if this new function has severe perf drawbacks vs the old groovy script.
AFAICT it should be the same but I may have done something wrong.

I think one of the problems with this function is that it relies on very slow Elasticsearch features:

  • fuzzy like this: deprecated and will be removed in Elasticsearch 2.0 due to perf issues
  • function score on all docs: the Levenshtein distance is computed for every doc returned by the fuzzy like this. We could optimize this by running the function score inside the rescore phase, so that the distance is only computed on a limited number of docs (the rescore window).

As for the fuzzy like this, I would suggest investigating another approach based on character n-grams.
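
To make the two suggestions concrete, here is a rough Python sketch of the idea, not of the actual Elasticsearch query: rank candidates with a cheap character n-gram overlap first, then compute the expensive Levenshtein similarity only on the top `window` candidates. All names (`char_ngrams`, `suggest`, `window`) and the trigram/Jaccard choice are assumptions made for illustration. Only the 0.65 threshold comes from the existing query.

```python
def char_ngrams(text, n=3):
    """Set of character n-grams, padded with spaces so word edges count."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_overlap(a, b):
    """Cheap first-pass score: Jaccard overlap of the two n-gram sets."""
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def edit_similarity(a, b):
    """Expensive second-pass score: normalized Levenshtein similarity."""
    if not a and not b:
        return 1.0
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return 1.0 - previous[-1] / max(len(a), len(b))


def suggest(source, memory, window=50, min_score=0.65):
    """Return (score, text) pairs, best first, scoring only `window` docs."""
    # Phase 1: cheap ranking over every candidate.
    ranked = sorted(memory, key=lambda c: ngram_overlap(source, c), reverse=True)
    # Phase 2: the costly distance runs on the top `window` candidates only.
    rescored = [(edit_similarity(source, c), c) for c in ranked[:window]]
    return sorted((pair for pair in rescored if pair[0] >= min_score), reverse=True)


if __name__ == "__main__":
    memory = ["Read more", "Read mor", "Read more about this",
              "Completely unrelated sentence"]
    for score, text in suggest("Read more", memory, window=2):
        print(f"{score:.3f}  {text}")
```

The trade-off is the usual one for a rescore window: a good match that ranks poorly in the cheap first pass never reaches the second pass, so the window size would need tuning against real translation memory data.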

I'm moving this comment to T101236. I don't see evidence that translation memory is generally failing: closing this again for now.

(FWIW I added my comment here.)