Translation memory lost
Closed, Resolved, Public

Assigned To

Authored By

	• Purodha
	Sep 25 2015, 9:48 AM

Description

Meta-Wiki currently seems to have completely lost its translation memory, which was only ever partial.

https://meta.wikimedia.org/w/index.php?title=Special:Translate&group=page-Tech%2FNews%2F2015%2F40&action=page&filter=&optional=1 does not show a single suggestion, although in the past at least the short recurring strings of all older newsletters were present.

Recurring or similar messages of more than 3 or 4 words are not suggested, and never were.

Details

	Subject	Repo	Branch	Lines +/-
	Fix TTMServer config to use the extra plugin	operations/mediawiki-config	master	+1 -1


Related Objects

Mentioned In: T101236: TTMServer performance and coverage issues
rOMWC7ed97dd367c0: Fix TTMServer config to use the extra plugin
Mentioned Here: T101236: TTMServer performance and coverage issues

Event Timeline

• Purodha created this task. Sep 25 2015, 9:48 AM

• Purodha raised the priority of this task from to Needs Triage.

• Purodha updated the task description.

• Purodha added a project: MediaWiki-extensions-Translate.

• Purodha subscribed.

Restricted Application added a subscriber: Aklapper. Sep 25 2015, 9:48 AM

• Purodha updated the task description. Sep 25 2015, 11:20 AM

• Purodha set Security to None.

• Purodha added subscribers: Nikerabbit, siebrand, Nemo_bis.

@dcausse Could this be related to your change to use the plugin? Translation search is working as expected.

I can see in the logs that we are receiving queries with the old inline groovy script:

Caused by: org.elasticsearch.search.SearchParseException: [ttmserver][0]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query":{"filtered":{"query":{"function_score":{"boost_mode":"replace","filter":{"query":{"fuzzy_like_this":{"fields":["content"],"boost":1,"min_similarity":0.5,"like_text":"Except as explained below, this Privacy Policy applies to our collection and handling of information about you that we receive as a result of your use of any of the Wikimedia Sites. This Policy also applies to information that we receive from our partners or other third parties. To understand more about what this Privacy Policy covers, please see below.","prefix_length":0,"ignore_tf":false,"max_query_terms":25}}},"functions":[{"script_score":{"script":"import org.apache.lucene.search.spell.*\nnew LevensteinDistance().getDistance(srctxt, _source['content'])","params":{"srctxt":"Except as explained below, this Privacy Policy applies to our collection and handling of information about you that we receive as a result of your use of any of the Wikimedia Sites. This Policy also applies to information that we receive from our partners or other third parties. To understand more about what this Privacy Policy covers, please see below."},"lang":"groovy"}}]}},"filter":{"term":{"language":"en"}}}},"from":0,"size":100,"_source":["content"],"min_score":0.65,"sort":["_score","_uid"]}]]
        at org.elasticsearch.search.SearchService.parseSource(SearchService.java:747)
        at org.elasticsearch.search.SearchService.createContext(SearchService.java:572)
        at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:544)
        at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:385)
        at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryFetchTransportHandler.messageReceived(SearchServiceTransportAction.java:833)
        at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryFetchTransportHandler.messageReceived(SearchServiceTransportAction.java:824)
        at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.query.QueryParsingException: [ttmserver] script_score the script could not be loaded
        at org.elasticsearch.index.query.functionscore.script.ScriptScoreFunctionParser.parse(ScriptScoreFunctionParser.java:93)
        at org.elasticsearch.index.query.functionscore.FunctionScoreQueryParser.parseFiltersAndFunctions(FunctionScoreQueryParser.java:217)
        at org.elasticsearch.index.query.functionscore.FunctionScoreQueryParser.parse(FunctionScoreQueryParser.java:122)
        at org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:305)
        at org.elasticsearch.index.query.FilteredQueryParser.parse(FilteredQueryParser.java:71)
        at org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:305)
        at org.elasticsearch.index.query.IndexQueryParserService.innerParse(IndexQueryParserService.java:382)
        at org.elasticsearch.index.query.IndexQueryParserService.parse(IndexQueryParserService.java:281)
        at org.elasticsearch.index.query.IndexQueryParserService.parse(IndexQueryParserService.java:276)
        at org.elasticsearch.search.query.QueryParseElement.parse(QueryParseElement.java:33)
        at org.elasticsearch.search.SearchService.parseSource(SearchService.java:731)
        ... 10 more
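For readers unfamiliar with the rejected query, here is a minimal sketch of what its inline script appears to compute. It assumes that Lucene's `LevensteinDistance.getDistance` returns a similarity in [0, 1], normalized by the length of the longer string, and it reuses the `min_score` of 0.65 from the logged query. The function names are hypothetical, and this is not the TTMServer or plugin implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def suggestions(source: str, memory: list, min_score: float = 0.65) -> list:
    """Score every stored string and keep those at or above min_score."""
    scored = [(similarity(source, candidate), candidate) for candidate in memory]
    kept = [pair for pair in scored if pair[0] >= min_score]
    # Best score first; ties broken by text so the order is deterministic.
    return sorted(kept, key=lambda pair: (-pair[0], pair[1]))
```

Note that every candidate passed in is scored, which matches the "function score on all docs" cost discussed later in this thread.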

We merged this patch https://gerrit.wikimedia.org/r/#/c/238446/1/wmf-config/CommonSettings.php to activate the plugin...
I don't know why the config change was not taken into account. Maybe it's wrong, or there's another ttmserver config elsewhere?

A quick check on terbium shows that the config is properly deployed, but the 'use_wikimedia_extra' option should not be under 'config'.

I uploaded https://gerrit.wikimedia.org/r/#/c/241019/ that should fix the problem.
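The misplacement described above can be illustrated with a small sketch. The real setting is a PHP array in wmf-config; the Python dictionaries below only mirror its shape. Only the `use_wikimedia_extra` and `config` keys come from this thread. The other keys, the host name, and the helper names are hypothetical.

```python
def find_misplaced_options(service: dict,
                           top_level_options=("use_wikimedia_extra",)) -> list:
    """Return options that sit under 'config' but belong at the top level."""
    nested = service.get("config", {})
    return [name for name in top_level_options
            if name in nested and name not in service]


def hoist_options(service: dict,
                  top_level_options=("use_wikimedia_extra",)) -> dict:
    """Return a copy of the service with misplaced options moved up one level."""
    fixed = dict(service)
    nested = dict(fixed.get("config", {}))
    for name in top_level_options:
        if name in nested:
            value = nested.pop(name)
            fixed.setdefault(name, value)  # keep an existing top-level value
    fixed["config"] = nested
    return fixed


# Hypothetical service definition with the option in the wrong place.
broken = {
    "type": "ttmserver",                      # hypothetical key
    "config": {
        "servers": ["search.example.invalid"],  # hypothetical key and host
        "use_wikimedia_extra": True,          # misplaced: ignored here
    },
}

fixed = hoist_options(broken)
```

With these assumptions, `find_misplaced_options(broken)` reports the option and `fixed` carries it at the top level, which is the shape the comment above says the server expects.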

Change 241019 had a related patch set uploaded (by Nikerabbit):
Fix TTMServer config to use the extra plugin

https://gerrit.wikimedia.org/r/241019

gerritbot added a project: Patch-For-Review. Sep 25 2015, 12:10 PM

I will schedule this for SWAT on Monday if nobody else does.

This is a WMF-specific issue, right?

Restricted Application added subscribers: Dereckson, Matanya. Sep 27 2015, 7:55 PM

I can reproduce this on TWN and Commons, so this is not meta-specific.

TM is working fine on translatewiki.net.

Sorry, I forgot to add this to morning SWAT today. If nobody adds this to evening SWAT today, I will do it in morning SWAT tomorrow.

Change 241019 merged by jenkins-bot:
Fix TTMServer config to use the extra plugin

https://gerrit.wikimedia.org/r/241019

Nikerabbit mentioned this in rOMWC7ed97dd367c0: Fix TTMServer config to use the extra plugin. Sep 28 2015, 11:44 PM

They initially take many seconds to load, but I see suggestions.

Appears to be working for me again as well. Thank you!

Looks like it's happening again?

I can't see any errors (elasticsearch), ttmserver config seems to be good.
It's slow but I can see suggestions on https://meta.wikimedia.org/w/index.php?title=Special:Translate&group=page-Tech%2FNews%2F2015%2F40&action=page&filter=&optional=1&language=fr .

Do you have a link where it does not work so I can check server logs?

Try translating the tech news
https://meta.wikimedia.org/wiki/Tech/News/2015/42 e.g. to Ripuarian.

Purodha

It takes about 10 seconds to load any translation aid on longer messages but I do see translation memory suggestions for shorter ones, both in the original URL and in the one Purodha provided.

Maybe it's a performance issue? Unfortunately, I never used ttmserver before we implemented the plugin function (to remove dynamic scripting from the eqiad cluster), so it's hard for me to evaluate whether this new function has severe performance drawbacks compared with the old groovy script.
AFAICT it should be the same, but I may have done something wrong.

I think one of the problems with this function is that it relies on very slow Elasticsearch features:

fuzzy like this: deprecated, and will be removed in Elasticsearch 2.0 due to performance issues
function score on all docs: the Levenshtein distance is applied to every doc returned by the fuzzy like this query. We could optimize this by running the function score inside the rescore phase, which would let us compute the distance on a limited number of docs thanks to the rescore window.

As for fuzzy like this, I would suggest investigating another approach based on character n-grams.
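The two ideas above (a cheap character n-gram match to pick candidates, then the expensive Levenshtein score on a limited window only) can be sketched as follows. This is an illustration in plain Python, not Elasticsearch's rescore API or the plugin's code. The trigram size, the window of 10, and all function names are assumptions, and the 0.65 cutoff is taken from the logged query.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Multiset of lower-cased character n-grams (whole text if shorter than n)."""
    lowered = text.lower()
    return Counter(lowered[i:i + n] for i in range(max(len(lowered) - n + 1, 1)))


def ngram_overlap(a: str, b: str, n: int = 3) -> float:
    """Cheap first-pass score: shared n-grams over the larger n-gram count."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    shared = sum((grams_a & grams_b).values())
    return shared / max(sum(grams_a.values()), sum(grams_b.values()))


def similarity(a: str, b: str) -> float:
    """Expensive second-pass score: 1 - edit distance / longer length."""
    if not a and not b:
        return 1.0
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return 1.0 - previous[-1] / max(len(a), len(b))


def rescored_suggestions(source: str, memory: list,
                         window: int = 10, min_score: float = 0.65) -> list:
    """Rank by n-gram overlap, then run Levenshtein only on the top `window`."""
    first_pass = sorted(memory, key=lambda c: ngram_overlap(source, c),
                        reverse=True)[:window]
    scored = [(similarity(source, c), c) for c in first_pass]
    kept = [pair for pair in scored if pair[0] >= min_score]
    return sorted(kept, key=lambda pair: (-pair[0], pair[1]))
```

The design point is that the quadratic edit-distance computation runs at most `window` times per query, however many strings the first pass matches.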

Glaisher subscribed. Oct 15 2015, 11:04 AM

Luke081515 moved this task from Backlog to Working on on the Wikimedia-Site-requests board. Oct 17 2015, 5:48 PM

• Elitre subscribed. Oct 21 2015, 8:52 AM

In T113711#1727543, @dcausse wrote:

I think one of the problems with this function is that it relies on very slow Elasticsearch features:

fuzzy like this: deprecated, and will be removed in Elasticsearch 2.0 due to performance issues

function score on all docs: the Levenshtein distance is applied to every doc returned by the fuzzy like this query. We could optimize this by running the function score inside the rescore phase, which would let us compute the distance on a limited number of docs thanks to the rescore window.

As for fuzzy like this, I would suggest investigating another approach based on character n-grams.

I'm moving this comment to T101236. I don't see evidence that translation memory is generally failing: closing this again for now.

Nemo_bis mentioned this in T101236: TTMServer performance and coverage issues. Oct 24 2015, 7:49 AM

(FWIW I added my comment here.)

Luke081515 moved this task from Working on to Done on the Wikimedia-Site-requests board. Mar 7 2016, 4:59 PM

Restricted Application added a subscriber: JEumerus. Mar 7 2016, 4:59 PM
