
Unhammer (Kevin Brubeck Unhammer)
Apertium greasemonkey

Projects

User does not belong to any projects.

User Details

User Since
Jan 21 2015, 11:51 AM (226 w, 5 d)
Availability
Available
IRC Nick
Unhammer
LDAP User
Unknown
MediaWiki User
Unhammer [ Global Accounts ]

Recent Activity

Jan 15 2019

Unhammer added a comment to T213784: Apertium mishandling some links.

We do have a Plan to change Apertium to handle markup correctly, so one can trust that certain tags are always kept (and ordering of close/open tags is preserved), and there has been some work towards that end already, but it's all in "work in progress" branches and needs cleanup and testing. I may find some time in a few months to do that. I can't say for certain whether it'll immediately solve this issue without changes on the Content Translation-side though.

Jan 15 2019, 11:34 AM · ContentTranslation

Jun 20 2016

Unhammer added a comment to T137450: SI-units is translated with capital letters between Nynorsk and Bokmål.

both svn and apertium-nno-nob-1.1.0 give

Jun 20 2016, 8:19 AM · WorkType-NewFunctionality, ContentTranslation

Jun 14 2016

Unhammer awarded T137767: Package apertium-swe-dan/apertium-swe-nor a Cookie token.
Jun 14 2016, 1:27 PM · ContentTranslation-Release10, Language-Q1-2016-17 Sprint 4, WorkType-NewFunctionality, Language-Engineering July-September 2016, ContentTranslation-Deployments, ContentTranslation

May 30 2016

Unhammer added a comment to T108798: Integrate MT healthcheck into Apertium service.

As far as sanity-test-apy.py goes, it looks pretty interesting. We have https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/modules/service/files/checker.py for Node.js applications that have a Swagger spec like the one in https://cxserver.wikimedia.org/v1/?spec (cxserver is powered partly by Apertium, in case you were not aware btw). Given it follows a Swagger spec, it's language-agnostic.

This allows coupling the monitoring to the API endpoint advertisement, and thus makes it more difficult for monitoring to deviate from what the application actually does. It has proven to be quite robust.

That being said, I am unsure how much sense it makes for Apertium to ship a Swagger spec describing the API apertium-apy provides. From my point of view, it probably does, but I have no estimate of how much work that is. What do you think?

May 30 2016, 8:01 AM · CX-deployments, WorkType-Maintenance

Apr 2 2016

Unhammer added a comment to T124137: Update dan-nor language pair for Apertium.

… and now you'll have to start all over again: http://thread.gmane.org/gmane.comp.nlp.apertium/5779 =P
well, only the data packages mentioned in that post are new (no change in other dependencies)

Apr 2 2016, 10:12 AM · ContentTranslation-Release10, Language-Q1-2016-17 Sprint 4, Language-Engineering July-September 2016, Unplanned-Sprint-Work, WorkType-NewFunctionality, ContentTranslation-Deployments, ContentTranslation

Mar 19 2016

Unhammer added a comment to T130347: Grey ContentTranslate link to no.wiki from no.wiki.

Oh, that's interesting, on one of my computers I don't see the "norsk" from those pages.

Mar 19 2016, 4:43 PM · WorkType-Maintenance, ContentTranslation

Mar 18 2016

Unhammer updated the task description for T130347: Grey ContentTranslate link to no.wiki from no.wiki.
Mar 18 2016, 8:40 AM · WorkType-Maintenance, ContentTranslation
Unhammer added a comment to T130347: Grey ContentTranslate link to no.wiki from no.wiki.


Also, from nn.wiki, https://nn.wikipedia.org/wiki/Formant shows links to both "norsk" and "norsk (bokmål)"

Mar 18 2016, 8:40 AM · WorkType-Maintenance, ContentTranslation
Unhammer updated the task description for T130347: Grey ContentTranslate link to no.wiki from no.wiki.
Mar 18 2016, 8:38 AM · WorkType-Maintenance, ContentTranslation
Unhammer added a comment to T130347: Grey ContentTranslate link to no.wiki from no.wiki.

Any reference to no.wiki should probably specify "norsk (bokmål)" or something, since there are two "norsk" wikis ("norsk" is ambiguous between "norsk (bokmål)" and "norsk (nynorsk)").

Mar 18 2016, 8:31 AM · WorkType-Maintenance, ContentTranslation
Unhammer added a comment to T130347: Grey ContentTranslate link to no.wiki from no.wiki.


From no.wiki, shows link to "norsk"

Mar 18 2016, 8:29 AM · WorkType-Maintenance, ContentTranslation
Unhammer created T130347: Grey ContentTranslate link to no.wiki from no.wiki.
Mar 18 2016, 8:29 AM · WorkType-Maintenance, ContentTranslation

Mar 11 2016

Unhammer awarded T124137: Update dan-nor language pair for Apertium a Cookie token.
Mar 11 2016, 9:00 AM · ContentTranslation-Release10, Language-Q1-2016-17 Sprint 4, Language-Engineering July-September 2016, Unplanned-Sprint-Work, WorkType-NewFunctionality, ContentTranslation-Deployments, ContentTranslation

Feb 9 2016

Unhammer added a comment to T108798: Integrate MT healthcheck into Apertium service.

That being said, I am unsure how much sense it makes for Apertium to ship a Swagger spec describing the API apertium-apy provides. From my point of view, it probably does, but I have no estimate of how much work that is. What do you think?

Feb 9 2016, 4:27 PM · CX-deployments, WorkType-Maintenance
Unhammer added a comment to T108798: Integrate MT healthcheck into Apertium service.

https://github.com/goavki/apertium-apy/blob/master/tools/sanity-test-apy.py is the script we use which just tries some very simple translations on our installed language pairs. Since it's basically doing a bunch of "curl" calls, the tests could just as well be written in node.js or what have you.

Feb 9 2016, 8:50 AM · CX-deployments, WorkType-Maintenance
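
The kind of check described in the comment above (a handful of very simple translations, one HTTP request each) could be sketched roughly as follows. This is not the actual sanity-test-apy.py; the `/translate?langpair=src|trg&q=text` endpoint and the `responseData.translatedText` reply field are assumptions about the APY API, and `check_pairs`, `http_fetch`, and the sample test data are hypothetical names chosen for illustration.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]


def http_fetch(url: str) -> str:
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=15) as response:
        return response.read().decode("utf-8")


def check_pairs(
    base_url: str,
    tests: Dict[Pair, Tuple[str, str]],
    fetch: Callable[[str], str] = http_fetch,
) -> List[str]:
    """Translate one short input per pair and report any mismatches.

    `tests` maps (source, target) to (input text, expected output).
    The endpoint and JSON layout below are assumed, not confirmed.
    """
    failures: List[str] = []
    for (src, trg), (text, expected) in tests.items():
        query = urllib.parse.urlencode({"langpair": f"{src}|{trg}", "q": text})
        url = f"{base_url.rstrip('/')}/translate?{query}"
        try:
            reply = json.loads(fetch(url))
            got = reply["responseData"]["translatedText"].strip()
        except Exception as error:  # network errors, bad JSON, missing keys
            failures.append(f"{src}-{trg}: request failed: {error}")
            continue
        if got != expected:
            failures.append(f"{src}-{trg}: expected {expected!r}, got {got!r}")
    return failures


# Hypothetical usage against a local server:
# print(check_pairs("http://localhost:2737", {("nno", "nob"): ("ikkje", "ikke")}))
```

Passing the fetch function as a parameter keeps the comparison logic testable without a running server, and the same loop would be easy to port to Node.js or any other language.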

Feb 1 2016

Unhammer updated the task description for T102101: Enable Apertium MT in Content Translation for those languages where it is supported.
Feb 1 2016, 12:24 PM · CX-deployments, Language-Engineering July-September 2016, Goal, Community-Wishlist-Survey-2015, WorkType-NewFunctionality, Notice

Sep 23 2015

Unhammer added a comment to T91748: Possible collaborations between Wikimedia and Apertium.

Already proposed such a project: https://meta.wikimedia.org/wiki/Grants:IEG/Pan-Scandinavian_Machine-assisted_Content_Translation :)

Sep 23 2015, 11:51 AM · Possible-Tech-Projects, ContentTranslation
Unhammer added a comment to T85903: Resolve the issues with Norwegian macro language code in ContentTranslation (tracking).

I notice I get "from=nb" in the URL when I go to an article on no.wikipedia.org, and click the greyed out "nynorsk" link on an article that has no nn.wikipedia.org translation. And Content Translation then says "no MT available". Manually changing the URL to "from=no", I do get MT.

Sep 23 2015, 8:30 AM · CX-deployments, Tracking-Neverending, WorkType-Maintenance, Technical-Debt, I18n

Sep 2 2015

Unhammer added a comment to T108798: Integrate MT healthcheck into Apertium service.

On apertium.org we just have a cron job that tries a simple translation on all pairs and shoots off an email if anything doesn't translate the way it used to. I'm guessing this task is for something equivalent? (maybe using https://cxserver.wikimedia.org/translation/ )

Sep 2 2015, 1:53 PM · CX-deployments, WorkType-Maintenance

Aug 24 2015

Unhammer added a comment to T97938: Apertium machine translation doesn't work from Norwegian Nynorsk to Bokmål.
Aug 24 2015, 9:01 AM · WorkType-Maintenance, LE-CX6-Sprint 2, ContentTranslation-Release6, ContentTranslation, ContentTranslation-Deployments

Aug 14 2015

Unhammer added a comment to T101947: Package python-toro as apertium-apy dependency.

This should no longer be needed. If tornado >=4.2 is available, that is enough, and if not, toro.py is included in the APY source.

Aug 14 2015, 7:33 AM · WorkType-NewFunctionality, LE-CX6-Sprint 2, ContentTranslation-Release6, ContentTranslation-Deployments, ContentTranslation
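
The dependency rule in the comment above (Tornado 4.2 or later is enough, otherwise fall back to the bundled toro.py) amounts to a version comparison. A minimal sketch, assuming `tornado.version_info` is the source of the version tuple; `lock_provider` and `installed_tornado_version` are hypothetical helpers and not part of APY:

```python
from typing import Optional, Tuple


def lock_provider(tornado_version: Tuple[int, ...]) -> str:
    """Name the module that would supply locks for a given Tornado version.

    Tornado gained its own locks module in 4.2; older versions would
    need toro instead.
    """
    return "tornado.locks" if tornado_version >= (4, 2) else "toro"


def installed_tornado_version() -> Optional[Tuple[int, ...]]:
    """Return the installed Tornado version tuple, or None if it is missing."""
    try:
        import tornado
    except ImportError:
        return None
    return tuple(tornado.version_info)


print(lock_provider((4, 2, 0, 0)))  # tornado.locks
print(lock_provider((3, 2, 2, 0)))  # toro
```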

Aug 13 2015

Unhammer added a comment to T107270: Apertium leaves a ton of stale processes, consumes all the available memory.

By the way, I just included toro.py there as a single file, so no need to install it separately; dependencies should just be tornado >=3.1 now.

Aug 13 2015, 1:19 PM · Operations, WorkType-Maintenance, LE-CX6-Sprint 2, ContentTranslation-Release6, ContentTranslation, ContentTranslation-CXserver, Language-Team
Unhammer added a comment to T107270: Apertium leaves a ton of stale processes, consumes all the available memory.

Tried making it work with tornado 3; apy -r61424 seems to work on 3.2 which is what my Ubuntu machine installs.

Aug 13 2015, 12:37 PM · Operations, WorkType-Maintenance, LE-CX6-Sprint 2, ContentTranslation-Release6, ContentTranslation, ContentTranslation-CXserver, Language-Team
Unhammer added a comment to T107270: Apertium leaves a ton of stale processes, consumes all the available memory.

4.2 introduced the locks that we currently use – before that we depended on toro for the same stuff. I'm not sure how much else from >4 we require, but it would require some rewriting in any case.

Aug 13 2015, 11:14 AM · Operations, WorkType-Maintenance, LE-CX6-Sprint 2, ContentTranslation-Release6, ContentTranslation, ContentTranslation-CXserver, Language-Team
Unhammer added a comment to T107270: Apertium leaves a ton of stale processes, consumes all the available memory.

-r57689 is almost a year old and used threading instead of select/polling; RAM usage was the main reason for the rewrite, but also easier process handling (pretty sure that explains the number of pipelines started). Could you give -r61403 a try? It doesn't require toro any more, but does depend on Tornado >=4.2 and Python >=3.3

Aug 13 2015, 10:22 AM · Operations, WorkType-Maintenance, LE-CX6-Sprint 2, ContentTranslation-Release6, ContentTranslation, ContentTranslation-CXserver, Language-Team
Unhammer added a comment to T107270: Apertium leaves a ton of stale processes, consumes all the available memory.

Wait, what version of apy is this?

Aug 13 2015, 9:55 AM · Operations, WorkType-Maintenance, LE-CX6-Sprint 2, ContentTranslation-Release6, ContentTranslation, ContentTranslation-CXserver, Language-Team
Unhammer added a comment to T107270: Apertium leaves a ton of stale processes, consumes all the available memory.

Oh, that is odd. It should not be starting more than one process per pair by default …

Aug 13 2015, 9:52 AM · Operations, WorkType-Maintenance, LE-CX6-Sprint 2, ContentTranslation-Release6, ContentTranslation, ContentTranslation-CXserver, Language-Team
Unhammer added a comment to T107270: Apertium leaves a ton of stale processes, consumes all the available memory.

A simple grep for nno-nob will also show processes of translation modes for nob-nno (which uses data from the same language folder), as well as minor variants like nob-nno_e, but I guess there's no reason those would be started.

Aug 13 2015, 9:44 AM · Operations, WorkType-Maintenance, LE-CX6-Sprint 2, ContentTranslation-Release6, ContentTranslation, ContentTranslation-CXserver, Language-Team
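
One way to avoid the over-counting described above is to classify each process by the file name of its data file rather than by a substring of the whole command line. The sketch below assumes command lines that end in a path such as `.../apertium-nno-nob/nob-nno.automorf.bin`; those paths, and the helper names `mode_of` and `count_by_mode`, are hypothetical.

```python
import posixpath
import re
from collections import Counter
from typing import Iterable, Optional

# Matches a mode prefix such as "nno-nob" or "nob-nno_e" at the start of a file name.
MODE_PREFIX = re.compile(r"([a-z]{2,3}-[a-z]{2,3}(?:_\w+)?)\.")


def mode_of(command_line: str) -> Optional[str]:
    """Return the translation mode named by the first matching data file."""
    for token in command_line.split():
        match = MODE_PREFIX.match(posixpath.basename(token))
        if match:
            return match.group(1)
    return None


def count_by_mode(command_lines: Iterable[str]) -> Counter:
    """Count processes per translation mode, skipping unclassified lines."""
    modes = (mode_of(line) for line in command_lines)
    return Counter(mode for mode in modes if mode is not None)


# Both sample lines contain the substring "nno-nob" because of the folder name,
# but they belong to different modes.
sample = [
    "lt-proc -w /usr/share/apertium/apertium-nno-nob/nno-nob.automorf.bin",
    "lt-proc -w /usr/share/apertium/apertium-nno-nob/nob-nno.automorf.bin",
]
print(count_by_mode(sample))
```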
Unhammer added a comment to T107270: Apertium leaves a ton of stale processes, consumes all the available memory.

:-( we have 526 procs for our user, running 93 translation modes. Does the servlet.py output tell you anything useful?

Aug 13 2015, 8:52 AM · Operations, WorkType-Maintenance, LE-CX6-Sprint 2, ContentTranslation-Release6, ContentTranslation, ContentTranslation-CXserver, Language-Team

Aug 12 2015

Unhammer added a comment to T107270: Apertium leaves a ton of stale processes, consumes all the available memory.

You might want to try running

tools/sanity-test-apy.py http://your.apy.endpoint.wikimedia.org

(from the APY folder) while reading the servlet.py logs to see if any pairs lead to crashes or something like that (maybe change the tests to reflect your installed pairs). The new APY version will also mention in the logs how many pipelines are "still scheduled for shutdown"; these are pipelines where there are still requests to them that haven't been finished. I did manage to get hanging requests by trying so many different language pairs at the same time that I ran out of memory and got bad_alloc crashes, but hopefully your server is configured so that's not possible :)

Aug 12 2015, 9:43 AM · Operations, WorkType-Maintenance, LE-CX6-Sprint 2, ContentTranslation-Release6, ContentTranslation, ContentTranslation-CXserver, Language-Team
Unhammer added a comment to T107270: Apertium leaves a ton of stale processes, consumes all the available memory.

"the new settings" being just -j1 -m300? Also, what SVN revision is this now? And did it keep increasing after the limits were increased?

Aug 12 2015, 9:00 AM · Operations, WorkType-Maintenance, LE-CX6-Sprint 2, ContentTranslation-Release6, ContentTranslation, ContentTranslation-CXserver, Language-Team

Aug 11 2015

Unhammer added a comment to T107270: Apertium leaves a ton of stale processes, consumes all the available memory.

I've now implemented the above-mentioned option -r to shut down a pipeline that has handled >N requests. Default is 1000.

Aug 11 2015, 11:52 AM · Operations, WorkType-Maintenance, LE-CX6-Sprint 2, ContentTranslation-Release6, ContentTranslation, ContentTranslation-CXserver, Language-Team
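
The idea behind the -r option can be illustrated with a small sketch: count the requests each pipeline has served and drop the pipeline once the count exceeds the limit, so the next request starts a fresh one. This is not APY's implementation; `Pipeline`, `PipelinePool`, and the stand-in translation string are hypothetical, and the real server may check the limit at a different point.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Pair = Tuple[str, str]


@dataclass
class Pipeline:
    """Stand-in for a running translation pipeline."""
    pair: Pair
    handled: int = 0


class PipelinePool:
    def __init__(self, restart_after: int = 1000):
        self.restart_after = restart_after
        self.pipelines: Dict[Pair, Pipeline] = {}
        self.started = 0  # how many pipelines have been started in total

    def translate(self, pair: Pair, text: str) -> str:
        pipeline = self.pipelines.get(pair)
        if pipeline is None:
            # Start a pipeline lazily, on the first request for this pair.
            pipeline = Pipeline(pair)
            self.pipelines[pair] = pipeline
            self.started += 1
        pipeline.handled += 1
        result = f"[{pair[0]}-{pair[1]}] {text}"  # placeholder for real output
        if pipeline.handled > self.restart_after:
            # Shut the pipeline down once it has handled more than N requests.
            del self.pipelines[pair]
        return result


pool = PipelinePool(restart_after=2)
for _ in range(4):
    pool.translate(("nno", "nob"), "ikkje")
print(pool.started)  # 2
```

With a limit of 2, the third request pushes the first pipeline over the limit, so the fourth request starts a second one.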

Aug 7 2015

Unhammer added a comment to T107270: Apertium leaves a ton of stale processes, consumes all the available memory.

I am not sure of the performance gains of not shutting down language pairs anyway in the default case.

Aug 7 2015, 5:35 PM · Operations, WorkType-Maintenance, LE-CX6-Sprint 2, ContentTranslation-Release6, ContentTranslation, ContentTranslation-CXserver, Language-Team
Unhammer added a comment to T107270: Apertium leaves a ton of stale processes, consumes all the available memory.

I explained some of this on IRC, but repeating here for completeness: if APY is set to use "all available cores", then it starts one HTTP server per core, and each HTTP server gets to start all language pairs (as requests come in). Each language pair uses something like 7 processes (depending on the pair), and by default pipelines are kept open forever to make requests fast. So if you have 48 cores and 27 language pairs, you could end up with about 9,000 processes :-)

Aug 7 2015, 1:48 PM · Operations, WorkType-Maintenance, LE-CX6-Sprint 2, ContentTranslation-Release6, ContentTranslation, ContentTranslation-CXserver, Language-Team
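
The estimate in the comment above is a simple product of cores, language pairs, and processes per pair. A minimal sketch of that arithmetic, using the comment's rough figure of 7 processes per pair as the default (`estimate_process_count` is a hypothetical helper):

```python
def estimate_process_count(cores: int, language_pairs: int, procs_per_pair: int = 7) -> int:
    """Upper bound on processes once every HTTP server has started every pair."""
    return cores * language_pairs * procs_per_pair


print(estimate_process_count(48, 27))  # 9072
```

Limiting the server to a single HTTP process (the -j1 setting mentioned elsewhere in this thread) would bring the same bound down to 27 × 7 = 189.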