User Details
- User Since
- Jan 21 2015, 11:51 AM (319 w, 1 d)
- Availability
- Available
- IRC Nick
- Unhammer
- LDAP User
- Unknown
- MediaWiki User
- Unhammer
Jan 15 2019
We do have a plan to change Apertium to handle markup correctly, so one can trust that certain tags are always kept (and the ordering of close/open tags is preserved), and there has been some work towards that end already, but it's all in "work in progress" branches and needs cleanup and testing. I may find some time in a few months to do that. I can't say for certain whether it'll immediately solve this issue without changes on the Content Translation side, though.
Jun 20 2016
both svn and apertium-nno-nob-1.1.0 give
Apr 2 2016
… and now you'll have to start all over again: http://thread.gmane.org/gmane.comp.nlp.apertium/5779 =P
well, only the data packages mentioned in that post are new (no change in other dependencies)
Mar 19 2016
Oh, that's interesting: on one of my computers I don't see the "norsk" link from those pages.
Mar 18 2016
Also, from nn.wiki, https://nn.wikipedia.org/wiki/Formant shows links to both "norsk" and "norsk (bokmål)".
Any reference to no.wiki should probably specify "norsk (bokmål)" or something, since there are two "norsk" wikis ("norsk" is ambiguous between "norsk (bokmål)" and "norsk (nynorsk)").
From no.wiki, the page shows a link to "norsk".
Feb 9 2016
That being said, I am unsure how much sense it makes for apertium to ship a swagger spec describing the API apertium-apy provides. From my point of view, it probably does, but I have no estimate of how much work that is. What do you think?
https://github.com/goavki/apertium-apy/blob/master/tools/sanity-test-apy.py is the script we use which just tries some very simple translations on our installed language pairs. Since it's basically doing a bunch of "curl" calls, the tests could just as well be written in node.js or what have you.
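For reference, a rough sketch of what such a check amounts to in Python: one GET against APY's /translate endpoint per pair, failing if the translation comes back empty. The endpoint URL, pairs, and test phrases below are made up, and the JSON shape assumed here is APY's usual responseData.translatedText; this is illustrative, not sanity-test-apy.py itself.

```
#!/usr/bin/env python3
# Minimal sanity check against an APY endpoint (illustrative sketch).
import json
import sys
import urllib.parse
import urllib.request

APY = sys.argv[1] if len(sys.argv) > 1 else 'http://localhost:2737'  # assumed endpoint
TESTS = {'nno|nob': 'ikkje', 'nob|nno': 'ikke'}  # made-up pairs and phrases

failed = False
for langpair, phrase in TESTS.items():
    query = urllib.parse.urlencode({'langpair': langpair, 'q': phrase})
    with urllib.request.urlopen('{}/translate?{}'.format(APY, query), timeout=30) as resp:
        data = json.loads(resp.read().decode('utf-8'))
    translated = data.get('responseData', {}).get('translatedText', '')
    if translated.strip():
        print('ok   {}: {!r} -> {!r}'.format(langpair, phrase, translated))
    else:
        print('FAIL {}: empty translation for {!r}'.format(langpair, phrase))
        failed = True

sys.exit(1 if failed else 0)
```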
Sep 23 2015
Already proposed such a project: https://meta.wikimedia.org/wiki/Grants:IEG/Pan-Scandinavian_Machine-assisted_Content_Translation :)
I notice I get "from=nb" in the URL when I go to an article on no.wikipedia.org and click the greyed-out "nynorsk" link on an article that has no nn.wikipedia.org translation. Content Translation then says "no MT available". Manually changing the URL to "from=no", I do get MT.
Sep 2 2015
on apertium.org we just have a cron job that tries a simple translation on all pairs and shoots off an email if anything doesn't translate the way it used to. I'm guessing this task is for something equivalent? (maybe using https://cxserver.wikimedia.org/translation/)
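A sketch of what such a cron check could look like, assuming an APY-style /translate endpoint: translate a fixed phrase per pair, compare against a stored baseline, and exit non-zero so that cron's MAILTO (if set) mails the printed diff. The endpoint, file name, and baseline format are all hypothetical, not the actual apertium.org job.

```
#!/usr/bin/env python3
# Hypothetical cron-style regression check: compare today's translations to a baseline.
import json
import sys
import urllib.parse
import urllib.request

APY = 'http://localhost:2737'      # assumed APY endpoint
BASELINE = 'baseline.json'         # e.g. {"nno|nob": {"q": "ikkje", "expected": "ikke"}}

def translate(langpair, text):
    query = urllib.parse.urlencode({'langpair': langpair, 'q': text})
    with urllib.request.urlopen('{}/translate?{}'.format(APY, query), timeout=30) as resp:
        return json.loads(resp.read().decode('utf-8'))['responseData']['translatedText']

with open(BASELINE) as f:
    baseline = json.load(f)

changed = False
for langpair, case in baseline.items():
    got = translate(langpair, case['q'])
    if got != case['expected']:
        print('{}: {!r} now gives {!r}, expected {!r}'.format(
            langpair, case['q'], got, case['expected']))
        changed = True

# Non-zero exit plus printed output means cron mails the result (with MAILTO set).
sys.exit(1 if changed else 0)
```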
Aug 14 2015
This should no longer be needed. If tornado >=4.2 is available, that is enough, and if not, toro.py is included in the APY source.
Aug 13 2015
By the way, I just included toro.py there as a single file, so no need to install it separately; dependencies should just be tornado >=3.1 now.
Tried making it work with tornado 3; apy -r61424 seems to work on 3.2, which is what my Ubuntu machine installs.
4.2 introduced the locks that we currently use; before that we depended on toro for the same stuff. I'm not sure how much else from >4 we depend on, but it would require some rewriting in any case.
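For context, a minimal sketch of the locking API in question: Tornado >=4.2 ships tornado.locks, which covers what toro used to provide. The pipeline/translation bits below are placeholders, not APY's actual code.

```
# Illustrative use of tornado.locks (Tornado >= 4.2); toro.Lock filled this role earlier.
from tornado import gen, ioloop, locks

pipeline_lock = locks.Lock()

@gen.coroutine
def use_pipeline(text):
    # Serialise access to a single translation pipeline.
    with (yield pipeline_lock.acquire()):
        # ... write `text` to the pipeline's stdin and read the result here ...
        return text  # placeholder; Python >= 3.3 allows return-with-value in coroutines

if __name__ == '__main__':
    print(ioloop.IOLoop.current().run_sync(lambda: use_pipeline('hei')))
```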
-r57689 is almost a year old and used threading instead of select/polling; RAM usage was the main reason for the rewrite, but also easier process handling (pretty sure that explains the number of pipelines started). Could you give -r61403 a try? It doesn't require toro any more, but does depend on Tornado >=4.2 and Python >=3.3.
Wait, what version of apy is this?
Oh, that is odd. It should not be starting more than one process per pair by default …
A simple grep for nno-nob will also show processes of the translation modes for nob-nno (which uses data from the same language folder), as well as minor variants like nob-nno_e (though I guess there's no reason those would be started).
:-( we have 526 procs for our user, running 93 translation modes. Does the servlet.py output tell you anything useful?
Aug 12 2015
You might want to try running
tools/sanity-test-apy.py http://your.apy.endpoint.wikimedia.org
(from the APY folder) while watching the servlet.py logs, to see if any pairs lead to crashes or something like that (maybe change the tests to reflect your installed pairs). The new APY version will also mention in the logs how many pipelines are "still scheduled for shutdown"; these are pipelines that still have unfinished requests. I did manage to get hanging requests by trying so many different language pairs at the same time that I ran out of memory and got bad_alloc crashes, but hopefully your server is configured so that's not possible :)
"the new settings" being just -j1 -m300? Also, what SVN revision is this now? And did it keep increasing after the limits were increased?
Aug 11 2015
I've now implemented the above-mentioned option -r to shut down a pipeline that has handled >N requests. The default is 1000.
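Roughly, the idea behind -r, as a sketch under assumed names (not APY's actual implementation): keep a per-pipeline request counter and recycle the pipeline once the counter passes N.

```
# Sketch of restart-after-N-requests pipeline recycling (hypothetical class, default N=1000).
class Pipeline:
    def __init__(self, langpair, max_requests=1000):
        self.langpair = langpair
        self.max_requests = max_requests  # corresponds to the -r option
        self.served = 0
        self.start()

    def start(self):
        # ... spawn the apertium processes for self.langpair here ...
        self.served = 0

    def shutdown(self):
        # ... terminate the processes so their resources are released ...
        pass

    def translate(self, text):
        # ... pipe `text` through the processes and collect the output ...
        result = text  # placeholder
        self.served += 1
        if self.served >= self.max_requests:
            # Recycle after N requests instead of keeping the pipeline open forever.
            self.shutdown()
            self.start()
        return result
```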
Aug 7 2015
I am not sure how big the performance gains from not shutting down language pairs are in the default case anyway.
I explained some of this on IRC, but I'm repeating it here for completeness: if APY is set to use "all available cores", then it starts one HTTP server per core, and each HTTP server gets to start all language pairs (as requests come in). Each language pair uses something like 7 processes (depending on the pair), and by default pipelines are kept open forever to make requests fast. So if you have 48 cores and 27 language pairs, you could end up with about 9,000 processes :-)
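Spelling out that back-of-the-envelope figure:

```
# 48 cores, 27 pairs, ~7 processes per pair (numbers from the comment above).
cores, pairs, procs_per_pair = 48, 27, 7
print(cores * pairs * procs_per_pair)  # 9072, i.e. roughly 9,000 processes
```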