Apertium identifies words that it cannot translate and can log them. We should consider collecting this information and sending it to the Apertium developers.

Steps:
1. Package python-toro (See: T101947)
2. Determine location for missingFreqs.db and access to it (it is an SQLite DB).
3. Puppet config.
4. Deployment in Beta and Production.

> From #apertium on IRC:
> aharoni 2. How can Wikipedia help Apertium improve this? Can we report the most frequent missing words, for example?
> TinoDidriksen Unhammer and jacobEo, the currently online maintainers of dan-nor; what say you?
> aharoni I've been thinking how to report untranslated words from Wikipedia back to Apertium
> TinoDidriksen Well, APY keeps a database of untranslated words, with frequency afaik.
> aharoni Where is it collected?
> TinoDidriksen Some SQLite db on the APY host.
> aharoni [ hi kart_ ]
> aharoni If we have our own package installed, do we already collect it?
> aharoni kart_ handles all the packaging for us, I don't know the technical details.
> TinoDidriksen Don't know what version you have packaged, or whether it has that part enabled.
> aharoni kart_: do you know?
> TinoDidriksen File is called missingFreqs.db in the APY folder.
> aharoni OK, let's say that we do have it.
> aharoni If we periodically send it to Apertium, will it be useful?
> aharoni Will somebody bother to add the translations?
> kart_ TinoDidriksen: you mean -apy?
> kart_ TinoDidriksen: I think I need to update package then.
> kart_ aharoni: ^^
> kart_ aharoni: can I have task in Phab? :D
> aharoni kart_: ack
> aharoni TinoDidriksen: you know, you could just run Apertium over a dump of all Wikipedia articles and collect the most frequent untranslated words :)
> kart_ aharoni: how to access is another subject, as we do run it on production service.
> aharoni If you haven't already :)
> aharoni kart_: How about just copying it once a month and emailing it to an Apertium contact :)
> TinoDidriksen Whether anyone will care to look most missing words is a whole other story. I guess it's good incentive because there is a direct feedback loop.
> aharoni It shouldn't be too big for email.
> TinoDidriksen Ours is 130MB currently.
> aharoni TinoDidriksen: If there is somebody who will care and add the translations, I'd gladly provide it.
> TinoDidriksen Nobody is even looking at our own, currently...but it also hasn't been advertised to the mailing list. We should do that.
> kart_ TinoDidriksen: Please do.
> kart_ Even I came to know today, we should have send feedback earlier.
> kart_ TinoDidriksen: can location of db configurable?
> TinoDidriksen I don't maintain any of the Python code. Unhammer and sushain handle APY. But I can only assume the answer is yes, 'cause that sounds trivial.
> TinoDidriksen Oh, it already is, with -f
> TinoDidriksen There was also a cmdline flag to make it keep an in-memory buffer, so that it doesn't hog I/O with SQLite commits: -M1000
> Unhammer hi
> Unhammer yeah, probably good idea to use -M1000 (or some number like that)
> Unhammer and yeah I'd like seeing the wp missingfreqs, it's probably more directly useful
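
A minimal sketch of how the most frequent missing words could be pulled out of missingFreqs.db for a periodic report. The log only says that the file is an SQLite DB holding untranslated words with frequencies; the table name `missingFreqs` and the columns `pair`, `token`, `frequency` are assumptions that should be checked against the real file (for example with `.schema` in the `sqlite3` shell). The helper names and the demo rows are made up for illustration.

```python
import sqlite3


def top_missing(conn, pair, limit=10):
    """Return the `limit` most frequent missing tokens for one language pair.

    ASSUMPTION: the table is called missingFreqs and has the columns
    pair, token and frequency. Verify this against the real DB first.
    """
    cursor = conn.execute(
        "SELECT token, frequency FROM missingFreqs "
        "WHERE pair = ? "
        "ORDER BY frequency DESC, token ASC "
        "LIMIT ?",
        (pair, limit),
    )
    return cursor.fetchall()


def make_demo_db():
    """Build an in-memory DB with the assumed schema and made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE missingFreqs "
        "(pair TEXT, token TEXT, frequency INTEGER, UNIQUE(pair, token))"
    )
    conn.executemany(
        "INSERT INTO missingFreqs VALUES (?, ?, ?)",
        [
            ("dan-nor", "ordbog", 12),    # illustrative data only
            ("dan-nor", "cykelsti", 7),
            ("dan-nor", "hygge", 3),
            ("eng-spa", "foo", 20),
        ],
    )
    return conn


if __name__ == "__main__":
    demo = make_demo_db()
    for token, frequency in top_missing(demo, "dan-nor", limit=2):
        print(f"{frequency}\t{token}")
```

Against the real file, `sqlite3.connect(":memory:")` would be replaced by a connection to wherever missingFreqs.db ends up (step 2 above). A report limited to the top entries per pair would also be much smaller than the whole DB, which matters if it is sent by email.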