The actual failure can be reproduced by visiting https://cxserver.wikimedia.org/v2/page/sv/nn/Royal_Society_for_the_Protection_of_Birds, which returns:
Page sv:Royal_Society_for_the_Protection_of_Birds could not be found. TypeError: item.dispose is not a function
The root cause is a regression from the recent cxserver upgrade. A fix is already in place (https://gerrit.wikimedia.org/r/c/mediawiki/services/cxserver/+/978192) and is waiting for deployment.
Thu, Nov 30
From our past observations, especially during translation campaigns, many users participate, potentially creating low-quality articles. The review happens much later. Reviewers have also complained that they cannot review all these articles on time. When review does happen, articles get deleted, so the deletion happens weeks after the translation activity. Considering this, the chance that a new user already has a deleted translation at the moment they are making an intentional or unintentional low-quality translation is low.
Hence, the proposed strict limit when a user has a deletion in the last 30 days might not have the expected effect. However, I support keeping it in place, provided the user is clearly told why their translation limits are stricter.
The current logic in CX for the CJK group of languages (including Chinese) is as follows: the tokens are characters instead of words, so 人口 has 2 tokens.
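That counting rule can be sketched as follows. This is a toy illustration, not the actual CX tokenizer, and the CJK character ranges here are only an approximation:

```python
import re

# Approximate CJK ranges (Han, Han Extension A, kana, hangul) -- an
# assumption for illustration, not the exact ranges CX uses.
CJK = re.compile(r'[\u4e00-\u9fff\u3400-\u4dbf\u3040-\u30ff\uac00-\ud7af]')

def count_tokens(text: str) -> int:
    """Count each CJK character as one token; split the rest on whitespace."""
    cjk_chars = CJK.findall(text)
    remainder = CJK.sub(' ', text)
    return len(cjk_chars) + len(remainder.split())

print(count_tokens('人口'))        # 2: each character is one token
print(count_tokens('population'))  # 1: one whitespace-delimited word
```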
@elukey, what do you mean by 'reaching out to you by next time'? Regarding the architecture of MinT and why it is not using Lift Wing, we had discussions in the past; I don't think it is useful to repeat them. There is a reason why we put the models on people.wikimedia.org: it was per a recommendation from SRE, and this ticket was created to make that setup more reliable. We still need a public location for model downloads, as MinT is not designed for WMF infrastructure alone.
Tue, Nov 28
We need a 2 TB scratch volume mounted too.
Tue, Nov 21
Thu, Nov 16
Wed, Nov 8
Tue, Nov 7
https://test.wikipedia.org/w/rest.php/coredev/v0/transform/wikitext/to/html/Oxygen looks good. If this can be exposed for all production wikis, we can definitely move to this endpoint.
Mon, Nov 6
It seems we need to continue with RESTBase for the time being, until a stable, well-documented API is available as a replacement, right?
Nov 2 2023
http://parsoid-external-ci-access.beta.wmflabs.org - Does this use an actual production wiki, or beta.wmflabs.org? If it is beta.wmflabs.org, then we will be limited in content and supported languages, right?
If you need access to pagebundles or the transform endpoints, then we have to figure something out.
"Parsoid endpoints are not expected to work for external requests. So this is "working" as expected."
Nov 1 2023
Oct 30 2023
Fixed in sentencex version 0.5.1
@Sportzpikachu Thanks for the PR. Please note that jquery.i18n has a successor, banana.i18n, which is a framework-agnostic JS library and the one we are going to actively maintain. If your use case can use that library instead, that would be much better.
Oct 25 2023
The model expects sentences; that is how it was trained. For example, a word like "Moon" can appear in many Latin-script languages as a proper noun, a reference to the title of a book, etc. The prediction quality increases as more words are provided, because the model then has more context for each word.
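A toy illustration of that effect, with hypothetical per-language word lists (nothing here reflects the real model or its training data):

```python
# Toy scorer: fraction of input words found in each language's word list.
# The word lists are invented for illustration only; 'xx' is a fictional
# language that happens to share the word 'moon'.
VOCAB = {
    'en': {'moon', 'the', 'rises', 'over', 'sea'},
    'xx': {'moon', 'de', 'zee', 'komt', 'op'},
}

def scores(text: str) -> dict:
    words = set(text.lower().split())
    return {lang: len(words & vocab) / len(words)
            for lang, vocab in VOCAB.items()}

print(scores('Moon'))                         # single shared word: both tie
print(scores('the moon rises over the sea'))  # more context: 'en' clearly wins
```

With one ambiguous word both languages score equally; once surrounding words are added, only one language matches them, so the prediction becomes confident.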
Oct 19 2023
Oct 17 2023
We have the service in production: https://api.wikimedia.org/wiki/Lift_Wing_API/Reference/Get_language_identification_prediction
Oct 13 2023
Oct 12 2023
@elukey If I understood that documentation correctly, even if the service requires an OAuth token, anonymous users can still use it with the applicable rate limiting. Am I right?
There would be use cases where a non-MediaWiki static web page uses this API, and the anonymous rate-limited option should be sufficient for those.
Yes, references are moved to the end of the sentence, as also seen in the example below. Positioning references at the correct place in the translation is somewhat complicated and still needs to be implemented.
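The current simple behaviour can be sketched like this. This is a toy model using `[n]`-style markers, not the CX implementation (which works on annotated HTML):

```python
import re

# Toy sketch: detach reference markers from their anchors and append
# them at the end of the sentence. Mapping each marker back to the
# correct word in the *translated* sentence is the harder part that is
# deliberately not implemented here.
REF = re.compile(r'\[\d+\]')

def move_refs_to_end(sentence: str) -> str:
    refs = REF.findall(sentence)
    stripped = REF.sub('', sentence).strip()
    return stripped + ''.join(refs)

print(move_refs_to_end('Birds[1] are protected[2] by law.'))
# Birds are protected by law.[1][2]
```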
Oct 11 2023
By adding the following line to your common.js on Wikipedia, you can see the proof of concept:
importScript( 'User:Santhosh.thottingal/mint-section-translation.js' );
We now have a library for this - in js and python.
Oct 9 2023
Not only styles, but spaces are also replaced by .
Oct 5 2023
Trying to reproduce the issue:
Hi @isarantopoulos I drafted the model card here: https://meta.wikimedia.org/wiki/Machine_learning_models/Proposed/Language_Identification
Oct 4 2023
Oct 3 2023
@daniel, @MSantos What would be the web API for POSTing wikitext to /transform/wikitext/to/html? I could not see documentation for that at https://www.mediawiki.org/wiki/API:REST_API/Reference
Sep 29 2023
Sep 28 2023
Sep 27 2023
Sep 20 2023
Sep 19 2023
I made a temporary fix in the repository to unblock CI so that we are not blocked by this issue: https://gerrit.wikimedia.org/r/c/mediawiki/services/machinetranslation/+/958599 (directly call pytest instead of calling it via tox).
However, the black and ruff checks are not present in it.
Sep 14 2023
Hi, I looked into this issue again. Blubber not using a virtual environment is an important issue. As Debian bookworm's pip restrictions (PEP 668, the externally-managed-environment error) are also coming up, we need to fix this.
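A minimal sketch of what the fix could look like in the image build; the venv path and invocation here are hypothetical, not what Blubber generates:

```shell
# Create an isolated virtual environment so pip is not blocked by
# Debian bookworm's PEP 668 "externally-managed-environment" restriction.
# /tmp/appenv is a hypothetical path; a real image would use a fixed
# location owned by the runtime user.
python3 -m venv /tmp/appenv
/tmp/appenv/bin/pip --version   # the venv's pip, not the system one

# Dependencies would then be installed and the service started with the
# venv's interpreter, e.g.:
#   /tmp/appenv/bin/pip install -r requirements.txt
#   /tmp/appenv/bin/python -m app
```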
Sep 13 2023
Sep 11 2023
I can confirm the bug in sentence splitting for highlighting
Sep 7 2023
We are already working on deploying this Flask app as a service on Lift Wing.
@abi_ could you please include a screenshot in this ticket, for a better understanding of the functionality and for historical reference? Thanks.
Aug 22 2023
Aug 16 2023
Aug 9 2023
Session outline, links, suggested reading, and materials are given below. I will also have a presentation with this content.
Aug 8 2023
Aug 7 2023
Aug 4 2023
I removed some unused files. The usage should be reduced from 11.7 GB to 8 GB now. The rest of the files cannot be deleted for now.
Jul 13 2023
Jul 10 2023
@elukey not an answer to your question, but I am trying to assess the effort required here. Like everybody else, we are also constrained by people capacity :-)