- What use case is the model going to support/resolve?
Enterprise would like to support users who are interested in understanding, with as much granularity as possible, how "safe" each revision they receive is.
- Do you have a model card?
https://meta.wikimedia.org/wiki/Machine_learning_models/Proposed/Multilingual_reference_need
- What team created/trained/etc. the model?
Research (@diego and @Aitolkyn)
- What tools and frameworks have you used?
Mainly the transformers library and pytorch. See the full list of dependencies here.
- What kind of data was the model trained with?
From the model card:
The model was trained on a set of featured articles, which editors have determined to be the highest-quality articles on Wikipedia.
We used the mediawiki_wikitext_current table to extract the latest available revision for each featured article. The snapshot used was 2024-02.
Number of languages: 5 ('ru', 'es', 'de', 'fr', 'en')
Number of sentences: 100,000
Random sample of 20,000 sentences from each language balanced on the ground-truth label.
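The balanced sampling step above can be sketched as follows. This is a hypothetical helper (not from the reference-quality repo), assuming the data is a list of `(language, label, sentence)` tuples and that "balanced on the ground-truth label" means an equal number of sentences per label within each language:

```python
import random
from collections import defaultdict

def balanced_sample(sentences, per_language=20000, seed=0):
    """Sample `per_language` sentences for each language, split evenly
    across the ground-truth labels within that language."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for lang, label, text in sentences:
        by_group[(lang, label)].append((lang, label, text))
    languages = {lang for lang, _ in by_group}
    labels = {label for _, label in by_group}
    per_label = per_language // len(labels)
    sample = []
    for lang in languages:
        for label in labels:
            pool = by_group[(lang, label)]
            # Take at most per_label items; smaller pools are taken whole.
            sample.extend(rng.sample(pool, min(per_label, len(pool))))
    return sample
```

With 5 languages and `per_language=20000`, this yields the 100,000 sentences described in the model card.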
- What kind of data is the model going to need in production (for example, calls to internal/external services, special datasources for features, etc.)?
To predict the reference need for a revision, its content (i.e., the revision text) is required.
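In production, the revision text can be fetched from the MediaWiki Action API (`action=query&prop=revisions` with `rvprop=content`). A minimal sketch that only builds the request URL, assuming a JSON response and the English Wikipedia endpoint:

```python
from urllib.parse import urlencode

def revision_content_url(rev_id, wiki="en.wikipedia.org"):
    """Build a MediaWiki Action API URL that returns the wikitext
    of a single revision, which is the input the model needs."""
    params = {
        "action": "query",
        "prop": "revisions",
        "revids": rev_id,
        "rvslots": "main",
        "rvprop": "content",
        "format": "json",
        "formatversion": "2",
    }
    return f"https://{wiki}/w/api.php?" + urlencode(params)
```

The actual service would issue an HTTP GET against this URL (or read the text from an internal datasource) before passing it to the model.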
- If you have a minimal codebase that you used to run the first tests with the model, could you please share it?
The original source for the model lives in the reference-quality repo. A refactored version of it has since been added to knowledge-integrity and can be used by installing v0.8.2.
- State what team will own the model and please share the main points of contact.
Research (@diego)
- What is the current latency and throughput of the model, if you have tested it?
The latency scales linearly with the number of uncited sentences in the revision text. At the moment, 70% of test articles can be processed in under 500 ms, while the rest exceed this limit.
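Because latency grows linearly with the number of uncited sentences, a simple linear model can estimate whether a revision fits a latency budget. This is an illustrative sketch with hypothetical parameter names; the per-sentence cost and overhead would have to be measured on the deployed hardware:

```python
def fits_latency_budget(n_uncited_sentences, per_sentence_ms,
                        overhead_ms=0.0, budget_ms=500.0):
    """Estimate whether scoring a revision stays within the latency
    budget, assuming cost is linear in the number of uncited sentences."""
    estimated_ms = overhead_ms + n_uncited_sentences * per_sentence_ms
    return estimated_ms <= budget_ms
```

A serving layer could use such an estimate to decide, per revision, whether to score synchronously or fall back to asynchronous processing.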
- Is there an expected frequency in which the model will have to be retrained with new data? What are the resources required to train the model and what was the dataset size?
- Have you checked if the output of your model is safe from a human rights point of view? Is there any risk of it being offensive for somebody? Even if you have any slight worry or corner case, please tell us!
@FNavas-foundation to comment?