User Details
- User Since
- Feb 15 2022, 2:51 PM (217 w, 2 d)
- Availability
- Available
- IRC Nick
- aiko
- LDAP User
- Unknown
- MediaWiki User
- AChou-WMF
Wed, Apr 15
I dug deeper into the logs from kserve-container and identified a few opportunities to reduce log volume.
Fri, Apr 10
Thu, Apr 9
Completed local validation for edit-check after upgrading to kserve==0.17.
During local validation after the kserve==0.17 upgrade, Edit-Check hit a startup regression.
Completed local upgrade and validation work for revise-tone-task-generator as part of the KServe 0.17 migration.
Wed, Apr 8
This task is complete. We'll open new tasks for the follow-on hypotheses.
Wed, Apr 1
Possible Extensions — Generation
Mon, Mar 30
Edit Suggestions Experiment — Progress Update
Tue, Mar 24
@gkyziridis quick follow-up: what's the current status of this task? I recall we verified it works on staging. Is there anything left to do before we move it to production?
Wed, Mar 18
Experiment Plan
- Local Experiments
  - Use a smaller model.
  - Run on a curated set of articles (sampled across each pa_class and main_topic).
  - Manually review outputs to identify patterns where the model produces incorrect or low-quality suggestions.
- Lab Experiments
  - Use a larger model for broader evaluation.
  - Run on the full dataset.
  - Review outputs:
    - Targeted review: Re-run the same selected articles from local experiments to evaluate whether issues are mitigated with a larger model.
    - Random sampling: Identify any new incorrect patterns not observed during local experiments.
    - We want to answer these questions:
      - What is the risk of generating this incorrect suggestion?
      - How would incorrect suggestions impact the user experience?
- Handling Incorrect Suggestions (for future iterations)
  - Model-side mitigations:
    - Instruct the model to:
      - Skip uncertain cases
      - Perform self-verification / double-checking (to reduce hallucinations)
  - Content-side mitigations:
    - Improve content formatting where issues stem from poor or inconsistent article structure
- Content Scope:
  - Focus on pure prose content: Exclude templates, structured markup, links, references, etc.
  - To be explored in future iterations:
    - Mathematical / scientific representations
    - Tables
    - References
    - Section structure and formatting
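Sampling the curated local set "across each pa_class and main_topic" amounts to stratified sampling. A minimal sketch, assuming each article is a dict carrying `pa_class` and `main_topic` keys; the `stratified_sample` name, the fixed `per_stratum` count, and the seed are hypothetical choices for illustration, not the plan's actual selection procedure.

```python
import random
from collections import defaultdict


def stratified_sample(articles, per_stratum, seed=0):
    """Draw up to `per_stratum` articles from every (pa_class, main_topic) cell."""
    strata = defaultdict(list)
    for article in articles:
        strata[(article["pa_class"], article["main_topic"])].append(article)

    # A seeded generator plus sorted strata keeps the selection reproducible,
    # so the same articles can be re-run later in the lab experiments.
    rng = random.Random(seed)
    sample = []
    for key in sorted(strata):
        group = strata[key]
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample
```

Small cells contribute every article they have, so rare class/topic combinations are still represented instead of being crowded out by common ones.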
Mar 17 2026
Mar 12 2026
Mar 11 2026
Mar 2 2026
We likely need a new event schema for this use case. The schema that Lift Wing has been using assumes classification outputs only, but the article quality model produces a continuous score between 0 and 1. Alongside the score, we can also return a derived label and additional computed features. Examples:
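As a hypothetical illustration of how such a payload differs from a per-class probability map, the sketch below models a prediction as a continuous score with an optional derived label and free-form features. It is not a proposed Lift Wing schema: the field names, the `article-quality` model name, the `num_refs` feature, and the score cut-offs used to derive a label are all assumptions.

```python
from dataclasses import asdict, dataclass, field

# Hypothetical cut-offs for turning the continuous score into a label;
# the real model may use different bins, or no derived label at all.
LABEL_CUTOFFS = ((0.66, "high"), (0.33, "medium"), (0.0, "low"))


def derive_label(score: float) -> str:
    """Map a score in [0, 1] to the first label whose cut-off it reaches."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    return next(label for cutoff, label in LABEL_CUTOFFS if score >= cutoff)


@dataclass
class ArticleQualityPrediction:
    """Hypothetical prediction block: one score instead of per-class probabilities."""

    model_name: str
    model_version: str
    score: float
    label: str = ""
    features: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        derived = derive_label(self.score)  # also validates the score range
        # Keep a caller-supplied label; otherwise fall back to the derived one.
        if not self.label:
            self.label = derived


prediction = ArticleQualityPrediction(
    model_name="article-quality",
    model_version="1",
    score=0.72,
    features={"num_refs": 12},
)
print(asdict(prediction))
```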
Feb 20 2026
I can see that most logs have "kubernetes.pod_name" values like:
- "controller-..."
- "webhook-..."
- "kserve-controller-manager-..."
- "istio-ingressgateway-..."
These are system logs generated by kserve, and they seem to make up the majority.
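To check which workloads dominate the volume, log records can be grouped by pod name with the per-pod suffix stripped. A rough sketch, assuming the records were exported as flat dicts keyed like the `kubernetes.pod_name` field above; the helper names are hypothetical, and the suffix regex is a heuristic for Deployment-managed pods (a ReplicaSet hash plus a five-character random suffix) that leaves other naming schemes untouched.

```python
import re
from collections import Counter

# Heuristic: Deployment pods are named <deployment>-<pod-template-hash>-<5 chars>.
DEPLOYMENT_POD_SUFFIX = re.compile(r"-[a-z0-9]{6,10}-[a-z0-9]{5}$")


def workload_of(pod_name: str) -> str:
    """Strip the ReplicaSet hash and random suffix so replicas group together."""
    return DEPLOYMENT_POD_SUFFIX.sub("", pod_name)


def log_share_by_workload(records):
    """Return (workload, count, share) tuples, largest share first."""
    counts = Counter(
        workload_of(record.get("kubernetes.pod_name", "<unknown>"))
        for record in records
    )
    total = sum(counts.values())
    return [(name, n, n / total) for name, n in counts.most_common()]
```

Running this over a sample export shows whether the control-plane pods (controller, webhook, ingress gateway) or the model-server pods account for most of the volume, which tells us where reducing log levels would pay off first.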
@DPogorzelski-WMF Is there a way to reduce the system logs from kserve?
@elukey sorry to ping you, but maybe you have some insights here
How does this sound folks?
@gkyziridis Sounds good to me! :)
With EVENTGATE_STREAM=mediawiki.page_revert_risk_multilingual_prediction_change.v1, the predictions will go to a separate stream, right?
In addition to this, we'll need to create a mediawiki-config change like https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1133603 (this is for adding the RRLA stream).
And please test the whole workflow in staging changeprop + staging Lift Wing first before moving to production.
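Whether predictions land in a shared or a dedicated stream comes down to configuration if the model server takes the stream name from EVENTGATE_STREAM and copies it into the event's `meta.stream` field, which is what Event Platform uses to identify an event's destination stream. A minimal sketch under that assumption; the `build_prediction_event` helper, the `prediction` field, and the schema URI are hypothetical and do not reflect the actual Lift Wing code.

```python
import os
import uuid
from datetime import datetime, timezone


def build_prediction_event(prediction: dict, schema_uri: str) -> dict:
    """Wrap a prediction in an Event Platform-style envelope (hypothetical helper).

    The target stream is read from EVENTGATE_STREAM, so the same code can
    produce to a shared stream or a dedicated one depending on deployment config.
    """
    stream = os.environ["EVENTGATE_STREAM"]  # fail loudly if it is not configured
    return {
        "$schema": schema_uri,
        "meta": {
            "stream": stream,
            "id": str(uuid.uuid4()),
            "dt": datetime.now(timezone.utc).isoformat(),
        },
        "prediction": prediction,
    }


if __name__ == "__main__":
    os.environ["EVENTGATE_STREAM"] = (
        "mediawiki.page_revert_risk_multilingual_prediction_change.v1"
    )
    event = build_prediction_event({"probability": 0.87}, "/hypothetical/schema/1.0.0")
    print(event["meta"]["stream"])
```

The stream still has to be declared in mediawiki-config before EventGate will accept events for it, which is why the config change is needed in addition to the environment variable.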
Feb 3 2026
Answering @gkyziridis's questions:
… what kind of optimization do you have in mind?
I meant optimizing for latency and throughput, since the model server will need to handle every new edit once we start producing rr-multilingual predictions to an event stream. The source is the page_change event stream (every Wikipedia edit triggers a predict request), so the model server needs to be fast enough to keep up with the incoming edit rate.
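"Fast enough to keep up" can be made concrete with Little's law: the average number of requests in flight equals the arrival rate times the time each request spends in the server. The sketch below turns that into a rough replica count. The edit rate, latency, per-replica concurrency, and the 1.5x headroom factor are placeholders, not measured Lift Wing numbers, and Little's law describes averages, so bursts and tail latency still need their own margin.

```python
import math


def required_concurrency(edits_per_second: float, latency_seconds: float) -> float:
    """Little's law: average requests in flight = arrival rate x time in system."""
    return edits_per_second * latency_seconds


def required_replicas(
    edits_per_second: float,
    latency_seconds: float,
    concurrency_per_replica: int,
    headroom: float = 1.5,
) -> int:
    """Replicas needed to absorb the average load plus a safety margin."""
    in_flight = required_concurrency(edits_per_second, latency_seconds) * headroom
    return max(1, math.ceil(in_flight / concurrency_per_replica))


# Placeholder inputs: 20 edits/s, 0.5 s per prediction, 4 concurrent requests per replica.
print(required_replicas(20, 0.5, 4))  # prints 4
```

Halving latency halves the required concurrency, which is why latency optimizations translate directly into fewer replicas for the same edit rate.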
@Isaac: Kevin is currently working on the vLLM image for Lift Wing in T415627: Update WMF Debian vLLM image to support latest upstream software stack. He's been testing different versions and building the image on our ML-build machine, which will then be pushed to the WMF registry.
Feb 2 2026
Should these be emitted into the same stream, or should we make a new stream for this?
I was thinking of using the same stream. When I proposed the name in T326179#10711809, my idea was to put all the predictions from revert-risk models (rr-language-agnostic, rr-multilingual, rr-wikidata) in one stream.
Jan 30 2026
The Revise Tone experiment launched on Monday, the 26th of January!
Resolved this task. Really appreciate all the input and collaboration from everyone. :)
This task has been resolved by T412210: Use HTML instead of wikitext for Revise Tone Task Generator in LiftWing
Jan 29 2026
Jan 19 2026
Weekly Report
Jan 12 2026
Weekly Report
Jan 8 2026
Jan 7 2026
@Isaac Yes, that would be very helpful! I've +1'd the MR. :)
Jan 6 2026
Let me know if that fixes things? I checked one of your examples locally and that seemed to do the trick, but it will be good to have a second pair of eyes and more examples.
I tested it and it fixes the issue. Thanks for the quick fix :)
Dec 22 2025
For the issues we want to address with the HTML parser: by using the parser's plaintext functionality and specifying elements to exclude, we should be able to filter out reference lists, external links, tables, infoboxes, data tables, and image captions. When I parsed the direct quote examples from the spreadsheet, the results showed only text that appears in the main article prose.
I parsed all the examples labeled "Tone issue in direct quote" from the "Revise Tone: Articles to feed the model" spreadsheet using an HTML parser. Overall, the results look very good.
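The exclusion approach described above (take the plaintext, skip listed elements) can be illustrated with a stdlib-only sketch. This is not the parser used for these results and does not reproduce its API: it swaps in Python's `html.parser.HTMLParser`, keeps only text inside `<p>` elements, and drops whole subtrees by tag or class. The tag and class lists (`infobox`, `navbox`, `reference`, `references`) are assumptions about Parsoid-style article HTML and would need checking against real pages.

```python
from html.parser import HTMLParser

# Hypothetical exclusion rules; check them against real article HTML.
EXCLUDED_TAGS = {"table", "figure", "sup", "style", "script"}
EXCLUDED_CLASSES = {"reference", "references", "infobox", "navbox"}
# Void elements never get an end tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ProseExtractor(HTMLParser):
    """Collect text inside <p> elements that is not within an excluded subtree."""

    def __init__(self) -> None:
        super().__init__()
        self.stack = []  # one (tag, excluded, inside_p) tuple per open element
        self.paragraphs = []
        self._buffer = []

    def _state(self):
        if self.stack:
            _, excluded, inside_p = self.stack[-1]
            return excluded, inside_p
        return False, False

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        parent_excluded, parent_inside_p = self._state()
        classes = set((dict(attrs).get("class") or "").split())
        excluded = (parent_excluded or tag in EXCLUDED_TAGS
                    or bool(classes & EXCLUDED_CLASSES))
        self.stack.append((tag, excluded, parent_inside_p or tag == "p"))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break
        if tag == "p":
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._buffer = []

    def handle_data(self, data):
        excluded, inside_p = self._state()
        if inside_p and not excluded:
            self._buffer.append(data)


def extract_prose(html: str) -> list:
    """Return the cleaned text of each prose paragraph, in document order."""
    parser = ProseExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

Restricting output to `<p>` text is what removes reference lists, external-link lists, and captions without naming each one; the trade-off is that excluding every `<sup>` also drops legitimate superscripts, and prose inside lists is skipped entirely.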


