
AI/ML Infrastructure Request: Persist historical revert risk multilingual model scores for threshold analysis
Open, Needs Triage, Public

Description

Please respond to the following questions, and provide as much detail as possible for each.

About the problem

  • Problem: What problem are you facing that could be resolved or mitigated with infrastructural improvements? What are you trying to accomplish, and how is this problem impacting your work?

We would like to store historical revert risk multilingual model scores somewhere so that they can be analyzed to determine score thresholds for using the model in product-facing features.
See notebook here for more details around threshold analysis.
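As a rough illustration of the kind of threshold analysis that persisted scores would enable, here is a minimal sketch. It assumes each stored record carries a wiki, a model score in [0, 1], and whether the edit was later reverted; the `ScoredEdit` type, the function names, and the sample values are hypothetical and are not taken from the linked notebook.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class ScoredEdit:
    """Hypothetical record shape for one persisted score."""
    wiki: str
    score: float    # model's revert probability, assumed to be in [0, 1]
    reverted: bool  # whether the edit was later reverted


def precision_recall(edits: List[ScoredEdit], threshold: float) -> Tuple[float, float]:
    """Precision and recall when flagging every edit with score >= threshold."""
    flagged = [e for e in edits if e.score >= threshold]
    true_pos = sum(e.reverted for e in flagged)
    total_reverted = sum(e.reverted for e in edits)
    # Report 0.0 when nothing is flagged so an empty filter never looks perfect.
    precision = true_pos / len(flagged) if flagged else 0.0
    recall = true_pos / total_reverted if total_reverted else 0.0
    return precision, recall


def lowest_threshold_for_precision(
    edits: List[ScoredEdit],
    target_precision: float,
    candidates: Iterable[float],
) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) for the lowest candidate
    threshold whose precision meets the target, or None if none does."""
    for threshold in sorted(candidates):
        precision, recall = precision_recall(edits, threshold)
        if precision >= target_precision:
            return threshold, precision, recall
    return None


# Made-up sample data, for illustration only.
SAMPLE = [
    ScoredEdit("cswiki", 0.95, True),
    ScoredEdit("cswiki", 0.90, True),
    ScoredEdit("cswiki", 0.80, False),
    ScoredEdit("cswiki", 0.70, True),
    ScoredEdit("cswiki", 0.40, False),
    ScoredEdit("cswiki", 0.20, False),
]

result = lowest_threshold_for_precision(SAMPLE, 0.9, [0.5, 0.7, 0.85])
```

Running this per wiki over a sufficiently long window of historical scores is one way a per-wiki filter threshold could be chosen; the precision target itself would be a product decision.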

  • Impacted projects: Which user-facing features or experiments would be unblocked or meaningfully improved by solving this problem? For each, indicate the level of impact that this problem has, where 1 = "Adds minor friction to the project" and 5 = "Completely blocks the project".

This work is to support rolling out revert risk multilingual filters in RecentChanges.

  • OKRs: How does this problem impact any OKRs?

This work impacts WE1.7.3

  • Timing: When does this problem need to be solved, and why?

As soon as possible.

Informing the solution

  • [Optional] Possible solutions: What infrastructural work would most meaningfully help you with this problem? Feel free to suggest multiple ideas.
  • [Optional] Requirements: What product or technical requirements do we need to keep in mind when deciding on a solution?
  • [Optional] Notes: Is there anything else you'd like to share?

Event Timeline

Kgraessle renamed this task from "AI/ML Infrastructure Request: [Project title here]" to "AI/ML Infrastructure Request: Persist historical revert risk multilingual model scores for threshold analysis". Jun 17 2025, 1:07 PM
Kgraessle moved this task from Inbox to Radar / Tracking on the Moderator-Tools-Team board.

Hi @Kgraessle ! Can you please clarify the following?

  1. Which KRs for FY2025-2026 are enabled by this request?
  2. What is the relationship between this ticket and the KR being targeted? (Please share a hypothesis or description about how you're going to be using the thresholding data to inform specific product decisions, and how those product decisions will impact the KR metric).

Hi @SSalgaonkar-WMF apologies for the delay in responding.

Which KRs for FY2025-2026 are enabled by this request?

This work supports the WE1 Contributor Experiences KR. Our expectation is that some of the new dashboard modules we would like to build out will rely on revert risk multilingual models where possible, so we will need a way to analyze historical data to arrive at thresholds for product-facing features.

What is the relationship between this ticket and the KR being targeted? (Please share a hypothesis or description about how you're going to be using the thresholding data to inform specific product decisions, and how those product decisions will impact the KR metric).

The multilingual model is preferred over the language-agnostic model for the following wikis:

['ka', 'lv', 'ta', 'ur', 'eo', 'lt', 'sl', 'hy', 'hr', 'sk', 'eu', 'et', 'ms', 'az', 'da', 'bg', 'sr', 'ro', 'el', 'th', 'bn', 'no', 'hi', 'ca', 'hu', 'ko', 'fi', 'vi', 'uz', 'sv', 'cs', 'he', 'id', 'tr', 'uk', 'nl', 'pl', 'ar', 'fa', 'it', 'zh', 'ru', 'es', 'ja', 'de', 'fr', 'en']

We would like to have a way to arrive at thresholds for product-facing features for those wikis.

Thanks!

Thanks so much for getting back to me, and no worries at all about timing @Kgraessle! I have a few more questions - please bear with me! These are questions that I'm discussing with all teams who are submitting requests to ML for Q1+, in order to help us prioritize and plan our roadmap. These are also questions that we haven't always asked in the past; as shared in our Engagement Model doc, we are moving towards an operating model in which we partner closely with teams toward a shared definition of success, rather than fulfilling requests without actually taking on the mission/purpose of the work.

Our expectation is that some of the new dashboard modules we would like to build out will rely on revert risk multilingual models where possible, so we will need a way to analyze historical data to arrive at thresholds for product-facing features.

  1. Can you please share more about what specific modules you'll be testing that will use this model, and what your experiment/rollout plan for these modules is (including anticipated launch dates)?
  2. The Objective you shared makes sense, and I imagine that this work rolls up into KR 1.3 ("By the end of Q4, there has been an X% increase in moderation actions done by people who are new to that type of moderation.") Does the Mod Tools team have a hypothesis around creating / testing the specific modules that will utilize the multilingual RR data?

Can you please share more about what specific modules you'll be testing that will use this model, and what your experiment/rollout plan for these modules is (including anticipated launch dates)?

Does the Mod Tools team have a hypothesis around creating / testing the specific modules that will utilize the multilingual RR data?

Hey @SSalgaonkar-WMF, these are great questions and they make sense from a resource planning perspective on your end. We know that we intend to have a dashboard with modules, but we haven't fine-tuned the implementation details, as stated in the research/design-focused hypothesis we have for Q1. As an overview, we will show actions associated with moderation tooling and patrolling, such as a list of recent edits to patrol, to users with extended rights or experienced editors.

We anticipate knowing the exact details about the modules, their use of the multilingual RR data, and their launch dates in mid to late Q1. When this happens, we can come back to this use case and create a hypothesis for it in Q2.

We would like to store historical revert risk multilingual model scores somewhere so that they can be analyzed to determine score thresholds for using the model in product-facing features.

Just wondering: would an S3-compatible bucket (or set of buckets) be a suitable technical solution for this requirement?
If so, perhaps the Rados Gateway on the Data Platform Engineering group's Ceph cluster might be a suitable place for your data.

We have an endpoint in eqiad at: https://rgw.eqiad.dpe.anycast.wmnet which is optimised for throughput.
If object storage (S3 or Swift) isn't suitable, then we might still be able to help you with one of our other interfaces to the cluster.
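For illustration, writing a day's worth of scores to an S3-compatible bucket could look roughly like the sketch below. Everything specific here is an assumption: the bucket name, the key layout, the record fields, and the helper names are hypothetical. The only client method it relies on is `put_object(Bucket=..., Key=..., Body=...)`, which a boto3 S3 client provides; such a client could be created with `boto3.client("s3", endpoint_url=...)` pointed at the endpoint above, with credentials provisioned separately.

```python
import gzip
import json
from datetime import date
from typing import Any, Dict, Iterable


def score_object_key(wiki: str, day: date, prefix: str = "revertrisk-multilingual") -> str:
    """Build a partitioned object key (hypothetical layout: one object per wiki per day)."""
    return f"{prefix}/wiki={wiki}/date={day.isoformat()}/scores.jsonl.gz"


def encode_scores(records: Iterable[Dict[str, Any]]) -> bytes:
    """Serialize records as gzip-compressed JSON Lines."""
    lines = "\n".join(json.dumps(record, sort_keys=True) for record in records)
    return gzip.compress(lines.encode("utf-8"))


def persist_scores(s3_client: Any, bucket: str, wiki: str, day: date,
                   records: Iterable[Dict[str, Any]]) -> str:
    """Upload one day's scores for one wiki and return the object key.

    `s3_client` is any object exposing put_object(Bucket=, Key=, Body=),
    such as a boto3 S3 client.
    """
    key = score_object_key(wiki, day)
    s3_client.put_object(Bucket=bucket, Key=key, Body=encode_scores(records))
    return key
```

Partitioning keys by wiki and date would let a later analysis read only the wikis and time ranges it needs, but the right layout depends on how the scores are produced and queried.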

@DMburugu thank you so much for this helpful response!! It totally makes sense that you're figuring out the details of what this experience will look like, and how you'll be using the multilingual RR data, and we'd love to reconvene in Q1 as you get more clarity. For now, I'll list this request in our Intake Tracker with a status of "Info needed" and I'll list it in our roadmap as a potential item in late Q1/early Q2.