
Request to host kid-friendly-classifier on Lift Wing
Open, Needs Triage, Public

Description

  • What use case is the model going to support/resolve?

It was made to detect content that we do not want to expose kids to in Roblox games (based on the Roblox Terms of Service).

  • Do you have a model card?

No. See https://huggingface.co/derenrich/enwiki-kid-friendly-classifier

  • What team created/trained/etc. the model? What tools and frameworks have you used?

Future Audiences. It was trained with Hugging Face Transformers to filter out articles on Roblox.

  • What kind of data was the model trained with, and what kind of data is the model going to need in production (for example, calls to internal/external services, special data sources for features, etc.)?

input: article title and short description
output: categorical variable (e.g. none/crime/political/...)

  • If you have a minimal codebase that you used to run the first tests with the model, could you please share it?

see https://gitlab.wikimedia.org/repos/future-audiences/roblox/speed-backend/-/blob/main/wiki-speedrun/train.py?ref_type=heads

  • State what team will own the model, and please share some main points of contact (see more info in Ownership of a model).

Future Audiences / me

  • What is the current latency and throughput of the model, if you have tested it? We don't need anything precise at this stage, just some ballpark numbers to figure out how the model performs with the expected inputs. For example, does the model take milliseconds or seconds to respond to queries? How does it react when 1/10/20/etc. requests are made in parallel? If you don't have these numbers, don't worry: open the task and we'll figure something out while we discuss next steps!

Don't know; it's only experimental. It's ModernBERT-based, so it should be very easy to host.
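Since no numbers exist yet, here is a minimal Python sketch of one way to collect the ballpark figures the question asks for (latency and throughput at 1, 10 and 20 parallel requests). `fake_predict` is a hypothetical stand-in for the real classifier call, and the thread-pool approach with a nearest-rank p95 is an assumption for illustration, not an agreed Lift Wing benchmarking method.

```python
import math
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fake_predict(text: str) -> str:
    """Hypothetical stand-in for the real model call; replace with the actual classifier."""
    time.sleep(0.01)  # simulate roughly 10 ms of inference work
    return "none"


def measure_latency(predict: Callable[[str], str], inputs: List[str],
                    concurrency: int) -> Dict[str, float]:
    """Run `predict` over `inputs` with `concurrency` threads; summarize latency in ms."""

    def timed_call(text: str) -> float:
        # Per-call latency, measured inside the worker (excludes time queued in the pool).
        start = time.perf_counter()
        predict(text)
        return (time.perf_counter() - start) * 1000.0

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, inputs))
    wall_seconds = time.perf_counter() - wall_start

    latencies.sort()
    # Nearest-rank 95th percentile.
    p95_index = max(0, math.ceil(0.95 * len(latencies)) - 1)
    return {
        "requests": len(latencies),
        "median_ms": statistics.median(latencies),
        "p95_ms": latencies[p95_index],
        "throughput_rps": len(latencies) / wall_seconds,
    }


if __name__ == "__main__":
    sample = ["Paris: capital of France"] * 40
    for level in (1, 10, 20):
        print(level, measure_latency(fake_predict, sample, level))
```

Threads suit timing an HTTP endpoint, where each call mostly waits on the network. For in-process inference, the GIL and CPU contention would dominate, so numbers from a harness like this are most meaningful when `predict` wraps a request to a deployed service.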

  • Is there an expected frequency with which the model will have to be retrained with new data? What resources are required to train the model, and what was the dataset size?

Very infrequently.

  • Have you checked if the output of your model is safe from a human rights point of view? Is there any risk of it being offensive for somebody? Even if you have any slight worry or corner case, please tell us!

Have not checked. There may be biases. It's only experimental for now.

  • Everything else that is relevant in your opinion.

Event Timeline

derenrich updated the task description.

Hi Daniel, thanks for filing this request!

> What use case is the model going to support/resolve?
> It was made to detect content that we do not want to expose kids to in Roblox games (based on the Roblox Terms of Service).

Could you please provide more information about the specific product that is going to make requests to this service? We want to understand whether this service/product is part of WMF's production infrastructure (e.g. MediaWiki), in which case internal requests are allowed, or whether the requests will be external, in which case they should be routed through the API Gateway.

Could you also provide access to some sample data, or link to a specific place where the model's input/output is defined? We want to understand the preprocessing logic and to find a candidate for the response schema.

The model is still being evaluated, so we do not yet have a specific product that is going to make requests. The requests would likely come from the wider internet, since Future Audiences (FA) would want to integrate it into services running on Toolforge or Roblox.

The input is free text containing the title and short description of an English article. The output is a categorical string indicating whether the content is kid-friendly and, if not, in what way it violates kid-friendliness.
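A candidate request/response schema along these lines could look like the Python sketch below. Everything in it is an assumption for illustration: the field names (`title`, `short_description`, `label`, `kid_friendly`), the `"title: short description"` join used as preprocessing, and the reading of the `"none"` label as "kid-friendly". The real preprocessing is whatever train.py does and may differ.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class KidFriendlyRequest:
    """Hypothetical request body: the English title and short description of an article."""
    title: str
    short_description: str

    def model_input(self) -> str:
        # Assumption: title and short description are joined into one free-text string.
        return f"{self.title}: {self.short_description}"


@dataclass(frozen=True)
class KidFriendlyResponse:
    """Hypothetical response body: one categorical label, e.g. "none", "crime", "political"."""
    label: str

    @property
    def kid_friendly(self) -> bool:
        # Assumption: "none" means no kid-friendliness violation was detected.
        return self.label == "none"


def parse_request(body: str) -> KidFriendlyRequest:
    """Parse a JSON request body; a missing short description defaults to an empty string."""
    data = json.loads(body)
    return KidFriendlyRequest(
        title=data["title"],
        short_description=data.get("short_description", ""),
    )


def render_response(label: str) -> str:
    """Serialize a predicted label into the hypothetical JSON response."""
    response = KidFriendlyResponse(label=label)
    return json.dumps({"label": response.label, "kid_friendly": response.kid_friendly})
```

For example, a body of `{"title": "Paris", "short_description": "Capital of France"}` would be fed to the model as `Paris: Capital of France`, and a predicted label of `crime` would be returned as `{"label": "crime", "kid_friendly": false}`. Keeping the raw label alongside a boolean lets simple clients filter on one field while others can still see which category was flagged.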

Is there something more you need from me here? What is the typical turn-around time for requests like this?

Hey @derenrich, sorry for the delayed follow-up. This is on me personally; I'm balancing a few time-sensitive items right now and a bit behind on the things that are non-blocking (as I understand this request to be - please correct me if I've misunderstood).

This request (and the corresponding Slack threads) has raised some broader questions about the technical and policy implications of hosting unvetted proof-of-concept models in Lift Wing vs. Toolforge, and about the process for building something nimbly that can later be productionized and reused by other teams. I owe you a longer summary of my thoughts on this, and I will prioritize writing and sending that here by tomorrow if that's okay.

In the meantime, I wanted to double check that this request is non-blocking, given that you're able to use Toolforge in this specific instance (I get that that won't always be the case). Is that correct?

This is kind of a unique case for us, since most of our hosting requests come from the Research team, so I'm sorry that we don't have a process built around this yet!

OK, no worries and no rush. This isn't blocking, but I was hoping to get a sense of whether this direction would work (and how the process works). It sounds like much of that is still up in the air.