
Request creation of antivandalismai VPS project
Closed, Invalid (Public)

Description

Project Name: antivandalismai

Developer account usernames of requestors: ItsNyoty

Purpose: To build a tool to help me and others fight vandalism on the Dutch Wikipedia more effectively.

Brief description:

The idea is to create a tool that uses smart algorithms to automatically spot vandalism edits on the Dutch Wikipedia (nl.wikipedia.org). As ItsNyoty, I'm already pretty active in fighting vandalism there, but it takes a lot of time. This tool will analyze edits to find things that look suspicious, like big content changes, weird user behavior, and common vandalism patterns. This way, we can quickly find and undo the bad edits before they cause too much trouble.

  • Software: I'm planning to use Python with libraries like TensorFlow or PyTorch to train the algorithms, and the MediaWiki API to connect to Wikipedia. I'll probably need scikit-learn too. No need to worry about paying for software licenses since these are free libraries.
  • Project Admin: I'll need to build the AI and maintain it as vandals come up with new tricks. This means retraining the algorithms with fresh data and tweaking things as needed. I'll use GitHub to keep track of everything.
  • Disk Space: I might need a bit more storage space to hold all the data and the algorithm itself. I think 500GB should be enough to start, but I might need to ask for more later depending on how it goes.
  • Important Note: The tool won't just automatically revert edits. I want to make sure a real person always checks first to avoid mistakes.
  • Dutch Focus: The tool will focus on the Dutch Wikipedia and the kind of vandalism that happens there, since that is where I mostly fight vandalism and I know my way around. It should be possible to extend the tool to other languages later.
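
As a rough illustration of the plan above, here is a minimal Python sketch of the first steps: listing recent edits through the MediaWiki Action API and turning each one into a few simple signals for a human reviewer. The feature choices, the threshold, and the helper names (`recent_changes_url`, `edit_features`, `needs_review`) are assumptions made for illustration and are not part of the request. A trained model would replace the hand-written rule.

```python
from urllib.parse import urlencode

# Standard Action API endpoint for the Dutch Wikipedia.
API_URL = "https://nl.wikipedia.org/w/api.php"


def recent_changes_url(limit=50):
    """Build an Action API URL that lists recent non-bot edits with size info."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "title|ids|sizes|user|comment|timestamp",
        "rctype": "edit",
        "rcshow": "!bot",
        "rclimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def edit_features(change):
    """Turn one recentchanges entry into simple signals (illustrative choices)."""
    old_len = change.get("oldlen", 0)
    new_len = change.get("newlen", 0)
    delta = new_len - old_len
    return {
        "size_delta": delta,
        # Share of the page that was removed; 0.0 for additions or new pages.
        "removed_fraction": max(0, -delta) / old_len if old_len else 0.0,
        "has_comment": bool(change.get("comment")),
    }


def needs_review(features, removed_threshold=0.5):
    """Flag an edit for a human reviewer; nothing is reverted automatically."""
    return (
        features["removed_fraction"] >= removed_threshold
        and not features["has_comment"]
    )
```

An edit that blanks most of a page without an edit summary would be queued for review, in line with the note that a real person always checks first.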

How soon you are hoping this can be fulfilled: I hope to release it this year. I'm a busy Applied Computer Science student, and I'm planning to work on this as a side project in my spare time.

Event Timeline

Hi!

Before creating the project, a couple notes/questions:

  • Are you going to use already existing models?

If so, note that many are not open source, so they will not be allowed (e.g. Llama: https://opensource.org/blog/metas-llama-license-is-still-not-open-source). Also, if you are only going to run the model, it might be doable in Toolforge, so check that out first if possible (https://wikitech.wikimedia.org/wiki/Help:Toolforge).

  • Are you going to train models?

If so, two notes. First, make sure that the data you use also complies with the data policies of Cloud Services (https://wikitech.wikimedia.org/wiki/Wikitech:Cloud_Services_Terms_of_use).
Second, we don't have GPUs or ML-performant VMs available, so training your model might take a very long time (depending on your training). Note that Toolforge is even more restrictive on CPU/RAM limits.

Let us know if all of that is ok for you :)

Hi, thank you for reaching out.

I was going to train the models on Cloud Services, but since Wikimedia doesn't have GPU VMs available, I will look at my university's datacenter instead. I might be able to train them there, and after that I will see whether running the model is doable on Toolforge. I will close this request for now.

ItsNyoty changed the task status from Open to Stalled. Mar 25 2025, 2:04 PM
fnegri subscribed.

@ItsNyoty I'm closing this task as "Invalid". Feel free to reopen it if you are still interested in a Cloud VPS project.