- Confirm translations are ready
- List of trusted user groups
- Translate "Edit quality (20k sample)"
- Run prelabeling script
- Load revisions into labels.wmflabs.org
I just found the questions on GitHub. I think most of your concerns might be because eswikiversity is a very small project. For some periods of time, I have been the only regular contributor there. My background is in software development, so if you let me know how I can see the data and what your questions are, I think I should be able to answer them.
And now that I am here, I am wondering if it could be possible to filter out/trust my edits from the sample. Not saying they are always good but considering the experience with the other Spanish projects, I might end up doing most of the labeling myself and it would be kind of silly because a lot of them would be my own.
We can definitely play with the "trusted edits" set. @Lsanabria, are there any user-rights on Spanish Wikiversity that you think might indicate a "trusted" status? Also, do you think it would mostly work out OK if we labeled edits by anyone with over a couple hundred edits as "trusted"? Note that even "trusted" edits get loaded into Wiki Labels for review if they are reverted.
Sorry for the delay. I have been somewhat disconnected these days. I just got admin rights in eswikiversity so marking the edits of the admin group members as trusted should address my concern. Let me know if you need any additional information.
Turns out that cuts out only about 500 edits -- from 10727 to 10214. That still leaves a lot of edits to label. We want to get this down to about 5k at the most. I'll try cutting the number of "trusted edits" down to 200.
Of the ~17k edits we get, here's the breakdown by our "autolabeler":
I think the bare minimum for review is to include "anon", "blocked user", and "reverted edit". That will get us 2185 + 87 + 2556 = 4828 edits for review. The other set would add a lot more work. These are edits made by registered newcomers who have made fewer than 100 edits and have not been blocked. Do you think it would be OK to assume these edits are good if they have not been reverted?
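For reference, the bucketing discussed above could be sketched roughly as follows. This is only an illustration and not the actual prelabeling script: the `Edit` fields, the `sysop` group name, the 100-edit threshold, and the order in which the checks are applied are all assumptions. The per-bucket counts are taken from the sum above, mapped to the labels in the order they are listed.

```python
from dataclasses import dataclass

# Assumptions for illustration only (not taken from the real script):
TRUSTED_GROUPS = frozenset({"sysop"})  # hypothetical name for the admin group
TRUSTED_EDIT_COUNT = 100               # "fewer than 100 edits" counts as newcomer


@dataclass
class Edit:
    """Hypothetical minimal record of one sampled revision."""
    user_is_anon: bool = False
    user_groups: frozenset = frozenset()
    user_edit_count: int = 0
    user_was_blocked: bool = False
    reverted: bool = False


def autolabel(edit: Edit) -> str:
    """Assign one bucket per edit; the precedence here is an assumption."""
    # Reverted edits are checked first, since even "trusted" edits
    # go to review when they have been reverted.
    if edit.reverted:
        return "reverted edit"
    if edit.user_was_blocked:
        return "blocked user"
    if edit.user_is_anon:
        return "anon"
    if edit.user_groups & TRUSTED_GROUPS or edit.user_edit_count >= TRUSTED_EDIT_COUNT:
        return "trusted"
    return "newcomer"


def needs_review(edit: Edit, include_newcomers: bool = False) -> bool:
    """Return True if the edit should be loaded for manual labeling."""
    label = autolabel(edit)
    if label in ("anon", "blocked user", "reverted edit"):
        return True
    return include_newcomers and label == "newcomer"


# Counts quoted in the thread, in the order the labels are listed.
counts = {"anon": 2185, "blocked user": 87, "reverted edit": 2556}
minimum_review = sum(counts.values())
print(minimum_review)
```

Passing `include_newcomers=True` corresponds to the larger option, where unreverted edits by registered newcomers are also reviewed instead of being assumed good.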
If they have not been reverted, I think they are very likely good edits. Some bad ones might not have been reverted, but I don't think there would be too many. Having said that, I don't think it would be a problem if they are included in the review. The eswiki dataset was around 8000 (if I recall), so I think 7000 is manageable. I will defer to your expertise, whatever you think is better from the technical point of view.