Create editquality campaign for Spanish Wikiversity
Closed, Resolved, Public

Description

  • Confirm translations are ready
  • List of trusted user groups
  • Translate "Edit quality (20k sample)"
  • Run prelabeling script
  • Load revisions into labels.wmflabs.org

Event Timeline

Restricted Application added a project: artificial-intelligence. Nov 16 2018, 5:06 AM
Restricted Application added a subscriber: Aklapper.

The translation for "Edit quality (20k sample)" is "Editar calidad (20k muestra aleatoria)" (literally "Edit quality (20k random sample)").

I'd use "Calidad de la edición (muestra de 20 000)" (literally "Quality of the edit (sample of 20,000)"). Thanks.

Halfak triaged this task as Normal priority.
Halfak renamed this task from Edit quality campaign for Spanish Wikiversity to Create editquality campaign for Spanish Wikiversity.
Restricted Application added a project: User-Ladsgroup. Apr 5 2019, 11:46 AM
Halfak added a subscriber: Halfak. Apr 26 2019, 8:10 PM

@Ladsgroup, I just pinged you in the task because the data looks a little weird, and I had some questions about it a few weeks ago that still appear to be unanswered.

I just found the questions on GitHub. I think most of your concerns might be because eswikiversity is a very small project; for some periods of time I am the only regular contributor there. My background is in software development, so if you let me know how I can see the data and what your questions are, I think I should be able to answer them.

Now that I am here, I am wondering whether it would be possible to filter my edits out of the sample (that is, treat them as trusted). I'm not saying they are always good, but considering the experience with the other Spanish projects, I might end up doing most of the labeling myself, and it would be kind of silly since a lot of the edits would be my own.

We can definitely play with the "trusted edits" set. @Lsanabria, are there any user rights on Spanish Wikiversity that you think might indicate a "trusted" status? Also, do you think it would mostly work out OK if we labeled edits by anyone with more than a couple hundred edits as "trusted"? Note that even "trusted" edits get loaded into Wiki Labels for review if they are reverted.
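The heuristic proposed in this comment can be sketched as a small predicate. This is a minimal illustration, not the actual prelabeling script: the `Edit` record, its field names, the `TRUSTED_GROUPS` set and the 200-edit threshold (standing in for "a couple hundred") are all assumptions.

```python
from dataclasses import dataclass, field

# Assumed trusted group: "sysop" is MediaWiki's internal name for administrators.
TRUSTED_GROUPS = {"sysop"}
# Hypothetical threshold standing in for "a couple hundred" edits.
TRUSTED_EDIT_COUNT = 200


@dataclass
class Edit:
    """Hypothetical minimal record of one revision and its author."""
    user_groups: set = field(default_factory=set)
    user_edit_count: int = 0
    reverted: bool = False


def is_trusted(edit: Edit) -> bool:
    """Trusted if the author is in a trusted group or has enough edits."""
    in_group = bool(edit.user_groups & TRUSTED_GROUPS)
    return in_group or edit.user_edit_count >= TRUSTED_EDIT_COUNT


def needs_review(edit: Edit) -> bool:
    """Reverted edits always go to review, even when the author is trusted."""
    return edit.reverted or not is_trusted(edit)


print(needs_review(Edit(user_groups={"sysop"})))                 # False
print(needs_review(Edit(user_groups={"sysop"}, reverted=True)))  # True
print(needs_review(Edit(user_edit_count=35)))                    # True
```

Checking `reverted` before trust mirrors the note that reverted edits are loaded into Wiki Labels regardless of who made them.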

Halfak claimed this task. Apr 29 2019, 4:04 PM
Halfak moved this task from Review to Active on the Scoring-platform-team (Current) board.
Halfak added a comment. May 9 2019, 1:59 PM

@Lsanabria, I'm still waiting on your response to my last questions. No rush. Just want to make sure you know I'm blocked on you taking a look.

Sorry for the delay. I have been somewhat disconnected these days. I just got admin rights on eswikiversity, so marking the edits of admin group members as trusted should address my concern. Let me know if you need any additional information.

Halfak added a comment. May 9 2019, 2:29 PM

OK, let me re-run. We were already considering admins "trusted", but I'll see how much of a difference it makes to include your edits.

Halfak added a comment. May 9 2019, 4:08 PM

It turns out that only cuts out about 500 edits (from 10,727 to 10,214). That's a lot of edits to label; we want to get this down to about 5k at the most. I'll try cutting the number of "trusted edits" down to 200.
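A quick check of the arithmetic in this comment, using only the figures quoted there (the 5,000 target stands in for "about 5k at the most"):

```python
# Figures quoted in the comment.
before = 10727  # edits flagged for review before trusting the new admin's edits
after = 10214   # edits flagged for review after trusting them
target = 5000   # rough upper bound ("about 5k at the most")

removed = before - after                  # edits dropped from the review set
share = round(100 * removed / before, 1)  # that drop as a percentage
still_over = after - target               # gap remaining to the target

print(removed, share, still_over)  # 513 4.8 5214
```

Trusting one more contributor removes under 5% of the review set, which is why a different cut is needed to reach the target.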

Halfak added a comment. May 9 2019, 8:58 PM

Of the ~17k edits we get, here's the breakdown by our "autolabeler":

edits   type            needs review
2185    anon            True
87      blocked user    True
2874    other           True
2556    reverted edit   True
3459    trusted edits   False
6579    trusted user    False

I think the bare minimum for review is to include the "anon", "blocked user", and "reverted edit" sets. That gets us 2185 + 87 + 2556 = 4828 edits for review. The "other" set would add a lot more work. These are edits made by registered newcomers who have made fewer than 100 edits and have not been blocked. Do you think it would be OK to assume these edits are good if they have not been reverted?
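Tallying the autolabeler breakdown confirms the figures in this comment. The counts are copied from the table; only the grouping into a minimal review set plus the optional "other" set is added here.

```python
# Counts per autolabeler type, copied from the breakdown table.
breakdown = {
    "anon": 2185,
    "blocked user": 87,
    "other": 2874,
    "reverted edit": 2556,
    "trusted edits": 3459,
    "trusted user": 6579,
}

# Types proposed as the bare minimum for review.
minimal = ["anon", "blocked user", "reverted edit"]

total = sum(breakdown.values())
minimal_review = sum(breakdown[t] for t in minimal)
with_other = minimal_review + breakdown["other"]

print(total)           # 17740
print(minimal_review)  # 4828
print(with_other)      # 7702
```

Reviewing the "other" set as well would mean roughly 7,700 edits to label instead of about 4,800.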

If they have not been reverted, I think they are very likely good edits. Some bad ones might not have been reverted, but I don't think there would be many. That said, I don't think it would be a problem to include them in the review. The eswiki dataset was around 8,000 edits (if I recall correctly), so I think 7,000 is manageable. I will defer to your expertise on whatever you think is better from a technical point of view.

OK, I made the change so that autoconfirmed users are "trusted" as long as their edits are not reverted and they have never been blocked. That got us down below 4828 revisions.
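The final rule can be sketched as a single labeling function. This is a hypothetical reconstruction for illustration, not the prelabeling script itself: the `Revision` fields, the order of checks, the reason strings, and the handling of registered users who are not yet autoconfirmed (not stated in the thread, treated here as needing review) are assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Revision:
    """Hypothetical per-revision facts; the real script's fields may differ."""
    anonymous: bool
    ever_blocked: bool
    reverted: bool
    autoconfirmed: bool


def autolabel(rev: Revision) -> Tuple[str, bool]:
    """Return (reason, needs_review) under the rule described above.

    Assumed order: any disqualifying condition (anonymous, ever blocked,
    reverted) sends the edit to review before trust is considered.
    """
    if rev.anonymous:
        return "anon", True
    if rev.ever_blocked:
        return "blocked user", True
    if rev.reverted:
        return "reverted edit", True
    if rev.autoconfirmed:
        # Autoconfirmed, never blocked and not reverted: assumed good.
        return "trusted (autoconfirmed)", False
    # Assumption: registered but not autoconfirmed still goes to review.
    return "other", True


print(autolabel(Revision(anonymous=False, ever_blocked=False,
                         reverted=False, autoconfirmed=True)))
# ('trusted (autoconfirmed)', False)
```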

Halfak closed this task as Resolved. Tue, Jun 18, 1:39 PM