
Generate datasets for Automoderator model testing for top 150 Wikipedias
Closed, Resolved · Public

Description

As part of our model testing process (T342641) we need datasets of edits and Revert Risk model scores for the wikis we want feedback and input on.

Projects

Top 150 Wikipedias per Wiki comparison data

Datasets

We want 25,000 random edits per project, along with their Revert Risk scores. These should be article-namespace edits only. Additionally, for each edit we want to include data broadly matching the dimensions on which Automoderator will avoid acting (a query sketch follows the list below):

  • Is the edit a self-revert? (i.e. the edit is a revert of an edit made by the same user)
  • Is the edit a page creation?
  • Was the edit made by a bot?
  • Is the user an administrator?
  • Does the edit have the newcomer task links tag?
  • Does the edit have the contenttranslation tag?
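For reference, a minimal sketch of one way to pull such a sample, assuming access to the wmf.mediawiki_history Data Lake table (field names as in the published schema; the snapshot, wiki, and seed values are placeholders, and the Revert Risk scores would be joined in separately):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

SNAPSHOT = "2023-10"   # placeholder snapshot
WIKI = "enwiki"        # one project at a time; loop over the top-150 list
SAMPLE_SIZE = 25_000

sample = (
    spark.table("wmf.mediawiki_history")
    .where(
        (F.col("snapshot") == SNAPSHOT)
        & (F.col("wiki_db") == WIKI)
        & (F.col("event_entity") == "revision")
        & (F.col("event_type") == "create")
        & (F.col("page_namespace") == 0)   # article namespace only
    )
    .select(
        "wiki_db",
        "revision_id",
        "event_user_id",
        # A missing/zero parent revision id marks a page creation in this schema.
        (F.col("revision_parent_id").isNull() | (F.col("revision_parent_id") == 0))
            .alias("is_page_creation"),
        # Non-empty event_user_is_bot_by means the user is a bot by name or group.
        (F.size("event_user_is_bot_by") > 0).alias("is_bot"),
        F.array_contains("event_user_groups", "sysop").alias("is_admin"),
        # Tag names are assumptions; confirm against Special:Tags on each wiki.
        F.array_contains("revision_tags", "newcomer task add link").alias("is_add_link_task"),
        F.array_contains("revision_tags", "contenttranslation").alias("is_content_translation"),
    )
    .orderBy(F.rand(seed=42))   # plain random sample, no stratification
    .limit(SAMPLE_SIZE)
)
# The is_self_revert flag and the Revert Risk scores are attached afterwards
# (see the later comments on the self-revert definition and the model API).
```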

Event Timeline

@Samwalton9-WMF

Regarding random sampling, how balanced should the dataset be across all of the dimensions? If we want to ensure a minimum across some or all of the dimensions, I'd suggest a stratified random sample.

Also, would it be helpful to have the contenttranslation tag as well?

@Samwalton9-WMF

Regarding random sampling, how balanced should the dataset be across all of the dimensions? If we want to ensure a minimum across some or all of the dimensions, I'd suggest a stratified random sample.

We discussed that, for this dataset, we don't want any stratified sampling, since we're just trying to give users a random set of edits and will be filtering these dimensions out. We may in the future want a stratified dataset so we can better understand how impactful filtering on these aspects would be.

@Samwalton9-WMF I shared the dataset with you. Please review it and let me know whether it fits the needs.

Thanks @KCVelaga_WMF! Some things I'm noticing initially:

  • It looks like is_self_revert is actually finding whether the edit was later reverted by the same editor, not whether the edit itself is a revert of a previous edit by this user. We're looking for the latter.
  • newcomer tasks: We're specifically interested in the newcomer task links tag, but this dataset seems to be targeting newcomer tasks in general - there are other newcomer tasks (like copyediting) which we want to take action on, but we think 'add a link' is so unambiguous that we should always leave that one alone (see the tag-filter sketch after this list).
  • It looks like there are zero page creations in the datasets - is that just because page creations are rarer, or are they inherently excluded because the Revert Risk API can only consider edits which have a parent revision (or something else)?
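For clarity, a small illustrative snippet of the intended tag handling - distinguishing the specific 'add a link' newcomer task tag from the broader newcomer task family, plus the contenttranslation tag. The exact tag names below are assumptions and should be checked against Special:Tags:

```python
ADD_LINK_TAG = "newcomer task add link"      # assumed tag name for 'add a link' tasks
CONTENT_TRANSLATION_TAG = "contenttranslation"

def tag_flags(revision_tags):
    """revision_tags: list of change tags on one edit (e.g. from mediawiki_history)."""
    tags = set(revision_tags or [])
    return {
        # True only for the specific 'add a link' task, not for other newcomer tasks.
        "is_add_link_task": ADD_LINK_TAG in tags,
        "is_content_translation": CONTENT_TRANSLATION_TAG in tags,
    }
```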

@Samwalton9-WMF

  • I am not sure I understood the first point completely. Let's consider an edit which reverts a previous edit; in that case, "edit itself is a revert of a previous edit" would be TRUE. Can you elaborate on what you mean by "by this user"?
  • Got it, fixed that in the code for now. I will update the dataset along with other necessary changes.
  • Yes, I observed this too. The revert scores dataset shared by @Pablo doesn't include page creations (probably for the same reason you mentioned).
    • If we want page creations as well, we can generate the scores ourselves for a limited number of edits (a sketch of scoring via the model API follows below).
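If we go that route, a rough sketch of scoring individual revisions against the LiftWing language-agnostic Revert Risk endpoint (endpoint URL and response shape as documented at the time of writing - worth double-checking; page creations may be rejected by the model since they have no parent revision):

```python
import requests

# Publicly documented LiftWing endpoint for the language-agnostic Revert Risk model
# (assumption: verify the URL and response shape before relying on it).
LIFTWING_URL = (
    "https://api.wikimedia.org/service/lw/inference/v1/models/"
    "revertrisk-language-agnostic:predict"
)

def revert_risk_score(lang: str, rev_id: int) -> float:
    """Return the model's revert probability for one revision."""
    resp = requests.post(LIFTWING_URL, json={"lang": lang, "rev_id": rev_id})
    resp.raise_for_status()
    # Expected shape: {"output": {"prediction": ..., "probabilities": {"true": ..., "false": ...}}}
    return resp.json()["output"]["probabilities"]["true"]

# Example (hypothetical revision id):
# revert_risk_score("en", 1075902984)
```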

@Samwalton9-WMF

  • I am not sure I understood the first point completely. Let's consider an edit which reverts a previous edit; in that case, "edit itself is a revert of a previous edit" would be TRUE. Can you elaborate on what you mean by "by this user"?

The first edit with is_self_revert = True is https://en.wikipedia.org/w/index.php?diff=1075902621. This edit isn't a revert, but it was later reverted by the same editor who made it. We're not interested in that - we're interested in whether the edit itself is the self-revert, i.e. this edit should be True: https://en.wikipedia.org/w/index.php?title=Dune_(2021_film)&diff=next&oldid=1075902984

Does that make sense?
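To pin down the intended definition in code - a minimal, hypothetical helper (the inputs are placeholders, not the actual pipeline): is_self_revert should be True only when the edit itself undoes earlier edits made by the same user, not when the edit was later reverted by its own author.

```python
def is_self_revert(edit_user_id, reverted_edit_user_ids):
    """
    edit_user_id: the user who made the edit being flagged.
    reverted_edit_user_ids: users of the edits that this edit undoes
    (empty if the edit is not a revert at all).
    """
    if not reverted_edit_user_ids:
        # The edit is not a revert, so it cannot be a self-revert,
        # even if it later gets reverted by its own author.
        return False
    # The edit is a revert; it is a self-revert only if it undoes the same user's edits.
    return all(user_id == edit_user_id for user_id in reverted_edit_user_ids)
```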

Samwalton9-WMF moved this task from Done to Ready on the Moderator-Tools-Team (Kanban) board.

We decided to also proactively support German and Japanese Wikipedias since they're engaged with this project. Could you also generate the same datasets for these wikis?

@Samwalton9-WMF I have updated the Sheet with jawiki and dewiki datasets.

We chatted briefly on Slack - it turns out self-reverts were excluded from these datasets entirely. We'd rather have them in the dataset, and have Automoderator say 'No' to reverting them, rather than exclude them from the data at the outset. This isn't a blocker for v1 of testing, so no rush.

@Samwalton9-WMF @KCVelaga_WMF: It's unclear whether this is done or more work remains. If it is done according to the original scope but more work emerged (which seems to be the case), please resolve this task and create a new one for the additional work.

mpopov triaged this task as Medium priority.Oct 26 2023, 3:52 PM
KCVelaga_WMF renamed this task from Generate datasets for Automoderator model testing to Generate datasets for Automoderator model testing for top 150 Wikipedias.Nov 13 2023, 7:26 AM
KCVelaga_WMF updated the task description.

It is more time-efficient for me to generate the data in bulk rather than one wiki at a time. To start with, I will generate and store the data for the top 150 Wikipedias, according to the wiki comparison data.
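Roughly, the bulk run would just loop over the top-150 dbnames and write one partition per wiki (the wiki list, output path, and build_sample helper below are placeholders standing in for the sampling query sketched earlier):

```python
top_wikis = ["enwiki", "dewiki", "frwiki"]  # placeholder for the top-150 dbname list
OUTPUT_BASE = "hdfs:///tmp/automoderator_samples"  # placeholder output location

for wiki in top_wikis:
    sample = build_sample(wiki)  # hypothetical helper wrapping the sampling query
    sample.write.mode("overwrite").parquet(f"{OUTPUT_BASE}/wiki_db={wiki}")
```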

KCVelaga_WMF raised the priority of this task from Medium to High.
KCVelaga_WMF added subscribers: Chqaz, Ponor.