
Generate datasets for Automoderator model testing for top 150 Wikipedias
Closed, Resolved · Public

Description

As part of our model testing process (T342641) we need datasets of edits and Revert Risk model scores for the wikis we want feedback and input on.

Projects

Top 150 Wikipedias per Wiki comparison data

Datasets

We want 25,000 random edits per project, along with their Revert Risk scores. These should be article-namespace edits only. Additionally, for each edit we want to include data broadly matching the dimensions on which Automoderator will avoid acting (a query sketch follows the list below):

  • Is the edit a self-revert? (i.e. the edit is a revert of an edit made by the same user)
  • Is the edit a page creation?
  • Was the edit made by a bot?
  • Is the user an administrator?
  • Does the edit have the newcomer task links tag?
  • Does the edit have the contenttranslation tag?
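For reference, a minimal sketch of one way to pull such a sample, assuming access to the wmf.mediawiki_history Data Lake table (field names as in the published schema; the snapshot, wiki, and seed values are placeholders, and the Revert Risk scores would be joined in separately):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

SNAPSHOT = "2023-10"   # placeholder snapshot
WIKI = "enwiki"        # one project at a time; loop over the top-150 list
SAMPLE_SIZE = 25_000

sample = (
    spark.table("wmf.mediawiki_history")
    .where(
        (F.col("snapshot") == SNAPSHOT)
        & (F.col("wiki_db") == WIKI)
        & (F.col("event_entity") == "revision")
        & (F.col("event_type") == "create")
        & (F.col("page_namespace") == 0)   # article namespace only
    )
    .select(
        "wiki_db",
        "revision_id",
        "event_user_id",
        # A missing/zero parent revision id marks a page creation in this schema.
        (F.col("revision_parent_id").isNull() | (F.col("revision_parent_id") == 0))
            .alias("is_page_creation"),
        # Non-empty event_user_is_bot_by means the user is a bot by name or group.
        (F.size("event_user_is_bot_by") > 0).alias("is_bot"),
        F.array_contains("event_user_groups", "sysop").alias("is_admin"),
        # Tag names are assumptions; confirm against Special:Tags on each wiki.
        F.array_contains("revision_tags", "newcomer task add link").alias("is_add_link_task"),
        F.array_contains("revision_tags", "contenttranslation").alias("is_content_translation"),
    )
    .orderBy(F.rand(seed=42))   # plain random sample, no stratification
    .limit(SAMPLE_SIZE)
)
# The is_self_revert flag and the Revert Risk scores are attached afterwards
# (see the later comments on the self-revert definition and the model API).
```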

Event Timeline

@Samwalton9-WMF

Regarding random sampling, how balanced should the dataset be across all of the dimensions? If we want to ensure a minimum across some or all of the dimensions, I'd suggest a stratified random sample.

Also, would it be helpful to have the contenttranslation tag as well?

@Samwalton9-WMF

Regarding random sampling, how balanced should the dataset be across all of the dimensions? If we want to ensure a minimum across some or all of the dimensions, I'd suggest a stratified random sample.

We discussed that, for this dataset, we don't want any stratified sampling, since we're just trying to give users a random set of edits and will be filtering these dimensions out. We may in the future want a stratified dataset so we can better understand how impactful filtering on these aspects would be.

@Samwalton9-WMF I shared the dataset with you. Please review it and let me know whether it fits the needs.

Thanks @KCVelaga_WMF! Some things I'm noticing initially:

  • It looks like is_self_revert is actually finding whether the edit was later reverted by the same editor, not whether the edit itself is a revert of a previous edit by this user. We're looking for the latter.
  • newcomer tasks: We're specifically interested in the newcomer task links tag, but this dataset seems to be targeting newcomer tasks in general - there are other newcomer tasks (like copyediting) which we want to take action on, but we think 'add a link' is so unambiguous that we should always leave that one alone (see the tag-filter sketch after this list).
  • It looks like there are zero page creations in the datasets - is that just because page creations are rarer, or are they inherently excluded because the Revert Risk API can only consider edits which have a parent revision (or something else)?
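For clarity, a small illustrative snippet of the intended tag handling - distinguishing the specific 'add a link' newcomer task tag from the broader newcomer task family, plus the contenttranslation tag. The exact tag names below are assumptions and should be checked against Special:Tags:

```python
ADD_LINK_TAG = "newcomer task add link"      # assumed tag name for 'add a link' tasks
CONTENT_TRANSLATION_TAG = "contenttranslation"

def tag_flags(revision_tags):
    """revision_tags: list of change tags on one edit (e.g. from mediawiki_history)."""
    tags = set(revision_tags or [])
    return {
        # True only for the specific 'add a link' task, not for other newcomer tasks.
        "is_add_link_task": ADD_LINK_TAG in tags,
        "is_content_translation": CONTENT_TRANSLATION_TAG in tags,
    }
```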

@Samwalton9-WMF

  • I am not sure I understood the first point completely. Let's consider an edit which reverts a previous edit; in that case, "edit itself is a revert of a previous edit" would be TRUE. Can you elaborate on what you mean by "by this user"?
  • Got it, fixed that in the code for now. I will update the dataset along with other necessary changes.
  • Yes, I observed this too. The revert scores dataset shared by @Pablo doesn't include page creations (probably for the same reason you mentioned).
    • If we want page creations as well, we can generate the scores ourselves for a limited number of edits (a sketch of scoring via the model API follows below).
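If we go that route, a rough sketch of scoring individual revisions against the LiftWing language-agnostic Revert Risk endpoint (endpoint URL and response shape as documented at the time of writing - worth double-checking; page creations may be rejected by the model since they have no parent revision):

```python
import requests

# Publicly documented LiftWing endpoint for the language-agnostic Revert Risk model
# (assumption: verify the URL and response shape before relying on it).
LIFTWING_URL = (
    "https://api.wikimedia.org/service/lw/inference/v1/models/"
    "revertrisk-language-agnostic:predict"
)

def revert_risk_score(lang: str, rev_id: int) -> float:
    """Return the model's revert probability for one revision."""
    resp = requests.post(LIFTWING_URL, json={"lang": lang, "rev_id": rev_id})
    resp.raise_for_status()
    # Expected shape: {"output": {"prediction": ..., "probabilities": {"true": ..., "false": ...}}}
    return resp.json()["output"]["probabilities"]["true"]

# Example (hypothetical revision id):
# revert_risk_score("en", 1075902984)
```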

@Samwalton9-WMF

  • I am not sure I understood the first point completely. Let's consider an edit which reverts a previous edit; in that case, "edit itself is a revert of a previous edit" would be TRUE. Can you elaborate on what you mean by "by this user"?

The first edit with is_self_revert = True is https://en.wikipedia.org/w/index.php?diff=1075902621. This edit isn't a revert, but it was later reverted by the same editor who made it. We're not interested in that - we're interested in whether the edit itself is the self-revert, i.e. this edit should be True: https://en.wikipedia.org/w/index.php?title=Dune_(2021_film)&diff=next&oldid=1075902984

Does that make sense?
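To pin down the intended definition in code - a minimal, hypothetical helper (the inputs are placeholders, not the actual pipeline): is_self_revert should be True only when the edit itself undoes earlier edits made by the same user, not when the edit was later reverted by its own author.

```python
def is_self_revert(edit_user_id, reverted_edit_user_ids):
    """
    edit_user_id: the user who made the edit being flagged.
    reverted_edit_user_ids: users of the edits that this edit undoes
    (empty if the edit is not a revert at all).
    """
    if not reverted_edit_user_ids:
        # The edit is not a revert, so it cannot be a self-revert,
        # even if it later gets reverted by its own author.
        return False
    # The edit is a revert; it is a self-revert only if it undoes the same user's edits.
    return all(user_id == edit_user_id for user_id in reverted_edit_user_ids)
```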

Samwalton9-WMF moved this task from Done to Ready on the Moderator-Tools-Team (Kanban) board.

We decided to also proactively support German and Japanese Wikipedias since they're engaged with this project. Could you also generate the same datasets for these wikis?

@Samwalton9-WMF I have updated the Sheet with jawiki and dewiki datasets.

We chatted briefly on Slack - it turns out self-reverts were excluded from these datasets entirely. We'd rather have them in the dataset, and have Automoderator say 'No' to reverting them, rather than exclude them from the data at the outset. This isn't a blocker for v1 of testing, so no rush.

@Samwalton9-WMF @KCVelaga_WMF: It's unclear whether this is done or more work remains. If it is done according to the original scope but more work emerged (which seems to be the case), please resolve this task and create a new one for the additional work.

mpopov triaged this task as Medium priority.Oct 26 2023, 3:52 PM
KCVelaga_WMF renamed this task from Generate datasets for Automoderator model testing to Generate datasets for Automoderator model testing for top 150 Wikipedias.Nov 13 2023, 7:26 AM
KCVelaga_WMF updated the task description.

It is more time-efficient for me to generate the data in bulk rather than one wiki at a time. To start with, I will generate and store the data for the top 150 Wikipedias, according to the wiki comparison data.
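Roughly, the bulk run would just loop over the top-150 dbnames and write one partition per wiki (the wiki list, output path, and build_sample helper below are placeholders standing in for the sampling query sketched earlier):

```python
top_wikis = ["enwiki", "dewiki", "frwiki"]  # placeholder for the top-150 dbname list
OUTPUT_BASE = "hdfs:///tmp/automoderator_samples"  # placeholder output location

for wiki in top_wikis:
    sample = build_sample(wiki)  # hypothetical helper wrapping the sampling query
    sample.write.mode("overwrite").parquet(f"{OUTPUT_BASE}/wiki_db={wiki}")
```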

KCVelaga_WMF raised the priority of this task from Medium to High.
KCVelaga_WMF added subscribers: Chqaz, Ponor.