
Tamil draft quality model
Open, Low, Public

Description

@Tshrinivasan explained that reviewing newly created pages on Tamil Wikipedia for blog-style writing is an important curation job.

This page has a template tagging it as mildly problematic: https://ta.wikipedia.org/w/index.php?oldid=2291043

There are also new pages that are severely problematic and should be deleted immediately. (Here is an example of a copyright violation deletion: https://quarry.wmflabs.org/query/18667)

We should be able to train a model to detect both mildly and severely problematic new pages.

Event Timeline

Here's an example of a deletion comment for copyright violation:

[[WP:CR|பதிப்புரிமை]] மீறல்: மூலம் - http://sarithram.blogspot.in/2011/08/blog-post_4040.html

This links to WP:CR and that indicates the deletion reason. We know this because WP:CR redirects to விக்கிப்பீடியா:பதிப்புரிமை -- a page describing copyright.
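A minimal sketch of how that check could be automated, assuming deletion comments are available as plain strings. The names `WIKILINK_RE`, `SEVERE_POLICY_LINKS`, `extract_link_targets` and `is_severe_deletion` are hypothetical, and the set contains only WP:CR because that is the only link confirmed so far. It does not resolve redirects or normalize titles.

```python
import re

# Matches [[target]] and [[target|label]], capturing only the link target.
WIKILINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")

# Assumption: placeholder set. Only WP:CR is known from this task;
# the full list of policy links is still to be provided.
SEVERE_POLICY_LINKS = {"WP:CR"}


def extract_link_targets(comment):
    """Return the wikilink targets found in a deletion comment."""
    return [target.strip() for target in WIKILINK_RE.findall(comment)]


def is_severe_deletion(comment, policy_links=SEVERE_POLICY_LINKS):
    """True if the comment links to any of the given policy pages."""
    return any(t in policy_links for t in extract_link_targets(comment))


comment = (
    "[[WP:CR|பதிப்புரிமை]] மீறல்: மூலம் - "
    "http://sarithram.blogspot.in/2011/08/blog-post_4040.html"
)
print(extract_link_targets(comment))  # ['WP:CR']
print(is_severe_deletion(comment))    # True
```

Matching on the link target rather than the displayed label keeps the check independent of how the admin worded the comment.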

I talked to @Tshrinivasan about this and he'll provide a list of all the pages whose links we should look for in "severely problematic new page" deletions.

Halfak triaged this task as Low priority. Jun 1 2017, 2:28 PM
Halfak moved this task from Unsorted to New development on the Machine-Learning-Team board.