This card tracks a top 10 wish from the Community Wishlist Survey: https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey
Original proposal: Currently we have a bot that analyzes "all" new edits to en WP for copyright concerns. The output is here: https://en.wikipedia.org/wiki/User:EranBot/Copyright/rc And there is the potential for it to work in a number of other languages. The problem is that it is not up as reliably as it should be. Also, the presentation of the concerns could be improved. Would love to see the output turned into an extension and formatted similarly to the en:wp:Special:NewPagesFeed. Currently the output is sortable by WikiProject. It would be nice to create WikiProject-specific modules to go on individual project pages. -- Doc James (talk · contribs · email) 03:45, 4 November 2015 (UTC)
Community Tech preliminary assessment:
Support: Very high. Lots of support on the proposal, with several people specifically calling for a tool that can be used on multiple projects and in multiple languages.
Impact: Medium to High, depending on what we're able to do. Integrating the human-checked false positive/true positive data into EranBot's existing database and improving the API could be particularly useful for research and machine learning projects, potentially improving the bot’s true positive rate and requiring less human involvement. The ability to adapt this for multiple projects and languages would be especially helpful.
Feasibility: There's an existing tool on English Wikipedia - EranBot, aka Plagiabot, based on the Turnitin database - and Community Tech has done some work to make the results more broadly useful in the last couple of months, including displaying the tool's results alongside Copyvios Detector's reports. (There are more details on ticket T110144.) Turning EranBot into an extension would be considerably more difficult than making improvements to the bot.
Risk: Medium, higher for more involved work. We'll need considerable discussion on the scope and definition.
Status: We're confident that we can do some helpful work on this wish this year. We need more investigation and discussion to figure out a clear scope of work. We'll be able to focus on it more in a few months.
Project page: https://meta.wikimedia.org/wiki/Community_Tech/Improve_the_plagiarism_detection_bot