Remove dewiki-tmg-autoformatter from en:WP:US/List import by toolinfo-scraper
Closed, Resolved, Public

Description

https://toolhub.wikimedia.org/tools/tmg-auto-formatter is a better record created by the script maintainer.

Event Timeline

Restricted Application added a subscriber: Aklapper.

@bd808 would you mind providing more context on this?

Sure. I built a tool at https://tools-static.wmflabs.org/toolinfo-scraper/ that is collecting data from https://en.wikipedia.org/wiki/Wikipedia:User_scripts/List and presenting it to Toolhub as a toolinfo.json file. This ticket is about one particular tool record scraped from that page being a duplicate of a toolinfo record created directly in Toolhub by the script's author. I would like to add some sort of exclusion configuration functionality to my custom scraper to make it simple to discard that particular record, and ideally others that are found in the future.

This is probably less of a Toolhub task and more of a toolinfo-scraper task.
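
A minimal sketch of what such an exclusion step might look like, assuming the scraper holds its records as a list of dicts keyed by the toolinfo `name` field before writing toolinfo.json. The `EXCLUDED_NAMES` set, the `filter_records` helper, and the sample records are hypothetical and are not taken from the actual toolinfo-scraper code:

```python
import json

# Hypothetical exclusion list: names of scraped records that duplicate
# records maintained directly in Toolhub by the script's author.
EXCLUDED_NAMES = {"dewiki-tmg-autoformatter"}


def filter_records(records, excluded=EXCLUDED_NAMES):
    """Return only the records whose "name" is not in the exclusion set."""
    return [r for r in records if r.get("name") not in excluded]


# Illustrative input; real records would come from scraping the wiki page.
scraped = [
    {"name": "dewiki-tmg-autoformatter", "title": "TMG auto formatter"},
    {"name": "enwiki-example-script", "title": "Example script"},
]

kept = filter_records(scraped)
print(json.dumps(kept, indent=2))
```

Keeping the excluded names in one set (or a small config file loaded into one) would make it simple to add further duplicates as they are found.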

Interesting, is there a reason why the author wouldn't want to remove the one they created and update the entry created by the scraper?

The access control model for editing toolinfo data does not currently allow this. A toolinfo record can only be edited by its initial creator. Any record that enters the system from the crawler can only be changed by changing the source file that was crawled. See https://meta.wikimedia.org/w/index.php?diff=22209560 and https://meta.wikimedia.org/wiki/Toolhub/Decision_record#Content_ownership/modification_model for more detail about the current editing model.

bd808 moved this task from To Do to Done on the User-bd808 board.