WMDE declined to take over KrBot operations (T290635, T189747) for two reasons:
- It's not open source
- WMDE puts a higher priority on making violations accessible through SPARQL/API (T214362), but that work still depends on various tech tasks being completed (eg T201150)
So this task asks to reimplement (something like) KrBot using the new SPARQL/API access.
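One possible building block is the API route (as opposed to pure SPARQL): the WikibaseQualityConstraints extension already exposes an `action=wbcheckconstraints` module. A minimal sketch follows; the module and its `id` parameter are real, but the response traversal is an assumption about the documented result shape and the helper names are hypothetical:

```python
"""Sketch: batch-check items via wbcheckconstraints.
The response traversal is an assumption and may need adjusting."""
import json
import urllib.parse
import urllib.request

API = "https://www.wikidata.org/w/api.php"

def build_check_url(qids):
    """Build a wbcheckconstraints request URL for a batch of item IDs
    (the API caps how many IDs one request may carry)."""
    params = {
        "action": "wbcheckconstraints",
        "format": "json",
        "id": "|".join(qids),
    }
    return API + "?" + urllib.parse.urlencode(params)

def violations_for(qids, prop):
    """Yield (qid, message) for violations on `prop`.
    ASSUMED shape: wbcheckconstraints -> qid -> claims -> prop ->
    claim results, each carrying a `results` list with a `status`."""
    req = urllib.request.Request(
        build_check_url(qids),
        headers={"User-Agent": "violation-report-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    for qid, entity in data.get("wbcheckconstraints", {}).items():
        for claim in entity.get("claims", {}).get(prop, []):
            for check in claim.get("results", []):
                if check.get("status") == "violation":
                    yield qid, check.get("message-html", "")
```

The API checks one item at a time, so a property-wide report would still need a candidate list (eg all items using the property, fetched via SPARQL) to drive the batches.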
KrBot generates violation reports such as
https://www.wikidata.org/wiki/Wikidata:Database_reports/Constraint_violations/P2088, which are integrated into Property Discussion pages and are viewed as a core part of WD.
- These pages are the best way to work through data quality problems of specific props. Eg I'm currently working through https://www.wikidata.org/wiki/Wikidata:Database_reports/Constraint_violations/P2088#%22Single_value%22_violations to remove stale or wrong CrunchBase identifiers
- Even once I can get all violation info with SPARQL, I'd still prefer to work from a generated WD page because:
- all the info is available at a glance,
- it can be used by non-tech people (eg Getty Vocabulary Program editors will now use ULAN constraint violations to improve their own data)
- I can use it to generate QS corrections.
- (There is https://www.wikidata.org/wiki/Special:ConstraintReport/ (eg https://www.wikidata.org/wiki/Special:ConstraintReport/Q389336 shows `P2088` violations), but big-data editors don't fix data problems item by item.)
- One needed improvement: print the labels of WD items in addition to their `Qnnnn` identifiers
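The label improvement and the QS-correction workflow mentioned above could look roughly like this. The WDQS endpoint, the `wikibase:label` service, and the QuickStatements v1 `-` removal prefix are real; the query itself is a hand-rolled approximation of a "single value" violation report, not KrBot's actual logic, and the function names are hypothetical:

```python
"""Sketch: list "single value" violations for a property with labels,
and emit QuickStatements v1 removal commands (tab-separated)."""
import json
import urllib.parse
import urllib.request

WDQS = "https://query.wikidata.org/sparql"

def single_value_query(prop):
    """SPARQL: items holding more than one value for `prop`, with
    English labels (so the report shows labels, not just Qnnnn)."""
    return f"""
SELECT ?item ?itemLabel ?value WHERE {{
  ?item wdt:{prop} ?value .
  {{ SELECT ?item WHERE {{ ?item wdt:{prop} ?v }}
     GROUP BY ?item HAVING (COUNT(?v) > 1) }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}"""

def fetch_violations(prop):
    """Run the query against WDQS; yield (qid, label, value) rows."""
    url = WDQS + "?" + urllib.parse.urlencode(
        {"query": single_value_query(prop), "format": "json"})
    req = urllib.request.Request(
        url, headers={"User-Agent": "violation-report-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    for b in data["results"]["bindings"]:
        qid = b["item"]["value"].rsplit("/", 1)[-1]
        yield qid, b["itemLabel"]["value"], b["value"]["value"]

def qs_remove(qid, prop, value):
    """QuickStatements v1 command removing one external-id value."""
    return f'-{qid}\t{prop}\t"{value}"'
```

After manual review of the fetched rows, the `qs_remove` lines for the stale identifiers can be pasted straight into QuickStatements.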
Scheduling (update/refresh)
- T201150#7351510 discusses potentially useful schedules for when to reprocess (though that discussion is per-item, not per-property)
- A benefit of a SPARQL/API based bot is that violation pages can easily be refreshed **on demand**
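The scheduling behavior described above (periodic refresh plus an on-demand override) could be sketched as follows; this is a hypothetical structure, not anything KrBot or the tasks above specify:

```python
"""Sketch: periodic per-property refresh with an on-demand override."""
import time

class ReportScheduler:
    """Track when each property's violation page was last regenerated;
    refresh on schedule, or immediately when requested on demand."""

    def __init__(self, min_interval_s=24 * 3600, clock=time.time):
        self.min_interval_s = min_interval_s
        self.clock = clock          # injectable for testing
        self.last_run = {}          # property id -> last refresh time

    def should_refresh(self, prop, on_demand=False):
        last = self.last_run.get(prop)
        return (on_demand or last is None
                or self.clock() - last >= self.min_interval_s)

    def refresh(self, prop, regenerate, on_demand=False):
        """Call `regenerate(prop)` if due; return True if it ran."""
        if not self.should_refresh(prop, on_demand):
            return False
        regenerate(prop)
        self.last_run[prop] = self.clock()
        return True
```

The on-demand path is the point: an editor cleaning up one property can trigger a fresh report immediately instead of waiting for the next scheduled run.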