Context: The Tainted References feature on Wikidata is intended to make mismatched statement value/reference pairs more prominent to Wikidata editors.
- tainted reference: mismatching statement value and reference pair
- edit triggering a tainted reference: edit changing exclusively a value of the statement
- edit cleaning the tainted reference: one of the following
  - edit changing the reference of a statement on which a tainted reference has previously been triggered
  - edit removing the reference of a statement on which a tainted reference has previously been triggered
  - edit reverting the edit that triggered the tainted reference
Goal: Fewer mismatching value/reference pairs exist.
We want to measure how many tainted references are triggered, and how many of these are being cleaned.
To have comparable figures, we need baseline values for the period before the new feature is enabled (the baseline does not exist yet).
In the first iteration we only need to look at the next edit by the same author, making the data simpler, but we might want to extend this later.
Goal: Triggered mismatches do get cleaned up and don’t pile up.
We want to measure how many of the tainted references that have been triggered are eventually cleaned, and how long it takes.
Again, we would need to compare with a baseline. This metric is related to the previous one, at least conceptually; technically, the two might be measured completely separately.
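The two figures named above could be computed along these lines. This is only a sketch: the record layout (`triggered_at`, `cleaned_at`) and the function name `cleanup_metrics` are hypothetical, and it assumes "how long" refers to the time between the triggering edit and the cleaning edit.

```python
from datetime import datetime
from statistics import median


def cleanup_metrics(events):
    """Summarize triggered vs. cleaned tainted references.

    `events` is a hypothetical list of dicts, one per triggered tainted
    reference, with a `triggered_at` datetime and a `cleaned_at` datetime
    (or None if the reference has not been cleaned yet).
    """
    triggered = len(events)
    # Seconds between triggering and cleaning, for cleaned references only.
    durations = [
        (e["cleaned_at"] - e["triggered_at"]).total_seconds()
        for e in events
        if e["cleaned_at"] is not None
    ]
    return {
        "triggered": triggered,
        "cleaned": len(durations),
        "cleaned_share": len(durations) / triggered if triggered else None,
        "median_seconds_to_clean": median(durations) if durations else None,
    }
```

Running the same function over the period before and after the feature is enabled would give the baseline and the comparison figures in the same shape.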
- Wikibase does not help much with identifying triggering and cleaning edits.
- Edits (MediaWiki revisions) changing a statement in any way (without much detail on what has changed: value, reference, qualifier, or a combination of these) could be filtered by considering only revisions whose comment field contains a value of the format /* wbsetclaim-update:N||N */ [[Property:PNNN]]: XYZ, where N, NNN, and XYZ are actual numbers/values.
- Further reasoning about what the edit changed might only be possible by inspecting the change made by the edit (revision), i.e. comparing the JSON object representation of the item before and after the edit.
- For identifying revisions (edits) changing the same statement (e.g. to be able to recognize whether the tainted reference has been cleaned), relying on the statement's unique ID might help. It will still likely involve analyzing the JSON structure of the item data, as the identifier of the statement is not exposed in the comment or any other field.
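The approach outlined in the points above could be sketched as follows. This is a rough illustration, not an agreed implementation: the function names and the labels `value_only`, `reference_changed`, and `other` are hypothetical, and the code assumes the item JSON of the two revisions has already been fetched and follows the usual Wikibase layout (`claims` keyed by property, each statement carrying `id`, `mainsnak`, `qualifiers`, and `references`). Detecting reverts is not covered.

```python
import re

# Comment format described above: /* wbsetclaim-update:N||N */ [[Property:PNNN]]: XYZ
COMMENT_RE = re.compile(
    r"^/\* wbsetclaim-update:\d+\|\|\d+ \*/ \[\[Property:(P\d+)\]\]: "
)


def is_statement_update(comment):
    """True if the revision comment looks like a statement update."""
    return COMMENT_RE.match(comment) is not None


def statements_by_id(item):
    """Index all statements of an item JSON object by their unique ID."""
    index = {}
    for statements in item.get("claims", {}).values():
        for statement in statements:
            index[statement["id"]] = statement
    return index


def classify_change(before, after):
    """Label each statement that differs between two revisions of an item.

    Only statements present in both revisions are considered.
    "value_only" marks a candidate triggering edit; "reference_changed"
    marks a candidate cleaning edit (reference changed or removed).
    """
    old, new = statements_by_id(before), statements_by_id(after)
    labels = {}
    for sid in old.keys() & new.keys():
        o, n = old[sid], new[sid]
        value_changed = o.get("mainsnak") != n.get("mainsnak")
        refs_changed = o.get("references", []) != n.get("references", [])
        quals_changed = o.get("qualifiers", {}) != n.get("qualifiers", {})
        if not (value_changed or refs_changed or quals_changed):
            continue  # statement untouched by this edit
        if value_changed and not refs_changed and not quals_changed:
            labels[sid] = "value_only"
        elif refs_changed:
            labels[sid] = "reference_changed"
        else:
            labels[sid] = "other"
    return labels
```

Because the labels are keyed by statement ID, a "reference_changed" result in a later revision can be matched against an earlier "value_only" result for the same statement to decide whether a tainted reference was cleaned.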