[SPIKE] Evaluate efficacy of reference check detection heuristic by trialing it on a corpus of diffs
Open, Needs Triage · Public · Spike

Description

In T324730, we will define the initial heuristic that will determine the conditions under which the initial reference check will be activated (read: presented to people).

This task involves running the heuristic that T324730 will introduce on a corpus of diffs so that we can evaluate the extent to which the heuristic is triggered in the cases where we think experienced volunteers will expect it to be.

Evaluation Methods

Method | Relevant tickets (if any) | Notes
Editing Team internal review | Edit Check Heuristic Review |
WMF Community ambassadors review | Growth Team Copyedit Review, Growth Team Copyedit Review (results), Growth Team Image Suggestion Review |
Wikipedia volunteers review | |
WMF Research Team | |

Event Timeline

Restricted Application changed the subtype of this task from "Task" to "Spike". · Jan 3 2023, 4:39 PM

Meeting notes from today:

  • Difficulty depends on what level our service operates at (VE DM / HTML / wikitext?)
  • @matmarex: We want to use VE transactions in the “real” interface, which is a completely different format from wikitext diffs. Can we convert between them, or to another common format?
  • @Esanders: If we had a service that converted diffs to plain text, one should be able to do a decent job of this in VE as well as wikitext (not 100%, but close enough to test the heuristic?)
  • @cscott: Don't have to use historic diffs, could sample live edits as they happen and log the result
  • @matmarex: Can we load two revisions in VE and compute the transaction that would be needed to reach B from A? (@Esanders: visual diff is kind of like this)
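
One way to picture the "convert diffs to plain text" idea from the notes above is the rough sketch below. It assumes both revisions have already been reduced to plain text (how that happens for VE DM, HTML, or wikitext is exactly the open question) and uses Python's standard difflib to list what was inserted, deleted, or replaced at the word level. The function name `plain_text_changes` is hypothetical and this is not how VE or any existing service computes diffs.

```python
import difflib


def plain_text_changes(old: str, new: str) -> list[tuple[str, str, str]]:
    """Return (operation, removed_text, added_text) for each changed run.

    Assumption: `old` and `new` are plain-text renderings of two
    revisions. Comparison is word-based, so whitespace-only changes
    are ignored.
    """
    old_words = old.split()
    new_words = new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)

    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged text is irrelevant to the heuristic
        removed = " ".join(old_words[i1:i2])
        added = " ".join(new_words[j1:j2])
        changes.append((tag, removed, added))
    return changes


if __name__ == "__main__":
    before = "The cat sat on the mat"
    after = "The cat sat upon the mat"
    for change in plain_text_changes(before, after):
        print(change)
```

A "replace" entry pairs removed and added text at the same position, which is roughly the signal a heuristic would need in order to tell a copy edit apart from new content. A plain-text view like this drops structure such as captions, tables, and lists, so it would only approximate what VE transactions expose.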

The key open question I see in the above is the following: On what "raw material" (there's likely a more descriptive term than this) will we run the heuristic we will have defined in T324730?

Assuming the question I named is indeed the question y'all think we ought to be allocating our focus to answering, what – if any – additional information do you think we need before we can start answering it?

I'm starting to comb through diffs tagged with #editcheck-references and logging the results in this Google Sheet: Edit Check Heuristic Review

Next up

Editing Engineering to review Edit Check Heuristic Review and share what – if anything – about the information we are asking for needs to be revised before inviting volunteers at the French and English Wikipedias to start reviewing diffs and inputting what they find in this spreadsheet.

On 15 June, the Editing Team met to review a sample of edits that the editcheck-references tag identified as ones where people should be prompted to consider "bolstering" the new content with a reference.

The refinements we converged on making to the heuristic are listed below and will be implemented in T340086.

15 June Heuristic Meeting Outcomes

  1. Do not trigger Edit Check or append the editcheck-references tag when an edit removes content that is adjacent to the new content being added.
    • Rationale: avoid Edit Check activating for copy edits.
  2. Do not consider changes to image captions as warranting Edit Check being activated.
    • Rationale: assuming policy does not require image captions to be accompanied by a reference.
  3. Do not consider changes to tables as warranting Edit Check being activated.
    • Rationale: assuming policy does not require changes to tables to be accompanied by a reference.
  4. Do not consider changes to lists as warranting Edit Check being activated.
    • Rationale: assuming policy does not require changes to lists to be accompanied by a reference.
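
The four outcomes above can be read as exclusion filters applied before the base heuristic. The sketch below is illustrative only: the `Change` record, its field names, and the context labels are hypothetical, and the base heuristic from T324730 is passed in as a function because its conditions are not spelled out here. The actual implementation is tracked in T340086.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical labels for where a change sits in the page.
EXCLUDED_CONTEXTS = frozenset({"image_caption", "table", "list"})


@dataclass(frozen=True)
class Change:
    """One contiguous piece of added content (hypothetical model)."""
    added_text: str
    context: str            # e.g. "paragraph", "image_caption", "table", "list"
    adjacent_removal: bool  # True if content was removed next to the addition


def passes_exclusions(change: Change) -> bool:
    """Apply the 15 June refinements (outcomes 1 to 4)."""
    # Outcome 1: a removal next to an addition looks like a copy edit.
    if change.adjacent_removal:
        return False
    # Outcomes 2 to 4: captions, tables, and lists are out of scope.
    if change.context in EXCLUDED_CONTEXTS:
        return False
    return True


def should_trigger(change: Change, base_heuristic: Callable[[Change], bool]) -> bool:
    """Trigger only if the change survives the exclusions and the
    base heuristic (assumed to come from T324730) also says yes."""
    return passes_exclusions(change) and base_heuristic(change)


if __name__ == "__main__":
    always = lambda change: True  # placeholder, not the real heuristic
    new_paragraph = Change("A new claim.", "paragraph", adjacent_removal=False)
    copy_edit = Change("upon", "paragraph", adjacent_removal=True)
    print(should_trigger(new_paragraph, always))  # True
    print(should_trigger(copy_edit, always))      # False
```

Keeping the exclusions separate from the base heuristic means either can be revised after further review rounds without touching the other.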