Refine Edit Check reference heuristic
Closed, Resolved (Public)

Description

In T324734, we conducted an initial review of the Edit Check reference heuristic that T324730 introduced.

This task covers initial adjustments to that heuristic. The goal is to reduce the likelihood that Edit Check invites newcomers to add a reference to their edit when a reference might not be necessary (read: refinements to reduce the likelihood of false positives).

Requirements

NOTE: the changes below ought to apply both to the logic Edit Check uses to determine whether it ought to be activated within a given edit and to the logic that determines whether the editcheck-references tag is appended to an edit.
| Adjustment | Objective | Adjustment required |
| --- | --- | --- |
| 1. | Decrease the likelihood that the reference Edit Check appears when people are attempting to publish copy edits | Do not trigger Edit Check or append the editcheck-references tag when an edit involves content being removed that is adjacent to new content being added |
| 2. | Decrease the likelihood that the reference Edit Check appears when people are attempting to publish changes to image captions | Do not consider changes to image captions as warranting Edit Check being activated |
| 3. | Decrease the likelihood that the reference Edit Check appears when people are attempting to publish edits to tables | Do not consider changes to tables as warranting Edit Check being activated |
| 4. | Decrease the likelihood that the reference Edit Check appears when people are attempting to publish edits to lists | Do not consider changes to lists as warranting Edit Check being activated |

Approach

To meet the Requirements described above, we are going to adjust the heuristic such that it only responds to changes to paragraphs that are at the top level of the document. This means any content that is wrapped in any other structure (e.g. tables, lists, and image captions) will be considered outside of Edit Check's scope.
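The filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Node` class, its `type` strings, and the `is_in_scope` function are hypothetical and do not reflect VisualEditor's actual data model or API.

```python
from typing import List, Optional


class Node:
    """Hypothetical, simplified document node (not VisualEditor's real model)."""

    def __init__(self, type_: str) -> None:
        self.type = type_
        self.parent: Optional["Node"] = None
        self.children: List["Node"] = []

    def add(self, child: "Node") -> "Node":
        """Attach a child node and return it so calls can be chained."""
        child.parent = self
        self.children.append(child)
        return child


def is_in_scope(node: Node) -> bool:
    """True only for paragraphs whose direct parent is the document root."""
    return (
        node.type == "paragraph"
        and node.parent is not None
        and node.parent.type == "document"
    )


# Build a small document: one top-level paragraph plus paragraphs wrapped
# in a table cell, a list item, and an image caption.
doc = Node("document")
top_paragraph = doc.add(Node("paragraph"))
cell_paragraph = doc.add(Node("table")).add(Node("tableCell")).add(Node("paragraph"))
item_paragraph = doc.add(Node("list")).add(Node("listItem")).add(Node("paragraph"))
caption_paragraph = doc.add(Node("image")).add(Node("caption")).add(Node("paragraph"))

for name, node in [
    ("top-level paragraph", top_paragraph),
    ("table cell paragraph", cell_paragraph),
    ("list item paragraph", item_paragraph),
    ("image caption paragraph", caption_paragraph),
]:
    print(name, "->", is_in_scope(node))
```

In this sketch only the top-level paragraph is in scope. Checking the parent rather than listing excluded node types means any other wrapping structure is also excluded without needing its own rule.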

Event Timeline

ppelberg moved this task from Inbox to Ready to Be Worked On on the Editing-team (Kanban Board) board.
ppelberg moved this task from Backlog to Triaged on the EditCheck board.
ppelberg moved this task from To Triage to Triaged on the VisualEditor board.

During today's team meeting, we decided to move forward with the approach described here:
Rather than excluding specific content types/entities, we're going to adjust the heuristic such that it only responds to changes to paragraphs that are at the top level of the document. This means any content that is wrapped in any other structure (e.g. tables, lists, and image captions) will be considered outside of Edit Check's scope.

Change 934663 had a related patch set uploaded (by Esanders; author: Esanders):

[mediawiki/extensions/VisualEditor@master] EditCheck: Exclude nodes that aren't at the document root (i.e. image captions, table cells)

https://gerrit.wikimedia.org/r/934663

Change 934663 merged by jenkins-bot:

[mediawiki/extensions/VisualEditor@master] EditCheck: Exclude nodes that aren't at the document root (i.e. image captions, table cells)

https://gerrit.wikimedia.org/r/934663

Ryasmeen subscribed.

@ppelberg: Edit Check is still being triggered when an edit involves content being removed that is adjacent to new content being added.

It also gets triggered when an edit involves content being added that is adjacent to new content being added. I'm not sure whether this is also something we are thinking of removing from our initial heuristic.

ppelberg added a subscriber: Esanders.

Great spots, @Ryasmeen.

Per adjustment "1." in the task description, Edit Check should not be activated when an edit removes content adjacent to newly added content, so this behavior is unexpected.

I'm going to assign this back over to @Esanders to have a look.

Rummana, in the meantime, can you please share a link to a diff that demonstrates the unexpected behavior described above?

Sure.
https://patchdemo.wmflabs.org/wikis/9a60b4369e/w/index.php?title=Douglas_Adams&diff=194&oldid=193

In that edit the removal is not adjacent to the insertion: there is a retained full stop / period between them. As we are about to start filtering to only new paragraphs, we can stop worrying about this issue for now.
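
The distinction between an adjacent and a non-adjacent removal can be illustrated with a small sketch. It assumes a hypothetical, simplified list of edit operations (`retain`, `remove`, `insert`). This is not VisualEditor's transaction format, and `removal_adjacent_to_insertion` is an illustrative name, not an existing function.

```python
from typing import List, Tuple

# Hypothetical, simplified edit operations (not VisualEditor's real format):
# ("retain", text) keeps text, ("remove", text) deletes it,
# ("insert", text) adds it.
Op = Tuple[str, str]


def removal_adjacent_to_insertion(ops: List[Op]) -> bool:
    """True if a removal and an insertion touch, with no retained text between."""
    kinds = [kind for kind, text in ops if text]  # skip empty operations
    return any(
        {first, second} == {"remove", "insert"}
        for first, second in zip(kinds, kinds[1:])
    )


# Copy edit: a misspelled word is replaced in place, so the removal and
# the insertion are adjacent.
copy_edit = [
    ("retain", "He "),
    ("remove", "recieved"),
    ("insert", "received"),
    ("retain", " an award."),
]

# A retained full stop sits between the removal and the insertion,
# so they are not adjacent.
separated = [
    ("retain", "He wrote novels"),
    ("remove", " and scripts"),
    ("retain", "."),
    ("insert", " He also wrote for radio."),
]

print(removal_adjacent_to_insertion(copy_edit))   # True
print(removal_adjacent_to_insertion(separated))   # False
```

Under this strict definition a single retained character is enough to make the removal and insertion non-adjacent, which matches the reading given in the comment above.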