
Create the heuristic that will [initially] trigger the reference check
Closed, Resolved (Public)

Description

This task involves the work of defining what logic will initially cause the reference check to be activated and presented to people within the visual editor.

Heuristic Requirements

This section will eventually contain the rules that comprise the initial reference check heuristic.
To start, the initial Edit Check heuristic will be initiated/activated when the change someone is attempting to make meets all of these conditions:

  1. A minimum of one new paragraph of text is added to the article they are editing
    • Where "new" in this context means the content they are adding does NOT already exist elsewhere within the article.
  2. The "new paragraph(s) of text" they are adding does NOT include a reference
  3. The changes described in "1." and "2." are happening on a page within the main namespace (NS:0)
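The three conditions above can be sketched as a single predicate. This is only an illustration of the requirements as written in this task, not the VisualEditor implementation: the `AddedParagraph` record, the function names, and the choice to treat "already exists elsewhere" as an exact match after whitespace normalization are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Sequence

MAIN_NAMESPACE = 0  # NS:0, per condition 3


@dataclass
class AddedParagraph:
    """Hypothetical record describing one paragraph added by an edit."""
    text: str
    has_reference: bool


def _normalize(text: str) -> str:
    # Collapse whitespace so trivial spacing differences do not count as "new".
    return " ".join(text.split())


def is_new(paragraph: AddedParagraph, existing_paragraphs: Sequence[str]) -> bool:
    """Condition 1: the text does not already exist elsewhere in the article.

    Assumption: "already exists" means an exact paragraph match after
    whitespace normalization.
    """
    normalized = _normalize(paragraph.text)
    if not normalized:
        return False  # blank paragraphs are not content additions
    return all(normalized != _normalize(p) for p in existing_paragraphs)


def should_trigger_reference_check(
    namespace: int,
    added: Sequence[AddedParagraph],
    existing_paragraphs: Sequence[str],
) -> bool:
    """Return True only when all three conditions hold."""
    if namespace != MAIN_NAMESPACE:  # condition 3
        return False
    return any(
        is_new(p, existing_paragraphs) and not p.has_reference  # conditions 1 and 2
        for p in added
    )
```

For example, a new unreferenced paragraph added in NS:0 would trigger the check, while the same paragraph with a reference, or the same edit in another namespace, would not.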

Done

  • The "Heuristic Requirements" are implemented
  • @ppelberg finds a more permanent and prominent place to store the "Research" documented below

Research

This section is a gathering place for rules we will consider including as part of the initial Heuristic Requirements.

NOTE: both sub-sections below assume that a yet-to-be-defined threshold of new text content has been surpassed.
What factors, unrelated to the edit at-hand, should influence whether reference check is triggered?

E.g. namespace edit being made within, the account status of the person making the edit (logged in/out), experience level of person editing (number of published edits).

We'll address this question separately in T329984

What qualifies as a "content addition" in this context?

See more in T324730#8576570.

Facet | Value | Notes
New sentence(s) added | TBD | Depends on the outcome of T324363.
New characters added | TBD |
New paragraph added | TBD |
What content additions do require a reference and therefore should cause the reference check to be triggered?
Case | Supporting policy | Volunteer contact(s)
New content added to an existing paragraph that contains a {{citation needed}} template | | en:User:Phlsph7
New content added to an existing paragraph after a reference that's already been included | | en:User:Phlsph7
New content added via reference template | | en:User:Phlsph7
New paragraph added without a citation included within it | WP:Verifiability (en.wiki, fr.wiki) | @Xaosflux (see: Topic:Xdeybsgo5z9dvafh)
What content additions do NOT require a reference and therefore should not cause the reference check to be triggered?

See more in T326856#8550215 and T326856#8601968.

Case | Supporting policy | Volunteer contact(s)
Inserting blank lines | |
Adding an image(s) + caption | WP:ORIMAGE | @Sdkb, @Elmidae, en:User:Phlsph7
Adding content to a cell within an existing table | | en:User:Phlsph7
Adding new content that is accompanied by a reference | |
Adding a template/adding new content to an existing template (e.g. infobox) | | en:User:Phlsph7
Adding new content to the lead section of an article | en:WP:LEADCITE, fr:Wikipédia:Résumé_introductif | @Sdkb, @Elmidae
Adding new content to plot sections | en:WP:PLOTCITE | @Sdkb
Edits being made to disambiguation pages | en:MOS:DABPAGES | @Sdkb
Edits being made to ==External links== sections | en:WP:ELRC | @Sdkb
Edits to short descriptions | Wikipedia:Short_description#Content (Note: policy does not seem to make explicit mention of excluding references.) | @Elmidae
"Gnoming" (read: punctuation & capitalization changes, wikilinking & piping, formatting, adding external links, grammar + spelling fixes adding a nominal number of new characters) | | @Elmidae
Edits made to ==Further reading== sections | en:Wikipedia:Further reading | en:User:Phlsph7
Edits made to ==See also== sections | en:WP:MoS/Layout#see also | en:User:Phlsph7
Edits made to ==References== sections | en:Wikipedia:MoS/Layout#Notes | en:User:Phlsph7
Edits made to ==Bibliography== sections | en:Wikipedia:MoS/Layout#Works or publications | en:User:Phlsph7
Edits made to ==Notes== sections | | en:User:Phlsph7
Edits made to ==Selected Publications== sections | | en:User:Phlsph7
Edits made to ==Selected Works== sections | | en:User:Phlsph7
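
Several of the cases above are identified purely by section heading, so one possible way to express them is a lookup against a set of exempt headings. The sketch below is an assumption-laden illustration: the set name, the function names, and the decision to match headings case-insensitively are not from this task, and cases that lack a fixed heading (lead section, plot sections, tables, templates) are deliberately left out.

```python
# Headings taken from the research table; matching is case-insensitive
# (an assumption, since the table does not say how headings are compared).
EXEMPT_SECTION_HEADINGS = frozenset({
    "external links",
    "further reading",
    "see also",
    "references",
    "bibliography",
    "notes",
    "selected publications",
    "selected works",
})


def normalize_heading(heading: str) -> str:
    """Strip wikitext '=' markers and surrounding whitespace, then casefold."""
    return heading.strip().strip("=").strip().casefold()


def is_exempt_section(heading: str) -> bool:
    """Return True when edits under this heading should not trigger the check."""
    return normalize_heading(heading) in EXEMPT_SECTION_HEADINGS
```

A per-project list would likely be needed in practice, since heading names differ between wikis and languages.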

Event Timeline

ppelberg moved this task from Untriaged to Upcoming on the Editing-team board.
ppelberg renamed this task from [SPIKE] Create the heuristic that will [initially] trigger the reference check to Create the heuristic that will [initially] trigger the reference check. Dec 8 2022, 5:16 AM
VPuffetMichel renamed this task from Create the heuristic that will [initially] trigger the reference check to [edit check] Create the heuristic that will [initially] trigger the reference check. Dec 8 2022, 1:36 PM

Meeting notes from today:

  • User added a run of N characters (N=~50?)
  • Or user created a new sentence (if we have sentence detection)
    • Also of some minimum length?
    • @cscott: might be easier to make it “two sentences”; ie if sentence break iterator detected two sentences, then we know there was at least one complete sentence. (ie distinguishes adding “foo.” to the end of an existing sentence from adding “foo. And another sentence.”)
  • Filters:
    • There is not a reference in the added content, or even nearby?
    • The paragraph is not top level (exclude captions, table cells, list items?)
    • The added content doesn’t have references :D (or isn’t in a reference)
    • The added content is/isn’t a distinct word (i.e. whether it’s a change to the spelling of a word)
  • External factors (not part of the edit)
    • User is "new" (<100 edits?)
    • Namespace is 0 (article)
  • Implementation:
    • @cscott: If this was a service (input: diff, output: this needs a citation), then other teams could iterate on it (e.g. with machine learning)
      • @Esanders: Unless we do it all in the front end
      • @cscott: concerned that this will be a bikeshed that individual editor communities will very much want to paint; useful to try to keep this at arms length from the rest of the codebase to allow this tweaking w/o disrupting everything else
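
The content threshold discussed in these notes (a run of N characters, or two detected sentences) could look roughly like the sketch below. The regular expression is a naive stand-in for a real sentence break iterator, `MIN_RUN_LENGTH = 50` reflects the tentative "N=~50?" from the notes rather than a decision, and the function names are made up for this example.

```python
import re

MIN_RUN_LENGTH = 50  # tentative value from the notes ("N=~50?"), not a decision

# Naive sentence-end detector: ., ! or ? followed by whitespace or end of text.
# A real implementation would use a proper sentence break iterator; this one
# miscounts abbreviations such as "e.g." followed by a space.
_SENTENCE_END = re.compile(r"[.!?]+(?=\s|$)")


def count_sentence_breaks(added_text: str) -> int:
    """Count sentence terminators found in the added text."""
    return len(_SENTENCE_END.findall(added_text))


def meets_content_threshold(added_text: str) -> bool:
    """True if the addition is long enough, or contains two sentence breaks.

    Two breaks imply at least one complete sentence was added, which
    separates adding "foo." to the end of an existing sentence from adding
    "foo. And another sentence."
    """
    text = added_text.strip()
    return len(text) >= MIN_RUN_LENGTH or count_sentence_breaks(text) >= 2
```

Keeping this in one small function is in the spirit of the "arm's length" suggestion above: the threshold can be tuned without touching the rest of the check.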

@cscott: can you please say a bit more about what "bikeshed painting" in this context would look like? E.g. volunteers wanting to tweak facets like the Filters @matmarex documented in T324730#8576570?

...I ask the above because we would like to empower experienced volunteers to be able to configure checks in ways that afford them enough flexibility to "express" the policies/conventions the projects they are a part of have agreed upon.

ppelberg renamed this task from [edit check] Create the heuristic that will [initially] trigger the reference check to Create the heuristic that will [initially] trigger the reference check. Feb 6 2023, 5:45 PM
ppelberg updated the task description.

About external links:

Generally speaking, if the content in question contains a URL (i.e., to a non-WMF site), then it's probably not appropriate to suggest adding a reference. Either that content shouldn't have a reference at all (e.g., external links, lists of publications) or it already does,[1] and the problem is with its formatting, rather than with its existence.

[1] https://en.wikipedia.org/wiki/Wikipedia:Inline_citation#Hyperlinking/embedded_links
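
The external-link rule of thumb above could be approximated by scanning the added text for URLs whose host is not a Wikimedia site. This is a rough sketch only: `WMF_DOMAINS` is an illustrative and incomplete list chosen for the example, the URL pattern is deliberately simple, and the function names are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Illustrative, incomplete list of Wikimedia-hosted domains (an assumption).
WMF_DOMAINS = ("wikipedia.org", "wikimedia.org", "wikidata.org", "mediawiki.org")

# Simple URL matcher; stops at whitespace, brackets, angle brackets and quotes.
_URL = re.compile(r"https?://[^\s\]\[<>\"]+")


def is_wmf_host(host: str) -> bool:
    """True if the host is one of the listed domains or a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in WMF_DOMAINS)


def contains_external_url(added_text: str) -> bool:
    """True if the text contains at least one URL pointing to a non-WMF site."""
    for url in _URL.findall(added_text):
        host = urlsplit(url).hostname or ""
        if host and not is_wmf_host(host):
            return True
    return False
```

Under the suggestion above, an addition for which `contains_external_url` returns True would be skipped by the reference check.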

To start, the initial Edit Check heuristic will be relatively straightforward: it will prompt people to decide whether the change they are making warrants a reference when that change meets both of these conditions:

  1. A minimum of one new paragraph of text is added to the article they are editing
    • Where "new" in this context means the content they are adding does NOT already exist elsewhere within the article.
  2. The "new paragraph(s) of text" they are adding does NOT include a reference

The above is informed by the decision the Editing Team made to pursue an initial approach to Edit Check that minimizes the likelihood of false positives and assumes this heuristic will evolve to become more robust and complex over time. [i]


i. T329988#8654867

ppelberg updated the task description.

Description
This task involves the work of defining initially trigger the reference check to be activated and presented to people.

I don't understand this description. Defining what?

I think there might be a small typo, but as I understand it, the plain-English version of this task is, "we need to understand what conditions have to be met in order for the edit check dialogue to be activated."

@Sdkb what you described is accurate; thank you for clarifying.

...I'm sorry for the confusion and I appreciate you naming it, @Mathglot. I've since updated the task description to hopefully make it more clear.

"within the main namespace" should be within the configured namespaces. Surely ns:0 would be useful, but there are others such as enwiki's ns:118 (Draft:) which are also used for composing content.

Very good point, @Xaosflux. The draft and user namespaces are very often where newcomers write articles, and we'll absolutely want the edit checks to come up when that's happening. Userspace may present a challenge, since sometimes it's just used to create one's userpage or to keep track of tasks, which are not scenarios where we want to prompt people to add references.

Great spot, @Xaosflux. I agree that the namespaces in which Edit Check is available ought to be something projects can configure.

I've updated T330807 (the ticket we're using to define configurability requirements) to include the above. See: T330807#8665105.
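
To make the "configured namespaces" idea concrete, here is one way a per-project setting might be modelled. Everything in it is hypothetical: the setting names and the per-wiki mapping are invented for illustration, and the actual configurability requirements are the ones tracked in T330807. The namespace IDs are the ones mentioned in this discussion (0 for main, 118 for enwiki's Draft:).

```python
from typing import Dict, FrozenSet

# Default: only the main namespace (NS:0), matching the initial requirement.
DEFAULT_NAMESPACES: FrozenSet[int] = frozenset({0})

# Hypothetical per-wiki overrides; enwiki adds its Draft: namespace (118).
PER_WIKI_NAMESPACES: Dict[str, FrozenSet[int]] = {
    "enwiki": frozenset({0, 118}),
}


def check_enabled_in_namespace(wiki: str, namespace: int) -> bool:
    """True if the reference check should run in this namespace on this wiki."""
    allowed = PER_WIKI_NAMESPACES.get(wiki, DEFAULT_NAMESPACES)
    return namespace in allowed
```

The user namespace is intentionally absent from the example, given the concern raised above that userspace pages are often not article drafts.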

Thank you for sharing this additional context, @Sdkb. I've included it as a reference to the updated requirements in T330807.

Change 900356 had a related patch set uploaded (by Bartosz Dziewoński; author: Esanders):

[mediawiki/extensions/VisualEditor@master] Initial edit check tagging

https://gerrit.wikimedia.org/r/900356

Change 900356 merged by jenkins-bot:

[mediawiki/extensions/VisualEditor@master] Initial edit check tagging

https://gerrit.wikimedia.org/r/900356

Change 903604 had a related patch set uploaded (by Bartosz Dziewoński; author: Esanders):

[mediawiki/extensions/VisualEditor@master] Only run edit check on main namespace

https://gerrit.wikimedia.org/r/903604

Change 903604 merged by jenkins-bot:

[mediawiki/extensions/VisualEditor@master] Only run edit check on main namespace

https://gerrit.wikimedia.org/r/903604

We're closing this ticket as refinements to the initial heuristic that this ticket implemented are happening in T340086.

ppelberg claimed this task.
NOTE: we did NOT end up implementing the "new paragraph" requirement this task described. Instead, this work will happen in T345121.