Implement reference check for on-wiki configuration
Closed, Resolved (Public)

Description

This task involves the work of making it possible for experienced volunteers to configure the initial reference check on-wiki, on a per project basis.

Requirements

Meta

  • Each facet listed in the "Configurability" section below ought to be editable:
    • On-wiki
    • On a per-project basis
    • In a single JSON file
    • Located within the MediaWiki namespace

Configurability
Wikis, on a per-project basis, ought to be able to configure the following facets of the reference check experience:

| ID | Configurable "facet" | Potential value(s) | Default value | Implementation | Notes | Status |
|---|---|---|---|---|---|---|
| 1. | Account state: what account states Edit Check is available for | Logged in, Logged out, Logged in & out | Logged in & out | T330112 (this ticket) | | 🟢 Implemented |
| 2. | Experience level: what experience levels Edit Check is available for | Greater or less than an integer | <100 edits | T330112 (this ticket) | Edit count here refers to the cumulative number of edits an account has made at the project they are editing | 🟢 Implemented |
| 3. | Sections: what sections the reference check is not available within | The names/titles of sections Edit Check should not be activated within | Empty | T346949 | E.g. en.wiki's LEADCITE policy suggests citations should not be included in the lead section of an article. | Not yet implemented |
| 4. | Categories: what categories an Edit Check is not available within | The names/links of categories a page would need to exist within for Edit Check not to be activated | Empty | T347775 | | Not yet implemented |
| 5. | Citation placement: define where a citation is automatically placed: before or after a period (.) | Before period or after period | After period | T344962 | | 🟢 Implemented |
| 6. | Character count: define the number of net new characters that need to be added for Edit Check to be activated | Integer | 50 | T330112 (this ticket) | | 🟢 Implemented |
NOTE: the above are borrowed from the "Use Cases" section of T327959.
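
The implemented facets and their defaults in the table above can be read as a small data model. The Python sketch below is only an illustration of that reading. The class name, the field names, and the `applies_to` helper are hypothetical and are not taken from the VisualEditor extension; only the default values (logged in & out, fewer than 100 edits, 50 characters, citation after the period) come from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReferenceCheckConfig:
    """Hypothetical model of facets 1, 2, 5 and 6 from the table."""

    # Facet 1: None means "logged in & out"; otherwise "loggedin" or "loggedout".
    account: Optional[str] = None
    # Facet 2: the check is offered to accounts with fewer edits than this.
    maximum_edit_count: int = 100
    # Facet 5: False places the citation after the period (the default).
    before_period: bool = False
    # Facet 6: net new characters required before the check activates.
    minimum_characters: int = 50


def applies_to(config, *, logged_in, edit_count, added_characters):
    """Return True if the reference check should be offered for this edit."""
    if config.account == "loggedin" and not logged_in:
        return False
    if config.account == "loggedout" and logged_in:
        return False
    if edit_count >= config.maximum_edit_count:
        return False
    return added_characters >= config.minimum_characters
```

With the defaults, an account with 5 edits adding 50 characters would be offered the check, while an account with 100 or more edits would not.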

Help Documentation

https://www.mediawiki.org/wiki/Edit_check/Configuration

Testing instructions

  1. Visit: https://en.wikipedia.beta.wmflabs.org/wiki/MediaWiki:Editcheck-config.json
  2. Verify configurable facets "1.", "2.", "5." and "6." are shown
  3. Adjust each of the values named in "2."
  4. Verify those adjustments have the impact you intended.

See: https://www.mediawiki.org/wiki/Edit_check/Configuration for additional documentation.

NOTE: the above requires you to have admin rights on the Beta Cluster.

Done

  • All "Decisions to be made" are addressed and documented
  • Requirements are implemented
  • Members of the Growth Team review the technical implementation to ensure the approach we're taking to start will not preclude our ability to migrate to Community Configuration 2.0 (T323811) in the future.

References

Currently, wikis use a variety of "languages"/"formats" for on-wiki scripting/coding:

  • Lua
  • JavaScript
  • Regex
  • Template scripting

Decision(s) to be made

  • 1. In what language/format will the heuristics that Edit Check depends on be implemented on-wiki, so that experienced volunteers are able to audit and iterate upon them?

At the outset, all configurable facets of Edit Check will live within a single JSON "object."

As new checks are introduced, we anticipate expanding this "single file" to accommodate the new checks that are introduced.

See T330112#9094862 for more context.

  • 2. How will on-wiki configuration be implemented in ways such that it decreases the likelihood that people use it as a vector for exclusion? E.g. Here we're imagining a scenario where the people who are most actively configuring Edit Check are people who are motivated to minimize their workload.
    • To start, we're going to depend on wikis' existing social and technical conventions to hold people accountable to configuring Edit Check in ways that align with Wikipedia's objectives and the Movement's mission and values.

This task was inspired by @Mathglot who raised this question in T327330#8607428

Event Timeline

In T327330#8607428, @Mathglot wrote:
If you consider this approach, which I very much hope you do, then a major decision is what kind of format would the "rules" that live at en-wiki take? Lua? Javascript? Regex? Template scripting? Or some home-brew descriptor that you come up with, which specifies "when X happens, then do Y"? Conceptually, this places the "if... then..." as your responsibility as part of the API definition, and the "X" and "Y" as our responsibility. The simpler the format, the better, but you also want it to be able to cover all the reasonable situations. If you defined it based on template language (parser functions and so on), then you have a built in community of editors at Wikipedia with an associated privilege, so that might be an easy approach.

I appreciate you naming the decision that will need to be made around how these checks will be implemented to ensure volunteers are equipped to edit and extend them.

I've filed this ticket (T330112) to hold us accountable to thinking through and making an informed decision about implementation.

In T327330#8607428, @Mathglot wrote:
So this might mean, for example, that each rule would be implemented as a template (or possibly, a subtemplate of "Template:EditCheck", to keep them all in one place, kind of like how AWB does it). You write the initial template, the starter set of subtemplates, the doc, and let us take care of the rest.

@Mathglot are you able to share a link that you think could help me to understand what you mean when you say "how AWB does it"?

I assume you're referring to en:Wikipedia:AutoWikiBrowser. Although, I wonder whether there was a specific process/sub-page you had in mind when referring to it above.

Yes, sorry, I did mean AutoWikiBrowser. That particular comment was almost a throw-away line, and not at all important to the scheme I was sketching. The only thing I meant about AWB (which I don't use) is that it keeps all its rules in one place (or, a few big places) in the form of a file that contains hundreds of regexes for fixing typos. That's all I meant by "how AWB does it", and the only import of that statement was as a metaphor for how an implementation via templates might work, by possibly creating a template called Template:EditCheck, under which dozens of subtemplates might live, each holding a structure for one edit check task, so, also "keeps it all in one place, like AWB does it".

Since we're talking about it, I'm not sure if regexes (which I'm very familiar with) would be up to the task for everything you'd like to do. They're great for pattern matching and replacement, but we'd have to go through the edit tasks and see if and how they would be applicable. I suspect they're not well suited to it.

As far as decisions/format Q #1 (which language), there's no particular technical reason to limit it to one format or language ultimately, and as we have all of the languages named above (and more; VE uses JSON to talk to templates) an API could have implementations in multiple languages (although I can envision resource/bandwidth/biz reasons to limit it, in which case "which language" should be considered more carefully).

That earlier message was a bit of an initial brainstorm, and it's hard to brainstorm on a ticket (or maybe I'm just not used to it). I'm sure you'd get a lot more eyeballs and good feedback if you posted at the Village pump idea lab, which attracts a lot of creative minds and technical people. An alternative is WP:VPR for proposals, which is for when something is more of a proposal than an idea; sounds like Edit check is a bit of both, maybe? Or on one of your mw pages about the project, such as mw:Talk:Edit_check or one of the subpages or related pages (which I'm having a hard time relocating after I found them the first time; a "See also" compendium of links at the bottom of mw:Edit check might be nice.)

Rather than make a decision about language(s) a priori, seems like it would be a good idea to experiment a bit: maybe take a handful of @Sdkb's ideas from his list and model something to figure out how it would work in practice, whether via a template, or something else. Sdkb has great insight and understanding about such things, and might be best placed to pick three or four tasks that are "most dissimilar" to each other, so most likely to exercise or require different aspects of whatever language or format is eventually used, and then we could try implementing something, with a mini-platform API mocked up (but functional) on your side, and templates (or whatever) on ours. What we learn from that could be pretty useful in deciding about Q 1 (or the knowledge gained might suggest further experiments to try). If you decided to throw it open to VPI or VPR, I bet you'd end up with a few very capable volunteers, to help model a few tasks.

The weak point in all this is that I don't know what your VE platform is, and it could be that none of these languages have access to that environment (or vice versa). I imagine the most likely one to have access is JavaScript, as it acts in the browser, but I'm not sure of that one, either, so either you'd have to build that part (if that's feasible) or just go back to plan A and do everything on your side. I hope that's not necessary, though.

ppelberg renamed this task from [SPIKE] Decide how edit check heuristics will be implemented for on-wiki configuration? to [SPIKE] Decide how edit check heuristics will be implemented for on-wiki configuration. (Feb 21 2023, 5:30 AM)

Proposal
Per what @Esanders and I discussed offline yesterday, the below outlines how we're planning [i] to empower volunteers, on a per-project basis, to configure Edit Check on-wiki.

Near-term
To start, volunteers will be able to configure Edit Check on-wiki similarly to how they currently configure other features on-wiki [ii]: by editing JSON files within the MediaWiki namespace.

At the outset, all configurable facets of Edit Check will live within a single JSON "object." [iii] As new checks are introduced, we anticipate expanding this "single file" to accommodate the new checks that are introduced.

This "put everything in one place" approach flows from the observation that centralizing information to start, and splitting it off as needed later on, seems common on the wikis, as @Mathglot noted above when referring to AutoWikiBrowser. [iv]

We are planning to move forward with the approach described with the following considerations/acknowledgements in mind:

  1. Locating Edit Check configurations within the MediaWiki namespace means a relatively small population of people will be able to adjust these configurations
  2. Deciding not to introduce a GUI for configuring Edit Check means that people who are not practiced with reading/writing JSON are not likely to feel empowered to adjust these configurations
  3. Moving forward with using existing approaches (JSON pages within the MediaWiki namespace) is the most efficient path to enabling volunteers to configure Edit Check on-wiki

Longer-term
Once Community Configuration becomes available (T323811), the Editing Team will migrate Edit Check on-wiki configuration to the new system the Growth Team is developing.

In doing so, we think a broader set of people will feel empowered to adjust these configurations, as doing so will no longer depend on being practiced/comfortable with JSON. This, in effect, will address limitation "2." of the "Near-term" approach described above.


i. Emphasis on "planning" seeing as how we'd like to consult with volunteers and the Growth Team to ensure what the Editing Team is planning will:

  • Eventually be compatible with Community Configuration 2.0 once it's ready for use and
  • Afford volunteers sufficient freedom and access to configure Edit Check to meet individual projects' needs and expectations

ii. E.g. https://en.wikipedia.org/wiki/MediaWiki:Citoid-template-type-map.json
iii. See T327959 for the facets of Edit Check that will be configurable to start
iv. T330112#8631272: "...keeps all its rules in one place (or, a few big places) in the form of a file that contains hundreds of regexes for fixing typos."

ppelberg renamed this task from [SPIKE] Decide how edit check heuristics will be implemented for on-wiki configuration to Implement reference check for on-wiki configuration. (Aug 22 2023, 10:15 PM)
ppelberg updated the task description.
ppelberg added a subscriber: KStoller-WMF.

reminder: include Martin in code review!

Per today's offline meeting, I've updated the task description and created T345218.

Change 954098 had a related patch set uploaded (by DLynch; author: DLynch):

[mediawiki/extensions/VisualEditor@master] Edit check configuration system

https://gerrit.wikimedia.org/r/954098

Change 954098 merged by jenkins-bot:

[mediawiki/extensions/VisualEditor@master] Edit check configuration system

https://gerrit.wikimedia.org/r/954098

Current state: edit check wikis will now pay attention to a special message called MediaWiki:editcheck-config.json, which is hooked up to the character-count requirement for the reference check. (This will also control what counts as "content was added" for tagging purposes, since everything hangs off of that one function.)

Example contents of MediaWiki:editcheck-config.json that would lower the character requirement:

{
    "addReference": {
        "minimumCharacters": 10
    }
}
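
A minimal sketch of how such a message might be layered over built-in defaults, so that a wiki only needs to list the values it wants to change. This is an assumption about the merge behaviour, not a description of the extension's code: `DEFAULTS`, `merge_config`, and `load_config` are hypothetical names, and only `addReference.minimumCharacters` and the default of 50 come from this task.

```python
import json

# Hypothetical defaults; only this key and the value 50 appear in the task.
DEFAULTS = {"addReference": {"minimumCharacters": 50}}


def merge_config(defaults, overrides):
    """Recursively overlay overrides on defaults without mutating either."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(message_text):
    """Parse the on-wiki JSON text; fall back to defaults if it is not a JSON object."""
    try:
        overrides = json.loads(message_text)
    except json.JSONDecodeError:
        return merge_config(DEFAULTS, {})
    if not isinstance(overrides, dict):
        return merge_config(DEFAULTS, {})
    return merge_config(DEFAULTS, overrides)


config = load_config('{"addReference": {"minimumCharacters": 10}}')
print(config["addReference"]["minimumCharacters"])  # prints 10
```

Falling back to the defaults on unparseable input is one possible design choice; the task itself does not say how invalid JSON is handled.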

Re: account state, do we count temp users as "logged in" or "logged out"?

Great spot. Per what @DLynch and I talked about offline, in this context, we're going to make it so temp users will be counted/considered as "logged out" users.

Said another way: if a wiki was to configure Edit Check such that it was only made available to "logged out" users, temporary account holders would also have access to Edit Check.


Note: we're following up with the AHT team to confirm the above would not defy/conflict with existing patterns.
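
The decision above can be sketched as a small mapping from account kind to the account-state facet. This is an illustrative Python sketch only: `AccountKind`, `account_state`, `allowed_for`, and the string values `"loggedin"`/`"loggedout"` are hypothetical names chosen for the example, not identifiers confirmed by this task.

```python
from enum import Enum


class AccountKind(Enum):
    REGISTERED = "registered"
    TEMPORARY = "temporary"
    ANONYMOUS = "anonymous"


def account_state(kind):
    """Map an account kind to the state used by the account facet.

    Temporary accounts are treated as logged out, per the decision above.
    """
    return "loggedin" if kind is AccountKind.REGISTERED else "loggedout"


def allowed_for(configured_state, kind):
    """configured_state is None for "logged in & out", else "loggedin" or "loggedout"."""
    return configured_state is None or configured_state == account_state(kind)
```

Under this sketch, a wiki that restricts Edit Check to "logged out" users would still offer it to temporary account holders.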

Change 956505 had a related patch set uploaded (by DLynch; author: DLynch):

[mediawiki/extensions/VisualEditor@master] Edit check config: account-state and experience

https://gerrit.wikimedia.org/r/956505

Change 956505 merged by jenkins-bot:

[mediawiki/extensions/VisualEditor@master] Edit check config: account-state and experience

https://gerrit.wikimedia.org/r/956505

Everything but categories and sections is now configurable in the config message. If you have interface-admin on beta you can go to https://en.wikipedia.beta.wmflabs.org/wiki/MediaWiki:Editcheck-config.json and adjust it.

I wrote some documentation about what the config message can contain here: https://www.mediawiki.org/wiki/Edit_check/Configuration

I'm not sure if that matters for your team, but .json pages are editable by regular admins (both interface and regular admins have edit access to them).

Great spot, thank you for clarifying, @Urbanecm_WMF.

Also: can I assign this task over to you to "...review technical implementation to ensure the approach we're taking to start will not preclude our ability to migrate to Community Configuration 2.0 (T323811) in the future"?

Absolutely! Do you want me to review this task's description, or is there any other task/material I should make myself familiar with?

ppelberg updated the task description.

Wonderful, thank you, @Urbanecm_WMF ^ _ ^

I think the now-updated task description includes everything you might need. Though, please let us know if there is anything we can clarify/offer to make review easier on your end!

I reviewed the task description, MediaWiki.org documentation and the example configuration file.

In general, the approach selected here should be compatible with Community configuration 2.0 (CC2.0). One of the access modes CC2.0 will provide is the ability to fetch the configuration as raw JSON, which can be done rather easily once CC2.0 is ready and usable. I'm a bit uncertain about using the i18n mechanism for loading the config (in particular, I'm unsure whether the i18n framework might run into issues with large "messages" that are really config pages), but that's not related to CC2.0 per se.

Community configuration will (very likely) require users to provide a JSON schema for their configuration, and it will enforce that the configuration file meets the JSON schema before returning it. This seems to be different from what EditCheck does now (it returns anything that's stored in the on-wiki page, regardless of validity). I'm unsure whether Edit check would need the ability to forcefully load the configuration regardless of its validity, or to completely skip the validation process. From my perspective, it seems writing a JSON schema for the configuration file as part of the CC2.0 migration would be easy, and in the meantime, https://www.mediawiki.org/wiki/Edit_check/Configuration describes the supported configuration fields.
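
To make the contrast concrete, here is a rough sketch of "validate before returning" behaviour. It uses a hand-written type check as a stand-in for real JSON Schema validation, and it is not based on Community Configuration's code: `SCHEMA`, `validate`, and `get_config` are hypothetical, and the only field name taken from this task is `addReference.minimumCharacters`.

```python
# Hypothetical, simplified "schema": expected Python type for each known field.
SCHEMA = {"addReference": {"minimumCharacters": int}}


def validate(config, schema, path=""):
    """Return a list of human-readable problems; an empty list means valid."""
    if not isinstance(config, dict):
        return [f"{path or 'config'}: expected an object"]
    problems = []
    for key, value in config.items():
        where = f"{path}.{key}" if path else key
        if key not in schema:
            problems.append(f"{where}: unknown field")
        elif isinstance(schema[key], dict):
            problems.extend(validate(value, schema[key], where))
        elif type(value) is not schema[key]:
            # Exact type match, so that True/False are not accepted as integers.
            problems.append(f"{where}: expected {schema[key].__name__}")
    return problems


def get_config(stored, fallback):
    """Return the stored config only if it validates, otherwise the fallback."""
    return stored if not validate(stored, SCHEMA) else fallback
```

In this sketch an invalid page is never returned to the caller, which is the opposite of the current behaviour described above, where whatever is stored on-wiki is returned as-is.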

In case this is helpful for Editing engineers, there is a very WIP and incomplete PoC for Community configuration available at GitLab, which might be useful to better understand Growth's current thinking of Community configuration.

@ppelberg Reassigning this to you for sign-off; please feel free to ping me if there is anything else I can do to help Editing progress here.

Wonderful! Thank you for this thorough and clear review, @Urbanecm_WMF.

I've shared the summary you've provided here with Editing Engineering and we'll follow up here (or in Slack) if this brings any new questions to mind.

Editing Engineering confirmed offline today that the approach the Editing Team is taking now should work well (read: be compatible) with the approach the Growth Team is implementing in Community Configuration 2.0.