This task is about investigating what new semantic primitives could be introduced to help staff and volunteers understand, with more granularity and at scale, the nature of the changes made within a given VE edit.
In this context, "semantic primitives" can refer to both types of content and actions...
Content types
- Sentences: the number of net new sentences that are added within a given edit. See T347644.
- Images: the number of net new images that are added within a given edit
- External links: the number of new external links that are added within a given edit
- Etc.
Actions
- Pasted text from an external source
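To make the content-type primitives above concrete, here is a minimal sketch of how they might be derived by comparing the wikitext before and after an edit. Everything in it is an assumption for illustration: the names `EditPrimitives` and `describe_edit` are hypothetical, the regular expressions are deliberately naive, and real sentence splitting is a harder problem (see T324363). It is not how VE or any existing tool computes these values.

```python
import re
from dataclasses import dataclass

# Assumption: naive patterns for illustration only. Real wikitext needs a
# proper parser, and sentence splitting is language-dependent.
EXTERNAL_LINK = re.compile(r"https?://[^\s\]|<>]+")
IMAGE = re.compile(r"\[\[(?:File|Image):[^\]|]+", re.IGNORECASE)
SENTENCE_END = re.compile(r"[.!?](?=\s|$)")


@dataclass(frozen=True)
class EditPrimitives:
    """Hypothetical container for the content-type primitives of one edit."""
    net_new_sentences: int
    net_new_images: int
    new_external_links: int


def describe_edit(old_text: str, new_text: str) -> EditPrimitives:
    """Compare two revisions of wikitext and count what the edit added."""
    old_links = set(EXTERNAL_LINK.findall(old_text))
    new_links = set(EXTERNAL_LINK.findall(new_text))
    return EditPrimitives(
        # Net counts: can be negative when the edit removes content.
        net_new_sentences=len(SENTENCE_END.findall(new_text))
        - len(SENTENCE_END.findall(old_text)),
        net_new_images=len(IMAGE.findall(new_text))
        - len(IMAGE.findall(old_text)),
        # Links present after the edit that were not present before it.
        new_external_links=len(new_links - old_links),
    )


old = "Cats are mammals."
new = (
    "Cats are mammals. They purr [https://example.org/purr when content]. "
    "[[File:Cat.jpg|thumb]]"
)
print(describe_edit(old, new))
```

An action such as "pasted text from an external source" cannot be recovered from a before/after comparison like this one; it would have to be observed in the editor while the edit is being made.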
This task builds on an existing body of work that seeks to provide the kind of "granular understanding" described above:
- T325713: Introduce a change tag to identify edits that include a reference
- T333714: Introduce a tag to identify edits that involve people adding new content
- T293465: Edit Types Research
- T324363: Investigate sentence splitting
Use cases
We think the semantic primitives this task asks us to identify could enable the kind of "understanding at scale" that makes the following stories possible...
- As an experienced volunteer who is motivated to maintain and improve the quality of content on Wikipedia, I'd value a way to filter change logs (e.g. Special:RecentChanges) for edits that involve someone introducing a number of new sentences, so that I can more easily find and focus my attention on reviewing edits that may have an outsized impact on content quality.
- As a developer who is motivated to maintain and improve the quality of content on Wikipedia, I'd value a way of programmatically detecting what type(s) of changes an edit has introduced/is attempting to introduce so that I can develop a feature/gadget/script that offers feedback relevant to the specific change(s) someone is seeking to make. [i]
- As a member of the Editing Team who is encountering a request to introduce a new potential Edit Check (e.g. mw:Edit check/Ideas), I'd value knowing what "semantic primitives" are available within VE, so that I can more easily and accurately assess the technical feasibility of said idea.
Open questions
- 1. What is the theoretical set of "semantic primitives" that could be introduced to describe edits made with the VisualEditor and 2010 wikitext editor?
- 2. Of the primitives that question 1 reveals, what new technical capabilities, if any, would need to be introduced in order to develop/start offering them?
- 3. How much data do we choose to expose about each primitive, bearing in mind that it's difficult to stop exposing data once other code has started depending on it?
- E.g. Might we expose the number of new sentences added? Might we expose the entire contents of the sentences that were added and leave any kind of additional computation to code that is "consuming" that data?
- 4. To what extent do we want to conform to limitations that make it feasible to provide this data on the server-side? Or, do we want a richer version on the client-side only?
- See more in T350904#9320558 via @DLynch.
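The trade-off in question 3 can be sketched as two possible shapes for the same primitive. The class names below are hypothetical and do not describe any existing or planned API; the point is only that a consumer can derive the count from the full contents, but not the other way around, so the richer shape is the larger commitment once other code depends on it.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SentenceSummary:
    """Hypothetical minimal exposure: only the number of added sentences."""
    added_count: int


@dataclass(frozen=True)
class SentenceDetail:
    """Hypothetical richer exposure: the full text of each added sentence."""
    added_sentences: Tuple[str, ...]

    def summarize(self) -> SentenceSummary:
        # The summary is always derivable from the detail; consumers of the
        # summary alone cannot reconstruct the sentences.
        return SentenceSummary(added_count=len(self.added_sentences))


detail = SentenceDetail(added_sentences=("They purr when content.", "Kittens are born blind."))
print(detail.summarize())
```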
Done
- Answers to all "Open questions" are documented
- An API is specified that defines what data is passed about an edit
i. E.g. https://en.wikipedia.org/wiki/User:Suffusion_of_Yellow/wikilint.js