Page MenuHomePhabricator

[Story] Create better edit summaries for wbeditentity API endpoint
Open, HighPublic13 Estimated Story Points

Description

Motivation
For mobile edits, we don't want to create every action atomically. Reasons for this are:

  • The number of edits currently created by one save creates rather cluttering watchlists
  • It would create a lot of requests, and this would increase needed band width
  • It would create "part error states", where some part of an edit is saved, but some other part is not

However, editors should still be able to understand as much as possible about an edit without having to look at the diff. Thus we need edit summaries that are as specific as possible, and at the same time not overwhelming.

As a Wikidata or a Wikipedia patroler (both are affected!)
I want to understand what an edit was about just by reading the edit summary
so that I identify vandalism right from my watchlist

Acceptance Criteria

  • Provide custom generated edit summaries, in accordance with Choosing summary version mechanism

Technical thoughts

  • In order to create a general "edit summary creation" point, the edit summary should be generated as part of the wbeditentity endpoint.
  • Dealing with change operations was a struggle in the lexeme project, maybe we can learn something from there?

Edit summary versions

Expanded version
Show each change seperated by comma, in the following format:
<Added/Changed/Removed> [<language code>] <label/description/alias>: <new label/description contents>,
Example:
Added [en] label: Frame of Notre-Dame de Paris, Changed [en] label: Maja, Added [fr] alias: Marie

The wording is attempted to be as close to the current edit summary formulation as possible

Shortened version
Changed labels, descriptions and/or aliases in: <all language codes that changed >
Example:
Changed labels, descriptions and/or aliases in: fr, es, en, pl, zh, it, hu

Fallback version
Changed label, description and/or alias [whichever of them has been touched anywhere] in XX languages
Example:
Chenged label, description and/or alias in 60 languages

Choosing summary version mechanism

The limit for summary length we want to stick to for now is 480 characters in total.

Go through the following steps to choose the summary version:

  1. If 5 or less terms have changed, we generate expanded version. If this generated summary is below the limit, we stop and use it.
  2. If 50 distinct languages or less have changed, we generate shortend version. If this generated summary is below the limit, we stop and use it.
  3. Otherwise, we generate fallback version and use it.

Related Objects

View Standalone Graph
This task is connected to more than 200 other tasks. Only direct parents and subtasks are shown here. Use View Standalone Graph to show more of the graph.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

@alaa_wmde

  • checks if we can split changing summaries workding from the requirement of doing single request in UI to wbeditentity (i.e. wbeditentity generating separate revisions/summaries)

@Lydia_Pintscher

  • checks with @Lea_WMDE regarding the wording/detail of shortend version
Lea_WMDE updated the task description. (Show Details)

@alaa_wmde: @Lydia_Pintscher and I synced on the text, the shortened version is updated now. Concerning

@alaa_wmde

  • checks if we can split changing summaries workding from the requirement of doing single request in UI to wbeditentity (i.e. wbeditentity generating separate revisions/summaries)

For the Termbox project, we are expecting one edit in total, not multiple ones, even if they come in a single request. One of the reasons for that: We don't want to deal with states where one request went through, but others did not.
If you feel strongly about wanting to go in that direction now, could you come to our storytime today at 4 pm?

WMDE-leszek renamed this task from Create better Termbox edit summaries for wbeditentity endpoint to Create better edit summaries for wbeditentity endpoint.May 15 2019, 2:30 PM
WMDE-leszek renamed this task from Create better edit summaries for wbeditentity endpoint to Create better edit summaries for wbeditentity API endpoint.

The actual limits after which we move from one version to another ( Expanded -> Shortend -> fallback) is to be checked and detailed during implementation.

For example: it could be that one might try to generate one version and fallback to the next only if the generated test exceeded storage limits.

WMDE-leszek set the point value for this task to 13.May 21 2019, 12:51 PM

The actual limits after which we move from one version to another ( Expanded -> Shortend -> fallback) is to be checked and detailed during implementation.

Why is that? Do you think it likely the backend will not be able to handle the requests of the acceptance criteria?
I would not want us to increase the limits. The 5 was chosen, because most activities are below that line, and 5 is still a "small enough" number to overlook. The 50 might actually be way too big.
So please don't increase either of the two numbers, and please stay with using number of terms/ languages as deciding factors

Change 518721 had a related patch set uploaded (by Alaa Sarhan; owner: Alaa Sarhan):
[mediawiki/extensions/Wikibase@master] Add ADR explaning the change to ChangeOp design.

https://gerrit.wikimedia.org/r/518721

The actual limits after which we move from one version to another ( Expanded -> Shortend -> fallback) is to be checked and detailed during implementation.

Why is that? Do you think it likely the backend will not be able to handle the requests of the acceptance criteria?
I would not want us to increase the limits. The 5 was chosen, because most activities are below that line, and 5 is still a "small enough" number to overlook. The 50 might actually be way too big.
So please don't increase either of the two numbers, and please stay with using number of terms/ languages as deciding factors

That was because we weren't sure about the limit as well during estimating and taking in this task. We can certainly stay with those numbers if that's important. Why is it important to stick to those numbers? I reckon one reason might be to keep it consistent between different edits, but can't see it as an important factor in here. I can be missing something else of course. Are there other reasons?

@alaa_wmde and I made some changes to the ADR for this task here.

A short summary of the change.
  • We decided to proceed with the visitor design pattern (Stateless).
  • Extended the discussion dateline (Now, 2019-07-09).
  • Added Some implementation description to the choice made.

Change 518721 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add ADR explaning the change to ChangeOp design.

https://gerrit.wikimedia.org/r/518721

Change 522154 had a related patch set uploaded (by Rosalie Perside (WMDE); owner: Rosalie Perside (WMDE)):
[mediawiki/extensions/Wikibase@master] [WIP]Add ChangeOpResult interface

https://gerrit.wikimedia.org/r/522154

Change 522430 had a related patch set uploaded (by Rosalie Perside (WMDE); owner: Rosalie Perside (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] ChangeOp::apply() should return ChangeOpResult

https://gerrit.wikimedia.org/r/522430

Change 522154 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add ChangeOpResult interface

https://gerrit.wikimedia.org/r/522154

Change 522430 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] ChangeOp::apply() should return ChangeOpResult

https://gerrit.wikimedia.org/r/522430

Change 524200 had a related patch set uploaded (by Rosalie Perside (WMDE); owner: Rosalie Perside (WMDE)):
[mediawiki/extensions/Wikibase@master] ChangeOp::apply() should return ChangeOpResult

https://gerrit.wikimedia.org/r/524200

Change 524200 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] ChangeOp::apply() should return ChangeOpResult

https://gerrit.wikimedia.org/r/524200

alaa_wmde renamed this task from Create better edit summaries for wbeditentity API endpoint to [Story ]Create better edit summaries for wbeditentity API endpoint.Jul 22 2019, 9:47 AM

In doing as it is a story and has parts of it in progress already

Sorry, this ticket slipped through for me :/ Where do we currently stand with the limitations? Did you come to strongly different conclusions than what is spelled out in the acceptance criteria?

We haven't yet reached a place to deal with limits. We are implementing the fallback version first, and once done with that we should next move to the shortend version where we will have to figure out the limits for it. Will ping you with conclusions once we have any then to get some product feedback on them.

DannyS712 renamed this task from [Story ]Create better edit summaries for wbeditentity API endpoint to [Story] Create better edit summaries for wbeditentity API endpoint.Jul 26 2019, 3:36 PM

Heads up @Lydia_Pintscher

Per our current break-down, we will go live with fallback version first, then with the other two possible together or one after another (taking their conditions into account) .. is that fine? The other option is to rollout all versions together at once instead.

The impact is on edits made through wbeditentity api, which includes new termbox. Edits through other apis/special pages will not be affected by those summaries

The fallback version will be deployed on October 2nd. Announcement is done on Project Chat and mailing-lists.

Here is a diff where one of the new fallback summaries failed to be translated to a human-readable form: https://www.wikidata.org/w/index.php?title=Property:P571&diff=1040450905&oldid=1038877052

Change 547172 had a related patch set uploaded (by Alaa Sarhan; owner: Alaa Sarhan):
[mediawiki/extensions/Wikibase@master] Add missing messages for property updates

https://gerrit.wikimedia.org/r/547172

Here is a diff where one of the new fallback summaries failed to be translated to a human-readable form: https://www.wikidata.org/w/index.php?title=Property:P571&diff=1040450905&oldid=1038877052

Thanks @Pintoch for reporting this. We did indeed not supply messages for property items by mistake. Sorry about that. Once the patch is merged and deployed, those message should be formatted as expected.

Change 547172 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add missing messages for property updates

https://gerrit.wikimedia.org/r/547172

@Addshore, @alaa_wmde, @rosalieper and I just discussed possible technical solutions for this.

We concluded that the current approach of storing the edit summary as a single translatable message with a list of arguments in the comment_text field of the Comment table is not going to scale well. The summaries we want to generate are becoming increasingly elaborate and dynamic, and can no longer be represented as single messages with arguments. We already changed related stories (T224013#5598017) for this reason and are going to run into similar (and more!) issues for T224015. One additional problem is the 480 character limit (mentioned in T224014#5562933) for the comment text, which in the worst case isn't even enough to store two labels.

A more future-proof solution would be to dynamically generate the edit summary from the diff resulting from the edit, i.e. instead of storing a fixed edit summary in the database, build it from the diff via FormatAutocomments. This would work for the current task of term edit summaries, and also pave the way to have better edit summaries for edits on other entity parts (statements, lexeme fields, ...).

If we want to store the diff in the comment table, we could likely use the [comment_data](https://www.mediawiki.org/wiki/Manual:Comment_table#comment_data) field which exists for the purpose of holding "structured data that is intended to be used to provide localized versions of automatically-generated comments" which sounds promising. One major challenge we see for this is that the aforementioned FormatAutocomments hook does not (in all cases!?) contain the comment_data. See uses of [Linker::formatComment](https://codesearch.wmflabs.org/search/?q=Linker%3A%3AformatComment&i=nope&files=&repos=) which runs the hook.

Exciting developments! I am wondering if that comment_data could be (or already is) exposed in any public API? Exposing such diffs would also solve T106306 in the same go.

As an API consumer I would have a great appetite for such diffs, for instance in the EditGroups tool (see T218779#5049167 for the motivation).

Exciting developments! I am wondering if that comment_data could be (or already is) exposed in any public API? Exposing such diffs would also solve T106306 in the same go.

I couldn't find any existing API where it is currently exposed, so unfortunately it wouldn't come for free. Having code lying around that almost does the job and could possibly be reused would certainly not hurt, though.

Removing from the campsite as it is not a priority right now.