
Figure out the limits in which versions fallback to the next one
Closed, Resolved, Public

Description

During the story time discussion, a question came up about the storage capacity for edit summaries and whether it was taken into account when deciding on the limits described in the parent task. Since there was no definite answer at the time, we decided to look into it as part of working on the story, so this sub-task was created to capture that part.

Determine how we decide that the generated text of a version exceeds that version's limits and should fall back to the next, more compact, version.

The hard limit here is the storage capacity for the summary text.

The limits may be based on:

  • number of terms
  • number of languages and terms
  • length of generated text
  • ...
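The candidate limits listed above can all be computed from the set of changed terms. A minimal sketch, assuming a hypothetical `TermChange` record (action, language code, term type, value) that is made up for illustration and not taken from the Wikibase code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermChange:
    """One changed term. Hypothetical shape, for illustration only."""
    action: str     # e.g. "Added", "Changed", "Removed"
    language: str   # language code, e.g. "en"
    term_type: str  # "label", "description" or "alias"
    value: str


def count_terms(changes: list[TermChange]) -> int:
    """Candidate limit 1: number of changed terms."""
    return len(changes)


def count_languages(changes: list[TermChange]) -> int:
    """Candidate limit 2: number of distinct languages touched."""
    return len({c.language for c in changes})


def generated_length(summary: str) -> int:
    """Candidate limit 3: length of the generated text, in characters."""
    return len(summary)


# Example modelled on the expanded-version test case: 4 terms in 3 languages.
example = [
    TermChange("Added", "en", "label", "Frame of Notre-Dame de Paris"),
    TermChange("Added", "fr", "alias", "Marie"),
    TermChange("Changed", "es", "description", "this is a long description"),
    TermChange("Added", "en", "description", "this is a much longer description"),
]
```

The first two measures are known before any text is generated, while the third requires generating the summary first.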


Event Timeline

hoo renamed this task from Figureout the limits in which versions fallback to the next one to Figure out the limits in which versions fallback to the next one. May 21 2019, 1:38 PM
hoo updated the task description.

The actual limits at which we move from one version to the next (Expanded -> Shortened -> Fallback) are to be checked and detailed during implementation.

Why is that? Do you think it is likely that the backend will not be able to handle the requests in the acceptance criteria?
I would not want us to increase the limits. The 5 was chosen because most activities fall below that line, and 5 is still a "small enough" number to take in at a glance. The 50 might actually be far too big.
So please don't increase either of the two numbers, and please keep using the number of terms/languages as the deciding factors.

Storage capacity for the summary is not a problem, I think. Edit summaries go to the comment table and are stored as blobs. 64 KB should be plenty even if we want to store the changes/additions/deletions of terms for 50 languages.

Edit: I may have been wrong. We may not be able to use all of the 64 KB: https://meta.wikimedia.org/wiki/Help:Edit_summary#Properties

Testing whether the necessary comment arguments can be stored for the following examples of each version (if all of them can be stored successfully, then the simple count limits in the parent task are more than sufficient):

Expanded version

Added [en] label: Frame of Notre-Dame de Paris, Added [fr] alias: Marie, Changed [es] description: this is a loooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooong description, Added [en] description: this is a much much much much much much much much much much much much much much much much much much much much much much much much much much much much much much much much much much much much much much much much looooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo
ooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooger loooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo
oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo
oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooger really loooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo
oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo
oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooger description

Shortened version

Changed in lang1, lang2, lang3, lang4, lang5, lang6, lang7, lang8, lang9, lang10, lang11, lang12, lang13, lang14, lang15, lang16, lang17, lang18, lang19, lang20: label, description, alias. Changed in lang21, lang22, lang23, lang24, lang25, lang26, lang27, lang28, lang29, lang30, lang31, lang32, lang33, lang34, lang35, lang36, lang37, lang38, lang39, lang40, lang40, lang41, lang42, lang43, lang44, lang45, lang46, lang47, lang48, lang49: label, description

(It will probably be enough to test the first example, as the second is guaranteed not to get any longer than it is in this example, since the language codes we store are not allowed to be longer than 8 or 9 characters, IIRC.)
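The shortened example groups languages that changed the same set of term types. A minimal sketch of that grouping, assuming the format is exactly what the example shows ("Changed in <codes>: <term types>", groups joined by ". ", term types ordered label, description, alias); `shortened_summary` and `TERM_ORDER` are hypothetical names inferred from the example, not the real generator:

```python
TERM_ORDER = ("label", "description", "alias")  # order used in the example above


def shortened_summary(changes_by_language: dict[str, set[str]]) -> str:
    """Build a shortened summary by grouping languages with identical changes.

    changes_by_language maps a language code to the term types changed in it.
    Groups keep the order in which their first language appears.
    """
    groups: dict[tuple[str, ...], list[str]] = {}
    for lang, term_types in changes_by_language.items():
        # Normalise the set of term types into a fixed order so that
        # {"alias", "label"} and {"label", "alias"} land in the same group.
        key = tuple(t for t in TERM_ORDER if t in term_types)
        groups.setdefault(key, []).append(lang)
    parts = [
        f"Changed in {', '.join(langs)}: {', '.join(key)}"
        for key, langs in groups.items()
    ]
    return ". ".join(parts)
```

For example, `shortened_summary({"en": {"label", "alias"}, "de": {"alias", "label"}, "fr": {"description"}})` yields `Changed in en, de: label, alias. Changed in fr: description`. Measuring `len()` of such output for worst-case inputs (longest allowed codes, most groups) is a cheap way to check the length bound mentioned above.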

When I tried to save the expanded example above, it ended up in the database, in the comment table (comment_text column), as:
/* wbsetclaim-create:2||en, label, "Frame of Notre-Dame de Paris"|fr, alias, Marie|es, description, ""this is a loooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooong description"|en, description, "this is a much much much much much much much much much much much much much much much much much much much much much much much much much much much much much much much much much much much much much much much much looooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo...

That is 500 characters long. So I think that is our "hard" limit, and it comes from MediaWiki core (the database itself could store much more than that).

When I looked in core to see why it is truncated, it turns out that CommentStore truncates the comment
https://github.com/wikimedia/mediawiki/blame/74b46a1f3364d5529d408c2e061dd98100b8ddbd/includes/CommentStore.php#L490
and the limit is hard-coded in the class (it was reduced from 1000 to 500 about two years ago).
https://github.com/wikimedia/mediawiki/blame/74b46a1f3364d5529d408c2e061dd98100b8ddbd/includes/CommentStore.php#L37
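The observed behaviour (stored text cut to 500 characters and ending in "...") can be approximated as below. This is an illustration only, not MediaWiki's PHP code; it assumes the ellipsis counts toward the limit and that characters are counted as Unicode code points (Python's `len` on `str`), not bytes.

```python
COMMENT_CHARACTER_LIMIT = 500  # the limit found in CommentStore (see links above)


def truncate_comment(text: str, limit: int = COMMENT_CHARACTER_LIMIT,
                     ellipsis: str = "...") -> str:
    """Approximate character-based truncation with a trailing ellipsis.

    Assumption: the ellipsis is included in `limit`, so the result is never
    longer than `limit` characters.
    """
    if len(text) <= limit:
        return text
    return text[: limit - len(ellipsis)] + ellipsis
```

Because the count is in characters rather than bytes, a summary full of non-Latin text is cut at the same character position as an ASCII one.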

What does that mean for us?
The bad news first: I think we should not touch the hard limit behind the truncation in CommentStore, to avoid undesirable effects all over the place and to avoid expanding the technical scope of this story too much.
The good news: we know exactly what the character limit is. It is a small limit, so we might actually hit it with our expanded version if a few moderately long descriptions are provided in one edit.


I will try to come up with something reasonable for switching from one version to the other, based on this character limit.

So for now, I propose the following:

  1. Use 480 as a limit for the generated summary.
  2. If we have <= 5 terms changed, we generate the expanded version. If it is below the limit, we stop and use it.
  3. If we have <= 50 languages changed, we generate the shortened version. If it is below the limit, we stop and use it.
  4. If we reach this point, we generate the fallback version and use it.
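The four steps above can be sketched as follows. This is a minimal illustration, not the Wikibase implementation: the hypothetical builder callables stand in for whatever code actually generates each version, and "below the limit" is read as strictly fewer than 480 characters.

```python
from typing import Callable

SUMMARY_LIMIT = 480           # step 1: kept below CommentStore's 500 characters
MAX_EXPANDED_TERMS = 5        # step 2
MAX_SHORTENED_LANGUAGES = 50  # step 3


def choose_summary(
    terms_changed: int,
    languages_changed: int,
    build_expanded: Callable[[], str],
    build_shortened: Callable[[], str],
    build_fallback: Callable[[], str],
    limit: int = SUMMARY_LIMIT,
) -> str:
    # Step 2: few enough terms, so try the expanded version first.
    if terms_changed <= MAX_EXPANDED_TERMS:
        expanded = build_expanded()
        if len(expanded) < limit:
            return expanded
    # Step 3: few enough languages, so try the shortened version.
    if languages_changed <= MAX_SHORTENED_LANGUAGES:
        shortened = build_shortened()
        if len(shortened) < limit:
            return shortened
    # Step 4: nothing else fit, so use the fallback version.
    return build_fallback()
```

Passing the builders as callables means a version is only generated when the previous one did not fit. Keeping `limit` a parameter would also make it easy to follow a configurable core limit if one is introduced later.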

@Lydia_Pintscher, does that sound good for now?

Yeah that sounds good!
I wonder: if the 500 is a configurable limit, should we take that into account as well?

It isn't, as far as the code shows so far: it is hard-coded in the class and used directly. Perhaps making it configurable is something we can do later?

Updated the parent task with the findings and the current solution we agreed on. Let's get the first iteration of these edit summaries out soon; then we can try to increase our limit significantly as a follow-up.