Page MenuHomePhabricator

Step 1: Only send incremental changes through the API (impact: medium)
Closed, ResolvedPublic8 Estimated Story Points



  • We send too much data with every edit and should reduce that in order to be more mobile friendly.

Acceptance criteria:

  • size of the change we are sending is reduced significantly
  • only the statements, describing the targetProperty, which changed are sent

Event Timeline

Please note: How we make the diffing has a intimidate impact on the rollback behavior.

darthmon_wmde renamed this task from Only send changed contents via API to (product decision needed - needs more talking) Only send changed contents via API.Dec 10 2019, 2:27 PM
darthmon_wmde renamed this task from (product decision needed - needs more talking) Only send changed contents via API to Step 1: Only send incremental changes through the API.Dec 10 2019, 2:49 PM
darthmon_wmde moved this task from tech things to Ready to estimate on the Wikidata-Bridge board.

I think when we created this task, we were thinking of using wbsetclaim or a similar API module, instead of wbeditentity. However, since we decided in T238662 that even “update” changes (new statement + demote existing statement) should happen in one edit, as far as I understand we’ll have to use wbeditentity at least for that case, as it’s the only API module that will let one edit more than one statement at once, I believe.

So I think the only thing that we can do to send less data to the API is to drop the unmodified parts from the JSON we send to wbeditentity? As far as I understand, if we send something like { "claims": { "P123": [...] } }, it won’t delete statements for other properties (or labels etc.), so this should be safe to do. (To remove a statement/label/etc. using wbeditentity, you explicitly have to mention it and include a "delete" key or similar, I believe.) I suppose we don’t even necessarily need any diffing for this – just remove everything except the statements for our target property?

I share this view and - conveniently - the way the task break down of T238662#5864963 went, this should be relatively straightforward thereafter.

Lydia_Pintscher renamed this task from Step 1: Only send incremental changes through the API to Step 1: Only send incremental changes through the API (impact: medium).Feb 27 2020, 9:56 AM
Lydia_Pintscher updated the task description. (Show Details)

Edits made through the Bridge have the generic "Item changed" edit summary. We should have more specific edit summaries to make it easier for people to understand what's being done in an edit.

To be clear – following the plan from T230343#5900796 will do nothing to change this.

following the plan from T230343#5900796 will do nothing to change this [specific edit summaries].

True. We can't have "one (small) API request" and "good summaries" once T238662: Step 1: make edit based on fix/update choice (impact: high) is implemented, as we sometimes have to persist two statements. Changes on the server side are needed.

Yes, you are correct. Changing edit summaries is not the goal of this ticket, changing the payload size is. Though, depending on how we implement this, the edit summaries might or might not change for the "I corrected a wrong value" edit decision. So it is mentioned here to underline that this wouldn't be an adverse side-effect.
The task to whose goal is to improve edit summaries is T233395: Step 1.2: more precise edit summaries for Bridge edits (impact: high).

Task broken down into:

  1. T248473: Refactor WritingEntityRepository to accept Entity + EntityRevision
  2. T248474: Implement wrapper WritingEntityRepository removing redundant parts
  3. T248475: Wire up wrapper EntityWritingRepository

Note that the split of this optimization into a separate service will break down once we introduce the ability to remove statements. To remove statements, you need to include "remove" keys in the data sent to the API, but that is very much an API detail that should be encapsulated tightly into the ApiWritingRepository – but then the wrapper service from T248474 will also have to move into ApiWritingRepository, otherwise that class can’t distinguish between parts that were removed as an optimization because they’re redundant, and parts that should actually be removed server-side. However, we certainly won’t need this ability (to remove statements) during step 1, and possibly not for a good while longer, so for now this split still makes sense to us.

Looking good. Payload over the wire reduced, edit history looks the same like what we saw on changes before this story.

On a random example, what gets sent is indicated by the green overlay (and "the path to it" to identify it), but the folded arrow contents outside of the overlay can be omitted.

Screenshot_2020-04-01 Screenshot.png (818×612 px, 53 KB)

Suggesting a retroactive rephrasing (that's what we planned and did, it was just poor wording) of the second AC to save our future selfs some question marks:

  • only the statements describing the targetProperty are sent

Not sure I agree with the rephrasing :) if you use the “update” edit decision, and the old statement had normal rank, then I think the old statement doesn’t get sent to the API (nothing to change there), so not even all the statements for the target property are sent. (Though the current phrasing doesn’t really make this clear either.)

Yes model: “only the necessary data is sent”? Bit vague. Hm.

"only the statements, describing the targetProperty, which changed are sent"?

Verified this is actually the case by cooking validateBridgeApplicability() and editing a multi-statement property.

Lydia_Pintscher claimed this task.
Lydia_Pintscher moved this task from Verification to Done on the Wikidata-Bridge-Sprint-17 board.
Lydia_Pintscher added a subscriber: Lydia_Pintscher.

Looking good - didn't find any breakage \o/