Step 1: Ensure that Wikidata Bridge uses fresh entity data (impact: high)
Closed, Resolved · Public · 5 Estimated Story Points

Description

As an editor, I want to edit based on the latest page data to avoid edit conflicts.

Problem:
Wikidata Bridge may currently load stale entity data, due to T128486 being unresolved.

Example:
If you edit a value in the Wikidata Bridge and reload the page, you’ll likely see the new value in the rendered page but the old value when you open the Bridge again.

Screenshot from 2019-10-01 16-05-32.png (698×723 px, 94 KB)

The Twitter hashtag value on Beta Wikidata was changed from WikidataCon to Wikidata; the infobox has the new value (was automatically updated through change dispatching), but the bridge dialog loaded the entity data via the special page and got a stale value.

BDD
WHEN reopening the Bridge after an edit
THEN the latest data from the repository is shown

Acceptance criteria:

  • always use freshest data from the repository when opening the Bridge

Info

  • after a discussion within the team and with the tech lead, it was decided to implement this by requesting entity data via wbgetentities instead of Special:EntityData (the current approach)

Event Timeline

T128486 doesn’t look like it’s going anywhere (we asked in the scrum of scrums for an update on its subtask T152425 three times, to no reaction at all as far as I’m aware), so I think we’ll need to work around this in our own code. I see two options:

  • Use the wbgetentities API to download the whole entity. Always fresh, but never cached. Means more server load.
  • Use the query API to get the latest revision of the page, then load Special:EntityData for that particular revision (?revision=id). Means an extra network roundtrip, but possibility of sharing the cache with other requests for the same revision. First API request could probably be batched with other repo query requests (data bridge config, permissions).
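
To make the two options concrete, here is a rough sketch of the requests each one involves. The base URLs and function names are hypothetical and only for illustration (this is not the Bridge code); the assumption is that the first request of option 2 would be `action=query` with `prop=info`, whose response carries the page's `lastrevid`.

```typescript
// Assumed repo endpoints, for illustration only.
const REPO_API = "https://www.wikidata.org/w/api.php";
const REPO_ENTITY_DATA = "https://www.wikidata.org/wiki/Special:EntityData";

// Option 1: one request, always fresh, not shared with any cache.
function wbgetentitiesUrl(entityId: string): string {
  return `${REPO_API}?action=wbgetentities&ids=${encodeURIComponent(entityId)}&format=json`;
}

// Option 2, first roundtrip: ask the query API about the entity page.
// With prop=info the page info in the response includes lastrevid.
function latestRevisionUrl(pageTitle: string): string {
  return `${REPO_API}?action=query&prop=info&titles=${encodeURIComponent(pageTitle)}&format=json`;
}

// Option 2, second roundtrip: load that exact revision, which can be cached
// because the content of a given revision never changes.
function entityDataUrl(entityId: string, revision: number): string {
  return `${REPO_ENTITY_DATA}/${encodeURIComponent(entityId)}.json?revision=${revision}`;
}
```

The second option needs the result of `latestRevisionUrl` before `entityDataUrl` can be built, which is where the extra roundtrip comes from.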

possibility of sharing the cache with other requests for the same revision.

As noted in T220826#5185202, requests with ?revision are actually more common than requests without it (I think a lot of them might come from the query service updaters? cf. T217897), so I think there’s a decent chance that this request would actually be cached. (I think we can also track it if we want to, based on the response headers.) So to me the second approach seems preferable.

So, remember that Special:EntityData can return several different formats, so a deeper dive into the data would be needed.
But Special:EntityData is now used on every page load for Wikidata, returning JSON and using a revision ID, so lots of things will be cached.

But Special:EntityData is now used on every page load for Wikidata, returning JSON and using a revision ID, so lots of things will be cached.

Oh right, thanks to T85499. (We also use the JSON format, so presumably we’d share that cache.)

So, this isn't currently marked as "Step 1", but probably should be?

Oh right, thanks to T85499. (We also use the JSON format, so presumably we’d share that cache.)

Indeed.

So, it would be nice to use Special:EntityData, but no point in faffing around with T128486 for bridge.
Thus bridge would have to make a further api call to get the revision ID before showing anything.

I imagine that extra roundtrip is something we want to avoid as it will slow down the experience.
So let's just go for wbgetentities?
As this data is only loaded when the edit link is clicked I don't think the uncached request is anything to worry about.

Further down the line, of course, it would be nice to make use of both the client and server caching of Special:EntityData with a revision ID parameter.

Addshore renamed this task from Ensure that Wikidata Bridge uses fresh entity data to Step 1: Ensure that Wikidata Bridge uses fresh entity data. Dec 12 2019, 10:12 AM
Lydia_Pintscher renamed this task from Step 1: Ensure that Wikidata Bridge uses fresh entity data to Step 1: Ensure that Wikidata Bridge uses fresh entity data (impact: high). Feb 27 2020, 10:01 AM
Lydia_Pintscher updated the task description.

Rough estimates from a discussion in story time:

wbgetentities: Michael & Pablo agreed on ~5 points

fix Special:EntityData: Michael 5-8 points as well, Pablo closer to 20 points because of the unknowns

Decision from meeting (see meeting doc): we go with wbgetentities.

There is an estimate for the chosen solution from an earlier story time (PG & MG only), but since the next story time takes place before the next sprint starts, we can quickly verify this estimate with the whole team.

Discussion from task break-down:

Change 592267 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/Wikibase@master] bridge: add ApiReadingEntityRepository

https://gerrit.wikimedia.org/r/592267

Change 592268 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/Wikibase@master] bridge: use ApiReadingEntityRepository

https://gerrit.wikimedia.org/r/592268

Change 592267 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] bridge: add ApiReadingEntityRepository

https://gerrit.wikimedia.org/r/592267

Change 592268 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] bridge: use ApiReadingEntityRepository

https://gerrit.wikimedia.org/r/592268

Charlie_WMDE moved this task from Verification to Done on the Wikidata-Bridge-Sprint-18 board.
Charlie_WMDE subscribed.

looks good to me \o/

Request for the item is now performed through wbgetentities. It gets batched with the pre-existing request for the property features. In consequence we are performing fewer API requests than before. Thanks to BatchingApi this happens in a transparent fashion, keeping those concerns separated no less.
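
For readers unfamiliar with the batching idea, a minimal sketch of how two compatible API requests could be folded into one. This is not the actual BatchingApi implementation; `mergeParams`, `toQueryString` and the example parameters are hypothetical, and the property request is assumed here to be a wbgetentities datatype lookup.

```typescript
// Multi-value parameters are arrays, single-value parameters are strings.
type ParamValue = string | string[];
type ApiParams = Record<string, ParamValue>;

// Merge two parameter sets, or return null if they cannot share a request.
function mergeParams(a: ApiParams, b: ApiParams): ApiParams | null {
  const merged: ApiParams = {};
  for (const key of Object.keys(a).concat(Object.keys(b))) {
    const left = a[key];
    const right = b[key];
    if (left === undefined || right === undefined) {
      // Only one request sets this parameter: take it over unchanged.
      merged[key] = (left !== undefined ? left : right) as ParamValue;
    } else if (Array.isArray(left) && Array.isArray(right)) {
      // Both are multi-value: union, keeping first-seen order.
      merged[key] = left.concat(right.filter((v) => left.indexOf(v) === -1));
    } else if (left === right) {
      merged[key] = left;
    } else {
      return null; // conflicting single values, e.g. different actions
    }
  }
  return merged;
}

// Serialize for the API, joining multi-value parameters with "|".
function toQueryString(params: ApiParams): string {
  return Object.keys(params)
    .map((key) => {
      const value = params[key];
      const text = Array.isArray(value) ? value.join("|") : value;
      return `${encodeURIComponent(key)}=${encodeURIComponent(text)}`;
    })
    .join("&");
}

// Example: the item request and a property request become one API call.
const entityRequest: ApiParams = { action: "wbgetentities", ids: ["Q42"], props: ["claims"], format: "json" };
const propertyRequest: ApiParams = { action: "wbgetentities", ids: ["P373"], props: ["datatype"], format: "json" };
const batched = mergeParams(entityRequest, propertyRequest);
```

One consequence of merging this way is that every caller receives the union of the requested props for all requested ids, so each caller has to pick out its own part of the response.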