
[Story] Make Special:EntityData be up to date after an edit
Closed, Resolved (Public)

Description

When an entity is edited users expect the data they get via Special:EntityData to change as well. We need to purge the caches there after an edit.

The approach decided during the task inspection was to "just not cache these specific page requests" (T128486#6376066).

Questions from story time:

How quickly do we want Special:EntityData to have the latest data?

  • As close to immediately as possible without making it a month long project.

Is this specific for JSON only?

  • JSON is the most important one (the one people talk about the most), but all formats would be good

Is this for all users or just the user that made the edit?

  • All users

Points:

  • This task only talks about calls to this page that do not specify a concrete revision ID
  • Task inspection notes from 11 August 2020 T128486#6376066


Event Timeline

Does TitleSquidURLs require the full list?
Because if we have something like https://www.wikidata.org/wiki/Special:EntityData/Q3361378.ttl?flavor=dump, then the ttl part can be any of a bunch of formats, and so can the flavor.

There's EntityDataRequestHandler::purgeWebCache, which is supposed to do the purging, and it uses EntityDataUriManager::getCacheableUrls, but I don't see whether it handles flavors.
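
For context, a rough reconstruction of how that purge path fits together. Only the two method names above are real Wikibase names; the purgeUrls() collaborator is an assumption standing in for MediaWiki's CDN purge machinery:

// Hypothetical sketch, not the actual Wikibase code.
function purgeEntityDataFromCdn( $uriManager, $cdnPurger, $entityId ) {
	// Enumerate every URL variant the CDN may be holding for this entity.
	// If this list covers the formats but not the flavors (?flavor=dump
	// etc.), the flavored variants survive the purge, which would explain
	// the behaviour tested in the next comment.
	$urls = $uriManager->getCacheableUrls( $entityId );

	// Ask the CDN layer (Varnish/ATS) to drop each of those URLs.
	$cdnPurger->purgeUrls( $urls );
}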

I did some quick testing, and it looks like action=purge does indeed not purge URLs like https://www.wikidata.org/wiki/Special:EntityData/Q4115189.ttl?flavor=dump. This looks like an independent bug.

If we cache them we should purge them. But I'm worried about the performance implications of sending a purge for every possible combination of parameters.

I hear the new Varnish version (was it 4?) allows you to put multiple "variants" of a URL into a single "bucket". That would help.

Yeah... you'd have to purge each variant of the ttl and flavor parameters individually, at least right now.

Lydia_Pintscher moved this task from incoming to ready to go on the Wikidata board.

Adding to the general Wikidata Bridge board, since this means users may see stale data when starting an edit (even though the page content will typically have the fresh data). We just discovered this on Beta:

Screenshot from 2019-10-01 16-05-32.png (698×723 px, 94 KB)

The Twitter hashtag value on Beta Wikidata was changed from WikidataCon to Wikidata; the infobox has the new value (was automatically updated through change dispatching), but the bridge dialog loaded the entity data via the special page and got a stale value.

(The termbox also loads Special:EntityData, but doesn't have this problem, because it always requests the data for the mw.config.get( 'wgRevisionId' ) revision since T215786.)

Crazy idea suggested by a pragmatic fellow programmer: why don't we simply use our API if we don't want stale information (at least as a workaround)?

That’s a possible workaround, of course, but it causes additional network traffic and server load.

It would be great to be able to use the cached special entity page when we want the bridge to work at scale, to avoid increased network and server load.
But for now, in an MVP, I see no reason we can't use the uncached wbgetentities API?
Or alternatively call an API to initially look up the latest revid, then call the possibly cached special entity data page (but that's more work; see the sketch below).
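
For illustration, the two-step variant spelled out as a standalone PHP client (arbitrary entity ID, error handling omitted; the action API query is real, the overall flow is just the idea above made concrete):

<?php
// "Look up the revid first" workaround: one small, uncached API request,
// then a revisioned Special:EntityData request that Varnish/ATS can
// safely cache, because a concrete revision never changes.
$id = 'Q4115189';

// Step 1: ask the action API for the latest revision ID.
$api = 'https://www.wikidata.org/w/api.php?action=query&format=json'
	. '&prop=revisions&rvprop=ids&titles=' . $id;
$data = json_decode( file_get_contents( $api ), true );
$page = reset( $data['query']['pages'] );
$revId = $page['revisions'][0]['revid'];

// Step 2: fetch the entity data for that exact, immutable revision.
$entity = json_decode( file_get_contents(
	"https://www.wikidata.org/wiki/Special:EntityData/$id.json?revision=$revId"
), true );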

Two notes:

  • On Special:EntityData, anything that has a query argument just passes through Varnish/ATS. So format=foo, revid=666, etc. all see the uncached version.
  • We support too many formats, but why not just invalidate the cache for ttl and json for now, i.e. for Special:EntityData/Q666.json and Special:EntityData/Q666.ttl? It's just two lines of code.

On Special:EntityData, anything that has a query argument just passes through Varnish/ATS. So format=foo, revid=666, etc. all see the uncached version.

Are you sure about this? Because basically all of T217897: Reduce / remove the aggressive cache busting behaviour of wdqs-updater hinges on the fact that requests for a specific revision can be cached in Varnish (we didn't use ATS at the time).

And I’m getting a cache hit:

$ curl -svo/dev/null https://www.wikidata.org/wiki/Special:EntityData/Q1.ttl?revision=1116941900 2>&1 | grep -i '^< x-cache'
< x-cache: cp3052 miss, cp3052 hit/3
< x-cache-status: hit-front

We support too many formats, but why not just invalidate the cache for ttl and json for now, i.e. for Special:EntityData/Q666.json and Special:EntityData/Q666.ttl? It's just two lines of code.

Well, I’d prefer to “do the right thing”, and not just invalidate the formats that happen to be requested most often. If we can afford to do this.

If we cache them we should purge them. But I'm worried about the performance implications of sending a purge for every possible combination of parameters.

So how many combinations do we actually have?

I think that’s all the variables (5 formats times 4 flavour variants), so that’s 20 URLs we would need to purge. Is that enough to be a problem?
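
For illustration, that fan-out enumerated in a few lines of PHP. The 5-formats-times-4-flavours count also appears in the task inspection notes further down; the concrete format and flavour names here are assumptions, not an authoritative list of what Special:EntityData supports:

<?php
// Rough count of the purge fan-out per edit:
// 5 formats * 4 flavour variants (including "no flavor") = 20 URLs.
$formats = [ 'json', 'ttl', 'nt', 'rdf', 'php' ]; // illustrative
$flavors = [ null, 'dump', 'simple', 'full' ];    // illustrative

$urls = [];
foreach ( $formats as $format ) {
	foreach ( $flavors as $flavor ) {
		$url = "https://www.wikidata.org/wiki/Special:EntityData/Q1.$format";
		if ( $flavor !== null ) {
			$url .= "?flavor=$flavor";
		}
		$urls[] = $url;
	}
}
echo count( $urls ), "\n"; // 20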

WDoranWMF added subscribers: darthmon_wmde, WDoranWMF.

Untagging now based on discussion with @darthmon_wmde, please retag us if needed.

In order to prepare this for a story time, I think we need to get rid of the idea that purging is the solution, and instead start back at the desired behaviour from a user point of view.
A BDD or two might help here.

Things that need to be considered:

  • Are we talking about both logged in and logged out users?
  • Are we talking about requests to Special:EntityData/Q70, for example, or also (or only) Special:EntityData/Q70.json?

I think then, for this story, we might be able to make a decision about what to do technically without needing to do a purge, for example.
Right now in production the WMF is trying to do fewer purges, rather than more.
A result of that is also that we are not really in a position to push forward with any xkey cache purging.

  • Are we talking about both logged in and logged out users?

Both.

  • Are we talking about requests to Special:EntityData/Q70, for example, or also (or only) Special:EntityData/Q70.json?

The requests I've seen were for .json, I believe. It feels icky though to have different behavior for the different export formats.

So from IRC: the share of requests to Special:EntityData that we are talking about here is around 3.7%, i.e. requests that currently hit the cache when we might not want them to.

image.png (103×493 px, 52 KB)

Raw data @ https://docs.google.com/spreadsheets/d/1SIS5_Ch4JOj_9Fqi0JdmYcCnInYcOV-thhpRDRH9MVU/edit?usp=sharing
Generated with: P11066

One option might be to just not cache these pages in the first place.
This would result in ~800 more Varnish cache misses per minute for the special page.
That is nothing in comparison to the 10k requests per minute that are currently not cached for the query service updater.

I can check with ops whether this would be more desirable than extra purges (I believe it will be).

Addshore renamed this task from [Story] Purge Special:EntityData JSON after edit to [Story] Make Special:EntityData be up to date after an edit. (Aug 11 2020, 12:47 PM)
Addshore updated the task description.

Task inspection notes:

Possible approaches:

  1. Don't cache these requests << Decided as the approach to try
  2. Invalidate the cache on the edit
    • How expensive is doing the cache invalidation on edits?
    • How many cache invalidations will occur after 1 edit?
      • 20 cache invalidations (5 formats, 4 flavours)
      • 20 invalidations * 1000 edits = potentially 20k invalidations a minute?
    • Cache invalidation has to happen at multiple edge cache sites which = more time etc
  3. Make the requests without the revision ID a temporary redirect to the page with a revision ID << Decided as the 2nd-place choice if we have to reevaluate later (see the sketch after this list)
    • Would this mean we don't cache the requests to the page with no revision ID, always send an up-to-date redirect, and point to a possibly cached page with a revision ID? (YES)
    • Could potentially be a breaking change, depending on how users make their API requests?
  4. Don't cache the less used formats, do cache the more used formats and send purge requests (a combination of 1 and 2)
    • The motivation of this would be to send fewer purges on every edit than number 2
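
For illustration, a minimal sketch of what approach 3 could look like, as plain PHP with a stubbed revision lookup (none of this is actual Wikibase code):

<?php
// Approach 3: answer a revision-less request with an uncached 303
// redirect to the revisioned URL, which is immutable and can then be
// cached aggressively by the edge caches.
function lookupLatestRevisionId( string $entityId ): int {
	// Stub: in reality this would be a cheap "latest revision" lookup.
	return 1234;
}

$entityId = 'Q70'; // would come from the request path
$revId = lookupLatestRevisionId( $entityId );

// The redirect itself must never be cached, or it would go stale too.
header( 'Cache-Control: private, s-maxage=0, max-age=0' );
header(
	"Location: /wiki/Special:EntityData/$entityId.json?revision=$revId",
	true,
	303 // "See Other": temporary, so clients keep asking the original URL
);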

One interesting thing is that the current code seems to expect the opposite situation of what we now want to introduce:

EntityDataRequestHandler::outputData()
//FIXME: do not cache if revision was requested explicitly!
$maxAge = $request->getInt( 'maxage', $this->maxAge );
$sMaxAge = $request->getInt( 'smaxage', $this->maxAge );

// XXX: do we want public caching even for data from old revisions?
$maxAge  = max( self::MINIMUM_MAX_AGE, min( self::MAXIMUM_MAX_AGE, $maxAge ) );
$sMaxAge = max( self::MINIMUM_MAX_AGE, min( self::MAXIMUM_MAX_AGE, $sMaxAge ) );

At the time this was written (2013: I1dabe79261, I7298de0b9d), the expectation apparently was that eventually, we should only cache Special:EntityData requests without a revision ID, not ones with a revision ID.

I have a feeling that the original code didn't take into account how varnish/ATS works (or it worked differently back then)
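
In plain PHP terms, the direction the change below takes is roughly this (a minimal sketch of the idea, not the actual diff):

<?php
// "Cache Special:EntityData only if revision supplied": a concrete
// revision is immutable, so shared caches may keep it; "latest" data
// has to reflect edits immediately, so the CDN must not.
$revision = (int)( $_GET['revision'] ?? 0 );
$sMaxAge = 3600; // illustrative value

if ( $revision > 0 ) {
	header( "Cache-Control: public, s-maxage=$sMaxAge" );
} else {
	header( 'Cache-Control: private, s-maxage=0, max-age=0' );
}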

Change 620318 had a related patch set uploaded (by Tarrow; owner: Tarrow):
[mediawiki/extensions/Wikibase@master] Cache Special:EntityData only if revision supplied

https://gerrit.wikimedia.org/r/620318

Change 620318 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Cache Special:EntityData only if revision supplied

https://gerrit.wikimedia.org/r/620318