[Story] purge cached renderings of IDs when the formatter URL changes
Open, Low, Public

Description

When we start using the "formatter URL" statements on properties to generate HTML links to authority vocabularies, we need a way to purge such cached renderings when the respective statement in the property definition changes.

Event Timeline

daniel raised the priority of this task from to Needs Triage.
daniel updated the task description.
daniel added a subscriber: daniel.
hoo renamed this task from [Story] purge cached renderings of IDs when the formatter URl changes to [Story] purge cached renderings of IDs when the formatter URI changes. Aug 29 2017, 7:33 PM
hoo renamed this task from [Story] purge cached renderings of IDs when the formatter URI changes to [Story] purge cached renderings of IDs when the formatter URL changes.

As an interim workaround, I've put together a quick-and-dirty script that slowly does a rolling purge of items using pywikibot. It identifies all items that have not been edited since before the formatter URL was changed, generates a list, and works through them. As it uses the PWB framework, it respects maxlag and will back off if the servers are overloaded.

https://github.com/generalist/wikidata-misc/blob/master/wikidata-purge.sh

Works fine for small sets of items, but maxlag-related delays mean it's probably not a practical solution for very heavily used properties.
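
For anyone who wants the gist without reading the shell script, here is a minimal Python sketch of the same idea. It is not the linked script. It assumes you already have a mapping of item IDs to their last-edit timestamps, and the names `stale_items`, `purge_batches` and `send_purge`, the batch size of 50, and the `maxlag` value of 5 are all illustrative choices. The only MediaWiki pieces it relies on are the `action=purge` API module and the general `maxlag` parameter.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the wiki being purged is Wikidata.
API_URL = "https://www.wikidata.org/w/api.php"


def stale_items(last_edited, changed_at):
    """Return item IDs whose last edit predates the formatter URL change.

    last_edited maps an item ID to the timestamp of its latest edit.
    """
    return sorted(t for t, ts in last_edited.items() if ts < changed_at)


def purge_batches(titles, batch_size=50, maxlag=5):
    """Yield one parameter dict per action=purge request."""
    for i in range(0, len(titles), batch_size):
        yield {
            "action": "purge",
            "format": "json",
            "titles": "|".join(titles[i:i + batch_size]),
            # Asks the server to refuse the request when replication lag is high.
            "maxlag": str(maxlag),
        }


def send_purge(params):
    """POST one purge request and return the raw response body.

    The caller is expected to sleep and retry if the response reports
    a maxlag error; that loop is left out of this sketch.
    """
    req = Request(
        API_URL,
        data=urlencode(params).encode("utf-8"),
        headers={"User-Agent": "purge-sketch/0.1 (illustrative example)"},
    )
    with urlopen(req) as resp:
        return resp.read()
```

Typical use would be to call `stale_items` once, then loop over `purge_batches` and pass each dict to `send_purge` with a pause in between.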

Is this scheduled?

For example we noticed that P6288 should be purged.

It's not scheduled in any way - just an as-needed backup - but I'll run it for P6288 today.

It occurred to me today that this problem is (in some ways) a blessing in disguise - it helps mitigate the effects of vandalism in which someone deletes or changes a formatter URL.

If we do manage to solve this, e.g. by having a server-side script purge items after the formatter URL changes, maybe it would be good to build in a natural delay - say it only runs twelve hours after the change, or checks for new formatter URLs once a day, or something.
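
To make the delay idea concrete, here is a rough sketch of the check such a script might do. Everything in it is hypothetical: the function name `properties_to_purge`, the shape of the `state` mapping, and the twelve-hour default are just one way of writing down the suggestion above, not anything that exists server-side.

```python
from datetime import datetime, timedelta, timezone


def properties_to_purge(state, now, delay=timedelta(hours=12)):
    """Return property IDs whose formatter URL change is old enough to act on.

    state maps a property ID to a tuple of
    (URL used in the cached renderings, current URL, time of the last change).
    A change that was reverted within the delay window leaves both URLs
    equal, so it never triggers a purge.
    """
    due = []
    for prop, (rendered_url, current_url, changed_at) in state.items():
        if current_url != rendered_url and now - changed_at >= delay:
            due.append(prop)
    return sorted(due)
```

With a check like this, a vandalised formatter URL that gets reverted within twelve hours would never reach the cached renderings at all.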

After purging a property and an item where the property is used, it still seems to be cached somewhere, and if a property is added after the formatter URL is changed, the link created is based on the old formatter URL.

Property information, probably including the formatter URL, is cached for 24 hours; as far as I can tell, there’s no way to actively purge that cache.