Page MenuHomePhabricator

Don’t cache constraint check results involving current time whose validity will change soon
Closed, ResolvedPublic

Description

If a “range” or “difference within range” check currently results in a violation, but will change to compliance within the configured cache TTL (or vice versa), don’t cache that result. (Which means, since we cache results per-entity, that we kill caching completely for that entity.)

(Note: it would be easier to just never cache constraint check results for ranges involving the current time, but then I suspect we would lose a lot of caching. Since most violations will probably not change within the cache TTL, this approach should be better.)

Patch-For-Review:

Event Timeline

Implementation sketch:

  • If one of the range endpoints is a NowValue, have the RangeHelper calculate the difference between a NowValue and the actual value in the statement.
  • If that difference is less than, say, twice the configured cache TTL, simply declare the value to depend on the fact that the current date is in the future, which will cause the value to be considered stale the next time it’s retrieved from the cache already. Since this should be a rare case, I think that’s acceptable.
  • If that difference is more than 2×TTL, declare the value to depend on the fact that “now + 2×TTL” is in the future. If the value is somehow cached for over twice the configured TTL, then it will be invalidated at that point in the future, which is fine, because we were assuming it would be invalidated via the cache TTL anyways.

The tricky bit is computing the TimeValue “now + 2×TTL” – I can’t see any provision for adding to a date in TimeValueCalculator. I guess we can just use gmmktime, gmdate etc. – the usual built-in PHP date/time functions – because for the current date, we don’t need to worry about the 32-bit precision problem?

Wait, I’m stupid… this is easy!

When does the check result become invalid? When the value which used to be in the future is now in the past.
When does that happen? Exactly on that value! We don’t need to calculate anything! A TimeValue already represents exactly the TimeValue on which that same value changes from being in the future to being in the past!

Revised implementation sketch:

  • If one of the range endpoints is a NowValue, declare the check result to depend on the fact that the value is in the future.
  • There is no second bullet point.

The more interesting part is in the CachingResultsBuilder, which now needs to determine whether a certain date is in the past or not, presumably via a newly injected RangeCheckerHelper’s getDifference method… but that was already true in the above comment.

Change 411038 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseQualityConstraints@master] Add future time to DependencyMetadata

https://gerrit.wikimedia.org/r/411038

Change 411041 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseQualityConstraints@master] Add DependencyMetadata to RangeChecker results

https://gerrit.wikimedia.org/r/411041

A TimeValue already represents exactly the TimeValue on which that same value changes from being in the future to being in the past!

And then I remembered that different calendars exist…

The thing is – RangeChecker already doesn’t treat those correctly. We use TimeValueCalculator to compare time values against ranges, which just throws away the calendar and assumes the value is in the Gregorian calendar. So I’m tempted to just ignore that for now. (After opening a separate task for that issue, of course, so we don’t forget about it. Edit: T187529)

Change 411236 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseQualityConstraints@master] Add TimeValueComparer

https://gerrit.wikimedia.org/r/411236

Change 411252 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseQualityConstraints@master] Add RangeCheckerHelper::isFutureTime()

https://gerrit.wikimedia.org/r/411252

Change 411236 merged by jenkins-bot:
[mediawiki/extensions/WikibaseQualityConstraints@master] Add TimeValueComparer

https://gerrit.wikimedia.org/r/411236

Change 411038 merged by jenkins-bot:
[mediawiki/extensions/WikibaseQualityConstraints@master] Add future time to DependencyMetadata

https://gerrit.wikimedia.org/r/411038

Change 411252 merged by jenkins-bot:
[mediawiki/extensions/WikibaseQualityConstraints@master] Add RangeCheckerHelper::isFutureTime()

https://gerrit.wikimedia.org/r/411252

Change 411041 merged by jenkins-bot:
[mediawiki/extensions/WikibaseQualityConstraints@master] Add DependencyMetadata to RangeChecker results

https://gerrit.wikimedia.org/r/411041

I think this should be working now, but I’ll try a manual test, just to make sure.

Well, this doesn’t seem to be working completely… I added a “21 February 2018” statement on 20 February 2018, and today, on 23 February 2018, I still see the same violation message:

The value for point in time (21 February 2018) should not be in the future.

So the cached result is still used, even though it should be invalidated per this task and its cache TTL is already expired. I can debug the invalidation part (this task), but apparently we can’t even rely on the cache TTL?

Nevermind, it works as intended, my cached value was from a version of the code without this fix so it was missing the futureTime indication.

The cache TTL thing is unfortunate, but I don’t see how it could be a WBQC problem…